Saturday, June 13, 2009

Advances in Compiler Writing

..or rather, my baby steps towards writing a very basic compiler (^^;>

Having to parse the RenderMan Shading Language (RSL) is so far the most difficult problem in RibTools/RibRender, because it's something I just haven't done before in any even remotely similar form.

I've got to a decent point so far, where a function like:


vector func2( vector a )
{
 float b = 1 + 2 * 3;
 float c = 1 + 2 + 3;
 c += 14;
 return vector( 1, 2, 3 );
}

..becomes..


function func2
 * $t0 2 3
 + $t1 1 $t0
 mov _9@_b $t1

 + $t0 1 2
 + $t1 $t0 3
 mov _9@_c $t1

 += _9@_c _9@_c 14

 ret

What's working here is:
- Variables being interpreted and getting unique names (mangled with some "block of code" identifier)
- Expressions are evaluated following the proper operator precedence: multiplication before addition
- Temporary registers are utilized for the basic operations into which the original expression is decomposed
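
As an illustration of the last two points, here is a minimal sketch in Python (not the actual RibTools code) of how an expression can be decomposed into temporaries while respecting precedence. It uses precedence climbing, which is just one of several ways to do it; the names `lower_expression`, `PRECEDENCE` and the exact output format are assumptions modeled on the listing above.

```python
import re

# Binding strength of each binary operator: higher binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def tokenize(text):
    """Split an expression into numbers, identifiers, operators and parentheses."""
    return re.findall(r"\d+\.?\d*|[A-Za-z_]\w*|[-+*/()]", text)

def lower_expression(text, dest):
    """Return pseudo-assembly lines that evaluate `text` and store it in `dest`."""
    tokens = tokenize(text)
    pos = 0
    code = []
    temp_count = 0  # temporaries restart at $t0 for every statement

    def new_temp():
        nonlocal temp_count
        name = f"$t{temp_count}"
        temp_count += 1
        return name

    def parse_primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_binary(1)
            pos += 1  # skip the closing ")"
            return value
        return tok

    def parse_binary(min_prec):
        nonlocal pos
        left = parse_primary()
        # Keep consuming operators that bind at least as tightly as min_prec.
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            # The right operand may only contain tighter-binding operators.
            right = parse_binary(PRECEDENCE[op] + 1)
            temp = new_temp()
            code.append(f"{op} {temp} {left} {right}")
            left = temp
        return left

    result = parse_binary(1)
    code.append(f"mov {dest} {result}")
    return code

if __name__ == "__main__":
    print("\n".join(lower_expression("1 + 2 * 3", "_9@_b")))
```

Each binary operation gets its own temporary, and the multiplication is emitted first simply because the recursive call for the right-hand side of `+` only accepts operators that bind tighter.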

..the return value is however completely ignored (!), and the operators are not really replaced by the actual assembly instructions such as add and mul.
But, those are details.. ..actually, getting the return value is not such a detail ! And I'm working on that right now but, before that, I need to get function calls and pseudo-constructors such as vector( x, y, z ) working.

The thing is that I've been indiscriminately interpreting all commas as comma operators, and I've been wondering how that fits with the order in which parameters are passed to function calls ..those also being separated by commas.
The answer is that those commas are really just a different kind of comma 8)
Perhaps it's obvious, but reading about all this formalism and language parsing, one kind of hopes that things like that have a certain degree of consistency that goes beyond the common sense acquired in 20 years of C programming 8)
This is not to say that there are any formal ambiguities.. it's just that the context is necessary to resolve the usage of such a basic symbol, and that complicates things a bit.
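
To make the "context" part concrete, here is a small sketch in Python of a recursive-descent parser in which the same `,` token means two different things depending on which parsing function sees it. The grammar is deliberately tiny (numbers, identifiers, calls, parentheses), and the `Parser` class and the tuple-based tree are assumptions made up for this example, not how RibTools does it.

```python
import re

def tokenize(text):
    """Split the input into numbers, identifiers, parentheses and commas."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[(),]", text)

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def parse_expression(self):
        """Full expression: here a comma is the comma *operator*."""
        node = self.parse_operand()
        while self.peek() == ",":
            self.next()
            node = ("comma_op", node, self.parse_operand())
        return node

    def parse_operand(self):
        """One level below the comma operator, so it never consumes a comma."""
        tok = self.next()
        if tok == "(":
            node = self.parse_expression()
            self.next()  # closing ")"
            return node
        if self.peek() == "(":
            self.next()  # opening "(" of a call or pseudo-constructor
            args = self.parse_arguments()
            self.next()  # closing ")"
            return ("call", tok, args)
        return tok

    def parse_arguments(self):
        """Argument list: here a comma is just a *separator*."""
        args = []
        if self.peek() != ")":
            args.append(self.parse_operand())
            while self.peek() == ",":
                self.next()
                args.append(self.parse_operand())
        return args

if __name__ == "__main__":
    print(Parser("vector(1, 2, 3)").parse_expression())
    print(Parser("f((1, 2), 3)").parse_expression())
```

The trick is only that arguments are parsed one precedence level below the comma operator, so a comma operator can still appear inside an argument as long as it is wrapped in parentheses.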

I guess that if I had to design C (or a C-like language) from scratch, I'd probably choose different symbols for the comma operator and the comma separator. But would that be good for the human programmer, or would it just make compiler writing a simpler task for me ? ummmummm !

By the way, I also added loading of immediate values and a compare-and-jump operator in the virtual machine !! Which also required me to handle labels in the assembly compiler.. which, by the way, is so much nicer to write than a high-level compiler 8) No trees, no deductions !!

See the before and after of the immediate load and compare & jump (I suggest maximizing the browser).
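
For a rough idea of why label handling in an assembler needs no trees, here is a sketch in Python of the usual two-pass approach: first record where each label points, then patch the jump operands. The mnemonics in `SAMPLE` (`ld.i`, `add`, `cmplt`, `ret`) and the `name:` label syntax are made up for the example and are not necessarily what the RibTools assembler uses.

```python
# Hypothetical program: count $i from 0 up to 10 with a compare-and-jump.
SAMPLE = """
    ld.i  $i 0
loop:
    add   $i $i 1
    cmplt $i 10 loop
    ret
"""

def assemble(source):
    """Two-pass assembly: collect label addresses, then patch jump targets."""
    labels = {}
    instructions = []
    # Pass 1: record the instruction index that each label points at.
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = len(instructions)
        else:
            instructions.append(line.split())
    # Pass 2: replace label operands with instruction indices.
    # Doing this in a second pass lets jumps refer to labels defined later.
    resolved = []
    for op, *args in instructions:
        resolved.append((op, *[labels.get(a, a) for a in args]))
    return resolved

if __name__ == "__main__":
    for index, instruction in enumerate(assemble(SAMPLE)):
        print(index, instruction)
```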

Another thing I added is uniform registers. Shaders work on N samples at once, but to effectively use SIMD, jumps have to happen for all samples at once, so looping variables ought to be uniform (that is, equal for all samples).
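
A tiny sketch of the idea, in Python with a plain list standing in for a SIMD register: the loop counter is a single uniform value, so there is only one comparison and one jump per iteration for the whole batch, while the data is varying with one value per sample. The function name and the loop body are invented for the example.

```python
def run_loop(samples, iterations):
    """Apply x = x * 0.5 + 1 to every sample, `iterations` times."""
    values = list(samples)   # varying register: one value per sample
    i = 0                    # uniform register: a single value shared by all samples
    while i < iterations:    # one comparison and one jump for the whole batch
        # Stands in for SIMD instructions operating on all samples at once.
        values = [v * 0.5 + 1.0 for v in values]
        i += 1
    return values

if __name__ == "__main__":
    print(run_loop([0.0, 2.0, 4.0], 2))
```

If the counter were varying instead, different samples could want to leave the loop at different times, and a single jump could no longer serve all of them.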

The if-then conditional stuff needs to be handled differently, and that's where the masked SIMD stuff, like in the LRB opcodes, comes in handy... but that's a whole different topic for another time !
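
Just to give the flavor of masking, here is a sketch in Python of a conditional like `if (x > threshold) x = x * 2` executed over a batch without any jump. This shows only the general masked-write idea, with lists standing in for SIMD registers; it is not a model of any specific instruction set, and the function name is made up.

```python
def masked_double_if_greater(values, threshold):
    """Double every sample greater than `threshold`, leaving the others untouched."""
    # The comparison produces one mask bit per sample instead of a jump.
    mask = [v > threshold for v in values]
    # The "then" body is computed for every sample...
    doubled = [v * 2.0 for v in values]
    # ...but the result is only written back where the mask is set.
    return [d if m else v for v, d, m in zip(values, doubled, mask)]

if __name__ == "__main__":
    print(masked_double_if_greater([1.0, 5.0, 3.0], 2.0))
```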

woooooo