Monday, July 13, 2009

compile compile compile

I've been procrastinating a lot about the shader compiler project.

I seem to always find some excuse to turn on the TV, watch some web site, etc. I keep looking for distractions because the compiler thing is really not much fun !

I like computers because they are very linear, logic, unambiguous.. but high level programming languages aren't like that.
Sure, they are all well defined syntactically, but they are also very much a human interface.
What makes an high level language so powerful is also what makes it painful to implement 8)

Currently, I do most of my work around trees.. or rather the AST (Abstract Syntax Tree) that comes out of the parser.

The nice thing about functions in a shader is that they they get inlined. This seems to be the general rule, and it's what I decided to go for.
Inlining can bring to larger code, but simplifies things a lot as one doesn't need to keep a stack with values to save and restore before and after function calls.
I think that RenderMan SL inlines because it specifically does not support recursive functions.. which would require a stack system.

About working with trees, here is a simple example:

c = a + b
becomes

=
 c
 +
  a
  b
..where indentation determines parenting.
So, "c" and "+" are both child of "=". This makes it easy to then convert even complex operations utilizing basic instructions.

The assembly generator would scan the tree, going down to the deepest operator and working its way up:

  ;+ (tmp)
  ; a
  ; b
  add tmp, a, b

  ;=
  ; c
  ; + (tmp)
  mov c, tmp
..two operators, two instructions.

Here is a what happens with a function being called by a shader:

float dude()
{
  float  c = 2;
  return c + 1;
}

surface test5()
{
  Oi = dude();
}
The intermediate tree looks like this:

// "dude" is unused and kept in the tree only to use as a base for
// inlining
float
dude
(
  {
    float
    =
      c
      2
    return
    +
      c
      1

// "test5" is the shader, where all code is really generated
surface
test5
(
  {
    // this is "dude" being inlined
    (
      {
        float
        =
          c
          2
        =    // assignment derived from "return"
          Oi // destination for the return value
          +
            c
            1
"dude" gets inlined, and
return c + 1
becomes
Oi = c + 1

..and the resulting pseudo-asm is..


surface test5
  mov $s0 2     // c = 2
  add $s1 $s0 1 // tmp = c + 1
  mov Oi $s1    // Oi = tmp
  ret

..all but global variables (Oi, in RSL is a global and an output variable), become registers.
I prefix registers with $ and "s" in the name stands for scalar, as opposed to vector or matrix registers.

Still, there are many complications. Up-conversion is one thing that bugs me. That is, how a scalar can become a vector:

vector a = 1;
..this is very practical, but one more complication. I somewhat implemented this, but there is going to be more about it, I'm sure.

Another issue is the built in functions. For example the function "distance":

float a = distance( c, d );
For these I started building some definitions that get automatically included. So, "distance" gets its own defintion.

float distance( vector c, vector d )
{
 float tmp;
 __distance( tmp, c, d );
 return tmp;
}
..here the function "distance" explicitly calls the intrinsic "__distance". The intrinsic has a 1:1 correspondence to the asm/VM opcode and it should be easy to produce asm from that.

So..

__distance( tmp, c, d )
..should become..

__distance $s0, $v0, $v1

Intrinsics are not in yet, and they'll be next, once I'm done with function calls.. almost there !

Then conditionals should be interesting. I've already implemented some conditionals.. but things get complicated with shaders, because of the concept of varying and uniform types.. another topic for the future !

zzzzzzzzzzzzz

2 comments:

  1. Interesting concepts. I remember implementing conditionals was tricky when I was doing graph based shaders. Generating shader code from a graph of nodes where each node was a HLSL/Cg intrinsic. Output of one node becomes input to another node. Conditional nodes take as an input the evaluation of the condition but they dont really have a predefined data output, since blocks of code in each branch can be of arbitrary length and work on any number of variables.... I forgot how I eventually solved it, but remember it was difficult!

    ReplyDelete
  2. The conditional node thing seems very much like the return value of a function thing.
    It seems that all these problems come down to trees and recursion, where one has to think backwards 8)

    About conditionals in shader virtual machines instead.. the added complexity there is that most variables tend to be evaluated in parallel several times.. using vector instructions that are often 16 or 32 wide. The VM itself may run the same shader code for hundreds of samples.

    Conditionals imply jumps for selected samples (a pixel, a vertex..), and this breaks the parallelism. So, instead of jumping, one forces the same flow for everything, but masks out those operations aren't supposed to happen for certain samples.

    When instead a variable is "uniform" (same value for all samples), an actual jump can be carried because there is no ambiguity.

    I guess this is why Larrabee instruction set has those 16 bit mask registers, where each bit selects whether any of the 16 floats (or lanes) are affected by a certain operation.

    ...parallelismmmmm !!!

    ReplyDelete