Friday, November 27, 2009

One bone at a time

Some decent progress with RibTools.. the light shaders seem to work. See the nice specular highlights on the airplane model in the picture.

I had a couple of silly problems that threw me off for a few days.
One was the specular highlight. The formula was right, but the specular lobe wasn't. It turns out that the standard RenderMan specular power function is actually multiplied by a factor of 8 or so in PRMan and in the free renderers that I'm using to check the results (mostly Aqsis).
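
To see how much such a factor changes the lobe, here is a minimal C++ sketch. It assumes the factor scales the exponent of the usual pow(N.H, 1/roughness) term. That reading, the function name `specularTerm` and the `exponentScale` parameter are assumptions made for illustration, not code taken from PRMan, Aqsis or RibTools.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: specular term for a given N.H and roughness.
// exponentScale = 1 gives the textbook pow(N.H, 1/roughness);
// exponentScale = 8 is the ASSUMED reading of the "factor of 8 or so".
float specularTerm(float nDotH, float roughness, float exponentScale)
{
    const float clamped = std::max(0.0f, nDotH);
    return std::pow(clamped, exponentScale / roughness);
}
```

With N.H = 0.5 and roughness = 1, the unscaled term is 0.5 while the scaled one is 0.5^8, so the highlight falls off far more quickly and looks much tighter.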

Another, bigger, problem was getting the light direction right. The "from" and "to" standard shader parameters passed to the light shader as such:

LightSource "distantlight" 2 "from" [ 5 10 -10 ] "to" [ 0 0 0 ] "intensity" [ 0.7 ]

..need to be transformed into the space where the shader is being computed. Apparently this is usually "camera" space, and that's the case for me, too.
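
A minimal sketch of that step in C++ could look like the following. The `Vec3` and `Mat4` types, the row-major layout with column vectors, and the `toCamera` matrix are all assumptions for illustration, not the actual RibTools code. The key detail is that "from" and "to" are points, so they pick up the translation part of the matrix, and the direction is computed only after both have been transformed.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<float, 3>;
// Assumed convention: row-major storage, column vectors, affine transform.
using Mat4 = std::array<std::array<float, 4>, 4>;

// Transform a point (implicit w = 1), so the translation column applies.
Vec3 transformPoint(const Mat4& m, const Vec3& p)
{
    Vec3 r{};
    for (int i = 0; i < 3; ++i)
        r[i] = m[i][0] * p[0] + m[i][1] * p[1] + m[i][2] * p[2] + m[i][3];
    return r;
}

// Direction in which a distant light shines, expressed in camera space:
// normalize(to - from), with both points transformed first.
Vec3 lightDirection(const Mat4& toCamera, const Vec3& from, const Vec3& to)
{
    const Vec3 f = transformPoint(toCamera, from);
    const Vec3 t = transformPoint(toCamera, to);
    Vec3 d{ t[0] - f[0], t[1] - f[1], t[2] - f[2] };
    const float len = std::sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2]);
    if (len > 0.0f)
        for (float& c : d)
            c /= len;
    return d;
}
```

For the "from" [ 5 10 -10 ] and "to" [ 0 0 0 ] values above and a pure translation, this gives (-1/3, -2/3, 2/3): the translation cancels out in the subtraction, while any rotation in the matrix would carry over to the direction.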

As usual, with these things there are ambiguities. But this time around there was one more potential flaw: the shader compiler 8)
The register allocation was broken, so an expression like:

a = b - c

..was converted to the VM instruction..

sub.vvv $v6 $v7 $v7

..therefore always producing a "nice" clean 0 value in the destination register ($v6).
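
To illustrate why that instruction can only ever yield zero, here is a small C++ sketch of what a varying vector subtract might look like. The `VaryingVec` type and the `sub_vvv` function are hypothetical stand-ins, not the real VM implementation.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Vec3 = std::array<float, 3>;
// Hypothetical varying vector register: one Vec3 per shading sample.
using VaryingVec = std::vector<Vec3>;

// dst = a - b for every sample. Assumes a and b hold the same sample count.
void sub_vvv(VaryingVec& dst, const VaryingVec& a, const VaryingVec& b)
{
    dst.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t k = 0; k < 3; ++k)
            dst[i][k] = a[i][k] - b[i][k];
}
```

When the allocator hands the same register to both source operands, as in `sub_vvv(v6, v7, v7)`, every component becomes x - x, so the destination is all zeros no matter what b and c originally held.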

Register allocation (or, "register coloring") is a topic on its own which I can't spend time on right now.
Luckily there are no actual hardware limitations, since it's all software. It's my VM and registers are virtual. The limit is in the memory usage not the instruction set.
However, each register can use quite a bit of memory. Varying registers represent all samples being shaded for a certain grid. Practically, this means that a single register could represent 1000 float values.
1K floats (4K bytes) isn't a lot of memory, but with a few of those around, the L1 data cache starts to get thrashed for no good reason.
Keeping the number of registers to a minimum is therefore actually critical for performance.
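
A rough back-of-the-envelope version of that argument, as a C++ sketch: the helper names are made up, and the 32 KB L1 data cache used in the note below is an assumed, typical figure rather than a measurement of any particular CPU.

```cpp
#include <cstddef>

// Bytes used by one varying register that stores `components` floats
// (1 for a scalar, 3 for a vector) for each of `samples` shading samples.
constexpr std::size_t varyingRegisterBytes(std::size_t samples,
                                           std::size_t components)
{
    return samples * components * sizeof(float);
}

// How many such registers fit entirely in a cache of `cacheBytes` bytes.
constexpr std::size_t registersThatFit(std::size_t cacheBytes,
                                       std::size_t samples,
                                       std::size_t components)
{
    return cacheBytes / varyingRegisterBytes(samples, components);
}
```

With 1000 samples and 4-byte floats, a scalar register takes 4000 bytes and a vector register 12000 bytes. Under the assumed 32 KB L1, that leaves room for only about 8 scalar or 2 vector registers before anything else is counted.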

Next, however, I really have to get this micro-polygon sampling going. The term micro-polygon is where the hype is. It's kind of like saying "ray-tracing". Are you doing Ray Tracing or are you doing Micro Polygons ?  ...emmm.
In this case, really, it's the REYES pipeline that I'm trying out, shading system included.
It seems to me that those are much more complex and broad topics, while the micro-polygon business is really just about the final steps, where shaded samples are discretized into a grid of pixels.

..still, it will be nice to finally get some anti-aliasing ! ;)
And that should also open the door to motion blur and depth of field (whichever is easier to implement first).

After that, I plan on implementing texture mapping.. wooo !

A running joke has it that I'm travelling through time with my "advances". Apparently I'm still stuck in the 80s !!  ..but as far as I'm concerned, I'm in the 80s in a parallel universe.. not the one about compromising quality for speed (real-time universe) but that about compromising time for quality (the production rendering universe).

..this reminds me of the General Relativity book I've been reading on my Kindle.. ahh !! (BTW, the Kindle 2/International now supports PDF.. too bad I have no time to read gfx papers (^^;)).



  1. what is a .vvv ??? how many bits are in the registers?

  2. I use V for vector, S for scalar and X for string.

    The register names specify more:

    $s1 is a scalar varying, so it's a float array.
    $su1 is a scalar uniform (the U specifies that), so it's just a float.

    The size of the array for the varying registers depends on the size of the grid. A primitive is split into smaller bits and then diced into a grid.
    The maximum grid size that I'm allowing is 2304 samples (comes out of an empirically chosen 48 * 48 samples).

  3. so sub.vvv means:

    'subtract vector from vector and store the result in another vector'?

  4. Looks like you're still developing on XP ... ? What's with the Miami Vice and other photo? 8P

  5. I actually read part of Michael Abrash's article on rasterization on the Larrabee. Seems like a cool technology, but will it actually be useful? The whole advantage over the GPU comes from programmability, but then it is going to suffer the same fate that the Cell processor has suffered, where people label it too difficult to develop for.

    But if you don't develop it yourself and use some 3rd party library, then how is it different from using the GPU?

  6. Ragin,

    Why am I developing on XP ? I'm actually using Vista 64 both at work and at home 8)
    But that doesn't really matter.. because I'm not using Direct3D (I am using OpenGL, but only for visual debugging purposes).
    The Miami Vice pics is because I mentioned the 80s (and with all that pinkness it looks extra gay !)
    The other picture is an ice cream place in some food market in Shibuya. Colorful bins -> register coloring ? 8)


    Well.. I won't be saying whether I have any special knowledge about any upcoming chips (nowadays I know less and less). But RibRender can indeed compile using the openly available simulated Larrabee SIMD instructions.
    That was very easy.. because I assumed that the same C++ code would run on the Larrabee as simply as it runs on the CPU.

    In reality, I suspect that eventually there is going to be some sort of memory transfer to deal with. So, there is going to be a CPU side and a Larrabee side.. unless of course Intel is planning to run Windows on their own system.. but that sounds a bit extreme.

    In general, I think that the winning point of Intel's solution is that the Larrabee is basically a bunch of x86 cores.. with a fairly automated caching system.. no special voodoo about nuggets of memory per-chip to worry about.
    ..I think (?) Because in fact I generally don't really care about those details.
    As far as I'm concerned, I'll be happy if OpenMP runs on it. The SIMD part is rather easy for rendering.. unless one is doing ray tracing (I guess).

  7. Kaz, I guess we will find out when people start developing for it.

    The issue with the cell is that you do have to worry about the details a lot. And that requires large teams with a lot of engineering time devoted to development. I personally have nothing against that approach. In fact I am all for it.

    But in reality I think most projects don't want to employ a team of 3 Michael Abrashes dedicated full-time to getting the most out of the hardware.

    An automated caching system still doesn't spare you from having to access your data in a cache-friendly way. I would imagine the caching scheme has to be extremely complex to service so many CPUs, so the penalty for a cache miss must be pretty high.

    Anyway, just reading Abrash's article I see some similarity between the Cell and the Larrabee as far as development is concerned.

  8. Mr. Paul,

    What are the similarities ? Are you talking about vector instructions or something else ?

    About hardware being more or less "hardwarish", I've seen that people are divided in two types: those that think that the pain of programming the Cell is for a good cause and those that prefer to avoid it.

    I'm not sure where I'd place myself.. I tend to prefer simpler low level stuff (of course 8).. but most of all I like good performance and I appreciate having a clearer idea of how efficient my code is going to be.

    Maybe it's better if the code doesn't run at all if you can't fit everything in the cache, rather than having the code run anyway, but N times slower 8)
    ..maybe !


  9. Kaz, I guess vector instructions look similar on any type of hardware. In fact people who implement software rasterizers on the cell use the same algorithm that Abrash is talking about.

    What I am concerned about is the development approach. The main advantage of the Larrabee over the GPU is programmability. So does that mean everybody needs to implement their own rasterizer? If that's the case then it looks like you will need a team of dedicated programmers constantly working on the rasterizer to get good results. And that was the main reason why the Cell was dubbed difficult to work on.

    If you don't implement your own rasterizer and use some 3rd party fixed pipeline instead, then why not just implement that pipeline in hardware? In which case we come full circle back to the GPU concept.

    If the flexible rasterizer can be easily implemented then Larrabee is a success, but from reading that article it seems pretty difficult.

  10. ummmummm... well, no respectable 3D accelerator would get away without D3D and OpenGL drivers.. so that should be there for those that don't want to write their own rasterizers.

    I certainly could use some added programmability over D3D..
    Writing yet another rasterizer from scratch seems a bit pointless unless one has something special to add to it... though it's always advantageous to have the low level code to optimize the interface to the engine.

    Also, one of the biggest problems would be the shaders.
    Implementing those would mean also dealing directly with eventual texture samplers..
    I don't see too many people writing their own shader compilers..

    Perhaps something could be done by having a system that transparently uses a C/C++ compiler behind the scenes to build a DLL that can be used as a shader.. or something based on LLVM, TinyC.. to simplify the job.
    The main thing is really not to go back to having to hardwire shaders in the engine code. It's important to keep shader writing accessible to the non-engine programmers.

    It's hard to predict what's really going to happen. If more programmability picks up, then perhaps it will just be more work for the 3rd party companies.
    Things have changed a lot, and many more game companies are currently relying on external engines for high level stuff.. so, using low level libs is only going to seem natural.

  11. BTW I just posted a page that describes the shader assembly I've been working on.

    No big revelations, just some basic documentation 8)

  12. hhmm i think we mentioned it before, but there should be a 'kaz' in there somewhere, maybe __kazMain ?

  13. Maybe I should do a search and replace from Rib to Kaz ;)