As some may have noticed, this blog also shows the RSS feed of the RibTools project. Commit comments say little, but they are certainly more frequent than blog posts.
Twitter, to me, also feels like version control commit comments, and I sort of use it that way. I also like the fact that, unlike instant messengers, it doesn't have to be instantaneous, so it's not as distracting (though it's public by default 8)
Anyway, as of commit #123, RibRender supports Larrabee native instructions using the public native instructions prototype include file.
I use the LRB native instructions (LRBni) very much like SSE instructions. In fact, they are often interchangeable!
The key to a simple port was first writing a VecN class template that defines a statically-sized vector.
When compiling for SSE, VecN has a template specialization for size 4 (4 floats) and when compiling for LRBni, VecN has a specialization for 16 floats.
The math library also defines a VecSIMDf to be either 4-floats or 16-floats depending on what is "native".
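To make the idea concrete, here is a minimal sketch of how such a template and its SSE specialization might look. It is not the actual RibTools code: the member names, the `SIMD_WIDTH` constant and the `__SSE__` guard are assumptions made for illustration, and the 16-float LRBni specialization is left out because it depends on the prototype header.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>   // SSE intrinsics (__m128, _mm_add_ps, ...)
#endif

// Generic version: N floats processed one lane at a time in plain C++.
template <std::size_t N>
struct VecN {
    float v[N];

    VecN() : v() {}                       // all lanes zero
    explicit VecN(float s) { for (std::size_t i = 0; i < N; ++i) v[i] = s; }

    float operator[](std::size_t i) const { assert(i < N); return v[i]; }

    VecN operator+(const VecN &r) const {
        VecN out;
        for (std::size_t i = 0; i < N; ++i) out.v[i] = v[i] + r.v[i];
        return out;
    }
    VecN operator*(const VecN &r) const {
        VecN out;
        for (std::size_t i = 0; i < N; ++i) out.v[i] = v[i] * r.v[i];
        return out;
    }
};

#if defined(__SSE__)
// Specialization for 4 floats: one 128-bit SSE register, one instruction per op.
template <>
struct VecN<4> {
    __m128 v;

    VecN() : v(_mm_setzero_ps()) {}
    explicit VecN(float s) : v(_mm_set1_ps(s)) {}   // broadcast s to all lanes
    explicit VecN(__m128 r) : v(r) {}

    float operator[](std::size_t i) const {
        assert(i < 4);
        float tmp[4];
        _mm_storeu_ps(tmp, v);            // spill the register to read one lane
        return tmp[i];
    }
    VecN operator+(const VecN &r) const { return VecN(_mm_add_ps(v, r.v)); }
    VecN operator*(const VecN &r) const { return VecN(_mm_mul_ps(v, r.v)); }
};
const std::size_t SIMD_WIDTH = 4;         // hypothetical name for the "native" width
#else
const std::size_t SIMD_WIDTH = 1;         // no SSE available: fall back to the generic path
#endif

// The "native" SIMD float: 4 lanes with SSE here, 16 lanes in an LRBni build.
typedef VecN<SIMD_WIDTH> VecSIMDf;
```

Code written against `VecSIMDf` and its operators never mentions the lane count, which is what would let the same source build for either width.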
The renderer/shader portion of the code then will simply use VecSIMDf, and process data in chunks of appropriate size.
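Processing in chunks could look roughly like the loop below. This is only a sketch: `SIMD_LANES`, `scaleBias` and the scalar tail loop are hypothetical, and the inner lane loop stands in for what would be a single `VecSIMDf` operation in a real SIMD build. I don't show how the renderer itself deals with counts that aren't a multiple of the chunk size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical chunk size: 4 for an SSE build, 16 for an LRBni build.
const std::size_t SIMD_LANES = 4;

// Computes out[i] = in[i] * scale + bias for n values, one chunk at a time.
void scaleBias(const float *in, float *out, std::size_t n,
               float scale, float bias)
{
    assert(in != 0 && out != 0);

    std::size_t i = 0;

    // Full chunks: each iteration handles SIMD_LANES values together.
    for (; i + SIMD_LANES <= n; i += SIMD_LANES) {
        // With a real VecSIMDf this inner loop would be one multiply and one add.
        for (std::size_t lane = 0; lane < SIMD_LANES; ++lane)
            out[i + lane] = in[i + lane] * scale + bias;
    }

    // Leftover values that do not fill a whole chunk.
    for (; i < n; ++i)
        out[i] = in[i] * scale + bias;
}
```

Only the constant changes between targets; the loop structure stays the same.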
A VecSIMDf is actually seen as a scalar, and a Vec3xSIMDf is what one would consider a 3D vector.
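Put differently, the layout is "structure of arrays": a 3D vector holds one SIMD value per component, so a single operation works on several points at once. Here is a small sketch of that idea, using a plain 4-lane stand-in for `VecSIMDf`. The stand-in, the `LANES` constant and the `dot` helper are assumptions for illustration, not the project's actual definitions.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t LANES = 4;   // hypothetical lane count (16 in an LRBni build)

// Plain C++ stand-in for the native SIMD float: one "scalar" per lane.
struct VecSIMDf {
    float v[LANES];
    explicit VecSIMDf(float s = 0.0f) { for (std::size_t i = 0; i < LANES; ++i) v[i] = s; }
    float operator[](std::size_t i) const { assert(i < LANES); return v[i]; }
};

inline VecSIMDf operator+(const VecSIMDf &a, const VecSIMDf &b) {
    VecSIMDf out;
    for (std::size_t i = 0; i < LANES; ++i) out.v[i] = a.v[i] + b.v[i];
    return out;
}

inline VecSIMDf operator*(const VecSIMDf &a, const VecSIMDf &b) {
    VecSIMDf out;
    for (std::size_t i = 0; i < LANES; ++i) out.v[i] = a.v[i] * b.v[i];
    return out;
}

// A "3D vector" is really LANES 3D vectors: all x's, all y's, all z's.
struct Vec3xSIMDf {
    VecSIMDf x, y, z;
    Vec3xSIMDf(const VecSIMDf &x_, const VecSIMDf &y_, const VecSIMDf &z_)
        : x(x_), y(y_), z(z_) {}
};

// Dot product of LANES vector pairs at once; the result is a SIMD "scalar".
inline VecSIMDf dot(const Vec3xSIMDf &a, const Vec3xSIMDf &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

With this layout, something like an N·L term in a shader is computed for every point in the chunk with the same three multiplies and two adds.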
In fact, the shader has its types defined as such:
Right now I can just change a define and decide to compile either plain C++ (though the compiler may add SSE instructions of its own), SSE or LRBni.
I did a few tests and, unsurprisingly, SSE is the fastest by far. The LRBni builds are the slowest, since the 512-bit registers are only simulated in software.
The test is rendering a teapot at full screen (1600x1200 minus my right side task bar), and here are the results.
| Test Type | Seconds | VecSIMDf Format |
|---|---|---|
| NO-SSE 1 | 8.51 | plain float v |
| NO-SSE 4 | 7.735 | plain float v |
| NO-SSE 16 | 11.845 | plain float v |
| SSE | 1.355 | 128 bit register |
| LRBni-SSE | 21.705 | 512 bit reg (simulated with SSE) |
| LRBni-C code | 25.045 | 512 bit reg (simulated with plain C++) |
Hmm... there are still more places that can be optimized, but for now I should switch to working on light shaders.