Intro
This article deals with fast stereo rendering. Stereo rendering is one of the basic building blocks of VR rendering, so it's worth spending some time trying to get it implemented right. One of the things novices mistakenly assume is that doing stereo is simply rendering the shadows and other eye-independent effects once and then rendering the rest of the scene twice. Or in other words, some variation of calling the render function in their game loop twice with different 'cameras'. This is a natural thing to do, but as we are going to see in this article, this is not how you do performant stereo rendering.
Basics
So, the idea here is to get efficient stereo rendering, or in other words, get the engine to render stereo in less than twice the time it takes to render mono. As mentioned above, the most basic and naive implementation of stereo rendering would be something like this:
DoProcessEye(); // Do frustum culling, compile draw command/indirect buffers
DoRenderEyeCommon(); // Render eye-shared shadow maps, and distant geometry with no parallax
for( int i=0; i < 2; i++ ) // stereo
{
DoRenderEye( i ); // Render regular geometry and postpro effects with parallax
}
or written in more detail:
DoProcessEye();
DoRenderEyeCommon();
for( int i=0; i < 2; i++ )
{
BindRenderTarget( i );
SetViewPort( i );
SetEyeUniforms( i );
for( int j=0; j < numObjects; j++ )
{
BindState( j );
DrawObject( j );
}
}
layout (row_major, binding=1) uniform EyeUniforms
{
mat4x4 mMatrix;
mat4x4 mMatrixInverse;
};
The central idea behind rendering stereo efficiently is to reverse the order of the inner and outer loops above, and transform the code into the following:
DoProcessEye();
DoRenderEyeCommon();
BindRenderTarget();
SetViewPorts();
SetEyeUniforms();
for( int j=0; j < numObjects; j++ )
{
BindState( j );
for( int i=0; i < 2; i++ )
{
DrawObject( i, j );
}
}
struct Eye
{
mat4x4 mMatrix;
mat4x4 mMatrixInverse;
};
layout (row_major, binding=1) uniform EyeUniforms
{
Eye mEye[2];
mat4x4 mMatrix;
mat4x4 mMatrixInverse;
};
Note that GLSL does not allow struct definitions inside an interface block, so the Eye struct has to be declared outside the uniform block and then used by name inside it.
The other thing to note is that the eye information uniform now contains both the left and right eye transformation matrices. This, together with the fact that we uploaded both eyes' viewport descriptions (check glViewportIndexed to see how to achieve this in OpenGL), will allow us to project the geometry correctly as seen from the left or right eye and route it to the correct viewport in the big left+right compound render target. The per-eye transformations are in the mEye[2] array, and the global viewer matrices come right after those two. As explained in the basic VR rendering article, use the former for projection/rasterization purposes and the latter for culling, level of detail, etc. Obviously, for this to work all you need to do is index into the right mEye member based on the left/right index i within your shaders.
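As a minimal sketch, indexing the per-eye matrices inside a vertex shader could look like the snippet below. The names uEyeIndex and aPosition are made up for illustration, and how the eye index actually reaches the shader depends on which of the routing strategies discussed next you pick; here it is just a plain uniform set per draw:

```glsl
#version 430

struct Eye
{
    mat4x4 mMatrix;
    mat4x4 mMatrixInverse;
};

layout (row_major, binding = 1) uniform EyeUniforms
{
    Eye    mEye[2];
    mat4x4 mMatrix;          // global viewer matrices, for culling/LOD
    mat4x4 mMatrixInverse;
};

layout (location = 0) in vec3 aPosition;

uniform int uEyeIndex;       // hypothetical: 0 = left, 1 = right, set per draw

void main()
{
    // pick the projection for the eye being rendered
    gl_Position = mEye[ uEyeIndex ].mMatrix * vec4( aPosition, 1.0 );
}
```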
There are a few ways to do this. From least to most efficient, some of the options are:
- Call DrawIndexed() twice, use a uniform to do projection+viewport routing (in the Vertex Shader)
- Call DrawIndexedInstanced() once with instancing of 2 and use gl_InstanceID to do projection+viewport routing (in the Vertex Shader)
- Call DrawIndexed() once with a Geometry Shader doing 2 invocations, and use the invocation id to do projection+viewport routing
- The first option is the least efficient of the three. While we benefit from all the savings in state changes, we are still processing all the vertices of each mesh twice and issuing two render calls per object.
- With option number two, we reduce the render calls to one per object. Routing is as simple as gl_ViewportIndex = gl_InstanceID (note that writing gl_ViewportIndex from a vertex shader requires an extension such as GL_ARB_shader_viewport_layer_array). However, we are still running the vertex shader twice per object (meaning, we are reading from the vertex buffer memory twice and running the primitive assembly twice).
- Option number three also performs only one render call per object and runs the vertex shader only once. The duplication and routing happen in a Geometry Shader by specifying the number of invocations to be 2. These two invocations happen in parallel, as far as I know. In OpenGL, you can achieve this by using the qualifier "layout(invocations = 2) in;". Then, viewport routing is done with gl_ViewportIndex = gl_InvocationID.
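A sketch of the geometry shader for option number three might look like this. The varying vPositionWS (world-space position coming from the vertex shader) is an assumed name, and the uniform block mirrors the one shown above:

```glsl
#version 430

struct Eye
{
    mat4x4 mMatrix;
    mat4x4 mMatrixInverse;
};

layout (row_major, binding = 1) uniform EyeUniforms
{
    Eye    mEye[2];
    mat4x4 mMatrix;
    mat4x4 mMatrixInverse;
};

// run the whole shader twice per input primitive, once per eye
layout (triangles, invocations = 2) in;
layout (triangle_strip, max_vertices = 3) out;

in vec3 vPositionWS[];   // hypothetical: world-space position from the VS

void main()
{
    for( int v = 0; v < 3; v++ )
    {
        // project with the matrix of the eye this invocation belongs to
        gl_Position      = mEye[ gl_InvocationID ].mMatrix * vec4( vPositionWS[v], 1.0 );
        gl_ViewportIndex = gl_InvocationID;   // route to the left/right viewport
        EmitVertex();
    }
    EndPrimitive();
}
```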
So, after all this re-architecting, the final code looks like this:
DoProcessEye();
DoRenderEyeCommon();
BindRenderTarget();
SetViewPorts();
SetEyeUniforms();
for( int j=0; j < numObjects; j++ )
{
BindState( j );
DrawObject( j ); // either DrawIndexed() + GS
// or DrawIndexedInstanced() + VS
}
struct Eye
{
mat4x4 mMatrix;
mat4x4 mMatrixInverse;
};
layout (row_major, binding=1) uniform EyeUniforms
{
Eye mEye[2];
mat4x4 mMatrix;
mat4x4 mMatrixInverse;
};
Conclusions
With a little bit of work it is possible to remove the state-change, render-call and vertex-processing bottlenecks when doing stereo rendering, leaving only the pixel-shading and framebuffer-bandwidth bottlenecks. Optimizing those is a topic for another article.