Deferred Material Rendering Systems
Details of how I created a flexible and efficient deferred material rendering system for my 3D game engine.
Introduction
This is the third post I am making about building my custom games engine, which means three weeks have passed since the first one! Time really does fly, doesn't it? For anyone who has not read the prior two blogs: I am a games programmer with almost 9 years of C++ experience, who is undertaking the task of creating an entirely custom games engine. The goal of these blog posts is to solidify my understanding of the work I am doing, and to share the lessons and progression with a wider audience who may find them useful and entertaining. Links to my previous posts can be found here.
So, deferred rendering?
Forward Rendering
Before I can fully explain what deferred rendering is, I need to explain forward rendering. Forward rendering is what most people would imagine when thinking about drawing something to the screen. You have your geometry, textures, and shaders, and you use them together to get a final colour, which is then output to a texture that is shown on the screen. Every light is run through for each model rendered, and the pixel colour is calculated regardless of whether that pixel is later going to be overridden by a fragment that is closer to the camera. This system works and is fairly straightforward to implement, but it has some major drawbacks, the biggest of which is known as overdraw. Overdraw is where multiple colours are calculated for the same pixel and one replaces another, making the work spent on the replaced one pointless. Or in other words, colouring the same pixel multiple times. This is unavoidable in certain areas, such as when rendering transparent objects, but when rendering opaque models it should be avoided as much as possible as it is just wasted work.
This is where deferred rendering steps in. With this approach, all lighting/shading of surfaces is 'deferred' to a later stage, splitting the work into two parts: the G-buffer generation stage and the lighting stage. This allows very basic shaders to be run through each model first, to determine the closest fragment from the camera's perspective. Overdraw still happens during this stage, but it is much less of an issue because the results being overridden were cheap to calculate. Then, now that we have the data (colour, normal, metallic-roughness, and so on) for the closest pixels to the camera, the lights are introduced. At this stage we can be fully confident that there will be no overdraw, at least until we hit the transparent models. What this looks like in practice is shown in the next section.
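To make the difference in cost concrete, here is a tiny self-contained C++ sketch that just counts lighting evaluations under both approaches. The scene numbers are made up purely for illustration, not measurements from my engine:

#include <cstdio>

int main()
{
    // Hypothetical numbers, purely for illustration
    const long long pixelCount      = 1920LL * 1080LL;
    const long long averageOverdraw = 4;  // opaque surfaces rasterised per pixel
    const long long lightCount      = 16;

    // Forward: every rasterised fragment runs the full lighting loop,
    // including fragments that a closer surface later replaces
    const long long forwardEvaluations = pixelCount * averageOverdraw * lightCount;

    // Deferred: the G-buffer pass writes cheap attributes per fragment,
    // then the expensive lighting runs once per visible pixel
    const long long deferredEvaluations = pixelCount * lightCount;

    std::printf("Forward lighting evaluations:  %lld\n", forwardEvaluations);
    std::printf("Deferred lighting evaluations: %lld\n", deferredEvaluations);
    return 0;
}

The deferred path is not free, as it pays for writing and then reading back the G-buffer, but the expensive lighting maths only ever runs on fragments that survive the depth test.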
Engine Implementation
Below is the screenshot that I am going to be breaking down:
It contains two different spheres on pillars, a golden torus shape, some cubes, and a skybox. Each is using different materials, textures, and shaders. Here is the ordering of the relevant frame flow, with a rough code sketch of it after the list:
Cull models to the camera’s view.
Use simple shaders to create deferred buffers.
Go through each unique material to be rendered using the gathered buffers.
- Using the stencil buffer, render the complex lighting for each pixel of this material type.
Render transparent models.
Combine order independent transparency buffers into the colour buffers.
HDR (future work).
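Purely as an outline, that flow could be written down like this. Every type and function name here is a hypothetical stand-in rather than the engine's real API:

#include <cstdint>
#include <vector>

using MaterialID = std::uint32_t;

struct FrameContext { std::vector<MaterialID> visibleMaterialIDs; };

// Stub stages standing in for the real engine systems
void CullModelsToView(FrameContext&) {}
void RenderGBuffer(FrameContext&) {}                      // cheap shaders fill the deferred buffers
void RenderMaterialLighting(FrameContext&, MaterialID) {} // stencil-masked full-screen lighting
void RenderTransparents(FrameContext&) {}
void ResolveTransparencyBuffers(FrameContext&) {}

void RenderFrame(FrameContext& frame)
{
    CullModelsToView(frame);
    RenderGBuffer(frame);
    for (MaterialID id : frame.visibleMaterialIDs)
        RenderMaterialLighting(frame, id);
    RenderTransparents(frame);
    ResolveTransparencyBuffers(frame);
    // HDR will slot in here as future work
}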
For this post only the second and third points (including the sub-point) are relevant. What that looks like is first creating the needed buffer stores for:
- Colour/diffuse:
- Normals:
- Metallic-roughness:
- Texture coordinates (needed if the lighting calculations have to map a texture onto the model after the fact):
- World position (view position could alternatively be used, depending on how you do your lighting calculations):
- Depth (adjusted to a range where the objects are visible):
- Stencil (also adjusted)
An emissive colour buffer is also output, but as the scene has no emissive elements I have omitted it here. A screen space ambient occlusion pass is run after these buffers have been created:
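For anyone wondering how buffers like these might be backed in Vulkan, here is a minimal sketch of one plausible set of attachment formats. These are illustrative choices rather than a statement of exactly what my engine allocates:

#include <vulkan/vulkan.h>

// One plausible set of G-buffer attachment formats; the exact choices come down
// to precision versus bandwidth trade-offs
struct GBufferFormats
{
    VkFormat colour            = VK_FORMAT_R8G8B8A8_UNORM;      // diffuse/albedo
    VkFormat normals           = VK_FORMAT_R16G16B16A16_SFLOAT; // world-space normals
    VkFormat metallicRoughness = VK_FORMAT_R8G8_UNORM;          // metallic and roughness packed together
    VkFormat texCoords         = VK_FORMAT_R16G16_SFLOAT;       // UVs for after-the-fact texture sampling
    VkFormat worldPosition     = VK_FORMAT_R32G32B32A32_SFLOAT; // could instead be reconstructed from depth
    VkFormat emissive          = VK_FORMAT_R16G16B16A16_SFLOAT; // emissive colour
    VkFormat depthStencil      = VK_FORMAT_D24_UNORM_S8_UINT;   // depth plus 8 stencil bits for material IDs
};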
Now that these buffers have been generated, they can be combined together with the relevant shaders and lighting information to get the shaded colour for each pixel. But how do we make sure that only the pixels for the current material are being shaded? For example, the spheres need to be drawn differently to the cubes, even though both are described by the same set of buffers. This is where the stencil buffer I mentioned earlier comes into play.
Every time one of the basic shader versions used for the G-buffer pass is run, the material ID is written out to the stencil buffer. This is visible in the stencil buffer above, where the spheres and torus share the same value (the one for PBR rendering) and the cubes have a different ID (the one for colour passthrough). This lets the program decide, for each full-screen render, which pixels should be shaded and which skipped, and then cycle through every ID on screen. In terms of render commands, the pass takes roughly this shape:
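This is a reconstruction for illustration rather than a capture from the engine; the pipeline handles and material ID variables are hypothetical placeholders, and it assumes the stencil reference is set as dynamic state:

// One stencil-masked full-screen draw per material ID present in the stencil buffer
vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pbrLightingPipeline);
vkCmdSetStencilReference(commandBuffer, VK_STENCIL_FACE_FRONT_AND_BACK, pbrMaterialID);
vkCmdDrawIndexed(commandBuffer, 6, 1, 0, 0, 0); // full-screen quad, only PBR pixels pass the stencil test

vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, colourPassthroughPipeline);
vkCmdSetStencilReference(commandBuffer, VK_STENCIL_FACE_FRONT_AND_BACK, passthroughMaterialID);
vkCmdDrawIndexed(commandBuffer, 6, 1, 0, 0, 0); // full-screen quad, only passthrough pixels pass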
Each vkCmdDrawIndexed call there is a full-screen render. The stencil state for each of these renders compares the stored value against a reference that is set to each material ID in turn; a representative configuration is sketched below:
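The only assumption beyond what is described above is that the reference value is supplied per draw rather than baked into each pipeline:

// Only shade pixels whose stencil value matches the current material ID,
// and never modify the stencil buffer during the lighting pass
VkStencilOpState lightingStencilState{};
lightingStencilState.failOp      = VK_STENCIL_OP_KEEP;
lightingStencilState.passOp      = VK_STENCIL_OP_KEEP;
lightingStencilState.depthFailOp = VK_STENCIL_OP_KEEP;
lightingStencilState.compareOp   = VK_COMPARE_OP_EQUAL;
lightingStencilState.compareMask = 0xFF;
lightingStencilState.writeMask   = 0x00; // the lighting pass does not write stencil
lightingStencilState.reference   = 0;    // overridden per draw via vkCmdSetStencilReference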
How this actually omits pixels through the stencil test is fairly involved, but as a quick rundown: the test takes a comparison operation, such as equals or less than, along with a compare mask and a reference value, and uses them to determine whether each pixel passes or fails. A stencil operation is then chosen based on whether the stencil test (and the depth test) passed or failed, and that operation determines what value, if any, gets written back to the buffer. For a more in-depth look you can read this link, which is to the stencil buffer breakdown in the Vulkan spec: https://registry.khronos.org/vulkan/specs/latest/html/vkspec.html#fragops-stencil. I intend to do a full blog post specifically on the stencil buffer at some point, as I feel that it gets talked about a lot less than it should. If one exists when you are reading this then it will be linked here.
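Boiled down, for the equals case used here, the test each pixel runs is just this (a paraphrase of the spec section linked above):

#include <cstdint>

// Stencil test for VK_COMPARE_OP_EQUAL: both sides are masked, then compared
bool StencilTestPasses(std::uint32_t reference, std::uint32_t storedStencilValue, std::uint32_t compareMask)
{
    return (reference & compareMask) == (storedStencilValue & compareMask);
}

With early fragment tests this lets the GPU skip the lighting shader entirely for pixels whose stored material ID does not match the current reference value.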
To remove pointless full-screen draws for materials that are not on the screen - as all of their pixels would fail the stencil test - the engine keeps track of which IDs have been drawn on the CPU side using a very basic container:
struct MaterialStencilFlagContainer final
{
    void ResetFlagBindings();
    void SetMaterialBeingUsed(Generated::Materials_Fragment type);

    // For being set each frame to determine which materials are actually being shown on the screen
    std::vector<Generated::Materials_Fragment> mStencilFlagBindings;
};
Where each function is this:
void MaterialStencilFlagContainer::ResetFlagBindings()
{
    mStencilFlagBindings.clear();
}

void MaterialStencilFlagContainer::SetMaterialBeingUsed(Generated::Materials_Fragment type)
{
    // Only store each material ID once per frame
    for (Generated::Materials_Fragment check : mStencilFlagBindings)
    {
        if (check == type)
            return;
    }

    mStencilFlagBindings.push_back(type);
}
And in the opaque render loop this is present:
// Set the lighting flag value used in the stencil buffer for this material
stencilFlagContainer.SetMaterialBeingUsed(materialID);
And then the full-screen renders reference this store, only running through the Generated::Materials_Fragment IDs that are actually on screen.
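Pulling the pieces together, the consuming loop could look something like this sketch, where GetLightingPipeline is a hypothetical helper and the stencil reference is again assumed to be dynamic state:

// Record one stencil-masked full-screen draw per material ID seen this frame
for (Generated::Materials_Fragment materialID : stencilFlagContainer.mStencilFlagBindings)
{
    vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, GetLightingPipeline(materialID));
    vkCmdSetStencilReference(commandBuffer, VK_STENCIL_FACE_FRONT_AND_BACK, static_cast<uint32_t>(materialID));
    vkCmdDrawIndexed(commandBuffer, 6, 1, 0, 0, 0);
}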
Conclusion
To conclude, this blog has covered what forward rendering is, and then explained deferred rendering and why you would want to render your scene using that approach. It then broke down an example scene render from my engine's current state and showed the internal buffers that are generated. To finish, I covered an optimization that reduces unneeded draw calls during the full-screen pass stage.
Thank you for reading :D
Next week's post is going to be on how I have written my own custom JSON loader.
If you enjoyed this post you can subscribe to the mailing list to be notified whenever I post a new blog. Or you could read one of the other posts I have done - one of which is on how a frame is generated within my engine, and another on order independent transparency.