Introduction
Hi, and welcome to the first blog in a series I am going to be running about developing different game engine features. These blogs will outline the work I have completed on certain areas of my custom engine, which I have been working on for the past couple of years. The engine has come a long way in certain areas, specifically graphics rendering, but is still underdeveloped in others, such as audio playback. It is now approaching the state where I have a large amount of confidence in it, and I feel that sharing the journey of adding features would be beneficial, or at the very least entertaining, to anyone interested in game engine development.
But first I should outline who I am. I'm Brandon, and I have been using C++ for the past 8+ years. I have always found games which push their graphics to a new level to be really impressive, with the Dark Souls and TrackMania series specifically coming to mind. It's games like these, along with older, more classic games, such as Doom and the 3D Mario series, that captured and kept my focus on games, and that eventually led me into the world of graphics and engine development.
A ‘Frame’?
The aim of this blog is to outline what a ‘frame’ is, in a game/render engine context, and how I have gone about creating one within my own custom game engine. So, firstly, what is a frame?
A frame, simply put, is a single image that the game sends to the screen to be presented to the player. This is much like how a film (or movie) uses many different images to give the illusion of movement, with one key difference: a film simply displays the next image in a pre-made sequence, whereas a game has to determine what is in the frame, and render it, in real-time.
So that's what a frame is, but how are frames made and shown on the screen? Well, anyone familiar with games may have heard of a ‘game loop’, or an ‘update loop’. This is the continuous updating of everything (relevant) in the game, and it is what facilitates objects moving around the world. The update works through a loop that looks like the code snippet below:
bool playing = true;
while (playing)
{
    // Update the game, where deltaTime is the amount of
    // time that has passed since the last frame
    Update(deltaTime);

    // Keep looping until the player has quit
    playing = !CheckHasQuit();
}
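As a quick aside, ‘deltaTime’ itself typically comes from sampling a clock at the top of each loop iteration. Below is a minimal sketch of that using std::chrono, reusing the placeholder functions from the snippet above; my engine's timing code differs, but the idea is the same:

#include <chrono>

using Clock = std::chrono::steady_clock;

bool playing = true;
auto previousTime = Clock::now();
while (playing)
{
    // Measure how much time has passed since the last iteration, in seconds
    const auto currentTime = Clock::now();
    const float deltaTime = std::chrono::duration<float>(currentTime - previousTime).count();
    previousTime = currentTime;

    Update(deltaTime);
    playing = !CheckHasQuit();
}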
This code snippet shows a loop that continues until the player has quit the game, at which point ‘playing’ becomes false and the loop ends. It would be very easy to insert a function call within the game loop to render our frame to the screen. The following code snippet gives an idea of how this would look:
bool playing = true;
while (playing)
{
    Update(deltaTime);
    Render();

    playing = !CheckHasQuit();
}
Now the game loop updates the objects and then renders them to the screen before wrapping around for the next frame. This would work, but it couples the updating logic to the drawing logic. That can be fine for very simple games, or games with very simple visuals, however both the Update() and Render() logic of a frame will likely take a while to calculate - a while in this context meaning a couple of milliseconds. Bear in mind that for a program to run at 60 frames per second, each loop needs to take less than roughly 16.7ms. That sounds like a lot, but 60 FPS is becoming a less desirable target than, say, 144 FPS or 240 FPS (due to higher refresh rate monitors becoming more popular than in prior decades). At 144 FPS each update has only 6.9ms, and at 240 FPS only 4.1ms. When the update and the render logic are combined, those targets start to become difficult to hit.
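All of those budgets come from the same bit of arithmetic - one second divided by the target framerate:

// Per-frame time budget in milliseconds for a given target framerate
constexpr double FrameBudgetMs(double targetFPS)
{
    return 1000.0 / targetFPS;
}

// FrameBudgetMs(60.0)  ≈ 16.67ms
// FrameBudgetMs(144.0) ≈ 6.94ms
// FrameBudgetMs(240.0) ≈ 4.17ms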
The ideal solution to this is to split the work across two threads: one for updating and one for rendering. For example, take a look at the snippet below, which introduces the ‘render thread’:
// ---------- Main thread ---------- //

bool playing = true;
while (playing)
{
    Update(deltaTime);
    playing = !CheckHasQuit();
}

// ---------- Render thread ---------- //

while (playing)
{
    RenderFrame();
    PresentFrameToScreen();
}
How the threads are created depends on what platform you are writing this for, and how you get the ‘playing’ bool from the main thread to the render thread is up to you - they could even be thread specific, but handling that becomes more complex. Now each thread has its own 4.1ms to use before either the framerate or the tickrate of the game drops below 240Hz, instead of the two sharing that budget.
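For illustration, here is one way the two loops could be wired together using std::thread and std::atomic, again reusing the placeholder functions from the snippets above. This is a simplified sketch rather than my engine's actual setup code:

#include <atomic>
#include <thread>

std::atomic<bool> playing{ true };

void RenderThreadMain()
{
    while (playing.load())
    {
        RenderFrame();
        PresentFrameToScreen();
    }
}

int main()
{
    // Kick off the render loop on its own thread
    std::thread renderThread(RenderThreadMain);

    // The main thread runs the update loop
    while (playing.load())
    {
        Update(deltaTime);
        playing.store(!CheckHasQuit());
    }

    // Wait for the render thread to finish before shutting down
    renderThread.join();
    return 0;
}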
Current Contents of a Single Frame
Now that the idea behind the continuous updating of a game has been established: what actually happens in the ‘RenderFrame()’ function?
The answer to that is purely game, platform, and API dependent. A game written with Vulkan for Windows will have a render function that looks very different to one written with DirectX for Xbox, and that's assuming the exact same rendering functionality is being implemented. Many games also have different requirements. For example, some games make heavy use of fog and need the logic for it, while other games have no fog whatsoever, and have no need to handle it.
To be specific to my engine and the functionality it currently has: it uses Vulkan as the API, and it currently only targets Windows. As for the makeup of a frame, here is a high level overview of what it does:
Acquiring the next swap chain image.
Dispatching compute shaders.
Uploading streamed textures.
Deferred rendering of opaque models.
Ambient occlusion generation.
Collating the deferred buffers together with lights.
Transparent model rendering.
Order independent transparency.
Post processing.
How this looks in code is something along the lines of:
void RenderFrame()
{
    // Wait for the GPU to tell us that we can provide more render commands for this image
    WaitOnInFlightFence();
    GetNextSwapChainImage();         // Get the next GPU image to draw to

    HandleComputeShaderDispatches(); // Send off any compute shader calls we need
    SubmitStreamedImageCommands();   // Send off the image streaming commands

    // Render opaque UI and models to deferred buffers
    RenderOpaqueUI();
    RenderOpaqueModels();

    GenerateAO();                    // Generate ambient occlusion based off the deferred buffers
    CollateDeferredBuffers();        // Introduce lights into the drawing of opaque models

    MakeCopyOfDepthBuffer();         // So that the transparent models can query depth info
    RenderTransparentModels();
    HandleOIT();

    PostProcessing();
    RenderImGuiDebugMenus();         // Add debug menus over the top of the frame
}
Almost all of these areas are big enough to warrant a blog post of their own, and some boilerplate code has been omitted, but on the whole this is the flow that currently happens. Here is a very basic example of the visuals that can be produced with this logic so far:
Swapping to a More Versatile Approach
The flow shown in the previous code snippet works, and it is straightforward and easy to follow. However, it has a downside that stems from how Vulkan orders its rendering commands. Vulkan ensures that certain events happen before others through the use of something called a semaphore. This is essentially a gate that can be waited on, and signaled. For example, the transparent models cannot be rendered until the opaque ones have finished drawing, so the transparent models' render commands wait on a semaphore that is signaled when the opaque models have finished drawing. This process works great, but it gets very awkward when trying to reorder a frame, or when certain parts should not happen in certain situations.
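To give a feel for what that waiting looks like at submission time, here is a simplified example using Vulkan's queue submission API. The semaphore, command buffer, and queue handles here are stand-ins for the real ones in the engine:

#include <vulkan/vulkan.h>

// Submit the transparent pass, waiting on the semaphore that is
// signaled once the opaque pass has finished drawing
VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;

VkSubmitInfo submitInfo{};
submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submitInfo.waitSemaphoreCount = 1;
submitInfo.pWaitSemaphores = &opaquePassFinishedSemaphore;
submitInfo.pWaitDstStageMask = &waitStage;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &transparentPassCommandBuffer;

// Signal our own semaphore so that later passes can wait on this one
submitInfo.signalSemaphoreCount = 1;
submitInfo.pSignalSemaphores = &transparentPassFinishedSemaphore;

vkQueueSubmit(graphicsQueue, 1, &submitInfo, VK_NULL_HANDLE);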
For example, say there were no transparent models on the screen. Then there is no need to run any of the transparency code, as we know it would make no difference to what the player sees. But skipping it is not straightforward, as there are commands later on that are waiting on the transparency semaphore's signal.
So, to make this flow more flexible and versatile, I have been working on a different approach to structuring a frame. The idea is to have a set of stages, each of which can either run or be skipped, and each of which has a list of other stages to wait on. The render function itself then just calls into a stage container, which loops through the stages in order and determines what should be waiting on what.
As a further explanation of the previous paragraph, here is a flowchart of the whole frame flow, with nothing being skipped:
Where each black arrow represents a waiting dependency on the previous stage. Now, if the transparency sections were to be skipped, due to there being no transparent meshes on screen, then the flow would look like this:
Where each red space is a skipped section, and the red arrows are the search process for which stage to wait on, looping back until they find a non-skipped stage.
Here is an example of how the frame is structured in code now:
void VulkanRenderPipeline::SetupRenderFlow()
{
    // Layout here is: Type, function to call, stages to wait on
    mFrameFlow.InitFrameSegment(RenderedFrameSegmentType::IMAGE_AVALIABLE, {}, {});

    // Render opaque UI
    mFrameFlow.InitFrameSegment(RenderedFrameSegmentType::OPAQUE_UI_RENDER, &HandleOpaqueUIDrawing, { RenderedFrameSegmentType::IMAGE_AVALIABLE });

    // Render opaque models flow
    mFrameFlow.InitFrameSegment(RenderedFrameSegmentType::OPAQUE_MODELS_RENDER, &HandleModelDrawing_Opaque, { RenderedFrameSegmentType::OPAQUE_UI_RENDER });
    mFrameFlow.InitFrameSegment(RenderedFrameSegmentType::AO_GENERATION, &HandleAO, { RenderedFrameSegmentType::OPAQUE_MODELS_RENDER });
    mFrameFlow.InitFrameSegment(RenderedFrameSegmentType::DEFERRED_BUFFERS_COLLATED, &CollateDeferredBuffersInToFinalImage, { RenderedFrameSegmentType::AO_GENERATION });

    // Alternate flow for above
    mFrameFlow.InitFrameSegment(RenderedFrameSegmentType::OPAQUE_MODELS_ALTERNATE_FLOW, &NoOpaqueMeshesAlternateFlow, { RenderedFrameSegmentType::OPAQUE_UI_RENDER });

    // Transparent mesh normal flow
    mFrameFlow.InitFrameSegment(RenderedFrameSegmentType::COPY_ACROSS_DEPTH_BUFFER, &CopyAcrossDepthBuffer, { RenderedFrameSegmentType::DEFERRED_BUFFERS_COLLATED, RenderedFrameSegmentType::OPAQUE_MODELS_ALTERNATE_FLOW });
    mFrameFlow.InitFrameSegment(RenderedFrameSegmentType::TRANSPARENT_MODELS_RENDER, &HandleModelDrawing_Transparent, { RenderedFrameSegmentType::COPY_ACROSS_DEPTH_BUFFER });
    mFrameFlow.InitFrameSegment(RenderedFrameSegmentType::OIT_COLLATE, &ApplyOIT, { RenderedFrameSegmentType::TRANSPARENT_MODELS_RENDER });

    // Alternate flow for above
    mFrameFlow.InitFrameSegment(RenderedFrameSegmentType::TRANSPARENT_MODELS_ALTERNATE_FLOW, &NoTransparentMeshesAlternateFlow, { RenderedFrameSegmentType::DEFERRED_BUFFERS_COLLATED, RenderedFrameSegmentType::OPAQUE_MODELS_ALTERNATE_FLOW });

    // Post processing
    mFrameFlow.InitFrameSegment(RenderedFrameSegmentType::POST_PROCESSING, &ApplyPostProcessing, { RenderedFrameSegmentType::OIT_COLLATE, RenderedFrameSegmentType::TRANSPARENT_MODELS_ALTERNATE_FLOW });
}

void RenderFrame()
{
    mFrameFlow.RenderFrame();
}
Where ‘InitFrameSegment()’ initialises the given segment with the function callback that records its render commands, and the stages it should wait on. The key part of this system is that, within each callback, stages can be marked as skipped or not. For example, in the opaque render callback there is a check for which meshes are on the screen:
if (!AnyVisibleTransparentMeshes())
{
    // If there are no visible transparent meshes then we can skip these segments
    renderFlow.FindSegmentOfType(RenderedFrameSegmentType::COPY_ACROSS_DEPTH_BUFFER)->SetToSkipThisSegment(true);
    renderFlow.FindSegmentOfType(RenderedFrameSegmentType::TRANSPARENT_MODELS_RENDER)->SetToSkipThisSegment(true);
    renderFlow.FindSegmentOfType(RenderedFrameSegmentType::OIT_COLLATE)->SetToSkipThisSegment(true);
}
else
{
    renderFlow.FindSegmentOfType(RenderedFrameSegmentType::TRANSPARENT_MODELS_ALTERNATE_FLOW)->SetToSkipThisSegment(true);
}

if (!AnyVisibleOpaqueMeshes())
{
    // If there are no visible opaque meshes then we can skip these segments
    renderFlow.FindSegmentOfType(RenderedFrameSegmentType::AO_GENERATION)->SetToSkipThisSegment(true);
    renderFlow.FindSegmentOfType(RenderedFrameSegmentType::DEFERRED_BUFFERS_COLLATED)->SetToSkipThisSegment(true);
}
else
{
    renderFlow.FindSegmentOfType(RenderedFrameSegmentType::OPAQUE_MODELS_ALTERNATE_FLOW)->SetToSkipThisSegment(true);
}
Then, when the frame is actually rendered, the stages to wait on are determined by checking whether any of the stages a segment is marked to wait on have been skipped. If one has, the system falls back to the stages that the skipped one was itself waiting on, and waits on those instead.
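As a rough sketch of that resolution step - the FrameSegment type and its accessors here are illustrative names, not my exact implementation:

#include <vector>
#include <vulkan/vulkan.h>

// Collect the semaphores a segment should actually wait on, walking
// back through any skipped dependencies to their own dependencies
void GatherWaitSemaphores(const FrameSegment& segment, std::vector<VkSemaphore>& outWaitSemaphores)
{
    for (RenderedFrameSegmentType dependencyType : segment.GetStagesToWaitOn())
    {
        const FrameSegment* dependency = renderFlow.FindSegmentOfType(dependencyType);

        if (dependency->IsSetToSkipThisSegment())
        {
            // This stage is skipped this frame - wait on whatever it
            // would have waited on instead
            GatherWaitSemaphores(*dependency, outWaitSemaphores);
        }
        else
        {
            outWaitSemaphores.push_back(dependency->GetSignalSemaphore());
        }
    }
}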
Future Improvements
There are some potential improvements that can be made from the approach outlined, namely:
Making the frame structure less linear.
Multi-threading the render thread.
Ensuring there is little to no performance hit when calculating which stages to wait on.
Expanding to multiple render command functions per step.
Conclusion
That's all for this blog post. To summarise, this post has covered what a ‘frame’ is in the context of a game engine; how I originally went about implementing one in my engine; and the changes that made it more flexible, and easier to expand on in the future. Each area I have talked about could have been expanded in much more detail, but I didn't want this post to go on for too long. It is worth mentioning that the code snippets have been simplified from the actual code to make them more readable, and to have them apply to more implementations than my specific one. If there are any questions about anything then feel free to ask :D
For the next blog I am going to be covering how I added order independent transparency into the engine.
Thanks for reading.