Fitting Multiple Cameras into a Frame

This post is going to be a spiritual successor to my first ever blog post, which was on how I have been structuring the rendering of a frame. It is also the direct continuation of my previous one on game cameras.

Debug Camera

When discussing cameras within game engines, there is one thing that should always come to mind: the debug camera. This camera is free from any game entity/object attachments and allows for free movement around a game world. It is vital for creating anything within a 3D space as without it you wouldn’t be able to see what you were doing.

In my engine I have a split between debug-only and release-only code that is achieved using preprocessor directives, where each flag (debug or release) is set in the build settings. What this looks like is shown below:

#ifdef _DEBUG_BUILD
    // Code that is only included in debug builds
#endif

#ifdef _RELEASE_BUILD
    // Code that is only included in release builds
#endif

Using these flags, I only create and process a debug camera in debug builds, whereas in release builds all of the memory and processing needed for a debug camera is stripped out. This does mean that memory usage and performance metrics differ between build states, so testing in the right build state is important when trying to optimize.
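As a minimal sketch of what that looks like in practice (the class and member names here are placeholders, not my engine's actual API), the debug camera only exists at all when the debug flag is set:

class Renderer
{
public:
    void Init()
    {
#ifdef _DEBUG_BUILD
        // Only pay the memory cost of the debug camera in debug builds
        mDebugCamera = new Camera();
#endif
    }

    void Update(float deltaTime)
    {
#ifdef _DEBUG_BUILD
        // Free-fly movement driven by keyboard/mouse input
        mDebugCamera->HandleInput(deltaTime);
#endif
    }

private:
#ifdef _DEBUG_BUILD
    Camera* mDebugCamera = nullptr; // Stripped out entirely in release builds
#endif
};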

In-Game Cameras

For actual game-play and in-game rendering, using the debug camera is obviously not an option. Instead, other cameras need to be placed into the game world. I have achieved this through a 'Camera Component', which holds the state for the camera it is responsible for and hooks into the 'Camera Collection'. Every time a camera component is enabled, it passes its internal camera over to the collection for inclusion in the rendering flow.

Camera Collection
The camera collection is just a place to hold the list of cameras, along with being a holding point for other minor elements - such as a buffer of the per-frame camera data.
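A rough sketch of that registration flow is below (the names here are hypothetical stand-ins for my actual classes):

// Hypothetical component that owns a camera and registers it with the collection
class CameraComponent
{
public:
    void OnEnable()
    {
        // Hand the camera over to the collection so the renderer picks it up
        CameraCollection::Get().RegisterCamera(&mCamera);
    }

    void OnDisable()
    {
        // Remove it again so no render work is issued for an inactive camera
        CameraCollection::Get().UnregisterCamera(&mCamera);
    }

private:
    Camera mCamera;
};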

By handling cameras this way, the systems are able to scale the rendering load to exactly what is required by the active scene and avoid sending any render commands that are not needed. For example, if there are no cameras then no rendering takes place at all. Swapping to this multi-camera system from the old single-camera approach was both straightforward and complex at the same time.

The simple part was that it looked like just doing what a single camera does, but multiple times. This is sort of true. Imagine two flows: one where each camera is run through in turn, with the whole render flow performed for each camera; and another where each render flow section is run through in turn, with all cameras being taken into consideration inside each section. Both flows are very similar yet vastly different.
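In pseudocode the difference is simply which loop sits on the outside (the pass names here are placeholders, not my actual render flow):

// Flow 1: the whole render flow is run once per camera
for (Camera* camera : activeCameras)
{
    RenderShadows(camera);
    RenderGBuffer(camera);
    RenderAO(camera);
    RenderLighting(camera);
}

// Flow 2: each render flow section is run once, looping over all cameras inside it
RenderShadows(activeCameras);
RenderGBuffer(activeCameras);
RenderAO(activeCameras);
RenderLighting(activeCameras);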

Below is the first flow described, with the whole render flow shown on the right.

Frame flow
The flow shown is just the current state of my engine's flow; it is missing some parts that will be added later, and the order may change.

Here is the second flow described, where each camera is processed per flow section:

The result of each flow is the same, but how they are implemented is vastly different. The performance may also differ: depending on how the semaphores are added between each step, the first approach may end up behaving like the second one anyway, due to waits and re-ordering done by the GPU. Or, if semaphores are added between each render flow step for each and every camera, then performance will fall through the floor.

For my engine I have gone with the second approach, as to me it looks like it would give vastly better performance and scalability. However, I haven't tested the other approach, so if you are considering adding one or the other then you should do your own performance testing.

As an example of how the multi-camera looping has been added into each part of the render flow, here is a code snippet of generating screen space AO:

CommandBuffer* VulkanRenderPipeline::HandleAO(RenderedFrameFlow& renderFlow, QueueToSubmit& queue)
{
    CommandBuffer* AOBuffer = AO::GetCommandBuffer(mThisFrameRenderID);
    if (!AOBuffer) 
        ASSERTFAIL("Not a valid command buffer");

    AOBuffer->StartRecording();

    // Grab all active cameras
    std::vector<Camera*> activeCameras = mCameraCollection.GetCameras();

#ifdef _DEBUG_BUILD
    activeCameras.push_back(mCameraCollection.GetDebugCamera());
#endif

    // Go through each camera
    for (Camera* camera : activeCameras)
    {
        if (!camera || !camera->GetInternalBuffers().GetDepthBuffer() || !camera->GetInternalBuffers().GetAOBuffer())
            continue;

        // Generate screen space AO using AMD fidelity FX
        glm::mat4                projectionMatGLM = camera->GetProjectionMatrix();
        Maths::Matrix::Matrix4X4 projectionMat    = Maths::Matrix::Matrix4X4(projectionMatGLM);

        // For this process to work we need to transition some textures to being the right format
        camera->GetInternalBuffers().GetDepthBuffer()->TransitionToImageLayout(VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, AOBuffer); // Depth to being readable

        glm::mat4                normalsToViewMatrix_glm = camera->GetViewMatrix();
        Maths::Matrix::Matrix4X4 normalsToViewMatrix     = Maths::Matrix::Matrix4X4(normalsToViewMatrix_glm);

        camera->GetAOGenerator().GenerateAO(AOBuffer, &projectionMat[0], &normalsToViewMatrix[0], camera->GetInternalBuffers().GetAOBuffer());
    }

    AOBuffer->StopRecording();

    queue = QueueToSubmit::GRAPHICS;

    return AOBuffer;
}

The important part here is that all of the AO generation commands are recorded into the same command buffer. If they were in different ones then there could be unneeded waiting between each camera's AO generation. (Technically they could be in separate secondary command buffers with no issues, but that is beside the point.)

Multiple Camera Considerations

In addition to making sure that all cameras are processed together in each section, there are some other considerations to take into account with multiple cameras. The first is that each camera has its own view of the world, which means it needs its own unique view and projection matrices. Because of this, the data needs to be passed into shaders somehow. Constant buffers could be used, but you would run out of space in them fairly fast. Instead, to solve this, I have a UBO (uniform buffer object) for each camera, holding its data.
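As a sketch of the kind of data involved (my engine's actual layout differs; this is purely illustrative), the per-camera UBO can be quite small:

#include <glm/glm.hpp>

// Per-camera data uploaded to a uniform buffer object (std140-friendly layout).
// Each camera owns one of these buffers, and the descriptor set bound for that
// camera points shaders at the right one.
struct CameraUBOData
{
    glm::mat4 viewMatrix;
    glm::mat4 projectionMatrix;
    glm::vec4 worldPosition; // vec4 rather than vec3 to avoid std140 padding surprises
};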

Another consideration is that not all of the cameras will be rendered directly to the screen. In fact, it is likely that only one camera will be drawing directly to the screen. The rest of the cameras will be doing what is known as render to texture, which covers things like mini-maps and other off-screen renderings of the world, such as in-game news cameras or CCTV. What this means is that these cameras cannot be piped directly to the swap-chain image and will need their own final render target. This is not overly problematic, but it can be a bit of a pain from a texture format and access standpoint.
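The key Vulkan detail is that an off-screen camera's final target needs to be created with usage flags that let it be both rendered to and sampled later. Something along these lines (a partial VkImageCreateInfo, with cameraWidth/cameraHeight standing in for the camera's own resolution, and the format chosen just for illustration):

VkImageCreateInfo imageInfo{};
imageInfo.sType         = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
imageInfo.imageType     = VK_IMAGE_TYPE_2D;
imageInfo.format        = VK_FORMAT_R8G8B8A8_UNORM;           // Format the sampling pass expects
imageInfo.extent        = { cameraWidth, cameraHeight, 1 };    // The camera's own resolution, not the swap-chain's
imageInfo.mipLevels     = 1;
imageInfo.arrayLayers   = 1;
imageInfo.samples       = VK_SAMPLE_COUNT_1_BIT;
imageInfo.tiling        = VK_IMAGE_TILING_OPTIMAL;
// Rendered to as a colour attachment, then sampled when composited (mini-map, CCTV screen, etc.)
imageInfo.usage         = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT | VK_IMAGE_USAGE_SAMPLED_BIT;
imageInfo.sharingMode   = VK_SHARING_MODE_EXCLUSIVE;
imageInfo.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;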

Decoupling Internal Render Size from Window Size

One massive benefit of going through this camera rework has been that the camera's resolution is now decoupled from the screen's resolution. What this allows for is shown in the short video below, where I change the resolution of a camera in real time.

The output is slightly fuzzy as it is not using nearest neighbor sampling when being shown to the screen. This is something that I may change later, and it would just involve changing which sampler is used.

Factors When Resizing Output

In order to resize the camera buffers as shown in the above video, a fair amount of work has to be done. This work includes, but is not limited to:

  • Deleting and recreating the images involved.

  • Updating descriptor sets.

  • Recreating (or creating new) frame-buffers.

These steps may not sound like major problems, but if any of them is not done fully then it is very likely to result in a crash (due to old images being referenced, or an invalid size being used). Another important side point is that if you are using frames in flight (covered here: https://vulkan-tutorial.com/Drawing_a_triangle/Drawing/Frames_in_flight), then the old frame-buffers and descriptor sets may still be referenced by active command buffers. The simplest way to solve this is to wait for the GPU to be idle, so that it has no more work to complete and what was pending is no longer pending:

vkDeviceWaitIdle(logicalDevice);
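Putting those steps together, the resize path ends up looking roughly like this (the helper functions are placeholders for my engine's internals, not an exact copy of them):

void Camera::ResizeRenderTargets(uint32_t newWidth, uint32_t newHeight)
{
    // 1. Make sure no in-flight command buffers still reference the old resources
    vkDeviceWaitIdle(logicalDevice);

    // 2. Delete the old images, image views and frame-buffers
    DestroyInternalBuffers();

    // 3. Recreate everything at the new resolution
    CreateInternalBuffers(newWidth, newHeight);
    CreateFrameBuffers(newWidth, newHeight);

    // 4. Point the existing descriptor sets at the new image views
    UpdateDescriptorSets();
}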

Viewport Culling

When rendering graphics to the screen, a lot of processes have to be completed, and these processes take time. What would really suck is going through all of them to render something that makes no impact on the visual output of the frame. In these cases it is beneficial to use culling, which is where models (and potentially other things) are removed from the list of objects that need to be drawn this frame. The name 'viewport' here refers to the camera's frustum: viewport culling (often called frustum culling) is the act of first running through the objects in a scene and finding which ones do not fall within the camera's view.

This can be done in a number of ways, some more complex than others, and some more time consuming than others. A common way is to give each cullable object a bounding box (usually axis-aligned, as the calculations are faster), which is then mathematically checked against the camera's frustum. If no part of the box is within view, then the object is culled. Here is a breakdown of the maths behind this: https://iquilezles.org/articles/frustumcorrect/
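A minimal version of that box-vs-frustum test, following the basic idea from the linked article (the plane representation and names here are my own assumptions, not my engine's exact code), might look like:

#include <glm/glm.hpp>
#include <array>

// Axis-aligned bounding box in world space
struct AABB
{
    glm::vec3 min;
    glm::vec3 max;
};

// Each frustum plane is stored as (normal.xyz, d), with normals pointing inwards,
// so a point p is inside a plane when dot(normal, p) + d >= 0.
bool IsBoxInFrustum(const std::array<glm::vec4, 6>& planes, const AABB& box)
{
    for (const glm::vec4& plane : planes)
    {
        int cornersOutside = 0;

        // Count how many of the 8 box corners are on the outside of this plane
        for (int corner = 0; corner < 8; ++corner)
        {
            glm::vec3 p((corner & 1) ? box.max.x : box.min.x,
                        (corner & 2) ? box.max.y : box.min.y,
                        (corner & 4) ? box.max.z : box.min.z);

            if (glm::dot(glm::vec3(plane), p) + plane.w < 0.0f)
                ++cornersOutside;
        }

        // All corners behind a single plane means the box cannot be visible
        if (cornersOutside == 8)
            return false;
    }

    // As the linked article explains, this basic test can still report some
    // off-screen boxes as visible (false positives), which is safe but not optimal.
    return true;
}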

There are more advanced forms of culling, such as occlusion culling, but I'll be saving those for another blog post.

Perspective vs Orthographic Projection

One final point that I am going to discuss in this post is the two standard projection options that are available for cameras: Perspective and Orthographic.

Projection matrices are a part of the transformation process used to render a 3D model to a 2D image. From my previous post you may remember this image showing the flow:

As shown above, the projection matrix converts from view space to clip space. What this means is that, with all models already being relative to the camera, the projection matrix squashes everything in the camera's frustum down towards a single plane, taking the vertices from 3D to 2D. After the perspective divide, the vertices end up in what's known as Normalised Device Coordinates (NDC). These are values from -1 to 1 in the X and Y axes, with (0, 0) being at the center of the screen. Anything outside of this range is clipped.

Perspective projection is the most common and obvious projection version, with further away objects appearing smaller than closer ones, aiming to mimic reality.

Orthographic projection is another option, with no visual perspective being incorporated. This version is heavily used within CAD (computer aided design), and games with unique graphical effects, such as Monument Valley and FEZ.

The maths behind these matrices is not going to be covered here, but if you are using a maths library in your code then it is very likely that there is a built-in function to generate them for you. For example, glm provides these functions:

glm::ortho(Left, Right, Bottom, Top, NearDistance, FarDistance);

glm::perspective(FOVAsRadians, AspectRatio, NearDistance, FarDistance);
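As a quick usage example (the field of view, aspect ratio and near/far values here are arbitrary), building a perspective projection and combining it with a view matrix might look like:

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

glm::mat4 projection = glm::perspective(glm::radians(70.0f), // Vertical FOV in radians
                                        16.0f / 9.0f,        // Aspect ratio (width / height)
                                        0.1f,                // Near plane distance
                                        1000.0f);            // Far plane distance

glm::mat4 view = glm::lookAt(glm::vec3(0.0f, 2.0f, 5.0f),  // Camera position
                             glm::vec3(0.0f, 0.0f, 0.0f),  // Point being looked at
                             glm::vec3(0.0f, 1.0f, 0.0f)); // Up direction

glm::mat4 viewProjection = projection * view;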

Conclusion

To conclude, this article has covered some of the considerations that need to be handled when swapping from a single camera in an engine to multiple cameras, along with a quick overview of view/frustum culling and perspective vs orthographic projection.

The next post is going to be on making an event system within C++, and how you can use it to communicate across your code without adding unneeded dependencies.

Thanks for reading :D