Introduction

Welcome back to my blog series showing my progress towards creating my own custom game engine using C++ and Vulkan. This process has taught me a massive amount, and I have been enjoying every moment of it so far. For anyone who has not read any of my previous posts, Hi. I’m Brandon. I’m a games programmer who has around 9 years of experience working within C++, and my current goal is to make my own custom games engine.

Before starting this endeavor I knew about the stencil buffer, but after going through the process of making a low-level renderer, I have gained a strong appreciation for its functionality. And I feel that it gets talked about a lot less than its twin - the depth buffer.

The Stencil Buffer

Depth Buffer

I’m fairly confident that not everyone who is reading this would have heard of the stencil buffer, so I’m going to start with a high-level overview of what it is by using a more well known example. Introducing… the depth buffer. This texture stores how far away from the camera a pixel is. They commonly look like this:

(Yes, this is a rendering of an eyeball :D)

The depth buffer is populated as each model is drawn to the screen: if a new pixel is closer to the camera than the one previously stored (that is, it passes the depth test), then its depth overrides the previously stored value. For example, if a cube was rendered in front of the eye then you would be able to see the depth data for the cube instead of the eye, as it is closer.

The previous statement is correct and also incorrect at the same time. This is because in the context that the image was drawn in, the cube would be visible - that is true. But there are some controls the user has over the depth buffer, and under what conditions data is written out to it. It is possible, for example, to set the depth comparison operation to only pass if the depth value is greater than the one currently stored. In that case the cube wouldn’t be visible in the depth buffer, as it would fail the depth test (which is now the opposite of the one in the previous example).
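To make the two cases concrete, here is a minimal sketch of a single-pixel depth test. It assumes that a smaller depth value means closer to the camera and that depth writes are enabled; `DepthCompare`, `depthTestPasses` and `drawFragment` are names made up for this illustration, not Vulkan API.

```cpp
// Hypothetical stand-in for the two compare modes discussed above.
enum class DepthCompare { Less, Greater };

// Returns true when the incoming fragment passes the depth test.
bool depthTestPasses(DepthCompare op, float incoming, float stored) {
    return op == DepthCompare::Less ? incoming < stored : incoming > stored;
}

// Runs the test and, on a pass, overwrites the stored depth.
// Returns whether the fragment passed.
bool drawFragment(DepthCompare op, float incoming, float& stored) {
    if (!depthTestPasses(op, incoming, stored)) {
        return false;  // Fragment rejected, stored depth is untouched.
    }
    stored = incoming;
    return true;
}
```

With the eye stored at a depth of 0.8 and the cube arriving at 0.3, the `Less` mode lets the cube overwrite the eye, while the `Greater` mode rejects it and leaves the eye's depth in place.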

How the stencil buffer works is very similar, but you have a lot more control over what is drawn and why. And you can also use it as a much more controllable image mask.

Setting up the Stencil Buffer

Before I can get to the actual equations used by the stencil buffer, I need to explain how you actually set one up. This explanation is going to be for Vulkan as that is what I use most often these days, but most of the information will carry over to other APIs such as OpenGL and DirectX. I will also be giving links back to the Vulkan spec where relevant.

So, in Vulkan, when you create a depth buffer you need to give it a specific format. The possible formats are:

VK_FORMAT_D16_UNORM,
VK_FORMAT_D32_SFLOAT,
VK_FORMAT_X8_D24_UNORM_PACK32,

VK_FORMAT_S8_UINT,
VK_FORMAT_D16_UNORM_S8_UINT,
VK_FORMAT_D24_UNORM_S8_UINT,
VK_FORMAT_D32_SFLOAT_S8_UINT

Taken from: https://registry.khronos.org/vulkan/specs/latest/html/vkspec.html#formats

How these formats are read is by using this key:

D = depth
S = stencil
X = unused
UINT = unsigned int
SFLOAT = signed float
UNORM = unsigned normalised

And the numbers represent bit count. So, for example: VK_FORMAT_D32_SFLOAT means that it is a 32 bit format that holds depth information, and has the data type of a signed float.
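As a rough illustration of the key above, here is a small sketch that reads the bit counts out of a format name. `FormatBits` and `decodeFormatName` are hypothetical helpers written for this post, not part of Vulkan; a real renderer would switch on the `VkFormat` enum rather than parse strings.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Bit counts decoded from a format name; 0 means that part is absent.
struct FormatBits {
    int depthBits = 0;
    int stencilBits = 0;
    int unusedBits = 0;
};

// Looks for components of the form "_D<digits>", "_S<digits>" or "_X<digits>".
FormatBits decodeFormatName(const std::string& name) {
    FormatBits bits;
    for (std::size_t i = 1; i + 1 < name.size(); ++i) {
        // Only consider a letter directly after '_' and directly before a digit,
        // so "_SFLOAT" or "_UNORM" are not mistaken for stencil/unused bits.
        const bool startsComponent = name[i - 1] == '_';
        const bool followedByDigit = std::isdigit(static_cast<unsigned char>(name[i + 1])) != 0;
        if (!startsComponent || !followedByDigit) {
            continue;
        }
        int value = 0;
        for (std::size_t j = i + 1;
             j < name.size() && std::isdigit(static_cast<unsigned char>(name[j])); ++j) {
            value = value * 10 + (name[j] - '0');
        }
        if (name[i] == 'D') bits.depthBits = value;
        else if (name[i] == 'S') bits.stencilBits = value;
        else if (name[i] == 'X') bits.unusedBits = value;
    }
    return bits;
}
```

For example, decoding "VK_FORMAT_D24_UNORM_S8_UINT" gives 24 depth bits and 8 stencil bits, while "VK_FORMAT_D32_SFLOAT" gives 32 depth bits and no stencil bits.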

What you may have noticed is that the bottom four formats have bits reserved for use by the stencil buffer. This is not an edge case. In all cases (apart from VK_FORMAT_S8_UINT), the stencil data lives in the same image as the depth data, so from a memory perspective the stencil buffer is the depth buffer.

Pipeline State

Now the only other part that needs to be set up for the stencil buffer is the pipeline state, which tells the GPU what operations to run for it. This is encapsulated by this data structure: VkPipelineDepthStencilStateCreateInfo

// Provided by VK_VERSION_1_0
typedef struct VkPipelineDepthStencilStateCreateInfo {
    VkStructureType                           sType;
    const void*                               pNext;
    VkPipelineDepthStencilStateCreateFlags    flags;
    VkBool32                                  depthTestEnable;
    VkBool32                                  depthWriteEnable;
    VkCompareOp                               depthCompareOp;
    VkBool32                                  depthBoundsTestEnable;
    VkBool32                                  stencilTestEnable;
    VkStencilOpState                          front;
    VkStencilOpState                          back;
    float                                     minDepthBounds;
    float                                     maxDepthBounds;
} VkPipelineDepthStencilStateCreateInfo;

As can be seen in the code above, the user needs to set whether the stencil test is enabled, and provide a VkStencilOpState for both front and back facing polygons.

The VkStencilOpState structure looks like this:

// Provided by VK_VERSION_1_0
typedef struct VkStencilOpState {
    VkStencilOp    failOp;
    VkStencilOp    passOp;
    VkStencilOp    depthFailOp;
    VkCompareOp    compareOp;
    uint32_t       compareMask;
    uint32_t       writeMask;
    uint32_t       reference;
} VkStencilOpState;

And this is where we get into the meat of the functionality provided.

VkStencilOpState

This structure provides a lot of possible variations for how the stencil buffer performs operations. Some of its members can be explained now, and others will need a full breakdown of the internal flow to be explained properly.

So, starting with the ones that don’t need a full breakdown:

 VkStencilOp    failOp;
 VkStencilOp    passOp;
 VkStencilOp    depthFailOp;
 VkCompareOp    compareOp;

Each of these states an operation to be performed under certain situations.

Starting with the comparison operation as it is simpler:

// Provided by VK_VERSION_1_0
typedef enum VkCompareOp {
    VK_COMPARE_OP_NEVER = 0,
    VK_COMPARE_OP_LESS = 1,
    VK_COMPARE_OP_EQUAL = 2,
    VK_COMPARE_OP_LESS_OR_EQUAL = 3,
    VK_COMPARE_OP_GREATER = 4,
    VK_COMPARE_OP_NOT_EQUAL = 5,
    VK_COMPARE_OP_GREATER_OR_EQUAL = 6,
    VK_COMPARE_OP_ALWAYS = 7,
} VkCompareOp;

These comparisons should be fairly logical. Each one maps to its relevant conditional operator. For example, VK_COMPARE_OP_LESS means less than, '<'. And VK_COMPARE_OP_GREATER_OR_EQUAL means '>='. Ones that don’t map to an operator, like VK_COMPARE_OP_NEVER, just mean what they say - such as never, or always.

What this comparison state determines is when the stencil test passes or not. For example, if the two comparison values (explained below) are equal to each other, but the comparison operation is set to VK_COMPARE_OP_LESS, then the stencil test fails. But if it was set to VK_COMPARE_OP_EQUAL, then it would pass.
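The comparison can be sketched as a plain function. `CompareOp` below is a local mirror of VkCompareOp so the snippet compiles without the Vulkan headers, and `compare` is a made-up name. One detail taken from the Vulkan spec: the comparison is evaluated as "reference OP stored", so the reference value sits on the left.

```cpp
#include <cstdint>

// Local mirror of VkCompareOp (same order as the enum above).
enum class CompareOp {
    Never, Less, Equal, LessOrEqual, Greater, NotEqual, GreaterOrEqual, Always
};

// Evaluates "reference OP stored" for the given comparison operation.
bool compare(CompareOp op, std::uint32_t reference, std::uint32_t stored) {
    switch (op) {
        case CompareOp::Never:          return false;
        case CompareOp::Less:           return reference < stored;
        case CompareOp::Equal:          return reference == stored;
        case CompareOp::LessOrEqual:    return reference <= stored;
        case CompareOp::Greater:        return reference > stored;
        case CompareOp::NotEqual:       return reference != stored;
        case CompareOp::GreaterOrEqual: return reference >= stored;
        case CompareOp::Always:         return true;
    }
    return false;  // Unreachable for valid enum values.
}
```

With both values equal to 5, `compare(CompareOp::Less, 5, 5)` fails and `compare(CompareOp::Equal, 5, 5)` passes, matching the example above.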

Now for the other three, less straightforward, variables:

// Provided by VK_VERSION_1_0
typedef enum VkStencilOp {
    VK_STENCIL_OP_KEEP = 0,
    VK_STENCIL_OP_ZERO = 1,
    VK_STENCIL_OP_REPLACE = 2,
    VK_STENCIL_OP_INCREMENT_AND_CLAMP = 3,
    VK_STENCIL_OP_DECREMENT_AND_CLAMP = 4,
    VK_STENCIL_OP_INVERT = 5,
    VK_STENCIL_OP_INCREMENT_AND_WRAP = 6,
    VK_STENCIL_OP_DECREMENT_AND_WRAP = 7,
} VkStencilOp;

This enum states what to do with the value currently stored in the stencil buffer, using the values from the equations explained below. For example, if the state is set to VK_STENCIL_OP_KEEP then the current value stored in the stencil buffer is kept. Whereas if the state was VK_STENCIL_OP_INVERT then the current value stored is bitwise inverted and stored.

Bitwise inversion

What bitwise inversion means is that each bit of the memory stored is inverted - referring to any 1’s becoming 0’s and vice versa. 0010 (2) would become 1101 (13), and 1110 (14) would become 0001 (1) for example.

The three variables (failOp, passOp, and depthFailOp) state what operations should happen under each condition. Fail/pass refer to the stencil test passing or failing. And depthFailOp refers to what happens if the stencil test passes but the depth test fails.
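Here is a minimal sketch of what each stencil operation does to a stored value, assuming an 8-bit stencil (as in the S8 formats listed earlier). `StencilOp` is a local mirror of VkStencilOp and `applyStencilOp` is a hypothetical helper, not a Vulkan function.

```cpp
#include <cstdint>

// Local mirror of VkStencilOp (same order as the enum above).
enum class StencilOp {
    Keep, Zero, Replace, IncrementAndClamp, DecrementAndClamp,
    Invert, IncrementAndWrap, DecrementAndWrap
};

// Generates the new value from the stored 8-bit value and the reference.
std::uint8_t applyStencilOp(StencilOp op, std::uint8_t stored, std::uint8_t reference) {
    switch (op) {
        case StencilOp::Keep:    return stored;
        case StencilOp::Zero:    return 0;
        case StencilOp::Replace: return reference;
        case StencilOp::IncrementAndClamp:  // Stops at the maximum value (255).
            return stored == 0xFF ? stored : static_cast<std::uint8_t>(stored + 1);
        case StencilOp::DecrementAndClamp:  // Stops at 0.
            return stored == 0x00 ? stored : static_cast<std::uint8_t>(stored - 1);
        case StencilOp::Invert:             // Flips every bit.
            return static_cast<std::uint8_t>(~stored);
        case StencilOp::IncrementAndWrap:   // 255 wraps around to 0.
            return static_cast<std::uint8_t>(stored + 1);
        case StencilOp::DecrementAndWrap:   // 0 wraps around to 255.
            return static_cast<std::uint8_t>(stored - 1);
    }
    return stored;  // Unreachable for valid enum values.
}
```

Note that with 8 bits the inversion example becomes 00000010 (2) turning into 11111101 (253), rather than the 4-bit values used above.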

The Complicated Bit

Now there are only three variables left in the VkStencilOpState structure that haven’t been explained:

uint32_t       compareMask;
uint32_t       writeMask;
uint32_t       reference;

This is because they are the most complicated and finicky to get right.

So, to start with it is useful to establish what inputs we are dealing with. There are:

The current value stored in the stencil buffer.
The compare mask.
The write mask.
The reference value.
The stencil operation to perform for this situation.

Here is the logical flow:

First, if the stencil test is not enabled then the test is not performed.

Then which face is currently being tested determines the state, with potentially different state being used for front/back facing polygons.

Then both the reference value (this is passed into the process by the user and can be whatever value you want), and the currently stored value, are bitwise AND’d with the comparison mask (also set by the user). For anyone who likes maths equations, here’s how this looks:

$$S_{r} \& M_{c}=S'_{r}$$

$$S_{a}\&M_{c}=S'_{a}$$

Where Sr is the reference value, Mc is the compare mask, and Sa is the read stencil attachment value.

These two are then passed into the comparison operator, and the pass/fail state for the stencil test is determined.
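The masking step can be sketched like this. `stencilTestPasses` is a hypothetical helper, and the comparison is passed in as any binary predicate (for example `std::equal_to<>` standing in for VK_COMPARE_OP_EQUAL) to keep the snippet short.

```cpp
#include <cstdint>
#include <functional>

// Masks both operands with the compare mask, then evaluates
// "masked reference OP masked stored value".
template <typename Cmp>
bool stencilTestPasses(Cmp cmp, std::uint32_t reference, std::uint32_t stored,
                       std::uint32_t compareMask) {
    const std::uint32_t maskedReference = reference & compareMask;  // S'r
    const std::uint32_t maskedStored = stored & compareMask;        // S'a
    return cmp(maskedReference, maskedStored);
}
```

As a usage example, a reference of 0x13 and a stored value of 0x23 are equal under a compare mask of 0x0F (only the low four bits are compared), but not under a mask of 0xFF.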

Regardless of if the test passes or fails, a new value is generated:

$$S_{g}$$

How this value is generated is based off of what stencil operation is set for the current context. For example, if the stencil test fails then the value set in failOp is used. If the test passes then the operation depends on if the depth test passes. If it does then passOp is used, otherwise depthFailOp is used. This is one of the quirks of the stencil buffer that I feel gives it a vast amount of variation in use-cases - you can set the value even for operations that fail the depth test!

Regardless of where it pulls the stencil operation from, the generated value is created from the stored stencil buffer value.
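The choice of operation described above boils down to a very small function. This is only a sketch: `StencilOp` lists just a few of the operations, and `StencilOps` and `selectStencilOp` are made-up names mirroring the three slots in VkStencilOpState.

```cpp
// A reduced local stand-in for VkStencilOp, enough for this illustration.
enum class StencilOp { Keep, Zero, Replace };

// The three operation slots from VkStencilOpState.
struct StencilOps {
    StencilOp failOp;
    StencilOp passOp;
    StencilOp depthFailOp;
};

// Picks which operation generates the new value for this fragment.
StencilOp selectStencilOp(const StencilOps& ops, bool stencilPassed, bool depthPassed) {
    if (!stencilPassed) {
        return ops.failOp;  // Stencil test failed.
    }
    // Stencil test passed, so the depth test result decides.
    return depthPassed ? ops.passOp : ops.depthFailOp;
}
```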

Now, for the final step, the stencil buffer is updated with the generated value… but not quite. How it is updated is based on the write mask (Sw), following this calculation:

$$S_{a} = (S_{a} \wedge \neg S_{w}) \vee (S_{g} \wedge S_{w})$$

Nice and complicated :D

To break this down a bit, this symbol means bitwise AND:

$$\wedge$$

And this one means bitwise OR:

$$\vee$$

And, thirdly, this means bitwise NOT:

$$\neg$$

So, after re-writing this in a C++ way it becomes:

// Sa equals (Sa and not(Sw)) or (Sg and Sw), where every operation is bitwise
Sa = (Sa & ~Sw) | (Sg & Sw)

To explain this in a more straightforward way I’m going to convert each part into a short phrase explaining what it is doing. This becomes:

New attachment value = Stored attachment value masked with the opposite of the write mask, then bitwise OR’d with the generated value masked with the write mask.
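Here is that final step as a small sketch, assuming an 8-bit stencil value. `applyWriteMask` is a hypothetical helper name. Note that the bitwise NOT in C++ is `~`; the logical `!` would collapse any non-zero mask to 0 and break the equation.

```cpp
#include <cstdint>

// Keeps the stored bits where the write mask is 0 and takes the
// generated bits where the write mask is 1.
std::uint8_t applyWriteMask(std::uint8_t stored, std::uint8_t generated,
                            std::uint8_t writeMask) {
    return static_cast<std::uint8_t>((stored & ~writeMask) | (generated & writeMask));
}
```

For example, with a stored value of 0xAB, a generated value of 0xCD and a write mask of 0x0F, the high four bits come from the stored value and the low four from the generated one, giving 0xAD. A write mask of 0xFF gives 0xCD, and a write mask of 0x00 leaves 0xAB untouched.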

Even with this breakdown it’s still fairly opaque as to what it is doing and why. So I’m going to run through a couple of examples.

Overdraw Tracker

Say that you are rendering objects to the screen and you wanted to have the stencil buffer count how many times each pixel has been drawn to - this would work as an overdraw tracker. What you would want to do is set the compareOp to VK_COMPARE_OP_ALWAYS, as the operation of adding one to the counter should always be run. passOp should be set to VK_STENCIL_OP_INCREMENT_AND_CLAMP, failOp and depthFailOp should be set to VK_STENCIL_OP_KEEP, as we don’t want the value to be changed. (Technically failOp could be set to anything as we are always going to pass due to the compare operation set)

Now, what should the reference, and the compare and write masks be set to? This is the part I find most difficult when using the stencil buffer, as it is almost never clear just from looking at the equation. So what I do is work it backwards.

I know the result I want is the current value plus one. We know that the generated value (Sg) is going to be the stored value (Sa) + 1, because of the stencil operation used (VK_STENCIL_OP_INCREMENT_AND_CLAMP).

So, from looking at the equation, we want the AND with the write mask to leave the generated value unchanged (right hand side of the OR). How that is done is through setting the mask to being all 1’s.

Now what we have is:

// Base equation
Sa = (Sa & ~Sw) | (Sg & Sw)

// Pass in the values that we know so far (Sw is all 1's, written as 0xFF for an 8-bit stencil)
Sa = (Sa & ~0xFF) | ((Sa + 1) & 0xFF)

// After simplifying
Sa = (Sa & 0x00) | (Sa + 1)

// which goes to
Sa = Sa + 1

Which is what we want!

So what should be set is:

CompareOp set to VK_COMPARE_OP_ALWAYS
PassOp set to VK_STENCIL_OP_INCREMENT_AND_CLAMP
Compare mask set to anything, as the test will always pass (because of VK_COMPARE_OP_ALWAYS)
Reference doesn’t matter either for the same reason as above
Write mask set to ~0 (all 1’s), as we don’t want to lose any data there
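To sanity check those settings, here is a tiny simulation of one pixel being drawn over several times. It assumes an 8-bit stencil value that starts cleared to 0 and that every draw passes the depth test; `overdrawTouch` and `countOverdraw` are made-up names for this sketch.

```cpp
#include <cstdint>

// One draw touching a pixel under the overdraw-tracker settings:
// compareOp = ALWAYS, passOp = INCREMENT_AND_CLAMP, write mask = all 1's.
std::uint8_t overdrawTouch(std::uint8_t stored) {
    const std::uint8_t writeMask = 0xFF;
    // INCREMENT_AND_CLAMP: add one, but never go past the maximum of 255.
    const std::uint8_t generated =
        stored == 0xFF ? stored : static_cast<std::uint8_t>(stored + 1);
    // Sa = (Sa & ~Sw) | (Sg & Sw)
    return static_cast<std::uint8_t>((stored & ~writeMask) | (generated & writeMask));
}

// Applies `draws` overlapping draws to a pixel that starts at 0.
std::uint8_t countOverdraw(int draws) {
    std::uint8_t stored = 0;
    for (int i = 0; i < draws; ++i) {
        stored = overdrawTouch(stored);
    }
    return stored;
}
```

Three overlapping draws leave a 3 in the stencil buffer for that pixel. Because of the clamp, anything past 255 draws simply stays at 255 instead of wrapping back round to 0.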

Material ID Store

The last example went well, but the reference value and compare mask were not used, so here is another example that uses more of the variables. In my deferred rendering process I want to write out what material ID the model is using to the stencil buffer, so that later on I can use the stencil buffer as a mask for pixels that need to be culled. What should I set the variables to in the stencil pipeline?

To figure this out I am again going to work backwards.

The logic I want it to do is replace the currently stored value with my new passed in one if the depth test passes. In any other situation I want to keep the currently stored one. So with that knowledge I can set compareOp to VK_COMPARE_OP_ALWAYS (so the stencil test itself always passes), failOp and depthFailOp to being VK_STENCIL_OP_KEEP, and passOp to VK_STENCIL_OP_REPLACE. Great, but how do I get my new value to being stored by the buffer? How I am going to go about this is by working backwards in the equation. It is currently:

Sa = (Sa & ~Sw) | (Sg & Sw)

But I know that the value stored in Sa is not needed, so I want the left hand side of the OR to result in zeros. That can be done by ANDing it with all 0’s, which in this case means that Sw needs to be all 1’s (as inverting a 1 bit gives a 0 bit). That solves what Sw needs to be. So let’s run that through, writing all 1’s as 0xFF for an 8-bit stencil:

Sa = (Sa & ~0xFF) | (Sg & 0xFF)

Sa = (Sa & 0x00) | Sg

Sa = Sg

Great! The flow will now store whatever value Sg is into the stencil buffer. But what is Sg currently? In the previous example it was forced to be the stored value plus one. This time I didn’t force anything. According to the spec:

A new stencil value s_g is generated according to a stencil operation defined by VkStencilOp parameters set by vkCmdSetStencilOp or VkPipelineDepthStencilStateCreateInfo.

The stencil operation set was: VK_STENCIL_OP_REPLACE. Which again according to the spec:

VK_STENCIL_OP_REPLACE sets the value to reference.

Which explains what value Sg has.

To round up this example:

Reference needs to be set to the new material ID.
Write mask needs to be set to all 1’s (done through setting it to ~0, the C++ bitwise NOT of zero).
Compare mask can be set to anything as the stencil test has been set to always pass.
PassOp needs to be set to VK_STENCIL_OP_REPLACE.
CompareOp needs to be set to VK_COMPARE_OP_ALWAYS.
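As a quick check of that round up, here is a sketch of the write for a single pixel, again assuming an 8-bit stencil value. The second function is an assumption on my part about how the stored ID could be consumed later: a pass that uses VK_COMPARE_OP_EQUAL so that only pixels with a matching material ID survive. Both function names are made up for this sketch.

```cpp
#include <cstdint>

// Write pass: compareOp = ALWAYS, passOp = REPLACE, depthFailOp = KEEP,
// write mask = all 1's. The reference value is the material ID.
std::uint8_t writeMaterialId(std::uint8_t stored, std::uint8_t materialId,
                             bool depthPassed) {
    const std::uint8_t writeMask = 0xFF;
    // REPLACE generates the reference; KEEP generates the stored value.
    const std::uint8_t generated = depthPassed ? materialId : stored;
    // Sa = (Sa & ~Sw) | (Sg & Sw)
    return static_cast<std::uint8_t>((stored & ~writeMask) | (generated & writeMask));
}

// Later pass (assumed): compareOp = EQUAL with the material ID as the
// reference, so the stencil test only passes on matching pixels.
bool pixelMatchesMaterial(std::uint8_t stored, std::uint8_t materialId,
                          std::uint8_t compareMask) {
    return (materialId & compareMask) == (stored & compareMask);
}
```

A pixel cleared to 0 that is drawn with material ID 5 and passes the depth test ends up storing 5. A pixel already storing 3 keeps its 3 if the new fragment fails the depth test.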

Conclusion

To conclude, this blog post has broken down Vulkan’s stencil buffer in great detail, given examples of when it should be used, and the benefits that it provides.

Thank you for getting this far into the post. This has been one of the more in-depth ones, but I felt that this should be a useful reference for anyone learning about the stencil buffer, or for anyone who simply wants a less verbose way of being reminded how the internal equation works. If you are still confused by how this works then feel free to leave a question in the comments, or read through Vulkan’s specification on the area.

If you are interested in topics similar to this one then I have many other posts here: https://indiegamescreation.hashnode.dev/

I post weekly on Tuesdays. And if you would like to be reminded when I do, then you can subscribe to the mailing list below.

Vulkan's Stencil Buffer

An in-depth breakdown of how to use the criminally under-talked about stencil buffer.