Virtual Cameras
Discussion around game cameras and displaying views into virtual worlds.
Cameras
I have spent a lot of time this last week working to add multi-camera support into my custom game engine, so I figured that for this week’s blog I am going to talk about cameras. Virtual cameras that is, not real ones.
So… cameras?
A Very Basic Intro to Cameras
Real World Cameras:
Cameras in the real world are very complex things. They take in the light reflected off objects and store it as an image (physical or digital) through the use of mirrors, lenses, and image sensors. This essentially makes them passive observers of the world that they exist within. Virtual cameras act fairly similarly, with one massive difference.
This difference is that game cameras are not passive observers of their world. For something to appear in a virtual camera's output, the object has to go through numerous processing steps - some of which depend on the camera itself, such as view frustum culling. In reality, everything still gets hit by light regardless of whether that light is ever seen by a camera.
Rendering: Spaces
One question that quickly arises when thinking about rendering a 'camera' in an entirely fake world is: how does it see? Meaning, how does it move around the space and display what is there? It doesn't have the benefit of being in the real world, where light is abundant and reflecting off everything around it. Virtual worlds have to deliberately render - or not render - their objects.
For anyone who has watched Futurama you may be familiar with how the space ship in the show flies around the universe. It doesn’t move itself, it moves the universe around it.
This is exactly how game cameras work.
To explain how this is the case I'm gonna have to first explain the concept of 'spaces'. Spaces are essentially just relativity: if you describe something as being in the 'space' of something else, then you are describing it as being relative to that other thing. In terms of rendering a model, the spaces involved are: local, world, view, clip, and screen.
Each space is a relative state, with a transformation converting from one space to the next. For example, local space refers to a model's own space. So, for a unit cube, its vertex positions would range from -1 to 1 on each axis (or -0.5 to 0.5, depending on how you want to think about it). These coordinates are relative to the center of the model. But now say that cube is positioned somewhere in the game world. Its vertex positions cannot be from -1 to 1 anymore, otherwise it wouldn't have moved.
Imagine the cube is at (100, 100, 0) in the world; then its coordinates need to be centered on (100, 100, 0). To get there, a translation of (100, 100, 0) needs to be applied to the cube's local space coordinates. This is done through a model matrix, and results in world space positions.
After this is done for every model in the game then we would have a world full of models in their correct positions. Everything is great, right? Not exactly. How do we draw these models to the screen so that the player can see them?
The answer to that is having another space: the ‘View Space’. ‘View’ in this context refers to the camera. So what happens now is that we use a view matrix to transform all models in the world to be relative to the camera. Now the camera is the origin of its world, pointing forwards in its own space.
Great! Now everything is relative to the camera. The remaining steps are to transform the model's vertex data from view space into clip space using a projection matrix, perform the perspective divide to get Normalised Device Coordinates (NDC), and finally convert those to screen coordinates by applying a viewport transformation. The maths behind the projection matrix is a bit too complex for this blog, but in the future I may delve more into how the 3D to 2D projection works.
Logic to Move Around the World
Cameras tend to come in two variants: first person and third person. Mathematically they work very similarly, with some key differences.
First person
First person cameras have four very important internal vectors: position, forwards, up, and right.
Position is where the camera is in the world, and the other three vectors depict rotational data about the camera. They must be at 90 degrees to each other!
To look around the world using a first person camera, the forward, right, and up vectors need to be updated before being passed into the view matrix calculations. I have seen this done a couple of different ways, only one of which actually works properly. The first approach is to measure how far the mouse has moved since the prior frame and rotate the vectors directly by that movement, compounding the changes frame after frame.
Say the mouse moves to the right between two frames. That movement means the camera should rotate to look towards its right, i.e. a rotation around its local up vector. If this was done by simply converting the pixels moved into an angle, and multiplying the vectors by the Y rotation matrix for that angle, then you would get strange results. This is because the rotations from previous mouse movements compound into later frames, which is not what we want.
One way to fix this is to instead store two more variables: the total rotation from the vertical axis (left/right) and from the horizontal axis (up/down). Then, whenever the mouse is moved, the angle is added to this store, which is then used to rotate the vectors from their starting values. The code below shows this flow:
float lookSpeed   = 0.4f;
bool  lookChanged = false;

Vector2D<float> movementDelta = mouse->GetMousePositionDelta();

// If looking left/right
if (movementDelta.x != 0.0f)
{
    mAngleFromVertical -= deltaTime * movementDelta.x * lookSpeed;
    lookChanged = true;
}

// If looking up/down
if (movementDelta.y != 0.0f)
{
    mAngleFromHorizontal -= deltaTime * movementDelta.y * lookSpeed;
    lookChanged = true;
}

// Now calculate the new vectors
if (lookChanged)
    UpdateLookDirection();

void UpdateLookDirection()
{
    // Rotation around Y (left/right looking)
    float cosTheta = cos(mAngleFromVertical);
    float sinTheta = sin(mAngleFromVertical);

    Matrix3X3 yRotation;
    yRotation.SetFromArray({  cosTheta, 0.0f, sinTheta,
                              0.0f,     1.0f, 0.0f,
                             -sinTheta, 0.0f, cosTheta });

    // Rotation around X (up/down looking)
    cosTheta = cos(mAngleFromHorizontal);
    sinTheta = sin(mAngleFromHorizontal);

    Matrix3X3 xRotation;
    xRotation.SetFromArray({ 1.0f, 0.0f,      0.0f,
                             0.0f, cosTheta, -sinTheta,
                             0.0f, sinTheta,  cosTheta });

    // Apply both rotations to the camera's starting vectors
    Matrix3X3 combinedMatrix = yRotation * xRotation;

    mForward = combinedMatrix.MultiplyBy(mStartForward);
    mRight   = combinedMatrix.MultiplyBy(mStartRight);
    mUp      = combinedMatrix.MultiplyBy(mStartUp);
}
Third person
The key difference between first and third person cameras is the focal point. First person cameras simply look directly in front of themselves, whereas third person ones are focused on a point and rotate around it.
If you are using a maths library, such as glm, then this is very simple to implement as it has a 'look at' function:
// glm::lookAt(position, look at position, up)
viewMatrix = glm::lookAt(mPosition, mFocalPoint, mUp);
The only difference here is that for a first person camera, 'mFocalPoint' is replaced with (mPosition + mForward).
Then, when handling the mouse/controller movements, instead of the vectors being updated as they were before, they need to take into account that the camera is looking at a point and rotate around that point instead (usually with an enforced radius).
Conclusion
To conclude, this blog has covered what virtual cameras are, how they are represented within virtual worlds, the flow models take to have their data be shown on the screen, and the two most common types of cameras: first and third person.
In the next post I am going to be going into more detail as to more features that cameras usually have within a game engine, such as render to texture, view culling, the difference between perspective and orthographic projections, and multi-camera rendering. Till then :D