Planetary Terrain

Building a game that takes place on a realistically-scaled planet is difficult. Detailed geometry and texture data is too large to be represented explicitly, requiring procedural methods to populate it, coupled with level-of-detail and streaming for visualization and collision detection within areas of interest to the player. Numerical precision complicates things further: f32 coordinates have a resolution of about half a meter at one earth-radius (~6 million meters) away from the origin, and f64 operations on a GPU are unreasonably expensive. Despite these challenges, planetary-scale terrain is feasible even on low-end commodity hardware. This post discusses the implementation found in my planetmap library, and the vertex shader methods that work best with it.
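The precision figure is easy to check directly: the spacing between adjacent f32 values at one Earth radius can be computed by bit manipulation. A small Rust sketch (`f32_resolution_at` is an illustrative helper, not part of planetmap):

```rust
// Spacing between adjacent f32 values ("ulp") at a given magnitude.
// f32 has a 24-bit significand, so in [2^22, 2^23) meters the spacing
// is 2^-1 = 0.5 m, while near the origin it is far finer.
fn f32_resolution_at(x: f32) -> f32 {
    f32::from_bits(x.to_bits() + 1) - x
}
```

At one Earth radius (~6.371e6 m) this yields exactly half a meter.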

Background: Flat Terrain

Some of these problems are well studied for flat terrain, i.e. terrain without any large-scale curvature, either continuing forever or bounded by impassable walls. Popular approaches represent material and geometry data with 2D textures and heightmaps respectively, which are used to color and vertically displace a square grid. When these are too large to be loaded all at once, they're divided into chunks that can be passed to the GPU and physics engine as needed, each corresponding to e.g. a 17x17 region of the grid. Neighboring chunks share the data along their common edge, a one-sample overlap that enables continuity without interpolation across chunk boundaries.

To handle the level-of-detail necessary to support arbitrarily large view distances, progressively lower-detail chunks can be organized in a quadtree, with the highest-detail chunks forming the leaves. Low-detail chunks contain the same number of vertices and texels as high-detail chunks, but spread them over an exponentially larger area. Determining the set of chunks needed to render a particular view point is then a matter of iterating through the tree from the root, stopping iteration along a branch and adding the current node to the set to be rendered whenever sufficient level of detail is reached. Stopping if any children of a node haven't yet been streamed onto the GPU further allows streaming to be fully asynchronous, ensuring gap-free terrain can be rendered from the moment the lowest level of detail is available.
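The traversal can be sketched as follows, with an illustrative `Chunk` type and a pluggable residency check standing in for real streaming state (none of these names are planetmap's API, and the detail criterion here is a simple distance threshold):

```rust
#[derive(Clone, Copy)]
struct Chunk {
    depth: u32, // 0 = root; each level doubles linear detail
    x: u32,
    y: u32, // position within this level's grid
}

impl Chunk {
    fn size(&self) -> f64 {
        1.0 / (1u64 << self.depth) as f64 // root covers [0, 1]^2
    }
    fn center(&self) -> (f64, f64) {
        let s = self.size();
        ((self.x as f64 + 0.5) * s, (self.y as f64 + 0.5) * s)
    }
    fn children(&self) -> [Chunk; 4] {
        let (depth, x, y) = (self.depth + 1, self.x * 2, self.y * 2);
        [
            Chunk { depth, x, y },
            Chunk { depth, x: x + 1, y },
            Chunk { depth, x, y: y + 1 },
            Chunk { depth, x: x + 1, y: y + 1 },
        ]
    }
}

/// Walk the tree from the root, emitting the coarsest set of chunks
/// that satisfies the detail criterion and is resident on the GPU.
fn select(
    chunk: Chunk,
    viewer: (f64, f64),
    max_depth: u32,
    resident: &impl Fn(Chunk) -> bool,
    out: &mut Vec<Chunk>,
) {
    let (cx, cy) = chunk.center();
    let dist = ((viewer.0 - cx).powi(2) + (viewer.1 - cy).powi(2)).sqrt();
    // Detail is sufficient once the chunk is small relative to its distance
    let detailed_enough = dist > 2.0 * chunk.size();
    let children = chunk.children();
    let all_resident = children.iter().all(|&c| resident(c));
    if detailed_enough || chunk.depth == max_depth || !all_resident {
        out.push(chunk);
    } else {
        for &child in &children {
            select(child, viewer, max_depth, resident, out);
        }
    }
}
```

The walk always either emits a node or recurses into all four of its children, so the emitted set partitions the terrain with no gaps regardless of which chunks happen to be resident.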

A set of terrain chunks with varying detail

Particular care must be taken when rendering chunks with differing levels of detail that share an edge. A naive approach produces "T-junctions": shared edges represented with a single pair of vertices in the low-detail chunk, but three or more vertices in the high-detail chunk. Even if the middle vertices are shifted to form a straight line, rounding error means that hairline cracks can appear when viewed from the right angle. The middle vertices must be somehow removed, a process referred to as "stitching."

T-junctions inside the edge between two chunks

Generating the grid geometry for heightmapping on the fly in a vertex shader allows dynamically adapting the detail of each edge of a chunk by collapsing extra vertices into their neighbors. When the number of vertices along the edge of a chunk is one greater than a power of two, this can be accomplished by masking out the N least-significant bits of the integer position of the vertex in the grid, where N is the difference in quadtree depth between the current chunk and the lower-detail neighboring chunk, the "LoD delta". N must always be at least 0; negative deltas are ignored, as a mismatch will always be handled by the higher-detail chunk.

// Integer position of this vertex within the chunk's grid
uvec2 coords = uvec2(gl_VertexIndex % VERTICES_PER_EDGE,
                     gl_VertexIndex / VERTICES_PER_EDGE);
// Along each edge bordering a lower-detail neighbor, clear the N
// least-significant bits of the coordinate that runs along the edge
if (coords.y == 0) {
    coords.x &= ~0u << top_lod_delta;
} else if (coords.y == VERTICES_PER_EDGE - 1) {
    coords.x &= ~0u << bottom_lod_delta;
}
if (coords.x == 0) {
    coords.y &= ~0u << left_lod_delta;
} else if (coords.x == VERTICES_PER_EDGE - 1) {
    coords.y &= ~0u << right_lod_delta;
}
// Normalize to [0, 1]^2 for interpolation across the chunk
vec2 normalized_coords = vec2(coords) / vec2(VERTICES_PER_EDGE - 1);

This clamps vertices whose coordinates are not divisible by 2^N to the closest preceding vertex whose coordinates are. Because the neighboring chunk is exactly 2^N times as large, the only remaining distinct vertex positions are those which align perfectly with the neighbor's. The constraint on the total number of vertices per edge ensures that the first and last vertex are never moved, so long as 2^N does not exceed the number of vertices per edge minus one.
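The mask's behavior can be exercised on the CPU to verify these claims; a minimal Rust sketch (`stitch` is an illustrative name):

```rust
// 17 vertices per edge: one more than the power-of-two interval count.
const VERTICES_PER_EDGE: u32 = 17;

// Collapse a coordinate along a chunk edge onto the grid of a neighbor
// `delta` quadtree levels coarser by clearing its low bits.
fn stitch(coord: u32, delta: u32) -> u32 {
    coord & (u32::MAX << delta)
}
```

For every delta up to 4, the endpoints 0 and 16 are fixed points, and every surviving position is a multiple of 2^delta, i.e. lands exactly on a vertex of the coarser neighbor.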

Stitching removes T-junctions

Additional discussion of this approach, including a GPU-driven streaming architecture, can be found in the excellent GDC slides on Far Cry 5.

This model also gracefully supports physics. A fixed level of detail, near or at the maximum, should be used for consistent results. Care must only be taken to load chunks before any physical interaction might occur with terrain. This can be accomplished by integrating closely with your physics engine's collision detection logic, loading height data synchronously as needed to resolve queries. Unlike rendering, an asynchronous solution is not generally viable, but the risk of hitching is lower: physics only needs data for the exact chunks in which collision queries are being evaluated, typically far less data than an entire view frustum.

Topology

Unfortunately, data organized in 2D as outlined above is poorly suited to representing the surface of a sphere. Traditional solutions such as equirectangular projections allocate wildly inconsistent precision to different regions of the surface, leading to poor performance, inconsistent quality, and severe artifacts due to singularities at the poles. Cube maps present a solution: while typically applied to represent environments rather than surfaces, they define a way to map six 2D structures onto a spherical domain with no singularities and only modest distortion. Using the tools from Zucker and Higashi's "Cube-to-sphere Projections for Procedural Texturing and Beyond" we can further improve the distribution of precision to nearly optimal levels at low cost.

A cube map projected onto a sphere

We apply a cube map projection (or variation thereof) by constructing a quadtree for each cube map face. Operations that evaluate chunk neighborhoods, e.g. to compute level-of-detail deltas, must then be extended to traverse between faces when necessary.

The procedure used to generate geometry for displacement by a heightmap is also affected. Whereas each chunk of flat terrain starts as the same uniform grid (modulo scaling and stitching), distortion gives chunks of a cube map a huge variety of slightly different shapes, making on-the-fly grid generation all the more important. The solution is simple: determine the location of each grid vertex on its face of the cube map, invert the look-up operation that finds the cube-map coordinates for a point on the sphere, and finally scale to the desired altitude.
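As a concrete if simplified sketch, here is the plain gnomonic mapping for a single cube face in Rust. The projections in the paper cited above are better-conditioned refinements, but the inversion pattern is the same. Only the +Z face is handled, and all names are illustrative:

```rust
// Map face coordinates (u, v) in [-1, 1]^2 on the +Z cube face to a
// unit direction on the sphere (gnomonic projection).
fn face_to_sphere(u: f64, v: f64) -> [f64; 3] {
    let len = (u * u + v * v + 1.0).sqrt();
    [u / len, v / len, 1.0 / len]
}

// Inverse look-up: recover face coordinates from a direction whose
// dominant axis is +Z.
fn sphere_to_face(dir: [f64; 3]) -> (f64, f64) {
    (dir[0] / dir[2], dir[1] / dir[2])
}

// A chunk vertex at altitude h above the reference radius r: project
// the grid position onto the sphere, then scale to the surface.
fn vertex_position(u: f64, v: f64, r: f64, h: f64) -> [f64; 3] {
    let d = face_to_sphere(u, v);
    let s = r + h;
    [d[0] * s, d[1] * s, d[2] * s]
}
```

The round trip is exact up to floating-point rounding, and every generated vertex lies at precisely the requested distance from the planet's origin.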

Procedural generation also must be modified to account for spherical topology. Two-dimensional noise functions cannot be used. Luckily, gradient (e.g. simplex) noise, the usual primitive building block for procedural heightmaps, extends naturally to three dimensions. Continuous, distortion-free height data can therefore be obtained by sampling a spherical section of a 3D noise function.
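To illustrate why a 3D domain sidesteps the problem, here is a toy heightmap sampler in Rust. Hash-based value noise stands in for the gradient/simplex noise a real generator would use; the point is only that the noise domain is three-dimensional, so points on the sphere can be fed to it directly. All names are illustrative:

```rust
// Deterministic lattice hash producing a value uniform in [0, 1).
fn hash(x: i64, y: i64, z: i64) -> f64 {
    let mut h = (x as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15)
        ^ (y as u64).wrapping_mul(0xC2B2_AE3D_27D4_EB4F)
        ^ (z as u64).wrapping_mul(0x1656_67B1_9E37_79F9);
    h ^= h >> 33;
    h = h.wrapping_mul(0xFF51_AFD7_ED55_8CCD);
    h ^= h >> 33;
    (h >> 11) as f64 / (1u64 << 53) as f64
}

fn smooth(t: f64) -> f64 {
    t * t * (3.0 - 2.0 * t) // smoothstep for continuous blending
}

/// Trilinearly interpolated value noise over a 3D lattice.
fn value_noise(p: [f64; 3]) -> f64 {
    let base = [p[0].floor(), p[1].floor(), p[2].floor()];
    let f = [
        smooth(p[0] - base[0]),
        smooth(p[1] - base[1]),
        smooth(p[2] - base[2]),
    ];
    let mut v = 0.0;
    for corner in 0..8u32 {
        let d = [corner & 1, (corner >> 1) & 1, (corner >> 2) & 1];
        let mut w = 1.0;
        for axis in 0..3 {
            w *= if d[axis] == 1 { f[axis] } else { 1.0 - f[axis] };
        }
        v += w * hash(
            base[0] as i64 + d[0] as i64,
            base[1] as i64 + d[1] as i64,
            base[2] as i64 + d[2] as i64,
        );
    }
    v
}

/// Height for a unit direction on the sphere: sample the 3D field
/// along that direction at some feature frequency.
fn height(dir: [f64; 3], frequency: f64) -> f64 {
    value_noise([dir[0] * frequency, dir[1] * frequency, dir[2] * frequency])
}
```

Because the field is defined everywhere in 3D, nearby directions on the sphere always sample nearby points, with no seams or polar pinching.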

Numerical Precision

Planets large enough to present numerical precision issues, a threshold Earth-like planets far surpass, require additional care. Coordinates relative to the planet's origin cannot, under any circumstances, be stored in f32 precision; doing so results in catastrophic visual jittering as rounding errors produce huge frame-to-frame changes in vertex positions. Off-the-shelf physics engines like Rapier can usually be built to work with f64 precision, more than adequate for even the largest planets (around 10 nm of resolution at one Jupiter radius). However, using f64 on the GPU incurs massive overhead, running at anywhere from 1/16th to 1/64th the speed of f32 on high-end desktop GPUs. We must therefore find a way to render with only f32.

The GPU has little use for absolute coordinates in the first place. All it needs is coordinates relative to the viewer, and the required precision falls off with distance: a fixed angular error on screen corresponds to an absolute error proportional to distance from the viewer. Computing model and view matrices on the CPU at f64 precision, composing them, and only then rounding down to f32 for the GPU as a combined model-view matrix ensures adequate precision when rendering objects whose f64 coordinates are known on the CPU. Unfortunately, this trick is insufficient for terrain: otherwise-insignificant rounding errors in the model-view matrices of neighboring chunks can introduce hairline cracks, just like T-junctions.
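The difference is easy to demonstrate. In this Rust sketch, translations stand in for full model and view matrices (the rounding behavior is identical), with the viewer standing on the surface of an Earth-sized planet; both function names are illustrative:

```rust
// Compose the viewer-relative transform at f64, rounding only the
// final result to f32 for the GPU.
fn relative_f64(object: f64, camera: f64) -> f32 {
    (object - camera) as f32
}

// Round both absolute positions to f32 first, then compose on the
// "GPU": the camera position itself can no longer be represented.
fn relative_f32(object: f64, camera: f64) -> f32 {
    object as f32 - camera as f32
}
```

With the object at 6,371,000 m and the camera 0.3 m away, the f64 path recovers the 0.3 m offset, while the f32 path reports half a meter: the camera position snapped to the nearest representable value, 6,371,000.5.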

Instead, we can forgo evaluation of model and view matrices on the GPU entirely. For each chunk to be rendered, we compute the positions and normals of its corners (before heightmap displacement) and bring them into view space using a single f64-precision planet-to-view transform on the CPU. The use of a single global transform ensures that a corner always produces the same view-space coordinates, no matter which chunk it's associated with. The vertex shader can then construct the remaining vertices within the chunk by bilinear interpolation.

Special care is required when interpolating to guarantee bitwise-exact results between chunks, even when neighboring chunks interpolate in opposite directions. For example, the GLSL mix primitive is implemented by some vendors as a + t * (b - a), where rounding error in b - a can cause mix(a, b, 1) to differ from mix(b, a, 0).

// Numerically robust bilinear interpolation
vec3 bilerp(vec3 a, vec3 b, vec3 c, vec3 d, vec2 coords) {
    vec3 y0 = a * (1 - coords.x) + b * coords.x;
    vec3 y1 = c * (1 - coords.x) + d * coords.x;
    return y0 * (1 - coords.y) + y1 * coords.y;
}
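The endpoint-exactness of the symmetric form can be verified numerically; a Rust sketch contrasting the two formulations with planet-scale operands (function names are illustrative):

```rust
// The symmetric form used in the bilerp above: reproduces each
// endpoint bitwise regardless of operand order.
fn lerp_symmetric(a: f32, b: f32, t: f32) -> f32 {
    a * (1.0 - t) + b * t
}

// The a + t * (b - a) form some mix implementations use: rounding in
// b - a can destroy the endpoint entirely.
fn lerp_one_sided(a: f32, b: f32, t: f32) -> f32 {
    a + t * (b - a)
}
```

With a = 6,371,000.5 and b = 0.1, the one-sided form computes b - a as exactly -a (0.1 is far below the 0.5 m spacing of f32 at that magnitude), so "interpolating" all the way to b yields 0.0 instead of 0.1. The symmetric form returns b bit-for-bit from either direction.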

Bilinear interpolation alone will give poor results at low levels of detail (i.e. for large chunks) because it ignores the curvature of the planet. This error is corrected with some trigonometry. Consider four points, all defined in view space:

  • C, some corner of the chunk
  • I, the interpolated point within the chunk
  • X, the desired point on the surface of the sphere, lying somewhere along the normal extending from I
  • O, the origin of the planet
Chunk geometry

To avoid cracks, C should be chosen consistently between vertices shared by neighboring chunks.

C and I are known. ∠CIX is the angle between CI and the surface normal at I, and hence easily computed. Because both C and X are on the surface of a sphere centered on O, the triangle CXO is necessarily an isosceles triangle, whose vertex angle ∠COX we can compute by taking the angle between the surface normals at I and C, from which the base angle ∠CXO, and therefore ∠CXI, follows:

Derivation of CXO

The dot products will often yield values very close to 1, and due to rounding error may actually exceed it, so inputs to acos should be clamped to at most 1.

Having two angles of the triangle CIX, we can derive the third and apply the law of sines to obtain X:

Derivation of X

Throughout this computation we've operated exclusively in terms of nearby coordinates, unit vectors, and angles, maintaining numerical stability and bit-perfect consistency between chunks despite the arbitrarily large scale of the planet. Finally, heightmap displacement is applied by shifting X along its normal by the desired distance.
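Putting the pieces together, the construction reads as follows in code. This is a Rust sketch at f64 precision for checkability, though the shader itself runs in f32; `snap_to_sphere` and the vector helpers are illustrative names, not planetmap's API:

```rust
use std::f64::consts::PI;

fn sub(a: [f64; 3], b: [f64; 3]) -> [f64; 3] {
    [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
}
fn dot(a: [f64; 3], b: [f64; 3]) -> f64 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}
fn length(v: [f64; 3]) -> f64 {
    dot(v, v).sqrt()
}
fn normalize(v: [f64; 3]) -> [f64; 3] {
    let l = length(v);
    [v[0] / l, v[1] / l, v[2] / l]
}

/// Push the interpolated point I out to the sphere's surface, given a
/// chunk corner C and the planet origin O, using only local offsets,
/// unit normals, and angles.
fn snap_to_sphere(c: [f64; 3], i: [f64; 3], o: [f64; 3]) -> [f64; 3] {
    let ci = sub(c, i);
    if length(ci) == 0.0 {
        return c; // I is the corner itself
    }
    let n_i = normalize(sub(i, o)); // surface normal at I
    let n_c = normalize(sub(c, o)); // surface normal at C
    // Vertex angle ∠COX of the isosceles triangle CXO; clamp the dot
    // product, which rounding can push slightly above 1
    let angle_o = dot(n_i, n_c).min(1.0).acos();
    let angle_x = (PI - angle_o) / 2.0; // base angle; ∠CXI = ∠CXO
    // Angle ∠CIX between IC and the outward normal at I
    let angle_i = dot(normalize(ci), n_i).min(1.0).acos();
    let angle_c = PI - angle_i - angle_x; // third angle, ∠ICX
    // Law of sines: |IX| / sin(∠ICX) = |CI| / sin(∠CXI)
    let d = length(ci) * angle_c.sin() / angle_x.sin();
    [i[0] + n_i[0] * d, i[1] + n_i[1] * d, i[2] + n_i[2] * d]
}
```

On a unit sphere, snapping the chord midpoint I = (0.5, 0.5, 0) between corners at (1, 0, 0) and (0, 1, 0) yields X = (√2/2, √2/2, 0), exactly back on the surface.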

Remarkably, f32 precision is adequate for most noise functions used to generate heightmap data. This is due to the definition of the heightmap as relative to a reference "surface" level, ensuring the dangerously large distance from the planet's origin need not be involved in any computation, and the highly stable nature of linear combinations such functions use to blend gradient data. However, this precision is unlikely to hold up to the demands of finite differencing; for high quality bump/normal/derivative maps, instead use a noise function which computes derivatives directly.

Despite all this effort, a conventional projection matrix will yield severe Z fighting artifacts if the far clip plane is set far enough away to appreciate what we've accomplished. A reversed f32 depth buffer should be used instead, enabling infinite view distances without significant Z fighting. If the desired left, right, top, and bottom frustum planes are positioned at the angles l, r, t, and b with respect to forwards, we can compute a suitable projection as follows:

projection matrix

Combined with a "Greater" depth test, this provides good behavior even at astronomical distances. For further discussion, see NVIDIA's "Depth Precision Visualized".
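One standard construction of such a matrix can be sketched in Rust. This is a generic reversed-Z infinite projection (not necessarily the exact matrix pictured above), assuming a right-handed view space looking down -z, row-major storage with clip = M · (x, y, z, 1), depth 1 at distance `near`, and depth approaching 0 at infinity:

```rust
// Reversed-Z infinite perspective projection from frustum plane
// angles l, r, t, b (radians from the forward axis) and a near
// distance. Depth = near / -z: 1 at the near plane, -> 0 at infinity.
fn projection(l: f32, r: f32, t: f32, b: f32, near: f32) -> [[f32; 4]; 4] {
    let (tl, tr, tt, tb) = (l.tan(), r.tan(), t.tan(), b.tan());
    [
        [2.0 / (tl + tr), 0.0, (tr - tl) / (tl + tr), 0.0],
        [0.0, 2.0 / (tt + tb), (tt - tb) / (tt + tb), 0.0],
        [0.0, 0.0, 0.0, near],
        [0.0, 0.0, -1.0, 0.0],
    ]
}

// Apply the projection to a view-space point, yielding clip coordinates.
fn transform(m: [[f32; 4]; 4], p: [f32; 3]) -> [f32; 4] {
    let mut out = [0.0; 4];
    for row in 0..4 {
        out[row] = m[row][0] * p[0] + m[row][1] * p[1] + m[row][2] * p[2] + m[row][3];
    }
    out
}
```

A point on the near plane straight ahead lands at depth 1, arbitrarily distant points approach depth 0 without ever reaching it (matching the "Greater" depth test), and a point on the right frustum plane maps to x_ndc = 1.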