Here’s the “settle it once and for all” mental model I’d teach for an “infinite”, real-time editable voxel world in Unity—built from the same core ideas you see across Minecraft-style engines (chunking + surface meshing) and Claybook-style tech (GPU-friendly volumetrics like SDFs). Claybook is a great reminder that the “voxel” you store doesn’t have to be a block ID; it can be a field (distance/density) that you update efficiently on the GPU.
Below is the big-picture tutorial: the problems that kill performance, and the design choices that make it scale.
There are two dominant “SOTA” families. The first is block/ID voxels (Minecraft-style):
- Each cell stores a material ID (air/stone/grass…).
- Rendering is a surface extraction problem: turn visible faces into triangles.
- Best for: crisp blocks, lots of edits, simple physics, simple networking.
The second is volumetric voxels (a density or SDF field, Claybook-style):
- Each cell stores a scalar (density or signed distance), possibly plus a material.
- Rendering is iso-surface extraction (marching cubes / dual contouring) or raymarching in special cases.
- Best for: smooth terrain, sculpting, blends, booleans, unified collision queries.
- Claybook specifically models clay as SDF volumes and leans heavily on GPU workflows for updates.
The “best approach” for maximum performance in a giant editable world is usually:
- Block voxels for gameplay-scale building (fast edits, cheap meshing).
- Optional SDF/density only where you truly need smooth deformation (local regions, caves, terrain layers, etc.).
Trying to make everything globally smooth and editable at infinite scale is doable, but it’s a heavier streaming + LOD + meshing problem.
An infinite voxel world only works if edits and rendering are local.
So every system must obey:
- O(edited area) updates (not O(world), not O(chunk columns), not O(all meshes)).
- Work bounded per frame (hard cap on how many chunks can rebuild/upload).
This leads to the cornerstone pattern…
You split the world into fixed-size chunks (typically cubic, e.g. 16³, 32³, 64³). Minecraft effectively streams and builds in chunk/section units, and most voxel meshing discussions orbit that model.
Key idea:
- Chunk size is not about “the world grid.”
- It’s about the cost of:
- rebuild mesh
- upload mesh to GPU
- cull / draw it
- keep it in memory
Group chunks into regions (e.g. 8×8×8 chunks) so you can load/save in big contiguous blobs.
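The chunk/region indexing above is mostly floor division. A minimal sketch in Python (CHUNK_SIZE and REGION_SIZE are illustrative values, not recommendations):

```python
# Sketch: mapping world-space voxel coordinates to chunk and region
# indices with floor division, so negative coordinates work too.

CHUNK_SIZE = 32          # voxels per chunk edge (illustrative)
REGION_SIZE = 8          # chunks per region edge (illustrative)

def chunk_coord(wx: int, wy: int, wz: int) -> tuple[int, int, int]:
    """Chunk index containing a world-space voxel (floor division)."""
    return (wx // CHUNK_SIZE, wy // CHUNK_SIZE, wz // CHUNK_SIZE)

def local_coord(wx: int, wy: int, wz: int) -> tuple[int, int, int]:
    """Voxel position inside its chunk (always in [0, CHUNK_SIZE))."""
    return (wx % CHUNK_SIZE, wy % CHUNK_SIZE, wz % CHUNK_SIZE)

def region_coord(cx: int, cy: int, cz: int) -> tuple[int, int, int]:
    """Region index containing a chunk, for load/save in big blobs."""
    return (cx // REGION_SIZE, cy // REGION_SIZE, cz // REGION_SIZE)

print(chunk_coord(-1, 0, 33))   # (-1, 0, 1): negatives floor correctly
print(local_coord(-1, 0, 33))   # (31, 0, 1)
```

Python’s `//` and `%` already floor toward negative infinity; in C# you’d need an explicit floor-div helper, since integer division truncates toward zero.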
For far distances, don’t keep “real chunks” forever. Keep LOD rings around the camera (like voxel clipmaps):
- Near: real, editable, high-res chunks.
- Mid/Far: progressively lower-res representations, derived from the near data or regenerated from seeds.
This is one of the big “Nanite-adjacent” lessons: virtualize what you can, keep only what’s needed resident, and make LOD selection cheap.
“Infinite” means coordinates are unbounded; memory is not.
So you want a structure where empty space is cheap:
Use a hierarchy like:
- World hash map (or sparse grid) → region → chunk → sub-chunk/sections.
- Inside a chunk:
- Palette + bitpacking (store “air/stone/dirt” as small indices).
- Run-length encoding or “homogeneous chunk” flags (all air/all stone = almost free).
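A toy sketch of the palette idea in Python (a real engine would bit-pack the index array into machine words; `PalettedChunk` is a hypothetical name):

```python
# Sketch: palette compression for a chunk. Each voxel stores a small
# index into a per-chunk palette of materials instead of a full ID,
# and fully homogeneous chunks (all air / all stone) can skip meshing.

class PalettedChunk:
    def __init__(self, size: int = 32):
        self.size = size
        self.palette = ["air"]          # index 0 = air by convention
        self.indices = [0] * size**3    # one palette index per voxel

    def _palette_index(self, material: str) -> int:
        if material not in self.palette:
            self.palette.append(material)
        return self.palette.index(material)

    def set(self, x, y, z, material: str):
        i = (y * self.size + z) * self.size + x
        self.indices[i] = self._palette_index(material)

    def get(self, x, y, z) -> str:
        i = (y * self.size + z) * self.size + x
        return self.palette[self.indices[i]]

    def is_homogeneous(self) -> bool:
        """All voxels identical: store a single flag, mesh nothing."""
        first = self.indices[0]
        return all(v == first for v in self.indices)

    def bits_per_voxel(self) -> int:
        """With bit-packing, index width grows with palette size."""
        return max(1, (len(self.palette) - 1).bit_length())
```

The payoff: a chunk that only ever sees 4 materials needs 2 bits per voxel instead of 16+ for a raw material ID.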
Use bricks (small 3D tiles) stored sparsely:
- Only allocate bricks near surfaces or edited areas.
- You can also store a narrow band around the surface (Claybook uses banded SDF thinking; the general principle is “don’t store what you don’t need”).
The rule:
- Your storage format should make “nothing here” extremely cheap.
To turn voxels into triangles, you have three typical options:
- Naïve face culling (emit a quad for every exposed face)
- Simple, surprisingly OK for prototypes.
- Scales poorly.
- Greedy meshing (merge adjacent coplanar faces into big quads)
- Huge reduction in triangles/vertices.
- Standard “grown-up” approach for Minecraft-like worlds.
- Meshlets / cluster-based culling (GPU-driven direction)
- Treat chunk meshes as clusters.
- Build compact clusters and let GPU cull aggressively (like “Nanite thinking,” but applied to voxel surfaces).
- More complex, but it’s where maximum scale goes.
Important real-time-editing reality: greedy meshing is great, but rebuilding it too often can hitch—so edits must produce localized remesh requests and you must throttle rebuilds.
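A minimal 2D greedy-merge sketch in Python, operating on one slice of exposed faces (a full mesher runs this per axis, per layer, and per face direction; this is the core rectangle-growing idea, not a complete mesher):

```python
# Sketch: greedy meshing on a single 2D mask of exposed faces (one
# axis slice of a chunk). Adjacent identical faces merge into maximal
# rectangles, hugely reducing quad count versus one quad per face.

def greedy_quads(mask):
    """mask: 2D list of face values (None = no face). Returns
    (x, y, w, h, value) rectangles covering all non-None cells."""
    h, w = len(mask), len(mask[0])
    used = [[False] * w for _ in range(h)]
    quads = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] is None or used[y][x]:
                continue
            v = mask[y][x]
            # Grow the quad rightward while faces match.
            qw = 1
            while x + qw < w and mask[y][x + qw] == v and not used[y][x + qw]:
                qw += 1
            # Grow downward while the entire row matches.
            qh = 1
            while y + qh < h and all(
                mask[y + qh][x + i] == v and not used[y + qh][x + i]
                for i in range(qw)
            ):
                qh += 1
            for dy in range(qh):
                for dx in range(qw):
                    used[y + dy][x + dx] = True
            quads.append((x, y, qw, qh, v))
    return quads

# A 4x4 slice of stone with one hole: 4 quads instead of 15 faces.
slice_ = [["s"] * 4 for _ in range(4)]
slice_[1][2] = None
print(len(greedy_quads(slice_)))   # 4
```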
For density/SDF worlds, the picture changes:
- Surface extraction is usually marching cubes / dual contouring style.
- Edits are brush operations that modify density/SDF values in a region; only affected chunks rebuild.
- This is where GPU compute can shine for both edit application and meshing, if you’re careful about data movement.
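A sketch of a sphere brush against a sparse SDF grid in Python (`apply_sphere_brush` is a hypothetical helper; union and subtraction are the standard min/max SDF booleans, and the grid-to-world mapping is 1:1 for simplicity):

```python
# Sketch: a sphere brush on a sparse SDF grid. Only voxels inside the
# brush's bounding box are touched; the returned set of touched voxels
# is what feeds dirty-chunk marking downstream.

import math

def apply_sphere_brush(sdf, cx, cy, cz, r, add=True):
    """sdf: dict (x,y,z) -> signed distance (missing = far outside).
    Returns the set of touched voxel coords."""
    touched = set()
    lo = lambda c: int(math.floor(c - r)) - 1
    hi = lambda c: int(math.ceil(c + r)) + 1
    for x in range(lo(cx), hi(cx) + 1):
        for y in range(lo(cy), hi(cy) + 1):
            for z in range(lo(cz), hi(cz) + 1):
                d_sphere = math.dist((x, y, z), (cx, cy, cz)) - r
                d_old = sdf.get((x, y, z), 1e9)
                if add:
                    d_new = min(d_old, d_sphere)   # union: add material
                else:
                    d_new = max(d_old, -d_sphere)  # subtraction: carve
                if d_new != d_old:
                    sdf[(x, y, z)] = d_new
                    touched.add((x, y, z))
    return touched

sdf = {}
touched = apply_sphere_brush(sdf, 0.0, 0.0, 0.0, 2.0)
print(sdf[(0, 0, 0)])   # -2.0: two units inside the surface
```

On the GPU this becomes a compute dispatch over the brush’s bounding box, which is exactly the O(edited area) property from earlier.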
An edit touches more than just a voxel value:
When a voxel changes you may need to update:
- the chunk’s mesh (and neighbor chunk borders)
- collision
- lighting / AO
- navigation
- gameplay metadata (support, destruction, fluid, etc.)
The winning strategy is a dirty-propagation graph:
- Apply edit to voxel data (CPU or GPU).
- Mark dirty chunks:
- Always include the edited chunk.
- Include neighbor chunks if you touched a boundary (because faces/isosurfaces cross chunk edges).
- Enqueue jobs:
- mesh rebuild job
- collider rebuild job (often at lower frequency)
- lighting update job
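The dirty-marking rule can be sketched like this in Python (face neighbors only; ambient occlusion may additionally require diagonal neighbors):

```python
# Sketch: computing which chunks to remesh after editing one voxel.
# The edited chunk is always dirty; a face neighbor is added whenever
# the voxel sits on a chunk boundary, since surfaces cross chunk edges.

CHUNK_SIZE = 32  # illustrative

def dirty_chunks(wx, wy, wz):
    cx, cy, cz = wx // CHUNK_SIZE, wy // CHUNK_SIZE, wz // CHUNK_SIZE
    lx, ly, lz = wx % CHUNK_SIZE, wy % CHUNK_SIZE, wz % CHUNK_SIZE
    dirty = {(cx, cy, cz)}
    for local, axis in ((lx, 0), (ly, 1), (lz, 2)):
        offset = [0, 0, 0]
        if local == 0:
            offset[axis] = -1              # low boundary: -axis neighbor
        elif local == CHUNK_SIZE - 1:
            offset[axis] = 1               # high boundary: +axis neighbor
        if offset != [0, 0, 0]:
            dirty.add((cx + offset[0], cy + offset[1], cz + offset[2]))
    return dirty

print(dirty_chunks(5, 5, 5))     # interior edit: 1 chunk
print(dirty_chunks(0, 5, 31))    # boundary edit: 3 chunks
```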
You do not rebuild everything immediately.
You process rebuild queues with a per-frame cap:
- “max N chunks mesh uploads per frame”
- “max M collider rebuilds per second”
This is how you stay hitch-free no matter how chaotic edits get.
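A sketch of the budgeted queue in Python (MAX_REMESH_PER_FRAME is an illustrative knob; in Unity the rebuild callback would schedule a Burst job rather than mesh synchronously):

```python
# Sketch: a per-frame budget for mesh rebuilds. Dirty chunks queue up,
# and each frame drains at most MAX_REMESH_PER_FRAME of them, so a
# burst of edits spreads its cost across frames instead of hitching.

from collections import deque

MAX_REMESH_PER_FRAME = 4  # illustrative

class RemeshQueue:
    def __init__(self):
        self.queue = deque()
        self.pending = set()   # dedupe: a chunk queues at most once

    def mark_dirty(self, chunk):
        if chunk not in self.pending:
            self.pending.add(chunk)
            self.queue.append(chunk)

    def process_frame(self, rebuild):
        """Call once per frame; returns how many chunks were rebuilt."""
        done = 0
        while self.queue and done < MAX_REMESH_PER_FRAME:
            chunk = self.queue.popleft()
            self.pending.discard(chunk)
            rebuild(chunk)
            done += 1
        return done

q = RemeshQueue()
for c in range(10):          # a chaotic burst of 10 dirty chunks
    q.mark_dirty(c)
built = []
print(q.process_frame(built.append))   # 4: the rest wait for later frames
```

The dedupe set matters: a voxel painted 50 times in one frame should enqueue its chunk once, not 50 times.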
Unity-specific big idea: the main thread should mostly:
- schedule work
- upload finished meshes/buffers
- issue draw calls (or feed GPU-driven draws)
Everything else becomes jobs/compute.
- Chunk mesh generation runs in Burst + Jobs (or Entities/DOTS).
- Results written into native buffers.
- Main thread converts them into Mesh objects (or uses MeshData API) and uploads.
- Store chunk voxel data in GPU buffers/textures.
- Run compute shaders for:
- edit application
- surface extraction / compaction
- Use indirect drawing patterns where possible (GPU-driven).
Claybook is a proof point that heavy voxel/field manipulation can be GPU-native.
In practice, many “best possible” Unity voxel engines end up hybrid:
- CPU jobs for meshing (simpler debugging, easier data locality)
- GPU compute for expensive “field math” or special effects
- GPU-driven instancing/draw for throughput
You need three sources of truth, layered:
- Procedural base (seeded noise / rules): generates chunks deterministically.
- Delta edits (player modifications): stored as compact diffs against the base.
- Baked caches (optional): prebuilt meshes or compressed chunk payloads for fast loads.
When a chunk comes into range:
- If it exists on disk, load base+delta (or a full snapshot).
- Otherwise, generate the base deterministically from the world seed.
- Apply any pending deltas.
- Enqueue a mesh build.
When a chunk leaves range:
- keep it in an LRU cache (memory permitting)
- or serialize it out as region data
This is the “infinite world” loop: predictable paging.
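A sketch of that paging loop in Python, with an OrderedDict as the LRU (`generate` stands in for seed-based generation plus delta application; serialization on eviction is elided):

```python
# Sketch: the paging loop with an LRU cache for out-of-range chunks.
# Chunks leaving range go "warm" instead of being destroyed; re-entering
# range reuses them and skips regeneration entirely.

from collections import OrderedDict

class ChunkPager:
    def __init__(self, generate, capacity=256):
        self.generate = generate        # seed-based procedural source
        self.active = {}                # chunks in range (editable)
        self.lru = OrderedDict()        # recently evicted, still warm
        self.capacity = capacity

    def enter_range(self, coord):
        if coord in self.lru:           # warm: reuse without a rebuild
            self.active[coord] = self.lru.pop(coord)
        else:                           # cold: regenerate (plus deltas)
            self.active[coord] = self.generate(coord)
        return self.active[coord]

    def leave_range(self, coord):
        chunk = self.active.pop(coord)
        self.lru[coord] = chunk         # newest entry sits at the end
        while len(self.lru) > self.capacity:
            self.lru.popitem(last=False)   # evict oldest; serialize here

pager = ChunkPager(generate=lambda c: {"coord": c}, capacity=2)
pager.enter_range((0, 0, 0))
pager.leave_range((0, 0, 0))
chunk = pager.enter_range((0, 0, 0))   # served from the warm cache
```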
LOD is where most voxel projects either:
- blow performance (too many high-res chunks)
- or blow visuals (popping, cracks)
The stable approach:
- Near: full resolution.
- Far: simplified (bigger chunks, merged materials, or impostors).
- Often you can just cap view distance because block worlds read well even with fog.
You need crack-free transitions:
- Use dual grid / Transvoxel-style seam stitching, or
- Keep LOD levels aligned (clipmaps), or
- Generate meshes with shared boundaries.
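Ring selection itself can be sketched in Python (radii are illustrative; distance is Chebyshev, so rings are cubic shells around the camera chunk, which keeps them aligned for clipmap-style seams):

```python
# Sketch: clipmap-style LOD selection by Chebyshev distance from the
# camera chunk. Level 0 is the full-res editable band; higher levels
# get progressively coarser derived representations.

LOD_RING_RADIUS = [4, 8, 16, 32]   # in chunks, per LOD level (illustrative)

def lod_for_chunk(chunk, camera_chunk):
    d = max(abs(a - b) for a, b in zip(chunk, camera_chunk))
    for level, radius in enumerate(LOD_RING_RADIUS):
        if d <= radius:
            return level
    return None   # beyond the last ring: not resident at all

print(lod_for_chunk((2, 0, 1), (0, 0, 0)))    # 0: editable full res
print(lod_for_chunk((0, 0, 12), (0, 0, 0)))   # 2: coarse derived data
print(lod_for_chunk((40, 0, 0), (0, 0, 0)))   # None: unloaded
```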
This is one reason many “infinite editable smooth voxel terrain” engines keep the editable region limited to a high-res band around the player, and everything else becomes lower-res, derived, or frozen.
Collision meshes are expensive and Unity physics likes stability.
Common winning pattern:
- Visual mesh updates frequently.
- Collision updates:
- less frequently,
- or only in a radius around the player,
- or using simpler proxy colliders (voxel AABBs, heightfields, etc.)
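A sketch of that throttling in Python (COLLIDER_RADIUS and COLLIDER_PERIOD are illustrative tuning knobs):

```python
# Sketch: collider rebuilds run on a slower cadence than visual meshes,
# and only for dirty chunks near the player. Distant chunks keep stale
# or absent colliders until the player approaches.

COLLIDER_RADIUS = 2      # chunks (illustrative)
COLLIDER_PERIOD = 5      # rebuild colliders every N frames (illustrative)

def chunks_needing_colliders(dirty_chunks, player_chunk, frame):
    if frame % COLLIDER_PERIOD != 0:
        return []        # off-cadence frames skip collider work entirely
    return [
        c for c in dirty_chunks
        if max(abs(a - b) for a, b in zip(c, player_chunk)) <= COLLIDER_RADIUS
    ]

dirty = [(0, 0, 0), (1, 0, 1), (9, 0, 0)]
print(chunks_needing_colliders(dirty, (0, 0, 0), frame=10))
# far chunk (9, 0, 0) keeps its stale/absent collider for now
```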
If you’re doing SDF-style worlds, collision queries can be unified conceptually (distance checks), but Unity’s built-in physics still wants colliders, so you’ll usually still need a pragmatic bridge.
A top-tier voxel world is basically:
Sparse storage + chunked streaming + budgeted rebuild queues + multithreaded (Burst/Jobs) meshing + greedy/clustered rendering + carefully throttled collision/lighting.
And if you want Claybook-grade deformation:
store a field (SDF/density) in sparse bricks, update it locally (often on GPU), and rebuild only the affected surface chunks.
If you want, next we can turn this into a proper “chaptered” tutorial structure (with diagrams and Unity-specific implementation milestones), and only then start writing code in the clean order you’d actually build it (data → streaming → meshing → rendering → editing → physics → LOD).