Here’s the “settle it once and for all” mental model I’d teach for an “infinite”, real-time editable voxel world in Unity—built from the same core ideas you see across Minecraft-style engines (chunking + surface meshing) and Claybook-style tech (GPU-friendly volumetrics like SDFs). Claybook is a great reminder that the “voxel” you store doesn’t have to be a block ID; it can be a field (distance/density) that you update efficiently on the GPU.
Below is the big-picture tutorial: the problems that kill performance, and the design choices that make it scale.
There are two dominant “SOTA” families. The first is block/ID voxels (Minecraft-style):
- Each cell stores a material ID (air/stone/grass…).
- Rendering is a surface extraction problem: turn visible faces into triangles.
- Best for: crisp blocks, lots of edits, simple physics, simple networking.
The second is volumetric voxels (a density or SDF field, Claybook-style):
- Each cell stores a scalar (density or signed distance), possibly plus a material.
- Rendering is iso-surface extraction (marching cubes / dual contouring) or raymarching in special cases.
- Best for: smooth terrain, sculpting, blends, booleans, unified collision queries.
- Claybook specifically models clay as SDF volumes and leans heavily on GPU workflows for updates.
The “best approach” for maximum performance in a giant editable world is usually:
- Block voxels for gameplay-scale building (fast edits, cheap meshing).
- Optional SDF/density only where you truly need smooth deformation (local regions, caves, terrain layers, etc.).
Trying to make everything globally smooth and editable at infinite scale is doable, but it’s a heavier streaming + LOD + meshing problem.
An infinite voxel world only works if edits and rendering are local.
So every system must obey:
- O(edited area) updates (not O(world), not O(chunk columns), not O(all meshes)).
- Work bounded per frame (hard cap on how many chunks can rebuild/upload).
This leads to the cornerstone pattern…
You split the world into fixed-size chunks (typically cubic, e.g. 16³, 32³, 64³). Minecraft effectively streams and builds in chunk/section units, and most voxel meshing discussions orbit that model.
Key idea:
- Chunk size is not about “the world grid.”
- It’s about the cost of:
- rebuild mesh
- upload mesh to GPU
- cull / draw it
- keep it in memory
Group chunks into regions (e.g. 8×8×8 chunks) so you can load/save in big contiguous blobs.
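The chunk/region indexing above is mostly floor division. A minimal sketch in Python (CHUNK_SIZE and REGION_SIZE are illustrative values, not recommendations):

```python
# Sketch: mapping world-space voxel coordinates to chunk and region
# indices with floor division, so negative coordinates work too.

CHUNK_SIZE = 32          # voxels per chunk edge (illustrative)
REGION_SIZE = 8          # chunks per region edge (illustrative)

def chunk_coord(wx: int, wy: int, wz: int) -> tuple[int, int, int]:
    """Chunk index containing a world-space voxel (floor division)."""
    return (wx // CHUNK_SIZE, wy // CHUNK_SIZE, wz // CHUNK_SIZE)

def local_coord(wx: int, wy: int, wz: int) -> tuple[int, int, int]:
    """Voxel position inside its chunk (always in [0, CHUNK_SIZE))."""
    return (wx % CHUNK_SIZE, wy % CHUNK_SIZE, wz % CHUNK_SIZE)

def region_coord(cx: int, cy: int, cz: int) -> tuple[int, int, int]:
    """Region index containing a chunk, for load/save in big blobs."""
    return (cx // REGION_SIZE, cy // REGION_SIZE, cz // REGION_SIZE)

print(chunk_coord(-1, 0, 33))   # (-1, 0, 1): negatives floor correctly
print(local_coord(-1, 0, 33))   # (31, 0, 1)
```

Python’s `//` and `%` already floor toward negative infinity; in C# you’d need an explicit floor-div helper, since integer division truncates toward zero.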
For far distances, don’t keep “real chunks” forever. Keep LOD rings around the camera (like voxel clipmaps):
- Near: real, editable, high-res chunks.
- Mid/Far: progressively lower-res representations, derived from the near data or regenerated from seeds.
This is one of the big “Nanite-adjacent” lessons: virtualize what you can, keep only what’s needed resident, and make LOD selection cheap.
“Infinite” means coordinates are unbounded; memory is not.
So you want a structure where empty space is cheap:
Use a hierarchy like:
- World hash map (or sparse grid) → region → chunk → sub-chunk/sections.
- Inside a chunk:
- Palette + bitpacking (store “air/stone/dirt” as small indices).
- Run-length encoding or “homogeneous chunk” flags (all air/all stone = almost free).
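A toy sketch of the palette idea in Python (a real engine would bit-pack the index array into machine words; `PalettedChunk` is a hypothetical name):

```python
# Sketch: palette compression for a chunk. Each voxel stores a small
# index into a per-chunk palette of materials instead of a full ID,
# and fully homogeneous chunks (all air / all stone) can skip meshing.

class PalettedChunk:
    def __init__(self, size: int = 32):
        self.size = size
        self.palette = ["air"]          # index 0 = air by convention
        self.indices = [0] * size**3    # one palette index per voxel

    def _palette_index(self, material: str) -> int:
        if material not in self.palette:
            self.palette.append(material)
        return self.palette.index(material)

    def set(self, x, y, z, material: str):
        i = (y * self.size + z) * self.size + x
        self.indices[i] = self._palette_index(material)

    def get(self, x, y, z) -> str:
        i = (y * self.size + z) * self.size + x
        return self.palette[self.indices[i]]

    def is_homogeneous(self) -> bool:
        """All voxels identical: store a single flag, mesh nothing."""
        first = self.indices[0]
        return all(v == first for v in self.indices)

    def bits_per_voxel(self) -> int:
        """With bit-packing, index width grows with palette size."""
        return max(1, (len(self.palette) - 1).bit_length())
```

The payoff: a chunk that only ever sees 4 materials needs 2 bits per voxel instead of 16+ for a raw material ID.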
Use bricks (small 3D tiles) stored sparsely:
- Only allocate bricks near surfaces or edited areas.
- You can also store a narrow band around the surface (Claybook uses banded SDF thinking; the general principle is “don’t store what you don’t need”).
The rule:
- Your storage format should make “nothing here” extremely cheap.
To turn voxels into triangles, you have three typical options:
- Naïve face culling (emit a quad for every exposed face)
- Simple, surprisingly OK for prototypes.
- Scales poorly.
- Greedy meshing (merge adjacent coplanar faces into big quads)
- Huge reduction in triangles/vertices.
- Standard “grown-up” approach for Minecraft-like worlds.
- Meshlets / cluster-based culling (GPU-driven direction)
- Treat chunk meshes as clusters.
- Build compact clusters and let GPU cull aggressively (like “Nanite thinking,” but applied to voxel surfaces).
- More complex, but it’s where maximum scale goes.
Important real-time-editing reality: greedy meshing is great, but rebuilding it too often can hitch—so edits must produce localized remesh requests and you must throttle rebuilds.
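A minimal 2D greedy-merge sketch in Python, operating on one slice of exposed faces (a full mesher runs this per axis, per layer, and per face direction; this is the core rectangle-growing idea, not a complete mesher):

```python
# Sketch: greedy meshing on a single 2D mask of exposed faces (one
# axis slice of a chunk). Adjacent identical faces merge into maximal
# rectangles, hugely reducing quad count versus one quad per face.

def greedy_quads(mask):
    """mask: 2D list of face values (None = no face). Returns
    (x, y, w, h, value) rectangles covering all non-None cells."""
    h, w = len(mask), len(mask[0])
    used = [[False] * w for _ in range(h)]
    quads = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] is None or used[y][x]:
                continue
            v = mask[y][x]
            # Grow the quad rightward while faces match.
            qw = 1
            while x + qw < w and mask[y][x + qw] == v and not used[y][x + qw]:
                qw += 1
            # Grow downward while the entire row matches.
            qh = 1
            while y + qh < h and all(
                mask[y + qh][x + i] == v and not used[y + qh][x + i]
                for i in range(qw)
            ):
                qh += 1
            for dy in range(qh):
                for dx in range(qw):
                    used[y + dy][x + dx] = True
            quads.append((x, y, qw, qh, v))
    return quads

# A 4x4 slice of stone with one hole: 4 quads instead of 15 faces.
slice_ = [["s"] * 4 for _ in range(4)]
slice_[1][2] = None
print(len(greedy_quads(slice_)))   # 4
```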
For density/SDF worlds, the picture changes:
- Surface extraction is usually marching cubes / dual contouring style.
- Edits are brush operations that modify density/SDF values in a region; only affected chunks rebuild.
- This is where GPU compute can shine for both edit application and meshing, if you’re careful about data movement.
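A sketch of a sphere brush against a sparse SDF grid in Python (`apply_sphere_brush` is a hypothetical helper; union and subtraction are the standard min/max SDF booleans, and the grid-to-world mapping is 1:1 for simplicity):

```python
# Sketch: a sphere brush on a sparse SDF grid. Only voxels inside the
# brush's bounding box are touched; the returned set of touched voxels
# is what feeds dirty-chunk marking downstream.

import math

def apply_sphere_brush(sdf, cx, cy, cz, r, add=True):
    """sdf: dict (x,y,z) -> signed distance (missing = far outside).
    Returns the set of touched voxel coords."""
    touched = set()
    lo = lambda c: int(math.floor(c - r)) - 1
    hi = lambda c: int(math.ceil(c + r)) + 1
    for x in range(lo(cx), hi(cx) + 1):
        for y in range(lo(cy), hi(cy) + 1):
            for z in range(lo(cz), hi(cz) + 1):
                d_sphere = math.dist((x, y, z), (cx, cy, cz)) - r
                d_old = sdf.get((x, y, z), 1e9)
                if add:
                    d_new = min(d_old, d_sphere)   # union: add material
                else:
                    d_new = max(d_old, -d_sphere)  # subtraction: carve
                if d_new != d_old:
                    sdf[(x, y, z)] = d_new
                    touched.add((x, y, z))
    return touched

sdf = {}
touched = apply_sphere_brush(sdf, 0.0, 0.0, 0.0, 2.0)
print(sdf[(0, 0, 0)])   # -2.0: two units inside the surface
```

On the GPU this becomes a compute dispatch over the brush’s bounding box, which is exactly the O(edited area) property from earlier.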
An edit touches more than just a voxel value:
When a voxel changes you may need to update:
- the chunk’s mesh (and neighbor chunk borders)
- collision
- lighting / AO
- navigation
- gameplay metadata (support, destruction, fluid, etc.)
The winning strategy is a dirty-propagation graph:
- Apply edit to voxel data (CPU or GPU).
- Mark dirty chunks:
- Always include the edited chunk.
- Include neighbor chunks if you touched a boundary (because faces/isosurfaces cross chunk edges).
- Enqueue jobs:
- mesh rebuild job
- collider rebuild job (often at lower frequency)
- lighting update job
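The dirty-marking rule can be sketched like this in Python (face neighbors only; ambient occlusion may additionally require diagonal neighbors):

```python
# Sketch: computing which chunks to remesh after editing one voxel.
# The edited chunk is always dirty; a face neighbor is added whenever
# the voxel sits on a chunk boundary, since surfaces cross chunk edges.

CHUNK_SIZE = 32  # illustrative

def dirty_chunks(wx, wy, wz):
    cx, cy, cz = wx // CHUNK_SIZE, wy // CHUNK_SIZE, wz // CHUNK_SIZE
    lx, ly, lz = wx % CHUNK_SIZE, wy % CHUNK_SIZE, wz % CHUNK_SIZE
    dirty = {(cx, cy, cz)}
    for local, axis in ((lx, 0), (ly, 1), (lz, 2)):
        offset = [0, 0, 0]
        if local == 0:
            offset[axis] = -1              # low boundary: -axis neighbor
        elif local == CHUNK_SIZE - 1:
            offset[axis] = 1               # high boundary: +axis neighbor
        if offset != [0, 0, 0]:
            dirty.add((cx + offset[0], cy + offset[1], cz + offset[2]))
    return dirty

print(dirty_chunks(5, 5, 5))     # interior edit: 1 chunk
print(dirty_chunks(0, 5, 31))    # boundary edit: 3 chunks
```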
You do not rebuild everything immediately.
You process rebuild queues with a per-frame cap:
- “max N chunks mesh uploads per frame”
- “max M collider rebuilds per second”
This is how you stay hitch-free no matter how chaotic edits get.
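A sketch of the budgeted queue in Python (MAX_REMESH_PER_FRAME is an illustrative knob; in Unity the rebuild callback would schedule a Burst job rather than mesh synchronously):

```python
# Sketch: a per-frame budget for mesh rebuilds. Dirty chunks queue up,
# and each frame drains at most MAX_REMESH_PER_FRAME of them, so a
# burst of edits spreads its cost across frames instead of hitching.

from collections import deque

MAX_REMESH_PER_FRAME = 4  # illustrative

class RemeshQueue:
    def __init__(self):
        self.queue = deque()
        self.pending = set()   # dedupe: a chunk queues at most once

    def mark_dirty(self, chunk):
        if chunk not in self.pending:
            self.pending.add(chunk)
            self.queue.append(chunk)

    def process_frame(self, rebuild):
        """Call once per frame; returns how many chunks were rebuilt."""
        done = 0
        while self.queue and done < MAX_REMESH_PER_FRAME:
            chunk = self.queue.popleft()
            self.pending.discard(chunk)
            rebuild(chunk)
            done += 1
        return done

q = RemeshQueue()
for c in range(10):          # a chaotic burst of 10 dirty chunks
    q.mark_dirty(c)
built = []
print(q.process_frame(built.append))   # 4: the rest wait for later frames
```

The dedupe set matters: a voxel painted 50 times in one frame should enqueue its chunk once, not 50 times.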
Unity-specific big idea: the main thread should mostly:
- schedule work
- upload finished meshes/buffers
- issue draw calls (or feed GPU-driven draws)
Everything else becomes jobs/compute.
- Chunk mesh generation runs in Burst + Jobs (or Entities/DOTS).
- Results written into native buffers.
- Main thread converts them into Mesh objects (or uses MeshData API) and uploads.
- Store chunk voxel data in GPU buffers/textures.
- Run compute shaders for:
- edit application
- surface extraction / compaction
- Use indirect drawing patterns where possible (GPU-driven).
Claybook is a proof point that heavy voxel/field manipulation can be GPU-native.
In practice, many “best possible” Unity voxel engines end up hybrid:
- CPU jobs for meshing (simpler debugging, easier data locality)
- GPU compute for expensive “field math” or special effects
- GPU-driven instancing/draw for throughput
You need three sources of truth, layered:
- Procedural base (seeded noise / rules): generates chunks deterministically.
- Delta edits (player modifications): stored as compact diffs against the base.
- Baked caches (optional): prebuilt meshes or compressed chunk payloads for fast loads.
When a chunk comes into range:
- If it exists on disk, load base+delta (or a full snapshot).
- Otherwise, generate the base deterministically from the world seed.
- Apply any pending deltas.
- Enqueue a mesh build.
When a chunk leaves range:
- keep it in an LRU cache (memory permitting)
- or serialize it out as region data
This is the “infinite world” loop: predictable paging.
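A sketch of that paging loop in Python, with an OrderedDict as the LRU (`generate` stands in for seed-based generation plus delta application; serialization on eviction is elided):

```python
# Sketch: the paging loop with an LRU cache for out-of-range chunks.
# Chunks leaving range go "warm" instead of being destroyed; re-entering
# range reuses them and skips regeneration entirely.

from collections import OrderedDict

class ChunkPager:
    def __init__(self, generate, capacity=256):
        self.generate = generate        # seed-based procedural source
        self.active = {}                # chunks in range (editable)
        self.lru = OrderedDict()        # recently evicted, still warm
        self.capacity = capacity

    def enter_range(self, coord):
        if coord in self.lru:           # warm: reuse without a rebuild
            self.active[coord] = self.lru.pop(coord)
        else:                           # cold: regenerate (plus deltas)
            self.active[coord] = self.generate(coord)
        return self.active[coord]

    def leave_range(self, coord):
        chunk = self.active.pop(coord)
        self.lru[coord] = chunk         # newest entry sits at the end
        while len(self.lru) > self.capacity:
            self.lru.popitem(last=False)   # evict oldest; serialize here

pager = ChunkPager(generate=lambda c: {"coord": c}, capacity=2)
pager.enter_range((0, 0, 0))
pager.leave_range((0, 0, 0))
chunk = pager.enter_range((0, 0, 0))   # served from the warm cache
```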
LOD is where most voxel projects either:
- blow performance (too many high-res chunks)
- or blow visuals (popping, cracks)
The stable approach:
- Near: full resolution.
- Far: simplified (bigger chunks, merged materials, or impostors).
- Often you can just cap view distance because block worlds read well even with fog.
You need crack-free transitions:
- Use dual grid / Transvoxel-style seam stitching, or
- Keep LOD levels aligned (clipmaps), or
- Generate meshes with shared boundaries.
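Ring selection itself can be sketched in Python (radii are illustrative; distance is Chebyshev, so rings are cubic shells around the camera chunk, which keeps them aligned for clipmap-style seams):

```python
# Sketch: clipmap-style LOD selection by Chebyshev distance from the
# camera chunk. Level 0 is the full-res editable band; higher levels
# get progressively coarser derived representations.

LOD_RING_RADIUS = [4, 8, 16, 32]   # in chunks, per LOD level (illustrative)

def lod_for_chunk(chunk, camera_chunk):
    d = max(abs(a - b) for a, b in zip(chunk, camera_chunk))
    for level, radius in enumerate(LOD_RING_RADIUS):
        if d <= radius:
            return level
    return None   # beyond the last ring: not resident at all

print(lod_for_chunk((2, 0, 1), (0, 0, 0)))    # 0: editable full res
print(lod_for_chunk((0, 0, 12), (0, 0, 0)))   # 2: coarse derived data
print(lod_for_chunk((40, 0, 0), (0, 0, 0)))   # None: unloaded
```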
This is one reason many “infinite editable smooth voxel terrain” engines keep the editable region limited to a high-res band around the player, and everything else becomes lower-res, derived, or frozen.
Collision meshes are expensive and Unity physics likes stability.
Common winning pattern:
- Visual mesh updates frequently.
- Collision updates:
- less frequently,
- or only in a radius around the player,
- or using simpler proxy colliders (voxel AABBs, heightfields, etc.)
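A sketch of that throttling in Python (COLLIDER_RADIUS and COLLIDER_PERIOD are illustrative tuning knobs):

```python
# Sketch: collider rebuilds run on a slower cadence than visual meshes,
# and only for dirty chunks near the player. Distant chunks keep stale
# or absent colliders until the player approaches.

COLLIDER_RADIUS = 2      # chunks (illustrative)
COLLIDER_PERIOD = 5      # rebuild colliders every N frames (illustrative)

def chunks_needing_colliders(dirty_chunks, player_chunk, frame):
    if frame % COLLIDER_PERIOD != 0:
        return []        # off-cadence frames skip collider work entirely
    return [
        c for c in dirty_chunks
        if max(abs(a - b) for a, b in zip(c, player_chunk)) <= COLLIDER_RADIUS
    ]

dirty = [(0, 0, 0), (1, 0, 1), (9, 0, 0)]
print(chunks_needing_colliders(dirty, (0, 0, 0), frame=10))
# far chunk (9, 0, 0) keeps its stale/absent collider for now
```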
If you’re doing SDF-style worlds, collision queries can be unified conceptually (distance checks), but Unity’s built-in physics still wants colliders, so you’ll usually still need a pragmatic bridge.
A top-tier voxel world is basically:
Sparse storage + chunked streaming + budgeted rebuild queues + multithreaded (Burst/Jobs) meshing + greedy/clustered rendering + carefully throttled collision/lighting.
And if you want Claybook-grade deformation:
store a field (SDF/density) in sparse bricks, update it locally (often on GPU), and rebuild only the affected surface chunks.
If you want, next we can turn this into a proper “chaptered” tutorial structure (with diagrams and Unity-specific implementation milestones), and only then start writing code in the clean order you’d actually build it (data → streaming → meshing → rendering → editing → physics → LOD).