This response presents a Unity compute shader implementation of the practical 2D diffusion-curve solver by Jeschke et al. The solver initializes from a Voronoi color image and a distance map (with radii reduced by 8% so that sampling stencils never cross a boundary), takes four directional samples per pixel, and supports two variable-stencil shrinking strategies, Shrink Always (SA) and Shrink Half (SH), to reach visually smooth results in just a few Jacobi-like iterations.
The solver begins with an initial guess obtained by rasterizing the diffusion curves into a Voronoi diagram, coloring each pixel by its nearest curve segment.
A distance map records, for every pixel, the distance to the closest boundary curve, so that sampling circles never cross region boundaries.
The Voronoi color image serves both as the starting solution and as the source of the Dirichlet boundary values, which remain fixed on curve pixels during the diffusion iterations.
The distance map is produced by rendering "slanted" polygons and using depth comparisons. It has an inherent error of up to 8% due to the fan approximation, which the solver compensates for by scaling radii by 0.92.
Rather than computing a full circular average (which is too costly), the solver samples only four axis-aligned neighbors per pixel: up, down, left, and right.
At each iteration, every interior pixel is updated to the average of those four samples, taken at the current stencil radius, which is determined by the distance map and the shrinking strategy.
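Written out as a formula, the four-sample update described above looks as follows. The symbols $u_i$ (the solution after iteration $i$) and $r_i(x,y)$ (the per-pixel stencil radius at iteration $i$) are notation introduced here for illustration, not taken from the paper:

$$
u_{i+1}(x,y) = \tfrac{1}{4}\Big[\,u_i(x + r_i,\, y) + u_i(x - r_i,\, y) + u_i(x,\, y + r_i) + u_i(x,\, y - r_i)\Big], \qquad r_i = r_i(x,y).
$$

Pixels on a curve are not updated; they keep their Voronoi color in every iteration.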
In SA, the stencil radius at iteration $i$ is scaled linearly by $1 - \tfrac{i}{n}$, where $n$ is the total number of iterations.
SA typically eliminates banding artifacts within about eight iterations, though full convergence to the exact minimal surface may require many more passes.
SH keeps the full sampling radius for the first $n/2$ iterations and then shrinks it linearly over the remaining ones. This accelerates overall convergence but can leave minor banding for up to 12–14 iterations.
Figure 5 of the paper (as shown on ResearchGate) illustrates that, after eight iterations, SH reduces the error by more than half compared to SA.
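To make the two schedules concrete, here is a minimal plain C# sketch (no Unity dependency) of the radius computation as described above. The class and method names (`StencilSchedule`, `Radius`) are hypothetical, and `baseRadius` is assumed to be the distance-map value already scaled by 0.92.

```csharp
using System;

public static class StencilSchedule
{
    // Sampling radius for iteration i of n.
    // baseRadius: distance-map value, assumed already scaled by 0.92.
    // shrinkHalf: false selects SA, true selects SH.
    public static float Radius(float baseRadius, int i, int n, bool shrinkHalf)
    {
        float t = i / (float)n;                        // progress in [0, 1)
        if (!shrinkHalf)
            return baseRadius * (1f - t);              // SA: shrink from the first pass
        if (t < 0.5f)
            return baseRadius;                         // SH: hold the full radius
        return baseRadius * (1f - 2f * (t - 0.5f));    // SH: shrink over the second half
    }

    public static void Main()
    {
        const int n = 8;
        for (int i = 0; i < n; i++)
            Console.WriteLine(
                $"i={i}  SA={Radius(100f, i, n, false):F1}  SH={Radius(100f, i, n, true):F1}");
    }
}
```

With $n = 8$ and a base radius of 100 pixels, SA has already dropped to 50 at $i = 4$ while SH is still at 100; at $i = 6$ the radii are 25 and 50 respectively.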
The shader declares a read-only Texture2D<float4> for the current solution and a RWTexture2D<float4> for the next iteration, plus a Texture2D<float4> holding the Voronoi colors, a Texture2D<float> for the distance map, and a Texture2D<uint> mask for the Dirichlet boundaries.
A [numthreads(8,8,1)] directive groups threads into 8×8 tiles (64 threads each), balancing parallel throughput and memory access.
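Because each thread group covers an 8×8 tile, the number of groups per axis has to be rounded up whenever the image size is not a multiple of 8. The following plain C# sketch shows that arithmetic; the helper name `GroupCount` is hypothetical.

```csharp
using System;

public static class DispatchMath
{
    // Number of thread groups needed to cover `size` pixels
    // when each group spans `groupSize` pixels along that axis.
    public static int GroupCount(int size, int groupSize = 8)
    {
        return (size + groupSize - 1) / groupSize; // integer ceiling division
    }

    public static void Main()
    {
        Console.WriteLine(GroupCount(1024)); // 128
        Console.WriteLine(GroupCount(1000)); // 125
        Console.WriteLine(GroupCount(1001)); // 126
    }
}
```

When the count is rounded up, the last group along each axis may contain threads that fall outside the image, so the kernel should return early for those thread IDs.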
Inside the kernel, if the boundary mask is set, the pixel is reset to its original Voronoi color. Otherwise, four neighbor samples are fetched at offsets given by the radius computed for the current iteration and strategy, and their average is written out.
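#pragma kernel CSMain

Texture2D<float4>   _PrevTex;      // solution from the previous iteration
RWTexture2D<float4> _ResultTex;    // solution written by this iteration
Texture2D<float4>   _VoronoiTex;   // initial guess and Dirichlet colors
Texture2D<float>    _DistanceMap;  // distance to nearest curve (assumed to be in pixels)
Texture2D<uint>     _BoundaryMask; // 1 on curve pixels, 0 elsewhere
int _Iteration;  // current iteration index i
int _Iterations; // total iteration count n
int _Strategy;   // 0 = SA, 1 = SH

[numthreads(8,8,1)]
void CSMain(uint3 id : SV_DispatchThreadID) {
    uint w, h;
    _ResultTex.GetDimensions(w, h);
    if (id.x >= w || id.y >= h) return; // the dispatch may overshoot the image
    int2 uv = int2(id.xy);
    if (_BoundaryMask.Load(int3(uv,0)) == 1) {
        // Dirichlet pixel: keep the curve color fixed
        _ResultTex[uv] = _VoronoiTex.Load(int3(uv,0));
        return;
    }
    float d0 = _DistanceMap.Load(int3(uv,0));
    float baseR = d0 * 0.92; // compensate 8% rasterization error
    float t = _Iteration / (float)_Iterations;
    float r = (_Strategy == 0)
        ? baseR * (1 - t)                                 // SA: shrink every iteration
        : (t < 0.5 ? baseR : baseR * (1 - 2*(t - 0.5))); // SH: shrink in the second half
    int ir = max(1, (int)r); // integer radius; the 1-pixel minimum is an assumption
    float4 sum = float4(0,0,0,0);
    sum += _PrevTex.Load(int3(uv + int2( ir, 0), 0));
    sum += _PrevTex.Load(int3(uv + int2(-ir, 0), 0));
    sum += _PrevTex.Load(int3(uv + int2( 0, ir), 0));
    sum += _PrevTex.Load(int3(uv + int2( 0, -ir), 0));
    _ResultTex[uv] = sum * 0.25;
}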
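For checking the GPU output on small inputs, the same per-pixel update can be mirrored on the CPU. The following is a minimal single-channel sketch in plain C#, not part of the original solver: the names `CpuReference` and `Step` are hypothetical, the radius is passed in as a precomputed integer per pixel, and sample coordinates are clamped to the image, which is an assumption about edge handling.

```csharp
using System;

public static class CpuReference
{
    // One Jacobi-style pass over a single-channel image.
    // prev:      solution from the previous pass
    // fixedMask: true where the value is a Dirichlet constraint
    // radius:    per-pixel integer stencil radius for this pass
    public static float[,] Step(float[,] prev, bool[,] fixedMask, int[,] radius)
    {
        int h = prev.GetLength(0), w = prev.GetLength(1);
        var next = new float[h, w];
        for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
        {
            // Constrained pixels are copied through unchanged; this matches
            // resetting to the Voronoi color if prev started from that image.
            if (fixedMask[y, x]) { next[y, x] = prev[y, x]; continue; }
            int r = radius[y, x];
            // Assumption: clamp sample coordinates to the image bounds.
            float sum = prev[y, Math.Min(x + r, w - 1)]
                      + prev[y, Math.Max(x - r, 0)]
                      + prev[Math.Min(y + r, h - 1), x]
                      + prev[Math.Max(y - r, 0), x];
            next[y, x] = 0.25f * sum;
        }
        return next;
    }

    public static void Main()
    {
        // 3x3 image: the eight border pixels are fixed, the center is free.
        var prev = new float[,] { { 0f, 0.5f, 1f }, { 0f, 0f, 1f }, { 0f, 0.5f, 1f } };
        var mask = new bool[,]  { { true, true, true }, { true, false, true }, { true, true, true } };
        var rad  = new int[,]   { { 1, 1, 1 }, { 1, 1, 1 }, { 1, 1, 1 } };
        var next = Step(prev, mask, rad);
        Console.WriteLine(next[1, 1]); // 0.5 = (1 + 0 + 0.5 + 0.5) / 4
    }
}
```

Reading from `prev` and writing to a separate `next` array is what makes the pass Jacobi-like: every pixel sees only values from the previous iteration, just as the shader reads one texture and writes the other.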
A Unity C# script loads the ComputeShader, creates two RenderTexture buffers with enableRandomWrite=true for ping-ponging, and initializes one with the Voronoi color image for the first pass.
Each frame, it sets the shader parameters (_DistanceMap, _BoundaryMask, _VoronoiTex, _Iterations, _Strategy, _Iteration), dispatches via Dispatch(kernelID, gx, gy, 1) with the group counts gx and gy rounded up so that the 8×8 groups cover the whole image, then swaps the buffers for the next iteration.
using UnityEngine;

public class DiffusionSolverController : MonoBehaviour {
    public ComputeShader solver;
    public Texture2D voronoiTex;
    public Texture2D distanceMap;
    public Texture2D boundaryMask;
    public int iterations = 12;
    public int strategy = 0; // 0 = SA, 1 = SH
    RenderTexture texA, texB;
    int kernelID;

    void Start() {
        int w = voronoiTex.width, h = voronoiTex.height;
        texA = MakeRT(w,h); texB = MakeRT(w,h);
        kernelID = solver.FindKernel("CSMain");
    }

    RenderTexture MakeRT(int w,int h) {
        var rt = new RenderTexture(w,h,0,RenderTextureFormat.ARGBFloat);
        rt.enableRandomWrite = true; rt.Create();
        return rt;
    }

    void Update() {
        // Restart from the Voronoi initial guess before each solve
        Graphics.Blit(voronoiTex, texA);
        solver.SetTexture(kernelID, "_VoronoiTex",voronoiTex);
        solver.SetTexture(kernelID, "_DistanceMap",distanceMap);
        solver.SetTexture(kernelID, "_BoundaryMask",boundaryMask);
        solver.SetInt ("_Iterations", iterations);
        solver.SetInt ("_Strategy", strategy);
        // Round up so partial 8x8 tiles at the image edge are covered
        int gx = Mathf.CeilToInt(voronoiTex.width / 8f);
        int gy = Mathf.CeilToInt(voronoiTex.height / 8f);
        for (int i = 0; i < iterations; i++) {
            solver.SetInt("_Iteration", i);
            // Rebind every pass: the swap below exchanges the buffer roles
            solver.SetTexture(kernelID, "_PrevTex", texA);
            solver.SetTexture(kernelID, "_ResultTex", texB);
            solver.Dispatch(kernelID, gx, gy, 1);
            (texA,texB) = (texB,texA);
        }
        // texA now contains the final diffused image
    }

    void OnDestroy() {
        // Free the GPU buffers created in Start
        if (texA != null) texA.Release();
        if (texB != null) texB.Release();
    }
}
Typically, eight iterations suffice to produce visually smooth output. On a GeForce 8800 GTX, a 1024×1024 distance map plus seven diffusion steps took only about 7.6 ms (≈130 Hz).
Use SA when you need minimal banding artifacts within a few iterations, and SH when you prefer faster overall convergence and can tolerate brief low-frequency banding.