Writing a C++20 Path Tracer From Scratch Without AI

A dependency-free renderer showcases the power of modern C++20, custom acceleration structures, and classic graphics engineering.

Lenn Voss

Cloud & Infrastructure Writer · Jun 15, 2026 · 4 min read

Writing a path tracer is a rite of passage for graphics programmers. Doing it in modern C++20 with zero third-party dependencies, and without leaning on AI assistants, is a demanding exercise in software craftsmanship.

That is exactly what developer themartiano delivered with Luz, a CPU-based Monte Carlo path tracer built entirely from the ground up. By eschewing external libraries for everything from vector math to image output, the project serves as a clean, highly readable blueprint for how modern C++ can handle complex mathematical simulations and heavy parallel workloads.

The Zero-Dependency Architecture

In modern software development, it is easy to fall into the trap of dependency bloat. A typical project might pull in dozens of libraries for windowing, image saving, math, and parsing. Luz takes the opposite path.

To keep the codebase entirely self-contained, the author implemented every core utility manually. This includes:

Custom Math & Geometry: Vector operations, ray-object intersections, and probability density functions (PDFs) for importance sampling.
File I/O: Custom parsers for .luz scene files and standard Wavefront OBJ meshes, alongside custom encoders for BMP and TIFF image outputs.
Integration Tools: A custom exporter to convert .blend files from Blender directly into the native .luz format, bridging the gap between professional modeling tools and a custom engine.
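
To give a sense of what hand-rolled math and geometry code of this kind looks like, here is a minimal sketch of a vector type and a ray-sphere intersection test. The names Vec3, Ray, Sphere, at, and intersect are hypothetical and are not taken from the Luz source; the code shows only the textbook quadratic test, under the assumption that double precision is acceptable.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Minimal 3-component vector. Illustrative only, not Luz's actual type.
struct Vec3 {
    double x{}, y{}, z{};
};

constexpr Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
constexpr Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
constexpr Vec3 operator*(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
constexpr double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray {
    Vec3 origin;
    Vec3 direction;  // need not be normalized; t is measured in units of its length
};

struct Sphere {
    Vec3 center;
    double radius;
};

// Point reached after travelling parameter t along the ray.
constexpr Vec3 at(const Ray& r, double t) { return r.origin + r.direction * t; }

// Returns the nearest hit parameter t in (t_min, t_max), or nullopt on a miss.
// Solves |o + t*d - c|^2 = r^2 using the "half b" form of the quadratic.
std::optional<double> intersect(const Ray& ray, const Sphere& s,
                                double t_min = 1e-4, double t_max = 1e30) {
    const Vec3 oc = ray.origin - s.center;
    const double a = dot(ray.direction, ray.direction);
    assert(a > 0.0 && "ray direction must be non-zero");
    const double half_b = dot(oc, ray.direction);
    const double c = dot(oc, oc) - s.radius * s.radius;
    const double disc = half_b * half_b - a * c;
    if (disc < 0.0) return std::nullopt;  // the ray's line misses the sphere

    const double root = std::sqrt(disc);
    double t = (-half_b - root) / a;  // try the nearer root first
    if (t <= t_min || t >= t_max) {
        t = (-half_b + root) / a;     // far root, e.g. when the ray starts inside
        if (t <= t_min || t >= t_max) return std::nullopt;
    }
    return t;
}
```

The small positive t_min is a common guard against a ray re-hitting the surface it just left; the exact epsilon is a tuning choice.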

The project targets macOS, Linux, and Windows (via MSVC or MinGW), and builds with either a standard Makefile or CMake. The only dependency is Python, and it is optional, used solely for helper scripts and tooling.

Acceleration and Sampling Under the Hood

Path tracing is computationally brutal. Without optimization, casting millions of rays into a scene with complex geometry will quickly grind any CPU to a halt. To keep render times manageable, Luz implements several classic graphics algorithms.

Binned SAH BVH Acceleration

To handle complex OBJ meshes, Luz uses a Bounding Volume Hierarchy (BVH). Specifically, it implements packed mesh BVHs constructed using a binned Surface Area Heuristic (SAH) and traversed using a near-first approach. Binning speeds up BVH construction by grouping primitives into a fixed number of spatial bins and evaluating the SAH cost only at bin boundaries, which finds a near-optimal split plane without testing every possible primitive position. Near-first traversal visits the child node whose bounding box the ray enters first, so a hit found there shortens the ray and often lets the farther child be skipped entirely.
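
The binning step can be sketched roughly as follows. This is a generic illustration of binned SAH split selection, not Luz's code: Aabb, Split, kBins, and find_binned_sah_split are hypothetical names, the bin count of 16 is an assumed value, and the constant traversal cost and parent-area normalization are left out because they do not change which candidate plane wins within a node.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

constexpr double kInf = std::numeric_limits<double>::infinity();
constexpr int kBins = 16;  // assumed bin count, purely illustrative

struct Aabb {
    std::array<double, 3> lo{kInf, kInf, kInf};      // starts out empty
    std::array<double, 3> hi{-kInf, -kInf, -kInf};

    void grow(const Aabb& b) {
        for (int i = 0; i < 3; ++i) {
            lo[i] = std::min(lo[i], b.lo[i]);
            hi[i] = std::max(hi[i], b.hi[i]);
        }
    }
    // Surface area of the box; an empty (never grown) box reports 0.
    double area() const {
        if (hi[0] < lo[0]) return 0.0;
        const double dx = hi[0] - lo[0], dy = hi[1] - lo[1], dz = hi[2] - lo[2];
        return 2.0 * (dx * dy + dy * dz + dz * dx);
    }
    double centroid(int axis) const { return 0.5 * (lo[axis] + hi[axis]); }
};

struct Split {
    int axis = -1;          // -1 means no useful split was found
    double position = 0.0;  // plane coordinate along `axis`
    double cost = kInf;     // area(left)*count(left) + area(right)*count(right)
};

Split find_binned_sah_split(const std::vector<Aabb>& prims) {
    Split best;
    for (int axis = 0; axis < 3; ++axis) {
        // Extent of the primitive centroids along this axis.
        double cmin = kInf, cmax = -kInf;
        for (const Aabb& p : prims) {
            cmin = std::min(cmin, p.centroid(axis));
            cmax = std::max(cmax, p.centroid(axis));
        }
        if (!(cmax > cmin)) continue;  // no primitives, or all centroids coincide

        // 1. Drop every primitive into the bin that contains its centroid.
        std::array<Aabb, kBins> bin_box{};
        std::array<std::size_t, kBins> bin_count{};
        const double scale = kBins / (cmax - cmin);
        for (const Aabb& p : prims) {
            const int b = std::min(kBins - 1,
                                   static_cast<int>((p.centroid(axis) - cmin) * scale));
            assert(b >= 0 && b < kBins);
            bin_box[b].grow(p);
            ++bin_count[b];
        }

        // 2. Sweep right to left, recording area*count of everything right of each plane.
        std::array<double, kBins> right_cost{};
        Aabb box;
        std::size_t count = 0;
        for (int i = kBins - 1; i > 0; --i) {
            box.grow(bin_box[i]);
            count += bin_count[i];
            right_cost[i] = box.area() * static_cast<double>(count);
        }

        // 3. Sweep left to right and evaluate the kBins - 1 candidate planes.
        box = Aabb{};
        count = 0;
        for (int i = 0; i < kBins - 1; ++i) {
            box.grow(bin_box[i]);
            count += bin_count[i];
            if (count == 0 || count == prims.size()) continue;  // one side is empty
            const double cost = box.area() * static_cast<double>(count) + right_cost[i + 1];
            if (cost < best.cost) best = {axis, cmin + (i + 1) / scale, cost};
        }
    }
    return best;
}
```

Because only kBins - 1 planes per axis are evaluated, the cost of choosing a split is linear in the number of primitives, at the price of an approximate rather than exact SAH minimum.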

Intelligent Adaptive Sampling

Instead of throwing a uniform number of samples at every pixel, Luz features an adaptive sampling engine. The user sets a maximum sample limit, and the renderer processes each pixel progressively. After a minimum threshold of samples is met, the engine periodically checks the luminance and RGB confidence intervals of the pixel.

If the pixel has converged (meaning the estimated noise is below a specified threshold), rendering stops early for that pixel. To avoid missing rare light paths, very dark pixels must reach a more conservative minimum sample count before they are allowed to stop, so that rare light contributions are not mistaken for converged black.
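
A stopping rule of this kind can be sketched as below. This is an assumption-laden illustration rather than Luz's implementation: it tracks luminance only (the article mentions RGB intervals as well), uses a normal-approximation 95% confidence interval, and every name and threshold (PixelStats, AdaptiveSettings, has_converged, the default values) is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Running mean and variance of one pixel's luminance samples (Welford's algorithm).
class PixelStats {
public:
    void add(double luminance) {
        ++n_;
        const double delta = luminance - mean_;
        mean_ += delta / static_cast<double>(n_);
        m2_ += delta * (luminance - mean_);
    }
    std::size_t count() const { return n_; }
    double mean() const { return mean_; }

    // Half-width of an approximate 95% confidence interval on the mean.
    double ci_half_width() const {
        assert(n_ >= 2);
        const double variance = m2_ / static_cast<double>(n_ - 1);
        return 1.96 * std::sqrt(variance / static_cast<double>(n_));
    }

private:
    std::size_t n_ = 0;
    double mean_ = 0.0;
    double m2_ = 0.0;  // sum of squared deviations from the running mean
};

// All defaults below are made-up illustrative values.
struct AdaptiveSettings {
    std::size_t min_samples = 32;        // never stop before this many samples
    std::size_t dark_min_samples = 128;  // stricter floor for very dark pixels
    double dark_threshold = 0.01;        // mean luminance below this counts as dark
    double relative_tolerance = 0.05;    // allowed CI half-width relative to the mean
};

bool has_converged(const PixelStats& px, const AdaptiveSettings& cfg) {
    assert(cfg.min_samples >= 2);
    if (px.count() < cfg.min_samples) return false;

    // A pixel that looks black so far may simply not have found a rare light
    // path yet, so it has to reach the stricter sample floor first.
    if (px.mean() < cfg.dark_threshold && px.count() < cfg.dark_min_samples) return false;

    // Clamp the reference level so the tolerance does not collapse to zero
    // for pixels whose mean is at or near black.
    const double allowed = cfg.relative_tolerance * std::max(px.mean(), cfg.dark_threshold);
    return px.ci_half_width() <= allowed;
}
```

In a render loop, the check would typically run every few samples rather than after each one, since the test itself has a cost.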

Materials, Media, and Post-Processing

Despite having zero external dependencies, Luz supports a feature set that is unusually rich for a hobby renderer:

Materials & Lights: Support for Lambertian, metal, dielectric (glass), emissive, and isotropic materials. Light sources include area, point, sphere, and directional lights.
Volumetrics & Atmosphere: Isotropic materials allow for rendering participating media (like fog or smoke), complemented by an atmospheric simulation that models Rayleigh and Mie scattering.
Post-Processing Pipeline: Once the raw rays are traced, Luz applies a built-in post-processing stack. This includes depth of field, antialiasing, exposure compensation, contrast adjustment, tone mapping, gamma correction, and bloom.
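Denoising: To clean up Monte Carlo noise without waiting hours for renders to converge, Luz includes an integrated NFOR-style denoiser that can output a clean companion image alongside the raw render. (NFOR, short for nonlinearly weighted first-order regression, is a regression-based technique that builds on non-local-means weights.)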
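
To make the tail end of such a post-processing stack concrete, here is a minimal sketch of exposure compensation, tone mapping, and gamma correction applied to a single channel. The article does not say which tone-mapping operator or gamma Luz uses, so the Reinhard operator and the 2.2 gamma below are assumptions, and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Exposure compensation in photographic stops: +1 stop doubles the value.
inline double apply_exposure(double linear, double stops) {
    return linear * std::exp2(stops);
}

// Reinhard operator: compresses [0, inf) into [0, 1). Assumed choice.
inline double tone_map_reinhard(double linear) {
    return linear / (1.0 + linear);
}

// Simple power-law encoding; 2.2 is an assumed display gamma.
inline double gamma_encode(double value, double gamma = 2.2) {
    assert(gamma > 0.0);
    return std::pow(value, 1.0 / gamma);
}

// Converts one linear radiance channel into an 8-bit display value.
inline std::uint8_t to_8bit(double radiance, double stops = 0.0) {
    const double exposed = apply_exposure(std::max(radiance, 0.0), stops);
    const double mapped = tone_map_reinhard(exposed);
    const double encoded = gamma_encode(mapped);
    return static_cast<std::uint8_t>(std::lround(std::clamp(encoded, 0.0, 1.0) * 255.0));
}
```

The order matters: exposure and tone mapping operate on linear values, and gamma encoding comes last, just before quantization to 8 bits.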

Squeezing Performance Out of the CPU

Because Luz is a multithreaded CPU renderer, squeezing every drop of performance out of the hardware is critical. The build system is designed to compile highly optimized binaries by default, utilizing aggressive compiler flags:

-O3: Enables the compiler's aggressive optimization level.
-march=native: Instructs the compiler to generate instructions specific to the host CPU (utilizing modern vector instruction sets like AVX if available).
-flto: Enables Link-Time Optimization (or interprocedural optimization in CMake) to optimize across translation units.
Fast Math: Enables fast floating-point modes where supported by the compiler and platform, trading strict IEEE 754 conformance for speed.

While these flags can yield substantial speedups, they carry risks: a binary built with -march=native can crash with an illegal-instruction error when it runs on a CPU that lacks the build machine's instruction sets, and LTO can trigger toolchain-specific linker bugs. To address this, the build system lets developers disable native tuning and LTO via command-line overrides (e.g., make NATIVE=0 LTO=0).

To help developers measure the impact of code changes, Luz includes a deterministic benchmarking harness. Running make benchmark generates detailed CSV reports breaking down performance across rendering, denoising, and post-processing, allowing for precise before-and-after comparisons during optimization passes.
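
As a rough illustration of how per-stage timings might end up in a CSV report, consider the sketch below. It is not the Luz harness: the column names, the use of a median over repeated runs, and the names median_ms, StageResult, and to_csv are all assumptions, and a real deterministic harness would also pin inputs such as random seeds.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Runs `stage` `repeats` times and returns the median wall-clock time in milliseconds.
template <typename Fn>
double median_ms(Fn&& stage, int repeats) {
    assert(repeats > 0);
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(repeats));
    for (int i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        stage();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }
    // nth_element places the median at `mid` without fully sorting the samples.
    const auto mid = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}

struct StageResult {
    std::string stage;  // e.g. "render", "denoise", "post"
    double ms;          // median time for that stage
};

// Formats the results as CSV with a header row (hypothetical column names).
std::string to_csv(const std::vector<StageResult>& rows) {
    std::ostringstream out;
    out << "stage,median_ms\n";
    for (const StageResult& r : rows) out << r.stage << ',' << r.ms << '\n';
    return out.str();
}
```

Reporting the median rather than the mean makes a before-and-after comparison less sensitive to one-off scheduling hiccups.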

Sources & further reading

Show HN: I wrote a C++ ray tracer from scratch without AI — github.com
