Software rasterizer, optimized using SIMD and multithreading.

krzosa/software_rasterizer

Realtime Software Renderer

Optimized realtime software renderer. It renders Sponza Palace at 30 FPS on a Ryzen 5800U and was optimized using SIMD instructions and multithreading.

screenshot1 screenshot2

Rasterization

The algorithm comes from the article "A Parallel Algorithm for Polygon Rasterization" by Juan Pineda. First, the bounding box of a triangle is calculated. Every pixel in that bounding box is then checked with the edge function from the paper to figure out whether it belongs to the triangle. Other than that, these things also happen during rasterization:

  • Clipping
  • Texture mapping
  • Depth buffer, near objects occlude far objects
  • Transparency using premultiplied alpha
  • Gamma correct interpolation of colors
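The bounding box plus edge function test described above can be sketched roughly as follows. This is a minimal illustration, not the code from main.cpp: the names `Vec2`, `edge_function` and `fill_triangle` are made up for the example, the winding order is an assumption, and texturing, depth testing, blending and a proper fill convention are left out.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct Vec2 { float x, y; };

// Edge function in the style of Pineda's paper: twice the signed area of (a, b, p).
// Its sign tells on which side of the directed edge a->b the point p lies.
inline float edge_function(Vec2 a, Vec2 b, Vec2 p) {
    return (p.x - a.x) * (b.y - a.y) - (p.y - a.y) * (b.x - a.x);
}

// Fills one triangle into a width*height buffer and returns the number of pixels written.
// Assumption: vertices are wound so that interior points give non-negative edge values.
int fill_triangle(std::vector<uint32_t>& pixels, int width, int height,
                  Vec2 v0, Vec2 v1, Vec2 v2, uint32_t color) {
    // Bounding box of the triangle, clamped to the image so no pixel is out of range.
    int min_x = std::max(0, static_cast<int>(std::floor(std::min({v0.x, v1.x, v2.x}))));
    int min_y = std::max(0, static_cast<int>(std::floor(std::min({v0.y, v1.y, v2.y}))));
    int max_x = std::min(width - 1, static_cast<int>(std::ceil(std::max({v0.x, v1.x, v2.x}))));
    int max_y = std::min(height - 1, static_cast<int>(std::ceil(std::max({v0.y, v1.y, v2.y}))));

    int covered = 0;
    for (int y = min_y; y <= max_y; y++) {
        for (int x = min_x; x <= max_x; x++) {
            Vec2 p = {x + 0.5f, y + 0.5f};  // sample at the pixel center
            float e0 = edge_function(v1, v2, p);
            float e1 = edge_function(v2, v0, p);
            float e2 = edge_function(v0, v1, p);
            // Inside when the point is on the same side of all three edges.
            // Pixels exactly on an edge pass too (no fill convention in this sketch).
            if (e0 >= 0 && e1 >= 0 && e2 >= 0) {
                pixels[y * width + x] = color;
                covered++;
            }
        }
    }
    return covered;
}
```

Dividing `e0`, `e1` and `e2` by the edge function of the whole triangle would give barycentric weights, which is what attribute interpolation (UVs, depth, colors) typically builds on.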

SIMD

The inner loop of the rasterizer is fully vectorized using AVX, AVX2 and FMA intrinsics. The general SIMD strategy is to process 8 pixels at a time. On every iteration, pixels are converted from the bitmap format to a format suited for SIMD computation, which groups each color channel into its own register. Then, for example, all computation on red is done on a vector of 8 reds.
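To illustrate the data layout only, here is a scalar sketch of the conversion between packed pixels and the per-channel format. It deliberately uses plain arrays instead of intrinsics so it compiles anywhere; in an AVX build each 8-float array would correspond to one 256-bit register. The `0xAARRGGBB` channel order and the names `Pixels8`, `unpack8` and `pack8` are assumptions made for the example, not taken from the repository.

```cpp
#include <cstdint>

constexpr int LANES = 8;  // one 256-bit AVX register holds 8 floats

// One array per channel: 8 reds together, 8 greens together, and so on.
struct Pixels8 {
    float r[LANES], g[LANES], b[LANES], a[LANES];
};

// Extract one 8-bit channel and normalize it to [0, 1].
inline float channel(uint32_t pixel, int shift) {
    return static_cast<float>((pixel >> shift) & 0xFF) / 255.0f;
}

// Convert a [0, 1] float back to an 8-bit value with rounding.
inline uint32_t to_byte(float v) {
    return static_cast<uint32_t>(v * 255.0f + 0.5f);
}

// Bitmap format -> SIMD format (assumed packing: 0xAARRGGBB).
inline Pixels8 unpack8(const uint32_t* src) {
    Pixels8 out;
    for (int i = 0; i < LANES; i++) {
        out.a[i] = channel(src[i], 24);
        out.r[i] = channel(src[i], 16);
        out.g[i] = channel(src[i], 8);
        out.b[i] = channel(src[i], 0);
    }
    return out;
}

// SIMD format -> bitmap format. Inputs are expected to already be in [0, 1].
inline void pack8(const Pixels8& in, uint32_t* dst) {
    for (int i = 0; i < LANES; i++) {
        dst[i] = (to_byte(in.a[i]) << 24) | (to_byte(in.r[i]) << 16) |
                 (to_byte(in.g[i]) << 8) | to_byte(in.b[i]);
    }
}
```

With this layout, an operation such as blending the red channel becomes one loop over `r[0..7]`, which maps directly onto a single vector instruction per step.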

Multithreading

The rendered image is split into tiles, and each thread gets one tile to render. To synchronize work between threads, a simple work queue is implemented. It uses only atomic operations and semaphores to distribute work, and follows a single-producer, multiple-consumer architecture.
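A rough sketch of the tile distribution idea, using only an atomic counter. This is a simplification under stated assumptions: the repository's queue also uses semaphores (presumably so idle workers can sleep), which is omitted here, and `Tile`, `make_tiles` and `TileQueue` are hypothetical names.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <vector>

struct Tile { int x0, y0, x1, y1; };  // half-open pixel rectangle [x0, x1) x [y0, y1)

// Split the image into tiles of at most tile_size x tile_size pixels.
std::vector<Tile> make_tiles(int width, int height, int tile_size) {
    std::vector<Tile> tiles;
    for (int y = 0; y < height; y += tile_size) {
        for (int x = 0; x < width; x += tile_size) {
            tiles.push_back({x, y, std::min(x + tile_size, width),
                                   std::min(y + tile_size, height)});
        }
    }
    return tiles;
}

struct TileQueue {
    std::vector<Tile> tiles;            // filled by the single producer before workers start
    std::atomic<std::size_t> next{0};   // index of the next unclaimed tile

    // Called by any consumer thread. Returns false once every tile has been handed out.
    // fetch_add guarantees that no two threads receive the same index.
    bool claim(Tile& out) {
        std::size_t i = next.fetch_add(1);
        if (i >= tiles.size()) return false;
        out = tiles[i];
        return true;
    }
};
```

Each worker would loop on `claim` and render the tile it received. Because tiles never overlap, workers can write to the framebuffer without locking.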

Clipping

There are 3 clipping stages: 2 in 3D space, against zfar and znear, and 1 in 2D, against the left, bottom, right and top image bounds.

First, triangles get clipped against the zfar plane. If a triangle has even one vertex outside the clipping region, the entire triangle gets discarded. So far I haven't had problems with that. It simplifies the computations, and splitting triangles on zfar seems like a waste of power.

The second clipping stage is the znear plane. Triangles get fully and properly clipped against znear. Whenever a triangle is partially outside the clipping region, it gets cut at znear, and either one or two new triangles are derived from the old one.
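The two 3D stages can be sketched like this. It is a minimal version under assumptions: the camera is taken to look down +z, so the visible range is `znear <= z <= zfar`, only positions are handled (UVs and other attributes would be interpolated with the same `t`), and all names are invented for the example. The znear part is a single-plane Sutherland-Hodgman clip, which may differ from how main.cpp does it.

```cpp
#include <vector>

struct Vec3 { float x, y, z; };
struct Triangle { Vec3 v[3]; };

// zfar stage: drop the whole triangle if even one vertex is beyond zfar.
inline bool rejected_by_zfar(const Triangle& tri, float zfar) {
    return tri.v[0].z > zfar || tri.v[1].z > zfar || tri.v[2].z > zfar;
}

// Point where the segment a->b crosses the plane z = znear.
inline Vec3 intersect_znear(Vec3 a, Vec3 b, float znear) {
    float t = (znear - a.z) / (b.z - a.z);
    return {a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t};
}

// znear stage: keeps the part of the triangle with z >= znear.
// Appends 0, 1 or 2 triangles to out and returns how many were appended.
int clip_znear(const Triangle& tri, float znear, std::vector<Triangle>& out) {
    Vec3 poly[4];  // clipping a triangle against one plane yields at most 4 vertices
    int n = 0;
    for (int i = 0; i < 3; i++) {
        Vec3 cur = tri.v[i];
        Vec3 nxt = tri.v[(i + 1) % 3];
        bool cur_in = cur.z >= znear;
        bool nxt_in = nxt.z >= znear;
        if (cur_in) poly[n++] = cur;                       // keep visible vertices
        if (cur_in != nxt_in) poly[n++] = intersect_znear(cur, nxt, znear);  // edge crosses the plane
    }
    if (n < 3) return 0;  // fully behind znear

    Triangle first = {{poly[0], poly[1], poly[2]}};
    out.push_back(first);
    if (n == 4) {
        // One vertex was cut off, leaving a quad: split it into a second triangle.
        Triangle second = {{poly[0], poly[2], poly[3]}};
        out.push_back(second);
        return 2;
    }
    return 1;
}
```

One vertex behind znear leaves a quad and therefore two triangles. Two vertices behind leave a single smaller triangle. In both cases the winding order of the input is preserved.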

The last clipping stage is performed in 2D image space. Every triangle has a corresponding axis-aligned bounding box (AABB), and every pixel in this box gets tested to see if it's inside the triangle. In this stage the box is clipped to the image bounds: 0, 0, width, height.

Source reading guide

  • main.cpp contains all the relevant drawing routines, including the optimized triangle rasterizer and other things like rendering text bitmaps
  • base files act as a standard library
  • base.cpp contains used data structures
  • os_windows_base.cpp contains platform specific code that base partially depends on
  • os_windows_multimedia.cpp deals with creating a window, creating a writable framebuffer etc.

Building

  1. Download Visual Studio and Clang
  2. Run build.bat
  3. The executable requires a specific Sponza obj + textures, which are not bundled with the repository because they are too big (500 MB). The repository is only for showcase. If you actually want to run this you can PM me, but I doubt that anyone would want to run this...

Things to do:

  • Drawing triangles

  • Drawing cubes and lines for testing

  • Y up coordinate system, left handed

  • Drawing a cube with perspective

  • Culling triangles facing away from camera

  • Texture mapping

  • Basic linear transformations - rotation, translation, scaling

  • Bilinear filtering of textures

  • Nearest filtering

  • Fix the gaps between triangles (it also improved look of triangle edges)

  • Perspective matrix vs simple perspective

  • Perspective correct interpolation

  • Depth buffer

  • Gamma correct blending - converting to almost linear space

  • Alpha blending

  • Premultiplied alpha

  • Merge with base

  • Fill convention

  • Antialiasing (seems like performance gets really bad with this)

  • LookAt Camera

  • FPS Camera

  • Quaternions for rotations

  • Reading OBJ models

  • Dumping raw obj files

  • Loading raw obj files, big startup speedup!

  • Reading more OBJ formats

  • Reading OBJ .mtl files

  • Loading materials

  • Rendering textured obj models

  • Reading complex obj models (sponza)

  • Fix sponza uv coordinates - the issue was uv > 1 and uv < 0

  • Clipping

    • Triangle rectangle bound clipping
    • A way of culling Z out triangles
      • Simple test z clipping
      • Maybe should clip a triangle on znear zfar plane?
      • Maybe should clip out triangles that are fully z out before draw_triangle
  • Proper infrastructure for transparent textures - sorting before rendering

  • Effects!!!

    • Outlines
  • Lighting

  • Reading PMX files

  • Rendering multiple objects, queue renderer

    • Simple function to render a mesh
  • Simple profiling tooling

  • Statistics based on profiler data

  • Find cool profilers - ExtraSleepy, Vtune

  • Optimizations

    • Inline edge function
    • Expand edge functions to more optimized version
    • [-] Test 4x2 bitmap layout?
    • [-] Edge function to integer
    • [-] Use integer bit operations to figure out if plus. (edge0|edge1|edge2)>=0
    • SIMD
    • Optimized SIMD
    • Multithreading
  • Text rendering

  • UI

    • Labels
    • Settings variables
    • Signals
    • Sliders
    • Groups
  • Gamma correct alpha blending for rectangles and bitmaps

  • Plotting of profile data

    • Simple scatter plot
  • Asset processor as second program

Resources that helped me build the rasterizer (Might be helpful to you too):

To read
