Developer Info

Hannes Hergeth edited this page Apr 8, 2018 · 7 revisions

Developer Information

Using the library

The general layout of the source code is based on the following directories:

  • Math: Linear algebra math classes as well as sampling functions and function integrators
  • Base: General purpose classes such as timing, fixed size strings, high performance file streams and random number generators, as well as the CUDA memory manager and the base classes for pseudo virtual classes in CUDA
  • Engine: All basic components a rendering algorithm needs such as a Scene, texture and mesh loading.
  • Kernel: Ray tracing operations, buffer management on CPU and GPU, the ImagePipeline to tonemap and filter accumulated images and also all PixelDebugVisualizers to output debug information on pixels.
  • Integrators: All rendering algorithms, some ported to "Wavefront Path Tracing" for efficiency on the GPU
  • SceneTypes: BSDFs, emitters, sensors, image filters and textures, i.e. all polymorphic types present in a Monte Carlo based path tracing library.

Prefix

The host side overview shown above makes it clear that polymorphism is a useful concept for such an implementation. However, CUDA does not support creating objects of classes with virtual functions on the host and copying them to the device, because the virtual function table pointer stored in such an object refers to host code. To circumvent this issue, a small helper class, CudaVirtualAggregate, is used to store classes with virtual functions. For the user of the library this adds one small inconvenience: new objects cannot be created like this:

new PerspectiveSensor(fov)

Instead of that, one has to use:

CreateAggregate<Sensor>(PerspectiveSensor(fov))

Here the template argument specifies the base class of the type we would like to construct.
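
To illustrate the general idea behind such an aggregate, here is a minimal host-only sketch. It is not the library's CudaVirtualAggregate; all names in it (PinholeCamera, OrthoCamera, CameraAggregate, makeCameraAggregate) are hypothetical. It stores a concrete object in fixed-size storage together with a type tag and dispatches on the tag with a switch, so the stored bytes contain no virtual function table pointer and could be copied to another address space unchanged.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical concrete types; none of these names come from the library.
struct PinholeCamera {
    static constexpr int TypeId = 1;
    float fov;
    float scale() const { return fov * 0.5f; }
};

struct OrthoCamera {
    static constexpr int TypeId = 2;
    float width;
    float scale() const { return width; }
};

// Fixed-size storage plus a type tag instead of a vtable pointer.
struct CameraAggregate {
    int typeId = 0;
    alignas(8) unsigned char data[16] = {};

    // Copy a concrete object into the storage and remember its type.
    template <typename T>
    void set(const T& value) {
        static_assert(sizeof(T) <= sizeof(data), "type too large for aggregate");
        static_assert(std::is_trivially_copyable<T>::value, "type must be memcpy-safe");
        std::memcpy(data, &value, sizeof(T));
        typeId = T::TypeId;
    }

    // Read the stored object back; the tag must match the requested type.
    template <typename T>
    T as() const {
        assert(typeId == T::TypeId);
        T value;
        std::memcpy(&value, data, sizeof(T));
        return value;
    }

    // "Pseudo virtual" call: dispatch on the tag with a switch.
    float scale() const {
        switch (typeId) {
            case PinholeCamera::TypeId: return as<PinholeCamera>().scale();
            case OrthoCamera::TypeId:   return as<OrthoCamera>().scale();
            default:                    return 0.0f;
        }
    }
};

// Factory in the spirit of CreateAggregate<Base>(Derived(...)).
template <typename T>
CameraAggregate makeCameraAggregate(const T& value) {
    CameraAggregate agg;
    agg.set(value);
    return agg;
}
```

With this sketch, makeCameraAggregate(PinholeCamera{90.0f}) yields a value whose scale() call is resolved through the tag rather than through a virtual function table.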

Initialization

On the host one needs to initialize/deinitialize the static parts of the library (FreeImage, SpectrumHelper and RoughTransmittanceManager), in the following way:

InitializeCuda4Tracer("path to ior/microfacet folder");
...
DeInitializeCuda4Tracer();
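
Since the two calls must always be paired, one option is to wrap them in a scope guard. The following is a minimal sketch, not part of the library: ScopedInit is a hypothetical helper that takes the initialization and deinitialization steps as callables, so it makes no assumption about the exact signatures of InitializeCuda4Tracer and DeInitializeCuda4Tracer.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical scope guard: runs `init` on construction and `deinit` on
// destruction, so the pair cannot get out of sync on early returns.
class ScopedInit {
public:
    ScopedInit(const std::function<void()>& init, std::function<void()> deinit)
        : deinit_(std::move(deinit)) {
        assert(static_cast<bool>(init));
        init();  // if this throws, the destructor (and deinit) never runs
    }

    ~ScopedInit() {
        if (deinit_) deinit_();
    }

    // Non-copyable: the deinitialization must happen exactly once.
    ScopedInit(const ScopedInit&) = delete;
    ScopedInit& operator=(const ScopedInit&) = delete;

private:
    std::function<void()> deinit_;
};

// Intended use with the library (assuming the calls shown above):
//
//   ScopedInit guard(
//       [] { InitializeCuda4Tracer("path to ior/microfacet folder"); },
//       [] { DeInitializeCuda4Tracer(); });
```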

Creating a scene is done like this:

Sensor camera = CreateAggregate<Sensor>(PerspectiveSensor(width, height, fov));
DynamicScene scene(&camera, SceneInitData::CreateForScene(10, 10, 1000), &fManager);

Note that some parts of the library do not need a scene object but still need the static initialization. The SceneInitData object describes the size of the scene about to be created, so that enough storage is allocated on the GPU. fManager points to an object implementing IFileManager, which tells the scene where to store temporary compiled mesh objects and textures.

Extending the library

Tracing some rays!

Before tracing any rays the kernel module has to be initialized with:

void UpdateKernel(DynamicScene* a_Scene);

Now traceRay can be used to trace rays through the scene and obtain TraceResults. The diagram above shows what can be done with such a result. DifferentialGeometry describes the geometry at the intersection, and BSDFSamplingRecord is necessary to sample BSDF objects. For algorithmic development the class KernelDynamicScene is helpful, as it provides all sampling strategies which commonly occur during Monte Carlo ray tracing.

Custom Integrators

It is easily possible to implement custom integrators by deriving from

template<bool USE_BLOCKSAMPLER, bool PROGRESSIVE> class Tracer

The first template parameter specifies whether an image space sampler should be used to sample unconverged pixels more often. The second specifies whether the tracer progressively samples a frame until convergence or generates a new frame each time it is called. If the integrator can be implemented in terms of sampling pixels, it is sufficient to override

virtual void RenderBlock(Image* I, int x, int y, int blockW, int blockH);

This method should add pixel samples for the specified area. Other integrators such as a photon tracer have to override

virtual void DoRender(Image* I);

and sample all pixels. In these methods you can either use the GPU to compute pixel samples by launching CUDA kernels or use the CPU. Due to the design it is possible to make large parts of the code independent of whether it is run on the device or host. This will help during debugging as you can just implement

void DebugInternal(Image* I, const Vec2i& p);

and use the host debugger to figure out what is going on.
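
The block-based pattern described above can be sketched without the library as follows. MockImage and MockTracer are simplified stand-ins written only for this example. They mirror the template parameters and the two method signatures quoted above, but the real Image and Tracer classes differ, and both the 32 pixel block size and the AddSample method are assumptions. ConstantIntegrator is a hypothetical integrator that adds the same value to every pixel of its block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the library's Image class (assumed interface).
struct MockImage {
    int width, height;
    std::vector<float> pixels;
    MockImage(int w, int h)
        : width(w), height(h),
          pixels(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), 0.0f) {}
    void AddSample(int x, int y, float v) {
        pixels[static_cast<std::size_t>(y) * width + x] += v;
    }
};

// Stand-in for the Tracer base class: the default DoRender covers the
// image with blocks and forwards each block to RenderBlock.
template <bool USE_BLOCKSAMPLER, bool PROGRESSIVE>
class MockTracer {
public:
    virtual ~MockTracer() {}

    virtual void DoRender(MockImage* I) {
        const int B = 32;  // assumed block size
        for (int y = 0; y < I->height; y += B)
            for (int x = 0; x < I->width; x += B)
                RenderBlock(I, x, y,
                            std::min(B, I->width - x),
                            std::min(B, I->height - y));
    }

    virtual void RenderBlock(MockImage*, int, int, int, int) {}
};

// A pixel-sampling integrator only needs to override RenderBlock.
class ConstantIntegrator : public MockTracer<false, true> {
public:
    explicit ConstantIntegrator(float v) : value_(v) {}

    void RenderBlock(MockImage* I, int x, int y, int blockW, int blockH) override {
        assert(x + blockW <= I->width && y + blockH <= I->height);
        for (int j = y; j < y + blockH; ++j)
            for (int i = x; i < x + blockW; ++i)
                I->AddSample(i, j, value_);  // one sample per pixel
    }

private:
    float value_;
};
```

Calling DoRender repeatedly on the same image accumulates samples, which corresponds to the progressive mode selected by the second template parameter.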

A complete example of how to use the library can be found in the wiki.

No longer possible due to changes in CUDA 8.0

Note of caution: On Windows one must link against ALL separate *.cu.obj object files from the original library. Without doing that, the CUDA linker will not be able to link device functions.

Issues and Notes

Here are some notes about the code and its issues:

  • It is stated elsewhere that CUDA does not allow using an object of a class derived from virtual base classes in a device function. This probably has to do with the memory layout of objects with virtual functions. This library currently operates on the assumption that doing so is acceptable as long as no virtual functions are used. Solving this problem correctly would take a considerable amount of time. [Side note: with multiple inheritance this no longer works, due to mismatching host/device class layouts.]
  • size_t vs unsigned int: Some places use size_t while others use unsigned int. Ideally only size_t would be used on the host and unsigned int on the device, to reduce register pressure. This design was not implemented completely, and in some places the host side works with unsigned int.
  • Memory management is not done consistently. While CUDA_FREE provides some debug info on where the memory is kept, it would have been wiser to use some memory classes to keep track of copies of the same data on the host and the device.
  • The Buffer module is not const correct: there is no notion of const iterators, and all methods return non-const references.
  • Constructors are commonly bypassed, e.g. the buffer classes use malloc instead of new, so they make no guarantee that a constructor is called. The underlying problem is that CUDA does not allow constructors on symbolic variables (presumably variables declared __device__ or __constant__), which are used extensively.
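
As an illustration of the last two notes, the following sketch shows one way a malloc-based buffer could be made const correct while stating its constructor assumptions explicitly. RawBuffer is a hypothetical class, not the library's Buffer module. The static_asserts restrict it to types that need no constructor call, and the const overloads return const references and const pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>

// Hypothetical buffer: raw storage from malloc, no constructors are run.
template <typename T>
class RawBuffer {
    // Only types that are safe to use without a constructor call.
    static_assert(std::is_trivially_copyable<T>::value,
                  "RawBuffer requires a trivially copyable type");
    static_assert(std::is_trivially_default_constructible<T>::value,
                  "RawBuffer requires a trivially default constructible type");

public:
    explicit RawBuffer(std::size_t count)
        : data_(static_cast<T*>(std::malloc(count * sizeof(T)))), size_(count) {
        if (!data_ && count != 0) throw std::bad_alloc();
    }
    ~RawBuffer() { std::free(data_); }

    RawBuffer(const RawBuffer&) = delete;
    RawBuffer& operator=(const RawBuffer&) = delete;

    std::size_t size() const { return size_; }

    // Mutable access for non-const buffers.
    T& operator[](std::size_t i) { assert(i < size_); return data_[i]; }
    T* begin() { return data_; }
    T* end() { return data_ + size_; }

    // Read-only access for const buffers.
    const T& operator[](std::size_t i) const { assert(i < size_); return data_[i]; }
    const T* begin() const { return data_; }
    const T* end() const { return data_ + size_; }

private:
    T* data_;
    std::size_t size_;
};
```

A function taking const RawBuffer<int>& can then read the elements but not modify them, and instantiating RawBuffer with a type that has a non-trivial constructor fails at compile time.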