Developer Info
The general layout of the source code is based on the following directories:
- ROOT: CUDA memory manager, base classes for pseudo virtual classes in CUDA
- Math: Linear algebra math classes as well as sampling functions and function integrators
- Base: General purpose classes like timing, fixed size strings, high performance file streams and random number generators
- Engine: BSDFs, Emitters, Sensors, Image filters, Textures and all other basic components a rendering algorithm needs
- Kernel: Ray tracing operations, buffer management on CPU and GPU
- Integrators: Rendering algorithms listed above, some ported to "Wavefront Path Tracing" for efficiency on the GPU
In the host-side overview shown above it is clearly visible that polymorphism would be a useful concept for such an implementation. However, due to technical necessities, CUDA does not support creating objects of classes with virtual functions on the host and copying them to the device. To circumvent this issue, a small helper class `CudaVirtualAggregate` is used to store classes with virtual functions. For the user of the library this adds the small inconvenience that it is not possible to create new objects like the following:

```cpp
new PerspectiveSensor(fov)
```

Instead, one has to use:

```cpp
CreateAggregate<Sensor>(PerspectiveSensor(fov))
```

Here the template argument specifies the base class of the type we would like to construct.
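The idea behind such aggregate storage can be sketched roughly as follows. This is a simplified illustration, not the library's actual implementation: the names `Sensor`, `PerspectiveSensor`, and `CreateAggregate` mirror the text, but the members and the tag-based dispatch are invented here, to show how a derived object can live inside a base-type value that is safe to copy byte-wise between host and device:

```cpp
#include <cassert>
#include <cstring>

// Two hypothetical sensor types; note: no virtual functions anywhere.
struct PerspectiveSensor  { float fov;   float weight() const { return fov * 0.5f; } };
struct OrthographicSensor { float scale; float weight() const { return scale; } };

// Simplified stand-in for CudaVirtualAggregate: the derived object lives in
// a raw byte buffer inside the base, and dispatch happens via a type tag
// instead of a vtable, so the whole struct can be memcpy'd to the device.
struct Sensor {
    enum Type { Perspective, Orthographic } type;
    alignas(8) unsigned char storage[16];

    float weight() const {  // manual dispatch replaces a virtual call
        switch (type) {
        case Perspective:  return reinterpret_cast<const PerspectiveSensor*>(storage)->weight();
        case Orthographic: return reinterpret_cast<const OrthographicSensor*>(storage)->weight();
        }
        return 0.0f;
    }
};

// Hypothetical CreateAggregate: the template argument names the aggregate
// (base) type, the function argument the concrete object to store in it.
template<typename Agg> Agg CreateAggregate(const PerspectiveSensor& s) {
    Agg a; a.type = Agg::Perspective; std::memcpy(a.storage, &s, sizeof(s)); return a;
}
template<typename Agg> Agg CreateAggregate(const OrthographicSensor& s) {
    Agg a; a.type = Agg::Orthographic; std::memcpy(a.storage, &s, sizeof(s)); return a;
}
```

Because the aggregate contains no vtable pointer, the same bytes are valid on host and device; the price is that every "virtual" call has to be written out as a tag switch.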
On the host one needs to initialize/deinitialize the static parts of the library (e.g. FreeImage) in the following way:

```cpp
InitializeCuda4Tracer("path to ior/microfacet folder");
...
DeInitializeCuda4Tracer();
```
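Since the two calls must always be paired, a small RAII guard is a convenient way to drive them. The guard below is our own suggestion, not part of the library, and the two library functions are replaced by self-contained stubs here so the sketch compiles on its own:

```cpp
#include <cassert>
#include <string>

// Stand-ins so this sketch is self-contained; in real code these two
// functions come from the library headers instead.
static bool g_initialized = false;
void InitializeCuda4Tracer(const std::string&) { g_initialized = true; }
void DeInitializeCuda4Tracer() { g_initialized = false; }

// Hypothetical RAII guard: guarantees the deinitialization call runs even
// on early returns or exceptions unwinding past the rendering code.
struct TracerLibGuard {
    explicit TracerLibGuard(const std::string& dataDir) { InitializeCuda4Tracer(dataDir); }
    ~TracerLibGuard() { DeInitializeCuda4Tracer(); }
    TracerLibGuard(const TracerLibGuard&) = delete;
    TracerLibGuard& operator=(const TracerLibGuard&) = delete;
};
```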
Creating a scene is done like this:

```cpp
Sensor camera = CreateAggregate<Sensor>(PerspectiveSensor(width, height, fov));
DynamicScene scene(&camera, SceneInitData::CreateForScene(10, 10, 1000), &fManager);
```

There are other parts of the library which do not need a scene object but still require the static initialization! The `SceneInitData` object describes the size of the scene about to be created, so that enough storage is allocated on the GPU. `fManager` points to an object implementing `IFileManager`, which tells the scene where to store temporary compiled mesh objects and textures.
Before tracing any rays, the kernel module has to be initialized with:

```cpp
void k_INITIALIZE(DynamicScene* a_Scene, const CudaRNGBuffer& a_RngBuf);
```

The second parameter is a reference to a buffer of random number generators; `TracerBase` provides one for convenience.
Now `traceRay` can be used to trace rays through the scene and obtain `TraceResult`s. The diagram above shows what can be done with such a result. `DifferentialGeometry` describes the geometry at the intersection, and a `BSDFSamplingRecord` is necessary to sample `BSDF` objects. For algorithmic development the class `KernelDynamicScene` is helpful: it provides all sampling strategies which commonly occur during Monte Carlo ray tracing.
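The general shape of such a trace-and-shade step looks like this. The names `traceRay` and `TraceResult` are taken from the text, but the signatures, the types, and the toy one-plane "scene" are invented here so the sketch is runnable; the real API differs:

```cpp
#include <cassert>

// Hypothetical minimal stand-ins; the real traceRay, TraceResult and
// DifferentialGeometry in the library have different, richer signatures.
struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin, dir; };
struct TraceResult { bool hasHit; float dist; };

// Toy "scene" consisting of the single plane z = 5, so that the skeleton
// below actually executes.
TraceResult traceRay(const Ray& r) {
    if (r.dir.z <= 0.0f) return {false, 0.0f};
    float t = (5.0f - r.origin.z) / r.dir.z;
    return {t > 0.0f, t};
}

// Typical shape of an integrator's inner step: trace, stop on a miss,
// otherwise shade at the hit point and possibly continue the path.
float firstHitDistance(const Ray& r) {
    TraceResult res = traceRay(r);
    if (!res.hasHit)
        return -1.0f;  // ray left the scene
    // Here one would build a DifferentialGeometry at the hit point and fill
    // a BSDFSamplingRecord to sample the surface's BSDF for the next bounce.
    return res.dist;
}
```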
It is easily possible to implement custom integrators by deriving from

```cpp
template<bool USE_BLOCKSAMPLER, bool PROGRESSIVE> class Tracer
```

The first argument specifies whether an image-space sampler should be used to sample unconverged pixels more often. The second parameter specifies whether the tracer will progressively sample a frame until convergence or generate a new frame each time it is called. If the integrator can be implemented in terms of sampling pixels, it is sufficient to override

```cpp
virtual void RenderBlock(Image* I, int x, int y, int blockW, int blockH);
```

This method should add pixel samples for the specified area. Other integrators, such as a photon tracer, have to override

```cpp
virtual void DoRender(Image* I);
```

and sample all pixels themselves. In these methods you can either compute pixel samples on the GPU by launching CUDA kernels or use the CPU. Due to the design it is possible to make large parts of the code independent of whether it runs on the device or the host. This helps during debugging, as you can simply implement

```cpp
void Debug(Image* I, const Vec2i& p);
```

and use the host debugger to figure out what is going on.
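To make the `RenderBlock` path concrete, here is a hedged, self-contained sketch: `Image` and the `TracerSketch` base are heavily simplified stand-ins for the library's `Image` and `Tracer` template (which carry much more machinery), and `ConstantIntegrator` is invented purely for illustration:

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for the library's Image: one float per pixel.
struct Image {
    int width, height;
    std::vector<float> pixels;
    Image(int w, int h) : width(w), height(h), pixels(w * h, 0.0f) {}
};

// Hypothetical base class mirroring the Tracer idea: the driver walks the
// image in tiles and calls RenderBlock for each one.
struct TracerSketch {
    virtual ~TracerSketch() = default;
    // Implementations add pixel samples for the given tile.
    virtual void RenderBlock(Image* I, int x, int y, int blockW, int blockH) = 0;
    void Render(Image* I, int blockSize = 8) {
        for (int y = 0; y < I->height; y += blockSize)
            for (int x = 0; x < I->width; x += blockSize)
                RenderBlock(I, x, y, blockSize, blockSize);
    }
};

// A trivial integrator: writes a constant "radiance" into every pixel of
// its tile, clamping at the image border. A real integrator would trace
// rays here (on the CPU, or by launching a CUDA kernel for the tile).
struct ConstantIntegrator : TracerSketch {
    float value;
    explicit ConstantIntegrator(float v) : value(v) {}
    void RenderBlock(Image* I, int x, int y, int blockW, int blockH) override {
        for (int j = y; j < y + blockH && j < I->height; ++j)
            for (int i = x; i < x + blockW && i < I->width; ++i)
                I->pixels[j * I->width + i] = value;
    }
};
```

Because `RenderBlock` only sees a tile and an image pointer, the same integrator body can run on the host for debugging and dispatch to the device in production, which is exactly the property the `Debug` method exploits.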
Note of caution: on Windows one must link against ALL separate `*.cu.obj` object files from the original library. Otherwise the CUDA linker will not be able to link device functions.
A complete example of how to use the library can be found in the wiki.
Here are some notes about the code and its issues:
- The CUDA documentation states that it is not allowed to use an object of a class derived from a virtual base class in a device function. This probably has to do with the memory layout of objects with virtual functions. This library currently operates on the assumption that doing so is acceptable as long as no virtual functions are used. It will take a considerable amount of time to solve this problem correctly. [Side note: with multiple inheritance this will no longer work, due to mismatching host/device class layouts.]
- `size_t` vs `unsigned int`: in some places `size_t` is used while in others `unsigned int` is used. This mistake was made at the start of the project, but is actually harder to fix than one might assume, because of performance considerations on the GPU.
- Memory management is not done consistently. While `CUDA_FREE` provides some debug info on where the memory is kept, it would have been wiser to use memory classes to keep track of copies of the same data on the host and device.
- The Buffer module is not `const` correct: there is no notion of `const` iterators, and all methods return normal references.
- Constructors are commonly ignored, e.g. the buffer classes use `malloc` instead of `new` to avoid making guarantees about using the constructor. The problem here is that, due to technical necessities, CUDA does not allow constructors on symbolic variables, which are used extensively.
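The `malloc`-vs-`new` point above can be illustrated with a standard C++ pattern (this is a generic sketch, not code from the library): `malloc` hands back raw, unconstructed storage, while placement `new` can still run a constructor into that storage element by element, when and where it is actually needed:

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// A type whose constructor establishes an invariant; plain malloc alone
// would leave 'value' uninitialized.
struct Counter {
    int value;
    Counter() : value(42) {}
};

// Allocate n elements without running any constructors (as the buffer
// classes do), then construct only element i via placement new. The caller
// must destroy constructed elements and free() the storage.
Counter* allocateAndConstruct(std::size_t n, std::size_t i) {
    Counter* buf = static_cast<Counter*>(std::malloc(n * sizeof(Counter)));
    new (&buf[i]) Counter();  // placement new: constructor without allocation
    return buf;
}
```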