Developer Info
The general layout of the source code is based on the following directories:
- ROOT: CUDA memory manager, base classes for pseudo virtual classes in CUDA
- Math: Linear algebra math classes as well as sampling functions and function integrators
- Base: General purpose classes like timing, fixed size strings, high performance file streams and random number generators
- Engine: BSDFs, Emitters, Sensors, Image filters, Textures and all other basic components a rendering algorithm needs
- Kernel: Ray tracing operations, buffer management on CPU and GPU
- Integrators: Rendering algorithms listed above, some ported to "Wavefront Path Tracing" for efficiency on the GPU
In the host-side overview shown above it is clearly visible that polymorphism would be a useful concept for such an implementation. However, due to technical necessities, CUDA does not support creating objects of classes with virtual functions on the host and copying them to the device. To circumvent this issue, a small helper class `CudaVirtualAggregate` is used to store classes with virtual functions. For the user of the library this adds the small inconvenience that it is not possible to create new objects like the following:

```cpp
new PerspectiveSensor(fov)
```

Instead, one has to use:

```cpp
CreateAggregate<Sensor>(PerspectiveSensor(fov))
```

Here the template argument specifies the base class of the type we would like to construct.
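The idea behind such aggregate storage can be sketched roughly as follows. This is a simplified illustration, not the library's actual implementation: the names `Sensor`, `PerspectiveSensor`, and `CreateAggregate` mirror the text, but the members and the tag-based dispatch are invented here, to show how a derived object can live inside a base-type value that is safe to copy byte-wise between host and device:

```cpp
#include <cassert>
#include <cstring>

// Two hypothetical sensor types; note: no virtual functions anywhere.
struct PerspectiveSensor  { float fov;   float weight() const { return fov * 0.5f; } };
struct OrthographicSensor { float scale; float weight() const { return scale; } };

// Simplified stand-in for CudaVirtualAggregate: the derived object lives in
// a raw byte buffer inside the base, and dispatch happens via a type tag
// instead of a vtable, so the whole struct can be memcpy'd to the device.
struct Sensor {
    enum Type { Perspective, Orthographic } type;
    alignas(8) unsigned char storage[16];

    float weight() const {  // manual dispatch replaces a virtual call
        switch (type) {
        case Perspective:  return reinterpret_cast<const PerspectiveSensor*>(storage)->weight();
        case Orthographic: return reinterpret_cast<const OrthographicSensor*>(storage)->weight();
        }
        return 0.0f;
    }
};

// Hypothetical CreateAggregate: the template argument names the aggregate
// (base) type, the function argument the concrete object to store in it.
template<typename Agg> Agg CreateAggregate(const PerspectiveSensor& s) {
    Agg a; a.type = Agg::Perspective; std::memcpy(a.storage, &s, sizeof(s)); return a;
}
template<typename Agg> Agg CreateAggregate(const OrthographicSensor& s) {
    Agg a; a.type = Agg::Orthographic; std::memcpy(a.storage, &s, sizeof(s)); return a;
}
```

Because the aggregate contains no vtable pointer, the same bytes are valid on host and device; the price is that every "virtual" call has to be written out as a tag switch.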
On the host one needs to initialize/deinitialize the static parts of the library (e.g. FreeImage) in the following way:

```cpp
InitializeCuda4Tracer("path to ior/microfacet folder");
...
DeInitializeCuda4Tracer();
```
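Since the two calls must always be paired, a small RAII guard is a convenient way to drive them. The guard below is our own suggestion, not part of the library, and the two library functions are replaced by self-contained stubs here so the sketch compiles on its own:

```cpp
#include <cassert>
#include <string>

// Stand-ins so this sketch is self-contained; in real code these two
// functions come from the library headers instead.
static bool g_initialized = false;
void InitializeCuda4Tracer(const std::string&) { g_initialized = true; }
void DeInitializeCuda4Tracer() { g_initialized = false; }

// Hypothetical RAII guard: guarantees the deinitialization call runs even
// on early returns or exceptions unwinding past the rendering code.
struct TracerLibGuard {
    explicit TracerLibGuard(const std::string& dataDir) { InitializeCuda4Tracer(dataDir); }
    ~TracerLibGuard() { DeInitializeCuda4Tracer(); }
    TracerLibGuard(const TracerLibGuard&) = delete;
    TracerLibGuard& operator=(const TracerLibGuard&) = delete;
};
```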
Creating a scene is done like this:

```cpp
Sensor camera = CreateAggregate<Sensor>(PerspectiveSensor(width, height, fov));
DynamicScene scene(&camera, SceneInitData::CreateForScene(10, 10, 1000), &fManager);
```

There are other parts of the library which do not need a scene object but still require the static initialization! The `SceneInitData` object describes the size of the scene about to be created, so that enough storage is allocated on the GPU. `fManager` points to an object implementing `IFileManager`, which tells the scene where to store temporary compiled mesh objects and textures.
Before tracing any rays, the kernel module has to be initialized with:

```cpp
void k_INITIALIZE(DynamicScene* a_Scene, const CudaRNGBuffer& a_RngBuf);
```

The second parameter is a reference to a buffer of random number generators; `TracerBase` provides one for convenience.
Now `traceRay` can be used to trace rays through the scene and obtain `TraceResult`s. The diagram above shows what can be done with such a result. `DifferentialGeometry` describes the geometry at the intersection, and a `BSDFSamplingRecord` is necessary to sample `BSDF` objects. For algorithmic development the class `KernelDynamicScene` is helpful: it provides all sampling strategies which commonly occur during Monte Carlo ray tracing.
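The general shape of such a trace-and-shade step looks like this. The names `traceRay` and `TraceResult` are taken from the text, but the signatures, the types, and the toy one-plane "scene" are invented here so the sketch is runnable; the real API differs:

```cpp
#include <cassert>

// Hypothetical minimal stand-ins; the real traceRay, TraceResult and
// DifferentialGeometry in the library have different, richer signatures.
struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin, dir; };
struct TraceResult { bool hasHit; float dist; };

// Toy "scene" consisting of the single plane z = 5, so that the skeleton
// below actually executes.
TraceResult traceRay(const Ray& r) {
    if (r.dir.z <= 0.0f) return {false, 0.0f};
    float t = (5.0f - r.origin.z) / r.dir.z;
    return {t > 0.0f, t};
}

// Typical shape of an integrator's inner step: trace, stop on a miss,
// otherwise shade at the hit point and possibly continue the path.
float firstHitDistance(const Ray& r) {
    TraceResult res = traceRay(r);
    if (!res.hasHit)
        return -1.0f;  // ray left the scene
    // Here one would build a DifferentialGeometry at the hit point and fill
    // a BSDFSamplingRecord to sample the surface's BSDF for the next bounce.
    return res.dist;
}
```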
It is easily possible to implement custom integrators by deriving from

```cpp
template<bool USE_BLOCKSAMPLER, bool PROGRESSIVE> class Tracer
```

The first argument specifies whether an image-space sampler should be used to sample unconverged pixels more often. The second parameter specifies whether the tracer will progressively sample a frame until convergence or generate a new frame each time it is called. If the integrator can be implemented in terms of sampling pixels, it is sufficient to override

```cpp
virtual void RenderBlock(Image* I, int x, int y, int blockW, int blockH);
```

This method should add pixel samples for the specified area. Other integrators, such as a photon tracer, have to override

```cpp
virtual void DoRender(Image* I);
```

and sample all pixels themselves. In these methods you can either compute pixel samples on the GPU by launching CUDA kernels or use the CPU. Due to the design it is possible to make large parts of the code independent of whether it runs on the device or the host. This helps during debugging, as you can simply implement

```cpp
void Debug(Image* I, const Vec2i& p);
```

and use the host debugger to figure out what is going on.
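To make the `RenderBlock` path concrete, here is a hedged, self-contained sketch: `Image` and the `TracerSketch` base are heavily simplified stand-ins for the library's `Image` and `Tracer` template (which carry much more machinery), and `ConstantIntegrator` is invented purely for illustration:

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for the library's Image: one float per pixel.
struct Image {
    int width, height;
    std::vector<float> pixels;
    Image(int w, int h) : width(w), height(h), pixels(w * h, 0.0f) {}
};

// Hypothetical base class mirroring the Tracer idea: the driver walks the
// image in tiles and calls RenderBlock for each one.
struct TracerSketch {
    virtual ~TracerSketch() = default;
    // Implementations add pixel samples for the given tile.
    virtual void RenderBlock(Image* I, int x, int y, int blockW, int blockH) = 0;
    void Render(Image* I, int blockSize = 8) {
        for (int y = 0; y < I->height; y += blockSize)
            for (int x = 0; x < I->width; x += blockSize)
                RenderBlock(I, x, y, blockSize, blockSize);
    }
};

// A trivial integrator: writes a constant "radiance" into every pixel of
// its tile, clamping at the image border. A real integrator would trace
// rays here (on the CPU, or by launching a CUDA kernel for the tile).
struct ConstantIntegrator : TracerSketch {
    float value;
    explicit ConstantIntegrator(float v) : value(v) {}
    void RenderBlock(Image* I, int x, int y, int blockW, int blockH) override {
        for (int j = y; j < y + blockH && j < I->height; ++j)
            for (int i = x; i < x + blockW && i < I->width; ++i)
                I->pixels[j * I->width + i] = value;
    }
};
```

Because `RenderBlock` only sees a tile and an image pointer, the same integrator body can run on the host for debugging and dispatch to the device in production, which is exactly the property the `Debug` method exploits.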
Note of caution: on Windows one must link against ALL separate `*.cu.obj` object files from the original library. Otherwise the CUDA linker will not be able to link device functions.
A complete example of how to use the library can be found in the wiki.
Here are some notes about the code and its issues:
- The CUDA documentation states that it is not allowed to use an object of a class derived from a virtual base class in a device function. This probably has to do with the memory layout of objects with virtual functions. This library currently operates on the assumption that doing so is acceptable as long as no virtual functions are used. It will take a considerable amount of time to solve this problem correctly. [Side note: with multiple inheritance this will no longer work, due to mismatching host/device class layouts.]
- `size_t` vs `unsigned int`: in some places `size_t` is used while in others `unsigned int` is used. This mistake was made at the start of the project, but is actually harder to fix than one might assume, because of performance considerations on the GPU.
- Memory management is not done consistently. While `CUDA_FREE` provides some debug info on where the memory is kept, it would have been wiser to use memory classes to keep track of copies of the same data on the host and device.
- The Buffer module is not `const` correct: there is no notion of `const` iterators, and all methods return normal references.
- Constructors are commonly ignored, e.g. the buffer classes use `malloc` instead of `new` to avoid making guarantees about using the constructor. The problem here is that, due to technical necessities, CUDA does not allow constructors on symbolic variables, which are used extensively.
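The `malloc`-vs-`new` point above can be illustrated with a standard C++ pattern (this is a generic sketch, not code from the library): `malloc` hands back raw, unconstructed storage, while placement `new` can still run a constructor into that storage element by element, when and where it is actually needed:

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// A type whose constructor establishes an invariant; plain malloc alone
// would leave 'value' uninitialized.
struct Counter {
    int value;
    Counter() : value(42) {}
};

// Allocate n elements without running any constructors (as the buffer
// classes do), then construct only element i via placement new. The caller
// must destroy constructed elements and free() the storage.
Counter* allocateAndConstruct(std::size_t n, std::size_t i) {
    Counter* buf = static_cast<Counter*>(std::malloc(n * sizeof(Counter)));
    new (&buf[i]) Counter();  // placement new: constructor without allocation
    return buf;
}
```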