Skip to content

tugrul512bit/FastCollisionDetectionLib

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

85 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

FastCollisionDetectionLib

C++ fast collision detection for uniform(and non-uniform)-distributed AABB particles using adaptive grid with implicit vectorization.

Non-Sparse Adaptive Grid

  • 10 million dynamic particles AABB collision check per second against static grid of 8000 particles
  • 1000x speedup against naive brute-force algorithm for 40k particles (static vs static), with uniform-distribution in range [0 - 1]).
    • teapot-in-stadium problem is partially solved by "adaptive" grid:
    • 290x speedup when half of AABBs are 10x further than each other [0-1] and [10-11]
    • 230x speedup when half of AABBs are 10x far and a single AABB 100x far: [0-1] x N/2, [10-11] x N/2, [100-101] x1
  • Produced collision list does not contain duplicate pairs of collisions
  • Particle data is not touched, work done only on pointers internally
  • Currently it is adaptive, but needs optimizations on memory handling.
    • On every cell-overflow, it stretches the cell to AABB of all particles and converts to a grid of 4x4x4 cells each with 4 capacity
  • Implementation of IParticle is an AABB (axis-aligned bounding box) model
    • In user defined particle (box as example here), methods (getMinX/Y/Z and getMaxX/Y/Z) must return AABB corners of the underlying user-particle

Sparse - Linear - Adaptive Grid

  • More than 2000x speedup against naive brute-force, in single-thread for 10000 particles
  • 60 FPS for 20000 particles and ~25000 collision pairs on 2.1GHz FX8150 single-thread
    • 100+ FPS for new CPUs
  • Better performance stability compared to non-sparse version
  • Better SIMD support on all-pairs computation method using tiled-computing
  • Non-zero based object-id values supported (getId() method in IParticle interface)
// prepare memory pool
FastColDetLib::MemoryPool memPool;

// map grid to a volume of cube between corners of (0,0,0) and (10005,10005,10005)
FastColDetLib::AdaptiveGridV2 grid(memPool,0,0,0,10005,10005,10005);

// implement IParticle<float>
struct AABBofPointCloud: public FastColDetLib::IParticle<float>
{
   ...
   	const CoordType getMaxX()const {return xmax;}
	const CoordType getMaxY()const {return ymax;}
	const CoordType getMaxZ()const {return zmax;}
	const CoordType getMinX()const {return xmin;}
	const CoordType getMinY()const {return ymin;}
	const CoordType getMinZ()const {return zmin;}
	const int getId()const {return id;}
    ...
};

// initialize AABB vector
std::vector<AABBofPointCloud> AABBs;

while(simulation)
{
    // clear tree data
    grid.clear();
    
    // add particles that implement IParticle<float> into grid
    grid.addParticles(N,AABBs.data());
    
    // build tree
    grid.buildTree();
    
    // compute all-pairs collision array 
    // 60FPS on FX8150 2.1GHz single-thread for 20000 particles with less than 29000 collisions
    // 100FPS on Xeon Gold 5215 2.9GHz single-thread for 20000 particles with less than 29000 collisions
    std::vector<std::pair<int,int>> vec = grid2_0.findCollisionsAll();
    
    // the vec contains id-values of particles that have their AABBs collide so that you can do further fine-grained collision checks between them
}

Multithreaded Tree of Sparse - Linear - Adaptive Grid

  • 60 FPS for 40000 particles' AABB all-pair computations
  • Bottlenecked by RAM bandwidth and mutex-array locking throughput
  • Only zero-based object-id values supported
  • Work load is balanced on particles, not volumes, this makes better distribution of particles on threads but causes duplicated work due to merging of results from all leaf nodes
    // 7 threads with load-balancing by a tree, mapped to (0,0,0)-(10005,10005,10005) region
    AdaptiveGridTree<7> test(0,0,0,10005,10005,10005);

    for(int i=0;i<100;i++)
    {
		size_t nano;
		{
			FastColDetLib::Bench bench(&nano);
			
			// clear contents of memory pool of adaptive grid tree
			test.clear();
			
			// AABBs is a vector of objects that implements IParticle<float> interface 
	                // (only zero-based object-id values supported from getId() method of IParticle<float>)
			test.addParticles(N,AABBs.data()); 
			
			// non-duplicate pairs of collisions 
			const auto coll = test.computeAllPairs();
			std::cout<<"c="<<coll.size()<<std::endl;
		}
		std::cout<<"t="<<nano<<std::endl;
    }

For details, please visit https://github.com/tugrul512bit/FastCollisionDetectionLib/wiki wiki page.

Working demo for non-sparse (old version) adaptive grid (requires linking pthread for header and gomp/fopenmp for this demo):

#include"FastCollisionDetectionLib.h"

// user-object, with IParticle<coordinate_type> interface for querying AABB within the API
class Box: public FastColDetLib::IParticle<float>
{
public:
	Box(float xPrm=0, float yPrm=0, float zPrm=0, float x2Prm=0, float y2Prm=0, float z2Prm=0, size_t idPrm=0)
	{
		x=xPrm;
		y=yPrm;
		z=zPrm;

		x2=x2Prm;
		y2=y2Prm;
		z2=z2Prm;
		id=idPrm;
	}

	const float getMaxX() const { return x>=x2?x:x2;}
	const float getMaxY() const { return y>=y2?y:y2;}
	const float getMaxZ() const { return z>=z2?z:z2;}

	const float getMinX() const { return x>=x2?x2:x;}
	const float getMinY() const { return y>=y2?y2:y;}
	const float getMinZ() const { return z>=z2?z2:z;}
	const int getId() const {return id;}

	~Box(){}

private:
	float x,y,z,x2,y2,z2;
	size_t id;
};

#include<iostream>
#include<omp.h>
#include"Generator.h"
int main()
{

	oofrng::Generator<64> gen;

	{
		// d^3 number of particles
		const int d = 20;
		const int n=d*d*d;
		std::cout<<"n="<<n<<std::endl;
		std::vector<Box> box(n);
		for(int i=0;i<n;i++)
		{
			auto x1 = gen.generate1Float()*d;
			auto y1 = gen.generate1Float()*d;
			auto z1 = gen.generate1Float()*d;

			float dx = 0.05f;
			float dx2 = 0.05f;
			box[i]=Box(x1,y1,z1,x1+dx2+dx*gen.generate1Float(),y1+dx2+dx*gen.generate1Float(),z1+dx2+dx*gen.generate1Float(),i);
		}

		// thread-pool releases itself once it is out of scope
		// single instance of thread pool can be used for multiple adaptive grids
		FastColDetLib::ThreadPool<float> thr;
		FastColDetLib::AdaptiveGrid<float> grid(thr,-1,-1,-1,d+1,d+1,d+1);
		FastColDetLib::BruteForce<float> bruteForce;
		std::vector<FastColDetLib::CollisionPair<float>> coll3D,coll3Dbrute;


		std::cout<<"add"<<std::endl;
		bruteForce.add(&box[0],n);
		std::cout<<"compute"<<std::endl;
		bool bfor = true;


		for(int k=0;k<40;k++)
		{
			size_t nano1,nano2;

			{
				{
					FastColDetLib::Bench bench(&nano1);
					{
						FastColDetLib::Bench bench(&nano1);
						grid.clear();
					}
					std::cout<<"grid-static-object-clear: "<<nano1/1000000000.0<<" s "<<std::endl;
					{
						FastColDetLib::Bench bench(&nano1);
						grid.add(box.data(),n);
					}
					std::cout<<"grid-static-object-add ("<<n<<" particles AABB): "<<nano1/1000000000.0<<" s "<<std::endl;
					{
						FastColDetLib::Bench bench(&nano1);
						coll3D = grid.getCollisions();
					}
					std::cout<<"grid-static-object-compute: "<<nano1/1000000000.0<<" s "<<coll3D.size()<<std::endl;
				}
				std::cout<<"grid-static-total: "<<nano1/1000000000.0<<" s "<<coll3D.size()<<std::endl;
				std::vector<float> rand(1000000*3);
				gen.generate(rand.data(),1000000*3);
				{
					std::mutex mut;
					std::vector<FastColDetLib::IParticle<float>*> res;
					{
						FastColDetLib::Bench bench(&nano1);
						#pragma omp parallel for
						for(int j=0;j<100;j++)
						{
							std::vector<FastColDetLib::IParticle<float>*> resTmp;
							for(int i=0;i<1000;i++)
							{
								auto x = rand[i*3]*d;
								auto y = rand[i*3+1]*d;
								auto z = rand[i*3+2]*d;
								auto item = Box(x,y,z,x+0.25,y+0.25,z+0.25);
								auto collisions = grid.getDynamicCollisionListFor(&item);
								std::copy(collisions.begin(),collisions.end(),std::back_inserter(resTmp));
							}
							std::lock_guard<std::mutex> lg(mut);
							std::copy(resTmp.begin(),resTmp.end(),std::back_inserter(res));
						}

					}
					std::cout<<"grid-compute-dynamic (100k particles AABB): "<<nano1/1000000000.0<<" s "<<res.size()<<std::endl;
				}

			}


			if(bfor)
			{
				FastColDetLib::Bench bench(&nano2);
				coll3Dbrute = bruteForce.getCollisions();
			}
			if(bfor)
				std::cout<<"Brute-force ("<<n<<" particles AABB): "<<nano2/1000000000.0<<" s "<<coll3Dbrute.size()<<std::endl;
			std::cout<<"------------------------------------------------------------------------------------"<<std::endl;
		}


		if(bfor)
		{
			if(coll3D.size() != coll3Dbrute.size())
			{
				std::cout<<"ERROR: not equal sizes of collision lists"<<std::endl;
				std::cout<<coll3D.size()<<" vs "<< coll3Dbrute.size()<<std::endl;
				const int sz = std::min(coll3D.size(),coll3Dbrute.size());
				const int sz2 = sz<10?sz:10;
				for(int i=0;i<sz2;i++)
				{
					if((coll3D[i].getParticle1()->getId()!=coll3Dbrute[i].getParticle1()->getId()) &&
							(coll3D[i].getParticle2()->getId()!=coll3Dbrute[i].getParticle2()->getId())
							)
					{
						std::cout<<"ERRRROOOR!"<<std::endl;
						std::cout<<coll3D[i].getParticle1()->getId()<<"<-->"<<coll3D[i].getParticle2()->getId()<<"        "<<coll3Dbrute[i].getParticle1()->getId()<<"<-->"<<coll3Dbrute[i].getParticle2()->getId()<<std::endl;
					}
				}
			}
			else
			{
				const size_t sz = coll3D.size();
				for(size_t i=0;i<sz;i++)
				{

					if((coll3D[i].getParticle1()->getId()!=coll3Dbrute[i].getParticle1()->getId()) &&
							(coll3D[i].getParticle2()->getId()!=coll3Dbrute[i].getParticle2()->getId())
							)
					{
						std::cout<<"ERRRROOOR!"<<std::endl;
						std::cout<<coll3D[i].getParticle1()->getId()<<"<-->"<<coll3D[i].getParticle2()->getId()<<"        "<<coll3Dbrute[i].getParticle1()->getId()<<"<-->"<<coll3Dbrute[i].getParticle2()->getId()<<std::endl;
					}
				}
			}
		}
		std::cout<<"--"<<std::endl;


	}
	return 0;
}

output for FX8150 at 2.1GHz:

n=8000
add
compute
grid-static-object-clear: 6.4614e-05 s 
grid-static-object-add (8000 particles AABB): 0.00656692 s 
grid-static-object-compute: 0.00195991 s 16
grid-static-total: 0.00866529 s 16
grid-compute-dynamic (100k particles AABB): 0.0137853 s 3600
Brute-force (8000 particles AABB): 0.541069 s 16
------------------------------------------------------------------------------------
grid-static-object-clear: 0.00143487 s 
grid-static-object-add (8000 particles AABB): 0.00523593 s 
grid-static-object-compute: 0.00182263 s 16
grid-static-total: 0.00853913 s 16
grid-compute-dynamic (100k particles AABB): 0.00835323 s 4300
Brute-force (8000 particles AABB): 0.541662 s 16
------------------------------------------------------------------------------------
grid-static-object-clear: 0.00161331 s 
grid-static-object-add (8000 particles AABB): 0.00521717 s 
grid-static-object-compute: 0.00180544 s 16
grid-static-total: 0.0086723 s 16
grid-compute-dynamic (100k particles AABB): 0.00864333 s 3200
Brute-force (8000 particles AABB): 0.54128 s 16
------------------------------------------------------------------------------------
grid-static-object-clear: 0.00151348 s 
grid-static-object-add (8000 particles AABB): 0.00524639 s 
grid-static-object-compute: 0.00180144 s 16
grid-static-total: 0.00860804 s 16
grid-compute-dynamic (100k particles AABB): 0.010513 s 3200
Brute-force (8000 particles AABB): 0.541246 s 16
------------------------------------------------------------------------------------
grid-static-object-clear: 0.00160501 s 
grid-static-object-add (8000 particles AABB): 0.00521408 s 
grid-static-object-compute: 0.00176137 s 16
grid-static-total: 0.00862669 s 16
grid-compute-dynamic (100k particles AABB): 0.0120599 s 3000
Brute-force (8000 particles AABB): 0.54391 s 16
------------------------------------------------------------------------------------
grid-static-object-clear: 0.00174257 s 
grid-static-object-add (8000 particles AABB): 0.00548767 s 
grid-static-object-compute: 0.00182023 s 16
grid-static-total: 0.00909577 s 16
grid-compute-dynamic (100k particles AABB): 0.0083235 s 3500
Brute-force (8000 particles AABB): 0.54521 s 16
------------------------------------------------------------------------------------

This is ~8500x performance for checking a particle collision against a grid of static objects and ~50x performance for static-static collision check (or dynamic-dynamic).

More particles:

n=64000
add
compute
grid-static-object-clear: 1.3058e-05 s 
grid-static-object-add (64000 particles AABB): 0.0347275 s 
grid-static-object-compute: 0.0106954 s 114
grid-static-total: 0.0455527 s 114
grid-compute-dynamic (100k particles AABB): 0.0154652 s 3700
Brute-force (64000 particles AABB): 35.473 s 114

777x performance for static vs static collision checking, more than 2000x performance for static vs dynamic.

Releases

No releases published

Packages

No packages published

Languages