CUDA-Flocking (sarahforcier/CUDA-Flocking): An introduction to CUDA programming by way of a Boids flocking simulation

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 1 - Flocking

Sarah Forcier
Tested on: GeForce GTX 1070

100,000 boids: 720 FPS
5,000 boids: 620 FPS

Performance Analysis

Framerate change with increasing number of boids

Framerate change with and without visualization

Framerate change with increasing block size

Q&A

How does changing the number of boids affect performance? Why?

Generally, increasing the number of boids lowers the framerate. However, for small boid counts the brute-force method performs comparably to the grid-accelerated methods. In the brute-force method each boid checks every other boid, so the work per step grows as N², but when N is small this loop finishes faster than the setup the grid methods require, chiefly sorting the boids by grid cell.
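The brute-force rule evaluation can be sketched on the host as the C++ loop below, a stand-in for the per-boid CUDA kernel. Each boid tests its distance to every other boid, so one step over N boids costs N(N − 1) distance tests. The three Boids rules (cohesion, separation, alignment) are the standard ones, but the thresholds and scale factors (kCohesionDist, kSeparationScale, and so on) are hypothetical values chosen for illustration, not the project's.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

float length(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// Hypothetical rule parameters, for illustration only.
constexpr float kCohesionDist = 5.0f, kSeparationDist = 3.0f, kAlignmentDist = 5.0f;
constexpr float kCohesionScale = 0.01f, kSeparationScale = 0.1f, kAlignmentScale = 0.1f;

// O(N) scan for one boid; running it for all N boids is O(N^2) per step.
Vec3 bruteForceVelocityChange(int self, const std::vector<Vec3>& pos,
                              const std::vector<Vec3>& vel) {
  Vec3 center{0, 0, 0}, separation{0, 0, 0}, avgVel{0, 0, 0};
  int cohesionCount = 0, alignmentCount = 0;
  for (int i = 0; i < static_cast<int>(pos.size()); ++i) {
    if (i == self) continue;
    float d = length(pos[i] - pos[self]);
    // Rule 1 (cohesion): accumulate neighbor positions to find the local center.
    if (d < kCohesionDist) { center = center + pos[i]; ++cohesionCount; }
    // Rule 2 (separation): push away from boids that are too close.
    if (d < kSeparationDist) { separation = separation - (pos[i] - pos[self]); }
    // Rule 3 (alignment): accumulate neighbor velocities.
    if (d < kAlignmentDist) { avgVel = avgVel + vel[i]; ++alignmentCount; }
  }
  Vec3 dv{0, 0, 0};
  if (cohesionCount > 0)
    dv = dv + (center * (1.0f / cohesionCount) - pos[self]) * kCohesionScale;
  dv = dv + separation * kSeparationScale;
  if (alignmentCount > 0)
    dv = dv + avgVel * (1.0f / alignmentCount) * kAlignmentScale;
  return dv;
}
```

The grid methods replace the inner loop over all N boids with a loop over only the boids in nearby cells, at the price of building the grid first.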

How does changing the block count and block size affect performance? Why?

Performance increases with block size but plateaus at 32, because the GPU executes threads in warps of 32. Block sizes smaller than 32 leave part of each warp idle, while larger block sizes that are multiples of 32 already fill every warp and so cannot gain anything further from this effect. Block sizes that are not multiples of 32 hurt performance because the last warp of every block is only partially filled. With the usual launch configuration, the block count is the number of boids divided by the block size, rounded up, so for a fixed boid count it falls as the block size grows.
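The relationship between block size, block count, and warp occupancy can be made concrete with a small host-side C++ calculation. It assumes the common launch pattern in which the block count is the ceiling of N divided by the block size; the helper names are hypothetical. A block size of 100, for example, occupies four warps (128 lanes), leaving 28 lanes idle.

```cpp
#include <cassert>

constexpr int kWarpSize = 32;  // warp size on NVIDIA GPUs such as the GTX 1070

// Blocks needed so that numBlocks * blockSize >= n (ceiling division),
// as in a launch of the form kernel<<<numBlocks, blockSize>>>(...).
int numBlocksFor(int n, int blockSize) {
  assert(n >= 0 && blockSize > 0);
  return (n + blockSize - 1) / blockSize;
}

// Warps occupied by one block; a partially filled warp still occupies a full warp.
int warpsPerBlock(int blockSize) {
  return (blockSize + kWarpSize - 1) / kWarpSize;
}

// Fraction of lanes in a block's warps that are assigned a thread.
double laneUtilization(int blockSize) {
  return static_cast<double>(blockSize) / (warpsPerBlock(blockSize) * kWarpSize);
}
```

For 5,000 boids, a block size of 128 gives 40 blocks with every warp full, whereas a block size of 100 gives 50 blocks whose last warps are only partly used.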

For the coherent uniform grid: did you experience any performance improvements with the more coherent uniform grid? Was this the outcome you expected? Why or why not?

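A speedup was achieved by using the coherent uniform grid. This is the expected behavior, because it reduces the number of cache misses that occur when positions and velocities are looked up. In the uniform grid implementation, the array of boid indices (effectively pointers into the position and velocity buffers) is sorted in grid-cell order, but the buffers themselves stay in their original order, so the data for boids in the same cell is scattered across non-contiguous memory. The coherent uniform grid instead reorders the position and velocity data itself into grid-cell order, so the values for boids in the same cell are contiguous.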
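The difference between the two grid variants can be sketched on the host in C++. Both sort boid indices by grid cell and record where each cell's run of sorted entries begins and ends; the coherent variant adds one gather pass that copies positions into that sorted order. Here std::stable_sort stands in for a GPU sort (thrust::sort_by_key is one option on the device); the GridData layout, the inclusive cellEnd convention, and all helper names are assumptions for illustration, and every boid is assumed to lie inside the grid.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

struct Vec3 { float x, y, z; };

struct GridData {
  std::vector<int> boidIndex;           // original boid index, sorted by cell
  std::vector<int> cellOfSorted;        // cell index of each sorted entry
  std::vector<int> cellStart, cellEnd;  // inclusive range into sorted arrays, -1 if empty
  std::vector<Vec3> coherentPos;        // positions gathered into sorted order
};

int gridIndex3Dto1D(int x, int y, int z, int res) { return x + y * res + z * res * res; }

// Assumes every boid lies inside [gridMin, gridMin + res * cellWidth) on each axis.
GridData buildGrid(const std::vector<Vec3>& pos, Vec3 gridMin, float cellWidth, int res) {
  const int n = static_cast<int>(pos.size());
  std::vector<int> cell(n);
  for (int i = 0; i < n; ++i) {
    int cx = static_cast<int>(std::floor((pos[i].x - gridMin.x) / cellWidth));
    int cy = static_cast<int>(std::floor((pos[i].y - gridMin.y) / cellWidth));
    int cz = static_cast<int>(std::floor((pos[i].z - gridMin.z) / cellWidth));
    cell[i] = gridIndex3Dto1D(cx, cy, cz, res);
  }

  GridData g;
  g.boidIndex.resize(n);
  std::iota(g.boidIndex.begin(), g.boidIndex.end(), 0);
  // Uniform grid: only the index array is sorted; positions stay in original order.
  std::stable_sort(g.boidIndex.begin(), g.boidIndex.end(),
                   [&](int a, int b) { return cell[a] < cell[b]; });

  g.cellOfSorted.resize(n);
  for (int i = 0; i < n; ++i) g.cellOfSorted[i] = cell[g.boidIndex[i]];

  // Mark where each cell's run of sorted entries begins and ends.
  g.cellStart.assign(res * res * res, -1);
  g.cellEnd.assign(res * res * res, -1);
  for (int i = 0; i < n; ++i) {
    int c = g.cellOfSorted[i];
    if (i == 0 || g.cellOfSorted[i - 1] != c) g.cellStart[c] = i;
    if (i == n - 1 || g.cellOfSorted[i + 1] != c) g.cellEnd[c] = i;
  }

  // Coherent grid: one extra gather so a cell's positions are contiguous.
  // Velocities would be gathered the same way.
  g.coherentPos.resize(n);
  for (int i = 0; i < n; ++i) g.coherentPos[i] = pos[g.boidIndex[i]];
  return g;
}
```

With this layout, the coherent neighbor loop reads coherentPos[cellStart[c]] through coherentPos[cellEnd[c]] directly, while the plain uniform grid must go through boidIndex first and then jump to wherever that boid's data happens to live.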

Did changing cell width and checking 27 vs 8 neighboring cells affect performance? Why or why not?

Decreasing the cell width from twice the neighborhood distance to the neighborhood distance, and checking 27 neighboring cells instead of 8, decreased performance by a factor of 7 for both the uniform and coherent grid methods. The brute-force method is not affected by this change, since it does not use the grid. The simulation is much slower because each boid has more cells to visit during each step, and with more cells to visit there is a greater probability of cache misses. This test was run with 10,000 boids because the FPS is zero for simulations with 50,000 or more boids.
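One general way to choose which cells to search, sketched in C++ below, is to take every cell overlapped by the axis-aligned box of half-width d (the neighborhood distance) around the boid. With a cell width of 2d this box spans at most 2 cells per axis (8 cells); with a cell width of d it spans up to 3 per axis (27 cells). For reference, 8 cells of width 2d cover a volume of 64d³, while 27 cells of width d cover 27d³, so the finer grid visits more cells but draws candidates from a smaller region. The function names and the clamping to the grid bounds are assumptions, not the project's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct CellRange { int lo[3]; int hi[3]; };  // inclusive cell bounds per axis

// Cells overlapped by the box [p - d, p + d], clamped to a res^3 grid.
CellRange neighborCells(const float p[3], const float gridMin[3],
                        float cellWidth, float d, int res) {
  CellRange r;
  for (int a = 0; a < 3; ++a) {
    int lo = static_cast<int>(std::floor((p[a] - d - gridMin[a]) / cellWidth));
    int hi = static_cast<int>(std::floor((p[a] + d - gridMin[a]) / cellWidth));
    r.lo[a] = std::max(lo, 0);
    r.hi[a] = std::min(hi, res - 1);
  }
  return r;
}

// Total number of cells in the range (product of per-axis extents).
int cellCount(const CellRange& r) {
  int count = 1;
  for (int a = 0; a < 3; ++a) count *= std::max(r.hi[a] - r.lo[a] + 1, 0);
  return count;
}
```

A kernel would then loop over z, y, x within these bounds, with x innermost so that consecutive iterations touch consecutive cell indices under an x + y*res + z*res*res layout.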
