GitHub - cawney4/Project1-CUDA-Flocking

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 1 - Flocking

Flocking with CUDA on the GPU

Connie Chang

LinkedIn, Demo Reel
Tested on: Windows 10, Intel Xeon CPU E5-1630 v4 @ 3.70 GHz, GTX 1070 8GB (SIG Lab)

Introduction

This project consists of three different algorithms for flocking, and a performance analysis. Everthing is written in CUDA for the GPU. I was responsible for implementing the flocking algorithms, the neighborhood search, and invoking CUDA kernels.

10,000 particles flocking together

Screenshot of 5,000 particles

The three algorithms are:

Naive: Querying every particle to find close neighbors.
Uniform Grid: Breaking the space into voxels, and only checking nearby voxels.
Coherent Uniform Grid: The same as Uniform Grid, but with structuring particle data as contiguously as possible

Performance Graphs

A graph comparing the performance of each algorithm.

A graph comparing the number of particles (boids).

A graph comparing the number of threads per block.

A graph comparing the uniform grid's cell width, relative to the neighbor search distance.

Performance Questions and Answers:

For each implementation, how does changing the number of boids affect performance? Why do you think this is?

Naive: Increasing boids slows down the simulation. This makes sense because the algorithm takes more time to search through every boid.
Uniform Grid: Surprisingly, performance gets better as the number of boids increases. I've heard that GPUs can perform faster when larger blocks of threads are used. My guess is that's why my results are like so.
Coherent: Similar to uniform grid, the performance increases as the number of boids increases.

For each implementation, how does changing the block count and block size affect performance? Why do you think this is?

For all implementations, changing the block size did not significantly affect performance. I think this is because we need much more than 1024 threads, the maximum block size. Therefore, the blocks must be queued even at max block size, and we cannot go any faster.

For the coherent uniform grid: did you experience any performance improvements with the more coherent uniform grid? Was this the outcome you expected? Why or why not?

No, I did not see any performance improvements. In fact, it was sometimes slower to use the coherent uniform grid. This was not what I expected because I thought it would be faster. I expected lining up the boids more contiguously in memory would mean faster access to them. Seeing my results, my guess is that the extra steps to reshuffle the position/velocity data were too slow. It slowed down the process enough that the contiguous memory could not make up for it.

Did changing cell width and checking 27 vs 8 neighboring cells affect performance? Why or why not? Be careful: it is insufficient (and possibly incorrect) to say that 27-cell is slower simply because there are more cells to check!

No, changing the cell width did not affect performance. Even though we are checking more neighboring cells with a smaller width, we check for less boids in each cell. In other words, a large cell has a higher chance of encompassing more boids that are too far to influence the current boid. On the other hand, a small cell has a lower chance of encountering such boids.

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
cmake		cmake
external		external
images		images
performance		performance
shaders		shaders
src		src
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
GNUmakefile		GNUmakefile
INSTRUCTION.md		INSTRUCTION.md
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Flocking with CUDA on the GPU

Connie Chang

Introduction

Performance Graphs

Performance Questions and Answers:

About

Releases

Packages

Languages

cawney4/Project1-CUDA-Flocking

Folders and files

Latest commit

History

Repository files navigation

Flocking with CUDA on the GPU

Connie Chang

Introduction

Performance Graphs

Performance Questions and Answers:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages