Project 1: Henry Zhu #6

Open: wants to merge 6 commits into base: master
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -46,7 +46,7 @@ set(CORELIBS
)

# Enable C++11 for host code
-set(CMAKE_CXX_STANDARD 11)
+set(CMAKE_CXX_STANDARD 17)

# Enable CUDA debug info in debug mode builds
list(APPEND CUDA_NVCC_FLAGS_DEBUG -G -g)
83 changes: 77 additions & 6 deletions README.md
@@ -1,11 +1,82 @@
**University of Pennsylvania, CIS 565: GPU Programming and Architecture,
Project 1 - Flocking**

* (TODO) YOUR NAME HERE
* (TODO) [LinkedIn](), [personal website](), [twitter](), etc.
* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
## 1000 boids and 10000 boids with coherence

### (TODO: Your README)
![](coherent_1000.gif)

![](coherent_10000.gif)

* Henry Zhu
* [LinkedIn](https://www.linkedin.com/in/henry-zhu-347233121/), [personal website](https://maknee.github.io/), [twitter](https://twitter.com/maknees1)
* Tested on: Windows 10 Home, Intel i7-4710HQ @ 2.50GHz 22GB, GTX 870M (Own computer)

## Answer to Questions

### For each implementation, how does changing the number of boids affect performance? Why do you think this is?

With more boids, the CUDA kernels have to iterate over more boids and evaluate more neighbors, so the frame rate drops. The naive implementation is hit hardest: every boid checks every other boid when applying the rules, so the work grows as O(N²). The non-naive implementations are less affected, since boids are stored in grid cells and each boid only checks the boids in nearby cells.
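
To make the difference concrete, here is a minimal CPU-side sketch that only counts neighbor checks; it is not the project's CUDA code. The names `Vec3`, `naiveChecks`, and `gridChecks` are hypothetical, and the grid version assumes each boid looks at its own cell plus the 26 cells around it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Naive search: every boid tests every other boid, so N * (N - 1) checks.
std::size_t naiveChecks(std::size_t n) {
    return n == 0 ? 0 : n * (n - 1);
}

// Grid search: bin boids into a res x res x res grid, then count how many
// other boids each boid would test in its own cell and the 26 cells around it.
std::size_t gridChecks(const std::vector<Vec3>& pos, float cellWidth, int res) {
    assert(cellWidth > 0.0f && res > 0);
    auto axisCell = [&](float v) {
        int c = static_cast<int>(std::floor(v / cellWidth));
        return c < 0 ? 0 : (c >= res ? res - 1 : c);  // clamp to the grid
    };
    auto index = [&](int x, int y, int z) {
        return (static_cast<std::size_t>(z) * res + y) * res + x;
    };

    // Pass 1: count the boids in each cell.
    std::vector<std::size_t> perCell(static_cast<std::size_t>(res) * res * res, 0);
    for (const Vec3& p : pos) {
        ++perCell[index(axisCell(p.x), axisCell(p.y), axisCell(p.z))];
    }

    // Pass 2: sum the occupancy of the 3 x 3 x 3 neighborhood of each boid.
    std::size_t total = 0;
    for (const Vec3& p : pos) {
        const int cx = axisCell(p.x), cy = axisCell(p.y), cz = axisCell(p.z);
        for (int dz = -1; dz <= 1; ++dz)
        for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            const int x = cx + dx, y = cy + dy, z = cz + dz;
            if (x < 0 || y < 0 || z < 0 || x >= res || y >= res || z >= res) continue;
            total += perCell[index(x, y, z)];
        }
    }
    return total - pos.size();  // each boid counted itself once
}
```

When the boids are spread out, `gridChecks` grows roughly with the number of boids times the local density, while `naiveChecks` grows quadratically regardless of where the boids are.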

### For each implementation, how does changing the block count and block size affect performance? Why do you think this is?

Increasing the block count/block size affects performance by splitting the work across more of the GPU, so more kernel threads can run concurrently.
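
Block count and block size are tied together by the number of boids. The sketch below uses the common ceiling-division idiom for a one-thread-per-boid launch; that launch scheme and the helper names `blocksFor` and `idleThreads` are assumptions for illustration, since the project's launch code is not shown here.

```cpp
#include <cassert>
#include <cstddef>

// Number of blocks needed so that every boid gets one thread.
std::size_t blocksFor(std::size_t numBoids, std::size_t blockSize) {
    assert(blockSize > 0);
    return (numBoids + blockSize - 1) / blockSize;  // ceiling division
}

// Threads in the last block that have no boid to process.
std::size_t idleThreads(std::size_t numBoids, std::size_t blockSize) {
    return blocksFor(numBoids, blockSize) * blockSize - numBoids;
}
```

For 20000 boids, a block size of 128 gives 157 blocks with 96 idle threads, and a block size of 256 gives 79 blocks with 224 idle threads.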

### For the coherent uniform grid: did you experience any performance improvements with the more coherent uniform grid? Was this the outcome you expected? Why or why not?

I did notice a performance boost, which I attribute to more cache hits on the GPU: in the coherent grid, the data for boids in the same cell is contiguous in memory.
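
The reordering step behind a coherent grid can be sketched on the CPU as a gather. This assumes an index array already sorted by grid cell; `gatherByCell` and `order` are hypothetical names, not taken from the project source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy per-boid data into cell-sorted order, so boids that share a cell
// end up next to each other in memory. order[i] is the original index of
// the boid that belongs at sorted position i.
template <typename T>
std::vector<T> gatherByCell(const std::vector<T>& data, const std::vector<int>& order) {
    std::vector<T> sorted(order.size());
    for (std::size_t i = 0; i < order.size(); ++i) {
        assert(order[i] >= 0 && static_cast<std::size_t>(order[i]) < data.size());
        sorted[i] = data[order[i]];
    }
    return sorted;
}
```

After this gather, the neighbor loop can read `sorted[start..end]` for a cell directly instead of following one extra index lookup per boid into scattered memory.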

#### Coherent FPS

![](performance_coherent.png)

#### Uniform FPS

![](performance_uniform.png)

### Did changing cell width and checking 27 vs 8 neighboring cells affect performance? Why or why not? Be careful: it is insufficient (and possibly incorrect) to say that 27-cell is slower simply because there are more cells to check!

Yes. Changing the cell width or the number of neighboring cells checked changes how many candidate boids the non-naive implementations have to iterate through: a larger search volume means more boids to test. This does not affect the naive implementation, as it iterates through all the boids anyway.
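
A quick way to compare the two setups is to look at the volume each one searches. The sketch below assumes the usual pairing of 8 cells with a cell width of twice the rule distance and 27 cells with a cell width equal to the rule distance; that pairing and the helper `searchVolume` are assumptions, not something measured here.

```cpp
#include <cassert>

// Volume of the cube of cells searched around a boid:
// (cells per axis * cell width) cubed.
double searchVolume(int cellsPerAxis, double cellWidth) {
    assert(cellsPerAxis > 0 && cellWidth > 0.0);
    const double side = cellsPerAxis * cellWidth;
    return side * side * side;
}
```

With a rule distance of 1, `searchVolume(2, 2.0)` is 64 and `searchVolume(3, 1.0)` is 27, so under these assumptions and a roughly uniform boid density, the 27-cell search visits more cells but tests fewer candidate boids.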

## Performance Analysis (FPS)

![](performance.png)

## Performance Analysis (From NSight)

### 2000 boids vs 20000 boids (visualized)

#### Naive

![](naive_per_2000.png)
![](naive_per_20000.png)

#### Uniform

![](uniform_per_2000.png)
![](uniform_per_20000.png)

#### Coherent

![](coherent_per_2000.png)
![](coherent_per_20000.png)

### 20000 boids (non visualized)

#### Uniform

![](uniform_per_20000_no_opengl.png)

#### Coherent

![](coherent_per_20000_no_opengl.png)

### 128 vs 256 blocks (20000 boids)

#### Uniform

![](uniform_per_20000_256_block.png)

#### Coherent

![](coherent_per_20000_256_block.png)

Include screenshots, analysis, etc. (Remember, this is public, so don't put
anything here that you don't want to share with the world.)
Binary file added coherent_1000.gif
Binary file added coherent_10000.gif
Binary file added coherent_per_2000.png
Binary file added coherent_per_20000.png
Binary file added coherent_per_20000_256_block.png
Binary file added coherent_per_20000_no_opengl.png
Binary file added naive_per_2000.png
Binary file added naive_per_20000.png
Binary file added performance.png
Binary file added performance_coherent.png
Binary file added performance_uniform.png
2 changes: 1 addition & 1 deletion src/CMakeLists.txt
@@ -10,5 +10,5 @@ set(SOURCE_FILES

cuda_add_library(src
${SOURCE_FILES}
-OPTIONS -arch=sm_20
+OPTIONS -arch=sm_30
)