CIS565-Fall-2018 · Zichuanyun · Sep 8, 2018 · Sep 9, 2018
diff --git a/README.md b/README.md
@@ -1,11 +1,49 @@
 **University of Pennsylvania, CIS 565: GPU Programming and Architecture,
 Project 1 - Flocking**
 
-* (TODO) YOUR NAME HERE
-  * (TODO) [LinkedIn](), [personal website](), [twitter](), etc.
-* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
+* Zichuan Yu
+  * [LinkedIn](https://www.linkedin.com/in/zichuan-yu/), [Behance](https://www.behance.net/zainyu717ebcc)
+* Tested on: Windows 10.0.17134 Build 17134, i7-4710 @ 2.50GHz 16GB, GTX 980M 4096MB GDDR5
 
-### (TODO: Your README)
+## Screenshots
+
+### Static
+
+![result](images/static.png)
+
+### GIF
+
+![result](images/motion.gif)
+
+## Performance Analysis
+
+### Change number of boids, with visualization
+
+![fps_with_visualization](images/FPS%20with%20visualization.png)
+
+### Change number of boids, without visualization
+
+![fps_without_visualization](images/FPS%20without%20visualization.png)
+
+### Fixed number of boids, varying block size, without visualization
+
+![fps_without_visualization_blocksize](images/FPS%20without%20visualization%20block%20size.png)
+
+## Questions
+
+### For each implementation, how does changing the number of boids affect performance? Why do you think this is?
+
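+The two uniform-grid methods perform much better than the naive method because they skip a lot of unnecessary neighbor checks. The naive method compares every boid against every other boid, i.e. N(N-1) distance checks per step, while the grid methods only compare each boid against the boids in nearby cells. The larger the number of boids, the more work the grids save.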
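+
+To make the savings concrete, here is a minimal CPU-side sketch (not this project's CUDA code) that counts the distance checks each approach performs. The function names, the cube-shaped scene `[0, sceneSize)^3`, and the 27-cell neighborhood are assumptions for illustration; it needs C++17 for `std::clamp`.
+
+```cpp
+#include <algorithm>
+#include <cmath>
+#include <cstddef>
+#include <cstdint>
+#include <vector>
+
+struct Vec3 { float x, y, z; };
+
+// Brute force: every boid tests every other boid.
+std::uint64_t naiveChecks(std::size_t n) {
+    return n == 0 ? 0 : static_cast<std::uint64_t>(n) * (n - 1);
+}
+
+// Uniform grid over [0, sceneSize)^3: each boid only tests the boids stored
+// in the 3x3x3 block of cells around its own cell.
+std::uint64_t gridChecks(const std::vector<Vec3>& pos, float sceneSize, float cellWidth) {
+    const int res = std::max(1, static_cast<int>(std::ceil(sceneSize / cellWidth)));
+    auto cellCoord = [&](float v) {
+        return std::clamp(static_cast<int>(v / cellWidth), 0, res - 1);
+    };
+    auto cellIndex = [&](int x, int y, int z) {
+        return (static_cast<std::size_t>(z) * res + y) * res + x;
+    };
+
+    // Pass 1: count boids per cell.
+    std::vector<std::uint64_t> count(static_cast<std::size_t>(res) * res * res, 0);
+    for (const Vec3& p : pos) ++count[cellIndex(cellCoord(p.x), cellCoord(p.y), cellCoord(p.z))];
+
+    // Pass 2: sum the occupancy of the (up to) 27 cells around each boid.
+    std::uint64_t checks = 0;
+    for (const Vec3& p : pos) {
+        const int cx = cellCoord(p.x), cy = cellCoord(p.y), cz = cellCoord(p.z);
+        for (int z = std::max(0, cz - 1); z <= std::min(res - 1, cz + 1); ++z)
+            for (int y = std::max(0, cy - 1); y <= std::min(res - 1, cy + 1); ++y)
+                for (int x = std::max(0, cx - 1); x <= std::min(res - 1, cx + 1); ++x)
+                    checks += count[cellIndex(x, y, z)];
+    }
+    return checks - pos.size();  // each boid was counted in its own cell; it skips itself
+}
+```
+
+For a fixed scene both counts still grow with N, but the grid version only scans the boids near each boid, so it skips most of the N(N-1) pairs whenever the neighborhood is small compared with the scene.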
+
+### For each implementation, how does changing the block count and block size affect performance? Why do you think this is?
+
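+When the block size is small (below 32), performance is poor. When it is 32 or larger, performance improves substantially and barely changes as the block size grows further. I am not completely sure why, but it is **most likely** related to the warp size: the GPU executes threads in warps of 32, so a block with fewer than 32 threads still occupies a whole warp and leaves the remaining lanes idle.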
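+
+A back-of-the-envelope model of this is sketched below. It assumes Maxwell-class per-SM limits (32 resident blocks and 64 resident warps per SM; check the CUDA Programming Guide for your own device) and ignores register and shared-memory limits, so it is only a rough illustration.
+
+```cpp
+#include <algorithm>
+
+// Assumed per-SM limits for compute capability 5.x (Maxwell).
+constexpr int kWarpSize = 32;
+constexpr int kMaxBlocksPerSM = 32;
+constexpr int kMaxWarpsPerSM = 64;  // 2048 threads / 32
+
+// Threads are scheduled in whole warps; a partial warp still takes a full warp slot.
+int warpsPerBlock(int blockSize) {
+    return (blockSize + kWarpSize - 1) / kWarpSize;
+}
+
+// Fraction of the scheduled warp lanes that belong to real threads.
+double laneUtilization(int blockSize) {
+    return static_cast<double>(blockSize) / (warpsPerBlock(blockSize) * kWarpSize);
+}
+
+// Threads one SM can keep resident, limited by the block and warp caps.
+int activeThreadsPerSM(int blockSize) {
+    const int blocks = std::min(kMaxBlocksPerSM, kMaxWarpsPerSM / warpsPerBlock(blockSize));
+    return blocks * blockSize;
+}
+```
+
+With 16-thread blocks, for example, half of every warp is idle and an SM holds at most 512 threads under these assumptions. The model ignores memory bandwidth and latency hiding, so it only speaks to the trend below 32 threads, not to the exact FPS numbers.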
+
+### For the coherent uniform grid: did you experience any performance improvements with the more coherent uniform grid? Was this the outcome you expected? Why or why not?
+
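+Yes, I saw a large performance improvement. With the scattered uniform grid, every time we check a neighboring boid we go through an index array to reach its position and velocity, which means reading from an essentially random memory location. With the coherent grid, we pay for that random access only once per step, when we reshuffle `pos` and `vel` into cell order; after that, boids in the same cell are contiguous in memory and neighbor reads are much more cache-friendly.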
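+
+The two access patterns can be sketched on the CPU as follows. This is a minimal illustration rather than the project's CUDA kernels; `sortedIdx` (boid indices sorted by grid cell) and the function names are hypothetical.
+
+```cpp
+#include <cstddef>
+#include <vector>
+
+struct Vec3 { float x, y, z; };
+
+// Scattered grid: boids of one cell are found through an index array, so each
+// neighbor read jumps to an arbitrary slot of pos.
+Vec3 sumCellScattered(const std::vector<Vec3>& pos, const std::vector<int>& sortedIdx,
+                      int cellStart, int cellEnd) {
+    Vec3 s{0.0f, 0.0f, 0.0f};
+    for (int k = cellStart; k < cellEnd; ++k) {
+        const Vec3& p = pos[sortedIdx[k]];  // indirect, effectively random access
+        s.x += p.x; s.y += p.y; s.z += p.z;
+    }
+    return s;
+}
+
+// Coherent grid: reorder once per step so boids of the same cell are adjacent
+// in memory (call it for pos and again for vel).
+std::vector<Vec3> reshuffle(const std::vector<Vec3>& data, const std::vector<int>& sortedIdx) {
+    std::vector<Vec3> out(sortedIdx.size());
+    for (std::size_t k = 0; k < sortedIdx.size(); ++k) out[k] = data[sortedIdx[k]];
+    return out;
+}
+
+// Afterwards, neighbor reads walk a contiguous range.
+Vec3 sumCellCoherent(const std::vector<Vec3>& posCoherent, int cellStart, int cellEnd) {
+    Vec3 s{0.0f, 0.0f, 0.0f};
+    for (int k = cellStart; k < cellEnd; ++k) {
+        const Vec3& p = posCoherent[k];  // sequential access
+        s.x += p.x; s.y += p.y; s.z += p.z;
+    }
+    return s;
+}
+```
+
+Both sum functions return the same result for the same cell range; only the memory access pattern differs.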
+
+### Did changing cell width and checking 27 vs 8 neighboring cells affect performance? Why or why not? Be careful: it is insufficient (and possibly incorrect) to say that 27-cell is slower simply because there are more cells to check!
+
+Yes. When the number of boids becomes large, the 27-cell version performs better. Although it checks more cells, each cell is smaller, so the number of boids we actually iterate through is lower.
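+
+A quick volume estimate shows why, assuming the common setup where the 8-cell search uses a cell width of twice the largest rule distance $d$ and the 27-cell search uses a cell width of $d$:
+
+$$
+V_{8} = (2 \cdot 2d)^3 = 64\,d^3, \qquad V_{27} = (3 \cdot d)^3 = 27\,d^3
+$$
+
+With a roughly uniform boid density $\rho$, the expected number of candidates is $\rho V$, so the 27-cell search examines about $27/64 \approx 42\%$ as many boids as the 8-cell search.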
 
-Include screenshots, analysis, etc. (Remember, this is public, so don't put
-anything here that you don't want to share with the world.)
diff --git a/images/FPS with visualization.png b/images/FPS with visualization.png
diff --git a/images/FPS without visualization block size.png b/images/FPS without visualization block size.png
diff --git a/images/FPS without visualization.png b/images/FPS without visualization.png
diff --git a/images/motion.gif b/images/motion.gif
diff --git a/images/static.png b/images/static.png
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
@@ -10,5 +10,5 @@ set(SOURCE_FILES
 
 cuda_add_library(src
     ${SOURCE_FILES}
-    OPTIONS -arch=sm_20
+    OPTIONS -arch=sm_50
     )