This repository contains an implementation of the Local Binary Pattern (LBP) algorithm using GPU acceleration with CUDA. The project compares its running time against a sequential CPU-only version.
- Place an image in .jpg format in the `input/` folder
- Run the program, specifying the image name
- At the end of the run, a histogram will be generated in `output/`
We compared running time between three different implementations:
- Simple sequential CPU version
- Non-optimized GPU-accelerated version that uses only global memory
- Optimized GPU-accelerated version that also uses shared memory
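
As a point of reference, the sequential baseline can be sketched roughly as below. This is a minimal illustration and not the repository's code: it assumes the basic 3x3 LBP operator on an 8-bit grayscale image stored row-major, skips border pixels, and uses a clockwise bit order starting from the top-left neighbour. The function name `lbp_histogram` is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not taken from the repository: builds the 256-bin
// histogram of basic 3x3 LBP codes for a row-major 8-bit grayscale image.
// Border pixels are skipped because they lack a full 3x3 neighbourhood.
std::array<unsigned int, 256> lbp_histogram(const std::vector<std::uint8_t>& img,
                                            int width, int height) {
    std::array<unsigned int, 256> hist{};  // all bins start at zero

    // Neighbour offsets, clockwise from the top-left (assumed bit order).
    const int dx[8] = {-1, 0, 1, 1, 1, 0, -1, -1};
    const int dy[8] = {-1, -1, -1, 0, 1, 1, 1, 0};

    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            const std::uint8_t center =
                img[static_cast<std::size_t>(y) * width + x];
            unsigned int code = 0;
            for (int k = 0; k < 8; ++k) {
                const std::uint8_t neighbour =
                    img[static_cast<std::size_t>(y + dy[k]) * width + (x + dx[k])];
                // Bit k is set when the neighbour is at least as bright
                // as the centre pixel.
                code |= static_cast<unsigned int>(neighbour >= center) << k;
            }
            ++hist[code];
        }
    }
    return hist;
}
```

Each pixel's code depends only on its own 3x3 neighbourhood, so the per-pixel work is independent, which is what makes the algorithm a natural candidate for GPU acceleration.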
Running time for different sizes of a square image
We reached up to a 15x speed-up on a GeForce GTX 980 Ti.
For a detailed description of the implementation and tests, you can check our report (available in Italian only, sorry).
We also made a similar comparison between sequential and multithreaded versions on CPU only.
Parallel Computing - Computer Engineering Master's Degree @University of Florence.