vmmul omp instructional test harness

This directory contains a benchmark harness for testing different implementations of vector-matrix multiply (VMM) for varying problem sizes.

The main code is benchmark.cpp, which sets up the problem, iterates over problem sizes, sets up the vector and matrix, executes the vmmul call, and tests the result for accuracy by comparing your result against a reference implementation (CBLAS).
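The accuracy check described above can be pictured with a minimal sketch. The helper names max_abs_diff and results_match and the absolute-tolerance test are assumptions made for illustration; the actual comparison in benchmark.cpp may use a different criterion.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: largest absolute element-wise difference between
// two vectors. A length mismatch is reported as infinity.
double max_abs_diff(const std::vector<double>& result,
                    const std::vector<double>& reference) {
    if (result.size() != reference.size()) {
        return std::numeric_limits<double>::infinity();
    }
    double worst = 0.0;
    for (std::size_t i = 0; i < result.size(); ++i) {
        worst = std::max(worst, std::fabs(result[i] - reference[i]));
    }
    return worst;
}

// Hypothetical helper: true when every element of result is within
// tolerance of the corresponding reference element.
bool results_match(const std::vector<double>& result,
                   const std::vector<double>& reference,
                   double tolerance) {
    assert(tolerance >= 0.0);
    return max_abs_diff(result, reference) <= tolerance;
}
```

In the harness, the reference vector would come from the CBLAS call and the result vector from your own routine.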

Note that cmake needs to be able to find the CBLAS package. For CSC 746/656 Fall 2023, this condition is true on Perlmutter@NERSC and on the class VM. It is also true for some other platforms, but you are on your own if using a platform other than Perlmutter@NERSC or the class VM.

Build instructions - general

After downloading the code, you may first need to make modifications to your environment to access the correct compilers. See below for more information.

Once your environment is set up, cd into the main source directory, then:

mkdir build
cd build
cmake ../
make

When building on Perlmutter, you may do compilations and brief runs on a login node. Here, "brief" means runs shorter than 10 seconds. First, set up your environment to use Perlmutter CPU nodes:

module load cpu

When you are ready to do builds/runs on a Perlmutter CPU node, use the salloc command to hop onto a CPU node:

salloc --nodes=1 --qos=interactive --time=00:15:00 --constraint=cpu --account=m3930

Build peculiarities for MacOSX platforms:

Compiler version. The default g++ shipped with the development library on MacOS 12.6.8 (Monterey) is actually clang version 12.0.5 (clang-1205.0.22.9), and this version of the compiler WILL NOT WORK with this assignment because it does not support OpenMP. The simplest fix is to install a new compiler with brew install gcc, which installs the most current version of gcc/g++: 12.2.0 for MacOS 12.6.8 (Monterey) as of the time of this writing.

There may be a way to force Apple's clang to enable OpenMP. See this thread, which Prof. Bethel has not tried: https://stackoverflow.com/questions/44380459/is-openmp-available-in-high-sierra-llvm/47230419#47230419

Setting the CXXFLAGS to point to the directory containing cblas.h. On Prof. Bethel's laptop, which is an Intel-based MacBook Pro running MacOS 12.6.8 (Monterey) with Xcode installed, cmake (version 3.20.1) can find the BLAS package, but the build then fails with an error about g++ not being able to find cblas.h.

The workaround is to tell cmake where cblas.h lives by using an environment variable: export CXXFLAGS="-I /path/to/headers" then clean your build directory (rm -rf * inside build) and run cmake again.

Note you will need to "locate cblas.h" on your machine and replace the path to cblas.h in the CXXFLAGS line above with the path on your specific machine.

Run the command:

locate cblas.h

which on Prof. Bethel's laptop produces the following output:

/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/Headers/cblas.h
/Library/Developer/CommandLineTools/SDKs/MacOSX11.3.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/Headers/cblas.h
/usr/local/Cellar/openblas/0.3.23/include/cblas.h

Use the path to the newest headers, here the MacOSX11.3.sdk version:

export CXXFLAGS="-I /Library/Developer/CommandLineTools/SDKs/MacOSX11.3.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/Headers/"

Then clean your build directory, and rerun cmake then make.

Adding your code

For timing:

You will need to modify the benchmark.cpp code to add timing instrumentation, to report FLOPs executed, and so forth.
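One possible shape for the timing instrumentation is sketched below using std::chrono. The helper names are hypothetical, and the FLOP count of 2n^2 assumes an n x n matrix with one multiply and one add per matrix element; adjust it if your operation count differs.

```cpp
#include <cassert>
#include <chrono>

// Assumed operation count for an n x n matrix-vector product:
// one multiply and one add per matrix element, so 2 * n * n.
double dgemv_flop_count(int n) {
    assert(n >= 0);
    return 2.0 * n * n;
}

// Hypothetical helper: runs work() once and returns the elapsed
// wall-clock time in seconds, measured with a monotonic clock.
template <typename Work>
double elapsed_seconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Converts a FLOP count and an elapsed time into MFLOP/s.
double mflops_per_second(double flops, double seconds) {
    assert(seconds > 0.0);
    return flops / seconds / 1.0e6;
}
```

In benchmark.cpp, the callable passed to elapsed_seconds would wrap the vmmul call for the current problem size, so that setup and accuracy checking stay outside the timed region.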

For vector-matrix multiplication:

There are stub routines inside dgemv-basic.cpp, dgemv-vectorized.cpp, and dgemv-openmp.cpp where you can add your code for doing basic, vectorized, and OpenMP-parallel vector-matrix multiply, respectively.
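As a starting point for the basic version, here is a minimal sketch. The function name dgemv_basic_sketch, its signature, the row-major layout, and the accumulate-into-y behavior (y := A*x + y) are all assumptions; match them to the stub and comments actually present in dgemv-basic.cpp.

```cpp
#include <cassert>

// Sketch of a basic matrix-vector multiply, assuming:
//   - A is an n x n matrix stored in row-major order,
//   - x and y are vectors of length n,
//   - the result is accumulated into y (y := A*x + y).
void dgemv_basic_sketch(int n, const double* A, const double* x, double* y) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        double sum = y[i];                 // start from the existing y[i]
        for (int j = 0; j < n; ++j) {
            sum += A[i * n + j] * x[j];    // dot product of row i with x
        }
        y[i] = sum;
    }
}
```

Accumulating into a local variable rather than writing y[i] inside the inner loop keeps the running sum in a register, which also tends to help the compiler when you move on to the vectorized version.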

For the OpenMP parallel code, note that you specify concurrency at runtime using the OMP_NUM_THREADS environment variable. While it is possible to set the number of concurrent OpenMP threads at compile time, it is generally considered better practice to specify the number of OpenMP threads via the OMP_NUM_THREADS environment variable.
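A minimal sketch of an OpenMP-parallel version follows. The function names, the signature, and the row-major, accumulate-into-y convention are assumptions, not the harness's actual interface. The code contains no hard-coded thread count, so the OpenMP runtime takes it from OMP_NUM_THREADS. With GCC, OpenMP is enabled by the -fopenmp flag; without it the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of threads the OpenMP runtime would use for a parallel region
// (governed by OMP_NUM_THREADS); reports 1 when built without OpenMP.
int available_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Sketch of an OpenMP-parallel matrix-vector multiply, assuming a
// row-major n x n matrix A and accumulation into y (y := A*x + y).
// Each row is independent, so the outer loop is split across threads.
void dgemv_openmp_sketch(int n, const double* A, const double* x, double* y) {
    assert(n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        double sum = y[i];                 // private to each iteration
        for (int j = 0; j < n; ++j) {
            sum += A[i * n + j] * x[j];
        }
        y[i] = sum;                        // each thread writes distinct rows
    }
}
```

Because every iteration of the outer loop writes a different element of y, no reduction clause or locking is needed in this formulation.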

Running the codes

There is a sample job script provided for running the OpenMP code at 4 levels of concurrency: 1, 4, 16, 64 threads. You may launch that script either as a batch job using the sbatch command, or you may run it as a shell script from an interactive node (preferred).

To run it as a shell script on an interactive CPU node:

sh ./job-openmp

For the other codes -- benchmark-blas, benchmark-basic, and benchmark-vectorized -- it is easiest to run these from the command line from an interactive node by typing:

srun ./benchmark-basic
or
srun ./benchmark-vectorized
or
srun ./benchmark-blas
