This is a guide to contributing to this project.
This workflow contains custom C++ sources which must be compiled. No extra actions are necessary if using the Docker container with the awsbatch executor (see below).
To compile, you should first create and activate the Conda environment defining the required packages, then use cmake to build the project.
cd /where/you/cloned/this/repository
conda env create --force --file conf/conda/drugseq-env.yml
conda activate drugseq-env
cmake . -B build -DCMAKE_MAKE_PROGRAM=qmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$(realpath .) -DCMAKE_PREFIX_PATH=${CONDA_PREFIX} -DCMAKE_MODULE_PATH=${CONDA_PREFIX}/unpacked_source/cmake
cmake --build build --target test install/strip --parallel
To test Python components, run the following inside this directory:
python -m unittest discover -s tests/python-unittest
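For contributors adding Python tests: the discovery command above picks up modules in tests/python-unittest whose file names match unittest's default pattern, test*.py. A minimal sketch of such a module follows. The file name test_example.py and the helper normalize_barcode are hypothetical and are not part of this project; they only illustrate the shape of a discoverable test case.

```python
# Hypothetical file: tests/python-unittest/test_example.py
import unittest


def normalize_barcode(barcode: str) -> str:
    """Hypothetical helper: strip surrounding whitespace and upper-case a barcode."""
    return barcode.strip().upper()


class NormalizeBarcodeTest(unittest.TestCase):
    """Test methods must start with 'test' to be collected by unittest."""

    def test_strips_and_uppercases(self):
        self.assertEqual(normalize_barcode("  acgt\n"), "ACGT")

    def test_is_idempotent(self):
        once = normalize_barcode("acgt")
        self.assertEqual(normalize_barcode(once), once)
```

In a real test module the helper would be imported from the script under test rather than defined alongside the test case.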
To test the workflow, you will also need a compatible version of Nextflow and a compute environment that supports the processes with high resource demand: a minimum of 10 logical CPUs and 80 GB of RAM. If you have less memory available, you can pass a smaller human-readable value via --MAXMEM; however, this is potentially unstable.
nextflow run -profile [...],test --IO.outdir ./outs
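To make "human-readable value" concrete, the sketch below converts memory strings such as 64 GB to bytes and compares them against the 80 GB recommendation. The function names, the accepted unit spellings, and the use of binary multiples (1 KB = 1024 bytes) are assumptions for illustration; the pipeline's own handling of --MAXMEM may differ.

```python
import re

# Assumption: units are binary multiples (1 KB = 1024 bytes).
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*$", re.IGNORECASE)


def parse_memory(text: str) -> int:
    """Convert a string like '64 GB' or '512MB' to a whole number of bytes."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a recognized memory string: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])


# The minimum recommended in this guide.
RECOMMENDED_BYTES = parse_memory("80 GB")


def below_recommended(text: str) -> bool:
    """Return True if the given value is smaller than the 80 GB recommendation."""
    return parse_memory(text) < RECOMMENDED_BYTES
```

For example, below_recommended("64 GB") is true, which flags a value that falls in the "potentially unstable" range described above.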
This project is written primarily in three languages: Python, C++, and Nextflow (Groovy). The project is organized as follows:
- Workflow organization is done in Nextflow. main.nf, workflows/DRAGoN.nf, and workflows/STARsolo.nf are the main entrypoints. Subworkflows are imported from components/. See the Nextflow documentation for language details.
- Heavy-lifting processes are performed using either third-party tools or custom C++ programs. For the former, the set of tools available to workers is defined in conf/conda/drugseq-env.yml. This file also defines the Python packages and C/C++ libraries available to the C++ sources in src/ and the Python scripts in bin/.
  - Every custom C++ program has an equivalent Python fallback script.
  - C++ binaries are also installed into bin/ for local execution.
- Smaller processes such as merging intermediates are handled using Python scripts in bin/.
This project uses GitHub for fine-grained version control. The main branch is master, which has branch protections preventing direct pushes. Any change to the pipeline should therefore be made in a separate branch, with the work logged in a JIRA ticket. The branch naming convention is feature/<ticket> or bugfix/<ticket>.
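The branch naming convention can be checked mechanically before pushing. The sketch below assumes that <ticket> is a JIRA-style key such as ABC-123 (an uppercase project key, a hyphen, and a number). That ticket format, the regular expression, and the function name are assumptions; this guide does not specify them.

```python
import re

# Assumption: <ticket> looks like a JIRA issue key, e.g. ABC-123.
BRANCH_PATTERN = re.compile(r"(feature|bugfix)/[A-Z][A-Z0-9]*-\d+")


def is_valid_branch(name: str) -> bool:
    """Return True if the branch name matches feature/<ticket> or bugfix/<ticket>."""
    return BRANCH_PATTERN.fullmatch(name) is not None
```

The current branch name can be obtained with git rev-parse --abbrev-ref HEAD and passed to is_valid_branch, for instance from a local pre-push hook.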
All contributors should verify that their changes satisfy the following checks:
- If C++ sources are modified, they should compile, and all CTest tests should pass.
- If Python scripts are modified, all Python tests should pass.
- Running this workflow locally (using the conda profile and locally-built binaries) should pass.
- All output files should be checked for proper formatting.
- The Build Docker Image workflow on GitHub Actions should pass.
- During the PR review process:
  - The Run Unit Tests workflow should pass.
  - version.txt should be updated and committed.
  - nf-test snapshots should be updated (nf-test test . --updateSnapshots) and committed.
  - Running this workflow on the CTC cluster using the singularity profile should pass. You may need to manually update the container image into your cache directory (specified by NXF_SINGULARITY_CACHEDIR).
  - Running this workflow from a DRUG-seq head node should pass.
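The first two checks in this list can be scripted so they run together before opening a PR. The following is a minimal sketch, not a project-provided tool: it reuses the build and unittest commands given earlier in this guide, assumes the build directory has already been configured with cmake, and assumes it is run from the repository root inside the activated Conda environment. The names LOCAL_CHECKS and run_checks are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence

# Commands copied from the build and test instructions in this guide.
LOCAL_CHECKS: List[List[str]] = [
    ["cmake", "--build", "build", "--target", "test", "install/strip", "--parallel"],
    ["python", "-m", "unittest", "discover", "-s", "tests/python-unittest"],
]


def _run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_checks(
    checks: Sequence[Sequence[str]] = LOCAL_CHECKS,
    runner: Callable[[Sequence[str]], int] = _run,
) -> bool:
    """Run each command in order; stop and return False at the first failure."""
    for cmd in checks:
        print("running:", " ".join(cmd))
        if runner(cmd) != 0:
            return False
    return True
```

Calling run_checks() with no arguments executes both commands. The runner parameter exists so the sequencing logic can be exercised without invoking cmake or Python subprocesses.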