Fix cleavage profile #73

Merged
merged 39 commits into from
May 15, 2024
39 commits
8578dd0
Merge pull request #60 from epifluidlab/develop
jamesli124 Apr 30, 2024
cbdaf09
Merge pull request #64 from epifluidlab/develop
jamesli124 May 10, 2024
aecfc37
Merge pull request #65 from epifluidlab/develop
jamesli124 May 10, 2024
e49833e
change cleavage profile intersect_policy to any
jamesli124 May 10, 2024
718657d
fix typo in docstring
jamesli124 May 10, 2024
7933c9c
Update README.md
ravibandaru-lab May 11, 2024
1aaa02d
Update README.md
ravibandaru-lab May 11, 2024
5c19b57
Update README.md
ravibandaru-lab May 11, 2024
7c27b46
Update README.md
ravibandaru-lab May 11, 2024
ada7658
Update README.md
ravibandaru-lab May 11, 2024
9ba5e40
Update README.md
ravibandaru-lab May 11, 2024
e1ae467
Update README.md
ravibandaru-lab May 11, 2024
e7ed11e
Delete src/finaletoolkit/methylation directory
ravibandaru-lab May 12, 2024
aa970a7
Delete src/finaletoolkit/qc directory
ravibandaru-lab May 12, 2024
f936b57
Delete src/finaletoolkit/too directory
ravibandaru-lab May 12, 2024
7cea2b1
Merge pull request #66 from epifluidlab/readme_change
jamesli124 May 13, 2024
d65e910
Merge pull request #67 from epifluidlab/remove-redundant-files
jamesli124 May 13, 2024
9aceebd
Merge pull request #68 from epifluidlab/main
jamesli124 May 13, 2024
facc6b4
mask for zeros when calculating proportions to avoid errors
jamesli124 May 13, 2024
ff727c4
remove _get_contigs and add _parse_chrom_sizes
jamesli124 May 13, 2024
5044c82
fix coverage verbose log
jamesli124 May 13, 2024
1c092f9
add numerous util functions related to merging overlapping bins
jamesli124 May 13, 2024
000c2be
Changed _cli_cleavage_profile
jamesli124 May 13, 2024
be3037b
update documentation
ravibandaru-lab May 13, 2024
ef477b7
Merge pull request #69 from epifluidlab/documentation-update
jamesli124 May 14, 2024
4faca95
Create .nojekyll
jamesli124 May 14, 2024
4379aa6
Create static.yml
jamesli124 May 14, 2024
914d1fd
Merge pull request #70 from epifluidlab/documentation-update
jamesli124 May 14, 2024
9a07747
Merge pull request #71 from epifluidlab/github-pages-integration
jamesli124 May 14, 2024
1751523
Merge pull request #72 from epifluidlab/main
jamesli124 May 14, 2024
4376521
update cleavage_profile docstring
jamesli124 May 15, 2024
e8ad538
added chrom_sizes_to_dict
jamesli124 May 15, 2024
4ffc98a
left and right option added to cleavage_profile
jamesli124 May 15, 2024
35efd71
update cleavage-profile cli to have laft and right coordinate
jamesli124 May 15, 2024
2d43669
files for cleavage_profile tests
jamesli124 May 15, 2024
37677b7
fix cleavage profile with left and right options
jamesli124 May 15, 2024
73cdbeb
add basic test for cleavage profile
jamesli124 May 15, 2024
5766b3f
remove unnecessary comments
jamesli124 May 15, 2024
60b82f0
update changelog
jamesli124 May 15, 2024
43 changes: 43 additions & 0 deletions .github/workflows/static.yml
@@ -0,0 +1,43 @@
# Simple workflow for deploying static content to GitHub Pages
name: Deploy static content to Pages

on:
  # Runs on pushes targeting the default branch
  push:
    branches: ["documentation-update","main"]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
permissions:
  contents: read
  pages: write
  id-token: write

# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  # Single deploy job since we're just deploying
  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup Pages
        uses: actions/configure-pages@v5
      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          # Html build
          path: './docs/_build/html/'
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,20 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [UNRELEASED VERSION]

### Fixed
- Fixed intersect policy for cleavage_profile
- Clean up some comments
- Fixed logging from coverage function

### Added
- Added numerous util functions
- Added `left` and `right` options to `cleavage_profile` and CLI
`cleavage-profile`.
- Added tests for cleavage profiling.


## [0.5.2] - 2024-05-08

### Fixed
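Commit `facc6b4` in this PR masks zeros when calculating proportions. A minimal sketch of that idea follows, assuming a cleavage proportion is the number of fragment ends at a position divided by the fragment depth at that position. The function and argument names are hypothetical, and this is not FinaleToolkit's actual implementation.

```python
from typing import List, Sequence


def cleavage_proportions(
    end_counts: Sequence[int], depth: Sequence[int]
) -> List[float]:
    """Return ends/depth per position, with zero-depth positions set to 0.0.

    Hypothetical helper for illustration only. Positions with no coverage
    are masked instead of being divided, so no ZeroDivisionError is raised.
    """
    if len(end_counts) != len(depth):
        raise ValueError("end_counts and depth must have the same length")
    return [
        ends / d if d > 0 else 0.0
        for ends, d in zip(end_counts, depth)
    ]


# The middle position has zero depth and is reported as 0.0.
print(cleavage_proportions([1, 0, 2], [4, 0, 2]))  # [0.25, 0.0, 1.0]
```

Whether an uncovered position should be reported as `0.0` or as a missing value is a design choice; the sketch picks `0.0` only to keep the output numeric.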
127 changes: 53 additions & 74 deletions README.md
@@ -1,90 +1,69 @@
# FinaleToolkit
A package and standalone program to extract fragmentation patterns of cell-free
DNA from paired-end sequencing data. FinaleToolkit refers to FragmentatIoN
AnaLysis of cEll-free DNA Tools.

FinaleToolkit is in active development, and all API is subject to change and
should be considered unstable.
# <img alt="dna with letters FT" src="https://bananasrlowkeygood.github.io/images/finaletools_logo_rounded.png" height="60"> ‎ ‎ ‎FinaleToolkit
<summary><h3>Table of Contents</h3></summary>
<ol>
<li><a href="#about-the-project">About The Project</a></li>
<li><a href="#installation">Installation</a></li>
<li>
<a href="#usage">Usage</a>
<ul>
<li><a href="#functionality">Functionality</a></li>
<li><a href="#documentation">Documentation</a></li>
<li><a href="#compatible-file-formats">Compatible File Formats</a></li>
<li><a href="#using-fragment-files">Using Fragment Files</a></li>
</ul>
</li>
<li><a href="#contact">Contact</a></li>
<li><a href="#license">License</a></li>
</ol>




## About The Project
FinaleToolkit (**F**ragmentat**I**o**N** **A**na**L**ysis of c**E**ll-free DNA **Toolkit**) is a package and standalone program to extract fragmentation features of cell-free DNA from paired-end sequencing data.

## Installation
Instructions:
- (Optional) create a conda or venv environment to use FinaleToolkit in.
- Run `pip install finaletoolkit`

To verify FinaleToolkit has been successfully installed, try
You can install the package using `pip`.
```
$ finaletoolkit -h
usage: finaletoolkit [-h]
{coverage,frag-length,frag-length-bins,frag-length-intervals,wps,delfi,filter-bam,adjust-wps,agg-wps,delfi-gc-correct,end-motifs,mds}
...

Calculates fragmentation features given a CRAM/BAM/SAM file

options:
-h, --help show this help message and exit

subcommands:
{coverage,frag-length,frag-length-bins,frag-length-intervals,wps,delfi,filter-bam,adjust-wps,agg-wps,delfi-gc-correct,end-motifs,mds}
$ pip install finaletoolkit
```

## Usage
Documentation can be found at https://epifluidlab.github.io/finaletoolkit-docs/

FinaleToolkit functions generally accept reads in a few file formats:
- Binary Alignment Map (BAM) Files
- Compressed Reference-oriented Alignment Map
- FinaleDB Frag.gz Files
### Functionality

Frag.gz files are block-gzipped BED3+2 files with the following format:
`chrom start stop mapq strand(+/-)`
FinaleToolkit has support for the following cell-free DNA fragmentation features:

The below script can be used to convert from bam to frag.gz:
```
INPUT=input.bam
OUTPUT=output.frag.gz

samtools sort -n -o qsorted.bam -@ 16 input.bam;
samtools view -h -f 3 -F 3852 -G 48 --incl-flags 48 \
qsorted.bam |\
bamToBed -bedpe -mate1 -i stdin |\
awk -F'\t' -v OFS="\t" '{if ($1!=$4) next; if ($9=="+") {s=$2;e=$6} else {s=$5;e=$3} if (e>s) print $1,s,e,$8,$9}' |\
sort -k1,1V -k2,2n |\
bgzip > $OUTPUT;
tabix -p bed $OUTPUT;
```
- Fragment Length
- Coverage
- End Motifs
- Motif Diversity Score [![DOI](https://img.shields.io/badge/DOI-10.1158%2F2159--8290.CD--19--0622-blue?style=flat-square)](https://doi.org/10.1158/2159-8290.CD-19-0622)
- Windowed Protection Score [![DOI](https://img.shields.io/badge/DOI-110.1016%2Fj.cell.2015.11.050-blue?style=flat-square)](https://doi.org/10.1016/j.cell.2015.11.050)
- DELFI [![DOI](https://img.shields.io/badge/DOI-10.1038%2Fs41586--019--1272--6-blue?style=flat-square&link=https%3A%2F%2Fdoi.org%2F10.1038%252Fs41586-019-1272-6)](https://doi.org/10.1038%2Fs41586-019-1272-6)
- Cleavage Profile [![DOI](https://img.shields.io/badge/DOI-10.1073%2Fpnas.2209852119-blue?style=flat-square)](https://doi.org/10.1073/pnas.2209852119)

Frag.gz files can be retrieved from http://finaledb.research.cchmc.org/
### Documentation
Documentation for FinaleToolkit can be found [here](https://epifluidlab.github.io/finaletoolkit-docs/).

Because FinaleToolkit uses pysam, BAM files should be bai-indexed and Frag.gz files should be tabix-indexed.
### Compatible File Formats

To view fragment length distribution
```
$ finaletoolkit frag-length-bins --contig 22 --histogram sample.bam
Fragment Lengths for 22:-
10.61% ▇ mean :169.28
09.85% ▆█▁ median :169.00
09.09% ███ stdev :25.52
08.34% ████ min :67.00
07.58% ▁████ max :289.00
06.82% █████▂
06.06% ██████
05.31% ▆██████▂
04.55% ▄████████▁
03.79% ▃██████████
03.03% ▂████████████▆
02.27% ██████████████▇▃
01.52% ▇█████████████████▅▂
00.76% ▂▂▂▂▂▂▃▃▄▅▄████████████████████████▆▅▄▃▂▂▂▂▂▂▂▁▁▂▂▂▁▁
len (nt)067 091 115 139 163 187 211 235 259 283
```
FinaleToolkit is compatible with almost any paired-end sequence data:

- Binary Alignment Map (`.bam`) files with an associated index file (`.bam.bai`).
- Sequence Alignment Map (`.sam`) files.
- Compressed Reference-oriented Alignment Map (`.cram`) files.
- Fragment (`.frag.gz`) files with an associated tabix index file (`.frag.gz.tbi`).

### Using Fragment Files

## Testing
Fragment (`.frag.gz`) files are block-gzipped BED3+2 files with the following columns: `chrom` , `start` , `stop` , `mapq` , `strand`.

To run unit tests, navigate to the root directory of your local copy of this
repo and run `pytest`. You may have to download pytest first.
We encourage you to use our comprehensive database, FinaleDB, to access relevant fragment files. Learn more about FinaleDB [here](http://finaledb.research.cchmc.org).

## FAQ
Q: When running on an ARM64 Mac, I can install FinaleToolkit without errors.
However, I get an `ImportError` when I run it.
## Contact
- James Li: lijw21@wfu.edu
- Ravi Bandaru: ravi.bandaru@northwestern.edu
- Yaping Liu: yaping@northwestern.edu

A: Try `brew install curl`. Otherwise, email me and I will try to help you.
## License
For academic research, please refer to the MIT license. For commercial usage, please contact the authors.
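The README in this diff describes `.frag.gz` files as block-gzipped BED3+2 files with the columns `chrom`, `start`, `stop`, `mapq`, `strand`. A minimal sketch of reading such a file with the Python standard library follows. It assumes tab-separated columns and relies on block-gzipped files being readable as ordinary gzip streams. `Fragment`, `parse_frag_line`, and `read_frag_gz` are hypothetical names, not part of the FinaleToolkit API.

```python
import gzip
from typing import Iterator, NamedTuple


class Fragment(NamedTuple):
    """One row of a frag.gz file (hypothetical container for illustration)."""
    chrom: str
    start: int
    stop: int
    mapq: int
    strand: str


def parse_frag_line(line: str) -> Fragment:
    """Parse one tab-separated BED3+2 line into a Fragment."""
    chrom, start, stop, mapq, strand = line.rstrip("\n").split("\t")[:5]
    return Fragment(chrom, int(start), int(stop), int(mapq), strand)


def read_frag_gz(path: str) -> Iterator[Fragment]:
    """Yield fragments from a gzip-compressed fragment file, skipping blank
    lines and lines starting with '#'."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                yield parse_frag_line(line)


# Fragment length is stop minus start for a half-open BED interval.
frag = parse_frag_line("chr1\t100\t267\t60\t+\n")
print(frag.stop - frag.start)  # 167
```

This sequential reader does not use the tabix index; region queries on indexed files would need a library such as pysam, which the README names as a dependency.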
4 changes: 2 additions & 2 deletions docs/Makefile
@@ -5,8 +5,8 @@
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
-SOURCEDIR = source
-BUILDDIR = build
+SOURCEDIR = .
+BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
Binary file not shown.
Binary file added docs/_build/doctrees/environment.pickle
Binary file not shown.
Binary file added docs/_build/doctrees/index.doctree
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/build/html/.buildinfo → docs/_build/html/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 6041be4c3e87185b8cc4c04670f254ba
+config: 7b9a79626a3bd201c50461a4a33a6bd0
tags: 645f666f9bcd5a90fca523b33c5a78b7
File renamed without changes.