diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1f26a1f5..9e015aa5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,24 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.5.0] - 2024-04-24
+
+### Added
+- Added finaletools.interval_end_motifs function to calculate end-motifs
+over genomic intervals. Stores results in an IntervalEndMotifs object.
+- Added CLI subcommand interval-end-motifs to calculate end-motifs over
+genomic intervals.
+- Added CLI subcommand interval-mds to calculate MDS over intervals from
+interval end-motifs table.
+
+### Changed
+- Added gc_correct option to delfi_merge_bins so that merging is possible
+without GC correction
+
+### Fixed
+- `delfi` can now be run with `gc_correct=false` and `merge_bins=true`
+- fixed `cleavage_profile` import in `frag`
+
 ## [0.4.5] - 2024-04-9
 
 ### Added
diff --git a/README.md b/README.md
index a604b047..03dfbe80 100644
--- a/README.md
+++ b/README.md
@@ -1,20 +1,20 @@
-# FinaleTools
+# FinaleToolkit
 
 A package and standalone program to extract fragmentation patterns of cell-free
-DNA from paired-end sequencing data. FinaleTools refers to FragmentatIoN
+DNA from paired-end sequencing data. FinaleToolkit refers to FragmentatIoN
 AnaLysis of cEll-free DNA Tools.
 
-FinaleTools is in active development, and all API is subject to change and
+FinaleToolkit is in active development, and all API is subject to change and
 should be considered unstable.
 
 ## Installation Instructions:
-- (Optional) create a conda or venv environment to use FinaleTools in.
-- Run `pip install finaletools`
+- (Optional) create a conda or venv environment to use FinaleToolkit in.
+- Run `pip install finaletoolkit`
 
-To verify FinaleTools has been successfully installed, try
+To verify FinaleToolkit has been successfully installed, try
 ```
-$ finaletools -h
-usage: finaletools [-h]
+$ finaletoolkit -h
+usage: finaletoolkit [-h]
                    {coverage,frag-length,frag-length-bins,frag-length-intervals,wps,delfi,filter-bam,adjust-wps,agg-wps,delfi-gc-correct,end-motifs,mds}
                    ...
 
@@ -28,9 +28,9 @@ subcommands:
 ```
 
 ## Usage
-Documentation can be found at https://epifluidlab.github.io/finaletools-docs/
+Documentation can be found at https://epifluidlab.github.io/finaletoolkit-docs/
 
-FinaleTools functions generally accept reads in a few file formats:
+FinaleToolkit functions generally accept reads in a few file formats:
 - Binary Alignment Map (BAM) Files
 - Compressed Reference-oriented Alignment Map
 - FinaleDB Frag.gz Files
@@ -55,11 +55,11 @@ tabix -p bed $OUTPUT;
 
 Frag.gz files can be retrieved from http://finaledb.research.cchmc.org/
 
-Because FinaleTools uses pysam, BAM files should be bai-indexed and Frag.gz files should be tabix-indexed.
+Because FinaleToolkit uses pysam, BAM files should be bai-indexed and Frag.gz files should be tabix-indexed.
 
 To view fragment length distribution
 ```
-$ finaletools frag-length-bins --contig 22 --histogram sample.bam
+$ finaletoolkit frag-length-bins --contig 22 --histogram sample.bam
 Fragment Lengths for 22:-
 10.61% ▇ mean :169.28
 09.85% ▆█▁ median :169.00
@@ -79,7 +79,7 @@ len (nt)067 091 115 139 163 187 211 235 259 283
 ```
 
 ## FAQ
-Q: When running on an ARM64 Mac, I can install FinaleTools without errors.
+Q: When running on an ARM64 Mac, I can install FinaleToolkit without errors.
 However, I get an `ImportError` when I run it.
 
 A: Try `brew install curl`. Otherwise, email me and I will try to help you.
\ No newline at end of file
diff --git a/docs/build/doctrees/api.doctree b/docs/build/doctrees/api.doctree
index d0f17c45..cb88ef81 100644
Binary files a/docs/build/doctrees/api.doctree and b/docs/build/doctrees/api.doctree differ
diff --git a/docs/build/doctrees/cli.doctree b/docs/build/doctrees/cli.doctree
index 391ccd5c..dae9b8c8 100644
Binary files a/docs/build/doctrees/cli.doctree and b/docs/build/doctrees/cli.doctree differ
diff --git a/docs/build/doctrees/environment.pickle b/docs/build/doctrees/environment.pickle
index eeb57907..ced225c1 100644
Binary files a/docs/build/doctrees/environment.pickle and b/docs/build/doctrees/environment.pickle differ
diff --git a/docs/build/doctrees/index.doctree b/docs/build/doctrees/index.doctree
index a189cb49..65ed15de 100644
Binary files a/docs/build/doctrees/index.doctree and b/docs/build/doctrees/index.doctree differ
diff --git a/docs/build/doctrees/quick_start.doctree b/docs/build/doctrees/quick_start.doctree
index bab64e64..343c353f 100644
Binary files a/docs/build/doctrees/quick_start.doctree and b/docs/build/doctrees/quick_start.doctree differ
diff --git a/docs/build/doctrees/releases.doctree b/docs/build/doctrees/releases.doctree
index 71504875..10a4fe3a 100644
Binary files a/docs/build/doctrees/releases.doctree and b/docs/build/doctrees/releases.doctree differ
diff --git a/docs/build/doctrees/usage.doctree b/docs/build/doctrees/usage.doctree
deleted file mode 100644
index 6c04446d..00000000
Binary files a/docs/build/doctrees/usage.doctree and /dev/null differ
diff --git a/docs/build/html/_sources/api.rst.txt b/docs/build/html/_sources/api.rst.txt
index 056253c6..6d5af2d6 100644
--- a/docs/build/html/_sources/api.rst.txt
+++ b/docs/build/html/_sources/api.rst.txt
@@ -29,16 +29,28 @@ DELFI
 
 .. autofunction:: finaletools.frag.delfi_gc_correct
 
+.. autofunction:: finaletools.frag.delfi_merge_bins
+
 End-motifs
 ==========
 
 .. autoclass:: finaletools.frag.EndMotifFreqs
    :members:
 
+.. autoclass:: finaletools.frag.EndMotifsIntervals
+   :members:
+
 .. autofunction:: finaletools.frag.region_end_motifs
 
 .. autofunction:: finaletools.frag.end_motifs
 
+.. autofunction:: finaletools.frag.interval_end_motifs
+
+Cleavage Profile
+================
+.. autofunction:: finaletools.frag.cleavage_profile
+
+
 Frag File Utilities
 ===================
 
diff --git a/docs/build/html/_sources/index.rst.txt b/docs/build/html/_sources/index.rst.txt
index 98916dce..624a6b24 100644
--- a/docs/build/html/_sources/index.rst.txt
+++ b/docs/build/html/_sources/index.rst.txt
@@ -15,7 +15,6 @@ Welcome to FinaleTools's documentation!
    :maxdepth: 2
    :caption: Contents:
 
-   usage
    quick_start
    cli
    api
diff --git a/docs/build/html/api.html b/docs/build/html/api.html
index 07a7fef1..098be17a 100644
--- a/docs/build/html/api.html
+++ b/docs/build/html/api.html
@@ -63,6 +63,7 @@
EndMotifFreqs.to_tsv()
EndMotifsIntervals
region_end_motifs()
end_motifs()
interval_end_motifs()
Return estimated fragment coverage over intervals specified in intervals. Fragments are read from input_file, which may be a SAM, BAM, CRAM, or Frag.gz file. Uses an algorithm where the midpoints of fragments are calculated and coverage is tabulated from the midpoints that fall into the specified region. Not suitable for fragments whose length approaches the interval size.
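The midpoint approach above can be sketched in a few lines. This is a minimal illustration, not the finaletools implementation: `midpoint_coverage` is a hypothetical helper that takes fragments as already-parsed (start, stop) pairs in 0-based, end-exclusive coordinates instead of reading a BAM or Frag.gz file, and it returns a raw count rather than any normalized estimate.

```python
from typing import Iterable, Tuple


def midpoint_coverage(
    fragments: Iterable[Tuple[int, int]], start: int, stop: int
) -> int:
    """Count fragments whose midpoint lies in the interval [start, stop).

    Fragments are (frag_start, frag_stop) pairs in 0-based, end-exclusive
    coordinates (hypothetical input format for illustration).
    """
    count = 0
    for frag_start, frag_stop in fragments:
        midpoint = (frag_start + frag_stop) // 2
        if start <= midpoint < stop:
            count += 1
    return count


# Midpoints are 183, 235 and 480; only the first two fall in [150, 300).
print(midpoint_coverage([(100, 266), (150, 320), (400, 560)], 150, 300))  # 2

# A 1,000 bp fragment spans all of [400, 450), but its midpoint (500) lies
# outside the interval, so it contributes nothing.
print(midpoint_coverage([(0, 1000)], 400, 450))  # 0
```

The second call shows concretely why the method is unsuitable when fragment length approaches interval size: a fragment can cover an interval completely while its midpoint falls elsewhere.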
Helper function that takes window data and performs GC adjustment.
Class that stores frequencies of end-motif k-mers over user-specified intervals and contains methods to manipulate this data.
intervals (Iterable) – A collection of tuples, each containing a tuple representing a genomic interval (chrom, 0-based start, 1-based stop) and a dict that maps kmers to frequencies in the interval.
k (int) – Size of k-mers
quality_threshold (int, optional) – Minimum mapping quality used. Default is 30.
Returns a list of intervals and the associated frequency for a given kmer. Results are in the form (chrom, 0-based start, 1-based stop, frequency).
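Given the documented layout of intervals (a list of (interval, dict) pairs), extracting per-interval values for one kmer is a dictionary lookup per interval. The sketch below assumes exactly that structure; `kmer_freq_by_interval` is a hypothetical name, and reporting 0.0 for an interval that never observed the kmer is an assumption rather than documented behavior.

```python
from typing import Dict, List, Tuple

# (chrom, 0-based start, 1-based stop), as in the intervals description
Interval = Tuple[str, int, int]


def kmer_freq_by_interval(
    intervals: List[Tuple[Interval, Dict[str, float]]], kmer: str
) -> List[Tuple[str, int, int, float]]:
    """Return (chrom, start, stop, frequency) of `kmer` for each interval.

    Assumption: an interval with no entry for `kmer` reports 0.0.
    """
    return [
        (chrom, start, stop, freqs.get(kmer, 0.0))
        for (chrom, start, stop), freqs in intervals
    ]


intervals = [
    (("chr1", 0, 1000), {"ACGT": 0.25, "CCCA": 0.75}),
    (("chr2", 500, 1500), {"CCCA": 1.0}),
]
print(kmer_freq_by_interval(intervals, "ACGT"))
# [('chr1', 0, 1000, 0.25), ('chr2', 500, 1500, 0.0)]
```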
Reads kmer frequencies from a tab-delimited file.
file_path (str) – Path to the file.
sep (str, optional) – Delimiter used in file.
kmer_freqs
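A minimal reader for such a file, assuming a headerless two-column layout with one `kmer<sep>frequency` pair per row. The exact column layout finaletools writes is not stated here, so treat the format as an assumption; `read_kmer_freqs` is a hypothetical name.

```python
import csv
from typing import Dict


def read_kmer_freqs(file_path: str, sep: str = "\t") -> Dict[str, float]:
    """Read kmer frequencies from a delimited file into a dict.

    Assumes each row is `kmer<sep>frequency` and there is no header line.
    """
    kmer_freqs: Dict[str, float] = {}
    with open(file_path, newline="") as handle:
        for row in csv.reader(handle, delimiter=sep):
            if not row:  # skip blank lines
                continue
            kmer_freqs[row[0]] = float(row[1])
    return kmer_freqs
```

Using the standard `csv` module rather than `str.split` keeps the `sep` parameter meaningful for comma-separated files as well.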
Writes MDS for each interval to a bed/bedgraph file.
Calculates a motif diversity score (MDS) for each interval using normalized Shannon entropy as described by Jiang et al. (2020). This function is generalized for any k instead of just 4-mers.
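Normalized Shannon entropy divides the entropy of the kmer frequency distribution, -Σ p_i log(p_i), by its maximum possible value, log(4^k), so the score falls between 0 (a single motif) and 1 (all 4^k motifs equally frequent). The function below sketches that calculation for one interval's counts; its name and dict-of-counts input are assumptions, not the finaletools API.

```python
import math
from itertools import product
from typing import Dict


def motif_diversity_score(kmer_counts: Dict[str, float], k: int) -> float:
    """Normalized Shannon entropy of end-motif frequencies.

    MDS = -sum(p_i * log(p_i)) / log(4**k), where p_i is the relative
    frequency of k-mer i. The log base cancels out in the ratio.
    """
    total = sum(kmer_counts.values())
    if total == 0:
        raise ValueError("no k-mer counts to score")
    entropy = 0.0
    for count in kmer_counts.values():
        if count > 0:  # 0 * log(0) is taken as 0
            p = count / total
            entropy -= p * math.log(p)
    return entropy / math.log(4 ** k)


# One motif only: no diversity.
print(motif_diversity_score({"CCCA": 10}, k=4))  # 0.0

# All 16 possible 2-mers equally frequent: maximal diversity.
uniform = {"".join(p): 1 for p in product("ACGT", repeat=2)}
print(round(motif_diversity_score(uniform, k=2), 6))  # 1.0
```

With k=4 the denominator is log(256), which matches the 4-mer case; other values of k only change the normalizing constant.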
Takes the frequency of the specified kmer and writes it to a BED file.
output_file (str) – File to write frequencies to.
calc_freq (bool, optional) – If true, writes the frequency of each motif; otherwise, writes counts for each motif. Default is true.
sep (str, optional) – Separator for table. Tab-separated by default.
Takes the frequency of the specified kmer and writes it to a bedgraph file.
output_file (str) – File to write frequencies to.
calc_freq (bool, optional) – If true, writes the frequency of each motif; otherwise, writes counts for each motif. Default is true.
sep (str, optional) – Separator for table. Tab-separated by default.
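A bedgraph line is `chrom start end value`. The sketch below writes one such line per interval for a single kmer and shows what the calc_freq switch would mean: the kmer's share of all motif counts in that interval versus its raw count. It assumes intervals hold raw counts in the documented (interval, dict) layout; `write_kmer_bedgraph` is a hypothetical helper, not the method documented above.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]


def write_kmer_bedgraph(
    intervals: List[Tuple[Interval, Dict[str, int]]],
    kmer: str,
    output_file: str,
    calc_freq: bool = True,
    sep: str = "\t",
) -> None:
    """Write `chrom start stop value` for one kmer, one line per interval.

    With calc_freq=True the value is the kmer's share of all motif counts in
    the interval; otherwise it is the raw count.
    """
    with open(output_file, "w") as out:
        for (chrom, start, stop), counts in intervals:
            value = counts.get(kmer, 0)
            if calc_freq:
                total = sum(counts.values())
                value = value / total if total else 0.0
            out.write(sep.join([chrom, str(start), str(stop), str(value)]) + "\n")
```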
Writes all intervals and associated frequencies to a file.
output_file (str) – File to write frequencies to.
calc_freq (bool, optional) – If true, writes the frequency of each motif; otherwise, writes counts for each motif. Default is true.
sep (str, optional) – Separator for table. Tab-separated by default.
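Writing every interval with every kmer suggests a wide table: one row per interval and one column per kmer. The sketch below assumes that layout, with a `chrom, start, stop` header followed by the kmers observed in any interval; the real column order and header used by finaletools are not stated here, and `write_interval_table` is a hypothetical name.

```python
import csv
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]


def write_interval_table(
    intervals: List[Tuple[Interval, Dict[str, int]]],
    output_file: str,
    calc_freq: bool = True,
    sep: str = "\t",
) -> None:
    """Write one row per interval and one column per kmer observed anywhere."""
    kmers = sorted({kmer for _, counts in intervals for kmer in counts})
    with open(output_file, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=sep, lineterminator="\n")
        writer.writerow(["chrom", "start", "stop", *kmers])
        for (chrom, start, stop), counts in intervals:
            total = sum(counts.values())
            values = []
            for kmer in kmers:
                value = counts.get(kmer, 0)
                if calc_freq:  # convert counts to per-interval frequencies
                    value = value / total if total else 0.0
                values.append(value)
            writer.writerow([chrom, start, stop, *values])
```

Taking the union of kmers keeps every row the same width, so intervals lacking a motif get an explicit zero instead of a missing cell.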
Function that reads fragments in the specified region from a BAM, SAM, or tabix-indexed file and returns the 5’ k-mer (default is 4-mer) end motif counts as a dictionary. This function reproduces the methodology of Zhou et al. (2023).
input_file (str) –
contig (str) –
start (int) –
stop (int) –
refseq_file (str) –
k (int, optional) –
input_file (str) – Path of SAM, BAM, CRAM, or Frag.gz file containing paired-end reads.
contig (str) – Name of contig or chromosome for region.
start (int) – 0-based start coordinate.
stop (int) – 1-based end coordinate.
refseq_file (str) – 2bit file containing the reference sequence that input_file was aligned to.
k (int, optional) – Length of end motif kmer. Default is 4.
fraction_low (int, optional) – Minimum fragment length.
fraction_high (int, optional) – Maximum fragment length.
both_strands (bool, optional) – Choose whether to use only the forward 5’ end or the 5’ ends of both reads in each PE pair.
output_file (None or str, optional) –
quality_threshold (int, optional) –
verbose (bool or int, optional) –
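The core of 5’ end-motif counting can be sketched without any file I/O: take the first k reference bases at the fragment start and, when both strands are used, the reverse complement of the last k bases at the fragment end. This is an illustrative sketch, not the Zhou et al. (2023) pipeline as implemented in finaletools: it assumes fragments are (start, stop) pairs in 0-based, end-exclusive coordinates and that the reference is a plain string, and it omits BAM/2bit parsing and MAPQ filtering. The length bounds shown are placeholders, not the library's defaults.

```python
from collections import Counter
from typing import Iterable, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_end_motifs(
    fragments: Iterable[Tuple[int, int]],
    reference: str,
    k: int = 4,
    fraction_low: int = 0,  # illustrative bounds, not finaletools' defaults
    fraction_high: int = 10_000,
    both_strands: bool = True,
) -> Counter:
    """Count 5' end k-mers of fragments against a reference sequence string.

    fragments: (start, stop) pairs, 0-based and end-exclusive.
    """
    counts: Counter = Counter()
    for start, stop in fragments:
        if not fraction_low <= stop - start <= fraction_high:
            continue  # outside the allowed fragment-length range
        # 5' end on the forward strand: the first k reference bases.
        counts[reference[start:start + k].upper()] += 1
        if both_strands:
            # 5' end of the reverse-strand read: last k bases, reverse-complemented.
            counts[reverse_complement(reference[stop - k:stop].upper())] += 1
    return counts


print(dict(count_end_motifs([(0, 14)], "GATTACAGATTACA")))
# {'GATT': 1, 'TGTA': 1}
```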
Function that reads fragments from a BAM, SAM, or tabix-indexed file and returns the 5’ k-mer (default is 4-mer) end motif frequencies as a dictionary. Optionally writes data to a tsv. This
@@ -517,12 +647,42 @@
input_file (str) –
refseq_file (str) –
k (int, optional) –
output_file (None or str, optional) –
quality_threshold (int, optional) –
workers (int, optional) –
input_file (str) – SAM, BAM, CRAM, or Frag.gz file with paired-end reads.
refseq_file (str) – 2bit file containing the sequence of the reference genome that input_file is aligned to.
k (int, optional) – Length of end motif kmer. Default is 4.
output_file (None or str, optional) – File path to write results to. Either tsv or csv.
quality_threshold (int, optional) – Minimum MAPQ to filter.
workers (int, optional) – Number of worker processes.
verbose (bool or int, optional) –
end_motif_freq
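Turning end-motif counts into the frequency dictionary described above, and writing it as either tsv or csv, could look like the following. Both helpers are hypothetical, and choosing the delimiter from the output file's extension is an assumption about how "either tsv or csv" might be handled, not confirmed behavior.

```python
import csv
from typing import Dict


def counts_to_freqs(counts: Dict[str, int]) -> Dict[str, float]:
    """Normalize motif counts so the frequencies sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def write_freqs(freqs: Dict[str, float], output_file: str) -> None:
    """Write `kmer, frequency` rows; comma-separated for .csv, else tab."""
    sep = "," if output_file.endswith(".csv") else "\t"
    with open(output_file, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=sep, lineterminator="\n")
        for kmer in sorted(freqs):
            writer.writerow([kmer, freqs[kmer]])


print(counts_to_freqs({"ACGT": 1, "CCCA": 3}))  # {'ACGT': 0.25, 'CCCA': 0.75}
```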
Function that reads fragments from a BAM, SAM, or tabix-indexed file over user-specified intervals and returns the 5’ k-mer (default is 4-mer) end motif frequencies for each interval. Optionally writes data to a tsv.
input_file (str) – Path of SAM, BAM, CRAM, or Frag.gz file containing paired-end reads.
refseq_file (str) – Path of 2bit file for reference genome that reads are aligned to.
intervals (str or tuple) – Path of BED file containing intervals or list of tuples (chrom, start, stop).
k (int, optional) – Length of end motif kmer. Default is 4.
output_file (None or str, optional) – File path to write results to. Either tsv or csv.
quality_threshold (int, optional) – Minimum MAPQ to filter.
workers (int, optional) – Number of worker processes.
verbose (bool or int, optional) –
end_motif_freq
list
EndMotifsIntervals
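Because the intervals parameter accepts either a BED path or (chrom, start, stop) tuples, a function like this would normalize both inputs to one form first. The sketch reads only the first three BED columns and skips blank, comment, and UCSC `track`/`browser` header lines; `load_intervals` is a hypothetical helper, not part of the documented API.

```python
from typing import Iterable, List, Tuple, Union

Interval = Tuple[str, int, int]


def load_intervals(intervals: Union[str, Iterable[Interval]]) -> List[Interval]:
    """Return (chrom, start, stop) tuples from a BED path or an iterable."""
    if isinstance(intervals, str):
        parsed: List[Interval] = []
        with open(intervals) as bed:
            for line in bed:
                # skip blank, comment, and UCSC track/browser header lines
                if not line.strip() or line.startswith(("#", "track", "browser")):
                    continue
                chrom, start, stop = line.split()[:3]
                parsed.append((chrom, int(start), int(stop)))
        return parsed
    return [(chrom, int(start), int(stop)) for chrom, start, stop in intervals]


print(load_intervals([("chr1", 0, 100)]))  # [('chr1', 0, 100)]
```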