Home
This is a GitHub repo of (mostly independent) Python/R scripts that I personally use on a daily basis to make my life easier. Most of the scripts are very simple, under 20 lines, and do some kind of basic processing, such as reverse complementing a sequence.
This repo + wiki is meant to supplement the official Iso-Seq software.
The simplest way to use the scripts is to clone the GitHub repository, then add the repo path to your $PATH variable. The scripts are organized into different sub-directories (ex: sequence/, rarefaction/, etc.), so you will have to add each sub-directory individually.
git clone https://github.com/Magdoll/cDNA_Cupcake.git
export PATH=$PATH:<path_to_Cupcake>/sequence/
And so on...
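Since each sub-directory has to be added separately, the export line can also be generated rather than typed by hand. The following is a minimal sketch, not part of Cupcake: `build_path_export` is a hypothetical helper, and it assumes that every non-hidden top-level sub-directory of the clone contains scripts you want on your $PATH.

```python
import os

def build_path_export(repo_root):
    """Return a shell `export PATH=...` line listing each sub-directory of repo_root.

    Hypothetical helper, not part of Cupcake. Assumes every non-hidden
    top-level sub-directory should be added to PATH.
    """
    root = os.path.abspath(repo_root)
    subdirs = sorted(
        entry.path
        for entry in os.scandir(root)
        if entry.is_dir() and not entry.name.startswith(".")
    )
    # Keep the existing $PATH first, then append each sub-directory.
    return "export PATH=" + ":".join(["$PATH"] + subdirs)
```

Printing the result of `build_path_export("<path_to_Cupcake>")` gives a line that can be pasted into a shell or a shell startup file.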
However, if you wish to use scripts such as collapse_isoforms_by_sam.py and chain_samples.py, you will need to install Cupcake. See Cupcake: supporting scripts for Iso-Seq after clustering step.
# only if you need to use certain scripts
python setup.py build
python setup.py install
The only exception is Cupcake: supporting scripts for Iso-Seq after clustering step, which does require compiling and installation.
All the scripts assume that the input/output sequences consist only of A, T, C, and G. Other nucleotides such as N, U, or R might cause incorrect behavior. Use at your own risk.
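Given this assumption, it can help to screen sequences before passing them to the scripts. The following is a minimal sketch, not part of Cupcake: `find_invalid_bases` is a hypothetical helper, and upper-casing the input before the check is an assumption about how lowercase bases should be treated.

```python
VALID_BASES = frozenset("ATCG")

def find_invalid_bases(seq):
    """Return the sorted list of characters in seq that are not A, T, C, or G.

    Hypothetical helper, not part of Cupcake. Assumption: lowercase
    bases are upper-cased before checking.
    """
    return sorted(set(seq.upper()) - VALID_BASES)

# An empty list means the sequence contains only A, T, C, G.
print(find_invalid_bases("ATCGGCTA"))
print(find_invalid_bases("ATNCGU"))
```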
- make_file_for_subsampling_from_collapsed.py: Prepare file for running subsampling (rarefaction curve).
- subsample.py and subsample_with_category.py: Run subsampling. Results can be plotted with Excel graphics, R, etc.
See Annotation and Rarefaction Wiki for usage details.
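To illustrate the idea behind a rarefaction curve, here is a minimal sketch of subsampling reads and counting how many distinct isoforms each subsample recovers. It is not the code in subsample.py: the function name, the `read_to_isoform` mapping, and the fixed random seed are all assumptions made for illustration.

```python
import random

def rarefaction_counts(read_to_isoform, sizes, seed=0):
    """For each subsample size, draw that many reads without replacement
    and count the distinct isoforms they map to.

    Hypothetical helper, not part of Cupcake.
    read_to_isoform: dict of read ID -> isoform ID (assumed input shape).
    """
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    reads = sorted(read_to_isoform)    # stable ordering before sampling
    counts = []
    for n in sizes:
        sample = rng.sample(reads, min(n, len(reads)))
        counts.append((n, len({read_to_isoform[r] for r in sample})))
    return counts

# Toy example: 6 reads mapping to 3 isoforms.
toy = {"r1": "PB.1.1", "r2": "PB.1.1", "r3": "PB.1.2",
       "r4": "PB.2.1", "r5": "PB.2.1", "r6": "PB.1.1"}
for size, n_isoforms in rarefaction_counts(toy, sizes=[1, 3, 6]):
    print(size, n_isoforms)
```

Plotting subsample size against the number of distinct isoforms gives the rarefaction curve. A curve that flattens suggests that additional sequencing would recover few new isoforms.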
- calc_probe_hit_from_sam.py: Calculate on-target rate based on FL read alignment + probe BED file.
See Targeted Iso-Seq Wiki for usage details.
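As a rough illustration of what an on-target rate means, the sketch below counts the fraction of aligned reads that overlap at least one probe region. It is not the logic of calc_probe_hit_from_sam.py, whose exact definition of "on-target" may differ. The function names and the `(chrom, start, end)` tuples are assumptions, with coordinates taken to be 0-based and half-open as in BED.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end

def on_target_rate(read_alignments, probes):
    """Fraction of aligned reads overlapping any probe on the same chromosome.

    Hypothetical helper, not part of Cupcake.
    Both arguments are lists of (chrom, start, end) tuples (assumed input shape).
    """
    if not read_alignments:
        return 0.0
    hits = sum(
        1
        for chrom, start, end in read_alignments
        if any(
            chrom == p_chrom and overlaps(start, end, p_start, p_end)
            for p_chrom, p_start, p_end in probes
        )
    )
    return hits / len(read_alignments)

reads = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
probes = [("chr1", 150, 160)]
print(on_target_rate(reads, probes))  # one of three reads overlaps a probe
```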
- get_seq_stats.py: Summarize length distribution of a FASTA/FASTQ file.
- rev_comp.py: Reverse complement a sequence from command line.
- fa2fq.py and fq2fa.py: Convert between FASTA and FASTQ format.
- sort_fasta_by_len.py: Sort fasta file by length (increasing or decreasing).
- get_seqs_from_list.py: Extract list of sequences given a fasta file and a list of IDs.
- err_correct_w_genome.py: Generate fasta sequences given genome and SAM file.
- calc_expected_accuracy_from_fastq.py: Calculate expected accuracy from FASTQ file. Can be used to calculate expected accuracies in Quiver/Arrow-polished low-quality isoform sequences.
- sam_to_bam.py: Quick script to run SAM to BAM conversion. Assumes samtools is installed.
- sam_to_gff3.py: Use BCBio and BioPython to convert SAM file into GFF3 format.
- group_ORF_sequences.py: Group identical ORF sequences from different isoforms.
See Sequence Manipulation Wiki for usage details.
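Two of the operations listed above are simple enough to sketch in a few lines. The code below is illustrative only and is not taken from rev_comp.py or calc_expected_accuracy_from_fastq.py. The function names are hypothetical, and the accuracy calculation assumes Phred+33 quality encoding and defines expected accuracy as one minus the mean per-base error probability, which may not match the script's exact definition.

```python
# Translation table for A/T/C/G only, matching the assumption that
# sequences contain no other nucleotides.
COMPLEMENT = str.maketrans("ATCGatcg", "TAGCtagc")

def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def expected_accuracy(quality_string, phred_offset=33):
    """Return 1 minus the mean per-base error probability.

    Hypothetical helper. Assumes Phred-scaled qualities, where a score Q
    corresponds to an error probability of 10 ** (-Q / 10).
    """
    if not quality_string:
        raise ValueError("quality string is empty")
    error_probs = [
        10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string
    ]
    return 1 - sum(error_probs) / len(error_probs)

print(reverse_complement("AAGT"))
print(expected_accuracy("IIII"))  # 'I' encodes Q40 under Phred+33
```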
- collapse_isoforms_by_sam.py: Collapse HQ isoform results to unique isoforms (based on genome alignment).
- get_abundance_post_collapse.py: Obtain count information post collapse to unique isoforms.
- filter_by_count.py: Filter collapse result by FL count information.
- filter_away_subset.py: Filter away 5' degraded isoforms.
- simple_stats_post_collapse.py: Generate simple stats file to plot in R later.
- chain_samples.py: Chain together multiple samples.
- fusion_finder.py: Find fusion genes.
- fusion_collate_info.py: Collate fusion information after running SQANTI(3).
- color_bed12_post_sqanti.py: Color BED12 files using FL counts after running SQANTI(3).
See Cupcake: supporting scripts for Iso-Seq after clustering step for usage details.
A list of useful tools that complement Cupcake: