cobioders/jarvis2 (forked from cobilab/jarvis2)

Efficient DNA sequence compression

JARVIS2

High-ratio, reference-free compression of genomic data

Installation

git clone https://github.com/cobioders/jarvis2.git
cd jarvis2/src/
make

Execution

Run JARVIS2

Compress a file with level 9 in verbose mode:

./JARVIS2 -v -l 9 File.seq

Parameters

To see the available options, type:

./JARVIS2 -h

This will print the following options:


SYNOPSIS                                                           
      ./JARVIS2 [OPTION]... [FILE]                                 
                                                                   
SAMPLE                                                             
      Run Compression   -> ./JARVIS2 -v -l 4 sequence.txt          
      Run Decompression -> ./JARVIS2 -v -d sequence.txt.jc         
                                                                   
DESCRIPTION                                                        
      Lossless compression and decompression of genomic            
      sequences for efficient storage and analysis purposes.       
      Measure an upper bound of the sequence complexity.           
                                                                   
      -h,  --help                                                  
           Usage guide (help menu).                                
                                                                   
      -a,  --version                                               
           Display program and version information.                
                                                                   
      -x,  --explanation                                           
           Explanation of the context and repeat models.           
                                                                   
      -f,  --force                                                 
           Force mode. Overwrites old files.                       
                                                                   
      -v,  --verbose                                               
           Verbose mode (more information).                        
                                                                   
      -d,  --decompress                                            
           Decompression mode.                                     
                                                                   
      -e,  --estimate                                              
           It creates a file with the extension ".iae" with the  
           respective information content. If the file is FASTA or 
           FASTQ it will only use the "ACGT" (genomic) sequence. 
                                                                   
      -s,  --show-levels                                           
           Show pre-computed compression levels (configured).      
                                                                   
      -l [NUMBER],  --level [NUMBER]                               
           Compression level (integer).                            
           Default level: 4.                                      
           It defines compressibility in balance with computational
           resources (RAM & time). Use -s for levels perception.   
                                                                   
      -hs [NUMBER],  --hidden-size [NUMBER]                        
           Hidden size of the neural network (integer).            
           Default value: 40.                                      
                                                                   
      -lr [DOUBLE],  --learning-rate [DOUBLE]                      
           Neural network learning rate (double).                  
           Default value: 0.030.                                   
                                                                   
      [FILE]                                                       
           Input sequence filename (to compress) -- MANDATORY.     
           File to compress is the last argument.                  
                                                                   

To see the available levels (pre-computed sets of compression parameters), type:

./JARVIS2 -s

Compression of FASTA data

Preparing JARVIS2 for FASTA:

cd FASTA/
chmod +x *.sh
./JARVIS2_FASTA.sh --install

Compression:

./JARVIS2_FASTA.sh --threads 8 --block 10MB --input sample.fa

Decompression:

./JARVIS2_FASTA.sh --decompress --threads 4 --input sample.fa.tar

Compression of FASTQ data

Preparing JARVIS2 for FASTQ:

cd FASTQ/
chmod +x *.sh
./JARVIS2_FASTQ.sh --install

Compression:

./JARVIS2_FASTQ.sh --threads 8 --block 40MB --input sample.fq

Decompression:

./JARVIS2_FASTQ.sh --decompress --threads 4 --input sample.fq.tar

Citation

In progress...

Issues

For any issue, please let us know through the repository's Issues page.

License

License: GPL v3

For more information:

http://www.gnu.org/licenses/gpl-3.0.html

Languages

  • C 81.4%
  • C++ 14.9%
  • Shell 3.1%
  • Makefile 0.6%