Containerized tool for analyzing and visualizing GUIDE-seq experiments.
The iceberg tool performs a step-by-step analysis of GUIDE-seq experiments. It takes as input a file (.yaml) that contains the paths to the raw sequencing read files (FASTQ) of the treatment and control experiments, along with other parameters, and it identifies three types of genomic sites:
- CRISPR activity sites: On-target and off-target sites.
- Spontaneous break sites: Spontaneous DNA breaks (which also appear in the control experiment).
- Noise sites: Sites with a low number of reads compared to the control experiment sites (sites are grouped into bins by their reads' MAPQ, and the comparison is done on each bin separately).
The package generates a report (.html) that contains an overview of the iceberg pipeline and links to three reports, one for each site type, each listing all the corresponding genomic sites. Each site is visualized with IGV and other complementary visualization components, controlled by a sites table that contains complementary analysis information.
- Docker
Version: Docker Desktop 4.6.1. Verify that Docker has access to at least 12 GB of memory (RAM).
Follow these steps to get started with iceberg.
- Pull the iceberg Docker image.
docker pull 100200300400/iceberg_image:firsttry
- Run the iceberg Docker container and keep it alive.
docker run -d --name iceberg_container 100200300400/iceberg_image:firsttry tail -f /dev/null
If the command fails, try replacing the image name with the image ID.
docker run -d --name iceberg_container <image-id> tail -f /dev/null
We recommend running the tests before you continue.
python -m unittest test/test_iceberg.py
- Copy your genome file (hg.fa).
docker cp /ABSOLUTE/HOST/PATH/GENOME/hg.fa iceberg_container:root/input/genome/hg.fa
- Copy your treatment experiment directory (TX_DIRECTORY).
docker cp /ABSOLUTE/HOST/PATH/TX_DIRECTORY/ iceberg_container:root/input/TX_DIRECTORY/
- Copy your control experiment directory (CONTROL_DIRECTORY).
docker cp /ABSOLUTE/HOST/PATH/CONTROL_DIRECTORY/ iceberg_container:root/input/CONTROL_DIRECTORY/
- Copy your iceberg input file (iceberg_input_file.yaml).
docker cp /ABSOLUTE/HOST/PATH/iceberg_input_file.yaml iceberg_container:root/input/iceberg_input_file.yaml
- After copying your input data, open an interactive shell in the iceberg Docker container.
docker exec -ti iceberg_container sh
- Change to the container's root directory.
cd root
- Run the iceberg command, and exit the container when it is done.
python iceberg/analyzer.py --input_file_path input/iceberg_input_file.yaml
- Copy your iceberg output folder (ICEBERG_OUTPUT) back to the host.
docker cp iceberg_container:root/ICEBERG_OUTPUT/ /ABSOLUTE/HOST/PATH/ICEBERG_OUTPUT/
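The steps above can also be scripted from the host. The following is a minimal sketch, not part of iceberg itself: it assumes Python 3.8+ and the `docker` CLI on the host, the helper names `build_commands` and `run_all` are hypothetical, and the host paths are placeholders. It also assumes that `cd root` in the interactive shell lands in `/root`, and it skips the optional unit-test step.

```python
import shlex
import subprocess

# Image and container names as used in the steps above.
IMAGE = "100200300400/iceberg_image:firsttry"
CONTAINER = "iceberg_container"


def build_commands(genome, tx_dir, control_dir, input_yaml, output_dir):
    """Return the docker commands from the steps above as argument lists."""
    return [
        ["docker", "pull", IMAGE],
        ["docker", "run", "-d", "--name", CONTAINER, IMAGE,
         "tail", "-f", "/dev/null"],
        ["docker", "cp", genome, f"{CONTAINER}:root/input/genome/hg.fa"],
        ["docker", "cp", tx_dir, f"{CONTAINER}:root/input/TX_DIRECTORY/"],
        ["docker", "cp", control_dir,
         f"{CONTAINER}:root/input/CONTROL_DIRECTORY/"],
        ["docker", "cp", input_yaml,
         f"{CONTAINER}:root/input/iceberg_input_file.yaml"],
        # Assumption: -w /root replaces the interactive `cd root` step.
        ["docker", "exec", "-w", "/root", CONTAINER,
         "python", "iceberg/analyzer.py",
         "--input_file_path", "input/iceberg_input_file.yaml"],
        ["docker", "cp", f"{CONTAINER}:root/ICEBERG_OUTPUT/", output_dir],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder host paths; replace them with your own absolute paths.
    commands = build_commands(
        "/ABSOLUTE/HOST/PATH/GENOME/hg.fa",
        "/ABSOLUTE/HOST/PATH/TX_DIRECTORY/",
        "/ABSOLUTE/HOST/PATH/CONTROL_DIRECTORY/",
        "/ABSOLUTE/HOST/PATH/iceberg_input_file.yaml",
        "/ABSOLUTE/HOST/PATH/ICEBERG_OUTPUT/",
    )
    run_all(commands, dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to check the paths before anything is executed.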
The input .yaml file contains the following arguments. All paths refer to locations inside the iceberg Docker container:
- OUTPUT_FOLDER_PATH: Absolute path to the folder that will contain the iceberg output.
- ANALYZER STEPS: The steps that will be executed. Each step has its own Step Documentation.
  - UMI
  - EXPERIMENT_LIBRARIES_DETECTION
  - EXPERIMENT_TRACES_REMOVAL
  - BWA
  - UNITE_READS_TO_ICEBERGS
  - MERGE_TREATMENT_AND_CONTROL_ICEBERGS_BY_LOCUS
  - UNITE_CLOSE_ICEBERGS_SITES
  - CALCULATE_ICEBERGS_SITES_PROFILE
  - BREAKS_CLASSIFY
  - GUIDERNA_ALIGNMENT
  - REPORTS
- EXPERIMENTS:
  - GENERAL:
    - REFERENCE_GENOME_PATH: Absolute path to the genome reference file.
    - EXPERIMENTS_TAG: The tag that was integrated at the cut sites during the GUIDE-seq experiment.
  - TX:
    - NAME: Name of the experiment, used to name files during the pipeline.
    - EXPERIMENT_FOLDER_PATH: Absolute path to the folder that contains the following experiment files.
    - R1: Treatment R1 FASTQ file name.
    - R2: Treatment R2 FASTQ file name.
    - I1: Treatment I1 FASTQ file name.
    - I2: Treatment I2 FASTQ file name.
    - GUIDERNA: The guide RNA used in the GUIDE-seq experiment.
  - CONTROL:
    - NAME: Name of the experiment, used to name files during the pipeline.
    - EXPERIMENT_FOLDER_PATH: Absolute path to the folder that contains the following experiment files.
    - R1: Control R1 FASTQ file name.
    - R2: Control R2 FASTQ file name.
    - I1: Control I1 FASTQ file name.
    - I2: Control I2 FASTQ file name.
- HYPERPARAMATERS:
  - UMI_BPS_AMOUNT_FROM_READS_START: For each paired-end read and its indexes (r1, r2, i1, i2, taken from an experiment's R1, R2, I1, I2 FASTQ files), a Unique Molecular Index (UMI) is assigned to both r1 and r2 as i2[8:16]_r1[0:umi_bps_amount_from_reads_start]_r2[0:umi_bps_amount_from_reads_start].
  - MIN_QUALITY: The minimum quality of a read for it to be considered in the consolidation process at the UMI step.
  - MIN_FREQUENCY: The minimum frequency of a read for the position to be consolidated in the consolidation process at the UMI step.
  - MAX_READS_DISTANCE: The maximum distance allowed between a read and an iceberg (in nt) for the read to be joined to the iceberg.
  - MAX_ICEBERG_DISTANCE: The maximum distance allowed between two iceberg sites (in nt) for the sites to be joined.
  - MAX_ALIGNMENTS_HAMMING_DISTANCE: The maximum Hamming distance allowed for an alignment between two sequences to be valid.
  - NOISE_BINS_AND_CONTROL_MAPQ_PERCENTILE: A list of bins for the noise-site filtering, where each bin is defined as [[bin-min-icebergs-mapq, bin-max-icebergs-mapq), bin-icebergs-control-percentile]. Example: [[[0, 1], 0.99], [[1, 50], 0.99], [[50, 61], 0.95]]
  - CRISPR_ACTIVITY_THRESHOLD: Used to classify the iceberg sites into CRISPR activity sites and spontaneous break sites; see the BREAKS_CLASSIFY step for more information.
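To illustrate the NOISE_BINS_AND_CONTROL_MAPQ_PERCENTILE format, the sketch below shows one way such bins could be applied. It is not iceberg's implementation. It assumes that each site is a (mapq, read_count) pair, that the per-bin threshold is a nearest-rank percentile of the control read counts, and that a treatment site counts as noise when its read count is at or below that threshold. The function names are hypothetical.

```python
import math

# Example value from the parameter description:
# [[min_mapq, max_mapq), control_percentile] per bin.
NOISE_BINS = [[[0, 1], 0.99], [[1, 50], 0.99], [[50, 61], 0.95]]


def nearest_rank_percentile(values, fraction):
    """Nearest-rank percentile (an assumed definition) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


def noise_thresholds(control_sites, bins):
    """Map each (min_mapq, max_mapq) bin to its control read-count threshold.

    control_sites is a list of (mapq, read_count) pairs. Bins that contain
    no control sites get no threshold.
    """
    thresholds = {}
    for (low, high), fraction in bins:
        counts = [reads for mapq, reads in control_sites if low <= mapq < high]
        if counts:
            thresholds[(low, high)] = nearest_rank_percentile(counts, fraction)
    return thresholds


def is_noise(mapq, reads, thresholds):
    """Assumed rule: noise if reads <= the threshold of the site's MAPQ bin."""
    for (low, high), threshold in thresholds.items():
        if low <= mapq < high:
            return reads <= threshold
    # No control data for this bin: do not flag the site.
    return False
```

Each bin is half-open, so with the example value a site with MAPQ 50 falls into the [50, 61) bin rather than the [1, 50) bin.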
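The UMI composition described under UMI_BPS_AMOUNT_FROM_READS_START can be written out in a few lines. This is a minimal sketch of the stated formula only, with reads given as plain sequence strings; the function name `build_umi` is hypothetical and iceberg's own code may differ.

```python
def build_umi(r1, r2, i2, umi_bps_amount_from_reads_start):
    """Compose the UMI as i2[8:16]_r1[0:n]_r2[0:n].

    r1, r2 and i2 are the sequences of one read pair and its I2 index
    read; n is UMI_BPS_AMOUNT_FROM_READS_START.
    """
    n = umi_bps_amount_from_reads_start
    return f"{i2[8:16]}_{r1[:n]}_{r2[:n]}"


# Made-up sequences, for illustration only.
umi = build_umi(
    r1="ACGTACGTAC",
    r2="TTGGCCAATT",
    i2="AAAAAAAACCCCGGGG",
    umi_bps_amount_from_reads_start=6,
)
print(umi)  # CCCCGGGG_ACGTAC_TTGGCC
```

Note that the i1 index read does not appear in the formula, so it is not an argument here.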
This section describes the iceberg outputs.
For more information, please refer to the Documentation.
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE.txt for more information.