GitHub - YakhiniGroup/ICEBERG


ICEBERG PROJECT

Containerized tool for analyzing and visualizing GUIDE-seq experiments.

Table of Contents

About The Project
Project Diagram
Built With
Prerequisites
Getting Started

Installation
Tests

Usage
Roadmap
Contributing
License
Contact
Acknowledgments

About The Project

The iceberg tool performs a step-by-step analysis of GUIDE-seq experiments. It takes as input a YAML file (.yaml) that contains the paths to the raw sequencing read files (FASTQ) of the treatment and control experiments, along with other parameters, and it identifies the following types of genomic sites:

CRISPR activity sites: On-target and off-target sites.
Spontaneous break sites: Spontaneous DNA breaks (which also appear in the control experiment).
Noise sites: Sites with a low number of reads compared to the control experiment sites (sites are grouped into bins by their reads' MAPQ, and the comparison is done on each bin separately).

The package generates an HTML report (.html) that contains an overview of the iceberg pipeline and links to three reports, one for each site type, each listing all the corresponding genomic sites. Each site is visualized with IGV and other complementary visualization components, controlled by a sites table that contains complementary analysis information.

Project Diagram

diagram

Built With

Prerequisites

DOCKER

Link for download:
Version: Docker Desktop 4.6.1 
*** Verify that Docker has access to at least 12 GB of memory (RAM) ***
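
One way to check the memory requirement above from the host is to ask the Docker daemon directly. The following is a minimal sketch, not part of iceberg: the function names are hypothetical, and it assumes `docker info --format '{{.MemTotal}}'` prints the daemon's total memory in bytes. The reported value is often slightly below the amount configured in Docker Desktop, so compare it by eye rather than against a strict threshold.

```python
import subprocess

BYTES_PER_GIB = 1024 ** 3


def bytes_to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB, rounded to two decimals."""
    return round(num_bytes / BYTES_PER_GIB, 2)


def docker_memory_gib() -> float:
    """Return the memory available to the Docker daemon, in GiB.

    Assumption: `docker info --format '{{.MemTotal}}'` prints a byte count.
    """
    result = subprocess.run(
        ["docker", "info", "--format", "{{.MemTotal}}"],
        capture_output=True,
        text=True,
        check=True,  # raise if the docker CLI reports an error
    )
    return bytes_to_gib(int(result.stdout.strip()))
```

Calling `docker_memory_gib()` on the host should return a value close to 12 or higher before you continue.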

Getting Started

Follow these steps to get started with iceberg.

Installation

Pull the iceberg's docker image.

docker pull 100200300400/iceberg_image:firsttry

Run the iceberg's docker container and keep it alive.

docker run -d --name iceberg_container 100200300400/iceberg_image:firsttry tail -f /dev/null

If the command fails, try replacing the image name with the image ID.

docker run -d --name iceberg_container <image-id> tail -f /dev/null

Tests (NOT IMPLEMENTED YET)

We recommend running the tests before you continue.

python -m unittest test/test_iceberg.py

Usage

Copy your input data from the host machine into the iceberg's docker container

Copy your genome file - hg.fa.

docker cp /ABSOLUTE/HOST/PATH/GENOME/hg.fa iceberg_container:root/input/genome/hg.fa

Copy your treatment experiment directory - TX_DIRECTORY.

docker cp /ABSOLUTE/HOST/PATH/TX_DIRECTORY/ iceberg_container:root/input/TX_DIRECTORY/

Copy your control experiment directory - CONTROL_DIRECTORY.

docker cp /ABSOLUTE/HOST/PATH/CONTROL_DIRECTORY/ iceberg_container:root/input/CONTROL_DIRECTORY/

Create your iceberg input file.

Copy your iceberg input file - iceberg_input_file.yaml.

docker cp /ABSOLUTE/HOST/PATH/iceberg_input_file.yaml iceberg_container:root/input/iceberg_input_file.yaml

Running Procedure.

After copying your input data, open an interactive shell in the iceberg Docker container.
```
docker exec -ti iceberg_container sh
```
Change into the container's `root` directory.
```
cd root
```

Run the iceberg command, and exit the container when it is done.

python iceberg/analyzer.py --input_file_path input/iceberg_input_file.yaml

Copy your output data from the iceberg's docker container into the host machine

Copy your iceberg output folder - ICEBERG_OUTPUT.

docker cp iceberg_container:root/ICEBERG_OUTPUT/ /ABSOLUTE/HOST/PATH/ICEBERG_OUTPUT/
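
The copy, run, and copy-back steps above can be collected into one script. The following is a minimal sketch, not part of iceberg: the function names `build_workflow` and `run_workflow` are hypothetical, the container-side folder names are derived from the host folder names, and it assumes the output folder inside the container is `root/ICEBERG_OUTPUT/` as in the command above (in practice this depends on the output path set in your input file).

```python
import subprocess
from pathlib import Path


def build_workflow(genome, tx_dir, control_dir, input_yaml, host_output_dir,
                   output_folder="ICEBERG_OUTPUT",
                   container="iceberg_container"):
    """Build the docker commands for one iceberg run as argv lists."""
    tx_name = Path(tx_dir).name            # e.g. "/data/tx/" -> "tx"
    control_name = Path(control_dir).name
    yaml_name = Path(input_yaml).name
    # Same commands as the interactive session: cd root, then run the analyzer.
    run_cmd = ("cd root && python iceberg/analyzer.py "
               f"--input_file_path input/{yaml_name}")
    return [
        ["docker", "cp", str(genome), f"{container}:root/input/genome/hg.fa"],
        ["docker", "cp", str(tx_dir), f"{container}:root/input/{tx_name}/"],
        ["docker", "cp", str(control_dir),
         f"{container}:root/input/{control_name}/"],
        ["docker", "cp", str(input_yaml),
         f"{container}:root/input/{yaml_name}"],
        ["docker", "exec", container, "sh", "-c", run_cmd],
        ["docker", "cp", f"{container}:root/{output_folder}/",
         str(host_output_dir)],
    ]


def run_workflow(commands):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Building the command list separately from running it lets you print and review the commands before anything is copied.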

Iceberg input file

The input.yaml file contains the following arguments. All paths refer to locations inside the iceberg Docker container:

OUTPUT_FOLDER_PATH: Absolute path to the folder that will contain the iceberg output.
ANALYZER STEPS: The steps that will be executed.
- UMI: Step Documentation
- EXPERIMENT_LIBRARIES_DETECTION: Step Documentation
- EXPERIMENT_TRACES_REMOVAL: Step Documentation
- BWA: Step Documentation
- UNITE_READS_TO_ICEBERGS: Step Documentation
- MERGE_TREATMENT_AND_CONTROL_ICEBERGS_BY_LOCUS: Step Documentation
- UNITE_CLOSE_ICEBERGS_SITES: Step Documentation
- CALCULATE_ICEBERGS_SITES_PROFILE: Step Documentation
- BREAKS_CLASSIFY: Step Documentation
- GUIDERNA_ALIGNMENT: Step Documentation
- REPORTS: Step Documentation
EXPERIMENTS:
- GENERAL:
  - REFERENCE_GENOME_PATH: Absolute path to genome reference file.
  - EXPERIMENTS_TAG: The tag that was injected at the cut events during the GUIDEseq experiment.
- TX:
  - NAME: Name for the experiment, used to name files during the pipeline.
  - EXPERIMENT_FOLDER_PATH: Absolute path to the folder that contains the following experiment files.
  - R1: Treatment R1 fastq file name.
  - R2: Treatment R2 fastq file name.
  - I1: Treatment I1 fastq file name.
  - I2: Treatment I2 fastq file name.
  - GUIDERNA: The GuideRNA used in the GUIDEseq experiment.
- CONTROL:
  - NAME: Name for the experiment, used to name files during the pipeline.
  - EXPERIMENT_FOLDER_PATH: Absolute path to the folder that contains the following experiment files.
  - R1: Control R1 fastq file name.
  - R2: Control R2 fastq file name.
  - I1: Control I1 fastq file name.
  - I2: Control I2 fastq file name.
HYPERPARAMATERS:
- UMI_BPS_AMOUNT_FROM_READS_START: For each set of paired-end reads and their indexes (r1, r2, i1, i2) taken from an experiment's R1, R2, I1, I2 FASTQ files, a Unique Molecular Index (UMI) is assigned to both r1 and r2 as
  i2[8:16]_r1[0:umi_bps_amount_from_reads_start]_r2[0:umi_bps_amount_from_reads_start]
- MIN_QUALITY: The minimum quality of a read for it to be considered in the consolidation process at the UMI step.
- MIN_FREQUENCY: The minimum frequency of a read for the position to be consolidated in the consolidation process at the UMI step.
- MAX_READS_DISTANCE: The maximum distance allowed between a read and an iceberg (in nts) for the read to be joined to the iceberg.
- MAX_ICEBERG_DISTANCE: The maximum distance allowed between two iceberg sites (in nts) for the sites to be joined.
- MAX_ALIGNMENTS_HAMMING_DISTANCE: The maximum Hamming distance allowed for an alignment between two sequences to be valid.
- NOISE_BINS_AND_CONTROL_MAPQ_PERCENTILE: List of bins for the noise sites filtering, where each
  bin is defined as: [[bin-min-icebergs-mapq, bin-max-icebergs-mapq), bin-icebergs-control-percentile]. Example: [[[0, 1], 0.99], [[1, 50], 0.99], [[50, 61], 0.95]]
- CRISPR_ACTIVITY_THRESHOLD: Used to classify the iceberg sites into CRISPR activity sites and spontaneous break sites; see the BREAKS_CLASSIFY step for more information.
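
Two of the hyperparameters above are easier to read as code. The following is a minimal sketch, not the iceberg implementation: `build_umi` follows the slice expression given for UMI_BPS_AMOUNT_FROM_READS_START, and `control_percentile_for_mapq` looks up the bin of a MAPQ value, assuming each bin is half-open, [min, max), as the bin definition suggests. Both function names are hypothetical.

```python
from typing import Optional, Sequence


def build_umi(r1: str, r2: str, i2: str,
              umi_bps_amount_from_reads_start: int) -> str:
    """Build the UMI string i2[8:16]_r1[0:n]_r2[0:n] for one read pair."""
    n = umi_bps_amount_from_reads_start
    return f"{i2[8:16]}_{r1[:n]}_{r2[:n]}"


def control_percentile_for_mapq(mapq: float,
                                bins: Sequence) -> Optional[float]:
    """Return the control percentile of the bin that contains mapq.

    Each bin has the form [[min_mapq, max_mapq], percentile] and is
    treated as half-open: min_mapq <= mapq < max_mapq (an assumption).
    Returns None when no bin contains the value.
    """
    for (low, high), percentile in bins:
        if low <= mapq < high:
            return percentile
    return None


# Usage with the example bins from the parameter description.
bins = [[[0, 1], 0.99], [[1, 50], 0.99], [[50, 61], 0.95]]
print(build_umi("ACGTACGT", "TTGGCCAA", "AAAAAAAACCCCGGGGTT", 6))
# -> CCCCGGGG_ACGTAC_TTGGCC
print(control_percentile_for_mapq(60, bins))
# -> 0.95
```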

Iceberg outputs

Here we will explain about the iceberg outputs.

For more information, please refer to the documentation.

Roadmap


See the open issues for a full list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contact

Your Name - @twitter_handle - email@email_client.com

Project Link: https://github.com/github_username/repo_name

Acknowledgments
