YakhiniGroup/ICEBERG

ICEBERG PROJECT

Containerized tool for analyzing and visualizing GUIDE-seq experiments.

Table of Contents
  1. About The Project
  2. Project Diagram
  3. Built With
  4. Prerequisites
  5. Getting Started
  6. Usage
  7. Roadmap
  8. Contributing
  9. License
  10. Contact
  11. Acknowledgments

About The Project

The purpose of the iceberg tool is to perform our step-by-step analysis of the well-known GUIDE-seq experiments. It takes as input a YAML file (.yaml) that contains the paths to the raw sequencing read files (FASTQ) of the treatment and control experiments, along with other parameters, and it identifies three types of genomic sites:

  1. CRISPR activity sites: on-target and off-target sites.
  2. Spontaneous break sites: spontaneous DNA breaks (which also appear in the control experiment).
  3. Noise sites: sites with a low number of reads compared to the control experiment sites
    (sites are grouped into bins by their reads' MAPQ, and the comparison is done on each bin separately).

The package generates an HTML report (.html) that contains an overview of the iceberg pipeline and links to three reports, one for each site type, each listing all the corresponding genomic sites. Each site is visualized with IGV and other complementary visualization components, controlled by a sites table that contains complementary analysis information.

(back to top)

Project Diagram

(back to top)

Built With

(back to top)

Prerequisites

  • DOCKER
    Link for download:
    Version: Docker Desktop 4.6.1 
    *** Verify that Docker has access to at least 12 GB of memory (RAM) ***
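
The memory requirement above can be checked from a script. The following is a minimal sketch, not part of iceberg: it assumes the `docker` CLI is on the PATH and that `docker info --format "{{.MemTotal}}"` reports the memory available to Docker in bytes. The function names and the choice to count 1 GB as 10**9 bytes are assumptions made for illustration.

```python
import subprocess

REQUIRED_GB = 12  # the minimum stated in the prerequisites


def docker_mem_total_bytes() -> int:
    """Ask the Docker daemon how much memory it can use, in bytes."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{.MemTotal}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def has_enough_memory(mem_total_bytes: int, required_gb: float = REQUIRED_GB) -> bool:
    """Compare against the requirement, counting 1 GB as 10**9 bytes (an assumption)."""
    return mem_total_bytes >= required_gb * 10**9
```

Calling `has_enough_memory(docker_mem_total_bytes())` on the host returns `False` when the Docker memory limit should be raised. The value Docker reports can be slightly lower than the configured limit, so treat a borderline result as a hint rather than a hard failure.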
    

Getting Started

Follow the next steps for getting started with iceberg.

Installation

  1. Pull the iceberg Docker image.

    docker pull 100200300400/iceberg_image:firsttry
    
  2. Run the iceberg Docker container and keep it alive.

    docker run -d --name iceberg_container 100200300400/iceberg_image:firsttry tail -f /dev/null
    

    If the command fails, try replacing the image name with the image ID.

    docker run -d --name iceberg_container <image-id> tail -f /dev/null
    

(back to top)

Tests (NOT IMPLEMENTED YET)

We recommend running the tests before you continue.

 python -m unittest test/test_iceberg.py

(back to top)

Usage

Copy your input data from the host machine into the iceberg Docker container

  1. Copy your genome file - hg.fa.

    docker cp /ABSOLUTE/HOST/PATH/GENOME/hg.fa iceberg_container:root/input/genome/hg.fa
    
  2. Copy your treatment experiment directory - TX_DIRECTORY.

    docker cp /ABSOLUTE/HOST/PATH/TX_DIRECTORY/ iceberg_container:root/input/TX_DIRECTORY/
    
  3. Copy your control experiment directory - CONTROL_DIRECTORY.

    docker cp /ABSOLUTE/HOST/PATH/CONTROL_DIRECTORY/ iceberg_container:root/input/CONTROL_DIRECTORY/
    
  4. Create your iceberg input file (see the Iceberg input file section).

  5. Copy your iceberg input file - iceberg_input_file.yaml.

    docker cp /ABSOLUTE/HOST/PATH/iceberg_input_file.yaml iceberg_container:root/input/iceberg_input_file.yaml
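
The copy steps above can also be scripted. This is a minimal sketch, not part of iceberg: it only assembles the same `docker cp` commands shown in the list and runs them in order. The function names are hypothetical, and the sketch assumes POSIX-style host paths and the container name `iceberg_container` from the installation step.

```python
import subprocess
from pathlib import PurePosixPath

CONTAINER = "iceberg_container"  # container name used in the installation step


def build_copy_commands(genome, tx_dir, control_dir, input_yaml, container=CONTAINER):
    """Return one `docker cp` argument list per input, in the order of the steps above."""
    # Keep each experiment directory's own name on the container side.
    tx_name = PurePosixPath(tx_dir).name
    control_name = PurePosixPath(control_dir).name
    pairs = [
        (genome, "root/input/genome/hg.fa"),
        (tx_dir, f"root/input/{tx_name}/"),
        (control_dir, f"root/input/{control_name}/"),
        (input_yaml, "root/input/iceberg_input_file.yaml"),
    ]
    return [["docker", "cp", str(src), f"{container}:{dest}"] for src, dest in pairs]


def copy_inputs(genome, tx_dir, control_dir, input_yaml, container=CONTAINER):
    """Run the copy commands one by one and stop at the first failure."""
    for command in build_copy_commands(genome, tx_dir, control_dir, input_yaml, container):
        subprocess.run(command, check=True)
```

Keeping the command construction separate from the execution makes it possible to print and review the commands before anything is copied.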
    

(back to top)

Running procedure

  1. Access the iceberg Docker container's interactive shell after you have copied your input data.

    docker exec -ti iceberg_container sh
    
  2. Change into the root directory inside the container.

    cd root
    
  3. Run the iceberg command, and exit the container when it is done.

    python iceberg/analyzer.py --input_file_path input/iceberg_input_file.yaml
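
The three steps above can be collapsed into one non-interactive call. This is a minimal sketch under the assumption that running the same shell line through `docker exec ... sh -c` behaves like typing it in the interactive shell; the function names are hypothetical and not part of iceberg.

```python
import shlex
import subprocess

CONTAINER = "iceberg_container"  # container name used in the installation step
INPUT_FILE = "input/iceberg_input_file.yaml"  # path relative to the root directory


def build_run_command(container=CONTAINER, input_file=INPUT_FILE):
    """Build a `docker exec` call that mirrors steps 2 and 3 in a single shell line."""
    shell_line = (
        "cd root && python iceberg/analyzer.py "
        f"--input_file_path {shlex.quote(input_file)}"
    )
    return ["docker", "exec", container, "sh", "-c", shell_line]


def run_iceberg(container=CONTAINER, input_file=INPUT_FILE):
    """Run the analyzer and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_run_command(container, input_file), check=True)
```

Because no interactive shell is opened, there is nothing to exit afterwards; the call returns when the analyzer finishes.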
    

(back to top)

Copy your output data from the iceberg Docker container to the host machine

  1. Copy your iceberg output folder - ICEBERG_OUTPUT.

    docker cp iceberg_container:root/ICEBERG_OUTPUT/ /ABSOLUTE/HOST/PATH/ICEBERG_OUTPUT/
    

(back to top)

Iceberg input file

The input .yaml file contains the following arguments; all paths refer to locations inside the iceberg Docker container:

  • OUTPUT_FOLDER_PATH: Absolute path to the folder that will contain the iceberg output.

  • ANALYZER STEPS: The steps that will be executed.

  • EXPERIMENTS:

    • GENERAL:

      • REFERENCE_GENOME_PATH: Absolute path to the reference genome file.
      • EXPERIMENTS_TAG: The tag that was inserted at the cut events during the GUIDE-seq experiment.
    • TX:

      • NAME: Name for the experiment, used to name files during the pipeline.

      • EXPERIMENT_FOLDER_PATH: Absolute path to the folder that contains the following experiment files.

      • R1: Treatment R1 FASTQ file name.

      • R2: Treatment R2 FASTQ file name.

      • I1: Treatment I1 FASTQ file name.

      • I2: Treatment I2 FASTQ file name.

      • GUIDERNA: The guide RNA used in the GUIDE-seq experiment.

    • CONTROL:

      • NAME: Name for the experiment, used to name files during the pipeline.
      • EXPERIMENT_FOLDER_PATH: Absolute path to the folder that contains the following experiment files.
      • R1: Control R1 FASTQ file name.
      • R2: Control R2 FASTQ file name.
      • I1: Control I1 FASTQ file name.
      • I2: Control I2 FASTQ file name.
  • HYPERPARAMATERS:

    • UMI_BPS_AMOUNT_FROM_READS_START: For each paired-end read pair and its index reads (r1, r2, i1, i2, taken from the experiment's R1, R2, I1, I2 FASTQ files), a Unique Molecular Index (UMI) is assigned to both r1 and r2 as
      i2[8:16]_r1[0:umi_bps_amount_from_reads_start]_r2[0:umi_bps_amount_from_reads_start]

    • MIN_QUALITY: The minimum quality of a read for it to be considered in the consolidation process at the UMI step.

    • MIN_FREQUENCY: The minimum frequency of a read for the position to be consolidated in the consolidation process at the UMI step.

    • MAX_READS_DISTANCE: The maximum distance (in nt) allowed between a read and an iceberg for the read to be joined to the iceberg.

    • MAX_ICEBERG_DISTANCE: The maximum distance (in nt) allowed between two iceberg sites for the sites to be joined.

    • MAX_ALIGNMENTS_HAMMING_DISTANCE: The maximum Hamming distance allowed between two sequences for their alignment to be valid.

    • NOISE_BINS_AND_CONTROL_MAPQ_PERCENTILE: List of bins for the noise-site filtering, where each
      bin is defined as: [[bin-min-icebergs-mapq, bin-max-icebergs-mapq), bin-icebergs-control-percentile] Example: [[[0, 1], 0.99], [[1, 50], 0.99], [[50, 61], 0.95]]

    • CRISPR_ACTIVITY_THRESHOLD: Used to classify the iceberg sites into CRISPR activity sites and spontaneous break sites; see the BREAKS_CLASSIFY step for more information.
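
Two of the hyperparameters above are easier to read as code. The sketch below is illustrative only and is not taken from the iceberg source: `build_umi` follows the UMI pattern given for UMI_BPS_AMOUNT_FROM_READS_START, and `control_percentile_for` looks up the percentile of the bin that contains a given MAPQ, treating each bin as the half-open interval [min, max) from the definition. Both function names are hypothetical, and how iceberg applies the percentile afterwards is not shown.

```python
def build_umi(r1: str, r2: str, i2: str, umi_bps_amount_from_reads_start: int) -> str:
    """Join bases 8..15 of i2 with the first n bases of r1 and of r2."""
    n = umi_bps_amount_from_reads_start
    return f"{i2[8:16]}_{r1[0:n]}_{r2[0:n]}"


def control_percentile_for(mapq, bins):
    """Return the control percentile of the bin whose [min, max) range holds mapq."""
    for (bin_min, bin_max), percentile in bins:
        if bin_min <= mapq < bin_max:
            return percentile
    return None  # mapq falls outside every configured bin


# The bins from the example above.
EXAMPLE_BINS = [[[0, 1], 0.99], [[1, 50], 0.99], [[50, 61], 0.95]]
```

With the example bins, a MAPQ of 50 falls in the last bin (percentile 0.95), because the upper bound of the [1, 50) bin is exclusive.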

(back to top)

Iceberg outputs

This section will describe the iceberg outputs.

For more information, please refer to the Documentation.

(back to top)

Roadmap

  • [] Feature 1
  • [] Feature 2
  • [] Feature 3
    • [] Nested Feature

See the open issues for a full list of proposed features (and known issues).

(back to top)

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

Your Name - @twitter_handle - email@email_client.com

Project Link: https://github.com/github_username/repo_name

(back to top)

Acknowledgments

(back to top)
