Skip to content

Commit

Permalink
first commit
Browse files Browse the repository at this point in the history
  • Loading branch information
gprezza committed Jun 29, 2023
0 parents commit 5b9ae91
Show file tree
Hide file tree
Showing 5 changed files with 1,148 additions and 0 deletions.
21 changes: 21 additions & 0 deletions LICENSE
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 gprezza

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
114 changes: 114 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,114 @@
# CRISPRi_designer
Tools to design gRNA/array pools targeting a list of genes.

## Dependencies
Create a conda environment from the environment.yml file:
```
conda env create -f environment.yml
```
## get_sequences_from_gff
The first step in the pipeline is to obtain a fasta file with the sequences of the genes that are to be targeted. These sequences must contain the promoter. The file can be created from a gff annotation with the get_sequences_from_gff.py script.

### Usage
```
usage: get_sequences_from_gff.py [-h] [--outname OUTNAME]
[--promoter PROMOTER] [--genetype GENETYPE]
[--attribute ATTRIBUTE] [--name NAME]
gff_file fasta_file
Extract gene sequences from gff and genome files.
positional arguments:
gff_file Input annotation (gff) file with sequences of the
target genes.
fasta_file Input genome file.
optional arguments:
-h, --help show this help message and exit
--outname OUTNAME, -o OUTNAME
Base name of the output files. The .fasta extension
will be appended to this. (Default: target_genes.
--promoter PROMOTER, -p PROMOTER
Length before the gene start to be considered as the
promoter. (Default: 50)
--genetype GENETYPE, -t GENETYPE
Consider only genes of this type (Third column of the
gff file).
--attribute ATTRIBUTE, -a ATTRIBUTE
Consider only genes that have this attribute (ninth
column of the gff file).E.g. if you want to filter for
the attribute "sRNA_type=Intergenic"
writesRNA_type=Intergenic.
--name NAME, -n NAME Take gene name from this attribute of the gff
attribute column (ninth column of the gff file)
```
### Input files
* A gff annotation (*gff_file*) in the standard format (tab-separated, no header, columns: seqname,feature,start,end,score,strand,frame,attribute).
* A (multi)fasta file (*fasta_file*) containing the genome file (chromosome(s) and plasmid(s), if present).

### Output
* A _basename_coordinates.txt_ file containing the coordinates of each gene (promoter included).
* A _basename_sequences.fasta_ file containing the genes (promoter included).
_basename_ is defined by the --outname option.

## design_CRISPRi_gRNAs.py
This script designs the gRNA/array library.

### Usage

```
usage: design_CRISPRi_gRNAs.py [-h] [--processors PROCESSORS]
[--in_name IN_NAME] [--coordinates COORDINATES]
[--folding FOLDING] [--left LEFT]
[--right RIGHT] [--arrays] [--non_targeting]
[--PAM PAM] [--spacer_length SPACER_LENGTH]
[--promoter_length PROMOTER_LENGTH]
[--mutants MUTANTS] --reference REFERENCE
Designs spacers targeting the input genes.
optional arguments:
-h, --help show this help message and exit
--processors PROCESSORS, -p PROCESSORS
Number of processors (Default: 1).
--in_name IN_NAME, -i IN_NAME
Input base name of the fasta file with sequences of
the target genes.(Default: target_genes.)
--coordinates COORDINATES, -c COORDINATES
File with coordinates of the target genes. (Default: "
--in_name + _coordinates.txt".)
--folding FOLDING, -f FOLDING
Minimal accepted correct folding probability of the
arrays. Scale: 0-1. (Default: 0.2).
--left LEFT, -lo LEFT
Left overhang for cloning mature spacers. (Default:
atctttgcagtaatttctactgttgtagat).
--right RIGHT, -ro RIGHT
Right overhang for cloning mature spacers. (Default:
ccggcttatcggtcagtttcacctgattta).
--arrays, -a The script attempts to design one array per target
gene if this flag is present.
--non_targeting, -nt The script designs nontargeting gRNAs if this flag is
present.
--PAM PAM, -pam PAM PAM sequence. (Default: TTV).
--spacer_length SPACER_LENGTH, -sl SPACER_LENGTH
Length of each spacer. No more than 26 if the -a flag
is used. (Default: 20).
--promoter_length PROMOTER_LENGTH, -pl PROMOTER_LENGTH
Length of the promoter region. (Default: 50).
--mutants MUTANTS, -m MUTANTS
Number of spacers (including one array if using the -a
flag) to be designed per targeted gene. (Default: 4).
--reference REFERENCE, -r REFERENCE
Folder containing the genome sequence(s).
```
### Input
* The two files generated by _get_sequences_from_gff_, which are indicated by *in_name* (and *--coordinates*, if the names are different).

### Output
The following files are generated by the script:
* _constructs_mature.txt_: fasta file with all gRNAs + homology arms. Non-targeting gRNAs are also included if the _-nt_ option is used.
* _mature_oligo_pool2.txt_: text file with all gRNAs + homology arms. Like the previous file, but without sequence ID.
* _nontargeting_oligo_pool2.txt_: only if the _-nt_ option is used. Text file with the non-targeting gRNAs + homology arms.
* _constructs_array.txt_: only if the _-a_ option is used. Fasta file with all arrays.
* _array_oligo_pool_n.txt_: only if the _-a_ option is used. n Text files with the sequences of the oligos needed to construct the array via CRATES. One file is generated per each pool to be ordered.
Loading

0 comments on commit 5b9ae91

Please sign in to comment.