From 338f0f1e0fe2f10d88b004a1748b51e2784316eb Mon Sep 17 00:00:00 2001
From: "Documenter.jl"
Date: Thu, 21 Nov 2024 15:59:20 +0000
Subject: [PATCH] build based on e13c622

---
 dev/.documenter-siteinfo.json |  2 +-
 dev/accessing/index.html      | 33 +++++++++++++++++----
 dev/examples/index.html       | 53 ++++++++++++++++++++++++++++++++--
 dev/genes/index.html          |  2 +-
 dev/index.html                |  2 +-
 dev/io/index.html             |  6 ++--
 dev/loci/index.html           |  2 +-
 dev/objects.inv               | Bin 712 -> 770 bytes
 dev/search_index.js           |  2 +-
 9 files changed, 87 insertions(+), 15 deletions(-)

diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
index faab9b1..8858dd9 100644
--- a/dev/.documenter-siteinfo.json
+++ b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.11.1","generation_timestamp":"2024-11-18T09:20:02","documenter_version":"1.8.0"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.11.1","generation_timestamp":"2024-11-21T15:59:15","documenter_version":"1.8.0"}}
\ No newline at end of file
diff --git a/dev/accessing/index.html b/dev/accessing/index.html
index a88f2db..1d8b6d2 100644
--- a/dev/accessing/index.html
+++ b/dev/accessing/index.html
@@ -1,14 +1,37 @@
-Accessing and modifying annotations · GenomicAnnotations.jl

Accessing and modifying annotations

Feature

Features (genes) can be added using addgene!. A feature must have a feature name and a locus (position), and can have any number of additional qualifiers associated with it (see next section).

GenomicAnnotations.addgene!Function
addgene!(chr::Record, feature, locus; kw...)

Add gene to chr. locus can be an AbstractLocus, a String, a UnitRange, or a StepRange (for decreasing ranges, which will be annotated on the complementary strand).

Example

addgene!(chr, "CDS", 1:756;
+Accessing and modifying annotations · GenomicAnnotations.jl

Accessing and modifying annotations

Features

The following functions can be used to read and modify the data associated with a gene:

GenomicAnnotations.locus!Function
locus!(gene::AbstractGene, loc)
+locus!(gene::AbstractGene, loc::AbstractLocus)

Replace gene with a new Gene with loc as its Locus. If loc is not an AbstractLocus, it is parsed with Locus(loc).

source
GenomicAnnotations.feature!Function
feature!(g::Gene, f::Symbol)

Change the feature of g to f, returning a new instance of Gene. Since Genes are immutable, feature! only mutates the parent of g and not g itself. Thus, in the first example below the original unmodified g is printed, not the updated version:

# This will not work as expected:
+for source in @genes(chr, source)
+    feature!(source, :region)
+    println(source)
+end
+
+# But this will:
+for source in @genes(chr, source)
+    source = feature!(source, :region)
+    println(source)
+end
source
Base.parentFunction
parent(A)

Return the underlying parent object of the view. This parent of objects of types SubArray, SubString, ReshapedArray or LinearAlgebra.Transpose is what was passed as an argument to view, reshape, transpose, etc. during object creation. If the input is not a wrapped object, return the input itself. If the input is wrapped multiple times, only the outermost wrapper will be removed.

Examples

julia> A = [1 2; 3 4]
+2×2 Matrix{Int64}:
+ 1  2
+ 3  4
+
+julia> V = view(A, 1:2, :)
+2×2 view(::Matrix{Int64}, 1:2, :) with eltype Int64:
+ 1  2
+ 3  4
+
+julia> parent(V)
+2×2 Matrix{Int64}:
+ 1  2
+ 3  4
source
GenomicAnnotations.attributesFunction
attributes(g::Gene)

Return an immutable NamedTuple containing copies of all annotated attributes of g. Missing attributes are excluded. See genedata for a non-allocating way to access the gene data directly.

source

Features (genes) can be added using addgene!. A feature must have a feature name and a locus (position), and can have any number of additional qualifiers associated with it (see next section).

GenomicAnnotations.addgene!Function
addgene!(chr::Record, feature, locus; kw...)

Add gene to chr. locus can be an AbstractLocus, a String, a UnitRange, or a StepRange (for decreasing ranges, which will be annotated on the complementary strand).

Example

addgene!(chr, "CDS", 1:756;
     locus_tag = "gene0001",
-    product = "Chromosomal replication initiator protein dnaA")
source

After adding a new feature, sort! can be used to make sure that the annotations are stored (and printed) in the order in which they occur on the chromosome:

sort!(chr)

Existing features can be removed using delete!:

Base.delete!Method
delete!{T}(h::MutableBinaryHeap{T}, i::Int)

Deletes the element with handle i from heap h .

source
delete!(collection, key)

Delete the mapping for the given key in a collection, and return the collection.

Examples

julia> d = RobinDict("a"=>1, "b"=>2)
+    product = "Chromosomal replication initiator protein dnaA")
source

After adding a new feature, sort! can be used to make sure that the annotations are stored (and printed) in the order in which they occur on the chromosome:

sort!(chr)

Existing features can be removed using delete!:

Base.delete!Method
delete!{T}(h::MutableBinaryHeap{T}, i::Int)

Deletes the element with handle i from heap h .

source
delete!(collection, key)

Delete the mapping for the given key in a collection, and return the collection.

Examples

julia> d = RobinDict("a"=>1, "b"=>2)
 RobinDict{String,Int64} with 2 entries:
   "b" => 2
   "a" => 1
 
 julia> delete!(d, "b")
 RobinDict{String,Int64} with 1 entry:
-  "a" => 1
source
delete!(tree::RBTree, key)

Deletes key from tree, if present, else returns the unmodified tree.

source
delete!(gene::AbstractGene)

Delete gene from parent(gene). Warning: does not work when broadcasted! Use delete!(::AbstractVector{Gene}) instead.

source
Base.delete!Method
delete!(genes::AbstractArray{Gene, 1})

Delete all genes in genes from parent(genes[1]).

Example

delete!(@genes(chr, length(gene) <= 60))
source

Qualifiers

Features can have multiple qualifiers, which can be modified using Julia's property syntax:

# Remove newspace from gene product descriptions
+  "a" => 1
source
delete!(tree::RBTree, key)

Deletes key from tree, if present, else returns the unmodified tree.

source
delete!(gene::AbstractGene)

Delete gene from parent(gene). Warning: does not work when broadcasted! Use delete!(::AbstractVector{Gene}) instead.

source
Base.delete!Method
delete!(genes::AbstractArray{Gene, 1})

Delete all genes in genes from parent(genes[1]).

Example

delete!(@genes(chr, length(gene) <= 60))
source

Qualifiers

Features can have multiple attributes/qualifiers, which can be modified using Julia's property syntax:

# Remove newlines from gene product descriptions
 for gene in @genes(chr, CDS)
     gene.product = replace(gene.product, '\n' => ' ')
 end

Properties also work on views of genes, typically generated using @genes:

interestinggenes = readlines("/path/to/list/of/interesting/genes.txt")
@@ -24,7 +47,7 @@
  "EC:4.3.2.1"
 
 julia> eltype(chr.genedata[!, :EC_number])
-Union{Missing, Array{String,1}}
source

Accessing properties that haven't been stored will return missing. For this reason, it often makes more sense to use get() than to access the property directly.

# chr.genes[2].pseudo returns missing, so this will throw an error
+Union{Missing, Array{String,1}}
source

Accessing properties that haven't been stored will return missing. For this reason, it often makes more sense to use get() than to access the property directly.

# chr.genes[2].pseudo returns missing, so this will throw an error
 if chr.genes[2].pseudo
     println("Gene 2 is a pseudogene")
 end
@@ -32,4 +55,4 @@
 # ... but this works:
 if get(chr.genes[2], :pseudo, false)
     println("Gene 2 is a pseudogene")
-end

Sequences

The sequence of a Chromosome chr is stored in chr.sequence. Sequences of individual features can be read with sequence:

GenomicAnnotations.sequenceMethod
sequence(gene::AbstractGene; translate = false, preserve_alternate_start = false)

Return genomic sequence for gene. If translate is true, the sequence will be translated to a LongAA, excluding the stop, otherwise it will be returned as a LongDNA{4} (including the stop codon). If preserve_alternate_start is set to false, alternate start codons will be assumed to code for methionine. ```

source
+end

Sequences

The sequence of a Chromosome chr is stored in chr.sequence. Sequences of individual features can be read with sequence:

GenomicAnnotations.sequenceMethod
sequence(gene::AbstractGene; translate = false, preserve_alternate_start = false)

Return genomic sequence for gene. If translate is true, the sequence will be translated to a LongAA, excluding the stop codon; otherwise it will be returned as a LongDNA{4} (including the stop codon). If preserve_alternate_start is set to false, alternate start codons will be assumed to code for methionine.

source
diff --git a/dev/examples/index.html b/dev/examples/index.html
index 188fb22..5e8e669 100644
--- a/dev/examples/index.html
+++ b/dev/examples/index.html
@@ -24,10 +24,59 @@
     for chr in chrs
         write(w, chr)
     end
-end

Converting between formats

Note that GenBank and GFF3 headers do not contain the same information, thus all information in the header is lost when saving annotations as another format.

using GenomicAnnotations
+end

Converting between formats

Annotations can be read from one file format and written as another. Converting between the supported human-readable formats (GenBank and EMBL) or between the tab-delimited formats (GFF3 and GTF) will likely work out of the box, but converting from a human-readable format to a tab-delimited format, or vice versa, may need some manual intervention. Currently, GenomicAnnotations does not attempt to rename columns or perform any sanity checks to ensure that the resulting file meets the target specification. Refer to the respective format specifications for details on which attributes need to be included. Notably, GenBank and GFF3 headers do not contain the same information, and GTF files lack a header altogether, so all information in the header is lost when saving annotations in another format. Unlike GFF3, GTF files also do not allow the inclusion of sequence data. Below is a simple example script that demonstrates some of the changes that need to be made. If your use case includes more complex features, such as multi-exon genes, you will likely need to make more changes as part of the conversion.

using GenomicAnnotations
+using DataFrames
 chrs = readgbk("genome.gbk")
 open(GFF.Writer, "genome.gff") do w
     for chr in chrs
+        # GenBank files often contain qualifiers that are not usually included
+        # in GFF3 files, so let's remove some:
+        cols_to_remove = intersect(["translation", "mol_type", "organism"], names(chr.genedata))
+        if !isempty(cols_to_remove)
+            chr.genedata = chr.genedata[:, Not(cols_to_remove)]
+        end
+        # The GenBank format uses the :source feature to store metadata about
+        # the record, but in GFF3, "source" is the name of the column that
+        # records which program or database produced each feature. Thus, we
+        # need to change the GenBank :source to the GFF3 equivalent :region.
+        # According to the GenBank specification, :source is mandatory, but
+        # it's best to be safe and check that it's really there:
+        source_entries = @genes(chr, source)
+        if !isempty(source_entries)
+            for source in source_entries
+                region = feature!(source, :region)
+                region.Name = chr.name
+                region.ID = chr.name
+                # GenBank files include information about circularity of a
+                # contig in its header, but in the GFF3 format this
+                # information is encoded in the "Is_circular" attribute of
+                # the first :region feature:
+                if occursin("circular", chr.header)
+                    region.Is_circular = true
+                end
+            end
+        else
+            # If the :source feature is missing, we'll have to create a :region
+            # from scratch:
+            addgene!(chr, "region", 1:length(chr.sequence);
+                Name = chr.name,
+                ID = chr.name,
+                Is_circular = occursin("circular", chr.header))
+            sort!(chr.genes)
+        end
+        # Most features, such as :CDS or :tRNA, have a corresponding :gene
+        # that they belong to. In GFF3, this hierarchical relationship is shown
+        # using the "ID" and "Parent" attributes. Here, we set the "ID"
+        # attribute of all :gene features to match their "locus_tag", and then
+        # set the "Parent" attributes of all non-:gene features to match the
+        # "ID" of their respective :gene, if there is one:
+        gene_features = @genes(chr, gene)
+        gene_features.ID .= gene_features.locus_tag
+        for gene in @genes(chr, !gene)
+            if get(gene, :locus_tag, "missing") in skipmissing(gene_features.locus_tag)
+                gene.Parent = gene.locus_tag
+            end
+        end
         write(w, chr)
     end
-end
+end
diff --git a/dev/genes/index.html b/dev/genes/index.html
index f1c3980..a3f04ef 100644
--- a/dev/genes/index.html
+++ b/dev/genes/index.html
@@ -28,4 +28,4 @@
 @genes(chr, :locus_tag in d[$:category1])
 
 gene = chr.genes[5]
-@genes(chr, gene == $gene)source
+@genes(chr, gene == $gene)source
diff --git a/dev/index.html b/dev/index.html
index cb09a52..48406e8 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -36,4 +36,4 @@
 open(GenBank.Writer, "updated.gbk") do w
     write(w, chr)
-end
+end
diff --git a/dev/io/index.html b/dev/io/index.html
index dc88526..ff0b185 100644
--- a/dev/io/index.html
+++ b/dev/io/index.html
@@ -7,8 +7,8 @@
     for record in record
         print(record)
     end
-endsource
GenomicAnnotations.GFF.ReaderType
GFF.Reader(input::IO)

Create a data reader of the GFF3 file format.

source

Output

Annotations can be printed with GenBank formatting using GenBank.Writer, and as GFF3 with GFF.Writer. Headers are not automatically converted between formats; GFF.Writer only prints the header of the first Record, and only if it starts with a #, while GenBank.Writer prints a default header if the stored one starts with #.

GenomicAnnotations.GenBank.WriterType
GenBank.Writer(output::IO; width=70)

Create a data writer of the GenBank file format.

open(GenBank.Writer, outfile) do writer
+end
source
GenomicAnnotations.GFF.ReaderType
GFF.Reader(input::IO)

Create a data reader of the GFF3 file format.

source

Output

Annotations can be printed with GenBank formatting using GenBank.Writer, and as GFF3 with GFF.Writer. Headers are not automatically converted between formats; GFF.Writer only prints the header of the first Record, and only if it starts with a #, while GenBank.Writer prints a default header if the stored one starts with #.

GenomicAnnotations.GenBank.WriterType
GenBank.Writer(output::IO; width=70)

Create a data writer of the GenBank file format.

open(GenBank.Writer, outfile) do writer
     write(writer, genome)
-end
source
GenomicAnnotations.GFF.WriterType
GFF.Writer(output::IO; width=70)

Create a data writer of the GFF file format.

open(GFF.Writer, outfile) do writer
+end
source
GenomicAnnotations.GFF.WriterType
GFF.Writer(output::IO; width=70)

Create a data writer of the GFF file format.

open(GFF.Writer, outfile) do writer
     write(writer, genome)
-end
source

In the REPL, instances of Gene are displayed as they would be in the annotation file.

+endsource

In the REPL, instances of Gene are displayed as they would be in the annotation file.

diff --git a/dev/loci/index.html b/dev/loci/index.html
index 3218171..048c690 100644
--- a/dev/loci/index.html
+++ b/dev/loci/index.html
@@ -6,4 +6,4 @@
 complement(1..3)

The eachposition(locus) function is provided for iterating over the individual genomic positions in the locus. Note that this ignores any metadata such as strandedness.

julia> for p in eachposition(Locus("complement(join(1..3,7..9))"))
            print(p)
        end
-987321
+987321 diff --git a/dev/objects.inv b/dev/objects.inv index a37d4e27a0d128823815c70d39277508dfbece82..0f086b79109a0cd1cbfce98229976f228f62ceeb 100644 GIT binary patch delta 644 zcmV-~0(veb29O)$wSuc9SacG+0N4GzqPnwr3Mxz@1~K_RZF+ z>VKc@e2*k05<#(8CW!NntB;4`ADb zun?2AkO;=X7X*($J{7o>B}PZ^#L`T5Y1LXxzLdYBjmmNi7|C`Z%T6!f5gNH}Pd7HR z>-E-ce#@B@djZCFF^v;JqjrNjH)!n!ZMiYRKgY`@=Uy;58l7sZQ=p2uFVvJtNu7PU zrN|Edy{2};zkektkVb)c1%jG51rij9SBSn=vDGpmI45x$okND`nuw^2h!zpi5K&jE zi1f0?&ps>zWA^cVn==;q8W`pV#uM2%da$m)6B_j%hAueZZowu-WT#SLcbntjV13r= z4;b$i-2wguP;{?N%5Ws4hC@mYDLHP0ExM$7nkpJru76|2BsEn?BkfmU(gjD~;V}o5 zyy0y*i;n|mqi5(EXrJ$HX~op<`8_ZWhCIjxK4(B0Cwn+yUN1a$pQ|Kv+NpDhGP}O& z7tC^d%#6;Q`_I#K+sr}vs>feO;JH2Ct=E(A{*Z8nY`FRX;=VXhlX(o$h+U3uUPh@a zW?{}j0)4^-{gJ!jcvCl0T#UL^%QYTngo@twhi!t7?%)o=@5Wp;Dx zJ_Ge~?me0oZ%HbW?dc?xKC$w$(EciIc?i$Zf0N#MS*Rvs#+~=cS^Hs}azM)~+t(d# e^!n=W!B&_5=yXm?tt1Ve$KwBP)P#RbNI0jz^gVU} delta 586 zcmV-Q0=4~u2FL}FiGNgGZyGTWe7|3@QOcvaCan~Sr$SRIq)Mo1l|CDL0jECuV*A>l zDF3~-@3@b093UZty)(1+%wnN9zJPUn!ZibO1Mn4<&IFq(r5$xbD?5IYiO_A3&?zj^ zWkMN4YlYghi}hv-)4EO-hfH&^-j%CbJdv7-Ha)CUc);ACMStZ2d{Z7NmJ+`}+a|<~ z6y`!YG#)+yw?+6=;xfvNPVj|hx$M&RdntUWekBW)7cOEXU7@UQdi@Sa;%|FcSd4Dx zOJDq6P$|{|jqPF@C!$8}2Jvo?+6~fjV}O5-ms8GzAln+9X{$3Niuy0736+v~`wB}O z-Tc?4b|Q3%3V+ZjKu`du!7D&ifS{0k+s9VR#NfQdVRRlDqiZ4}J|a>^L}El;sWQ^* z8bA6ljf~O9_ifHl9UjX?=n@y1nU--fzvQ`aOO?%FzUU zx62WlpawX 1000, ! :pseudo)\n println(gene.locus_tag)\nend","category":"page"},{"location":"genes/","page":"Filtering: the @genes macro","title":"Filtering: the @genes macro","text":"@genes","category":"page"},{"location":"genes/#GenomicAnnotations.@genes","page":"Filtering: the @genes macro","title":"GenomicAnnotations.@genes","text":"@genes(chr, exs...)\n\nIterate over and evaluate expressions in exs for all genes in chr.genes, returning genes where all expressions evaluate to true. Any given symbol s in the expression will be substituted for gene.s. The gene itself can be accessed in the expression as gene. 
Accessing properties of the returned list of genes returns a view, which can be altered.\n\nSome short-hand forms are available to make life easier: CDS, rRNA, and tRNA expand to feature(gene) == \"...\", get(s::Symbol, default) expands to get(gene, s, default)\n\nExamples\n\njulia> chromosome = readgbk(\"example.gbk\")\nChromosome 'example' (5028 bp) with 6 annotations\n\njulia> @genes(chromosome, CDS) |> length\n3\n\njulia> @genes(chromosome, length(gene) < 500)\n CDS 3..206\n /db_xref=\"GI:1293614\"\n /locus_tag=\"tag01\"\n /codon_start=\"3\"\n /product=\"TCP1-beta\"\n /protein_id=\"AAA98665.1\"\n\njulia> @genes(chromosome, ismissing(:gene)) |> length\n2\n\njulia> @genes(chromosome, ismissing(:gene)).gene .= \"Unknown\";\n\njulia> @genes(chromosome, ismissing(:gene)) |> length\n0\n\nAll arguments have to evaluate to true for a gene to be included, so the following expressions are equivalent:\n\n@genes(chr, CDS, length(gene) > 300)\n@genes(chr, CDS && (length(gene) > 300))\n\n@genes returns a Vector{Gene}. 
Attributes can be accessed with dot-syntax, and can be assigned to\n\n@genes(chr, :locus_tag == \"tag03\")[1].pseudo = true\n@genes(chr, CDS, ismissing(:gene)).gene .= \"unknown\"\n\nSymbols and expressions escaped with $ will be ignored.\n\nd = Dict(:category1 => [\"tag01\", \"tag02\"], :category2 => [\"tag03\"])\n@genes(chr, :locus_tag in d[$:category1])\n\ngene = chr.genes[5]\n@genes(chr, gene == $gene)\n\n\n\n\n\n","category":"macro"},{"location":"examples/#Examples","page":"Examples","title":"Examples","text":"","category":"section"},{"location":"examples/#Adding-chromosome-name-to-all-locus-tags","page":"Examples","title":"Adding chromosome name to all locus tags","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"When iterating over genes, the parent chromosome can be accessed with parent(::Gene).","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using GenomicAnnotations\nchrs = readgbk(\"genome.gbk\")\nfor gene in @genes(chrs)\n gene.locus_tag = string(parent(gene).name, \"_\", gene.locus_tag)\nend\nprintgbk(\"updated_genome.gbk\", chrs)","category":"page"},{"location":"examples/#Adding-qualifiers","page":"Examples","title":"Adding qualifiers","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"GenomicAnnotations supports arbitrary qualifiers, so you can add any kind of information. 
The following script reads and adds the output from Phobius (a predictor for transmembrane helices) to the annotations.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using GenomicAnnotations\nchrs = readgbk(\"genome.gbk\")\n\nfunction addphobius!(chr, file)\n @progress for line in readlines(file)\n m = match(r\"^(\\w+) +(\\d+) +\", line)\n if m != nothing\n locus_tag = m[1]\n tmds = parse(Int, m[2])\n @genes(chr, CDS, :locus_tag == locus_tag).phobius .= tmds\n end\n end\nend\n\naddphobius!(chrs, \"phobius.txt\")\n\nopen(GenBank.Writer, \"updated_genome.gbk\") do w\n for chr in chrs\n write(w, chr)\n end\nend","category":"page"},{"location":"examples/#Converting-between-formats","page":"Examples","title":"Converting between formats","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"Note that GenBank and GFF3 headers do not contain the same information, thus all information in the header is lost when saving annotations as another format.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using GenomicAnnotations\nchrs = readgbk(\"genome.gbk\")\nopen(GFF.Writer, \"genome.gff\") do w\n for chr in chrs\n write(w, chr)\n end\nend","category":"page"},{"location":"loci/#Loci","page":"Representing genomic loci","title":"Representing genomci loci","text":"","category":"section"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"The easiest way to create a locus is to use the constructor Locus(s), which takes an AbstractString s and parses it as a GenBank locus string as defined here: https://www.insdc.org/submitting-standards/feature-table/#3.4. 
Note that remote entry descriptors have not been implemented.","category":"page"},{"location":"loci/#Internal-representation","page":"Representing genomic loci","title":"Internal representation","text":"","category":"section"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"Since v0.4.0, genomic loci are represented using instances of AbstractLocus. Simple descriptors are represented with PointLocus{T} and SpanLocus{T}, where T is an AbstractDescriptor:","category":"page"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"GenBank string GenomicAnnotations representation Description\n1 PointLocus{SingleNucleotide}(1) Refers to a single nucleotide.\n1^2 PointLocus{BetweenNucleotides}(1) Refers to the internucleotide space immediately after position 1.\n10..20 SpanLocus{ClosedSpan}(10:20) Denotes a closed sequence span.\n10..>20 SpanLocus{OpenRightSpan}(10:20) Denotes a sequence span where the right side is open, i.e. the end-point is undefined but earliest at position 20.\n<10..20 SpanLocus{OpenLeftSpan}(10:20) The left end-point is undefined.\n<10..>20 SpanLocus{OpenSpan}(10:20) Both end-points are undefined.","category":"page"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"These can be wrapped in Complement for loci on the complement strand, e.g. Complement(SpanLocus{ClosedSpan}(10:20)) representing \"complement(10..20)\". Simplified constructors are provided for all AbstractDescriptors, e.g. ClosedSpan(1:10) == SpanLocus(1:10, ClosedSpan).","category":"page"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"Compound loci are represented with Join and Order. Both types have a single field, loc which contains any number of simple descriptors. 
They can be wrapped with complement, as can the individual elements in loc.","category":"page"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"Locus(\"complement(join(10..20,30..>40))\") isa Complement{Join{SpanLocus{ClosedSpan}, SpanLocus{OpenRightSpan}}}","category":"page"},{"location":"loci/#Iteration","page":"Representing genomic loci","title":"Iteration","text":"","category":"section"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"AbstractLocus instances are themselves iterable, yielding each compound locus in sequence. If the locus is wrapped in Complement, the compound loci are returned in reverse order, and individually wrapped in Complement.","category":"page"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"julia> for loc in Locus(\"complement(join(1..3,7..9))\")\n println(loc)\n end\ncomplement(7..9)\ncomplement(1..3)","category":"page"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"The eachposition(locus) function is provided for iterating over the individual genomic positions in the locus. Note that this ignores any metadata such as strandedness.","category":"page"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"julia> for p in eachposition(Locus(\"complement(join(1..3,7..9))\"))\n print(p)\n end\n987321","category":"page"},{"location":"accessing/#Accessing-and-modifying-annotations","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"","category":"section"},{"location":"accessing/#Feature","page":"Accessing and modifying annotations","title":"Feature","text":"","category":"section"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"Features (genes) can be added using addgene!. 
A feature must have a feature name and a locus (position), and can have any number of additional qualifiers associated with it (see next section).","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"addgene!","category":"page"},{"location":"accessing/#GenomicAnnotations.addgene!","page":"Accessing and modifying annotations","title":"GenomicAnnotations.addgene!","text":"addgene!(chr::Record, feature, locus; kw...)\n\nAdd gene to chr. locus can be an AbstractLocus, a String, a UnitRange, or a StepRange (for decreasing ranges, which will be annotated on the complementary strand).\n\nExample\n\naddgene!(chr, \"CDS\", 1:756;\n locus_tag = \"gene0001\",\n product = \"Chromosomal replication initiator protein dnaA\")\n\n\n\n\n\n","category":"function"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"After adding a new feature, sort! 
can be used to make sure that the annotations are stored (and printed) in the order in which they occur on the chromosome:","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"sort!(chr)","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"Existing features can be removed using delete!:","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"delete!(::Gene)\ndelete!(::AbstractVector{Gene})","category":"page"},{"location":"accessing/#Base.delete!-Tuple{Gene}","page":"Accessing and modifying annotations","title":"Base.delete!","text":"delete!{T}(h::MutableBinaryHeap{T}, i::Int)\n\nDeletes the element with handle i from heap h .\n\n\n\n\n\ndelete!(collection, key)\n\nDelete the mapping for the given key in a collection, and return the collection.\n\nExamples\n\njulia> d = RobinDict(\"a\"=>1, \"b\"=>2)\nRobinDict{String,Int64} with 2 entries:\n \"b\" => 2\n \"a\" => 1\n\njulia> delete!(d, \"b\")\nRobinDict{String,Int64} with 1 entry:\n \"a\" => 1\n\n\n\n\n\ndelete!(tree::RBTree, key)\n\nDeletes key from tree, if present, else returns the unmodified tree.\n\n\n\n\n\ndelete!(gene::AbstractGene)\n\nDelete gene from parent(gene). Warning: does not work when broadcasted! 
Use delete!(::AbstractVector{Gene}) instead.\n\n\n\n\n\n","category":"method"},{"location":"accessing/#Base.delete!-Tuple{AbstractVector{Gene}}","page":"Accessing and modifying annotations","title":"Base.delete!","text":"delete!(genes::AbstractArray{Gene, 1})\n\nDelete all genes in genes from parent(genes[1]).\n\nExample\n\ndelete!(@genes(chr, length(gene) <= 60))\n\n\n\n\n\n","category":"method"},{"location":"accessing/#Qualifiers","page":"Accessing and modifying annotations","title":"Qualifiers","text":"","category":"section"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"Features can have multiple qualifiers, which can be modified using Julia's property syntax:","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"# Remove newspace from gene product descriptions\nfor gene in @genes(chr, CDS)\n replace!(gene.product, '\\n' => ' ')\nend","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"Properties also work on views of genes, typically generated using @genes:","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"interestinggenes = readlines(\"/path/to/list/of/interesting/genes.txt\")\n@genes(chr, CDS, :locus_tag in interestinggenes).interesting .= true","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"Sometimes features have multiple instances of the same qualifier, such genes having several EC-numbers. 
Assigning qualifiers with property syntax overwrites any data that was previously stored for that feature, and trying to assign a vector of values to a qualifier that is currently storing scalars will result in an error, so to safely assign qualifiers that might have more instances one can use pushproperty!:","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"pushproperty!","category":"page"},{"location":"accessing/#GenomicAnnotations.pushproperty!","page":"Accessing and modifying annotations","title":"GenomicAnnotations.pushproperty!","text":"pushproperty!(gene::AbstractGene, qualifier::Symbol, value::T)\n\nAdd a property to gene, similarly to Base.setproperty!(::gene), but if the property is not missing in gene, it will be transformed to store a vector instead of overwriting existing data.\n\njulia> eltype(chr.genedata[!, :EC_number])\nUnion{Missing,String}\n\njulia> chr.genes[1].EC_number = \"EC:1.2.3.4\"\n\"EC:1.2.3.4\"\n\njulia> pushproperty!(chr.genes[1], :EC_number, \"EC:4.3.2.1\"); chr.genes[1].EC_number\n2-element Array{String,1}:\n \"EC:1.2.3.4\"\n \"EC:4.3.2.1\"\n\njulia> eltype(chr.genedata[!, :EC_number])\nUnion{Missing, Array{String,1}}\n\n\n\n\n\n","category":"function"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"Accessing properties that haven't been stored will return missing. For this reason, it often makes more sense to use get() than to access the property directly.","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"# chr.genes[2].pseudo returns missing, so this will throw an error\nif chr.genes[2].pseudo\n println(\"Gene 2 is a pseudogene\")\nend\n\n# ... 
but this works:\nif get(chr.genes[2], :pseudo, false)\n println(\"Gene 2 is a pseudogene\")\nend","category":"page"},{"location":"accessing/#Sequences","page":"Accessing and modifying annotations","title":"Sequences","text":"","category":"section"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"The sequence of a Chromosome chr is stored in chr.sequence. Sequences of individual features can be read with sequence:","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"sequence(::Gene)","category":"page"},{"location":"accessing/#GenomicAnnotations.sequence-Tuple{Gene}","page":"Accessing and modifying annotations","title":"GenomicAnnotations.sequence","text":"sequence(gene::AbstractGene; translate = false, preserve_alternate_start = false)\n\nReturn genomic sequence for gene. If translate is true, the sequence will be translated to a LongAA, excluding the stop, otherwise it will be returned as a LongDNA{4} (including the stop codon). If preserve_alternate_start is set to false, alternate start codons will be assumed to code for methionine. 
```\n\n\n\n\n\n","category":"method"},{"location":"#GenomicAnnotations.jl","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"","category":"section"},{"location":"#Description","page":"GenomicAnnotations.jl","title":"Description","text":"","category":"section"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"GenomicAnnotations is a package for reading, modifying, and writing genomic annotations in the GenBank and GFF3 file formats.","category":"page"},{"location":"#Installation","page":"GenomicAnnotations.jl","title":"Installation","text":"","category":"section"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"julia>]\npkg> add GenomicAnnotations","category":"page"},{"location":"#Examples","page":"GenomicAnnotations.jl","title":"Examples","text":"","category":"section"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"GenBank and GFF3 files are read with readgbk(input) and readgff(input), which return vectors of Records. input can be an IOStream or a file path. GZipped data can be read by setting the keyword gunzip to true, which is done automatically if a filename ending in \".gz\" is passed as input. 
If we're only interested in the first chromosome in example.gbk we only need to store the first record.","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"chr = readgbk(\"test/example.gbk\")[1]","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"Another way to read files is to use the corresponding Reader directly:","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"open(GenBank.Reader, \"test/example.gbk\") do reader\n for record in reader\n println(record.name)\n end\nend","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"Records have five fields, name, header, genes, genedata, and sequence. The name is read from the header, which is stored as a string. The annotation data is stored in genedata, but generally you should use genes to access that data. For example, it can be used to iterate over annotations, and to modify them.","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"for gene in chr.genes\n gene.locus_tag = \"$(chr.name)_$(gene.locus_tag)\"\nend\n\nchr.genes[2].locus_tag = \"test123\"","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"The locus of a Gene is represented by an AbstractLocus (see Loci), which can be retrieved with locus(gene). The locus of a gene can be updated with locus!(gene, newlocus). The easiest way to create a locus is to use the constructor Locus(s), which takes an AbstractString s and parses it as a GenBank locus string as defined here: https://www.insdc.org/submitting-standards/feature-table/#3.4. 
Note that remote entry descriptors have not been implemented.","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"# Creating a new locus\nnewlocus = Locus(\"complement(join(1..100,200..>300))\")\n\n# Assigning a new locus to a gene\nlocus!(gene, newlocus)\n# which is equivalent to\nlocus!(gene, \"complement(join(1..100,200..>300))\")","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"For simplicity, position(gene) is shorthand for locus(gene).position. locus(gene).position gives an iterable object that generates each individual position in the defined order. Thus:","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"loc = Locus(\"join(4..6,1..3)\")\ncollect(loc.position) # Returns [4,5,6,1,2,3]","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"The macro @genes can be used to filter through the annotations (see @genes). The keyword gene is used to refer to the individual Genes. @genes can also be used to modify annotations.","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"@genes(chr, length(gene) > 300) # Returns all features longer than 300 nt\n\n@genes(chr, CDS, ismissing(:product)).product .= \"hypothetical product\"","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"Gene sequences can be accessed with sequence(gene), which returns the nucleotide sequence. If the translate keyword is set to true, the translated amino acid sequence is returned instead. By default the first codon is translated to methionine also for alternate start codons, but this behaviour can be toggled by setting preserve_alternate_start to true. No checks are made to ensure that the gene points to a valid open reading frame, so this should be done by the user. 
The following example will write the translated sequences of all protein-coding genes in chr to a file:","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"using BioSequences\nusing FASTX\nopen(FASTA.Writer, \"proteins.fasta\") do w\n for gene in @genes(chr, CDS, iscomplete(gene))\n aaseq = GenomicAnnotations.sequence(gene; translate = true)\n write(w, FASTA.Record(gene.locus_tag, get(gene, :product, \"\"), aaseq))\n end\nend","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"Genes can be added using addgene!, and sort! can be used to make sure that the resulting annotations are in the correct order for printing. delete! is used to remove genes.","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"newgene = addgene!(chr, \"regulatory\", 670:677)\nnewgene.locus_tag = \"reg02\"\nsort!(chr.genes)\n\n# Genes can be deleted. This works for all genes where `:pseudo` is `true`, and ignores genes where it is `false` or `missing`\ndelete!(@genes(chr, :pseudo))\n# Delete all genes 60 nt or shorter\ndelete!(@genes(chr, length(gene) <= 60))","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"Individual genes, and Vector{Gene}s are printed in GBK format. To include the GBK header and the nucleotide sequence, write(::GenBank.Writer, chr) can be used to write them to a file. 
Use GFF.Writer instead to print the annotations as GFF3, in which case the GenBank header is lost.","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"println(chr.genes[1])\nprintln(@genes(chr, CDS))\n\nopen(GenBank.Writer, \"updated.gbk\") do w\n write(w, chr)\nend","category":"page"},{"location":"io/#I/O","page":"I/O","title":"I/O","text":"","category":"section"},{"location":"io/#Input","page":"I/O","title":"Input","text":"","category":"section"},{"location":"io/","page":"I/O","title":"I/O","text":"Annotation files are read with GenBank.Reader and GFF.Reader. Currently these assume that the file follows either standard GenBank format, or GFF3. Any metadata in GFF3 files, apart from the header, is ignored.","category":"page"},{"location":"io/","page":"I/O","title":"I/O","text":"open(GenBank.Reader, \"example.gbk\") do reader\n for record in reader\n do_something()\n end\nend","category":"page"},{"location":"io/","page":"I/O","title":"I/O","text":"readgbk(input) and readgff(input) are aliases for collect(open(GenBank.Reader, input)) and collect(open(GFF.Reader, input)), respectively.","category":"page"},{"location":"io/","page":"I/O","title":"I/O","text":"GenBank.Reader\nGFF.Reader","category":"page"},{"location":"io/#GenomicAnnotations.GenBank.Reader","page":"I/O","title":"GenomicAnnotations.GenBank.Reader","text":"GenBank.Reader(input::IO)\n\nCreate a data reader of the GenBank file format.\n\nopen(GenBank.Reader, \"test/example.gbk\") do records\n for record in records\n print(record)\n end\nend\n\n\n\n\n\n","category":"type"},{"location":"io/#GenomicAnnotations.GFF.Reader","page":"I/O","title":"GenomicAnnotations.GFF.Reader","text":"GFF.Reader(input::IO)\n\nCreate a data reader of the GFF3 file format.\n\n\n\n\n\n","category":"type"},{"location":"io/#Output","page":"I/O","title":"Output","text":"","category":"section"},{"location":"io/","page":"I/O","title":"I/O","text":"Annotations can be printed with 
GenBank formatting using GenBank.Writer, and as GFF3 with GFF.Writer. Headers are not automatically converted between formats; GFF.Writer only prints the header of the first Record, and only if it starts with a #, while GenBank.Writer prints a default header if the stored one starts with #.","category":"page"},{"location":"io/","page":"I/O","title":"I/O","text":"GenBank.Writer\nGFF.Writer","category":"page"},{"location":"io/#GenomicAnnotations.GenBank.Writer","page":"I/O","title":"GenomicAnnotations.GenBank.Writer","text":"GenBank.Writer(output::IO; width=70)\n\nCreate a data writer of the GenBank file format.\n\nopen(GenBank.Writer, outfile) do writer\n write(writer, genome)\nend\n\n\n\n\n\n","category":"type"},{"location":"io/#GenomicAnnotations.GFF.Writer","page":"I/O","title":"GenomicAnnotations.GFF.Writer","text":"GFF.Writer(output::IO; width=70)\n\nCreate a data writer of the GFF file format.\n\nopen(GFF.Writer, outfile) do writer\n write(writer, genome)\nend\n\n\n\n\n\n","category":"type"},{"location":"io/","page":"I/O","title":"I/O","text":"In the REPL, instances of Gene are displayed as they would be in the annotation file.","category":"page"}] +[{"location":"genes/#Filtering:-the-@genes-macro","page":"Filtering: the @genes macro","title":"Filtering: the @genes macro","text":"","category":"section"},{"location":"genes/","page":"Filtering: the @genes macro","title":"Filtering: the @genes macro","text":"A useful tool provided by GenomicAnnotations is the macro @genes. It is used to filter through annotations, for example to look only at coding sequences or rRNAs, which can then be modified or iterated over:","category":"page"},{"location":"genes/","page":"Filtering: the @genes macro","title":"Filtering: the @genes macro","text":"# Print locus tags of all coding sequences longer than 1000 nt, that are not pseudo genes\nfor gene in @genes(chr, CDS, length(gene) > 1000, ! 
:pseudo)\n println(gene.locus_tag)\nend","category":"page"},{"location":"genes/","page":"Filtering: the @genes macro","title":"Filtering: the @genes macro","text":"@genes","category":"page"},{"location":"genes/#GenomicAnnotations.@genes","page":"Filtering: the @genes macro","title":"GenomicAnnotations.@genes","text":"@genes(chr, exs...)\n\nIterate over and evaluate expressions in exs for all genes in chr.genes, returning genes where all expressions evaluate to true. Any given symbol s in the expression will be substituted for gene.s. The gene itself can be accessed in the expression as gene. Accessing properties of the returned list of genes returns a view, which can be altered.\n\nSome short-hand forms are available to make life easier: CDS, rRNA, and tRNA expand to feature(gene) == \"...\", get(s::Symbol, default) expands to get(gene, s, default)\n\nExamples\n\njulia> chromosome = readgbk(\"example.gbk\")\nChromosome 'example' (5028 bp) with 6 annotations\n\njulia> @genes(chromosome, CDS) |> length\n3\n\njulia> @genes(chromosome, length(gene) < 500)\n CDS 3..206\n /db_xref=\"GI:1293614\"\n /locus_tag=\"tag01\"\n /codon_start=\"3\"\n /product=\"TCP1-beta\"\n /protein_id=\"AAA98665.1\"\n\njulia> @genes(chromosome, ismissing(:gene)) |> length\n2\n\njulia> @genes(chromosome, ismissing(:gene)).gene .= \"Unknown\";\n\njulia> @genes(chromosome, ismissing(:gene)) |> length\n0\n\nAll arguments have to evaluate to true for a gene to be included, so the following expressions are equivalent:\n\n@genes(chr, CDS, length(gene) > 300)\n@genes(chr, CDS && (length(gene) > 300))\n\n@genes returns a Vector{Gene}. 
Attributes can be accessed with dot-syntax, and can be assigned to\n\n@genes(chr, :locus_tag == \"tag03\")[1].pseudo = true\n@genes(chr, CDS, ismissing(:gene)).gene .= \"unknown\"\n\nSymbols and expressions escaped with $ will be ignored.\n\nd = Dict(:category1 => [\"tag01\", \"tag02\"], :category2 => [\"tag03\"])\n@genes(chr, :locus_tag in d[$:category1])\n\ngene = chr.genes[5]\n@genes(chr, gene == $gene)\n\n\n\n\n\n","category":"macro"},{"location":"examples/#Examples","page":"Examples","title":"Examples","text":"","category":"section"},{"location":"examples/#Adding-chromosome-name-to-all-locus-tags","page":"Examples","title":"Adding chromosome name to all locus tags","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"When iterating over genes, the parent chromosome can be accessed with parent(::Gene).","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using GenomicAnnotations\nchrs = readgbk(\"genome.gbk\")\nfor gene in @genes(chrs)\n gene.locus_tag = string(parent(gene).name, \"_\", gene.locus_tag)\nend\nprintgbk(\"updated_genome.gbk\", chrs)","category":"page"},{"location":"examples/#Adding-qualifiers","page":"Examples","title":"Adding qualifiers","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"GenomicAnnotations supports arbitrary qualifiers, so you can add any kind of information. 
The following script reads and adds the output from Phobius (a predictor for transmembrane helices) to the annotations.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using GenomicAnnotations\nchrs = readgbk(\"genome.gbk\")\n\nfunction addphobius!(chr, file)\n @progress for line in readlines(file)\n m = match(r\"^(\\w+) +(\\d+) +\", line)\n if m != nothing\n locus_tag = m[1]\n tmds = parse(Int, m[2])\n @genes(chr, CDS, :locus_tag == locus_tag).phobius .= tmds\n end\n end\nend\n\naddphobius!(chrs, \"phobius.txt\")\n\nopen(GenBank.Writer, \"updated_genome.gbk\") do w\n for chr in chrs\n write(w, chr)\n end\nend","category":"page"},{"location":"examples/#Converting-between-formats","page":"Examples","title":"Converting between formats","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"Annotations can be read from one file format and written as another. Converting between the supported human-readable formats (GenBank and EMBL) or the tab-delimited formats (GFF3 and GTF) will likely work out of the box, but converting from a human-readable format to a tab-delimited format, or vice versa, may need some human intervention. Currently, GenomicAnnotations does not make any attempt to rename columns or perform any sanity checks to ensure that the resulting file meets specifications. Refer to the respective format specifications for details on what attributes need to be included, etc. Notably, GenBank and GFF3 headers do not contain the same information, and GTF files lack a header altogether, thus all information in the header is lost when saving annotations as another format. GTF files also do not allow the inclusion of sequence data, unlike GFF3. Below is a simple example script that demonstrates some of the changes that need to be made. 
If your use-case includes more complex features, such as multi-exon genes, you will likely need to make more changes as part of the conversion.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using GenomicAnnotations\nusing DataFrames\nchrs = readgbk(\"genome.gbk\")\nopen(GFF.Writer, \"genome.gff\") do w\n for chr in chrs\n # GenBank files often contain qualifiers that are not usually included\n # in GFF3 files, so let's remove some:\n cols_to_remove = intersect([\"translation\", \"mol_type\", \"organism\"], names(chr.genedata))\n if !isempty(cols_to_remove)\n chr.genedata = chr.genedata[:, Not(cols_to_remove)]\n end\n # The GenBank format uses the :source feature to store metadata about\n # the record, but in GFF3, :source is the name of the column which\n # contains the sequence name. Thus, we need to change the GenBank\n # :source to the GFF3 equivalent :region. According to the GenBank\n # specification, :source is mandatory, but it's best to be safe and\n # check that it's really there:\n source_entries = @genes(chr, source)\n if !isempty(source_entries)\n for source in source_entries\n region = feature!(source, :region)\n region.Name = chr.name\n region.ID = chr.name\n # GenBank files include information about circularity of a\n # contig in its header, but in the GFF3 format this\n # information is encoded in the \"Is_circular\" attribute of\n # the first :region feature:\n if occursin(\"circular\", chr.header)\n region.Is_circular = true\n end\n end\n else\n # If the :source feature is missing, we'll have to create a :region\n # from scratch:\n addgene!(chr, \"region\", 1:length(chr.sequence);\n Name = chr.name,\n ID = chr.name,\n Is_circular = occursin(\"circular\", chr.header))\n sort!(chr.genes)\n end\n # Most features, such as :CDS or :tRNA, have a corresponding :gene\n # that they belong to. In GFF3, this hierarchical relationship is shown\n # using the \"ID\" and \"Parent\" attributes. 
Here, we set the \"ID\"\n # attribute of all :gene features to match their \"locus_tag\", and then\n # set the \"Parent\" attributes of all non-:gene features to match the\n # \"ID\" of their respective :gene, if there is one:\n gene_features = @genes(chr, gene)\n gene_features.ID .= gene_features.locus_tag\n for gene in @genes(chr, !gene)\n if get(gene, :locus_tag, \"missing\") in skipmissing(gene_features.locus_tag)\n gene.Parent = gene.locus_tag\n end\n end\n write(w, chr)\n end\nend","category":"page"},{"location":"loci/#Loci","page":"Representing genomic loci","title":"Representing genomic loci","text":"","category":"section"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"The easiest way to create a locus is to use the constructor Locus(s), which takes an AbstractString s and parses it as a GenBank locus string as defined here: https://www.insdc.org/submitting-standards/feature-table/#3.4. Note that remote entry descriptors have not been implemented.","category":"page"},{"location":"loci/#Internal-representation","page":"Representing genomic loci","title":"Internal representation","text":"","category":"section"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"Since v0.4.0, genomic loci are represented using instances of AbstractLocus. Simple descriptors are represented with PointLocus{T} and SpanLocus{T}, where T is an AbstractDescriptor:","category":"page"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"GenBank string GenomicAnnotations representation Description\n1 PointLocus{SingleNucleotide}(1) Refers to a single nucleotide.\n1^2 PointLocus{BetweenNucleotides}(1) Refers to the internucleotide space immediately after position 1.\n10..20 SpanLocus{ClosedSpan}(10:20) Denotes a closed sequence span.\n10..>20 SpanLocus{OpenRightSpan}(10:20) Denotes a sequence span where the right side is open, i.e. 
the end-point is undefined but earliest at position 20.\n<10..20 SpanLocus{OpenLeftSpan}(10:20) The left end-point is undefined.\n<10..>20 SpanLocus{OpenSpan}(10:20) Both end-points are undefined.","category":"page"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"These can be wrapped in Complement for loci on the complement strand, e.g. Complement(SpanLocus{ClosedSpan}(10:20)) representing \"complement(10..20)\". Simplified constructors are provided for all AbstractDescriptors, e.g. ClosedSpan(1:10) == SpanLocus(1:10, ClosedSpan).","category":"page"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"Compound loci are represented with Join and Order. Both types have a single field, loc which contains any number of simple descriptors. They can be wrapped with complement, as can the individual elements in loc.","category":"page"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"Locus(\"complement(join(10..20,30..>40))\") isa Complement{Join{SpanLocus{ClosedSpan}, SpanLocus{OpenRightSpan}}}","category":"page"},{"location":"loci/#Iteration","page":"Representing genomic loci","title":"Iteration","text":"","category":"section"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"AbstractLocus instances are themselves iterable, yielding each compound locus in sequence. 
If the locus is wrapped in Complement, the compound loci are returned in reverse order, and individually wrapped in Complement.","category":"page"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"julia> for loc in Locus(\"complement(join(1..3,7..9))\")\n println(loc)\n end\ncomplement(7..9)\ncomplement(1..3)","category":"page"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"The eachposition(locus) function is provided for iterating over the individual genomic positions in the locus. Note that this ignores any metadata such as strandedness.","category":"page"},{"location":"loci/","page":"Representing genomic loci","title":"Representing genomic loci","text":"julia> for p in eachposition(Locus(\"complement(join(1..3,7..9))\"))\n print(p)\n end\n987321","category":"page"},{"location":"accessing/#Accessing-and-modifying-annotations","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"","category":"section"},{"location":"accessing/#Features","page":"Accessing and modifying annotations","title":"Features","text":"","category":"section"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"The following functions can be used to read and modify the data associated with a gene:","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"locus\nlocus!\nfeature\nfeature!\nparent\ngenedata\nattributes","category":"page"},{"location":"accessing/#GenomicAnnotations.locus","page":"Accessing and modifying annotations","title":"GenomicAnnotations.locus","text":"locus(g::Gene)\n\nReturn the AbstractLocus of g.\n\n\n\n\n\n","category":"function"},{"location":"accessing/#GenomicAnnotations.locus!","page":"Accessing and modifying 
annotations","title":"GenomicAnnotations.locus!","text":"locus!(gene::AbstractGene, loc)\nlocus!(gene::AbstractGene, loc::AbstractLocus)\n\nReplace gene with a new Gene with loc as its Locus. If loc is not an AbstractLocus, it is parsed with Locus(loc).\n\n\n\n\n\n","category":"function"},{"location":"accessing/#GenomicAnnotations.feature","page":"Accessing and modifying annotations","title":"GenomicAnnotations.feature","text":"feature(g::Gene)\n\nReturn the feature type (i.e. gene, CDS, tRNA, etc.) of g.\n\n\n\n\n\n","category":"function"},{"location":"accessing/#GenomicAnnotations.feature!","page":"Accessing and modifying annotations","title":"GenomicAnnotations.feature!","text":"feature!(g::Gene, f::Symbol)\n\nChange the feature of g to f, returning a new instance of Gene. Since Genes are immutable, feature! only mutates the parent of g and not g itself. Thus, in the first example below the original unmodified g is printed, not the updated version:\n\n# This will not work as expected:\nfor source in @genes(chr, source)\n feature!(source, :region)\n println(source)\nend\n\n# But this will:\nfor source in @genes(chr, source)\n source = feature!(source, :region)\n println(source)\nend\n\n\n\n\n\n","category":"function"},{"location":"accessing/#Base.parent","page":"Accessing and modifying annotations","title":"Base.parent","text":"parent(A)\n\nReturn the underlying parent object of the view. This parent of objects of types SubArray, SubString, ReshapedArray or LinearAlgebra.Transpose is what was passed as an argument to view, reshape, transpose, etc. during object creation. If the input is not a wrapped object, return the input itself. 
If the input is wrapped multiple times, only the outermost wrapper will be removed.\n\nExamples\n\njulia> A = [1 2; 3 4]\n2×2 Matrix{Int64}:\n 1 2\n 3 4\n\njulia> V = view(A, 1:2, :)\n2×2 view(::Matrix{Int64}, 1:2, :) with eltype Int64:\n 1 2\n 3 4\n\njulia> parent(V)\n2×2 Matrix{Int64}:\n 1 2\n 3 4\n\n\n\n\n\n","category":"function"},{"location":"accessing/#GenomicAnnotations.genedata","page":"Accessing and modifying annotations","title":"GenomicAnnotations.genedata","text":"genedata(g::Gene)\n\nReturn the DataFrameRow where the data for g is stored. See attributes for a slower but potentially more convenient alternative.\n\n\n\n\n\n","category":"function"},{"location":"accessing/#GenomicAnnotations.attributes","page":"Accessing and modifying annotations","title":"GenomicAnnotations.attributes","text":"attributes(g::Gene)\n\nReturn an immutable NamedTuple containing copies of all annotated attributes of g. Missing attributes are excluded. See genedata for a non-allocating way to access the gene data directly.\n\n\n\n\n\n","category":"function"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"Features (genes) can be added using addgene!. A feature must have a feature name and a locus (position), and can have any number of additional qualifiers associated with it (see next section).","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"addgene!","category":"page"},{"location":"accessing/#GenomicAnnotations.addgene!","page":"Accessing and modifying annotations","title":"GenomicAnnotations.addgene!","text":"addgene!(chr::Record, feature, locus; kw...)\n\nAdd gene to chr. 
locus can be an AbstractLocus, a String, a UnitRange, or a StepRange (for decreasing ranges, which will be annotated on the complementary strand).\n\nExample\n\naddgene!(chr, \"CDS\", 1:756;\n locus_tag = \"gene0001\",\n product = \"Chromosomal replication initiator protein dnaA\")\n\n\n\n\n\n","category":"function"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"After adding a new feature, sort! can be used to make sure that the annotations are stored (and printed) in the order in which they occur on the chromosome:","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"sort!(chr)","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"Existing features can be removed using delete!:","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"delete!(::Gene)\ndelete!(::AbstractVector{Gene})","category":"page"},{"location":"accessing/#Base.delete!-Tuple{Gene}","page":"Accessing and modifying annotations","title":"Base.delete!","text":"delete!{T}(h::MutableBinaryHeap{T}, i::Int)\n\nDeletes the element with handle i from heap h .\n\n\n\n\n\ndelete!(collection, key)\n\nDelete the mapping for the given key in a collection, and return the collection.\n\nExamples\n\njulia> d = RobinDict(\"a\"=>1, \"b\"=>2)\nRobinDict{String,Int64} with 2 entries:\n \"b\" => 2\n \"a\" => 1\n\njulia> delete!(d, \"b\")\nRobinDict{String,Int64} with 1 entry:\n \"a\" => 1\n\n\n\n\n\ndelete!(tree::RBTree, key)\n\nDeletes key from tree, if present, else returns the unmodified tree.\n\n\n\n\n\ndelete!(gene::AbstractGene)\n\nDelete gene from parent(gene). Warning: does not work when broadcasted! 
Use delete!(::AbstractVector{Gene}) instead.\n\n\n\n\n\n","category":"method"},{"location":"accessing/#Base.delete!-Tuple{AbstractVector{Gene}}","page":"Accessing and modifying annotations","title":"Base.delete!","text":"delete!(genes::AbstractArray{Gene, 1})\n\nDelete all genes in genes from parent(genes[1]).\n\nExample\n\ndelete!(@genes(chr, length(gene) <= 60))\n\n\n\n\n\n","category":"method"},{"location":"accessing/#Qualifiers","page":"Accessing and modifying annotations","title":"Qualifiers","text":"","category":"section"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"Features can have multiple attributes/qualifiers, which can be modified using Julia's property syntax:","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"# Remove newlines from gene product descriptions\nfor gene in @genes(chr, CDS)\n gene.product = replace(gene.product, '\\n' => ' ')\nend","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"Properties also work on views of genes, typically generated using @genes:","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"interestinggenes = readlines(\"/path/to/list/of/interesting/genes.txt\")\n@genes(chr, CDS, :locus_tag in interestinggenes).interesting .= true","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"Sometimes features have multiple instances of the same qualifier, such as genes having several EC-numbers. 
Assigning qualifiers with property syntax overwrites any data that was previously stored for that feature, and trying to assign a vector of values to a qualifier that is currently storing scalars will result in an error, so to safely assign qualifiers that might have more instances one can use pushproperty!:","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"pushproperty!","category":"page"},{"location":"accessing/#GenomicAnnotations.pushproperty!","page":"Accessing and modifying annotations","title":"GenomicAnnotations.pushproperty!","text":"pushproperty!(gene::AbstractGene, qualifier::Symbol, value::T)\n\nAdd a property to gene, similarly to Base.setproperty!(::gene), but if the property is not missing in gene, it will be transformed to store a vector instead of overwriting existing data.\n\njulia> eltype(chr.genedata[!, :EC_number])\nUnion{Missing,String}\n\njulia> chr.genes[1].EC_number = \"EC:1.2.3.4\"\n\"EC:1.2.3.4\"\n\njulia> pushproperty!(chr.genes[1], :EC_number, \"EC:4.3.2.1\"); chr.genes[1].EC_number\n2-element Array{String,1}:\n \"EC:1.2.3.4\"\n \"EC:4.3.2.1\"\n\njulia> eltype(chr.genedata[!, :EC_number])\nUnion{Missing, Array{String,1}}\n\n\n\n\n\n","category":"function"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"Accessing properties that haven't been stored will return missing. For this reason, it often makes more sense to use get() than to access the property directly.","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"# chr.genes[2].pseudo returns missing, so this will throw an error\nif chr.genes[2].pseudo\n println(\"Gene 2 is a pseudogene\")\nend\n\n# ... 
but this works:\nif get(chr.genes[2], :pseudo, false)\n println(\"Gene 2 is a pseudogene\")\nend","category":"page"},{"location":"accessing/#Sequences","page":"Accessing and modifying annotations","title":"Sequences","text":"","category":"section"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"The sequence of a Chromosome chr is stored in chr.sequence. Sequences of individual features can be read with sequence:","category":"page"},{"location":"accessing/","page":"Accessing and modifying annotations","title":"Accessing and modifying annotations","text":"sequence(::Gene)","category":"page"},{"location":"accessing/#GenomicAnnotations.sequence-Tuple{Gene}","page":"Accessing and modifying annotations","title":"GenomicAnnotations.sequence","text":"sequence(gene::AbstractGene; translate = false, preserve_alternate_start = false)\n\nReturn genomic sequence for gene. If translate is true, the sequence will be translated to a LongAA, excluding the stop, otherwise it will be returned as a LongDNA{4} (including the stop codon). If preserve_alternate_start is set to false, alternate start codons will be assumed to code for methionine. 
```\n\n\n\n\n\n","category":"method"},{"location":"#GenomicAnnotations.jl","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"","category":"section"},{"location":"#Description","page":"GenomicAnnotations.jl","title":"Description","text":"","category":"section"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"GenomicAnnotations is a package for reading, modifying, and writing genomic annotations in the GenBank and GFF3 file formats.","category":"page"},{"location":"#Installation","page":"GenomicAnnotations.jl","title":"Installation","text":"","category":"section"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"julia>]\npkg> add GenomicAnnotations","category":"page"},{"location":"#Examples","page":"GenomicAnnotations.jl","title":"Examples","text":"","category":"section"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"GenBank and GFF3 files are read with readgbk(input) and readgff(input), which return vectors of Records. input can be an IOStream or a file path. GZipped data can be read by setting the keyword gunzip to true, which is done automatically if a filename ending in \".gz\" is passed as input. 
If we're only interested in the first chromosome in example.gbk we only need to store the first record.","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"chr = readgbk(\"test/example.gbk\")[1]","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"Another way to read files is to use the corresponding Reader directly:","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"open(GenBank.Reader, \"test/example.gbk\") do reader\n for record in reader\n println(record.name)\n end\nend","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"Records have five fields, name, header, genes, genedata, and sequence. The name is read from the header, which is stored as a string. The annotation data is stored in genedata, but generally you should use genes to access that data. For example, it can be used to iterate over annotations, and to modify them.","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"for gene in chr.genes\n gene.locus_tag = \"$(chr.name)_$(gene.locus_tag)\"\nend\n\nchr.genes[2].locus_tag = \"test123\"","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"The locus of a Gene is represented by an AbstractLocus (see Loci), which can be retrieved with locus(gene). The locus of a gene can be updated with locus!(gene, newlocus). The easiest way to create a locus is to use the constructor Locus(s), which takes an AbstractString s and parses it as a GenBank locus string as defined here: https://www.insdc.org/submitting-standards/feature-table/#3.4. 
Note that remote entry descriptors have not been implemented.","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"# Creating a new locus\nnewlocus = Locus(\"complement(join(1..100,200..>300))\")\n\n# Assigning a new locus to a gene\nlocus!(gene, newlocus)\n# which is equivalent to\nlocus!(gene, \"complement(join(1..100,200..>300))\")","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"For simplicity, position(gene) is shorthand for locus(gene).position. locus(gene).position gives an iterable object that generates each individual position in the defined order. Thus:","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"loc = Locus(\"join(4..6,1..3)\")\ncollect(loc.position) # Returns [4,5,6,1,2,3]","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"The macro @genes can be used to filter through the annotations (see @genes). The keyword gene is used to refer to the individual Genes. @genes can also be used to modify annotations.","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"@genes(chr, length(gene) > 300) # Returns all features longer than 300 nt\n\n@genes(chr, CDS, ismissing(:product)) .= \"hypothetical product\"","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"Gene sequences can be accessed with sequence(gene), which returns the nucleotide sequence. If the translate keyword is set to true, the translated amino acid sequence is returned instead. By default the first codon is translated to methionine also for alternate start codons, but this behaviour can be toggled by setting preserve_alternate_start to true. No checks are made to ensure that the gene points to a valid open reading frame, so this should be done by the user. 
The following example will write the translated sequences of all protein-coding genes in chr to a file:","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"using BioSequences\nusing FASTX\nopen(FASTA.Writer, \"proteins.fasta\") do w\n for gene in @genes(chr, CDS, iscomplete(gene))\n aaseq = GenomicAnnotations.sequence(gene; translate = true)\n write(w, FASTA.Record(gene.locus_tag, get(gene, :product, \"\"), aaseq))\n end\nend","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"Genes can be added using addgene!, and sort! can be used to make sure that the resulting annotations are in the correct order for printing. delete! is used to remove genes.","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"newgene = addgene!(chr, \"regulatory\", 670:677)\nnewgene.locus_tag = \"reg02\"\nsort!(chr.genes)\n\n# Genes can be deleted. This works for all genes where `:pseudo` is `true`, and ignores genes where it is `false` or `missing`\ndelete!(@genes(chr, :pseudo))\n# Delete all genes 60 nt or shorter\ndelete!(@genes(chr, length(gene) <= 60))","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"Individual genes, and Vector{Gene}s are printed in GBK format. To include the GBK header and the nucleotide sequence, write(::GenBank.Writer, chr) can be used to write them to a file. 
Use GFF.Writer instead to print the annotations as GFF3, in which case the GenBank header is lost.","category":"page"},{"location":"","page":"GenomicAnnotations.jl","title":"GenomicAnnotations.jl","text":"println(chr.genes[1])\nprintln(@genes(chr, CDS))\n\nopen(GenBank.Writer, \"updated.gbk\") do w\n write(w, chr)\nend","category":"page"},{"location":"io/#I/O","page":"I/O","title":"I/O","text":"","category":"section"},{"location":"io/#Input","page":"I/O","title":"Input","text":"","category":"section"},{"location":"io/","page":"I/O","title":"I/O","text":"Annotation files are read with GenBank.Reader and GFF.Reader. Currently these assume that the file follows either standard GenBank format, or GFF3. Any metadata in GFF3 files, apart from the header, is ignored.","category":"page"},{"location":"io/","page":"I/O","title":"I/O","text":"open(GenBank.Reader, \"example.gbk\") do reader\n for record in reader\n do_something()\n end\nend","category":"page"},{"location":"io/","page":"I/O","title":"I/O","text":"readgbk(input) and readgff(input) are aliases for collect(open(GenBank.Reader, input)) and collect(open(GFF.Reader, input)), respectively.","category":"page"},{"location":"io/","page":"I/O","title":"I/O","text":"GenBank.Reader\nGFF.Reader","category":"page"},{"location":"io/#GenomicAnnotations.GenBank.Reader","page":"I/O","title":"GenomicAnnotations.GenBank.Reader","text":"GenBank.Reader(input::IO)\n\nCreate a data reader of the GenBank file format.\n\nopen(GenBank.Reader, \"test/example.gbk\") do records\n for record in records\n print(record)\n end\nend\n\n\n\n\n\n","category":"type"},{"location":"io/#GenomicAnnotations.GFF.Reader","page":"I/O","title":"GenomicAnnotations.GFF.Reader","text":"GFF.Reader(input::IO)\n\nCreate a data reader of the GFF3 file format.\n\n\n\n\n\n","category":"type"},{"location":"io/#Output","page":"I/O","title":"Output","text":"","category":"section"},{"location":"io/","page":"I/O","title":"I/O","text":"Annotations can be printed with 
GenBank formatting using GenBank.Writer, and as GFF3 with GFF.Writer. Headers are not automatically converted between formats; GFF.Writer only prints the header of the first Record, and only if it starts with a #, while GenBank.Writer prints a default header if the stored one starts with #.","category":"page"},{"location":"io/","page":"I/O","title":"I/O","text":"GenBank.Writer\nGFF.Writer","category":"page"},{"location":"io/#GenomicAnnotations.GenBank.Writer","page":"I/O","title":"GenomicAnnotations.GenBank.Writer","text":"GenBank.Writer(output::IO; width=70)\n\nCreate a data writer of the GenBank file format.\n\nopen(GenBank.Writer, outfile) do writer\n write(writer, genome)\nend\n\n\n\n\n\n","category":"type"},{"location":"io/#GenomicAnnotations.GFF.Writer","page":"I/O","title":"GenomicAnnotations.GFF.Writer","text":"GFF.Writer(output::IO; width=70)\n\nCreate a data writer of the GFF file format.\n\nopen(GFF.Writer, outfile) do writer\n write(writer, genome)\nend\n\n\n\n\n\n","category":"type"},{"location":"io/","page":"I/O","title":"I/O","text":"In the REPL, instances of Gene are displayed as they would be in the annotation file.","category":"page"}] }