Project Data
HBV-GLUE contains reference sequences for HBV genotypes and subgenotypes.
Sequences are in GenBank XML format and are contained in this directory.
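Because the sequence files are GenBank XML, the accession and nucleotide sequence of each record can be extracted with Python's standard library. This is a minimal sketch. It assumes the files follow NCBI's GBSeq XML layout (`GBSet`/`GBSeq` elements), which has not been confirmed for this directory. The function name `read_gbseq_xml` is hypothetical, and the record in the example is a made-up placeholder.

```python
import xml.etree.ElementTree as ET


def read_gbseq_xml(xml_text: str) -> dict[str, str]:
    """Return {primary accession: upper-case sequence} for every GBSeq record.

    Assumes NCBI GBSeq XML element names; adjust if the files use another layout.
    """
    root = ET.fromstring(xml_text)
    records: dict[str, str] = {}
    # iter() also checks the root element, so a bare <GBSeq> root works too.
    for gbseq in root.iter("GBSeq"):
        accession = gbseq.findtext("GBSeq_primary-accession")
        sequence = gbseq.findtext("GBSeq_sequence", default="")
        if accession:
            records[accession] = sequence.strip().upper()
    return records


# Placeholder record: the accession and sequence are invented for illustration.
example = """<GBSet>
  <GBSeq>
    <GBSeq_locus>EXAMPLE1</GBSeq_locus>
    <GBSeq_primary-accession>EXAMPLE1</GBSeq_primary-accession>
    <GBSeq_sequence>acgtacgtacgt</GBSeq_sequence>
  </GBSeq>
</GBSet>"""

print(read_gbseq_xml(example))  # {'EXAMPLE1': 'ACGTACGTACGT'}
```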
Reference sequences are defined in this csv file.
Reference sequences are organised hierarchically using GLUE's alignment tree data structure. The hierarchy reflects the taxonomic division of hepatitis B virus (HBV) into species, genotypes, and subgenotypes. These reference sequences serve as the backbone for alignment-based analyses, phylogenetics, and genotyping, ensuring consistency and reproducibility across studies. Below is an overview of how the reference sequences are organised:
- Name: AL_MASTER
  - Display Name: Hepatitis B Virus
  - Clade Category: Species
- Constraining Reference Sequence: NC_003977
This alignment serves as the overarching framework for HBV genomic data and includes all genotypes and subgenotypes as child alignments.
The master alignment is subdivided into child alignments representing HBV genotypes (e.g., A, B, C), each with a constraining reference sequence. For example:
- Genotype A (AL_A): Constraining reference: FJ692557
- Genotype B (AL_B): Constraining reference: GU815637
- Genotype C (AL_C): Constraining reference: GQ377617
Genotypes with subgenotypes are further subdivided into alignments representing those subgenotypes, each with its own reference sequence for finer resolution. For example:
- Genotype A Subgenotypes:
  - Subgenotype A1 (1): Reference: KP168423
  - Subgenotype A2: Reference: EU594385
  - Subgenotype A6: Reference: GQ331046
- Genotype C Subgenotypes:
  - Subgenotype C1: Reference: DQ089781
  - Subgenotype C6: Reference: EU670263
  - Subgenotype C10: Reference: KJ173333
Genotypes that are not subdivided into subgenotypes have a single constraining reference:
- Genotype E (AL_E): Single constraining reference: GQ161817
- Genotype G (AL_G): Single constraining reference: AB056513
- Genotype H (AL_H): Single constraining reference: FJ356715
- Genotype J (AL_J): Single constraining reference: AB486012
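The hierarchy above can be modelled as a simple tree of alignment nodes, where each node records its parent so that a subgenotype can be traced back to its genotype and species. The sketch below is an illustrative Python model and is not GLUE's own implementation or API. The class `AlignmentNode` and its fields are hypothetical. So are the subgenotype alignment names (`AL_A1`, `AL_C10`, ...) and the clade category labels below species level. For simplicity, one field, `reference_seq`, holds the constraining reference at species and genotype level and the listed reference at subgenotype level. Accessions and species/genotype alignment names are taken from the overview.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AlignmentNode:
    """Simplified, illustrative model of one node in a GLUE-style alignment tree."""

    name: str
    display_name: str
    clade_category: str
    reference_seq: str  # accession of this node's reference sequence
    # parent is excluded from repr/eq to avoid parent <-> child recursion.
    parent: Optional[AlignmentNode] = field(default=None, repr=False, compare=False)
    children: list[AlignmentNode] = field(default_factory=list)

    def add_child(self, child: AlignmentNode) -> AlignmentNode:
        """Attach a child alignment and record this node as its parent."""
        child.parent = self
        self.children.append(child)
        return child

    def lineage(self) -> list[str]:
        """Alignment names from this node up to the root."""
        names: list[str] = []
        node: Optional[AlignmentNode] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names

    def find(self, name: str) -> Optional[AlignmentNode]:
        """Depth-first search for a node by alignment name."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None


# Species and genotype values are taken from the overview above.
root = AlignmentNode("AL_MASTER", "Hepatitis B Virus", "Species", "NC_003977")
genotype_refs = {"A": "FJ692557", "B": "GU815637", "C": "GQ377617",
                 "E": "GQ161817", "G": "AB056513", "H": "FJ356715", "J": "AB486012"}
for letter, ref in genotype_refs.items():
    # The "Genotype" clade category label is an assumption.
    root.add_child(AlignmentNode(f"AL_{letter}", f"Genotype {letter}", "Genotype", ref))

# References come from the overview; the alignment names (AL_A1, AL_C10, ...)
# and the "Subgenotype" category label are hypothetical.
subgenotype_refs = {"A1": "KP168423", "A2": "EU594385", "A6": "GQ331046",
                    "C1": "DQ089781", "C6": "EU670263", "C10": "KJ173333"}
for label, ref in subgenotype_refs.items():
    parent = root.find(f"AL_{label[0]}")
    parent.add_child(AlignmentNode(f"AL_{label}", f"Subgenotype {label}", "Subgenotype", ref))

print(root.find("AL_A2").lineage())  # ['AL_A2', 'AL_A', 'AL_MASTER']
print(root.find("AL_C10").reference_seq)  # KJ173333
```

Storing a parent link alongside the child list makes both top-down traversal (`find`) and bottom-up lineage lookups cheap. Genotypes E, G, H and J simply remain leaf nodes.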
Reference sequences are curated from publicly available databases and assigned to alignments based on their role in defining HBV genotypes and subgenotypes. They are used for:
- Constraining Alignments: To standardize the positions and annotations of sequence data.
- Genotyping and Phylogenetics: To classify and analyze sequences accurately.
- Hierarchical Organization: To mirror the taxonomic structure of HBV and streamline downstream analysis.
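One way to read "standardizing positions" is that every sequence position is reported in the coordinates of the constraining reference. The sketch below illustrates this idea for a single pairwise alignment. `map_to_reference` is a hypothetical helper, not GLUE's implementation, and it assumes the query and reference rows have already been aligned, with `-` marking gaps. Query bases that fall in an insertion relative to the reference have no reference coordinate, so they map to `None`.

```python
from typing import Optional


def map_to_reference(aligned_query: str, aligned_ref: str) -> dict[int, Optional[int]]:
    """Map 1-based query positions to 1-based reference positions.

    Both inputs are rows of the same pairwise alignment ('-' marks a gap).
    Query bases opposite a reference gap map to None.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("alignment rows must have equal length")
    mapping: dict[int, Optional[int]] = {}
    q_pos = r_pos = 0
    for q_char, r_char in zip(aligned_query, aligned_ref):
        if r_char != "-":
            r_pos += 1  # advance the reference coordinate on every reference base
        if q_char != "-":
            q_pos += 1
            mapping[q_pos] = r_pos if r_char != "-" else None
    return mapping


# Toy alignment: query base 4 (T) is an insertion; the query lacks reference base 6.
print(map_to_reference("ACGTTA-C", "ACG-TAGC"))
# {1: 1, 2: 2, 3: 3, 4: None, 5: 4, 6: 5, 7: 7}
```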
This hierarchical organization of reference sequences gives the HBV-GLUE Core Project a consistent framework for comparative genomic analysis, including genotype and subgenotype classification, phylogenetic investigations, and alignment-based analyses.
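The hierarchy also suggests how classification can proceed level by level. First the closest genotype is chosen, then the closest subgenotype within it. The sketch below uses a deliberately simple nearest-reference rule based on p-distance. It is not HBV-GLUE's genotyping method. The `Clade` class, the function names and the 10-base toy sequences are all hypothetical, and the sequences are assumed to be pre-aligned to a common coordinate space.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Clade:
    """Hypothetical clade node holding a toy, pre-aligned reference sequence."""

    name: str
    ref_seq: str = ""
    children: list[Clade] = field(default_factory=list)


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites, skipping columns where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # no comparable sites: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def classify(query: str, root: Clade) -> list[str]:
    """Greedy top-down descent: at each level move to the child with the closest reference."""
    path: list[str] = []
    node = root
    while node.children:
        node = min(node.children, key=lambda child: p_distance(query, child.ref_seq))
        path.append(node.name)
    return path


# Toy 10-base "references"; these are not real HBV sequences.
tree = Clade("HBV", children=[
    Clade("Genotype A", "ACGTACGTAC", [
        Clade("Subgenotype A1", "ACGTACGTAA"),
        Clade("Subgenotype A2", "ACGTACGTCC"),
    ]),
    Clade("Genotype C", "TCGAACGGAC", [
        Clade("Subgenotype C1", "TCGAACGGAA"),
        Clade("Subgenotype C6", "TCGTACGGAC"),
    ]),
])

print(classify("ACGTACGTCG", tree))  # ['Genotype A', 'Subgenotype A2']
```

Because the descent is greedy, a wrong choice at genotype level cannot be corrected at subgenotype level. This is one reason phylogenetic or likelihood-based placement is generally preferred for real genotyping.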