
Project Data

Robert J. Gifford edited this page Nov 27, 2024 · 2 revisions

Reference Sequences

HBV-GLUE contains reference sequences for HBV genotypes and subgenotypes.

Sequences are in GenBank XML format and are contained in this directory.
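The reference files can be inspected with Python's standard library. The sketch below assumes the files follow NCBI's GBSet/GBSeq XML layout, which is what NCBI returns for GenBank records requested in XML. The function name `read_gbseq_xml` is hypothetical and is not part of GLUE. It extracts only each record's primary accession and its nucleotide sequence.

```python
import xml.etree.ElementTree as ET


def read_gbseq_xml(xml_text: str) -> dict[str, str]:
    """Map primary accession -> upper-case nucleotide sequence for each GBSeq record."""
    root = ET.fromstring(xml_text)
    records = {}
    # iter() includes the root element itself, so this handles a <GBSet> wrapper
    # as well as a document whose root is a single <GBSeq>.
    for gbseq in root.iter("GBSeq"):
        accession = gbseq.findtext("GBSeq_primary-accession")
        sequence = gbseq.findtext("GBSeq_sequence", default="")
        records[accession] = sequence.upper()
    return records


# Toy record (not a real HBV sequence), reduced to the two elements read above.
toy = """<GBSet><GBSeq>
  <GBSeq_primary-accession>TOY0001</GBSeq_primary-accession>
  <GBSeq_sequence>acgtn</GBSeq_sequence>
</GBSeq></GBSet>"""
print(read_gbseq_xml(toy))  # {'TOY0001': 'ACGTN'}
```

To read a file from the directory instead of a string, `ET.parse(path).getroot()` can replace the `ET.fromstring` call.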

Reference sequences are defined in this CSV file.


Alignment Tree

Reference sequences are organised hierarchically using GLUE's alignment tree data structure. The hierarchy mirrors the classification of hepatitis B virus (HBV) into species, genotypes, and subgenotypes. These reference sequences serve as the backbone for alignment-based analysis, phylogenetics, and genotyping, so that different analyses and studies work from the same set of references and the same coordinate system. The organisation of the reference sequences is summarised below:

1. Master Alignment

  • Name: AL_MASTER
  • Display Name: Hepatitis B Virus
  • Clade Category: Species
  • Constraining Reference Sequence: NC_003977

This alignment is the root of the tree and serves as the overarching framework for HBV genomic data. Its child alignments represent the genotypes, and these in turn contain the subgenotype alignments.

2. Genotype Alignments

The master alignment is subdivided into child alignments representing HBV genotypes (e.g., A, B, C), each with a constraining reference sequence. For example:

  • Genotype A (AL_A): Constraining reference: FJ692557
  • Genotype B (AL_B): Constraining reference: GU815637
  • Genotype C (AL_C): Constraining reference: GQ377617

3. Subgenotype Alignments

Each genotype is further subdivided into alignments representing subgenotypes, with specific reference sequences for finer resolution. For example:

  • Genotype A Subgenotypes:
    • Subgenotype A1 (1): Reference: KP168423
    • Subgenotype A2: Reference: EU594385
    • Subgenotype A6: Reference: GQ331046
  • Genotype C Subgenotypes:
    • Subgenotype C1: Reference: DQ089781
    • Subgenotype C6: Reference: EU670263
    • Subgenotype C10: Reference: KJ173333

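4. Additional Genotypes

Other genotypes are represented by a genotype alignment with a single constraining reference sequence. For example: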

  • Genotype E (AL_E): Single constraining reference: GQ161817
  • Genotype G (AL_G): Single constraining reference: AB056513
  • Genotype H (AL_H): Single constraining reference: FJ356715
  • Genotype J (AL_J): Single constraining reference: AB486012
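The hierarchy described in sections 1 to 4 can be modelled as a simple tree in which each node records its alignment name and constraining reference. The following is an illustrative Python sketch, not GLUE's own data model. It includes only the alignments listed on this page. The subgenotype alignment names (`AL_A1`, `AL_C10`, and so on) are hypothetical, because the page gives only their display labels. The `AlignmentNode` class and the `find` helper are likewise hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class AlignmentNode:
    """One node of a GLUE-style alignment tree (illustrative model only)."""
    name: str              # alignment name, e.g. "AL_A"
    label: str             # human-readable clade label, e.g. "Genotype A"
    constraining_ref: str  # accession of the constraining reference sequence
    parent: Optional[AlignmentNode] = field(default=None, repr=False)
    children: list[AlignmentNode] = field(default_factory=list)

    def add_child(self, name: str, label: str, constraining_ref: str) -> AlignmentNode:
        child = AlignmentNode(name, label, constraining_ref, parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> list[str]:
        """Alignment names from the root down to this node."""
        names = []
        node: Optional[AlignmentNode] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


def find(root: AlignmentNode, name: str) -> Optional[AlignmentNode]:
    """Depth-first search for an alignment by name."""
    if root.name == name:
        return root
    for child in root.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


# Only the examples listed on this page; the real tree is larger.
master = AlignmentNode("AL_MASTER", "Hepatitis B Virus", "NC_003977")
genotypes = {
    g: master.add_child(f"AL_{g}", f"Genotype {g}", ref)
    for g, ref in [("A", "FJ692557"), ("B", "GU815637"), ("C", "GQ377617"),
                   ("E", "GQ161817"), ("G", "AB056513"), ("H", "FJ356715"),
                   ("J", "AB486012")]
}
# Hypothetical subgenotype alignment names; only the display labels are documented.
for g, subs in {"A": [("A1", "KP168423"), ("A2", "EU594385"), ("A6", "GQ331046")],
                "C": [("C1", "DQ089781"), ("C6", "EU670263"), ("C10", "KJ173333")]}.items():
    for sub, ref in subs:
        genotypes[g].add_child(f"AL_{sub}", f"Subgenotype {sub}", ref)

node = find(master, "AL_C10")
print(node.lineage())  # ['AL_MASTER', 'AL_C', 'AL_C10']
```

Storing a parent link on each node makes it cheap to walk from a subgenotype back up to the species level. This is useful when an analysis needs the constraining reference at every level of the hierarchy.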

5. Role of Reference Sequences

Reference sequences are curated from publicly available databases and assigned to alignments based on their role in defining HBV genotypes and subgenotypes. They are used for:

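  • Constraining Alignments: Each alignment's constraining reference defines its coordinate system, so positions and annotations (such as gene locations) are expressed consistently for all member sequences.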
  • Genotyping and Phylogenetics: To classify and analyze sequences accurately.
  • Hierarchical Organization: To mirror the taxonomic structure of HBV and streamline downstream analysis.
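Expressing a position in reference coordinates can be illustrated with a small helper. The sketch below assumes that a gapped pairwise alignment between a query and a constraining reference has already been computed, with `-` marking gaps. It maps each query position to the corresponding reference position. The function `map_to_reference` is hypothetical and is not GLUE's internal implementation.

```python
def map_to_reference(aligned_query: str, aligned_ref: str) -> dict[int, int]:
    """Map 1-based query positions to 1-based reference positions.

    Both inputs are rows of the same gapped pairwise alignment ('-' = gap).
    Query bases aligned to a reference gap (insertions relative to the
    reference) have no reference coordinate and are left out of the result.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned rows must have equal length")
    mapping = {}
    q_pos = r_pos = 0
    for q, r in zip(aligned_query, aligned_ref):
        if q != "-":
            q_pos += 1
        if r != "-":
            r_pos += 1
        if q != "-" and r != "-":
            mapping[q_pos] = r_pos
    return mapping


# Toy alignment: the query lacks reference base 2 and carries an insertion after base 3.
print(map_to_reference("A-GTTACG-", "ACGT-ACGT"))
# {1: 1, 2: 3, 3: 4, 5: 5, 6: 6, 7: 7}
```

Because every member of an alignment is mapped onto the same reference, a statement such as "reference position 4" refers to the same genomic site in every sequence.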

Together, this hierarchy of reference sequences gives the HBV-GLUE Core Project a common framework for comparative genomic analysis, including genotype and subgenotype classification, phylogenetic investigations, and other alignment-based analyses.
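To show how a reference hierarchy supports classification, the following sketch descends the tree one level at a time. At each level, it picks the child clade whose reference is nearest to the query by uncorrected p-distance. This nearest-reference rule is a deliberately simple stand-in and should not be read as HBV-GLUE's own genotyping procedure. The function names, reference ids, and sequences are all hypothetical toy values, and the sequences are assumed to be pre-aligned to equal length.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned rows, skipping gap ('-') and N columns."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 1.0


def classify(query: str, tree: dict, refs: dict[str, str]) -> list[str]:
    """Descend the clade tree, choosing at each level the child whose reference is nearest.

    tree maps clade label -> (reference id, subtree); an empty subtree ends the descent.
    Ties go to the clade listed first.
    """
    path = []
    level = tree
    while level:
        best = min(level, key=lambda clade: p_distance(query, refs[level[clade][0]]))
        path.append(best)
        level = level[best][1]
    return path


# Toy, pre-aligned 8-base "references" with hypothetical ids; not real HBV data.
refs = {"refA": "ACGTACGT", "refA1": "ACGTACGA", "refA2": "ACGTACCT", "refC": "TTGTACCA"}
tree = {"A": ("refA", {"A1": ("refA1", {}), "A2": ("refA2", {})}),
        "C": ("refC", {})}
print(classify("ACGTACGA", tree, refs))  # ['A', 'A1']
```

Descending level by level means that subgenotype references are compared only within the genotype chosen at the level above. This mirrors the genotype-then-subgenotype structure of the alignment tree.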