final remarks

meiers · Mar 14, 2018 · ba5b855
1 parent 3befb1b
commit ba5b855
Showing 3 changed files with 49 additions and 37 deletions.
diff --git a/conclusions.tex b/conclusions.tex
@@ -24,18 +24,18 @@ \chapter{Conclusions and Discussion}
 
 \section{Complex inversions in the human genome}
 
-In \cref{sec:complex_invs}, I analyzed inversions in the scope of the 1000
-Genomes Project. Together with colleagues, we were able to solve the
-``validation problem'' by using targeted long-read sequencing on both \pacbio
-and \ont MinION platforms. I verified that more than 80\% of the predicted
-inversions indeed carried an inversion signature---meaning they were
-validated---which could previously be ascertained neither based on \mps data nor
-via \pcr experiments. This solved my first research goal and a principal
+Inversions are a \sv class of outstanding relevance for human disease \citep{Feuk2010},
+yet they are especially difficult to detect and they eluded ascertainment also
+in the 1000 Genomes Project. As I show in \cref{sec:complex_invs}, I was able to
+validate hundreds of inversion loci by using targeted long-read sequencing data
+from both \pacbio and \ont MinION platforms. This revealed that more than 80\% of the
+inversion loci predicted from \mps indeed carried an inversion signature.
+Strikingly, this verification had previously not been possible via \pcr
+experiments.  This solved my first research goal and a principal
 challenge of the overall study \citep{Sudmant2015}.
 
-Moreover, I then found that
-the majority of predicted loci contained not simple inversions, but complex
-variants containing inverted sequence. I categorized them into five major
+Moreover, I then found that the majority of predicted loci contained not simple
+inversions, but complex variants containing inverted sequence. I categorized them into five major
 classes, which included inverted duplications as the most frequent event. These
 insights had only been possible due to the ability of long-read techniques to
 span complete loci around predicted inversions. My analyses critically relied on
@@ -52,19 +52,22 @@ \section{Complex inversions in the human genome}
 originate from the same mutagenic process, with slight evidence for replication-based
 mechanisms such as \mmbir.
 
-It is good to know that, after my own contribution, the role and prevalence of complex variation has
-been studied further by others \citep{Chaisson2014,Collins2017}.
-Using 10X Genomics and mate pair sequencing, \citet{Colling2017} even extended
+Intrigued by the unforeseen amount of complex variation revealed in the 1000 Genomes
+Project, others continued to study this \sv class in human genomes \citep{Chaisson2014,Collins2017}.
+Using the emerging 10X Genomics technology and mate pair sequencing, \citet{Collins2017} even extended
 the five classes that I reported to a total of 16 different complex \sv classes
 (which they call cxSV), more than 80\% of which contained inverted sequence.
 This further emphasizes that this phenomenon was previously underappreciated,
-as I predicted, and which I could not resolve in our study due to the initial
-calling from low coverage \mps data. They also note that these complex events might
-have been created by a replicative mechanism such as \mmbir.
-The prevalence of these classes in patients
-with autism spectrum disorder and other
-further emphasizing the prevalence
-of this underappreciated phenomenon .
+as I predicted. They also note that these complex events might have been created
+by a replicative mechanism such as \mmbir.
+
+My work and the subsequent finding of \citet{Collins2017} underline the
+prevalence of complex inverted rearrangements---leading to the notion of the ``morbin''
+human genome. Whereas my work revealed complex \acp{sv} in healthy individuals,
+\citet{Collins2017} found them in patients with autism spectrum disorder. The
+functional role of these \sv classes is not yet understood, but our results
+suggest that inverted and complex variation can and should be detected,
+especially in the context of genetic studies around human disease.
 
 
 
@@ -94,7 +97,9 @@ \subsubsection{Long-read sequencing on the rise}
 plant genomics community, which had been affected by the limitations of
 short-read \mps to a special degree \citep{Bickhart2014}. Notably, the hope is
 to perform \textit{de novo} assembly of highly repetitive, or even polyploid
-genomes \citep{Li2017}. However, the problem of \textit{de novo} assembly from
+genomes \citep{Li2017}. An accurate assembly would make the discovery of \acp{sv}
+trivial---it could simply be done by sequence comparison.
+However, the problem of \textit{de novo} assembly from
 \pacbio data alone is not yet considered to be solved, despite a number of
 available software tools \citep{Chin2013,Chin2016,Koren2017,Koren2018} and the
 attention of renowned scientists\footnote{E.g. the efforts of Gene Myers, see
@@ -129,13 +134,17 @@ \subsubsection{Long-read sequencing on the rise}
 (and a maximum of 880~kb). This is a length so far unachieved by PacBio, which
 typically yields a maximum read length below 100~kb\footnoteref{footnote:pacbioblog}.
 
+Together, these technological improvements in long-read sequencing will facilitate
+studies on \acp{sv} that have been overlooked in the past---they might even, at some point in the future,
+make whole-homologue \emph{de novo} assembly possible, which would directly reveal the full spectrum
+of \acp{sv} within an individual's genome.
 
 
 
 
 \section{Effects of SVs on gene expression and chromatin organization}
 
-In \cref{sec:balancer}, we set out to study the functional consequences of
+In \cref{sec:balancer}, I set out to study the functional consequences of
 \acp{sv} with respect to gene expression and chromatin conformation. My first goal
 within this collaborative project was to characterize the variants present in
 highly rearranged balancer chromosomes. I achieved this by utilizing deep \wgs
@@ -148,7 +157,7 @@ \section{Effects of SVs on gene expression and chromatin organization}
 advantage of \hic data, I could additionally detect precisely (in 2
 cases) or approximately (in 1 case) the breakpoints that had been missed by
 these studies. In addition, I utilized haplotype-resolved \hic maps to validate
-large rearrangements including a inversion, and a duplication of 258~kb. The
+large rearrangements including an inversion, and a duplication of 258~kb. The
 large duplication most likely inserted in reverse orientation next to the
 original copy, which I concluded from the differential contact frequencies
 around the affected locus. Together, these findings clearly show the benefits
@@ -158,18 +167,18 @@ \section{Effects of SVs on gene expression and chromatin organization}
 test for \acl{ase} that utilizes multiple biological replicates and that
 corrects for effects of maternally deposited RNA. I found that changes in
 expression occur almost everywhere across the genome and that they appear not to
-be caused by enhancer hijacking, as had been observed in previous studies.
+be caused by enhancer hijacking, as had been observed in previous studies (\cref{sec:balancer_background}).
 Instead, \acp{sv} alter expression via alternative mechanisms such as dosage
 effects or chimeric expression of transcripts through mobile elements (summarized in \cref{sec:balancer_concl}). Our
 findings appear contrary to what has been seen in other scenarios; however, I
-argue that this might be a result of natural selection in both the other
+argued that this might be a result of natural selection in both the other
 studies and in ours. In conclusion, balancer chromosomes show a remarkable
 robustness towards the huge rearrangements and other variation that they carry,
 and the potential effects of enhancer hijacking mechanisms appear to be buffered.
 I speculated that this buffering might be caused by other forms of variation,
 such as \acp{snv}, or possibly via changes of the epigenome.
 
-We think that these results will complement
+I think that these results will complement
 previous studies and lead to a more holistic view on the role of chromatin
 architecture. The manuscript was in preparation at the time of writing this
 thesis.
@@ -183,18 +192,19 @@ \subsubsection{SV characterization via \hic}
 characterization. Naturally---and considering the popularity of \hic and the
 amount of publicly available data---this observation was made by others, too.
 
-The prospects of \hic for purposes other than studying chromatin conformation has been noted early in the field of \textit{de novo} assembly:
+The prospects of \hic for purposes other than studying chromatin conformation have
+been noted early in the field of \textit{de novo} assembly:
 \Citet{Kaplan2013}, for instance, predicted that \hic could facilitate assembly
 and assigned unplaced contigs to the human genome; \Citet{Burton2013} created
 scaffolds of human, mouse, and \textit{Drosophila} genomes based on \hic and
 \mps data; \Citet{Selvaraj2013} successfully extended the idea to haplotyping;
 and recently, the mosquito \textit{Aedes aegypti}, vector of the Zika virus, was
 assembled using \hic data \citep{Dudchenko2017}.
-
-Interstingly, the biological folding of chromosomes is not relevant---,aybe even
-impairing---for the purpose of assembly or \sv detection. \citet{Putnam2016}
-hence developed a protocol that reconstitutes chromatin \textit{in vitro} prior to \hic
-library preparation.
+%Interestingly, the biological
+%folding of chromatin is not relevant---maybe even impairing---for the pure
+%purpose of assembly or \sv detection. \citet{Putnam2016}
+%hence developed a protocol that reconstitutes chromatin \textit{in vitro} prior to \hic
+%library preparation.
 
 The core idea of \hic-based \sv detection is the identification of characteristic
 alterations in contact frequencies. The presumably first \acp{sv} detected using

diff --git a/intro.tex b/intro.tex
@@ -148,7 +148,9 @@ \section{Research goals and thesis overview}
 shortcomings of current \sv detection methods. This especially affects studies
 of balanced or complex rearrangements, which had often remained cryptic in
 previous studies. In this dissertation, I aim at uncovering and further examining
-\acp{sv} that had been difficult to ascertain beforehand. In order to do so, I
+\acp{sv} that had been difficult to ascertain beforehand.
+
+In order to do so, I
 utilize emerging sequencing technologies and protocols---namely the
 techniques introduced in \crefrange{sec:long_read_seq}{sec:strandseq}.
 My work is structured into three separate research projects, in which I explore

diff --git a/inversions.tex b/inversions.tex
@@ -4,15 +4,15 @@ \chapter{Complex Inversions in the Human Genome}
 
 In 2014 and 2015 I had the opportunity to collaborate with a large consortium of
 scientists on the 1000 Genomes Project. My supervisor Jan Korbel was the
-co-leader of the structural variation subgroup and together with my colleagues
-\tobias, \adrian, \benjamin, \markus, and \andreas
-I was involved in the validation and characterization of inversions.
+co-leader of the structural variation subgroup and, together with my colleagues
+\tobias, \adrian, \benjamin, \markus, and \andreas,
+I approached the validation and characterization of inversions.
 This chapter covers my work for the 1000 Genomes Project, which not only turned
 out to solve an interesting mystery but also resulted in a co-authorship in
 \citet{Sudmant2015}. I continue by describing subsequent work, including a side
 project on sequence match visualization that came into being from collaboration
 with \markus (\cref{sec:maze}), as well as an analysis of inversion breakpoints
-(\cref{sec:breakpoints}). These results were presented partially in form of a
+(\cref{sec:breakpoints}). The latter results were presented in the form of a
 poster at the German Conference for
 Bioinformatics 2016 in Berlin. There is supplementary information to this
 chapter enclosed in the appendix (\cref{sec:suppl_inversions}).