Estimated genome size is half #132
Can you send the link to the GenomeScope webpage with your results?
Sometimes the automatic modeling process gets confused and needs a hint on
how to fit the model.
You also report that the assembled genome size was 8.2G: is this the total
amount of sequence that was assembled? If so, the difference is explained
by GenomeScope reporting the haploid genome size, while the assembly size
will be about twice this amount for highly heterozygous samples. This is
because the two haplotypes separate out in the assembly, which causes the
duplicated genes that you see in the BUSCO report. For example, for human,
GenomeScope reports the (haploid) genome size as 3Gbp, while a phased
assembly will be about 6Gbp.
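The size comparison above is simple arithmetic. A minimal Python sketch, assuming you have the total assembled length and the GenomeScope haploid estimate in Gbp; the helper names and the 0.25 tolerance are hypothetical choices for illustration, not part of GenomeScope:

```python
# Hypothetical helpers for illustration only (not part of GenomeScope).


def assembly_to_haploid_ratio(assembly_gbp: float, haploid_estimate_gbp: float) -> float:
    """Total assembled length divided by the k-mer based haploid estimate."""
    return assembly_gbp / haploid_estimate_gbp


def interpret_ratio(ratio: float, tol: float = 0.25) -> str:
    """Rough reading of the ratio; the tolerance is an arbitrary illustrative choice."""
    if abs(ratio - 1.0) <= tol:
        return "collapsed or primary-only assembly"
    if abs(ratio - 2.0) <= tol:
        return "both haplotypes assembled separately"
    return "unclear: check the k-mer model fit and BUSCO duplication"


# Human example from this reply: ~3 Gbp haploid estimate vs ~6 Gbp phased assembly.
ratio = assembly_to_haploid_ratio(6.0, 3.0)
print(ratio, interpret_ratio(ratio))
```

A ratio near 2 is what you would expect when both haplotypes are present in the assembly, which is also when BUSCO reports many duplicated genes.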
Good luck!
Mike
On Wed, Jun 19, 2024 at 2:36 AM gunjanpandey ***@***.*** wrote:
I have assembled a "suspected" highly heterozygous genome using HiFi reads.
The assembled genome size is 8.2G, which gives the following BUSCO results.
Could you please help me understand how to analyse these results, and how
to perform this analysis properly? I believe I am somehow getting a genome
size estimate that is half of its real value.
[image: BUSCO results] https://github.com/schatzlab/genomescope/assets/50389451/a03d1e86-16b3-4df2-b86e-5dca2c0caf1d
I ran the following for genome size estimation in GenomeScope 2.0 using
paired-end Illumina reads:
meryl count k=19 output k19.meryl ${R1} ${R2}
meryl histogram k19.meryl/ > k19_meryl.hist
Rscript genomescope2.0/genomescope.R -i k19_meryl.hist -k 19 -o k19_genomescope
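For intuition about where a "half" estimate can come from, here is a crude k-mer size estimate in Python: total k-mers above an error cutoff divided by the homozygous peak coverage. This is a simplified sketch, not GenomeScope's model (GenomeScope fits a full mixture model). It assumes the histogram has two whitespace-separated columns (coverage, number of distinct k-mers), and the cutoff `min_cov=5` and the helper names are hypothetical:

```python
from pathlib import Path


def read_histogram(path):
    """Parse a two-column k-mer histogram (coverage, number of distinct k-mers)."""
    hist = {}
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        # Skip blank lines or anything that is not a numeric data row.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        hist[int(fields[0])] = int(fields[1])
    return hist


def naive_genome_size(hist, min_cov=5, peak_cov=None):
    """Crude haploid size: k-mers above an error cutoff / homozygous peak coverage.

    Hypothetical helper; min_cov is an illustrative error cutoff.
    """
    kept = {cov: n for cov, n in hist.items() if cov >= min_cov}
    if peak_cov is None:
        # Use the tallest bar above the cutoff. In a heterozygous diploid this can
        # be the heterozygous peak; dividing by it instead of the homozygous
        # coverage doubles the estimate, and the reverse mistake halves it.
        peak_cov = max(kept, key=kept.get)
    total_kmers = sum(cov * n for cov, n in kept.items())
    return total_kmers / peak_cov


# Hypothetical usage with the histogram produced by the commands above:
# hist = read_histogram("k19_meryl.hist")
# print(naive_genome_size(hist))               # peak picked automatically
# print(naive_genome_size(hist, peak_cov=50))  # homozygous peak given explicitly
```

The only thing that changes between the two calls is which peak is treated as homozygous, which is the same ambiguity the GenomeScope fit has to resolve.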
and I got the following results summary:
[image: GenomeScope results summary] https://github.com/schatzlab/genomescope/assets/50389451/491cc940-8a9d-4228-aab2-bc447d82a257
And the plot:
[image: GenomeScope k-mer profile plot] https://github.com/schatzlab/genomescope/assets/50389451/22767229-74e3-48ea-91f5-02f7096a838c
Thanks for the quick reply, @mschatz. The website is giving me an error, so I am uploading the file here.
@mschatz - thoughts, please?
Hi,
I tried running your file through the website (the raw file is too big, so I
selected the first 100k rows).
By default, it estimates the haploid genome size to be 3.7Gb with a very
high heterozygosity rate (7.53%). This is an extreme level of
heterozygosity: human is about 0.1%, and an F1 of two wild strains of
Arabidopsis is about 2%.
http://genomescope.org/genomescope2/analysis.php?code=hTdEo4uGrO6TsCTSfKeX
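To see why such a high rate reshapes the k-mer profile, here is a rough Python sketch of the fraction of k-mers that overlap at least one heterozygous site. It assumes heterozygous sites occur independently at a fixed rate, and the helper name is hypothetical; it is an illustration, not GenomeScope's actual model. At 7.53% most 19-mers would be haplotype-specific, shifting coverage from the homozygous peak to the heterozygous one, while at 0.1% only a small fraction would be.

```python
def fraction_kmers_with_het_site(het_rate: float, k: int) -> float:
    """Chance that a k-mer covers at least one heterozygous site.

    Hypothetical helper. Simplifying assumption: heterozygous sites occur
    independently at rate het_rate.
    """
    return 1.0 - (1.0 - het_rate) ** k


# Rates quoted in this reply; k = 19 as in the meryl command in this thread.
for label, rate in [("human", 0.001), ("Arabidopsis F1", 0.02), ("default fit", 0.0753)]:
    print(f"{label}: {fraction_kmers_with_het_site(rate, 19):.1%} of 19-mers")
```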
I also did a second run where I gave it a hint that the peak at 50x
coverage is really the homozygous peak. I did this by setting "Average
k-mer coverage for polyploid genome" to 25. This gives a nice fit for a
haploid genome size of 7.3Gb with a much more reasonable heterozygosity
rate of 0.1%
http://genomescope.org/genomescope2/analysis.php?code=0TnxODdt3XSZQI9lZ4AO
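Why the hint roughly doubles the estimate (3.7Gb vs 7.3Gb): for a diploid, the homozygous peak sits at twice the per-haplotype k-mer coverage (the value entered as "Average k-mer coverage for polyploid genome"), and the haploid size scales as total k-mers divided by that homozygous coverage. A minimal Python sketch of just that proportionality, using a hypothetical k-mer total rather than the data from this thread:

```python
def haploid_size(total_kmers, kcov, ploidy=2):
    """Haploid size if the homozygous peak sits at ploidy * kcov (hypothetical helper)."""
    return total_kmers / (ploidy * kcov)


TOTAL_KMERS = 1_000_000  # hypothetical error-filtered k-mer total, not real data

# Default fit: the 50x peak is treated as heterozygous, so kcov = 50
# and the homozygous peak is assumed to be at 100x.
default_fit = haploid_size(TOTAL_KMERS, kcov=50)

# Hinted fit: the 50x peak is the homozygous peak, so kcov = 50 / 2 = 25.
hinted_fit = haploid_size(TOTAL_KMERS, kcov=25)

print(default_fit, hinted_fit, hinted_fit / default_fit)
```

The real fits differ by slightly less than 2x because GenomeScope also models errors, duplications, and the shape of each peak.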
From just the k-mer profile it is ambiguous which model fit is correct
(although in my experience the lower heterozygosity rate is more often
correct). Your BUSCO results show a very high rate of duplicated BUSCO
genes. This can occur with a heterozygous assembly, where you get separate
representations of the maternal and paternal chromosomes, but it can also
suggest a whole genome duplication event. How did you assemble the genome?
Was BUSCO run on all contigs or just the primary assembly? If you extract
the duplicated BUSCO genes and align them, do the copies differ by ~7% or
by ~0.1%? This can be a good clue.
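One way to do the comparison suggested above, as a minimal Python sketch: it assumes you have already extracted a duplicated BUSCO pair and aligned it with a pairwise aligner of your choice, giving two equal-length strings with `-` for gaps. The helper names are hypothetical, gap columns are simply skipped, and the two candidate rates are the estimates from the GenomeScope runs linked above.

```python
def percent_difference(aln_a: str, aln_b: str) -> float:
    """Percent mismatching columns in a pairwise alignment, ignoring gap columns.

    Hypothetical helper; expects two already-aligned, equal-length strings.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * mismatches / compared


def closest_model(pct: float, candidates=(0.1, 7.53)) -> float:
    """Hypothetical helper: which GenomeScope heterozygosity estimate (%) is nearer."""
    return min(candidates, key=lambda c: abs(c - pct))
```

Averaging `percent_difference` over many duplicated pairs gives a more robust signal than a single gene. Much larger divergence than either estimate would point more toward older duplicates (for example, a whole genome duplication) than toward allelic copies.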
Good luck!
Mike
On Thu, Oct 24, 2024 at 4:03 PM gunjanpandey ***@***.*** wrote:
@mschatz - thoughts, please?
@rahulvrane, thoughts?