checkv database version #73

Open
wen1112 opened this issue Apr 24, 2024 · 5 comments

Comments

@wen1112

wen1112 commented Apr 24, 2024

Could you tell me which CheckV database version was used? I found that different CheckV database versions give different results.
Thank you!

@efratmuller

efratmuller commented Dec 12, 2024

Hi @apcamargo !
First of all - thank you for this really great resource.
Joining the question above: could you share which CheckV version you used here?
I found multiple discrepancies between the CheckV statistics provided in your uhgv_metadata file and those I get when running CheckV myself on the same contigs (CheckV 1.0.1, DB version 1.0). These include contigs marked as "high quality" in your metadata file but medium or low quality in mine, and vice versa. Any clarification would be much appreciated!
Thanks again,
Efrat

@apcamargo
Collaborator

Hi @efratmuller

Thank you! Although I should say that @snayfach did most of the heavy lifting to generate the data resource :)

Any reason you are using version 1.0 of the database? The latest version is 1.5 and we used version 1.4 to get the estimates that are listed in the metadata file.

@efratmuller

Thanks @apcamargo for your quick reply!

I have re-run CheckV with DB version 1.4, but I'm still getting odd discrepancies. Do you have any idea what the reason could be?

A few concrete examples:

(1) "UHGV-0889122" (representative of vOTU-155218) has an estimated completeness of 76.09 in your metadata file, but my CheckV run gave 43.07. Its overall quality category in your metadata was "medium quality", while mine came out as "low-quality". Genome length and cds_count were identical (as a sanity check). Notably, the CheckV completeness method in my run was "HMM-based (lower-bound)", whereas in yours it was "AAI-based".

(2) "UHGV-0404930" (rep of vOTU-052718) has 100% completeness in your metadata (quality = "Complete"), but only 82.72% completeness in my run (quality = "Medium-quality"). Again, genome length and cds_count are the same. Completeness method in my run was "AAI-based (high-confidence)" and in your metadata table it is "DTR".
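For context, CheckV assigns its quality tiers from the completeness estimate using fixed cutoffs (per the CheckV documentation: >90% high-quality, 50-90% medium-quality, <50% low-quality), which is why the completeness shifts above move genomes across categories. The sketch below is a simplification, not CheckV's own code: the function name is hypothetical, the handling of values exactly at 50 and 90 is an assumption, and the separately assigned "Complete" tier (e.g. from direct terminal repeats) is ignored.

```python
from typing import Optional


def completeness_tier(completeness: Optional[float]) -> str:
    """Map a CheckV completeness estimate (%) to a quality tier.

    Sketch only: uses the documented cutoffs (>90 high, 50-90 medium, <50 low),
    treats exactly 50 and 90 as medium-quality (assumed boundary handling), and
    ignores the separately assigned 'Complete' tier.
    """
    if completeness is None:
        return "Not-determined"
    if completeness > 90:
        return "High-quality"
    if completeness >= 50:
        return "Medium-quality"
    return "Low-quality"


# Completeness values reported in this thread
for value in (76.09, 43.07, 82.72):
    print(value, completeness_tier(value))
# 76.09 Medium-quality
# 43.07 Low-quality
# 82.72 Medium-quality
```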

Overall, ~25% of the genomes (in the MQ version) seem to have different quality categories in your CheckV run vs. mine. Any help figuring out what we are doing differently would be greatly appreciated!
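One way to measure a discordance like the ~25% above is to join the two tables on genome ID and compare normalized quality labels. A minimal sketch, assuming tab-separated inputs: `contig_id` and `checkv_quality` are columns of CheckV's `quality_summary.tsv`, while the UHGV metadata column names are not assumed here and must be supplied by the caller. Labels are lowercased and hyphens/underscores treated as spaces, so "medium quality" and "Medium-quality" compare equal. The helper names are hypothetical.

```python
import csv
from typing import Dict, List, Tuple


def normalize(label: str) -> str:
    """Lowercase and unify separators so 'Medium-quality' == 'medium quality'."""
    return " ".join(label.replace("-", " ").replace("_", " ").lower().split())


def load_quality(path: str, id_col: str, quality_col: str) -> Dict[str, str]:
    """Read a TSV and map genome ID -> normalized quality label."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return {row[id_col]: normalize(row[quality_col]) for row in reader}


def discordance(a: Dict[str, str], b: Dict[str, str]) -> Tuple[List[str], float]:
    """Return IDs present in both tables whose labels differ (sorted),
    plus their fraction of the shared IDs."""
    shared = a.keys() & b.keys()
    diffs = sorted(g for g in shared if normalize(a[g]) != normalize(b[g]))
    frac = len(diffs) / len(shared) if shared else 0.0
    return diffs, frac


# Hypothetical usage (paths and metadata column names are placeholders):
# mine = load_quality("checkv_out/quality_summary.tsv", "contig_id", "checkv_quality")
# theirs = load_quality("uhgv_metadata.tsv", "<id column>", "<quality column>")
# ids, frac = discordance(theirs, mine)
```

Genomes present in only one table are excluded from the fraction, so it is worth checking the size of the shared ID set separately.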

Many thanks in advance (and Merry Christmas),
Efrat

@apcamargo
Collaborator

apcamargo commented Dec 20, 2024

Hi @efratmuller, sorry for the late reply (and thanks for all the details!)

It looks like the discrepancy is because the complete genomes of UHGV were added to the CheckV database before the completeness of the other genomes was estimated. I’ll update the CheckV database before the preprint is out to make sure everything is reproducible. In the meantime, you can set up a custom database on your end if you need it.

@efratmuller

Thanks @apcamargo , I appreciate the clarification!
