The root question that led to discovering #2318 was that @bluegenes was getting unexpectedly inaccurate results from a Zymo mock community analysis, and she wanted to understand why.
In brief, two genomes were producing identical sketches: `GCA_902364275.1 Faecalibacterium prausnitzii, MGYG-HGUT-00195` and `GCA_003478405.1 Faecalibacterium sp. AF28-13AC strain=AF28-13AC`. Of course, `sourmash prefetch` was finding both of them, and `gather` was picking one of them.
If it picked the first, the species was "correctly" assigned as Faecalibacterium prausnitzii because that's the NCBI taxonomy entry for this genome, and the mock community composition indicated it contained F. prausnitzii. If it picked the second, then it was missing the correct species.
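To make the ambiguity concrete, here is a minimal Python sketch of how one might detect groups of genomes whose sketches are identical. It deliberately does not use the sourmash API: sketches are represented as plain sets of integer hashes, the hash values are made up, and `group_identical_sketches` is a hypothetical helper.

```python
from collections import defaultdict


def group_identical_sketches(sketches):
    """Group names whose hash sets are exactly equal.

    `sketches` maps a genome name to an iterable of integer hashes.
    Returns a sorted list of sorted name lists, one per group that has
    two or more members.
    """
    groups = defaultdict(list)
    for name, hashes in sketches.items():
        # frozenset ignores hash order, so equal sets land in the same bucket
        groups[frozenset(hashes)].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


# Toy hash values, invented for illustration only.
sketches = {
    "GCA_902364275.1": {11, 42, 97},
    "GCA_003478405.1": {97, 11, 42},
    "some_other_genome": {5, 42, 100},
}
identical = group_identical_sketches(sketches)
print(identical)
```

Any group reported this way is a set of matches that gather cannot distinguish by content alone, so which one it reports depends on something other than the sketch.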
See #1615 and the idea for making identical matches accessible. It is not clear to me how to properly handle this in gather output, since `sourmash tax` only sees gather output. Maybe we could provide multiple entries per rank where appropriate and mark them as "secondary", and then `sourmash tax` could choose the one with the most specific taxonomy, or something? Ugh.
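As a rough illustration of the "secondary" idea, the sketch below picks one primary match from a set of tied matches and keeps the rest as secondaries. Everything here is an assumption: `pick_most_specific` is a hypothetical helper and not part of sourmash, the lineages are toy (genus, species) tuples, and "most specific" is taken to mean "most non-empty ranks", which may not be the right definition in practice.

```python
def pick_most_specific(tied_matches):
    """Return (primary, secondaries) among matches that tie in gather.

    Each match is a (name, lineage) pair, where lineage is a tuple of rank
    names from the highest rank downward; an empty string means unassigned.
    Assumption: "most specific" means the most non-empty ranks. Ties are
    broken by input order, so the result is deterministic.
    """
    def depth(match):
        _, lineage = match
        return sum(1 for rank in lineage if rank)

    primary = max(tied_matches, key=depth)  # max() returns the first maximum
    secondaries = [m for m in tied_matches if m is not primary]
    return primary, secondaries


# Toy (genus, species) lineages, invented for illustration only.
tied = [
    ("genome_B", ("Faecalibacterium", "")),
    ("genome_A", ("Faecalibacterium", "Faecalibacterium prausnitzii")),
]
primary, secondaries = pick_most_specific(tied)
print(primary[0], [name for name, _ in secondaries])
```

Keeping the secondaries alongside the primary would let a downstream step such as `sourmash tax` see every identical match instead of only the one gather happened to pick.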