ENH: add ProfileHMM[*] semantic types #328

Merged on Jun 25, 2024 (29 commits)

Sann5
Contributor

@Sann5 Sann5 commented May 16, 2024

Closes #327.

Adds new semantic types for profile hidden Markov models as implemented in HMMER, plus tests and test data.

@gregcaporaso
Member

gregcaporaso commented May 23, 2024

@misialq, would you be able to review this one?

Update: @colinvwood is going to try to take a pass through this and merge today, so it's in the prepare this week.

@colinvwood
Contributor

Hey @misialq, I didn't have time to look at this today. If you have time to look at it tomorrow, just let me know; otherwise I'll plan to look at it tomorrow. Excuse all the pings 🥸

@misialq
Collaborator

misialq commented May 24, 2024

Hey @gregcaporaso, @colinvwood - sure thing, I already had a glance - there are some significant changes which I proposed to @Sann5 so please do not review yet - the contents will likely change. We'll ping you when ready, thanks! 🙏

@gregcaporaso gregcaporaso marked this pull request as draft May 24, 2024 12:44
@gregcaporaso
Member

gregcaporaso commented May 24, 2024

Good to know, thanks @misialq. I converted this to a Draft pull request. Since we have the release next week, I'm going to bump this to the project board for the next release - let us know if it'll be an issue to not have this in 2024.5.

@misialq
Collaborator

misialq commented May 24, 2024

Hey @gregcaporaso, thanks! No, I don't think it's an issue if we don't have it in 2024.5. We will probably want to test it out a bit together with our new moshpit action for eggnog so we may need some more time anyway :)

@Sann5 Sann5 changed the title ENH: add ReferenceDB[HMMER] semantic type ENH: add HMM[*] semantic types Jun 4, 2024
@Sann5 Sann5 marked this pull request as ready for review June 10, 2024 15:17
@Sann5 Sann5 changed the title ENH: add HMM[*] semantic types ENH: add ProfileHMM[*] semantic types Jun 10, 2024
Collaborator

@misialq misialq left a comment

Hey @Sann5, first "superficial" review and some change suggestions below. Will look at the tests and give it a go once you update.

Review threads (all resolved):
q2_types/profile_hmms/_format.py (outdated)
q2_types/profile_hmms/_format.py (outdated)
q2_types/profile_hmms/_format.py (outdated)
q2_types/profile_hmms/_type.py
@Sann5 Sann5 requested a review from misialq June 20, 2024 08:17
@misialq
Collaborator

misialq commented Jun 21, 2024

Hey @Sann5, what's up with the two failing tests?

@Sann5
Contributor Author

Sann5 commented Jun 21, 2024

> Hey @Sann5, what's up with the two failing tests?

@misialq I opened an issue in pyhmmer about the uninformative error thrown when loading a file with mixed profiles (DNA, RNA, protein). They have already fixed it and pushed the patch to conda. I will update the error handling here accordingly.

@Sann5 Sann5 requested a review from misialq June 25, 2024 11:45
Collaborator

@misialq misialq left a comment

Hey @Sann5, LGTM, thanks! If it's not too much trouble, do you think you could attach here this nice table you presented once in our meeting - it may be helpful in understanding what all the formats do 🙏

@lizgehret do you think you could check this out? :)

@Sann5
Contributor Author

Sann5 commented Jun 25, 2024

> Hey @Sann5, LGTM, thanks! If it's not too much trouble, do you think you could attach here this nice table you presented once in our meeting - it may be helpful in understanding what all the formats do 🙏
>
> @lizgehret do you think you could check this out? :)

Sure thing!

Profile HMMs

How are they used

They are usually used as follows:

  1. You take a group of sequences that are known to be related (e.g. a protein family).
  2. You build a profile HMM from the alignment of these sequences (called a seed alignment).
  3. You use the profile HMM to estimate the probability that a sequence "belongs" to this family (i.e. search for homologs).

One can also use profile HMMs to do sequence annotation or alignment.
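
The three steps above can be sketched with a deliberately simplified stand-in for a profile HMM: a per-column log-odds scoring matrix built from a gapless seed alignment. This is only an illustration. A real profile HMM, as built by HMMER, also models insertions and deletions, and the function names, the toy alignment, the uniform background, and the pseudocount value below are all assumptions made for this example.

```python
import math
from collections import Counter


def build_log_odds_profile(alignment, alphabet, pseudocount=1.0):
    """Build per-column log-odds scores from a gapless alignment.

    Toy model: match states only, uniform background frequencies.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(alphabet)
    total = len(alignment) + pseudocount * len(alphabet)
    profile = []
    for col in range(length):
        # Count how often each symbol occurs in this alignment column.
        counts = Counter(seq[col] for seq in alignment)
        profile.append({
            sym: math.log(((counts[sym] + pseudocount) / total) / background)
            for sym in alphabet
        })
    return profile


def score(profile, sequence):
    """Sum the per-column log-odds scores of a sequence of profile length."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must match profile length")
    return sum(column[sym] for column, sym in zip(profile, sequence))


# Step 1: related sequences (hypothetical seed alignment).
seed_alignment = ["ACGT", "ACGA", "ACGT", "TCGT"]
# Step 2: build the profile from the alignment.
profile = build_log_odds_profile(seed_alignment, "ACGT")
# Step 3: score candidate sequences against the profile.
family_like = score(profile, "ACGT")
unrelated = score(profile, "TTAA")
```

A sequence resembling the seed alignment gets a higher score than an unrelated one, which is the intuition behind using a profile to search for homologs.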

How are they stored

Profile HMMs differ by sequence type (DNA, RNA, or protein). HMMER, the go-to software for biological sequence analysis with profile HMMs, saves profiles as text or binary files. One file can contain one or more profiles, each representing a group of sequences, but no valid file can contain profiles of more than one sequence type. Some HMMER programs take files with multiple profiles as input, while others take a file with a single profile.
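
The "one sequence type per file" rule can be checked with a small standard-library sketch. It relies on two features of the HMMER3 text format: each profile ends with a `//` line, and each profile header carries an `ALPH` field (`amino`, `DNA`, or `RNA`). The function names and the header-only toy records are assumptions for illustration; the records are truncated and are not complete, loadable profiles. This is not the validation code from this PR.

```python
def summarize_hmm_text(text):
    """Return (number_of_profiles, set_of_alphabets) for HMMER3 text content."""
    n_profiles = 0
    alphabets = set()
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "//":
            # A '//' line terminates one profile record.
            n_profiles += 1
        elif fields[0] == "ALPH" and len(fields) > 1:
            alphabets.add(fields[1].lower())
    return n_profiles, alphabets


def check_single_alphabet(text):
    """Raise ValueError unless all profiles share exactly one sequence type."""
    n_profiles, alphabets = summarize_hmm_text(text)
    if n_profiles == 0:
        raise ValueError("no profiles found")
    if len(alphabets) != 1:
        raise ValueError(
            f"expected one sequence type, found: {sorted(alphabets)}"
        )
    return n_profiles, alphabets.pop()


# Header-only toy records (hypothetical, truncated for illustration).
protein_only = """HMMER3/f [3.1b2 | February 2015]
NAME  toy_a
LENG  3
ALPH  amino
//
HMMER3/f [3.1b2 | February 2015]
NAME  toy_b
LENG  5
ALPH  amino
//
"""
mixed = protein_only + """HMMER3/f [3.1b2 | February 2015]
NAME  toy_c
LENG  4
ALPH  DNA
//
"""

result = check_single_alphabet(protein_only)
```

Calling `check_single_alphabet(mixed)` raises a `ValueError` that names both sequence types, which is the kind of informative message the discussion above asks for.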

The proposal

To accommodate the different things that these profiles represent, as well as future use cases, this PR proposes the following semantic types.

| | Protein | DNA | RNA |
|---|---|---|---|
| Single Profile | ProfileHMM[SingleProtein] | ProfileHMM[SingleDNA] | ProfileHMM[SingleRNA] |
| Multiple Profiles | ProfileHMM[MultipleProtein] | ProfileHMM[MultipleDNA] | ProfileHMM[MultipleRNA] |
| Multiple Profiles in Binary + Indexed | ProfileHMM[PressedProtein] | ProfileHMM[PressedDNA] | ProfileHMM[PressedRNA] |
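
As a reading aid for the table, the naming scheme can be written as a small lookup. The helper `semantic_type_name` is hypothetical and is not part of q2-types. It only builds the type names as strings, and it assumes that a text file with exactly one profile maps to the "Single" row and that alphabets are given as HMMER's `ALPH` values.

```python
# Map HMMER ALPH values (assumed input) to the column labels in the table.
_ALPHABET_LABELS = {"amino": "Protein", "dna": "DNA", "rna": "RNA"}


def semantic_type_name(alphabet, n_profiles, pressed=False):
    """Return the semantic type name from the table as a string.

    Hypothetical helper for illustration only.
    """
    key = alphabet.lower()
    if key not in _ALPHABET_LABELS:
        raise ValueError(f"unknown alphabet: {alphabet!r}")
    if n_profiles < 1:
        raise ValueError("a file must contain at least one profile")
    if pressed:
        kind = "Pressed"   # binary + indexed, multiple profiles
    elif n_profiles == 1:
        kind = "Single"
    else:
        kind = "Multiple"
    return f"ProfileHMM[{kind}{_ALPHABET_LABELS[key]}]"


single_protein = semantic_type_name("amino", 1)
multiple_dna = semantic_type_name("DNA", 12)
pressed_rna = semantic_type_name("RNA", 12, pressed=True)
```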

Member

@lizgehret lizgehret left a comment

This all looks reasonable, thanks @Sann5!

@lizgehret lizgehret merged commit a22a64a into qiime2:dev Jun 25, 2024
4 checks passed
@lizgehret lizgehret self-assigned this Jun 25, 2024
@Sann5 Sann5 deleted the st_hmmer_db branch June 26, 2024 07:41
@lizgehret lizgehret removed their assignment Jun 26, 2024
Successfully merging this pull request may close these issues.

ENH: add ReferenceDB[HMMER] semantic type