Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Large structural changes to MPcules #926

Merged
merged 93 commits into from
Sep 4, 2024

Conversation

espottesmith
Copy link
Contributor

@espottesmith espottesmith commented Jan 18, 2024

Meant to do this piecemeal, but there kept being things that I wanted/needed. Also, the main point was to make changes/fixes to the API, and that's actually one of the things I still need to mess with/test. Anyways, here we are.

Contributor Checklist

  • I have broken down my PR scope into the following TODO tasks
    • Add new document containing molecular dipole/multipole data and associated builders
    • On @munrojm's suggestion, changed the has_props field in the molecule summary doc to aid in querying
    • Remove probably unnecessary query operators
    • Switch molecule element querying to be based on composition, which should be more efficient?
    • Add dipoles/multipoles to the information returned in a trajectory query. Also improved testing for the trajectory query operator
    • Add three-center (3C) bonds and 3-center 4-electron hyperbonds to the orbital doc/builder. Thanks @samblau for making the changes to pymatgen necessary to parse this data
    • Change all molecule property builders to break down documents based on hash, rather than formula. Even for the small and quick property builders, the current scale of the data (~4M tasks, ~600k molecules) made separation by formula problematic.
    • Reformat some data to remove lists (where possible) and make querying on summary document faster
    • Make some changes to NBO-based bonding, based on some issues identified by @samblau and @orionarcher
    • (Potentially) add string reps (InChI and SMILES) to all molecules docs, rather than just tasks (for SMILES) and molecules/summary (for InChI)
  • I have run the tests locally and they passed. (I never run ALL the tests, cause that takes a hot minute, but I did run all of the tests that I messed with!)
  • I have added tests, or extended existing tests, to cover any new features or bugs fixed in this PR

need to do the same thing with property builders, but we'll cross that
bridge later
hash comparisons possible at a high level across the build process
@espottesmith
Copy link
Contributor Author

Yo @tschaume @yang-ruoxi @tsmathis, can this be merged?

@tsmathis
Copy link
Collaborator

tsmathis commented Aug 1, 2024

I'd say more or less yes, just a couple questions from me

  1. I see the enums for qchem are now excluded from version control in the .gitignore, where the vasp enums are still included. Should that be the case?
  2. I see large chunks of code in some places that are commented out like here:
    # class ElementsQuery(QueryOperator):
    is there a reason to keep sections of code like this for later? Or should they be removed?
  3. And I see some new test files, are these just generated task documents? Nothing copyrighted?

@tschaume
Copy link
Member

tschaume commented Aug 1, 2024

In addition to the comments by @tsmathis, I'd like to go through the indexes that we need to make sure are built for the most common queries to be performant. I've added a few already but it'd be great if you could let me know any that are missing:

  • molecule_id
  • last_updated
  • formula_alphabetical
  • charge
  • spin_multiplicity
  • natoms
  • nelements
  • builder_meta.$**
  • elements.$**
  • composition.$**
  • chemsys
  • symmetry.$**
  • has_props.$**

@espottesmith
Copy link
Contributor Author

To @tsmathis:

  1. Since the enums get auto-generated, I figured there was no need for changes made to them to get pushed. If you want the line removed from .gitignore, that's fine.
  2. Those lines can be removed for now. If they need to be added back in later, that's fine.
  3. All of the test data that I've added is from collections that are already in MP or are going to be in MP, I believe. So no copyright issues.

Some fields that should probably be indexed:

  • species_hash
  • inchi
  • inchi_key
  • task_ids
  • unique_levels_of_theory
  • unique_solvents

@tschaume
Copy link
Member

tschaume commented Aug 1, 2024

Great, thanks! I added the indexes. Are all query operators for the summary collection covered by indexes now?

@tsmathis
Copy link
Collaborator

tsmathis commented Aug 1, 2024

Okay, cool. All fine by me with those then, just wanted to check

@espottesmith
Copy link
Contributor Author

I just added HashQuery to the summary resource, which means that we should add an index to coord_hash in addition to species_hash. Otherwise, yeah, everything should be covered.

@tschaume
Copy link
Member

tschaume commented Aug 2, 2024

Ok great. I've added the index on coord_hash too. Are there any mp-api client changes necessary that we should be aware of?

@espottesmith
Copy link
Contributor Author

It doesn't look like it? I think all of the fields available in the rester search() function are still query-able through the API. Though weirdly, in mprester.py, the molecule endpoint is

molecules: MoleculeRester

which points to the molecules collection, while I'm pretty sure it should be

molecules: MoleculesSummaryRester

pointing to the molecules summary collection.

@tschaume tschaume merged commit 69adac5 into materialsproject:main Sep 4, 2024
7 checks passed
tschaume added a commit that referenced this pull request Sep 19, 2024
tschaume added a commit that referenced this pull request Sep 19, 2024
* Revert "Merge pull request #926 from espottesmith/query_restructure"

This reverts commit 69adac5, reversing
changes made to 7ded807.

* regenerate enums after mpcules PR revert

* [automated commit] update calc type enums

* molecules: rm StringRepQuery

---------

Co-authored-by: github-actions <github-actions@github.com>
tschaume added a commit that referenced this pull request Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants