-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
restructure #9
Comments
Nice, thanks for these thoughtful observations and proposals, @yuanqing-wang ! I mulled this over yesterday afternoon and this morning, and largely agree with this. We want to separate concerns in a way that lets us do the comparisons we plan as painlessly as possible, and it's tricky because we want to make comparisons at a few different levels. I would draw the lines between the tasks slightly differently, and in particular I would emphasize the distinctions between the molecule graph, the graph you perform message-passing on, and the factor graph representing the potential energy function. I would start by stating the overall task in a way that doesn't commit to a specific way to solve it.
where To implement
To implement
For different tasks, we may be interested in doing different things with the output
|
Thanks for fleshing this out! @maxentile I think it's a lot clearer to further distinct the graphs into these stages. I would suggest that we look more closely at Moreover, we could compare |
Agreed -- there's a lot going on at that step! Perhaps we can narrow down by listing the types of comparisons involving this step that we need to support, and the types of comparisons we may eventually want to support. Required comparisons (based on current version of paper outline):
To support this, maybe we have two subclasses of Speculative comparisons:
|
Rather than hard-code Class I and Class II, how about having a
|
I agree that we should definitely compare this. But just in terms of how to engineer it: the |
Both options are basically the same for a single binary comparison. In one case, there's a single class I prefer to have two sub-classes, rather than having if-else branches that check a string flag to find out what kind of object the However, if we're entertaining the possibility of more than two variants of In general, when we find ourselves branching on some string that says what class an object is, that's often a hint that we'd be better off with sub-classes. (This page has a helpful comparison of these two options: https://refactoring.guru/replace-conditional-with-polymorphism ) |
I think it's time, for the sake of scaling this up to more systematic experiments, to restructure this project.
The primary challenge I realized when designing experiments and structuring the current project is that, there are too many parts in this streamline that we want to model. Specifically:
rdkit.Molecule
,openeye.GraphMol
, oropenforcefield.Molecule
)↓
dgl.Graph
ordgl.HeteroGraph
)the object that contains the information regarding the identities / attributes of the atoms and how they are connected.
↓
dgl.Graph
ordgl.HeteroGraph
)the object that contains all the parameters necessary for MM-like energy evaluation, namely
k
s andeq
s and so forth↓
the end object in both fitting and inference
There are problems with each of these objects, and we can argue that there are still downstream objects that we can have, namely those needed for
openmm
simulations.But first, here's how I picture to structure the repo:
espaloma/
core code for graph nets-powered ������force field fitting.graph.py
graph abstraction of molecules.parametrized_graph.py
molecular graph with all the parameters needed for energy evaluationnet.py
interface for neural networks that operates onespaloma.graph
mm/
helper functions to calculate MM energyscripts/
experiment scriptstyping/
experiment scripts to reproduce atom-typingmm_fitting/
experiment scripts to fitespaloma
potentials to Molecular Mechanicsqm_fitting/
experiment scripts to fitespaloma
potentials to Quantum Mechanics-
class_i/
class_ii/
with the inclusion of coupling and higher-order termsdeployment/
experiment to provide interface toopenmm.system
objects and run actual MD.devtools
development toolsThe following functions are needed to allow the aforementioned streamline in a cleaner manner:
espaloma.graph.from_oemol
and friends: read molecules into graphsespaloma.parameterized_graph.from_forcefield(forcefield='gaff2')
parametrize a mol graph using a legacy forcefield. we will likely need to port from other repos for these implementations, or import them.espalmoa.mm.energy(g, x)
evaluate energy from the parametrized molecule (however it's parametrized) and its geometry ( lots of helper functions are needed, of course, and test coverage would ideally ensure it's consistency withOpenMM
although that might be tricky for especially nonbondned terms.The following are the trickiest choices that I would like to have a discussion here:
ways to structure
graph
currently this is done by having both
graph
,heterograph
andhierarchical_graph
objects as input. I find this a bit ugly. I suggest allowing only one kind of graph as the input of either NNs or legacy force fields, and put whatever needed to express the relationships between graph entities as part of the model. Note, however, that we would need tricks to make sure that this part of the modeling is only executed exactly once during training.ways to output energies
without any sum functions, one molecule would have separated bond, angle, and nonbonded energies all with different dimensions. this becomes even more complicated when you have a batch with multiple molecules. I think it's critical that we find a simple way to output energies
ways to have a general enough
espaloma.net
object to enable future expansionnow on the representation level we already have a bunch of ideas: atom-level message-passing only, hierarchical message-passing, or somewhat fancier version of it that I proposed to use an RNN to encode the walks with different length (corresponding to different levels in the hierarchy). If we were to limit the input graph to be universal, how are we going to develop the
net
module so that it can be not too much of a headache to express these ideas.The text was updated successfully, but these errors were encountered: