Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing the Lux model for parameters and state #12

Merged
merged 12 commits into from
May 30, 2024

Conversation

CedricTravelletti
Copy link
Collaborator

@CedricTravelletti CedricTravelletti commented Apr 29, 2024

This introduces state inside AtomsCalculators by follwing the Lux model.

This PR is a followup to the Zulip discussion following the April 10th 2024 CESMIX and JuliaMolSim community meeting. It also builds on the discussion in #11 and adresses the same needs.

Philosophy

The basic idea behing this PR is to separate high-level and low-level calls. High-level calls are defined as the ones to potential_energy, forces, forces! and virial, whereas the low-level ones are the ones to calculate.

High-level calls preserve backward compatiblity, while low level calls now follow the Lux model, by using immutable calculators, accepting parameters and state arguments and returning state.

Detailed documentation is bundled with the PR. We refer to it for details.

@CedricTravelletti CedricTravelletti changed the title Introductin the Lux model for parameters and state Introducing the Lux model for parameters and state Apr 29, 2024
@tjjarvinen
Copy link
Collaborator

Thanks for the PR. For the main points it looks good!

I did not look the all inner workings of the @generate_interface macro yet, but I'll look for it soon.

Also, I would like this to become a breaking release (v0.2).
I have some bug corrections and other issues coming for v0.1 that I would like to merge first, So, that all the initial works are in good shape with a final release in v0.1 series. After that we could proceed with this and add other features too, like Hessian.

@CedricTravelletti
Copy link
Collaborator Author

@tjjarvinen Thank you for your feedback.

Waiting on these bug corrections then.

@cortner
Copy link
Member

cortner commented May 11, 2024

This is now blocking for me to proceed with the new ace models. Would be great to get it reviewed and merged.

@tjjarvinen
Copy link
Collaborator

This is now blocking for me to proceed with the new ace models. Would be great to get it reviewed and merged.

You can do review it too! We need couple of them anyways.

I will try to go trough the macro. But I presume it is in good shape. The macro does need more documentation on how it works. But we can do that in the future.

@tjjarvinen tjjarvinen requested review from cortner and tjjarvinen May 11, 2024 02:01
@cortner
Copy link
Member

cortner commented May 11, 2024

Of course I will as soon as somebody has done a first pass please.

Copy link
Collaborator

@tjjarvinen tjjarvinen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for taking this long, but there are a lot of points to check with this.

I am ok with the idea and marked some points on the code that could be looked onto.

But then I have also two bigger issues. One onto which I would like to have some comments from others too. And other that is just implementing tests.

1 Lux interface

We add state and parameters and in the return structure we give out state.

For us this is logical. We return internal state (electronic structure) and we can then calculate some additional features out of it. Parameters like Lennard-Jones terms are not returned.

This is a contradiction to Lux where state is the one you applie learning onto - we don't want to learn internal state, but instead we want to learn parameters.

So, what I am saying here is that the Lux model does not apply straight to our case and then the question is how do we deal with this?

2 Tests

If we are defining the new interface to return state and the function calls to allow calls with 5 positional arguments, then we need to have tests for these also.

examples/low_vs_high_level_calculator.jl Outdated Show resolved Hide resolved
examples/low_vs_high_level_calculator.jl Outdated Show resolved Hide resolved
docs/src/interface-definition.md Outdated Show resolved Hide resolved
src/utils.jl Show resolved Hide resolved

# Low-level call, implemented by the user.
AtomsCalculators.calculate(AtomsCalculators.Energy(), hydrogen, LowLevelCalculator(nothing),
state = 1.0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this is not a correct call. state = 1.0 is now considered as a keyword and not the argument state.

julia> AtomsCalculators.calculate(AtomsCalculators.Energy(), hydrogen, LowLevelCalculator(1), 1, state=2)
(energy = 0.0 Eₕ, state = nothing)

#but
julia> AtomsCalculators.calculate(AtomsCalculators.Energy(), hydrogen, LowLevelCalculator(1), 1, 2)
(energy = 0.0 Eₕ, state = 2)

@cortner
Copy link
Member

cortner commented May 15, 2024

I think the state could in principle contain part of the output of the model, provided it is invalidated on the next call, or when the model is called with a new input. I actually quite like this usage of st. Do I understand this correctly? ...

at1 = # some structure
at2 = # another structure 

v1, st = virial(at1, calc, ps, st)
# st.virial, st.energy, st.forces are all computed during this call.

E1 = energy(at1, calc, ps, st) 
# no new computation to be done, just return st.energy
# (same with forces) 

E2 = energy(at2, calc, ps, st)
# new input structure at2, so the content of st must be invalidated. 

To be clear, I'm not advocating for this yet. I am just thinking out loud -- is this the intention? And if yes, I don't really see how it contradicts the Lux model?

This is a contradiction to Lux where state is the one you applie learning onto

I don't understand this statement. In Lux the state is just an internal state of the model that can contain anything you want (temporary arrays for example) or just nothing at all.

My understanding of Lux is that you do

output, st = model(input, ps, st)

you learn parameters ps such that output matches some data. The value of output should be entirely independent of the contents of st.

@cortner
Copy link
Member

cortner commented May 15, 2024

Re tests, I agree of course.

@tjjarvinen
Copy link
Collaborator

The issue here is most likely what the terms mean.

Like if one implementation uses state for learning and other does not. Will this lead to some issue? I am concerned that this leads to some problems in the future.

In any case we should have more documentation that defines on what to do with these two (state and parameter).

@tjjarvinen
Copy link
Collaborator

And one thing I forgot from the review.

We need functions get_state, get_parameters and probably also set_state and set_parameters.

@cortner
Copy link
Member

cortner commented May 15, 2024

I agree with that. I already have those in ACEpotentials and could switch over to extending methods here.

@rkurchin rkurchin mentioned this pull request May 15, 2024
@CedricTravelletti
Copy link
Collaborator Author

I think the state could in principle contain part of the output of the model, provided it is invalidated on the next call, or when the model is called with a new input. I actually quite like this usage of st. Do I understand this correctly? ...

at1 = # some structure
at2 = # another structure 

v1, st = virial(at1, calc, ps, st)
# st.virial, st.energy, st.forces are all computed during this call.

E1 = energy(at1, calc, ps, st) 
# no new computation to be done, just return st.energy
# (same with forces) 

E2 = energy(at2, calc, ps, st)
# new input structure at2, so the content of st must be invalidated. 

To be clear, I'm not advocating for this yet. I am just thinking out loud -- is this the intention? And if yes, I don't really see how it contradicts the Lux model?

This is a contradiction to Lux where state is the one you applie learning onto

I don't understand this statement. In Lux the state is just an internal state of the model that can contain anything you want (temporary arrays for example) or just nothing at all.

My understanding of Lux is that you do

output, st = model(input, ps, st)

you learn parameters ps such that output matches some data. The value of output should be entirely independent of the contents of st.

Exactly, and this is the intention in this PR.

Just to clarify (inspired from Why we wrote Lux:

  • Calculators must be immutable – cannot store any parameter/state but rather store the information to construct them

  • Calculators (in their low-level calls) are pure functions

  • Calculators (in their low-level call) return a Tuple containing the result and the updated state

Hope this clarifies the intent.

@CedricTravelletti
Copy link
Collaborator Author

@tjjarvinen and @cortner Thans for the feedback!
I will go over the suggestions and add the requested tests.

Concerning the get_state and set_state, if we strictly follow the Lux model, then Calculators calls are pure functions, so we don't have a state inside the calculator.

I can write some more documentation and examples to clarify the situation.

@cortner
Copy link
Member

cortner commented May 15, 2024

if we strictly follow the Lux model ...

I thought we had agreed to not do that. But I'm totally open to revisiting this. The reason we felt we can have a mixed model is that users never want to see ps, st. I thought that the suggestion was to have calls involving ps, st act in a pure way (Lux model) but that we can still store those parameters inside the model if we wish and in that case the calls can become mutating.

An alternative approach that was also discussed and which might be preferrable is to provide a simple wrapper

struct StatefulCalculator
    calc
    ps 
    st 
end

I am not able to anticipate all potential problems with either approaches. I immagine both have advantages and disadvantages.

@tjjarvinen
Copy link
Collaborator

We also have the high level interface that needs to have parameters and state in the calculator structure. So, we need to have the function calls to extract the state and parameters too.

If we go for the pure Lux formulation then we need to separate the calculators to low and high levels, which is not good.

@cortner
Copy link
Member

cortner commented May 15, 2024

I think at a minimum there is no harm - for now - in allowing calculators to store ps and st.

Provided the low level interface strictly follows Lux rules.

If we learn otherwise we can revisit this discussion.

@tjjarvinen
Copy link
Collaborator

While calculators contain state and parameters, in low level calls when you get state and parameters as an input, you can just ingnore the contained state and parameters, and perform the calculation with the ones you got as an input. Then you have pure function calls with low level interface.

@mfherbst
Copy link
Member

In my memory we also advocated for a mixed model, where high-level is convenient and low-level is lux-like. This of course has a little bit the risk of suggesting some wrong expectations about how the mechanics works, so we have to be strict in code examples and document properly.

I would suggest for example that the low-level calls ignore internal state and parameters and that the high-level calls do not take state and parameters at all, but require you to externally take care of updating the state yourself. I.e. I advocate a get_state and set_state and similar and I advocate forcing users of the high-level interface to use those for updating the objects stored in the calculator to make the mechanics clear. Thoughts on that?

@CedricTravelletti
Copy link
Collaborator Author

I agree with the above comments about the state and parameters issue.
I think @cortner 's comment pins down the solution:

[...] there is no harm - for now - in allowing calculators to store ps and st.
Provided the low level interface strictly follows Lux rules.

So my implementation suggestion is the following:

  1. low-level calls take (and need) ps and st, thereby strictly following the Lux convention.
  2. bundled calculators are allowed, and are the convention for high-level calls.
  3. high-level calls do not take ps and st.
  4. This means that developpers of calculators are left free to specify the intended behaviour of high-level calls (re-use the state or start afresh or else) and users that want manual control have to use the getters/setters.
  5. We include getters and setters for ps and st.

If you agree with those points, I'll begin working on an implementation.

@tjjarvinen
Copy link
Collaborator

We should also make sure that low level calculators work with high level calls and force it in the test set.

For usability point it is very important that all low level calculators work with high level calls.

@CedricTravelletti
Copy link
Collaborator Author

We should also make sure that low level calculators work with high level calls and force it in the test set.

For usability point it is very important that all low level calculators work with high level calls.

Agreed. Do you think that what is currently being done in AtomsCalculators.jl/src/submodules/AtomsCalculatorsTesting.jl covers this?

It seems to me that these tests cover both low-level and high-level calls for energy, forces and virial.

@cortner
Copy link
Member

cortner commented May 25, 2024

Can somebody give me a brief summary what is still needed to finish this PR please?

@@ -32,6 +57,8 @@ Each of the individual calls have two common inputs: `AtomsBase.AbstractSystem`
- First input is either `Energy()`, `Forces()` or `Virial()`
- Second is `AtomsBase.AbstractSystem` compatible structure
- Third is `calculator` structure
- Fourth is `parameters`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flag parameters and state and optional?

@cortner
Copy link
Member

cortner commented May 25, 2024

I didn't see anything in the documentation or the PR about providing calls of the form

frc, vir = calculate( ( Forces(), Virial(), ), sys, calc, ...) 

Is this not intended to be supported?

From the user's point of view the most common calls will be

  • energy only
  • forces ony
  • forces and virial (e.g. to do MD or optimization in position and cell space)
  • all of energy, forces and virials together

@cortner
Copy link
Member

cortner commented May 25, 2024

It is also not entirely clear to me how to make the calculate interface non-allocating. Something like the following might be needed?

frc = # allocate
frc, vir = calculate!(  (forces = frc, virial = nothing), sys, calc, ...)
E, frc, vir = calculate!(  (energy = nothing, forces = frc, virial = nothing), sys, calc, ...)

To be clear, I'm not happy with the nothing, I just couldn't think of how to do this more elegantly off the top of my head.

If I missed something when skimming the PR, please point it out.

Was the calculate interface a bad idea afterall? I know I was really pushing for it because it would be extendable. And I'd like to make it work. @CedricTravelletti -- are you free for a zoom call next week Wed-Fri to discuss it?

@tjjarvinen
Copy link
Collaborator

Calculate interface does not have these at this moment. It has always been discussed as what to do, but not implemented.

You can call

res = AtomsCalculators.energy_forces(sys, calc; kwords...)
res.energy # energy
res.forces # forces
# note there can be something else too - this is allowed by the interface

res = AtomsCalculators.energy_forces_virial(sys, calc; kwords...)
res.energy # energy
res.forces # forces
res.virial   # virial
# note there can be something else too - this is allowed by the interface

Also, the order might not be energy, forces, as that is not defined by the interface. The interface also allows you to return something else to gether with energy, forces and virial.

There are no forces_virial call at this moment.

@cortner
Copy link
Member

cortner commented May 25, 2024

One more comment - I am getting the impression that we are giving calc developers the option of implementing either interface. My first instinct is that this is not helpful. It will create confusion on which one to implement. I think there should be a very clear default, and anybody not following that proceeds at their own risk.

Ideally that default is relatively simple. My personal preference is that this should be a non-allocating low-level interface involving params and states and that the everything else gets derived from that automagically. (e.g. through macros)

@cortner
Copy link
Member

cortner commented May 25, 2024

@tjjarvinen

Also, the order might not ...

The way I wrote the call, the order would become unambiguous so one could in principle enforce it. It may or may not be a good idea but it is possible.

There are no forces_virial call at this moment

Yes, I'm aware. I'm just listing what are typical use-cases. For most use-cases this might not be needed since energy normally doesn't cost anything. But what about a calculator that has no energy? E.g. force mixing. You could return missing for the energy. Again, not clear what it should be, I'm just listing what I think are canonical use-cases.

@tjjarvinen
Copy link
Collaborator

One more comment - I am getting the impression that we are giving calc developers the option of implementing either interface. My first instinct is that this is not helpful. It will create confusion on which one to implement. I think there should be a very clear default, and anybody not following that proceeds at their own risk.

The issue with only having low level (lux call) that is implemented, is that developers then need to understand lux and that will just give new people impression that the interface is just for esoteric persons. Basically what they mostly want is just an easy way to create a custom calculator and get going, and not to bother with lux stuff.

So, the whole point here is that you can do it easy, and if you need the extra features the low level interface offers, then you can put the extra effort to implement it.

@cortner
Copy link
Member

cortner commented May 25, 2024

You are probably right. But the documentation needs to be crystal clear about it.

@cortner
Copy link
Member

cortner commented May 25, 2024

I think I've been derailing the discussion here a little bit and I apologize for this. I'm just concerned that with the slow speed of development and review I don't want another major revision of the interface immediately after merging the current one.

Having thought about this again a bit longer, could there be an alternative way to make calculators non-allocating by allowing the caller to pass in pre-allocated forces as a keyword argument? It would look something like that.

# Implementation 
function forces( sys, calc, ...;   frc = zero_forces(sys, calc),  kwargs...) 
     # ... 
     return frc 
end 

# allocating usage 
frc = forces(sys, calc) 

# non-allocating usage 
frc = # allocate by whatever means, e.g. bumper
forces(sys, calc; frc = frc) 

I think this would have the huge advantage of requiring only one interface and we can get rid of forces! and similar.

I haven't thought through what the best option could be for the calculate interface, but I can see multiple ways how to achieve this.

@tjjarvinen
Copy link
Collaborator

There are problems with using keywords like that.

First, the keyword frc would need to be implemented for all calculators, as it would become a part of the interface.

Currently the intended interface usage is that keywords are used to give control commands to individual calculators. Calculators can ignore these, if they wish. Arguments on the other hand are intended to be reacted by every calculator.

Second issue is that, are there any Julia programs that mutate keywords? Or is it safe from language point?

suggestion for the issue

That being said, could you simply do

function myforces(sys, calc, ...;   frc = zero_forces(sys, calc),  kwargs...)
     return forces!(frc, sys, calc; kwargs...)
end

You could do that in your own package without need to change AtomsCalculators interface itself. We could even have this call in AtomsUtilityCalculators with the current version of AtomsCalculators.

calculator interface

For calculator interface I suggest doing

mutable struct Forces{T}
    f::T
 # add builder that makes sure f is either nothing or valid force array
end

# allocating call
F_allocating = Forces(nothing)  # or simply F()
calculate(F_allocating, sys, calc, ...; kwargs...)

# non allocating call
F_non_allocating = Forces( zero_forces(sys, calc) )
calculate(F_non_allocating, sys, calc, ...; kwargs...)

This would need an changes in calculator implementation and we would need to change the macro etc.

@tjjarvinen
Copy link
Collaborator

tjjarvinen commented May 25, 2024

You could also write a macro that would do something like

@forces sys calc f_pre_allocated (args...) (kwargs...)

@cortner
Copy link
Member

cortner commented May 25, 2024

Part of my proposal was to avoid having too many interface functions. if it were just about forces!, where would not be an issue, but we also need energy_forces! and energy_virial_forces! and so forth ... My impression is that all of the alternative suggestions increase complexity rather than reduce it. Both AtomsBase and AtomsCalculators are already overwhelming and need to simplify them rather than making them more complex. And once we get to the calculate interface it gets even more complex.

First, I don't see the problem of requiring all calculators to accept the frc keyword argument. Why can we not ask that?

But secondly, I think one could even do this without that requirement. Once can simply make it clear in the interface that there is no guarantee that the returned forces are the same array as was passed in and therefore to require usage as follows

# Implementation 
function forces( sys, calc, ...;   frc = zero_forces(sys, calc),  kwargs...) 
     # ... 
     return frc 
end 

# allocating usage 
frc = forces(sys, calc) 

# non-allocating usage 
frc_alc = # allocate by whatever means, e.g. bumper
frc = forces(sys, calc; frc = frc_alc) 

and analogous usage of forces_virial and of energy_forces_virial.

I actually think this doesn't even require any change to the interface, only documentation and can be entirely optional.

To be clear I think it would be kind of a corner-case usage where the allocation of the force vector creates a measurable slow-down. But I have encountered such cases and would therefore like to have that option if possible without creating too much additional complexity.

@tjjarvinen
Copy link
Collaborator

I actually think this doesn't even require any change to the interface, only documentation and can be entirely optional.

Yes, you can implement that with the current implementation. That is completely valid, but optional.

@tjjarvinen
Copy link
Collaborator

Agreed. Do you think that what is currently being done in AtomsCalculators.jl/src/submodules/AtomsCalculatorsTesting.jl covers this?

It seems to me that these tests cover both low-level and high-level calls for energy, forces and virial.

It covers it well with the current interface. But the new calculator interface returns state and that has to be taken into account. Also, some kind of tests that it is possible to have state and parameters, as arguments need to be implemented too.

@CedricTravelletti
Copy link
Collaborator Author

Agreed. Do you think that what is currently being done in AtomsCalculators.jl/src/submodules/AtomsCalculatorsTesting.jl covers this?
It seems to me that these tests cover both low-level and high-level calls for energy, forces and virial.

It covers it well with the current interface. But the new calculator interface returns state and that has to be taken into account. Also, some kind of tests that it is possible to have state and parameters, as arguments need to be implemented too.

Ok, then I'll include those tests.

@CedricTravelletti
Copy link
Collaborator Author

To summarize the discussion up to now:

  1. It seems that we have reached a consensus on using the Lux model for parameters and state. And that the primary objective of this PR is thus adressed.

So, the whole point here is that you can do it easy, and if you need the extra features the low level interface offers, then you can put the extra effort to implement it.

  1. The (intended) way to use this new interface should be documented more thoroughly.

  2. There is still some debate concerning allocating vs non-allocating call, and how have combined calls (e.g. energy + forces + virial).

While I agree that 3. is important, I feel it is a separate issue, and that it can be somehow adressed by using states.
Indeed, for most calculators, if you want energy AND forces, what will be done under the hood is to first compute energies and use the resulting state of that calculation to compute forces. I thus tinkg that having forces! and states is sufficient for the moment.

@CedricTravelletti
Copy link
Collaborator Author

@cortner I am available on coming Wednesday or Thursday for a call.

@cortner
Copy link
Member

cortner commented May 27, 2024

I'll look at my schedule and get back to you on Zulip.

Regarding "most" - you are thinking of electronic structure. In the empirical and ML potentials world this is not normally done.

parameters through `potential_energy`, `forces`, `forces!` and `virial` and low-level calls
with user-specifiable parameters through `calculate`.

The low-level `calculate` interface follows the [Lux](https://lux.csail.mit.edu/stable/) model for parameters and state. This means that calculators are **immutable** structs that are passed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calculators will NOT be immutable. But they need to ACT as immutable. This is a subtle but important difference.

@cortner
Copy link
Member

cortner commented May 30, 2024

@CedricTravelletti -- following up from our discussion ... to make it easier for me to contribute to this PR (especially docs), please create a new lux branch in the current REPO, redirect the PR to that branch. I will then merge it. And then we create a new PR to target main. I think it will also be good to restart the discussion with a clean slate, as there is now a lot of noise here.

@CedricTravelletti CedricTravelletti changed the base branch from master to lux May 30, 2024 09:11
@CedricTravelletti
Copy link
Collaborator Author

@cortner Ready for branch transfer. I will add yesterday's meeting notes to the new PR discussion.

@cortner cortner merged commit ac04750 into JuliaMolSim:lux May 30, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants