Julia error running DA explicit water training #88
Comments
I think this is due to `TS` being used as both a ConfigurationSet and a logic variable. I had the same problem with the previous script and the current one. I fixed it by changing the two variables: `generate_init_configs(n, bulk_water_logic=True, TS_logic=True)`. Before the fix, I could see the generated water_sys containing the TS for the pure water system.
Dear all, just to clarify: during which section of the training, or for which machine-learning potential, did you experience this issue?
@Hanwen1018 I believe this is related to DA_paper/training/explicit/endo_ace_ex.py. My own testing/reading of the code led me to think this error might be due to this part: `def generate_init_configs(n, bulk_water=True, TS=True):`
Would you mind testing the following code, to check whether the configurations can be generated and saved? `ts_in_water_init = generate_init_configs(n=10, bulk_water=True, TS=True)`
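For anyone following along, a minimal sketch of that test might look like the snippet below. It assumes `generate_init_configs` is importable from endo_ace_ex.py and that `mlt.ConfigurationSet` exposes a `save_xyz` counterpart to the `load_xyz` used in the script; check the actual API before relying on it.

```python
# Minimal test sketch (assumptions: generate_init_configs is importable from
# endo_ace_ex.py, and ConfigurationSet has a save_xyz method mirroring load_xyz)
from endo_ace_ex import generate_init_configs

if __name__ == '__main__':
    # Generate 10 TS-in-water configurations and write them out for inspection
    ts_in_water_init = generate_init_configs(n=10, bulk_water=True, TS=True)
    ts_in_water_init.save_xyz(filename='ts_in_water_init_config.xyz')
```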
Thanks @Hanwen1018! I tested the following code following your suggestions: `if __name__ == '__main__': ...`
ts_in_water_init_config.xyz looks perfect, but there are problems with water_sys_init_config.xyz - I believe that is the error @tanoury1 and I met (I had the same issue in the earlier version). Even though it is a water system, water_sys_init_config.xyz contains the TS (see below). When running Julia to fit the potential, the input *.jl file only has two elements for the water system - I think that is what the error was complaining about. Hopefully I understood the code correctly, so I modified bulk_water and TS.
It is identical to your example under DA_paper (endo_ace_ex.py), but I changed the main part to the following for the testing: `if __name__ == '__main__': ...`
I saw your modification. It is correct. Now, the function should be:

```python
def generate_init_configs(n, bulk_water_logic=True, TS_logic=True):
    """Generate initial configurations to train the potential.

    It can generate three sets of initial configurations (pure water,
    TS immersed in water, and TS bound to two water molecules) by
    modifying the boolean variables.

    n:                number of init_configs
    bulk_water_logic: whether to include a solution (bulk water)
    TS_logic:         whether to include the TS of the reaction in the system
    """
    init_configs = mlt.ConfigurationSet()

    TS = mlt.ConfigurationSet()
    TS.load_xyz(filename='cis_endo_TS_wB97M.xyz', charge=0, mult=1)
    TS = TS[0]
    TS.box = Box([11, 11, 11])
    TS.charge = 0
    TS.mult = 1

    if bulk_water_logic:
        # TS immersed in a water box
        if TS_logic:
            water_mol = mlt.Molecule(name='h2o.xyz')
            water_system = mlt.System(water_mol, box=Box([11, 11, 11]))
            water_system.add_molecules(water_mol, num=43)
            for i in range(n):
                solvated = solvation(
                    solute_config=TS,
                    solvent_config=water_system.random_configuration(),
                    apm=3,
                    radius=1.7,
                )
                init_configs.append(solvated)

        # pure water box
        else:
            water_mol = mlt.Molecule(name='h2o.xyz')
            water_system = mlt.System(water_mol, box=Box([9.32, 9.32, 9.32]))
            water_system.add_molecules(water_mol, num=26)
            for i in range(n):
                pure_water = water_system.random_configuration()
                init_configs.append(pure_water)

    # TS bound to two water molecules at the carbonyl group to form hydrogen bonds
    else:
        assert TS_logic is True, 'cannot generate initial configuration'
        for i in range(n):
            TS_with_water = add_water(solute=TS, n=2)
            init_configs.append(TS_with_water)

    # Change the box of the system to an extremely large one to imitate a
    # cluster system; the box is needed for the ACE potential
    for config in init_configs:
        config.box = Box([100, 100, 100])

    return init_configs
```

Would you mind testing this function for pure water configurations?
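As a usage example for the pure-water case asked about above, something like the following should work. It again assumes a `save_xyz` method and that iterating a `ConfigurationSet` yields configurations with an `atoms` list of atoms carrying a `label` attribute; those attribute names are assumptions about the API.

```python
# Pure-water configurations: bulk water box, no TS
water_sys_init = generate_init_configs(n=10, bulk_water_logic=True, TS_logic=False)
water_sys_init.save_xyz(filename='water_sys_init_config.xyz')

# Sanity check that no TS atoms leaked into the pure-water set:
# every configuration should contain H and O only
for config in water_sys_init:
    elements = {atom.label for atom in config.atoms}
    assert elements <= {'H', 'O'}, f'unexpected elements: {elements}'
```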
Yes! That is what I did - it worked. There are a few places in the script (in the main section) which need to be fixed. Thanks for your help! We are really interested in using this package.
Thank you for testing and letting us know about the bugs. We will fix them. If you have any questions, please contact us.
OK. That worked. I edited the file. Needed to make an additional edit (unless I missed something): updated bulk_water and TS as you noted. Don't know if you can help me with my next issue: IPFitting is not precompiling when running install_ace.py. I tried to do it manually in Julia, but still no luck. The ACE folks seem to be moving away from IPFitting and replacing it with ACEfitting (or something like that). Did you experience the same issues with IPFitting? What version of Julia are you using?

Thanks,
I am using julia-1.7.1, and it always complains about IPFitting... but it seems to run afterwards.
OK. Then it must be something else. Below is the full, and lengthy, error output:

2024-03-15 18:20:04 hpchead mlptrain.log[313412] WARNING Save called without defining what energy and forces to print. Had true energies so using those

Jerry
Hi, please try to install "JuLIP", version="0.10.1"; "ACE", version="0.8.4"; and also "IPFitting", version="0.5.0". Also, would you mind sharing water_sys.jl with us?
Yep. I've got those exact versions installed. I'm running Julia 1.10. water_sys.jl is attached. Also, perhaps it may be an ACE error. Near the top of the full error output, there is: RuntimeError: ACE train errored with:
This might have something to do with my Julia version. I installed v1.7.1 using 'juliaup add 1.7.1', then added the packages again for this version of Julia. I got further along running water_sys.jl directly from the command line, but got an 'ERROR: LoadError: MethodError: no method matching pretty_table'. I'm going to do a fresh conda install of mlptrain-ace with Julia 1.7.1 as my default version and see how things go.
Hi, how is your reinstallation going? If it still doesn't work, you can try to train with the MACE potential first, and we can prepare an installation script for you in the meantime.
Hi. It did not work for me with v1.10.0.

Jerry
I am getting an error, but I don't know if it means anything. The job continues to run:

2024-03-20 13:37:23 hpchead autode.log.log[297226] INFO Getting gradients from tmp_orca.out
If this is related to the TS_in_water system, I noticed that there is another potential problem in the script: the generated (10) configs don't all have exactly the same number of atoms, since water is packed within a box. I hardcoded it to include only those with the same number of waters in the valid configs and discarded the rest (see #57 (comment)); a rough sketch of that filter is below.
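The following is only an illustrative sketch of that workaround, not the exact hard-coded change; `len(config.atoms)` as the size measure and `import mlptrain as mlt` are assumptions.

```python
from collections import Counter

import mlptrain as mlt

# Keep only the generated configurations whose atom count matches the most
# common one, so every snapshot has the same number of water molecules
sizes = [len(config.atoms) for config in ts_in_water_init]
target_size = Counter(sizes).most_common(1)[0][0]

valid_configs = mlt.ConfigurationSet()
for config in ts_in_water_init:
    if len(config.atoms) == target_size:
        valid_configs.append(config)
```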
Hi, the potential can be trained on systems of different sizes. I want to double-check: during which section of the training, or for which machine-learning potential, did you experience this issue? Did you obtain trained potentials?
I don't have a trained potential yet. The training is still running. I have 49 dataset files in the datasets directory. Not sure how to determine how close I am to convergence.
That is correct - I had to separate the training (pure water, TS in water, etc.) and combine later due to the HPC queue time. This works, which means they can be trained on datasets with different numbers of atoms. But weirdly, for the training on TS_in_water only, I had this issue and I hard-coded it to select only those with the same number of waters. Sorry for injecting my query into the discussion.
Do you have any *_al.xyz or *_al.npz files? I just want to clarify whether the issues you have come from starting the training of another potential.
Oh, this is because of the recent implementation of metadynamics in AL. Even though the MTD doesn't apply in this case, the configurations need to pass the MTD bias check first, which requires configurations of the same size. I will update the example code.
I have water_sys_al.xyz and water_sys_al.npz files. So, things seem to be going as expected.
Hi, looks like I may have run into a numpy error. I am using numpy version 1.26.4. I've attached the entire output from the training. To make sure my endo_ace_ex.py is correct (after making the edits we discussed above), here is the portion where the code errored out (I think...):

generate sub training set of pure water system by AL training
My error is identical to @tanoury1's. You can see from their output that pure water worked, and it happened after training the TS_in_water MLP based on 10 snapshots. This problem goes away if I hard-code in the same-number-of-waters filter described above.
Your MLP for pure water worked, though! So I doubt that it is because of numpy.
Hi, I think the issue is indeed coming from numpy, but it is triggered only when the training set contains structures with different numbers of molecules. I have corrected it in PR #89; hopefully that will be enough. Please test the update. Failing that, try downgrading numpy. As my PR currently fails the tests, I will need to check in a bit more detail what is happening in the arrays.
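To illustrate the numpy behaviour being referred to, here is a standalone example (illustrative only, not code from the PR) of why ragged per-structure arrays need `dtype=object`:

```python
import numpy as np

# Per-structure force arrays with different numbers of atoms cannot be
# stacked into a regular ndarray; numpy >= 1.24 (the thread uses 1.26.4)
# raises a ValueError for such ragged input unless dtype=object is given.
forces_small = [[0.1, 0.2, 0.3]] * 30   # a structure with 30 atoms
forces_large = [[0.1, 0.2, 0.3]] * 50   # a structure with 50 atoms

# np.array([forces_small, forces_large])        # ValueError: inhomogeneous shape
ragged = np.array([forces_small, forces_large], dtype=object)
print(ragged.shape)  # (2,) - a 1-D object array holding the two lists
```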
I was able to complete the training with the updated code, using numpy 1.26.4. To confirm, here is the list of the files in my explicit directory:

It took 5.5 days to get all the training done. Does that timing sound correct? I ran on 60 cores.

Jerry
Add optional box, energy and force loading to load_xyz() in ConfigurationSet. Add dtype=object for np arrays to allow loading structures with different numbers of atoms (solves one of the issues in #88). Add tests for the plotting function and for functions in ConfigurationSet.
Hi,
I updated mlptrain and ran through the DA explicit solvent example. Everything was going fine until I got to Julia. Which version of Julia should I be using? I'm currently using 1.6.7.
Progress: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:20
┌─────────────┬───────┬───────┬───────┬───────┬───────┐
│ config_type │ #cfgs │ #envs │ #E │ #F │ #V │
│ String │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────────────┼───────┼───────┼───────┼───────┼───────┤
│ nothing │ 10 │ 1186 │ 10 │ 3558 │ 0 │
├─────────────┼───────┼───────┼───────┼───────┼───────┤
│ total │ 10 │ 1186 │ 10 │ 3558 │ 0 │
│ missing │ 0 │ 0 │ 0 │ 0 │ 90 │
└─────────────┴───────┴───────┴───────┴───────┴───────┘
The total number of basis functions is
length(B) = 6424
Assemble LSQ blocks in serial
ERROR: LoadError: z = <6> not found in ZList AtomicNumber[<1>, <8>]
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] z2i
@ ~/.julia/packages/JuLIP/KNi0Z/src/potentials_base.jl:156 [inlined]
[3] z2i
@ ~/.julia/packages/JuLIP/KNi0Z/src/potentials_base.jl:161 [inlined]
[4] _Bidx0(pB::PolyPairBasis{ACE.OrthPolys.TransformedPolys{Float64, PolyTransform{Int64, Float64}, ACE.OrthPolys.OrthPolyBasis{Float64}}, 2}, zi::AtomicNumber, zj::AtomicNumber)
@ ACE.PairPotentials ~/.julia/packages/ACE/OVgdR/src/pairpots/pair_basis.jl:91
[5] energy(pB::PolyPairBasis{ACE.OrthPolys.TransformedPolys{Float64, PolyTransform{Int64, Float64}, ACE.OrthPolys.OrthPolyBasis{Float64}}, 2}, at::Atoms{Float64})
@ ACE.PairPotentials ~/.julia/packages/ACE/OVgdR/src/pairpots/pair_basis.jl:100
[6] (::JuLIP.MLIPs.var"#13#14"{Atoms{Float64}})(B::PolyPairBasis{ACE.OrthPolys.TransformedPolys{Float64, PolyTransform{Int64, Float64}, ACE.OrthPolys.OrthPolyBasis{Float64}}, 2})
@ JuLIP.MLIPs ./none:0
[7] iterate
@ ./generator.jl:47 [inlined]
[8] collect(itr::Base.Generator{Vector{JuLIP.MLIPs.IPBasis}, JuLIP.MLIPs.var"#13#14"{Atoms{Float64}}})
@ Base ./array.jl:681
[9] energy(superB::JuLIP.MLIPs.IPSuperBasis{JuLIP.MLIPs.IPBasis}, at::Atoms{Float64})
@ JuLIP.MLIPs ~/.julia/packages/JuLIP/KNi0Z/src/mlips.jl:141
[10] eval_obs(#unused#::Val{:E}, B::JuLIP.MLIPs.IPSuperBasis{JuLIP.MLIPs.IPBasis}, dat::Dat)
@ IPFitting.DataTypes ~/.julia/packages/IPFitting/Ypo4v/src/datatypes.jl:28
[11] eval_obs(::String, ::JuLIP.MLIPs.IPSuperBasis{JuLIP.MLIPs.IPBasis}, ::Dat)
@ IPFitting.DataTypes ~/.julia/packages/IPFitting/Ypo4v/src/datatypes.jl:13
[12] safe_append!(db::LsqDB, db_lock::Base.Threads.SpinLock, cfg::Dat, okey::String)
@ IPFitting.DB ~/.julia/packages/IPFitting/Ypo4v/src/lsq_db.jl:270
[13] #9
@ ~/.julia/packages/IPFitting/Ypo4v/src/lsq_db.jl:182 [inlined]
[14] #7
@ ~/.julia/packages/IPFitting/Ypo4v/src/obsiter.jl:98 [inlined]
[15] tfor(f::IPFitting.var"#7#9"{Vector{Dat}, IPFitting.DB.var"#9#10"{LsqDB}, Base.Threads.SpinLock, Vector{String}, Vector{Int64}}, rg::UnitRange{Int64}; verbose::Bool, msg::String, costs::Vector{Int64}, maxnthreads::Int64)
@ IPFitting.Tools ~/.julia/packages/IPFitting/Ypo4v/src/tools.jl:22
[16] tfor_observations(configs::Vector{Dat}, callback::IPFitting.DB.var"#9#10"{LsqDB}; verbose::Bool, msg::String, maxnthreads::Int64)
@ IPFitting ~/.julia/packages/IPFitting/Ypo4v/src/obsiter.jl:98
[17] LsqDB(dbpath::String, basis::JuLIP.MLIPs.IPSuperBasis{JuLIP.MLIPs.IPBasis}, configs::Vector{Dat}; verbose::Bool, maxnthreads::Int64)
@ IPFitting.DB ~/.julia/packages/IPFitting/Ypo4v/src/lsq_db.jl:181
[18] LsqDB(dbpath::String, basis::JuLIP.MLIPs.IPSuperBasis{JuLIP.MLIPs.IPBasis}, configs::Vector{Dat})
@ IPFitting.DB ~/.julia/packages/IPFitting/Ypo4v/src/lsq_db.jl:177
[19] top-level scope
@ ~/Duarte-codes/mlp-train/examples/DA_paper/training/explicit/water_sys.jl:59
in expression starting at /cluster/home/tanoury/Duarte-codes/mlp-train/examples/DA_paper/training/explicit/water_sys.jl:59
All the best,
Jerry
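For reference, the `z = <6> not found in ZList AtomicNumber[<1>, <8>]` message in the traceback above indicates that the pair basis was built for H and O only, while a structure containing carbon ended up in the data read by water_sys.jl. A quick standalone check for offending frames is sketched below; the filename is an assumption, so point it at whichever .xyz file the fit actually reads.

```python
# Scan an xyz trajectory and report frames containing anything other than
# H and O (the filename below is an assumption)
allowed = {'H', 'O'}

with open('water_sys_al.xyz') as xyz_file:
    lines = xyz_file.readlines()

i, frame = 0, 0
while i < len(lines):
    n_atoms = int(lines[i].split()[0])
    symbols = {line.split()[0] for line in lines[i + 2:i + 2 + n_atoms]}
    if not symbols <= allowed:
        print(f'frame {frame} contains unexpected elements: {symbols - allowed}')
    i += n_atoms + 2
    frame += 1
```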