Julia error running DA explicit water training #88
Comments
I think this is due to `TS` being used as both a ConfigurationSet and a logic variable. I had the same problem with the previous script and the current one. I fixed it by changing the two variables: `generate_init_configs(n, bulk_water_logic=True, TS_logic=True)`. Before the fix, I could see the generated water_sys containing the TS for the pure water system.
Dear all, just to clarify: during which section of the training, or for which machine-learning potential, did you experience this issue?
@Hanwen1018 I believe this is related to DA_paper/training/explicit/endo_ace_ex.py. My own testing/reading of the code led me to think this error might be due to this part: `def generate_init_configs(n, bulk_water=True, TS=True):`
Would you mind testing the following code, to check whether the configurations can be generated and saved? `ts_in_water_init = generate_init_configs(n=10, bulk_water=True, TS=True)`
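For anyone following along, a minimal sketch of that test might look like the snippet below. It assumes `generate_init_configs` is importable from endo_ace_ex.py and that `mlt.ConfigurationSet` exposes a `save_xyz` counterpart to the `load_xyz` used in the script; check the actual API before relying on it.

```python
# Minimal test sketch (assumptions: generate_init_configs is importable from
# endo_ace_ex.py, and ConfigurationSet has a save_xyz method mirroring load_xyz)
from endo_ace_ex import generate_init_configs

if __name__ == '__main__':
    # Generate 10 TS-in-water configurations and write them out for inspection
    ts_in_water_init = generate_init_configs(n=10, bulk_water=True, TS=True)
    ts_in_water_init.save_xyz(filename='ts_in_water_init_config.xyz')
```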
Thanks @Hanwen1018! I tested the following code following your suggestions: `if __name__ == '__main__': ...`
ts_in_water_init_config.xyz looks perfect, but there are problems with water_sys_init_config.xyz - I believe that is the error @tanoury1 and I met (I had the same issue in the earlier version). Even though it is a water system, water_sys_init_config.xyz contains the TS (see below). When running Julia to fit the potential, the input *.jl file only has two elements for the water system - I think that is what the error was complaining about. Hopefully I understood the code correctly, so I modified bulk_water and TS.
It is identical to your example under DA_paper (endo_ace_ex.py), but I changed the main part to the following for the testing: `if __name__ == '__main__': ...`
I saw your modification. It is correct. Now, the function should be:

```python
def generate_init_configs(n, bulk_water_logic=True, TS_logic=True):
    """Generate initial configurations to train the potential.

    It can generate three sets of initial configurations (pure water,
    TS immersed in water, and TS bound to two water molecules) by
    modifying the boolean variables.

    n:                number of init_configs
    bulk_water_logic: whether to include a solution (bulk water)
    TS_logic:         whether to include the TS of the reaction in the system
    """
    init_configs = mlt.ConfigurationSet()

    TS = mlt.ConfigurationSet()
    TS.load_xyz(filename='cis_endo_TS_wB97M.xyz', charge=0, mult=1)
    TS = TS[0]
    TS.box = Box([11, 11, 11])
    TS.charge = 0
    TS.mult = 1

    if bulk_water_logic:
        # TS immersed in a water box
        if TS_logic:
            water_mol = mlt.Molecule(name='h2o.xyz')
            water_system = mlt.System(water_mol, box=Box([11, 11, 11]))
            water_system.add_molecules(water_mol, num=43)
            for i in range(n):
                solvated = solvation(
                    solute_config=TS,
                    solvent_config=water_system.random_configuration(),
                    apm=3,
                    radius=1.7,
                )
                init_configs.append(solvated)

        # pure water box
        else:
            water_mol = mlt.Molecule(name='h2o.xyz')
            water_system = mlt.System(water_mol, box=Box([9.32, 9.32, 9.32]))
            water_system.add_molecules(water_mol, num=26)
            for i in range(n):
                pure_water = water_system.random_configuration()
                init_configs.append(pure_water)

    # TS bound to two water molecules at the carbonyl group to form hydrogen bonds
    else:
        assert TS_logic is True, 'cannot generate initial configuration'
        for i in range(n):
            TS_with_water = add_water(solute=TS, n=2)
            init_configs.append(TS_with_water)

    # Change the box of the system to an extremely large one to imitate a
    # cluster system; the box is needed for the ACE potential
    for config in init_configs:
        config.box = Box([100, 100, 100])

    return init_configs
```

Would you mind testing this function for pure water configurations?
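As a usage example for the pure-water case asked about above, something like the following should work. It again assumes a `save_xyz` method and that iterating a `ConfigurationSet` yields configurations with an `atoms` list of atoms carrying a `label` attribute; those attribute names are assumptions about the API.

```python
# Pure-water configurations: bulk water box, no TS
water_sys_init = generate_init_configs(n=10, bulk_water_logic=True, TS_logic=False)
water_sys_init.save_xyz(filename='water_sys_init_config.xyz')

# Sanity check that no TS atoms leaked into the pure-water set:
# every configuration should contain H and O only
for config in water_sys_init:
    elements = {atom.label for atom in config.atoms}
    assert elements <= {'H', 'O'}, f'unexpected elements: {elements}'
```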
Yes! That is what I did - it worked. There are a few places in the script (in the main section) which need to be fixed. Thanks for your help! We are really interested in using this package.
Thank you for testing and letting us know about the bugs. We will fix them. If you have any questions, please contact us.
OK. That worked. I edited the file. Needed to make an additional edit (unless I missed something): updated bulk_water and TS as you noted. Don't know if you can help me with my next issue: IPFitting is not precompiling when running install_ace.py. I tried to do it manually in Julia, but still no luck. The ACE folks seem to be moving away from IPFitting and replacing it with ACEfitting (or something like that). Did you experience the same issues with IPFitting? What version of Julia are you using?

Thanks,
I am using julia-1.7.1, and it always complains about IPFitting... but it seems to run afterwards.
OK. Then it must be something else. Below is the full, and lengthy, error output:

2024-03-15 18:20:04 hpchead mlptrain.log[313412] WARNING Save called without defining what energy and forces to print. Had true energies so using those

Jerry
Hi, please try to install "JuLIP", version="0.10.1"; "ACE", version="0.8.4"; and also "IPFitting", version="0.5.0". Also, would you mind sharing water_sys.jl with us?
Yep. I've got those exact versions installed. I'm running Julia 1.10. water_sys.jl is attached. Also, perhaps it may be an ACE error. Near the top of the full error output, there is: RuntimeError: ACE train errored with:
This might have something to do with my Julia version. I installed v1.7.1 using 'juliaup add 1.7.1', then added the packages again for this version of Julia. I got further along running water_sys.jl directly from the command line, but got an 'ERROR: LoadError: MethodError: no method matching pretty_table'. I'm going to do a fresh conda install of mlptrain-ace with Julia 1.7.1 as my default version and see how things go.
Hi, how is your reinstallation going? If it still doesn't work, you can try to train with the MACE potential first, and we can prepare an installation script for you in the meantime.
Hi. It did not work for me with v1.10.0.

Jerry
I am getting an error, but I don't know if it means anything. The job continues to run:

2024-03-20 13:37:23 hpchead autode.log.log[297226] INFO Getting gradients from tmp_orca.out
If this is related to the TS_in_water system, I noticed that there is another potential problem in the script: the generated (10) configs don't all have exactly the same number of atoms, since water is packed within a box. I hardcoded it to include only those with the same number of waters in the valid configs and discarded the rest (see #57 (comment)); a rough sketch of that filter is below.
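The following is only an illustrative sketch of that workaround, not the exact hard-coded change; `len(config.atoms)` as the size measure and `import mlptrain as mlt` are assumptions.

```python
from collections import Counter

import mlptrain as mlt

# Keep only the generated configurations whose atom count matches the most
# common one, so every snapshot has the same number of water molecules
sizes = [len(config.atoms) for config in ts_in_water_init]
target_size = Counter(sizes).most_common(1)[0][0]

valid_configs = mlt.ConfigurationSet()
for config in ts_in_water_init:
    if len(config.atoms) == target_size:
        valid_configs.append(config)
```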
Hi, the potential can be trained on systems of different sizes. I want to double-check: during which section of the training, or for which machine-learning potential, did you experience this issue? Did you obtain trained potentials?
I don't have a trained potential yet. The training is still running. I have 49 dataset files in the datasets directory. Not sure how to determine how close I am to convergence.
That is correct - I had to separate the training (pure water, TS in water, etc.) and combine later due to the HPC queue time. This works, which means they can be trained on datasets with different numbers of atoms. But weirdly, for the training on TS_in_water only, I had this issue and I hard-coded it to select only those with the same number of waters. Sorry for injecting my query into the discussion.
Do you have any *_al.xyz or *_al.npz files? I just want to clarify whether the issues you have come from starting the training of another potential.
Oh, this is because of the recent implementation of metadynamics in AL. Even though the MTD doesn't apply in this case, the configurations need to pass the MTD bias check first, which requires configurations of the same size. I will update the example code.
I have water_sys_al.xyz and water_sys_al.npz files. So, things seem to be going as expected.
Hi, looks like I may have run into a numpy error. I am using numpy version 1.26.4. I've attached the entire output from the training. To make sure my endo_ace_ex.py is correct (after making the edits we discussed above), here is the portion where the code errored out (I think...):

generate sub training set of pure water system by AL training
My error is identical to @tanoury1's. You can see from their output that pure water worked, and it happened after training the TS_in_water MLP based on 10 snapshots. This problem goes away if I hard-code in the same-number-of-waters filter described above.
Your MLP for pure water worked, though! So I doubt that it is because of numpy.
Hi, I think the issue is indeed coming from numpy, but it is triggered only when the training set contains structures with different numbers of molecules. I have corrected it in PR #89; hopefully that will be enough. Please test the update. Failing that, try downgrading numpy. As my PR currently fails the tests, I will need to check in a bit more detail what is happening in the arrays.
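To illustrate the numpy behaviour being referred to, here is a standalone example (illustrative only, not code from the PR) of why ragged per-structure arrays need `dtype=object`:

```python
import numpy as np

# Per-structure force arrays with different numbers of atoms cannot be
# stacked into a regular ndarray; numpy >= 1.24 (the thread uses 1.26.4)
# raises a ValueError for such ragged input unless dtype=object is given.
forces_small = [[0.1, 0.2, 0.3]] * 30   # a structure with 30 atoms
forces_large = [[0.1, 0.2, 0.3]] * 50   # a structure with 50 atoms

# np.array([forces_small, forces_large])        # ValueError: inhomogeneous shape
ragged = np.array([forces_small, forces_large], dtype=object)
print(ragged.shape)  # (2,) - a 1-D object array holding the two lists
```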
I was able to complete the training with the updated code, using numpy 1.26.4. To confirm, here is the list of the files in my explicit directory:

It took 5.5 days to get all the training done. Does that timing sound correct? I ran on 60 cores.

Jerry
Add optional box, energy and force loading to load_xyz() in ConfigurationSet. Add dtype=object for np arrays to allow loading structures with different numbers of atoms (solves one of the issues in #88). Add tests for the plotting function and for functions in ConfigurationSet.
Hi,
I updated mlptrain and ran through the DA explicit solvent example. Everything was going fine until I got to Julia. Which version of Julia should I be using? I'm currently using 1.6.7.
Progress: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:20
┌─────────────┬───────┬───────┬───────┬───────┬───────┐
│ config_type │ #cfgs │ #envs │ #E │ #F │ #V │
│ String │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────────────┼───────┼───────┼───────┼───────┼───────┤
│ nothing │ 10 │ 1186 │ 10 │ 3558 │ 0 │
├─────────────┼───────┼───────┼───────┼───────┼───────┤
│ total │ 10 │ 1186 │ 10 │ 3558 │ 0 │
│ missing │ 0 │ 0 │ 0 │ 0 │ 90 │
└─────────────┴───────┴───────┴───────┴───────┴───────┘
The total number of basis functions is
length(B) = 6424
Assemble LSQ blocks in serial
ERROR: LoadError: z = <6> not found in ZList AtomicNumber[<1>, <8>]
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] z2i
@ ~/.julia/packages/JuLIP/KNi0Z/src/potentials_base.jl:156 [inlined]
[3] z2i
@ ~/.julia/packages/JuLIP/KNi0Z/src/potentials_base.jl:161 [inlined]
[4] _Bidx0(pB::PolyPairBasis{ACE.OrthPolys.TransformedPolys{Float64, PolyTransform{Int64, Float64}, ACE.OrthPolys.OrthPolyBasis{Float64}}, 2}, zi::AtomicNumber, zj::AtomicNumber)
@ ACE.PairPotentials ~/.julia/packages/ACE/OVgdR/src/pairpots/pair_basis.jl:91
[5] energy(pB::PolyPairBasis{ACE.OrthPolys.TransformedPolys{Float64, PolyTransform{Int64, Float64}, ACE.OrthPolys.OrthPolyBasis{Float64}}, 2}, at::Atoms{Float64})
@ ACE.PairPotentials ~/.julia/packages/ACE/OVgdR/src/pairpots/pair_basis.jl:100
[6] (::JuLIP.MLIPs.var"#13#14"{Atoms{Float64}})(B::PolyPairBasis{ACE.OrthPolys.TransformedPolys{Float64, PolyTransform{Int64, Float64}, ACE.OrthPolys.OrthPolyBasis{Float64}}, 2})
@ JuLIP.MLIPs ./none:0
[7] iterate
@ ./generator.jl:47 [inlined]
[8] collect(itr::Base.Generator{Vector{JuLIP.MLIPs.IPBasis}, JuLIP.MLIPs.var"#13#14"{Atoms{Float64}}})
@ Base ./array.jl:681
[9] energy(superB::JuLIP.MLIPs.IPSuperBasis{JuLIP.MLIPs.IPBasis}, at::Atoms{Float64})
@ JuLIP.MLIPs ~/.julia/packages/JuLIP/KNi0Z/src/mlips.jl:141
[10] eval_obs(#unused#::Val{:E}, B::JuLIP.MLIPs.IPSuperBasis{JuLIP.MLIPs.IPBasis}, dat::Dat)
@ IPFitting.DataTypes ~/.julia/packages/IPFitting/Ypo4v/src/datatypes.jl:28
[11] eval_obs(::String, ::JuLIP.MLIPs.IPSuperBasis{JuLIP.MLIPs.IPBasis}, ::Dat)
@ IPFitting.DataTypes ~/.julia/packages/IPFitting/Ypo4v/src/datatypes.jl:13
[12] safe_append!(db::LsqDB, db_lock::Base.Threads.SpinLock, cfg::Dat, okey::String)
@ IPFitting.DB ~/.julia/packages/IPFitting/Ypo4v/src/lsq_db.jl:270
[13] #9
@ ~/.julia/packages/IPFitting/Ypo4v/src/lsq_db.jl:182 [inlined]
[14] #7
@ ~/.julia/packages/IPFitting/Ypo4v/src/obsiter.jl:98 [inlined]
[15] tfor(f::IPFitting.var"#7#9"{Vector{Dat}, IPFitting.DB.var"#9#10"{LsqDB}, Base.Threads.SpinLock, Vector{String}, Vector{Int64}}, rg::UnitRange{Int64}; verbose::Bool, msg::String, costs::Vector{Int64}, maxnthreads::Int64)
@ IPFitting.Tools ~/.julia/packages/IPFitting/Ypo4v/src/tools.jl:22
[16] tfor_observations(configs::Vector{Dat}, callback::IPFitting.DB.var"#9#10"{LsqDB}; verbose::Bool, msg::String, maxnthreads::Int64)
@ IPFitting ~/.julia/packages/IPFitting/Ypo4v/src/obsiter.jl:98
[17] LsqDB(dbpath::String, basis::JuLIP.MLIPs.IPSuperBasis{JuLIP.MLIPs.IPBasis}, configs::Vector{Dat}; verbose::Bool, maxnthreads::Int64)
@ IPFitting.DB ~/.julia/packages/IPFitting/Ypo4v/src/lsq_db.jl:181
[18] LsqDB(dbpath::String, basis::JuLIP.MLIPs.IPSuperBasis{JuLIP.MLIPs.IPBasis}, configs::Vector{Dat})
@ IPFitting.DB ~/.julia/packages/IPFitting/Ypo4v/src/lsq_db.jl:177
[19] top-level scope
@ ~/Duarte-codes/mlp-train/examples/DA_paper/training/explicit/water_sys.jl:59
in expression starting at /cluster/home/tanoury/Duarte-codes/mlp-train/examples/DA_paper/training/explicit/water_sys.jl:59
All the best,
Jerry
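For reference, the `z = <6> not found in ZList AtomicNumber[<1>, <8>]` message in the traceback above indicates that the pair basis was built for H and O only, while a structure containing carbon ended up in the data read by water_sys.jl. A quick standalone check for offending frames is sketched below; the filename is an assumption, so point it at whichever .xyz file the fit actually reads.

```python
# Scan an xyz trajectory and report frames containing anything other than
# H and O (the filename below is an assumption)
allowed = {'H', 'O'}

with open('water_sys_al.xyz') as xyz_file:
    lines = xyz_file.readlines()

i, frame = 0, 0
while i < len(lines):
    n_atoms = int(lines[i].split()[0])
    symbols = {line.split()[0] for line in lines[i + 2:i + 2 + n_atoms]}
    if not symbols <= allowed:
        print(f'frame {frame} contains unexpected elements: {symbols - allowed}')
    i += n_atoms + 2
    frame += 1
```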