first commit

qiqb-osaka · Dec 16, 2024 · e11ff40
Showing 5 changed files with 302 additions and 0 deletions.
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 TShiotaSS
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -0,0 +1,137 @@
+# MACE_Osaka24 models
+This repository provides the models and training scripts for MACE-Osaka24, a family of multi-domain universal machine learning interatomic potentials (MLIPs) capable of accurately describing both crystalline and molecular systems.
+
+The MACE-Osaka24 model is a universal MLIP trained on datasets of both crystals and molecules, which were generated using a dataset integration technique called "Total Energy Alignment" that combines first-principles calculations performed under different conditions.
+
+Its architecture is based on the first-generation MACE model. To use the models, please install the [MACE code](https://github.com/ACEsuit/mace).
+
+## Models
+
+The first generation of models is available in the [MACE_Osaka24 v0.0.1 release](https://github.com/TShiotaSS/mace_osaka24/releases/tag/v0.0.1).
+
+If you use the models, please cite:
+
+```bib
+@article{batatia2023foundation,
+      title={A foundation model for atomistic materials chemistry},
+      author={Ilyes Batatia and Philipp Benner and Yuan Chiang and Alin M. Elena and Dávid P. Kovács and Janosh Riebesell and Xavier R. Advincula and Mark Asta and William J. Baldwin and Noam Bernstein and Arghya Bhowmik and Samuel M. Blau and Vlad Cărare and James P. Darby and Sandip De and Flaviano Della Pia and Volker L. Deringer and Rokas Elijošius and Zakariya El-Machachi and Edvin Fako and Andrea C. Ferrari and Annalena Genreith-Schriever and Janine George and Rhys E. A. Goodall and Clare P. Grey and Shuang Han and Will Handley and Hendrik H. Heenen and Kersti Hermansson and Christian Holm and Jad Jaafar and Stephan Hofmann and Konstantin S. Jakob and Hyunwook Jung and Venkat Kapil and Aaron D. Kaplan and Nima Karimitari and Namu Kroupa and Jolla Kullgren and Matthew C. Kuner and Domantas Kuryla and Guoda Liepuoniute and Johannes T. Margraf and Ioan-Bogdan Magdău and Angelos Michaelides and J. Harry Moore and Aakash A. Naik and Samuel P. Niblett and Sam Walton Norwood and Niamh O'Neill and Christoph Ortner and Kristin A. Persson and Karsten Reuter and Andrew S. Rosen and Lars L. Schaaf and Christoph Schran and Eric Sivonxay and Tamás K. Stenczel and Viktor Svahn and Christopher Sutton and Cas van der Oord and Eszter Varga-Umbrich and Tejs Vegge and Martin Vondrák and Yangshuai Wang and William C. Witt and Fabian Zills and Gábor Csányi},
+      year={2023},
+      eprint={2401.00096},
+      archivePrefix={arXiv},
+      primaryClass={physics.chem-ph}
+}
+
+@article{deng2023chgnet,
+      title={CHGNet: Pretrained universal neural network potential for charge-informed atomistic modeling},
+      author={Bowen Deng and Peichen Zhong and KyuJung Jun and Janosh Riebesell and Kevin Han and Christopher J. Bartel and Gerbrand Ceder},
+      year={2023},
+      eprint={2302.14231},
+      archivePrefix={arXiv},
+      primaryClass={cond-mat.mtrl-sci}
+}
+
+@inproceedings{NEURIPS2022_4a36c3c5,
+ author = {Batatia, Ilyes and Kovacs, David P and Simm, Gregor and Ortner, Christoph and Csanyi, Gabor},
+ booktitle = {Advances in Neural Information Processing Systems},
+ editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
+ pages = {11423--11436},
+ publisher = {Curran Associates, Inc.},
+ title = {MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields},
+ url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/4a36c3c51af11ed9f34615b81edb5bbc-Paper-Conference.pdf},
+ volume = {35},
+ year = {2022}
+}
+```
+
+## Training scripts
+
+We provide training scripts for the models in this repository. The latest training command line is found in [`mace_osaka24/mace-osaka24-large.sh`](mace_osaka24/mace-osaka24-large.sh).
+
+## Training data
+
+The integrated inorganic–organic dataset used to train the models is available at [figshare](). It combines the inorganic MPtrj dataset with the organic SPICE, QMugs, water clusters, and Tripeptides (OFF23) datasets. If you use any of these datasets, please cite the following papers.
+
+```bib
+@article{deng2023chgnet,
+      title={CHGNet: Pretrained universal neural network potential for charge-informed atomistic modeling},
+      author={Bowen Deng and Peichen Zhong and KyuJung Jun and Janosh Riebesell and Kevin Han and Christopher J. Bartel and Gerbrand Ceder},
+      year={2023},
+      eprint={2302.14231},
+      archivePrefix={arXiv},
+      primaryClass={cond-mat.mtrl-sci}
+}
+
+@misc{kovacs2023maceoff23,
+      title={MACE-OFF23: Transferable Machine Learning Force Fields for Organic Molecules}, 
+      author={Dávid Péter Kovács and J. Harry Moore and Nicholas J. Browning and Ilyes Batatia and Joshua T. Horton and Venkat Kapil and William C. Witt and Ioan-Bogdan Magdău and Daniel J. Cole and Gábor Csányi},
+      year={2023},
+      eprint={2312.15211},
+      archivePrefix={arXiv},
+}
+
+
+@article{eastman2023spice,
+  title={SPICE, a dataset of drug-like molecules and peptides for training machine learning potentials},
+  author={Eastman, Peter and Behara, Pavan Kumar and Dotson, David L and Galvelis, Raimondas and Herr, John E and Horton, Josh T and Mao, Yuezhi and Chodera, John D and Pritchard, Benjamin P and Wang, Yuanqing and others},
+  journal={Scientific Data},
+  volume={10},
+  number={1},
+  pages={11},
+  year={2023},
+  publisher={Nature Publishing Group UK London}
+}
+
+@article{donchev2021quantum,
+  title={Quantum chemical benchmark databases of gold-standard dimer interaction energies},
+  author={Donchev, Alexander G and Taube, Andrew G and Decolvenaere, Elizabeth and Hargus, Cory and McGibbon, Robert T and Law, Ka-Hei and Gregersen, Brent A and Li, Je-Luen and Palmo, Kim and Siva, Karthik and others},
+  journal={Scientific Data},
+  volume={8},
+  number={1},
+  pages={55},
+  year={2021},
+  publisher={Nature Publishing Group UK London}
+}
+
+@article{isert2022qmugs,
+  title={QMugs, quantum mechanical properties of drug-like molecules},
+  author={Isert, Clemens and Atz, Kenneth and Jim{\'e}nez-Luna, Jos{\'e} and Schneider, Gisbert},
+  journal={Scientific Data},
+  volume={9},
+  number={1},
+  pages={273},
+  year={2022},
+  publisher={Nature Publishing Group UK London}
+}
+```
+
+## Example
+
+In this example, the energies of a silicon crystal and an acetic acid molecule are calculated using the universal multi-domain MLIP MACE-Osaka24 and the Atomic Simulation Environment (ASE).
+
+```python
+from ase.build import bulk, molecule
+from mace.calculators import MACECalculator
+
+# Load the model once; the same calculator is reused for both systems.
+calculator = MACECalculator(model_path='/path-to-mace-osaka24/mace-osaka24-large.model', device='cpu')
+
+# Periodic system: diamond-structure silicon (lattice constant in angstrom).
+si = bulk('Si', 'diamond', a=5.43)
+si.calc = calculator
+energy_si = si.get_potential_energy()
+print("Single-point energy of diamond Si:", energy_si)
+
+# Isolated molecule: acetic acid from ASE's built-in molecule database.
+acid = molecule('CH3COOH')
+acid.calc = calculator
+energy_acid = acid.get_potential_energy()
+print("Single-point energy of acetic acid:", energy_acid)
+```
+
+## Contributors
+This project was developed by:
+
+- Tomoya Shiota (@TShiotaSS) 
+- Kenji Ishihara (@kenji-ishihara-os)  
+- Toshio Mori (@forest1040)
+- Wataru Mizukami (@wmizukami)
diff --git a/mace_osaka24/mace-osaka24-large.sh b/mace_osaka24/mace-osaka24-large.sh
@@ -0,0 +1,48 @@
+python3 /opt/src/new_mace_0729/mace/mace/cli/run_train.py \
+        --name="${MODEL_NAME}" \
+        --train_file="/dataset/train" \
+        --valid_file="/dataset/val" \
+        --test_file="/dataset/test" \
+        --statistics_file="/dataset/statistics.json" \
+        --loss='universal' \
+        --energy_weight=1 \
+        --forces_weight=10 \
+        --compute_stress=True \
+        --stress_weight=100 \
+        --stress_key='stress' \
+        --eval_interval=1 \
+        --error_table='PerAtomMAE' \
+        --model="ScaleShiftMACE" \
+        --interaction_first="RealAgnosticResidualInteractionBlock" \
+        --interaction="RealAgnosticResidualInteractionBlock" \
+        --num_interactions=2 \
+        --correlation=3 \
+        --max_ell=3 \
+        --r_max=4.5 \
+        --max_L=2 \
+        --num_channels=128 \
+        --num_radial_basis=10 \
+        --MLP_irreps="16x0e" \
+        --scaling='rms_forces_scaling' \
+        --num_workers=64 \
+        --lr=0.005 \
+        --weight_decay=1e-8 \
+        --ema \
+        --ema_decay=0.995 \
+        --scheduler_patience=5 \
+        --batch_size=16 \
+        --valid_batch_size=32 \
+        --max_num_epochs=200 \
+        --patience=50 \
+        --amsgrad \
+        --device=cuda \
+        --seed=1 \
+        --clip_grad=100 \
+        --keep_checkpoints \
+        --save_cpu \
+        --restart_latest \
+        --log_dir="/mnt/logs/${MODEL_NAME}" \
+        --model_dir="/mnt/models/${MODEL_NAME}" \
+        --checkpoints_dir="/mnt/checkpoints/${MODEL_NAME}" \
+        --results_dir="/mnt/results/${MODEL_NAME}" \
+        --distributed >> "${RESULT_FILE}"
diff --git a/mace_osaka24/mace-osaka24-medium.sh b/mace_osaka24/mace-osaka24-medium.sh
@@ -0,0 +1,48 @@
+python3 /opt/src/new_mace_0729/mace/mace/cli/run_train.py \
+        --name="${MODEL_NAME}" \
+        --train_file="/dataset/train" \
+        --valid_file="/dataset/val" \
+        --test_file="/dataset/test" \
+        --statistics_file="/dataset/statistics.json" \
+        --loss='universal' \
+        --energy_weight=1 \
+        --forces_weight=10 \
+        --compute_stress=True \
+        --stress_weight=100 \
+        --stress_key='stress' \
+        --eval_interval=1 \
+        --error_table='PerAtomMAE' \
+        --model="ScaleShiftMACE" \
+        --interaction_first="RealAgnosticResidualInteractionBlock" \
+        --interaction="RealAgnosticResidualInteractionBlock" \
+        --num_interactions=2 \
+        --correlation=3 \
+        --max_ell=3 \
+        --r_max=4.5 \
+        --max_L=1 \
+        --num_channels=128 \
+        --num_radial_basis=10 \
+        --MLP_irreps="16x0e" \
+        --scaling='rms_forces_scaling' \
+        --num_workers=64 \
+        --lr=0.005 \
+        --weight_decay=1e-8 \
+        --ema \
+        --ema_decay=0.995 \
+        --scheduler_patience=5 \
+        --batch_size=16 \
+        --valid_batch_size=32 \
+        --max_num_epochs=200 \
+        --patience=50 \
+        --amsgrad \
+        --device=cuda \
+        --seed=1 \
+        --clip_grad=100 \
+        --keep_checkpoints \
+        --save_cpu \
+        --restart_latest \
+        --log_dir="/mnt/logs/${MODEL_NAME}" \
+        --model_dir="/mnt/models/${MODEL_NAME}" \
+        --checkpoints_dir="/mnt/checkpoints/${MODEL_NAME}" \
+        --results_dir="/mnt/results/${MODEL_NAME}" \
+        --distributed >> "${RESULT_FILE}"
diff --git a/mace_osaka24/mace-osaka24-small.sh b/mace_osaka24/mace-osaka24-small.sh
@@ -0,0 +1,48 @@
+python3 /opt/src/new_mace_0729/mace/mace/cli/run_train.py \
+        --name="${MODEL_NAME}" \
+        --train_file="/dataset/train" \
+        --valid_file="/dataset/val" \
+        --test_file="/dataset/test" \
+        --statistics_file="/dataset/statistics.json" \
+        --loss='universal' \
+        --energy_weight=1 \
+        --forces_weight=10 \
+        --compute_stress=True \
+        --stress_weight=100 \
+        --stress_key='stress' \
+        --eval_interval=1 \
+        --error_table='PerAtomMAE' \
+        --model="ScaleShiftMACE" \
+        --interaction_first="RealAgnosticResidualInteractionBlock" \
+        --interaction="RealAgnosticResidualInteractionBlock" \
+        --num_interactions=2 \
+        --correlation=3 \
+        --max_ell=3 \
+        --r_max=4.5 \
+        --max_L=0 \
+        --num_channels=128 \
+        --num_radial_basis=10 \
+        --MLP_irreps="16x0e" \
+        --scaling='rms_forces_scaling' \
+        --num_workers=64 \
+        --lr=0.005 \
+        --weight_decay=1e-8 \
+        --ema \
+        --ema_decay=0.995 \
+        --scheduler_patience=5 \
+        --batch_size=16 \
+        --valid_batch_size=32 \
+        --max_num_epochs=200 \
+        --patience=50 \
+        --amsgrad \
+        --device=cuda \
+        --seed=1 \
+        --clip_grad=100 \
+        --keep_checkpoints \
+        --save_cpu \
+        --restart_latest \
+        --log_dir="/mnt/logs/${MODEL_NAME}" \
+        --model_dir="/mnt/models/${MODEL_NAME}" \
+        --checkpoints_dir="/mnt/checkpoints/${MODEL_NAME}" \
+        --results_dir="/mnt/results/${MODEL_NAME}" \
+        --distributed >> "${RESULT_FILE}"
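
Note on the three training scripts: as added in this commit, the large, medium, and small scripts are identical except for `--max_L` (2, 1, and 0 respectively), with `--num_channels=128` in all three. The sketch below is not part of the commit. It tabulates that difference and shows what the hidden feature representation would plausibly look like for each size, under the assumption that the hidden features hold `num_channels` copies of every irrep up to `max_L` with spherical-harmonic parity. The helper `hidden_irreps` and the dictionary `MAX_L_BY_SIZE` are hypothetical names used only for illustration; consult the MACE source for the exact rule it applies.

```python
# Values copied from the three scripts in this commit.
NUM_CHANNELS = 128
MAX_L_BY_SIZE = {"small": 0, "medium": 1, "large": 2}


def hidden_irreps(num_channels: int, max_l: int) -> str:
    """Build an e3nn-style irreps string (hypothetical helper).

    Assumption: the hidden features contain `num_channels` copies of each
    irrep l = 0..max_l, with parity (-1)**l ("e" for even l, "o" for odd l).
    """
    terms = []
    for l in range(max_l + 1):
        parity = "e" if l % 2 == 0 else "o"
        terms.append(f"{num_channels}x{l}{parity}")
    return " + ".join(terms)


# Print one line per model size with its --max_L and the derived irreps.
for size, max_l in MAX_L_BY_SIZE.items():
    print(f"{size:<6} --max_L={max_l}  ->  {hidden_irreps(NUM_CHANNELS, max_l)}")
```

Under that assumption, only the large model carries l = 2 features, which is the main architectural difference between the three sizes.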