Add WiSE electrolyte benchmark (density, X-ray S(q), Li-O RDF)#445
LucaBrugnoli wants to merge 3 commits into ddmms:main from
Conversation
@joehart2001 The code and analysis are complete. The remaining step, I think, is uploading the trajectory data to
Hi @LucaBrugnoli, thanks for the PR! The easiest way is to attach a zip file containing your data to this PR and I can upload it. Hopefully that works.
Here are the 6 data files, one per model. The expected directory structure on S3 is:
Let me know if anything looks off or if you need the data in a different format.
mace-mh-1-omat.zip
Three sub-benchmarks for 21 m LiTFSI/H2O with 6 MLIP models (matpes-r2scan, mace-mpa-0-medium, mace-omat-0-medium, mace-mp-0b3, mace-mh-1-omat, mace-mh-1-omol):
- NPT density vs Gilbert et al. JCED 2017
- X-ray structure factor S(q) vs SAXS experiment
- Li-O RDF coordination numbers vs Watanabe et al. JPCB 2021
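As background on the third sub-benchmark: a Li-O coordination number is conventionally obtained by integrating the RDF up to its first minimum. The sketch below is my own illustration, not the PR's analysis code; the cutoff and density values are placeholders.

```python
import numpy as np

def coordination_number(r, g, rho, r_cut):
    """n(r_cut) = 4 * pi * rho * integral_0^r_cut g(r) r^2 dr,
    with r_cut usually placed at the first minimum of g(r)."""
    mask = r <= r_cut
    integrand = g[mask] * r[mask] ** 2
    # Trapezoidal rule by hand (np.trapz was removed in NumPy 2.0).
    dr = np.diff(r[mask])
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * dr)
    return 4.0 * np.pi * rho * integral

# Sanity check with an ideal gas (g = 1 everywhere): n should equal the
# particle count in a sphere of radius r_cut, i.e. rho * (4/3) * pi * r_cut**3.
r = np.linspace(0.0, 5.0, 2001)
g = np.ones_like(r)
n = coordination_number(r, g, rho=0.033, r_cut=3.0)
```

In real analysis `r` and `g` would come from the trajectory's Li-O RDF and `rho` would be the oxygen number density of the box.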
dba4e00 to ef5aecc
return results

def normalize_metric(value: float, good: float, bad: float) -> float:
This will be taken care of when you build the table using the decorator. You could define your own function if you don't want to use our default; see the docs.
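For illustration, a custom normalisation in the spirit of the function in the diff might look like this. This is a sketch under my own assumptions about the intended semantics (linear map between thresholds, clipped to [0, 1]), not ml-peg's actual default:

```python
def normalize_metric(value: float, good: float, bad: float) -> float:
    """Map a raw metric onto [0, 1]: `good` maps to 1, `bad` maps to 0,
    linear in between, clipped outside. Works whether good < bad
    (error-like metrics) or good > bad (score-like metrics)."""
    score = (value - bad) / (good - bad)
    return max(0.0, min(1.0, score))

# e.g. an error of 0.05 halfway between good=0.0 and bad=0.1 scores 0.5
half = normalize_metric(0.05, good=0.0, bad=0.1)
```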
APP_ROOT = Path(__file__).resolve().parents[3] / "app"
OUT_PATH = APP_ROOT / "data" / "wise_electrolytes" / "density"

MODELS = [
Usually we import this to get all the models; this is better for the future, when we have more models than we do now.
from ml_peg.models.get_models import load_models
from ml_peg.models.models import current_models
# --- Metrics table -----------------------------------------------------------

def build_metrics_table(data: dict[str, dict]) -> dict:
We have decorators to build this automatically; see the tutorial.
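The actual decorator lives in ml-peg and is documented in its tutorial; as a rough illustration of the general pattern only (the name `build_table` follows the review, but the interface below is my guess, not the real API):

```python
import functools

def build_table(func):
    """Hypothetical sketch of a table-building decorator: the wrapped
    function computes metrics for one model, and the decorator assembles
    per-model results into table rows keyed by model name."""
    @functools.wraps(func)
    def wrapper(results: dict[str, dict]) -> list[dict]:
        table = []
        for model, metrics in results.items():
            row = {"model": model}
            row.update(func(metrics))  # per-model metric computation
            table.append(row)
        return table
    return wrapper

@build_table
def density_metrics(metrics: dict) -> dict:
    return {"density_mae": round(metrics["mae"], 3)}

rows = density_metrics({"mace-mp-0b3": {"mae": 0.0123}})
```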
Hi @LucaBrugnoli, thanks for the PR! From what I understand, you've provided us with NVT trajectories for each model, and your calc script then takes these trajectories to calculate e.g. the RDF. Thank you for providing these trajectories! However, to make this test not rely on you computing these for each new model added in the future (as I'm sure you would prefer us running them for you!), I think we need to do some reshuffling. The ideal workflow would be:
analysis
app
- Merge density / rdf / xray_sf into a single litfsi_h2o_21m benchmark
under ml_peg/{calcs,analysis,app}/wise_electrolytes/litfsi_h2o_21m/.
- Adopt the standard ml-peg patterns in the analysis and calc scripts:
load_models(current_models) for model discovery, @build_table for
the metrics table, and metrics.yml for thresholds/tooltips/weights.
- Add a Janus recast of the LAMMPS+symmetrix Adastra production protocol
in ml_peg/calcs/wise_electrolytes/md_reference/calc_md_reference.py
(pytest-skipped reference; documents the exact MD parameters).
- Add the docs page docs/source/user_guide/benchmarks/wise_electrolytes.rst
(and toctree entry) and wire the app docs_url to it.
- Update the parent wise_electrolytes.yml to a single benchmark weight.
Thanks for adding this reference!
Apologies if you've stated this somewhere and I missed it, but how confident are you that this reproduces the LAMMPS dynamics?
I would suggest we don't use pytest.skip, and instead mark them as very slow (@pytest.mark.very_slow). This already requires users to explicitly request it, with the understanding that they will likely take a minimum of several hours on GPU, so we would run a single model at a time.
Since we can't guarantee all models can be run through LAMMPS/with additional acceleration, we would expect to run this test for several, if not all, of the models using the ASE calculator.
While of course it will be considerably slower, it means we can guarantee we can run every model, and that the settings are identical for each model.
We already have several tests that take over a day of GPU time, including #388, so this isn't a problem: it's a high but one-off cost.
- Switch NPT (Melchionna) -> NPT_MTK (Martyna-Tobias-Klein) so the janus reference matches the LAMMPS fix npt formulation used in production. Pass thermostat_chain=3 and barostat_chain=3 explicitly, matching the LAMMPS default chain length.
- Replace pytest.mark.skip with pytest.mark.very_slow per Joseph's review on PR ddmms#445: the reference protocol is now opt-in via --run-very-slow rather than unconditionally skipped, so it can be exercised when validating new models.
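For reference, the standard pytest pattern for such an opt-in marker looks roughly like this. The marker and flag names follow the review comments (`very_slow`, `--run-very-slow`); the real ml-peg conftest may differ:

```python
# conftest.py (sketch): make tests marked @pytest.mark.very_slow skip
# unless the user explicitly passes --run-very-slow.
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--run-very-slow", action="store_true", default=False,
        help="run tests marked very_slow (hours of GPU time per model)",
    )

def pytest_configure(config):
    # Register the marker so pytest doesn't warn about it.
    config.addinivalue_line("markers", "very_slow: opt-in long-running test")

def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-very-slow"):
        return  # flag given: run everything as collected
    skip = pytest.mark.skip(reason="needs --run-very-slow option")
    for item in items:
        if "very_slow" in item.keywords:
            item.add_marker(skip)
```

Unlike an unconditional `pytest.mark.skip`, this keeps the reference protocol runnable on demand, one model at a time.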
Thanks for the code updates! I think this test could also be well suited to the molecular dynamics category instead of its own, as it's quite specific. Opinions, @ElliottKasoar?
Thanks for both messages! I'll let you and @ElliottKasoar decide on the location; happy with whichever directory you prefer.
Pre-review checklist for PR author
Summary
New benchmark for 21 molal LiTFSI/H₂O water-in-salt electrolyte (WiSE), evaluating MLIP foundation models on three experimental observables:
- NPT density vs Gilbert et al. (2017)
- X-ray structure factor S(q) vs SAXS experiment (2021). Computed via dynasor.
- Li-O RDF coordination numbers vs Watanabe et al. (2021)
Further details on the simulation protocol and MLIP assessment for this system: L. Brugnoli, arXiv:2603.22099 (2026).
Linked issue
Resolves #304
Progress
Note: Trajectory data (~500 MB, extxyz) needs to be uploaded to the ml-peg S3 bucket. Data files are available on
request.
Testing
Tested on 6 models:
matpes-r2scan, mace-mpa-0-medium, mace-omat-0-medium, mace-mp-0b3, mace-mh-1-omat, mace-mh-1-omol.
Requirement: ASE < 3.28 (3.28.0 has a bug in ase.io.extxyz.ixyzchunks). Tests run with --noconftest.
New decorators/callbacks
No new callbacks required. The RDF app uses the existing plot_from_table_column from ml-peg utils.