
Add WiSE electrolyte benchmark (density, X-ray S(q), Li-O RDF)#445

Draft
LucaBrugnoli wants to merge 3 commits into ddmms:main from LucaBrugnoli:wise-electrolytes-benchmark

Conversation

@LucaBrugnoli LucaBrugnoli commented Apr 3, 2026

Pre-review checklist for PR author

Summary

New benchmark for 21 molal LiTFSI/H₂O water-in-salt electrolyte (WiSE), evaluating MLIP foundation models on three experimental observables:

  1. Density — NPT equilibrium density vs Gilbert et al., JCED 62, 2056 (2017)
  2. X-ray S(q) — structure factor R-factor vs SAXS data, Zhang et al., JPCB 125, 4501 (2021); computed via dynasor
  3. Li-O RDF — coordination numbers (Li-O_water, Li-O_TFSI) vs Watanabe et al., JPCB 125, 7477 (2021)

Further details on the simulation protocol and MLIP assessment for this system: L. Brugnoli, arXiv:2603.22099 (2026).
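The benchmark computes S(q) with dynasor; as an independent sanity check, the static structure factor can also be obtained from a total pair distribution function by Fourier transform. A minimal numpy sketch of that relation (not the benchmark's dynasor code — the grid, density, and g(r) below are synthetic):

```python
import numpy as np

# np.trapz was renamed np.trapezoid in NumPy 2.0; support both.
trapz = getattr(np, "trapezoid", None) or np.trapz


def structure_factor_from_rdf(r, g, rho, q):
    """Static structure factor from a total pair distribution g(r):

        S(q) = 1 + 4*pi*rho * int r^2 (g(r) - 1) sin(qr)/(qr) dr

    rho is the number density; trapezoidal integration on the sampled r grid.
    """
    q = np.asarray(q)[:, None]  # shape (nq, 1) for broadcasting over r
    # np.sinc(x) = sin(pi x)/(pi x), so sinc(qr/pi) = sin(qr)/(qr)
    integrand = r**2 * (g - 1.0) * np.sinc(q * r / np.pi)
    return 1.0 + 4.0 * np.pi * rho * trapz(integrand, r, axis=1)


# Sanity check: an ideal gas (g(r) = 1 everywhere) has S(q) = 1 for all q.
r = np.linspace(1e-3, 20.0, 2000)
q = np.linspace(0.5, 10.0, 50)
S = structure_factor_from_rdf(r, np.ones_like(r), rho=0.033, q=q)
```

In practice dynasor evaluates S(q) directly from the trajectory rather than through g(r), which avoids the truncation error of the finite-range transform.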

Linked issue

Resolves #304

Progress

  • Calculations
  • Analysis
  • Application
  • Documentation

Note: Trajectory data (~500 MB, extxyz) needs to be uploaded to the ml-peg S3 bucket. Data files are available on
request.

Testing

Tested on 6 models: matpes-r2scan, mace-mpa-0-medium, mace-omat-0-medium, mace-mp-0b3, mace-mh-1-omat,
mace-mh-1-omol.

Requirement: ASE < 3.28 (3.28.0 has a bug in ase.io.extxyz.ixyzchunks). Tests run with --noconftest.

New decorators/callbacks

No new callbacks required. The RDF app uses existing plot_from_table_column from ml-peg utils.

@LucaBrugnoli
Author

@joehart2001 The code and analysis are complete. The remaining step I think is uploading the trajectory data to
the S3 bucket. Could you let me know how to proceed with the upload, please?

@joehart2001
Collaborator


Hi @LucaBrugnoli thanks for the PR! The easiest way is to attach a zip file containing your data to this PR and I can upload it. Hopefully that works

@LucaBrugnoli
Author

Here are the 6 data files, one per model.
Each zip contains:

  • nvt_trajectory.extxyz — 501-frame NVT trajectory (50 ps, p64_w170 cell, 1534 atoms), used for both the X-ray S(q) and
    RDF benchmarks
  • density.json — NPT equilibrium density data (p16_w42 cell, 50 ps), used for the density benchmark

The expected directory structure on S3 is:
wise_electrolytes/xray_sf/{model}/nvt_trajectory.extxyz
wise_electrolytes/density/{model}/density.json
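The density.json files carry the NPT density series used for the density benchmark. A typical way to reduce such a series to an equilibrium value is to discard an equilibration window and block-average the remainder; a minimal sketch (function name and parameters are illustrative, not ml-peg's actual analysis code):

```python
import numpy as np


def equilibrium_density(densities, discard_frac=0.2, n_blocks=5):
    """Mean density and a block-average error estimate from an NPT series.

    discard_frac: fraction of the series dropped as equilibration.
    n_blocks: number of blocks for the standard error of the mean.
    """
    series = np.asarray(densities, dtype=float)
    prod = series[int(len(series) * discard_frac):]  # production window only
    blocks = np.array_split(prod, n_blocks)
    means = np.array([b.mean() for b in blocks])
    sem = means.std(ddof=1) / np.sqrt(n_blocks)  # standard error over blocks
    return prod.mean(), sem


# Trivial check on a constant series (units g/cm^3): mean recovered, zero error.
rho_mean, rho_err = equilibrium_density(np.full(1000, 1.70))
```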

Let me know if anything looks off or if you need the data in a different format.

mace-mh-1-omat.zip
mace-mh-1-omol.zip
mace-mp-0b3.zip
mace-mpa-0-medium.zip
mace-omat-0-medium.zip
matpes-r2scan.zip

@ElliottKasoar added the new benchmark label (Proposals and suggestions for new benchmarks) Apr 9, 2026
Three sub-benchmarks for 21 m LiTFSI/H2O with 6 MLIP models
(matpes-r2scan, mace-mpa-0-medium, mace-omat-0-medium, mace-mp-0b3,
mace-mh-1-omat, mace-mh-1-omol):
- NPT density vs Gilbert et al. JCED 2017
- X-ray structure factor S(q) vs SAXS experiment
- Li-O RDF coordination numbers vs Watanabe et al. JPCB 2021
@LucaBrugnoli force-pushed the wise-electrolytes-benchmark branch from dba4e00 to ef5aecc on April 16, 2026 at 17:40
return results


def normalize_metric(value: float, good: float, bad: float) -> float:
Collaborator


This will be taken care of when you build the table using the decorator. You can define your own function if you don't want to use our default; see the docs.

APP_ROOT = Path(__file__).resolve().parents[3] / "app"
OUT_PATH = APP_ROOT / "data" / "wise_electrolytes" / "density"

MODELS = [
Collaborator

@joehart2001 Apr 21, 2026


Usually we import this to get all the models; this is better for the future, when we have more models than we do now.

from ml_peg.models.get_models import load_models
from ml_peg.models.models import current_models

# --- Metrics table -----------------------------------------------------------


def build_metrics_table(data: dict[str, dict]) -> dict:
Collaborator


We have decorators to build this automatically; see the tutorial.

@joehart2001
Collaborator

Hi @LucaBrugnoli, thanks for the PR! From what I understand, you've provided us with NVT trajectories for each model, and your calc script then takes these trajectories to calculate e.g. the RDF. Thank you for providing these trajectories! However, to make this test not rely on you computing them for each new model added in the future (as I'm sure you would prefer us running them for you!), I think we need to do some reshuffling.

The ideal workflow would be:
calc:

  • this is where your reproducible NVT script would go; you can look at examples such as PR Aqueous Iron Chloride Oxidation States #360 (direct link to line) for an easy way to do this with e.g. Janus (used for many of our MD benchmarks).
  • To keep things modular and separated, we like to keep anything that could be considered analysis in the analysis scripts only. The PR linked computes the RDF as part of the calc, so you could do this too, or separate it out into the analysis script. This format means you will only have one directory for your benchmark instead of three.
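An NVT production run driven through janus-core's CLI might look roughly like the following. This is a hedged sketch: the flag names follow janus-core's `janus md` interface as I understand it and may differ by version, and the structure file and model architecture are placeholders, not the benchmark's actual inputs.

```shell
janus md \
  --ensemble nvt \
  --struct litfsi_h2o_21m.xyz \
  --arch mace_mp \
  --temp 300 \
  --timestep 0.5 \
  --steps 100000 \
  --stats-every 100 \
  --traj-every 100
```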

analysis

  • I've left some comments about using our helpers and decorators, to help keep things consistent and keep up with future changes to table building etc.

app

  • Overall looks good; we just need to add the docs link later. Do you think structure visualisation would be useful in this benchmark? If so, we could help add that.

- Merge density / rdf / xray_sf into a single litfsi_h2o_21m benchmark
  under ml_peg/{calcs,analysis,app}/wise_electrolytes/litfsi_h2o_21m/.
- Adopt the standard ml-peg patterns in the analysis and calc scripts:
  load_models(current_models) for model discovery, @build_table for
  the metrics table, and metrics.yml for thresholds/tooltips/weights.
- Add a Janus recast of the LAMMPS+symmetrix Adastra production protocol
  in ml_peg/calcs/wise_electrolytes/md_reference/calc_md_reference.py
  (pytest-skipped reference; documents the exact MD parameters).
- Add the docs page docs/source/user_guide/benchmarks/wise_electrolytes.rst
  (and toctree entry) and wire the app docs_url to it.
- Update the parent wise_electrolytes.yml to a single benchmark weight.
Collaborator


Thanks for adding this reference!

Apologies if you've stated this somewhere and I missed it, but how confident are you that this reproduces the LAMMPS dynamics?

I would suggest we don't use pytest.skip, and instead mark them as very slow (@pytest.mark.very_slow). This already requires users to explicitly request it, with the understanding that they will likely take a minimum of several hours on GPU, so we would run a single model at a time.

Since we can't guarantee all models can be run through LAMMPS/with additional acceleration, we would expect to run this test for several, if not all, of the models using the ASE calculator.

While of course it will be considerably slower, it means we can guarantee we can run every model, and that the settings are identical for each model.

We already have several tests that take over a day of GPU time, including #388, so this isn't a problem - it's a high but one-off cost.

- Switch NPT (Melchionna) -> NPT_MTK (Martyna-Tobias-Klein) so the
  janus reference matches the LAMMPS fix npt formulation used in
  production. Pass thermostat_chain=3 and barostat_chain=3 explicitly,
  matching the LAMMPS default chain length.
- Replace pytest.mark.skip with pytest.mark.very_slow per Joseph's
  review on PR ddmms#445: the reference protocol is now opt-in via
  --run-very-slow rather than unconditionally skipped, so it can be
  exercised when validating new models.
@joehart2001
Collaborator

Thanks for the code updates! I think this test could also be well suited to the molecular dynamics category instead of its own one, as it's quite specific. Opinions @ElliottKasoar?

@LucaBrugnoli
Author

LucaBrugnoli commented Apr 28, 2026

Thanks for both messages. I'll let you and @ElliottKasoar decide on the location; I'm happy with whichever directory you prefer.
On the LAMMPS/ASE reproducibility question: I'm currently running the protocol on a cluster with MI250X GPUs, with mace-mp-0b3 and mace-matpes-r2scan, to validate it empirically.
I've tried to match the integrator formulations exactly: NVT_NH for LAMMPS fix nvt (Nosé–Hoover chain, length 3) and NPT_MTK for fix npt iso (Martyna–Tobias–Klein chain, length 3), with the same TDAMP=50 fs / PDAMP=500 fs and dt=0.5 fs.
Even so, I don't expect bit-for-bit trajectory agreement: the LAMMPS production runs used the SymmetriX/Kokkos MACE kernels, while ASE/janus uses the reference PyTorch path; combined with different RNG-seeded initial velocities, chaotic divergence is unavoidable. What should agree, within sampling uncertainty, are the statistical observables, i.e. the equilibrium density, RDFs, coordination numbers, and S(q).
I'll update the PR once the runs are done, with the LAMMPS/ASE comparison and the wall-clock cost per ns of MD on a single MI250X GCD, so we have a realistic estimate for running the marker across the registry.
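The coordination numbers being compared across engines are the running integral of the RDF up to its first minimum. A minimal sketch of that integral (synthetic g(r), density, and cutoff; not the benchmark's analysis code):

```python
import numpy as np

# np.trapz was renamed np.trapezoid in NumPy 2.0; support both.
trapz = getattr(np, "trapezoid", None) or np.trapz


def coordination_number(r, g, rho, r_cut):
    """Running coordination number n(r_cut) = 4*pi*rho * int_0^r_cut g(r) r^2 dr.

    rho is the number density of the coordinating species; r_cut is usually
    taken at the first minimum of g(r).
    """
    mask = r <= r_cut
    return 4.0 * np.pi * rho * trapz(g[mask] * r[mask] ** 2, r[mask])


# Sanity check: for a uniform fluid (g = 1 everywhere) the integral is just
# the particle count in a sphere, (4/3) * pi * rho * r_cut^3.
r = np.linspace(0.0, 3.0, 3001)
n = coordination_number(r, np.ones_like(r), rho=0.05, r_cut=3.0)
expected = 4.0 / 3.0 * np.pi * 0.05 * 3.0**3
```

Because the same integral is applied to both the LAMMPS and ASE trajectories, the coordination numbers are insensitive to trajectory-level divergence as long as the underlying g(r) curves agree within sampling noise.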


Labels

new benchmark — Proposals and suggestions for new benchmarks


Development

Successfully merging this pull request may close these issues.

Water-in-salt electrolytes

3 participants