
Add WiSE electrolyte benchmark (density, X-ray S(q), Li-O RDF)#445

Draft
LucaBrugnoli wants to merge 3 commits into ddmms:main from LucaBrugnoli:wise-electrolytes-benchmark

Conversation

@LucaBrugnoli LucaBrugnoli commented Apr 3, 2026

Pre-review checklist for PR author

Summary

New benchmark for 21 molal LiTFSI/H₂O water-in-salt electrolyte (WiSE), evaluating MLIP foundation models on three experimental observables:

  1. Density — NPT equilibrium density vs Gilbert et al., JCED 62, 2056 (2017)
  2. X-ray S(q) — structure factor R-factor vs SAXS data, Zhang et al., JPCB 125, 4501 (2021); computed via dynasor
  3. Li-O RDF — coordination numbers (Li-O_water, Li-O_TFSI) vs Watanabe et al., JPCB 125, 7477 (2021)

Further details on the simulation protocol and MLIP assessment for this system: L. Brugnoli, arXiv:2603.22099 (2026).
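The benchmark computes S(q) with dynasor; as an independent sanity check, the static structure factor can also be obtained from a total pair distribution function by Fourier transform. A minimal numpy sketch of that relation (not the benchmark's dynasor code — the grid, density, and g(r) below are synthetic):

```python
import numpy as np

# np.trapz was renamed np.trapezoid in NumPy 2.0; support both.
trapz = getattr(np, "trapezoid", None) or np.trapz


def structure_factor_from_rdf(r, g, rho, q):
    """Static structure factor from a total pair distribution g(r):

        S(q) = 1 + 4*pi*rho * int r^2 (g(r) - 1) sin(qr)/(qr) dr

    rho is the number density; trapezoidal integration on the sampled r grid.
    """
    q = np.asarray(q)[:, None]  # shape (nq, 1) for broadcasting over r
    # np.sinc(x) = sin(pi x)/(pi x), so sinc(qr/pi) = sin(qr)/(qr)
    integrand = r**2 * (g - 1.0) * np.sinc(q * r / np.pi)
    return 1.0 + 4.0 * np.pi * rho * trapz(integrand, r, axis=1)


# Sanity check: an ideal gas (g(r) = 1 everywhere) has S(q) = 1 for all q.
r = np.linspace(1e-3, 20.0, 2000)
q = np.linspace(0.5, 10.0, 50)
S = structure_factor_from_rdf(r, np.ones_like(r), rho=0.033, q=q)
```

In practice dynasor evaluates S(q) directly from the trajectory rather than through g(r), which avoids the truncation error of the finite-range transform.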

Linked issue

Resolves #304

Progress

  • Calculations
  • Analysis
  • Application
  • Documentation

Note: Trajectory data (~500 MB, extxyz) needs to be uploaded to the ml-peg S3 bucket. Data files are available on
request.

Testing

Tested on 6 models: matpes-r2scan, mace-mpa-0-medium, mace-omat-0-medium, mace-mp-0b3, mace-mh-1-omat,
mace-mh-1-omol.

Requirement: ASE < 3.28 (3.28.0 has a bug in ase.io.extxyz.ixyzchunks). Tests run with --noconftest.

New decorators/callbacks

No new callbacks required. The RDF app uses existing plot_from_table_column from ml-peg utils.

@LucaBrugnoli
Author

@joehart2001 The code and analysis are complete. The remaining step I think is uploading the trajectory data to
the S3 bucket. Could you let me know how to proceed with the upload, please?

@joehart2001
Collaborator


Hi @LucaBrugnoli thanks for the PR! The easiest way is to attach a zip file containing your data to this PR and I can upload it. Hopefully that works

@LucaBrugnoli
Author

Here are the 6 data files, one per model.
Each zip contains:

  • nvt_trajectory.extxyz — 501-frame NVT trajectory (50 ps, p64_w170 cell, 1534 atoms), used for both the X-ray S(q) and
    RDF benchmarks
  • density.json — NPT equilibrium density data (p16_w42 cell, 50 ps), used for the density benchmark

The expected directory structure on S3 is:
wise_electrolytes/xray_sf/{model}/nvt_trajectory.extxyz
wise_electrolytes/density/{model}/density.json
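The density.json files carry the NPT density series used for the density benchmark. A typical way to reduce such a series to an equilibrium value is to discard an equilibration window and block-average the remainder; a minimal sketch (function name and parameters are illustrative, not ml-peg's actual analysis code):

```python
import numpy as np


def equilibrium_density(densities, discard_frac=0.2, n_blocks=5):
    """Mean density and a block-average error estimate from an NPT series.

    discard_frac: fraction of the series dropped as equilibration.
    n_blocks: number of blocks for the standard error of the mean.
    """
    series = np.asarray(densities, dtype=float)
    prod = series[int(len(series) * discard_frac):]  # production window only
    blocks = np.array_split(prod, n_blocks)
    means = np.array([b.mean() for b in blocks])
    sem = means.std(ddof=1) / np.sqrt(n_blocks)  # standard error over blocks
    return prod.mean(), sem


# Trivial check on a constant series (units g/cm^3): mean recovered, zero error.
rho_mean, rho_err = equilibrium_density(np.full(1000, 1.70))
```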

Let me know if anything looks off or if you need the data in a different format.

mace-mh-1-omat.zip
mace-mh-1-omol.zip
mace-mp-0b3.zip
mace-mpa-0-medium.zip
mace-omat-0-medium.zip
matpes-r2scan.zip

@ElliottKasoar added the new benchmark label (Proposals and suggestions for new benchmarks) Apr 9, 2026
Three sub-benchmarks for 21 m LiTFSI/H2O with 6 MLIP models
(matpes-r2scan, mace-mpa-0-medium, mace-omat-0-medium, mace-mp-0b3,
mace-mh-1-omat, mace-mh-1-omol):
- NPT density vs Gilbert et al. JCED 2017
- X-ray structure factor S(q) vs SAXS experiment
- Li-O RDF coordination numbers vs Watanabe et al. JPCB 2021
@LucaBrugnoli force-pushed the wise-electrolytes-benchmark branch from dba4e00 to ef5aecc on April 16, 2026 at 17:40
return results


def normalize_metric(value: float, good: float, bad: float) -> float:
Collaborator


This will be taken care of when you build the table using the decorator. You can define your own function if you don't want to use our default; see the docs.

APP_ROOT = Path(__file__).resolve().parents[3] / "app"
OUT_PATH = APP_ROOT / "data" / "wise_electrolytes" / "density"

MODELS = [
Collaborator

@joehart2001 Apr 21, 2026


Usually we import this to get all the models; this is better for the future, when we have more models than we do now.

from ml_peg.models.get_models import load_models
from ml_peg.models.models import current_models

# --- Metrics table -----------------------------------------------------------


def build_metrics_table(data: dict[str, dict]) -> dict:
Collaborator


We have decorators to build this automatically; see the tutorial.

@joehart2001
Collaborator

Hi @LucaBrugnoli, thanks for the PR! From what I understand, you've provided us with NVT trajectories for each model, and your calc script then takes these trajectories to calculate e.g. the RDF. Thank you for providing these trajectories! However, to make this test not rely on you computing them for each new model added in the future (as I'm sure you would prefer us running them for you!), I think we need to do some reshuffling.

The ideal workflow would be:
calc:

  • this is where your reproducible NVT script would go; you can look at examples such as PR Aqueous Iron Chloride Oxidation States #360 (direct link to line) for an easy way to do this with e.g. Janus (used for many of our MD benchmarks).
  • To keep things modular and separated, we like to keep anything that could be considered analysis in the analysis scripts only. The PR linked computes the RDF as part of the calc, so you could do this too, or separate it out into the analysis script. This format means you will only have one directory for your benchmark instead of three.
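An NVT production run driven through janus-core's CLI might look roughly like the following. This is a hedged sketch: the flag names follow janus-core's `janus md` interface as I understand it and may differ by version, and the structure file and model architecture are placeholders, not the benchmark's actual inputs.

```shell
janus md \
  --ensemble nvt \
  --struct litfsi_h2o_21m.xyz \
  --arch mace_mp \
  --temp 300 \
  --timestep 0.5 \
  --steps 100000 \
  --stats-every 100 \
  --traj-every 100
```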

analysis

  • I've left some comments about using our helpers and decorators, to help keep things consistent and keep up with future changes to table building etc.

app

  • Overall looks good; we just need to add the docs link later. Do you think structure visualisation would be useful in this benchmark? If so, we could help add that.

- Merge density / rdf / xray_sf into a single litfsi_h2o_21m benchmark
  under ml_peg/{calcs,analysis,app}/wise_electrolytes/litfsi_h2o_21m/.
- Adopt the standard ml-peg patterns in the analysis and calc scripts:
  load_models(current_models) for model discovery, @build_table for
  the metrics table, and metrics.yml for thresholds/tooltips/weights.
- Add a Janus recast of the LAMMPS+symmetrix Adastra production protocol
  in ml_peg/calcs/wise_electrolytes/md_reference/calc_md_reference.py
  (pytest-skipped reference; documents the exact MD parameters).
- Add the docs page docs/source/user_guide/benchmarks/wise_electrolytes.rst
  (and toctree entry) and wire the app docs_url to it.
- Update the parent wise_electrolytes.yml to a single benchmark weight.
Collaborator


Thanks for adding this reference!

Apologies if you've stated this somewhere and I missed it, but how confident are you that this reproduces the LAMMPS dynamics?

I would suggest we don't use pytest.skip, and instead mark them as very slow (@pytest.mark.very_slow). This already requires users to explicitly request it, with the understanding that they will likely take a minimum of several hours on GPU, so we would run a single model at a time.

Since we can't guarantee all models can be run through LAMMPS/with additional acceleration, we would expect to run this test for several, if not all, of the models using the ASE calculator.

While of course it will be considerably slower, it means we can guarantee we can run every model, and that the settings are identical for each model.

We already have several tests that take over a day of GPU time, including #388, so this isn't a problem - it's a high but one-off cost.

- Switch NPT (Melchionna) -> NPT_MTK (Martyna-Tobias-Klein) so the
  janus reference matches the LAMMPS fix npt formulation used in
  production. Pass thermostat_chain=3 and barostat_chain=3 explicitly,
  matching the LAMMPS default chain length.
- Replace pytest.mark.skip with pytest.mark.very_slow per Joseph's
  review on PR ddmms#445: the reference protocol is now opt-in via
  --run-very-slow rather than unconditionally skipped, so it can be
  exercised when validating new models.
@joehart2001
Collaborator

Thanks for the code updates! I think this test could also be well suited to the molecular dynamics category instead of its own one, as it's quite specific. Opinions @ElliottKasoar?

@LucaBrugnoli
Author

LucaBrugnoli commented Apr 28, 2026

Thanks for both messages. I'll let you and @ElliottKasoar decide on the location; I'm happy with whichever directory you prefer.
On the LAMMPS/ASE reproducibility question: I'm currently running the protocol on a cluster with MI250X GPUs, with mace-mp-0b3 and mace-matpes-r2scan, to validate it empirically.
I've tried to match the integrator formulations exactly: NVT_NH for LAMMPS fix nvt (Nosé–Hoover chain, length 3) and NPT_MTK for fix npt iso (Martyna–Tobias–Klein chain, length 3), with the same TDAMP=50 fs / PDAMP=500 fs and dt=0.5 fs.
Even so, I don't expect bit-for-bit trajectory agreement: the LAMMPS production runs used the SymmetriX/Kokkos MACE kernels, while ASE/janus uses the reference PyTorch path; combined with different RNG-seeded initial velocities, chaotic divergence is unavoidable. What should agree, within sampling uncertainty, are the statistical observables, i.e. the equilibrium density, RDFs, coordination numbers, and S(q).
I'll update the PR once the runs are done, with the LAMMPS/ASE comparison and the wall-clock cost per ns of MD on a single MI250X GCD, so we have a realistic estimate for running the marker across the registry.
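The coordination numbers being compared across engines are the running integral of the RDF up to its first minimum. A minimal sketch of that integral (synthetic g(r), density, and cutoff; not the benchmark's analysis code):

```python
import numpy as np

# np.trapz was renamed np.trapezoid in NumPy 2.0; support both.
trapz = getattr(np, "trapezoid", None) or np.trapz


def coordination_number(r, g, rho, r_cut):
    """Running coordination number n(r_cut) = 4*pi*rho * int_0^r_cut g(r) r^2 dr.

    rho is the number density of the coordinating species; r_cut is usually
    taken at the first minimum of g(r).
    """
    mask = r <= r_cut
    return 4.0 * np.pi * rho * trapz(g[mask] * r[mask] ** 2, r[mask])


# Sanity check: for a uniform fluid (g = 1 everywhere) the integral is just
# the particle count in a sphere, (4/3) * pi * rho * r_cut^3.
r = np.linspace(0.0, 3.0, 3001)
n = coordination_number(r, np.ones_like(r), rho=0.05, r_cut=3.0)
expected = 4.0 / 3.0 * np.pi * 0.05 * 3.0**3
```

Because the same integral is applied to both the LAMMPS and ASE trajectories, the coordination numbers are insensitive to trajectory-level divergence as long as the underlying g(r) curves agree within sampling noise.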


Labels

new benchmark — Proposals and suggestions for new benchmarks


Development

Successfully merging this pull request may close these issues.

Water-in-salt electrolytes

3 participants