Releases: jqmc-project/jQMC
v0.2.2
First stable release since v0.1.0. v0.2.2 ships everything accumulated across four alphas (v0.2.0a1, v0.2.1a1, v0.2.1a2, v0.2.2a1) plus a final round of polish. Per-alpha sections are preserved below; this entry is a roll-up of the highlights from v0.1.0 to v0.2.2.
Highlights (v0.1.0 -> v0.2.2)
Optimization
- Linear Method (LM) optimizer integrated under method="sr" with a unified use_lm / lm_subspace_dim hierarchy (plain SR / aSR / LM). New |v_0|^2 < 0.9 fallback to plain SR keeps non-linear-regime updates from producing NaN energies.
- Adaptive learning rate for Stochastic Reconfiguration.
- MO optimization for JSD via the projection method with Attaccalite-Sorella regularization, plus geminal AO -> MO projection.
- AO basis optimization (opt_J3_basis_coeff/exp, opt_lambda_basis_coeff/exp) with shell-shared constraint and dual symmetrization.
- Distributed tall-CG SR solver via psum, removing mpi_size-scaling memory in the SR solve.
Performance
- Fast-update use across MCMC / VMC / LRDMC, with mat-vec hot paths converted to GEMM for better GPU utilization.
- On-GPU VMC optimization with use_device_collectives auto-selected by JAX backend; multi-GPU run_optimize supported.
- LU -> SVD in determinant / geminal / GFMC_n / GFMC_t for ill-conditioned stability; Cartesian / Spherical AO conversion (Cartesian GTOs are substantially faster on GPU); ECP fast path (compute_ecp_coulomb_potential_fast).
Numerical precision
- Mixed-precision support with "full" / "mixed" modes and per-zone dtype control, built on three explicit design principles. AGP/SD geminal stays fp64 to prevent log|det| amplification; electron-nucleus r - R differences are reconstructed in fp64 before downcast to avoid catastrophic cancellation. ao_grad_lap and mo_grad_lap zones are split for finer-grained control.
Features
- LRDMC atomic forces with the Pathak-Wagner regularization.
- Runtime-selectable Jastrow forms: jastrow_1b_type and jastrow_2b_type (exp / pade).
- use_swct flag to toggle Space Warp Coordinate Transformation in MCMC and GFMC_n / GFMC_t.
jqmc_workflow automation package
- jqmc-workflow is introduced as a multi-stage QMC pipeline orchestrator (WF conversion -> VMC opt -> MCMC / LRDMC production) with automatic step estimation, checkpointing, and remote job management.
Bug fixes
- GFMC_n / GFMC_t spin-polarized (n_up != n_dn, n_dn >= 1) MPI bug.
- MPI deadlock in max_time / stop_flag checks; Allreduce vs allreduce for scalars.
- Optimizer step estimation; force NaN; MCMC memory overflow from r_up_history / r_dn_history storage.
Infrastructure
- Restart files migrated from pickle .chk to HDF5 .h5 (no backward compatibility).
- Ruff lint pipeline (jqmc-lint-ruff.yml) and pre-commit updates; non-ASCII cleanup across code and docstrings.
- Nightly CI + Codecov activated with pytest-xdist support.
- Examples: 11 end-to-end tutorials (jqmc-example01 to jqmc-example08, jqmc-workflow-example01 to jqmc-workflow-example03).
- Project ownership transferred to the jqmc-project GitHub organization; URLs updated.
Breaking changes since v0.1.0
- Restart files: pickle .chk is no longer supported; HDF5 .h5 is the only format.
- Optimizer API: num_param_opt, opt_filter_min_SN_ratio, adaptive_learning_rate, and method="lm" are all removed or replaced; the Linear Method is now accessed via method="sr" with use_lm=true (and the new lm_subspace_dim / lm_cond parameters).
See the per-alpha sections below for full details.
v0.2.2a1
This release brings configurable mixed-precision support, deep kernel-level performance work (AOs, Jastrow, det/Jastrow ratios, GFMC), on-GPU VMC optimization, and a project-wide lint/cleanup.
New features
- Mixed precision support: Added a configurable precision system with per-zone dtype control (fp64 by default; fp32 in selected zones in "mixed" mode). Refactored as selectable-precision modules with three explicit design principles. The geminal/AGP/SD path remains in fp64 to prevent amplification of fp32 rounding error in log|det|, and electron-nucleus r - R differences are computed in fp64 before downcasting to avoid catastrophic cancellation. ao_grad_lap and mo_grad_lap precision zones are split for finer-grained control. The public API is reduced to a single mode selector: "full" or "mixed".
- On-GPU VMC optimization: VMC parameter optimization can now run entirely on GPU. Added use_device_collectives (auto-selected by the JAX backend: GPU=True, otherwise False) with an MPI/JAX consistency check, along with matching CLI/TOML options. Multi-GPU run_optimize is supported.
- Ruff lint pipeline: Added jqmc-lint-ruff.yml GitHub Action and updated .pre-commit-config.yaml. Applied auto-fixes and manual cleanups across the codebase, and removed non-ASCII characters from code and docstrings.
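The catastrophic-cancellation point in the mixed-precision item can be illustrated with a small, self-contained sketch. It does not use jQMC code: `to_fp32` is a hypothetical helper that emulates fp32 rounding with the standard library, and the coordinates are made-up 1D values chosen so that the electron sits very close to a nucleus far from the origin.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value (illustrative helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical 1D coordinates: an electron 1e-4 away from a nucleus at 1000.
r_electron = 1000.0001
R_nucleus = 1000.0
exact = 1.0e-4

# Downcast first, subtract second: the fp32 spacing near 1000 is about 6e-5,
# so most of the 1e-4 separation is lost before the subtraction happens.
diff_downcast_first = to_fp32(to_fp32(r_electron) - to_fp32(R_nucleus))

# Subtract in fp64 first, downcast second: only the small result is rounded.
diff_fp64_first = to_fp32(r_electron - R_nucleus)

err_downcast_first = abs(diff_downcast_first - exact)
err_fp64_first = abs(diff_fp64_first - exact)
print(f"downcast first: error = {err_downcast_first:.3e}")
print(f"fp64 first:     error = {err_fp64_first:.3e}")
```

Under these assumptions, the difference formed in fp64 keeps nearly full fp32 relative accuracy, while the difference of already-rounded fp32 coordinates carries an error comparable to the separation itself.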
Performance & memory
- AO module (HLO-level): Reduced L1/L2/DRAM traffic. Replaced segment_sum with a bucketed reduce+gather scheme (including V_l). Unrolled (8Z)**l in the Cartesian kernels to avoid XLA while loops. Removed eps from Cartesian GTOs. Fused AO/MO value/grad/lap into a single dispatch on hot paths.
- Streaming caches on hot paths: Streamed cached AO and paired tables into every det-ratio / jas-ratio hot path. Improved the J3 streaming cache in jastrow_factor.py. Reused J3 streaming-state AOs in LRDMC mesh / ECP non-local ratios. Improved the K state carry in jqmc/wavefunction.py.
- Jastrow ratios: Optimized J2 ratio from an O(N^2) baseline to O(N * N_grid) per-grid sums. Introduced a slim J3 state carry in the MCMC wavefunction update. Polished Jastrow with a dense (N, N) up-up / dn-dn pair reduction and removed scatter-add while loops.
- GFMC_t: Added a streaming kinetic-state path (parity with GFMC_n) and replaced a Python while loop with lax.while_loop in the projection.
- Misc: Vectorized the electron-configuration generator and PRNG key initialization. Switched the jackknife standard deviation to a two-pass centered sum-of-squares for better numerical stability. Replaced hessian() with jvp(grad) for the NN-Jastrow Laplacian.
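As an illustration of the two-pass centered sum-of-squares idea mentioned in the Misc item, here is a minimal pure-Python sketch of a jackknife standard error for the sample mean. The function name `jackknife_std` and the restriction to the mean estimator are assumptions made for this example; the actual jQMC implementation may be organized differently.

```python
import math


def jackknife_std(samples):
    """Jackknife standard error of the mean via a two-pass centered sum of squares."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    total = math.fsum(samples)
    # Leave-one-out estimates of the mean.
    loo = [(total - x) / (n - 1) for x in samples]
    # Pass 1: mean of the leave-one-out estimates.
    loo_mean = math.fsum(loo) / n
    # Pass 2: sum of squared deviations from that mean. Centering first avoids
    # the cancellation inherent in the one-pass E[x^2] - E[x]^2 formula.
    centered_ss = math.fsum((m - loo_mean) ** 2 for m in loo)
    return math.sqrt((n - 1) / n * centered_ss)


print(jackknife_std([1.0, 2.0, 3.0, 4.0, 5.0]))
```

For the plain mean, this jackknife estimate coincides with the usual standard error of the mean, which makes it easy to check.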
Bug fixes
- GFMC/MCMC logging: Improved loggers in jqmc/jqmc_gfmc.py and jqmc/jqmc_mcmc.py.
Workflow (jqmc_workflow)
- Refactored workflow modules (vmc_workflow.py, mcmc_workflow.py, lrdmc_workflow.py, lrdmc_ext_workflow.py, workflow.py, and the _*.py helpers) for readability, with no behavior change.
Tests & infrastructure
- Polished tolerance control across the test suite; introduced a medium tolerance for numerical-Laplacian tests and removed the separate autodiff tolerance.
- Removed test_kinetic_energy_analytic_and_numerical and test_numerial_and_auto_grads_and_laplacians_ln_Det because the numerical references are intrinsically unstable; analytical versions are already validated against JAX autograd.
- Shortened jqmc-run-full-pytest.yml and updated GitHub Actions.
- Made the numerical-Laplacian debug functions more stable.
v0.2.1a2
Minor update focusing on workflow improvements, bug fixes, and new benchmark infrastructure.
New features
- Kernel benchmark suite: Added benchmark modules and tests for profiling kernel performance.
- cleanup_patterns option: Added a cleanup_patterns configuration option to jqmc_workflow for automatic post-run file cleanup, with support for recursive matching in subdirectories.
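The kind of pattern-based cleanup described for cleanup_patterns can be sketched with the standard library. This is not the jqmc_workflow implementation: the function `cleanup`, its `recursive` argument, and the example pattern are hypothetical and only show how glob patterns might be matched recursively in subdirectories.

```python
from pathlib import Path


def cleanup(workdir, patterns, recursive=True):
    """Delete files under workdir matching any glob pattern; return the removed paths."""
    root = Path(workdir)
    removed = []
    for pattern in patterns:
        # rglob descends into subdirectories; glob only looks at the top level.
        matches = root.rglob(pattern) if recursive else root.glob(pattern)
        for path in sorted(matches):
            if path.is_file():
                path.unlink()
                removed.append(path)
    return removed
```

A call such as `cleanup("run_dir", ["*.tmp"])` would then remove every matching file below `run_dir` while leaving other outputs in place.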
Bug fixes
- MPI deadlock in max_time / stop_flag: Fixed a deadlock that could occur during max_time and stop_flag checks in MPI runs.
v0.2.1a1
This release focuses on a major update of the VMC optimizer (Linear Method), extended AO basis optimization, memory/performance improvements of the jqmc kernel package, and substantial hardening of the jqmc_workflow automation package.
New features
- Linear Method (LM) optimizer: Implemented the Linear Method optimizer, which solves the generalized eigenvalue problem $\bar{H} v = E \bar{S} v$ for optimal parameter updates, providing a powerful alternative to the naive Stochastic Reconfiguration (SR). The optimization is robust and fast.
  - LM is now integrated into the method="sr" code path, controlled by the use_lm flag and lm_subspace_dim parameter (inspired by TurboRVB's ncg=1 design).
  - Unified optimizer hierarchy under method="sr":
    - use_lm=false: plain SR
    - use_lm=true, lm_subspace_dim=0: adaptive SR (aSR) with gamma scaling
    - use_lm=true, lm_subspace_dim>0: LM with SR collective variable + top-$p$ S/N ratio parameters
    - use_lm=true, lm_subspace_dim=-1: LM with SR collective variable + all parameters
  - SR collective variable ($g = S^{-1}f$) is used as the first LM basis vector for stability.
  - dgelscut-based preconditioning with iterative eigenvalue conditioning on the correlation matrix (condition number $\leq 1/\epsilon$), inspired by TurboRVB's implementation.
  - S-orthonormalization ($P = U \Lambda^{-1/2}$) converts the generalized eigenvalue problem to standard form.
  - Symmetrization of $H = K + B$ before eigenvalue solve to suppress finite-sample noise.
  - Eigenvector selection by $\max |v_0|^2$ criterion.
  - Separate epsilon (SR regularization) and lm_cond (LM dgelscut threshold) parameters.
  - Fallback mechanisms: plain SR fallback ($\gamma=0.1$) when aSR finds no positive root; plain SR fallback ($0.1 \cdot g_\mathrm{sr}$) when LM does not predict energy improvement ($E_\mathrm{LM} > E_0 + 3\sigma$).
- Extended AO basis optimization: Implemented opt_J3_basis_coeff, opt_J3_basis_exp, opt_lambda_basis_coeff, and opt_lambda_basis_exp options for optimizing three-body Jastrow and geminal AO basis exponents and coefficients in VMC.
  - Shell-shared constraint: Same-atom, same-shell primitives share exponents/coefficients via symmetrize_metric (size-preserving shell averaging), consistent with j3_matrix / lambda_matrix symmetrization.
  - Dual symmetrization strategy: $O_k$ derivatives are symmetrized at source in get_dln_WF for accurate $f$ and $S$, and post-hoc symmetrization is applied after apply_block_update to prevent floating-point drift over hundreds of optimization steps.
  - Improved AO basis exponent selection using a log-spaced median window with widened margin (/2.5).
- use_swct parameter: Added use_swct flag to MCMC, GFMC_t, and GFMC_n classes to control Space Warp Coordinate Transformation (SWCT) on/off for atomic force calculations. Default is True for MCMC and False for GFMC (LRDMC). When disabled, zero arrays are used for omega / grad_omega, and force formulas reduce to bare Hellmann-Feynman and Pulay forces.
- S/N ratio filter: Applied S/N ratio filtering before SR matrix construction to reduce the effective matrix dimension, improving both speed and numerical stability. O_matrix_local is sliced to selected parameters before building $S$, so all SR computations operate in the reduced space.
- Shape assertions: Added rigorous shape assertions (using mcmc_counter, num_walkers, n_atoms) to get_E, get_aF, get_gF, get_aH in MCMC, GFMC_t, GFMC_n, and their debug counterparts.
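The unified optimizer hierarchy under method="sr" listed in the Linear Method item can be summarized as a small decision function. This is only a reading aid: `optimizer_mode` is a hypothetical helper, not part of jQMC, and treating any other negative lm_subspace_dim as an error is an assumption of this sketch.

```python
def optimizer_mode(use_lm: bool, lm_subspace_dim: int) -> str:
    """Map (use_lm, lm_subspace_dim) to the optimizer variant named in the release notes."""
    if not use_lm:
        return "plain SR"
    if lm_subspace_dim == 0:
        return "adaptive SR (aSR)"
    if lm_subspace_dim > 0:
        return f"LM: SR collective variable + top-{lm_subspace_dim} S/N ratio parameters"
    if lm_subspace_dim == -1:
        return "LM: SR collective variable + all parameters"
    # Assumption: other negative values are not meaningful.
    raise ValueError(f"unsupported lm_subspace_dim: {lm_subspace_dim}")


for flags in [(False, 0), (True, 0), (True, 20), (True, -1)]:
    print(flags, "->", optimizer_mode(*flags))
```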
Performance & memory
- Buffer-based MPI reduce in SR: Eliminated list() round-trips and switched to buffer-based MPI reduce in the SR optimizer for lower overhead.
- Pre-compute collective observable: Compute $O_\mathrm{SR} = \delta O \cdot g$ while the full $O$-matrix is still in memory during the SR solve, avoiding a redundant get_dln_WF call in the LM path.
- Avoid redundant JIT compilation: Refactored run_optimize in jqmc_mcmc.py to skip redundant computation via early continue, reducing unnecessary recompilation.
- jax.clear_caches() after optimization loop: Added cache clearing after the optimization loop as an OOM workaround.
- Store np.array instead of list(np.array): Refactored internal data storage to use np.array directly, reducing memory fragmentation.
- Wrapped properties with np.asarray(): Prevent accidental storage of JAX arrays in checkpoint data to avoid OOM.
- Avoid redundant energy/force post-processing: Skip unnecessary re-computation of energy and force post-processing.
- Better memory management: Improved memory handling in jqmc_gfmc.py and jqmc_mcmc.py.
Bug fixes
- GFMC_n / GFMC_t spin-polarized MPI bug: Fixed a critical bug for systems where n_up != n_dn and n_dn >= 1 when running with 2 or more MPI processes.
- GFMC_t projection averaging: Fixed incorrect averaging of the number of projections across MPI ranks in GFMC_t.
- SR with num_params >= num_samples: Fixed MPI bug when the number of optimizable parameters exceeds the number of samples.
- MPI Allreduce for scalars: Replaced Allreduce with allreduce for scalar int and float values in jqmc_mcmc.py and jqmc_gfmc.py, as Allreduce for scalars exhibits implementation-dependent behavior.
- Optimizer step estimation: Fixed estimate_required_steps by removing the incorrect ceil rounding and max clamp that ignored walker_ratio; added min_steps parameter.
- SR stability near convergence: Improved stability of SR with adaptive learning rate in the vicinity of convergence.
- Pytree inconsistency: Fixed a JAX pytree structural mismatch.
- S/N ratio diagnostics: Fixed averaging (last S/N ratio → averaged S/N ratios) and trivial output bugs.
Workflow (jqmc_workflow)
- Major refactoring: Comprehensive overhaul of all workflow modules (vmc_workflow.py, mcmc_workflow.py, lrdmc_workflow.py, workflow.py) with improved robustness, cleaner code structure, and a new _phase.py module for phase management.
- SSH / file-descriptor leak fixes: Fixed SSH connection hangs and leaks; consolidated Machine objects to prevent resource exhaustion.
- Continuation behavior: Changed and improved the behavior of workflow continuations with SHA256-based input fingerprinting for reliable restart detection.
- Step count accumulation: Fixed a bug in accumulated step counts for VMC and LRDMC workflows.
- VMC convergence check: Implemented a new VMC energy-slope-based convergence check.
- New VMC workflow parameters: Introduced additional configurable parameters for vmc_workflow.py.
- Output parser fixes: Fixed parsers for workflow output processing.
- FileFrom handling: Fixed and polished FileFrom file-transfer logic.
- Job ID checks: Updated and improved job ID check logic for remote execution.
- Error estimation in workflows: Fixed error estimation methods used by workflows.
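The SHA256-based input fingerprinting mentioned under continuation behavior can be pictured with the following standard-library sketch. The function names, the choice of hashing a parameter dictionary plus raw file bytes, and the example inputs are all assumptions made for illustration; the notes do not specify what jqmc_workflow actually hashes.

```python
import hashlib
import json


def input_fingerprint(params: dict, file_contents: dict) -> str:
    """SHA-256 digest over run parameters and input file bytes (illustrative)."""
    h = hashlib.sha256()
    # Canonical JSON so that key order does not change the digest.
    h.update(json.dumps(params, sort_keys=True, separators=(",", ":")).encode("utf-8"))
    for name in sorted(file_contents):
        h.update(name.encode("utf-8"))
        h.update(b"\0")
        h.update(hashlib.sha256(file_contents[name]).digest())
    return h.hexdigest()


def can_resume(saved_fingerprint: str, params: dict, file_contents: dict) -> bool:
    """A stored run may be continued only if its inputs are unchanged."""
    return saved_fingerprint == input_fingerprint(params, file_contents)


# Hypothetical inputs for demonstration only.
params = {"num_walkers": 64, "method": "sr"}
files = {"hamiltonian_data.h5": b"example bytes"}
saved = input_fingerprint(params, files)
print(can_resume(saved, params, files))
```

Comparing digests rather than timestamps means a restart is detected as stale whenever any parameter or input byte changes, regardless of when the files were written.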
Breaking changes
- Removed num_param_opt and opt_filter_min_SN_ratio: These parameters have been removed from run_optimize(), CLI, workflow, and TOML config. SR and optax now always optimize all parameters; parameter selection is handled internally by the LM subspace mechanism.
- Replaced adaptive_learning_rate with use_lm: The adaptive_learning_rate flag is replaced by the use_lm flag, which controls the unified LM/aSR optimizer hierarchy.
- Removed method="lm" as separate code path: The Linear Method is now accessed via method="sr" with use_lm=true.
- New optimizer parameter names: lm_cond (default 0.001) replaces the previous LM-specific delta/epsilon naming.
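For users updating older inputs, the breaking changes above can be expressed as a small translation over an options dictionary. This is a hedged sketch: `migrate_optimizer_options` is a hypothetical helper, the flat dictionary layout is an assumption (real options may live in TOML sections or keyword arguments), and lm_subspace_dim / lm_cond are deliberately left for the user to choose.

```python
def migrate_optimizer_options(old: dict) -> dict:
    """Translate pre-v0.2.1a1 optimizer options to the new names (illustrative)."""
    new = dict(old)
    # Removed outright: SR and optax now always optimize all parameters.
    new.pop("num_param_opt", None)
    new.pop("opt_filter_min_SN_ratio", None)
    # adaptive_learning_rate is replaced by use_lm.
    if "adaptive_learning_rate" in new:
        new["use_lm"] = bool(new.pop("adaptive_learning_rate"))
    # method="lm" is now reached through method="sr" with use_lm enabled.
    if new.get("method") == "lm":
        new["method"] = "sr"
        new["use_lm"] = True
    return new


print(migrate_optimizer_options({"method": "lm", "num_param_opt": 10}))
```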
v0.2.0a1
This is a major update with drastic performance improvements, new features, and a new workflow automation package (jqmc-workflow).
Performance
- Drastic speedups: MCMC, VMC, and LRDMC are all significantly faster than the previous version thanks to pervasive use of fast-update algorithms throughout the code.
- LU -> SVD replacement: Replaced LU factorizations with SVD across determinant, geminal, GFMC_n, and GFMC_t modules, greatly improving numerical stability for ill-conditioned matrices.
- GEMM optimization: Converted matrix-vector operations to matrix-matrix (GEMM) operations in Coulomb potential, determinant, and Jastrow factor modules for better GPU utilization.
- Cartesian / Spherical AO conversion: Implemented Cartesian AO <-> Spherical AO conversion. Cartesian GTOs are substantially faster than spherical GTOs on GPUs, so users can now exploit this for better throughput.
- ECP fast computation: Implemented compute_ecp_coulomb_potential_fast for efficient pseudopotential evaluation.
- vmap + jit fix: vmap-ed functions are now explicitly wrapped with jit, as vmap does not automatically JIT-compile the mapped function.
- Removed mpi4jax dependency for CG: Conjugate gradient (CG) solver now uses pure MPI on CPUs, eliminating the mpi4jax dependency.
Optimization
- Adaptive learning rate for Stochastic Reconfiguration: Implemented a linear-method-inspired automatic learning-rate adjustment scheme, leading to dramatically faster optimization convergence.
- Molecular orbital optimization: Added MO optimization for JSD wavefunctions via the projection method with Attaccalite-Sorella regularization.
- Geminal AO -> MO projection: Implemented AO overlap matrix computation and geminal AO -> MO projection for constrained optimization.
New features
- LRDMC force calculations: Implemented LRDMC atomic forces with the Pathak–Wagner regularization.
- Jastrow functions: Added jastrow_1b_type ('exp' / 'pade') and jastrow_2b_type ('pade' / 'exp') fields to Jastrow_one_body_data and Jastrow_two_body_data, enabling runtime selection of the one-body and two-body Jastrow functional forms.
  - Exponential form: $u(r) = \frac{1}{2b}(1 - e^{-br})$
  - Padé form: $u(r) = \frac{r}{2(1 + br)}$
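The two Jastrow functional forms quoted above can be evaluated directly from their formulas. The sketch below is plain Python rather than jQMC code; the function names and the value of b are chosen only for illustration.

```python
import math


def jastrow_exp(r: float, b: float) -> float:
    """Exponential form: u(r) = (1 - exp(-b r)) / (2 b)."""
    return (1.0 - math.exp(-b * r)) / (2.0 * b)


def jastrow_pade(r: float, b: float) -> float:
    """Pade form: u(r) = r / (2 (1 + b r))."""
    return r / (2.0 * (1.0 + b * r))


b = 1.3  # illustrative parameter value
for r in (0.0, 0.5, 2.0, 50.0):
    print(f"r={r:5.1f}  exp={jastrow_exp(r, b):.6f}  pade={jastrow_pade(r, b):.6f}")
```

Both forms vanish at r = 0, start with slope 1/2, and saturate at 1/(2b) for large r; they differ in how quickly they approach that limit.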
Bug fixes
- Fixed force calculations producing NaN values; added NaN checks in all tests.
- Fixed MCMC memory overflow caused by storing r_up_history / r_dn_history.
- Fixed wavefunction without Jastrow not working for MCMC.
- Fixed missing NN-Jastrow derivatives in _GFMC_n_debug.
Infrastructure
- Restart file format change: Switched restart files from pickle-based *.chk to HDF5-based *.h5. Note: backward compatibility with old *.chk files is not maintained.
- jqmc_workflow package: Introduced the jqmc_workflow automation package for orchestrating multi-stage QMC pipelines (WF conversion → VMC optimization → MCMC / LRDMC production) with automatic step estimation, checkpointing, and remote job management.
- Removed SWCT_data: Cleaned up legacy SWCT_data class as part of codebase refactoring.
- More comprehensive tests: Substantially expanded the test suite to cover the new features and improve overall reliability.
- Expanded examples: Reorganized and enriched the examples/ directory with 11 end-to-end tutorials (jqmc-example01–jqmc-example08, jqmc-workflow-example01–jqmc-workflow-example03) covering single-point VMC/LRDMC, force calculations, GPU walker-scaling benchmarks, interaction-energy workflows, and PES scans with automated jqmc_workflow pipelines.
v0.1.0
Release of the first stable version of jQMC.
Known Limitation(s)
- Periodic Boundary Condition (PBC) calculations are being implemented for the next major release.
v0.1.0a3
Release of the third alpha version of jQMC.
Key Features
- Analytical derivatives:
  - Implemented analytical gradients and Laplacians for atomic and molecular orbitals in both spherical and Cartesian GTO bases.
  - JAX autograd is now used primarily for validating the analytical gradients.
  - Logarithmic derivatives of the wavefunction and derivatives of atomic force calculations still use JAX autograd.
- Testing precision:
  - Tightened and systematized decimal controls in tests, improving overall reliability.
- Fast updates:
  - Expanded fast-update implementations to more functions, yielding significant speedups in both MCMC and GFMC modules.
v0.1.0a1
Release of the second alpha version of jQMC.
Key Features
- Neural Network Jastrow:
  - Introduced NNJastrow, a PauliNet-inspired neural network architecture for many-body Jastrow factors, enabling a more accurate wavefunction ansatz.
- Optimization Control:
  - Implemented proper gradient masking mechanisms (e.g., with_param_grad_mask). This allows for selectively freezing or optimizing specific parameter blocks (One-body, Two-body, Three-body, NN, and Geminal coefficients) during the VMC optimizations.
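The idea behind gradient masking, freezing some parameter blocks while optimizing others, can be shown with a toy example. This is not the signature or behavior of with_param_grad_mask; `mask_gradients`, the block names, and the list-based gradients are hypothetical stand-ins used only to convey the concept.

```python
def mask_gradients(grads: dict, optimize: dict) -> dict:
    """Zero the gradient of every parameter block not flagged for optimization."""
    return {
        block: list(g) if optimize.get(block, False) else [0.0] * len(g)
        for block, g in grads.items()
    }


# Hypothetical gradients for three parameter blocks.
grads = {"one_body": [0.2, -0.1], "two_body": [0.5], "geminal": [0.3, 0.4, -0.7]}
# Optimize only the Jastrow blocks; keep the geminal coefficients frozen.
flags = {"one_body": True, "two_body": True, "geminal": False}
print(mask_gradients(grads, flags))
```

Because a frozen block receives an all-zero gradient, any gradient-based update leaves its parameters unchanged while the remaining blocks are optimized as usual.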
Enhancements & Fixes
- I/O: Changed the storage format for hamiltonian_data from pickled binary files to HDF5 (.h5) for better portability and compatibility.
- Documentation: Updated README.md, docstrings, and API references to reflect recent changes and fix Sphinx warnings.
- CI/CD: Updated pre-commit configurations and GitHub workflow triggers.
- Code Quality: Refactored code based on suggestions and improved type hinting.
v0.1.0a0
We are pleased to announce the first alpha release of jQMC, a Python-based Quantum Monte Carlo package built on JAX.
Key Features
- JAX-based Core: Fully utilizes JAX's Just-In-Time (JIT) compilation and automatic vectorization (vmap) for high-performance simulations on GPUs and TPUs.
- Algorithms:
  - Variational Monte Carlo (VMC): Supports wavefunction optimization via Stochastic Reconfiguration (SR) and Natural Gradient methods.
  - Lattice Regularized Diffusion Monte Carlo (LRDMC): A stable and efficient projection method for ground state calculations.
- Wavefunctions:
  - Ansatz: Supports Jastrow-Slater Determinant (JSD) and Jastrow-Antisymmetrized Geminal Power (JAGP).
  - Jastrow Factors: Includes One-body, Two-body, and Three/Four-body terms.
  - Determinant Types: Single Determinant (SD), Antisymmetrized Geminal Power (AGP), and Number-constrained AGP (AGPn).
- I/O & Interoperability:
  - TREX-IO Support: Interfaces with the TREX-IO library (HDF5 backend) for standardized input of molecular structure and basis sets (Cartesian & Spherical GTOs).
- Parallelization:
  - MPI Support: Uses mpi4py for efficient parallelization across multiple nodes.
- Documentation:
  - Comprehensive technical notes on Wavefunctions, VMC, LRDMC, and JAX implementation details.
  - Examples demonstrating usage for various systems (H2, N2, Water, etc.).
Known Limitations (Alpha)
- Periodic Boundary Conditions (PBC) are currently in development.
- Atomic force calculations with spherical harmonics are computationally intensive on current JAX versions.
- Complex wavefunctions are not yet supported.