
Add Neural Operator Factory (for reservoir simulation)#1552

Open
wdyab wants to merge 2 commits into NVIDIA:main from wdyab:pr/neural-operator-factory

Conversation


@wdyab wdyab commented Apr 6, 2026

Config-driven framework for training neural operator surrogates for reservoir simulation and beyond, built on PhysicsNeMo. Supports 165 model architectures (FNO, U-FNO, Conv-FNO, FNO4D, DeepONet with 8 variants including TNO), 6 training regimes (full-mapping and autoregressive with teacher forcing, pushforward, and rollout), and physics-informed losses (derivative regularization, mass conservation) — all from YAML config, zero code changes.

Key features:

  • xFNO family: FNO, U-FNO, Conv-FNO, Conv-U-FNO (3D), FNO4D (4D)
  • xDeepONet family: DeepONet, U-DeepONet, Fourier-DeepONet, Conv-DeepONet, Hybrid-DeepONet, MIONet, Fourier-MIONet, TNO
  • Composable spatial branches (Fourier, UNet, Conv in any combination)
  • Three-stage autoregressive training pipeline
  • Dimension-agnostic: same code handles 2D and 3D spatial data
  • Automatic inactive-cell mask detection (ACTNUM, non-zero fallback)
  • Multi-GPU DDP with DDP-safe autoregressive rollout
  • Self-describing checkpoints for model reconstruction
  • 375 unit tests

Includes reproducible examples with published results:

  • U-FNO (Wen et al. 2022) on CO2 sequestration
  • U-DeepONet (Diab & Al Kobaisi 2024) on CO2 sequestration
  • Fourier-MIONet (Jiang et al. 2024) on CO2 sequestration
  • TNO (Diab & Al Kobaisi 2025) on CO2 sequestration
  • Physics-informed TNO on Norne field (4D)



greptile-apps bot commented Apr 6, 2026

Greptile Summary

This PR introduces a comprehensive Neural Operator Factory for reservoir simulation, adding config-driven support for 165 model architecture configurations (FNO, U-FNO, Conv-FNO, FNO4D, and 8 DeepONet variants), three-stage autoregressive training regimes, and physics-informed losses — all as a self-contained example directory with 375 tests and reproducible benchmark results.

Key issues found:

  • torch.load without weights_only (utils/checkpoint.py): load_checkpoint calls torch.load without specifying weights_only, which raises a security warning in PyTorch ≥ 2.0 and allows arbitrary code execution from malicious/corrupted checkpoint files.
  • Shared activation module instance in nn.Sequential (models/xfno.py, models/xdeeponet.py): Multiple nn.Sequential builders reuse the same activation module object across loop iterations (e.g., self.activation_fn in conv_modules, _build_lifting_network, and _build_decoder_network). This works for stateless activations but silently breaks for parametric activations (PReLU, etc.) — causing incorrect state_dict keys and parameter deduplication issues. The same pattern appears in DeepONet._build_decoder and DeepONet3D._build_decoder.
  • AR validation diluted by GT prefix (training/ar_utils.py): ar_validate_full_rollout prepends L ground-truth timesteps to predictions before returning; when train.py computes loss against the full targets, the first L steps always match perfectly, silently shrinking every reported validation error by a factor of (total_T − L) / total_T.
  • Duplicate import (training/train.py): from pathlib import Path is imported twice (lines 21 and 26).
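The torch.load finding has a one-line remedy. Below is a minimal sketch of a hardened loader, not the repository's actual code — the checkpoint key name (`model_state_dict`) is a hypothetical assumption and may differ in utils/checkpoint.py:

```python
import torch

def load_checkpoint_safe(path: str, model: torch.nn.Module) -> dict:
    # weights_only=True restricts unpickling to tensors and basic containers,
    # closing the arbitrary-code-execution path of an unrestricted torch.load.
    # (It became the default in PyTorch 2.6; pass it explicitly on 2.0-2.5.)
    checkpoint = torch.load(path, map_location="cpu", weights_only=True)
    model.load_state_dict(checkpoint["model_state_dict"])  # hypothetical key
    return checkpoint
```

A checkpoint that needs to round-trip non-tensor Python objects can instead allow-list them via `torch.serialization.add_safe_globals` rather than falling back to `weights_only=False`.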
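The shared-activation pitfall is easy to reproduce in isolation. A self-contained sketch (illustrative, not the repository's builders) showing why reusing one parametric activation object across layers silently deduplicates learnable parameters:

```python
import torch.nn as nn

def build_mlp_buggy(width: int, n_layers: int) -> nn.Sequential:
    # Bug pattern: a single activation object is reused in every layer.
    act = nn.PReLU()  # parametric activation -> carries a learnable slope
    layers = []
    for _ in range(n_layers):
        layers += [nn.Linear(width, width), act]
    return nn.Sequential(*layers)

def build_mlp_fixed(width: int, n_layers: int) -> nn.Sequential:
    # Fix: construct a fresh activation instance per layer.
    layers = []
    for _ in range(n_layers):
        layers += [nn.Linear(width, width), nn.PReLU()]
    return nn.Sequential(*layers)

# parameters() deduplicates shared modules, so the buggy net trains a
# single PReLU slope where three independent slopes were intended.
print(len(list(build_mlp_buggy(8, 3).parameters())))  # 7: 3x(W+b) + 1 slope
print(len(list(build_mlp_fixed(8, 3).parameters())))  # 9: 3x(W+b) + 3 slopes
```

Stateless activations (ReLU, GELU) mask the bug because they have no parameters to dedupe, which is why the pattern only "silently breaks" once a parametric activation is selected from config.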
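The metric-dilution effect is pure arithmetic and can be checked numerically. A small stand-in sketch (illustrative shapes, not the repository's tensors): when L ground-truth steps are prepended before scoring against all T target steps, the reported MSE is the honest rollout MSE scaled by (T − L) / T:

```python
import torch

torch.manual_seed(0)
T, L, C = 10, 3, 16            # rollout length, GT prefix length, channels
target = torch.randn(T, C)
pred = torch.randn(T, C)       # stand-in for the model's autoregressive rollout

# Diluted score: the GT prefix contributes exactly zero error on the first L steps.
diluted = torch.cat([target[:L], pred[L:]])
mse_diluted = torch.mean((diluted - target) ** 2)

# Honest score: evaluate only the steps the model actually predicted.
mse_true = torch.mean((pred[L:] - target[L:]) ** 2)

# The diluted metric understates the true error by a factor of (T - L) / T.
print(float(mse_diluted / mse_true))  # 0.7 for T=10, L=3
```

Either dropping the prefix before computing the loss or slicing the targets to `target[L:]` in train.py removes the bias.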

Important Files Changed

Filename Overview
examples/reservoir_simulation/neural_operator_factory/models/xfno.py New UFNO/FNO4D model implementations; shared activation module instances added to multiple nn.Sequential containers in conv_modules loop and multi-layer lifting/decoder builders
examples/reservoir_simulation/neural_operator_factory/models/xdeeponet.py New DeepONet family implementations; decoder builder reuses same activation instance in loop building nn.Sequential; overall solid architecture
examples/reservoir_simulation/neural_operator_factory/utils/checkpoint.py New checkpoint save/load utilities; torch.load called without weights_only argument, raises security warning in PyTorch 2.0+
examples/reservoir_simulation/neural_operator_factory/training/ar_utils.py New autoregressive training utilities; ar_validate_full_rollout prepends GT prefix to predictions, which dilutes validation metrics when loss is computed against full targets
examples/reservoir_simulation/neural_operator_factory/training/train.py Main Hydra-based training script; duplicate pathlib.Path import on lines 21 and 26; otherwise well-structured multi-GPU DDP training loop with stage-based AR training
examples/reservoir_simulation/neural_operator_factory/training/losses.py New unified loss function with data-fitting, derivative regularization, and physics-informed terms; well-implemented with proper per-sample masking support
examples/reservoir_simulation/neural_operator_factory/training/physics_losses.py Physics-informed mass conservation loss with cell-volume weighting and caching; clean implementation with appropriate warning for physically nonsensical pressure use
examples/reservoir_simulation/neural_operator_factory/data/dataloader.py Unified 3D/4D dataset loader; val/test datasets initialized with identity normalization as placeholder before create_dataloaders() overrides with training stats — direct ReservoirDataset instantiation bypasses this
examples/reservoir_simulation/neural_operator_factory/models/unet.py Custom UNet2D/UNet3D implementations used as skip-connection modules in FNO and DeepONet architectures; straightforward implementation
examples/reservoir_simulation/neural_operator_factory/models/physicsnemo_unet.py Thin wrapper around PhysicsNeMo UNet providing 2D and 3D variants with standardized interface for FNO/DeepONet

Reviews (1): Last reviewed commit: "Add Neural Operator Factory for reservoi..."

@wdyab wdyab force-pushed the pr/neural-operator-factory branch from bb72657 to ee2baa0 on April 6, 2026 at 12:59
@coreyjadams (Collaborator) commented:

@greptile this PR introduces a number of models, including UNets and FNOs. How much overlap is there with physicsnemo models and .nn components, and how much code reuse could be consolidated?

Signed-off-by: wdyab <wdyab@nvidia.com>
Made-with: Cursor
@wdyab wdyab force-pushed the pr/neural-operator-factory branch from ee2baa0 to 90e9eb5 on April 7, 2026 at 18:18