Overview
This project provides a flexible framework for oceanographic time‑series forecasting. It separates dimensionality reduction (DR) and forecasting into interchangeable components, enabling you to swap in your own algorithms with minimal changes.
- Clone the repository
    git clone https://github.com/m2lines/nemo-spinup-forecast
    cd <repo_dir>
- Set up a virtual environment
    python3 -m venv venv
    source venv/bin/activate
- Install dependencies
    pip install .
    # or for developer dependencies
    pip install .[dev]

If you plan to edit the code, use an editable install so your changes take effect without reinstalling:

pip install -e .

Without -e, any edit to the source requires rerunning pip install . before the CLI picks it up.

This repository is one piece of a toolkit for NEMO spin-up work.
- nemo-spinup-restart: builds NEMO restart files from forecast output. If you want to continue a simulation from a forecast produced here, use that repo.
- nemo-spinup-evaluation: tools for evaluating forecast quality.

- Once you have cloned the repository and built the environment, test data is available for quick experimentation. Download it using the script in the tests directory:
    ./tests/download_test_data.sh
- Run the forecasting script on the test data. Set --data-path to the directory where you downloaded the test data and --output-path to where results should be written:
    python -m nemo_spinup_forecast \
        --ye True \
        --start 20 \
        --end 50 \
        --comp 1 \
        --steps 30 \
        --data-path /path/to/simulation/files \
        --output-path /path/to/output \
        --ocean-terms /path/to/ocean_terms.yaml \
        --techniques-config /path/to/techniques_config.yaml

This fits the model on 30 years of data (years 20 to 50) and forecasts a jump of 30 years (--steps 30) using PCA and a Gaussian process.
- data-path: Directory containing the simulation files
- output-path: Directory to write forecast results to
- ye: Whether the simulation is expressed in years (True) or months (False)
- start: Starting year (training data)
- end: Ending year (usually the last simulated year)
- comp: Number or ratio of components to accelerate
- steps: Jump size (years if ye=True, months otherwise)
- ocean-terms: Path to a custom ocean_terms.yaml mapping logical terms (e.g., SSH, Salinity, Temperature) to dataset variable names. If omitted, a packaged default is used.
- techniques-config: Path to a custom techniques_config.yaml selecting DR and forecast techniques. If omitted, the default packaged config directory is used.
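
As a quick check on how these options combine, the sketch below restates --start, --end, and --steps as a training window and a jump size. The helper name summarize_run is hypothetical and not part of the package; it only turns the parameter descriptions above into arithmetic.

```python
def summarize_run(start: int, end: int, steps: int, ye: bool = True) -> dict:
    """Restate --start/--end/--steps as a training window and a jump size.

    Hypothetical helper for illustration; not part of nemo_spinup_forecast.
    """
    if end <= start:
        raise ValueError("--end must be greater than --start")
    return {
        "training_length": end - start,  # in the units of --start/--end
        "jump_size": steps,              # how far ahead the forecast jumps
        "jump_unit": "years" if ye else "months",
    }


summary = summarize_run(start=20, end=50, steps=30, ye=True)
print(summary)
```

With the values from the example command, the training window is 30 and the jump is 30 years.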

Outputs:
- Prepared data in <output-path>/latest/forecast/simu_prepared/{term}/
- Forecasted components in <output-path>/latest/forecast/simu_predicted/{term}.npy

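
To locate these outputs from a script, the paths can be assembled with pathlib. This is a minimal sketch: forecast_outputs is a hypothetical helper, and the layout is taken only from the two paths listed above.

```python
from pathlib import Path


def forecast_outputs(output_path: str, term: str) -> tuple[Path, Path]:
    """Return (prepared_dir, predicted_file) for one term.

    Hypothetical helper; the directory layout mirrors the paths in the README.
    """
    base = Path(output_path) / "latest" / "forecast"
    prepared_dir = base / "simu_prepared" / term
    predicted_file = base / "simu_predicted" / f"{term}.npy"
    return prepared_dir, predicted_file


prepared_dir, predicted_file = forecast_outputs("/path/to/output", "ssh")
print(prepared_dir)
print(predicted_file)
```

The .npy file can then be read with numpy.load(predicted_file). The array layout is not described here, so inspect its shape before relying on it.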
All user‑selectable techniques live in techniques_config.yaml:

DR_technique:
  name: PCA  # Options: PCA, KernelPCA, or your custom class
Forecast_technique:
  name: GaussianProcessRecursiveForecaster  # Options: GaussianProcessForecaster, GaussianProcessRecursiveForecaster, or your custom class

Ocean term definitions live in src/nemo_spinup_forecast/configs/ocean_terms.DINO.yaml. This file is the single source of truth for which variables to forecast and which NetCDF files to read them from:

terms:
  salinity:
    filename: DINO_1y_grid_T.nc
    term: soce
  temperature:
    filename: DINO_1y_grid_T.nc
    term: toce
  ssh:
    filename: DINO_1m_To_1y_grid_T.nc
    term: ssh

Each entry has three parts:
- entry key (ssh, salinity, temperature): identifier used throughout the pipeline to reference this term
- filename: exact NetCDF filename inside --data-path. Not a glob or regex; the literal filename is expected
- term: name of the variable inside that NetCDF file

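
Because each filename must match literally, it can help to check a terms mapping against the data directory before a run. The sketch below assumes the mapping has already been parsed (for example with yaml.safe_load) and sits under a terms key as in the example above; missing_term_files is a hypothetical helper, not part of the package.

```python
from pathlib import Path


def missing_term_files(terms: dict, data_path: str) -> dict:
    """Map each term key to its expected path when that file is absent.

    `terms` is the mapping found under the `terms:` key of the YAML.
    Hypothetical helper for illustration only.
    """
    missing = {}
    for key, entry in terms.items():
        candidate = Path(data_path) / entry["filename"]  # literal name, no globbing
        if not candidate.is_file():
            missing[key] = candidate
    return missing


terms = {
    "salinity": {"filename": "DINO_1y_grid_T.nc", "term": "soce"},
    "temperature": {"filename": "DINO_1y_grid_T.nc", "term": "toce"},
    "ssh": {"filename": "DINO_1m_To_1y_grid_T.nc", "term": "ssh"},
}
for key, path in missing_term_files(terms, "/path/to/simulation/files").items():
    print(f"{key}: expected file not found at {path}")
```

This only checks that the files exist; it does not open them or confirm that the named variable is present.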
The packaged config targets DINO output. To forecast different data (e.g. ORCA1, or DINO files you renamed), copy the packaged YAML, edit the filenames and variable names to match your files, and pass the custom path to the CLI:

python -m nemo_spinup_forecast \
    --ye True \
    --start 20 \
    --end 50 \
    --comp 1 \
    --steps 30 \
    --data-path /path/to/simulation/files \
    --output-path /path/to/output \
    --ocean-terms /path/to/my_ocean_terms.yaml \
    --techniques-config /path/to/techniques_config.yaml

Because filenames live in the YAML, changing them does not require reinstalling the package. Edit the YAML and rerun the CLI.

├── Notebooks
│   ├── Jumper.ipynb
│   └── Resample_ssh.ipynb
├── pyproject.toml
├── README.md
├── ruff.toml
├── src
│   └── nemo_spinup_forecast
│       ├── __init__.py
│       ├── __main__.py
│       ├── cli.py
│       ├── configs
│       │   ├── ocean_terms.DINO.yaml
│       │   └── techniques_config.yaml
│       ├── density.py
│       ├── dimensionality_reduction.py
│       ├── forecast_method.py
│       ├── forecast.py
│       ├── pipeline.py
│       ├── pipeline_utils.py
│       ├── plotting_utils.py
│       └── utils.py
└── tests

To add a custom DR technique, subclass DimensionalityReduction:

from nemo_spinup_forecast.dimensionality_reduction import DimensionalityReduction


class MyDR(DimensionalityReduction):
    def __init__(self, comp, **kwargs):
        self.comp = comp
        # initialise other parameters

    def set_from_simulation(self, sim):
        # copy metadata from Simulation
        ...

    def decompose(self, simulation, length):
        # return components, model instance, mask
        ...

    @staticmethod
    def reconstruct_predictions(predictions, n, info, begin=0):
        # return mask, reconstructed array
        ...

Register the class in the dimensionality_reduction_techniques dict at the bottom of src/nemo_spinup_forecast/dimensionality_reduction.py, and select it in techniques_config.yaml just as for the built‑in techniques.
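
The registration step can be pictured as a name-to-class lookup. The sketch below is an assumption about how such a registry is typically used, not the package's actual resolution code: the base class, the dict, and the config are stand-ins so the example runs on its own, and keying the dict by class name is a guess.

```python
class DimensionalityReduction:
    """Stand-in for the package base class so this sketch runs on its own."""


class MyDR(DimensionalityReduction):
    def __init__(self, comp, **kwargs):
        self.comp = comp


# Stand-in for the dict at the bottom of dimensionality_reduction.py.
dimensionality_reduction_techniques = {}
dimensionality_reduction_techniques["MyDR"] = MyDR  # key choice is an assumption

# Stand-in for the parsed techniques_config.yaml.
config = {"DR_technique": {"name": "MyDR"}}

# Resolve the configured name to a class and instantiate it.
dr_class = dimensionality_reduction_techniques[config["DR_technique"]["name"]]
dr = dr_class(comp=1)
print(type(dr).__name__, dr.comp)
```

If the name in techniques_config.yaml has no matching entry in the dict, a lookup like this raises KeyError, which is a useful first thing to check when a custom class is not picked up.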

To add a custom forecaster, subclass BaseForecaster:

from nemo_spinup_forecast.forecast_method import BaseForecaster


class MyForecaster(BaseForecaster):
    def __init__(self, **params):
        # initialise your model
        ...

    def apply_forecast(self, y_train, x_train, x_pred):
        # fit your model on y_train (and x_train), predict x_pred
        # return (y_hat, y_hat_std)
        ...

Register the class in the forecast_techniques dict at the bottom of src/nemo_spinup_forecast/forecast_method.py and reference it in techniques_config.yaml.

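
For a concrete picture of the apply_forecast contract, here is a toy forecaster that fits a straight line by ordinary least squares. It is a deliberately simple substitute for a Gaussian process, not one of the built-in techniques. BaseForecaster is a stand-in so the sketch runs without the package, plain lists are used where the pipeline presumably passes arrays, and the constant residual standard deviation is an illustrative choice.

```python
import math


class BaseForecaster:
    """Stand-in for nemo_spinup_forecast.forecast_method.BaseForecaster."""


class LinearTrendForecaster(BaseForecaster):
    """Ordinary least-squares line through (x_train, y_train)."""

    def __init__(self, **params):
        self.params = params

    def apply_forecast(self, y_train, x_train, x_pred):
        n = len(x_train)
        x_mean = sum(x_train) / n
        y_mean = sum(y_train) / n
        sxx = sum((x - x_mean) ** 2 for x in x_train)
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_train, y_train))
        slope = sxy / sxx
        intercept = y_mean - slope * x_mean
        # Residual standard deviation, reused as a constant uncertainty estimate.
        residuals = [y - (intercept + slope * x) for x, y in zip(x_train, y_train)]
        std = math.sqrt(sum(r * r for r in residuals) / max(n - 2, 1))
        y_hat = [intercept + slope * x for x in x_pred]
        y_hat_std = [std] * len(x_pred)
        return y_hat, y_hat_std


x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [1.0, 3.0, 5.0, 7.0]  # made-up data lying exactly on y = 2x + 1
y_hat, y_hat_std = LinearTrendForecaster().apply_forecast(y_train, x_train, [4.0, 5.0])
print(y_hat, y_hat_std)
```

The method returns a pair (y_hat, y_hat_std) with one entry per point in x_pred, matching the return comment in the template above.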
Jumper.ipynb demonstrates forecasting with PCA. Copy, modify, or extend it to test your own techniques.
The objective is to forecast yearly simulations of the NEMO coupled climate model with a Gaussian process. For this we need simulation files of the sea surface height (zos or ssh), the salinity (so), and the temperature (thetao).
We apply PCA to each simulation to reduce those fields to component time series, and observe the trend in the first component.
We forecast each component with a Gaussian process using the following kernel:
- Long‑term trend: 0.1*DotProduct(sigma_0=0.0)
- Periodic patterns: 10*ExpSineSquared(length_scale=5/45, periodicity=5/45)
- White noise: 2*WhiteKernel(noise_level=1)
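
Written out, and assuming these are the scikit-learn kernels of the same names and that the three components are summed (neither is stated explicitly here), the covariance function would be:

```latex
k(x, x') = 0.1\,\bigl(\sigma_0^2 + x \cdot x'\bigr)
         + 10\,\exp\!\left(-\frac{2\,\sin^2\!\bigl(\pi\,|x - x'| / p\bigr)}{\ell^2}\right)
         + 2\,\sigma_n^2\,\delta_{x x'},
\qquad \sigma_0 = 0,\quad \ell = p = \tfrac{5}{45},\quad \sigma_n^2 = 1
```

Here $\ell$ is length_scale, $p$ is periodicity, $\sigma_n^2$ is noise_level, and $\delta_{x x'}$ is 1 when the two inputs are identical and 0 otherwise. These are starting values; whether they are optimised during fitting depends on how the Gaussian process is configured.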
We then evaluate the RMSE.
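
For reference, RMSE is the square root of the mean squared difference between forecast and reference values. The sketch below uses plain Python and made-up numbers; which series the notebook actually compares is not stated here, so treat the inputs as placeholders.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


reference = [1.0, 2.0, 3.0, 4.0]  # made-up values for illustration
forecast = [1.0, 2.0, 3.0, 6.0]
print(rmse(reference, forecast))
```

Because errors are squared before averaging, a single large miss dominates the score, as the last point does in this example.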
The tests are designed to ensure the functionality of the Spin-Up NEMO project, which involves preparing and forecasting simulations.
To run the tests, you first need to download the necessary data files.
You can do this by running the download script within the tests directory from the root of the project:

./tests/download_test_data.sh

Then execute the tests using pytest. The tests are located in the tests directory, and you can run them with the following command:

pytest tests/

This project includes code adapted from the Spinup-NEMO repository by Maud Tissot. The original work is licensed under the MIT License.


