An evaluation framework from the paper "Modern Neural Networks for Small Tabular Datasets: The New Default for Field-Scale Digital Soil Mapping?"
This repository provides a unified framework for benchmarking modern deep neural networks, alongside classical machine-learning baselines, on small tabular datasets. The benchmark covers 31 field- and farm-scale digital soil mapping datasets from LimeSoDa.
- Datasets: Uses soil datasets from the LimeSoDa repository with proximal soil sensing and remote sensing features.
- Models: Implements 15+ models with a unified interface:
  - Classical ML: Linear Regression, Ridge, Lasso, PLSR, Random Forest, XGBoost
  - MLP-based NNs: MLP, TabM, RealMLP
  - Retrieval-based NNs: TabR, ModernNCA
  - Attention-based NNs: AutoInt, FT-Transformer, ExcelFormer, T2G-Former, AMFormer
  - In-context learning foundation models: TabPFN
- Configuration: Experiment settings are defined in YAML configuration files. Configurations for datasets with a feature-to-sample ratio < 1 are in the config/pss/ folder, while those for high-dimensional datasets with a ratio > 1 (including MIR/NIR spectroscopy features) are in the config/spectroscopic/ folder.
- Preprocessing: Built-in support for PCA, feature scaling, and numerical embeddings.
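
The README does not spell out the shared model interface or the preprocessing API, so the following is only a minimal sketch under stated assumptions. The classes `Regressor`, `StandardScalerPCA`, `Ridge` and `Pipeline` are hypothetical: they are not this repository's code and not scikit-learn's classes of the same names. They show one way a common `fit`/`predict` interface can chain feature scaling and PCA with a classical model, using NumPy only; numerical embeddings are not covered.

```python
from typing import Protocol

import numpy as np


class Regressor(Protocol):
    """Hypothetical shared interface: every model exposes fit/predict."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "Regressor": ...

    def predict(self, X: np.ndarray) -> np.ndarray: ...


class StandardScalerPCA:
    """Standardize features, then project onto the leading principal axes."""

    def __init__(self, n_components: int):
        self.n_components = n_components

    def fit(self, X: np.ndarray) -> "StandardScalerPCA":
        # Statistics come from the training data only.
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0)
        self.scale_[self.scale_ == 0] = 1.0  # avoid dividing by zero for constant features
        Z = (X - self.mean_) / self.scale_
        # Z is centered, so its right singular vectors are the principal axes,
        # ordered by decreasing singular value.
        _, _, Vt = np.linalg.svd(Z, full_matrices=False)
        self.components_ = Vt[: self.n_components]
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        Z = (X - self.mean_) / self.scale_
        return Z @ self.components_.T


class Ridge:
    """Closed-form ridge regression; the intercept is not penalized."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, X: np.ndarray, y: np.ndarray) -> "Ridge":
        # Center X and y so the intercept is recovered without shrinkage.
        self.x_mean_ = X.mean(axis=0)
        self.y_mean_ = y.mean()
        Xc = X - self.x_mean_
        A = Xc.T @ Xc + self.alpha * np.eye(X.shape[1])
        self.coef_ = np.linalg.solve(A, Xc.T @ (y - self.y_mean_))
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return (X - self.x_mean_) @ self.coef_ + self.y_mean_


class Pipeline:
    """Preprocessing + model, exposed through the same fit/predict interface."""

    def __init__(self, pre: StandardScalerPCA, model: Regressor):
        self.pre = pre
        self.model = model

    def fit(self, X: np.ndarray, y: np.ndarray) -> "Pipeline":
        self.model.fit(self.pre.fit(X).transform(X), y)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.model.predict(self.pre.transform(X))


# Synthetic example: 40 samples, 10 features, first 30 rows used for training.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=40)
model = Pipeline(StandardScalerPCA(n_components=5), Ridge(alpha=1.0)).fit(X[:30], y[:30])
pred = model.predict(X[30:])
print(pred.shape)  # (10,)
```

Fitting the scaler and PCA inside `Pipeline.fit` keeps held-out statistics out of the preprocessing. On small field-scale datasets, a handful of samples can noticeably shift the feature means and principal axes, so this separation matters more than it would on large tables.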

Requirements: Python 3.10+

Install the dependencies:

    pip install -r requirements.txt

Run experiments using YAML configuration files:

    python benchmark.py --config config/pss/limesoda_mlp.yaml

Example configuration files are provided in the config/pss/ and config/spectroscopic/ folders.
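
The folder rule from the Configuration description (feature-to-sample ratio < 1 in config/pss/, > 1 in config/spectroscopic/) can be expressed as a small helper. This is a sketch: `config_folder` and `benchmark_command` are hypothetical names, not part of the repository. Only `benchmark.py`, its `--config` flag, the two folder names and the example file `limesoda_mlp.yaml` come from this README; the names of other configuration files are not documented here.

```python
def config_folder(n_features: int, n_samples: int) -> str:
    """Pick the config folder from the feature-to-sample ratio.

    Rule stated in the README: ratio < 1 -> config/pss,
    ratio > 1 -> config/spectroscopic.
    """
    if n_features <= 0 or n_samples <= 0:
        raise ValueError("n_features and n_samples must be positive")
    ratio = n_features / n_samples
    if ratio < 1:
        return "config/pss"
    if ratio > 1:
        return "config/spectroscopic"
    # The README does not say where a ratio of exactly 1 belongs.
    raise ValueError("a feature-to-sample ratio of exactly 1 is not covered")


def benchmark_command(config_path: str) -> list[str]:
    """Build the documented benchmark invocation as an argument list."""
    return ["python", "benchmark.py", "--config", config_path]


# Hypothetical dataset with 5 features and 100 samples -> ratio 0.05.
folder = config_folder(n_features=5, n_samples=100)
cmd = benchmark_command(f"{folder}/limesoda_mlp.yaml")
print(" ".join(cmd))  # python benchmark.py --config config/pss/limesoda_mlp.yaml
```

From the repository root, the list can be run with `subprocess.run(cmd, check=True)`; passing an argument list rather than a shell string avoids quoting problems with paths.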
Complete experimental results, including the optimized hyperparameters for all dataset-model combinations and the model predictions, are available in results.tar.gz.
@article{barkov2026modern,
title = {Modern neural networks for small tabular datasets: {The} new default for field-scale {Digital} {Soil} {Mapping}?},
author = {Barkov, Viacheslav and Schmidinger, Jonas and Gebbers, Robin and Atzmueller, Martin},
journal = {European Journal of Soil Science},
volume = {77},
year = {2026},
pages = {e70299},
number = {2},
doi = {10.1111/ejss.70299},
}