Zero-Shot Depth from Defocus

Yiming Zuo* · Hongyu Wen* · Venkat Subramanian* · Patrick Chen · Karhan Kayan · Mario Bijelic · Felix Heide · Jia Deng

(*Equal Contribution)

Princeton Vision & Learning Lab (PVL)

Paper · Project

ZEDD Benchmark

Released under the CC BY 4.0 license:

Website and test server: https://zedd.cs.princeton.edu/.
Huggingface download link: https://huggingface.co/datasets/venkatsubra/ZEDD.

Roadmap

✅ Release FOSSA training code
✅ Release FOSSA evaluation code
✅ Release ZEDD dataset and test server

Installation & Setup

Step 1: Create and activate conda environment

conda create -n fossa python=3.8
conda activate fossa

Step 2: Install Dependencies

pip install -r requirements.txt

Step 3: Build PowerExpPSF CUDA Extension

This is required for training and evaluation with synthetic defocus effects.

Build steps

cd power_exp_psf

# Build and install the extension
python setup.py install

# Verify successful installation
python - <<'PY'
import torch
try:
    import power_exp_psf_cuda
    import os
    path = power_exp_psf_cuda.__file__
    if os.path.exists(path):
        print(f"SUCCESS: power_exp_psf_cuda loaded from {path}")
    else:
        print(f"ERROR: module loaded but file does not exist at {path}")
except Exception as e:
    print(f"IMPORT FAILED: {e}")
PY

cd ..

Step 4: Load datasets into `dataset/datasets`

Dataset download instructions

📦 HAMMER

Download: HAMMER Dataset prepared by MoGe2.

cd dataset/datasets
wget https://huggingface.co/datasets/Ruicheng/monocular-geometry-evaluation/resolve/main/HAMMER.zip
unzip HAMMER.zip
rm -f HAMMER.zip
cd ../..

📦 DDFF-12

Data split

cd dataset/datasets
mkdir ddff12_val_generation
cd ddff12_val_generation
mkdir third_part

Then, in your browser, navigate to the DFV Split (MS Sharepoint) prepared by DFF-DFV.

Click the download button. Then, copy the downloaded "my_ddff_trainVal.h5" file into dataset/datasets/ddff12_val_generation and rename it to "dfv_trainVal.h5".

Intrinsics matrix:

The intrinsics matrix is also provided by DFV (as a .mat file).

Download the raw file from the GitHub UI and place the downloaded IntParamLF.mat in "dataset/datasets/ddff12_val_generation/third_part/".

At the end, the "dataset" directory should look like this (you only need to create ddff12_val_generation and HAMMER yourself).

Expected format:

dataset/
├── datasets/
│   ├── ddff12_val_generation/
│   │   ├── dfv_trainVal.h5
│   │   └── third_part/
│   │       └── IntParamLF.mat
│   ├── HAMMER/
│   │   ├── scene2_traj1_1/
│   │   │   ├── 000000/
│   │   │   │   ├── depth.png
│   │   │   │   ├── intrinsics.json
│   │   │   │   └── meta.json
│   │   │   └── ...
│   │   ├── ...
│   │   └── .index.txt
│   └── splits/
│       └── infinigen_defocus/
│           └── val.json
├── __init__.py
├── base.py
├── ddff12_val.py
├── hammer.py
├── infinigen_defocus.py
├── uniformat.py
└── zedd.py
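The layout above can be checked with a short script before running validation. This is a minimal sketch, not part of the repository: the helper name `missing_paths` is hypothetical, the paths are copied from the tree above, and it assumes you run it from the repository root.

```python
from pathlib import Path

# Paths copied from the expected layout above, relative to the repository root.
REQUIRED = [
    "dataset/datasets/ddff12_val_generation/dfv_trainVal.h5",
    "dataset/datasets/ddff12_val_generation/third_part/IntParamLF.mat",
    "dataset/datasets/HAMMER/.index.txt",
    "dataset/datasets/splits/infinigen_defocus/val.json",
]


def missing_paths(repo_root="."):
    """Return the required paths that do not exist under repo_root (hypothetical helper)."""
    root = Path(repo_root)
    return [p for p in REQUIRED if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing files:")
        for p in missing:
            print(f"  {p}")
    else:
        print("Dataset layout looks complete.")
```

It only tests for the presence of a few marker files; it does not validate their contents.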

Datasets loaded from HuggingFace (no manual download needed)

Note: the first evaluation on these datasets takes some time, because the zip file has to be downloaded and unpacked. If you download the zip file manually, delete the outer folder created by unzipping so that the contents match the expected file structure (the provided code removes the outer folder automatically).
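For a manual download, removing the outer folder could look roughly like the sketch below. It is an illustration only: `extract_flat` is a hypothetical helper, not the repository's own code, and it assumes the target directory is empty and the archive unpacks into exactly one wrapper directory.

```python
import shutil
import zipfile
from pathlib import Path


def extract_flat(zip_path, target_dir):
    """Extract zip_path into target_dir and remove a single outer folder, if any."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)

    entries = list(target.iterdir())
    # Only flatten when everything landed in one wrapper directory.
    if len(entries) == 1 and entries[0].is_dir():
        # Rename first so a child with the same name as the wrapper cannot collide.
        outer = entries[0].rename(entries[0].with_name(entries[0].name + ".unwrap"))
        for child in outer.iterdir():
            shutil.move(str(child), str(target / child.name))
        outer.rmdir()
    return target
```

Which directory under dataset/datasets each archive belongs in follows the expected format in this section; the sketch does not decide that for you.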

Final expected format:

dataset/
├── datasets/
│   ├── ddff12_val_generation/
│   │   ├── dfv_trainVal.h5
│   │   └── third_part/
│   │       └── IntParamLF.mat
│   ├── defocus_uniformat/
│   │   ├── diode/
│   │   │   ├── diode_indoor_v2/
│   │   │   │   ├── 000000.npy
│   │   │   │   ├── 000001.npy
│   │   │   │   └── ...
│   │   │   └── diode_outdoor_v2/
│   │   │       ├── 000000.npy
│   │   │       ├── 000001.npy
│   │   │       └── ...
│   │   └── ibims/
│   │       ├── 000000.npy
│   │       ├── 000001.npy
│   │       └── ...
│   ├── HAMMER/
│   │   ├── scene2_traj1_1/
│   │   │   ├── 000000/
│   │   │   │   ├── depth.png
│   │   │   │   ├── intrinsics.json
│   │   │   │   └── meta.json
│   │   │   └── ...
│   │   ├── ...
│   │   └── .index.txt
│   ├── infinigen_defocus/
│   │   ├── 1a4897de_1/
│   │   │   ├── cam_all_in_focus.npz
│   │   │   ├── cam_ap_1.40_fd_0.80.npz
│   │   │   ├── ...
│   │   │   ├── depth.npy
│   │   │   ├── image_all_in_focus.png
│   │   │   └── image_ap_1.40_fd_0.80.png
│   │   └── ...
│   ├── ZEDD/
│   │   ├── test/
│   │   │   ├── test_0001/
│   │   │   │   ├── focus_stack/
│   │   │   │   │   ├── img_run_1_motor_6D3E_aperture_F1.4.jpg
│   │   │   │   │   ├── img_run_1_motor_6D3E_aperture_F2.0.jpg
│   │   │   │   │   └── ...
│   │   │   │   └── gt/
│   │   │   │       └── K.txt
│   │   │   └── ...
│   │   └── val/
│   │       ├── val_0001/
│   │       │   ├── focus_stack/
│   │       │   │   ├── img_run_1_motor_6D3E_aperture_F1.4.jpg
│   │       │   │   ├── img_run_1_motor_6D3E_aperture_F2.0.jpg
│   │       │   │   └── ...
│   │       │   └── gt/
│   │       │       ├── depth_vis.jpg
│   │       │       ├── depth.npy
│   │       │       ├── K.txt
│   │       │       └── overlay.jpg
│   │       └── ...
│   └── splits/
│       └── infinigen_defocus/
│           └── val.json
├── __init__.py
├── base.py
├── ddff12_val.py
├── hammer.py
├── infinigen_defocus.py
├── uniformat.py
└── zedd.py

Validation Quickstart

Running Validation

The easiest way to validate is using the distributed validation script:

bash dist_val.sh --encoder [VITS/VITB] --resumed_from [NAME OF PARAMETERS] --val_loader_config_choice [VAL_CONFIG_CHOICE]

Available Validation Configurations

See config/validation_configs.py for all predefined validation setups.

Model Loading Options

Option 1: Load from HuggingFace Hub (recommended)

resumed_from='model_name'  # automatically pull from venkatsubra/model_name

Option 2: Load from local path

resumed_from='/path/to/model.pth'

Model links: ViT-B, ViT-S
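The two options can be told apart with a check like the one below. This is a sketch under assumptions: `resolve_checkpoint` is a hypothetical helper, and the rule "an existing file is local, a bare name is a Hub model" is a guess at the behaviour described above, not the repository's actual loading logic. It only classifies the argument and does not download anything.

```python
import os

HUB_NAMESPACE = "venkatsubra"  # namespace named in Option 1 above


def resolve_checkpoint(resumed_from):
    """Classify resumed_from as a local file or a Hub repo id (hypothetical helper)."""
    if os.path.isfile(resumed_from):
        return ("local", os.path.abspath(resumed_from))
    # Something that looks like a path but does not exist is most likely a typo.
    if "/" in resumed_from or os.sep in resumed_from or resumed_from.endswith(".pth"):
        raise FileNotFoundError(f"Checkpoint not found: {resumed_from}")
    return ("hub", f"{HUB_NAMESPACE}/{resumed_from}")
```

For example, `resolve_checkpoint("fossa-vits")` yields the repo id `venkatsubra/fossa-vits`, while a mistyped local path raises an error instead of being sent to the Hub.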

Reproducing Numbers in the Paper
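The ZEDD, Infinigen, iBims-1, DIODE, and HAMMER results in this section report D1.05, D1.15, D1.25, and abs_rel. The README does not define them, so the sketch below assumes the usual depth-estimation convention: D_t is the fraction of valid pixels with max(pred/gt, gt/pred) < t, and abs_rel is the mean of |pred - gt| / gt. The function name, the validity mask (positive depths only), and the flat-list input are illustrative choices, not the repository's evaluation code.

```python
def depth_metrics(pred, gt, thresholds=(1.05, 1.15, 1.25)):
    """Threshold accuracies and abs_rel over paired depth values (illustrative)."""
    # Assumption: only pixels with positive prediction and ground truth are valid.
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixels")
    n = len(pairs)
    ratios = [max(p / g, g / p) for p, g in pairs]
    out = {f"D{t}": sum(r < t for r in ratios) / n for t in thresholds}
    out["abs_rel"] = sum(abs(p - g) / g for p, g in pairs) / n
    return out
```

Higher is better for the D values and lower is better for abs_rel.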

🔹 ViT-S

Table 2

ZEDD

Note: The results below are on the validation split, so they do not match the Table 2 numbers, which are reported on the test split.

bash dist_val.sh --encoder vits --resumed_from fossa-vits \
  --val_loader_config_choice zedd_F2_8_fixed_fd_0_2_4_6_8

D1.05	D1.15	D1.25	abs_rel
0.4450	0.7866	0.8858	0.0985

Infinigen

bash dist_val.sh --encoder vits --resumed_from fossa-vits \
  --val_loader_config_choice infinigen_defocus_F1_4_fixed_fd_0_8,1_7,3_0,4_7,8_0

D1.05	D1.15	D1.25	abs_rel
0.5201	0.8635	0.9400	0.0847

Table 3

iBims-1

bash dist_val.sh --encoder vits --resumed_from fossa-vits \
  --val_loader_config_choice ibims_F1_4_adaptive_fd

D1.05	D1.15	D1.25	abs_rel
0.5193	0.8502	0.9540	0.0745

DIODE

bash dist_val.sh --encoder vits --resumed_from fossa-vits \
  --val_loader_config_choice diode_F1_4_adaptive_fd

D1.05	D1.15	D1.25	abs_rel
0.4105	0.6649	0.7661	0.1778

HAMMER

bash dist_val.sh --encoder vits --resumed_from fossa-vits \
  --val_loader_config_choice hammer_F1_4_adaptive_fd

D1.05	D1.15	D1.25	abs_rel
0.6006	0.9889	0.9987	0.0440

Table 4

DDFF12 (Base Model)

bash dist_val.sh --encoder vits --resumed_from fossa-vits \
  --val_loader_config_choice ddff12_val

MSE	RMSE	AbsRel	SqRel	D1	D2	D3
0.0015	0.0352	0.2676	0.0119	0.3462	0.8119	0.9544

DDFF12 (Finetuned)

bash dist_val.sh --encoder vits --resumed_from fossa-vits-ddff-finetuned \
  --val_loader_config_choice ddff12_val

MSE	RMSE	AbsRel	SqRel	D1	D2	D3
0.0004	0.0183	0.1076	0.0045	0.9363	0.9829	0.9908

🔹 ViT-B

Table 2

ZEDD

Note: The results below are on the validation split, so they do not match the Table 2 numbers, which are reported on the test split.

bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
  --val_loader_config_choice zedd_F2_8_fixed_fd_0_2_4_6_8

D1.05	D1.15	D1.25	abs_rel
0.4317	0.8101	0.9194	0.0957

Infinigen

bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
  --val_loader_config_choice infinigen_defocus_F1_4_fixed_fd_0_8,1_7,3_0,4_7,8_0

D1.05	D1.15	D1.25	abs_rel
0.4199	0.8199	0.9355	0.0908

Table 3

iBims-1

bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
  --val_loader_config_choice ibims_F1_4_adaptive_fd

D1.05	D1.15	D1.25	abs_rel
0.5548	0.8719	0.9633	0.0701

DIODE

bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
  --val_loader_config_choice diode_F1_4_adaptive_fd

D1.05	D1.15	D1.25	abs_rel
0.4127	0.6692	0.7786	0.1601

HAMMER

bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
  --val_loader_config_choice hammer_F1_4_adaptive_fd

D1.05	D1.15	D1.25	abs_rel
0.9377	0.9974	0.9993	0.0172

Table 4

DDFF12 (Base Model)

bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
  --val_loader_config_choice ddff12_val

MSE	RMSE	AbsRel	SqRel	D1	D2	D3
0.0013	0.0324	0.2105	0.0107	0.6075	0.9206	0.9679

DDFF12 (Finetuned)

bash dist_val.sh --encoder vitb --resumed_from fossa-vitb-ddff-finetuned \
  --val_loader_config_choice ddff12_val

MSE	RMSE	AbsRel	SqRel	D1	D2	D3
0.0003	0.0148	0.1088	0.0025	0.9322	0.9866	0.9939
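To rerun every command in this section without copying them one by one, the command lines can be generated. The sketch below only builds and prints them; the flags, model names, and configuration names are copied from the commands above, while the helper `validation_commands` is hypothetical.

```python
import shlex

# Configuration names copied from the commands above.
CONFIGS = [
    "zedd_F2_8_fixed_fd_0_2_4_6_8",
    "infinigen_defocus_F1_4_fixed_fd_0_8,1_7,3_0,4_7,8_0",
    "ibims_F1_4_adaptive_fd",
    "diode_F1_4_adaptive_fd",
    "hammer_F1_4_adaptive_fd",
    "ddff12_val",
]


def validation_commands(encoder):
    """Build the dist_val.sh command lines for one encoder ('vits' or 'vitb')."""
    base = f"fossa-{encoder}"
    runs = [(base, cfg) for cfg in CONFIGS]
    # The finetuned DDFF model is evaluated on ddff12_val only.
    runs.append((f"{base}-ddff-finetuned", "ddff12_val"))
    return [
        ["bash", "dist_val.sh", "--encoder", encoder,
         "--resumed_from", model, "--val_loader_config_choice", cfg]
        for model, cfg in runs
    ]


if __name__ == "__main__":
    for enc in ("vits", "vitb"):
        for cmd in validation_commands(enc):
            print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run` unchanged if you prefer to launch the runs from Python.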

Submitting to ZEDD Test Server

For the ZEDD test set, save model outputs in the following format:

A single .zip file containing exactly 50 .npy files at the root level (no subdirectories)
Files must be named zedd_output_0001.npy through zedd_output_0050.npy
Each .npy file must be a 2-D float array of shape (H=1216, W=1824) — no channel dimension
All values must be finite (no NaN or Inf)

Please run the following command to check the file format before submitting to the server:

python zedd_test/zedd_check_format.py --zip [YOUR_ZIP_FILE]
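The naming rules can also be pre-checked with the standard library alone. The sketch below is a hypothetical helper, not a replacement for zedd_check_format.py: it looks only at file names and directory structure, and it does not open the arrays, so shape and finiteness still need the official checker.

```python
import zipfile

# File names required by the format rules above.
EXPECTED = {f"zedd_output_{i:04d}.npy" for i in range(1, 51)}


def check_submission_names(zip_path):
    """Return a list of naming problems; an empty list means the names look right."""
    with zipfile.ZipFile(zip_path) as zf:
        all_names = zf.namelist()
    files = [n for n in all_names if not n.endswith("/")]
    problems = []
    if any("/" in n for n in all_names):
        problems.append("archive contains subdirectories")
    missing = sorted(EXPECTED - set(files))
    extra = sorted(set(files) - EXPECTED)
    if missing:
        problems.append(f"missing files: {missing}")
    if extra:
        problems.append(f"unexpected files: {extra}")
    return problems
```

An empty return value means the archive holds exactly the 50 expected names at the root level.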

For example, to build the zip file for FOSSA ViT-S:

bash dist_test.sh --encoder=vits --resumed_from fossa-vits --val_loader_config_choice zedd_test_F2_8_fixed_fd_0_2_4_6_8 --experiment_name=FOSSA --zedd_test_output_dir=zedd_outputs

Finally, submit your zip file to the ZEDD test server.

Training from Scratch & Finetuning on DDFF

See Training.md for details.

Troubleshooting

PowerExpPSF building

❌ Error: `nvcc` not found / CUDA extension build fails

If you see an error like: "error: [Errno 2] No such file or directory: '/usr/local/cuda-12.1/bin/nvcc'" or "nvcc not found", this means your environment does not have a CUDA toolkit with nvcc available.

✅ Fix: Load a valid CUDA toolkit and set environment variables

On cluster environments, load an available CUDA module:

module avail cuda
module load cudatoolkit/12.6   # or closest version to your PyTorch CUDA
export CUDA_HOME=/usr/local/cuda-12.6
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"

Then verify:

which nvcc
nvcc --version

Then retry the build from the power_exp_psf directory:

python setup.py install
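The same check can be scripted, which is handy in job scripts on a cluster. This is a small sketch with a hypothetical helper `find_nvcc`; it assumes the toolkit follows the usual layout with nvcc under `$CUDA_HOME/bin`, and otherwise falls back to searching PATH.

```python
import os
import shutil
import subprocess


def find_nvcc():
    """Return the path to nvcc from CUDA_HOME or PATH, or None if it is not found."""
    cuda_home = os.environ.get("CUDA_HOME")
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return shutil.which("nvcc")


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc is None:
        print("nvcc not found: load a CUDA module or set CUDA_HOME")
    else:
        print(f"nvcc found at {nvcc}")
        result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        print(result.stdout)
```

If the script reports that nvcc is missing, the extension build will fail in the same way, so fix the environment first.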

Citation

@article{ZeroShotDepthFromDefocus,
  author  = {Zuo, Yiming and Wen, Hongyu and Subramanian, Venkat and Chen, Patrick and Kayan, Karhan and Bijelic, Mario and Heide, Felix and Deng, Jia},
  title   = {Zero-Shot Depth from Defocus},
  journal = {arXiv preprint arXiv:2603.26658},
  year    = {2026},
  url     = {https://arxiv.org/abs/2603.26658}
}

Acknowledgments

This codebase is partially based on Depth Anything v2, Video Depth Anything, DFF-DFV, and Unsupervised Depth from Focus.
