Yiming Zuo* · Hongyu Wen* · Venkat Subramanian* · Patrick Chen · Karhan Kayan · Mario Bijelic · Felix Heide · Jia Deng
(*Equal Contribution)
Princeton Vision & Learning Lab (PVL)
Released under the CC BY 4.0 license at:
- Website and test server: https://zedd.cs.princeton.edu/
- Hugging Face download link: https://huggingface.co/datasets/venkatsubra/ZEDD
- [x] Release FOSSA training code
- [x] Release FOSSA evaluation code
- [x] Release ZEDD dataset and test server
conda create -n fossa python=3.8
conda activate fossa
pip install -r requirements.txt

The power_exp_psf CUDA extension is required for training and evaluation with synthetic defocus effects.
Build steps
cd power_exp_psf
# Build and install the extension
python setup.py install
# Verify successful installation
python - <<'PY'
import torch  # imported first; the extension links against PyTorch
try:
    import power_exp_psf_cuda
    import os
    path = power_exp_psf_cuda.__file__
    if os.path.exists(path):
        print(f"SUCCESS: power_exp_psf_cuda loaded from {path}")
    else:
        print(f"ERROR: module loaded but file does not exist at {path}")
except Exception as e:
    print(f"IMPORT FAILED: {e}")
PY
cd ..
Datasets download instructions
Download: HAMMER Dataset prepared by MoGe2.
cd dataset/datasets
wget https://huggingface.co/datasets/Ruicheng/monocular-geometry-evaluation/resolve/main/HAMMER.zip
unzip HAMMER.zip
rm -f HAMMER.zip
cd ../..

cd dataset/datasets
mkdir ddff12_val_generation
cd ddff12_val_generation
mkdir third_part

Then, in your browser, navigate to the DFV Split (MS Sharepoint) prepared by DFF-DFV.
Click the download button. Then copy the downloaded "my_ddff_trainVal.h5" file into dataset/datasets/ddff12_val_generation and rename it to "dfv_trainVal.h5".
The intrinsics matrix is also provided by DFV (as a .mat file).
Download the "raw file" in the GitHub UI and place the downloaded IntParamLF.mat in "dataset/datasets/ddff12_val_generation/third_part/".
At the end, the "dataset" directory should look like this (you only need to create ddff12_val_generation and HAMMER; the other entries already exist).
dataset/
├── datasets/
│   ├── ddff12_val_generation/
│   │   ├── dfv_trainVal.h5
│   │   └── third_part/
│   │       └── IntParamLF.mat
│   ├── HAMMER/
│   │   ├── scene2_traj1_1/
│   │   │   ├── 000000/
│   │   │   │   ├── depth.png
│   │   │   │   ├── intrinsics.json
│   │   │   │   └── meta.json
│   │   │   └── ...
│   │   ├── ...
│   │   └── .index.txt
│   └── splits/
│       └── infinigen_defocus/
│           └── val.json
├── __init__.py
├── base.py
├── ddff12_val.py
├── hammer.py
├── infinigen_defocus.py
├── uniformat.py
└── zedd.py
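
Before running evaluation, it can help to confirm that the manually placed files ended up where the tree above shows them. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` and the `EXPECTED` list are illustrative, and the paths are simply copied from the tree above.

```python
from pathlib import Path

# Relative paths copied from the directory tree above (illustrative list,
# not an official manifest); adjust it if your layout differs.
EXPECTED = [
    "datasets/ddff12_val_generation/dfv_trainVal.h5",
    "datasets/ddff12_val_generation/third_part/IntParamLF.mat",
    "datasets/HAMMER/.index.txt",
    "datasets/splits/infinigen_defocus/val.json",
]


def missing_paths(dataset_root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths("dataset")
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  dataset/{rel}")
    else:
        print("All expected files found.")
```

Run it from the repository root; an empty result means the four files listed above are present. It does not check file contents.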
Note: The first evaluation on these datasets takes some time, because the zip file has to be downloaded and unpacked. If you download the zip file manually, you must delete the outer folder created by unzipping it to obtain the file structure above (the provided code removes the outer folder automatically).
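
For a manual download, removing the outer folder can be scripted. The following is a minimal sketch under stated assumptions: the helper name `extract_flat` is hypothetical, the archive is assumed to contain exactly one top-level folder, and the destination directory is assumed to be empty. It is not the repository's own unpacking code.

```python
import shutil
import zipfile
from pathlib import Path


def extract_flat(zip_path, dest_dir):
    """Extract zip_path into dest_dir; if the archive has a single outer folder, drop it."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        # First path component of every member, i.e. the top-level entries.
        top_level = {name.split("/", 1)[0] for name in zf.namelist() if name.strip("/")}
    if len(top_level) == 1:
        outer = dest / top_level.pop()
        if outer.is_dir():
            # Rename the outer folder first so a child with the same name cannot collide.
            tmp = dest / (outer.name + ".__unpacking__")
            outer.rename(tmp)
            for child in tmp.iterdir():
                shutil.move(str(child), str(dest / child.name))
            tmp.rmdir()
    return dest
```

If the archive has several top-level entries, the function leaves the extracted files as they are.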
dataset/
├── datasets/
│   ├── ddff12_val_generation/
│   │   ├── dfv_trainVal.h5
│   │   └── third_part/
│   │       └── IntParamLF.mat
│   ├── defocus_uniformat/
│   │   ├── diode/
│   │   │   ├── diode_indoor_v2/
│   │   │   │   ├── 000000.npy
│   │   │   │   ├── 000001.npy
│   │   │   │   └── ...
│   │   │   └── diode_outdoor_v2/
│   │   │       ├── 000000.npy
│   │   │       ├── 000001.npy
│   │   │       └── ...
│   │   └── ibims/
│   │       ├── 000000.npy
│   │       ├── 000001.npy
│   │       └── ...
│   ├── HAMMER/
│   │   ├── scene2_traj1_1/
│   │   │   ├── 000000/
│   │   │   │   ├── depth.png
│   │   │   │   ├── intrinsics.json
│   │   │   │   └── meta.json
│   │   │   └── ...
│   │   ├── ...
│   │   └── .index.txt
│   ├── infinigen_defocus/
│   │   ├── 1a4897de_1/
│   │   │   ├── cam_all_in_focus.npz
│   │   │   ├── cam_ap_1.40_fd_0.80.npz
│   │   │   ├── ...
│   │   │   ├── depth.npy
│   │   │   ├── image_all_in_focus.png
│   │   │   └── image_ap_1.40_fd_0.80.png
│   │   └── ...
│   ├── ZEDD/
│   │   ├── test/
│   │   │   ├── test_0001/
│   │   │   │   ├── focus_stack/
│   │   │   │   │   ├── img_run_1_motor_6D3E_aperture_F1.4.jpg
│   │   │   │   │   ├── img_run_1_motor_6D3E_aperture_F2.0.jpg
│   │   │   │   │   └── ...
│   │   │   │   └── gt/
│   │   │   │       └── K.txt
│   │   │   └── ...
│   │   └── val/
│   │       ├── val_0001/
│   │       │   ├── focus_stack/
│   │       │   │   ├── img_run_1_motor_6D3E_aperture_F1.4.jpg
│   │       │   │   ├── img_run_1_motor_6D3E_aperture_F2.0.jpg
│   │       │   │   └── ...
│   │       │   └── gt/
│   │       │       ├── depth_vis.jpg
│   │       │       ├── depth.npy
│   │       │       ├── K.txt
│   │       │       └── overlay.jpg
│   │       └── ...
│   └── splits/
│       └── infinigen_defocus/
│           └── val.json
├── __init__.py
├── base.py
├── ddff12_val.py
├── hammer.py
├── infinigen_defocus.py
├── uniformat.py
└── zedd.py
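
The Infinigen Defocus file names in the tree above appear to encode capture settings. The following is a minimal sketch for reading them back, assuming that `ap` is the aperture f-number and `fd` is the focus distance (an interpretation inferred from the file names and config names, not confirmed by this README). The helper name `parse_defocus_name` is hypothetical.

```python
import re

# Assumed pattern, inferred from names such as "image_ap_1.40_fd_0.80.png"
# and "cam_ap_1.40_fd_0.80.npz" in the tree above.
_DEFOCUS_NAME = re.compile(
    r"^(image|cam)_ap_(\d+(?:\.\d+)?)_fd_(\d+(?:\.\d+)?)\.(png|npz)$"
)


def parse_defocus_name(filename):
    """Return (kind, aperture, focus_distance), or None if the name does not match."""
    match = _DEFOCUS_NAME.match(filename)
    if match is None:
        return None
    kind, aperture, focus_distance, _ext = match.groups()
    return kind, float(aperture), float(focus_distance)
```

Names without these fields, such as image_all_in_focus.png, return None, so they can be filtered out or handled separately.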
Dataset: ZEDD on Hugging Face
Dataset: Infinigen Defocus on Hugging Face
Dataset: Preprocessed (depth holes filled) on Hugging Face
The easiest way to validate is to use the distributed validation script:
bash dist_val.sh --encoder [VITS/VITB] --resumed_from [NAME OF PARAMETERS] --val_loader_config_choice [VAL_CONFIG_CHOICE]
See config/validation_configs.py for all predefined validation setups.

The model weights can be loaded in two ways:
Option 1: Load from HuggingFace Hub (recommended)
resumed_from='model_name' # automatically pulled from venkatsubra/model_name
Option 2: Load from local path
resumed_from='/path/to/model.pth'

🔹 ViT-S
Note: The results below are on the validation split, so they do not match the numbers in Table 2, which are reported on the test split.

bash dist_val.sh --encoder vits --resumed_from fossa-vits \
    --val_loader_config_choice zedd_F2_8_fixed_fd_0_2_4_6_8

| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.4450 | 0.7866 | 0.8858 | 0.0985 |
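
The column names in the table above are not defined in this README. The following is a minimal sketch of how such numbers are commonly computed, assuming the standard definitions: D1.05 is the fraction of valid pixels with max(pred/gt, gt/pred) < 1.05 (likewise for 1.15 and 1.25), and abs_rel is the mean of |pred - gt| / gt. The repository's evaluation code may differ in masking, alignment, or averaging; the function name `depth_metrics` is illustrative.

```python
import numpy as np


def depth_metrics(pred, gt, thresholds=(1.05, 1.15, 1.25)):
    """Threshold accuracies and abs_rel over pixels with valid, positive depth.

    Assumes the standard definitions; not taken from the FOSSA codebase.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Keep only pixels where both depths are finite and strictly positive.
    valid = np.isfinite(gt) & (gt > 0) & np.isfinite(pred) & (pred > 0)
    p, g = pred[valid], gt[valid]
    ratio = np.maximum(p / g, g / p)
    results = {f"D{t:.2f}": float(np.mean(ratio < t)) for t in thresholds}
    results["abs_rel"] = float(np.mean(np.abs(p - g) / g))
    return results
```

Higher is better for the D columns and lower is better for abs_rel.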

bash dist_val.sh --encoder vits --resumed_from fossa-vits \
    --val_loader_config_choice infinigen_defocus_F1_4_fixed_fd_0_8,1_7,3_0,4_7,8_0

| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.5201 | 0.8635 | 0.9400 | 0.0847 |

bash dist_val.sh --encoder vits --resumed_from fossa-vits \
    --val_loader_config_choice ibims_F1_4_adaptive_fd

| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.5193 | 0.8502 | 0.9540 | 0.0745 |

bash dist_val.sh --encoder vits --resumed_from fossa-vits \
    --val_loader_config_choice diode_F1_4_adaptive_fd

| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.4105 | 0.6649 | 0.7661 | 0.1778 |

bash dist_val.sh --encoder vits --resumed_from fossa-vits \
    --val_loader_config_choice hammer_F1_4_adaptive_fd

| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.6006 | 0.9889 | 0.9987 | 0.0440 |

bash dist_val.sh --encoder vits --resumed_from fossa-vits \
    --val_loader_config_choice ddff12_val

| MSE | RMSE | AbsRel | SqRel | D1 | D2 | D3 |
|---|---|---|---|---|---|---|
| 0.0015 | 0.0352 | 0.2676 | 0.0119 | 0.3462 | 0.8119 | 0.9544 |

bash dist_val.sh --encoder vits --resumed_from fossa-vits-ddff-finetuned \
    --val_loader_config_choice ddff12_val

| MSE | RMSE | AbsRel | SqRel | D1 | D2 | D3 |
|---|---|---|---|---|---|---|
| 0.0004 | 0.0183 | 0.1076 | 0.0045 | 0.9363 | 0.9829 | 0.9908 |

🔹 ViT-B
Note: The results below are on the validation split, so they do not match the numbers in Table 2, which are reported on the test split.

bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
    --val_loader_config_choice zedd_F2_8_fixed_fd_0_2_4_6_8

| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.4317 | 0.8101 | 0.9194 | 0.0957 |

bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
    --val_loader_config_choice infinigen_defocus_F1_4_fixed_fd_0_8,1_7,3_0,4_7,8_0

| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.4199 | 0.8199 | 0.9355 | 0.0908 |

bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
    --val_loader_config_choice ibims_F1_4_adaptive_fd

| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.5548 | 0.8719 | 0.9633 | 0.0701 |

bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
    --val_loader_config_choice diode_F1_4_adaptive_fd

| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.4127 | 0.6692 | 0.7786 | 0.1601 |

bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
    --val_loader_config_choice hammer_F1_4_adaptive_fd

| D1.05 | D1.15 | D1.25 | abs_rel |
|---|---|---|---|
| 0.9377 | 0.9974 | 0.9993 | 0.0172 |

bash dist_val.sh --encoder vitb --resumed_from fossa-vitb \
    --val_loader_config_choice ddff12_val

| MSE | RMSE | AbsRel | SqRel | D1 | D2 | D3 |
|---|---|---|---|---|---|---|
| 0.0013 | 0.0324 | 0.2105 | 0.0107 | 0.6075 | 0.9206 | 0.9679 |

bash dist_val.sh --encoder vitb --resumed_from fossa-vitb-ddff-finetuned \
    --val_loader_config_choice ddff12_val

| MSE | RMSE | AbsRel | SqRel | D1 | D2 | D3 |
|---|---|---|---|---|---|---|
| 0.0003 | 0.0148 | 0.1088 | 0.0025 | 0.9322 | 0.9866 | 0.9939 |
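
To rerun every validation setup listed above without retyping each command, the invocations can be generated in a loop. The following is a minimal sketch: the config names and flags are copied from the commands above, while the helper name `build_val_commands` is hypothetical. It only builds and prints the commands; pass each list to `subprocess.run` if you want to execute them.

```python
import shlex

# Config names copied from the commands above.
VAL_CONFIGS = [
    "zedd_F2_8_fixed_fd_0_2_4_6_8",
    "infinigen_defocus_F1_4_fixed_fd_0_8,1_7,3_0,4_7,8_0",
    "ibims_F1_4_adaptive_fd",
    "diode_F1_4_adaptive_fd",
    "hammer_F1_4_adaptive_fd",
    "ddff12_val",
]


def build_val_commands(encoder, checkpoint, configs=VAL_CONFIGS):
    """Return one dist_val.sh argument list per validation config."""
    return [
        ["bash", "dist_val.sh",
         "--encoder", encoder,
         "--resumed_from", checkpoint,
         "--val_loader_config_choice", config]
        for config in configs
    ]


if __name__ == "__main__":
    for argv in build_val_commands("vitb", "fossa-vitb"):
        print(shlex.join(argv))
```

The DDFF12 fine-tuned results use a different checkpoint (fossa-vits-ddff-finetuned or fossa-vitb-ddff-finetuned), so call the helper again with that checkpoint and `configs=["ddff12_val"]`.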

For the ZEDD test set, save model outputs in the following format:
- A single .zip file containing exactly 50 .npy files at the root level (no subdirectories)
- Files must be named zedd_output_0001.npy through zedd_output_0050.npy
- Each .npy file must be a 2-D float array of shape (H=1216, W=1824), with no channel dimension
- All values must be finite (no NaN or Inf)
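
If you produce predictions with your own pipeline, the rules above can be applied while writing the archive. The following is a minimal sketch, not the repository's code: the function name `pack_zedd_submission` is hypothetical, and it assumes the predictions are ordered so that the first array corresponds to zedd_output_0001.npy. The expected count and shape are parameters only so the function can be tried on small arrays; the defaults follow the rules above.

```python
import io
import zipfile

import numpy as np


def pack_zedd_submission(preds, zip_path, expected_count=50, expected_shape=(1216, 1824)):
    """Validate predictions against the format rules, then write them to zip_path."""
    if len(preds) != expected_count:
        raise ValueError(f"expected {expected_count} predictions, got {len(preds)}")
    arrays = [np.asarray(pred) for pred in preds]
    # Validate everything first so that no partial archive is written on failure.
    for index, arr in enumerate(arrays, start=1):
        if arr.shape != tuple(expected_shape):
            raise ValueError(f"prediction {index}: shape {arr.shape} != {tuple(expected_shape)}")
        if not np.issubdtype(arr.dtype, np.floating):
            raise ValueError(f"prediction {index}: dtype {arr.dtype} is not a float type")
        if not np.isfinite(arr).all():
            raise ValueError(f"prediction {index}: contains NaN or Inf")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for index, arr in enumerate(arrays, start=1):
            buffer = io.BytesIO()
            np.save(buffer, arr)
            # Files go at the archive root, with no subdirectories.
            zf.writestr(f"zedd_output_{index:04d}.npy", buffer.getvalue())
    return zip_path
```

This does not replace the official format check; it only reduces the chance of building an archive that fails it.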
Please run the following command to check the file format before submitting to the server:
python zedd_test/zedd_check_format.py --zip [YOUR_ZIP_FILE]
Here is an example of how to build the zip file for FOSSA ViT-S:
bash dist_test.sh --encoder=vits --resumed_from fossa-vits --val_loader_config_choice zedd_test_F2_8_fixed_fd_0_2_4_6_8 --experiment_name=FOSSA --zedd_test_output_dir=zedd_outputs
Finally, submit your zip file to the ZEDD test server.
See Training.md for details.
PowerExpPSF building
If you see an error like: "error: [Errno 2] No such file or directory: '/usr/local/cuda-12.1/bin/nvcc'" or "nvcc not found", this means your environment does not have a CUDA toolkit with nvcc available.
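
To see quickly whether nvcc is visible from the environment you are building in, a small check can be run before retrying the build. The following is a minimal sketch, assuming a POSIX system where the compiler is named `nvcc`; the helper name `find_nvcc` is hypothetical, and it does not claim to reproduce how the PyTorch build tooling locates CUDA.

```python
import os
import shutil


def find_nvcc(env=None):
    """Return a path to nvcc, checking CUDA_HOME/bin first and then PATH; None if absent."""
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME")
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Fall back to a normal PATH lookup.
    return shutil.which("nvcc", path=env.get("PATH"))


if __name__ == "__main__":
    nvcc = find_nvcc()
    print(f"nvcc found at {nvcc}" if nvcc else "nvcc not found; check CUDA_HOME and PATH")
```

A result of None corresponds to the "nvcc not found" situation described above.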
On cluster environments, load an available CUDA module:
module avail cuda
module load cudatoolkit/12.6 # or closest version to your PyTorch CUDA
export CUDA_HOME=/usr/local/cuda-12.6
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
Then verify:
which nvcc
nvcc --version
Then retry:
python setup.py install

@article{ZeroShotDepthFromDefocus,
  author = {Zuo, Yiming and Wen, Hongyu and Subramanian, Venkat and Chen, Patrick and Kayan, Karhan and Bijelic, Mario and Heide, Felix and Deng, Jia},
  title = {Zero-Shot Depth from Defocus},
  journal = {arXiv preprint arXiv:2603.26658},
  year = {2026},
  url = {https://arxiv.org/abs/2603.26658}
}

This codebase is partially based on Depth Anything v2, Video Depth Anything, DFF-DFV, and Unsupervised Depth from Focus.