
Fix F.pad axis swap in pad_to_training_size #8

Open. sidd462 wants to merge 1 commit into MrGiovanni:main from sidd462:fix-pad-to-training-size

sidd462 commented Jun 21, 2026

Closes #7.

Summary

pad_to_training_size() in rsuper_train/predict_abdomenatlas.py had two F.pad(...) calls whose padding tuples targeted the wrong axis: the z-axis branch padded W, and the x-axis branch padded D. When an input volume was too small on the z- or x-axis, that axis stayed too small and the next layer rejected the shape. The postprocess block above swallowed the resulting exception with a bare except, so the only signal was a single FAILED postprocess line with no context. 25 of 901 PanTS-te cases were silently lost this way.

This PR makes two changes in one file:

  1. Fixes the F.pad axis order so each branch pads the axis it claims to.
  2. Replaces the bare except in the postprocess block so that failures of this kind are reported instead of silently swallowed.

Why the original code was wrong

torch.nn.functional.pad reads its padding tuple last-dim-first. For a 5D tensor of shape (N, C, D, H, W) the tuple has to be ordered:

(W_left, W_right, H_left, H_right, D_left, D_right)

So the tuple controls W first, then H, then D — not the other way round.
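A small pure-Python sketch of the same rule can make the ordering easier to check by eye. describe_pad is a hypothetical helper written for illustration; it is not a torch API and does not call torch. It only mirrors the documented last-dim-first convention for a tensor laid out as (N, C, D, H, W):

```python
# Sketch: decode an F.pad-style padding tuple into per-axis (left, right) amounts.
# Mirrors the torch.nn.functional.pad convention (pairs are read starting from the
# LAST dimension) without importing torch. describe_pad is a hypothetical helper.
from typing import Dict, Sequence, Tuple


def describe_pad(
    pad: Sequence[int],
    dims: Sequence[str] = ("N", "C", "D", "H", "W"),
) -> Dict[str, Tuple[int, int]]:
    """Return {dim_name: (left, right)} for every dimension the tuple actually pads."""
    if len(pad) % 2 != 0:
        raise ValueError("padding tuple must have an even length")
    n_padded = len(pad) // 2
    if n_padded > len(dims):
        raise ValueError("padding tuple covers more dimensions than the tensor has")
    result = {}
    for i in range(n_padded):
        # Pair i applies to the i-th dimension counted from the end.
        name = dims[len(dims) - 1 - i]
        left, right = pad[2 * i], pad[2 * i + 1]
        if left or right:
            result[name] = (left, right)
    return result


diff = 4
print(describe_pad((diff, diff, 0, 0, 0, 0)))  # {'W': (4, 4)}
print(describe_pad((0, 0, 0, 0, diff, diff)))  # {'D': (4, 4)}
```

The first tuple only touches W and the second only touches D, which is the distinction the two branches below depend on.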

In the original code:

  • The z-axis branch (where z < args.training_size[0]) intended to pad D, but passed (diff, diff, 0,0, 0,0). By the last-dim-first rule that pads W (and leaves D untouched).
  • The x-axis branch (where x < args.training_size[2]) intended to pad W, but passed (0,0, 0,0, diff, diff). By the same rule, that pads D.

So each branch padded the axis the other branch was meant to pad. The undersized axis was never enlarged, and the next layer threw on a shape mismatch.
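To make the failure concrete, the following sketch replays the original z-axis branch on shapes alone, using the same last-dim-first rule. The helper padded_shape, the example shape, the training size of 128, and diff = 14 (assumed to be half the shortfall, padded on both sides) are all illustrative assumptions rather than values taken from the repository:

```python
# Sketch: compute the shape F.pad would produce (last-dim-first rule) so the
# original and corrected z-axis branches can be compared without torch.
# padded_shape, the shape, and diff are hypothetical values for illustration.
from typing import Sequence, Tuple


def padded_shape(shape: Sequence[int], pad: Sequence[int]) -> Tuple[int, ...]:
    if len(pad) % 2 != 0 or len(pad) // 2 > len(shape):
        raise ValueError("invalid padding tuple for this shape")
    out = list(shape)
    for i in range(len(pad) // 2):
        axis = len(shape) - 1 - i  # pair i targets the i-th dimension from the end
        out[axis] += pad[2 * i] + pad[2 * i + 1]
    return tuple(out)


# Assumed volume that is too small on z (D = 100) for a training size of 128.
shape = (1, 1, 100, 128, 128)
diff = 14  # assumed: half of the 28-voxel shortfall

# Original z-axis branch: meant to grow D, but grows W instead.
print(padded_shape(shape, (diff, diff, 0, 0, 0, 0)))  # (1, 1, 100, 128, 156)
# Corrected z-axis branch: grows D as intended.
print(padded_shape(shape, (0, 0, 0, 0, diff, diff)))  # (1, 1, 128, 128, 128)
```

With the original tuple, D stays at 100 while W overshoots to 156, so the volume is now wrong on two axes instead of one.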

The fix (1/2) — swap the tuples between the branches

@@ pad_to_training_size — z-axis branch (line 256)
-            tensor_img = F.pad(tensor_img, (diff, diff, 0,0, 0,0))
+            tensor_img = F.pad(tensor_img, (0,0, 0,0, diff, diff))

@@ pad_to_training_size — x-axis branch (line 274)
-            tensor_img = F.pad(tensor_img, (0,0, 0,0, diff, diff))
+            tensor_img = F.pad(tensor_img, (diff, diff, 0,0, 0,0))

After the swap:

  • z-axis branch passes (0,0, 0,0, diff, diff) → pads D (the z axis). Correct.
  • x-axis branch passes (diff, diff, 0,0, 0,0) → pads W (the x axis). Correct.

Each branch now pads the axis named in its own guard. The other axes get (0, 0) so they're untouched.
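One way to keep the ordering from being swapped again, offered as a hypothetical suggestion rather than as part of this PR, is to build the tuple from explicitly named axes. pad_dhw is an illustrative helper name, not something in the repository:

```python
# Hypothetical helper (not part of this PR): build an F.pad tuple for a 5D
# (N, C, D, H, W) tensor from explicitly named per-axis (left, right) amounts,
# so the last-dim-first ordering is written down in exactly one place.
from typing import Tuple

Pair = Tuple[int, int]


def pad_dhw(d: Pair = (0, 0), h: Pair = (0, 0), w: Pair = (0, 0)) -> Tuple[int, ...]:
    """Return (W_left, W_right, H_left, H_right, D_left, D_right)."""
    return (*w, *h, *d)


diff = 4
print(pad_dhw(d=(diff, diff)))  # z-axis branch -> (0, 0, 0, 0, 4, 4)
print(pad_dhw(w=(diff, diff)))  # x-axis branch -> (4, 4, 0, 0, 0, 0)
```

With such a helper, the z-axis branch would read F.pad(tensor_img, pad_dhw(d=(diff, diff))), and the call site names the axis it pads instead of relying on tuple position.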

The fix (2/2) — surface postprocess failures instead of swallowing them

The bug was easy to miss for one reason: the postprocess block above the padding code used a bare except: (which catches everything) and printed only FAILED postprocess, with no error type, no traceback, and no case ID. That's why an axis-order bug that triggers on ~3% of cases (25 of 901) shipped unnoticed.

-        except:
-            print('FAILED postprocess')
+        except Exception as e:
+            import traceback
+            print(f'FAILED postprocess for {img_name}: {type(e).__name__}: {e}')
+            traceback.print_exc()
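The difference between the two handlers can be seen in a self-contained sketch. run_postprocess, old_handler, new_handler, and the case name are hypothetical stand-ins for the real postprocess code; the point is only what each handler leaves behind in the logs:

```python
# Sketch contrasting the old and new handlers on a simulated failure.
# run_postprocess and "case_0001" are hypothetical, not code from the repo.
import io
import traceback
from contextlib import redirect_stderr, redirect_stdout


def run_postprocess(img_name: str) -> None:
    # Stand-in for a postprocess step that fails on an undersized volume.
    raise RuntimeError(f"unexpected shape for {img_name}")


def old_handler(img_name: str) -> None:
    try:
        run_postprocess(img_name)
    except:  # noqa: E722  (bare except, as in the original code)
        print('FAILED postprocess')


def new_handler(img_name: str) -> None:
    try:
        run_postprocess(img_name)
    except Exception as e:
        print(f'FAILED postprocess for {img_name}: {type(e).__name__}: {e}')
        traceback.print_exc()  # full traceback goes to stderr


out, err = io.StringIO(), io.StringIO()
with redirect_stdout(out), redirect_stderr(err):
    old_handler("case_0001")
    new_handler("case_0001")

print(out.getvalue(), end="")  # one bare line, then one line naming case and error
print(err.getvalue(), end="")  # traceback produced only by the new handler
```

The old handler leaves a bare FAILED postprocess line; the new one names the case and exception type and prints the traceback. Catching Exception instead of using a bare except also lets KeyboardInterrupt and SystemExit propagate rather than being swallowed.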

Why this is bundled with the F.pad fix rather than sent as a separate PR: the bare except is what hid this bug. Replacing it with something diagnosable addresses the same failure from a different angle. If another shape-related bug crops up in pad_to_training_size (or anywhere else in the postprocess pipeline), it will surface immediately instead of silently dropping cases. I'd rather land both together than ship the F.pad fix and leave the silent-swallowing scaffolding in place.

If the maintainers prefer the bare-except change as a separate PR, I'm happy to split it out.

Diff stats

1 file changed, 6 insertions(+), 4 deletions(-), all in rsuper_train/predict_abdomenatlas.py. 3 hunks total: 2 single-line F.pad swaps + the except-block expansion. No other files changed, no dependencies added.

Verification

Tested with the R-Super checkpoint on PanTS-te (n=901):

  • Before: 25 of 901 cases failed with FAILED postprocess. No prediction outputs were written for those cases, so a downstream evaluator that globs for predictions/*.nii.gz simply never saw them; this is exactly the silent loss the bare except was causing.
  • After: all 901 cases produce a full predictions/<class>.nii.gz tree.
  • All 25 previously failing cases have a shape after preprocessing that is smaller than 128 on the z- or x-axis, i.e. they hit exactly the broken branches.
  • python -m py_compile rsuper_train/predict_abdomenatlas.py passes.
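A completeness check like the following could catch this kind of silent loss independently of the logging fix. It is a sketch under an assumed output layout of <out_dir>/<case_id>/predictions/<class>.nii.gz; missing_cases and the case IDs are hypothetical and not part of the repository:

```python
# Sketch: list expected cases that have no prediction files on disk.
# ASSUMPTION: outputs are laid out as <out_dir>/<case_id>/predictions/<class>.nii.gz.
import tempfile
from pathlib import Path
from typing import Iterable, List


def missing_cases(case_ids: Iterable[str], out_dir: Path) -> List[str]:
    """Return sorted case IDs whose predictions/ folder has no .nii.gz file."""
    missing = []
    for case_id in case_ids:
        pred_dir = out_dir / case_id / "predictions"
        if not pred_dir.is_dir() or not any(pred_dir.glob("*.nii.gz")):
            missing.append(case_id)
    return sorted(missing)


# Tiny demo on a throwaway directory (hypothetical case IDs).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    ok = root / "case_a" / "predictions"
    ok.mkdir(parents=True)
    (ok / "pancreas.nii.gz").write_bytes(b"")
    (root / "case_b").mkdir()  # case folder exists, but no predictions were written
    print(missing_cases(["case_a", "case_b", "case_c"], root))  # ['case_b', 'case_c']
```

Comparing against the full list of input case IDs, rather than only globbing for outputs, is what makes dropped cases visible: a glob alone cannot report files that were never written.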


Successfully merging this pull request may close these issues.

predict_abdomenatlas.py: pad_to_training_size pads the wrong axis on undersized inputs
