
Fix orthogonal parameterization save/load for training continuation#179

Open
aryamanarora wants to merge 1 commit into main from fix/orthogonal-save-load

Conversation

@aryamanarora
Collaborator

Summary

  • Fix orthogonal parameterization save/load to preserve internal state for proper training continuation
  • Add optional debug flag for metrics logging (off by default)

Problem

The orthogonal parameterization in LoreftIntervention uses PyTorch's torch.nn.utils.parametrizations.orthogonal, which stores an internal "original" tensor and computes the orthogonal weight via Cayley/Householder transform.
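For context, a minimal sketch (not pyreft code) of how PyTorch's orthogonal parametrization stores its state — the state dict holds the internal unconstrained tensor, while the orthogonal weight is recomputed on access:

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

# Wrap a square linear layer; its `weight` is now computed on the fly
# from an internal unconstrained "original" tensor.
layer = orthogonal(nn.Linear(4, 4, bias=False))

# The state_dict exposes the internal tensor, not the computed weight:
keys = list(layer.state_dict().keys())
assert "parametrizations.weight.original" in keys

# Accessing .weight recomputes an (approximately) orthogonal matrix.
Q = layer.weight
assert torch.allclose(Q @ Q.T, torch.eye(4), atol=1e-5)
```

This is why saving only the computed weight is lossy: the optimizer steps on `original`, so restoring `weight` alone leaves `original` in a fresh, inconsistent state.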

Previously, state_dict() only saved the computed orthogonal weight, not the internal parametrization state. When loading and continuing training:

  1. The orthogonal weight was written to .base
  2. But .original was freshly initialized
  3. After the first optimizer step, orthogonality broke (error jumped from ~1e-7 to ~0.1)

This caused loss spikes when continuing training from checkpoints.

Solution

  • state_dict() now saves rotate_layer_original and rotate_layer_base
  • load_state_dict() restores full parametrization state when available
  • Backwards compatible: legacy checkpoints (without new keys) still work for inference
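A hedged sketch of what such overrides might look like. The key names `rotate_layer_original` and `rotate_layer_base` follow the PR description, but the surrounding `RotateModule` class is hypothetical, not the actual `LoreftIntervention`:

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

class RotateModule(nn.Module):
    """Hypothetical module with an orthogonally parametrized layer."""

    def __init__(self, dim: int):
        super().__init__()
        self.rotate_layer = orthogonal(nn.Linear(dim, dim, bias=False))

    def state_dict(self, *args, **kwargs):
        sd = super().state_dict(*args, **kwargs)
        # Also expose the parametrization internals under stable names
        # so training can resume without re-initializing them.
        p = self.rotate_layer.parametrizations["weight"]
        sd["rotate_layer_original"] = p.original.detach().clone()
        base = getattr(p[0], "base", None)  # buffer used by trivialization
        if base is not None:
            sd["rotate_layer_base"] = base.detach().clone()
        return sd

    def load_state_dict(self, state_dict, strict=True):
        state_dict = dict(state_dict)
        original = state_dict.pop("rotate_layer_original", None)
        base = state_dict.pop("rotate_layer_base", None)
        # strict=False keeps legacy checkpoints (without the new keys) loadable.
        result = super().load_state_dict(state_dict, strict=False)
        if original is not None:
            p = self.rotate_layer.parametrizations["weight"]
            with torch.no_grad():
                p.original.copy_(original)
                if base is not None and getattr(p[0], "base", None) is not None:
                    p[0].base.copy_(base)
        return result
```

Restoring `original` (and `base`, when present) means the first optimizer step after resuming operates on exactly the tensors it was stepping on before the checkpoint, so orthogonality is preserved.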

Test plan

  • Verified orthogonality is preserved through save/load/continue cycle
  • Verified legacy checkpoints still load correctly for inference
  • Tested with LoreftIntervention directly

🤖 Generated with Claude Code

@stanfordnlp stanfordnlp deleted a comment from chatgpt-codex-connector bot Jan 12, 2026
@aryamanarora
Collaborator Author

@codex review

@aryamanarora
Collaborator Author

looks like codex is not working. have to call my other agent.

@frankaging review

@frankaging
Member

LGTM!

The orthogonal parameterization in LoreftIntervention now correctly
preserves internal state during checkpoint save/load. Previously, only
the computed orthogonal weight was saved, which broke orthogonality
during training continuation (causing loss spikes).

Changes:
- state_dict now saves rotate_layer_original and rotate_layer_base
- load_state_dict restores full parametrization state when available
- Backwards compatible: legacy checkpoints still work for inference
- Add optional debug flag for metrics logging (off by default)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@aryamanarora force-pushed the fix/orthogonal-save-load branch from fe2cfc5 to d09c5bd on January 13, 2026 07:22
