Pull request overview
This PR adds comprehensive support for tokenwise (per-token) timestep conditioning and CREPA Self-Flow regularization across all supported model architectures. Self-Flow is a self-supervised alternative to REPA that doesn't require external encoder models, originating from the BFL team's research.
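As background, the core of a self-flow-style alignment objective can be sketched as a cosine-similarity loss between per-token features from two passes of the same network, with the target branch detached. This is an illustrative sketch only; the function and variable names are assumptions, not SimpleTuner's actual API.

```python
# Hypothetical sketch of a self-flow alignment loss. Unlike REPA, no external
# encoder is needed: intermediate features from the current forward pass are
# aligned with features the same network produces for a differently-noised
# copy of the input (the detached "target" branch).
import torch
import torch.nn.functional as F

def self_flow_alignment_loss(student_feats: torch.Tensor,
                             target_feats: torch.Tensor) -> torch.Tensor:
    """Negative mean cosine similarity between per-token features.

    Both tensors are (batch, tokens, dim); the target branch is detached so
    only the student branch receives gradients.
    """
    student = F.normalize(student_feats, dim=-1)
    target = F.normalize(target_feats.detach(), dim=-1)
    return -(student * target).sum(dim=-1).mean()
```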
Changes:
- Added `supports_crepa_self_flow()` and `_prepare_crepa_self_flow_batch()` to every model class, enabling per-token noise scheduling for self-flow training
- Extended every transformer's forward pass to accept 2D tokenwise timestep tensors with validation, and updated modulation/normalization blocks to handle per-token embeddings
- Introduced a `CrepaFeatureSource` enum and refactored CREPA to support encoder, backbone, and self-flow feature sources uniformly
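The "2D tokenwise timesteps with validation" point above can be sketched as a small normalization helper: accept either a per-sample 1D tensor or a per-token 2D tensor, validate shapes, and always return a `(batch, seq_len)` tensor. The function name and exact checks are illustrative, not the transformers' actual code.

```python
# Illustrative sketch of accepting either a per-sample 1D timestep tensor or
# a per-token 2D tensor, with shape validation.
import torch

def normalize_timesteps(timesteps: torch.Tensor,
                        batch_size: int,
                        seq_len: int) -> torch.Tensor:
    """Return timesteps as (batch, seq_len), broadcasting 1D input per token."""
    if timesteps.ndim == 1:
        if timesteps.shape[0] != batch_size:
            raise ValueError(
                f"expected {batch_size} timesteps, got {timesteps.shape[0]}")
        return timesteps[:, None].expand(batch_size, seq_len)
    if timesteps.ndim == 2:
        if timesteps.shape != (batch_size, seq_len):
            raise ValueError(
                f"tokenwise timesteps must be ({batch_size}, {seq_len}), "
                f"got {tuple(timesteps.shape)}")
        return timesteps
    raise ValueError(f"timesteps must be 1D or 2D, got {timesteps.ndim}D")
```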
Reviewed changes
Copilot reviewed 94 out of 94 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| simpletuner/helpers/training/crepa.py | New CrepaFeatureSource enum, refactored feature source selection |
| simpletuner/helpers/training/trainer.py | Handle multi-dim timesteps in logging |
| simpletuner/helpers/models/*/model.py | Added self-flow batch prep, tokenwise timestep handling, capture block override in model_predict |
| simpletuner/helpers/models/*/transformer.py | Extended forward passes to accept 2D tokenwise timesteps with validation and per-token modulation |
| simpletuner/helpers/models/*/attention.py | Updated AdaLN blocks for tokenwise scale/shift/gate |
| tests/* | Comprehensive test coverage for tokenwise timestep acceptance, rejection, and self-flow batch preparation |
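The `CrepaFeatureSource` refactor in `crepa.py` can be pictured as a small string-backed enum plus a dispatch helper. The member names mirror the PR summary (encoder, backbone, self-flow), but the exact values and surrounding API in `simpletuner/helpers/training/crepa.py` may differ; this is a hedged sketch.

```python
# Hedged sketch of the feature-source dispatch the refactor describes.
from enum import Enum

class CrepaFeatureSource(str, Enum):
    ENCODER = "encoder"      # external encoder features (classic REPA)
    BACKBONE = "backbone"    # intermediate hidden states of the model itself
    SELF_FLOW = "self_flow"  # features from a second, differently-noised pass

def select_feature_source(name: str) -> CrepaFeatureSource:
    """Resolve a config string to an enum member with a helpful error."""
    try:
        return CrepaFeatureSource(name)
    except ValueError:
        valid = ", ".join(m.value for m in CrepaFeatureSource)
        raise ValueError(
            f"unknown CREPA feature source {name!r}; expected one of: {valid}")
```

A string-backed enum keeps config files human-readable while the rest of the training loop branches on a single typed value.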
Closes #2632
Most of the model implementations haven't had full training passes done, so this is still a draft.
It's reusing a lot of the CREPA code paths, but every single model required extensive modification for token-wise noise.
Validation (inference) isn't yet working for these finetunes.
Self-flow requires a lot of data (millions of samples), and the performance benefits tail off as the dataset scales up.
The usefulness of it for end-user finetuning is questionable.
This pull request introduces comprehensive support for tokenwise conditioning and self-flow regularization in both the ACE-Step and AuraFlow models. The changes enable more flexible handling of per-token timesteps and embeddings, improve error checking, and unify the processing of conditioning information for advanced training scenarios such as CREPA Self-Flow. Key updates include new batch preparation methods, improved embedding handling, and robust error handling for tokenwise inputs.
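The per-token timestep handling described above hinges on expanding a timestep embedding from per-sample to per-token shapes. A minimal sketch of a sinusoidal embedding over a `(batch, seq)` timestep grid follows; the shapes and helper name are assumptions, and the real helpers in the ACE-Step/AuraFlow transformers differ.

```python
# Sketch of producing per-token sinusoidal timestep embeddings:
# (batch, seq) timesteps -> (batch, seq, dim) embeddings.
import math
import torch

def sinusoidal_embed(timesteps: torch.Tensor, dim: int) -> torch.Tensor:
    """Embed each token's timestep independently."""
    half = dim // 2
    # Geometric frequency ladder, as in standard diffusion timestep embeddings.
    freqs = torch.exp(
        -math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = timesteps.float()[..., None] * freqs  # (batch, seq, half)
    return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
```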
Tokenwise conditioning and self-flow regularization support:
- Added `supports_crepa_self_flow` and `_prepare_crepa_self_flow_batch` methods to both `ACEStep` and `AuraFlow` models, enabling CREPA Self-Flow training with correct patch size and token masking logic. (simpletuner/helpers/models/ace_step/model.py, simpletuner/helpers/models/auraflow/model.py) [1] [2]
- Added `_select_crepa_hidden_states` methods in both models to retrieve hidden states from specific transformer layers for regularization. (simpletuner/helpers/models/ace_step/model.py, simpletuner/helpers/models/auraflow/model.py) [1] [2]

Tokenwise timestep and embedding handling:
- Added an `_acestep_apply_tokenwise_timestep_embed` helper and updated embedding logic to handle per-token timesteps in the ACE-Step transformer and decoding modules, including robust error checking for shape mismatches. (simpletuner/helpers/models/ace_step/transformer.py) [1] [2]
- Added `_prepare_model_predict_timesteps` to validate and normalize tokenwise timesteps, ensuring proper handling of batch and sequence dimensions. (simpletuner/helpers/models/auraflow/model.py) [1] [2]

AdaLayerNorm and attention updates:
- Introduced `_apply_adaln_zero`, allowing correct processing of per-token embeddings and normalization for both the main and context branches. (simpletuner/helpers/models/auraflow/transformer.py) [1] [2] [3] [4]
- Updated attention and transformer blocks for tokenwise scale/shift/gate handling. (simpletuner/helpers/models/ace_step/attention.py, simpletuner/helpers/models/ace_step/transformer.py) [1] [2]

Additional improvements and error handling:
- Improved validation and error handling for tokenwise inputs. (simpletuner/helpers/models/ace_step/model.py, simpletuner/helpers/models/auraflow/model.py, simpletuner/helpers/models/auraflow/transformer.py) [1] [2] [3]

These updates collectively enable advanced training and inference workflows with tokenwise conditioning, improving both robustness and flexibility for CREPA Self-Flow and related regularization techniques.
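The tokenwise scale/shift/gate modulation mentioned in the AdaLayerNorm updates can be sketched as an AdaLN-Zero step that accepts either a per-sample `(B, D)` embedding or a per-token `(B, S, D)` embedding. The helper and argument names below are illustrative, not the actual `_apply_adaln_zero` signature.

```python
# Minimal sketch of AdaLN-Zero modulation extended from per-sample to
# per-token conditioning embeddings.
import torch
import torch.nn as nn

def apply_adaln_zero(x: torch.Tensor, emb: torch.Tensor,
                     norm: nn.LayerNorm, proj: nn.Linear):
    """x: (B, S, D). emb: (B, D) per-sample or (B, S, D) per-token.

    proj maps the embedding to 3*D values, split into shift/scale/gate.
    """
    shift, scale, gate = proj(emb).chunk(3, dim=-1)
    if shift.ndim == 2:  # per-sample: broadcast over the token axis
        shift, scale, gate = (t[:, None, :] for t in (shift, scale, gate))
    modulated = norm(x) * (1 + scale) + shift
    return modulated, gate  # gate is applied to the block's residual output
```

The only change from a standard AdaLN-Zero block is the `ndim` branch: 3D embeddings already carry a token axis, so no broadcast is needed and each token receives its own modulation.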