black forest labs self-flow (wip)#2639

Draft
bghira wants to merge 9 commits into main from feature/self-flow

Conversation


@bghira bghira commented Mar 14, 2026

Closes #2632

Most of the model implementations haven't had full training passes done, so this is still a draft.

It reuses a lot of the CREPA code paths, but every single model required extensive modification for token-wise noise.

Validation (inference) isn't yet working for these finetunes.

Self-flow requires a lot of data (millions of samples), and the performance benefits tail off as the dataset scales up.

The usefulness of it for end-user finetuning is questionable.

This pull request introduces comprehensive support for tokenwise conditioning and self-flow regularization in both the ACE-Step and AuraFlow models. The changes enable more flexible handling of per-token timesteps and embeddings, improve error checking, and unify the processing of conditioning information for advanced training scenarios such as CREPA Self-Flow. Key updates include new batch preparation methods, improved embedding handling, and robust error handling for tokenwise inputs.

Tokenwise conditioning and self-flow regularization support:

  • Added supports_crepa_self_flow and _prepare_crepa_self_flow_batch methods to both ACEStep and AuraFlow models, enabling CREPA Self-Flow training with correct patch size and token masking logic. (simpletuner/helpers/models/ace_step/model.py, simpletuner/helpers/models/auraflow/model.py)
  • Introduced _select_crepa_hidden_states methods in both models to retrieve hidden states from specific transformer layers for regularization. (simpletuner/helpers/models/ace_step/model.py, simpletuner/helpers/models/auraflow/model.py)
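The batch-preparation code itself isn't shown in this summary, so purely as an illustration, here is a minimal numpy sketch of what per-token timestep expansion with token masking might look like. The function name and the pin-to-zero convention for unmasked tokens are assumptions for the sketch, not SimpleTuner's actual API:

```python
import numpy as np

def prepare_tokenwise_timesteps(timesteps, seq_len, token_mask=None):
    """Expand per-sample timesteps (B,) into per-token timesteps (B, S).

    `token_mask` (B, S) marks which tokens take part in self-flow; tokens
    outside the mask are pinned to 0.0 here (an assumed convention) so
    only masked tokens receive self-flow noise.
    """
    t = np.asarray(timesteps, dtype=np.float32)
    if t.ndim != 1:
        raise ValueError(f"expected per-sample timesteps of shape (B,), got {t.shape}")
    tokenwise = np.repeat(t[:, None], seq_len, axis=1)  # (B, S)
    if token_mask is not None:
        mask = np.asarray(token_mask, dtype=bool)
        if mask.shape != tokenwise.shape:
            raise ValueError(f"token mask {mask.shape} does not match timesteps {tokenwise.shape}")
        tokenwise = np.where(mask, tokenwise, np.float32(0.0))
    return tokenwise
```

The key point is only that the batch prep turns one scalar timestep per sample into a full `(batch, sequence)` grid before it reaches the transformer.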

Tokenwise timestep and embedding handling:

  • Added _acestep_apply_tokenwise_timestep_embed helper and updated embedding logic to handle per-token timesteps in ACE-Step transformer and decoding modules, including robust error checking for shape mismatches. (simpletuner/helpers/models/ace_step/transformer.py)
  • Improved AuraFlow's model prediction logic with _prepare_model_predict_timesteps to validate and normalize tokenwise timesteps, ensuring proper handling of batch and sequence dimensions. (simpletuner/helpers/models/auraflow/model.py)
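The validation logic of `_prepare_model_predict_timesteps` isn't reproduced here, so the following is a hedged numpy sketch of the kind of normalization it might perform: accept either per-sample `(B,)` or tokenwise `(B, S)` timesteps, broadcast the former, and reject everything else with an explicit error. The function name and error messages are illustrative:

```python
import numpy as np

def normalize_tokenwise_timesteps(timesteps, batch_size, seq_len):
    """Return timesteps as (B, S), broadcasting a per-sample (B,) input."""
    t = np.asarray(timesteps, dtype=np.float32)
    if t.ndim == 1:
        if t.shape[0] != batch_size:
            raise ValueError(f"timesteps batch {t.shape[0]} != latent batch {batch_size}")
        return np.repeat(t[:, None], seq_len, axis=1)
    if t.ndim == 2:
        if t.shape != (batch_size, seq_len):
            raise ValueError(f"tokenwise timesteps {t.shape} != expected {(batch_size, seq_len)}")
        return t
    raise ValueError(f"timesteps must be 1D or 2D, got {t.ndim}D")
```

Failing loudly on a shape mismatch rather than silently broadcasting is what the "robust error checking" bullets above are describing.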

AdaLayerNorm and attention updates:

  • Refactored AdaLayerNormZero and attention blocks in AuraFlow to use _apply_adaln_zero, allowing correct processing of per-token embeddings and normalization for both main and context branches. (simpletuner/helpers/models/auraflow/transformer.py)
  • Updated ACE-Step attention and transformer blocks to handle tokenwise scale/shift/gate parameters, including squeeze operations for correct dimensionality. (simpletuner/helpers/models/ace_step/attention.py, simpletuner/helpers/models/ace_step/transformer.py)
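The `_apply_adaln_zero` refactor itself isn't shown above; as a sketch of the idea only, the per-token case reduces to a broadcasting question: per-sample modulation parameters arrive as `(B, 3·D)` and need a token axis inserted, while tokenwise parameters already carry `(B, S, 3·D)`. The real helper also handles the context branch and full six-way split; this simplified single-branch version is an assumption:

```python
import numpy as np

def _layer_norm(x, eps=1e-6):
    # Affine-free LayerNorm over the feature axis, as in AdaLN-Zero blocks.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def apply_adaln_zero(x, mod):
    """AdaLN-Zero modulation for one branch (illustrative sketch).

    x:   (B, S, D) hidden states
    mod: (B, 3*D) per-sample or (B, S, 3*D) tokenwise [shift, scale, gate]
    Returns the modulated hidden states and the gate for the residual path.
    """
    shift, scale, gate = np.split(mod, 3, axis=-1)
    if mod.ndim == 2:
        # Per-sample parameters: add a token axis so they broadcast over S.
        shift, scale, gate = (p[:, None, :] for p in (shift, scale, gate))
    elif mod.ndim != 3:
        raise ValueError(f"modulation must be 2D or 3D, got {mod.ndim}D")
    return _layer_norm(x) * (1.0 + scale) + shift, gate
```

Centralizing this in one helper is what lets the same attention block serve both the classic per-sample path and the new tokenwise path without duplicated modulation code.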

Additional improvements and error handling:

  • Enhanced error messages and shape validation throughout batch preparation, embedding, and model prediction functions to prevent silent failures and guide proper usage. (simpletuner/helpers/models/ace_step/model.py, simpletuner/helpers/models/auraflow/model.py, simpletuner/helpers/models/auraflow/transformer.py)

These updates collectively enable advanced training and inference workflows with tokenwise conditioning, improving both robustness and flexibility for CREPA Self-Flow and related regularization techniques.


Copilot AI left a comment


Pull request overview

This PR adds comprehensive support for tokenwise (per-token) timestep conditioning and CREPA Self-Flow regularization across all supported model architectures. Self-Flow is a self-supervised alternative to REPA that doesn't require external encoder models, originating from the BFL team's research.

Changes:

  • Added supports_crepa_self_flow() and _prepare_crepa_self_flow_batch() to every model class, enabling per-token noise scheduling for self-flow training
  • Extended every transformer's forward pass to accept 2D tokenwise timestep tensors with validation, and updated modulation/normalization blocks to handle per-token embeddings
  • Introduced CrepaFeatureSource enum and refactored CREPA to support encoder, backbone, and self-flow feature sources uniformly
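Based only on the names mentioned in this review, a plausible shape for the `CrepaFeatureSource` enum and its lookup might be the following; the string values and the resolver helper are assumptions, not the actual `crepa.py` contents:

```python
from enum import Enum

class CrepaFeatureSource(str, Enum):
    ENCODER = "encoder"      # external encoder features (classic REPA-style)
    BACKBONE = "backbone"    # features taken from the diffusion backbone itself
    SELF_FLOW = "self_flow"  # BFL-style self-flow features, no external encoder

def resolve_feature_source(name: str) -> "CrepaFeatureSource":
    """Map a config string to a feature source, with a helpful error."""
    try:
        return CrepaFeatureSource(name)
    except ValueError:
        valid = [s.value for s in CrepaFeatureSource]
        raise ValueError(f"unknown CREPA feature source {name!r}; expected one of {valid}") from None
```

A `str`-backed enum keeps config files and log output readable while still giving exhaustive matching over the three sources in the training code.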

Reviewed changes

Copilot reviewed 94 out of 94 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| simpletuner/helpers/training/crepa.py | New CrepaFeatureSource enum, refactored feature source selection |
| simpletuner/helpers/training/trainer.py | Handle multi-dim timesteps in logging |
| simpletuner/helpers/models/*/model.py | Added self-flow batch prep, tokenwise timestep handling, capture block override in model_predict |
| simpletuner/helpers/models/*/transformer.py | Extended forward passes to accept 2D tokenwise timesteps with validation and per-token modulation |
| simpletuner/helpers/models/*/attention.py | Updated AdaLN blocks for tokenwise scale/shift/gate |
| tests/* | Comprehensive test coverage for tokenwise timestep acceptance, rejection, and self-flow batch preparation |


