[compiler][autotuner] Autotuner heuristics #2392

Merged
ethche merged 12 commits into main from compiler-seed-heuristics
May 11, 2026

Conversation

@ethche
Contributor

@ethche ethche commented May 10, 2026

PR #2250 introduced compiler_seed_configs to have the compiler provide a strong config in the initial population for known kernel patterns. This provides a much better initialization for the autotuner, without constraining search if the heuristic turns out to not be perfect for all relevant kernels. This allows the compiler to utilize specialized optimizations, without having to be extremely precise on where/when they should be applied.

This functionality is helpful because some high-value configs are not naturally easy for a random or generic initial population to discover:

Given that expert heuristics like those from #2250 and #2357 can speed up autotuning, this PR sets up a centralized library and convention for compiler_seed_heuristics. The core advantage of having the compiler provide seed configs versus user-specified seeds (enabled in #2276) is that:

  • the compiler can make use of data in device_ir and env to automatically determine eligibility
  • the compiler can automatically adjust the seed config to the shape and structure of the kernel
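As a rough illustration of adjusting a seed to the kernel's shape, the sketch below clamps a preferred block size per dimension so it never exceeds the power-of-two-rounded extent of that dimension. The helper names next_power_of_2 and clamp_block_sizes are hypothetical and are not taken from this PR's common.py.

```python
def next_power_of_2(n: int) -> int:
    """Return the smallest power of two that is >= n (n must be positive)."""
    return 1 << (n - 1).bit_length()


def clamp_block_sizes(preferred: list[int], shape: list[int]) -> list[int]:
    """Shrink each preferred block size to fit the matching dimension.

    Hypothetical helper: a block never needs to be larger than the
    dimension rounded up to a power of two.
    """
    return [min(block, next_power_of_2(dim)) for block, dim in zip(preferred, shape)]


# A skinny problem (M=8) shrinks only the M block; N and K keep their preference.
print(clamp_block_sizes([128, 128, 64], [8, 4096, 4096]))
```

The same preferred seed can then be reused across shapes, with only the dimensions that are too small being adjusted.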

This PR creates a new directory called autotuner_heuristics. This contains backend-specific functions:

  helion/_compiler/autotuner_heuristics/
  |-- __init__.py      # public entry point + live heuristic registry
  |-- registry.py      # SeedHeuristic interface
  |-- common.py        # shared helpers: dedupe, hardware matching, block clamping
  |-- triton.py        # Triton heuristics, e.g. TritonSkinnyGemmHeuristic
  `-- cute.py          # CuTe heuristics, e.g. CuteTcgen05ClusterM2Heuristic
  ... other backends

This PR also introduces AutotunerHeuristic as a template for compiler seed heuristics:

  class AutotunerHeuristic:
      name: ClassVar[str]
      backend: ClassVar[str]

      @classmethod
      def is_eligible(cls, env, device_ir) -> bool: ...

      @classmethod
      def get_seed_config(cls, env, device_ir) -> helion.Config | None: ...

The functionality is pretty simple:

  • is_eligible() answers whether the heuristic applies.
  • get_seed_config() returns one helion.Config, or None.
  • Later on, this can include mutation heuristics that also add neighbors to the searcher.
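To make the two-method contract concrete, here is a minimal self-contained sketch of a heuristic subclass. Everything in it is a stand-in: FakeEnv, FakeDeviceIR, ToySkinnyGemmHeuristic, the plain dict used in place of helion.Config, and the shape thresholds are all assumptions for illustration, not the code in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, ClassVar


@dataclass(frozen=True)
class FakeEnv:
    """Hypothetical stand-in for the compile environment."""
    backend: str


@dataclass(frozen=True)
class FakeDeviceIR:
    """Hypothetical stand-in for device_ir: only a matmul (M, N, K) shape."""
    matmul_shape: tuple[int, int, int] | None


class AutotunerHeuristic:
    """Base template mirroring the interface described above."""
    name: ClassVar[str]
    backend: ClassVar[str]

    @classmethod
    def is_eligible(cls, env, device_ir) -> bool:
        raise NotImplementedError

    @classmethod
    def get_seed_config(cls, env, device_ir) -> dict[str, Any] | None:
        raise NotImplementedError


class ToySkinnyGemmHeuristic(AutotunerHeuristic):
    name = "toy_skinny_gemm"
    backend = "triton"

    @classmethod
    def is_eligible(cls, env, device_ir) -> bool:
        if env.backend != cls.backend or device_ir.matmul_shape is None:
            return False
        m, n, _k = device_ir.matmul_shape
        return m <= 16 and n >= 1024  # invented thresholds

    @classmethod
    def get_seed_config(cls, env, device_ir) -> dict[str, Any] | None:
        if not cls.is_eligible(env, device_ir):
            return None
        # Partial config: only the fields this heuristic has an opinion on.
        return {"block_sizes": [16, 128, 64], "num_warps": 4}


env = FakeEnv(backend="triton")
print(ToySkinnyGemmHeuristic.get_seed_config(env, FakeDeviceIR((8, 4096, 4096))))
print(ToySkinnyGemmHeuristic.get_seed_config(env, FakeDeviceIR((4096, 4096, 4096))))
```

The heuristic returns a single opinionated seed for the pattern it recognizes and None otherwise, so the caller never has to special-case it.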

Specific design decisions:

  • One autotuner heuristic contributes at most one seed config. This is to avoid a heuristic suggesting hundreds of configs for some kernels, which can increase compile time and make autotuning more unpredictable. A heuristic should be focused and opinionated. Nevertheless, we will definitely revisit this if it imposes an unproductive restriction on heuristics.
  • Multiple autotuner heuristics can be applicable to the same kernel. This could be the case for complex kernels that have multiple patterns, or because heuristics make suggestions about different aspects of the config (heuristic 1: apply num_warps = 4 in this situation, heuristic 2: block_sizes in that situation, etc.). This also makes it easier to write heuristics -- we don't have to exhaustively partition the set of all kernels.
  • is_eligible() manages hardware compatibility. E.g. the heuristic may be only applicable to H100 or AMD.
  • Heuristics are backend specific. Given that different backends often correspond to different hardware and config structure, there will likely not be many heuristics that are universally applicable to all backends.
  • Metadata of which heuristics were triggered is stored in autotuner_heuristics. This saves all the heuristics that were applied (for debugging purposes). This will include heuristics that were applied but ended up being no-ops due to adding an already considered config.
  • If the config is invalid, we shouldn't crash autotuning. Given that these heuristics could be used in broad circumstances, we don't want to block anyone if the heuristic does anything weird.
  • Recommended configs should not try to specify all config fields. Specifying only specific changes (e.g. block sizes) is often good enough.
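The decisions above (at most one seed per heuristic, several heuristics per kernel, no crash on a bad config, and metadata that also records no-op heuristics) could be combined roughly as follows. This is only a sketch under assumptions: collect_seed_configs, the validate callback, and the use of plain dicts as configs are hypothetical and do not reproduce the PR's actual entry point.

```python
from __future__ import annotations

import json
import logging
from typing import Any, Callable, Iterable

log = logging.getLogger(__name__)


def _config_key(config: dict[str, Any]) -> str:
    """Order-independent key used to detect duplicate configs."""
    return json.dumps(config, sort_keys=True)


def collect_seed_configs(
    heuristics: Iterable[Any],
    env: Any,
    device_ir: Any,
    validate: Callable[[dict[str, Any]], None],
    existing: Iterable[dict[str, Any]] = (),
) -> tuple[list[dict[str, Any]], list[str]]:
    """Return (new seed configs, names of heuristics that were applied)."""
    seeds: list[dict[str, Any]] = []
    triggered: list[str] = []
    seen = {_config_key(c) for c in existing}
    for heuristic in heuristics:
        try:
            if not heuristic.is_eligible(env, device_ir):
                continue
            config = heuristic.get_seed_config(env, device_ir)
            if config is None:
                continue
            validate(config)  # assumed to raise on an invalid config
        except Exception:
            # A misbehaving heuristic must not block autotuning.
            log.warning("skipping heuristic %s", heuristic.name, exc_info=True)
            continue
        # Recorded even when the config turns out to be a duplicate (no-op).
        triggered.append(heuristic.name)
        key = _config_key(config)
        if key not in seen:
            seen.add(key)
            seeds.append(config)
    return seeds, triggered
```

Wrapping eligibility, seed generation, and validation in a single try block keeps the failure policy in one place: any exception degrades to "no seed from this heuristic" rather than a failed autotune.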

All of the active heuristics are explicitly listed in helion/_compiler/autotuner_heuristics/__init__.py:

HEURISTICS_BY_BACKEND: dict[str, tuple[SeedHeuristicType, ...]] = {
    "cute": (CuteTcgen05ClusterM2Heuristic,),
    "triton": (TritonSkinnyGemmHeuristic,),
}
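A registry of this shape might be consumed as in the sketch below. The toy heuristic classes, heuristics_for_backend, and check_registry are hypothetical names used for illustration; only the dict-of-tuples layout is taken from the snippet above.

```python
from __future__ import annotations


class ToyCuteHeuristic:
    name = "toy_cute"
    backend = "cute"


class ToyTritonHeuristic:
    name = "toy_triton"
    backend = "triton"


# Same layout as the registry above, populated with toy classes.
HEURISTICS_BY_BACKEND: dict[str, tuple[type, ...]] = {
    "cute": (ToyCuteHeuristic,),
    "triton": (ToyTritonHeuristic,),
}


def heuristics_for_backend(backend: str) -> tuple[type, ...]:
    """Backends without registered heuristics simply get no seeds."""
    return HEURISTICS_BY_BACKEND.get(backend, ())


def check_registry() -> None:
    """Sanity check: each heuristic is filed under its own backend."""
    for backend, heuristics in HEURISTICS_BY_BACKEND.items():
        for heuristic in heuristics:
            if heuristic.backend != backend:
                raise ValueError(f"{heuristic.name} is registered under {backend!r}")


check_registry()
print([h.name for h in heuristics_for_backend("triton")])
```

Listing heuristics explicitly, rather than discovering them dynamically, keeps the active set easy to audit in one file.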

To illustrate, we provide 2 examples:

We make additional changes:

  • We save MatmulFact in matmul_ops which saves the matmul metadata used by TritonSkinnyGemmHeuristic and can be useful for future heuristics.
  • We move HardwareInfo from aot_cache to a centralized _hardware.py, as this is useful for specifying hardware eligibility in heuristics.
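For a sense of how hardware eligibility might be expressed, here is a small sketch. ToyHardwareInfo and matches_hardware are hypothetical; the real HardwareInfo fields are not shown in this description, so the two fields below are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToyHardwareInfo:
    """Hypothetical stand-in for HardwareInfo; field names are assumed."""
    device_kind: str         # e.g. "cuda" or "rocm"
    compute_capability: str  # e.g. "sm90" or "sm100"


def matches_hardware(
    hw: ToyHardwareInfo,
    *,
    device_kind: str | None = None,
    compute_capabilities: tuple[str, ...] = (),
) -> bool:
    """Return True if hw satisfies every constraint that was given."""
    if device_kind is not None and hw.device_kind != device_kind:
        return False
    if compute_capabilities and hw.compute_capability not in compute_capabilities:
        return False
    return True


hw = ToyHardwareInfo(device_kind="cuda", compute_capability="sm100")
print(matches_hardware(hw, device_kind="cuda", compute_capabilities=("sm100",)))
```

A heuristic's is_eligible() could call a helper like this first and only then inspect the kernel structure, which keeps the hardware gate cheap and uniform across heuristics.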

@meta-cla meta-cla Bot added the CLA Signed label May 10, 2026
@ethche ethche requested review from choijon5 and jansel May 10, 2026 02:55
@choijon5
Contributor

looks like we're working on the same thing (#2378). We should merge :)

@ethche
Contributor Author

ethche commented May 10, 2026

looks like we're working on the same thing (#2378). We should merge :)

Exactly :) would appreciate your feedback on this overall design. I think this approach can easily accommodate heuristics encoded in a JSON file.

@ethche ethche force-pushed the compiler-seed-heuristics branch from f1e5ed2 to 7513279 Compare May 10, 2026 03:21
@choijon5
Contributor

choijon5 commented May 10, 2026

Exactly :) would appreciate your feedback on this overall design. I think this approach can easily accommodate heuristics encoded in a JSON file.

Overall design makes sense to me. My goal is to replace the default configs with better ones discovered from lots of benchmarking (some of which I gathered from #2209), which can be composed with your approach as you mentioned. I also initially had the seed configs in python, but moved to JSON as the number of seeds increased. I think there's a place for both, we should decide when something goes into one vs the other.

Would you like to take over my PR so that we don't overlap? Or I can merge mine after yours.
I plan to finish the current benchmarking on building more data into the JSON file, then look at how the LLM can utilize the seeds to create better generalized prompts that the seeds may not cover (giving the seeds directly to LLM prompt didn't help vs just using them as default configs without LLM).

compute_capability="sm100",
)


Contributor

this is a lot of tests, do we need all of them? :)

Contributor

I am thinking longer term we might want mutation heuristics as well (a custom function to get neighbors). Should we give this a more general name (without heuristics in the name)?

Contributor Author

@ethche ethche May 10, 2026

Interesting. To clarify, would this heuristic override the get_neighbors function, or seed the list of neighbors? Would be helpful to see an example.

In this case, would the design look something like this?

class CuteTcgen05ClusterM2:
    name: ClassVar[str]
    backend: ClassVar[str]

    @classmethod
    def is_eligible(cls, env, device_ir) -> bool: ...

    @classmethod
    def get_seed_config(cls, env, device_ir) -> helion.Config: ...

    @classmethod
    def get_neighbors(cls, env, device_ir) -> ...: ...
...

To replace the name SeedHeuristic, how do you feel about Template? Other options I'm thinking of:

  • Annotation
  • Tactic
  • AutotunerTemplate
  • KernelTemplate
  • Variant
  • Pattern / KernelPattern

Contributor

I would add the neighbors to the existing set.

Maybe AutotunerHeuristic?

Contributor Author

@ethche ethche May 11, 2026

Great. I renamed it to AutotunerHeuristic with the following template:

class CuteTcgen05ClusterM2(AutotunerHeuristic):
    name: ClassVar[str]
    backend: ClassVar[str]

    @classmethod
    def is_eligible(cls, env, device_ir) -> bool: ...

    @classmethod
    def get_seed_config(cls, env, device_ir) -> helion.Config | None: ...
...

@ethche ethche changed the title from "[compiler][autotuner] Compiler seed heuristics" to "[compiler][autotuner] Autotuner heuristics" May 11, 2026
@ethche ethche merged commit 551f70c into main May 11, 2026
23 checks passed

Labels

CLA Signed This label is managed by the Meta Open Source bot.

3 participants