[compiler][autotuner] Autotuner heuristics #2392
Looks like we're working on the same thing (#2378). We should merge :)
Exactly :) I would appreciate your feedback on the overall design. I think this approach can easily accommodate heuristics encoded in a JSON file.
Overall design makes sense to me. My goal is to replace the default configs with better ones discovered from lots of benchmarking (some of which I gathered from #2209), which can be composed with your approach as you mentioned. I also initially had the seed configs in Python, but moved to JSON as the number of seeds increased. I think there's a place for both; we should decide when something goes into one versus the other. Would you like to take over my PR so that we don't overlap? Or I can merge mine after yours.
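As a rough illustration of how JSON-encoded seeds might sit alongside Python heuristics, here is a minimal sketch. The JSON schema (`backend`, `compute_capability`, `config`), the helper `load_json_seeds`, the config values, and the use of plain dicts in place of `helion.Config` are all assumptions made for illustration; neither PR's actual file format is shown in this thread.

```python
import json

# Hypothetical JSON layout: one entry per seed, tagged with backend and hardware.
# The config values below are made up for the example.
SEEDS_JSON = """
[
  {"backend": "triton", "compute_capability": "sm90",
   "config": {"block_sizes": [64, 64, 32], "num_warps": 8}},
  {"backend": "triton", "compute_capability": "sm100",
   "config": {"block_sizes": [128, 128, 64], "num_warps": 4}}
]
"""


def load_json_seeds(text: str, backend: str, compute_capability: str) -> list:
    """Return the seed configs whose backend and hardware match the target."""
    entries = json.loads(text)
    return [
        entry["config"]
        for entry in entries
        if entry["backend"] == backend
        and entry["compute_capability"] == compute_capability
    ]


seeds = load_json_seeds(SEEDS_JSON, "triton", "sm100")
```

Under this sketch, simple table-like seeds could live in JSON, while seeds that need to inspect the kernel would stay in Python.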
```python
    compute_capability="sm100",
)
```
this is a lot of tests, do we need all of them? :)
I am thinking longer term we might want mutation heuristics as well (a custom function to get neighbors). Should we give this a more general name (without heuristics in the name)?
Interesting. To clarify, would this heuristic override the get_neighbors function, or seed the list of neighbors? Would be helpful to see an example.
In this case, would the design look something like this?

```python
class CuteTcgen05ClusterM2:
    name: ClassVar[str]
    backend: ClassVar[str]

    @classmethod
    def is_eligible(cls, env, device_ir) -> bool: ...

    @classmethod
    def get_seed_config(cls, env, device_ir) -> helion.Config: ...

    @classmethod
    def get_neighbors(cls, env, device_ir) -> ...: ...

    ...
```

To replace the name `SeedHeuristic`, how do you feel about `Template`? Other options I'm thinking of:
- Annotation
- Tactic
- AutotunerTemplate
- KernelTemplate
- Variant
- Pattern / KernelPattern
I would add the neighbors to the existing set.
Maybe AutotunerHeuristic?
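To make the "add to the existing set" option concrete, here is a minimal sketch. The functions `default_neighbors`, `heuristic_neighbors`, and `merged_neighbors`, the `cluster_m` knob, and the dict-based configs are hypothetical stand-ins, not Helion's actual neighbor-generation API.

```python
def config_key(config: dict) -> tuple:
    """Build a hashable key so duplicate configs can be detected."""
    return tuple(sorted((k, repr(v)) for k, v in config.items()))


def default_neighbors(config: dict) -> list:
    # Stand-in for the autotuner's generic mutation: halve or double num_warps.
    return [
        {**config, "num_warps": max(1, config["num_warps"] // 2)},
        {**config, "num_warps": config["num_warps"] * 2},
    ]


def heuristic_neighbors(config: dict) -> list:
    # Stand-in for a heuristic-provided mutation. The first candidate duplicates
    # a default neighbor; the second changes a hypothetical `cluster_m` knob.
    return [
        {**config, "num_warps": config["num_warps"] * 2},
        {**config, "cluster_m": 2},
    ]


def merged_neighbors(config: dict) -> list:
    """Append heuristic neighbors to the default set, dropping duplicates."""
    seen = set()
    merged = []
    for candidate in default_neighbors(config) + heuristic_neighbors(config):
        key = config_key(candidate)
        if key not in seen:
            seen.add(key)
            merged.append(candidate)
    return merged


neighbors = merged_neighbors({"num_warps": 4, "cluster_m": 1})
```

Because the heuristic only extends the default set, a poorly chosen mutation costs a few extra candidates rather than hiding the generic ones.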
Great. I renamed it to `AutotunerHeuristic` with the following template:

```python
class CuteTcgen05ClusterM2(AutotunerHeuristic):
    name: ClassVar[str]
    backend: ClassVar[str]

    @classmethod
    def is_eligible(cls, env, device_ir) -> bool: ...

    @classmethod
    def get_seed_config(cls, env, device_ir) -> helion.Config | None: ...

    ...
```
PR #2250 introduced `compiler_seed_configs` to have the compiler provide a strong config in the initial population for known kernel patterns. This gives the autotuner a much better initialization without constraining the search if the heuristic turns out not to be perfect for all relevant kernels. It lets the compiler use specialized optimizations without having to be extremely precise about where and when they should be applied.

This functionality is helpful because some high-value configs are not naturally easy for a random or generic initial population to discover:

- `[64, 64, 256]`, but as the PR points out, this is rare under random sampling.

Given that expert heuristics like those from #2250 and #2357 can speed up autotuning, this PR sets up a centralized library and convention for `compiler_seed_heuristics`. The core advantage of having the compiler provide seed configs versus user-specified seeds (enabled in #2276) is that the compiler can use `device_ir` and `env` to automatically determine eligibility.

This PR creates a new directory called `autotuner_heuristics`. This contains backend-specific functions.

This PR also introduces `AutotunerHeuristic` as a template for compiler seed heuristics.

The functionality is pretty simple:

- `is_eligible()` answers whether the heuristic applies.
- `get_seed_config()` returns one `helion.Config`.

Specific design decisions:

- `is_eligible()` manages hardware compatibility. E.g. the heuristic may only be applicable to H100 or AMD.
- `autotuner_heuristics`: this saves all the heuristics that were applied (for debugging purposes). This includes heuristics that were applied but ended up being no-ops because they added an already-considered config.
- All of the active heuristics are explicitly listed in `helion/_compiler/autotuner_heuristics/__init__.py`.

To illustrate, we provide 2 examples:

- `CuteTcgen05ClusterM2Heuristic`: this just copies over the heuristic from #2250 ("[cutedsl] seed two-cta autotune search").
- `TritonSkinnyGemmHeuristic`: this is @umechand-amd's heuristic from #2357 ("Seeding the configs for asymmetric skinny matmuls") with some slight modifications. We restrict it to kernels with a single, non-batched matmul. @umechand-amd let me know what you think; happy to help with a follow-up PR for more complicated kernels.

We make additional changes:

- Add `MatmulFact` in `matmul_ops`, which saves the matmul metadata used by `TritonSkinnyGemmHeuristic` and can be useful for future heuristics.
- Move `HardwareInfo` from `aot_cache` to a centralized `_hardware.py`, as this is useful for specifying hardware eligibility in heuristics.
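The flow described above (eligibility check, one seed config per heuristic, and recording applied heuristics even when they are no-ops) can be sketched roughly as follows. This is a toy model under stated assumptions, not the code in this PR: `FakeEnv`, `collect_seed_configs`, `ACTIVE_HEURISTICS`, the toy heuristic classes, and the dict-based configs are all hypothetical, and skipping heuristics that return `None` is a guess about the intended behavior.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import ClassVar


@dataclass(frozen=True)
class FakeEnv:
    """Hypothetical stand-in for the compile environment."""
    backend: str
    compute_capability: str


class AutotunerHeuristic:
    """Toy mirror of the template: an eligibility check plus one seed config."""
    name: ClassVar[str]
    backend: ClassVar[str]

    @classmethod
    def is_eligible(cls, env, device_ir) -> bool:
        raise NotImplementedError

    @classmethod
    def get_seed_config(cls, env, device_ir) -> dict | None:
        raise NotImplementedError


class ToySm100Heuristic(AutotunerHeuristic):
    name = "toy_sm100"
    backend = "cute"

    @classmethod
    def is_eligible(cls, env, device_ir) -> bool:
        # Hardware compatibility is decided inside is_eligible().
        return env.backend == cls.backend and env.compute_capability == "sm100"

    @classmethod
    def get_seed_config(cls, env, device_ir) -> dict | None:
        return {"cluster_m": 2}


class ToyDuplicateHeuristic(ToySm100Heuristic):
    name = "toy_duplicate"  # same eligibility and same seed as its parent


class ToyTritonHeuristic(ToySm100Heuristic):
    name = "toy_triton"
    backend = "triton"  # not eligible when the env backend is "cute"


# Explicit list of active heuristics, in the spirit of the package __init__.py.
ACTIVE_HEURISTICS = [ToySm100Heuristic, ToyDuplicateHeuristic, ToyTritonHeuristic]


def collect_seed_configs(env, device_ir, population: list) -> list:
    """Add eligible seeds to the population; return names of applied heuristics."""
    applied = []
    for heuristic in ACTIVE_HEURISTICS:
        if not heuristic.is_eligible(env, device_ir):
            continue
        config = heuristic.get_seed_config(env, device_ir)
        if config is None:  # assumption: no seed means the heuristic did not apply
            continue
        applied.append(heuristic.name)  # recorded even if the seed is a duplicate
        if config not in population:
            population.append(config)
    return applied


population = [{"cluster_m": 1}]
applied = collect_seed_configs(FakeEnv("cute", "sm100"), None, population)
```

In this sketch the second heuristic shows up in `applied` even though it adds nothing to the population, which mirrors the debugging behavior described for `autotuner_heuristics`.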