Skip to content

[LV] Don't skip VPlan cost model for div/rem instructions#187056

Open
david-arm wants to merge 4 commits intollvm:mainfrom
david-arm:no_skip_vplan_costs_pt1
Open

[LV] Don't skip VPlan cost model for div/rem instructions#187056
david-arm wants to merge 4 commits intollvm:mainfrom
david-arm:no_skip_vplan_costs_pt1

Conversation

@david-arm
Copy link
Copy Markdown
Contributor

In LoopVectorizationPlanner::precomputeCosts we are skipping calculation of costs using the VPlan cost model, instead preferring to use the legacy costs. This helps to prevent the legacy and vplan cost model assert firing, but really we should be encouraging full use of the VPlan cost model.

I've created this initial PR to stop skipping the computation costs for udiv/urem/sdiv/srem instructions. The VPlan costs seem to match up nicely.

I intend to follow up with more PRs to move more opcodes across.

@llvmbot
Copy link
Copy Markdown
Member

llvmbot commented Mar 17, 2026

@llvm/pr-subscribers-llvm-transforms

Author: David Sherwood (david-arm)

Changes

In LoopVectorizationPlanner::precomputeCosts we are skipping calculation of costs using the VPlan cost model, instead preferring to use the legacy costs. This helps to prevent the legacy and vplan cost model assert firing, but really we should be encouraging full use of the VPlan cost model.

I've created this initial PR to stop skipping the computation costs for udiv/urem/sdiv/srem instructions. The VPlan costs seem to match up nicely.

I intend to follow up with more PRs to move more opcodes across.


Full diff: https://github.com/llvm/llvm-project/pull/187056.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+14-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll (+4-4)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index ac9b790c739bf..59b039a75eec2 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7009,8 +7009,21 @@ LoopVectorizationPlanner::precomputeCosts(VPlan &Plan, ElementCount VF,
     });
     Cost += ForcedCost;
   }
+
+  auto UseVPlanCostModel = [](Instruction *I) -> bool {
+    switch (I->getOpcode()) {
+    case Instruction::SDiv:
+    case Instruction::UDiv:
+    case Instruction::SRem:
+    case Instruction::URem:
+      return true;
+    default:
+      return false;
+    }
+  };
   for (const auto &[Scalarized, ScalarCost] : CM.InstsToScalarize[VF]) {
-    if (CostCtx.skipCostComputation(Scalarized, VF.isVector()))
+    if (UseVPlanCostModel(Scalarized) ||
+        CostCtx.skipCostComputation(Scalarized, VF.isVector()))
       continue;
     CostCtx.SkipCostComputation.insert(Scalarized);
     LLVM_DEBUG({
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll b/llvm/test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll
index 1f3949172b758..983c1b9c2b902 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll
@@ -13,7 +13,7 @@ target triple = "aarch64--linux-gnu"
 ; %var4 a lower scalarization overhead.
 ;
 ; COST-LABEL:  predicated_udiv_scalarized_operand
-; COST:        Cost of 5 for VF 2: profitable to scalarize   %var4 = udiv i64 %var2, %var3
+; COST:        Cost of 5 for VF 2: REPLICATE ir<%var4> = udiv ir<%var2>, ir<%var3> (S->V)
 ;
 ;
 define i64 @predicated_udiv_scalarized_operand(ptr %a, i64 %x) {
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
index d84a6e27e5473..944632a796bdb 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
@@ -19,7 +19,7 @@ target triple = "aarch64--linux-gnu"
 ;   (udiv(2) + extractelement(8) + insertelement(4)) / 2 = 7
 ;
 ; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp2, %tmp3
-; CHECK: Cost of 7 for VF 2: profitable to scalarize   %tmp4 = udiv i32 %tmp2, %tmp3
+; CHECK: Cost of 7 for VF 2: REPLICATE ir<%tmp4> = udiv ir<%tmp2>, ir<%tmp3> (S->V)
 ;
 define i32 @predicated_udiv(ptr %a, ptr %b, i1 %c, i64 %n) {
 entry:
@@ -135,8 +135,8 @@ for.end:
 ;
 ; CHECK: Scalarizing: %tmp3 = add nsw i32 %tmp2, %x
 ; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp2, %tmp3
-; CHECK: Cost of 5 for VF 2: profitable to scalarize   %tmp4 = udiv i32 %tmp2, %tmp3
 ; CHECK: Cost of 3 for VF 2: profitable to scalarize   %tmp3 = add nsw i32 %tmp2, %x
+; CHECK: Cost of 5 for VF 2: REPLICATE ir<%tmp4> = udiv ir<%tmp2>, ir<%tmp3> (S->V)
 ;
 
 define i32 @predicated_udiv_scalarized_operand(ptr %a, i1 %c, i32 %x, i64 %n) {
@@ -233,11 +233,11 @@ for.end:
 ; CHECK:     Scalarizing and predicating: %tmp4 = udiv i32 %tmp3, %tmp2
 ; CHECK:     Scalarizing: %tmp5 = sub i32 %tmp4, %x
 ; CHECK:     Scalarizing and predicating: store i32 %tmp5, ptr %tmp0, align 4
-; CHECK: Cost of 7 for VF 2: profitable to scalarize   %tmp3 = sdiv i32 %tmp1, %tmp2
-; CHECK: Cost of 7 for VF 2: profitable to scalarize   %tmp4 = udiv i32 %tmp3, %tmp2
 ; CHECK: Cost of 2 for VF 2: profitable to scalarize   store i32 %tmp5, ptr %tmp0, align 4
 ; CHECK: Cost of 3 for VF 2: profitable to scalarize   %tmp5 = sub i32 %tmp4, %x
 ; CHECK: Cost of 1 for VF 2: WIDEN ir<%tmp2> = add ir<%tmp1>, ir<%x>
+; CHECK: Cost of 7 for VF 2: REPLICATE ir<%tmp3> = sdiv ir<%tmp1>, ir<%tmp2>
+; CHECK: Cost of 5 for VF 2: REPLICATE ir<%tmp4> = udiv ir<%tmp3>, ir<%tmp2>
 ;
 define void @predication_multi_context(ptr %a, i1 %c, i32 %x, i64 %n) {
 entry:

@llvmbot
Copy link
Copy Markdown
Member

llvmbot commented Mar 17, 2026

@llvm/pr-subscribers-vectorizers

Author: David Sherwood (david-arm)

Changes

In LoopVectorizationPlanner::precomputeCosts we are skipping calculation of costs using the VPlan cost model, instead preferring to use the legacy costs. This helps to prevent the legacy and vplan cost model assert firing, but really we should be encouraging full use of the VPlan cost model.

I've created this initial PR to stop skipping the computation costs for udiv/urem/sdiv/srem instructions. The VPlan costs seem to match up nicely.

I intend to follow up with more PRs to move more opcodes across.


Full diff: https://github.com/llvm/llvm-project/pull/187056.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+14-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll (+4-4)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index ac9b790c739bf..59b039a75eec2 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7009,8 +7009,21 @@ LoopVectorizationPlanner::precomputeCosts(VPlan &Plan, ElementCount VF,
     });
     Cost += ForcedCost;
   }
+
+  auto UseVPlanCostModel = [](Instruction *I) -> bool {
+    switch (I->getOpcode()) {
+    case Instruction::SDiv:
+    case Instruction::UDiv:
+    case Instruction::SRem:
+    case Instruction::URem:
+      return true;
+    default:
+      return false;
+    }
+  };
   for (const auto &[Scalarized, ScalarCost] : CM.InstsToScalarize[VF]) {
-    if (CostCtx.skipCostComputation(Scalarized, VF.isVector()))
+    if (UseVPlanCostModel(Scalarized) ||
+        CostCtx.skipCostComputation(Scalarized, VF.isVector()))
       continue;
     CostCtx.SkipCostComputation.insert(Scalarized);
     LLVM_DEBUG({
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll b/llvm/test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll
index 1f3949172b758..983c1b9c2b902 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll
@@ -13,7 +13,7 @@ target triple = "aarch64--linux-gnu"
 ; %var4 a lower scalarization overhead.
 ;
 ; COST-LABEL:  predicated_udiv_scalarized_operand
-; COST:        Cost of 5 for VF 2: profitable to scalarize   %var4 = udiv i64 %var2, %var3
+; COST:        Cost of 5 for VF 2: REPLICATE ir<%var4> = udiv ir<%var2>, ir<%var3> (S->V)
 ;
 ;
 define i64 @predicated_udiv_scalarized_operand(ptr %a, i64 %x) {
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
index d84a6e27e5473..944632a796bdb 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
@@ -19,7 +19,7 @@ target triple = "aarch64--linux-gnu"
 ;   (udiv(2) + extractelement(8) + insertelement(4)) / 2 = 7
 ;
 ; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp2, %tmp3
-; CHECK: Cost of 7 for VF 2: profitable to scalarize   %tmp4 = udiv i32 %tmp2, %tmp3
+; CHECK: Cost of 7 for VF 2: REPLICATE ir<%tmp4> = udiv ir<%tmp2>, ir<%tmp3> (S->V)
 ;
 define i32 @predicated_udiv(ptr %a, ptr %b, i1 %c, i64 %n) {
 entry:
@@ -135,8 +135,8 @@ for.end:
 ;
 ; CHECK: Scalarizing: %tmp3 = add nsw i32 %tmp2, %x
 ; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp2, %tmp3
-; CHECK: Cost of 5 for VF 2: profitable to scalarize   %tmp4 = udiv i32 %tmp2, %tmp3
 ; CHECK: Cost of 3 for VF 2: profitable to scalarize   %tmp3 = add nsw i32 %tmp2, %x
+; CHECK: Cost of 5 for VF 2: REPLICATE ir<%tmp4> = udiv ir<%tmp2>, ir<%tmp3> (S->V)
 ;
 
 define i32 @predicated_udiv_scalarized_operand(ptr %a, i1 %c, i32 %x, i64 %n) {
@@ -233,11 +233,11 @@ for.end:
 ; CHECK:     Scalarizing and predicating: %tmp4 = udiv i32 %tmp3, %tmp2
 ; CHECK:     Scalarizing: %tmp5 = sub i32 %tmp4, %x
 ; CHECK:     Scalarizing and predicating: store i32 %tmp5, ptr %tmp0, align 4
-; CHECK: Cost of 7 for VF 2: profitable to scalarize   %tmp3 = sdiv i32 %tmp1, %tmp2
-; CHECK: Cost of 7 for VF 2: profitable to scalarize   %tmp4 = udiv i32 %tmp3, %tmp2
 ; CHECK: Cost of 2 for VF 2: profitable to scalarize   store i32 %tmp5, ptr %tmp0, align 4
 ; CHECK: Cost of 3 for VF 2: profitable to scalarize   %tmp5 = sub i32 %tmp4, %x
 ; CHECK: Cost of 1 for VF 2: WIDEN ir<%tmp2> = add ir<%tmp1>, ir<%x>
+; CHECK: Cost of 7 for VF 2: REPLICATE ir<%tmp3> = sdiv ir<%tmp1>, ir<%tmp2>
+; CHECK: Cost of 5 for VF 2: REPLICATE ir<%tmp4> = udiv ir<%tmp3>, ir<%tmp2>
 ;
 define void @predication_multi_context(ptr %a, i1 %c, i32 %x, i64 %n) {
 entry:

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates LoopVectorize’s cost precomputation logic to stop bypassing the VPlan cost model for scalarized div/rem instructions, aligning the debug-cost output and tests with VPlan-based costing for those opcodes.

Changes:

  • Adjusted LoopVectorizationPlanner::precomputeCosts to avoid precomputing legacy scalarization costs for sdiv/udiv/srem/urem, allowing VPlan recipe costing to account for them.
  • Updated AArch64 LoopVectorize debug-cost tests to expect REPLICATE ... VPlan cost lines instead of legacy “profitable to scalarize” lines for affected div/rem instructions.
  • Refreshed corresponding FileCheck patterns in AArch64 predication cost tests.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp Changes cost precomputation to defer div/rem scalarized instruction costing to the VPlan model.
llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll Updates cost-model assertions to match VPlan REPLICATE debug output for div/rem.
llvm/test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll Updates expected cost-model debug output for udiv scalarization to VPlan REPLICATE form.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines 7024 to 7027
for (const auto &[Scalarized, ScalarCost] : CM.InstsToScalarize[VF]) {
if (CostCtx.skipCostComputation(Scalarized, VF.isVector()))
if (UseVPlanCostModel(Scalarized) ||
CostCtx.skipCostComputation(Scalarized, VF.isVector()))
continue;
Comment on lines +239 to +240
; CHECK: Cost of 7 for VF 2: REPLICATE ir<%tmp3> = sdiv ir<%tmp1>, ir<%tmp2>
; CHECK: Cost of 5 for VF 2: REPLICATE ir<%tmp4> = udiv ir<%tmp3>, ir<%tmp2>
Copy link
Copy Markdown
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did run this on a set of workloads and it looks like there's some divergences at least on AAarch64. Will prepare a reproducer soon

@david-arm
Copy link
Copy Markdown
Contributor Author

I did run this on a set of workloads and it looks like there's some divergences at least on AAarch64. Will prepare a reproducer soon

OK thanks. I also independently found that the current VPlan cost model for udiv/sdiv is incorrect. I have another patch I intend to upstream that may fix the divergence you have found. It relates to uses of the udiv - if the only use is within the same block then it will be scalar and no inserts of the result into a vector are required. This is one area of divergence between the VP context getScalarizationOverhead and the scalar costs calculated in computePredInstDiscounts, which only calculates the cost of the result inserts if it's scalar with predication.

@david-arm
Copy link
Copy Markdown
Contributor Author

I did run this on a set of workloads and it looks like there's some divergences at least on AAarch64. Will prepare a reproducer soon.

Any update here @fhahn? I've improved the cost model to better reflect the VPlan model and the generated LLVM IR. I have more patches to follow in this area that are currently gated on this patch.

Comment on lines +1802 to +1803
// Is this recipe only used by other recipes in the same block? If so, the
// result does not need scalarizing since it's only use will be scalar.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised this does not trigger the divergence assertion, as this doesn't match the legacy result as per the test changes?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, although in this case I believe the vplan cost model to be more accurate. It would be a shame to deliberately over-cost simply to align with the incorrect legacy cost model.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is also required for follow-on patches, such as using the vplan replicate cost model for more instructions such as Instruction::FAdd where the legacy cost model was producing lower costs than the vplan model.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy to provide the existing test case that shows this problem if that helps?

fhahn added a commit to fhahn/llvm-project that referenced this pull request Mar 23, 2026
The legacy cost model computes and passes RHSInfo both when widening and
replicating. Match behavior in VPlan-based cost model.

The added test shows that we now compute the same cost as the legacy
cost model.

Without this change, the test added in
llvm/test/Transforms/LoopVectorize/AArch64/predicated-costs.ll would
crash with llvm#187056.
@fhahn
Copy link
Copy Markdown
Contributor

fhahn commented Mar 23, 2026

I did run this on a set of workloads and it looks like there's some divergences at least on AAarch64. Will prepare a reproducer soon.

Any update here @fhahn? I've improved the cost model to better reflect the VPlan model and the generated LLVM IR. I have more patches to follow in this area that are currently gated on this patch.

Sorry took a bit longer to get a good reproducer. It looks like a separate issue to insert/extract costs. I think with #188126, the version of the PR without the latest change didn't trigger any crashes with a set of different configs in my testing. I've not yet tried with the latest change, but it may be worth splitting it up?

@david-arm
Copy link
Copy Markdown
Contributor Author

I did run this on a set of workloads and it looks like there's some divergences at least on AAarch64. Will prepare a reproducer soon.

Any update here @fhahn? I've improved the cost model to better reflect the VPlan model and the generated LLVM IR. I have more patches to follow in this area that are currently gated on this patch.

Sorry took a bit longer to get a good reproducer. It looks like a separate issue to insert/extract costs. I think with #188126, the version of the PR without the latest change didn't trigger any crashes with a set of different configs in my testing. I've not yet tried with the latest change, but it may be worth splitting it up?

OK sure, happy to split up. Does that mean the original version of this PR essentially depends upon #188126 to avoid crashes?

@fhahn
Copy link
Copy Markdown
Contributor

fhahn commented Mar 24, 2026

I did run this on a set of workloads and it looks like there's some divergences at least on AAarch64. Will prepare a reproducer soon.

Any update here @fhahn? I've improved the cost model to better reflect the VPlan model and the generated LLVM IR. I have more patches to follow in this area that are currently gated on this patch.

Sorry took a bit longer to get a good reproducer. It looks like a separate issue to insert/extract costs. I think with #188126, the version of the PR without the latest change didn't trigger any crashes with a set of different configs in my testing. I've not yet tried with the latest change, but it may be worth splitting it up?

OK sure, happy to split up. Does that mean the original version of this PR essentially depends upon #188126 to avoid crashes?

Yep, I expect that this may surface a few more issues by using the VPlan-based cost model more, but at least the issue in #188126 seems to be a genuine bug in the VPlan based cost model (+ one in the AArch64 cost model perhaps)

In LoopVectorizationPlanner::precomputeCosts we are skipping
calculation of costs using the VPlan cost model, instead
preferring to use the legacy costs. This helps to prevent
the legacy and vplan cost model assert firing, but really
we should be encouraging full use of the VPlan cost model.

I've created this initial PR to stop skipping the
computation costs for udiv/urem/sdiv/srem instructions. The
VPlan costs seem to match up nicely.
@david-arm david-arm force-pushed the no_skip_vplan_costs_pt1 branch from 9e4a2e8 to efe5082 Compare March 24, 2026 10:22
@david-arm
Copy link
Copy Markdown
Contributor Author

Hi @fhahn, I've reverted the other cost model changes for now and will move them to a separate PR.

fhahn added a commit that referenced this pull request Mar 25, 2026
The legacy cost model computes and passes RHSInfo both when widening and
replicating. Match behavior in VPlan-based cost model.

The added test shows that we now compute the same cost as the legacy
cost model.

Without this change, the test added in
llvm/test/Transforms/LoopVectorize/AArch64/predicated-costs.ll would
crash with #187056.

PR: #188126
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 25, 2026
…e. (#188126)

The legacy cost model computes and passes RHSInfo both when widening and
replicating. Match behavior in VPlan-based cost model.

The added test shows that we now compute the same cost as the legacy
cost model.

Without this change, the test added in
llvm/test/Transforms/LoopVectorize/AArch64/predicated-costs.ll would
crash with llvm/llvm-project#187056.

PR: llvm/llvm-project#188126
; (sdiv(2) + extractelement(8) + insertelement(4)) / 2 = 7
; Cost of udiv:
; (udiv(2) + extractelement(8) + insertelement(4)) / 2 = 7
; (udiv(2) + extractelement(4) + insertelement(4)) / 2 = 5
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah there's still a difference even with the other improvements removed. Do you know where this is coming from?

I'm running the patch on a large test-set now to see if it triggers any more issues, but if possible to avoid this divergence, that would probably be more reliable

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's because of this code in VPCostContext::getScalarizationOverhead:

  // Compute the cost of scalarizing the operands, skipping ones that do not
  // require extraction/scalarization and do not incur any overhead.
  SmallPtrSet<const VPValue *, 4> UniqueOperands;
  SmallVector<Type *> Tys;
  for (auto *Op : Operands) {
    if (isa<VPIRValue>(Op) ||
        (!AlwaysIncludeReplicatingR &&
         isa<VPReplicateRecipe, VPPredInstPHIRecipe>(Op)) ||
        (isa<VPReplicateRecipe>(Op) &&
         cast<VPReplicateRecipe>(Op)->getOpcode() == Instruction::Load) ||
        !UniqueOperands.insert(Op).second)
      continue;
    Tys.push_back(toVectorizedTy(Types.inferScalarType(Op), VF));
  }
  return ScalarizationCost +
         TTI.getOperandsScalarizationOverhead(Tys, CostKind, VIC);

So if it's introducing a divergence between legacy and vplan cost model for udiv/urem then it's presumably also introducing the same divergence everywhere we call getScalarizationOverhead. Having said that, it feels like a backward step to deliberately regress the vplan cost model just to match the inaccurate legacy one.

I would prefer simply disabling the assert if there are replicating recipes in the vplan.

ambergorzynski pushed a commit to ambergorzynski/llvm-project that referenced this pull request Mar 27, 2026
…8126)

The legacy cost model computes and passes RHSInfo both when widening and
replicating. Match behavior in VPlan-based cost model.

The added test shows that we now compute the same cost as the legacy
cost model.

Without this change, the test added in
llvm/test/Transforms/LoopVectorize/AArch64/predicated-costs.ll would
crash with llvm#187056.

PR: llvm#188126
Aadarsh-Keshri pushed a commit to Aadarsh-Keshri/llvm-project that referenced this pull request Mar 28, 2026
…8126)

The legacy cost model computes and passes RHSInfo both when widening and
replicating. Match behavior in VPlan-based cost model.

The added test shows that we now compute the same cost as the legacy
cost model.

Without this change, the test added in
llvm/test/Transforms/LoopVectorize/AArch64/predicated-costs.ll would
crash with llvm#187056.

PR: llvm#188126
Himadhith pushed a commit to Himadhith/llvm-project that referenced this pull request Apr 2, 2026
…8126)

The legacy cost model computes and passes RHSInfo both when widening and
replicating. Match behavior in VPlan-based cost model.

The added test shows that we now compute the same cost as the legacy
cost model.

Without this change, the test added in
llvm/test/Transforms/LoopVectorize/AArch64/predicated-costs.ll would
crash with llvm#187056.

PR: llvm#188126
@david-arm
Copy link
Copy Markdown
Contributor Author

Any thoughts on whether this can be merged now @fhahn?

sriyalamar pushed a commit to sriyalamar/cpullvm-toolchain that referenced this pull request Apr 7, 2026
The legacy cost model computes and passes RHSInfo both when widening and
replicating. Match behavior in VPlan-based cost model.

The added test shows that we now compute the same cost as the legacy
cost model.

Without this change, the test added in
llvm/test/Transforms/LoopVectorize/AArch64/predicated-costs.ll would
crash with llvm/llvm-project#187056.

PR: llvm/llvm-project#188126
(cherry picked from commit 86c1510)
fhahn added a commit to fhahn/llvm-project that referenced this pull request Apr 7, 2026
Almost all recipes now go through ::computeCost to properly compute
their costs using the VPlan-based cost model. There are currently no
known cases where the VPlan-based cost model returns an incorrect cost
vs the legacy cost model. I check the remaining open issues with reports
of the assertion triggering and in all cases the VPlan-based cost model
is more accurate, which is causing the divergence.

There are still some fall-back paths, mostly via precomputeCosts,
but those cannot be easily removed without triggering the assert, as the
VPlan-based cost model is more accurate for those cases. An example of
this is llvm#187056.

Fixes llvm#38575.
Fixes llvm#149651.
Fixes llvm#182646.
Fixes llvm#183739.
Fixes llvm#187523.
Copy link
Copy Markdown
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

The assert should no longer be an issue after #190838

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants