[LV] Update remaining tests to use VPlan cost output (NFC). #190038

Merged
fhahn merged 3 commits into llvm:main from fhahn:vplan-replace-legacy-cost-checks
Apr 7, 2026

@fhahn
Contributor

@fhahn fhahn commented Apr 1, 2026

Move remaining tests checking legacy cost output to check the VPlan's cost model output.

In some cases the checks become much more compact: a single interleave group cost is checked instead of each individual member, all of which carry the group's cost. In other cases, auto-generation consistently checks all relevant VFs.
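
To make the difference between the two check formats concrete, here is a minimal Python sketch that parses both kinds of cost line as they appear in the updated tests. It is an illustration only: the regular expressions are inferred from the CHECK lines in this patch, `parse_cost_line` is a hypothetical helper and not part of LLVM or its test utilities, and only integer costs are handled.

```python
import re

# Legacy cost-model debug line, in the shape of the CHECK lines this patch removes.
# (Pattern inferred from the test files; an assumption, not an LLVM-defined grammar.)
LEGACY_RE = re.compile(
    r"LV: Found an estimated cost of (?P<cost>\d+) for VF (?P<vf>.+?) "
    r"For instruction:\s+(?P<what>.+)"
)

# VPlan cost-model debug line, in the shape of the CHECK lines this patch adds.
VPLAN_RE = re.compile(r"Cost of (?P<cost>\d+) for VF (?P<vf>[^:]+): (?P<what>.+)")


def parse_cost_line(line):
    """Return (kind, cost, vf, what) for a cost line, or None if neither format matches."""
    for kind, pattern in (("legacy", LEGACY_RE), ("vplan", VPLAN_RE)):
        match = pattern.search(line)
        if match:
            return (
                kind,
                int(match.group("cost")),
                match.group("vf"),
                match.group("what").strip(),
            )
    return None


if __name__ == "__main__":
    examples = [
        "; CHECK:  LV: Found an estimated cost of 5 for VF vscale x 4 "
        "For instruction: %div = udiv i8 %iv.trunc, 3",
        "; CHECK: Cost of 7 for VF 2: INTERLEAVE-GROUP with factor 2 at %10",
    ]
    for example in examples:
        print(parse_cost_line(example))
```

The legacy format reports a cost per IR instruction, so every member of an interleave group needs its own line. The VPlan format reports a cost per recipe, which is why a single `INTERLEAVE-GROUP` line can stand in for all members of a group.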

@llvmbot
Member

llvmbot commented Apr 1, 2026

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-risc-v

Author: Florian Hahn (fhahn)

Changes

Move remaining tests checking legacy cost output to check the VPlan's cost model output.

In some cases the checks become much more compact: a single interleave group cost is checked instead of each individual member, all of which carry the group's cost. In other cases, auto-generation consistently checks all relevant VFs.


Patch is 1.10 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/190038.diff

115 Files Affected:

  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/binop-costs.ll (-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/type-shrinkage-zext-costs.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/reg-usage.ll (+16-12)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reduction-cost.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/WebAssembly/int-mac-reduction-costs.ll (+62-62)
  • (modified) llvm/test/Transforms/LoopVectorize/WebAssembly/memory-interleave.ll (+383-450)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/handle-iptr-with-data-layout-to-not-assert.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-f32-stride-2.ll (+21-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-f32-stride-3.ll (+21-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-f32-stride-4.ll (+20-21)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-f32-stride-5.ll (+20-122)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-f32-stride-6.ll (+20-22)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-f32-stride-7.ll (+19-165)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-f32-stride-8.ll (+19-173)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-f64-stride-2.ll (+19-53)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-f64-stride-3.ll (+19-72)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-f64-stride-4.ll (+18-85)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-f64-stride-5.ll (+18-112)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-f64-stride-6.ll (+18-135)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-f64-stride-7.ll (+18-151)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-f64-stride-8.ll (+17-157)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-half.ll (+7-3)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i16-stride-2.ll (+27-32)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i16-stride-3.ll (+27-32)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i16-stride-4.ll (+27-32)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i16-stride-5.ll (+27-176)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i16-stride-6.ll (+27-32)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i16-stride-7.ll (+27-248)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i16-stride-8.ll (+27-252)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i32-stride-2-indices-0u.ll (+21-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i32-stride-2.ll (+21-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i32-stride-3-indices-01u.ll (+21-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i32-stride-3-indices-0uu.ll (+21-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i32-stride-3.ll (+21-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i32-stride-4-indices-012u.ll (+20-24)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i32-stride-4-indices-01uu.ll (+20-53)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i32-stride-4-indices-0uuu.ll (+20-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i32-stride-4.ll (+20-21)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i32-stride-5.ll (+20-122)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i32-stride-6.ll (+20-22)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i32-stride-7.ll (+19-165)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i32-stride-8.ll (+19-173)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i64-stride-2.ll (+19-53)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i64-stride-3.ll (+19-72)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i64-stride-4.ll (+18-85)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i64-stride-5.ll (+18-102)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i64-stride-6.ll (+18-123)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i64-stride-7.ll (+18-151)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i64-stride-8.ll (+17-157)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i8-stride-2.ll (+27-32)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i8-stride-3.ll (+27-32)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i8-stride-4.ll (+27-32)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i8-stride-5.ll (+27-176)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i8-stride-6.ll (+27-212)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i8-stride-7.ll (+27-248)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-load-i8-stride-8.ll (+27-284)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-f32-stride-2.ll (+21-29)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-f32-stride-3.ll (+21-29)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-f32-stride-4.ll (+21-29)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-f32-stride-5.ll (+21-26)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-f32-stride-6.ll (+21-26)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-f32-stride-7.ll (+21-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-f32-stride-8.ll (+21-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-f64-stride-2.ll (+21-29)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-f64-stride-3.ll (+21-26)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-f64-stride-4.ll (+21-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-f64-stride-5.ll (+21-22)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-f64-stride-6.ll (+21-22)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-f64-stride-7.ll (+21-22)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-f64-stride-8.ll (+33-156)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i16-stride-2.ll (+27-37)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i16-stride-3.ll (+27-37)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i16-stride-4.ll (+27-37)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i16-stride-5.ll (+27-37)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i16-stride-6.ll (+27-37)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i16-stride-7.ll (+27-37)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i16-stride-8.ll (+27-37)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i32-stride-2.ll (+21-29)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i32-stride-3.ll (+21-29)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i32-stride-4.ll (+21-29)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i32-stride-5.ll (+21-26)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i32-stride-6.ll (+21-26)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i32-stride-7.ll (+21-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i32-stride-8.ll (+21-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i64-stride-2.ll (+21-29)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i64-stride-3.ll (+21-26)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i64-stride-4.ll (+21-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i64-stride-5.ll (+21-22)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i64-stride-6.ll (+21-22)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i64-stride-7.ll (+21-22)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i64-stride-8.ll (+33-156)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i8-stride-2.ll (+27-37)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i8-stride-3.ll (+27-37)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i8-stride-4.ll (+27-37)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i8-stride-5.ll (+27-37)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i8-stride-6.ll (+27-37)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i8-stride-7.ll (+27-37)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/interleaved-store-i8-stride-8.ll (+27-37)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/masked-interleaved-load-i16.ll (+33-49)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/masked-interleaved-store-i16.ll (+34-51)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/masked-scatter-i32-with-i8-index.ll (+25-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/masked-scatter-i64-with-i8-index.ll (+26-26)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/masked-store-i16.ll (+21-21)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/masked-store-i32.ll (+25-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/masked-store-i64.ll (+25-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/masked-store-i8.ll (+25-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/scatter-i16-with-i8-index.ll (+25-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/scatter-i32-with-i8-index.ll (+25-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/scatter-i64-with-i8-index.ll (+25-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/scatter-i8-with-i8-index.ll (+25-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/strided-load-i16.ll (+20-20)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/strided-load-i32.ll (+16-16)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/strided-load-i64.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/strided-load-i8.ll (+24-24)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/CostModel/vpinstruction-cost.ll (+20-5)
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/binop-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/binop-costs.ll
index ff1dee41e62bf..8e4c6d470c9be 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/binop-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/binop-costs.ll
@@ -17,12 +17,6 @@ define void @udiv_rhs_opt_cost(ptr %dst) #0 {
 ; CHECK:  Cost of 0 for VF vscale x 2: IR %div = udiv i8 %iv.trunc, 3
 ; CHECK:  Cost of 5 for VF vscale x 4: CLONE ir<%div> = udiv vp<[[VP7]]>, ir<3>
 ; CHECK:  Cost of 0 for VF vscale x 4: IR %div = udiv i8 %iv.trunc, 3
-; CHECK:  LV: Found an estimated cost of 5 for VF 1 For instruction: %div = udiv i8 %iv.trunc, 3
-; CHECK:  LV: Found an estimated cost of 5 for VF 2 For instruction: %div = udiv i8 %iv.trunc, 3
-; CHECK:  LV: Found an estimated cost of 5 for VF 4 For instruction: %div = udiv i8 %iv.trunc, 3
-; CHECK:  LV: Found an estimated cost of 5 for VF vscale x 1 For instruction: %div = udiv i8 %iv.trunc, 3
-; CHECK:  LV: Found an estimated cost of 5 for VF vscale x 2 For instruction: %div = udiv i8 %iv.trunc, 3
-; CHECK:  LV: Found an estimated cost of 5 for VF vscale x 4 For instruction: %div = udiv i8 %iv.trunc, 3
 ;
 entry:
   br label %loop
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/type-shrinkage-zext-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/type-shrinkage-zext-costs.ll
index a44a16455445c..bcd1c28318450 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/type-shrinkage-zext-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/type-shrinkage-zext-costs.ll
@@ -8,6 +8,7 @@ target triple = "aarch64-unknown-linux-gnu"
 
 define void @zext_i8_i16(ptr noalias nocapture readonly %p, ptr noalias nocapture %q, i32 %len) #0 {
 ; CHECK-COST-LABEL: LV: Checking a loop in 'zext_i8_i16'
+; CHECK-COST: LV: Found an estimated cost of 0 for VF 1 For instruction:   %conv = zext i8 %0 to i32
 ; CHECK-COST: Cost of 1 for VF 2: WIDEN-CAST ir<%conv> = zext ir<%0> to i16
 ; CHECK-COST: Cost of 1 for VF 4: WIDEN-CAST ir<%conv> = zext ir<%0> to i16
 ; CHECK-COST: Cost of 1 for VF 8: WIDEN-CAST ir<%conv> = zext ir<%0> to i16
@@ -16,7 +17,6 @@ define void @zext_i8_i16(ptr noalias nocapture readonly %p, ptr noalias nocaptur
 ; CHECK-COST: Cost of 1 for VF vscale x 2: WIDEN-CAST ir<%conv> = zext ir<%0> to i16
 ; CHECK-COST: Cost of 1 for VF vscale x 4: WIDEN-CAST ir<%conv> = zext ir<%0> to i16
 ; CHECK-COST: Cost of 0 for VF vscale x 8: WIDEN-CAST ir<%conv> = zext ir<%0> to i16
-; CHECK-COST: LV: Found an estimated cost of 0 for VF 1 For instruction:   %conv = zext i8 %0 to i32
 ; CHECK-LABEL: define void @zext_i8_i16
 ; CHECK-SAME: (ptr noalias readonly captures(none) [[P:%.*]], ptr noalias captures(none) [[Q:%.*]], i32 [[LEN:%.*]]) #[[ATTR0:[0-9]+]] {
 ; CHECK-NEXT:  entry:
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/reg-usage.ll b/llvm/test/Transforms/LoopVectorize/RISCV/reg-usage.ll
index 99139da67bb78..b0738cad80064 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/reg-usage.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/reg-usage.ll
@@ -84,22 +84,26 @@ define void @goo(ptr nocapture noundef %a, i32 noundef signext %n) {
 ; CHECK-SCALAR:      LV(REG): VF = 1
 ; CHECK-SCALAR-NEXT: LV(REG): Found max usage: 1 item
 ; CHECK-SCALAR-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 3 registers
-; CHECK-LMUL1:       LV(REG): VF = vscale x 2
+; CHECK-LMUL1-LABEL: goo
+; CHECK-LMUL1:       LV(REG): VF = vscale x 1
 ; CHECK-LMUL1-NEXT:  LV(REG): Found max usage: 2 item
-; CHECK-LMUL1-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
-; CHECK-LMUL1-NEXT:  LV(REG): RegisterClass: RISCV::VRRC, 2 registers
-; CHECK-LMUL2:       LV(REG): VF = vscale x 4
+; CHECK-LMUL1-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 5 registers
+; CHECK-LMUL1-NEXT:  LV(REG): RegisterClass: RISCV::VRRC, 1 registers
+; CHECK-LMUL2-LABEL: goo
+; CHECK-LMUL2:       LV(REG): VF = vscale x 2
 ; CHECK-LMUL2-NEXT:  LV(REG): Found max usage: 2 item
-; CHECK-LMUL2-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
-; CHECK-LMUL2-NEXT:  LV(REG): RegisterClass: RISCV::VRRC, 4 registers
-; CHECK-LMUL4:       LV(REG): VF = vscale x 8
+; CHECK-LMUL2-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 5 registers
+; CHECK-LMUL2-NEXT:  LV(REG): RegisterClass: RISCV::VRRC, 2 registers
+; CHECK-LMUL4-LABEL: goo
+; CHECK-LMUL4:       LV(REG): VF = vscale x 4
 ; CHECK-LMUL4-NEXT:  LV(REG): Found max usage: 2 item
-; CHECK-LMUL4-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
-; CHECK-LMUL4-NEXT:  LV(REG): RegisterClass: RISCV::VRRC, 8 registers
-; CHECK-LMUL8:       LV(REG): VF = vscale x 16
+; CHECK-LMUL4-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 5 registers
+; CHECK-LMUL4-NEXT:  LV(REG): RegisterClass: RISCV::VRRC, 4 registers
+; CHECK-LMUL8-LABEL: goo
+; CHECK-LMUL8:       LV(REG): VF = vscale x 8
 ; CHECK-LMUL8-NEXT:  LV(REG): Found max usage: 2 item
-; CHECK-LMUL8-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
-; CHECK-LMUL8-NEXT:  LV(REG): RegisterClass: RISCV::VRRC, 16 registers
+; CHECK-LMUL8-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 5 registers
+; CHECK-LMUL8-NEXT:  LV(REG): RegisterClass: RISCV::VRRC, 8 registers
 entry:
   %cmp3 = icmp sgt i32 %n, 0
   br i1 %cmp3, label %for.body.preheader, label %for.cond.cleanup
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reduction-cost.ll b/llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reduction-cost.ll
index 10d83f4ad125e..fe39700d1787c 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reduction-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reduction-cost.ll
@@ -3,8 +3,8 @@
 ; RUN: -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue \
 ; RUN: -mtriple=riscv64 -mattr=+v -S < %s 2>&1 | FileCheck %s
 
+; CHECK: Cost of 0 for VF vscale x 4: WIDEN-REDUCTION-PHI ir<%rdx> = phi
 ; CHECK: Cost of 2 for VF vscale x 4: WIDEN-INTRINSIC vp<%{{.+}}> = call llvm.vp.merge(ir<true>, ir<%add>, ir<%rdx>, vp<%{{.+}}>)
-; CHECK: LV: Found an estimated cost of 2 for VF vscale x 4 For instruction:   %rdx = phi i32 [ %start, %entry ], [ %add, %loop ]
 
 define i32 @add(ptr %a, i64 %n, i32 %start) {
 entry:
diff --git a/llvm/test/Transforms/LoopVectorize/WebAssembly/int-mac-reduction-costs.ll b/llvm/test/Transforms/LoopVectorize/WebAssembly/int-mac-reduction-costs.ll
index d23c2272d9c0d..9f824d1a963eb 100644
--- a/llvm/test/Transforms/LoopVectorize/WebAssembly/int-mac-reduction-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/WebAssembly/int-mac-reduction-costs.ll
@@ -11,17 +11,17 @@ define hidden i32 @i32_mac_s8(ptr nocapture noundef readonly %a, ptr nocapture n
 ; CHECK: LV: Found an estimated cost of 0 for VF 1 For instruction:   %conv2 = sext i8 %1 to i32
 ; CHECK: LV: Found an estimated cost of 1 for VF 1 For instruction:   %mul = mul nsw i32 %conv2, %conv
 
-; CHECK: LV: Found an estimated cost of 3 for VF 2 For instruction:   %0 = load i8, ptr %arrayidx, align 1
-; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction:   %conv = sext i8 %0 to i32
-; CHECK: LV: Found an estimated cost of 3 for VF 2 For instruction:   %1 = load i8, ptr %arrayidx1, align 1
-; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction:   %conv2 = sext i8 %1 to i32
-; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction:   %mul = mul nsw i32 %conv2, %conv
-
-; CHECK: LV: Found an estimated cost of 2 for VF 4 For instruction:   %0 = load i8, ptr %arrayidx, align 1
-; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction:   %conv = sext i8 %0 to i32
-; CHECK: LV: Found an estimated cost of 2 for VF 4 For instruction:   %1 = load i8, ptr %arrayidx1, align 1
-; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction:   %conv2 = sext i8 %1 to i32
-; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction:   %mul = mul nsw i32 %conv2, %conv
+; CHECK: Cost of 3 for VF 2: WIDEN ir<%0> = load
+; CHECK: Cost of 0 for VF 2: WIDEN-CAST ir<%conv> = sext ir<%0> to i32
+; CHECK: Cost of 3 for VF 2: WIDEN ir<%1> = load
+; CHECK: Cost of 0 for VF 2: WIDEN-CAST ir<%conv2> = sext ir<%1> to i32
+; CHECK: Cost of 1 for VF 2: WIDEN ir<%mul> = mul nsw ir<%conv2>, ir<%conv>
+
+; CHECK: Cost of 2 for VF 4: WIDEN ir<%0> = load
+; CHECK: Cost of 1 for VF 4: WIDEN-CAST ir<%conv> = sext ir<%0> to i32
+; CHECK: Cost of 2 for VF 4: WIDEN ir<%1> = load
+; CHECK: Cost of 1 for VF 4: WIDEN-CAST ir<%conv2> = sext ir<%1> to i32
+; CHECK: Cost of 1 for VF 4: WIDEN ir<%mul> = mul nsw ir<%conv2>, ir<%conv>
 ; CHECK: LV: Selecting VF: 4.
 entry:
   %cmp7.not = icmp eq i32 %N, 0
@@ -55,17 +55,17 @@ define hidden i32 @i32_mac_s16(ptr nocapture noundef readonly %a, ptr nocapture
 ; CHECK: LV: Found an estimated cost of 0 for VF 1 For instruction:   %conv2 = sext i16 %1 to i32
 ; CHECK: LV: Found an estimated cost of 1 for VF 1 For instruction:   %mul = mul nsw i32 %conv2, %conv
 
-; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction:   %0 = load i16, ptr %arrayidx, align 2
-; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction:   %conv = sext i16 %0 to i32
-; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction:   %1 = load i16, ptr %arrayidx1, align 2
-; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction:   %conv2 = sext i16 %1 to i32
-; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction:   %mul = mul nsw i32 %conv2, %conv
-
-; CHECK: LV: Found an estimated cost of 2 for VF 4 For instruction:   %0 = load i16, ptr %arrayidx, align 2
-; CHECK: LV: Found an estimated cost of 0 for VF 4 For instruction:   %conv = sext i16 %0 to i32
-; CHECK: LV: Found an estimated cost of 2 for VF 4 For instruction:   %1 = load i16, ptr %arrayidx1, align 2
-; CHECK: LV: Found an estimated cost of 0 for VF 4 For instruction:   %conv2 = sext i16 %1 to i32
-; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction:   %mul = mul nsw i32 %conv2, %conv
+; CHECK: Cost of 2 for VF 2: WIDEN ir<%0> = load
+; CHECK: Cost of 0 for VF 2: WIDEN-CAST ir<%conv> = sext ir<%0> to i32
+; CHECK: Cost of 2 for VF 2: WIDEN ir<%1> = load
+; CHECK: Cost of 0 for VF 2: WIDEN-CAST ir<%conv2> = sext ir<%1> to i32
+; CHECK: Cost of 1 for VF 2: WIDEN ir<%mul> = mul nsw ir<%conv2>, ir<%conv>
+
+; CHECK: Cost of 2 for VF 4: WIDEN ir<%0> = load
+; CHECK: Cost of 0 for VF 4: WIDEN-CAST ir<%conv> = sext ir<%0> to i32
+; CHECK: Cost of 2 for VF 4: WIDEN ir<%1> = load
+; CHECK: Cost of 0 for VF 4: WIDEN-CAST ir<%conv2> = sext ir<%1> to i32
+; CHECK: Cost of 1 for VF 4: WIDEN ir<%mul> = mul nsw ir<%conv2>, ir<%conv>
 ; CHECK: LV: Selecting VF: 4.
 entry:
   %cmp7.not = icmp eq i32 %N, 0
@@ -99,11 +99,11 @@ define hidden i64 @i64_mac_s16(ptr nocapture noundef readonly %a, ptr nocapture
 ; CHECK: LV: Found an estimated cost of 0 for VF 1 For instruction:   %conv2 = sext i16 %1 to i64
 ; CHECK: LV: Found an estimated cost of 1 for VF 1 For instruction:   %mul = mul nsw i64 %conv2, %conv
 
-; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction:   %0 = load i16, ptr %arrayidx, align 2
-; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction:   %conv = sext i16 %0 to i64
-; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction:   %1 = load i16, ptr %arrayidx1, align 2
-; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction:   %conv2 = sext i16 %1 to i64
-; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction:   %mul = mul nsw i64 %conv2, %conv
+; CHECK: Cost of 2 for VF 2: WIDEN ir<%0> = load
+; CHECK: Cost of 1 for VF 2: WIDEN-CAST ir<%conv> = sext ir<%0> to i64
+; CHECK: Cost of 2 for VF 2: WIDEN ir<%1> = load
+; CHECK: Cost of 1 for VF 2: WIDEN-CAST ir<%conv2> = sext ir<%1> to i64
+; CHECK: Cost of 1 for VF 2: WIDEN ir<%mul> = mul nsw ir<%conv2>, ir<%conv>
 ; CHECK: LV: Selecting VF: 2.
 entry:
   %cmp7.not = icmp eq i32 %N, 0
@@ -136,10 +136,10 @@ define hidden i64 @i64_mac_s32(ptr nocapture noundef readonly %a, ptr nocapture
 ; CHECK: LV: Found an estimated cost of 1 for VF 1 For instruction:   %mul = mul i32 %1, %0
 ; CHECK: LV: Found an estimated cost of 1 for VF 1 For instruction:   %conv = sext i32 %mul to i64
 
-; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction:   %0 = load i32, ptr %arrayidx, align 4
-; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction:   %1 = load i32, ptr %arrayidx1, align 4
-; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction:   %mul = mul i32 %1, %0
-; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction:   %conv = sext i32 %mul to i64
+; CHECK: Cost of 2 for VF 2: WIDEN ir<%0> = load
+; CHECK: Cost of 2 for VF 2: WIDEN ir<%1> = load
+; CHECK: Cost of 1 for VF 2: WIDEN ir<%mul> = mul ir<%1>, ir<%0>
+; CHECK: Cost of 1 for VF 2: WIDEN-CAST ir<%conv> = sext ir<%mul> to i64
 ; CHECK: LV: Selecting VF: 2.
 entry:
   %cmp6.not = icmp eq i32 %N, 0
@@ -172,17 +172,17 @@ define hidden i32 @i32_mac_u8(ptr nocapture noundef readonly %a, ptr nocapture n
 ; CHECK: LV: Found an estimated cost of 0 for VF 1 For instruction:   %conv2 = zext i8 %1 to i32
 ; CHECK: LV: Found an estimated cost of 1 for VF 1 For instruction:   %mul = mul nuw nsw i32 %conv2, %conv
 
-; CHECK: LV: Found an estimated cost of 3 for VF 2 For instruction:   %0 = load i8, ptr %arrayidx, align 1
-; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction:   %conv = zext i8 %0 to i32
-; CHECK: LV: Found an estimated cost of 3 for VF 2 For instruction:   %1 = load i8, ptr %arrayidx1, align 1
-; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction:   %conv2 = zext i8 %1 to i32
-; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction:   %mul = mul nuw nsw i32 %conv2, %conv
-
-; CHECK: LV: Found an estimated cost of 2 for VF 4 For instruction:   %0 = load i8, ptr %arrayidx, align 1
-; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction:   %conv = zext i8 %0 to i32
-; CHECK: LV: Found an estimated cost of 2 for VF 4 For instruction:   %1 = load i8, ptr %arrayidx1, align 1
-; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction:   %conv2 = zext i8 %1 to i32
-; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction:   %mul = mul nuw nsw i32 %conv2, %conv
+; CHECK: Cost of 3 for VF 2: WIDEN ir<%0> = load
+; CHECK: Cost of 0 for VF 2: WIDEN-CAST ir<%conv> = zext ir<%0> to i32
+; CHECK: Cost of 3 for VF 2: WIDEN ir<%1> = load
+; CHECK: Cost of 0 for VF 2: WIDEN-CAST ir<%conv2> = zext ir<%1> to i32
+; CHECK: Cost of 1 for VF 2: WIDEN ir<%mul> = mul nuw nsw ir<%conv2>, ir<%conv>
+
+; CHECK: Cost of 2 for VF 4: WIDEN ir<%0> = load
+; CHECK: Cost of 1 for VF 4: WIDEN-CAST ir<%conv> = zext ir<%0> to i32
+; CHECK: Cost of 2 for VF 4: WIDEN ir<%1> = load
+; CHECK: Cost of 1 for VF 4: WIDEN-CAST ir<%conv2> = zext ir<%1> to i32
+; CHECK: Cost of 1 for VF 4: WIDEN ir<%mul> = mul nuw nsw ir<%conv2>, ir<%conv>
 ; CHECK: LV: Selecting VF: 4.
 entry:
   %cmp7.not = icmp eq i32 %N, 0
@@ -216,17 +216,17 @@ define hidden i32 @i32_mac_u16(ptr nocapture noundef readonly %a, ptr nocapture
 ; CHECK: LV: Found an estimated cost of 0 for VF 1 For instruction:   %conv2 = zext i16 %1 to i32
 ; CHECK: LV: Found an estimated cost of 1 for VF 1 For instruction:   %mul = mul nuw nsw i32 %conv2, %conv
 
-; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction:   %0 = load i16, ptr %arrayidx, align 2
-; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction:   %conv = zext i16 %0 to i32
-; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction:   %1 = load i16, ptr %arrayidx1, align 2
-; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction:   %conv2 = zext i16 %1 to i32
-; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction:   %mul = mul nuw nsw i32 %conv2, %conv
-
-; CHECK: LV: Found an estimated cost of 2 for VF 4 For instruction:   %0 = load i16, ptr %arrayidx, align 2
-; CHECK: LV: Found an estimated cost of 0 for VF 4 For instruction:   %conv = zext i16 %0 to i32
-; CHECK: LV: Found an estimated cost of 2 for VF 4 For instruction:   %1 = load i16, ptr %arrayidx1, align 2
-; CHECK: LV: Found an estimated cost of 0 for VF 4 For instruction:   %conv2 = zext i16 %1 to i32
-; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction:   %mul = mul nuw nsw i32 %conv2, %conv
+; CHECK: Cost of 2 for VF 2: WIDEN ir<%0> = load
+; CHECK: Cost of 0 for VF 2: WIDEN-CAST ir<%conv> = zext ir<%0> to i32
+; CHECK: Cost of 2 for VF 2: WIDEN ir<%1> = load
+; CHECK: Cost of 0 for VF 2: WIDEN-CAST ir<%conv2> = zext ir<%1> to i32
+; CHECK: Cost of 1 for VF 2: WIDEN ir<%mul> = mul nuw nsw ir<%conv2>, ir<%conv>
+
+; CHECK: Cost of 2 for VF 4: WIDEN ir<%0> = load
+; CHECK: Cost of 0 for VF 4: WIDEN-CAST ir<%conv> = zext ir<%0> to i32
+; CHECK: Cost of 2 for VF 4: WIDEN ir<%1> = load
+; CHECK: Cost of 0 for VF 4: WIDEN-CAST ir<%conv2> = zext ir<%1> to i32
+; CHECK: Cost of 1 for VF 4: WIDEN ir<%mul> = mul nuw nsw ir<%conv2>, ir<%conv>
 ; CHECK: LV: Selecting VF: 4.
 entry:
   %cmp7.not = icmp eq i32 %N, 0
@@ -260,11 +260,11 @@ define hidden i64 @i64_mac_u16(ptr nocapture noundef readonly %a, ptr nocapture
 ; CHECK: LV: Found an estimated cost of 0 for VF 1 For instruction:   %conv2 = zext i16 %1 to i64
 ; CHECK: LV: Found an estimated cost of 1 for VF 1 For instruction:   %mul = mul nuw nsw i64 %conv2, %conv
 
-; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction:   %0 = load i16, ptr %arrayidx, align 2
-; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction:   %conv = zext i16 %0 to i64
-; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction:   %1 = load i16, ptr %arrayidx1, align 2
-; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction:   %conv2 = zext i16 %1 to i64
-; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction:   %mul = mul nuw nsw i64 %conv2, %conv
+; CHECK: Cost of 2 for VF 2: WIDEN ir<%0> = load
+; CHECK: Cost of 1 for VF 2: WIDEN-CAST ir<%conv> = zext ir<%0> to i64
+; CHECK: Cost of 2 for VF 2: WIDEN ir<%1> = load
+; CHECK: Cost of 1 for VF 2: WIDEN-CAST ir<%conv2> = zext ir<%1> to i64
+; CHECK: Cost of 1 for VF 2: WIDEN ir<%mul> = mul nuw nsw ir<%conv2>, ir<%conv>
 ; CHECK: LV: Selecting VF: 2.
 entry:
   %cmp8.not = icmp eq i32 %N, 0
@@ -297,10 +297,10 @@ define hidden i64 @i64_mac_u32(ptr nocapture noundef readonly %a, ptr nocapture
 ; CHECK: LV: Found an estimated cost of 1 for VF 1 For instruction:   %mul = mul i32 %1, %0
 ; CHECK: LV: Found an estimated cost of 1 for VF 1 For instruction:   %conv = zext i32 %mul to i64
 
-; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction:   %0 = load i32, ptr %arrayidx, align 4
-; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction:   %1 = load i32, ptr %arrayidx1, align 4
-; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction:   %mul = mul i32 %1, %0
-; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction:   %conv = zext i32 %mul to i64
+; CHECK: Cost of 2 for VF 2: WIDEN ir<%0> = load
+; CHECK: Cost of 2 for VF 2: WIDEN ir<%1> = load
+; CHECK: Cost of 1 for VF 2: WIDEN ir<%mul> = mul ir<%1>, ir<%0>
+; CHECK: Cost of 1 for VF 2: WIDEN-CAST ir<%conv> = zext ir<%mul> to i64
 ; CHECK: LV: Selecting VF: 2.
 entry:
   %cmp6.not = icmp eq i32 %N, 0
diff --git a/llvm/test/Transforms/LoopVectorize/WebAssembly/memory-interleave.ll b/llvm/test/Transforms/LoopVectorize/WebAssembly/memory-interleave.ll
index 54cbab78b1e29..573863121966c 100644
--- a/llvm/test/Transforms/LoopVectorize/WebAssembly/memory-interleave.ll
+++ b/llvm/test/Transforms/LoopVectorize/WebAssembly/memory-interleave.ll
@@ -19,12 +19,12 @@ target triple = "wasm32-unknown-wasi"
 %struct.FourFloats = type { float, float, float, float }
 
 ; CHECK-LABEL: two_ints_same_op
+; CHECK: LV: Scalar loop costs: 12.
 ; CHECK: Cost of 7 for VF 2: INTERLEAVE-GROUP with factor 2 at %10
+; CHECK: Cost for VF 2: 27 (Estimated cost per lane: 13.5)
 ; CHECK: Cost of 6 for VF 4: INTERLEAVE-GROUP with factor 2 at %10
-; CHECK: LV: Scalar loop costs: 12.
-; CHECK: LV: Vector loop of width 2 costs: 13.
-; CHECK: LV: Vector loop of width 4 costs: 6.
-; CHECK: LV: Selecting VF: 4
+; CHECK: Cost for VF 4: 24 (Estimated cost per lane: 6.0)
+; CHECK: LV: Selecting VF: 4.
 define hidden void @two_ints_same_op(ptr noalias nocapture noundef writeonly %0, ptr nocapture noundef readonly %1, ptr nocapture noundef readonly %2, i32 ...
[truncated]
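
The summary lines in the updated checks above pair a total cost with a per-lane estimate (27 for VF 2 gives 13.5, 24 for VF 4 gives 6.0), which looks like the total divided by the VF. Below is a minimal Python sketch for sanity-checking such lines. The helper names `parse_summary` and `per_lane_is_consistent` are hypothetical and not part of LLVM. The sketch assumes fixed-width VFs printed as plain integers, so scalable VFs such as `vscale x 2` would not match.

```python
import re

# Matches the summary line format seen in the updated checks, e.g.
# "Cost for VF 2: 27 (Estimated cost per lane: 13.5)".
# Assumption: the VF is a plain integer (fixed-width vectors only).
SUMMARY_RE = re.compile(
    r"Cost for VF (\d+): (\d+) \(Estimated cost per lane: ([\d.]+)\)"
)


def parse_summary(line):
    """Return (vf, total_cost, per_lane), or None if the line does not match."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), float(m.group(3))


def per_lane_is_consistent(line):
    """Check that the printed per-lane value equals total / VF.

    The tolerance allows for the value being printed with one decimal place.
    """
    parsed = parse_summary(line)
    if parsed is None:
        return False
    vf, total, per_lane = parsed
    return abs(total / vf - per_lane) <= 0.051
```

The pattern uses `search`, so the `; CHECK:` prefix does not need to be stripped before a check line is passed in.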

; CHECK-SCALAR-NEXT: LV(REG): Found max usage: 1 item
; CHECK-SCALAR-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 3 registers
; CHECK-LMUL1: LV(REG): VF = vscale x 2
; CHECK-LMUL1-LABEL: goo
Contributor

I don't know how this was correct before!

Contributor Author

Yep, I think the previous check patterns were not really checking what they were supposed to, and would break once the legacy cost model output is removed.

; CHECK: LV: Selecting VF: 1
; CHECK: Cost of 6 for VF 2: REPLICATE ir<%10> = load ir<%9>
; CHECK: Cost of 6 for VF 2: REPLICATE ir<%12> = load ir<%11>
; CHECK: Cost for VF 2: 61 (Estimated cost per lane: 30.5)
Contributor

Looks like it's missing the store costs or have you intentionally dropped them?

Contributor Author

No, they should be re-added, thanks.

Contributor Author

Should be added, thanks

; CHECK: LV: Found an estimated cost of 4 for VF 16 For instruction: %13 = mul i8
; CHECK: LV: Found an estimated cost of 6 for VF 16 For instruction: store i8 %19
; CHECK: LV: Vector loop of width 16 costs: 1.
; CHECK: Cost of 6 for VF 2: REPLICATE ir<%10> = load ir<%9>
Contributor

I guess you dropped the muls here because they aren't necessary for the test?

Contributor Author

Yep, the check lines are not really consistent throughout the test file; dropping them seemed a bit more consistent with other functions.

Contributor Author

Yep

; AVX512: LV: Found an estimated cost of 40 for VF 32 For instruction: %v4 = load double, ptr %in4, align 8
; AVX512: Cost of 9 for VF 2: INTERLEAVE-GROUP with factor 5 at %v0, ir<%in0>
; AVX512: Cost of 18 for VF 4: INTERLEAVE-GROUP with factor 5 at %v0, ir<%in0>
; AVX512: Cost of 35 for VF 8: INTERLEAVE-GROUP with factor 5 at %v0, ir<%in0>
Contributor

Do you also want the VFs of 16 and 32?

Contributor Author

should be restored, thanks

Contributor Author

Should be added back, thanks

; AVX512: Cost of 11 for VF 2: INTERLEAVE-GROUP with factor 6 at %v0, ir<%in0>
; AVX512: Cost of 21 for VF 4: INTERLEAVE-GROUP with factor 6 at %v0, ir<%in0>
; AVX512: Cost of 51 for VF 8: INTERLEAVE-GROUP with factor 6 at %v0, ir<%in0>
;
Contributor

VFs 16 and 32?

Contributor Author

Should be restored, thanks

Contributor Author

Should be added back, thanks

; AVX512: LV: Found an estimated cost of 20 for VF 16 For instruction: %v6 = load i64, ptr %in6, align 8
; AVX512: LV: Found an estimated cost of 20 for VF 16 For instruction: %v7 = load i64, ptr %in7, align 8
; AVX512: Cost of 14 for VF 2: INTERLEAVE-GROUP with factor 8 at %v0, ir<%in0>
; AVX512: Cost of 40 for VF 4: INTERLEAVE-GROUP with factor 8 at %v0, ir<%in0>
Contributor

VFs 8 and 16?

Contributor Author

should be added back and filter updated, thanks

; AVX512: LV: Found an estimated cost of 10 for VF 8 For instruction: store double %v6, ptr %out6, align 8
; AVX512: LV: Found an estimated cost of 10 for VF 8 For instruction: store double %v7, ptr %out7, align 8
; AVX512: Cost of 10 for VF 8: WIDEN store ir<%out0>, ir<%v0>
; AVX512: Cost of 10 for VF 8: WIDEN store ir<%out1>, ir<%v1>
Contributor

Do we need to test every store for each VF? Perhaps one is enough?

Contributor Author

Not necessarily, but I think it would be difficult to exclude some of them via the filter. I left it as-is for now.

; AVX512: LV: Found an estimated cost of 10 for VF 8 For instruction: store i64 %v6, ptr %out6, align 8
; AVX512: LV: Found an estimated cost of 10 for VF 8 For instruction: store i64 %v7, ptr %out7, align 8
; AVX512: Cost of 10 for VF 8: WIDEN store ir<%out0>, ir<%v>
; AVX512: Cost of 10 for VF 8: WIDEN store ir<%out1>, ir<%v1>
Contributor

Do we need to test all these stores?

Contributor Author

Not sure, but I think it would be difficult to update the filter to keep only some of them.

Contributor Author

Not necessarily, but I think it would be difficult to exclude some of them via the filter. I left it as-is for now.

; AVX512: LV: Found an estimated cost of 0 for VF 4 For instruction: store i64 %v3, ptr %out3, align 8
; AVX512: LV: Found an estimated cost of 0 for VF 4 For instruction: store i64 %v4, ptr %out4, align 8
; AVX512: LV: Found an estimated cost of 0 for VF 4 For instruction: store i64 %v5, ptr %out5, align 8
; AVX512: LV: Found an estimated cost of 0 for VF 4 For instruction: store i64 %v6, ptr %out6, align 8
Contributor

I think the VFs of 4 and 2 have been dropped.

Contributor Author

Should be fixed, thanks

Contributor Author

should be added back and filter updated, thanks

; SSE42: LV: Found an estimated cost of 4 for VF 4 For instruction: store i64 %valB, ptr %out, align 8
; SSE42: LV: Found an estimated cost of 8 for VF 8 For instruction: store i64 %valB, ptr %out, align 8
; SSE42: LV: Found an estimated cost of 16 for VF 16 For instruction: store i64 %valB, ptr %out, align 8
; SSE42: Cost of 2 for VF 2: profitable to scalarize store i64 %valB, ptr %out, align 8
Contributor

This is still the cost from the legacy cost model - it comes from CM.InstsToScalarize in LoopVectorizationCostModel, which is what I'm trying to fix in #187056

Contributor Author

Yep, this is just to remove the reliance on the output of legacy computeCost

Contributor Author

@fhahn fhahn left a comment

The latest version adjusts the filters in tests matching interleave groups to fully match the members of the interleave groups as well. This is closer to the original check lines, which also matched all loads/stores in a group (with only the first member having the cost assigned and the other members having cost 0).

; AVX512: LV: Found an estimated cost of 20 for VF 16 For instruction: %v6 = load i64, ptr %in6, align 8
; AVX512: Cost of 12 for VF 2: INTERLEAVE-GROUP with factor 7 at %v0, ir<%in0>
; AVX512: Cost of 35 for VF 4: INTERLEAVE-GROUP with factor 7 at %v0, ir<%in0>
; AVX512: Cost of 70 for VF 8: INTERLEAVE-GROUP with factor 7 at %v0, ir<%in0>
Contributor Author

should be added back for all VFs and filter updated, thanks

Contributor

@david-arm david-arm left a comment

LGTM!

@fhahn fhahn merged commit e382a95 into llvm:main Apr 7, 2026
10 checks passed
@fhahn fhahn deleted the vplan-replace-legacy-cost-checks branch April 7, 2026 19:24
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 7, 2026
…. (#190038)

Move remaining tests checking legacy cost output to check the VPlan's
cost model output.

In some cases, checks become much more compact (checking a single
interleave group cost vs checking the individual members which all have
the group's cost). In some cases, auto-generation consistently checks
all relevant VFs.

PR: llvm/llvm-project#190038