Skip to content

Conversation

@sdesmalen-arm
Copy link
Collaborator

i16 -> i32 dot-product operations are natively supported with SVE2p1 and SME2. This updates the cost-model so that those operations are recognized as cheap by the LoopVectorizer.

i16 -> i32 dot-product operations are natively supported with
SVE2p1 and SME2. This updates the cost-model so that those operations
are recognized as cheap by the LoopVectorizer.
@llvmbot
Copy link
Member

llvmbot commented Dec 31, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

Changes

i16 -> i32 dot-product operations are natively supported with SVE2p1 and SME2. This updates the cost-model so that those operations are recognized as cheap by the LoopVectorizer.


Full diff: https://github.com/llvm/llvm-project/pull/174102.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+5)
  • (added) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-add-sdot-i16-i32.ll (+37)
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index fdf973d0cf1b7..ad2d50a28d284 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -5915,6 +5915,11 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
     if (AccumLT.second.getScalarType() == MVT::i64 &&
         InputLT.second.getScalarType() == MVT::i16)
       return Cost;
+    // i16 -> i32 is natively supported with SVE2p1
+    if (AccumLT.second.getScalarType() == MVT::i32 &&
+        InputLT.second.getScalarType() == MVT::i16 &&
+        (ST->hasSVE2p1() || ST->hasSME2()))
+      return Cost;
     // i8 -> i64 is supported with an extra level of extends
     if (AccumLT.second.getScalarType() == MVT::i64 &&
         InputLT.second.getScalarType() == MVT::i8)
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-add-sdot-i16-i32.ll b/llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-add-sdot-i16-i32.ll
new file mode 100644
index 0000000000000..d090f61a47821
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-add-sdot-i16-i32.ll
@@ -0,0 +1,37 @@
+; REQUIRES: asserts
+; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=8 \
+; RUN:     -enable-epilogue-vectorization=false -debug-only=loop-vectorize         \
+; RUN:     -mattr=+sve2p1 -scalable-vectorization=off                              \
+; RUN:     -disable-output < %s 2>&1 | FileCheck %s --check-prefix=CHECK-FIXED
+; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=8 \
+; RUN:     -enable-epilogue-vectorization=false -debug-only=loop-vectorize         \
+; RUN:     -mattr=+sve2p1 -scalable-vectorization=on                               \
+; RUN:     -disable-output < %s 2>&1 | FileCheck %s --check-prefix=CHECK-SCALABLE
+; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=8 \
+; RUN:     -enable-epilogue-vectorization=false -debug-only=loop-vectorize         \
+; RUN:     -mattr=+sve2,+sme2 -scalable-vectorization=on                           \
+; RUN:     -disable-output < %s 2>&1 | FileCheck %s --check-prefix=CHECK-SCALABLE
+
+; CHECK-FIXED: Cost of 1 for VF 8: EXPRESSION vp<%8> = ir<%acc> + partial.reduce.add (ir<%load> sext to i32)
+; CHECK-SCALABLE: Cost of 1 for VF vscale x 8: EXPRESSION vp<%8> = ir<%acc> + partial.reduce.add (ir<%load> sext to i32)
+
+target triple = "aarch64"
+
+define i32 @sext_reduction_i16_to_i32(ptr %arr, i32 %n) vscale_range(1,16) {
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
+  %acc = phi i32 [ 0, %entry ], [ %add, %loop ]
+  %gep = getelementptr inbounds i16, ptr %arr, i32 %iv
+  %load = load i16, ptr %gep
+  %sext = sext i16 %load to i32
+  %add = add i32 %acc, %sext
+  %iv.next = add i32 %iv, 1
+  %cmp = icmp ult i32 %iv.next, %n
+  br i1 %cmp, label %loop, label %exit
+
+exit:
+  ret i32 %add
+}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants