Fallback SVE depending on vector lanes #9084

stevesuzuki-arm wants to merge 5 commits into halide:main
Conversation
While Halide accepts an arbitrary vectorization factor, compiling it for an SVE target with LLVM's scalable vector types has some challenges:

- Only vectors whose lane count is a multiple of vscale are representable.
- The backend compiler is prone to crashes for odd lane counts.

This problem happens more frequently when running the existing Halide unit tests on a target whose vector_bits is longer than 128 bits, because in some cases the vectorization factor is too short, or is an unusual value chosen for corner-case testing.

Lowering everything with predicates might be an option; however, that would require invasive changes and its feasibility is unknown. The other option is to convert to/from fixed-size vectors, but that causes the issue of mixing fixed and scalable vectors in an intrinsic, incurs a performance penalty, and is also infeasible on targets without NEON.

To work around this problem, we inspect the vector lanes used in a function, and if we find any lane count that is problematic for the given target.vector_bits, we stop using scalable vectors entirely (i.e. set effective_vscale = 0). The intrinsic map for Call is created for both NEON and SVE in init_module, and dynamically selected for each function to compile. When this fallback happens, the user is warned that SVE is disabled. Note that this situation can typically be resolved by vectorizing with a longer, power-of-two factor.
I'm hoping the vector width legalization pass in #8629 might make this unnecessary. Let's try to get that merged.
Actually, no, it won't help, because it doesn't pad vectors out to native length. At the dev meeting we decided that we should proceed with this PR to unblock other SVE work.
```cpp
// TODO: Target::SVE not supported https://github.com/halide/Halide/issues/8872
feasible_vscale = 0;
if (target.features_any_of({Target::SVE2})) {
    VectorLanesGatherer vector_lanes_gatherer;
```
Simpler alternative would be something like:

```cpp
std::set<int> lanes_used;
mutate_with(func.body, [&](auto *self, const Expr &e) {
    lanes_used.insert(e.type().lanes());
    self->mutate_base(e);
});
```
That should visit every Expr node, which should see all the expr types.
```cpp
feasible_vscale = check_feasible_vscale(target.vector_bits, vector_lanes_gatherer.lanes_used, simple_name);
std::set<int> lanes_used;
mutate_with(func.body, [&](auto *self, const Expr &e) {
    Type t = e.type();
```
I'm confused about some of this code. The Ramp and Broadcast nodes look like they don't do anything. Isn't t already that type? And why does lowering an intrinsic change the type? Why can't this function just look at t.lanes() (except for handles)?
What we ideally want to check here is which vector lanes we will potentially see during codegen. In Ramp and Broadcast, op->lanes can be different from op->type.lanes(). lower_intrinsic, called in CodeGen_LLVM, potentially generates a different type and lanes. The idea is borrowed from CodeGen_C. Does that make sense? I don't think this approach can perfectly catch the vector lanes in LLVM::Value, so it is a kind of mitigation. If there is a better approach, I'm happy to change it.
For Ramp and Broadcast, the lanes field is equal to the type's lanes field after the vector flattening pass has run, so they should be the same at this point. And even if it weren't, I think the thing that is going to get generated is a vector with the Ramp-type's lanes, not the Ramp-node's lanes. I don't understand why CodeGen_C does something special here. lower_intrinsic doesn't do any vector reinterprets (I think), so it should never change the number of lanes either. CodeGen_C just wants to look at it to see if it introduces new element types. Do you have any cases that fail if this just looks at t.lanes()?
I've done an experiment where I collect lanes in two different ways and assert that they match. As a result, it turned out that specializing Ramp, Broadcast, and Call does not make any difference, at least in the Halide unit tests. Also, removing the handle case does not cause any test failure. I've updated the PR with simplified code. Thank you for this feedback.
I don't have an objection to the PR, in that it looks to make things better without breaking anything else, but really this is another chunk of complexity to deal with a failed design at the LLVM level. Is there any chance of things improving on the LLVM side to allow reliable fixed-width targeting of SVE2? A major use case is WASM flexible vectors, which are generally JIT compiled, so the exact vector width is known at compile time. This should work without using vscale types at all. At this point the main value in this feature is Halide targeting the more restricted SVE2 subset that is actually showing up in hardware for neural network applications. My guess is we have to go around LLVM entirely to do that, via inline assembly blocks.
This is intended to be temporary. PR #8629 adds a pass that forces vectors to be a size the backend can accept by slicing up too-wide vectors. It uses this for source-based GPU APIs (e.g. Metal). It doesn't have all the features necessary to make this PR unnecessary, but it's close. The plan is to merge both to unblock people, then extend the legalization pass to handle padding sub-width vectors, then delete the check in this PR and legalize instead. That should mean the SVE2 backend can handle anything, because we'll do the legalization ourselves on the Halide side instead of relying on LLVM. So yes, it's more complexity on the Halide side, but ultimately vector width legalization is complexity we also need for some of the non-LLVM backends.
The "complexity" I'm worried about here is that the effective vscale is no longer constant during compilation. It shouldn't be a huge problem, as it is either the value it had before this PR or zero, but reasoning about the code definitely gets trickier. It might be better to introduce a type that encodes a vector type constraint (if one is needed), the effective vscale, and whether vectorization is suspended due to a length mismatch. This would be passed consistently, rather than the current selection of zero to three of those values as separate arguments in various places. But really I expect this all gets ripped out when vector length canonicalization happens.