exp-simd-vectorization
SIMD Vectorization
Decision Gate
- Check `Span<T>` and `MemoryExtensions` first. If the operation can be expressed using built-in `Span<T>` methods (e.g., `Contains`, `IndexOf`, `CopyTo`, `SequenceEqual`) or `MemoryExtensions`, use them: no additional dependency is needed and the runtime already vectorizes many of these internally.
- Check for `TensorPrimitives` next. If one or more `TensorPrimitives` methods cover the operation → use them. If the `.csproj` does NOT already reference `System.Numerics.Tensors`, add the package, for example: `<PackageReference Include="System.Numerics.Tensors" />` (or use the versioning approach already used by your solution). Then replace the scalar loop with TP calls and stop. See the full API table below. Compose multiple TP calls when needed (e.g., finding both min and max → `TensorPrimitives.Min(span)` + `TensorPrimitives.Max(span)` as two calls). Do NOT write manual `Vector128` code for operations TP already handles.
- Scalar loop over contiguous array/span of `byte`, `sbyte`, `short`, `ushort`, `int`, `uint`, `long`, `ulong`, `nint`, `nuint`, `float`, `double` (and `char` via reinterpretation as `ushort`)? → Implement with explicit `Vector128<T>` / `Vector256<T>` / `Vector512<T>` intrinsics using the patterns below.
- No contiguous numeric arrays to process (dictionary lookups, tree traversals, linked lists, state machines, string formatting, small collections, enum comparisons, recursive algorithms, decimal arithmetic)? → Report `[NO SIMD OPPORTUNITY]` and write a full paragraph explaining WHY, referencing the specific code characteristics that prevent vectorization (e.g., "State machines require sequential branching on enum values: there are no contiguous numeric arrays to process in parallel, and each transition depends on the previous state"). This explanation is graded.
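As a rough illustration of the first and third gates, the sketch below uses a built-in span method where one exists and falls back to `Vector128<int>` for a count-if operation, which is assumed here not to be covered by a single built-in call. The data and the `CountGreaterThan` helper are hypothetical names for this example only; it assumes .NET 7 or later and is not a tuned implementation.

```csharp
using System;
using System.Runtime.Intrinsics;

// Illustrative data; all names in this sketch are hypothetical.
int[] samples = { 4, 9, 1, 12, 7, 15, 3, 8, 20, 2 };

// Gate 1: a built-in span method already answers "does the data contain X?".
bool hasTwelve = samples.AsSpan().Contains(12);

// Gate 3: a contiguous int span with no matching built-in, so use Vector128<T>.
int above = CountGreaterThan(samples, 7);

Console.WriteLine($"{hasTwelve} {above}"); // prints "True 5"

static int CountGreaterThan(ReadOnlySpan<int> values, int threshold)
{
    int count = 0;
    int i = 0;

    if (Vector128.IsHardwareAccelerated && values.Length >= Vector128<int>.Count)
    {
        Vector128<int> limit = Vector128.Create(threshold);
        Vector128<int> acc = Vector128<int>.Zero;
        int lastBlock = values.Length - Vector128<int>.Count;

        for (; i <= lastBlock; i += Vector128<int>.Count)
        {
            Vector128<int> block = Vector128.Create(values.Slice(i, Vector128<int>.Count));
            // GreaterThan sets a lane to all bits (-1) where the test passes,
            // so subtracting the mask adds 1 to that lane's running count.
            acc -= Vector128.GreaterThan(block, limit);
        }

        count = Vector128.Sum(acc);
    }

    // Scalar tail for the elements that do not fill a whole vector.
    for (; i < values.Length; i++)
    {
        if (values[i] > threshold) count++;
    }

    return count;
}
```

The scalar tail loop also serves as the complete fallback when hardware acceleration is unavailable or the input is shorter than one vector.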
TensorPrimitives API Reference
`TensorPrimitives` APIs are generic and work for any primitive type that satisfies the method's generic constraints, not just `float`/`double`. For example, `Sum` requires `IAdditionOperators<T,T,T>` + `IAdditiveIdentity<T,T>` and works for all primitive numeric types, while `CosineSimilarity` requires `IRootFunctions<T>` and therefore only works for floating-point types such as `float` and `double`. If the project doesn't already reference `System.Numerics.Tensors`, add it to the `.csproj`. Replace the entire manual loop with one or more `TensorPrimitives` calls as needed (prefer a single call when possible):
Reductions (span → scalar)
| Operation | API |
|---|---|
| Sum | TensorPrimitives.Sum(span) |
| Sum of squares | TensorPrimitives.SumOfSquares(span) |
| Sum of magnitudes (L1 norm) | TensorPrimitives.SumOfMagnitudes(span) |
| L2 norm | TensorPrimitives.Norm(span) |
| Product of all elements | TensorPrimitives.Product(span) |
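A minimal sketch of the reductions in the table, assuming the project references `System.Numerics.Tensors`. The sample values are made up for illustration, and the final `int` call assumes a package version that ships the generic overloads described above.

```csharp
using System;
using System.Numerics.Tensors;

// Illustrative input; the values are chosen so every result is exact.
ReadOnlySpan<float> values = new float[] { 3f, -4f };

float sum        = TensorPrimitives.Sum(values);             // 3 + (-4)   = -1
float sumSquares = TensorPrimitives.SumOfSquares(values);    // 9 + 16     = 25
float l1         = TensorPrimitives.SumOfMagnitudes(values); // |3| + |-4| = 7
float l2         = TensorPrimitives.Norm(values);            // sqrt(25)   = 5
float product    = TensorPrimitives.Product(values);         // 3 * (-4)   = -12

Console.WriteLine($"{sum} {sumSquares} {l1} {l2} {product}");

// Assumption: generic overloads are available, so the same call works for int.
ReadOnlySpan<int> counts = new int[] { 1, 2, 3 };
int total = TensorPrimitives.Sum(counts);                    // 1 + 2 + 3  = 6
Console.WriteLine(total);
```

Each call replaces a whole scalar loop, so there is no manual tail handling or hardware check to write.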