The Agent Skills Directory

[Implementation Guidelines]: The skill documents internal CUDA kernel dispatch logic and eligibility criteria for Flash, Memory Efficient, and Unfused Attention paths within ONNX Runtime.
[Numerical Safety]: It provides specific guidance for preventing floating-point overflows in CUTLASS softmax kernels by capping filter values, which is a standard robustness practice.
[Developer Tooling]: It includes examples of using environment variables to control kernel selection during testing and debugging, intended for development environments.
[Best Practices]: The content includes technical advice on using grid-stride loops and proper error handling for CUDA kernel launches to ensure memory safety and operational stability.
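
For orientation, the grid-stride loop and launch error handling named under [Best Practices] can be sketched as below. This is a minimal illustration under assumptions, not code taken from the skill or from ONNX Runtime: the kernel `scale_kernel` and the macro `CUDA_CHECK` are hypothetical names, and the launch configuration is arbitrary.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (not from the skill): abort with a message on any
// CUDA runtime error.
#define CUDA_CHECK(expr)                                            \
  do {                                                              \
    cudaError_t err_ = (expr);                                      \
    if (err_ != cudaSuccess) {                                      \
      std::fprintf(stderr, "CUDA error: %s at %s:%d\n",             \
                   cudaGetErrorString(err_), __FILE__, __LINE__);   \
      std::exit(EXIT_FAILURE);                                      \
    }                                                               \
  } while (0)

// Grid-stride loop: each thread processes elements i, i + stride, ...
// so the kernel covers all n elements for any grid size, and the
// bounds check i < n prevents out-of-range accesses.
__global__ void scale_kernel(float* data, size_t n, float factor) {
  size_t stride = static_cast<size_t>(blockDim.x) * gridDim.x;
  for (size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
       i < n; i += stride) {
    data[i] *= factor;
  }
}

int main() {
  const size_t n = 1 << 20;
  std::vector<float> host(n, 1.0f);

  float* dev = nullptr;
  CUDA_CHECK(cudaMalloc(&dev, n * sizeof(float)));
  CUDA_CHECK(cudaMemcpy(dev, host.data(), n * sizeof(float),
                        cudaMemcpyHostToDevice));

  // Deliberately launch fewer threads than elements; the grid-stride
  // loop picks up the remainder.
  const int threads = 256;
  const int blocks = 64;
  scale_kernel<<<blocks, threads>>>(dev, n, 2.0f);
  CUDA_CHECK(cudaGetLastError());       // reports invalid launch configurations
  CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised during execution

  CUDA_CHECK(cudaMemcpy(host.data(), dev, n * sizeof(float),
                        cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(dev));

  std::printf("host[0] = %.1f, host[n-1] = %.1f\n", host[0], host[n - 1]);
  return 0;
}
```

Checking `cudaGetLastError()` immediately after the launch and then `cudaDeviceSynchronize()` separates launch-time failures from faults that occur while the kernel runs. In production code the synchronize call is often confined to debug builds, because it stalls the host.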

cuda-attention-kernel-patterns