cuda-attention-kernel-patterns
Pass
Audited by Gen Agent Trust Hub on May 15, 2026
Risk Level: SAFE
Full Analysis
- [Implementation Guidelines]: The skill documents internal CUDA kernel dispatch logic and eligibility criteria for Flash, Memory Efficient, and Unfused Attention paths within ONNX Runtime.
- [Numerical Safety]: It gives specific guidance for preventing floating-point overflow in CUTLASS softmax kernels by capping filter values, a standard robustness practice.
- [Developer Tooling]: Its examples use environment variables to control kernel selection during testing and debugging, and are intended for development environments.
- [Best Practices]: It advises using grid-stride loops and checking for errors after CUDA kernel launches, which supports memory safety and operational stability.
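
The numerical-safety item above can be illustrated with a small host-side sketch. It is not the skill's code or ONNX Runtime's. It assumes a softmax evaluated in base-2 form (inputs scaled by log2(e) before `exp2`), and the names `kHypotheticalFilterCap`, `cap_filter_value`, and `softmax_exp2` are hypothetical, as is the cap value itself. Under those assumptions, a filter value equal to the lowest finite float overflows to negative infinity after scaling, and a fully masked row then yields NaN; capping the filter value keeps the row finite.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical cap, chosen for illustration only (not taken from any library).
constexpr float kHypotheticalFilterCap = -1.0e30f;

// Clamp a (negative) filter value so it cannot go below the cap.
inline float cap_filter_value(float v) {
  return std::max(v, kHypotheticalFilterCap);
}

// Softmax over one row, written in base-2 form:
// x = (score + bias) * log2(e), then exp2(x - max(x)) normalized by the sum.
inline std::vector<float> softmax_exp2(const std::vector<float>& scores,
                                       const std::vector<float>& bias) {
  assert(!scores.empty() && scores.size() == bias.size());
  const float kLog2e = 1.4426950408889634f;

  std::vector<float> x(scores.size());
  float row_max = -std::numeric_limits<float>::infinity();
  for (std::size_t i = 0; i < scores.size(); ++i) {
    // Scaling by log2(e) > 1 pushes the lowest finite float past the
    // representable range, producing -inf.
    x[i] = (scores[i] + bias[i]) * kLog2e;
    row_max = std::max(row_max, x[i]);
  }

  float sum = 0.0f;
  for (float& v : x) {
    // If every entry is -inf, this computes (-inf) - (-inf), which is NaN.
    v = std::exp2(v - row_max);
    sum += v;
  }
  for (float& v : x) {
    v /= sum;
  }
  return x;
}
```

Calling `softmax_exp2` on a row where every bias is `std::numeric_limits<float>::lowest()` produces NaN outputs, while passing the same bias through `cap_filter_value` first produces a finite, normalized row. A real kernel would apply the cap once on the host or at kernel entry rather than per element.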
Audit Metadata