cuda-attention-kernel-patterns

Pass

Audited by Gen Agent Trust Hub on May 15, 2026

Risk Level: SAFE
Full Analysis
  • [Implementation Guidelines]: The skill documents internal CUDA kernel dispatch logic and eligibility criteria for Flash, Memory Efficient, and Unfused Attention paths within ONNX Runtime.
  • [Numerical Safety]: It provides specific guidance for preventing floating-point overflows in CUTLASS softmax kernels by capping filter values, which is a standard robustness practice.
  • [Developer Tooling]: It includes examples of using environment variables to control kernel selection during testing and debugging, intended for development environments.
  • [Best Practices]: The content includes technical advice on using grid-stride loops and proper error handling for CUDA kernel launches to ensure memory safety and operational stability.
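The numerical-safety point above can be illustrated with a small CPU-side sketch. It is not the CUTLASS kernel or ONNX Runtime code. It assumes that "filter values" means additive mask values, and that the overflow arises when a masked score is multiplied by log2(e) before an exp2-based softmax; the audit confirms neither detail. The names `kAssumedMaskCap`, `CapMaskValue`, and `SoftmaxExp2` are hypothetical, and the cap of -1e30 is chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical cap, picked for illustration; the audit does not state a value.
constexpr float kAssumedMaskCap = -1.0e30f;
constexpr float kLog2e = 1.4426950408889634f;

// Clamp an additive mask value so that later scaling keeps it finite.
inline float CapMaskValue(float mask_value) {
  return std::max(mask_value, kAssumedMaskCap);
}

// Softmax over one row in exp2 form: scale (score + mask) by log2(e),
// subtract the row maximum, then apply exp2 and normalize.
std::vector<float> SoftmaxExp2(const std::vector<float>& scores,
                               const std::vector<float>& mask) {
  assert(!scores.empty() && scores.size() == mask.size());
  std::vector<float> out(scores.size());
  float row_max = -std::numeric_limits<float>::infinity();
  for (std::size_t i = 0; i < scores.size(); ++i) {
    // If mask[i] is near the lowest float, this product overflows to -inf.
    out[i] = (scores[i] + mask[i]) * kLog2e;
    row_max = std::max(row_max, out[i]);
  }
  float sum = 0.0f;
  for (float& v : out) {
    // For a fully masked row with -inf entries, (-inf) - (-inf) is NaN.
    v = std::exp2(v - row_max);
    sum += v;
  }
  for (float& v : out) {
    v /= sum;
  }
  return out;
}
```

With an uncapped mask of `std::numeric_limits<float>::lowest()` on every position, the scaled scores become negative infinity and the row evaluates to NaN. Passing the same mask through `CapMaskValue` first keeps every intermediate finite, and a fully masked row comes out as a uniform distribution.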
Audit Metadata
  • Risk Level: SAFE
  • Analyzed: May 15, 2026, 01:51 PM