perf-nsight-compute-analysis
Nsight Compute Analysis
NVIDIA Nsight Compute (ncu) profiles individual CUDA kernels to determine
why they are slow and what to optimize. It measures GPU throughput as a
percentage of theoretical peak (Speed of Light / SOL%), enabling systematic
bottleneck classification and targeted optimization.
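As a rough illustration of how SOL% readings can drive bottleneck classification, the sketch below computes a percentage of peak and maps a compute/memory pair to a label. The function names and the 60% cutoff are assumptions made for this example; they are not values defined by Nsight Compute, so tune the threshold to your own workloads.

```python
def sol_percent(achieved: float, peak: float) -> float:
    """Return achieved throughput as a percentage of theoretical peak."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return 100.0 * achieved / peak


def classify_bottleneck(compute_sol: float, memory_sol: float,
                        high: float = 60.0) -> str:
    """Classify a kernel from its compute and memory SOL% values.

    `high` is a hypothetical cutoff chosen for illustration only.
    """
    for value in (compute_sol, memory_sol):
        if not 0.0 <= value <= 100.0:
            raise ValueError("SOL% values must lie in [0, 100]")
    # Neither unit is near its peak: stalls or latency likely dominate.
    if compute_sol < high and memory_sol < high:
        return "latency-bound"
    # Otherwise the unit closer to its peak is the limiting one.
    if compute_sol >= memory_sol:
        return "compute-bound"
    return "memory-bound"


if __name__ == "__main__":
    # Hypothetical readings, not measurements from a real kernel.
    print(sol_percent(800.0, 1600.0))
    print(classify_bottleneck(compute_sol=25.0, memory_sol=90.0))
```

The label only says where to look next: a memory-bound result points toward access patterns and cache behavior, while a latency-bound result points toward occupancy and warp stalls.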
When to Use
Reach for this skill when you encounter:
- Triggers: User wants to profile a CUDA kernel, analyze ncu output, interpret .ncu-rep reports, or optimize GPU kernel performance
- Symptoms: Kernel running slower than expected, low GPU utilization, need to classify compute-bound vs. memory-bound, occupancy issues
- Keywords: "ncu", "nsight compute", "SOL%", "speed of light", "kernel profiling", "compute-bound", "memory-bound", "latency-bound", "occupancy", "roofline", "warp stalls", "cache hit rate", "ncu-rep"