perf-nsight-compute-analysis


Nsight Compute Analysis

NVIDIA Nsight Compute (ncu) profiles individual CUDA kernels to determine why they are slow and what to optimize. It measures GPU throughput as a percentage of theoretical peak (Speed of Light / SOL%), enabling systematic bottleneck classification and targeted optimization.
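As a rough illustration of SOL%-based bottleneck classification, the sketch below labels a kernel from its compute and memory throughput percentages. The `KernelSol` type, the `classify` helper, and the 60% cutoff are assumptions made for this example; they are not part of ncu or of this skill, and the percentages would have to be read from a profiling report by hand or by a separate script.

```python
from dataclasses import dataclass

# Hypothetical cutoff: a unit running at or above this share of its
# theoretical peak is treated as the likely limiter. Tune it per workload.
HIGH_SOL_PCT = 60.0


@dataclass(frozen=True)
class KernelSol:
    """SOL% figures for one kernel (illustrative container, not an ncu type)."""
    name: str
    compute_sol_pct: float  # compute throughput as % of peak
    memory_sol_pct: float   # memory throughput as % of peak


def classify(kernel: KernelSol, high: float = HIGH_SOL_PCT) -> str:
    """Return 'compute-bound', 'memory-bound', or 'latency-bound'."""
    # Neither unit is near its peak: the kernel is probably waiting on
    # something (stalls, low occupancy), so call it latency-bound.
    if max(kernel.compute_sol_pct, kernel.memory_sol_pct) < high:
        return "latency-bound"
    # Otherwise the busier unit is the more likely bottleneck.
    if kernel.compute_sol_pct >= kernel.memory_sol_pct:
        return "compute-bound"
    return "memory-bound"


if __name__ == "__main__":
    # Made-up sample values, used only to exercise the function.
    samples = [
        KernelSol("gemm_like", 85.0, 40.0),
        KernelSol("copy_like", 20.0, 90.0),
        KernelSol("tiny_launch", 15.0, 10.0),
    ]
    for k in samples:
        print(f"{k.name}: {classify(k)}")
```

A single threshold keeps the sketch short; a real workflow would likely also look at warp stall reasons and occupancy before settling on a label.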

When to Use

Reach for this skill when you encounter:

  • Triggers: The user wants to profile a CUDA kernel, analyze ncu output, interpret .ncu-rep reports, or optimize GPU kernel performance
  • Symptoms: A kernel runs slower than expected, GPU utilization is low, a kernel needs to be classified as compute-bound or memory-bound, or occupancy is limiting performance
  • Keywords: "ncu", "nsight compute", "SOL%", "speed of light", "kernel profiling", "compute-bound", "memory-bound", "latency-bound", "occupancy", "roofline", "warp stalls", "cache hit rate", "ncu-rep"
First Seen
May 8, 2026
perf-nsight-compute-analysis — nvidia/tensorrt-llm