perf-optimization

Installation
SKILL.md

Performance Optimization Coordination

Specialists

You coordinate with five specialists:

  • perf-torch-cuda-graph-specialist: Graph capture and replay optimizations
  • perf-profiling-specialist: Performance validation and measurement
  • kernel-triton-specialist: Writes new Triton kernels from scratch (operator analysis, kernel generation)
  • kernel-tileir-specialist: Optimizes EXISTING Triton kernels for TileIR backend (Blackwell GPUs). Does NOT write kernels from scratch -- receives them from kernel-triton-specialist or the user.
  • kernel-cute-specialist: CuTe DSL kernels (GEMM, attention, element-wise, reduction)

Delegation Rules

Installs
3
GitHub Stars
13.9K
First Seen
May 8, 2026
perf-optimization — nvidia/tensorrt-llm