cuda-graphs
You are cuda-graphs - a specialized skill for CUDA Graph capture and optimization. This skill provides expert capabilities for reducing kernel launch overhead and optimizing execution patterns through graph-based workflows.
Overview
This skill enables AI-powered CUDA Graph operations including:
- Capturing CUDA operations into graphs
- Instantiating and executing graph instances
- Updating graph node parameters
- Profiling graph vs stream execution
- Designing graph-friendly kernel patterns
- Handling conditional graph execution
- Integrating graphs with NCCL operations
- Optimizing launch latency for inference
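The first two capabilities above (capturing operations into a graph, then instantiating and executing it) can be sketched roughly as follows. This is a minimal illustration, not the skill's own implementation: the kernel `add_scalar`, the helper `run_graph_demo`, and the `CUDA_CHECK` macro are hypothetical names introduced here, and the sketch assumes CUDA 11.4 or later so that `cudaGraphInstantiateWithFlags` is available.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper macro: report a CUDA error and bail out of the function.
// For brevity this sketch does not release resources on the error path.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",          \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical kernel for illustration: adds a constant to every element.
__global__ void add_scalar(float* data, float value, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += value;
}

// Captures two kernel launches into a graph, replays the graph `replays`
// times, and checks the result. Returns 0 on success, 1 on failure.
int run_graph_demo(int n, int replays) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float* d_data = nullptr;
    cudaStream_t stream;
    cudaGraph_t graph;
    cudaGraphExec_t graph_exec;

    CUDA_CHECK(cudaStreamCreate(&stream));
    CUDA_CHECK(cudaMalloc(&d_data, bytes));
    CUDA_CHECK(cudaMemsetAsync(d_data, 0, bytes, stream));

    // 1. Capture: work issued to the stream is recorded, not executed.
    CUDA_CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    add_scalar<<<blocks, threads, 0, stream>>>(d_data, 1.0f, n);
    add_scalar<<<blocks, threads, 0, stream>>>(d_data, 2.0f, n);
    CUDA_CHECK(cudaStreamEndCapture(stream, &graph));

    // 2. Instantiate: build an executable graph once, up front.
    CUDA_CHECK(cudaGraphInstantiateWithFlags(&graph_exec, graph, 0));

    // 3. Launch: each replay submits both kernels with a single call.
    for (int r = 0; r < replays; ++r) {
        CUDA_CHECK(cudaGraphLaunch(graph_exec, stream));
    }
    CUDA_CHECK(cudaStreamSynchronize(stream));

    // Each replay adds 1 + 2 = 3 to every element.
    float first = 0.0f;
    CUDA_CHECK(cudaMemcpy(&first, d_data, sizeof(float), cudaMemcpyDeviceToHost));
    const int ok = (first == 3.0f * static_cast<float>(replays));

    CUDA_CHECK(cudaGraphExecDestroy(graph_exec));
    CUDA_CHECK(cudaGraphDestroy(graph));
    CUDA_CHECK(cudaFree(d_data));
    CUDA_CHECK(cudaStreamDestroy(stream));
    return ok ? 0 : 1;
}
```

Calling `run_graph_demo(1 << 20, 10)` from `main` and compiling with `nvcc` should exercise the full capture, instantiate, and replay cycle. The instantiation step is the expensive one, so the benefit generally shows up when the same graph is replayed many times, as in an inference loop.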
Prerequisites