ascend-schedule-analysis
Ascend Schedule Analysis
Analyze NPU scheduling issues starting from evidence, not from tuning suggestions. First establish whether the workload is actually Host Bound, then separate model-side dispatch pressure from CPU runtime scheduling interference and from environment or configuration issues.
Core Workflow
- Validate the profiling format and available artifacts.
- Establish whether Host Bound exists.
- Split Free time into preparation, in-step gaps, and post-dispatch queueing behavior.
- Identify the dominant cause category.
- Map each finding to the smallest practical optimization experiment.
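As a rough illustration of the Free-time split in the workflow above, the sketch below measures where device idle time falls inside one step. It assumes you have already extracted the step boundaries and the device task intervals as plain numbers in a common time unit; the function name `idle_gaps` and its inputs are hypothetical and are not part of any profiler API. Mapping these gaps onto preparation, in-step gaps, and post-dispatch queueing still requires host-side dispatch timestamps, which this sketch does not model.

```python
def idle_gaps(step_start, step_end, device_intervals):
    """Return (leading, inner_gaps, trailing) idle durations inside one step.

    device_intervals is an iterable of (start, end) pairs for device tasks.
    All values are assumed to share one time unit (hypothetical input format).
    """
    # Keep only intervals that overlap the step window, clipped to its bounds.
    clipped = sorted(
        (max(start, step_start), min(end, step_end))
        for start, end in device_intervals
        if end > step_start and start < step_end
    )
    if not clipped:
        # The device did nothing during the step: the whole window is idle.
        return step_end - step_start, [], 0

    leading = clipped[0][0] - step_start
    inner_gaps = []
    busy_until = clipped[0][1]
    for start, end in clipped[1:]:
        if start > busy_until:
            # The device was idle between the previous task and this one.
            inner_gaps.append(start - busy_until)
        # Overlapping tasks extend the busy region instead of creating a gap.
        busy_until = max(busy_until, end)
    trailing = step_end - busy_until
    return leading, inner_gaps, trailing


leading, inner, trailing = idle_gaps(0, 100, [(10, 30), (25, 40), (55, 70)])
print(leading, inner, trailing)
```

A large leading gap and many small inner gaps point to different causes, so it helps to keep the three values separate rather than summing them into a single Free-time number.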
Prefer DB-format profiling data, that is, ascend_pytorch_profiler_{rank_id}.db files. For DB data, first try to create or use the dispatch view through msprof_mcp-create_dispatch_view. If the view is unavailable, fall back to direct DB queries, and query only the fields required by the current question.
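A minimal sketch of the direct-query fallback, assuming the .db file is a SQLite database that Python's standard `sqlite3` module can open. The helper names `list_objects` and `select_fields` are hypothetical, and no table or column names are assumed: the first helper discovers what the file contains, and the second reads only the columns you ask for.

```python
import sqlite3


def list_objects(db_path):
    """Return (name, type) for every table and view in the database."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT name, type FROM sqlite_master "
            "WHERE type IN ('table', 'view') ORDER BY name"
        ).fetchall()
    finally:
        conn.close()


def select_fields(db_path, table, fields, limit=5):
    """Read only the requested columns from one table or view."""
    conn = sqlite3.connect(db_path)
    try:
        # PRAGMA table_info yields one row per column; index 1 is the name.
        known = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
        missing = [f for f in fields if f not in known]
        if missing:
            raise KeyError(f"columns not found in {table}: {missing}")
        columns = ", ".join(f'"{f}"' for f in fields)
        return conn.execute(
            f'SELECT {columns} FROM "{table}" LIMIT ?', (limit,)
        ).fetchall()
    finally:
        conn.close()
```

Running `list_objects` first shows whether a dispatch view already exists, and validating column names before selecting keeps the fallback limited to fields that are actually present.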
For cluster profiling data, first order the candidate ranks by device Free time. Analyze the rank with the longest Free time first, because it is the most likely Host Bound suspect. When the evidence is ambiguous, or when a fast/slow-rank explanation is needed, compare dispatch APIs between a long-Free rank and a short-Free rank over the same step range.
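The rank selection described above can be sketched as follows, assuming the per-rank device Free time has already been collected into a dictionary. The function name `pick_ranks` and the sample numbers are hypothetical and only illustrate the ordering.

```python
def pick_ranks(free_time_by_rank):
    """Order ranks by device Free time, longest first.

    Returns (ordered, primary, baseline): the full ordering, the rank to
    analyze first, and a short-Free rank to use as a comparison baseline.
    """
    if not free_time_by_rank:
        raise ValueError("no per-rank Free time available")
    ordered = sorted(
        free_time_by_rank.items(), key=lambda item: item[1], reverse=True
    )
    primary = ordered[0][0]    # longest Free time: most likely Host Bound suspect
    baseline = ordered[-1][0]  # shortest Free time: fast-rank comparison point
    return ordered, primary, baseline


# Hypothetical Free time per rank, all in the same unit and step range.
ordered, primary, baseline = pick_ranks({0: 120.0, 1: 340.5, 2: 95.2})
print(primary, baseline)
```

The comparison is only meaningful if the values for every rank cover the same step range.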
Evidence Standard
Treat a schedule issue as Host Bound only when the evidence shows device starvation: