hccl-test
HCCL Performance Test
HCCL性能测试工具用于测试HCCL(Huawei Collective Communication Library)集合通信的功能正确性以及性能。
Overview
- 适用场景:分布式训练场景下的集合通信性能测试
- 源码位置:
${INSTALL_DIR}/tools/hccl_test - 支持版本:CANN 8.3.RC1, CANN 8.5, CANN 25.RC
支持的产品型号
| 产品系列 | 最大 Rank 数 | 备注 |
|---|---|---|
| Atlas 训练系列产品 | 4096 | - |
| Atlas A2 训练系列产品 | 32K | - |
| Atlas A3 训练系列产品/Atlas A3 推理系列产品 | 32K | AlltoAll/AlltoAllV 最大 8K |
| Atlas 300I Duo 推理卡 | - | - |
More from ascend-ai-coding/awesome-ascend-skills
npu-smi
Huawei Ascend NPU npu-smi command reference. Use for device queries (health, temperature, power, memory, processes, ECC), configuration (thresholds, modes, fan), firmware upgrades (MCU, bootloader, VRD), virtualization (vNPU), and certificate management.
67atc-model-converter
Complete toolkit for Huawei Ascend NPU model conversion and end-to-end inference adaptation. Workflow 1 auto-discovers input shapes and parameters from user source code. Workflow 2 exports PyTorch models to ONNX. Workflow 3 converts ONNX to .om via ATC with multi-CANN version support. Workflow 4 adapts the user's full inference pipeline (preprocessing + model + postprocessing) to run end-to-end on NPU. Workflow 5 verifies precision between ONNX and OM outputs. Workflow 6 generates a reproducible README. Supports any standard PyTorch/ONNX model. Use when converting, testing, or deploying models on Ascend AI processors.
55ascendc
AscendC transformer/GMM/MoE 算子与 Matmul/Cube Kernel 的统一开发规范。用于在 ops-transformer 下新增或修改 op_host、tiling/infershape、op_kernel(含 MatmulImpl/Cube 调用)、以及对应的 CANN aclnn 示例和单测。
51ascend-docker
Create Docker containers for Huawei Ascend NPU development with proper device mappings and volume mounts. Use when setting up Ascend development environments in Docker, running CANN applications in containers, or creating isolated NPU development workspaces. Supports privileged mode (default), basic mode, and full mode with profiling/logging. Auto-detects available NPU devices.
51msmodelslim
Huawei Ascend NPU model compression tool (msModelSlim). Use for LLM quantization (W4A8, W8A8, W8A8S, W8A16), MoE model compression, multimodal model compression (Qwen-VL, InternVL, HunyuanVideo, FLUX, SD3), calibration data preparation, precision auto-tuning, sensitive layer analysis, custom model integration, and deployment in MindIE/vLLM-Ascend. Supports Qwen, LLaMA, DeepSeek, GLM, Kimi, InternLM and more.
44vllm-ascend
vLLM Ascend plugin for LLM inference serving on Huawei Ascend NPU. Use for offline batch inference, API server deployment, quantization inference (with msmodelslim quantized models), tensor/pipeline parallelism for distributed serving, and OpenAI-compatible API endpoints. Supports Qwen, DeepSeek, GLM, LLaMA models with Ascend-optimized kernels.
41