AISBench Benchmark Tool
AISBench Benchmark is a model evaluation tool built on OpenCompass. It supports both accuracy and performance evaluation of AI models on Ascend NPUs.
Overview
- Accuracy Evaluation: Verifies the accuracy of service-deployed and local models on QA and reasoning benchmark datasets, covering text, multimodal, and other scenarios.
- Performance Evaluation: Measures latency and throughput of service-deployed models, including peak performance under stress tests, steady-state performance, and simulation of real business traffic.
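To make the latency and throughput terms above concrete, here is a minimal Python sketch of how such metrics can be derived from per-request timings. It is not AISBench's implementation: the `RequestRecord` type, the `summarize` helper, and the choice of a nearest-rank percentile are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestRecord:
    """Hypothetical record of one completed request (not an AISBench type)."""
    start_s: float      # time the request was sent, in seconds
    end_s: float        # time the last token was received, in seconds
    output_tokens: int  # number of tokens generated for this request


def percentile(values, pct):
    """Nearest-rank percentile: smallest value covering pct percent of samples."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records):
    """Compute latency and throughput figures over a batch of requests."""
    latencies = [r.end_s - r.start_s for r in records]
    # Wall-clock span from the first request sent to the last one finished.
    wall_s = max(r.end_s for r in records) - min(r.start_s for r in records)
    return {
        "mean_latency_s": mean(latencies),
        "p99_latency_s": percentile(latencies, 99),
        "requests_per_s": len(records) / wall_s,
        "output_tokens_per_s": sum(r.output_tokens for r in records) / wall_s,
    }


if __name__ == "__main__":
    demo = [
        RequestRecord(0.0, 1.0, 10),
        RequestRecord(0.5, 2.0, 20),
        RequestRecord(1.0, 4.0, 30),
    ]
    print(summarize(demo))
```

Throughput is computed over the wall-clock span rather than the sum of latencies, so overlapping (concurrent) requests are counted correctly.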
Supported Scenarios
| Scenario | Description |
|---|---|
| Accuracy Evaluation | Model accuracy on text/multimodal datasets |
| Performance Evaluation | Latency, throughput, stress testing |
| Steady-State Performance | Measure the system's best sustained performance under stable load |
| Real Traffic Simulation | Simulate real business traffic patterns |
| Multi-turn Dialogue | Evaluate multi-turn conversation models |
| Function Call (BFCL) | Function calling capability evaluation |