Accuracy + Performance Test

Start a vLLM server (vllm serve) with the target model, then run accuracy benchmarks (only when FlagEval is available) and performance benchmarks (vllm bench serve) across the five benchmark profiles.
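The flow above can be sketched in Python roughly as follows. This is a minimal illustration, not the contents of scripts/run_benchmark.py: the helper names and the profile fields (input_len, output_len, num_prompts) are hypothetical, and the real profile definitions live in references/benchmark-profiles.md. The accuracy step is left out because the FlagEval interface is not described here.

```python
import subprocess
import time
import urllib.error
import urllib.request


def build_serve_cmd(model: str, port: int = 8000) -> list[str]:
    """Command line that launches the OpenAI-compatible vLLM server."""
    return ["vllm", "serve", model, "--port", str(port)]


def build_bench_cmd(model: str, profile: dict, port: int = 8000) -> list[str]:
    """Command line for one `vllm bench serve` run on a random dataset.

    The profile keys used here are assumptions made for this sketch.
    """
    return [
        "vllm", "bench", "serve",
        "--model", model,
        "--base-url", f"http://127.0.0.1:{port}",
        "--dataset-name", "random",
        "--random-input-len", str(profile["input_len"]),
        "--random-output-len", str(profile["output_len"]),
        "--num-prompts", str(profile["num_prompts"]),
    ]


def wait_until_healthy(port: int, timeout_s: float = 600.0) -> bool:
    """Poll the server's /health endpoint until it returns 200 or time runs out."""
    url = f"http://127.0.0.1:{port}/health"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not accepting connections yet
        time.sleep(2)
    return False


def run_flow(model: str, profiles: dict[str, dict], port: int = 8000) -> dict[str, int]:
    """Start the server, run every profile, and always shut the server down.

    Returns the exit code of each benchmark run, keyed by profile name.
    """
    server = subprocess.Popen(build_serve_cmd(model, port))
    try:
        if not wait_until_healthy(port):
            raise RuntimeError("vLLM server did not become healthy in time")
        return {
            name: subprocess.run(build_bench_cmd(model, prof, port)).returncode
            for name, prof in profiles.items()
        }
    finally:
        server.terminate()
        server.wait()
```

Keeping command construction separate from process handling means the command lines can be inspected or logged without a GPU or a running server; only run_flow actually launches anything.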

Skill Components

perf-test/
├── SKILL.md                            # This file — execution flow
├── scripts/
│   ├── run_benchmark.py                # Run single benchmark profile (JSON output)
│   └── run_all_benchmarks.py           # Run all 5 profiles, collect + summarize (JSON)
└── references/
    └── benchmark-profiles.md           # Profile definitions, metrics, vllm bench usage
