harbor-adapter-creator
Creating Harbor Benchmark Adapters
Adapters convert external benchmarks (SimpleQA, GAIA, AiderPolyglot, CodePDE, spider2-dbt, etc.) into Harbor's standardized task directory format. Each adapter reads the source benchmark data and generates one task directory per benchmark instance.
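The read-and-generate loop can be sketched as follows. This is a minimal illustration, not Harbor's actual adapter API: the `generate_tasks` function, the `simpleqa-` task naming, and the instance keys `id` and `question` are all assumptions chosen for the example.

```python
from pathlib import Path


def generate_tasks(instances, out_dir):
    """Write one Harbor-style task directory per benchmark instance.

    `instances` is an iterable of dicts; the "id"/"question" keys are
    hypothetical -- a real adapter maps its source benchmark's schema here.
    """
    out = Path(out_dir)
    for inst in instances:
        # Task naming is illustrative; real adapters typically follow
        # the source benchmark's own ID scheme.
        task_dir = out / f"simpleqa-{inst['id']}"
        task_dir.mkdir(parents=True, exist_ok=True)
        # Agent-facing instructions for this instance.
        (task_dir / "instruction.md").write_text(inst["question"] + "\n")
        # Minimal config stub; a real adapter emits a full task.toml
        # (environment setup, verification, timeouts, etc.).
        (task_dir / "task.toml").write_text(
            f'[task]\nname = "simpleqa-{inst["id"]}"\n'
        )
```

A real adapter would also stage the shared files each task needs, such as the environment/Dockerfile and tests/test.sh, usually by copying templates alongside the per-instance data.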
When to Use an Adapter vs. Creating Tasks Directly
Use an adapter when:
- You have an existing benchmark dataset with many instances
- Tasks share the same structure but differ in data (questions, code, etc.)
- You want to track evaluation parity with the original benchmark
Create tasks directly when:
- You're authoring original evaluation challenges
- Each task has unique structure and environment
- There are fewer than ~10 tasks
Adapter Directory Structure
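One plausible layout, assuming an adapter script that emits generated tasks beside it (the `adapters/` and `adapter.py` names are illustrative; the per-task files match Harbor's standard task format):

```
adapters/
  simpleqa/
    adapter.py                  # reads source data, writes task dirs
    tasks/
      simpleqa-0001/
        task.toml               # task configuration
        instruction.md          # agent-facing instructions
        environment/Dockerfile  # task environment
        tests/test.sh           # verification script
      simpleqa-0002/
        ...
```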