# harbor-task-creator

Creating Harbor Evaluation Tasks
Harbor tasks are self-contained evaluation challenges for AI agents. Each task provides an instruction, a sandboxed Docker environment, automated tests, and optionally a reference solution. Understanding how Harbor wires these pieces together at runtime is key to writing tasks that work correctly.
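At a glance, a scaffolded task bundles those pieces into a single directory. The sketch below is illustrative only: `task.toml` and `solution/solve.sh` appear in the scaffold flags further down, but the other file and directory names are assumptions about what `harbor tasks init` generates, not a guaranteed layout:

```text
my-task/
├── task.toml          # instruction text and [metadata] table
├── environment/       # assumed name: Docker build context for the sandbox
│   └── Dockerfile
├── tests/             # assumed name: automated checks (pytest by default)
│   └── test_outputs.py
└── solution/
    └── solve.sh       # optional reference solution (skipped with --no-solution)
```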
## Quick Start
Scaffold a new task with the CLI:
```bash
harbor tasks init my-task

# Flags:
#   --tasks-dir (-p)            Parent directory (default: current dir)
#   --no-pytest                 Use plain bash tests instead of the pytest template
#   --no-solution               Skip generating solution/solve.sh
#   --include-canary-strings    Embed tracking GUIDs in generated files
#   --metadata-template <path>  Seed task.toml [metadata] from a .toml template
```
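For example, combining the flags above to scaffold a bash-test variant under a shared tasks directory (the paths here are illustrative):

```bash
# Scaffold with plain bash tests and no reference solution,
# placing the task under ./tasks instead of the current directory
harbor tasks init my-task --tasks-dir ./tasks --no-pytest --no-solution
```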
To convert Terminal Bench tasks to Harbor format, use `harbor tasks migrate -i <tb-tasks/> -o <harbor-tasks/>`.
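A minimal migration run might look like this (the directory names are placeholders, not required paths):

```bash
# Convert every Terminal Bench task under ./tb-tasks into Harbor format
harbor tasks migrate -i ./tb-tasks -o ./harbor-tasks
```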