# harbor-task-creator

Creating Harbor Evaluation Tasks
Harbor tasks are self-contained evaluation challenges for AI agents. Each task provides an instruction, a sandboxed Docker environment, automated tests, and optionally a reference solution. Understanding how Harbor wires these pieces together at runtime is key to writing tasks that work correctly.
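At a glance, a scaffolded task bundles those pieces into a single directory. The sketch below is illustrative only: `task.toml` and `solution/solve.sh` appear in the scaffold flags further down, but the other file and directory names are assumptions about what `harbor tasks init` generates, not a guaranteed layout:

```text
my-task/
├── task.toml          # instruction text and [metadata] table
├── environment/       # assumed name: Docker build context for the sandbox
│   └── Dockerfile
├── tests/             # assumed name: automated checks (pytest by default)
│   └── test_outputs.py
└── solution/
    └── solve.sh       # optional reference solution (skipped with --no-solution)
```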
## Quick Start
Scaffold a new task with the CLI:
```bash
harbor tasks init my-task

# Flags:
#   --tasks-dir (-p)            Parent directory (default: current dir)
#   --no-pytest                 Use plain bash tests instead of the pytest template
#   --no-solution               Skip generating solution/solve.sh
#   --include-canary-strings    Embed tracking GUIDs in generated files
#   --metadata-template <path>  Seed task.toml [metadata] from a .toml template
```
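For example, combining the flags above to scaffold a bash-test variant under a shared tasks directory (the paths here are illustrative):

```bash
# Scaffold with plain bash tests and no reference solution,
# placing the task under ./tasks instead of the current directory
harbor tasks init my-task --tasks-dir ./tasks --no-pytest --no-solution
```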
To convert Terminal Bench tasks to Harbor format, use `harbor tasks migrate -i <tb-tasks/> -o <harbor-tasks/>`.
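A minimal migration run might look like this (the directory names are placeholders, not required paths):

```bash
# Convert every Terminal Bench task under ./tb-tasks into Harbor format
harbor tasks migrate -i ./tb-tasks -o ./harbor-tasks
```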