The Agent Skills Directory

[SAFE]: The skill is a reference and usage guide for the harbor CLI, covering the evaluation lifecycle for AI agents. All documented functionality is consistent with the tool's primary purpose.
[COMMAND_EXECUTION]: The skill describes commands (e.g., harbor run, harbor trials start) that execute tasks within Docker containers. This is the intended behavior for providing a sandboxed environment for agent evaluations.
[EXTERNAL_DOWNLOADS]: The harbor datasets download command fetches datasets from a registry. According to the documentation, the tool defaults to its own official public registry for these operations.
[CREDENTIALS_UNSAFE]: The documentation correctly identifies that agents require API keys and recommends passing them via environment variables (e.g., --ae ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY), which is a standard and secure practice for CLI applications.
[REMOTE_CODE_EXECUTION]: The tool supports extensibility through flags like --agent-import-path, which let users load custom Python classes. This is a typical architectural feature for evaluation frameworks, and the code it loads comes from paths the user supplies.
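
To illustrate the environment-variable practice noted under CREDENTIALS_UNSAFE, here is a minimal Python sketch that builds the --ae NAME=value argument from the caller's environment, so the key never appears in source code. The helper name agent_env_flag is hypothetical and not part of harbor; only the --ae flag and the ANTHROPIC_API_KEY variable name come from the audited documentation.

```python
import os
from typing import Mapping, Optional


def agent_env_flag(name: str, environ: Optional[Mapping[str, str]] = None) -> list:
    """Build ["--ae", "NAME=value"] from an environment mapping.

    Hypothetical helper: it only assembles the flag described in the
    documentation and does not invoke harbor itself.
    """
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        # Fail early instead of passing an empty key to the agent.
        raise KeyError(f"{name} is not set in the environment")
    return ["--ae", f"{name}={value}"]


# Example with an explicit mapping so the sketch runs without a real key.
print(agent_env_flag("ANTHROPIC_API_KEY", {"ANTHROPIC_API_KEY": "sk-example"}))
```

The returned list can be appended to the argument list of whichever harbor command is being run; the rest of that command line is left out here because it is not specified in the summary above.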
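
To make the REMOTE_CODE_EXECUTION finding concrete, the sketch below shows how an evaluation framework might resolve a user-supplied import path to a Python class. It is not harbor's implementation: the "module:ClassName" path format and the load_class helper are assumptions chosen for illustration, and the format that --agent-import-path actually accepts should be checked against the tool's own documentation.

```python
import importlib


def load_class(import_path: str) -> type:
    """Resolve a "module:ClassName" string to a class object.

    Hypothetical helper; the path format is an assumption, not
    harbor's documented behavior.
    """
    module_name, sep, class_name = import_path.partition(":")
    if not sep or not module_name or not class_name:
        raise ValueError(f"expected 'module:ClassName', got {import_path!r}")
    # Importing the module executes its top-level code, which is why
    # the path should only ever come from the user running the tool.
    module = importlib.import_module(module_name)
    obj = getattr(module, class_name)
    if not isinstance(obj, type):
        raise TypeError(f"{import_path!r} does not refer to a class")
    return obj


# Standard-library class used so the sketch runs anywhere.
cls = load_class("collections:OrderedDict")
print(cls.__name__)
```

The comment on import_module is the relevant point for the audit: loading a class by path runs arbitrary module code, so the feature is only as trustworthy as the path passed to it.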

harbor-cli