
[COMMAND_EXECUTION]: The orchestrator scripts run_eval_cc.py and run_eval_openclaw.py make extensive use of asyncio.create_subprocess_exec to run shell commands, both locally and on remote instances via gcloud compute ssh, in order to manage the evaluation environment.
[COMMAND_EXECUTION]: The script run_eval_openclaw.py uses sudo systemctl restart to dynamically update environment variables for the gateway service on remote servers.
[COMMAND_EXECUTION]: The documentation in server-setup.md and references/common-execution.md instructs users to modify ~/.bashrc and ~/.ssh/config to persist environment settings and SSH tunnel configurations.
[REMOTE_CODE_EXECUTION]: server-setup.md provides instructions to download and execute setup scripts from well-known sources (NodeSource) and vendor repositories (raw.githubusercontent.com/CoboGlobal) via shell piping (curl | bash).
[EXTERNAL_DOWNLOADS]: The skill fetches binaries and dependencies from well-known repositories such as the NodeSource registry and the official CoboGlobal GitHub organization.
[DATA_EXFILTRATION]: Orchestrator scripts automatically retrieve session logs and wallet transaction specifications from remote instances using gcloud compute scp for local analysis.
[PROMPT_INJECTION]:
Ingestion points: score_traces.py ingests untrusted agent session data from Langfuse traces or local JSONL files.
Boundary markers: The judge_cc.py script embeds these logs into judge prompts using headers like [USER] and [ASSISTANT], but does not fully escape or encapsulate the untrusted content.
Capability inventory: The orchestration scripts possess high-privilege capabilities including arbitrary shell command execution and remote service management.
Sanitization: The logs undergo truncation for length in _build_session_text_from_observations but lack robust sanitization against embedded instructions targeting the LLM-as-Judge subagent.

caw-eval