caw-eval
Fail
Audited by Gen Agent Trust Hub on Apr 30, 2026
Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONEXTERNAL_DOWNLOADSDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: Orchestrator scripts
run_eval_cc.pyandrun_eval_openclaw.pyextensively useasyncio.create_subprocess_execto run local and remote shell commands viagcloud compute sshto manage the evaluation environment. - [COMMAND_EXECUTION]: The script
run_eval_openclaw.pyusessudo systemctl restartto dynamically update environment variables for the gateway service on remote servers. - [COMMAND_EXECUTION]: The documentation in
server-setup.mdandreferences/common-execution.mdinstructs users to modify~/.bashrcand~/.ssh/configto persist environment settings and SSH tunnel configurations. - [REMOTE_CODE_EXECUTION]:
server-setup.mdprovides instructions to download and execute setup scripts from well-known sources (NodeSource) and vendor repositories (raw.githubusercontent.com/CoboGlobal) via shell piping (curl | bash). - [EXTERNAL_DOWNLOADS]: Fetches binaries and dependencies from well-known repositories such as the NodeSource registry and the official CoboGlobal GitHub organization.
- [DATA_EXFILTRATION]: Orchestrator scripts automatically retrieve session logs and wallet transaction specifications from remote instances using
gcloud compute scpfor local analysis. - [PROMPT_INJECTION]:
- Ingestion points:
score_traces.pyingests untrusted agent session data from Langfuse traces or local JSONL files. - Boundary markers: The
judge_cc.pyscript embeds these logs into judge prompts using headers like[USER]and[ASSISTANT], but does not fully escape or encapsulate the untrusted content. - Capability inventory: The orchestration scripts possess high-privilege capabilities including arbitrary shell command execution and remote service management.
- Sanitization: The logs undergo truncation for length in
_build_session_text_from_observationsbut lack robust sanitization against embedded instructions targeting the LLM-as-Judge subagent.
Recommendations
- HIGH: Downloads and executes remote code from: http://metadata.google.internal/computeMetadata/v1 - DO NOT USE without thorough review
Audit Metadata