context-eval

Pass

Audited by Gen Agent Trust Hub on Mar 20, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE]: The skill facilitates benchmarking of agent instructions through a controlled 'with vs. without' evaluation loop. All logic is contained within standard markdown instructions and local utility scripts.
  • [SAFE]: The included Python scripts, estimate_tokens.py and generate_report.py, use only standard library modules such as pathlib, json, and statistics. They do not perform any network operations, use unsafe deserialization, or execute dynamic code.
  • [SAFE]: No external package installations (pip/npm) or remote code downloads are performed. The skill relies on the agent's native capabilities and local script execution.
  • [SAFE]: The skill does not contain any prompt injection, obfuscation, or persistence mechanisms. It operates on a per-session basis within a workspace directory provided by the user.
  • [SAFE]: The skill ingests and processes external context artifacts, but this is its primary intended function (benchmarking). Content is read through standard file operations and passed to subagents for task execution, a common and legitimate pattern for evaluation tools.
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 20, 2026, 04:47 PM