context-eval
Pass
Audited by Gen Agent Trust Hub on Mar 20, 2026
Risk Level: SAFE
Full Analysis
- [SAFE]: The skill facilitates benchmarking of agent instructions through a controlled 'with vs. without' evaluation loop. All logic is contained within standard markdown instructions and local utility scripts.
- [SAFE]: The included Python scripts, `estimate_tokens.py` and `generate_report.py`, use only standard library modules such as `pathlib`, `json`, and `statistics`. They do not perform any network operations, use unsafe deserialization, or execute dynamic code.
- [SAFE]: No external package installations (pip/npm) or remote code downloads are performed. The skill relies on the agent's native capabilities and local script execution.
- [SAFE]: The skill does not contain any prompt injection, obfuscation, or persistence mechanisms. It operates on a per-session basis within a workspace directory provided by the user.
- [SAFE]: While the skill is designed to ingest and process external context artifacts, this is its primary intended function (benchmarking). The processing is handled through standard file reading and passing content to subagents for task execution, which is a common and legitimate pattern for evaluation tools.
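For illustration only, the 'with vs. without' comparison described in the findings could be summarized using nothing beyond the standard library, roughly as sketched below. This is not taken from the audited scripts: the function name `summarize_runs`, the score lists, and the report keys are hypothetical, and the real `generate_report.py` may be structured differently.

```python
import json
import statistics


def summarize_runs(with_scores, without_scores):
    """Compare task scores from runs with and without the context artifact.

    Hypothetical helper: both arguments are non-empty lists of numeric scores.
    """
    mean_with = statistics.mean(with_scores)
    mean_without = statistics.mean(without_scores)
    return {
        "mean_with": mean_with,
        "mean_without": mean_without,
        # Positive delta means the context artifact improved the average score.
        "delta": mean_with - mean_without,
    }


if __name__ == "__main__":
    # Made-up scores, used only to show the shape of the output.
    report = summarize_runs([4, 5, 3], [2, 3, 1])
    print(json.dumps(report, indent=2))
```

A sketch like this uses only `json` and `statistics`, with no network access, deserialization of untrusted objects, or dynamic code execution, which matches the properties the audit attributes to the bundled scripts.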
Audit Metadata