grading-backend

Pass

Audited by Gen Agent Trust Hub on Apr 15, 2026

Risk Level: SAFECOMMAND_EXECUTIONEXTERNAL_DOWNLOADSPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill executes local shell scripts (probes/probe-robustness.sh and probes/probe-performance.sh) and standard utilities like curl and grep. These tools are used to test the functionality, robustness, and performance of a backend service hosted locally (typically at http://localhost:8080). This execution occurs within the context of analyzing a development project.
  • [EXTERNAL_DOWNLOADS]: The skill utilizes npx to download and run well-known developer tools such as autocannon (for load testing) and ajv-cli (for JSON schema validation). These downloads target the official NPM registry, which is a well-known service and thus considered safe for this use case.
  • [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection because it reads and processes untrusted source code and documentation from the projects being graded.
  • Ingestion points: Project source files (e.g., .ts, .py, .go), specification documentation, and tool outputs (e.g., curl responses, test logs) are read into the agent's context from the file system and local network.
  • Boundary markers: The skill uses rigid markdown report templates and JSON schemas to structure the output, but it lacks explicit boundary delimiters or 'ignore' instructions for the third-party code being analyzed.
  • Capability inventory: The agent has the capability to execute subprocesses via the provided shell scripts, perform file read/write operations, and probe local network services.
  • Sanitization: No specific sanitization or filtering of external content is performed; the skill relies on the agent's internal reasoning and regex-based searching (grep) to process data.
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 15, 2026, 01:11 AM