llm-prompting

Pass

Audited by Gen Agent Trust Hub on Mar 30, 2026

Risk Level: SAFEPROMPT_INJECTION
Full Analysis
  • [PROMPT_INJECTION]: The skill architecture is vulnerable to indirect prompt injection as it ingests untrusted external content into LLM prompts.
  • Ingestion points: Raw scraped website content and mission data are interpolated into the {charity_data} variable within prompt templates in data-pipeline/src/llm/prompts/.
  • Boundary markers: The documented templates do not utilize explicit boundary markers (such as XML tags or dedicated delimiters) or instructions to ignore embedded commands within the interpolated data.
  • Capability inventory: The skill uses LLM responses for scoring and narrative generation, which directly influences the evaluation outcomes. It does not appear to have access to shell execution or other high-privilege system tools.
  • Sanitization: No sanitization or escaping of the ingested raw content is specified before prompt interpolation.
  • Mitigating Factors: The design includes critical safeguards, most notably 'Deterministic Score Overrides' where code-calculated values overwrite LLM-generated scores, and a secondary 'LLM-as-Judge' mechanism that validates the quality and factual accuracy of the output.
  • [PROMPT_INJECTION]: The LLM client configuration in data-pipeline/src/llm/llm_client.py explicitly disables safety filters (BLOCK_NONE) for various harm categories. While the author justifies this based on the nature of charity data, it reduces the model's inherent protection against processing or generating harmful content if malicious data is successfully injected into the pipeline.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 30, 2026, 01:20 AM