agent-observability-eval-bootstrap

Pass

Audited by Gen Agent Trust Hub on Jun 19, 2026

Risk Level: SAFE
Full Analysis
  • [COMMAND_EXECUTION]: The skill utilizes the Bash tool to execute the pup CLI (a Datadog-specific utility) for interacting with the LLM Observability API. This is used to search for spans, retrieve trace details, and manage evaluator configurations. This behavior is consistent with the skill's stated purpose and originates from a known vendor.
  • [DATA_EXFILTRATION]: The skill processes production LLM trace data to identify quality dimensions. It includes a built-in 'PII scrub' mechanism (Phase 3D) that uses regex to redact sensitive information (emails, phone numbers, API keys) from the data before it is output to the user or saved to disk.
  • [INDIRECT_PROMPT_INJECTION]: The skill ingests untrusted production trace content to generate prompt templates for LLM-as-judge evaluators. While this represents a theoretical indirect prompt injection surface, it is mitigated by several factors: the skill requires explicit user confirmation of the proposed suite (Phase 2), generates evaluators as disabled drafts in Datadog (Phase 3C), and encourages the use of offline SDK code for testing before production use.
Audit Metadata
Risk Level
SAFE
Analyzed
Jun 19, 2026, 08:57 AM
Security Audit — agent-trust-hub — agent-observability-eval-bootstrap