together-evaluations

Pass

Audited by Gen Agent Trust Hub on May 6, 2026

Risk Level: SAFE
Full Analysis
  • [PROMPT_INJECTION]: The skill processes untrusted external datasets that are subsequently evaluated by LLM judge models, which establishes an inherent surface for indirect prompt injection.
      1. Ingestion points: Dataset files provided via input_data_file_path in scripts/run_evaluation.py and scripts/run_evaluation.ts.
      2. Boundary markers: Absent.
      3. Capability inventory: Orchestrates LLM evaluations via Together AI API and performs local file system operations for dataset handling.
      4. Sanitization: No validation or filtering of dataset content is performed prior to processing.
  • [COMMAND_EXECUTION]: The provided scripts perform legitimate file operations, such as creating temporary JSONL files for upload and downloading results, alongside network requests to the Together AI API and external providers. These actions are consistent with the skill's stated purpose of automating model evaluations.
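The two gaps noted under PROMPT_INJECTION (no boundary markers, no sanitization) could be narrowed before a dataset is uploaded. The following is a minimal sketch under stated assumptions, not the skill's actual code: the marker strings, the size cap, the "prompt" field name, and the helper names wrap_untrusted and load_rows are all hypothetical and do not appear in the audited scripts.

```python
import json

# Hypothetical delimiters; the audited scripts define no boundary markers.
BEGIN_MARK = "<<<UNTRUSTED_DATA>>>"
END_MARK = "<<<END_UNTRUSTED_DATA>>>"
MAX_FIELD_CHARS = 8000  # assumed cap, not taken from the skill


def wrap_untrusted(text: str) -> str:
    """Fence a dataset field so a judge prompt can tell data from instructions."""
    # Replace any marker found in the data with a placeholder, so a row
    # cannot close the fence early or open a new one.
    cleaned = text.replace(BEGIN_MARK, "[removed]").replace(END_MARK, "[removed]")
    return f"{BEGIN_MARK}\n{cleaned}\n{END_MARK}"


def load_rows(lines, required_fields=("prompt",)):
    """Parse JSONL lines, skipping rows that are malformed or oversized."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue  # not valid JSON
        if not isinstance(row, dict):
            continue  # JSONL rows are expected to be objects
        if any(not isinstance(row.get(f), str) for f in required_fields):
            continue  # required field missing or not a string
        if any(len(row[f]) > MAX_FIELD_CHARS for f in required_fields):
            continue  # field exceeds the assumed size cap
        wrapped = {f: wrap_untrusted(row[f]) for f in required_fields}
        rows.append({**row, **wrapped})
    return rows
```

Passing an open file handle for the dataset to load_rows would yield only well-formed rows with their text fields fenced. Delimiters of this kind reduce the injection surface only if the judge prompt also tells the model to treat fenced text as data; they do not eliminate the risk.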
Audit Metadata
Risk Level: SAFE
Analyzed: May 6, 2026, 07:54 PM