together-evaluations
Pass
Audited by Gen Agent Trust Hub on May 6, 2026
Risk Level: SAFE
Full Analysis
- [PROMPT_INJECTION]: The skill processes untrusted external datasets that are subsequently evaluated by LLM judge models, which establishes an inherent surface for indirect prompt injection.
  1. Ingestion points: Dataset files provided via input_data_file_path in scripts/run_evaluation.py and scripts/run_evaluation.ts.
  2. Boundary markers: Absent.
  3. Capability inventory: Orchestrates LLM evaluations via the Together AI API and performs local file system operations for dataset handling.
  4. Sanitization: No validation or filtering of dataset content is performed prior to processing.
- [COMMAND_EXECUTION]: The provided scripts perform legitimate file operations, such as creating temporary JSONL files for upload and downloading results, alongside network requests to the Together AI API and external providers. These actions are consistent with the skill's stated purpose of automating model evaluations.
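The two gaps noted under PROMPT_INJECTION (no boundary markers, no sanitization) could be narrowed with a small pre-processing step before the dataset is uploaded. The following is a minimal sketch, not code from the audited scripts: the delimiter strings, the `sanitize_rows` and `wrap_untrusted` helpers, and the `max_chars` limit are all hypothetical names chosen for illustration, and it assumes the dataset is JSONL with one object per line.

```python
import json

# Hypothetical delimiters (assumption, not taken from the audited scripts).
BEGIN = "<<<UNTRUSTED_DATA>>>"
END = "<<<END_UNTRUSTED_DATA>>>"


def strip_markers(text: str) -> str:
    """Remove delimiter look-alikes so dataset text cannot close the fence early."""
    # Loop because removing one marker can splice a new one together.
    while BEGIN in text or END in text:
        text = text.replace(BEGIN, "").replace(END, "")
    return text


def wrap_untrusted(text: str) -> str:
    """Fence untrusted text between explicit boundary markers."""
    return f"{BEGIN}\n{strip_markers(text)}\n{END}"


def sanitize_rows(lines, max_chars=8000):
    """Validate JSONL lines and yield rows whose string fields are fenced.

    `lines` is any iterable of strings, e.g. an open file handle.
    """
    for n, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)  # raises ValueError on malformed JSON
        if not isinstance(row, dict):
            raise ValueError(f"line {n}: expected a JSON object")
        yield {
            key: wrap_untrusted(value[:max_chars]) if isinstance(value, str) else value
            for key, value in row.items()
        }
```

Fencing only helps if the judge prompt also instructs the model to treat everything between the markers as data rather than instructions. It reduces the injection surface but does not eliminate it.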
Audit Metadata