evaluate-multimodal

Evaluate Your Multimodal Agent

This recipe helps you evaluate agents that process images, audio, PDFs, or other non-text inputs.

Step 1: Identify Modalities

Read the codebase to identify which input types your agent processes and what it does with them:

  • Images: classification, analysis, generation, OCR
  • Audio: transcription, voice agents, audio Q&A
  • PDFs/Documents: parsing, extraction, summarization
  • Mixed: multiple input types in one pipeline
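As a rough starting point for this inventory, the sketch below groups file paths (for example, test fixtures or sample inputs found in the repository) by modality based on their extension. The extension map and the function names `classify_modality`, `summarize_modalities`, and `is_mixed` are illustrative assumptions, not part of LangWatch; adjust them to your pipeline.

```python
from collections import Counter
from pathlib import PurePath

# Assumed extension-to-modality map (illustrative only); extend as needed.
MODALITY_BY_EXTENSION = {
    ".png": "image", ".jpg": "image", ".jpeg": "image", ".gif": "image", ".webp": "image",
    ".wav": "audio", ".mp3": "audio", ".flac": "audio", ".ogg": "audio",
    ".pdf": "document", ".docx": "document",
}


def classify_modality(path: str) -> str:
    """Return the modality for a file path, or "other" if the extension is unknown."""
    suffix = PurePath(path).suffix.lower()
    return MODALITY_BY_EXTENSION.get(suffix, "other")


def summarize_modalities(paths) -> dict:
    """Count how many of the given paths fall into each modality."""
    return dict(Counter(classify_modality(p) for p in paths))


def is_mixed(paths) -> bool:
    """True when the paths span more than one known (non-"other") modality."""
    kinds = {classify_modality(p) for p in paths} - {"other"}
    return len(kinds) > 1
```

Running `summarize_modalities` over the sample inputs in a repository gives a quick indication of which of the categories above apply, and `is_mixed` flags pipelines that need more than one kind of test case. An extension check only hints at the modality; what the agent does with each input (OCR, transcription, extraction) still has to be read from the code.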

Step 2: Read the Relevant Docs

Use the LangWatch MCP tools to find the documentation for your modalities:

  • fetch_scenario_docs → search for multimodal pages (image analysis, audio testing, file analysis)
  • fetch_langwatch_docs → search for evaluation SDK docs

For PDF evaluation specifically, follow the pattern in python-sdk/examples/pdf_parsing_evaluation.ipynb.
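The notebook itself is not reproduced here. As a generic illustration of what a PDF extraction evaluation can look like, the sketch below scores extracted fields against expected values in plain Python. It does not use the LangWatch SDK, and `PdfCase`, `field_accuracy`, `evaluate`, and the `extract_fields` callable (your agent's PDF entry point) are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class PdfCase:
    """One evaluation case: a PDF path and the fields we expect to extract from it."""
    pdf_path: str
    expected: dict


def normalize(value) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not count as errors."""
    return " ".join(str(value).lower().split())


def field_accuracy(predicted: dict, expected: dict) -> float:
    """Fraction of expected fields whose predicted value matches after normalization."""
    if not expected:
        return 1.0
    hits = sum(
        1
        for key, value in expected.items()
        if key in predicted and normalize(predicted[key]) == normalize(value)
    )
    return hits / len(expected)


def evaluate(cases, extract_fields):
    """Run the (hypothetical) extractor on every case; return per-case rows and the mean score."""
    rows = []
    for case in cases:
        predicted = extract_fields(case.pdf_path)
        rows.append({"pdf_path": case.pdf_path, "score": field_accuracy(predicted, case.expected)})
    mean = sum(row["score"] for row in rows) / len(rows) if rows else 0.0
    return rows, mean
```

Passing the extractor in as a callable keeps the scoring logic independent of how the agent parses PDFs, so the same loop can be reused when the per-case scores are later logged through whichever evaluation SDK the docs above recommend.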

First Seen
Mar 23, 2026