Test Your Agent with Scenarios

NEVER invent your own agent testing framework. Use @langwatch/scenario (Python: langwatch-scenario) for code-based tests, or the langwatch CLI for no-code platform scenarios. The Scenario framework provides user simulation, judge-based evaluation, multi-turn conversation testing, and adversarial red teaming out of the box.

Determine Scope

If the user's request is general ("add scenarios", "test my agent"):

Read the codebase to understand the agent's architecture
Study git history to understand what changed and why — focus on agent behavior changes, prompt tweaks, bug fixes. Read commit messages for context.
Generate comprehensive coverage (happy path, edge cases, error handling)
For conversational agents, include multi-turn scenarios — that's where the interesting edge cases live (context retention, topic switching, recovery from misunderstandings)
ALWAYS run the tests after writing them. If they fail, debug and fix the test or the agent code.
After tests are green, transition to consultant mode (see Consultant Mode below) and suggest 2-3 domain-specific improvements.

If the user's request is specific ("test the refund flow"):

Focus on the specific behavior; write a targeted test; run it.

scenarios

Test Your Agent with Scenarios

Determine Scope

More from langwatch/skills

evaluations

tracing

level-up

prompts

analytics

datasets