bedrock-agentcore-evaluations
Installation
SKILL.md
Amazon Bedrock AgentCore Evaluations
Overview
AgentCore Evaluations transforms agent testing from "vibes-based" to metric-based quality assurance. Test agents before production, then continuously monitor live interactions using 13 built-in evaluators and custom scoring systems.
Purpose: Ensure AI agents meet quality, safety, and effectiveness standards
Pattern: Task-based (5 operations)
Key Principles (validated by AWS December 2025):
- Pre-Production Testing - Validate before deployment
- Continuous Monitoring - Sample and score live interactions
- 13 Built-in Evaluators - Standard quality dimensions
- Custom Evaluators - LLM-as-Judge for domain-specific metrics
- Alerting Integration - CloudWatch for proactive monitoring
- On-Demand + Continuous - Both testing modes supported