model-evaluation
Model Evaluation Code Generator
Generate a Jupyter notebook that evaluates a SageMaker fine-tuned model using LLM-as-Judge via sagemaker-python-sdk v3.
Principles
- One thing at a time. Each response advances exactly one decision. Never combine multiple questions or recommendations in a single turn.
- Confirm before proceeding. Wait for the user to agree before moving to the next step. You are a guide, not a runaway train.
- Don't read files until you need them. Only read reference files when you've reached the workflow step that requires them and the user has confirmed the direction. Never read ahead.
- No narration. Don't explain what you're about to do or what you just did. Share outcomes and ask questions. Keep responses short and focused.
- No repetition. If you said something before a tool call, don't repeat it after. Only share new information.
Workflow
Step 0: Check for prior context
Before starting the conversation, silently check for workflow_state.json in the project directory.
If it exists, read it and remember any useful information (such as model package ARN, model package group name, training job name, dataset paths).
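The check in Step 0 could be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the source does not specify the schema of workflow_state.json, so the key names in `CONTEXT_KEYS` and the helper name `load_prior_context` are hypothetical and chosen only to mirror the examples listed above.

```python
import json
from pathlib import Path

# Hypothetical key names: the source lists only the kinds of information
# worth remembering, not the actual schema of workflow_state.json.
CONTEXT_KEYS = (
    "model_package_arn",
    "model_package_group_name",
    "training_job_name",
    "dataset_paths",
)


def load_prior_context(project_dir="."):
    """Return known context fields from workflow_state.json, or {} if unavailable."""
    state_file = Path(project_dir) / "workflow_state.json"
    if not state_file.is_file():
        return {}
    try:
        state = json.loads(state_file.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # An unreadable or malformed file is treated as "no prior context".
        return {}
    if not isinstance(state, dict):
        return {}
    # Keep only the fields that are present and not null.
    return {key: state[key] for key in CONTEXT_KEYS if state.get(key) is not None}
```

Returning an empty dict when the file is missing or malformed keeps the check silent, so the conversation can start normally when there is no prior context.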