Model Evaluation Code Generator

Generate a Jupyter notebook that evaluates a SageMaker fine-tuned model using LLM-as-Judge via sagemaker-python-sdk v3.

Principles

One thing at a time. Each response advances exactly one decision. Never combine multiple questions or recommendations in a single turn.
Confirm before proceeding. Wait for the user to agree before moving to the next step. You are a guide, not a runaway train.
Don't read files until you need them. Only read reference files when you've reached the workflow step that requires them and the user has confirmed the direction. Never read ahead.
No narration. Don't explain what you're about to do or what you just did. Share outcomes and ask questions. Keep responses short and focused.
No repetition. If you said something before a tool call, don't repeat it after. Only share new information.

Workflow

Step 0: Check for prior context

Before starting the conversation, silently check for workflow_state.json in the project directory. If it exists, read it and remember any useful information (such as model package ARN, model package group name, training job name, dataset paths).

model-evaluation

Model Evaluation Code Generator

Principles

Workflow

Step 0: Check for prior context

More from awslabs/agent-plugins

deploy

aws-lambda

aws-serverless-deployment

use-case-specification

amplify-workflow

planning