llm-obs-experiment-py-bootstrap
LLM Obs Experiment (Python) Bootstrap — Generate a Python Experiment Using ddtrace.llmobs
Produce a single self-contained Python experiment that uses the official ddtrace.llmobs SDK. The output is either a .py script or an .ipynb notebook. The generated code mirrors the patterns shown in Datadog's reference notebooks at https://github.com/DataDog/llm-observability/tree/main/experiments/notebooks.
The SDK handles lazy project/experiment creation, dataset push diffing, the 5 MB / 1000-record bulk threshold, eval metric streaming, and the status state machine on the user's behalf. This skill must therefore never re-implement those primitives; it imports LLMObs and relies on it.
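As a rough illustration of what "imports LLMObs and trusts it" can look like, here is a minimal sketch of a generated script. It assumes the experiment interface documented for recent ddtrace releases (`LLMObs.enable`, `LLMObs.create_dataset`, `LLMObs.experiment`, `experiment.run`). The task body, the evaluator, the records, and every name such as `capitals-demo` are hypothetical placeholders, not part of this skill's specification.

```python
from typing import Any, Dict, Optional


def task(input_data: Dict[str, Any], config: Optional[Dict[str, Any]] = None) -> str:
    """Hypothetical task: a lookup table standing in for the model call under test."""
    capitals = {"France": "Paris", "Japan": "Tokyo"}
    return capitals.get(input_data["country"], "unknown")


def exact_match(input_data: Dict[str, Any], output_data: str, expected_output: str) -> bool:
    """Function-style evaluator: True when the task output equals the expected answer."""
    return output_data == expected_output


def main() -> None:
    # Imported lazily so the helpers above can be exercised without ddtrace installed.
    from ddtrace.llmobs import LLMObs

    # Assumption: DD_API_KEY and DD_APP_KEY are already set in the environment.
    LLMObs.enable(ml_app="capitals-demo", project_name="capitals-demo")

    # The SDK takes care of creating and pushing the dataset; nothing is re-implemented here.
    dataset = LLMObs.create_dataset(
        dataset_name="capitals",  # hypothetical dataset name
        records=[
            {"input_data": {"country": "France"}, "expected_output": "Paris"},
            {"input_data": {"country": "Japan"}, "expected_output": "Tokyo"},
        ],
    )

    experiment = LLMObs.experiment(
        name="capitals-baseline",  # hypothetical experiment name
        task=task,
        dataset=dataset,
        evaluators=[exact_match],
    )
    experiment.run(jobs=2)  # parallelism, corresponding to the --jobs flag


# Call main() once credentials are configured; it is not invoked automatically here.
```

The script contains no batching, retry, or status-tracking logic of its own: it only defines the task and evaluator and hands them to the SDK.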
Usage
/llm-obs-experiment-py-bootstrap [--format py|ipynb] [--dataset <path>] [--dataset-name <name>] [--dataset-version <int>] [--project-name <name>] [--evaluator-style function|class|remote] [--jobs <n>] [--output <path>]
Arguments: $ARGUMENTS
Inputs
All inputs are optional. If the user omits a flag, fall back to its default; never block to prompt for --jobs, --format, or any other flag.
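The fall-back-to-default rule can be sketched with `argparse`, which resolves every omitted flag without prompting. The flag names and the `py|ipynb` and `function|class|remote` choices come from the usage line above; the default values shown here (`py`, `function`, `1`, and `None` for the rest) are assumptions for illustration only, and `parse_arguments` is a hypothetical helper name.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    """Build a parser in which every flag is optional and has a default."""
    parser = argparse.ArgumentParser(prog="/llm-obs-experiment-py-bootstrap")
    # Defaults below are illustrative assumptions, not values taken from the skill.
    parser.add_argument("--format", choices=["py", "ipynb"], default="py")
    parser.add_argument("--dataset", default=None)
    parser.add_argument("--dataset-name", default=None)
    parser.add_argument("--dataset-version", type=int, default=None)
    parser.add_argument("--project-name", default=None)
    parser.add_argument(
        "--evaluator-style",
        choices=["function", "class", "remote"],
        default="function",
    )
    parser.add_argument("--jobs", type=int, default=1)
    parser.add_argument("--output", default=None)
    return parser


def parse_arguments(arguments: str) -> argparse.Namespace:
    """Split the raw $ARGUMENTS string shell-style and parse it without prompting."""
    return build_parser().parse_args(shlex.split(arguments))
```

For example, `parse_arguments("")` succeeds and yields only defaults, while `parse_arguments("--format ipynb --jobs 4")` overrides just those two values and leaves the rest untouched.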