Arize Phoenix for Claude Agent SDK & LangGraph
Phoenix is an open-source LLM observability platform from Arize, built on OpenTelemetry. OpenInference is the companion spec that defines LLM-specific span attributes (llm.model_name, input.value, tool.name, retrieval.documents.*, …) and ships auto-instrumentors. Any OTLP backend (Datadog, Tempo, Jaeger) also accepts the same spans.
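To make those attribute names concrete, here is a minimal sketch using the constants from openinference-semantic-conventions; the span name and attribute values are placeholders, not part of any real pipeline:

```python
from opentelemetry import trace
from openinference.semconv.trace import SpanAttributes

# The semantic-convention constants resolve to the dotted names above.
assert SpanAttributes.LLM_MODEL_NAME == "llm.model_name"
assert SpanAttributes.INPUT_VALUE == "input.value"
assert SpanAttributes.TOOL_NAME == "tool.name"

tracer = trace.get_tracer("demo")
with tracer.start_as_current_span("llm-call") as span:
    # Placeholder values, just to show how attributes are attached.
    span.set_attribute(SpanAttributes.LLM_MODEL_NAME, "claude-sonnet-4")
    span.set_attribute(SpanAttributes.INPUT_VALUE, "Hello")
```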
Scope of this skill:
- Claude Agent SDK (the claude-agent-sdk Python package — the same SDK that powers Claude Code). Traced via openinference-instrumentation-claude-agent-sdk where available, or via the manual wrapper sketched below.
- LangGraph — traced via openinference-instrumentation-langchain (no separate LangGraph package is needed).
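For the manual-wrapper path, a minimal sketch along these lines is possible. It assumes the claude_agent_sdk query() async API; traced_query is a hypothetical helper name, and the span attributes follow the OpenInference naming shown above:

```python
import asyncio

from opentelemetry import trace
from claude_agent_sdk import query  # the Claude Agent SDK itself

tracer = trace.get_tracer("claude-agent-sdk-manual")

async def traced_query(prompt: str) -> list:
    # Hypothetical wrapper: one OpenInference-style AGENT span per run.
    with tracer.start_as_current_span("claude_agent.query") as span:
        span.set_attribute("openinference.span.kind", "AGENT")
        span.set_attribute("input.value", prompt)
        messages = []
        async for message in query(prompt=prompt):
            messages.append(message)
        if messages:
            span.set_attribute("output.value", str(messages[-1]))
        return messages

# Example: asyncio.run(traced_query("Summarize the repo README"))
```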
Packages (exact names)
pip install arize-phoenix # Phoenix server + phoenix.otel / evals / client / experiments
pip install arize-phoenix-otel # phoenix.otel.register() helper
pip install openinference-instrumentation-claude-agent-sdk # auto-instruments the Claude Agent SDK
pip install openinference-instrumentation-langchain # covers LangChain AND LangGraph
pip install openinference-semantic-conventions # attribute name constants
pip install opentelemetry-sdk opentelemetry-exporter-otlp
pip install claude-agent-sdk # the Claude Agent SDK itself
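A wiring sketch given these packages, assuming a Phoenix server on its default localhost port; the project name and endpoint are placeholders:

```python
from phoenix.otel import register
from openinference.instrumentation.langchain import LangChainInstrumentor

# Point the OTLP exporter at a local Phoenix server (placeholder endpoint).
tracer_provider = register(
    project_name="claude-agent-demo",
    endpoint="http://localhost:6006/v1/traces",
)

# One instrumentor covers both LangChain and LangGraph runs.
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)
```

Because register() returns a standard OpenTelemetry TracerProvider, the same spans can be exported to any other OTLP backend by changing the endpoint.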