Baseten
SKILL.md
Baseten Skill
Product summary
Baseten is a training and inference platform for deploying AI models at scale. Use Truss, an open-source framework, to package models into deployable containers with a config.yaml file specifying GPU, dependencies, and runtime behavior. Push models with truss push to get an autoscaling API endpoint. For hosted models without deployment, use Model APIs (OpenAI-compatible endpoints for LLMs like DeepSeek, Qwen, GLM). Orchestrate multi-step workflows with Chains, each step running on independent hardware. Key files: config.yaml (model configuration), model/model.py (custom inference logic), .trussrc (local CLI config). Primary docs: https://docs.baseten.co
When to use
- Deploying models: Package open-source LLMs, embeddings, or custom models with Truss and push to production.
- Running inference: Call deployed models via REST API or use Model APIs for instant access to hosted models.
- Building pipelines: Orchestrate multi-model workflows (RAG, image generation, transcription) with Chains.
- Iterating rapidly: Use
truss push --watchfor live reload during development, then promote to production. - Scaling automatically: Configure autoscaling to handle traffic spikes and scale to zero when idle.
- Training and fine-tuning: Run training jobs on H100/H200 GPUs and deploy checkpoints directly to production.
- Async processing: Queue long-running inference tasks with webhooks for batch processing.