Baseten Skill

Product summary

Baseten is a training and inference platform for deploying AI models at scale. Use Truss, an open-source framework, to package models into deployable containers with a config.yaml file specifying GPU, dependencies, and runtime behavior. Push models with truss push to get an autoscaling API endpoint. For hosted models without deployment, use Model APIs (OpenAI-compatible endpoints for LLMs like DeepSeek, Qwen, GLM). Orchestrate multi-step workflows with Chains, each step running on independent hardware. Key files: config.yaml (model configuration), model/model.py (custom inference logic), .trussrc (local CLI config). Primary docs: https://docs.baseten.co

When to use

Deploying models: Package open-source LLMs, embeddings, or custom models with Truss and push to production.
Running inference: Call deployed models via REST API or use Model APIs for instant access to hosted models.
Building pipelines: Orchestrate multi-model workflows (RAG, image generation, transcription) with Chains.
Iterating rapidly: Use truss push --watch for live reload during development, then promote to production.
Scaling automatically: Configure autoscaling to handle traffic spikes and scale to zero when idle.
Training and fine-tuning: Run training jobs on H100/H200 GPUs and deploy checkpoints directly to production.
Async processing: Queue long-running inference tasks with webhooks for batch processing.

Baseten

Baseten Skill

Product summary

When to use

Quick reference

Essential CLI commands