cloudflare-workers-ai
Run LLMs, embeddings, and image generation on Cloudflare's GPU network, with 14 new 2025 models, streaming support, and guidance for preventing 7 documented errors.
- Supports 40+ models across text generation (Llama 4, Gemma 3, Mistral 3.1, GPT-OSS), embeddings (BGE 2x faster, EmbeddingGemma), image generation (Flux, Leonardo), vision, and audio (Deepgram, Whisper v3)
- Handles critical 2025 breaking changes: context window validation switched from characters to tokens, the BGE pooling parameter's cls mode is not backwards compatible with mean, and max_tokens now correctly defaults to 256
- Prevents 7 documented errors including NSFW filter false positives, missing num_steps for image generation, Miniflare AI binding resolution, and neuron consumption discrepancies
- Integrates with AI Gateway for per-request cache control (custom TTL, skip cache headers), logging, cost tracking, and OpenAI-compatible API endpoints
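The per-request cache control mentioned above can be sketched as follows. This is a minimal illustration, not the skill's own code: `AiLike`, `CachePolicy`, `gatewayOptions`, and `askThroughGateway` are hypothetical names, the gateway ID is assumed to refer to a gateway you have already created, and the model ID is only an example. The sketch assumes the `gateway` options object (`id`, `cacheTtl`, `skipCache`) accepted as the third argument to the Workers AI binding's `run` method.

```typescript
// Minimal structural type for the Workers AI binding, declared locally so the
// sketch type-checks without @cloudflare/workers-types.
interface AiLike {
  run(model: string, inputs: unknown, options?: unknown): Promise<unknown>;
}

// Hypothetical helper type: per-request AI Gateway cache settings.
interface CachePolicy {
  gatewayId: string;   // assumed: ID of an existing AI Gateway
  ttlSeconds?: number; // custom cache TTL for this request
  skipCache?: boolean; // bypass the gateway cache for this request
}

// Build the options object passed as the third argument to ai.run().
function gatewayOptions(policy: CachePolicy) {
  return {
    gateway: {
      id: policy.gatewayId,
      ...(policy.ttlSeconds !== undefined ? { cacheTtl: policy.ttlSeconds } : {}),
      ...(policy.skipCache ? { skipCache: true } : {}),
    },
  };
}

// Send one chat prompt through the gateway with the given cache policy.
async function askThroughGateway(ai: AiLike, prompt: string, policy: CachePolicy) {
  return ai.run(
    "@cf/meta/llama-3.1-8b-instruct", // example model ID
    { messages: [{ role: "user", content: prompt }] },
    gatewayOptions(policy),
  );
}
```

Inside a Worker, this would be called as `askThroughGateway(env.AI, "Hello", { gatewayId: "my-gateway", ttlSeconds: 3600 })`, assuming the AI binding is named `AI` in the Wrangler configuration.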
Cloudflare Workers AI
Status: Production Ready ✅
Last Updated: 2026-01-21
Dependencies: cloudflare-worker-base (for Worker setup)
Latest Versions: wrangler@4.58.0, @cloudflare/workers-types@4.20260109.0, workers-ai-provider@3.0.2
Recent Updates (2025):
- April 2025 - Performance: Llama 3.3 70B 2-4x faster (speculative decoding, prefix caching), BGE embeddings 2x faster
- April 2025 - Breaking Changes: max_tokens now correctly defaults to 256 (the default was previously not respected); BGE pooling parameter (cls is NOT backwards compatible with mean)
- 2025 - New Models (14): Mistral 3.1 24B (vision+tools), Gemma 3 12B (128K context), EmbeddingGemma 300M, Llama 4 Scout, GPT-OSS 120B/20B, Qwen models (QwQ 32B, Coder 32B), Leonardo image gen, Deepgram Aura 2, Whisper v3 Turbo, IBM Granite, Nova 3
- 2025 - Platform: Context windows API change (tokens not chars), unit-based pricing with per-model granularity, workers-ai-provider v3.0.2 (AI SDK v5), LoRA rank up to 32 (was 8), 100 adapters per account
- October 2025 - Deprecations: Model deprecations (use Llama 4, GPT-OSS instead)
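One way to guard against the two April 2025 breaking changes is to make both parameters explicit in every request. The sketch below builds request payloads only. `chatInputs`, `embeddingInputs`, and `assertSamePooling` are hypothetical helpers, and the 512-token default is an arbitrary illustrative value, not a recommendation from the source. It assumes the `max_tokens` field on text-generation inputs and the `pooling` field (`"mean"` or `"cls"`) on BGE embedding inputs.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };
type Pooling = "mean" | "cls";

// Always send max_tokens so output length does not silently fall back to the
// platform default of 256. The 512 here is an illustrative choice.
function chatInputs(messages: ChatMessage[], maxTokens = 512) {
  return { messages, max_tokens: maxTokens };
}

// Always state the pooling mode so every vector in an index is produced the
// same way.
function embeddingInputs(text: string[], pooling: Pooling) {
  return { text, pooling };
}

// Hypothetical stored record: keep the pooling mode next to the vector.
interface StoredVector {
  pooling: Pooling;
  values: number[];
}

// Refuse to compare vectors from different pooling modes, since cls and mean
// embeddings are not compatible with each other.
function assertSamePooling(a: StoredVector, b: StoredVector): void {
  if (a.pooling !== b.pooling) {
    throw new Error(`pooling mismatch: ${a.pooling} vs ${b.pooling}`);
  }
}
```

In a Worker, these payloads would be passed as the second argument to `env.AI.run(...)` with a text-generation or BGE model ID. Switching an existing index from mean to cls would mean re-embedding its contents rather than mixing the two.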
Quick Start (5 Minutes)