Groq

Groq provides ultra-fast LLM inference using its custom LPU (Language Processing Unit) hardware. The API is fully OpenAI-compatible, so any workflow that works against api.openai.com can be pointed at api.groq.com/openai/v1 with minimal changes.

Official docs: https://console.groq.com/docs/overview

When to Use

Use this skill when you need to:

Run chat completions at extremely low latency (Groq LPU is significantly faster than GPU-based inference)
Use open-weight models such as Llama 3.3 70B, Llama 3.1 8B, Mixtral 8x7B, or Gemma 2 9B
Transcribe audio using Whisper via an OpenAI-compatible endpoint
List available models on Groq's platform
Drop in a fast, cost-effective inference backend where OpenAI compatibility is assumed