ElevenLabs Speech Engine

Add a real-time voice interface to a custom agent. ElevenLabs handles microphone audio, speech-to-text, turn-taking, text-to-speech, and browser playback; your server exposes a Speech Engine WebSocket endpoint and streams response text back.

Setup: See Installation Guide. For JavaScript, use @elevenlabs/* packages only. For deeper SDK details, read JavaScript SDK Reference or Python SDK Reference.

When to Use

Use Speech Engine when the user wants to:

Add voice to an existing chat app or custom server pipeline
Add voice to OpenClaw, Hermes, or a similar agent runtime while keeping agent logic on a developer-owned server
Build a developer-hosted WebSocket server for ElevenLabs voice conversations
Stream response text back as spoken audio after your server validates user intent
Handle user interruptions while a response is still streaming
Build a browser client with @elevenlabs/react or @elevenlabs/client using a server-issued conversation token
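
The interruption case above can be sketched without any SDK. The following is a minimal illustration, not the ElevenLabs API: `generate_reply`, `stream_reply`, `send_text`, and the `interrupted` event are all hypothetical names, and the real Speech Engine message format and transport (a WebSocket) are not shown. It only demonstrates the pattern of streaming response text in chunks and stopping as soon as an interruption is signalled.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Optional, Tuple


async def generate_reply(prompt: str) -> AsyncIterator[str]:
    """Hypothetical agent: yields the reply one word at a time."""
    for word in f"You said: {prompt}".split():
        await asyncio.sleep(0)  # stand-in for model latency
        yield word + " "


async def stream_reply(
    prompt: str,
    send_text: Callable[[str], Awaitable[None]],
    interrupted: asyncio.Event,
) -> bool:
    """Send reply chunks until the reply ends or the user interrupts.

    `send_text` is an assumed callback that would write one text chunk
    to the voice connection. Returns True if the whole reply was sent.
    """
    async for chunk in generate_reply(prompt):
        if interrupted.is_set():
            return False  # stop streaming; the user started talking
        await send_text(chunk)
    return True


async def demo(prompt: str, interrupt_after: Optional[int]) -> Tuple[bool, List[str]]:
    """Simulate a connection that interrupts after N chunks (or never)."""
    sent: List[str] = []
    interrupted = asyncio.Event()

    async def send_text(chunk: str) -> None:
        sent.append(chunk)
        # Simulate an interruption signal arriving after N chunks.
        if interrupt_after is not None and len(sent) == interrupt_after:
            interrupted.set()

    completed = await stream_reply(prompt, send_text, interrupted)
    return completed, sent


if __name__ == "__main__":
    print(asyncio.run(demo("hello there", interrupt_after=2)))
    print(asyncio.run(demo("hello there", interrupt_after=None)))
```

In a real server the `interrupted` event would be set by whatever handler receives the interruption signal from the voice connection, and `send_text` would write to that connection instead of a list.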

Use the agents skill instead when the user is creating or configuring a hosted ElevenLabs Conversational AI agent with platform-managed prompts, tools, workflows, phone numbers, or widgets.
