The deepslate-pipecat package provides a DeepslateRealtimeLLMService implementation for the Pipecat framework, enabling seamless integration with Deepslate’s unified voice AI infrastructure.
This plugin lives in the deepslate-sdks monorepo. We welcome contributions — feel free to open issues or pull requests there.
Prerequisites
- A Deepslate account with API credentials
- Python 3.11+
- A Pipecat-compatible transport (e.g. Daily.co, Twilio, generic WebSocket)
- (Optional) ElevenLabs API key for server-side TTS
Installation
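Assuming the package is published on PyPI under the name used throughout this page, installation is a single pip command:

```shell
pip install deepslate-pipecat
```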
Environment Variables
Set up your credentials as environment variables:

| Variable | Required | Description |
|---|---|---|
| DEEPSLATE_VENDOR_ID | Yes | Your Deepslate vendor ID |
| DEEPSLATE_ORGANIZATION_ID | Yes | Your Deepslate organization ID |
| DEEPSLATE_API_KEY | Yes | Your Deepslate API key |
| ELEVENLABS_API_KEY | No | ElevenLabs API key for server-side TTS |
| ELEVENLABS_VOICE_ID | No | ElevenLabs voice ID |
| ELEVENLABS_MODEL_ID | No | ElevenLabs model (e.g., eleven_turbo_v2) |
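For example (placeholder values; only the DEEPSLATE_* variables are required):

```shell
export DEEPSLATE_VENDOR_ID="your-vendor-id"
export DEEPSLATE_ORGANIZATION_ID="your-organization-id"
export DEEPSLATE_API_KEY="your-api-key"

# Optional, only needed for ElevenLabs server-side TTS:
export ELEVENLABS_API_KEY="your-elevenlabs-key"
export ELEVENLABS_VOICE_ID="your-voice-id"
export ELEVENLABS_MODEL_ID="eleven_turbo_v2"
```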
Quick Start
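A minimal pipeline sketch, assuming the import path deepslate_pipecat and standard Pipecat wiring; constructor arguments follow the configuration reference in this document:

```python
# Sketch only: the module path `deepslate_pipecat` and the exact service
# keyword arguments are assumptions based on this page, not verified API.
import asyncio

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask

from deepslate_pipecat import DeepslateOptions, DeepslateRealtimeLLMService


async def main():
    # Loads DEEPSLATE_* credentials from the environment.
    options = DeepslateOptions.from_env(
        system_prompt="You are a helpful assistant.",
    )
    llm = DeepslateRealtimeLLMService(options=options)

    transport = ...  # any Pipecat transport (see Transport Examples)

    pipeline = Pipeline([
        transport.input(),   # audio in from the caller
        llm,                 # Deepslate: VAD + STT + LLM (+ optional TTS)
        transport.output(),  # audio out to the caller
    ])

    await PipelineRunner().run(PipelineTask(pipeline))


if __name__ == "__main__":
    asyncio.run(main())
```

Since Deepslate handles VAD, transcription, and (optionally) TTS server-side, no separate STT or TTS processors should be needed in the pipeline.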
Configuration Reference
DeepslateOptions

The main configuration class for connecting to the Deepslate API. Use DeepslateOptions.from_env() to load credentials from environment variables, with optional keyword overrides.

| Parameter | Type | Default | Description |
|---|---|---|---|
| vendor_id | str | env: DEEPSLATE_VENDOR_ID | Your Deepslate vendor ID |
| organization_id | str | env: DEEPSLATE_ORGANIZATION_ID | Your Deepslate organization ID |
| api_key | str | env: DEEPSLATE_API_KEY | Your Deepslate API key |
| base_url | str | https://app.deepslate.eu | Base URL for the Deepslate API |
| system_prompt | str | "You are a helpful assistant." | System prompt for the model |
| temperature | float | 1.0 | Sampling temperature (0.0–2.0) |
| generate_reply_timeout | float | 30.0 | Timeout in seconds waiting for a model reply (0 = no timeout) |
| ws_url | str \| None | None | Direct WebSocket URL override — useful for local development |
| max_retries | int | 3 | Maximum reconnection attempts before emitting an ErrorFrame |
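For example, a sketch of loading credentials from the environment with overrides (the import path is an assumption):

```python
from deepslate_pipecat import DeepslateOptions  # import path assumed

# DEEPSLATE_* credentials come from the environment; keyword arguments
# override individual fields from the table above.
options = DeepslateOptions.from_env(
    system_prompt="You are a friendly booking assistant.",
    temperature=0.7,
    generate_reply_timeout=20.0,
)
```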
VAD Configuration

Pass a VadConfig to DeepslateRealtimeLLMService to tune server-side Voice Activity Detection. Disable client-side VAD on your transport, since Deepslate handles it.

| Parameter | Type | Default | Description |
|---|---|---|---|
| confidence_threshold | float | 0.5 | Minimum confidence to classify audio as speech (0.0–1.0) |
| min_volume | float | 0.01 | Minimum volume threshold (0.0–1.0) |
| start_duration_ms | int | 200 | Consecutive speech required to detect a turn start (ms) |
| stop_duration_ms | int | 500 | Silence required to detect a turn end (ms) |
| backbuffer_duration_ms | int | 1000 | Audio buffered before the detection window (ms) |
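A sketch of tuning VAD for more conservative turn-taking (the import path, and the vad_config parameter name by analogy with tts_config, are assumptions):

```python
from deepslate_pipecat import (  # import path assumed
    DeepslateOptions,
    DeepslateRealtimeLLMService,
    VadConfig,
)

# Require louder, longer speech to open a turn, and a longer pause to close it.
vad = VadConfig(
    confidence_threshold=0.6,
    min_volume=0.02,
    start_duration_ms=300,
    stop_duration_ms=800,
)

llm = DeepslateRealtimeLLMService(
    options=DeepslateOptions.from_env(),
    vad_config=vad,  # parameter name assumed
)
```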
HostedTtsConfig

Use a voice cloned and hosted within Deepslate — no external TTS provider credentials required. Pass an instance to DeepslateRealtimeLLMService(tts_config=...) to enable PCM audio output.

| Parameter | Type | Default | Description |
|---|---|---|---|
| voice_id | str | required | The ID of the hosted (cloned) voice to use for synthesis |
| mode | HostedTtsMode | HostedTtsMode.HIGH_QUALITY | Quality/latency tradeoff for synthesis |

HostedTtsMode values:

| Value | Description |
|---|---|
| HIGH_QUALITY | Best output quality with still relatively low latency. Recommended for most use cases (default). |
| LOW_LATENCY | Minimal generation latency; output quality may be significantly reduced. |
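A sketch of enabling a hosted voice (import path and placeholder voice ID are assumptions):

```python
from deepslate_pipecat import (  # import path assumed
    DeepslateOptions,
    DeepslateRealtimeLLMService,
    HostedTtsConfig,
    HostedTtsMode,
)

tts = HostedTtsConfig(
    voice_id="my-cloned-voice-id",    # placeholder
    mode=HostedTtsMode.LOW_LATENCY,   # default is HIGH_QUALITY
)

llm = DeepslateRealtimeLLMService(
    options=DeepslateOptions.from_env(),
    tts_config=tts,  # enables PCM audio output
)
```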
ElevenLabsTtsConfig

Configure server-side text-to-speech with ElevenLabs via Deepslate. Pass an instance to DeepslateRealtimeLLMService(tts_config=...) to enable PCM audio output. Use ElevenLabsTtsConfig.from_env() to create a config from environment variables.

| Parameter | Type | Description |
|---|---|---|
| api_key | str | ElevenLabs API key (env: ELEVENLABS_API_KEY) |
| voice_id | str | Voice ID (env: ELEVENLABS_VOICE_ID) |
| model_id | str \| None | Model ID, e.g., eleven_turbo_v2 (env: ELEVENLABS_MODEL_ID) |
| location | ElevenLabsLocation | API endpoint region — US (default), EU, or INDIA |
| voice_settings | ElevenLabsVoiceSettingsConfig \| None | Fine-grained voice control (see below) |

ElevenLabsVoiceSettingsConfig — fine-grained control over the synthesized voice:

| Parameter | Type | Description |
|---|---|---|
| stability | float \| None | Voice consistency (0.0–1.0); higher = more stable |
| similarity_boost | float \| None | Clarity and similarity to the original voice (0.0–1.0) |
| style | float \| None | Style exaggeration (0.0–1.0) |
| use_speaker_boost | bool \| None | Boost similarity to the original speaker |
| speed | float \| None | Speaking speed multiplier |
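A sketch of configuring ElevenLabs TTS (the import path, and the assumption that from_env() accepts keyword overrides like DeepslateOptions.from_env(), are not verified):

```python
from deepslate_pipecat import (  # import path assumed
    DeepslateOptions,
    DeepslateRealtimeLLMService,
    ElevenLabsLocation,
    ElevenLabsTtsConfig,
    ElevenLabsVoiceSettingsConfig,
)

# Reads ELEVENLABS_* env vars; overrides shown for region and voice settings.
tts = ElevenLabsTtsConfig.from_env(
    location=ElevenLabsLocation.EU,
    voice_settings=ElevenLabsVoiceSettingsConfig(
        stability=0.5,
        similarity_boost=0.75,
        speed=1.1,
    ),
)

llm = DeepslateRealtimeLLMService(
    options=DeepslateOptions.from_env(),
    tts_config=tts,
)
```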
Features
Real-time Voice Streaming
Low-latency bidirectional PCM audio streaming over WebSockets for natural conversations
Server-side VAD
Voice activity detection handled server-side for reliable, configurable speech detection
Function Calling
Full tool/function calling support using OpenAI JSON schema format with async handlers
ElevenLabs TTS
Server-side TTS with regional endpoints and fine-grained voice settings
Low Latency Mode
Hosted voice TTS supports a low latency mode for fastest possible response at the cost of some output quality
Direct Speech
Speak text directly via TTS without routing through the LLM
Conversation Queries
Run one-shot side-channel inference without affecting the main conversation
Chat History Export
Export the full structured conversation history on demand
Dynamic Context Injection
Inject user or system messages mid-conversation via LLMMessagesAppendFrame
Automatic Reconnection
Exponential-backoff reconnection with a configurable retry limit
Transport Agnostic
Works with any Pipecat transport: Daily.co, Twilio, generic WebSocket, and more
Session Initialized Frame
DeepslateRealtimeLLMService emits a DeepslateSessionInitializedFrame exactly once, when the WebSocket session is fully initialized and ready to accept messages.
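One way to react to it is a small custom Pipecat processor placed downstream of the service (a sketch; the frame's import path is an assumption):

```python
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

from deepslate_pipecat import DeepslateSessionInitializedFrame  # path assumed


class SessionReadyLogger(FrameProcessor):
    """Logs once when the Deepslate session becomes ready."""

    async def process_frame(self, frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, DeepslateSessionInitializedFrame):
            print("Deepslate session is ready to accept messages")
        await self.push_frame(frame, direction)
```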
Function Calling
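As a self-contained sketch of this flow: the tool schema uses OpenAI's JSON-schema format; the handler signature and the register_function call shown in the comment mirror Pipecat's LLM-service convention and are assumptions for this service; the weather lookup is stubbed.

```python
import asyncio

# Tool definition in OpenAI JSON schema format.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}


# Async handler; the exact signature expected by the service is an
# assumption. Here it simply maps tool arguments to a result dict.
async def get_weather(args: dict) -> dict:
    # Stubbed lookup; a real handler would call a weather API.
    return {"city": args["city"], "temperature_c": 21, "conditions": "sunny"}


# Hypothetical registration on the service (name mirrors Pipecat's
# LLM-service convention):
# llm.register_function("get_weather", get_weather)

if __name__ == "__main__":
    print(asyncio.run(get_weather({"city": "Berlin"})))
```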
Define tools in OpenAI JSON schema format, register async handlers on the service, and push the definitions into the pipeline before it starts.

Dynamic Context Injection

Inject messages into the live conversation context without restarting the session. This is useful for passing user profile data, injecting tool results from external systems, or priming the model with background context. Push an LLMMessagesAppendFrame to append messages, or an LLMMessagesUpdateFrame to resync the full context and optionally trigger an immediate model reply.
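A sketch of appending a message mid-conversation, using Pipecat's standard LLMMessagesAppendFrame queued on the running PipelineTask:

```python
from pipecat.frames.frames import LLMMessagesAppendFrame


async def inject_profile(task) -> None:
    """Append a system message mid-conversation; `task` is the running
    PipelineTask from your pipeline setup."""
    await task.queue_frame(
        LLMMessagesAppendFrame(
            messages=[{
                "role": "system",
                "content": "The caller is a premium customer named Alex.",
            }]
        )
    )
```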
Direct Speech
Push a DeepslateDirectSpeechFrame to synthesize and play text directly — bypassing the LLM entirely. Useful for scripted prompts, confirmations, or fallback messages. Set include_in_history=False to speak without adding the text to the conversation context — ideal for system-level announcements.
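For example (the frame's import path is an assumption):

```python
from deepslate_pipecat import DeepslateDirectSpeechFrame  # path assumed


async def announce(task) -> None:
    # Speak without routing through the LLM and without adding the text
    # to the conversation history.
    await task.queue_frame(
        DeepslateDirectSpeechFrame(
            text="Please hold while I transfer you.",
            include_in_history=False,
        )
    )
```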
Conversation Queries
A DeepslateConversationQueryFrame runs a one-shot inference call on a side channel. The result arrives as a DeepslateConversationQueryResultFrame and does not affect the main conversation history or trigger any audio output.
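A sketch of pushing a query and catching its result downstream (import paths assumed):

```python
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

from deepslate_pipecat import (  # paths assumed
    DeepslateConversationQueryFrame,
    DeepslateConversationQueryResultFrame,
)


class QueryResultHandler(FrameProcessor):
    """Place downstream of the service to observe side-channel results."""

    async def process_frame(self, frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, DeepslateConversationQueryResultFrame):
            print("side-channel result:", frame.text)
        await self.push_frame(frame, direction)


async def check_sentiment(task) -> None:
    await task.queue_frame(
        DeepslateConversationQueryFrame(
            prompt="Rate the caller's sentiment: positive, neutral, or negative.",
        )
    )
```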
Chat History Export
Push a DeepslateExportChatHistoryFrame to request the full conversation history. The result arrives as a DeepslateChatHistoryFrame downstream in the pipeline. Each ChatMessageDict has role, delivery_status, ephemeral, and a content list of typed blocks (text, input_audio, tool_call, tool_result, and more).
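As an illustration of consuming an export, here is a self-contained helper over messages shaped per the ChatMessageDict description; the field names inside a text block (e.g. a "text" key) are assumptions:

```python
def transcript_lines(messages: list[dict]) -> list[str]:
    """Flatten the text blocks of an exported history into 'role: text' lines.

    Assumes each message carries role/delivery_status/ephemeral plus a
    `content` list of typed blocks, as described above.
    """
    lines = []
    for message in messages:
        for block in message.get("content", []):
            if block.get("type") == "text":
                lines.append(f'{message["role"]}: {block.get("text", "")}')
    return lines


# Example input shaped per the ChatMessageDict description; a
# DeepslateChatHistoryFrame's `messages` would be consumed the same way.
sample = [
    {"role": "user", "delivery_status": "delivered", "ephemeral": False,
     "content": [{"type": "text", "text": "Hi there"}]},
    {"role": "assistant", "delivery_status": "delivered", "ephemeral": False,
     "content": [{"type": "text", "text": "Hello! How can I help?"}]},
]

if __name__ == "__main__":
    print(transcript_lines(sample))
```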
Custom Frames Reference
In addition to standard Pipecat frames, deepslate-pipecat exposes the following frames for controlling and observing Deepslate-specific behaviour.
Input Frames (push into the pipeline)
| Frame | Description |
|---|---|
| DeepslateExportChatHistoryFrame | Request a full chat history export. await_pending: bool — wait for in-flight ops before exporting. |
| DeepslateDirectSpeechFrame | Speak text directly via TTS, bypassing the LLM. text: str, include_in_history: bool. |
| DeepslateConversationQueryFrame | One-shot side-channel inference. prompt: str \| None, instructions: str \| None. |
Output Frames (emitted by the service)
| Frame | Description |
|---|---|
| DeepslateSessionInitializedFrame | Emitted once when the session is fully initialized and ready to accept messages. |
| DeepslateChatHistoryFrame | Chat history export result. messages: list[ChatMessageDict]. |
| DeepslateConversationQueryResultFrame | Side-channel query result. text: str. |
| DeepslateUserTranscriptionFrame | User speech-to-text transcription from Deepslate. |
| DeepslateModelTranscriptionFrame | Word-aligned transcription for the model’s TTS audio. text: str. |
Transport Examples
The Deepslate service is transport-agnostic. Swap the transport to suit your deployment.

Daily.co (WebRTC)

Twilio

Generic WebSocket
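A sketch using Pipecat's built-in WebSocket server transport; the module path and parameter names may differ across Pipecat versions:

```python
# Module path and parameter names are assumptions and vary by Pipecat version.
from pipecat.transports.network.websocket_server import (
    WebsocketServerParams,
    WebsocketServerTransport,
)

transport = WebsocketServerTransport(
    params=WebsocketServerParams(
        host="0.0.0.0",
        port=8765,
        audio_in_enabled=True,
        audio_out_enabled=True,
        # Leave client-side VAD off: Deepslate performs VAD server-side.
    )
)
# Use transport.input() / transport.output() in the pipeline,
# as in the Quick Start example.
```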
Contributing
This plugin is open source. Visit the deepslate-sdks monorepo to:

- Report issues
- Submit pull requests
- Request features
Next Steps
API Reference
Full message schemas and configuration options
Pipecat Docs
Pipecat framework documentation
GitHub Repository
Source code, issues, and contributions