Documentation Index
Fetch the complete documentation index at: https://docs.deepslate.eu/llms.txt
Use this file to discover all available pages before exploring further.
The deepslate-livekit package provides a RealtimeModel implementation for the LiveKit Agents framework, enabling seamless integration with Deepslate’s unified voice AI infrastructure.
This plugin lives in the deepslate-sdks monorepo. We welcome contributions — feel free to open issues or pull requests there.
Prerequisites
- A Deepslate account with API credentials
- Python 3.11+
- LiveKit server and API credentials
- (Optional) ElevenLabs API key for server-side TTS
Installation
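The install command below is an assumption based on the package name used in this page's intro; check PyPI or the deepslate-sdks monorepo for the published name.

```shell
pip install deepslate-livekit
```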
Environment Variables
Set up your credentials as environment variables:

| Variable | Required | Description |
|---|---|---|
| DEEPSLATE_VENDOR_ID | Yes | Your Deepslate vendor ID |
| DEEPSLATE_ORGANIZATION_ID | Yes | Your Deepslate organization ID |
| DEEPSLATE_API_KEY | Yes | Your Deepslate API key |
| ELEVENLABS_API_KEY | No | ElevenLabs API key for server-side TTS |
| ELEVENLABS_VOICE_ID | No | ElevenLabs voice ID |
| ELEVENLABS_MODEL_ID | No | ElevenLabs model (e.g., eleven_turbo_v2) |
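For example, in a shell (placeholder values):

```shell
# Required Deepslate credentials
export DEEPSLATE_VENDOR_ID="your-vendor-id"
export DEEPSLATE_ORGANIZATION_ID="your-organization-id"
export DEEPSLATE_API_KEY="your-api-key"

# Optional: only needed when using ElevenLabs server-side TTS
export ELEVENLABS_API_KEY="your-elevenlabs-api-key"
export ELEVENLABS_VOICE_ID="your-voice-id"
export ELEVENLABS_MODEL_ID="eleven_turbo_v2"
```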
Quick Start
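A minimal worker sketch. The import path livekit.plugins.deepslate is an assumption (check the package for the actual module name), and the entrypoint follows the standard LiveKit Agents pattern; imports are deferred into the builder so the sketch stays self-contained:

```python
def build_worker_options():
    """Wire Deepslate's RealtimeModel into a LiveKit Agents worker (sketch)."""
    from livekit import agents
    from livekit.plugins.deepslate import RealtimeModel  # assumed import path

    async def entrypoint(ctx):
        await ctx.connect()
        session = agents.AgentSession(
            # Credentials fall back to the DEEPSLATE_* environment variables.
            llm=RealtimeModel(),
        )
        await session.start(
            room=ctx.room,
            agent=agents.Agent(instructions="You are a helpful assistant."),
        )

    return agents.WorkerOptions(entrypoint_fnc=entrypoint)
```

Run the worker with agents.cli.run_app(build_worker_options()).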
Configuration Reference
RealtimeModel Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| vendor_id | str \| None | env: DEEPSLATE_VENDOR_ID | Deepslate vendor ID |
| organization_id | str \| None | env: DEEPSLATE_ORGANIZATION_ID | Deepslate organization ID |
| api_key | str \| None | env: DEEPSLATE_API_KEY | Deepslate API key |
| base_url | str | https://app.deepslate.eu | Base URL for the Deepslate API |
| system_prompt | str | "You are a helpful assistant." | Default system prompt |
| temperature | float | 1.0 | Sampling temperature (0.0–2.0) |
| generate_reply_timeout | float | 30.0 | Timeout in seconds for generate_reply (0 = no timeout) |
| ws_url | str \| None | None | Direct WebSocket URL override, useful for local development |
| tts_config | ElevenLabsTtsConfig \| HostedTtsConfig \| None | None | TTS configuration (enables audio output). Use ElevenLabsTtsConfig for ElevenLabs synthesis or HostedTtsConfig for Deepslate-hosted cloned voices. |
| http_session | aiohttp.ClientSession \| None | None | Shared aiohttp session |
| vad_confidence_threshold | float | 0.5 | Minimum confidence to consider audio as speech (0.0–1.0) |
| vad_min_volume | float | 0.01 | Minimum volume threshold (0.0–1.0) |
| vad_start_duration_ms | int | 200 | Duration of speech to detect start (ms) |
| vad_stop_duration_ms | int | 500 | Duration of silence to detect speech end (ms) |
| vad_backbuffer_duration_ms | int | 1000 | Audio buffered before speech detection (ms) |
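A hedged constructor sketch; each keyword mirrors a row of the table above, while the import path is an assumption:

```python
def make_model():
    from livekit.plugins.deepslate import RealtimeModel  # assumed import path

    # Every keyword below corresponds to a row in the parameter table.
    return RealtimeModel(
        system_prompt="You are a concise booking assistant.",
        temperature=0.7,
        generate_reply_timeout=15.0,   # seconds; 0 disables the timeout
        vad_stop_duration_ms=700,      # allow slightly longer pauses per turn
    )
```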
VAD Configuration

Voice Activity Detection is handled server-side by Deepslate. You tune it via the vad_* parameters on RealtimeModel; no client-side VAD pipeline is needed.

| Parameter | Default | Description |
|---|---|---|
| vad_confidence_threshold | 0.5 | Minimum confidence score to classify audio as speech (0.0–1.0) |
| vad_min_volume | 0.01 | Minimum audio volume to consider (0.0–1.0) |
| vad_start_duration_ms | 200 | Consecutive speech duration required to start a turn (ms) |
| vad_stop_duration_ms | 500 | Silence duration required to end a turn (ms) |
| vad_backbuffer_duration_ms | 1000 | Audio buffered before the detection window (ms) |
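The thresholds compose as follows. This pure-Python model is illustrative only (the real detection runs server-side, and the 20 ms frame size is an assumption): a frame counts as speech only when both the confidence and volume gates pass, a turn starts after vad_start_duration_ms of continuous speech, and ends after vad_stop_duration_ms of silence.

```python
FRAME_MS = 20  # assumed frame size, purely for this illustration

def is_speech_frame(confidence, volume,
                    confidence_threshold=0.5, min_volume=0.01):
    # Both gates must pass, mirroring vad_confidence_threshold and vad_min_volume.
    return confidence >= confidence_threshold and volume >= min_volume

def detect_turns(frames, start_ms=200, stop_ms=500):
    # frames: list of (confidence, volume) pairs, FRAME_MS apart.
    turns, turn_start = [], None
    speech_run = silence_run = 0
    for i, (conf, vol) in enumerate(frames):
        if is_speech_frame(conf, vol):
            speech_run += FRAME_MS
            silence_run = 0
            if turn_start is None and speech_run >= start_ms:
                turn_start = i - start_ms // FRAME_MS + 1  # where speech began
        else:
            silence_run += FRAME_MS
            speech_run = 0
            if turn_start is not None and silence_run >= stop_ms:
                turns.append((turn_start, i - stop_ms // FRAME_MS))
                turn_start = None
    if turn_start is not None:              # turn still open at end of input
        turns.append((turn_start, len(frames) - 1))
    return turns
```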
HostedTtsConfig

Use a voice cloned and hosted within Deepslate; no external TTS provider credentials are required. Pass an instance to RealtimeModel(tts_config=...) to enable audio output.

| Parameter | Type | Default | Description |
|---|---|---|---|
| voice_id | str | required | The ID of the hosted (cloned) voice to use for synthesis |
| mode | HostedTtsMode | HostedTtsMode.HIGH_QUALITY | Quality/latency tradeoff for synthesis |

HostedTtsMode values:

| Value | Description |
|---|---|
| HIGH_QUALITY | Best output quality with still relatively low latency. Recommended for most use cases (default). |
| LOW_LATENCY | Low-latency generation that completes almost immediately. Output quality may be significantly reduced. |
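A sketch of wiring a hosted voice into the model; the import path is an assumption, while the class and field names come from the tables above:

```python
def make_hosted_tts_model():
    # Assumed import path; HostedTtsConfig fields come from the table above.
    from livekit.plugins.deepslate import (
        HostedTtsConfig,
        HostedTtsMode,
        RealtimeModel,
    )
    return RealtimeModel(
        tts_config=HostedTtsConfig(
            voice_id="your-hosted-voice-id",   # placeholder
            mode=HostedTtsMode.LOW_LATENCY,    # trade some quality for speed
        ),
    )
```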
ElevenLabsTtsConfig

Configure server-side text-to-speech with ElevenLabs. Pass an instance to RealtimeModel(tts_config=...) to enable audio output and automatic interruption handling. Use ElevenLabsTtsConfig.from_env() to create a config from environment variables.

| Parameter | Type | Description |
|---|---|---|
| api_key | str | ElevenLabs API key (env: ELEVENLABS_API_KEY) |
| voice_id | str | Voice ID (env: ELEVENLABS_VOICE_ID) |
| model_id | str \| None | Model ID, e.g., eleven_turbo_v2 (env: ELEVENLABS_MODEL_ID) |
| location | ElevenLabsLocation | API endpoint region: US (default), EU, or INDIA |
| voice_settings | ElevenLabsVoiceSettingsConfig \| None | Fine-grained voice control (see below) |

ElevenLabsVoiceSettingsConfig provides fine-grained control over the synthesized voice:

| Parameter | Type | Description |
|---|---|---|
| stability | float \| None | Voice consistency (0.0–1.0); higher = more stable |
| similarity_boost | float \| None | Clarity and similarity to the original voice (0.0–1.0) |
| style | float \| None | Style exaggeration (0.0–1.0) |
| use_speaker_boost | bool \| None | Boost similarity to the original speaker |
| speed | float \| None | Speaking speed multiplier |
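A sketch combining the two tables above; the import path is an assumption, and the credential values are placeholders:

```python
def make_elevenlabs_model():
    # Assumed import path; parameter names come from the tables above.
    from livekit.plugins.deepslate import (
        ElevenLabsTtsConfig,
        ElevenLabsVoiceSettingsConfig,
        RealtimeModel,
    )
    tts = ElevenLabsTtsConfig(
        api_key="your-elevenlabs-api-key",  # placeholder; or use from_env()
        voice_id="your-voice-id",
        model_id="eleven_turbo_v2",
        voice_settings=ElevenLabsVoiceSettingsConfig(
            stability=0.6,
            speed=1.05,
        ),
    )
    return RealtimeModel(tts_config=tts)
```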
Features
- Real-time Voice Streaming: Low-latency bidirectional audio streaming for natural conversations
- Server-side VAD: Voice activity detection handled server-side for reliable, configurable speech detection
- Function Tools: Define and use function tools with the @function_tool() decorator
- ElevenLabs TTS: Server-side TTS with regional endpoints and fine-grained voice settings
- Low Latency Mode: Hosted voice TTS supports a low-latency mode for the fastest possible response at the cost of some output quality
- Direct Speech: Speak text directly via TTS without routing through the LLM
- Conversation Queries: Run one-shot side-channel inference without affecting the main conversation
- Chat History Export: Export the full conversation history on demand
- Live Configuration: Update the system prompt and temperature mid-session without reconnecting
Session Initialized Event
DeepslateRealtimeSession emits a "session_initialized" event once the WebSocket session is fully set up and ready to accept messages.
model.session() is available after the AgentSession is created. Register the listener before calling session.start() to avoid missing the event.
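A registration sketch; the event name comes from this page, while the decorator-style .on(...) registration is an assumption based on common emitter APIs:

```python
def register_ready_listener(model):
    # model.session() is available once the AgentSession has been created.
    session = model.session()

    # Decorator-style registration is assumed; the event name is documented above.
    @session.on("session_initialized")
    def _on_ready():
        print("Deepslate session is ready to accept messages")
```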
Function Tools
Use the @function_tool() decorator to give your agent capabilities:
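A minimal tool sketch; @function_tool() is provided by the LiveKit Agents framework, though the exact import path may differ between framework versions, and the weather lookup is stubbed:

```python
def build_weather_tool():
    # Import deferred so the sketch stays self-contained; path may vary.
    from livekit.agents import function_tool

    @function_tool()
    async def lookup_weather(location: str) -> str:
        """Return the current weather for a location (stubbed for illustration)."""
        return f"It is sunny in {location}."

    return lookup_weather
```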
Direct Speech
speak_direct() lets you synthesize and play audio directly — bypassing the LLM entirely. This is useful for scripted prompts, confirmations, or fallback messages.
Setting include_in_history=False speaks the text without adding it to the conversation context, which is ideal for system-level announcements.
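For example, a hold announcement; this assumes speak_direct and include_in_history (both described above) are exposed on the Deepslate realtime session object:

```python
async def announce_hold(session):
    # speak_direct bypasses the LLM entirely (see the description above).
    await session.speak_direct(
        "One moment while I check that for you.",
        include_in_history=False,  # keep the announcement out of the LLM context
    )
```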
Conversation Queries
query_conversation() runs a one-shot inference call on a side channel, separate from the main conversational turn. The result is returned as a string and does not affect the conversation history or trigger any audio.
Pass instructions to further constrain the model's output format.
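A side-channel query sketch; query_conversation is described above, and the instructions keyword is an assumption based on the note about constraining the output format:

```python
async def classify_sentiment(session):
    # Runs on a side channel: the result is a plain string and never
    # enters the chat history or triggers audio (per the description above).
    return await session.query_conversation(
        "Classify the caller's sentiment so far.",
        instructions="Answer with exactly one word: positive, neutral, or negative.",
    )
```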
Chat History Export
Export the full conversation history as a list of structured message dicts at any point during a session. Each message follows the ChatMessageDict structure, with role, delivery_status, ephemeral, and a content list of typed content blocks (text, input_audio, tool_call, tool_result, etc.).
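A consumption sketch; the method name export_chat_history is hypothetical (this page documents only the ChatMessageDict shape of each exported message):

```python
def print_history(session):
    # export_chat_history is a hypothetical name for the export call.
    for message in session.export_chat_history():
        # Keys below follow the documented ChatMessageDict structure.
        print(message["role"], message["delivery_status"], message["ephemeral"])
        for block in message["content"]:   # typed blocks: text, input_audio, ...
            print("  ", block)
```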
Live Configuration
Update the system prompt or temperature mid-session without reconnecting.
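A sketch of a mid-session update; update_options is a hypothetical method name, as this page states only that the system prompt and temperature can change without reconnecting:

```python
async def tighten_replies(session):
    # update_options is hypothetical; only the two tunable fields are documented.
    await session.update_options(
        system_prompt="Keep every answer under two sentences.",
        temperature=0.3,
    )
```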
Contributing
This plugin is open source. Visit the deepslate-sdks monorepo to:
- Report issues
- Submit pull requests
- Request features
Next Steps
- WebSocket API: Low-level WebSocket access for custom integrations
- API Reference: Full message schemas and configuration options
- LiveKit Agents Docs: LiveKit Agents framework documentation
- GitHub Repository: Source code, issues, and contributions