The `deepslate-livekit` package provides a `RealtimeModel` implementation for the LiveKit Agents framework, so that LiveKit agents can run on Deepslate’s unified voice AI infrastructure.
This plugin is in early development. We welcome contributions! See the GitHub repository to get involved.
Prerequisites
- A Deepslate account with API credentials
- Python 3.11+
- LiveKit server and API credentials
- (Optional) ElevenLabs API key for server-side TTS
Installation
Environment Variables
Set up your credentials as environment variables:

| Variable | Required | Description |
|---|---|---|
| `DEEPSLATE_VENDOR_ID` | Yes | Your Deepslate vendor ID |
| `DEEPSLATE_ORGANIZATION_ID` | Yes | Your Deepslate organization ID |
| `DEEPSLATE_API_KEY` | Yes | Your Deepslate API key |
| `ELEVENLABS_API_KEY` | No | ElevenLabs API key for TTS |
| `ELEVENLABS_VOICE_ID` | No | ElevenLabs voice ID |
| `ELEVENLABS_MODEL_ID` | No | ElevenLabs model (e.g., `eleven_turbo_v2`) |
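A quick startup check can catch a missing credential before the agent tries to connect. The following is a minimal sketch using only the variable names from the table above; the helper names `missing_required` and `optional_present` are hypothetical and not part of the plugin.

```python
import os
from collections.abc import Mapping

# Variable names taken from the table above.
REQUIRED_VARS = (
    "DEEPSLATE_VENDOR_ID",
    "DEEPSLATE_ORGANIZATION_ID",
    "DEEPSLATE_API_KEY",
)
OPTIONAL_VARS = (
    "ELEVENLABS_API_KEY",
    "ELEVENLABS_VOICE_ID",
    "ELEVENLABS_MODEL_ID",
)


def missing_required(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty (hypothetical helper)."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def optional_present(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the optional TTS variables that are set (hypothetical helper)."""
    return [name for name in OPTIONAL_VARS if env.get(name)]
```

Calling `missing_required()` at process start and exiting with a clear message when the list is non-empty avoids a less obvious authentication failure later.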
Quick Start
Configuration Reference
RealtimeModel Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `vendor_id` | `str` | env: `DEEPSLATE_VENDOR_ID` | Deepslate vendor ID |
| `organization_id` | `str` | env: `DEEPSLATE_ORGANIZATION_ID` | Deepslate organization ID |
| `api_key` | `str` | env: `DEEPSLATE_API_KEY` | Deepslate API key |
| `base_url` | `str` | `https://app.deepslate.eu` | Base URL for Deepslate API |
| `system_prompt` | `str` | `"You are a helpful assistant."` | System prompt for the model |
| `generate_reply_timeout` | `float` | `30.0` | Timeout in seconds for `generate_reply` (0 = no timeout) |
| `tts_config` | `ElevenLabsTtsConfig` | `None` | TTS configuration (enables audio output) |
VAD Configuration
Voice Activity Detection settings control how the server detects when the user starts and stops speaking.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `vad_confidence_threshold` | `float` | `0.5` | Minimum confidence to consider audio as speech (0.0-1.0) |
| `vad_min_volume` | `float` | `0.01` | Minimum volume threshold (0.0-1.0) |
| `vad_start_duration_ms` | `int` | `200` | Duration of speech to detect start (ms) |
| `vad_stop_duration_ms` | `int` | `500` | Duration of silence to detect end (ms) |
| `vad_backbuffer_duration_ms` | `int` | `1000` | Audio buffer before speech detection (ms) |
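To build intuition for how these thresholds interact, the sketch below simulates start and stop detection over a sequence of audio frames. It is one plausible reading of the parameter descriptions above, not Deepslate's server-side algorithm; the names `VadSettings` and `detect_events`, the fixed 20 ms frame size, and the per-frame `(confidence, volume)` input are assumptions made for illustration. The back buffer is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VadSettings:
    """Hypothetical container mirroring the documented defaults."""
    confidence_threshold: float = 0.5
    min_volume: float = 0.01
    start_duration_ms: int = 200
    stop_duration_ms: int = 500


def detect_events(frames, settings=VadSettings(), frame_ms=20):
    """Return ("start"/"stop", frame_index) events for (confidence, volume) frames."""
    events = []
    speaking = False
    speech_ms = 0   # length of the current run of speech frames
    silence_ms = 0  # length of the current run of non-speech frames
    for index, (confidence, volume) in enumerate(frames):
        is_speech = (
            confidence >= settings.confidence_threshold
            and volume >= settings.min_volume
        )
        if is_speech:
            speech_ms += frame_ms
            silence_ms = 0
        else:
            silence_ms += frame_ms
            speech_ms = 0
        if not speaking and speech_ms >= settings.start_duration_ms:
            speaking = True
            events.append(("start", index))
        elif speaking and silence_ms >= settings.stop_duration_ms:
            speaking = False
            events.append(("stop", index))
    return events
```

In this model, lowering `stop_duration_ms` makes the agent respond sooner after the user pauses, at the cost of cutting in during short hesitations; raising `start_duration_ms` filters out brief noises.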
ElevenLabsTtsConfig
Configure server-side text-to-speech with ElevenLabs.
| Parameter | Type | Description |
|---|---|---|
| `api_key` | `str` | ElevenLabs API key (env: `ELEVENLABS_API_KEY`) |
| `voice_id` | `str` | Voice ID (env: `ELEVENLABS_VOICE_ID`) |
| `model_id` | `str \| None` | Model ID, e.g., `eleven_turbo_v2` (env: `ELEVENLABS_MODEL_ID`) |

Use `ElevenLabsTtsConfig.from_env()` to create a config from environment variables.

Features
Real-time Voice Streaming
Low-latency bidirectional audio streaming for natural conversations
Server-side VAD
Voice activity detection handled server-side for reliable speech detection
Function Tools
Define and use function tools with the `@function_tool()` decorator
ElevenLabs TTS
Server-side text-to-speech with automatic interruption handling
Function Tools
Use the `@function_tool()` decorator to give your agent capabilities:
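As a rough sketch of what such a tool can look like, the example below defines a standalone tool with the `function_tool` decorator from `livekit.agents`. The tool itself (`get_opening_hours`), its helper, and its data are hypothetical and exist only for illustration; the import fallback is there so the snippet can also run where LiveKit Agents is not installed.

```python
try:
    from livekit.agents import function_tool
except ImportError:
    # Fallback stand-in so this sketch runs without LiveKit Agents installed.
    def function_tool(*_args, **_kwargs):
        def decorator(func):
            return func
        return decorator

# Hypothetical data source, used only for illustration.
_OPENING_HOURS = {"monday": "09:00-17:00", "saturday": "10:00-14:00"}


def opening_hours_for(day: str) -> str:
    """Plain helper holding the tool's logic so it can be tested on its own."""
    hours = _OPENING_HOURS.get(day.strip().lower())
    return f"Open {hours}" if hours else "Closed"


@function_tool()
async def get_opening_hours(day: str) -> str:
    """Return the opening hours for a given weekday.

    Args:
        day: Weekday name, for example "monday".
    """
    return opening_hours_for(day)
```

Keeping the logic in a plain helper and the decorated function thin makes the tool easy to unit test without a running agent session. How the tool is then registered with an agent depends on your LiveKit Agents setup.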
Contributing
This plugin is open source and we welcome contributions. Visit the GitHub repository to:
- Report issues
- Submit pull requests
- Request features