Use the @deepslate-labs/livekit package to add a RealtimeModel implementation to the LiveKit Agents Node.js / TypeScript framework, so you can integrate with Deepslate's unified voice AI infrastructure.
Using the Python LiveKit framework instead? See the LiveKit Plugin (Python) page for the deepslate-livekit package. The two plugins share the same configuration model and feature set; this page is written against the Node framework’s API (the realtime classes live under the llm namespace, and audio frames come from @livekit/rtc-node).

This plugin lives in the deepslate-sdks monorepo. We welcome contributions: feel free to open issues or pull requests there.
Prerequisites
- A Deepslate account with API credentials
- Node.js 18+
- LiveKit server and API credentials
- (Optional) ElevenLabs API key for server-side TTS
Installation
Install the @deepslate-labs/livekit package from npm. You don’t need to install @deepslate-labs/core separately; it’s pulled in automatically.
Environment Variables
Set up your credentials as environment variables:

| Variable | Required | Description |
|---|---|---|
| DEEPSLATE_VENDOR_ID | Yes | Your Deepslate vendor ID |
| DEEPSLATE_ORGANIZATION_ID | Yes | Your Deepslate organization ID |
| DEEPSLATE_API_KEY | Yes | Your Deepslate API key |
| ELEVENLABS_API_KEY | No | ElevenLabs API key for server-side TTS |
| ELEVENLABS_VOICE_ID | No | ElevenLabs voice ID |
| ELEVENLABS_MODEL_ID | No | ElevenLabs model (e.g., eleven_turbo_v2) |
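The constructor falls back to these variables when credentials are not passed explicitly. A small startup check lets you fail fast instead of discovering a missing value later. The sketch below is a hypothetical helper: missingDeepslateVars and assertDeepslateEnv are not part of the plugin, and only the variable names come from the table.

```typescript
// Hypothetical startup check; only the variable names come from the table above.
const REQUIRED_DEEPSLATE_VARS = [
  "DEEPSLATE_VENDOR_ID",
  "DEEPSLATE_ORGANIZATION_ID",
  "DEEPSLATE_API_KEY",
] as const;

type Env = Record<string, string | undefined>;

/** Returns the required Deepslate variables that are unset or blank. */
function missingDeepslateVars(env: Env): string[] {
  return REQUIRED_DEEPSLATE_VARS.filter((name) => !env[name]?.trim());
}

/** Throws one error listing every missing variable, so setup fails fast. */
function assertDeepslateEnv(env: Env): void {
  const missing = missingDeepslateVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
}
```

In an agent entrypoint you would call assertDeepslateEnv(process.env) before constructing the model. The ElevenLabs variables are deliberately left out because they are optional.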
Quick Start
Configuration Reference
RealtimeModel Options
The RealtimeModel constructor takes a single options object (RealtimeModelOptions):

| Field | Type | Default | Description |
|---|---|---|---|
| vendorId | string | env: DEEPSLATE_VENDOR_ID | Deepslate vendor ID |
| organizationId | string | env: DEEPSLATE_ORGANIZATION_ID | Deepslate organization ID |
| apiKey | string | env: DEEPSLATE_API_KEY | Deepslate API key |
| baseUrl | string | "https://app.deepslate.eu" | Base URL for the Deepslate API |
| systemPrompt | string | "You are a helpful assistant." | Default system prompt |
| temperature | number | 1.0 | Sampling temperature (0.0–2.0) |
| generateReplyTimeout | number | 30.0 | Timeout in seconds for generateReply (0 = no timeout) |
| vad | VadConfig | built-in defaults | Voice activity detection tuning (see below) |
| ttsConfig | TtsConfig | undefined | TTS configuration; enables audio output. Use a hosted or ElevenLabs config (see below). |
| wsUrl | string | undefined | Direct WebSocket URL override, useful for local development |
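The following sketch makes the defaults above concrete by showing how an omitted field resolves. It is not the plugin's internal code. OptionsSketch and resolveOptionsSketch are hypothetical, and throwing on a missing credential is an assumption. The sketch also illustrates the generateReplyTimeout convention: the value is in seconds, and 0 disables the timeout.

```typescript
// Hypothetical sketch mirroring the documented RealtimeModelOptions defaults.
// NOT the plugin's implementation; names and defaults come from the table above.
interface OptionsSketch {
  vendorId?: string;
  organizationId?: string;
  apiKey?: string;
  baseUrl?: string;
  systemPrompt?: string;
  temperature?: number;
  generateReplyTimeout?: number; // seconds; 0 = no timeout
}

interface ResolvedSketch {
  vendorId: string;
  organizationId: string;
  apiKey: string;
  baseUrl: string;
  systemPrompt: string;
  temperature: number;
  replyTimeoutMs: number | null; // null = wait indefinitely
}

type Env = Record<string, string | undefined>;

// Explicit option wins; otherwise fall back to the environment variable.
function fromOptionOrEnv(value: string | undefined, env: Env, name: string): string {
  const resolved = value ?? env[name];
  if (!resolved) throw new Error(`Missing ${name}: pass it explicitly or set the env var`);
  return resolved;
}

function resolveOptionsSketch(opts: OptionsSketch, env: Env): ResolvedSketch {
  const temperature = opts.temperature ?? 1.0;
  if (temperature < 0 || temperature > 2) {
    throw new RangeError(`temperature must be within 0.0–2.0, got ${temperature}`);
  }
  const timeoutSec = opts.generateReplyTimeout ?? 30.0;
  if (timeoutSec < 0) throw new RangeError("generateReplyTimeout must be >= 0");
  return {
    vendorId: fromOptionOrEnv(opts.vendorId, env, "DEEPSLATE_VENDOR_ID"),
    organizationId: fromOptionOrEnv(opts.organizationId, env, "DEEPSLATE_ORGANIZATION_ID"),
    apiKey: fromOptionOrEnv(opts.apiKey, env, "DEEPSLATE_API_KEY"),
    baseUrl: opts.baseUrl ?? "https://app.deepslate.eu",
    systemPrompt: opts.systemPrompt ?? "You are a helpful assistant.",
    temperature,
    // Seconds to milliseconds; 0 means "no timeout".
    replyTimeoutMs: timeoutSec === 0 ? null : timeoutSec * 1000,
  };
}
```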
VAD Configuration
Voice Activity Detection is handled server-side by Deepslate. You tune it via the vad object on RealtimeModel; no client-side VAD pipeline is needed.

| Field | Default | Description |
|---|---|---|
| confidenceThreshold | 0.5 | Minimum confidence score to classify audio as speech (0.0–1.0) |
| minVolume | 0.01 | Minimum audio volume to consider (0.0–1.0) |
| startDurationMs | 200 | Consecutive speech duration required to start a turn (ms) |
| stopDurationMs | 500 | Silence duration required to end a turn (ms) |
| backbufferDurationMs | 1000 | Audio buffered before the detection window (ms) |

Tuning tips:
- Noisy environments: increase confidenceThreshold (0.6–0.8) and minVolume (0.02–0.05)
- Lower latency: decrease startDurationMs (100–150) and stopDurationMs (200–300)
- Natural pacing: slightly increase stopDurationMs (600–800)
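The tuning tips translate naturally into partial overrides layered on top of the documented defaults. In this sketch, VadSketch is a local stand-in for the plugin's VadConfig. The preset values are arbitrary picks from inside each recommended range, not official presets, and buildVad is a hypothetical helper.

```typescript
// Local stand-in mirroring the VAD table; not the plugin's exported VadConfig.
interface VadSketch {
  confidenceThreshold: number;
  minVolume: number;
  startDurationMs: number;
  stopDurationMs: number;
  backbufferDurationMs: number;
}

// Documented defaults.
const VAD_DEFAULTS: VadSketch = {
  confidenceThreshold: 0.5,
  minVolume: 0.01,
  startDurationMs: 200,
  stopDurationMs: 500,
  backbufferDurationMs: 1000,
};

type VadPreset = "noisy" | "lowLatency" | "naturalPacing";

// Values chosen from within the ranges in the tuning tips (illustrative only).
const VAD_PRESETS: Record<VadPreset, Partial<VadSketch>> = {
  noisy: { confidenceThreshold: 0.7, minVolume: 0.03 },
  lowLatency: { startDurationMs: 120, stopDurationMs: 250 },
  naturalPacing: { stopDurationMs: 700 },
};

// Later spreads win: defaults, then preset, then caller overrides.
function buildVad(preset: VadPreset, overrides: Partial<VadSketch> = {}): VadSketch {
  const vad: VadSketch = { ...VAD_DEFAULTS, ...VAD_PRESETS[preset], ...overrides };
  // Threshold and volume are documented as 0.0–1.0 ranges.
  for (const key of ["confidenceThreshold", "minVolume"] as const) {
    if (vad[key] < 0 || vad[key] > 1) {
      throw new RangeError(`${key} must be within 0.0–1.0, got ${vad[key]}`);
    }
  }
  return vad;
}
```

If the plugin's VadConfig accepts these same field names (as the table suggests), the result could be passed as the vad option, for example new RealtimeModel({ vad: buildVad("noisy") }).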
HostedTtsConfig
Use a voice cloned and hosted within Deepslate; no external TTS provider credentials are required. Pass it as ttsConfig to enable audio output.

| Field | Type | Default | Description |
|---|---|---|---|
| provider | "hosted" | required | Selects the Deepslate-hosted TTS provider |
| voiceId | string | required | ID of the hosted (cloned) voice to use for synthesis |
| mode | HostedTtsMode | HostedTtsMode.HIGH_QUALITY | Quality/latency tradeoff for synthesis |

HostedTtsMode values:

| Value | Description |
|---|---|
| HIGH_QUALITY | Best output quality at still relatively low latency. Recommended for most use cases (default). |
| LOW_LATENCY | Fastest generation mode, with near-instant synthesis. Output quality may be significantly reduced. |
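Because ttsConfig accepts either a hosted or an ElevenLabs config, the provider field acts as a discriminant. The sketch below uses local stand-in types (HostedTtsSketch, ElevenLabsTtsSketch). It replaces the plugin's exported HostedTtsMode enum with a const object, since the real enum's runtime values may differ. hostedTts() and describeTts() are hypothetical helpers.

```typescript
// Stand-in for the plugin's HostedTtsMode; real runtime values may differ.
const HostedTtsModeSketch = {
  HIGH_QUALITY: "HIGH_QUALITY",
  LOW_LATENCY: "LOW_LATENCY",
} as const;
type HostedTtsModeSketch = (typeof HostedTtsModeSketch)[keyof typeof HostedTtsModeSketch];

interface HostedTtsSketch {
  provider: "hosted";
  voiceId: string;
  mode?: HostedTtsModeSketch; // HIGH_QUALITY when omitted, per the table
}

interface ElevenLabsTtsSketch {
  provider: "eleven_labs";
  apiKey: string;
  voiceId: string;
  modelId: string;
}

type TtsSketch = HostedTtsSketch | ElevenLabsTtsSketch;

// Opt into LOW_LATENCY only when the caller explicitly trades quality for speed.
function hostedTts(voiceId: string, preferLowLatency = false): HostedTtsSketch {
  if (!voiceId) throw new Error("voiceId is required for hosted TTS");
  return {
    provider: "hosted",
    voiceId,
    mode: preferLowLatency ? HostedTtsModeSketch.LOW_LATENCY : HostedTtsModeSketch.HIGH_QUALITY,
  };
}

// Switching on the discriminant narrows cfg to the matching config type.
function describeTts(cfg: TtsSketch): string {
  switch (cfg.provider) {
    case "hosted":
      return `hosted voice ${cfg.voiceId} (${cfg.mode ?? HostedTtsModeSketch.HIGH_QUALITY})`;
    case "eleven_labs":
      return `ElevenLabs voice ${cfg.voiceId} with model ${cfg.modelId}`;
  }
}
```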
ElevenLabsTtsConfig
Configure server-side text-to-speech with ElevenLabs. Pass it as ttsConfig to enable audio output and automatic interruption handling.

| Field | Type | Description |
|---|---|---|
| provider | "eleven_labs" | Selects the ElevenLabs TTS provider |
| apiKey | string | ElevenLabs API key (env: ELEVENLABS_API_KEY) |
| voiceId | string | Voice ID (env: ELEVENLABS_VOICE_ID) |
| modelId | string | Model ID, e.g., eleven_turbo_v2 (env: ELEVENLABS_MODEL_ID) |
| location | ElevenLabsLocation | API endpoint region: US (default), EU, or INDIA |
| voiceSettings | ElevenLabsVoiceSettings | Fine-grained voice control (see below) |

Use elevenLabsConfigFromEnv() to build a config from environment variables.

ElevenLabsVoiceSettings provides fine-grained control over the synthesized voice:

| Field | Type | Description |
|---|---|---|
| stability | number | Voice consistency (0.0–1.0); higher = more stable |
| similarityBoost | number | Clarity and similarity to the original voice (0.0–1.0) |
| style | number | Style exaggeration (0.0–1.0) |
| useSpeakerBoost | boolean | Boost similarity to the original speaker |
| speed | number | Speaking speed multiplier |
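For reference, this sketch shows roughly how the documented environment variables map onto an ElevenLabs config. It also range-checks the 0.0–1.0 voice settings. It is a hypothetical approximation, not the implementation of elevenLabsConfigFromEnv(). In particular, requiring all three variables and rejecting a non-positive speed are assumptions of the sketch. Prefer the plugin's own helper in real code.

```typescript
// Hypothetical approximation; elevenLabsConfigFromEnv() may behave differently.
type Env = Record<string, string | undefined>;

interface VoiceSettingsSketch {
  stability?: number;
  similarityBoost?: number;
  style?: number;
  useSpeakerBoost?: boolean;
  speed?: number;
}

interface ElevenLabsSketch {
  provider: "eleven_labs";
  apiKey: string;
  voiceId: string;
  modelId: string;
  voiceSettings?: VoiceSettingsSketch; // location omitted: the table says US is the default
}

function elevenLabsFromEnvSketch(env: Env, voiceSettings?: VoiceSettingsSketch): ElevenLabsSketch {
  const apiKey = env["ELEVENLABS_API_KEY"];
  const voiceId = env["ELEVENLABS_VOICE_ID"];
  const modelId = env["ELEVENLABS_MODEL_ID"];
  // Assumption of this sketch: all three variables must be present.
  if (!apiKey || !voiceId || !modelId) {
    throw new Error("Set ELEVENLABS_API_KEY, ELEVENLABS_VOICE_ID and ELEVENLABS_MODEL_ID");
  }
  if (voiceSettings) validateVoiceSettings(voiceSettings);
  return { provider: "eleven_labs", apiKey, voiceId, modelId, voiceSettings };
}

// stability, similarityBoost and style are documented as 0.0–1.0.
function validateVoiceSettings(s: VoiceSettingsSketch): void {
  for (const key of ["stability", "similarityBoost", "style"] as const) {
    const v = s[key];
    if (v !== undefined && (v < 0 || v > 1)) {
      throw new RangeError(`${key} must be within 0.0–1.0, got ${v}`);
    }
  }
  // Assumption: a speed multiplier must be positive.
  if (s.speed !== undefined && s.speed <= 0) throw new RangeError("speed must be positive");
}
```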
Features
Real-time Voice Streaming
Low-latency bidirectional audio streaming for natural conversations
Server-side VAD
Voice activity detection handled server-side for reliable, configurable speech detection
Function Tools
Define and use function tools with LiveKit’s llm.tool() helper
Flexible TTS
Server-side TTS via Deepslate-hosted (cloned) voices or ElevenLabs
Low Latency Mode
Hosted voice TTS supports a low latency mode for the fastest possible response at the cost of some output quality
Direct Speech
Speak text directly via TTS without routing through the LLM
Conversation Queries
Run one-shot side-channel inference without affecting the main conversation
Chat History Export
Export the full conversation history on demand
Live Configuration
Update the system prompt mid-session without reconnecting
Function Tools
Use LiveKit’s llm.tool() helper to expose tools to the model. Tool parameters are described with a zod schema.
The Deepslate Session
For Deepslate-specific capabilities (welcome messages, direct speech, conversation queries, history export, live configuration), obtain the underlying DeepslateRealtimeSession from the model with model.session(). Subscribe to its events with session.on(...).
Session Initialized Event
DeepslateRealtimeSession emits a "session_initialized" event once the WebSocket session is fully set up and ready to accept messages. Combine it with speakDirect() to send a welcome message instead of relying on a fixed delay.
Register the listener before the session connects to avoid missing the event.
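The advice to register the listener first follows from how Node-style event emitters work: an event emitted before any listener is attached is simply dropped. The sketch demonstrates this with a hypothetical FakeSession built on node:events. Only the "session_initialized" event name and the idea of calling speakDirect() come from this page. Treating DeepslateRealtimeSession as an EventEmitter is an assumption.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for DeepslateRealtimeSession; connect() and this
// speakDirect() signature are illustrative, not the plugin's API.
class FakeSession extends EventEmitter {
  spoken: string[] = [];

  connect(): void {
    // Emitted once the (simulated) WebSocket is ready. Node emitters do not
    // replay past events, so listeners added afterwards never see it.
    this.emit("session_initialized");
  }

  speakDirect(text: string): void {
    this.spoken.push(text);
  }
}

function greetOnReady(session: FakeSession, greeting: string): void {
  // once() detaches after the first event, so the greeting plays exactly once.
  session.once("session_initialized", () => session.speakDirect(greeting));
}
```

Calling greetOnReady() before connect() produces the greeting. Calling it afterwards produces nothing, which is the failure mode the note above warns about.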
Direct Speech
speakDirect() synthesizes and plays audio directly, bypassing the LLM entirely. This is useful for scripted prompts, confirmations, or fallback messages.
Passing false for its history option speaks the text without adding it to the conversation context, which is ideal for system-level announcements.
Conversation Queries
queryConversation() runs a one-shot inference call on a side channel, separate from the main conversational turn. The result is returned as a string and does not affect the conversation history or trigger any audio.
Use the instructions argument to further constrain the model’s output format.
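Since queryConversation() returns a plain string, a common pattern is to ask for JSON in the instructions and parse the answer defensively. The helpers below (parseJsonAnswer, readIntent) are hypothetical, as is the {"intent": ...} response shape. The model is not guaranteed to follow the requested format, so both helpers return undefined rather than throwing.

```typescript
// Hypothetical helpers; queryConversation() itself only promises a string.
function parseJsonAnswer(raw: string): unknown {
  // Prefer the body of a Markdown code fence if the model added one.
  const fenced = raw.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/);
  const candidate = (fenced?.[1] ?? raw).trim();
  // Fall back to the outermost {...} span, ignoring surrounding prose.
  const start = candidate.indexOf("{");
  const end = candidate.lastIndexOf("}");
  if (start === -1 || end < start) return undefined;
  try {
    return JSON.parse(candidate.slice(start, end + 1));
  } catch {
    return undefined; // not valid JSON despite the instructions
  }
}

// Example consumer for an assumed response shape like {"intent": "cancel"}.
function readIntent(raw: string): string | undefined {
  const parsed = parseJsonAnswer(raw);
  if (typeof parsed !== "object" || parsed === null) return undefined;
  const intent = (parsed as { intent?: unknown }).intent;
  return typeof intent === "string" ? intent : undefined;
}
```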
Chat History Export
Export the full conversation history at any point during a session. The result is delivered via the "chat_history_exported" event.
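The export result arrives as an event rather than a return value, so wrapping it in a Promise lets you await it with a timeout. This sketch treats the session as a Node EventEmitter, which is an assumption. It takes a trigger callback in place of the export call, whose name is not shown on this page. The payload is typed as unknown because its shape is not documented here. exportHistory is a hypothetical helper.

```typescript
import { EventEmitter, once } from "node:events";

// Hypothetical helper: resolves with the first "chat_history_exported"
// payload, or rejects with an AbortError if none arrives within timeoutMs.
async function exportHistory(
  session: EventEmitter,
  trigger: () => void, // stands in for the call that starts the export
  timeoutMs = 5000,
): Promise<unknown> {
  // Subscribe BEFORE triggering so a fast response cannot be missed.
  const pending = once(session, "chat_history_exported", {
    signal: AbortSignal.timeout(timeoutMs),
  });
  trigger();
  const [payload] = await pending; // once() resolves with the event's argument list
  return payload;
}
```

AbortSignal.timeout() is available in Node.js 18, which matches the prerequisite listed above.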
Live Configuration
Update the system prompt mid-session without reconnecting.
Contributing
This plugin is open source. Visit the deepslate-sdks monorepo to:
- Report issues
- Submit pull requests
- Request features
Next Steps
LiveKit Plugin (Python)
The Python edition of this plugin
WebSocket API
Low-level WebSocket access for custom integrations
API Reference
Full message schemas and configuration options
LiveKit Agents Docs
LiveKit Agents framework documentation
GitHub Repository
Source code, issues, and contributions