
Documentation Index

Fetch the complete documentation index at: https://docs.deepslate.eu/llms.txt

Use this file to discover all available pages before exploring further.

The deepslate-livekit package provides a RealtimeModel implementation for the LiveKit Agents framework, enabling seamless integration with Deepslate’s unified voice AI infrastructure.
This plugin lives in the deepslate-sdks monorepo. We welcome contributions — feel free to open issues or pull requests there.

Prerequisites

  • A Deepslate account with API credentials
  • Python 3.11+
  • LiveKit server and API credentials
  • (Optional) ElevenLabs API key for server-side TTS

Installation

pip install deepslate-livekit

Environment Variables

Set up your credentials as environment variables:
| Variable | Required | Description |
|---|---|---|
| `DEEPSLATE_VENDOR_ID` | Yes | Your Deepslate vendor ID |
| `DEEPSLATE_ORGANIZATION_ID` | Yes | Your Deepslate organization ID |
| `DEEPSLATE_API_KEY` | Yes | Your Deepslate API key |
| `ELEVENLABS_API_KEY` | No | ElevenLabs API key for server-side TTS |
| `ELEVENLABS_VOICE_ID` | No | ElevenLabs voice ID |
| `ELEVENLABS_MODEL_ID` | No | ElevenLabs model (e.g., `eleven_turbo_v2`) |

Never expose your Deepslate or ElevenLabs API keys to clients. This plugin is for server-side use with LiveKit Agents.
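Because missing credentials only surface as connection errors at runtime, it can help to fail fast at startup. A minimal sketch (the variable names come from the table above; the helper itself is not part of the plugin):

```python
import os

# Required Deepslate credentials, per the table above. The ELEVENLABS_*
# variables are only needed when using server-side ElevenLabs TTS.
REQUIRED_VARS = ["DEEPSLATE_VENDOR_ID", "DEEPSLATE_ORGANIZATION_ID", "DEEPSLATE_API_KEY"]

def missing_credentials(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Call `missing_credentials()` before constructing the model and raise if the list is non-empty.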

Quick Start

from livekit import agents
from livekit.agents import AgentServer, AgentSession, Agent, room_io

import deepslate.livekit.realtime
from deepslate.livekit.realtime import ElevenLabsTtsConfig

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice AI assistant.")

server = AgentServer()

@server.rtc_session()
async def my_agent(ctx: agents.JobContext):
    session = AgentSession(
        llm=deepslate.livekit.realtime.RealtimeModel(
            tts_config=ElevenLabsTtsConfig.from_env()
        ),
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(),
    )

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )

if __name__ == "__main__":
    agents.cli.run_app(server)

Configuration Reference

| Parameter | Type | Default | Description |
|---|---|---|---|
| `vendor_id` | `str \| None` | env: `DEEPSLATE_VENDOR_ID` | Deepslate vendor ID |
| `organization_id` | `str \| None` | env: `DEEPSLATE_ORGANIZATION_ID` | Deepslate organization ID |
| `api_key` | `str \| None` | env: `DEEPSLATE_API_KEY` | Deepslate API key |
| `base_url` | `str` | `https://app.deepslate.eu` | Base URL for the Deepslate API |
| `system_prompt` | `str` | `"You are a helpful assistant."` | Default system prompt |
| `temperature` | `float` | `1.0` | Sampling temperature (0.0–2.0) |
| `generate_reply_timeout` | `float` | `30.0` | Timeout in seconds for `generate_reply` (0 = no timeout) |
| `ws_url` | `str \| None` | `None` | Direct WebSocket URL override (useful for local development) |
| `tts_config` | `ElevenLabsTtsConfig \| HostedTtsConfig \| None` | `None` | TTS configuration (enables audio output). Use `ElevenLabsTtsConfig` for ElevenLabs synthesis or `HostedTtsConfig` for Deepslate-hosted cloned voices. |
| `http_session` | `aiohttp.ClientSession \| None` | `None` | Shared aiohttp session |
| `vad_confidence_threshold` | `float` | `0.5` | Minimum confidence to consider audio as speech (0.0–1.0) |
| `vad_min_volume` | `float` | `0.01` | Minimum volume threshold (0.0–1.0) |
| `vad_start_duration_ms` | `int` | `200` | Duration of speech required to detect the start of a turn (ms) |
| `vad_stop_duration_ms` | `int` | `500` | Duration of silence required to detect the end of a turn (ms) |
| `vad_backbuffer_duration_ms` | `int` | `1000` | Audio buffered before speech detection (ms) |
Voice Activity Detection is handled server-side by Deepslate. You tune it via the vad_* parameters on RealtimeModel — no client-side VAD pipeline is needed.
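The interplay of the start/stop thresholds is easiest to see in code. The sketch below is purely illustrative: Deepslate's actual detector runs server-side and is not published. It only shows how `start_duration_ms` and `stop_duration_ms` gate turn boundaries for frames that pass the confidence and volume checks:

```python
def detect_turns(frames, frame_ms=20, confidence_threshold=0.5,
                 min_volume=0.01, start_duration_ms=200, stop_duration_ms=500):
    """Illustrative turn detector. frames: list of (confidence, volume) per
    fixed-size frame; returns a list of (start_ms, end_ms) turns."""
    turns = []
    speaking = False
    run_ms = 0       # consecutive speech (or, while speaking, silence) so far
    turn_start = 0
    for i, (confidence, volume) in enumerate(frames):
        is_speech = confidence >= confidence_threshold and volume >= min_volume
        if not speaking:
            run_ms = run_ms + frame_ms if is_speech else 0
            if run_ms >= start_duration_ms:
                speaking = True
                # Back-date the turn start to the beginning of the speech run;
                # this is what the backbuffer makes possible in practice.
                turn_start = (i + 1) * frame_ms - run_ms
                run_ms = 0
        else:
            run_ms = 0 if is_speech else run_ms + frame_ms
            if run_ms >= stop_duration_ms:
                speaking = False
                turns.append((turn_start, (i + 1) * frame_ms - run_ms))
                run_ms = 0
    if speaking:
        turns.append((turn_start, len(frames) * frame_ms))
    return turns
```

With the defaults, a burst of speech shorter than 200 ms never opens a turn, and pauses shorter than 500 ms never close one.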
HostedTtsConfig

Use a voice cloned and hosted within Deepslate; no external TTS provider credentials are required. Pass an instance to RealtimeModel(tts_config=...) to enable audio output.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `voice_id` | `str` | required | The ID of the hosted (cloned) voice to use for synthesis |
| `mode` | `HostedTtsMode` | `HostedTtsMode.HIGH_QUALITY` | Quality/latency tradeoff for synthesis |

HostedTtsMode values:

| Value | Description |
|---|---|
| `HIGH_QUALITY` | Best output quality at still relatively low latency. Recommended for most use cases (default). |
| `LOW_LATENCY` | Fastest possible generation; output quality may be significantly reduced. |
import deepslate.livekit.realtime
from deepslate.livekit.realtime import HostedTtsConfig, HostedTtsMode

# Default — high quality
llm = deepslate.livekit.realtime.RealtimeModel(
    tts_config=HostedTtsConfig(voice_id="c3dfa73f-a1ab-4aad-b48a-0e9b9fe4a69f")
)

# Explicit low latency mode
llm = deepslate.livekit.realtime.RealtimeModel(
    tts_config=HostedTtsConfig(
        voice_id="c3dfa73f-a1ab-4aad-b48a-0e9b9fe4a69f",
        mode=HostedTtsMode.LOW_LATENCY,
    )
)
ElevenLabsTtsConfig

Configure server-side text-to-speech with ElevenLabs. Pass an instance to RealtimeModel(tts_config=...) to enable audio output and automatic interruption handling.

| Parameter | Type | Description |
|---|---|---|
| `api_key` | `str` | ElevenLabs API key (env: `ELEVENLABS_API_KEY`) |
| `voice_id` | `str` | Voice ID (env: `ELEVENLABS_VOICE_ID`) |
| `model_id` | `str \| None` | Model ID, e.g., `eleven_turbo_v2` (env: `ELEVENLABS_MODEL_ID`) |
| `location` | `ElevenLabsLocation` | API endpoint region: `US` (default), `EU`, or `INDIA` |
| `voice_settings` | `ElevenLabsVoiceSettingsConfig \| None` | Fine-grained voice control (see below) |

Use ElevenLabsTtsConfig.from_env() to create a config from environment variables.

ElevenLabsVoiceSettingsConfig provides fine-grained control over the synthesized voice:

| Parameter | Type | Description |
|---|---|---|
| `stability` | `float \| None` | Voice consistency (0.0–1.0); higher = more stable |
| `similarity_boost` | `float \| None` | Clarity and similarity to the original voice (0.0–1.0) |
| `style` | `float \| None` | Style exaggeration (0.0–1.0) |
| `use_speaker_boost` | `bool \| None` | Boost similarity to the original speaker |
| `speed` | `float \| None` | Speaking speed multiplier |
from deepslate.livekit.realtime import (
    ElevenLabsTtsConfig,
    ElevenLabsVoiceSettingsConfig,
    ElevenLabsLocation,
)

tts_config = ElevenLabsTtsConfig.from_env(
    location=ElevenLabsLocation.EU,
    voice_settings=ElevenLabsVoiceSettingsConfig(
        stability=0.7,
        similarity_boost=0.85,
        speed=1.1,
    ),
)
When using ElevenLabs TTS, automatic interruption handling (context truncation) is enabled. The server tracks exactly what was spoken before the interruption, keeping the model’s context accurate. Without server-side TTS, you can use LiveKit’s standard TTS integration, but this interruption context tracking will not be available.
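What context truncation means in practice: when the user barges in, the assistant's last turn is trimmed to what was actually spoken before the interruption. The server-side tracking is not public; the following is a minimal sketch of the idea, with a hypothetical helper that is not part of the plugin's API:

```python
# Illustrative only: trim the final assistant message to the portion that was
# actually spoken aloud, so the model's context matches what the user heard.
def truncate_on_interruption(history, spoken_chars):
    """Return a copy of `history` with the last assistant turn cut to
    `spoken_chars` characters. `history` is a list of {"role", "content"} dicts."""
    if history and history[-1]["role"] == "assistant":
        last = dict(history[-1])
        last["content"] = last["content"][:spoken_chars]
        return history[:-1] + [last]
    return history
```

The real implementation works on the TTS audio timeline rather than on raw character counts, but the effect on the conversation context is the same.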

Features

Real-time Voice Streaming

Low-latency bidirectional audio streaming for natural conversations

Server-side VAD

Voice activity detection handled server-side for reliable, configurable speech detection

Function Tools

Define and use function tools with the @function_tool() decorator

ElevenLabs TTS

Server-side TTS with regional endpoints and fine-grained voice settings

Low Latency Mode

Hosted voice TTS supports a low latency mode for fastest possible response at the cost of some output quality

Direct Speech

Speak text directly via TTS without routing through the LLM

Conversation Queries

Run one-shot side-channel inference without affecting the main conversation

Chat History Export

Export the full conversation history on demand

Live Configuration

Update the system prompt and temperature mid-session without reconnecting

Session Initialized Event

DeepslateRealtimeSession emits a "session_initialized" event once the WebSocket session is fully set up and ready to accept messages.
import asyncio
from livekit import agents
from livekit.agents import AgentServer, AgentSession, Agent
import deepslate.livekit.realtime
from deepslate.livekit.realtime import ElevenLabsTtsConfig

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice AI assistant.")

server = AgentServer()

@server.rtc_session()
async def my_agent(ctx: agents.JobContext):
    model = deepslate.livekit.realtime.RealtimeModel(
        tts_config=ElevenLabsTtsConfig.from_env()
    )
    session = AgentSession(llm=model)

    deepslate_session = model.session()
    deepslate_session.on("session_initialized", lambda _: asyncio.create_task(
        deepslate_session.speak_direct("Hello! How can I help you today?")
    ))

    await session.start(room=ctx.room, agent=Assistant())
model.session() is available after AgentSession is created. Register the listener before calling session.start() to avoid missing the event.

Function Tools

Use the @function_tool() decorator to give your agent capabilities:
from livekit.agents import function_tool

@function_tool()
async def lookup_weather(location: str) -> str:
    """Get the current weather for a location.

    Args:
        location: The city or location to get weather for
    """
    # Your implementation here
    return f"The weather in {location} is sunny and 22°C"

class WeatherAssistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful weather assistant.",
            tools=[lookup_weather],
        )

Direct Speech

speak_direct() lets you synthesize and play audio directly — bypassing the LLM entirely. This is useful for scripted prompts, confirmations, or fallback messages.
# Inside a session or agent callback
await session.llm.session().speak_direct(
    "Welcome back! How can I help you today?",
    include_in_history=True,  # Record as an assistant turn (default: True)
)
Setting include_in_history=False speaks the text without adding it to the conversation context — ideal for system-level announcements.

Conversation Queries

query_conversation() runs a one-shot inference call on a side channel, separate from the main conversational turn. The result is returned as a string and does not affect the conversation history or trigger any audio.
summary = await session.llm.session().query_conversation(
    prompt="Summarize the conversation so far in one sentence.",
)
print(summary)  # e.g., "The user asked about weather in Berlin."
You can also pass instructions to further constrain the model’s output format.

Chat History Export

Export the full conversation history as a list of structured message dicts at any point during a session:
# Listen for the export result
@session.llm.session().on("chat_history_exported")
def on_history(messages):
    for msg in messages:
        print(msg["role"], msg["content"])

# Request the export
await session.llm.session().export_chat_history(
    await_pending=False,  # Set True to wait for any in-flight operations first
)
Each message follows the ChatMessageDict structure with role, delivery_status, ephemeral, and a content list of typed content blocks (text, input_audio, tool_call, tool_result, etc.).
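The authoritative ChatMessageDict schema is in the API Reference; as an illustrative sketch based only on the fields named above (the block payload types are assumptions), the exported shape looks roughly like this:

```python
from typing import TypedDict

class ContentBlockDict(TypedDict):
    # Block type per the docs: "text", "input_audio", "tool_call", "tool_result", etc.
    type: str
    # Payload field assumed here for "text" blocks; other block types carry
    # different payloads (see the API Reference).
    text: str

class ChatMessageDict(TypedDict):
    role: str                       # e.g. "user" or "assistant"
    delivery_status: str
    ephemeral: bool
    content: list[ContentBlockDict]

def plain_text(msg: ChatMessageDict) -> str:
    """Hypothetical helper: concatenate a message's text blocks."""
    return " ".join(b["text"] for b in msg["content"] if b["type"] == "text")
```

A helper like `plain_text` is handy when logging or summarizing exported history, since a single message can mix text with tool-call blocks.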

Live Configuration

Update the system prompt or temperature mid-session without reconnecting:
await session.llm.update_options(
    system_prompt="You are now a concise assistant. Keep replies under two sentences.",
    temperature=0.8,
)
Changes take effect on the next model turn.

Contributing

This plugin is open source. Visit the deepslate-sdks monorepo to:
  • Report issues
  • Submit pull requests
  • Request features

Next Steps

WebSocket API

Low-level WebSocket access for custom integrations

API Reference

Full message schemas and configuration options

LiveKit Agents Docs

LiveKit Agents framework documentation

GitHub Repository

Source code, issues, and contributions