The @deepslate-labs/livekit package adds a RealtimeModel implementation to the LiveKit Agents framework for Node.js and TypeScript, so your agents can run on Deepslate’s unified voice AI infrastructure.
Using the Python LiveKit framework instead? See the LiveKit Plugin (Python) page for the deepslate-livekit package. The two plugins share the same configuration model and feature set; this page is written against the Node framework’s API (the realtime classes live under the llm namespace, and audio frames come from @livekit/rtc-node).
This plugin lives in the deepslate-sdks monorepo. We welcome contributions — feel free to open issues or pull requests there.

Prerequisites

  • A Deepslate account with API credentials
  • Node.js 18+
  • LiveKit server and API credentials
  • (Optional) ElevenLabs API key for server-side TTS

Installation

npm install @deepslate-labs/livekit
The plugin declares the LiveKit framework packages as peer dependencies — install them alongside it:
npm install @livekit/agents @livekit/rtc-node
You don’t need to install @deepslate-labs/core separately. It’s pulled in automatically.

Environment Variables

Set up your credentials as environment variables:
| Variable | Required | Description |
| --- | --- | --- |
| DEEPSLATE_VENDOR_ID | Yes | Your Deepslate vendor ID |
| DEEPSLATE_ORGANIZATION_ID | Yes | Your Deepslate organization ID |
| DEEPSLATE_API_KEY | Yes | Your Deepslate API key |
| ELEVENLABS_API_KEY | No | ElevenLabs API key for server-side TTS |
| ELEVENLABS_VOICE_ID | No | ElevenLabs voice ID |
| ELEVENLABS_MODEL_ID | No | ElevenLabs model (e.g., eleven_turbo_v2) |
Never expose your Deepslate or ElevenLabs API keys to clients. This plugin is for server-side use with LiveKit Agents.
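If you want the agent to fail fast when credentials are missing, you can check the environment before constructing the model. The sketch below is a hypothetical helper (requireDeepslateEnv is not part of the plugin). It only checks the three required variables from the table above and takes the environment as a parameter so it can be exercised without touching process.env.

```typescript
// Hypothetical helper (not part of @deepslate-labs/livekit): verifies that the
// required Deepslate variables are set before the agent starts.
type EnvVars = Record<string, string | undefined>;

interface DeepslateCredentials {
  vendorId: string;
  organizationId: string;
  apiKey: string;
}

const REQUIRED_VARS = [
  "DEEPSLATE_VENDOR_ID",
  "DEEPSLATE_ORGANIZATION_ID",
  "DEEPSLATE_API_KEY",
] as const;

function requireDeepslateEnv(env: EnvVars): DeepslateCredentials {
  // Unset, empty, and whitespace-only values all count as missing.
  const missing = REQUIRED_VARS.filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    vendorId: env.DEEPSLATE_VENDOR_ID!,
    organizationId: env.DEEPSLATE_ORGANIZATION_ID!,
    apiKey: env.DEEPSLATE_API_KEY!,
  };
}
```

In an agent you would call it once at startup with process.env. Since RealtimeModel already reads these variables by default, the helper mainly gives you an earlier, clearer error message.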

Quick Start

import { fileURLToPath } from "node:url";
import { type JobContext, ServerOptions, cli, defineAgent, voice } from "@livekit/agents";
import { RealtimeModel, elevenLabsConfigFromEnv } from "@deepslate-labs/livekit";

export default defineAgent({
  entry: async (ctx: JobContext) => {
    await ctx.connect();

    const session = new voice.AgentSession({
      llm: new RealtimeModel({
        ttsConfig: elevenLabsConfigFromEnv(),
      }),
    });

    await session.start({
      agent: new voice.Agent({ instructions: "You are a helpful voice AI assistant." }),
      room: ctx.room,
    });

    session.generateReply({ instructions: "Greet the user and offer your assistance." });
  },
});

cli.runApp(new ServerOptions({ agent: fileURLToPath(import.meta.url) }));

Configuration Reference

The RealtimeModel constructor takes a single options object (RealtimeModelOptions):
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| vendorId | string | env: DEEPSLATE_VENDOR_ID | Deepslate vendor ID |
| organizationId | string | env: DEEPSLATE_ORGANIZATION_ID | Deepslate organization ID |
| apiKey | string | env: DEEPSLATE_API_KEY | Deepslate API key |
| baseUrl | string | "https://app.deepslate.eu" | Base URL for the Deepslate API |
| systemPrompt | string | "You are a helpful assistant." | Default system prompt |
| temperature | number | 1.0 | Sampling temperature (0.0–2.0) |
| generateReplyTimeout | number | 30.0 | Timeout in seconds for generateReply (0 = no timeout) |
| vad | VadConfig | defaults | Voice activity detection tuning (see below) |
| ttsConfig | TtsConfig | undefined | TTS configuration; enables audio output. Use a hosted or ElevenLabs config (see below). |
| wsUrl | string | undefined | Direct WebSocket URL override, useful for local development |
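generateReplyTimeout is given in seconds, and 0 disables the timeout. The sketch below only illustrates those documented semantics with a generic promise wrapper. withReplyTimeout is a hypothetical name, and the plugin’s internal handling (for example, how it cancels a pending reply) may differ.

```typescript
// Illustrates the documented generateReplyTimeout semantics:
// the value is in seconds, and 0 means "wait indefinitely".
// Hypothetical helper, not the plugin's actual implementation.
function withReplyTimeout<T>(reply: Promise<T>, timeoutSeconds: number): Promise<T> {
  if (!Number.isFinite(timeoutSeconds) || timeoutSeconds < 0) {
    throw new RangeError("timeoutSeconds must be a finite number >= 0");
  }
  if (timeoutSeconds === 0) {
    return reply; // 0 = no timeout
  }
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`generateReply timed out after ${timeoutSeconds}s`));
    }, timeoutSeconds * 1000);
    // Whichever settles first wins; the timer is cleared either way.
    reply.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}
```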

Voice Activity Detection

Voice activity detection is handled server-side by Deepslate. You tune it through the vad option on RealtimeModel; no client-side VAD pipeline is needed.
| Field | Default | Description |
| --- | --- | --- |
| confidenceThreshold | 0.5 | Minimum confidence score to classify audio as speech (0.0–1.0) |
| minVolume | 0.01 | Minimum audio volume to consider (0.0–1.0) |
| startDurationMs | 200 | Consecutive speech duration required to start a turn (ms) |
| stopDurationMs | 500 | Silence duration required to end a turn (ms) |
| backbufferDurationMs | 1000 | Audio buffered before the detection window (ms) |
import { RealtimeModel } from "@deepslate-labs/livekit";

const model = new RealtimeModel({
  vad: {
    confidenceThreshold: 0.5,
    minVolume: 0.01,
    startDurationMs: 200,
    stopDurationMs: 500,
    backbufferDurationMs: 1000,
  },
});
Tuning tips:
  • Noisy environments: increase confidenceThreshold (0.6–0.8) and minVolume (0.02–0.05)
  • Lower latency: decrease startDurationMs (100–150) and stopDurationMs (200–300)
  • Natural pacing: slightly increase stopDurationMs (600–800)
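To build intuition for how these thresholds trade responsiveness against robustness, here is a deliberately simplified, frame-based model of turn detection. It is not Deepslate’s algorithm. It assumes fixed 20 ms frames with a precomputed speech confidence and volume per frame, and it leaves out backbufferDurationMs. detectTurns and the Frame/Turn types are hypothetical.

```typescript
// Simplified, hypothetical model of turn detection.
// NOT Deepslate's algorithm: it only shows how the documented thresholds interact.
interface VadParams {
  confidenceThreshold: number; // 0.0–1.0
  minVolume: number; // 0.0–1.0
  startDurationMs: number;
  stopDurationMs: number;
}

interface Frame {
  confidence: number; // speech probability for this frame (assumed precomputed)
  volume: number; // normalized volume for this frame (assumed precomputed)
}

interface Turn {
  startMs: number; // when the turn was confirmed (after startDurationMs of speech)
  endMs: number; // when the turn ended (after stopDurationMs of silence)
}

const FRAME_MS = 20; // assumed fixed frame size

function detectTurns(frames: Frame[], p: VadParams): Turn[] {
  const turns: Turn[] = [];
  let inTurn = false;
  let speechRunMs = 0; // consecutive speech while idle
  let silenceRunMs = 0; // consecutive silence while inside a turn
  let turnStartMs = 0;

  frames.forEach((frame, i) => {
    const nowMs = (i + 1) * FRAME_MS; // time at the end of this frame
    // A frame counts as speech only if it passes BOTH thresholds.
    const isSpeech = frame.confidence >= p.confidenceThreshold && frame.volume >= p.minVolume;

    if (!inTurn) {
      speechRunMs = isSpeech ? speechRunMs + FRAME_MS : 0;
      if (speechRunMs >= p.startDurationMs) {
        inTurn = true;
        turnStartMs = nowMs;
        silenceRunMs = 0;
      }
    } else {
      silenceRunMs = isSpeech ? 0 : silenceRunMs + FRAME_MS;
      if (silenceRunMs >= p.stopDurationMs) {
        turns.push({ startMs: turnStartMs, endMs: nowMs });
        inTurn = false;
        speechRunMs = 0;
      }
    }
  });
  return turns; // a turn still open at the end of the input is not reported
}
```

In this model, 200 ms of speech followed by 600 ms of silence yields one turn ending at 700 ms with the default stopDurationMs of 500, but at 500 ms with stopDurationMs of 300. That is why lowering stopDurationMs reduces perceived latency but risks cutting users off during natural pauses.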

Hosted Voice TTS

Use a voice cloned and hosted within Deepslate; no external TTS provider credentials are required. Pass the config as ttsConfig to enable audio output.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| provider | "hosted" | required | Selects the Deepslate-hosted TTS provider |
| voiceId | string | required | ID of the hosted (cloned) voice to use for synthesis |
| mode | HostedTtsMode | HostedTtsMode.HIGH_QUALITY | Quality/latency tradeoff for synthesis |
HostedTtsMode values:
| Value | Description |
| --- | --- |
| HIGH_QUALITY | Best output quality while keeping latency relatively low. Recommended for most use cases (default). |
| LOW_LATENCY | Minimizes synthesis time so audio starts as quickly as possible. Output quality may be significantly reduced. |
import { RealtimeModel, HostedTtsMode } from "@deepslate-labs/livekit";

// Default — high quality
const model = new RealtimeModel({
  ttsConfig: {
    provider: "hosted",
    voiceId: "c3dfa73f-a1ab-4aad-b48a-0e9b9fe4a69f",
  },
});

// Explicit low latency mode
const fastModel = new RealtimeModel({
  ttsConfig: {
    provider: "hosted",
    voiceId: "c3dfa73f-a1ab-4aad-b48a-0e9b9fe4a69f",
    mode: HostedTtsMode.LOW_LATENCY,
  },
});

ElevenLabs TTS

Configure server-side text-to-speech with ElevenLabs. Pass the config as ttsConfig to enable audio output and automatic interruption handling.
| Field | Type | Description |
| --- | --- | --- |
| provider | "eleven_labs" | Selects the ElevenLabs TTS provider |
| apiKey | string | ElevenLabs API key (env: ELEVENLABS_API_KEY) |
| voiceId | string | Voice ID (env: ELEVENLABS_VOICE_ID) |
| modelId | string | Model ID, e.g., eleven_turbo_v2 (env: ELEVENLABS_MODEL_ID) |
| location | ElevenLabsLocation | API endpoint region: US (default), EU, or INDIA |
| voiceSettings | ElevenLabsVoiceSettings | Fine-grained voice control (see below) |
Use elevenLabsConfigFromEnv() to build a config from environment variables.

ElevenLabsVoiceSettings gives fine-grained control over the synthesized voice:
| Field | Type | Description |
| --- | --- | --- |
| stability | number | Voice consistency (0.0–1.0); higher = more stable |
| similarityBoost | number | Clarity and similarity to the original voice (0.0–1.0) |
| style | number | Style exaggeration (0.0–1.0) |
| useSpeakerBoost | boolean | Boost similarity to the original speaker |
| speed | number | Speaking speed multiplier |
import { RealtimeModel, ElevenLabsLocation, elevenLabsConfigFromEnv } from "@deepslate-labs/livekit";

// Load from environment variables
const model = new RealtimeModel({ ttsConfig: elevenLabsConfigFromEnv() });

// Or configure manually, with overrides
const tunedModel = new RealtimeModel({
  ttsConfig: {
    provider: "eleven_labs",
    apiKey: "your_elevenlabs_key",
    voiceId: "21m00Tcm4TlvDq8ikWAM",
    modelId: "eleven_turbo_v2",
    location: ElevenLabsLocation.EU,
    voiceSettings: { stability: 0.7, similarityBoost: 0.85, speed: 1.1 },
  },
});
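Conceptually, elevenLabsConfigFromEnv() maps the ELEVENLABS_* variables onto the fields in the table above. The sketch below approximates that mapping so you can see which variable feeds which field. elevenLabsConfigFromEnvSketch is a hypothetical stand-in with locally declared types; the real function may treat missing values, location, and voiceSettings differently.

```typescript
// Hypothetical approximation of elevenLabsConfigFromEnv(); types are local,
// not imported from @deepslate-labs/livekit.
type EnvVars = Record<string, string | undefined>;

interface ElevenLabsConfigSketch {
  provider: "eleven_labs";
  apiKey: string;
  voiceId: string;
  modelId?: string;
}

function elevenLabsConfigFromEnvSketch(env: EnvVars): ElevenLabsConfigSketch {
  const apiKey = env.ELEVENLABS_API_KEY;
  const voiceId = env.ELEVENLABS_VOICE_ID;
  // Assumption: this sketch treats the API key and voice ID as mandatory.
  if (!apiKey || !voiceId) {
    throw new Error("ELEVENLABS_API_KEY and ELEVENLABS_VOICE_ID must be set");
  }
  const config: ElevenLabsConfigSketch = { provider: "eleven_labs", apiKey, voiceId };
  // The model ID is optional; only set it when the variable is present.
  if (env.ELEVENLABS_MODEL_ID) {
    config.modelId = env.ELEVENLABS_MODEL_ID;
  }
  return config;
}
```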
When using server-side TTS (ElevenLabs or hosted), automatic interruption handling (context truncation) is enabled. The server tracks exactly what was spoken before the interruption, keeping the model’s context accurate. Without server-side TTS you can use LiveKit’s standard TTS integration, but this interruption context tracking will not be available.
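To make context truncation concrete: if the agent’s reply is interrupted partway through playback, only the words the user actually heard should remain in the conversation history. The sketch below shows that idea with word-level timestamps. truncateToSpoken and the WordTiming shape are hypothetical; Deepslate performs the equivalent tracking server-side, so you do not implement this yourself.

```typescript
// Conceptual sketch of interruption-aware context truncation.
// Hypothetical types and function; not the plugin's or Deepslate's API.
interface WordTiming {
  word: string;
  startMs: number; // playback offset where this word begins
  endMs: number; // playback offset where this word ends
}

// Keep only the words whose playback finished before the interruption.
// A word cut off mid-playback is dropped in this sketch.
function truncateToSpoken(words: WordTiming[], interruptedAtMs: number): string {
  return words
    .filter((w) => w.endMs <= interruptedAtMs)
    .map((w) => w.word)
    .join(" ");
}
```

Without this kind of tracking, the model would “remember” saying the full reply even though the user only heard part of it.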

Features

  • Real-time Voice Streaming: low-latency bidirectional audio streaming for natural conversations
  • Server-side VAD: voice activity detection handled server-side for reliable, configurable speech detection
  • Function Tools: define and use function tools with LiveKit’s llm.tool() helper
  • Flexible TTS: server-side TTS via Deepslate-hosted (cloned) voices or ElevenLabs
  • Low Latency Mode: hosted voice TTS supports a low latency mode for the fastest possible response, at the cost of some output quality
  • Direct Speech: speak text directly via TTS without routing it through the LLM
  • Conversation Queries: run one-shot side-channel inference without affecting the main conversation
  • Chat History Export: export the full conversation history on demand
  • Live Configuration: update the system prompt mid-session without reconnecting

Function Tools

Use LiveKit’s llm.tool() helper to expose tools to the model. Tool parameters are described with a zod schema:
import { llm, voice } from "@livekit/agents";
import { z } from "zod";

const lookupWeather = llm.tool({
  description: "Get the current weather for a given location.",
  parameters: z.object({
    location: z.string().describe("The city or location to look up weather for."),
  }),
  execute: async ({ location }) => `It's sunny and 22°C in ${location}.`,
});

const agent = new voice.Agent({
  instructions: "You are a helpful assistant.",
  tools: { lookupWeather },
});

The Deepslate Session

For Deepslate-specific capabilities (welcome messages, direct speech, conversation queries, history export, live configuration), obtain the underlying DeepslateRealtimeSession from the model with model.session():
import { RealtimeModel, elevenLabsConfigFromEnv } from "@deepslate-labs/livekit";

const model = new RealtimeModel({ ttsConfig: elevenLabsConfigFromEnv() });
const session = model.session();
The session is an event emitter — subscribe with session.on(...).

Session Initialized Event

DeepslateRealtimeSession emits a "session_initialized" event once the WebSocket session is fully set up and ready to accept messages. Combine it with speakDirect() to send a welcome message instead of relying on a fixed delay:
import { RealtimeModel, elevenLabsConfigFromEnv } from "@deepslate-labs/livekit";

const model = new RealtimeModel({ ttsConfig: elevenLabsConfigFromEnv() });
const session = model.session();

session.on("session_initialized", () => {
  void session.speakDirect("Hello! How can I help you today?");
});
Register the listener before the session connects to avoid missing the event.
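If you would rather await readiness than handle a callback, you can wrap the event in a promise with a timeout. This sketch relies only on the documented on(...) method; waitForSessionInitialized and the InitEmitter type are hypothetical. As with the listener above, create the promise before the session connects.

```typescript
// Minimal shape of the session used here; only the documented `on` is assumed.
interface InitEmitter {
  on(event: "session_initialized", listener: () => void): unknown;
}

// Hypothetical helper: resolves on the first "session_initialized" event,
// or rejects if it has not arrived within timeoutMs.
function waitForSessionInitialized(session: InitEmitter, timeoutMs = 10_000): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    let settled = false;
    const timer = setTimeout(() => {
      if (!settled) {
        settled = true;
        reject(new Error(`session_initialized not received within ${timeoutMs} ms`));
      }
    }, timeoutMs);
    session.on("session_initialized", () => {
      if (settled) return; // ignore late or repeated events
      settled = true;
      clearTimeout(timer);
      resolve();
    });
  });
}
```

A typical flow is to call waitForSessionInitialized(session) first, start the agent session, then await the promise before calling session.speakDirect(...). The listener stays registered after settling, which is harmless here because it becomes a no-op.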

Direct Speech

speakDirect() synthesizes and plays audio directly — bypassing the LLM entirely. This is useful for scripted prompts, confirmations, or fallback messages.
await session.speakDirect(
  "Welcome back! How can I help you today?",
  true, // includeInHistory — record as an assistant turn (default: true)
);
Passing false speaks the text without adding it to the conversation context — ideal for system-level announcements.

Conversation Queries

queryConversation() runs a one-shot inference call on a side channel, separate from the main conversational turn. The result is returned as a string and does not affect the conversation history or trigger any audio.
const summary = await session.queryConversation(
  "Summarize the conversation so far in one sentence.",
);
console.log(summary); // e.g., "The user asked about weather in Berlin."
You can also pass a second instructions argument to further constrain the model’s output format.
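queryConversation() and speakDirect() compose naturally, for example to read a short recap back to the user without running a full model turn. The sketch below types the session structurally, using only the two methods and the argument orders shown on this page. readBackSummary is a hypothetical helper, and the name of the second queryConversation parameter is an assumption.

```typescript
// Only the two documented methods are assumed; parameter names are illustrative.
interface RecapSession {
  queryConversation(prompt: string, instructions?: string): Promise<string>;
  speakDirect(text: string, includeInHistory?: boolean): Promise<void>;
}

// Hypothetical helper: summarize on the side channel (no audio, no history),
// then speak the result via TTS and record it as an assistant turn.
async function readBackSummary(session: RecapSession): Promise<string> {
  const summary = await session.queryConversation(
    "Summarize the conversation so far in one sentence.",
    "Reply with plain text only, no lists or markdown.",
  );
  const text = summary.trim();
  if (text.length > 0) {
    await session.speakDirect(`Here's a quick recap: ${text}`, true);
  }
  return text;
}
```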

Chat History Export

Export the full conversation history at any point during a session. The result is delivered via the "chat_history_exported" event:
// Listen for the export result
session.on("chat_history_exported", (messages) => {
  for (const msg of messages) {
    console.log(msg.role, msg.content);
  }
});

// Request the export
await session.exportChatHistory(
  false, // awaitPending — set true to wait for any in-flight operations first
  false, // excludeAudio — set true to omit audio blobs (transcripts only)
);
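Because the export arrives as an event rather than as the return value of exportChatHistory(), it can be convenient to wrap both in a single promise. This sketch assumes the session also offers a Node-style once(...) method (only on(...) is documented here) and that each message has the role and content fields used in the listener above. exportHistory and ExportedMessage are hypothetical names.

```typescript
// Assumed message shape, based on the fields used in the listener above.
interface ExportedMessage {
  role: string;
  content: unknown;
}

// Structural type covering only what this sketch needs (`once` is assumed).
interface HistorySession {
  once(event: "chat_history_exported", listener: (messages: ExportedMessage[]) => void): unknown;
  exportChatHistory(awaitPending: boolean, excludeAudio: boolean): Promise<void>;
}

// Hypothetical helper: register the listener first, then request the export,
// so the result cannot be emitted before anyone is listening.
function exportHistory(
  session: HistorySession,
  awaitPending = true, // wait for in-flight operations
  excludeAudio = true, // transcripts only
): Promise<ExportedMessage[]> {
  return new Promise<ExportedMessage[]>((resolve, reject) => {
    session.once("chat_history_exported", resolve);
    session.exportChatHistory(awaitPending, excludeAudio).catch(reject);
  });
}
```

The defaults here favor a complete, lightweight transcript. If the event might never arrive, combine this with a timeout, as in the session_initialized helper pattern.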

Live Configuration

Update the system prompt mid-session without reconnecting:
await session.updateInstructions(
  "You are now a concise assistant. Keep replies under two sentences.",
);
Changes take effect on the next model turn.

Contributing

This plugin is open source. Visit the deepslate-sdks monorepo to:
  • Report issues
  • Submit pull requests
  • Request features

Next Steps

  • LiveKit Plugin (Python): the Python edition of this plugin
  • WebSocket API: low-level WebSocket access for custom integrations
  • API Reference: full message schemas and configuration options
  • LiveKit Agents Docs: LiveKit Agents framework documentation
  • GitHub Repository: source code, issues, and contributions