The WebSocket API provides low-level access to Deepslate Realtime for server-side integrations. Use this for telephony backends, SIP gateways, or custom voice pipelines.
This interface is for server-side use only. End users should connect through WebRTC or your application’s frontend. Never expose your API key to clients.

Prerequisites

npm install ws protobufjs

Connect

Connect to the WebSocket endpoint with your API key in the headers:
import WebSocket from 'ws';
import protobuf from 'protobufjs';

const ws = new WebSocket('wss://app.deepslate.eu/api/v1/vendors/{vendorId}/organizations/{organizationId}/realtime', {
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
  }
});

ws.binaryType = 'arraybuffer';

ws.on('open', () => {
  console.log('Connected');
  // Initialize session immediately after connecting
});

ws.on('message', (data) => {
  // Handle incoming protobuf messages
});

ws.on('close', (code, reason) => {
  console.log(`Disconnected: ${code} ${reason.toString()}`);
});
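The server closes the connection on fatal errors, so production integrations typically reconnect. A minimal exponential-backoff sketch; the delay schedule, cap, and retry limit are illustrative choices, not part of the Deepslate API:

```javascript
// Delay before reconnect attempt N: doubles each time, capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 10000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// createSocket is your own factory that returns a configured WebSocket.
function connectWithRetry(createSocket, maxAttempts = 5) {
  let attempt = 0;
  function open() {
    const ws = createSocket();
    ws.on('open', () => { attempt = 0; }); // Reset backoff once connected
    ws.on('close', (code) => {
      // 1000 = normal closure; anything else triggers a retry
      if (code !== 1000 && attempt < maxAttempts) {
        setTimeout(open, backoffDelay(attempt++));
      }
    });
  }
  open();
}
```

Remember that each new connection starts a fresh session: resend your InitializeSessionRequest in the `open` handler.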

Initialize Session

The first message must be an InitializeSessionRequest to configure audio format, VAD, and model behavior:
// Load protobuf definitions
const root = await protobuf.load('realtime.proto');
const ServiceBoundMessage = root.lookupType('eu.deepslate.realtime.speeq.ServiceBoundMessage');
const ClientBoundMessage = root.lookupType('eu.deepslate.realtime.speeq.ClientBoundMessage');

// Build initialization message
const initMessage = ServiceBoundMessage.create({
  initializeSessionRequest: {
    inputAudioLine: {
      sampleRate: 16000,
      channelCount: 1,
      sampleFormat: 1  // SIGNED_16_BIT
    },
    outputAudioLine: {
      sampleRate: 16000,
      channelCount: 1,
      sampleFormat: 1  // SIGNED_16_BIT
    },
    vadConfiguration: {
      confidenceThreshold: 0.5,
      minVolume: 0,
      startDuration: { seconds: 0, nanos: 300000000 },  // 300ms
      stopDuration: { seconds: 0, nanos: 700000000 },   // 700ms
      backbufferDuration: { seconds: 1, nanos: 0 }      // 1s
    },
    inferenceConfiguration: {
      systemPrompt: 'You are a helpful assistant.',
      temperature: 0.7
    },
    supportsPlaybackReporting: true  // Enable precise playback position reporting
  }
});

// Send as binary protobuf
const buffer = ServiceBoundMessage.encode(initMessage).finish();
ws.send(buffer);
Set supportsPlaybackReporting: true if your client can report how many audio bytes have been played. This enables accurate context truncation when the user interrupts the model mid-response. See Playback Position Reporting.
See the API Reference for all configuration options including TTS providers and tool definitions.
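The VAD timing fields take protobuf Duration objects ({ seconds, nanos }). A small illustrative helper for building them from milliseconds, which avoids hand-counting nanoseconds:

```javascript
// Build a protobuf Duration ({ seconds, nanos }) from a millisecond count.
function msToDuration(ms) {
  const seconds = Math.floor(ms / 1000);
  const nanos = (ms % 1000) * 1e6; // 1 ms = 1,000,000 ns
  return { seconds, nanos };
}

// e.g. startDuration: msToDuration(300), stopDuration: msToDuration(700)
```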

Send Audio

Stream audio as UserInput messages. Audio must match your inputAudioLine configuration:
let packetId = 0;

function sendAudio(pcmBuffer) {
  const message = ServiceBoundMessage.create({
    userInput: {
      packetId: packetId++,
      mode: 2,  // IMMEDIATE - interrupt ongoing inference
      audioData: {
        data: pcmBuffer  // Raw PCM bytes matching your config
      }
    }
  });

  const buffer = ServiceBoundMessage.encode(message).finish();
  ws.send(buffer);
}
The mode field controls how the input interacts with ongoing inference:
Value  Name        Behavior
0      NO_TRIGGER  Send audio without triggering inference.
1      QUEUE       Queue inference to run after any current inference completes, or immediately if idle.
2      IMMEDIATE   Interrupt any ongoing inference and start a new one immediately.
You can also send text input instead of audio by using textData in place of audioData.
Audio format must exactly match your session configuration. For 16-bit signed PCM at 16kHz mono, each sample is 2 bytes, little-endian.
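If your capture source produces Float32 samples in the range [-1, 1] (as Web Audio and many audio libraries do), convert them before sending. A sketch of the conversion to 16-bit signed little-endian PCM; the helper name is illustrative:

```javascript
// Convert Float32 samples ([-1, 1]) to 16-bit signed little-endian PCM,
// matching the SIGNED_16_BIT inputAudioLine configuration above.
function floatTo16BitPCM(float32Samples) {
  const buffer = Buffer.alloc(float32Samples.length * 2); // 2 bytes per sample
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp, then scale to the asymmetric signed 16-bit range
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    buffer.writeInt16LE(Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), i * 2);
  }
  return buffer;
}
```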

Handle Responses

The server sends ClientBoundMessage with one of several payload types:
ws.on('message', (data) => {
  const message = ClientBoundMessage.decode(new Uint8Array(data));

  if (message.responseBegin) {
    // Model has started responding
    console.log('Response started');
  }

  if (message.modelTextFragment) {
    // Streamed text tokens (when TTS is not configured)
    process.stdout.write(message.modelTextFragment.text);
  }

  if (message.modelAudioChunk) {
    // TTS audio output
    const audioData = message.modelAudioChunk.audio.data;
    const transcript = message.modelAudioChunk.transcript;

    // Queue audio for playback
    playAudio(audioData);
  }

  if (message.responseEnd) {
    // Model has finished responding
    console.log('Response ended');
  }

  if (message.playbackClearBuffer) {
    // User started speaking - clear any buffered audio immediately
    clearAudioQueue();
  }

  if (message.userTranscriptionResult) {
    // Async transcription for a completed user audio turn
    const { turnId, text, language } = message.userTranscriptionResult;
    console.log(`Turn ${turnId} transcribed (${language}): ${text}`);
  }

  if (message.error) {
    // Structured error notification sent before the server closes the connection
    const { category, message: msg, traceId } = message.error;
    console.error(`Session error [${category}]: ${msg}`, traceId ?? '');
  }
});

Handle Interruptions

When the user starts speaking, the server sends PlaybackClearBuffer proactively to ensure any ongoing playback is stopped. This is sent regardless of whether there is currently TTS playback. You should immediately discard any queued audio that hasn’t played yet:
let audioQueue = [];

function playAudio(data) {
  audioQueue.push(data);
  // Process queue...
}

function clearAudioQueue() {
  audioQueue = [];
  // Also stop any currently playing audio
}

Trigger Inference

Use TriggerInference to make the model respond immediately without waiting for user speech. The primary use case is generating a greeting when the session opens.
function triggerGreeting() {
  const message = ServiceBoundMessage.create({
    triggerInference: {
      extraInstructions: 'Greet the user warmly and ask how you can help.'
    }
  });
  ws.send(ServiceBoundMessage.encode(message).finish());
}

ws.on('open', () => {
  // Initialize the session first, then trigger a greeting.
  // send() encodes and sends a ServiceBoundMessage (see the Complete Example).
  send({ initializeSessionRequest: { /* ... */ } });
  triggerGreeting();
});
TriggerInference is designed for generating a greeting before any user input. Using it directly after a model response may produce unpredictable results.

Reconfigure Session

Use ReconfigureSessionRequest to update the input audio format or system prompt mid-session without reconnecting. You can update either field or both.
function reconfigure({ inputAudioLine, systemPrompt } = {}) {
  const message = ServiceBoundMessage.create({
    reconfigureSessionRequest: {
      ...(inputAudioLine && { inputAudioLine }),
      ...(systemPrompt && {
        inferenceConfiguration: { systemPrompt }
      })
    }
  });
  ws.send(ServiceBoundMessage.encode(message).finish());
}

// Example: switch system prompt mid-call
reconfigure({ systemPrompt: 'You are now a billing specialist.' });
Reconfiguration is not guaranteed to be seamless. There may be brief audio glitches or dropped audio around the transition.

Direct Speech

Use DirectSpeech to speak text via TTS immediately, bypassing the LLM. Any active inference is cancelled and the audio buffer is cleared before the text is spoken.
function speak(text, includeInHistory = true) {
  const message = ServiceBoundMessage.create({
    directSpeech: {
      text,
      includeInHistory  // false = ephemeral, LLM won't know it was spoken
    }
  });
  ws.send(ServiceBoundMessage.encode(message).finish());
}

// Speak a notice the LLM shouldn't know about
speak('Please hold, transferring your call.', false);
When includeInHistory is false, the message is marked as ephemeral in the chat history — it is audible to the user but invisible to the LLM’s context.

Conversation Query

Use ConversationQuery to run a one-shot LLM inference over the current conversation history without modifying it. The result is returned as a ConversationQueryResult. This is useful for side tasks like summarization or classification that should not affect the ongoing conversation.
function queryConversation() {
  const message = ServiceBoundMessage.create({
    conversationQuery: {
      prompt: 'You are a sentiment analyzer.',       // Replaces system prompt for this query
      instructions: 'Rate the user sentiment so far as positive, neutral, or negative. Reply with one word.'
    }
  });
  ws.send(ServiceBoundMessage.encode(message).finish());
}

// Handle the result in your message handler
if (message.conversationQueryResult) {
  console.log('Sentiment:', message.conversationQueryResult.text);
}
At least one of prompt or instructions must be provided. If prompt is absent, the session’s current system prompt is used.

Playback Position Reporting

If you declared supportsPlaybackReporting: true during session initialization, send PlaybackPositionReport messages regularly as audio plays. This gives the server accurate data to truncate the LLM context to exactly what the user heard when they interrupt.
let bytesPlayed = 0;

function onAudioBytesPlayed(byteCount) {
  bytesPlayed += byteCount;

  const message = ServiceBoundMessage.create({
    playbackPositionReport: {
      bytesPlayed
    }
  });
  ws.send(ServiceBoundMessage.encode(message).finish());
}
Without playback reporting, the server falls back to elapsed-time estimation for context truncation, which is less precise.
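For reference, a byte count maps to playback time through the outputAudioLine format. A sketch for the 16 kHz mono 16-bit configuration used in these examples (the helper is illustrative):

```javascript
// Convert a played-byte count into milliseconds of audio heard,
// given the output format: sampleRate * channels * bytesPerSample bytes/sec.
function bytesToMs(bytesPlayed, sampleRate = 16000, channels = 1, bytesPerSample = 2) {
  const bytesPerSecond = sampleRate * channels * bytesPerSample;
  return (bytesPlayed / bytesPerSecond) * 1000;
}
```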

Export Chat History

Use ExportChatHistoryRequest to retrieve the full conversation history at any point. Set awaitPending: true to wait for any in-flight transcriptions to finish before the history is returned.
function exportHistory(awaitPending = false) {
  const message = ServiceBoundMessage.create({
    exportChatHistoryRequest: { awaitPending }
  });
  ws.send(ServiceBoundMessage.encode(message).finish());
}

// Handle ChatHistory in your message handler
if (message.chatHistory) {
  for (const msg of message.chatHistory.messages) {
    console.log(msg.role, msg.content, msg.deliveryStatus);
  }
}
Each ChatMessage includes a role (SYSTEM, USER, or ASSISTANT), ordered content blocks, a deliveryStatus (DELIVERY_COMPLETE, DELIVERY_INTERRUPTED), and an ephemeral flag for messages spoken via DirectSpeech with includeInHistory: false. Audio content blocks (input_audio and tts_audio) include a transcription string that is populated asynchronously for user audio turns.
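As an example of consuming an export, here is a sketch that flattens the history into a plain transcript, skipping ephemeral entries. The content-block field access assumes the text and transcription fields described above; the helper itself is illustrative:

```javascript
// Render an exported ChatHistory as "ROLE: text" lines,
// omitting ephemeral DirectSpeech messages the LLM never saw.
function formatTranscript(chatHistory) {
  return chatHistory.messages
    .filter((m) => !m.ephemeral)
    .map((m) => {
      // Prefer block text; fall back to the async transcription for audio blocks
      const text = m.content
        .map((block) => block.text ?? block.transcription ?? '')
        .filter(Boolean)
        .join(' ');
      return `${m.role}: ${text}`;
    })
    .join('\n');
}
```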

Tool Calling

Enable the model to call functions by defining tools and handling requests.

Define Tools

Send an UpdateToolDefinitionsRequest to register available tools. Each tool needs a name, description, and JSON Schema parameters:
function updateTools() {
  const message = ServiceBoundMessage.create({
    updateToolDefinitionsRequest: {
      toolDefinitions: [
        {
          name: 'get_weather',
          description: 'Get current weather for a location',
          parameters: {
            fields: {
              location: {
                kind: { stringValue: 'string' }
              }
            }
          }
        },
        {
          name: 'get_time',
          description: 'Get the current time',
          parameters: {}
        }
      ]
    }
  });

  ws.send(ServiceBoundMessage.encode(message).finish());
}
Sending an UpdateToolDefinitionsRequest replaces all existing tool definitions. Send an empty array to clear all tools.

Handle Tool Requests

When the model wants to use a tool, you receive a ToolCallRequest. You must respond with a ToolCallResponse:
ws.on('message', (data) => {
  const message = ClientBoundMessage.decode(new Uint8Array(data));

  if (message.toolCallRequest) {
    const { id, name, parameters } = message.toolCallRequest;

    // Execute the tool
    const result = executeToolCall(name, parameters);

    // Send the response (required for every request)
    const response = ServiceBoundMessage.create({
      toolCallResponse: {
        id: id,      // Must match the request ID
        result: result
      }
    });
    ws.send(ServiceBoundMessage.encode(response).finish());
  }
});

function executeToolCall(name, parameters) {
  switch (name) {
    case 'get_weather': {
      const location = parameters?.fields?.location?.kind?.stringValue;
      return JSON.stringify({ temperature: 22, condition: 'sunny', location });
    }

    case 'get_time':
      return JSON.stringify({ time: new Date().toISOString() });

    default:
      return JSON.stringify({ error: `Unknown tool: ${name}` });
  }
}
Every ToolCallRequest must receive a ToolCallResponse, even if the tool execution fails. The model waits for the response before continuing.

Complete Example

A Node.js client with microphone input and speaker output:
npm install ws protobufjs audify
import WebSocket from 'ws';
import protobuf from 'protobufjs';
import pkg from 'audify';
const { RtAudio, RtAudioFormat } = pkg;

async function main() {
  // Load protobuf types
  const root = await protobuf.load('realtime.proto');
  const ServiceBoundMessage = root.lookupType('eu.deepslate.realtime.speeq.ServiceBoundMessage');
  const ClientBoundMessage = root.lookupType('eu.deepslate.realtime.speeq.ClientBoundMessage');

  // Audio config
  const SAMPLE_RATE = 16000;
  const CHANNELS = 1;
  const FRAME_SIZE = 1600; // 100ms of audio

  // Set up audio I/O
  const rtAudio = new RtAudio();

  // State
  let packetId = 0;
  let playbackQueue = [];

  // Connect
  const ws = new WebSocket('wss://app.deepslate.eu/api/v1/vendors/{vendorId}/organizations/{organizationId}/realtime', {
    headers: { 'Authorization': 'Bearer YOUR_API_KEY' }
  });
  ws.binaryType = 'arraybuffer';

  // Helper to send messages
  function send(payload) {
    const msg = ServiceBoundMessage.create(payload);
    ws.send(ServiceBoundMessage.encode(msg).finish());
  }

  ws.on('open', () => {
    // Initialize session
    send({
      initializeSessionRequest: {
        inputAudioLine: { sampleRate: SAMPLE_RATE, channelCount: CHANNELS, sampleFormat: 1 },
        outputAudioLine: { sampleRate: SAMPLE_RATE, channelCount: CHANNELS, sampleFormat: 1 },
        vadConfiguration: {
          confidenceThreshold: 0.5,
          minVolume: 0,
          startDuration: { seconds: 0, nanos: 300000000 },
          stopDuration: { seconds: 0, nanos: 700000000 },
          backbufferDuration: { seconds: 1, nanos: 0 }
        },
        inferenceConfiguration: {
          systemPrompt: 'You are a friendly and helpful assistant.'
        },
        supportsPlaybackReporting: true
      }
    });

    // Register tools
    send({
      updateToolDefinitionsRequest: {
        toolDefinitions: [{
          name: 'get_time',
          description: 'Get the current time',
          parameters: {}
        }]
      }
    });

    // Open microphone input
    rtAudio.openStream(
      null, // No output in this stream
      { nChannels: CHANNELS },
      RtAudioFormat.RTAUDIO_SINT16,
      SAMPLE_RATE,
      FRAME_SIZE,
      'deepslate-input',
      (pcm) => {
        send({
          userInput: {
            packetId: packetId++,
            mode: 2,  // IMMEDIATE
            audioData: { data: pcm }
          }
        });
      }
    );
    rtAudio.start();

    console.log('Listening... speak into your microphone');
  });

  ws.on('message', (data) => {
    const msg = ClientBoundMessage.decode(new Uint8Array(data));

    if (msg.responseBegin) {
      console.log('\n[Response started]');
    }

    if (msg.modelTextFragment) {
      process.stdout.write(msg.modelTextFragment.text);
    }

    if (msg.modelAudioChunk) {
      // Queue audio for playback
      playbackQueue.push(Buffer.from(msg.modelAudioChunk.audio.data));
    }

    if (msg.responseEnd) {
      console.log('\n[Response ended]');
    }

    if (msg.playbackClearBuffer) {
      // User started speaking - clear queued audio
      playbackQueue = [];
    }

    if (msg.toolCallRequest) {
      const { id, name, parameters } = msg.toolCallRequest;
      console.log(`Tool call: ${name}`, parameters);
      const result = handleTool(name, parameters);
      console.log(`Tool result: ${result}`);
      send({ toolCallResponse: { id, result } });
    }

    if (msg.userTranscriptionResult) {
      const { turnId, text, language } = msg.userTranscriptionResult;
      console.log(`[Transcription turn ${turnId} (${language})]: ${text}`);
    }

    if (msg.error) {
      const { category, message: errMsg, traceId } = msg.error;
      console.error(`Session error [${category}]: ${errMsg}`, traceId ?? '');
    }
  });

  ws.on('close', (code) => {
    rtAudio.stop();
    rtAudio.closeStream();
    console.log('Disconnected:', code);
  });

  ws.on('error', (err) => console.error('Error:', err));
}

function handleTool(name, parameters) {
  switch (name) {
    case 'get_time':
      return JSON.stringify({ time: new Date().toLocaleTimeString() });
    default:
      return JSON.stringify({ error: `Unknown tool: ${name}` });
  }
}

main();

Next Steps

API Reference

Full message schemas, all configuration options, and tool calling

WebRTC

Browser-based integration for end-user applications

LiveKit Plugin

LiveKit Agents plugin for Deepslate integration

Pipecat Plugin

Pipecat framework plugin for Deepslate integration