Connect to Deepslate Realtime via WebSocket for server-side voice streaming
The WebSocket API provides low-level access to Deepslate Realtime for server-side integrations. Use this for telephony backends, SIP gateways, or custom voice pipelines.
This interface is for server-side use only. End users should connect through WebRTC or your application’s frontend. Never expose your API key to clients.
Set supportsPlaybackReporting: true if your client can report how many audio bytes have been played. This enables accurate context truncation when the user interrupts the model mid-response. See Playback Position Reporting.
See the API Reference for all configuration options including TTS providers and tool definitions.
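As a starting point, the session initialization described above can be sketched as follows. This is a sketch only: the endpoint URL and auth header are placeholders (check the API Reference for the real values), and it assumes the generated `ServiceBoundMessage` protobuf type used throughout this page.

```javascript
// Sketch: build the session-initialization payload. Only supportsPlaybackReporting
// is taken from this page; other options (TTS provider, tools) live in the API Reference.
function buildInitPayload() {
  return {
    initializeSessionRequest: {
      supportsPlaybackReporting: true // enables accurate truncation on interrupt
      // ...TTS provider, tool definitions, and other options from the API Reference
    }
  };
}

// Hypothetical connection setup -- URL and header name are placeholders:
// const ws = new WebSocket('wss://<your-deepslate-endpoint>/realtime', {
//   headers: { Authorization: `Bearer ${process.env.DEEPSLATE_API_KEY}` }
// });
// ws.on('open', () => {
//   const init = ServiceBoundMessage.create(buildInitPayload());
//   ws.send(ServiceBoundMessage.encode(init).finish());
// });
```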
The server sends ClientBoundMessage with one of several payload types:
```javascript
ws.on('message', (data) => {
  const message = ClientBoundMessage.decode(new Uint8Array(data));

  if (message.responseBegin) {
    // Model has started responding
    console.log('Response started');
  }
  if (message.modelTextFragment) {
    // Streamed text tokens (when TTS is not configured)
    process.stdout.write(message.modelTextFragment.text);
  }
  if (message.modelAudioChunk) {
    // TTS audio output
    const audioData = message.modelAudioChunk.audio.data;
    const transcript = message.modelAudioChunk.transcript;
    // Queue audio for playback
    playAudio(audioData);
  }
  if (message.responseEnd) {
    // Model has finished responding
    console.log('Response ended');
  }
  if (message.playbackClearBuffer) {
    // User started speaking - clear any buffered audio immediately
    clearAudioQueue();
  }
  if (message.userTranscriptionResult) {
    // Async transcription for a completed user audio turn
    const { turnId, text, language } = message.userTranscriptionResult;
    console.log(`Turn ${turnId} transcribed (${language}): ${text}`);
  }
  if (message.error) {
    // Structured error notification sent before the server closes the connection
    const { category, message: msg, traceId } = message.error;
    console.error(`Session error [${category}]: ${msg}`, traceId ?? '');
  }
});
```
When the user starts speaking, the server proactively sends PlaybackClearBuffer so that any ongoing playback is stopped; it is sent whether or not TTS audio is currently playing. You should immediately discard any queued audio that hasn’t played yet:
```javascript
let audioQueue = [];

function playAudio(data) {
  audioQueue.push(data);
  // Process queue...
}

function clearAudioQueue() {
  audioQueue = [];
  // Also stop any currently playing audio
}
```
Use TriggerInference to make the model respond immediately without waiting for user speech. The primary use case is generating a greeting when the session opens.
```javascript
function triggerGreeting() {
  const message = ServiceBoundMessage.create({
    triggerInference: {
      extraInstructions: 'Greet the user warmly and ask how you can help.'
    }
  });
  ws.send(ServiceBoundMessage.encode(message).finish());
}

ws.on('open', () => {
  // Initialize session first, then trigger a greeting
  const init = ServiceBoundMessage.create({ initializeSessionRequest: { /* ... */ } });
  ws.send(ServiceBoundMessage.encode(init).finish());
  triggerGreeting();
});
```
TriggerInference is designed for generating a greeting before any user input. Using it directly after a model response may produce unpredictable results.
Use DirectSpeech to speak text via TTS immediately, bypassing the LLM. Any active inference is cancelled and the audio buffer is cleared before the text is spoken.
```javascript
function speak(text, includeInHistory = true) {
  const message = ServiceBoundMessage.create({
    directSpeech: {
      text,
      includeInHistory // false = ephemeral, LLM won't know it was spoken
    }
  });
  ws.send(ServiceBoundMessage.encode(message).finish());
}

// Speak a notice the LLM shouldn't know about
speak('Please hold, transferring your call.', false);
```
When includeInHistory is false, the message is marked as ephemeral in the chat history — it is audible to the user but invisible to the LLM’s context.
Use ConversationQuery to run a one-shot LLM inference over the current conversation history without modifying it. The result is returned as a ConversationQueryResult. This is useful for side tasks like summarization or classification that should not affect the ongoing conversation.
```javascript
function queryConversation() {
  const message = ServiceBoundMessage.create({
    conversationQuery: {
      prompt: 'You are a sentiment analyzer.', // Replaces system prompt for this query
      instructions: 'Rate the user sentiment so far as positive, neutral, or negative. Reply with one word.'
    }
  });
  ws.send(ServiceBoundMessage.encode(message).finish());
}

// Handle the result in your message handler
if (message.conversationQueryResult) {
  console.log('Sentiment:', message.conversationQueryResult.text);
}
```
At least one of prompt or instructions must be provided. If prompt is absent, the session’s current system prompt is used.
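The prompt/instructions requirement can be enforced client-side before a message goes on the wire. A minimal sketch, where `buildConversationQuery` is a hypothetical helper (not part of the API) that produces the payload passed to `ServiceBoundMessage.create`:

```javascript
// Sketch: validate the "at least one of prompt or instructions" rule locally
// so a malformed ConversationQuery never reaches the server.
function buildConversationQuery({ prompt, instructions }) {
  if (!prompt && !instructions) {
    throw new Error('ConversationQuery requires prompt and/or instructions');
  }
  // When prompt is omitted, the server falls back to the session's system prompt.
  return {
    conversationQuery: {
      ...(prompt && { prompt }),
      ...(instructions && { instructions })
    }
  };
}
```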
If you declared supportsPlaybackReporting: true during session initialization, send PlaybackPositionReport messages regularly as audio plays. This gives the server accurate data to truncate the LLM context to exactly what the user heard when they interrupt.
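One way to structure this is a small tracker that counts bytes as they finish playing and reports at a fixed byte interval. This is a sketch: the `playbackPositionReport` payload and its `playedBytes` field are assumptions modeled on the other messages on this page — check the API Reference for the actual field names.

```javascript
// Sketch of a playback-position tracker. Payload shape is an assumption; see
// the API Reference for the real PlaybackPositionReport fields.
class PlaybackReporter {
  constructor(sendFn, intervalBytes = 32000) { // ~1s of 16 kHz 16-bit mono
    this.sendFn = sendFn;       // called with the payload to transmit
    this.intervalBytes = intervalBytes;
    this.playedBytes = 0;       // total bytes actually played out
    this.reportedBytes = 0;     // high-water mark of the last report
  }

  // Call as each chunk finishes playing on the output device.
  onChunkPlayed(byteLength) {
    this.playedBytes += byteLength;
    if (this.playedBytes - this.reportedBytes >= this.intervalBytes) {
      this.flush();
    }
  }

  // Send a report immediately (also call this on interruption or close).
  flush() {
    this.reportedBytes = this.playedBytes;
    this.sendFn({ playbackPositionReport: { playedBytes: this.playedBytes } });
  }
}
```

In practice `sendFn` would be something like `payload => ws.send(ServiceBoundMessage.encode(ServiceBoundMessage.create(payload)).finish())`, mirroring the other examples above.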
Use ExportChatHistoryRequest to retrieve the full conversation history at any point. Set awaitPending: true to wait for any in-flight transcriptions to finish before the history is returned.
```javascript
function exportHistory(awaitPending = false) {
  const message = ServiceBoundMessage.create({
    exportChatHistoryRequest: { awaitPending }
  });
  ws.send(ServiceBoundMessage.encode(message).finish());
}

// Handle ChatHistory in your message handler
if (message.chatHistory) {
  for (const msg of message.chatHistory.messages) {
    console.log(msg.role, msg.content, msg.deliveryStatus);
  }
}
```
Each ChatMessage includes a role (SYSTEM, USER, or ASSISTANT), ordered content blocks, a deliveryStatus (DELIVERY_COMPLETE or DELIVERY_INTERRUPTED), and an ephemeral flag for messages spoken via DirectSpeech with includeInHistory: false. Audio content blocks (input_audio and tts_audio) include a transcription string; for user audio turns it is populated asynchronously.
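A common post-processing step is turning an exported history into a plain transcript, skipping ephemeral DirectSpeech lines and flagging interrupted responses. A sketch under the field names described above (`text` for text blocks, `transcription` for audio blocks); adjust to your generated protobuf types:

```javascript
// Sketch: flatten an exported ChatHistory into transcript lines. Assumes the
// field names described above; ephemeral messages are omitted.
function toTranscript(chatHistory) {
  const lines = [];
  for (const msg of chatHistory.messages) {
    if (msg.ephemeral) continue; // DirectSpeech with includeInHistory: false
    const text = msg.content
      .map(block => block.text ?? block.transcription ?? '') // text vs. audio blocks
      .join(' ')
      .trim();
    const suffix = msg.deliveryStatus === 'DELIVERY_INTERRUPTED' ? ' [interrupted]' : '';
    lines.push(`${msg.role}: ${text}${suffix}`);
  }
  return lines;
}
```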