The Opal WebSocket Protocol uses Protocol Buffers for message encoding. All messages are wrapped in either ServiceBoundMessage (client to server) or ClientBoundMessage (server to client).
Messages are binary-encoded protobuf. JSON examples below are shown for readability. Download the proto file.

Connection

wss://app.deepslate.eu/api/v1/vendors/{vendorId}/organizations/{organizationId}/realtime
Authentication via Bearer token in the connection headers.
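As a sketch (assuming a Python client; the helper names are illustrative, and how headers are passed depends on your WebSocket library), the endpoint URL and auth header can be built like this:

```python
def realtime_url(vendor_id: str, organization_id: str) -> str:
    """Build the realtime WebSocket endpoint for a vendor/organization pair."""
    return (
        "wss://app.deepslate.eu/api/v1"
        f"/vendors/{vendor_id}/organizations/{organization_id}/realtime"
    )


def auth_headers(token: str) -> dict:
    """Bearer-token header to supply when opening the WebSocket connection."""
    return {"Authorization": f"Bearer {token}"}
```

Pass the returned headers to your WebSocket client when opening the connection.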

Client Messages

Messages sent from client to server, wrapped in ServiceBoundMessage.

InitializeSessionRequest

Must be the first message sent. Configures the session parameters.
Field | Type | Description
input_audio_line | AudioLineConfiguration | Input audio format configuration
output_audio_line | AudioLineConfiguration | Output audio format configuration
vad_configuration | VadConfiguration | Voice activity detection settings
inference_configuration | InferenceConfiguration | System prompt and model behavior
tts_configuration | TtsConfiguration | Optional TTS provider config (e.g., ElevenLabs)
supports_playback_reporting | bool | Whether the client will send PlaybackPositionReport messages. When true, enables accurate context truncation on user interrupt. Defaults to false.
{
  "initializeSessionRequest": {
    "inputAudioLine": {
      "sampleRate": 16000,
      "channelCount": 1,
      "sampleFormat": "SIGNED_16_BIT"
    },
    "outputAudioLine": {
      "sampleRate": 16000,
      "channelCount": 1,
      "sampleFormat": "SIGNED_16_BIT"
    },
    "vadConfiguration": {
      "confidenceThreshold": 0.5,
      "minVolume": 0,
      "startDuration": { "seconds": 0, "nanos": 200000000 },
      "stopDuration": { "seconds": 0, "nanos": 500000000 },
      "backbufferDuration": { "seconds": 1, "nanos": 0 }
    },
    "inferenceConfiguration": {
      "systemPrompt": "You are a helpful assistant.",
      "temperature": 0.7
    },
    "supportsPlaybackReporting": true
  }
}
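The example above can also be assembled programmatically. The following Python sketch builds the same JSON-mapped structure (the helper names are illustrative; the actual wire format is binary protobuf, not JSON):

```python
def audio_line(sample_rate: int = 16000, channels: int = 1,
               fmt: str = "SIGNED_16_BIT") -> dict:
    """An AudioLineConfiguration in its JSON-mapped form."""
    return {"sampleRate": sample_rate, "channelCount": channels, "sampleFormat": fmt}


def initialize_session(system_prompt: str, *,
                       supports_playback_reporting: bool = False) -> dict:
    """Assemble the InitializeSessionRequest that must open every session."""
    return {
        "initializeSessionRequest": {
            "inputAudioLine": audio_line(),
            "outputAudioLine": audio_line(),
            "vadConfiguration": {
                "confidenceThreshold": 0.5,
                "minVolume": 0,
                "startDuration": {"seconds": 0, "nanos": 200_000_000},
                "stopDuration": {"seconds": 0, "nanos": 500_000_000},
                "backbufferDuration": {"seconds": 1, "nanos": 0},
            },
            "inferenceConfiguration": {
                "systemPrompt": system_prompt,
                "temperature": 0.7,
            },
            "supportsPlaybackReporting": supports_playback_reporting,
        }
    }
```

Serialize the result with your protobuf bindings before sending; the dict form here mirrors the documentation examples only.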

ReconfigureSessionRequest

Reconfigure an ongoing session. Useful for changing audio input settings or the system prompt on the fly. You can update either field or both.
Reconfiguration may not be seamless. There may be glitches or dropped audio during the transition.
Field | Type | Description
input_audio_line | AudioLineConfiguration | Updated input audio format configuration
inference_configuration | InferenceConfiguration | Updated system prompt and model behavior
{
  "reconfigureSessionRequest": {
    "inputAudioLine": {
      "sampleRate": 48000,
      "channelCount": 1,
      "sampleFormat": "SIGNED_16_BIT"
    },
    "inferenceConfiguration": {
      "systemPrompt": "You are now a billing specialist."
    }
  }
}

UserInput

User input data (audio or text).
Field | Type | Description
packet_id | uint64 | Client-defined packet identifier for tracking
mode | InferenceTriggerMode | How to trigger inference for this input
audio_data | AudioData | Raw PCM audio bytes (one of audio_data or text_data)
text_data | TextData | Text input (one of audio_data or text_data)
{
  "userInput": {
    "packetId": 1,
    "mode": "IMMEDIATE",
    "audioData": {
      "data": "<binary-pcm-data>"
    }
  }
}
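When streaming microphone audio, a client typically slices the PCM stream into fixed-size UserInput packets with an increasing packet_id. A minimal sketch (the 3200-byte default is an illustrative choice: 100 ms of 16 kHz mono SIGNED_16_BIT audio):

```python
def chunk_audio(pcm: bytes, chunk_size: int = 3200,
                mode: str = "IMMEDIATE") -> list:
    """Split raw PCM bytes into UserInput packets (JSON-mapped form)."""
    packets = []
    for packet_id, offset in enumerate(range(0, len(pcm), chunk_size), start=1):
        packets.append({
            "userInput": {
                "packetId": packet_id,
                "mode": mode,  # IMMEDIATE is recommended for streaming audio
                "audioData": {"data": pcm[offset:offset + chunk_size]},
            }
        })
    return packets
```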

UpdateToolDefinitionsRequest

Define or update available tools. Replaces all existing definitions.
Field | Type | Description
tool_definitions | ToolDefinition[] | List of tool definitions
Each ToolDefinition contains:
Field | Type | Description
name | string | Tool identifier used by the model
description | string | Purpose and functionality description
parameters | object | JSON Schema for tool parameters
{
  "updateToolDefinitionsRequest": {
    "toolDefinitions": [
      {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string" }
          },
          "required": ["location"]
        }
      }
    ]
  }
}

ToolCallResponse

Response to a tool call request from the server.
Field | Type | Description
id | string | Must match the id from ToolCallRequest
result | string | Tool execution result (any format the model understands)
Every ToolCallRequest must receive a corresponding ToolCallResponse, even if execution fails.
{
  "toolCallResponse": {
    "id": "call_abc123",
    "result": "{\"temperature\": 22, \"condition\": \"sunny\"}"
  }
}
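A dispatch loop can enforce the always-respond rule mechanically. In this sketch, registry is a hypothetical map from tool names to local Python callables (not part of the protocol); failures are returned as a result payload rather than dropped:

```python
import json


def handle_tool_call(request: dict, registry: dict) -> dict:
    """Run a ToolCallRequest against local handlers.

    Always produces a ToolCallResponse, even when execution fails,
    since every request must be answered.
    """
    call = request["toolCallRequest"]
    try:
        handler = registry[call["name"]]
        result = json.dumps(handler(**call.get("parameters", {})))
    except Exception as exc:
        # Report the failure instead of leaving the request unanswered.
        result = json.dumps({"error": str(exc)})
    return {"toolCallResponse": {"id": call["id"], "result": result}}
```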

TriggerInference

Manually trigger inference immediately, instead of waiting for natural pauses or end-of-input signals.
The primary use case is generating an initial greeting. Model behavior may be unpredictable if used directly after a model response.
Field | Type | Description
extra_instructions | string | Optional extra instructions to guide the inference
{
  "triggerInference": {
    "extraInstructions": "Greet the user warmly and ask how you can help."
  }
}

ExportChatHistoryRequest

Request the full conversation history. The server responds with a ChatHistory message.
Field | Type | Description
await_pending | bool | When true, waits for all in-flight async operations (e.g. transcriptions) to complete before responding
{
  "exportChatHistoryRequest": {
    "awaitPending": true
  }
}

PlaybackPositionReport

Reports how many audio bytes the client has played. Only sent when the client declares supports_playback_reporting: true in InitializeSessionRequest. The server uses this data to truncate the LLM context to exactly what the user heard when they interrupt. Without it, the server falls back to elapsed-time estimation.
Field | Type | Description
bytes_played | uint64 | Cumulative number of audio bytes played by the client
{
  "playbackPositionReport": {
    "bytesPlayed": 32000
  }
}
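The byte count maps to playback time through the output audio configuration: bytes per second = sample rate x channel count x bytes per sample. A small conversion sketch (helper names are illustrative), which also shows that the 32000-byte example equals one second at the 16 kHz mono SIGNED_16_BIT configuration used earlier:

```python
# Bytes per sample for each SampleFormat value.
BYTES_PER_SAMPLE = {
    "UNSIGNED_8_BIT": 1,
    "SIGNED_16_BIT": 2,
    "SIGNED_32_BIT": 4,
    "FLOAT_32_BIT": 4,
    "FLOAT_64_BIT": 8,
}


def playback_seconds(bytes_played: int, line: dict) -> float:
    """Convert a cumulative byte count into seconds of audio actually heard."""
    bytes_per_second = (line["sampleRate"] * line["channelCount"]
                        * BYTES_PER_SAMPLE[line["sampleFormat"]])
    return bytes_played / bytes_per_second
```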

DirectSpeech

Instructs the service to speak the given text via TTS immediately, bypassing the LLM. Any active inference is cancelled and the audio buffer is cleared before the text is spoken.
Field | Type | Description
text | string | Text to speak
include_in_history | bool | When false, the text is spoken but marked as ephemeral; the LLM won't know it was spoken
{
  "directSpeech": {
    "text": "Please hold, transferring your call.",
    "includeInHistory": false
  }
}

ConversationQuery

Runs a one-shot LLM inference over the current conversation history without modifying it. Useful for side tasks such as summarization or classification. The server responds with a ConversationQueryResult. At least one of prompt or instructions must be provided.
Field | Type | Description
prompt | string | Replaces the system prompt for this one-shot call. If absent, uses the session's current system prompt
instructions | string | Appended as instructions after the conversation turns. If absent, no extra instructions are appended
{
  "conversationQuery": {
    "prompt": "You are a sentiment analyzer.",
    "instructions": "This is the chat history between an assistant and a customer. Do not reply to the chat in any way; only create a brief summary of the entire chat and describe the key points as briefly as possible. 1-2 sentences without line breaks. Reply directly with the summary, no introductions like: 'Okay, here is a brief summary: ...' Example summary: 'Customer requested a callback; conversation positive.'"
  }
}

Server Messages

Messages sent from server to client, wrapped in ClientBoundMessage.

ModelTextFragment

Streamed text output as tokens arrive. Used when TTS is not configured.
Field | Type | Description
text | string | Text content of this fragment
{
  "modelTextFragment": {
    "text": "Hello, how can I help you?"
  }
}

ModelAudioChunk

TTS audio output when a TTS provider is configured.
Field | Type | Description
audio | AudioData | Audio bytes matching the output_audio_line configuration
transcript | string | Optional text that was spoken (alignment data from the TTS provider)
{
  "modelAudioChunk": {
    "audio": {
      "data": "<binary-pcm-data>"
    },
    "transcript": "Hello, how can I help you?"
  }
}

ToolCallRequest

Model requests to execute a tool.
Field | Type | Description
id | string | Unique identifier for this request
name | string | Name of the tool to call
parameters | object | Parameters matching the tool's schema
{
  "toolCallRequest": {
    "id": "call_abc123",
    "name": "get_weather",
    "parameters": {
      "location": "Amsterdam"
    }
  }
}

PlaybackClearBuffer

Notification to clear the audio playback buffer. Sent proactively when the user starts speaking, regardless of whether there is ongoing TTS playback. When received, immediately discard any buffered audio that hasn't been played yet. This message may be sent multiple times if the user interrupts repeatedly.
{
  "playbackClearBuffer": {}
}
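Clearing must drop only unplayed audio while preserving the played-byte count that feeds PlaybackPositionReport. A minimal client-side buffer sketch (the class and method names are illustrative, not part of the protocol):

```python
class PlaybackBuffer:
    """Minimal client-side audio queue that honors playbackClearBuffer."""

    def __init__(self):
        self._pending = []       # chunks queued but not yet played
        self.bytes_played = 0    # cumulative count for PlaybackPositionReport

    def enqueue(self, chunk: bytes) -> None:
        """Queue an audio chunk received from the server."""
        self._pending.append(chunk)

    def play_next(self):
        """Pop and 'play' the next chunk, tracking how many bytes were heard."""
        if not self._pending:
            return None
        chunk = self._pending.pop(0)
        self.bytes_played += len(chunk)
        return chunk

    def clear(self) -> None:
        """On playbackClearBuffer: discard unplayed audio; played count stays."""
        self._pending.clear()
```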

ResponseBegin

Notification that the model has begun its response.
{
  "responseBegin": {}
}

ResponseEnd

Notification that the model has finished its response.
{
  "responseEnd": {}
}

ChatHistory

The full conversation history, returned in response to ExportChatHistoryRequest.
Field | Type | Description
messages | ChatMessage[] | Ordered list of conversation messages
{
  "chatHistory": {
    "messages": [
      {
        "role": "user",
        "delivery_status": "DELIVERY_COMPLETE",
        "ephemeral": false,
        "content": [
          {
            "type": "input_audio",
            "transcription": "What time is it?"
          }
        ]
      },
      {
        "role": "assistant",
        "delivery_status": "DELIVERY_COMPLETE",
        "ephemeral": false,
        "content": [
          {
            "type": "text",
            "text": "It's currently 3:45 PM.",
            "tts_audio": null
          }
        ]
      }
    ]
  }
}
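A client often wants a flat, readable transcript out of this structure. The sketch below assumes content blocks shaped like the example above (text blocks carry text, audio blocks carry transcription) and skips ephemeral DirectSpeech messages, which the LLM never saw:

```python
def to_transcript(history: dict) -> list:
    """Flatten a ChatHistory message into readable 'role: text' lines."""
    lines = []
    for msg in history["chatHistory"]["messages"]:
        if msg.get("ephemeral"):
            continue  # spoken via DirectSpeech with include_in_history: false
        for block in msg.get("content", []):
            # Text blocks carry "text"; audio blocks carry "transcription".
            text = block.get("text") or block.get("transcription")
            if text:
                lines.append(f'{msg["role"]}: {text}')
    return lines
```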

SessionErrorNotification

Structured error notification sent before the server closes the connection.
Field | Type | Description
category | SessionErrorCategory | Error category for programmatic handling
message | string | Human-readable message for logging or display
trace_id | string | Optional trace ID for correlating with server logs
{
  "error": {
    "category": "ERROR_CONFIGURATION",
    "message": "Invalid sample rate: must be between 8000 and 48000",
    "traceId": "abc-123-xyz"
  }
}

UserTranscriptionResult

Async transcription result for a completed user audio turn. Sent after the transcription worker finishes processing.
Field | Type | Description
turn_id | uint32 | Identifies which conversation turn this transcription belongs to
text | string | Transcribed text
language | string | Detected language (ISO 639-1 code, e.g. "en")
{
  "userTranscriptionResult": {
    "turnId": 3,
    "text": "What time is it?",
    "language": "en"
  }
}

ConversationQueryResult

Result of a ConversationQuery request.
Field | Type | Description
text | string | The LLM's complete response text
{
  "conversationQueryResult": {
    "text": "positive"
  }
}

Type Definitions

AudioLineConfiguration

Field | Type | Description
sample_rate | uint32 | Sample rate in Hz (e.g., 16000)
channel_count | uint32 | Number of channels (typically 1 for mono)
sample_format | SampleFormat | Audio sample format

SampleFormat

Value | Description
UNSIGNED_8_BIT | 8-bit unsigned integer samples
SIGNED_16_BIT | 16-bit signed integer samples (recommended)
SIGNED_32_BIT | 32-bit signed integer samples
FLOAT_32_BIT | 32-bit floating point (0.0 to 1.0)
FLOAT_64_BIT | 64-bit floating point (0.0 to 1.0)

VadConfiguration

Voice Activity Detection settings.
Field | Type | Description
confidence_threshold | float | Minimum confidence for speech detection (0.0-1.0)
min_volume | float | Minimum volume level for speech (0.0-1.0)
start_duration | Duration | Speech duration required to trigger start
stop_duration | Duration | Silence duration required to trigger end
backbuffer_duration | Duration | Audio buffered from before speech start (recommended: 1s)

InferenceConfiguration

Field | Type | Description
system_prompt | string | System prompt to guide model behavior
temperature | double | Controls output randomness. Higher values produce more random output; lower values produce more deterministic output.

TtsConfiguration

Optional text-to-speech configuration. If omitted, no audio is synthesized and model output is delivered as ModelTextFragment messages instead. The fields below configure the ElevenLabs provider, nested under the eleven_labs key as shown in the example.
Field | Type | Description
api_key | string | Your ElevenLabs API key
voice_id | string | Voice ID (e.g., "21m00Tcm4TlvDq8ikWAM")
model_id | string | Optional model ID (e.g., "eleven_turbo_v2")
voice_settings | ElevenLabsVoiceSettings | Optional voice fine-tuning settings
location | ElevenLabsLocation | Service location for data residency (default: US)
{
  "ttsConfiguration": {
    "elevenLabs": {
      "apiKey": "sk-...",
      "voiceId": "21m00Tcm4TlvDq8ikWAM",
      "modelId": "eleven_turbo_v2",
      "voiceSettings": {
        "stability": 0.5,
        "similarityBoost": 0.75,
        "style": 0.0,
        "useSpeakerBoost": true,
        "speed": 1.0
      },
      "location": "EU"
    }
  }
}

ElevenLabsVoiceSettings

Fine-tuning settings for ElevenLabs voices.
Field | Type | Description
stability | double | Stability for the voice (0.0-1.0)
similarity_boost | double | Similarity boost for the voice (0.0-1.0)
style | double | Style setting for v2 models (0.0-1.0)
use_speaker_boost | bool | Whether to apply speaker boost
speed | double | Speed setting for the voice

ElevenLabsLocation

Controls which ElevenLabs regional endpoint is used. See ElevenLabs data residency docs for details.
Value | Description
US | United States (default); accessed via https://elevenlabs.io/
EU | European Union; requires ElevenLabs enterprise access
INDIA | India; requires ElevenLabs enterprise access

Duration

Field | Type | Description
seconds | uint64 | Whole seconds
nanos | uint32 | Nanoseconds component
// 500 milliseconds
{ "seconds": 0, "nanos": 500000000 }

// 1.5 seconds
{ "seconds": 1, "nanos": 500000000 }
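Converting between fractional seconds and the split representation is a one-liner each way. A small sketch (helper names are illustrative):

```python
def to_duration(seconds: float) -> dict:
    """Split a float number of seconds into a Duration {seconds, nanos} dict."""
    whole = int(seconds)
    nanos = round((seconds - whole) * 1_000_000_000)
    return {"seconds": whole, "nanos": nanos}


def from_duration(d: dict) -> float:
    """Collapse a Duration dict back into fractional seconds."""
    return d["seconds"] + d["nanos"] / 1_000_000_000
```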

InferenceTriggerMode

Controls how this input interacts with ongoing inference.
Value | Description
NO_TRIGGER | Don't trigger inference from this input. Audio is buffered for VAD processing but won't start inference on its own.
QUEUE | Queue inference to start after the current inference completes (or immediately if idle).
IMMEDIATE | Interrupt any ongoing inference and start processing the new input immediately. Recommended for streaming audio.

TextData

Text input wrapper.
Field | Type | Description
data | string | Raw text data

ChatMessage

A single message in the conversation history.
Field | Type | Description
role | ChatMessageRole | Role of the entity this message is attributed to
content | ChatMessageContent[] | Ordered content blocks of this message
delivery_status | ChatDeliveryStatus | Delivery status of this message
ephemeral | bool | true when the message was spoken via DirectSpeech with include_in_history: false; audible to the user but absent from the LLM's context

ChatMessageRole

Value | Description
SYSTEM | System message (usually the system prompt)
USER | User message
ASSISTANT | Assistant message

ChatDeliveryStatus

Value | Description
DELIVERY_IN_PROGRESS | Turn is still being generated
DELIVERY_COMPLETE | All content was delivered to the client
DELIVERY_INTERRUPTED | User interrupted; content reflects what was actually delivered

ChatMessageContent

A single content block within a chat message. Contains one of:
Field | Type | Description
text_content | ChatTextContent | Text content, optionally with TTS-synthesized audio
input_audio | ChatAudioData | User input or model-output audio (not TTS-synthesized)
thoughts | string | Internal model reasoning / chain-of-thought
tool_call | ToolCallRequest | Tool call requested by the model
tool_result | ToolCallResponse | Tool execution result
instructions | string | Model instructions (e.g. directives injected via TriggerInference)

ChatTextContent

Text content from a conversation turn, with optional TTS audio. When TTS is active, each synthesized sentence becomes a ChatTextContent with both fields populated.
Field | Type | Description
text | string | The text content
tts_audio | ChatAudioData | TTS-synthesized audio for this text, if available

ChatAudioData

Self-describing audio data including format metadata so consumers can decode without out-of-band knowledge.
If you reconfigure the audio pipeline mid-conversation, the format may change. Always inspect the format field rather than assuming it matches the initial configuration.
Field | Type | Description
audio | AudioData | Raw audio bytes
format | AudioLineConfiguration | Audio format (sample rate, channels, sample format)
transcription | string | Transcription of the audio content. Populated asynchronously for user audio turns.

SessionErrorCategory

Broad error categories for programmatic handling of SessionErrorNotification.
Value | Description
ERROR_UNKNOWN | Unknown or unclassified error
ERROR_SESSION | Session lifecycle errors (not initialized, already initialized)
ERROR_CONFIGURATION | Configuration errors (invalid audio format, missing required fields)
ERROR_PROTOCOL | Protocol errors (malformed packets, unexpected message types)
ERROR_INFERENCE | Inference/AI processing errors (model unavailable, processing failed, timeout)
ERROR_AUDIO | Audio pipeline errors (codec failure, VAD errors)
ERROR_TTS | TTS synthesis errors
ERROR_INTERNAL | Internal service errors (catch-all for server-side issues)