The Opal WebSocket Protocol uses Protocol Buffers for message encoding. All messages are wrapped in either ServiceBoundMessage (client to server) or ClientBoundMessage (server to client).
Messages are binary-encoded protobuf. JSON examples below are shown for readability. Download the proto file.

Connection

wss://app.deepslate.eu/api/v1/vendors/{vendorId}/organizations/{organizationId}/realtime
Authentication via Bearer token in the connection headers.
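As a minimal sketch, the endpoint URL and auth header can be assembled like this. Only the URL template and the Bearer scheme come from this page; the helper names are illustrative, and the actual connection library is up to you.

```python
# Sketch: build the realtime endpoint URL and the auth header.
# Helper names are illustrative; only the URL template and the
# "Authorization: Bearer" scheme come from this document.

def realtime_url(vendor_id: str, organization_id: str) -> str:
    """Build the realtime WebSocket endpoint URL."""
    return (
        "wss://app.deepslate.eu/api/v1"
        f"/vendors/{vendor_id}/organizations/{organization_id}/realtime"
    )

def auth_headers(token: str) -> dict:
    """Headers to pass when opening the WebSocket connection."""
    return {"Authorization": f"Bearer {token}"}
```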

Client Messages

Messages sent from client to server, wrapped in ServiceBoundMessage.

InitializeSessionRequest

Must be the first message sent. Configures the session parameters.
| Field | Type | Description |
| --- | --- | --- |
| input_audio_line | AudioLineConfiguration | Input audio format configuration |
| output_audio_line | AudioLineConfiguration | Output audio format configuration |
| vad_configuration | VadConfiguration | Voice activity detection settings |
| inference_configuration | InferenceConfiguration | System prompt and model behavior |
| tts_configuration | TtsConfiguration | Optional TTS provider config (e.g., ElevenLabs) |
{
  "initializeSessionRequest": {
    "inputAudioLine": {
      "sampleRate": 16000,
      "channelCount": 1,
      "sampleFormat": "SIGNED_16_BIT"
    },
    "outputAudioLine": {
      "sampleRate": 16000,
      "channelCount": 1,
      "sampleFormat": "SIGNED_16_BIT"
    },
    "vadConfiguration": {
      "confidenceThreshold": 0.5,
      "minVolume": 0,
      "startDuration": { "seconds": 0, "nanos": 200000000 },
      "stopDuration": { "seconds": 0, "nanos": 500000000 },
      "backbufferDuration": { "seconds": 1, "nanos": 0 }
    },
    "inferenceConfiguration": {
      "systemPrompt": "You are a helpful assistant."
    }
  }
}
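The example above can be assembled with a small helper. This is a sketch only: the dict mirrors the JSON rendering used in this document for readability, while the actual wire format is binary protobuf; the function names are ours.

```python
# Sketch: assemble the InitializeSessionRequest shown above as a dict.
# On the wire this would be protobuf-encoded; the dict mirrors the
# JSON rendering this document uses for readability.

def make_audio_line(sample_rate: int, channels: int = 1,
                    fmt: str = "SIGNED_16_BIT") -> dict:
    """One AudioLineConfiguration entry."""
    return {"sampleRate": sample_rate, "channelCount": channels,
            "sampleFormat": fmt}

def make_initialize_session(system_prompt: str) -> dict:
    """Build the first message of a session, with the VAD values above."""
    line = make_audio_line(16000)
    return {
        "initializeSessionRequest": {
            "inputAudioLine": line,
            "outputAudioLine": dict(line),
            "vadConfiguration": {
                "confidenceThreshold": 0.5,
                "minVolume": 0,
                "startDuration": {"seconds": 0, "nanos": 200_000_000},
                "stopDuration": {"seconds": 0, "nanos": 500_000_000},
                "backbufferDuration": {"seconds": 1, "nanos": 0},
            },
            "inferenceConfiguration": {"systemPrompt": system_prompt},
        }
    }
```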

ReconfigureSessionRequest

Reconfigure an ongoing session. Useful for changing audio input settings on the fly.
Reconfiguration may not be seamless; there may be glitches or dropped audio during the transition.
| Field | Type | Description |
| --- | --- | --- |
| input_audio_line | AudioLineConfiguration | Updated input audio format configuration |
{
  "reconfigureSessionRequest": {
    "inputAudioLine": {
      "sampleRate": 48000,
      "channelCount": 1,
      "sampleFormat": "SIGNED_16_BIT"
    }
  }
}

UserInput

User input data (audio or text).
| Field | Type | Description |
| --- | --- | --- |
| packet_id | uint64 | Client-defined packet identifier for tracking |
| mode | InferenceTriggerMode | How to trigger inference for this input |
| audio_data | AudioData | Raw PCM audio bytes (one of audio_data or text_data) |
| text_data | TextData | Text input (one of audio_data or text_data) |
{
  "userInput": {
    "packetId": 1,
    "mode": "IMMEDIATE",
    "audioData": {
      "data": "<binary-pcm-data>"
    }
  }
}
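For streaming audio, a client typically slices its PCM buffer into messages like the one above. A sketch, assuming a generator helper and a chunk size that are illustrative choices rather than part of the protocol:

```python
# Sketch: split a raw PCM buffer into UserInput messages with
# monotonically increasing packet ids. The chunk size (3200 bytes =
# 100 ms of 16 kHz mono SIGNED_16_BIT audio) is an illustrative choice.

def user_input_messages(pcm: bytes, chunk_size: int = 3200,
                        mode: str = "IMMEDIATE"):
    """Yield one UserInput message dict per chunk of audio."""
    packet_id = 1
    for offset in range(0, len(pcm), chunk_size):
        yield {
            "userInput": {
                "packetId": packet_id,
                "mode": mode,
                "audioData": {"data": pcm[offset:offset + chunk_size]},
            }
        }
        packet_id += 1
```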

UpdateToolDefinitionsRequest

Define or update available tools. Replaces all existing definitions.
| Field | Type | Description |
| --- | --- | --- |
| tool_definitions | ToolDefinition[] | List of tool definitions |

Each ToolDefinition contains:
| Field | Type | Description |
| --- | --- | --- |
| name | string | Tool identifier used by the model |
| description | string | Purpose and functionality description |
| parameters | object | JSON Schema for tool parameters |
{
  "updateToolDefinitionsRequest": {
    "toolDefinitions": [
      {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string" }
          },
          "required": ["location"]
        }
      }
    ]
  }
}
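Since the request replaces all existing definitions, it is convenient to build it from one registry in a single pass. A sketch, with a hypothetical tuple-based registry:

```python
# Sketch: build an UpdateToolDefinitionsRequest from (name, description,
# JSON Schema) tuples. Note the protocol replaces ALL previously
# registered definitions, so always send the full set.

def make_tool_definitions(tools: list) -> dict:
    """tools: list of (name, description, schema) tuples."""
    return {
        "updateToolDefinitionsRequest": {
            "toolDefinitions": [
                {"name": name, "description": desc, "parameters": schema}
                for name, desc, schema in tools
            ]
        }
    }
```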

ToolCallResponse

Response to a tool call request from the server.
| Field | Type | Description |
| --- | --- | --- |
| id | string | Must match the id from ToolCallRequest |
| result | string | Tool execution result (any format the model understands) |

Every ToolCallRequest must receive a corresponding ToolCallResponse, even if execution fails.
{
  "toolCallResponse": {
    "id": "call_abc123",
    "result": "{\"temperature\": 22, \"condition\": \"sunny\"}"
  }
}
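Because every ToolCallRequest must be answered even when execution fails, a dispatcher should catch handler errors and respond with an error payload rather than staying silent. A sketch, assuming a hypothetical handler registry mapping tool names to local functions:

```python
import json

# Sketch: dispatch a ToolCallRequest to a local handler and always
# produce a ToolCallResponse, even when the handler raises, as the
# protocol requires. The handler registry is an illustrative choice.

def handle_tool_call(request: dict, handlers: dict) -> dict:
    """Return the ToolCallResponse for one ToolCallRequest message."""
    call = request["toolCallRequest"]
    try:
        fn = handlers[call["name"]]  # KeyError if the tool is unknown
        result = json.dumps(fn(**call.get("parameters", {})))
    except Exception as exc:
        # Still respond on failure; the result format is free-form.
        result = json.dumps({"error": str(exc)})
    return {"toolCallResponse": {"id": call["id"], "result": result}}
```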

TriggerInference

Manually trigger inference processing immediately, instead of waiting for natural pauses or end-of-input signals.
The primary use case is generating an initial greeting. Model behavior may be unpredictable if used directly after a model response.
| Field | Type | Description |
| --- | --- | --- |
| extra_instructions | string | Optional extra instructions to guide the inference |
{
  "triggerInference": {
    "extraInstructions": "Greet the user warmly and ask how you can help."
  }
}

Server Messages

Messages sent from server to client, wrapped in ClientBoundMessage.

ModelTextFragment

Streamed text output as tokens arrive. Used when TTS is not configured.
| Field | Type | Description |
| --- | --- | --- |
| text | string | Text content of this fragment |
{
  "modelTextFragment": {
    "text": "Hello, how can I help you?"
  }
}

ModelAudioChunk

TTS audio output when a TTS provider is configured.
| Field | Type | Description |
| --- | --- | --- |
| audio | AudioData | Audio bytes matching the output_audio_line configuration |
| transcript | string | Optional text that was spoken |
{
  "modelAudioChunk": {
    "audio": {
      "data": "<binary-pcm-data>"
    },
    "transcript": "Hello, how can I help you?"
  }
}

ToolCallRequest

Model requests to execute a tool.
| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier for this request |
| name | string | Name of the tool to call |
| parameters | object | Parameters matching the tool’s schema |
{
  "toolCallRequest": {
    "id": "call_abc123",
    "name": "get_weather",
    "parameters": {
      "location": "Amsterdam"
    }
  }
}

PlaybackClearBuffer

Notification to clear the audio playback buffer. Sent proactively when the user starts speaking, regardless of whether there is ongoing TTS playback. When received, immediately discard any buffered audio that hasn’t been played yet. This message may be sent multiple times if the user interrupts multiple times.
{
  "playbackClearBuffer": {}
}
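A client-side playback queue only needs one extra operation to honor this message: drop everything not yet played. A minimal sketch, with class and method names of our own choosing:

```python
from collections import deque

# Sketch: a client-side playback queue that supports the
# playbackClearBuffer notification by discarding all unplayed audio.

class PlaybackQueue:
    def __init__(self):
        self._chunks = deque()

    def enqueue(self, pcm: bytes) -> None:
        """Buffer an incoming ModelAudioChunk payload."""
        self._chunks.append(pcm)

    def next_chunk(self):
        """Pop the next chunk to play, or None if the queue is empty."""
        return self._chunks.popleft() if self._chunks else None

    def clear(self) -> None:
        """Handle playbackClearBuffer: discard all unplayed audio."""
        self._chunks.clear()
```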

ResponseBegin

Notification that the model has begun its response.
{
  "responseBegin": {}
}

ResponseEnd

Notification that the model has finished its response.
{
  "responseEnd": {}
}

Type Definitions

AudioLineConfiguration

| Field | Type | Description |
| --- | --- | --- |
| sample_rate | uint32 | Sample rate in Hz (e.g., 16000) |
| channel_count | uint32 | Number of channels (typically 1 for mono) |
| sample_format | SampleFormat | Audio sample format |

SampleFormat

| Value | Description |
| --- | --- |
| UNSIGNED_8_BIT | 8-bit unsigned integer samples |
| SIGNED_16_BIT | 16-bit signed integer samples (recommended) |
| SIGNED_32_BIT | 32-bit signed integer samples |
| FLOAT_32_BIT | 32-bit floating point (0.0 to 1.0) |
| FLOAT_64_BIT | 64-bit floating point (0.0 to 1.0) |
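When sizing PCM buffers, the sample format determines bytes per sample. A small lookup sketch following the table above (the helper name is ours):

```python
# Sketch: bytes per sample for each SampleFormat value, useful when
# sizing PCM chunks. The mapping follows the SampleFormat table above.

BYTES_PER_SAMPLE = {
    "UNSIGNED_8_BIT": 1,
    "SIGNED_16_BIT": 2,
    "SIGNED_32_BIT": 4,
    "FLOAT_32_BIT": 4,
    "FLOAT_64_BIT": 8,
}

def frame_size(sample_format: str, channel_count: int) -> int:
    """Size in bytes of one audio frame (one sample per channel)."""
    return BYTES_PER_SAMPLE[sample_format] * channel_count
```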

VadConfiguration

Voice Activity Detection settings.
| Field | Type | Description |
| --- | --- | --- |
| confidence_threshold | float | Minimum confidence for speech detection (0.0-1.0) |
| min_volume | float | Minimum volume level for speech (0.0-1.0) |
| start_duration | Duration | Speech duration required to trigger start |
| stop_duration | Duration | Silence duration required to trigger end |
| backbuffer_duration | Duration | Audio buffered before speech start (recommended: 1s) |

TtsConfiguration

Optional text-to-speech configuration. If omitted, raw text fragments are sent.
| Field | Type | Description |
| --- | --- | --- |
| api_key | string | Your ElevenLabs API key |
| voice_id | string | Voice ID (e.g., “21m00Tcm4TlvDq8ikWAM”) |
| model_id | string | Optional model ID (e.g., “eleven_turbo_v2”) |
{
  "ttsConfiguration": {
    "elevenLabs": {
      "apiKey": "sk-...",
      "voiceId": "21m00Tcm4TlvDq8ikWAM",
      "modelId": "eleven_turbo_v2"
    }
  }
}

Duration

| Field | Type | Description |
| --- | --- | --- |
| seconds | uint64 | Whole seconds |
| nanos | uint32 | Nanoseconds component |
// 500 milliseconds
{ "seconds": 0, "nanos": 500000000 }

// 1.5 seconds
{ "seconds": 1, "nanos": 500000000 }
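Converting from milliseconds to this shape is a one-liner; a sketch with an illustrative helper name:

```python
# Sketch: convert milliseconds to the Duration shape used above.

def duration_from_ms(ms: int) -> dict:
    """Split milliseconds into whole seconds plus a nanoseconds remainder."""
    return {"seconds": ms // 1000, "nanos": (ms % 1000) * 1_000_000}
```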

InferenceTriggerMode

Controls how this input interacts with ongoing inference.
| Value | Description |
| --- | --- |
| NO_TRIGGER | Don’t trigger inference from this input. Audio is buffered for VAD processing but won’t start inference on its own. |
| QUEUE | Queue inference to start after the current inference completes (or immediately if idle). |
| IMMEDIATE | Interrupt any ongoing inference and start processing the new input immediately. Recommended for streaming audio. |

TextData

Text input wrapper.
| Field | Type | Description |
| --- | --- | --- |
| data | string | Raw text data |