Opal is Deepslate’s proprietary end-to-end speech-to-speech (S2S) model. Unlike traditional voice AI systems that chain together separate components, Opal processes audio input and generates audio output in a single unified model.

What Makes Opal Different

No ASR Pipeline

Direct speech processing means faster responses and better context awareness, with no transcription errors to compound.

Ultra-Low Latency

Sub-300ms first byte latency enables natural turn-taking that feels human, not robotic.

High Intelligence

Advanced reasoning with complex instruction following, context retention, and task completion.

Emotion Awareness

Understands emotional cues and responds with appropriate tone and inflection.

Core Architecture

Unlike traditional voice AI that chains separate ASR, LLM, and TTS components, Opal understands speech directly. This eliminates latency penalties and error propagation between stages.

Traditional Cascaded Approach

Each stage introduces latency. Transcription errors compound through the pipeline. Total response time is the sum of all components.
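As a rough illustration of why cascade latency adds up, here is a back-of-the-envelope calculation. The per-stage figures are illustrative assumptions, not measured values:

```python
# Hypothetical per-stage latencies (ms) for a cascaded ASR -> LLM -> TTS
# pipeline. These numbers are illustrative assumptions only.
stage_latency_ms = {"asr": 300, "llm": 500, "tts": 400}

# In a cascade, each stage must produce output before the next can start,
# so first-byte latency is the sum of all stages.
cascade_first_byte_ms = sum(stage_latency_ms.values())

# An end-to-end model runs a single inference step instead.
e2e_first_byte_ms = 280  # illustrative sub-300 ms figure

print(cascade_first_byte_ms)  # 1200
```

Even with optimistic per-stage numbers, summing three stages lands well above the sub-300 ms range quoted for a single end-to-end model.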

Opal End-to-End Approach

The model operates entirely in embedding space, preserving acoustic information that would be lost in text-based intermediate representations. No transcription step means no transcription errors. Opal supports two output modes, depending on your use case.
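A toy sketch can make the information-loss argument concrete. Here "audio" is modeled as a dict carrying both words and prosody; the stub functions are illustrative placeholders, not real Deepslate components:

```python
# Toy model of why a cascade loses acoustic information.
# "Audio" is a dict with both words and tone; all functions are stubs.

def asr(audio: dict) -> str:
    return audio["words"]  # prosody is discarded at this step

def llm(text: str) -> str:
    return f"Reply to: {text}"

def tts(text: str) -> dict:
    # Tone must be re-invented from text alone; the original is gone.
    return {"words": text, "tone": "neutral"}

def cascaded(audio: dict) -> dict:
    return tts(llm(asr(audio)))

def end_to_end(audio: dict) -> dict:
    # A single model can condition its reply on the input tone,
    # because nothing was flattened to text along the way.
    return {"words": f"Reply to: {audio['words']}", "tone": audio["tone"]}

frustrated = {"words": "my order is late", "tone": "frustrated"}
print(cascaded(frustrated)["tone"])    # neutral  (tone lost)
print(end_to_end(frustrated)["tone"])  # frustrated  (tone preserved)
```

The cascade's reply tone is fixed by the TTS stage regardless of the caller's state, while the end-to-end path still has the original tone available when generating its response.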

Performance Comparison

| Metric | Opal | Traditional Cascade |
| --- | --- | --- |
| First byte latency | Under 300 ms | 800–1500 ms |
| Turn-taking gap | Natural | Noticeable delay |
| Interruption handling | Native | Often problematic |
| Error propagation | None | Compounds across stages |

Key Capabilities

Opal combines speech understanding with advanced reasoning:
  • Complex instruction following — Handles multi-step requests and nuanced instructions
  • Context retention — Maintains conversation context across long interactions
  • Domain adaptation — Quickly adapts to specialized terminology and workflows
  • Task completion — Drives conversations toward defined goals while handling edge cases
Opal maintains consistent voice characteristics throughout conversations or adopts custom voice profiles. This enables branded voice experiences that match your organization’s identity.
The model understands emotional cues in speech and responds appropriately:
  • Detecting caller frustration, confusion, or satisfaction
  • Adjusting response tone to match the situation
  • Conveying empathy, urgency, or reassurance as needed
Opal supports multiple languages and accents, enabling global deployment without requiring separate models for each locale.
Opal supports streaming in both directions:
  • Input streaming — Begins processing before the speaker finishes
  • Output streaming — Starts speaking while still generating the response
This enables natural interruption handling and reduces perceived latency.
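Input streaming typically means sending audio in small fixed-duration frames rather than whole utterances. A minimal chunking helper is sketched below; the 20 ms frame size, 16 kHz sample rate, and 16-bit mono PCM format are assumptions for illustration, not documented Deepslate requirements:

```python
from typing import Iterator

def chunk_pcm(
    audio: bytes,
    frame_ms: int = 20,          # assumed frame duration
    sample_rate: int = 16000,    # assumed sample rate (Hz)
    sample_width: int = 2,       # assumed 16-bit mono PCM
) -> Iterator[bytes]:
    """Split raw mono PCM audio into fixed-size frames for input streaming."""
    frame_bytes = sample_rate * sample_width * frame_ms // 1000
    for i in range(0, len(audio), frame_bytes):
        yield audio[i:i + frame_bytes]

# One second of silence at the assumed format -> 50 frames of 640 bytes each.
frames = list(chunk_pcm(b"\x00" * 32000))
print(len(frames))  # 50
```

Sending frames as they are captured lets the model begin processing before the speaker finishes, which is what makes interruption handling and low perceived latency possible on the input side.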

Integration with Deepslate Realtime

Opal powers both Assistants (inbound) and Agents (outbound) on the Deepslate platform. When you configure an assistant or agent, you’re defining the behavior, knowledge, and goals — Opal handles the real-time voice interaction.
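Conceptually, configuring an assistant or agent means declaring behavior, knowledge, and goals and leaving the voice loop to Opal. The sketch below illustrates that shape; the field names (`model`, `voice`, `instructions`, `goals`) are hypothetical, not the documented Deepslate schema:

```python
# Hypothetical assistant configuration. Field names are illustrative
# assumptions, not Deepslate's actual configuration schema.
assistant_config = {
    "model": "opal",
    "voice": "default",
    "instructions": "You are a friendly support assistant for Acme Inc.",
    "goals": [
        "resolve the caller's issue",
        "escalate to a human when needed",
    ],
}

# You define behavior and goals; the real-time voice interaction
# (turn-taking, interruptions, tone) is handled by the model itself.
print(assistant_config["model"])  # opal
```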