Custom Speech-to-Text lets you connect Rapida to a WebSocket transcription service that is not available as a built-in provider. You configure the provider URL, handshake headers, audio settings, and a small JSON DSL that maps Rapida audio packets to your provider’s WebSocket protocol.

Use Custom STT when your provider can accept streaming audio over WebSocket and return transcript events as JSON or text frames.

Provider identifier: custom-stt

Custom STT is an end-user configuration feature. You do not need to write a new Rapida transformer when your provider can be described with the WebSocket DSL below.

Setup flow

1. Create the Custom STT credential
   Open Credentials or Integrations > Models, choose Custom STT, and create a credential with your WebSocket connection details.

2. Select Custom STT in Voice Input
   Open your assistant deployment, go to Voice Input, and select Custom STT as the speech-to-text provider.

3. Set audio arguments
   Choose the model, language, audio encoding, and sample rate your provider expects.

4. Write WebSocket DSL rules
   Define query parameters, request rules, and response rules so Rapida knows how to talk to your provider.

5. Test with conversation logs
   Run a test call or web session. Check transcripts, latency, errors, and whether interim/final transcripts are emitted correctly.

Credential fields

| Field | Required | Description |
| --- | --- | --- |
| apiCompatibility | Yes | Must be websocket_v1. |
| baseUrl | Yes | WebSocket URL for your STT service, for example wss://stt.example.com/v1/listen. |
| headers | No | Header map sent during the WebSocket handshake, for example {"Authorization":"Bearer sk_..."}. |

The runtime also accepts snake case keys: api_compatibility and base_url.
Headers are copied from the credential as static values. The DSL cannot template headers or change the WebSocket path dynamically.
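
The credential shape above can be sketched as a small normalization helper. This is an illustrative sketch only, not Rapida's implementation: the function name `normalize_credential` is hypothetical, and both the precedence of camelCase over snake_case keys and the ws/wss scheme check are assumptions.

```python
from urllib.parse import urlsplit


def normalize_credential(raw: dict) -> dict:
    """Return the credential with camelCase keys, accepting snake_case aliases."""
    # Assumption: the camelCase spelling wins if both spellings are present.
    compat = raw.get("apiCompatibility", raw.get("api_compatibility"))
    base_url = raw.get("baseUrl", raw.get("base_url"))
    headers = raw.get("headers") or {}

    if compat != "websocket_v1":
        raise ValueError("apiCompatibility must be 'websocket_v1'")
    # Assumption: a WebSocket URL means a ws:// or wss:// scheme.
    if not base_url or urlsplit(base_url).scheme not in ("ws", "wss"):
        raise ValueError("baseUrl must be a ws:// or wss:// URL")
    if not isinstance(headers, dict):
        raise ValueError("headers must be a JSON object")

    # Headers are static values; nothing here is templated.
    return {"apiCompatibility": compat, "baseUrl": base_url, "headers": dict(headers)}


credential = normalize_credential({
    "api_compatibility": "websocket_v1",
    "base_url": "wss://stt.example.com/v1/listen",
    "headers": {"Authorization": "Bearer sk_..."},
})
```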

STT arguments

| Option key | Required | Default | Description |
| --- | --- | --- | --- |
| listen.model | No | Empty | Provider model identifier. Available as model in query params and config.model in request rules. |
| listen.language | No | Empty | Provider language code. Available as language in query params and config.language in request rules. |
| listen.audio.encoding | Yes | LINEAR16 | Audio encoding sent to the provider. Supported values: LINEAR16, MuLaw8. |
| listen.audio.sample_rate | Yes | 16000 | Audio sample rate sent to the provider. Common values include 8000, 16000, 24000, 44100, and 48000. |
| listen.ws.query_params | No | {} | Flat JSON object appended to baseUrl as query parameters. |
| listen.ws.request_rules | Yes | None | Ordered JSON array that maps Rapida packets to outbound WebSocket frames. Must contain at least one audio rule. |
| listen.ws.response_rules | Yes | None | Ordered JSON array that maps provider frames to transcripts or errors. |

DSL sections

Custom STT has three DSL sections:

| Section | Purpose |
| --- | --- |
| Query parameters | Add static or dynamic query params to the WebSocket URL. |
| Request rules | Convert Rapida packets into provider WebSocket messages. |
| Response rules | Convert provider WebSocket frames into Rapida transcript events. |

The DSL is intentionally small. It does not run JavaScript, call functions, read environment variables, use regex, concatenate strings, or perform compound conditions.

Query parameters

Use listen.ws.query_params when your provider expects configuration in the WebSocket URL.

Supported variables:

| Variable | Source |
| --- | --- |
| model | listen.model |
| language | listen.language |
| encoding | listen.audio.encoding |
| sample_rate | listen.audio.sample_rate |

Example:
{
  "language": { "$var": "language" },
  "model": { "$var": "model" },
  "encoding": { "$var": "encoding" },
  "sample_rate": {
    "$cast": "number",
    "value": { "$var": "sample_rate" }
  }
}
Rules:
  • Query params must be a flat JSON object.
  • Values must resolve to a primitive: string, number, boolean, or null.
  • Existing query params in baseUrl are preserved unless the rendered DSL uses the same key.
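
The merge behaviour described by these rules can be sketched in Python. This is a hedged illustration, not the runtime's code: `render_value`, `to_query_text`, and `build_url` are hypothetical names, only the number cast is modelled, and the way booleans and null are written into the URL is an assumption the rules above do not specify.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def render_value(expr, variables):
    """Resolve a literal, {"$var": name}, or {"$cast": "number", "value": expr}."""
    if not isinstance(expr, dict):
        return expr  # plain literal
    if set(expr) == {"$var"}:
        return variables[expr["$var"]]
    if set(expr) == {"$cast", "value"} and expr["$cast"] == "number":
        number = float(render_value(expr["value"], variables))
        return int(number) if number.is_integer() else number
    raise ValueError(f"expression not covered by this sketch: {expr!r}")


def to_query_text(value):
    # Assumption: booleans render as true/false and null as an empty string.
    if value is None:
        return ""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (str, int, float)):
        return str(value)
    raise ValueError("query param values must resolve to a primitive")


def build_url(base_url, query_params, variables):
    parts = urlsplit(base_url)
    # Keep params already present in baseUrl.
    merged = dict(parse_qsl(parts.query, keep_blank_values=True))
    for key, expr in query_params.items():
        # A rendered DSL key replaces an existing key of the same name.
        merged[key] = to_query_text(render_value(expr, variables))
    return urlunsplit(parts._replace(query=urlencode(merged)))


query_params = {
    "language": {"$var": "language"},
    "model": {"$var": "model"},
    "encoding": {"$var": "encoding"},
    "sample_rate": {"$cast": "number", "value": {"$var": "sample_rate"}},
}
variables = {"model": "general", "language": "hi",
             "encoding": "LINEAR16", "sample_rate": "16000"}
url = build_url("wss://stt.example.com/v1/listen?language=en&tier=fast",
                query_params, variables)
print(url)
```

In this sketch the pre-existing tier parameter survives, while language from baseUrl is replaced by the rendered value. The model name general and the tier parameter are made up for the example.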

Request rules

Request rules are evaluated for normalized packets produced by Rapida.

| Packet | When it is sent | Available paths |
| --- | --- | --- |
| turn_change | A new turn or context starts | packet.kind, packet.context_id, config.model, config.language, config.audio.encoding, config.audio.sample_rate |
| audio | Audio is ready to stream | packet.kind, packet.context_id, packet.audio.bytes, packet.audio.base64, config.* |
| interrupt | User interruption is detected | packet.kind, packet.context_id, config.* |

Supported outbound frames:

| Frame | Body must resolve to |
| --- | --- |
| binary | Bytes or string. Use this for raw audio streams. |
| json | Valid JSON value. |
| text | Value convertible to string. |

Binary audio stream

Use this when the provider expects raw audio WebSocket frames.
[
  {
    "when": { "packet": "audio" },
    "send": {
      "frame": "binary",
      "body": { "$path": "packet.audio.bytes" }
    }
  }
]

JSON audio payload

Use this when the provider expects base64 audio inside JSON.
[
  {
    "when": { "packet": "audio" },
    "send": {
      "frame": "json",
      "body": {
        "audio": { "$path": "packet.audio.base64" },
        "encoding": { "$path": "config.audio.encoding" },
        "sample_rate": {
          "$cast": "number",
          "value": { "$path": "config.audio.sample_rate" }
        }
      }
    }
  }
]

Start, audio, and interrupt

Use this pattern when the provider expects a session-start message, binary audio frames, and a flush message on interruption.
[
  {
    "when": { "packet": "turn_change" },
    "send": {
      "frame": "json",
      "body": {
        "type": "start",
        "language": { "$path": "config.language" },
        "sample_rate": {
          "$cast": "number",
          "value": { "$path": "config.audio.sample_rate" }
        }
      }
    }
  },
  {
    "when": { "packet": "audio" },
    "send": {
      "frame": "binary",
      "body": { "$path": "packet.audio.bytes" }
    }
  },
  {
    "when": { "packet": "interrupt" },
    "send": {
      "frame": "json",
      "body": { "type": "flush" }
    }
  }
]
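
To make the pattern above concrete, here is a minimal Python sketch of how such rules could be turned into outbound frames. It is not Rapida's implementation: `outbound_frames` and the packet/config dictionaries are hypothetical, only the number cast is modelled, and emitting a frame for every matching rule (rather than only the first) is an assumption, since the rules are only described as ordered.

```python
def get_path(scope, path):
    """Walk a dot path through nested dicts; a missing key raises KeyError."""
    node = scope
    for key in path.split("."):
        node = node[key]
    return node


def render(expr, scope):
    """Render a send.body expression against the {packet, config} scope."""
    if not isinstance(expr, dict):
        return expr  # literal such as "start"
    if set(expr) == {"$path"}:
        return get_path(scope, expr["$path"])
    if "$cast" in expr:
        if expr["$cast"] != "number":
            raise NotImplementedError("only the number cast is sketched here")
        number = float(render(expr["value"], scope))
        return int(number) if number.is_integer() else number
    # Plain JSON object: render each field.
    return {key: render(value, scope) for key, value in expr.items()}


def outbound_frames(rules, packet, config):
    """Return (frame_type, body) pairs for the rules matching this packet."""
    scope = {"packet": packet, "config": config}
    frames = []
    for rule in rules:  # rules are ordered
        if rule["when"]["packet"] == packet["kind"]:
            send = rule["send"]
            frames.append((send["frame"], render(send["body"], scope)))
    return frames


rules = [
    {"when": {"packet": "turn_change"},
     "send": {"frame": "json", "body": {
         "type": "start",
         "language": {"$path": "config.language"},
         "sample_rate": {"$cast": "number",
                         "value": {"$path": "config.audio.sample_rate"}}}}},
    {"when": {"packet": "audio"},
     "send": {"frame": "binary", "body": {"$path": "packet.audio.bytes"}}},
    {"when": {"packet": "interrupt"},
     "send": {"frame": "json", "body": {"type": "flush"}}},
]
config = {"model": "", "language": "en",
          "audio": {"encoding": "LINEAR16", "sample_rate": "16000"}}

start = outbound_frames(rules, {"kind": "turn_change", "context_id": "c1"}, config)
audio = outbound_frames(
    rules, {"kind": "audio", "context_id": "c1", "audio": {"bytes": b"\x00\x01"}}, config)
flush = outbound_frames(rules, {"kind": "interrupt", "context_id": "c1"}, config)
```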

Response rules

Response rules parse provider WebSocket frames into Rapida transcript packets. The first matching rule is evaluated and later rules are skipped for that frame.

Supported inbound frames:

| Frame | Use when |
| --- | --- |
| json | Provider returns structured transcript events. |
| text | Provider returns plain transcript text. |

Supported emit keys:

| Emit key | Type | Effect |
| --- | --- | --- |
| script | string | Transcript text. Empty transcripts are ignored. |
| confidence | number | Optional transcript confidence. Defaults to 0 when omitted. |
| language | string | Optional transcript language. Falls back to listen.language when omitted. |
| interim | boolean | true emits an interim transcript; false emits a completed transcript. |
| error | string | Emits an STT error instead of a transcript. |

JSON partial and final transcripts

[
  {
    "when": { "frame": "json", "path": "type", "equals": "partial" },
    "emit": {
      "script": { "$path": "text" },
      "confidence": {
        "$cast": "number",
        "value": { "$path": "confidence" }
      },
      "language": { "$path": "language" },
      "interim": true
    }
  },
  {
    "when": { "frame": "json", "path": "type", "equals": "final" },
    "emit": {
      "script": { "$path": "text" },
      "confidence": {
        "$cast": "number",
        "value": { "$path": "confidence" }
      },
      "language": { "$path": "language" },
      "interim": false
    }
  },
  {
    "when": { "frame": "json", "path": "type", "equals": "error" },
    "emit": {
      "error": { "$path": "error.message" }
    }
  }
]
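
The first-match behaviour and the emit defaults can be illustrated with a short Python sketch. It is an assumption-laden model, not the runtime: `handle_frame` and the returned dictionaries are hypothetical, the rules are trimmed versions of the ones above, and treating an omitted interim as false is an assumption.

```python
import json

_MISSING = object()


def lookup(doc, path):
    """Follow a dot path through nested dicts; return _MISSING if absent."""
    node = doc
    for key in path.split("."):
        if isinstance(node, dict) and key in node:
            node = node[key]
        else:
            return _MISSING
    return node


def render(expr, doc):
    if isinstance(expr, dict) and set(expr) == {"$path"}:
        value = lookup(doc, expr["$path"])
        if value is _MISSING:
            raise KeyError(expr["$path"])  # missing path in emit is an error
        return value
    if isinstance(expr, dict) and expr.get("$cast") == "number":
        return float(render(expr["value"], doc))
    return expr


def handle_frame(rules, raw, default_language=""):
    doc = json.loads(raw)
    for rule in rules:
        when = rule["when"]
        if when.get("frame") != "json":
            continue
        # A missing when.path never equals the expected value, so no match.
        if "path" in when and lookup(doc, when["path"]) != when.get("equals"):
            continue
        emit = {key: render(value, doc) for key, value in rule["emit"].items()}
        if "error" in emit:
            return {"type": "error", "message": emit["error"]}
        if not emit.get("script"):
            return None  # empty transcripts are ignored
        return {
            "type": "transcript",
            "script": emit["script"],
            "confidence": emit.get("confidence", 0),
            "language": emit.get("language") or default_language,
            "interim": bool(emit.get("interim", False)),  # assumed default
        }  # first match wins; later rules are skipped
    return None  # unmatched frames are ignored


rules = [
    {"when": {"frame": "json", "path": "type", "equals": "partial"},
     "emit": {"script": {"$path": "text"}, "interim": True}},
    {"when": {"frame": "json", "path": "type", "equals": "final"},
     "emit": {"script": {"$path": "text"},
              "confidence": {"$cast": "number", "value": {"$path": "confidence"}},
              "interim": False}},
    {"when": {"frame": "json", "path": "type", "equals": "error"},
     "emit": {"error": {"$path": "error.message"}}},
]

partial = handle_frame(rules, '{"type": "partial", "text": "hel"}', "en")
final = handle_frame(rules, '{"type": "final", "text": "hello", "confidence": "0.92"}', "en")
failure = handle_frame(rules, '{"type": "error", "error": {"message": "bad audio"}}')
ignored = handle_frame(rules, '{"type": "keepalive"}')
```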

Plain text transcript response

[
  {
    "when": { "frame": "text" },
    "emit": {
      "script": { "$frame": "text" },
      "interim": false
    }
  }
]

Operators

Every operator object must contain only that operator and its required fields.

| Operator | Where supported | Description |
| --- | --- | --- |
| $var | Query parameters | Reads model, language, encoding, or sample_rate. |
| $path | Request rules, response rules | Reads a dot path from request scope or a JSON response frame. |
| $cast | Query parameters, request rules, response rules | Casts to string, number, or boolean. |
| $frame | Response rules | Reads the full current text response frame. |

Unsupported for Custom STT:
  • $decode
  • Binary response handling
  • $frame: "binary"
  • $frame: "json"

Cast behavior

| Cast | Behavior |
| --- | --- |
| string | Converts strings, bytes, numbers, booleans, and null to string form. |
| number | Converts JSON numbers, numeric values, or numeric strings to an integer or float. |
| boolean | Converts booleans, boolean strings, and numeric values. JSON numbers are accepted as 0 or 1; typed numeric values use zero as false and non-zero as true. |
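
As a rough mental model of the table above, the casts can be approximated in Python. This sketch is not the runtime's code and simplifies in several places, all of which are assumptions: `cast` is a hypothetical name, null becomes the string "null", bytes are decoded as UTF-8, and every number uses the zero/non-zero rule for boolean, so the stricter 0-or-1 handling of JSON numbers is not modelled.

```python
def cast(kind, value):
    """Approximate the string, number, and boolean casts."""
    if kind == "string":
        if value is None:
            return "null"  # assumption: JSON-style spelling of null
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, bytes):
            return value.decode("utf-8")  # assumption: bytes hold UTF-8 text
        return str(value)

    if kind == "number":
        if isinstance(value, bool):
            raise TypeError("booleans are not treated as numbers in this sketch")
        if isinstance(value, (int, float)):
            return value
        if isinstance(value, str):
            text = value.strip()
            try:
                return int(text)  # "16000" stays an integer
            except ValueError:
                return float(text)  # "0.92" becomes a float; "abc" raises
        raise TypeError(f"cannot cast {type(value).__name__} to number")

    if kind == "boolean":
        if isinstance(value, bool):
            return value
        if isinstance(value, str):
            lowered = value.strip().lower()
            if lowered in ("true", "false"):
                return lowered == "true"
            raise ValueError(f"not a boolean string: {value!r}")
        if isinstance(value, (int, float)):
            return value != 0  # zero is false, non-zero is true
        raise TypeError(f"cannot cast {type(value).__name__} to boolean")

    raise ValueError(f"unknown cast: {kind}")


sample_rate = cast("number", "16000")
confidence = cast("number", "0.92")
label = cast("string", 8000)
is_final = cast("boolean", "true")
```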

Path behavior

$path uses dot-separated paths.
{ "$path": "packet.audio.base64" }
Objects are traversed by key. Arrays are traversed by numeric index.
{ "$path": "results.0.transcript" }
Limits:
  • Keys containing a literal dot are not addressable.
  • Request rules can only read from config and packet.
  • Response rules can use $path only with JSON response frames.
  • A missing path in when.path means the rule does not match.
  • A missing path in emit or send.body is an error.
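
The traversal rules and limits above can be sketched as a small resolver. This is an illustrative Python model, not Rapida's code; `resolve_path` is a hypothetical name and raising KeyError for a missing path is just this sketch's way of signalling the "missing" case, which the caller would treat as no match in when.path and as an error in emit or send.body.

```python
def resolve_path(root, path):
    """Resolve a dot-separated path through dicts (by key) and lists (by index)."""
    node = root
    for segment in path.split("."):
        if isinstance(node, dict):
            if segment not in node:
                raise KeyError(path)
            node = node[segment]
        elif isinstance(node, list):
            # Arrays are addressed by a numeric segment such as "0".
            if not segment.isdecimal() or int(segment) >= len(node):
                raise KeyError(path)
            node = node[int(segment)]
        else:
            raise KeyError(path)  # cannot descend into a primitive
    return node


frame = {
    "results": [{"transcript": "hello", "confidence": 0.9}],
    "meta.version": "1",  # key with a literal dot: not addressable
}

transcript = resolve_path(frame, "results.0.transcript")

try:
    resolve_path(frame, "meta.version")
    dotted_key_found = True
except KeyError:
    # The path is split into "meta" and "version", so the dotted key is never seen.
    dotted_key_found = False
```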

Runtime behavior

  • The connection URL is built from baseUrl and listen.ws.query_params.
  • Audio is converted to listen.audio.encoding and resampled to listen.audio.sample_rate before request rules run.
  • turn_change and audio packets open the WebSocket connection if needed.
  • interrupt rules are sent only when a connection is already active.
  • If no response rule matches an inbound frame, the frame is ignored.
  • If a response emits error, Rapida emits an STT error packet.
  • If a response emits non-empty script, Rapida emits a transcript packet and conversation event.

Current limits

  • No regex, contains, starts-with, greater-than, or compound match conditions.
  • No string interpolation or concatenation.
  • No fallback values inside expressions.
  • No dynamic headers or dynamic WebSocket path segments.
  • No binary response handling for STT.
  • No $decode.

Troubleshooting

| Symptom | Likely cause | What to check |
| --- | --- | --- |
| WebSocket does not connect | Bad baseUrl, headers, or compatibility value | Confirm apiCompatibility, baseUrl, and auth headers. |
| Provider receives no audio | Missing audio request rule | Add a when.packet = audio rule. |
| Provider receives JSON but expected binary | Wrong send.frame | Use binary with packet.audio.bytes. |
| Transcript never appears | Response rules do not match provider frames | Check when.frame, when.path, and when.equals. |
| Partial transcripts show as final | Wrong interim value | Emit interim: true for partial responses. |
| Language or sample rate is wrong | Query params or request body not mapped | Use $var in query params or $path from config.* in request rules. |
