The voice pipeline controls how Rapida captures user audio, prepares it for transcription, detects when the user has finished speaking, and speaks the assistant response back to the user. Use this overview to understand the full flow. Use the dedicated pages in this section when you need to tune one part of the pipeline.
Voice input and voice output are configured per deployment. The same assistant can use different voice settings for Phone Call, Web Widget, and Web App / SDK deployments.
Configure it
Open your assistant, select Configure Assistant, then open Deployments. Voice settings appear inside each deployment that supports audio.

| Deployment | Voice input | Voice output | Notes |
|---|---|---|---|
| Phone Call | Required | Required | Caller audio and assistant speech are both required for live calls. |
| Web Widget | Optional | Optional | Can run as text-only, voice-input-only, voice-output-only, or full voice. |
| Web App / SDK | Optional | Optional | Your application controls the UI while Rapida handles the audio pipeline. |
| WhatsApp | Not used | Not used | WhatsApp uses text messages, not the voice pipeline. |
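
The table above can be read as a small set of rules. The following is a minimal sketch of those rules in Python, useful if you want to check a planned configuration before applying it. The deployment keys, the rule labels, and the `validate_voice_settings` function are hypothetical names chosen for illustration; they are not Rapida API identifiers.

```python
# Hypothetical labels that mirror the deployment table; not Rapida identifiers.
# Each entry is (voice input rule, voice output rule).
VOICE_SUPPORT = {
    "phone_call": ("required", "required"),
    "web_widget": ("optional", "optional"),
    "web_app_sdk": ("optional", "optional"),
    "whatsapp": ("not_used", "not_used"),
}


def validate_voice_settings(deployment, voice_input, voice_output):
    """Return a list of problems; an empty list means the combination is allowed."""
    input_rule, output_rule = VOICE_SUPPORT[deployment]
    checks = (
        ("voice input", input_rule, voice_input),
        ("voice output", output_rule, voice_output),
    )
    problems = []
    for name, rule, enabled in checks:
        if rule == "required" and not enabled:
            problems.append(f"{deployment}: {name} is required")
        elif rule == "not_used" and enabled:
            problems.append(f"{deployment}: {name} is not used")
    return problems
```

For example, `validate_voice_settings("web_widget", True, False)` returns an empty list because a voice-input-only widget is allowed, while a phone call with voice output disabled is reported as a problem.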
Pipeline components
Noise Cancellation
Clean background noise before VAD and STT process the user’s audio.
Speech-to-Text
Choose the provider, credential, model, and language used to transcribe user speech.
Text-to-Speech
Choose the provider, voice, model, language, pronunciation, and speech delivery settings.
Voice Activity Detection
Tune speech detection, silence frames, and barge-in sensitivity.
End of Speech Detection
Decide when the user has finished a turn and the assistant should respond.
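
To make the relationship between Voice Activity Detection and End of Speech Detection concrete, here is a generic sketch of silence-based turn detection. It assumes the VAD emits one speech or silence decision per audio frame and that a turn ends after a fixed number of consecutive silent frames. The function name and the `silence_frames` parameter are hypothetical; this illustrates the general technique and is not Rapida's implementation, nor how a model-based detector such as Pipecat Smart Turn works.

```python
from typing import Iterable, Optional


def end_of_turn_index(frames: Iterable[bool], silence_frames: int) -> Optional[int]:
    """Return the index of the frame that closes the turn, or None if it is still open.

    frames: per-frame VAD decisions, True for speech and False for silence.
    silence_frames: consecutive silent frames required after speech has started.
    """
    heard_speech = False
    silent_run = 0
    for index, is_speech in enumerate(frames):
        if is_speech:
            # Any speech frame resets the silence counter.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Silence only counts once the user has started speaking.
            silent_run += 1
            if silent_run >= silence_frames:
                return index
    return None
```

A lower threshold makes the assistant respond sooner but risks cutting the user off during a short pause. A higher threshold is safer for slow speakers but adds latency to every turn.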
Recommended starting point
| Area | Start with |
|---|---|
| STT | A streaming provider and model that matches your channel audio. |
| Noise cancellation | RNNoise enabled for phone calls and noisy browser environments. |
| VAD | Silero VAD. |
| EOS | Pipecat Smart Turn for natural conversations, or Silence-Based for simple IVR-style flows. |
| TTS | A low-latency streaming voice that supports the assistant’s primary language. |
| Prompt | Short spoken responses, usually one or two sentences. |
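
The recommendations above depend on the channel and the conversation style. The sketch below encodes only the rows that the table states unconditionally or with a clear condition (noise cancellation, VAD, and EOS). The `VoiceDefaults` class, the `starting_point` function, and all string values are hypothetical names for illustration, not Rapida configuration keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceDefaults:
    noise_cancellation: Optional[str]  # None means no recommendation from the table
    vad: str
    end_of_speech: str


def starting_point(channel: str, noisy: bool = False, ivr_style: bool = False) -> VoiceDefaults:
    """Map the recommended starting point onto a channel and conversation style."""
    # RNNoise is recommended for phone calls and noisy browser environments.
    use_rnnoise = channel == "phone_call" or noisy
    return VoiceDefaults(
        noise_cancellation="rnnoise" if use_rnnoise else None,
        vad="silero",
        # Silence-based detection suits simple IVR-style flows; otherwise Smart Turn.
        end_of_speech="silence_based" if ivr_style else "pipecat_smart_turn",
    )
```

STT, TTS, and prompt choices are left out on purpose because the right provider and model depend on your channel audio and the assistant's primary language.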
Troubleshooting map
| Symptom | First place to look |
|---|---|
| Assistant responds before the user is done | End of Speech Detection |
| Assistant interrupts on coughs or background noise | Voice Activity Detection and Noise Cancellation |
| Transcript is wrong or incomplete | Speech-to-Text |
| Assistant voice is slow, unnatural, or mispronounces terms | Text-to-Speech |
| Phone calls behave differently from web sessions | Deployment-level voice input/output settings |
Related
Phone Call Deployment
Configure required voice input and output for phone calls.
Web Widget Deployment
Add optional microphone input and spoken responses to the website widget.
Web App / SDK Deployment
Build a custom voice interface while Rapida handles the audio pipeline.
Create an Assistant
Create the assistant before configuring deployment voice settings.