Voice pipeline overview - rapida.ai documentation

The voice pipeline controls how Rapida captures user audio, prepares it for transcription, detects when the user has finished speaking, and speaks the assistant response back to the user. Use this overview to understand the full flow. Use the dedicated pages in this section when you need to tune one part of the pipeline.

Voice input and voice output are configured per deployment. The same assistant can use different voice settings for Phone Call, Web Widget, and Web App / SDK deployments.

Configure it

Open your assistant, select Configure Assistant, then open Deployments. Voice settings appear inside each deployment that supports audio.

Deployment	Voice input	Voice output	Notes
Phone Call	Required	Required	Caller audio and assistant speech are both required for live calls.
Web Widget	Optional	Optional	Can run as text-only, voice-input-only, voice-output-only, or full voice.
Web App / SDK	Optional	Optional	Your application controls the UI while Rapida handles the audio pipeline.
WhatsApp	Not used	Not used	WhatsApp uses text messages, not the voice pipeline.
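
The table above can be read as a set of rules that a deployment's voice settings must satisfy. The sketch below encodes those rules for illustration only. The deployment keys, the `Requirement` enum, and `validate_voice_settings` are hypothetical names, not part of a Rapida API.

```python
from enum import Enum
from typing import List


class Requirement(Enum):
    REQUIRED = "required"
    OPTIONAL = "optional"
    NOT_USED = "not used"


# Hypothetical lookup mirroring the table: (voice input, voice output).
VOICE_REQUIREMENTS = {
    "phone_call": (Requirement.REQUIRED, Requirement.REQUIRED),
    "web_widget": (Requirement.OPTIONAL, Requirement.OPTIONAL),
    "web_app_sdk": (Requirement.OPTIONAL, Requirement.OPTIONAL),
    "whatsapp": (Requirement.NOT_USED, Requirement.NOT_USED),
}


def validate_voice_settings(deployment: str, voice_input: bool, voice_output: bool) -> List[str]:
    """Return a list of problems; an empty list means the combination is allowed."""
    input_rule, output_rule = VOICE_REQUIREMENTS[deployment]
    checks = (
        ("voice input", input_rule, voice_input),
        ("voice output", output_rule, voice_output),
    )
    problems = []
    for label, rule, enabled in checks:
        if rule is Requirement.REQUIRED and not enabled:
            problems.append(f"{deployment}: {label} is required")
        if rule is Requirement.NOT_USED and enabled:
            problems.append(f"{deployment}: {label} is not used")
    return problems
```

For example, `validate_voice_settings("web_widget", False, True)` returns an empty list because the widget can run as voice-output-only, while a phone call with voice output disabled is reported as a problem.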

Pipeline components

Noise Cancellation

Remove background noise before Voice Activity Detection (VAD) and Speech-to-Text (STT) process the user’s audio.

Speech-to-Text

Choose the provider, credential, model, and language used to transcribe user speech.

Text-to-Speech

Choose the provider, voice, model, language, pronunciation, and speech delivery settings.

Voice Activity Detection

Tune speech detection, silence frames, and barge-in sensitivity.

End of Speech Detection

Decide when the user has finished a turn and the assistant should respond.
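
To make the relationship between VAD and End of Speech Detection concrete, here is a minimal sketch of a silence-based turn detector. It assumes the VAD emits one speech/no-speech decision per audio frame and that a turn ends after a fixed run of silent frames. The class name, the frame size, and the timeout are illustrative assumptions, not Rapida's implementation or defaults.

```python
class SilenceBasedEndOfSpeech:
    """Toy turn detector: the turn ends after enough consecutive silent frames."""

    def __init__(self, frame_ms: int = 20, silence_timeout_ms: int = 600):
        # Both defaults are illustrative values, not Rapida settings.
        self.frames_needed = silence_timeout_ms // frame_ms
        self.silent_frames = 0
        self.heard_speech = False

    def push(self, is_speech: bool) -> bool:
        """Feed one VAD decision; return True when the user's turn has ended."""
        if is_speech:
            self.heard_speech = True
            self.silent_frames = 0  # Any speech resets the silence counter.
            return False
        if not self.heard_speech:
            return False  # Leading silence cannot end a turn that never started.
        self.silent_frames += 1
        if self.silent_frames >= self.frames_needed:
            # Reset so the detector is ready for the next turn.
            self.heard_speech = False
            self.silent_frames = 0
            return True
        return False
```

A shorter timeout makes the assistant respond sooner but raises the chance of cutting off a user who pauses mid-sentence. That trade-off is why a model-based option can suit natural conversations better than a fixed silence window.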

Recommended starting point

Area	Start with
STT	A streaming provider and model that matches your channel audio.
Noise cancellation	RNNoise enabled for phone calls and noisy browser environments.
VAD	Silero VAD.
EOS	Pipecat Smart Turn for natural conversations, or Silence-Based for simple IVR-style flows.
TTS	A low-latency streaming voice that supports the assistant’s primary language.
Prompt	Short spoken responses, usually one or two sentences.
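
The starting point in the table can be written down as a configuration object, which makes it easier to review and compare across deployments. This is only a sketch: voice settings are configured in the Deployments screen, and every key, value, and the `starting_point` function below are hypothetical names chosen for illustration, not a Rapida configuration schema.

```python
def starting_point(channel: str, ivr_style: bool = False, noisy: bool = False) -> dict:
    """Build a hypothetical starting configuration from the table above."""
    return {
        # Provider and model are left out: pick ones that match the channel audio.
        "stt": {"streaming": True},
        "noise_cancellation": {
            "engine": "rnnoise",
            # Enabled for phone calls and for noisy browser environments.
            "enabled": channel == "phone_call" or noisy,
        },
        "vad": {"engine": "silero"},
        "eos": {"strategy": "silence_based" if ivr_style else "pipecat_smart_turn"},
        # Pick a low-latency voice that supports the assistant's primary language.
        "tts": {"streaming": True},
        "prompt": {"max_sentences": 2},
    }
```

Calling `starting_point("phone_call", ivr_style=True)` selects the silence-based strategy with noise cancellation enabled, while `starting_point("web_widget")` selects the smart-turn strategy and leaves noise cancellation off unless `noisy=True` is passed.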

Tune the pipeline from real conversation logs. If a caller gets cut off, start with EOS and VAD. If transcription is wrong, check language, audio quality, noise cancellation, and STT model. If the assistant feels slow, check EOS timeout, LLM latency, and TTS latency.
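
When the assistant feels slow, it helps to put rough numbers on the three stages named above. The sketch below assumes the stages run one after another and dominate the delay the user perceives, which is a simplification that ignores network and transcription time. The function name and the sample values are hypothetical and should be replaced with measurements from your own conversation logs.

```python
def response_delay(eos_timeout_ms: int, llm_first_token_ms: int, tts_first_audio_ms: int):
    """Return (total delay in ms, name of the largest contributor)."""
    stages = {
        "eos_timeout": eos_timeout_ms,
        "llm_latency": llm_first_token_ms,
        "tts_latency": tts_first_audio_ms,
    }
    total = sum(stages.values())
    slowest = max(stages, key=stages.get)
    return total, slowest


# Sample values for illustration only.
total, slowest = response_delay(700, 450, 200)
print(total, slowest)
```

With these sample values the EOS timeout is the largest contributor, so shortening it would help more than switching to a faster voice.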

Troubleshooting map

Symptom	First place to look
Assistant responds before the user is done	End of Speech Detection
Assistant interrupts on coughs or background noise	Voice Activity Detection and Noise Cancellation
Transcript is wrong or incomplete	Speech-to-Text
Assistant voice is slow, unnatural, or mispronounces terms	Text-to-Speech
Phone calls behave differently from web sessions	Deployment-level voice input/output settings

Related

Phone Call Deployment

Configure required voice input and output for phone calls.

Web Widget Deployment

Add optional microphone input and spoken responses to the website widget.

Web App / SDK Deployment

Build a custom voice interface while Rapida handles the audio pipeline.

Create an Assistant

Create the assistant before configuring deployment voice settings.
