Pipecat Smart Turn uses a Whisper-based audio model (~8 MB) to predict turn completion directly from speech audio. It detects prosodic cues — falling intonation, speech rate changes — that indicate a caller has finished speaking.

Provider identifier: `pipecat_smart_turn_eos`
Source Location
How It Works
- `UserAudioPacket` audio is accumulated in a rolling float32 buffer (max ~5 seconds at 16 kHz)
- When a final `SpeechToTextPacket` arrives, the model runs inference on the buffered audio
- The model outputs a turn-completion probability (0.0–1.0)
- If `probability >= threshold` → set the timer to `quick_timeout` (caller is likely done)
- If `probability < threshold` → set the timer to `silence_timeout` (caller is likely still speaking)
- Interim transcripts reset the timer to the fallback timeout
- When the timer fires, an `EndOfSpeechPacket` is emitted
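The buffering-and-timer flow above can be sketched roughly as follows. This is a minimal illustration, not the actual Rapida implementation: the class name, method names, and model callable are assumptions; only the buffer size, threshold comparison, and timeout selection come from the steps above.

```python
import numpy as np

SAMPLE_RATE = 16_000        # Hz, per the rolling-buffer description
MAX_BUFFER_SECONDS = 5      # rolling buffer cap (~5 s)

class SmartTurnEOS:
    """Sketch of the end-of-speech decision flow (hypothetical API)."""

    def __init__(self, model, threshold=0.5, quick_timeout=0.25,
                 silence_timeout=2.0, fallback_timeout=0.5):
        self.model = model                  # callable returning a probability in [0.0, 1.0]
        self.threshold = threshold
        self.quick_timeout = quick_timeout          # seconds
        self.silence_timeout = silence_timeout      # seconds
        self.fallback_timeout = fallback_timeout    # seconds
        self.buffer = np.zeros(0, dtype=np.float32)

    def on_user_audio(self, samples: np.ndarray) -> None:
        # Accumulate audio in a rolling float32 buffer, keeping only
        # the most recent ~5 seconds at 16 kHz.
        self.buffer = np.concatenate([self.buffer, samples.astype(np.float32)])
        self.buffer = self.buffer[-SAMPLE_RATE * MAX_BUFFER_SECONDS:]

    def on_transcript(self, is_final: bool) -> float:
        """Return the silence timer (seconds) to arm for this transcript."""
        if not is_final:
            # Interim transcripts reset the timer to the fallback value.
            return self.fallback_timeout
        try:
            prob = self.model(self.buffer)  # inference on the buffered audio
        except Exception:
            return self.fallback_timeout    # inference failure -> fallback
        if prob >= self.threshold:
            return self.quick_timeout       # caller is likely done
        return self.silence_timeout         # caller is likely still speaking
```

When the armed timer elapses without further speech, the `EndOfSpeechPacket` would be emitted.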
Parameters
| Option Key | Default | Range | Description |
|---|---|---|---|
| `microphone.eos.threshold` | 0.5 | 0.1–0.9 | Turn completion probability threshold |
| `microphone.eos.quick_timeout` | 250 ms | 50–1000 ms | Silence buffer when the model says "done" |
| `microphone.eos.silence_timeout` | 2000 ms | 500–5000 ms | Silence duration when the model says "still speaking" |
| `microphone.eos.timeout` | 500 ms | 500–4000 ms | Fallback timeout for interim transcripts and inference failures |
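As a sketch, an option map overriding the defaults might look like the following. The surrounding configuration structure and the `validate` helper are assumptions for illustration; only the option keys and ranges come from the table above.

```python
# Hypothetical options map; keys are the documented option keys,
# timeout values in milliseconds as in the table above.
eos_options = {
    "microphone.eos.threshold": 0.6,         # stricter than the 0.5 default
    "microphone.eos.quick_timeout": 200,     # ms, within 50-1000
    "microphone.eos.silence_timeout": 1500,  # ms, within 500-5000
    "microphone.eos.timeout": 500,           # ms, fallback default
}

# Documented ranges from the parameters table.
RANGES = {
    "microphone.eos.threshold": (0.1, 0.9),
    "microphone.eos.quick_timeout": (50, 1000),
    "microphone.eos.silence_timeout": (500, 5000),
    "microphone.eos.timeout": (500, 4000),
}

def validate(options: dict) -> None:
    """Raise ValueError if any option falls outside its documented range."""
    for key, (lo, hi) in RANGES.items():
        value = options[key]
        if not lo <= value <= hi:
            raise ValueError(f"{key}={value} outside [{lo}, {hi}]")
```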
Model Setup
Docker
The model is downloaded from Hugging Face during the Docker build and patched for ONNX Runtime compatibility. No manual action is required.
From Source
Download the model manually. Inference requires the ONNX Runtime shared library (libonnxruntime) — the same dependency as the Silero/FireRed VAD.