silero_vad
Source Location
How It Works
- Incoming LINEAR16 bytes are converted to `float32` samples in the range `[-1.0, 1.0]`
- Samples are fed to the Silero ONNX detector, which produces speech segments with start/end timestamps
- On speech onset, an `InterruptionPacket` is emitted (triggers barge-in)
- While speech is active, `VadSpeechActivityPacket` heartbeats keep the EOS timer from firing
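The first step in the list, converting LINEAR16 bytes to `float32` samples, can be sketched as follows. This is a minimal illustration, not the component's actual code: the function name `linear16ToFloat32` is hypothetical, and the sketch assumes little-endian, 16-bit signed, mono PCM scaled by 32768.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// linear16ToFloat32 (hypothetical helper) converts little-endian 16-bit
// signed PCM bytes into float32 samples in the range [-1.0, 1.0].
// A trailing odd byte, if present, is ignored.
func linear16ToFloat32(pcm []byte) []float32 {
	n := len(pcm) / 2
	out := make([]float32, n)
	for i := 0; i < n; i++ {
		// Reinterpret the unsigned 16-bit word as a signed sample.
		s := int16(binary.LittleEndian.Uint16(pcm[2*i:]))
		// Assumption: scale by 32768 so that -32768 maps exactly to -1.0.
		out[i] = float32(s) / 32768.0
	}
	return out
}

func main() {
	// Three samples: 0, the maximum positive value, the minimum negative value.
	pcm := []byte{0x00, 0x00, 0xFF, 0x7F, 0x00, 0x80}
	fmt.Println(linear16ToFloat32(pcm))
}
```

Dividing by 32768 keeps every sample inside the documented range; the largest positive sample lands just below 1.0.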
Parameters
| Option Key | Default | Range | Description |
|---|---|---|---|
| `microphone.vad.threshold` | 0.5 | 0.3–1.0 | Speech probability threshold |
| `microphone.vad.min_silence_frame` | 20 | 1–30 | Silence frames before segment end (× 10 ms) |
| `microphone.vad.min_speech_frame` | 8 | 1–20 | Speech frames before segment start (× 10 ms) |
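The frame counts are converted to milliseconds before being passed to the detector: `min_silence_frame` × 10 = `MinSilenceDurationMs` and `min_speech_frame` × 10 = `SpeechPadMs`. With the defaults, that is 200 ms and 80 ms respectively.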
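As a rough sketch of the frame-to-millisecond conversion and the documented ranges, the Go snippet below uses a hypothetical `vadOptions` struct. The struct, its methods, and the choice to reject out-of-range values (rather than clamp them) are assumptions for illustration; only the option keys, defaults, ranges, and the × 10 factor come from the table.

```go
package main

import "fmt"

// frameMs is the duration of one frame unit, per the Parameters table.
const frameMs = 10

// vadOptions is a hypothetical container for the three option keys.
type vadOptions struct {
	Threshold       float64 // microphone.vad.threshold
	MinSilenceFrame int     // microphone.vad.min_silence_frame
	MinSpeechFrame  int     // microphone.vad.min_speech_frame
}

// defaultVadOptions returns the documented defaults.
func defaultVadOptions() vadOptions {
	return vadOptions{Threshold: 0.5, MinSilenceFrame: 20, MinSpeechFrame: 8}
}

// validate checks each value against its documented range.
// Assumption: out-of-range values are rejected, not clamped.
func (o vadOptions) validate() error {
	if o.Threshold < 0.3 || o.Threshold > 1.0 {
		return fmt.Errorf("threshold %.2f outside 0.3-1.0", o.Threshold)
	}
	if o.MinSilenceFrame < 1 || o.MinSilenceFrame > 30 {
		return fmt.Errorf("min_silence_frame %d outside 1-30", o.MinSilenceFrame)
	}
	if o.MinSpeechFrame < 1 || o.MinSpeechFrame > 20 {
		return fmt.Errorf("min_speech_frame %d outside 1-20", o.MinSpeechFrame)
	}
	return nil
}

// minSilenceDurationMs converts silence frames to milliseconds.
func (o vadOptions) minSilenceDurationMs() int { return o.MinSilenceFrame * frameMs }

// speechPadMs converts speech frames to milliseconds.
func (o vadOptions) speechPadMs() int { return o.MinSpeechFrame * frameMs }

func main() {
	o := defaultVadOptions()
	if err := o.validate(); err != nil {
		fmt.Println("invalid options:", err)
		return
	}
	fmt.Println(o.minSilenceDurationMs(), o.speechPadMs()) // prints "200 80"
}
```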
Model Path
| Source | Resolution |
|---|---|
| Environment variable | SILERO_MODEL_PATH — absolute path to the .onnx file |
| Default (Docker) | ./models/silero_vad/silero_vad_20251001.onnx |
| Default (source) | api/assistant-api/internal/vad/internal/silero_vad/models/silero_vad_20251001.onnx |
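The lookup order in the table can be sketched as below. This is an assumption-laden illustration: `resolveModelPath` is a hypothetical helper, and it falls back only to the Docker default, since the document does not say how the service chooses between the Docker and source defaults.

```go
package main

import (
	"fmt"
	"os"
)

// defaultDockerModelPath is the Docker default from the Model Path table.
const defaultDockerModelPath = "./models/silero_vad/silero_vad_20251001.onnx"

// resolveModelPath (hypothetical helper) returns SILERO_MODEL_PATH when it
// is set to a non-empty value, otherwise the Docker default path.
// The getenv parameter makes the lookup easy to exercise without touching
// the real process environment.
func resolveModelPath(getenv func(string) string) string {
	if p := getenv("SILERO_MODEL_PATH"); p != "" {
		return p
	}
	return defaultDockerModelPath
}

func main() {
	fmt.Println("model path:", resolveModelPath(os.Getenv))
}
```

Running the program with `SILERO_MODEL_PATH=/opt/models/custom.onnx` set in the environment would print that path instead of the default.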