Voice Activity Detection (VAD) runs on every audio frame before STT. It determines whether the caller is speaking, emits interruption signals for barge-in, and sends speech-activity heartbeats that keep the End of Speech timer from firing during active speech. All three VAD providers run locally, using either an ONNX model or a native C library, so there are no external API calls, no credentials, and no network latency.

VAD Interface

Every provider implements the Vad interface:
// api/assistant-api/internal/type/vad.go
type Vad interface {
    Name() string
    Process(ctx context.Context, pkt UserAudioPacket) error
    Close() error
}
Input audio is always 16 kHz LINEAR16 mono — the platform’s internal format. Resampling from 8 kHz telephony audio happens upstream before VAD sees it.

Factory Function

// api/assistant-api/internal/vad/vad.go
func GetVAD(ctx, logger, callback, options) (Vad, error)
The factory reads microphone.vad.provider from the assistant’s audio options and returns the matching implementation. If no provider is set, Silero VAD is used as the default.
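The selection logic described above can be sketched as follows. This is an illustration only: `resolveProvider` is a hypothetical helper, the options are modeled as a plain `map[string]any`, and returning an error for an unknown identifier is an assumption, since this guide does not say how the real factory handles that case.

```go
package main

import "fmt"

// Provider identifiers as documented for microphone.vad.provider.
const (
	providerSilero  = "silero_vad"
	providerTEN     = "ten_vad"
	providerFireRed = "firered_vad"
)

// resolveProvider is a hypothetical helper that picks a provider
// identifier from the options, defaulting to Silero VAD when unset.
func resolveProvider(options map[string]any) (string, error) {
	raw, ok := options["microphone.vad.provider"]
	if !ok {
		return providerSilero, nil
	}
	name, isString := raw.(string)
	if !isString {
		return "", fmt.Errorf("microphone.vad.provider must be a string, got %T", raw)
	}
	switch name {
	case "":
		return providerSilero, nil
	case providerSilero, providerTEN, providerFireRed:
		return name, nil
	default:
		// Assumption: unknown identifiers are rejected rather than ignored.
		return "", fmt.Errorf("unknown VAD provider %q", name)
	}
}

func main() {
	for _, opts := range []map[string]any{
		{},
		{"microphone.vad.provider": "ten_vad"},
		{"microphone.vad.provider": "nope"},
	} {
		name, err := resolveProvider(opts)
		fmt.Printf("provider=%q err=%v\n", name, err)
	}
}
```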

Provider Identifiers

| Identifier | Provider | Model |
| --- | --- | --- |
| silero_vad | Silero VAD | silero_vad_20251001.onnx (~2 MB) |
| ten_vad | TEN VAD | Native C library (no model file) |
| firered_vad | FireRed VAD | fireredvad_stream_vad_with_cache.onnx (~5 MB) |

Shared Parameters

All three providers use the same configuration keys:
| Option Key | Type | Default | Description |
| --- | --- | --- | --- |
| microphone.vad.provider | string | silero_vad | Provider selection |
| microphone.vad.threshold | float | 0.5 | Speech probability threshold (0.3–1.0). Lower = more sensitive. |
| microphone.vad.min_silence_frame | float | 20 | Consecutive silence frames before ending a speech segment. Each frame = 10 ms. Default 20 = 200 ms. |
| microphone.vad.min_speech_frame | float | 8 | Consecutive speech frames before confirming speech onset. Each frame = 10 ms. Default 8 = 80 ms. |

Model Files

VAD models are bundled in the Docker image at build time. For source builds, models are resolved from the source tree relative to each provider’s Go package directory.

Docker

Models are copied into the runtime image and referenced via environment variables:
SILERO_MODEL_PATH=./models/silero_vad/silero_vad_20251001.onnx
FIRERED_VAD_MODEL_PATH=./models/firered_vad/fireredvad_stream_vad_with_cache.onnx
TEN VAD uses a native shared library instead of an ONNX model:
LD_LIBRARY_PATH="/opt/ten_vad/lib:..."
These are set automatically in the Dockerfile — no manual configuration needed when running via Docker Compose.

From Source

When running go run cmd/assistant/assistant.go, each provider resolves its model path using runtime.Caller to find the source directory:
| Provider | Default path (relative to package dir) |
| --- | --- |
| Silero VAD | api/assistant-api/internal/vad/internal/silero_vad/models/silero_vad_20251001.onnx |
| FireRed VAD | api/assistant-api/internal/vad/internal/firered_vad/models/fireredvad_stream_vad_with_cache.onnx |
| TEN VAD | No model file; requires libten_vad.so in the library path |
Silero and FireRed models are checked into the repository. TEN VAD requires the shared library to be installed or available in LD_LIBRARY_PATH (Linux) or DYLD_LIBRARY_PATH (macOS). To override model paths, set the environment variables:
export SILERO_MODEL_PATH=/path/to/silero_vad.onnx
export FIRERED_VAD_MODEL_PATH=/path/to/fireredvad.onnx

CGO Dependencies

All VAD providers use CGO for inference:
| Provider | CGO dependency | Build flags |
| --- | --- | --- |
| Silero VAD | ONNX Runtime (libonnxruntime) | -I/opt/onnxruntime/include -L/opt/onnxruntime/lib -lonnxruntime |
| FireRed VAD | ONNX Runtime (libonnxruntime) | Same as Silero |
| TEN VAD | TEN VAD library (libten_vad) | -I/opt/ten_vad/include -L/opt/ten_vad/lib -lten_vad |
The Docker base image (rapidaai/rapida-golang) includes all CGO dependencies pre-installed. For local source builds, you need ONNX Runtime and the TEN VAD shared library installed on your system.

Providers

| Provider | Approach | Best for |
| --- | --- | --- |
| Silero VAD | ONNX model, 100+ languages | General purpose (the default) |
| TEN VAD | Native C library, 16 ms frame hops | Lowest per-frame latency |
| FireRed VAD | DFSMN model, 4-state postprocessor | Noisy environments, precise boundaries |
See the VAD concepts guide for detailed parameter tuning guidance and use-case recommendations.