Architecture
The assistant-api decouples audio I/O from provider-specific logic through a transformer layer. Each provider implements the same generic interface, and the factory functions resolve the provider string at call time.
Transformer Interface
Every STT and TTS provider implements the same Transformers[IN] generic interface.
Provider Identifiers
Provider strings are defined as AudioTransformer constants in api/assistant-api/internal/transformer/transformer.go.
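A sketch of how such constants might be declared follows. The string values are the unparenthesized identifiers from the Supported Providers table. The Go constant names and the String method are assumptions, since the actual declarations are not shown here.

```go
package main

import "fmt"

// AudioTransformer is the string type used for provider identifiers.
type AudioTransformer string

// Constant names are hypothetical; the values come from the provider table.
const (
	Deepgram            AudioTransformer = "deepgram"
	GoogleSpeechService AudioTransformer = "google-speech-service"
	AzureSpeechService  AudioTransformer = "azure-speech-service"
	Cartesia            AudioTransformer = "cartesia"
	AssemblyAI          AudioTransformer = "assemblyai"
	RevAI               AudioTransformer = "revai"
	SarvamAI            AudioTransformer = "sarvamai"
)

// String returns the raw identifier, e.g. for logging.
func (a AudioTransformer) String() string { return string(a) }

func main() {
	provider := "deepgram" // e.g. read from assistant configuration
	fmt.Println(AudioTransformer(provider) == Deepgram)
}
```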
Factory Functions
The factory functions accept the provider string and return the correct implementation. Each one switches on AudioTransformer(provider), so adding a new provider means adding a case to each switch.
Supported Providers
STT
| Provider | Identifier | Streaming | Notes |
|---|---|---|---|
| Deepgram | deepgram | ✅ | Nova-2 / Nova-3; WebSocket SDK |
| Google Cloud STT | google-speech-service | ✅ | 100+ languages |
| Azure Cognitive Services | azure-speech-service | ✅ | Neural Speech |
| Cartesia | cartesia | ✅ | Low latency |
| AssemblyAI | assemblyai | ✅ | Speaker diarization |
| Rev.ai | revai | ✅ | Real-time |
| Sarvam AI | sarvamai | ✅ | Indian languages |
| AWS Transcribe | (aws) | ✅ | Real-time streaming |
| OpenAI Whisper | (openai) | ✗ | Batch only |
| Speechmatics | (speechmatics) | ✅ | Real-time |
Reference Implementation — Deepgram
The Deepgram transformer is the reference implementation. Its structure is the same for every provider.
Option struct (deepgram.go)
The credential field name ("key") matches what is stored in the credential vault.
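As an illustration of credential extraction, the sketch below reads the "key" field from a credential map. Only the field name "key" comes from the text above. The Option fields, the NewOption constructor, and the map[string]any credential type are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// Option is a hypothetical shape for the per-provider option struct.
type Option struct {
	apiKey string
}

// NewOption extracts the API key from a vault credential.
// The map type and function name are assumed; "key" is the documented field.
func NewOption(credential map[string]any) (*Option, error) {
	raw, ok := credential["key"]
	if !ok {
		return nil, errors.New(`deepgram: credential is missing "key"`)
	}
	key, ok := raw.(string)
	if !ok || key == "" {
		return nil, errors.New(`deepgram: credential "key" must be a non-empty string`)
	}
	return &Option{apiKey: key}, nil
}

// APIKey returns the extracted key for client initialization.
func (o *Option) APIKey() string { return o.apiKey }

func main() {
	opt, err := NewOption(map[string]any{"key": "dg-example"})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(opt.APIKey()) > 0)
}
```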
STT implementation (stt.go)
UserAudioPacket.Audio contains raw PCM audio bytes: 16-bit samples at 16 kHz.
TTS implementation (tts.go)
The onPacket callback is called with each synthesized audio chunk, which is sent directly to the client.
Adding a New Provider
Follow these steps to add a new STT or TTS provider.
Create the provider directory
| File | Purpose |
|---|---|
| <provider>.go | Option struct, credential extraction, client initialization |
| stt.go | STT implementation (omit if TTS-only) |
| tts.go | TTS implementation (omit if STT-only) |
| normalizer.go | Text normalizer for TTS (strip markdown, apply pronunciation dict) |
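Of the files above, normalizer.go is the least self-explanatory, so here is a deliberately naive sketch of the two duties the table lists: stripping markdown and applying a pronunciation dictionary. The Normalizer type, its field, and the regular expressions are assumptions. A real normalizer would likely handle punctuation, identifiers containing underscores, and provider-specific markup more carefully.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// [label](url) is reduced to its label.
	mdLink = regexp.MustCompile(`\[([^\]]+)\]\([^)]*\)`)
	// Remaining emphasis, code, heading, and quote markers are dropped.
	mdMarkers = regexp.MustCompile("[*_`#>]+")
)

// Normalizer is a hypothetical TTS text normalizer.
type Normalizer struct {
	// Pronunciations maps lowercase words to their spoken form.
	Pronunciations map[string]string
}

// Normalize strips markdown, then replaces whole words found in the dictionary.
func (n Normalizer) Normalize(text string) string {
	text = mdLink.ReplaceAllString(text, "$1")
	text = mdMarkers.ReplaceAllString(text, "")
	words := strings.Fields(text) // also collapses repeated whitespace
	for i, w := range words {
		if spoken, ok := n.Pronunciations[strings.ToLower(w)]; ok {
			words[i] = spoken
		}
	}
	return strings.Join(words, " ")
}

func main() {
	n := Normalizer{Pronunciations: map[string]string{"api": "A P I"}}
	fmt.Println(n.Normalize("**Hello** [docs](https://example.com) via API"))
}
```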
Register in the factory (transformer.go)
Add a case to both factory functions in api/assistant-api/internal/transformer/transformer.go. No changes to any other service are needed.