`assistant-api` decouples speech synthesis from provider-specific logic using the same transformer layer as STT. Each TTS provider implements `Transformers[LLMPacket]` and is resolved at call time by a factory function.
Transformer Interface
LLMPacket Types
The `LLMPacket` interface is satisfied by three packet types that TTS providers must handle:
| Packet Type | Description | Action |
|---|---|---|
| `LLMResponseDeltaPacket` | A text token from the LLM stream | Send the token to TTS for synthesis |
| `LLMResponseDonePacket` | End of the LLM response | Flush/finalise synthesis |
| `InterruptionPacket` | The user started speaking, so TTS is cancelled | Clear/cancel queued audio |
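Each row of the table corresponds to one branch of a type switch in the provider. The sketch below models a provider session in memory. Deltas are queued, a done packet flushes the queue to synthesis, and an interruption discards whatever has not yet been spoken. The packet struct fields and the `ttsSession` type are hypothetical stand-ins, not the assistant-api types, and a real provider would stream each delta to its backend instead of buffering locally.

```go
package main

import (
	"fmt"
	"strings"
)

// LLMPacket is a hypothetical stand-in for assistant-api's packet interface.
type LLMPacket interface{ isLLMPacket() }

// Hypothetical packet shapes mirroring the three rows of the table.
type LLMResponseDeltaPacket struct{ Text string }
type LLMResponseDonePacket struct{}
type InterruptionPacket struct{}

func (LLMResponseDeltaPacket) isLLMPacket() {}
func (LLMResponseDonePacket) isLLMPacket() {}
func (InterruptionPacket) isLLMPacket() {}

// ttsSession is a hypothetical in-memory provider session.
type ttsSession struct {
	queued      strings.Builder // text sent to the provider but not yet flushed
	synthesised []string        // text flushed for synthesis, one entry per response
}

// Handle applies the table's action for each packet type.
func (s *ttsSession) Handle(p LLMPacket) error {
	switch pkt := p.(type) {
	case LLMResponseDeltaPacket:
		// Send the token for synthesis (modelled here as queueing it).
		s.queued.WriteString(pkt.Text)
	case LLMResponseDonePacket:
		// Flush/finalise: everything queued for this response is synthesised.
		if s.queued.Len() > 0 {
			s.synthesised = append(s.synthesised, s.queued.String())
			s.queued.Reset()
		}
	case InterruptionPacket:
		// Interruption: discard queued text so it is never spoken.
		s.queued.Reset()
	default:
		return fmt.Errorf("unsupported packet type %T", p)
	}
	return nil
}

func main() {
	s := &ttsSession{}
	packets := []LLMPacket{
		LLMResponseDeltaPacket{Text: "Hello, "},
		LLMResponseDeltaPacket{Text: "world."},
		LLMResponseDonePacket{},
		LLMResponseDeltaPacket{Text: "never spoken"},
		InterruptionPacket{},
	}
	for _, p := range packets {
		if err := s.Handle(p); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println(s.synthesised) // [Hello, world.]
}
```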
Factory Function
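A registry keyed by the provider identifier strings (such as `deepgram` or `elevenlabs`) is one way to resolve a provider at call time. This is a minimal sketch under assumptions. `NewTextToSpeechTransformer`, `Options`, the reduced `TextToSpeech` interface and the two stub providers are hypothetical and do not reflect the real factory's signature.

```go
package main

import (
	"fmt"
	"sort"
)

// TextToSpeech is a hypothetical, reduced provider interface.
type TextToSpeech interface {
	Name() string
}

// Options is a hypothetical per-call configuration (voice, credentials, ...).
type Options struct {
	Voice string
}

// Stub providers standing in for real transformer implementations.
type deepgramTTS struct{ voice string }
type elevenLabsTTS struct{ voice string }

func (d *deepgramTTS) Name() string   { return "deepgram:" + d.voice }
func (e *elevenLabsTTS) Name() string { return "elevenlabs:" + e.voice }

// constructor builds a provider from per-call options.
type constructor func(Options) (TextToSpeech, error)

// registry maps provider identifiers to constructors.
var registry = map[string]constructor{
	"deepgram":   func(o Options) (TextToSpeech, error) { return &deepgramTTS{voice: o.Voice}, nil },
	"elevenlabs": func(o Options) (TextToSpeech, error) { return &elevenLabsTTS{voice: o.Voice}, nil },
}

// NewTextToSpeechTransformer resolves the identifier when the call starts.
func NewTextToSpeechTransformer(provider string, opts Options) (TextToSpeech, error) {
	build, ok := registry[provider]
	if !ok {
		known := make([]string, 0, len(registry))
		for k := range registry {
			known = append(known, k)
		}
		sort.Strings(known) // deterministic error message
		return nil, fmt.Errorf("unsupported TTS provider %q (known: %v)", provider, known)
	}
	return build(opts)
}

func main() {
	tts, err := NewTextToSpeechTransformer("deepgram", Options{Voice: "aura-asteria-en"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(tts.Name()) // deepgram:aura-asteria-en

	_, err = NewTextToSpeechTransformer("unknown", Options{})
	fmt.Println(err) // unsupported TTS provider "unknown" (known: [deepgram elevenlabs])
}
```

Returning an error for an unknown identifier lets the caller fail call setup cleanly instead of starting a session with no voice. A `switch` on the identifier would work equally well; a map keeps adding a provider to a one-line registration.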
Supported TTS Providers
| Provider | Identifier | Streaming | Notes |
|---|---|---|---|
| Deepgram Aura | `deepgram` | ✅ | Low-latency WebSocket, `wss://api.deepgram.com/v1/speak` |
| ElevenLabs | `elevenlabs` | ✅ | High-fidelity voice cloning |
| Cartesia | `cartesia` | ✅ | Ultra-low latency streaming |
| Google Cloud TTS | `google-speech-service` | ✅ | WaveNet / Neural2, 100+ voices |
| Azure Cognitive Services | `azure-speech-service` | ✅ | Neural voices, 140+ languages |
| Sarvam AI | `sarvamai` | ✅ | Indian languages |
| Resemble AI | `resemble` | ✅ | Voice cloning |
| OpenAI TTS | `openai` | ✅ | `tts-1`, `tts-1-hd` |
| AWS Polly | `aws` | ✅ | Neural voices |
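For the Deepgram Aura row, the three packet actions line up with the control messages Deepgram documents for its streaming TTS WebSocket. `Speak` carries text, `Flush` forces synthesis of buffered text, and `Clear` drops buffered text. The sketch below only builds the endpoint URL and the JSON text frames. It assumes that message format and the `model`, `encoding` and `sample_rate` query parameters. It also omits the WebSocket dial and authentication, and it is not the assistant-api Deepgram transformer. The packet structs are hypothetical stand-ins.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// Hypothetical stand-ins for the assistant-api packet types.
type LLMPacket interface{ isLLMPacket() }
type LLMResponseDeltaPacket struct{ Text string }
type LLMResponseDonePacket struct{}
type InterruptionPacket struct{}

func (LLMResponseDeltaPacket) isLLMPacket() {}
func (LLMResponseDonePacket) isLLMPacket() {}
func (InterruptionPacket) isLLMPacket() {}

// speakURL builds the Aura streaming endpoint; the query parameters are
// assumptions based on Deepgram's public TTS docs.
func speakURL(model string, sampleRate int) string {
	q := url.Values{}
	q.Set("model", model)
	q.Set("encoding", "linear16")
	q.Set("sample_rate", fmt.Sprint(sampleRate))
	return "wss://api.deepgram.com/v1/speak?" + q.Encode() // Encode sorts keys
}

// controlMessage is the JSON text frame sent over the WebSocket
// (assumed Deepgram message shape).
type controlMessage struct {
	Type string `json:"type"`
	Text string `json:"text,omitempty"`
}

// toDeepgram maps a packet to the text frame a client would send.
func toDeepgram(p LLMPacket) ([]byte, error) {
	var msg controlMessage
	switch pkt := p.(type) {
	case LLMResponseDeltaPacket:
		msg = controlMessage{Type: "Speak", Text: pkt.Text}
	case LLMResponseDonePacket:
		msg = controlMessage{Type: "Flush"}
	case InterruptionPacket:
		msg = controlMessage{Type: "Clear"}
	default:
		return nil, fmt.Errorf("unsupported packet type %T", p)
	}
	return json.Marshal(msg)
}

func main() {
	fmt.Println(speakURL("aura-asteria-en", 24000))
	for _, p := range []LLMPacket{LLMResponseDeltaPacket{Text: "Hi"}, LLMResponseDonePacket{}, InterruptionPacket{}} {
		frame, err := toDeepgram(p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(string(frame))
	}
}
```

Running it prints the URL `wss://api.deepgram.com/v1/speak?encoding=linear16&model=aura-asteria-en&sample_rate=24000`, followed by the frames `{"type":"Speak","text":"Hi"}`, `{"type":"Flush"}` and `{"type":"Clear"}`.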