The assistant-api decouples speech synthesis from provider-specific logic through the same transformer layer as STT. Each TTS provider implements Transformers[LLMPacket] and is resolved at call time by the factory.

Transformer Interface

// api/assistant-api/internal/type/transformer.go
type Transformers[IN any] interface {
    Initialize() error
    Transform(ctx context.Context, in IN) error
    Close(ctx context.Context) error
}

// Type alias for TTS
type TextToSpeechTransformer = Transformers[LLMPacket]
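
To make the lifecycle concrete, here is a minimal sketch of a type that satisfies the interface. It is not one of the real providers: `bufferTTS`, `TextPacket`, the marker-style `LLMPacket` definition, and the `speak` helper are all hypothetical stand-ins, and the provider records text in memory instead of synthesising audio.

```go
package main

import (
	"context"
	"errors"
	"strings"
)

// LLMPacket is a stand-in for the real packet interface (assumed shape).
type LLMPacket interface{ isLLMPacket() }

// TextPacket is a hypothetical packet that carries a piece of text.
type TextPacket struct{ Text string }

func (TextPacket) isLLMPacket() {}

// Transformers mirrors the interface shown above.
type Transformers[IN any] interface {
	Initialize() error
	Transform(ctx context.Context, in IN) error
	Close(ctx context.Context) error
}

type TextToSpeechTransformer = Transformers[LLMPacket]

// bufferTTS is a hypothetical provider that buffers text instead of producing audio.
type bufferTTS struct {
	ready bool
	buf   strings.Builder
}

func (b *bufferTTS) Initialize() error {
	b.ready = true
	return nil
}

func (b *bufferTTS) Transform(ctx context.Context, in LLMPacket) error {
	if !b.ready {
		return errors.New("bufferTTS: Transform called before Initialize")
	}
	if err := ctx.Err(); err != nil {
		return err // stop early if the caller cancelled
	}
	if pkt, ok := in.(TextPacket); ok {
		b.buf.WriteString(pkt.Text)
	}
	return nil
}

func (b *bufferTTS) Close(ctx context.Context) error {
	b.ready = false
	return nil
}

// Compile-time check that bufferTTS satisfies the TTS alias.
var _ TextToSpeechTransformer = (*bufferTTS)(nil)

// speak runs the full lifecycle: Initialize, one Transform per chunk, then Close.
func speak(t TextToSpeechTransformer, chunks ...string) error {
	ctx := context.Background()
	if err := t.Initialize(); err != nil {
		return err
	}
	defer t.Close(ctx)
	for _, c := range chunks {
		if err := t.Transform(ctx, TextPacket{Text: c}); err != nil {
			return err
		}
	}
	return nil
}
```

Calling `speak(&bufferTTS{}, "hel", "lo")` from a `main` function walks through Initialize, two Transform calls, and Close. The `var _` line is a common way to have the compiler confirm that a provider still satisfies the interface.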

LLMPacket Types

The LLMPacket interface is satisfied by three packet types that TTS providers must handle:
| Packet Type | Description | Action |
| --- | --- | --- |
| LLMResponseDeltaPacket | A text token from the LLM stream | Send token to TTS for synthesis |
| LLMResponseDonePacket | End of LLM response | Flush/finalise synthesis |
| InterruptionPacket | User started speaking; cancel TTS | Clear/cancel queued audio |

// Example from Deepgram TTS:
func (d *deepgramTTS) Transform(ctx context.Context, in LLMPacket) error {
    switch pkt := in.(type) {
    case InterruptionPacket:
        return d.conn.WriteJSON(map[string]string{"type": "Clear"})
    case LLMResponseDeltaPacket:
        return d.conn.WriteJSON(map[string]string{"type": "Speak", "text": pkt.Text})
    case LLMResponseDonePacket:
        return d.conn.WriteJSON(map[string]string{"type": "Flush"})
    }
    return nil
}
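
The routing above can be exercised without a network connection. The following sketch replays one conversational turn against a recording stand-in for the WebSocket connection. The `jsonWriter` interface, `recordingConn`, `sketchTTS`, and `replayTurn` are hypothetical names, and the packet structs are simplified assumptions (only the `Text` field appears in the original example).

```go
package main

import "context"

// LLMPacket is simplified to an empty interface here (assumption).
type LLMPacket interface{}

// Simplified packet types; fields other than Text are unknown and omitted.
type LLMResponseDeltaPacket struct{ Text string }
type LLMResponseDonePacket struct{}
type InterruptionPacket struct{}

// jsonWriter is the subset of a WebSocket connection that Transform needs.
type jsonWriter interface {
	WriteJSON(v any) error
}

// recordingConn stores every message instead of sending it.
type recordingConn struct{ sent []map[string]string }

func (r *recordingConn) WriteJSON(v any) error {
	if m, ok := v.(map[string]string); ok {
		r.sent = append(r.sent, m)
	}
	return nil
}

// sketchTTS mirrors the switch from the example above.
type sketchTTS struct{ conn jsonWriter }

func (s *sketchTTS) Transform(ctx context.Context, in LLMPacket) error {
	switch pkt := in.(type) {
	case InterruptionPacket:
		return s.conn.WriteJSON(map[string]string{"type": "Clear"})
	case LLMResponseDeltaPacket:
		return s.conn.WriteJSON(map[string]string{"type": "Speak", "text": pkt.Text})
	case LLMResponseDonePacket:
		return s.conn.WriteJSON(map[string]string{"type": "Flush"})
	}
	return nil // unknown packet types are ignored
}

// replayTurn feeds packets through Transform and returns the message types sent.
func replayTurn(packets ...LLMPacket) ([]string, error) {
	conn := &recordingConn{}
	tts := &sketchTTS{conn: conn}
	ctx := context.Background()
	for _, p := range packets {
		if err := tts.Transform(ctx, p); err != nil {
			return nil, err
		}
	}
	types := make([]string, 0, len(conn.sent))
	for _, m := range conn.sent {
		types = append(types, m["type"])
	}
	return types, nil
}
```

Replaying a delta, an interruption, a second delta, and a done packet produces the message types `Speak`, `Clear`, `Speak`, `Flush`, in that order. Depending on a narrow interface such as `jsonWriter` rather than a concrete connection type is one way to keep provider logic testable.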

Factory Function

// api/assistant-api/internal/transformer/transformer.go
func GetTextToSpeechTransformer(
    ctx        context.Context,
    logger     commons.Logger,
    provider   string,                     // AudioTransformer constant string
    credential *protos.VaultCredential,    // decrypted vault credential
    onPacket   func(AudioPacket),          // callback invoked with each synthesised audio chunk
    opts       utils.Option,
) (TextToSpeechTransformer, error)
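
The factory body is not shown here. One plausible way to resolve a provider by its identifier string is a lookup table of constructors, sketched below. This is an assumption about the approach, not the actual implementation: `registry`, `constructor`, `noopTTS`, the `AudioPacket` fields, and the lowercase `getTextToSpeechTransformer` are hypothetical, and the signature is reduced to the provider name and the audio callback.

```go
package main

import (
	"context"
	"fmt"
)

// Simplified stand-ins for the real types (assumed shapes).
type LLMPacket interface{}
type AudioPacket struct{ Data []byte }

type Transformers[IN any] interface {
	Initialize() error
	Transform(ctx context.Context, in IN) error
	Close(ctx context.Context) error
}

type TextToSpeechTransformer = Transformers[LLMPacket]

// constructor builds a provider that reports audio through onPacket.
type constructor func(onPacket func(AudioPacket)) (TextToSpeechTransformer, error)

// noopTTS is a hypothetical provider that emits an empty chunk per packet.
type noopTTS struct{ onPacket func(AudioPacket) }

func (n *noopTTS) Initialize() error { return nil }

func (n *noopTTS) Transform(ctx context.Context, in LLMPacket) error {
	if n.onPacket != nil {
		n.onPacket(AudioPacket{}) // placeholder for synthesised audio
	}
	return nil
}

func (n *noopTTS) Close(ctx context.Context) error { return nil }

// registry maps provider identifiers to constructors (illustrative entry only).
var registry = map[string]constructor{
	"deepgram": func(onPacket func(AudioPacket)) (TextToSpeechTransformer, error) {
		return &noopTTS{onPacket: onPacket}, nil
	},
}

// getTextToSpeechTransformer resolves a provider at call time.
func getTextToSpeechTransformer(provider string, onPacket func(AudioPacket)) (TextToSpeechTransformer, error) {
	build, ok := registry[provider]
	if !ok {
		return nil, fmt.Errorf("unsupported TTS provider %q", provider)
	}
	return build(onPacket)
}
```

With this shape, an unknown identifier returns an error instead of a nil transformer, and adding a provider means adding one registry entry.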

Supported TTS Providers

| Provider | Identifier | Streaming | Notes |
| --- | --- | --- | --- |
| Deepgram Aura | deepgram | | Low-latency WebSocket, wss://api.deepgram.com/v1/speak |
| ElevenLabs | elevenlabs | | High-fidelity voice cloning |
| Cartesia | cartesia | | Ultra-low latency streaming |
| Google Cloud TTS | google-speech-service | | WaveNet / Neural2, 100+ voices |
| Azure Cognitive | azure-speech-service | | Neural voices, 140+ languages |
| Sarvam AI | sarvamai | | Indian languages |
| Resemble AI | resemble | | Voice cloning |
| OpenAI TTS | openai | | tts-1, tts-1-hd |
| AWS Polly | aws | | Neural voices |

Provider Pages