Purpose
The assistant-api is the runtime core of the Rapida platform. Every active voice call passes through this service. It owns:
- Real-time audio streaming via WebSocket (browser and telephony)
- Speech-to-text transcription (streaming, provider-agnostic)
- LLM inference via integration-api (streaming token delivery)
- Text-to-speech synthesis (streaming, provider-agnostic)
- Telephony signalling for Twilio, Vonage, Exotel, Asterisk, and SIP
- Knowledge base retrieval (RAG) via document-api
- Conversation state and metrics persistence
Port 9007: HTTP · gRPC · WebSocket (cmux)
Port 4573: Asterisk AudioSocket (TCP)
Language: Go 1.25, with Gin (REST) + gRPC
Voice Pipeline
Every call follows the same sequence: speech-to-text, LLM inference, then text-to-speech. STT, LLM, and TTS run as streaming pipelines: the LLM begins generating before STT finishes, and TTS begins speaking before the LLM completes.
Input Channels
| Channel | Transport | Entry Point | Use Case |
|---|---|---|---|
| WebSocket (browser) | wss://host:8080/talk_api.TalkService/AssistantTalk | gRPC-web | Web widget, SDK |
| Twilio | WebSocket per call | wss://PUBLIC_ASSISTANT_HOST/v1/talk/twilio/ctx/{contextId} | Inbound / outbound PSTN |
| Vonage | WebSocket per call | wss://PUBLIC_ASSISTANT_HOST/v1/talk/vonage/ctx/{contextId} | Inbound / outbound PSTN |
| Exotel | WebSocket per call | wss://PUBLIC_ASSISTANT_HOST/v1/talk/exotel/ctx/{contextId} | India / SEA PSTN |
| Asterisk AudioSocket | TCP 0.0.0.0:4573 | Raw TCP (AudioSocket protocol) | Self-hosted PBX |
| Asterisk WebSocket | WebSocket per call | wss://PUBLIC_ASSISTANT_HOST/v1/talk/asterisk/ctx/{contextId} | Self-hosted PBX |
| SIP | UDP 0.0.0.0:5090 | SIP INVITE | Direct SIP / Asterisk |
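The per-call WebSocket entry points in the table share one URL shape: a `wss` scheme, the public host, the provider segment, and the context ID. A minimal sketch of assembling such a URL follows; the function name `mediaStreamURL` and its validation rules are assumptions made for illustration, not part of the assistant-api codebase.

```go
package main

import (
	"fmt"
	"net/url"
)

// mediaStreamURL builds a per-call WebSocket URL in the shape shown in the
// table: wss://PUBLIC_ASSISTANT_HOST/v1/talk/<provider>/ctx/<contextId>.
// The function name and the empty-value checks are illustrative assumptions.
func mediaStreamURL(publicHost, provider, contextID string) (string, error) {
	if publicHost == "" || provider == "" || contextID == "" {
		return "", fmt.Errorf("host, provider and context ID are all required")
	}
	u := url.URL{
		Scheme: "wss",
		Host:   publicHost,
		Path:   "/v1/talk/" + provider + "/ctx/" + contextID,
	}
	return u.String(), nil
}

func main() {
	// voice.example.com stands in for the value of PUBLIC_ASSISTANT_HOST.
	s, err := mediaStreamURL("voice.example.com", "twilio", "ctx-123")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Building the URL through `net/url` rather than string concatenation keeps any unusual characters in the context ID properly escaped.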
PUBLIC_ASSISTANT_HOST must be a publicly reachable hostname or IP address. Twilio, Vonage, and Exotel call back to this host for WebSocket media streaming.
STT / TTS Providers
| Provider Identifier | STT | TTS | Notes |
|---|---|---|---|
| deepgram | ✅ | ✅ | Nova-2 / Nova-3; streaming |
| google-speech-service | ✅ | ✅ | Streaming STT; WaveNet / Neural2 TTS |
| azure-speech-service | ✅ | ✅ | Neural voices; streaming |
| elevenlabs | ✗ | ✅ | High-fidelity voice cloning |
| cartesia | ✅ | ✅ | Low-latency streaming |
| assemblyai | ✅ | ✗ | Streaming + batch |
| sarvamai | ✅ | ✅ | Indian languages |
| revai | ✅ | ✗ | Streaming STT |
The provider identifiers above correspond to AudioTransformer (api/assistant-api/internal/transformer/transformer.go). They map directly to the provider field in the assistant configuration.
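Because not every provider supports both directions, an assistant configuration can be checked against the table before a call starts. The sketch below encodes the table as a lookup; the `capability` type, the `providers` map, and `validateAssistantProviders` are hypothetical names used only for this example, and the real validation in the transformer package may look different.

```go
package main

import "fmt"

// capability records which directions a provider supports, per the table.
type capability struct {
	STT bool
	TTS bool
}

// providers mirrors the provider table. It is an illustrative copy, not the
// source of truth used by assistant-api.
var providers = map[string]capability{
	"deepgram":              {STT: true, TTS: true},
	"google-speech-service": {STT: true, TTS: true},
	"azure-speech-service":  {STT: true, TTS: true},
	"elevenlabs":            {STT: false, TTS: true},
	"cartesia":              {STT: true, TTS: true},
	"assemblyai":            {STT: true, TTS: false},
	"sarvamai":              {STT: true, TTS: true},
	"revai":                 {STT: true, TTS: false},
}

// validateAssistantProviders rejects unknown identifiers and identifiers used
// in a direction the provider does not support.
func validateAssistantProviders(stt, tts string) error {
	s, ok := providers[stt]
	if !ok {
		return fmt.Errorf("unknown STT provider %q", stt)
	}
	if !s.STT {
		return fmt.Errorf("provider %q does not support STT", stt)
	}
	t, ok := providers[tts]
	if !ok {
		return fmt.Errorf("unknown TTS provider %q", tts)
	}
	if !t.TTS {
		return fmt.Errorf("provider %q does not support TTS", tts)
	}
	return nil
}

func main() {
	fmt.Println(validateAssistantProviders("assemblyai", "elevenlabs"))
	fmt.Println(validateAssistantProviders("elevenlabs", "deepgram"))
}
```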
Key Components
Transformer Layer (STT / TTS)
Each provider lives under api/assistant-api/internal/transformer/<provider>/. All providers implement the same generic interface. The factory functions resolve the provider string to a concrete implementation at call time. See STT / TTS Providers for how to add a new provider.
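The original interface and factory listings are not reproduced on this page, so the following is only a sketch of the pattern described: one generic interface shared by all providers, and a factory that maps a provider string to an implementation. The interface shape, the `echoSTT` stand-in, and `newSpeechToText` are all assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Transformer is a hypothetical generic contract: In is what the caller
// pushes (audio for STT, text for TTS) and Out is what comes back.
type Transformer[In, Out any] interface {
	Name() string
	Transform(in In) (Out, error)
}

// echoSTT is a stand-in provider used only for this example; a real provider
// would open a streaming session with its vendor.
type echoSTT struct{ name string }

func (e echoSTT) Name() string { return e.name }

func (e echoSTT) Transform(audio []byte) (string, error) {
	return fmt.Sprintf("%s heard %d bytes", e.name, len(audio)), nil
}

// ErrUnknownProvider is returned when no implementation matches the string.
var ErrUnknownProvider = errors.New("unknown provider")

// newSpeechToText resolves a provider identifier to a concrete implementation
// at call time, mirroring the factory role described above.
func newSpeechToText(provider string) (Transformer[[]byte, string], error) {
	switch provider {
	case "deepgram", "assemblyai":
		return echoSTT{name: provider}, nil
	default:
		return nil, fmt.Errorf("%w: %q", ErrUnknownProvider, provider)
	}
}

func main() {
	stt, err := newSpeechToText("deepgram")
	if err != nil {
		panic(err)
	}
	text, _ := stt.Transform([]byte{0x01, 0x02, 0x03})
	fmt.Println(stt.Name(), "->", text)

	_, err = newSpeechToText("not-a-provider")
	fmt.Println(errors.Is(err, ErrUnknownProvider))
}
```

Keeping the interface generic lets STT and TTS share one contract while the compiler still checks that audio goes in and text comes out, or the reverse.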
Telephony Channel Layer
Telephony providers live under api/assistant-api/internal/channel/telephony/internal/<provider>/. The factory in telephony.go creates the correct provider at runtime. All providers implement the Telephony interface (ReceiveCall, OutboundCall, InboundCall, StatusCallback). See Telephony for provider setup.
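As a rough illustration of a runtime factory for telephony providers, the sketch below uses a constructor registry keyed by provider identifier. Only the four method names come from this page; every parameter and return type, the `loggingProvider` stand-in, the `registry` map, and `newTelephony` are assumptions, and the actual code in telephony.go may be structured differently.

```go
package main

import "fmt"

// Telephony lists the four methods named in this section. The parameter and
// return types are placeholders chosen for this sketch, not the real signatures.
type Telephony interface {
	ReceiveCall(contextID string) error
	OutboundCall(contextID, to string) error
	InboundCall(contextID, from string) error
	StatusCallback(contextID, status string) error
}

// loggingProvider is a stand-in that records every event it is asked to handle.
type loggingProvider struct {
	name   string
	events []string
}

func (p *loggingProvider) record(kind, contextID string) error {
	p.events = append(p.events, p.name+":"+kind+":"+contextID)
	return nil
}

func (p *loggingProvider) ReceiveCall(id string) error        { return p.record("receive", id) }
func (p *loggingProvider) OutboundCall(id, to string) error   { return p.record("outbound", id) }
func (p *loggingProvider) InboundCall(id, from string) error  { return p.record("inbound", id) }
func (p *loggingProvider) StatusCallback(id, s string) error  { return p.record("status="+s, id) }

// Compile-time check that the stand-in satisfies the interface.
var _ Telephony = (*loggingProvider)(nil)

// registry maps a provider identifier to its constructor.
var registry = map[string]func() Telephony{
	"twilio":   func() Telephony { return &loggingProvider{name: "twilio"} },
	"vonage":   func() Telephony { return &loggingProvider{name: "vonage"} },
	"exotel":   func() Telephony { return &loggingProvider{name: "exotel"} },
	"asterisk": func() Telephony { return &loggingProvider{name: "asterisk"} },
}

// newTelephony creates the provider selected at runtime.
func newTelephony(provider string) (Telephony, error) {
	ctor, ok := registry[provider]
	if !ok {
		return nil, fmt.Errorf("unsupported telephony provider %q", provider)
	}
	return ctor(), nil
}

func main() {
	p, err := newTelephony("twilio")
	if err != nil {
		panic(err)
	}
	_ = p.InboundCall("ctx-123", "+15550100")
	fmt.Println(p.(*loggingProvider).events)
}
```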
Call Context Store
Call context (assistant ID, conversation ID, auth token, provider, caller number) is persisted in PostgreSQL via the callcontext.Store interface. The context ID is passed through the call URL path so the WebSocket or AudioSocket handler can resolve the full session without requiring the client to re-authenticate.
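The lookup flow can be sketched as: extract the context ID from the call path, then load the stored session. In the example below an in-memory map stands in for PostgreSQL, and the `Store` method set, the `CallContext` field names, and `contextIDFromPath` are assumptions for illustration rather than the real callcontext package.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// CallContext holds the fields this section lists. Field names are illustrative.
type CallContext struct {
	AssistantID    string
	ConversationID string
	AuthToken      string
	Provider       string
	CallerNumber   string
}

// Store is a hypothetical minimal version of the callcontext.Store contract.
type Store interface {
	Save(id string, cc CallContext) error
	Get(id string) (CallContext, error)
}

var ErrNotFound = errors.New("call context not found")

// memoryStore replaces PostgreSQL with a map so the example runs on its own.
type memoryStore struct {
	mu sync.RWMutex
	m  map[string]CallContext
}

func newMemoryStore() *memoryStore { return &memoryStore{m: map[string]CallContext{}} }

func (s *memoryStore) Save(id string, cc CallContext) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[id] = cc
	return nil
}

func (s *memoryStore) Get(id string) (CallContext, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	cc, ok := s.m[id]
	if !ok {
		return CallContext{}, ErrNotFound
	}
	return cc, nil
}

// contextIDFromPath extracts the ID from /v1/talk/<provider>/ctx/<contextId>.
func contextIDFromPath(path string) (string, error) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	if len(parts) != 5 || parts[0] != "v1" || parts[1] != "talk" || parts[3] != "ctx" || parts[4] == "" {
		return "", fmt.Errorf("unexpected call path %q", path)
	}
	return parts[4], nil
}

func main() {
	var store Store = newMemoryStore()
	_ = store.Save("ctx-123", CallContext{AssistantID: "asst-1", Provider: "twilio"})

	id, err := contextIDFromPath("/v1/talk/twilio/ctx/ctx-123")
	if err != nil {
		panic(err)
	}
	cc, err := store.Get(id)
	fmt.Println(cc.AssistantID, cc.Provider, err)
}
```

Because the handler only needs the opaque context ID, the auth token never has to travel in the provider's callback URL.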
Communication Interface
The Communication interface in api/assistant-api/internal/type/ is the central contract tying together the STT callback, LLM execution, TTS callback, auth, tracing, and conversation state for a single call. Each channel (WebSocket, telephony, SIP) creates a Communication implementation per call.
Running
- Docker Compose
- From Source
Health Endpoints
| Endpoint | Purpose |
|---|---|
| GET /readiness/ | Service ready (DB + Redis + OpenSearch connected) |
| GET /healthz/ | Liveness probe |
Next Steps
- Configuration: All environment variables with defaults and descriptions.
- STT / TTS Providers: Supported providers, the transformer interface, and how to add a new provider.
- Telephony: Twilio, Vonage, Exotel, Asterisk AudioSocket, and SIP setup.
- Integration API: LLM provider execution layer called by assistant-api.