Overview
The assistant-api is the voice orchestration engine of the Rapida platform. Every active call, whether an inbound phone call, a browser SDK session, or an Asterisk trunk, runs through this service. It owns the complete real-time pipeline: audio streaming, speech-to-text, LLM inference (via integration-api), text-to-speech, and call event emission (via endpoint-api).
Port
- 9007: HTTP · gRPC · WebSocket (multiplexed with cmux)
- 4573: Asterisk AudioSocket (TCP)
Language
- Go 1.25
- gRPC for inter-service calls
- cmux port multiplexing
Storage
- PostgreSQL (assistant_db)
- Redis (session, cache)
- OpenSearch (transcripts)
The assistant-api does not store provider credentials. It delegates all LLM, STT, and TTS calls to integration-api, which manages encrypted credential access, so the assistant-api never touches raw API keys.
Components
The assistant-api is composed of four internal subsystems. Each subsystem has a well-defined boundary and can be extended independently.
Voice Pipeline — STT / LLM / TTS orchestration
The pipeline runs once per utterance within a conversation. When voice activity is detected, audio frames are forwarded to the configured STT provider. The transcript is sent to integration-api for LLM inference, and the response is synthesized by the configured TTS provider before being streamed back to the caller.
| Stage | Component | What Happens |
|---|---|---|
| Audio In | Audio receiver | Raw PCM / G.711 frames arrive over WebSocket or SIP |
| VAD | Voice Activity Detector | Silence detection determines utterance start/end |
| STT | internal/transformer/<provider>/stt.go | Audio chunks streamed to provider; transcript returned |
| LLM | integration-api:9004 via gRPC | Transcript + system prompt + conversation history sent; tokens streamed back |
| TTS | internal/transformer/<provider>/tts.go | LLM tokens synthesized to audio in real time |
| Audio Out | Audio sender | Synthesized audio chunks streamed back to caller |
Transformer Layer — STT and TTS provider adapters
Each STT and TTS provider is implemented as a Go struct that satisfies one of two interfaces: Transformers[AudioPacket] for STT or Transformers[TextPacket] for TTS. Provider directories live at api/assistant-api/internal/transformer/<provider>/. Each directory contains:
| File | Purpose |
|---|---|
| <provider>.go | Client initialization and configuration |
| stt.go | Implements the Transformers[AudioPacket] interface |
| tts.go | Implements the Transformers[TextPacket] interface |
| normalizer.go | Audio format normalization (sample rate, encoding) |
Supported STT providers
| Provider | Notes |
|---|---|
| Google Cloud STT | Streaming recognition, 100+ languages |
| Azure Cognitive Services | Microsoft Neural Speech |
| Deepgram | Low-latency streaming, Nova models |
| AssemblyAI | Streaming and batch |
| Cartesia | Real-time with speaker diarization |
| Sarvam AI | Indian language support |
Supported TTS providers
| Provider | Notes |
|---|---|
| Google Cloud TTS | WaveNet / Neural2 voices |
| Azure Cognitive Services | Neural voices, 140+ languages |
| ElevenLabs | High-fidelity cloned voices |
| Deepgram Aura | Low-latency streaming synthesis |
| Cartesia | Streaming synthesis |
| Sarvam AI | Indian language support |
Telephony Channel Layer — inbound/outbound call handling
Each telephony provider is implemented as a channel adapter under api/assistant-api/internal/channel/<provider>/. Channels handle connection negotiation, audio codec translation, and call lifecycle events (answer, transfer, hangup).
Supported telephony channels
| Provider | Integration Method | Audio Protocol |
|---|---|---|
| Twilio | HTTP webhook + Media Streams | WebSocket (µ-law) |
| Vonage | HTTP webhook + WebSocket | WebSocket (PCM) |
| Exotel | HTTP webhook | WebSocket |
| Asterisk | AudioSocket TCP | Raw G.711 µ-law frames |
| SIP (direct) | SIP INVITE | RTP |
| WebRTC (browser) | WebSocket / gRPC-web | PCM / Opus |
Communication Interface — core pipeline contract
The Communication interface in api/assistant-api/internal/type/ is the top-level contract that ties the pipeline together. It exposes:
- OnAudioReceived: called with each audio frame from the client
- OnTranscript: called when STT returns a transcript segment
- OnLLMResponse: called with each streamed LLM token
- OnAudioSend: called with each TTS audio chunk to deliver to the client
- OnCallEnd: called when the conversation terminates
Call Routing
- WebSocket / Browser SDK
- Telephony Webhooks (Twilio / Vonage / Exotel)
- SIP / Asterisk (AudioSocket)
- Direct gRPC (Server SDK)
Browser clients and server SDKs connect via WebSocket through the Nginx gateway. Nginx upgrades the HTTP connection and proxies it to assistant-api:9007 with keepalive; the routing rule lives in nginx.conf. In production, the connection runs over TLS.
Required headers
| Header | Value |
|---|---|
| Authorization | Bearer <jwt> |
| X-Assistant-Id | Assistant UUID |
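The sketch below builds a WebSocket opening handshake (RFC 6455) that carries the two required headers, using only the Go standard library. The connection URL here is a hypothetical placeholder, and newAssistantUpgradeRequest is an illustrative name. In a real client, a WebSocket library performs the handshake and framing. The point here is only which headers must be present.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"net/http"
)

// newAssistantUpgradeRequest (hypothetical helper) builds a WebSocket upgrade
// request with the Authorization and X-Assistant-Id headers the gateway expects.
func newAssistantUpgradeRequest(url, jwt, assistantID string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	key := make([]byte, 16) // RFC 6455: 16 random bytes, base64-encoded
	if _, err := rand.Read(key); err != nil {
		return nil, err
	}
	req.Header.Set("Connection", "Upgrade")
	req.Header.Set("Upgrade", "websocket")
	req.Header.Set("Sec-WebSocket-Version", "13")
	req.Header.Set("Sec-WebSocket-Key", base64.StdEncoding.EncodeToString(key))
	// Headers required by assistant-api:
	req.Header.Set("Authorization", "Bearer "+jwt)
	req.Header.Set("X-Assistant-Id", assistantID)
	return req, nil
}

func main() {
	// Hypothetical placeholder; substitute your deployment's connection URL.
	const connectionURL = "http://gateway.example/hypothetical-path"
	req, err := newAssistantUpgradeRequest(connectionURL, "<jwt>", "<assistant-uuid>")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("Authorization"), req.Header.Get("X-Assistant-Id"))
}
```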
Voice Pipeline Flow
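A complete utterance cycle within an active call passes through the stages in the Voice Pipeline table: audio in, voice activity detection, STT, LLM inference via integration-api, TTS, and audio out.
Configuration
The assistant-api reads its configuration from docker/assistant-api/.assistant.env (Docker) or from environment variables (local).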
Required variables
| Variable | Required | Default | Description |
|---|---|---|---|
| SECRET | ✅ Yes | rpd_pks | JWT signing secret; must match all other services |
| POSTGRES__HOST | ✅ Yes | postgres | PostgreSQL host |
| POSTGRES__DB_NAME | ✅ Yes | assistant_db | Database name |
| POSTGRES__AUTH__USER | ✅ Yes | rapida_user | Database user |
| POSTGRES__AUTH__PASSWORD | ✅ Yes | — | Database password |
| REDIS__HOST | ✅ Yes | redis | Redis host |
| OPENSEARCH__HOST | ✅ Yes | opensearch | OpenSearch host |
| INTEGRATION_HOST | ✅ Yes | integration-api:9004 | gRPC address of integration-api |
| ENDPOINT_HOST | ✅ Yes | endpoint-api:9005 | gRPC address of endpoint-api |
Tuning variables
| Variable | Default | Description |
|---|---|---|
| LOG_LEVEL | debug | debug · info · warn · error |
| ENV | development | development · staging · production |
| POSTGRES__MAX_OPEN_CONNECTION | 50 | Database connection pool size |
| POSTGRES__MAX_IDEAL_CONNECTION | 25 | Idle connections to keep open |
| REDIS__MAX_CONNECTION | 10 | Redis connection pool size |
| OPENSEARCH__MAX_RETRIES | 3 | Retry attempts on OpenSearch failure |
| OPENSEARCH__MAX_CONNECTION | 10 | OpenSearch connection pool size |
Full environment file
Running
- Docker Compose
- From Source
Health & Observability
| Endpoint | Purpose |
|---|---|
| GET /readiness/ | Reports whether the service is ready to accept traffic (DB + Redis + OpenSearch connected) |
| GET /healthz/ | Liveness probe; confirms the Go process is alive |
Troubleshooting
Audio stream disconnects immediately
- Verify that the JWT in the Authorization header is valid and not expired.
- Confirm the assistant UUID exists in assistant_db.
- Run make logs-assistant and look for "connection rejected" or "auth failed" entries.
STT returns empty or incorrect transcripts
- Confirm the STT provider credentials are stored in integration-api and pass the credential test.
- Verify the audio sample rate matches the provider's expectation (most require 8 kHz or 16 kHz).
- Check that OPENSEARCH__HOST is reachable: OpenSearch indexing failures can surface as STT errors in logs.
LLM responses are slow or timing out
- Check that integration-api is healthy: curl http://localhost:9004/readiness/
- Verify the LLM provider API key is valid and has sufficient quota.
- Increase POSTGRES__MAX_OPEN_CONNECTION if database contention is visible in logs.
Asterisk AudioSocket: connection refused on port 4573
- Confirm the port is exposed: docker port assistant-api 4573
- Check that the Asterisk dialplan passes a valid assistant UUID.
- Verify firewall rules allow TCP from the Asterisk host to port 4573.
Twilio / Vonage: no audio or one-way audio
- Confirm the webhook URL is publicly reachable from the provider's servers (not localhost).
- Check that the TwiML / Vonage NCCO returns the correct stream URL.
- Review make logs-assistant for codec negotiation errors.
Next Steps
Integration API
Configure LLM, STT, and TTS provider credentials used by this service.
Document API
Connect knowledge bases to assistants for RAG-powered responses.
Architecture
Understand how all services connect and communicate.
Configuration Reference
Full environment variable reference for all services.