
Purpose

The assistant-api is the runtime core of the Rapida platform. Every active voice call passes through this service. It owns:
  • Real-time audio streaming via WebSocket (browser and telephony)
  • Speech-to-text transcription (streaming, provider-agnostic)
  • LLM inference via integration-api (streaming token delivery)
  • Text-to-speech synthesis (streaming, provider-agnostic)
  • Telephony signalling for Twilio, Vonage, Exotel, Asterisk, and SIP
  • Knowledge base retrieval (RAG) via document-api
  • Conversation state and metrics persistence

Ports
  • 9007: HTTP · gRPC · WebSocket (cmux)
  • 4573: Asterisk AudioSocket (TCP)

Language

Go 1.25 · Gin (REST) + gRPC

Voice Pipeline

Every call follows the same sequence: speech-to-text (STT), then LLM inference, then text-to-speech (TTS). The three stages run as streaming pipelines: the LLM begins generating before STT finishes, and TTS begins speaking before the LLM completes.
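The overlap between stages can be illustrated with goroutines connected by channels. This is a minimal sketch, not the service's implementation: `stage`, `fakeSTT`, `fakeLLM`, `fakeTTS`, and `runPipeline` are hypothetical names, and the stage bodies are string placeholders standing in for real providers.

```go
package main

import (
	"fmt"
	"strings"
)

// stage starts a goroutine that applies fn to each item as soon as it arrives
// and forwards the result downstream. It closes its output when the input closes.
func stage(in <-chan string, fn func(string) string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for item := range in {
			out <- fn(item)
		}
	}()
	return out
}

// Placeholder stages (assumptions for illustration only).
func fakeSTT(frame string) string { return strings.ToLower(frame) }
func fakeLLM(text string) string  { return "reply(" + text + ")" }
func fakeTTS(token string) string { return "audio[" + token + "]" }

// runPipeline feeds frames through STT -> LLM -> TTS and collects the audio chunks.
// Each stage works on item N while the previous stage is already handling item N+1.
func runPipeline(frames []string) []string {
	src := make(chan string)
	go func() {
		defer close(src)
		for _, f := range frames {
			src <- f
		}
	}()
	var chunks []string
	for c := range stage(stage(stage(src, fakeSTT), fakeLLM), fakeTTS) {
		chunks = append(chunks, c)
	}
	return chunks
}

func main() {
	for _, c := range runPipeline([]string{"Hello", "World"}) {
		fmt.Println(c)
	}
}
```

Because every stage forwards results as they are produced, no stage waits for the previous one to finish the whole utterance, which is the property the paragraph above describes.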

Input Channels

| Channel | Transport | Entry Point | Use Case |
| --- | --- | --- | --- |
| WebSocket (browser) | wss://host:8080/talk_api.TalkService/AssistantTalk | gRPC-web | Web widget, SDK |
| Twilio | WebSocket per call | wss://PUBLIC_ASSISTANT_HOST/v1/talk/twilio/ctx/{contextId} | Inbound / outbound PSTN |
| Vonage | WebSocket per call | wss://PUBLIC_ASSISTANT_HOST/v1/talk/vonage/ctx/{contextId} | Inbound / outbound PSTN |
| Exotel | WebSocket per call | wss://PUBLIC_ASSISTANT_HOST/v1/talk/exotel/ctx/{contextId} | India / SEA PSTN |
| Asterisk AudioSocket | TCP 0.0.0.0:4573 | Raw TCP (AudioSocket protocol) | Self-hosted PBX |
| Asterisk WebSocket | WebSocket per call | wss://PUBLIC_ASSISTANT_HOST/v1/talk/asterisk/ctx/{contextId} | Self-hosted PBX |
| SIP | UDP 0.0.0.0:5090 | SIP INVITE | Direct SIP / Asterisk |

PUBLIC_ASSISTANT_HOST must be a publicly reachable hostname or IP. Twilio, Vonage, and Exotel call back to this host for WebSocket media streaming.
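The per-call WebSocket entry points share one URL shape. The sketch below shows how such a URL could be assembled from the public host, the provider segment, and the context ID. `mediaStreamURL` is a hypothetical helper, not a function from the codebase, and the set of accepted providers is taken from the WebSocket rows listed above.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// mediaStreamURL builds wss://<publicHost>/v1/talk/<provider>/ctx/<contextID>.
// Hypothetical helper for illustration; the real service may build this differently.
func mediaStreamURL(publicHost, provider, contextID string) (string, error) {
	if publicHost == "" || contextID == "" {
		return "", errors.New("public host and context ID are required")
	}
	switch provider {
	case "twilio", "vonage", "exotel", "asterisk":
		// These providers use a WebSocket-per-call entry point.
	default:
		return "", fmt.Errorf("provider %q has no WebSocket entry point", provider)
	}
	u := url.URL{
		Scheme: "wss",
		Host:   publicHost,
		Path:   "/v1/talk/" + provider + "/ctx/" + contextID,
	}
	return u.String(), nil
}

func main() {
	// voice.example.com and abc123 are placeholder values.
	s, err := mediaStreamURL("voice.example.com", "twilio", "abc123")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```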

STT / TTS Providers

| Provider Identifier | STT | TTS | Notes |
| --- | --- | --- | --- |
| deepgram | | | Nova-2 / Nova-3; streaming |
| google-speech-service | | | Streaming STT; WaveNet / Neural2 TTS |
| azure-speech-service | | | Neural voices; streaming |
| elevenlabs | | | High-fidelity voice cloning |
| cartesia | | | Low-latency streaming |
| assemblyai | | | Streaming + batch |
| sarvamai | | | Indian languages |
| revai | | | Streaming STT |

Provider identifiers are the string constants used in AudioTransformer (api/assistant-api/internal/transformer/transformer.go). They map directly to the provider field in the assistant configuration.

Key Components

Each provider lives under api/assistant-api/internal/transformer/<provider>/. All providers implement the same generic interface:
// api/assistant-api/internal/type/transformer.go
type Transformers[IN any] interface {
    Initialize() error
    Transform(context.Context, IN) error
    Close(context.Context) error
}
The factory functions resolve the provider string to a concrete implementation at call time:
// api/assistant-api/internal/transformer/transformer.go
func GetSpeechToTextTransformer(ctx, logger, provider, credential, onPacket, opts) (SpeechToTextTransformer, error)
func GetTextToSpeechTransformer(ctx, logger, provider, credential, onPacket, opts) (TextToSpeechTransformer, error)
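One common way to resolve a provider string to an implementation is a lookup table of constructors. The sketch below illustrates that pattern only. `SpeechToText`, `constructor`, `stubSTT`, `registry`, and `getSpeechToText` are hypothetical names, and the real factory functions take more parameters than shown here.

```go
package main

import "fmt"

// SpeechToText is a simplified stand-in for the real transformer type.
type SpeechToText interface {
	Name() string
}

// constructor builds a provider from a credential (assumed shape).
type constructor func(credential string) (SpeechToText, error)

// stubSTT is a placeholder implementation used for every provider in this sketch.
type stubSTT struct{ name string }

func (s stubSTT) Name() string { return s.name }

// registry maps provider identifiers to constructors. Only two of the
// identifiers from the provider table are registered here.
var registry = map[string]constructor{
	"deepgram": func(credential string) (SpeechToText, error) {
		return stubSTT{name: "deepgram"}, nil
	},
	"revai": func(credential string) (SpeechToText, error) {
		return stubSTT{name: "revai"}, nil
	},
}

// getSpeechToText resolves the provider string at call time and fails
// loudly on an unknown identifier.
func getSpeechToText(provider, credential string) (SpeechToText, error) {
	ctor, ok := registry[provider]
	if !ok {
		return nil, fmt.Errorf("unknown STT provider %q", provider)
	}
	return ctor(credential)
}

func main() {
	stt, err := getSpeechToText("deepgram", "placeholder-key")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("resolved provider:", stt.Name())
}
```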
See STT / TTS Providers for how to add a new provider.
Telephony providers live under api/assistant-api/internal/channel/telephony/internal/<provider>/. The factory in telephony.go creates the correct provider at runtime:
// api/assistant-api/internal/channel/telephony/telephony.go
const (
    Twilio   Telephony = "twilio"
    Exotel   Telephony = "exotel"
    Vonage   Telephony = "vonage"
    Asterisk Telephony = "asterisk"
    SIP      Telephony = "sip"
)
All providers implement the Telephony interface (ReceiveCall, OutboundCall, InboundCall, StatusCallback). See Telephony for provider setup.
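Before a factory can pick a provider, the incoming provider string has to be checked against the known constants. The following sketch shows one way to do that. The constants are copied from the block above, while `parseTelephony` is a hypothetical helper and its case-insensitive matching is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Telephony and its constants mirror the definitions quoted above.
type Telephony string

const (
	Twilio   Telephony = "twilio"
	Exotel   Telephony = "exotel"
	Vonage   Telephony = "vonage"
	Asterisk Telephony = "asterisk"
	SIP      Telephony = "sip"
)

// parseTelephony normalizes the input and rejects unknown providers.
// Hypothetical helper; trimming and lowercasing are assumptions.
func parseTelephony(s string) (Telephony, error) {
	switch p := Telephony(strings.ToLower(strings.TrimSpace(s))); p {
	case Twilio, Exotel, Vonage, Asterisk, SIP:
		return p, nil
	default:
		return "", fmt.Errorf("unsupported telephony provider %q", s)
	}
}

func main() {
	for _, name := range []string{"twilio", " SIP ", "plivo"} {
		p, err := parseTelephony(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("provider:", p)
	}
}
```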
Call context (assistant ID, conversation ID, auth token, provider, caller number) is persisted in PostgreSQL via the callcontext.Store interface. The context ID is passed through the call URL path so the WebSocket or AudioSocket handler can resolve the full session without requiring the client to re-authenticate.
The Communication interface in api/assistant-api/internal/type/ is the central contract tying together the STT callback, LLM execution, TTS callback, auth, tracing, and conversation state for a single call. Each channel (WebSocket, telephony, SIP) creates a Communication implementation per call.

Running

# Start assistant-api and its dependencies
make up-assistant

# Follow logs
make logs-assistant

# Rebuild after code changes
make rebuild-assistant

# Shell access
make shell-assistant

Health Endpoints

| Endpoint | Purpose |
| --- | --- |
| GET /readiness/ | Service ready (DB + Redis + OpenSearch connected) |
| GET /healthz/ | Liveness probe |

curl http://localhost:9007/readiness/
