Rapida is an open-source voice AI orchestration platform. It gives you a complete, production-ready stack — speech processing, LLM routing, telephony, knowledge retrieval, and observability — unified under one system so you stop assembling fragile integrations and start shipping voice AI that works.
Rapida runs as a fully-managed cloud service or as a self-hosted deployment on your own infrastructure. Same codebase, same APIs, same SDKs — your choice of where it runs.

Omnichannel by default

Deploy the same assistant across every channel from a single configuration. No per-channel codebases, no duplicated logic.
Change your prompt, swap a model, or update a knowledge base once — every channel picks it up instantly. No redeployments.
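
As an illustrative sketch of what "one configuration, every channel" means in practice (field names below are hypothetical, not Rapida's actual schema), a single assistant definition shared across channels might look like:

```yaml
# Hypothetical single-assistant configuration.
# Field names are illustrative, not Rapida's actual schema.
assistant:
  model: gpt-4o                # swap the model here once
  prompt: prompts/support.txt  # edit the prompt here once
  knowledge_base: kb-production
channels:
  - type: phone                # SIP trunk / Twilio / Vonage / Exotel
  - type: web                  # WebRTC widget
```

Editing `model` or `prompt` in one place would then propagate to the phone and web channels alike, which is the property the paragraph above describes.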

Carrier-grade telephony

Most voice AI platforms bolt telephony on as an afterthought. Rapida makes it a first-class concern.
Rapida speaks native SIP (RFC 3261), manages WebRTC peer connections with full ICE/STUN/TURN, and ships a built-in AudioSocket server for Asterisk — all in the same binary, no media gateways required.
  • Connect any provider. Twilio, Vonage, Exotel, Asterisk, or any SIP-compatible carrier. Switching providers never touches your assistant logic — swap at the channel layer without any code changes.
  • Inbound and outbound at scale. Handle thousands of simultaneous calls while running outbound dialing campaigns in parallel. Launch campaigns programmatically via SDK or REST bulk call API.
  • Live call transfer. Transfer active calls to a human agent mid-conversation. Rapida sends a SIP REFER in-flight with no audio gap and hands off the full conversation context to the receiving agent.
  • Mix providers across regions. Route calls to different telephony providers per region for optimal cost, latency, and compliance without rebuilding your assistant.
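
The live transfer described above uses the standard SIP REFER method (defined in RFC 3515, operating within RFC 3261 dialogs). A representative REFER request, with placeholder addresses, looks like:

```
REFER sip:caller@carrier.example.com SIP/2.0
Via: SIP/2.0/UDP media1.example.com;branch=z9hG4bK-776a
From: <sip:assistant@rapida.example.com>;tag=193402342
To: <sip:caller@carrier.example.com>;tag=876321
Call-ID: 898234234@media1.example.com
CSeq: 93809823 REFER
Refer-To: <sip:human-agent@support.example.com>
Referred-By: <sip:assistant@rapida.example.com>
Contact: <sip:assistant@media1.example.com>
Content-Length: 0
```

The `Refer-To` header names the human agent's endpoint; the carrier then establishes the new leg while the existing media stream continues, which is why the caller hears no audio gap.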

The full stack

Speech Processing

STT and TTS with VAD, noise cancellation, and sub-300ms latency. Mix providers per language or use case — Deepgram, ElevenLabs, Azure, AWS, OpenAI, and more.
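
To make the sub-300ms target concrete, it helps to see it as a per-turn budget across pipeline stages. The stage values below are illustrative assumptions for the sketch, not measured Rapida figures:

```python
# Illustrative voice-turn latency budget in milliseconds.
# Stage values are assumptions for this sketch, not measured figures.
budget_ms = {
    "vad_endpointing": 60,   # detect that the caller stopped speaking
    "stt_final": 80,         # final transcript from the STT stream
    "llm_first_token": 100,  # time to first token from the model
    "tts_first_byte": 50,    # first synthesized audio byte back to caller
}

total = sum(budget_ms.values())
print(total)  # 290 -> under the 300 ms target
```

Mixing providers per language or use case lets you rebalance this budget: a faster STT for one locale, a lower-latency TTS voice for another.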

LLM Routing & AgentKit

Route to any LLM — OpenAI, Anthropic, Gemini, Mistral, Bedrock. Or bring your own backend with AgentKit: plug any LLM server in over gRPC while Rapida manages the audio pipeline.
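
A custom AgentKit backend is, at its core, a gRPC server that Rapida streams conversation turns into. As a hypothetical sketch of what such a contract could look like (service and message names here are illustrative, not the actual AgentKit schema):

```protobuf
// Hypothetical gRPC contract for a bring-your-own-LLM backend.
// Names are illustrative; consult the AgentKit docs for the real schema.
syntax = "proto3";

service AgentBackend {
  // The orchestrator streams turns in; the backend streams tokens out.
  rpc Generate (stream TurnRequest) returns (stream TokenResponse);
}

message TurnRequest {
  string conversation_id = 1;
  string user_utterance = 2;  // transcript of the caller's last turn
}

message TokenResponse {
  string token = 1;           // incremental text fed to the TTS pipeline
  bool final = 2;
}
```

The key point is the division of labor: your server only produces text, while Rapida keeps ownership of VAD, STT, TTS, and telephony on both sides of the stream.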

Knowledge Retrieval

Connect documents, wikis, and databases. Rapida handles chunking, embedding, and semantic search via OpenSearch so your assistant always has the right context at call time.
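
Retrieval pipelines like this split documents into overlapping chunks before embedding, so that a sentence crossing a chunk boundary still appears intact in at least one chunk. A minimal sketch of that step (the size and overlap defaults are illustrative, not Rapida's):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks with overlap so content spanning
    a boundary survives intact in at least one chunk. Parameters are
    illustrative defaults for this sketch."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

doc = "a" * 1200
pieces = chunk_text(doc)
print(len(pieces))  # 3 chunks: [0:500], [400:900], [800:1200]
```

Each chunk is then embedded and indexed; at call time the caller's utterance is embedded the same way and matched by semantic similarity in OpenSearch.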

Tools, Connectors & Observability

Call HTTP endpoints mid-conversation and sync data from Notion, GitHub, HubSpot, Jira, and more. Every call is logged with full transcripts, latency breakdowns, and webhook delivery.

Works with everything you use

Rapida is provider-agnostic. Swap any layer without changing your assistant configuration.
  • LLM: OpenAI, Anthropic, Azure OpenAI, Google Gemini, Vertex AI, Mistral, Cohere, AWS Bedrock, HuggingFace, VoyageAI, AgentKit (custom gRPC)
  • STT: Deepgram, AssemblyAI, Azure, Google, AWS Transcribe, OpenAI Whisper, Cartesia, Rev.ai, Speechmatics, Sarvam
  • TTS: ElevenLabs, Deepgram, Azure, Google, AWS Polly, OpenAI, Cartesia, Resemble, Sarvam
  • Telephony: Twilio, Vonage, Exotel, Asterisk (AudioSocket + WebSocket), any SIP trunk

From demo to enterprise scale

Most teams start with a single assistant and a handful of test calls. Rapida is designed so the same codebase, the same APIs, and the same configuration that worked for your demo handles your production traffic — without re-architecting at every order of magnitude.
  • Day one. One API call, one assistant, one channel. The managed cloud handles infrastructure so you can focus on the product.
  • Team growth. Add knowledge bases, connect CRM tools, define endpoint tools, roll out to more channels — all from the same platform, no new services to operate.
  • Production scale. Rapida’s Go microservices scale horizontally and independently. Each service — voice orchestration, LLM routing, endpoint invocation, document processing — runs as a stateless unit behind a load balancer. Redis handles distributed state and pub/sub. OpenSearch scales knowledge retrieval across millions of documents.
  • Enterprise deployment. Private VPC, dedicated infrastructure, custom SLAs, and full data residency. Move from managed cloud to self-hosted without changing a line of your application code.
Rapida has no arbitrary call concurrency limits built into the platform. Throughput scales with infrastructure — horizontal pod autoscaling, Redis cluster, and OpenSearch node expansion are all first-class deployment patterns.
Enterprise adoption of voice AI faces two distinct scrutiny layers: the engineering team evaluates correctness, latency, and operability; legal and compliance teams evaluate data handling, sovereignty, and auditability. Rapida is built to satisfy both.

Data Sovereignty

Self-hosted deployments run entirely on your infrastructure. No audio, no transcripts, no customer data transits Rapida’s servers. Deploy in any region — your data stays where regulations require it.

Open Source Auditability

The full platform is open source. Legal and security teams can audit every component that handles customer data — the audio pipeline, LLM routing layer, credential vault, and logging system — without relying on vendor assurances.

Compliance-aligned deployment

Run self-hosted deployments on HIPAA-compliant or GDPR-scoped infrastructure under your own BAAs and DPAs. Audit logs cover every call, tool invocation, and API access. Data retention is configurable, with PII redaction at the transcript level.
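
Transcript-level PII redaction boils down to pattern substitution before transcripts are persisted. A minimal sketch of the idea (the patterns shown are illustrative, not Rapida's redaction rules):

```python
import re

# Illustrative patterns only; a production deployment would use a fuller
# set (names, addresses, payment data) tuned to its compliance scope.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{8,}\d"),
}

def redact(transcript: str) -> str:
    """Replace each matched PII span with a category placeholder."""
    for label, pattern in PII_PATTERNS.items():
        transcript = pattern.sub(f"[{label}]", transcript)
    return transcript

result = redact("Call me at +1 415 555 0100 or jane@example.com")
print(result)  # Call me at [PHONE] or [EMAIL]
```

Running redaction before storage means raw PII never reaches transcripts, logs, or webhook payloads downstream.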

Governance & Access Control

Access control follows an Organization → Project RBAC hierarchy, with scoped roles (Super Admin, Admin, Writer, Reader) applied across all resources. A credential vault stores all API keys and OAuth tokens encrypted at rest; no secrets live in environment variables or config files.
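
The Organization → Project hierarchy with scoped roles can be modeled as a small permission lattice: a grant at the organization level covers every project beneath it, and a higher-ranked role satisfies any lower-ranked requirement. The semantics below are an assumption inferred from the role names, not Rapida's actual implementation:

```python
# Illustrative RBAC sketch: roles ordered by privilege, grants scoped to
# an organization or to a project within it. Semantics are an assumption
# based on the role names, not Rapida's actual access-control code.
ROLE_RANK = {"Reader": 0, "Writer": 1, "Admin": 2, "Super Admin": 3}

def allowed(grants: dict[str, str], scope: str, required: str) -> bool:
    """A grant on an organization covers every project beneath it."""
    for granted_scope, role in grants.items():
        covers = scope == granted_scope or scope.startswith(granted_scope + "/")
        if covers and ROLE_RANK[role] >= ROLE_RANK[required]:
            return True
    return False

grants = {"org-acme": "Reader", "org-acme/proj-voice": "Writer"}
print(allowed(grants, "org-acme/proj-voice", "Writer"))  # True
print(allowed(grants, "org-acme/proj-crm", "Writer"))    # False: only Reader at org level
```

The same shape extends naturally to per-resource checks (knowledge bases, credentials, campaigns) by treating each resource as a scope under its project.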

No Vendor Lock-in

Open source with no proprietary formats. Swap LLMs, STT, TTS, or telephony providers at the configuration layer. If Rapida ever stops fitting your needs, your assistant logic, knowledge bases, and integration configs are all yours to take.

Scalable, Observable Architecture

Every call emits structured logs, latency spans, and token usage metrics. Webhooks deliver post-call events to your data warehouse or alerting system. Full integration with standard observability stacks.
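
Post-call webhooks are commonly authenticated with an HMAC signature computed over the raw payload; the event fields and hex-encoded SHA-256 scheme below are assumptions for illustration, not Rapida's documented webhook format. A receiver-side sketch:

```python
import hashlib
import hmac
import json

def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute HMAC-SHA256 over the raw body and compare in constant
    time. The hex encoding is an assumption for this sketch."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

# Hypothetical post-call event payload (field names are illustrative).
secret = b"whsec_demo"
event = json.dumps({"type": "call.completed", "call_id": "c_123",
                    "latency_ms": {"stt": 82, "llm": 110, "tts": 47}}).encode()
sig = hmac.new(secret, event, hashlib.sha256).hexdigest()

print(verify_webhook(secret, event, sig))         # True
print(verify_webhook(secret, event, "deadbeef"))  # False
```

Verifying over the raw bytes (before JSON parsing) matters: re-serializing the parsed payload can reorder keys and break the signature.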

Get started