Getting Started
To integrate Cartesia Text-to-Speech with your Rapida application, follow these steps:Supported Models
Text-to-Speech Models
| Model Name | Description | Use Case |
|---|---|---|
| Sonic English | High-quality English voices | Professional TTS applications |
| Bark Model | Expressive speech synthesis | Creative and dynamic voice outputs |
| Tortoise TTS | Ultra-realistic voices | High-fidelity voice synthesis |
Supported Languages
Cartesia supports 20+ languages including:- English (multiple accents)
- Spanish, French, German, Italian
- Portuguese, Russian, Polish, Dutch
- Japanese, Mandarin, Korean
- Hindi and other Indian languages
- And more
Voice Features
- Multiple Voice Personalities: Choose from various voice options
- Emotion Expression: Control emotional tone and expressiveness
- Pronunciation Control: Define how specific words are pronounced
- Speed Control: Adjust speaking rate from slow to fast
- Streaming Support: Stream audio in real-time
Prerequisites
- Have a Cartesia account (sign up at https://cartesia.ai)
- Navigate to your API dashboard
- Generate an API key
- Copy the API key (make sure to save it securely)
Setting Up Provider Credentials
1
Access the Integrations Page

2
Select Cartesia
On the Integrations page, find the Cartesia provider card.Click the “Setup Credential” button for Cartesia.
3
Create Provider Credential
A modal window will appear titled “Create provider credential”. Follow these steps:
- Select “Cartesia” from the dropdown (if not already selected)
- Enter a Key Name: Assign a unique name to this provider key for easy identification
- Enter the API Key: Input your Cartesia API key
- Click “Configure” to save the credential
4
Verify Credential Setup
After setting up the credential, you can verify it’s been added:
- The Cartesia provider card should now show “Connected”
- If you click on the provider, you’ll see a “View provider credential” modal
- This modal displays the credential name, when it was last updated, and options to delete or close
Integration Features
- Ultra-Realistic Voices: Advanced voice synthesis quality
- 20+ Languages: Comprehensive language coverage
- Multiple Voice Models: Choose different TTS models
- Expression Control: Fine-tune emotional expression
- Real-time Processing: Low-latency audio generation
- Streaming TTS: Stream audio as it’s being generated
- Professional Quality: Enterprise-grade voice synthesis