Getting Started
To integrate Cartesia Text-to-Speech with your Rapida application, complete the prerequisites and the credential setup described below.
Supported Models
Text-to-Speech Models
| Model Name | Description | Use Case |
|---|---|---|
| Sonic English | High-quality English voices | Professional TTS applications |
| Bark Model | Expressive speech synthesis | Creative and dynamic voice outputs |
| Tortoise TTS | Ultra-realistic voices | High-fidelity voice synthesis |
Supported Languages
Cartesia supports 20+ languages, including:
- English (multiple accents)
- Spanish, French, German, Italian
- Portuguese, Russian, Polish, Dutch
- Japanese, Mandarin, Korean
- Hindi and other Indian languages
- And more
Voice Features
- Multiple Voice Personalities: Choose from various voice options
- Emotion Expression: Control emotional tone and expressiveness
- Pronunciation Control: Define how specific words are pronounced
- Speed Control: Adjust speaking rate from slow to fast
- Streaming Support: Stream audio in real-time
Prerequisites
- Have a Cartesia account (sign up at https://cartesia.ai)
- Navigate to your API dashboard
- Generate an API key
- Copy the API key (make sure to save it securely)
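One common way to store the copied key securely is to keep it out of source code and read it from an environment variable. The following is a minimal sketch, assuming the key was exported under the hypothetical name `CARTESIA_API_KEY`; neither Rapida nor Cartesia is known to require that name, and `load_api_key` and `mask_key` are illustrative helpers, not part of any SDK.

```python
import os


def load_api_key(env_var: str = "CARTESIA_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an illustrative assumption, not a Rapida or
    Cartesia requirement.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before starting the application")
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the key can appear in logs."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Logging only `mask_key(load_api_key())` makes it possible to confirm which key is in use without exposing the full value.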
Setting Up Provider Credentials
Select Cartesia
On the Integrations page, find the Cartesia provider card. Click the “Setup Credential” button for Cartesia.
Create Provider Credential
A modal window will appear titled “Create provider credential”. Follow these steps:
- Select “Cartesia” from the dropdown (if not already selected)
- Enter a Key Name: Assign a unique name to this provider key for easy identification
- Enter the API Key: Input your Cartesia API key
- Click “Configure” to save the credential
Verify Credential Setup
After setting up the credential, you can verify it’s been added:
- The Cartesia provider card should now show “Connected”
- If you click on the provider, you’ll see a “View provider credential” modal
- This modal displays the credential name, when it was last updated, and options to delete or close
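Independently of the Rapida dashboard, the key itself can be sanity-checked with a direct request to Cartesia. The sketch below is an assumption-heavy illustration: the endpoint URL, the `X-API-Key` and `Cartesia-Version` header names, and the version string should all be confirmed against Cartesia's current API reference before use, and `build_key_check_request` and `key_is_accepted` are hypothetical helper names.

```python
import urllib.error
import urllib.request

# Assumed values: confirm them against Cartesia's API reference.
CARTESIA_VOICES_URL = "https://api.cartesia.ai/voices"
CARTESIA_VERSION = "2024-06-10"


def build_key_check_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists voices."""
    return urllib.request.Request(
        CARTESIA_VOICES_URL,
        headers={"X-API-Key": api_key, "Cartesia-Version": CARTESIA_VERSION},
        method="GET",
    )


def key_is_accepted(api_key: str, timeout: float = 10.0) -> bool:
    """Send the request; True on HTTP 200, False if the key is rejected."""
    request = build_key_check_request(api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):  # authentication or permission failure
            return False
        raise  # other errors (rate limits, outages) are not a verdict on the key
```

Keeping request construction separate from sending makes the headers easy to inspect without touching the network.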
Integration Features
- Ultra-Realistic Voices: Advanced voice synthesis quality
- 20+ Languages: Comprehensive language coverage
- Multiple Voice Models: Choose different TTS models
- Expression Control: Fine-tune emotional expression
- Real-time Processing: Low-latency audio generation
- Streaming TTS: Stream audio as it’s being generated
- Professional Quality: Enterprise-grade voice synthesis
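To illustrate what streaming TTS means for the consuming side, the sketch below writes audio chunks to a sink as they arrive instead of waiting for the full clip. It does not call Cartesia or Rapida: `fake_tts_stream` is a hypothetical stand-in that yields placeholder bytes, and `consume_stream` is an illustrative helper.

```python
import io
from typing import BinaryIO, Iterable, Iterator


def fake_tts_stream(text: str, chunk_size: int = 4) -> Iterator[bytes]:
    """Stand-in for a streaming TTS response: yields the data in pieces."""
    audio = text.encode("utf-8")  # placeholder bytes, not real audio
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


def consume_stream(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """Write each chunk to the sink as soon as it arrives; return bytes written."""
    total = 0
    for chunk in chunks:
        sink.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    buffer = io.BytesIO()
    written = consume_stream(fake_tts_stream("hello world"), buffer)
    print(f"received {written} bytes in streamed chunks")
```

In a real integration the sink would typically be an audio player or a network socket, so playback can begin before synthesis finishes.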
