Version: 2.0.0.1.9.25

Voices - Complete Documentation

Overview

Voices in the Voicing AI platform are AI-generated speech profiles that convert text into natural-sounding speech. The platform supports multiple voice types, a configurable TTS provider, and extensive customization options for creating lifelike conversational experiences.

Key Concepts

  • Voice: A speech profile that defines how text is converted to audio
  • Speaker: A specific voice instance with a name, language, and characteristics
  • Voice Provider: The underlying TTS service (configured TTS provider)
  • Voice Type: Curated (pre-built) or Cloned (custom-created)
  • TTS Model: The AI model that powers voice generation

Table of Contents

  1. Voice Providers
  2. Voice Types
  3. Voice Configuration
  4. Voice API Endpoints
  5. Voice Service Methods
  6. TTS Models
  7. Voice Generation
  8. Voice Cloning
  9. Complete Code Examples

Voice Providers

Voicing AI uses a Text-to-Speech (TTS) provider to generate speech from text.

1. Voicing

Provider ID: "voicing"

Features:

  • Custom TTS service
  • Language mapping support
  • Custom pronunciation rules
  • Expressive level control
  • Speed adjustment
  • Translation capabilities

Configuration:

# Environment Variables Required
VOICING_TTS_API_KEY=your_api_key_here
VOICING_TTS_API_URL=https://api.voicing.ai
TTS_PROVIDER=voicing

Voice Settings:

  • speed (float): Speech speed multiplier (default: 1.0)
  • expressive_level (0-100): Emotional expressiveness (default: 30)
  • translate_text (boolean): Auto-translate text if needed
  • output_format: Audio format (default: "pcm_24000")
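
The settings above can be grouped and range-checked before they are sent to the provider. The following is a minimal sketch, not the platform's own schema: the class name VoicingVoiceSettings is hypothetical, the defaults are the ones documented on this page (translate_text defaults to True per the text_to_audio() parameters), and the rule that speed must be positive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoicingVoiceSettings:
    """Hypothetical container for the Voicing voice settings listed above."""

    speed: float = 1.0                # speech speed multiplier
    expressive_level: int = 30        # emotional expressiveness, 0-100
    translate_text: bool = True       # auto-translate text if needed
    output_format: str = "pcm_24000"  # documented default audio format

    def __post_init__(self) -> None:
        # Assumption: a speed multiplier only makes sense when positive.
        if self.speed <= 0:
            raise ValueError("speed must be a positive multiplier")
        if not 0 <= self.expressive_level <= 100:
            raise ValueError("expressive_level must be between 0 and 100")


defaults = VoicingVoiceSettings()
lively = VoicingVoiceSettings(speed=1.1, expressive_level=60)
```

Out-of-range values, such as expressive_level=101, raise a ValueError at construction time instead of failing later at the TTS API.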

Supported Languages:

  • German (de)
  • Greek (el)
  • English (en)
  • Spanish (es)
  • Finnish (fi)
  • French (fr)
  • Italian (it)
  • Portuguese (pt)
  • Swedish (sv)
  • Hindi (hi)
  • English-UK (en-uk)

Example Voice Configuration:

{
"voice_id": "speaker_001",
"speaker_name": "Sarah",
"speaker_type": "voicing",
"language": "en",
"stability": 0.5,
"similarity_boost": 0.5
}


Voice Types

1. Curated Voices

Type: "curated"

Curated voices are pre-built, professionally designed voices available in the platform's voice library.

Characteristics:

  • Pre-configured and ready to use
  • Professionally optimized
  • Available in multiple languages
  • No training required
  • Consistent quality

Usage: Curated voices are fetched from the external Voicing TTS API and displayed in the voice selection interface.

Example:

{
"id": "019aa111-2bcd-3456-def0-123456789012",
"name": "Michael",
"language": "English",
"accent": "American",
"speaker_type": "voicing",
"voice_id": "1SM7GgM6IMuvQlz2BwM3",
"gender": "Male",
"rating": "4.5",
"tag": "professional-male-en"
}

2. Cloned Voices

Type: "cloned"

Cloned voices are custom voices created by training on audio samples provided by users.

Characteristics:

  • Custom voice creation
  • Requires audio samples
  • Unique voice profiles
  • Personal or brand-specific voices
  • Training process required

Creation Process:

  1. Upload audio samples
  2. System processes and trains the model
  3. Voice clone is generated
  4. Voice becomes available for use

Example:

{
"id": "019bb222-3cde-4567-ef01-234567890123",
"name": "Custom Brand Voice",
"voice_type": "cloned",
"status": "completed",
"voice_clone": {
"voice_id": "custom_voice_123",
"name": "Custom Brand Voice",
"category": "cloned",
"description": "Custom voice for brand identity"
}
}

Voice Configuration

Assistant Voice Settings

Voices are configured in the Assistant's TTS Settings section.

Schema Structure:

{
"tts_settings": {
"model": "019cc123-4def-5678-9abc-def012345678",
"voice": {
"voice_id": "1SM7GgM6IMuvQlz2BwM3",
"speaker_type": "voicing",
"speaker_name": "Michael",
"language": "en",
"stability": 0.5,
"similarity_boost": 0.5
}
}
}

Voice Fields Explained

voice_id (string, required)

  • Unique identifier for the voice
  • Provider voice ID / speaker identifier (from your configured TTS provider)

speaker_name (string, required)

  • Human-readable name of the voice
  • Examples: "Michael", "Sarah", "David"
  • Used for display and identification

speaker_type (string, required)

  • Provider type: "voicing"
  • Determines which TTS service to use
  • Auto-set based on TTS_PROVIDER setting if not specified

language (string, required)

  • Language code or name
  • Examples: "en", "English", "es", "Spanish"
  • Must match voice's supported language

stability (float, 0.0-1.0, default: 0.5)

  • Controls voice consistency when supported by your configured provider

similarity_boost (float, 0.0-1.0, default: 0.5)

  • Controls similarity/voice adherence when supported by your configured provider
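
The field rules above can be checked client-side before saving an assistant. This is a hedged sketch and not the platform's validator: validate_voice is a hypothetical helper, and it treats speaker_type as required (as documented) even though the platform may auto-set it from TTS_PROVIDER.

```python
REQUIRED_FIELDS = ("voice_id", "speaker_name", "speaker_type", "language")
RANGED_FIELDS = ("stability", "similarity_boost")  # floats in 0.0-1.0


def validate_voice(voice: dict) -> list[str]:
    """Return a list of problems found in a voice configuration dict."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = voice.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"{field} is required and must be a non-empty string")
    for field in RANGED_FIELDS:
        value = voice.get(field, 0.5)  # 0.5 is the documented default
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            problems.append(f"{field} must be a number between 0.0 and 1.0")
    return problems


valid_voice = {
    "voice_id": "1SM7GgM6IMuvQlz2BwM3",
    "speaker_type": "voicing",
    "speaker_name": "Michael",
    "language": "en",
    "stability": 0.5,
    "similarity_boost": 0.5,
}
```

An empty list means the configuration satisfies the documented constraints; each entry otherwise names the field that needs attention.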

Complete Voice Configuration Example

{
"tts_settings": {
"model": "019cc123-4def-5678-9abc-def012345678",
"voice": {
"voice_id": "1SM7GgM6IMuvQlz2BwM3",
"speaker_type": "voicing",
"speaker_name": "Michael",
"language": "en",
"stability": 0.6,
"similarity_boost": 0.7
},
"speech_enhancements": {
"expressiveness": 75,
"speed": 1.0,
"enable_number_normalization": true
}
}
}

Voice API Endpoints

1. Get Curated Voices

Endpoint: GET /api/v1/aivoices

Description: Retrieve all available curated voices from the external TTS API.

Query Parameters:

  • language (optional, string): Filter voices by language (e.g., "English", "Spanish")
  • model_id (optional, string): Filter voices by TTS model ID (filters by provider)

Authentication: Required (VOICES.VIEW permission)

Response Schema:

{
"aivoices": [
{
"id": "019aa111-2bcd-3456-def0-123456789012",
"name": "Michael",
"language": "English",
"accent": "American",
"voice_id": "1SM7GgM6IMuvQlz2BwM3",
"gender": "Male",
"rating": "4.5",
"tag": "professional-male-en",
"filler_words": "um, uh, like",
"profile_photo_url": "https://storage.example.com/voices/michael.jpg",
"filler_words_audio_urls": "https://storage.example.com/voices/michael_filler.wav"
}
]
}

Example Request:

curl -X GET "https://api.voicing.ai/api/v1/aivoices?language=English" \
-H "Authorization: Bearer YOUR_TOKEN"

Example Response:

{
"aivoices": [
{
"id": "voice_001",
"name": "Michael",
"language": "English",
"accent": "American",
"voice_id": "1SM7GgM6IMuvQlz2BwM3",
"gender": "Male",
"rating": "4.5",
"tag": "professional-male-en"
},
{
"id": "voice_002",
"name": "Sarah",
"language": "English",
"accent": "British",
"voice_id": "2TN8HhN7JNuvRlz3CxN4",
"gender": "Female",
"rating": "4.8",
"tag": "professional-female-en-uk"
}
]
}
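
The curl request above can also be assembled in Python. The following minimal sketch uses only the standard library; build_voices_url and auth_headers are hypothetical helper names, and the host is the one shown in the curl example. It only builds the URL and header and does not send the request.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.voicing.ai"  # host used in the curl example


def build_voices_url(language: Optional[str] = None,
                     model_id: Optional[str] = None) -> str:
    """Build the GET /api/v1/aivoices URL with the optional query filters."""
    params = {}
    if language:
        params["language"] = language
    if model_id:
        params["model_id"] = model_id
    url = f"{BASE_URL}/api/v1/aivoices"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


def auth_headers(token: str) -> dict:
    """Return the bearer-token header used by the example request."""
    return {"Authorization": f"Bearer {token}"}


url = build_voices_url(language="English")
headers = auth_headers("YOUR_TOKEN")
```

The resulting url and headers can be passed to any HTTP client of your choice.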

2. Get Speaker Languages

Endpoint: GET /api/v1/aivoices/languages

Description: Retrieve all unique languages available from speaker profiles.

Query Parameters:

  • speaker (optional, string): Filter languages for a specific speaker name
  • model_id (optional, string): Filter by TTS model ID

Authentication: Required (VOICES.VIEW permission)

Response Schema:

{
"languages": [
"English",
"Spanish",
"French",
"German",
"Hindi"
]
}

Example Request:

curl -X GET "https://api.voicing.ai/api/v1/aivoices/languages?speaker=Michael" \
-H "Authorization: Bearer YOUR_TOKEN"

Example Response:

{
"languages": [
"English",
"Spanish",
"French"
]
}

Voice Service Methods

AIVoiceService

The AIVoiceService class handles all voice-related operations.

1. Get Speaker Profiles from External API

Method: get_speaker_profiles_from_external_api()

Description: Fetches speaker profiles from the external Voicing TTS API with optional filtering.

Parameters:

  • language (Optional[str]): Filter by language
  • model_id (Optional[str]): Filter by TTS model ID (determines provider)

Returns: List[IAIVoiceResponseSchema]

Code Example:

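from app.services.aivoice_service import AIVoiceService
from app.dependencies.storage_service import get_storage_service
from sqlalchemy.ext.asyncio import AsyncSession

# Initialize service.
# db_session (an AsyncSession) and storage_service are assumed to be provided
# by your application's dependency setup; they are not defined in this snippet.
aivoice_service = AIVoiceService(
    db=db_session,
    storage_service=storage_service
)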

# Get all voices
all_voices = await aivoice_service.get_speaker_profiles_from_external_api()

# Get voices filtered by language
english_voices = await aivoice_service.get_speaker_profiles_from_external_api(
language="English"
)

# Get voices filtered by model (provider)
model_voices = await aivoice_service.get_speaker_profiles_from_external_api(
model_id="019cc123-4def-5678-9abc-def012345678"
)

2. Get All Speaker Languages

Method: get_all_speaker_languages()

Description: Extract all unique languages from available speaker profiles.

Parameters:

  • speaker (Optional[str]): Filter languages for a specific speaker
  • model_id (Optional[str]): Filter by TTS model ID

Returns: List[str] (sorted list of unique languages)

Code Example:

# Get all available languages
all_languages = await aivoice_service.get_all_speaker_languages()

# Get languages for a specific speaker
michael_languages = await aivoice_service.get_all_speaker_languages(
speaker="Michael"
)

# Get languages for a specific model
model_languages = await aivoice_service.get_all_speaker_languages(
model_id="019cc123-4def-5678-9abc-def012345678"
)

TTS Models

TTS Models define the AI engines that power voice generation. Each model supports specific voices and has unique capabilities.

TTS Model Structure

Database Schema:

class TTSModel(Base):
    id: UUID                        # Unique identifier
    model_name: str                 # Technical model name
    display_name: str               # Human-readable name
    provider: str                   # "voicing"
    description: Optional[str]      # Model description
    supported_speakers: List[Dict]  # Available speakers/voices
    tags: List[str]                 # Categorization tags
    is_active: bool                 # Whether model is active
    recommended: bool               # Whether model is recommended
    created_at: datetime
    updated_at: datetime

Supported Speakers Format

Each TTS model contains a supported_speakers array with voice information:

{
"supported_speakers": [
{
"voice_id": "1SM7GgM6IMuvQlz2BwM3",
"name": "Michael",
"language": "en",
"gender": "Male",
"accent": "American"
},
{
"voice_id": "2TN8HhN7JNuvRlz3CxN4",
"name": "Sarah",
"language": "en",
"gender": "Female",
"accent": "British"
}
]
}
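
To check whether a model supports a given voice (one of the troubleshooting steps later in this document), the supported_speakers array can be searched by voice_id. A minimal sketch, assuming the model has already been loaded as a plain dict in the format shown above; find_speaker is a hypothetical helper name.

```python
from typing import Optional

# Sample data in the supported_speakers format shown above.
model = {
    "supported_speakers": [
        {"voice_id": "1SM7GgM6IMuvQlz2BwM3", "name": "Michael",
         "language": "en", "gender": "Male", "accent": "American"},
        {"voice_id": "2TN8HhN7JNuvRlz3CxN4", "name": "Sarah",
         "language": "en", "gender": "Female", "accent": "British"},
    ]
}


def find_speaker(model: dict, voice_id: str) -> Optional[dict]:
    """Return the speaker entry with the given voice_id, or None if absent."""
    for speaker in model.get("supported_speakers", []):
        if speaker.get("voice_id") == voice_id:
            return speaker
    return None


sarah = find_speaker(model, "2TN8HhN7JNuvRlz3CxN4")
missing = find_speaker(model, "unknown_voice")
```

A None result indicates that the voice is not listed for this model, so a different model or voice should be selected.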

TTS Model API Endpoints

Create TTS Model

Endpoint: POST /api/v1/tts/models

Request Body:

{
"model_name": "voicing_default",
"display_name": "Voicing Default",
"provider": "voicing",
"description": "Default voice model",
"supported_speakers": [
{
"voice_id": "1SM7GgM6IMuvQlz2BwM3",
"name": "Michael",
"language": "en"
}
],
"tags": ["multilingual", "high-quality"],
"is_active": true,
"recommended": true
}

Get TTS Models

Endpoint: GET /api/v1/tts/models

Query Parameters:

  • provider (optional): Filter by provider ("voicing")
  • is_active (optional, boolean): Filter by active status
  • recommended (optional, boolean): Filter by recommended status

Response:

{
"success": true,
"data": [
{
"id": "019cc123-4def-5678-9abc-def012345678",
"display_name": "Voicing Default",
"description": "Default voice model",
"supported_speakers": [...],
"tags": ["multilingual"],
"is_active": true,
"recommended": true,
"created_at": "2025-01-08T10:00:00Z",
"updated_at": "2025-01-08T10:00:00Z"
}
],
"total": 1
}

Voice Generation

Text-to-Speech Generation

The platform generates speech from text using configured voices and TTS providers.

TTS Generation Process

  1. Text Input: User provides text to convert
  2. Voice Selection: System uses configured voice settings
  3. Provider Selection: Based on TTS_PROVIDER setting or voice speaker_type
  4. Audio Generation: TTS API generates audio
  5. Format Conversion: Audio converted to WAV format
  6. Storage: Audio stored in cloud storage
  7. Record Creation: TTS record created in database
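
Step 5 (format conversion) can be illustrated with the standard library. The sketch below wraps raw PCM in a WAV container. It assumes that the "pcm_24000" output format means 16-bit mono PCM at 24 kHz, which this document does not state explicitly; pcm_to_wav is a hypothetical helper and not the platform's own conversion code.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes.

    Defaults assume 16-bit (2-byte) mono audio at 24 kHz.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# One second of silence: 24000 frames x 2 bytes per frame.
silence = bytes(24000 * 2)
wav_bytes = pcm_to_wav(silence)
```

If your provider returns a different sample width or channel count, pass those values explicitly so that the WAV header matches the audio data.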

TTS Generation API

Endpoint: POST /api/v1/tts/generate

Request Body:

{
"text": "Hello, this is a test message for text-to-speech conversion.",
"voice": "1SM7GgM6IMuvQlz2BwM3",
"language": "English"
}

Response:

{
"success": true,
"message": "Audio generated and stored successfully",
"data": {
"id": "123e4567-e89b-12d3-a456-426614174000",
"text": "Hello, this is a test message for text-to-speech conversion.",
"voice": "1SM7GgM6IMuvQlz2BwM3",
"language": "en",
"audio_url": "https://storage.googleapis.com/bucket/tts/user123/audio.wav",
"provider": "voicing",
"file_size": 48000,
"created_at": "2025-01-08T14:30:00Z"
}
}

TTS Service Code Example

from app.services.tts_service import TTSService
from app.dependencies.storage_service import get_storage_service
from sqlalchemy.ext.asyncio import AsyncSession

# Initialize service
tts_service = TTSService(
storage_service=storage_service,
db=db_session
)

# Generate and store audio
tts_record = await tts_service.generate_and_store_audio(
user_id="user_123",
text="Hello, welcome to our service!",
voice="1SM7GgM6IMuvQlz2BwM3",
language="en",
organization_id="org_456"
)

# Access generated audio
audio_url = tts_record.audio_url
print(f"Audio available at: {audio_url}")

Voicing TTS Generation

Class: VoicingTTS

Methods:

text_to_audio()

Generate audio using Voicing TTS API.

Parameters:

  • text (str): Text to convert
  • voice (str): Voice ID or speaker name
  • language (str, default: "en"): Language code or name
  • output_format (str, default: "pcm_24000"): Audio format
  • speed (float, default: 1.0): Speech speed
  • expressive_level (int, default: 30): Expressiveness (0-100)
  • translate_text (bool, default: True): Auto-translate if needed

Returns: str (Base64-encoded audio; decode it to obtain the raw audio bytes, as shown in the example below)

Code Example:

import base64

from app.utils.voice_generation_utils import VoicingTTS

# Initialize client
voicing_tts = VoicingTTS(
    api_key="your_api_key",
    api_url="https://api.voicing.ai"
)

# Generate audio
audio_base64 = await voicing_tts.text_to_audio(
    text="Hello, this is a test message.",
    voice="speaker_001",
    language="English",
    speed=1.0,
    expressive_level=50,
    translate_text=False
)

# Decode base64 to bytes
audio_bytes = base64.b64decode(audio_base64)

Voice Cloning

Voice cloning allows you to create custom voices by training on audio samples.


Complete Code Examples

Example 1: Get Available Voices

from typing import Optional

from fastapi import Depends
from app.services.aivoice_service import AIVoiceService
from app.dependencies.services import get_aivoice_service
from app.models.auth import AuthenticatedUserModel
from app.dependencies.current_user import require_permission
from app.constants.user import PERMISSIONS

async def get_voices_example(
    language: Optional[str] = None,
    model_id: Optional[str] = None,
    aivoice_service: AIVoiceService = Depends(get_aivoice_service),
    user: AuthenticatedUserModel = Depends(require_permission(PERMISSIONS.VOICES.VIEW))
):
    """Get available voices with optional filtering"""

    # Get voices filtered by language
    if language:
        filtered_voices = await aivoice_service.get_speaker_profiles_from_external_api(
            language=language
        )
        return {"voices": filtered_voices}

    # Get voices filtered by model (provider)
    if model_id:
        model_voices = await aivoice_service.get_speaker_profiles_from_external_api(
            model_id=model_id
        )
        return {"voices": model_voices}

    # No filter given: get all voices
    all_voices = await aivoice_service.get_speaker_profiles_from_external_api()
    return {"voices": all_voices}

Example 2: Configure Voice in Assistant

from app.schemas.assistant import AssistantCreateRequest, TTSSettings, AssistantVoice

# Create assistant with voice configuration
assistant_data = AssistantCreateRequest(
basic_settings={
"basic_info": {
"name": "Customer Support Assistant",
"description": "Handles customer inquiries",
"model_selection": "019bb7a2-bc4f-7f09-a79c-2d6445d2811f"
},
"assistant_flow_manager": {
"assistant_mode": "prompt",
"pathway_id": None
}
},
tts_settings=TTSSettings(
model="019cc123-4def-5678-9abc-def012345678",
voice=AssistantVoice(
voice_id="1SM7GgM6IMuvQlz2BwM3",
speaker_type="voicing",
speaker_name="Michael",
language="en",
stability=0.6,
similarity_boost=0.7
),
speech_enhancements={
"expressiveness": 75,
"speed": 1.0,
"enable_number_normalization": True
}
)
)

# Create assistant via API
response = await create_assistant(assistant_data)

Example 3: Generate TTS Audio

from app.services.tts_service import TTSService
from app.dependencies.storage_service import get_storage_service
from sqlalchemy.ext.asyncio import AsyncSession

# BaseStorageService is the storage service type used by your application;
# its import path is not shown in this document.

async def generate_tts_example(
    db: AsyncSession,
    storage_service: BaseStorageService,
    user_id: str,
    text: str,
    voice_id: str,
    language: str = "en"
):
    """Generate TTS audio and store it"""

    # Initialize TTS service
    tts_service = TTSService(
        storage_service=storage_service,
        db=db
    )

    # Generate and store audio
    tts_record = await tts_service.generate_and_store_audio(
        user_id=user_id,
        text=text,
        voice=voice_id,
        language=language
    )

    return {
        "success": True,
        "audio_url": tts_record.audio_url,
        "record_id": str(tts_record.id),
        "file_size": tts_record.file_size
    }

Example 4: Clone Voice

Voice cloning examples are intentionally omitted here.

Example 5: Get TTS Records

from typing import Optional
from uuid import UUID

from app.services.tts_service import TTSService

async def get_tts_records_example(
    tts_service: TTSService,
    user_id: str,
    organization_id: Optional[UUID] = None,
    limit: int = 50
):
    """Get TTS records for a user"""

    # Get user's TTS records
    records = await tts_service.get_user_tts_records(
        user_id=user_id,
        limit=limit,
        organization_id=organization_id
    )

    # Format response
    return {
        "records": [
            {
                "id": str(record.id),
                "text": record.text,
                "voice": record.voice,
                "language": record.language,
                "audio_url": record.audio_url,
                "provider": record.provider,
                "file_size": record.file_size,
                "created_at": record.created_at.isoformat()
            }
            for record in records
        ],
        "total": len(records)
    }

Example 6: Voice Configuration with Multiple Languages

from app.schemas.assistant import AssistantVoice, TTSSettings, MultiLanguageSettings

# Configure voice with multi-language support
tts_settings = TTSSettings(
model="019cc123-4def-5678-9abc-def012345678",
voice=AssistantVoice(
voice_id="1SM7GgM6IMuvQlz2BwM3",
speaker_type="voicing",
speaker_name="Michael",
language="en",
stability=0.5,
similarity_boost=0.5
),
multi_language=MultiLanguageSettings(
enabled=True,
auto_detect_language=True,
translation_provider="default",
supported_languages=["English", "Spanish", "French", "German"]
)
)

Voice Response Schemas

IAIVoiceResponseSchema

Response schema for voice profiles from external API.

class IAIVoiceResponseSchema(BaseModel):
    id: str
    name: str
    language: str
    accent: str
    voice_id: Optional[str] = None
    filler_words: str
    gender: Optional[str] = None
    rating: Optional[str] = None
    tag: Optional[str] = None
    profile_photo_url: Optional[str] = None
    filler_words_audio_file: Optional[str] = None
    filler_words_audio_urls: Optional[str] = None
    config_file: Optional[str] = None
    weight_file: Optional[str] = None
    speaker_metadat: Optional[Dict] = {}
    reference_audio_file: Optional[str] = None

AssistantVoiceResponse

Response schema for voice in assistant settings (without speaker_type).

class AssistantVoiceResponse(BaseModel):
    voice_id: str | None = None
    speaker_name: str | None = None
    language: Optional[str] = None
    stability: Optional[float] = 0.5
    similarity_boost: Optional[float] = 0.5

Voice Constants

Voice Type Enum

class VoiceType(str, BaseEnum):
    CURATED = "curated"  # Pre-built voices
    CLONED = "cloned"    # Custom cloned voices

Voice Status Enum

class Status(str, BaseEnum):
    DRAFT = "draft"          # Voice creation in progress
    COMPLETED = "completed"  # Voice ready for use
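
These status values can be used to decide whether a cloned voice is ready for use. The sketch below is self-contained and therefore uses the standard library's Enum instead of the platform's BaseEnum; is_clone_ready is a hypothetical helper, and treating a missing status as "draft" is an assumption.

```python
from enum import Enum


class Status(str, Enum):
    """Stand-in for the platform's Status enum, built on the stdlib Enum."""

    DRAFT = "draft"          # voice creation in progress
    COMPLETED = "completed"  # voice ready for use


def is_clone_ready(voice: dict) -> bool:
    """Return True when a cloned voice record has finished processing.

    Assumption: a record without a status is treated as still in draft.
    """
    status = Status(voice.get("status", Status.DRAFT.value))
    return status is Status.COMPLETED


cloned_voice = {
    "id": "019bb222-3cde-4567-ef01-234567890123",
    "name": "Custom Brand Voice",
    "voice_type": "cloned",
    "status": "completed",
}
ready = is_clone_ready(cloned_voice)
```

An unrecognized status string raises ValueError, which surfaces unexpected values early instead of silently treating them as not ready.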

Best Practices

1. Voice Selection

  • Choose Appropriate Voice: Match voice characteristics to use case

    • Professional voices for business applications
    • Friendly voices for customer service
    • Neutral voices for general purposes
  • Language Matching: Ensure voice language matches conversation language

  • Provider Selection: Consider provider capabilities for your needs

    • Voicing: Best for custom requirements and specific languages

2. Voice Settings Tuning

  • Stability:

    • Use lower values (0.3-0.5) for more natural, expressive speech
    • Use higher values (0.7-1.0) for consistent, predictable speech
  • Similarity Boost:

    • Use higher values (0.7-1.0) when voice consistency is critical
    • Use lower values (0.3-0.5) for more variation
  • Speed:

    • Default (1.0) is usually optimal
    • Adjust slightly (0.9-1.1) for specific use cases

3. Voice Cloning

  • Audio Quality: Use high-quality audio samples (16kHz+, clear audio)
  • Sample Length: Provide 30-60 seconds of clear speech
  • Sample Variety: Include different phrases and emotions
  • Background Noise: Minimize background noise in samples

4. Performance Optimization

  • Caching: Enable TTS caching for frequently used phrases
  • Batch Generation: Generate multiple audio files in batch when possible
  • Storage Management: Regularly clean up unused TTS records
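
One way to approach caching for frequently used phrases is to key stored audio URLs by the inputs that determine the audio. The sketch below is only an illustration and not the platform's caching feature: tts_cache_key and TTSCache are hypothetical names, and the choice of text, voice, and language as the key is an assumption.

```python
import hashlib
import json
from typing import Dict, Optional


def tts_cache_key(text: str, voice: str, language: str) -> str:
    """Derive a stable cache key from the inputs that determine the audio."""
    payload = json.dumps(
        {"text": text, "voice": voice, "language": language},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TTSCache:
    """Minimal in-memory cache mapping cache keys to audio URLs."""

    def __init__(self) -> None:
        self._urls: Dict[str, str] = {}

    def get(self, text: str, voice: str, language: str) -> Optional[str]:
        return self._urls.get(tts_cache_key(text, voice, language))

    def put(self, text: str, voice: str, language: str, audio_url: str) -> None:
        self._urls[tts_cache_key(text, voice, language)] = audio_url


cache = TTSCache()
cache.put("Hello, welcome to our service!", "1SM7GgM6IMuvQlz2BwM3", "en",
          "https://storage.example.com/tts/welcome.wav")
hit = cache.get("Hello, welcome to our service!", "1SM7GgM6IMuvQlz2BwM3", "en")
miss = cache.get("Goodbye!", "1SM7GgM6IMuvQlz2BwM3", "en")
```

If settings such as speed or expressiveness vary between requests, they would need to be part of the key as well.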

5. Multi-Language Support

  • Auto-Detection: Enable auto language detection for international users
  • Language Mapping: Ensure proper language code mapping
  • Translation: Use translation provider for seamless multilingual experience

Troubleshooting

Voice Not Generating

Issue: TTS generation fails or returns empty audio

Solutions:

  1. Check API keys are configured correctly
  2. Verify voice_id is valid for the provider
  3. Ensure text is not empty and within length limits
  4. Check provider API status
  5. Verify network connectivity

Wrong Voice Playing

Issue: Different voice than expected is used

Solutions:

  1. Verify voice_id matches the intended voice
  2. Check speaker_type matches the provider
  3. Ensure voice is available for the selected language
  4. Verify TTS model supports the voice

Poor Audio Quality

Issue: Generated audio sounds distorted or unclear

Solutions:

  1. Adjust stability and similarity_boost settings
  2. Check audio format and sample rate
  3. Verify provider model selection
  4. Test with different voice settings

Voice Cloning Fails

Issue: Voice cloning process fails or produces poor results

Solutions:

  1. Ensure audio samples are high quality (16kHz+)
  2. Provide sufficient audio length (30-60 seconds)
  3. Minimize background noise
  4. Use clear, natural speech samples
  5. Check your provider's API quota and limits

Language Not Supported

Issue: Voice doesn't support requested language

Solutions:

  1. Check voice's supported languages
  2. Use a different voice that supports the language
  3. Enable translation if available
  4. Verify language code format (e.g., "en" vs "English")

API Reference Summary

Voice Endpoints

| Endpoint | Method | Description | Auth Required |
|---|---|---|---|
| /api/v1/aivoices | GET | Get curated voices | Yes (VOICES.VIEW) |
| /api/v1/aivoices/languages | GET | Get available languages | Yes (VOICES.VIEW) |
| /api/v1/tts/generate | POST | Generate TTS audio | Yes |
| /api/v1/tts/records | GET | Get TTS records | Yes |
| /api/v1/tts/models | GET | Get TTS models | Yes |
| /api/v1/tts/models | POST | Create TTS model | Yes (Admin) |

Voice Configuration in Assistant

{
"tts_settings": {
"model": "<TTS_MODEL_UUID>",
"voice": {
"voice_id": "<VOICE_ID>",
"speaker_type": "voicing",
"speaker_name": "<SPEAKER_NAME>",
"language": "<LANGUAGE_CODE>",
"stability": 0.0-1.0,
"similarity_boost": 0.0-1.0
},
"speech_enhancements": {
"expressiveness": 0-100,
"speed": 0.25-2.0,
"enable_number_normalization": true
}
}
}

Environment Variables

Required for Voicing TTS

VOICING_TTS_API_KEY=your_voicing_api_key
VOICING_TTS_API_URL=https://api.voicing.ai
TTS_PROVIDER=voicing

Storage Configuration

BUCKET_NAME=your_storage_bucket_name
CLOUD_PROVIDER=<your_cloud_provider>

Last updated: [01/19/2026]