Voice Cloning

Steno.ai uses voice cloning to create a digital replica of your voice, so your AI Twin can speak in your own voice during conversations.

How Voice Cloning Works

Voice cloning analyzes audio samples of your voice to create a digital model that can generate natural-sounding speech. The AI Twin uses this voice clone during real-time voice conversations with users.

What Makes a Good Voice Sample

For the best results, your audio sample should be:

High Quality: Clear audio with minimal background noise
Single Speaker: Only your voice, no other speakers
Natural Speech: Conversational tone, not scripted reading
Varied Content: Different phrases, questions, emotions
Consistent Audio: Same recording environment and equipment

Voice Clone Types

Standard Voice Clone

Audio Required: 2 minutes of clean audio

Quality: High-fidelity voice replication suitable for most use cases

Included In: All plans (Starter, Growth, Scale, Enterprise)

Best For:

Most conversational AI applications
Text-to-speech with your voice
General customer engagement
Initial launches and testing

Characteristics:

Accurate voice replication
Natural conversational tone
Standard expressiveness
Fast generation time

Professional Voice Clone

Audio Required: 1 hour of clean audio

Quality: Greater expressiveness and emotional range than the Standard Voice Clone

Included In: Scale and Enterprise plans

Best For:

Premium customer experiences
Emotionally nuanced conversations
Coaching and personal development applications
Longer conversation sessions

Characteristics:

Enhanced emotional range
More natural prosody and intonation
Better handling of complex sentences
Subtle voice variations for emphasis
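
If your recordings are WAV files, a short script can check the audio requirements above before you send them. The following is a minimal sketch and not a Steno.ai tool: the function names are made up for illustration, it relies only on Python's standard wave module, and it measures raw recording length, so it cannot tell whether the audio is clean.

```python
import wave

# Minimum audio lengths stated above, in seconds.
REQUIRED_SECONDS = {
    "Standard": 2 * 60,
    "Professional": 60 * 60,
}


def total_duration_seconds(wav_paths):
    """Sum the playing time of several PCM WAV files (hypothetical helper)."""
    total = 0.0
    for path in wav_paths:
        with wave.open(path, "rb") as wav:
            # Duration of one file = number of frames / frames per second.
            total += wav.getnframes() / wav.getframerate()
    return total


def clone_types_with_enough_audio(total_seconds):
    """List the clone types whose audio-length requirement is met."""
    return [
        name
        for name, needed in REQUIRED_SECONDS.items()
        if total_seconds >= needed
    ]
```

For example, clone_types_with_enough_audio(total_duration_seconds(["take1.wav", "take2.wav"])) reports which clone types the two example files would cover. Plan eligibility is a separate question and is not checked here.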

Voice Sample Preparation

If You Have Existing Audio

Great sources for voice samples include:

Podcast Episodes: High-quality conversational audio
Video Recordings: Extract audio from YouTube videos or courses
Webinar Recordings: Clear speaking with varied content
Professional Recordings: Studio-quality audio from books or courses

If You Don’t Have Audio

No problem! Our team will help you create a suitable voice sample:

For Standard Clone (2 minutes):

We’ll provide a script to read
Record on your phone or computer
Use a quiet room with minimal echo
Follow our recording guidelines

For Professional Clone (1 hour):

We’ll provide varied scripts and prompts
You may need multiple recording sessions
We place more emphasis on recording quality
We may recommend a professional recording setup

Recording Best Practices

Equipment

Minimum:

Smartphone with a good microphone
Quiet room
Pop filter (or improvise with a sock over the mic)

Recommended:

USB condenser microphone (Blue Yeti, Audio-Technica AT2020)
Closed room with soft furnishings
Microphone stand
Pop filter

Professional:

Studio-quality condenser microphone
Treated recording space
Audio interface
Professional editing software

Recording Environment

Quiet Location: No background noise, traffic, or HVAC sounds
Minimal Echo: Avoid empty rooms; use soft furnishings to absorb sound
Consistent Acoustics: Record all audio in the same location
Eliminate Interruptions: Turn off notifications, close windows, etc.

Recording Technique

Consistent Distance: Stay 6-12 inches (about 15-30 cm) from the microphone
Natural Speaking: Use your normal conversational voice
Steady Pace: Not too fast or too slow
Clear Articulation: Enunciate words clearly without over-pronouncing
Varied Emotion: Include questions, statements, emphasis, warmth

Voice Clone Delivery

Timeline

Standard Clone: Ready within your 10-day demo delivery timeline
Professional Clone: May add 2-3 days to demo delivery

Testing Your Voice Clone

When you receive your demo AI Twin, test the voice clone for:

Accuracy: Does it sound like you?
Naturalness: Does it sound conversational and not robotic?
Clarity: Are words clear and easy to understand?
Emotion: Does it convey appropriate warmth and tone?

If the voice clone doesn’t meet your expectations, we’ll iterate based on your feedback.

Voice Clone Updates

When to Update Your Voice Clone

Consider updating your voice clone if:

Your voice has changed significantly
You want to adjust the tone or energy level
The original recording quality was poor
You’re upgrading from Standard to Professional clone

How to Request an Update

Contact support@steno.ai with:

Description of what you’d like to change
New audio samples (if applicable)
Your preferred timeline for the update

Updates may incur additional fees depending on the scope of changes required.

Multilingual Capabilities

Voice Clone Language Flexibility

Even if your voice clone is created from English audio, your AI Twin can speak in over 23 supported languages.

How It Works:

Voice characteristics transfer across languages
AI maintains your vocal tone and style
Accent may vary by language
Quality is consistent across languages

Example: Your English voice clone can speak fluent Mandarin while maintaining your vocal characteristics.

Voice Clone Ownership

Your Rights

You own the recordings you provide for voice cloning. The voice clone itself is created as part of our service to you.

Our Commitments

We will never use your voice clone for other customers
Your voice data is stored securely and separately from other customers
Upon contract termination, we will delete your voice clone per our data deletion policy

Technical Specifications

Supported Audio Formats: MP3, WAV, M4A, FLAC

Sample Rate: Minimum 16 kHz (44.1 kHz or 48 kHz recommended)

Bit Depth: Minimum 16-bit (24-bit recommended)

File Size: No strict limit, but larger files take longer to process
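
To check a recording against these numbers yourself, the sketch below reads the header of an uncompressed PCM WAV file with Python's standard wave module. It is an illustration under stated assumptions, not part of the Steno.ai service: check_wav is a hypothetical name, MP3, M4A, and FLAC files need other tools, and treating "recommended" as 44.1 kHz or higher at 24-bit or higher is our reading of the specifications above.

```python
import wave

# Thresholds taken from the specifications above.
MIN_SAMPLE_RATE_HZ = 16_000
MIN_BIT_DEPTH = 16
# Assumption: anything at or above 44.1 kHz and 24-bit counts as "recommended".
RECOMMENDED_SAMPLE_RATE_HZ = 44_100
RECOMMENDED_BIT_DEPTH = 24


def check_wav(path):
    """Report sample rate and bit depth for a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        bit_depth = wav.getsampwidth() * 8  # bytes per sample -> bits per sample

    return {
        "sample_rate_hz": sample_rate,
        "bit_depth": bit_depth,
        "meets_minimum": (
            sample_rate >= MIN_SAMPLE_RATE_HZ and bit_depth >= MIN_BIT_DEPTH
        ),
        "meets_recommended": (
            sample_rate >= RECOMMENDED_SAMPLE_RATE_HZ
            and bit_depth >= RECOMMENDED_BIT_DEPTH
        ),
    }
```

Calling check_wav("my_sample.wav") on an example file returns a small report; if meets_minimum is False, re-export or re-record the audio at a higher sample rate or bit depth before uploading.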

Questions about voice cloning? Contact support@steno.ai.