
Voice ID — Automatic Speaker Identification

Teach Wave to recognize voices so speakers are automatically labeled by name in future recordings

Updated over a week ago

Introducing Voice ID: Wave Learns Who’s Talking

Wave learns to recognize the people you talk to — so your transcripts and summaries include real names instead of "Speaker 1" and "Speaker 2."

Save a speaker’s name once, and Wave identifies them automatically in future recordings. No manual labeling, and recognition improves as Wave hears more of that speaker.


Why Speaker Names Matter

A summary that says "Sarah suggested moving the deadline to Friday" is immediately useful. A summary that says "Speaker 2 suggested moving the deadline to Friday" requires you to figure out who Speaker 2 is.

Voice ID makes every transcript and summary more readable, more searchable, and more actionable — because they include the names of the people who actually said things.


How It Actually Works

When you name a speaker in a recording, Wave creates a voice embedding — a mathematical fingerprint of that person’s voice.

Here’s what that means in plain English:

  • Wave takes the audio segments where that person spoke and converts them into a list of numbers (a vector) that captures the distinctive qualities of their voice: pitch, cadence, tone, and vocal texture.

  • This fingerprint is not a recording. It can’t be played back. It can’t be reverse-engineered into audio. It’s purely mathematical.

  • Think of it like a voice signature. It’s only used to answer one question: does this speaker sound like someone you’ve saved?

When you make a new recording, Wave compares every speaker it hears against your saved Voice IDs. If there’s a confident match, the speaker is automatically labeled with their name — before you even open the transcript.

The matching is conservative on purpose. Wave would rather leave a speaker unlabeled than get it wrong.
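
For the technically curious, the comparison step can be sketched in a few lines of Python. This is only an illustration, not Wave’s actual code: the cosine-similarity measure, the 0.80 threshold, and the names cosine_similarity, match_speaker, and MATCH_THRESHOLD are all assumptions made for this example.

```python
import math
from typing import Optional

# Hypothetical cutoff. The article only says matching is "conservative";
# the real value (and the real similarity measure) is not documented.
MATCH_THRESHOLD = 0.80


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector cannot match anything
    return dot / (norm_a * norm_b)


def match_speaker(
    embedding: list[float],
    saved_voice_ids: dict[str, list[float]],
    threshold: float = MATCH_THRESHOLD,
) -> Optional[str]:
    """Return the closest saved name, or None when no match is confident."""
    best_name, best_score = None, -1.0
    for name, saved in saved_voice_ids.items():
        score = cosine_similarity(embedding, saved)
        if score > best_score:
            best_name, best_score = name, score
    # Conservative by design: below the threshold, leave the speaker unlabeled.
    return best_name if best_score >= threshold else None


saved = {"Sarah": [1.0, 0.0, 0.0], "Tom": [0.0, 1.0, 0.0]}
close_to_sarah = match_speaker([0.9, 0.1, 0.0], saved)
ambiguous = match_speaker([0.6, 0.6, 0.5], saved)
```

In this sketch the first voice is labeled "Sarah", while the second sits between both saved profiles and comes back as None, so it would stay unlabeled rather than risk a wrong name.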


It Gets Better Over Time

Voice IDs aren’t static — they improve automatically.

Every time Wave confidently recognizes someone, it quietly refines the profile by incorporating the new voice data. More recordings of a person means a stronger, more accurate fingerprint.

Wave also uses a sliding window approach, giving more weight to recent recordings. This means Voice IDs stay accurate even if the way someone sounds shifts over time, for example because of a cold, a new microphone, or a different room.
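
One way to picture a sliding window with recency weighting is the Python sketch below. It is an assumption-laden illustration, not a description of Wave’s internals: the VoiceProfile class, the window size of 5, and the decay factor of 0.8 are all hypothetical.

```python
from collections import deque

WINDOW_SIZE = 5  # hypothetical: how many recent samples are kept
DECAY = 0.8      # hypothetical: each older sample counts 0.8x the next newer one


class VoiceProfile:
    """Keeps the most recent embeddings and blends them, newest weighted most."""

    def __init__(self, window_size: int = WINDOW_SIZE, decay: float = DECAY):
        # deque(maxlen=...) silently drops the oldest sample once full.
        self.samples: deque[list[float]] = deque(maxlen=window_size)
        self.decay = decay

    def add_sample(self, embedding: list[float]) -> None:
        self.samples.append(list(embedding))

    def fingerprint(self) -> list[float]:
        """Weighted average of the samples currently in the window."""
        if not self.samples:
            raise ValueError("profile has no samples yet")
        n = len(self.samples)
        # The newest sample (last in the deque) gets weight 1, older ones decay.
        weights = [self.decay ** (n - 1 - i) for i in range(n)]
        total = sum(weights)
        dim = len(self.samples[0])
        return [
            sum(w * s[d] for w, s in zip(weights, self.samples)) / total
            for d in range(dim)
        ]


profile = VoiceProfile(window_size=2, decay=0.5)
for sample in ([0.0, 0.0], [1.0, 1.0], [3.0, 3.0]):
    profile.add_sample(sample)
blended = profile.fingerprint()
```

Here the oldest sample has already fallen out of the two-item window, and the newest one counts twice as much as the one before it, so the blended fingerprint leans toward how the speaker sounded most recently.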

You can track this progress directly. Each Voice ID shows a confidence level:

  • Building — Wave has limited data. Recognition may not be consistent yet.

  • Good — Enough data for reliable recognition in most situations.

  • Strong — Highly accurate recognition across different recording conditions.

The more you record, the stronger your Voice IDs get — with zero extra effort from you.


Setting Up Voice ID

Create a Voice ID (10 seconds)

  1. Open any recording with speaker labels in Wave

  2. Tap a speaker name (e.g., "Speaker 1")

  3. Enter the person’s real name — that’s it

Wave creates the voice fingerprint and starts recognizing them automatically in future recordings.

You can also tap Guess Speaker Names to let Wave’s AI suggest names based on conversation context.

Save Voice IDs

After you save speaker names, a modal appears asking "Save Voice IDs?"

  • Each speaker has a checkbox. Check the ones you want to save

  • You can mark one speaker as "This is my voice" to set your own host profile

  • Optionally toggle "Update summary with speaker names" to re-summarize with the correct names

  • Tap Save Voice IDs

Manage Your Voice IDs

Go to Settings → Voice ID to see all saved Voice IDs.

From here you can:

  • Toggle Automatic Voice Matching on or off

  • Play a sample of any Voice ID to confirm the voice

  • Rename a Voice ID by tapping the name

  • Delete a Voice ID by swiping left

Each Voice ID card shows the speaker name, confidence level, number of samples, and total audio duration.


Privacy & Security

Voice data is sensitive. We built Voice ID with that front and center.

  • Your Voice IDs, your account only. Voice IDs are scoped entirely to your account. No one else can access them — not other users, not Wave employees, not anyone.

  • One-way embeddings. Voice fingerprints are mathematical vectors, not audio. They cannot be played back or reverse-engineered into a recording of someone’s voice.

  • SOC 2-compliant infrastructure. Voice IDs are stored with the same SOC 2-grade security that protects all your Wave data, encrypted at rest and in transit.

  • Delete anytime. Remove a Voice ID and its embedding data is permanently deleted from our servers. No soft deletes, no hidden backups.

  • Disable with one toggle. Turn off Voice ID entirely in Settings. Wave stops creating and matching Voice IDs immediately.

  • No AI training. Voice embeddings are never used to train AI models. They exist only to identify speakers in your recordings.


Where It Works

Voice ID works everywhere you record with Wave.

Your Voice IDs sync across all platforms — name a speaker on your phone, and they’ll be recognized on the web and desktop apps too.

Voice ID works with every recording type: in-person recordings, phone calls, meeting bot recordings, desktop capture, and imported audio files.


Tips for Best Results

  • Clear audio matters — Voice IDs work best from recordings with minimal background noise

  • More enrollments = better accuracy — Strengthen Voice IDs across multiple sessions

  • At least 10 seconds per speaker — Short clips (under 10s) don’t have enough audio for reliable enrollment

  • Enroll yourself first — Mark "This is my voice" so Wave always knows which speaker is you

  • Works with recurring speakers — Best for people you record often: team members, clients, interviewees
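
If you work with your own diarized segments, the 10-second guideline can be checked with a small Python sketch like the one below. The segment format of (label, start, end) in seconds and the function names are assumptions for illustration; only the 10-second minimum comes from this article.

```python
MIN_ENROLL_SECONDS = 10.0  # the per-speaker minimum stated in this article

# Hypothetical segment shape: (speaker label, start second, end second)
Segment = tuple[str, float, float]


def speech_seconds_by_speaker(segments: list[Segment]) -> dict[str, float]:
    """Add up how long each speaker talked across all of their segments."""
    totals: dict[str, float] = {}
    for label, start, end in segments:
        # Ignore malformed segments whose end precedes their start.
        totals[label] = totals.get(label, 0.0) + max(0.0, end - start)
    return totals


def enrollable_speakers(
    segments: list[Segment], min_seconds: float = MIN_ENROLL_SECONDS
) -> list[str]:
    """Speakers with at least `min_seconds` of total speech, sorted by label."""
    totals = speech_seconds_by_speaker(segments)
    return sorted(label for label, secs in totals.items() if secs >= min_seconds)


segments = [
    ("Speaker 1", 0.0, 6.5),
    ("Speaker 2", 6.5, 10.0),
    ("Speaker 1", 10.0, 15.0),
]
ready = enrollable_speakers(segments)
```

In this example Speaker 1 has 11.5 seconds of speech and qualifies, while Speaker 2 has only 3.5 seconds and would need more audio before enrollment is likely to be reliable.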


Troubleshooting

  • Speaker not matched? They may not have enough audio in the recording (minimum 10 seconds). Try strengthening the Voice ID with more sessions.

  • Wrong match? Edit the speaker name and re-enroll. Strengthening with correct audio improves accuracy.

  • No enrollment option? Make sure you’ve renamed speakers first. Wave can’t create Voice IDs from generic labels like "Speaker 1."

  • "Building" confidence? This is normal for new Voice IDs. Enroll the speaker from 2 to 3 more recordings to reach "Good" or "Strong."
