ElevenLabs’ new speech-to-text model claims 97% accuracy

Tags: api applications audio content option video

Date posted: February 27, 2025

Feed: Dataconomy

ElevenLabs, an AI startup recognized for its audio-generation capabilities, has launched a stand-alone speech-to-text model named Scribe. The launch follows a substantial $180 million funding round, elevating the company’s valuation to $3.3 billion.

ElevenLabs launches Scribe: A new AI speech-to-text model

Scribe supports over 99 languages. ElevenLabs places more than 25 of them, including English, in its excellent accuracy category, meaning a word error rate below 5%; for English, the company claims 97% accuracy. Other languages in the excellent category include French, German, Hindi, Indonesian, Japanese, Kannada, Malayalam, Polish, Portuguese, Spanish, and Vietnamese. The remaining languages fall into lower tiers, ranging from high accuracy (5% to 10% word error rate) to moderate accuracy (25% to 50% word error rate).
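For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a generic illustration of that convention, not ElevenLabs' evaluation code; the function name and the sample sentences are made up for the example, and treating accuracy as 1 minus WER is an assumption about how the headline figures were derived.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only: one substituted word out of five.
wer = word_error_rate("the quick brown fox jumps", "the quick brown box jumps")
accuracy = 1.0 - wer  # assumed relationship between WER and "accuracy"
```

Under that assumption, a claimed 97% accuracy corresponds to roughly three word errors per 100 reference words. Because inserted words also count as errors, WER can exceed 100% on very poor transcripts.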

The new model reportedly outperforms Google’s Gemini 2.0 Flash and OpenAI’s Whisper Large v3 in multiple languages on the FLEURS and Common Voice benchmarks. Scribe is the first stand-alone speech-to-text model from ElevenLabs, which had previously built speech-to-text components into its conversational AI agent platform.

CEO Mati Staniszewski highlighted the goal of enhancing understanding of conversations: “We are working on ways to move away from only generating content and understanding and transcribing speech,” he said. The model features speaker diarization, word-level timestamps for accurate subtitles, and auto-tagging of non-verbal audio events.

Scribe currently handles only pre-recorded audio, with a real-time version expected soon. Pricing is $0.40 per hour of transcribed audio, with an introductory 50% discount available for the first six weeks.
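As a rough budgeting aid, the quoted pricing can be turned into a small estimator. This is a minimal sketch that assumes billing scales linearly with audio duration, with no minimum charge or rounding increments, none of which the article confirms; the function and constant names are illustrative.

```python
HOURLY_RATE_USD = 0.40  # per hour of transcribed audio, as reported
INTRO_DISCOUNT = 0.50   # introductory discount, as reported


def estimate_cost(audio_hours: float, introductory: bool = False) -> float:
    """Estimate the transcription cost in USD, assuming linear billing."""
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    cost = audio_hours * HOURLY_RATE_USD
    if introductory:
        cost *= 1 - INTRO_DISCOUNT
    return round(cost, 2)


standard = estimate_cost(250)                       # 250 hours at list price
discounted = estimate_cost(250, introductory=True)  # same, during the intro period
```

At these rates, 250 hours of recordings would come to about $100 at list price, or $50 during the introductory period.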

Benchmark tests indicate Scribe records the lowest word error rates across a range of languages, with accuracy reaching 98.7% in Italian and 96.7% in English. Key features include the ability to differentiate speakers in multi-speaker recordings, detailed timestamps, and the detection of non-speech events.

For enterprise users, Scribe serves as a scalable transcription tool, beneficial for sectors that rely on documentation, meeting transcriptions, and accessibility initiatives. The forthcoming real-time version could further enhance its utility in live communication scenarios.

The launch of Scribe coincided with the release of Hume AI’s Octave, a customizable, LLM-powered text-to-speech model tailored for content creation. ElevenLabs claims Scribe has consistently outperformed competitors in transcription accuracy.

Scribe can be accessed directly through the ElevenLabs website or API, allowing users to upload audio or video files for formatted transcripts. Its structured output aids integration into various applications, presenting a competitive option for businesses seeking high-accuracy transcription services.
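To illustrate how speaker labels and word-level timestamps could feed a downstream application, the sketch below groups words into SRT subtitle cues. The `Word` structure is hypothetical and does not reproduce the actual Scribe response schema, which the article does not describe; the grouping rule (new cue on speaker change or after a fixed number of words) is likewise only one possible choice.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical shape of one transcribed word; not the real Scribe schema.
    text: str
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    speaker: str


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 8) -> str:
    """Build SRT text, starting a new cue on speaker change or when a cue is full."""
    cues: list[list[Word]] = []
    current: list[Word] = []
    for word in words:
        if current and (word.speaker != current[0].speaker or len(current) >= max_words):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        timing = f"{srt_time(cue[0].start)} --> {srt_time(cue[-1].end)}"
        blocks.append(f"{index}\n{timing}\n[{cue[0].speaker}] {text}")
    return "\n\n".join(blocks) + ("\n" if blocks else "")


# Made-up sample data with two speakers.
sample = [
    Word("Hello", 0.0, 0.4, "A"),
    Word("there", 0.5, 0.9, "A"),
    Word("Hi", 1.2, 1.5, "B"),
]
srt_text = words_to_srt(sample)
```

A real integration would first map the API's response fields onto a structure like `Word`; non-speech event tags could be handled the same way by emitting them as bracketed cue text.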
