
AWS expands Amazon Transcribe’s language recognition capabilities with generative AI

Published November 28, 2023

At AWS re:Invent 2023, Amazon Web Services (AWS) announced that it has enhanced its transcription platform, Amazon Transcribe, by incorporating generative AI. The platform can now recognize 100 spoken languages, a significant step for AI-powered speech processing.

Amazon Transcribe is an automatic speech recognition (ASR) service provided by Amazon Web Services (AWS) that enables developers to easily add speech-to-text capabilities to their applications.

It uses deep learning to automatically generate time-stamped text transcripts from audio files.
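To make "time-stamped transcripts" concrete, here is a minimal Python sketch that pulls per-word timings out of a batch transcription result. The sample JSON is invented for illustration and trimmed to the fields used here; it assumes the commonly seen result layout with a `results.items` list, and the helper name `word_timestamps` is hypothetical.

```python
import json

# Invented sample, shaped like a batch transcription result (trimmed for illustration).
SAMPLE = """
{
  "results": {
    "transcripts": [{"transcript": "Hello world."}],
    "items": [
      {"type": "pronunciation", "start_time": "0.0", "end_time": "0.51",
       "alternatives": [{"confidence": "0.99", "content": "Hello"}]},
      {"type": "pronunciation", "start_time": "0.51", "end_time": "0.98",
       "alternatives": [{"confidence": "0.97", "content": "world"}]},
      {"type": "punctuation",
       "alternatives": [{"confidence": "0.0", "content": "."}]}
    ]
  }
}
"""


def word_timestamps(transcript_json):
    """Return (word, start_seconds, end_seconds) for each spoken word."""
    doc = json.loads(transcript_json)
    words = []
    for item in doc["results"]["items"]:
        # Punctuation items carry no timing information, so skip them.
        if item.get("type") != "pronunciation":
            continue
        best = item["alternatives"][0]  # first alternative is used as the best guess
        words.append((best["content"],
                      float(item["start_time"]),
                      float(item["end_time"])))
    return words


if __name__ == "__main__":
    for word, start, end in word_timestamps(SAMPLE):
        print(f"{start:6.2f}-{end:6.2f}  {word}")
```

Timings like these are what make captioning and searchable media archives possible, since each word can be mapped back to a position in the audio.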

It is designed to transcribe customer service calls, automate closed captioning and subtitling, and generate metadata for media assets to create a fully searchable archive. It can also be used in clinical documentation applications through Amazon Transcribe Medical, which adds medical speech-to-text capabilities.

The service is fully managed and continuously trained, providing state-of-the-art speech recognition models to improve business outcomes and extract actionable insights from customer conversations.

Amazon Transcribe is intended for use on audio samples that contain naturally occurring human speech and supports a large general-purpose vocabulary, with the ability to add custom vocabularies and custom language models for coverage of words and phrases from specialized domains.
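As a rough illustration of how a developer might request a transcript with these options, the following Python sketch builds the parameters for the `start_transcription_job` call in boto3 (the AWS SDK for Python). The helper names, job name, bucket URI, and vocabulary name are hypothetical. As a simplification, this sketch attaches a custom vocabulary or custom language model only when the language is fixed, and otherwise asks the service to identify the language automatically.

```python
def build_job_request(job_name, media_uri, language_code=None,
                      vocabulary_name=None, language_model_name=None):
    """Build keyword arguments for a batch transcription job (illustrative helper)."""
    request = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
    }
    if language_code is None:
        # Simplification for this sketch: custom resources require a fixed language.
        if vocabulary_name or language_model_name:
            raise ValueError("set language_code to use custom vocabularies or models")
        request["IdentifyLanguage"] = True  # let the service detect the language
        return request

    request["LanguageCode"] = language_code
    if vocabulary_name:
        request["Settings"] = {"VocabularyName": vocabulary_name}
    if language_model_name:
        request["ModelSettings"] = {"LanguageModelName": language_model_name}
    return request


def start_job(request, region_name="us-east-1"):
    """Submit the job; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the builder above works without the SDK

    client = boto3.client("transcribe", region_name=region_name)
    return client.start_transcription_job(**request)


if __name__ == "__main__":
    # Hypothetical names; replace with a real bucket and vocabulary before submitting.
    req = build_job_request(
        "support-call-001",
        "s3://example-bucket/calls/call-001.wav",
        language_code="en-US",
        vocabulary_name="product-terms",
    )
    print(req)
```

Keeping the request builder separate from the network call lets the parameters be inspected or unit-tested without touching an AWS account.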

Amazon Transcribe, a key tool for AWS customers seeking speech-to-text capabilities, supported 79 languages as of late 2022. With the latest update it covers 100 languages, which reflects AWS’s stated commitment to linguistic diversity and to a more inclusive and accurate transcription service.

To achieve this multilingual capability, Amazon Transcribe underwent training on “millions of hours of unlabeled audio data from over 100 languages.” AWS utilized self-supervised algorithms, allowing the platform to discern patterns in human speech across different languages and accents.

Notably, efforts were made to prevent the over-representation of certain languages in the training data, emphasizing accuracy across both widely spoken and lesser-used languages.

According to AWS, the improvement will also benefit its Call Analytics platform, which is often used by contact centers. Amazon Transcribe Call Analytics, now powered by generative AI models, streamlines interactions between agents and customers.