Case Study: WellSaid Labs and Oxford Languages

Giving content creators more control over their voiceover preferences

WellSaid Labs is an artificial intelligence voice platform that provides AI voice generation at scale in the form of text-to-speech (TTS) products. It offers a web app and Application Programming Interfaces (APIs) that enable users to create speech synthesis for their own products and services.

 

The platform is used by content creators across various industries who need voiceovers for advertising, corporate training, video production, and audiobooks.

 

The software uses a novel respelling system that enables users to control the pronunciation of specific words in their voiceovers. Users can also adjust the tone and performance of the AI voice by guiding its pace, loudness, and pausing.

 

Find out more about WellSaid Labs, powered by Oxford Languages data, at wellsaidlabs.com.

 

The Problem


 

Users of text-to-speech platforms often need a standardized pronunciation guide, such as the International Phonetic Alphabet (IPA), to control how words are spoken. To enhance the user experience, WellSaid Labs developed a more intuitive and approachable respelling system and embedded it directly in its deep learning TTS model.

 

To embed a respelling feature within its text-to-speech product (an industry first), WellSaid Labs’ deep learning model required a trusted resource for phonetic pronunciation.

 

Its respelling mechanism had to be powered by accurate, up-to-date pronunciation datasets that would give users more control over phonetic preferences, including consonant and vowel sounds, and syllabic delivery. Likewise, it was important for the datasets to be updated regularly to include new words as they are added to the English language.

 

For the software to reflect the diversity of its customer base, WellSaid Labs wanted to expand its respelling feature to the various accents and dialects offered on its platform. However, finding high-quality data that covered the level of variation the AI model required proved challenging.

 

"Working with the Oxford Languages team has been a huge asset for us. We didn’t just want a data export; we wanted a relationship - and that’s exactly what we have. Their dataset pronunciations are succinct, straight to the point, and have the right balance of contextual, historical, and alternative metadata for each entry."

 

— Rhyan Johnson, Machine Learning Product Manager, WellSaid Labs

 

The Solution


 

After researching several potential providers, WellSaid Labs chose Oxford Languages’ datasets to train its AI model using International Phonetic Alphabet (IPA) transcriptions. It was important that WellSaid Labs was able to develop its own respelling system that could guide the TTS product towards the correct pronunciations of real or fictitious words.

 

WellSaid Labs was aware of Oxford Languages’ reputation and is proud to power its system with such high-quality data. The Oxford English Pronunciations dataset has over 500,000 transcriptions, in both syllabified and non-syllabified forms, and provides accompanying audio for most of these words.

 

Oxford Languages took the time to understand WellSaid Labs’ formatting preferences and delivers its datasets bi-annually as CSV files. The way in which the datasets are specifically labelled as US or UK English fulfills WellSaid Labs’ need to load the information into its machine learning processes quickly and easily.
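As a rough illustration of why dialect-labelled CSV files are easy to load, the sketch below groups transcriptions by a dialect label. The column names (word, dialect, ipa) and the sample rows are assumptions made for this example; the actual layout of the Oxford Languages files is not described here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data. The column names and rows are assumptions for
# illustration only, not the actual Oxford Languages file layout.
SAMPLE_CSV = """word,dialect,ipa
tomato,US,təˈmeɪtoʊ
tomato,UK,təˈmɑːtəʊ
"""


def load_pronunciations(csv_text):
    """Group IPA transcriptions by dialect label, then by word."""
    by_dialect = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A word may have several transcriptions, so store them in a list.
        by_dialect[row["dialect"]].setdefault(row["word"], []).append(row["ipa"])
    return by_dialect


lexicon = load_pronunciations(SAMPLE_CSV)
print(lexicon["UK"]["tomato"])
```

Because every row carries its own dialect label, a loader like this needs no per-file configuration to keep US and UK entries apart.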

 

With these datasets, WellSaid Labs now enables its users to adjust text to improve the outcome of text-to-speech pronunciations. They can also adjust parts of speech and specify heteronyms (words with the same spelling but different pronunciations). The company also has access to 15,000 initialisms (abbreviations formed from the initial letters of words), which help with unique pronunciations that do not follow conventional rules.
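To make the heteronym idea concrete, here is a minimal sketch of a lookup keyed by spelling and part of speech. The table, the function name pronounce, and the transcriptions chosen are hypothetical; this is not WellSaid Labs' implementation or respelling syntax.

```python
from typing import Optional

# Hypothetical heteronym table keyed by (spelling, part of speech).
# The entries are illustrative and not taken from the Oxford Languages data.
HETERONYMS = {
    ("record", "noun"): "ˈrɛkərd",
    ("record", "verb"): "rɪˈkɔːrd",
}


def pronounce(word: str, pos: Optional[str] = None) -> Optional[str]:
    """Return a transcription for a word, using the part of speech when given."""
    key = (word.lower(), pos)
    if key in HETERONYMS:
        return HETERONYMS[key]
    # Without a matching part of speech, answer only if the spelling maps to
    # exactly one transcription; otherwise the word is ambiguous.
    candidates = {ipa for (w, _), ipa in HETERONYMS.items() if w == word.lower()}
    return candidates.pop() if len(candidates) == 1 else None


print(pronounce("record", "noun"))
print(pronounce("record", "verb"))
```

Returning None for an ambiguous spelling reflects the reason a part-of-speech control is useful: the spelling alone does not determine the pronunciation.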

 

Given that Oxford Languages’ datasets also cover domain-specific terminology, WellSaid Labs hopes to improve the accuracy of the AI’s pronunciation in sectors like medicine. Likewise, users can set pronunciation cues as default, so they don’t need to continually adjust their preferences every time a technical word appears in a script.

 

Content moderation was also an important factor for WellSaid Labs, as the company had to be sure that its AI model was learning from (and offering users) pronunciation recommendations that were suitable and relevant. This was equally important to Oxford Languages, which works to ensure that its content is contemporary, appropriate, and reflective of current language usage.

 

Find WellSaid Labs online


 

wellsaidlabs.com

 

Images courtesy of WellSaid Labs.

 

 

Contact us


 

Oxford Languages’ pronunciations data for text-to-speech

 

To find out more about Oxford Languages from Oxford University Press, and how our data can power your products, get in touch through our Contact page.