Case study | WellSaid Labs & Oxford Languages

WellSaid is an artificial intelligence voice platform that provides AI voice generation at scale in the form of text-to-speech (TTS) products. It offers a web app and Application Programming Interfaces (APIs) that enable users to add synthesized speech to their own products and services.

The platform is used by content creators across various industries who need voiceovers for advertising, corporate training, video production, and audiobooks.

The software uses a novel respelling system that enables users to control the pronunciation of specific words in their voiceovers. Users can also adjust the tone and performance of the AI voice by guiding its pace, loudness, and pausing.

Find out more about WellSaid, which is powered by Oxford Languages data.

To embed a respelling feature within its text-to-speech product (an industry first), WellSaid’s deep learning model required a trusted resource for phonetic pronunciation.

Its respelling mechanism had to be powered by accurate, up-to-date pronunciation datasets that would give users more control over phonetic preferences, including consonant and vowel sounds, and syllabic delivery. Likewise, it was important for the datasets to be updated regularly to include new words as they are added to the English language.

For the software to reflect the diversity of its customer base, WellSaid wanted to expand its respelling feature into the various accents and dialects offered on its platform. However, finding high-quality data that covered the level of variation the AI model required proved challenging.

"Working with the Oxford Languages team has been a huge asset for us. We didn’t just want a data export; we wanted a relationship - and that’s exactly what we have. Their dataset pronunciations are succinct, straight to the point, and have the right balance of contextual, historical, and alternative metadata for each entry."

— Rhyan Johnson, Machine Learning Product Manager, WellSaid

The Solution

After researching several potential providers, WellSaid chose Oxford Languages’ datasets to train its AI model using International Phonetic Alphabet (IPA) transcriptions. It was important that WellSaid was able to develop its own respelling system that could guide the TTS product towards the correct pronunciations of real or fictitious words.

WellSaid was aware of Oxford Languages’ reputation and is proud to power its system with such high-quality data. The Oxford English Pronunciations dataset has over 500,000 transcriptions, in both syllabified and non-syllabified forms, and provides accompanying audio for most of these words.

Oxford Languages took the time to understand WellSaid’s formatting preferences and delivers its datasets bi-annually as CSV files. Because each dataset is explicitly labelled as US or UK English, WellSaid can load the information into its machine learning processes quickly and easily.
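As a rough illustration of why variety labels make CSV data easy to load, the sketch below groups pronunciations by their US or UK label. It is not WellSaid's pipeline or the actual Oxford Languages file layout: the column names (headword, variety, ipa), the sample rows, and the function load_pronunciations are all assumptions made for this example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data. The real column names and layout of the
# Oxford Languages CSV files are not described in this case study.
SAMPLE_CSV = """headword,variety,ipa
tomato,UK,təˈmɑːtəʊ
tomato,US,təˈmeɪtoʊ
schedule,UK,ˈʃɛdjuːl
schedule,US,ˈskɛdʒuːl
"""


def load_pronunciations(csv_file):
    """Group IPA transcriptions by variety label, then by headword."""
    by_variety = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        # A headword may have several transcriptions, so keep a list.
        by_variety[row["variety"]].setdefault(row["headword"], []).append(row["ipa"])
    return by_variety


lexicon = load_pronunciations(io.StringIO(SAMPLE_CSV))
print(lexicon["US"]["tomato"])
```

With the label carried on every row, a single pass over the file is enough to separate the US and UK lexicons, with no need to infer the variety from the transcription itself.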

With these datasets, WellSaid now enables its users to adjust text to improve the outcome of text-to-speech pronunciations. They can also adjust parts of speech and specify heteronyms (words with the same spelling but different pronunciations). The company also has access to 15,000 initialisms (abbreviations formed from the initial letters of words), which help with unique pronunciations that do not follow conventional rules.
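To make the heteronym point concrete, here is a minimal sketch of choosing a pronunciation from a word plus its part of speech. The table HETERONYMS, the function pronounce, and the transcriptions themselves are illustrative assumptions, not entries from the Oxford Languages dataset or WellSaid's respelling system.

```python
# Hypothetical lookup table keyed by (word, part of speech).
# Transcriptions are illustrative UK-style IPA, not dataset entries.
HETERONYMS = {
    ("record", "noun"): "ˈrɛkɔːd",
    ("record", "verb"): "rɪˈkɔːd",
    ("present", "noun"): "ˈprɛznt",
    ("present", "verb"): "prɪˈzɛnt",
}


def pronounce(word, part_of_speech):
    """Return the transcription for a word in a given part of speech.

    Returns None when the pair is not in the table, so a caller can
    fall back to a default pronunciation.
    """
    return HETERONYMS.get((word.lower(), part_of_speech))


print(pronounce("record", "noun"))
print(pronounce("record", "verb"))
```

The same spelling yields two different transcriptions depending on the part of speech the user specifies, which is the kind of choice the paragraph above describes.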

Given that Oxford Languages’ datasets also cover domain-specific terminology, WellSaid hopes to improve the accuracy of the AI’s pronunciation in sectors like medicine. Likewise, users can set pronunciation cues as default, so they don’t need to continually adjust their preferences every time a technical word appears in a script.

Content moderation was also an important factor for WellSaid, as the company had to be sure that its AI model was learning from (and offering users) pronunciation recommendations that were suitable and relevant. This was equally important to Oxford Languages, which works to ensure that its content is contemporary, appropriate, and reflective of current language usage.

WellSaid & Oxford Languages

Giving content creators more control over their voiceover preferences