Case Study: WellSaid

The objective

Giving content creators more control over their voiceover preferences with AI.

The company and the product

WellSaid is an artificial intelligence voice platform that provides AI voice generation at scale in the form of text-to-speech (TTS) products.

It offers a web app and Application Programming Interfaces (APIs) that enable users to create speech synthesis for their own products and services.

The platform is used by content creators across various industries who need voiceovers for advertising, corporate training, video production, and audiobooks.

The software uses a novel respelling system that enables users to control the pronunciation of specific words in their voiceovers. Users can also adjust the tone and performance of the AI voice by guiding its pace, loudness, and pausing.

Find out more about WellSaid, powered by Oxford Languages, here.

“Working with the Oxford Languages team has been a huge asset for us. We didn’t just want a data export; we wanted a relationship – and that’s exactly what we have. Their dataset pronunciations are succinct, straight to the point, and have the right balance of contextual, historical, and alternative metadata for each entry.”

Rhyan Johnson | Machine Learning Product Manager, WellSaid

The problem

Users of text-to-speech platforms often need a standardized pronunciation guide, such as the International Phonetic Alphabet (IPA). To enhance the user experience, WellSaid developed a more intuitive and approachable respelling system and embedded it directly into their deep learning TTS model.

To embed a respelling feature within its text-to-speech product (an industry first), WellSaid’s deep learning model required a trusted resource for phonetic pronunciation.

Its respelling mechanism had to be powered by accurate, up-to-date pronunciation datasets that would give users more control over phonetic preferences, including consonant and vowel sounds, and syllabic delivery. Likewise, it was important for the datasets to be updated regularly to include new words as they’re added to the English language.

For the software to reflect the diversity of its customer base, WellSaid wanted to expand its respelling feature into the various dialects offered on its platform. However, finding high-quality data that covered the level of variation the AI model required proved challenging.

The solution

After researching several potential providers, WellSaid chose our datasets to train its AI model using International Phonetic Alphabet (IPA) transcriptions.

It was important that WellSaid was able to develop its own respelling system that could guide the TTS product towards the correct pronunciations of real or fictitious words.

WellSaid was aware of our reputation and is proud to power its system with such high-quality data. The Oxford English Pronunciations dataset has over 500,000 transcriptions, including syllabified and non-syllabified transcriptions, and provides accompanying audio for most of these words.

We took the time to understand WellSaid’s formatting preferences and deliver its datasets bi-annually as CSV files. Because the datasets are specifically labelled as US or UK English, WellSaid can load the information into its machine learning processes quickly and easily.
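As a rough illustration of how dialect-labelled CSV data of this kind might be read before it reaches a machine learning process, the Python sketch below groups transcriptions by dialect. The column names (`word`, `dialect`, `ipa`), the sample rows, and the function name are assumptions made for illustration only; they do not describe the actual layout of the Oxford English Pronunciations dataset or WellSaid’s pipeline.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: the column names and rows are illustrative
# assumptions, not the real dataset format.
SAMPLE_CSV = """word,dialect,ipa
tomato,US,təˈmeɪtoʊ
tomato,UK,təˈmɑːtəʊ
"""


def group_by_dialect(csv_text):
    """Return {dialect: {word: [transcriptions]}} from CSV text."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A word may have several transcriptions (for example, heteronyms),
        # so each word maps to a list rather than a single string.
        grouped[row["dialect"]][row["word"]].append(row["ipa"])
    # Convert nested defaultdicts to plain dicts for predictable lookups.
    return {dialect: dict(words) for dialect, words in grouped.items()}


lexicons = group_by_dialect(SAMPLE_CSV)
print(sorted(lexicons))
```

Keeping one lexicon per dialect label means a US voice and a UK voice can each be pointed at the matching set of transcriptions without further filtering.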

With these datasets, WellSaid now enables its users to adjust text to improve the outcome of text-to-speech pronunciations. They can also adjust parts of speech and specify heteronyms (words with the same spelling but different pronunciations). The company also has access to 15,000 initialisms (abbreviations formed from the initial letters of words), which help with unique pronunciations that don’t follow conventional rules.

Given that our datasets also cover domain-specific terminology, WellSaid hopes to improve the accuracy of the AI’s pronunciation in sectors like medicine. Likewise, users can set pronunciation cues as default, so they don’t need to adjust their preferences every time a technical word appears in a script.

Content moderation was also an important factor for WellSaid, as the company had to be sure that its AI model was learning from (and offering users) pronunciation recommendations that were suitable and relevant. This was equally important to us, as we work to ensure that our content is contemporary, appropriate, and reflective of current language usage.

For more information about WellSaid, powered by Oxford Languages data, click here.

Additional information

A recent webinar we held with WellSaid explored how human-curated language data can support innovative text-to-speech advancements by helping to tackle the complexities of AI voice generation and give content creators more control over their voiceover preferences.

Watch the recording

Find WellSaid online

If you’d like to learn more about WellSaid, you can visit them here:

Website

Facebook

Images courtesy of WellSaid