Lexical Datasets for NLP: Pronunciation data

Oxford Languages pronunciation data can improve the quality and performance of text-to-speech (TTS) models. It covers most words across English varieties, including American, British, Australian, Indian and World English, and provides curated pronunciation data: IPA transcriptions, audio recording files, and alternative respellings.

Our data is carefully compiled by our in-house team of language experts as an output of our language research programme, one of the largest in the world.

Syllabified and non-syllabified IPA (International Phonetic Alphabet) transcriptions for each wordform, supporting the most natural and accurate pronunciation in speech synthesis use cases
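A TTS front end that needs syllable counts or per-syllable units can split a syllabified transcription into its syllables. The sketch below assumes, following general IPA convention, that syllable boundaries are marked with a period (.) or fall immediately before a stress mark (ˈ primary, ˌ secondary); the dataset's actual delimiters may differ, and `split_syllables` is a hypothetical helper.

```python
import re

# Assumption: syllable boundaries are written as "." or implied just before
# a stress mark (ˈ or ˌ), per general IPA convention. Check the dataset docs.
_BOUNDARY = re.compile(r"\.|(?=[ˈˌ])")


def split_syllables(ipa: str) -> list[str]:
    """Split a syllabified IPA transcription into syllables (hypothetical helper)."""
    # re.split yields empty strings at leading or adjacent boundaries; drop them.
    return [syllable for syllable in _BOUNDARY.split(ipa) if syllable]


print(split_syllables("ɪn.ˈɡroʊs"))  # ['ɪn', 'ˈɡroʊs']
```

Splitting on a zero-width lookahead keeps each stress mark attached to the syllable it precedes, which matches how IPA places stress marks.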

Variant spellings of each word (separated by #), enabling coverage of pronunciation variations that are in use and accepted
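Because variant spellings share one field with # as the separator, a consumer would typically split that field and index each spelling, so that looking up any variant finds the same entry. A minimal sketch, assuming the field is a plain string such as `colour#color`. The names `parse_variants` and `build_variant_index` are hypothetical. Case-folding the keys is a design choice, not something the dataset specifies.

```python
def parse_variants(field: str) -> list[str]:
    """Split a '#'-separated spelling field into its variants."""
    # Strip stray whitespace and ignore empty segments (e.g. a trailing '#').
    return [part.strip() for part in field.split("#") if part.strip()]


def build_variant_index(fields: list[str]) -> dict[str, list[str]]:
    """Map every spelling (case-folded) to the full list of its variants."""
    index: dict[str, list[str]] = {}
    for field in fields:
        variants = parse_variants(field)
        for spelling in variants:
            index[spelling.casefold()] = variants
    return index


index = build_variant_index(["colour#color", "judgement#judgment"])
print(index["color"])  # ['colour', 'color']
```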

Variety of English (British, American, Australian, Indian or World English), allowing the user experience to be tailored per locale
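To tailor output per locale, an application can map the user's locale to one of these varieties and fall back to World English when the requested variety has no pronunciation. A minimal sketch, assuming each pronunciation record carries a `variety` label and an `ipa` string. The record shape, the `LOCALE_TO_VARIETY` table and the fallback order are all illustrative assumptions, not the dataset's schema.

```python
from typing import Optional

# Hypothetical mapping from BCP 47 locale tags to the varieties named above.
LOCALE_TO_VARIETY = {
    "en-GB": "British",
    "en-US": "American",
    "en-AU": "Australian",
    "en-IN": "Indian",
}
FALLBACK_VARIETY = "World English"


def pronunciation_for_locale(records: list[dict], locale: str) -> Optional[str]:
    """Return the IPA for the locale's variety, else World English, else None."""
    wanted = LOCALE_TO_VARIETY.get(locale, FALLBACK_VARIETY)
    for variety in (wanted, FALLBACK_VARIETY):
        for record in records:
            if record["variety"] == variety:
                return record["ipa"]
    return None


tomato = [
    {"variety": "British", "ipa": "təˈmɑːtəʊ"},
    {"variety": "American", "ipa": "təˈmeɪtoʊ"},
]
print(pronunciation_for_locale(tomato, "en-US"))  # təˈmeɪtoʊ
```

Unmapped locales go straight to the World English fallback. Returning `None` instead of an arbitrary variety leaves the choice of last resort to the caller.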

Pronunciation group identifier: a unique identifier for each pronunciation group. Pronunciations that share the same identifier are used interchangeably, e.g. engross /ɪnˈɡroʊs/ and /ɛnˈɡroʊs/
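Pronunciations that share a group identifier are interchangeable. Grouping rows by that identifier therefore gives, for any pronunciation, the set of acceptable alternatives. This helps, for example, when checking synthesized or recognized speech against more than one valid reference. A minimal sketch, assuming rows arrive as (group identifier, IPA) pairs; the identifier value `grp-001` and the helper names are hypothetical.

```python
from collections import defaultdict


def group_pronunciations(rows: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Collect the IPA strings that share each pronunciation group identifier."""
    groups = defaultdict(list)
    for group_id, ipa in rows:
        groups[group_id].append(ipa)
    return dict(groups)


def are_interchangeable(groups: dict[str, list[str]], a: str, b: str) -> bool:
    """True if both transcriptions appear in the same pronunciation group."""
    return any(a in members and b in members for members in groups.values())


rows = [("grp-001", "ɪnˈɡroʊs"), ("grp-001", "ɛnˈɡroʊs")]
groups = group_pronunciations(rows)
print(groups)  # {'grp-001': ['ɪnˈɡroʊs', 'ɛnˈɡroʊs']}
print(are_interchangeable(groups, "ɪnˈɡroʊs", "ɛnˈɡroʊs"))  # True
```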

Pronunciation data for Natural Language Processing
