Language Datasets | Oxford Languages

50+ available languages

We offer flexible, curated datasets for more than 50 of the world’s major languages, which include definitions, translations, examples, idioms, phonetics and phonetic transcriptions, regional varieties, and inflected forms.

Uniquely, we also offer an ever-growing portfolio of high-quality datasets in low-resource languages, part of our ongoing commitment to ensuring that all language communities benefit from digital access and representation.


Children's Language Datasets

Our children's language datasets curate the right words, defined at the right level, for each age and stage of learning. They are created using our unique Oxford Children's Corpus, the world's largest children's language database.


How could our language datasets enhance your products?

Our deep experience informs how we support your projects, large or small, so that language and technology integrate seamlessly to enhance your products.

A partnership with Oxford ensures that the language content and data you need meet Oxford’s quality standards, and gives you a single point of contact for multiple languages.

Flexible and curated language datasets
