Lexical data samples

Sample our English lexical datasets




At Oxford Languages, we are the leading provider of lexical datasets. Our data can be used for a range of purposes, including:



Training data: Lexical data to help train your AI and NLP processes.


Dictionary display: Let users find the definition of a word without leaving your product, keeping the user experience contained.


Advanced feature support: For example, confirming whether your users have spelled or used a word correctly.


We offer a range of structured lexical datasets that support a wide variety of use cases, including software product development and enhancement.

See our popular English monolingual datasets below:

Accurate and trusted data

Our data is human-curated by our team of expert lexicographers. The Oxford name, and the standards that come with it, back the data that will bolster your products and brands.

Flexible data delivery

Our lexical datasets are available in different formats, such as JSON via API and XML.


Our Customer Success team is available to help you get the most value from our data.



Astrid use Oxford Languages data to validate their voice-based, AI-powered language learning platform, enabling their users to communicate confidently in English.


Read our case study on Astrid ⟶





Oxford Languages and Kobo collaborated to develop a tailor-made solution for their built-in dictionary feature, creating a seamless experience for users.


Read our case study on Kobo ⟶



Monolingual: Ideal for dictionary lookup and display.
Bilingual: Ideal for translation.
Bilingualized: Ideal for language learners.
Thesaurus: Ideal for synonym suggestions and NLP.
Pronunciations: Ideal for demonstrating the pronunciation of a word.
Sentences: Ideal for understanding how a word is used in context.

Our datasets feature: headwords, definitions, translations, pronunciations, parts of speech, senses, example sentences, synonyms, and etymologies.
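
As a rough illustration of how the features listed above might fit together in a product, here is a minimal sketch that loads one JSON entry into typed records. The field names, the nesting, and the sample values are all assumptions made for illustration; they are not the actual Oxford Languages schema, which may differ.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Sense:
    # One meaning of a headword (hypothetical structure).
    definition: str
    examples: list[str] = field(default_factory=list)
    synonyms: list[str] = field(default_factory=list)


@dataclass
class Entry:
    # One dictionary entry (hypothetical structure).
    headword: str
    part_of_speech: str
    pronunciations: list[str] = field(default_factory=list)
    senses: list[Sense] = field(default_factory=list)
    etymology: str = ""


def parse_entry(raw: str) -> Entry:
    """Parse a JSON string in the assumed layout into an Entry."""
    data = json.loads(raw)
    senses = [Sense(**s) for s in data.get("senses", [])]
    return Entry(
        headword=data["headword"],
        part_of_speech=data["part_of_speech"],
        pronunciations=data.get("pronunciations", []),
        senses=senses,
        etymology=data.get("etymology", ""),
    )


# Placeholder values only; this is not real dataset content.
SAMPLE = """
{
  "headword": "example",
  "part_of_speech": "noun",
  "pronunciations": ["<placeholder transcription>"],
  "senses": [
    {
      "definition": "<placeholder definition>",
      "examples": ["<placeholder example sentence>"],
      "synonyms": ["<placeholder synonym>"]
    }
  ],
  "etymology": "<placeholder etymology>"
}
"""

entry = parse_entry(SAMPLE)
print(entry.headword, len(entry.senses))
```

Typed records like these keep optional features (etymology, synonyms) from breaking a display when a given entry does not include them.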


Get in touch if you would like to sample our other datasets ⟶