How we create language content


All of Oxford Languages’ content aims to describe, rather than prescribe, the way languages are used by people around the world. We take an evidence-based approach to language content creation, looking at real examples of the ways words are used in context to provide an accurate picture of a language.


To gather this evidence, our corpora – massive collections of spoken and written language data – track and record the very latest language developments across an enormous variety of publications, covering everything from specialist journals to newspapers to social media posts.


We maintain large corpora in English, Arabic, and Indonesian, with corpora for many other languages in development. These enable the lexicographers and language technologists who create our dictionaries, datasets, and language resources to identify new and emerging words in context and to spot trends and patterns in usage, spelling, regional varieties, and more.


Our expert team of lexicographers sources all of our descriptive sentence examples from our vast language databases to provide accurate and meaningful descriptions of words in use. The team analyses the corpus data to select examples that illustrate a word in the correct grammatical and semantic context without distracting from the essential information the definition conveys.


We do our best to eliminate sentence examples that repeat factually incorrect, prejudiced, or offensive statements from the source. We are always grateful when readers inform us of cases that do not meet our rigorous quality standards – whether due to human error or changing cultural sensitivities – so that we can review and update our content.


All of our content is the result of continuous research and review as we seek to document and describe new language developments as they unfold, providing the world’s most trusted language content.