Frequently asked questions | Oxford Languages

Do you offer audio data?

Yes. We offer audio files for every word in our English dictionaries, and each file is referenced in the corresponding entry of the English monolingual dictionary.

The English datasets (ODE* and NOAD**) include audio files for each word in the dictionary, in MP3 or WAV format. The audio delivery is approximately 2 GB. Our pronunciation assets combine this audio with pronunciation data in tabular format, which includes metadata such as the transcription of each word. If you are interested in audio data for specific languages, please contact us.

*ODE: Oxford Dictionary of English (British English)

**NOAD: New Oxford American Dictionary (US English)
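To illustrate how the tabular pronunciation data might be paired with the audio files, here is a minimal Python sketch. It assumes a tab-separated file with hypothetical columns word, ipa and audio_file; the real column names, delimiter and file-naming scheme may differ, and the sample row is illustrative only.

```python
import csv
import io

# Hypothetical sample: the column names, delimiter and audio file name
# are assumptions, not the documented delivery format.
SAMPLE_TSV = (
    "word\tipa\taudio_file\n"
    "follower\tˈfɒləʊə\tfollower_gb.mp3\n"
)


def index_pronunciations(tsv_text: str) -> dict[str, list[dict[str, str]]]:
    """Group pronunciation rows by word; a word may have several variants."""
    index: dict[str, list[dict[str, str]]] = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        index.setdefault(row["word"], []).append(row)
    return index


prons = index_pronunciations(SAMPLE_TSV)
print(prons["follower"][0]["audio_file"])  # follower_gb.mp3
```

Grouping rows into lists keeps room for words with more than one pronunciation variant.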

What do dictionary updates include?

Oxford Languages provides two different types of updates.

The more frequent type is content updates: additions of new words and phrases, revisions to existing entries, and changes to sensitivity classification, etymology and audio. An overview of these updates can be found in the Content Release Notes, which are included with the annual or bi-annual update deliveries.

The other type is data updates, which are less frequent. These are improvements to specific parts of our data structure, such as the removal, replacement or addition of elements or attributes, so the updated data may differ structurally from previous versions. An overview of these updates can be found in the Data Release Notes, which are included with the respective update deliveries.
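When a data update arrives, one way to see what it changes structurally is to compare the element and attribute names used in the old and new XML. The sketch below does this with Python's standard library; the two tiny documents and the attribute names in them (type, pos) are invented for illustration and do not reflect the actual schema.

```python
import xml.etree.ElementTree as ET


def schema_inventory(xml_text: str) -> set[str]:
    """Collect element names and element@attribute pairs used in a document."""
    names: set[str] = set()
    for elem in ET.fromstring(xml_text).iter():
        names.add(elem.tag)
        for attr in elem.attrib:
            names.add(f"{elem.tag}@{attr}")
    return names


def diff_releases(old_xml: str, new_xml: str) -> tuple[set[str], set[str]]:
    """Return (added, removed) element/attribute names between two releases."""
    old, new = schema_inventory(old_xml), schema_inventory(new_xml)
    return new - old, old - new


# Invented examples: the attribute names are placeholders, not the real schema.
old = '<entry><hw>move</hw><sg><se1 type="verb"/></sg></entry>'
new = '<entry><hw syllabified="move">move</hw><sg><se1 pos="verb"/></sg></entry>'
added, removed = diff_releases(old, new)
print(sorted(added))    # ['hw@syllabified', 'se1@pos']
print(sorted(removed))  # ['se1@type']
```

Running this over full old and new deliveries gives a quick checklist to compare against the Data Release Notes.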

What languages and types of dataset do you offer?

Oxford Languages offers data for over 60 languages. The range of datasets available varies from one language to another. Types of datasets include monolingual, bilingual and bilingualised dictionaries, audio data, thesauri, pronunciation datasets, morphology datasets, wordlists, and corpora.

If you are looking for data in specific languages, please ask about their availability via the API or as one-off file deliveries, or check the Oxford Dictionaries API website to see what is available on the API.

Does the dictionary data include synonyms?

No, but it does include sense-level links to the Oxford Thesaurus of English within the <linkGroup> element. These links connect each dictionary sense to the matching thesaurus sense, so a word's synonyms can be shown alongside the relevant part of its dictionary entry. This supports various dictionary display use cases.
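A minimal sketch of collecting the thesaurus links for each sense. Only the <se2> and <linkGroup> element names come from this FAQ; the <link> child element and the id and target attributes are hypothetical placeholders, so check the actual markup in your data before adapting it.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment: <link>, "id" and "target" are hypothetical names.
SAMPLE = """
<entry>
  <sg><se1>
    <se2 id="sense_1"><linkGroup><link target="ote_sense_A"/></linkGroup></se2>
    <se2 id="sense_2"/>
  </se1></sg>
</entry>
"""


def thesaurus_links(entry_xml: str) -> dict[str, list[str]]:
    """Map each sense ID to the thesaurus targets listed in its <linkGroup>."""
    links: dict[str, list[str]] = {}
    root = ET.fromstring(entry_xml.strip())
    for sense in root.iter("se2"):
        targets = [link.get("target") for link in sense.iterfind("linkGroup/link")]
        links[sense.get("id")] = [t for t in targets if t]
    return links


print(thesaurus_links(SAMPLE))  # {'sense_1': ['ote_sense_A'], 'sense_2': []}
```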

For further information about synonym integration, please ask about our thesaurus assets, which you are able to license alongside the dictionary assets.

Does the dictionary offer syllabification of words?

The US dictionary (NOAD) does. It stores this information in the "syllabified" attribute of the <hw> (headword) element, which gives syllabified forms of headwords throughout the dictionary, e.g. fol·low·er. Currently, syllabification is included only for headwords (lemmas).
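A short sketch of reading the syllabified attribute with Python's standard library. It assumes, based on the fol·low·er example above, that syllable breaks are marked with the middle dot (U+00B7); verify the separator against your delivery.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: syllable breaks are marked with the middle dot (U+00B7).
SYLLABLE_BREAK = "\u00b7"


def syllables(hw_xml: str) -> Optional[list[str]]:
    """Split a headword's "syllabified" attribute into syllables, if present."""
    hw = ET.fromstring(hw_xml)
    value = hw.get("syllabified")
    if value is None:
        return None  # this headword carries no syllabification
    return value.split(SYLLABLE_BREAK)


print(syllables('<hw syllabified="fol·low·er">follower</hw>'))  # ['fol', 'low', 'er']
```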

For further data on syllabification, please ask about our English Pronunciations Asset, which includes syllabified IPA transcriptions across all inflected forms of words. This asset also comes with other pronunciation features, such as sound files and pronunciation variations.

How are word inflections and derivatives treated in the dictionary?

Inflections are shown in <infg> elements (e.g. "write" has the irregular inflections "wrote" and "written"), whereas the complete list of inflections, whether regular or irregular, is found in <morphSet>. Inflected word forms do not have their own separate entries in the dictionary. For more complex inflected forms and metadata, please ask about our separate morphology assets.
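The difference between the two elements can be sketched as follows. <infg> and <morphSet> come from this FAQ, but the child elements <inf> and <morph>, and their position inside <entry>, are hypothetical placeholders for the real markup.

```python
import xml.etree.ElementTree as ET

# Illustrative entry: <inf> and <morph> are hypothetical child elements.
SAMPLE = """
<entry>
  <hw>write</hw>
  <infg><inf>wrote</inf><inf>written</inf></infg>
  <morphSet>
    <morph>writes</morph><morph>writing</morph>
    <morph>wrote</morph><morph>written</morph>
  </morphSet>
</entry>
"""


def child_forms(parent: ET.Element) -> list[str]:
    """Return the text of each child element, assuming one form per child."""
    return ["".join(child.itertext()).strip() for child in parent]


def inflections(entry_xml: str) -> tuple[list[str], list[str]]:
    """Return (forms shown in <infg>, complete list from <morphSet>)."""
    root = ET.fromstring(entry_xml.strip())
    shown = [f for g in root.iter("infg") for f in child_forms(g)]
    full = [f for m in root.iter("morphSet") for f in child_forms(m)]
    return shown, full


shown, full = inflections(SAMPLE)
print(shown)  # ['wrote', 'written']
print(full)   # ['writes', 'writing', 'wrote', 'written']
```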

Derivatives are treated differently. Some, such as "mover" or "unmoved", have their own separate entries rather than being included in the entry for the root word, "move". Others are included in <subEntry> elements within the root word's entry. Whether a derivative appears in the root entry or as a standalone entry depends on the word's frequency and on our editors' judgement.
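Because a derivative may be either a standalone entry or a sub-entry of its root word, a lookup should check both places. This sketch assumes that entries are wrapped in a hypothetical <entries> element and that each <subEntry> carries its own <hw>; the sample data is invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Invented sample: the <entries> wrapper and <hw> inside <subEntry> are assumptions.
SAMPLE = """
<entries>
  <entry><hw>move</hw><subEntry><hw>movable</hw></subEntry></entry>
  <entry><hw>mover</hw></entry>
</entries>
"""


def find_word(entries_xml: str, word: str) -> Optional[tuple[str, str]]:
    """Return ("entry", word) or ("subEntry", root headword), else None."""
    root = ET.fromstring(entries_xml.strip())
    entries = list(root.iter("entry"))
    # Prefer a standalone entry when one exists.
    for entry in entries:
        if entry.findtext("hw") == word:
            return ("entry", word)
    # Otherwise look for the word as a sub-entry of some root word.
    for entry in entries:
        for sub in entry.iter("subEntry"):
            if sub.findtext("hw") == word:
                return ("subEntry", entry.findtext("hw") or "")
    return None


print(find_word(SAMPLE, "mover"))    # ('entry', 'mover')
print(find_word(SAMPLE, "movable"))  # ('subEntry', 'move')
```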

How are word senses encoded in the data?

Within an entry, definitions are organized first by part of speech and then by sense. The <sg> (sense group) element contains one or more <se1> elements, each of which represents a part of speech, e.g. "noun" or "verb". Within each <se1>, <se2> elements represent the separate senses the word has in that part of speech. Senses are ordered by frequency, so the most frequent sense of the word appears first. Each sense is labelled with its own unique ID.
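A minimal sketch of walking this hierarchy while keeping senses in document order, which is also frequency order. The element names sg, se1 and se2 come from this FAQ; the ps attribute for the part of speech, the id attribute, the <df> definition element and the placeholder definitions are assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative entry: "ps", "id" and <df> are hypothetical; definitions are placeholders.
SAMPLE = """
<entry>
  <hw>move</hw>
  <sg>
    <se1 ps="verb">
      <se2 id="move_v1"><df>first verb sense</df></se2>
      <se2 id="move_v2"><df>second verb sense</df></se2>
    </se1>
    <se1 ps="noun">
      <se2 id="move_n1"><df>first noun sense</df></se2>
    </se1>
  </sg>
</entry>
"""


def senses_by_pos(entry_xml: str) -> dict[str, list[tuple[str, str]]]:
    """Map each part of speech to its (sense ID, definition) pairs, in order."""
    root = ET.fromstring(entry_xml.strip())
    result: dict[str, list[tuple[str, str]]] = {}
    for se1 in root.iterfind(".//sg/se1"):
        senses = result.setdefault(se1.get("ps", "unknown"), [])
        for se2 in se1.findall("se2"):
            senses.append((se2.get("id", ""), (se2.findtext("df") or "").strip()))
    return result


senses = senses_by_pos(SAMPLE)
print(senses["verb"][0])  # ('move_v1', 'first verb sense')
```

Because the lists preserve document order, the first item under each part of speech is the most frequent sense.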

What is a dictionary entry?

The Oxford Dictionary of English (ODE) and the New Oxford American Dictionary (NOAD) are English monolingual dictionaries that give the meanings of words. A dictionary entry consists of a headword (with homonym numbers distinguishing words that are spelled identically but have completely unrelated meanings and histories), line breaks or syllable breaks, pronunciations, parts of speech, labels for region, register and subject, senses, definitions, example sentences, phrases and idioms, derivatives, and origins (etymology). Some entries also include notes giving usage, technical, or encyclopedic information. Other features include semantic class, domain class, sensitivity classification, and identification of the top 1,000 most frequent words.


What is a thesaurus entry?

The Oxford Thesaurus of English (OTE) is a type of thesaurus sometimes called a "synonyms dictionary": it is arranged by headword and gives synonyms according to their relation to a specific sense of the word. A thesaurus entry consists of senses, each containing a number of synonym groups. Some synonym groups contain words that are extremely close to the headword in meaning, connotation, register, etc., while others contain more distant ones. The Oxford Thesaurus of English is organized so that the closest synonyms for a given sense, the ones that best match its meaning, come first.
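The entry structure described above maps naturally onto a small data model. This Python sketch is hypothetical (the class names and sample words are invented, not taken from the OTE) and shows how keeping synonym groups in their published order makes "closest synonyms first" a simple slice.

```python
from dataclasses import dataclass, field
from itertools import chain, islice


@dataclass
class SynonymGroup:
    words: list[str]


@dataclass
class ThesaurusSense:
    sense_id: str
    # Groups are stored in the order given in the entry: closest match first.
    groups: list[SynonymGroup] = field(default_factory=list)

    def closest(self, n: int) -> list[str]:
        """Return up to n synonyms, taking groups in order of closeness."""
        return list(islice(chain.from_iterable(g.words for g in self.groups), n))


# Invented sample data for illustration; not taken from the OTE.
big = ThesaurusSense("big_1", [
    SynonymGroup(["large", "sizeable"]),
    SynonymGroup(["huge", "enormous", "massive"]),
])
print(big.closest(3))  # ['large', 'sizeable', 'huge']
```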

What format(s) does the data come in?

Our dictionaries can be offered as a one-time delivery or on demand via the API. One-time deliveries are typically in XML format, though other formats can be considered on special request. Dictionary data delivered via the API is in JSON format. Other datasets, such as pronunciation and morphology datasets, vary in format and may be offered in XML, JSON or tabular format.

For more information about the API, including endpoints and their capabilities, please visit the Oxford Dictionaries API website.
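As a sketch of what an API request might look like, the following builds a request with Python's standard library without sending it. The base URL, the /entries/{language}/{word} path and the app_id and app_key header names are assumptions based on version 2 of the API; confirm them on the Oxford Dictionaries API website before use.

```python
import json
import urllib.parse
import urllib.request

# Assumptions to check against the API documentation: the base URL,
# the /entries/{language}/{word} path and the app_id/app_key header names.
BASE_URL = "https://od-api.oxforddictionaries.com/api/v2"


def build_entries_request(word: str, app_id: str, app_key: str,
                          language: str = "en-gb") -> urllib.request.Request:
    """Build (but do not send) a GET request for one word's entries."""
    word_id = urllib.parse.quote(word.lower())
    url = f"{BASE_URL}/entries/{language}/{word_id}"
    return urllib.request.Request(url, headers={"app_id": app_id, "app_key": app_key})


def fetch_json(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (requires valid credentials)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


req = build_entries_request("follower", "YOUR_APP_ID", "YOUR_APP_KEY")
print(req.full_url)  # https://od-api.oxforddictionaries.com/api/v2/entries/en-gb/follower
```

Calling fetch_json(req) with real credentials would send the request and return the decoded JSON response.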

What is the difference between the API and bulk data you offer?

The main difference between our API and bulk data is the format: the API outputs JSON, while our bulk datasets are typically offered in XML. The API data also has a simpler structure, giving users a quicker and more intuitive entry point to the data. The bulk data has a more complex structure, but it provides more depth and detail in its data features, such as more intricate lexical information and convenient links to other datasets.


If you would like to see more specific differences between API and bulk XML data, please contact your Customer and Partner Success Manager and ask for our feature-mapping documentation.

I am interested in one of your products, but I am not sure whether it will suit my needs. Do you offer samples?

Yes, we offer free samples of our datasets so that you can test them in your products. For bulk data, we typically offer data for one letter of the alphabet; for API access, we offer 500 free calls across all endpoints and languages. During this sample phase, our Customer Success team is available to help you understand and ingest the sample data.

Region labels: Australian English, Australian & New Zealand English, British English, Canadian English, East African English, Indian English, Irish English, New Zealand English, Nigerian English, North American English, Northern England, Northern Irish English, Philippine English, Scottish English, South African English, South Asian English, Southeast Asian English, US English, Welsh English, West African English, West Indian English.

Register labels: Archaic, Child language, Dated, Derogatory, Dialect, Euphemistic, Formal, Historical, Humorous, Informal, Ironic, Literary, Military slang, Nautical slang, Offensive, Rare, Rhyming slang, Technical, Trademark, Vulgar slang.