Monolingual datasets

Spanish

This release includes the addition of 114 new entries, 20 new phrases, and 20 new senses for existing entries.

Korean

The content update resulted in the addition of approximately 1270 new entries.

The Korean monolingual lexical dataset also includes data improvements. The list of labels and parts of speech used in the dataset has been revised, with some new additions. Other tags, such as region and register, have been improved so that they are more clearly differentiated.

Finally, the tagging of identical headwords has been improved by adding and amending homograph attributes.

Bilingual datasets

Arabic-English

The editorial team included over 200 new and updated entries across Arabic-English and English-Arabic.

A new domain label (Farming) and a new region label (Indian) have been added to the English-Arabic data.

Chinese-English

Content updates for the bilingual Chinese (Mandarin, simplified Chinese) data include around 100 new terms added to the English-Chinese side.

German-English

The German bilingual dataset includes over 250 new and updated entries across German-English and English-German. This year’s update focused on the evolving language of sex and gender.

Updates to the data were also made. For instance, stress and syllable markers have been removed from the German headwords but are still present within an attribute on the headword element.

French-English

The French bilingual update includes over 200 new or updated entries across English-French and French-English.

Structural updates were also made to this data. As in the German-English dataset, the stress and syllable markers have been removed from the French headwords but are still present within one of the attributes on the headword element.

Other structural data updates were also made to the bilingual dataset.

Italian-English

In this round of updates, approximately 100 new terms were added to the English-Italian side of the Italian bilingual data.

Korean-English

This release of the Korean bilingual dataset includes the addition of approximately 800 new entries.

Additionally, some data elements were reviewed and improved. For instance, sensitivity improvements have been made to remove references to curse words from entries where they are not relevant.

Portuguese-English

For this year’s content update to our Portuguese bilingual dataset, around 100 new terms have been added to the English-Portuguese side.

Instances of ‘black’ referring to ethnicity have been changed to ‘Black’ on both sides, in line with current US English usage, as this dataset primarily uses US English.

Russian-English

The content update made in the Russian bilingual dictionary data includes approximately 50 new and updated entries across Russian-English and English-Russian.

Updates to the data were also included: stress and syllable markers have been removed from the Russian headwords but are still present within one of the attributes on the headword element.

Spanish

The editorial team added and revised approximately 300 entries across English-Spanish and Spanish-English.

A sensitivity review has been carried out across both sides to ensure that sensitive terms are labelled and used appropriately. In addition to the content update, updates to the data were also made, including the expansion of the abbreviations ‘sb’ (‘somebody’), ‘sth’ (‘something’), and ‘algn’ (‘alguien’) across both sides of the bilingual dataset.
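
For users who keep derived resources built from earlier releases and want them to match the expanded forms, the following is a minimal sketch of an equivalent normalization. It is not the editorial team's actual process. The function name `expand_abbreviations` and the idea of applying it to plain-text strings are assumptions made for illustration; only the three abbreviation pairs come from this note.

```python
import re

# Abbreviation pairs named in this release note.
EXPANSIONS = {
    "sb": "somebody",
    "sth": "something",
    "algn": "alguien",
}

# Match each abbreviation only as a whole word, so that the letters "sb"
# inside a longer word such as "disbelief" are left untouched.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, EXPANSIONS)) + r")\b")


def expand_abbreviations(text: str) -> str:
    """Return text with 'sb', 'sth' and 'algn' replaced by their full forms.

    Hypothetical helper: assumes the input is plain text already extracted
    from the data, not raw markup.
    """
    return _PATTERN.sub(lambda match: EXPANSIONS[match.group(1)], text)
```

The match is case-sensitive and works on whole words only, so it would need adjusting if the older data uses capitalized or punctuated variants of these abbreviations.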

Thesaurus data

To complete this round of content updates for our Spanish datasets, our editorial team also revised and updated the Spanish thesaurus dataset.

This data underwent a comprehensive sensitivity review, with particular attention paid to inclusivity, sexism, LGBT+ phobia, and discrimination against minorities.

Sensitivity labelling has been improved to enhance clarity and avoid repetition. Usage notes in the relevant entries have been reviewed as well.

The team also revised entries in the data to reflect present use of the Spanish language, and appropriate register labels have been applied to the relevant headwords, senses, and synonyms.
 
The content update for the thesaurus data also includes an expansion of the headword list, with approximately 1000 entries added.

The thesaurus dataset also went through a structural update to improve data usage.