July 2024 Updates

Oxford Languages July 2024 updates

We have more than 60 languages available to license. Our dedicated editorial team continuously updates content for our monolingual and bilingual dictionaries, along with our thesaurus.
 

Following the extensive content update for our English datasets in June, we are excited to announce the next update that our editors have been working on, this time focused on other languages.

 

New entries, phrases and senses were added to monolingual dictionaries such as:

  • Spanish
  • Korean

 

Many bilingual dictionaries were also updated:

  • Arabic-English
  • Chinese-English
  • German-English
  • French-English
  • Italian-English
  • Korean-English
  • Portuguese-English
  • Russian-English
  • Spanish-English

 
Our July release also focused on data for our Spanish thesaurus.

For the Spanish monolingual dictionary, this release adds 114 new entries, 20 new phrases, and 20 new senses for existing entries.
 
New entries include:

  • Andropausia (andropause)
  • Bífobo (biphobic)
  • Desestigmatizar (destigmatize)
  • Edadismo (ageism)
  • Ecoansiedad (eco-anxiety)
  • Hiperinmune (hyperimmune)
  • Reduflación (shrinkflation)
  • Sobrepesca (overfishing)

 

The content update for the Korean monolingual dictionary added approximately 1270 new entries. New entries include:

  • 홈캉스
  • 찬물베개
  • 팬데믹
  • 초연결사회

 

The Korean monolingual lexical data has also been improved. The lists of labels and parts of speech used in the dataset have changed, with new parts of speech and new sensitivity labels added.

 
Two new sensitivity labels have been added so that the new content in the dataset is labelled appropriately.
 
The new sensitivity labels are:

  • 비어 (vulgar slang; derogatory)
  • 못마땅함 (derogatory)

 
New part-of-speech labels have also been added to the data:

  • 어말어미
  • 명사∙관형사
  • 의존명사
  • 명사 또는 관형사
  • 명사 또는 부사
  • 자동사

 
In addition, other tags such as region and register have been improved to provide clearer differentiation.

 
Finally, the tagging of identical headwords has been improved by adding and amending homograph attributes.

The editorial team added or updated over 200 entries across Arabic-English and English-Arabic.
 
The new entries include:

  • مُتَحَسِّس (sensitive, sensor)
  • اسْتِحْلاب (milking, extraction)
  • شَعْبَويّة (populism)
  • طَوائِفيّة (sectarianism)
  • ماحِق (devastating)
  • Abbasid (عَبّاسيّ)
  • Close combat (قِتال مُباشِر/قِتال مُتَلاحِم)
  • Deep learning (التَعَلُّم العَميق/التَعَلُّم المُتَعَمِّق)

 
A new domain label (Farming) and a new region label (Indian) have been added to the English-Arabic data.

The content update for the bilingual Chinese (Mandarin, simplified Chinese) data adds around 100 new terms to the English-Chinese side, including:

  • Air fryer (空气炸锅)
  • Deepfake (深度伪造)
  • Passkey (通行密钥)
  • WOC (非白人女性)

The German bilingual update includes over 250 new and updated entries across German-English and English-German. This year’s update focused on the evolving language of sex and gender.
 
New words include:

  • Gender reassignment surgery (geschlechtsangleichende Operation)
  • Protected characteristic (geschütztes Merkmal)
  • Ex-husband (Ex-Mann)
  • Hinterherrufen (call after, catcall)
  • FLINTA* (people who are not cisgender men)
  • Männlich gelesen (masculine presenting)
  • Daten (date)

 
Updates to the data were also made. For instance, stress and syllable markers have been removed from the German headwords but are still present in an attribute on the headword element.
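
For anyone consuming the data, the practical effect is that the headword text can be used directly for display or lookup, while the marked form is read from the attribute when it is needed. The following is a minimal sketch in Python under stated assumptions: the element name `hw`, the attribute name `marked`, the marker characters, and the sample value are all hypothetical and chosen for illustration only; they are not taken from the dataset schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Hypothetical entry fragment. The element name "hw", the attribute name
# "marked", and the marker characters are assumptions for illustration,
# not the actual dataset schema.
SAMPLE = '<entry><hw marked="hin·ter·ˈher·ru·fen">hinterherrufen</hw></entry>'


def read_headword(entry_xml: str) -> Tuple[str, Optional[str]]:
    """Return (plain headword text, marked form from the attribute or None)."""
    hw = ET.fromstring(entry_xml).find("hw")
    if hw is None:
        raise ValueError("entry has no headword element")
    plain = (hw.text or "").strip()
    # The marked form lives only in the attribute; it may be absent.
    return plain, hw.get("marked")


plain, marked = read_headword(SAMPLE)
print(plain)
print(marked)
```

Keeping the plain text and the marked form separate means that search and matching code does not need to strip markers before comparing headwords.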

The French bilingual update includes over 200 new or updated entries across English-French and French-English and covers additions such as:

  • Air fryer (friteuse sans huile)
  • Dead name (morinom)
  • Deepfake (hypertrucage)
  • Menstrual cup (coupe menstruelle)
  • Misgender (mégenrer)
  • Prompt engineering (ingénierie des invites)
  • Vision impairment (déficience visuelle)

 
Structural updates were also made to this data. As in the German-English dataset, the stress and syllable markers have been removed from the French headwords but are still present in one of the attributes on the headword element.
 
Other structural data updates were also made to the bilingual dataset.

In this round of updates, approximately 100 new terms were added to the English-Italian side of the Italian bilingual data. Some examples are:

  • Air fryer (friggitrice ad aria)
  • Birth sex (sesso alla nascita)
  • Monkeypox (vaiolo delle scimmie)
  • Ultra-processed (ultralavorato)

The release for the Korean bilingual dataset adds approximately 800 new entries. New entries include:

  • COVID-19
  • cottagecore
  • digital nomad
  • non-binary
  • self-isolate
  • top surgery
  • WFH
  • 샌드위치 데이 (bridge day)
  • 임금 절벽 (wage stagnation)
  • 랜선 여행 (armchair travel)
  • 라방 (livestreaming)
  • 번아웃 (burnout)

 
Additionally, some data elements were reviewed and improved. For instance, sensitivity improvements have been made to remove references to curse words from entries where they are not relevant.

For this year’s content update to our Portuguese bilingual dataset, around 100 new terms have been added to the English-Portuguese side, including:

  • Birth name (nome de nascimento)
  • Climate crisis (crise climática)
  • Multi-factor authentication (autenticação multifator)
  • Ultra-processed (ultraprocessado)

 
Instances of ‘black’ referring to ethnicity have been changed to ‘Black’ on both sides, in line with current US English usage, as this dataset primarily uses US English.

The content update to the Russian bilingual dictionary data includes approximately 50 new and updated entries across Russian-English and English-Russian. New words include:

  • Asexuality (асексуа́льность)
  • Binder (би́ндер, ба́йндер, утя́жка для груди́)
  • Bisexuality (бисексуа́льность)
  • родово́й кана́л (birth canal)
  • би (bi)

 
Updates to the data were also made: stress and syllable markers have been removed from the Russian headwords but are still present in one of the attributes on the headword element.

The editorial team added and revised approximately 300 entries across English-Spanish and Spanish-English. New additions include:

  • Air fryer (freidora sin aceite)
  • Dead name (necrónimo)
  • Hearing loss (pérdida de audición)
  • Menstrual cup (copa menstrual)
  • Prompt engineering (ingeniería de instrucciones)
  • Protected characteristic (característica protegida)
  • Screen sharing (pantalla compartida)
  • Ultra-processed (ultraprocesado)
  • Voice note (nota de voz)

 
A sensitivity review has been carried out across both sides to ensure that sensitive terms are labelled and used appropriately. Besides the content update, updates to the data were also made, including the expansion of abbreviations for ‘somebody’ (‘sb’), ‘something’ (‘sth’), and ‘alguien’ (‘algn’) across both sides of the bilingual dataset.
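
To illustrate what this kind of abbreviation expansion looks like in practice, here is a minimal sketch in Python. It is not the editorial team's process; it assumes the abbreviations appear as standalone tokens in plain text, and the function name `expand_abbreviations` is hypothetical. A similar approach could be used to normalise strings taken from an earlier copy of the data before comparing them with the updated dataset.

```python
import re

# Abbreviation-to-expansion pairs named in the release notes.
EXPANSIONS = {"sb": "somebody", "sth": "something", "algn": "alguien"}

# Match an abbreviation only as a whole word, so that letter sequences
# inside longer words (for example the "sb" in "disband") are left alone.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, EXPANSIONS)) + r")\b")


def expand_abbreviations(text: str) -> str:
    """Replace each standalone abbreviation with its full form."""
    return _PATTERN.sub(lambda m: EXPANSIONS[m.group(1)], text)


print(expand_abbreviations("to give sb sth"))
print(expand_abbreviations("darle algo a algn"))
```

The match is case-sensitive and purely textual, so it would need adjusting if the abbreviations carry markup or appear in other forms.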

To complete this round of content updates for our Spanish datasets, our editorial team also revised and updated the Spanish thesaurus dataset.
 
A comprehensive sensitivity review of this data was carried out, with particular attention paid to inclusivity, sexism, LGBT+ phobia, and discrimination against minorities.
 
Sensitivity labelling has been improved to enhance clarity and avoid repetition. Usage notes in the relevant entries have been reviewed as well.
 

The team also revised entries in the data to reflect current usage of the Spanish language, and appropriate register labels have been applied to the relevant headwords, senses, and synonyms.
 
The content update for the thesaurus data also expands the headword list, with approximately 1000 entries added. New entries include:

  • Cisgénero (cisgender)
  • Computadorizar (to computerize)
  • Encriptar (to encrypt)
  • Escúter (scooter)
  • Nasobuco (face mask)
  • Videoarbitraje (VAR)
  • Videollamada (video call)

 
The thesaurus dataset also went through a structural update to improve data usage.