OED Labs

OED Text Visualizer

Here at the OED, we are exploring new ways for researchers to harness the power of the OED dataset.

 

Feedback from the academic community has shown us that while OED.com is a highly valuable resource for analyzing texts, using it can also be time-consuming. This is particularly true for researchers working with historical texts, which often add an extra level of complexity.

 

In response to this feedback, we are developing a new application called the OED Text Annotator.

 

The OED Text Annotator is a powerful data engine that tokenizes, lemmatizes, and disambiguates each word within a digitized text and then annotates each sense with OED data.
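
As a rough illustration of the four steps named above (tokenize, lemmatize, disambiguate, annotate), here is a minimal Python sketch. It is not the OED Text Annotator's implementation: the lookup tables, sense identifiers, and function names are hypothetical stand-ins, and the disambiguation step simply picks the first listed sense where a real system would use context.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy lexicon mapping inflected forms to lemmas (not OED data).
TOY_LEMMAS = {"cows": "cow", "grazed": "graze"}

# Hypothetical sense inventory: lemma -> list of (sense_id, part_of_speech).
TOY_SENSES = {
    "cow": [("cow-n-1", "n."), ("cow-v-1", "v.")],
    "graze": [("graze-v-1", "v.")],
}


@dataclass
class Annotation:
    token: str               # the word as it appears in the text
    lemma: str               # its dictionary headword form
    sense_id: Optional[str]  # chosen sense, or None if the lemma is unknown


def tokenize(text: str) -> List[str]:
    """Split text into alphabetic word tokens, keeping internal apostrophes."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)


def lemmatize(token: str) -> str:
    """Map a token to its lemma, falling back to the lowercased token."""
    lower = token.lower()
    return TOY_LEMMAS.get(lower, lower)


def disambiguate(lemma: str) -> Optional[str]:
    """Placeholder strategy: take the first listed sense, ignoring context."""
    senses = TOY_SENSES.get(lemma)
    return senses[0][0] if senses else None


def annotate(text: str) -> List[Annotation]:
    """Run the full toy pipeline over a text."""
    annotations = []
    for token in tokenize(text):
        lemma = lemmatize(token)
        annotations.append(Annotation(token, lemma, disambiguate(lemma)))
    return annotations


for item in annotate("The cows grazed."):
    print(item)
```

In this sketch, each annotation keeps the original token alongside its lemma and sense, so the source text can still be reconstructed from the annotated output.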

 

Our aspiration for this application is to enable researchers to more easily conduct in-depth or computational textual analysis on any digitized text using the full richness of the OED dataset.

 

The OED-annotated texts generated by this application have the potential for use in numerous formats and avenues of research. To showcase one such use, we have developed a tool called the OED Text Visualizer to distil the data and inspire new thinking.

 

The OED Text Visualizer takes the annotated text output of the OED Text Annotator and displays etymologies and first usages, two core components of the OED’s data, in a visual format. It is intended to demonstrate how annotation paired with simple visualization can open up new areas of questioning and means of discovery.

 

You are invited to test the OED Text Visualizer. Click on the button to access the tool and user information.

Frequently Asked Questions

 


What has improved in version 2 of the beta?
 

  • Larger visualization interface
  • Ability to dynamically ‘zoom in’ on lower-frequency words, i.e. the ‘juicier stuff’
  • Option to toggle between immediate and ulterior etymon information in the visualization
  • The visualization can now show up to two etymon languages where required
  • Option to toggle whether each word is shown in the text in line with its dot in the visualization
  • Expanded ‘other’ category for etymons, showing further languages
  • Improved performance on post-1750 texts
  • Improved frequency data
  • Improved handling of multi-word phrases
  • Improved handling of proper names

 

What are the abbreviations for part of speech?

 

Please refer to this key for an explanation of the part of speech abbreviations.

 

What’s the difference between immediate and ulterior etymon?

 

‘Language of immediate origin’ is the language from which the word has been either directly inherited or borrowed or within which it has been formed, and the ‘Language of ulterior origin’ is the language from which the word is more remotely derived.
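
To make the distinction concrete, the following Python sketch models a word's two origins as a small record and counts words by language under either view, much as a toggle between immediate and ulterior etymon might. The `Etymology` class, the `count_by_origin` function, and the language labels are illustrative assumptions only; they are not taken from OED data or from the Visualizer's code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Etymology:
    immediate: str  # language the word was directly inherited or borrowed from, or formed within
    ulterior: str   # language from which the word is more remotely derived


# Illustrative labels only; consult the OED entries for the actual etymologies.
TOY_ETYMOLOGIES = {
    "beef": Etymology(immediate="French", ulterior="Latin"),
    "veal": Etymology(immediate="French", ulterior="Latin"),
    "cow": Etymology(immediate="Germanic", ulterior="Germanic"),
}


def count_by_origin(lemmas: Iterable[str], mode: str = "immediate") -> Counter:
    """Count lemmas per language of origin; unknown lemmas fall under 'other'."""
    if mode not in ("immediate", "ulterior"):
        raise ValueError("mode must be 'immediate' or 'ulterior'")
    counts = Counter()
    for lemma in lemmas:
        etymology = TOY_ETYMOLOGIES.get(lemma)
        label = getattr(etymology, mode) if etymology else "other"
        counts[label] += 1
    return counts


words = ["beef", "veal", "cow", "xyz"]
print(count_by_origin(words, mode="immediate"))
print(count_by_origin(words, mode="ulterior"))
```

Switching `mode` changes only which field is read, so the same annotated text can be grouped by either view without re-annotating it.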