Lexical Datasets for NLP: Domain-specific language data

At Oxford Languages, we provide domain-specific data that supports language models used within specific industries. Our team of experienced linguists curates this data and organizes it using the taxonomies present in our dictionary data, so that enterprises receive the best possible support for their language models.

Our data is developed using trustworthy resources, including our flagship English dictionaries (Oxford Dictionary of English and New Oxford American Dictionary) and our pronunciation data program. We offer data for two domains: medical and finance. Both are continually updated with the latest evidence from the world's largest language research program, including the multi-billion-word Oxford English Corpus.

The data provides both high-level and granular taxonomies, which makes it well suited to optimizing language models for specific industries.

Domain-specific data