Some Issues with Building a Multilingual Wordnet

Authors: 

Francis Bond, Luis Morgado da Costa, Michael Wayne Goodman, John McCrae, Ahti Lohk

Publication Type: 
Refereed Conference Meeting Proceeding
Abstract: 
In this paper we discuss the experience of bringing together over 40 different wordnets. We introduce some extensions to the GWA wordnet LMF format proposed in Vossen et al. (2016) and look at how this new information can be displayed. Notable extensions include: confidence, corpus frequency, orthographic variants, lexicalized and non-lexicalized synsets and lemmas, new parts of speech, and more. Many of these extensions already exist in multiple wordnets – the challenge was to find a compatible representation. To this end, we introduce a new version of the Open Multilingual Wordnet (Bond and Foster, 2013), that integrates a new set of tools that tests the extensions introduced by this new format, while also ensuring the integrity of the Collaborative Interlingual Index (CILI: Bond et al., 2016), avoiding the same new concept to be introduced through multiple projects.
Proceedings: 
Proceedings of the 12th Language Resource and Evaluation Conference (LREC 2020)
Digital Object Identifier (DOI): 
10.5281/zenodo.3842645
Publication Date: 
06/12/2020
Research Group: 
Institution: 
National University of Ireland, Galway (NUIG)
Open access repository: 
Yes