Linked Lexical Knowledge Bases. Iryna Gurevych
Title: Linked Lexical Knowledge Bases
ISBN: 9781681731841
Author: Iryna Gurevych
Genre: Software
Series: Synthesis Lectures on Human Language Technologies
Publisher: Ingram
• Sense links: As with Wiktionary, most links point to related Wikipedia articles, which provide more background knowledge about particular concepts.
1.3 STANDARDS
Since LKBs play an important role in many NLP tasks and are expensive to build, the capability to exchange, reuse, and also merge them has become a major requirement. Standardization of LKBs plays an important role in this context, because it makes it possible to build uniform APIs and thus facilitates the exchange and reuse of LKBs, as well as their integration and merging. Moreover, applications can easily switch between different standardized LKBs.
1.3.1 ISO LEXICAL MARKUP FRAMEWORK
The ISO standard Lexical Markup Framework (LMF) [Calzolari et al., 2013, Francopoulo and George, 2013, ISO24613, 2008] was developed to address these issues. LMF is an abstract standard: it defines a meta-model of lexical resources, covering both NLP lexicons and machine-readable dictionaries. The standard specifies this meta-model in the Unified Modeling Language (UML) by providing a set of UML diagrams. UML packages are used to organize the meta-model, and each diagram given in the standard corresponds to a UML package. LMF defines a mandatory core package and a number of extension packages for different types of resources, e.g., morphological resources or wordnets. The core package models a lexicon in the traditional headword-based fashion, i.e., organized by lexical entries. Each lexical entry is defined as the pairing of one or more forms with zero or more senses.
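The headword-based organization of the core package can be illustrated with a minimal Python sketch. The class names Lexicon, LexicalEntry, Form, and Sense follow the LMF core package; the attributes (written_form, sense_id, definition, language) are assumptions made for this illustration only, since the standard does not prescribe attributes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Form:
    written_form: str  # hypothetical attribute, not prescribed by LMF


@dataclass
class Sense:
    sense_id: str         # hypothetical attribute
    definition: str = ""  # hypothetical attribute


@dataclass
class LexicalEntry:
    # One or more forms paired with zero or more senses.
    forms: List[Form]
    senses: List[Sense] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.forms:
            raise ValueError("a lexical entry needs at least one form")


@dataclass
class Lexicon:
    language: str  # hypothetical attribute
    entries: List[LexicalEntry] = field(default_factory=list)


# Illustrative data: one entry with two forms and one sense.
entry = LexicalEntry(
    forms=[Form("bank"), Form("banks")],
    senses=[Sense("bank_1", "financial institution")],
)
lexicon = Lexicon(language="en", entries=[entry])
```

An entry without senses is valid in this sketch, whereas an entry without forms is rejected, mirroring the cardinalities stated above.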
The abstract meta-model given by the LMF standard is not immediately usable as a format for encoding (i.e., converting) an existing LKB [Tokunaga et al., 2009]. It has to be instantiated first, i.e., a full-fledged lexicon model has to be developed by choosing LMF classes and by specifying suitable attributes for these LMF classes.
According to the standard, developing a lexicon model involves
1. selecting LMF extension packages (the usage of the core package is mandatory),
2. defining attributes for the classes in the core package and in the extension packages (as they are not prescribed by the standard), and
3. explicating the linguistic terminology, i.e., linking the attributes and other linguistic terms introduced (e.g., attribute values) to standardized descriptions of their meaning.
Selecting a combination of LMF classes and their relationships from the LMF core package and from the extension packages establishes the structure of a lexicon model. While the LMF core package models a lexicon in terms of lexical entries, the LMF extensions provide classes for different types of lexicon organization, e.g., covering the synset-based organization of wordnets or the semantic frame-based organization of FrameNet.
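The three steps of developing a lexicon model can be sketched as a small data structure with a consistency check. This is only an illustration under stated assumptions: LexiconModel, its fields, the package labels "core" and "semantics", and the placeholder registry reference are all hypothetical and are not defined by the LMF standard.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class LexiconModel:
    # Step 1: selected packages (labels here are illustrative only).
    packages: Set[str]
    # Step 2: attributes chosen for each selected LMF class.
    attributes: Dict[str, List[str]]
    # Step 3: links from attribute names to descriptions of their meaning.
    data_categories: Dict[str, str]

    def problems(self) -> List[str]:
        """Return a list of violations of the three steps."""
        issues = []
        if "core" not in self.packages:
            issues.append("core package is mandatory")
        for cls, attrs in sorted(self.attributes.items()):
            for attr in attrs:
                if attr not in self.data_categories:
                    issues.append(
                        f"{cls}.{attr} is not linked to a data category"
                    )
        return issues


model = LexiconModel(
    packages={"core", "semantics"},
    attributes={"LexicalEntry": ["partOfSpeech"], "Sense": ["definition"]},
    # Placeholder reference, not a real registry identifier.
    data_categories={"partOfSpeech": "registry-entry-for-partOfSpeech"},
)
print(model.problems())
```

In this example the check reports that the attribute definition of the class Sense has not yet been linked to a description of its meaning, i.e., the third step is incomplete.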
Fixing the structure of a lexicon model by choosing a set of classes contributes to the interoperability of LKBs, as it determines the high-level organization of lexical knowledge in a resource, e.g., whether synonymy is encoded by grouping senses into synsets (using the Synset class) or by specifying sense relations (using the SenseRelation class), which connect synonymous senses (i.e., synonyms). Defining attributes for the LMF classes and specifying the attribute values is far more challenging than choosing from a given set of classes, because the standard gives only a few examples of attributes and leaves the specification of attributes to the user in order to allow maximum flexibility.
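The difference between the two ways of encoding synonymy can be made concrete with a short Python sketch. Synset and SenseRelation are LMF class names; the attributes (senses, source, target, rel_type), the sense identifiers, and the treatment of a synonymy relation as symmetric are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Synset:
    senses: FrozenSet[str]  # identifiers of the grouped senses (hypothetical)


@dataclass(frozen=True)
class SenseRelation:
    source: str    # hypothetical attribute
    target: str    # hypothetical attribute
    rel_type: str  # hypothetical attribute


def synonyms_from_synsets(sense: str, synsets: Iterable[Synset]) -> Set[str]:
    """Synonyms are the other members of every synset containing the sense."""
    result: Set[str] = set()
    for synset in synsets:
        if sense in synset.senses:
            result |= synset.senses
    result.discard(sense)
    return result


def synonyms_from_relations(
    sense: str, relations: Iterable[SenseRelation]
) -> Set[str]:
    """Synonyms are the senses connected by a 'synonym' relation.

    The relation is read in both directions (an assumption of this sketch).
    """
    result: Set[str] = set()
    for rel in relations:
        if rel.rel_type != "synonym":
            continue
        if rel.source == sense:
            result.add(rel.target)
        elif rel.target == sense:
            result.add(rel.source)
    return result


# The same lexical knowledge, encoded in the two different structures.
synsets = [Synset(frozenset({"car_1", "automobile_1"}))]
relations = [SenseRelation("car_1", "automobile_1", "synonym")]
```

Both functions return the same synonyms for car_1, but an application has to know which of the two structures a given lexicon model uses. This is why fixing the set of classes matters for interoperability.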
Finally, the attributes and values have to be linked to a description of their meaning in an ISO-compliant Data Category Registry [ISO12620, 2009, Windhouwer and Wright, 2013]. For example, ISOcat was the first implementation of the ISO Data Category Registry standard [ISO12620, 2009]. The data model defined by the Data Category Registry specifies some mandatory information types for its entries, including a unique administrative identifier (e.g., partOfSpeech) and a unique and persistent identifier (PID, e.g., http://www.isocat.org/datcat/DC-396), which can be used in automatic processing and annotation in order to link to the entries. From a practical point of view, a Data Category Registry can be considered a repository of mostly linguistic terminology that provides human-readable descriptions of the meaning of terms used in language resources. For instance, the meaning of many terms used for linguistic annotation is given in ISOcat, such as grammaticalNumber, gender, and case. Accordingly, a Data Category Registry can be used as a glossary: users can look up the meaning of a term occurring in a language resource by consulting its entry in the Data Category Registry.
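The glossary-like use of a Data Category Registry can be sketched as a simple lookup table. The sketch below does not reproduce the ISOcat data model or any real API: the classes DataCategory and Registry are hypothetical, the PIDs are placeholders under example.org, and the descriptions are invented wording for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DataCategory:
    identifier: str   # unique administrative identifier
    pid: str          # unique and persistent identifier
    description: str  # human-readable description of the meaning


class Registry:
    """A toy registry that enforces uniqueness of identifiers and PIDs."""

    def __init__(self) -> None:
        self._by_identifier: Dict[str, DataCategory] = {}
        self._by_pid: Dict[str, DataCategory] = {}

    def register(self, category: DataCategory) -> None:
        if category.identifier in self._by_identifier:
            raise ValueError(f"duplicate identifier: {category.identifier}")
        if category.pid in self._by_pid:
            raise ValueError(f"duplicate PID: {category.pid}")
        self._by_identifier[category.identifier] = category
        self._by_pid[category.pid] = category

    def lookup(self, key: str) -> Optional[DataCategory]:
        """Find an entry by its identifier or by its PID."""
        return self._by_identifier.get(key) or self._by_pid.get(key)


registry = Registry()
# Placeholder PIDs and descriptions, not taken from ISOcat.
registry.register(DataCategory(
    "partOfSpeech",
    "http://example.org/datcat/DC-0001",
    "Category assigned to a word based on its grammatical properties.",
))
registry.register(DataCategory(
    "grammaticalNumber",
    "http://example.org/datcat/DC-0002",
    "Grammatical category expressing count distinctions.",
))

entry = registry.lookup("partOfSpeech")
print(entry.description if entry else "not found")
```

Because a language resource stores the PID rather than the description, the same entry can be resolved both by a human consulting the glossary and by software during automatic processing.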