Language Models

Jos Denys edited this page Nov 10, 2020 · 7 revisions

What is a "knowledge base" or "language model"?

A knowledge base (KB), or language model, is the "smart" part of the iKnow engine. It takes care of the language-specific analysis of an input text: it splits the text into sentences, splits the sentences into Entities (Concepts, Relations, PathRelevant and NonRelevant), and captures the context of those entities through Attributes. A language model is a package of files for one language (e.g. English), also called a Knowledgebase. The iKnow engine uses one or more Knowledgebases (one per language) to perform the indexing task.

A complete set has the following files:

  • labels.csv : label collection for the language model.
  • lexreps.csv : labeled lexrep collection.
  • rules.csv : inference rules, disambiguation and attribute span detection.

These three files represent the core model data and are closely connected: lexreps and rules use labels, and rules can generate extra (literal) labels.

These are optional; the Japanese language model has no such data.

  • metadata.csv : global model settings that influence the processing. All parameters are set for optimal performance; change them at your own risk!

The linguistic models are based on grammatical distribution. 'Lexical representations' or 'lexreps' get labels according to their distribution. Grammatically ambiguous lexreps like 'works' (noun or verb) get ambiguous labels. A set of rules aims to determine the role that applies in a particular context.
An example:

  • works - ENVerbCon (verb or noun)
  • ENArtPosspron | ENVerbCon -> * | ENCon : ENVerbCon becomes ENCon after an article or possessive pronoun
  • ENPerspron_subj | ENVerbCon -> * | ENVerb : ENVerbCon becomes ENVerb after a personal pronoun that can act as a subject
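
The disambiguation idea in this example can be sketched in a few lines of Python. This is only an illustration of the principle, not the iKnow engine's implementation or its file format: the `Rule` type, the `LEXREPS` and `RULES` tables, the `Unknown` fallback label and all function names are hypothetical. The sketch also assumes one label per lexrep and that `*` in a rule means "leave this position unchanged".

```python
from typing import List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    """Hypothetical in-memory form of a rule such as
    'ENArtPosspron | ENVerbCon -> * | ENCon'."""
    pattern: List[str]            # labels to match on consecutive tokens
    output: List[Optional[str]]   # new label per position; None stands for '*'


# Hypothetical mini lexicon: lexrep -> label (assumption: one label each).
LEXREPS = {
    "the": "ENArtPosspron",
    "his": "ENArtPosspron",
    "he": "ENPerspron_subj",
    "works": "ENVerbCon",  # ambiguous: verb or noun
}

# The two rules from the example above.
RULES = [
    Rule(["ENArtPosspron", "ENVerbCon"], [None, "ENCon"]),
    Rule(["ENPerspron_subj", "ENVerbCon"], [None, "ENVerb"]),
]


def label_tokens(tokens: List[str]) -> List[str]:
    """Look up each token's label; 'Unknown' is a placeholder for this sketch."""
    return [LEXREPS.get(token.lower(), "Unknown") for token in tokens]


def apply_rules(labels: List[str], rules: List[Rule]) -> List[str]:
    """Apply each rule, in order, wherever its pattern matches."""
    result = list(labels)
    for rule in rules:
        width = len(rule.pattern)
        for start in range(len(result) - width + 1):
            if result[start:start + width] == rule.pattern:
                for offset, new_label in enumerate(rule.output):
                    if new_label is not None:  # None keeps the existing label
                        result[start + offset] = new_label
    return result


def disambiguate(text: str) -> List[Tuple[str, str]]:
    """Split on whitespace, label the tokens, then resolve ambiguous labels."""
    tokens = text.split()
    return list(zip(tokens, apply_rules(label_tokens(tokens), RULES)))


if __name__ == "__main__":
    print(disambiguate("his works"))
    print(disambiguate("he works"))
```

With these tables, "works" is resolved to ENCon after "his" and to ENVerb after "he", which mirrors the two rules in the example.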

To learn more about the structure of a language model, see Language model files. To learn more about the content of entities and attributes, see Language model guidelines.