Language Models
A knowledge base (KB), or language model, is the component the iKnow engine uses for language-specific analysis. It is the "smart" part: it splits an input text into sentences and then into Entities (Concepts, Relations, PathRelevant and NonRelevant), and it captures the context of those entities through Attributes. A language model consists of a package of files for one language (e.g. English), also called a Knowledgebase. The iKnow engine uses one or more Knowledgebases (one per language) to perform the indexing task.
A complete set contains the following files:
- labels.csv : label collection for the language model.
- lexreps.csv : labeled lexrep collection.
- rules.csv : inference rules, disambiguation and attribute span detection.
These three files represent the core model data. They are closely connected: lexreps and rules use labels, and rules can generate extra (literal) labels.
- acro.csv : acronym collection.
- filter.csv : postprocess entity filter.
- prepro.csv : preprocess entity filter.
- regex.csv : lexrep regular expressions, referenced in lexreps.csv.
These files are optional: the Japanese language model has no such data.
- metadata.csv : global model settings that influence the processing. All parameters are set for optimal performance; change them at your own risk!
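The file roles listed above can be checked mechanically. The following is a minimal sketch, not part of the iKnow tooling: the function name `check_model_files` and the grouping of `metadata.csv` into its own category are assumptions made for illustration. It takes a list of file names and reports which core and optional files are missing.

```python
# Hypothetical helper, not part of iKnow: classify the files of a language model.
CORE_FILES = ("labels.csv", "lexreps.csv", "rules.csv")
OPTIONAL_FILES = ("acro.csv", "filter.csv", "prepro.csv", "regex.csv")
SETTINGS_FILE = "metadata.csv"  # assumption: treated as its own category here


def check_model_files(filenames):
    """Report which expected language model files are absent from `filenames`."""
    present = set(filenames)
    return {
        "missing_core": [f for f in CORE_FILES if f not in present],
        "missing_optional": [f for f in OPTIONAL_FILES if f not in present],
        "has_metadata": SETTINGS_FILE in present,
    }


if __name__ == "__main__":
    # Example: a model that ships only the core files and the settings file.
    report = check_model_files(
        ["labels.csv", "lexreps.csv", "rules.csv", "metadata.csv"]
    )
    print(report)
```

To check a directory on disk, the names could be collected with `[p.name for p in pathlib.Path(model_dir).iterdir()]`, where `model_dir` is whatever location holds the model files in your checkout.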
The linguistic models are based on grammatical distribution. 'Lexical representations' or 'lexreps' get labels according to their distribution. Grammatically ambiguous lexreps like 'works' (noun or verb) get ambiguous labels. A set of rules aims to determine the role that applies in a particular context.
An example:
works - ENVerbCon (verb or noun)
ENArtPosspron | ENVerbCon -> * | ENCon -> ENVerbCon becomes ENCon after an article or possessive pronoun
ENPerspron_subj | ENVerbCon -> * | ENVerb -> ENVerbCon becomes ENVerb after a personal pronoun that can act as a subject
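The effect of these two rules can be illustrated with a toy sketch. This is not the iKnow engine's algorithm or its rule syntax: the lexicon entries for "the", "his" and "it", the tuple encoding of the rules, and the function name `disambiguate` are all assumptions made for illustration. Only the label names and the two rewrites come from the example above.

```python
# Toy lexicon (assumed for illustration): lexrep -> label.
TOY_LEXREPS = {
    "the": "ENArtPosspron",
    "his": "ENArtPosspron",
    "it": "ENPerspron_subj",
    "works": "ENVerbCon",  # ambiguous: verb or noun
}

# Each rule: (label of preceding token, ambiguous label, resolved label).
RULES = [
    ("ENArtPosspron", "ENVerbCon", "ENCon"),
    ("ENPerspron_subj", "ENVerbCon", "ENVerb"),
]


def disambiguate(tokens):
    """Label tokens, then resolve ambiguous labels from the preceding label."""
    labels = [TOY_LEXREPS.get(t.lower(), "UNKNOWN") for t in tokens]
    for i in range(1, len(labels)):
        for left, ambiguous, resolved in RULES:
            if labels[i - 1] == left and labels[i] == ambiguous:
                labels[i] = resolved
                break  # first matching rule wins in this sketch
    return list(zip(tokens, labels))


if __name__ == "__main__":
    print(disambiguate(["his", "works"]))
    # [('his', 'ENArtPosspron'), ('works', 'ENCon')]
    print(disambiguate(["it", "works"]))
    # [('it', 'ENPerspron_subj'), ('works', 'ENVerb')]
```

A token with no applicable left context, such as "works" at the start of the input, keeps its ambiguous label in this sketch.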
To learn more about the structure of a language model, see Language model files. To learn more about the content of entities and attributes, see Language model guidelines.