
User Dictionary

Jos Denys edited this page Oct 16, 2020 · 37 revisions

The user dictionary is supported as of version 1.0.

With the user dictionary you can:

  • influence the sentence boundary detection by defining abbreviations and sentence-ending strings
   import iknowpy

   engine = iknowpy.iKnowEngine()
   user_dictionary = iknowpy.UserDictionary()
   user_dictionary.add_sent_end_condition("Fr.", False)   # do not treat 'Fr.' as a sentence terminator
   ret = engine.load_user_dictionary(user_dictionary)
   engine.index("some text Fr. and following.", "en")

   # Normally 'Fr.' would end the sentence. Because add_sent_end_condition() was called with False, the text stays one sentence.
  • force words or word sequences to take a specified role (Concept, Relation, PathRelevant, NonRelevant)
   user_dictionary.add_concept("one concept")   # mark as a concept
   user_dictionary.add_relation("one relation") # mark as a relation
   user_dictionary.add_non_relevant("crap")     # mark as non-relevant
  • define additional Negation markers
   user_dictionary.add_negation("w/o")  # mark w/o as a negation
  • define Sentiment markers
   user_dictionary.add_positive_sentiment("great")  # mark as a positive sentiment
   user_dictionary.add_negative_sentiment("awful")  # mark as a negative sentiment
  • define Time markers
   user_dictionary.add_time("future")  # mark as a time attribute
  • define units and numbers for Measurements
   user_dictionary.add_unit("Hg")               # mark as a unit
   user_dictionary.add_number("magic number")   # mark as a number
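
The calls listed above can also be driven from a table, which keeps a larger set of entries in one place. The following is a minimal sketch, not part of iknowpy: `ENTRIES`, `populate()` and `index_with_entries()` are hypothetical helper names, and only the `UserDictionary` and engine methods already shown on this page are used.

```python
# Each entry pairs a UserDictionary method name (all taken from the list above)
# with the arguments to pass to it.
ENTRIES = [
    ("add_sent_end_condition", ("Fr.", False)),
    ("add_concept", ("one concept",)),
    ("add_relation", ("one relation",)),
    ("add_negation", ("w/o",)),
    ("add_positive_sentiment", ("great",)),
    ("add_time", ("future",)),
    ("add_unit", ("Hg",)),
    ("add_number", ("magic number",)),
]


def populate(user_dictionary, entries=ENTRIES):
    """Call each named method on user_dictionary; return the number of entries applied."""
    for method_name, args in entries:
        getattr(user_dictionary, method_name)(*args)
    return len(entries)


def index_with_entries(text, language="en"):
    """Hypothetical helper: load the entries into a fresh engine and index `text`."""
    import iknowpy  # imported here so populate() can be used without the package

    engine = iknowpy.iKnowEngine()
    user_dictionary = iknowpy.UserDictionary()
    populate(user_dictionary)
    engine.load_user_dictionary(user_dictionary)
    engine.index(text, language)
    return engine
```

With iknowpy installed, `index_with_entries("some text Fr. and following.")` returns the engine after indexing with all entries loaded.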

User dictionary (UD) labels are assigned before lexrep lookup and override the labels in lexrep.csv. However, a UD label only takes effect if the language rules pick it up: if the language model does not support a specific label, that label is ignored. For an overview of current UD label support, see the following table.
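
The precedence described here can be pictured with a toy model. This is a conceptual sketch only, not the engine's actual code: `assign_label()` is a hypothetical function, and the label names in the usage note are illustrative placeholders rather than a statement of what any language model supports.

```python
def assign_label(token, ud_labels, lexrep_labels, supported_ud_labels):
    """Return (label, effective) for a token under the toy precedence model."""
    if token in ud_labels:
        # The user dictionary is consulted first and overrides lexrep.
        label = ud_labels[token]
        # The label only has an effect if the language rules pick it up.
        return label, label in supported_ud_labels
    # No UD entry: fall through to the lexrep label (None if the token is unknown).
    label = lexrep_labels.get(token)
    return label, label is not None
```

For example, with `ud_labels = {"w/o": "UDNegation"}` the token `"w/o"` takes the UD label even if lexrep lists it differently, but the label is reported as effective only when `"UDNegation"` is in `supported_ud_labels`.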

User Dictionary support per language.

For more information on sentiment analysis, see this IRIS article 👍

Sentiment markers in IRIS