mpc edited this page Sep 23, 2019 · 1 revision

NGrams

Basic usage

NGrams can be queried in lexidb by setting the result type to ngram. A value of n must be specified, along with a column by which to group the results, given as groupby. The context option specifies the size of the area around each match from which the NGrams are built.

{
  "query": {
    "tokens": "{\"token\" : \"test\"}"
  }, 
  "result": {
    "groupby": "token", 
    "type": "ngram", 
    "n": "3", 
    "context": "2"
  }
}

returns:

"ngrams":[
  {"key":"to test the","value":29},
  {"key":"test the opinion","value":20},
  {"key":"the test is","value":12},
  {"key":"good character test","value":11}
]
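
The request body above can also be assembled programmatically. The following is a minimal Python sketch and not part of lexidb: the helper name build_ngram_request and its parameters are hypothetical. It assumes the "tokens" field is a string of JSON objects written back to back, one per token position, and that n and context are passed as strings, as in the examples on this page. How the body is sent to a lexidb server is left out, since this page does not describe it.

```python
import json


def build_ngram_request(token_filters, groupby="token", n=3, context=2):
    """Build an ngram request body shaped like the examples on this page.

    token_filters is a list of dicts, one per token position,
    e.g. [{"token": "test"}] or [{"pos": "JJ"}, {"pos": "NN"}].
    (Hypothetical helper, for illustration only.)
    """
    # Assumption: "tokens" is a string holding one JSON object per position.
    tokens = "".join(json.dumps(f) for f in token_filters)
    return {
        "query": {"tokens": tokens},
        "result": {
            "groupby": groupby,
            "type": "ngram",
            # The examples pass n and context as strings, so do the same.
            "n": str(n),
            "context": str(context),
        },
    }


body = build_ngram_request([{"token": "test"}])
print(json.dumps(body, indent=2))
```

Keeping the per-position filters as a list of dicts avoids hand-escaping quotes inside the "tokens" string.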

Options

Group by

groupby can be used to build a set of NGrams from another column, for example POS tags. The query above, when performed with "groupby": "pos", would return:

"ngrams":[
  {"key":"JJ NN1 NN1","value":52},
  {"key":"VVI AT NN1","value":37},
  {"key":"NN1 TO VVI","value":34},
  {"key":"AT JJ NN1","value":32}
]
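
To make the effect of groupby concrete, here is a small Python sketch that builds NGrams from a window around a match, reading the values from whichever column is requested. It is one plausible reading of the behaviour described on this page, not lexidb's implementation: the function window_ngrams, the toy sentence, and its tags are all invented for illustration.

```python
from collections import Counter


def window_ngrams(rows, match_start, match_len, n, context, column):
    """Return the n-grams found in the window around a match.

    The window is assumed to extend `context` tokens to either side of
    the matched span; values are read from the chosen column.
    """
    lo = max(0, match_start - context)
    hi = min(len(rows), match_start + match_len + context)
    values = [row[column] for row in rows[lo:hi]]
    return [" ".join(values[i:i + n]) for i in range(len(values) - n + 1)]


# Toy data (invented): one sentence with a token column and a POS column.
rows = [
    {"token": "want", "pos": "VV0"},
    {"token": "to", "pos": "TO"},
    {"token": "test", "pos": "VVI"},
    {"token": "the", "pos": "AT"},
    {"token": "opinion", "pos": "NN1"},
    {"token": "now", "pos": "RT"},
]

# The match is the single token "test" at index 2; n=3, context=2.
by_token = Counter(window_ngrams(rows, 2, 1, n=3, context=2, column="token"))
by_pos = Counter(window_ngrams(rows, 2, 1, n=3, context=2, column="pos"))
print(by_token)
print(by_pos)
```

The same window yields token trigrams or POS trigrams depending only on the column name, which is the role groupby plays in the query.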

Context

context can be used to widen the window from which the NGrams are built. It can also take a value of 0, which is useful, for example, for building bigrams when searching for a phrase or sequence:

{
  "query": {
    "tokens": "{\"pos\":\"JJ\"}{\"pos\":\"NN\"}"
  }, 
  "result": {
    "groupby": "token", 
    "type": "ngram", 
    "n": "2", 
    "context": "0"
  }
}

returns:

"ngrams":[
  {"key":"those young","value":41},
  {"key":"young people","value":41},
  {"key":"those vulnerable","value":3},
  {"key":"vulnerable people","value":3},
  {"key":"social media","value":2},
  {"key":"global media","value":2},
  {"key":"those displaced","value":2},
  {"key":"those global","value":2}
]
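
The result fragments on this page are lists of key/value pairs, so they are easy to post-process. The sketch below loads the result from the Basic usage example into a Counter. It assumes the "ngrams" array sits inside a larger JSON object, so the fragment is wrapped in braces here purely to make it parseable. The full shape of a lexidb response is not shown on this page.

```python
import json
from collections import Counter

# Result fragment copied from the Basic usage example.
fragment = (
    '"ngrams":['
    '{"key":"to test the","value":29},'
    '{"key":"test the opinion","value":20},'
    '{"key":"the test is","value":12},'
    '{"key":"good character test","value":11}'
    ']'
)

# Assumption: wrap the fragment in braces to form a valid JSON object.
response = json.loads("{" + fragment + "}")

# Map each NGram to its frequency.
counts = Counter({item["key"]: item["value"] for item in response["ngrams"]})

# The two most frequent NGrams, as (key, value) pairs.
top = counts.most_common(2)
print(top)
```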