NGrams
mpc edited this page Sep 23, 2019 (1 revision)
NGrams can be queried in lexidb using the ngram query type ("type": "ngram" in the result object). A value of n must be specified, as well as a column by which to group the results, using groupby. The context parameter sets the size of the window around each match from which the NGrams are built.
{
"query": {
"tokens": "{\"token\" : \"test\"}"
},
"result": {
"groupby": "token",
"type": "ngram",
"n": "3",
"context": "2"
}
}
"ngrams":[
{"key":"to test the","value":29},
{"key":"test the opinion","value":20},
{"key":"the test is","value":12},
{"key":"good character test","value":11}
]
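As a sketch, a request like the one above can be built programmatically. The "tokens" value is itself a JSON-encoded string (hence the escaped quotes), and a sequence pattern is written as concatenated objects, as in the bigram example on this page. The helper name ngram_request and its defaults are hypothetical; sending the body to a lexidb server is not shown, since the endpoint is not covered here.

```python
import json

def ngram_request(token_patterns, groupby="token", n=3, context=2):
    """Build an ngram request body shaped like the example above.

    Hypothetical helper: token_patterns is a list of per-token dicts,
    e.g. [{"token": "test"}] or [{"pos": "JJ"}, {"pos": "NN"}].
    """
    return {
        "query": {
            # "tokens" is a JSON-encoded string; a sequence is written as
            # concatenated objects, matching the examples on this page.
            "tokens": "".join(json.dumps(p) for p in token_patterns),
        },
        "result": {
            "groupby": groupby,
            "type": "ngram",
            # The examples pass n and context as strings, so do the same here.
            "n": str(n),
            "context": str(context),
        },
    }

body = json.dumps(ngram_request([{"token": "test"}]))
print(body)
```

Serialising with json.dumps avoids escaping the inner quotes by hand. The spacing differs slightly from the raw example ('{"token": "test"}'), which should not matter to a JSON parser.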
groupby can be used to build the NGrams from another column, for example POS tags. The query above, performed with "groupby": "pos", would return:
"ngrams":[
{"key":"JJ NN1 NN1","value":52},
{"key":"VVI AT NN1","value":37},
{"key":"NN1 TO VVI","value":34},
{"key":"AT JJ NN1","value":32}
]
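To illustrate what groupby changes, here is a minimal local model of the counting. It assumes single-token matches and a window of context tokens on either side of each match. It is not lexidb's implementation, and the function count_ngrams and the toy corpus are hypothetical.

```python
from collections import Counter

def count_ngrams(rows, match_positions, n, context, groupby):
    """Count n-grams in a window of `context` tokens either side of each match.

    rows: one dict per corpus token, e.g. {"token": "test", "pos": "VVI"}.
    match_positions: indices into `rows` of single-token matches.
    groupby: the column whose values are joined to form each n-gram key.
    """
    counts = Counter()
    for p in match_positions:
        lo = max(0, p - context)              # first index in the window
        hi = min(len(rows) - 1, p + context)  # last index in the window
        # Every n-gram lying entirely inside [lo, hi].
        for start in range(lo, hi - n + 2):
            key = " ".join(rows[i][groupby] for i in range(start, start + n))
            counts[key] += 1
    return counts

# Hypothetical toy corpus: "to test the opinion", with CLAWS-style POS tags.
rows = [{"token": t, "pos": p}
        for t, p in zip("to test the opinion".split(), "TO VVI AT NN1".split())]

print(count_ngrams(rows, [1], n=3, context=2, groupby="token"))
print(count_ngrams(rows, [1], n=3, context=2, groupby="pos"))
```

Both calls count the same token positions. Only the key changes, for example "to test the" versus "TO VVI AT".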
context can be used to widen the window from which the NGrams are built. It can also be set to 0, which is useful, for example, when building bigrams from the matches of a phrase or sequence:
{
"query": {
"tokens": "{\"pos\":\"JJ\"}{\"pos\":\"NN\"}"
},
"result": {
"groupby": "token",
"type": "ngram",
"n": "2",
"context": "0"
}
}
returns:
"ngrams":[
{"key":"those young","value":41},
{"key":"young people","value":41},
{"key":"those vulnerable","value":3},
{"key":"vulnerable people","value":3},
{"key":"social media","value":2},
{"key":"global media","value":2},
{"key":"those displaced","value":2},
{"key":"those global","value":2}
]
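Each entry in the "ngrams" array pairs a key of n space-separated values from the groupby column with its count. The following minimal sketch turns a response fragment like the one above into tuples. It wraps the bare "ngrams" pair in braces to form a complete JSON object. It also assumes individual values contain no spaces, so each key can be split on spaces. parse_ngrams is a hypothetical helper.

```python
import json

# The "ngrams" fragment from the bigram example above.
fragment = '''"ngrams":[
{"key":"those young","value":41},
{"key":"young people","value":41},
{"key":"those vulnerable","value":3},
{"key":"vulnerable people","value":3},
{"key":"social media","value":2},
{"key":"global media","value":2},
{"key":"those displaced","value":2},
{"key":"those global","value":2}
]'''

def parse_ngrams(fragment):
    """Return (ngram_tuple, count) pairs, preserving the response order."""
    # The fragment is a bare key/value pair; wrap it in braces to make a JSON object.
    data = json.loads("{" + fragment + "}")
    # Assumes values contain no spaces, so splitting the key recovers the n parts.
    return [(tuple(item["key"].split(" ")), item["value"]) for item in data["ngrams"]]

for gram, count in parse_ngrams(fragment):
    print(gram, count)
```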