- availableFields(...)
- buckets(...)
- categorize(...)
- countDocs(...)
- get(...)
- match(...)
- search(...)
- totalHits(...)
(viewable under index.options)
- appendOnly
- batchSize
- compositeField
- db
- fieldedSearch
- fieldOptions
- preserveCase
- storeable
- searchable
- indexPath
- logLevel
- nGramLength
- separator
- stopwords
- wildcard
Make sure that search-index is installed from npm, and then do either
const searchIndex = require('search-index')
searchIndex(options, function(err, si) {
// si is now a new search index
})
or
require('search-index')(options, function(err, si) {
// si is now a new search index
})
or
const getData = function(err, si) {
// si is now a new search index
}
require('search-index')(options, getData)
options can be any [option](#options-and-settings), which will then form the defaults of the
index until it is closed or the options are changed.
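For example, a minimal sketch that opens an index with a few of the options described below (the values shown are only illustrative):

```javascript
const searchIndex = require('search-index')

searchIndex({
  indexPath: 'myIndex',              // where the datastore lives on disk
  logLevel: 'error',                 // a bunyan log level
  stopwords: require('stopword').en  // the default English stopword list
}, function (err, si) {
  if (err) return console.error(err)
  // si now uses these options as its defaults until it is closed
})
```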
Closes the index and the underlying data store. Generally this should happen automatically, and therefore this function need never be called. Only needed if you are doing very fast restarts.
index.close(function(err) {
if (!err) console.log('success!')
})
Returns a readable stream of all fields that can be searched in.
si.availableFields().on('data', function (field) {
// "field" is the name of a field that is searchable
}).on('end', function () {
// done
})
Returns a readable stream of user-defined aggregations, which can be used to generate categories by price, age, etc.
si.buckets({
query: [
{
AND: {'*': ['*']}
}
],
buckets: [
{
field: 'price',
gte: '2',
lte: '3',
set: false
},
{
field: 'price',
gte: '4',
lte: '7',
set: false
}
]
}).on('data', function (data) {
// do something with the data
}).on('end', function () {
// finished
})
query is a standard search-index query that specifies a set of documents
buckets is an array of buckets, where each bucket must be an object with the following properties (see the sketch below):
- field (mandatory) The field name
- gte "greater than or equal to"
- limit Limit the number of entries that will be returned
- lte "less than or equal to"
- set If true, return a set of IDs. If false or not set, return a count
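For example, a sketch that returns the set of matching IDs for a single price range, capped with limit (the price field is the one used in the example above):

```javascript
si.buckets({
  query: [
    { AND: { '*': ['*'] } }
  ],
  buckets: [
    {
      field: 'price',
      gte: '2',
      lte: '7',
      limit: 100,  // return at most 100 entries for this bucket
      set: true    // return document IDs instead of a count
    }
  ]
}).on('data', function (bucket) {
  // bucket holds the IDs of documents whose price falls in the range
}).on('end', function () {
  // done
})
```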
Collate documents under all possible values of the given field name, and return a readable stream
si.categorize({
query: [
{
AND: {'*': ['swiss', 'watch']}
}
],
category: {
field: 'manufacturer'
}
}).on('data', function (data) {
// do something with the data
}).on('end', function () {
// finished
})
query is a standard search-index query that specifies a set of documents
category is an object specifying the field to categorize on:
- field Name of the field to categorize on
- set If true, return a set of IDs. If false or not set, return a count
In addition, the object passed to categorize can contain offset and pageSize, which work in the same way as for search (see the sketch below).
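A hedged sketch of the above, assuming that set sits inside the category object and that offset and pageSize sit alongside query:

```javascript
si.categorize({
  query: [
    { AND: { '*': ['swiss', 'watch'] } }
  ],
  category: {
    field: 'manufacturer',
    set: true    // return the set of matching IDs per manufacturer
  },
  offset: 0,     // assumed placement, as per the note above
  pageSize: 10   // assumed placement, as per the note above
}).on('data', function (data) {
  // each data item is a category together with its document IDs
}).on('end', function () {
  // finished
})
```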
Returns the total number of documents in the index
si.countDocs(function (err, count) {
console.log('this index contains ' + count + ' documents')
})
Gets documents from the corpus
index.get(docIDs).on('data', function (doc) {
// doc is a document for each ID in docIDs
})
- docIDs an array of document IDs
Use match to create autosuggest and autocomplete functionality.
index.match({
beginsWith: 'epub'
}).on('data', function (data) {
// each data item is a matching term that begins with 'epub'
}).on('end', function () {
// done
})
options is an object that can contain:
- beginsWith string default:'' return all words that begin with this string
- field string default:'*' perform matches on data found in this field
- threshold number default:3 only perform matches once beginsWith is longer than this number
- limit number default:10 maximum number of matches to return
- type string default:'simple' the type of matcher to use, can only be 'simple' for the time being.
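A sketch that uses the remaining options (the name field is hypothetical):

```javascript
si.match({
  beginsWith: 'epu',  // matching starts once this is longer than threshold
  field: 'name',      // only look for matches in this field
  threshold: 2,
  limit: 5,           // return at most five suggestions
  type: 'simple'
}).on('data', function (suggestion) {
  // feed each suggestion into an autocomplete widget
}).on('end', function () {
  // done
})
```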
Searches in the index.
si.search({
query: [{
AND: {'*': ['gigantic', 'teddy', 'bears']}
}]
}).on('data', function (data) {
// do something cool with search results
})
q is an object that describes a search query and can contain the following properties:
- query Object An object that specifies query terms and fields. For example {'title': ['ronald', 'reagan']}. An asterisk can be used as a wildcard for either field name or terms: {'*': ['ronald', 'reagan']} or {'title': ['*']}. If documents have been indexed with an nGram length of 2 or more, it is possible to search for the phrase 'ronald reagan': {'*': ['ronald reagan']}. AND or NOT conditions can be specified, and queries can be chained together to create OR statements: [ { AND: {'*': ['watch', 'gold'] }, NOT: {'name': ['apple'] } }, { AND: {'*': ['apple', 'watch'] } } ] finds "watch" AND "gold" in all ("*") fields, OR "apple" and "watch" in all fields
- filters Object One or more objects added to the AND and/or NOT objects that filter the search results. Filters can only be applied to searchable fields in your index. Filters are commonly used in conjunction with the selection of categories and buckets: [ { AND: { '*': ['watch', 'gold'], 'price': [{ gte: '1000', lte: '8' }] } } ]
- offset number Sets the start index of the results. In a scenario where you want to go to "page 2" in a result set of 30 results per page, you would set offset to 30. Page 3 would require an offset of 60, and so on.
- pageSize number Sets the size of the result set.
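A sketch that combines query, a filter on the price field and pagination, using the example fields from above:

```javascript
si.search({
  query: [{
    AND: {
      '*': ['watch', 'gold'],
      'price': [{ gte: '1000', lte: '8' }]  // filter on a searchable field
    },
    NOT: { 'name': ['apple'] }
  }],
  offset: 30,   // start at "page 2" of a 30-results-per-page result set
  pageSize: 30
}).on('data', function (hit) {
  // each hit is a scored search result
})
```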
Returns a count of the documents for the given query, including those hidden by pagination
si.totalHits(q, function (err, count) {
console.log('the query ' + q + ' gives ' + count + ' results')
})
Returns a writeable stream that can be used to index documents into the search index.
Note that this stream cannot be used concurrently. If documents are being sent on top of one another then it is safer to use concurrentAdd; however, add is faster and uses fewer resources.
- batchOptions is an object describing indexing options. add will heed the following batchOptions:
- appendOnly boolean: default:false If set to true, documents will not be deleted before being added. This is useful when you are creating an index from scratch and want to speed up indexing time. Normally all documents should have unique IDs, or no IDs. If you add a document with appendOnly: true to an index that already contains a document with that ID, then the new document will simply overwrite the old one without deleting the previous one. In this way advanced users can augment existing documents by setting appendOnly: true.
// s is a Readable stream in object mode
s.pipe(si.defaultPipeline(batchOptions))
.pipe(si.add())
.on('finish', function() {
// complete
})
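A sketch of how s might be constructed: a Readable stream in object mode fed from an ordinary array of (hypothetical) documents:

```javascript
const Readable = require('stream').Readable

const docs = [
  { id: '1', title: 'gigantic teddy bears', price: '10' },
  { id: '2', title: 'swiss watch', price: '1000' }
]

const s = new Readable({ objectMode: true })
s._read = function () {}               // no-op; documents are pushed manually
docs.forEach(function (doc) { s.push(doc) })
s.push(null)                           // signal the end of the stream

s.pipe(si.defaultPipeline({ appendOnly: false }))
  .pipe(si.add())
  .on('finish', function () {
    // all documents indexed
  })
```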
An alternative to .add(...)
that allows adding by passing an array
of documents and waiting for a callback. Useful for environments where
node streams cannot be constructed (such as browsers).
Note that concurrentAdd
queues documents internally, so in a
scenario where unordered documents are being added rapidly from many
sources, concurrentAdd
should be used.
- data is an array of documents
- batchOptions is an object describing indexing options
mySearchIndex.concurrentAdd(batchOptions, data, function(err) {
// docs added
})
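For example, with a couple of hypothetical documents:

```javascript
const batchOptions = { appendOnly: false }
const data = [
  { id: '1', name: 'apple watch', price: '1000' },
  { id: '2', name: 'gold watch', price: '8' }
]

mySearchIndex.concurrentAdd(batchOptions, data, function (err) {
  if (err) return console.error(err)
  // both documents are now searchable
})
```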
An alternative to .del(...) that allows concurrent deletion.
Note that concurrentDel queues documents internally, so in a scenario where documents are being deleted rapidly without waiting for callbacks, concurrentDel should be used.
- documentIDs is an array of document IDs
mySearchIndex.concurrentDel(documentIDs, function(err) {
// docs deleted
})
Prepares a "standard document" (an object where keys become field names, and values become corresponding field values) for indexing. Customised pipeline stages can be inserted before and after processing if required.
options is an object where each key corresponds to a field name in the documents to be indexed and can contain the following settings:
- compositeField boolean: default:true should a composite (*) field be generated? Setting to false saves space and speeds up indexing, but disables searching across all fields at once
- defaultFieldOptions Object default options to use for this batch
- fieldOptions Object overrides defaultFieldOptions, and can have the following values:
  - fieldedSearch boolean, default:true : can searches be carried out on this specific field
  - nGramLength number, default:1 : length of word sequences to be indexed. Use this to capture phrases of more than one word.
  - preserveCase boolean, default:true : preserve the case of the text
  - searchable boolean, default:true : is this field searchable?
  - separator regex : a regex in the String.split() format that will be used to tokenize this field
  - sortable boolean, default:false : can this field be sorted on? If true, the field is not searchable
  - stopwords Array, default: require('stopword').en : an array of stop words. Languages other than English are available.
  - storeable Array : specifies which fields to store in the index. You may want to index fields that are not shown in results, for example when dealing with synonyms
  - weight number, default:0 : this number will be added to the score for the field, allowing some fields to count for more or less than others
  - wildcard boolean, default:true : enables (*) search on this field
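A sketch of a batchOptions object for defaultPipeline (the field names id, title and body are hypothetical):

```javascript
const batchOptions = {
  compositeField: true,
  defaultFieldOptions: { preserveCase: false },
  fieldOptions: {
    id: { searchable: false },              // keep ids out of the search
    title: { nGramLength: 2, weight: 10 },  // index two-word phrases, boost title
    body: { stopwords: require('stopword').en }
  }
}

// s is a Readable stream of documents in object mode, as in the add example
s.pipe(si.defaultPipeline(batchOptions))
  .pipe(si.add())
  .on('finish', function () {
    // indexed with per-field options
  })
```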
Deletes one or more documents from the corpus
si.del(docIDs, function(err) {
return done()
})
- docIDs an array of document IDs referencing documents that are to be deleted.
Returns a writable stream that allows you to add documents to the index.
index.feed(options)
options is an object that is passed on to defaultPipeline and in addition contains its own parameters:
- objectMode boolean: default:false Specifies whether the stream will accept objects or strings
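A hedged sketch: since feed returns a writable stream, the standard write() and end() calls should apply (the document shown is hypothetical):

```javascript
const feed = index.feed({ objectMode: true })

feed.write({ id: '1', title: 'gigantic teddy bears' })
feed.end()

feed.on('finish', function () {
  // the document has been indexed
})
```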
Empties the index. Deletes everything.
index.flush(function(err) {
if (!err) console.log('success!')
})
Use dbReadStream()
to create a stream of the underlying key-value
store. This can be used to pipe indexes around. You can, for example,
replicate indexes to a file, or to other (empty) indexes.
// replicate an index to file
si.dbReadStream(options)
.pipe(JSONStream.stringify('', '\n', ''))
.pipe(fs.createWriteStream('backup.json'))
.on('close', function() {
// done
})
// replicate an index to another search-index
replicator.dbReadStream({gzip: true})
.pipe(zlib.createGunzip())
.pipe(JSONStream.parse())
.pipe(replicatorTarget2.dbWriteStream())
.on('close', function () {
// done
})
options is an optional object that describes how the stream is formatted
- gzip If set to true, the readstream will be compressed into the gzip format
Use dbWriteStream() to read in an index created by dbReadStream().
fs.createReadStream('backup.json')
.pipe(JSONStream.parse())
.pipe(si.dbWriteStream(options))
.on('close', function() {
done()
})
options is an optional object that describes how the stream will be written
- merge If set to true, the writestream will merge this index with the existing one; if set to false, the existing index must be empty
appendOnly
boolean
When adding docs, don't check to see if they already exist.
batchSize
number
Specifies how many documents to process before merging them into the index. When the end of the stream is reached, all remaining documents will be merged, even if batchSize is not reached.
compositeField
boolean
Allow search across all fields (*).
db
a levelup instance
The datastore.
fieldedSearch
boolean
If true, then the field is searchable.
fieldOptions
Object
Contains field-specific overrides to global settings.
Example of setting options on several fields:
fieldOptions: {
id: {
searchable: false
},
url: {
searchable: false
}
}
preserveCase
boolean
If true, case is preserved. For example, queries for "Potato" will not match "potato".
storeable
boolean
If true, a cache of the field is stored in the index.
searchable
boolean
If true, this field will be searchable; if false, the field will be available, but not searchable.
indexPath
string
The location of the datastore. If db is specified, then indexPath is ignored.
logLevel
string
A bunyan log level.
nGramLength
number, array or object
All valid definitions of nGramLength:
nGramLength = 1 // 1
nGramLength = [1,3] // 1 & 3
nGramLength = {gte: 1, lte: 3} // 1, 2 & 3
Specifies how to split strings into phrases. See https://www.npmjs.com/package/term-vector for examples
separator
string
Specifies how strings are to be split, using a regex in the String.split() format.
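For example, a hypothetical separator that splits fields on whitespace, commas and full stops:

```javascript
separator: /[\s,.]+/
```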
stopwords
array
An array of stopwords. Stopword lists are available for the following languages:
- ar - Modern Standard Arabic
- bn - Bengali
- da - Danish
- de - German
- en - English
- es - Spanish
- fa - Farsi
- fr - French
- hi - Hindi
- it - Italian
- ja - Japanese*
- nl - Dutch
- no - Norwegian
- pl - Polish
- pt - Portuguese
- ru - Russian
- sv - Swedish
- zh - Chinese Simplified*
*Some languages, like ja (Japanese) and zh (Chinese Simplified), have no space between words. For these languages you need to split the text into words before adding it to search-index. You can check out TinySegmenter for Japanese and chinese-tokenizer for Chinese.
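For example, a sketch that indexes Norwegian text, assuming the stopword module exposes the other language lists under the two-letter codes above:

```javascript
searchIndex({
  stopwords: require('stopword').no  // the Norwegian stopword list
}, function (err, si) {
  // si now strips Norwegian stopwords while indexing
})
```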
wildcard
boolean
Should wildcard search be generated for this field?