Skip to content

agco/elastic-harvesterjs

Repository files navigation

Build Status

Elastic-Harvest

Elastic-Harvest is a Nodejs implementation of the JSON API Search Profile.

This library ties together harvester.js and elasticsearch to offer the required linked resource filtering and aggregation features.

Apart from that it also provides a number of helper functions to synchronize harvester.js/mongoDB resources with an elasticsearch backend.

Elasticsearch Tools

Find useful elastic-search tools as well as their documentation in /non-functionals.

Find documentation for querying elasticharvester-powered endpoints here

Features

  • Aggregations : stats, extended_stats, top_hits, terms
  • Primary and Linked resource filtering interop
  • Top_hits aggregation interop with JSON API features, inclusion and sparse fieldsets #6

Roadmap

  • More aggregations : min, max, sum, avg, percentiles, percentile_ranks, cardinality, geo_bounds, significant_terms, range, date_range, filter, filters, missing, histogram, date_histogram, geo_distance
  • Reliable harvester.js/mongoDB - Elasticsearch data synchronisation ( oplog based )
  • Support adaptive queries, use the ES mapping file to figure out whether to use parent/child or nested queries / aggregations
  • Use Harvest associations + ES mapping file to discover which Mongodb collections have to be synced rather than having to register them explicitly
  • Bootstrap elasticsearch with existing data from Harvest resources through REST endpoint
  • Bootstrap elasticsearch mapping file through REST endpoint

Dependencies

elasticSearch v1.4.0+

Usage

var Elastic_Search_URL = process.env.BONSAI_URL || "http://127.0.0.1:9200";
var Elastic_Search_Index = "dealer-api";
var type = "dealers";

Create elastic search endpoint (NB: api changed in v1.0.0)

    var harvestApp = harvest(options);

    var peopleSearch;

    var peopleSearchRoute;

    //This circumvents a dependency issue between harvest and elastic-harvest.
    harvestApp.router.get('/people/search', function(){
        peopleSearchRoute.apply(peopleSearch,arguments);
    });

    harvestApp
        .resource('person', {
            name: String
            });

    peopleSearch = new ElasticHarvest(harvest_app, Elastic_Search_URL,Elastic_Search_Index, type);

    peopleSearchRoute  = peopleSearch.route;

    peopleSearch.setHarvestRoute(harvestApp.route('person'));
    
    peopleSearch.enableAutoSync("person");

Create an :after callback & sync elastic search after each item is posted to harvest

#####Note - only 1 "after" callback is allowed per endpoint, so if you enable autosync, you're giving it up to elastic-harvest.

dealerSearch.enableAutoSync("dealer");

Alternative way to create an :after endpoint & sync elastic search. This approach gives you access to do more in the after callback.

this.harvest_app.after("dealer", function (req, res, next) {
    if (req.method === 'POST' || (req.method === 'PUT' && this.id)) {
        return dealerSearch.expandAndSync(this);
    } else {
        return this;
    }
});

Expand an object's links:

dealerSearch.expandEntity(dealer);

Send an object to elastic search after expanding it's links:

dealerSearch.expandAndSync(dealer);

Send an object to elastic search without expanding it's links:

dealerSearch.sync(dealer);

Delete an object in elastic search: (added in 0.0.3)

dealerSearch.delete(dealer.id);

Create an :after callback & keep your elastic search index up to date with PUTs and POSTs on linked documents. (added in 0.0.5)

#####Note - only 1 "after" callback is allowed per endpoint, so if you enable indexUpdateOnModelUpdate, you're giving it up to elastic-harvest.

dealerSearch.enableAutoIndexUpdateOnModelUpdate("subdocumentsHarvestEndpoint","links.path.to.object.id");
e.g. dealerSearch.enableAutoIndexUpdateOnModelUpdate("brand","links.current_contracts.brand.id");

Update Elastic Search index when a related harvest model changes (added in 0.0.5)

entity = this;
dealerSearch.updateIndexForLinkedDocument("links.path.to.object.id",entity);

Delete ES Index (added in 0.0.9)

dealerSearch.deleteIndex().

Initialize ES Index (added in 0.0.9)

dealerSearch.initializeIndex().

Initialize an elastic search mapping (added in 0.0.6, updated in 0.0.9)

dealerSearch.initializeMapping(mappingObject).

v0.0.9 update provides automatic handling of missing-index errors.

The Mapping object can be loaded from a js file that looks like:

module.exports= {
    "trackingPoints": {
        "properties": {
            "data": {
                "type": "nested"
            },
            "loc" : {
                "type" : "nested",
                "properties": {
                    "location" : {
                        "type" : "geo_point"
                    }
                }
            },
            "time" : {
                "type" : "date"
            },
            "links": {
                "type": "nested",
                "properties": {
                    "equipment": {
                        "type": "nested",
                        "properties": {
                            "model": {
                                "type": "nested",
                                "properties": {
                                    "brand":{
                                        "type": "nested",
                                        "properties": {
                                            "name":{
                                                "type": "string",
                                                "index": "not_analyzed"
                                            }
                                        }
                                    },
                                    "equipmentType":{
                                        "type": "nested",
                                        "properties": {
                                            "value":{
                                                "type": "string",
                                                "index": "not_analyzed"
                                            }
                                        }
                                    },
                                    "name":{
                                        "type": "string",
                                        "index": "not_analyzed"
                                    }
                                }
                            }
                        }
                    },
                    "duty": {
                        "type": "nested",
                        "properties": {
                            "status":{
                                "type": "string",
                                "index": "not_analyzed"
                            }
                        }
                    }
                }
            }
        }
    }
}

Configuring scripts

There is a sampler script that you can run when wanting to get a subset of the results you normally get. To run this scripts will have to be enabled in Elastic Search config:

script.disable_dynamic: sandbox
script.default_lang: expression
script.groovy.sandbox.enabled: false

Then place this script as sampler.groovy file in scripts directory of ES instance.

count=count+1;if(count % skip_rate == 0){ return 1 }; return 0;

Running Sampler script

Sampler script can be executed in conjunction with any other ES query and aggregations. Just add the following to your query:

script=sampler&script.maxSamples=15

maxSamples being the number of results you want to get. Script will get a sample from the normal result set. For same query results you will get the same sample data.

An example:

/people/search?aggregations=n&n.property=links.pet.name&n.aggregations=mostpopular&mostpopular.type=top_hits&mostpopular.sort=-appearances&mostpopular.limit=1&mostpopular.include=pets&script=sampler&script.maxSamples=100