settings.js overwrites ES default # of shards,replicas #178

Closed
easherma opened this issue Oct 11, 2016 · 3 comments

@easherma

https://github.com/pelias/schema/blob/master/settings.js#L289

This is an important issue to either adjust and/or note in the installation docs. Per #6, it is nice to be able to override ES default settings as part of the create index process, especially because it helps avoid having to make config changes in multiple places.

However, this default override is an unpleasant surprise for users setting up their own instance. In particular, a single shard is a poor default, since the number of shards can only be increased by re-indexing and re-importing the data. I'd suggest making the default something more sensible and noting in the install docs that create_index.js already overwrites these settings.
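The reason the shard count is so costly to change: Elasticsearch routes each document to a primary shard with `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document ID, so changing the divisor reassigns most documents. A minimal sketch of that effect, assuming `zlib.crc32` as a stand-in for Elasticsearch's Murmur3 hash and hypothetical document IDs:

```python
import zlib
from typing import Sequence

def shard_for(doc_id: str, num_primary_shards: int) -> int:
    # Mirrors the shape of Elasticsearch's routing formula,
    # shard = hash(_routing) % number_of_primary_shards.
    # Elasticsearch hashes with Murmur3; zlib.crc32 is only a stand-in here.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

def fraction_relocated(doc_ids: Sequence[str], old_shards: int, new_shards: int) -> float:
    """Share of documents whose target shard changes when the primary shard count changes."""
    moved = sum(1 for d in doc_ids if shard_for(d, old_shards) != shard_for(d, new_shards))
    return moved / len(doc_ids)

# Hypothetical document IDs, for illustration only.
ids = [f"doc-{i}" for i in range(10_000)]
print(f"1 -> 5 shards:  {fraction_relocated(ids, 1, 5):.0%} of documents change shard")
print(f"5 -> 10 shards: {fraction_relocated(ids, 5, 10):.0%} of documents change shard")
```

With any reasonably uniform hash, going from 1 to 5 shards reassigns roughly four documents in five, which is why the primary shard count is fixed when the index is created and changing it means building a new index.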

Tangential points:

  • It would be great to expand on this and include other commonly changed Elasticsearch configs. In particular, it would be great if heap size could be adjusted here (realizing that may not be possible or ideal).
  • I think it would be a more user-friendly process if create_index (and potentially drop as well, though that's less important) also created a snapshot of the initialized database before any data is imported. This is good practice anyway, since it gives the user a fallback point, but it is probably often overlooked. (Snapshots before and after import for each of the modules would also be great, but would obviously take more effort.)
@orangejulius
Member

orangejulius commented Oct 11, 2016

Good catch. I propose we do two things:

1.) Move the Elasticsearch settings in that block (# of replicas, # of shards, and index_concurrency) into pelias/config as defaults. We point to the defaults there much more frequently, so they'll be more obvious there than buried here. Personally, I associate master/settings.js in pelias/schema more with our analyzer and token filter configuration.

2.) Pick a new default for the number of shards. Elasticsearch uses 5, so maybe there's no reason to think we're smarter than the Elasticsearch people? @missinglink any thoughts on how many shards we should use?
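The first proposal amounts to shipping defaults and letting each user's config override only the values they change. A minimal sketch of that layering, assuming a hypothetical config layout (`elasticsearch.settings.index`) and a hypothetical `index_settings()` helper; it is not pelias/config's actual API or merge logic:

```python
import copy
from typing import Any, Dict

# Hypothetical defaults mirroring the settings named in this issue.
# The nesting ("elasticsearch" -> "settings" -> "index") is an assumption.
DEFAULTS: Dict[str, Any] = {
    "elasticsearch": {
        "settings": {
            "index": {
                "number_of_shards": 1,     # development default described in the commit below
                "number_of_replicas": 0,   # illustrative value only, not from the source
                "index_concurrency": 10,   # value described in the commit below
            }
        }
    }
}

def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where values from `override` win, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

def index_settings(user_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: settings body a create-index step could send after overrides."""
    return deep_merge(DEFAULTS, user_config)["elasticsearch"]["settings"]

# A production user overrides only the shard and replica counts.
user = {"elasticsearch": {"settings": {"index": {"number_of_shards": 5, "number_of_replicas": 1}}}}
print(index_settings(user))
```

Because the merge copies rather than mutates, the shared defaults stay intact and any value a user does not set (here `index_concurrency`) falls through from the defaults.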

orangejulius added a commit to pelias/config that referenced this issue Oct 17, 2016
These were buried in pelias/schema (https://github.com/pelias/schema/blob/f28002db187f1685abc3688b141e0bfdd5cdd01a/settings.js#L289-L296),
so by moving them here it's more obvious they can be overridden.

We use 1 shard as a default in development where scalability isn't
required.

Also, because we use the [dfs_query_then_fetch](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch)
search mode, having one shard eliminates any possibility that queries run
_without_ that setting return confusing results due to per-shard TF/IDF
differences.

The `index_concurrency` setting is set to 10 as an attempt to increase
indexing performance as well.

Reference:
https://github.com/pelias/api/blob/9ff383cc2b4a690fa05a88e70c598bfdc28751f4/controller/search.js#L44
https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-is-broken.html

Connects pelias/schema#178
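The TF/IDF point in the commit above can be illustrated numerically. With the default `query_then_fetch`, each shard scores using only its own document frequencies, while `dfs_query_then_fetch` first collects term statistics from every shard. A minimal sketch, assuming a classic TF/IDF-style `idf = 1 + ln(N / (df + 1))` and hypothetical toy street-name documents split across two shards:

```python
import math
from typing import List

Shard = List[str]  # each string is one document's text

def doc_freq(docs: List[str], term: str) -> int:
    """Number of documents containing `term` as a whitespace-separated token."""
    return sum(1 for d in docs if term in d.lower().split())

def idf(num_docs: int, df: int) -> float:
    # Classic TF/IDF-style inverse document frequency (assumed formula).
    return 1.0 + math.log(num_docs / (df + 1))

def per_shard_idf(shards: List[Shard], term: str) -> List[float]:
    """query_then_fetch-style: each shard uses only its local statistics."""
    return [idf(len(s), doc_freq(s, term)) for s in shards]

def global_idf(shards: List[Shard], term: str) -> float:
    """dfs_query_then_fetch-style: statistics are gathered across all shards first."""
    all_docs = [d for s in shards for d in s]
    return idf(len(all_docs), doc_freq(all_docs, term))

# Hypothetical toy data: "main" is common on shard 0 but rare on shard 1.
shards = [
    ["main street", "main avenue", "main road", "oak street"],
    ["main street", "elm road", "pine avenue", "birch lane"],
]
print("per-shard idf('main'):", [round(x, 3) for x in per_shard_idf(shards, "main")])
print("global idf('main'):   ", round(global_idf(shards, "main"), 3))
```

Under per-shard statistics the identical document "main street" gets a lower idf on shard 0 than on shard 1, so equal matches can rank differently depending on which shard holds them. With a single shard, local and global statistics coincide, which is the property the commit message relies on.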
@orangejulius orangejulius self-assigned this Oct 17, 2016
orangejulius added a commit to pelias/documentation that referenced this issue Oct 17, 2016
orangejulius added a commit that referenced this issue Oct 17, 2016
@orangejulius
Member

Hey @easherma,
We've now moved the general Elasticsearch configuration settings to pelias/config, and there's a new section in the documentation with some suggestions on shard settings. Take a look, let me know if it makes sense, and whether the Elasticsearch settings can be easily changed for you.

@orangejulius
Member

I believe we fixed this in the work in the pull requests connected above my last comment. If there's anything more to do here don't hesitate to let us know in this issue or a new issue. But since we think we've taken care of it, we're closing this issue to help us keep track of things.

@ghost ghost removed the outreach label Jan 4, 2017