From f123a53d7258349a171e47a35f4581899d8fa776 Mon Sep 17 00:00:00 2001 From: Clinton Gormley Date: Mon, 22 Jun 2015 23:49:45 +0200 Subject: [PATCH] Docs: Refactored modules and index modules sections --- docs/reference/aggregations/misc.asciidoc | 2 +- .../cluster/update-settings.asciidoc | 166 +---------- docs/reference/index-modules.asciidoc | 196 ++++++++++--- .../index-modules/allocation.asciidoc | 275 ++++++++---------- .../reference/index-modules/analysis.asciidoc | 20 +- docs/reference/index-modules/cache.asciidoc | 33 --- docs/reference/index-modules/mapper.asciidoc | 9 +- docs/reference/index-modules/merge.asciidoc | 36 ++- docs/reference/index-modules/slowlog.asciidoc | 46 +-- docs/reference/index-modules/store.asciidoc | 80 ++--- docs/reference/indices/stats.asciidoc | 4 +- .../indices/update-settings.asciidoc | 126 +------- docs/reference/mapping.asciidoc | 6 +- .../fielddata_formats.asciidoc} | 92 +----- .../mapping/types/core-types.asciidoc | 4 +- docs/reference/modules.asciidoc | 76 ++++- .../modules/advanced-scripting.asciidoc | 14 +- docs/reference/modules/cluster.asciidoc | 259 ++--------------- .../cluster/allocation_awareness.asciidoc | 107 +++++++ .../cluster/allocation_filtering.asciidoc | 70 +++++ .../modules/cluster/disk_allocator.asciidoc | 69 +++++ docs/reference/modules/cluster/misc.asciidoc | 36 +++ .../cluster/shards_allocation.asciidoc | 124 ++++++++ docs/reference/modules/gateway.asciidoc | 116 ++++---- docs/reference/modules/gateway/local.asciidoc | 56 ---- docs/reference/modules/indices.asciidoc | 74 ++--- .../modules/indices/circuit_breaker.asciidoc | 56 ++++ .../modules/indices/fielddata.asciidoc | 37 +++ .../modules/indices/filter_cache.asciidoc | 16 + .../modules/indices/indexing_buffer.asciidoc | 32 ++ .../indices}/query-cache.asciidoc | 20 +- .../modules/indices/recovery.asciidoc | 28 ++ .../modules/indices/ttl_interval.asciidoc | 16 + docs/reference/modules/plugins.asciidoc | 2 +- docs/reference/modules/threadpool.asciidoc | 
1 - docs/reference/search/request-body.asciidoc | 2 +- docs/resiliency/index.asciidoc | 2 +- 37 files changed, 1171 insertions(+), 1137 deletions(-) delete mode 100644 docs/reference/index-modules/cache.asciidoc rename docs/reference/{index-modules/fielddata.asciidoc => mapping/fielddata_formats.asciidoc} (72%) create mode 100644 docs/reference/modules/cluster/allocation_awareness.asciidoc create mode 100644 docs/reference/modules/cluster/allocation_filtering.asciidoc create mode 100644 docs/reference/modules/cluster/disk_allocator.asciidoc create mode 100644 docs/reference/modules/cluster/misc.asciidoc create mode 100644 docs/reference/modules/cluster/shards_allocation.asciidoc delete mode 100644 docs/reference/modules/gateway/local.asciidoc create mode 100644 docs/reference/modules/indices/circuit_breaker.asciidoc create mode 100644 docs/reference/modules/indices/fielddata.asciidoc create mode 100644 docs/reference/modules/indices/filter_cache.asciidoc create mode 100644 docs/reference/modules/indices/indexing_buffer.asciidoc rename docs/reference/{index-modules => modules/indices}/query-cache.asciidoc (93%) create mode 100644 docs/reference/modules/indices/recovery.asciidoc create mode 100644 docs/reference/modules/indices/ttl_interval.asciidoc diff --git a/docs/reference/aggregations/misc.asciidoc b/docs/reference/aggregations/misc.asciidoc index f494d5291c0f9..73c24e7358534 100644 --- a/docs/reference/aggregations/misc.asciidoc +++ b/docs/reference/aggregations/misc.asciidoc @@ -7,7 +7,7 @@ can be cached for faster responses. These cached results are the same results that would be returned by an uncached aggregation -- you will never get stale results. -See <> for more details. +See <> for more details. 
[[returning-only-agg-results]] == Returning only aggregation results diff --git a/docs/reference/cluster/update-settings.asciidoc b/docs/reference/cluster/update-settings.asciidoc index a0f7bbaa976e6..08f4c9005970c 100644 --- a/docs/reference/cluster/update-settings.asciidoc +++ b/docs/reference/cluster/update-settings.asciidoc @@ -10,8 +10,8 @@ survive a full cluster restart). Here is an example: curl -XPUT localhost:9200/_cluster/settings -d '{ "persistent" : { "discovery.zen.minimum_master_nodes" : 2 - } -}' + } +}' -------------------------------------------------- Or: @@ -21,8 +21,8 @@ Or: curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "discovery.zen.minimum_master_nodes" : 2 - } -}' + } +}' -------------------------------------------------- The cluster responds with the settings updated. So the response for the @@ -34,8 +34,8 @@ last example will be: "persistent" : {}, "transient" : { "discovery.zen.minimum_master_nodes" : "2" - } -}' + } +}' -------------------------------------------------- Cluster wide settings can be returned using: @@ -45,157 +45,7 @@ Cluster wide settings can be returned using: curl -XGET localhost:9200/_cluster/settings -------------------------------------------------- -There is a specific list of settings that can be updated, those include: -[float] -[[cluster-settings]] -=== Cluster settings - -[float] -==== Routing allocation - -[float] -===== Awareness - -`cluster.routing.allocation.awareness.attributes`:: - See <>. - -`cluster.routing.allocation.awareness.force.*`:: - See <>. - -[float] -===== Balanced Shards -All these values are relative to one another. The first three are used to -compose a three separate weighting functions into one. The cluster is balanced -when no allowed action can bring the weights of each node closer together by -more then the fourth setting. Actions might not be allowed, for instance, -due to forced awareness or allocation filtering. 
- -`cluster.routing.allocation.balance.shard`:: - Defines the weight factor for shards allocated on a node - (float). Defaults to `0.45f`. Raising this raises the tendency to - equalize the number of shards across all nodes in the cluster. - -`cluster.routing.allocation.balance.index`:: - Defines a factor to the number of shards per index allocated - on a specific node (float). Defaults to `0.55f`. Raising this raises the - tendency to equalize the number of shards per index across all nodes in - the cluster. - -`cluster.routing.allocation.balance.threshold`:: - Minimal optimization value of operations that should be performed (non - negative float). Defaults to `1.0f`. Raising this will cause the cluster - to be less aggressive about optimizing the shard balance. - -[float] -===== Concurrent Rebalance - -`cluster.routing.allocation.cluster_concurrent_rebalance`:: - Allow to control how many concurrent rebalancing of shards are - allowed cluster wide, and default it to `2` (integer). `-1` for - unlimited. See also <>. - -[float] -===== Enable allocation - -`cluster.routing.allocation.enable`:: - See <>. - -[float] -===== Throttling allocation - -`cluster.routing.allocation.node_initial_primaries_recoveries`:: - See <>. - -`cluster.routing.allocation.node_concurrent_recoveries`:: - See <>. - -[float] -===== Filter allocation - -`cluster.routing.allocation.include.*`:: - See <>. - -`cluster.routing.allocation.exclude.*`:: - See <>. - -`cluster.routing.allocation.require.*` - See <>. - -[float] -==== Metadata - -`cluster.blocks.read_only`:: - Have the whole cluster read only (indices do not accept write operations), metadata is not allowed to be modified (create or delete indices). 
- -[float] -==== Discovery - -`discovery.zen.minimum_master_nodes`:: - See <> - -`discovery.zen.publish_timeout`:: - See <> - -[float] -==== Threadpools - -`threadpool.*`:: - See <> - -[float] -[[cluster-index-settings]] -=== Index settings - -[float] -==== Index filter cache - -`indices.cache.filter.size`:: - See <> - -[float] -==== TTL interval - -`indices.ttl.interval` (time):: - See <> - -[float] -==== Recovery - -`indices.recovery.concurrent_streams`:: - See <> - -`indices.recovery.concurrent_small_file_streams`:: - See <> - -`indices.recovery.file_chunk_size`:: - See <> - -`indices.recovery.translog_ops`:: - See <> - -`indices.recovery.translog_size`:: - See <> - -`indices.recovery.compress`:: - See <> - -`indices.recovery.max_bytes_per_sec`:: - See <> - -[float] -[[logger]] -=== Logger - -Logger values can also be updated by setting `logger.` prefix. More -settings will be allowed to be updated. - -[float] -=== Field data circuit breaker - -`indices.breaker.fielddata.limit`:: - See <> - -`indices.breaker.fielddata.overhead`:: - See <> +A list of dynamically updatable settings can be found in the +<> documentation. diff --git a/docs/reference/index-modules.asciidoc b/docs/reference/index-modules.asciidoc index f74eda35bed62..804c8894bc6f2 100644 --- a/docs/reference/index-modules.asciidoc +++ b/docs/reference/index-modules.asciidoc @@ -1,72 +1,194 @@ + [[index-modules]] = Index Modules [partintro] -- -Index Modules are modules created per index and control all aspects -related to an index. Since those modules lifecycle are tied to an index, -all the relevant modules settings can be provided when creating an index -(and it is actually the recommended way to configure an index). + +Index Modules are modules created per index and control all aspects related to +an index. [float] [[index-modules-settings]] == Index Settings -There are specific index level settings that are not associated with any -specific module. 
These include: +Index level settings can be set per-index. Settings may be: + +_static_:: + +They can only be set at index creation time or on a +<>. + +_dynamic_:: + +They can be changed on a live index using the +<> API. + +WARNING: Changing static or dynamic index settings on a closed index could +result in incorrect settings that are impossible to rectify without deleting +and recreating the index. + +[float] +=== Static index settings + +Below is a list of all _static_ index settings that are not associated with any +specific index module: + +`index.number_of_shards`:: + + The number of primary shards that an index should have. Defaults to 5. + This setting can only be set at index creation time. It cannot be + changed on a closed index. + +`index.shard.check_on_startup`:: ++ +-- +experimental[] Whether or not shards should be checked for corruption before opening. When +corruption is detected, it will prevent the shard from being opened. Accepts: + +`false`:: + + (default) Don't check for corruption when opening a shard. + +`checksum`:: + + Check for physical corruption. + +`true`:: + + Check for both physical and logical corruption. This is much more + expensive in terms of CPU and memory usage. + +`fix`:: + + Check for both physical and logical corruption. Segments that were reported + as corrupted will be automatically removed. This option *may result in data loss*. + Use with extreme caution! + +Checking shards may take a lot of time on large indices. +-- + +[float] +[[dynamic-index-settings]] +=== Dynamic index settings + +Below is a list of all _dynamic_ index settings that are not associated with any +specific index module: + + +`index.number_of_replicas`:: + + The number of replicas each primary shard has. Defaults to 1. + +`index.auto_expand_replicas`:: + + Auto-expand the number of replicas based on the number of available nodes. + Set to a dash delimited lower and upper bound (e.g. `0-5`) or use `all` + for the upper bound (e.g. `0-all`). 
Defaults to `false` (i.e. disabled). `index.refresh_interval`:: - A time setting controlling how often the - refresh operation will be executed. Defaults to `1s`. Can be set to `-1` - in order to disable it. + + How often to perform a refresh operation, which makes recent changes to the + index visible to search. Defaults to `1s`. Can be set to `-1` to disable + refresh. `index.codec`:: - experimental[] - The `default` value compresses stored data with LZ4 compression, but - this can be set to `best_compression` for a higher compression ratio, - at the expense of slower stored fields performance. + experimental[] The `default` value compresses stored data with LZ4 + compression, but this can be set to `best_compression` for a higher + compression ratio, at the expense of slower stored fields performance. -`index.shard.check_on_startup`:: +`index.blocks.read_only`:: + + Set to `true` to make the index and index metadata read only, `false` to + allow writes and metadata changes. + +`index.blocks.read`:: + + Set to `true` to disable read operations against the index. - experimental[] - Should shard consistency be checked upon opening. When corruption is detected, - it will prevent the shard from being opened. - + - When `checksum`, check for physical corruption. - When `true`, check for both physical and logical corruption. This is much - more expensive in terms of CPU and memory usage. - When `fix`, check for both physical and logical corruption, and segments - that were reported as corrupted will be automatically removed. - Default value is `false`, which performs no checks. +`index.blocks.write`:: -NOTE: Checking shards may take a lot of time on large indices. + Set to `true` to disable write operations against the index. -WARNING: Setting `index.shard.check_on_startup` to `fix` may result in data loss, - use with extreme caution. +`index.blocks.metadata`:: + Set to `true` to disable index metadata reads and writes. 
+ +`index.ttl.disable_purge`:: + + experimental[] Disables the purge of <> on + the current index. + +[[index.recovery.initial_shards]]`index.recovery.initial_shards`:: ++ +-- +A primary shard is recovered only if there are enough nodes available to +allocate sufficient replicas to form a quorum. It can be set to: + + * `quorum` (default) + * `quorum-1` (or `half`) + * `full` + * `full-1` + * Number values are also supported, e.g. `1`. -- -include::index-modules/analysis.asciidoc[] -include::index-modules/allocation.asciidoc[] +[float] +=== Settings in other index modules -include::index-modules/slowlog.asciidoc[] +Other index settings are available in index modules: -include::index-modules/merge.asciidoc[] +<>:: -include::index-modules/store.asciidoc[] + Settings to define analyzers, tokenizers, token filters and character + filters. -include::index-modules/mapper.asciidoc[] +<>:: -include::index-modules/translog.asciidoc[] + Control over where, when, and how shards are allocated to nodes. + +<>:: + + Enable or disable dynamic mapping for an index. + +<>:: + + Control over how shards are merged by the background merge process. + +<>:: + + Configure custom similarity settings to customize how search results are + scored. + +<>:: -include::index-modules/cache.asciidoc[] + Control over how slow queries and fetch requests are logged. -include::index-modules/query-cache.asciidoc[] +<>:: -include::index-modules/fielddata.asciidoc[] + Configure the type of filesystem used to access shard data. + +<>:: + + Control over the transaction log and background flush operations. 
+ +-- + +include::index-modules/analysis.asciidoc[] + +include::index-modules/allocation.asciidoc[] + +include::index-modules/mapper.asciidoc[] + +include::index-modules/merge.asciidoc[] include::index-modules/similarity.asciidoc[] +include::index-modules/slowlog.asciidoc[] + +include::index-modules/store.asciidoc[] + +include::index-modules/translog.asciidoc[] + diff --git a/docs/reference/index-modules/allocation.asciidoc b/docs/reference/index-modules/allocation.asciidoc index 800e4d5de5ed4..4cc07060b6253 100644 --- a/docs/reference/index-modules/allocation.asciidoc +++ b/docs/reference/index-modules/allocation.asciidoc @@ -1,168 +1,131 @@ [[index-modules-allocation]] == Index Shard Allocation +This module provides per-index settings to control the allocation of shards to +nodes. + [float] [[shard-allocation-filtering]] === Shard Allocation Filtering -Allows to control the allocation of indices on nodes based on include/exclude -filters. The filters can be set both on the index level and on the -cluster level. Lets start with an example of setting it on the cluster -level: - -Lets say we have 4 nodes, each has specific attribute called `tag` -associated with it (the name of the attribute can be any name). Each -node has a specific value associated with `tag`. Node 1 has a setting -`node.tag: value1`, Node 2 a setting of `node.tag: value2`, and so on. - -We can create an index that will only deploy on nodes that have `tag` -set to `value1` and `value2` by setting -`index.routing.allocation.include.tag` to `value1,value2`. For example: - -[source,js] --------------------------------------------------- -curl -XPUT localhost:9200/test/_settings -d '{ - "index.routing.allocation.include.tag" : "value1,value2" -}' --------------------------------------------------- - -On the other hand, we can create an index that will be deployed on all -nodes except for nodes with a `tag` of value `value3` by setting -`index.routing.allocation.exclude.tag` to `value3`. 
For example: - -[source,js] --------------------------------------------------- -curl -XPUT localhost:9200/test/_settings -d '{ - "index.routing.allocation.exclude.tag" : "value3" -}' --------------------------------------------------- - -`index.routing.allocation.require.*` can be used to -specify a number of rules, all of which MUST match in order for a shard -to be allocated to a node. This is in contrast to `include` which will -include a node if ANY rule matches. - -The `include`, `exclude` and `require` values can have generic simple -matching wildcards, for example, `value1*`. Additionally, special attribute -names called `_ip`, `_name`, `_id` and `_host` can be used to match by node -ip address, name, id or host name, respectively. - -Obviously a node can have several attributes associated with it, and -both the attribute name and value are controlled in the setting. For -example, here is a sample of several node configurations: - -[source,js] --------------------------------------------------- -node.group1: group1_value1 -node.group2: group2_value4 --------------------------------------------------- - -In the same manner, `include`, `exclude` and `require` can work against -several attributes, for example: - -[source,js] --------------------------------------------------- -curl -XPUT localhost:9200/test/_settings -d '{ - "index.routing.allocation.include.group1" : "xxx" - "index.routing.allocation.include.group2" : "yyy", - "index.routing.allocation.exclude.group3" : "zzz", - "index.routing.allocation.require.group4" : "aaa", -}' --------------------------------------------------- - -The provided settings can also be updated in real time using the update -settings API, allowing to "move" indices (shards) around in realtime. - -Cluster wide filtering can also be defined, and be updated in real time -using the cluster update settings API. This setting can come in handy -for things like decommissioning nodes (even if the replica count is set -to 0). 
Here is a sample of how to decommission a node based on `_ip` -address: - -[source,js] --------------------------------------------------- -curl -XPUT localhost:9200/_cluster/settings -d '{ - "transient" : { - "cluster.routing.allocation.exclude._ip" : "10.0.0.1" - } -}' --------------------------------------------------- +Shard allocation filtering allows you to specify which nodes are allowed +to host the shards of a particular index. + +NOTE: The per-index shard allocation filters explained below work in +conjunction with the cluster-wide allocation filters explained in +<>. + +It is possible to assign arbitrary metadata attributes to each node at +startup. For instance, nodes could be assigned a `rack` and a `size` +attribute as follows: + +[source,sh] +------------------------ +bin/elasticsearch --node.rack rack1 --node.size big <1> +------------------------ +<1> These attribute settings can also be specified in the `elasticsearch.yml` config file. + +These metadata attributes can be used with the +`index.routing.allocation.*` settings to allocate an index to a particular +group of nodes. For instance, we can move the index `test` to either `big` or +`medium` nodes as follows: + +[source,json] +------------------------ +PUT test/_settings +{ + "index.routing.allocation.include.size": "big,medium" +} +------------------------ +// AUTOSENSE + +Alternatively, we can move the index `test` away from the `small` nodes with +an `exclude` rule: + +[source,json] +------------------------ +PUT test/_settings +{ + "index.routing.allocation.exclude.size": "small" +} +------------------------ +// AUTOSENSE + +Multiple rules can be specified, in which case all conditions must be +satisfied. 
For instance, we could move the index `test` to `big` nodes in +`rack1` with the following: + +[source,json] +------------------------ +PUT test/_settings +{ + "index.routing.allocation.include.size": "big", + "index.routing.allocation.include.rack": "rack1" +} +------------------------ +// AUTOSENSE + +NOTE: If some conditions cannot be satisfied then shards will not be moved. + +The following settings are _dynamic_, allowing live indices to be moved from +one set of nodes to another: + +`index.routing.allocation.include.{attribute}`:: + + Assign the index to a node whose `{attribute}` has at least one of the + comma-separated values. + +`index.routing.allocation.require.{attribute}`:: + + Assign the index to a node whose `{attribute}` has _all_ of the + comma-separated values. + +`index.routing.allocation.exclude.{attribute}`:: + + Assign the index to a node whose `{attribute}` has _none_ of the + comma-separated values. + +These special attributes are also supported: + +[horizontal] +`_name`:: Match nodes by node name +`_ip`:: Match nodes by IP address (the IP address associated with the hostname) +`_host`:: Match nodes by hostname + +All attribute values can be specified with wildcards, e.g.: + +[source,json] +------------------------ +PUT test/_settings +{ + "index.routing.allocation.include._ip": "192.168.2.*" +} +------------------------ +// AUTOSENSE [float] === Total Shards Per Node -The `index.routing.allocation.total_shards_per_node` setting allows to -control how many total shards (replicas and primaries) for an index will be allocated per node. -It can be dynamically set on a live index using the update index -settings API. +The cluster-level shard allocator tries to spread the shards of a single index +across as many nodes as possible. However, depending on how many shards and +indices you have, and how big they are, it may not always be possible to spread +shards evenly. 
+ +The following _dynamic_ setting allows you to specify a hard limit on the total +number of shards from a single index allowed per node: + +`index.routing.allocation.total_shards_per_node`:: + + The maximum number of shards (replicas and primaries) that will be + allocated to a single node. Defaults to unbounded. + +[WARNING] +======================================= +This setting imposes a hard limit which can result in some shards not +being allocated. + +Use with caution. +======================================= + + -[float] -[[disk]] -=== Disk-based Shard Allocation - -disk based shard allocation is enabled from version 1.3.0 onward - -Elasticsearch can be configured to prevent shard -allocation on nodes depending on disk usage for the node. This -functionality is enabled by default, and can be changed either in the -configuration file, or dynamically using: - -[source,js] --------------------------------------------------- -curl -XPUT localhost:9200/_cluster/settings -d '{ - "transient" : { - "cluster.routing.allocation.disk.threshold_enabled" : false - } -}' --------------------------------------------------- - -Once enabled, Elasticsearch uses two watermarks to decide whether -shards should be allocated or can remain on the node. - -`cluster.routing.allocation.disk.watermark.low` controls the low -watermark for disk usage. It defaults to 85%, meaning ES will not -allocate new shards to nodes once they have more than 85% disk -used. It can also be set to an absolute byte value (like 500mb) to -prevent ES from allocating shards if less than the configured amount -of space is available. - -`cluster.routing.allocation.disk.watermark.high` controls the high -watermark. It defaults to 90%, meaning ES will attempt to relocate -shards to another node if the node disk usage rises above 90%. It can -also be set to an absolute byte value (similar to the low watermark) -to relocate shards once less than the configured amount of space is -available on the node. 
- -NOTE: Percentage values refer to used disk space, while byte values refer to -free disk space. This can be confusing, since it flips the meaning of -high and low. For example, it makes sense to set the low watermark to 10gb -and the high watermark to 5gb, but not the other way around. - -Both watermark settings can be changed dynamically using the cluster -settings API. By default, Elasticsearch will retrieve information -about the disk usage of the nodes every 30 seconds. This can also be -changed by setting the `cluster.info.update.interval` setting. - -An example of updating the low watermark to no more than 80% of the disk size, a -high watermark of at least 50 gigabytes free, and updating the information about -the cluster every minute: - -[source,js] --------------------------------------------------- -curl -XPUT localhost:9200/_cluster/settings -d '{ - "transient" : { - "cluster.routing.allocation.disk.watermark.low" : "80%", - "cluster.routing.allocation.disk.watermark.high" : "50gb", - "cluster.info.update.interval" : "1m" - } -}' --------------------------------------------------- - -By default, Elasticsearch will take into account shards that are currently being -relocated to the target node when computing a node's disk usage. This can be -changed by setting the `cluster.routing.allocation.disk.include_relocations` -setting to `false` (defaults to `true`). Taking relocating shards' sizes into -account may, however, mean that the disk usage for a node is incorrectly -estimated on the high side, since the relocation could be 90% complete and a -recently retrieved disk usage would include the total size of the relocating -shard as well as the space already used by the running relocation. 
diff --git a/docs/reference/index-modules/analysis.asciidoc b/docs/reference/index-modules/analysis.asciidoc index 3801f0f2802c1..93f569b962aaf 100644 --- a/docs/reference/index-modules/analysis.asciidoc +++ b/docs/reference/index-modules/analysis.asciidoc @@ -1,18 +1,12 @@ [[index-modules-analysis]] == Analysis -The index analysis module acts as a configurable registry of Analyzers -that can be used in order to break down indexed (analyzed) fields when a -document is indexed as well as to process query strings. It maps to the Lucene -`Analyzer`. +The index analysis module acts as a configurable registry of _analyzers_ +that can be used in order to convert a string field into individual terms +which are: -Analyzers are (generally) composed of a single `Tokenizer` and zero or -more `TokenFilters`. A set of `CharFilters` can be associated with an -analyzer to process the characters prior to other analysis steps. The -analysis module allows one to register `TokenFilters`, `Tokenizers` and -`Analyzers` under logical names that can then be referenced either in -mapping definitions or in certain APIs. The Analysis module -automatically registers (*if not explicitly defined*) built in -analyzers, token filters, and tokenizers. +* added to the inverted index in order to make the document searchable +* used by high level queries such as the <> + to generate search terms. -See <> for configuration details. \ No newline at end of file +See <> for configuration details. diff --git a/docs/reference/index-modules/cache.asciidoc b/docs/reference/index-modules/cache.asciidoc deleted file mode 100644 index 2b6334cfd67b5..0000000000000 --- a/docs/reference/index-modules/cache.asciidoc +++ /dev/null @@ -1,33 +0,0 @@ -[[index-modules-cache]] -== Cache - -There are different caching inner modules associated with an index. They -include `filter` and others. - -[float] -[[filter]] -=== Filter Cache - -The filter cache is responsible for caching the results of filters (used -in the query). 
The default implementation of a filter cache (and the one -recommended to use in almost all cases) is the `node` filter cache type. - -[float] -[[node-filter]] -==== Node Filter Cache - -The `node` filter cache may be configured to use either a percentage of -the total memory allocated to the process or a specific amount of -memory. All shards present on a node share a single node cache (thats -why its called `node`). The cache implements an LRU eviction policy: -when a cache becomes full, the least recently used data is evicted to -make way for new data. - -The setting that allows one to control the memory size for the filter -cache is `indices.cache.filter.size`, which defaults to `10%`. *Note*, -this is *not* an index level setting but a node level setting (can be -configured in the node configuration). - -`indices.cache.filter.size` can accept either a percentage value, like -`30%`, or an exact value, like `512mb`. - diff --git a/docs/reference/index-modules/mapper.asciidoc b/docs/reference/index-modules/mapper.asciidoc index baca199efaed4..9b55f630f7160 100644 --- a/docs/reference/index-modules/mapper.asciidoc +++ b/docs/reference/index-modules/mapper.asciidoc @@ -49,5 +49,10 @@ automatically. The default mapping can be overridden by specifying the `_default_` type when creating a new index. -Dynamic creation of mappings for unmapped types can be completely -disabled by setting `index.mapper.dynamic` to `false`. +[float] +=== Mapper settings + +`index.mapper.dynamic` (_static_):: + + Dynamic creation of mappings for unmapped types can be completely + disabled by setting `index.mapper.dynamic` to `false`. diff --git a/docs/reference/index-modules/merge.asciidoc b/docs/reference/index-modules/merge.asciidoc index f9f468221aba5..ac6de1d3a9e3b 100644 --- a/docs/reference/index-modules/merge.asciidoc +++ b/docs/reference/index-modules/merge.asciidoc @@ -14,6 +14,11 @@ number of segments per tier. 
The merge policy is able to merge non-adjacent segments, and separates how many segments are merged at once from how many segments are allowed per tier. It also does not over-merge (i.e., cascade merges). +[float] +[[merge-settings]] +=== Merge policy settings + +All merge policy settings are _dynamic_ and can be updated on a live index. The merge policy has the following settings: `index.merge.policy.expunge_deletes_allowed`:: @@ -80,30 +85,29 @@ possibly either increase the `max_merged_segment` or issue an optimize call for the index (try and aim to issue it on a low traffic time). [float] -[[scheduling]] -=== Scheduling +[[merge-scheduling]] +=== Merge scheduling The merge scheduler (ConcurrentMergeScheduler) controls the execution of merge operations once they are needed (according to the merge policy). Merges run in separate threads, and when the maximum number of threads is reached, -further merges will wait until a merge thread becomes available. The merge -scheduler supports this setting: +further merges will wait until a merge thread becomes available. + +The merge scheduler supports the following _dynamic_ settings: `index.merge.scheduler.max_thread_count`:: -The maximum number of threads that may be merging at once. Defaults to -`Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2))` -which works well for a good solid-state-disk (SSD). If your index is on -spinning platter drives instead, decrease this to 1. + The maximum number of threads that may be merging at once. Defaults to + `Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2))` + which works well for a good solid-state-disk (SSD). If your index is on + spinning platter drives instead, decrease this to 1. `index.merge.scheduler.auto_throttle`:: -If this is true (the default), then the merge scheduler will -rate-limit IO (writes) for merges to an adaptive value depending on -how many merges are requested over time. 
An application with a low -indexing rate that unluckily suddenly requires a large merge will see -that merge aggressively throttled, while an application doing heavy -indexing will see the throttle move higher to allow merges to keep up -with ongoing indexing. This is a dynamic setting (you can <>). + If this is true (the default), then the merge scheduler will rate-limit IO + (writes) for merges to an adaptive value depending on how many merges are + requested over time. An application with a low indexing rate that + unluckily suddenly requires a large merge will see that merge aggressively + throttled, while an application doing heavy indexing will see the throttle + move higher to allow merges to keep up with ongoing indexing. diff --git a/docs/reference/index-modules/slowlog.asciidoc b/docs/reference/index-modules/slowlog.asciidoc index a5d0b4baafaa9..15c1b217577af 100644 --- a/docs/reference/index-modules/slowlog.asciidoc +++ b/docs/reference/index-modules/slowlog.asciidoc @@ -1,29 +1,31 @@ [[index-modules-slowlog]] -== Index Slow Log +== Slow Log [float] [[search-slow-log]] === Search Slow Log Shard level slow search log allows to log slow search (query and fetch -executions) into a dedicated log file. +phases) into a dedicated log file. 
Thresholds can be set for both the query phase of the execution, and fetch phase, here is a sample: -[source,js] +[source,yaml] -------------------------------------------------- -#index.search.slowlog.threshold.query.warn: 10s -#index.search.slowlog.threshold.query.info: 5s -#index.search.slowlog.threshold.query.debug: 2s -#index.search.slowlog.threshold.query.trace: 500ms - -#index.search.slowlog.threshold.fetch.warn: 1s -#index.search.slowlog.threshold.fetch.info: 800ms -#index.search.slowlog.threshold.fetch.debug: 500ms -#index.search.slowlog.threshold.fetch.trace: 200ms +index.search.slowlog.threshold.query.warn: 10s +index.search.slowlog.threshold.query.info: 5s +index.search.slowlog.threshold.query.debug: 2s +index.search.slowlog.threshold.query.trace: 500ms + +index.search.slowlog.threshold.fetch.warn: 1s +index.search.slowlog.threshold.fetch.info: 800ms +index.search.slowlog.threshold.fetch.debug: 500ms +index.search.slowlog.threshold.fetch.trace: 200ms -------------------------------------------------- +All of the above settings are _dynamic_ and can be set per-index. + By default, none are enabled (set to `-1`). Levels (`warn`, `info`, `debug`, `trace`) allow to control under which logging level the log will be logged. Not all are required to be configured (for example, only @@ -37,14 +39,10 @@ execute. Some of the benefits of shard level logging is the association of the actual execution on the specific machine, compared with request level. -All settings are index level settings (and each index can have different -values for it), and can be changed in runtime using the index update -settings API. - The logging file is configured by default using the following configuration (found in `logging.yml`): -[source,js] +[source,yaml] -------------------------------------------------- index_search_slow_log_file: type: dailyRollingFile @@ -64,18 +62,20 @@ log. The log file is ends with `_index_indexing_slowlog.log`. 
Log and the thresholds are configured in the elasticsearch.yml file in the same way as the search slowlog. Index slowlog sample: -[source,js] +[source,yaml] -------------------------------------------------- -#index.indexing.slowlog.threshold.index.warn: 10s -#index.indexing.slowlog.threshold.index.info: 5s -#index.indexing.slowlog.threshold.index.debug: 2s -#index.indexing.slowlog.threshold.index.trace: 500ms +index.indexing.slowlog.threshold.index.warn: 10s +index.indexing.slowlog.threshold.index.info: 5s +index.indexing.slowlog.threshold.index.debug: 2s +index.indexing.slowlog.threshold.index.trace: 500ms -------------------------------------------------- +All of the above settings are _dynamic_ and can be set per-index. + The index slow log file is configured by default in the `logging.yml` file: -[source,js] +[source,yaml] -------------------------------------------------- index_indexing_slow_log_file: type: dailyRollingFile diff --git a/docs/reference/index-modules/store.asciidoc b/docs/reference/index-modules/store.asciidoc index 12fcf0c350950..c603a00d89f5b 100644 --- a/docs/reference/index-modules/store.asciidoc +++ b/docs/reference/index-modules/store.asciidoc @@ -1,34 +1,16 @@ [[index-modules-store]] == Store -The store module allows you to control how index data is stored. - -The index can either be stored in-memory (no persistence) or on-disk -(the default). In-memory indices provide better performance at the cost -of limiting the index size to the amount of available physical memory. - -When using a local gateway (the default), file system storage with *no* -in memory storage is required to maintain index consistency. This is -required since the local gateway constructs its state from the local -index state of each node. - -Another important aspect of memory based storage is the fact that -Elasticsearch supports storing the index in memory *outside of the JVM -heap space* using the "Memory" (see below) storage type. 
It translates -to the fact that there is no need for extra large JVM heaps (with their -own consequences) for storing the index in memory. - -experimental[All of the settings exposed in the `store` module are expert only and may be removed in the future] +The store module allows you to control how index data is stored and accessed on disk. [float] [[file-system]] === File system storage types -File system based storage is the default storage used. There are -different implementations or _storage types_. The best one for the -operating environment will be automatically chosen: `mmapfs` on -Windows 64bit, `simplefs` on Windows 32bit, and `default` -(hybrid `niofs` and `mmapfs`) for the rest. +There are different file system implementations or _storage types_. The best +one for the operating environment will be automatically chosen: `mmapfs` on +Windows 64bit, `simplefs` on Windows 32bit, and `default` (hybrid `niofs` and +`mmapfs`) for the rest. This can be overridden for all indices by adding this to the `config/elasticsearch.yml` file: @@ -38,57 +20,53 @@ This can be overridden for all indices by adding this to the index.store.type: niofs --------------------------------- -It can also be set on a per-index basis at index creation time: +It is a _static_ setting that can be set on a per-index basis at index +creation time: [source,json] --------------------------------- -curl -XPUT localhost:9200/my_index -d '{ - "settings": { - "index.store.type": "niofs" - } -}'; +PUT /my_index +{ + "settings": { + "index.store.type": "niofs" + } +} --------------------------------- +experimental[This is an expert-only setting and may be removed in the future] + The following sections lists all the different storage types supported. 
-[float] -[[simplefs]] -==== Simple FS +[[simplefs]]`simplefs`:: -The `simplefs` type is a straightforward implementation of file system +The Simple FS type is a straightforward implementation of file system storage (maps to Lucene `SimpleFsDirectory`) using a random access file. This implementation has poor concurrent performance (multiple threads will bottleneck). It is usually better to use the `niofs` when you need index persistence. -[float] -[[niofs]] -==== NIO FS +[[niofs]]`niofs`:: -The `niofs` type stores the shard index on the file system (maps to +The NIO FS type stores the shard index on the file system (maps to Lucene `NIOFSDirectory`) using NIO. It allows multiple threads to read from the same file concurrently. It is not recommended on Windows because of a bug in the SUN Java implementation. -[[mmapfs]] -[float] -==== MMap FS +[[mmapfs]]`mmapfs`:: -The `mmapfs` type stores the shard index on the file system (maps to +The MMap FS type stores the shard index on the file system (maps to Lucene `MMapDirectory`) by mapping a file into memory (mmap). Memory mapping uses up a portion of the virtual memory address space in your process equal to the size of the file being mapped. Before using this -class, be sure your have plenty of virtual address space. -See <> +class, be sure you have allowed plenty of +<>. -[[default_fs]] -[float] -==== Hybrid MMap / NIO FS +[[default_fs]]`default_fs`:: -The `default` type stores the shard index on the file system depending on -the file type by mapping a file into memory (mmap) or using Java NIO. Currently -only the Lucene term dictionary and doc values files are memory mapped to reduce -the impact on the operating system. All other files are opened using Lucene `NIOFSDirectory`. -Address space settings (<>) might also apply if your term +The `default` type is a hybrid of NIO FS and MMapFS, which chooses the best +file system for each type of file. 
Currently only the Lucene term dictionary +and doc values files are memory mapped to reduce the impact on the operating +system. All other files are opened using Lucene `NIOFSDirectory`. Address +space settings (<>) might also apply if your term dictionaries are large. diff --git a/docs/reference/indices/stats.asciidoc b/docs/reference/indices/stats.asciidoc index 2d1368ecb9941..6f6cd4a7cebaa 100644 --- a/docs/reference/indices/stats.asciidoc +++ b/docs/reference/indices/stats.asciidoc @@ -43,7 +43,7 @@ specified as well in the URI. Those stats can be any of: `fielddata`:: Fielddata statistics. `flush`:: Flush statistics. `merge`:: Merge statistics. -`query_cache`:: <> statistics. +`query_cache`:: <> statistics. `refresh`:: Refresh statistics. `suggest`:: Suggest statistics. `warmer`:: Warmer statistics. @@ -80,7 +80,7 @@ curl 'localhost:9200/_stats/search?groups=group1,group2 -------------------------------------------------- The stats returned are aggregated on the index level, with -`primaries` and `total` aggregations, where `primaries` are the values for only the +`primaries` and `total` aggregations, where `primaries` are the values for only the primary shards, and `total` are the cumulated values for both primary and replica shards. In order to get back shard level stats, set the `level` parameter to `shards`. diff --git a/docs/reference/indices/update-settings.asciidoc b/docs/reference/indices/update-settings.asciidoc index d4888103eb2a5..d5d00047e9c8d 100644 --- a/docs/reference/indices/update-settings.asciidoc +++ b/docs/reference/indices/update-settings.asciidoc @@ -29,130 +29,8 @@ curl -XPUT 'localhost:9200/my_index/_settings' -d ' }' -------------------------------------------------- -[WARNING] -======================== -When changing the number of replicas the index needs to be open. Changing -the number of replicas on a closed index might prevent the index to be opened correctly again. 
-======================== - -Below is the list of settings that can be changed using the update -settings API: - -`index.number_of_replicas`:: - The number of replicas each shard has. - -`index.auto_expand_replicas` (string):: - Set to a dash delimited lower and upper bound (e.g. `0-5`) - or one may use `all` as the upper bound (e.g. `0-all`), or `false` to disable it. - -`index.blocks.read_only`:: - Set to `true` to have the index read only, `false` to allow writes - and metadata changes. - -`index.blocks.read`:: - Set to `true` to disable read operations against the index. - -`index.blocks.write`:: - Set to `true` to disable write operations against the index. - -`index.blocks.metadata`:: - Set to `true` to disable metadata operations against the index. - -`index.refresh_interval`:: - The async refresh interval of a shard. - -`index.translog.flush_threshold_ops`:: - When to flush based on operations. - -`index.translog.flush_threshold_size`:: - When to flush based on translog (bytes) size. - -`index.translog.flush_threshold_period`:: - When to flush based on a period of not flushing. - -`index.translog.disable_flush`:: - Disables flushing. Note, should be set for a short - interval and then enabled. - -`index.cache.filter.max_size`:: - The maximum size of filter cache (per segment in shard). - Set to `-1` to disable. - -`index.cache.filter.expire`:: - experimental[] The expire after access time for filter cache. - Set to `-1` to disable. - -`index.gateway.snapshot_interval`:: - experimental[] The gateway snapshot interval (only applies to shared - gateways). Defaults to 10s. - -<>:: - All the settings for the merge policy currently configured. - A different merge policy can't be set. - -`index.merge.scheduler.*`:: - experimental[] All the settings for the merge scheduler. - -`index.routing.allocation.include.*`:: - A node matching any rule will be allowed to host shards from the index. 
- -`index.routing.allocation.exclude.*`:: - A node matching any rule will NOT be allowed to host shards from the index. - -`index.routing.allocation.require.*`:: - Only nodes matching all rules will be allowed to host shards from the index. - -`index.routing.allocation.disable_allocation`:: - Disable allocation. Defaults to `false`. Deprecated in favour for `index.routing.allocation.enable`. - -`index.routing.allocation.disable_new_allocation`:: - Disable new allocation. Defaults to `false`. Deprecated in favour for `index.routing.allocation.enable`. - -`index.routing.allocation.disable_replica_allocation`:: - Disable replica allocation. Defaults to `false`. Deprecated in favour for `index.routing.allocation.enable`. - -`index.routing.allocation.enable`:: - Enables shard allocation for a specific index. It can be set to: - * `all` (default) - Allows shard allocation for all shards. - * `primaries` - Allows shard allocation only for primary shards. - * `new_primaries` - Allows shard allocation only for primary shards for new indices. - * `none` - No shard allocation is allowed. - -`index.routing.rebalance.enable`:: - Enables shard rebalancing for a specific index. It can be set to: - * `all` (default) - Allows shard rebalancing for all shards. - * `primaries` - Allows shard rebalancing only for primary shards. - * `replicas` - Allows shard rebalancing only for replica shards. - * `none` - No shard rebalancing is allowed. - -`index.routing.allocation.total_shards_per_node`:: - Controls the total number of shards (replicas and primaries) allowed to be allocated on a single node. Defaults to unbounded (`-1`). - -`index.recovery.initial_shards`:: - When using local gateway a particular shard is recovered only if there can be allocated quorum shards in the cluster. It can be set to: - * `quorum` (default) - * `quorum-1` (or `half`) - * `full` - * `full-1`. - * Number values are also supported, e.g. `1`. 
- -`index.gc_deletes`:: - experimental[] - -`index.ttl.disable_purge`:: - experimental[] Disables temporarily the purge of expired docs. - -<>:: - All the settings for the store level throttling policy currently configured. - -`index.translog.fs.type`:: - experimental[] Either `simple` or `buffered` (default). - -<>:: - All the settings for slow log. - -`index.warmer.enabled`:: - See <>. Defaults to `true`. +The list of per-index settings which can be updated dynamically on live +indices can be found in <>. [float] [[bulk]] diff --git a/docs/reference/mapping.asciidoc b/docs/reference/mapping.asciidoc index 7e11fe658a228..ce514e5a1ed7e 100644 --- a/docs/reference/mapping.asciidoc +++ b/docs/reference/mapping.asciidoc @@ -56,10 +56,10 @@ value as a numeric type). The `index.mapping.coerce` global setting can be set on the index level to coerce numeric content globally across all -mapping types (The default setting is true and coercions attempted are +mapping types (The default setting is true and coercions attempted are to convert strings with numbers into numeric types and also numeric values with fractions to any integer/short/long values minus the fraction part). -When the permitted conversions fail in their attempts, the value is considered +When the permitted conversions fail in their attempts, the value is considered malformed and the ignore_malformed setting dictates what will happen next. 
-- @@ -69,6 +69,8 @@ include::mapping/types.asciidoc[] include::mapping/date-format.asciidoc[] +include::mapping/fielddata_formats.asciidoc[] + include::mapping/dynamic-mapping.asciidoc[] include::mapping/meta.asciidoc[] diff --git a/docs/reference/index-modules/fielddata.asciidoc b/docs/reference/mapping/fielddata_formats.asciidoc similarity index 72% rename from docs/reference/index-modules/fielddata.asciidoc rename to docs/reference/mapping/fielddata_formats.asciidoc index b54c45a04e09c..eda3566258e6f 100644 --- a/docs/reference/index-modules/fielddata.asciidoc +++ b/docs/reference/mapping/fielddata_formats.asciidoc @@ -1,87 +1,5 @@ -[[index-modules-fielddata]] -== Field data - -The field data cache is used mainly when sorting on or computing aggregations -on a field. It loads all the field values to memory in order to provide fast -document based access to those values. The field data cache can be -expensive to build for a field, so its recommended to have enough memory -to allocate it, and to keep it loaded. - -The amount of memory used for the field -data cache can be controlled using `indices.fielddata.cache.size`. Note: -reloading the field data which does not fit into your cache will be expensive -and perform poorly. - -[cols="<,<",options="header",] -|======================================================================= -|Setting |Description -|`indices.fielddata.cache.size` |The max size of the field data cache, -eg `30%` of node heap space, or an absolute value, eg `12GB`. Defaults -to unbounded. - -|`indices.fielddata.cache.expire` |experimental[] A time based setting that expires -field data after a certain time of inactivity. Defaults to `-1`. For -example, can be set to `5m` for a 5 minute expiry. -|======================================================================= - -[float] -[[circuit-breaker]] -=== Circuit Breaker - -Elasticsearch contains multiple circuit breakers used to prevent operations from -causing an OutOfMemoryError. 
Each breaker specifies a limit for how much memory -it can use. Additionally, there is a parent-level breaker that specifies the -total amount of memory that can be used across all breakers. - -The parent-level breaker can be configured with the following setting: - -`indices.breaker.total.limit`:: - Starting limit for overall parent breaker, defaults to 70% of JVM heap - -All circuit breaker settings can be changed dynamically using the cluster update -settings API. - -[float] -[[fielddata-circuit-breaker]] -==== Field data circuit breaker -The field data circuit breaker allows Elasticsearch to estimate the amount of -memory a field will require to be loaded into memory. It can then prevent the -field data loading by raising an exception. By default the limit is configured -to 60% of the maximum JVM heap. It can be configured with the following -parameters: - -`indices.breaker.fielddata.limit`:: - Limit for fielddata breaker, defaults to 60% of JVM heap - -`indices.breaker.fielddata.overhead`:: - A constant that all field data estimations are multiplied with to determine a - final estimation. Defaults to 1.03 - -[float] -[[request-circuit-breaker]] -==== Request circuit breaker - -The request circuit breaker allows Elasticsearch to prevent per-request data -structures (for example, memory used for calculating aggregations during a -request) from exceeding a certain amount of memory. - -`indices.breaker.request.limit`:: - Limit for request breaker, defaults to 40% of JVM heap - -`indices.breaker.request.overhead`:: - A constant that all request estimations are multiplied with to determine a - final estimation. Defaults to 1 - -[float] -[[fielddata-monitoring]] -=== Monitoring field data - -You can monitor memory usage for field data as well as the field data circuit -breaker using -<> - [[fielddata-formats]] -== Field data formats +== Fielddata formats The field data format controls how field data should be stored. 
@@ -111,7 +29,7 @@ It is possible to change the field data format (and the field data settings in general) on a live index by using the update mapping API. [float] -==== String field data types +=== String field data types `paged_bytes` (default on analyzed string fields):: Stores unique terms sequentially in a large buffer and maps documents to @@ -123,7 +41,7 @@ in general) on a live index by using the update mapping API. `not_analyzed`). [float] -==== Numeric field data types +=== Numeric field data types `array`:: Stores field values in memory using arrays. @@ -132,7 +50,7 @@ in general) on a live index by using the update mapping API. Computes and stores field data data-structures on disk at indexing time. [float] -==== Geo point field data types +=== Geo point field data types `array`:: Stores latitudes and longitudes in arrays. @@ -142,7 +60,7 @@ in general) on a live index by using the update mapping API. [float] [[global-ordinals]] -==== Global ordinals +=== Global ordinals Global ordinals is a data-structure on top of field data, that maintains an incremental numbering for all the terms in field data in a lexicographic order. diff --git a/docs/reference/mapping/types/core-types.asciidoc b/docs/reference/mapping/types/core-types.asciidoc index 76d2728a22f29..5d6a8e06bbb9b 100644 --- a/docs/reference/mapping/types/core-types.asciidoc +++ b/docs/reference/mapping/types/core-types.asciidoc @@ -200,7 +200,7 @@ PUT my_index/_mapping/my_type Please however note that norms won't be removed instantly, but will be removed as old segments are merged into new segments as you continue indexing new documents. -Any score computation on a field that has had +Any score computation on a field that has had norms removed might return inconsistent results since some documents won't have norms anymore while other documents might still have norms. 
@@ -484,7 +484,7 @@ binary type: It is possible to control which field values are loaded into memory, which is particularly useful for aggregations on string fields, using fielddata filters, which are explained in detail in the -<> section. +<> section. Fielddata filters can exclude terms which do not match a regex, or which don't fall between a `min` and `max` frequency range: diff --git a/docs/reference/modules.asciidoc b/docs/reference/modules.asciidoc index 3fa55cf364daf..9f175cc2fa2a1 100644 --- a/docs/reference/modules.asciidoc +++ b/docs/reference/modules.asciidoc @@ -1,6 +1,75 @@ [[modules]] = Modules +[partintro] +-- +This section contains modules responsible for various aspects of the functionality in Elasticsearch. Each module has settings which may be: + +_static_:: + +These settings must be set at the node level, either in the +`elasticsearch.yml` file, or as an environment variable or on the command line +when starting a node. They must be set on every relevant node in the cluster. + +_dynamic_:: + +These settings can be dynamically updated on a live cluster with the +<> API. + +The modules in this section are: + +<>:: + + Settings to control where, when, and how shards are allocated to nodes. + +<>:: + + How nodes discover each other to form a cluster. + +<>:: + + How many nodes need to join the cluster before recovery can start. + +<>:: + + Settings to control the HTTP REST interface. + +<>:: + + Global index-related settings. + +<>:: + + Controls default network settings. + +<>:: + + A Java node client joins the cluster, but doesn't hold data or act as a master node. + +<>:: + + Using plugins to extend Elasticsearch. + +<>:: + + Custom scripting available in Lucene Expressions, Groovy, Python, and + Javascript. + +<>:: + + Backup your data with snapshot/restore. + +<>:: + + Information about the dedicated thread pools used in Elasticsearch. 
+ +<>:: + + Configure the transport networking layer, used internally by Elasticsearch + to communicate between nodes. +-- + + include::modules/cluster.asciidoc[] include::modules/discovery.asciidoc[] @@ -15,19 +84,20 @@ include::modules/network.asciidoc[] include::modules/node.asciidoc[] -include::modules/tribe.asciidoc[] - include::modules/plugins.asciidoc[] include::modules/scripting.asciidoc[] include::modules/advanced-scripting.asciidoc[] +include::modules/snapshots.asciidoc[] + include::modules/threadpool.asciidoc[] include::modules/transport.asciidoc[] -include::modules/snapshots.asciidoc[] +include::modules/tribe.asciidoc[] + diff --git a/docs/reference/modules/advanced-scripting.asciidoc b/docs/reference/modules/advanced-scripting.asciidoc index ba96a6ec7ab34..b4e907cb3136c 100644 --- a/docs/reference/modules/advanced-scripting.asciidoc +++ b/docs/reference/modules/advanced-scripting.asciidoc @@ -1,5 +1,5 @@ [[modules-advanced-scripting]] -== Text scoring in scripts +=== Text scoring in scripts Text features, such as term or document frequency for a specific term can be accessed in scripts (see <> ) with the `_index` variable. This can be useful if, for example, you want to implement your own scoring model using for example a script inside a <>. @@ -7,7 +7,7 @@ Statistics over the document collection are computed *per shard*, not per index. [float] -=== Nomenclature: +==== Nomenclature: [horizontal] @@ -33,7 +33,7 @@ depending on the shard the current document resides in. [float] -=== Shard statistics: +==== Shard statistics: `_index.numDocs()`:: @@ -49,7 +49,7 @@ depending on the shard the current document resides in. [float] -=== Field statistics: +==== Field statistics: Field statistics can be accessed with a subscript operator like this: `_index['FIELD']`. @@ -74,7 +74,7 @@ depending on the shard the current document resides in. The number of terms in a field cannot be accessed using the `_index` variable. See <> on how to do that. 
[float] -=== Term statistics: +==== Term statistics: Term statistics for a field can be accessed with a subscript operator like this: `_index['FIELD']['TERM']`. This will never return null, even if term or field does not exist. @@ -101,7 +101,7 @@ affect is your set the `index_options` to `docs` (see <>). To access them, call `_index.termVectors()` to get a diff --git a/docs/reference/modules/cluster.asciidoc b/docs/reference/modules/cluster.asciidoc index 25a88b2eeebe7..37a2927b67d7f 100644 --- a/docs/reference/modules/cluster.asciidoc +++ b/docs/reference/modules/cluster.asciidoc @@ -1,253 +1,36 @@ [[modules-cluster]] == Cluster -[float] -[[shards-allocation]] -=== Shards Allocation +One of the main roles of the master is to decide which shards to allocate to +which nodes, and when to move shards between nodes in order to rebalance the +cluster. -Shards allocation is the process of allocating shards to nodes. This can -happen during initial recovery, replica allocation, rebalancing, or -handling nodes being added or removed. +There are a number of settings available to control the shard allocation process: -The following settings may be used: +* <> lists the settings to control the allocation and + rebalancing operations. -`cluster.routing.allocation.allow_rebalance`:: - Allow to control when rebalancing will happen based on the total - state of all the indices shards in the cluster. `always`, - `indices_primaries_active`, and `indices_all_active` are allowed, - defaulting to `indices_all_active` to reduce chatter during - initial recovery. +* <> explains how Elasticsearch takes available disk space + into account, and the related settings. +* <> and <> control how shards can + be distributed across different racks or availability zones. -`cluster.routing.allocation.cluster_concurrent_rebalance`:: - Allow to control how many concurrent rebalancing of shards are - allowed cluster wide, and default it to `2`.
+* <> allows certain nodes or groups of nodes to be excluded + from allocation so that they can be decommissioned. +Besides these, there are a few other <>. -`cluster.routing.allocation.node_initial_primaries_recoveries`:: - Allow to control specifically the number of initial recoveries - of primaries that are allowed per node. Since most times local - gateway is used, those should be fast and we can handle more of - those per node without creating load. Defaults to `4`. +All of the settings in this section are _dynamic_ settings which can be +updated on a live cluster with the +<> API. +include::cluster/shards_allocation.asciidoc[] -`cluster.routing.allocation.node_concurrent_recoveries`:: - How many concurrent recoveries are allowed to happen on a node. - Defaults to `2`. +include::cluster/disk_allocator.asciidoc[] -`cluster.routing.allocation.enable`:: +include::cluster/allocation_awareness.asciidoc[] -Controls shard allocation for all indices, by allowing specific -kinds of shard to be allocated. -+ --- -Can be set to: +include::cluster/allocation_filtering.asciidoc[] -* `all` - (default) Allows shard allocation for all kinds of shards. -* `primaries` - Allows shard allocation only for primary shards. -* `new_primaries` - Allows shard allocation only for primary shards for new indices. -* `none` - No shard allocations of any kind are allowed for all indices. --- - -`cluster.routing.rebalance.enable`:: - -Controls shard rebalance for all indices, by allowing specific -kinds of shard to be rebalanced. -+ --- -Can be set to: - -* `all` - (default) Allows shard balancing for all kinds of shards. -* `primaries` - Allows shard balancing only for primary shards. -* `replicas` - Allows shard balancing only for replica shards. -* `none` - No shard balancing of any kind are allowed for all indices.
--- - -`cluster.routing.allocation.same_shard.host`:: - Allows to perform a check to prevent allocation of multiple instances - of the same shard on a single host, based on host name and host address. - Defaults to `false`, meaning that no check is performed by default. This - setting only applies if multiple nodes are started on the same machine. - -`indices.recovery.concurrent_streams`:: - The number of streams to open (on a *node* level) to recover a - shard from a peer shard. Defaults to `3`. - -`indices.recovery.concurrent_small_file_streams`:: - The number of streams to open (on a *node* level) for small files (under - 5mb) to recover a shard from a peer shard. Defaults to `2`. - -[float] -[[allocation-awareness]] -=== Shard Allocation Awareness - -Cluster allocation awareness allows to configure shard and replicas -allocation across generic attributes associated the nodes. Lets explain -it through an example: - -Assume we have several racks. When we start a node, we can configure an -attribute called `rack_id` (any attribute name works), for example, here -is a sample config: - ----------------------- -node.rack_id: rack_one ----------------------- - -The above sets an attribute called `rack_id` for the relevant node with -a value of `rack_one`. Now, we need to configure the `rack_id` attribute -as one of the awareness allocation attributes (set it on *all* (master -eligible) nodes config): - --------------------------------------------------------- -cluster.routing.allocation.awareness.attributes: rack_id --------------------------------------------------------- - -The above will mean that the `rack_id` attribute will be used to do -awareness based allocation of shard and its replicas. For example, lets -say we start 2 nodes with `node.rack_id` set to `rack_one`, and deploy a -single index with 5 shards and 1 replica. The index will be fully -deployed on the current nodes (5 shards and 1 replica each, total of 10 -shards). 
- -Now, if we start two more nodes, with `node.rack_id` set to `rack_two`, -shards will relocate to even the number of shards across the nodes, but, -a shard and its replica will not be allocated in the same `rack_id` -value. - -The awareness attributes can hold several values, for example: - -------------------------------------------------------------- -cluster.routing.allocation.awareness.attributes: rack_id,zone -------------------------------------------------------------- - -*NOTE*: When using awareness attributes, shards will not be allocated to -nodes that don't have values set for those attributes. - -[float] -[[forced-awareness]] -=== Forced Awareness - -Sometimes, we know in advance the number of values an awareness -attribute can have, and more over, we would like never to have more -replicas than needed allocated on a specific group of nodes with the -same awareness attribute value. For that, we can force awareness on -specific attributes. - -For example, lets say we have an awareness attribute called `zone`, and -we know we are going to have two zones, `zone1` and `zone2`. Here is how -we can force awareness on a node: - -[source,js] -------------------------------------------------------------------- -cluster.routing.allocation.awareness.force.zone.values: zone1,zone2 -cluster.routing.allocation.awareness.attributes: zone -------------------------------------------------------------------- - -Now, lets say we start 2 nodes with `node.zone` set to `zone1` and -create an index with 5 shards and 1 replica. The index will be created, -but only 5 shards will be allocated (with no replicas). Only when we -start more shards with `node.zone` set to `zone2` will the replicas be -allocated. - -[float] -==== Automatic Preference When Searching / GETing - -When executing a search, or doing a get, the node receiving the request -will prefer to execute the request on shards that exists on nodes that -have the same attribute values as the executing node. 
This only happens -when the `cluster.routing.allocation.awareness.attributes` setting has -been set to a value. - -[float] -==== Realtime Settings Update - -The settings can be updated using the <> on a live cluster. - -[float] -[[allocation-filtering]] -=== Shard Allocation Filtering - -Allow to control allocation of indices on nodes based on include/exclude -filters. The filters can be set both on the index level and on the -cluster level. Lets start with an example of setting it on the cluster -level: - -Lets say we have 4 nodes, each has specific attribute called `tag` -associated with it (the name of the attribute can be any name). Each -node has a specific value associated with `tag`. Node 1 has a setting -`node.tag: value1`, Node 2 a setting of `node.tag: value2`, and so on. - -We can create an index that will only deploy on nodes that have `tag` -set to `value1` and `value2` by setting -`index.routing.allocation.include.tag` to `value1,value2`. For example: - -[source,js] --------------------------------------------------- -curl -XPUT localhost:9200/test/_settings -d '{ - "index.routing.allocation.include.tag" : "value1,value2" -}' --------------------------------------------------- - -On the other hand, we can create an index that will be deployed on all -nodes except for nodes with a `tag` of value `value3` by setting -`index.routing.allocation.exclude.tag` to `value3`. For example: - -[source,js] --------------------------------------------------- -curl -XPUT localhost:9200/test/_settings -d '{ - "index.routing.allocation.exclude.tag" : "value3" -}' --------------------------------------------------- - -`index.routing.allocation.require.*` can be used to -specify a number of rules, all of which MUST match in order for a shard -to be allocated to a node. This is in contrast to `include` which will -include a node if ANY rule matches. - -The `include`, `exclude` and `require` values can have generic simple -matching wildcards, for example, `value1*`. 
A special attribute name -called `_ip` can be used to match on node ip values. In addition `_host` -attribute can be used to match on either the node's hostname or its ip -address. Similarly `_name` and `_id` attributes can be used to match on -node name and node id accordingly. - -Obviously a node can have several attributes associated with it, and -both the attribute name and value are controlled in the setting. For -example, here is a sample of several node configurations: - -[source,js] --------------------------------------------------- -node.group1: group1_value1 -node.group2: group2_value4 --------------------------------------------------- - -In the same manner, `include`, `exclude` and `require` can work against -several attributes, for example: - -[source,js] --------------------------------------------------- -curl -XPUT localhost:9200/test/_settings -d '{ - "index.routing.allocation.include.group1" : "xxx", - "index.routing.allocation.include.group2" : "yyy", - "index.routing.allocation.exclude.group3" : "zzz", - "index.routing.allocation.require.group4" : "aaa" -}' --------------------------------------------------- - -The provided settings can also be updated in real time using the update -settings API, allowing to "move" indices (shards) around in realtime. - -Cluster wide filtering can also be defined, and be updated in real time -using the cluster update settings API. This setting can come in handy -for things like decommissioning nodes (even if the replica count is set -to 0). 
Here is a sample of how to decommission a node based on `_ip` -address: - -[source,js] --------------------------------------------------- -curl -XPUT localhost:9200/_cluster/settings -d '{ - "transient" : { - "cluster.routing.allocation.exclude._ip" : "10.0.0.1" - } -}' --------------------------------------------------- +include::cluster/misc.asciidoc[] diff --git a/docs/reference/modules/cluster/allocation_awareness.asciidoc b/docs/reference/modules/cluster/allocation_awareness.asciidoc new file mode 100644 index 0000000000000..208656126ff93 --- /dev/null +++ b/docs/reference/modules/cluster/allocation_awareness.asciidoc @@ -0,0 +1,107 @@ +[[allocation-awareness]] +=== Shard Allocation Awareness + +When running nodes on multiple VMs on the same physical server, on multiple +racks, or across multiple awareness zones, it is more likely that two nodes on +the same physical server, in the same rack, or in the same awareness zone will +crash at the same time, rather than two unrelated nodes crashing +simultaneously. + +If Elasticsearch is _aware_ of the physical configuration of your hardware, it +can ensure that the primary shard and its replica shards are spread across +different physical servers, racks, or zones, to minimise the risk of losing +all shard copies at the same time. + +The shard allocation awareness settings allow you to tell Elasticsearch about +your hardware configuration. + +As an example, let's assume we have several racks. When we start a node, we +can tell it which rack it is in by assigning it an arbitrary metadata +attribute called `rack_id` -- we could use any attribute name. For example: + +[source,sh] +---------------------- +./bin/elasticsearch --node.rack_id rack_one <1> +---------------------- +<1> This setting could also be specified in the `elasticsearch.yml` config file. + +Now, we need to set up _shard allocation awareness_ by telling Elasticsearch +which attributes to use.
This can be configured in the `elasticsearch.yml` +file on *all* master-eligible nodes, or it can be set (and changed) with the +<> API. + +For our example, we'll set the value in the config file: + +[source,yaml] +-------------------------------------------------------- +cluster.routing.allocation.awareness.attributes: rack_id +-------------------------------------------------------- + +With this config in place, let's say we start two nodes with `node.rack_id` +set to `rack_one`, and we create an index with 5 primary shards and 1 replica +of each primary. All primaries and replicas are allocated across the two +nodes. + +Now, if we start two more nodes with `node.rack_id` set to `rack_two`, +Elasticsearch will move shards across to the new nodes, ensuring (if possible) +that the primary and replica shards are never in the same rack. + +.Prefer local shards +********************************************* + +When executing search or GET requests, with shard awareness enabled, +Elasticsearch will prefer using local shards -- shards in the same awareness +group -- to execute the request. This is usually faster than crossing racks or +awareness zones. + +********************************************* + +Multiple awareness attributes can be specified, in which case the combination +of values from each attribute is considered to be a separate value. + +[source,yaml] +------------------------------------------------------------- +cluster.routing.allocation.awareness.attributes: rack_id,zone +------------------------------------------------------------- + +NOTE: When using awareness attributes, shards will not be allocated to +nodes that don't have values set for those attributes. + +[float] +[[forced-awareness]] +=== Forced Awareness + +Imagine that you have two awareness zones and enough hardware across the two +zones to host all of your primary and replica shards. 
But perhaps the +hardware in a single zone, while sufficient to host half the shards, would be +unable to host *ALL* the shards. + +With ordinary awareness, if one zone lost contact with the other zone, +Elasticsearch would assign all of the missing replica shards to a single zone. +But in this example, this sudden extra load would cause the hardware in the +remaining zone to be overloaded. + +Forced awareness solves this problem by *NEVER* allowing copies of the same +shard to be allocated to the same zone. + +For example, let's say we have an awareness attribute called `zone`, and +we know we are going to have two zones, `zone1` and `zone2`. Here is how +we can force awareness on a node: + +[source,yaml] +------------------------------------------------------------------- +cluster.routing.allocation.awareness.force.zone.values: zone1,zone2 <1> +cluster.routing.allocation.awareness.attributes: zone +------------------------------------------------------------------- +<1> We must list all possible values that the `zone` attribute can have. + +Now, if we start 2 nodes with `node.zone` set to `zone1` and create an index +with 5 shards and 1 replica, the index will be created, but only the 5 primary +shards will be allocated (with no replicas). Only when we start more nodes +with `node.zone` set to `zone2` will the replicas be allocated. + +The `cluster.routing.allocation.awareness.*` settings can all be updated +dynamically on a live cluster with the +<> API.
+ + diff --git a/docs/reference/modules/cluster/allocation_filtering.asciidoc b/docs/reference/modules/cluster/allocation_filtering.asciidoc new file mode 100644 index 0000000000000..6fa0343ee4c49 --- /dev/null +++ b/docs/reference/modules/cluster/allocation_filtering.asciidoc @@ -0,0 +1,70 @@ +[[allocation-filtering]] +=== Shard Allocation Filtering + +While <> provides *per-index* settings to control the +allocation of shards to nodes, cluster-level shard allocation filtering allows +you to allow or disallow the allocation of shards from *any* index to +particular nodes. + +The typical use case for cluster-wide shard allocation filtering is when you +want to decommission a node, and you would like to move the shards from that +node to other nodes in the cluster before shutting it down. + +For instance, we could decommission a node using its IP address as follows: + +[source,json] +-------------------------------------------------- +PUT /_cluster/settings +{ + "transient" : { + "cluster.routing.allocation.exclude._ip" : "10.0.0.1" + } +} +-------------------------------------------------- +// AUTOSENSE + +NOTE: Shards will only be relocated if it is possible to do so without +breaking another routing constraint, such as never allocating a primary and +replica shard to the same node. + +Cluster-wide shard allocation filtering works in the same way as index-level +shard allocation filtering (see <> for details). + +The available _dynamic_ cluster settings are as follows, where `{attribute}` +refers to an arbitrary node attribute: + +`cluster.routing.allocation.include.{attribute}`:: + + Assign the index to a node whose `{attribute}` has at least one of the + comma-separated values. + +`cluster.routing.allocation.require.{attribute}`:: + + Assign the index to a node whose `{attribute}` has _all_ of the + comma-separated values.
+ +`cluster.routing.allocation.exclude.{attribute}`:: + + Assign the index to a node whose `{attribute}` has _none_ of the + comma-separated values. + +These special attributes are also supported: + +[horizontal] +`_name`:: Match nodes by node name +`_ip`:: Match nodes by IP address (the IP address associated with the hostname) +`_host`:: Match nodes by hostname + +All attribute values can be specified with wildcards, eg: + +[source,json] +------------------------ +PUT _cluster/settings +{ + "transient": { + "cluster.routing.allocation.include._ip": "192.168.2.*" + } +} +------------------------ +// AUTOSENSE + diff --git a/docs/reference/modules/cluster/disk_allocator.asciidoc b/docs/reference/modules/cluster/disk_allocator.asciidoc new file mode 100644 index 0000000000000..09b504529db91 --- /dev/null +++ b/docs/reference/modules/cluster/disk_allocator.asciidoc @@ -0,0 +1,69 @@ +[[disk-allocator]] +=== Disk-based Shard Allocation + +Elasticsearch factors in the available disk space on a node before deciding +whether to allocate new shards to that node or to actively relocate shards +away from that node. + +Below are the settings that can be configured in the `elasticsearch.yml` config +file or updated dynamically on a live cluster with the +<> API: + +`cluster.routing.allocation.disk.threshold_enabled`:: + + Defaults to `true`. Set to `false` to disable the disk allocation decider. + +`cluster.routing.allocation.disk.watermark.low`:: + + Controls the low watermark for disk usage. It defaults to 85%, meaning ES will + not allocate new shards to nodes once they have more than 85% disk used. It + can also be set to an absolute byte value (like 500mb) to prevent ES from + allocating shards if less than the configured amount of space is available. + +`cluster.routing.allocation.disk.watermark.high`:: + + Controls the high watermark. It defaults to 90%, meaning ES will attempt to + relocate shards to another node if the node disk usage rises above 90%.
It can + also be set to an absolute byte value (similar to the low watermark) to + relocate shards once less than the configured amount of space is available on + the node. + +NOTE: Percentage values refer to used disk space, while byte values refer to +free disk space. This can be confusing, since it flips the meaning of high and +low. For example, it makes sense to set the low watermark to 10gb and the high +watermark to 5gb, but not the other way around. + + +`cluster.info.update.interval`:: + + How often Elasticsearch should check on disk usage for each node in the + cluster. Defaults to `30s`. + +`cluster.routing.allocation.disk.include_relocations`:: + + Defaults to +true+, which means that Elasticsearch will take into account + shards that are currently being relocated to the target node when computing a + node's disk usage. Taking relocating shards' sizes into account may, however, + mean that the disk usage for a node is incorrectly estimated on the high side, + since the relocation could be 90% complete and a recently retrieved disk usage + would include the total size of the relocating shard as well as the space + already used by the running relocation. 
+ + +An example of updating the low watermark to no more than 80% of the disk size, a +high watermark of at least 50 gigabytes free, and updating the information about +the cluster every minute: + +[source,js] +-------------------------------------------------- +PUT /_cluster/settings +{ + "transient": { + "cluster.routing.allocation.disk.watermark.low": "80%", + "cluster.routing.allocation.disk.watermark.high": "50gb", + "cluster.info.update.interval": "1m" + } +} +-------------------------------------------------- +// AUTOSENSE + diff --git a/docs/reference/modules/cluster/misc.asciidoc b/docs/reference/modules/cluster/misc.asciidoc new file mode 100644 index 0000000000000..554324df97e81 --- /dev/null +++ b/docs/reference/modules/cluster/misc.asciidoc @@ -0,0 +1,36 @@ +[[misc-cluster]] +=== Miscellaneous cluster settings + +[[cluster-read-only]] +==== Metadata + +An entire cluster may be set to read-only with the following _dynamic_ setting: + +`cluster.blocks.read_only`:: + + Make the whole cluster read only (indices do not accept write + operations), metadata is not allowed to be modified (create or delete + indices). + +WARNING: Don't rely on this setting to prevent changes to your cluster. Any +user with access to the <> +API can make the cluster read-write again. + + +[[cluster-logger]] +==== Logger + +The settings which control logging can be updated dynamically with the +`logger.` prefix. 
For instance, to increase the logging level of the +`indices.recovery` module to `DEBUG`, issue this request: + +[source,json] +------------------------------- +PUT /_cluster/settings +{ + "transient": { + "logger.indices.recovery": "DEBUG" + } +} +------------------------------- + diff --git a/docs/reference/modules/cluster/shards_allocation.asciidoc b/docs/reference/modules/cluster/shards_allocation.asciidoc new file mode 100644 index 0000000000000..1daf131106ded --- /dev/null +++ b/docs/reference/modules/cluster/shards_allocation.asciidoc @@ -0,0 +1,124 @@ +[[shards-allocation]] +=== Cluster Level Shard Allocation + +Shard allocation is the process of allocating shards to nodes. This can +happen during initial recovery, replica allocation, rebalancing, or +when nodes are added or removed. + +[float] +=== Shard Allocation Settings + +The following _dynamic_ settings may be used to control shard allocation and recovery: + +`cluster.routing.allocation.enable`:: ++ +-- +Enable or disable allocation for specific kinds of shards: + +* `all` - (default) Allows shard allocation for all kinds of shards. +* `primaries` - Allows shard allocation only for primary shards. +* `new_primaries` - Allows shard allocation only for primary shards for new indices. +* `none` - No shard allocations of any kind are allowed for any indices. + +This setting does not affect the recovery of local primary shards when +restarting a node. A restarted node that has a copy of an unassigned primary +shard will recover that primary immediately, assuming that the +<> setting is +satisfied. + +-- + +`cluster.routing.allocation.node_concurrent_recoveries`:: + + How many concurrent shard recoveries are allowed to happen on a node. + Defaults to `2`. + +`cluster.routing.allocation.node_initial_primaries_recoveries`:: + + While the recovery of replicas happens over the network, the recovery of + an unassigned primary after node restart uses data from the local disk. 
+ These should be fast so more initial primary recoveries can happen in + parallel on the same node. Defaults to `4`. + + +`cluster.routing.allocation.same_shard.host`:: + + Performs a check to prevent allocation of multiple instances of + the same shard on a single host, based on host name and host address. + Defaults to `false`, meaning that no check is performed by default. This + setting only applies if multiple nodes are started on the same machine. + +`indices.recovery.concurrent_streams`:: + + The number of network streams to open per node to recover a shard from + a peer shard. Defaults to `3`. + +`indices.recovery.concurrent_small_file_streams`:: + + The number of streams to open per node for small files (under 5mb) to + recover a shard from a peer shard. Defaults to `2`. + + +[float] +=== Shard Rebalancing Settings + +The following _dynamic_ settings may be used to control the rebalancing of +shards across the cluster: + + +`cluster.routing.rebalance.enable`:: ++ +-- +Enable or disable rebalancing for specific kinds of shards: + +* `all` - (default) Allows shard balancing for all kinds of shards. +* `primaries` - Allows shard balancing only for primary shards. +* `replicas` - Allows shard balancing only for replica shards. +* `none` - No shard balancing of any kind is allowed for any indices. +-- + + +`cluster.routing.allocation.allow_rebalance`:: ++ +-- +Specify when shard rebalancing is allowed: + + +* `always` - (default) Always allow rebalancing. +* `indices_primaries_active` - Only when all primaries in the cluster are allocated. +* `indices_all_active` - Only when all shards (primaries and replicas) in the cluster are allocated. +-- + +`cluster.routing.allocation.cluster_concurrent_rebalance`:: + + Controls how many concurrent shard rebalances are + allowed cluster-wide. Defaults to `2`. + +[float] +=== Shard Balancing Heuristics + +The following settings are used together to determine where to place each +shard.
The cluster is balanced when no allowed action can bring the weights +of each node closer together by more than the `balance.threshold`. + +`cluster.routing.allocation.balance.shard`:: + + Defines the weight factor for shards allocated on a node + (float). Defaults to `0.45f`. Raising this raises the tendency to + equalize the number of shards across all nodes in the cluster. + +`cluster.routing.allocation.balance.index`:: + + Defines a factor to the number of shards per index allocated + on a specific node (float). Defaults to `0.55f`. Raising this raises the + tendency to equalize the number of shards per index across all nodes in + the cluster. + +`cluster.routing.allocation.balance.threshold`:: + Minimal optimization value of operations that should be performed (non + negative float). Defaults to `1.0f`. Raising this will cause the cluster + to be less aggressive about optimizing the shard balance. + + +NOTE: Regardless of the result of the balancing algorithm, rebalancing might +not be allowed due to forced awareness or allocation filtering. diff --git a/docs/reference/modules/gateway.asciidoc b/docs/reference/modules/gateway.asciidoc index 5d8ed730fe649..3ce7a920a2f25 100644 --- a/docs/reference/modules/gateway.asciidoc +++ b/docs/reference/modules/gateway.asciidoc @@ -1,69 +1,51 @@ [[modules-gateway]] -== Gateway - -The gateway module allows one to store the state of the cluster meta -data across full cluster restarts. The cluster meta data mainly holds -all the indices created with their respective (index level) settings and -explicit type mappings. - -Each time the cluster meta data changes (for example, when an index is -added or deleted), those changes will be persisted using the gateway. -When the cluster first starts up, the state will be read from the -gateway and applied. - -The gateway set on the node level will automatically control the index gateway -that will be used.
For example, if the `local` gateway is used (the default), -then each index created on the node will automatically use its own respective -index level `local` gateway. - -The default gateway used is the -<> gateway. - -The `none` gateway option was removed in Elasticsearch 2.0. - -[float] -[[recover-after]] -=== Recovery After Nodes / Time - -In many cases, the actual cluster meta data should only be recovered -after specific nodes have started in the cluster, or a timeout has -passed. This is handy when restarting the cluster, and each node local -index storage still exists to be reused and not recovered from the -gateway (which reduces the time it takes to recover from the gateway). - -The `gateway.recover_after_nodes` setting (which accepts a number) -controls after how many data and master eligible nodes within the -cluster recovery will start. The `gateway.recover_after_data_nodes` and -`gateway.recover_after_master_nodes` setting work in a similar fashion, -except they consider only the number of data nodes and only the number -of master nodes respectively. The `gateway.recover_after_time` setting -(which accepts a time value) sets the time to wait till recovery happens -once all `gateway.recover_after...nodes` conditions are met. - -The `gateway.expected_nodes` allows to set how many data and master -eligible nodes are expected to be in the cluster, and once met, the -`gateway.recover_after_time` is ignored and recovery starts. -Setting `gateway.expected_nodes` also defaults `gateway.recover_after_time` to `5m` The `gateway.expected_data_nodes` and `gateway.expected_master_nodes` -settings are also supported. 
For example setting: - -[source,js] --------------------------------------------------- -gateway: - recover_after_time: 5m - expected_nodes: 2 --------------------------------------------------- - -In an expected 2 nodes cluster will cause recovery to start 5 minutes -after the first node is up, but once there are 2 nodes in the cluster, -recovery will begin immediately (without waiting). - -Note, once the meta data has been recovered from the gateway (which -indices to create, mappings and so on), then this setting is no longer -effective until the next full restart of the cluster. - -Operations are blocked while the cluster meta data has not been -recovered in order not to mix with the actual cluster meta data that -will be recovered once the settings has been reached. - -include::gateway/local.asciidoc[] +== Local Gateway + +The local gateway module stores the cluster state and shard data across full +cluster restarts. + +The following _static_ settings, which must be set on every data node in the +cluster, control how long nodes should wait before they try to recover any +shards which are stored locally: + +`gateway.expected_nodes`:: + + The number of (data or master) nodes that are expected to be in the cluster. + Recovery of local shards will start as soon as the expected number of + nodes have joined the cluster. Defaults to `0`. + +`gateway.expected_master_nodes`:: + + The number of master nodes that are expected to be in the cluster. + Recovery of local shards will start as soon as the expected number of + master nodes have joined the cluster. Defaults to `0`. + +`gateway.expected_data_nodes`:: + + The number of data nodes that are expected to be in the cluster. + Recovery of local shards will start as soon as the expected number of + data nodes have joined the cluster.
Defaults to `0` + +`gateway.recover_after_time`:: + + If the expected number of nodes is not achieved, the recovery process waits + for the configured amount of time before trying to recover regardless. + Defaults to `5m` if one of the `expected_nodes` settings is configured. + +Once the `recover_after_time` duration has timed out, recovery will start +as long as the following conditions are met: + +`gateway.recover_after_nodes`:: + + Recover as long as this many data or master nodes have joined the cluster. + +`gateway.recover_after_master_nodes`:: + + Recover as long as this many master nodes have joined the cluster. + +`gateway.recover_after_data_nodes`:: + + Recover as long as this many data nodes have joined the cluster. + +NOTE: These settings only take effect on a full cluster restart. diff --git a/docs/reference/modules/gateway/local.asciidoc b/docs/reference/modules/gateway/local.asciidoc deleted file mode 100644 index fdcfec62091b3..0000000000000 --- a/docs/reference/modules/gateway/local.asciidoc +++ /dev/null @@ -1,56 +0,0 @@ -[[modules-gateway-local]] -=== Local Gateway - -The local gateway allows for recovery of the full cluster state and -indices from the local storage of each node, and does not require a -common node level shared storage. - -Note, different from shared gateway types, the persistency to the local -gateway is *not* done in an async manner. Once an operation is -performed, the data is there for the local gateway to recover it in case -of full cluster failure. - -It is important to configure the `gateway.recover_after_nodes` setting -to include most of the expected nodes to be started after a full cluster -restart. This will insure that the latest cluster state is recovered. 
-For example: - -[source,js] --------------------------------------------------- -gateway: - recover_after_nodes: 3 - expected_nodes: 5 --------------------------------------------------- - -[float] -==== Dangling indices - -When a node joins the cluster, any shards/indices stored in its local `data/` -directory which do not already exist in the cluster will be imported into the -cluster by default. This functionality has two purposes: - -1. If a new master node is started which is unaware of the other indices in - the cluster, adding the old nodes will cause the old indices to be - imported, instead of being deleted. - -2. An old index can be added to an existing cluster by copying it to the - `data/` directory of a new node, starting the node and letting it join - the cluster. Once the index has been replicated to other nodes in the - cluster, the new node can be shut down and removed. - -The import of dangling indices can be controlled with the -`gateway.auto_import_dangled` which accepts: - -[horizontal] -`yes`:: - - Import dangling indices into the cluster (default). - -`close`:: - - Import dangling indices into the cluster state, but leave them closed. - -`no`:: - - Delete dangling indices after `gateway.dangling_timeout`, which - defaults to 2 hours. diff --git a/docs/reference/modules/indices.asciidoc b/docs/reference/modules/indices.asciidoc index f5302c9fbb1e2..e8f2ef05ce1ab 100644 --- a/docs/reference/modules/indices.asciidoc +++ b/docs/reference/modules/indices.asciidoc @@ -1,66 +1,50 @@ [[modules-indices]] == Indices -The indices module allow to control settings that are globally managed -for all indices. +The indices module controls index-related settings that are globally managed +for all indices, rather than being configurable at a per-index level. -[float] -[[buffer]] -=== Indexing Buffer +Available settings include: -The indexing buffer setting allows to control how much memory will be -allocated for the indexing process. 
It is a global setting that bubbles -down to all the different shards allocated on a specific node. +<>:: -The `indices.memory.index_buffer_size` accepts either a percentage or a -byte size value. It defaults to `10%`, meaning that `10%` of the total -memory allocated to a node will be used as the indexing buffer size. -This amount is then divided between all the different shards. Also, if -percentage is used, it is possible to set `min_index_buffer_size` (defaults to -`48mb`) and `max_index_buffer_size` (defaults to unbounded). + Circuit breakers set limits on memory usage to avoid out of memory exceptions. -The `indices.memory.min_shard_index_buffer_size` allows to set a hard -lower limit for the memory allocated per shard for its own indexing -buffer. It defaults to `4mb`. +<>:: -[float] -[[indices-ttl]] -=== TTL interval + Set limits on the amount of heap used by the in-memory fielddata cache. -You can dynamically set the `indices.ttl.interval`, which allows to set how -often expired documents will be automatically deleted. The default value -is 60s. +<>:: -The deletion orders are processed by bulk. You can set -`indices.ttl.bulk_size` to fit your needs. The default value is 10000. + Configure the amount of heap used to cache filter results. -See also <>. +<>:: -[float] -[[recovery]] -=== Recovery + Control the size of the buffer allocated to the indexing process. -The following settings can be set to manage the recovery policy: +<>:: -[horizontal] -`indices.recovery.concurrent_streams`:: - defaults to `3`. + Control the behaviour of the shard-level query cache. -`indices.recovery.concurrent_small_file_streams`:: - defaults to `2`. +<>:: -`indices.recovery.file_chunk_size`:: - defaults to `512kb`. + Control the resource limits on the shard recovery process. -`indices.recovery.translog_ops`:: - defaults to `1000`. +<>:: -`indices.recovery.translog_size`:: - defaults to `512kb`. + Control how expired documents are removed.
-`indices.recovery.compress`:: - defaults to `true`. +include::indices/circuit_breaker.asciidoc[] -`indices.recovery.max_bytes_per_sec`:: - defaults to `40mb`. +include::indices/fielddata.asciidoc[] + +include::indices/filter_cache.asciidoc[] + +include::indices/indexing_buffer.asciidoc[] + +include::indices/query-cache.asciidoc[] + +include::indices/recovery.asciidoc[] + +include::indices/ttl_interval.asciidoc[] diff --git a/docs/reference/modules/indices/circuit_breaker.asciidoc b/docs/reference/modules/indices/circuit_breaker.asciidoc new file mode 100644 index 0000000000000..1caa87f920298 --- /dev/null +++ b/docs/reference/modules/indices/circuit_breaker.asciidoc @@ -0,0 +1,56 @@ +[[circuit-breaker]] +=== Circuit Breaker + +Elasticsearch contains multiple circuit breakers used to prevent operations from +causing an OutOfMemoryError. Each breaker specifies a limit for how much memory +it can use. Additionally, there is a parent-level breaker that specifies the +total amount of memory that can be used across all breakers. + +These settings can be dynamically updated on a live cluster with the +<> API. + +[[parent-circuit-breaker]] +[float] +==== Parent circuit breaker + +The parent-level breaker can be configured with the following setting: + +`indices.breaker.total.limit`:: + + Starting limit for overall parent breaker, defaults to 70% of JVM heap. + +[[fielddata-circuit-breaker]] +[float] +==== Field data circuit breaker +The field data circuit breaker allows Elasticsearch to estimate the amount of +memory a field will require to be loaded into memory. It can then prevent the +field data loading by raising an exception. By default the limit is configured +to 60% of the maximum JVM heap. 
It can be configured with the following +parameters: + +`indices.breaker.fielddata.limit`:: + + Limit for fielddata breaker, defaults to 60% of JVM heap + +`indices.breaker.fielddata.overhead`:: + + A constant that all field data estimations are multiplied by to determine a + final estimation. Defaults to 1.03 + +[[request-circuit-breaker]] +[float] +==== Request circuit breaker + +The request circuit breaker allows Elasticsearch to prevent per-request data +structures (for example, memory used for calculating aggregations during a +request) from exceeding a certain amount of memory. + +`indices.breaker.request.limit`:: + + Limit for request breaker, defaults to 40% of JVM heap + +`indices.breaker.request.overhead`:: + + A constant that all request estimations are multiplied by to determine a + final estimation. Defaults to 1 + diff --git a/docs/reference/modules/indices/fielddata.asciidoc b/docs/reference/modules/indices/fielddata.asciidoc new file mode 100644 index 0000000000000..eda1ff48e797e --- /dev/null +++ b/docs/reference/modules/indices/fielddata.asciidoc @@ -0,0 +1,37 @@ +[[modules-fielddata]] +=== Fielddata + +The field data cache is used mainly when sorting on or computing aggregations +on a field. It loads all the field values into memory in order to provide fast +document based access to those values. The field data cache can be +expensive to build for a field, so it's recommended to have enough memory +to allocate it, and to keep it loaded. + +The amount of memory used for the field +data cache can be controlled using `indices.fielddata.cache.size`. Note: +reloading the field data which does not fit into your cache will be expensive +and perform poorly. + +`indices.fielddata.cache.size`:: + + The max size of the field data cache, eg `30%` of node heap space, or an + absolute value, eg `12GB`. Defaults to unbounded. Also see + <>.
+ +`indices.fielddata.cache.expire`:: + + experimental[] A time based setting that expires field data after a + certain time of inactivity. Defaults to `-1`. For example, can be set to + `5m` for a 5 minute expiry. + +NOTE: These are static settings which must be configured on every data node in +the cluster. + +[float] +[[fielddata-monitoring]] +==== Monitoring field data + +You can monitor memory usage for field data as well as the field data circuit +breaker using +<> + diff --git a/docs/reference/modules/indices/filter_cache.asciidoc b/docs/reference/modules/indices/filter_cache.asciidoc new file mode 100644 index 0000000000000..163a75138b489 --- /dev/null +++ b/docs/reference/modules/indices/filter_cache.asciidoc @@ -0,0 +1,16 @@ +[[filter-cache]] +=== Node Filter Cache + +The filter cache is responsible for caching the results of filters (used in +the query). There is one filter cache per node that is shared by all shards. +The cache implements an LRU eviction policy: when a cache becomes full, the +least recently used data is evicted to make way for new data. + +The following setting is _static_ and must be configured on every data node in +the cluster: + +`indices.cache.filter.size`:: + + Controls the memory size for the filter cache, defaults to `10%`. Accepts + either a percentage value, like `30%`, or an exact value, like `512mb`. + diff --git a/docs/reference/modules/indices/indexing_buffer.asciidoc b/docs/reference/modules/indices/indexing_buffer.asciidoc new file mode 100644 index 0000000000000..e648573314768 --- /dev/null +++ b/docs/reference/modules/indices/indexing_buffer.asciidoc @@ -0,0 +1,32 @@ +[[indexing-buffer]] +=== Indexing Buffer + +The indexing buffer is used to store newly indexed documents. When it fills +up, the documents in the buffer are written to a segment on disk. It is divided +between all shards on the node. 
+ +The following settings are _static_ and must be configured on every data node +in the cluster: + +`indices.memory.index_buffer_size`:: + + Accepts either a percentage or a byte size value. It defaults to `10%`, + meaning that `10%` of the total heap allocated to a node will be used as the + indexing buffer size. + +`indices.memory.min_index_buffer_size`:: + + If the `index_buffer_size` is specified as a percentage, then this + setting can be used to specify an absolute minimum. Defaults to `48mb`. + +`indices.memory.max_index_buffer_size`:: + + If the `index_buffer_size` is specified as a percentage, then this + setting can be used to specify an absolute maximum. Defaults to unbounded. + +`indices.memory.min_shard_index_buffer_size`:: + + Sets a hard lower limit for the memory allocated per shard for its own + indexing buffer. Defaults to `4mb`. + + diff --git a/docs/reference/index-modules/query-cache.asciidoc b/docs/reference/modules/indices/query-cache.asciidoc similarity index 93% rename from docs/reference/index-modules/query-cache.asciidoc rename to docs/reference/modules/indices/query-cache.asciidoc index 6f74bf415602e..444e578de06cf 100644 --- a/docs/reference/index-modules/query-cache.asciidoc +++ b/docs/reference/modules/indices/query-cache.asciidoc @@ -1,5 +1,5 @@ -[[index-modules-shard-query-cache]] -== Shard query cache +[[shard-query-cache]] +=== Shard query cache When a search request is run against an index or against many indices, each involved shard executes the search locally and returns its local results to @@ -13,7 +13,7 @@ use case, where only the most recent index is being actively updated -- results from older indices will be served directly from the cache. [IMPORTANT] -================================== +=================================== For now, the query cache will only cache the results of search requests where `size=0`, so it will not cache `hits`, @@ -21,10 +21,10 @@ but it will cache `hits.total`, <>, and <>. 
Queries that use `now` (see <>) cannot be cached. -================================== +=================================== [float] -=== Cache invalidation +==== Cache invalidation The cache is smart -- it keeps the same _near real-time_ promise as uncached search. @@ -46,7 +46,7 @@ curl -XPOST 'localhost:9200/kimchy,elasticsearch/_cache/clear?query_cache=true' ------------------------ [float] -=== Enabling caching by default +==== Enabling caching by default The cache is not enabled by default, but can be enabled when creating a new index as follows: @@ -73,7 +73,7 @@ curl -XPUT localhost:9200/my_index/_settings -d' ----------------------------- [float] -=== Enabling caching per request +==== Enabling caching per request The `query_cache` query-string parameter can be used to enable or disable caching on a *per-query* basis. If set, it overrides the index-level setting: @@ -99,7 +99,7 @@ it uses a random function or references the current time) you should set the `query_cache` flag to `false` to disable caching for that request. [float] -=== Cache key +==== Cache key The whole JSON body is used as the cache key. This means that if the JSON changes -- for instance if keys are output in a different order -- then the @@ -110,7 +110,7 @@ keys are always emitted in the same order. This canonical mode can be used in the application to ensure that a request is always serialized in the same way. [float] -=== Cache settings +==== Cache settings The cache is managed at the node level, and has a default maximum size of `1%` of the heap. This can be changed in the `config/elasticsearch.yml` file with: @@ -126,7 +126,7 @@ stale results are automatically invalidated when the index is refreshed. This setting is provided for completeness' sake only. 
[float] -=== Monitoring cache usage +==== Monitoring cache usage The size of the cache (in bytes) and the number of evictions can be viewed by index, with the <> API: diff --git a/docs/reference/modules/indices/recovery.asciidoc b/docs/reference/modules/indices/recovery.asciidoc new file mode 100644 index 0000000000000..cd21f135e3848 --- /dev/null +++ b/docs/reference/modules/indices/recovery.asciidoc @@ -0,0 +1,28 @@ +[[recovery]] +=== Indices Recovery + +The following _expert_ settings can be set to manage the recovery policy. + +`indices.recovery.concurrent_streams`:: + Defaults to `3`. + +`indices.recovery.concurrent_small_file_streams`:: + Defaults to `2`. + +`indices.recovery.file_chunk_size`:: + Defaults to `512kb`. + +`indices.recovery.translog_ops`:: + Defaults to `1000`. + +`indices.recovery.translog_size`:: + Defaults to `512kb`. + +`indices.recovery.compress`:: + Defaults to `true`. + +`indices.recovery.max_bytes_per_sec`:: + Defaults to `40mb`. + +These settings can be dynamically updated on a live cluster with the +<> API: diff --git a/docs/reference/modules/indices/ttl_interval.asciidoc b/docs/reference/modules/indices/ttl_interval.asciidoc new file mode 100644 index 0000000000000..5e3069302d731 --- /dev/null +++ b/docs/reference/modules/indices/ttl_interval.asciidoc @@ -0,0 +1,16 @@ +[[indices-ttl]] +=== TTL interval + +Documents that have a <> value set need to be deleted +once they have expired. How and how often they are deleted is controlled by +the following dynamic cluster settings: + +`indices.ttl.interval`:: + + How often the deletion process runs. Defaults to `60s`. + +`indices.ttl.bulk_size`:: + + The deletions are processed with a <>. + The number of deletions processed can be configured with + this setting. Defaults to `10000`. 
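Note (not part of the patch): the TTL settings above are described as dynamic cluster settings, so they could presumably be changed on a running cluster through the cluster update-settings endpoint. The request below is a minimal sketch of that idea. The host `localhost:9200` and the value `30s` are assumptions picked for illustration, and `transient` is used so the change would not survive a full cluster restart.

[source,shell]
------------------------
# Hypothetical example: shorten the TTL purge interval from the default 60s to 30s.
# Host and value are illustrative assumptions, not taken from this patch.
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "indices.ttl.interval": "30s"
  }
}'
------------------------

Using `persistent` instead of `transient` in the request body would keep the value across restarts.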
diff --git a/docs/reference/modules/plugins.asciidoc b/docs/reference/modules/plugins.asciidoc index ecc0a9ab98c63..f208a2f14aa81 100644 --- a/docs/reference/modules/plugins.asciidoc +++ b/docs/reference/modules/plugins.asciidoc @@ -22,7 +22,7 @@ Installing plugins typically take the following form: [source,shell] ----------------------------------- -plugin --install // +bin/plugin --install // ----------------------------------- The plugins will be diff --git a/docs/reference/modules/threadpool.asciidoc b/docs/reference/modules/threadpool.asciidoc index 1cec70a41ec94..77fe73feb4b63 100644 --- a/docs/reference/modules/threadpool.asciidoc +++ b/docs/reference/modules/threadpool.asciidoc @@ -9,7 +9,6 @@ of discarded. There are several thread pools, but the important ones include: -[horizontal] `index`:: For index/delete operations. Defaults to `fixed` with a size of `# of available processors`, diff --git a/docs/reference/search/request-body.asciidoc b/docs/reference/search/request-body.asciidoc index 43beff6ce5e59..cfd0394fc2d9b 100644 --- a/docs/reference/search/request-body.asciidoc +++ b/docs/reference/search/request-body.asciidoc @@ -73,7 +73,7 @@ And here is a sample response: Set to `true` or `false` to enable or disable the caching of search results for requests where `size` is 0, ie aggregations and suggestions (no top hits returned). - See <>. + See <>. 
`terminate_after`:: diff --git a/docs/resiliency/index.asciidoc b/docs/resiliency/index.asciidoc index 8497a02567c26..990bd5593f3e1 100644 --- a/docs/resiliency/index.asciidoc +++ b/docs/resiliency/index.asciidoc @@ -416,7 +416,7 @@ The Snapshot/Restore API supports a number of different repository types for sto [float] === Circuit Breaker: Fielddata (STATUS: DONE, v1.0.0) -Currently, the https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-fielddata.html[circuit breaker] protects against loading too much field data by estimating how much memory the field data will take to load, then aborting the request if the memory requirements are too high. This feature was added in Elasticsearch version 1.0.0. +Currently, the https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-fielddata.html[circuit breaker] protects against loading too much field data by estimating how much memory the field data will take to load, then aborting the request if the memory requirements are too high. This feature was added in Elasticsearch version 1.0.0. [float] === Use of Paginated Data Structures to Ease Garbage Collection (STATUS: DONE, v1.0.0 & v1.2.0)