Adds ZSTD and ZSTD_NO_DICT codecs to core (out of sandbox) #4421

Merged
24 commits
90e4dcf
documentation for zstd and zstd out of sandbox
sarthakaggarwal97 Jun 29, 2023
969d3fb
Update _api-reference/index-apis/create-index.md
hdhalter Jul 11, 2023
f8bc687
Update _api-reference/index-apis/create-index.md
hdhalter Jul 11, 2023
57c46b0
Update _api-reference/index-apis/create-index.md
hdhalter Jul 13, 2023
6c8c087
Update _api-reference/index-apis/create-index.md
hdhalter Jul 13, 2023
642f364
Update _api-reference/index-apis/create-index.md
hdhalter Jul 13, 2023
2cbdbdb
Update _api-reference/index-apis/create-index.md
hdhalter Jul 13, 2023
0374170
Update _api-reference/index-apis/create-index.md
hdhalter Jul 13, 2023
d79675c
Update _api-reference/index-apis/create-index.md
hdhalter Jul 13, 2023
4908884
Update create-index.md
hdhalter Jul 13, 2023
793db12
Update _api-reference/index-apis/create-index.md
hdhalter Jul 13, 2023
a050110
Update create-index.md
hdhalter Jul 13, 2023
8ae0f2d
Update _api-reference/index-apis/create-index.md
hdhalter Jul 13, 2023
3a0e510
Update create-index.md
hdhalter Jul 13, 2023
a66b865
Update create-index.md
hdhalter Jul 14, 2023
7046371
Update create-index.md
hdhalter Jul 14, 2023
bfc1d32
Update _api-reference/index-apis/create-index.md
hdhalter Jul 14, 2023
ea7964f
Update _api-reference/index-apis/create-index.md
hdhalter Jul 14, 2023
9e4509a
Update _api-reference/index-apis/create-index.md
hdhalter Jul 14, 2023
e4e7846
Update _api-reference/index-apis/create-index.md
hdhalter Jul 14, 2023
988f4df
Update _api-reference/index-apis/create-index.md
hdhalter Jul 14, 2023
c2feee0
Update _api-reference/index-apis/create-index.md
hdhalter Jul 14, 2023
6b4fad4
Update _api-reference/index-apis/create-index.md
hdhalter Jul 14, 2023
3ade466
Update _api-reference/index-apis/create-index.md
hdhalter Jul 17, 2023
46 changes: 41 additions & 5 deletions _api-reference/index-apis/create-index.md
@@ -70,10 +70,8 @@ timeout | Time | How long to wait for the request to return. Default is `30s`.

As part of your request, you can supply parameters in your request's body that specify index settings, mappings, and [aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/) for your newly created index. The following sections provide more information about index settings and mappings.


### Index settings

Index settings are separated into two varieties: static index settings and dynamic index settings. Static index settings are settings that you specify at index creation and can't change later. You can change dynamic settings at any time, including at index creation.
Index settings are separated into two types: static index settings and dynamic index settings. Static index settings are settings that you specify at index creation and can't change later. You can change dynamic settings at any time, including at index creation.

#### Static index settings

@@ -82,7 +80,6 @@ Setting | Description
index.number_of_shards | The number of primary shards in the index. Default is 1.
index.number_of_routing_shards | The number of routing shards used to split an index.
index.shard.check_on_startup | Whether the index's shards should be checked for corruption. Available options are `false` (do not check for corruption), `checksum` (check for physical corruption), and `true` (check for both physical and logical corruption). Default is `false`.
index.codec | The compression type to use to compress stored data. Available values are `default` (optimizes for retrieval speed) and `best_compression` (optimizes for better compression at the expense of speed, leading to smaller data sizes on disk). For snapshot distributions built with the sandbox feature enabled, `-Dsandbox.enabled=true`, OpenSearch offers a custom-codecs plugin that supports the value `zstd` for Zstandard compression.
index.routing_partition_size | The number of shards a custom routing value can go to. Routing helps an imbalanced cluster by relocating values to a subset of shards rather than just a single shard. To enable, set this value to greater than 1 but less than `index.number_of_shards`. Default is 1.
index.soft_deletes.retention_lease.period | The maximum amount of time to retain a shard's history of operations. Default is `12h`.
index.load_fixed_bitset_filters_eagerly | Whether OpenSearch should pre-load cached filters. Available options are `true` and `false`. Default is `true`.
@@ -94,6 +91,7 @@ Setting | Description
:--- | :---
index.number_of_replicas | The number of replica shards each primary shard should have. For example, if you have 4 primary shards and set `index.number_of_replicas` to 3, the index has 12 replica shards. Default is 1.
index.auto_expand_replicas | Whether the cluster should automatically add replica shards based on the number of data nodes. Specify a lower bound and upper limit (for example, 0-9), or `all` for the upper limit. For example, if you have 5 data nodes and set `index.auto_expand_replicas` to 0-3, then the cluster does not automatically add another replica shard. However, if you set this value to `0-all` and add 2 more nodes for a total of 7, the cluster will expand to now have 6 replica shards. Default is disabled.
index.codec | Determines how the index’s stored fields are compressed and stored on disk. This setting affects the size of the index shards and the performance of operations on the index. Available values are `default`, `best_compression`, `zstd`, and `zstd_no_dict`. The `zstd` and `zstd_no_dict` codecs were introduced in OpenSearch 2.9. Unlike the other codecs, they let you configure the compression level through the `index.codec.compression_level` index setting. For information about each setting, see [Index codec settings](#index-codec-settings).
Contributor Author
sarthakaggarwal97 Jul 14, 2023

Can we add a line here stating that the `index.codec` setting cannot be updated in real time?
Similarly, `index.codec.compression_level` cannot be updated in real time.

index.search.idle.after | Amount of time a shard should wait for a search or get request until it goes idle. Default is `30s`.
index.refresh_interval | How often the index should refresh, which publishes its most recent changes and makes them available for searching. Can be set to `-1` to disable refreshing. Default is `1s`.
index.max_result_window | The maximum value of `from` + `size` for searches to the index. `from` is the starting index to search from, and `size` is the amount of results to return. Default: 10000.
@@ -113,4 +111,42 @@ index.routing.allocation.enable | Specifies options for the index’s shard allo
index.routing.rebalance.enable | Enables shard rebalancing for the index. Available options are `all` (allow rebalancing for all shards), `primaries` (allow rebalancing only for primary shards), `replicas` (allow rebalancing only for replicas), and `none` (do not allow rebalancing). Default is `all`.
index.gc_deletes | Amount of time to retain a deleted document's version number. Default is `60s`.
index.default_pipeline | The default ingest node pipeline for the index. If the default pipeline is set and the pipeline does not exist, then index requests fail. The pipeline name `_none` specifies that the index does not have an ingest pipeline.
index.final_pipeline | The final ingest node pipeline for the index. If the final pipeline is set and the pipeline does not exist, then index requests fail. The pipeline name `_none` specifies that the index does not have an ingest pipeline.
index.final_pipeline | The final ingest node pipeline for the index. If the final pipeline is set and the pipeline does not exist, then index requests fail. The pipeline name `_none` specifies that the index does not have an ingest pipeline.

#### Index codec settings
Contributor Author

As discussed offline, we will add a separate page explaining the `index.codec` settings around the time of the release.
This information might become redundant once that page is added. How do you suggest we proceed, @hdhalter?

Contributor Author

Calling it out here: We might need to update the documentation related to snapshot, and we can check if we can add something over here at best practices for performance

Contributor

We need to add more information about indexing, in general, and will add more information to support these options after the release.

Contributor

Can you please provide me an endpoint so I can test the code samples on Monday? Thanks!

Contributor

@sarthakaggarwal97 - The documentation related to snapshots has been updated. The link above to 'best practices' is part of the Service documentation and needs to be handled by @chatnish.

Contributor Author

> Can you please provide me an endpoint so I can test the code samples on Monday? Thanks!

Do you want a domain where you can test these code samples?

The `index.codec` setting of an OpenSearch index determines how the index’s stored fields are compressed and stored on disk. The setting affects the size of the index shards and the performance of operations on the index. OpenSearch supports four codecs for compressing stored fields. Each codec offers a different trade-off between compression ratio (storage size) and indexing performance (speed). The available codecs are:
* `default` - This codec uses the `LZ4` algorithm with a preset dictionary, which prioritizes performance over compression ratio. It offers faster indexing and search operations than `best_compression` but may result in larger index and shard sizes. If no codec is provided in the index settings, `LZ4` is used as the default compression algorithm.
* `best_compression` - This codec uses `zlib` as the underlying compression algorithm. It achieves high compression ratios, resulting in smaller index sizes. However, it may incur additional CPU usage during operations on the index, which in turn may result in higher indexing and search latencies.
* `zstd` - This codec uses the [`Zstandard` compression algorithm](https://github.com/facebook/zstd), which provides a good balance between compression ratio and speed. It provides compression comparable to the `best_compression` codec with reasonable CPU usage, and indexing and search performance comparable to the `default` codec.
* `zstd_no_dict` - This codec is similar to `zstd` but excludes the dictionary compression feature. It provides faster indexing and search operations than `zstd` at the expense of a slightly larger index size.
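
As an illustration, one of the codecs listed above can be selected when an index is created. The following is a minimal sketch, assuming a hypothetical helper named `create_index_body`; the helper, the index name, and the endpoint in the usage comment are placeholders and are not part of OpenSearch.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints a create-index request body that selects a codec.
# Falls back to "default" when no codec is given.
create_index_body() {
  local codec="${1:-default}"
  case "$codec" in
    default|best_compression|zstd|zstd_no_dict)
      printf '{"settings":{"index":{"codec":"%s"}}}\n' "$codec"
      ;;
    *)
      # Only the four codecs listed above are accepted by this sketch.
      echo "unknown codec: $codec" >&2
      return 1
      ;;
  esac
}

# Example usage against a local cluster (endpoint and index name are placeholders):
# curl -XPUT "http://localhost:9200/your_index" -H 'Content-Type: application/json' \
#   -d "$(create_index_body zstd)"
```

Keeping the body builder separate from the `curl` call makes it possible to inspect the request before sending it.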

The `index.codec.compression_level` setting provides a trade-off between compression ratio and speed. A higher compression level results in a higher compression ratio (smaller storage size) at the cost of speed: compression and decompression are slower, which leads to higher indexing and search latencies. Currently, the `zstd` and `zstd_no_dict` codecs support compression levels from 1 to 6. Like `index.codec`, `index.codec.compression_level` is an optional index setting. If no level is provided, the default compression level of 3 is used.
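
The rules in the preceding paragraph can be sketched as a small shell function. `codec_settings_body` is a hypothetical helper name, not an OpenSearch command. It only encodes the constraints stated here: levels 1 to 6, a default of 3, and no compression level for the `default` and `best_compression` codecs.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of OpenSearch): prints a settings body for a
# codec and an optional compression level, enforcing the documented limits.
codec_settings_body() {
  local codec="$1"
  local level="${2:-}"

  case "$codec" in
    zstd|zstd_no_dict)
      # The level is optional; fall back to the documented default of 3.
      level="${level:-3}"
      case "$level" in
        [1-6]) ;;
        *)
          echo "compression level must be between 1 and 6" >&2
          return 1
          ;;
      esac
      printf '{"index":{"codec":"%s","codec.compression_level":%s}}\n' "$codec" "$level"
      ;;
    default|best_compression)
      # The compression level setting is not available for these codecs.
      if [ -n "$level" ]; then
        echo "compression level is not supported for codec '$codec'" >&2
        return 1
      fi
      printf '{"index":{"codec":"%s"}}\n' "$codec"
      ;;
    *)
      echo "unknown codec: $codec" >&2
      return 1
      ;;
  esac
}
```

For example, `codec_settings_body zstd` prints a body with level 3, while `codec_settings_body zstd 7` fails with an error message.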

You can update the settings of an index using a PUT request. The following example uses curl commands to close an index, update its settings, and reopen the index.

```bash
# Close the index
curl -XPOST "http://localhost:9200/your_index/_close"
```
{% include copy.html %}

```bash
# Update the index.codec setting
curl -XPUT "http://localhost:9200/your_index/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "codec": "zstd_no_dict",
    "codec.compression_level": 3
  }
}
'
```
{% include copy.html %}

```bash
# Reopen the index
curl -XPOST "http://localhost:9200/your_index/_open"
```
{% include copy.html %}
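
The three commands above can be combined into one script. The following is a sketch under stated assumptions: the `request` and `update_codec` function names, the `HOST` variable, and the `DRY_RUN` switch are illustrative, and reopening the index after a failed settings update is a design choice of this sketch rather than documented behavior.

```bash
#!/usr/bin/env bash
# Sketch only: HOST, DRY_RUN, and both function names are illustrative assumptions.
HOST="${HOST:-http://localhost:9200}"

# Sends one request with curl, or only prints it when DRY_RUN=1.
request() {
  local method="$1" path="$2" body="${3:-}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$method $HOST$path${body:+ $body}"
    return 0
  fi
  if [ -n "$body" ]; then
    curl --fail --silent --show-error -X "$method" "$HOST$path" \
      -H 'Content-Type: application/json' -d "$body"
  else
    curl --fail --silent --show-error -X "$method" "$HOST$path"
  fi
}

# Closes the index, updates the codec settings, and reopens the index.
# The index is reopened even if the settings update fails.
update_codec() {
  local index="$1" codec="$2" level="$3"
  local body rc=0
  body="{\"index\":{\"codec\":\"$codec\",\"codec.compression_level\":$level}}"
  request POST "/$index/_close" || return 1
  request PUT "/$index/_settings" "$body" || rc=1
  request POST "/$index/_open" || rc=1
  return "$rc"
}
```

Running `DRY_RUN=1 update_codec your_index zstd_no_dict 3` prints the three requests without contacting a cluster, which is a convenient way to review them first.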


15 changes: 0 additions & 15 deletions _install-and-configure/plugins.md
@@ -289,21 +289,6 @@ Members of the OpenSearch community have built countless plugins for the service
| store-smb | 1.0.0 |
| transport-nio | 1.0.0 |

### Experimental plugins

OpenSearch offers experimental plugins that may be used in a snapshot distribution that has the [sandbox feature enabled](https://github.com/opensearch-project/OpenSearch/blob/main/sandbox/build.gradle).

| Plugin Name | Description | Earliest Available Version |
| :--- | :--- |
| custom-codecs | Provides additional compression codecs. | 1.0.0 |

Use the following example command to enable the sandbox feature:

```bash
./gradlew assemble -Dsandbox.enabled=true
bin/opensearch-plugin install file:///path/to/plugin-<version>-SNAPSHOT.zip
```

## Related links

- [About Observability]({{site.url}}{{site.baseurl}}/observability-plugin/index/)