💡 Pre-requisites: add a broken index (this exercise is more about the diagnosis commands than the broken index itself)
# delete if already in place
DELETE broken_index
# add a new "broken" index
PUT broken_index
{
"settings": {
"number_of_replicas": 2,
"number_of_shards": 2
}
}
# add a document
PUT broken_index/_doc/1
{
"field1" : "data"
}
❓ Check the cluster health
View Solution (click to reveal)
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/cluster-health.html
GET _cluster/health
// output
{
"cluster_name" : "elastic-cluster",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 48,
"active_shards" : 48,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 16,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 75.0
}
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/cat-health.html
GET _cat/health?v
// output
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1633541301 17:28:21 elastic-cluster yellow 1 1 48 48 0 0 16 0 - 75.0%
❓ Check the index health (index health usually matches the cluster health)
View Solution (click to reveal)
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/cat-indices.html
Query Parameters: health (Optional, string) Health status used to limit returned indices. Valid values are:
- green
- yellow
- red
GET _cat/indices?health=yellow
// output
yellow open broken_index xHfY0EcGRq-RRZv_cQONCw 2 2 1 0 4kb 4kb
❓ Diagnose the fault in index broken_index
View Solution (click to reveal)
Explain the index health
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/cluster-allocation-explain.html
GET _cluster/allocation/explain
{
"index": "broken_index"
}
// output
{
"error" : {
"root_cause" : [
{
"type" : "action_request_validation_exception",
"reason" : "Validation Failed: 1: shard must be specified;2: primary must be specified;"
}
],
"type" : "action_request_validation_exception",
"reason" : "Validation Failed: 1: shard must be specified;2: primary must be specified;"
},
"status" : 400
}
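As the validation error says, the request must identify a specific shard copy. A corrected call, assuming we ask about a replica copy of shard 0:
GET _cluster/allocation/explain
{
  "index": "broken_index",
  "shard": 0,
  "primary": false
}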
❓ View the shard allocation for the index
View Solution (click to reveal)
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/cat-shards.html
GET _cat/shards/broken_index?v&s=index
// output
index shard prirep state docs store ip node
broken_index 1 p STARTED 0 208b 172.20.0.2 esnode
broken_index 1 r UNASSIGNED
broken_index 1 r UNASSIGNED
broken_index 0 p STARTED 1 3.8kb 172.20.0.2 esnode
broken_index 0 r UNASSIGNED
broken_index 0 r UNASSIGNED
Here we can see that all four replica shards are UNASSIGNED. Why is that?
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/cluster-allocation-explain.html
A lot of information is presented here; if you have a larger underlying issue you will see many explanations across many indices. Try to keep that in mind.
For each index, and then for each node, read the explanations.
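A quick way to surface the blocking reason is to call the API with no request body; it then explains the first unassigned shard it finds:
GET _cluster/allocation/explain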
❓ Repair the index by reducing the number of replicas required, to match the number of nodes available to hold replicas: in this case 0, as we are running a single-node cluster.
PUT /broken_index/_settings
{
"number_of_replicas": 0
}
// output
{
"acknowledged" : true
}
Check the index health again
GET _cat/indices/broken_index?v
// output
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open broken_index xHfY0EcGRq-RRZv_cQONCw 2 0 1 0 4kb 4kb
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/snapshots-take-snapshot.html
You cannot back up an Elasticsearch cluster by simply taking a copy of the data directories of all of its nodes.
Elasticsearch may be making changes to the contents of its data directories while it is running; copying its data directories cannot be expected to capture a consistent picture of their contents.
If you try to restore a cluster from such a backup, it may fail and report corruption and/or missing files. Alternatively, it may appear to have succeeded though it silently lost some of its data.
The only reliable way to back up a cluster is by using the snapshot and restore functionality.
- Back up the data
- Back up the cluster configuration
- Back up the security configuration
- Restore the data
- Restore the security configuration
Version compatibility refers to the underlying Lucene index compatibility. Follow the Upgrade documentation when migrating between versions.
A snapshot contains a copy of the on-disk data structures that make up an index. This means that snapshots can only be restored to versions of Elasticsearch that can read the indices:
- A snapshot of an index created in 6.x can be restored to 7.x.
- A snapshot of an index created in 5.x can be restored to 6.x.
- A snapshot of an index created in 2.x can be restored to 5.x.
- A snapshot of an index created in 1.x can be restored to 2.x.
You must register a snapshot repository before you can perform snapshot and restore operations. We recommend creating a new snapshot repository for each major version. The valid repository settings depend on the repository type.
❓ Backup and restore an index using snapshots
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/modules-snapshots.html (references other documentation)
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/snapshot-restore.html
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/snapshots-register-repository.html#self-managed-repo-types
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/restore-snapshot-api.html#restore-snapshot-api-index-settings
- ❓ Back up the `shakespeare` index to a snapshot called `shakespeare_snapshot_<current_date>`
View Solution (click to reveal)
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/snapshot-restore.html
You will need to make sure that the `path.repo` setting has been applied to each Elasticsearch node before doing this.
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/snapshots-register-repository.html
The Docker images in this Git repository have this set in the `1es-1kb-xpackSec.yml` single-node cluster, which was used predominantly throughout the other sections.
Normally you would save snapshots to shared storage such as NFS or AWS S3. In this demo we use the local filesystem `/tmp`, which is not recommended in production.
This can all be done in the Kibana GUI: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/snapshot-restore.html
GET /_nodes?pretty&filter_path=nodes.*.settings.path
// output
{
"nodes" : {
"HKbyLT8xRMC08bJO32XNFg" : {
"settings" : {
"path" : {
"logs" : "/usr/share/elasticsearch/logs",
"home" : "/usr/share/elasticsearch",
"repo" : "/tmp"
}
}
}
}
}
As you can see, `path.repo` is set to `/tmp`.
PUT /_snapshot/my_test_backup
{
"type": "fs",
"settings": {
"location": "/tmp/test1",
"compress": true
}
}
// output
{
"acknowledged" : true
}
Notice that `/tmp` needed to be available, and that you can then append a path to it, e.g. `/tmp/test1`.
You must enclose date math names in angle brackets `<>`. If you use the name in a request path, the special characters must be URI encoded: `%3C` for `<` and `%3E` for `>`.
PUT /_snapshot/my_test_backup/%3Cshakespeare-snapshot-%7Bnow%2Fd%7D%3E
{
"indices": "shakespeare",
"ignore_unavailable": true,
"include_global_state": false
}
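Snapshotting is asynchronous: the PUT returns once the snapshot has been initialised. Progress can be polled with the snapshot status API, using the concrete name the date math resolved to (shown in the listing below):
GET /_snapshot/my_test_backup/shakespeare-snapshot-2021.05.13/_status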
List all snapshots
GET /_snapshot/my_test_backup/_all
// or
GET /_snapshot/my_test_backup/*
// output
{
"snapshots" : [
{
"snapshot" : "shakespeare-snapshot-2021.05.13",
"uuid" : "t0Qk_gDtT1uqvmrbNg7yIQ",
"version_id" : 7020099,
"version" : "7.2.0",
"indices" : [
"shakespeare"
],
"include_global_state" : false,
"state" : "SUCCESS",
"start_time" : "2021-05-13T18:59:27.117Z",
"start_time_in_millis" : 1620932367117,
"end_time" : "2021-05-13T18:59:28.050Z",
"end_time_in_millis" : 1620932368050,
"duration_in_millis" : 933,
"failures" : [ ],
"shards" : {
"total" : 1,
"failed" : 0,
"successful" : 1
}
}
]
}
- ❓ Restore the `shakespeare_snapshot_<current_date>` index snapshot to the name `restored_index_shakespeare`
View Solution (click to reveal)
POST /_snapshot/my_test_backup/shakespeare-snapshot-2021.05.13/_restore
{
"indices": "shakespeare",
"ignore_unavailable": true,
"include_global_state": true,
"rename_pattern": "(.+)",
"rename_replacement": "restored_index_$1"
}
// output
{
"accepted" : true
}
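The rename is what allows this restore to run alongside the still-open original index. To restore over the original name instead, the existing index would first have to be closed or deleted, e.g.:
POST /shakespeare/_close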
Check the restored index
GET _cat/indices/*shakespeare?v
// output
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open shakespeare h21mMC7ZRWGB1OPjgW0VuQ 1 1 111396 0 20.5mb 20.5mb
yellow open restored_index_shakespeare gKIdyU4jSnqmuBLRqPpZLw 1 1 111396 0 20.5mb 20.5mb
- 📕 Backup/Restore the cluster configuration
We recommend that you take regular (ideally, daily) backups of your Elasticsearch config (`$ES_PATH_CONF`) directory, normally `/etc/elasticsearch`, using the file backup software of your choice.
Some settings in configuration files might be overridden by cluster settings. You can capture these settings in a data backup snapshot by specifying the include_global_state: true (default) parameter for the snapshot API.
Alternatively, you can extract these configuration values in text format by using the get settings API:
GET _cluster/settings?pretty&flat_settings&filter_path=persistent
So it's probably a good idea to back up your `/etc/elasticsearch/` folder and run an external API call to download the persistent settings to a text file.
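A minimal sketch of that external call, assuming an unsecured node on localhost (add credentials with -u if security is enabled):
curl -s 'http://localhost:9200/_cluster/settings?flat_settings&filter_path=persistent' > persistent-settings.json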
- 📕 Backup/Restore the Security configuration
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/security-backup.html (just a reference to the following 2 links) https://www.elastic.co/guide/en/elasticsearch/reference/8.1/snapshots-take-snapshot.html#back-up-config-files https://www.elastic.co/guide/en/elasticsearch/reference/8.1/snapshots-take-snapshot.html#cluster-state-snapshots
Elasticsearch security features are configured using the xpack.security namespace inside the elasticsearch.yml and elasticsearch.keystore files. In addition there are several other extra configuration files inside the same ES_PATH_CONF directory. These files define roles and role mappings and configure the file realm.
Elasticsearch security features store system configuration data inside a dedicated index. This index is named .security-6 in the Elasticsearch 6.x versions and .security-7 in the 7.x releases. The .security alias always points to the appropriate index. This index contains the data which is not available in configuration files and cannot be reliably backed up using standard filesystem tools. This data describes:
- the definition of users in the native realm (including hashed passwords)
- role definitions (defined via the create roles API)
- role mappings (defined via the create role mappings API)
- application privileges
- API keys
The .security index thus contains resources and definitions in addition to configuration information. All of that information is required in a complete security features backup.
Use the standard Elasticsearch snapshot functionality to backup .security, as you would for any other data index.
Snapshot the .security index in a dedicated repository, where read and write access is strictly restricted and audited.
So: back up the `/etc/elasticsearch` folder, and snapshot the `.security` index alias.
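A sketch of that snapshot, assuming a dedicated, access-restricted repository called security_backup has already been registered:
PUT /_snapshot/security_backup/%3Csecurity-snapshot-%7Bnow%2Fd%7D%3E
{
  "indices": ".security",
  "include_global_state": false
}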
See https://www.elastic.co/guide/en/elasticsearch/reference/7.13/restore-security-configuration.html
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/searchable-snapshots.html
Searchable snapshots let you use snapshots to search infrequently accessed and read-only data in a very cost-effective fashion. The cold and frozen data tiers use searchable snapshots to reduce your storage and operating costs.
Searchable snapshots eliminate the need for replica shards, potentially halving the local storage needed to search your data. Searchable snapshots rely on the same snapshot mechanism you already use for backups and have minimal impact on your snapshot repository storage costs.
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ilm-searchable-snapshot.html
Well worth watching: https://www.youtube.com/watch?v=nN6JNP9i3qQ
❓ Create a searchable snapshot of the Kibana eCommerce data.
View Solution (click to reveal)
The `xpack.searchable.snapshot.shared_cache.size` setting controls the shared cache used for mounted snapshots. It defaults to 90% of total disk space for dedicated frozen data tier nodes, and otherwise to `0b`; here it is set explicitly:
xpack.searchable.snapshot.shared_cache.size=100mb
PUT /_snapshot/my_snapshots
{
"type": "fs",
"settings": {
"location": "/tmp/snapshots",
"compress": true
}
}
GET _snapshot/my_snapshots
PUT /_snapshot/my_snapshots/%3Cecomm-snapshot-%7Bnow%2Fd%7D%3E
{
"indices": "kibana_sample_data_ecommerce",
"ignore_unavailable": true,
"include_global_state": false
}
GET _snapshot/my_snapshots/ecomm*
This is the pièce de résistance: the snapshot is mounted in the local shared cache.
POST _snapshot/my_snapshots/ecomm-snapshot-2021.10.07/_mount?storage=shared_cache
{
"index" : "kibana_sample_data_ecommerce",
"renamed_index": "mounted-ecomm"
}
GET _cat/indices/mounted-ecomm?v
GET _cat/count/mounted-ecomm
GET mounted-ecomm/_count
GET mounted-ecomm/_search
{
"size": 5,
"query": {
"match_all": {}
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/searchable-snapshots-api-clear-cache.html
Clear the cache:
POST /mounted-ecomm/_searchable_snapshots/cache/clear
Then delete the mounted index:
DELETE mounted-ecomm
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/modules-cross-cluster-search.html
PUT _cluster/settings
{
"persistent": {
"cluster": {
"remote": {
"cluster_one": {
"seeds": [
"10.0.0.1:9300"
]
},
"cluster_two": {
"seeds": [
"2.0.0.1:9300"
]
},
"cluster_three": {
"seeds": [
"3.0.0.1:9300"
]
}
}
}
}
}
Here we have three clusters: one on the local network and two others out on the internet.
💡 This setup can easily be achieved within the Kibana GUI, and is part of the cross-cluster replication section below.
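Once remotes are configured, connectivity can be verified with the remote info API:
GET _remote/info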
GET _cluster/settings?pretty&flat_settings
// output
{
"persistent" : {
"cluster.remote.west-cluster.mode" : "sniff",
"cluster.remote.west-cluster.node_connections" : "3",
"cluster.remote.west-cluster.seeds" : [
"esnode-west:9300"
],
"cluster.remote.west-cluster.skip_unavailable" : "false"
},
"transient" : { }
}
GET west-cluster:follower-kibana_sample_data_ecommerce/_count
// output
{
"count" : 4675,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
GET /cluster_one:twitter/_search
{
"query": {
"match": {
"user": "kimchy"
}
}
}
Here we search one of the remote clusters from inside our local cluster.
Remote clusters are addressed as `<remote_name>:<index_name>`.
GET /twitter,cluster_one:twitter,cluster_two:twitter/_search
{
"query": {
"match": {
"user": "kimchy"
}
}
}
Here we search the local cluster and two remote clusters.
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/xpack-ccr.html
With cross-cluster replication, you can replicate indices across clusters to:
- Continue handling search requests in the event of a datacenter outage
- Prevent search volume from impacting indexing throughput
- Reduce search latency by processing search requests in geo-proximity to the user
Cross-cluster replication uses an active-passive model. You index to a leader index, and the data is replicated to one or more read-only follower indices. Before you can add a follower index to a cluster, you must configure the remote cluster that contains the leader index.
This is a heavily involved process; follow these links, with the guidelines below.
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ccr-getting-started.html (reference to the following)
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/ccr-getting-started-tutorial.html
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/remote-clusters-connect.html
❓ Replicate `kibana_sample_data_ecommerce` from the east cluster to the west cluster
View Solution (click to reveal)
Use the `2es-2kibana-xpack-cluster-713.yml` docker-compose file. This also requires `kibana-east.yml` and `kibana-west.yml`.
The docker-compose file has the correct node settings applied:
node.roles=master,data,ingest,remote_cluster_client
You will have to set up a 30-day trial licence to be able to do this part of the lab. This can easily be done once the nodes are up.
Stack Management -> License management -> Start a 30-day trial -> Start trial {click} -> Start my trial {click}
Do this in Kibana as per the instructions above.
Stack Management -> Remote Clusters -> Add a remote cluster
| Cluster | Kibana URL | Remote Name | Seed Nodes | Direction |
|---|---|---|---|---|
| East | http://<your host ip>:5601 | west-cluster | esnode-west:9300 | Leader |
| West | http://<your host ip>:5602 | east-cluster | esnode-east:9300 | Follower |
Add the sample data to the Leader (east)
Add the Kibana eCommerce sample data.
Replicate the following index from East to West: `kibana_sample_data_ecommerce`
The index status changes to Paused. When the remote recovery process is complete, the index following begins and the status changes to Active.
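Behind the Kibana form this is the CCR follow API; an equivalent sketch, run on the West cluster once its east-cluster remote is configured:
PUT /follower-kibana_sample_data_ecommerce/_ccr/follow?wait_for_active_shards=1
{
  "remote_cluster": "east-cluster",
  "leader_index": "kibana_sample_data_ecommerce"
}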
- On West, create an auto-follow pattern `test-ccr-index-*`
- On East, create a time series index `test-ccr-index-000000`
- Add docs to the new time series index, and confirm they arrive on West
| Name | Remote Cluster | Index Patterns | Prefix | Suffix |
|---|---|---|---|---|
| east-auto-follow | east-cluster | test-ccr-index-* | follower- | none |
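The same pattern can be created with the CCR auto-follow API (a sketch, run on West):
PUT /_ccr/auto_follow/east-auto-follow
{
  "remote_cluster": "east-cluster",
  "leader_index_patterns": [ "test-ccr-index-*" ],
  "follow_index_pattern": "follower-{{leader_index}}"
}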
PUT test-ccr-index-000000
{
"settings": {
"number_of_replicas": 0,
"number_of_shards": 1
},
"mappings": {
"properties": {
"@timestamp" : {
"type": "date"
},
"data": {
"type": "text"
}
}
}
}
Add the docs
POST test-ccr-index-000000/_doc
{
"@timestamp": "2021-10-07T11:50:00.000Z",
"data" : "test data"
}
POST test-ccr-index-000000/_doc
{
"@timestamp": "2021-10-07T11:55:00.000Z",
"data" : "test data"
}
Test the index
GET test-ccr-index-000000/_search
{
"query": {
"match_all": {}
}
}
GET _cat/indices/follow*?v
// output
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open follower-kibana_sample_data_ecommerce RwN36YDWRNuLffDGftF0eA 1 0 4675 0 4.4mb 4.4mb
green open follower-test-ccr-index-000000 K8gUG9L2Rz2oqTX-OB_P3A 1 0 2 0 3.8kb 3.8kb
Test the follower index
GET follower-test-ccr-index-000000/_search
{
"query": {
"match_all": {}
}
}
# Bonus below! Not part of the 8.1 exam topics!
# Define role-based access control using Elasticsearch Security
⚠️ ⚠️ ⚠️ IMPORTANT NOTE: from here on it is assumed that you have a working Kibana node to provide the development console, and that you have imported the sample data.
❓ To do this section you need to:
- create roles and users
You will see an error like the one below if you have a Basic licence, because the following role parameters are not available under a Basic licence:
"field_security" : { ... }, # field level security
"query": "..." # document level security
Error message:
{
"error": {
"root_cause": [
{
"type": "security_exception",
"reason": "current license is non-compliant for [field and document level security]",
"license.expired.feature": "field and document level security"
}
],
"type": "security_exception",
"reason": "current license is non-compliant for [field and document level security]",
"license.expired.feature": "field and document level security"
},
"status": 403
}
- a role called `flights_all` for read-only access on the Flight sample data; the role should have cluster monitor access
- a user called `flight_reader_all` that has the role applied; the user password should be `flight123`
View Solution (click to reveal)
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/built-in-roles.html https://www.elastic.co/guide/en/elasticsearch/reference/8.1/security-privileges.html
PUT _security/role/flights_all
{
"cluster": [ "monitor" ],
"indices": [
{
"names": ["kibana_sample_data_flights"],
"privileges": ["read","view_index_metadata", "monitor"]
}
]
}
PUT _security/user/flight_reader_all
{
"password": "flight123",
"roles": [ "kibana_user", "flights_all" ],
"full_name": "flights all",
"email": "[email protected]"
}
Test the user access:
- Logout as elastic
- Login to Kibana as `flight_reader_all` with password `flight123`
- Go to the dev console and see what indices you have access to.
Check that we can access the index stats (monitor) and only the index we have allowed access to.
GET _cat/indices
green open kibana_sample_data_flights R1AptZYHTrivEUEmtftubg 1 0 13059 0 6.6mb 6.6mb
Check that we can query the index. Get the document count
GET kibana_sample_data_flights/_count
{
"count" : 13059,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
- a role called `flights_australia` for read-only access on the Flight data that only allows access to documents with a Destination Country of Australia; only the following fields are allowed to be displayed: Flight Number, Country of Origin and City of Origin
- a user called `flight_reader_au` that should have the role applied
View Solution (click to reveal)
# test your query first
POST kibana_sample_data_flights/_search
{
"query": {
"match": {
"DestCountry": "AU"
}
}
}
PUT _security/role/flights_australia
{
"indices": [
{
"names": [
"kibana_sample_data_flights"
],
"privileges": [
"read"
],
"field_security": {
"grant": ["FlightNum", "OriginCountry", "OriginCityName"]
},
"query": {
"match": {
"DestCountry": "AU"
}
}
}
]
}
Create a user for that role
PUT _security/user/flight_reader_au
{
"password": "flight123",
"roles": "flights_australia",
"full_name": "flights australia",
"email": "[email protected]"
}
Test the user access:
- Logout as elastic
- Login to Kibana as `flight_reader_au` with password `flight123`
- Go to the dev console and see what indices you have access to.
#TODO: write a query to count the number of documents accessible
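A minimal sketch for that count: logged in as flight_reader_au, the role's document-level security query is applied automatically, so a plain count only returns the documents the role allows:
GET kibana_sample_data_flights/_count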