ILM Delete Fails when APM Integration installed by Kibana #7568

Closed
graphaelli opened this issue Mar 17, 2022 · 1 comment
APM Server version: 8.2.0-SNAPSHOT

Description of the problem including expected versus actual behavior:

When Kibana (or any other user with insufficient permissions, but Kibana is the only common one) installs the APM integration, the ILM delete phase for APM data streams fails. This is particularly relevant for ESS, where the kibana_system user commonly installs the APM integration.

Steps to reproduce:

Apply this patch (generated against fd4ee95) to a recent apm-server checkout to configure auto-installation of the APM integration (it also shortens the ILM policy while we're at it):

diff --git a/apmpackage/apm/data_stream/traces/elasticsearch/ilm/default_policy.json b/apmpackage/apm/data_stream/traces/elasticsearch/ilm/default_policy.json
index 8a46dcb69..2386440d3 100644
--- a/apmpackage/apm/data_stream/traces/elasticsearch/ilm/default_policy.json
+++ b/apmpackage/apm/data_stream/traces/elasticsearch/ilm/default_policy.json
@@ -4,8 +4,8 @@
             "hot": {
                 "actions": {
                     "rollover": {
-                        "max_age": "30d",
-                        "max_size": "50gb"
+                        "max_age": "1m",
+                        "max_size": "1mb"
                     },
                     "set_priority": {
                         "priority": 100
@@ -13,7 +13,7 @@
                 }
             },
             "delete": {
-                "min_age": "10d",
+                "min_age": "1m",
                 "actions": {
                     "delete": {}
                 }
diff --git a/docker-compose.yml b/docker-compose.yml
index 7bf023676..b00718eb2 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -31,7 +31,7 @@ services:
       - "xpack.license.self_generated.type=trial"
       - "xpack.security.authc.token.enabled=true"
       - "xpack.security.authc.api_key.enabled=true"
-      - "logger.org.elasticsearch=${ES_LOG_LEVEL:-error}"
+      - "logger.org.elasticsearch=${ES_LOG_LEVEL:-info}"
       - "action.destructive_requires_name=false"
     volumes:
       - "./testing/docker/elasticsearch/roles.yml:/usr/share/elasticsearch/config/roles.yml"
@@ -63,6 +63,7 @@ services:
     image: docker.elastic.co/beats/elastic-agent:8.2.0-ff67d7b8-SNAPSHOT
     ports:
       - 8220:8220
+      - 8200:8200
     healthcheck:
       test: ["CMD-SHELL", "curl -s -k https://localhost:8220/api/status | grep -q 'HEALTHY'"]
       retries: 300
diff --git a/testing/docker/kibana/kibana.yml b/testing/docker/kibana/kibana.yml
index 81d52687f..99ec0416a 100644
--- a/testing/docker/kibana/kibana.yml
+++ b/testing/docker/kibana/kibana.yml
@@ -6,6 +6,8 @@ xpack.security.encryptionKey: fhjskloppd678ehkdfdlliverpoolfcr
 xpack.encryptedSavedObjects.encryptionKey: fhjskloppd678ehkdfdlliverpoolfcr
 
 xpack.fleet.packages:
+  - name: apm
+    version: latest
   - name: fleet_server
     version: latest
 xpack.fleet.agentPolicies:
@@ -19,3 +21,12 @@ xpack.fleet.agentPolicies:
         id: default-fleet-server
         package:
           name: fleet_server
+      - name: fleet_server-apm-server
+        id: default-apm-server
+        package:
+          name: apm
+        inputs:
+          - type: apm
+            vars:
+              - name: host
+                value: "0.0.0.0:8200"

Then:

# assemble the APM integration package
make build-package
# start all services including local EPR
docker-compose up -d
# ingest a trace
curl -i -H "Content-type: application/x-ndjson" --data-binary @testdata/intake-v2/events.ndjson http://localhost:8200/intake/v2/events
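
Before forcing a rollover, it can help to confirm that the events created the expected data stream and that its backing index has the policy attached (a sketch in Kibana Dev Tools syntax; names assume the default setup above):

```console
GET _data_stream/traces-apm-default
```

The response should list the backing index (e.g. .ds-traces-apm-default-2022.03.17-000001) with ilm_policy set to traces-apm.traces-default_policy.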

Now wait a few minutes or force ILM with:

POST traces-apm-default/_rollover
{
  "conditions": {
    "max_docs": 0
  }
}

and note that the ILM delete phase fails:

security_exception: action [indices:admin/delete] is unauthorized for user [kibana_system_user] with roles [kibana_system] on indices [.ds-traces-apm-default-2022.03.17-000001], this action is granted by the index privileges [delete_index,manage,all]
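
The failed step and the underlying error can also be inspected with the ILM explain API (index name as in the example above); on failure, the error above appears under step_info:

```console
GET .ds-traces-apm-default-*/_ilm/explain
```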

I think rollover works fine because of https://github.com/elastic/elasticsearch/blob/3a5a15b633273efef626b57ee0c1c2e5f09fb695/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/security/authz/store/ReservedRolesStore.java#L743-L756, but delete fails because the privilege grant is too specific here: https://github.com/elastic/elasticsearch/blob/3a5a15b633273efef626b57ee0c1c2e5f09fb695/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/security/authz/store/ReservedRolesStore.java#L771-L775
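
The privilege gap can be confirmed directly with the has-privileges API while authenticated as the kibana_system user (a sketch; credentials depend on your setup, and the index name follows the example above):

```console
POST /_security/user/_has_privileges
{
  "index": [
    {
      "names": [ ".ds-traces-apm-default-2022.03.17-000001" ],
      "privileges": [ "manage", "delete_index" ]
    }
  ]
}
```

If the response reports manage granted but delete_index denied on the backing index, that matches the role definition linked above.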

Some good news

One fix for this is expanding the kibana_system role's delete permissions in Elasticsearch. Until that (or another suitable solution) is in place, there is a workaround, for example for older clusters: re-save the ILM policy as a user that does have permission both to roll the data stream over and to delete the underlying indices. After doing that in Kibana at /app/management/data/index_lifecycle_management/policies/edit/traces-apm.traces-default_policy, a retry of the ILM policy execution succeeded:

POST .ds-traces-apm-default-2022.03.17-000001/_ilm/retry

triggered:

apm-server-elasticsearch-1  | {"@timestamp":"2022-03-17T21:11:07.297Z", "log.level": "INFO", "message":"[.ds-traces-apm-default-2022.03.17-000001/GcpWi7lDSAymQyMur59n0A] deleting index", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5b66f0ce99a8][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataDeleteIndexService","trace.id":"db4d487f9480c6ce9c244dcb4a9cf6a6","elasticsearch.cluster.uuid":"UP4deS_PRfep0Uuj-9GUTA","elasticsearch.node.id":"TDlJ0Nn2TVqcaD7RnYAi1w","elasticsearch.node.name":"5b66f0ce99a8","elasticsearch.cluster.name":"docker-cluster"}
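
The same re-save can be done through the ILM API instead of the Kibana UI: as a sufficiently privileged user (e.g. elastic), PUT the policy back unchanged, which records that user's credentials for subsequent ILM step execution. A sketch using the shortened phases from the patch above:

```console
PUT _ilm/policy/traces-apm.traces-default_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "1m", "max_size": "1mb" },
          "set_priority": { "priority": 100 }
        }
      },
      "delete": {
        "min_age": "1m",
        "actions": { "delete": {} }
      }
    }
  }
}
```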
