Agent remains `Unhealthy` even on updating invalid integration configuration to valid input. #2954

amolnater-qasource · 2023-06-28T06:11:59Z

Issue

Kibana Build details:

VERSION: 8.9.0 BC1
BUILD: 64385
COMMIT: 313dac73d8d3bc5930447f732e3ae163fb1b7f70

Host OS and Browser version: All, All

Preconditions:

1. An 8.9.0 BC1 Kibana cloud environment should be available.
2. A few agents should be installed.

Steps to reproduce:

1. Navigate to the Fleet > Agents tab.
2. Select any agent and navigate to its agent policy > system-1 integration.
3. Enter an invalid value, say xxxxxx, in the Cpu metrics field.
4. Observe that the agent becomes Unhealthy and that an appropriate error is visible under agent details.
5. Update the field to the expected valid value, percentages.
6. Observe that the agent remains Unhealthy even after 30 minutes.

Expected:
The agent should return to Healthy once the invalid integration configuration is updated to valid input.

Screen Recording:

Agents.-.Fleet.-.Elastic.-.Google.Chrome.2023-06-28.11-00-17.mp4

ec2amaz-tc0oajr.-.Agents.-.Fleet.-.Elastic.-.Google.Chrome.2023-06-28.11-21-38.mp4

Debug Logs:
elastic-agent-diagnostics-2023-06-28T06-08-54Z-00.zip

Definition of done

Agents should return to Healthy when switching from invalid input to valid input.
Tests are in place to confirm that the Agent state matches what we expect when the input status changes (in both directions).
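
The second item could be exercised with a test along the following lines. This is only a sketch under stated assumptions: `Unit`, `apply_config`, and `aggregate_health` are hypothetical names used for illustration and are not Elastic Agent APIs, and the numeric state codes (2 alongside a "Healthy" message, 4 alongside a failure message) are inferred from the state.yaml excerpt quoted in this issue.

```python
from dataclasses import dataclass

# Assumed state codes, inferred from the state.yaml excerpt in this issue:
# 2 appears with a "Healthy" message, 4 appears with a failure message.
HEALTHY = 2
FAILED = 4


@dataclass
class Unit:
    """Hypothetical stand-in for one unit of a component."""
    unit_id: str
    state: int
    message: str = ""


def aggregate_health(units):
    """Derive the agent status from the latest unit states only.

    No earlier failure is remembered, so a corrected configuration is
    enough to bring the status back to healthy.
    """
    return "unhealthy" if any(u.state == FAILED for u in units) else "healthy"


def apply_config(unit, core_metrics):
    """Hypothetical reload: validate the values and update the unit state."""
    valid = {"percentages", "ticks"}  # options named in the reported error
    bad = [m for m in core_metrics if m not in valid]
    if bad:
        unit.state = FAILED
        unit.message = f"invalid core.metrics value '{bad[0]}'"
    else:
        unit.state = HEALTHY
        unit.message = ""


def test_health_follows_input_state_both_ways():
    unit = Unit("system/metrics-default", HEALTHY)
    assert aggregate_health([unit]) == "healthy"

    apply_config(unit, ["xxxxxx"])       # valid -> invalid
    assert aggregate_health([unit]) == "unhealthy"

    apply_config(unit, ["percentages"])  # invalid -> valid
    assert aggregate_health([unit]) == "healthy"


test_health_follows_input_state_both_ways()
```

The design point is that the status is recomputed from current unit states on every change; the bug reported here behaves as if a stale failure were retained after the reload.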


elasticmachine · 2023-06-28T06:12:01Z

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

amolnater-qasource · 2023-06-28T06:12:18Z

@manishgupta-qasource Please review.

manishgupta-qasource · 2023-06-28T06:46:35Z

Secondary review for this ticket is Done

cmacknz · 2023-06-28T08:50:52Z

This is the failing input in state.yaml:

- id: system/metrics-default
  state:
    state: 2
    message: 'Healthy: communicating with pid ''5768'''
    units:
      ? unittype: 0
        unitid: system/metrics-default-system/metrics-system-6d2e7b5d-b166-466c-8fb5-db5c3a512387
      : state: 4
        message: '[failed to reload inputs: 1 error: Error creating runner from config:
          1 error: error validating config: invalid core.metrics value ''xxxxxxx''
          (valid options are percentages and ticks)]'
      ? unittype: 1
        unitid: system/metrics-default
      : state: 4
        message: '[failed to reload inputs: 1 error: Error creating runner from config:
          1 error: error validating config: invalid core.metrics value ''xxxxxxx''
          (valid options are percentages and ticks)]'
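
Note the mismatch in the excerpt above: the component reports state 2 with a "Healthy" message while both of its units report state 4. A minimal sketch of how one might spot this when reading a diagnostics bundle is shown below. The data is transcribed by hand into Python structures rather than parsed from the file, the helper `failed_units` is a hypothetical name, and the reading of 2 as healthy and 4 as failed is an assumption based on the messages in the excerpt.

```python
# Assumed meaning of the numeric codes, based on the accompanying messages.
HEALTHY = 2
FAILED = 4

ERROR = (
    "[failed to reload inputs: 1 error: Error creating runner from config: "
    "1 error: error validating config: invalid core.metrics value 'xxxxxxx' "
    "(valid options are percentages and ticks)]"
)

# Hand-transcribed from the state.yaml excerpt; unittype values copied as-is.
component = {
    "id": "system/metrics-default",
    "state": HEALTHY,
    "message": "Healthy: communicating with pid '5768'",
    "units": [
        {
            "unittype": 0,
            "unitid": "system/metrics-default-system/metrics-system-"
                      "6d2e7b5d-b166-466c-8fb5-db5c3a512387",
            "state": FAILED,
            "message": ERROR,
        },
        {
            "unittype": 1,
            "unitid": "system/metrics-default",
            "state": FAILED,
            "message": ERROR,
        },
    ],
}


def failed_units(comp):
    """Return the IDs of units whose state is the assumed 'failed' code."""
    return [u["unitid"] for u in comp["units"] if u["state"] == FAILED]


stale = failed_units(component)
if component["state"] == HEALTHY and stale:
    print(f"{component['id']}: component healthy, but units failed: {stale}")
```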

The configuration for the system/metrics in pre-config.yaml looks fine, and matches what is in beat-rendered-config.yaml:

- data_stream:
    namespace: windows
  id: system/metrics-system-6d2e7b5d-b166-466c-8fb5-db5c3a512387
  meta:
    package:
      name: system
      version: 1.34.0
  name: system-2
  package_policy_id: 6d2e7b5d-b166-466c-8fb5-db5c3a512387
  revision: 3
  streams:
  - core.metrics:
    - percentages
    data_stream:
      dataset: system.core
      type: metrics
    id: system/metrics-system.core-6d2e7b5d-b166-466c-8fb5-db5c3a512387
    metricsets:
    - core
  - cpu.metrics:
    - percentages
    - normalized_percentages
    data_stream:
      dataset: system.cpu
      type: metrics
    id: system/metrics-system.cpu-6d2e7b5d-b166-466c-8fb5-db5c3a512387
    metricsets:
    - cpu
    period: 10s
  - data_stream:
      dataset: system.diskio
      type: metrics
    diskio.include_devices: null
    id: system/metrics-system.diskio-6d2e7b5d-b166-466c-8fb5-db5c3a512387
    metricsets:
    - diskio
    period: 10s
  - data_stream:
      dataset: system.filesystem
      type: metrics
    id: system/metrics-system.filesystem-6d2e7b5d-b166-466c-8fb5-db5c3a512387
    metricsets:
    - filesystem
    period: 1m
    processors:
    - drop_event:
        when:
          regexp:
            system.filesystem.mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)
  - data_stream:
      dataset: system.fsstat
      type: metrics
    id: system/metrics-system.fsstat-6d2e7b5d-b166-466c-8fb5-db5c3a512387
    metricsets:
    - fsstat
    period: 1m
    processors:
    - drop_event:
        when:
          regexp:
            system.fsstat.mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)
  - condition: ${host.platform} != 'windows'
    data_stream:
      dataset: system.load
      type: metrics
    id: system/metrics-system.load-6d2e7b5d-b166-466c-8fb5-db5c3a512387
    metricsets:
    - load
    period: 10s
  - data_stream:
      dataset: system.memory
      type: metrics
    id: system/metrics-system.memory-6d2e7b5d-b166-466c-8fb5-db5c3a512387
    metricsets:
    - memory
    period: 10s
  - data_stream:
      dataset: system.network
      type: metrics
    id: system/metrics-system.network-6d2e7b5d-b166-466c-8fb5-db5c3a512387
    metricsets:
    - network
    network.interfaces: null
    period: 10s
  - data_stream:
      dataset: system.process
      type: metrics
    id: system/metrics-system.process-6d2e7b5d-b166-466c-8fb5-db5c3a512387
    metricsets:
    - process
    period: 10s
    process.cgroups.enabled: false
    process.cmdline.cache.enabled: true
    process.include_cpu_ticks: false
    process.include_top_n.by_cpu: 5
    process.include_top_n.by_memory: 5
    processes:
    - .*
  - data_stream:
      dataset: system.process.summary
      type: metrics
    id: system/metrics-system.process.summary-6d2e7b5d-b166-466c-8fb5-db5c3a512387
    metricsets:
    - process_summary
    period: 10s
  - data_stream:
      dataset: system.socket_summary
      type: metrics
    id: system/metrics-system.socket_summary-6d2e7b5d-b166-466c-8fb5-db5c3a512387
    metricsets:
    - socket_summary
    period: 10s
  - data_stream:
      dataset: system.uptime
      type: metrics
    id: system/metrics-system.uptime-6d2e7b5d-b166-466c-8fb5-db5c3a512387
    metricsets:
    - uptime
    period: 10s
  type: system/metrics
  use_output: default
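
To make "looks fine" concrete, the check below applies the rule from the error message (valid `core.metrics` options are `percentages` and `ticks`) to the relevant streams of the rendered configuration. It is a sketch only: the streams are transcribed by hand from the config above, `invalid_core_metrics` is a hypothetical helper, and the real validation lives in the Beat, not in this code.

```python
# Valid options as stated in the reported error message.
VALID_CORE_METRICS = {"percentages", "ticks"}

# Hand-transcribed subset of the streams in the rendered configuration.
streams = [
    {
        "id": "system/metrics-system.core-6d2e7b5d-b166-466c-8fb5-db5c3a512387",
        "metricsets": ["core"],
        "core.metrics": ["percentages"],
    },
    {
        "id": "system/metrics-system.cpu-6d2e7b5d-b166-466c-8fb5-db5c3a512387",
        "metricsets": ["cpu"],
        "cpu.metrics": ["percentages", "normalized_percentages"],
    },
]


def invalid_core_metrics(stream_list):
    """Return (stream id, value) pairs for core.metrics values not allowed."""
    problems = []
    for stream in stream_list:
        for value in stream.get("core.metrics") or []:
            if value not in VALID_CORE_METRICS:
                problems.append((stream["id"], value))
    return problems


# An empty list means the current config no longer contains the bad value,
# so the failed unit state is not explained by the configuration itself.
print(invalid_core_metrics(streams))
```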

This is definitely a bug, one we haven't seen before.

LaZyDK · 2023-09-12T11:22:57Z

We are seeing something similar in 8.9.2.
The first "Custom Logs" integration is not enabled, hence the lack of errors.

AndersonQ · 2023-09-28T09:22:18Z

@amolnater-qasource could you re-validate this issue? I cannot reproduce it on main or on 8.10.2. It was most likely fixed by elastic/beats#36183.

LaZyDK · 2023-09-28T09:31:46Z

The issue has gone away in 8.10.1.

amolnater-qasource · 2023-09-29T06:45:51Z

Hi @AndersonQ

Thank you for the update.

We have revalidated this issue on 8.10.2 and 8.11.0-SNAPSHOT Kibana cloud environments and found it fixed.

Observations:

The agent returns to Healthy after the invalid integration configuration is updated to valid input.

Screen Recording:
8.11.0:

ec2amaz-u7odjck.-.Agents.-.Fleet.-.Elastic.-.Google.Chrome.2023-09-29.12-04-45.mp4

8.10.2:

Edit.integration.-.Windows.Agent.policy.1.-.Agent.policies.-.Fleet.-.Elastic.-.Google.Chrome.2023-09-29.12-08-59.mp4

Build details:
VERSION: 8.11.0 SNAPSHOT
BUILD: 67332
COMMIT: c20d177a036be73d7b1180dc17e644afa260994f

Hence we are closing this issue and marking it as QA:Validated.

Thanks!!

harshitgupta-qasource · 2024-01-24T10:33:33Z

`Bug Conversion`

Test-Case not required as this particular checkpoint is already covered in exploratory testing.

Thanks!

amolnater-qasource added the bug, Team:Elastic-Agent-Control-Plane, and impact:high labels Jun 28, 2023

pierrehilbert assigned AndersonQ Jul 7, 2023

harshitgupta-qasource mentioned this issue Jul 19, 2023

[Fleet] Implement per-integration health reporting elastic/kibana#154634

Closed

amolnater-qasource closed this as completed Sep 29, 2023

amolnater-qasource added the QA:Validated label Sep 29, 2023
