elastic-agent.yml: yaml: line 47: could not find expected ':' #98

mtojek · 2021-12-16T15:20:37Z

Hi Team,

While performing a periodic check of the logs and jobs, I found this flaky error:

Attaching to elastic-package-stack_elastic-agent_1
elastic-agent_1 | Policy selected for enrollment:  2016d7cc-135e-5583-9758-3ba01f5a06e5
elastic-agent_1 | {"log.level":"warn","@timestamp":"2021-12-16T13:03:56.424Z","log.logger":"tls","log.origin":{"file.name":"tlscommon/tls_config.go","file.line":105},"message":"SSL/TLS verifications disabled.","ecs.version":"1.6.0"}
elastic-agent_1 | {"log.level":"info","@timestamp":"2021-12-16T13:03:57.357Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":454},"message":"Starting enrollment to URL: http://fleet-server:8220/","ecs.version":"1.6.0"}
elastic-agent_1 | {"log.level":"info","@timestamp":"2021-12-16T13:03:58.134Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":252},"message":"Elastic Agent might not be running; unable to trigger restart","ecs.version":"1.6.0"}
elastic-agent_1 | Successfully enrolled the Elastic Agent.
elastic-agent_1 | Error: could not read configuration file /usr/share/elastic-agent/state/elastic-agent.yml: yaml: line 47: could not find expected ':'
elastic-agent_1 | For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.0/fleet-troubleshooting.html

source

The stack was started with an ordinary elastic-package stack up -v -d (7.16.0).

It causes some flakiness for the master builds.

elasticmachine · 2021-12-16T15:20:39Z

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

mtojek · 2021-12-17T12:25:05Z

Another occurrence, same story:

https://beats-ci.elastic.co/job/Ingest-manager/job/integrations/job/master/1348/artifact/build/elastic-stack-dump/netscout/logs/elastic-agent.log

jlind23 · 2021-12-17T15:23:01Z

@mtojek could you share the elastic-agent.yml file?

mtojek · 2021-12-17T15:48:17Z

I'd love to, but we don't modify the final one. It's inside the Docker image we use to run a container. I will try to collect more reference data.

BTW, if this error doesn't ring a bell, I can tweak elastic-package to pull this file out of the container.

ruflin · 2021-12-20T08:09:50Z

Nothing obvious comes to mind here unfortunately. @mtojek In 7.16 we shipped an elastic-agent diagnostic command that pulls out this file and lots of additional useful information. The problem is it has to be run inside the container. I'm wondering if this is an option instead, this would also make sure in the future we always have the diagnostic file available with all the info including logs.

Is your script doing any manual modifications to the elastic-agent.yml file?

mtojek · 2021-12-20T08:33:34Z

> The problem is it has to be run inside the container. I'm wondering if this is an option instead, this would also make sure in the future we always have the diagnostic file available with all the info including logs.

It would be great if this were exposed as an HTTP endpoint for diagnostics.

> Is your script doing any manual modifications to the elastic-agent.yml file?

Nah, elastic-package isn't aware of this particular config. If there isn't any other way, I can adjust the implementation and pull that one out too.

ruflin · 2021-12-20T14:12:34Z

@michel-laterman Interesting request to get diagnostics through an HTTP endpoint. This sounds a bit risky, as the diagnostics contain all the internal information, but I'm curious to hear your thoughts.

michel-laterman · 2021-12-21T00:37:14Z

I think it's currently too risky. There would definitely be an information leak if we exposed it over HTTP at the moment (API keys and credentials in the config, for example).

mtojek · 2022-02-10T08:36:47Z

@michel-laterman We managed to catch this bug and dump the policy. It looks like something is not commented out properly:

#   exponential: false
fleet.
#     #reporting_threshold: 10000

Full dump:

# ================================ General =====================================
# Beats is configured under Fleet, you can define most settings
# from the Kibana UI. You can update this file to configure the settings that
# are not supported by Fleet.
fleet:
  enabled: true

# agent.download:
#   # source of the artifacts, requires elastic like structure and naming of the binaries
#   # e.g /windows-x86.zip
#   sourceURI: "https://artifacts.elastic.co/downloads/beats/"
#   # path to the directory containing downloaded packages
#   target_directory: "${path.data}/downloads"
#   # timeout for downloading package
#   timeout: 120s
#   # file path to a public key used for verifying downloaded artifacts
#   # if not file is present Elastic Agent will try to load public key from elastic.co website.
#   pgpfile: "${path.data}/elastic.pgp"
#   # install_path describes the location of installed packages/programs. It is also used
#   # for reading program specifications.
#   install_path: "${path.data}/install"

# agent.process:
#   # minimal port number for spawned processes
#   min_port: 10000
#   # maximum port number for spawned processes
#   max_port: 30000
#   # timeout for creating new processes. when process is not successfully created by this timeout
#   # start operation is considered a failure
#   spawn_timeout: 30s

# agent.retry:
#   # enabled determines whether retry is possible. Default is false.
#   enabled: true
#   # retries_count specifies number of retries. Default is 3.
#   # Retry count of 1 means it will be retried one time after one failure.
#   retries_count: 3
#   # delay specifies delay in ms between retries. Default is 30s
#   delay: 30s
#   # max_delay specifies maximum delay in ms between retries. Default is 300s
#   max_delay: 5m
#   # Exponential determines whether delay is treated as exponential.
#   # With 30s delay and 3 retries: 30, 60, 120s
#   # Default is false
#   exponential: false
fleet.
#     #reporting_threshold: 10000
#     # Frequency used to check the queue of events to be sent out to fleet.
#     #reporting_check_frequency_sec: 30

# agent.download:
#   # source of the artifacts, requires elastic like structure and naming of the binaries
#   # e.g /windows-x86.zip
#   sourceURI: "https://artifacts.elastic.co/downloads/beats/"
#   # path to the directory containing downloaded packages
#   target_directory: "${path.data}/downloads"
#   # timeout for downloading package
#   timeout: 120s
#   # file path to a public key used for verifying downloaded artifacts
#   # if not file is present agent will try to load public key from elastic.co website.
#   pgpfile: "${path.data}/elastic.pgp"
#   # install_path describes the location of installed packages/programs. It is also used
#   # for reading program specifications.
#   install_path: "${path.data}/install"

# agent.process:
#   # timeout for creating new processes. when process is not successfully created by this timeout
#   # start operation is considered a failure
#   spawn_timeout: 30s
#   # timeout for stopping processes. when process is not stopped by this timeout then the process.
#   # is force killed
#   stop_timeout: 30s

# agent.grpc:
#   # listen address for the GRPC server that spawned processes connect back to.
#   address: localhost
#   # port for the GRPC server that spawned processes connect back to.
#   port: 6789

# agent.retry:
#   # Enabled determines whether retry is possible. Default is false.
#   enabled: true
#   # RetriesCount specifies number of retries. Default is 3.
#   # Retry count of 1 means it will be retried one time after one failure.
#   retriesCount: 3
#   # Delay specifies delay in ms between retries. Default is 30s
#   delay: 30s
#   # MaxDelay specifies maximum delay in ms between retries. Default is 300s
#   maxDelay: 5m
#   # Exponential determines whether delay is treated as exponential.
#   # With 30s delay and 3 retries: 30, 60, 120s
#   # Default is false
#   exponential: false

# agent.monitoring:
#   # enabled turns on monitoring of running processes
#   enabled: false
#   # enables log monitoring
#   logs: false
#   # enables metrics monitoring
#   metrics: false
#   # exposes /debug/pprof/ endpoints
#   # recommended that these endpoints are only enabled if the monitoring endpoint is set to localhost
#   pprof: false
#   # exposes agent metrics using http, by default sockets and named pipes are used
#   http:
#       # enables http endpoint
#       enabled: false
#       # The HTTP endpoint will bind to this hostname, IP address, unix socket or named pipe.
#       # When using IP addresses, it is recommended to only use localhost.
#       host: localhost
#       # Port on which the HTTP endpoint will bind. Default is 0 meaning feature is disabled.
#       port: 6791

# # Allow fleet to reload its configuration locally on disk.
# # Notes: Only specific process configuration will be reloaded.
# agent.reload:
#   # enabled configure the Elastic Agent to reload or not the local configuration.
#   #
#   # Default is true
#   enabled: true

#   # period define how frequent we should look for changes in the configuration.
#   period: 10s

# Logging

# There are four options for the log output: file, stderr, syslog, eventlog
# The file output is the default.

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#agent.logging.level: info

# Enable debug output for selected components. To enable all selectors use ["*"]
# Other available selectors are "beat", "publish", "service"
# Multiple selectors can be chained.
#agent.logging.selectors: [ ]

# Send all logging output to stderr. The default is false.
agent.logging.to_stderr: true

# Send all logging output to syslog. The default is false.
#agent.logging.to_syslog: false

# Send all logging output to Windows Event Logs. The default is false.
#agent.logging.to_eventlog: false

# If enabled, Elastic-Agent periodically logs its internal metrics that have changed
# in the last period. For each metric that changed, the delta from the value at
# the beginning of the period is logged. Also, the total values for
# all non-zero internal metrics are logged on shutdown. This setting is also passed
# to beats running under the agent. The default is true.
#agent.logging.metrics.enabled: true

# The period after which to log the internal metrics. The default is 30s.
#agent.logging.metrics.period: 30s

# Logging to rotating files. Set logging.to_files to false to disable logging to
# files.
#agent.logging.to_files: true
#agent.logging.files:
  # Configure the path where the logs are written. The default is the logs directory
  # under the home path (the binary location).
  #path: /var/log/elastic-agent

  # The name of the files where the logs are written to.
  #name: elastic-agent

  # Configure log file size limit. If limit is reached, log file will be
  # automatically rotated
  #rotateeverybytes: 10485760 # = 10MB

  # Number of rotated log files to keep. Oldest files will be deleted first.
  #keepfiles: 7

  # The permissions mask to apply when rotating log files. The default value is 0600.
  # Must be a valid Unix-style file permissions mask expressed in octal notation.
  #permissions: 0600

  # Enable log file rotation on time intervals in addition to size-based rotation.
  # Intervals must be at least 1s. Values of 1m, 1h, 24h, 7*24h, 30*24h, and 365*24h
  # are boundary-aligned with minutes, hours, days, weeks, months, and years as
  # reported by the local system clock. All other intervals are calculated from the
  # Unix epoch. Defaults to disabled.
  #interval: 0

  # Rotate existing logs on startup rather than appending to the existing
  # file. Defaults to true.
  # rotateonstartup: true

# Set to true to log messages in JSON format.
#agent.logging.json: false

# Set to true, to log messages with minimal required Elastic Common Schema (ECS)
# information. Recommended to use in combination with `logging.json=true`
# Defaults to false.
#agent.logging.ecs: false

# Providers

# Providers supply the key/values pairs that are used for variable substitution
# and conditionals. Each provider's keys are automatically prefixed with the name
# of the provider.

#providers:

# Agent provides information about the running agent.
#  agent:
#    enabled: true

# Docker provides inventory information from Docker.
#  docker:
#    enabled: true
#    host: "unix:///var/run/docker.sock"
#    cleanup_timeout: 60

# Env providers information about the running environment.
#  env:
#    enabled: true

# Host provides information about the current host.
#  host:
#    enabled: true

# Local provides custom keys to use as variable.
#  local:
#    enabled: true
#    vars:
#      foo: bar

# Local dynamic allows you to define multiple key/values to generate multiple configurations.
#  local_dynamic:
#    enabled: true
#    items:
#      - vars:
#          my_var: key1
#      - vars:
#          my_var: key2
#      - vars:
#          my_var: key3

mtojek · 2022-02-10T08:44:35Z

It looks like a bug in the YAML processor. I believe that the healthy YAML should look like this:

#   reporting:
#     # Reporting threshold indicates how many events should be kept in-memory before reporting them to fleet.
#     #reporting_threshold: 10000
#     # Frequency used to check the queue of events to be sent out to fleet.
#     #reporting_check_frequency_sec: 30
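
A stray line like the one above can be spotted in a dumped file with a rough heuristic scan. The following is a minimal Python sketch, not a YAML parser and not part of elastic-package or Elastic Agent; the function name suspicious_lines is hypothetical. It flags active (uncommented, non-blank) lines that are neither list items nor contain a ':' separator, which is enough to catch the lone "fleet." line in the dump.

```python
from __future__ import annotations


def suspicious_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for active lines that lack a ':' separator.

    Heuristic only: it ignores comments, blank lines, and list items, and it
    does not understand multi-line scalars or flow-style YAML.
    """
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and commented-out settings.
        if not stripped or stripped.startswith("#"):
            continue
        # Skip block-sequence items such as "- vars:" or "- value".
        if stripped == "-" or stripped.startswith("- "):
            continue
        # Any other active line in this config should be a "key: value" pair.
        if ":" not in stripped:
            hits.append((number, line))
    return hits


sample = "#   exponential: false\nfleet.\n#     #reporting_threshold: 10000\n"
print(suspicious_lines(sample))  # [(2, 'fleet.')]
```

Because the check is only a heuristic, a clean result does not prove the file is valid YAML; it is meant as a quick first look before loading the file with a real parser.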

ph · 2022-02-10T15:09:59Z

@mtojek Where is that configuration coming from? The diagnostic?

mtojek · 2022-02-10T15:18:42Z

I don't know which party generates it. The Elastic Agent in the container uses it by default when running as part of the Elastic Package stack. I assume that this is the default policy?

EDIT:

If you are asking how we pulled it out, it's 'docker cp'.

mtojek · 2022-02-16T08:41:54Z

Hey @ph!

Do you have any updates on this issue? It has happened again more recently on Integrations master (zscaler_zpa).

mtojek · 2022-03-16T11:18:29Z

Hey @ph,

Another occurrence: log

Let me know if I can provide you with more reference data.

ph · 2022-03-16T14:10:41Z

Attaching to elastic-package-stack_elastic-agent_1
elastic-agent_1 | Policy selected for enrollment:  elastic-agent-managed-ep
elastic-agent_1 | {"log.level":"warn","@timestamp":"2022-03-16T08:17:54.079Z","log.logger":"tls","log.origin":{"file.name":"tlscommon/tls_config.go","file.line":105},"message":"SSL/TLS verifications disabled.","ecs.version":"1.6.0"}
elastic-agent_1 | {"log.level":"info","@timestamp":"2022-03-16T08:17:54.620Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":455},"message":"Starting enrollment to URL: http://fleet-server:8220/","ecs.version":"1.6.0"}
elastic-agent_1 | Successfully enrolled the Elastic Agent.
elastic-agent_1 | {"log.level":"info","@timestamp":"2022-03-16T08:17:55.924Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":253},"message":"Elastic Agent might not be running; unable to trigger restart","ecs.version":"1.6.0"}
elastic-agent_1 | Error: could not read configuration file /usr/share/elastic-agent/state/elastic-agent.yml: yaml: line 47: could not find expected ':'
elastic-agent_1 | For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.1/fleet-troubleshooting.html

ph · 2022-04-06T14:03:09Z

@blakerouse @narph maybe. I see we aren't handling an fsync error either; it would be good to at least change that.

narph · 2022-04-07T11:58:08Z

@ph, @blakerouse, thanks for the clarification. I've created #312 and elastic/elastic-agent-libs#36 in order to improve the error handling and also to sync the parent directory.

blakerouse · 2022-04-07T14:36:45Z

I wonder if it would be even better to change replace_store to also use SafeFileRotate for the replacement of elastic-agent.yml.

narph · 2022-04-11T14:01:07Z

> I wonder if it would be even better to change replace_store to also use SafeFileRotate for the replacement of elastic-agent.yml.

I've made the changes and used SafeFileRotate instead. Tests will fail until elastic/elastic-agent-libs#36 is in.
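
For readers unfamiliar with the pattern discussed here (checking the fsync error, replacing the file via a rename, and syncing the parent directory), the following is a minimal Python sketch of the general technique. It is not the Elastic Agent implementation, which is written in Go, and it does not reproduce SafeFileRotate; the function name atomic_write is hypothetical.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Force the data to disk; a failure raises OSError instead of
            # being silently ignored.
            os.fsync(tmp.fileno())
        # Swap the complete file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the temporary file and re-raise.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Sync the parent directory so the rename itself survives a crash.
    # O_DIRECTORY is not available on every platform (e.g. Windows).
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

A caller would use it as atomic_write("elastic-agent.yml", new_config_bytes). The point of the design is that a process reading the file concurrently sees either the old content or the new content, never a truncated mix of both.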

mtojek · 2022-04-19T13:01:12Z

Hi Team, do we have any progress around this issue?

I checked logs from last night and the problem still persists.

narph · 2022-04-19T14:21:10Z

@mtojek, which version are you testing with? We merged #312 and #327. Can you test with this fix in?

ph · 2022-04-19T19:20:54Z

Looking at the log it seems to be 8.2, is that the BC or the snapshot?

mtojek · 2022-04-19T21:39:16Z

It's always the latest SNAPSHOT of a version.

mtojek · 2022-04-26T07:49:07Z

Any update on this issue?

Fresh occurrences:
tenable_sc (stack v8.1)

mtojek · 2022-04-26T07:49:30Z

ping @narph @blakerouse

jlind23 · 2022-04-26T08:52:31Z

@mtojek it has been merged in 8.2, this is probably why you still see it on an 8.1 stack

mtojek · 2022-04-26T09:23:15Z

I see, thanks! I recommend backporting to as many stack versions as we can. For older ones, we'll have to think about a workaround in Integrations. Otherwise, the flakiness will persist there.

ph · 2022-04-26T12:36:30Z

Agreed, I thought it was still present in 8.2 :)

jsoriano · 2022-04-26T16:08:54Z

Can the fix for this be backported also to 7.17? I have seen this with 7.17.3 (here).

jlind23 · 2022-04-26T16:13:13Z

But did it really fix it? If yes, then we will talk about the backport tomorrow during our team weekly.

jsoriano · 2022-04-26T16:15:12Z

Oh ok, I thought this was already fixed in main.

mtojek · 2022-04-27T07:29:41Z

Fun fact, I just found a variation of the issue in 7.14.1 (line 45: did not find expected key):

Attaching to elastic-package-stack_elastic-agent_1
elastic-agent_1 | Policy selected for enrollment:  19a8c960-c56a-11ec-94ca-31016dcca567
elastic-agent_1 | 2022-04-26T14:07:03.459Z	WARN	[tls]	tlscommon/tls_config.go:98	SSL/TLS verifications disabled.
elastic-agent_1 | 2022-04-26T14:07:04.249Z	INFO	cmd/enroll_cmd.go:396	Starting enrollment to URL: http://fleet-server:8220/
elastic-agent_1 | 2022-04-26T14:07:05.528Z	INFO	cmd/enroll_cmd.go:232	Elastic Agent might not be running; unable to trigger restart
elastic-agent_1 | 2022-04-26T14:07:05.528Z	INFO	cmd/enroll_cmd.go:234	Successfully triggered restart on running Elastic Agent.
elastic-agent_1 | Successfully enrolled the Elastic Agent.
elastic-agent_1 | Error: could not read configuration file /usr/share/elastic-agent/state/elastic-agent.yml: yaml: line 45: did not find expected key

log file

ph · 2022-04-27T12:26:20Z

@mtojek are we running a test for 7.14?

jlind23 · 2022-04-27T12:39:29Z

As seen with @mtojek, there are no further occurrences in 8.2, so we can close this issue and instead update the PR with the appropriate backport label after today's discussion.

mtojek · 2022-04-27T12:41:08Z

> As seen with @mtojek, there are no further occurrences in 8.2, so we can close this issue and instead update the PR with the appropriate backport label after today's discussion.

So far...

> are we running a test for 7.14?

We are running system tests against the latest supported stack. In some cases, this is 7.14.

narph · 2022-04-28T08:45:02Z

Backport for 7.17: elastic/beats#31449

tnjman · 2023-01-26T13:00:12Z

We are on 8.4 and are still experiencing this issue, FYI. We are currently testing with self-signed certificates and having LOTS of issues getting any certs to work right, so in our "self-signed testing mode," here's the "cleansed" version of the command:

sudo elastic-agent enroll \
  --fleet-server-es=https://[real-ip-here]:9200 \
  --fleet-server-service-token=abcxyz123-REAL-svc-token-here \
  --fleet-server-policy=fleet-server-policy \
  --insecure

Error: could not read configuration file /opt/Elastic/Agent/elastic-agent.yml: yaml: line 707: did not find expected key
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.4/fleet-troubleshooting.html

Any and all hints/help appreciated; thanks in advance!

cmacknz · 2023-01-26T18:19:49Z

@tnjman I think the specific issue in this bug has been fixed, you are probably experiencing a different issue. It probably requires more investigation to determine why, which we usually don't do in the issue tracker. Start a thread in https://discuss.elastic.co/c/elastic-stack/elastic-agent/ and someone should help you there.

mtojek added the Team:Elastic-Agent-Control-Plane label Dec 16, 2021

jlind23 added the bug, flaky-test, and Team:Elastic-Agent-Control-Plane labels and removed the Team:Elastic-Agent-Control-Plane label Dec 17, 2021

jlind23 added the Team:Elastic-Agent-Control-Plane label and removed the Team:Elastic-Agent-Control-Plane label Dec 17, 2021

jlind23 added the backport-v8.2.0 label Jan 4, 2022

jlind23 added the v8.2.0 label Jan 28, 2022

mtojek mentioned this issue Feb 9, 2022

Workaround: dump agent policy if line 47 error appears elastic/integrations#2658

Closed

jlind23 transferred this issue from elastic/beats Mar 7, 2022

This was referenced Apr 7, 2022

Extract function SyncParent to reuse in elastic agent elastic/elastic-agent-libs#36

Merged

Improve sync handling and errors #312

Merged

mtojek mentioned this issue Apr 15, 2022

Flaky: network elastic-package-service_default id 2cc14bcd.... has active endpoints elastic/elastic-package#545

Open

jsoriano mentioned this issue Apr 26, 2022

Enable SSL in the Elastic Stack elastic/elastic-package#789

Merged

24 tasks

jlind23 closed this as completed Apr 27, 2022

mtojek mentioned this issue Apr 27, 2022

Testing integrations against unsupported stacks elastic/integrations#3208

Closed

narph mentioned this issue Apr 28, 2022

[7.17](backport #312) Improve sync handling and errors elastic/beats#31449

Merged

6 tasks
