Scraping prometheus metrics endpoint crashes falco process #3229

sboschman · 2024-05-31T10:50:40Z

Describe the bug

I followed the Prometheus support section in the docs to enable the /metrics endpoint. As soon as you make a request to this endpoint, the falco process crashes without any indication on stdout/stderr of what went wrong.

How to reproduce it

engine:
  kind: nodriver

metrics:
  enabled: true
  interval: 1h  # also tried with 1m and waiting ~5 mins before the /metrics request
  output_rule: true
  rules_counters_enabled: false  # tried with only this one enabled
  resource_utilization_enabled: false  # tried with only this one enabled
  state_counters_enabled: false
  kernel_event_counters_enabled: false
  libbpf_stats_enabled: false
  convert_memory_to_mb: true
  include_empty_values: false

webserver:
  enabled: true
  k8s_healthz_endpoint: /healthz
  listen_port: 8765
  prometheus_metrics_enabled: true

Enabled debug logging as well:

libs_logger:
  enabled: true
  severity: debug
log_level: debug

This doesn't produce any output when requesting the /metrics endpoint, so it doesn't help narrow down what is happening just before the crash.

This is what happens with a port forward to the falco pod:

% curl localhost:8765/healthz
{"status": "ok"}

% curl localhost:8765/metrics
curl: (52) Empty reply from server

After the /metrics request has been done, Kubernetes shows pod termination:

      lastState:
        terminated:
          exitCode: 139
          reason: Error
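
A note on the exit code: by shell convention, an exit status above 128 means the process was terminated by a signal whose number is the status minus 128. Here 139 - 128 = 11, which is SIGSEGV on Linux, so the termination looks like a segmentation fault rather than a clean shutdown. A minimal sketch of that decoding follows; the helper name `decode_exit_code` is hypothetical and is not part of Falco or kubectl.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not a Falco tool):
# map a container exit code to the signal that terminated the process.
# Exit codes above 128 conventionally mean "killed by signal (code - 128)".
decode_exit_code() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    # `kill -l <number>` prints the signal name without the SIG prefix.
    echo "signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "exit status $code (not a signal)"
  fi
}

decode_exit_code 139
```

This only interprets the number Kubernetes reports; it does not say where in Falco the fault occurred.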

Expected behaviour

Expected to see some metrics, or at least not to crash the entire falco process.

Screenshots

Environment

Falco version: 0.38.0 (x86_64)

System info:

Cloud provider or hardware configuration:
OS:

Kernel:

Installation method: Kubernetes (docker.io/falcosecurity/falco-no-driver image)

Additional context

This is a dedicated github plugin instance of falco (running as a k8s pod), i.e. not using syscall at all (--disable-source syscall).

2024-05-31T09:43:33+0000: Loaded event sources: syscall, github
2024-05-31T09:43:33+0000: Enabled event sources: github
2024-05-31T09:43:33+0000: Opening event source 'github'

No clue if this has anything to do with the crash though.


Issif · 2024-05-31T11:49:43Z

I confirm the situation:

falco without a plugin: OK
falco with a plugin and no driver: KO

Each call to the /metrics endpoint crashes the falco container in the pod.

leogr · 2024-05-31T12:22:59Z

Have you tried with a plugin and a driver? 🤔

I want to restrict the possible root cause to the plugin only.

FedeDP · 2024-05-31T12:25:44Z

Also cc @incertum @sgaist

FedeDP · 2024-05-31T13:17:04Z

Opened the PR with the fix ☝️

incertum · 2024-05-31T13:33:18Z

@sboschman thanks a bunch for testing it so promptly. We still have no good metrics support when running Falco with a plugin only. We should perhaps add a note to the website about that as well. For example CPU usage calculation still won't work for plugin only given a regression here: #2821

incertum · 2024-05-31T13:34:47Z

Also the Falco number of events won't be available atm in Prometheus as it would have required a major refactor and we ran out of time. Please follow this issue for things we spotted that we still need to address: #3194

sboschman · 2024-05-31T14:17:27Z

I see @incertum, I didn't realise that running with a plugin only has limited Prometheus metrics support atm. I was indeed looking for CPU + memory metrics and the Rules Counters Fields (hoping to do something with total events processed and total rules matched, to determine how many unnecessary events we send to falco or which rules we are missing).

incertum · 2024-05-31T14:31:30Z

I know, yes, it's annoying. We will work on that for Falco 0.39.0, plus we will offer a custom plugin metrics system where you can emit custom metrics when you write your own plugin. I'll CC you on that other issue so you stay in the loop.

sboschman · 2024-05-31T14:33:34Z

On a side note @incertum, I also noticed that it is mandatory to enable the output rule (or output file, I suppose) when using Prometheus metrics output, which is not mentioned in the docs as a requirement.

This config

  output_rule: false

results in falco failing to start with:

Error: Metrics are enabled with no output configured. Please enable at least one output channel

Is this already a known limitation or do you want me to open a separate issue for it?

incertum · 2024-05-31T14:36:01Z

> on a side note @incertum , I also noticed that it is mandatory to enable the output rule (or output file I suppose) when using prometheus metrics output, which is not mentioned in the docs as requirement.
>
> This config
>   output_rule: false
> results in falco failing to start with:
> Error: Metrics are enabled with no output configured. Please enable at least one output channel
> Is this already a known limitation or do you want me to open a separate issue for it?

uhhh no, we messed up for sure on that. It should also work when you have prometheus enabled and no other output.
Adding this on the list for fixes. @FedeDP and @sgaist we could address that in the next patch release.

Thanks a bunch for your help Sverre on spotting these things!

FedeDP · 2024-06-03T08:28:06Z

/milestone 0.38.1

sboschman added the kind/bug label May 31, 2024

FedeDP mentioned this issue May 31, 2024

fix(userspace/falco): fixed falco_metrics::to_text implementation when running with plugins #3230

Merged

FedeDP mentioned this issue May 31, 2024

new(tests,pkg,action): added 2 new tests around prometheus metrics. falcosecurity/testing#59

Merged

This was referenced Jun 1, 2024

fix(metrics): allow each metric output channel to be selected independently #3232

Merged

cleanup(metrics): improve prometheus and plugin metrics info falcosecurity/falco-website#1328

Merged

poiana added this to the 0.38.1 milestone Jun 3, 2024

poiana closed this as completed in #3230 Jun 3, 2024

FedeDP mentioned this issue Jun 4, 2024

chore(ci): enable dummy tests on the testing framework. #3233

Merged
