Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[8.0] iptables system test: found 0 hits in logs-iptables.log-ep data stream #2602

Closed
mtojek opened this issue Jan 26, 2022 · 26 comments · Fixed by #5984
Closed

[8.0] iptables system test: found 0 hits in logs-iptables.log-ep data stream #2602

mtojek opened this issue Jan 26, 2022 · 26 comments · Fixed by #5984
Labels

Comments

@mtojek
Copy link
Contributor

mtojek commented Jan 26, 2022

Hi,

while reviewing daily CI jobs I found this flaky problem with iptables:

[2022-01-26T05:06:50.986Z] 2022/01/26 05:06:50 DEBUG found 0 hits in logs-iptables.log-ep data stream
[2022-01-26T05:06:51.921Z] 2022/01/26 05:06:51 DEBUG found 0 hits in logs-iptables.log-ep data stream
[2022-01-26T05:06:52.858Z] 2022/01/26 05:06:52 DEBUG found 0 hits in logs-iptables.log-ep data stream
[2022-01-26T05:06:53.798Z] 2022/01/26 05:06:53 DEBUG found 0 hits in logs-iptables.log-ep data stream
[2022-01-26T05:06:54.736Z] 2022/01/26 05:06:54 DEBUG found 0 hits in logs-iptables.log-ep data stream
[2022-01-26T05:06:56.115Z] 2022/01/26 05:06:55 DEBUG found 0 hits in logs-iptables.log-ep data stream
[2022-01-26T05:06:57.052Z] 2022/01/26 05:06:56 DEBUG found 0 hits in logs-iptables.log-ep data stream
[2022-01-26T05:06:57.991Z] 2022/01/26 05:06:57 DEBUG reassigning original policy back to agent...
[2022-01-26T05:06:57.991Z] 2022/01/26 05:06:57 DEBUG PUT http://127.0.0.1:5601/api/fleet/agents/c75c33c6-df61-417e-a6fc-585e51bef85e/reassign
[2022-01-26T05:06:58.929Z] 2022/01/26 05:06:58 DEBUG GET http://127.0.0.1:5601/api/fleet/agents/c75c33c6-df61-417e-a6fc-585e51bef85e
[2022-01-26T05:06:58.929Z] 2022/01/26 05:06:58 DEBUG Agent data: {"id":"c75c33c6-df61-417e-a6fc-585e51bef85e","policy_id":"2016d7cc-135e-5583-9758-3ba01f5a06e5","local_metadata":{"host":{"name":"docker-fleet-agent"}}}
[2022-01-26T05:06:58.929Z] 2022/01/26 05:06:58 DEBUG Wait until the policy (ID: 2016d7cc-135e-5583-9758-3ba01f5a06e5, revision: 1) is assigned to the agent (ID: c75c33c6-df61-417e-a6fc-585e51bef85e)...

It looks like the iptables integration doesn't produce any metrics/logs? cc @andrewkroh

@andrewkroh
Copy link
Member

andrewkroh commented Jan 26, 2022

A breaking change in the journald input is breaking all integrations that use the input.

@kvch I left you a few comments about making the change in a compatible manner. WDYT adding an automatic translation to keep old configs working? elastic/beats#29294 (comment)

There's another bug that will likely break integrations after the config issue is resolved - elastic/beats#30031. The field validation tests should fail because there would be a bunch of unexpected key names.

@andrewkroh
Copy link
Member

The config breaking change doesn't seem to be the culprit. Filebeat 8.1.0-SNAPSHOT appears to ignore the filters specified in the old config format and read all journald logs. If I follow the Filebeat log I can see it publishing the 3 expected logs, but then Filebeat appears to crash.

2022-01-26T18:40:10Z - message: Application: filebeat--8.1.0-SNAPSHOT[d4a6c4ea-0e00-4882-b65c-d0be7534282f]: State changed to RESTARTING: exited with code: 2 - type: 'STATE' - sub_type: 'STARTING'

It's not possible to observe the stdout of the Filebeat process run by Agent so I cannot see if there is a panic.


I've tested Filebeat standalone with the console output and it does not crash. But it also does not add a cursor to registry. That seems odd. This is what I see under 8.1.

{"k":"journald::iptables::/home/vagrant/go/src/github.com/elastic/beats/filebeat/iptables.journal","v":{"cursor":null,"ttl":1800000000000,"updated":[33042222,1643224698]}}

And this is what I see under 7.17.

{"k":"journald::old::/home/vagrant/go/src/github.com/elastic/beats/filebeat/iptables.journal","v":{"ttl":1800000000000,"updated":[675663316,1643225792],"cursor":{"monotonictimestamp":1840880826380,"version":1,"position":"s=bc466f462aa64ffdb17ecc1f266e6fb3;i=3;b=c2f79f985830406a9e08241d015eff05;m=1ac9d0c380c;t=5d56b9afbd2ae;x=a38a1756ce1c8f62","realtimetimestamp":1642033021964974}}}

@andrewkroh
Copy link
Member

I think the problem is related to https://github.com/elastic/beats/pull/29070/files#diff-839f58c64b7063f75769a4d945fca4efb1e7f103cd6217a90c2363490f918dd0L144-R151.

@belimawr can you take a look at Filebeat. I think the change you made lost both the translation of journald field names to ECS (eventFromFields is an unused function) and the cursor state (Private is not passed through to the Event here).

@mtojek
Copy link
Contributor Author

mtojek commented Jan 27, 2022

@belimawr belimawr self-assigned this Jan 27, 2022
@belimawr
Copy link
Contributor

I think the problem is related to https://github.com/elastic/beats/pull/29070/files#diff-839f58c64b7063f75769a4d945fca4efb1e7f103cd6217a90c2363490f918dd0L144-R151.

@belimawr can you take a look at Filebeat. I think the change you made lost both the translation of journald field names to ECS (eventFromFields is an unused function) and the cursor state (Private is not passed through to the Event here).

This data should now be set directly into the event here: https://github.com/belimawr/beats/blob/67b5ba70ed05ac3b6085ef023375dbcb48ad4b69/filebeat/input/journald/input.go#L239-L244

Anyways, I'm looking into it.

@mtojek
Copy link
Contributor Author

mtojek commented Jan 27, 2022

@andrewkroh

It's not possible to observe the stdout of the Filebeat process run by Agent so I cannot see if there is a panic.

You can do this, you have to log in to the special internal bucket: beats-ci-temp-internal. Anyway, I checked there and no panic, or we haven't logged it :)

{"log.level":"info","@timestamp":"2022-01-27T03:30:51.927Z","log.logger":"centralmgmt.fleet","log.origin":{"file.name":"management/manager.go","file.line":150},"message":"Status change to Configuring: Updating configuration","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-01-27T03:30:51.928Z","log.logger":"centralmgmt.fleet","log.origin":{"file.name":"management/manager.go","file.line":271},"message":"Applying settings for filebeat.inputs","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2022-01-27T03:30:51.928Z","log.logger":"input","log.origin":{"file.name":"v2/loader.go","file.line":102},"message":"EXPERIMENTAL: The journald input is experimental","service.name":"filebeat","input":"journald","stability":"Experimental","deprecated":false,"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-01-27T03:30:51.928Z","log.logger":"centralmgmt.fleet","log.origin":{"file.name":"management/manager.go","file.line":271},"message":"Applying settings for output","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-01-27T03:30:51.928Z","log.logger":"input.journald","log.origin":{"file.name":"compat/compat.go","file.line":111},"message":"Input journald starting","service.name":"filebeat","id":"journald-journald.log-a8e2ab6d-b4f1-4ea5-b80a-2e6d994d869e","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-01-27T03:30:51.929Z","log.logger":"esclientleg","log.origin":{"file.name":"eslegclient/connection.go","file.line":102},"message":"elasticsearch url: http://elasticsearch:9200","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-01-27T03:30:51.929Z","log.logger":"centralmgmt.fleet","log.origin":{"file.name":"management/manager.go","file.line":271},"message":"Applying settings for filebeat.modules","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-01-27T03:30:51.929Z","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":213},"message":"retryer: send wait signal to consumer","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-01-27T03:30:51.929Z","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":217},"message":"  done","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-01-27T03:30:51.929Z","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":219},"message":"retryer: send unwait signal to consumer","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-01-27T03:30:51.929Z","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":223},"message":"  done","service.name":"filebeat","ecs.version":"1.6.0"}

@andrewkroh
Copy link
Member

andrewkroh commented Jan 27, 2022

@mtojek It's not a problem with our CI setup (that's great). It's that Elastic Agent does not direct the stdout/stderr anywhere when it uses Go os/exec. So that output is lost to /dev/null.

@mtojek
Copy link
Contributor Author

mtojek commented Jan 27, 2022

I followed that approach as I couldn't believe that processes aren't dying and they were. Unfortunately @andrewkroh was right and stderr is dumped to /dev/null, so the only way to sniff it was to use strace in docker container with SYS_PTRACE.

I checked PIDs of running filebeat processes and excluded them into my strace trap:

root@docker-fleet-agent:/usr/share/elastic-agent# strace -f -e write -s 2048 -p `ps aux | grep filebeat | cut -f 2 -d " " | grep -v "21862" | grep -v "21895" | head -1` 2>&1 | grep write --line-buffered >> logs.txt

then started the journald test with elastic-package test system -v and started hitting the command above until it hangs....

When it hanged, it dumped the content filebeat wrote to /dev/null:

[pid 24411] write(5, "\0", 1)           = 1
[pid 24411] write(6, "{\"log.level\":\"info\",\"@timestamp\":\"2022-01-27T14:16:50.979Z\",\"log.logger\":\"add_cloud_metadata\",\"log.origin\":{\"file.name\":\"add_cloud_metadata/add_cloud_metadata.go\",\"file.line\":101},\"message\":\"add_cloud_metadata: hosting provider type not detected.\",\"service.name\":\"filebeat\",\"ecs.version\":\"1.6.0\"}\n", 297) = 297
[pid 24411] write(2, "fatal error: ", 13) = 13
[pid 24411] write(2, "unexpected signal during runtime execution", 42) = 42
[pid 24411] write(2, "\n", 1)           = 1
[pid 24411] write(2, "[signal ", 8)     = 8
[pid 24411] write(2, "SIGSEGV: segmentation violation", 31) = 31
[pid 24411] write(2, " code=", 6)       = 6
[pid 24411] write(2, "0x1", 3)          = 3
[pid 24411] write(2, " addr=", 6)       = 6
[pid 24411] write(2, "0x0", 3)          = 3
[pid 24411] write(2, " pc=", 4)         = 4
[pid 24411] write(2, "0x7f095c89c617", 14) = 14
[pid 24411] write(2, "]\n", 2)          = 2
[pid 24411] write(2, "\nruntime stack:\n", 16) = 16
[pid 24411] write(2, "runtime.throw", 13) = 13
[pid 24411] write(2, "(", 1)            = 1
[pid 24411] write(2, "{", 1)            = 1
[pid 24411] write(2, "0x5562e18f20ef", 14) = 14
[pid 24411] write(2, ", ", 2)           = 2
[pid 24411] write(2, "0x0", 3)          = 3
[pid 24411] write(2, "}", 1)            = 1
[pid 24411] write(2, ")\n", 2)          = 2
[pid 24411] write(2, "\t", 1)           = 1
[pid 24411] write(2, "/usr/local/go/src/runtime/panic.go", 34) = 34
[pid 24411] write(2, ":", 1)            = 1
[pid 24411] write(2, "1198", 4)         = 4
[pid 24411] write(2, " +", 2)           = 2
[pid 24411] write(2, "0x71", 4)         = 4
[pid 24411] write(2, "\n", 1)           = 1
[pid 24411] write(2, "runtime.sigpanic", 16) = 16
[pid 24411] write(2, "(", 1)            = 1
[pid 24411] write(2, ")\n", 2)          = 2
[pid 24411] write(2, "\t", 1)           = 1
[pid 24411] write(2, "/usr/local/go/src/runtime/signal_unix.go", 40) = 40
[pid 24411] write(2, ":", 1)            = 1
[pid 24411] write(2, "719", 3)          = 3
[pid 24411] write(2, " +", 2)           = 2
[pid 24411] write(2, "0x396", 5)        = 5
[pid 24411] write(2, "\n", 1)           = 1
[pid 24411] write(2, "\n", 1)           = 1
[pid 24411] write(2, "goroutine ", 10)  = 10
[pid 24411] write(2, "202", 3)          = 3
[pid 24411] write(2, " [", 2)           = 2
[pid 24411] write(2, "syscall", 7)      = 7
[pid 24411] write(2, "]:\n", 3)         = 3
[pid 24411] write(2, "runtime.cgocall", 15) = 15
[pid 24411] write(2, "(", 1)            = 1
[pid 24411] write(2, "0x5562e1870c90", 14) = 14
[pid 24411] write(2, ", ", 2)           = 2
[pid 24411] write(2, "0xc0006f7728", 12) = 12
[pid 24411] write(2, ")\n", 2)          = 2
[pid 24411] write(2, "\t", 1)           = 1
[pid 24411] write(2, "/usr/local/go/src/runtime/cgocall.go", 36) = 36
[pid 24411] write(2, ":", 1)            = 1
[pid 24411] write(2, "156", 3)          = 3
[pid 24411] write(2, " +", 2)           = 2
[pid 24411] write(2, "0x5c", 4)         = 4
[pid 24411] write(2, " fp=", 4)         = 4
[pid 24411] write(2, "0xc0006f7700", 12) = 12
[pid 24411] write(2, " sp=", 4)         = 4
[pid 24411] write(2, "0xc0006f76c8", 12) = 12
[pid 24411] write(2, " pc=", 4)         = 4
[pid 24411] write(2, "0x5562df58527c", 14) = 14
[pid 24411] write(2, "\n", 1)           = 1
[pid 24411] write(2, "github.com/coreos/go-systemd/v22/sdjournal._Cfunc_my_sd_journal_wait", 68) = 68
[pid 24411] write(2, "(", 1)            = 1
[pid 24411] write(2, "0x7f092c593cf0", 14) = 14
[pid 24411] write(2, ", ", 2)           = 2
[pid 24411] write(2, "0x7f0924003aa0", 14) = 14
[pid 24411] write(2, ", ", 2)           = 2
[pid 24411] write(2, "0x186a0", 7)      = 7
[pid 24411] write(2, ")\n", 2)          = 2
[pid 24411] write(2, "\t", 1)           = 1
[pid 24411] write(2, "_cgo_gotypes.go", 15) = 15
[pid 24411] write(2, ":", 1)            = 1
[pid 24411] write(2, "584", 3)          = 3
[pid 24411] write(2, " +", 2)           = 2
[pid 24411] write(2, "0x50", 4)         = 4
[pid 24411] write(2, " fp=", 4)         = 4
[pid 24411] write(2, "0xc0006f7728", 12) = 12
[pid 24411] write(2, " sp=", 4)         = 4
[pid 24411] write(2, "0xc0006f7700", 12) = 12
[pid 24411] write(2, " pc=", 4)         = 4
[pid 24411] write(2, "0x5562e1305f90", 14) = 14
[pid 24411] write(2, "\n", 1)           = 1
[pid 24411] write(2, "github.com/coreos/go-systemd/v22/sdjournal.(*Journal).Wait.func1", 64) = 64
[pid 24411] write(2, "(", 1)            = 1
[pid 24411] write(2, "0x5562e188ce82", 14) = 14
[pid 24411] write(2, ", ", 2)           = 2
[pid 24411] write(2, "0xf", 3)          = 3
[pid 24411] write(2, ", ", 2)           = 2
[pid 24411] write(2, "0xc0006f77c0", 12) = 12
[pid 24411] write(2, ")\n", 2)          = 2
[pid 24411] write(2, "\t", 1)           = 1
[pid 24411] write(2, "/go/pkg/mod/github.com/coreos/go-systemd/[email protected]/sdjournal/journal.go", 73) = 73
[pid 24411] write(2, ":", 1)            = 1
[pid 24411] write(2, "1032", 4)         = 4
[pid 24411] write(2, " +", 2)           = 2
[pid 24411] write(2, "0x7c", 4)         = 4
[pid 24411] write(2, " fp=", 4)         = 4
[pid 24411] write(2, "0xc0006f7770", 12) = 12
[pid 24411] write(2, " sp=", 4)         = 4
[pid 24411] write(2, "0xc0006f7728", 12) = 12
[pid 24411] write(2, " pc=", 4)         = 4
[pid 24411] write(2, "0x5562e130b31c", 14) = 14
[pid 24411] write(2, "\n", 1)           = 1
[pid 24411] write(2, "github.com/coreos/go-systemd/v22/sdjournal.(*Journal).Wait", 58) = 58
[pid 24411] write(2, "(", 1)            = 1
[pid 24411] write(2, "0xc000122c40", 12) = 12
[pid 24411] write(2, ", ", 2)           = 2
[pid 24411] write(2, "0x5f5e100", 9)    = 9
[pid 24411] write(2, ")\n", 2)          = 2
[pid 24411] write(2, "\t", 1)           = 1
[pid 24411] write(2, "/go/pkg/mod/github.com/coreos/go-systemd/[email protected]/sdjournal/journal.go", 73) = 73
[pid 24411] write(2, ":", 1)            = 1
[pid 24411] write(2, "1032", 4)         = 4
[pid 24411] write(2, " +", 2)           = 2
[pid 24411] write(2, "0xce", 4)         = 4
[pid 24411] write(2, " fp=", 4)         = 4
[pid 24411] write(2, "0xc0006f77b8", 12) = 12
[pid 24411] write(2, " sp=", 4)         = 4
[pid 24411] write(2, "0xc0006f7770", 12) = 12
[pid 24411] write(2, " pc=", 4)         = 4
[pid 24411] write(2, "0x5562e130b22e", 14) = 14
[pid 24411] write(2, "\n", 1)           = 1
[pid 24411] write(2, "github.com/elastic/beats/v7/filebeat/input/journald/pkg/journalread.(*Reader).checkForNewEvents", 95) = 95
[pid 24411] write(2, "(", 1)            = 1
[pid 24411] write(2, "0xc0006e0d50", 12) = 12
[pid 24411] write(2, ")\n", 2)          = 2
[pid 24411] write(2, "\t", 1)           = 1
[pid 24411] write(2, "/go/src/github.com/elastic/beats/filebeat/input/journald/pkg/journalread/reader.go", 82) = 82
[pid 24411] write(2, ":", 1)            = 1
[pid 24411] write(2, "192", 3)          = 3
[pid 24411] write(2, " +", 2)           = 2
[pid 24411] write(2, "0x33", 4)         = 4
[pid 24411] write(2, " fp=", 4)         = 4
[pid 24411] write(2, "0xc0006f7830", 12) = 12
[pid 24411] write(2, " sp=", 4)         = 4
[pid 24411] write(2, "0xc0006f77b8", 12) = 12
[pid 24411] write(2, " pc=", 4)         = 4
[pid 24411] write(2, "0x5562e1310973", 14) = 14
[pid 24411] write(2, "\n", 1)           = 1
[pid 24411] write(2, "github.com/elastic/beats/v7/filebeat/input/journald/pkg/journalread.(*Reader).Next", 82) = 82
[pid 24411] write(2, "(", 1)            = 1
[pid 24411] write(2, "0xc0006e0d50", 12) = 12
[pid 24411] write(2, ", ", 2)           = 2
[pid 24411] write(2, "{", 1)            = 1
[pid 24411] write(2, "0x7f09342d7a78", 14) = 14
[pid 24411] write(2, ", ", 2)           = 2
[pid 24411] write(2, "0xc000a44440", 12) = 12
[pid 24411] write(2, "}", 1)            = 1
[pid 24411] write(2, ")\n", 2)          = 2
[pid 24411] write(2, "\t", 1)           = 1
[pid 24411] write(2, "/go/src/github.com/elastic/beats/filebeat/input/journald/pkg/journalread/reader.go", 82) = 82
[pid 24411] write(2, ":", 1)            = 1
[pid 24411] write(2, "167", 3)          = 3
[pid 24411] write(2, " +", 2)           = 2
[pid 24411] write(2, "0xc5", 4)         = 4
[pid 24411] write(2, " fp=", 4)         = 4
[pid 24411] write(2, "0xc0006f7878", 12) = 12
[pid 24411] write(2, " sp=", 4)         = 4
[pid 24411] write(2, "0xc0006f7830", 12) = 12
[pid 24411] write(2, " pc=", 4)         = 4
[pid 24411] write(2, "0x5562e1310865", 14) = 14
[pid 24411] write(2, "\n", 1)           = 1
[pid 24411] write(2, "github.com/elastic/beats/v7/filebeat/input/journald.(*journald).Run", 67) = 67
[pid 24411] write(2, "(", 1)            = 1
[pid 24411] write(2, "0xc000ffdc00", 12) = 12
[pid 24411] write(2, ", ", 2)           = 2

I still believe we need a more convenient debugging setup that don't require extra tools.

@belimawr
Copy link
Contributor

belimawr commented Jan 27, 2022

I still believe we need a more convenient debugging setup don't require extra tools.

I could not agree more!

Putting that fact aside, do you know which version/commit of the beats you were running/debugging?

If it's the same code as on master, the fault looks to be here: https://github.com/elastic/beats/blob/206de56092f2480dbff796393aed3e263834a31f/filebeat/input/journald/pkg/journalread/reader.go#L192, I should be able to reproduce that without the agent.

Thanks for getting the stack trace!!

Do you know if this tests uses a real journald or a journal file?

@mtojek
Copy link
Contributor Author

mtojek commented Jan 27, 2022

Diagnostics:

root@docker-fleet-agent:/usr/share/elastic-agent# ./elastic-agent diagnostics
elastic-agent  version: 7.17.0
               build_commit: a574d89ab5ad8c5254d6851bd39d4add7f69ce32  build_time: 2022-01-27 03:53:01 +0000 UTC  snapshot_build: true
Applications:
  *  name: metricbeat_monitoring   route_key: default
     process: metricbeat           id: 0913ef3d-db2e-46bc-a7d5-1538abb7081f          ephemeral_id: 8e651d15-e22b-46db-8f72-09a0c08c7f24  elastic_license: true
     version: 7.17.0               commit: a574d89ab5ad8c5254d6851bd39d4add7f69ce32  build_time: 2022-01-27 01:49:00 +0000 UTC           binary_arch: amd64
     hostname: docker-fleet-agent  username: elastic-agent                           user_id: 1000                                       user_gid: 1000
  *  name: metricbeat              route_key: default
     process: metricbeat           id: 0913ef3d-db2e-46bc-a7d5-1538abb7081f          ephemeral_id: 8e651d15-e22b-46db-8f72-09a0c08c7f24  elastic_license: true
     version: 7.17.0               commit: a574d89ab5ad8c5254d6851bd39d4add7f69ce32  build_time: 2022-01-27 01:49:00 +0000 UTC           binary_arch: amd64
     hostname: docker-fleet-agent  username: elastic-agent                           user_id: 1000                                       user_gid: 1000
  *  name: filebeat                route_key: default
     process: filebeat             id: 9f554cde-8c8d-4b7b-85a0-d4f4b86dcadf          ephemeral_id: cc821ca9-e60a-4f2d-8066-312dbbf167e7  elastic_license: true
     version: 7.17.0               commit: a574d89ab5ad8c5254d6851bd39d4add7f69ce32  build_time: 2022-01-27 01:36:46 +0000 UTC           binary_arch: amd64
     hostname: docker-fleet-agent  username: elastic-agent                           user_id: 1000                                       user_gid: 1000
  *  name: filebeat_monitoring     route_key: default
     process: filebeat             id: 9f554cde-8c8d-4b7b-85a0-d4f4b86dcadf          ephemeral_id: cc821ca9-e60a-4f2d-8066-312dbbf167e7  elastic_license: true
     version: 7.17.0               commit: a574d89ab5ad8c5254d6851bd39d4add7f69ce32  build_time: 2022-01-27 01:36:46 +0000 UTC           binary_arch: amd64
     hostname: docker-fleet-agent  username: elastic-agent                           user_id: 1000                                       user_gid: 1000

@mtojek
Copy link
Contributor Author

mtojek commented Jan 27, 2022

That's the test input:

service: journald
input: journald
data_stream:
  vars:
    paths:
      - "{{SERVICE_LOGS_DIR}}/test.journal"
    tags: [forwarded]

source

@mtojek
Copy link
Contributor Author

mtojek commented Jan 28, 2022

We did some more debugging today with @belimawr and came a point:

root@620dc8cd6f21:/usr/share/elastic-agent# gdb --args ./data/elastic-agent-a574d8/install/filebeat-7.17.0-SNAPSHOT-linux-x86_64/filebeat
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./data/elastic-agent-a574d8/install/filebeat-7.17.0-SNAPSHOT-linux-x86_64/filebeat...
(No debugging symbols found in ./data/elastic-agent-a574d8/install/filebeat-7.17.0-SNAPSHOT-linux-x86_64/filebeat)
(gdb) run
Starting program: /usr/share/elastic-agent/data/elastic-agent-a574d8/install/filebeat-7.17.0-SNAPSHOT-linux-x86_64/filebeat
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7f7387f61700 (LWP 1079)]
[New Thread 0x7f73875c0700 (LWP 1080)]
[New Thread 0x7f7386dbf700 (LWP 1081)]
[New Thread 0x7f738657e700 (LWP 1082)]
[New Thread 0x7f7385d7d700 (LWP 1083)]
[New Thread 0x7f738557c700 (LWP 1084)]
[New Thread 0x7f7384ceb700 (LWP 1085)]
[New Thread 0x7f7367fff700 (LWP 1086)]
[New Thread 0x7f73677fe700 (LWP 1087)]
[New Thread 0x7f7366ffd700 (LWP 1088)]
[New Thread 0x7f73667fc700 (LWP 1089)]
[New Thread 0x7f7365ffb700 (LWP 1090)]
[New Thread 0x7f73657fa700 (LWP 1091)]
[New Thread 0x7f7364ff9700 (LWP 1092)]
[New Thread 0x7f73477ff700 (LWP 1093)]
[New Thread 0x7f7346ffe700 (LWP 1094)]

Thread 17 "filebeat" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f7346ffe700 (LWP 1094)]
__strchr_sse2 () at ../sysdeps/x86_64/multiarch/../strchr.S:32
32	../sysdeps/x86_64/multiarch/../strchr.S: No such file or directory.
(gdb) bt
#0  __strchr_sse2 () at ../sysdeps/x86_64/multiarch/../strchr.S:32
#1  0x00007f73840c4780 in ?? () from /lib/x86_64-linux-gnu/libsystemd.so.0
#2  0x00007f73840c139d in ?? () from /lib/x86_64-linux-gnu/libsystemd.so.0
#3  0x00007f73840c16a3 in ?? () from /lib/x86_64-linux-gnu/libsystemd.so.0
#4  0x00007f73840c1b62 in sd_journal_get_fd () from /lib/x86_64-linux-gnu/libsystemd.so.0
#5  0x00007f73840c1db0 in sd_journal_wait () from /lib/x86_64-linux-gnu/libsystemd.so.0
#6  0x000055f4cd470caf in _cgo_4c5c0c7e428f_Cfunc_my_sd_journal_wait ()
#7  0x000055f4cb1eba88 in runtime.asmcgocall.abi0 ()
#8  0x00007ffcc65389d7 in clock_gettime ()

It looks the problem is in libsystemd0 installed in the Agent image is the culprit (related issue):

root@b5e7aa26f9d4:/usr/share/elastic-agent# dpkg -l | grep libsystemd0
ii  libsystemd0:amd64          245.4-4ubuntu3.14                 amd64        systemd utility library

Based on the Github comments, we can overcome this by installing a newer version.

filebeat.yml:

➜  sample_logs git:(main) ✗ cat filebeat.yml
logging.level: debug

filebeat.inputs:
- type: journald
  paths:
  - /usr/share/elastic-agent/test.journal
  tags: [forwarded]

output.file:
  path: /usr/share/elastic-agent/output.log

@mtojek
Copy link
Contributor Author

mtojek commented Jan 28, 2022

Follow-up:

As the libsystemd0 update isn't available on the Ubuntu focal, I replaced the /etc/apt/sources.list with ones forhirsute. It allowed me to apt-get update and apt-get --only-upgrade install libsystemd0 the latest version:

root@b5e7aa26f9d4:/usr/share/elastic-agent# dpkg -l | grep libsystemd0
ii  libsystemd0:amd64          247.3-3ubuntu3.7                  amd64        systemd utility library

No problems were observed there :)

Here comes the question - should we update the base Docker image for Elastic Agent? It's confirmed now that it contains a broken library.

cc @ph @blakerouse @andrewkroh

@andrewkroh
Copy link
Member

I am in favor of using a newer version in both the Filebeat and Elastic Agent containers. There was a bug reported that required >=246 to address. So with 247 from hirsute that would fix the issue.

@andrewkroh
Copy link
Member

Maybe we move to 21.04 hirsute until the next LTS release comes out in April 2022 (Jammy Jellyfish).

@belimawr
Copy link
Contributor

@elastic/elastic-agent-control-plane because it is a problem with the Elastic-Agent Docker image, could you take a look at it?

@mtojek managed to fix it by updating the libsystemd0 package, but he had to use the package from a newer version of Ubuntu (details here).

@belimawr belimawr removed their assignment Jan 31, 2022
@ph
Copy link
Contributor

ph commented Jan 31, 2022

I agree, this fall into our side, @jlind23 fyi.

@mtojek
Copy link
Contributor Author

mtojek commented Feb 1, 2022

@elastic/elastic-agent-control-plane Ok, this is moving way too slow in terms of triaging/investigating.

I'm going to temporarily disable both failing tests (journals and iptables). Both are panicking:
https://beats-ci.elastic.co/blue/organizations/jenkins/Ingest-manager%2Fintegrations/detail/main/84/tests

@jlind23
Copy link
Contributor

jlind23 commented Feb 1, 2022

@mtojek sorry but we had a couple of other SDH to triage before those failing tests. I'll do my best to put someone on it.

@mtojek
Copy link
Contributor Author

mtojek commented Feb 1, 2022

Thanks, Julien, it would be great to do initial triage and prepare a plan for this.

@blakerouse
Copy link

Filed the issue against Ubuntu, hopefully we can get this fixed in the LTS so we can use the LTS for the docker image. I do not think switching to a development release for our images is a good idea.

https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1959725

@jlind23
Copy link
Contributor

jlind23 commented Feb 8, 2022

@mtojek this is an experimental feature and thus it will not be fixed until further resolution on ubuntu side.
Can we close this issue then?

@mtojek
Copy link
Contributor Author

mtojek commented Feb 8, 2022

TBH I would keep it open, as we merged a skip-tests PR. This issue will help us remember to re-enable system tests.

@botelastic
Copy link

botelastic bot commented Feb 8, 2023

Hi! We just realized that we haven't looked into this issue in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Feb 8, 2023
@andrewkroh
Copy link
Member

There might be a workaround by moving the test data into a directory prefixed by /run as per the systemd code path that is crashing. That would cause it to return before the problematic line.

https://github.com/systemd/systemd/blob/ea500ac513cf51bcb79a5666f1519499d029428f/src/journal/sd-journal.c#L1625-L1631

@andrewkroh
Copy link
Member

I raised a PR to allow elastic-package to put test data into /run as a workaround for not having an Ubuntu container with systemd >= v246. See elastic/elastic-package#1236.

andrewkroh added a commit to andrewkroh/integrations that referenced this issue Apr 26, 2023
system tests for the journald input have been disabled to a segfault. This uses a workaround
to avoid that segfault so we can continue testing.

Closes elastic#2602
Relates elastic/elastic-package#1236
andrewkroh added a commit that referenced this issue Apr 27, 2023
system tests for the journald input have been disabled to a segfault. This uses a workaround
to avoid that segfault so we can continue testing.

While performing that testing I discovered that neither iptables nor journald were
aligned with the current ECS definition of the log.syslog.* fields. ECS added
numerous log.syslog fields that should be used by journald/iptables
instead of syslog.*.

And because journald is an input package this needs to be done without
an Ingest Pipeline so that users with custom pipelines can benefit.

Bump stack version for the iptables integration to get journald input fixes.

Closes #2602
Relates elastic/elastic-package#1236
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants