Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

promtail: fix syslog udp messages never processed #7231

Closed
wants to merge 6 commits into from
Closed

promtail: fix syslog udp messages never processed #7231

wants to merge 6 commits into from

Conversation

unawarez
Copy link

@unawarez unawarez commented Sep 23, 2022

What this PR does / why we need it:

Prior to this patch, incoming UDP syslogs would only be sent to promtail targets all at once when promtail shut down. The ultimate cause seems to be that go-syslog's stream parsers are not emitting any messages until the input stream is closed, even after newline-delimiting the messages. This patch closes those streams after one packet's contents and a newline are sent.

Which issue(s) this PR fixes:

#6772 is probably this

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide
  • Documentation added (n/a)
  • Tests updated (n/a)
  • CHANGELOG.md updated
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md (n/a)

@unawarez unawarez requested a review from a team as a code owner September 23, 2022 00:30
@CLAassistant
Copy link

CLAassistant commented Sep 23, 2022

CLA assistant check
All committers have signed the CLA.

@unawarez unawarez marked this pull request as draft September 23, 2022 01:11
@unawarez
Copy link
Author

I'm backing this out to a draft, because it created another bug on a different machine... I think it's something about having a consistent source port for the packets, which points to something about UDPTransport.streams keeping pipes based on source address.

@unawarez unawarez marked this pull request as ready for review September 23, 2022 02:25
@unawarez unawarez changed the title promtail: fix syslog udp reader blocking forever promtail: fix syslog udp messages never processed Sep 23, 2022
@unawarez
Copy link
Author

OK I've now seen this work on both macOS and FreeBSD, forwarding real syslogs from the latter.

I just could not get any logs to post without closing the internal io.Pipes. I sort of wonder if it's framing-related, and if writing an octet count in front of the packet would make it work without closing...

Closing the pipes also meant they couldn't be persisted in the streams map. Without that, it might be worth directly using the non-stream-based parsers from go-syslog and dropping the internal pipes altogether for UDP...

@chaudum
Copy link
Contributor

chaudum commented Sep 23, 2022

Thanks @unawarez for looking into this again!

Closing the pipes also meant they couldn't be persisted in the streams map. Without that, it might be worth directly using the non-stream-based parsers from go-syslog and dropping the internal pipes altogether for UDP...

That sounds like a reasonable approach to me. However, I would like to know whether this has performance implications. Writing a benchmark test would be great!

@unawarez
Copy link
Author

Cool, I can try writing a benchmark test over the next couple days

@stale
Copy link

stale bot commented Nov 2, 2022

Hi! This issue has been automatically marked as stale because it has not had any
activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project.
A stalebot can be very useful in closing issues in a number of cases; the most common
is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely
    to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task,
our sincere apologies if you find yourself at the mercy of the stalebot.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Nov 2, 2022
@MichelHollands MichelHollands removed the stale A stale issue or PR that will automatically be closed. label Nov 7, 2022
@MichelHollands
Copy link
Contributor

@unawarez Can you rebase this PR? The CI is failing due to a change in the protobuf files. Also did you have any luck writing a benchmark?

@unawarez
Copy link
Author

unawarez commented Nov 8, 2022 via email

@unawarez unawarez marked this pull request as draft November 8, 2022 04:45
@unawarez
Copy link
Author

unawarez commented Nov 8, 2022

somehow f7fa4c4 was categorically worse, and did not fix the extra newline test failure, which main does not have

@unawarez
Copy link
Author

unawarez commented Nov 8, 2022

I found there is already a Benchmark_SyslogTarget. On my machine I get this for current main, a1d46d2:

% (cd ../../pkg/promtail/targets/syslog; go clean && go test -bench=SyslogTarget -run='^#' -count=5)
goos: darwin
goarch: arm64
pkg: github.com/grafana/loki/clients/pkg/promtail/targets/syslog
Benchmark_SyslogTarget/tcp-10         	      1	1001671917 ns/op	  46704 B/op	    437 allocs/op
Benchmark_SyslogTarget/tcp-10         	      1	1001956250 ns/op	  46160 B/op	    433 allocs/op
Benchmark_SyslogTarget/tcp-10         	      1	1001519250 ns/op	  46368 B/op	    431 allocs/op
Benchmark_SyslogTarget/tcp-10         	      1	1001915375 ns/op	  46736 B/op	    432 allocs/op
Benchmark_SyslogTarget/tcp-10         	      1	1002232375 ns/op	  45792 B/op	    429 allocs/op
Benchmark_SyslogTarget/udp-10         	      1	1001507250 ns/op	  45096 B/op	    468 allocs/op
Benchmark_SyslogTarget/udp-10         	      1	1001478791 ns/op	  46048 B/op	    476 allocs/op
Benchmark_SyslogTarget/udp-10         	      1	1001407458 ns/op	  45416 B/op	    471 allocs/op
Benchmark_SyslogTarget/udp-10         	      1	1001726917 ns/op	  45992 B/op	    477 allocs/op
Benchmark_SyslogTarget/udp-10         	      1	1001358750 ns/op	  45352 B/op	    470 allocs/op
PASS
ok  	github.com/grafana/loki/clients/pkg/promtail/targets/syslog	10.718s

On this PR 94f4b16ffb2f7ef835190c7f4cfc6461686eb4b8 I get this instead:

Benchmark_SyslogTarget/tcp-10         	      1	1001681458 ns/op	  46296 B/op	    436 allocs/op
Benchmark_SyslogTarget/tcp-10         	      1	1002053958 ns/op	  46736 B/op	    432 allocs/op
Benchmark_SyslogTarget/tcp-10         	      1	1001664959 ns/op	  46048 B/op	    432 allocs/op
Benchmark_SyslogTarget/tcp-10         	      1	1002507042 ns/op	  46112 B/op	    429 allocs/op
Benchmark_SyslogTarget/tcp-10         	      1	1001686292 ns/op	  46400 B/op	    432 allocs/op
Benchmark_SyslogTarget/udp-10         	      1	1002141375 ns/op	 161752 B/op	    802 allocs/op
Benchmark_SyslogTarget/udp-10         	      1	1002111208 ns/op	 158616 B/op	    793 allocs/op
Benchmark_SyslogTarget/udp-10         	      1	1001899958 ns/op	 159288 B/op	    783 allocs/op
Benchmark_SyslogTarget/udp-10         	      1	1001866958 ns/op	 161096 B/op	    798 allocs/op
Benchmark_SyslogTarget/udp-10         	      1	1001939333 ns/op	 156520 B/op	    776 allocs/op

Much worse heap activity for sure, and still has the extra newline test failure. I'll see what else I can do.

But! There's another interesting story if you run more than one iteration. That's why I was using -count=5 above. If I replace -count=5 with -benchtime=2s then even on current main a1d46d2 the UDP benchmark errors out:

% (cd ../../pkg/promtail/targets/syslog; go clean && go test -bench=SyslogTarget -run='^#' -benchtime=2s)
goos: darwin
goarch: arm64
pkg: github.com/grafana/loki/clients/pkg/promtail/targets/syslog
Benchmark_SyslogTarget/tcp-10         	  31761	    64496 ns/op	  32235 B/op	    360 allocs/op
Benchmark_SyslogTarget/udp-10         	--- FAIL: Benchmark_SyslogTarget/udp-10
    syslogtarget_test.go:350: 
        	Error Trace:	/Users/me/shelf/loki/clients/pkg/promtail/targets/syslog/syslogtarget_test.go:350
        	           				/Users/me/shelf/loki/clients/pkg/promtail/targets/syslog/benchmark.go:193
        	           				/Users/me/shelf/loki/clients/pkg/promtail/targets/syslog/benchmark.go:334
        	           				/Users/me/shelf/loki/clients/pkg/promtail/targets/syslog/asm_arm64.s:1172
        	Error:      	Condition never satisfied
        	Test:       	Benchmark_SyslogTarget/udp
        	Messages:   	expected: 9210 got:2072
--- FAIL: Benchmark_SyslogTarget
FAIL
exit status 1
FAIL	github.com/grafana/loki/clients/pkg/promtail/targets/syslog	44.354s

Maybe it's the "udp messages never emitted" bug manifesting, or maybe it's a coincidence.

Either way I should have time tomorrow to try out the string-based parser (#7231 (comment)) and be able to get a comparison against main using this benchmark. I have some hope that approach could get past the extra newline issue.

@unawarez
Copy link
Author

unawarez commented Nov 9, 2022

I dug pretty deep into the tests trying to figure out the flakiness. I'm about to rebase and start over with a commit that only makes some changes to the unit test and benchmark for UDP. The biggest thing is I think octet counting inside UDP packets is not allowed. Octet counting is only defined in the TCP RFC 6587. The UDP RFC 5426 just says the packet contents are formatted "according to RFC 5424".

@pull-request-size pull-request-size bot added size/M and removed size/S labels Nov 9, 2022
@unawarez
Copy link
Author

unawarez commented Nov 9, 2022

OK, here is a comparison of all 3 commits above. Sadly -benchtime 1.1s still makes all the UDP benchmarks die on all these commits, so it's still a -count 5 comparison.

Anyway I'm inclined to think udp_octetcounting and udp_newline failures are not true problems based on RFC digging noted above. I think the extra newline failure is because the test sender is putting a newline in the packet, and now the UDP pipeline is faithfully preserving that newline.

edit: deleted giant block of outdated test & bench output

@unawarez unawarez marked this pull request as ready for review November 9, 2022 03:22
@unawarez
Copy link
Author

unawarez commented Nov 9, 2022

I don't think I'll be able to keep digging, but as one more experiment with the benchtime=1.1s UDP benchmarks failing, I added a 1ms sleep after this line https://github.com/grafana/loki/blob/1f0a3c291d3e7498982a06e7f542697008af7d53/clients/pkg/promtail/targets/syslog/syslogtarget_test.go#L348 and then the benchmark passed. That makes me think the failure is from some UDP packets actually getting dropped when too many are sent at once...

@unawarez
Copy link
Author

I learned more about some golang APIs and looked at syslogtarget_test again, and I'd actually like to propose another commit that tightens up the benchmark implementation. AFAICT it only had a 1s measurement resolution because it's relying on require.Eventually to block with a 1s tick. I'd like to write another fake Client impl that directly exposes a sync primitive for the caller to block on. I should have an example of that here pretty soon.

% go test -run SyslogTarget
level=info msg="syslog listening on address" address=127.0.0.1:64941 protocol=tcp tls=false
level=info address=127.0.0.1:64941 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:64943 protocol=tcp tls=false
level=info address=127.0.0.1:64943 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:51651 protocol=udp
level=warn msg="failed to read packets" addr=null err="read udp 127.0.0.1:51651: use of closed network connection"
level=info msg="syslog server shutting down" protocol=udp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:53346 protocol=udp
level=warn msg="failed to read packets" addr=null err="read udp 127.0.0.1:53346: use of closed network connection"
level=info msg="syslog server shutting down" protocol=udp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:64945 protocol=tcp tls=false
level=info address=127.0.0.1:64945 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:64947 protocol=tcp tls=false
level=info address=127.0.0.1:64947 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:64951 protocol=tcp tls=true
level=warn msg="error parsing syslog stream" err="expecting a RFC3339MICRO timestamp or a nil value [col 7]"
level=warn msg="error parsing syslog stream" err="expecting a msgid (from 1 to max 32 US-ASCII characters) or a nil value [col 42]"
level=info address=127.0.0.1:64951 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:64953 protocol=tcp tls=true
level=warn msg="error parsing syslog stream" err="expecting a RFC3339MICRO timestamp or a nil value [col 7]"
level=warn msg="error parsing syslog stream" err="expecting a msgid (from 1 to max 32 US-ASCII characters) or a nil value [col 42]"
level=info address=127.0.0.1:64953 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:64955 protocol=tcp tls=true
level=warn msg="error initializing syslog stream" err="tls: client didn't provide a certificate"
level=info address=127.0.0.1:64955 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:64958 protocol=tcp tls=true
level=warn msg="error initializing syslog stream" err="tls: client didn't provide a certificate"
level=info address=127.0.0.1:64958 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:64961 protocol=tcp tls=false
level=warn msg="error initializing syslog stream" err="invalid or unsupported framing. first byte: 'x'"
level=info address=127.0.0.1:64961 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:64963 protocol=tcp tls=false
level=info address=127.0.0.1:64963 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:64965 protocol=tcp tls=false
level=warn msg="error initializing syslog stream" err="read tcp 127.0.0.1:64965->127.0.0.1:64966: i/o timeout"
level=info address=127.0.0.1:64965 msg="syslog server shutting down" protocol=tcp err="context canceled"
PASS
ok  	github.com/grafana/loki/clients/pkg/promtail/targets/syslog	2.738s

% go test -run \^\# -bench SyslogTarget -benchtime 20ms -count 3
goos: darwin
goarch: arm64
pkg: github.com/grafana/loki/clients/pkg/promtail/targets/syslog
Benchmark_SyslogTarget/tcp-10         	    380	    53253 ns/op	  28766 B/op	    360 allocs/op
Benchmark_SyslogTarget/tcp-10         	    382	    65782 ns/op	  28766 B/op	    360 allocs/op
Benchmark_SyslogTarget/tcp-10         	    375	    67906 ns/op	  28763 B/op	    360 allocs/op
Benchmark_SyslogTarget/udp-10         	    270	    85868 ns/op	  28694 B/op	    410 allocs/op
Benchmark_SyslogTarget/udp-10         	    283	    84983 ns/op	  28688 B/op	    410 allocs/op
Benchmark_SyslogTarget/udp-10         	    273	    81703 ns/op	  28690 B/op	    410 allocs/op
PASS
ok  	github.com/grafana/loki/clients/pkg/promtail/targets/syslog	0.938s
This runs another variant of the syslog target UDP tests and benchmark.
The raw version does nothing to the input string i.e. sends the message
as a packet with no framing.

% go test -run SyslogTarget
level=info msg="syslog listening on address" address=127.0.0.1:65161 protocol=tcp tls=false
level=info address=127.0.0.1:65161 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65163 protocol=tcp tls=false
level=info address=127.0.0.1:65163 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:55256 protocol=udp
level=warn msg="failed to read packets" addr=null err="read udp 127.0.0.1:55256: use of closed network connection"
level=info msg="syslog server shutting down" protocol=udp err="context canceled"
level=warn msg="error parsing syslog stream" err="unexpected EOF"
level=info msg="syslog listening on address" address=127.0.0.1:50342 protocol=udp
level=warn msg="failed to read packets" addr=null err="read udp 127.0.0.1:50342: use of closed network connection"
level=info msg="syslog server shutting down" protocol=udp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:61955 protocol=udp
level=warn msg="failed to read packets" addr=null err="read udp 127.0.0.1:61955: use of closed network connection"
level=info msg="syslog server shutting down" protocol=udp err="context canceled"
--- FAIL: TestSyslogTarget (4.11s)
    --- FAIL: TestSyslogTarget/udp_raw (2.01s)
        syslogtarget_test.go:468:
            	Error Trace:	/Users/me/shelf/loki/clients/pkg/promtail/targets/syslog/syslogtarget_test.go:468
            	Error:      	Condition never satisfied
            	Test:       	TestSyslogTarget/udp_raw
            	Messages:   	Expected to receive 3 messages.
level=info msg="syslog listening on address" address=127.0.0.1:65165 protocol=tcp tls=false
level=info address=127.0.0.1:65165 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65167 protocol=tcp tls=false
level=info address=127.0.0.1:65167 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65171 protocol=tcp tls=true
level=warn msg="error parsing syslog stream" err="expecting a RFC3339MICRO timestamp or a nil value [col 7]"
level=warn msg="error parsing syslog stream" err="expecting a msgid (from 1 to max 32 US-ASCII characters) or a nil value [col 42]"
level=info address=127.0.0.1:65171 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65173 protocol=tcp tls=true
level=warn msg="error parsing syslog stream" err="expecting a RFC3339MICRO timestamp or a nil value [col 7]"
level=warn msg="error parsing syslog stream" err="expecting a msgid (from 1 to max 32 US-ASCII characters) or a nil value [col 42]"
level=info address=127.0.0.1:65173 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65175 protocol=tcp tls=true
level=warn msg="error initializing syslog stream" err="tls: client didn't provide a certificate"
level=info address=127.0.0.1:65175 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65178 protocol=tcp tls=true
level=warn msg="error initializing syslog stream" err="tls: client didn't provide a certificate"
level=info address=127.0.0.1:65178 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65181 protocol=tcp tls=false
level=warn msg="error initializing syslog stream" err="invalid or unsupported framing. first byte: 'x'"
level=info address=127.0.0.1:65181 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65183 protocol=tcp tls=false
level=info address=127.0.0.1:65183 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65185 protocol=tcp tls=false
level=warn msg="error initializing syslog stream" err="read tcp 127.0.0.1:65185->127.0.0.1:65186: i/o timeout"
level=info address=127.0.0.1:65185 msg="syslog server shutting down" protocol=tcp err="context canceled"
FAIL
exit status 1
FAIL	github.com/grafana/loki/clients/pkg/promtail/targets/syslog	4.757s

% go test -run \^\# -bench SyslogTarget -benchtime 20ms -count 3
goos: darwin
goarch: arm64
pkg: github.com/grafana/loki/clients/pkg/promtail/targets/syslog
Benchmark_SyslogTarget/tcp_octetcounting-10         	    429	    70874 ns/op	  28763 B/op	    360 allocs/op
Benchmark_SyslogTarget/tcp_octetcounting-10         	    460	    63785 ns/op	  28758 B/op	    360 allocs/op
Benchmark_SyslogTarget/tcp_octetcounting-10         	    447	    68253 ns/op	  28760 B/op	    360 allocs/op
Benchmark_SyslogTarget/udp_octetcounting-10         	    248	    86094 ns/op	  28700 B/op	    410 allocs/op
Benchmark_SyslogTarget/udp_octetcounting-10         	    290	    78608 ns/op	  28688 B/op	    410 allocs/op
Benchmark_SyslogTarget/udp_octetcounting-10         	    302	    70038 ns/op	  28681 B/op	    410 allocs/op
--- FAIL: Benchmark_SyslogTarget/udp_raw
    syslogtarget_test.go:343: test timed out
--- FAIL: Benchmark_SyslogTarget
FAIL
exit status 1
FAIL	github.com/grafana/loki/clients/pkg/promtail/targets/syslog	8.972s
Prior to this patch, incoming UDP syslogs would only be sent to promtail
targets all at once when promtail shut down. The ultimate cause seems to
be that `go-syslog`'s stream parsers are not emitting any messages until
the input stream is closed, even after newline-delimiting the messages.
This patch closes those streams after one packet's contents and a
newline are sent.

% go test -run SyslogTarget
level=info msg="syslog listening on address" address=127.0.0.1:65036 protocol=tcp tls=false
level=info address=127.0.0.1:65036 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65038 protocol=tcp tls=false
level=info address=127.0.0.1:65038 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:53197 protocol=udp
level=warn msg="failed to read packets" addr=null err="read udp 127.0.0.1:53197: use of closed network connection"
level=info msg="syslog server shutting down" protocol=udp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:62837 protocol=udp
level=warn msg="failed to read packets" addr=null err="read udp 127.0.0.1:62837: use of closed network connection"
level=info msg="syslog server shutting down" protocol=udp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:54772 protocol=udp
level=warn msg="error parsing syslog stream" err="found ILLEGAL(\n), expecting a MSGLEN"
level=warn msg="error parsing syslog stream" err="found ILLEGAL(\n), expecting a MSGLEN"
level=warn msg="error parsing syslog stream" err="found ILLEGAL(\n), expecting a MSGLEN"
level=warn msg="failed to read packets" addr=null err="read udp 127.0.0.1:54772: use of closed network connection"
level=info msg="syslog server shutting down" protocol=udp err="context canceled"
--- FAIL: TestSyslogTarget (3.12s)
    --- FAIL: TestSyslogTarget/udp_newline_separated (1.03s)
        syslogtarget_test.go:489:
            	Error Trace:	/Users/me/shelf/loki/clients/pkg/promtail/targets/syslog/syslogtarget_test.go:489
            	Error:      	Not equal:
            	           	expected: "An application event log entry..."
            	           	actual  : "An application event log entry...\n"

            	           	Diff:
            	           	--- Expected
            	           	+++ Actual
            	           	@@ -1 +1,2 @@
            	           	An application event log entry...
            	           	+
            	Test:       	TestSyslogTarget/udp_newline_separated
level=info msg="syslog listening on address" address=127.0.0.1:65040 protocol=tcp tls=false
level=info address=127.0.0.1:65040 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65042 protocol=tcp tls=false
level=info address=127.0.0.1:65042 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65046 protocol=tcp tls=true
level=warn msg="error parsing syslog stream" err="expecting a RFC3339MICRO timestamp or a nil value [col 7]"
level=warn msg="error parsing syslog stream" err="expecting a msgid (from 1 to max 32 US-ASCII characters) or a nil value [col 42]"
level=info address=127.0.0.1:65046 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65048 protocol=tcp tls=true
level=warn msg="error parsing syslog stream" err="expecting a RFC3339MICRO timestamp or a nil value [col 7]"
level=warn msg="error parsing syslog stream" err="expecting a msgid (from 1 to max 32 US-ASCII characters) or a nil value [col 42]"
level=info address=127.0.0.1:65048 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65050 protocol=tcp tls=true
level=warn msg="error initializing syslog stream" err="tls: client didn't provide a certificate"
level=info address=127.0.0.1:65050 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65053 protocol=tcp tls=true
level=warn msg="error initializing syslog stream" err="tls: client didn't provide a certificate"
level=info address=127.0.0.1:65053 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65056 protocol=tcp tls=false
level=warn msg="error initializing syslog stream" err="invalid or unsupported framing. first byte: 'x'"
level=info address=127.0.0.1:65056 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65058 protocol=tcp tls=false
level=info address=127.0.0.1:65058 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65060 protocol=tcp tls=false
level=warn msg="error initializing syslog stream" err="read tcp 127.0.0.1:65060->127.0.0.1:65061: i/o timeout"
level=info address=127.0.0.1:65060 msg="syslog server shutting down" protocol=tcp err="context canceled"
FAIL
exit status 1
FAIL	github.com/grafana/loki/clients/pkg/promtail/targets/syslog	3.767s

% go test -run \^\# -bench SyslogTarget -benchtime 20ms -count 3
goos: darwin
goarch: arm64
pkg: github.com/grafana/loki/clients/pkg/promtail/targets/syslog
Benchmark_SyslogTarget/tcp_octetcounting-10         	    414	    54300 ns/op	  28766 B/op	    360 allocs/op
Benchmark_SyslogTarget/tcp_octetcounting-10         	    453	    69232 ns/op	  28758 B/op	    360 allocs/op
Benchmark_SyslogTarget/tcp_octetcounting-10         	    402	    67254 ns/op	  28764 B/op	    360 allocs/op
Benchmark_SyslogTarget/udp_octetcounting-10         	    180	   136459 ns/op	 151979 B/op	    742 allocs/op
Benchmark_SyslogTarget/udp_octetcounting-10         	    154	   134058 ns/op	 151945 B/op	    741 allocs/op
Benchmark_SyslogTarget/udp_octetcounting-10         	    166	   127600 ns/op	 151845 B/op	    741 allocs/op
Benchmark_SyslogTarget/udp_raw-10                   	    238	    96841 ns/op	  97031 B/op	    662 allocs/op
Benchmark_SyslogTarget/udp_raw-10                   	    230	    96199 ns/op	  96994 B/op	    660 allocs/op
Benchmark_SyslogTarget/udp_raw-10                   	    223	    96208 ns/op	  96888 B/op	    660 allocs/op
PASS
ok  	github.com/grafana/loki/clients/pkg/promtail/targets/syslog	1.198s
This uses the standalone parser object in `go-syslog`.

% go test -run SyslogTarget
level=info msg="syslog listening on address" address=127.0.0.1:65083 protocol=tcp tls=false
level=info address=127.0.0.1:65083 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65085 protocol=tcp tls=false
level=info address=127.0.0.1:65085 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:61376 protocol=udp
level=warn msg="failed to read packets" addr=null err="read udp 127.0.0.1:61376: use of closed network connection"
level=info msg="syslog server shutting down" protocol=udp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:51527 protocol=udp
level=warn msg="failed to read packets" addr=null err="read udp 127.0.0.1:51527: use of closed network connection"
level=info msg="syslog server shutting down" protocol=udp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:51734 protocol=udp
level=warn msg="error parsing syslog stream" err="expecting a priority value within angle brackets [col 0]"
level=warn msg="error parsing syslog stream" err="expecting a priority value within angle brackets [col 0]"
level=warn msg="error parsing syslog stream" err="expecting a priority value within angle brackets [col 0]"
level=warn msg="error parsing syslog stream" err="expecting a priority value within angle brackets [col 0]"
level=warn msg="error parsing syslog stream" err="expecting a priority value within angle brackets [col 0]"
level=warn msg="error parsing syslog stream" err="expecting a priority value within angle brackets [col 0]"
level=warn msg="failed to read packets" addr=null err="read udp 127.0.0.1:51734: use of closed network connection"
level=info msg="syslog server shutting down" protocol=udp err="context canceled"
--- FAIL: TestSyslogTarget (4.11s)
    --- FAIL: TestSyslogTarget/udp_newline_separated (1.03s)
        syslogtarget_test.go:489:
            	Error Trace:	/Users/me/shelf/loki/clients/pkg/promtail/targets/syslog/syslogtarget_test.go:489
            	Error:      	Not equal:
            	           	expected: "An application event log entry..."
            	           	actual  : "An application event log entry...\n"

            	           	Diff:
            	           	--- Expected
            	           	+++ Actual
            	           	@@ -1 +1,2 @@
            	           	An application event log entry...
            	           	+
            	Test:       	TestSyslogTarget/udp_newline_separated
    --- FAIL: TestSyslogTarget/udp_octetcounting (2.02s)
        syslogtarget_test.go:468:
            	Error Trace:	/Users/me/shelf/loki/clients/pkg/promtail/targets/syslog/syslogtarget_test.go:468
            	Error:      	Condition never satisfied
            	Test:       	TestSyslogTarget/udp_octetcounting
            	Messages:   	Expected to receive 3 messages.
level=info msg="syslog listening on address" address=127.0.0.1:65087 protocol=tcp tls=false
level=info address=127.0.0.1:65087 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65089 protocol=tcp tls=false
level=info address=127.0.0.1:65089 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65093 protocol=tcp tls=true
level=warn msg="error parsing syslog stream" err="expecting a RFC3339MICRO timestamp or a nil value [col 7]"
level=warn msg="error parsing syslog stream" err="expecting a msgid (from 1 to max 32 US-ASCII characters) or a nil value [col 42]"
level=info address=127.0.0.1:65093 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65095 protocol=tcp tls=true
level=warn msg="error parsing syslog stream" err="expecting a RFC3339MICRO timestamp or a nil value [col 7]"
level=warn msg="error parsing syslog stream" err="expecting a msgid (from 1 to max 32 US-ASCII characters) or a nil value [col 42]"
level=info address=127.0.0.1:65095 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65097 protocol=tcp tls=true
level=warn msg="error initializing syslog stream" err="tls: client didn't provide a certificate"
level=info address=127.0.0.1:65097 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65100 protocol=tcp tls=true
level=warn msg="error initializing syslog stream" err="tls: client didn't provide a certificate"
level=info address=127.0.0.1:65100 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65103 protocol=tcp tls=false
level=warn msg="error initializing syslog stream" err="invalid or unsupported framing. first byte: 'x'"
level=info address=127.0.0.1:65103 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65105 protocol=tcp tls=false
level=info address=127.0.0.1:65105 msg="syslog server shutting down" protocol=tcp err="context canceled"
level=info msg="syslog listening on address" address=127.0.0.1:65107 protocol=tcp tls=false
level=warn msg="error initializing syslog stream" err="read tcp 127.0.0.1:65107->127.0.0.1:65108: i/o timeout"
level=info address=127.0.0.1:65107 msg="syslog server shutting down" protocol=tcp err="context canceled"
FAIL
exit status 1
FAIL	github.com/grafana/loki/clients/pkg/promtail/targets/syslog	4.755s

% go test -run \^\# -bench SyslogTarget -benchtime 20ms -count 3
goos: darwin
goarch: arm64
pkg: github.com/grafana/loki/clients/pkg/promtail/targets/syslog
Benchmark_SyslogTarget/tcp_octetcounting-10         	    447	    60722 ns/op	  28763 B/op	    360 allocs/op
Benchmark_SyslogTarget/tcp_octetcounting-10         	    699	    31549 ns/op	  28751 B/op	    360 allocs/op
Benchmark_SyslogTarget/tcp_octetcounting-10         	    705	    33085 ns/op	  28749 B/op	    360 allocs/op
--- FAIL: Benchmark_SyslogTarget/udp_octetcounting
    syslogtarget_test.go:343: test timed out
Benchmark_SyslogTarget/udp_raw-10                   	    163	   143320 ns/op	  34686 B/op	    453 allocs/op
Benchmark_SyslogTarget/udp_raw-10                   	    277	    72205 ns/op	  33976 B/op	    450 allocs/op
Benchmark_SyslogTarget/udp_raw-10                   	    374	    58167 ns/op	  34307 B/op	    451 allocs/op
--- FAIL: Benchmark_SyslogTarget
FAIL
exit status 1
FAIL	github.com/grafana/loki/clients/pkg/promtail/targets/syslog	9.120s
@pull-request-size pull-request-size bot added size/L and removed size/M labels Nov 10, 2022
@davinkevin
Copy link

+1 on this PR, it's a very important one if you want, like me, to use promtail+syslog for nginx, because nginx doesn't support TCP for syslog.

For more context, we are about to chose our logging component for our platform and this small bug can prevent us to use promtail, even if we already use loki for aggregation.

I really hope we'll be able to see this PR merged soon! 😇

@unawarez
Copy link
Author

unawarez commented Dec 12, 2022

Because I left this with a bit of an infodump, I'll write a summary of where I left it:

  • The benchmark that already exists wasn't really working because it blocked on a 1-second wait for completion. The first commit is just about tightening that up, so probably that commit will be helpful no matter what. The other commits depend on making a bunch of decisions that I don't feel like I can drop in and make, so I think you could edit and/or cherry-pick some.

    One Heisenbug that still exists with the UDP benchmarks is, if it sends too many packets too fast then some packets are genuinely dropped by the OS. So the benchmark works fine up to some number of simultaneous requests and then it gets flaky. I didn't do any work toward fixing that.

  • Commit 2 is the first maybe-opinionated commit. Currently the tests and benchmarks are parameterized as {tcp newline-separated, tcp octet-counted, udp newline-separated, udp octet-counted}. I added a "udp raw" variant, which means the UDP payload exactly equals the syslog message. I think this is the only well-defined format for UDP syslogs (see mentions of RFCs some comments above). The other part of the story with these parameterized tests is: the UDP newline and UDP octet-counted variants fail on my new implementations, and tl;dr I think a valid way to fix those test failures is to remove UDP-newline and UDP-octet-counted, but I didn't do that in any commit here. I believe those UDP test variants have only been passing because of the current implementation details, and I also think the lack of a "UDP raw" flavor was how this PR's original issue was never caught. There are more details in the comments/commits.

  • The last 2 commits are 2 different approaches for fixing this, using the stream-based syslogparser or the string-based syslogparser. All the commits in this PR have test and bench output in the commit messages, but it will be a little outdated now, and unfortunately I don't have the env set up anymore to re-run them. In the end the numbers were disappointing compared to the original implementation. But they both did solve the issue. Maybe someone with real Golang experience could improve. (Looking at the syslogparser dependency's code could maybe be worthwhile too...)

Happy to explain anything, otherwise I leave it up to the maintainers to take/edit what they will ✌️

@MasslessParticle
Copy link
Contributor

This looks good so far. CI is failing because you've got a test failure and a linting error.

@jeschkies
Copy link
Contributor

@unawarez would you like us to take over?

@unawarez
Copy link
Author

@MasslessParticle @jeschkies yes, see #7231 (comment)

@jeschkies
Copy link
Contributor

@unawarez World you grant me write access to your form so I can push to this branch? This would simplify things.

@jeschkies
Copy link
Contributor

I've took a closer look and I'm not sure this approach is the correct one. Seems the original issue #6772 has more details.

@MasslessParticle
Copy link
Contributor

Based on the comments/activity for this one, I'm going to close it for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants