Statsd meter recordings are dropped when submitted from parallel threads #2880
Comments
Looks like the following change in StatsdCounter will make things work for N=10:
The problem is that we emit events to a sink from different threads, and the current behavior of Sinks.many() sinks is to fail fast if emissions are not serialized. Apparently it was different with Processors before. This is all based on https://stackoverflow.com/questions/65379026/reactor-handle-fail-non-serialized-error-from-a-many-sink?noredirect=1&lq=1 |
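To make the fail-fast behavior described above concrete, here is a minimal sketch that uses only the JDK. It is a simplified stand-in, not Reactor's implementation: the class `FailFastSinkSketch`, its `Result` enum, and the `demo` method are hypothetical names chosen for illustration (the enum constant only mirrors the name of Reactor's `FAIL_NON_SERIALIZED` result). The sketch forces two emissions to overlap and shows that the second one is rejected rather than queued.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical stand-in for a fail-fast sink; NOT Reactor's implementation.
class FailFastSinkSketch {
    enum Result { OK, FAIL_NON_SERIALIZED }

    private final AtomicBoolean emitting = new AtomicBoolean();
    private final Consumer<String> subscriber;

    FailFastSinkSketch(Consumer<String> subscriber) {
        this.subscriber = subscriber;
    }

    // Rejects the call instead of waiting when another thread is mid-emission.
    Result tryEmit(String line) {
        if (!emitting.compareAndSet(false, true)) {
            return Result.FAIL_NON_SERIALIZED;
        }
        try {
            subscriber.accept(line);
            return Result.OK;
        } finally {
            emitting.set(false);
        }
    }

    // Forces an overlap deterministically: thread A is held inside the
    // subscriber while the calling thread attempts a second emission.
    static List<String> demo() {
        List<String> log = new CopyOnWriteArrayList<>();
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        FailFastSinkSketch sink = new FailFastSinkSketch(line -> {
            log.add("delivered " + line);
            if (line.equals("a")) {
                entered.countDown();
                await(release);
            }
        });
        Thread a = new Thread(() -> sink.tryEmit("a"));
        a.start();
        await(entered);                          // "a" is now mid-emission
        log.add("b -> " + sink.tryEmit("b"));    // overlapping call is rejected
        release.countDown();
        try {
            a.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        log.add("c -> " + sink.tryEmit("c"));    // no overlap, so it succeeds
        return log;
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

If the caller ignores the returned result, the "b" recording is silently lost, which matches the symptom in the issue title.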
This discovery was a little problematic as
It claims that everything is great while it isn't. And doesn't even debug log anything. |
Actually, the aforementioned busy-loop solution doesn't really work if I set N to 1000 in a test: too much busy work is happening. |
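For readers who have not seen the pattern, a busy-loop retry over a fail-fast guard can be sketched as follows. This is an illustration under assumptions, not the change discussed above: `BusyLoopEmitSketch`, `emitWithRetry`, and `run` are hypothetical names, and the guard is a plain `AtomicBoolean` rather than a Reactor sink. Every emission is eventually delivered, but each rejected attempt burns CPU, and the number of wasted spins grows with contention.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of retrying a fail-fast emission in a busy loop.
class BusyLoopEmitSketch {
    private final AtomicBoolean emitting = new AtomicBoolean();
    private final AtomicLong spins = new AtomicLong();
    private int delivered; // written only while the guard is held

    // Fails immediately if another thread is mid-emission.
    private boolean tryEmit() {
        if (!emitting.compareAndSet(false, true)) {
            return false;
        }
        try {
            delivered++;
            return true;
        } finally {
            emitting.set(false);
        }
    }

    // Spins until the emission is accepted; nothing is dropped, but CPU is wasted.
    private void emitWithRetry() {
        while (!tryEmit()) {
            spins.incrementAndGet();
            Thread.onSpinWait();
        }
    }

    // Returns {delivered count, wasted spins} after all producers finish.
    static long[] run(int threads, int perThread) {
        BusyLoopEmitSketch sink = new BusyLoopEmitSketch();
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    sink.emitWithRetry();
                }
                done.countDown();
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { sink.delivered, sink.spins.get() };
    }

    public static void main(String[] args) {
        long[] result = run(4, 1000);
        System.out.println("delivered=" + result[0] + " wastedSpins=" + result[1]);
    }
}
```

The delivered count is always `threads * perThread`; the spin count varies from run to run and from machine to machine.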
Thank you for the detailed investigation. That is very helpful. We will discuss with the Reactor team how we should handle this. |
Noticed the title change. |
Can you try this as a workaround: change the underlying sink to Sinks.unsafe().many().unicast().onBackpressureBuffer(Queues.unboundedMultiproducer().get()); If I understand correctly, the sink is already subscribed to only once, so Unicast is an option. The above code will create an "unsafe" implementation of the sink, meaning there will be no attempt at detecting concurrent usage. Instead, it will rely on the MPSC properties of the unboundedMultiproducer queue for serialization. |
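The idea of relying on a multi-producer, single-consumer (MPSC) queue instead of detecting concurrent emitters can be illustrated with the JDK alone. The sketch below is an analogy, not the Reactor workaround itself: it swaps in `java.util.concurrent.ConcurrentLinkedQueue` for the queue returned by `Queues.unboundedMultiproducer()`, and `MpscQueueSketch` and `emitAndDrain` are hypothetical names. Producers never fail or spin; a single consumer drains the queue afterwards.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;

// Hypothetical analogy for the MPSC-queue approach; not Reactor code.
class MpscQueueSketch {
    // Each of `threads` producers offers `perThread` lines; the caller drains them.
    static int emitAndDrain(int threads, int perThread) {
        // Assumption: ConcurrentLinkedQueue stands in for an unbounded MPSC queue.
        Queue<String> queue = new ConcurrentLinkedQueue<>();
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // offer() on an unbounded concurrent queue always succeeds,
                    // so there is no rejection to handle on the producer side.
                    queue.offer("counter." + id + ":1|c");
                }
                done.countDown();
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // Single consumer: drain everything the producers enqueued.
        int drained = 0;
        while (queue.poll() != null) {
            drained++;
        }
        return drained;
    }

    public static void main(String[] args) {
        System.out.println("drained=" + emitAndDrain(8, 1000));
    }
}
```

The trade-off is that the queue is unbounded, so a slow consumer shows up as memory growth rather than as dropped or rejected emissions.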
For the sake of documentation:
|
until we figure out what to do. fixes micrometer-metricsgh-2880
related: micrometer-metricsgh-2880 Co-Authored-By: Denis Khitrik
I moved the code back to the deprecated |
I've updated the 1.8.0 release notes calling this out as a known issue at the top. We'll work on getting a release out in the next day or two with the fix going back to the |
Yes, I can do that. Thank you!
|
The 1.8.1-SNAPSHOT now contains the change reverting to the deprecated API. Could you please let us know if it looks alright before we do the release @hdv? Thanks for reporting the issue and all the investigation including a test case. Much appreciated and sorry for the regression. |
@shakuzen I'll report back shortly after I try it in a live test system. However, I'd like to point one more thing out. If we set N equal to 1000 in the test, it still fails for both parallel and sequential cases. Any idea why that could happen? |
Also I noticed that TcpClient batches samples disregarding the buffering setting. Is that intentional? |
@shakuzen I can see that 1.8.1-SNAPSHOT behavior is back on par with 1.7.6! Confirmed in my test environment. Please note, the questions above are still standing. |
@hdv Thank you very much for confirming it! |
@hdv New patch releases are out, you can use a release version now: https://github.com/micrometer-metrics/micrometer/releases/tag/v1.8.1 Thank you very much for reporting and investigating this! |
Yes, we noticed that and have some ideas on why it is happening but need to investigate further. For what it's worth, it is the same behavior on 1.7.x, so it is pre-existing behavior, which is why we're considering it separate from this regression and did not block the patch release on it.
Looking at the packet splitting in the TCP case is one of the things we need to look into more. |
Describe the bug
The following test fails when added to StatsdMeterRegistryPublishTest:
It succeeds when the .parallel() line is commented out.
Environment
main branch
java -version
~/projects/micrometer #v1.8.0 *3 !1 ❯ java --version
openjdk 11.0.2 2019-01-15
OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
To Reproduce
How to reproduce the bug:
See the test above
Expected behavior
Test should succeed
Additional context
.parallel()