Vault tests are flaky #21095
Comments
/cc @vsevel
Let me try a quick band-aid.
Hopefully fixes: quarkusio#21095
Let's see if #21097 does anything.
Reopening, as #21105 failed. It seems the timeout increase in #21097 got things further along, but now we have this error:
@vsevel, any idea what's going on? Should we perhaps automatically retry when the connection is closed?
Hi @geoand, this is how I create the web client:
I am not convinced that the concept of a read timeout translates well into Vert.x's idle timeout, but when I did this, I did not find any […]. I use the read timeout as well in:
That said, we would have an issue anyway (whether it is a closed connection or a timeout), because for some reason there is a problem communicating with Vault. In our project we have been hit a lot by timeout exceptions when communicating with Vault. Since we are using the Vault config source, those exceptions happen at startup time and prevent the app from starting; it gets killed and we have to restart it. That is why I did #20343.

We tried to investigate this communication issue by looking at Vault's audit logs, but all calls received by Vault were processed in 5 ms. We could not pinpoint anything wrong in Vault. Rather than Vault being slow, this gave the impression (without being able to prove it) that the Quarkus client was actually not sending the HTTP request, yet was waiting for a response that would never come back, and ended up in a timeout. Again, this is just an impression, but I have developed the feeling that I did not see this behavior with the original OkHttp client I was using at the beginning. Maybe a review of Vault's […]

If we want a band-aid, we can certainly do some retries on the type of exception above. Note, however, that it is not as simple as retrying […]. If all errors happen in the init phase, we can fix them one by one: a bit painful, but certainly doable. I would be surprised, though, if it were happening only there, and even if we make this part more resilient, there is a chance the tests may fail anyway on the calls made in the tests themselves (as opposed to in the initialization).

I have the feeling that we have an issue at the communication level, and it would be ideal to get that solved instead. I realize how painful an unreliable CI can be, however, so I can definitely help make some of these interactions more resilient.
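To make the band-aid option concrete, here is a minimal plain-Java sketch of a bounded retry that only retries errors classified as transient. It is not Quarkus or Vert.x API: `RetrySketch`, `withRetry` and `isConnectionClosed` are hypothetical names, and since the actual exception from the CI log is not reproduced here, treating an `IOException` whose message mentions "closed" as retryable is purely an assumption. As noted above, this would only mask the underlying communication problem, not fix it.

```java
import java.io.IOException;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical retry helper; not part of Quarkus or Vert.x. */
public final class RetrySketch {

    private RetrySketch() {
    }

    /**
     * Runs {@code call}, making at most {@code maxAttempts} attempts in total.
     * Only exceptions accepted by {@code retryable} trigger another attempt;
     * anything else propagates immediately. Waits backoffMillis * attempt
     * between attempts (linear backoff).
     */
    public static <T> T withRetry(Callable<T> call, int maxAttempts, long backoffMillis,
            Predicate<Throwable> retryable) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                // Give up on the last attempt or on errors we do not consider transient.
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e;
                }
                Thread.sleep(backoffMillis * attempt);
            }
        }
    }

    /**
     * Assumed classifier: walks the cause chain looking for an IOException
     * whose message mentions a closed connection.
     */
    public static boolean isConnectionClosed(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof IOException && c.getMessage() != null
                    && c.getMessage().toLowerCase(Locale.ROOT).contains("closed")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated call: fails twice with a "closed connection" error, then succeeds.
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IOException("Connection was closed");
            }
            return "ok";
        }, 5, 10L, RetrySketch::isConnectionClosed);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "ok after 3 attempts"
    }
}
```

Keeping the attempt count bounded and retrying only classified errors means genuine failures (for example an authentication error) still fail fast instead of being hidden behind retries, which matters here because the startup path through the Vault config source would otherwise just hang longer.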
@cescoffier could you perhaps take a look when you have some time? |
Hopefully fixes: quarkusio#21095 (cherry picked from commit 4e89bdf)
Description
I've seen these fail a lot recently (mostly on Java 17?), e.g.:
#21078 (comment)
Might just be caused by slow CI.
/cc @vsevel @stuartwdouglas