-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TLS connection error oversize record received
#164
Comments
Additional context for this issue in this Slack thread: https://natsio.slack.com/archives/C069GSYFP/p1563202735013400 |
Further analysis with the latest version of nats.rb (0.11.0) showed one time issue with a new error message:
But after few other attempts was not seen again. However we lose NATS messages that did not reach the server. On the server side we observed to the time when few messages were lost, the client opened two connections and only one was used. Which was not the case with version 0.9.2. Additionally we see SSL connection errors in EM, but not on the NATS server side. I will provide @wallyqs with logs from our setup. We tried quite a lot of different options with no success. Please advise what is the best option from here on to proceed? |
Providing the logs would be great. We are actively trying to determine the root cause of the issue. |
Thanks for sharing the info and the logs. Currently still trying to reproduce this issue, the ordering of the CONNECT and INFO protocol should be resolved with the upgrade to v0.11.0, so still not sure which is the condition that the client it is running into right now. One issue that we can see from the logs is that there are
It is possible to lower down the ping/pong interval from the configuration file in order to detect these clients sooner (in nats-server v1.3.0 this should work): ping_interval: 15 # seconds There are a couple of other callbacks that also may help to implement to gather more info about the condition of the client besides NATS.on_error do |e|
puts "Error: #{e}"
end
NATS.on_close do
puts "Connection closed"
end
NATS.on_disconnect do |e|
puts "Disconnected! Reason: #{e}"
puts "Pending data: #{NATS.pending_data_size}"
end
NATS.on_reconnect do |nats|
puts "Reconnected! Pending data: #{nats.pending_data_size}"
end Note that while the client is reconnecting the client will be buffering some of the commands (pending_data_size) but they will be flushed once it reaches the connected state but some of the commands may be lost when it disconnects. |
@wallyqs thanks for the prompt response! Is there a why to find out, which commands get lost during disconnects? |
@wallyqs Not sure if this additional points you refer would bring much more valuable info, but we will include it in the further analysis. I did already mentioned in private message that we traced the messages to just before the OpenSSL EM helper classes do encrypt and send via the connection. We added to the EventMachine cpp coding some debug traces. The Please advise how to continue on investigating the issue, what kind of tracing/debug you would need to fully understand reproduce the issue? |
I think we should all jump on a Google Hangout or a Zoom call as a next step. WDYT? |
Yes, it makes sense. Let discuss in Slack about how and when. |
Sounds good. |
We are from the BOSH team, and we use nats.rb client with TLS enabled. We have a problem with NATS server error:
oversized record received
.So we observed that the client sends plaintext messages to server before TLS is started, which results in the
oversized
error.We face the issue only when there is heavy load of NATS messages from BOSH Director. We use nats.rb version 0.9.2, but we tried with 0.11.0 as well and the issue was reproducible.
Here an example what we see with tshark. Director is the client and uses the nats.rb version 0.9.2. nats is the server in version 1.3.0, which is forked and can be found here.
In case the connection fails with TLS handshake error:
Normally when the communication is successful:
The difference is that the NATS
CONNECT
message in the error case is sent before theINFO
message and before the TLS is started. The main issue for us is that we lose messages that do not reach the NATS server in this scenario. Is there a guarantee that the NATS client will deliver the messages to the NATS server or in case of reconnect errors some may be lost (e.g. in EM/SSL buffers).The text was updated successfully, but these errors were encountered: