NotSupportedException in Connection.publish getting BufferedStream.Position #586
Comments
I also had this issue. Can you please keep me posted here?
I've come across this issue, and it is consistently repeatable in my environment with a single NATS server. I expected this buffer to be present and used whenever the connection is down for a period of time, not just when switching between servers. @DavidSimner I think you are definitely on the right track; it seems to be a race condition. The moment I put breakpoints in, the publishing thread is updated with the "correct" updated … It's repeatable across multiple .NET and NATS.Client versions on my machine. I'm not familiar enough with the source yet to know what a good fix would be, but I'm willing to help test if anyone wants to look at this.
@rlabbe82 Any chance you have some code or pseudocode I could use to reproduce this? Which versions of the client and server are you using? You said it's repeatable across clients; which ones? Also, do you have any code I could pass on to the client dev team?
@scottf Yes, I do. After writing this up, I think I figured out what is happening: @DavidSimner's thought experiment is exactly what I am hitting. In the test I stop the local container that is running the NATS server. The server closes the connection, which forces the client into the reconnection process. When the reconnection thread runs and calls … This then replaces the … It seems to be a corner case, but I could see it popping up sporadically with network layers like Docker, which may not have completely released the external socket and so allow the reconnection to succeed initially. As a quick test I added a line to replace … Here is the program I put together to test with:
Found with NATS.Client 1.07; I also checked 1.1.1 and, as a quick test, rolled back to 0.11.0.
Thanks for the input, that's an interesting observation about when the server is in a container.
@scottf We're also seeing this bug in our logs (the broker was restarted by our update utility on an embedded Yocto system).
So was the server running in the container restarted, or was the entire container restarted? It probably doesn't matter; it's the client that is having this error. Since this happens during a synchronous publish, it seems reasonable that you just handle the exception and wait for the reconnect before resuming publishing. (I have a really good example of this in Java if you are interested.) I think the most important question is: does the connection go into the reconnect logic and recover?
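A minimal sketch of that workaround (catch the publish exception, wait, then retry), written in Python purely for illustration. `publish_with_retry` and `FlakyPublisher` are hypothetical helpers, not part of NATS.Client. A fixed `delay` stands in for waiting on the client's reconnect notification, and the attempt cap keeps a connection that never recovers from blocking the caller forever.

```python
import time


def publish_with_retry(publish, data, *, retry_on, attempts=5, delay=0.5):
    """Call publish(data), retrying after `delay` seconds on a retryable error.

    `publish` is any callable (hypothetical; not a NATS.Client API).
    Re-raises the last error once `attempts` tries have failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return publish(data)
        except retry_on:
            if attempt == attempts:
                raise  # give up: the connection may never recover
            time.sleep(delay)  # stand-in for "wait for the reconnect"


class FlakyPublisher:
    """Hypothetical fake that fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.sent = []

    def __call__(self, data):
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError("simulated publish failure during reconnect")
        self.sent.append(data)
        return len(data)


pub = FlakyPublisher(failures=2)
print(publish_with_retry(pub, b"hello", retry_on=RuntimeError, delay=0.01))  # 5
```

In real client code, `retry_on` would be the specific exception type(s) the caller expects during a reconnect rather than a broad base class.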
In my case it would never recover.
This sounds very similar to an issue I ran into. I can reliably reproduce it when connecting to Synadia Cloud. The entire Program.cs is below. With a delay of 0 it works; a delay of a few seconds or more throws. Notably, the library (not the repro code) writes reconnect events to the console every second or so (when the delay is longer than a second):
Exception:
Using
Hi @scottf, any thoughts on this? I'm happy to create a new issue if it's not exactly the same problem. As it stands, we are evaluating Synadia Cloud, and the client just keeps disconnecting. Effectively, we can't publish messages except immediately after the initial connection.
Can you please share more information about your outgoing networking? Does this happen with a non-encoded connection? Does it happen when you hit a local dev server? Is there any VPN or proxy in between? You get the idea; I need more details to be able to help.
Just a dev laptop, no fancy proxies. When using … Literally the three lines below always throw on Publish when using the v1 client (File - New - Project - Program.cs):
Similar functionality using the v2 client works as expected on the same machine:
This is being addressed in https://github.com/nats-io/nats.net/pull/888.
@scottf This looks promising. Any ETA on a NuGet package release?
@Kaarel I'm doing the 1.1.5 release right now!
Can confirm the issue is fixed in 1.1.5 :) Thank you!
We are seeing a `System.NotSupportedException` with the following stack trace:

It happens intermittently and isn't 100% reproducible, but here is my best guess as to what is happening:
In the C# class `Connection` there is a field `private Stream bw`, which is a `BufferedStream` over another underlying `Stream`.

In the method `publish`, if the `status` is `ConnState.RECONNECTING`, then the getter for the property `bw.Position` is called. This getter is only supported if the underlying `Stream` supports seeking. As we can see from the stack trace above, in that case the underlying stream does not support seeking, and this causes a `NotSupportedException` to be thrown.

Here is my best guess as to what happened:
When the connection first enters `ConnState.RECONNECTING`, `bw` is created over a `MemoryStream` called `pending`, which does support seeking and so is not the cause.

However, later on, the method `doReconnect` calls the method `createConn`, which changes the field `bw` to be created over a stream associated with the TCP socket (which does not support seeking). After `createConn` has returned successfully, `doReconnect` then calls `processConnectInit`.

As a thought experiment, consider what happens if an exception is thrown by `processConnectInit`: it will be caught in the `catch` block in `doReconnect`, and the `status` will be set to `ConnState.RECONNECTING`, but `bw` will not be changed back to be over a `MemoryStream`, so `bw` will continue to have an underlying stream associated with the TCP socket (which does not support seeking). The `catch` block will then `continue`, select the next server, and release the `mu` lock before it calls the `ReconnectDelayHandler`. At this point, imagine that another thread calls `publish`. `publish` will throw with the stack trace above, because `mu` is not locked, the `status` is `ConnState.RECONNECTING`, and the underlying stream of `bw` does not support seeking.

I think the fix is that in every place where the `status` is set to `ConnState.RECONNECTING`, `bw` should be reset to be created over the `MemoryStream` called `pending`.
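The failure sequence and the proposed fix can be modeled in a few lines of Python. This is a sketch under stated assumptions, not NATS.Client code: Python's `io.BufferedWriter` stands in for .NET's `BufferedStream` (its `tell()`, like the `Position` getter, needs a seekable underlying stream and otherwise raises `io.UnsupportedOperation`), an `io.BytesIO` stands in for the `pending` `MemoryStream`, and `SocketLikeRaw`, `ConnModel` and `try_publish` are hypothetical names. The `processConnectInit` failure is injected directly, and the model is single-threaded, so it shows the state a concurrent `publish` would observe rather than the race itself.

```python
import io

RECONNECTING = "RECONNECTING"


class SocketLikeRaw(io.RawIOBase):
    """Hypothetical stand-in for a TCP socket stream: writable, not seekable."""

    def writable(self):
        return True

    def seekable(self):
        return False

    def write(self, b):
        return len(b)  # pretend every byte was sent


class ConnModel:
    """Hypothetical model of the Connection fields discussed in this issue."""

    def __init__(self, apply_fix):
        self.apply_fix = apply_fix
        # pending: seekable in-memory buffer, like the MemoryStream `pending`.
        self.pending = io.BytesIO()
        self.pending_bw = io.BufferedWriter(self.pending)
        self.bw = self.pending_bw
        self.status = RECONNECTING

    def set_reconnecting(self):
        self.status = RECONNECTING
        if self.apply_fix:
            # Proposed fix: on entering RECONNECTING, point bw back at the
            # buffered stream over `pending`.
            self.bw = self.pending_bw

    def reconnect_attempt_with_failed_init(self):
        # Like createConn succeeding: bw now wraps the non-seekable socket.
        self.bw = io.BufferedWriter(SocketLikeRaw())
        try:
            # Like processConnectInit throwing (simulated here).
            raise ConnectionError("simulated processConnectInit failure")
        except ConnectionError:
            self.set_reconnecting()

    def publish(self, data):
        if self.status == RECONNECTING:
            self.bw.tell()  # like reading bw.Position: needs a seekable stream
        self.bw.write(data)


def try_publish(apply_fix):
    conn = ConnModel(apply_fix)
    conn.reconnect_attempt_with_failed_init()
    try:
        conn.publish(b"hello")
        return "ok"
    except io.UnsupportedOperation:
        return "unsupported"


print(try_publish(apply_fix=False))  # unsupported
print(try_publish(apply_fix=True))   # ok
```

Keeping the buffered writer over `pending` in its own field means switching `bw` to the socket stream never discards (and closes) the pending buffer, so the reset in `set_reconnecting` can always reuse it.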