Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/properly terminate read operations #179

Merged
merged 4 commits into from
Apr 18, 2024

Conversation

carlmontanari
Copy link
Contributor

fixes #178 (I think/hope)

cc/ @netixx

@netixx
Copy link
Contributor

netixx commented Apr 15, 2024

Testing that right now 👍

Just for my personal improvement, why not use context.WithTimeout instead of WithCancel + timer ?

@carlmontanari
Copy link
Contributor Author

Just for my personal improvement, why not use context.WithTimeout instead of WithCancel + timer ?

no good reasons I assure you 😅 -- mostly was just trying to change as little as possible at the moment. Probably thats a better idea/way to go but.... this works (I think/hope) 😁

@netixx
Copy link
Contributor

netixx commented Apr 16, 2024

I tested this PR, but on the latest main (with byot): https://github.com/netixx/scrapligo/tree/read-timeout

Most of the deadlocking is gone, but I am still getting goroutines stuck:

  1. in the channel.Close c.done <- struct{}{}
  2. in github.com/scrapli/scrapligo/driver/netconf.(*Driver).getServerCapabilities.func1 (chansend)
  3. in github.com/scrapli/scrapligo/driver/netconf.(*Driver).sendRPC.func1 (time.Sleep)

for 1/:
I guess all the consumer for c.done are already gone or not created at this stage (that is read()). For example if we are calling c.Close() and read() has already died (because eg. of io.EOF).

for 2/:
If the timer triggers, then their will be no reader for cr will terminate and the goroutine will get stuck in the send operation.

for 3/
getMessage should return non nil when the driver is closing, so that sendRPC can return ? Or the sendRPC loop itself should return when the driver is closing (since we'll never get any data).

I don't know how you want to handle those issues, my approach would be to have a global cancellable context in the driver and channel, and checking everywhere there is a blocking operation, whether the context is Done or not.
Then when we call close, we cancel the context and everything should finish properly.

In the future maybe we can pass the context from the caller with a WithContext option, so that we inherit the parent context and can terminate gracefully if the parent context is cancelled itself.

If it's a solution that you think could work, I can work on it if you like.

@carlmontanari
Copy link
Contributor Author

copy, thanks for digging in again @netixx! ill try to take a closer look at this later today or tomorrow and get back to ya!

@carlmontanari
Copy link
Contributor Author

ok @netixx I think I fixed the last of them w/ that lats commit. its not beautiful but should work 🤣

In the future maybe we can pass the context from the caller with a WithContext option, so that we inherit the parent context and can terminate gracefully if the parent context is cancelled itself.

I think "yes" this sounds way smarter/saner/better way to go about it in general... I think at some point a "v2" should follow this path. im not sure I have the energy to do that anytime soon (especially as there would be a lot of other stuff id want to change I think), but if you are hyped about it and wanted to start a branch maybe we could chip away at it over a long-ish period of time?

@netixx
Copy link
Contributor

netixx commented Apr 18, 2024

Ok, if I have some time laying around, I'll attempt to do something with contexts :)

In the meantime, this fixes my issue 👍 👍

Thanks for all your help and work!!

@carlmontanari carlmontanari marked this pull request as ready for review April 18, 2024 13:24
@carlmontanari carlmontanari merged commit 9372cb4 into main Apr 18, 2024
30 checks passed
@carlmontanari carlmontanari deleted the fix/properly-terminate-read-operations branch April 18, 2024 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Netconf channel stuck in ReadUntilPrompt
2 participants