Possibility of removing bus sockets #61
It is so tempting to simplify To do away with bus sockets in |
Regarding your idea to use the websocket path: what if the dispatcher ws URL path consists of the token? I can use your current method to generate the tokens at the dispatcher. |
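A minimal base-R sketch of the token-in-path idea. It assumes a URL of the form ws://host:port/token. The helper names (make_token, worker_url, token_from_url) and the address are hypothetical and are not part of mirai or crew. Note that sample() is not a cryptographic RNG, so it only illustrates the shape of the URL.

```r
# Hypothetical helper: a random hex string used as the websocket path.
# sample() is NOT cryptographically secure; this is illustration only.
make_token <- function(n_bytes = 16L) {
  bytes <- sample.int(256L, n_bytes, replace = TRUE) - 1L  # values 0..255
  paste(sprintf("%02x", bytes), collapse = "")
}

# Build the URL a worker would dial (host and port are placeholders).
worker_url <- function(host, port, token) {
  sprintf("ws://%s:%d/%s", host, port, token)
}

# Recover the token from a URL, e.g. to match it on the dispatcher side.
token_from_url <- function(url) {
  sub("^ws://[^/]+/", "", url)
}

token <- make_token()
url <- worker_url("127.0.0.1", 5555L, token)
```

Because each worker's URL carries its own token, a worker launched with a stale URL dials a path that no longer matches.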
That sounds promising. Assuming a websocket path always looks like
I am wondering what to do in the case of 2B. If a worker fails to connect within the expiry window, I think I would need the token to rotate somehow. That way, if the worker starts after the expiry window, it will connect to a socket where nobody is listening, and it will promptly exit because What would be the best way to get the token to rotate after an expiry window? Only |
Rotating after an expiry window would be non-optimal with the new dispatcher design. I can have the dispatcher listeners only accept one connection (dropping new connections before they are added, effectively locking the socket). This would be new functionality added at Then for 2B, if a subsequent worker tries to connect, it will fail and exit. All in all, this seems like a good solution: it prevents 2 servers from dialling into the same socket accidentally (or intentionally), and the token would also provide some obfuscation of the URL from a security perspective. |
Yes, I think that would work well! |
This is implemented in The socket locking turned out to be quite troublesome. It was perfectly effective for preventing additional connections at the dispatcher. However, the retry mechanisms of a server trying to connect can be quite aggressive and end up interfering with the connections, so it was not reliable enough for our purposes. Instead, I have chosen to use the mirai bus socket interface. Now if you specify an integer third argument to daemons, leaving the first 2 missing, it sends a command to replace that socket at the dispatcher with a new one: daemons(,,3L). You get the new socket URL as the return value at This interface slots into the existing one well, but I can create a new function for this if you prefer! |
Amazing! This is exactly what I need, and it will simplify
Overloading the |
Great! I just wanted to check it was all good first. The overloading is indeed different... but I will open up a new interface for this to be cleaner. |
implemented by function |
I've also implemented the locked sockets for when |
That's perfect! I have been slowly working my way through the lower-level data structures of |
I have renamed the headers for the I have taken out 'busy' as that was just the difference between assigned and complete; it didn't actively monitor 'busy' status. Please feel free to update in
|
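Since 'busy' was only a derived count, it can be recomputed from the remaining columns when needed. A small sketch, assuming a status matrix with one row per daemon and cumulative counts in columns named "assigned" and "complete" (the column names and values are assumptions for illustration):

```r
# Hypothetical status matrix: one row per daemon, cumulative task counts.
status <- cbind(assigned = c(5L, 3L, 0L), complete = c(4L, 3L, 0L))

# Tasks assigned but not yet complete, per daemon (a snapshot, not live).
busy <- status[, "assigned"] - status[, "complete"]
busy  # 1 0 0
```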
I am almost ready to merge my updates and close this issue, but I am having trouble with shikokuchuo/mirai#47. That should be the final obstacle before a much simpler and more robust |
Working through the tests, I am getting a lot of crashes in RStudio on my local macbook, and I am frequently getting |
I am also getting crashes sometimes after I manually terminate the dispatcher in |
shikokuchuo/mirai#48 reproduces some of the same errors, and the reprex only uses |
Hopefully my fix in 0.8.2.9007 also eliminates the crashes you have been experiencing. Making a record of this one (the one I caught running crew tests):
*** caught segfault ***
address 0x7f673c021000, cause 'invalid permissions'
malloc(): unsorted double linked list corrupted
Aborted |
Thank you so much! Like I mentioned at shikokuchuo/mirai#48, although not entirely gone, the errors are greatly reduced on my end. I am now experiencing new ones at shikokuchuo/mirai#50. |
This branch of
library(crew)
x <- crew_controller_local(tasks_max = 1L)
x$start()
x$push(command = ps::ps_pid())
Sys.sleep(5)
x$pop()
#> # A tibble: 1 × 11
#> name command result seconds seed error trace…¹ warni…² launc…³ worker insta…⁴
#> <chr> <chr> <list> <dbl> <int> <chr> <chr> <chr> <chr> <int> <chr>
#> 1 bea03b0cc5adc7a10fe8b60a1f0e126… ps::ps… <int> 0 6.70e8 NA NA NA aed93c… 1 e86f73…
# … with abbreviated variable names ¹traceback, ²warnings, ³launcher, ⁴instance
x$push(command = ps::ps_pid())
Sys.sleep(5)
x$pop() # should not be NULL
#> NULL
x$queue[[1]]$handle[[1]]$data # should be resolved
#> 'unresolved' logi NA
x$terminate()
Before I terminated the controller, the second instance of
Error:
! in callr subprocess.
Caused by error in `socket(protocol = "rep", dial = url, autostart = asyncdial || …`:
! 15 | Address invalid
That's a clue. |
In my last test, the socket from |
Oddly enough, even after shikokuchuo/mirai#51 is fixed, I am still getting |
The way you phrased the problem makes it somewhat clearer. I suspect the answer will be to rotate just the listener rather than the entire socket. |
Yes, that explanation is consistent with what I am seeing. Anything I will need to do differently in If it helps, the code for #61 does not actually rely on having a different websocket path, only that the |
So I would be fine if |
It should be fine, as I can reset the counter in the CV. The behaviour should match what it is now, and the path will still change. This is an NNG concept: a socket can have any number of dialers and listeners attached to it, for really complex applications! |
Sounds great! I was going to clarify that I still need an indication from Applications with many listeners and many dialers sound complicated indeed! |
Take f1e961f v0.8.2.9012 for a spin. I really hope this works! |
Flawless! The transient worker throughput test now runs all tasks in a timely manner and shows the exact right number of worker launches! |
I know this was a tough one for both of us, so I want to thank you again for sticking with me and rooting out the cause. |
You're welcome! I really hate to leave things unfinished... so I'm also glad we persevered! |
Description
I was contemplating how well things scale up in general, and came to the realisation that with your bus sockets and CVs, it's actually now a duplication of what already happens at the main mirai socket. What you can do is pre-launch, check that the online status is 0, [check and kill the process if alive], and record the instance number.
Then if the instance number increases, the server has been 'discovered'.
This seems to remove an unnecessary complication.
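A minimal sketch of this discovery check, assuming a daemon slot's status is available as a named integer vector with fields "online" and "instance". That representation and the helper names (prelaunch, is_discovered) are hypothetical, not mirai's API. The bracketed step of checking for and killing a stale process is left out.

```r
# Before launching: the slot must be offline; remember its instance count.
# (Checking for and killing a leftover process is omitted from this sketch.)
prelaunch <- function(status) {
  stopifnot(status[["online"]] == 0L)
  list(instance_at_launch = status[["instance"]])
}

# After launching: the worker is 'discovered' once the instance counter
# has moved past the value recorded at launch.
is_discovered <- function(record, status) {
  status[["instance"]] > record$instance_at_launch
}

before <- c(online = 0L, instance = 2L)
record <- prelaunch(before)
is_discovered(record, before)                         # FALSE
is_discovered(record, c(online = 1L, instance = 3L))  # TRUE: worker connected
is_discovered(record, c(online = 0L, instance = 3L))  # TRUE: connected, then exited
```

Because the instance counter only increases, the third case still registers: a short-lived worker that connects and exits between two polls is detected, which a check on the online flag alone would miss.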