Release waiting processes when destroying a socket (without ThreadedFFI dependency) #16172
Looks good to me :)
There is a seemingly related test failure. Ideas?
Hmm...when I ran the test in my image, it passed, but if I run it many times in a row (by script, not clicking the button— Since writing this, another issue led me to realize that As far as making the test reliable, off the top of my head the only thing I can think of is to simply insert a wait in between the two I also created #16196—I think we should rename |
I restarted the CI since it has been a while and this can fix some failing tests |
There is still a failing socket test: osx-64 / Tests-osx-64 / testNextPutAllFlushOtherEndClosed – MacOSX64.Network.Tests.SocketStreamTest (6s) |
force-pushed from 0b737fb to e82d63d
Aha—it's the same issue as with #testFlushOtherEndClosed. I added the wait here as well and it should be good now. |
Now I see this one: unix-64 / Tests-unix-64 / testFlushLargeMessageOtherEndClosed – Unix64.Network.Tests.SocketStreamTest (10s) |
Interesting...I believe this is because of differing defaults for Footnotes
@daniels220 As GitHub does not notify a pull request’s author about this as far as I know, in case you’re not aware: GitHub now indicates ‘this branch has conflicts that must be resolved’. That seems to be due to the merge of pull request #16376. |
force-pushed from 175171e to a60667b
@Rinzwind Thanks for the heads-up—easy fix. |
Thanks! There’s one test failure in build 6 you may still need to look at though. |
Strangely, this only breaks on Windows, with a timeout. Maybe the issue is somewhere other than the code?
I restarted a build to see whether this is random or not. I don't know if we should merge this one or not because of this test. It already fixes a lot of our currently failing tests. |
The same failure happened again with the same stack |
I'm hoping to look at this tomorrow (Monday). I have a sneaking suspicion what the problem might be, but it's not something I was ever able to reproduce/verify—it would be great if this were a reproducible test case, but also, I could be completely off base. |
Okay wow, no, it's very much not what I suspected. So on Windows, when you try to set So I can work around that by only trying to set the buffer on not-Windows (or just giving up entirely and accepting that I'll need to send an 8MB message to make the test work on Linux)...at which point I find out that Windows will happily say, "yup, I sent all that" when you make a So there's an arguable VM/plugin bug in play—though I think it's more like some nasty Windows errata that the VM might want to work around. Certainly looking at the plugin code myself, I don't see anything suspicious, it must be that So. Effectively this test is expected-failure on Windows, in that I don't know how to make the behavior it's testing work on Windows. It's not a regression from earlier Pharo versions, the test itself is new (my doing). Thoughts on how to proceed? |
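The buffer-size dependence described in this comment can be sketched in C against the POSIX sockets API. This is only an illustration under assumptions: the helper names `request_send_buffer` and `message_size_exceeding` are hypothetical, the 2x margin is arbitrary, and none of this is the PR's test code or the plugin's code. The point is that the value passed to `setsockopt` is a request, so a test that needs to overflow the send buffer should read the effective size back instead of trusting the requested one.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: ask for a send buffer of `requested` bytes and
 * return the size the OS reports afterwards, or -1 on error.
 * The reported size may differ from the request (Linux, for example,
 * documents that it adjusts the value it stores). */
static int request_send_buffer(int fd, int requested)
{
    int actual = 0;
    socklen_t len = sizeof actual;

    assert(fd >= 0);
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested, sizeof requested) != 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) != 0)
        return -1;
    return actual;
}

/* Hypothetical helper: pick a message size larger than the effective
 * buffer. The factor of two is an assumption, not a guarantee: data
 * may also be absorbed by the peer's receive buffer, so a real test
 * might need a larger margin. */
static size_t message_size_exceeding(int actual_buffer)
{
    assert(actual_buffer > 0);
    return (size_t)actual_buffer * 2u + 1u;
}
```

A caller would create the socket, call `request_send_buffer(fd, 65536)`, and size its payload with `message_size_exceeding` on the returned value. On Windows the same idea applies through Winsock, whose `setsockopt` and `getsockopt` take `char *` option pointers.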
There’s a bug when setting
    (9999 to: 10000) collect: [ :value |
        (Socket new
            setOption: 'SO_SNDBUF' value: value;
            getOption: 'SO_SNDBUF') second ]
I get |
@Rinzwind Ouch, yes, I actually looked at the plugin code and was suspicious of those lines, but couldn't understand them well enough to specifically point to a bug. Looking at it now, yeah, wow, that line is completely confusing "length of a string" and "byte size of a type". Most 32-bit integers are longer than four decimal digits! (Of course Of course the right solution, as implied in the comments in the plugin code, is for the image to pass the values as their actual types, the primitive to convert them to appropriate encoding in a But then this starts sounding an awful lot like what the FFI marshaling layer is already doing, which really brings me back around to a pet peeve of mine, that being, why is so much of the sockets implementation in primitives that are just thin wrappers over syscalls? I know we need a tiny bit of abstraction to handle different platforms, but that could be handled by switching to different helper classes or something in the image, with just a very very few primitives to handle the parts that absolutely cannot be done any other way (I'm thinking some of the stuff with In any case, we still have the weird behavior in Windows that we can have |
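The "length of a string" versus "byte size of a type" mix-up described above can be illustrated with a small C sketch. It is a hypothetical reconstruction of that kind of check, not the plugin's actual code: the function names are invented, and the first function assumes a 4-byte `int`. It shows why 9999 and 10000 would behave differently, and what parsing the whole decimal string looks like instead.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buggy check: decides that a decimal option value "is an
 * int" by comparing the number of characters in the string with the
 * number of bytes in an int. With a 4-byte int, "9999" passes and
 * "10000" does not, although both fit easily in 32 bits. */
static int treated_as_int_buggy(const char *text)
{
    assert(text != NULL);
    return strlen(text) <= sizeof(int);
}

/* Alternative sketch: parse the whole string and accept it when it is a
 * well-formed decimal number within the range of int.
 * Returns 1 and stores the value on success, 0 otherwise. */
static int parse_int_option(const char *text, int *out)
{
    char *end = NULL;
    long value;

    assert(text != NULL && out != NULL);
    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return 0;                       /* empty or trailing garbage */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                       /* does not fit in an int */
    *out = (int)value;
    return 1;
}
```

Passing the option with its real type from the image, as discussed in the comment, would remove the need for string parsing altogether; the second function only shows the narrower fix.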
Another approach to solving the bug could be to change the condition on line 1539 to As for the Windows behavior, that’s not a newly introduced problem that might require reverting recent changes as far as I understand, so you may need to make |
force-pushed from a60667b to 91485b2
Yes, explicitly specifying the type is clearly the correct behavior—that's more in line with what I was thinking of in terms of refactoring so the image passes the parameters as their correct type rather than strings. The declaration on the C side would still be needed for Makes sense re: skipping the test on Windows. Just did that. |
The last build failed due to issue #16443. That seems to have been fixed so another build could be started but its Windows test run will probably fail now due to issue #16450. |
Tests passed! Except a known random failure. Should we integrate? |
and now we have a conflict :P |
force-pushed from 91485b2 to 1ff5eb1
…pharo-project#15975)" This reverts commit f5df596.
This should be sufficient, as multiple processes trying to read or write at once will result in undefined behavior anyway as their data gets all mixed up together.
…through accessors This was already sometimes the case, and makes sense--these are private implementation details that really shouldn't be exposed to the outside world at all. Also refactor to make it clearer that #waitForData is strictly a degenerate case of #waitForDataFor:...--the loop structure is identical, just without the timeout-related bits.
…therEndClosed Actually, extract a helper as the first part of each test is identical.
…OtherEndClosed Linux default is enormous (~2.5MiB), while Windows is only 65KiB. Attempt to standardize on this small buffer, but make sure we exceed the *actual* buffer size, as Linux (again) refuses to set a buffer size below 416KiB.
Fails due to Windows accepting arbitrarily large messages--much larger than the TCP send buffer--in a single send(2) call, preventing #waitForSendDone... from being called during the send process. Not sure why but this is not a regression, just strange behavior revealed by the new test.
Also, no need to actually store a `sendDone` temp, if we reach the end of the method we know it is true.
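The commit message above about Windows accepting the whole message in a single send(2) call can be made concrete with a C sketch of a send loop. The helper `send_all_counting_waits` is hypothetical and uses POSIX `send` and `poll`; it is not the image's `SocketStream` code. It counts how often the loop has to wait for the socket to become writable: if the OS takes the entire payload in one call, that count stays at zero and the waiting path is never exercised, which is the situation the skipped test runs into.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send `size` bytes, waiting for writability when
 * a non-blocking socket reports that its buffer is full.
 * `*waits` receives the number of times the loop had to wait.
 * Returns the number of bytes sent, or -1 on error or timeout.
 * Note: flags are 0, so a closed peer may raise SIGPIPE on POSIX. */
static ssize_t send_all_counting_waits(int fd, const char *data, size_t size,
                                       int timeout_ms, int *waits)
{
    size_t sent = 0;

    assert(data != NULL && waits != NULL);
    *waits = 0;
    while (sent < size) {
        ssize_t n = send(fd, data + sent, size - sent, 0);
        if (n > 0) {
            sent += (size_t)n;          /* partial or complete send */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;                   /* interrupted, just retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            struct pollfd p = { .fd = fd, .events = POLLOUT };
            ++*waits;                   /* the path the test wants to hit */
            if (poll(&p, 1, timeout_ms) <= 0)
                return -1;              /* timeout or poll failure */
            continue;
        }
        return -1;                      /* any other error */
    }
    return (ssize_t)sent;
}
```

With a small payload on a connected socket pair the call completes at once and `*waits` is 0; only a payload larger than what the OS will buffer, on a non-blocking socket, drives the loop through `poll`.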
force-pushed from 1ff5eb1 to 82d4d7b
Right, I remember seeing the issue whose fix created that conflict. As it happens I have some further refinement, but in any case, conflict resolved. |
It's green! |
There are no test results for Windows in the most recent build’s results though. The output of the ‘Allocate node’ step for the Windows test run in that build (build 14) shows ‘pharo-ci-jenkins2-win10’ was used (which doesn’t actually run the tests, see issue #15105):
Plus a couple related refactoring opportunities I spotted at the last moment.