-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/go: TestScript/mod_concurrent is flaky on Windows #32188
Comments
Do we have a reason to suspect that this is a regression? Given that it manifests as a failure (not data corruption) when running |
@alexbrainman, do you have any tips on figuring out the actual error code from Windows error messages?
|
Frequent tests flakes are always a release blocker. We can disable the test if it's not fixed soon. |
Agreed, but you didn't mention how frequent it is. 🙂 I see ~5 flakes on the first page of the build dashboard, so I agree that this is reasonable as a release-blocker. |
@bcmills, trybots too. I haven't gone over those logs, but I've seen enough myself now to be annoyed & filed this bug. |
It is ERROR_SHARING_VIOLATION (search https://docs.microsoft.com/en-us/windows/desktop/debug/system-error-codes--0-499- for
Same. But I went to the page where all windows errors are, and found your error text there. If I google for https://docs.microsoft.com/en-us/windows/desktop/debug/system-error-codes--0-499- https://docs.microsoft.com/en-us/windows/desktop/debug/system-error-codes I did not look at the code, but the fact that we get ERROR_SHARING_VIOLATION while opening go.mod, suggests that something unusual happens. Normally I would expect this error when trying to open/delete/move file for executable that is running. The fact it happens with go.mod file - maybe some locking code of yours is not working properly. I am, probably, wrong. Alex |
Change https://golang.org/cl/179859 mentions this issue: |
That was my first guess too, but the |
My stress-test successfully reproduced the failure mode, along with a few other related failure modes, on the first attempt. The https://storage.googleapis.com/go-build-log/51ca1bc6/windows-amd64-2016_af1db195.log
|
Per this article, it appears that we could use It is not obvious to me whether |
What do you mean atomic ? If you refer to
in your log, then the Also
could be because ioutil.ReadFile does not work properly while your renameio.WriteFile is in progress. I am not familiar enough with your problem, but do you really need file locks? Do you know that you can use memory sync objects, like https://docs.microsoft.com/en-us/windows/desktop/api/synchapi/nf-synchapi-createmutexexw or https://docs.microsoft.com/en-us/windows/desktop/api/synchapi/nf-synchapi-createmutexexw and many others? These will provide you with whatever sync functionality you require across single computer. I also suspect they will be faster than working with files, because they are pure sync objects. Alex |
An atomic file-rename has the property that any read of the file sees either its contents before the rename or after the rename, never some third state (such as partially-zeroed, truncated, or inaccessible).
Yes, that's exactly what I mean about it not being atomic: if It isn't obvious to me whether |
Yes, that particular error is probably another manifestation of #30789. The |
Change https://golang.org/cl/180218 mentions this issue: |
Change https://golang.org/cl/180219 mentions this issue: |
Change https://golang.org/cl/180317 mentions this issue: |
Then I don't see how do you know that Windows MoveFileEx system call with the MOVEFILE_REPLACE_EXISTING flag is not atomic.
What
I did not look at your Still you need to synchronize your file reads and writes, how do you propose to do that? You don't want your read happens before write. Also, even with file locks or without FILE_SHARE_READ | FILE_SHARE_WRITE, whoever comes second will get "Access denied" error:
Alex |
Change https://golang.org/cl/180517 mentions this issue: |
Empirical testing. 🙂 But empirical testing shows that
Note that there is a third bit, Unfortunately, even setting that bit does not seem to eliminate all of the
I don't want exclusive access to the file — I want semantics consistent with POSIX, in which existing handles to the file see its existing contents from before the rename, handles opened after the rename see the new contents, and there is no fuzzy inaccessible state in the middle in which the file cannot be read at all.
In some cases, yes. Most of these operations follow the sequence read-prepare-lock-read-write-unlock, which works on all of the other supported platforms and minimizes the duration under which the file is locked. The problem with this test on Windows is that the first read operation fails. |
ReadFile is a drop-in replacement for ioutil.ReadFile that works around Windows filesystem flakiness under load. A followup CL will replace uses of ioutil.ReadFile in cmd/go with this function. Updates #32188 Change-Id: I232ba893b132bdc84cd7b0edde436165a69e1aa8 Reviewed-on: https://go-review.googlesource.com/c/go/+/180219 Run-TryBot: Bryan C. Mills <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Jay Conrod <[email protected]>
Sorry, but I still do not understand what you are talking about. You are saying Thank you. Alex |
That example is exactly the new test I added for the For now, the workaround of retrying both the read and rename operations on error seems to produce an acceptably low flake rate (for both |
Thank you for the reference. As far as understand your test just verifies that complete write of a file is read as a whole. Other than that there is no synchronization between readers or writers, or between writers, or between readers. You could have achieved that by using CreateFile without either FILE_SHARE_READ or FILE_SHARE_WRITE. Then all file readers and writers would get exclusive access to the file, while others will get 'Access denied' (or similar) error. If you get 'Access denied' error, you would have to try again later. You could also have used CreateFile with FILE_SHARE_READ. Similar to above, but all your readers can read file simultaneously. Other than similar to the above. You could also use system sync objects (for example CreateMutexEx or similar) to control access to the file. These would allow you to wait for sync object to be released instead of trying in a loop.
Your approach uses os.Rename as atomic write operation. I do not know, if os.Rename behaves this way on Windows. Alex |
Correct.
My goal is for readers and writers not to need to retry at all. Concurrent writes are guarded by locking an external file, and concurrent reads should simply wait for the file to become available, then read it.
We already use a (more portable) file lock to guard write access to files for which writes are important and not idempotent. In theory we could do the same for read access and for files for which dropped writes are OK (such as the cache-trimming timestamp) or for which writes are idempotent, but that would add needless locking overhead in by far the common case (concurrent readers with no writers). We could add the locking only for Windows, but — especially given that our builders have much better coverage for Unix-like platforms — that seems like it would substantially increase the likelihood of introducing a subtle, Windows-only deadlock. |
Change https://golang.org/cl/181541 mentions this issue: |
Factor the try-on-failure variants are now in the package cmd/go/internal/robustio. Add to them a RemoveAll variant using the same retry loop, and use it to attempt to address the observed flakes in TestLinkXImportPathEscape. Fixes #19491 Updates #25965 Updates #28387 Updates #32188 Change-Id: I9db1a0c7537b8aaadccab1b9eca734595668ba29 Reviewed-on: https://go-review.googlesource.com/c/go/+/181541 Run-TryBot: Bryan C. Mills <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Alex Brainman <[email protected]> Reviewed-by: Jay Conrod <[email protected]> Reviewed-by: Russ Cox <[email protected]>
This test is flaky on Windows:
https://storage.googleapis.com/go-build-log/bf2990d2/windows-amd64-2016_b2c5a5fb.log
/cc @jayconrod @bcmills
The text was updated successfully, but these errors were encountered: