Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitHub test runner doesn't like the new NetAsyncDownloader test #3906

Closed
HebaruSan opened this issue Sep 15, 2023 · 5 comments · Fixed by #3907
Closed

GitHub test runner doesn't like the new NetAsyncDownloader test #3906

HebaruSan opened this issue Sep 15, 2023 · 5 comments · Fixed by #3907
Assignees
Labels
Bug Something is not working as intended Core (ckan.dll) Issues affecting the core part of CKAN Linux Issues specific for Linux Mono Issues specific for Mono Network Issues affecting internet connections of CKAN Tests Issues affecting the internal tests

Comments

@HebaruSan
Copy link
Member

HebaruSan commented Sep 15, 2023

Problem

#3904 made the NetAsyncDownloader set the target.size for the files it downloads as a conveniece for calling code (in this case, so the RepositoryDataManger could report updates for progress bars while loading) and added tests for it (with file:// URLs so we could always run them despite a flaky network).

  • Result on Windows: All tests pass, no errors or warnings.
  • Result on Ubuntu VM: All tests pass with warning from the test runner:
    image
  • Result on GitHub: All tests pass most of the time (7/10 in the merge commit), but once in a while it reports that some of the sizes aren't set in the multi-download test:
    image

I investigated this extensively yesterday and thought that we resolved it by upgrading NUnit-ConsoleRunner from 3.11.1 to 3.12.0, but apparently it still happens.

I am worried that this is somehow finding a real problem that could blow up once the release ships.

@HebaruSan HebaruSan added Bug Something is not working as intended Core (ckan.dll) Issues affecting the core part of CKAN Linux Issues specific for Linux Mono Issues specific for Mono Tests Issues affecting the internal tests Network Issues affecting internet connections of CKAN labels Sep 15, 2023
@HebaruSan HebaruSan self-assigned this Sep 15, 2023
@HebaruSan
Copy link
Member Author

HebaruSan commented Sep 15, 2023

Going by the basic code flow, the outcome on GitHub should be impossible. The main thread waits for notification of completion of all downloads, before the test assertions are run:

// Start the download!
Download(targets);
log.Debug("Waiting for downloads to finish...");
complete_or_canceled.WaitOne();

... which is sent by the final download, when the completed download count reaches the total number of downloads:

private void triggerCompleted()
{
// Signal that we're done.
complete_or_canceled.Set();
}

// Make sure the threads don't trip on one another
lock (dlMutex)
{
if (++completed_downloads >= downloads.Count + queuedDownloads.Count)
{
log.DebugFormat("Triggering completion at {0} completed, {1} started, {2} queued", completed_downloads, downloads.Count, queuedDownloads.Count);
triggerCompleted();
}
}

... which comes sequentially after all of the sizes have been set when their respective threads finish:

else
{
log.InfoFormat("Finished downloading {0}", string.Join(", ", dl.target.urls));
dl.bytesLeft = 0;
// Let calling code find out how big this file is
dl.target.size = new FileInfo(dl.target.filename).Length;
}

Maybe there's some sort of inter-thread marshalling delay in target objects being updated? If that's the case, we might have to send them through another synchronization primitive to ensure they arrive safely and then have the main thread set them itself...

@HebaruSan
Copy link
Member Author

HebaruSan commented Sep 15, 2023

Also of note, this exact code is working perfectly fine in dozens of other tests, because it underlies the TemporaryRepositoryData test harness class that was needed to replace all usages of Registry.AddAvailable, Registry.RemoveAvailable, etc. However, I believe those are all single-download cases, for which we have a different test that is passing 100% of the time. So this could conceivably still be catching a real problem in the multi-download case.

That 30% failure rate smells like a race condition. But we are already handling synchronization.

@HebaruSan
Copy link
Member Author

Another possibility is a delay between when Stream.Write finishes writing to the file and when FileInfo.Length reports the correct size. Ideally the operating system would ensure this could never happen, but maybe there's some caching somewhere inside Mono...?

@HebaruSan
Copy link
Member Author

HebaruSan commented Sep 15, 2023

Ooh, if I add 9 more files to the test, my Ubuntu VM is able to reproduce the problem!

image

Also added more asserts, and "Sizes on disk should match originals" means that the test itself is seeing the wrong sizes on disk, so it's not a question of NetAsyncDownloader's updates getting lost in the shuffle; it's setting the sizes to 0, because that's what FileInfo.Length is telling it.

@HebaruSan
Copy link
Member Author

HebaruSan commented Sep 15, 2023

New guess: Since the file:// downloads are right there on disk (and probably cached in memory by the OS), they can complete so fast that the completion signal is triggered while the main thread is still in the process of starting them!

... I think that's it!! 🕺

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something is not working as intended Core (ckan.dll) Issues affecting the core part of CKAN Linux Issues specific for Linux Mono Issues specific for Mono Network Issues affecting internet connections of CKAN Tests Issues affecting the internal tests
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant