GitHub test runner doesn't like the new NetAsyncDownloader test #3906

HebaruSan · 2023-09-15T17:18:49Z

Problem

#3904 made the NetAsyncDownloader set the target.size for the files it downloads as a conveniece for calling code (in this case, so the RepositoryDataManger could report updates for progress bars while loading) and added tests for it (with file:// URLs so we could always run them despite a flaky network).

Result on Windows: All tests pass, no errors or warnings.
Result on Ubuntu VM: All tests pass with warning from the test runner:
Result on GitHub: All tests pass most of the time (7/10 in the merge commit), but once in a while it reports that some of the sizes aren't set in the multi-download test:

I investigated this extensively yesterday and thought that we resolved it by upgrading NUnit-ConsoleRunner from 3.11.1 to 3.12.0, but apparently it still happens.

I am worried that this is somehow finding a real problem that could blow up once the release ships.

The text was updated successfully, but these errors were encountered:

HebaruSan · 2023-09-15T17:27:12Z

Going by the basic code flow, the outcome on GitHub should be impossible. The main thread waits for notification of completion of all downloads, before the test assertions are run:

CKAN/Core/Net/NetAsyncDownloader.cs

Lines 174 to 178 in 5588442

    
           // Start the download! 
        
           Download(targets); 
        
           log.Debug("Waiting for downloads to finish..."); 
        
           complete_or_canceled.WaitOne();

... which is sent by the final download, when the completed download count reaches the total number of downloads:

CKAN/Core/Net/NetAsyncDownloader.cs

Lines 349 to 353 in 5588442

    
           private void triggerCompleted() 
        
           { 
        
               // Signal that we're done. 
        
               complete_or_canceled.Set(); 
        
           }

CKAN/Core/Net/NetAsyncDownloader.cs

Lines 482 to 490 in 5588442

    
           // Make sure the threads don't trip on one another 
        
           lock (dlMutex) 
        
           { 
        
               if (++completed_downloads >= downloads.Count + queuedDownloads.Count) 
        
               { 
        
                   log.DebugFormat("Triggering completion at {0} completed, {1} started, {2} queued", completed_downloads, downloads.Count, queuedDownloads.Count); 
        
                   triggerCompleted(); 
        
               } 
        
           }

... which comes sequentially after all of the sizes have been set when their respective threads finish:

CKAN/Core/Net/NetAsyncDownloader.cs

Lines 458 to 464 in 5588442

    
           else 
        
           { 
        
               log.InfoFormat("Finished downloading {0}", string.Join(", ", dl.target.urls)); 
        
               dl.bytesLeft = 0; 
        
               // Let calling code find out how big this file is 
        
               dl.target.size = new FileInfo(dl.target.filename).Length; 
        
           }

Maybe there's some sort of inter-thread marshalling delay in target objects being updated? If that's the case, we might have to send them through another synchronization primitive to ensure they arrive safely and then have the main thread set them itself...

HebaruSan · 2023-09-15T17:39:39Z

Also of note, this exact code is working perfectly fine in dozens of other tests, because it underlies the TemporaryRepositoryData test harness class that was needed to replace all usages of Registry.AddAvailable, Registry.RemoveAvailable, etc. However, I believe those are all single-download cases, for which we have a different test that is passing 100% of the time. So this could conceivably still be catching a real problem in the multi-download case.

That 30% failure rate smells like a race condition. But we are already handling synchronization.

HebaruSan · 2023-09-15T18:07:02Z

Another possibility is a delay between when Stream.Write finishes writing to the file and when FileInfo.Length reports the correct size. Ideally the operating system would ensure this could never happen, but maybe there's some caching somewhere inside Mono...?

HebaruSan · 2023-09-15T18:29:11Z

Ooh, if I add 9 more files to the test, my Ubuntu VM is able to reproduce the problem!

Also added more asserts, and "Sizes on disk should match originals" means that the test itself is seeing the wrong sizes on disk, so it's not a question of NetAsyncDownloader's updates getting lost in the shuffle; it's setting the sizes to 0, because that's what FileInfo.Length is telling it.

HebaruSan · 2023-09-15T18:38:21Z

New guess: Since the file:// downloads are right there on disk (and probably cached in memory by the OS), they can complete so fast that the completion signal is triggered while the main thread is still in the process of starting them!

... I think that's it!! 🕺

HebaruSan added Bug Something is not working as intended Core (ckan.dll) Issues affecting the core part of CKAN Linux Issues specific for Linux Mono Issues specific for Mono Tests Issues affecting the internal tests Network Issues affecting internet connections of CKAN labels Sep 15, 2023

HebaruSan self-assigned this Sep 15, 2023

HebaruSan mentioned this issue Sep 15, 2023

Fix race condition for multiple very fast downloads #3907

Merged

HebaruSan closed this as completed in #3907 Sep 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GitHub test runner doesn't like the new NetAsyncDownloader test #3906

GitHub test runner doesn't like the new NetAsyncDownloader test #3906

HebaruSan commented Sep 15, 2023 •

edited

Loading

HebaruSan commented Sep 15, 2023 •

edited

Loading

HebaruSan commented Sep 15, 2023 •

edited

Loading

HebaruSan commented Sep 15, 2023

HebaruSan commented Sep 15, 2023 •

edited

Loading

HebaruSan commented Sep 15, 2023 •

edited

Loading

GitHub test runner doesn't like the new NetAsyncDownloader test #3906

GitHub test runner doesn't like the new NetAsyncDownloader test #3906

Comments

HebaruSan commented Sep 15, 2023 • edited Loading

Problem

HebaruSan commented Sep 15, 2023 • edited Loading

HebaruSan commented Sep 15, 2023 • edited Loading

HebaruSan commented Sep 15, 2023

HebaruSan commented Sep 15, 2023 • edited Loading

HebaruSan commented Sep 15, 2023 • edited Loading

HebaruSan commented Sep 15, 2023 •

edited

Loading

HebaruSan commented Sep 15, 2023 •

edited

Loading

HebaruSan commented Sep 15, 2023 •

edited

Loading

HebaruSan commented Sep 15, 2023 •

edited

Loading

HebaruSan commented Sep 15, 2023 •

edited

Loading