fix: Don't make network request while holding config lock #161

bryanoltman · 2024-05-13T19:58:13Z

Update check_for_update_internal to no longer hold the config lock while making a patch check request.

Fixes shorebirdtech/shorebird#1981
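The idea behind the fix can be sketched roughly as follows. This is a minimal sketch, not the updater's code. It assumes a global `Mutex<Option<Config>>` acting as the "config lock" and a caller-supplied request function. `Config`, `CONFIG`, `set_config`, and `check_for_update` here are hypothetical, simplified stand-ins. The lock is held only long enough to copy what the request needs, and it is released before the slow network call starts.

```rust
use std::sync::Mutex;

/// Hypothetical, simplified stand-in for the updater's configuration.
#[derive(Clone, Debug)]
struct Config {
    base_url: String,
    release_version: String,
}

/// Hypothetical global config, guarded by the "config lock".
static CONFIG: Mutex<Option<Config>> = Mutex::new(None);

fn set_config(config: Config) {
    // Holds the lock only long enough to store the new value.
    *CONFIG.lock().unwrap() = Some(config);
}

/// Runs a patch check without holding the config lock during the request.
fn check_for_update<F>(request_fn: F) -> Result<String, String>
where
    F: Fn(&str, &str) -> Result<String, String>,
{
    // Copy what the request needs while the lock is held...
    let guard = CONFIG.lock().map_err(|e| e.to_string())?;
    let config = guard
        .as_ref()
        .cloned()
        .ok_or_else(|| "config not initialized".to_string())?;
    // ...then release the lock explicitly before doing any slow work.
    drop(guard);

    // The (possibly slow) network call runs with the lock released, so
    // set_config, e.g. during app launch, never waits on it.
    request_fn(&config.base_url, &config.release_version)
}
```

The trade-off is that the request works on a snapshot of the config. A concurrent `set_config` is not visible to a request that is already in flight, which is usually what you want for a one-shot patch check.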


eseidel · 2024-05-13T20:00:38Z

library/src/updater.rs

+    let response = request_fn(&url, request)?;
+    debug!("Patch check response: {:?}", response);
+    Ok(response)


How does the response get recorded? I guess the caller does that?

I guess the response is never recorded? TIL.

Nope, we don't log the response. What's interesting is that this function is not called by the engine; it seemingly is only ever called by the shorebird_code_push package.

I think that's why check_for_update_internal exists: to share code between this (new) function, which was exposed for package:shorebird_code_push, and the normal update mechanism.

As I look back through this code I am reminded of how simple/primitive it is. We just haven't touched it much in a year.

That might also mean that package:shorebird_code_push is already running in a background thread while the user is launching the app (shorebird_init?).

I suspect that we'll also (possibly separately) want to do the fix where we make set_config be able to return quickly without even trying to grab the config lock to help shorebird_init always be fast? Or maybe we just need to be extra careful to make sure we're never holding the config lock for any length of time (e.g. across a network request).

> We just haven't touched it much in a year.

Yep, it's been relatively stable + the consequences of getting it wrong are quite high.

> I suspect that we'll also (possibly separately) want to do the fix where we make set_config be able to return quickly without even trying to grab the config lock to help shorebird_init always be fast

I think we could spawn a new thread when calling set_config from init to make it look something like:

    pub fn init(
        app_config: AppConfig,
        file_provider: Box<dyn ExternalFileProvider>,
        yaml: &str,
    ) -> Result<(), UpdateError> {
        #[cfg(any(target_os = "android", test))]
        use crate::android::libapp_path_from_settings;
        init_logging();
        let config = YamlConfig::from_yaml(yaml)
            .map_err(|err| UpdateError::InvalidArgument("yaml".to_string(), err.to_string()))?;
        let libapp_path = libapp_path_from_settings(&app_config.original_libapp_paths)?;
        debug!("libapp_path: {:?}", libapp_path);
        let _ = std::thread::spawn(move || {
            set_config(
                app_config,
                file_provider,
                libapp_path,
                &config,
                NetworkHooks::default(),
            );
        });
        Ok(())
    }

I don't think you want that. Any time the lock is already taken, there is nothing for us to do.

What about using https://doc.rust-lang.org/std/sync/struct.Mutex.html#method.try_lock in the set_config case? If the lock is already taken, we know that config is initialized and thus nothing to do (since any other taking of the lock when we're not initialized errors I believe).

That would work. We can discuss in the future PR :)
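The `try_lock` idea could look roughly like the following minimal sketch. It assumes, as suggested in the comment above, that finding the lock already held means the config has been initialized. `Config`, `CONFIG`, and the `bool` return value are hypothetical simplifications, not the updater's real signature.

```rust
use std::sync::{Mutex, TryLockError};

/// Hypothetical, simplified stand-in for the updater's configuration.
#[derive(Debug)]
struct Config {
    app_id: String,
}

/// Hypothetical global config, guarded by the config lock.
static CONFIG: Mutex<Option<Config>> = Mutex::new(None);

/// Stores `config` if the lock is free. Returns true if it was stored and
/// false if the call returned early without waiting.
fn set_config(config: Config) -> bool {
    match CONFIG.try_lock() {
        Ok(mut guard) => {
            *guard = Some(config);
            true
        }
        // Someone else holds the lock. Under the assumption above, that means
        // the config is already initialized, so return instead of blocking.
        Err(TryLockError::WouldBlock) => false,
        // A previous holder panicked; this sketch simply leaves the config alone.
        Err(TryLockError::Poisoned(_)) => false,
    }
}
```

This keeps `set_config` non-blocking. However, it silently drops the new config if the lock is ever held for some other reason before initialization finishes, so it is only safe if that invariant really holds.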


codecov · 2024-05-13T20:02:53Z

Codecov Report

Attention: Patch coverage is 92.75362% with 5 lines in your changes missing coverage. Please review.

Project coverage is 95.98%. Comparing base (caaa14c) to head (78e21ee).

Files	Patch %	Lines
library/src/updater.rs	94.02%	4 Missing ⚠️
patch/src/main.rs	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #161      +/-   ##
==========================================
- Coverage   96.03%   95.98%   -0.05%     
==========================================
  Files          20       20              
  Lines        2898     2962      +64     
==========================================
+ Hits         2783     2843      +60     
- Misses        115      119       +4

Flag	Coverage Δ
library	`97.83% <94.11%> (-0.10%)`	⬇️
patch	`36.26% <0.00%> (ø)`

Flags with carried forward coverage won't be shown.

eseidel · 2024-05-13T20:57:00Z

The bug you're fixing is basically making it so that in some circumstances it's possible to cause application launch to hang waiting on a network request (which is a terrible thing for any app to do). Obviously unintentional on our part.

bryanoltman · 2024-05-13T21:19:22Z

patch/src/main.rs

@@ -23,7 +23,7 @@ fn main() {
        eprintln!("  base:   Path to the base file");
        eprintln!("  new:    Path to the new file");
        eprintln!("  output: Path to the output patch file");
-        eprintln!("");


This is just to get the linter to stop yelling

eseidel · 2024-05-13T21:28:25Z

library/src/updater.rs

+                let patch_check_delay = std::time::Duration::from_secs(2);
+                std::thread::sleep(patch_check_delay);
+                // If we get here, the test has failed.
+                unreachable!("If the test has not terminated before this, set_config is likely being blocked by a patch check request, which should not happen");


Does the test terminating cause the check_for_update thread to get killed? I'm not sure what code here tears down the thread before this line is reached. With a naive implementation, I would expect this to be hit after the test finishes (while some other test is running).

> Does the test terminating cause the check_for_update thread to get killed?

No, although I'm not sure what happens if the tests take longer than 2s to run. Let me dig into that a bit more.

Added a bool to ensure that we don't hit the unreachable! if we've managed to obtain and release the lock.
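The guard described above can be sketched as follows. This is a minimal sketch, not the actual test. It assumes the flag is a static `AtomicBool`. The name `HAS_FINISHED_CONFIG` comes from later in this thread, while `CONFIG_LOCK`, `slow_patch_check`, `run_scenario`, and the 200 ms delay are hypothetical. The fake patch check only treats waking up as a failure if `set_config` has not finished by then.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::thread;
use std::time::Duration;

static CONFIG_LOCK: Mutex<()> = Mutex::new(());
static HAS_FINISHED_CONFIG: AtomicBool = AtomicBool::new(false);

/// Hypothetical stand-in for the fake network request used by the test.
fn slow_patch_check() {
    thread::sleep(Duration::from_millis(200));
    // Only a failure if set_config still hasn't finished after the delay.
    if !HAS_FINISHED_CONFIG.load(Ordering::SeqCst) {
        unreachable!("set_config is likely being blocked by a patch check request");
    }
}

/// Fixed behavior: the config lock is released before the request starts.
fn check_for_update() {
    {
        let _guard = CONFIG_LOCK.lock().unwrap();
        // ...read whatever the request needs...
    }
    slow_patch_check();
}

fn set_config() {
    let _guard = CONFIG_LOCK.lock().unwrap();
    // ...store config...
}

/// Runs the scenario and returns whether the patch-check thread panicked.
fn run_scenario() -> bool {
    let checker = thread::spawn(check_for_update);
    // Give the checker thread a head start (a timing assumption, not a guarantee).
    thread::sleep(Duration::from_millis(20));
    set_config();
    HAS_FINISHED_CONFIG.store(true, Ordering::SeqCst);
    // Joining makes a panic in the checker thread visible to the caller.
    checker.join().is_err()
}
```

With the fixed `check_for_update`, `set_config` never waits on the request, so the flag is set well before the fake request wakes up.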

library/src/updater.rs

eseidel

I think this needs one more round (to explain why the unreachable is never hit, confirm that it's actually doing something) and confirm .unwrap() isn't just silently hiding failures.

bryanoltman · 2024-05-13T21:43:01Z

I think I've addressed all feedback, although maybe not

> confirm that it's actually doing something

What do you mean by that?

eseidel · 2024-05-13T21:48:21Z

> I think I've addressed all feedback, although maybe not
>
> > confirm that it's actually doing something
>
> What do you mean by that?

Because I was unclear on how the thread actually gets canceled (presumably Rust has some fancy built-in way?), it wasn't clear to me that the network callback was ever called. If I were writing this I might just remove the 2s timeout and see if the test then correctly crashes at your unreachable. 🤷 If you tell me it's working, I believe you.
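For context on the teardown question (this is general Rust behavior, not something specific to this test): the standard library has no way to cancel a running thread. A spawned thread runs until its closure returns, and any threads still running when the process exits are simply torn down with it. Under `cargo test`, a thread spawned inside a test keeps running after that test returns. If it panics without ever being joined, no test fails because of it. One way to make such a failure count is to join the thread, as in this minimal sketch. The helper name `panicked_on_background_thread` is hypothetical.

```rust
use std::thread;

/// Runs `work` on a new thread and joins it, returning true if that thread
/// panicked (e.g. via `unreachable!` or a failed `.unwrap()`).
///
/// Joining is what makes the failure visible to the caller. If the
/// `JoinHandle` were dropped instead, the thread would be detached, keep
/// running after the test returns, and its panic would not fail the test.
fn panicked_on_background_thread<F>(work: F) -> bool
where
    F: FnOnce() + Send + 'static,
{
    thread::spawn(work).join().is_err()
}
```

In a test, the equivalent is to keep the `JoinHandle` and `assert!(handle.join().is_ok())` before returning, which also covers the concern about `.unwrap()` silently hiding failures on a background thread.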

eseidel

lgtm.

Please confirm: the test failed before your change, correct?

bryanoltman · 2024-05-13T21:53:32Z

> If I were writing this I might just remove the 2s timeout and see if the test then correctly crashes at your unreachable.

I could probably shorten the delay, but removing it makes the test flaky: the config thread and the patch check thread are spawned close enough together that it seems to be a coin flip whether the patch_check_request_fn body executes before HAS_FINISHED_CONFIG = true is set.

bryanoltman · 2024-05-13T21:54:10Z

> Please confirm: the test failed before your change, correct?

Confirmed. If I swap out the body of check_for_update_internal with the following, this test fails:

    with_config(|config| {
        // Load UpdaterState from disk
        // If there is no state, make an empty state.
        let state =
            UpdaterState::load_or_new_on_error(&config.storage_dir, &config.release_version);
        send_patch_check_request(config, &state)
    })

eseidel · 2024-05-13T21:55:01Z

It's always possible to de-flake threads with mutexes, of course, if you want to go that route.

Most important is that the test fails (even flakily) before your change and passes consistently after.
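For that route, here is a minimal sketch that replaces the sleep with explicit handshakes. It uses `std::sync::mpsc` channels as the synchronization primitive, rather than the mutexes mentioned above. It also assumes a simplified model of the updater: `CONFIG_LOCK`, `check_for_update`, and `set_config_during_patch_check` are hypothetical. The fake request signals once it has started and then blocks until the test releases it, so the lock check is guaranteed to happen while the request is in flight. `try_lock` stands in for `set_config`, so a regression fails fast instead of hanging.

```rust
use std::sync::mpsc;
use std::sync::Mutex;
use std::thread;

static CONFIG_LOCK: Mutex<()> = Mutex::new(());

/// Fixed behavior: the lock is released before `request` runs.
fn check_for_update(request: impl FnOnce()) {
    {
        let _guard = CONFIG_LOCK.lock().unwrap();
        // ...copy whatever the request needs...
    }
    request();
}

/// Returns true if the config lock was free while the fake request was in flight.
fn set_config_during_patch_check() -> bool {
    let (started_tx, started_rx) = mpsc::channel::<()>();
    let (release_tx, release_rx) = mpsc::channel::<()>();

    let checker = thread::spawn(move || {
        check_for_update(|| {
            // Tell the test that the request is in flight...
            started_tx.send(()).unwrap();
            // ...then "wait on the network" until the test releases us.
            release_rx.recv().unwrap();
        });
    });

    // Deterministic: once this returns, the request has definitely started.
    started_rx.recv().unwrap();
    // If check_for_update still held the lock here, try_lock would fail
    // immediately instead of hanging the way lock() would.
    let acquired = CONFIG_LOCK.try_lock().is_ok();

    release_tx.send(()).unwrap();
    checker.join().unwrap();
    acquired
}
```

With the old behavior, where the lock is held across the request, this returns false on every run rather than failing only some of the time.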

bryanoltman added 2 commits May 13, 2024 15:31

Add test to demonstrate hang when attempting to init while waiting for a patch check request

11e9a6b

Fix thread contention when initing during patch check

1d9b717

eseidel reviewed May 13, 2024


library/src/updater.rs (outdated, resolved)

Clean up test

6c7f331

bryanoltman changed the title from "Don't make network request while holding config lock" to "fix: Don't make network request while holding config lock" May 13, 2024

bryanoltman closed this May 13, 2024

bryanoltman reopened this May 13, 2024

revert change to fake_yaml

5023253

bryanoltman marked this pull request as ready for review May 13, 2024 21:18

bryanoltman commented May 13, 2024


bryanoltman requested review from eseidel, erickzanardo and felangel May 13, 2024 21:19

eseidel reviewed May 13, 2024


library/src/updater.rs (outdated, resolved)

eseidel requested changes May 13, 2024


reorg

ce302a5

bryanoltman requested a review from eseidel May 13, 2024 21:43

eseidel approved these changes May 13, 2024


lower timeout in test

78e21ee

bryanoltman merged commit d309317 into main May 14, 2024
6 of 8 checks passed

bryanoltman deleted the bo/config-lock branch May 14, 2024 14:05
