Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Don't make network request while holding config lock #161

Merged
merged 6 commits into from
May 14, 2024

Conversation

bryanoltman
Copy link
Contributor

@bryanoltman bryanoltman commented May 13, 2024

Update check_for_update_internal to no longer hold the config lock while making a patch check request.

Fixes shorebirdtech/shorebird#1981

Comment on lines +177 to +179
let response = request_fn(&url, request)?;
debug!("Patch check response: {:?}", response);
Ok(response)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the response get recorded? I guess the caller does that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the response never is recorded? TIL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, we don't log the response. What's interesting is that this function is not called by the engine – seemingly only ever by the shorebird_code_push package.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's why check_for_update_internal exists, was to share code between this (new) function which was exposed for package:shorebird_code_push, and the normal update mechanism.

As I look back through this code I am reminded of how simple/primitive it is. We just haven't touched it much in a year.

That also might mean that what's happening is that package:shorebird_code_push is running already in a background thread while the user is launching the app (shorebird_init?)

I suspect that we'll also (possibly separately) want to do the fix where we make set_config be able to return quickly without even trying to grab the config lock to help shorebird_init always be fast? Or maybe we just need to be extra careful to make sure we're never holding the config lock for any length of time (e.g. across a network request).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just haven't touched it much in a year.

Yep, it's been relatively stable + the consequences of getting it wrong are quite high.

I suspect that we'll also (possibly separately) want to do the fix where we make set_config be able to return quickly without even trying to grab the config lock to help shorebird_init always be fast

I think we could spawn a new thread when calling set_config from init to make it look something like:

pub fn init(
    app_config: AppConfig,
    file_provider: Box<dyn ExternalFileProvider>,
    yaml: &str,
) -> Result<(), UpdateError> {
    #[cfg(any(target_os = "android", test))]
    use crate::android::libapp_path_from_settings;

    init_logging();
    let config = YamlConfig::from_yaml(yaml)
        .map_err(|err| UpdateError::InvalidArgument("yaml".to_string(), err.to_string()))?;

    let libapp_path = libapp_path_from_settings(&app_config.original_libapp_paths)?;
    debug!("libapp_path: {:?}", libapp_path);

    let _ = std::thread::spawn(move || {
        set_config(
            app_config,
            file_provider,
            libapp_path,
            &config,
            NetworkHooks::default(),
        );
    });

    Ok(())
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you want that. Any time the lock is already taken, there is nothing for us to do.

What about using https://doc.rust-lang.org/std/sync/struct.Mutex.html#method.try_lock in the set_config case? If the lock is already taken, we know that config is initialized and thus nothing to do (since any other taking of the lock when we're not initialized errors I believe).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would work. We can discuss in the future PR :)

library/src/updater.rs Outdated Show resolved Hide resolved
Copy link

codecov bot commented May 13, 2024

Codecov Report

Attention: Patch coverage is 92.75362% with 5 lines in your changes are missing coverage. Please review.

Project coverage is 95.98%. Comparing base (caaa14c) to head (78e21ee).

Files Patch % Lines
library/src/updater.rs 94.02% 4 Missing ⚠️
patch/src/main.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #161      +/-   ##
==========================================
- Coverage   96.03%   95.98%   -0.05%     
==========================================
  Files          20       20              
  Lines        2898     2962      +64     
==========================================
+ Hits         2783     2843      +60     
- Misses        115      119       +4     
Flag Coverage Δ
library 97.83% <94.11%> (-0.10%) ⬇️
patch 36.26% <0.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@eseidel
Copy link
Contributor

eseidel commented May 13, 2024

The bug you're fixing is basically making it so that in some circumstances it's possible to cause application launch to hang waiting on a network request (which is a terrible thing for any app to do). Obviously unintentional on our part.

@bryanoltman bryanoltman changed the title Don't make network request while holding config lock fix: Don't make network request while holding config lock May 13, 2024
@bryanoltman bryanoltman reopened this May 13, 2024
@bryanoltman bryanoltman marked this pull request as ready for review May 13, 2024 21:18
@@ -23,7 +23,7 @@ fn main() {
eprintln!(" base: Path to the base file");
eprintln!(" new: Path to the new file");
eprintln!(" output: Path to the output patch file");
eprintln!("");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just to get the linter to stop yelling

let patch_check_delay = std::time::Duration::from_secs(2);
std::thread::sleep(patch_check_delay);
// If we get here, the test has failed.
unreachable!("If the test has not terminated before this, set_config is likely being blocked by a patch check request, which should not happen");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the test terminating cause the check_for_update thread to get killed? I'm not sure what code here is tearing down the thread before this gets hit? A naive implementation I would assume this to just get hit after the test runs (while some other test is running).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the test terminating cause the check_for_update thread to get killed?

No, although I'm not sure what happens if the tests take longer than 2s to run. Let me dig into that a bit more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a bool to ensure that we don't hit unreachable if we're managed to obtain and release the lock

library/src/updater.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@eseidel eseidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs one more round (to explain why the unreachable is never hit, confirm that it's actually doing something) and confirm .unwrap() isn't just silently hiding failures.

@bryanoltman
Copy link
Contributor Author

I think I've addressed all feedback, although maybe not

confirm that it's actually doing something

What do you mean by that?

@bryanoltman bryanoltman requested a review from eseidel May 13, 2024 21:43
@eseidel
Copy link
Contributor

eseidel commented May 13, 2024

I think I've addressed all feedback, although maybe not

confirm that it's actually doing something

What do you mean by that?

Because I was unclear as to how the thread actually gets canceled (presumably rust has some fancy built in way?) it wasn't clear to me that the network callback was ever called. If I were writing this I might just remove the 2s timeout and see if the test then correctly crashes to your unreachable. 🤷 If you tell me it's working I believe you.

Copy link
Contributor

@eseidel eseidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

Please confirm, the test failed before you change, correct?

@bryanoltman
Copy link
Contributor Author

If I were writing this I might just remove the 2s timeout and see if the test then correctly crashes to your unreachable.

I could probably shorten the delay, but removing it makes the test flaky—the config thread and the patch check thread are spawned close enough together that it seems to be a coin flip whether the patch_check_request_fn body executes before HAS_FINISHED_CONFIG = true is set.

@bryanoltman
Copy link
Contributor Author

Please confirm, the test failed before you change, correct?

Confirmed. If I swap out the body of check_for_update_internal with the following, this test fails

    with_config(|config| {
        // Load UpdaterState from disk
        // If there is no state, make an empty state.
        let state =
            UpdaterState::load_or_new_on_error(&config.storage_dir, &config.release_version);
        send_patch_check_request(config, &state)
    })

@eseidel
Copy link
Contributor

eseidel commented May 13, 2024

It's always possible to de-flake threads with mutexs of course if you want to go that route.

Most important is that the test fails (even flakily) before your change and passes consistently after.

@bryanoltman bryanoltman merged commit d309317 into main May 14, 2024
6 of 8 checks passed
@bryanoltman bryanoltman deleted the bo/config-lock branch May 14, 2024 14:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fix: Application Not Responding at shorebird_init on some devices
2 participants