-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Don't make network request while holding config lock #161
Conversation
let response = request_fn(&url, request)?; | ||
debug!("Patch check response: {:?}", response); | ||
Ok(response) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the response get recorded? I guess the caller does that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the response never is recorded? TIL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, we don't log the response. What's interesting is that this function is not called by the engine – seemingly only ever by the shorebird_code_push
package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's why check_for_update_internal exists, was to share code between this (new) function which was exposed for package:shorebird_code_push, and the normal update mechanism.
As I look back through this code I am reminded of how simple/primitive it is. We just haven't touched it much in a year.
That also might mean that what's happening is that package:shorebird_code_push is running already in a background thread while the user is launching the app (shorebird_init?)
I suspect that we'll also (possibly separately) want to do the fix where we make set_config be able to return quickly without even trying to grab the config lock to help shorebird_init always be fast? Or maybe we just need to be extra careful to make sure we're never holding the config lock for any length of time (e.g. across a network request).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just haven't touched it much in a year.
Yep, it's been relatively stable + the consequences of getting it wrong are quite high.
I suspect that we'll also (possibly separately) want to do the fix where we make set_config be able to return quickly without even trying to grab the config lock to help shorebird_init always be fast
I think we could spawn a new thread when calling set_config
from init
to make it look something like:
pub fn init(
app_config: AppConfig,
file_provider: Box<dyn ExternalFileProvider>,
yaml: &str,
) -> Result<(), UpdateError> {
#[cfg(any(target_os = "android", test))]
use crate::android::libapp_path_from_settings;
init_logging();
let config = YamlConfig::from_yaml(yaml)
.map_err(|err| UpdateError::InvalidArgument("yaml".to_string(), err.to_string()))?;
let libapp_path = libapp_path_from_settings(&app_config.original_libapp_paths)?;
debug!("libapp_path: {:?}", libapp_path);
let _ = std::thread::spawn(move || {
set_config(
app_config,
file_provider,
libapp_path,
&config,
NetworkHooks::default(),
);
});
Ok(())
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you want that. Any time the lock is already taken, there is nothing for us to do.
What about using https://doc.rust-lang.org/std/sync/struct.Mutex.html#method.try_lock in the set_config case? If the lock is already taken, we know that config is initialized and thus nothing to do (since any other taking of the lock when we're not initialized errors I believe).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would work. We can discuss in the future PR :)
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #161 +/- ##
==========================================
- Coverage 96.03% 95.98% -0.05%
==========================================
Files 20 20
Lines 2898 2962 +64
==========================================
+ Hits 2783 2843 +60
- Misses 115 119 +4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
The bug you're fixing is basically making it so that in some circumstances it's possible to cause application launch to hang waiting on a network request (which is a terrible thing for any app to do). Obviously unintentional on our part. |
@@ -23,7 +23,7 @@ fn main() { | |||
eprintln!(" base: Path to the base file"); | |||
eprintln!(" new: Path to the new file"); | |||
eprintln!(" output: Path to the output patch file"); | |||
eprintln!(""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just to get the linter to stop yelling
let patch_check_delay = std::time::Duration::from_secs(2); | ||
std::thread::sleep(patch_check_delay); | ||
// If we get here, the test has failed. | ||
unreachable!("If the test has not terminated before this, set_config is likely being blocked by a patch check request, which should not happen"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the test terminating cause the check_for_update thread to get killed? I'm not sure what code here is tearing down the thread before this gets hit? A naive implementation I would assume this to just get hit after the test runs (while some other test is running).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the test terminating cause the check_for_update thread to get killed?
No, although I'm not sure what happens if the tests take longer than 2s to run. Let me dig into that a bit more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a bool to ensure that we don't hit unreachable
if we're managed to obtain and release the lock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs one more round (to explain why the unreachable is never hit, confirm that it's actually doing something) and confirm .unwrap() isn't just silently hiding failures.
I think I've addressed all feedback, although maybe not
What do you mean by that? |
Because I was unclear as to how the thread actually gets canceled (presumably rust has some fancy built in way?) it wasn't clear to me that the network callback was ever called. If I were writing this I might just remove the 2s timeout and see if the test then correctly crashes to your unreachable. 🤷 If you tell me it's working I believe you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
Please confirm, the test failed before you change, correct?
I could probably shorten the delay, but removing it makes the test flaky—the config thread and the patch check thread are spawned close enough together that it seems to be a coin flip whether the |
Confirmed. If I swap out the body of with_config(|config| {
// Load UpdaterState from disk
// If there is no state, make an empty state.
let state =
UpdaterState::load_or_new_on_error(&config.storage_dir, &config.release_version);
send_patch_check_request(config, &state)
}) |
It's always possible to de-flake threads with mutexs of course if you want to go that route. Most important is that the test fails (even flakily) before your change and passes consistently after. |
Update
check_for_update_internal
to no longer hold the config lock while making a patch check request.Fixes shorebirdtech/shorebird#1981