
feat: allow overriding MetadataSyncJob timeout via Qubes service flag #1552

Merged
4 commits merged from 1547-override-timeout into main on Sep 13, 2022

Conversation

@cfm (Member) commented Aug 30, 2022

Description

Closes #1547 by:

  1. teaching the securedrop-client script to check for a Qubes service flag like SDEXTENDEDTIMEOUT_N and export it as SDEXTENDEDTIMEOUT=N;
  2. having the MetadataSyncJob (noisily) override its sdclientapi.API.default_request_timeout with $SDEXTENDEDTIMEOUT if set in the environment.
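A minimal Python sketch of the override behavior in step 2 (the function name `effective_timeout` and the placeholder default of 20 seconds are illustrative only; the real default lives on `sdclientapi.API`):

```python
import os

ASSUMED_DEFAULT_TIMEOUT = 20  # illustrative placeholder, not sdclientapi's actual value


def effective_timeout(environ=None):
    """Return the request timeout in seconds, honoring $SDEXTENDEDTIMEOUT if set."""
    environ = os.environ if environ is None else environ
    override = environ.get("SDEXTENDEDTIMEOUT")
    if override is not None:
        # Environment values are always strings, so convert before use.
        return int(override)
    return ASSUMED_DEFAULT_TIMEOUT
```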

Test Plan

Point your SecureDrop Workstation at my testing instance per #1547 (comment) and #1547 (comment). Then:

To test the current failure mode (#1547)

  1. Check out 1547-override-timeout in sd-app:/home/user/securedrop-client.

  2. Apply the following patch so that we can test files/securedrop-client within the virtual environment:

    diff --git a/files/securedrop-client b/files/securedrop-client
    index 9674911..403e37f 100755
    --- a/files/securedrop-client
    +++ b/files/securedrop-client
    @@ -18,5 +18,6 @@ if [ -n "$timeout_flag_value" ]; then
            export SDEXTENDEDTIMEOUT="$timeout_flag_value"
     fi
     
    +cd ~/securedrop-client
     # Now execute the actual client, only if running in an sd-app
    -if [ "$(qubesdb-read /name)" = "sd-app" ]; then ./bin/sd-client; else echo "Not running in sd-app, client not starting."; fi
    +if [ "$(qubesdb-read /name)" = "sd-app" ]; then python -m securedrop_client; else echo "Not running in sd-app, client not starting."; fi
  3. Then:

    user@sd-app:~/securedrop-client$ make venv
    user@sd-app:~/securedrop-client$ source .venv/bin/activate
    user@sd-app:~/securedrop-client$ files/securedrop-client
  4. Log in and wait for sync. Against my testing instance (~3500 sources and submissions):

    • the initial sync succeeds (~10 min); and
    • subsequent syncs fail (~2 min).

To test overriding the timeout

  1. Set the service flag:

    [user@dom0 ~]$ qvm-service --enable sd-app SDEXTENDEDTIMEOUT_600
    [user@dom0 ~]$ qvm-shutdown sd-app && sleep 5 && qvm-start sd-app

    You can also do SDEXTENDEDTIMEOUT_10 if you're impatient. :-)

  2. Explore how that flag is exposed within sd-app:

    user@sd-app:~$ qubesdb-list /qubes-service/
    SDEXTENDEDTIMEOUT_600
    meminfo-writer
    paxctld
    user@sd-app:~$ qubesdb-list /qubes-service/SD
    EXTENDEDTIMEOUT_600
    user@sd-app:~$ qubesdb-list /qubes-service/SDEXTENDEDTIMEOUT_
    600
  3. Launch the client as above:

    user@sd-app:~/securedrop-client$ source .venv/bin/activate
    user@sd-app:~/securedrop-client$ files/securedrop-client
    • SDEXTENDEDTIMEOUT=600 appears in securedrop-client's standard output.
  4. Log in and wait for sync.

    • Sync either (a) succeeds, if the SDEXTENDEDTIMEOUT you set is long enough, or (b) fails after SDEXTENDEDTIMEOUT seconds.
    • In sd-log, you see (e.g.): WARNING: MetadataSyncJob will use default_request_timeout=600

Checklist

If these changes modify code paths involving cryptography, the opening of files in VMs or network (via the RPC service) traffic, Qubes testing in the staging environment is required. For fine tuning of the graphical user interface, testing in any environment in Qubes is required. Please check as applicable:

  • I have tested these changes in the appropriate Qubes environment
  • I do not have an appropriate Qubes OS workstation set up (the reviewer will need to test these changes)
  • These changes should not need testing in Qubes

If these changes add or remove files other than client code, the AppArmor profile may need to be updated. Please check as applicable:

  • I have updated the AppArmor profile
  • No update to the AppArmor profile is required for these changes

And see e21c163 for why!

  • I don't know and would appreciate guidance

If these changes modify the database schema, you should include a database migration. Please check as applicable:

  • I have written a migration and upgraded a test database based on main and confirmed that the migration is self-contained and applies cleanly
  • I have written a migration but have not upgraded a test database based on main and would like the reviewer to do so
  • I need help writing a database migration
  • No database schema changes are needed

@cfm cfm requested a review from eaon August 30, 2022 23:20
@cfm cfm requested a review from a team as a code owner August 30, 2022 23:20
@gonzalo-bulnes (Contributor) left a comment


Not really a review because I'm not set up right now to follow the test plan. But just a note to say that I find the description and test plan fantastic! Having only loosely followed the conversation on this issue, I feel I understand what was done and how it should behave. 🙌

Comment on lines +49 to +52
logger.warn(
f"{self.__class__.__name__} will use "
f"default_request_timeout={api_client.default_request_timeout}"
)
@gonzalo-bulnes (Contributor) commented Aug 31, 2022


A minor pet peeve of mine: since the message doesn't require action (e.g. like a deprecation warning does), I would prefer it to be at INFO level.

I say pet peeve because I find personally that most warnings shouldn't be emitted (potentially controversial opinion) because they fall into the following categories:

  • Errors that are not fatal because they're handled. (In my experience, the source of the error is unlikely ever to be fixed if, given the opportunity, the choice was made instead to print a warning. 🤷 So the warning is mostly noise, IMHO.)
  • Fatal errors, which shouldn't have happened and would better be logged at the ERROR level since they signal a bug to be fixed. (Foreseeable errors must be handled.)
  • Or messages that don't require user action and only provide context. (At which point I think competing for attention at the warning level contributes more to error-fatigue than it really helps keeping us informed. Using an INFO level seems more honest to me.)

What would I make a warning? Things that will become fatal errors unless action is taken, e.g. deprecation warnings. YMMV! Admittedly it's nitpicky and mostly a matter of personal preference. 🙂
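A toy illustration of the distinction being drawn (the logger name and message text here are made up for this sketch, not taken from the client):

```python
import logging

logger = logging.getLogger("example.sync")


def log_timeout_context(timeout):
    # Context-only message: no action required, so INFO per the reasoning above.
    logger.info("sync will use request_timeout=%s", timeout)


def log_deprecation(name):
    # Actionable message: something breaks unless the caller changes, so WARNING.
    logger.warning("%s is deprecated and will stop working in a future release", name)
```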

@cfm (Member, Author) commented Sep 1, 2022


I generally agree with your criteria, @gonzalo-bulnes. In this specific case, I'm rounding up slightly from our decision to call out this flag "explicitly as experimental, i.e. we cannot guarantee that we will support it in future" to something akin to a deprecation warning. But I'm happy to demote this message to INFO if that logic isn't persuasive.

(See also: #1166.)

@gonzalo-bulnes (Contributor) replied:

Oh, that makes sense, @cfm; I missed that intent! I'm not sure what context goes into the current wording: would there be an opportunity to call out the experimental status explicitly (in addition to using the warning level), e.g. "WARNING: ... use experimental default_..."?

cfm added 4 commits September 7, 2022 15:22
…so that we have something to compare to when we override it.
…IMEOUT=N

Since a Qubes service just sets a boolean flag, we use qubesdb-list (1)
as a glob, for any key beginning with the prefix "SDEXTENDEDTIMEOUT_";
and (2) to return the "value", aka the key without that prefix.

This approach is too naïve to work for arbitrary keys (which risk
colliding in the glob) or arbitrarily-typed values (without encoding).
But it's good enough for this experimental flag.  More sophisticated
approaches, such as scanning the contents of "/var/run/qubes-service",
would require new AppArmor grants; the grants we already have cover
qubesdb-cmd.
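The prefix-as-value encoding described in this commit message can be sketched in Python (`parse_service_flag` is an illustrative name, not code from this PR):

```python
def parse_service_flag(keys, prefix="SDEXTENDEDTIMEOUT_"):
    """Return the value encoded in a Qubes service flag's name, or None.

    A Qubes service flag is just a boolean, so the value rides along in the
    key itself: the flag SDEXTENDEDTIMEOUT_600 encodes the value "600".
    """
    for key in keys:
        if key.startswith(prefix):
            return key[len(prefix):]
    return None
```

As the commit message notes, this is naïve for arbitrary keys (two flags sharing the prefix would collide), but it suffices for a single experimental flag.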
@gonzalo-bulnes (Contributor)
(rebased without changes to include new CI check)

@eaon (Contributor) left a comment


Going with an approval without merging, as I have not tested this; I'm just weighing in on the implementation:

I think this looks great! Balances precedent (qubesdb reads in the start script) and an approximation of what we may see in the future (i.e. lightweight ways of communicating information to the client without requiring a run of sdw-admin --apply) - and is easily reversible. Love it!

I don't have time to go through the test plan, but if someone else can confirm this is working as expected, I think this PR is ready to be merged! Thanks @cfm 😄

@eloquence (Member)

(Just a note that I'm currently syncing with the SDEXTENDEDTIMEOUT_600 setting against Cory's server with 3500 sources. Will report results here and merge if it looks good, per Michael's prior conditional approval.)

remote_user = factory.RemoteUser()
api_client.get_users = mocker.MagicMock(return_value=[remote_user])

os.environ["SDEXTENDEDTIMEOUT"] = str(TIMEOUT_OVERRIDE) # environment value must be string
A Member commented:

One alternative strategy to consider for tests that modify env vars might be to mock os.environ, see https://github.com/freedomofpress/securedrop-workstation/blob/b31d0acdf35cf5248acbe2dcd9372483dfde424f/launcher/tests/test_util.py#L259-L266 for an example (not a blocking comment, just an observation).
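The suggested alternative can be sketched with `unittest.mock.patch.dict`, which restores the environment when the context exits (`read_timeout` here is a stand-in for the code under test, not the client's actual function):

```python
import os
from unittest import mock


def read_timeout():
    # Stand-in for code under test that reads the override from the environment.
    return int(os.environ.get("SDEXTENDEDTIMEOUT", "20"))


def test_timeout_override():
    # patch.dict restores os.environ on exit, so the test leaves no residue
    # behind, unlike assigning to os.environ directly.
    with mock.patch.dict(os.environ, {"SDEXTENDEDTIMEOUT": "600"}):
        assert read_timeout() == 600
```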

@eloquence (Member)

I've not tested the old behavior, since I've previously confirmed that syncs against a large number of sources do indeed time out. The new behavior works as expected.

  • Exploratory testing confirms the expected behavior of the new service
  • SDEXTENDEDTIMEOUT=600 appears in securedrop-client's standard output.
  • Sync succeeds,
  • In sd-log, you see (e.g.): WARNING: MetadataSyncJob will use default_request_timeout=600

@eloquence (Member)

(Upon consideration, I've held off on merge for now, since we've invited an external stakeholder to test this PR before merging.)

@philmcmahon (Contributor)

I've not managed to run this locally but can confirm that this appears to do exactly what we've been doing by manually patching sync.py, so a big 👍🏻 from us

@eloquence (Member)

Per chat with @creviera and previous reviews, merging :)

@eloquence eloquence merged commit 611f396 into main Sep 13, 2022
@eloquence eloquence deleted the 1547-override-timeout branch September 13, 2022 01:14
@cfm (Member, Author) commented Sep 15, 2022

Thanks to @eaon for pairing on the design here and to all who reviewed, especially @philmcmahon!

Successfully merging this pull request may close: allow experimental override of default_request_timeout (#1547).
5 participants