feat: allow overriding MetadataSyncJob timeout via Qubes service flag (#1552)
Conversation
Not really a review because I'm not set up right now to follow the test plan. But just a note to say that I find the description and test plan fantastic! Having only loosely followed the conversation on this issue, I feel I understand what was done and how it should behave. 🙌
```python
logger.warn(
    f"{self.__class__.__name__} will use "
    f"default_request_timeout={api_client.default_request_timeout}"
)
```
A minor pet peeve of mine: since the message doesn't require action (e.g. like a deprecation warning does), I would prefer it to be at the INFO level.
I say pet peeve because I find personally that most warnings shouldn't be emitted (potentially controversial opinion) because they fall into the following categories:
- Errors that are not fatal because they're handled. (In my experience, the source of the error is unlikely to be fixed anyway if, given the opportunity, the choice was made instead to print a warning. 🤷 So the warning is mostly noise IMHO.)
- Fatal errors, which shouldn't have happened and would better be logged at the ERROR level since they signal a bug to be fixed. (Foreseeable errors must be handled.)
- Or messages that don't require user action and only provide context. (At which point I think competing for attention at the warning level contributes more to error-fatigue than it really helps keeping us informed. Using an INFO level seems more honest to me.)
What would I make a warning? Things that will become fatal errors unless action is taken, e.g. deprecation warnings. YMMV! Admittedly it's nitpicky and mostly a matter of personal preference. 🙂
I generally agree with your criteria, @gonzalo-bulnes. In this specific case, I'm rounding up slightly from our decision to call out this flag "explicitly as experimental, i.e. we cannot guarantee that we will support it in future" to something akin to a deprecation warning. But I'm happy to demote this message to INFO if that logic isn't persuasive.
(See also: #1166.)
Oh, that makes sense, @cfm; I missed that intent! I'm not sure what context goes into the current wording: would there be an opportunity to call out the experimental status explicitly (in addition to using the warning level), e.g. "WARNING: ... use experimental default_..."?
…so that we have something to compare to when we override it.
…IMEOUT=N Since a Qubes service just sets a boolean flag, we use qubesdb-list (1) as a glob for any key beginning with the prefix "SDEXTENDEDTIMEOUT_", and (2) to return the "value", i.e. the key without that prefix. This approach is too naïve to work for arbitrary keys (which risk colliding in the glob) or arbitrarily-typed values (without encoding), but it's good enough for this experimental flag. More sophisticated approaches, such as scanning the contents of "/var/run/qubes-service", would require AppArmor grants, which we already have for qubesdb-cmd.
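The commit message above can be sketched roughly as follows. Note this is an illustration only: `qubesdb_list` here stands in for the real `qubesdb-list` invocation, whose exact path argument and output are assumptions, not taken from the PR diff.

```shell
#!/bin/sh
# Stand-in for `qubesdb-list` run against the service-flag subtree
# (real path argument assumed, not shown in this PR).
qubesdb_list() {
    printf 'SDEXTENDEDTIMEOUT_600\n'
}

PREFIX="SDEXTENDEDTIMEOUT_"
for key in $(qubesdb_list); do
    case "$key" in
        "$PREFIX"*)
            # The "value" is the key with the prefix stripped.
            SDEXTENDEDTIMEOUT="${key#"$PREFIX"}"
            export SDEXTENDEDTIMEOUT
            ;;
    esac
done
echo "SDEXTENDEDTIMEOUT=$SDEXTENDEDTIMEOUT"
```

The glob-plus-strip step is pure shell parameter expansion, which is why no extra AppArmor grants are needed beyond the existing one for qubesdb-cmd.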
…ENDEDTIMEOUT is set
Force-pushed from 9d4873f to 0679ae4 (rebased without changes to include new CI check)
Going with an approval without merging as I have not tested this; I'm just weighing in on the implementation:
I think this looks great! Balances precedent (qubesdb reads in the start script) and an approximation of what we may see in the future (i.e. lightweight ways of communicating information to the client without requiring a run of sdw-admin --apply), and is easily reversible. Love it!
I don't have time to go through the test plan, but if someone else can confirm this is working as expected, I think this PR is ready to be merged! Thanks @cfm 😄
(Just a note that I'm currently syncing with the SDEXTENDEDTIMEOUT_600 setting against Cory's server with 3500 sources. Will report results here and merge if it looks good, per Michael's prior conditional approval.)
```python
remote_user = factory.RemoteUser()
api_client.get_users = mocker.MagicMock(return_value=[remote_user])
```

```python
os.environ["SDEXTENDEDTIMEOUT"] = str(TIMEOUT_OVERRIDE)  # environment value must be string
```
One alternative strategy to consider for tests that modify env vars might be to mock os.environ; see https://github.com/freedomofpress/securedrop-workstation/blob/b31d0acdf35cf5248acbe2dcd9372483dfde424f/launcher/tests/test_util.py#L259-L266 for an example (not a blocking comment, just an observation).
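A minimal sketch of the suggested approach: `unittest.mock.patch.dict` scopes the environment change to the test, so the original `os.environ` is restored automatically instead of leaking into other tests.

```python
import os
from unittest import mock

# Patch os.environ only for the duration of the with-block.
with mock.patch.dict(os.environ, {"SDEXTENDEDTIMEOUT": "600"}):
    assert os.environ["SDEXTENDEDTIMEOUT"] == "600"

# Outside the block, the added key has been removed again.
assert "SDEXTENDEDTIMEOUT" not in os.environ
```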
I've not tested the old behavior, since I've previously confirmed that syncs against a large number of sources do indeed time out. The new behavior works as expected.
(Upon consideration, I've held off on merge for now, since we've invited an external stakeholder to test this PR before merging.)
I've not managed to run this locally but can confirm that this appears to do exactly what we've been doing by manually patching.
Per chat with @creviera and previous reviews, merging :)
Thanks to @eaon for pairing on the design here and to all who reviewed, especially @philmcmahon!
Description

Closes #1547 by:

1. modifying the securedrop-client script to check for a Qubes service flag like SDEXTENDEDTIMEOUT_N and export it as SDEXTENDEDTIMEOUT=N;
2. having MetadataSyncJob (noisily) override its sdclientapi.API.default_request_timeout with $SDEXTENDEDTIMEOUT if set in the environment.

Test Plan
Point your SecureDrop Workstation at my testing instance per #1547 (comment) and #1547 (comment). Then:
To test the current failure mode (#1547)
Check out 1547-override-timeout in sd-app:/home/user/securedrop-client.

Apply the following patch so that we can test files/securedrop-client within the virtual environment:

Then:
Log in and wait for sync. Against my testing instance (~3500 sources and submissions):
To test overriding the timeout
Set the service flag:
You can also do SDEXTENDEDTIMEOUT_10 if you're impatient. :-)

Explore how that flag is exposed within sd-app:

Launch the client as above: SDEXTENDEDTIMEOUT=600 appears in securedrop-client's standard output.

Log in and wait for sync.
Sync either (a) succeeds, if the SDEXTENDEDTIMEOUT you set is long enough, or (b) fails after SDEXTENDEDTIMEOUT seconds.

In sd-log, you see (e.g.): WARNING: MetadataSyncJob will use default_request_timeout=600
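As a sketch of the behavior that log line reflects (the function name and the default value are illustrative assumptions, not the PR's actual code), the override could look like:

```python
import logging
import os

DEFAULT_REQUEST_TIMEOUT = 120  # assumed default, for illustration only

def effective_timeout() -> int:
    """Return the request timeout, honoring an SDEXTENDEDTIMEOUT override."""
    override = os.environ.get("SDEXTENDEDTIMEOUT")
    if override is None:
        return DEFAULT_REQUEST_TIMEOUT
    timeout = int(override)  # environment values are always strings
    # Logged noisily, matching the test plan's expected sd-log line.
    logging.warning("MetadataSyncJob will use default_request_timeout=%d", timeout)
    return timeout

os.environ["SDEXTENDEDTIMEOUT"] = "600"
assert effective_timeout() == 600
```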
Checklist
If these changes modify code paths involving cryptography, the opening of files in VMs or network (via the RPC service) traffic, Qubes testing in the staging environment is required. For fine tuning of the graphical user interface, testing in any environment in Qubes is required. Please check as applicable:
If these changes add or remove files other than client code, the AppArmor profile may need to be updated. Please check as applicable:
And see e21c163 for why!
If these changes modify the database schema, you should include a database migration. Please check as applicable:

- … based on main and confirmed that the migration is self-contained and applies cleanly
- … based on main and would like the reviewer to do so