Proof-of-concept: warn that only last --exclude-where used when user passes multiple #1229

corneliusroemer · 2023-05-17T22:54:38Z

See https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1684360772513889
for discussion leading up to this

This change is entirely backwards compatible. All it does is add a helpful warning

Description of proposed changes

Warn users that only last --exclude-where is used when user passes the same optional arg multiple times.

This is a proof of concept. It would make sense to roll this out more systematically to all arguments where we have nargs="+"

The warning looks as follows:

$ augur filter             --sequences data/sequences.fasta.zst             --metadata results/metadata.tsv             --exclude config/exclude_accessions_mpxv.txt             --output-sequences results/hmpxv1/good_sequences.fasta.zst             --output-metadata results/hmpxv1/good_metadata.tsv             --min-date 2017             --min-length 180000             --exclude-where outbreak!='hMPXV-1' clade!='IIb' --exclude-where 'coverage<0.95' --exclude-where 'divergence<550' --exclude-where 'missing_data>10000' --exclude-where 'divergence>650' --exclude-where 'QC_rare_mutations!=good'             --output-log results/hmpxv1/good_filter.log

WARNING: You passed `--exclude-where` 6 times. Only the last argument ['QC_rare_mutations!=good'] will be used.
The following arguments {'divergence>650', 'divergence<550', 'coverage<0.95', 'outbreak!=hMPXV-1', 'clade!=IIb', 'missing_data>10000'} will be ignored.
Note: You did not provide a sequence index, so Augur will generate one. You can generate your own index ahead of time with `augur index` and pass it with `augur filter --sequence-index`.
519 strains were dropped during filtering
        42 of these were dropped because of 'QC_rare_mutations!=good'
        113 of these were dropped because they were earlier than 2017.0 or missing a date
        364 of these were dropped because they were shorter than minimum length of 180000bp
6553 strains passed all filters

Testing

What steps should be taken to test the changes you've proposed?
If you added or changed behavior in the codebase, did you update the tests, or do you need help with this?

Checklist

Add a message in CHANGES.md summarizing the changes in this PR that are end user focused. Keep headers and formatting consistent with the rest of the file.

…passes multiple See https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1684360772513889 for discussion leading up to this This change is entirely backwards compatible. All it does is add a helpful warning

codecov · 2023-05-17T23:15:56Z

Codecov Report

Patch coverage: 60.00% and project coverage change: -0.02 ⚠️

Comparison is base (5c1fc41) 68.87% compared to head (ea609d1) 68.86%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1229      +/-   ##
==========================================
- Coverage   68.87%   68.86%   -0.02%     
==========================================
  Files          64       64              
  Lines        6937     6940       +3     
  Branches     1693     1694       +1     
==========================================
+ Hits         4778     4779       +1     
- Misses       1854     1855       +1     
- Partials      305      306       +1

Impacted Files	Coverage Δ
augur/filter/validate_arguments.py	`80.00% <33.33%> (-8.24%)`	⬇️
augur/filter/__init__.py	`100.00% <100.00%> (ø)`
augur/filter/include_exclude_rules.py	`97.72% <100.00%> (ø)`

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

joverlee521 · 2023-05-18T20:16:31Z

Since --exclude-where already accepts multiple arguments, I would expect multiple uses of it to build up the list of conditions instead of just using the last provided.

I think this would mirror how we handle multiple --node-data args for export.

corneliusroemer · 2023-05-18T20:21:02Z

@joverlee521 Interesting, so you're saying we should change the behaviour? I'm open for that, but maybe it would be best to first start warning about that change then. This PR is more about the ability to warn than a case for changing it. I'm fine with the current behaviour as long as users are warned.

victorlin · 2023-05-24T07:53:23Z

augur/filter/include_exclude_rules.py

+        # Use only last exclude_where argument for backward compatibility.
+        # User warned in validate_arguments.py


I don't like how the validation logic is separate from the actual logic. It's not stated explicitly anywhere, but the way I see it is that validate_arguments.py contains logic done ahead of time to avoid unnecessary computation at the cost of logical duplication. For example, there's no point in going through filtering only to realize subsampling is not possible.

Since this change is done during filter construction which doesn't come after any computationally expensive tasks, I suggest doing the validation here and not validate_arguments.py.

victorlin · 2023-05-24T07:56:19Z

augur/filter/validate_arguments.py

+    if args.exclude_where and len(args.exclude_where) > 1:
+        print_err(f"WARNING: You passed `--exclude-where` {len(args.exclude_where)} times. Only the last argument {args.exclude_where[-1]} will be used.\n"
+                  f"The following arguments {set(sum(args.exclude_where,[])).difference(set(args.exclude_where[-1]))} will be ignored.")


I assume the current behavior also applies to --include-where and other filter options. Should those get the same treatment?

victorlin · 2024-04-15T17:46:00Z

+1 for directly changing the behavior. I don't see it as a breaking change - if a user is specifying --exclude-where multiple times, I doubt they are relying on the "feature" that only the last one is used. I created a PR to change it everywhere: #1445

Proof-of-concept: warn that only last --exclude-where used when user …

ea609d1

…passes multiple See https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1684360772513889 for discussion leading up to this This change is entirely backwards compatible. All it does is add a helpful warning

corneliusroemer force-pushed the warn-overwrite branch from de7352b to ea609d1 Compare May 17, 2023 23:00

corneliusroemer requested a review from a team May 18, 2023 20:21

victorlin reviewed May 24, 2023

View reviewed changes

victorlin mentioned this pull request Apr 15, 2024

Allow repeating an option that takes multiple values #1445

Merged

3 tasks

victorlin closed this in #1445 Apr 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proof-of-concept: warn that only last --exclude-where used when user passes multiple #1229

Proof-of-concept: warn that only last --exclude-where used when user passes multiple #1229

corneliusroemer commented May 17, 2023 •

edited

Loading

codecov bot commented May 17, 2023

joverlee521 commented May 18, 2023

corneliusroemer commented May 18, 2023

victorlin May 24, 2023

victorlin May 24, 2023

victorlin commented Apr 15, 2024

		# Use only last exclude_where argument for backward compatibility.
		# User warned in validate_arguments.py

Proof-of-concept: warn that only last --exclude-where used when user passes multiple #1229

Proof-of-concept: warn that only last --exclude-where used when user passes multiple #1229

Conversation

corneliusroemer commented May 17, 2023 • edited Loading

Description of proposed changes

Testing

Checklist

codecov bot commented May 17, 2023

Codecov Report

joverlee521 commented May 18, 2023

corneliusroemer commented May 18, 2023

victorlin May 24, 2023

Choose a reason for hiding this comment

victorlin May 24, 2023

Choose a reason for hiding this comment

victorlin commented Apr 15, 2024

corneliusroemer commented May 17, 2023 •

edited

Loading