Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter: Confusing warning and error combination when query contains a missing column #1592

Closed
victorlin opened this issue Aug 21, 2024 · 2 comments · Fixed by #1604
Closed
Labels
bug Something isn't working

Comments

@victorlin
Copy link
Member

@huddlej noted in nextstrain/mpox#273 (comment) that this output comes across as a bug (that a warning is also an error):

WARNING: Column 'QC_rare_mutations' does not exist in the metadata file. Ignoring it.
ERROR: Query contains a column that does not exist in metadata.

This happens because the columns are effectively checked twice:

  1. read_metadata which does not error by design:

    augur/augur/io/metadata.py

    Lines 124 to 127 in d8faf01

    # Ignore missing columns. Don't error since augur filter's
    # --exclude-where allows invalid columns to be specified (they
    # are just ignored).
    print_err(f"WARNING: Column '{requested_column}' does not exist in the metadata file. Ignoring it.")

  2. Evaluation of the query:

    try:
    return set(metadata_copy.query(query).index.values)
    except Exception as e:
    if isinstance(e, PandasUndefinedVariableError):
    raise AugurError(f"Query contains a column that does not exist in metadata.") from e

Possible solution

Both checks are beneficial, so it doesn't make sense to remove either. Instead, the warning text could be updated to indicate potential issues:

WARNING: Column 'QC_rare_mutations' does not exist in the metadata file. This may cause subsequent errors.
ERROR: Query contains a column that does not exist in metadata.
@victorlin victorlin added the bug Something isn't working label Aug 21, 2024
@genehack
Copy link
Contributor

I would think if you're going to be making changes here, updating the second (error) message to include the name of the missing column would be a good improvement — it will make the combination of messages overall more clear because it will be apparent that both of them are about the same column.

@victorlin
Copy link
Member Author

updating the second (error) message to include the name of the missing column would be a good improvement

I agree, and looked into this briefly. Pandas's UndefinedVariableError does not expose the column name apart from being in the error message text name 'QC_rare_mutations' is not defined, and parsing it from there feels wrong. Maybe a good compromise would be to tack on the entire Pandas error message like this:

WARNING: Column 'QC_rare_mutations' does not exist in the metadata file. This may cause subsequent errors.
ERROR: Query contains a column that does not exist in metadata: name 'QC_rare_mutations' is not defined

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants