Better error message when S3 _format_* file doesn't exist. #121

Merged
merged 4 commits into from
Nov 10, 2020

vinceatbluelabs
Contributor

I encountered an issue when trying to load from an incomplete records directory that didn't have a _format_parquet file inside. It turns out that when no objects match the prefix, the AWS S3 list_objects() API omits the 'Contents' key from its response entirely, rather than including it with an empty list as the code assumed.

Here's the error message I got, which is not an ideal one for users:

(records-mover-3.8.5) broz@bigbookpro:~/src/records-mover$ mvrec recordsdir2table s3://bluelabs-scratch/vince.broz/WTwRu8xDeco/ bltoolsdevbq targetsmart_copy_test_autodelete person_20201013
13:15:21 - Starting...
13:15:22 - Found no service account info for BigQuery, using local creds
13:15:23 - 
Traceback (most recent call last):
  File "/Users/broz/src/records-mover/records_mover/records/job/mover.py", line 35, in run_records_mover_job
    source = source_method(**source_kwargs)
  File "/Users/broz/src/records-mover/records_mover/records/sources/factory.py", line 191, in directory_from_url
    return RecordsDirectoryRecordsSource(directory=directory,
  File "/Users/broz/src/records-mover/records_mover/records/sources/directory.py", line 21, in __init__
    self.records_format = directory.load_format(fail_if_dont_understand=fail_if_dont_understand)
  File "/Users/broz/src/records-mover/records_mover/records/records_directory.py", line 47, in load_format
    return self.format_file.load_format(fail_if_dont_understand)
  File "/Users/broz/src/records-mover/records_mover/records/records_format_file.py", line 16, in load_format
    matching_locs = self.records_loc.files_matching_prefix(prefix)
  File "/Users/broz/src/records-mover/records_mover/url/s3/s3_directory_url.py", line 49, in files_matching_prefix
    return [self._key_in_same_bucket(o['Key']) for o in resp['Contents']]
KeyError: 'Contents'
(records-mover-3.8.5) broz@bigbookpro:~/src/records-mover$

After this fix, folks get a better error message:

(records-mover-3.8.5) broz@bigbookpro:~/src/records-mover$ mvrec recordsdir2table s3://bluelabs-scratch/vince.broz/WTwRu8xDeco/ bltoolsdevbq targetsmart_copy_test_autodelete person_20201013
13:20:33 - Starting...
13:20:33 - Found no service account info for BigQuery, using local creds
13:20:35 - 
Traceback (most recent call last):
  File "/Users/broz/src/records-mover/records_mover/records/job/mover.py", line 35, in run_records_mover_job
    source = source_method(**source_kwargs)
  File "/Users/broz/src/records-mover/records_mover/records/sources/factory.py", line 191, in directory_from_url
    return RecordsDirectoryRecordsSource(directory=directory,
  File "/Users/broz/src/records-mover/records_mover/records/sources/directory.py", line 21, in __init__
    self.records_format = directory.load_format(fail_if_dont_understand=fail_if_dont_understand)
  File "/Users/broz/src/records-mover/records_mover/records/records_directory.py", line 47, in load_format
    return self.format_file.load_format(fail_if_dont_understand)
  File "/Users/broz/src/records-mover/records_mover/records/records_format_file.py", line 18, in load_format
    raise TypeError(f"_format file not found in bucket under {prefix}*")
TypeError: _format file not found in bucket under _format_*
(records-mover-3.8.5) broz@bigbookpro:~/src/records-mover$
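
For anyone hitting the same `KeyError: 'Contents'`, here is a minimal sketch of the defensive pattern, not the actual diff in this PR. The function names `keys_from_listing` and `require_format_file` are hypothetical, and the functions take a plain response dict so the example runs without AWS credentials. The only assumption about S3 is the behavior described above: the response has no `'Contents'` entry when nothing matches the prefix.

```python
from typing import Any, Dict, List


def keys_from_listing(resp: Dict[str, Any]) -> List[str]:
    """Return object keys from a list_objects-style response dict.

    'Contents' is absent (not an empty list) when no object matches the
    prefix, so fall back to an empty list instead of indexing
    resp['Contents'] directly.
    """
    return [obj['Key'] for obj in resp.get('Contents', [])]


def require_format_file(resp: Dict[str, Any], prefix: str = '_format_') -> str:
    """Hypothetical helper: return the first matching key or fail readably."""
    keys = keys_from_listing(resp)
    if not keys:
        # Same wording as the message in the traceback above.
        raise TypeError(f"_format file not found in bucket under {prefix}*")
    return keys[0]
```

With this shape, an empty listing surfaces as the readable "_format file not found" message rather than a bare `KeyError`.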

@vinceatbluelabs vinceatbluelabs changed the title from "Handle case where files_matching_prefix() has no results" to "Better error message when S3 _format_* file doesn't exist." Nov 8, 2020
@vinceatbluelabs vinceatbluelabs marked this pull request as ready for review November 8, 2020 18:55
@crvena-sonja crvena-sonja left a comment

:shipit:

@vinceatbluelabs vinceatbluelabs merged commit 0a5e1fc into master Nov 10, 2020
@vinceatbluelabs vinceatbluelabs deleted the better_error_message branch November 10, 2020 17:50
2 participants