Reproduce, fix unexpected exception when processing zero byte file #65

Merged: 1 commit into master from zero_bytes, May 23, 2020

vinceatbluelabs
Contributor

Replace the unexpected UnboundLocalError raised when processing a zero-byte file with an expected exception that carries a clearer error message from Pandas.

Before:

(records-mover-3.8.2) broz@graybookpro:~/src/records-mover$ mvrec file2url ./empty.csv s3://bluelabs-scratch/vince.broz/bar.csv

10:09:33 - Starting...
10:09:34 - Mover: copying from DataUrlRecordsSource(None) to DataUrlTarget by first writing DataUrlRecordsSource(None) to DelimitedRecordsFormat(bluelabs - {'compression': None}) records format (if easy to rewrite)...
10:09:34 - Determining records format with initial_hints={'compression': None}
10:09:34 - Got unrecognized encoding from chardet sniffing: {'encoding': None, 'confidence': 0.0, 'language': None}
10:09:34 - Python could not determine newline format of file.
10:09:34 - 
Traceback (most recent call last):
  File "/Users/broz/src/records-mover/records_mover/records/job/mover.py", line 37, in run_records_mover_job
    return records.move(source, target, processing_instructions)
  File "/Users/broz/src/records-mover/records_mover/records/mover.py", line 93, in move
    with records_source.\
  File "/Users/broz/.pyenv/versions/3.8.2/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/Users/broz/src/records-mover/records_mover/records/sources/data_url.py", line 46, in to_fileobjs_source
    with FileobjsSource.\
  File "/Users/broz/.pyenv/versions/3.8.2/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/Users/broz/src/records-mover/records_mover/records/sources/fileobjs.py", line 55, in infer_if_needed
    sniff_hints_from_fileobjs(list(target_names_to_input_fileobjs.values()),
  File "/Users/broz/src/records-mover/records_mover/records/delimited/sniff.py", line 202, in sniff_hints_from_fileobjs
    hints = sniff_hints(fileobj, initial_hints=initial_hints)
  File "/Users/broz/src/records-mover/records_mover/records/delimited/sniff.py", line 265, in sniff_hints
    streaming_hints.update(python_inferred_hints)  # type: ignore
UnboundLocalError: local variable 'python_inferred_hints' referenced before assignment
(records-mover-3.8.2) broz@graybookpro:~/src/records-mover$
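
The UnboundLocalError in the traceback above is characteristic of a name that is bound only on the success path of a block and then read unconditionally afterwards. The following is a minimal sketch of that failure mode, not the actual code in sniff.py: the function names, the hint key, and the use of the standard library's csv.Sniffer are all assumptions made for illustration.

```python
import csv


def sniff_hints_buggy(sample: str) -> dict:
    """Hypothetical reduction: python_inferred_hints is bound only on success."""
    hints = {}
    try:
        dialect = csv.Sniffer().sniff(sample)
        python_inferred_hints = {"field-delimiter": dialect.delimiter}
    except csv.Error:
        # Sniffing failed (for example on empty input); nothing is assigned here.
        pass
    # On the failure path the name was never bound, so this raises UnboundLocalError.
    hints.update(python_inferred_hints)
    return hints


def sniff_hints_guarded(sample: str) -> dict:
    """Same logic, but the name is bound on every path before it is read."""
    hints = {}
    python_inferred_hints = {}
    try:
        dialect = csv.Sniffer().sniff(sample)
        python_inferred_hints = {"field-delimiter": dialect.delimiter}
    except csv.Error:
        pass
    hints.update(python_inferred_hints)
    return hints
```

With the guard in place, an empty input no longer fails at this step, so any later parsing stage is free to report the empty file with its own, more descriptive error.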

After:

(records-mover-3.8.2) broz@graybookpro:~/src/records-mover$ mvrec file2url ./empty.csv s3://bluelabs-scratch/vince.broz/bar.csv
10:22:25 - Starting...
10:22:25 - Mover: copying from DataUrlRecordsSource(None) to DataUrlTarget by first writing DataUrlRecordsSource(None) to DelimitedRecordsFormat(bluelabs - {'compression': None}) records format (if easy to rewrite)...
10:22:25 - Determining records format with initial_hints={'compression': None}
10:22:25 - Got unrecognized encoding from chardet sniffing: {'encoding': None, 'confidence': 0.0, 'language': None}
10:22:25 - Python could not determine newline format of file.
10:22:26 - Attempting to parse with quoting: minimal
10:22:26 - Attempting to parse with quoting: None
10:22:26 - 
Traceback (most recent call last):
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 2558, in _infer_columns
    line = self._buffered_line()
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 2734, in _buffered_line
    return self._next_line()
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 2831, in _next_line
    orig_line = self._next_iter_line(row_num=self.pos + 1)
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 2891, in _next_iter_line
    return next(self.data)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/broz/src/records-mover/records_mover/records/delimited/sniff.py", line 171, in csv_hints_from_pandas
    return attempt_parse(quoting='minimal')
  File "/Users/broz/src/records-mover/records_mover/records/delimited/sniff.py", line 158, in attempt_parse
    with stream_csv(fresh_fileobj, current_hints):
  File "/Users/broz/.pyenv/versions/3.8.2/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/Users/broz/src/records-mover/records_mover/records/delimited/csv_streamer.py", line 69, in stream_csv
    out = read_csv(text_fileobj, **kwargs)
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 448, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 880, in __init__
    self._make_engine(self.engine)
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 1126, in _make_engine
    self._engine = klass(self.f, **self.options)
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 2286, in __init__
    ) = self._infer_columns()
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 2580, in _infer_columns
    raise EmptyDataError("No columns to parse from file")
pandas.errors.EmptyDataError: No columns to parse from file

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 2558, in _infer_columns
    line = self._buffered_line()
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 2734, in _buffered_line
    return self._next_line()
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 2831, in _next_line
    orig_line = self._next_iter_line(row_num=self.pos + 1)
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 2891, in _next_iter_line
    return next(self.data)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/broz/src/records-mover/records_mover/records/job/mover.py", line 37, in run_records_mover_job
    return records.move(source, target, processing_instructions)
  File "/Users/broz/src/records-mover/records_mover/records/mover.py", line 93, in move
    with records_source.\
  File "/Users/broz/.pyenv/versions/3.8.2/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/Users/broz/src/records-mover/records_mover/records/sources/data_url.py", line 46, in to_fileobjs_source
    with FileobjsSource.\
  File "/Users/broz/.pyenv/versions/3.8.2/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/Users/broz/src/records-mover/records_mover/records/sources/fileobjs.py", line 55, in infer_if_needed
    sniff_hints_from_fileobjs(list(target_names_to_input_fileobjs.values()),
  File "/Users/broz/src/records-mover/records_mover/records/delimited/sniff.py", line 202, in sniff_hints_from_fileobjs
    hints = sniff_hints(fileobj, initial_hints=initial_hints)
  File "/Users/broz/src/records-mover/records_mover/records/delimited/sniff.py", line 268, in sniff_hints
    pandas_inferred_hints = csv_hints_from_pandas(fileobj, streaming_hints)
  File "/Users/broz/src/records-mover/records_mover/records/delimited/sniff.py", line 173, in csv_hints_from_pandas
    return attempt_parse(quoting=None)
  File "/Users/broz/src/records-mover/records_mover/records/delimited/sniff.py", line 158, in attempt_parse
    with stream_csv(fresh_fileobj, current_hints):
  File "/Users/broz/.pyenv/versions/3.8.2/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/Users/broz/src/records-mover/records_mover/records/delimited/csv_streamer.py", line 69, in stream_csv
    out = read_csv(text_fileobj, **kwargs)
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 448, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 880, in __init__
    self._make_engine(self.engine)
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 1126, in _make_engine
    self._engine = klass(self.f, **self.options)
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 2286, in __init__
    ) = self._infer_columns()
  File "/Users/broz/.pyenv/versions/3.8.2/envs/records-mover-3.8.2/lib/python3.8/site-packages/pandas/io/parsers.py", line 2580, in _infer_columns
    raise EmptyDataError("No columns to parse from file")
pandas.errors.EmptyDataError: No columns to parse from file
(records-mover-3.8.2) broz@graybookpro:~/src/records-mover$
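
The repeated "During handling of the above exception, another exception occurred" sections in the output above are Python's implicit exception chaining: an exception raised inside an `except` block records the one being handled in its `__context__` attribute. The sketch below shows how a retry with a different quoting setting can produce such a chain. The EmptyDataError class here is a local stand-in rather than pandas.errors.EmptyDataError, and the function bodies and the `except` clause are assumptions, not the code in sniff.py.

```python
class EmptyDataError(ValueError):
    """Local stand-in for pandas.errors.EmptyDataError (illustration only)."""


def attempt_parse(text: str, quoting):
    # Hypothetical parser: an empty input cannot yield any columns.
    if not text:
        raise EmptyDataError("No columns to parse from file")
    return {"quoting": quoting}


def csv_hints(text: str):
    try:
        return attempt_parse(text, quoting="minimal")
    except EmptyDataError:
        # Retrying inside the handler: if this raises too, Python links the
        # new exception to the first one through __context__.
        return attempt_parse(text, quoting=None)


def exception_chain(exc: BaseException) -> list:
    """Walk __context__ links, newest exception first."""
    names = []
    while exc is not None:
        names.append(type(exc).__name__)
        exc = exc.__context__
    return names
```

Calling `csv_hints("")` raises the second EmptyDataError, and `exception_chain` on it returns two entries, one for each parse attempt.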

vinceatbluelabs requested a review from cwegrzyn on May 23, 2020 14:23
Contributor

cwegrzyn left a comment

:shipit:

vinceatbluelabs merged commit 525d9dd into master on May 23, 2020
vinceatbluelabs deleted the zero_bytes branch on May 23, 2020 15:58