Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUGFIX - AttributeError raised in StataReader.value_labels() #19510

Merged
merged 3 commits into from
Feb 6, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -626,6 +626,7 @@ I/O
- Bug in :func:`DataFrame.to_parquet` where an exception was raised if the write destination is S3 (:issue:`19134`)
- :class:`Interval` now supported in :func:`DataFrame.to_excel` for all Excel file types (:issue:`19242`)
- :class:`Timedelta` now supported in :func:`DataFrame.to_excel` for xls file type (:issue:`19242`, :issue:`9155`)
- Bug in :meth:`pandas.io.stata.StataReader.value_labels` raising an ``AttributeError`` when called on very old files. Now returns an empty dict (:issue:`19417`)

Plotting
^^^^^^^^
Expand Down
8 changes: 5 additions & 3 deletions pandas/io/stata.py
Original file line number Diff line number Diff line change
Expand Up @@ -1341,12 +1341,14 @@ def _null_terminate(self, s):
return s

def _read_value_labels(self):
if self.format_version <= 108:
# Value labels are not supported in version 108 and earlier.
return
if self._value_labels_read:
# Don't read twice
return
if self.format_version <= 108:
# Value labels are not supported in version 108 and earlier.
self._value_labels_read = True
self.value_label_dict = dict()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there a preference stated recently to use {} rather than dict()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah can use {}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dict() is internally consistent with the function.

return

if self.format_version >= 117:
self.path_or_buf.seek(self.seek_value_labels)
Expand Down
10 changes: 10 additions & 0 deletions pandas/tests/io/test_stata.py
Original file line number Diff line number Diff line change
Expand Up @@ -589,6 +589,16 @@ def test_105(self):
df0['psch_dis'] = df0["psch_dis"].astype(np.float32)
tm.assert_frame_equal(df.head(3), df0)

def test_value_labels_old_format(self):
# GH 19417
#
# Test that value_labels() returns an empty dict if the file format
# predates supporting value labels.
dpath = os.path.join(self.dirpath, 'S4_EDUC1.dta')
reader = StataReader(dpath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to close the reader (or do this in a context manager) see other tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

close()d

assert reader.value_labels() == {}
reader.close()

def test_date_export_formats(self):
columns = ['tc', 'td', 'tw', 'tm', 'tq', 'th', 'ty']
conversions = {c: c for c in columns}
Expand Down