Shorter repr for attributes #1322

Zac-HD · 2017-03-24T00:26:26Z

NetCDF files often have tens of attributes, including multi-paragraph summaries or the full modification history of the file. It's great to have this available in the .attrs, but we can truncate it substantially
in the repr! Hopefully this will stop people writing data.attrs = {} and discarding metadata in interactive workflows for the sake of cleaner output.

closes Truncate long lines in repr of Dataset.attrs #1319
test data adjusted
passes git diff upstream/master | flake8 --diff
whatsnew entry

shoyer

This needs more comprehensive unit tests, to verify all the clever logic for truncating and reformatting multi-line attribute strings. Right now we don't really exercise that logic at all in the test suite. I would probably put these in test_formatting.py.

shoyer · 2017-03-24T06:03:44Z

xarray/core/formatting.py

+    empty_line = output.find(u'\n\n')
+    if empty_line != -1:
+        output = output[:empty_line] + u'...'
+    return maybe_truncate(output.replace(u'\n', u' '), 79)


Use OPTIONS['display_width'] instead of 79 here.

shoyer · 2017-03-24T06:12:57Z

xarray/core/formatting.py

+    key = u'    %s:' % unicode_type(key)
+    if col_width is not None:
+        key = pretty_print(key, col_width)
+    output = u'%s %s' % (key, unicode_type(value).replace(u'\t', u'    '))


Maybe we should only do all this text formatting logic if the attribute is already a string? I can imagine this doing strange things if the attribute is a number or array.

I disagree - it's just adjusting the spacing and replacing tabs and newlines before printing them. While I don't think numbers or array representations would have spaces (and arrays with newlines are rare), surely we want to process them for display in the same way?

If somebody is putting non-strings in attributes (which they almost certainly are, as ill-advised as that may be), there is some value in having them appear in a noticeably distinct way. I would rather have non-strings look a little funky and span multiple lines than be confused with the repr of the objects.

e.g., consider how the repr of an xarray.Dataset looks:

In [16]: repr(xarray.Dataset({'foo': ('x', [1, 2, 3])}))[:70] + '...' Out[16]: '<xarray.Dataset>\nDimensions: (x: 3)\nDimensions without coordinates: x...'

Though, it might possibly make sense to "indent" their reprs on any newlines to align them with the indentation of first line.

@fmaussion @pwolfram any opinions here?

I think that indentation would help clarify. 👍

Indentation would certainly be an improvement, but I still strongly prefer to limit each attribute to a single line.

My motivation for this pull request is seeing a common pattern in notebooks dealing with NetCDF data of simply discarding the attrs metadata, because (a) the formatting is a mess and (b) it's a lot of output. Taking the example in #1319, this is a temporary file with "only" 29 attributes, of which eight are long enough to truncate and three include newlines. This is a short example. With larger workflows, this pattern destroys provenance and reproducibility - along with directly useful information!

If users need the unmodified attributes or are unsure about types, they can access them directly - so I would really prefer to keep the repr to a single line for each, no matter what.

In the examples you've seen, how often do these datasets with long lists of attributes include non-string attributes?

Rather, non-string attributes that have a repr that spans multiple lines.

It's not uncommon to find numbers or timestamps in the attributes, but it would surprise me to find something with a multiline repr. I'd still want to truncate it to one line in that case, but that's more of a personal preference I guess.

NetCDF files often have tens of attributes, including multi-paragraph summaries or the full modification history of the file. It's great to have this available in the .attrs, but we can truncate it substantially in the repr! Hopefully this will stop people writing `data.attrs = {}` and discarding metadata in interactive workflows for the sake of cleaner output.

Zac-HD · 2017-03-25T03:30:38Z

If the logic is "clever", time to ditch it. I've rebased to a clearer version which simply handles indentation and replaces tabs and newlines with their backslash representations.

What properties of this would you want to test?

Zac-HD · 2017-03-27T03:18:29Z

Ping @shoyer

shoyer · 2017-03-27T15:12:15Z

If the logic is "clever", time to ditch it. I've rebased to a clearer version which simply handles indentation and replaces tabs and newlines with their backslash representations.

"Clever" was possibly too strong of a word. I think both your original and new proposal are fine -- use your own best taste on which version is most readable.

It's more that we need unit tests for any functionality. Otherwise, things tend to brake inadvertently when someone else does code cleanup months or years later.

What properties of this would you want to test?

This could be super simple here, e.g., just a few assert statements verifying the text substitutions and truncation. Ideally we exercise every line of logic in your code.

Zac-HD · 2017-03-30T12:03:28Z

@shoyer - my preference is to process and truncate all strings, including object reprs. If you want to overrule that feel free; otherwise I think it's ready to merge.

shoyer · 2017-04-03T00:50:17Z

@Zac-HD My guess is that non-string multi-line attributes are quite rare, so in truth it doesn't really matter what we pick (even though my preference leaned the other way from yours). If someone cares enough to complain we can revisit this, but until then I'm OK with your choice.

shoyer · 2017-04-03T00:50:28Z

Thanks for the contribution!

shoyer reviewed Mar 24, 2017

View reviewed changes

Zac-HD force-pushed the master branch from dd98913 to 1e115fc Compare March 25, 2017 03:15

Add test for attribute repr

b93eaab

shoyer merged commit d08efaf into pydata:master Apr 3, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Shorter repr for attributes #1322

Shorter repr for attributes #1322

Zac-HD commented Mar 24, 2017 •

edited

Loading

shoyer left a comment

shoyer Mar 24, 2017

shoyer Mar 24, 2017

Zac-HD Mar 25, 2017

shoyer Mar 27, 2017

pwolfram Mar 27, 2017

Zac-HD Mar 28, 2017

shoyer Mar 28, 2017

shoyer Mar 28, 2017

Zac-HD Mar 28, 2017

Zac-HD commented Mar 25, 2017

Zac-HD commented Mar 27, 2017

shoyer commented Mar 27, 2017

Zac-HD commented Mar 30, 2017

shoyer commented Apr 3, 2017

shoyer commented Apr 3, 2017

Shorter repr for attributes #1322

Shorter repr for attributes #1322

Conversation

Zac-HD commented Mar 24, 2017 • edited Loading

shoyer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Zac-HD commented Mar 25, 2017

Zac-HD commented Mar 27, 2017

shoyer commented Mar 27, 2017

Zac-HD commented Mar 30, 2017

shoyer commented Apr 3, 2017

shoyer commented Apr 3, 2017

Zac-HD commented Mar 24, 2017 •

edited

Loading