CFTimeIndex calendar in repr #4092

aaronspring · 2020-05-25T15:55:20Z

Closes Indicate calendar type in CFTimeIndex repr #2416
Tests added
Passes isort -rc . && black . && mypy . && flake8
Fully documented, including whats-new.rst for all changes and api.rst for new API

Done:

added calendar property to CFTimeIndex
rebuild repr from pandas

spencerkclark

Thanks @aaronspring, sorry for taking a while to get back to you. These are good questions.

Inheritance is discouraged; how should I extend the __repr__ coming from pd.Index? Should I try to rebuild pd.Index?

I'll admit, I'm not an expert in writing array-like reprs. I agree though that we should do what we can to avoid relying on private pandas API. Have you looked into what this would entail? Is there a big web of internal methods we'd have to copy over?

How should we deal with calendar=365(6)_day, which gets internally converted into noleap/all_leap?

I think I am ok with this. My sense is that it is more important that indexes with the same date type have the same calendar attribute than that the calendar argument passed to cftime_range is propagated verbatim to the index it produces. What do you think?
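To make the alias question concrete, here is a minimal sketch of normalizing calendar names so that indexes with the same date type report the same calendar. The alias table and the `canonical_calendar` helper are hypothetical illustrations, not xarray or cftime API; the pairs follow the CF conventions' synonyms, and the choice of 'gregorian' as the canonical spelling simply mirrors the repr output shown later in this thread.

```python
# Hypothetical alias table: the CF conventions treat these names as synonyms.
CALENDAR_ALIASES = {
    "365_day": "noleap",
    "366_day": "all_leap",
    "standard": "gregorian",
}


def canonical_calendar(name):
    """Return one canonical spelling for a calendar name (illustrative only)."""
    key = name.lower()
    return CALENDAR_ALIASES.get(key, key)


# Both spellings end up with the same calendar attribute.
print(canonical_calendar("365_day"), canonical_calendar("noleap"))
```

With this kind of normalization, the argument passed by the user is not propagated verbatim, but two indexes built from "365_day" and "noleap" compare equal on their calendar.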

xarray/tests/test_cftimeindex.py

dcherian · 2020-06-02T02:38:14Z

how should I extend the __repr__ coming from pd.Index

if you define a __repr__ method under CFTimeIndex, won't it override pd.Index.__repr__?

aaronspring · 2020-06-02T08:10:24Z

how should I extend the __repr__ coming from pd.Index

if you define a __repr__ method under CFTimeIndex, won't it override pd.Index.__repr__?

Yes. I will do that. My question was whether I should replicate the pd.Index repr or try to import or inherit as much as I can from pandas.

dcherian · 2020-06-02T14:37:28Z

Ah now I understand your questions. Thanks for clarifying.

My question was whether I should replicate the pd.Index repr or try to import or inherit as much as I can from pandas.

Maybe start by making the smallest possible change to the pandas repr?

EDIT: If we want to build our own repr, there are some helpful functions in core/formatting.py

aaronspring · 2020-06-03T16:17:01Z

I was hoping to inherit from pandas like:

    def __repr__(self):
        # Reuse the parent repr and append the calendar before the closing ")".
        base = super().__repr__()
        return base.rstrip(")") + f", calendar='{self.calendar}')"

But I will now try to rebuild as in pandas.
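For readers following the override-versus-inherit discussion, a self-contained sketch of the idea is below. `BaseIndex` and `CalendarIndex` are hypothetical stand-ins for pd.Index and CFTimeIndex, so nothing here depends on pandas internals; it only illustrates that a subclass `__repr__` overrides the parent's while still being able to reuse it through `super()`.

```python
class BaseIndex:
    """Stand-in for pd.Index with a pandas-like repr (hypothetical)."""

    def __init__(self, values):
        self.values = list(values)

    def __repr__(self):
        return f"{type(self).__name__}({self.values!r}, dtype='object')"


class CalendarIndex(BaseIndex):
    """Stand-in for CFTimeIndex carrying a calendar attribute."""

    def __init__(self, values, calendar):
        super().__init__(values)
        self.calendar = calendar

    def __repr__(self):
        base = super().__repr__()
        # Drop only the final ")" and append the extra attribute.
        return base[:-1] + f", calendar='{self.calendar}')"


index = CalendarIndex(["2000-01-01", "2000-01-02"], calendar="noleap")
print(repr(index))
```

The weakness of this approach is that it depends on the exact shape of the parent's string, which is one reason rebuilding the repr is the more robust option.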

pep8speaks · 2020-06-03T21:28:21Z

Hello @aaronspring! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-07-19 17:01:25 UTC

aaronspring · 2020-06-04T11:02:00Z

This doesn't change the HTML repr yet: when clicking on the data symbol, the calendar is not shown in the printout. I will keep trying to fix this.

aaronspring · 2020-06-04T15:10:46Z

I do understand the meaning of this last sentence @spencerkclark

My sense is that it is more important that indexes with the same date type have the same calendar attribute than that the calendar argument passed to cftime_range is propagated verbatim to the index it produces. What do you think?

Should I rather test on cftimeindex created in a different way?

aaronspring · 2020-06-06T14:54:31Z

Currently only the cftimeindex repr shows the calendar property. I aim to get it into the dataset/dataarray repr.

Would it be a good idea to modify array_repr(array) in formatting.py? IndexVariable uses this repr. Once time is a coordinate, it becomes an IndexVariable, but the IndexVariable no longer has the calendar property. Should I try to add this calendar property to the array, or would that not be desired?

aaronspring · 2020-06-06T17:59:49Z

I found a workaround with to_index() in formatting.short_data_repr. This makes one test fail: xarray/tests/test_dask.py TestDataArrayAndDataset.test_dataarray_repr. Is this OK, or is it not a valid way to implement a CFTimeIndex repr?

EDIT: I now ensure that to_index is only called for a CFTimeIndex.

Ready for review.

spencerkclark

Thanks for the ping @aaronspring -- my apologies for being slow to respond! I appreciate the effort to go all the way toward reproducing the pandas repr. My remaining concern, and it is a little nit-picky, is that in its current form, there is no way to limit the width of the repr, e.g. with xarray.set_options(display_width=40) (see below for example for an existing xarray repr). Do you think you might be able to enable that?

In [1]: import xarray as xr
In [2]: ds = xr.tutorial.open_dataset("rasm")

In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (y, x) float64 ...
    yc       (y, x) float64 ...
Dimensions without coordinates: x, y
Data variables:
    Tair     (time, y, x) float64 ...
Attributes:
    title:                     /workspace/jhamman/processed/R1002RBRxaaa01a/l...
    institution:               U.W.
    source:                    RACM R1002RBRxaaa01a
    output_frequency:          daily
    output_mode:               averaged
    convention:                CF-1.4
    references:                Based on the initial model of Liang et al., 19...
    comment:                   Output from the Variable Infiltration Capacity...
    nco_openmp_thread_number:  1
    NCO:                       "4.6.0"
    history:                   Tue Dec 27 14:15:22 2016: ncatted -a dimension...

In [4]: xr.set_options(display_width=40)
Out[4]: <xarray.core.options.set_options at 0x7fbdc834a668>

In [5]: ds
Out[5]:
<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) object 1980-09-16...
    xc       (y, x) float64 ...
    yc       (y, x) float64 ...
Dimensions without coordinates: x, y
Data variables:
    Tair     (time, y, x) float64 ...
Attributes:
    title:                     /works...
    institution:               U.W.
    source:                    RACM R...
    output_frequency:          daily
    output_mode:               averaged
    convention:                CF-1.4
    references:                Based ...
    comment:                   Output...
    nco_openmp_thread_number:  1
    NCO:                       "4.6.0"
    history:                   Tue De...
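The effect of display_width in the session above can be approximated with a very small helper. This is only a sketch under the assumption that a line is cut at the width limit and terminated with "..."; xarray's own formatting code is more elaborate, and the `truncate` name is made up for this example.

```python
def truncate(text, max_width):
    """Cut text to max_width characters, ending with '...' if shortened."""
    if len(text) <= max_width:
        return text
    return text[: max(max_width - 3, 0)] + "..."


line = "  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00"
print(truncate(line, 40))
```

Lines that already fit, such as the institution attribute, pass through unchanged.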

spencerkclark · 2020-07-05T16:59:57Z

xarray/coding/cftimeindex.py

+        len_item = 19  # length of one item in repr
+        # shorten repr for more than 100 items
+        max_width = (19 + 1) * 100 if len(self) <= 100 else 22 * len_item
+        datastr = format_array_flat(self.values, max_width)


I think format_array_flat is a good choice if we want a 2-line repr, but for including more values it might be cleaner to write our own logic, rather than add commas and line breaks afterwards. As you've picked up on, I think the fact that we can treat the length of each element of the repr in a cftime array as a constant simplifies things greatly.

This is of course ignoring the potential for five-digit years; however, we already assume we won't see those in at least one other place in xarray (partial string indexing). At some point it might be good to address that, but I think for now it's ok to stick with the four-digit year assumption. Particularly here, I think the worst that would happen is that the repr might be a few characters wider than the imposed limit.

Yes, for now I use format_array_flat and insert commas and line breaks manually. Are you suggesting I should rather write a new function? This function would probably use much of the code of format_array_flat.

concerning the 5digit years: xr.cftime_range(start='10000',periods=2) fails with ValueError: no ISO-8601 match for string: 10000

I now wrote a new function format_cftimeindex_array like format_array_flat. Hope this is what you were hoping to see. Both functions share much of the code especially in the beginning of the function. Should I refactor these shared code parts into a small function that both functions use?

concerning the 5digit years: xr.cftime_range(start='10000',periods=2) fails with ValueError: no ISO-8601 match for string: 10000

Oh right, we use the same string parsing logic in cftime_range as in partial datetime string indexing. I was thinking of dates one might read in from a file, which get decoded through cftime.num2date. Anyway I acknowledge that is an issue we don't need to address at the moment!
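As an illustration of why a five-digit year is rejected by string parsing, consider a pattern that requires exactly four year digits. The regular expression and the `parse_date` helper below are simplified, hypothetical stand-ins rather than xarray's actual parser; only the error wording is copied from the message quoted above.

```python
import re

# Simplified, hypothetical pattern: exactly four year digits,
# optionally followed by -month and -day.
ISO8601_DATE = re.compile(
    r"^(?P<year>\d{4})(-(?P<month>\d{1,2})(-(?P<day>\d{1,2}))?)?$"
)


def parse_date(text):
    """Return the parsed date components, or raise if the string does not match."""
    match = ISO8601_DATE.match(text)
    if match is None:
        raise ValueError(f"no ISO-8601 match for string: {text}")
    return {key: int(value) for key, value in match.groupdict().items() if value}


print(parse_date("2000-01"))
```

Dates decoded from a file do not go through this kind of string matching, which is why they could still carry five-digit years.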

I now wrote a new function format_cftimeindex_array like format_array_flat. Hope this is what you were hoping to see.

Sorry I was thinking something more along these lines for the code that formats the times (the rest of the repr can be added around what it generates):

    import numpy as np

    CFTIME_REPR_LENGTH = 19  # width of one formatted time, e.g. "2000-01-01 00:00:00"


    def format_row(times, indent=0, separator=", ", row_end=",\n"):
        return indent * " " + separator.join(map(str, times)) + row_end


    def format_times(
        index, max_width, offset, separator=", ", first_row_offset=0,
        intermediate_row_end=",\n", last_row_end=""
    ):
        # Number of times that fit on one row, never fewer than one.
        n_per_row = max(max_width // (CFTIME_REPR_LENGTH + len(separator)), 1)
        n_rows = int(np.ceil(len(index) / n_per_row))

        representation = ""
        for row in range(n_rows):
            indent = first_row_offset if row == 0 else offset
            row_end = last_row_end if row == n_rows - 1 else intermediate_row_end
            times_for_row = index[row * n_per_row:(row + 1) * n_per_row]
            representation = representation + format_row(
                times_for_row, indent=indent, separator=separator, row_end=row_end
            )

        return representation

In other words iteratively generating each row in the repr, inserting the separator as you build each row, and inserting line breaks at the end of each row. I just find it fits in my head better than adding those elements post-hoc. I think you should be able to leverage the code above to construct a "split" repr as well (e.g. one that shows only the first and last 10 elements of the index) by calling format_times twice with the appropriate arguments.
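The "split" repr mentioned above could be assembled roughly as follows. This sketch restates the two helpers with `math.ceil` so it runs without NumPy and uses plain strings in place of cftime objects; `format_split` and its `n_edge` parameter are hypothetical names, not part of the PR.

```python
import math

CFTIME_REPR_LENGTH = 19  # e.g. "2000-01-01 00:00:00"


def format_row(times, indent=0, separator=", ", row_end=",\n"):
    return indent * " " + separator.join(map(str, times)) + row_end


def format_times(index, max_width, offset, separator=", ",
                 first_row_offset=0, intermediate_row_end=",\n",
                 last_row_end=""):
    n_per_row = max(max_width // (CFTIME_REPR_LENGTH + len(separator)), 1)
    n_rows = math.ceil(len(index) / n_per_row)
    representation = ""
    for row in range(n_rows):
        indent = first_row_offset if row == 0 else offset
        row_end = last_row_end if row == n_rows - 1 else intermediate_row_end
        times_for_row = index[row * n_per_row:(row + 1) * n_per_row]
        representation += format_row(
            times_for_row, indent=indent, separator=separator, row_end=row_end
        )
    return representation


def format_split(index, max_width, offset, n_edge=10):
    """Show only the first and last n_edge items (hypothetical helper)."""
    if len(index) <= 2 * n_edge:
        return format_times(index, max_width, offset)
    # First call: the head rows, each ending with a comma and a newline.
    head = format_times(index[:n_edge], max_width, offset, last_row_end=",\n")
    # Second call: the tail rows, indented from the very first row.
    tail = format_times(index[-n_edge:], max_width, offset, first_row_offset=offset)
    return head + offset * " " + "...\n" + tail


times = [f"2000-01-{day:02d} 00:00:00" for day in range(1, 32)]
print("[" + format_split(times, max_width=42, offset=1, n_edge=2) + "]")
```

The only difference between the two calls is which row endings and offsets are passed, which is what makes the row-by-row construction easy to reuse.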

Ah ok. Now I think I get the idea...

thanks for the really nice template

implemented thanks to your nice template given above. ready for review @spencerkclark

aaronspring · 2020-07-07T10:00:18Z

now aligns to display_width:

for dw in [40,60,80,120]:
    with xr.set_options(display_width=dw):
        print(time[:2],dw,'\n')

CFTimeIndex([2000-01-01 00:00:00,
             2000-01-02 00:00:00],
            dtype='object',
            length=2,
            calendar='gregorian') 40 

CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00],
            dtype='object', length=2, calendar='gregorian') 60 

CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00],
            dtype='object', length=2, calendar='gregorian') 80 

CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00], dtype='object', length=2, calendar='gregorian') 120
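One way to get the width-dependent attribute layout shown above is sketched here. The `build_repr` function and the simplified `format_attrs` signature are assumptions made for illustration (the PR's format_attrs takes the index itself), and the data string is passed in already wrapped.

```python
def format_attrs(attrs, separator=", "):
    """Join key=value pairs, e.g. dtype='object', length=2."""
    return separator.join(f"{key}={value!r}" for key, value in attrs.items())


def build_repr(name, data_str, attrs, display_width):
    """Assemble a repr, wrapping the attributes when they do not fit."""
    offset = len(name) + 1  # column just after "Name("
    indent = " " * offset
    one_line = f"{name}([{data_str}], {format_attrs(attrs)})"
    if len(one_line) <= display_width:
        return one_line
    attrs_str = format_attrs(attrs)
    if len(attrs_str) > display_width - offset:
        # Too wide for a single row: put each attribute on its own line.
        attrs_str = format_attrs(attrs, separator=",\n" + indent)
    return f"{name}([{data_str}],\n{indent}{attrs_str})"


attrs = {"dtype": "object", "length": 2, "calendar": "gregorian"}
data = "2000-01-01 00:00:00, 2000-01-02 00:00:00"
for width in (60, 120):
    print(build_repr("CFTimeIndex", data, attrs, width), width, "\n")
```

The three layouts (everything on one line, attributes on a second line, one attribute per line) fall out of two length comparisons against the display width.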

aaronspring · 2020-07-09T12:35:49Z

I don't understand why isort fails.

keewis · 2020-07-09T12:57:43Z

that's #4204. #4206 pinned the version of isort, so you should be able to get a green CI by merging master into your feature branch.

spencerkclark

Thanks @aaronspring; this is looking very close! Just a few more minor suggestions.

xarray/core/formatting.py

xarray/coding/cftimeindex.py

xarray/tests/test_cftimeindex.py

xarray/coding/cftimeindex.py

aaronspring · 2020-07-18T19:34:29Z

Thanks @aaronspring; this is looking very close! Just a few more minor suggestions.

I hope this is the final one. All tests pass, and I implemented your suggestions. It took me a few commits, but I learned a lot. Thanks for the guidance, @spencerkclark.

spencerkclark

Thanks @aaronspring! I pushed a few more minor edits, mostly to spruce up the documentation a bit. Otherwise this looks good to me. Barring any comments from others I'll merge it sometime next week.

doc/whats-new.rst

spencerkclark · 2020-07-23T10:42:40Z

Thanks again @aaronspring!

dcherian · 2020-07-23T14:47:19Z

Thanks @aaronspring this is a great contribution.

aaronspring · 2020-07-23T17:36:56Z

My pleasure. Learnt a lot.

It was also a long-standing issue, mentioned in a Medium article about xarray looking for more contributors.

* upstream/master:
  Added xarrays-spatial and updated geoviews link (pydata#4262)
  update docs to point to xarray-contrib and xarray-tutorial (pydata#4252)
  Add release summary, some touch-ups (pydata#4217)
  CFTimeIndex calendar in repr (pydata#4092)
  fix the RTD timeouts (pydata#4254)
  update isort CI and pre-commit hook (pydata#4204)

AS added 2 commits May 25, 2020 17:47

add property

0bd61d9

test repr skip

28cbf84

spencerkclark reviewed Jun 2, 2020

xarray/tests/test_cftimeindex.py (outdated)

AS added 2 commits June 3, 2020 21:37

repr

c5d36cc

linting

8d9ebe0

aaronspring requested a review from spencerkclark June 3, 2020 20:47

aaronspring changed the title ~~WIP: CFTimeIndex calendar in repr~~ CFTimeIndex calendar in repr Jun 3, 2020

remove unnecessary

c1cac9f

AS added 4 commits June 3, 2020 23:28

remove unnecessary

d09092a

add quotation marks to calendar

59dc912

add length to wrapper

1fccb53

linting

a52e997

aaronspring changed the title ~~CFTimeIndex calendar in repr~~ [WIP]: CFTimeIndex calendar in repr Jun 6, 2020

coords.to_index() if CFTimeIndex

6d022ea

AS added 4 commits June 7, 2020 01:09

to_index() iff CFTimeIndex

bf8c5a0

revert linting

1e809d0

revert linting

1bfdc7c

revert linting

d58cb7d

aaronspring changed the title ~~[WIP]: CFTimeIndex calendar in repr~~ CFTimeIndex calendar in repr Jun 6, 2020

aaronspring changed the title ~~CFTimeIndex calendar in repr~~ CFTimeIndex calendar in repr and coords repr from to_index Jun 6, 2020

aaronspring changed the title ~~CFTimeIndex calendar in repr and coords repr from to_index~~ WIP: CFTimeIndex calendar in repr Jun 9, 2020

spencerkclark reviewed Jul 6, 2020

sensitive to display_width

69d000f

aaronspring requested a review from spencerkclark July 7, 2020 10:10

AS added 2 commits July 7, 2020 16:06

rewritte format_cftimeindex_array from template of format_array_flat

80ca891

bugfix

3080d81

AS and others added 5 commits July 15, 2020 12:57

new approach

b77a2ee

Merge branch 'master' into AS_CFTimeIndex_repr_calendar

1e67f3f

docstring

d8d54ce

Merge branch 'AS_CFTimeIndex_repr_calendar' of https://github.com/aar…

4b62406

…onspring/xarray into AS_CFTimeIndex_repr_calendar

attrs spaces fix

249ae24

spencerkclark reviewed Jul 18, 2020

AS added 2 commits July 18, 2020 20:49

rm pandas test, refactor format_attrs and repr test dedent

ecef05a

rm f lint

7c31e3a

aaronspring requested a review from spencerkclark July 18, 2020 19:33

spencerkclark added 3 commits July 19, 2020 09:34

Pass index to format_attrs instead of attrs dict

683f00c

Update whats-new.rst

2707409

Add docstring for new calendar property

b7552b3

spencerkclark approved these changes Jul 19, 2020

keewis reviewed Jul 19, 2020

doc/whats-new.rst (outdated)

Update doc/whats-new.rst

e8d85db

Co-authored-by: keewis <[email protected]>

spencerkclark merged commit 0b706a4 into pydata:master Jul 23, 2020

spencerkclark mentioned this pull request Apr 16, 2021

Calendar utilities #5155

Closed
