CFTimeIndex calendar in repr #4092

Merged
merged 37 commits into from
Jul 23, 2020
Changes from 23 commits
Commits
0bd61d9
add property
May 25, 2020
28cbf84
test repr skip
May 25, 2020
c5d36cc
repr
Jun 3, 2020
8d9ebe0
linting
Jun 3, 2020
c1cac9f
remove unnecessary
Jun 3, 2020
d09092a
remove unnecessary
Jun 3, 2020
59dc912
add quotation marks to calendar
Jun 4, 2020
1fccb53
add length to wrapper
Jun 4, 2020
a52e997
linting
Jun 4, 2020
6d022ea
coords.to_index() if CFTimeIndex
Jun 6, 2020
bf8c5a0
to_index() iff CFTimeIndex
Jun 6, 2020
1e809d0
revert linting
Jun 6, 2020
1bfdc7c
revert linting
Jun 6, 2020
d58cb7d
revert linting
Jun 6, 2020
a0d00ab
to_index in short_data_repr_html
Jun 9, 2020
019d309
refine test and rm prints
Jun 9, 2020
269e967
fix to pass all tests
Jun 9, 2020
68a37c6
revert linting changes
Jun 11, 2020
0907cef
revert to_index()
Jun 12, 2020
a9c048f
require cftime for added test
Jun 12, 2020
fcb48fd
implement format_array_flat repr without commata and multiple lines
Jun 14, 2020
83839ec
reproduce pd.Index repr for CFTimeIndex repr
Jun 15, 2020
e3c8c01
reproduce pd.Index repr for CFTimeIndex repr
Jun 15, 2020
69d000f
sensitive to display_width
Jul 7, 2020
80ca891
rewritte format_cftimeindex_array from template of format_array_flat
Jul 7, 2020
3080d81
bugfix
Jul 7, 2020
b77a2ee
new approach
Jul 15, 2020
1e67f3f
Merge branch 'master' into AS_CFTimeIndex_repr_calendar
aaronspring Jul 15, 2020
d8d54ce
docstring
Jul 15, 2020
4b62406
Merge branch 'AS_CFTimeIndex_repr_calendar' of https://github.com/aar…
Jul 15, 2020
249ae24
attrs spaces fix
Jul 15, 2020
ecef05a
rm pandas test, refactor format_attrs and repr test dedent
Jul 18, 2020
7c31e3a
rm f lint
Jul 18, 2020
683f00c
Pass index to format_attrs instead of attrs dict
spencerkclark Jul 19, 2020
2707409
Update whats-new.rst
spencerkclark Jul 19, 2020
b7552b3
Add docstring for new calendar property
spencerkclark Jul 19, 2020
e8d85db
Update doc/whats-new.rst
spencerkclark Jul 19, 2020
3 changes: 3 additions & 0 deletions doc/whats-new.rst
@@ -82,6 +82,9 @@ New Features
:py:func:`xarray.decode_cf`) that allows to disable/enable the decoding of timedeltas
independently of time decoding (:issue:`1621`)
`Aureliana Barghini <https://github.com/aurghs>`
- Add ``calendar`` as a new property for ``CFTimeIndex`` and show ``calendar`` and
``length`` in ``CFTimeIndex.__repr__`` (:issue:`2416`, :pull:`4092`)
`Aaron Spring <https://github.com/aaronspring>`

Bug fixes
~~~~~~~~~
67 changes: 67 additions & 0 deletions xarray/coding/cftimeindex.py
@@ -50,6 +50,7 @@
from xarray.core.utils import is_scalar

from ..core.common import _contains_cftime_datetimes
from ..core.formatting import format_array_flat
from .times import _STANDARD_CALENDARS, cftime_to_nptime, infer_calendar_name


@@ -259,6 +260,66 @@ def __new__(cls, data, name=None):
        result._cache = {}
        return result

    def __repr__(self):
        """
        Return a string representation for this object.

        copied from pandas.io.printing.py
        except for attrs.append(("calendar", self.calendar))
        """
        klass_name = type(self).__name__
        len_item = 19  # length of one item in repr
        # shorten repr for more than 100 items
        max_width = (19 + 1) * 100 if len(self) <= 100 else 22 * len_item
        datastr = format_array_flat(self.values, max_width)
Member

I think format_array_flat is a good choice if we want a 2-line repr, but for including more values it might be cleaner to write our own logic, rather than add commas and line breaks afterwards. As you've picked up on, I think the fact that we can treat the length of each element of the repr in a cftime array as a constant simplifies things greatly.

This is of course ignoring the potential for five-digit years; however, we already assume we won't see those in at least one other place in xarray (partial string indexing). At some point it might be good to address that, but I think for now it's ok to stick with the four-digit year assumption. Particularly here, I think the worst that would happen is that the repr might potentially be a few characters wider than the imposed limit.
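
To make the fixed-width assumption concrete: a `YYYY-MM-DD HH:MM:SS` timestamp with a four-digit year is always 19 characters wide, and a five-digit year adds one more. The sketch below uses plain string formatting as a stand-in for the string form of a cftime date; `format_timestamp` is a hypothetical helper and not part of xarray or cftime.

```python
def format_timestamp(year, month=1, day=1, hour=0, minute=0, second=0):
    """Hypothetical stand-in for the string form of a cftime date."""
    return f"{year:04d}-{month:02d}-{day:02d} {hour:02d}:{minute:02d}:{second:02d}"


four_digit = format_timestamp(2000)
five_digit = format_timestamp(10000)

print(four_digit, len(four_digit))  # 2000-01-01 00:00:00 19
print(five_digit, len(five_digit))  # 10000-01-01 00:00:00 20
```

With a five-digit year each element is one character wider, so a row sized for 19-character items would overshoot the width limit by one character per element.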

Contributor Author

Yes, for now I use format_array_flat and insert commas and line breaks manually. Are you suggesting I should rather write a new function? This function would probably use much of the code of format_array_flat.

Concerning the five-digit years: xr.cftime_range(start='10000', periods=2) fails with ValueError: no ISO-8601 match for string: 10000

Contributor Author

I have now written a new function, format_cftimeindex_array, modeled on format_array_flat. I hope this is what you had in mind. The two functions share much of their code, especially at the beginning. Should I refactor the shared parts into a small helper that both functions use?

Member

> concerning the 5digit years: xr.cftime_range(start='10000',periods=2) fails with ValueError: no ISO-8601 match for string: 10000

Oh right, we use the same string parsing logic in cftime_range as in partial datetime string indexing. I was thinking of dates one might read in from a file, which get decoded through cftime.num2date. Anyway I acknowledge that is an issue we don't need to address at the moment!
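
The `ValueError` quoted above is what you would expect from a parser that only accepts a four-digit year. A simplified sketch of that kind of check follows, assuming a basic `YYYY[-MM[-DD]]` pattern; the regular expression and `parse_iso8601_like` are illustrative only and are not xarray's actual parsing code.

```python
import re

# Simplified pattern: exactly four year digits, then optional month and day.
_PATTERN = re.compile(r"^(?P<year>\d{4})(-(?P<month>\d{2})(-(?P<day>\d{2}))?)?$")


def parse_iso8601_like(text):
    """Return the parsed date components, or raise if the string does not match."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"no ISO-8601 match for string: {text}")
    return {key: int(value) for key, value in match.groupdict().items() if value is not None}


print(parse_iso8601_like("2000-01"))  # {'year': 2000, 'month': 1}

try:
    parse_iso8601_like("10000")
except ValueError as err:
    print(err)  # no ISO-8601 match for string: 10000
```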

> I now wrote a new function format_cftimeindex_array like format_array_flat. Hope this is what you were hoping to see.

Sorry, I was thinking of something more along these lines for the code that formats the times (the rest of the repr can be added around what it generates):

import numpy as np

CFTIME_REPR_LENGTH = 19

def format_row(times, indent=0, separator=", ", row_end=",\n"):
    return indent * " " + separator.join(map(str, times)) + row_end


def format_times(
    index,
    max_width,
    offset,
    separator=", ",
    first_row_offset=0,
    intermediate_row_end=",\n",
    last_row_end=""
):
    n_per_row = max(max_width // (CFTIME_REPR_LENGTH + len(separator)), 1)
    n_rows = int(np.ceil(len(index) / n_per_row))
    
    representation = ""
    for row in range(n_rows):
        indent = first_row_offset if row == 0 else offset
        row_end = last_row_end if row == n_rows - 1 else intermediate_row_end  
        times_for_row = index[row * n_per_row:(row + 1) * n_per_row]
        representation = representation + format_row(
            times_for_row,
            indent=indent,
            separator=separator,
            row_end=row_end
        )
        
    return representation

In other words iteratively generating each row in the repr, inserting the separator as you build each row, and inserting line breaks at the end of each row. I just find it fits in my head better than adding those elements post-hoc. I think you should be able to leverage the code above to construct a "split" repr as well (e.g. one that shows only the first and last 10 elements of the index) by calling format_times twice with the appropriate arguments.
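
As a rough illustration of how the template above could be driven, the sketch below builds both a full repr and a "split" repr. It restates the two functions so that it runs on its own, swaps `np.ceil` for `math.ceil` to stay within the standard library, and uses plain strings in place of cftime objects. The prefix, the 80-character width and the choice of four head and tail elements are assumptions made for the example.

```python
import math

CFTIME_REPR_LENGTH = 19


def format_row(times, indent=0, separator=", ", row_end=",\n"):
    return indent * " " + separator.join(map(str, times)) + row_end


def format_times(index, max_width, offset, separator=", ",
                 first_row_offset=0, intermediate_row_end=",\n", last_row_end=""):
    n_per_row = max(max_width // (CFTIME_REPR_LENGTH + len(separator)), 1)
    n_rows = math.ceil(len(index) / n_per_row)
    rows = []
    for row in range(n_rows):
        indent = first_row_offset if row == 0 else offset
        row_end = last_row_end if row == n_rows - 1 else intermediate_row_end
        rows.append(format_row(index[row * n_per_row:(row + 1) * n_per_row],
                               indent=indent, separator=separator, row_end=row_end))
    return "".join(rows)


# Stand-in data: plain strings instead of cftime objects (assumption).
times = [f"2000-01-{day:02d} 00:00:00" for day in range(1, 13)]
prefix = "CFTimeIndex(["
offset = len(prefix)
width = 80 - offset  # room left for timestamps on each line

# Full repr: the first row follows the prefix, later rows are indented.
full = prefix + format_times(times, width, offset) + "]"

# Split repr: call format_times twice, for the first and last four elements.
head = format_times(times[:4], width, offset, last_row_end=",\n")
tail = format_times(times[-4:], width, offset, first_row_offset=offset)
split = prefix + head + " " * offset + "...\n" + tail + "]"

print(full)
print(split)
```

Because each row is assembled with its indent and row ending in one step, the same function serves both halves of the split repr; only `first_row_offset` and `last_row_end` differ between the two calls.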

Contributor Author

Ah ok. Now I think I get the idea...

Contributor Author

thanks for the really nice template

Contributor Author

implemented thanks to your nice template given above. ready for review @spencerkclark


        def join_every_second(s, sep=" ", join=", "):
            # to formatting.py
            """Join every second item after split(sep)."""
            ss = s.split(sep)
            sj = [x + " " + y for x, y in zip(ss[0::2], ss[1::2])]
            return join.join(sj)

        linebreak_spaces = " " * len(klass_name)
        linebreak_add = linebreak_spaces + " "

        def insert_linebreak_after_three(s, sep=",", linebreak=" "):
            """Linebreak after three items split(sep)."""
            s_sep = s.split(sep)
            for i in range(len(s_sep)):
                if i % 3 == 0 and i != 0:
                    s_sep[i] = f"\n{linebreak}{s_sep[i]}"
            return sep.join(s_sep)

        if datastr:
            if len(self) <= 3:
                datastr = join_every_second(datastr)
            else:
                sepstr = "..."
                if sepstr in datastr:
                    firststr, laststr = datastr.split(f" {sepstr} ")
                    firststr = insert_linebreak_after_three(
                        join_every_second(firststr), linebreak=linebreak_add)
                    laststr = insert_linebreak_after_three(
                        join_every_second(laststr), linebreak=linebreak_add)
                    datastr = f"{firststr},\n{linebreak_spaces} {sepstr}\n{linebreak_spaces} {laststr}"
                else:
                    datastr = insert_linebreak_after_three(
                        join_every_second(datastr), linebreak=linebreak_add
                    )

        attrs = {
            "dtype": f"'{self.dtype}'",
            "length": f"{len(self)}",
            "calendar": f"'{self.calendar}'",
        }
        attrs_str = [f"{k}={v}" for k, v in attrs.items()]
        prepr = f",{' '}".join(attrs_str)
        if len(self) <= 3:
            return f"{klass_name}([{datastr}], {prepr})"
        else:
            return f"{klass_name}([{datastr}],\n{linebreak_spaces} {prepr})"
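
To see what the two nested helpers in this hunk do to a flat string, here is a standalone sketch. The helper bodies follow the hunk; the input string is an assumed shape (space-separated `date time` tokens) rather than real `format_array_flat` output, and the 12-space `linebreak` value is chosen for the example.

```python
def join_every_second(s, sep=" ", join=", "):
    """Re-pair 'date time' tokens and join the pairs with commas."""
    ss = s.split(sep)
    return join.join(x + " " + y for x, y in zip(ss[0::2], ss[1::2]))


def insert_linebreak_after_three(s, sep=",", linebreak=" "):
    """Start a new line after every third comma-separated item."""
    parts = s.split(sep)
    for i in range(len(parts)):
        if i % 3 == 0 and i != 0:
            parts[i] = f"\n{linebreak}{parts[i]}"
    return sep.join(parts)


# Assumed input shape: timestamps separated by single spaces.
flat = " ".join(f"2000-01-{day:02d} 00:00:00" for day in range(1, 5))

joined = join_every_second(flat)
wrapped = insert_linebreak_after_three(joined, linebreak=" " * 12)

print(joined)
print(wrapped)
```

Each item after the first keeps the leading space left over from the `", "` separator, so a 12-space `linebreak` places the wrapped timestamp at column 13, directly under the first one after `CFTimeIndex([`.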

    def _partial_date_slice(self, resolution, parsed):
        """Adapted from
        pandas.tseries.index.DatetimeIndex._partial_date_slice
@@ -581,6 +642,12 @@ def asi8(self):
            ]
        )

    @property
    def calendar(self):
        from .times import infer_calendar_name

        return infer_calendar_name(self)
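
The property above is a thin, read-only wrapper that delegates to a helper each time it is accessed. A minimal stand-in sketch of that pattern follows; it does not import xarray or cftime. `FakeDate`, `MiniIndex` and `infer_calendar_name_sketch` are hypothetical names, and reading the calendar from the first element is an assumption about how such a helper might work.

```python
class FakeDate:
    """Hypothetical stand-in for a cftime date that carries a calendar name."""

    def __init__(self, calendar):
        self.calendar = calendar


def infer_calendar_name_sketch(dates):
    """Assumed behaviour: report the calendar of the first element."""
    return dates[0].calendar


class MiniIndex:
    """Hypothetical index class exposing a read-only calendar property."""

    def __init__(self, dates):
        self._dates = list(dates)

    @property
    def calendar(self):
        # Computed on access, so it always reflects the stored dates.
        return infer_calendar_name_sketch(self._dates)


index = MiniIndex([FakeDate("noleap") for _ in range(3)])
print(index.calendar)  # noleap
```

Since no setter is defined, assigning to `index.calendar` raises `AttributeError`, which matches the idea that the calendar is derived from the data and not stored separately.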

    def _round_via_method(self, freq, method):
        """Round dates using a specified method."""
        from .cftime_offsets import CFTIME_TICKS, to_offset
103 changes: 103 additions & 0 deletions xarray/tests/test_cftimeindex.py
@@ -884,6 +884,109 @@ def test_cftimeindex_shift_invalid_freq():
        index.shift(1, 1)


@requires_cftime
@pytest.mark.parametrize(
    ("calendar", "expected"),
    [
        ("noleap", "noleap"),
        ("365_day", "noleap"),
        ("360_day", "360_day"),
        ("julian", "julian"),
        ("gregorian", "gregorian"),
        ("proleptic_gregorian", "proleptic_gregorian"),
    ],
)
def test_cftimeindex_calendar_property(calendar, expected):
    index = xr.cftime_range(start="2000", periods=3, calendar=calendar)
    assert index.calendar == expected


@requires_cftime
@pytest.mark.parametrize(
    ("calendar", "expected"),
    [
        ("noleap", "noleap"),
        ("365_day", "noleap"),
        ("360_day", "360_day"),
        ("julian", "julian"),
        ("gregorian", "gregorian"),
        ("proleptic_gregorian", "proleptic_gregorian"),
    ],
)
def test_cftimeindex_calendar_repr(calendar, expected):
    """Test that cftimeindex has calendar property in repr."""
    index = xr.cftime_range(start="2000", periods=3, calendar=calendar)
    repr_str = index.__repr__()
    assert f" calendar='{expected}'" in repr_str
    assert "2000-01-01 00:00:00, 2000-01-02 00:00:00" in repr_str


@requires_cftime
@pytest.mark.parametrize("periods", [2, 40])
def test_cftimeindex_periods_repr(periods):
    """Test that cftimeindex has length property in repr."""
    index = xr.cftime_range(start="2000", periods=periods)
    repr_str = index.__repr__()
    assert f" length={periods}" in repr_str


@requires_cftime
@pytest.mark.parametrize("periods", [2, 3, 4, 100, 101])
def test_cftimeindex_repr_formatting(periods):
    """Test that cftimeindex.__repr__ is formatted as pd.Index.__repr__."""
    index = xr.cftime_range(start="2000", periods=periods)
    repr_str = index.__repr__()
    print(repr_str)
    # check for commas
    assert "2000-01-01 00:00:00, 2000-01-02 00:00:00" in repr_str
    if periods <= 3:
        assert "\n" not in repr_str
        "CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00, 2000-01-03 00:00:00], dtype='object', calendar='standard')" == repr_str
    else:
        # check for linebreak
        assert ", 2000-01-03 00:00:00,\n" in repr_str
        # check that times have the same indent
        lines = repr_str.split("\n")
        firststr = "2000"
        assert lines[0].find(firststr) == lines[1].find(firststr)
        # check that the attrs line has one less indent than the times
        assert lines[-1].find("dtype") + 1 == lines[0].find(firststr)
    # check for ... separation dots
    if periods > 100:
        assert "..." in repr_str
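
The alignment checks in this test rely on `str.find` returning the column of the first match on each line. A small sketch of the same idea on a hand-written sample string follows; the sample is made up for the example and is not captured output from xarray.

```python
# Hand-written sample repr (assumption), laid out like a multi-line index repr.
sample = (
    "CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00, 2000-01-03 00:00:00,\n"
    "             2000-01-04 00:00:00],\n"
    "            dtype='object', length=4, calendar='gregorian')"
)

lines = sample.split("\n")

# Column at which the first timestamp starts on each data line.
time_columns = [line.find("2000") for line in lines[:-1]]
# Column at which the attributes start on the last line.
attrs_column = lines[-1].find("dtype")

print(time_columns)  # [13, 13]
print(attrs_column)  # 12
```

Equal entries in `time_columns` mean the timestamps line up, and `attrs_column + 1 == time_columns[0]` is the "one less indent" condition that the test asserts.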


@requires_cftime
@pytest.mark.parametrize("periods", [22, 50, 100])
def test_cftimeindex_repr_101_shorter(periods):
    index_101 = xr.cftime_range(start="2000", periods=101)
    index_periods = xr.cftime_range(start="2000", periods=periods)
    index_101_repr_str = index_101.__repr__()
    index_periods_repr_str = index_periods.__repr__()
    assert len(index_101_repr_str) < len(index_periods_repr_str)


@requires_cftime
@pytest.mark.parametrize("periods", [3, 4, 100, 101])
def test_cftimeindex_repr_compare_pandasIndex(periods):
    cfindex = xr.cftime_range(start="2000", periods=periods)
    pdindex = pd.Index(cfindex)
    cfindex_repr_str = cfindex.__repr__()
    pdindex_repr_str = pdindex.__repr__()
    pdindex_repr_str = pdindex_repr_str.replace("Index", "CFTimeIndex")
    pdindex_repr_str = pdindex_repr_str.replace(f"\n{' '*7}", f"\n{' '*13}")
    if periods > 3:
        pdindex_repr_str = pdindex_repr_str.replace("dtype", f"{' '*6}dtype")
    if periods <= 100:
        lengthstr = f"length={periods}, "
    else:
        lengthstr = ""
    pdindex_repr_str = pdindex_repr_str.replace(
        ")", f", {lengthstr}calendar='gregorian')"
    )
    assert pdindex_repr_str == cfindex_repr_str


@requires_cftime
def test_parse_array_of_cftime_strings():
    from cftime import DatetimeNoLeap