-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds custom plot formatting for TimedeltaIndex. #15067
Conversation
I think this is ready to merge, it is functional for me now. |
Codecov Report
@@ Coverage Diff @@
## master #15067 +/- ##
==========================================
- Coverage 90.37% 85.52% -4.85%
==========================================
Files 135 145 +10
Lines 49476 51325 +1849
==========================================
- Hits 44713 43896 -817
- Misses 4763 7429 +2666
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking this @jgoppert. A few inline comments.
Could you also
- Add a note in
doc/source/visualization.rst
showing how to useplt.gcf().autofmt_xdate()
after plotting - Add a note in
doc/source/whatsnew/v0.20.0.txt
pandas/tseries/plotting.py
Outdated
raise IOError('index type not supported') | ||
|
||
# Note, DatetimeIndex does not use this | ||
# interface. DatetimeIndex uses matplotlib.date directly | ||
|
||
# x and y coord info | ||
subplot.format_coord = lambda t, y: ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the format_coord section to the period index section, the draw_if_interactive I left common.
pandas/tseries/plotting.py
Outdated
subplot.xaxis.set_major_formatter(majformatter) | ||
subplot.xaxis.set_minor_formatter(minformatter) | ||
|
||
elif isinstance(index, TimedeltaIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add tests for this in pandas/tests/plotting/test_datetimelike.py
?
You should just need a couple of datasets
- with
n_decimals > 9
- with
n_decimals < 9
In the tests, if you could check that the formatting is correct, and maybe check the number of ticklabels.
pandas/tseries/plotting.py
Outdated
@@ -278,8 +280,22 @@ def _maybe_convert_index(ax, data): | |||
# Patch methods for subplot. Only format_dateaxis is currently used. | |||
# Do we need the rest for convenience? | |||
|
|||
|
|||
def format_dateaxis(subplot, freq): | |||
def timeTicks(x, pos, n_decimals): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add tests for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, needs more work though, see comment below.
pandas/tseries/plotting.py
Outdated
@@ -278,8 +280,22 @@ def _maybe_convert_index(ax, data): | |||
# Patch methods for subplot. Only format_dateaxis is currently used. | |||
# Do we need the rest for convenience? | |||
|
|||
|
|||
def format_dateaxis(subplot, freq): | |||
def timeTicks(x, pos, n_decimals): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-> timeTicks -> format_time_ticks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
switched it to format_timedelta_ticks to be very clear it isn't being used for period index.
d, h = divmod(h, 24) | ||
decimals = int(ns * 10**(n_decimals - 9)) | ||
s = r'{:02d}:{:02d}:{:02d}'.format(int(h), int(m), int(s)) | ||
if n_decimals > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are pretty much reinveting the wheel here.
In [2]: pd.Timedelta(20,unit='s')
Out[2]: Timedelta('0 days 00:00:20')
In [3]: pd.Timedelta(20.5,unit='s')
Out[3]: Timedelta('0 days 00:00:20.500000')
In [4]: pd.Timedelta(120.5,unit='s')
Out[4]: Timedelta('0 days 00:02:00.500000')
# if you want to get the data out for specific formatting
In [5]: pd.Timedelta(120.5,unit='s').components
Out[5]: Components(days=0, hours=0, minutes=2, seconds=0, milliseconds=500, microseconds=0, nanoseconds=0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can simply call Timedelta(..)._repr_base(format='...')
with a new format if you really want it
In [16]: pd.Timedelta('1 day')._repr_base()
Out[16]: '1 days'
In [17]: pd.Timedelta('1 day')._repr_base(format='long')
Out[17]: '1 days 00:00:00'
In [18]: pd.Timedelta('1 day')._repr_base(format='all')
Out[18]: '1 days 00:00:00.000000000'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, python timedelta doesn't support nanoseconds.
In [4]: pd.Timedelta(1e-9, unit='s')
Out[4]: Timedelta('0 days 00:00:00')
In [5]: pd.Timedelta(1e-3, unit='s')
Out[5]: Timedelta('0 days 00:00:00.001000')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of course it does. This is pandas Timedelta. (you are looking at the printing), which by-default suppres it
In [6]: pd.Timedelta(1, unit='ns')._repr_base(format='all')
Out[6]: '0 days 00:00:00.000000001'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok great, got confused by the output. So how do I say I want n decimals, because that would be a drop-in for what I've already written.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, suppressing days when they are 0 would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the
Components(days=0, hours=0, minutes=2, seconds=0, milliseconds=500, microseconds=0, nanoseconds=0)
could compress the code a bit, but I'm not sure it would save significant space over what I've already got.
Still need to improve unit testing.
for l in xaxis.get_ticklabels(): | ||
if len(l.get_text()) > 0: | ||
self.assertEqual(l.get_rotation(), 30) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test needs more work.
For this test, we basically want to check that for a narrow window (as shown below) we get nanosecond precision, and if at 0 to start without days.
rng = timedelta_range('0', periods=3, freq='ns')
# rough code to follow
expected_labels = [
'00:00:00.000000001',
'00:00:00.000000002,
'00:00:00.000000001']
for label, expected_label in zip(ax.get_xtick_labels(), expected_labels):
# check that there are no days
self.assertEqual(label, expected_label)
We also want to check for a wide window that we have second precision.
rng = timedelta_range('0', periods=10, freq='1 d')
# test issue #8711 | ||
s = Series(range(5), timedelta_range('1day', periods=5)) | ||
_check_plot_works(s.plot) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea if check_plots_works will actually do anything useful here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This basically just tests that the plot call does not error. So, for example, that means that it will test that it does not raise a MemoryError like it does now in some cases
doc/source/whatsnew/v0.20.0.txt
Outdated
@@ -116,6 +116,7 @@ Other enhancements | |||
|
|||
- ``.select_dtypes()`` now allows the string 'datetimetz' to generically select datetimes with tz (:issue:`14910`) | |||
|
|||
- ``pd.TimedeltaIndex`` now has a custom datetick formatter specifically designed for nanosecond level precision (:issue:`8711`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a link to visualization.rst?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is certainly possible, we only have not really a section about timedelta plotting currently.
@TomAugspurger @jreback do you guys think you can help me clean this up. I'm not familiar with your docs and unit tests and have to get back to what I wrote this for today. I gave you push access to my branch since I can't push the branch to upstream. |
I just pushed the full implementation of the unit test. Can we merge this now? It is definitely an improvement over master's mem allocation error and we can revisit the small details later. |
@jgoppert things are merged when they pass all tests, have documentation, and unit tests to the satisfaction of the maintainers. This has not been reviewed as yet. of course if you or really anyone wants to pull in your commits on top of master, they can certainly do this. |
I understand, I'm just running out of development bandwidth and don't want it to stall. Let me know what the delta is and I'll try to do what I can. |
@jreback Any idea why clang 3.5 failed to set the xticks label? |
cc @tacaswell |
So weird that osx with python 3.5 keeps failing and all of the others work. I'm running out of ideas. |
doc/source/visualization.rst
Outdated
date tick adjustment from matplotlib for figures whose ticklabels overlap. | ||
|
||
See the :meth:`autofmt_xdate <matplotlib.figure.autofmt_xdate>` method and the | ||
`matplotlib documentation <http://matplotlib.org/api/figure_api.html#matplotlib.figure.Figure.autofmt_xdate>`__ for more. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would refer here to http://matplotlib.org/users/recipes.html#fixing-common-date-annoyances instead (the link to the api docs will also be generated through the :meth:..
you did, and is also included in this recipe section.
doc/source/whatsnew/v0.20.0.txt
Outdated
@@ -116,6 +116,7 @@ Other enhancements | |||
|
|||
- ``.select_dtypes()`` now allows the string 'datetimetz' to generically select datetimes with tz (:issue:`14910`) | |||
|
|||
- ``pd.TimedeltaIndex`` now has a custom datetick formatter specifically designed for nanosecond level precision (:issue:`8711`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is certainly possible, we only have not really a section about timedelta plotting currently.
def test_timedelta_plot(self): | ||
# test issue #8711 | ||
s = Series(range(5), timedelta_range('1day', periods=5)) | ||
_check_plot_works(s.plot) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add here the three examples you showed in the top post of the PR? (just with the same _check_plot
call as you already did for 1 series). At least one of them was producing a memory error currently, so it would be good to include that here to ensure this does not happen again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, just added the 2 easy ones above.
Still need to improve unit testing.
4d240a8
to
4eff697
Compare
@jorisvandenbossche I think I addressed your review. Hopefully mac passes CI now. |
pandas/tseries/plotting.py
Outdated
lambda x, pos: format_timedelta_ticks(x, pos, n_decimals)) | ||
subplot.xaxis.set_major_formatter(formatter) | ||
else: | ||
raise IOError('index type not supported') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use TypeError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
pandas/tseries/plotting.py
Outdated
def format_dateaxis(subplot, freq): | ||
def format_timedelta_ticks(x, pos, n_decimals): | ||
''' Convert seconds to 'D days HH:MM:SS.F' ''' | ||
s, ns = divmod(x, 1e9) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use triple double quotes not single quotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
pandas/tseries/plotting.py
Outdated
elif isinstance(index, TimedeltaIndex): | ||
from matplotlib import ticker | ||
(vmin, vmax) = tuple(subplot.xaxis.get_view_interval()) | ||
n_decimals = int(np.ceil(np.log10(100 * 1e9 / (vmax - vmin)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason not to make this interface similar to the above?
e.g. why not create a TimeSeries_TimedeltaFormatter ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in latest commit.
@jreback Any idea how to proceed with this. Can we put a skip in for mac and file an issue? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, it would be OK to skip those two tests on travis for mac for now.
Tests that check the exact labels are not that robust anyway, as it can change with matplotlib versions.
What you could test additionaly (and what should pass on mac as well), is that the ax.xaxis.get_major_formatter
is the expected formatter (the one you now coded), and pass in a known value to check the formatting.
""" | ||
|
||
@staticmethod | ||
def format_timedelta_ticks(x, pos, n_decimals): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can remove the pos
here
def format_timedelta_ticks(x, pos, n_decimals): | ||
""" | ||
Convert seconds to 'D days HH:MM:SS.F' | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function can be removed no that you moved it to a Formatter?
df = DataFrame(np.random.randn(len(rng), 3), rng) | ||
ax = df.plot(fontsize=2) | ||
fig = ax.get_figure() | ||
fig.canvas.draw() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think those two lines above are needed?
(and the same for the test above)
Roughly I mean something like:
|
I replicated the exact travis env for 3.5 / OSX and these test passed for me. so ok to skip i guess (though I wonder if these have something to do with some output settings, e.g. LOCALE). |
This is not unreasonable. It might be worth adding a |
…ion display issues xref #15067
thanks @jgoppert I merged this (and skip those 2 comparisons on mac). They seem kind of odd, though I couldn't reproduce them with a very similar local environment. If you want to keep looking at them great (otherwise ok too). Similarly if you want to do a mouse-over formatter as @tacaswell suggest would be great as well. |
hmm the tick locations look a bit odd, is this normal? (this is the just merged PR) |
Thanks for getting this in: When you do rot=45, it works pretty well, but it does't align to tics, the autofmt_xdate does. Also, internally the tics are just set by the nanoseconds, so it isn't intelligent about frequency. If you look at the plot you will notice that the tic locations are at meaningful locations for nanoseconds. Staring at 1 day with a period of 5 starts at an odd place in terms of nanoseconds. If you look at when I just do seconds, it works well because seconds are just related to nano-seconds via a power of 10. |
thanks for the expl @jgoppert |
Author: James Goppert <[email protected]> Author: James Goppert <[email protected]> Closes pandas-dev#8711 Closes pandas-dev#15067 from jgoppert/tdi_plot_fix and squashes the following commits: 945ec14 [James Goppert] Merge branch 'master' into tdi_plot_fix 7db61ec [James Goppert] Create TimeSeries_TimedeltaFormatter. 232efe6 [James Goppert] Fix comment format and exception type for tdi plotting. 4eff697 [James Goppert] Add more time delta series plotting tests. f5f32bc [James Goppert] Link time delta index docs to better matplotlib docs. d588c2c [James Goppert] Fixes test for tdi w/o autofmt_xdate. b6e6a81 [James Goppert] Disables autofmt_xdate testing. c7851e3 [James Goppert] Adjusts tdi test draw calls to try to fix CI issue. 7d28842 [James Goppert] Switch to draw_idle to try to fix bug on xticks update. 3abc310 [James Goppert] Try plt.draw() instead of canvas.draw() to fix issue on osx 3.5. 91954bd [James Goppert] Finished unit test for timedelta plotting. 41ebc85 [James Goppert] Fixes for review comments from pandas-dev#15067. f021cbd [James Goppert] Support nano-second level precision x-axis labels. 5ec65fa [James Goppert] Plot fix for tdi and added more comments. b967d24 [James Goppert] flake8 fixes for tdi plotting. efe5636 [James Goppert] Adds custom plot formatting for TimedeltaIndex.
…ion display issues xref pandas-dev#15067
it works only for fixed time step between points on the plot consider following code:
|
Instead of hacking the PeriodIndex plotting code to work for TimedeltaIndex, this just creates a simple formatter unique to TimedeltaIndex. It does fine for regular intervals up to a day, then the labels and run into eachothter. If this happens, you can use plt.autofmt_xdate() to fix it.
git diff upstream/master | flake8 --diff