COMPAT: Allow multi-indexes to be written to excel #10570

Merged
merged 1 commit into from
Aug 20, 2015

Conversation

flamingbear
Contributor

(Even though they cannot be read back in)

Closes #10564

('2014','weight')])
df = pd.DataFrame(np.random.randn(10,3), columns=cols)
import warnings
with warnings.catch_warnings(record=True) as w:
Contributor

use tm.assert_produces_warning
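
For illustration, the difference between recording warnings by hand and using an asserting context manager can be sketched with the standard library alone. `assert_warns_sketch` and `noisy` below are hypothetical stand-ins written for this example; this is not the pandas helper itself, only the general shape of such a helper.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def assert_warns_sketch(expected=UserWarning):
    """Hypothetical helper: fail unless the block emits an `expected` warning."""
    with warnings.catch_warnings(record=True) as caught:
        # Make sure no earlier filter hides the warning under test.
        warnings.simplefilter("always")
        yield caught
    if not any(issubclass(w.category, expected) for w in caught):
        raise AssertionError("no %s was emitted" % expected.__name__)


def noisy():
    # Stand-in for the code under test.
    warnings.warn("one-way write", UserWarning)


# The test body shrinks to the call itself; the check happens on exit.
with assert_warns_sketch(UserWarning) as caught:
    noisy()
n_caught = len(caught)
```

The point of such a helper is that the test no longer has to inspect the recorded list itself.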

@jorisvandenbossche
Member

Is this warn_roundtrip_serialization keyword needed?

I agree it can be annoying to not be able to remove a warning, but a keyword just for this that maybe is not needed anymore in a next release seems also a bit unnecessary

@jreback
Contributor

jreback commented Jul 15, 2015

another way to do this is to make a custom warning class (e.g. inherit from UserWarning). Then always show the warning. If the users don't like it they can always filter. I DO think this is important to always warn as you are doing something which is not reversible.
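
A minimal sketch of that idea, using only the standard library. The names `OneWayExcelWarning` and `write_multiindex_columns` are hypothetical and chosen for illustration; they are not taken from pandas.

```python
import warnings


class OneWayExcelWarning(UserWarning):
    """Hypothetical warning category for writes that cannot be read back."""


def write_multiindex_columns():
    # Stand-in for the real writer: it always warns and takes no keyword
    # to silence the warning.
    warnings.warn("output cannot be read back unchanged", OneWayExcelWarning)
    return "written"


# Default behaviour: the warning is emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    write_multiindex_columns()
emitted_by_default = len(caught)

# A user who does not want it filters on the custom category only;
# other UserWarnings still get through.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.simplefilter("ignore", category=OneWayExcelWarning)
    write_multiindex_columns()
    warnings.warn("unrelated", UserWarning)
emitted_after_filter = [w.category for w in caught]
```

Because the filter targets the subclass, silencing this one warning does not hide other `UserWarning`s.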

@jreback added the IO Excel (read_excel, to_excel), MultiIndex, and Compat (pandas objects compatibility with NumPy or Python functions) labels Jul 15, 2015
@jreback added this to the 0.17.0 milestone Jul 15, 2015
@@ -119,6 +119,7 @@ Other API Changes
- Enable serialization of lists and dicts to strings in ExcelWriter (:issue:`8188`)
- Allow passing `kwargs` to the interpolation methods (:issue:`10378`).
- Serialize metadata properties of subclasses of pandas objects (:issue:`10553`).
- Allow multi-indexes columns to be written to Excel with a one way serialization (:issue: `10564`).
Contributor

I think this needs a bit more explanation (e.g. we removed this in 0.16.2 (or was it 1?)) because of the one-way serialization.

Contributor Author

How about:

+- Allow ``DataFrame`` with ``MultiIndex`` columns to be written to Excel (:issue: `10564`). The original fix implemented for (:issue:`9794`) was slightly overkill as only ``DataFrame``s without indexes turned out to be broken Excel files.

Contributor

more like:

this was changed in 0.16.2 as the read-back method could not always guarantee perfect fidelity.

@jorisvandenbossche
Member

It is maybe not reversible with a simple read_excel, but it is possible to get the same as original with a little bit of processing.

BTW, the docstring of read_excel is not fully correct, as it supports creating a multi-index index with index_col (it just can't deal with the formatting of multi-indices by to_excel)

@jorisvandenbossche
Member

And, @jreback, I like the idea of having a global way to turn off this kind of UserWarning more than special keywords all over the place (and if we want an argument for it, I would rather go with something more general, like silent or verbose)

@jreback
Contributor

jreback commented Jul 15, 2015

@jorisvandenbossche

hmm, ok could have io.excel.warnings='verbose' (for an option).

Though maybe a more general io.warnings='verbose'

@flamingbear
Contributor Author

Is this still being considered?

@jreback
Contributor

jreback commented Jul 30, 2015

can you rebase / squash?

@jorisvandenbossche what do you think about verbose

@flamingbear
Contributor Author

Yeah, I'll see if I can pull in the master and squash.

@flamingbear
Contributor Author

Sorry I'm bungling this at the moment.

@flamingbear
Contributor Author

Ok. Got my git figured out. Hopefully tests will still pass. Thanks.

"not yet implemented.")
if not index:
    raise NotImplementedError("Writing to Excel with MultiIndex"
                              " columns and no index is not yet"
Member

Can you change 'and no index' to "and with index=False"? Otherwise it could be read as the DataFrame having no index.

@jorisvandenbossche
Member

In any case, I think verbose is much better than the original name.
I think I also like that you can set an option for such things, but maybe both approaches do not exclude each other for now?

@@ -1305,11 +1309,17 @@ def to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='',
"""
from pandas.io.excel import ExcelWriter
if self.columns.nlevels > 1:
    raise NotImplementedError("Writing as Excel with a MultiIndex is "
Contributor

I would move this entire thing to core.format.ExcelFormatter; I didn't realize it was here.

Contributor Author

I'm not exactly sure what you're looking for. Just the 'raise NotImpl...' moved into the ExcelFormatter init? or where?

Contributor

this type of checking on the index should go about here. Then all of this code will be in one place.

@flamingbear
Contributor Author

Ok. Couldn't figure out what to do with the verbose keyword, so I improvised and copied someone else's custom warning.

@jreback
Contributor

jreback commented Aug 14, 2015

You still need to accept the verbose kw in DataFrame.to_excel and pass it through as another parameter, then reference it in the formatter. A custom warning is fine, though emitting it should still be controlled by verbose as before.
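
A rough sketch of that pass-through, under stated assumptions: `ExcelFormatterSketch` and `to_excel_sketch` are hypothetical stand-ins, not the pandas classes, the column depth is passed as a plain integer instead of a real DataFrame, and the message texts are paraphrased, not the ones in the patch.

```python
import warnings


class ExcelFormatterSketch:
    """Hypothetical stand-in for the formatter; not the pandas class."""

    def __init__(self, column_nlevels, index=True, verbose=True):
        self.column_nlevels = column_nlevels
        self.index = index
        self.verbose = verbose
        self._check_columns()

    def _check_columns(self):
        # All MultiIndex-column checks live in one place.
        if self.column_nlevels <= 1:
            return
        if not self.index:
            raise NotImplementedError(
                "Writing to Excel with MultiIndex columns and index=False "
                "is not yet implemented."
            )
        if self.verbose:
            warnings.warn(
                "Writing to Excel with MultiIndex columns is a one-way "
                "operation; the file may not be read back unchanged.",
                UserWarning,
            )


def to_excel_sketch(column_nlevels, index=True, verbose=True):
    # The public method only forwards the keyword to the formatter.
    return ExcelFormatterSketch(column_nlevels, index=index, verbose=verbose)
```

With this layout, `to_excel_sketch(2)` warns, `to_excel_sketch(2, verbose=False)` stays silent, and `to_excel_sketch(2, index=False)` raises `NotImplementedError`.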

@flamingbear
Contributor Author

Now other things are breaking and I don't really see why. I'm just about at the end of my rope here.
So now there's a verbose keyword to ExcelFormatter too?

@jreback
Contributor

jreback commented Aug 14, 2015

just pass it thru, you had added it before; just put it back in to_excel (and then to ExcelFormatter).

@flamingbear
Contributor Author

I'm not sure what the tests were trying to catch before or if they're good, but I can make them pass by reordering the tests for valid input before testing for aliases. (seems kinda sketchy)

@flamingbear
Contributor Author

Don't think this was my fault: https://travis-ci.org/pydata/pandas/jobs/75683428, it passed on the branch https://travis-ci.org/flamingbear/pandas/builds/75683397

I just pushed a no-op branch up again. Is there a way I could have scheduled travis tests without pushing the branch again?

@flamingbear
Contributor Author

@jreback I think this is ready for you now. Thanks. Matt

@jreback
Contributor

jreback commented Aug 15, 2015

lgtm. @jorisvandenbossche ?

@@ -1640,11 +1644,14 @@ class ExcelFormatter(object):
inf_rep : string, default `'inf'`
representation for np.inf values (which aren't representable in Excel)
A `'-'` sign will be added in front of -inf.
verbose: boolean, default True
If True, warn user that the resulting output file may not be
re-read or parsed directly by pandas.
Member

Can you use normal indentation (4 spaces here)? No alignment with the 'boolean, ..' is needed (see the other parameters).

Contributor Author

yes, I can do that.

@flamingbear
Contributor Author

@jreback @jorisvandenbossche : Can you give me some help here? This happened before and I repushed and it fixed itself. I don't know what a ResourceError is, https://travis-ci.org/pydata/pandas/jobs/75848253, but it's still passing fine on my travis. I can keep pushing until it goes through again. Is that the best option? This happened before #10570 (comment)

@jreback
Contributor

jreback commented Aug 16, 2015

just leave for now
I think this is caused by something else

@flamingbear
Contributor Author

alright if this one fails, I'll leave it. Thanks

"is not yet implemented.")
elif self.index and self.verbose:
    warnings.warn("Writing to Excel with MultiIndex columns is a"
                  " one way serializable operation. You will not"
Contributor

here is where I mean

@jreback
Contributor

jreback commented Aug 19, 2015

@flamingbear minor change, then lgtm. ping when green.

(Even though they cannot be read back in.)

Closes #10564
@flamingbear
Contributor Author

@jreback we're green. Thanks.

@jreback
Contributor

jreback commented Aug 19, 2015

@jorisvandenbossche ?

@jorisvandenbossche
Member

Looks good! Thanks!

jorisvandenbossche added a commit that referenced this pull request Aug 20, 2015
…el-writing

COMPAT: Allow multi-indexes to be written to excel
@jorisvandenbossche merged commit b63206b into pandas-dev:master Aug 20, 2015
@flamingbear
Contributor Author

Thanks guys.

@flamingbear deleted the 10564-allow-multiindex-excel-writing branch March 31, 2016 15:59
Labels
Compat (pandas objects compatibility with NumPy or Python functions), IO Excel (read_excel, to_excel), MultiIndex
3 participants