REF: repr - allow block to override values that get formatted #17143
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
# -*- coding: utf-8 -*- | ||
# pylint: disable=W0102 | ||
|
||
import numpy as np | ||
|
||
import pandas as pd | ||
from pandas.core.internals import Block, BlockManager | ||
|
||
|
||
class CustomBlock(Block): | ||
|
||
def formatting_values(self): | ||
return np.array(["Val: {}".format(i) for i in self.values]) | ||
|
||
|
||
def test_custom_repr(): | ||
values = np.arange(3, dtype='int64') | ||
|
||
# series | ||
block = CustomBlock(values, placement=slice(0, 3)) | ||
Review comment: I wouldn't pass fastpath; that's not really a public option.
Review comment: If you don't use fastpath, it does not preserve the Block type. Eg: [...] The reason is that we don't check for [...]. For that reason I am also using the fastpath in GeoPandas.
Review comment: Hmm, that looks like a bug. If you change that, does it break anything else? (Could be a follow-up as well.)
Review comment: In general it has proven a bit difficult to construct Series and DataFrame objects from given blocks without re-creating the blocks (e.g. in Series, the block gets converted to an array, which is then passed to SingleBlockManager, which does not preserve the block type).
Review comment: You generally need to give it a SingleBlockManager or BlockManager; blocks are a lower-level item.
Review comment: Yes, that's fine. But to slightly change my comment: it has also proven difficult to add a block to a BlockManager while preserving the block type. Once we have a BlockManager, it is indeed simply a matter of passing it to DataFrame(..) to get a df. That's what we do to create dataframes; I should probably take the same approach and create a SingleBlockManager for the series case instead of using that fastpath.
Review comment: Why is it hard to add a Block to a BM?
Review comment: You have Block.set and Block.insert methods to add things to a Block, but those also do not preserve the block you pass.
Review comment: See here for a workaround that I now use in the geopandas refactor branch: geopandas/geopandas#467 (comment) (but given the length of the code in Block.insert/set, this is maybe actually a simple way).
Review comment: BTW, I added a commit with an attempt to remove the usage of fastpath. |
||
s = pd.Series(block, index=pd.RangeIndex(3), fastpath=True) | ||
assert repr(s) == '0 Val: 0\n1 Val: 1\n2 Val: 2\ndtype: int64' | ||
Review comment: So the Windows tests fail because this is int32 there. You have to use [...].
Review comment: I just specified the dtype as int64; I suppose that is fine as well?
Review comment: Yep, that's good too. |
||
|
||
# dataframe | ||
block = CustomBlock(values.reshape(1, -1), placement=slice(0, 1)) | ||
blk_mgr = BlockManager([block], [['col'], range(3)]) | ||
df = pd.DataFrame(blk_mgr) | ||
assert repr(df) == ' col\n0 Val: 0\n1 Val: 1\n2 Val: 2' |
Review comment: This needs to be added in setup.py as well.
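For readers unfamiliar with the idea under test, here is a minimal pure-Python sketch of the pattern the diff exercises: the repr is built from whatever a `formatting_values()` hook returns, so a subclass can change what is displayed without changing the stored data. It deliberately avoids pandas internals. `ToyBlock`, `PrefixedBlock`, and `render` are hypothetical names invented for illustration, and the output layout is a simplification, not pandas' actual repr.

```python
class ToyBlock:
    """Hypothetical stand-in for a block: it only stores raw values."""

    def __init__(self, values):
        self.values = list(values)

    def formatting_values(self):
        # Default hook: display the stored values unchanged.
        return self.values


class PrefixedBlock(ToyBlock):
    """Overrides only the formatting hook, mirroring CustomBlock in the diff."""

    def formatting_values(self):
        return ["Val: {}".format(v) for v in self.values]


def render(block):
    """Build a repr-like string from the hook, not from block.values."""
    shown = block.formatting_values()
    return "\n".join("{} {}".format(i, v) for i, v in enumerate(shown))


plain = render(ToyBlock(range(3)))
custom = render(PrefixedBlock(range(3)))
print(custom)
```

The point of routing the display through a hook is that the override affects only presentation: `PrefixedBlock(range(3)).values` is still the list of integers, which matches the test's expectation that the dtype stays int64 while the printed values change.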