__str__ and __repr__ are ignored for custom iterable objects #17695

Closed
dmyersturnbull opened this issue Sep 27, 2017 · 4 comments

@dmyersturnbull

I posted a comment on ipython/ipython#2379, but I realized this may be specific to Pandas. I'm using IPython 6.2.0, Python 3.5, Jupyter 4.3.0, and Pandas 0.20.3.

Pandas won't use __str__ or __repr__ (or anything else I can find) to display an iterable object that defines __len__.

Code Sample, a copy-pastable example if possible

import pandas as pd

class WithLen:
    z = [1,2,3]
    def __getitem__(self, i): return self.z[i]
    def __str__(self): return "from str"
    def __repr__(self): return "from repr"
    def __len__(self): return len(self.z)

df = pd.DataFrame([
    pd.Series({'test': WithLen()})
])
print(df)  # PROBLEMATIC CASE
print(df['test'].iloc[0])  # prints fine

class WithoutLen:
    # same as WithLen, but with no __len__ defined
    z = [1,2,3]
    def __getitem__(self, i): return self.z[i]
    def __str__(self): return "from str"
    def __repr__(self): return "from repr"

print(pd.DataFrame([
    pd.Series({'test': WithoutLen()})
]))  # prints fine

Problem description

I've implemented __str__ and __repr__ on an iterable class with good reason, and it being iterable shouldn't override that behavior.

Expected Output

I expected and want to see this printed both with and without __len__:

       test
0  from str

Output of pd.show_versions()


INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.11.6-201.fc25.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.utf8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: 3.0.7
pip: 9.0.1
setuptools: 36.5.0
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.19.0
xarray: None
IPython: 6.2.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.5.3
html5lib: 0.999
sqlalchemy: 1.1.6
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.9.5
s3fs: None
pandas_gbq: None
pandas_datareader: None
@TomAugspurger
Contributor

You might try looking in pandas/io/formats/format.py, specifically

def format_array(values, formatter, float_format=None, na_rep='NaN',
and I'm assuming
class GenericArrayFormatter(object):

Presumably we cast to np.array somewhere in there, perhaps unnecessarily.

@jreback
Contributor

jreback commented Sep 28, 2017

To be honest, this is not really well supported, nor intended. You are pretty much on your own.

Putting anything besides scalars inside a Series/DataFrame is non-idiomatic. There will be some support for containers (e.g. lists / arrays) in pandas2.

@jreback closed this as completed Sep 28, 2017
@jreback added this to the won't fix milestone Sep 28, 2017
@TomAugspurger
Contributor

I think if we can support formatting without too much effort then we should. @dmyersturnbull if you can track down the issue and suggest a fix, it'd probably be accepted.

@jorisvandenbossche
Member

The reason is that our 'pretty print' implementation has a special method for sequences, so that they can follow the max_seq_items option:

def _pprint_seq(seq, _nest_lvl=0, max_seq_items=None, **kwds):
    """
    internal. pprinter for iterables. you should probably use pprint_thing()
    rather then calling this directly.
    bounds length of printed sequence, depending on options
    """
    if isinstance(seq, set):
        fmt = u("{{{body}}}")
    else:
        fmt = u("[{body}]") if hasattr(seq, '__setitem__') else u("({body})")
    if max_seq_items is False:
        nitems = len(seq)
    else:
        nitems = max_seq_items or get_option("max_seq_items") or len(seq)
    s = iter(seq)
    r = []
    for i in range(min(nitems, len(seq))):  # handle sets, no slicing
        r.append(pprint_thing(
            next(s), _nest_lvl + 1, max_seq_items=max_seq_items, **kwds))
    body = ", ".join(r)
    if nitems < len(seq):
        body += ", ..."
    elif isinstance(seq, tuple) and len(seq) == 1:
        body += ','
    return fmt.format(body=body)

This more or less hardcodes the possible results (depending on whether __setitem__ exists, the output will look like a list or a tuple). I don't immediately see a way to both follow the repr of custom objects and still limit the number of items shown for normal reprs.
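To make the trade-off concrete, here is a minimal stand-alone sketch in plain Python (no pandas import). The `*_sketch` functions and the `respect_custom_repr` flag are hypothetical names invented for illustration; they only approximate the dispatch described above, assuming the sequence check is roughly "iterable, has a length, and is not a string". The flag shows one possible direction, not an agreed fix: use the truncating printer only when the type still has a built-in container repr.

```python
# Hypothetical sketch; none of these names are pandas API.
_BUILTIN_REPRS = (list.__repr__, tuple.__repr__, set.__repr__, frozenset.__repr__)


def is_sequence_sketch(obj):
    """Duck-typed check (assumed): iterable, sized, and not a string."""
    try:
        iter(obj)
        len(obj)
    except TypeError:
        return False
    return not isinstance(obj, (str, bytes))


def has_builtin_repr(obj):
    """True if the object's type has not overridden a built-in container repr."""
    return any(type(obj).__repr__ is r for r in _BUILTIN_REPRS)


def pprint_seq_sketch(seq, max_seq_items, respect_custom_repr):
    """Format up to max_seq_items items with hardcoded list/tuple/set brackets."""
    if isinstance(seq, (set, frozenset)):
        fmt = "{{{body}}}"
    else:
        fmt = "[{body}]" if hasattr(seq, "__setitem__") else "({body})"
    n = min(max_seq_items, len(seq))
    it = iter(seq)
    items = [pprint_thing_sketch(next(it), max_seq_items, respect_custom_repr)
             for _ in range(n)]
    body = ", ".join(items)
    if n < len(seq):
        body += ", ..."
    elif isinstance(seq, tuple) and len(seq) == 1:
        body += ","
    return fmt.format(body=body)


def pprint_thing_sketch(thing, max_seq_items=100, respect_custom_repr=False):
    """Dispatch to the sequence printer or fall back to str()."""
    if is_sequence_sketch(thing):
        if not respect_custom_repr or has_builtin_repr(thing):
            return pprint_seq_sketch(thing, max_seq_items, respect_custom_repr)
    return str(thing)


class WithLen:
    z = [1, 2, 3]
    def __getitem__(self, i): return self.z[i]
    def __str__(self): return "from str"
    def __repr__(self): return "from repr"
    def __len__(self): return len(self.z)


class WithoutLen:
    z = [1, 2, 3]
    def __getitem__(self, i): return self.z[i]
    def __str__(self): return "from str"
    def __repr__(self): return "from repr"


print(pprint_thing_sketch(WithLen()))                            # sequence path
print(pprint_thing_sketch(WithoutLen()))                         # str() path
print(pprint_thing_sketch(WithLen(), respect_custom_repr=True))  # str() path
print(pprint_thing_sketch(list(range(5)), max_seq_items=2))      # truncated list
```

With the flag off, WithLen is treated as a tuple-like sequence because it has __len__ but no __setitem__, while WithoutLen falls through to str(). With the flag on, plain lists and tuples are still truncated, but any class that defines its own __repr__ is printed via str(). A custom container that wants truncation as well would not get it under this rule, which is part of the difficulty noted above.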