BUG: 1.3.0 column assignment via single column np.matrix behaviour change
#42376
Comments
changing milestone to 1.3.5
This can be fixed on master and is not worth fixing for 1.3.x. Supporting np.matrix is not useful at all.
sure. removing milestone.
I think it's useful until the ecosystem has a realistic alternative to scipy sparse matrices (hopefully soon). From the linked bugs, we've had a number of issues opened by users getting errors from operations like this:

    df["mean"] = sparse.random(100, 10, format="csr").mean(axis=1)
    df

Traceback
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/usr/local/lib/python3.9/site-packages/IPython/core/formatters.py in __call__(self, obj)
700 type_pprinters=self.type_printers,
701 deferred_pprinters=self.deferred_printers)
--> 702 printer.pretty(obj)
703 printer.flush()
704 return stream.getvalue()
/usr/local/lib/python3.9/site-packages/IPython/lib/pretty.py in pretty(self, obj)
392 if cls is not object \
393 and callable(cls.__dict__.get('__repr__')):
--> 394 return _repr_pprint(obj, self, cycle)
395
396 return _default_pprint(obj, self, cycle)
/usr/local/lib/python3.9/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
698 """A pprint that just redirects to the normal repr function."""
699 # Find newlines and replace them with p.break_()
--> 700 output = repr(obj)
701 lines = output.splitlines()
702 with p.group():
/usr/local/lib/python3.9/site-packages/pandas/core/frame.py in __repr__(self)
993 else:
994 width = None
--> 995 self.to_string(
996 buf=buf,
997 max_rows=max_rows,
/usr/local/lib/python3.9/site-packages/pandas/core/frame.py in to_string(self, buf, columns, col_space, header, index, na_rep, formatters, float_format, sparsify, index_names, justify, max_rows, min_rows, max_cols, show_dimensions, decimal, line_width, max_colwidth, encoding)
1129 decimal=decimal,
1130 )
-> 1131 return fmt.DataFrameRenderer(formatter).to_string(
1132 buf=buf,
1133 encoding=encoding,
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in to_string(self, buf, encoding, line_width)
1051
1052 string_formatter = StringFormatter(self.fmt, line_width=line_width)
-> 1053 string = string_formatter.to_string()
1054 return save_to_buffer(string, buf=buf, encoding=encoding)
1055
/usr/local/lib/python3.9/site-packages/pandas/io/formats/string.py in to_string(self)
23
24 def to_string(self) -> str:
---> 25 text = self._get_string_representation()
26 if self.fmt.should_show_dimensions:
27 text = "".join([text, self.fmt.dimensions_info])
/usr/local/lib/python3.9/site-packages/pandas/io/formats/string.py in _get_string_representation(self)
38 return self._empty_info_line
39
---> 40 strcols = self._get_strcols()
41
42 if self.line_width is None:
/usr/local/lib/python3.9/site-packages/pandas/io/formats/string.py in _get_strcols(self)
29
30 def _get_strcols(self) -> list[list[str]]:
---> 31 strcols = self.fmt.get_strcols()
32 if self.fmt.is_truncated:
33 strcols = self._insert_dot_separators(strcols)
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in get_strcols(self)
538 Render a DataFrame to a list of columns (as lists of strings).
539 """
--> 540 strcols = self._get_strcols_without_index()
541
542 if self.index:
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in _get_strcols_without_index(self)
802 int(self.col_space.get(c, 0)), *(self.adj.len(x) for x in cheader)
803 )
--> 804 fmt_values = self.format_col(i)
805 fmt_values = _make_fixed_width(
806 fmt_values, self.justify, minimum=header_colwidth, adj=self.adj
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in format_col(self, i)
816 frame = self.tr_frame
817 formatter = self._get_formatter(i)
--> 818 return format_array(
819 frame.iloc[:, i]._values,
820 formatter,
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in format_array(values, formatter, float_format, na_rep, digits, space, justify, decimal, leading_space, quoting)
1238 )
1239
-> 1240 return fmt_obj.get_result()
1241
1242
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in get_result(self)
1269
1270 def get_result(self) -> list[str]:
-> 1271 fmt_values = self._format_strings()
1272 return _make_fixed_width(fmt_values, self.justify)
1273
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in _format_strings(self)
1516
1517 def _format_strings(self) -> list[str]:
-> 1518 return list(self.get_result_as_array())
1519
1520
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in get_result_as_array(self)
1480 float_format = lambda value: self.float_format % value
1481
-> 1482 formatted_values = format_values_with(float_format)
1483
1484 if not self.fixed_width:
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in format_values_with(float_format)
1454 values = self.values
1455 is_complex = is_complex_dtype(values)
-> 1456 values = format_with_na_rep(values, formatter, na_rep)
1457
1458 if self.fixed_width:
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in format_with_na_rep(values, formatter, na_rep)
1425 mask = isna(values)
1426 formatted = np.array(
-> 1427 [
1428 formatter(val) if not m else na_rep
1429 for val, m in zip(values.ravel(), mask.ravel())
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in <listcomp>(.0)
1426 formatted = np.array(
1427 [
-> 1428 formatter(val) if not m else na_rep
1429 for val, m in zip(values.ravel(), mask.ravel())
1430 ]
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/usr/local/lib/python3.9/site-packages/IPython/core/formatters.py in __call__(self, obj)
343 method = get_real_method(obj, self.print_method)
344 if method is not None:
--> 345 return method()
346 return None
347 else:
/usr/local/lib/python3.9/site-packages/pandas/core/frame.py in _repr_html_(self)
1045 decimal=".",
1046 )
-> 1047 return fmt.DataFrameRenderer(formatter).to_html(notebook=True)
1048 else:
1049 return None
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in to_html(self, buf, encoding, classes, notebook, border, table_id, render_links)
1027 render_links=render_links,
1028 )
-> 1029 string = html_formatter.to_string()
1030 return save_to_buffer(string, buf=buf, encoding=encoding)
1031
/usr/local/lib/python3.9/site-packages/pandas/io/formats/html.py in to_string(self)
70
71 def to_string(self) -> str:
---> 72 lines = self.render()
73 if any(isinstance(x, str) for x in lines):
74 lines = [str(x) for x in lines]
/usr/local/lib/python3.9/site-packages/pandas/io/formats/html.py in render(self)
619 self.write("<div>")
620 self.write_style()
--> 621 super().render()
622 self.write("</div>")
623 return self.elements
/usr/local/lib/python3.9/site-packages/pandas/io/formats/html.py in render(self)
76
77 def render(self) -> list[str]:
---> 78 self._write_table()
79
80 if self.should_show_dimensions:
/usr/local/lib/python3.9/site-packages/pandas/io/formats/html.py in _write_table(self, indent)
246 self._write_header(indent + self.indent_delta)
247
--> 248 self._write_body(indent + self.indent_delta)
249
250 self.write("</table>", indent)
/usr/local/lib/python3.9/site-packages/pandas/io/formats/html.py in _write_body(self, indent)
393 def _write_body(self, indent: int) -> None:
394 self.write("<tbody>", indent)
--> 395 fmt_values = self._get_formatted_values()
396
397 # write values
/usr/local/lib/python3.9/site-packages/pandas/io/formats/html.py in _get_formatted_values(self)
583
584 def _get_formatted_values(self) -> dict[int, list[str]]:
--> 585 return {i: self.fmt.format_col(i) for i in range(self.ncols)}
586
587 def _get_columns_formatted_values(self) -> list[str]:
/usr/local/lib/python3.9/site-packages/pandas/io/formats/html.py in <dictcomp>(.0)
583
584 def _get_formatted_values(self) -> dict[int, list[str]]:
--> 585 return {i: self.fmt.format_col(i) for i in range(self.ncols)}
586
587 def _get_columns_formatted_values(self) -> list[str]:
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in format_col(self, i)
816 frame = self.tr_frame
817 formatter = self._get_formatter(i)
--> 818 return format_array(
819 frame.iloc[:, i]._values,
820 formatter,
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in format_array(values, formatter, float_format, na_rep, digits, space, justify, decimal, leading_space, quoting)
1238 )
1239
-> 1240 return fmt_obj.get_result()
1241
1242
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in get_result(self)
1269
1270 def get_result(self) -> list[str]:
-> 1271 fmt_values = self._format_strings()
1272 return _make_fixed_width(fmt_values, self.justify)
1273
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in _format_strings(self)
1516
1517 def _format_strings(self) -> list[str]:
-> 1518 return list(self.get_result_as_array())
1519
1520
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in get_result_as_array(self)
1480 float_format = lambda value: self.float_format % value
1481
-> 1482 formatted_values = format_values_with(float_format)
1483
1484 if not self.fixed_width:
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in format_values_with(float_format)
1454 values = self.values
1455 is_complex = is_complex_dtype(values)
-> 1456 values = format_with_na_rep(values, formatter, na_rep)
1457
1458 if self.fixed_width:
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in format_with_na_rep(values, formatter, na_rep)
1425 mask = isna(values)
1426 formatted = np.array(
-> 1427 [
1428 formatter(val) if not m else na_rep
1429 for val, m in zip(values.ravel(), mask.ravel())
/usr/local/lib/python3.9/site-packages/pandas/io/formats/format.py in <listcomp>(.0)
1426 formatted = np.array(
1427 [
-> 1428 formatter(val) if not m else na_rep
1429 for val, m in zip(values.ravel(), mask.ravel())
1430 ]
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

It's weird that this works, but just errors on
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Before 1.3.0, this worked fine. As of 1.3.0, displaying df fails with:
traceback
I discovered this new behaviour due to our tests starting to fail. What was causing that was:
failing with:
traceback
Problem description
This problem is triggered because the result of X.sum(axis=1), when X is a scipy sparse matrix, is not a 1d numpy ndarray but an np.matrix with one column. This used to be handled by pandas, but now isn't.

This is a problem because it's a behaviour change that breaks existing code. As far as I can tell from the release notes, this was not an intentional behaviour change. It does look like some things around column assignment did change, and I imagine that assigning with deprecated numpy types was not considered.
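To make the shape mismatch concrete, here is a minimal sketch, assuming NumPy and SciPy are installed. The small dense input and the variable names are illustrative only and are not taken from the original report.

```python
import numpy as np
from scipy import sparse

# Illustrative input (not from the original report): a small CSR matrix.
X = sparse.csr_matrix(np.arange(12, dtype=float).reshape(4, 3))

# Summing along axis=1 keeps two dimensions: the result has a single column.
row_sums = X.sum(axis=1)
print(type(row_sums).__name__, row_sums.shape)

# Converting to a plain ndarray and flattening gives a 1-D array.
flat = np.asarray(row_sums).ravel()
print(type(flat).__name__, flat.shape)
```

The first result has shape (4, 1), one row per input row and a single column, while the flattened version has shape (4,).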
Expected Output
I would expect this to not error, and for this to pass:
np.testing.assert_array_equal(df["X_sum"], np.ravel(X.sum(axis=1)))
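Until the earlier behaviour is restored, one possible workaround is to flatten the result before assigning it. The sketch below assumes NumPy, SciPy and pandas are installed; the input matrix and the empty frame are hypothetical stand-ins for the reporter's data, and this is not the fix adopted by pandas.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical stand-in for the reporter's data: a small CSR matrix.
X = sparse.csr_matrix(np.arange(12, dtype=float).reshape(4, 3))

# Flatten the single-column result to a 1-D ndarray before assignment.
row_sums = np.asarray(X.sum(axis=1)).ravel()

df = pd.DataFrame(index=range(X.shape[0]))
df["X_sum"] = row_sums

# The check from the expected output section, applied to the workaround.
np.testing.assert_array_equal(df["X_sum"], np.ravel(X.sum(axis=1)))

# Rendering the frame should no longer raise.
print(df)
```

Flattening at the call site keeps the column 1-D regardless of how a given pandas version treats np.matrix input, at the cost of an explicit conversion wherever sparse reductions are assigned.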
Output of pd.show_versions()