BUG: to_latex with multicolumn and multiindex joins cells which are on different hierarchy levels #16719

wlter · 2017-06-18T15:13:00Z

Code Sample, a copy-pastable example if possible

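import pandas as pd
import numpy as np

# DataFrame.append was removed in pandas 2.0, so collect the rows in a list first.
rows = []
idx = 0
for i in range(2):
    for val0 in [3, 2, 1]:
        for val1 in range(val0):
            idx = idx + 1
            r0 = str(np.mod(int(idx * 0.5), 2))
            r1 = np.random.uniform()
            rows.append({"i": i, "val0": val0, "val1": val1, "r0": r0, "r1": r1})

# The original append-based loop produced float labels (0.0, 3.0, ... in the
# output below); cast explicitly so the index labels match.
df = pd.DataFrame(rows).astype({"i": float, "val0": float, "val1": float})
df.set_index(["i", "val0", "val1", "r0", "r1"], inplace=True)

print(df.to_latex(multirow=True, escape=False))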

produces

\begin{tabular}{lllll}
	\toprule
	&     &     &   &          \\
	i & val0 & val1 & r0 & r1 \\
	\midrule
	\multirow{6}{*}{0.0} & \multirow{3}{*}{3.0} & 0.0 & 0 & 0.307919 \\
	&     & 1.0 & \multirow{2}{*}{1} & 0.488816 \\
	&     & 2.0 &   & 0.708405 \\
	\cline{2-5}
	\cline{4-5}
	& \multirow{2}{*}{2.0} & 0.0 & \multirow{2}{*}{0} & 0.806916 \\
	&     & 1.0 &   & 0.763446 \\
	\cline{2-5}
	\cline{4-5}
	& 1.0 & \multirow{2}{*}{0.0} & \multirow{2}{*}{1} & 0.255642 \\
	\cline{1-5}
	\multirow{6}{*}{1.0} & \multirow{3}{*}{3.0} &     &   & 0.093269 \\
	\cline{3-5}
	\cline{4-5}
	&     & 1.0 & \multirow{2}{*}{0} & 0.775120 \\
	&     & 2.0 &   & 0.989241 \\
	\cline{2-5}
	\cline{4-5}
	& \multirow{2}{*}{2.0} & 0.0 & \multirow{2}{*}{1} & 0.741230 \\
	&     & 1.0 &   & 0.960813 \\
	\cline{2-5}
	\cline{4-5}
	& 1.0 & 0.0 & 0 & 0.559090 \\
	\bottomrule
\end{tabular}

which renders as follows (screenshot not preserved):

Problem description

Hey,

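When using a MultiIndex DataFrame, to_latex with the option multirow=True merges cells that, given the index hierarchy, should not be merged. In the rendered table this shows up as merged entries straddling a divider line. In the output above, for example, the val1 and r0 cells of the last i = 0.0 row (\multirow{2}{*}{0.0} and \multirow{2}{*}{1}) extend into the first i = 1.0 row, across the \cline{1-5} that separates the two i groups.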

Expected Output

Cells should not be merged when the index hierarchy separates them.

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: AMD64 Family 16 Model 4 Stepping 3, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.20.2
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None


wlter · 2017-06-19T18:04:48Z

So, I wrote my own script, which is in no way compatible, but maybe the idea helps here: I used a hierarchical MultiIndex for the first levels and left the rest of the hierarchy to plain DataFrame columns. In the table, each hierarchical element is spread over multiple cells using \multirow, and the hierarchical elements also generate the \clines. I don't allow \multirow in the area of the plain columns. So my suggestion would be to join cells not because of their content, but:

on the level of the multiindex:

join cells when they belong to the same hierarchical element using multirow
add the necessary \clines

on the level of the dataframes (below the multiindex)

do not join cells

TL;DR: Suggesting to join cells based on whether they are part of the MultiIndex or the plain DataFrame columns, instead of based on their content.
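A minimal sketch of that rule in plain Python, independent of pandas internals (the function names hierarchical_spans and render_index_cells and the list-of-label-tuples input are hypothetical, not pandas API): a cell at index level k may span consecutive rows only while the labels at levels 0 through k all agree, so a run can never cross a divider drawn by a higher level. Plain data columns are not passed in at all, so they are never merged.

```python
def hierarchical_spans(rows):
    """Return, for each index level, a list of (start_row, length) runs.

    rows is a list of equal-length tuples of index labels (hypothetical input
    format), one per table row, already in display order. A run at level k
    only covers consecutive rows whose labels at levels 0..k all agree.
    """
    if not rows:
        return []
    n_rows, n_levels = len(rows), len(rows[0])
    spans = []
    for level in range(n_levels):
        runs = []
        start = 0
        for r in range(1, n_rows + 1):
            # Close the current run at the end of the table, or as soon as any
            # label at this level or a higher one changes.
            if r == n_rows or rows[r][: level + 1] != rows[start][: level + 1]:
                runs.append((start, r - start))
                start = r
        spans.append(runs)
    return spans


def render_index_cells(rows):
    """Build the LaTeX text of the index cells: a \\multirow at the top of each
    run longer than one row, the plain label for single-row runs, and an empty
    string for rows already covered by a \\multirow above them."""
    cells = [[""] * len(row) for row in rows]
    for level, runs in enumerate(hierarchical_spans(rows)):
        for start, length in runs:
            label = rows[start][level]
            cells[start][level] = (
                f"\\multirow{{{length}}}{{*}}{{{label}}}" if length > 1 else str(label)
            )
    return cells


if __name__ == "__main__":
    # Rebuild the (i, val0, val1, r0) labels from the report's loop, without numpy.
    rows = []
    idx = 0
    for i in range(2):
        for val0 in [3, 2, 1]:
            for val1 in range(val0):
                idx += 1
                rows.append((i, val0, val1, str(int(idx * 0.5) % 2)))
    for row_cells in render_index_cells(rows):
        print(" & ".join(row_cells) + r" \\")
```

With the (i, val0, val1, r0) labels produced by the report's loop, only i and val0 get \multirow cells; every val1 and r0 cell stays on its own row, which is what the corrected outputs later in this thread show.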

sgsaenger · 2018-12-03T13:31:59Z

Does this problem still persist?
v0.23.4 seems to yield the correct output:

\begin{tabular}{lllll}
\toprule
    &     &     &   &          \\
i & val0 & val1 & r0 & r1 \\
\midrule
\multirow{6}{*}{0.0} & \multirow{3}{*}{3.0} & 0.0 & 0 & 0.604922 \\
    &     & 1.0 & 1 & 0.204031 \\
    &     & 2.0 & 1 & 0.140646 \\
\cline{2-5}
    & \multirow{2}{*}{2.0} & 0.0 & 0 & 0.456303 \\
    &     & 1.0 & 0 & 0.799159 \\
\cline{2-5}
    & 1.0 & 0.0 & 1 & 0.328993 \\
\cline{1-5}
\multirow{6}{*}{1.0} & \multirow{3}{*}{3.0} & 0.0 & 1 & 0.078872 \\
    &     & 1.0 & 0 & 0.489851 \\
    &     & 2.0 & 0 & 0.058885 \\
\cline{2-5}
    & \multirow{2}{*}{2.0} & 0.0 & 1 & 0.889743 \\
    &     & 1.0 & 1 & 0.498069 \\
\cline{2-5}
    & 1.0 & 0.0 & 0 & 0.397285 \\
\bottomrule
\end{tabular}

mroeschke · 2019-10-22T04:18:33Z

Looks to produce a correct output on master. Could use a test:

In [7]: df = pd.DataFrame()
   ...:
   ...:
   ...: idx = 0
   ...: for i in range(2):
   ...:   for val0 in [3,2,1]:
   ...:     for val1 in range(val0):
   ...:       idx = idx + 1
   ...:       r0 = str(np.mod(int(idx*0.5),2))
   ...:       r1 = np.random.uniform()
   ...:       df = df.append({"i":i,"val0" : val0,\
   ...:                             "val1":val1,
   ...:                             "r0":r0, \
   ...:                           "r1":r1}, ignore_index=True)
   ...:
   ...:
   ...: df.set_index(["i",'val0','val1',"r0", 'r1'], inplace=True)
   ...:
   ...: print(df.to_latex(multirow = True, escape=False))
\begin{tabular}{lllll}
\toprule
    &     &     &   &          \\
i & val0 & val1 & r0 & r1 \\
\midrule
\multirow{6}{*}{0.0} & \multirow{3}{*}{3.0} & 0.0 & 0 & 0.959256 \\
    &     & 1.0 & 1 & 0.617433 \\
    &     & 2.0 & 1 & 0.355065 \\
\cline{2-5}
    & \multirow{2}{*}{2.0} & 0.0 & 0 & 0.493983 \\
    &     & 1.0 & 0 & 0.710831 \\
\cline{2-5}
    & 1.0 & 0.0 & 1 & 0.068669 \\
\cline{1-5}
\multirow{6}{*}{1.0} & \multirow{3}{*}{3.0} & 0.0 & 1 & 0.391476 \\
    &     & 1.0 & 0 & 0.630937 \\
    &     & 2.0 & 0 & 0.892755 \\
\cline{2-5}
    & \multirow{2}{*}{2.0} & 0.0 & 1 & 0.018855 \\
    &     & 1.0 & 1 & 0.444306 \\
\cline{2-5}
    & 1.0 & 0.0 & 0 & 0.590894 \\
\bottomrule
\end{tabular}


In [8]: pd.__version__
Out[8]: '0.26.0.dev0+627.gef77b5700'
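Since a test is requested, here is a minimal sketch of a check that such a test could build on. It assumes the simple tabular layout shown in this thread (one row per line, cells separated by &, no escaped & inside cells, and the plain \multirow{n}{*}{...} and \cline{a-b} forms). It walks the body after \midrule, tracks how many more rows each \multirow cell still covers, and flags any \cline that cuts through a column whose merged cell has not finished yet. The function name multirow_crossings is hypothetical and not part of pandas or its test suite.

```python
import re

# Hypothetical helper; the patterns assume the plain \multirow{n}{*}{...} and
# \cline{a-b} forms that appear in the outputs above.
MULTIROW = re.compile(r"\\multirow\{(\d+)\}")
CLINE = re.compile(r"\\cline\{(\d+)-(\d+)\}")


def multirow_crossings(latex):
    """Return sorted (row, column) pairs, 1-based and counted after \\midrule,
    where a \\multirow cell is still open when a \\cline covering its column
    is drawn, i.e. a merged cell straddles a hierarchy divider."""
    remaining = {}  # column -> rows the open \multirow in it still has to cover
    crossings = set()
    row = 0
    in_body = False
    for raw in latex.splitlines():
        line = raw.strip()
        if line.startswith(r"\midrule"):
            in_body = True
            continue
        if line.startswith(r"\bottomrule"):
            break
        if not in_body or not line:
            continue
        cline = CLINE.match(line)
        if cline:
            first, last = int(cline.group(1)), int(cline.group(2))
            for col, left in remaining.items():
                if left > 0 and first <= col <= last:
                    crossings.add((row, col))
        elif line.endswith("\\\\"):
            row += 1
            # Every open \multirow consumes this row...
            for col in remaining:
                remaining[col] = max(remaining[col] - 1, 0)
            # ...and a new \multirow{n} covers this row plus n - 1 more.
            for col, cell in enumerate(line[:-2].split("&"), start=1):
                multirow = MULTIROW.search(cell)
                if multirow:
                    remaining[col] = int(multirow.group(1)) - 1
    return sorted(crossings)
```

On the output from the original report this flags the val1 and r0 cells of body row 6 (the last i = 0.0 row), i.e. [(6, 3), (6, 4)], while the corrected outputs in this thread give an empty list.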

mroeschke · 2020-01-21T01:59:13Z

Actually this looks incorrect again:

In [1]:         df = pd.DataFrame(
   ...:             index=[(0.0, 3.0, 0.0, 0, 0.929121402530996), (0.0, 3.0, 1.0, 1, 0.4187460393711979),
   ...:                    (0.0, 3.0, 2.0, 1, 0.47309,
   ...:                     69460209665), (0.0, 2.0, 0.0, 0, 0.3044652225366862),
   ...:                    (0.0, 2.0, 1.0, 0, 0.8087530860532746), (0.0, 1.0, 0.0, 1, 0.0799379522,
   ...:                                                             5657385),
   ...:                    (1.0, 3.0, 0.0, 1, 0.7793463916039404), (1.0, 3.0, 1.0, 0, 0.21066478200369132),
   ...:                    (1.0, 3.0, 2.0, 0, 0.3143737387268,
   ...:                     193), (1.0, 2.0, 0.0, 1, 0.46081170223017887),
   ...:                    (1.0, 2.0, 1.0, 1, 0.8865341655631166), (1.0, 1.0, 0.0, 0, 0.31459248512345084
   ...:                                                             )])
   ...:         result = df.to_latex(multirow = True, escape=False)

# should be `\begin{tabular}{lllll}`
In [2]: result
Out[2]: "\\begin{tabular}{l}\n\\toprule\nEmpty DataFrame\nColumns: Index([], dtype='object')\nIndex: Index([    (0.0, 3.0, 0.0, 0, 0.929121402530996),\n          (0.0, 3.0, 1.0, 1, 0.4187460393711979),\n        (0.0, 3.0, 2.0, 1, 0.47309, 69460209665),\n          (0.0, 2.0, 0.0, 0, 0.3044652225366862),\n          (0.0, 2.0, 1.0, 0, 0.8087530860532746),\n       (0.0, 1.0, 0.0, 1, 0.0799379522, 5657385),\n          (1.0, 3.0, 0.0, 1, 0.7793463916039404),\n         (1.0, 3.0, 1.0, 0, 0.21066478200369132),\n        (1.0, 3.0, 2.0, 0, 0.3143737387268, 193),\n         (1.0, 2.0, 0.0, 1, 0.46081170223017887),\n          (1.0, 2.0, 1.0, 1, 0.8865341655631166),\n         (1.0, 1.0, 0.0, 0, 0.31459248512345084)],\n      dtype='object') \\\\\n\\bottomrule\n\\end{tabular}\n"

In [3]: pd.__version__
Out[3]: '1.0.0rc0+134.gbbcda98c7.dirty'

mroeschke · 2020-04-04T20:35:43Z

Looks to work on master now. Could use a test

In [7]: df = pd.DataFrame()
   ...:
   ...:
   ...: idx = 0
   ...: for i in range(2):
   ...:   for val0 in [3,2,1]:
   ...:     for val1 in range(val0):
   ...:       idx = idx + 1
   ...:       r0 = str(np.mod(int(idx*0.5),2))
   ...:       r1 = np.random.uniform()
   ...:       df = df.append({"i":i,"val0" : val0,\
   ...:                             "val1":val1,
   ...:                             "r0":r0, \
   ...:                           "r1":r1}, ignore_index=True)
   ...:
   ...:
   ...: df.set_index(["i",'val0','val1',"r0", 'r1'], inplace=True)
   ...:
   ...: print(df.to_latex(multirow = True, escape=False))
\begin{tabular}{lllll}
\toprule
    &     &     &   &          \\
i & val0 & val1 & r0 & r1 \\
\midrule
\multirow{6}{*}{0.0} & \multirow{3}{*}{3.0} & 0.0 & 0 & 0.795703 \\
    &     & 1.0 & 1 & 0.135335 \\
    &     & 2.0 & 1 & 0.210815 \\
\cline{2-5}
    & \multirow{2}{*}{2.0} & 0.0 & 0 & 0.398306 \\
    &     & 1.0 & 0 & 0.753541 \\
\cline{2-5}
    & 1.0 & 0.0 & 1 & 0.753324 \\
\cline{1-5}
\multirow{6}{*}{1.0} & \multirow{3}{*}{3.0} & 0.0 & 1 & 0.606834 \\
    &     & 1.0 & 0 & 0.703867 \\
    &     & 2.0 & 0 & 0.532124 \\
\cline{2-5}
    & \multirow{2}{*}{2.0} & 0.0 & 1 & 0.393957 \\
    &     & 1.0 & 1 & 0.252969 \\
\cline{2-5}
    & 1.0 & 0.0 & 0 & 0.401082 \\
\bottomrule
\end{tabular}


In [8]: pd.__version__
Out[8]: '1.1.0.dev0+1108.gcad602e16'

TomAugspurger added Bug, IO LaTeX labels Jul 12, 2017

TomAugspurger added this to the Next Major Release milestone Jul 12, 2017

TomAugspurger added Difficulty Intermediate labels Jul 12, 2017

jbrockmendel removed Effort Medium labels Oct 21, 2019

mroeschke added good first issue, Needs Tests labels and removed Bug, IO LaTeX labels Oct 22, 2019

mroeschke added Bug, IO LaTeX labels and removed Needs Tests, good first issue labels Jan 21, 2020

mroeschke added good first issue, Needs Tests labels and removed Bug, IO LaTeX labels Apr 4, 2020

mroeschke mentioned this issue May 26, 2021 in TST: Old Issues #41674 (merged)

jreback modified the milestones: Contributions Welcome, 1.3 May 26, 2021

jreback closed this as completed in #41674 May 26, 2021
