BUG: to_latex with multicolumn and multiindex joins cells which are on different hierarchy levels #16719

Closed
wlter opened this issue Jun 18, 2017 · 5 comments · Fixed by #41674
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions

@wlter

wlter commented Jun 18, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

# Written against pandas 0.20.2; DataFrame.append was removed in pandas 2.0.
df = pd.DataFrame()

idx = 0
for i in range(2):
    for val0 in [3, 2, 1]:
        for val1 in range(val0):
            idx = idx + 1
            r0 = str(np.mod(int(idx * 0.5), 2))
            r1 = np.random.uniform()
            df = df.append({"i": i, "val0": val0, "val1": val1,
                            "r0": r0, "r1": r1}, ignore_index=True)

# Move every column into the index, giving a five-level MultiIndex.
df.set_index(["i", "val0", "val1", "r0", "r1"], inplace=True)

print(df.to_latex(multirow=True, escape=False))

produces

\begin{tabular}{lllll}
	\toprule
	&     &     &   &          \\
	i & val0 & val1 & r0 & r1 \\
	\midrule
	\multirow{6}{*}{0.0} & \multirow{3}{*}{3.0} & 0.0 & 0 & 0.307919 \\
	&     & 1.0 & \multirow{2}{*}{1} & 0.488816 \\
	&     & 2.0 &   & 0.708405 \\
	\cline{2-5}
	\cline{4-5}
	& \multirow{2}{*}{2.0} & 0.0 & \multirow{2}{*}{0} & 0.806916 \\
	&     & 1.0 &   & 0.763446 \\
	\cline{2-5}
	\cline{4-5}
	& 1.0 & \multirow{2}{*}{0.0} & \multirow{2}{*}{1} & 0.255642 \\
	\cline{1-5}
	\multirow{6}{*}{1.0} & \multirow{3}{*}{3.0} &     &   & 0.093269 \\
	\cline{3-5}
	\cline{4-5}
	&     & 1.0 & \multirow{2}{*}{0} & 0.775120 \\
	&     & 2.0 &   & 0.989241 \\
	\cline{2-5}
	\cline{4-5}
	& \multirow{2}{*}{2.0} & 0.0 & \multirow{2}{*}{1} & 0.741230 \\
	&     & 1.0 &   & 0.960813 \\
	\cline{2-5}
	\cline{4-5}
	& 1.0 & 0.0 & 0 & 0.559090 \\
	\bottomrule
\end{tabular}


which renders as (image: to_latex-bug2)

Problem description

Hey,

When using a multi-index DataFrame, to_latex with the option multirow=True joins cells that should not be joined, because they belong to different elements of the index hierarchy. In the image, this shows up as a joined entry sitting across the divider line.

Expected Output

  • cells should not be joined if the hierarchy divides them

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: AMD64 Family 16 Model 4 Stepping 3, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.20.2
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@wlter
Author

wlter commented Jun 19, 2017

So, I wrote my own script, which is in no way compatible, but maybe the idea helps here: I used a hierarchical multi-index for the first levels and left the rest of the hierarchy to default dataframes. In the table, the hierarchical elements are all distributed over multiple cells using multirow. The hierarchical elements also create the clines. I don't allow multirow in the area of the default dataframes. So my suggestion would be to not join cells because of their content but:

on the level of the multiindex:

  • join cells when they belong to the same hierarchical element using multirow
  • add the necessary \clines

on the level of the dataframes (below the multiindex)

  • do not join cells

TL;DR: I suggest joining cells based on whether they are part of the multi-index or of the default dataframe, rather than on their content.
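
The joining rule suggested above can be sketched in plain Python, independent of pandas. This is only an illustration of the idea, not how to_latex is implemented: the helper name multirow_spans and its return format are hypothetical. A cell at a given index level starts a new span whenever the value at that level or at any parent level changes.

```python
from typing import Hashable, List, Sequence, Tuple


def multirow_spans(
    rows: Sequence[Tuple[Hashable, ...]], n_levels: int
) -> List[List[int]]:
    """Return, per row and index level, the multirow span starting at that cell.

    0 means the cell is covered by a span that started in an earlier row.
    (Hypothetical helper for illustration; not part of pandas.)
    """
    spans = [[0] * n_levels for _ in rows]
    for level in range(n_levels):
        start = 0
        for r in range(1, len(rows) + 1):
            # Compare the whole prefix (parent levels included), not just the
            # value at this level, so equal values under different parents
            # are never merged.
            if r == len(rows) or rows[r][: level + 1] != rows[start][: level + 1]:
                spans[start][level] = r - start
                start = r
    return spans


# First three index levels (i, val0, val1) from the example in this issue.
rows = [(i, val0, val1) for i in range(2) for val0 in [3, 2, 1] for val1 in range(val0)]
spans = multirow_spans(rows, n_levels=3)
```

Rows 5 and 6 of this example both have val1 == 0 but sit under different parents (i = 0 and i = 1), so each gets a span of 1 instead of one merged cell across the \cline{1-5} divider.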

@TomAugspurger TomAugspurger added this to the Next Major Release milestone Jul 12, 2017
@sgsaenger

Does this problem persist in any way?
v0.23.4 seems to correctly yield:

\begin{tabular}{lllll}
\toprule
    &     &     &   &          \\
i & val0 & val1 & r0 & r1 \\
\midrule
\multirow{6}{*}{0.0} & \multirow{3}{*}{3.0} & 0.0 & 0 & 0.604922 \\
    &     & 1.0 & 1 & 0.204031 \\
    &     & 2.0 & 1 & 0.140646 \\
\cline{2-5}
    & \multirow{2}{*}{2.0} & 0.0 & 0 & 0.456303 \\
    &     & 1.0 & 0 & 0.799159 \\
\cline{2-5}
    & 1.0 & 0.0 & 1 & 0.328993 \\
\cline{1-5}
\multirow{6}{*}{1.0} & \multirow{3}{*}{3.0} & 0.0 & 1 & 0.078872 \\
    &     & 1.0 & 0 & 0.489851 \\
    &     & 2.0 & 0 & 0.058885 \\
\cline{2-5}
    & \multirow{2}{*}{2.0} & 0.0 & 1 & 0.889743 \\
    &     & 1.0 & 1 & 0.498069 \\
\cline{2-5}
    & 1.0 & 0.0 & 0 & 0.397285 \\
\bottomrule
\end{tabular}

(image: latex_bug)

@mroeschke
Member

Looks to produce a correct output on master. Could use a test:

In [7]: df = pd.DataFrame()
   ...:
   ...:
   ...: idx = 0
   ...: for i in range(2):
   ...:   for val0 in [3,2,1]:
   ...:     for val1 in range(val0):
   ...:       idx = idx + 1
   ...:       r0 = str(np.mod(int(idx*0.5),2))
   ...:       r1 = np.random.uniform()
   ...:       df = df.append({"i":i,"val0" : val0,\
   ...:                             "val1":val1,
   ...:                             "r0":r0, \
   ...:                           "r1":r1}, ignore_index=True)
   ...:
   ...:
   ...: df.set_index(["i",'val0','val1',"r0", 'r1'], inplace=True)
   ...:
   ...: print(df.to_latex(multirow = True, escape=False))
\begin{tabular}{lllll}
\toprule
    &     &     &   &          \\
i & val0 & val1 & r0 & r1 \\
\midrule
\multirow{6}{*}{0.0} & \multirow{3}{*}{3.0} & 0.0 & 0 & 0.959256 \\
    &     & 1.0 & 1 & 0.617433 \\
    &     & 2.0 & 1 & 0.355065 \\
\cline{2-5}
    & \multirow{2}{*}{2.0} & 0.0 & 0 & 0.493983 \\
    &     & 1.0 & 0 & 0.710831 \\
\cline{2-5}
    & 1.0 & 0.0 & 1 & 0.068669 \\
\cline{1-5}
\multirow{6}{*}{1.0} & \multirow{3}{*}{3.0} & 0.0 & 1 & 0.391476 \\
    &     & 1.0 & 0 & 0.630937 \\
    &     & 2.0 & 0 & 0.892755 \\
\cline{2-5}
    & \multirow{2}{*}{2.0} & 0.0 & 1 & 0.018855 \\
    &     & 1.0 & 1 & 0.444306 \\
\cline{2-5}
    & 1.0 & 0.0 & 0 & 0.590894 \\
\bottomrule
\end{tabular}


In [8]: pd.__version__
Out[8]: '0.26.0.dev0+627.gef77b5700'

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug IO LaTeX to_latex labels Oct 22, 2019
@mroeschke
Member

Actually this looks incorrect again:

In [1]:         df = pd.DataFrame(
   ...:             index=[(0.0, 3.0, 0.0, 0, 0.929121402530996), (0.0, 3.0, 1.0, 1, 0.4187460393711979),
   ...:                    (0.0, 3.0, 2.0, 1, 0.47309,
   ...:                     69460209665), (0.0, 2.0, 0.0, 0, 0.3044652225366862),
   ...:                    (0.0, 2.0, 1.0, 0, 0.8087530860532746), (0.0, 1.0, 0.0, 1, 0.0799379522,
   ...:                                                             5657385),
   ...:                    (1.0, 3.0, 0.0, 1, 0.7793463916039404), (1.0, 3.0, 1.0, 0, 0.21066478200369132),
   ...:                    (1.0, 3.0, 2.0, 0, 0.3143737387268,
   ...:                     193), (1.0, 2.0, 0.0, 1, 0.46081170223017887),
   ...:                    (1.0, 2.0, 1.0, 1, 0.8865341655631166), (1.0, 1.0, 0.0, 0, 0.31459248512345084
   ...:                                                             )])
   ...:         result = df.to_latex(multirow = True, escape=False)

# should be `\begin{tabular}{lllll}`
In [2]: result
Out[2]: "\\begin{tabular}{l}\n\\toprule\nEmpty DataFrame\nColumns: Index([], dtype='object')\nIndex: Index([    (0.0, 3.0, 0.0, 0, 0.929121402530996),\n          (0.0, 3.0, 1.0, 1, 0.4187460393711979),\n        (0.0, 3.0, 2.0, 1, 0.47309, 69460209665),\n          (0.0, 2.0, 0.0, 0, 0.3044652225366862),\n          (0.0, 2.0, 1.0, 0, 0.8087530860532746),\n       (0.0, 1.0, 0.0, 1, 0.0799379522, 5657385),\n          (1.0, 3.0, 0.0, 1, 0.7793463916039404),\n         (1.0, 3.0, 1.0, 0, 0.21066478200369132),\n        (1.0, 3.0, 2.0, 0, 0.3143737387268, 193),\n         (1.0, 2.0, 0.0, 1, 0.46081170223017887),\n          (1.0, 2.0, 1.0, 1, 0.8865341655631166),\n         (1.0, 1.0, 0.0, 0, 0.31459248512345084)],\n      dtype='object') \\\\\n\\bottomrule\n\\end{tabular}\n"

In [3]: pd.__version__
Out[3]: '1.0.0rc0+134.gbbcda98c7.dirty'
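
One detail of the snippet above is worth checking: three of the index tuples contain a stray comma inside what looks like a single float literal, so they have six elements instead of five (this is also visible in Out[2]). The stdlib-only check below just counts the elements. Whether the ragged tuples explain the flat object-dtype Index in Out[2] is an assumption that the thread does not confirm.

```python
# Index tuples exactly as they parse from the snippet above.
index = [
    (0.0, 3.0, 0.0, 0, 0.929121402530996),
    (0.0, 3.0, 1.0, 1, 0.4187460393711979),
    (0.0, 3.0, 2.0, 1, 0.47309, 69460209665),
    (0.0, 2.0, 0.0, 0, 0.3044652225366862),
    (0.0, 2.0, 1.0, 0, 0.8087530860532746),
    (0.0, 1.0, 0.0, 1, 0.0799379522, 5657385),
    (1.0, 3.0, 0.0, 1, 0.7793463916039404),
    (1.0, 3.0, 1.0, 0, 0.21066478200369132),
    (1.0, 3.0, 2.0, 0, 0.3143737387268, 193),
    (1.0, 2.0, 0.0, 1, 0.46081170223017887),
    (1.0, 2.0, 1.0, 1, 0.8865341655631166),
    (1.0, 1.0, 0.0, 0, 0.31459248512345084),
]

# The original example has five index levels, so every tuple should have 5 items.
ragged = [t for t in index if len(t) != 5]
for t in ragged:
    print(len(t), t)
```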

@mroeschke mroeschke added Bug IO LaTeX to_latex and removed Needs Tests Unit test(s) needed to prevent regressions good first issue labels Jan 21, 2020
@mroeschke
Member

Looks to work on master now. Could use a test

In [7]: df = pd.DataFrame()
   ...:
   ...:
   ...: idx = 0
   ...: for i in range(2):
   ...:   for val0 in [3,2,1]:
   ...:     for val1 in range(val0):
   ...:       idx = idx + 1
   ...:       r0 = str(np.mod(int(idx*0.5),2))
   ...:       r1 = np.random.uniform()
   ...:       df = df.append({"i":i,"val0" : val0,\
   ...:                             "val1":val1,
   ...:                             "r0":r0, \
   ...:                           "r1":r1}, ignore_index=True)
   ...:
   ...:
   ...: df.set_index(["i",'val0','val1',"r0", 'r1'], inplace=True)
   ...:
   ...: print(df.to_latex(multirow = True, escape=False))
\begin{tabular}{lllll}
\toprule
    &     &     &   &          \\
i & val0 & val1 & r0 & r1 \\
\midrule
\multirow{6}{*}{0.0} & \multirow{3}{*}{3.0} & 0.0 & 0 & 0.795703 \\
    &     & 1.0 & 1 & 0.135335 \\
    &     & 2.0 & 1 & 0.210815 \\
\cline{2-5}
    & \multirow{2}{*}{2.0} & 0.0 & 0 & 0.398306 \\
    &     & 1.0 & 0 & 0.753541 \\
\cline{2-5}
    & 1.0 & 0.0 & 1 & 0.753324 \\
\cline{1-5}
\multirow{6}{*}{1.0} & \multirow{3}{*}{3.0} & 0.0 & 1 & 0.606834 \\
    &     & 1.0 & 0 & 0.703867 \\
    &     & 2.0 & 0 & 0.532124 \\
\cline{2-5}
    & \multirow{2}{*}{2.0} & 0.0 & 1 & 0.393957 \\
    &     & 1.0 & 1 & 0.252969 \\
\cline{2-5}
    & 1.0 & 0.0 & 0 & 0.401082 \\
\bottomrule
\end{tabular}


In [8]: pd.__version__
Out[8]: '1.1.0.dev0+1108.gcad602e16'
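
Since the values in r1 are random, a regression test cannot compare against a fixed string without seeding. One possible approach, sketched below with only the standard library, is to check a structural property of the generated LaTeX: no \multirow span may still be open in a column when a \cline covering that column is emitted. The function name multirow_violations is hypothetical, and the parser assumes one table row per line and no escaped ampersands inside cells. The two sample strings are excerpts from the 0.20.2 and 0.23.4 outputs quoted in this thread.

```python
import re
from typing import Dict, List

MULTIROW = re.compile(r"\\multirow\{(\d+)\}")
CLINE = re.compile(r"\\cline\{(\d+)-(\d+)\}")


def multirow_violations(latex: str) -> List[str]:
    """List every \\cline that cuts through a still-open \\multirow cell."""
    remaining: Dict[int, int] = {}  # 1-based column -> rows still owed to a span
    problems: List[str] = []
    for raw in latex.splitlines():
        line = raw.strip()
        cline = CLINE.match(line)
        if cline:
            lo, hi = int(cline.group(1)), int(cline.group(2))
            for col, left in remaining.items():
                if left > 0 and lo <= col <= hi:
                    problems.append(f"column {col}: multirow crosses {line}")
            continue
        if not line.endswith(r"\\"):
            continue  # \toprule, \midrule, \begin{tabular}, blank lines, ...
        for col, cell in enumerate(line[:-2].split("&"), start=1):
            if remaining.get(col, 0) > 0:
                remaining[col] -= 1  # this row is covered by an earlier span
            found = MULTIROW.search(cell)
            if found:
                # The span covers this row plus (n - 1) following rows.
                remaining[col] = int(found.group(1)) - 1
    return problems


BUGGY = r"""
& 1.0 & \multirow{2}{*}{0.0} & \multirow{2}{*}{1} & 0.255642 \\
\cline{1-5}
\multirow{6}{*}{1.0} & \multirow{3}{*}{3.0} &     &   & 0.093269 \\
"""

FIXED = r"""
    & 1.0 & 0.0 & 1 & 0.328993 \\
\cline{1-5}
\multirow{6}{*}{1.0} & \multirow{3}{*}{3.0} & 0.0 & 1 & 0.078872 \\
"""

print(multirow_violations(BUGGY))
print(multirow_violations(FIXED))
```

In an actual pandas test, the string passed in would be the result of df.to_latex(multirow=True, escape=False), and the assertion would be that the returned list is empty.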

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug IO LaTeX to_latex labels Apr 4, 2020
@mroeschke mroeschke mentioned this issue May 26, 2021
8 tasks
@jreback jreback modified the milestones: Contributions Welcome, 1.3 May 26, 2021