Merged cells #5

Open
mericano1 opened this issue Apr 17, 2024 · 2 comments

Comments

@mericano1

Hi there,
this is a great little tool, thanks for sharing. I have tested it on some sheets and noticed that the table boundaries are not correct when two cells are merged. This is probably because the merge information is lost when the sheet is converted to a dataframe?

What do you think would be the best way to handle this case?

@ChrisPappalardo
Owner

Hello,

Thanks for opening this and for your interest in eparse. I'm traveling at the moment but I'll look at this when I return. Can you provide a specific example of what you mean by the table boundaries not being correct when a cell is merged? The short answer is that we take whatever pandas gives by default when converting xlsx to a dataframe. In my experience this is only an issue on header rows, but I'd like to see your specific use case.

@ChrisPappalardo
Owner

Hi. I'm at the PyCon sprints following up on this issue, which appears to be a limitation of pandas:
image
The proposed workaround in pandas does not work for all cells (we would probably not want to fill NA values across the entire sub-table), but there may be a workaround when handling table headers in df_find_tables and/or df_parse_table.
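
To make the header-only idea concrete, here is a minimal sketch. It assumes that a merged header cell arrives from pandas with its value in the left-most column and NaN in the other columns it spans. The sample frame stands in for a parsed sheet, and `fill_merged_header` and its `header_rows` parameter are hypothetical names for illustration, not part of eparse or of `df_find_tables` / `df_parse_table`.

```python
import numpy as np
import pandas as pd

# Stand-in for a sheet read without a header row. The header "Sales" is
# assumed to be a merged cell spanning two columns, so only its left-most
# cell carries the value. dtype=object keeps mixed text/number cells as-is.
raw = pd.DataFrame(
    [
        ["Region", "Sales", np.nan],  # header row with a merged cell
        ["North", 10, np.nan],        # a genuinely empty data cell
        ["South", 7, 3],
    ],
    dtype=object,
)


def fill_merged_header(df, header_rows=1):
    """Forward-fill across columns, but only in the first `header_rows` rows.

    Hypothetical helper: data rows are left untouched so that real missing
    values are not overwritten.
    """
    out = df.copy()
    filled = out.iloc[:header_rows].ffill(axis=1)
    out.iloc[:header_rows] = filled.to_numpy()
    return out


fixed = fill_merged_header(raw)
print(fixed)
```

Restricting the fill to the header rows is the design choice here: it repairs the merged header without filling NA values across the rest of the sub-table. Merged cells in the body of a table, or merges that span rows, are not handled by this sketch.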
