REGR: close corrupt files in ExcelFile #41806

Merged
merged 1 commit into from
Jun 4, 2021

Conversation

twoertwein
Member

@twoertwein twoertwein commented Jun 3, 2021

@simonjayhawkins maybe for 1.2.5?

@jreback jreback added IO Excel read_excel, to_excel Regression Functionality that used to work in a prior pandas version labels Jun 3, 2021
@jreback jreback added this to the 1.2.5 milestone Jun 3, 2021
@jreback
Contributor

jreback commented Jun 3, 2021

lgtm. ping when ready.

Member

@simonjayhawkins simonjayhawkins left a comment

Thanks @twoertwein

Comment on lines 1506 to 1520
msg = "File is not a zip file"
with tm.ensure_clean(f"corrupt{read_ext}") as file:
    with pytest.raises((BadZipFile, ValueError), match=msg):
        with pd.ExcelFile(file, engine=engine) as _:
            pass
Member

This only tests the message raised by the pd.ExcelFile(file, engine=engine) as _ expression, which already raises BadZipFile: File is not a zip file.

How does this test that the file is closed?

Member

I can't replicate the issue locally to see this test fail without the patch.

I assume it's Windows-specific (I use Windows, but through WSL, and don't have dev tools installed in native Windows).

I think ensure_clean is supposed to remove files whether they are closed or not, so isn't the file-leak check exercising ensure_clean rather than the operation under test?

Member Author

@twoertwein twoertwein Jun 4, 2021

The test only worked on Windows: there, tm.ensure_clean fails to remove the still-open file and raises an error (the file leak check doesn't catch the ResourceWarning).

Locally, the test now fails for me when the patch is not applied (but that might depend on whether ResourceWarnings are enabled or disabled).
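
To make the ResourceWarning point concrete, here is a minimal standard-library sketch of how a leaked handle can be detected on any platform. It is not the pandas test: open_and_fail and leaked_handles are hypothetical names, and open_and_fail only stands in for a reader that opens a file and then rejects it as corrupt.

```python
import gc
import os
import tempfile
import warnings


def open_and_fail(path, close_on_error):
    """Hypothetical reader: opens the file, then rejects it as corrupt."""
    handle = open(path, "rb")
    try:
        raise ValueError("File is not a zip file")
    except ValueError:
        if close_on_error:
            # Behaviour with the fix: close the handle before re-raising.
            handle.close()
        raise


def leaked_handles(close_on_error):
    """Return the ResourceWarnings recorded while the reader fails."""
    fd, path = tempfile.mkstemp(suffix=".xlsx")
    os.close(fd)
    try:
        with warnings.catch_warnings(record=True) as caught:
            # ResourceWarning is ignored by default, so enable it explicitly.
            warnings.simplefilter("always", ResourceWarning)
            try:
                open_and_fail(path, close_on_error)
            except ValueError:
                pass
            # Force finalization of any handle that was left open.
            gc.collect()
        return [w for w in caught if issubclass(w.category, ResourceWarning)]
    finally:
        os.remove(path)


print(len(leaked_handles(close_on_error=False)) > 0)
print(len(leaked_handles(close_on_error=True)))
```

Because the warning filter is set explicitly, a check like this does not depend on Windows refusing to delete an open file or on the interpreter's default warning settings.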

@simonjayhawkins
Member

@twoertwein some tests are failing. If you fix them up and remove the draft status, this can be merged when green.

(I think this is the last PR for 1.2.5; we have a few outstanding issues but no PRs to fix them. https://github.com/pandas-dev/pandas/milestone/85 @jreback )

@simonjayhawkins simonjayhawkins mentioned this pull request Jun 4, 2021
@twoertwein twoertwein marked this pull request as ready for review June 4, 2021 18:44
@twoertwein
Member Author

The remaining tests fail because inspect_format (in xlrd) uses if content: but content is b"" (empty file). It should probably check if content is not None and check the length of the content.

I think we can avoid this bug by writing random data to the file.
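
The truthiness pitfall described here can be sketched without xlrd. The two helpers below are hypothetical and only mimic the if content: pattern versus an explicit None check; they are not xlrd's actual inspect_format, and PEEK_SIZE is an arbitrary value chosen for illustration.

```python
PEEK_SIZE = 8  # arbitrary value for illustration


def peek_truthy(content=None, path=None):
    """Mimics `if content:`; empty bytes are treated like missing content."""
    if content:
        return content[:PEEK_SIZE]
    if path is None:
        raise TypeError("content treated as missing and no path given")
    with open(path, "rb") as f:
        return f.read(PEEK_SIZE)


def peek_explicit(content=None, path=None):
    """Checks `content is not None`, so empty bytes are still content."""
    if content is not None:
        return content[:PEEK_SIZE]
    if path is None:
        raise TypeError("neither content nor path given")
    with open(path, "rb") as f:
        return f.read(PEEK_SIZE)


# Empty content falls through to the path branch in the truthiness version.
try:
    peek_truthy(content=b"")
except TypeError as err:
    print("truthy check:", err)

# The explicit check accepts empty content and returns an empty peek.
print("explicit check:", peek_explicit(content=b""))

# Non-empty placeholder data avoids the problem in both versions.
print("placeholder data:", peek_truthy(content=b"corrupt"))
```

Writing any non-empty data to the test file sidesteps the first branch, which is the workaround proposed above.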

@twoertwein
Member Author

@simonjayhawkins @jreback green

@jreback jreback merged commit 2973e4e into pandas-dev:master Jun 4, 2021
@jreback
Contributor

jreback commented Jun 4, 2021

@meeseeksdev backport 1.2.x

@jreback
Contributor

jreback commented Jun 4, 2021

thanks @twoertwein

@jreback
Contributor

jreback commented Jun 4, 2021

looks like this needs a manual backport

Labels
IO Excel read_excel, to_excel Regression Functionality that used to work in a prior pandas version
Projects
None yet
3 participants