DOC/TST: doctests leaving extraneous files #23201

Closed
jreback opened this issue Oct 17, 2018 · 8 comments · Fixed by #23858
Labels
Docs good first issue Testing pandas testing functions or related to the test suite
Milestone

Comments

@jreback
Contributor

jreback commented Oct 17, 2018

after running

LINT=1 ci/code_check.sh

we get some garbage left around

(pandas) bash-3.2$ git st
On branch PR_TOOL_MERGE_PR_23096
Untracked files:
  (use "git add <file>..." to include in what will be committed)

        df.parquet.gzip
        df_info.txt
        output.xlsx
        output1.xlsx

nothing added to commit but untracked files present (use "git add" to track)
@jreback jreback added Testing pandas testing functions or related to the test suite Docs labels Oct 17, 2018
@jreback jreback added this to the 0.24.0 milestone Oct 17, 2018
@jreback
Contributor Author

jreback commented Oct 17, 2018

cc @datapythonista

prob from some of the doc-tests

@datapythonista
Member

I hadn't noticed, but these are likely from the doctests. I'm not sure what the best option is here. What comes to mind:

  • Save to temp: df.to_csv('/tmp/data.csv')
  • Remove the file after saving in the doctest (I don't like that, as it may be confusing and verbose for users)
  • Avoid saving to disk in the doctests (I don't like it, as we won't be able to show several methods)
  • Save to StringIO objects (probably too complex for beginners)
  • Have a clean-up function (we'd have to maintain the list of files, and we have several ways to run the doctests, so we can't run it automatically in all cases)

Any other ideas? I think the first option is probably the best, even if it's not ideal.
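For illustration, the clean-up option from the list above could look roughly like the sketch below. The function name `remove_doctest_artifacts` and the constant `DOCTEST_ARTIFACTS` are hypothetical and not part of pandas; the file names are the ones from the `git st` output in this issue. It also shows the drawback mentioned: the list has to be kept up to date by hand.

```python
from pathlib import Path

# File names taken from the untracked files reported in this issue.
# Any new doctest that writes to disk would need an entry here.
DOCTEST_ARTIFACTS = (
    "df.parquet.gzip",
    "df_info.txt",
    "output.xlsx",
    "output1.xlsx",
)


def remove_doctest_artifacts(root=".", names=DOCTEST_ARTIFACTS):
    """Delete known doctest leftovers under ``root``; return the names removed."""
    removed = []
    for name in names:
        path = Path(root) / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed


if __name__ == "__main__":
    print("removed:", remove_doctest_artifacts())
```

Such a helper would still have to be called from every entry point that runs the doctests, which is the second objection raised in the list.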

@WillAyd
Member

WillAyd commented Oct 25, 2018

Is using a tempfile / namedtempfile not an option?

@jreback
Contributor Author

jreback commented Oct 25, 2018

yes, these should use a named temp file, I think

@datapythonista
Member

In terms of cleaning up after running the doctests, there is no doubt that using the Python stdlib tempfile module is the best option. But in terms of documentation, I think we're doing our users a disservice if, instead of showing a method like to_csv as:

df.to_csv('/tmp/my_dataframe.csv')

we show:

import tempfile
with tempfile.NamedTemporaryFile() as csv_fd:
    df.to_csv(csv_fd)
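As a rough sketch of the trade-off, a variant using `tempfile.TemporaryDirectory` keeps a path-based call but needs even more scaffolding. The helper name `write_csv_in_temp_dir` is hypothetical, and the stdlib `csv` module stands in for `df.to_csv` so the sketch runs without pandas.

```python
import csv
import os
import tempfile


def write_csv_in_temp_dir(rows):
    """Write ``rows`` to a CSV in a temporary directory; return (path, text)."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "my_dataframe.csv")
        # Stand-in for df.to_csv(path); assumption made to avoid a pandas dependency.
        with open(path, "w", newline="") as fh:
            csv.writer(fh).writerows(rows)
        # Read the file back while the directory still exists.
        with open(path, newline="") as fh:
            text = fh.read()
    # The directory and the file inside it are removed when the with block exits.
    return path, text


if __name__ == "__main__":
    path, text = write_csv_in_temp_dir([("a", 1), ("b", 2)])
    print("file still exists:", os.path.exists(path))
```

Nothing is left behind after the call, but the example a reader sees is several lines longer than the one-line `to_csv` call it is meant to document.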

@TomAugspurger
Contributor

How about we add doctest: +SKIP to the lines writing to disk, and lines that read afterwards?

Writing to /tmp isn't great, since users may be copy-pasting and may not have a /tmp directory.

Anything with cleaning up or writing to a StringIO will, I think, just muddy the example. We shouldn't compromise on the documentation here.
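A minimal sketch of how the `# doctest: +SKIP` directive behaves, using only the stdlib `doctest` module. The function `save_rows` is a hypothetical stand-in for a writer such as `DataFrame.to_csv`, and `run_docstring_examples` is a helper invented for this sketch; neither is pandas code.

```python
import doctest
import os


def save_rows(rows, path):
    """Write rows to ``path`` as comma-separated lines.

    Hypothetical stand-in for a method that writes to disk.

    Examples
    --------
    >>> rows = [("a", 1), ("b", 2)]
    >>> len(rows)
    2
    >>> save_rows(rows, "output.csv")  # doctest: +SKIP
    """
    with open(path, "w") as fh:
        for row in rows:
            fh.write(",".join(str(value) for value in row) + "\n")


def run_docstring_examples(func):
    """Run the examples in ``func.__doc__`` and return the number of failures."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed


if __name__ == "__main__":
    print("failed examples:", run_docstring_examples(save_rows))
    # The skipped line is never executed, so no file should appear.
    print("output.csv created:", os.path.exists("output.csv"))
```

The rendered example still shows a plain, copy-pasteable call with a simple file name, while the doctest runner never executes the line that writes to disk. The cost is that the skipped lines are no longer verified by the test run.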

@datapythonista
Member

+1 on doctest: +SKIP.

@RomainSa
Contributor

Working on it!

5 participants