DOC: consistent imports ('import pandas as pd' et al) #9886

jorisvandenbossche · 2015-04-13T20:40:45Z

Throughout the docs, many different imports are used, and they are not always visible to the reader. Ideally, I think this should be more uniform. I think we need:

consistent imports
visible imports

So that the code snippets can be run without having to know what should be imported, apart from the default imports.
But in that case, of course, we have to decide/agree on which imports to use.

I think import pandas as pd is regarded as recommended? But still, it is not used that much in the docs. So I was wondering, does everybody agree to use that in all docs (and is it just a case of "it is the intention, but no one did it")?
So everywhere pd.DataFrame and pd.Series, or do we still want the convenience of using DataFrame(..)/Series(..) without the pd (so only from pandas import DataFrame, Series, and use pd. for all the rest)?

So: Question 1: do we use import pandas as pd for everything or not?
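
To make Question 1 concrete, here is a minimal sketch of the two styles side by side. The column name and values are made up for illustration; both spellings resolve to the same classes, so the choice only affects what the reader sees in the docs.

```python
import pandas as pd
from pandas import DataFrame, Series

# Style A: everything goes through the `pd` prefix.
prefixed = pd.DataFrame({"a": pd.Series([1, 2, 3])})

# Style B: bare DataFrame/Series, `pd.` only for everything else.
bare = DataFrame({"a": Series([1, 2, 3])})

# Both styles build identical objects from identical classes.
print(prefixed.equals(bare), pd.DataFrame is DataFrame)
```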

Then we have the often used imports from other packages. I think this is standard, and should not need discussion:

import numpy as np
import matplotlib.pyplot as plt

But for the datetime package, the imports are not really uniform. The following two imports are used interchangeably in the docs, but are not compatible with each other:

import datetime
from datetime import datetime

So: Question 2: How do we import datetime?
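
A small sketch of why the two forms clash: both bind the same name, `datetime`, once to the module and once to the class, so a snippet written for one form fails under the other. The dates are arbitrary illustration values.

```python
import datetime

# Here `datetime` is the module, so the class and timedelta are attributes.
stamp_from_module = datetime.datetime(2015, 4, 13)
one_day = datetime.timedelta(days=1)

from datetime import datetime  # rebinds `datetime` to the class

# Here `datetime` is the class, so it is called directly.
stamp_from_class = datetime(2015, 4, 13)

# Code written for the module form now breaks.
try:
    datetime.timedelta(days=1)
    module_style_still_works = True
except AttributeError:
    module_style_still_works = False

print(stamp_from_module == stamp_from_class, module_style_still_works)
```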

Further, there are some other imports used a lot, like from numpy.random import randn or from numpy import nan.

For other non-pandas imports, I propose that they should always use the standard import (e.g. np) or be done explicitly where used, and never hidden (as is sometimes the case now, e.g. from dateutil.relativedelta import relativedelta).

So: Question 3: Do we agree that all other non-pandas imports should be done explicitly?

This means that imports in the suppressed code block like the following will be removed:

from numpy import nan
randn = np.random.randn
randint = np.random.randint
from dateutil.relativedelta import relativedelta
import random
import os
import csv
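
As a rough sketch of what removing those hidden aliases means for a snippet, the hidden and explicit spellings are shown together below. The array length is arbitrary, and only the standard `np` import is assumed to be known to the reader.

```python
import numpy as np

# Hidden style: aliases defined in a suppressed block, invisible to the reader.
from numpy import nan
randn = np.random.randn

hidden = randn(3)
hidden[0] = nan

# Explicit style: every name is traceable to the standard `np` import.
explicit = np.random.randn(3)
explicit[0] = np.nan

print(np.isnan(hidden[0]), np.isnan(explicit[0]))
```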

Third issue: there are also some pandas imports of non top-level things.

For pandas submodules, imports like this appear in the docs:

from pandas.tseries.api import *
from pandas.tseries.offsets import *

from pandas.core.reshape import *
from pandas.tools.tile import *

These imports are often not visible to the users, which I think is a bad idea. First, a lot of these imports should never have been done, as the functions used from there are also in the top-level pandas namespace.

If it is for functions that are used from the submodules, then we have to decide how to import them. At least, the imports should happen explicitly in the visible docs. But to do that, there are some different forms possible:

from pandas.tseries.offsets import *
... BMonthEnd()
... Day()

from pandas.tseries.offsets import BMonthEnd, Day
... BMonthEnd()
... Day()

import pandas.tseries.offsets as offsets (or from pandas.tseries import offsets)
... offsets.BMonthEnd()
... offsets.Day()

So: Question 4: How do we import from pandas submodules?
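
For comparison, a minimal sketch of the two non-star forms, using an arbitrary start date. Both give the same results; they differ only in whether the reader sees where `Day` and `BMonthEnd` come from at the point of use.

```python
import pandas as pd
from pandas.tseries import offsets
from pandas.tseries.offsets import BMonthEnd, Day

start = pd.Timestamp("2015-04-13")

# Form 1: names imported individually, used bare.
next_day_by_name = start + Day(1)
month_end_by_name = start + BMonthEnd()

# Form 2: submodule imported, names qualified at the point of use.
next_day_by_module = start + offsets.Day(1)
month_end_by_module = start + offsets.BMonthEnd()

print(next_day_by_name == next_day_by_module)
print(month_end_by_name == month_end_by_module)
```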

TO DO:

Decide on the imports (see questions above)
Clearly document this in the contributor guidelines
Adapt this in the documentation
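
As one possible aid for the last item, star imports in a snippet can be found mechanically. The sketch below uses only the standard-library `ast` module; `find_star_imports` is a hypothetical helper name, not existing pandas tooling, and the snippet text is made up from the imports quoted above.

```python
import ast


def find_star_imports(source):
    """Return the module names that `source` imports with `*`."""
    tree = ast.parse(source)
    return [
        node.module
        for node in ast.walk(tree)
        if isinstance(node, ast.ImportFrom)
        and any(alias.name == "*" for alias in node.names)
    ]


# The snippet is only parsed, never executed, so pandas need not be installed.
snippet = """
import pandas as pd
from pandas.tseries.offsets import *
from pandas.core.reshape import *
from datetime import datetime
"""

star_modules = find_star_imports(snippet)
print(star_modules)
```

Extracting the code blocks from the .rst files is left out here; this only covers the check on a single block of source text.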


jorisvandenbossche · 2015-04-22T23:58:23Z

@jreback @shoyer @TomAugspurger @sinhrks What do you think about this?

shoyer · 2015-04-23T00:01:25Z

I think we agree here.

1. Yes, we should use import pandas as pd for everything.
2. I don't think this matters so much, as long as we're explicit.
3. Yes, though I would probably allow for omitting import numpy as np as well.
4. We should import explicitly, never with * (like we encourage users to do).

jreback · 2015-04-23T00:06:00Z

agree with @shoyer here.

I would prob show import numpy as np as well

should show esp in 10min to pandas. Not really concerned about in the docs itself, but I guess it could be a bit unclear.

Are you proposing prefixing all calls with pd.? e.g.

pd.Series, pd.date_range etc...?

jorisvandenbossche · 2015-04-23T00:11:32Z

@shoyer On question 3: yes indeed, apart from pd, I would propose that np and plt are also assumed and do not have to be imported explicitly (or only once at the top).

@jreback Yes, indeed: pd.Series, pd.DataFrame, pd.date_range -> that is the consequence of saying "Yes, we should use import pandas as pd for everything" (see question 1 for more details)
But that is the reason I asked, as we don't do this at the moment, so it has quite an impact.

jreback · 2015-04-23T00:15:38Z

I think that might be a bit too far. I would be +1 on this for 10min to pandas. But IMHO just clutters the docs otherwise. and we don't do it ANYWHERE IIRC.

TomAugspurger · 2015-04-23T13:17:48Z

1. always import pandas as pd even for pd.Series, pd.DataFrame
2. I always use import datetime, but as long as we're consistent
3. Explicit imports for sure. We have

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: import matplotlib.pyplot as plt

at the start of the 10 minute intro. I think that's ok. All others should be imported (and shown to be imported), say once per section? We shouldn't assume people are reading the docs straight through.
4. Make the submodule imports visible and don't use *

So I think I'm agreeing with @jorisvandenbossche and @shoyer here.

sinhrks · 2015-04-23T14:09:39Z

+1 for @shoyer and @TomAugspurger . And I prefer import datetime also, because I feel datetime.datetime and datetime.timedelta are more explicit.

jorisvandenbossche · 2015-04-25T07:23:06Z

@jreback That's the reason I brought it up, as now we don't do this, although we recommend it. But I think we should be consistent in the first place. If we recommend it, we should do it ourselves. If we do it in 10min, we should do it in all the docs, although this is more verbose.
Otherwise we should change our recommendation to from pandas import Series, DataFrame; import pandas as pd (but I am not really a proponent of that)

jorisvandenbossche · 2015-04-25T07:24:10Z

For question 4, does anybody have a preference between these two?

from pandas.tseries.offsets import BMonthEnd, Day
... BMonthEnd()
... Day()

import pandas.tseries.offsets as offsets (or from pandas.tseries import offsets)
... offsets.BMonthEnd()
... offsets.Day()

jorisvandenbossche · 2015-04-25T20:43:25Z

I started with some files, see #9987 (long train ride today :-)), so that is what it would look like if I follow the above.


jorisvandenbossche · 2017-03-03T15:41:24Z

Most is done, overview of the files that still need updates:

merging.rst (only randn)
timedeltas.rst (the non pd imports: offsets, datetime, pytz, ..)
timeseries.rst (same)

ghost · 2017-03-05T15:21:00Z

I'd like to work on this, for the above files: merging.rst, timedeltas.rst and timeseries.rst

mroeschke · 2021-04-18T06:21:20Z

Looks like the remaining files in #9886 (comment) have consistent import styles now. Going to close out this issue but happy to reopen if we re-catch inconsistent imports in the references.

jorisvandenbossche added the Docs label Apr 13, 2015

jreback added this to the Next Major Release milestone Apr 13, 2015

jorisvandenbossche mentioned this issue Apr 25, 2015

DOC: clean up / consistent imports (GH9886) #9987

Merged

jorisvandenbossche mentioned this issue May 5, 2015

DOC: add import prefix to all pandas imports #1967

Closed

jorisvandenbossche added the Difficulty Novice label May 5, 2015

jorisvandenbossche mentioned this issue May 14, 2015

DOC: consistent imports (GH9886) part II #10136

Merged

jorisvandenbossche mentioned this issue Jun 15, 2015

DOC: consistent imports (GH9886) part III #10359

Merged

jorisvandenbossche mentioned this issue Jul 13, 2015

DOC: consistent imports (GH9886) part IV #10561

Merged

jorisvandenbossche mentioned this issue Aug 30, 2015

EuroScipy 2015 pandas sprint #10877

Closed

sinhrks mentioned this issue Mar 5, 2016

ENH: Partial string matching for timestamps with multiindex #12530

Closed

4 tasks

jackieleng mentioned this issue Aug 27, 2016

Added consistent pandas imports in io documentation #14097

Merged

jackieleng mentioned this issue Dec 27, 2016

DOC: consistent import timedeltas docs #14997

Merged

jorisvandenbossche pushed a commit that referenced this issue Dec 27, 2016

DOC: consistent import timedeltas docs (#14997)

7f0eefc

xref #9886

TomAugspurger added the good first issue label Oct 11, 2017

jreback added good first issue and removed good first issue Difficulty Novice labels Dec 15, 2017

jorisvandenbossche mentioned this issue Oct 27, 2018

Validate PEP-8 in docstring examples #23154

Closed

mroeschke closed this as completed Apr 18, 2021
