DOC: consistent imports ('import pandas as pd' et al) #9886

Closed
3 tasks
jorisvandenbossche opened this issue Apr 13, 2015 · 13 comments

Comments

@jorisvandenbossche
Member

Throughout the docs, many different imports are used, not always visible to the reader. And ideally, I think this would be a bit more uniform. I think we need:

  • consistent imports
  • visible imports

So the code snippets can be run without having to know what should be imported, apart from the default imports.
But in that case, of course, we have to decide/agree on which imports to use.


I think import pandas as pd is regarded as recommended? But still, it is not used that much in the docs. So I was wondering, does everybody agree to use that in all docs (and is it just a case of "it is the intention, but no one did it")?
So everywhere pd.DataFrame and pd.Series, or do we still want the convenience of using DataFrame(..)/Series(..) without the pd (that is, only from pandas import DataFrame, Series, and pd. for all the rest)?

So: Question 1: do we use import pandas as pd for everything or not?
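
To make the two options in Question 1 concrete, here is a minimal sketch. The column name `x` and the values are made up for illustration. Both styles build identical objects, so the choice only affects what the reader sees in the snippet.

```python
import pandas as pd
from pandas import DataFrame, Series  # only needed for the second style

# Style A: everything goes through the pd namespace.
s_a = pd.Series([1, 2, 3])
df_a = pd.DataFrame({"x": [1, 2, 3]})  # "x" is an illustrative column name

# Style B: bare DataFrame/Series, with pd. reserved for all the rest.
s_b = Series([1, 2, 3])
df_b = DataFrame({"x": [1, 2, 3]})

# Both spellings refer to the same classes, so the results are equal.
same_classes = (pd.Series is Series) and (pd.DataFrame is DataFrame)
same_values = s_a.equals(s_b) and df_a.equals(df_b)
```

Style A is more verbose, but a reader who copies a single snippet only needs the one default import.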


Then we have the often used imports from other packages. I think this is standard, and should not need discussion:

import numpy as np
import matplotlib.pyplot as plt

But for the datetime package, the imports are not really uniform. The following two imports are both used interchangeably in the docs, but are not compatible with each other:

import datetime
from datetime import datetime

So: Question 2: How do we import datetime?
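
A small sketch of why the two forms clash, using only the standard library. The second import rebinds the name `datetime` from the module to the class, so code written against the first form stops working in the same namespace. The variable names are illustrative.

```python
import datetime

# Module-style import: classes are reached through the module.
d1 = datetime.datetime(2015, 4, 13)
delta = datetime.timedelta(days=1)

# Class-style import: the same name now refers to the class, not the module.
from datetime import datetime

d2 = datetime(2015, 4, 13)

# Code written for the module-style import no longer works, because the
# datetime class has no attribute called timedelta.
try:
    datetime.timedelta(days=1)
    module_style_still_works = True
except AttributeError:
    module_style_still_works = False
```

Whichever form is chosen, a hidden import of the other form earlier on the same page would silently change what a visible snippet means.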

Further, there are some other imports used a lot, like from numpy.random import randn or from numpy import nan.

For other non-pandas imports, I propose that they should always use the standard import (e.g. np) or be done explicitly where used, and never hidden (as is now sometimes the case, e.g. from dateutil.relativedelta import relativedelta).

So: Question 3: Do we agree that all other non-pandas imports should be done explicitly?

This means that imports in the suppressed code block like the following will be removed:

from numpy import nan
randn = np.random.randn
randint = np.random.randint
from dateutil.relativedelta import relativedelta
import random
import os
import csv
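
As a rough sketch of what removing those hidden aliases would mean for a snippet, the calls would be spelled out through the standard np alias instead. The array sizes and bounds below are arbitrary.

```python
import numpy as np

# Previously hidden in a suppressed block, so the reader could not see
# where these names came from:
#   from numpy import nan
#   randn = np.random.randn
#   randint = np.random.randint

# Explicit equivalents, using only the standard np alias:
values = np.random.randn(3)               # instead of randn(3)
ints = np.random.randint(0, 10, size=5)   # instead of randint(0, 10, size=5)
missing = np.nan                          # instead of nan
```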

Third issue: there are also some pandas imports of non top-level things.

For pandas submodules, imports like this appear in the docs:

from pandas.tseries.api import *
from pandas.tseries.offsets import *

from pandas.core.reshape import *
from pandas.tools.tile import *

These often happen invisibly to the reader, which I think is a bad idea. First, a lot of these imports should never have been done, as the functions used from there are also in the top-level pandas namespace.

If it is for functions that are used from the submodules, then we have to decide how to import them. At least, the imports should happen explicitly in the visible docs. But to do that, there are some different forms possible:

from pandas.tseries.offsets import *
... BMonthEnd()
... Day()

from pandas.tseries.offsets import BMonthEnd, Day
... BMonthEnd()
... Day()

import pandas.tseries.offsets as offsets (or from pandas.tseries import offsets)
... offsets.BMonthEnd()
... offsets.Day()

So: Question 4: How do we import from pandas submodules?
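
For comparison, a runnable sketch of the two non-star forms side by side. The date is arbitrary and the variable names are illustrative. Both forms resolve to the same classes, so the difference is only in how much the call site tells the reader.

```python
import pandas as pd

# Named form: import the specific names that are used.
from pandas.tseries.offsets import BMonthEnd, Day

# Qualified form: import the submodule and prefix each use.
from pandas.tseries import offsets

ts = pd.Timestamp("2015-04-13")

# Both forms refer to the same classes and give the same result.
next_day_named = ts + Day()
next_day_qualified = ts + offsets.Day()
month_end = ts + BMonthEnd()
```

The qualified form keeps the origin visible at every call, while the named form is shorter when a section uses the same offsets many times.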

TO DO:

  • Decide on the imports (see questions above)
  • Clearly document this in the contributor guidelines
  • Adapt this in the documentation
@jreback jreback added this to the Next Major Release milestone Apr 13, 2015
@jorisvandenbossche
Member Author

@jreback @shoyer @TomAugspurger @sinhrks What do you think about this?

@shoyer
Member

shoyer commented Apr 23, 2015

I think we agree here.

  1. Yes, we should use import pandas as pd for everything.
  2. I don't think this matters so much, as long as we're explicit
  3. Yes, though I would probably allow for omitting import numpy as np as well.
  4. We should import explicitly, never with * (like we encourage users to do)

@jreback
Contributor

jreback commented Apr 23, 2015

agree with @shoyer here.

I would prob show import numpy as np as well

should show esp in 10min to pandas. Not really concerned about in the docs itself, but I guess it could be a bit unclear.

Are you proposing prefixing all calls with pd.? e.g.

pd.Series, pd.date_range etc...?

@jorisvandenbossche
Member Author

@shoyer On question 3: yes indeed, apart from pd, I would propose that also np and plt are assumed and do not have to be imported explicitly (or only once at the top).

@jreback Yes, indeed: pd.Series, pd.DataFrame, pd.date_range -> that is the consequence of saying "Yes, we should use import pandas as pd for everything" (see question 1 for more details)
But that is the reason I asked, as we don't do this at the moment, so it has quite an impact.

@jreback
Contributor

jreback commented Apr 23, 2015

I think that might be a bit too far. I would be +1 on this for 10min to pandas. But IMHO just clutters the docs otherwise. and we don't do it ANYWHERE IIRC.

@TomAugspurger
Contributor

  1. always import pandas as pd even for pd.Series, pd.DataFrame
  2. I always use import datetime, but as long as we're consistent
  3. Explicit imports for sure. We have
In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: import matplotlib.pyplot as plt

at the start of the 10 minute intro. I think that's ok. All others should be imported (and shown to be imported), say once per section? We shouldn't assume people are reading the docs straight through.
  4. Make the submodule imports visible and don't use *

So I think I'm agreeing with @jorisvandenbossche and @shoyer here.

@sinhrks
Member

sinhrks commented Apr 23, 2015

+1 for @shoyer and @TomAugspurger . And I prefer import datetime also, because I feel datetime.datetime and datetime.timedelta are more explicit.

@jorisvandenbossche
Member Author

@jreback That's the reason I brought it up, as now we don't do this, although we recommend it. But I think we should be consistent in the first place. If we recommend it, we should do it ourselves. If we do it in 10min, we should do it in all the docs, although this is more verbose.
Otherwise we should change our recommendation to from pandas import Series, DataFrame; import pandas as pd (but I am not really a proponent of that)

@jorisvandenbossche
Member Author

For question 4, does anybody have a preference between these two?

from pandas.tseries.offsets import BMonthEnd, Day
... BMonthEnd()
... Day()

import pandas.tseries.offsets as offsets (or from pandas.tseries import offsets)
... offsets.BMonthEnd()
... offsets.Day()

@jorisvandenbossche
Member Author

I started with some files, see #9987 (long train ride today :-)), so that is what it would look like if I follow the above.

springcoil added a commit to springcoil/pandas that referenced this issue Aug 30, 2015
BUG: 10633 - some last errors removed

ENH: pickle support for Period pandas-dev#10439

update legacy_storage for pickles

update pickles/msgpack for 0.16.2

Added tests for ABC Types, Issue pandas-dev#10828

TST: pandas-dev#10822, skip tests on windows for odd error message in to_datetime with unicode

COMPAT:Allow multi-indexes to be written to excel.

(Even though they cannot be read back in.)

Closes pandas-dev#10564

DOC: typo

A few changes in docs

TST: Changes in test

ERR: 10720

BUG: 10633 and 10800 fix

merging

Fixing a slight messup

DOC:Updating consistent imports in the merging.rst file pandas-dev#9886

DOC: GH9886 Part V

DOC: GH9886 Part V - some merging issues
@jorisvandenbossche
Member Author

Most is done, overview of the files that still need updates:

  • merging.rst (only randn)
  • timedeltas.rst (the non pd imports: offsets, datetime, pytz, ..)
  • timeseries.rst (same)

@ghost

ghost commented Mar 5, 2017

I'd like to work on this, for the above files: merging.rst, timedeltas.rst and timeseries.rst

@mroeschke
Member

Looks like the remaining files in #9886 (comment) have consistent import styles now. Going to close out this issue but happy to reopen if we re-catch inconsistent imports in the references.
