Can't create DataFrame from SQLite3 cursor #10134

Closed
hexum opened this issue May 14, 2015 · 11 comments
Labels
API Design Duplicate Report Duplicate issue or pull request

Comments

@hexum

hexum commented May 14, 2015

When I pass a cursor as data to the DataFrame constructor, an error occurs:

import pandas as pd
cursor = sqlite.execute(sql)  # assumed: sqlite is an open sqlite3 connection, sql a SELECT query
pd.DataFrame(cursor)


/usr/lib/python3/dist-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    255                                          copy=copy)
    256         elif isinstance(data, collections.Iterator):
--> 257             raise TypeError("data argument can't be an iterator")
    258         else:
    259             try:

TypeError: data argument can't be an iterator

But a normal generator is accepted:

def gen():
    yield (1,2,3)
    yield (4,5,6)
    yield (7,8,9)

pd.DataFrame(gen())
Out[171]: 
   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9

This feels inconsistent.

@jreback
Contributor

jreback commented May 18, 2015

What is type(cursor).__mro__?

@jreback jreback added API Design IO SQL to_sql, read_sql, read_sql_query labels May 18, 2015
@hexum
Author

hexum commented May 18, 2015

type(cursor).__mro__
(sqlite3.Cursor, object)

@hexum
Author

hexum commented May 18, 2015

It's not a big issue: passing the cursor to the list constructor produces a normal list, which DataFrame accepts.
I just can't understand why an iterable is not accepted.
Isn't type checking bad practice in Python?
Why not just check whether the object can be iterated?

hasattr([], "__iter__")

@jreback
Contributor

jreback commented May 18, 2015

Type checking in Python is fine. In pandas it is actually quite a bit more complicated, because we need to determine, for example, whether a list-of-lists or a list-of-scalars was passed, which is problematic.

So an iterable must have __iter__ AND __len__. A cursor doesn't have this property, while range(5), for example, does. (I know that in theory rowcount works for this, but I don't think it is a guaranteed property.) So a cursor should act much like a GeneratorType (and not an Iterator).
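The distinction drawn here can be checked with collections.abc. A minimal sketch, assuming an in-memory database with a hypothetical table t: the cursor is an Iterator but not Sized, range(5) is Sized, and for a SELECT the sqlite3 module reports rowcount as -1, which is consistent with the point that rowcount is not a reliable length.

```python
import sqlite3
from collections.abc import Iterator, Sized

# Hypothetical table `t` in an in-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [(1, 2, 3), (4, 5, 6)])
cursor = conn.execute("SELECT a, b, c FROM t")

print(isinstance(cursor, Iterator))  # True: defines __iter__ and __next__
print(isinstance(cursor, Sized))     # False: no __len__
print(isinstance(range(5), Sized))   # True: len(range(5)) == 5
print(cursor.rowcount)               # -1: sqlite3 does not know a SELECT's row count up front
```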

@hexum
Author

hexum commented May 18, 2015

I just created a generator with no defined length, and DataFrame accepts it as I expect.
I think we should turn off the type check. A user can construct an infinite generator and pass the type check anyway.

In [20]: def g():
   ....:     for i in range(5):
   ....:         yield [i, i ** 2, i ** 3]
   ....:         

In [21]: DataFrame(g())
Out[21]: 
   0   1   2
0  0   0   0
1  1   1   1
2  2   4   8
3  3   9  27
4  4  16  64

@jreback
Contributor

jreback commented May 18, 2015

A generator is fine; you have an iterator. You can certainly make any changes you would like, but they would need to pass the test suite as is.

@hexum
Author

hexum commented May 18, 2015

Hmmm. I'll see how to overcome it.

@jorisvandenbossche jorisvandenbossche removed the IO SQL to_sql, read_sql, read_sql_query label May 20, 2015
@ns-cweber

@jreback Out of curiosity, why is a generator fine, but not an iterator? It looks like DataFrame's constructor checks to see if the data argument is a GeneratorType and then wraps it (data = list(data)), but if it's an iterator it raises an exception.
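The asymmetry described here comes down to check order: a generator is itself an iterator, so a GeneratorType branch placed before an Iterator branch catches generators first. The function below is a hypothetical, simplified model of that dispatch (coerce_data is an illustrative name), written from the description in this comment rather than copied from the pandas source.

```python
import types
from collections.abc import Iterator


def coerce_data(data):
    """Hypothetical model of the constructor branches described above
    (not the actual pandas code)."""
    if isinstance(data, types.GeneratorType):
        return list(data)  # generators are consumed into a list
    elif isinstance(data, Iterator):
        raise TypeError("data argument can't be an iterator")
    return data


def gen():
    yield (1, 2, 3)
    yield (4, 5, 6)


print(isinstance(gen(), Iterator))  # True: every generator is an iterator
print(coerce_data(gen()))           # [(1, 2, 3), (4, 5, 6)]

try:
    coerce_data(iter([(1, 2, 3)]))  # a plain iterator, not a generator
except TypeError as exc:
    print(exc)                      # data argument can't be an iterator
```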

@jreback
Contributor

jreback commented Jul 14, 2016

See my comment above: an iterator is not sufficient, as it's not required to have a len.

This is an old issue, so I don't really remember. Have a look at the test code for frame construction.

@ns-cweber

A generator is also not required to have a len. The solution for generators in the DataFrame constructor is to consume the generator into a list() and from there treat it as though a list had been passed as the data argument. The same should work for iterators. If you're worried about infinite iterators, then you should be equally worried about infinite generators, unless I'm misunderstanding something. I'll take a look at the tests.
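The proposal here, consuming any iterator into a list just as generators are, can be sketched as a small helper. materialize and its max_rows guard are hypothetical names used only for illustration; the guard shows that protection against infinite input, if wanted, applies equally to generators and other iterators such as a sqlite3 cursor.

```python
import itertools
import sqlite3
from collections.abc import Sized


def materialize(data, max_rows=None):
    """Hypothetical helper: return sized inputs unchanged and consume any
    other iterable (generator, sqlite3 cursor, ...) into a list.
    max_rows optionally caps how many items are read."""
    if isinstance(data, Sized):
        return data
    if max_rows is not None:
        data = itertools.islice(data, max_rows)
    return list(data)


def gen():
    yield (1, 2, 3)
    yield (4, 5, 6)


# Hypothetical table holding the same rows the generator yields.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a, b, c)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", gen())
cursor = conn.execute("SELECT a, b, c FROM t ORDER BY rowid")

print(materialize(gen()) == materialize(cursor))   # True: same list of tuples
print(materialize(itertools.count(), max_rows=3))  # [0, 1, 2]
```

Both inputs go through the same list() call; only the optional cap distinguishes bounded from unbounded reads.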

@TomAugspurger
Contributor

Duplicate of #2193

I also think it'll be possible to turn the iterable into a list, just like with a generator.

@TomAugspurger TomAugspurger added the Duplicate Report Duplicate issue or pull request label Jul 7, 2018
@TomAugspurger TomAugspurger added this to the No action milestone Jul 7, 2018
5 participants