
DataFrame.from_records should have an optional "size" parameter #1794

Closed
tebeka opened this issue Aug 21, 2012 · 2 comments

Comments

tebeka commented Aug 21, 2012

Currently, data must have a __len__ method. However, in some cases I'd like to pass an iterator with a known size. This mostly comes up with databases: I can run SELECT COUNT(*) FROM table to get the size, then run SELECT * FROM table and pass the cursor as data.

As it stands, I have to make an in-memory copy of the rows in the cursor.
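For readers on a recent pandas release: DataFrame.from_records accepts an iterator as data together with an nrows argument, which covers this workflow. Below is a minimal sketch, assuming pandas is installed and using an in-memory sqlite3 table named trades as a hypothetical stand-in for the real database:

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory table standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (id INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?)",
    [(i, i * 1.5) for i in range(5)],
)

# First query: ask the database how many rows there are.
(n_rows,) = conn.execute("SELECT COUNT(*) FROM trades").fetchone()

# Second query: iterate over the cursor instead of calling fetchall().
cursor = conn.execute("SELECT * FROM trades")
columns = [desc[0] for desc in cursor.description]

# nrows tells from_records how many rows to pull from the iterator.
df = pd.DataFrame.from_records(cursor, columns=columns, nrows=n_rows)
print(df)
```

In recent pandas versions the iterator is still collected into a Python list before the frame is built, so this saves the manual copy but not the extra memory discussed in the next comment.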

tebeka (author) commented Nov 20, 2012

Looking at the DataFrame.from_records code, it seems to first create an array from all the rows and then allocate the DataFrame. If that is the case, we still allocate twice the memory we need.
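A known size makes it possible to allocate each column once at its final length and fill it directly from the iterator, skipping the intermediate array. Below is a minimal sketch of that idea. records_to_frame is a hypothetical helper, not a pandas API, and the column dtypes are assumed to be known in advance:

```python
import numpy as np
import pandas as pd


def records_to_frame(rows, n_rows, columns, dtypes):
    """Hypothetical helper (not a pandas API): build a DataFrame from an
    iterator whose length is known up front, writing each value straight
    into preallocated per-column NumPy arrays."""
    # Allocate each column once, at its final size.
    arrays = [np.empty(n_rows, dtype=dtype) for dtype in dtypes]
    filled = 0
    for i, row in enumerate(rows):
        if i >= n_rows:
            raise ValueError("iterator yielded more rows than n_rows")
        for array, value in zip(arrays, row):
            array[i] = value
        filled = i + 1
    if filled != n_rows:
        raise ValueError("iterator yielded fewer rows than n_rows")
    # copy=False asks pandas to reuse the arrays; whether every copy is
    # avoided depends on the pandas version and the column dtypes.
    return pd.DataFrame(dict(zip(columns, arrays)), copy=False)


rows = iter([(1, 1.5), (2, 3.0), (3, 4.5)])
df = records_to_frame(rows, 3, ["id", "price"], [np.int64, np.float64])
print(df)
```

The size check on both ends matters: if the database changes between the COUNT(*) query and the SELECT, the iterator can yield a different number of rows than expected.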

wesm (member) commented Nov 20, 2012

This is true. It is difficult to prevent, but we can take care of it internally. I'll create a separate issue.
