API: Missing support for creating a DataFrame from Python native arrays (or other objects supporting the buffer protocol) #4297
Comments
out of curiosity, why are you using the python `array` module?
I am not seeing this as an opposition between one and the other, as I use either (or both) depending on the needs. When I pick Python's array, the reason can be one of:
```
$ python -m timeit -s 'import array' -s 'a = array.array("i", range(1000))' -- 'for e in a:' ' pass'
100000 loops, best of 3: 15.8 usec per loop
$ python -m timeit -s 'import numpy' -s 'a = numpy.array(range(1000))' -- 'for e in a:' ' pass'
10000 loops, best of 3: 77.2 usec per loop
```
your first point is valid, though your speed test is not testing anything, try doing actual operations
numpy supports the buffer protocol
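(As an aside, a minimal sketch, not taken from the thread, of what "numpy supports the buffer protocol" means in practice: an `array.array` can be wrapped as an ndarray without copying the data.)

```python
# Minimal sketch: wrap a Python array.array with numpy via the buffer
# protocol; no copy is made, both objects share the same memory.
import array
import numpy as np

a = array.array('i', range(5))
view = np.frombuffer(a, dtype=np.intc)  # np.intc matches array's 'i' typecode

a[0] = 42          # mutate through the original array...
print(view[0])     # ...and the numpy view sees the change: 42
```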
Note: the issue report is about a missing functionality. Should you wish (and have the powers) to decide not to add it, feel free to close the issue. Otherwise:
@lgautier I don't have a problem with your request. I am simply pointing out to you that IMHO there is no reason to use `array.array` rather than numpy. Your 3rd point is also very odd; if you want lightweight, then how would you expect to use pandas? Your second point is also not a use case: accessing elements is not a relevant benchmark as it tests a very specific criterion (which maybe array is optimized for). As to your first point, http://sourceforge.net/projects/numpy/files/NumPy/stats/timeline?dates=2013-01-01+to+2013-07-19 seems like a lot to me.
I'd like to keep the issue report on track: this is about suggesting that objects supporting the buffer protocol should be accepted by the constructors for `DataFrame`.

To keep answering what has moved, in my opinion, to an extraneous discussion, maybe because of mutual incomprehension:

Point 1: I looked at the link, and I really could not get the slightest hint that "numpy is a base for most Python packages" (if the thread was more relaxed I could have made the comment that the graph caught my attention first, and that a striking feature is the decreasing number of downloads over time... ;-) ). If the number of downloads is what I should be looking at, that's not a small number, but a little perspective would not hurt:

```
$ vanity sqlalchemy | tail -n 1
SQLAlchemy has been downloaded 2183196 times!
$ vanity numpy | tail -n 1
numpy has been downloaded 883206 times!
```

I would certainly not claim that most Python packages are based on SQLAlchemy, nor even that most Python packages having anything to do with SQL are using SQLAlchemy.

Point 2: accessing elements in an array quickly is either not an operation, or too specialized?! Come on!

Point 3: that's also in answer to your question about why one would ever use Python's arrays. I just listed what sometimes makes me use `array.array`.
ok...I have marked this for 0.13. Not a big deal. Your point 3 is obviously completely False; you cannot use pandas without numpy.
Thanks.
I don't want to be arguing over a controversial issue then... ;-)
Could you point out which part is (allegedly) false? I'd be happy to try clarifying. Your comment suggests that you might have understood that I am claiming that pandas could be used without numpy.
What evidence is there to support this, other than the rarely-used element-by-element access in pure Python? A factor of ~3 for a rare use case is not evidence.
sorry this got out of hand :) I was honestly trying to find a use case for `array.array`.
I think supporting objects that implement the buffer protocol is probably okay. E.g., you could pass a Cython array.
@cpcloud you could easily do that in wrappers.
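(A sketch of the kind of wrapper meant here; the function name is made up and this is not pandas API: cast buffer-protocol objects to ndarrays before handing them to the `DataFrame` constructor.)

```python
# Hypothetical wrapper (not pandas API): convert columns that expose the
# buffer protocol to ndarrays, then build the DataFrame as usual.
import array
import numpy as np
import pandas as pd

def dataframe_from_buffers(columns):
    return pd.DataFrame({name: np.asarray(col) for name, col in columns.items()})

df = dataframe_from_buffers({'x': array.array('i', range(5)),
                             'y': array.array('d', [0.5] * 5)})
print(df.dtypes)   # 'x' comes out with a C int dtype, 'y' as float64
```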
yeah was thinking something similar. crossing too much of the api for something probably not that useful
now we have a combo: unsubstantiated judgment (first part), lecturing people about what their code is doing (without knowing much of the said code), and comments (there and in the rest of the post) largely showing a miscomprehension of what is discussed. Sorry for teasing, but can you read the thread?
@lgautier Why don't you just submit a PR then instead of wasting time on pedantic arguments? It sounds like you're the only one that understands what's going on in your head, so let the code speak for itself.
I am tempted to say: "I did not expect the Spanish Inquisition"
Sure. Note that I understand the question as: "where does `array.array` fit in a use case?". To give a concrete example, I can have a simple module "foo.py":

```python
from array import array
import bisect
import os

def load_data():
    genome_pos, signal = array('d'), array('d')
    # array.fromfile needs an explicit item count, derived here from the file size
    with open('genome_pos.dat', 'rb') as f:
        genome_pos.fromfile(f, os.path.getsize(f.name) // genome_pos.itemsize)  # sorted genomic positions
    with open('signal.dat', 'rb') as f:
        signal.fromfile(f, os.path.getsize(f.name) // signal.itemsize)  # some signal (let's keep it short, this is an example)
    return (genome_pos, signal)

def interval_indices(genome_pos, pos_a, pos_b):
    # indices delimiting the genomic window [pos_a, pos_b]
    i_a = bisect.bisect(genome_pos, pos_a)
    i_b = bisect.bisect(genome_pos, pos_b)
    return (i_a, i_b)
```

That module can be used to put together a quick application:

```python
import foo
from flask import Flask, render_template

genome_pos, signal = foo.load_data()
app = Flask(__name__)

@app.route('/slice/<pos_a>/<pos_b>')
def hello(pos_a=None, pos_b=None):
    # URL fragments arrive as strings; convert before the bisect lookup
    i_a, i_b = foo.interval_indices(genome_pos, float(pos_a), float(pos_b))
    return render_template('xyplot.html', x=genome_pos[i_a:i_b], y=signal[i_a:i_b])

if __name__ == '__main__':
    app.run()
```
This leads to the second question: the convenience of the buffer interface is that, should computation with numpy become necessary later on, the very same data can be wrapped as an ndarray without copying.
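(A self-contained sketch of that workflow; the data below is made up and stands in for the genome_pos/signal files of the example above. The app keeps plain `array.array` objects, and numpy is only brought in, copy-free, at the point where numeric work is actually needed.)

```python
import array
import bisect
import numpy as np

genome_pos = array.array('d', [10.0, 250.0, 1300.0, 9000.0, 12000.0])
signal = array.array('d', [0.1, 0.7, 0.3, 0.9, 0.2])

# pure-Python lookup of the genomic window, no numpy involved
i_a = bisect.bisect(genome_pos, 200.0)
i_b = bisect.bisect(genome_pos, 10000.0)

# only now, when computation is needed, wrap the data without copying
sig = np.frombuffer(signal, dtype='d')
print(sig[i_a:i_b].mean())
```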
The problem finally seems to be an odd restriction: acceptable sequences were list, tuple, or numpy.ndarray. A guess is that strings in Python were meant to be considered scalars, but the above restriction was implemented rather than taking in anything with a `__len__` except str (or unicode for Python 2.x). Considerations about the use of the buffer interface are not needed for now.

note: I cannot run tests on Python 2.7.

```
nosetests pandas
```

ends with:

```
from . import hashtable, tslib, lib
ImportError: cannot import name hashtable
```
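(A rough sketch of the acceptance rule described in that comment, illustrative only and not the actual pandas code:)

```python
import array

def is_nonstring_sequence(obj):
    # anything with a length is treated as a sequence, except strings,
    # which should still count as scalars (add `unicode` on Python 2.x)
    return hasattr(obj, '__len__') and not isinstance(obj, str)

print(is_nonstring_sequence([1, 2, 3]))               # True
print(is_nonstring_sequence("abc"))                   # False
print(is_nonstring_sequence(array.array('i', [1])))   # True
```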
The buffer interface turned out not to be needed for the fix. Pull request:
Also, since x has a `__len__`, maybe the error message should not refer to a scalar. Explicit casting to numpy arrays is a workaround, but it would seem convenient to have it done within the constructor for `DataFrame`.
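(The code example from the original report was not preserved in this copy; a hedged reconstruction of the situation it describes might look like the following, with the rejected call left commented out.)

```python
import array
import numpy as np
import pandas as pd

x = array.array('i', range(5))

# pd.DataFrame({'x': x})   # rejected by the pandas version discussed here,
#                          # with an error message referring to scalar values

df = pd.DataFrame({'x': np.asarray(x)})   # the explicit-cast workaround
print(df)
```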