Reading from mongodb (BSON) #4329

Closed
hayd opened this issue Jul 23, 2013 · 18 comments
Labels
Enhancement, IO Data

Comments

@hayd
Contributor

hayd commented Jul 23, 2013

Now that we have a neat read_json function (#3804), it'd be really neat if we could read in from mongo using it, i.e. without having to create an intermediary Python object.

I wasn't even sure how to return a JSON (BSON) string from a mongo query, though surely this ought to be relatively easy...

I've no idea how much of an issue the BSON parts are, or whether these can be converted after the fact...

http://stackoverflow.com/questions/17805304/how-can-i-load-data-from-mongodb-collection-into-pandas-dataframe
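For context, the route this issue hopes to avoid goes through intermediate Python objects: each document comes back as a dict, and the dicts are pivoted into columns. The following is only a minimal sketch of that pivot using plain dicts in place of a real query result; `documents_to_columns` and the sample documents are hypothetical names invented for illustration, not part of pandas or pymongo.

```python
def documents_to_columns(docs, drop_id=True):
    """Pivot an iterable of dict-like documents into a dict of column lists.

    Hypothetical helper for illustration only. Keys missing from a
    document are filled with None so every column has the same length.
    """
    rows = [dict(doc) for doc in docs]

    # Collect column names in first-seen order, optionally skipping "_id".
    keys = []
    seen = set()
    for row in rows:
        for key in row:
            if key in seen or (drop_id and key == "_id"):
                continue
            seen.add(key)
            keys.append(key)

    return {key: [row.get(key) for row in rows] for key in keys}


# Stand-in for the documents a query might yield (made-up sample data).
docs = [
    {"_id": 1, "ticker": "AAA", "price": 10.5},
    {"_id": 2, "ticker": "BBB", "price": 11.0, "volume": 300},
]
columns = documents_to_columns(docs)
```

A dict of equal-length lists like `columns` is something the `pandas.DataFrame` constructor accepts directly. The cost is that every value passes through a Python object first, which is exactly the overhead the request above would like to skip.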

@jreback
Contributor

jreback commented Jul 23, 2013

@hayd
Contributor Author

hayd commented Jul 23, 2013

I think I'm missing something... Can I use msgpack to read from mongo? (I didn't think you could...)

@jreback
Contributor

jreback commented Jul 23, 2013

No... it's just a competing format.

@ghost

ghost commented Oct 25, 2013

Has anyone here used Monary for this? I was looking for a fast way to hook up pandas and mongo and came across this:

https://bitbucket.org/djcbeach/monary/wiki/Home

I found it via this blog post:
http://alexgaudio.com/2012/07/07/monarymongopandas.html

@hayd
Contributor Author

hayd commented Oct 25, 2013

@dam5h Thanks for linking to that. It's a shame it's not on pip (I posted on Google Groups to ask the author about that; if there's no response I might just set it up myself :s). It's under the Apache license, so potentially we could migrate it... though it's easier (at least for now) just to add it to the cookbook?

@ghost

ghost commented Oct 28, 2013

@hayd Sounds good to me! I haven't tried using it yet but it looks interesting.

@lJoublanc

Just thought I would share my experience here: I've managed to reuse the msgpack code to create a very crude serialization to MongoDB (deserialization not tried yet). This involved calling the encode method of packers.py recursively on a DataFrame to produce a dict, and then calling pymongo. The only other change I made was to monkey-patch pandas.io.packers.convert to wrap any encoded numpy arrays in bson.binary.Binary (currently msgpack stores these as latin-1 encoded strings, which MongoDB doesn't like).
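The idea of storing array data as a binary blob rather than a text string can be sketched as follows. This is not the code from the comment: it is a standard-library-only illustration that uses `array.array` in place of a numpy array, and `encode_column` / `decode_column` are hypothetical names. With pymongo, the `data` payload is presumably what would be wrapped in `bson.binary.Binary`.

```python
from array import array


def encode_column(values, typecode="d"):
    """Pack a numeric column into a document-shaped dict with a raw-bytes payload.

    Hypothetical helper: the typecode and length are kept alongside the
    bytes so the column can be rebuilt later.
    """
    packed = array(typecode, values)
    return {
        "typecode": typecode,
        "length": len(packed),
        "data": packed.tobytes(),  # raw bytes, not a latin-1 string
    }


def decode_column(doc):
    """Rebuild the list of values from a dict produced by encode_column."""
    out = array(doc["typecode"])
    out.frombytes(doc["data"])
    return out.tolist()


encoded = encode_column([1.0, 2.5, -3.0])
decoded = decode_column(encoded)
```

The numpy analogue would be `ndarray.tobytes()` on the way in and `numpy.frombuffer` on the way out, with the dtype and shape stored next to the payload.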

@jreback
Contributor

jreback commented Dec 29, 2014

interesting

can I benchmark vs monary? see if it's close?

@lJoublanc

As I mentioned, it's very crude. I'm still trying to get to grips with publishing notebooks on GitHub with nbviewer; see if this works: https://gist.github.com/lJoublanc/c8591cd8e918024d505a#file-testpickle-ipynb. You'll need python-blosc and pymongo installed.

nbviewer : http://nbviewer.ipython.org/gist/lJoublanc/c8591cd8e918024d505a

@femtotrader

Any news about this?

@lJoublanc

@femtotrader I was about to suggest you look at Arctic, as it has BSON (de)serialization into pandas frames implemented, but I see from your profile that you already seem to be using it. Does that implementation serve your purpose well?

@femtotrader

Thanks @lJoublanc, I'm giving Arctic a try... I wondered whether Monary might be more efficient, but the road is not as straightforward as with Arctic: https://bitbucket.org/djcbeach/monary/issues/19/use-pandas-series-dataframe-and-panel-with

@drafter250

Does the bson_numpy project close this? http://bson-numpy.readthedocs.io/en/latest/#

@jreback
Contributor

jreback commented Jul 26, 2017

is this a conda package / wheel?

@gfyoung
Member

gfyoung commented Jul 26, 2017

Doesn't appear to be the case from the docs, but I think there is a pip package:

https://pypi.python.org/pypi/BSON-NumPy/0.1

@jreback
Contributor

jreback commented Jul 27, 2017

Source-only packages are not very useful when a C library is needed.

@gfyoung
Member

gfyoung commented Jul 27, 2017

Ah, that's a good point.

@wesm added the Won't Fix label Jul 6, 2018
@wesm
Member

wesm commented Jul 6, 2018

Closing; this might be better addressed in an external package.

@wesm closed this as completed Jul 6, 2018

7 participants