Speed improvement. #13
I know that @SylvainCorlay has some ideas about it; did you have anything written up, Sylvain? I had an idea based on the image transfer I do for the volume data: you could turn a numpy array into a png image, where every pixel (rgba) would be one float32 value (for transferring data I think that is precise enough). On the js side you'd have to draw it on a canvas and then read the pixels back. PNG (can be) lossless, and you'd get compression for free. But the best way would be to do a binary transfer over a websocket. I guess that requires a separate websocket though; otherwise we'd have to do base64 or base85 encoding/decoding.
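Not anyone's actual implementation, but the "compression for free" part of the PNG idea can be sketched without the image step: PNG's losslessness comes from DEFLATE, which zlib exposes directly. A rough Python sketch (the `encode`/`decode` names are made up for illustration, assuming numpy is available):

```python
import base64
import zlib

import numpy as np

# Sketch: float32 array -> raw bytes -> DEFLATE (what PNG uses internally)
# -> base64 for transport over a JSON-based channel.
def encode(ar):
    raw = np.ascontiguousarray(ar, dtype=np.float32).tobytes()
    return base64.b64encode(zlib.compress(raw)).decode("ascii")

def decode(text, shape):
    raw = zlib.decompress(base64.b64decode(text))
    return np.frombuffer(raw, dtype=np.float32).reshape(shape)

data = np.linspace(0, 1, 1000).astype(np.float32)
roundtrip = decode(encode(data), data.shape)
assert np.array_equal(data, roundtrip)  # lossless, like PNG
```

Shape and dtype would still have to be sent alongside the blob, just as a PNG carries its width and height.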
I just noticed this: bqplot/bqplot#150 as an example. I also noticed there is a native atob function, but maybe not all browsers support it, so that may need a fallback.
It is the json_clean function that takes a lot of time for big arrays.
Sounds good! We could even write the numpy-to-json code ourselves, but then I'd like to see a unit test on how it handles NaN etc. But can you share your code here?
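For reference, a minimal sketch of what such a hand-rolled numpy-to-json encoder plus the NaN unit test could look like (the `array_to_json` name is hypothetical, not code from this thread):

```python
import json
import math

import numpy as np

# Hypothetical helper: serialize a numpy array to JSON, mapping the values
# JSON cannot represent (NaN, +/-inf) to null so the browser gets valid JSON.
def array_to_json(ar):
    def fix(v):
        return None if (math.isnan(v) or math.isinf(v)) else v
    return json.dumps([fix(float(v)) for v in np.ravel(ar)], allow_nan=False)

# The kind of unit test asked for above:
assert array_to_json(np.array([1.0, np.nan, np.inf])) == "[1.0, null, null]"
```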
@maartenbreddels we should factor out the custom serializers for numpy arrays in the
True, but I'm happy to test out these ideas in ipyvolume for now. @jeammimi has a direct use case now, so if he figures out a good solution, we should move it from here to traittypes. It would be good to compare (custom) json vs base64 encoding to see if it's worth the effort. I'm pretty sure it is (in favor of base64).
Comms and widgets can handle binary serialization...
Really? And can you get an ArrayBuffer with a handle to the data? Any code examples, or hints on how to do that? That would mean zero overhead, right?
That is right. It is documented in comms and handled when there are memoryviews in the widgets. I really think that it is worth the effort to do this at a common level for visualization widget libraries. This is the reason why I moved the
So it is a little bit ugly, because I didn't manage to wrap the recursive function
@jeammimi could you try instead something along these lines:

```python
import base64
....
a = np.array(value)
data = base64.encodebytes(a.tobytes())
# return data for the js side
```

And on the js side you'd have to do a custom deserializer (check volume.js for 'serializers'), and define one for 'x', 'y' etc.
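To make the js side's job concrete, here is the inverse of that snippet, emulated in Python (the browser would do atob plus a typed-array view instead of `base64.decodebytes` / `np.frombuffer`):

```python
import base64

import numpy as np

# Encode, as in the snippet above.
a = np.arange(10, dtype=np.float32)
data = base64.encodebytes(a.tobytes())

# Decode: the dtype must match the sender's, so in practice dtype and shape
# have to travel alongside the encoded payload.
raw = base64.decodebytes(data)
restored = np.frombuffer(raw, dtype=np.float32)
assert np.array_equal(a, restored)
```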
@SylvainCorlay ok, I saw this here: https://github.com/jupyter/notebook/blob/master/notebook/tests/test_serialize.py
Yes, if to_json / from_json returns a memoryview at the top level, it is passed as a buffer in the zmq and websocket message.
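A minimal sketch of such a serializer pair, assuming a fixed float64 convention (the `to_json`/`from_json` names follow the traitlet serializer convention; this is an illustration, not the thread's final code):

```python
import numpy as np

# to_json returns a memoryview at the top level, so the widget machinery
# can ship it as a raw binary buffer instead of running it through json_clean.
def to_json(ar, obj=None):
    if ar is None:
        return None
    return memoryview(np.ascontiguousarray(ar, dtype=np.float64))

# from_json reinterprets the received buffer; dtype must match the sender.
def from_json(buf, obj=None):
    if buf is None:
        return None
    return np.frombuffer(buf, dtype=np.float64)

x = np.linspace(0, 1, 5)
assert np.array_equal(from_json(to_json(x)), x)
```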
Please submit it to traittypes! This would help everyone. @vidartf was also interested in binary messaging for numpy arrays.
I see, it's quite simple! The only thing is that the
I tried with the base64.
Ok, deserializers are getting used. So, I got something working. Just to let you know, on the python side:

```python
def array_to_binary(ar, obj=None):
    if ar is not None:
        ar = ar.astype(np.float64)
        mv = memoryview(ar)
        return mv
    else:
        return None
```

On the js side:

```javascript
function binary_array(data, manager) {
    console.log("binary array")
    console.log(data)
    if(data) {
        window.ardata = data
        var ar = new Float64Array(data.buffer)
        window.far = ar
        return ar
    }
    return
}
var ScatterModel = widgets.WidgetModel.extend({
    defaults: function() {
        return _.extend(widgets.WidgetModel.prototype.defaults(), {
            .....
        })
    }
}, {
    serializers: _.extend({
        x: { deserialize: binary_array },
        y: { deserialize: binary_array },
        z: { deserialize: binary_array },
    }, widgets.WidgetModel.serializers)
});
```

I thought I could return a json with memoryviews in it, but it only accepts a memoryview at the top level; json_clean doesn't seem to accept that.
I thought a bit about it, and I think the issue is Widget._split_state_buffers. It needs to go through dicts and lists (similar to
@SylvainCorlay should I take this discussion to ipywidgets? Or do you think a PR like this can be merged without issues? (Only thing is getting the dev version of ipywidgets to install on my computer.)
@jeammimi what do you think in the meantime: base64, json, custom json, or some hack to get the binary transfer working?
The base64 transfer seems to be quite fast, but converting the array to base64 takes some time,
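The encoding cost can be measured directly. A rough micro-benchmark sketch (not from the thread; absolute numbers are machine-dependent, so only the relative comparison is meaningful):

```python
import base64
import json
import timeit

import numpy as np

ar = np.random.rand(100_000).astype(np.float64)

# json path: Python-level conversion of every element.
t_json = timeit.timeit(lambda: json.dumps(ar.tolist()), number=5)
# base64 path: one bulk byte copy plus the base64 transform.
t_b64 = timeit.timeit(lambda: base64.b64encode(ar.tobytes()), number=5)

print(f"json: {t_json:.4f}s  base64: {t_b64:.4f}s")
```

On typical machines the base64 path wins by a large factor, because it avoids per-element Python objects entirely.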
I think that's gonna be a non-friendly solution; you'd have to monkey patch it, so I think that's a no go. What I propose is we do it two ways: the (default) json method, and the binary transfer. The latter requires a patch to ipywidgets (I'm working on it now), but it is gonna be the best route anyway. This however means that you'd have to install ipywidgets from git (which is not always easy). Therefore I think we should for the moment default to json (slow), and by setting say
Well, indeed my hack is ugly (but I use it right now because it is ten times faster).
About the base64: the atob function turns it into a binary string, and then you'd have to manually convert every (say 4) bytes to a float. About the binary transfer: with this PR jupyter-widgets/ipywidgets#1194 I've now implemented the basics in d008749, but it (probably) does not work with animations yet. So maybe you can take a look at that.
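What "manually convert every 4 bytes to a float" amounts to, emulated in Python with struct (on the js side you'd use a DataView or a typed array over the decoded bytes instead; this snippet is illustrative, not thread code):

```python
import base64
import struct

# Simulate the sender: three float32 values, little-endian, base64 encoded.
encoded = base64.b64encode(struct.pack("<3f", 1.0, 2.5, -3.0))

# Simulate the receiver after atob: walk the byte string in 4-byte steps,
# interpreting each group as one little-endian float32.
raw = base64.b64decode(encoded)
values = [struct.unpack_from("<f", raw, i)[0] for i in range(0, len(raw), 4)]
assert values == [1.0, 2.5, -3.0]
```

All three test values are exactly representable in float32, which is why the roundtrip is exact here; arbitrary float64 inputs would lose precision when squeezed into 4 bytes.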
Just to let you know, there exist these libraries:
Would make life easier on the js side
Haha, you got the perfect timing. |
I took some of the code from here: |
Really? That sounds good! Ok, I gave it a little thought.
This is really looking promising, also for situations where you cannot do the whole animation in the browser but need to feed the widget with data every X msec. If this is fast enough, we could maybe do something like 10^4 to 10^5 particles, animated.
I am not sure we need any of these libraries.
@jeammimi nice experiments! This is really promising. I like that you are following the npy format spec. |
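For anyone following along, the npy format mentioned here is self-describing: a small header carrying dtype and shape sits in front of the raw buffer. A quick sketch of what that looks like on the wire, using numpy's own `np.save`/`np.load` (which implement the spec):

```python
import io

import numpy as np

ar = np.arange(12, dtype=np.float32).reshape(3, 4)

# Serialize to an in-memory npy blob: magic string + header + raw data.
buf = io.BytesIO()
np.save(buf, ar)
blob = buf.getvalue()
assert blob[:6] == b"\x93NUMPY"  # magic string defined by the npy spec

# One blob is enough to reconstruct dtype, shape, and contents.
buf.seek(0)
restored = np.load(buf)
assert restored.dtype == np.float32 and restored.shape == (3, 4)
```

That self-description is exactly what the bare memoryview approach above lacks, where dtype and shape have to travel separately.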
In terms of front-end package for the implementation, ndarray looks promising though. |
I would really go for this package: it's lightweight (the source is small: https://github.com/scijs/ndarray/blob/master/ndarray.js) and they care about performance. They tested out all these things for performance, and it opens up many options (fft, image manipulation, you name it: http://scijs.net/packages/). The license is also fine: ndarray is MIT, so we're safe there. What do you think?
It also seems that we can do (http://scijs.net/packages/#scijs/ndarray):

```javascript
var sizes = ndarray(data, [10, 100]); // sequence of 10, with 100 sizes
var sizes_attr = new THREE.InstancedBufferAttribute(sizes.pick(sequence_index, null), 1, 1);
```

No loops needed, just plain memcpy's!
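The reason `pick` is cheap is that it returns a view into the same buffer rather than a copy. The numpy analog, for intuition (illustrative, not thread code):

```python
import numpy as np

# 10 frames of 100 sizes each, one contiguous buffer,
# mirroring ndarray(data, [10, 100]) on the js side.
sizes = np.arange(10 * 100, dtype=np.float32).reshape(10, 100)

# Equivalent of sizes.pick(3, null): select one frame without copying.
frame = sizes[3]
assert np.shares_memory(frame, sizes)  # a view into the same buffer
assert frame.shape == (100,)
```

Because the pick is zero-copy, handing it to a GPU buffer attribute is just one bulk upload per frame.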
I am really not good with javascript basics:

```javascript
define('hello', ["jupyter-js-widgets", "ndarray"], function(widgets, ndarray) {
}
```

but it does not work.
Good question. What I did, is
you already get ndarray in the global namespace as I saw from the dev console (actually, that shouldn't happen). If that does not happen, put |
Is there a way to avoid having to convert the array after every call of
Yes, see the (de)serialize here |
Hi, |
Any JS error to share? Maybe take this here: https://gitter.im/maartenbreddels/ipyvolume |
Ok, I now have all three methods implemented, see 88c3ad0. Set
Color isn't tackled with this though; when performance >= 0, it should be possible to transfer it the same way, if it's not a list of strings. PR welcome 😉 Btw, I've added examples/test.ipynb for some basic testing. Don't execute the cells directly after each other; give it a few milliseconds. savefig isn't really working that well yet. The last part in this notebook is kind of a performance test/demo. I saw quite a speedup going from performance=0 to 1.
I consider this closed now; further improvements can go in a separate issue.
I am starting to use it in my work, and in this example case I am working with arrays of size 1000x2000, and it starts taking some time to execute.
Apparently a lot of time is spent on the checking and transfer of the data from python to javascript.
Any ideas on how to improve this part?
I provide the output of prun: