-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Typed arrays support #2388
Typed arrays support #2388
Conversation
... to avoid confusion with Array.isArray and upcoming Lib.isTypedArray
... and use typed array 'subarray' to slice coordinate arrays to series length
... that should accept typed array (mostly arrayOk attribute, with marker.size and marker.color being to most likely candidates for typed array inputs).
- do the Lib.extend* methods do the right for typed arrays - what to do with 2d arrays? Should we start supporting ndarrays? - should we invent a JSON-seriliazable version of typed arrays?
src/plots/cartesian/set_convert.js
Outdated
for(i = 0; i < len; i++) { | ||
arrayOut[i] = ax.d2c(arrayIn[i], 0, cal); | ||
if(ax.type === 'linear' && Lib.isTypedArray(arrayIn) && arrayIn.subarray) { | ||
arrayOut = arrayIn.subarray(0, len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More info: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray/subarray
Bypassing isNumeric(v) ? Number(v) : BADNAM
coercion saves about ~50ms at 1e4 points during scatter calc step. 🐎
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typed arrays can't hold undefined
, only NaN
, right? So do we need to switch to NaN
as BADNUM
? Though this has the follow-on problem that NaN === NaN
doesn't work, so v === BADNUM
would have to be replaced by IsNaN(v)
when we already know it's a number and I guess IsNaN(v) && typeof v === 'number'
if we don't... That's pretty annoying. But if we DON'T, won't we have errors dealing with missing data in typed arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... so yeah connectgaps: false
is probably broken for scatter
with typed arrays. Let me check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for completeness while you're looking into edge cases, (+/-)Infinity
are also allowed in float typed arrays. Aside from users directly providing Infinity
, I presume this comes up in gl code when we drop doubles to singles, if the input is too large. I believe (+/-)Infinity
and NaN
are the only non-numerics that have representations in standard floating point encodings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me surprise, connectgaps: false
is working fine with typed arrays:
Here are all the places we check equality with BADNUM
:
by the looks of it, these are all places that check gd.calcdata
values (i.e. not in a typed array). Luckily isNumeric
is used elsewhere (which works for fine detecting NaN
s in typed arrays`).
I'll test out a few more cases this afternoon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, I couldn't find an example of BADNUM
breaking things for typed arrays.
Replacing BADNUM
s with NaN
s (and isNaN
) might help us be more typed-array friendly down the road, but as long as we're using arrays of object of our calcdata
for most things, BADNUM
should work just fine.
Note that these two lines in scattergl calc aren't necessary (as Number(BADNUM) // -> NaN
), but it makes things clear that BADNUM
and typed array aren't the best of friends, so I'll keep it there.
/cc @jmmease |
@jackparmer Yeah, having this alongside plotly/plotly.py#942 will let us transfer numpy arrays from Python into Plotly.js traces without any data marshaling! Using the binary ipywidgets protocol for the Python -> JS transfer already reduces the plot time for a 1-million point scattergl plot from ~24 seconds (using |
@jmmease Thanks for input! I'm curious though: how does one transfer numpy arrays to JS |
@etpinard When working in the Jupyter Notebook, the ipywidgets library handles syncronizing back-end Python objects with a front-end JavaScript model (This is done over ZMQ and Websockets, see here). The Python -> JavaScript serialization logic has special handling for binary Python buffers (specifically So, during serialization on the Python side, I wrap numpy arrays in a Then, during deserialization on the JavaScript side, the binary buffers are passed to the constructor of a Typed array (see here), where the numpy datatype metadata is used to lookup the appropriate TypedArray constructor (See here). Currently, I immediately convert the constructed typed array into a standard array (see here), but with your changes, it looks like we won't need to do this conversion, and will be able to pass the typed arrays directly into Plotly.js. This should make for a really efficient interactive visualization experience for Jupyter Notebook users, especially when used in conjunction with WebGL traces! |
@etpinard I really like the idea of a JSON-serializable encoding of typed arrays. I'd like to find an efficient way to save figures involving large arrays to disk. In your first example (copied below) what did you picture the value of the
I assume that a JSON list of numbers should be supported, but this wouldn't really offer efficiency gains in terms of storage size and (de)serialization time. Would it make sense to also support a HEX-string encoding of the typed array buffer? If there is a |
... replot is sufficient since regl push.
... to make things look a little more like the rest of plotly.js
- reuse scatter axis-expansion logic - improve 'fast' axis expand routine (using average marker.size as pad value) - use ax.makeCalcdata for all axis types (this creates a new array for linear axes, but makes thing more robust) - add a few TODOs
- most notable change is in gl2d_axes_label2 that now shows the correct to-zero autorange.
... by merging the concat and splice steps (which can't be done using native methods on typed arrays)
src/lib/is_array.js
Outdated
// IE9 fallback | ||
var ab = (typeof ArrayBuffer === 'undefined' || !ArrayBuffer.isView) ? | ||
{isView: function() { return false; }} : | ||
ArrayBuffer; | ||
|
||
module.exports = function isArray(a) { | ||
exports.isArrayOrTypedArray = function(a) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perusing https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer/isView - there's also DataView
that passes but looks like it's a little too generic to be meaningful to us as an input. Should we be concerned about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
improved in f0395b5
src/plots/cartesian/set_convert.js
Outdated
if(ax.type === 'linear' && Lib.isTypedArray(arrayIn) && arrayIn.subarray) { | ||
arrayOut = arrayIn.subarray(0, len); | ||
} else { | ||
arrayOut = new Array(len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In cases like this where we know the length and we know we're dealing with numbers, would we benefit by using typed arrays internally? When they're supported of course - we'd need a helper to revert to Array
in IE9. And subject to my question about NaN
.
For a later time regardless...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an idea. Saving a few isNumeric
(assuming isNaN
is faster) calls downstream of the makeCalcdata
should help a little bit.
src/traces/bar/calc.js
Outdated
@@ -11,6 +11,7 @@ | |||
|
|||
var isNumeric = require('fast-isnumeric'); | |||
|
|||
var Lib = require('../../lib'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you go back and forth between this and isArrayOrTypedArray = require('../../lib').isArrayOrTypedArray;
- not a big deal but the latter seems marginally preferable to me when you don't need anything else from Lib
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cleaned up in 9b83826
@@ -8,6 +8,8 @@ | |||
|
|||
'use strict'; | |||
|
|||
var isArrayOrTypedArray = require('../../lib').isArrayOrTypedArray; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file is map_2d_array
- and we're not supporting typed 2D arrays, at least not yet, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good eyes. Only the inner arrays should be allowed to be typed arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in d4cb0c4
@@ -615,7 +615,7 @@ describe('@gl parcoords', function() { | |||
function restyleDimension(key, setterValue) { | |||
|
|||
// array values need to be wrapped in an array; unwrapping here for value comparison | |||
var value = Lib.isArray(setterValue) ? setterValue[0] : setterValue; | |||
var value = Array.isArray(setterValue) ? setterValue[0] : setterValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you need to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because Lib.isArray
is no more. It was replaced by Lib.isArrayOrTypedArray
and Lib.isTypedArray
. Here, these parcoord test don't use typed arrays, so good old Array.isArray
calls suffice.
src/plots/cartesian/set_convert.js
Outdated
if(len === arrayIn.length) { | ||
return arrayIn; | ||
} else if(arrayIn.subarray) { | ||
return arrayIn.subarray(0, len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should work for linear and log axes, but what happens if you feed numeric data to a non-numeric (date or category) axis? Both of those could be real use cases, but both also alter the input numbers, so I think they need to bail out to the d2c
block below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done and 🔒 in 6dd2f69
if(Array.isArray(color)) { | ||
if(Lib.isTypedArray(color)) { | ||
isArrayWithOneNumber = true; | ||
} else if(Array.isArray(color)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again to my question about missing data in typed arrays... here can't we just let these typed arrays drop into the for loop as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done and 🔒 in 306986d
src/traces/heatmap/has_columns.js
Outdated
module.exports = function(trace) { | ||
return !Array.isArray(trace.z[0]); | ||
return !Lib.isArrayOrTypedArray(trace.z[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it work to provide a 2D array as an array of typed arrays? Not that I want to encourage this, far better for us to provide a solution based on a single typed array...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it work to provide a 2D array as an array of typed arrays?
Should have looked at the next commit before commenting 🏆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, arrays of typed arrays for should work after this PR.
@@ -16,7 +18,7 @@ | |||
module.exports = function mapArray(out, data, func) { | |||
var i; | |||
|
|||
if(!Array.isArray(out)) { | |||
if(!isArrayOrTypedArray(out)) { | |||
// If not an array, make it an array: | |||
out = []; | |||
} else if(out.length > data.length) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.slice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah great. So this is fine, but I guess 🐎 at some point we could make a helper that uses slice
on regular arrays and subarray
on typed arrays, for cases like this where we don't need a copy.
On-par autorange for scattergl
- by making sure it returns false on instances of DataView
- we could have used isNaN here, but isNumeric is fast enough that the gains would be negligible.
Excellent work - should be a big step toward higher performance! 💃 🍾 |
@etpinard I just started playing with these changes and I was wondering if the following behavior is expected. When I create a scatter (or scattergl) trace using a typed array as the x/y values the trace displays properly. When I look at the gd.data[0].x property I see a typed array, but when I look at the gd._fullData[0].x attribute I see a standard (non-typed) array. For memory efficiency, I was thinking it would be nice for _fullData to store the typed version as well. I can open a separate issue, just wanted to get your initial take on it. |
That shouldn't happen. Would you mind sharing a reproducible example? |
Never mind, I'm not able to reproduce it. Must be a bug somewhere else in my code. Thanks! |
Resolves #860 (hopefully)
Supporting typed arrays inputs should allow users to build more memory-conscious apps around plotly.js (cc #1784). Moreover, typed arrays will allow us to bypass numeric-object-to-number coercion speeding up the
calc
step for various trace types (see proof-of-concept in 45c2f35).Now, a few things to take in considerations that the commits below do not cover:
[new Float32Array([1,2,3]), new Float32Array([2,1,3])]
), but that doesn't sound very user friendly. Maybe adding first-classndarray
support would be worthwhile?wouldn't conflict with the existing api, and should allow us to bring some performance benefit to r/python/dash users that declare their data types.