Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typed arrays support #2388

Merged
merged 28 commits into from
Feb 28, 2018
Merged

Typed arrays support #2388

merged 28 commits into from
Feb 28, 2018

Conversation

etpinard
Copy link
Contributor

Resolves #860 (hopefully)

Supporting typed arrays inputs should allow users to build more memory-conscious apps around plotly.js (cc #1784). Moreover, typed arrays will allow us to bypass numeric-object-to-number coercion speeding up the calc step for various trace types (see proof-of-concept in 45c2f35).


Now, a few things to take in considerations that the commits below do not cover:

  • Typed arrays are nice drop-in replacements for 1D arrays, but what should we do with 2D data arrays? It would easy to extend support for arrays of typed arrays (e.g [new Float32Array([1,2,3]), new Float32Array([2,1,3])]), but that doesn't sound very user friendly. Maybe adding first-class ndarray support would be worthwhile?
  • Should we add support for a JSON-serializable version of typed arrays? Something like:
trace = {
  y: {
     type: 'float32',
     vals: [/* */]
  }
}

// or
trace = {
  yarraytype: 'float32',
  y: [/* ... */]
}

wouldn't conflict with the existing api, and should allow us to bring some performance benefit to r/python/dash users that declare their data types.

... to avoid confusion with Array.isArray and upcoming
    Lib.isTypedArray
... and use typed array 'subarray' to slice coordinate arrays
    to series length
... that should accept typed array (mostly arrayOk attribute,
    with marker.size and marker.color being to most likely
    candidates for typed array inputs).
- do the Lib.extend* methods do the right for typed arrays
- what to do with 2d arrays? Should we start supporting ndarrays?
- should we invent a JSON-seriliazable version of typed arrays?
for(i = 0; i < len; i++) {
arrayOut[i] = ax.d2c(arrayIn[i], 0, cal);
if(ax.type === 'linear' && Lib.isTypedArray(arrayIn) && arrayIn.subarray) {
arrayOut = arrayIn.subarray(0, len);
Copy link
Contributor Author

@etpinard etpinard Feb 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More info: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray/subarray

Bypassing isNumeric(v) ? Number(v) : BADNAM coercion saves about ~50ms at 1e4 points during scatter calc step. 🐎

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typed arrays can't hold undefined, only NaN, right? So do we need to switch to NaN as BADNUM? Though this has the follow-on problem that NaN === NaN doesn't work, so v === BADNUM would have to be replaced by IsNaN(v) when we already know it's a number and I guess IsNaN(v) && typeof v === 'number' if we don't... That's pretty annoying. But if we DON'T, won't we have errors dealing with missing data in typed arrays?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, typed arrays "call" Number(v) on each item before "inserting" them into the typed array:

image

so yeah (I think) that means NaNs are the only non-number values accepted in typed arrays.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... so yeah connectgaps: false is probably broken for scatter with typed arrays. Let me check.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for completeness while you're looking into edge cases, (+/-)Infinity are also allowed in float typed arrays. Aside from users directly providing Infinity, I presume this comes up in gl code when we drop doubles to singles, if the input is too large. I believe (+/-)Infinity and NaN are the only non-numerics that have representations in standard floating point encodings.

Copy link
Contributor Author

@etpinard etpinard Feb 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me surprise, connectgaps: false is working fine with typed arrays:

image


Here are all the places we check equality with BADNUM:

image

by the looks of it, these are all places that check gd.calcdata values (i.e. not in a typed array). Luckily isNumeric is used elsewhere (which works for fine detecting NaNs in typed arrays`).

I'll test out a few more cases this afternoon.

Copy link
Contributor Author

@etpinard etpinard Feb 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right, I couldn't find an example of BADNUM breaking things for typed arrays.

Replacing BADNUMs with NaNs (and isNaN) might help us be more typed-array friendly down the road, but as long as we're using arrays of object of our calcdata for most things, BADNUM should work just fine.

Note that these two lines in scattergl calc aren't necessary (as Number(BADNUM) // -> NaN), but it makes things clear that BADNUM and typed array aren't the best of friends, so I'll keep it there.

@jackparmer
Copy link
Contributor

/cc @jmmease

@jonmmease
Copy link
Contributor

@jackparmer Yeah, having this alongside plotly/plotly.py#942 will let us transfer numpy arrays from Python into Plotly.js traces without any data marshaling! Using the binary ipywidgets protocol for the Python -> JS transfer already reduces the plot time for a 1-million point scattergl plot from ~24 seconds (using iplot) to ~4 seconds (using FigureWidget). I'm excited to see how far down we'll be able to push that with these changes.

@etpinard
Copy link
Contributor Author

@jmmease Thanks for input! I'm curious though: how does one transfer numpy arrays to JS Float32Arrays ?

@jonmmease
Copy link
Contributor

@etpinard When working in the Jupyter Notebook, the ipywidgets library handles syncronizing back-end Python objects with a front-end JavaScript model (This is done over ZMQ and Websockets, see here).

The Python -> JavaScript serialization logic has special handling for binary Python buffers (specifically memoryview, bytearray, and bytes objects). As I understand it, these buffers are transferred to the front end in binary form, without any ASCII encoding.

So, during serialization on the Python side, I wrap numpy arrays in a memoryview and combine them in a dict with the numpy datatype and shape metadata. (see here)

Then, during deserialization on the JavaScript side, the binary buffers are passed to the constructor of a Typed array (see here), where the numpy datatype metadata is used to lookup the appropriate TypedArray constructor (See here).

Currently, I immediately convert the constructed typed array into a standard array (see here), but with your changes, it looks like we won't need to do this conversion, and will be able to pass the typed arrays directly into Plotly.js. This should make for a really efficient interactive visualization experience for Jupyter Notebook users, especially when used in conjunction with WebGL traces!

@jonmmease
Copy link
Contributor

@etpinard I really like the idea of a JSON-serializable encoding of typed arrays. I'd like to find an efficient way to save figures involving large arrays to disk.

In your first example (copied below) what did you picture the value of the vals property being?

trace = {
  y: {
     type: 'float32',
     vals: [/* */]
  }
}

I assume that a JSON list of numbers should be supported, but this wouldn't really offer efficiency gains in terms of storage size and (de)serialization time. Would it make sense to also support a HEX-string encoding of the typed array buffer?

If there is a shape property alongside type and vals then this same HEX-string approach could also be used to encode multi-dimensional arrays (e.g. by assuming row-major ordering).

... replot is sufficient since regl push.
... to make things look a little more like
    the rest of plotly.js
- reuse scatter axis-expansion logic
- improve 'fast' axis expand routine (using average marker.size
  as pad value)
- use ax.makeCalcdata for all axis types (this creates a new array
  for linear axes, but makes thing more robust)
- add a few TODOs
- most notable change is in gl2d_axes_label2 that
  now shows the correct to-zero autorange.
// IE9 fallback
var ab = (typeof ArrayBuffer === 'undefined' || !ArrayBuffer.isView) ?
{isView: function() { return false; }} :
ArrayBuffer;

module.exports = function isArray(a) {
exports.isArrayOrTypedArray = function(a) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perusing https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer/isView - there's also DataView that passes but looks like it's a little too generic to be meaningful to us as an input. Should we be concerned about this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

improved in f0395b5

if(ax.type === 'linear' && Lib.isTypedArray(arrayIn) && arrayIn.subarray) {
arrayOut = arrayIn.subarray(0, len);
} else {
arrayOut = new Array(len);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In cases like this where we know the length and we know we're dealing with numbers, would we benefit by using typed arrays internally? When they're supported of course - we'd need a helper to revert to Array in IE9. And subject to my question about NaN.

For a later time regardless...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an idea. Saving a few isNumeric (assuming isNaN is faster) calls downstream of the makeCalcdata should help a little bit.

@@ -11,6 +11,7 @@

var isNumeric = require('fast-isnumeric');

var Lib = require('../../lib');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you go back and forth between this and isArrayOrTypedArray = require('../../lib').isArrayOrTypedArray; - not a big deal but the latter seems marginally preferable to me when you don't need anything else from Lib.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleaned up in 9b83826

@@ -8,6 +8,8 @@

'use strict';

var isArrayOrTypedArray = require('../../lib').isArrayOrTypedArray;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file is map_2d_array - and we're not supporting typed 2D arrays, at least not yet, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good eyes. Only the inner arrays should be allowed to be typed arrays.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in d4cb0c4

@@ -615,7 +615,7 @@ describe('@gl parcoords', function() {
function restyleDimension(key, setterValue) {

// array values need to be wrapped in an array; unwrapping here for value comparison
var value = Lib.isArray(setterValue) ? setterValue[0] : setterValue;
var value = Array.isArray(setterValue) ? setterValue[0] : setterValue;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you need to do this?

Copy link
Contributor Author

@etpinard etpinard Feb 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because Lib.isArray is no more. It was replaced by Lib.isArrayOrTypedArray and Lib.isTypedArray. Here, these parcoord test don't use typed arrays, so good old Array.isArray calls suffice.

if(len === arrayIn.length) {
return arrayIn;
} else if(arrayIn.subarray) {
return arrayIn.subarray(0, len);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should work for linear and log axes, but what happens if you feed numeric data to a non-numeric (date or category) axis? Both of those could be real use cases, but both also alter the input numbers, so I think they need to bail out to the d2c block below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done and 🔒 in 6dd2f69

if(Array.isArray(color)) {
if(Lib.isTypedArray(color)) {
isArrayWithOneNumber = true;
} else if(Array.isArray(color)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again to my question about missing data in typed arrays... here can't we just let these typed arrays drop into the for loop as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done and 🔒 in 306986d

module.exports = function(trace) {
return !Array.isArray(trace.z[0]);
return !Lib.isArrayOrTypedArray(trace.z[0]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it work to provide a 2D array as an array of typed arrays? Not that I want to encourage this, far better for us to provide a solution based on a single typed array...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it work to provide a 2D array as an array of typed arrays?

Should have looked at the next commit before commenting 🏆

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, arrays of typed arrays for should work after this PR.

@@ -16,7 +18,7 @@
module.exports = function mapArray(out, data, func) {
var i;

if(!Array.isArray(out)) {
if(!isArrayOrTypedArray(out)) {
// If not an array, make it an array:
out = [];
} else if(out.length > data.length) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ this block has a .slice ⚠️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah great. So this is fine, but I guess 🐎 at some point we could make a helper that uses slice on regular arrays and subarray on typed arrays, for cases like this where we don't need a copy.

@alexcjohnson
Copy link
Collaborator

Excellent work - should be a big step toward higher performance! 💃 🍾

@etpinard etpinard merged commit f8e7ee4 into master Feb 28, 2018
@etpinard etpinard deleted the typed-arrays-support branch February 28, 2018 20:28
@jonmmease
Copy link
Contributor

@etpinard I just started playing with these changes and I was wondering if the following behavior is expected.

When I create a scatter (or scattergl) trace using a typed array as the x/y values the trace displays properly. When I look at the gd.data[0].x property I see a typed array, but when I look at the gd._fullData[0].x attribute I see a standard (non-typed) array.

For memory efficiency, I was thinking it would be nice for _fullData to store the typed version as well. I can open a separate issue, just wanted to get your initial take on it.

@etpinard
Copy link
Contributor Author

When I create a scatter (or scattergl) trace using a typed array as the x/y values the trace displays properly. When I look at the gd.data[0].x property I see a typed array, but when I look at the gd._fullData[0].x attribute I see a standard (non-typed) array.

That shouldn't happen. Would you mind sharing a reproducible example?

@jonmmease
Copy link
Contributor

Never mind, I'm not able to reproduce it. Must be a bug somewhere else in my code. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature something new
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants