Typed arrays support #2388

etpinard · 2018-02-20T21:35:48Z

Resolves #860 (hopefully)

Supporting typed array inputs should allow users to build more memory-conscious apps around plotly.js (cc #1784). Moreover, typed arrays will allow us to bypass numeric-object-to-number coercion, speeding up the calc step for various trace types (see proof-of-concept in 45c2f35).

Now, a few things to take into consideration that the commits below do not cover:

Typed arrays are nice drop-in replacements for 1D arrays, but what should we do with 2D data arrays? It would be easy to extend support for arrays of typed arrays (e.g. [new Float32Array([1,2,3]), new Float32Array([2,1,3])]), but that doesn't sound very user-friendly. Maybe adding first-class ndarray support would be worthwhile?
Should we add support for a JSON-serializable version of typed arrays? Something like:

trace = {
  y: {
     type: 'float32',
     vals: [/* */]
  }
}

// or
trace = {
  yarraytype: 'float32',
  y: [/* ... */]
}

Either form wouldn't conflict with the existing API, and should allow us to bring some performance benefit to R/Python/Dash users that declare their data types.

etpinard · 2018-02-20T21:37:41Z

src/plots/cartesian/set_convert.js

-            for(i = 0; i < len; i++) {
-                arrayOut[i] = ax.d2c(arrayIn[i], 0, cal);
+            if(ax.type === 'linear' && Lib.isTypedArray(arrayIn) && arrayIn.subarray) {
+                arrayOut = arrayIn.subarray(0, len);


More info: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray/subarray

Bypassing the isNumeric(v) ? Number(v) : BADNUM coercion saves about 50ms at 1e4 points during the scatter calc step. 🐎

Typed arrays can't hold undefined, only NaN, right? So do we need to switch to NaN as BADNUM? Though this has the follow-on problem that NaN === NaN doesn't work, so v === BADNUM would have to be replaced by isNaN(v) when we already know it's a number and I guess isNaN(v) && typeof v === 'number' if we don't... That's pretty annoying. But if we DON'T, won't we have errors dealing with missing data in typed arrays?

As far as I know, typed arrays "call" Number(v) on each item before "inserting" them into the typed array:

so yeah (I think) that means NaNs are the only non-number values accepted in typed arrays.
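To make the coercion point concrete, here is a small sketch of what standard typed-array constructors do with non-numeric items. The variable names are illustrative only and nothing here is plotly.js code. Note the difference between float and integer typed arrays: only the float variants can store NaN.

```javascript
// Float typed arrays convert every item to a number on insertion.
var floats = new Float32Array([1, '2', undefined, null, 'abc']);
// '2' becomes 2, undefined and 'abc' become NaN, null becomes 0

// Integer typed arrays have no NaN representation, so NaN is stored as 0.
var ints = new Int32Array([1, undefined, NaN]);

// Missing values therefore survive only as NaN, and only in float arrays.
var missingInFloats = isNaN(floats[2]) && isNaN(floats[4]);
var missingInInts = isNaN(ints[1]) || isNaN(ints[2]);
```

One consequence, assuming missing data matters to the caller: an integer typed array cannot mark a gap at all, since the marker silently becomes 0.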

... so yeah connectgaps: false is probably broken for scatter with typed arrays. Let me check.

Just for completeness while you're looking into edge cases, (+/-)Infinity are also allowed in float typed arrays. Aside from users directly providing Infinity, I presume this comes up in gl code when we drop doubles to singles, if the input is too large. I believe (+/-)Infinity and NaN are the only non-numerics that have representations in standard floating point encodings.

To my surprise, connectgaps: false is working fine with typed arrays:

Here are all the places we check equality with BADNUM:

By the looks of it, these are all places that check gd.calcdata values (i.e. not in a typed array). Luckily isNumeric is used elsewhere (which works fine for detecting NaNs in typed arrays).

I'll test out a few more cases this afternoon.

All right, I couldn't find an example of BADNUM breaking things for typed arrays.

Replacing BADNUMs with NaNs (and isNaN) might help us be more typed-array friendly down the road, but as long as we're using arrays of objects for our calcdata for most things, BADNUM should work just fine.

Note that these two lines in scattergl calc aren't necessary (as Number(BADNUM) // -> NaN), but they make it clear that BADNUM and typed arrays aren't the best of friends, so I'll keep them there.

jackparmer · 2018-02-21T01:57:27Z

/cc @jmmease

jonmmease · 2018-02-21T14:32:51Z

@jackparmer Yeah, having this alongside plotly/plotly.py#942 will let us transfer numpy arrays from Python into Plotly.js traces without any data marshaling! Using the binary ipywidgets protocol for the Python -> JS transfer already reduces the plot time for a 1-million point scattergl plot from ~24 seconds (using iplot) to ~4 seconds (using FigureWidget). I'm excited to see how far down we'll be able to push that with these changes.

etpinard · 2018-02-21T15:14:11Z

@jmmease Thanks for the input! I'm curious though: how does one transfer numpy arrays to JS Float32Arrays?

jonmmease · 2018-02-21T16:08:19Z

@etpinard When working in the Jupyter Notebook, the ipywidgets library handles synchronizing back-end Python objects with a front-end JavaScript model (this is done over ZMQ and WebSockets, see here).

The Python -> JavaScript serialization logic has special handling for binary Python buffers (specifically memoryview, bytearray, and bytes objects). As I understand it, these buffers are transferred to the front end in binary form, without any ASCII encoding.

So, during serialization on the Python side, I wrap numpy arrays in a memoryview and combine them in a dict with the numpy datatype and shape metadata. (see here)

Then, during deserialization on the JavaScript side, the binary buffers are passed to the constructor of a Typed array (see here), where the numpy datatype metadata is used to lookup the appropriate TypedArray constructor (See here).

Currently, I immediately convert the constructed typed array into a standard array (see here), but with your changes, it looks like we won't need to do this conversion, and will be able to pass the typed arrays directly into Plotly.js. This should make for a really efficient interactive visualization experience for Jupyter Notebook users, especially when used in conjunction with WebGL traces!
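The deserialization step described above can be sketched roughly as follows. This is not the actual ipywidgets or plotly.py code: the lookup table, the payloadToTypedArray helper and the {dtype, shape, buffer} payload shape are all assumptions made for illustration.

```javascript
// Hypothetical lookup from numpy dtype names to typed-array constructors.
var DTYPE_TO_CONSTRUCTOR = {
    int8: Int8Array,
    uint8: Uint8Array,
    int16: Int16Array,
    uint16: Uint16Array,
    int32: Int32Array,
    uint32: Uint32Array,
    float32: Float32Array,
    float64: Float64Array
};

// Assumed payload shape: {dtype: string, shape: number[], buffer: ArrayBuffer}
function payloadToTypedArray(payload) {
    var Ctor = DTYPE_TO_CONSTRUCTOR[payload.dtype];
    if(!Ctor) throw new Error('Unsupported dtype: ' + payload.dtype);
    // Constructing from an ArrayBuffer creates a view; the bytes are not copied.
    return new Ctor(payload.buffer);
}

// Usage: stand-in for a binary buffer received from the Python side.
var received = new Float32Array([1, 2, 3]).buffer;
var y = payloadToTypedArray({dtype: 'float32', shape: [3], buffer: received});
```

With typed array support in plotly.js, a value like y could be placed on a trace directly instead of first being converted to a plain array.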

jonmmease · 2018-02-25T13:42:24Z

@etpinard I really like the idea of a JSON-serializable encoding of typed arrays. I'd like to find an efficient way to save figures involving large arrays to disk.

In your first example (copied below) what did you picture the value of the vals property being?

trace = {
  y: {
     type: 'float32',
     vals: [/* */]
  }
}

I assume that a JSON list of numbers should be supported, but this wouldn't really offer efficiency gains in terms of storage size and (de)serialization time. Would it make sense to also support a HEX-string encoding of the typed array buffer?

If there is a shape property alongside type and vals then this same HEX-string approach could also be used to encode multi-dimensional arrays (e.g. by assuming row-major ordering).
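A rough sketch of what such a hex-string encoding with a shape property might look like is below. The function names and the {type, shape, vals} layout are hypothetical, only 'float32' is handled, and the bytes are written in the platform's native byte order, which a real format would need to pin down.

```javascript
// Encode the raw bytes of a typed array as a hex string (two characters per byte).
function typedArrayToHex(arr) {
    var bytes = new Uint8Array(arr.buffer, arr.byteOffset, arr.byteLength);
    var hex = '';
    for(var i = 0; i < bytes.length; i++) {
        hex += (bytes[i] < 16 ? '0' : '') + bytes[i].toString(16);
    }
    return hex;
}

// Decode a hex string back into a Float32Array (native byte order assumed).
function hexToFloat32Array(hex) {
    var bytes = new Uint8Array(hex.length / 2);
    for(var i = 0; i < bytes.length; i++) {
        bytes[i] = parseInt(hex.substr(2 * i, 2), 16);
    }
    return new Float32Array(bytes.buffer);
}

// Row-major lookup of item (i, j) in a flat array with shape [nrows, ncols].
function itemAt(flat, shape, i, j) {
    return flat[i * shape[1] + j];
}

// Usage: a 2x3 array stored flat, with its shape alongside.
var z = {
    type: 'float32',
    shape: [2, 3],
    vals: typedArrayToHex(new Float32Array([1, 2, 3, 4, 5, 6]))
};
var decoded = hexToFloat32Array(z.vals);
var secondRowFirstCol = itemAt(decoded, z.shape, 1, 0);
```

Hex uses two characters per byte, so whether this is smaller than a JSON list of numbers depends on how many digits the numbers need when printed.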

alexcjohnson · 2018-02-28T01:44:07Z

src/lib/is_array.js

 // IE9 fallback
 var ab = (typeof ArrayBuffer === 'undefined' || !ArrayBuffer.isView) ?
    {isView: function() { return false; }} :
    ArrayBuffer;

-module.exports = function isArray(a) {
+exports.isArrayOrTypedArray = function(a) {


perusing https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer/isView - there's also DataView that passes but looks like it's a little too generic to be meaningful to us as an input. Should we be concerned about this?

improved in f0395b5
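For readers following along, a minimal sketch of a check that accepts typed arrays but rejects DataView is shown here. It assumes an environment where ArrayBuffer.isView and DataView exist (so it omits the IE9 fallback), and the actual code in f0395b5 may differ.

```javascript
// Sketch only: true for Float32Array, Int8Array, etc., false for DataView.
function isTypedArray(a) {
    return ArrayBuffer.isView(a) && !(a instanceof DataView);
}

// Accept either a plain array or a typed array.
function isArrayOrTypedArray(a) {
    return Array.isArray(a) || isTypedArray(a);
}

// Usage
var view = new DataView(new ArrayBuffer(8));
var results = [
    isTypedArray(new Float32Array(2)),
    isTypedArray(view),
    isTypedArray([1, 2]),
    isArrayOrTypedArray([1, 2])
];
```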

alexcjohnson · 2018-02-28T02:56:55Z

src/plots/cartesian/set_convert.js

+            if(ax.type === 'linear' && Lib.isTypedArray(arrayIn) && arrayIn.subarray) {
+                arrayOut = arrayIn.subarray(0, len);
+            } else {
+                arrayOut = new Array(len);


In cases like this where we know the length and we know we're dealing with numbers, would we benefit by using typed arrays internally? When they're supported of course - we'd need a helper to revert to Array in IE9. And subject to my question about NaN.

For a later time regardless...

That's an idea. Saving a few isNumeric (assuming isNaN is faster) calls downstream of the makeCalcdata should help a little bit.

alexcjohnson · 2018-02-28T03:01:19Z

src/traces/bar/calc.js

@@ -11,6 +11,7 @@

 var isNumeric = require('fast-isnumeric');

+var Lib = require('../../lib');


you go back and forth between this and isArrayOrTypedArray = require('../../lib').isArrayOrTypedArray; - not a big deal but the latter seems marginally preferable to me when you don't need anything else from Lib.

cleaned up in 9b83826

alexcjohnson · 2018-02-28T03:05:40Z

src/traces/carpet/map_2d_array.js

@@ -8,6 +8,8 @@

 'use strict';

+var isArrayOrTypedArray = require('../../lib').isArrayOrTypedArray;


this file is map_2d_array - and we're not supporting typed 2D arrays, at least not yet, right?

Good eyes. Only the inner arrays should be allowed to be typed arrays.

fixed in d4cb0c4

alexcjohnson · 2018-02-28T03:10:45Z

test/jasmine/tests/parcoords_test.js

@@ -615,7 +615,7 @@ describe('@gl parcoords', function() {
            function restyleDimension(key, setterValue) {

                // array values need to be wrapped in an array; unwrapping here for value comparison
-                var value = Lib.isArray(setterValue) ? setterValue[0] : setterValue;
+                var value = Array.isArray(setterValue) ? setterValue[0] : setterValue;


Why did you need to do this?

Because Lib.isArray is no more. It was replaced by Lib.isArrayOrTypedArray and Lib.isTypedArray. Here, these parcoords tests don't use typed arrays, so good old Array.isArray calls suffice.

alexcjohnson · 2018-02-28T03:18:01Z

src/plots/cartesian/set_convert.js

+                if(len === arrayIn.length) {
+                    return arrayIn;
+                } else if(arrayIn.subarray) {
+                    return arrayIn.subarray(0, len);


This should work for linear and log axes, but what happens if you feed numeric data to a non-numeric (date or category) axis? Both of those could be real use cases, but both also alter the input numbers, so I think they need to bail out to the d2c block below.

Done and 🔒 in 6dd2f69

alexcjohnson · 2018-02-28T03:20:45Z

src/components/colorscale/has_colorscale.js

-    if(Array.isArray(color)) {
+    if(Lib.isTypedArray(color)) {
+        isArrayWithOneNumber = true;
+    } else if(Array.isArray(color)) {


again to my question about missing data in typed arrays... here can't we just let these typed arrays drop into the for loop as well?

done and 🔒 in 306986d

alexcjohnson · 2018-02-28T03:24:11Z

src/traces/heatmap/has_columns.js

 module.exports = function(trace) {
-    return !Array.isArray(trace.z[0]);
+    return !Lib.isArrayOrTypedArray(trace.z[0]);


Would it work to provide a 2D array as an array of typed arrays? Not that I want to encourage this, far better for us to provide a solution based on a single typed array...

Would it work to provide a 2D array as an array of typed arrays?

Should have looked at the next commit before commenting 🏆

Yep, arrays of typed arrays should work after this PR.

alexcjohnson · 2018-02-28T03:32:56Z

src/traces/carpet/map_1d_array.js

@@ -16,7 +18,7 @@
 module.exports = function mapArray(out, data, func) {
    var i;

-    if(!Array.isArray(out)) {
+    if(!isArrayOrTypedArray(out)) {
        // If not an array, make it an array:
        out = [];
    } else if(out.length > data.length) {


⚠️ this block has a .slice ⚠️

That's ok though:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray/slice

Ah great. So this is fine, but I guess 🐎 at some point we could make a helper that uses slice on regular arrays and subarray on typed arrays, for cases like this where we don't need a copy.
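The helper floated here could look something like the following sketch. The name sliceOrView is made up and this is not code from the PR. It relies on the standard behavior that subarray returns a view on the same buffer while Array.prototype.slice returns a copy.

```javascript
// Hypothetical helper: items [start, end) as a view when possible, else a copy.
function sliceOrView(arr, start, end) {
    if(ArrayBuffer.isView(arr) && typeof arr.subarray === 'function') {
        // Typed array: new view on the same underlying buffer, no copy
        return arr.subarray(start, end);
    }
    // Plain array: shallow copy
    return arr.slice(start, end);
}

// Usage: writes through a typed-array view are visible in the source array.
var typed = new Float64Array([1, 2, 3, 4]);
var typedPart = sliceOrView(typed, 0, 2);
typedPart[0] = 9;

var plain = [1, 2, 3, 4];
var plainPart = sliceOrView(plain, 0, 2);
plainPart[0] = 9;
```

Because the typed-array branch shares memory with its input, such a helper is only safe where the caller never mutates the result, which is the "don't need a copy" case mentioned above.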

alexcjohnson · 2018-02-28T20:16:02Z

Excellent work - should be a big step toward higher performance! 💃 🍾

jonmmease · 2018-03-12T13:43:24Z

@etpinard I just started playing with these changes and I was wondering if the following behavior is expected.

When I create a scatter (or scattergl) trace using a typed array as the x/y values the trace displays properly. When I look at the gd.data[0].x property I see a typed array, but when I look at the gd._fullData[0].x attribute I see a standard (non-typed) array.

For memory efficiency, I was thinking it would be nice for _fullData to store the typed version as well. I can open a separate issue, just wanted to get your initial take on it.

etpinard · 2018-03-12T14:11:15Z

When I create a scatter (or scattergl) trace using a typed array as the x/y values the trace displays properly. When I look at the gd.data[0].x property I see a typed array, but when I look at the gd._fullData[0].x attribute I see a standard (non-typed) array.

That shouldn't happen. Would you mind sharing a reproducible example?

jonmmease · 2018-03-12T14:54:33Z

Never mind, I'm not able to reproduce it. Must be a bug somewhere else in my code. Thanks!

etpinard added 6 commits February 20, 2018 16:15

rename Lib.isArray -> Lib.isArrayOrTypedArray

91b03ff

... to avoid confusion with Array.isArray and upcoming Lib.isTypedArray

accept typed array when coercing 'data_array' attributes

a25ff13

[PoC] bypass ax.d2c for typedArray during ax.makeCalcdata

45c2f35

... and use typed array 'subarray' to slice coordinate arrays to series length

replace Array.isArray -> Lib.isArrayOrTypedArray in various places

2d81bdc

... that should accept typed arrays (mostly arrayOk attributes, with marker.size and marker.color being the most likely candidates for typed array inputs).

replace Lib.isArray -> Array.isArray in tests w/o typed arrays

bcf59d7

[wip] a few todos

bc32981

- do the Lib.extend* methods do the right thing for typed arrays?
- what to do with 2d arrays? Should we start supporting ndarrays?
- should we invent a JSON-serializable version of typed arrays?

etpinard added feature something new status: in progress labels Feb 20, 2018

etpinard commented Feb 20, 2018

Merge branch 'master' into typed-arrays-support

2372629

etpinard added 6 commits February 26, 2018 11:21

make scattergl dragmode relayout replot not recalc

909120e

... replot is sufficient since regl push.

large lint commit in scattergl/index.js

5316c47

... to make things look a little more like the rest of plotly.js

improvements to Scattergl.calc

0aa0f5e

- reuse scatter axis-expansion logic
- improve 'fast' axis expand routine (using average marker.size as pad value)
- use ax.makeCalcdata for all axis types (this creates a new array for linear axes, but makes things more robust)
- add a few TODOs

update baselines

39ef5a9

- most notable change is in gl2d_axes_label2 that now shows the correct to-zero autorange.

update jasmine tests for new scattergl auto-ranges

36b4e25

add svg vs gl scatter autorange 🔒s

40f93f7

etpinard mentioned this pull request Feb 26, 2018

On-par autorange for scattergl #2404

Merged

etpinard added 6 commits February 26, 2018 17:40

improve makeCalcdata typed array handling + some tests + some linting

d98dcc0

a few more Array.isArray -> Lib.isArrayOrTypedArray

c0e2f73

some typed array tests

09d37b6

🔒 Plotly.restyle support for typed arrays

a7ed2c2

add and 🔒 typed array support for extendTraces and prependTraces

b95e462

... by merging the concat and splice steps (which can't be done using native methods on typed arrays)

Merge branch 'master' into typed-arrays-support

53ea0ec

etpinard removed the status: in progress label Feb 28, 2018

etpinard added this to the v1.35.0 milestone Feb 28, 2018

etpinard mentioned this pull request Feb 28, 2018

allow plotly.js to accept numpy buffers #1784

Open

alexcjohnson reviewed Feb 28, 2018

etpinard added 6 commits February 28, 2018 13:46

Merge pull request #2404 from plotly/scattergl-autorange

a2fb88b

On-par autorange for scattergl

improve isTypedArray

f0395b5

- by making sure it returns false on instances of DataView

don't require all of Lib when only isArrayOrTypedArray is needed

9b83826

turn coord typed arrays into plain arrays for 'date' and 'category' axes

6dd2f69

... for now.

check that typed arrays items aren't all NaNs in hasColorscale

306986d

- we could have used isNaN here, but isNumeric is fast enough that the gains would be negligible.

no need to check for typed array in outer array of a 2d array

d4cb0c4

etpinard mentioned this pull request Feb 28, 2018

wishlist for potential breaking changes since v1 #420

Closed

15 tasks

etpinard merged commit f8e7ee4 into master Feb 28, 2018

etpinard deleted the typed-arrays-support branch February 28, 2018 20:28

etpinard mentioned this pull request Feb 28, 2018

Data array attributes should support Typed arrays (e.g. Float32Array) #860

Closed

etpinard mentioned this pull request May 1, 2018

Use isArrayOrTypedArray in gl_format_color.js #2596

Merged

veggiesaurus mentioned this pull request May 8, 2018

[plotly.js] added typed array to scatter data DefinitelyTyped/DefinitelyTyped#25608

Merged

7 tasks

jonmmease mentioned this pull request Aug 16, 2018

Add support for encoding TypedArrays as primitive objects for serialization #2911

Closed

2 tasks

cstjean mentioned this pull request Oct 22, 2019

Data Transfer Performance JuliaPlots/PlotlyJS.jl#77

Open
