major difference in performance between sync and async methods to download tensor data #6687
Comments
We once did some investigation into why there is a big perf diff between WebGL data() and dataSync(). More details of our investigation can be found at https://docs.google.com/document/d/13Up4P7Or9uliiU2RZfzOk1aQ5yjFSjG2M6k16BkQn4U/edit
thanks for the explanation

the difference is definitely smaller, but still noticeable at ~10%. looking forward to the new timeout method
Thanks for the quick try! It seems the 4ms delay in setTimeout() contributes most of the gap between async and sync calls. Once that's solved, I think another gap comes from polling, as you mentioned. More specifically, the browser needs to check the result at some frequency; if that's too frequent, much power will be drained. In the current Chrome implementation, the frequency of the task scheduler is about 1ms (the exact implementation is more complicated), which is the general cost of async calls.
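The 4ms setTimeout clamp mentioned above is easy to observe directly. Below is a small hypothetical benchmark (my sketch, not TFJS code) that measures the real delay of successive nested `setTimeout(fn, 0)` calls; in Chrome the delays jump to roughly 4ms after the fifth nesting level, while Node.js applies a much smaller minimum:

```javascript
// Measure the actual delay of repeatedly nested setTimeout(fn, 0) calls.
// In Chrome, once the nesting level exceeds 5, each timeout is clamped
// to a minimum of ~4ms; Node.js does not apply the 4ms clamp.
function measureNestedTimeouts(levels) {
  return new Promise((resolve) => {
    const delays = [];
    let last = performance.now();
    const step = () => {
      const now = performance.now();
      delays.push(now - last);
      last = now;
      if (delays.length >= levels) resolve(delays);
      else setTimeout(step, 0);
    };
    setTimeout(step, 0);
  });
}

measureNestedTimeouts(10).then((delays) => {
  // In a browser console, the later entries are each >= ~4ms.
  console.log(delays.map((d) => d.toFixed(1)).join(' '));
});
```

Running this in a browser console versus Node.js makes the clamp (and why a custom timer helps) visible without any TFJS involvement.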
@gyagp Thank you for the detailed explanation. We hope to unify the API between WebGL and WebGPU without a major performance discrepancy, so we're looking forward to the new timeout mechanism. Thanks.
@vladmandic I have a PR (under review) at #6694. Could you help give it a try?
after enabling it, i'm clearly not using wasm here at all. my guess is that the issue is because you're doing

`(window as any).setTimeoutWPM(tryFn, nextBackoff);`

registering a function directly on a global namespace is a bad practice
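A hypothetical alternative to patching the global object (my sketch, not the PR's actual code) is to keep the custom timer in module scope and export it:

```javascript
// Sketch: expose the custom timer from module scope instead of writing
// it onto window/globalThis. The body here just delegates to the native
// setTimeout; a real implementation would use a postMessage-based trick
// to dodge the nested-setTimeout clamp.
const customTimers = {
  setTimeoutCustom(fn, delay) {
    return setTimeout(fn, delay);
  },
};

// Callers import customTimers instead of reaching for a global.
customTimers.setTimeoutCustom(() => console.log('tick'), 0);
```

This keeps the helper discoverable through normal imports and avoids collisions with other scripts that share the page's global namespace.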
tensor with shape [1, 720, 720, 3], running model inference 500 times and calculating the average time for tensor download:

repo used for test: https://github.com/vladmandic/anime
Thank you so much @vladmandic! This is not a huge improvement, but the 4ms delay is real and critical for an application that targets 60FPS, or cares about every millisecond. I will polish my PR and try to get it merged. After that, I will investigate the gap between data() and dataSync() again to see if it can be closed further.
very true. i've tried it and the performance difference is much bigger - 15-20% overall performance improvement, and that is huge:

repo used for test: https://github.com/vladmandic/human
I have updated the PR, and it's under final review. @vladmandic, could I ask you to test it again? I don't want to introduce any regression for your cases. By the way, I changed the flag name, so you need to run `tf.env().set('USE_SETTIMEOUTCUSTOM', true);` first.
as soon as it's enabled, the issue is that worker.js does:

```js
self.onmessage = (e) => {
  ...
  } else {
    err('worker.js received unknown command ' + e.data.cmd);
    err(e.data);
  }
};
```

i don't think that is a bug in your code, but bad behavior from the wasm worker script, as it should not treat unknown messages as errors. the bigger problem is that the WASM worker script is generated by EMScripten and only lightly modified by the TFJS build process, and it seems this is directly related.
@vladmandic

If you want to try this, you could follow these steps:

My code clip is:

I'm not sure what the difference is between this sample and your sample. Do you have any ideas? And is it possible for me to run your sample locally so I could do more profiling work? If so, your guidance would be appreciated.
@vladmandic, for background, I asked @shaoboyan to investigate the diff between data() and dataSync() (116ms vs. 94ms) even when we use the custom setTimeout, as you reported below.
i just redid all the tests - seems there is quite a difference depending on:

results:
- download tensor data after
- download tensor data after
- download tensor data for manually created/modified tensor:
other than that, btw, this is my code for waiting on the webgl sync fence:

```typescript
async function syncWait(gl: WebGL2RenderingContext): Promise<number> {
  const sync = gl.fenceSync(gl.SYNC_GPU_COMMANDS_COMPLETE, 0);
  if (!sync) return 0;
  const ts = performance.now();
  return new Promise((resolve) => {
    const loop = () => {
      const status = gl.clientWaitSync(sync, gl.SYNC_FLUSH_COMMANDS_BIT, 0);
      if (status === gl.WAIT_FAILED) throw new Error('clientWaitSync: wait failed');
      else if (status === gl.ALREADY_SIGNALED || status === gl.CONDITION_SATISFIED) {
        gl.deleteSync(sync);
        resolve(performance.now() - ts);
      } else {
        setTimeout(loop, 0);
      }
    };
    loop();
  });
}
```

and a simplified usage example would be:

```typescript
const tensor = model.execute(input);
await syncWait(tf.backend().getGPGPUContext().gl);
const timeStamp = performance.now();
const data = await tensor.data();
totalTime += (performance.now() - timeStamp);
log({ avgDataTime: totalTime / ++numFrames });
```
@vladmandic
@shaoboyan i'm not sure what the ask is - i've already done that: not just several warmup rounds, but 100, and then measuring the average between rounds 100 to 200. yes, the gap still exists, but a) it's much smaller, up to the point of being close enough, and b) it's due to the webgl pipeline also performing timer checks, which interferes with the browser's handling of ticks. my much bigger concern is the blocking issue of the new code being in conflict with the wasm worker script.
@vladmandic Thanks. I misunderstood your tests. I think this explanation resolves my concern.
I tried the e2e benchmark in TFJS on both Windows and macOS, but I couldn't repro the issue you mentioned. I double-checked the emscripten source code (_bazel_user/qpnosdr3/external/emscripten_bin_win/emscripten/src/library_pthread.js) we use now; it's still the problematic one, and I think its version is 3.1.4. Emscripten fixed this problem (emscripten-core/emscripten#16450) on Mar 10, and the first release after the fix was 3.1.8 on Mar 24.
@gyagp i'm not sure upgrading |
@gyagp any progress on emscripten?
@vladmandic Sorry, I was occupied by some other things. @mattsoulanille, do you have any suggestions on comment 17?
@gyagp Maybe we can register our listener before Emscripten's and suppress propagation of our event? However, I'm not sure if this is possible in a webworker created by Emscripten. Ideally, there would be something similar to CustomEvents that we could use as a unique event (we can't use …).

I also updated Emscripten to 3.1.20, although this might not solve the issue.
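The "register our listener first and suppress propagation" idea can be sketched outside a real worker. This is a hypothetical model only: a plain `EventTarget` stands in for the worker's global scope, and `__customTimeout` is an invented sentinel property, not actual TFJS or Emscripten code:

```javascript
// A plain EventTarget stands in for the worker's global scope.
const scope = new EventTarget();
const handled = [];

// Our listener, registered first: swallow messages carrying our sentinel
// so the later-registered handler never sees them.
scope.addEventListener('message', (e) => {
  if (e.data && e.data.__customTimeout) {
    e.stopImmediatePropagation();
    handled.push('custom-timer');
  }
});

// Stand-in for Emscripten's own message handler, registered second.
scope.addEventListener('message', () => handled.push('emscripten'));

// Dispatch one of "our" messages and one ordinary message.
const ours = new Event('message');
ours.data = { __customTimeout: true };
scope.dispatchEvent(ours);

const other = new Event('message');
other.data = { cmd: 'run' };
scope.dispatchEvent(other);

console.log(handled); // ['custom-timer', 'emscripten']
```

Whether this ordering can actually be guaranteed inside a worker that Emscripten itself spawns is exactly the open question above.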
@mattsoulanille @gyagp i just created an emscripten issue for this, as it is really something that should be patched there - imo anything you do here is just a workaround to avoid bad emscripten behavior
@mattsoulanille Thanks, and let's wait for Emscripten's feedback at emscripten-core/emscripten#17844.
Will MessageChannel fit your needs here?
MessageChannel looks like it will work. Thanks @hujiajie for pointing it out to us! @gyagp, we should try this out as a drop-in replacement.

A quick test:

```js
lastTime = new Date().getTime();
i = 0;
function time() {
  i++;
  if (i % 1000 === 0) {
    currentTime = new Date().getTime();
    delta = currentTime - lastTime;
    console.log(`1000 iterations in ${delta} ms. ${delta / 1000} ms per iteration`);
    lastTime = currentTime;
  }
}
channel = new MessageChannel();
channel.port1.onmessage = () => {
  channel.port2.postMessage('hello');
  time();
};
channel.port2.postMessage('hello');
```

This looks like it has no 4ms delay (while the same code under a setTimeout 'loop' does).
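Wrapped as a zero-delay scheduler, the MessageChannel trick above looks roughly like this (my sketch of the idea, not TFJS's actual implementation):

```javascript
// Schedule fn to run as a macrotask with no 4ms nested-timeout clamp,
// using a MessageChannel instead of setTimeout(fn, 0).
function postTask(fn) {
  const channel = new MessageChannel();
  channel.port1.onmessage = () => {
    channel.port1.close(); // one-shot: release the channel so nothing leaks
    fn();
  };
  channel.port2.postMessage(null);
}

postTask(() => console.log('ran without the nested-setTimeout clamp'));
```

A production version would reuse one channel with a callback queue rather than allocating a channel per call; the one-shot form just keeps the sketch self-contained.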
imo, |
@mattsoulanille @gyagp emscripten pr has been merged to main, so now it should be just a question of upgrading to new version once its out. |
Thanks for the update @vladmandic, and for the extra info.
Sorry for the late response. MessageChannel also works, and I updated my test (https://github.com/webatintel/webatintel.github.io/blob/main/timer.html#L44) to confirm this. Given @vladmandic already fixed the issue in Emscripten upstream (thank you so much for the effort!), let's keep the current solution.
@mattsoulanille fyi,
Thanks for the ping @vladmandic. I checked emsdk's github a few days ago, so I'm not sure how I missed the 3.1.22 update. I've created #6856.
@vladmandic, emsdk has been upgraded to 3.1.23 in tfjs's master branch. Could you please check this against your esbuild build infrastructure when you get a chance? Thanks! |
@mattsoulanille @gyagp sure!
@vladmandic Thanks for the double check!
If the setTimeout nesting level is greater than 5 and timeout is less than 4ms, timeout will be clamped to 4ms, which hurts the perf. A custom setTimeout is provided to mitigate the perf impact. BUG: #6687 Co-authored-by: Na Li <[email protected]>
closing as resolved - thanks!
when using the `webgl` backend, `tensor.dataSync()` is about 35% faster than `await tensor.data()` when returning larger amounts of data (in my case, the model takes a 720x720x3 image as input and produces a 720x720x3 image as output)

actual timings are ~100ms for `tensor.dataSync()` and ~140ms for `await tensor.data()`

note that there is nothing else executing; this is a single active loop

i wouldn't mind using `dataSync()`, but then again any sync calls are not compatible with the `webgpu` backend, and maintaining two separate codepaths for `webgl` and `webgpu` is a no-go

btw, the model in question can be found at https://github.com/vladmandic/anime

environment: tfjs 3.19.0 on chrome 103