major difference in performance between sync and async methods to download tensor data #6687

Closed
vladmandic opened this issue Jul 28, 2022 · 37 comments
Labels: type:bug Something isn't working

@vladmandic
Contributor

when using the webgl backend, tensor.dataSync() is about 35% faster than await tensor.data() when returning larger amounts of data
(in my case, the model takes a 720x720x3 image as input and produces a 720x720x3 image as output)

actual timings are ~100ms for tensor.dataSync() and ~140ms for await tensor.data()
note that there is nothing else executing, this is a single active loop

i wouldn't mind using dataSync(), but then again any sync calls are not compatible with the webgpu backend, and maintaining two separate codepaths for webgl and webgpu is a non-starter

btw, the model in question can be found at https://github.com/vladmandic/anime

environment: tfjs 3.19.0 on chrome 103

vladmandic added the type:bug label Jul 28, 2022
@gyagp

gyagp commented Jul 28, 2022

We once did some investigation into why there is a big perf difference between WebGL data() and dataSync(). The basic finding is below:
WebGLRenderingContext.readPixels() is a blocking operation. It waits for all previous rendering operations to finish, then reads data from GPU to CPU, which may block the main thread for quite a long time. To mitigate the situation, in the async path (TFJS's implementation of tensor.data()), we can insert a fence right after the rendering operations, then use a timer (for example, setTimeout) to poll the fence's status. If the fence is not yet signaled, we can do something else in parallel. Otherwise, we call readPixels() without a long block, since the rendering operations have already finished.
With an ideal timer, the fence can unblock the UI, hide the latency behind parallel tasks, and read data back promptly.
However, according to the html spec (https://html.spec.whatwg.org/multipage/timers-and-user-prompts.html#:~:text=If%20nesting%20level%20is%20greater%20than%205%2C%20and%20timeout%20is%20less%20than%204%2C%20then%20set%20timeout%20to%204), if the nesting level is greater than 5 and the timeout is less than 4ms, the timeout is clamped to 4ms, which causes a delay of at least 4ms per poll. You see about a 40ms diff, so I'm not sure whether there are other reasons.
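For illustration, here is a minimal browser-console snippet (ours, not from the investigation doc) that makes the clamp visible: it chains zero-delay setTimeout calls and logs the gap between ticks, which jumps to roughly 4ms once the nesting level passes 5.

let nestingLevel = 0;
let lastTick = performance.now();
function tick() {
  const now = performance.now();
  // Once the nesting level exceeds 5, the gap is clamped to at least ~4ms.
  console.log(`nesting ${nestingLevel}: ${(now - lastTick).toFixed(2)}ms since last tick`);
  lastTick = now;
  if (++nestingLevel < 10) setTimeout(tick, 0);
}
setTimeout(tick, 0);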
To mitigate this, there are 3 possibilities:

  1. We already found a tricky way to introduce a new setTimeout(), which will be implemented in TFJS (sorry that this work has been delayed for a while).
  2. Chrome Canary has also changed the nesting threshold from 5 to 100 (https://groups.google.com/a/chromium.org/g/blink-dev/c/-TjeYs7shTQ/m/FhJq0mQyDAAJ?pli=1). You could give Chrome Canary (not Stable) a quick try to see if your problem is alleviated.
  3. Another possibility is to always keep the async path (sorry, but by design WebGPU is mostly async while WebGL is always sync), and wrap tensor.dataSync() in an async function for WebGL, as sketched below.
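A minimal sketch of option 3 (the wrapper name and shape are ours, not a TFJS API): every call site stays async, and only the WebGL branch blocks internally.

import * as tf from '@tensorflow/tfjs';

// Hypothetical helper: async signature everywhere, sync readback on WebGL.
async function readTensor(t: tf.Tensor): Promise<tf.TypedArray> {
  if (tf.getBackend() === 'webgl') {
    // Blocks the main thread, but keeps the call site identical to WebGPU.
    return t.dataSync();
  }
  return t.data();
}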

More details of our investigation can be found at https://docs.google.com/document/d/13Up4P7Or9uliiU2RZfzOk1aQ5yjFSjG2M6k16BkQn4U/edit.
If you can check with Chrome Canary, please let me know the result. We'll take a further look if it doesn't greatly alleviate your situation. Thanks!

@vladmandic
Contributor Author

vladmandic commented Jul 28, 2022

@gyagp

thanks for the explanation
tried with chrome canary 106.0.5206.0

  • await tensor.data() --> average 107ms
  • tensor.dataSync() --> average 99ms

difference is definitely smaller, but still noticeable at ~10%
i guess this comes down to the polling delay you've mentioned

looking forward to the new timeout method
if you have a pr for that, i can monitor and try it once it's ready?

@gyagp

gyagp commented Jul 28, 2022

Thanks for the quick try! It seems the 4ms delay in setTimeout() contributes most of the gap between async and sync calls. Once that's solved, I think the remaining gap comes from polling, like you mentioned. More specifically, the browser needs to check the result at some frequency; if it checks too frequently, it drains a lot of power. In Chrome's current implementation, the task scheduler's granularity is about 1ms (the exact implementation is more complicated), which is the general cost of async calls.
In the WebGPU path, we already found some perf issues related to the GPU task scheduler, and we're fixing them. We will double check whether the WebGL path can be improved.
Stay tuned for the new setTimeout :)

@pyu10055
Collaborator

@gyagp Thank you for the detailed explanation. We hope to unify the API between WebGL and WebGPU without a major performance discrepancy, and we look forward to the new timeout mechanism. Thanks.

@gyagp

gyagp commented Aug 3, 2022

@vladmandic I have a PR (under review) at #6694. Could you give it a try?
Currently the custom setTimeout is gated by an environment flag "USE_SETTIMEOUTWPM", so you need to enable it first with "tf.env().set('USE_SETTIMEOUTWPM', true);". Thanks!

@vladmandic
Contributor Author

@gyagp

after enabling USE_SETTIMEOUTWPM, i'm getting errors in tfjs-backend-wasm-threaded-simd.worker.js

worker.js received unknown command undefined

and i'm clearly not using wasm here at all, it's just that my tfjs bundle has all backends pre-linked

my guess is that the issue is that you're blindly using the browser's window object and registering a method on it:

tfjs-core/src/util_base.ts

(window as any).setTimeoutWPM(tryFn, nextBackoff);

registering a function directly on the global namespace is bad practice
but referencing the browser's global namespace without even checking that it's running in a browser context is a fail
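a minimal sketch of the kind of guard that would avoid this (the global-object resolution is our assumption, not the PR's code; tryFn and nextBackoff come from the surrounding util_base.ts retry loop):

// Values from the surrounding TFJS backoff loop, declared for completeness.
declare const tryFn: () => void;
declare const nextBackoff: number;

// Sketch only: resolve the global object instead of assuming `window`
// exists (it doesn't in Node or inside a worker).
const globalObject: any =
    typeof window !== 'undefined' ? window :
    typeof self !== 'undefined' ? self : globalThis;
if (typeof globalObject.setTimeoutWPM === 'function') {
  globalObject.setTimeoutWPM(tryFn, nextBackoff);
} else {
  // Fall back to the standard timer when the custom one isn't registered.
  setTimeout(tryFn, nextBackoff);
}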

@vladmandic
Contributor Author

vladmandic commented Aug 3, 2022

@gyagp

i've removed tfjs-backend-wasm from my tfjs bundle and the test now runs
results show some improvement, but still far from sync data download performance

tensor with shape [1, 720, 720, 3], running model inference 500 times and calculating average time for tensor download:

  • using tensor.dataSync() -> 94 ms
  • using await tensor.data() -> 124 ms
  • using await tensor.data() with USE_SETTIMEOUTWPM = true -> 116 ms

repo used for test: https://github.com/vladmandic/anime

@gyagp

gyagp commented Aug 4, 2022

Thank you so much @vladmandic ! This is not a huge improvement, but the 4ms delay is real and critical for an application that targets 60FPS or cares about every millisecond. I will polish my PR and try to get it merged. After that, I will investigate the gap between data() and dataSync() again to see if it can be closed further.

@vladmandic
Contributor Author

vladmandic commented Aug 4, 2022

@gyagp

but 4ms delay is real and critical for an application that targets 60FPS, or cares about every millisecond

very true

i've tried USE_SETTIMEOUTWPM on a different use-case:
an app that uses multiple small models, so there are multiple await tensor.data() calls for each input frame
each data result is tiny, there are just quite a few of them

and the performance difference is much bigger - a 15-20% overall performance improvement, and that is huge:

  • average on 500 frames: ~122ms
  • average on 500 frames with USE_SETTIMEOUTWPM: ~103ms

repo used for test: https://github.com/vladmandic/human

@gyagp

gyagp commented Aug 9, 2022

I have updated the PR, and it's under final review. @vladmandic, could you test it again? I don't want to introduce any regression for your cases. By the way, I changed the flag name, so you now need to call "tf.env().set('USE_SETTIMEOUTCUSTOM', true);" first.

@vladmandic
Contributor Author

vladmandic commented Aug 9, 2022

as soon as @tensorflow/tfjs-backend-wasm is loaded and USE_SETTIMEOUTCUSTOM is enabled, there are problems regardless of whether WASM is used, since WASM init code runs on module load

worker.js received unknown command undefined
threadPrintErr @ tfjs.esm.js:81524
self.onmessage @ tfjs.esm.js:81594
postMessage (async)
(anonymous) @ tfjs.esm.js:8814
setTimeout (async)
setTimeoutCustom @ tfjs.esm.js:8813

issue is that worker.js in tfjs-backend-wasm listens to all messages and throws an error because it does not know about this particular message

self.onmessage = (e) => {
...
} else {
  err('worker.js received unknown command ' + e.data.cmd);
  err(e.data);
}

where err() is just a wrapper around console.error(text) via threadPrintErr()

i don't think that is a bug in your code, but bad behavior from the wasm worker script, as it should not treat any unknown message as an error

bigger problem is that the WASM worker script is generated by Emscripten and only lightly modified by the TFJS build process

actual source for worker.js and the offending line is in the Emscripten repo:
https://github.com/emscripten-core/emscripten/blob/5c4b1675ca5ba34e283b21b3c01df092827db628/src/worker.js#L288

and it seems this is directly related:
emscripten-core/emscripten#16450

@shaoboyan
Contributor

@vladmandic
I've created a simple sample to try to reproduce the perf difference, and the perf results show:

  • average 'await tensor.data()' on 500 runs: ~53.23ms
  • average 'tensor.dataSync()' on 500 runs: ~44.08ms
  • average 'await tensor.data()' on 500 runs with USE_SETTIMEOUTCUSTOM: ~39.73ms

If you want to try this, you could follow the steps:

  • clone/checkout the branch: https://github.com/shaoboyan/tfjs/tree/settimeout
  • Build tfjs-core in the tfjs/link-package folder with command yarn build-deps-for tfjs-core
  • Build tfjs-backend-webgl in the tfjs/link-package folder with command yarn build-deps-for tfjs-backend-webgl
  • Start a server in the folder above the tfjs folder with command npx http-server
  • Start Chrome Canary and access http://127.0.0.1:8080/tfjs/e2e/benchmarks/local-benchmark/index_tfjs.html?backend=webgl&localBuild=webgl,core. Open the console and the result is there.

My code snippet is:

await tf.setBackend('webgl');
await tf.ready();

const value = [];
for (let i = 0; i < 1440 * 1440; ++i) {
  value[i] = i / 1440;
}

const tensor = tf.tensor2d(value, [1440, 1440]);
const tensor2 = tf.tensor2d(value, [1440, 1440]);

// warm up
for (let i = 0; i < 500; ++i) {
  const result = tf.matMul(tensor, tensor2);
  result.dataSync();
  result.dispose();
}

let start = 0;
let total = 0;

// async readback with the default setTimeout
tf.env().set('USE_SETTIMEOUTCUSTOM', false);
start = performance.now();
for (let i = 0; i < 500; ++i) {
  const result = tf.matMul(tensor, tensor2);
  await result.data();
  result.dispose();
}
total = performance.now() - start;
console.log('await tensor.data() readback time origin: ' + total / 500);

// sync readback
start = performance.now();
for (let i = 0; i < 500; ++i) {
  const result = tf.matMul(tensor, tensor2);
  result.dataSync();
  result.dispose();
}
total = performance.now() - start;
console.log('tensor.dataSync() readback time: ' + total / 500);

// async readback with the custom setTimeout
tf.env().set('USE_SETTIMEOUTCUSTOM', true);
start = performance.now();
for (let i = 0; i < 500; ++i) {
  const result = tf.matMul(tensor, tensor2);
  await result.data();
  result.dispose();
}
total = performance.now() - start;
console.log('await tensor.data() readback time custom: ' + total / 500);

I'm not sure what the difference is between this sample and yours. Do you have any ideas?

Also, is it possible for me to run your sample locally so I can do more profiling work? If so, your guidance would be appreciated.

@gyagp

gyagp commented Aug 10, 2022

@vladmandic, for background, I asked @shaoboyan to investigate the diff between data() and dataSync() (116ms vs. 94ms) even with the custom setTimeout, as you reported:
using tensor.dataSync() -> 94 ms
using await tensor.data() -> 124 ms
using await tensor.data() with USE_SETTIMEOUTWPM = true -> 116 ms

@vladmandic
Contributor Author

vladmandic commented Aug 10, 2022

@gyagp @shaoboyan

i just redid all the tests - it seems there is quite a difference depending on:

  • timings for initial iterations are all over the place,
    and not just the 1st inference - results don't stabilize until much later,
    so i'm now throwing those away and only measuring average timings after the 100th iteration
  • timings vary drastically depending on whether the test is synthetic (like your example)
    or actually involves model inference (my original example)
  • if the test includes model inference, the difference in results seems to be due to extra pending ops
    on the browser's webgl pipeline and nothing to do with tensor download

results

download tensor data after model.execute():

  • await tensor.data(): 175ms
  • await tensor.data() with USE_SETTIMEOUTCUSTOM: 170ms
  • tensor.dataSync(): 165ms

download tensor data after model.execute() and gl.clientWaitSync:

  • await tensor.data(): 114ms
  • await tensor.data() with USE_SETTIMEOUTCUSTOM: 101ms
  • tensor.dataSync(): 102ms

download tensor data for manually created/modified tensor:

  • await tensor.data(): 104ms
  • await tensor.data() with USE_SETTIMEOUTCUSTOM: 93ms
  • tensor.dataSync(): 94ms

all-in-all, using the webgl backend impacts result measurement a lot if the webgl pipeline is not taken into consideration
(as @pyu10055 once explained, model.execute() is not fully sync on its own)

other than that, USE_SETTIMEOUTCUSTOM does indeed bring async download to the same level of performance as sync
(actually even a touch faster than the sync call, but within the margin of error, as the difference is below 1ms)!

btw, this is my code for syncWait:

async function syncWait(gl: WebGL2RenderingContext): Promise<number> {
  const sync = gl.fenceSync(gl.SYNC_GPU_COMMANDS_COMPLETE, 0);
  if (!sync) return 0;
  const ts = performance.now();
  return new Promise((resolve) => {
    const loop = () => {
      const status = gl.clientWaitSync(sync, gl.SYNC_FLUSH_COMMANDS_BIT, 0);
      if (status === gl.WAIT_FAILED) throw new Error('clientWaitSync: wait failed');
      else if (status === gl.ALREADY_SIGNALED || status === gl.CONDITION_SATISFIED) {
        gl.deleteSync(sync);
        resolve(performance.now() - ts);
      } else {
        setTimeout(loop, 0);
      }
    };
    loop();
  });
}

and a simplified code example would be:

const tensor = model.execute(input);
await syncWait(tf.backend().getGPGPUContext().gl);
const timeStamp = performance.now();
const data = await tensor.data();
totalTime += (performance.now() - timeStamp);
log({ avgDataTime: totalTime / ++numFrames });

@shaoboyan
Contributor

@vladmandic
Thanks for trying and for the detailed data! But I still have a concern about the previous model average times. I suspect the async download might cause the perf diff due to the browser's different scheduling method.
So would you mind testing again, including model.execute(input) and measuring after several warmup rounds? If the gap still exists, I think profiling to understand it is still worthwhile.

@vladmandic
Contributor Author

vladmandic commented Aug 11, 2022

@shaoboyan i'm not sure what the ask is - i've already done that: not just several warmup rounds, but 100, and then measuring the average between rounds 100 and 200. yes, the gap still exists, but a) it's much smaller, to the point of being close enough, and b) it's due to the webgl pipeline also performing timer checks, which interferes with the browser's handling of ticks.

my much bigger concern is the blocking issue of the new code conflicting with tfjs-backend-wasm.

@shaoboyan
Contributor

shaoboyan commented Aug 11, 2022

b) it's due to the webgl pipeline also performing timer checks, which interferes with the browser's handling of ticks.

@vladmandic Thanks. I misunderstood your tests. I think this reason resolves my concern.

@gyagp

gyagp commented Aug 19, 2022

as soon as @tensorflow/tfjs-backend-wasm is loaded and USE_SETTIMEOUTCUSTOM is enabled, there are problems regardless of whether WASM is used, since WASM init code runs on module load

worker.js received unknown command undefined
threadPrintErr @ tfjs.esm.js:81524
self.onmessage @ tfjs.esm.js:81594
postMessage (async)
(anonymous) @ tfjs.esm.js:8814
setTimeout (async)
setTimeoutCustom @ tfjs.esm.js:8813

issue is that worker.js in tfjs-backend-wasm listens to all messages and throws an error because it does not know about this particular message

self.onmessage = (e) => {
...
} else {
  err('worker.js received unknown command ' + e.data.cmd);
  err(e.data);
}

where err() is just a wrapper around console.error(text) via threadPrintErr()

i don't think that is a bug in your code, but bad behavior from the wasm worker script, as it should not treat any unknown message as an error

bigger problem is that the WASM worker script is generated by Emscripten and only lightly modified by the TFJS build process

actual source for worker.js and the offending line is in the Emscripten repo: https://github.com/emscripten-core/emscripten/blob/5c4b1675ca5ba34e283b21b3c01df092827db628/src/worker.js#L288

and it seems this is directly related: emscripten-core/emscripten#16450

I tried the e2e benchmark in TFJS on both Windows and macOS. However, I couldn't repro the issue you mentioned. I double checked the Emscripten source code (_bazel_user/qpnosdr3/external/emscripten_bin_win/emscripten/src/library_pthread.js) we use now; it's still the problematic one, and I think its version is 3.1.4. Emscripten fixed this problem (emscripten-core/emscripten#16450) on Mar 10, and the first release after the fix was 3.1.8 on Mar 24.
This is still a valid problem. @pyu10055 @mattsoulanille (who should still be on vacation now), do you think we can upgrade the emsdk version now?

@vladmandic
Contributor Author

@gyagp i'm not sure upgrading emscripten would solve it, as fix emscripten-core/emscripten#16450 solved it for https://github.com/emscripten-core/emscripten/blob/main/src/library_pthread.js#L290, but the same fix is still needed for https://github.com/emscripten-core/emscripten/blob/main/src/worker.js#L287 - those are two different places in the code where the same problem can occur

@vladmandic
Contributor Author

@gyagp any progress on the emscripten worker.js issue i've mentioned? it uses a global onmessage handler, so it intercepts all messages, including the one you're using.

@gyagp

gyagp commented Sep 5, 2022

@vladmandic Sorry, I was occupied by some other things. @mattsoulanille, do you have any suggestions on comment 17?

@mattsoulanille
Member

@gyagp Maybe we can register our listener before Emscripten's and suppress propagation of our event? However, I'm not sure if this is possible in a web worker created by Emscripten. Ideally, there would be something similar to CustomEvents that we could use as a unique event (we can't use window.dispatchEvent because it is synchronous).
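A rough sketch of that interception idea (the message shape and dispatch function are hypothetical, and whether our listener can actually be registered before Emscripten's handler is exactly the open question):

// Hypothetical dispatch function for our scheduled callbacks.
declare function runScheduledCallback(data: unknown): void;

// Claim our scheduling messages before later-registered listeners
// (including Emscripten's onmessage) can see them.
self.addEventListener('message', (event: MessageEvent) => {
  if (event.data && event.data.cmd === 'tfjs-custom-settimeout') {
    event.stopImmediatePropagation();  // suppress Emscripten's handler
    runScheduledCallback(event.data);
  }
});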

I also updated Emscripten to 3.1.20, although this might not solve the issue.

@vladmandic
Contributor Author

@mattsoulanille @gyagp i just created an emscripten issue for this, as this is really something that should be patched there - imo anything you do here is just a workaround for bad emscripten behavior

@gyagp

gyagp commented Sep 14, 2022

@mattsoulanille Thanks and let's wait for Emscripten's feedback at emscripten-core/emscripten#17844.

@hujiajie
Contributor

Will MessageChannel fit your needs here?

@mattsoulanille
Member

MessageChannel looks like it will work. Thanks @hujiajie for pointing it out to us! @gyagp, we should try it as a drop-in replacement for window.postMessage.

A quick test:

let lastTime = new Date().getTime();
let i = 0;
function time() {
  i++;
  if (i % 1000 === 0) {
    const currentTime = new Date().getTime();
    const delta = currentTime - lastTime;
    console.log(`1000 iterations in ${delta} ms. ${delta / 1000} ms per iteration`);
    lastTime = currentTime;
  }
}

const channel = new MessageChannel();
channel.port1.onmessage = () => {
  channel.port2.postMessage('hello');
  time();
};
channel.port2.postMessage('hello');

This looks like it has no 4ms delay (while the same code under a setTimeout 'loop' does):

1000 iterations in 9 ms. 0.009 ms per iteration
1000 iterations in 11 ms. 0.011 ms per iteration
1000 iterations in 9 ms. 0.009 ms per iteration
1000 iterations in 10 ms. 0.01 ms per iteration
1000 iterations in 11 ms. 0.011 ms per iteration
1000 iterations in 9 ms. 0.009 ms per iteration
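Building on that, a minimal sketch (names are ours, not the TFJS implementation) of how a MessageChannel could back a clamp-free zero-delay scheduler:

const timerChannel = new MessageChannel();
const pendingCallbacks: Array<() => void> = [];
timerChannel.port1.onmessage = () => {
  const callback = pendingCallbacks.shift();
  if (callback) callback();
};

// Zero-delay hops go through the channel and avoid the 4ms nesting clamp;
// non-zero delays can keep using the regular timer.
function setTimeoutNoClamp(callback: () => void, delayMs = 0) {
  if (delayMs > 0) {
    setTimeout(callback, delayMs);
    return;
  }
  pendingCallbacks.push(callback);
  timerChannel.port2.postMessage(null);
}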

@vladmandic
Contributor Author

imo, MessageChannel looks nice, but unlike window.postMessage, it's not in-process - it actually uses sockets on both sides.
perhaps not an issue (although it does seem more heavy-weight), but do make sure there is a single reused MessageChannel; you don't want to exhaust ports stuck in a half-closed state.

@vladmandic
Contributor Author

@mattsoulanille @gyagp the emscripten pr has been merged to main, so now it should just be a question of upgrading to the new version once it's out.

@mattsoulanille
Member

Thanks for the update @vladmandic, and for the extra info on MessageChannel. I agree that it would be better to stay in-process. Once emsdk 3.1.22 is published (and it looks like that will be soon, since there's a PR open for it), I'll update it in TFJS.

@gyagp

gyagp commented Sep 23, 2022

Sorry for the late response. MessageChannel also works, and I updated my test (https://github.com/webatintel/webatintel.github.io/blob/main/timer.html#L44) to double-confirm this. Given @vladmandic already fixed the issue in Emscripten upstream (thank you so much for the effort!), let's keep the current solution.

@vladmandic
Contributor Author

@mattsoulanille fyi, emscripten 3.1.22 (and 3.1.23) have been published

@mattsoulanille
Member

Thanks for the ping @vladmandic. I checked emsdk's github a few days ago, so I'm not sure how I missed the 3.1.22 update. I've created #6856.

@mattsoulanille
Member

@vladmandic, emsdk has been upgraded to 3.1.23 in tfjs's master branch. Could you please check this against your esbuild build infrastructure when you get a chance? Thanks!

@vladmandic
Contributor Author

@mattsoulanille @gyagp sure!
i just did a fresh checkout & build from https://github.com/gyagp/tfjs/tree/settimeout
and have no issues using USE_SETTIMEOUTCUSTOM

@gyagp

gyagp commented Sep 27, 2022

@vladmandic Thanks for the double check!

mattsoulanille pushed a commit that referenced this issue Sep 27, 2022
If the setTimeout nesting level is greater than 5 and timeout is less
than 4ms, timeout will be clamped to 4ms, which hurts the perf. A custom
setTimeout is provided to mitigate the perf impact.

BUG: #6687
Co-authored-by: Na Li <[email protected]>
@vladmandic
Contributor Author

closing as resolved - thanks!

