Breaking change: #7951 (tfjs-v4.11.0) #7974

AmitMY · 2023-09-24T12:42:52Z

TensorFlow.js version (use command below): tfjs-v4.11.0

Describe the current behavior
Old models fail now that #7951 is merged. Error:

ERROR Error: Uncaught (in promise): Error: {this.name} outputs 1 tensors but only one mask
Error: {this.name} outputs 1 tensors but only one mask

(This is not the case with v4.10.0)

Describe the expected behavior
Minor updates should not break functionality

Standalone code to reproduce the issue
Same as #6699

gaikwadrahul8 · 2023-09-25T13:30:53Z

Hi, @AmitMY

Thank you for bringing this issue to our attention and I attempted to replicate the same issue from my end and I'm getting below error message so to confirm, are you also getting the same error message or Am I doing something wrong here, if so please help me with steps to replicate the same issue from my end which will help us to investigate this issue further.

I'm following this official documentation to replicate the same issue from my end. Thank you!

I received the below error message :

AmitMY · 2023-09-26T08:22:17Z

I'm getting the same issue now.

Here, I wrote a minimal reproduction:
What I learned, is that the problem goes away if I don't resetDropout (I don't want dropout in inference)

(Here's a directory with a new model, and an HTML file. to reproduce, I ran http-server .)
https://drive.google.com/file/d/1Y8OIdTetbDn0mcppE_sRL8bRP-vh_1tP/view?usp=drive_link

<!DOCTYPE html>
<html lang="en-US">
<head>
    <meta charset="utf-8" />
    <title>TensorFlow.js browser example</title>

    <!-- Load TensorFlow.js from a script tag -->
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@latest/dist/tf.min.js"></script>
</head>
<body>
<h1>TensorFlow.js example</h1>
<h2>Open the console to see the results.</h2>
<script>
    function resetDropout(layers) {
        for (const layer of layers) {
            // For Sequential models, the layers are in the layers property
            if (layer.layers) {
                resetDropout(layer.layers);
            }

            // For TimeDistributed models, the layer is in the layer property
            if (layer.layer) {
                resetDropout([layer.layer]);
            }

            // If the layer is a dropout layer, reset the rate
            if (layer.rate) {
                layer.rate = 0;
            }
        }
    }

    async function main() {
        const model = await tf.loadLayersModel('model_new.json');
        resetDropout(model.layers);

        const tensor = tf.zeros([1, 1, 256, 256, 3]).toFloat();
        tensor.print();

        // Must apply model in training=True mode to avoid using aggregated norm statistics
        model.apply(tensor, {training: true}).print();
    }

    main();
</script>
</body>
</html>

gaikwadrahul8 · 2023-09-27T09:38:44Z

Hi, @AmitMY

Thank you for providing the minimal code example to replicate the same issue from my end and I'm getting below error message so to confirm, Are you also getting the same error message or different error message if possible could you please share your error log or screenshot which will help us to investigate this issue further. Thank you!

For reference I have added screenshot below :

When I tried without resetDropout option that error is not occurring and it's working as expected if I'm not wrong :

Tensor
    [[[[[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       ...
       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]]]]
print.js:34 Tensor
    [[[[[-0.5485492, 0.1209364 , -0.5575122],
        [-0.4198668, 0.3109047 , -0.3952827],
        [-0.4475051, 0.3676944 , -0.4088269],
        ...,
        [-0.3347363, -0.1958253, -0.5153912],
        [-0.3527599, -0.2083139, -0.5241337],
        [-0.4063931, -0.0599973, -0.5099336]],

       [[-0.4308985, 0.2829142 , -0.4243529],
        [-0.3667891, 0.3545043 , -0.3479016],
        [-0.4654999, 0.3381675 , -0.4241154],
        ...,
        [-0.1721301, -0.3778005, -0.4985313],
        [-0.1661717, -0.3855677, -0.4854655],
        [-0.2940664, -0.1737114, -0.4872297]],

       [[-0.485551 , 0.2725164 , -0.4622128],
        [-0.4397161, 0.3448741 , -0.4254563],
        [-0.5056862, 0.3295551 , -0.469498 ],
        ...,
        [-0.185382 , -0.3900234, -0.5167618],
        [-0.192379 , -0.3988956, -0.5107889],
        [-0.2973868, -0.1739733, -0.5081293]],

       ...
       [[-0.4989672, 0.3559542 , -0.4790567],
        [-0.4982536, 0.3546511 , -0.4794993],
        [-0.4979915, 0.3549074 , -0.478335 ],
        ...,
        [-0.5017183, 0.3484358 , -0.4759241],
        [-0.4635063, 0.3229676 , -0.4686852],
        [-0.4900923, 0.3222119 , -0.4737524]],

       [[-0.4948567, 0.3582715 , -0.4819818],
        [-0.4974912, 0.3562227 , -0.4809256],
        [-0.4989477, 0.3557112 , -0.4791932],
        ...,
        [-0.4984515, 0.3406397 , -0.4796974],
        [-0.4526571, 0.2930876 , -0.4691183],
        [-0.4757496, 0.2983395 , -0.4714819]],

       [[-0.494108 , 0.3538587 , -0.4799017],
        [-0.493968 , 0.3533177 , -0.4801645],
        [-0.4948561, 0.3538442 , -0.4791983],
        ...,
        [-0.499143 , 0.3447358 , -0.4813157],
        [-0.4782982, 0.3105071 , -0.4720287],
        [-0.4577081, 0.2640458 , -0.4753132]]]]]

AmitMY · 2023-09-27T12:42:35Z

Yes, the error I am getting is the same:

ERROR Error: Uncaught (in promise): Error: {this.name} outputs 1 tensors but only one mask
Error: {this.name} outputs 1 tensors but only one mask

This is happening with 4.11.0 but not with 4.10.0

Just so it is clear:
When running a model in predict mode there is no dropout, but I have to use training for this model because of gathered training statistics. This is another known issue.
But in any case, this is a breaking change.

gaikwadrahul8 · 2023-09-29T05:12:06Z

Hi, @AmitMY

Thank you for the confirmation, I really appreciate your findings and observations which will help us to investigate this issue further. We will have to dig more into this issue and will update you soon. Thank you!

Linchenn · 2023-09-29T17:48:51Z

Hi @haoyunfeix , I saw you fixed #6699 before, could you help check this again?

gyagp · 2023-09-30T01:53:26Z

Yunfei already left Intel, so he may not double check this. We will take a look once we get back to office after holidays (Oct 7).

Linchenn · 2023-09-30T05:49:38Z

Thank you Yang and take your time.

mattsoulanille · 2023-10-03T20:43:48Z

This is likely a bug in #7951. I'm trying to reproduce it, but I'm running into a different error than expected:

Uncaught (in promise) Error: Identity matrix initializer can only be used for 2D square matrices.
    at initializers.js:241:15
    at engine.js:469:20
    at e.value (engine.js:480:19)
    at e.value (engine.js:467:17)
    at e8 (globals.js:192:17)
    at n.value (initializers.js:239:12)
    at n.value (topology.js:1288:35)
    at n.value (recurrent.js:1703:24)
    at n.value (recurrent.js:516:17)
    at topology.js:981:14

Did the model file shared at that drive link change? The sha256sum I'm seeing for the zip file is ab24c0af3c44d6268b8914a346bd8917b17984245e9ffd6b0ee5d751cefd7167. Additionally, I'd be interested to know if this only fails with WebGPU, since if it's a bu in #7951, it's probably going to affect all backends.

AmitMY · 2023-10-04T12:30:48Z

I redownloaded the zip file and can confirm the same hash:

shasum -a 256 web_model.zip
ab24c0af3c44d6268b8914a346bd8917b17984245e9ffd6b0ee5d751cefd7167

However, the error I am seeing is still

Uncaught (in promise) Error: {this.name} outputs 1 tensors but only one mask

Furthermore, this repro is not using WebGPU, but WebGL:

axinging · 2023-10-07T05:51:26Z

I tried webgpu, webgl, cpu, all complain same errors:

topology.js:1409 Uncaught (in promise) Error: {this.name} outputs 1 tensors but only one mask
    at n.value (topology.js:1409:13)
    at topology.js:1013:16
    at nue (common.js:48:20)
    at n.value (topology.js:967:12)
    at xce (executor.js:341:25)
    at container.js:746:14
    at engine.js:469:20
    at e.value (engine.js:480:19)
    at e.value (engine.js:467:17)
    at e8 (globals.js:192:17)

Revert this change (#7951) works well on webgpu, webgl, cpu.

Below code is required to change backend before predict:

        await tf.setBackend('webgl'); //webgpu, cpu
        console.log(tf.getBackend());
        await tf.ready();

I drafted a fix here: #7993

Bug: tensorflow#7974

Bug: #7974

gaikwadrahul8 · 2023-10-18T12:44:38Z

Hi, @AmitMY

I apologize for the delayed response. I see that PR #7993 has been merged and I have tested the same code snippet, which is now working as expected. could you please confirm this from your end once ? if so, please feel free to close this issue ? Thank you!

AmitMY · 2023-10-18T14:01:27Z

i can confirm that i am using 4.12.0

but that the bug remains

My reproduction still reproduces the issue in 4.12.0
#7974 (comment)

axinging · 2023-10-19T01:54:58Z

@AmitMY, it seems 4.12.0 doesn't include change #7993.
You can see 4.12.0 commits history here:
https://github.com/tensorflow/tfjs/commits/cb5cd829909a07cdbd14975f2169a78e9213bf4a
Also, you can view it here: https://github.com/tensorflow/tfjs/releases/tag/tfjs-v4.12.0

So @gaikwadrahul8 @Linchenn, Could you help why #7993 is not included in v4.12.0?

BTW, @AmitMY, if you want to try this feature on the latest code, You can download TFJS code and build, then put below file under fold tfjs\e2e\benchmarks\local-benchmark:


<head>
  <title>TensorFlow.js Op demo</title>
</head>

<body>
  <h2>TensorflowJS Model demo</h2>
  <script src="loader.js"></script>
  <script>
    'use strict';

    function resetDropout(layers) {
      for (const layer of layers) {
        // For Sequential models, the layers are in the layers property
        if (layer.layers) {
          resetDropout(layer.layers);
        }

        // For TimeDistributed models, the layer is in the layer property
        if (layer.layer) {
          resetDropout([layer.layer]);
        }

        // If the layer is a dropout layer, reset the rate
        if (layer.rate) {
          layer.rate = 0;
        }
      }
    }

    async function main() {
      await tf.setBackend('webgpu');
      console.log(tf.getBackend());
      await tf.ready();
      tf.env().set('WEBGPU_CPU_FORWARD', false);
      const model = await tf.loadLayersModel('model.json');
      resetDropout(model.layers);

      const tensor = tf.zeros([1, 1, 256, 256, 3]);//.toFloat();
      tensor.print();

      // Must apply model in training=True mode to avoid using aggregated norm statistics
      model.apply(tensor, { training: true }).print();
      console.log("end");
    }

    (async function () {
      let localBuild = ['core', 'cpu', 'webgl', 'webgpu', 'tfjs-converter', 'layers'];
      await loadTFJS(localBuild);
      main();
    })();
  </script>
</body>

</html>

AmitMY · 2023-10-23T10:12:43Z

since @gaikwadrahul8 says it works, i'll just wait for it to be released. Do you know why it was not released, and if it would be in 4.13.0?

mattsoulanille · 2023-10-23T22:06:21Z

Sorry, we're working on a new RC based release structure, and our branch cut for 4.12.0 was before the fix was merged. This will be included in 4.13.0 next week.

peacefulotter · 2023-11-06T12:19:19Z

Any news on this? 4.13.0 seems to not be released yet and it's been more than a week. Thanks!

tylersalminen · 2023-11-07T15:52:27Z

Any news on this? 4.13.0 seems to not be released yet and it's been more than a week. Thanks!

ditto, waiting for new release with this fix so can enable dropout in our training models

AmitMY · 2023-11-08T14:25:14Z

No bug in 4.13.0 🥳

google-ml-butler · 2023-11-08T14:25:16Z

Are you satisfied with the resolution of your issue?
Yes
No

AmitMY added the type:bug Something isn't working label Sep 24, 2023

gaikwadrahul8 self-assigned this Sep 25, 2023

gaikwadrahul8 added the comp:webgpu label Sep 25, 2023

gaikwadrahul8 added the stat:awaiting response label Sep 25, 2023

google-ml-butler bot removed the stat:awaiting response label Sep 26, 2023

gaikwadrahul8 added the stat:awaiting response label Sep 27, 2023

google-ml-butler bot removed the stat:awaiting response label Sep 27, 2023

gaikwadrahul8 assigned Linchenn Sep 29, 2023

gaikwadrahul8 added the stat:awaiting tensorflower label Sep 29, 2023

mattsoulanille self-assigned this Oct 3, 2023

axinging added a commit to axinging/tfjs that referenced this issue Oct 7, 2023

[tfjs-layers] Fix outputs and mask mismatch

85aeabc

Bug: tensorflow#7974

axinging mentioned this issue Oct 7, 2023

[tfjs-layers] Fix outputs and mask mismatch #7993

Merged

Linchenn pushed a commit that referenced this issue Oct 9, 2023

[tfjs-layers] Fix outputs and mask mismatch (#7993)

4687567

Bug: #7974

gaikwadrahul8 added stat:awaiting response and removed stat:awaiting tensorflower labels Oct 18, 2023

google-ml-butler bot removed the stat:awaiting response label Oct 18, 2023

AmitMY closed this as completed Nov 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Breaking change: #7951 (tfjs-v4.11.0) #7974

Breaking change: #7951 (tfjs-v4.11.0) #7974

AmitMY commented Sep 24, 2023

gaikwadrahul8 commented Sep 25, 2023 •

edited

Loading

AmitMY commented Sep 26, 2023

gaikwadrahul8 commented Sep 27, 2023 •

edited

Loading

AmitMY commented Sep 27, 2023 •

edited

Loading

gaikwadrahul8 commented Sep 29, 2023

Linchenn commented Sep 29, 2023

gyagp commented Sep 30, 2023

Linchenn commented Sep 30, 2023

mattsoulanille commented Oct 3, 2023 •

edited

Loading

AmitMY commented Oct 4, 2023

axinging commented Oct 7, 2023 •

edited

Loading

gaikwadrahul8 commented Oct 18, 2023

AmitMY commented Oct 18, 2023

axinging commented Oct 19, 2023 •

edited

Loading

AmitMY commented Oct 23, 2023

mattsoulanille commented Oct 23, 2023

peacefulotter commented Nov 6, 2023

tylersalminen commented Nov 7, 2023

AmitMY commented Nov 8, 2023

google-ml-butler bot commented Nov 8, 2023

Breaking change: #7951 (tfjs-v4.11.0) #7974

Breaking change: #7951 (tfjs-v4.11.0) #7974

Comments

AmitMY commented Sep 24, 2023

gaikwadrahul8 commented Sep 25, 2023 • edited Loading

AmitMY commented Sep 26, 2023

gaikwadrahul8 commented Sep 27, 2023 • edited Loading

AmitMY commented Sep 27, 2023 • edited Loading

gaikwadrahul8 commented Sep 29, 2023

Linchenn commented Sep 29, 2023

gyagp commented Sep 30, 2023

Linchenn commented Sep 30, 2023

mattsoulanille commented Oct 3, 2023 • edited Loading

AmitMY commented Oct 4, 2023

axinging commented Oct 7, 2023 • edited Loading

gaikwadrahul8 commented Oct 18, 2023

AmitMY commented Oct 18, 2023

axinging commented Oct 19, 2023 • edited Loading

AmitMY commented Oct 23, 2023

mattsoulanille commented Oct 23, 2023

peacefulotter commented Nov 6, 2023

tylersalminen commented Nov 7, 2023

AmitMY commented Nov 8, 2023

google-ml-butler bot commented Nov 8, 2023

gaikwadrahul8 commented Sep 25, 2023 •

edited

Loading

gaikwadrahul8 commented Sep 27, 2023 •

edited

Loading

AmitMY commented Sep 27, 2023 •

edited

Loading

mattsoulanille commented Oct 3, 2023 •

edited

Loading

axinging commented Oct 7, 2023 •

edited

Loading

axinging commented Oct 19, 2023 •

edited

Loading