
Add cpu forward for Cast #6760

Merged
merged 5 commits into from
Aug 22, 2022

Conversation

qjia7
Contributor

@qjia7 qjia7 commented Aug 17, 2022

It's required by the cityscapes architecture for DeepLabV3. Otherwise, there are
two issues: 1. It will call dataSync, which is not supported for webgpu.
2. Even if it were supported, it's bad to have a cpu/gpu data sync during
model execution.
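To illustrate the idea behind the change, here is a minimal, self-contained sketch of "CPU forwarding": when a kernel's input already lives on the CPU (and its output is likely to be consumed on the CPU, e.g. as a shape parameter), compute it directly on the CPU values instead of dispatching a GPU program and later reading the result back. The names `TensorData`, `castOnCPU`, and `cast` below are illustrative, not the actual tfjs API.

```typescript
type DType = 'float32' | 'int32' | 'bool';

// Hypothetical stand-in for a tensor whose values are CPU-resident.
interface TensorData {
  values: number[];
  dtype: DType;
}

// A pure value-level cast, analogous to a shared castImpl: no backend
// involved, it only transforms the values.
function castOnCPU(x: TensorData, dtype: DType): TensorData {
  const values = x.values.map(v => {
    if (dtype === 'int32') return Math.trunc(v);
    if (dtype === 'bool') return v !== 0 ? 1 : 0;
    return v;
  });
  return {values, dtype};
}

// Kernel entry point: forward to the CPU implementation when the data is
// already on the CPU, avoiding a GPU dispatch followed by a blocking
// readback (dataSync).
function cast(x: TensorData, dtype: DType, onCPU: boolean): TensorData {
  if (onCPU) {
    return castOnCPU(x, dtype);
  }
  // A real backend would run a GPU program here instead.
  throw new Error('GPU path not modeled in this sketch');
}

const size = cast({values: [513.0, 513.0], dtype: 'float32'}, 'int32', true);
console.log(size.values, size.dtype);
```

In the real PR, the CPU forwarding is registered for the Cast kernel in the webgpu backend; the sketch above only models the dispatch decision.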




@qjia7 qjia7 changed the title webgpu: Add cpu forward for ToInt32 webgpu: Add cpu forward for Cast Aug 17, 2022
@qjia7 qjia7 changed the title webgpu: Add cpu forward for Cast Add cpu forward for Cast Aug 17, 2022
Collaborator

@pyu10055 pyu10055 left a comment


@Linchenn Will this be needed for WebGL?

Reviewed 1 of 1 files at r1, 5 of 5 files at r2.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @gyagp and @Linchenn)

Collaborator

@Linchenn Linchenn left a comment


Hi @qjia7 , have you seen the performance gain from this change?

I am not sure whether we could get a performance gain for WebGL. I just checked the graph of this model and found that all three Cast ops appear at the beginning of the graph and are followed by Pack, ExpandDims, and ResizeBilinear ops, which could also be executed on WebGL without forwarding to CPU. So I am wondering: if the Cast op happens on WebGL, will the following ops still work on WebGL without forwarding to CPU?

[image: graph of the model showing the Cast ops at the beginning]

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @gyagp and @qjia7)


tfjs-backend-cpu/src/kernels/Cast.ts line 82 at r3 (raw file):

    return result;
  }

Do we need to move the casting logic related to complex64 into castImpl?

Code quote:

  // Casting to complex64.
  if (dtype === 'complex64') {
    if (x.dtype === 'complex64') {
      return identity({inputs: {x}, backend});
    }

    const zerosTensorInfo = zeros(backend, x.shape, x.dtype);
    const floatX = cast({inputs: {x}, backend, attrs: {dtype: 'float32'}});

    const result =
        complex({inputs: {real: floatX, imag: zerosTensorInfo}, backend});

    backend.disposeIntermediateTensorInfo(zerosTensorInfo);
    backend.disposeIntermediateTensorInfo(floatX);

    return result;
  }

  // Casting from complex64
  if (x.dtype === 'complex64') {
    const realPart = real({inputs: {input: x}, backend});
    const result = cast({inputs: {x: realPart}, backend, attrs: {dtype}});

    backend.disposeIntermediateTensorInfo(realPart);

    return result;
  }

Contributor Author

@qjia7 qjia7 left a comment


Yes, it's also needed by webgl. For debugging, you can set a breakpoint at https://github.com/tensorflow/tfjs/blob/master/tfjs-backend-webgl/src/kernels/Cast.ts#L71. The Cast will be caught there and dispatched to int, which is a pure GPU implementation. However, the result of this Cast is expected to feed the size parameter of the resizeBilinear op, and since size has type [number, number], dataSync must be called to read its value. That causes a gpu-cpu copy in webgl.

ExpandDims op and ResizeBilinear op, which could be also executed on WebGL without forwarding to CPU.

ExpandDims feeds the first parameter of resizeBilinear; there is no problem there. The issue is the second parameter of resizeBilinear, size, which is expected on the CPU. However, the Cast leaves it on the GPU. That's where the cpu -> gpu -> cpu round trip happens.

I tested the performance. The change saves about 3~5 ms on an NV3080 (160 ms -> 155 ms).
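The round trip described above can be sketched in a self-contained way: the size input of resizeBilinear must become a plain [number, number], so if Cast left it on the GPU, a blocking readback is required. The types and functions below (`TensorInfo`, `dataSync`, `resizeBilinear`) are simplified illustrations, not the actual tfjs implementations.

```typescript
// Illustrative model of a tensor: values === null means the data
// lives only on the GPU.
interface TensorInfo {
  values: number[] | null;
  shape: number[];
}

let gpuReadbacks = 0;

// Stand-in for a blocking GPU -> CPU copy (dataSync in tfjs).
function dataSync(t: TensorInfo): number[] {
  if (t.values !== null) return t.values;
  gpuReadbacks++;
  return [0, 0];  // pretend we copied the values back from the GPU
}

function resizeBilinear(image: TensorInfo, size: TensorInfo): void {
  // size must become a plain [number, number]; GPU-resident data
  // forces a synchronous readback here.
  const [h, w] = dataSync(size);
  void h; void w;  // a real kernel would use these to size its output
}

const image: TensorInfo = {values: null, shape: [1, 256, 256, 3]};

// Cast ran on the GPU: size has no CPU values, so a readback happens.
resizeBilinear(image, {values: null, shape: [2]});
// Cast was forwarded to the CPU: values are already there, no readback.
resizeBilinear(image, {values: [513, 513], shape: [2]});

console.log(gpuReadbacks);  // only the first call triggered a readback
```

This is why forwarding Cast to the CPU removes the cpu -> gpu -> cpu round trip even though the surrounding ops stay on the GPU.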

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @gyagp and @Linchenn)


tfjs-backend-cpu/src/kernels/Cast.ts line 82 at r3 (raw file):

Previously, Linchenn wrote…

Do we need to move the casting logic related to complex64 into castImpl?

I roughly looked at all the xxxImpl functions. It seems they don't take a backend parameter and only process values, so I am not sure how to handle complex64 there.
For simplicity, I made the current change.

Collaborator

@Linchenn Linchenn left a comment


Sorry for the late reply. Thank you for the detailed investigation! LGTM

Reviewable status: :shipit: complete! 2 of 1 approvals obtained (waiting on @gyagp)


tfjs-backend-cpu/src/kernels/Cast.ts line 82 at r3 (raw file):

Previously, qjia7 (Jiajia Qin) wrote…

I roughly looked at all the xxxImpl functions. It seems they don't take a backend parameter and only process values, so I am not sure how to handle complex64 there.
For simplicity, I made the current change.

LGTM. Let me check this in #6777. If it is needed, we could do it in a separate PR. (I am just considering whether we should put all the casting logic into one function, instead of scattering it.)

Contributor Author

@qjia7 qjia7 left a comment


Reviewable status: :shipit: complete! 2 of 1 approvals obtained (waiting on @gyagp)


tfjs-backend-cpu/src/kernels/Cast.ts line 82 at r3 (raw file):

Previously, Linchenn wrote…

LGTM. Let me check this, #6777. If it is needed, we could do it in a seperate PR. (I just consider if we should put all casting logics into one function, instead of scattering.)

Sounds good. Thank you!

@qjia7 qjia7 merged commit 0557e78 into tensorflow:master Aug 22, 2022
@qjia7 qjia7 deleted the int_cpu branch August 22, 2022 06:46