Add cpu forward for Cast #6760
Conversation
It's required by the cityscapes architecture for DeepLabV3. Otherwise, there are two issues: 1. it will call dataSync, which is not supported for webgpu. 2. Even if it were supported, it's bad to trigger a cpu/gpu data sync during model execution.
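As a rough sketch of what the cpu forward looks like (illustrative only, not the actual diff: shouldExecuteOnCPU, readSync and makeTensorInfo follow the conventions of the tfjs GPU backends, and castImplCPU is assumed to be a value-only shared impl exported for reuse):

    // Illustrative sketch of the cpu-forward pattern; names and signatures
    // may differ from the real change.
    import {CastAttrs, CastInputs, DataType, TensorInfo} from '@tensorflow/tfjs-core';
    import {castImplCPU} from '../kernel_utils/shared';  // assumed shared-impl export

    // Hypothetical stand-in for the regular GPU implementation.
    declare function castOnGPU(
        x: TensorInfo, dtype: DataType, backend: any): TensorInfo;

    export function cast(
        args: {inputs: CastInputs, backend: any, attrs: CastAttrs}): TensorInfo {
      const {inputs, backend, attrs} = args;
      const {x} = inputs;
      const {dtype} = attrs;

      // If the input already lives on the CPU (typical for the small
      // shape/size tensors this PR targets), cast it there and never upload
      // it: no dataSync later, and no cpu/gpu sync mid-execution.
      if (backend.shouldExecuteOnCPU([x])) {
        const values = backend.readSync(x.dataId);
        const resultValues = castImplCPU(values, x.shape, x.dtype, dtype);
        return backend.makeTensorInfo(x.shape, dtype, resultValues);
      }

      // Otherwise fall through to the regular GPU implementation.
      return castOnGPU(x, dtype, backend);
    }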
@Linchenn Will this be needed for WebGL?
Reviewed 1 of 1 files at r1, 5 of 5 files at r2.
Hi @qjia7, have you seen any performance gain from this change?
I am not sure whether we could get a performance gain for WebGL. I just checked the graph of this model and found that all three Cast ops appear at the beginning of the graph and are followed by Pack, ExpandDims and ResizeBilinear ops, which could also be executed on WebGL without forwarding to CPU. So I am thinking that if the Cast op happens on WebGL, the following ops will still work on WebGL without forwarding to CPU.
tfjs-backend-cpu/src/kernels/Cast.ts
line 82 at r3 (raw file):
return result; }
Do we need to move the casting logic related to complex64 into castImpl?
Code quote:

    // Casting to complex64.
    if (dtype === 'complex64') {
      if (x.dtype === 'complex64') {
        return identity({inputs: {x}, backend});
      }
      const zerosTensorInfo = zeros(backend, x.shape, x.dtype);
      const floatX = cast({inputs: {x}, backend, attrs: {dtype: 'float32'}});
      const result =
          complex({inputs: {real: floatX, imag: zerosTensorInfo}, backend});
      backend.disposeIntermediateTensorInfo(zerosTensorInfo);
      backend.disposeIntermediateTensorInfo(floatX);
      return result;
    }

    // Casting from complex64
    if (x.dtype === 'complex64') {
      const realPart = real({inputs: {input: x}, backend});
      const result = cast({inputs: {x: realPart}, backend, attrs: {dtype}});
      backend.disposeIntermediateTensorInfo(realPart);
      return result;
    }
Yes, it's also needed by webgl. For debugging, you can set a breakpoint in https://github.com/tensorflow/tfjs/blob/master/tfjs-backend-webgl/src/kernels/Cast.ts#L71. The Cast will be caught there and go to int, which is a pure gpu implementation. However, the result of this Cast is expected to be the input of the size parameter of the resizeBilinear op, and size's type is [number, number], which requires calling dataSync to get the parameter value. That's where a gpu-cpu copy happens in webgl.
ExpandDims op and ResizeBilinear op, which could also be executed on WebGL without forwarding to CPU.

The ExpandDims result is the first parameter of resizeBilinear; there is no problem with that. The issue is the second parameter of resizeBilinear, size, which is expected on cpu. However, the Cast puts it on gpu. That's where the cpu -> gpu -> cpu round trip happens.
I tested the performance. It has roughly a 3~5 ms impact using an NV3080 (160ms -> 155ms).
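To make the round trip concrete, here is a minimal hypothetical repro at the public API level (not code from this PR); resizeBilinear's size is a plain [number, number], so a tensor-valued size computed on the gpu has to be read back first:

    // Hypothetical repro of the gpu -> cpu readback described above.
    import * as tf from '@tensorflow/tfjs';

    const image = tf.zeros([1, 64, 64, 3]) as tf.Tensor4D;
    // Suppose the model computes the target size on the GPU, e.g. via a Cast.
    const sizeTensor = tf.tensor1d([128.4, 128.6]).cast('int32');
    // This synchronous read is the gpu -> cpu copy; on webgpu a synchronous
    // read like this is not supported at all.
    const [h, w] = sizeTensor.dataSync();
    const resized = tf.image.resizeBilinear(image, [h, w]);
    console.log(resized.shape);  // [1, 128, 128, 3]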
tfjs-backend-cpu/src/kernels/Cast.ts
line 82 at r3 (raw file):
Previously, Linchenn wrote…
Do we need to move the casting logic related to complex64 into castImpl?
I roughly looked at all the xxxImpls. It seems that they don't take a backend parameter and only process the values, so I am not sure how to process complex64 there.
For simplicity, I did the current change.
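For context, a value-only impl of the kind described above might look roughly like this (an illustrative sketch, not the actual tfjs-backend-cpu signature):

    // Illustrative shape of a value-only shared impl: it receives raw values
    // plus metadata, never a backend, so it cannot create the intermediate
    // tensors (zeros, complex) that the complex64 branches above need.
    import {DataType, TypedArray} from '@tensorflow/tfjs-core';

    export function castImpl(
        values: TypedArray, shape: number[],  // shape kept for parity with other impls
        inputType: DataType, dtype: DataType): TypedArray {
      if (inputType === dtype) {
        return values.slice();  // same dtype: just copy
      }
      if (dtype === 'int32') {
        return Int32Array.from(values);
      }
      if (dtype === 'bool') {
        // Hypothetical simplification: nonzero -> 1, zero -> 0.
        return Uint8Array.from(values, v => (v !== 0 ? 1 : 0));
      }
      return Float32Array.from(values);
    }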
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply. Thank you for the detailed investigation! LGTM
tfjs-backend-cpu/src/kernels/Cast.ts
line 82 at r3 (raw file):
Previously, qjia7 (Jiajia Qin) wrote…
I roughly looked at all the xxxImpls. It seems that they don't take a backend parameter and only process the values, so I am not sure how to process complex64 there. For simplicity, I did the current change.
LGTM. Let me check this: #6777. If it is needed, we could do it in a separate PR. (I am just considering whether we should put all the casting logic into one function, instead of scattering it.)
tfjs-backend-cpu/src/kernels/Cast.ts
line 82 at r3 (raw file):
Previously, Linchenn wrote…
LGTM. Let me check this: #6777. If it is needed, we could do it in a separate PR. (I am just considering whether we should put all the casting logic into one function, instead of scattering it.)
Sounds good. Thank you!