[webgpu] Create tensor from GPUBuffer #7034
Conversation
Force-pushed from 454261d to 921f57f.
Force-pushed from 4d34bd1 to d39a410.
Force-pushed from 718938a to 7d560d7.
@qjia7 @xhcao @haoyunfeix @gyagp PTAL
tensorData.resourceInfo = {size, usage: this.defaultGpuBufferUsage(), buffer};
Should we use `tensorData.resourceInfo = {size: buffer.size, usage: buffer.usage, buffer}`?
@@ -92,6 +92,67 @@ import {makeTensor} from './tensor_ops_util';
 *
 * const tex = a.dataToGPU();
 * ```
 *
 * ```js
 * // Pass a `WebGPUData` object and specify a shape yourself.
Please also emphasize that we will directly bind this buffer to the newly created tensor to support zero copy. So please DO NOT destroy this buffer until all calculations in TFJS are finished.
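For illustration, the constraint being requested might read like this in user code. This is a rough sketch only: the `tf.tensor({buffer}, shape, dtype)` call shape is the one discussed later in this thread, and `device` is assumed to be the same GPUDevice the backend uses.

```ts
// Sketch only: the GPUBuffer is bound directly to the new tensor (zero copy),
// so it must stay alive until every TFJS computation that reads it has finished.
const data = new Float32Array([1, 2, 3, 4]);
const buffer = device.createBuffer({
  size: data.byteLength,
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
  mappedAtCreation: true,
});
new Float32Array(buffer.getMappedRange()).set(data);
buffer.unmap();

const a = tf.tensor({buffer}, [2, 2], 'float32');  // assumed call shape
const b = tf.square(a);
b.data().then(() => {
  // Only after all reads/calculations complete is it safe to destroy the buffer.
  buffer.destroy();
});
```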
expect(endNumBytes - startNumBytes).toEqual(0);
expect(endNumTensors - startNumTensors).toEqual(0);
aBuffer.destroy();
});
Please also add tests for buffer.size > shape and buffer.size < shape.
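Such tests might look roughly like the following. This is a sketch in the same Jasmine style as the quoted spec; it assumes `device` is the backend's GPUDevice and that, per the size check quoted later in this thread, smaller buffers throw while larger buffers are accepted.

```ts
it('throws when the GPUBuffer is smaller than the tensor shape', () => {
  const small = device.createBuffer({
    size: 4,  // room for one float32, but the shape below needs four
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
  });
  expect(() => tf.tensor({buffer: small}, [2, 2], 'float32')).toThrow();
  small.destroy();
});

it('accepts a GPUBuffer larger than the tensor shape', async () => {
  const large = device.createBuffer({
    size: 32,  // room for eight float32 values; the shape only needs four
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
  });
  const a = tf.tensor({buffer: large}, [2, 2], 'float32');
  expect(a.shape).toEqual([2, 2]);
  await a.data();
  large.destroy();
});
```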
Force-pushed from 7d2469c to 6944a59.
Great work and thank you Xu! Overall LGTM except a few nits and a question:
If I understand correctly, this implementation will create a new tensor using the GPUBuffer passed from users. However, WebGL's implementation always creates a new texture and copies values from the texture passed from users. I am curious: is this intended? Should we keep the same approach for the two backends, even though this implementation is good for performance (and also directly resolves #6232)? @qjia7 @axinging
Reviewed 4 of 10 files at r1, 6 of 7 files at r4.
Reviewable status: 0 of 1 approvals obtained (waiting on @axinging and @xhcao)
tfjs-core/src/tensor_util_env.ts
line 33 at r4 (raw file):
const isObject = typeof val === 'object';
if (isObject) {
  if ('texture' in val && val.texture instanceof WebGLTexture) {
Great catch! Thanks!
Code quote:
val.texture instanceof WebGLTexture
tfjs-core/src/ops/tensor.ts
line 105 at r1 (raw file):
 *
 * // Example for WebGPU:
 * async function createReadonlyGPUBufferFromData(device, data, dtype) {
Could this be a non-async function?
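One way to make it synchronous is `mappedAtCreation`. A sketch, not necessarily the exact helper the PR ends up with; the usage flags follow the minimum requirement discussed later in this thread.

```ts
// Synchronous alternative: map the buffer at creation, copy the data, unmap.
function createReadonlyGPUBufferFromData(
    device: GPUDevice, data: number[], dtype: 'float32'|'int32'): GPUBuffer {
  const bytesPerElement = 4;  // both supported dtypes are 4 bytes wide
  const buffer = device.createBuffer({
    size: data.length * bytesPerElement,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
    mappedAtCreation: true,
  });
  const mapped = buffer.getMappedRange();
  if (dtype === 'float32') {
    new Float32Array(mapped).set(data);
  } else {
    new Int32Array(mapped).set(data);
  }
  buffer.unmap();
  return buffer;
}
```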
tfjs-core/src/ops/tensor.ts
line 157 at r1 (raw file):
 * ```
 * @param values The values of the tensor. Can be nested array of numbers,
 * or a flat array, or a `TypedArray`, or a `WebGLData` object. If the
Could you also add some description about `WebGPUData` in the `values` parameter's description?
Thanks @Linchenn
Yes, it's intended for performance. But like you said, it will result in different behavior between webgl and webgpu. For webgl, once the user creates the tensor from the external texture, the user can directly destroy that texture. However, for webgpu, the user can't destroy the external buffer until tfjs has finished the calculation. To keep a consistent usage experience, can we tell users that once they use this API to create a tensor from GPU data, no matter whether it's webgl or webgpu, to support zero copy for performance we require that they not destroy the external gpu resource until tfjs has finished all the calculations? For webgl this is not strictly necessary today, but it leaves room for future webgl optimization. What do you think? @Linchenn @pyu10055 @lina128
tfjs-core/src/ops/tensor.ts
Outdated
 * tensor values. ). (If the values passed from texture is less than the tensor
 * size, zeros will be padded at the rear.). If the values is a `WebGPUData`
 * object, the dtype could only be 'float32' or 'int32 and the object has to
 * have: buffer, a `GPUBuffer`, the buffer must share the same `GPUDevice` with
Offline synced with Xing: we need to tell users what kinds of buffer usage are needed. Check it in the code. Also add corresponding tests for it.
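A corresponding test could be shaped like this. It is only a sketch: it assumes `device` is the backend's GPUDevice and the minimum usage requirement settled on later in this thread.

```ts
it('throws when the GPUBuffer lacks the required usage flags', () => {
  // MAP_READ | COPY_DST is a valid usage combination in WebGPU, but it is
  // missing STORAGE (and COPY_SRC, needed so tensor.data() can read back).
  const badUsage = device.createBuffer(
      {size: 16, usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST});
  expect(() => tf.tensor({buffer: badUsage}, [2, 2], 'float32')).toThrow();
  badUsage.destroy();
});
```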
Thanks @qjia7 for the clarification. Looking at the API itself, the parameter name `values` typically implies a pass-by-value mechanism, which means the values are copied during the call, as opposed to passing by reference.
I think we might want to have two APIs to differentiate the mechanisms (one for values and one for references).
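For example, the split could look something like this. These are purely hypothetical signatures to illustrate the suggestion, not existing TFJS APIs; the types are the ones from tfjs-core, with imports omitted.

```ts
// Pass by value: the GPU data is copied into a backend-owned resource.
declare function tensor<R extends Rank>(
    values: TensorLike|WebGLData|WebGPUData, shape?: ShapeMap[R],
    dtype?: DataType): Tensor<R>;

// Pass by reference: the caller's GPUBuffer is bound directly and the caller
// keeps ownership (and must not destroy it until TFJS is done with it).
declare function tensorFromGPUData<R extends Rank>(
    values: WebGPUData, shape?: ShapeMap[R], dtype?: DataType): Tensor<R>;
```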
Reviewable status: 0 of 1 approvals obtained (waiting on @axinging, @qjia7, and @xhcao)
Thanks @pyu10055. I am trying to understand whether JavaScript is "passing by value" or "passing by reference" here.
According to the above definition, number|boolean looks like "passing by value". But how about TypedArray?
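For what it's worth, at the language level a TypedArray argument behaves like any other object: the reference is passed, so no copy happens at the call boundary and the callee can see the caller's memory. A quick illustration:

```ts
function scaleInPlace(values: Float32Array) {
  // `values` aliases the caller's underlying buffer; no copy was made.
  for (let i = 0; i < values.length; ++i) {
    values[i] *= 2;
  }
}

const a = new Float32Array([1, 2, 3]);
scaleInPlace(a);
console.log(a);  // Float32Array [2, 4, 6] -- the caller's data was mutated

// Primitives such as number|boolean behave like pass by value instead:
function bump(x: number) { x += 1; }
let n = 1;
bump(n);
console.log(n);  // 1 -- unchanged
```

So any copy of a TypedArray into a tensor happens explicitly inside the library during conversion, not at the function call itself.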
Force-pushed from ff5d58f to 196cf98.
Hi @pyu10055 @qjia7, I think @huningxin and @BruceDai are the users of this API; maybe they have some input about this? The WebGL version, which requires an extra copy: #6853
You are right, but I think Ping's point is that the current
Oh, thanks @Linchenn @pyu10055, sorry, I misunderstood Ping's idea. I think there are four situations here based on your and Ping's comments:
So do you mean we should support something like the below?
I think for the first two cases, we are copying values here: tfjs/tfjs-core/src/ops/tensor_ops_util.ts, lines 83 to 85 in 48b2009.
Could we implement the 'passing by value' one for WebGPU first? Like:
export function tensor<R extends Rank>(
    values: TensorLike|WebGLData|WebGPUData, shape?: ShapeMap[R],
    dtype?: DataType): Tensor<R>
From my perspective, this has some overhead (I am not sure whether the overhead is significant), but it complies with our current design and avoids using data managed by users.
For TypedArray, there is a "noConversionNeeded" case; I think this requires no copy (please correct me if anything is wrong).
We can look at this in terms of perf and data read/write.
Perf: from fast to slow: TypedArray with no conversion > TypedArray with conversion > Array. Regarding "TypedArray no conversion" vs. "TypedArray with conversion": when "TypedArray with conversion" runs first, "TypedArray no conversion" is a little faster than "TypedArray with conversion".
Below is the test case:
Data read/write: based on the data read/write test, TypedArray with conversion looks like a deep copy, and TypedArray no conversion looks like a shallow copy.
Test result:
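The shallow-vs-deep distinction described above can be reproduced in isolation. An illustrative sketch, not the exact test case referred to here:

```ts
const source = new Float32Array([1, 2, 3, 4]);

// "No conversion" path: the same underlying memory keeps being referenced.
const shallow = source;

// "With conversion" path: changing the element type forces a real copy.
const deep = new Int32Array(source);

source[0] = 99;
console.log(shallow[0]);  // 99 -- shared memory, shallow-copy behavior
console.log(deep[0]);     // 1  -- independent storage, deep-copy behavior
```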
Force-pushed from c839ead to 6e0937c.
// resource buffer. When zeroCopy is true, tensor will use this GPUBUffer as
// tensor's resource buffer, user should not destroy this GPUBuffer until all
// access are done.
zeroCopy?: boolean,
I still like `external` as the name, to indicate that the resource comes from outside and can't be destroyed in tfjs.
And change the WEBGPU_TENSOR_FROM_BUFFER_WITH_ZERO_COPY flag to WEBGPU_TENSOR_USE_EXTERNAL_BUFFER to keep consistency?
throw new Error(`GPUBuffer size(${
    values.buffer.size}) is smaller than tensor size(${size})!`);
} else if (
    (values.buffer.usage & GPUBufferUsage.STORAGE) !==
Let's unify to use GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC as the minimum requirement. I don't want users to hit errors where they can't call tensor.data().
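In code, the unified check could look roughly like this. It is a sketch of the suggestion that extends the quoted validation (so `values.buffer` comes from that surrounding code); the merged version may differ in detail.

```ts
// Require at least STORAGE (so the buffer can be bound in compute passes) and
// COPY_SRC (so tensor.data() can copy the values back out).
const minimumUsage = GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC;
if ((values.buffer.usage & minimumUsage) !== minimumUsage) {
  throw new Error(
      `GPUBuffer usage(${values.buffer.usage}) does not include the minimum ` +
      `required usage (GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC)!`);
}
```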
@pyu10055 @Linchenn Thanks for your comments and suggestions. We internally synced on this issue and reached an agreement to support creating a tensor from a buffer with one copy by default, to keep consistent with other backends. Meanwhile, we add a webgpu flag to enable the zero-copy path so that we can conveniently collect some perf data in the future. Are you OK with this solution?
Thanks @qjia7 @axinging, the approach looks good to me. One suggestion is to add the zeroCopy flag (or call it something else) to the WebGPUData interface instead of it being a global flag on the backend.
I think this way it is easier to document the meaning of the flag and the related API, and in the future, if WebGL supports the same functionality, it can add this flag as well.
Reviewable status: 0 of 1 approvals obtained (waiting on @axinging, @qjia7, and @xhcao)
tfjs-backend-webgpu/src/backend_webgpu.ts
line 59 at r10 (raw file):
Previously, qjia7 (Jiajia Qin) wrote…
And change the WEBGPU_TENSOR_FROM_BUFFER_WITH_ZERO_COPY flag to WEBGPU_TENSOR_USE_EXTERNAL_BUFFER to keep consistency?
+1
tfjs-core/src/types.ts
line 191 at r10 (raw file):
 * creating a tensor, tensor type is float32. */
export interface WebGPUData {
can we have the zeroCopy flag here, instead of a global flag?
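Concretely, the suggestion amounts to something like this (a sketch; the comments paraphrase the semantics agreed on later in the thread):

```ts
export interface WebGPUData {
  buffer: GPUBuffer;
  // When false or undefined (the default), the values are copied into a
  // backend-owned buffer, so the caller may destroy `buffer` right after the
  // tensor is created. When true, `buffer` is bound directly to the tensor
  // (zero copy) and must not be destroyed until all TFJS access has finished.
  zeroCopy?: boolean;
}
```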
Force-pushed from ab5807a to c3cf9d0.
@pyu10055 @Linchenn @qjia7, zeroCopy has been added into WebGPUData, PTAL. The current status of copying is summarized below:
Force-pushed from c3cf9d0 to 81b08d3.
LGTM, thanks!
Thank you so much, Xu, for the amazing work! LGTM except for some small nits.
Reviewed all commit messages.
Reviewable status: complete! 2 of 1 approvals obtained (waiting on @axinging, @qjia7, and @xhcao)
tfjs-core/src/ops/tensor.ts
line 105 at r13 (raw file):
 * // by WebGPUData.zeroCopy. When zeroCopy is false or undefined(default), this
 * // passing GPUBuffer can be destroyed after tensor is created. When zeroCopy
 * // is true, this GPUBuffer is bound directly by the tensor, so donot destroy
do not
tfjs-core/src/ops/tensor.ts
line 184 at r13 (raw file):
 * zero copy by flag zeroCopy. When zeroCopy is false or undefined(default),
 * this passing GPUBuffer can be destroyed after tensor is created. When
 * zeroCopy is true, this GPUBuffer is bound directly by the tensor, so donot
do not
Force-pushed from 81b08d3 to fd85685.
BUG: #6232
To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.