[webgpu] Create tensor from GPUBuffer #7034

Merged: 9 commits, Nov 23, 2022

@axinging axinging commented Nov 10, 2022

BUG: #6232

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.



@axinging axinging force-pushed the tensorFromGPUBuffer_v1 branch from 454261d to 921f57f Compare November 10, 2022 04:47
@axinging axinging changed the title [webgpu] create tensor from GPUBuffer [webgpu] Create tensor from GPUBuffer Nov 10, 2022
@axinging axinging force-pushed the tensorFromGPUBuffer_v1 branch 4 times, most recently from 4d34bd1 to d39a410 Compare November 10, 2022 05:57
@axinging axinging marked this pull request as ready for review November 10, 2022 06:35
@axinging axinging force-pushed the tensorFromGPUBuffer_v1 branch 2 times, most recently from 718938a to 7d560d7 Compare November 11, 2022 02:51
@axinging
Contributor Author

@qjia7 @xhcao @haoyunfeix @gyagp PTAL

}

tensorData.resourceInfo = {size, usage: this.defaultGpuBufferUsage(), buffer};

Should we use tensorData.resourceInfo = {size: buffer.size, usage: buffer.usage, buffer}?

@@ -92,6 +92,67 @@ import {makeTensor} from './tensor_ops_util';
*
* const tex = a.dataToGPU();
* ```
*
* ```js
* // Pass a `WebGPUData` object and specify a shape yourself.

Please also emphasize that we bind this buffer directly to the newly created tensor to support zero copy. So please DO NOT destroy this buffer until all calculations are finished in TFJS.

expect(endNumBytes - startNumBytes).toEqual(0);
expect(endNumTensors - startNumTensors).toEqual(0);
aBuffer.destroy();
});

Please also add tests for the cases where buffer.size is larger than and smaller than the size implied by the shape.
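
The two cases requested above can be sketched without a GPU. The helper names below are hypothetical and this is not the tfjs implementation; it assumes 4-byte elements (float32 or int32) and assumes that a buffer larger than the tensor is accepted while a smaller one is rejected.

```typescript
// Hypothetical helpers, not the tfjs implementation.
const BYTES_PER_ELEMENT = 4;  // float32 and int32 are both 4 bytes wide

function requiredByteSize(shape: number[]): number {
  // An empty shape (scalar) still holds one element.
  return shape.reduce((acc, dim) => acc * dim, 1) * BYTES_PER_ELEMENT;
}

function checkBufferSize(bufferByteSize: number, shape: number[]): void {
  const needed = requiredByteSize(shape);
  if (bufferByteSize < needed) {
    throw new Error(
        `GPUBuffer size(${bufferByteSize}) is smaller than ` +
        `tensor size(${needed})!`);
  }
  // Assumption: a larger buffer is fine; only the leading bytes are used.
}

checkBufferSize(64, [4, 4]);   // exact fit, passes
checkBufferSize(128, [4, 4]);  // buffer.size > shape, passes
try {
  checkBufferSize(32, [4, 4]);  // buffer.size < shape, throws
} catch (e) {
  console.log((e as Error).message);
}
```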

@qjia7 qjia7 requested review from Linchenn and xhcao November 14, 2022 06:27
@axinging axinging force-pushed the tensorFromGPUBuffer_v1 branch from 7d2469c to 6944a59 Compare November 15, 2022 07:01
Collaborator

@Linchenn Linchenn left a comment

Great work and thank you Xu! Overall LGTM except a few nits and a question:

If I understand correctly, this implementation creates a new tensor that uses the GPUBuffer passed in by the user. However, WebGL's implementation always creates a new texture and copies the values from the texture passed in by the user. Is this intended? Should we keep the same approach for the two backends, even though this implementation is good for performance (and also directly resolves #6232)? @qjia7 @axinging

Reviewed 4 of 10 files at r1, 6 of 7 files at r4.
Reviewable status: 0 of 1 approvals obtained (waiting on @axinging and @xhcao)


tfjs-core/src/tensor_util_env.ts line 33 at r4 (raw file):

  const isObject = typeof val === 'object';
  if (isObject) {
    if ('texture' in val && val.texture instanceof WebGLTexture) {

Great catch! Thanks!

Code quote:

val.texture instanceof WebGLTexture

tfjs-core/src/ops/tensor.ts line 105 at r1 (raw file):

 *
 * // Example for WebGPU:
 * async function createReadonlyGPUBufferFromData(device, data, dtype) {

Could this be a non-async function?


tfjs-core/src/ops/tensor.ts line 157 at r1 (raw file):

 * ```
 * @param values The values of the tensor. Can be nested array of numbers,
 *     or a flat array, or a `TypedArray`, or a `WebGLData` object. If the

Could you also add a description of WebGPUData to the values parameter's description?

@axinging
Contributor Author

axinging commented Nov 15, 2022

Thanks @Linchenn
All your comments are addressed, except for the difference that WebGL copies and WebGPU does not. BTW, I prefer the no-copy approach, and I think @qjia7 will comment on this.

@qjia7
Contributor

qjia7 commented Nov 15, 2022

If I understand correctly, this implementation will create a new tensor using the GPUBuffer passed from users. However, WebGL's implementation always creates a new texture and copies values from the texture passed from users. I am curious, is this intended?

Yes, it's intended for performance. But as you said, it results in different behavior between WebGL and WebGPU. For WebGL, once the user has created the tensor from the external texture, the user can destroy that texture right away. For WebGPU, however, the user can't destroy the external buffer until tfjs has finished the calculation. To give users a consistent experience, can we tell them that whenever they use this API to create a tensor from GPU data, whether WebGL or WebGPU, they must not destroy the external GPU resource until tfjs has finished all calculations, so that zero copy can be supported for performance? For WebGL this isn't actually required today, but it leaves room for a future WebGL optimization. What do you think? @Linchenn @pyu10055 @lina128
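
The lifetime difference described above can be illustrated with a small mock that runs without WebGL or WebGPU. Everything here is hypothetical: `FakeBuffer` stands in for an external GPU resource, and the two factory functions only model the copy and zero-copy behaviors being discussed, not the tfjs code.

```typescript
// Stand-in for an external GPU resource. All names are hypothetical.
class FakeBuffer {
  private destroyed = false;
  private readonly data: Float32Array;

  constructor(data: Float32Array) {
    this.data = data;
  }

  read(): Float32Array {
    if (this.destroyed) {
      throw new Error('buffer was destroyed');
    }
    return this.data;
  }

  destroy(): void {
    this.destroyed = true;
  }
}

// Copy path: the tensor owns a private snapshot of the values.
function tensorByCopy(external: FakeBuffer): FakeBuffer {
  return new FakeBuffer(new Float32Array(external.read()));
}

// Zero-copy path: the tensor is bound to the caller's buffer.
function tensorZeroCopy(external: FakeBuffer): FakeBuffer {
  return external;
}

const source = new FakeBuffer(new Float32Array([1, 2, 3]));
const copied = tensorByCopy(source);
const bound = tensorZeroCopy(source);
source.destroy();

console.log(Array.from(copied.read()));  // the copy is unaffected
try {
  bound.read();
} catch (e) {
  console.log('zero-copy tensor is unusable:', (e as Error).message);
}
```

In this model, destroying the source early is harmless on the copy path and fatal on the zero-copy path, which is the inconsistency the proposed documentation rule is meant to hide from users.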

* tensor values. ). (If the values passed from texture is less than the tensor
* size, zeros will be padded at the rear.). If the values is a `WebGPUData`
* object, the dtype could only be 'float32' or 'int32' and the object has to
* have: buffer, a `GPUBuffer`, the buffer must share the same `GPUDevice` with

Synced offline with Xing: we need to tell users what kind of buffer usage is needed. Check it in the code. Also add corresponding tests for it.

Collaborator

@pyu10055 pyu10055 left a comment

Thanks @qjia7 for the clarification. Looking at the API itself, the parameter name values typically implies a pass-by-value mechanism, which means the values are copied during the call, as opposed to being passed by reference.
I think we might want two APIs to differentiate the mechanisms (one for values and one for references).

Reviewable status: 0 of 1 approvals obtained (waiting on @axinging, @qjia7, and @xhcao)

@axinging
Contributor Author

Thanks @pyu10055. I am trying to understand whether JavaScript is "passing by value" or "passing by reference" here.
On the API side, tensor accepts TensorLike:

type TensorLike = string | number | boolean | TypedArray | RecursiveArray<number | TypedArray | number[]> | RecursiveArray<boolean> | RecursiveArray<...> | Uint8Array[]
export function tensor<R extends Rank>(
    values: TensorLike|WebGLData|WebGPUData, shape?: ShapeMap[R],
    dtype?: DataType): Tensor<R> 

According to the definition above, number|boolean should behave like "passing by value". But how about TypedArray?
Based on this thread https://stackoverflow.com/questions/518000/is-javascript-a-pass-by-reference-or-pass-by-value-language, I wrote a simple TypeScript case. From the case below, you can see that TypedArray behaves more like "pass by reference."

class Animal {
  name: string;
  constructor(theName: string) {
    this.name = theName;
    console.log("Constructor:" + theName);
  }
}

function passByValueOrReference_TypedArray(array: Float32Array) {
   array[0] = 100;
}

function passByValueOrReference_NumberArray(array: number[]) {
   array[0] = 200;
}

function passByValueOrReference_Class(animal: Animal) {
   console.log(animal);
}

{
    const array = new Float32Array(2);
    array[0] = 42;
    passByValueOrReference_TypedArray(array);
    console.log(array);
}
{
    const array: number[] = [1,2];
    array[0] = 42;
    passByValueOrReference_NumberArray(array);
    console.log(array);
}
{
    // I am not confident about this case.
    const animal = new Animal("Dog");
    passByValueOrReference_Class(animal);
    console.log(animal);
}

[LOG]: Float32Array: {
  "0": 100,
  "1": 0
} 
[LOG]: [200, 2] 
[LOG]: "Constructor:Dog" 
[LOG]: Animal: {
  "name": "Dog"
} 
[LOG]: Animal: {
  "name": "Dog"
} 

@axinging axinging force-pushed the tensorFromGPUBuffer_v1 branch from ff5d58f to 196cf98 Compare November 16, 2022 02:39
@axinging
Contributor Author

Hi @pyu10055 @qjia7, I think @huningxin and @BruceDai are the users of this API; maybe they have some input on this?

The WebGL version, which requires an extra copy: #6853
This WebGPU version is zero copy, but the user should not destroy the GPUBuffer until all access is done.

@Linchenn
Collaborator

From below case, you can see TypedArray behaves more like "pass by reference."

You are right, but I think Ping's point is that the current function tensor(values, shape, dtype) is designed to be 'passing by value'. Then, whatever the values' type is, we will make a copy from values (if values is TypedArray, we also make a copy, instead of using users' directly).

@axinging
Contributor Author

axinging commented Nov 17, 2022

From below case, you can see TypedArray behaves more like "pass by reference."

You are right, but I think Ping's point is that the current function tensor(values, shape, dtype) is designed to be 'passing by value'. Then, whatever the values' type is, we will make a copy from values (if values is TypedArray, we also make a copy, instead of using users' directly).

Oh, thanks @Linchenn @pyu10055, sorry, I misunderstood Ping's idea. I think there are four situations here, based on your and Ping's comments:

  1. primitive type, such as number[], string, boolean;
  2. TypedArray: copy or not, or both?
  3. WebGL: required copy.
  4. WebGPU: zero copy

So do you mean we should support something like below?

// Create tensor requires copy
export function tensor<R extends Rank>(
    values: TensorLike|WebGLData, shape?: ShapeMap[R],
    dtype?: DataType): Tensor<R> 

// Create tensor without copy
export function tensorWithoutCopy<R extends Rank>(
    values: TensorLike|WebGPUData, shape?: ShapeMap[R],
    dtype?: DataType): Tensor<R> 

@Linchenn
Collaborator

  • primitive type, such as number[], string, boolean;
  • TypedArray: copy or not, or both?

I think for the first two cases, we are copying values here:

values = dtype !== 'string' ?
    toTypedArray(values, dtype) :
    flatten(values as string[], [], true) as string[];

So do you mean we should support something like below?
// Create tensor requires copy
export function tensor<R extends Rank>(
    values: TensorLike|WebGLData, shape?: ShapeMap[R],
    dtype?: DataType): Tensor<R>

// Create tensor without copy
export function tensorWithoutCopy<R extends Rank>(
    values: TensorLike|WebGPUData, shape?: ShapeMap[R],
    dtype?: DataType): Tensor<R>

Could we implement the 'passing by value' one for WebGPU first? Like:

export function tensor<R extends Rank>(
    values: TensorLike|WebGLData|WebGPUData, shape?: ShapeMap[R],
    dtype?: DataType): Tensor<R> 

From my perspective, this has some overhead (I am not sure whether it is significant), but it complies with our current design and avoids using data managed by users.

@axinging
Contributor Author

axinging commented Nov 18, 2022

For TypedArray, there is a "noConversionNeeded" case, which I think requires no copy (please correct me if I am wrong).
I summarized whether each possible tensor input is deep-copied or shallow-copied:

| Tensor input type | Copy | Perf (cpu backend, ms) |
| --- | --- | --- |
| string - number - boolean | Deep copy | |
| TypedArray noConversionNeeded | Shallow copy | 0.7 |
| TypedArray ConversionNeeded | Uncertain | 0.8000001 |
| Array | Deep copy | 5.8 |
| RecursiveArray | Deep copy | |
| WebGL | Deep copy | |
| WebGPU(Draft) | Shallow copy | |

We can look at this in terms of perf and data read/write.

Perf

From fast to slow: TypedArray no conversion > TypedArray with conversion > Array.

Regarding "TypedArray no conversion" versus "TypedArray with conversion":
When "TypedArray no conversion" runs first, its time is close to that of "TypedArray with conversion".

TypedArray, no conversion:0.7000000476837158
TypedArray, with conversion:0.8000000715255737
Array:5.799999952316284

When "TypedArray with conversion" runs first, "TypedArray no conversion" is a little faster than "TypedArray with conversion".

TypedArray, with conversion:1.5
TypedArray, no conversion:0
Array:5.299999952316284

Below is the test case:

async function arrayOrTypedArray() {
  await tf.setBackend('cpu');
  await tf.ready();
  let aArray = [];
  const shapeSize = 224 * 224 * 3;
  let aTypedArray = new Float32Array(shapeSize);
  for (let i = 0; i < shapeSize; i++) {
    aArray[i] = i;
    aTypedArray[i] = i;
  }
  const shape = [shapeSize];
  const dtype = 'float32';
  // TypedArray, no conversion
  {
    const start = performance.now();
    const a = tf.tensor(aTypedArray, shape, dtype);
    await a.data();
    const end = performance.now();
    console.log('TypedArray, no conversion:' + (end - start));
  }
  // TypedArray, with conversion
  {
    const start = performance.now();
    const a = tf.tensor(aTypedArray, shape, 'int32');
    await a.data();
    const end = performance.now();
    console.log('TypedArray, with conversion:' + (end - start));
  }
  // Array
  {
    const start = performance.now();
    const a = tf.tensor(aArray, shape, dtype);
    await a.data();
    const end = performance.now();
    console.log('Array:' + (end - start));
  }
}
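
Since the numbers above change with the order in which the cases run, warm-up effects may be involved. A minimal sketch of a timing helper that discards a few warm-up runs and reports the median of the timed runs follows; `median` and `timeMedian` are hypothetical helpers, not part of tfjs, and the default run counts are arbitrary.

```typescript
// Hypothetical helpers for steadier timings; not part of tfjs.
function median(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] :
                                   (sorted[mid - 1] + sorted[mid]) / 2;
}

async function timeMedian(
    fn: () => Promise<void>, warmupRuns = 3,
    timedRuns = 10): Promise<number> {
  // Untimed runs so that one-off setup cost is not attributed to `fn`.
  for (let i = 0; i < warmupRuns; i++) {
    await fn();
  }
  const samples: number[] = [];
  for (let i = 0; i < timedRuns; i++) {
    const start = performance.now();
    await fn();
    samples.push(performance.now() - start);
  }
  return median(samples);
}
```

Each of the three cases above could then be wrapped in an async callback (create the tensor, await `a.data()`, dispose it) and passed to `timeMedian`, so that no case is penalized for running first.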

Data read write

Based on the data read/write test, TypedArray with conversion looks like a deep copy, and TypedArray no conversion looks like a shallow copy.

async function typedArray() {
  await tf.setBackend('cpu');
  await tf.ready();
  let aArray = [];
  const shapeSize = 224 * 224 * 3;
  let aTypedArray = new Float32Array(shapeSize);
  for (let i = 0; i < shapeSize; i++) {
    aArray[i] = i;
    aTypedArray[i] = i;
  }
  const shape = [shapeSize];
  const dtype = 'float32';
  // TypedArray with conversion
  {
    const a0 = aTypedArray[0];
    const a = tf.tensor(aTypedArray, shape, 'int32');
    const aReadBack = await a.data();
    aReadBack[0] = 100;
    console.log('TypedArray with conversion: ');
    console.log('   Raw: ' + a0);
    // aTypedArray[0] is not changed, so this looks like a deep copy between
    // aTypedArray and aReadBack.
    console.log('   Raw after update and read back:' + aTypedArray[0]);
    const aReadBack2 = await a.data();
    // aReadBack2[0] changed, so this looks like a shallow copy between
    // aReadBack and aReadBack2.
    console.log('   Read back again: ' + aReadBack2[0]);
  }

  // TypedArray no conversion
  {
    const a0 = aTypedArray[0];
    const a = tf.tensor(aTypedArray, shape, dtype);
    const aReadBack = await a.data();
    aReadBack[0] = 100;
    console.log('TypedArray no conversion: ');
    console.log('   Raw: ' + a0);
    // aTypedArray[0] changed, so this looks like a shallow copy between
    // aTypedArray and aReadBack.
    console.log('   Raw after update and read back:' + aTypedArray[0]);
    const aReadBack2 = await a.data();
    // aReadBack2[0] changed, so this looks like a shallow copy between
    // aReadBack and aReadBack2.
    console.log('   Read back again: ' + aReadBack2[0]);
  }
}

Test result:

TypedArray with conversion:
   Raw: 0
   Raw after update and read back:0
   Read back again: 100
TypedArray no conversion:
   Raw: 0
   Raw after update and read back:100
   Read back again: 100

@axinging axinging force-pushed the tensorFromGPUBuffer_v1 branch 3 times, most recently from c839ead to 6e0937c Compare November 21, 2022 07:54
// resource buffer. When zeroCopy is true, tensor will use this GPUBuffer as
// tensor's resource buffer, user should not destroy this GPUBuffer until all
// access is done.
zeroCopy?: boolean,

I still prefer external as the name, to indicate that the resource comes from outside and can't be destroyed by tfjs.


And change WEBGPU_TENSOR_FROM_BUFFER_WITH_ZERO_COPY flag to WEBGPU_TENSOR_USE_EXTERNAL_BUFFER to keep consistency?

throw new Error(`GPUBuffer size(${
values.buffer.size}) is smaller than tensor size(${size})!`);
} else if (
(values.buffer.usage & GPUBufferUsage.STORAGE) !==

Let's unify on GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC as the minimum requirement. I don't want users to hit errors because they can't call tensor.data.
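
A check for this minimum requirement is a bitmask test. The sketch below declares the relevant GPUBufferUsage flag values locally so that it runs without a WebGPU implementation; `hasRequiredUsage` is a hypothetical helper name, not the function used in the PR.

```typescript
// Numeric values of the WebGPU GPUBufferUsage flags used below, declared
// locally so the sketch runs without a WebGPU implementation.
const USAGE_COPY_SRC = 0x0004;
const USAGE_COPY_DST = 0x0008;
const USAGE_STORAGE = 0x0080;

const REQUIRED_USAGE = USAGE_STORAGE | USAGE_COPY_SRC;

// Hypothetical helper: true only if every required bit is set in `usage`.
function hasRequiredUsage(usage: number): boolean {
  return (usage & REQUIRED_USAGE) === REQUIRED_USAGE;
}

console.log(hasRequiredUsage(
    USAGE_STORAGE | USAGE_COPY_SRC | USAGE_COPY_DST));  // true
console.log(hasRequiredUsage(USAGE_STORAGE));           // false
```

Comparing the masked value against the full mask, rather than against zero, is what makes a buffer with only one of the two flags fail the check.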

Contributor

@qjia7 qjia7 left a comment

@pyu10055 @Linchenn Thanks for your comments and suggestions. We synced internally on this issue and agreed to support creating a tensor from a buffer with one copy by default, to stay consistent with other backends. Meanwhile, we add a WebGPU flag to enable the zero-copy path so that we can conveniently collect perf data in the future. Are you OK with this solution?

Collaborator

@pyu10055 pyu10055 left a comment

Thanks @qjia7 @axinging, the approach looks good to me. One suggestion is to add the zeroCopy flag (or whatever it ends up being called) to the WebGPUData interface instead of making it a global flag on the backend.
I think this way it is easier to document the meaning of the flag and the related API, and if WebGL supports the same functionality in the future, it can add this flag as well.

Reviewable status: 0 of 1 approvals obtained (waiting on @axinging, @qjia7, and @xhcao)


tfjs-backend-webgpu/src/backend_webgpu.ts line 59 at r10 (raw file):

Previously, qjia7 (Jiajia Qin) wrote…

And change WEBGPU_TENSOR_FROM_BUFFER_WITH_ZERO_COPY flag to WEBGPU_TENSOR_USE_EXTERNAL_BUFFER to keep consistency?

+1


tfjs-core/src/types.ts line 191 at r10 (raw file):

 * creating a tensor, tensor type is float32.
 */
export interface WebGPUData {

Can we have the zeroCopy flag here, instead of a global flag?
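
A per-object flag of this kind might look like the sketch below. The interface and function names are hypothetical, and `buffer` is typed as `unknown` so the example does not depend on WebGPU type definitions; the point is only that an omitted flag behaves like `false`, keeping the copy path as the default.

```typescript
// Hypothetical shapes for illustration only.
interface WebGPUDataSketch {
  buffer: unknown;
  zeroCopy?: boolean;
}

type BindMode = 'copy' | 'bind-external';

// Treat a missing flag the same as `false`, so copying stays the default.
function bindModeFor(data: WebGPUDataSketch): BindMode {
  return data.zeroCopy === true ? 'bind-external' : 'copy';
}

console.log(bindModeFor({buffer: {}}));                   // copy
console.log(bindModeFor({buffer: {}, zeroCopy: false}));  // copy
console.log(bindModeFor({buffer: {}, zeroCopy: true}));   // bind-external
```

Carrying the choice on the data object lets two tensors created in the same session use different paths, which a global backend flag cannot express.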

@axinging axinging force-pushed the tensorFromGPUBuffer_v1 branch 2 times, most recently from ab5807a to c3cf9d0 Compare November 22, 2022 05:13
@axinging
Contributor Author

axinging commented Nov 22, 2022

@pyu10055 @Linchenn @qjia7, zeroCopy has been added to WebGPUData, PTAL.

The current copy behavior is summarized below:

| Tensor input type | Copy |
| --- | --- |
| string - number - boolean | Deep copy |
| TypedArray noConversionNeeded | Shallow copy |
| TypedArray ConversionNeeded | Deep copy |
| Array | Deep copy |
| RecursiveArray | Deep copy |
| WebGL | Deep copy |
| WebGPU(zeroCopy = false) | Deep copy |
| WebGPU(zeroCopy = true) | Shallow copy |

@axinging axinging force-pushed the tensorFromGPUBuffer_v1 branch from c3cf9d0 to 81b08d3 Compare November 22, 2022 07:08
Contributor

@qjia7 qjia7 left a comment

LGTM, thanks!

Collaborator

@Linchenn Linchenn left a comment

Thank you Xu so much for the amazing work! LGTM except some small nits.

Reviewed all commit messages.
Reviewable status: :shipit: complete! 2 of 1 approvals obtained (waiting on @axinging, @qjia7, and @xhcao)


tfjs-core/src/ops/tensor.ts line 105 at r13 (raw file):

 * // by WebGPUData.zeroCopy. When zeroCopy is false or undefined(default), this
 * // passing GPUBuffer can be destroyed after tensor is created. When zeroCopy
 * // is true, this GPUBuffer is bound directly by the tensor, so donot destroy

do not


tfjs-core/src/ops/tensor.ts line 184 at r13 (raw file):

 * zero copy by flag zeroCopy. When zeroCopy is false or undefined(default),
 * this passing GPUBuffer can be destroyed after tensor is created. When
 * zeroCopy is true, this GPUBuffer is bound directly by the tensor, so donot

do not

@axinging axinging force-pushed the tensorFromGPUBuffer_v1 branch from 81b08d3 to fd85685 Compare November 22, 2022 23:39
@axinging
Contributor Author

axinging commented Nov 22, 2022

@Linchenn , done, thanks!

Hi, @pyu10055, could you please take another look?
