Adding `.view(...)`/reinterpret `dtype` method #266

jakirkham · 2021-09-16T17:42:10Z

In NumPy (and some other libraries) arrays have a method to view the data as another dtype. This is different from `astype`: it takes data that may not be typed (like `bytes` or `bytearray`) and applies different dtype metadata on top of it, without converting the values. As an example, reinterpreting the data in this way can be useful, particularly in distributed settings where the data goes through serialization/deserialization steps in which metadata is extracted, sent along, and then reapplied to the data. Though this can come up in other situations as well.

cc @rgommers @kgryte (since we discussed this briefly earlier)


rgommers · 2021-09-16T19:52:00Z

> As an example, reinterpreting the data in this way can be useful, particularly in distributed settings where the data goes through serialization/deserialization steps in which metadata is extracted, sent along, and then reapplied to the data.

This is different from `numpy.ndarray.view`, right? In the latter case, there is already an array instance, which must have a well-defined dtype; it's not just a block of memory. This particular example sounds closest to `frombuffer`. I'm left wondering a little why the deserialization doesn't use the correct metadata immediately, though. Can you point to a concrete example?

jakirkham · 2021-09-16T19:59:58Z

One could do `np.asarray(memoryview(buf)).view(fmt)`, for example. Though yes, there are similarities to `np.frombuffer`.
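To make the comparison concrete, here is a minimal sketch, assuming NumPy and a small hand-built buffer (the names `buf`, `via_view`, and `via_frombuffer` are only for illustration). It suggests that both spellings label the same bytes as `float32` without copying them.

```python
import numpy as np

# Hypothetical payload: the raw bytes of three float32 values.
buf = bytearray(np.arange(3, dtype=np.float32).tobytes())

# Spelling 1: wrap the bytes as a uint8 array, then reinterpret the dtype.
via_view = np.asarray(memoryview(buf)).view(np.float32)

# Spelling 2: hand the dtype to frombuffer directly.
via_frombuffer = np.frombuffer(buf, dtype=np.float32)

# Both arrays describe the same memory with the same dtype.
print(via_view, via_frombuffer)
print(np.shares_memory(via_view, via_frombuffer))
```

The difference is in what each spelling needs to start from: `frombuffer` needs an object exporting the buffer protocol, while `.view(...)` starts from an existing array.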

As for why the correct metadata isn't used immediately: the memory is allocated to receive the message before any of that information arrives (the message needs to be written somewhere in memory). Only after the metadata and data are stored can they go through the deserialization process.
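A rough sketch of that flow, assuming NumPy on both ends. The `header` dictionary and the in-process copy standing in for a network read are made up for illustration and are not taken from any particular library.

```python
import numpy as np

# Sender side (hypothetical): split the array into metadata and raw bytes.
original = np.array([0.0, 1.0, 2.0], dtype=np.float32)
header = {"dtype": original.dtype.str, "shape": original.shape}
payload = original.tobytes()

# Receiver side: the landing buffer is allocated as plain bytes,
# before the header has been interpreted.
recv = np.empty(len(payload), dtype=np.uint8)
recv[:] = np.frombuffer(payload, dtype=np.uint8)  # stand-in for a socket read

# Only now is the metadata applied on top of the bytes already in memory.
restored = recv.view(header["dtype"]).reshape(header["shape"])
print(restored, restored.dtype)
```

The last step reuses the memory of `recv`; no second copy of the payload is made.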

rgommers · 2021-09-16T20:46:53Z

> One could do `np.asarray(memoryview(buf)).view(fmt)`, for example

Equivalent to np.asarray(memoryview(buf), dtype=fmt)?

I think I understand the use case, but there's no way to get an array that's untyped in the API, so the "reinterpret memory" use case seems quite niche. And I expect that there will be libraries that don't allow this kind of thing, because memory layout is an implementation detail not exposed to the user. So I'm leaning towards "out of scope" here.

It seems like serialization falls under I/O, which is out of scope completely.

jakirkham · 2021-09-16T21:17:01Z

> > One could do `np.asarray(memoryview(buf)).view(fmt)`, for example
>
> Equivalent to `np.asarray(memoryview(buf), dtype=fmt)`?

Not if dtype=... means .astype(...). I think this gets back into our discussion earlier.

Maybe a short example helps? Imagine b is received over the wire along with relevant metadata. The data is three float32 numbers (IOW Out[3] is what we want).

In [1]: import numpy as np

In [2]: b = b"\x00\x00\x00\x00\x00\x00\x80?\x00\x00\x00@"

In [3]: np.asarray(memoryview(b)).view(np.float32)
Out[3]: array([0., 1., 2.], dtype=float32)

In [4]: np.asarray(memoryview(b), dtype=np.float32)
Out[4]:
array([  0.,   0.,   0.,   0.,   0.,   0., 128.,  63.,   0.,   0.,   0.,
        64.], dtype=float32)

> I think I understand the use case, but there's no way to get an array that's untyped in the API, so the "reinterpret memory" use case seems quite niche.

In our usual case it is not so much that the data is untyped, but the type doesn't necessarily match what it should. Taking the example above, we have...

In [6]: np.asarray(memoryview(b)).dtype
Out[6]: dtype('uint8')

IOW we often have something that is uint8 or int8.

> And I expect that there will be libraries that don't allow this kind of thing, because memory layout is an implementation detail not exposed to the user. So I'm leaning towards "out of scope" here.

For clarity, am not looking to manipulate the underlying memory in any way and don't really care how it is represented. Am just trying to patch on the correct formatting. Another way to think of this would be altering the dtype DLPack might use. Suppose one could hack around with the DLPack representation before it goes through the protocol, but that feels a bit clumsy.

> It seems like serialization falls under I/O, which is out of scope completely.

It is certainly useful in I/O contexts (communication, file I/O, etc.). Though am not really looking for the protocol to handle the I/O portion or even serialization. Just the ability to perform this cast.

rgommers · 2021-09-16T21:25:26Z

Thanks, that is helpful. The "it has the wrong dtype" situation has come up in at least one other place, I think: using DLPack to transfer bool arrays. Those weren't supported, so the transfer was done as uint8.
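For reference, a small NumPy-only sketch of that kind of workaround. It does not go through DLPack at all; it only shows the dtype relabelling on each side, and the variable names are illustrative.

```python
import numpy as np

mask = np.array([True, False, True])

# Producer side: relabel the bool bytes as uint8 (same memory, itemsize 1).
as_uint8 = mask.view(np.uint8)

# Consumer side: after the transfer, patch the intended dtype back on.
back = as_uint8.view(np.bool_)

print(as_uint8, back)
```

Without a reinterpreting view, the consumer would be left with a uint8 array and would need a converting copy to get bool back.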

I think the next step here is to figure out how other array libraries do this (if they allow it).

kgryte · 2021-09-20T09:40:52Z

Another use case for reinterpretation is the ability to convert to and from the underlying byte representation of floating-point numbers.

This is common in the implementation of transcendental functions, where you want to manipulate the underlying bits of an IEEE 754 floating-point number directly. Go, e.g., provides dedicated APIs for such reinterpretation (Float64bits and Float64frombits, albeit only operating on a single number). JavaScript exposes an ArrayBuffer from which typed array views can be instantiated, allowing floating-point <=> bits reinterpretation.

The ability to reinterpret the underlying memory (i.e., have a data "view") can certainly be useful in certain classes of numerical algorithms and when you want to vectorize operations. The ability to reinterpret without needing to perform a copy would afford performance benefits.
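As an illustration of the vectorized case, here is a minimal sketch using NumPy's `view`, which is not part of the specification. It pulls the sign and exponent fields out of a few float64 values by working on their bits; the variable names are illustrative.

```python
import numpy as np

x = np.array([1.0, 2.0, -0.5], dtype=np.float64)

# Reinterpret the same 8 bytes per element as unsigned integers (no copy).
bits = x.view(np.uint64)

# IEEE 754 binary64 layout: 1 sign bit, 11 exponent bits, 52 fraction bits.
sign = bits >> np.uint64(63)
biased_exponent = (bits >> np.uint64(52)) & np.uint64(0x7FF)
exponent = biased_exponent.astype(np.int64) - 1023  # remove the bias

print(sign, exponent)

# Going the other way recovers the original floats.
roundtrip = bits.view(np.float64)
```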

Currently, the only way to achieve reinterpretation according to the specification is via either (1) manual iteration and data copy or (2) a combination of __dlpack__ and from_dlpack (see interchange), which may or may not involve data copy.
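To give a feel for option (1), here is a rough sketch of reinterpretation by manual iteration and copy, written with Python's `struct` module on plain lists rather than against any array library. The helper names `float32_to_bits` and `bits_to_float32` are hypothetical.

```python
import struct

def float32_to_bits(values):
    """Return the 32-bit pattern of each float, one element at a time."""
    return [struct.unpack("<I", struct.pack("<f", v))[0] for v in values]

def bits_to_float32(words):
    """Inverse of float32_to_bits: rebuild floats from 32-bit patterns."""
    return [struct.unpack("<f", struct.pack("<I", w))[0] for w in words]

bits = float32_to_bits([0.0, 1.0, 2.0])
print([hex(b) for b in bits])
print(bits_to_float32(bits))
```

Every element is packed and unpacked individually, which is the per-element copy a zero-copy view would avoid.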

jakirkham · 2022-10-05T22:10:56Z

cc @seberg (in case you have thoughts on this one :)

seberg · 2022-10-06T07:28:47Z

For the use case of reading blobs from the buffer protocol, I prefer the `frombuffer` API. OTOH, I guess Dask cannot export buffers, and `frombuffer` doesn't match well for a "reinterpret cast" of an existing array.
So there may be a need for `view` as well (which is a bit more generic, I guess?), although it seems less important.

jakirkham · 2022-10-06T08:08:58Z

Think the main value of view is it allows reinterpreting an existing array and knowing the end array type will be the same (the dtype is ofc changed).

Whereas with frombuffer, asarray, etc., one needs to know the type of the array to call the right function. With a method, this confusion can be avoided.

kgryte · 2023-06-29T08:30:27Z

As this proposal is currently without a champion, I'll go ahead and close.

kgryte added this to the v2022 milestone Oct 4, 2021

kgryte added the "API extension" label (Adds new functions or objects to the API.) Oct 4, 2021

jakirkham mentioned this issue Oct 5, 2022

Dispatching fromfile and tofile #490

Closed

rgommers removed this from the v2022 milestone Dec 14, 2022

kgryte closed this as completed Jun 29, 2023
