Separate size and capacity of arrays in LGDO objects #107

iguinn · 2024-09-19T18:54:45Z

For LGDO arrays, I suggest that we separate the capacity (i.e. the size of the underlying numpy array) from the size of the LGDO object. This is similar to how C++ std::vectors work. When increasing the size, only increase the capacity if needed (I think the typical strategy with C++ is to double the capacity when needed, although it is compiler dependent. We could also double up to a certain size and then add fixed amounts after that); when decreasing the size there is no need to change the capacity. I would suggest we only take this approach for the outer-most dimension (i.e. ArrayOfArrays still have fixed lengths in other dimensions). We would also want to turn [lgdo_type].nda into a property that returns an array view over the range [0:size]

This would have several benefits:

Simplify reading into pre-defined buffers. If the read is smaller than the pre-defined buffer, then we simply change the size without causing any problems

Do not need to return pair of array+size anymore, which makes read easier to use (and avoids the inconsistent output type that we have right now)
Helpful for LH5Iterator

Could help with performance when reading from many files as an alternative approach to what is suggested here: Determine the total buffer size before starting to read LH5 data from a list of files #93. Instead of having to re-allocate/copy the array each time a new file is open, the number of times this is done scales logarithmically with number of files (depending on the approach we take to increasing capacity)
In certain situations this would avoid the error when trying to resize an array when another reference to it exists. This is mostly useful for benefit 1, but there may be other situations where shrinking an array is useful. See potential annoyance 1.

Note that VectorOfVectors is already more or less handled this way, because resizing the data array is a common need

Potential annoyances:

If multiple variables point to the same LGDO object, we need to be careful that the size is treated as a mutable reference rather than an immutable value, otherwise if you make a change to one, it could change the underlying array without updating the size in all the others.
This may not be totally trivial to plug into view_as and read_as, if external libraries use the shape of the array. Changing the behavior of nda hopefully keeps this simple

The text was updated successfully, but these errors were encountered:

iguinn added enhancement New feature or request types LGDO types labels Sep 19, 2024

jasondet assigned iguinn Sep 25, 2024

iguinn linked a pull request Oct 13, 2024 that will close this issue

Separate size and capacity of LGDO ArrayLike objects #109

Open

gipert linked a pull request Nov 9, 2024 that will close this issue

Separate size and capacity of LGDO ArrayLike objects #109

Open

ManuelHu mentioned this issue Nov 13, 2024

LH5iterator behaviour if buffer size doesnt divide table size #118

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Separate size and capacity of arrays in LGDO objects #107

Separate size and capacity of arrays in LGDO objects #107

iguinn commented Sep 19, 2024

Separate size and capacity of arrays in LGDO objects #107

Separate size and capacity of arrays in LGDO objects #107

Comments

iguinn commented Sep 19, 2024