Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate size and capacity of arrays in LGDO objects #107

Open
iguinn opened this issue Sep 19, 2024 · 0 comments · May be fixed by #109
Open

Separate size and capacity of arrays in LGDO objects #107

iguinn opened this issue Sep 19, 2024 · 0 comments · May be fixed by #109
Assignees
Labels
enhancement New feature or request types LGDO types

Comments

@iguinn
Copy link
Contributor

iguinn commented Sep 19, 2024

For LGDO arrays, I suggest that we separate the capacity (i.e. the size of the underlying numpy array) from the size of the LGDO object. This is similar to how C++ std::vectors work. When increasing the size, only increase the capacity if needed (I think the typical strategy with C++ is to double the capacity when needed, although it is compiler dependent. We could also double up to a certain size and then add fixed amounts after that); when decreasing the size there is no need to change the capacity. I would suggest we only take this approach for the outer-most dimension (i.e. ArrayOfArrays still have fixed lengths in other dimensions). We would also want to turn [lgdo_type].nda into a property that returns an array view over the range [0:size]

This would have several benefits:

  1. Simplify reading into pre-defined buffers. If the read is smaller than the pre-defined buffer, then we simply change the size without causing any problems
  • Do not need to return pair of array+size anymore, which makes read easier to use (and avoids the inconsistent output type that we have right now)
  • Helpful for LH5Iterator
  1. Could help with performance when reading from many files as an alternative approach to what is suggested here: Determine the total buffer size before starting to read LH5 data from a list of files #93. Instead of having to re-allocate/copy the array each time a new file is open, the number of times this is done scales logarithmically with number of files (depending on the approach we take to increasing capacity)
  2. In certain situations this would avoid the error when trying to resize an array when another reference to it exists. This is mostly useful for benefit 1, but there may be other situations where shrinking an array is useful. See potential annoyance 1.
  • Note that VectorOfVectors is already more or less handled this way, because resizing the data array is a common need

Potential annoyances:

  1. If multiple variables point to the same LGDO object, we need to be careful that the size is treated as a mutable reference rather than an immutable value, otherwise if you make a change to one, it could change the underlying array without updating the size in all the others.
  2. This may not be totally trivial to plug into view_as and read_as, if external libraries use the shape of the array. Changing the behavior of nda hopefully keeps this simple
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request types LGDO types
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant