Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rechunk function #237

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Conversation

felixcremer
Copy link
Contributor

This adds a rechunk function which would return a RechunkedDiskArray
This function could then be overloaded by wrapping packages like DimensionalData.

This was discussed in #35 but is not a fix to that but would open up a better workaround.
This would need some tests

This function could then be overloaded by wrapping packges.
Copy link
Member

@asinghvi17 asinghvi17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to allow any sensible definition of chunks here? I think rechunkeddiskarray takes tuples of chunksize like (10, 10, 3) for instance...

src/rechunk.jl Outdated
Rechunk the underlying data of A into the given `chunks`.

"""
rechunk(data::AbstractDiskArray, chunks::GridChunks) = RechunkedDiskArray(data , chunks)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rechunk(data::AbstractDiskArray, chunks::GridChunks) = RechunkedDiskArray(data, chunks)

just for style

asinghvi17

This comment was marked as duplicate.

@meggart
Copy link
Collaborator

meggart commented Mar 3, 2025

I start wondering if rechunk might be a misleading function name. In the end, we are not rechunking the data but rather creating a view into the data that behaves as if it was chunked in a different way internally. There might be some confusion among users when they assume that the result of rechunk would magically make time series access more efficient on spatially chunked data, this happens quite reularly in the python world for xarray's chunk function.

I think we need either a different name or at least a better docstring explaining that no physical rechunking is happening here.

@felixcremer
Copy link
Contributor Author

Yes, i am also wondering whether the name setchunks makes more sense. I could copy the docstring from the YAXArrays.setchunks function and remove the YAXArray specifics.

help?> YAXArrays.setchunks
  setchunks(c::YAXArray,chunks)

  Resets the chunks of a YAXArray and returns a new YAXArray. Note that this will not change the chunking of the underlying data itself, it will just make the data
  "look" like it had a different chunking. If you need a persistent on-disk representation of this chunking, use savecube on the resulting array. The chunks argument can
  take one of the following forms:

    •  a DiskArrays.GridChunks object

    •  a tuple specifying the chunk size along each dimension

    •  an AbstractDict or NamedTuple mapping one or more axis names to chunk sizes

@rafaqz
Copy link
Collaborator

rafaqz commented Mar 3, 2025

Even setchunks seems more active than it is... It's really like fakechunks

@felixcremer
Copy link
Contributor Author

Fakechunks sounds like it is not chunked at all. But I see your point.
Other possible names could be:

pretendchunk
pseudochunk
mockchunk

And if we rename the function we should also rename the main object because it would be good to have them in accordance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants