[V3] v2 -> v3 data migration #1798
Comments
Big +1 on this. I'm working on a conversion tool for large-scale genomics data (hundreds of TB), which is usually held in file systems (for the moment; it will probably migrate to object stores later on). A CLI tool that does an in-place migration from v2 to v3 would be a big help. I'm hoping to move to v3 early on, before too many datasets are converted into v2 format, so that most users won't ever know about v2. My assumption was that the migration is largely a case of writing a new JSON metadata file per array, and should be possible to do both cheaply and safely? |
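To make the "new JSON metadata file per array" idea concrete, here is a minimal sketch, not anything from zarr-python itself. It assumes the v2 metadata lives in `.zarray` (plus optional `.zattrs`), that the v3 document is named `zarr.json`, and that the v3 `v2` chunk key encoding lets existing chunks stay where they are. The function names and the tiny dtype table are hypothetical, and compressors, filters, and Fortran order are deliberately refused rather than translated.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical, deliberately small dtype table; a real tool would cover many more.
V2_TO_V3_DTYPE = {"<i4": "int32", "<i8": "int64", "<f4": "float32", "<f8": "float64"}


def v2_to_v3_array_metadata(v2: dict) -> dict:
    """Translate a parsed v2 `.zarray` dict into a v3-style metadata dict (sketch)."""
    if v2.get("compressor") is not None or v2.get("filters"):
        raise NotImplementedError("codec translation is out of scope for this sketch")
    if v2.get("order", "C") != "C":
        raise NotImplementedError("only C order is handled in this sketch")
    if v2.get("fill_value") is None:
        raise NotImplementedError("an undefined v2 fill_value needs a policy decision")
    return {
        "zarr_format": 3,
        "node_type": "array",
        "shape": v2["shape"],
        "data_type": V2_TO_V3_DTYPE[v2["dtype"]],
        "chunk_grid": {
            "name": "regular",
            "configuration": {"chunk_shape": v2["chunks"]},
        },
        # Assumption: the "v2" key encoding keeps the existing chunk keys valid.
        "chunk_key_encoding": {
            "name": "v2",
            "configuration": {"separator": v2.get("dimension_separator", ".")},
        },
        "fill_value": v2["fill_value"],
        # All dtypes in the table above are little-endian.
        "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
        "attributes": {},
    }


def write_v3_metadata(array_dir: Path) -> Path:
    """Write `zarr.json` next to an existing `.zarray`; chunk files are not touched."""
    v2 = json.loads((array_dir / ".zarray").read_text())
    v3 = v2_to_v3_array_metadata(v2)
    zattrs = array_dir / ".zattrs"
    if zattrs.exists():
        # v2 keeps attributes in a separate file; v3 embeds them in the metadata.
        v3["attributes"] = json.loads(zattrs.read_text())
    out = array_dir / "zarr.json"
    out.write_text(json.dumps(v3, indent=2))
    return out
```

Because the sketch only adds a file and never rewrites chunks, it is cheap and can be undone by deleting `zarr.json`. The hard part in practice is likely the codec translation that this sketch skips.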
Yes, I think this is right. Besides the metadata, which will live in a completely new JSON document ( |
Thanks yes, I've been aiming for v3 forwards compatibility by using "/" as the default dimension separator. Then, iterating over the chunks in the first dimension and renaming to have a "c" prefix should be relatively cheap (I forgot about this difference). Is there some developer documentation with recommendations for forwards/backwards compatibility? |
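The rename described above could look roughly like the following sketch. It assumes a local filesystem, an array written with "/" as the dimension separator (so first-dimension chunk indices appear as numeric entries directly under the array directory), and a target layout in which chunk keys sit under a `c/` prefix. The function name `add_chunk_prefix` is hypothetical.

```python
import tempfile
from pathlib import Path


def add_chunk_prefix(array_dir: Path, prefix: str = "c") -> int:
    """Move numeric first-dimension chunk entries under `prefix`/; return the count."""
    target = array_dir / prefix
    target.mkdir(exist_ok=True)
    moved = 0
    # sorted() materialises the listing before any entry is renamed.
    for entry in sorted(array_dir.iterdir()):
        # Only plain ASCII digit names are treated as chunk indices, so
        # metadata files such as `.zarray` are left alone.
        if entry.name.isascii() and entry.name.isdigit():
            entry.rename(target / entry.name)
            moved += 1
    return moved
```

Only one rename per first-dimension index is needed, since everything below each index moves with its parent directory. On an object store there is no cheap rename, so the same idea would turn into a copy of every chunk.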
For most cases, the migration only requires adding |
This is correct. When I wrote up this issue, I forgot about the |
i updated the issue to be more accurate :) |
I agree that a CLI tool that can convert an entire hierarchy would be great! |
Today I learned that there is a v1 to v2 migrator in the zarr-python codebase: Lines 1941 to 1956 in 6105ef2
|
We should invest in tools to make the v2 -> v3 conversion simple for people who are motivated to convert their data. A few high-level ideas:
- Someone should investigate how complicated in-place conversions would be. V3 is designed to make array conversions easy, requiring only the creation of new metadata. On a local filesystem where `mv` is cheap, this could be attractive.
- zarr-python … v2 and v3, and a migration guide. This should have its own page in the docs.
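As a starting point for investigating in-place conversion of a whole hierarchy, a tool would first need to find every v2 node. The sketch below is a dry-run scan only and converts nothing. It assumes a local filesystem and that v2 arrays and groups are marked by `.zarray` and `.zgroup` files; the name `find_v2_nodes` is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def find_v2_nodes(root):
    """Yield (path, kind) for each directory under `root` that holds v2 metadata."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        if ".zarray" in filenames:
            # Everything below an array is chunk data, so do not descend further.
            dirnames.clear()
            yield Path(dirpath), "array"
        elif ".zgroup" in filenames:
            yield Path(dirpath), "group"


if __name__ == "__main__":
    # Build a tiny throwaway hierarchy and list what a converter would visit.
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / ".zgroup").write_text("{}")
        (root / "a").mkdir()
        (root / "a" / ".zarray").write_text("{}")
        for path, kind in find_v2_nodes(root):
            print(kind, path.relative_to(root))
```

Pruning the walk at each array matters at scale: without it, the scan would list every chunk directory, which is exactly the cost an in-place migration is trying to avoid.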