Replies: 26 comments
-
@sabas do you have a specific suggestion? I think you are right that this is useful. /cc @paulfitz
-
I was thinking of a specification that would describe how to interpret a zipped package on the fly, in the same way Java executes a JAR.
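A minimal sketch of reading a zipped package on the fly, assuming Python's standard `zipfile` module and a `datapackage.json` at the archive root. `read_descriptor` and `read_resource_text` are hypothetical helpers, not part of any Data Package library; nothing is extracted to disk:

```python
import io
import json
import zipfile


def read_descriptor(zip_bytes: bytes) -> dict:
    """Parse datapackage.json straight out of an in-memory ZIP (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # ZipFile.open streams the member; json.load accepts the binary file object.
        with zf.open("datapackage.json") as fh:
            return json.load(fh)


def read_resource_text(zip_bytes: bytes, path: str, encoding: str = "utf-8") -> str:
    """Read one resource by its path inside the archive (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.read(path).decode(encoding)


# Usage: build a tiny package in memory, then read it back without extracting.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("datapackage.json", json.dumps(
        {"name": "demo", "resources": [{"name": "gdp", "path": "data/gdp.csv"}]}))
    zf.writestr("data/gdp.csv", "country,gdp\nA,1\n")

descriptor = read_descriptor(buf.getvalue())
first_path = descriptor["resources"][0]["path"]
print(descriptor["name"], first_path)
print(read_resource_text(buf.getvalue(), first_path))
```

Reading a single member like this relies on ZIP's central directory, which allows random access to individual files.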
-
@sabas I think this makes a lot of sense. Do you want to start speccing something out?
-
See #198
-
There was a lot of discussion in the PR. The PR essentially proposed tar + gzip. Later discussion in the PR suggested reviewing existing best practice further and using ZIP instead. Main excerpts:
@mfenner wrote:
Excerpt from Research Object bundle spec:
@tfmorris wrote:
-
@mfenner would you be interested in taking on a bit of editorship here? You were a strong proponent of introducing this (and I'm +1 too). In addition, this should be a very simple and short spec to write once we decide what to do.
-
Let me think about how to approach this.
-
@mfenner any further thoughts? /cc @danfowler I am increasingly thinking that "bundling" a data package into one file (compressed) is an important use case and would love your suggestions here.
-
@rgrp sorry for not following up on this. I would like standard ZIP compression, but haven't found the time to spec out the details. Bundling a data package into one file is an important use case for me.
-
For reference (although not directly related to a spec for compression), we went ahead and added ZIP support to the recently upgraded Python lib for DataPackage, based on very clear use cases in the CKAN integration and on the general sense that it is sensible and reasonable :). @vitorbaptista developed and led that initiative. For reference:
-
@mfenner I imagine this can be super simple. Would you be able to start a draft and drop it in an issue here? @vitorbaptista it would be useful to get an outline of what you did.
-
The requirements for my ZIP file loading were to be able to load ZIPs that follow the pattern:
and also
This is because we wanted to support the ZIP files generated by GitHub (e.g. https://github.com/datasets/gdp/archive/master.zip), which have all their contents inside a folder. The actual code checks that the ZIP file has one and only one
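A minimal sketch of such a loader check, assuming the two layouts are `datapackage.json` at the archive root and `datapackage.json` inside a single top-level folder (as in GitHub's archive downloads). It uses Python's standard `zipfile` module; `find_descriptor` and `make_zip` are hypothetical helpers, not the actual library code:

```python
import io
import zipfile

DESCRIPTOR = "datapackage.json"


def find_descriptor(zf: zipfile.ZipFile) -> str:
    """Return the in-archive path of datapackage.json (hypothetical helper).

    Assumed layouts:
      1. datapackage.json at the archive root;
      2. <folder>/datapackage.json, where <folder> is the only top-level entry,
         as in GitHub's archive/master.zip downloads.
    """
    names = zf.namelist()
    if DESCRIPTOR in names:
        return DESCRIPTOR
    # Distinct first path components; ZIP member names always use "/".
    top_level = {name.split("/", 1)[0] for name in names}
    if len(top_level) == 1:
        candidate = f"{next(iter(top_level))}/{DESCRIPTOR}"
        if candidate in names:
            return candidate
    raise ValueError("expected datapackage.json at the root or in exactly one top-level folder")


def make_zip(files: dict) -> zipfile.ZipFile:
    """Build an in-memory ZIP from {member name: text} for the examples below."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    buf.seek(0)
    return zipfile.ZipFile(buf)


flat = make_zip({"datapackage.json": "{}", "data.csv": "a\n1\n"})
github_style = make_zip({"gdp-master/": "", "gdp-master/datapackage.json": "{}"})
print(find_descriptor(flat))          # datapackage.json
print(find_descriptor(github_style))  # gdp-master/datapackage.json
```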
-
+1 Makes a lot of sense.
-
I just hope you are aware of ZIP filename encoding problems: http://marcosc.com/2008/12/zip-files-and-encoding-i-hate-you/ A lot of users still stick to windows-1251 (Cyrillic) or SHIFT_JIS (Japanese). Maybe it would be a good idea to pick an archive format that doesn't have such a design flaw (if such a format exists)?
-
That blog post is from 2008, is barely coherent, and seems focused more on the tools than the format.
What do you recommend instead of ZIP?
-
@mfenner are you happy to draft a mini-spec here? I imagine it could be just a few paragraphs saying e.g.
-
I wouldn't limit it to
I would suggest we follow the 3rd option, as it's easier both to code and to explain.
-
I think it is better to be explicit in this case and limit people's options. A single
-
Option 1 would enforce the rules used by the
-
@tfmorris I propose 7zip, as it's open source and provides a better compression ratio and UTF-8 file names. Although 2008 was a long time ago, the i18n problems with filesystems are the same: a ZIP file created on a PC with a Korean locale that contains Korean filenames will turn into unreadable gibberish after unzipping on a PC with a different locale.
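For context on the encoding concern: the ZIP format defines a "language encoding flag" (general purpose bit 11) that marks an entry's name as UTF-8, and Python's standard `zipfile` module sets it automatically for non-ASCII names. The locale problems described above arise when a tool writes names in a local code page without that flag, or when a reader ignores it. A small sketch to inspect the flag, with made-up file names:

```python
import io
import zipfile

# General purpose bit 11 of a ZIP entry header: the "language encoding flag",
# meaning the file name is stored as UTF-8.
UTF8_NAME_FLAG = 0x800

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # Python's zipfile stores non-ASCII names as UTF-8 and sets bit 11 for them;
    # pure-ASCII names are stored without the flag.
    zf.writestr("데이터/인구.csv", "year,count\n2016,1\n")
    zf.writestr("data/ascii.csv", "a\n1\n")

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    # On reading, names without the flag are decoded as cp437 by default.
    flags = {info.filename: bool(info.flag_bits & UTF8_NAME_FLAG) for info in zf.infolist()}

for name, is_utf8 in flags.items():
    print(name, "UTF-8 flag set" if is_utf8 else "no flag (legacy code page)")
```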
-
For reference, BagIt's serialization specification work doesn't mandate a particular format, only rules for (de)serialization behavior:
-
@mfenner are you still interested in working on a mini spec for this?
-
Having read the BagIt approach, I think they got it pretty much right. My only question would be about step 3: we could instead serialize from inside the data package directory, so that datapackage.json is at the root of the archive file. However, my guess is that the BagIt creators thought about this. Next steps:
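The step 3 question comes down to where `datapackage.json` sits inside the archive. A minimal sketch of both layouts, assuming Python's standard `zipfile` module; `serialize_package` and its `wrap_in_folder` switch are hypothetical, and the BagIt-style branch reflects a reading of its rules (one top-level directory, archive named after that directory plus a format extension):

```python
import tempfile
import zipfile
from pathlib import Path


def serialize_package(pkg_dir: Path, out_dir: Path, wrap_in_folder: bool = True) -> Path:
    """Zip a data package directory into <out_dir>/<pkg_dir.name>.zip (hypothetical helper).

    wrap_in_folder=True:  BagIt-style, archive from the parent directory so every
                          member sits under a single "<pkg_dir.name>/" folder.
    wrap_in_folder=False: archive from inside the package directory so that
                          datapackage.json lands at the archive root.
    """
    archive = out_dir / f"{pkg_dir.name}.zip"  # named after the package, plus an extension
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(pkg_dir.rglob("*")):
            if path.is_file():
                rel = path.relative_to(pkg_dir).as_posix()
                zf.write(path, f"{pkg_dir.name}/{rel}" if wrap_in_folder else rel)
    return archive


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    pkg = root / "gdp"
    (pkg / "data").mkdir(parents=True)
    (pkg / "datapackage.json").write_text('{"name": "gdp"}')
    (pkg / "data" / "gdp.csv").write_text("country,gdp\nA,1\n")
    for sub in ("wrapped", "flat"):
        (root / sub).mkdir()

    wrapped = serialize_package(pkg, root / "wrapped")
    flat = serialize_package(pkg, root / "flat", wrap_in_folder=False)
    archive_name = wrapped.name
    with zipfile.ZipFile(wrapped) as zf:
        wrapped_names = sorted(zf.namelist())
    with zipfile.ZipFile(flat) as zf:
        flat_names = sorted(zf.namelist())

print(archive_name, wrapped_names)
print(flat_names)
```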
-
@rufuspollock will you work on some wording for this? Maybe it's better to keep it in here until I finish #337.
-
@pwalsh yes, note this is a patterns item at this stage. I think it won't be part of the spec for now.
-
tar + zstd is great for this purpose. Zstd is superior to gzip/zlib, typically offering a better compression speed/ratio trade-off. Tools for it exist and are available under a permissive (BSD) license.
Related topic: #290 (comment)
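The tar + compressor combination separates bundling (tar) from compression. A minimal sketch with Python's standard `tarfile` module; to stay within the standard library, gzip stands in for zstd here, on the assumption that a zstd tool would compress the same uncompressed tar stream. `bundle_tar`, `read_member` and the member names are hypothetical:

```python
import io
import json
import tarfile


def bundle_tar(files: dict, compression: str = "gz") -> bytes:
    """Bundle {member name: bytes} into one tar archive (hypothetical helper).

    compression is "" for a plain tar, or one of tarfile's built-in
    codecs such as "gz", "bz2" or "xz".
    """
    buf = io.BytesIO()
    mode = f"w:{compression}" if compression else "w"
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_member(archive: bytes, name: str) -> bytes:
    """Read one member back; mode "r:*" detects the compression automatically."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        return tar.extractfile(name).read()


files = {
    "gdp/datapackage.json": json.dumps({"name": "gdp"}).encode("utf-8"),
    "gdp/data/gdp.csv": b"country,gdp\nA,1\nB,2\n",
}
plain = bundle_tar(files, compression="")
gzipped = bundle_tar(files, compression="gz")
print(len(plain), len(gzipped))
print(json.loads(read_member(gzipped, "gdp/datapackage.json")))
```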
-
Updated: 2016-11-17
We want a way to "bundle" a data package into a single file for transmission. In addition, it may be compressed at the same time.
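ZIP covers both halves of this requirement: each member can be stored uncompressed (bundling only) or deflated (bundling plus compression). A minimal sketch, assuming Python's standard `zipfile` module; `bundle_zip` and the sample files are hypothetical:

```python
import io
import zipfile


def bundle_zip(files: dict, compress: bool = True) -> bytes:
    """Bundle {member name: text} into one ZIP (hypothetical helper).

    ZIP_STORED only bundles; ZIP_DEFLATED also compresses each member.
    """
    method = zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


files = {
    "datapackage.json": '{"name": "demo"}',
    # Repetitive sample data, so the effect of compression is visible.
    "data/values.csv": "id,value\n" + "".join(f"{i},{i % 7}\n" for i in range(1000)),
}
stored = bundle_zip(files, compress=False)
deflated = bundle_zip(files, compress=True)
print(len(stored), len(deflated))
```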
Note also that individual resources can themselves be compressed; see #290.
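Resource-level compression is independent of bundling: a single file such as a gzipped CSV can be decompressed as it is read. A minimal sketch using Python's standard `gzip` and `csv` modules; the in-memory bytes stand in for a hypothetical resource file like `data/gdp.csv.gz`, and nothing is assumed about how #290 describes such resources in `datapackage.json`:

```python
import csv
import gzip
import io

# Stand-in for the bytes of a hypothetical resource file "data/gdp.csv.gz".
compressed = gzip.compress(b"country,gdp\nA,1\nB,2\n")

# gzip.open accepts an existing file object; "rt" mode decodes text on the fly.
with gzip.open(io.BytesIO(compressed), mode="rt", encoding="utf-8", newline="") as fh:
    rows = list(csv.DictReader(fh))

print(rows)
```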
Desired Features
Original Description
As other packaging types use compression for distributing each package (JAR is a ZIP archive), there should be a section proposing a way to deal with compressed data packages.