Define a standard single-file format #15
Comments
I really like this suggestion a lot, and you're right - if we don't specifically call this out, there are going to be SigMF recordings floating around as I agree with every bullet in @kpreid's top-level comment, but do want to follow up on one item:
Is the suggestion, here, that the filenames of the metadata and dataset files within the recording be fixed? |
The most important part is: pathnames referring to some other directory, relatively or absolutely, are prohibited; only filenames are allowed. Whether the filenames themselves are fixed has a tradeoff:
Since the goal here is to make a format good for “archiving” (in the scientists-and-librarians sense) data, I think that the surprise should be avoided and option 1 above should be chosen. (If one wants to have a name intrinsic to the data set, well, that's what the contents of the |
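The filename-only rule above lends itself to a mechanical check. The following is a minimal sketch, assuming a ZIP container (one of the candidates discussed in this thread, not a settled choice); `is_bare_filename` and `offending_names` are hypothetical helper names, not part of any SigMF tooling.

```python
import zipfile
from pathlib import PureWindowsPath


def is_bare_filename(name: str) -> bool:
    """Return True if name is a plain filename with no directory part."""
    if name in ("", ".", ".."):
        return False
    # Reject both POSIX and Windows separators, which covers relative
    # paths ("../x"), absolute paths ("/x") and nested paths ("a/x").
    if "/" in name or "\\" in name:
        return False
    # Reject Windows drive prefixes such as "C:x".
    if PureWindowsPath(name).drive:
        return False
    return True


def offending_names(zf: zipfile.ZipFile) -> list:
    """List archive members whose names are not bare filenames."""
    return [n for n in zf.namelist() if not is_bare_filename(n)]
```

A reader could refuse to process an archive whenever `offending_names` returns a non-empty list, rather than trying to sanitize the names.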
You can't write a ZIP (or any format) in a streaming fashion when TWO parts of it are streaming (i.e. data and metadata) ...
@smunaut Yes, but you can either write the metadata first (if it is known that the recording will have exactly one segment and no annotations), or keep it in memory and write it second after the recording ends (the metadata will most likely be very small compared to the sample data).
"most likely" ... I'm generating an annotation for every GSM burst in a real-time scanner app, that's about 2000 annotations per second (for a single GSM channel), I can assure you it grows quickly. The whole thing has been designed to support fully stream-able annotations (but not segments), breaking this now would be a shame.
@smunaut And such applications can use the two-file format for that. I'm proposing specifying only that “if you want a single file, do it this way”.
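The "write the metadata second" option mentioned above could look roughly like the sketch below. It assumes a ZIP container and uses hypothetical member names (`data.bin`, `metadata.json`) and a hypothetical `build_metadata` callback; none of these come from the specification. It only suits recordings whose metadata fits in memory, which is exactly the limitation raised for annotation-heavy captures.

```python
import json
import zipfile


def write_single_file(out, sample_chunks, build_metadata):
    """Stream sample data into a ZIP member, then append the metadata.

    out            -- path or binary file object to write the archive to
    sample_chunks  -- iterable of bytes objects, consumed once
    build_metadata -- hypothetical callback: total byte count -> dict
    """
    with zipfile.ZipFile(out, "w", zipfile.ZIP_STORED) as zf:
        total = 0
        # force_zip64 because the final size is unknown while streaming.
        with zf.open("data.bin", "w", force_zip64=True) as member:
            for chunk in sample_chunks:
                member.write(chunk)
                total += len(chunk)
        # The metadata is held in memory until the recording has ended.
        zf.writestr("metadata.json", json.dumps(build_metadata(total)))
```

Applications that cannot buffer their metadata would keep using the two-file layout, as suggested in the comment above.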
Mmm, my bad, my understanding was that you wanted to mandate the use of the single-file format. But then what would be required of reader / writer applications to be deemed compliant?
We should obligate readers to embed ZIP unarchiving only if we're also willing to obligate them to embed HTML parsing (see #7). It's the same situation: libraries are commonly available and it wouldn't be hard, but it's still reasonable to choose to not have implementations need library dependencies. So, assuming we take the no-dependency choice, let's say they must support the two-file format (unless they are such that two-file doesn't make sense at all) and there's a standard unpacking tool to go from one-file to two-file (which is just a thin wrapper around a regular unzip which recognizes the .sigmf extension and renames the contents and maybe validates them). I don't think this is a particularly great situation; my initial claim is not that there should be two ways to store a SigMF recording but that there inevitably will be and we can do better by standardizing it than not. |
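The unpacking tool described above could be as small as the following sketch. It assumes a ZIP container and a made-up renaming scheme (prefixing each member with the archive's base name); the real naming rules would have to come from the specification, and `unpack_single_file` is a hypothetical name.

```python
import shutil
import zipfile
from pathlib import Path


def unpack_single_file(archive_path, dest_dir):
    """Unpack a one-file .sigmf archive into separate files.

    Returns the list of paths written, in archive order.
    """
    archive_path = Path(archive_path)
    if archive_path.suffix != ".sigmf":
        raise ValueError(f"not a .sigmf file: {archive_path.name}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            name = info.filename
            # Minimal validation: members must be bare filenames.
            if name in ("", ".", "..") or "/" in name or "\\" in name:
                raise ValueError(f"member is not a bare filename: {name!r}")
            # Hypothetical renaming rule: <archive stem>-<member name>.
            target = dest / f"{archive_path.stem}-{name}"
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

For example, `unpack_single_file("capture.sigmf", "out")` would write `out/capture-<member>` for each member, leaving readers to deal only with the two-file layout.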
Well (1) I wouldn't mandate HTML in the first place. If you want ZIP (or really any other format) as the reference, you'll need to explicitly reference what you consider to be the canonical spec for that format, including which extensions it must support (because all those formats can have extensions and revisions of their own, etc.).
I think this is worth putting in the spec - it's just a matter of selecting the compression format. If ZIP isn't a great option, what about the other common formats: |
@bhilburn We need an archive format, not a compression format. Some things, like zip, are both, but gzip, bzip2, and xz are pure compression formats, not archive formats (they "contain only one file"); they have to be combined with e.g. tar (which is only archive and no compression). (Certainly we could consider the use of tar for this purpose, but I have no knowledge about its suitability.) |
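The archive-versus-compression distinction above can be illustrated with Python's standard library. This is only a sketch of the distinction, with hypothetical member names; it does not imply that tar or gzip is the format to adopt.

```python
import gzip
import io
import tarfile

# Hypothetical recording contents, used only for illustration.
FILES = {"metadata.json": b"{}", "data.bin": b"\x00\x01\x02\x03"}

# gzip on its own compresses one byte stream; it records no member list.
compressed_data = gzip.compress(FILES["data.bin"])


def make_tar_gz(files):
    """Bundle several named payloads with tar, then compress with gzip."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_members(tar_gz_bytes):
    """Return the member names stored in a .tar.gz byte string."""
    with tarfile.open(fileobj=io.BytesIO(tar_gz_bytes), mode="r:gz") as tf:
        return tf.getnames()
```

Here the tar layer supplies the notion of multiple named files, and the gzip layer only shrinks the resulting stream.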
Fair point, @kpreid. I was definitely treating them as one-and-the-same in my earlier comment. The real question, then, is whether Input from anyone regarding the specification of |
Okay, #44 is up! Please review and comment. |
#44 has been merged! |
Reference implementation of archive format from issue #15
I was asked to review the SigMF specification by @bhilburn.
Data formats consisting of multiple files are frequently awkward to work with; for example, when downloading them from a web site. People will likely decide to distribute them in archives instead.
Therefore, I propose that the specification preemptively define a simple single-file format, which straightforwardly contains the metadata and data files. There are a lot of possibilities here, so here are some suggested constraints:
.sigmf
, even if it is itself a standard archive format such as ZIP. (Same argument as in Use more specific filename extensions #14, and should be coordinated with that.)