External metadata files #25

Telkhine · 2020-11-04T20:11:50Z

I don't know if this project is concerned with where the metadata file is stored, for example the way ComicInfo.xml is stored inside a compressed comic archive. If it is, I would propose support for external metadata files.

An external metadata file would look something like this:

| Comic File | Metadata File |
| --- | --- |
| `Batman 001 (1940).pdf` | `Batman 001 (1940).xml` |
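
The naming rule in the table (same basename, different extension) could be sketched as below. This is only an illustration: the function name `sidecar_path` and the default `.xml` extension are assumptions, not something the project has specified.

```python
from pathlib import Path


def sidecar_path(comic_file: str, extension: str = ".xml") -> Path:
    """Return the sidecar metadata path for a comic file.

    Only the final suffix is swapped, so the sidecar sits next to the
    comic and shares its basename. (Hypothetical helper for illustration.)
    """
    return Path(comic_file).with_suffix(extension)


print(sidecar_path("Batman 001 (1940).pdf"))  # Batman 001 (1940).xml
```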

Here are a few reasons why this would be good:

Not all comics are archives. Some are EPUB, PDF, etc.
Editing a metadata file within an archive is generally not an atomic write operation, so the risk of corrupting the comic file is higher. It is better for a metadata file to become corrupt than a comic book file.
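
To illustrate the atomicity point, here is a minimal sketch of how a sidecar file could be replaced in one step, assuming the temporary file and the sidecar live on the same filesystem. The function name `write_sidecar_atomically` is hypothetical. The comic file itself is never opened.

```python
import os
import tempfile
from pathlib import Path


def write_sidecar_atomically(sidecar: Path, xml_text: str) -> None:
    """Write metadata to a temp file, then rename it over the sidecar.

    Readers see either the old sidecar or the new one, never a
    half-written file. (Hypothetical helper for illustration.)
    """
    fd, tmp_name = tempfile.mkstemp(dir=sidecar.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(xml_text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, sidecar)  # single rename step
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

An archive format such as zip usually has to be rewritten to change an embedded member, which is why the same trick is more expensive there.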

Some projects which solved this problem in a similar way:

Kodi: They store external XML metadata in a similar way to what I described above.
Calibre: They use a metadata.opf file which describes a book.


gotson · 2020-11-04T23:57:11Z

This is indeed an axis of research and discussion, but a bit early given the data model is not finalized.

Offering multiple options is something I had in mind, either embedding the file in the archive where possible, or as a sidecar if not.
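
A reader supporting both options might look something like the sketch below: try the embedded file first, then fall back to a sidecar. The embedded name `ComicInfo.xml` is borrowed from the existing convention mentioned earlier in the thread, and the `.xml` sidecar extension and the function name `load_metadata` are assumptions for illustration only.

```python
import zipfile
from pathlib import Path
from typing import Optional

EMBEDDED_NAME = "ComicInfo.xml"  # assumption: reuse the existing convention


def load_metadata(comic: Path) -> Optional[str]:
    """Return metadata text from inside the archive, else from a sidecar."""
    # Option 1: the metadata file is embedded in a zip-based archive.
    if zipfile.is_zipfile(comic):
        with zipfile.ZipFile(comic) as archive:
            if EMBEDDED_NAME in archive.namelist():
                return archive.read(EMBEDDED_NAME).decode("utf-8")
    # Option 2: a sidecar with the same basename next to the comic.
    sidecar = comic.with_suffix(".xml")
    if sidecar.is_file():
        return sidecar.read_text(encoding="utf-8")
    return None
```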

shimizurei · 2020-12-15T13:04:36Z

I think calibre just uses the epub standard.

Bitwolfies · 2021-02-07T21:59:59Z

While I normally hate external metadata like that, it's kinda necessary for PDFs. I don't use them because they can't integrate with ComicVine, but I would love to use them for things like comics from Humble Bundle, where the PDFs are of much higher quality than the CBZs, and converting them to CBZ while getting everything right is a long, annoying process.

timgilbert · 2021-02-15T20:08:24Z

For what it's worth, PDFs do support embedded XML metadata files via XMP (see section 1.6.1 in this document). Whether any applications will read the metadata is another story, but at least this is a possible solution that doesn't require a sidecar file.

wyldphyre · 2021-03-26T08:38:33Z

Even if the solution ends up having a way to include metadata within the archive (which would be ideal), I think it would be good for the specification to also support external files as a fallback (probably as first described above). This would make it much easier to produce software that can add or modify archive metadata.

My reasoning behind this is to make it as simple as possible to work with archive metadata. It takes a lot of know-how and/or frameworks to work with PDF, for example, and that is probably beyond what many simple apps, scripts, and tools would want to get into. If someone wants to write a script or tool to work with archive metadata, but doesn't have the time, experience, or capability to work with something more complicated than a zip file (i.e. .pdf or .epub files), at least they have the option of putting the metadata into an external file.

And, while it probably goes without saying, we should definitely avoid anything even remotely proprietary. It annoys me greatly that people continue to release archives as .cbr files, when there is no real benefit over .zip (minor space savings), and pretty much any software or script can find a way to work with zip files. Much less so for RAR files. Because of this I personally convert .cbr to .cbz when I get them.

gotson · 2021-05-28T02:09:18Z

We have not yet reached the stage of implementation design for the exported format, but one thing that is important is to separate the model, the data format, and the container:

the model is a collection of objects, a relational model. Probably different from the one shown on the main page, because the exported model will be slightly different from the source of truth model (we may need to flatten it a bit).
the data format is a technical representation of the model. It could be XML, JSON, Avro, ProtoBuf, to name a few. That's a serialization format. They are not all equal, and each has pros/cons that will need to be evaluated.
the container is what holds the data. A common one is a simple file, but there are also zip headers, or possibly PDF information. The file could be inside the archive, or a sidecar with the same name. This is quite flexible, but would need to be documented, so clients don't have to handle too many cases.
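
The three layers above can be sketched roughly as follows. Everything here is hypothetical: the `BookMetadata` fields, the `Book` element name, and the helper names are made up for illustration, since the real model and format are not finalized.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass
from pathlib import Path


# Layer 1: the model (hypothetical fields, flattened for export).
@dataclass
class BookMetadata:
    series: str
    number: str
    year: int


# Layer 2: the data format. The same model can be serialized several ways.
def to_json(meta: BookMetadata) -> str:
    return json.dumps(asdict(meta), sort_keys=True)


def to_xml(meta: BookMetadata) -> str:
    root = ET.Element("Book")
    for key, value in asdict(meta).items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Layer 3: the container. Here a sidecar file; it could equally be
# a member inside the archive.
def store_as_sidecar(comic: Path, payload: str, extension: str) -> Path:
    sidecar = comic.with_suffix(extension)
    sidecar.write_text(payload, encoding="utf-8")
    return sidecar


meta = BookMetadata(series="Batman", number="001", year=1940)
print(to_json(meta))
print(to_xml(meta))
```

Keeping the layers separate means a client only needs to know which containers to look in and which formats to parse; the model stays the same either way.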

cmargroff · 2022-02-02T17:43:37Z

Given that ePub and PDF both have embedded XML metadata, wouldn't a similar strategy be the right way here? I have a feeling that a lay user would discard external metadata files, not understanding that the files are linked even when they share the exact same basename.

wyldphyre · 2022-02-03T05:17:04Z

Yeah, use of an external file would probably be the last thing you’d want to do, in an ideal world. But if you lack the tooling/tech to be able to modify a particular file with metadata (PDF probably being a good example), it might be nice to have the option.

Telkhine · 2022-02-03T05:27:33Z

Also, as I said, editing a file directly is generally not an atomic write operation, so the risk of corrupting the comic file is higher. Some people might prefer not to modify the original source in order to preserve data integrity. External metadata is a very nice option to have.

ajslater · 2022-02-03T19:52:20Z

The ComicBookInfo format took a unique approach in that they added their metadata to the zipfile comments. I thought this was genius until it became clear that updating zipfile comments also requires decompressing and recompressing the entire archive.

It seems like a text file embedded in the (hopefully not very) compressed archive is the best solution.

Ideally the text and metadata files would be the only compressed assets in the archive and the image files would be STORED to speed opening and recompressing the archive. zip (or rar or lzma) compression is unlikely to improve upon image specific compression schemes like jpeg, webp and png.
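
A rough sketch of that layout using Python's standard `zipfile` module: images are written with `ZIP_STORED` and only the metadata file is deflated. The function name `build_archive` and the file names in the usage below are made up for illustration.

```python
import io
import zipfile
from typing import Mapping


def build_archive(target, images: Mapping[str, bytes],
                  metadata_name: str, metadata_text: str) -> None:
    """Write images uncompressed and the metadata file deflated."""
    with zipfile.ZipFile(target, "w") as archive:
        for name, data in images.items():
            # Already-compressed image data: store as-is.
            archive.writestr(name, data, compress_type=zipfile.ZIP_STORED)
        # Text compresses well, so deflate only the metadata.
        archive.writestr(metadata_name, metadata_text,
                         compress_type=zipfile.ZIP_DEFLATED)


buffer = io.BytesIO()
build_archive(buffer, {"001.jpg": b"\xff\xd8 fake jpeg bytes"},
              "metadata.xml", "<Book/>")
with zipfile.ZipFile(buffer) as archive:
    for info in archive.infolist():
        print(info.filename, info.compress_type == zipfile.ZIP_STORED)
```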

gotson added the "Container format" label (Describes an existing metadata container) · Feb 5, 2021
