External metadata files #25

Open
Telkhine opened this issue Nov 4, 2020 · 10 comments
Labels
Container format: Describes an existing metadata container

Comments

@Telkhine

Telkhine commented Nov 4, 2020

I don't know if this project is concerned with where the metadata file is stored, e.g. the way ComicInfo.xml is stored inside a compressed comic archive. If it is, I would propose support for external metadata files.

An external metadata file would look something like this:

Comic File            | Metadata File
Batman 001 (1940).pdf | Batman 001 (1940).xml
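
As a minimal sketch of that naming rule, assuming the sidecar lives next to the comic and only the extension changes (the function name `sidecar_path` and the `.xml` default are illustrative, not part of any spec):

```python
from pathlib import Path


def sidecar_path(comic_file: str, extension: str = ".xml") -> Path:
    """Return the sidecar path: same directory and base name, new extension.

    Hypothetical helper; the ".xml" default is an assumption for illustration.
    """
    return Path(comic_file).with_suffix(extension)


print(sidecar_path("Batman 001 (1940).pdf"))
```

Only the final extension is swapped, so dots or parentheses earlier in the file name are left alone.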

Here are a few reasons why this would be good:

  1. Not all comics are archives. Some are EPUB, PDF, etc.
  2. Editing a metadata file within an archive is generally not an atomic write operation, so the risk of corrupting the comic file is higher. It is better for a metadata file to become corrupt than a comic book file.
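
To illustrate the second point, here is one way a tool might update a sidecar without ever leaving a half-written file behind. It is a sketch only: `write_sidecar_atomically` is a hypothetical helper, and the rename is assumed to be atomic because the temporary file is created in the same directory (and so on the same filesystem) as the target.

```python
import os
import tempfile
from pathlib import Path


def write_sidecar_atomically(sidecar: Path, xml_text: str) -> None:
    """Write to a temp file beside the target, then rename it over the target."""
    fd, tmp_name = tempfile.mkstemp(dir=sidecar.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(xml_text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        # Readers see either the old sidecar or the new one, never a mix.
        os.replace(tmp_name, sidecar)
    except BaseException:
        # Remove the temp file if the write or the rename failed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

The comic file itself is never opened for writing in this approach, which is the integrity argument made above.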

Some projects which solved this problem in a similar way:

@gotson
Member

gotson commented Nov 4, 2020

This is indeed an axis of research and discussion, but a bit early given the data model is not finalized.

Offering multiple options is something I had in mind: either embedding the file in the archive where possible, or using a sidecar if not.
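
A reader supporting both options could look roughly like the sketch below. It assumes the embedded file is named `ComicInfo.xml` and the sidecar shares the comic's base name with an `.xml` extension; neither is decided by this project, and `load_metadata` is a hypothetical name.

```python
import zipfile
from pathlib import Path
from typing import Optional


def load_metadata(comic: Path, embedded_name: str = "ComicInfo.xml") -> Optional[bytes]:
    """Prefer metadata embedded in a zip-based archive, else try a sidecar."""
    if zipfile.is_zipfile(comic):
        with zipfile.ZipFile(comic) as archive:
            if embedded_name in archive.namelist():
                return archive.read(embedded_name)
    # Assumed sidecar convention: same base name, ".xml" extension.
    sidecar = comic.with_suffix(".xml")
    if sidecar.is_file():
        return sidecar.read_bytes()
    return None
```

The lookup order (embedded first, sidecar second) is itself a design choice that a specification would need to pin down so clients behave consistently.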

@shimizurei

I think calibre just uses the epub standard.

@gotson added the "Container format" label on Feb 5, 2021
@Bitwolfies

While I normally hate external metadata like that, it's kinda necessary for PDFs. I don't use them because they can't integrate with ComicVine, but I would love to use them for things like comics from Humble Bundle, where the PDFs are of much higher quality than the CBZs, and converting them to CBZ while getting everything right is a long, annoying process.

@timgilbert

For what it's worth, PDFs do support embedded XML metadata files via XMP (see section 1.6.1 in this document). Whether any applications will read the metadata is another story, but at least this is a possible solution that doesn't require a sidecar file.

@wyldphyre

Even if the solution ends up having a way to include metadata within the archive (which would be ideal), I think it would be good for the specification to also support external files as a fallback (probably as first described above). This would make it much easier to produce software that can add or modify archive metadata.

My reasoning behind this is to make it as simple as possible to work with archive metadata. It takes a lot of know-how and/or frameworks to work with PDF, for example, and that is probably beyond what many simple apps, scripts, or tools would want to get into. If someone wants to write a script or tool to work with archive metadata, but doesn't have the time, experience, or capability to work with something more complicated than a zip file (i.e. .pdf or .epub files), at least they have the option of putting the metadata into an external file.

And, while it probably goes without saying, we should definitely avoid anything even remotely proprietary. It annoys me greatly that people continue to release archives as .cbr files, when there is no real benefit over .zip (minor space savings), and pretty much any software or script can find a way to work with zip files. Much less so for RAR files. Because of this I personally convert .cbr to .cbz when I get them.

@gotson
Member

gotson commented May 28, 2021

We have not yet reached the stage of implementation design for the exported format, but one thing that is important is to separate the model, the data format, and the container:

  • the model is a collection of objects, a relational model. Probably different from the one shown on the main page, because the exported model will be slightly different from the source of truth model (we may need to flatten it a bit).
  • the data format is a technical representation of the model. It could be XML, JSON, Avro, ProtoBuf, to name a few. That's a serialization format. They are not all equal, and each has pros/cons that will need to be evaluated.
  • the container is what holds the data. A common one is a simple file, but there are also zip headers, or maybe PDF information. The file could be inside the archive, or a sidecar with the same name. This is quite flexible, but would need to be documented, so clients don't have to handle too many cases.
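
The separation between model and data format can be sketched as follows. Everything here is illustrative: the `ComicMetadata` fields and the `Comic` root element are hypothetical placeholders, not the project's actual model or schema.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass
class ComicMetadata:
    """The model: plain data, independent of any serialization (hypothetical fields)."""
    series: str
    number: str
    year: int


def to_json(meta: ComicMetadata) -> str:
    """One possible data format: JSON."""
    return json.dumps(asdict(meta), sort_keys=True)


def to_xml(meta: ComicMetadata) -> str:
    """Another possible data format: XML, one child element per field."""
    root = ET.Element("Comic")  # element name is an assumption
    for key, value in asdict(meta).items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


meta = ComicMetadata(series="Batman", number="001", year=1940)
print(to_json(meta))
print(to_xml(meta))
```

The container is then a third, independent decision: the same serialized string could be written to a sidecar file or stored as an entry inside the archive.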

@cmargroff

Given that ePub and PDF both have embedded XML metadata, wouldn't a similar strategy be the right way here? I have a feeling that the layman user would discard external metadata files, not understanding that the files are linked even when they have the exact same basename.

@wyldphyre

Yeah, use of an external file would probably be the last thing you’d want to do, in an ideal world. But if you lack the tooling/tech to be able to modify a particular file with metadata (PDF probably being a good example), it might be nice to have the option.

@Telkhine
Author

Telkhine commented Feb 3, 2022

Also, as I said, editing a file directly is generally not an atomic write operation, so the risk of corrupting the comic file is higher. Some people might prefer not to modify the original source, to preserve data integrity. External metadata is a very nice option to have.

@ajslater

ajslater commented Feb 3, 2022

The ComicBookInfo format took a unique approach in that they added their metadata to the zipfile comments. I thought this was genius until it became clear that updating zipfile comments also requires decompressing and recompressing the entire archive.

It seems like a text file embedded in the (hopefully not very) compressed archive is the best solution.

Ideally the text and metadata files would be the only compressed assets in the archive and the image files would be STORED to speed opening and recompressing the archive. zip (or rar or lzma) compression is unlikely to improve upon image specific compression schemes like jpeg, webp and png.
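
A rough sketch of that layout using Python's standard `zipfile` module is below. The helper name `build_archive` and the list of image extensions are assumptions for illustration; images are written with `ZIP_STORED` and everything else with `ZIP_DEFLATED`.

```python
import zipfile
from pathlib import Path
from typing import Mapping

# Assumed set of already-compressed image types; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def build_archive(archive_path: Path, files: Mapping[str, bytes]) -> None:
    """Write entries, storing images as-is and deflating everything else."""
    with zipfile.ZipFile(archive_path, "w") as archive:
        for name, data in files.items():
            is_image = Path(name).suffix.lower() in IMAGE_SUFFIXES
            method = zipfile.ZIP_STORED if is_image else zipfile.ZIP_DEFLATED
            archive.writestr(name, data, compress_type=method)
```

With this split, reading a page is a plain copy out of the archive, and only the small text and metadata entries pay any compression cost.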


8 participants