Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added metadata in Tectonic.toml #1120

Merged
merged 7 commits into from
Feb 4, 2024
Merged

Added metadata in Tectonic.toml #1120

merged 7 commits into from
Feb 4, 2024

Conversation

rm-dr
Copy link
Contributor

@rm-dr rm-dr commented Nov 23, 2023

This PR adds the tectonic show metadata command, which allows users to embed arbitrary data in their Tectonic.toml under the doc.metadata table:

[doc]
name = "..."
bundle = "..."


# New feature
[doc.metadata]
description = "This is ..."
arr = [1, 2, 3, [4,5]]


[[output]]
name = "..."
type = "..."
# ...

tectonic show metadata is used as follows (only valid when in a workspace):

$ tectonic show metadata description
This is ...

$ tectonic show metadata arr.3.1
5

$ tectonic show metadata arr.len
4

$ tectonic show metadata arr
(no output and error status 1. Arrays and tables can't be directly printed.)

The goal tectonic show metadata is to make scripting easier. Instead of adding additional dependencies to a bash script to parse toml files, users may use tectonic show instead.

Copy link

codecov bot commented Nov 23, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (217060e) 46.22% compared to head (6a4351f) 46.20%.
Report is 10 commits behind head on master.

Files Patch % Lines
crates/docmodel/src/document.rs 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1120      +/-   ##
==========================================
- Coverage   46.22%   46.20%   -0.02%     
==========================================
  Files         171      171              
  Lines       65069    65104      +35     
==========================================
+ Hits        30077    30082       +5     
- Misses      34992    35022      +30     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ratmice
Copy link
Contributor

ratmice commented Nov 24, 2023

Personally, I have a rather irrational dislike of metadata-like fields, nobody can verify their correctness, besides perhaps their levenshtein distance to some common metadata (which would be pretty fraught if your correct metadata happens to have a close enough distance to some common metadata).

I would much rather see a json-ld based input format, and a schema for tectonic.toml <-> json, such that other tools are free to build schemas for their data and we can still extract the tectonic-specific bits from it.

FWIW, I had posted a bit about this in the old tectonic forums about two years ago asking that we avoid adding such fields in the future, and work towards a better way than needing to embed arbitrary metadata fields. So if there is a cure for fuddy duddy, I have not discovered it.

@rm-dr
Copy link
Contributor Author

rm-dr commented Nov 24, 2023

Personally, I have a rather irrational dislike of metadata-like fields, nobody can verify their correctness

I don't think I understand. Why would we need to verify the correctness of metadata?
"Metadata" (in this case) is simply arbitrary data for an external tool.
Tectonic does not (and should not) care what users put in that table, as long as it's valid toml.

In fact, this PR implements borrows this feature from cargo. It's an elegant solution to the problem---it doesn't require any extra files, and it's completely orthogonal to the regular build process.

I would much rather see a json-ld based input format, and a schema for tectonic.toml <-> json, such that other tools are free to build schemas for their data and we can still extract the tectonic-specific bits from it.

Why the complexity? Another format when we already support toml?
The approach here makes it very easy to pipe into a build script: add some metadata keys, call desc="$(tectonic show metadata description)" in your build script, and you're off to the races. You don't need to worry about parsing json (or toml) in a bash script, and you don't need to add any other dependencies.

For more complex tools, we could easily make tectonic show metadata <table> output json...
(right now, trying to print a table returns an error) But I can't think of a use case for this.

If there is one, I'd argue that tools complex enough to warrant that should be parsing Tectonic.toml on their own anyway.

@ratmice
Copy link
Contributor

ratmice commented Nov 24, 2023

Personally, I have a rather irrational dislike of metadata-like fields, nobody can verify their correctness

I don't think I understand. Why would we need to verify the correctness of metadata? "Metadata" (in this case) is simply arbitrary data for an external tool. Tectonic does not (and should not) care what users put in that table, as long as it's valid toml.

In fact, this PR implements borrows this feature from cargo. It's an elegant solution to the problem---it doesn't require any extra files, and it's completely orthogonal to the regular build process.

If it is completely orthogonal it belongs in a completely separate .toml file alongside Tectonic.toml, I don't see why extra files are a problem. Once we start including metadata well-formedness of Tectonic.toml becomes undecidable without an arbitrary set of tools. Where the toml file itself gives you no hint of which tools are required.

I'd argue for the more complex scheme of linked data, when the data being added is well... linked to the data in the toml file.
For tools which use tectonic as a library, but need to extend the toml file for some reason this situation seems plausible.

I don't like it in cargo either and think it is a mistake. With a schema, tectonic itself can consume the schema and check for well-formedness regardless of the tool, with the context of the metadata retained.

I would much rather see a json-ld based input format, and a schema for tectonic.toml <-> json, such that other tools are free to build schemas for their data and we can still extract the tectonic-specific bits from it.

Why the complexity? Another format when we already support toml?

Because toml doesn't support arbitrary data in a way which is self-describing and checkable for well-formedness.
The second aspect which I should mention is that really this other format is only necessary if one needs to augment the toml
with metadata. That is we should be able to produce json conforming to a tectonic schema from toml, and Tectonic.toml from schemaized json by stripping out the metadata values. Once we add arbitrary metadata these fields can only be schemaized as json literals which lack any form of context.

IMO forcing people who want to embed arbitrary data in tectonic.toml to use a different format better suited to solving these problems and providing a migration path from Tectonic.toml seems like an adequate trade-off... Once we include this metadata the migration can no longer be done automatically, and the output now contains blobs devoid of any context.

The approach here makes it very easy to pipe into a build script: add some metadata keys, call desc="$(tectonic show metadata description)" in your build script, and you're off to the races. You don't need to worry about parsing json (or toml) in a bash script, and you don't need to add any other dependencies.

For more complex tools, we could easily make tectonic show metadata <table> output json... (right now, trying to print a table returns an error) But I can't think of a use case for this.

If there is one, I'd argue that tools complex enough to warrant that should be parsing Tectonic.toml on their own anyway.

Sure... it's easy, but it is also unprincipled, metadata could clash, only the author knows what tools consume it or what it means.
I would hope that we would at least consider ways we can avoid these problems. Even if they are not the ways which has commonly been approached.

@rm-dr
Copy link
Contributor Author

rm-dr commented Nov 24, 2023

I think I actually agree with most of these points... Cranko options in tectonic's tomls caused me a bit of confusion earlier today.

But I'd still argue for a simple scheme. 99% of tectonic users won't ever need metadata, and 99% of those that do will only need a few simple values.

Making an extra file for three lines of, say, auto-publish metadata isn't ideal. At that point, I'd just roll my own metadata file instead of dealing with a schema.

Tectonic isn't a fully-fledged build system, it's a TeX compiler. Fancy linked schemas may be nice for cargo, but they're beyond overkill for tectonic.

@ratmice
Copy link
Contributor

ratmice commented Nov 24, 2023

To me at least, it comes down to whether we want to include arbitrary data despite the downsides of doing so.
I definitely hear/feel that the complexity involved in building a fancy schema system is excessive for the problem at hand (especially if I'm likely the only one who wants to work towards it).

But I don't think the alternative of not including arbitrary metadata, and forcing users with metadata to create their own toml file is so inconvenient that it is worth allowing it in Tectonic.toml proper. But if for whatever reason it must be in the main toml file, i'd rather see the overly complex scheme which avoids these downsides.

Anyhow, thanks for hearing me out at least.

@pkgw
Copy link
Collaborator

pkgw commented Nov 25, 2023

Hmm, this is an interesting discussion!

I think it might be clarifying to focus on some specific use cases. @rm-dr, I would guess that there's a goal that this proposed functionality would help you accomplish? Can you describe the sort of things that you envision doing with this feature?

At the moment, although I don't personally have any projects where I can see myself using this functionality, I can see how it could come in handy. In particular, while it's true that Tectonic's capabilities as a build system are pretty limited, I would like to see them get stronger, with better "workspace" support in particular. As those capabilities (hopefully) improve, I think it becomes more valuable to have a metadata system that's integrated with the semantics of the broader build system. As things stand right now, it does feel a bit like overkill, but hopefully that won't always be the case.

@rm-dr
Copy link
Contributor Author

rm-dr commented Nov 26, 2023

I think it might be clarifying to focus on some specific use cases.

Here's mine.
I've set up CI that automatically publishes my lecture notes to a class website on commit. There are little extra bits of information I'd like to pass to it: A pretty title, description, and a boolean (for in-progress notes which shouldn't be public).

I could, of course, add a file to every week's workspace...
But with this PR, I just need the following:

[doc.metadata]
title = "pretty title"
description = "sentence"
publish = false

...with a few elegant tectonic show metadata calls in the script.

@pkgw
Copy link
Collaborator

pkgw commented Nov 27, 2023

Thanks, I think that's helpful.

As I wrote above, I don't mind generic metadata fields in these sorts of files, at least not as much as @ratmice. While I don't think that it's wise to encourage people to make up their own fields in the abstract hope that they'll be useful, I think that these sorts of mechanisms have proven to be useful when you want to integrate some kind of value-add tool that integrates with the main build framework (e.g., my Cranko tool, or @rm-dr's use case here). And I'd like to hope that the Tectonic document model will open up lots of nice possibilities for exactly that sort of thing. I can't disagree that you can always add another file that just sits next to your (e.g.) Tectonic.toml, but in my experience there's a cleaner feeling that comes with the ability to add metadata right in the build-system file.

The aspect that concerns me a bit more here is adding to the CLI's "surface area" to add utilities to print out the metadata. Every new CLI command is something that (thinking cautiously) we have to support for the rest of time, and this particular piece of functionality seems like it could be implemented better in an entirely separate tool — there's jq for JSON, and it looks like there are a few options for TOML files. I wouldn't expect this particular CLI interface to be much of a maintenance burden, but these things add up. @rm-dr could you see an external tool working for the data-extraction component of your use case?

@rm-dr
Copy link
Contributor Author

rm-dr commented Nov 27, 2023

The aspect that concerns me a bit more here is adding to the CLI's "surface area" to add utilities to print out the metadata. Every new CLI command is something that (thinking cautiously) we have to support for the rest of time.

That's a good point. It's a super simple feature right now, but I can see how it could become a problem later.

...could you see an external tool working for the data-extraction component of your use case?

Of course. If the goal is to allow [doc.metadata] but never parse it, I'd just hack together a bash function to find and read those three lines of TOML. It's not the prettiest solution, but it's simple and it's good enough for what I'm doing.

@ratmice ratmice mentioned this pull request Nov 27, 2023
@pkgw
Copy link
Collaborator

pkgw commented Nov 28, 2023

Cool. I think that I'd happy to merge a PR that added a free-metadata section to the TOML file. I'm a bit more reluctant to add generic metadata commands to the CLI.

@rm-dr
Copy link
Contributor Author

rm-dr commented Nov 29, 2023

Done. This PR no longer provides a command, simply ignoring [doc.metadata]

@rm-dr rm-dr changed the title Added "show metadata" command Added metadata in Tectonic.toml Dec 15, 2023
@pkgw pkgw merged commit 8456df1 into tectonic-typesetting:master Feb 4, 2024
@pkgw
Copy link
Collaborator

pkgw commented Feb 4, 2024

Sorry for taking absolutely forever to merge this.

@rm-dr rm-dr deleted the toml-metadata branch February 7, 2024 03:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants