Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce -Z deterministic-mode to address a few reproducability issues #1036

Merged

Conversation

Mrmaxmeier
Copy link
Contributor

@Mrmaxmeier Mrmaxmeier commented May 14, 2023

I've dusted off tectonic-on-arXiv, and I looks like upgrading to the TeXLive 2022 bundle uncovered new reproducibility issues:

  • Documents that request synctex files with \synctex=1 contain absolute paths to the TeX files. This seems intentional, so I've left absolute paths enabled for tectonic --synctex runs.
  • The hyperxmp package embeds a bunch of metadata into the output PDF. In particular, it derives a random "xmpMM:InstanceId" from a bunch of information sources. One of those is the input file modification timestamp. I've wired this up to be the document build date instead. I'm not sure if there's a "legitimate" use case for file modification timestamps, so let's just try always returning the build date instead. 🙂

=> https://tt.ente.ninja/#/compare/05d445d074674bf1da7dd277bc8524fe85388ea9-untrusted-run-1/05d445d074674bf1da7dd277bc8524fe85388ea9-untrusted-run-2

@codecov
Copy link

codecov bot commented May 14, 2023

Codecov Report

Merging #1036 (22fb622) into master (6b6df36) will increase coverage by 0.00%.
The diff coverage is 53.70%.

❗ Current head 22fb622 differs from pull request most recent head c2e2ef1. Consider uploading reports for the commit c2e2ef1 to get more accurate results

@@           Coverage Diff           @@
##           master    #1036   +/-   ##
=======================================
  Coverage   45.09%   45.09%           
=======================================
  Files         155      155           
  Lines       62609    62651   +42     
=======================================
+ Hits        28234    28255   +21     
- Misses      34375    34396   +21     
Impacted Files Coverage Δ
src/unstable_opts.rs 56.66% <33.33%> (-1.23%) ⬇️
src/driver.rs 74.97% <41.17%> (-0.69%) ⬇️
src/docmodel.rs 55.17% <50.00%> (-1.08%) ⬇️
crates/bridge_core/src/lib.rs 66.20% <60.86%> (-0.33%) ⬇️
crates/xetex_layout/layout/xetex-XeTeXFontInst.cpp 47.66% <100.00%> (ø)
src/bin/tectonic/compile.rs 73.58% <100.00%> (+2.78%) ⬆️

... and 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Mrmaxmeier Mrmaxmeier marked this pull request as ready for review May 14, 2023 16:08
@pkgw
Copy link
Collaborator

pkgw commented May 16, 2023

It is great to see you again, @Mrmaxmeier!

Yes, the absolute path stuff was all added because it turns out that Synctex outputs must include absolute paths to the files they reference. I think there might even be an issue filed on your Arxiv tester relating to how those new absolute paths break the tests for the Arxiv documents that enable Synctex — I am pretty sure it predates the 2022.0 update.

I like the overall approach you have taken here, and I really like the nice structuring and internal documentation. Thank you for the nice work! Here are some rambling thoughts on some aspects of this:

  1. For the FsEmulationSettings struct, it would be better for API stability and such to keep its fields opaque and instead provide builder methods: APIs like fn expose_absolute_paths(&mut self, setting: bool), etc. This would be a bit annoying to implement for the struct, but since we already have a builder for launcher, how about adding methods like with_expose_absolute_paths() there rather than a high-level with_fs_emulation_settings()? I believe that the whole settings struct could remain private, then.
  2. For the question of what to do about absolute paths for your Arxiv tester, my initial thinking about this was that in test/reproducibility mode, maybe we would provide a fixed, fake absolute path prefix. I do like the idea of just not making absolute paths available unless they're needed, but (1) it feels like it might be hard to enumerate all the cases where they are needed and (2) this change doesn't solve the test failures in your tester, right? Since with Synctex enabled the absolute paths will still leak.
  3. Likewise, for file modification times, my initial idea would have been to generally allow the mtimes to leak, and then fake them only in hardcore-reproducible mode.

In general this is starting to tie in to some bigger things to tackle. For instance, to be really nicely reproducible things like the \today command would need to "lock in" a specific date/time, which would require us to emit a Tectonic.lock file and build that whole infrastructure. I like the idea of one day Tectonic.toml having a basic reproducible = true setting that will turn on all these features, and fine-grained control over things like leaking of absolute paths and mtimes if people really need them.

Anyway, regarding this pull request, I think my main question is, what do you think about item 2 above? I am genuinely not sure what kind of approach would be better here.

@pkgw pkgw mentioned this pull request May 16, 2023
@Mrmaxmeier
Copy link
Contributor Author

Right, I agree that heuristics for "reproducible mode" are a bit misguided. There's merit in keeping the default reproducible, but that's not really realistic as \today should and does default to the current date.

I'll introduce a --reproducible (or maybe --deterministic?) flag instead. I thought about coupling the reproducible mode with the SOURCE_DATE_EPOCH env var, as it's generally used in environments where reproducible builds are desired, but I think having it be an explicit flag is a better idea.

Is this something that should live in the "unstable options" world? It's certainly niche, but the interface (reproducible mode on/off) seems reasonably stable :)

Note: tectonic-on-arXiv will break for a while until this PR is merged as I'm now passing --reproducible as a CLI flag.


Tectonic.lock is a cool idea, though I'm not sure where the line between "committed to .lock" and "part of the document context" is. (i.e. the document's TeX sources shouldn't be part of the lockfile, and the bundle should be, but what about external paths / fonts / results of write18 invocations?)

@Mrmaxmeier Mrmaxmeier force-pushed the reproducibility-stub-abspath-mtime branch from 05d445d to 2156d30 Compare May 31, 2023 09:36
@Mrmaxmeier Mrmaxmeier changed the title Stub ttbc_get_abspath and ttbc_input_get_mtime for reproducible document builds Introduce --reproducible mode to address a few determinism issues May 31, 2023
@pkgw
Copy link
Collaborator

pkgw commented May 31, 2023

I think that yes, this should be an "unstable option", and I think that I prefer the "deterministic" terminology since I think it captures a bit more precisely what these changes are trying to accomplish. (In my mind, the "stability" question is less about whether the functionality is going to evolve, and more about whether we're ready to commit to exposing it through a certain interface, and I'd like to avoid proliferating command-line flags.) I agree that it feels like the right choice not to couple this stuff to SOURCE_DATE_EPOCH as well.

To check my understanding, is the current behavior that when Synctex outputs are created in "reproducible mode", the absolute paths will be forcibly suppressed, and the outputs will contain relative paths instead? If that's true, the Synctex outputs will be technically malformed, but I am 100% OK with that. It would be a good thing to document, though.

@Mrmaxmeier
Copy link
Contributor Author

Yep, deterministic mode generates broken synctex files. I've added a note in the unstable options help text that discusses this.

@Mrmaxmeier Mrmaxmeier changed the title Introduce --reproducible mode to address a few determinism issues Introduce -Z deterministic-mode to address a few reproducability issues May 31, 2023
@Mrmaxmeier Mrmaxmeier force-pushed the reproducibility-stub-abspath-mtime branch from d0016a7 to 10ab15b Compare May 31, 2023 17:25
@pkgw
Copy link
Collaborator

pkgw commented Jun 11, 2023

I finally have the chance to sit down and review this. This is looking excellent! I just pushed a commit to update vcpkg, which should hopefully get the build all green again.

@pkgw pkgw merged commit a3389b9 into tectonic-typesetting:master Jun 11, 2023
@Mrmaxmeier Mrmaxmeier deleted the reproducibility-stub-abspath-mtime branch June 11, 2023 16:18
@pkgw pkgw mentioned this pull request Jun 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants