use symlink to hashes to avoid mirror failures #26
2246e17 to 19308ef
I am not really a fan of placing this workaround here in that way:
will follow up in bugzilla
If we could use the
and changelog date as
f522d60 to 7200786
That causes even more changes, as it would prevent optimising away those rebuilds that resulted in the same build output.
No, that is only disk I/O; because the content is the same, the actual transfer is much smaller, probably about a KB. But I implemented a way to reuse mtimes when the content does not change and an old build is available. However, I am not sure that works yet, as I have problems finding $BUILD_ROOT: that environment variable is not set.
That is a problem, as the iso would become non-reproducible if it were to embed these timestamps. Or maybe that needs to be fixed by honoring SOURCE_DATE_EPOCH in the iso generation step. Can you point to an example kiwi file?
I tried
I guess .build.oldpackages is in the BUILD_ROOT that is outside the kvm, so I don't have access to that. We could optimize away a few mtime changes in https://github.com/openSUSE/openSUSE-release-tools/blob/master/publish_distro . What do you think?
I think with #27 we can always apply it, even for iso images, as mkiso will then fix it again, instead of only for REPO_ONLY.
7200786 to debc19e
OK, mkiso cannot fix the file mtime timestamps (only the time in the volume info). That will be a reproducibility regression in the iso, but as the iso is not fully reproducible anyway due to the included .asc file that is generated on each iso build, that is fine; we can fix that later, see the comment in the code of #27. So with the mtimes now being touched in all cases, including for iso, I think this PR is ready.
.build.oldpackages is inside the VM, but not at product build time. The idea was that the meta packages themselves would check at their build time whether they produce new files with different content. In that case they should avoid setting the same timestamp as the old ones. That is something that could be done in a generic way in the %clamp_mtime_to_source_date_epoch rpm macro and would IMHO be much nicer than putting a workaround on top of a workaround. It would also work for all build types and not depend on the tool that is using these rpms.
This would make packages harder to reproduce, as each build would require the previous build output to reproduce the timestamps. In effect, fully reproducing a distribution would require redoing all builds ever done for that distribution in the exact same sequence (the sequence is currently not an effective build input). While technically still reproducible, it would in practice require too much build time for a rolling distribution that exists for years. Am I missing some solution? Do you have a better idea? Ideally we would fix mirroring to also consider the file content even when the metadata is equal, either by using a sync solution that can keep that as state so it does not need to rehash files all the time, or by accepting the disk I/O cost. If we ever do, we can remove this change again. It seems to me this PR's solution is the best next step:
Okay, but always touching the meta files is actually the opposite of reproducible builds: each product run would deliver different results, even when the file content has not changed. We wouldn't have an issue either if the meta package were reproducible (not sure if that is the case). However, we are dealing here with a situation where a change actually happens and is wanted. In that case it is IMHO wrong to keep the old mtime of the files via the %clamp_* macro. Now putting a workaround into product-builder that always leads to non-reproducible results still sounds like the wrong way to me. Another option would be to rename the files and include e.g. a hash sum of their content in the file name. product-builder could then create symlinks, and that should be handleable by the download redirector.
No, we have an issue purely because the mirrors will not copy files with the same stat but different content. The meta package is reproducible. No, it is not wrong to keep the same mtime despite changed content; that is how SOURCE_DATE_EPOCH is defined to work. Yes, it is a bit wrong, but it seemed more practical than fixing how all mirrors sync. Yes, adding a hash to all extracted file names is an option, if nothing trips over that. I'll implement that.
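For readers following along, the failure mode described here ("same stat but different content") can be reproduced with a small Python sketch. It is not code from this PR; it only mimics the size-plus-mtime comparison that rsync uses by default when no checksum option is given. The helper names `quick_check_would_skip` and `demo`, the file names, and the timestamp value are made up for illustration.

```python
import os
import tempfile


def quick_check_would_skip(src, dst):
    """Return True if a size+mtime comparison treats dst as up to date."""
    s, d = os.stat(src), os.stat(dst)
    return s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime)


def demo():
    """Create an origin and a mirror copy that differ only in content."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "origin_file")
        dst = os.path.join(tmp, "mirror_file")
        with open(dst, "wb") as f:
            f.write(b"old content")
        with open(src, "wb") as f:
            f.write(b"new content")  # same length, different bytes
        clamped = 1_600_000_000  # hypothetical clamped timestamp
        os.utime(src, (clamped, clamped))
        os.utime(dst, (clamped, clamped))
        skipped = quick_check_would_skip(src, dst)
        with open(src, "rb") as a, open(dst, "rb") as b:
            same_content = a.read() == b.read()
        return skipped, same_content
```

In this sketch `demo()` reports that the file would be skipped even though the bytes differ, which is the inconsistent-mirror situation under discussion.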
debc19e to 3fd5a87
Done, please re-review.
3fd5a87 to 5c7cd5e
I have noticed a problem with using symlinks: they won't work for the files the iso uses for booting. And they won't work if the iso is mounted as Joliet instead of Rock Ridge. So ideally the symlinks are only in the repo and not in the iso. I could replace them with hardlinks before making the iso and turn them back into symlinks after the iso is produced, but I am not sure that is the best way. What do you think?
Converting them works: https://github.com/openSUSE/product-builder/pull/27/files#diff-d21f610fbd9c740ba827ceed296d338963b19adcb81545df05a36a73374601e2R1059 Then that will need to be merged first for this one to work.
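The conversion idea (hardlinks while the iso is produced, symlinks again afterwards) could look roughly like the following Python sketch. This is an illustration under assumptions, not the code in #27: the function names, the directory layout, and the `content.abc123` file name are hypothetical, and the iso creation step itself is left out.

```python
import os
import tempfile


def symlinks_to_hardlinks(root):
    """Replace file symlinks under root with hardlinks; return what was changed."""
    converted = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            link = os.path.join(dirpath, name)
            if not os.path.islink(link):
                continue
            target = os.readlink(link)
            real = os.path.join(dirpath, target)  # resolves relative targets
            if not os.path.isfile(real):
                continue  # leave dangling or non-file links alone
            os.unlink(link)
            os.link(real, link)
            converted.append((link, target))
    return converted


def hardlinks_to_symlinks(converted):
    """Undo symlinks_to_hardlinks using the list it returned."""
    for link, target in converted:
        os.unlink(link)
        os.symlink(target, link)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        real = os.path.join(tmp, "content.abc123")  # hypothetical hashed name
        link = os.path.join(tmp, "content")
        with open(real, "wb") as f:
            f.write(b"payload")
        os.symlink("content.abc123", link)
        converted = symlinks_to_hardlinks(tmp)
        # ... the iso would be produced here ...
        during = (os.path.islink(link), os.path.samefile(link, real))
        hardlinks_to_symlinks(converted)
        after = (os.path.islink(link), os.readlink(link))
        return during, after
```

Keeping the list of converted links is one simple way to restore exactly the original symlink targets afterwards.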
5c7cd5e to d2d759d
Mirrors may use rsync with the skip on same mtime feature, which would skip files that are different in content but have the same mtime. This results in an inconsistent mirror.
Avoid this by creating symlinks to files with the real content named after the content hash.
When the rpm macro %clamp_mtime_to_source_date_epoch is set to Y to enable reproducible builds, the mtime of files in the rpm will be set to the date of the last changes entry, but build dependencies that affect the content may be newer. This is relevant when extracting such an rpm for a repo that is used by the installer. Some mirrors may fail to sync to the newest content as they skipped them. This would make an installer using that mirror fail.
Fixes: https://bugzilla.opensuse.org/show_bug.cgi?id=1148824
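As a rough illustration of the approach in this commit message, the following Python sketch moves a file to a name derived from its content hash and leaves a symlink under the original name. It is not the product-builder implementation: the naming scheme `<name>.<sha256>`, the choice of SHA-256, and the function names are assumptions made for the example.

```python
import hashlib
import os
import tempfile


def publish_with_hash_symlink(path):
    """Move path to a content-hash name and symlink the old name to it."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    directory, name = os.path.split(path)
    hashed_name = f"{name}.{digest}"  # hypothetical naming scheme
    os.replace(path, os.path.join(directory, hashed_name))
    os.symlink(hashed_name, path)  # relative link, stays valid on mirrors
    return hashed_name


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "content")
        names = []
        for payload in (b"build 1", b"build 2"):
            if os.path.lexists(path):
                os.unlink(path)  # drop the symlink from the previous build
            with open(path, "wb") as f:
                f.write(payload)
            names.append(publish_with_hash_symlink(path))
        with open(path, "rb") as f:
            via_link = f.read()
        return names, os.path.islink(path), via_link
```

Because changed content yields a new file name, a mirror sees a file it does not have yet and copies it, regardless of how the mtime was set.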
d2d759d to b405c1a
I still do not see any reason to work around issues created in the meta packages with additional code in product-builder. Either do the proper checking via .oldpackages in the meta packages, as mentioned before, or just disable the rpm functionality in the meta package spec files. You should be able to do so by setting either %define source_date_epoch_from_changelog 0 or %define clamp_mtime_to_source_date_epoch 0 in the spec file of the meta packages. That will have more or less the same effect: every new build will have new mtimes inside the rpm package. No need to add code in product-builder then. Sorry, going to close this request now.
In consequence that means you refuse to allow us to implement https://reproducible-builds.org/ for any OpenSUSE or SUSE distribution?
On Thursday, 7 September 2023, 15:26:06 CEST Jan Zerebecki wrote:
In consequence that means you refuse to allow us to implement https://reproducible-builds.org/ for any OpenSUSE or SUSE distribution?
No, that means that non-reproducible builds should not claim to be reproducible and then
get a workaround at another level that makes them _always_ _NOT_ reproducible.
This would basically add code to do A just to implement code to revert A at another level.
Either implement a proper reproducible build in the meta packages or disable
the fake mtime setting code in the meta package build.
…--
Adrian Schroeter ***@***.***>
Build Infrastructure Project Manager
SUSE Software Solutions Germany GmbH, Frankenstraße 146, 90461 Nürnberg, Germany
(HRB 36809, AG Nürnberg) Geschäftsführer: Ivo Totev
You have a totally different definition of reproducible builds than https://reproducible-builds.org/ here.
On Thursday, 7 September 2023, 15:41:27 CEST Jan Zerebecki wrote:
You have a totally different definition of reproducible builds than https://reproducible-builds.org/ here.
But I do not think you want the definition of reproducible builds you are using either, as that would mean unnecessarily rebuilding packages, massively increasing the mirror bandwidth used.
I see the violation in the meta packages, as they _claim_ to be reproducible, but they are not.
_If_ they would just copy over the timestamp of the copied binaries, you would not
have any problem.
But they ignore this otherwise perfectly reproducible state by
using their changelog timestamp workaround.
This is not reproducible in any way, and therefore it makes no sense to
add another workaround in product-builder to counteract this workaround.
Just taking over the timestamp from the original binaries in the meta packages
will solve this and make the builds reproducible at the same time.
The workarounds actually break the reproducibility and make things
unnecessarily complex.
We had a discussion elsewhere and you proposed something. I thought it might work, but I missed a corner case, so it doesn't work. You proposed: only while building meta rpm packages that will be extracted by product-builder, set the rpm macro to clamp mtime to false or unset it. Then ensure in the spec build script that the mtime of files is either taken from the input binary rpms or manually set to SOURCE_DATE_EPOCH.
Example of the case I forgot: There is a file that is built with a compiler. That file is included in a meta rpm package, never changes its name, and is extracted at product-builder time with its mtime kept from the rpm where it was compiled, and this mtime is exposed to rsync for mirrors. Then the compiler changes in such a way that the compiled file content changes. The compiler's changelog changes, but the file's own changelog doesn't change. Thus the file doesn't change its mtime. So the suggestion doesn't work.
In general: Even in a build system that always uses the current time for its output files, we cannot normally rely on unix time or mtime to be monotonically increasing (it is explicitly declared not to be) or otherwise derive causal relationships from it. The Google Spanner paper https://research.google/pubs/pub39966/ describes what is necessary to do this. OBS doesn't do anything to offer such a guarantee. The problem is just rare enough, and rebuilds that correct the problem happen often enough, that in practice it doesn't become visible. So instead of trying to detect causality from our flawed recording of time, we need to detect causality from the cause, the content change, because fixing our recording of time is prohibitively expensive in terms of complexity and performance. See also https://reproducible-builds.org/docs/source-date-epoch/#more-detailed-discussion and the following headings for some background about the relationship between time and reproducible builds.
So what do you suggest we should do instead?
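The corner case can be made concrete with a small Python sketch. It models clamping as "any mtime newer than SOURCE_DATE_EPOCH is set to SOURCE_DATE_EPOCH" and shows two builds whose output differs while the recorded mtime stays identical, so only the content hash reveals the change. The `build` and `clamp_mtime` helpers, the payloads, and the epoch value are hypothetical and stand in for the real rpm and OBS machinery.

```python
import hashlib
import os
import tempfile


def clamp_mtime(path, source_date_epoch):
    """Set the mtime to source_date_epoch if the file is newer than that."""
    if os.stat(path).st_mtime > source_date_epoch:
        os.utime(path, (source_date_epoch, source_date_epoch))


def build(path, content, source_date_epoch):
    """Write the 'build output' and return its (mtime, content hash)."""
    with open(path, "wb") as f:
        f.write(content)
    clamp_mtime(path, source_date_epoch)
    return int(os.stat(path).st_mtime), hashlib.sha256(content).hexdigest()


def demo():
    sde = 1_600_000_000  # hypothetical date of the last changelog entry
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "output")
        first = build(path, b"compiled with compiler 1", sde)
        second = build(path, b"compiled with compiler 2", sde)
    return first, second
```

Comparing the two results by mtime says nothing changed, while comparing by hash detects the change, which is the argument for deriving causality from content rather than from time.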