Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WCOW Base layer creation and export #901

Closed
wants to merge 3 commits into from

Conversation

TBBle
Copy link
Contributor

@TBBle TBBle commented Dec 2, 2020

Fixes: #853

Implementation of base-layer export and base-layer creation, intended to support FROM scratch non-bootable flows like #750, or supporting the containerd snapshotter in producing mountable, exportable layers which are not based on an existing MS-provided OCI image, so that the containerd snapshotter test suite can pass without significant changes.

It should work with bootable layers, or at least I've confirmed that passing a nanoserver tarball through these two new features produces something functionally-identical to the original nanoserver tarball.

So in theory, this could be used to "squash" several layers into a new single base layer, but I suspect that's not feasible for anyone except MS due to licensing requirements, and based on my learnings here, MS are clearly using a different tool internally to produce current container images.

Things I don't know, and will ignore unless told-otherwise:

  • Should bcd.bak and related backups be included in the exported tarstream, replacing the modified UtilityVM files they were backups of?
  • What does lack of a UtilityVM in a base layer actually mean?

Minimal base-layer export is working.

My test case for this stage is importing mcr.microsoft.com/windows/nanoserver:10.0.19042.630's base (and only) layer using wclayer import, then exporting it to a tarball and importing that tarball to a new directory.

Current differences/issues

No tests

It doesn't appear we have any wclayer tests, beyond some usage for wclayer.CreateScratchLayer on top of nanoserver or busyboxw, relying on Docker to provide the rest of the stack of images.

Certainly nothing testing import/export.

Files that are not round-tripping correctly: Probably working as intended, one open question.

Importing from my generated tarball ends up with different bcd.bak and bcd.log.bak compared to importing from the MS-built tarball. This is because the BCD in the tree is different after untarring, and the bcd.bak is a copy of the bcd from the original tarball.

❔ It's not clear if the exported tarball should invert the backup step, and redirect the BCD and BCD.LOG to use the bcd.bak and bcd.log.bak (etc) files that were saved during import. (Edit: See mutatedFiles and its usage. However, I don't think this is relevant to this PR anyway, as we're working at a lower layer than this.)

A bunch of other files differ, but they differ every time the same tarball is imported anyway.

And really, since my target here is not bootable images but FROM scratch for e.g., #750 or supporting containerd WCOW snapshots sanely, none of the being-changed files matter to me.

Details of differing files

Comparing a containerd import to the wclayer import from the same (MS-generated tarball), the following files differ:

$ diff -ru ../containerd-data/root/io.containerd.snapshotter.v1.windows/snapshots/1 nanoorig/
Binary files ../containerd-data/root/io.containerd.snapshotter.v1.windows/snapshots/1/blank.vhdx and nanoorig/blank.vhdx differ
Binary files ../containerd-data/root/io.containerd.snapshotter.v1.windows/snapshots/1/blank-base.vhdx and nanoorig/blank-base.vhdx differ
Binary files ../containerd-data/root/io.containerd.snapshotter.v1.windows/snapshots/1/UtilityVM/Files/EFI/Microsoft/Boot/BCD and nanoorig/UtilityVM/Files/EFI/Microsoft/Boot/BCD differ
Binary files ../containerd-data/root/io.containerd.snapshotter.v1.windows/snapshots/1/UtilityVM/Files/EFI/Microsoft/Boot/BCD.LOG and nanoorig/UtilityVM/Files/EFI/Microsoft/Boot/BCD.LOG differ
Binary files ../containerd-data/root/io.containerd.snapshotter.v1.windows/snapshots/1/UtilityVM/SystemTemplate.vhdx and nanoorig/UtilityVM/SystemTemplate.vhdx differ
Binary files ../containerd-data/root/io.containerd.snapshotter.v1.windows/snapshots/1/UtilityVM/SystemTemplateBase.vhdx and nanoorig/UtilityVM/SystemTemplateBase.vhdx differ

We know that the BCD files are modified by the import process, and the rest are presumably generated on-the-fly and end up with timestamps inside them or something.

Comparing the wclayer import of the original tarball to the wclayer import of the tarball I generated with wclayer export, the following files differ:

Binary files nanoorig/bcd.bak and newnano/bcd.bak differ
Binary files nanoorig/bcd.log.bak and newnano/bcd.log.bak differ
Binary files nanoorig/blank.vhdx and newnano/blank.vhdx differ
Binary files nanoorig/blank-base.vhdx and newnano/blank-base.vhdx differ
Binary files nanoorig/UtilityVM/Files/EFI/Microsoft/Boot/BCD and newnano/UtilityVM/Files/EFI/Microsoft/Boot/BCD differ
Binary files nanoorig/UtilityVM/Files/EFI/Microsoft/Boot/BCD.LOG and newnano/UtilityVM/Files/EFI/Microsoft/Boot/BCD.LOG differ
Binary files nanoorig/UtilityVM/SystemTemplate.vhdx and newnano/UtilityVM/SystemTemplate.vhdx differ
Binary files nanoorig/UtilityVM/SystemTemplateBase.vhdx and newnano/UtilityVM/SystemTemplateBase.vhdx differ

And nailing down what all the BCD files are

$ sha256sum */bcd.bak */UtilityVM/Files/EFI/Microsoft/Boot/BCD
71b8db95fdc7339e00bc6ac87505d5cf21987fcdd171184a88e91836a884a83b *nanoorig/bcd.bak
6d6597ec802a43cb9b53b3bddb2e6eccd806abc2871cbb454495a3e38a087227 *newnano/bcd.bak
6d6597ec802a43cb9b53b3bddb2e6eccd806abc2871cbb454495a3e38a087227 *newnano2/bcd.bak
6d6597ec802a43cb9b53b3bddb2e6eccd806abc2871cbb454495a3e38a087227 *nanoorig/UtilityVM/Files/EFI/Microsoft/Boot/BCD
3546633c9b234e14b30fc06b24fd0757a474b36771382764db7db9f6a0de3a7e *newnano/UtilityVM/Files/EFI/Microsoft/Boot/BCD
f084cc86c3b06a96e497d354003b0436a6210b4412c26cf65f5bff3d262b6764 *newnano2/UtilityVM/Files/EFI/Microsoft/Boot/BCD

We can see that the bcd.bak that comes from importing my generated tarball is the BCD from the from-MS imported Utility VM; bcd.bak in the from-MS import is presumably the BCD that's actually inside the tarball.

This is an oddity of ociwriter.ImportLayer. It backs up BCD and BCD.LOG*, but doesn't do anything with those backups, they are not referenced anywhere else in the hcsshim code base. I'm not sure if the intent was to copy them back after importing, or if they are supposed to be included in the export (i.e. invert the backup process).

File order is different: Only annoying when doing comparisons of the tar tv output.

I don't know what tool is being used to create the files published by MS, but clearly it's not filepath.Walk in Go, as they disagree about the ordering of case. I suspect the most-important thing here is that Files comes before UtilityVM. Order of processing files will affect hard-links in the tarball, but only cosmetically.

Hard links: Works for base layers, but not non-base layers (but the latter is unrelated to this PR).

Resolved, for this use-case. The layer export code now recognises hard-links on the filesystem, and adds them to the tar stream. The algorithm roughly matches the one given at the OCI Image Spec "Image Layer Filesystem Changeset" notes on hard-links.

The hard-links present in the nanoserver image are also present if that layer is reexported, but:

  • FIle order means a different file might be "the file" in the tarball, because the data is recorded against the first-seen name.
  • Somehow the nanoserver tarball from mcr contains hardlinks with different modtimes to their targets (observed in links from UtilityVM/Files/ to Files/). That information is lost when importing the layer (those details are a factor of the single file, not the multiple directory entries linking to it) so that's not possible to replicate. Nor desirable, I think. I'm not sure how that is possible in the first place, short of stale directory entries on the source system when creating the tarball.

In testing, I noticed that hard-links that exist in a non-base layer do not appear as hard-links when exporting, because modvmcompute.NewProc("ExportLayer") does not preserve hard links when copying the layer data into a temporary directory. It does not create BACKUP_LINK streams, and even if it did, go-winio's backuptar doesn't handle them anyway. It's not obvious if the BACKUP_LINK data format is well-defined but not documented, or is supposed to be application-defined, in which case the content would have to be part of ExportLayer's API and somehow make sense both intra-layer and inter-layer.

I also found that New-Item -ItemType HardLink didn't work to create cross-layer links (which makes sense, since I assume it wants to change the 'link count' on a file in a read-only layer), so I suspect that at this time, hcsshim will never export hard-links except when exporting a base layer (i.e. this new code). Maybe this used to work, since the code exists to handle both inter-layer and intra-layer hard links in tarballs during import.

Hard-link support also pulls in a fairly noisy Microsoft/go-winio bump which will resolvably conflict with #941, and since I was running go mod tidy as a matter of course, I think I inlined #942 in passing. If the latter lands first, this'll be less noisy, but the go-winio bump will still generate noise in both modules because it has bumped golang.org/x/sys quite a long way, which flows through into the vendor directory. (This was broken out into separate PRs)

Notes from before this was implemented

The tarball from MS for nanoserver's base layer includes lots of hardlinks, some between entries in Files, but most are from the UtilityVM/Files back to Files/. Not capturing these hard links when exporting the layer takes an uncompressed nanoserver base layer from about 280MB to about 430MB.

This one's a little intricate. winio.NewBackupFileReader doesn't generate winio.BackupLink, that's up to the code walking the tree, and I haven't yet worked out how, from Go, to actually identify that a file is a HardLink, which PowerShell can tell me, and track down the other links, which fsutil hardlinks can tell me. So the data's there, but the API has escaped my minimal searching. I am guessing I want BY_HANDLE_FILE_INFORMATION to tell me when something has hard-links, and match them by file index.

Dagnabbit: Go's os.File.Stat actually fetches exactly this structure, but doesn't grab the number of links, and the index values are not exported. So we'll be making syscalls of our own to achieve this. os.SameFile uses the indexes, but I can't see a good way to apply that here.

Also, github.com/Microsoft/go-winio/backuptar.WriteTarFileFromBackupStream doesn't handle winio.BackupLink anyway. So that would have to be implemented too, or intercept the call to backuptar.WriteTarFileFromBackupStream in ociwclayer.writeTarFromLayer like we do for deleted files. That's probably necessary anyway, as per the MS docs, the actual data in the winio.BackupLink object is up to the application. It would also mirror ociwclayer.writeLayerFromTar which similarly intercepts the call to writeBackupStreamFromTarAndSaveMutatedFiles for tar.TypeLink headers.

I did a quick test of just tarballing-up the tree using Windows-bundled tar, and it did find hardlinks correctly, although it disagreed about which one in the tarball was the "original", not that it matters. It also failed to include UtilityVM but did include UtilityVM\Files, but that could probably be fixed with a better command-line and more than the zero effort I put into this test:

tar cf simplenano.tar -C .\nanoorig Files UtilityVM\Files
No permission bits on the exported tarballs: This was already the case for the non-base layers.

backuptar.WriteTarFileFromBackupStream just doesn't have any way to set the permission bits on the tar headers it creates. I considered making it automatically apply 0644 for files and 0755 for directories to match the MS-generated tarballs, but (a) that's way over in go-winio, and is separate to this; and (b) I'm only 80% sure that won't invalidate every Windows Containers build-cache on the planet, and I don't think 20% is low-enough to risk having to change my name and go into hiding before an angry mob of CI/CD engineers comes looking for me.

Weirdly, the MS-built tarball doesn't have permission bits on the Files, UtilityVM, or UtilityVM/Files directories, which suggests that whatever is done to generate these tarballs, it's doing something fairly specialised and a bit unusual. And also suggests this isn't important.

Punted to microsoft/go-winio#184 for discussion/feedback.


Creating a base layer from some files on-disk is working.

This introduces a new API, hcsshim.ConvertToBaseLayer:

  • If any of the below files are missing, create them as empty files so hcsshim.ProcessBaseLayer will complete:
    • Files/Windows/System32/config/DEFAULT
    • Files/Windows/System32/config/SAM
    • Files/Windows/System32/config/SECURITY
    • Files/Windows/System32/config/SOFTWARE
    • Files/Windows/System32/config/SYSTEM
  • Checks that if UtilityVM/Files exists, it contains UtilityVM/Files/EFI/Microsoft/Boot/BCD so that hcschim.ProcessUtilityVMImage will complete (assuming the BCD is valid).
  • Runs ProcessBaseLayer and if necessary, ProcessUtilityVMImage
    • Basically, this is the part of (w *baseLayerWriter) Close() after reapplyDirectoryTimes.

I'm not sure when a UtilityVM is necessary. At the moment, I'm assuming it's only needed if the layer is used as the base of a container's scratch layer, and that's not the target use-case for this work.

As it happens, the test TestSCSIAddRemoveWCOW is a passable candidate for exercising this API and particularly the use-case I'm targeting, as it creates scratch layers for attaching to a container as SCSI disks. Before this change, those scratch layers are based on the image used to create the container in the first place, i.e. nanoserver. With this API, it is much more similar to TestSCSIAddRemoveLCOW which uses trivial empty EXT4 scratch layers.

All of the above could be handled by the hcsshim user, e.g., this was done (but never merged) by refactoring the containerd snapshot testsuite in containerd/containerd#2366 to use hcsshim.ProcessBaseLayer and fstest.Base, including defining fstest.Base in containerd/continuity in order to record an undocumented internal hcsshim requirement.

I propose instead to make containerd's snapshot Commit API call hcsshim.ConvertToBaseLayer at the appropriate point so that to callers of the snapshot system, no behaviour difference is seen between Windows and non-Windows behaviours.

My fallback is to take all the code I wrote here, and add it to containerd instead, which of course means no one else benefits, and the knowledge needed to call hcsshim.Process* remains buried over in some non-hcsshim project.

Current differences/issues

Test-case TODO question

testutilities.CreateWCOWBlankRWLayer has a TODO:

// TODO: This is wrong. Need to search the folders.

It appears that it intends to search the provided layers to find one with a UtilityVM folder, but it's not clear why, as its current use-case is for scratch layers attached as SCSI devices via the Utility VM, which does not, as far as I can see, require a Utility VM.

It's a TODO here because before this change, the top layer has a Utility VM, but with this change, it does not.

The test passes so I think I'm okay, but it's possible there's a downlevel issue, as I'm testing on Windows 10 20H2.

cd tests
go test -v -tags "uvmscsi" -run TestSCSIAddRemoveWCOW  ./...

There is no step 3

Well, there is, but not here. Assuming that this work is completed and merged, I need to revendor into containerd, and rebase my work there (see containerd/containerd#4415 (comment)) to take advantage of this new API, and hopefully get the snapshot integration test suite to start passing, which creates arbitrary base layers in every test.

@TBBle TBBle force-pushed the base-layer-manipulation branch from 7b8341c to ecb0129 Compare December 6, 2020 07:31
@TBBle TBBle changed the title Base layer manipulation other than 'import' WCOW Base layer creation and export Dec 6, 2020
@TBBle TBBle force-pushed the base-layer-manipulation branch 3 times, most recently from cff971b to 7a04cff Compare December 12, 2020 07:26
@TBBle TBBle marked this pull request as ready for review December 12, 2020 13:57
@TBBle TBBle requested a review from a team as a code owner December 12, 2020 13:57
internal/ociwclayer/export.go Outdated Show resolved Hide resolved
@TBBle TBBle force-pushed the base-layer-manipulation branch from 2ed80aa to 269aad0 Compare December 19, 2020 12:58
@TBBle TBBle force-pushed the base-layer-manipulation branch from 269aad0 to 9f2e17d Compare January 2, 2021 08:12
@TBBle
Copy link
Contributor Author

TBBle commented Jan 2, 2021

Per discussion in #916, the new API hcsshim.ConvertToBaseLayer might want to be exposed via pkgs instead. Although perhaps that is true of the entire layer.go API, which is already just thin bounces into internal/ anyway.

@TBBle
Copy link
Contributor Author

TBBle commented Jan 2, 2021

Another API thought. Since this is currently being added to layer.go where almost all the APIs take info DriverInfo, layerId string rather than path string, should ConvertToBaseLayer be the same?

I didn't do this initially because the two current exceptions to this API structure are the functions the new API wraps, ProcessBaseLayer and ProcessUtilityVMImage.

@TBBle TBBle force-pushed the base-layer-manipulation branch from 9f2e17d to 014672b Compare January 10, 2021 04:34
@TBBle TBBle force-pushed the base-layer-manipulation branch from 014672b to 57a0abb Compare January 16, 2021 04:46
@brasmith-ms
Copy link

Hey folks, jumping in here to provide an update and clear up some confusion from the Microsoft side. What held this up for so long were some reservations from our legal team on enabling an open-source tool to easily package and redistribute Windows binaries. With the secure supply chain executive order, and our general company policies on security, we are moving very cautiously on topics like these. With that said, I'm here to bring the good news that these issues have resolved, and this PR is good to move forward.

I know it's been incredibly frustrating for the community here. I want to emphasize that we are here to support the community and do the best we can to make things easy for you. This definitely took longer than it should have and we're changing how things are done on our end to ensure it doesn't happen again. If you have concerns or questions, always feel welcome to reach out to me. (Brandon Smith (MS) on K8s Slack)

@ambarve would you be able to move this forward from here?

@TBBle
Copy link
Contributor Author

TBBle commented Jul 20, 2022

Glad to hear that. I've clearly got some rebasing and updating to do, I'll try and make some time for that soon but I'm pretty snowed under at the moment. Maybe on the coming weekend I can flush the rebasing through, at least.

Due to #1375, I can't easily test this with the containerd Snapshotter test-suite as I did in the past. So I'll also have to pull-in and complete the tests I mentioned in April, and ensure everything's working on both LTSC 2019 and LTSC 2022 (although I'm confident it will be, since I squashed one bug and know of only one other likely stumbling point, for which I have a proposed fix). At least those platforms are available on GHA now.

@dcantah
Copy link
Contributor

dcantah commented Jul 20, 2022

@TBBle I'm trying to resolve #1375 with this #1463 if you want to take a look. Still going back and forth on that

@TBBle TBBle force-pushed the base-layer-manipulation branch from 7028dda to 23cd100 Compare July 23, 2022 08:01
@TBBle
Copy link
Contributor Author

TBBle commented Jul 23, 2022

Sorry, I haven't had a chance to look closely at #1463 yet. It shouldn't block this (unless an API I need changes...) as I think it's better to do tests in this repo anyway. I've rebased my existing work (it wasn't as bad as I feared, most of the big scary conflict warnings turned out to be delete/edit over test/vendor, and they're trivial to resolve.

I've pushed my WIP tests over to https://github.com/TBBle/hcsshim/tree/layer-manipulation-tests and will perhaps go over them tonight or tomorrow, and try and get them clean-enough (and passing-enough, and high-coverage-enough) to pull into this PR, in which case I'd be confident this code is ready-to-go.

I see we now have a containerd integration test in the CI suite, but my Go practice is a little rusty, and I'm not sure if it will pick up the checked-out hcsshim in its containerd build as I suspect it will prefer the vendored version. So once #1375 has been resolved in some way, if we force teach the containerd integration test to build against the checked-out hcsshim, then we get the containerd snapshotter test suite also running for us, once that side lands too in containerd/containerd#4419 or similar. (There's significant runway before that PR lands, which is another good reason to land this code here with its own tests.)

TBBle added a commit to TBBle/containerd that referenced this pull request Jul 23, 2022
This is to gain access to:
* microsoft/hcsshim#901

TODO: Does not currently pull in the first commit of
https://github.com/TBBle/hcsshim/commits/hack-base-layer-manipulation-with-128layers-test
as it needs rebasing onto current hcsshim master.

That is a proposed fix for a post-RS5 change in behaviour, which needs
to be shown necessary on Windows Server 2022 as it was only observed on
my Windows 10 20H2 (or earlier) box, well before Windows Server 2022
shipped, and may no longer be needed.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
@dcantah
Copy link
Contributor

dcantah commented Jul 23, 2022

Sorry, I haven't had a chance to look closely at #1463 yet.

No worries at all. I found out I'm calling that FormatWritableLayerVhd function internally in a helper in the same package also which is rather annoying.. Hoping to have a clearer picture this coming week on what we wanna do there.

I see we now have a containerd integration test in the CI suite, but my Go practice is a little rusty, and I'm not sure if it will pick up the checked-out hcsshim in its containerd build as I suspect it will prefer the vendored version.

The containerd integration suite here should run against/build a shim from whatever is in the current PR. So containerd main + your branch for hcsshim.

@TBBle
Copy link
Contributor Author

TBBle commented Jul 24, 2022

The problem for the integration test is that the code I'm working on isn't in the shim, it's an in-tree snapshotter in containerd linking against hcsshim directly. If it was an external snapshotter instead, that'd be a different question (and also solves the manifesting blocker in-passing). Although there isn't currently support as I recall an external mounter in containerd, so we'd be a significant refactoring away from being able to decouple fully anyway, if that was a useful direction.

On option would be to programmatically add a replace rule to containerd's go.mod to use the local checkout of hcsshim for its in-tree work. That (but manually) is how my testing of the snapshotter worked in the containerd tree. That whole idea though is somewhat flawed, as unlike the runtime shim, there isn't a stable API/ABI boundary for this part, so such tests would fail when hcsshim evolves its API in breaking ways, ala 0.9 and (I assume) 0.10.

@dcantah
Copy link
Contributor

dcantah commented Jul 24, 2022

@TBBle Ahh gotcha gotcha, you meant the containerd suite. Yea it will still use whatever is in its go.mod for code it uses hcsshim as a library for. External snapshotter solving the manifesting issue is an interesting thought also.. That's actually one of the suggestions we heard internally, but I'm not sure it makes sense for the windows snapshotter to not be built in

@TBBle
Copy link
Contributor Author

TBBle commented Jul 24, 2022

I also had "take the snapshotter external" as an option in the first "hcsshim ABI broke hard" ticket; in a green-field development, knowing what I know now, I'd have built the Windows snapshotter as an external snapshotter in this repo, since it parallels the external runtime shim and better-isolates containerd from vagaries of HCS/computerstorage evolution. (And this is the direction the OS X containerd work was strongly encouraged by the containerd team, which is what put it on my radar at all; in that case, they also farmed off mounting by defining a CLI utility as the API boundary).

But given the half-decade-or-so weight of code for the Windows Snapshotter already in containerd, as well as having access to the snapshotter test suite once we get that working, it'd be a time-costly and burden-shifting change that'd need more justification and stakeholder buy-in than I'm currently aware of.

aznashwan pushed a commit to aznashwan/containerd that referenced this pull request Aug 1, 2022
This is to gain access to:
* microsoft/hcsshim#901

TODO: Does not currently pull in the first commit of
https://github.com/TBBle/hcsshim/commits/hack-base-layer-manipulation-with-128layers-test
as it needs rebasing onto current hcsshim master.

That is a proposed fix for a post-RS5 change in behaviour, which needs
to be shown necessary on Windows Server 2022 as it was only observed on
my Windows 10 20H2 (or earlier) box, well before Windows Server 2022
shipped, and may no longer be needed.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
TBBle added 3 commits August 11, 2022 01:26
This is the inverse of the baseLayerWriter: It walks Files/ and
UtilityVM/Files/ (if present) and ignores the rest of the layer data,
as it will be recreated when the layer is imported.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
This API allows turning any collection of files into a WCOW base layer.

It will create the necessary files in Files/ for
hcsshim.ProcessBaseLayer to function, validate the necessary files for
hcsshim.ProcessUtilityVMImage if UtilityVM/ exists, and then call those
two APIs to complete the process.

Calling this on a directory containing an untarred base layer OCI
tarball, gives a very similar outcome to passing the tar stream through
ociwclayer.ImportLayer.

The new API is used in `TestSCSIAddRemoveWCOW` to create nearly-empty
base layers for the scratch layers attached and removed from the utility
VM.

A wclayer command is also introduced: `makebaselayer` for testing and
validation purposes.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
@TBBle TBBle force-pushed the base-layer-manipulation branch from 23cd100 to d376404 Compare August 10, 2022 15:27
TBBle added a commit to TBBle/containerd that referenced this pull request Aug 10, 2022
This is to gain access to:
* microsoft/hcsshim#901

TODO: Does not currently pull in the first commit of
https://github.com/TBBle/hcsshim/commits/hack-base-layer-manipulation-with-128layers-test
as it needs rebasing onto current hcsshim master.

That is a proposed fix for a post-RS5 change in behaviour, which needs
to be shown necessary on Windows Server 2022 as it was only observed on
my Windows 10 20H2 (or earlier) box, well before Windows Server 2022
shipped, and may no longer be needed.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
@TBBle
Copy link
Contributor Author

TBBle commented Aug 11, 2022

Carrying back results from the now-unblocked testing with my containerd PR that depends on this work, and it fails in what looks like #901 (comment) even on Windows Server 2019. So either something else has gone wrong, or there's been a platform update that pulled in whatever change introduced that in later Windows builds.

So for this PR, I'll need to finish my existing test suite work, and get it to the point where I can replicate that issue in this repo directly, and then rebase/reinstate/recreate TBBle@1803a21 into this PR as well, since it looks like it's all-versions necessary now. (A quick test suggests the cherry-pick will be easy, it didn't conflict anywhere except go.mod/go.sum and vendoring)

Note to self: The repro case seen in containerd is the import from tar of an exported tarball (i.e. wclayer export -> wclayer import; the latter hits internal/wclayer.ImportLayer via pkg/ociwclayer.ImportLayerFromTar) so it won't be hard to repro in the tests.

Later edit: I had a few minutes to do the rebase of the "hives" commit (on a new branch, https://github.com/TBBle/hcsshim/tree/base-layer-manipulation-with-hives, to keep things separate) and although the code itself rebased fine, I noticed a couple of issues:

  • go mod vendor behaviour has changed, and the relevant data is no longer added to the vendor directory.
  • When pulling hcsshim into containerd, it pulled and vendored the hive-manipulation library.

This suggests I should rework the go generate code to just fetch the blob from GitHub. Anyway, the important thing is to confirm that this leaves us in a good state for the containerd snapshotter, or reveals some other bitrot or API drift that's gone unnoticed while that PR was blocked by time or circumstances.

TBBle added a commit to TBBle/containerd that referenced this pull request Aug 11, 2022
This is to gain access to:
* microsoft/hcsshim#901
* TBBle/hcsshim@035362c

The latter is a WIP fix for a post-RS5 change in behaviour, which needs
to be shown necessary on Windows Server 2022 as it was only observed on
my Windows 10 20H2 (or earlier) box, well before Windows Server 2022
shipped, and may no longer be needed.

On the other hand, it recently appeared in a Windows Server 2019 (RS5)
testsuite run, so perhaps it's needed everywhere.

That's what we're testing here, as hcsshim does not have a thorough
test-suite for these codepaths yet.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
@TBBle TBBle marked this pull request as draft September 27, 2022 14:04
@TBBle
Copy link
Contributor Author

TBBle commented Sep 27, 2022

Converted to draft as I'm currently moving countries, and hence don't have time or computing power for at least the next month (and maybe longer... we'll see) to rebase and resolve merge conflicts, let-alone complete the work described in #901 (comment).

@gabriel-samfira
Copy link
Contributor

gabriel-samfira commented Sep 29, 2022

So for this PR, I'll need to finish my existing test suite work, and get it to the point where I can replicate that issue in this repo directly, and then rebase/reinstate/recreate TBBle@1803a21 into this PR as well, since it looks like it's all-versions necessary now. (A quick test suggests the cherry-pick will be easy, it didn't conflict anywhere except go.mod/go.sum and vendoring)

To generate the minimal hive, how about something like this:

package main

import (
	"fmt"
	"log"
	"os"
	"syscall"

	winio "github.com/Microsoft/go-winio"
	"golang.org/x/sys/windows/registry"
)

//go:generate go run golang.org/x/sys/windows/mkwinsyscall -output zsyscall_windows.go main.go
//sys   RegSaveKeyW(key syscall.Handle, file *uint16, access uint32) (regerrno error) = advapi32.RegSaveKeyW

func main() {
	privileges := []string{"SeBackupPrivilege"}
	if err := winio.EnableProcessPrivileges(privileges); err != nil {
		log.Fatal(err)
	}
	defer winio.DisableProcessPrivileges(privileges)

	k, existing, err := registry.CreateKey(syscall.HKEY_CURRENT_USER, "tempKey", registry.ALL_ACCESS)
	if err != nil {
		log.Fatal(err)
	}
	defer registry.DeleteKey(syscall.HKEY_CURRENT_USER, "tempKey")

	fmt.Println(k, existing)

	saveTo := "C:\\Users\\Administrator\\reg\\save.dat"
	if _, err := os.Stat(saveTo); err == nil {
		os.Remove(saveTo)
	}

	if err := RegSaveKeyW(syscall.Handle(k), syscall.StringToUTF16Ptr(saveTo), 0); err != nil {
		log.Fatal(err)
	}
}

This exports an empty key. That exported file should be usable as a valid hive. The tempKey becomes the root of the new hive. I think we can also just embed the empty hive as a base64 string in the code if there are no licensing issues.

@riverar
Copy link

riverar commented Sep 29, 2022

Use the offline registry library functions instead https://learn.microsoft.com/windows/win32/devnotes/offline-registry-library-functions

@gabriel-samfira
Copy link
Contributor

Use the offline registry library functions instead https://learn.microsoft.com/windows/win32/devnotes/offline-registry-library-functions

That's a lot better! Thanks for the link!

@gabriel-samfira
Copy link
Contributor

@TBBle

Created a PR against your branch to generate empty hives using the offline registry library. I know you don't have time now, so have a look at your earliest convenience.

@TBBle
Copy link
Contributor Author

TBBle commented Mar 2, 2023

#1637, the successor for this PR, has merged.

@TBBle TBBle closed this Mar 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Is it possible to create an empty scratch layer?