Error building 4.25.3 windows server 2019 "Not enough space on disk" #99
Comments
Exactly the same issue. I tried increasing the maximum image size to 400G, but it did not help. Is there any resolution to this? |
Docker Engine 19.03.6 upgraded a library in order to fix a problem on Windows 1903/1909, but that shouldn't be the problem here, and I can't see anything else relevant in the release notes. Nonetheless, I'd suggest upgrading your Docker EE install to the latest version (should be 19.03.13) and trying again. The only other thing I'd suggest is making sure that the drive with the container images has sufficient free space, in case it's not the same drive ue4-docker is monitoring with And to be absolutely sure that we do have sufficient disk space in the container, try running:
to make sure that the container is seeing the image size setting correctly. |
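The first suggestion above (confirming that the drive holding the container images really has free space) can be checked directly. This is a minimal Python sketch using only the standard library; the `free_space_gib` helper and the `"."` path are illustrative assumptions, so point it at wherever your Docker data root actually lives.

```python
import shutil


def free_space_gib(path: str) -> float:
    """Return the free space, in GiB, on the volume that contains `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / (1024 ** 3)


if __name__ == "__main__":
    # "." is a placeholder; replace it with the drive holding your Docker data root.
    print(f"{free_space_gib('.'):.1f} GiB free")
```

Note that this only reports host-side free space; it says nothing about the size of the sandbox a container or build step actually sees.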
I am seeing the same problem. My Docker engine is v19.03.13. This is the relevant snippet:
My volume size is configured at 600 and there is free space in the hard disk. Any ideas welcomed! |
@jordirovira That looks like a different issue. Try your build again with |
You are right, that fixed it. Thanks! I'll follow that issue to look for more information. |
I am running into the exact same issue as @jordirovira:
But I have upgraded to the 'Edge' version of docker, which includes the fix for the 8gig issue:
I have plenty of disk space, and my config is set to 600GB image sizes as well:
I am just trying to run a
If I inspect it appears the UE4Rules.pdb file is 0 bytes. This tells me I must not provide So now I am stuck. Is anybody else able to get ue4-docker to work right now? What solutions have you done to work around this? Thanks |
@Agendum same as jordirovira, that looks like a different issue. That issue is #44, not this issue, and there's a fix already in upstream Docker, pending the next major Docker Engine release. That said, I'm using |
Thanks @TBBle. I wish it was issue #44. In my post I showed I am using the 'Edge' version of Docker which includes the fix for #44, as well as the output of Unfortunately on Unreal Engine 4.25.3 it fails when trying to access UE4Rules.PDB no matter what I try. Why was it decided for ue4-docker to truncate symbol files instead of delete them? I wonder if this is something I can modify in my local ue4-docker install? @jforand you created this thread for 4.25.3 and said Thanks |
Ah, sorry, I hadn't noticed that 20.10.0 beta1 included my fix. Also, having a closer look, I have misdiagnosed both yours and jordirovira's problem as #44, but they are not: They are both failing with the I wonder if this is a different 8gig bug in Docker Engine, that the I'll try and create a reproduction case for this today (adding it to the 8gig test-case) and if so, see if we can get it fixed upstream in time for the 20.10 Docker Engine release. |
Oh, as far as truncating the PDB files instead of deleting them, I am assuming this is because the UE4 BuildGraph system will try to copy them or check for updatedness or something, and fail if the files are not present. But I'm not sure about that, @adamrehn probably can explain that choice. |
A trivial attempt to reproduce the new issue didn't trigger it on my Windows 10 2004 desktop machine. I basically replaced the Dockerfile for the 8gig diagnostics test on Windows with:
# escape=`
ARG BASETAG
FROM mcr.microsoft.com/windows/servercore:${BASETAG} AS builder
SHELL ["cmd", "/S", "/C"]
# Add a sentinel label so we can easily identify intermediate images
LABEL com.adamrehn.ue4-docker.sentinel="1"
# Write an 8GiB sparse file
RUN powershell "fsutil.exe file createnew file 8589934592"
# Start a new target
FROM mcr.microsoft.com/windows/servercore:${BASETAG}
SHELL ["cmd", "/S", "/C"]
# Add a sentinel label so we can easily identify intermediate images
LABEL com.adamrehn.ue4-docker.sentinel="1"
# Copy the 8GiB file from the builder
COPY --from=builder C:\file C:\file
and ran the test:
So more investigation is needed. |
Thanks @TBBle, have you been able to reproduce the bug with I am going to try again using Windows 10 2009. |
Yeah, right now the only machine I have with sufficient space to try and reproduce your problem is the UE4 Windows build node in our CI cluster, which I don't want to upgrade to the beta Docker Engine release, even if it has been published to the Microsoft Docker Provider. I'll try and free up some space, or perhaps get hold of another (disposable) build node, but I can't promise fast turnaround as this is a background task for me. |
I updated WindowsUtils.py so that _validTags includes '2009' and also changed getReleaseBaseTag to support '2009'. I ran
I can only conclude that is another bug in Docker's I wonder if I delete PDBs under UnrealEngine\Engine\Plugins\Animation\LiveLink** before the COPY if it would work. |
It's odd because We're clearly in So there's nothing in Docker that's checking the disk space, so the failure must be coming from I'm going to see if the problem reproduces on my desktop machine over the weekend, but as has been noted, I don't actually have 600GB of free space on that machine, so I may hit problems related to that before I reproduce this issue. Hopefully I can free up some space before that happens, or it reproduces before consuming all that space. |
In the Dockerfile I added this line:
But that failed with:
So I changed the delete line to be:
And this one failed with:
This one makes me more suspicious. Would hdf5.a really be so large?
The file is only about 4MB. Also, I went to get the size of the whole directory being copied:
The total amount of content being copied over is 27.75GB. To isolate the problem down I created this Dockerfile:
The output is:
So there is 599.8GB available, and it is trying to copy over a total of 27.75GB but it fails with out of space. My docker configuration is:
And...
Over 1.22TB available on the physical disk. It feels like there is a bug here in Docker's COPY. Any idea what else I can do to narrow this down? Thanks |
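For anyone repeating the measurement above, the total size of the directory being copied can be computed independently of Docker. The following is a minimal Python sketch using only the standard library; `tree_size_bytes` is a hypothetical helper name, not part of ue4-docker, and the path you pass in is up to you.

```python
import os


def tree_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under `root`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


if __name__ == "__main__":
    # "." is a placeholder; replace it with the directory being copied.
    print(f"{tree_size_bytes('.') / (1024 ** 3):.2f} GiB")
```

Comparing this figure against the free space reported inside the container shows whether the COPY should fit on paper.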
How much space do you have free on your C: drive? I haven't looked through Docker, but containerd (really hcsshim) ends up with things in Another possibility is that the way One thing I'll note is that microsoft/hcsshim#718 (which worked around a Windows issue with non-default sandbox size) does not apply to Windows 2004 onwards. So if there's still (or another) sandbox-resizing bug around, we might be stumbling over that. My desktop machine is still building the image for me to repro this locally (it takes days...), so I can't test this myself right now. You could also perhaps narrow it down by using And for an even-more-hassle test, if you downgraded Docker to 19.03, and tried your isolation-test Dockerfile, then it will either fail the same way (which means the |
Thanks @TBBle. My C: drive has about 160GB of free space. I ran I tried doing a
So the only thing I could think of was to mount it and copy files over...
Files were copied just fine. But this likely doesn't test the code path you are thinking about. |
Yeah, copying through a volume mount won't hit the codepath I'm suspicious of, it's specifically Am I correct in understanding from the error you got on I also read back again, and noticed that @jordirovira was seeing the same issue (with I'm currently watching the Fun! For reference, the command I used to repro was
And I did need to have 600GB of disk space free, as my attempt with 530GB free ran out of space archiving the big layer in step 16. >_< Also, if playing along at home, it seems Hyper-V isolation locks up the build as there's a busy-wait in UE4 where the shader compile worker gets starved of CPU due to only having two cores under Hyper-V. Working out why process isolation isn't working for me is a separate thing I need to get to. >_< |
I'm finally starting on debugging this. The debug-level logs from the dockerd are:
given a Dockerfile of
FROM ue4-builder-stage-19:4.25.3-2004 AS builder
FROM mcr.microsoft.com/windows/servercore:2004
# Repro of https://github.com/adamrehn/ue4-docker/issues/99#issuecomment-716697043
COPY --from=builder C:/UnrealEngine/LocalBuilds/Engine/Windows C:/UnrealEngine
where ue4-builder-stage-19:4.25.3-2004 is a tag I added to the 'builder' result in the ue4-docker run (stage 19). I'm a little suspicious that the server side doesn't see the "out of space" error but a "closed pipe" failure when adding the file to the tar stream, so I'm wondering if that's a red herring or a translation of a more generic failure. Either that, or it just doesn't log the failure it's sending back to the client.
Edit: Confirmed that the
Edit: Confirmed that the problem is that the volume being copied to is only 20GB (21339549696 bytes), so it seems that the storage opts are not being applied correctly (or sandbox resizing is failing) when the layer is mounted for copying-to. This should be easy to create a more-focused test for now, although Logs with hacked-in messages:
Edit: And it turns out now this was reported to Docker in 2018: moby/moby#37352 |
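The arithmetic behind that diagnosis is easy to check. The sketch below is a hedged illustration in Python: the sandbox byte count is the value quoted in this thread, while treating the earlier "27.75GB" payload figure as GiB is an assumption, and `fits` is a hypothetical helper that ignores filesystem overhead.

```python
GIB = 1024 ** 3

# Volume size reported in the thread for the layer being copied to.
SANDBOX_BYTES = 21_339_549_696

# Payload size reported earlier in the thread; reading "27.75GB" as GiB is an assumption.
PAYLOAD_BYTES = int(27.75 * GIB)


def fits(payload_bytes: int, volume_bytes: int) -> bool:
    """True if the payload is no larger than the volume (ignores filesystem overhead)."""
    return payload_bytes <= volume_bytes


if __name__ == "__main__":
    print(f"sandbox: {SANDBOX_BYTES / GIB:.2f} GiB")
    print(f"payload: {PAYLOAD_BYTES / GIB:.2f} GiB")
    print("fits" if fits(PAYLOAD_BYTES, SANDBOX_BYTES) else "does not fit")
```

Under these assumptions the payload exceeds the default-sized sandbox regardless of how much free space the host or the configured storage options report, which matches the "out of space" failure.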
I'm now pushing the For the record, the repro case I'm using (and aiming to add to the Docker Engine integration test suite) for this is:
FROM mcr.microsoft.com/windows/servercore:2004 AS intermediate
WORKDIR C:\\stuff
RUN dir
# Create and delete a 21GB file
RUN fsutil file createnew C:\\stuff\\bigfile_0.txt 22548578304 && del bigfile_0.txt
# Create three 7GB files
RUN fsutil file createnew C:\\stuff\\bigfile_1.txt 7516192768
RUN fsutil file createnew C:\\stuff\\bigfile_2.txt 7516192768
RUN fsutil file createnew C:\\stuff\\bigfile_3.txt 7516192768
# Copy that 21GB of data out into a new target
FROM mcr.microsoft.com/windows/servercore:2004
COPY --from=intermediate C:\\stuff C:\\stuff
which tests that both And since I've been doing repros, I think @jforand's original issue in this ticket is simply that the 4.25.3 Windows UE4 build takes up more than 200GB of space. My stage-19 image (the last one before we start copying just the stuff we need into new clean images) for 4.25.3 on Windows 10 2004 is 263GB, and the peak size was larger than that during the build, as part of the build script deletes a bunch of data before we finish. I don't recall 4.24.3 being that large, so I suspect 4.25.3 just added a bunch of new stuff. This same growth is probably why we're hitting the I'm not sure why @arisona had the same issue as @jforand with a 400GB sandbox size; that seems like it ought to be enough for the compile stage... but perhaps not. I had a successful build with 600GB sandbox size, which was pretty much all the available space on my disk. So yeah... Three different issues here, all of which seem to lead back to "4.25 was sufficiently bigger than 4.24 to push us over a bunch of limits we were already close to hitting". |
Thanks @TBBle. Do you think splitting the ue4-docker COPY statement into multiple sub-copies would let me work around this? |
I didn't test to see if each COPY command gets 20GB of free space, but I suspect it does, as the sandbox size doesn't include parent layers. So that might work around the issue. |
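If someone wants to try that workaround, the sub-copies need to be grouped so that each one stays under the limit. The following is only a sketch in Python: `batch_by_size` is a hypothetical helper, the input is a mapping of names to sizes that you would have to measure yourself, and it rests on the unconfirmed assumption stated above that each COPY step gets its own 20GB of sandbox space.

```python
from typing import Dict, List


def batch_by_size(sizes: Dict[str, int], limit_bytes: int) -> List[List[str]]:
    """Greedily group named items so each group's total stays within `limit_bytes`.

    An item larger than the limit ends up alone in its own group; the caller
    would need to split it further.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_total = 0
    # Largest first tends to give tighter packing for a simple greedy pass.
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        if current and current_total + size > limit_bytes:
            batches.append(current)
            current, current_total = [], 0
        current.append(name)
        current_total += size
    if current:
        batches.append(current)
    return batches
```

Each resulting group would then correspond to one COPY instruction. A single subdirectory that is itself over the limit would still fail, so this only helps when the payload divides into small enough pieces.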
Good news: A dockerd.exe built from moby/moby#41636 (Docker Engine master plus fixes for the
although it used the cache from my earlier builds with vanilla 20.10.0-beta1. I'm sure it would have built okay from scratch, but I didn't really want to wait another 19 hours to find out.
Huh. That's going to be too big for Amazon ECR (20GB limit), for example. I guess once we have the ue4-docker build/plugin system rework, I can redirect PDBs off to a Symbol Store instead, and exclude them from the final image. If |
In this case, it has to be the I don't recall why I had the 3x7GB steps instead of creating a 21GB file and then using that as That disk-space issue is specific to the Azure nodes used by Moby on their Jenkins, it shouldn't affect anyone who's going to be able to run ue4-docker anyway. So
should work as a diagnostic for ue4-docker. (Although does the nanoserver image have fsutil?) You could perhaps |
I've created PR #147 that adds diagnostic for 20GiB COPY issue. |
This became a blocker issue for 5.0.0-preview-1. We strongly need a usable dockerd with fixes from moby/moby#41636 |
I got tired of waiting for a Moby release, so here you are: a drop-in |
So.
I don't see what else can be done here. Everyone just sits and waits until Moby finally makes a new release with a fix. Closing. |
@slonopotamus do you think it's worth adding a note to the output of the |
Okay, will do. |
I'd forgotten about it for a minute, but here it is: #255 |
Output of the ue4-docker info command:
I was previously able to build 4.24.3 on this same machine.
When trying to build 4.25.3 (with "ue4-docker build 4.25.3 --exclude debug --exclude templates --monitor"), I get the following error.
Output of "docker images" if it's relevant