
Periodic Linking Issue on Windows #3420

Closed
1 of 7 tasks
iamrecursion opened this issue May 19, 2021 · 22 comments
Labels
Area: Common Tools; investigate (collect additional information, like space on disk, other tool incompatibilities, etc.); OS: Windows


@iamrecursion

Description
We have sporadic failures in which the linker on Windows fails to load a required DLL, with the following linker error:

LINK : fatal error LNK1171: unable to load mspdbcore.dll (error code: 1455)

This is a run where the failure occurred, and this is a run where it did not, despite no material change to the Actions workflow.

Area for Triage:

Question, Bug, or Feature?:
Bug

Virtual environments affected

  • Ubuntu 16.04
  • Ubuntu 18.04
  • Ubuntu 20.04
  • macOS 10.15
  • macOS 11
  • Windows Server 2016 R2
  • Windows Server 2019

Image version

Virtual Environment
  Environment: windows-2019
  Version: 20210509.1
  Included Software: https://github.com/actions/virtual-environments/blob/win19/20210509.1/images/win/Windows2019-Readme.md
  Image Release: https://github.com/actions/virtual-environments/releases/tag/win19%2F20210509.1

Expected behavior
Successful linking.

Actual behavior
Linking fails randomly.

Repro steps
I can't provide a repro as this appears to happen at random.

@Darleev
Contributor

Darleev commented May 19, 2021

Hello @iamrecursion,
We will investigate the issue.

@Darleev added the OS: Windows, Area: Common Tools, and investigate labels and removed the needs triage label on May 19, 2021
@iamrecursion
Author

Thanks. It's very annoying at the moment, as it means that about half of our builds fail.

@dsame
Contributor

dsame commented May 21, 2021

@iamrecursion Can you please make sure you are using the same version of sbt for the failed and successful builds?

@iamrecursion
Author

They were both done with the same version, yes. Nothing in the project configuration changed between those two runs.

@dsame
Contributor

dsame commented May 24, 2021

@iamrecursion
Is it possible to restore the failed task "Build the PM Distribution" in the current repo? None of the existing branches have that task, so the problem cannot be investigated.

The existing successful log does not show the exact linker output, so the link options cannot be checked.

I suspect that some VC/sbt/GraalVM defaults have changed and the required library is no longer on the path.

@iamrecursion
Author

My apologies. This is the real job that fails. "Build the PM distribution" was purely a debugging task for me.

@dsame
Contributor

dsame commented May 31, 2021

@iamrecursion
I see many "Sleep 1" steps in your workflow. Can you please explain their purpose? My guess is that some download has not completed by the time the build step runs, but I cannot pinpoint which one.

@dsame
Contributor

dsame commented May 31, 2021

@iamrecursion can you please remove the cache step from the build? Caching is a common source of unstable builds. When I commented out the cache step, I got two successful builds, although I'm not sure how stable that is in real life.

@radeusgd

radeusgd commented Jun 1, 2021

@dsame
The sleeps were added because sbt was sometimes unstable and would crash on init, and we saw empirically that adding them made it stable. The download-in-the-background hypothesis sounds interesting, but if I recall correctly, it was not uncommon with sbt for the first few invocations in a given run to work and only a subsequent one to crash on initialization. In that case I doubt it could be a download, since the earlier invocations should have crashed too, right?

I've created a PR (linked above) where I disabled the caches for the main CI; let's see how it affects the issue. Here's the first run (currently still in progress). However, I don't understand how sbt caches could affect libraries that are MSVC components.

@dsame
Copy link
Contributor

dsame commented Jun 2, 2021

@radeusgd at the moment I suspect the build uses mspdbcore.dll from the following paths (or does not use them and uses some other path instead):

/LIBPATH:C:\hostedtoolcache\windows\GraalVM\java11-windows-amd64-21.1.0\x64\lib\static\windows-amd64 
/LIBPATH:C:\hostedtoolcache\windows\GraalVM\java11-windows-amd64-21.1.0\x64\lib\svm\clibraries\windows-amd64 

and that there is some 32/64-bit executable conflict.
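One way to test the 32/64-bit conflict hypothesis would be to check which architecture each candidate mspdbcore.dll (or any other DLL on those paths) was built for. The sketch below, in Python, reads the Machine field of the COFF file header as laid out in the PE/COFF format. The helper names `pe_machine` and `dll_machine` are hypothetical, and the DLL paths must be supplied by the caller, since the exact MSVC install location is not given in this thread.

```python
import struct
import sys

# IMAGE_FILE_HEADER.Machine values defined by the PE/COFF format.
MACHINE_TYPES = {
    0x014C: "x86 (32-bit)",
    0x8664: "x64 (64-bit)",
    0xAA64: "ARM64",
}


def pe_machine(data: bytes) -> str:
    """Return the target architecture recorded in a PE image's COFF header."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ header")
    # e_lfanew, a little-endian DWORD at offset 0x3C, is the file offset
    # of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # Machine is the first WORD of the COFF file header, which immediately
    # follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_TYPES.get(machine, f"unknown (0x{machine:04X})")


def dll_machine(path: str) -> str:
    """Read a DLL from disk and report the architecture it was built for."""
    with open(path, "rb") as f:
        return pe_machine(f.read())


if __name__ == "__main__":
    # Pass the DLL paths to inspect as command-line arguments.
    for dll_path in sys.argv[1:]:
        print(f"{dll_path}: {dll_machine(dll_path)}")
```

If a 64-bit linker were picking up a 32-bit copy of the DLL (or vice versa), the reported architectures would differ between the linker executable and the DLL it loads.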

@radeusgd

radeusgd commented Jun 2, 2021

Is any further information needed to help with the debugging?

@radeusgd

radeusgd commented Jun 8, 2021

@dsame

It seems that the build with the cache disabled also fails with the same error.

Are the 32/64-bit executable conflicts caused by the GitHub environment configuration, or is it something that we can modify? Do you have any suggestions for fixing this?

@dsame
Contributor

dsame commented Jun 10, 2021

@iamrecursion @radeusgd
The upgrade of the GraalVM Native Image tool causes the problem. I'd suggest reworking the workflow to avoid this upgrade. Is that possible?

I confirmed that the environment, including the requested DLLs, is the same for successful and failed builds, and since we have little if any ability to control the upgrade process, I see no way to investigate further.

@iamrecursion
Author

Unfortunately, we can't easily downgrade our version of GraalVM. We've found a way to make it slightly more reliable, as it seems to be related to memory pressure, but it looks like we really need runners with more memory.

@miketimofeev
Contributor

@iamrecursion there is a way to increase the swap (page file) size on Windows; I guess it could help.
#2642 (comment)
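The error code in the original LNK1171 message fits this suggestion: 1455 is the Windows system error ERROR_COMMITMENT_LIMIT ("The paging file is too small for this operation to complete."), meaning the system commit limit (roughly, physical memory plus page file size) was reached while loading mspdbcore.dll. The Python sketch below scans a build log for LNK1171 lines and flags that case. The functions `diagnose_link_failure` and `is_pagefile_exhaustion` are hypothetical helpers, not part of any existing tool, and the lookup table deliberately covers only this one code.

```python
import re

# Windows system error codes relevant to this failure. 1455 is
# ERROR_COMMITMENT_LIMIT: "The paging file is too small for this
# operation to complete."
WINDOWS_ERRORS = {
    1455: "ERROR_COMMITMENT_LIMIT",
}

# Matches linker lines such as:
# LINK : fatal error LNK1171: unable to load mspdbcore.dll (error code: 1455)
LNK1171_PATTERN = re.compile(
    r"fatal error LNK1171: unable to load (?P<dll>\S+) \(error code: (?P<code>\d+)\)"
)


def diagnose_link_failure(log_text):
    """Return (dll, code, symbolic_name) for every LNK1171 line in a log."""
    findings = []
    for match in LNK1171_PATTERN.finditer(log_text):
        code = int(match.group("code"))
        findings.append((match.group("dll"), code, WINDOWS_ERRORS.get(code, "UNKNOWN")))
    return findings


def is_pagefile_exhaustion(log_text):
    """True if any LNK1171 failure in the log reports ERROR_COMMITMENT_LIMIT."""
    return any(code == 1455 for _, code, _ in diagnose_link_failure(log_text))


if __name__ == "__main__":
    log = "LINK : fatal error LNK1171: unable to load mspdbcore.dll (error code: 1455)"
    for dll, code, name in diagnose_link_failure(log):
        print(f"{dll}: error {code} ({name})")
    print("page file exhausted:", is_pagefile_exhaustion(log))
```

Run against a saved job log, a True result points at page file exhaustion rather than a missing or mismatched DLL, which is the case a larger page file addresses.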

@iamrecursion
Author

That's super helpful. We'll try that and see if it helps.

@dsame
Contributor

dsame commented Jun 16, 2021

@iamrecursion

I mean you should have the new version of GraalVM Native Image in the build; this way you avoid upgrading it during the build.


@radeusgd

radeusgd commented Jun 16, 2021

Oh, this could have been phrased in a confusing way indeed.

This message is not meant to say that we are rebuilding the Native Image tool itself.

It is meant to say that we are rebuilding our application's native image, so essentially it just means that we are invoking the native-image build process, which is a part of our build pipeline.

The Native Image component is only installed at the very beginning.

@AlenaSviridenko
Contributor

Hi @iamrecursion,
is this issue still happening in your builds, or did increasing the swap size help?

@tiagobento

Giving my 2 cents on this issue.

At the project I work on, we have a GitHub Workflow [1] that builds a Quarkus application using native-image. We were experiencing random linking failures on ~50% of our builds. After increasing the swap size following #2642 (comment), we didn't see any of those anymore and we've been building consistently ever since. (Thanks @miketimofeev!!)

[1] https://github.com/kiegroup/kogito-tooling-go/blob/main/.github/workflows/release.yml

@AlenaSviridenko
Contributor

So I am closing this issue, since increasing the swap size helped here.
Feel free to contact us if you have any concerns, thanks.

@radeusgd

It seems that increasing the swap size also helped on our CI: it has been much more stable since then, and most of the time we have not noticed this linking failure after the change. Thanks for the help!

apupier added a commit to apupier/kaoto-backend that referenced this issue Jan 23, 2023
compilation

Following workaround mentioned here
actions/runner-images#3420 (comment)
to the `LINK : fatal error LNK1171: unable to load mspdbcore.dll (error
code: 1455)` error

fixes kaoto-archive#416

Signed-off-by: Aurélien Pupier <[email protected]>
Delawen pushed a commit to kaoto-archive/kaoto-backend that referenced this issue Jan 23, 2023
compilation

Following workaround mentioned here
actions/runner-images#3420 (comment)
to the `LINK : fatal error LNK1171: unable to load mspdbcore.dll (error
code: 1455)` error

fixes #416

Signed-off-by: Aurélien Pupier <[email protected]>

7 participants