Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

linux/riscv64 test failures #3832

Open
aarzilli opened this issue Oct 13, 2024 · 11 comments
Open

linux/riscv64 test failures #3832

aarzilli opened this issue Oct 13, 2024 · 11 comments

Comments

@aarzilli
Copy link
Member

The linux/riscv64 builder is not picking up linux/riscv64 builds. Looking at teamcity the problem seems to be this:

Unmet requirements: docker.server.version exists 
@aarzilli
Copy link
Member Author

cc @lrzlin

@lrzlin
Copy link
Contributor

lrzlin commented Oct 13, 2024

According to https://github.com/lawrenceching/gitbook/blob/master/teamcity-is-reporting-unmet-requirements-docker.server.version-exists.md, I start the docker deamon, now it's up and running on the linux/riscv64 builder, and I could find the Linux-openEuler24.09-riscv64 in compatible agents. Maybe we could trigger the builds manually and see what happens?

@aarzilli aarzilli changed the title linux/riscv64 builder does not process builds linux/riscv64 test failures Oct 13, 2024
@aarzilli
Copy link
Member Author

Alright. The agent is picking up builds however there are some failing tests: https://delve.teamcity.com/viewLog.html?buildId=63555&tab=buildResultsDiv&buildTypeId=Delve_linux_riscv64_1_23

  • TestTraceDirRecursion: this does not concern me much, we can either disable it on linux/riscv64 or figure out where the CGO_CFLAGS are coming from, or just unset them somewhere on the test script
  • All the other ones TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 have the same root cause, the prologue skipping code isn't working, i.e. probably the prologues set in riscv64_arch.go are wrong. This also reminds me that we should make a CL for the go compiler, so that it sets the prologue end marker correctly for riscv64.

@lrzlin
Copy link
Contributor

lrzlin commented Oct 13, 2024

  • TestTraceDirRecursion: this does not concern me much, we can either disable it on linux/riscv64 or figure out where the CGO_CFLAGS are coming from, or just unset them somewhere on the test script

This seems to be related with async preempt mechanism of go, I tried disable it in delve but doesn't help, loong64 used to have similar issue, they finally fixed this by making CLs to the go compiler itself, so maybe we should do the same.

  • All the other ones TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 have the same root cause, the prologue skipping code isn't working, i.e. probably the prologues set in riscv64_arch.go are wrong. This also reminds me that we should make a CL for the go compiler, so that it sets the prologue end marker correctly for riscv64.

Yes, I found this should also related with the go compiler, I tried loong64's delve pr on official go1.23.2 and their custom go1.22.6, the custom 1.22.6 compiler could pass these prologue skipping tests but the 1.23.2 couldn't.

@aarzilli
Copy link
Member Author

  • TestTraceDirRecursion: this does not concern me much, we can either disable it on linux/riscv64 or figure out where the CGO_CFLAGS are coming from, or just unset them somewhere on the test script

This seems to be related with async preempt mechanism of go, I tried disable it in delve but doesn't help, loong64 used to have similar issue, they finally fixed this by making CLs to the go compiler itself, so maybe we should do the same.

We could also simply adjust the test to ignore that message.

  • All the other ones TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 have the same root cause, the prologue skipping code isn't working, i.e. probably the prologues set in riscv64_arch.go are wrong. This also reminds me that we should make a CL for the go compiler, so that it sets the prologue end marker correctly for riscv64.

Yes, I found this should also related with the go compiler, I tried loong64's delve pr on official go1.23.2 and their custom go1.22.6, the custom 1.22.6 compiler could pass these prologue skipping tests but the 1.23.2 couldn't.

It should be possible to make those tests pass without changing the compiler. I'd check the disassembler output for the functions where the detection is failing. As it is, if this test fails several other tests will fail sporadically.

aarzilli added a commit to aarzilli/delve that referenced this issue Oct 14, 2024
Delete non-working prologue detection code and mark port as experimental.

Updates go-delve#3832
aarzilli added a commit to aarzilli/delve that referenced this issue Oct 14, 2024
Delete non-working prologue detection code and mark port as experimental.

Updates go-delve#3832
aarzilli added a commit to aarzilli/delve that referenced this issue Oct 14, 2024
Delete non-working prologue detection code and mark port as experimental.

Updates go-delve#3832
derekparker pushed a commit that referenced this issue Oct 14, 2024
Delete non-working prologue detection code and mark port as experimental.

Updates #3832
@lrzlin
Copy link
Contributor

lrzlin commented Oct 15, 2024

@aarzilli Hi, I found there are more tests failed after deteled the prologue related code, is that means at least it's partially correct?

Besides, by adding the prologue_end dwarf in src/cmd/internal/obj/riscv/obj.go like ppc64 and loong64, and TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 could pass. So if we could use this workaround instead of using the prologue detection code, I'll make a CL to golang as soon as possible.

@aarzilli
Copy link
Member Author

@aarzilli Hi, I found there are more tests failed after deteled the prologue related code, is that means at least it's partially correct?

It could be. I compared it to the code in stacksplit and to what its generated by the compiler and I didn't see any resemblance but maybe I missed something. But a bad prologue detection code is going to make many of the tests in our suite fail non-deterministically so I would have to take samples before and after to be confident just by looking at test failures.

Besides, by adding the prologue_end dwarf in src/cmd/internal/obj/riscv/obj.go like ppc64 and loong64, and TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 could pass. So if we could use this workaround instead of using the prologue detection code, I'll make a CL to golang as soon as possible.

The prologue end marker is not a workaround, it is the proper way to do it.

@lrzlin
Copy link
Contributor

lrzlin commented Oct 15, 2024

Besides, by adding the prologue_end dwarf in src/cmd/internal/obj/riscv/obj.go like ppc64 and loong64, and TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 could pass. So if we could use this workaround instead of using the prologue detection code, I'll make a CL to golang as soon as possible.

The prologue end marker is not a workaround, it is the proper way to do it.

Add prologue_end DWARF stmt for riscv64: https://go-review.googlesource.com/c/go/+/620295

@aarzilli
Copy link
Member Author

It seems like the agent is now offline?

@lrzlin
Copy link
Contributor

lrzlin commented Oct 23, 2024

It seems like the agent is now offline?

Sorry, the server was down due to maintainance, I've restart the teamcity agent.

@lrzlin
Copy link
Contributor

lrzlin commented Oct 25, 2024

Besides, by adding the prologue_end dwarf in src/cmd/internal/obj/riscv/obj.go like ppc64 and loong64, and TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 could pass. So if we could use this workaround instead of using the prologue detection code, I'll make a CL to golang as soon as possible.

The prologue end marker is not a workaround, it is the proper way to do it.

Add prologue_end DWARF stmt for riscv64: https://go-review.googlesource.com/c/go/+/620295

Merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants