go20 fileops flakiness #994

Closed
aviramha opened this issue Jan 31, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@aviramha
Member

Bug Description

The CI fails too often with the following output:

stderr 2023-01-31T14:24:11.588117950Z 14737: 2023-01-31T14:24:11.588036Z TRACE ThreadId(03) handle_daemon_message:pop_send: mirrord_layer::file: enter daemon_message=File(Write(Ok(WriteFileResponse { written_amount: 445 }))) value=Ok(WriteFileResponse { written_amount: 445 })

stderr 2023-01-31T14:24:11.588309849Z 14737: 2023-01-31T14:24:11.588115Z TRACE ThreadId(01) mirrord_layer::go_hooks: c_abi_syscall6_handler: syscall=3 param1=15 param2=0 param3=0 param4=0 param5=0 param6=0
2023-01-31T14:24:11.588170Z TRACE ThreadId(01) mirrord_layer: Closing fd 15

deleting "http-echo-9vpgfsa"
deleting "http-echo-9vpgfsa"
thread 'file_ops::file_ops::test_file_ops::agent_1_Agent__Ephemeral::ops_4_FileOps__Go20' panicked at 'Timeout 240s expired', /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/rstest-0.16.0/src/timeout.rs:27:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
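
For context on the final panic: the log shows the test being cut off by a timeout rather than failing an assertion, which is what a hung (for example, deadlocked) test body looks like from the outside. The sketch below illustrates that general pattern using only the standard library. It is an assumption-laden illustration, not rstest's actual implementation, and `run_with_timeout` is a hypothetical helper name.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: runs `work` on a helper thread and waits at most
/// `limit` for its result. Returns `None` if no result arrived in time,
/// which covers both a hung worker and a worker that panicked.
fn run_with_timeout<T, F>(work: F, limit: Duration) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the receiver already gave up, the send fails; ignore that.
        let _ = tx.send(work());
    });
    rx.recv_timeout(limit).ok()
}

fn main() {
    // Finishes well inside the limit.
    let fast = run_with_timeout(|| 2 + 2, Duration::from_secs(1));
    assert_eq!(fast, Some(4));

    // Stand-in for a hung test body: sleeps far longer than the limit.
    let hung = run_with_timeout(
        || thread::sleep(Duration::from_secs(5)),
        Duration::from_millis(50),
    );
    assert!(hung.is_none());
}
```

A test harness built this way would turn the `None` case into a panic, so a deadlock surfaces as a timeout message with no hint about which lock is stuck.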

Steps to Reproduce

run e2e ci

Backtrace

No response

Relevant Logs

No response

Your operating system and version

Linux

Local process

go 20 fileops

Local process version

No response

Additional Info

No response

@aviramha
Member Author

tbh I'm not sure we should have that many variants in the e2e (both job/ephemeral and applications), but I do want us to understand why it happens on Go20 specifically, as there might be an underlying issue.
I'd split this issue into two:

  1. figure out why this happens only on Go20
  2. reduce the overlapping tests: (re)move fileops tests that overlap with the integration tests, and keep in the e2e only agent/ephemeral file ops tests with one type of application

bors bot pushed a commit that referenced this issue Feb 6, 2023
The deadlock has only been observed in the E2E fileops test with Go-1.20 on GitHub Actions.

This PR hopefully eliminates this deadlock, and also contains fixes for some problematic locking that did not cause this deadlock.

Part of #994.
bors bot pushed a commit that referenced this issue Feb 8, 2023
Replace the fileops E2E Go tests with integration tests: a separate integration test for each part of the E2E test.
Testing with Go versions 1.18, 1.19, 1.20.

Part of #994.
@t4lz
Member

t4lz commented Feb 8, 2023

Deadlock cause was fixed in #1009, Go Fileops tests were converted to integration tests in #1019. Tracking the conversion of the other tests for other languages in #1032.

@t4lz t4lz closed this as completed Feb 8, 2023