Flaky timeout in github Python unit test action StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer #21706

Closed
damccorm opened this issue Jun 5, 2022 · 21 comments

@damccorm
Contributor

damccorm commented Jun 5, 2022

Causing unit test failures in nightly

https://github.com/apache/beam/runs/6150980228?check_suite_focus=true#step:6:112

https://github.com/apache/beam/runs/6143954692?check_suite_focus=true#step:6:308

https://github.com/apache/beam/runs/6137479897?check_suite_focus=true#step:6:112
 
@pytest.mark.timeout(3)
def test_dynamic_timer_clear_then_set_timer(self):
E   Failed: Timeout >3.0s

Imported from Jira BEAM-14367. Original Jira may contain additional context.
Reported by: yihu.

@damccorm
Contributor Author

damccorm commented Jun 5, 2022

Unable to assign user @AnandInguva. If able, self-assign, otherwise tag @damccorm so that he can assign you. Because of GitHub's spam prevention system, your activity is required to enable assignment in this repo.

@damccorm damccorm added this to the 2.40.0 Release milestone Jun 5, 2022
@Abacn
Contributor

Abacn commented Jun 6, 2022

Still observed after #17569 though less frequently: https://github.com/apache/beam/runs/6760704625?check_suite_focus=true

@aaltay
Member

aaltay commented Jun 7, 2022

Who would be a good owner for this issue? @tvalentyn @AnandInguva ?

Unable to assign user @AnandInguva.
@damccorm - What is the limitation for doing that?

@damccorm
Contributor Author

damccorm commented Jun 8, 2022

> @damccorm - What is the limitation for doing that?

It requires a user to interact with the issue or have some level of permission (e.g. committer) in the repo before assignment.

@aaltay
Member

aaltay commented Jun 8, 2022

> > @damccorm - What is the limitation for doing that?
>
> It requires a user to interact with the issue or have some level of permission (e.g. committer) in the repo before assignment.

Thank you. Could we clarify this somewhere, and also provide guidance on what people should do if they cannot assign an issue? (For example, Sachin faced a similar issue in #21741: he created the issue but could not assign it to Reza. I was able to assign it, I guess because both Reza and I are committers?)

@damccorm
Contributor Author

damccorm commented Jun 8, 2022

> Thank you. Could we clarify this somewhere, and also provide guidance on what people should do if they cannot assign an issue? (For example, Sachin faced a similar issue in #21741: he created the issue but could not assign it to Reza. I was able to assign it, I guess because both Reza and I are committers?)

Sure! I already had a PR that adds some automation to make this easier, along with other doc changes, so I bundled this guidance into that change: #21719

@aaltay
Member

aaltay commented Jun 8, 2022

> > Thank you. Could we clarify this somewhere, and also provide guidance on what people should do if they cannot assign an issue? (For example, Sachin faced a similar issue in #21741: he created the issue but could not assign it to Reza. I was able to assign it, I guess because both Reza and I are committers?)
>
> Sure! I already had a PR that adds some automation to make this easier, along with other doc changes, so I bundled this guidance into that change: #21719

Thank you. This is an improvement. I understand why this is a limitation in general, but it is a bit unfortunate. I feel like we will end up with a workflow of filing an issue and tagging someone to comment so that they can assign the bug to themselves; until they do that, the dashboards etc. will show issues without owners. I think this is an acceptable trade-off, but if you can think of any improvements I would take them :)

@AnandInguva
Contributor

AnandInguva commented Jun 9, 2022

> Who would be a good owner for this issue? @tvalentyn @AnandInguva ?

I will take a look and assign to the appropriate people. Thanks

@AnandInguva
Contributor

@pabloem Are you the owner of this test? If yes, could you take a look at why this is flaky? Thanks

@kennknowles kennknowles removed this from the 2.40.0 Release milestone Jun 14, 2022
@kennknowles
Member

Clearing the milestone field since this doesn't seem like it is a release blocker. Would this flaking test possibly indicate a problem that makes the release non-functional?

@pabloem
Member

pabloem commented Jun 14, 2022

agreed with Kenn that this is not a 2.40.0 blocker

@pabloem
Member

pabloem commented Oct 18, 2022

I've run this many, many times on my laptop without getting it to fail. I'll close this for now.
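
For anyone repeating this kind of local check, a minimal sketch of timing repeated runs against the 3-second budget might look like the following. The names `time_repeated`, `summarize`, and `stand_in_test` are hypothetical and not part of Beam; `stand_in_test` is only a placeholder workload, and the real test body would have to be substituted for it.

```python
import statistics
import time


def time_repeated(fn, runs):
    """Call fn `runs` times and return the list of wall-clock durations."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations, budget_s):
    """Summarize durations against a timeout budget given in seconds."""
    worst = max(durations)
    return {
        "runs": len(durations),
        "median_s": statistics.median(durations),
        "max_s": worst,
        # Number of runs that would have tripped the timeout.
        "over_budget": sum(d > budget_s for d in durations),
        # How many times the slowest run fits into the budget.
        "headroom": budget_s / worst if worst > 0 else float("inf"),
    }


def stand_in_test():
    # Hypothetical placeholder; in practice this would invoke the real test body.
    sorted(range(10_000), reverse=True)


report = summarize(time_repeated(stand_in_test, runs=200), budget_s=3.0)
print(report)
```

A large `headroom` on an idle laptop says little about a heavily loaded CI worker, so a clean local report does not rule out a load-dependent flake.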

@pabloem pabloem closed this as completed Oct 18, 2022
@Abacn
Contributor

Abacn commented Oct 18, 2022

This only flakes on Jenkins, e.g. https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/6234/

@Abacn
Contributor

Abacn commented Oct 18, 2022

I think if it is specific to our (poor) test infrastructure we can downgrade the priority and leave it open for tracking.

@pabloem pabloem reopened this Oct 18, 2022
@pabloem
Member

pabloem commented Oct 18, 2022

thanks @Abacn !

@pabloem pabloem added P2 and removed P1 labels Oct 18, 2022
@github-actions github-actions bot added this to the 2.43.0 Release milestone Oct 18, 2022
@kennknowles
Member

There is no sign of an infrastructure failure. This appears to be an actually-possible execution that fails. This suggests to me a race condition that the heavy load on the Jenkins workers brings out.

@kennknowles kennknowles added P1 and removed P2 labels Oct 19, 2022
@kennknowles
Member

Based on it being a timeout I would guess deadlock (assuming 3 seconds is many orders of magnitude more than a successful run takes). Something to do with a concurrent map used in the dynamic timers? I don't know what this looks like in the Python codebase.
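
One way to tell a hang from a merely slow run is to put the body under a watchdog and record how long it took when it does finish. The sketch below is only an illustration of that idea using the standard library; `run_with_timeout`, `fast_body`, and `hung_body` are hypothetical names, and nothing here reflects how the Beam test or its timers are implemented.

```python
import threading
import time


def run_with_timeout(fn, timeout_s):
    """Run fn in a daemon thread; return (finished, elapsed_seconds)."""
    done = threading.Event()

    def target():
        try:
            fn()
        finally:
            # Signal completion even if fn raised, so errors are not
            # mistaken for hangs.
            done.set()

    start = time.monotonic()
    threading.Thread(target=target, daemon=True).start()
    finished = done.wait(timeout_s)  # False means the timeout elapsed
    return finished, time.monotonic() - start


def fast_body():
    # Stand-in for a healthy run (hypothetical workload).
    sum(range(1000))


def hung_body():
    # Stand-in for a deadlock: waits on an event that nobody ever sets.
    threading.Event().wait()


fast_finished, fast_elapsed = run_with_timeout(fast_body, timeout_s=3.0)
hung_finished, hung_elapsed = run_with_timeout(hung_body, timeout_s=0.2)
print(fast_finished, hung_finished)
```

If successful runs finish far below the budget while the failures always sit exactly at it, that pattern points toward a hang rather than slowness, which is the distinction the guess above relies on.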

@chamikaramj
Contributor

Hi folks,

Is this an actual blocker for the 2.43.0 release? It seems like it was added to the milestone automatically.

@kennknowles
Member

@pabloem any update? are you working on this right now?

@kennknowles
Member

I'm assuming lack of response means that this is on the back burner and could be unassigned?

@damccorm
Contributor Author

I think this is fixed. If not, it should get auto-flagged by our tooling anyways, so this should be safe to close

@github-actions github-actions bot added this to the 2.61.0 Release milestone Oct 31, 2024
8 participants