java.lang.RuntimeException: Dataflow job 2023-12-04_17_25_11-8098838800222529628 terminated in state RUNNING but did not return a failure reason.
at org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:141)
at org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:93)
at org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:55)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:321)
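For readers unfamiliar with this failure mode, the message suggests the runner's poll loop gave up while the job was still in a non-terminal state. The sketch below is a minimal, self-contained illustration of that kind of terminal-state check; the names (`TerminalStateCheckSketch`, `checkTerminalState`, `State`) are hypothetical and are not Beam's actual API, they only reproduce the shape of the check that can emit this message.

```java
// Hypothetical sketch (NOT Beam's implementation) of a terminal-state
// check like the one in TestDataflowRunner: the runner waits for the job
// to finish; if the wait ends while the job is still RUNNING and no
// failure reason was reported, it can only throw a generic error.
public class TerminalStateCheckSketch {
  public enum State { RUNNING, DONE, FAILED }

  public static void checkTerminalState(State finalState, String failureReason) {
    if (finalState == State.DONE) {
      return; // job completed normally
    }
    if (failureReason != null) {
      throw new RuntimeException("Dataflow job failed: " + failureReason);
    }
    // The branch hit in this report: the wait ended while the job was
    // still RUNNING (e.g. worker pool teardown outlasted the wait).
    throw new RuntimeException(
        "Dataflow job terminated in state " + finalState
            + " but did not return a failure reason.");
  }

  public static void main(String[] args) {
    try {
      checkTerminalState(State.RUNNING, null);
    } catch (RuntimeException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Under this reading, the exception is a symptom of slow teardown rather than a job failure: nothing actually failed, so no failure reason exists to report.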
Taking a look at the worker log for job 2023-12-04_01_06_49-4587814450665214461:
DEBUG 2023-12-04T09:10:15.680161433Z Cleaning up.
DEBUG 2023-12-04T09:10:15.758333246Z Starting worker pool teardown.
INFO 2023-12-04T09:10:15.782743429Z Stopping worker pool...
DEBUG 2023-12-04T09:10:15.860076821Z Starting worker pool teardown.
INFO 2023-12-04T09:10:15.884898782Z Stopping worker pool...
INFO 2023-12-04T09:10:22.584818Z *** SIGTERM received by PID 14 (TID 14) on cpu 0 from PID 0; stack trace: ***
INFO 2023-12-04T09:10:22.584901Z PC: @ 0x55e6a3671f96 (unknown) (unknown)
...
INFO 2023-12-04T09:10:23.448826420Z ****************************************
DEBUG 2023-12-04T09:15:59.183629917Z Cleaning up.
DEBUG 2023-12-04T09:16:00.002974729Z Autoscaling: Reduced the number of workers to 0 based on low average worker CPU utilization, and the pipeline having sufficiently low backlog and keeping up with input rate.
DEBUG 2023-12-04T09:16:05.045251507Z Starting worker pool teardown.
INFO 2023-12-04T09:16:10.060122904Z Stopping worker pool...
DEBUG 2023-12-04T09:16:10.114936278Z Starting worker pool teardown.
INFO 2023-12-04T09:16:10.144000861Z Stopping worker pool...
INFO 2023-12-04T09:16:10.179066170Z Worker pool stopped.
DEBUG 2023-12-04T09:16:10.204084191Z Tearing down pending resources...
INFO 2023-12-04T09:16:10.792660720Z Your project already contains 100 Dataflow-created metric descriptors...
Occasionally it took 6 minutes to stop the worker pool, during which TestPipeline.run() was still blocking.
Another occurrence, job 2023-12-04_17_25_11-8098838800222529628:
INFO 2023-12-05T01:27:58.530670Z Successfully sampled resources
DEBUG 2023-12-05T01:28:21.709062648Z Cleaning up.
DEBUG 2023-12-05T01:28:21.755866390Z Starting worker pool teardown.
INFO 2023-12-05T01:28:21.778534382Z Stopping worker pool...
DEBUG 2023-12-05T01:28:21.806963604Z Starting worker pool teardown.
INFO 2023-12-05T01:28:21.827367725Z Stopping worker pool...
ERROR 2023-12-05T01:28:28.218612Z *** SIGTERM received by PID 14 (TID 14) on cpu 0 from PID 0; stack trace: ***
ERROR 2023-12-05T01:28:28.219090Z PC: @ 0x5621e4e71f96 (unknown) (unknown) @ 0x5621e6f3de89 1344 (unknown) @ 0x7f9168c099a0 631864672 (unknown) @ 0x5621e4e71f96 96 (unknown) @ 0x5621e4cafcca 208 (unknown) @ 0x5621e5df4d30 64 (unknown) @ 0x5621e52547b7 1664 (unknown) @ 0x7f9168a77bbd 208 __libc_start_main @ 0x5621e4c13599 (unknown) (unknown)
WARNING 2023-12-05T01:28:28.223847Z --- CPU registers: --- r8=0 r9=ffffffffffffffff r10=0 r11=246 r12=7f91689c9100 r13=7f91689c9180 r14=881 r15=ffffffffffffffff rdi=7f91689c9180 rsi=189 rbp=7ffd8e451870 rbx=7f91689c9100 rdx=0 rax=fffffffffffffffc rcx=5621e4e71f96 rsp=7ffd8e451820 rip=5621e4e71f96 efl=246 cgf=2b000000000033 err=0 trp=0 msk=0 cr2=0
...
INFO 2023-12-05T01:28:28.674663925Z *********************
DEFAULT 2023-12-05T01:28:28.674675645Z {"logging.googleapis.com/diagnostic":{…}}
DEBUG 2023-12-05T01:33:47.725042327Z Autoscaling: Reduced the number of workers to 0 based on low average worker CPU utilization, and the pipeline having sufficiently low backlog and keeping up with input rate.
DEBUG 2023-12-05T01:33:47.946791112Z Cleaning up.
DEBUG 2023-12-05T01:33:48.026216727Z Starting worker pool teardown.
INFO 2023-12-05T01:33:48.042428706Z Stopping worker pool...
DEBUG 2023-12-05T01:33:48.064501675Z Starting worker pool teardown.
INFO 2023-12-05T01:33:48.086649213Z Stopping worker pool...
INFO 2023-12-05T01:33:48.118769551Z Worker pool stopped.
DEBUG 2023-12-05T01:33:48.138574980Z Tearing down pending resources...
INFO 2023-12-05T01:33:49.164659964Z Your project already contains 100 Dataflow-created metric descriptors...
This run also took 6 minutes.
It is strange that the last log line is "Your project already contains 100 Dataflow-created metric descriptors." That message is usually seen during setup, not teardown.
For comparison, a normally shut-down worker produces logs like these:
DEBUG 2023-12-06T21:04:26.941513220Z Cleaning up.
DEBUG 2023-12-06T21:04:26.980247165Z Starting worker pool teardown.
INFO 2023-12-06T21:04:27.007520790Z Stopping worker pool...
DEBUG 2023-12-06T21:04:27.036155756Z Starting worker pool teardown.
INFO 2023-12-06T21:04:27.057789052Z Stopping worker pool...
INFO 2023-12-06T21:04:34.289906Z *** SIGTERM received by PID 13 (TID 13) on cpu 1 from PID 0; stack trace: ***
ERROR 2023-12-06T21:04:34.293400Z *** SIGTERM received by PID 13 (TID 13) on cpu 1 from PID 0; stack trace: ***
DEFAULT 2023-12-06T21:04:34.718670731Z {"logging.googleapis.com/diagnostic":{…}}
DEBUG 2023-12-06T21:06:48.144509841Z Autoscaling: Reduced the number of workers to 0 based on low average worker CPU utilization, and the pipeline having sufficiently low backlog and keeping up with input rate.
INFO 2023-12-06T21:06:48.181400397Z Worker pool stopped.
DEBUG 2023-12-06T21:06:48.208832034Z Tearing down pending resources...
Likely a race condition in the worker pool shutdown path is causing the inconsistent teardown times.
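One hint supporting the race hypothesis is that "Starting worker pool teardown." and "Stopping worker pool..." each appear twice in the logs above, as if teardown is triggered from two places concurrently. The sketch below is a generic illustration of why such a path needs an atomic guard; it is not Dataflow's actual service code, and all names (`TeardownRaceSketch`, `stopWorkerPool`, `teardownRuns`) are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Generic illustration (NOT Dataflow's code) of guarding a teardown path
// that can be reached from two callers at once. Without the
// compareAndSet guard, both callers would start teardown, matching the
// duplicated "Stopping worker pool..." lines in the logs.
public class TeardownRaceSketch {
  private final AtomicBoolean teardownStarted = new AtomicBoolean(false);
  public final AtomicInteger teardownRuns = new AtomicInteger(0);

  public void stopWorkerPool() {
    // compareAndSet ensures exactly one caller performs the teardown.
    if (teardownStarted.compareAndSet(false, true)) {
      teardownRuns.incrementAndGet(); // stand-in for the real teardown work
    }
  }

  public static void main(String[] args) throws InterruptedException {
    TeardownRaceSketch pool = new TeardownRaceSketch();
    CountDownLatch start = new CountDownLatch(1);
    Runnable caller = () -> {
      try {
        start.await();
      } catch (InterruptedException ignored) {
      }
      pool.stopWorkerPool();
    };
    Thread t1 = new Thread(caller);
    Thread t2 = new Thread(caller);
    t1.start();
    t2.start();
    start.countDown(); // release both callers at once
    t1.join();
    t2.join();
    System.out.println("teardown ran " + pool.teardownRuns.get() + " time(s)");
  }
}
```

With the guard, exactly one teardown runs no matter how the two callers interleave; if the real shutdown path lacks an equivalent guard, overlapping teardown attempts could plausibly explain the duplicated log lines and the variable teardown time.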
Issue Failure
Failure: Test is flaky
Issue Priority
Priority: 2 (backlog / disabled test but we think the product is healthy)
Issue Components
Component: Python SDK
Component: Java SDK
Component: Go SDK
Component: Typescript SDK
Component: IO connector
Component: Beam YAML
Component: Beam examples
Component: Beam playground
Component: Beam katas
Component: Website
Component: Spark Runner
Component: Flink Runner
Component: Samza Runner
Component: Twister2 Runner
Component: Hazelcast Jet Runner
Component: Google Cloud Dataflow Runner
What happened?
It was noticed that some random tests are failing on https://github.com/apache/beam/actions/workflows/beam_PostCommit_Java_ValidatesRunner_Dataflow_V2_Streaming.yml?query=event%3Aschedule
For example, https://github.com/apache/beam/runs/19288524813 failed with the exception and worker logs shown above.
This is not an SDK bug. An internal ticket will also be filed.