Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop the started instances of a verticle when the deployment fails #5296

Merged
merged 4 commits into from
Sep 5, 2024

Conversation

vietj
Copy link
Member

@vietj vietj commented Sep 4, 2024

  • decouple the deployment manager from verticles
  • rewrite the deployment/undeployment control flow to use future composition
  • stop the started instances of a verticle, when the deployment fails

Motivation:

The deployment manager handle verticle concerns (such as creating contexts, worker pool, etc...) and impacts the testability and readability of deploy/undeploy (lifecycle) operations.

Changes:

Abstract the verticle details in a Deployable interface which is presented to the deployment manager by the verticle manager which already handle partially the handling of the verticle.

Results:

The deployment manager is not anymore coupled to verticles and can be tested / changed independantly.
@vietj vietj added this to the 5.0.0 milestone Sep 4, 2024
Motivation:

When the deployment of a verticle with multiple instance fails, the started instances shall be stopped since the deployment is considered as failed.

Changes:

Stop the started instances of a verticle deployment when the deployment fails.

Result:

When a verticle with multiple instance fails, all started instances are stopped.
@vietj vietj self-assigned this Sep 5, 2024
@vietj vietj changed the title Improve the deployment manager Stop the started instances of a verticle when the deployment fails Sep 5, 2024
@vietj vietj merged commit aa29562 into master Sep 5, 2024
7 checks passed
@vietj vietj deleted the deployment-manager-rewrite branch September 5, 2024 09:10
tsegismont added a commit to tsegismont/vertx-micrometer-metrics that referenced this pull request Sep 5, 2024
This is an attempt to fix intermittent build failures.

In some cases (only on CI), some tests fail because a previous test could not close properly a net or http server (java.net.BindException: Address already in use).

When this happens, the CI logs also print a message indicating a failure to properly close a server:

2024-09-05T05:02:17.7401756Z Sep 05, 2024 5:02:17 AM io.netty.util.concurrent.DefaultPromise safeExecute
2024-09-05T05:02:17.7410618Z SEVERE: Failed to submit a listener notification task. Event loop shut down?
2024-09-05T05:02:17.7412174Z java.util.concurrent.RejectedExecutionException: event executor terminated
2024-09-05T05:02:17.7413826Z 	at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:931)
2024-09-05T05:02:17.7415853Z 	at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:350)
2024-09-05T05:02:17.7417941Z 	at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:343)
2024-09-05T05:02:17.7419788Z 	at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:833)
2024-09-05T05:02:17.7421765Z 	at io.netty.util.concurrent.SingleThreadEventExecutor.execute0(SingleThreadEventExecutor.java:824)
2024-09-05T05:02:17.7423694Z 	at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:814)
2024-09-05T05:02:17.7425432Z 	at io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:862)
2024-09-05T05:02:17.7427177Z 	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:500)
2024-09-05T05:02:17.7428763Z 	at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:185)
2024-09-05T05:02:17.7430369Z 	at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:95)
2024-09-05T05:02:17.7432106Z 	at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:30)
2024-09-05T05:02:17.7434075Z 	at io.vertx.core.net.impl.NetServerImpl.lambda$actualClose$12(NetServerImpl.java:650)
2024-09-05T05:02:17.7435573Z 	at io.vertx.core.impl.future.FutureImpl$4.onSuccess(FutureImpl.java:176)
2024-09-05T05:02:17.7437457Z 	at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:68)
2024-09-05T05:02:17.7438885Z 	at io.vertx.core.impl.future.FutureImpl.addListener(FutureImpl.java:231)
2024-09-05T05:02:17.7440198Z 	at io.vertx.core.impl.future.FutureImpl.onComplete(FutureImpl.java:199)
2024-09-05T05:02:17.7441321Z 	at io.vertx.core.net.impl.NetServerImpl.actualClose(NetServerImpl.java:643)
2024-09-05T05:02:17.7442279Z 	at io.vertx.core.net.impl.NetServerImpl.doClose(NetServerImpl.java:632)
2024-09-05T05:02:17.7443537Z 	at io.vertx.core.net.impl.NetServerImpl.close(NetServerImpl.java:172)
2024-09-05T05:02:17.7444943Z 	at io.vertx.core.internal.CloseFuture.cascadeClose(CloseFuture.java:151)
2024-09-05T05:02:17.7446072Z 	at io.vertx.core.internal.CloseFuture.close(CloseFuture.java:119)
2024-09-05T05:02:17.7447319Z 	at io.vertx.core.impl.DeploymentManager$VerticleHolder.close(DeploymentManager.java:270)

Looking at the tests that deploy verticles, it seems most of them don't manage the start promise properly.

Perhaps Vert.x does not close properly servers started by a verticle if the deployment hasn't been completed.

This may be solved by eclipse-vertx/vert.x#5296

But it's best to make sure the test verticle are deployed properly.

Signed-off-by: Thomas Segismont <[email protected]>
tsegismont added a commit to vert-x3/vertx-micrometer-metrics that referenced this pull request Sep 5, 2024
This is an attempt to fix intermittent build failures.

In some cases (only on CI), some tests fail because a previous test could not close properly a net or http server (java.net.BindException: Address already in use).

When this happens, the CI logs also print a message indicating a failure to properly close a server:

2024-09-05T05:02:17.7401756Z Sep 05, 2024 5:02:17 AM io.netty.util.concurrent.DefaultPromise safeExecute
2024-09-05T05:02:17.7410618Z SEVERE: Failed to submit a listener notification task. Event loop shut down?
2024-09-05T05:02:17.7412174Z java.util.concurrent.RejectedExecutionException: event executor terminated
2024-09-05T05:02:17.7413826Z 	at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:931)
2024-09-05T05:02:17.7415853Z 	at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:350)
2024-09-05T05:02:17.7417941Z 	at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:343)
2024-09-05T05:02:17.7419788Z 	at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:833)
2024-09-05T05:02:17.7421765Z 	at io.netty.util.concurrent.SingleThreadEventExecutor.execute0(SingleThreadEventExecutor.java:824)
2024-09-05T05:02:17.7423694Z 	at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:814)
2024-09-05T05:02:17.7425432Z 	at io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:862)
2024-09-05T05:02:17.7427177Z 	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:500)
2024-09-05T05:02:17.7428763Z 	at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:185)
2024-09-05T05:02:17.7430369Z 	at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:95)
2024-09-05T05:02:17.7432106Z 	at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:30)
2024-09-05T05:02:17.7434075Z 	at io.vertx.core.net.impl.NetServerImpl.lambda$actualClose$12(NetServerImpl.java:650)
2024-09-05T05:02:17.7435573Z 	at io.vertx.core.impl.future.FutureImpl$4.onSuccess(FutureImpl.java:176)
2024-09-05T05:02:17.7437457Z 	at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:68)
2024-09-05T05:02:17.7438885Z 	at io.vertx.core.impl.future.FutureImpl.addListener(FutureImpl.java:231)
2024-09-05T05:02:17.7440198Z 	at io.vertx.core.impl.future.FutureImpl.onComplete(FutureImpl.java:199)
2024-09-05T05:02:17.7441321Z 	at io.vertx.core.net.impl.NetServerImpl.actualClose(NetServerImpl.java:643)
2024-09-05T05:02:17.7442279Z 	at io.vertx.core.net.impl.NetServerImpl.doClose(NetServerImpl.java:632)
2024-09-05T05:02:17.7443537Z 	at io.vertx.core.net.impl.NetServerImpl.close(NetServerImpl.java:172)
2024-09-05T05:02:17.7444943Z 	at io.vertx.core.internal.CloseFuture.cascadeClose(CloseFuture.java:151)
2024-09-05T05:02:17.7446072Z 	at io.vertx.core.internal.CloseFuture.close(CloseFuture.java:119)
2024-09-05T05:02:17.7447319Z 	at io.vertx.core.impl.DeploymentManager$VerticleHolder.close(DeploymentManager.java:270)

Looking at the tests that deploy verticles, it seems most of them don't manage the start promise properly.

Perhaps Vert.x does not close properly servers started by a verticle if the deployment hasn't been completed.

This may be solved by eclipse-vertx/vert.x#5296

But it's best to make sure the test verticle are deployed properly.

Signed-off-by: Thomas Segismont <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant