FISH-796 Fix clustered singleton bugs and added test #5012

Merged
merged 8 commits into from
Dec 8, 2020

lprimak
Contributor

@lprimak lprimak commented Nov 28, 2020

Description

While upgrading the code to Hazelcast 4, one of the things I tested was the clustered singleton.
In the process, I found a few bugs and re-added the test.

Important Info

Relates to #5014 and #5013

Testing

New tests

Added back the tests in the Payara root that test the clustered singleton.

Testing Performed

Automated and manual tests, as well as the test suite.

Testing Environment

JDK 8, Mac

Notes for Reviewers

The PR looks big, but most of it is newly added test files.

@lprimak
Contributor Author

lprimak commented Nov 28, 2020

jenkins test

@lprimak lprimak force-pushed the Fix-Clustered-Singleton-Bugs branch 2 times, most recently from 6f59378 to dfc9ddc Compare November 28, 2020 05:32
@lprimak
Contributor Author

lprimak commented Nov 28, 2020

jenkins test

@lprimak lprimak marked this pull request as draft November 28, 2020 07:00
@lprimak
Contributor Author

lprimak commented Nov 28, 2020

jenkins test

@lprimak lprimak changed the title Fix clustered singleton bugs Fix clustered singleton bugs and added test Nov 28, 2020
@lprimak
Contributor Author

lprimak commented Nov 29, 2020

jenkins test

@lprimak lprimak changed the title Fix clustered singleton bugs and added test FISH-796 Fix clustered singleton bugs and added test Nov 29, 2020
@lprimak lprimak marked this pull request as ready for review November 29, 2020 04:20
@dmatej
Contributor

dmatej commented Nov 30, 2020

I'm curious whether this would help with my current issue with Hazelcast vs. Docker node start, where some race condition blocks the Docker node instance start for one minute: Hazelcast on the DAS starts a migration before the node finishes its startup, so it fails again and again every 5 seconds; after one minute it gives up and the node finishes startup. The whole test then finishes in circa 135 seconds.
This happens about 5 times more often than the test finishing in circa 68 seconds, where there is no blocking.

@lprimak
Contributor Author

lprimak commented Nov 30, 2020

Not sure. My gut says no, but the Hazelcast 4.1 migration could.

@lprimak lprimak marked this pull request as draft December 3, 2020 17:42
@lprimak lprimak marked this pull request as ready for review December 3, 2020 18:00
@lprimak
Contributor Author

lprimak commented Dec 3, 2020

jenkins test

@lprimak lprimak force-pushed the Fix-Clustered-Singleton-Bugs branch from 511ef8d to d2d7e9e Compare December 3, 2020 19:09
@lprimak
Contributor Author

lprimak commented Dec 3, 2020

jenkins test

@lprimak lprimak force-pushed the Fix-Clustered-Singleton-Bugs branch from 040d4d5 to efab4a7 Compare December 3, 2020 19:59
@lprimak
Contributor Author

lprimak commented Dec 3, 2020

jenkins test

@lprimak lprimak marked this pull request as draft December 3, 2020 20:44
@lprimak lprimak marked this pull request as ready for review December 3, 2020 22:01
@lprimak
Contributor Author

lprimak commented Dec 3, 2020

jenkins test

Contributor

@MattGill98 MattGill98 left a comment

Looks broadly good, just a couple of questions

@lprimak lprimak requested a review from MattGill98 December 4, 2020 21:42
@lprimak
Contributor Author

lprimak commented Dec 4, 2020

jenkins test

@lprimak lprimak requested a review from MattGill98 December 7, 2020 16:53
@lprimak
Contributor Author

lprimak commented Dec 7, 2020

jenkins test

Member

@Pandrex247 Pandrex247 left a comment

I haven't gone through it with anything like a fine-toothed comb, but the tests pass and no obvious purple elephants from a skim over the code.

@MattGill98 MattGill98 merged commit a62d616 into payara:master Dec 8, 2020
@lprimak lprimak deleted the Fix-Clustered-Singleton-Bugs branch December 24, 2020 18:10
@AndrewG10i

AndrewG10i commented Feb 11, 2021

Hello there! @lprimak, may I share some early feedback on this fix?

Yesterday I downloaded the new release of Payara Micro 5.2021.1 (which includes this fix) and started testing the persistent schedules functionality (we create them both with @Schedule and via TimerService), since from the fix topic it seems that persistent timers are going to be affected.

Please see my findings below, with reference to the Payara Micro Community documentation for persistent timers -> How EJB timers are persisted.

My tests consider two scenarios (using a cluster with two nodes launched with the arguments --name node1 and --name node2):

Scenario 1: instance (node1) that created persistent timer goes DOWN.

In pre-5.2021.1 versions (e.g. 5.2020.7):
The following statement from the documentation:
"If that instance goes down, the timer will be recreated on another instance with the same name once it joins the cluster."
actually worked as:
"If that instance goes down, the timer will be recreated (running) on another instance in the cluster."
So the timer actually migrated to the other node (node2) in the cluster and kept running on it. Yes, we were aware that this didn't follow the documentation, but we actually liked that behavior: we need a timer running on a single node in the cluster, with automatic failover to another node in case the one that created it goes down.

Since 5.2021.1 it seems that this was fixed, and now the persistent timer doesn't run in the cluster (which basically corresponds to the documentation: "Until that time, the timer becomes inactive.").
Just to note that my tests actually showed that the persistent timer runs once on the other node (node2) in the cluster, but after that single run node2 shows a message like:
[2021-02-11T13:16:34.957+0800] [] [INFO] [] [javax.enterprise.system.container.ejb.fish.payara.ejb.timer.hazelcast] The timer (1@@1613020572690@@node1@@clusterDev) is now owned by (node1). Removing from local cache

It seems everything now works in compliance with the documentation (spec), except that the timer managed to run once on another node. But okay, in general no action is needed.
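
For anyone reading the log line above, the timer ID looks like four fields separated by `@@`. The sketch below splits it on that assumption; the layout is inferred from this single log line rather than from Payara's source, and the class and field names are hypothetical. Reading the second field as epoch milliseconds is also an assumption, although it decodes to a time 22 seconds before the log entry.

```java
import java.time.Instant;

// Sketch only: the "@@"-separated layout is inferred from one log line,
// not taken from Payara's source. Class and field names are hypothetical.
class TimerIdSketch {
    final String sequence;      // first field, "1" in the log line
    final Instant created;      // second field, read as epoch milliseconds (assumption)
    final String instanceName;  // third field, matches the owner named in the log
    final String lastField;     // fourth field, "clusterDev"; its meaning is not confirmed here

    TimerIdSketch(String sequence, Instant created, String instanceName, String lastField) {
        this.sequence = sequence;
        this.created = created;
        this.instanceName = instanceName;
        this.lastField = lastField;
    }

    // Splits a timer ID into its four assumed fields.
    static TimerIdSketch parse(String timerId) {
        String[] parts = timerId.split("@@");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields but got " + parts.length);
        }
        Instant created = Instant.ofEpochMilli(Long.parseLong(parts[1]));
        return new TimerIdSketch(parts[0], created, parts[2], parts[3]);
    }

    public static void main(String[] args) {
        TimerIdSketch id = parse("1@@1613020572690@@node1@@clusterDev");
        // prints: node1 2021-02-11T05:16:12.690Z
        System.out.println(id.instanceName + " " + id.created);
    }
}
```

The decoded instant is 13:16:12 at +0800, which is consistent with the timer having been created on node1 shortly before node2 logged the ownership message.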

Scenario 2: instance (node1) that created persistent timer goes UP after it was down.

In pre-5.2021.1 versions (e.g. 5.2020.7) the persistent timer kept running on the other node (node2).

As of 5.2021.1: according to the documentation, the following should happen: "If that instance goes down, the timer will be recreated on another instance with the same name once it joins the cluster."
But in my tests, once I launch node1 again (the instance that originally created the timer), the timer is not re-created on it. Basically, I do not see the timer running anywhere in the cluster anymore.
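
The expectation taken from the documentation can be written down as a small model. This is only a toy sketch of the documented rule (a timer is tied to the name of the instance that created it and is inactive while no instance with that name is in the cluster). It is not Payara's implementation, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the documented rule only; not how Payara implements timers.
class TimerOwnershipModel {
    private final String ownerName;                       // name of the instance that created the timer
    private final Set<String> liveInstances = new HashSet<>();

    TimerOwnershipModel(String ownerName) {
        this.ownerName = ownerName;
        liveInstances.add(ownerName);                     // the creator is up when the timer is created
    }

    void join(String instanceName) {
        liveInstances.add(instanceName);
    }

    void leave(String instanceName) {
        liveInstances.remove(instanceName);
    }

    // Returns the instance the timer should run on, or null while it is inactive.
    String activeOn() {
        return liveInstances.contains(ownerName) ? ownerName : null;
    }

    public static void main(String[] args) {
        TimerOwnershipModel timer = new TimerOwnershipModel("node1");
        timer.join("node2");
        timer.leave("node1");
        System.out.println("node1 down: " + timer.activeOn());   // inactive, per the documentation
        timer.join("node1");
        System.out.println("node1 back: " + timer.activeOn());   // documented expectation: node1
    }
}
```

Under this model, Scenario 2 should end with the timer active on node1 again, which is the step that did not happen in the tests described here.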

These are our preliminary observations on persistent timer behavior in Payara Micro 5.2021.1 with the fix from this ticket.
I would like to emphasize that we are not registering a separate issue for this yet, as we are still checking and reconfirming our observations, and also adapting our application to the new behavior of persistent timers (we need to migrate to the new release for its fixes to other bugs).

We will do our best to prepare and publish test applications and test scenarios demonstrating the reported behavior. Once that is done and fully confirmed in our tests, we will register it as a separate issue.

Thank you!

aubi pushed a commit to aubi/Payara that referenced this pull request Jan 3, 2022
…Singleton-Bugs

FISH-796 Fix clustered singleton bugs and added test