Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[master] ConcurrencyException "Max number of attempts to lock object" in WriteLockManager - bugfix #2323

Merged

Conversation

rfelcman
Copy link
Contributor

Fixes #2319 in WriteLockManager by toWaitOn.getInstanceLockCondition().await(MAX_WAIT, TimeUnit.MILLISECONDS); to wait for lock on object (CacheKey) to be released.
Before this fix there was
toWaitOnLockCondition.await(ConcurrencyUtil.SINGLETON.getAcquireWaitTime(), TimeUnit.MILLISECONDS);
which was
toWaitOnLockCondition.await(0, TimeUnit.MILLISECONDS); in case of default persistence settings -> Zero time ammount to wait.
Additionally there are some other migrations from synchronized to ReentrantLock usage for CacheKey related code.

…LockManager - bugfix

Fixes eclipse-ee4j#2319 in `WriteLockManager` by `toWaitOn.getInstanceLockCondition().await(MAX_WAIT, TimeUnit.MILLISECONDS);` to wait for lock on object (CacheKey) to be released.
Before this fix there was `toWaitOnLockCondition.await(ConcurrencyUtil.SINGLETON.getAcquireWaitTime(), TimeUnit.MILLISECONDS);` which was `toWaitOnLockCondition.await(0, TimeUnit.MILLISECONDS);` in case of default settings -> Zero time ammount to wait.
Additionally there are some other migrations from `synchronized` to ReentrantLock usage for CacheKey related code.

Signed-off-by: Radek Felcman <[email protected]>
@dmatej
Copy link

dmatej commented Dec 10, 2024

Additionally there are some other migrations from synchronized to ReentrantLock usage for CacheKey related code.

This could be related to the #2219 (comment) ? The linked issue is a blocker both for Spring and GlassFish now, I think you missed one systematic problem when migrating synchronized to locks.

I would have to spend much more time to analyze which change exactly causes the deadlock but as the change is a breaking change, I would recommend to revert it at least for 4.0.x versions. For now everybody stays with 4.0.3, GlassFish 7 with EclipseLink 4.0.4 doesn't pass even TCK for Jakarta EE 10.

@rfelcman
Copy link
Contributor Author

Additionally there are some other migrations from synchronized to ReentrantLock usage for CacheKey related code.

This could be related to the #2219 (comment) ? The linked issue is a blocker both for Spring and GlassFish now, I think you missed one systematic problem when migrating synchronized to locks.

I would have to spend much more time to analyze which change exactly causes the deadlock but as the change is a breaking change, I would recommend to revert it at least for 4.0.x versions. For now everybody stays with 4.0.3, GlassFish 7 with EclipseLink 4.0.4 doesn't pass even TCK for Jakarta EE 10.

About #2219 . I think it's related with DatabaseAccessor or DatasourceAccessor only. Not CacheKey releated changes.
In case #2219 I attached there some initial test case https://github.com/user-attachments/files/17043934/jpa-bug-2219-SpringbootHikariDSDeadLock.tar.gz with hope that somebody from comunity with SpringBoot knowledge will modify and update this as I don't have any experience with Spring/SpringBoot. But #2219 is now in progress. Simplest way should be revert back to synchronized but we are pushed by other projects to ReentrantLock way.
About TCK. It passing to us see https://ci.eclipse.org/eclipselink/job/TCK_JakartaEE10_JPA_run/18/consoleFull
It's for 4.0.4-RC1.v202407151106-059428cdd2583c46f1f3e50d235854840a6fa9a7 but 4.0.4 is same with same git commit ID 059428c .

@rfelcman
Copy link
Contributor Author

@dmatej are available somewhere some details logs/ Jenkins output about Glassfish TCK failures related with EclipseLink 4.0.4?

@dmatej
Copy link

dmatej commented Dec 15, 2024

It is a race condition, originally we blamed Weld, but then it failed after we reverted weld upgrade. Then by doing bisect we have found that the EclipseLink update to 4.0.4 is to blame. It is a race condition, so ti passes in some 30% of runs on my machine - it is quite well reproducible if you run those tests several times.

Jenkins CI has shorter history now, but you can checkout GlassFish's master branch, then run this:

# just once, to download the tck
mvn clean install -Ptck -pl :jakarta-platform-tck 
# to build glassfish with other EclipseLink than default 4.0.3
mvn clean install -T4C -Pfastest -Declipselink.version=4.0.4 
# to run unstable tests; be sure you have free GlassFish ports, but it will tell you.
mvn clean install -Ptck -pl :platform-tck-runner -Dit.test=CustomITest -Dtck.module=ejb32/lite/timer/basic/sharing

Copy link
Member

@Tomas-Kraus Tomas-Kraus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@rfelcman rfelcman merged commit e4245e2 into eclipse-ee4j:master Dec 16, 2024
6 checks passed
@rfelcman rfelcman deleted the bug_jpa_2319_concurrency_exception branch December 16, 2024 18:17
rfelcman added a commit to rfelcman/eclipselink that referenced this pull request Dec 16, 2024
…LockManager - bugfix (eclipse-ee4j#2323)

Fixes eclipse-ee4j#2319 in `WriteLockManager` by `toWaitOn.getInstanceLockCondition().await(MAX_WAIT, TimeUnit.MILLISECONDS);` to wait for lock on object (CacheKey) to be released.
Before this fix there was `toWaitOnLockCondition.await(ConcurrencyUtil.SINGLETON.getAcquireWaitTime(), TimeUnit.MILLISECONDS);` which was `toWaitOnLockCondition.await(0, TimeUnit.MILLISECONDS);` in case of default settings -> Zero time ammount to wait.
Additionally there are some other migrations from `synchronized` to ReentrantLock usage for CacheKey related code.

Signed-off-by: Radek Felcman <[email protected]>
(cherry picked from commit e4245e2)
rfelcman added a commit that referenced this pull request Dec 18, 2024
…LockManager - bugfix - backport from master (#2329)

* ConcurrencyException "Max number of attempts to lock object" in WriteLockManager - bugfix (#2323)

Fixes #2319 in `WriteLockManager` by `toWaitOn.getInstanceLockCondition().await(MAX_WAIT, TimeUnit.MILLISECONDS);` to wait for lock on object (CacheKey) to be released.
Before this fix there was `toWaitOnLockCondition.await(ConcurrencyUtil.SINGLETON.getAcquireWaitTime(), TimeUnit.MILLISECONDS);` which was `toWaitOnLockCondition.await(0, TimeUnit.MILLISECONDS);` in case of default settings -> Zero time ammount to wait.
Additionally there are some other migrations from `synchronized` to ReentrantLock usage for CacheKey related code.

Signed-off-by: Radek Felcman <[email protected]>
(cherry picked from commit e4245e2)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ConcurrencyException "Max number of attempts to lock object" in WriteLockManager
3 participants