-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify zone db locking to avoid a race #2561
Conversation
Signed-off-by: Robert (Bobby) Evans <[email protected]>
build |
|
||
if (lock.isLoading) { | ||
// another thread is loading(), return | ||
synchronized (GpuTimeZoneDB.class) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still leaves room for a race where shutdown is called from another thread after the lock is released on L87. Should we make the whole method cacheDatabaseAsync
synchronized
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the second thread will check that the resource it is trying to update is not null, which would be closed and checked under the lock. So the second thread will do work, unnecessarily, but I don't see a case for a runtime error here, unless I am missing something.
But agree, what if this was all locked, is it bad?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gerashegalov is 100% correct. I will fix it.
Is it bad?
Yes and no. We should not get inconsistent data, but we might load data after it was shutdown was called, and have no way to properly free it. It is on shutdown, but the change is small enough, and better enough, that I think it is best.
@gerashegalov and @abellina I fixed the issue with locking, but I could not leave it alone and I removed the singleton instance + static methods abstraction. It was not actually needed for unit tests, so i just removed it to make the code a little bit cleaner. Hopefully. |
build |
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I started to run into a race condition when running the integration tests in spark-rapids for an unrelated PR where
Would fail 2 of the tests with a null pointer exception saying that Z was not inside
fixedTransitions
.I am not 100% sure how this ended up as a problem. Looking at the code quickly I don't see how this would happen. But when I added a synchronize to
fromUtcTimestampToTimestamp
so that my log messages would come out not-intermixed, the problem went away. The synchronization logic is here so that shutdown can be called while we are still trying to load the DB. But that also is a bug because we don't want to assume that the data is freed properly when we are trying to allocate it. So I just made the locking simpler. Every critical section, loading the DB and freeing the resources, is now protected by synchronizing onGpuTimeZoneDB.class
, which is the same as a synchronize on a static method in that class.