Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify zone db locking to avoid a race #2561

Merged
merged 6 commits into from
Nov 4, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
91 changes: 17 additions & 74 deletions src/main/java/com/nvidia/spark/rapids/jni/GpuTimeZoneDB.java
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,8 @@ public class GpuTimeZoneDB {
// use this reference to indicate if time zone cache is initialized.
private HostColumnVector fixedTransitions;

private static boolean isShutdownCalledEver = false;

// Guarantee singleton instance
private GpuTimeZoneDB() {
}
Expand All @@ -70,49 +72,25 @@ static GpuTimeZoneDB getInstance() {
return instance;
}

static class LoadingLock {
Boolean isLoading = false;

// record whether a shutdown is called ever.
// if `isCloseCalledEver` is true, then the following loading should be skipped.
Boolean isShutdownCalledEver = false;
}

private static final LoadingLock lock = new LoadingLock();

/**
* This should be called on startup of an executor.
* Runs in a thread asynchronously.
* If `shutdown` was called ever, then will not load the cache
*/
public static void cacheDatabaseAsync() {
synchronized (lock) {
if (lock.isShutdownCalledEver) {
// shutdown was called ever, will never load cache again.
return;
}

if (lock.isLoading) {
// another thread is loading(), return
synchronized (GpuTimeZoneDB.class) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still leaves room for a race where shutdown is called from another thread after the lock is released on L87. Should we make the whole method cacheDatabaseAsync synchronized instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the second thread will check that the resource it is trying to update is not null, which would be closed and checked under the lock. So the second thread will do work, unnecessarily, but I don't see a case for a runtime error here, unless I am missing something.

But agree, what if this was all locked, is it bad?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gerashegalov is 100% correct. I will fix it.

Is it bad?

Yes and no. We should not get inconsistent data, but we might load data after it was shutdown was called, and have no way to properly free it. It is on shutdown, but the change is small enough, and better enough, that I think it is best.

if (isShutdownCalledEver) {
log.error("cache async called after DB already loaded");
return;
} else {
lock.isLoading = true;
}
}

// start a new thread to load
Runnable runnable = () -> {
try {
instance.cacheDatabaseImpl();
} catch (Exception e) {
log.error("cache time zone transitions cache failed", e);
} finally {
synchronized (lock) {
// now loading is done
lock.isLoading = false;
// `cacheDatabase` and `shutdown` may wait loading is done.
lock.notify();
}
}
};
Thread thread = Executors.defaultThreadFactory().newThread(runnable);
Expand All @@ -127,61 +105,26 @@ public static void cacheDatabaseAsync() {
* If cache is exits, do not load cache again.
*/
public static void cacheDatabase() {
synchronized (lock) {
if (lock.isLoading) {
// another thread is loading(), wait loading is done
while (lock.isLoading) {
try {
lock.wait();
} catch (InterruptedException e) {
throw new IllegalStateException("cache time zone transitions cache failed", e);
}
}
return;
} else {
lock.isLoading = true;
}
}

try {
instance.cacheDatabaseImpl();
} finally {
// loading is done.
synchronized (lock) {
lock.isLoading = false;
// `cacheDatabase` and/or `shutdown` may wait loading is done.
lock.notify();
}
}
instance.cacheDatabaseImpl();
}

/**
* close the cache, used when Plugin is closing
*/
public static void shutdown() {
synchronized (lock) {
lock.isShutdownCalledEver = true;
while (lock.isLoading) {
// wait until loading is done
try {
lock.wait();
} catch (InterruptedException e) {
throw new IllegalStateException("shutdown time zone transitions cache failed", e);
}
}
instance.shutdownImpl();
// `cacheDatabase` and/or `shutdown` may wait loading is done.
lock.notify();
}
public static synchronized void shutdown() {
isShutdownCalledEver = true;
instance.shutdownImpl();
}

private void cacheDatabaseImpl() {
if (fixedTransitions == null) {
try {
loadData();
} catch (Exception e) {
closeResources();
throw e;
synchronized (GpuTimeZoneDB.class) {
if (fixedTransitions == null) {
try {
loadData();
} catch (Exception e) {
closeResources();
throw e;
}
}
}
}
Expand Down
Loading