-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spanner: Re-create sessions that have been invalidated by the server #4734
Spanner: Re-create sessions that have been invalidated by the server #4734
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4734 +/- ##
=========================================
Coverage 50.36% 50.36%
Complexity 23665 23665
=========================================
Files 2233 2233
Lines 225856 225856
Branches 24956 24956
=========================================
Hits 113742 113742
Misses 103517 103517
Partials 8597 8597 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have stylistic preferences. I'm not going to block this PR for those comments.
...e-cloud-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/SessionPool.java
Show resolved
Hide resolved
...e-cloud-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/SessionPool.java
Show resolved
Hide resolved
...e-cloud-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/SessionPool.java
Show resolved
Hide resolved
...e-cloud-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/SessionPool.java
Show resolved
Hide resolved
...e-cloud-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/SpannerImpl.java
Show resolved
Hide resolved
...e-cloud-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/SessionPool.java
Show resolved
Hide resolved
...e-cloud-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/SessionPool.java
Outdated
Show resolved
Hide resolved
...e-cloud-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/SessionPool.java
Outdated
Show resolved
Hide resolved
...ud-clients/google-cloud-spanner/src/test/java/com/google/cloud/spanner/GceTestEnvConfig.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One high level comment, customers should ideally never see a SessionNotFound with this change. But in ur change, u only have 1 retry for error case. I would change it such that it is always retried for SessionNotFound
|
||
private void maybeRecreateUnderlyingSession(SessionNotFoundException e) { | ||
if (beforeFirstServerCall && allowDelegateRecreation) { | ||
delegate = spanner.createSession(db); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we get another session from the pool instead of creating a new one ?
I am afraid that the latency on this request will be too high when we internally retry.
try { | ||
T res = method.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be after markUsed() ?
} catch (SessionNotFoundException e) { | ||
maybeRecreateUnderlyingSession(e); | ||
lastException = null; | ||
return internalRunWithSessionRetry(method); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a chance we got sessionNotFound here as well..
have u considered putting this in a loop ?
Pros: customers never see a SessionNotFound
Cons: high latency, but our timeouts should still take effect.
} catch (SessionNotFoundException e) { | ||
maybeRecreateUnderlyingSession(e); | ||
TransactionRunner newRunner = delegate.readWriteTransaction(); | ||
result = newRunner.run(callable); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above, this could potentially still see a sessionNotFound (SNF)
10ccbb2
to
a05d6f1
Compare
@olavloite, it looks like there are conflicts. |
e0224e2
to
a63929e
Compare
...e-cloud-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/SessionPool.java
Show resolved
Hide resolved
try { | ||
return callable.apply(session); | ||
} catch (SessionNotFoundException e) { | ||
session = pool.replaceReadWriteSession(e, session); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following up from my last comment, if the session is READ, is this correct, as replaceReadWriteSession eventually returns 'getReadWriteSession()'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, it wasn't right, even though the method was never called with READ. But that was not correct either, as executePartitionedUpdate
does not need a read/write session, as it always starts a new transaction (as it needs a special type of transaction).
@@ -188,7 +188,7 @@ public long executePartitionedUpdate(final Statement stmt) { | |||
Span span = tracer.spanBuilder(PARTITION_DML_TRANSACTION).startSpan(); | |||
try (Scope s = tracer.withSpan(span)) { | |||
return runWithSessionRetry( | |||
SessionMode.READ_WRITE, | |||
SessionMode.READ, // PartitionedUpdate does not need a prepared tx. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this in a separate PR? Because this PR is quite large, I'd like to separate out the recreating sessions implementation from the refactorings/fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll move it to a separate PR later this afternoon.
} | ||
|
||
@VisibleForTesting | ||
int getNumberOfAvailableWritePreparedSessions() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not DBClient's responsibility. I wouldn't expose it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method is now removed and the visibility of the session pool changed to package-private to make it visible for test cases.
@@ -69,7 +99,7 @@ public Timestamp writeAtLeastOnce(Iterable<Mutation> mutations) throws SpannerEx | |||
public ReadContext singleUse() { | |||
Span span = tracer.spanBuilder(READ_ONLY_TRANSACTION).startSpan(); | |||
try (Scope s = tracer.withSpan(span)) { | |||
return pool.getReadSession().singleUse(); | |||
return getReadSession().singleUse(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious whats the behavior with read sessions seeing SNF errors ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SNF is detected at the first actual database request. For read-only transactions (both single use and actual read transactions), that is at the first query. The query is sent to the server at the first call to ResultSet#next(), so the SNF for a read transaction is handled here:
Line 129 in 1b61e28
return internalNext(); |
} catch (RuntimeException e) { | ||
TraceUtil.endSpanWithFailure(span, e); | ||
throw e; | ||
} | ||
} | ||
|
||
private <T> T runWithSessionRetry(Function<Session, T> callable) { | ||
PooledSession session = getReadWriteSession(); | ||
while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this respect client side timeout values set ?
If our backend is down, and does not return any sessions -> whats the outcome of the customer application ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The short answer: It will behave the same as now, and the behavior will depend on whether the pool has a session ready to be returned or not. The runWithSessionRetry
will only retry on the specific condition of a SessionNotFoundException
being returned.
There are two possible scenarios:
- The application requests a session and the session pool has a session that it thinks is valid. This session is handed to the application and the application tries to execute the transaction. This will cause the query or begin transaction statement to time out.
- The pool does not have a session available that can be returned and starts the creation of a new session asynchronously. The requesting thread is placed in a waiter. The create session request will fail and notify the waiter of the error. The waiting thread will return with the create session error.
Both of the above will not be considered a SessionNotFoundException
by the runWithSessionRetry
method, and the method will return with an error.
Replaced while(true) loops with internal methods. Added method for creating ReadOnlyTransactions.
When a SessionNotFoundException is thrown, a new session is now picked from the pool, instead of creating a new session on the server. This reduces latency in calls that get an invalidated session from the pool. It does however also require that the retry-logic for SessionNotFound is put in a loop, as the second session returned from the pool could also possibly be an invalidated session.
When a SessionNotFoundException occurs, a new session should be requested from the SessionPool instead of creating a new one.
829b657
to
55ec5ec
Compare
Sessions in the session pool that have been invalidated by the server may cause
SessionNotFoundException
s to bubble up to the user, even though the user has no control over the session management. When this occurs, the session should be removed from the SessionPool and a new session should be fetched from the SessionPool without the user having to intervene.In case of a
SessionNotFoundException
being thrown while using aTransactionManager
, the client library will replace the session with a new one from the pool, and trigger a restart of the transaction by throwing anAbortedException
(it is not possible to replace the session without also restarting the transaction).This change has been initiated on request from the Spanner team.