Stale read for rawkv with read ts #96
Signed-off-by: iosmanthus <[email protected]>
TiKV currently supports **three** features to process read-only queries more efficiently.

1. Follower read.
You mean replica read?
Yes.
Then the correct name should be used.
text/0096-rawkv-stale-read.md
Outdated
2. While TiKV is handling radw read-related requests, construct a `SnapContext` with the `read_ts` before acquiring a snapshot from `storage`.
Suggested change:
2. While TiKV is handling raw read-related requests, construct a `SnapContext` with the `read_ts` before acquiring a snapshot from `storage`.
```diff
class RawKVClient {
+ ByteString rawGet(ByteString key, readTs: Timestamp)
```
I think clients might find it confusing what `readTs` means in RawKV.
How about changing the time type to `DataTime` instead of using `TimeStamp`, which might be more like the syntax of TiDB: https://docs.pingcap.com/tidb/dev/as-of-timestamp#syntax
text/0096-rawkv-stale-read.md
Outdated
### TiKV

When reading data, clients should specify a timestamp, attached to the request header as `read_ts`, typically a timestamp a few seconds in the past. The replica should read the local storage with the `read_ts`, reusing the stale-read mechanism of TxnKV. This requires the replica to check the `read_ts` against the `safe_ts`, which is advanced by the `CheckLeader` message from the leader's store or by the `resolve-ts` worker. As long as the `safe_ts` is no less than the `read_ts`, the replica is allowed to read the key from local storage.
How is `safe_ts` maintained in RawKV, since there are no locks?
If there are no locks, the `resolve_ts` worker will advance the `safe_ts` by requesting a timestamp from the TSO periodically. The default interval configured for the `resolve_ts` worker is `1s`.
More details are supplemented.
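To make the reply above concrete, here is a minimal Python sketch of how a `resolve_ts`-style worker could advance `safe_ts` for regions that hold no locks. It is an illustration only, not TiKV code: `Region`, `advance_safe_ts`, and the `get_tso` callback are hypothetical names, and the sketch deliberately ignores writes that are still in flight through Raft.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Region:
    """Hypothetical per-region state kept by a resolve-ts style worker."""
    region_id: int
    safe_ts: int = 0


def advance_safe_ts(regions: Iterable[Region], get_tso: Callable[[], int]) -> int:
    """Run one tick of the worker: fetch a timestamp and raise every safe_ts to it.

    Assumption: no region holds locks, so the fetched timestamp is used directly.
    safe_ts is kept monotonic, so an older timestamp never moves it backwards.
    """
    ts = get_tso()
    for region in regions:
        region.safe_ts = max(region.safe_ts, ts)
    return ts
```

In a real deployment the tick would repeat on the configured interval (the `1s` default mentioned above), so a replica's `safe_ts` would lag the TSO by roughly that interval plus propagation delay.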
…thus/tikv-rfcs into stale-read-for-rawkv-with-read-ts
1. Follower read.

Follower read allows reading from the followers. Without breaking the linear consistency guarantee, the follower will send a read-index request to the leader. The leader will not respond with the actual value, instead, send a round of heartbeats to confirm its leadership and calculate the largest commit index (read index) across the cluster for the follower. After the follower advances its apply index to the read index, it is safe to get data from the local storage and respond to it to the client. This feature helps distribute the read stress on the leader but still increases the read latency.
Suggested change:
Follower read allows reading from the followers. Without breaking the linear consistency guarantee, the follower will send a read-index request to the leader. The leader will not respond with the actual value, instead, send a round of heartbeats to confirm its leadership and calculate the largest commit index (read index) across the cluster for the follower. After the follower advances its apply index to the read index, it is safe to get data from the local storage and respond to the client. This feature helps distribute the read stress on the leader but still increases the read latency.
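The read-index flow described in that paragraph can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Raft implementation in TiKV: `Leader`, `Follower`, and their methods are hypothetical, the heartbeat round is reduced to a comment, and the follower is assumed to already hold the replicated log entries it needs to apply.

```python
class Leader:
    """Hypothetical leader that only exposes its commit index."""

    def __init__(self, commit_index: int):
        self.commit_index = commit_index

    def read_index(self) -> int:
        # A real leader would first confirm its leadership with a round of
        # heartbeats; the sketch skips that and returns the commit index.
        return self.commit_index


class Follower:
    """Hypothetical follower serving reads from its local storage."""

    def __init__(self, leader: Leader, log: list):
        self.leader = leader
        self.log = log            # replicated entries; entry i has index i + 1
        self.applied_index = 0
        self.storage = {}

    def apply_until(self, index: int) -> None:
        # Apply log entries in order until the apply index reaches `index`.
        while self.applied_index < index:
            key, value = self.log[self.applied_index]
            self.storage[key] = value
            self.applied_index += 1

    def read(self, key: str):
        read_index = self.leader.read_index()   # extra round trip to the leader
        self.apply_until(read_index)            # catch up before reading
        return self.storage.get(key)
```

The extra round trip in `read` is where the added latency mentioned above comes from: the value is served locally, but only after the leader has been consulted.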
The `read_ts` specified by the client can be acquired in the following ways:

1. Calculate a timestamp from the local physical time. The `read_ts` might suffer from clock drift and exceed the max timestamp allocated by TSO. The client will then fail to read any data, even if the target replica is the leader, since the `safe_ts` of the replica has not caught up with the `read_ts`. **Deploying NTP services** in the cluster might mitigate this issue.
Can it be 0?
We can reserve this value for the unbounded stale read: read the latest data without checking `safe_ts`.
Then how to preserve compatibility?
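For readers following this thread, the sketch below shows one way a client could derive a `read_ts` from its local clock, as the quoted list item describes. It assumes a PD-style timestamp layout (physical milliseconds in the high bits, an 18-bit logical counter in the low bits); `LOGICAL_BITS`, `compose_ts`, and `stale_read_ts` are illustrative names, not part of any client API, and the meaning of a zero `read_ts` is left open, as it is in the discussion above.

```python
from datetime import datetime, timedelta, timezone

# Assumption: PD-style layout, physical milliseconds shifted left by 18 bits.
LOGICAL_BITS = 18
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def compose_ts(physical_ms: int, logical: int = 0) -> int:
    """Pack physical milliseconds and a logical counter into one integer."""
    return (physical_ms << LOGICAL_BITS) | logical


def stale_read_ts(now: datetime, staleness: timedelta) -> int:
    """Build a read_ts that lies `staleness` behind the local clock `now`.

    If the local clock runs ahead of the TSO, the result may exceed every
    safe_ts in the cluster and the read would be rejected.
    """
    target = now - staleness
    physical_ms = (target - EPOCH) // timedelta(milliseconds=1)
    return compose_ts(physical_ms)
```

A client would typically call `stale_read_ts(datetime.now(timezone.utc), timedelta(seconds=5))`; a larger staleness leaves more headroom against clock drift at the cost of older data.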
1. Calculate a timestamp from the local physical time. The `read_ts` might suffer from clock drift and exceed the max timestamp allocated by TSO. The client will then fail to read any data, even if the target replica is the leader, since the `safe_ts` of the replica has not caught up with the `read_ts`. **Deploying NTP services** in the cluster might mitigate this issue.
If `read_ts` exceeds the max timestamp allocated from TSO, maybe we can just return the latest data instead of no data.
Then the `read_ts` will lose its restriction on data freshness, since some very stale replicas might be chosen.
### TiKV

When reading data, clients should specify a timestamp, attached to the request header as `read_ts`, typically a timestamp a few seconds in the past. The replica should read the local storage with the `read_ts`, reusing the stale-read mechanism of TxnKV. This requires the replica to check the `read_ts` against the `safe_ts`, which is advanced by the `CheckLeader` message from the leader's store (for followers) or by the `resolve-ts` worker (for the leader). As long as the `safe_ts` is no less than the `read_ts`, the replica is allowed to read the key from local storage. Note that there are no locks in RawKV regions, so the `resolve-ts` worker advances the `safe_ts` by requesting the latest timestamp from TSO.
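The replica-side check in that paragraph comes down to a single comparison. The Python sketch below shows it under stated assumptions: `Replica`, `raw_get`, and the `DataIsNotReady` exception are hypothetical stand-ins, and the read returns whatever the local storage currently holds, with no per-key timestamp involved.

```python
class DataIsNotReady(Exception):
    """Hypothetical error raised when the replica's safe_ts lags the read_ts."""


class Replica:
    """Hypothetical replica holding a safe_ts and a local key-value store."""

    def __init__(self):
        self.safe_ts = 0
        self.storage = {}

    def update_safe_ts(self, ts: int) -> None:
        # Driven by CheckLeader messages on followers or by the resolve-ts
        # worker on the leader; never moves backwards.
        self.safe_ts = max(self.safe_ts, ts)

    def raw_get(self, key: str, read_ts: int):
        # The read is allowed only if safe_ts is no less than read_ts.
        if read_ts > self.safe_ts:
            raise DataIsNotReady(
                f"read_ts {read_ts} is ahead of safe_ts {self.safe_ts}"
            )
        return self.storage.get(key)
```

On `DataIsNotReady` a client could retry on another replica or fall back to the leader; which of these the RFC intends is not settled by the quoted text.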
- What's the meaning of `safe_ts` for RawKV? I suggest giving a definition, e.g., "the minimum timestamp of the on-the-fly RawKV writes", or "all writes before `safe_ts` can be read".
- If the definition depends on the "timestamp" of RawKV writes, this feature depends on the timestamp introduced by API V2, is that right?
- The mechanism to get the "minimal timestamp" of the on-the-fly writes would be quite different between Txn and Raw. Although there are no locks, Raw writes would still be "on-the-fly" during the Raft procedure.
- RawKV CDC faces a very similar problem in tracking "on-the-fly" writes for `resolved-ts`. I think we can reuse it for stale read. Please refer to RawKV Change Data Capture #86.
There is a special case where the user may choose availability over consistency. In that case the client is OK to read with any ts, i.e., to get whatever the replica currently has. In this RFC, it seems keys with a larger ts may be skipped during read.
This RFC doesn't depend on the keys' timestamp, the underlying storage could have no information about the timestamp. The |
I'm OK with a 0 timestamp. Currently, txn stale read considers ts 0 an error, and a client (like TiDB) may actually send ts 0 by mistake. This RFC should state clearly what 0 means in RawKV, and the implementation should not break compatibility.
This pull request is based on #80.