Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs/RFCS: first draft of a sqlliveness RFC #50377

Merged
merged 1 commit into from
Jul 6, 2020

Conversation

ajwerner
Copy link
Contributor

There's been a lot of back-and-forth over what we're trying to build and why.
Hopefully an RFC can help us to consolidate on terminology, goals, and
implementation details.

Release note: None

@ajwerner ajwerner requested a review from spaskob June 18, 2020 00:44
@ajwerner ajwerner requested a review from a team as a code owner June 18, 2020 00:44
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ajwerner ajwerner force-pushed the ajwerner/sqlliveness-RFC branch from 03ff2c0 to db5420f Compare June 18, 2020 03:12
Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @spaskob)


docs/RFCS/20200615_sql_liveness.md, line 47 at r1 (raw file):

This use maps directly to the use of node liveness to drive epoch-based
range leases. We would like to utilize SQL liveness to create leases
which do not need to be independently heartbeated to remain active. This

It would be nice to mention the scalability concern up front (this is the blowup from one SQL instance holding leases on many tables). I didn't realize that that was the issue until much later in the doc.


docs/RFCS/20200615_sql_liveness.md, line 55 at r1 (raw file):

protocol is madness.

2) Discovery of SQL instances and their metadata.

We need to be careful with this - we specifically avoided use cases like this in node liveness because of the critical importance of the liveness range (so we keep its load to the minimum writes necessary). Most nodes don't even read from the node liveness range directly; one node reads and then the information propagates via gossip.


docs/RFCS/20200615_sql_liveness.md, line 64 at r1 (raw file):

## Key Terms

 * `Instance` - An instance is a (unix) process running a SQL server. An instance is bound to a single SQL tenant. Instances have at various times been referred to as a SQL pod, or a SQL node. An instance will have an `InstanceID` which is gauranteed to be unique within the tenant cluster while the `Instance` exists. 

"Instance" is pretty generic; while we're introducing new terminology I'd prefer to use something more descriptive. At least call it a "SQL Instance".


docs/RFCS/20200615_sql_liveness.md, line 65 at r1 (raw file):

 * `Instance` - An instance is a (unix) process running a SQL server. An instance is bound to a single SQL tenant. Instances have at various times been referred to as a SQL pod, or a SQL node. An instance will have an `InstanceID` which is gauranteed to be unique within the tenant cluster while the `Instance` exists. 
 * `SQLInstanceID` - A 64-bit integer ID associated with an instance. It is guaranteed that each existing `Instance` will have a unique `SQLInstanceID`. It is, however, possible that an `SQLInstanceID` will be

Unless there's a really strong reason to use an int64, IDs should be bytes, not a numeric type (you're not going to do math with them).


docs/RFCS/20200615_sql_liveness.md, line 68 at r1 (raw file):

 recycled and used by a different `Instance` which shares no state or
 properties with a previous holder of that ID.
 * `Epoch` - An epoch is a sequence number associated with the liveness

If a SQLInstanceID is reused, does its epoch restart from zero or continue from its previous value?


docs/RFCS/20200615_sql_liveness.md, line 70 at r1 (raw file):

 * `Epoch` - An epoch is a sequence number associated with the liveness
of an `SQLInstanceID` in a tenant cluster.
 * `Session` - A session is defined as a tuple of `(SQLInstanceID, Epoch)`.

What do we gain by using a tuple like this instead of a single opaque SessionID? Do we ever use the capability to link two sessions of the same instance?


docs/RFCS/20200615_sql_liveness.md, line 81 at r1 (raw file):

transactions which utilize the claimed resource or the transaction must
commit before a known expiration timestamp for the session (perhaps by
utilizing the `*kv.Txn`'s commit deadline functionality which has serious room for improvement).

It's perhaps worth noting that this commit deadline is not how range leases are implemented. It's how SQL descriptor leases are implemented, which came after the fact and were not part of the original design for node liveness. Range leases are ignorant of the expiration of epochs.


docs/RFCS/20200615_sql_liveness.md, line 303 at r1 (raw file):

Maybe that 5m is excessive as we anticipate running SQL
pods on pre-emptible instances. Shortening it too much is costly here.

Another reason these leases are so long is that we can't (couldn't?) update the commit deadline for a transaction so we needed to allow plenty of buffer time.


docs/RFCS/20200615_sql_liveness.md, line 308 at r1 (raw file):

One postgres feature that we've discussed building are
`pg_advisory_locks`. Having a sql-level liveness construct

I haven't thought about this deeply but I've thought now that we have SELECT FOR UPDATE we should implement these advisory locks as SFU on a system locks table.


docs/RFCS/20200615_sql_liveness.md, line 357 at r1 (raw file):

observability more difficult for little obvious gain. The epoch-based leases in
KV have shown us that node epochs, given their monotonic nature, give state in
the cluster a sense of time and place.

This seems less valuable to me in a world in which sql instances are themselves more ephemeral.


docs/RFCS/20200615_sql_liveness.md, line 427 at r1 (raw file):

        * Big bummer here is that we're talking about sticking foreign key
          relationships between system tables.
        * Also seems really bad to make marking a node as dead an expensive

It's better if you use session IDs instead of (instance id, epoch), because invalidating an old session need not interfere with the establishment of a new session from that instance as long as we're not trying to treat them as different incarnations of the same thing.


docs/RFCS/20200615_sql_liveness.md, line 432 at r1 (raw file):

    4) Perhaps there's a compromise here between 1) and 3) to be made here.
       Let's call this one lazy resolution.
        * This is a lot like intent resolution.

And like range leases. You must observe the liveness record in order to request a new lease, and then it's the local lease record that is consulted for future operations.

Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the read. An alternative layout might be something like:

CREATE TABLE system.sql_liveness (
    session_id         UUID PRIMARY KEY, -- or a STRING
    expiration         DECIMAL NOT NULL,
    sql_instance_id INT8 NOT NULL -- does this need to be here?
)

CREATE TABLE system.sql_instances (
    sql_instance_id INT8 PRIMARY KEY
    -- ... metadata
)

This is what @spaskob was originally thinking - or at least close to it. I pushed back only because I was worried about figuring out which container in our deployment actually is running some job. Perhaps that's overblown because we could just store both the session and sql_instance_id in the jobs table.

An even more extreme refinement might be to not include the sql_instance_id in the system.sql_liveness table at all. The real thing that I think @spaskob was thinking is just:

CREATE TABLE system.sql_lease_sessions (
     id               STRING PRIMARY KEY,
     expiration DECIMAL NOT NULL -- hlc
)

This would be just like etcd leases and would not be tightly coupled to instances in any way. Is that better?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, and @spaskob)


docs/RFCS/20200615_sql_liveness.md, line 55 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

We need to be careful with this - we specifically avoided use cases like this in node liveness because of the critical importance of the liveness range (so we keep its load to the minimum writes necessary). Most nodes don't even read from the node liveness range directly; one node reads and then the information propagates via gossip.

My thinking has been that the tightness here doesn't have to be as big. Also, I


docs/RFCS/20200615_sql_liveness.md, line 64 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

"Instance" is pretty generic; while we're introducing new terminology I'd prefer to use something more descriptive. At least call it a "SQL Instance".

👍


docs/RFCS/20200615_sql_liveness.md, line 65 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Unless there's a really strong reason to use an int64, IDs should be bytes, not a numeric type (you're not going to do math with them).

Ack.


docs/RFCS/20200615_sql_liveness.md, line 68 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

If a SQLInstanceID is reused, does its epoch restart from zero or continue from its previous value?

I was thinking it continues. There's an open question about when rows ever get removed that I'll add. Perhaps that's the best reason to use independent session IDs.


docs/RFCS/20200615_sql_liveness.md, line 70 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

What do we gain by using a tuple like this instead of a single opaque SessionID? Do we ever use the capability to link two sessions of the same instance?

I suppose not.


docs/RFCS/20200615_sql_liveness.md, line 81 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

It's perhaps worth noting that this commit deadline is not how range leases are implemented. It's how SQL descriptor leases are implemented, which came after the fact and were not part of the original design for node liveness. Range leases are ignorant of the expiration of epochs.

👍 I'd like to incorporate the discussion going on at the bottom of the document higher up here.


docs/RFCS/20200615_sql_liveness.md, line 303 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Another reason these leases are so long is that we can't (couldn't?) update the commit deadline for a transaction so we needed to allow plenty of buffer time.

This is discussed in open questions. We should set the deadline right as we create EndTransactions regardless of what happens here.


docs/RFCS/20200615_sql_liveness.md, line 308 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I haven't thought about this deeply but I've thought now that we have SELECT FOR UPDATE we should implement these advisory locks as SFU on a system locks table.

I don't think it's that simple. pg_advisory_lock has some tricky semantics that outlive individual transactions and can be tied to SQL sessions. There was a pretty good exploration of this during the hackathon. In the hackathon, the pitch we ran with was to expose a special type of intent which can act as a lease. The difference between a lease intent and a regular intent is that when a lease intent is resolved due to expiration, it moves the timestamp cache to its expiration time.

The only point I'm trying to make is that there are a number of ways to implement pg_advisory_lock and this isn't definitely the right one. The unreplicated locks in SELECT FOR UPDATE aren't an obvious fit but something using intents or some slight generalization might be.


docs/RFCS/20200615_sql_liveness.md, line 357 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This seems less valuable to me in a world in which sql instances are themselves more ephemeral.

Fair. I'm more than happy to rework the basic structure here. My concern with a UUID is that it's going to make it hard to map leases back to a sql pod. One compromise could be to put the ID which internally contains the SQLInstanceID? Another might be to store the SQLInstanceID in the session row in a different column.


docs/RFCS/20200615_sql_liveness.md, line 427 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

It's better if you use session IDs instead of (instance id, epoch), because invalidating an old session need not interfere with the establishment of a new session from that instance as long as we're not trying to treat them as different incarnations of the same thing.

Good point.

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, and @spaskob)


docs/RFCS/20200615_sql_liveness.md, line 55 at r1 (raw file):

Previously, ajwerner wrote…

My thinking has been that the tightness here doesn't have to be as big. Also, I

This comment is incomplete.


docs/RFCS/20200615_sql_liveness.md, line 357 at r1 (raw file):

Previously, ajwerner wrote…

Fair. I'm more than happy to rework the basic structure here. My concern with a UUID is that it's going to make it hard to map leases back to a sql pod. One compromise could be to put the ID which internally contains the SQLInstanceID? Another might be to store the SQLInstanceID in the session row in a different column.

I think a separate column for this makes more sense than encoding stuff in the ID.

Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdarnell in the above example we'd decouple the instance metadata from the liveness sessions. In that world I'd anticipate that instances write to the system.sql_instances prior to attempting to create the first session. I'd like to have a discussion about that decomposition and whether it's well received before I go through and adopt it in this RFC.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, and @spaskob)


docs/RFCS/20200615_sql_liveness.md, line 55 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This comment is incomplete.

True, I moved my discussion to the top level as I started to think more about this proposal. I don't feel a strong need to tightly couple these pieces of information. If I'm reading you correctly, you might like something closer to what I outlined in the top-level comment. Let's discuss there?

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the two-table version above. The instance ID isn't strictly necessary here but I think it will prove useful for observability/debugging.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, and @spaskob)

@ajwerner
Copy link
Contributor Author

@spaskob I have a sense that you'll find yourself in agreement with Ben. Do you have interest in converting the language of the proposal over to your original proposal which he has endorsed?

Copy link
Contributor

@spaskob spaskob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay - I was OOO on Monday. Yes, it is true that originally I was in favor of session id-s. However Andrew quite convincingly won me over to the pod ID + epoch) approach and after having tried in a hacky implementation for jobs, I must say it seems quite natural at least for jobs. One of the big advantages is that it makes writing a query to get expired or currently claimed job rows quite easy and readable. So fa the only drawback I've read about the epoch approach is that invalidating an epoch will lock the row and it will not prevent the sql pod to obtain a new epoch. I think this should be quite fine for jobs since if another node has detected the partition it will attempt next to steal the jobs and there is not much merit resurrecting the original node at the same time.
I am going to finish today a WIP PR that illustrates how epoch based claims can be used for jobs, I think this PR can be a useful place to discuss more concretely open questions in this RFC.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, and @spaskob)


docs/RFCS/20200615_sql_liveness.md, line 23 at r1 (raw file):

The driving motivation for this work is the usage of node-liveness 
epochs as the basis for the best-effort locks in the jobs registry.

Is this motivation strong enough for as a general solution as described in this RFC? One may argue that the requirements for jobs are quite soft - a long adoption cycle of 30s, small amount of nodes and relatively small number of running jobs, no sub-second requirements for job failover. For such a use case any naive approaches should just work. And one may argue that a naive but easy to program against and flexible approach may be preferable.

In this RFC we are trading higher complexity and less flexibility compared to a jobs specific solution for a promise that the framework will be more generally applicable. For this promise to be realistic we need to include in the motivation other compelling use cases IMO.

Additionally for jobs, our prior experience of #registries = # nodes may be misleading for the future. What is our expectation of the number of sql pods that will be registered? What constraints should this system conform to in the new world fo multi-tenancy? Without analysis of how this system's requirements will evolve it's hard for me to justify a general solution vs a jobs specific one.


docs/RFCS/20200615_sql_liveness.md, line 65 at r1 (raw file):

Previously, ajwerner wrote…

Ack.

Additionally BYTES column type allows for the option to write an encoded proto in the future.


docs/RFCS/20200615_sql_liveness.md, line 68 at r1 (raw file):

Previously, ajwerner wrote…

I was thinking it continues. There's an open question about when rows ever get removed that I'll add. Perhaps that's the best reason to use independent session IDs.

My thinking is that it also continues. When a pod discovers that its epoch has been bumped it picks the current epoch if the deadline has not expired or bumps it and sets a new deadline that way all epochs < current one are considered invalid.


docs/RFCS/20200615_sql_liveness.md, line 70 at r1 (raw file):

Previously, ajwerner wrote…

I suppose not.

I think we might, also it makes it easier for a client to write queries for finding their current lease. An example from my current work in progress implementation using epochs:

	// Process jobs claimed as running or reverting.
	rows, err := r.ex.QueryEx(
		ctx, "select-running/reverting-jobs", nil,
		sqlbase.InternalExecutorSessionDataOverride{User: security.RootUser}, `
SELECT id FROM system.jobs
WHERE (status = $1 OR status = $2) AND (sqlinstance_id = $3 AND sqlinstance_epoch = $4)`,
		StatusRunning, StatusReverting, r.ID(), epoch,)

docs/RFCS/20200615_sql_liveness.md, line 127 at r1 (raw file):

}

type SessionIterator func(SessionInfo) (wantMore bool)

What is the benefit of iterators when the number of sessions is the number of SQL pods and I'd assume they are << O(1000)? A list interface will make it much easier for a developer to write and maintain code.


docs/RFCS/20200615_sql_liveness.md, line 134 at r1 (raw file):

Above find something of a straw-man interface to the sqlliveness

I think coming up with good interfaces is hard and especially so when we have only one use case to "test" them on. I feel strongly against defining a high level interface before we have another use case and I am willing to pay the price of future refactoring of jobs package when such interface has matured.

Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the big advantages is that it makes writing a query to get expired or currently claimed job rows quite easy and readable.

I'd love to unpack this further. Perhaps in the context of another PR. I anticipate that we can make this query clean regardless of whether the lease is scoped to (SQLInstanceID, Epoch) or (SessionID, SQLInstanceID, Epoch) where in the latter the opaque session ID is the actual claim and the other fields are metadata.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, and @spaskob)


docs/RFCS/20200615_sql_liveness.md, line 23 at r1 (raw file):

Previously, spaskob (Spas Bojanov) wrote…

Is this motivation strong enough for as a general solution as described in this RFC? One may argue that the requirements for jobs are quite soft - a long adoption cycle of 30s, small amount of nodes and relatively small number of running jobs, no sub-second requirements for job failover. For such a use case any naive approaches should just work. And one may argue that a naive but easy to program against and flexible approach may be preferable.

In this RFC we are trading higher complexity and less flexibility compared to a jobs specific solution for a promise that the framework will be more generally applicable. For this promise to be realistic we need to include in the motivation other compelling use cases IMO.

Additionally for jobs, our prior experience of #registries = # nodes may be misleading for the future. What is our expectation of the number of sql pods that will be registered? What constraints should this system conform to in the new world fo multi-tenancy? Without analysis of how this system's requirements will evolve it's hard for me to justify a general solution vs a jobs specific one.

We should ancipate that there will be many more sql pods than KV nodes. Furthermore, those sql pods will be rate-limited at the KV layer. The way I see it, the less heartbeating we can do at the sql layer, the better.


docs/RFCS/20200615_sql_liveness.md, line 70 at r1 (raw file):

Previously, spaskob (Spas Bojanov) wrote…

I think we might, also it makes it easier for a client to write queries for finding their current lease. An example from my current work in progress implementation using epochs:

	// Process jobs claimed as running or reverting.
	rows, err := r.ex.QueryEx(
		ctx, "select-running/reverting-jobs", nil,
		sqlbase.InternalExecutorSessionDataOverride{User: security.RootUser}, `
SELECT id FROM system.jobs
WHERE (status = $1 OR status = $2) AND (sqlinstance_id = $3 AND sqlinstance_epoch = $4)`,
		StatusRunning, StatusReverting, r.ID(), epoch,)

There is nothing that precludes additionally including SQLInstanceID in the jobs table in addition to an opaque session ID. We could then do (sqlinstance_id = $3 AND session = $4) and it would all be the same.


docs/RFCS/20200615_sql_liveness.md, line 127 at r1 (raw file):

Previously, spaskob (Spas Bojanov) wrote…

What is the benefit of iterators when the number of sessions is the number of SQL pods and I'd assume they are << O(1000)? A list interface will make it much easier for a developer to write and maintain code.

I generally prefer the iterator approach as it later lends itself to both pagination and predicates. It's easy to implement a slice approach on top of an iterator but not very valuable to do it the other way. Furthermore, doing an iterator approach allows the internal data structure to be implemented as, say, a BTree and not incur allocations upon invocation.

I'm not wedded here but don't buy the idea that the iterator approach is particularly difficult.


docs/RFCS/20200615_sql_liveness.md, line 134 at r1 (raw file):

Previously, spaskob (Spas Bojanov) wrote…

I think coming up with good interfaces is hard and especially so when we have only one use case to "test" them on. I feel strongly against defining a high level interface before we have another use case and I am willing to pay the price of future refactoring of jobs package when such interface has matured.

This here is probably the strongest point of contention. In my mind, a clear abstraction is easier to evolve and evaluate than a leaky set of ad-hoc functionality. In fact, I find that often the hardest thing is to understand what the abstractions really are.

All that being said, i do think there's a trap in thinking that abstractions which exist are the right ones to exist that can lead to awkward code which bends itself backwards trying to make a bad abstraction work for a new use-case rather than changing the abstraction.

@ajwerner ajwerner force-pushed the ajwerner/sqlliveness-RFC branch from db5420f to dbeee41 Compare June 30, 2020 03:59
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the RFC to utilize the unique session ID approach. See how y'all feel about it.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, and @spaskob)


docs/RFCS/20200615_sql_liveness.md, line 47 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

It would be nice to mention the scalability concern up front (this is the blowup from one SQL instance holding leases on many tables). I didn't realize that that was the issue until much later in the doc.

Done.


docs/RFCS/20200615_sql_liveness.md, line 308 at r1 (raw file):

Previously, ajwerner wrote…

I don't think it's that simple. pg_advisory_lock has some tricky semantics that outlive individual transactions and can be tied to SQL sessions. There was a pretty good exploration of this during the hackathon. In the hackathon, the pitch we ran with was to expose a special type of intent which can act as a lease. The difference between a lease intent and a regular intent is that when a lease intent is resolved due to expiration, it moves the timestamp cache to its expiration time.

The only point I'm trying to make is that there are a number of ways to implement pg_advisory_lock and this isn't definitely the right one. The unreplicated locks in SELECT FOR UPDATE aren't an obvious fit but something using intents or some slight generalization might be.

I've removed this given I didn't have a coherent proposal.


docs/RFCS/20200615_sql_liveness.md, line 432 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

And like range leases. You must observe the liveness record in order to request a new lease, and then it's the local lease record that is consulted for future operations.

Done.

@ajwerner ajwerner force-pushed the ajwerner/sqlliveness-RFC branch 2 times, most recently from 2d57153 to d5e07b4 Compare July 1, 2020 14:23
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreimatei I added a rather speculative section on transaction record heartbeating. I also added a section on CHANGEFEED one-version leases like from the hack-a-thon as a future use case.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @spaskob)


docs/RFCS/20200615_sql_liveness.md, line 55 at r1 (raw file):

Previously, ajwerner wrote…

True, I moved my discussion to the top level as I started to think more about this proposal. I don't feel a strong need to tightly couple these pieces of information. If I'm reading you correctly, you might like something closer to what I outlined in the top-level comment. Let's discuss there?

Done.


docs/RFCS/20200615_sql_liveness.md, line 64 at r1 (raw file):

Previously, ajwerner wrote…

👍

Done.


docs/RFCS/20200615_sql_liveness.md, line 65 at r1 (raw file):

Previously, spaskob (Spas Bojanov) wrote…

Additionally BYTES column type allows for the option to write an encoded proto in the future.

The SQLInstanceID is an existing concept. The SessionID has become an opaque bytes identifier.


docs/RFCS/20200615_sql_liveness.md, line 70 at r1 (raw file):

Previously, ajwerner wrote…

There is nothing that precludes additionally including SQLInstanceID in the jobs table in addition to an opaque session ID. We could then do (sqlinstance_id = $3 AND session = $4) and it would all be the same.

This is no longer relevant. Resolving.


docs/RFCS/20200615_sql_liveness.md, line 127 at r1 (raw file):

Previously, ajwerner wrote…

I generally prefer the iterator approach as it later lends itself to both pagination and predicates. It's easy to implement a slice approach on top of an iterator but not very valuable to do it the other way. Furthermore, doing an iterator approach allows the internal data structure to be implemented as, say, a BTree and not incur allocations upon invocation.

I'm not wedded here but don't buy the idea that the iterator approach is particularly difficult.

I've removed the concept of iteration. Now there's only a predicate.


docs/RFCS/20200615_sql_liveness.md, line 357 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I think a separate column for this makes more sense than encoding stuff in the ID.

Ack.


docs/RFCS/20200615_sql_liveness.md, line 427 at r1 (raw file):

Previously, ajwerner wrote…

Good point.

Done.

Copy link
Contributor

@spaskob spaskob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, and @spaskob)


docs/RFCS/20200615_sql_liveness.md, line 134 at r1 (raw file):

Previously, ajwerner wrote…

This here is probably the strongest point of contention. In my mind, a clear abstraction is easier to evolve and evaluate than a leaky set of ad-hoc functionality. In fact, I find that often the hardest thing is to understand what the abstractions really are.

All that being said, i do think there's a trap in thinking that abstractions which exist are the right ones to exist that can lead to awkward code which bends itself backwards trying to make a bad abstraction work for a new use-case rather than changing the abstraction.

Sounds good, I am preparing the jobs PR to use random session IDs instead of epoch based leases. Let's continue the discussion about the API there. Based on our discussions I don't anticipate any big changes.

Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdarnell any chance I could get you to weigh in on the current state of this RFC? @spaskob is itching to get an implementation under way.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, and @spaskob)

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, and @spaskob)


docs/RFCS/20200615_sql_liveness.md, line 23 at r1 (raw file):

Previously, ajwerner wrote…

We should ancipate that there will be many more sql pods than KV nodes. Furthermore, those sql pods will be rate-limited at the KV layer. The way I see it, the less heartbeating we can do at the sql layer, the better.

I agree that even if the job system is the first user of this scheme, it's the potential use for things like table leases that really justify it IMHO.


docs/RFCS/20200615_sql_liveness.md, line 68 at r1 (raw file):

Previously, spaskob (Spas Bojanov) wrote…

My thinking is that it also continues. When a pod discovers that its epoch has been bumped it picks the current epoch if the deadline has not expired or bumps it and sets a new deadline that way all epochs < current one are considered invalid.

I think we can remove a lot of detail about the SQLInstanceID since it's a preexisting concept and we're no longer using it to construct session IDs here.

@ajwerner ajwerner force-pushed the ajwerner/sqlliveness-RFC branch from d5e07b4 to e7267bb Compare July 6, 2020 20:26
There's been a lot of back-and-forth over what we're trying to build and why.
Hopefully an RFC can help us to consolidate on terminology, goals, and
implementation details.

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/sqlliveness-RFC branch from e7267bb to b9fb657 Compare July 6, 2020 20:26
@ajwerner
Copy link
Contributor Author

ajwerner commented Jul 6, 2020

TFTR!

bors r=bdarnell,spaskob

@craig
Copy link
Contributor

craig bot commented Jul 6, 2020

Build succeeded

@craig craig bot merged commit afecdb7 into cockroachdb:master Jul 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants