jobs, *: refactor planning vs exec APIs #37691
Comments
Part of the problem here is that the conn executor has special code to execute "async" schema changes before sending the reply, making them appear synchronous. This doesn't exist for jobs, so there is currently no way to a) commit a job record and then b) wait for its results -- the job can't run until its creation commits, but there is no way to block on its execution after that. Today, jobs commit, run, and block all during the statement that creates them, which is a gross misuse of the planner txn -- which won't commit until later -- to plan the job.
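To make the desired split concrete, here is a minimal Go sketch of the pattern described above: write the job record inside the user's txn, then block on execution only after that txn commits. `Registry`, `CreateJobRecord`, and `WaitForJob` are hypothetical stand-ins for illustration, not the real jobs package API.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Registry is a hypothetical stand-in for the jobs registry; it keeps a
// result channel per job so callers can block on execution that starts
// only after the creating txn commits.
type Registry struct {
	results map[int64]chan error
}

// CreateJobRecord stands in for (a): writing the job row using the caller's
// txn. The job cannot start until that txn commits.
func (r *Registry) CreateJobRecord(ctx context.Context, id int64) {
	r.results[id] = make(chan error, 1)
}

// WaitForJob stands in for (b): blocking until the committed job finishes,
// which is the piece the issue says jobs currently lack.
func (r *Registry) WaitForJob(ctx context.Context, id int64) error {
	select {
	case err := <-r.results[id]:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	ctx := context.Background()
	r := &Registry{results: map[int64]chan error{}}

	const jobID = 1
	r.CreateJobRecord(ctx, jobID) // (a) written inside the user txn
	// ... user txn commits here; only now may the job run ...
	go func() {
		time.Sleep(10 * time.Millisecond) // pretend the job executed
		r.results[jobID] <- nil
	}()
	if err := r.WaitForJob(ctx, jobID); err != nil { // (b) block before replying
		fmt.Println("job failed:", err)
		return
	}
	fmt.Println("job done; reply to client")
}
```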
It's useful to be able to create jobs in transactions. Obviously they would need to execute after the transaction commits. We add a new function ScheduleJob on the planner that allows that. The main use case is converting schema changes to jobs. We sweep under the rug the complicated semantics of handling any failures of these jobs; this will be addressed in follow-up PR(s). This PR handles the plumbing and this function is not for production use yet. Touches cockroachdb#37691, cockroachdb#42061. Release note: none.
It's useful to be able to create jobs in transactions. Obviously they would need to execute after the transaction commits. We add a new function QueueJob on the extendedEvalContext to do that. The main use case is converting schema changes to jobs. We sweep under the rug the complicated semantics of handling any failures of these jobs; this will be addressed in follow-up PR(s). This PR handles the plumbing and this function is not for production use yet. Touches cockroachdb#37691, cockroachdb#42061. Release note: none.
42689: jobs: schedule a job in a transaction to execute after commit r=spaskob a=spaskob

It's useful to be able to create jobs in transactions. Obviously they would need to execute after the transaction commits. We add a new function QueueJob on the planner that allows that. The main use case is converting schema changes to jobs. We sweep under the rug the complicated semantics of handling any failures of these jobs; this will be addressed in follow-up PR(s). This PR handles the plumbing and this function is not for production use yet. Touches #37691, #42061. Release note: none.

43136: storage: remove getQuorumIndex from computeTruncateDecision r=bdarnell a=irfansharif

Some lineage here: `getQuorumIndex` was initially `getBehindIndex`, and was changed to `getQuorumIndex` to incorporate pending snapshots (561398f). Previous to this, computeTruncateDecision started off using raft.Status.Commit. As of (b64bdc9), we refactored log truncation decision making and pulled the pending snapshot logic back out of `getQuorumIndex`. `getQuorumIndex` was then far too similar to raft.Status.Commit, but not quite. Given that we already use raft.Status.Progress in `computeTruncateDecision` to track lagging yet recently active followers, we can get rid of `getQuorumIndex` altogether to simplify things.

Co-authored-by: Spas Bojanov <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
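For illustration, here is a minimal sketch of what "queue during the txn, start after commit" could look like. `evalCtx`, `QueueJob`, and `onCommit` below are assumed, simplified stand-ins rather than the actual extendedEvalContext plumbing from the PR.

```go
package main

import (
	"context"
	"fmt"
)

// evalCtx is a hypothetical stand-in for extendedEvalContext: during the
// transaction it only collects job records; nothing runs yet.
type evalCtx struct {
	queued []string
}

// QueueJob mirrors the idea in the PR: remember the job so it can be started
// once the surrounding transaction commits (name and shape are assumptions).
func (e *evalCtx) QueueJob(desc string) {
	e.queued = append(e.queued, desc)
}

// onCommit is where the queued jobs would be handed off for execution.
func (e *evalCtx) onCommit(ctx context.Context) {
	for _, j := range e.queued {
		fmt.Println("starting job after commit:", j)
	}
}

func main() {
	e := &evalCtx{}
	// Inside the user transaction: plan the schema change and queue its job.
	e.QueueJob("ALTER TABLE t ADD COLUMN x INT")
	// ... transaction commits ...
	e.onCommit(context.Background()) // only now does the job start
}
```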
The hook plans generally get a limited version of the planner (planHookState), some parts of which use the planner's txn.
They use this planner as part of planning the job they ultimately start, e.g. resolving the table name or user privileges, etc., that go into determining if, how, and on what the job will run.
Unfortunately, the planner usually assumes that it is used for the entirety of a normal SQL statement, so its embedded txn is usually not committed until after the plan has run. This means that when the hook plans start and run a job based on calls to this planner, they are running a job -- with side effects -- based on reads made with an uncommitted txn.
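A rough sketch of that hazard: everything the hook learns during planning comes from the planner's still-uncommitted txn, yet the job it launches has external side effects. `PlanHookState` below is a hypothetical, heavily trimmed version of the real interface.

```go
package main

import (
	"context"
	"fmt"
)

// PlanHookState is a hypothetical, trimmed-down view of the planner that
// hooks see during planning; the real interface is much larger.
type PlanHookState interface {
	// ResolveTable reads catalog state via the planner's txn, so the result
	// is only as durable as that (still uncommitted) txn.
	ResolveTable(ctx context.Context, name string) (tableID int64, err error)
	// User backs privilege checks during planning.
	User() string
}

// planBackup illustrates the hazard: everything it learns comes from the
// planner txn, but the job it feeds has side effects that outlive that txn
// even if the txn later aborts.
func planBackup(ctx context.Context, p PlanHookState, table string) (int64, error) {
	id, err := p.ResolveTable(ctx, table) // read via uncommitted planner txn
	if err != nil {
		return 0, err
	}
	_ = p.User() // privilege check, also via the planner txn
	return id, nil // this feeds a job that today may run before the txn commits
}

type fakeState struct{}

func (fakeState) ResolveTable(ctx context.Context, name string) (int64, error) { return 42, nil }
func (fakeState) User() string                                                 { return "root" }

func main() {
	id, _ := planBackup(context.Background(), fakeState{}, "db.t")
	fmt.Println("planned a job against table", id)
}
```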
However, some jobs support being planned in a transaction and then run after it commits, e.g. using DETACHED. In these cases, it is vital to limit all their planning to the planner txn to avoid leaking side effects. Thus, all job-planning statements (BACKUP, IMPORT, RESTORE, etc.) should be audited to ensure they only use the plan hook (planner) txn.
Then, for those jobs which wish to execute during statement execution to return results (i.e. non-detached), they should a) assert that the planner txn is implicit and then b) commit it inline, a la 1PC INSERTs, before running the job.
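Something like the following (hypothetical names, not the actual executor code) is what "assert implicit, commit inline, then run" could look like:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// txn is a hypothetical stand-in for the planner's transaction.
type txn struct {
	implicit bool
}

func (t *txn) Commit(ctx context.Context) error { return nil }

// runJobSynchronously sketches the non-detached path proposed above:
// (a) refuse to run inside an explicit transaction, (b) commit the planner
// txn up front, a la a 1PC INSERT, and (c) only then execute the job and
// block so its results can be returned.
func runJobSynchronously(ctx context.Context, t *txn, run func(context.Context) error) error {
	if !t.implicit {
		return errors.New("job that returns results cannot run inside an explicit txn; use DETACHED")
	}
	if err := t.Commit(ctx); err != nil { // job record is now durable
		return err
	}
	return run(ctx)
}

func main() {
	err := runJobSynchronously(context.Background(), &txn{implicit: true},
		func(context.Context) error { fmt.Println("job ran"); return nil })
	fmt.Println("err:", err)
}
```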
Separately, job execution currently gets the same PlanHookState interface. However, since this closes over a planner, it also closes over the planner's transaction field, and because execution happens outside a statement, that field is nil. This has led to crashes when methods that reference that field are called, often indirectly via very non-obvious call trees.
Instead, job execution (i.e. Resume()) should be passed a different interface entirely from planning -- one that is not a planner at all and has no potentially-nil txn field.
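A sketch of that split: give Resume() its own execution-time interface that simply has no planner txn to dereference. The interface and type names below are assumptions for illustration, not the actual jobs package definitions.

```go
package main

import (
	"context"
	"fmt"
)

// JobExecContext is a hypothetical execution-time interface: unlike the
// planner-backed PlanHookState, it deliberately carries no transaction, so
// Resume cannot dereference a nil planner txn by accident.
type JobExecContext interface {
	User() string // execution identity, not txn-bound planner state
}

// Resumer mirrors the shape of job resumption: execution is handed a
// JobExecContext, never a planner.
type Resumer interface {
	Resume(ctx context.Context, execCtx JobExecContext) error
}

type backupResumer struct{}

func (backupResumer) Resume(ctx context.Context, execCtx JobExecContext) error {
	fmt.Println("resuming backup as", execCtx.User())
	return nil // a real resumer would do its work here, with no planner txn in reach
}

type execCtx struct{}

func (execCtx) User() string { return "node" }

func main() {
	var r Resumer = backupResumer{}
	_ = r.Resume(context.Background(), execCtx{})
}
```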