-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document TableFunctionSplitProcessor thread-safety #16955
Merged
findepi
merged 2 commits into
trinodb:master
from
findepi:findepi/document-tablefunctionsplitprocessor-thread-safety-38bd32
Apr 11, 2023
Merged
Changes from 1 commit
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it mean that implementation of
TableFunctionSplitProcessor
need to ensure memory visibility of internal data structures which may be changed by one thread and then accessed by the other. Or are we sure this is ensured by the caller?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
formally we would call on JLS's
happen-before
semantics and say that there is ahappens-before
relation between previous method call end and the next method call... i didn't want to be very formal here. but, i wanted to indicate the implementor should not take note of things iike thread id, or use ThreadLocal internally.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, to reason about thread-safety, we must consider both
TableFunctionSplitProcessor
andTableFunctionProcessorProvider
.The
LeafTableFunctionOperator
callsTableFunctionProcessorProvider.getSplitProcessor(session, handle)
for each split it has. The function author implements theTableFunctionProcessorProvider
and they can decide on the lifecycle of the Processor. One extreme would be to keep a singleTableFunctionSplitProcessor
, and return it from each call to the provider -- and deal with multiple threads. The other extreme is to instantiate a newTableFunctionSplitProcessor
for each call to the provider. The latter is easy and clear, and imo should be considered the default approach.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it provide a split already in this method call?
it would make it clear the processor serves one split.
(i understand we wanted the leaf processor to be similar to intermediate processor, but reality is that a function implementor implements only one of them at the same time, so making things just simpler would be beneficial)
then the
TableFunctionSplitProcessor
just provides Pages until it's done. So becomes equivalent toConnectorPageSource
. Maybe we could reuse that interface?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just throwing in my 2 cents that I also found this part of the interface a bit unintuitive when reviewing @homar's CDF table function implantation, specifically that
process
is initially called with a Split and then continues to be called withnull
arguments. It's something you only need to learn once, but at first glance I was expecting this to work more like aConnectorPageSource
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It totally makes sense to call
TableFunctionProcessorProvider.getSplitProcessor(session, handle, split)
, and remove thesplit
argument from theTableFunctionSplitProcessor.process
method. It should simplify the operator a lot.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will take a stab.
What about replacing
TableFunctionSplitProcessor
withConnectorPageSource
, as a consequence of that?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That one seems to be designed for reading input. We'd have to think about how we implement
getReadTimeNanos()
etc. so that it makes sense for a particular table function. Or maybe subclass to ensure that those methods cannot be used. Even though they are never used anyway.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#17032