Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document TableFunctionSplitProcessor thread-safety #16955

Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,13 @@

import io.trino.spi.connector.ConnectorSplit;

/**
* Processes table functions splits, as returned from {@link io.trino.spi.connector.ConnectorSplitManager}
* for a {@link ConnectorTableFunctionHandle}.
* <p>
* Thread-safety: implementations do not have to be thread-safe. The {@link #process} method may be called from
* multiple threads, but will never be called from two threads at the same time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it mean that implementation of TableFunctionSplitProcessor need to ensure memory visibility of internal data structures which may be changed by one thread and then accessed by the other. Or are we sure this is ensured by the caller?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formally we would call on JLS's happen-before semantics and say that there is a happens-before relation between previous method call end and the next method call... i didn't want to be very formal here. but, i wanted to indicate the implementor should not take note of things iike thread id, or use ThreadLocal internally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, to reason about thread-safety, we must consider both TableFunctionSplitProcessor and TableFunctionProcessorProvider.
The LeafTableFunctionOperator calls TableFunctionProcessorProvider.getSplitProcessor(session, handle) for each split it has. The function author implements the TableFunctionProcessorProvider and they can decide on the lifecycle of the Processor. One extreme would be to keep a single TableFunctionSplitProcessor, and return it from each call to the provider -- and deal with multiple threads. The other extreme is to instantiate a new TableFunctionSplitProcessor for each call to the provider. The latter is easy and clear, and imo should be considered the default approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LeafTableFunctionOperator calls TableFunctionProcessorProvider.getSplitProcessor(session, handle) for each split it has.

Can it provide a split already in this method call?
it would make it clear the processor serves one split.

(i understand we wanted the leaf processor to be similar to intermediate processor, but reality is that a function implementor implements only one of them at the same time, so making things just simpler would be beneficial)

then the TableFunctionSplitProcessor just provides Pages until it's done. So becomes equivalent to ConnectorPageSource. Maybe we could reuse that interface?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just throwing in my 2 cents that I also found this part of the interface a bit unintuitive when reviewing @homar's CDF table function implantation, specifically that process is initially called with a Split and then continues to be called with null arguments. It's something you only need to learn once, but at first glance I was expecting this to work more like a ConnectorPageSource.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it provide a split already in this method call?

It totally makes sense to call TableFunctionProcessorProvider.getSplitProcessor(session, handle, split), and remove the split argument from the TableFunctionSplitProcessor.process method. It should simplify the operator a lot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will take a stab.

What about replacing TableFunctionSplitProcessor with ConnectorPageSource, as a consequence of that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That one seems to be designed for reading input. We'd have to think about how we implement getReadTimeNanos() etc. so that it makes sense for a particular table function. Or maybe subclass to ensure that those methods cannot be used. Even though they are never used anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*/
public interface TableFunctionSplitProcessor
{
/**
Expand Down