Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add table function to execute stored procedure in SQLServer #15982

Closed

Conversation

Praveen2112
Copy link
Member

@Praveen2112 Praveen2112 commented Feb 6, 2023

Description

A dedicated table function which allows us to run stored procedures in SQLServer. We can't use existing query table function - as it wraps with inside a query like SELECT * FROM (EXEC my_procedure) o - so we are handing them via a different function. All the pushdown are disabled for JdbcProcedureHandle resolved by this Procedure.

Slightly different implementation from #15813

Limitation

  • In case of stored procedures with multiple statement - we process the ResultSet only for the first statement. (Test to be added)
  • Stored procedures with output variables are not supported.
  • Input variables for the stored procedures are not parameterized(yet).

Example

SELECT * FROM TABLE(system.procedure(query => 'CALL my_new_procedure()')

Release notes

( ) This is not user-visible or docs only and no release notes are required.
(x) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# SQLServer 
* Add support for executing stored procedures using `procedure` table function. 

@cla-bot cla-bot bot added the cla-signed label Feb 6, 2023
@Praveen2112 Praveen2112 force-pushed the praveen/stored_procedure_ptf_2 branch 2 times, most recently from e17613b to eec9f7d Compare February 6, 2023 08:52
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. @kokosing please review

@@ -69,6 +69,10 @@ public ConnectorSplitSource getSplits(
DynamicFilter dynamicFilter,
Constraint constraint)
{
if (table instanceof JdbcProcedureHandle) {
return delegateSplitManager.getSplits(transaction, session, table, dynamicFilter, constraint);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a code comment

@@ -63,7 +64,7 @@
private ResultSet resultSet;
private boolean closed;

public JdbcRecordCursor(JdbcClient jdbcClient, ExecutorService executor, ConnectorSession session, JdbcSplit split, JdbcTableHandle table, List<JdbcColumnHandle> columnHandles)
public JdbcRecordCursor(JdbcClient jdbcClient, ExecutorService executor, ConnectorSession session, JdbcSplit split, ConnectorTableHandle table, List<JdbcColumnHandle> columnHandles)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have a BaseJdbcConnectorTableHandle marker interface and let JdbcProcedureHandle, JdbcTableHandle both extend it; use here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be as an abstract class right ?

// If no columns are recorded, it means that applyProjection never got called (e.g., in the case all columns are being used) and all
// table columns should be returned. TODO: this is something that should be addressed once the getRecordSet API is revamped
jdbcTable.getColumns()
.ifPresent(tableColumns -> verify(ImmutableSet.copyOf(tableColumns).containsAll(columns)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not for procedure handle?
Have same check for them too

Copy link
Member Author

@Praveen2112 Praveen2112 Feb 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of ProcedureHandle we won't face the no columns kind of a situation - plus the columns are populated when we are creating the table handle. Add the ProcedureHandle doesn't support project pushdown there is a good chance the column we pass on RecordSetProvider might be a subset of overall columns present in ProcedureHandle. Will implement it.

}
}

public record ProcedureQuery(@JsonProperty String query)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if we use records for between-nodes serialization (I think we are starting to)

then ProcedureFunctionHandle above can be a record too

also, i don't think we need @JsonProperty and @JsonCreator for records (airlift/airlift#993), do we?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProcedureFunctionHandle - wanted be in-sync with other TableHandle implementation. I don't have any strong opinions on it. Will move it to a record.

if (handle instanceof QueryFunctionHandle queryFunctionHandle) {
return Optional.of(getTableFunctionApplicationResult(session, queryFunctionHandle.getTableHandle()));
}
else if (handle instanceof ProcedureFunctionHandle procedureFunctionHandle) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant else

return jdbcClient.delete(session, (JdbcTableHandle) handle);
}

@Override
public void truncateTable(ConnectorSession session, ConnectorTableHandle tableHandle)
{
verify(!(tableHandle instanceof JdbcProcedureHandle), "Not a table reference: %s", tableHandle);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need this. The code will never get invoked with JdbcProcedureHandle, and if it would, it would fail explicitly anyway

@@ -904,6 +985,7 @@ public void setColumnType(ConnectorSession session, ConnectorTableHandle table,
@Override
public void renameTable(ConnectorSession session, ConnectorTableHandle table, SchemaTableName newTableName)
{
verify(!(table instanceof JdbcProcedureHandle), "Not a table reference: %s", table);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we replicate it for all alter options ?

@Praveen2112 Praveen2112 force-pushed the praveen/stored_procedure_ptf_2 branch 2 times, most recently from 584e83c to 47bdf91 Compare February 6, 2023 10:19
@Praveen2112
Copy link
Member Author

@findepi AC.

@Praveen2112 Praveen2112 force-pushed the praveen/stored_procedure_ptf_2 branch from 47bdf91 to db1d540 Compare February 6, 2023 11:10
Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@vlad-lyutenko
Copy link
Contributor

vlad-lyutenko commented Feb 6, 2023

We can't use existing query table function - as it wraps with inside a query like SELECT * FROM (CALL my_procedure()) o

As for me it's really new topic, can you please point me to some entry point, to understand what we have now (I mean existing query table function) and why we can't use it for this connectors

@findepi
Copy link
Member

findepi commented Feb 6, 2023

@vlad-lyutenko this all boils down to two things

  • Connection.prepareStatement vs Connection.prepareCall
  • blocking additional SQL pushdown for calls

@mosabua
Copy link
Member

mosabua commented Feb 6, 2023

Given that this is a very powerful user facing feature that potentially has huge security and functionality implications I request the following:

  • Make sure documentation is included in this PR similar to the query passthrough table function with the relevant warnings and suggestions about security in each connector doc
  • Figure out how this works in terms of privileges or other access control we potentially have to add

cc @colebow @jhlodin @k-haley1

@mosabua
Copy link
Member

mosabua commented Feb 6, 2023

Also for reference following is a link to the section in the MySQL connector docs where a new subsection for the new table function is needed.

https://trino.io/docs/current/connector/mysql.html#table-functions

And related question... will this table function have a similar restriction where it is required that the stored procedure returns a result set? If so .. that could be limiting functionality. And if not ... the security implications are even bigger since arbitrary stored procedures can be called.

Last.. I assume that the user accessing the database from the catalog configuration has to have the right to run the stored procedure .. can you detail what is actually specifically needed in MySQL and SQL Server terminology?

@Praveen2112 Praveen2112 force-pushed the praveen/stored_procedure_ptf_2 branch from db1d540 to 9bf4ed0 Compare February 7, 2023 07:25
@Praveen2112
Copy link
Member Author

will this table function have a similar restriction where it is required that the stored procedure returns a result set?

Yes we have a same restriction where the stored procedure has to return a ResultSet - we don't support multiple ResultSet as of now.

I assume that the user accessing the database from the catalog configuration has to have the right to run the stored procedure

Similar to tables - if the user specified in the catalog configuration has access to run the stored procedures then we would support it.

@Praveen2112
Copy link
Member Author

@kokosing AC

SCHEMA_NAME,
NAME,
List.of(ScalarArgumentSpecification.builder()
.name("QUERY")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more user-friendly if the argument was renamed to procedure and the user would only pass the procedure call, that is "procedure_name()" without the CALL keyword.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on vendor you are using CALL or EXEC. So maybe we could rename name of function to call to exec depending on vendor (connector) and only pass the name of procedure. This could work only if we are not expecting to pass parameters any parameters to procedures.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of arguments we could either pass them as an Array<String> and handle them accordingly.

@mosabua
Copy link
Member

mosabua commented Feb 13, 2023

Need any help with the docs here @Praveen2112 ? Should I send a separate parallel PR?

@Praveen2112 Praveen2112 force-pushed the praveen/stored_procedure_ptf_2 branch from 9bf4ed0 to 10dc303 Compare February 22, 2023 12:02
@Praveen2112 Praveen2112 changed the title Add table function to execute stored procedure in MySQL & SQLServer Add table function to execute stored procedure in SQLServer Feb 22, 2023
@Praveen2112 Praveen2112 force-pushed the praveen/stored_procedure_ptf_2 branch 2 times, most recently from 6d8b5c4 to f464ee5 Compare February 22, 2023 14:14
@github-actions github-actions bot added the docs label Feb 22, 2023
@Praveen2112 Praveen2112 force-pushed the praveen/stored_procedure_ptf_2 branch from be36f5f to daec542 Compare February 24, 2023 06:43
Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skimmed. I am fine with the approach

);

If the stored procedure ``employee_sp`` requires any input, it can be specified via ``inputs``
argument as an ``array<varchar>``::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if procedure expects an integer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now we pass the arguments as string - we pass it without quotes for integer and with additional quotes for string (if requested by the underlying datasource).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we document this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for SQLServer, it doesn't expect additional quotes. So do we need to document it ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’d be better to pass the arguments as a ROW, so each one has the proper type. You get type safety and avoid conversions that might not be possible to do easily or reliably.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide pointer on where/how it is established during the analysis step - Should we use a different Argument specification for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type of scalar argument is established at TF declaration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we could pass arbitrary scalars, it seems right to use an array of varchar in this case. It shifts the responsibility of formatting them the right way to the user. That's in the spirit of query pass-through. This way it is harder for the user to pass parameters, or non-literal expressions, but I don't know if a real use-case would need those capabilities.

Like @kokosing noticed, we must watch out for possible abuse (injection).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Table argument is the only argument type of table functions that is polymorphic . Technically, we could make use of it, and pass a scalar subquery containing parameters. But then we'd have to refactor the PTF to use the operator-based execution.

{
return new ProcedureQuery(
"EXECUTE %s.%s %s"
.formatted(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like one could inject some SQL by using inputs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing it out. Have added additional checks.

@Praveen2112 Praveen2112 force-pushed the praveen/stored_procedure_ptf_2 branch from daec542 to d383be0 Compare February 27, 2023 10:10
@Praveen2112 Praveen2112 requested a review from hashhar February 28, 2023 14:03
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach looks good, some early comments.

public void testProcedureWithDropOperation()
{
try (TestTable table = new TestTable(onRemoteDatabase(), "table_to_drop", "(id BIGINT)")) {
try (TestProcedure testProcedure = createTestingProcedure("DROP TABLE " + table.getName())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have added test with OUTPUT clause.

@Praveen2112 Praveen2112 force-pushed the praveen/stored_procedure_ptf_2 branch from d383be0 to 3792ce9 Compare March 3, 2023 13:33
@Praveen2112
Copy link
Member Author

@hashhar , @skrzypo987 AC

@Praveen2112 Praveen2112 force-pushed the praveen/stored_procedure_ptf_2 branch 2 times, most recently from 3301e93 to 91ee0b7 Compare March 6, 2023 02:19
@Praveen2112
Copy link
Member Author

Rebased the PR to resolve conflicts.

@Praveen2112 Praveen2112 force-pushed the praveen/stored_procedure_ptf_2 branch from 91ee0b7 to 5fa16b1 Compare March 6, 2023 06:14
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with the recent changes.

Copy link
Member

@mosabua mosabua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs look good. Did not look at the code..

@Praveen2112 Praveen2112 force-pushed the praveen/stored_procedure_ptf_2 branch from 5fa16b1 to 8515486 Compare March 7, 2023 10:03
);

If the stored procedure ``employee_sp`` requires any input, it can be specified via ``inputs``
argument as an ``array<varchar>``::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we document this?

public JdbcProcedureHandle getProcedureHandle(ConnectorSession session, ProcedureInformation procedureInformation)

{
if (procedureInformation.inputArguments().stream().anyMatch(argument -> argument.contains(";"))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can process arguments via preparedCall.setParameter we wouldn't need this.
if we cannot process arguments via preparedCall.setParameter we shouldn't pretend we can.
If the user provides parameter values as string literals to be concatenated with the procedure call, why can't they provide the whole procedure call (including the parameters)? Just like we do with SQL pushdown?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can process arguments via preparedCall.setParameter we wouldn't need this.

It could be possible - like if we could get the procedure column information and map them accordingly. I would like implement them in an incremental way as executing stored procedure is kind of an experimental feature. WDYT ?

why can't they provide the whole procedure call (including the parameters)? Just like we do with SQL pushdown?

The initial approach we have is like SQL Pushdown - where we don't do any processing but consume a String as an input. For some datasources like SQLServer it would be helpful if we could get the schema and procedure name - so we could apply some checks to ensure the SP doesn't any DDL or DML operations. So it more of an extended version of SQLPushdown - where we take some of them in parts and concat all the pieces.

@findepi findepi dismissed their stale review March 17, 2023 20:26

that was just for emphasis.

These methods can be reused for Procedures PTF
- Extract building columns from ResultSetMetaData as a separate method.
- Extract creating connection based on session
@Praveen2112 Praveen2112 force-pushed the praveen/stored_procedure_ptf_2 branch from 47e7213 to 88d57a0 Compare March 21, 2023 09:30
@Praveen2112
Copy link
Member Author

Rebased due to conflicts.

@Praveen2112
Copy link
Member Author

Overridden by #16696

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.