feat(csharp): redefine C# APIs to prioritize full async support (#1865) · apache/arrow-adbc@ddb1bcf

For #1843, redefines the C# APIs to prioritize full async support. As
this is making a number of breaking changes already, it also takes the
opportunity to do some general cleanup.

Async methods that are generally expected to run locally (e.g.
GetOption/SetOption) are defined to return ValueTask and have their
default implementation be synchronous. Async methods that are generally
expected to run remotely are defined to return Task and have their
default implementations be asynchronous.
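
As a rough illustration of that split (a sketch only: the class
LocalVsRemoteSketch and the method GetTableTypesAsync are hypothetical
names chosen for this example, not the API defined by this change), a
locally-served option call can complete synchronously behind a ValueTask,
while a call expected to reach the server returns a Task and awaits by
default. Task.Delay stands in for real network IO.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical type and member names; not the actual ADBC C# API.
public class LocalVsRemoteSketch
{
    private readonly Dictionary<string, string> _options = new Dictionary<string, string>();

    // Expected to run locally: returns ValueTask and completes synchronously
    // by default, so the common path does not allocate a Task.
    public virtual ValueTask<string?> GetOptionAsync(
        string key, CancellationToken cancellationToken = default)
    {
        _options.TryGetValue(key, out string? value);
        return new ValueTask<string?>(value);
    }

    public virtual ValueTask SetOptionAsync(
        string key, string value, CancellationToken cancellationToken = default)
    {
        _options[key] = value;
        return default; // an already-completed ValueTask
    }

    // Expected to run remotely: returns Task and is asynchronous by default.
    public virtual async Task<IReadOnlyList<string>> GetTableTypesAsync(
        CancellationToken cancellationToken = default)
    {
        // Stand-in for a network round trip.
        await Task.Delay(10, cancellationToken).ConfigureAwait(false);
        return new[] { "TABLE", "VIEW" };
    }
}
```

A driver whose options do require a round trip could still override
GetOptionAsync with a truly asynchronous implementation; the ValueTask
return type only makes the synchronous case cheap.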

My mental model for the four ADBC "object" types is as follows:

The driver is analogous to the ODBC or JDBC driver, or ADO.NET provider.
In JDBC, this type is represented by the java.sql.Driver interface. In
ADO.NET, it's represented by the DbProviderFactory class. Because this
object is strictly about code, it's not expected to do any IO beyond
what running any code might incur, e.g. paging parts of a binary image
in from disk.

The database is analogous to the JDBC DataSource, the ODBC "DSN" or the
ADO.NET connection string. It represents the information and capability
required to create a database connection but does not itself do IO until
it tries to create a connection. (This would imply that parameter
validation which requires network access -- e.g. to validate a host name
-- is deferred until the connection is created. Perhaps that's too
limiting?)

Because neither the driver nor the database is doing IO, neither of them
needs async methods other than Connect. In particular, neither needs
async cleanup.

The connection represents an actual session with a database. This
matches an ODBC connection, a JDBC java.sql.Connection or an ADO.NET
DbConnection. Opening a connection, closing it or using it to fetch
information about the data source are all operations likely to require
IO so these all require async implementations.
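
The first three object types could be sketched roughly as follows. All
names here (SketchDriver, SketchDatabase, SketchConnection, the "host"
parameter, GetInfoAsync) are hypothetical and only illustrate where the
sync/async boundary falls under this mental model; Task.Delay and
Task.Yield stand in for real IO.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// All type and member names are hypothetical, not the actual ADBC C# API.

// Driver: strictly code, no IO, so synchronous members and IDisposable only.
public sealed class SketchDriver : IDisposable
{
    public SketchDatabase Open(IReadOnlyDictionary<string, string> parameters)
        => new SketchDatabase(parameters);

    public void Dispose() { }
}

// Database: holds what is needed to connect; does no IO until ConnectAsync.
public sealed class SketchDatabase : IDisposable
{
    private readonly IReadOnlyDictionary<string, string> _parameters;

    internal SketchDatabase(IReadOnlyDictionary<string, string> parameters)
        => _parameters = parameters;

    public async Task<SketchConnection> ConnectAsync(
        CancellationToken cancellationToken = default)
    {
        // Stand-in for a network handshake; validation that needs the
        // network (e.g. of a host name) would be deferred to this point.
        await Task.Delay(10, cancellationToken).ConfigureAwait(false);
        _parameters.TryGetValue("host", out string? host);
        return new SketchConnection(host ?? "localhost");
    }

    public void Dispose() { }
}

// Connection: a live session, so querying and closing are asynchronous.
public sealed class SketchConnection : IAsyncDisposable
{
    public string Host { get; }
    public bool IsOpen { get; private set; } = true;

    internal SketchConnection(string host) => Host = host;

    public async Task<string> GetInfoAsync(CancellationToken cancellationToken = default)
    {
        await Task.Delay(10, cancellationToken).ConfigureAwait(false);
        return $"connected to {Host}";
    }

    public async ValueTask DisposeAsync()
    {
        await Task.Yield(); // stand-in for closing the session over the network
        IsOpen = false;
    }
}

public static class LifecycleDemo
{
    public static async Task<string> RunAsync()
    {
        using var driver = new SketchDriver();
        using var database = driver.Open(
            new Dictionary<string, string> { ["host"] = "example.invalid" });
        await using var connection = await database.ConnectAsync();
        return await connection.GetInfoAsync();
    }
}
```

Note that the driver and database use plain `using`, while the
connection uses `await using`, mirroring which objects may need IO to
clean up.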

The statement is a unit of bookkeeping related to certain types of
database operations. In some cases, a connection can only have a single
active statement running against it, but it can be useful to have
multiple statements even then if, for instance, each one is a prepared
statement that represents both client-side and server-side resources.
The statement is analogous to an ODBC statement, a JDBC
java.sql.Statement or an ADO.NET DbCommand. Because cleanup may need to
cancel an in-progress operation or release server-side resources, the
cleanup of a statement might do IO and should therefore support
asynchrony. But when the statement is first created, it only represents
the potential for future work and so creation is always synchronous.
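
A minimal sketch of that asymmetry, assuming hypothetical names
(StatementOwner, SketchStatement, PrepareAsync) that are not taken from
the actual API: creation is a plain synchronous call, while disposal is
asynchronous because it may have to release server-side state.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical names throughout; not the actual ADBC C# API.
public sealed class SketchStatement : IAsyncDisposable
{
    private bool _hasServerResources;

    public string? SqlQuery { get; set; }
    public bool IsDisposed { get; private set; }

    // Preparing allocates server-side state, so it is modeled as async.
    public async Task PrepareAsync(CancellationToken cancellationToken = default)
    {
        await Task.Delay(10, cancellationToken).ConfigureAwait(false); // stand-in round trip
        _hasServerResources = true;
    }

    // Cleanup may need IO to release what PrepareAsync allocated.
    public async ValueTask DisposeAsync()
    {
        if (IsDisposed) return;
        if (_hasServerResources)
        {
            await Task.Delay(10).ConfigureAwait(false); // stand-in for a server-side release
            _hasServerResources = false;
        }
        IsDisposed = true;
    }
}

// Stand-in for a connection; only the statement factory matters here.
public sealed class StatementOwner
{
    // Synchronous: a new statement only represents potential future work.
    public SketchStatement CreateStatement() => new SketchStatement();
}
```

A caller would typically write `await using var statement =
owner.CreateStatement();` so that the asynchronous cleanup runs even if
the work in between throws.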

I'd be curious to hear how well this aligns with others' points of view.
lidavidm? davidhcoe? ("ping" removed)

CurtHagenlocher authored May 19, 2024

1 parent 3d021ea commit ddb1bcf
