Guidelines to handle schemes and views #24

zozlak · 2014-12-09T11:59:53Z

I rise my issue here, because I think it would be valuable to decide such things on the "top level", so implementations of the DBI API can be compiliant one with the other and provide users with a really unified API.

Wide variety of modern database systems supports:

many schemes per database;
views;

but there are no official guidelines how to handle that in packages implementing the DBI API.

There is a nottice in dbGetTables() description that "This should, where possible, include temporary tables.". Is it possible to add a similar notice regarding views? And maybe also in dbReadTable() description?
I am asking about it because views play a vital role for both:

assuring user privileages;
making complicated database structures available for "everyday users";

and ommiting them in DBI API implementations dramatically reduce their functionality.
This is even more painful as dplyr uses DBI database connectivity backends.

The schemes issue is more complicated. DBI wanted a table to be identified by simple a name. For database systems which support multiple schemes per database a clarification is needed:

Does a schema name (if other the default scheme) should be the part of the "DBI table name"? (which by the way is simply wrong, because we can not assure proper escaping, anyway it works like that in some DBI implementations)
Or maybe DBI drivers are by definition limited to use only a default scheme?
Or maybe DBI should define other (but common for all DBI compiliant drivers) way to deal with schemes?

I checked RPostgreSQL, RMySQL and ROracle and they deal with scheme in different way:

RMySQL pretends that there is no such things like schemes even MySQL supports schemes (in its own way, by mapping them to databases, but anyway);
ROracle functions take an additional and optional "scheme" parameter;
RPostgreSQL functions accept the "name" parameter to be a character vector; in such a case they take the first element of the vector as schema name and the second as table name.

And checking onother DBI API implementations would probably make this list longer.
This inconsistency limits the usage of the DBI API when it cames to handle schemes.

krlmlr · 2015-11-11T08:08:41Z

Views and temporary tables should be returned with dbListTables()
DBI should officially support multi-component table names (mostly for dbReadTable() and dbWriteTable()) ~~via the new Table class~~ that are combined to a single identifier with the help of a new dbBuildIdentifier(Connection, character); the latter defaults to quoting the individual names and combining them with a "."

I think everything can be tested in a generic fashion in the DBItest package

krlmlr · 2015-11-11T21:33:42Z

Need schema parameter in dbListTables(), default to "current schema"

krlmlr · 2015-11-11T21:40:40Z

dbBuildIdentifier() or dbQuoteIdentifier()? Double-check if anyone uses it, probably not (at least not dplyr) --> then it's dbQuoteIdentifier() with character vector

zozlak · 2015-11-11T22:21:57Z

Need schema parameter in dbListTables(), default to "current schema"

On one hand I fully support your request but on the other this will be backward-incompatible change in the API. It is against CRAN guidelines and as DBI is quite widely used it is a really serious thing.

Although I can imagine solutions which will not break backward compatibility (e.g. passing schemas vector as an attribute of the returned vector) it will be rather inconvenient for the user.

Taking this into account maybe it would be better to introduce a new function and mark dbListTables() as deprecated...

hadley · 2015-11-11T22:37:38Z

@zozlak how will it be a backward incompatible change?

(And backward incompatible changes aren't against CRAN policy as long as you give everyone adequate notice about the change)

krlmlr · 2015-11-11T22:37:56Z

From the policies:

If an update will change the package’s API and hence affect packages depending on it, it is expected that you will contact the maintainers of affected packages and suggest changes, and give them time to prepare updates before submitting your updated package. Do mention in the submission email which packages are affected and that their maintainers have been informed. In order to derive the reverse dependencies of a package including the addresses of maintainers who have to be notified upon changes, the function reverse_dependencies_with_maintainers is available from the developer website.

zozlak · 2015-11-11T23:00:14Z

@zozlak how will it be a backward incompatible change?

If I want to keep current return type (character vector, one element per table), then I should either join schema name and table name (probably properly escaped) either do some strange things (like assign schema names to attribute).
The problem with returning joined schema and table name is that:

if I return them unescaped, then it may be impossible to distinguish table name from schema name;
it I want to return them in escaped form I have a problem because different database systems use different escapes for objects identifiers and that is something I would prefer to hide from R users.

And anyway if we agree that other functions (dbReadTable(), etc.) should take schema and table name separated from each other (e.g. as a two-element character vector), to be able to escape them properly, then I would expect dbListTables() to return schema and table name separately.

Now if I want to return schema name separated from table name and do it in a natural way it would mean for me to return them as a character matrix (or data frame) with schema and table name in separate columns but this would change return type of dbListTables() (and so will be backward incompatible).

Or am I missing something?

krlmlr · 2015-11-11T23:05:37Z

I think dbListTables() should keep returning a plain character vector with unquoted names. The schema should be specified as input parameter.

zozlak · 2015-11-12T08:09:59Z

@kiril I like your idea. It will make change totally transparent for those who don't care about schemes and at the same time allow to use schemes for those who need them.

Then we would also need dbListSchemes()

krlmlr · 2016-01-13T21:23:36Z

Views should be treated the same as tables
- No special handling for trying to write a view
Multi-component table names: Two options:
- Character vector -> several problems
- Preferred: New function dbTableName(Connection, character) -> SQL
  - dbTableName(Connection, SQL) -> SQL returns argument unchanged

hadley · 2016-01-13T21:24:29Z

dbReadTable() and dbWriteTable() would take character vectors and would call dbTableName() for you. This doesn't change existing behaviour but allows you to do (e.g.) dbReadTable(c("schema", "test")).

What should dbReadTable(SQL("SELECT * FROM a WHERE x = 1")) do? What does it do now?

krlmlr · 2017-01-28T20:55:54Z

dbReadTable(SQL("SELECT * FROM a WHERE x = 1")) should look for a table named SELECT * FROM a WHERE x = 1. This has given us headaches with RSQLite, because some packages did the quoting themselves.

Currently, the design for schema support looks like this:

The dbQuoteIdentifier() generic gains an optional namespace argument that allows prepending namespace components. This expects a character vector that will be passed to dbQuoteIdentifier(..., namespace = NULL) (again) before connecting the components. If length 0, no namespace is prepended
The dbReadTable(), dbWriteTable(), dbExistsTable() and dbRemoveTable() gain a default implementation that allows passing a list as name argument; this will then be used in a call to dbQuoteIdentifier() to simplify passing the namespace to these methods
There will be a new dbListNamespaces() generic, TBD
The dbListTables() method also gains an optional namespace argument, which can also be a list

krlmlr · 2017-01-29T10:05:50Z

Another option: dbReadTable() et al. are expected to always pass ... along to dbQuoteIdentifier(). This would allow arbitrary optional signatures to dbQuoteIdentifier().

krlmlr · 2017-01-29T21:15:26Z

Next iteration of the design:

dbQuoteIdentifier() methods use ... to extend an identifier with namespace components. DBI will give suggestions but not enforce anything. The default should not add any namespace components.
All methods that take a table name (dbReadTable(), dbWriteTable(), dbExistsTable(), dbRemoveTable(), ...) call dbQuoteIdentifier(conn, x = name, ...), this allows specifying namespace components as parts of a call to these methods.
There will be a new dbListNamespaces() generic, this will return a data frame with a list column that contains the schema specification; this will have to be used in the dbQuoteIdentifier() call to access this schema. The default empty schema is always part of this list.
dbListTables() methods use ... to specify the namespace just like dbQuoteIdentifier() does.

Example:

dbQuoteIdentifier(con, "name", namespace = c("database", "schema"))
## <SQL> "database"."schema"."name"
dbReadTable(con, name, namespace = c("database", "schema"))
## ...
dbListNamespaces(con)$namespaces
[[1]]
list()

[[2]]
list(namespace = c("database", "schema"))
...

dbListTables(con, name, namespace = c("database", "schema"))

iangow · 2017-04-06T16:22:38Z

OK. These would all work for my wish list.

tbl(pg, dbId("some_schema", "some_table"))
dbListTables(dbId("some_schema"))
compute(dbId("some_schema", "some_table"))

(Actually, compute(dbId("some_schema", "some_table"), overwrite = TRUE) would be great, but that's a different issue.)

krlmlr · 2017-04-11T21:37:45Z

Thanks for your input. I'm still confident that we can make do without a dbId() constructor ~~method (which would really need to be called as dbId(con, ...))~~, but I'll come back to the discussion if it turns out that we do need a dbId().

The approach I'm anticipating is described in #24 (comment), I'm repeating it below (slightly modified). We will not be able to decompose fully qualified table names into their components (cluster, database, schema, ...), but I'd argue we don't need to.

I think we need two components:

A way to pass through table/view identifiers with database/schema/... qualification
A way to list databases, schemas, ...

For the first, we already have the SQL class, but even the DBI specification doesn't enforce but only only suggests using it -- instead it specifies that quoted identifiers are immune to calls to dbQuoteIdentifier(), i.e., that dbQuoteIdentifier(con, x) is identical to x if it is already quoted. dbWriteTable() et al. are specified so that they accept a quoted identifier.

For the second, I'd like to define a generic dbListObjects() that returns a data frame with (at least) a name column and a is_table column. The data frame can contain other columns, but the names should be prefixed with a dot, so that we can extend the spec later. The name column is expected to contain a quoted identifier (e.g., of class SQL), the is_table column is TRUE if the row refers to a table or view. dbListObjects(con) lists top-level objects (such as databases), and dbListObjects(con, name) list objects in the name database/schema/... (whatever applies to your DBMS). We could also define that dbListObjects(con, NA) lists the tables/views that are accessible without qualification.

stephlocke · 2017-06-21T11:22:10Z

Schema support is vital for working with big/enterprise databases.

In terms of what to expose as metadata - it'd be nifty if you could support the range of information_schema exposed concepts. information_schema is an ansi compliant metadata schema that can be leveraged for getting consistent results across databases.

dbIdentifier() as a way to provide schema and database names would be an OK way to go - it doesn't add too much typing. It'd also be good if you did something like dbId(dbConn,"[dbo].[orders]") and have that end up with the correct reference to the table.

zozlak · 2017-06-21T12:27:41Z

"[dbo].[orders]"

example provided by @stephlocke makes me thinking about the approach proposed by @krlmlr .
I think the SQL class is not enough to pass table/view identifiers with database/schema/... qualification because my.tab can mean table "tab" in schema (or database) "my" as well as table "my.tab" in current context and the SQL class is unable to distinguish between these usecases. My impression is that the dbId() approach is really needed to solve such issue.

By the way while reading discussion about thedbId class I see two different approaches:

The dbId object itself being "not aware" of the database entity type it is representing, e.g.:
- in dbReadTable(con, dbId('foo')) foo denotes a table/view named foo
- in dbListTables(dbId('foo')) foo denotes container for tables named foo (in most RDBMSs the foo schema, in MySQL the foo database)
- in dbListSchemas(dbId('foo')) foo denotes container for schemas named foo (in all RDBMSs I know the foo database)
The dbId object storing information about the entity type it is representing.

While the first approach provides users with short and intuitive syntax only the latter one will work with the dbListObjects() function proposed by @krlmlr. In the first case the dbListObjects() won't know what kind of database entities to list. Alternatively we can assume that the dbListObjects() lists all kinds of database entities (which doesn't sound good to me) but then looking at the dbId objects returned by it we can't tell, what kind of database entity they represent which is similarly problematic.

jimhester · 2017-08-21T19:35:51Z

I just implemented a simple version of this for odbc, https://github.com/rstats-db/odbc/compare/SQLTable?expand=1#diff-ada2ba640dbb79b8d3a115369830f718. It would be nice to settle on something in DBI as well, as this generic should really be defined there.

I think reversing the order of the arguments like I did is pretty important; i.e. the highest level argument should the furthest right. This approach would also allow the generic to skip names for schema and catalog all together if desired and just take additional arguments as ....

Also I think you need to pass the connection object to dbId so that you can properly escape the identifiers and potentially use a different schema separator if needed.

- Schema support: Export `Id()`, new generics `dbListObjects()` and `dbUnquoteIdentifier()`, methods for `Id` that call `dbQuoteIdentifier()` and then forward (#220).

ldfreight · 2018-03-06T10:44:39Z

Hello,

thank you for the recent addition. Support for schema is great.

Using DBI 0.7.15 from github and trying to send to dbo.mytable using:

DBI::dbWriteTable(conn = con, name = DBI::Id(schema="dbo", name="mytable"), value = df, append=TRUE)
however this returns

unable to find an inherited method for function 'dbWriteTable' for signature '"Microsoft SQL Server", "SQL", "missing"'

Is this the correct usage of Id ?

thank you

mmastand · 2018-03-07T23:13:59Z

That looks like the right usage to me. I'm also having this same problem with Id and DBI 0.7-15 (installed from github). The following:

res <- DBI::dbWriteTable(conn = con,
                         name = DBI::Id(catalog = "testing", schema = "dbo", name = "test_table"),
                         value = v,
                         append = TRUE)

returns

Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘dbWriteTable’ for signature ‘"Microsoft SQL Server", "SQL", "missing"’

hadley · 2018-03-09T14:46:18Z

Commenting on a closed issue is not a good way to get help. You probably need to file an issue with whatever package you are using to connect to SQL server.

krlmlr · 2018-03-09T16:19:12Z

Thanks. Id() is called Table() in the most recent CRAN version. Please file a new issue, preferably with your DBI backend package, as @hadley suggested.

krlmlr · 2018-03-09T16:21:13Z

Oh, I'm confused, it's Id now. The usage looks correct, but the DBI backend needs to support it.

mmastand · 2018-03-09T18:39:26Z

Thanks, I'll move this to odbc.

smartinsightsfromdata mentioned this issue Oct 13, 2015

A syntax for managing db schemas? #43

Closed

This was referenced Oct 28, 2015

Think about schemas r-dbi/RPostgres#57

Closed

RPostgres is unable to find table without manually giving the sql call r-dbi/RPostgres#32

Closed

krlmlr added the action:design label Nov 11, 2015

krlmlr mentioned this issue Nov 11, 2015

Test vectorization r-dbi/DBItest#18

Closed

krlmlr mentioned this issue Nov 12, 2015

Can't connect to Views in PostgreSQL tidyverse/dplyr#1007

Closed

krlmlr mentioned this issue Dec 23, 2015

Schema handling in default implementation of dbQuoteIdentifier #71

Closed

This was referenced Feb 2, 2016

Make dbExistsTable more efficient r-dbi/RPostgres#59

Closed

dbListTables now queries for view as well as table names r-dbi/RPostgres#29

Closed

dbListTables() lacks schema handling s-u/RJDBC#5

Closed

krlmlr mentioned this issue Feb 29, 2016

dbWriteTable with subset of columns and specified schema r-dbi/RPostgres#77

Closed

imanuelcostigan mentioned this issue Jun 11, 2016

Return schema as well as table name in dbListTables imanuelcostigan/RSQLServer#52

Closed

krlmlr mentioned this issue Aug 18, 2016

Materialize persistent tables in database tidyverse/dplyr#2065

Closed

jimhester mentioned this issue Jan 27, 2017

Handle schemas better r-dbi/odbc#18

Closed

krlmlr mentioned this issue Jan 29, 2017

Test for dbQuoteIdentifier(DBIConnection, list) r-dbi/DBItest#106

Closed

krlmlr mentioned this issue Jan 29, 2017

Improve default implementation of dbQuoteString() and dbQuoteIdentifier() #77

Closed

5 tasks

chezou mentioned this issue Apr 17, 2017

Sanitize table names in copy_to() sparklyr/sparklyr#611

Closed

imanuelcostigan mentioned this issue Jun 1, 2017

Creating tbl on non-default DB and Schema Fails imanuelcostigan/RSQLServer#162

Closed

jimhester mentioned this issue Aug 22, 2017

dbWriteTable fails to write to non-default schema r-dbi/odbc#91

Closed

jimhester mentioned this issue Oct 16, 2017

can't copy_to schema other than Dflt. r-dbi/odbc#120

Closed

richfitz mentioned this issue Nov 3, 2017

foreign tables are not recognised r-dbi/RPostgres#134

Closed

etiennebr mentioned this issue Nov 20, 2017

Question about support for odbc drivers r-spatial/sf#557

Closed

krlmlr mentioned this issue Nov 27, 2017

Implement RStudio Connection Contract r-dbi/RMariaDB#55

Closed

This was referenced Dec 4, 2017

dbWriteTable fails to find tables in non-default schemas #181

Closed

dbWriteTable() fails to write to existing table with non-default schema #191

Closed

colearendt mentioned this issue Dec 20, 2017

Cannot overwrite or append in schemas r-dbi/odbc#140

Closed

This was referenced Dec 21, 2017

Doesn't work with different schemas r-dbi/RPostgres#162

Closed

dbExistsTable can't handle schema r-dbi/RPostgres#160

Closed

krlmlr mentioned this issue Jan 29, 2018

Schema support #220

Merged

krlmlr closed this as completed in #220 Feb 2, 2018

krlmlr referenced this issue Feb 2, 2018

Merge pull request #220 from r-dbi/f-#24-schema

e59d1cd

- Schema support: Export `Id()`, new generics `dbListObjects()` and `dbUnquoteIdentifier()`, methods for `Id` that call `dbQuoteIdentifier()` and then forward (#220).

github-actions bot locked and limited conversation to collaborators Oct 9, 2020

krlmlr pushed a commit that referenced this issue Oct 18, 2022

remove rmarkdown bandaid; closes #24

8aa0a01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Guidelines to handle schemes and views #24

Guidelines to handle schemes and views #24

zozlak commented Dec 9, 2014

krlmlr commented Nov 11, 2015

krlmlr commented Nov 11, 2015

krlmlr commented Nov 11, 2015

zozlak commented Nov 11, 2015

hadley commented Nov 11, 2015

krlmlr commented Nov 11, 2015

zozlak commented Nov 11, 2015

krlmlr commented Nov 11, 2015

zozlak commented Nov 12, 2015

krlmlr commented Jan 13, 2016

hadley commented Jan 13, 2016

krlmlr commented Jan 28, 2017

krlmlr commented Jan 29, 2017 •

edited

Loading

krlmlr commented Jan 29, 2017 •

edited

Loading

iangow commented Apr 6, 2017 •

edited

Loading

krlmlr commented Apr 11, 2017 •

edited

Loading

stephlocke commented Jun 21, 2017

zozlak commented Jun 21, 2017

jimhester commented Aug 21, 2017

ldfreight commented Mar 6, 2018 •

edited

Loading

mmastand commented Mar 7, 2018

hadley commented Mar 9, 2018

krlmlr commented Mar 9, 2018

krlmlr commented Mar 9, 2018

mmastand commented Mar 9, 2018

Guidelines to handle schemes and views #24

Guidelines to handle schemes and views #24

Comments

zozlak commented Dec 9, 2014

krlmlr commented Nov 11, 2015

krlmlr commented Nov 11, 2015

krlmlr commented Nov 11, 2015

zozlak commented Nov 11, 2015

hadley commented Nov 11, 2015

krlmlr commented Nov 11, 2015

zozlak commented Nov 11, 2015

krlmlr commented Nov 11, 2015

zozlak commented Nov 12, 2015

krlmlr commented Jan 13, 2016

hadley commented Jan 13, 2016

krlmlr commented Jan 28, 2017

krlmlr commented Jan 29, 2017 • edited Loading

krlmlr commented Jan 29, 2017 • edited Loading

iangow commented Apr 6, 2017 • edited Loading

krlmlr commented Apr 11, 2017 • edited Loading

stephlocke commented Jun 21, 2017

zozlak commented Jun 21, 2017

jimhester commented Aug 21, 2017

ldfreight commented Mar 6, 2018 • edited Loading

mmastand commented Mar 7, 2018

hadley commented Mar 9, 2018

krlmlr commented Mar 9, 2018

krlmlr commented Mar 9, 2018

mmastand commented Mar 9, 2018

krlmlr commented Jan 29, 2017 •

edited

Loading

krlmlr commented Jan 29, 2017 •

edited

Loading

iangow commented Apr 6, 2017 •

edited

Loading

krlmlr commented Apr 11, 2017 •

edited

Loading

ldfreight commented Mar 6, 2018 •

edited

Loading