
I cannot connect postgresql schema.table with dplyr package #244

Closed
diegogarcilazo opened this issue Feb 6, 2014 · 15 comments
Labels: feature (a feature request or enhancement)
@diegogarcilazo

I asked this question on Stack Overflow before opening this issue:
http://stackoverflow.com/questions/21592266/i-cannot-connect-postgresql-schema-table-with-dplyr-package

Thanks for dplyr, ggplot2, reshape2, and the rest!

@diegogarcilazo
Author

@hadley thinking about your answer, I ended up writing the full SQL statement:

my_tbl = tbl(my_db, dplyr::sql('SELECT * FROM mortalidad.def0307'))

and with that I could connect to the schema and table. Thanks a lot. – Diego
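For later readers, this workaround composes with ordinary dplyr verbs, since translation wraps the literal query in a subquery. A minimal sketch, assuming an existing src_postgres() source named my_db (the connection details and the column name var1 are hypothetical):

```r
library(dplyr)

# Hypothetical connection; adjust dbname/host/user for your setup
my_db <- src_postgres(dbname = "salud")

# Use a literal SQL query as the table source
my_tbl <- tbl(my_db, sql("SELECT * FROM mortalidad.def0307"))

# Subsequent verbs are translated to SQL around the literal query
my_tbl %>%
  group_by(var1) %>%
  summarise(n = n())
```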

@dotcomken

I was just about to post this same issue when working with Greenplum. Your sql() method solved my problem.

@diegogarcilazo
Author

I'm happy with that. The only thing that worries me is the SQL statement that gets generated whenever any grouping is done:

SELECT "var1", count(*) AS "n" FROM (SELECT * FROM mortalidad.mort12)

There is a subquery after FROM, and I don't know whether this affects the efficiency of the query. I'm an epidemiologist, so this is outside my expertise.
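For what it's worth, a trivial subquery like this is usually no cause for concern: most planners, PostgreSQL's included, flatten a bare SELECT * subquery so that it executes like the unnested form. A sketch of the two shapes (the GROUP BY clause and subquery alias are assumed here, since the snippet above is abbreviated):

```sql
-- Nested form, as generated by dplyr
SELECT "var1", count(*) AS "n"
FROM (SELECT * FROM mortalidad.mort12) AS "sub"
GROUP BY "var1";

-- Unnested form the planner typically reduces it to
SELECT "var1", count(*) AS "n"
FROM mortalidad.mort12
GROUP BY "var1";
```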

@hadley hadley added this to the v0.2 milestone Mar 17, 2014
@hadley hadley self-assigned this Mar 17, 2014
@hadley
Member

hadley commented Mar 19, 2014

Would you mind giving me a sequence of SQL commands that I can run in postgresql to create a schema and a table in that schema?

@diegogarcilazo
Author

Sorry for the delay. Here is how I create a schema and a table inside it:

CREATE SCHEMA schema_name;

CREATE TABLE schema_name.table_name
(
  codigo character varying(3),
  nombre character varying(51),
  continente character varying(7)
);


@hadley hadley modified the milestones: 0.3, v0.2 Apr 7, 2014
@hadley hadley mentioned this issue Aug 26, 2014
@hadley
Member

hadley commented Sep 1, 2014

I think there are basically two options here:

  1. Special-case escaping whenever there's a dot in the identifier. This would mean that x.y would be escaped as "x"."y". This has the advantage of being very succinct; it has the disadvantage that tables translated from R to SQL can't use . in field names (probably not very common).
  2. Use a new escaping mechanism. Probably the easiest way to do this would be to extend ident() to take multiple arguments, e.g. ident("x", "y"). Alternatively, I could define dot(), which uses NSE to allow you to write dot(x, y). This is more typing, but is guaranteed not to break any existing code.

This is important both for table names (e.g. schema.table) and possibly for disambiguating field names in joins (e.g. table.field).

I'm currently leaning towards the explicit dot() function, because changing how escaping works globally seems risky.

@hadley
Member

hadley commented Sep 1, 2014

Hmmmm, for this use case, you can just do:

my_tbl <- tbl(my_db, sql('mortalidad.def0307'))

@hadley hadley modified the milestones: 0.4, 0.3 Sep 1, 2014
@hadley
Member

hadley commented Sep 1, 2014

And some other code for future reference (this attempt wasn't successful):

dot <- function(...) {
  args <- dots(...)
  is_name <- vapply(args, is.name, logical(1))
  if (any(!is_name)) {
    stop("All arguments to dot() must be names.")
  }

  dots <- vapply(args, as.character, character(1))
  class(dots) <- c("ident_multi", "ident", "sql", "character")
  dots
}

dot <- function(...) {
  out <- paste(escape(c(...), parens = FALSE), collapse = ".")
  class(out) <- c("ident", "sql", "character")
  out
}

#' @export
escape.ident_multi <- function(x, parens = FALSE, collapse = ", ", con = NULL) {
  y <- vapply(x, escape_ident, FUN.VALUE = character(1), con = con)
  paste(y, collapse = ".")
}

#' @export
format.ident_multi <- function(x, ...) paste0("<SQL> ", paste(x, collapse = "."))

# lahman_sqlite() %>% tbl("Batting")
# lahman_sqlite() %>% tbl("main.Batting")
# lahman_sqlite() %>% tbl(sql("main.Batting"))
# lahman_sqlite() %>% tbl(dot("main", "Batting"))

@hhoeflin

Hi Hadley,

I am having the same issue, and the nesting seems to be a problem for performance (in my case a serious one, as I am working on data with hundreds of millions of rows). Could you elaborate on why your last attempt failed?

Thanks

@HarlanH

HarlanH commented Aug 12, 2015

FWIW, for some use cases a helpful idiom is:

dbSendQuery(con$con, build_sql("SET search_path TO ", schema_name))

Then subsequent queries using that connection are within that schema.
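A minimal sketch of that idiom end to end, assuming con is a dplyr source (so con$con is the underlying DBI connection); the schema and table names are hypothetical:

```r
library(dplyr)

schema_name <- "mortalidad"  # hypothetical schema

# Point this connection's search path at the schema
dbSendQuery(con$con, build_sql("SET search_path TO ", schema_name))

# Unqualified table names now resolve inside that schema
def <- tbl(con, "def0307")
```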

@hhoeflin

Hi Harlan,

Thanks, and I am already using something like that (except I expand the search path with the schema rather than replacing it). But it only goes so far: if you have tables with the same name in two schemas, it won't help.

Anyway, I was wondering why not overload the '$' operator in the base_scalar SQL translator.

It would even make sense from an R perspective: in a$b$c, a would be an environment (i.e. a schema), b a table, and c a column. The only thing that would then have to be fixed is tbl_sql, which shouldn't check in such a simple fashion whether the table name exists in the database. Instead, it could check whether a SELECT statement is part of the SQL query and, if not, add a SELECT * in front and check the variables that way.

In that case, one could even enable NSE for the tbl() function?
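A rough sketch of what that '$' overload might look like against dplyr's translation layer of that era (sql_translator() and build_sql() are names from dplyr's internals, and base_scalar is the translator mentioned above; treat this as a hypothetical illustration, not a tested patch):

```r
# Hypothetical: render a$b as "a"."b" in generated SQL by extending
# the scalar translator with a '$' rule
dotted_scalar <- sql_translator(
  .parent = base_scalar,
  `$` = function(x, y) build_sql(x, sql("."), y)
)
```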

@hhoeflin

Oh yes, one more comment: dbSendQuery often does not work with RJDBC (at least with Vertica). The query above, i.e. SET search_path TO ..., does not return a result, but dbSendQuery expects one. For Vertica this fails, and one has to use dbSendUpdate instead. Is that also the case for MySQL (or, at least, does dbSendUpdate not fail for MySQL)?

@hadley
Member

hadley commented Aug 13, 2015

I like the idea of overloading $. I'll definitely think about it when I'm next working on dplyr.

@hhoeflin have you tried RPostgreSQL instead of RJDBC for talking to vertica?

@hhoeflin

No, I had no idea that RPostgreSQL was an option. In any case, the Vertica backend is now implemented for RJDBC. I gave it a try anyway and ran into problems: the postgres catalog tables it queries to find out which tables exist in the database seem to be different in Vertica.

@hadley
Member

hadley commented Feb 2, 2017

This is slowly getting resolved through DBI: r-dbi/DBI#24
