Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dplyr is translating FALSE to "0" in SQL. #2614

Closed
iangow opened this issue Apr 3, 2017 · 11 comments
Closed

dplyr is translating FALSE to "0" in SQL. #2614

iangow opened this issue Apr 3, 2017 · 11 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@iangow
Copy link

iangow commented Apr 3, 2017

Translating FALSE to 0 works in SQLite, which has few real types. But it breaks existing PostgreSQL code.

library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)
y <- c(1, 2, 3, NA)
       
df <- 
    memdb_frame(y) %>%
    mutate(y_big = coalesce(y > 2, FALSE))

df
#> Source:     lazy query [?? x 2]
#> Database:   sqlite 3.11.1 [:memory:]
#> 
#>       y y_big
#>   <dbl> <int>
#> 1     1     0
#> 2     2     0
#> 3     3     1
#> 4    NA     0
    
pg <- src_postgres()

df <-
    copy_to(pg, tibble(y),
            name = "adhaksdhjkalsda",
            overwrite = TRUE) %>%
    mutate(y_big = coalesce(y > 2, FALSE))  
df %>% show_query()
#> <SQL>
#> SELECT "y", COALESCE("y" > 2.0, 0) AS "y_big"
#> FROM "adhaksdhjkalsda"

df
#> Source:     lazy query [?? x 2]
#> Database:   postgres 9.6.2 [[email protected]:5432/crsp]
#> Error in postgresqlExecStatement(conn, statement, ...): RS-DBI driver: (could not Retrieve the result : ERROR:  COALESCE types boolean and integer cannot be matched
#> LINE 1: SELECT "y", COALESCE("y" > 2.0, 0) AS "y_big"
#>                                         ^
#> )

df <-
    copy_to(pg, tibble(y),
            name = "adhaksdhjkalsda",
            overwrite = TRUE) %>%
    mutate(y_big = coalesce(y > 2, sql("FALSE")))  
df %>% show_query()
#> <SQL>
#> SELECT "y", COALESCE("y" > 2.0, FALSE) AS "y_big"
#> FROM "adhaksdhjkalsda"

df
#> Source:     lazy query [?? x 2]
#> Database:   postgres 9.6.2 [[email protected]:5432/crsp]
#> 
#>       y y_big
#>   <dbl> <lgl>
#> 1     1 FALSE
#> 2     2 FALSE
#> 3     3  TRUE
#> 4    NA FALSE
@hadley
Copy link
Member

hadley commented Apr 3, 2017

See #2052 for discussion, and https://www.postgresql.org/docs/9.1/static/datatype-boolean.html, which suggests that 0 and 1 are ok.

@iangow
Copy link
Author

iangow commented Apr 3, 2017

Documentation seems literally correct: '0' and '1' are fine, but 0 and 1 are not:

library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)
y <- c(1, 2, 3, NA)

pg <- src_postgres()

df <-
    copy_to(pg, tibble(y),
            name = "adhaksdhjkalsda",
            overwrite = TRUE) %>%
    mutate(y_big = coalesce(y > 2, FALSE))  
df %>% show_query()
#> <SQL>
#> SELECT "y", COALESCE("y" > 2.0, 0) AS "y_big"
#> FROM "adhaksdhjkalsda"

df <-
    copy_to(pg, tibble(y),
            name = "adhaksdhjkalsda",
            overwrite = TRUE) %>%
    mutate(y_big = coalesce(y > 2, sql("'0'")))  
df %>% show_query()
#> <SQL>
#> SELECT "y", COALESCE("y" > 2.0, '0') AS "y_big"
#> FROM "adhaksdhjkalsda"

df
#> Source:     lazy query [?? x 2]
#> Database:   postgres 9.6.2 [[email protected]:5432/crsp]
#> 
#>       y y_big
#>   <dbl> <lgl>
#> 1     1 FALSE
#> 2     2 FALSE
#> 3     3  TRUE
#> 4    NA FALSE

@iangow
Copy link
Author

iangow commented Apr 3, 2017

Literals TRUE and FALSE appear to be part of SQL standard (also see here), even if PostgreSQL is the only system to use these.

@hadley
Copy link
Member

hadley commented Apr 3, 2017

Hmmm, and there's currently no double dispatch on vector type and database backend.

But generally dplyr translation assumes SQL-92, so maybe it's ok to just translate to '0' and '1. Will need to add a backend test.

@iangow
Copy link
Author

iangow commented Apr 3, 2017

Seems to work OK with SQLite (if it broke there, then clearly not worth trying):

library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)
y <- c(1, 2, 3, NA)
       
df <- 
    memdb_frame(y) %>%
    mutate(y_big = coalesce(y > 2, sql("'0'")))

df
#> Source:     lazy query [?? x 2]
#> Database:   sqlite 3.11.1 [:memory:]
#> 
#>       y y_big
#>   <dbl> <int>
#> 1     1     0
#> 2     2     0
#> 3     3     1
#> 4    NA     0

@hadley
Copy link
Member

hadley commented Apr 3, 2017

Unfortunately MySQL yields:

SELECT `x`, COALESCE(`x` > 0.0, '1') AS `y`
FROM `yxuqwdshce`

Source:     lazy query [?? x 2]
Database:   mysql 10.1.18-MariaDB [hadley@localhost:/test]
      x     y
  <dbl> <chr>
1     1     1
2    NA     1

@hadley
Copy link
Member

hadley commented Apr 3, 2017

I can't find any single translation that works across these three backends

@hadley
Copy link
Member

hadley commented Apr 3, 2017

In the long run, hopefully DBI can provide a new generic: r-dbi/DBI#172. In the short-run, need another method like sql_escape_string, i.e. sql_escape_logical that dispatches on the connection.

@iangow
Copy link
Author

iangow commented Apr 3, 2017

Thanks

I can't find any single translation that works across these three backends

It seems TRUE and FALSE work for MySQL (don't have to test) and PostgreSQL. There's no such thing as booleans in SQLite, so it seems that the type that needs to be converted there. Maybe that's what you're referring to with "no double dispatch on vector type and database backend."

@hadley
Copy link
Member

hadley commented Apr 3, 2017

Argh, it's worse than that since there seems to be a major logic failing and con doesn't get passed down to escape in many scenarios. This hasn't caused a problem in the past because it only affects literals.

@hadley
Copy link
Member

hadley commented Apr 3, 2017

Rather than continuing to painstaking thread the current connection through every function call, it'll probably be easier to generalise win_current_con() to current_con() and only set it in a few places.

@hadley hadley added bug an unexpected problem or unintended behavior database labels Apr 4, 2017
@lock lock bot locked as resolved and limited conversation to collaborators Jun 8, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

2 participants