Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with quoted identifiers and CTEs #1559

Open
aguynamedryan opened this issue Nov 22, 2024 · 0 comments · May be fixed by #1560
Open

Issues with quoted identifiers and CTEs #1559

aguynamedryan opened this issue Nov 22, 2024 · 0 comments · May be fixed by #1560

Comments

@aguynamedryan
Copy link

After completing #1558, I discovered there were some other issues with CTEs, namely, dbplyr quoted tables names inconsistently and incorrectly.

I created this test case to generate the two different issues I encountered.

test_that("CTE quoting works right", {
  lf1 <- lazy_frame(x = 1, y = 2, .name = "lf1")
  lf2 <- lazy_frame(x = 1, z = 2, .name = "lf2")

  # The query is nonsensical, because I took a failing query from the wild and
  # whiddled it down to the minimum amount of code required to reveal two
  # separate quoting issues when using CTEs

  double_it <- function(.data, column_name) {
    .data %>%
      mutate(
        "{ column_name }" := .data[[column_name]] * 2
      )
  }

  skinny <- function(column_name) {
    lf1 %>%
      double_it(column_name) %>%
      mutate(
        column_name = column_name,
        .keep = "none"
      )
  }

  tall_tbl <- purrr::map(c("x", "y"), skinny) %>%
    purrr::reduce(dplyr::union_all)

  query <- tall_tbl %>%
    left_join(tall_tbl, by = join_by(column_name)) %>%
    left_join(tall_tbl, by = join_by(column_name)) %>%
    sql_render(sql_options = sql_options(cte = TRUE)) %>%
    expect_snapshot()
})

The output produced for this snapshot is the following:

WITH `q01` AS (
    SELECT `x` * 2.0 AS `x`, `y`
    FROM `lf1`
  ),
  `q02` AS (
    SELECT 'x' AS `column_name`
    FROM `q01`
  ),
  `q03` AS (
    SELECT `x`, `y` * 2.0 AS `y`
    FROM `lf1`
  ),
  `q04` AS (
    SELECT 'y' AS `column_name`
    FROM `q03` AS `q01`
  ),
  `q05` AS (
    SELECT *
    FROM `q02`
  
    UNION ALL
  
    SELECT *
    FROM `q04`
  ),
  `q06` AS (
    SELECT 'x' AS `column_name`
    FROM ```q01``` AS `q01`
  ),
  `q07` AS (
    SELECT 'y' AS `column_name`
    FROM ```q03``` AS `q01`
  ),
  `q08` AS (
    SELECT *
    FROM `q06`
  
    UNION ALL
  
    SELECT *
    FROM `q07`
  ),
  `q09` AS (
    SELECT *
    FROM '`q06`'
  
    UNION ALL
  
    SELECT *
    FROM '`q07`'
  )
  SELECT `...1`.`column_name` AS `column_name`
  FROM `q05` AS `...1`
  LEFT JOIN `q08` AS `...2`
    ON (`...1`.`column_name` = `...2`.`column_name`)
  LEFT JOIN `q09` AS `...3`
    ON (`...1`.`column_name` = `...3`.`column_name`)

Here are the two issues I noticed:

  • If you look at the definition for CTE q06, you'll see the FROM clause has incorrectly triple-quoted the q03 identifier
  • If you look at the definition for CTE q09, you'll see the FROM clause has incorrectly applied single quotes around the q06 identifier

Either of these issues render the SQL statement invalid and unable to run on a database.

aguynamedryan added a commit to aguynamedryan/dbplyr that referenced this issue Nov 22, 2024
@aguynamedryan aguynamedryan linked a pull request Nov 22, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant