---
title: "Adding a new SQL backend"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Adding a new SQL backend}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---
```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(collapse = T, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)
library(dplyr)
```
This document describes how to add a new SQL backend to dplyr. To begin:
* Ensure that you have a DBI-compliant database backend. If not, you'll need
  to first create it by following the instructions in
  `vignette("backend", package = "DBI")`.
* You'll need a working knowledge of S3. Make sure that you're
  [familiar with the basics](http://adv-r.had.co.nz/OO-essentials.html#s3)
  before you start.
This document is still a work in progress, but it will hopefully get you started. If you're familiar with how your database works, and with at least one other database that dplyr already supports, this should be a reasonably simple task. However, it is possible that a new database backend may need new methods - I'm happy to add those as needed.
## Create the src object
Start by creating a new src function to represent the backend. Assuming we're going to create a src for postgres, you'd call it `src_postgres()`, and you'd follow the pattern of an existing src. A simplified version of `src_postgres()` is shown below:
```{r, eval = FALSE}
src_postgres <- function(dbname = NULL, host = NULL, port = NULL, user = NULL,
                         password = NULL, ...) {
  con <- dbConnect(PostgreSQL(), host = host %||% "", dbname = dbname %||% "",
    user = user, password = password %||% "", port = port %||% "", ...)
  src_sql("postgres", con)
}
```
Use `src_sql()` to create a new S3 object with the correct structure. It must have a DBI connection, but it can store anything else that might be useful.
Next, implement a method for `src_desc()` that briefly describes the source:
```{r}
#' @export
src_desc.src_postgres <- function(con) {
  info <- dbGetInfo(con)
  host <- if (info$host == "") "localhost" else info$host
  paste0("postgres ", info$serverVersion, " [", info$user, "@",
    host, ":", info$port, "/", info$dbname, "]")
}
```
If you read the source code for the real `src_postgres()` you'll notice that it caches the result of `dbGetInfo()` on creation, since this saves a little time when printing tbls.
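A minimal sketch of that caching pattern is shown below; `src_sql()` stores any extra arguments in the src list, and the `info` field name is just the convention used here:
```{r, eval = FALSE}
src_postgres <- function(dbname = NULL, host = NULL, port = NULL, user = NULL,
                         password = NULL, ...) {
  con <- dbConnect(PostgreSQL(), host = host %||% "", dbname = dbname %||% "",
    user = user, password = password %||% "", port = port %||% "", ...)
  # Query the connection metadata once, at creation time...
  info <- dbGetInfo(con)
  src_sql("postgres", con, info = info)
}

#' @export
src_desc.src_postgres <- function(x) {
  # ...and reuse the cached copy here, instead of hitting the database
  # every time a tbl is printed.
  info <- x$info
  host <- if (info$host == "") "localhost" else info$host
  paste0("postgres ", info$serverVersion, " [", info$user, "@",
    host, ":", info$port, "/", info$dbname, "]")
}
```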
Before continuing, check that you can create a connection to a local database, and that you get a listing of the existing tables. If you have a problem at this point, you may need to check the DBI backend.
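For example (the database name here is hypothetical):
```{r, eval = FALSE}
src <- src_postgres(dbname = "test")  # hypothetical local database
src_tbls(src)                         # should list the existing tables
```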
## tbl
Next implement the `tbl()` method for your data source. This will probably just be:
```{r, eval = FALSE}
tbl.src_mssql <- function(src, from, ...) {
  tbl_sql("mssql", src = src, from = from, ...)
}
```
Before continuing, make sure you can connect to an existing table, and that the results are printed reasonably. If not, that may indicate your database uses a non-standard DBI interface, and you'll need to fix that before continuing.
This is also a good time to implement `explain()`, by adding a method for `db_explain()`.
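For example, here's a sketch of a method for a database that understands a plain `EXPLAIN` prefix (many do; the real postgres method also handles different output formats):
```{r, eval = FALSE}
#' @export
db_explain.PostgreSQLConnection <- function(con, sql, ...) {
  # Prefix the query with EXPLAIN, run it, and collapse the plan into
  # a single string for printing.
  exsql <- build_sql("EXPLAIN ", sql, con = con)
  expl <- dbGetQuery(con, exsql)
  paste(expl[[1]], collapse = "\n")
}
```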
If your database uses non-standard quoting (i.e. something other than `"` for identifiers and `'` for strings), implement methods for `sql_escape_string()` and `sql_escape_ident()`.
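For example, MySQL quotes identifiers with backticks rather than double quotes, so its methods look like this (`sql_quote()` takes care of doubling any embedded quote characters):
```{r, eval = FALSE}
#' @export
sql_escape_ident.MySQLConnection <- function(con, x) {
  sql_quote(x, "`")
}

#' @export
sql_escape_string.MySQLConnection <- function(con, x) {
  sql_quote(x, "'")
}
```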
You may need to implement `db_query_fields()`, which should return a character vector giving the field names of a query.
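A common strategy, sketched below for the hypothetical mssql backend from above, is to run the query with a clause that can never match and read the column names off the empty result:
```{r, eval = FALSE}
#' @export
db_query_fields.MSSQLConnection <- function(con, sql, ...) {
  # Ask for zero rows, purely to learn the column names of the query.
  fields <- build_sql("SELECT * FROM ", sql_subquery(con, sql),
    " WHERE 0 = 1", con = con)
  qry <- dbSendQuery(con, fields)
  on.exit(dbClearResult(qry))
  names(dbFetch(qry, 0))
}
```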
At this point, all the basic verbs (`summarise()`, `filter()`, `arrange()`, `mutate()` etc) should also work, but it's hard to test without some data.
## `copy_to()`
Next, implement the methods that make `copy_to()` work. Once you've implemented these methods, you'll be able to copy datasets from R into your database, which will make testing much easier.
* `db_data_type()`
* `sql_begin()`, `sql_commit()`, `sql_rollback()`
* `sql_create_table()`, `sql_insert_into()`, `sql_drop_table()`
* `sql_create_index()`, `sql_analyze()`
If the database doesn't support a function, just return `TRUE` without doing anything. If you find these methods a very poor match for your backend, you may find it easier to provide a direct `copy_to()` method.
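For example, if the hypothetical mssql backend had no `ANALYZE` equivalent, its method would be a no-op (the exact signature varies between these generics, so check the corresponding default method):
```{r, eval = FALSE}
#' @export
sql_analyze.MSSQLConnection <- function(con, table, ...) {
  # No ANALYZE equivalent in this hypothetical backend: do nothing
  # and just return TRUE.
  TRUE
}
```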
At this point, you should be able to copy the nycflights13 and Lahman data packages into your database, e.g.:
```{r, eval = FALSE}
copy_nycflights13(src_mssql(...))
copy_lahman(src_mssql(...))
```
Don't proceed further until this works, and you've verified that the basic single table verbs work.
## Compute, collect and collapse
Next, check that `collapse()`, `compute()`, and `collect()` work.
* If `collapse()` fails, your database has a non-standard way of constructing
  subqueries. Add a method for `sql_subquery()`.
* If `compute()` fails, your database has a non-standard way of saving queries
  in temporary tables. Add a method for `db_save_query()`.
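For example, here's a sketch of a `db_save_query()` method for the hypothetical mssql backend, where temporary tables are created by selecting into a `#`-prefixed name rather than with `CREATE TEMPORARY TABLE` (treat the SQL details as illustrative):
```{r, eval = FALSE}
#' @export
db_save_query.MSSQLConnection <- function(con, sql, name, temporary = TRUE,
                                          ...) {
  # SQL Server spells "temporary" with a # prefix on the table name.
  if (temporary) name <- paste0("#", name)
  tt_sql <- build_sql("SELECT * INTO ", ident(name), " FROM (", sql, ") AS t",
    con = con)
  dbGetQuery(con, tt_sql)
  name
}
```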
## Multi table verbs
Next, check the multi-table verbs:
* `left_join()`, `inner_join()`: powered by `sql_join()`
* `semi_join()`, `anti_join()`: powered by `sql_semi_join()`
* `union()`, `intersect()`, `setdiff()`: powered by `sql_set_op()`
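For example, if your database requires each side of a set operation to be parenthesised, you could override `sql_set_op()` along these lines (a sketch; the `method` argument is the operator name, e.g. `"UNION"`):
```{r, eval = FALSE}
#' @export
sql_set_op.MSSQLConnection <- function(con, x, y, method) {
  # Wrap each subquery in parentheses before combining them with
  # UNION, INTERSECT, or EXCEPT.
  build_sql("(", x, ") ", sql(method), " (", y, ")", con = con)
}
```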
## SQL translation
To finish off, you can add custom R -> SQL translation by providing a method for `src_translate_env()`. This function should return an object created by `sql_variant()`. See existing methods for examples.
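For example, a simplified version of the postgres method adds a few aggregate translations on top of the shared defaults:
```{r, eval = FALSE}
#' @export
src_translate_env.src_postgres <- function(x) {
  sql_variant(
    base_scalar,
    sql_translator(.parent = base_agg,
      # R's sd() maps to postgres's stddev_samp()
      sd  = sql_prefix("stddev_samp"),
      cor = sql_prefix("corr"),
      cov = sql_prefix("covar_samp")
    )
  )
}
```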