-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
141 lines (112 loc) · 3.5 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# scrappy: A Simple Web Scraper <img src="https://raw.githubusercontent.com/villegar/scrappy/main/inst/images/logo.png" alt="logo" align="right" height=200px/>
<!-- badges: start -->
`r badger::badge_cran_release("scrappy", "black")`
`r badger::badge_devel("villegar/scrappy", "yellow")`
`r badger::badge_github_actions("villegar/scrappy")`
<!-- badges: end -->
The goal of scrappy is to provide simple functions to scrape data from different
websites for academic purposes.
## Installation
You can install the released version of scrappy from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("scrappy")
```
And the development version from [GitHub](https://github.com/villegar/scrappy) with:
``` r
# install.packages("devtools")
devtools::install_github("villegar/scrappy")
```
## Example
__NOTE:__ To run the following examples on your computer, you need to download
and install Mozilla Firefox (https://www.mozilla.org/en-GB/firefox/new/). Alternatively,
you can replace the value of `browser` in the call to `RSelenium::rsDriver`.
```{r run-examples-hidden-code, echo = FALSE, message = FALSE, output = FALSE, cache = TRUE}
# Create RSelenium session
rD <- RSelenium::rsDriver(browser = "firefox", port = 4549L, verbose = FALSE)
# Call scrappy
out_newa <- scrappy::newa_nrcc(
client = rD$client,
year = 2020,
month = 12, # December
station = "gbe", # Geneva (Bejo) station
save_file = FALSE
) # Don't save output
out_gmaps <- scrappy::google_maps(
client = rD$client,
name = "Sefton Park",
max_reviews = 20
)
out_gpp <- scrappy::find_a_gp(rD$client, postcode = "L69 3GL")
# Stop server
conn <- rD$server$stop()
```
### NEWA @ Cornell University
The Network for Environment and Weather Applications at Cornell University.
Website: http://newa.cornell.edu
```{r example-newa, eval = FALSE}
# Create RSelenium session
rD <- RSelenium::rsDriver(browser = "firefox", port = 4549L, verbose = FALSE)
# Call scrappy
out <- scrappy::newa_nrcc(
client = rD$client,
year = 2020,
month = 12, # December
station = "gbe", # Geneva (Bejo) station
save_file = FALSE
) # Don't save output to a CSV file
# Stop server
rD$server$stop()
```
Partial output from the previous example:
```{r, echo = FALSE}
knitr::kable(head(out_newa, 10))
```
### Google Maps
Extract the reviews for Sefton Park in Liverpool (only the 20 most recent):
```{r example-google-maps, eval = FALSE}
# Create RSelenium session
rD <- RSelenium::rsDriver(browser = "firefox", port = 4549L, verbose = FALSE)
# Call scrappy
out <- scrappy::google_maps(
client = rD$client,
name = "Sefton Park",
max_reviews = 20
)
# Stop server
rD$server$stop()
```
Output after removing original authors' names and URL to their profiles:
```{r, echo = FALSE}
`%>%` <- scrappy::`%>%`
out_gmaps %>%
dplyr::mutate(
author = paste0("Author ", seq_along(author)),
author_url = ""
) %>%
knitr::kable()
```
### NHS GP practices by postcode
```{r example-nhs-gpp, eval = FALSE}
# Create RSelenium session
rD <- RSelenium::rsDriver(browser = "firefox", port = 4549L, verbose = FALSE)
# Retrieve GP practices near L69 3GL
# (Waterhouse building, University of Liverpool)
out <- scrappy::find_a_gp(rD$client, postcode = "L69 3GL")
# Stop server
rD$server$stop()
```
```{r, echo = FALSE}
knitr::kable(out_gpp)
```