-
Notifications
You must be signed in to change notification settings - Fork 4
/
README.Rmd
129 lines (82 loc) · 4.83 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# opendataes
[![R build status](https://github.com/rOpenSpain/opendataes/workflows/R-CMD-check/badge.svg)](https://github.com/rOpenSpain/opendataes/actions)
[![Coverage status](https://codecov.io/gh/rOpenSpain/opendataes/branch/master/graph/badge.svg)](https://codecov.io/github/rOpenSpain/opendataes?branch=master)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
## Description
The goal of `opendataes` is to interact and download data from the [https://datos.gob.es](https://datos.gob.es) API.
This API is an effort from the Spanish government to unify all data sources from different provinces and regions into a single API. The API includes data from entities such as universities, small and big city halls, autonomous communities and Spain as a whole. With over 19,000 datasets in topics all the way from the number of parking spaces in a given city to the levels of air pollution at the regional level (at the moment of writing, October 2018), the API keeps growing with a rich set of public information that researchers and data analysts can use for their own research.
Because aggregating data from all these different sources poses a challenge for reading different formats and harmonizing different datasets, `opendataes` is very conservative in what it can read. Once you install the package, you can print the contents of `permitted_formats` and `available_publishers` to explore which formats and publishers are available.
## Collaboration
This package is meant to be developed by the R community and it is completely open to new pull requests, ideas and collaborations. The idea of the package is to include as many formats and publishers as possible by bringing the support and knowledge of other developers If you're interested in collaborating, please check the vignettes as the package is described in detail there and file a pull request.
## Installation
`opendataes` is currently being tested and is not available on CRAN. You can install the development version from [Github](https://github.com) with:
``` r
remotes::install_github("ropenspain/opendataes")
```
## Example
The package has one main function that allows to read data from the API: `openes_load`. However, that function can be used in two different ways.
### Web-based search
You can search for a dataset [here](http://datos.gob.es/es/catalogo?_publisher_display_name_limit=0) and click on it's names to go to its homepage. You'll have something like this:
<img src="man/figures/datos_web.png" align="center" />
<br/>
<br/>
`openes_load` needs only the end path of a given dataset to read the data. So, we copy this:
<br/>
<br/>
<img src="man/figures/datos_url.png" align="center"/>
<br/>
<br/>
and pass it on to `openes_load`, and that is it.
```{r}
library(opendataes)
id <- 'l01080193-resultados-absolutos-de-las-elecciones-al-parlamento-europeo-de-la-ciudad-de-barcelona'
elections <- openes_load(id)
```
### R-based search
Alternatively, `opendataes` allows to seach datasets by keywords. For example, to read the same data as before we could search for a keyword and specifying the publisher of that dataset. We know that the data belongs to the 'Ayuntamiento of Barcelona' but `openes_keywords` requires that publisher's code.
First we extract the code using the `publishers_available` data frame from `opendataes`:
```{r}
pb_code <- publishers_available$publisher_code[publishers_available$publishers == 'Ayuntamiento de Barcelona']
```
(you can check out the available publishers with `publishers_available`, which should increase as the package evolves)
And then we search for a given key keyword
```{r,}
kw <- openes_keywords('elecciones', pb_code)
kw
```
Once we have that, `openes_load` only requires that we pass this exact data frame but only with one row. We can do some data munging and subset our data of interest.
```{r}
final_dt <- kw[grepl("Resultados absolutos de las elecciones al Parlamento Europeo de la ciudad de Barcelona",
kw$description,
fixed = TRUE), ]
final_dt
```
And we pass it to `openes_load`
```{r}
elections <- openes_load(final_dt)
```
### Usage
Once we have the results, we get a print out of the relevant metadata of the dataset.
```{r}
elections
```
But more importantly, we can subset the metadata of the dataset as well as the data that was read.
```{r}
elections$metadata
```
```{r}
elections$data
```
For a deeper explanation of what the columns of the metadata mean and some important caveats of what the package can and cannot do, please read the package's vignettes.