-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
97 lines (62 loc) · 4.2 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
echo = FALSE
)
library(tidyverse)
library(tmap)
```
# schema
<!-- badges: start -->
<!-- badges: end -->
The goal of this repo is to provide a shared, comprehensive and, eventually, authoritative set of categories and datasets to map existing and planned infrastructure for active travel planning.
Eventually, the intention is for this repo to also contain associated tools and code/documentation to make schemas defining active travel datasets useful for everyone wanting to map active travel infrastructure in a consistent way that is friendly and as useful as possible for people using the data.
## Introduction
Database schemas define the fields contained within datasets and the values that are allowed.
They are often visualised in [database diagrams](https://learn.microsoft.com/en-us/sql/ssms/visual-db-tools/design-database-diagrams-visual-database-tools?view=sql-server-ver16).
<!-- ![](https://user-images.githubusercontent.com/1825120/185740368-56effb61-a3c1-4534-bd49-ad42951f2fd7.png) -->
Advantages of well-designed and used schemas include:
- Consistency between different users (local authorities, researchers, consultancies, other users)
- Supporting people generating new plans to think about the data they may need to collect
- Future-proofness: an evolving schema with empty fields that may become useful encourages datasets to be kept up-to-date
<!-- ## An example schema -->
<!-- A basic example of a schema for active travel infrastructure is shown below. -->
<!-- ![](https://user-images.githubusercontent.com/1825120/185741429-fabb3183-bcbe-4bd9-8396-dff4a533d55d.png) -->
<!-- In this schema, there are three tables representing neteworks, routes and ways. -->
<!-- The ways represent the individual segments that, in combination, form routes. -->
<!-- In turn, many routes can add-up to represent a network. -->
<!-- Networks could cover an entire city or be small, e.g. to represent the travel network in a particular neighbourhood. -->
```{r}
# areas = jsonlite::read_json("schemas/areas.json", simplifyVector = TRUE)
```
<!-- ## Schema versioning -->
Just like software used to process data, data formats, and the underlying schemas that define them, should evolve over time.
<!-- To extend the schema above we can add more fields such as MinWidth and MaxWidth and add another table, representing point features on the network. -->
<!-- This updated schema is illustrated below. -->
<!-- ![](https://user-images.githubusercontent.com/1825120/185744446-0896f9e8-de0b-43d9-ac9e-21735762017f.png) -->
<!-- In the future the schema could evolve further. -->
Versioning the schema will ensure backwards compatibility and encourage innovation.
## Existing active travel infrastructure schemas
### TfL's Cycling Infrastructure Database (CID)
The CID contains detailed data on London's cycle infrastructure, divided into the following tables (see the [example_data](https://github.com/acteng/schema/tree/main/example_data) folder for examples):
```{r, message=FALSE}
cid_tables = read_csv("example_data/cid_tables.csv")
knitr::kable(cid_tables)
```
The CID's data has a rather verbose schema in which many attributes such as cycleway type are converted into a series of boolean (TRUE/FALSE) expressions.
Converted into a dataset in the Arrow format, the column names and associated data types, example values, asset types and labels are as follows, for the Cycle lane/track table, for example:
```{r}
```
Many of the columns in the schema outlined above have been replaced in schemas in this repo by 'type' keys that accept values including Advisory lane, Full separation and Stepped separation for motor traffic.
See the [cid](cid.md) example, [TfL's schema document](https://cycling.data.tfl.gov.uk/CyclingInfrastructure/documentation/cid_database_schema.xlsx), the repo <https://github.com/PublicHealthDataGeek/CycleInfraLnd/> and associated paper (Tait et al., [2022](https://www.sciencedirect.com/science/article/pii/S221414052200041X)) for further details on the CID.
### Ordnance Survey Schemas
TBC.
### OSM
TBC.
### The OpenInfra project
TBC.