-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
219 lines (178 loc) · 8.86 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
---
output: bookdown::github_document2
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
cache = TRUE,
collapse = TRUE,
comment = "#>",
# eval = FALSE,
echo = FALSE,
message = FALSE,
warning = FALSE
)
library(tidyverse)
library(sf)
param_region = "west-yorkshire"
param_min_flow = 100
param_min_flow_rail = 10
```
# Estimating cycling potential to rail stations
<!-- badges: start -->
<!-- [![.github/workflows/render-rmarkdown.yaml](https://github.com/npct/rail/actions/workflows/render-rmarkdown.yaml/badge.svg)](https://github.com/npct/rail/actions/workflows/render-rmarkdown.yaml) -->
<!-- badges: end -->
The goal of this repo is to explore methods for calculating cycling potential to public transport nodes, rail stations in the first instance.
# OD data
The input data consists of origin-destination pairs.
These can be obtained from a range of sources.
We will use open OD data from the 2011 UK Census to demonstrate the methods.
A random sample of OD pairs from the national dataset is shown below.
```{r}
od_england_wales = pct::get_od()
centroids_england_wales = pct::get_centroids_ew() %>%
sf::st_transform(4326)
od_england_wales %>%
sample_n(3) %>%
knitr::kable()
```
```{r}
region = pct::pct_regions %>%
filter(region_name == param_region)
centroids_region = centroids_england_wales[region, ]
od_region = od_england_wales %>%
filter(geo_code1 %in% centroids_region$msoa11cd)
od_filtered = od_region %>%
filter(all >= param_min_flow_rail)
desire_filtered = od::od_to_sf(x = od_filtered, z = centroids_england_wales)
desire_filtered$length_m = sf::st_length(desire_filtered) %>% as.numeric()
desire_rail = desire_filtered %>%
filter(train >= param_min_flow_rail)
```
The case study region of West Yorkshire is used to subset the dataset of `r nrow(od_england_wales)` OD pairs to records representing trips originating in the region (`r nrow(od_region)` rows).
In a further subsetting stage only OD pairs with more than a threshold number of trips were kept to focus the analysis on desire lines in which large numbers of people travel by train.
Setting this threshold to `r param_min_flow_rail` people by results in `r nrow(desire_rail)` rows in the case study region.
These rail trips are illustrated in Figure \@ref(fig:simpleraildesire) below.
```{r simpleraildesire, fig.cap="Illustration of major commute desire lines originating in West Yorkshire by any mode (black) and by rail (blue)."}
plot(region$geometry)
plot(centroids_england_wales$geometry, add = TRUE)
plot(centroids_region$geometry, pch = 2, add = TRUE)
plot(desire_filtered$geometry, add = TRUE, lwd = 0.2)
plot(desire_rail$geometry, add = TRUE, col = "blue")
```
# Rail station data
Data on rail station locations was obtained from the [naptan.app.dft.gov.uk](http://naptan.app.dft.gov.uk) website.
The multi-stage trips from home to work via rail stations is shown in Figure \@ref(fig:railsample) below.
This graphic assumes simplistically that the first stage of rail journeys was to the nearest station, that the rail journey went to the station closes to their destination, and that trips involve travelling in a straight line (an assumption we will remove in the next section).
```{r, eval=FALSE}
u_naptan = "http://naptan.app.dft.gov.uk/DataRequest/Naptan.ashx?format=csv"
f_naptan = basename(u_naptan)
download.file(url = u_naptan, destfile = f_naptan)
unzip(f_naptan)
list.files()
rail_naptan = readr::read_csv("RailReferences.csv")
rail_sf = sf::st_as_sf(rail_naptan, coords = c("Easting", "Northing"), crs = 27700) %>%
sf::st_transform(4326)
plot(region$geometry)
sf::write_sf(rail_sf, "rail_stations.geojson")
```
```{r railsample, fig.show='hold', out.width="49%", fig.cap="Illustration of desire lines with high numbers of rail trips, focussing on a sample of 5, assuming straight line travel (left) and assuming trips travel via the nearest station to the origin and destination, showing desire lines from home locations to the nearest stations (right)." }
# plot sample of trips to demo data on
set.seed(7)
desire_sample = desire_rail %>%
filter(length_m > 10000) %>%
sample_n(3)
desire_sample2 = desire_rail %>%
filter(length_m > 10000) %>%
filter(!geo_code2 %in% centroids_region$msoa11cd) %>%
sample_n(2)
desire_sample = rbind(desire_sample, desire_sample2)
rail_stations = sf::read_sf("rail_stations.geojson")
# calculate straight line routes via stations
desire_rail_via = stplanr::line_via(l = desire_rail, p = rail_stations)
desire_sample_via = stplanr::line_via(l = desire_sample, p = rail_stations)
# sf::st_crs(desire_sample_via$leg_dest) # see https://github.com/ropensci/stplanr/issues/465
plot(region$geometry)
plot(desire_rail$geometry, col = "grey", add = TRUE)
plot(desire_sample$geometry, col = "blue", add = TRUE, lwd = 2)
plot(rail_stations$geometry, add = TRUE)
plot(region$geometry)
plot(desire_rail_via$leg_orig, col = "grey", add = TRUE)
plot(desire_sample_via$leg_orig, col = "green", add = TRUE, lwd = 3)
plot(desire_sample_via$leg_via, col = "red", add = TRUE, lwd = 3)
plot(desire_sample_via$leg_dest, col = "orange", add = TRUE, lwd = 3)
```
The distribution of total trip distances and trip distances to and from stations is shown in Figure \@ref(fig:distances).
```{r distances, fig.cap="Straight line distances of journey, origin-station segments, station-destination segments, and rail sections of journey."}
desire_rail_via$dist_od = lwgeom::st_geod_length(desire_rail$geometry)
desire_rail_via$dist_leg_orig = lwgeom::st_geod_length(desire_rail_via$leg_orig)
desire_rail_via$dist_leg_via = lwgeom::st_geod_length(desire_rail_via$leg_via)
desire_rail_via$dist_leg_dest = lwgeom::st_geod_length(desire_rail_via$leg_dest)
desire_rail_clean = desire_rail_via %>%
mutate(across(dist_od:dist_leg_dest, \(x) as.numeric(x / 1000)))
leg_labels = c("Origin", "Via", "Destination", "Journey")
distances_df = desire_rail_clean %>%
sf::st_drop_geometry() %>%
select(contains("dist")) %>%
pivot_longer(cols = contains("dist"), names_to = "Journey leg", values_to = "Distance (km)") %>%
mutate(`Journey leg` = gsub(pattern = "dist_|dist_leg_", replacement = "", `Journey leg`)) %>%
mutate(`Journey leg` = str_to_title(`Journey leg`)) %>%
mutate(`Journey leg` = gsub(pattern = "Orig", replacement = "Origin", `Journey leg`)) %>%
mutate(`Journey leg` = gsub(pattern = "Dest", replacement = "Destination", `Journey leg`)) %>%
mutate(`Journey leg` = gsub(pattern = "Od", replacement = "Journey", `Journey leg`)) %>%
mutate(`Journey leg` = factor(`Journey leg`, labels = leg_labels))
# summary(distances_df$`Journey leg`)
distances_df %>%
ggplot() +
geom_histogram(aes(`Distance (km)`), binwidth = 1) +
# geom_density(aes(`Distance (km)`), stat = "count", bw = 100) + # too intricate
facet_wrap(~`Journey leg`, nrow = 1) +
xlim(c(-3, 30))
```
# Public transport routing
The route that people will take is not necessarily the one that goes to the closest rail station to their home.
It will usually be the route that minimises total journey time.
The total journey time can be calculated as the sum of the origin, public transport stage, and destination stages:
$$
T_j = T_o + T_p + T_d
$$
The time taken for each stage varies depending on the origin and destination station.
In this example we will focus only on the choice of the origin station.
We can find the three nearest stations to each origin as follows:
```{r}
origin = lwgeom::st_startpoint(desire_rail_clean$leg_orig)[1]
destination = lwgeom::st_endpoint(desire_rail_clean$leg_dest)[1]
```
```{r, echo=TRUE}
nearest_stations = nngeo::st_nn(origin, rail_stations, k = 3, progress = FALSE)
nearest_stations
```
```{r}
time_2021 = lubridate::ymd_hms("2021-08-11 07:30:00")
# route_pt1 = stplanr::route_google(from = rail_stations$geometry[1], to = destination, mode = "transit")
route_pt1 = stplanr::route_google(from = rail_stations$geometry[nearest_stations[[1]][1]], to = destination, mode = "transit", arrival_time = time_2021)
route_pt2 = stplanr::route_google(from = rail_stations$geometry[nearest_stations[[1]][2]], to = destination, mode = "transit", arrival_time = time_2021)
route_pt3 = stplanr::route_google(from = rail_stations$geometry[nearest_stations[[1]][3]], to = destination, mode = "transit", arrival_time = time_2021)
```
Based on this example, we can plot the three route options and show their associated times:
```{r}
plot(region$geometry)
plot(origin, add = TRUE, cex = 2)
plot(destination, add = TRUE, cex = 2)
plot(rail_stations$geometry[nearest_stations[[1]]], col = "red", add = TRUE, pch = 2)
plot(rail_stations$geometry, col = "red", add = TRUE)
plot(route_pt1, add = T)
plot(route_pt2, add = T)
plot(route_pt3, add = T)
```
```{r}
route_df = rbind(route_pt1, route_pt2, route_pt3)
route_df %>%
select(matches("dist|dur")) %>%
select(-matches("traffic")) %>%
sf::st_drop_geometry() %>%
knitr::kable()
```
# Cycle routing
# Scaling the methods
# Discussion