README.Rmd

---
output: bookdown::github_document2
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  cache = TRUE,
  collapse = TRUE,
  comment = "#>",
  # eval = FALSE,
  echo = FALSE,
  message = FALSE,
  warning = FALSE
)
library(tidyverse)
library(sf)

param_region = "west-yorkshire"
param_min_flow = 100
param_min_flow_rail = 10
```

# Estimating cycling potential to rail stations

<!-- badges: start -->
<!-- [![.github/workflows/render-rmarkdown.yaml](https://github.com/npct/rail/actions/workflows/render-rmarkdown.yaml/badge.svg)](https://github.com/npct/rail/actions/workflows/render-rmarkdown.yaml) -->
<!-- badges: end -->

The goal of this repo is to explore methods for calculating cycling potential to public transport nodes, rail stations in the first instance.

# OD data

The input data consists of origin-destination pairs.
These can be obtained from a range of sources.
We will use open OD data from the 2011 UK Census to demonstrate the methods.
A random sample of OD pairs from the national dataset is shown below.

```{r}
od_england_wales = pct::get_od()
centroids_england_wales = pct::get_centroids_ew() %>% 
  sf::st_transform(4326)
od_england_wales %>% 
  sample_n(3) %>% 
  knitr::kable()
```

```{r}
region = pct::pct_regions %>% 
  filter(region_name == param_region)
centroids_region = centroids_england_wales[region, ]
od_region = od_england_wales %>% 
  filter(geo_code1 %in% centroids_region$msoa11cd)
od_filtered = od_region %>% 
  filter(all >= param_min_flow_rail)
desire_filtered = od::od_to_sf(x = od_filtered, z = centroids_england_wales)
desire_filtered$length_m = sf::st_length(desire_filtered) %>% as.numeric()
desire_rail = desire_filtered %>% 
  filter(train >= param_min_flow_rail)
```

The case study region of West Yorkshire is used to subset the dataset of `r nrow(od_england_wales)` OD pairs to records representing trips originating in the region (`r nrow(od_region)` rows).
In a further subsetting stage only OD pairs with more than a threshold number of trips were kept to focus the analysis on desire lines in which large numbers of people travel by train.
Setting this threshold to `r param_min_flow_rail` people by results in `r nrow(desire_rail)` rows in the case study region.
These rail trips are illustrated in Figure \@ref(fig:simpleraildesire) below.

```{r simpleraildesire, fig.cap="Illustration of major commute desire lines originating in West Yorkshire by any mode (black) and by rail (blue)."}
plot(region$geometry)
plot(centroids_england_wales$geometry, add = TRUE)
plot(centroids_region$geometry, pch = 2, add = TRUE)
plot(desire_filtered$geometry, add = TRUE, lwd = 0.2)
plot(desire_rail$geometry, add = TRUE, col = "blue")
```

# Rail station data

Data on rail station locations was obtained from the [naptan.app.dft.gov.uk](http://naptan.app.dft.gov.uk) website.
The multi-stage trips from home to work via rail stations is shown in Figure \@ref(fig:railsample) below.
This graphic assumes simplistically that the first stage of rail journeys was to the nearest station, that the rail journey went to the station closes to their destination, and that trips involve travelling in a straight line (an assumption we will remove in the next section). 

```{r, eval=FALSE}
u_naptan = "http://naptan.app.dft.gov.uk/DataRequest/Naptan.ashx?format=csv"
f_naptan = basename(u_naptan)
download.file(url = u_naptan, destfile = f_naptan)
unzip(f_naptan)
list.files()
rail_naptan = readr::read_csv("RailReferences.csv")
rail_sf = sf::st_as_sf(rail_naptan, coords = c("Easting", "Northing"), crs = 27700) %>% 
  sf::st_transform(4326)
plot(region$geometry)
sf::write_sf(rail_sf, "rail_stations.geojson")
```

```{r railsample, fig.show='hold', out.width="49%", fig.cap="Illustration of desire lines with high numbers of rail trips, focussing on a sample of 5, assuming straight line travel (left) and assuming trips travel via the nearest station to the origin and destination, showing desire lines from home locations to the nearest stations (right)." }
# plot sample of trips to demo data on
set.seed(7)
desire_sample = desire_rail %>% 
  filter(length_m > 10000) %>% 
  sample_n(3)
desire_sample2 = desire_rail %>% 
  filter(length_m > 10000) %>% 
  filter(!geo_code2 %in% centroids_region$msoa11cd) %>% 
  sample_n(2)

desire_sample = rbind(desire_sample, desire_sample2)
rail_stations = sf::read_sf("rail_stations.geojson")
# calculate straight line routes via stations
desire_rail_via = stplanr::line_via(l = desire_rail, p = rail_stations)
desire_sample_via = stplanr::line_via(l = desire_sample, p = rail_stations)
# sf::st_crs(desire_sample_via$leg_dest) # see https://github.com/ropensci/stplanr/issues/465

plot(region$geometry)
plot(desire_rail$geometry, col = "grey", add = TRUE)
plot(desire_sample$geometry, col = "blue", add = TRUE, lwd = 2)
plot(rail_stations$geometry, add = TRUE)

plot(region$geometry)
plot(desire_rail_via$leg_orig, col = "grey", add = TRUE)
plot(desire_sample_via$leg_orig, col = "green", add = TRUE, lwd = 3)
plot(desire_sample_via$leg_via, col = "red", add = TRUE, lwd = 3)
plot(desire_sample_via$leg_dest, col = "orange", add = TRUE, lwd = 3)
```

The distribution of total trip distances and trip distances to and from stations is shown in Figure \@ref(fig:distances).

```{r distances, fig.cap="Straight line distances of journey, origin-station segments, station-destination segments, and rail sections of journey."}
desire_rail_via$dist_od = lwgeom::st_geod_length(desire_rail$geometry)
desire_rail_via$dist_leg_orig = lwgeom::st_geod_length(desire_rail_via$leg_orig)
desire_rail_via$dist_leg_via = lwgeom::st_geod_length(desire_rail_via$leg_via)
desire_rail_via$dist_leg_dest = lwgeom::st_geod_length(desire_rail_via$leg_dest)
desire_rail_clean = desire_rail_via %>% 
  mutate(across(dist_od:dist_leg_dest, \(x) as.numeric(x / 1000)))
leg_labels = c("Origin", "Via", "Destination", "Journey")
distances_df = desire_rail_clean %>% 
  sf::st_drop_geometry() %>% 
  select(contains("dist")) %>% 
  pivot_longer(cols = contains("dist"), names_to = "Journey leg", values_to = "Distance (km)") %>% 
  mutate(`Journey leg` = gsub(pattern = "dist_|dist_leg_", replacement = "", `Journey leg`)) %>% 
  mutate(`Journey leg` = str_to_title(`Journey leg`)) %>%
  mutate(`Journey leg` = gsub(pattern = "Orig", replacement = "Origin", `Journey leg`)) %>% 
  mutate(`Journey leg` = gsub(pattern = "Dest", replacement = "Destination", `Journey leg`)) %>% 
  mutate(`Journey leg` = gsub(pattern = "Od", replacement = "Journey", `Journey leg`)) %>% 
  mutate(`Journey leg` = factor(`Journey leg`, labels = leg_labels))
# summary(distances_df$`Journey leg`)
distances_df %>% 
  ggplot() +
  geom_histogram(aes(`Distance (km)`), binwidth = 1) +
  # geom_density(aes(`Distance (km)`), stat = "count", bw = 100) + # too intricate
  facet_wrap(~`Journey leg`, nrow = 1) +
  xlim(c(-3, 30))
```

# Public transport routing

The route that people will take is not necessarily the one that goes to the closest rail station to their home.
It will usually be the route that minimises total journey time.

The total journey time can be calculated as the sum of the origin, public transport stage, and destination stages:

$$
T_j = T_o + T_p + T_d
$$

The time taken for each stage varies depending on the origin and destination station.
In this example we will focus only on the choice of the origin station.
We can find the three nearest stations to each origin as follows:

```{r}
origin = lwgeom::st_startpoint(desire_rail_clean$leg_orig)[1]
destination = lwgeom::st_endpoint(desire_rail_clean$leg_dest)[1]
```


```{r, echo=TRUE}
nearest_stations = nngeo::st_nn(origin, rail_stations, k = 3, progress = FALSE)
nearest_stations
```

```{r}
time_2021 = lubridate::ymd_hms("2021-08-11 07:30:00")
# route_pt1 = stplanr::route_google(from = rail_stations$geometry[1], to = destination, mode = "transit")
route_pt1 = stplanr::route_google(from = rail_stations$geometry[nearest_stations[[1]][1]], to = destination, mode = "transit", arrival_time = time_2021)
route_pt2 = stplanr::route_google(from = rail_stations$geometry[nearest_stations[[1]][2]], to = destination, mode = "transit", arrival_time = time_2021)
route_pt3 = stplanr::route_google(from = rail_stations$geometry[nearest_stations[[1]][3]], to = destination, mode = "transit", arrival_time = time_2021)
```

Based on this example, we can plot the three route options and show their associated times:

```{r}
plot(region$geometry)
plot(origin, add = TRUE, cex = 2)
plot(destination, add = TRUE, cex = 2)
plot(rail_stations$geometry[nearest_stations[[1]]], col = "red", add = TRUE, pch = 2)
plot(rail_stations$geometry, col = "red", add = TRUE)
plot(route_pt1, add = T)
plot(route_pt2, add = T)
plot(route_pt3, add = T)
```


```{r}
route_df = rbind(route_pt1, route_pt2, route_pt3) 
route_df %>% 
  select(matches("dist|dur")) %>% 
  select(-matches("traffic")) %>% 
  sf::st_drop_geometry() %>% 
  knitr::kable()
```

# Cycle routing


# Scaling the methods


# Discussion