Starter on updates for 2023 data #254

Merged
merged 16 commits into from
Jan 13, 2025
7 changes: 3 additions & 4 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Package: stats19
Title: Work with Open Road Traffic Casualty Data from Great Britain
Version: 3.2.0
Version: 3.3.0
Authors@R: c(
person("Robin", "Lovelace", email = "[email protected]", role = c("aut", "cre"),
comment = c(ORCID = "0000-0001-5679-6536")),
Expand All @@ -24,7 +24,7 @@ Description: Tools to help download, process and analyse the UK road collision d
The statistics relate only to events on public roads that were reported
to the police, and subsequently recorded, using the 'STATS19' collision reporting form. See
the Department for Transport website
<https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data> for more
<https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-accidents-safety-data> for more
information on these datasets.
The package is described in a paper in the Journal of Open Source Software
(Lovelace et al. 2019) <doi:10.21105/joss.01181>.
Expand Down Expand Up @@ -61,12 +61,11 @@ Suggests:
htmltools,
tmap,
jsonlite,
pct,
spatstat.geom,
osmdata,
covr
VignetteBuilder: knitr
RoxygenNote: 7.2.3
RoxygenNote: 7.3.2
Roxygen: list(markdown = TRUE)
Language: en-US
X-schema.org-keywords: stats19, road-safety, transport, car-crashes, ropensci, data
5 changes: 5 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,8 @@
# stats19 3.3.0 2025-01

* Support for 2023 data (#251)
* Another round of updates to the schema files thanks to updates from the DfT

# stats19 3.2.0 2024-10

* Updates so package functions fail gracefully when input data is not as expected, e.g. due to URL changes (#252)
Expand Down
2 changes: 1 addition & 1 deletion R/dl.R
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@
#' # with type as casualty
#' dl_stats19(year = 2022, type = "casualty")
#' # try another year
#' dl_stats19(year = 2018)
#' dl_stats19(year = 2023)
#' }
#' }
dl_stats19 = function(year = NULL,
Expand Down
28 changes: 13 additions & 15 deletions R/format.R
Original file line number Diff line number Diff line change
Expand Up @@ -7,17 +7,9 @@
#' @export
#' @examples
#' \donttest{
#' if(curl::has_internet()) {
#' dl_stats19(year = 2022, type = "collision")
#' x = read_collisions(year = 2022, format = FALSE)
#' x = readr::read_csv("https://github.com/ropensci/stats19/releases/download/v3.0.0/fatalities.csv")
#' if(nrow(x) > 0) {
#' x[1:3, 1:12]
#' crashes = format_collisions(x)
#' crashes[1:3, 1:12]
#' summary(crashes$datetime)
#' }
#' }
#' if(curl::has_internet()) {
#' dl_stats19(year = 2022, type = "collision")
#' }
#' }
#' @export
format_collisions = function(x) {
Expand Down Expand Up @@ -66,19 +58,24 @@ format_stats19 = function(x, type) {
# Rename columns
old_names = names(x)
new_names = format_column_names(old_names)
# waldo::compare(old_names, new_names) They are the same for 2023 data
# TODO: remove format_column_names() and use stats19::stats19_schema$variable
names(x) = new_names

# create lookup table
lkp = stats19::stats19_variables[stats19::stats19_variables$table == type,]
lkp = stats19::stats19_variables[stats19::stats19_variables$table == tolower(type),]

vkeep = new_names %in% stats19::stats19_schema$variable_formatted
vkeep = new_names %in% stats19::stats19_schema$variable
vars_to_change = which(vkeep)

# # for testing
# browser()
# i = 1
# x_old = x
for(i in vars_to_change) {
lkp_name = lkp$column_name[lkp$column_name == new_names[i]]
lkp_name = unique(lkp$variable[lkp$variable %in% new_names[i]])
lookup = stats19::stats19_schema[
stats19::stats19_schema$variable_formatted == lkp_name,
stats19::stats19_schema$variable %in% lkp_name,
c("code", "label")
]
original_class = class(x[[i]])
Expand All @@ -88,6 +85,7 @@ format_stats19 = function(x, type) {
x[[i]] = ifelse(is.na(matched_labels), x[[i]], matched_labels)
x[[i]] = methods::as(x[[i]], original_class)
}
# waldo::compare(x_old, x)

date_in_names = "date" %in% names(x)
if(date_in_names) {
Expand Down
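Note on the `format_stats19()` hunk above: the loop now matches on `stats19_schema$variable` with `%in%` instead of `variable_formatted` with `==`. A minimal, self-contained sketch of the same code-to-label lookup follows. The `schema` and `x` data frames and the `relabel()` helper are hypothetical stand-ins, not the package's real objects, and the final `methods::as()` class-restoring step is left out for brevity.

```r
# Toy schema: hypothetical values, NOT the real stats19::stats19_schema
schema = data.frame(
  variable = c("accident_severity", "accident_severity", "accident_severity"),
  code = c("1", "2", "3"),
  label = c("Fatal", "Serious", "Slight"),
  stringsAsFactors = FALSE
)

# Toy raw data with integer codes; 9 has no entry in the schema
x = data.frame(
  accident_index = c("a1", "a2", "a3"),
  accident_severity = c(3L, 1L, 9L),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the loop in the diff
relabel = function(x, schema) {
  # Only columns that appear in the schema are touched
  vars_to_change = which(names(x) %in% schema$variable)
  for (i in vars_to_change) {
    lookup = schema[schema$variable %in% names(x)[i], c("code", "label")]
    # match() coerces the integer codes to character before comparing
    matched_labels = lookup$label[match(x[[i]], lookup$code)]
    # Keep the original value where no label was found
    x[[i]] = ifelse(is.na(matched_labels), x[[i]], matched_labels)
  }
  x
}

x_formatted = relabel(x, schema)
x_formatted$accident_severity
# "Slight" "Fatal" "9"
```

Codes without a schema entry pass through unchanged, which is the behaviour the `ifelse(is.na(matched_labels), ...)` line in the diff is there to preserve.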
2 changes: 1 addition & 1 deletion README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ knitr::opts_chunk$set(
# stats19 <a href='https://docs.ropensci.org/stats19/'><img src='https://raw.githubusercontent.com/ropensci/stats19/master/man/figures/logo.png' align="right" height=215/></a>

**stats19** provides functions for downloading and formatting road crash data.
Specifically, it enables access to the UK's official road traffic casualty database, [STATS19](https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data). (The name comes from the form used by the police to record car crashes and other incidents resulting in casualties on the roads.)
Specifically, it enables access to the UK's official road traffic casualty database, [STATS19](https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-accidents-safety-data). (The name comes from the form used by the police to record car crashes and other incidents resulting in casualties on the roads.)

The raw data is provided as a series of `.csv` files that contain integers and which are stored in dozens of `.zip` files.
Finding, reading-in and formatting the data for research can be a time consuming process subject to human error.
Expand Down
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ cycle](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://l
**stats19** provides functions for downloading and formatting road crash
data. Specifically, it enables access to the UK’s official road traffic
casualty database,
[STATS19](https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data).
[STATS19](https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-accidents-safety-data).
(The name comes from the form used by the police to record car crashes
and other incidents resulting in casualties on the roads.)

Expand Down
6 changes: 1 addition & 5 deletions cran-comments.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,4 @@
Apologies for missing the file that failed to pass the auto checks.

Updated to remove README.html, also the package has now been tested without wifi and the tests pass.

Updates so package functions fail gracefully when input data is not as expected, e.g. due to URL changes.
Various updates, including removal of `pct` from Suggests, and fixes to support new datasets from the UK Department for Transport.

## R CMD check results

Expand Down
5 changes: 4 additions & 1 deletion data-raw/all-crashes.R
Original file line number Diff line number Diff line change
@@ -1 +1,4 @@
a = stats19::read_collisions(year = 1979)
devtools::load_all()
a_new = get_stats19(year = 1979, type = "collision", data_dir = tempdir())

a = read_collisions(year = 1979)
24 changes: 14 additions & 10 deletions data-raw/file_name_df.csv
Original file line number Diff line number Diff line change
@@ -1,14 +1,13 @@
file_name,url
dft-road-casualty-statistics-casualty-adjustment-lookup_2004-latest-published-year.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-adjustment-lookup_2004-latest-published-year.csv
dft-road-casualty-statistics-collision-adjustment-lookup_2004-latest-published-year.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-collision-adjustment-lookup_2004-latest-published-year.csv
dft-road-casualty-statistics-vehicle-e-scooter-2020-Latest-Published-Year.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-vehicle-e-scooter-2020-Latest-Published-Year.csv
dft-road-casualty-statistics-historical-revisions-data.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-historical-revisions-data.csv
dft-road-casualty-statistics-vehicle-provisional-mid-year-unvalidated-2024.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-vehicle-provisional-mid-year-unvalidated-2024.csv
dft-road-casualty-statistics-casualty-provisional-mid-year-unvalidated-2024.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-provisional-mid-year-unvalidated-2024.csv
dft-road-casualty-statistics-collision-provisional-mid-year-unvalidated-2024.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-collision-provisional-mid-year-unvalidated-2024.csv
dft-road-casualty-statistics-casualty-2023.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-2023.csv
dft-road-casualty-statistics-vehicle-2023.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-vehicle-2023.csv
dft-road-casualty-statistics-collision-2023.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-collision-2023.csv
dft-road-casualty-statistics-casualty-2022.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-2022.csv
dft-road-casualty-statistics-vehicle-2022.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-vehicle-2022.csv
dft-road-casualty-statistics-collision-2022.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-collision-2022.csv
dft-road-casualty-statistics-casualty-1979-latest-published-year.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-1979-latest-published-year.csv
dft-road-casualty-statistics-vehicle-1979-latest-published-year.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-vehicle-1979-latest-published-year.csv
dft-road-casualty-statistics-collision-1979-latest-published-year.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-collision-1979-latest-published-year.csv
dft-road-casualty-statistics-casualty-2021.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-2021.csv
dft-road-casualty-statistics-vehicle-2021.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-vehicle-2021.csv
dft-road-casualty-statistics-collision-2021.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-collision-2021.csv
Expand All @@ -18,9 +17,14 @@ dft-road-casualty-statistics-collision-2020.csv,https://data.dft.gov.uk/road-acc
dft-road-casualty-statistics-casualty-2019.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-2019.csv
dft-road-casualty-statistics-vehicle-2019.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-vehicle-2019.csv
dft-road-casualty-statistics-collision-2019.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-collision-2019.csv
dft-road-casualty-statistics-casualty-2018.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-2018.csv
dft-road-casualty-statistics-vehicle-2018.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-vehicle-2018.csv
dft-road-casualty-statistics-collision-2018.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-collision-2018.csv
dft-road-casualty-statistics-casualties-adjustment-last-5-years.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualties-adjustment-last-5-years.csv
dft-road-casualty-statistics-collision-adjustment-last-5-years.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-collision-adjustment-last-5-years.csv
dft-road-casualty-statistics-casualty-adjustment-lookup_2004-latest-published-year.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-adjustment-lookup_2004-latest-published-year.csv
dft-road-casualty-statistics-collision-adjustment-lookup_2004-latest-published-year.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-collision-adjustment-lookup_2004-latest-published-year.csv
dft-road-casualty-statistics-casualty-1979-latest-published-year.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-1979-latest-published-year.csv
dft-road-casualty-statistics-vehicle-1979-latest-published-year.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-vehicle-1979-latest-published-year.csv
dft-road-casualty-statistics-collision-1979-latest-published-year.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-collision-1979-latest-published-year.csv
dft-road-casualty-statistics-casualty-last-5-years.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-last-5-years.csv
dft-road-casualty-statistics-vehicle-last-5-years.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-vehicle-last-5-years.csv
dft-road-casualty-statistics-collision-last-5-years.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-collision-last-5-years.csv
dft-road-casualty-statistics-historical-revisions-data.csv,https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-historical-revisions-data.csv
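Note on `file_name_df.csv`: every `url` value is the shared base URL followed by the `file_name`. A rough sketch of how such a table could be rebuilt and filtered by year in base R is given below. The short `file_names` vector is a hand-picked subset of the names listed above, and `files_2023` is an illustrative name, not an object from the package.

```r
# Base URL shared by every row of the CSV above
base_url = "https://data.dft.gov.uk/road-accidents-safety-data/"

# Hand-picked subset of the file names listed in the diff
file_names = c(
  "dft-road-casualty-statistics-casualty-2023.csv",
  "dft-road-casualty-statistics-vehicle-2023.csv",
  "dft-road-casualty-statistics-collision-2023.csv",
  "dft-road-casualty-statistics-collision-2022.csv",
  "dft-road-casualty-statistics-collision-1979-latest-published-year.csv"
)

# Two-column table in the same shape as file_name_df.csv
file_name_df = data.frame(
  file_name = file_names,
  url = paste0(base_url, file_names),
  stringsAsFactors = FALSE
)

# Keep only the files for a given year, as misc.Rmd does with grepl()
files_2023 = file_name_df[grepl("2023", file_name_df$file_name), ]
nrow(files_2023)
# 3
```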
24 changes: 14 additions & 10 deletions data-raw/file_names.txt
Original file line number Diff line number Diff line change
@@ -1,13 +1,12 @@
dft-road-casualty-statistics-casualty-adjustment-lookup_2004-latest-published-year.csv
dft-road-casualty-statistics-collision-adjustment-lookup_2004-latest-published-year.csv
dft-road-casualty-statistics-vehicle-e-scooter-2020-Latest-Published-Year.csv
dft-road-casualty-statistics-historical-revisions-data.csv
dft-road-casualty-statistics-vehicle-provisional-mid-year-unvalidated-2024.csv
dft-road-casualty-statistics-casualty-provisional-mid-year-unvalidated-2024.csv
dft-road-casualty-statistics-collision-provisional-mid-year-unvalidated-2024.csv
dft-road-casualty-statistics-casualty-2023.csv
dft-road-casualty-statistics-vehicle-2023.csv
dft-road-casualty-statistics-collision-2023.csv
dft-road-casualty-statistics-casualty-2022.csv
dft-road-casualty-statistics-vehicle-2022.csv
dft-road-casualty-statistics-collision-2022.csv
dft-road-casualty-statistics-casualty-1979-latest-published-year.csv
dft-road-casualty-statistics-vehicle-1979-latest-published-year.csv
dft-road-casualty-statistics-collision-1979-latest-published-year.csv
dft-road-casualty-statistics-casualty-2021.csv
dft-road-casualty-statistics-vehicle-2021.csv
dft-road-casualty-statistics-collision-2021.csv
Expand All @@ -17,9 +16,14 @@ dft-road-casualty-statistics-collision-2020.csv
dft-road-casualty-statistics-casualty-2019.csv
dft-road-casualty-statistics-vehicle-2019.csv
dft-road-casualty-statistics-collision-2019.csv
dft-road-casualty-statistics-casualty-2018.csv
dft-road-casualty-statistics-vehicle-2018.csv
dft-road-casualty-statistics-collision-2018.csv
dft-road-casualty-statistics-casualties-adjustment-last-5-years.csv
dft-road-casualty-statistics-collision-adjustment-last-5-years.csv
dft-road-casualty-statistics-casualty-adjustment-lookup_2004-latest-published-year.csv
dft-road-casualty-statistics-collision-adjustment-lookup_2004-latest-published-year.csv
dft-road-casualty-statistics-casualty-1979-latest-published-year.csv
dft-road-casualty-statistics-vehicle-1979-latest-published-year.csv
dft-road-casualty-statistics-collision-1979-latest-published-year.csv
dft-road-casualty-statistics-casualty-last-5-years.csv
dft-road-casualty-statistics-vehicle-last-5-years.csv
dft-road-casualty-statistics-collision-last-5-years.csv
dft-road-casualty-statistics-historical-revisions-data.csv
56 changes: 34 additions & 22 deletions data-raw/misc.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -12,10 +12,8 @@ all_links = page %>%
html_nodes("a") %>% # find all links
html_attr("href")

zips = all_links %>% str_subset("\\.zip")
csvs = all_links %>% str_subset("\\.csv")
r = all_links %>% str_subset("\\.csv")

r = c(zips, csvs)
dr = c()
for(i in 1:length(r)) {
dr[i] = sub("https://data.dft.gov.uk/road-accidents-safety-data/",
Expand Down Expand Up @@ -45,25 +43,38 @@ writeLines(file_names_char, "data-raw/file_names.txt")
readr::write_csv(file_name_df, "data-raw/file_name_df.csv")
file.edit("data-raw/file_names.txt")
file.remove("file_names_old.rda")
# All file names with 1979 in the name
file_names_1979 = file_names[grepl("1979", names(file_names))]
# $`dft-road-casualty-statistics-casualty-1979-latest-published-year.csv`
# [1] "dft-road-casualty-statistics-casualty-1979-latest-published-year.csv"

# $`dft-road-casualty-statistics-vehicle-1979-latest-published-year.csv`
# [1] "dft-road-casualty-statistics-vehicle-1979-latest-published-year.csv"

# $`dft-road-casualty-statistics-collision-1979-latest-published-year.csv`
# [1] "dft-road-casualty-statistics-collision-1979-latest-published-year.csv"

# 2023 data:
file_names_2023 = file_names[grepl("2023", names(file_names))]

file_names$`accident-and-casualty-adjustment-2004-to-2019.zip`
file_names$`accident-and-casualty-adjustment-2004-to-2019.zip` = NULL
file_names$`accident-and-casualty-adjustment-2004-to-2019.zip`
usethis::use_data(file_names, overwrite = TRUE)
```

The `accidents_sample_raw` can be (re)generated using:

```{r}
devtools::load_all()
# Obtained with:
dl_stats19(year = 2022, type = "collison")
accidents_2022_raw = read_collisions(year = 2022)
dl_stats19(year = 2023, type = "collision")
accidents_2023_raw = read_collisions(year = 2023)
accidents_2023_raw = get_stats19(year = 2023, type = "collision", data_dir = tempdir(), format = FALSE)
set.seed(350)
sel = sample(nrow(accidents_2022_raw), 3)
accidents_sample_raw = accidents_2022_raw[sel, ]
sel = sample(nrow(accidents_2023_raw), 3)
accidents_sample_raw = accidents_2023_raw[sel, ]
# accidents_sample = format_collisions(accidents_sample_raw)
accidents_sample = accidents_sample_raw
waldo::compare(accidents_sample_raw, accidents_sample)
accidents_sample_formatted = format_collisions(accidents_sample)
waldo::compare(accidents_sample_raw, accidents_sample_formatted)
usethis::use_data(accidents_sample_raw, overwrite = TRUE)
usethis::use_data(accidents_sample, overwrite = TRUE)
```
Expand All @@ -72,26 +83,27 @@ Similarly for casualties, use:

```{r}
# Obtained with:
dl_stats19(year = 2022, type = "cas")
casualties_2022_raw = read_casualties(year = 2022)
casualties_2023_raw = get_stats19(year = 2023, type = "casualty", data_dir = tempdir(), format = FALSE)
set.seed(350)
sel = sample(nrow(casualties_2022_raw), 3)
casualties_sample_raw = casualties_2022_raw[sel, ]
sel = sample(nrow(casualties_2023_raw), 3)
casualties_sample_raw = casualties_2023_raw[sel, ]
# casualties_sample = format_casualties(casualties_sample_raw)
casualties_sample = casualties_sample_raw
casualties_sample_formatted = format_casualties(casualties_sample)
waldo::compare(casualties_sample_raw, casualties_sample_formatted)
usethis::use_data(casualties_sample, overwrite = TRUE)
```

and for vehicles, use:

```{r}
# Obtained with:
dl_stats19(year = 2022, type = "veh")
vehicles_2022_raw = read_vehicles(year = 2022)
vehicles_2023_raw = get_stats19(year = 2023, type = "vehicle", data_dir = tempdir(), format = FALSE)
set.seed(350)
sel = sample(nrow(vehicles_2022_raw), 3)
vehicles_sample_raw = vehicles_2022_raw[sel, ]
# vehicles_sample = format_vehicles(vehicles_sample_raw)
vehicles_sample = vehicles_2022_raw[sel,]
sel = sample(nrow(vehicles_2023_raw), 3)
vehicles_sample_raw = vehicles_2023_raw[sel, ]
vehicles_sample_formatted = format_vehicles(vehicles_sample_raw)
vehicles_sample = vehicles_2023_raw[sel,]
waldo::compare(vehicles_sample_raw, vehicles_sample_formatted)
usethis::use_data(vehicles_sample, overwrite = TRUE)
```

Expand Down
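Note on the sample-data chunks above: each one calls `set.seed(350)` before `sample()` so that the three-row samples can be regenerated exactly. A small sketch of that pattern on made-up data follows. The `raw` data frame is hypothetical and stands in for the output of `get_stats19(..., format = FALSE)`, which needs an internet connection.

```r
# Hypothetical stand-in for a raw STATS19 table
raw = data.frame(
  id = 1:100,
  severity = rep(c("Slight", "Serious"), 50),
  stringsAsFactors = FALSE
)

# First draw: fix the seed, then sample three row indices
set.seed(350)
sel = sample(nrow(raw), 3)
sample_a = raw[sel, ]

# Second draw with the same seed selects the same rows
set.seed(350)
sample_b = raw[sample(nrow(raw), 3), ]

identical(sample_a, sample_b)
# TRUE
```

Resetting the seed immediately before each `sample()` call, as the chunks do, keeps every sample independent of the order in which the chunks are run.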