Update mapsinR26May2023.qmd #44

Open
wants to merge 21 commits into
base: main
Binary file added data/CCGoutcomes.zip
Binary file not shown.
@@ -0,0 +1,16 @@
{
"hash": "ef75c07640c5bbb30dd1380194868cff",
"result": {
"markdown": "---\ntitle: Maps in R using ggplot2 and OSM packages\nsubtitle: \"NHSE R drop in session.\"\nauthor: \"Pablo Leon\"\ndate: \"2023-10-06\"\ncategories: [NHS England]\n---\n\n::: {.cell}\n\n```{.r .cell-code}\n# List of required packages\nrequired_packages <- c(\"here\", \"sf\", \"ggplot2\", \"readxl\", \"dplyr\", \"janitor\")\n\n# Check if required packages are not installed, then install and load them\nfor (package in required_packages) {\n if (!requireNamespace(package, quietly = TRUE)) {\n install.packages(package)\n }\n}\n\n# Load the required packages\nlibrary(here)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nhere() starts at /Users/craigshenton/Documents/GitHub/nhs-r-reporting\n```\n:::\n\n```{.r .cell-code}\nlibrary(sf)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nLinking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE\n```\n:::\n\n```{.r .cell-code}\nlibrary(ggplot2)\nlibrary(readxl)\nlibrary(dplyr)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n\nAttaching package: 'dplyr'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nThe following objects are masked from 'package:stats':\n\n filter, lag\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nThe following objects are masked from 'package:base':\n\n intersect, setdiff, setequal, union\n```\n:::\n\n```{.r .cell-code}\nlibrary(janitor)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n\nAttaching package: 'janitor'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nThe following objects are masked from 'package:stats':\n\n chisq.test, fisher.test\n```\n:::\n:::\n\n\n## Intro {background=\"#43464B\"}\n\nThese slides were presented in the NHSE-R drop in session on Friday 26th May 2023.\n\nIn R we can plot geospatial data using several methods, today I will focus on static maps using `ggplot2` and `osm` packages. 
Drawing a map usually involves these steps:\n\n- Getting shapefiles to draw a map\n- Obtaining metadata to plot on the map\n- In R we draw the multipolygon objects using `geom_sf()` and `coord_sf()`\n- Then we can take advantage of the ggplot2 framework to overlay one map on top of another\n\nAn introduction to plotting maps in R using ggplot2 can be found in this online book: <https://ggplot2-book.org/maps.html>.\n\n\n\n## 1. NHS Health boundaries{auto-animate=\"true\" background=\"#43464B\"}\n\nThe Office for National Statistics provides free and open access to several geographic products. There is a specific section for `Health boundaries` on their `Open Geography Portal` website: <https://geoportal.statistics.gov.uk/>. \n\nFrom the `Clinical Commissioning Groups` section, download the `2021 Boundaries` zipped **shapefile**.\n\n::: columns\n::: {.column width=\"40%\"}\n\n![Health Boundaries](Figures_maps_slides/02 Health boundaries.png)\n:::\n\n::: {.column width=\"60%\"}\n\n![Health Boundaries details](Figures_maps_slides/05 Unzip Shapefile.png)\n:::\n:::\n\n\n## 2. Unzip CCG boundaries into R {background=\"#43464B\"}\n\nLoad unzipped files into R using `Open Geography Portal` \n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Define a function to download and cache the CCG data\ndownload_CCG_data <- function() {\n data_dir <- here::here(\"data\") # Use here::here() for consistent path resolution\n \n # Check if the data directory exists, create it if not\n if (!dir.exists(data_dir)) {\n dir.create(data_dir, recursive = TRUE)\n }\n \n zip_file <- file.path(data_dir, \"CCGoutcomes.zip\")\n \n # Check if the file already exists, if not, download and unzip it\n if (!file.exists(zip_file)) {\n download.file(\n url = \"https://files.digital.nhs.uk/48/4DB2CA/CCG_OIS_MAR_2022_Excels_Files.zip\",\n destfile = zip_file\n )\n unzip(zip_file, exdir = data_dir, junkpaths = TRUE)\n }\n}\n\n# Call the function to download and cache the CCG data\ndownload_CCG_data()\n```\n:::\n\n\n## 3. 
Check shapefiles content {background=\"#43464B\"}\n\nCheck the unzipped file contents. We should obtain a collection of boundary files, including the `.shp` file for the CCG map.\n\n\n::: columns\n::: {.column width=\"40%\"}\n\n![Health Boundaries](Figures_maps_slides/06 Unzip Shapefile.png)\n:::\n\n::: {.column width=\"60%\"}\n\n:::\n:::\n\n## 4. Load CCG Shapefile and check map {background=\"#43464B\"}\n\nNow we can plot the CCG map for England using the `ggplot()` and `geom_sf()` functions.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Function to load CCG boundaries data\nload_CCG_boundaries <- function() {\n # Load CCG boundaries shapefile\n shapefile_path <- \"data/Clinical_Commissioning_Groups_April_2021/CCG_APR_2021_EN_BGC.shp\"\n \n # Read the shapefile using sf::st_read()\n CCG_boundaries <- sf::st_read(shapefile_path)\n return(CCG_boundaries)\n}\n\n# Function to create CCG map\ncreate_CCG_map <- function(CCG_boundaries) {\n # Create ggplot map\n CCG_map <- ggplot2::ggplot(data = CCG_boundaries) +\n ggplot2::geom_sf(size = 0.5, color = \"black\", fill = \"coral\") +\n ggplot2::ggtitle(\"CCG Boundaries plot. April 2021\") +\n ggplot2::coord_sf()\n \n return(CCG_map)\n}\n\nCCG_boundaries <- load_CCG_boundaries()\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nReading layer `CCG_APR_2021_EN_BGC' from data source \n `/Users/craigshenton/Documents/GitHub/nhs-r-reporting/docs/tutorials/R_drop_in_session_26_may_Maps_in_R/data/Clinical_Commissioning_Groups_April_2021/CCG_APR_2021_EN_BGC.shp' \n using driver `ESRI Shapefile'\nSimple feature collection with 106 features and 7 fields\nGeometry type: MULTIPOLYGON\nDimension: XY\nBounding box: xmin: 82668.52 ymin: 5352.6 xmax: 655653.8 ymax: 657539.2\nProjected CRS: OSGB36 / British National Grid\n```\n:::\n\n```{.r .cell-code}\nCCG_map <- create_CCG_map(CCG_boundaries)\nprint(CCG_map)\n```\n\n::: {.cell-output-display}\n![](mapsinR26May2023_files/figure-html/unnamed-chunk-3-1.png){width=672}\n:::\n:::\n\n\n## 5. 
Obtain NHS Indicators {background=\"#43464B\"}\n\nFor this presentation I will download the `CCG Outcomes Indicator` set for March 2022 from the NHS Digital website: \nhttps://digital.nhs.uk/data-and-information/publications/statistical/ccg-outcomes-indicator-set/march-2022\n\nThe Office for National Statistics provides free and open access to several geographic products. There is a specific section for Health boundaries on their website: \n\n![Health Boundaries](Figures_maps_slides/CCG outcomes indicator set.png)\n\n## 6. Download CCG Outcomes Indicators {background=\"#43464B\"}\n\nAfter loading the shapefile we `download` the `indicators` to be plotted on the map \n\n![CCG Indicators](Figures_maps_slides/05 Load ccg indicators.png)\n\n## 7. Data wrangling {background=\"#43464B\"}\n\nNow we `combine` the ONS `shapefiles` with the CCG Outcomes `Indicators` data, ready to be plotted on the map.\n\nWe will plot this indicator: 1.17 - Percentage of new cases of cancer for which a valid stage is recorded at the time of diagnosis.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(dplyr)\n# Function to load and preprocess cancer data\nload_and_preprocess_cancer_data <- function() {\n # Load cancer data from Excel file\n cancer_data <- readxl::read_excel(here::here(\"data\", \"CCG_1.17_I01968_D.xlsx\"), sheet = 3, skip = 13) \n \n # Clean column names, select necessary columns, and filter rows\n cleaned_cancer_data <- cancer_data %>%\n janitor::clean_names() %>%\n dplyr::select(reporting_period, breakdown, ons_code, level, level_description, indicator_value) %>%\n dplyr::filter(level_description != \"England\")\n \n return(cleaned_cancer_data)\n}\n\n# Main function for data wrangling\ncancer_data_analysis <- function() {\n # Load and preprocess cancer data\n cancer_data <- load_and_preprocess_cancer_data()\n \n # Additional processing or analysis can be performed here\n \n # Print the resulting dataframe\n print(cancer_data)\n}\n\n# Call the main function for data analysis\ncancer_new_sel <- 
cancer_data_analysis()\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 742 × 6\n reporting_period breakdown ons_code level level_description indicator_value\n <dbl> <chr> <chr> <chr> <chr> <dbl>\n 1 2019 CCG E38000006 02P NHS Barnsley CCG 68 \n 2 2019 CCG E38000007 99E NHS Basildon and … 73.5\n 3 2019 CCG E38000008 02Q NHS Bassetlaw CCG 68.5\n 4 2019 CCG E38000231 92G NHS Bath and Nort… 78.7\n 5 2019 CCG E38000249 M1J4Y NHS Bedfordshire,… 80.2\n 6 2019 CCG E38000221 15A NHS Berkshire Wes… 70 \n 7 2019 CCG E38000220 15E NHS Birmingham an… 75.4\n 8 2019 CCG E38000250 D2P2L NHS Black Country… 76.8\n 9 2019 CCG E38000014 00Q NHS Blackburn wit… 83.6\n10 2019 CCG E38000015 00R NHS Blackpool CCG 76.8\n# ℹ 732 more rows\n```\n:::\n:::\n\n\n## 8. Merge shapefile and Outcome Indicators {background=\"#43464B\"} \n\nThis is the final step prior to plotting the map. We `merge` shapefile with `indicator` files. \n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Rename columns in cancer_new_sel\ncancer_MAP_rename <- cancer_new_sel %>%\n dplyr::select(\n reporting_period,\n breakdown,\n CCG21CD = ons_code,\n level,\n level_description,\n indicator_value\n )\n\n# Merge shapefile and metric data using left_join from dplyr\nmapdata <- dplyr::left_join(CCG_boundaries, cancer_MAP_rename, by = \"CCG21CD\")\n\n# Apply projection (EPSG:4326) to merged data to plot the map\nmapdata_coord <- sf::st_transform(mapdata, crs = 4326)\n```\n:::\n\n\n## 9. 
Plot map in ggplot2 {background=\"#43464B\"}\n\nFinally, we can add a title and labels to the ggplot map we have just created.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Plot map combining shapefile and CCG indicator\ncancer_map_blues <- mapdata_coord %>% \n ggplot2::ggplot(ggplot2::aes(fill = indicator_value)) +\n ggplot2::geom_sf() +\n ggplot2::labs(\n title = \"CCG OIS Indicator 1.17 - Record of Stage of Cancer at Diagnosis\",\n subtitle = \"Percentage of New Cases of Cancer for Which a Valid Stage is Recorded \\n at the Time of Diagnosis (95% CI)\",\n caption = \"Data source: NHS Digital National Disease Registration Service (NDRS)\"\n ) +\n ggplot2::theme_minimal() +\n ggplot2::scale_fill_viridis_c(name = \"Indicator Value\")\n\n# Print the map\nprint(cancer_map_blues)\n```\n\n::: {.cell-output-display}\n![](mapsinR26May2023_files/figure-html/unnamed-chunk-6-1.png){width=672}\n:::\n:::\n\n\n\n## 10. Open street maps using OSM package {background=\"#43464B\"}\n\nThe second part of this presentation will cover how to build city maps in R using the `osmdata` package. This package allows us to download and use data from `OpenStreetMap (OSM)`.\n\nPackage details: <https://cran.r-project.org/web/packages/osmdata/vignettes/osmdata.html>\n\nOpenStreetMap (OSM) is a global open access mapping project, which is free and open under the ODbL licence. 
OpenStreetMap contributors 2017: <https://www.openstreetmap.org/#map=6/54.910/-3.432>\n\n\nFollow all the details for this second map on my website: <https://pablo-source.github.io/City_maps.html>\n\n\n\n## Online resources to build maps in R {background=\"#43464B\"}\n\nThis presentation covers only a handful of options; please check this repo for facets and grid options.\n\n- Maps-in-R> GitHub repo: <https://github.com/Pablo-source/Maps-in-R>\n- Examples and Shapefiles: <https://github.com/Pablo-source/Maps-in-R/tree/main>\n- OSM-maps> GitHub repo: <https://github.com/Pablo-source/Maps-in-R/tree/main/City_maps>\n- CCG Outcomes Indicator Set - March 2022: <https://digital.nhs.uk/data-and-information/publications/statistical/ccg-outcomes-indicator-set/march-2022>\n- Map projections using the “sf” package in R: <https://cran.r-project.org/web/packages/oce/vignettes/D_map_projections.html>\n- R Spatial Workshop Notes: <https://spatialanalysis.github.io/workshop-notes/spatial-clustering.html>\n- NHS-R NHS Colour Guidelines: <https://nhsengland.github.io/nhs-r-reporting/tutorials/nhs-colours.html>\n\n\n\n\n## Shapefiles {background=\"#43464B\"}\n\n- The Open Geography portal from the Office for National Statistics (ONS): \n<https://geoportal.statistics.gov.uk/>\n- The London Datastore: Shapefiles and plenty of social indicators to plot (OA, LSOA, MSOA, Wards)\n<https://data.london.gov.uk/dataset/statistical-gis-boundary-files-london>\n\nOnline books about building maps in R:\n\n- Geocomputation with R: <https://bookdown.org/robinlovelace/geocompr/spatial-class.html>\n- Chapter 16. Geospatial – Big Book of R: <https://www.bigbookofr.com/geospatial.html>\n\n- Any questions? [email protected], <https://github.com/Pablo-source>",
"supporting": [
"mapsinR26May2023_files"
],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}
1 change: 1 addition & 0 deletions docs/_site/listings.json
@@ -9,6 +9,7 @@
{
"listing": "/tutorials/index.html",
"items": [
"/tutorials/R_drop_in_session_26_may_Maps_in_R/mapsinR26May2023.html",
"/tutorials/quarto.html",
"/tutorials/udal.html",
"/tutorials/rap.html",