Describe how to extract log update records #144

pbchase · 2023-12-11T16:31:30Z

I wrote this for a internal project to identify last-update events for fields and thus forms. I thought it woudl be complicated, but it was not that bad. This is the code that located every field update record for a single REDCap project, PID 2633. It has these features:

Reads directly from the databse backend.
Reads redcap_projects to locate the log table
Writes all the data to an RDS for easy reloading by RScript coders
Separates and pivots the data in the overloaded data_values filed to make that data tidy
Summarizes the data into the last-updated data fior each REDCap field name

I feel like this code has a lot of value, but I am not yet sure what to do with it. There is value in knowing that it can be done and that the that the wall-time required to do it is modest even on very large dataset. There is value in the code that gets the raw data out of REDCap. There is value in the transformation from raw data to tidy. That transformation uses some more modern tidyr methods--separate_longer_delim and separate_wider_delim-- currently in the experimental lifecycle. They were perfectly suited for the job. Its hard to imagine expressing that trasnfrmaiton more succinctly.

Having said all that, maybe this begs for two functions, get_redcap_data_update_values() and make_redcap_data_update_values_tidy(), and a vignette that shows a developer how to use them. It would have a refactored version of this script in it.

library(tidyverse)
library(REDCapR)
library(dotenv)
library(lubridate)
library(DBI)
library(RMariaDB)
library(redcapcustodian)
library(openxlsx)
library(rcc.adrc)

library(dotenv)

load_dot_env("redcap_prod.env")

init_etl("extract_log_update_records")

rc_conn <- connect_to_redcap_db()
pid <- 2633

log_event_table <- tbl(rc_conn, "redcap_projects") |>
  filter(project_id == pid) |>
  collect() |>
  pull(log_event_table)

system.time( {
  project_update_records <- tbl(rc_conn, log_event_table) |>
    filter(project_id == pid & event == "UPDATE" & object_type == "redcap_data") |>
    collect() |>
    mutate(ts = lubridate::ymd_hms(ts))
})

project_update_records |>
  count(description)

project_update_records |> saveRDS(here::here("output", "log_db_update_records_from_pid_2633.rds"))

# project_update_records <- readRDS(here::here("output", "log_db_update_records_from_pid_2633.rds"))

field_updates <-
  project_update_records |>
  select(ts, data_values) |>
  separate_longer_delim(data_values, ",\n") |>
  separate_wider_delim(data_values, " = ", names = c("field_name", "value"), too_many = "merge") |>
  mutate(value = str_replace_all(value, "'", ""))

field_updates |>
  group_by(field_name) |>
  summarise(max_date = max(ts))

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Describe how to extract log update records #144

Describe how to extract log update records #144

pbchase commented Dec 11, 2023

Describe how to extract log update records #144

Describe how to extract log update records #144

Comments

pbchase commented Dec 11, 2023