Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Describe how to extract log update records #144

Open
pbchase opened this issue Dec 11, 2023 · 0 comments
Open

Describe how to extract log update records #144

pbchase opened this issue Dec 11, 2023 · 0 comments

Comments

@pbchase
Copy link
Contributor

pbchase commented Dec 11, 2023

I wrote this for a internal project to identify last-update events for fields and thus forms. I thought it woudl be complicated, but it was not that bad. This is the code that located every field update record for a single REDCap project, PID 2633. It has these features:

  • Reads directly from the databse backend.
  • Reads redcap_projects to locate the log table
  • Writes all the data to an RDS for easy reloading by RScript coders
  • Separates and pivots the data in the overloaded data_values filed to make that data tidy
  • Summarizes the data into the last-updated data fior each REDCap field name

I feel like this code has a lot of value, but I am not yet sure what to do with it. There is value in knowing that it can be done and that the that the wall-time required to do it is modest even on very large dataset. There is value in the code that gets the raw data out of REDCap. There is value in the transformation from raw data to tidy. That transformation uses some more modern tidyr methods--separate_longer_delim and separate_wider_delim-- currently in the experimental lifecycle. They were perfectly suited for the job. Its hard to imagine expressing that trasnfrmaiton more succinctly.

Having said all that, maybe this begs for two functions, get_redcap_data_update_values() and make_redcap_data_update_values_tidy(), and a vignette that shows a developer how to use them. It would have a refactored version of this script in it.

library(tidyverse)
library(REDCapR)
library(dotenv)
library(lubridate)
library(DBI)
library(RMariaDB)
library(redcapcustodian)
library(openxlsx)
library(rcc.adrc)

library(dotenv)

load_dot_env("redcap_prod.env")

init_etl("extract_log_update_records")

rc_conn <- connect_to_redcap_db()
pid <- 2633

log_event_table <- tbl(rc_conn, "redcap_projects") |>
  filter(project_id == pid) |>
  collect() |>
  pull(log_event_table)

system.time( {
  project_update_records <- tbl(rc_conn, log_event_table) |>
    filter(project_id == pid & event == "UPDATE" & object_type == "redcap_data") |>
    collect() |>
    mutate(ts = lubridate::ymd_hms(ts))
})

project_update_records |>
  count(description)

project_update_records |> saveRDS(here::here("output", "log_db_update_records_from_pid_2633.rds"))

# project_update_records <- readRDS(here::here("output", "log_db_update_records_from_pid_2633.rds"))

field_updates <-
  project_update_records |>
  select(ts, data_values) |>
  separate_longer_delim(data_values, ",\n") |>
  separate_wider_delim(data_values, " = ", names = c("field_name", "value"), too_many = "merge") |>
  mutate(value = str_replace_all(value, "'", ""))

field_updates |>
  group_by(field_name) |>
  summarise(max_date = max(ts))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant