Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read job_summary_data from multiple log rows #109

Closed
pbchase opened this issue Apr 26, 2023 · 4 comments
Closed

Read job_summary_data from multiple log rows #109

pbchase opened this issue Apr 26, 2023 · 4 comments
Assignees

Comments

@pbchase
Copy link
Contributor

pbchase commented Apr 26, 2023

It would be helpful to have a function that codifies the recipe for reading the job_summary_data out of a log entry for a group of similar records. The need derives from our use of JSON to store the data, MySQL's inability to parse it, and JSONLite's lack of vectorization. Reading one record of job_summary_data data is easy, but most use cases can't identify the single record of interest or they want records across a span of time. Here's some sample code that reads all the records saved from a script named warn_owners_of_impending_bill

library(dotenv)
load_dot_env("prod.env")

library(redcapcustodian)
library(rcc.billing)
library(RMariaDB)
library(DBI)
library(tidyverse)
library(lubridate)
library(dotenv)

init_etl("pbc_scratch")

log_conn <- get_package_scope_var("log_con")

log_data <- tbl(log_conn, "rcc_job_log") %>%
  dplyr::filter(script_name == "warn_owners_of_impending_bill") %>%
  collect()

log_ids <- log_data$id
names(log_ids) <- log_ids

purrr::map2_dfr(log_ids, log_data$job_summary_data, ~ jsonlite::fromJSON(.y), .id = "id") %>%
  mutate(id = as.integer(id)) %>%
  left_join(log_data %>% select(-job_summary_data), by = "id")

The interesting part is the last two stanzas of code where we give names to a vector of row IDs to query, iteratively read the JSON from those rows, then join the rest of the log data to the decoded JSON:

log_ids <- log_data$id
names(log_ids) <- log_ids

purrr::map2_dfr(log_ids, log_data$job_summary_data, ~ jsonlite::fromJSON(.y), .id = "id") %>%
  mutate(id = as.integer(id)) %>%
  left_join(log_data %>% select(-job_summary_data), by = "id")

Putting just these components into the function allows the caller to decide how many records from what script or scripts get decoded. We already have use cases for this function where we want to inspect logged email send failures generated by multiple scripts. As we logged the email failures the same way in each function, aggregating the send failures across them is trivial with a function like this.

This function will need a name. I offer these suggestions:

  1. spread_job_summary_data
  2. decode_job_summary_data
  3. decode_job_summary_records
  4. decode_job_summary
  5. decode_job_summaries
  6. read_job_summaries

We need to keep the distinction in people's minds between the redcapcustodian job logs and REDCap's redcap_log_event table

@ChemiKyle, have any ideas on naming?

@pbchase pbchase changed the title Read _job_summary_data_ from multiple log rows Read job_summary_data from multiple log rows Apr 26, 2023
@pbchase
Copy link
Contributor Author

pbchase commented Apr 26, 2023

more name suggestions:

  1. unnest_job_summary_records
  2. unnest_job_summaries
  3. unnest_job_summary_data
  4. unnest_job_summary_data_records
  5. unnest_job_summary_data_column
  6. unnest_job_log_data

@pbchase pbchase self-assigned this May 18, 2023
@pbchase
Copy link
Contributor Author

pbchase commented May 18, 2023

I'm choosing the name unnest_job_summary_data

@pbchase
Copy link
Contributor Author

pbchase commented May 18, 2023

I'm choosing the name unnest_job_summary_data

No, let's make that unnest_job_summary_data_json_object

@pbchase
Copy link
Contributor Author

pbchase commented Jul 15, 2023

addressed in PR# 111

@pbchase pbchase closed this as completed Jul 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant