Clean up table and add columns to tutorial. #21

Merged (4 commits) on Apr 30, 2024
12 changes: 6 additions & 6 deletions tutorial/10-FHIRPIT.Rmd
@@ -11,11 +11,11 @@ editor_options:
|-------------------------------------------------------------------|
| title: "FHIR PIT: HL7 Fast Healthcare Interoperability Resources Patient data Integration Tool" |
| subtitle: "Tutorial" |
-| date: "April 16, 2024" |
+| date: "April 30, 2024" |
| author: "Juan Garcia, Kara Fecho, Hong Yi" |
| format: html |

-*Date*: April 16, 2024
+*Date*: April 30, 2024

*Author(s)*: Juan Garcia, Kara Fecho, Hong Yi

@@ -113,14 +113,14 @@ The main outputs to consider are in ICEES, ICEES2 and ICEES2_dei directories (na
```{r}
year_ <- 2010
basepath <- file.path(work_dir, "data", "output")
-icees2dei_url <- file.path(basepath,"icees2_dei", sprintf("%spatient_deidentified", year_))
+icees2dei_patient_url <- file.path(basepath,"icees2_dei", sprintf("%spatient_deidentified", year_))
```

The ICEES directory contains one CSV file per subject. Each CSV file consists of the subject’s visits, concatenated with the corresponding environmental exposure estimates for that day and that subject’s location. If a subject has multiple visits on the same day, the "PreprocPerPatSeriesToVector" transformation step aggregates them by counting how many times each drug or diagnosis occurred. The directory for each subject is indexed by the "patient_num" column. The ICEES2 directory contains a single CSV file with the aggregation of all subjects grouped by (subject, study period). Lastly, the ICEES2DEI directory contains the same aggregated CSV file, but the data have been stripped of all PHI per the HIPAA Safe Harbor method [5]. The fully deidentified file then abides by all federal regulations surrounding privacy and security, although institutional regulations may still apply. For exposition purposes, we focus on the deidentified ICEES2DEI CSV file and reorder its columns.
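
The same-day aggregation described above can be illustrated with a minimal sketch. It is not the "PreprocPerPatSeriesToVector" implementation: the toy table, the `date` column, and the choice of base R `aggregate()` are assumptions made only to show what "counting occurrences per day" could look like.

```{r}
# Hypothetical toy visits table; apart from patient_num, the column layout
# is an illustrative assumption, not the FHIR PIT schema.
visits <- data.frame(
  patient_num = c(1, 1, 1, 2),
  date        = as.Date(c("2010-01-05", "2010-01-05", "2010-01-06", "2010-01-05")),
  Albuterol   = c(1, 1, 0, 1),
  AsthmaDx    = c(1, 0, 0, 1)
)

# Collapse multiple visits on the same day into one row per (patient, day),
# summing the 0/1 indicators so each cell counts occurrences on that day.
daily_counts <- aggregate(
  cbind(Albuterol, AsthmaDx) ~ patient_num + date,
  data = visits,
  FUN  = sum
)
daily_counts
```

In this toy input, patient 1 has two visits on 2010-01-05, so those two rows collapse into one row whose `Albuterol` value is the sum of the two indicators.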

```{r}
-icees2dei_colorder <- scan("icees2dei_column_order.txt", what="", sep="\n")
-icees2dei <- read.csv(icees2dei_url, header = TRUE)[, icees2dei_colorder]
+icees2dei_colorder <- scan("icees2dei_patient_column_order.txt", what="", sep="\n")
+icees2dei <- read.csv(icees2dei_patient_url, header = TRUE)[, icees2dei_colorder]
```
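
Because this PR also removes names from the column-order file, it may help to see how the `[, colorder]` selection behaves when the file and the data disagree. The sketch below uses a temporary file and a toy data frame (both hypothetical stand-ins, not the tutorial's inputs) and adds an explicit check that the tutorial code itself does not perform.

```{r}
# Toy stand-ins: a temporary column-order file and a small data frame.
# Both are illustrative assumptions, not the tutorial's real inputs.
order_file <- tempfile(fileext = ".txt")
writeLines(c("index", "year", "Race"), order_file)
toy <- data.frame(Race = "Asian", index = 1L, year = 2010L, Extra = NA)

# Read one column name per line, in the same way as the tutorial's scan() call.
col_order <- scan(order_file, what = "", sep = "\n", quiet = TRUE)

# Fail early with a readable message if the file lists unknown columns.
missing_cols <- setdiff(col_order, names(toy))
if (length(missing_cols) > 0) {
  stop("Columns not found in data: ", paste(missing_cols, collapse = ", "))
}

# Select and reorder; columns absent from the file (here, Extra) are dropped.
toy_ordered <- toy[, col_order, drop = FALSE]
names(toy_ordered)
```

Without such a check, indexing a data frame by a name it does not contain stops with an "undefined columns selected" error, which does not say which name was at fault.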

```{r}
@@ -130,7 +130,7 @@ icees2dei
The deidentified ICEES2DEI CSV file may be used for further analysis. For instance, below we plot `AvgDailyPM2.5Exposure` vs `MaxDailyOzoneExposure` to examine the relationship between maximum daily ozone exposure and average daily PM2.5 exposure. A strong correlation is not expected in this tutorial because the exposure data were randomly sampled.

```{r}
-ggplot(icees2dei,aes(y=AvgDailyPM2.5Exposure,x=AvgDailyOzoneExposure)) + geom_point(color='blue') +
+ggplot(icees2dei,aes(y=AvgDailyPM2.5Exposure,x=MaxDailyOzoneExposure)) + geom_point(color='blue') +
geom_smooth(method = "lm", se = FALSE)
```
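
As a numeric complement to the plot, the sketch below computes the Pearson correlation and the least-squares slope that a `geom_smooth(method = "lm")` line represents. It runs on synthetic, independently sampled values (an assumption standing in for the tutorial's randomly sampled exposures), so the numbers say nothing about real data.

```{r}
# Synthetic, independently sampled exposures (illustrative values only).
set.seed(42)
toy <- data.frame(
  AvgDailyPM2.5Exposure = runif(200, min = 5, max = 15),
  MaxDailyOzoneExposure = runif(200, min = 30, max = 60)
)

# Pearson correlation; expected to be near zero for independent samples.
r <- cor(toy$AvgDailyPM2.5Exposure, toy$MaxDailyOzoneExposure)

# Linear fit of PM2.5 on ozone, matching the axes used in the plot.
fit <- lm(AvgDailyPM2.5Exposure ~ MaxDailyOzoneExposure, data = toy)
c(correlation = r, slope = unname(coef(fit)[2]))
```

For a simple linear regression the slope equals the correlation scaled by the ratio of the two standard deviations, so a near-zero correlation implies a nearly flat fitted line.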

@@ -2,14 +2,9 @@ index
year
Race
Sex2
Sex
Ethnicity
ObesityBMI
TotalEDVisits
TotalInpatientVisits
TotalEDInpatientVisits
AgeStudyStart
Active_In_Year
Albuterol
Androstenedione
Arformoterol
@@ -81,17 +76,7 @@ TesticularCancerDx
TesticularDysfunctionDx
UterineCancerDx
AvgDailyPM2.5Exposure
AvgDailyPM2.5Exposure_StudyAvg
AvgDailyPM2.5Exposure_StudyMax
MaxDailyPM2.5Exposure
MaxDailyPM2.5Exposure_StudyAvg
MaxDailyPM2.5Exposure_StudyMax
AvgDailyOzoneExposure
AvgDailyOzoneExposure_StudyAvg
AvgDailyOzoneExposure_StudyMax
MaxDailyOzoneExposure
MaxDailyOzoneExposure_StudyAvg
MaxDailyOzoneExposure_StudyMax
AvgDailyPM2.5Exposure_2
MaxDailyOzoneExposure_2
AvgDailyCOExposure_2
@@ -101,4 +86,12 @@ AvgDailyNOxExposure_2
AvgDailySO2Exposure_2
AvgDailyAcetaldehydeExposure_2
AvgDailyFormaldehydeExposure_2
-AvgDailyBenzeneExposure_2
+AvgDailyBenzeneExposure_2
+Landfill_Exposure
+CAFO_Exposure
+MajorRoadwayHighwayExposure
+RoadwayDistanceExposure
+EstResidentialDensity
+EstProbabilityNoAuto
+EstHouseholdIncome
+EstProbabilityNoHealthIns