summary_tables.Rmd

---
title: "Summary Tables"
date: "`r format(Sys.time(), '%d %B, %Y')`"
params:
  output_version: ''
output:
  html_document:
    toc: true
    toc_depth: 5
    toc_float: 
      collapsed: false
    number_sections: true
---

```{r setup, include=FALSE}

# This script summarises mode splits, distances travelled and travel durations, and reports health outcomes (YLLs and deaths) both by pathway (air pollution (AP), injury and physical activity (PA)) and by disease

# WARNING: This script only works if ITHIM-Global has been run in constant mode using either the BOGOTA, LATAM, AFRICA_INDIA or GLOBAL scenario definitions


#### It produces the following output documents:

### - html file containing the following information:

# ( - table displaying highest propensity (%) for each distance category, for a given mode. - CURRENTLY commented out as not relevant for BOGOTA, GLOBAL, LATAM or AFRICA_INDIA scenario definitions)


## Mode split
# - for each city a table is displayed showing the Baseline mode split for each distance category 

# - for each city a table is given showing the mode split for each scenario

# (- one table showing the baseline mode split for all cities - CURRENTLY COMMENTED OUT)


## Distances travelled
# - city specific distance tables showing average daily distances in km travelled by each person in the baseline population by mode for all scenarios (incl Baseline)

# - city specific distance tables showing average daily distances in km travelled by each person in the baseline population WITH a trip by mode for all scenarios (incl Baseline)

# - one table giving the city specific baseline average daily distances (km) travelled by each person in the baseline population for all cities

#( - city specific distance tables showing total yearly distances in km travelled by the total population with ages considered in the model for each mode for all scenarios (incl Baseline)  - CURRENTLY COMMENTED OUT)

#( - one table giving the city specific total yearly baseline distances in km travelled by the total population with ages considered in the model for each mode for all cities - CURRENTLY COMMENTED OUT)
#( - city specific distance tables showing total distances travelled by distance category together with the number of trips and proportions of all trips within the distance categories - CURRENTLY COMMENTED OUT)

#( - city specific distance tables showing total daily distances in km travelled by the total population  with ages considered in the model for each mode for all scenarios (incl Baseline)  - CURRENTLY COMMENTED OUT)


## Travel duration
# - city specific duration tables showing average daily duration (h) spent travelling by each person in the baseline population by mode for all scenarios (incl Baseline)

# - city specific duration tables showing average daily duration (h) spent travelling by each person in the baseline population WITH a trip by mode for all scenarios (incl Baseline)

#( - one table giving the city specific baseline average duration (h) spent travelling by each person in the respective baseline populations for all cities  - CURRENTLY COMMENTED OUT)


## (Health outcomes - injury, air pollution and physical activity  - CURRENTLY COMMENTED OUT)
# (- one table for each city showing the change in total YLLs for each of the scenarios compared to the reference scenario for each age group - CURRENTLY COMMENTED OUT)

# (- one table for each city showing the change in YLLs per 100,000 people for each of the scenarios compared to the reference scenario for each age group - CURRENTLY COMMENTED OUT)

# (- one table for each city showing the change in YLLs per 100,000 people for each of the scenarios compared to the reference scenario for each age and sex category - CURRENTLY COMMENTED OUT)

# (- one table for each city showing the change in total YLLs due to changes in accident fatalities for each of the scenarios compared to the reference scenario for each age group - CURRENTLY COMMENTED OUT)

# (- one table for each city showing the change in total YLLs due to changes in physical activity levels for each of the scenarios compared to the reference scenario for each age group - CURRENTLY COMMENTED OUT)

# (- one table for each city showing the change in total YLLs due to changes in air pollution (PM2.5) levels for each of the scenarios compared to the reference scenario for each age group - CURRENTLY COMMENTED OUT)

# (- one table for each city showing the change in YLLs per 100,000 people due to changes in accident fatalities for each of the scenarios compared to the reference scenario for each age group - CURRENTLY COMMENTED OUT)

# (- one table for each city showing the change in YLLs per 100,000 people due to changes in physical activity levels for each of the scenarios compared to the reference scenario for each age group - CURRENTLY COMMENTED OUT)

# (- one table for each city showing the change in YLLs per 100,000 people due to changes in air pollution (PM2.5) levels for each of the scenarios compared to the reference scenario for each age group  - CURRENTLY COMMENTED OUT)


## (Health outcomes - disease  - CURRENTLY COMMENTED OUT)
# (- one table for each city showing the change in deaths for each of the scenarios compared to the reference scenario for each disease  - CURRENTLY COMMENTED OUT)

# (- one table for each city showing the change in deaths per 100,000 people for each of the scenarios compared to the reference scenario for each disease  - CURRENTLY COMMENTED OUT)

# (- one table for each city showing the change in YLLs for each of the scenarios compared to the reference scenario for each disease - CURRENTLY COMMENTED OUT)

# (- one table for each city showing the change in YLLs per 100,000 people for each of the scenarios compared to the reference scenario for each disease - CURRENTLY COMMENTED OUT)


### - output .csv files containing the following information. (Note that all these files are also saved with their version number in the file name):
 

# - the ylls.csv file (results/multi_city/health_impacts/ylls.csv) gives the changes in YLLs for each age and sex category, for each age category per 100,000 people, and for each age and sex category per 100,000 people for each city, disease and scenario compared to the reference scenario. It also shows the disease level, whether the disease is exacerbated by changes in AP, PA or both, and the age-specific and age-and-sex-specific population levels. Results are for AP only or PA only if a disease is affected by a single pathway, and for the combined AP and PA effect if it is affected by both.

# - the ylls_pathway.csv file (results/multi_city/health_impacts/ylls_pathway.csv) gives the changes in YLLs for each age and sex category, for each age category per 100,000 people, and for each age and sex category per 100,000 people for each city, disease and scenario compared to the reference scenario. It also shows the disease level, whether the disease is exacerbated by changes in AP or PA, and the age-specific and age-and-sex-specific population levels. The results are given for AP and PA independently even if a disease is affected by both pathways.

# - the deaths.csv file (results/multi_city/health_impacts/deaths.csv) gives the changes in deaths for each age and sex category, for each age category per 100,000 people, and for each age and sex category per 100,000 people for each city, disease and scenario compared to the reference scenario. It also shows the disease level, whether the disease is exacerbated by changes in AP, PA or both, and the age-specific and age-and-sex-specific population levels. Results are for AP only or PA only if a disease is affected by a single pathway, and for the combined AP and PA effect if it is affected by both.

# - the deaths_pathway.csv file (results/multi_city/health_impacts/deaths_pathway.csv) gives the changes in deaths for each age and sex category, for each age category per 100,000 people, and for each age and sex category per 100,000 people for each city, disease and scenario compared to the reference scenario. It also shows the disease level, whether the disease is exacerbated by changes in AP or PA, and the age-specific and age-and-sex-specific population levels. The results are given for AP and PA independently even if a disease is affected by both pathways.


#### The script performs the following steps:

# - set up helper functions to re-format large numbers, re-name scenarios and format and print output tables

# - read in the io object containing the results from an ITHIM-Global run and extract the relevant variables. Read in mode_order.csv, which defines a fixed order of modes for the output tables

# (extract table displaying highest propensity (%) for each distance category, for a given mode. - CURRENTLY commented out as not relevant for BOGOTA, GLOBAL, LATAM or AFRICA_INDIA scenario definitions)

# - call ithimr::get_scenario_settings() to find the Baseline mode splits for each distance category for each city and print tables to html file

# - for each city create a table showing the mode split for each scenario

# - create one table showing the baseline mode split for all cities

# - create city specific distance tables showing average daily distances in km travelled by each person in the baseline population by mode for all scenarios (incl Baseline)

# - create  city specific distance tables showing average daily distances in km travelled by each person in the baseline population WITH a trip by mode for all scenarios (incl Baseline)

# - create one table giving the city specific baseline average daily distances (km) travelled by each person in the baseline population for all cities

# - create city specific distance tables showing total yearly distances in km travelled by the total  population with ages considered in the model for each mode for all scenarios (incl Baseline)

# - create one table giving the city specific total yearly baseline distances in km travelled by the total population with ages considered in the model for each mode for all cities

# - create city specific distance tables showing total distances travelled by distance category together with the number of trips and proportions of all trips within the distance categories

# - create city specific distance tables showing total daily distances in km travelled by the total population  with ages considered in the model for each mode for all scenarios (incl Baseline)

# - create city specific duration tables showing average daily duration (h) spent travelling by each person in the baseline population by mode for all scenarios (incl Baseline)

# - create city specific duration tables showing average daily duration (h) spent travelling by each person in the baseline population WITH a trip by mode for all scenarios (incl Baseline)

# - create one table giving the city specific baseline average duration (h) spent travelling by each person in the respective baseline populations for all cities

# - create one table for each city showing the change in total YLLs for each of the scenarios compared to the reference scenario for each age group

# - create one table for each city showing the change in YLLs per 100,000 people for each of the scenarios compared to the reference scenario for each age group

# - create one table for each city showing the change in YLLs per 100,000 people for each of the scenarios compared to the reference scenario for each age and sex category

# - create one table for each city showing the change in total YLLs due to changes in accident fatalities for each of the scenarios compared to the reference scenario for each age group

# - create one table for each city showing the change in total YLLs due to changes in physical activity levels for each of the scenarios compared to the reference scenario for each age group

# - create one table for each city showing the change in total YLLs due to changes in air pollution (PM2.5) levels for each of the scenarios compared to the reference scenario for each age group

# - create one table for each city showing the change in YLLs per 100,000 people due to changes in accident fatalities for each of the scenarios compared to the reference scenario for each age group

# - create one table for each city showing the change in YLLs per 100,000 people due to changes in physical activity levels for each of the scenarios compared to the reference scenario for each age group

# - create one table for each city showing the change in YLLs per 100,000 people due to changes in air pollution (PM2.5) levels for each of the scenarios compared to the reference scenario for each age group

# - create one table for each city showing the change in deaths for each of the scenarios compared to the reference scenario for each disease

# - create one table for each city showing the change in deaths per 100,000 people for each of the scenarios compared to the reference scenario for each disease

# - create one table for each city showing the change in YLLs for each of the scenarios compared to the reference scenario for each disease

# - create one table for each city showing the change in YLLs per 100,000 people for each of the scenarios compared to the reference scenario for each disease

# - create 4 individual .csv files in 'results/multi_city/health_impacts/' for both deaths and YLLs, considering the effects of AP and PA separately (*_pathway.csv) or together (*.csv). These files give the changes in YLLs or deaths for each age and sex category, for each age category per 100,000 people, and for each age and sex category per 100,000 people for each city, disease and scenario compared to the reference scenario. They also give the disease level, whether the disease is exacerbated by changes in AP, PA or both, and the age-specific and age-and-sex-specific population levels.


knitr::opts_chunk$set(comment=NA, prompt=FALSE, cache=FALSE, echo=F, results='asis', warning = F, message = F)

```

```{r loadLibraries, echo=F, message=FALSE}
suppressWarnings({

library(summarytools)
library(knitr)
library(tidyverse)
library(gt)
library(pracma)
library(data.table)
library(writexl)
library(cli)
})

options(dplyr.summarise.inform = FALSE)

st_options(bootstrap.css     = FALSE,       # Already part of the theme so no need for it
           plain.ascii       = FALSE,       # One of the essential settings
           style             = "rmarkdown", # Idem.
           dfSummary.silent  = TRUE,        # Suppresses messages about temporary files
           footnote          = NA,          # Keeping the results minimalistic
           subtitle.emphasis = FALSE)       # For the vignette theme, this gives
                                            # much better results. Your mileage may vary

```

```{r helper_functions, echo = F, message = F}
# set up helper functions

# Function to format numbers with million (M) and billion (B) suffixes
# Source: https://stackoverflow.com/questions/28159936/formatting-large-currency-or-dollar-values-to-millions-billions
comprss <- function(tx) { 
  div <- findInterval(as.numeric(gsub("\\,", "", tx)), 
                      c(0, 1e3, 1e6, 1e9, 1e12) )  # modify this if negative numbers are possible
  paste(round( as.numeric(gsub("\\,","",tx))/10^(3*(div-1)), 2), 
        c("","K","M","B","T")[div] )
}

# function to rename scenarios
get_qualified_scen_name <- function(cs){
  qualified_scen_name <- ""
  if (cs %in% c('base', 'baseline', 'Baseline'))
    qualified_scen_name <- 'Baseline'
  else if(cs == "sc_walk")
    qualified_scen_name <- 'Walking'
  else if(cs == "sc_cycle")
    qualified_scen_name <- 'Cycling'
  else if(cs == "sc_car")
    qualified_scen_name <- 'Car'
  else if(cs == "sc_motorcycle")
    qualified_scen_name <- 'Motorcycling'
  else if(cs == "sc_bus")
    qualified_scen_name <- 'Bus'
  
  return(qualified_scen_name)
}

# define function to sum the columns of the data, print the data to html and add a caption
sum_and_round_and_print <- function(data, text = '') {
  data <- lapply(data, function(x) round(x, round_to))
  data <- lapply(data, function(x) rbind(x, Total = colSums(x)))
  for (city in cities) {
    print(kable(data[[city]], caption = paste(text, city)))
    cat("\n")
  }
}

# define function to round the data and print it to html with a caption
round_and_print <- function(data,text=''){
  data <- lapply(data, function(x)round(x,round_to))
  for(city in cities) {
    print(kable(data[[city]], caption = paste(text, city)))
    cat("\n")
  }
}

```

```{r load_objects, results = "asis", echo = F, message = F}
#output_version <- "v0.3"
# read in io object which is produced at the end of an ITHIM-Global run
# Assumes that multi_city_script.R has been run 
# read in input file

if (!exists("output_version")){
  ## Get the current repo sha
  gitArgs <- c("rev-parse", "--short", "HEAD", ">", file.path("repo_sha"))
  # Use shell command for Windows as it's failing with system2 for Windows (giving status 128)
  if (.Platform$OS.type == "windows"){
    shell(paste(append("git", gitArgs), collapse = " "), wait = T)
  } else {
    system2("git", gitArgs, wait = T)
  }
  
  repo_sha <-  as.character(readLines(file.path("repo_sha")))
  output_version <- paste0(repo_sha, "_test_run")
}

io <- readRDS(paste0("results/multi_city/io_", output_version, ".rds"))

# Get names of cities from the io object
cities <- names(io)[!names(io) %in% c('scen_prop','ithim_run' )]

# extract number of scenarios
NSCEN <- nrow(io$scen_prop)

# input parameter file name
input_parameter_file <<- io$ithim_run$input_parameter_file
  
# scenario and reference scenario definitions
scenario_name <- io$ithim_run$scenarios_used
ref_scen <- io$ithim_run$reference_scenario
reference_scenario <- get_qualified_scen_name(ref_scen)

# further model run information
compute_mode <- io$ithim_run$compute_mode 
timestamp_model <- io$ithim_run$timestamp
comments_model <- io$ithim_run$comment

# Read trip_order
# trip_order <- read_csv("data/global/trips/mode_order.csv")
trip_order <- read_csv("results/multi_city/mode_order.csv")

# define which decimal place to round to
round_to <- 2


# Plot colours for each scenario
scen_colours <- c("Baseline" = '#b15928',
                  "Cycling" = '#abdda4',
                  "Car" = '#d7191c',
                  "Bus" = '#2b83ba',
                  "Motorcycle" = '#fdae61')


# print model run information to screen:
cat(
   cli::style_hyperlink(
      text = paste0("https://github.com/ITHIM/ITHIM-R/tree/", stringr::str_remove(output_version, "_test_run")),
      url = paste0("https://github.com/ITHIM/ITHIM-R/tree/", stringr::str_remove(output_version, "_test_run"))
   )
)
cat("  \n")
cat(paste0('Scenario: ', SCENARIO_INCREASE * 100, "%")) 
cat("  \n")
cat(paste0('Input Parameter version: ', io$ithim_run$input_parameter_file)) 
cat("  \n")
cat(paste0('Output version: ', output_version)) 
cat("  \n")
cat(paste0('Timestamp of model run: ', timestamp_model))
cat("  \n")
cat(paste0('Comments from model run: ', comments_model))
cat("  \n")


```

<!-- # **1st option** -->

# Scenario 

Scenario definition (by mode)
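
The plot below shows, for each scenario, the change in mode share relative to the baseline in percentage points. A minimal base R sketch of that calculation follows; the column names `scenario`, `trip_mode`, `freq` and `pd` mirror the chunk below, while the shares themselves are invented for illustration. The block is a plain fence, so it is displayed but not evaluated.

```r
# Invented mode shares (%) for a baseline and one scenario
share <- data.frame(
  scenario  = rep(c("baseline", "sc_cycle"), each = 3),
  trip_mode = rep(c("bus", "car", "cycle"), times = 2),
  freq      = c(25, 50, 25, 25, 45, 30),
  stringsAsFactors = FALSE
)

# Attach the baseline share of the same mode to every row
baseline <- share[share$scenario == "baseline", c("trip_mode", "freq")]
names(baseline)[2] <- "freq_baseline"
share <- merge(share, baseline, by = "trip_mode")

# Percentage-point difference to the baseline; unchanged modes are dropped
share$pd <- share$freq - share$freq_baseline
changed <- share[share$pd != 0, c("scenario", "trip_mode", "pd")]
print(changed)
```

Matching on `trip_mode` before subtracting keeps the comparison aligned even if a mode is missing from one of the scenarios.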

```{r scen_defn_mode, results = "asis", echo = F}
# for each city, plot the change in modal share (percentage points) of each scenario relative to the baseline

for (city in cities) {
  
  # Calculate proportions by mode when compared with baseline
  total_trips <- io[[city]]$trip_scen_sets |> filter(scenario == "baseline") |> 
    distinct(trip_id, scenario, .keep_all = T) |> nrow()
  
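  prop <- io[[city]]$trip_scen_sets |> 
    filter(participant_id != 0) |> 
    distinct(trip_id, scenario, .keep_all = T) |> 
    group_by(scenario, trip_mode) |> 
    reframe(freq = round(dplyr::n() / total_trips * 100, 1)) |> 
    # percentage-point difference to the baseline share of the same mode
    # (a mode that is absent from the baseline counts as a share of 0)
    mutate(pd = freq - sum(freq[scenario == 'baseline']), .by = trip_mode) |> 
    filter(pd != 0)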
  
  # Rename scenarios so that they match the names used in scen_colours
  prop[prop$scenario == "baseline",]$scenario <- "Baseline"
  prop[prop$scenario == "sc_bus",]$scenario <- "Bus"
  prop[prop$scenario == "sc_car",]$scenario <- "Car"
  prop[prop$scenario == "sc_cycle",]$scenario <- "Cycling"
  prop[prop$scenario == "sc_motorcycle",]$scenario <- "Motorcycle"
  
  # plot the change in mode share relative to the baseline, one panel per scenario
  y <- ggplot(prop) +
    aes(x = trip_mode, fill = scenario, weight = pd) +
    geom_bar() +
    scale_x_discrete(guide = guide_axis(angle = 90)) + 
    geom_text(aes(x = trip_mode, y = round(pd, 1), label = round(pd, 1)),
              size = 4, position = position_dodge()) +
    scale_fill_manual("Scenario", values = scen_colours) +
    facet_wrap(vars(scenario)) + 
    labs(title = "", y = "Percentage (%)", x = "Trip Mode") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
  
  # print the plot to the html output
  print(y)
  
  # save image (note: the same file name is used for every city, so only the
  # plot of the last city in `cities` remains on disk)
  ggsave("figures/scen_defn_mode.svg", plot = y, width=10, height=8)
  
}


```

# Summary tables

List of tables for scenario, distance and duration

## Distance categories

<!-- ## Scenario definition  -->

<!-- Table displays highest propensity for each distance category, for a given mode. -->

<!-- Table displays highest propensity (%) for each distance category, for a given mode. -->

<!-- In scenario generation, trips are sampled without replacement to increase the share of the requested mode up to the total shown. -->

Distance categories are defined as:

0-2 km = {trip distance \< 2km}

2-6 km = {2km \<= trip distance \< 6km}

6+ km = {6km \<= trip distance}
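
As an illustrative sketch only (the categories in the io object are assigned by ITHIM-Global, not by this snippet), the left-closed intervals above can be reproduced with `cut()`. The distances are invented, and the vector name `trip_distance_cat` simply mirrors the column used later in this report.

```r
# Invented trip distances in km
trip_distance <- c(0.5, 1.99, 2, 5.9, 6, 12.3)

# right = FALSE gives left-closed intervals: [0, 2), [2, 6), [6, Inf)
trip_distance_cat <- cut(
  trip_distance,
  breaks = c(0, 2, 6, Inf),
  labels = c("0-2 km", "2-6 km", "6+ km"),
  right  = FALSE
)

print(data.frame(trip_distance, trip_distance_cat))
```

With `right = FALSE`, a trip of exactly 2 km falls into "2-6 km" and a trip of exactly 6 km into "6+ km", matching the definitions above.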

```{r scen_prop, results = "asis"}

# extract the scenario proportions
scen_prop <- round(io$scen_prop, round_to)

# prefix the scenario names with "sc_" so that they match the keys used in get_qualified_scen_name
rownames(scen_prop) <- paste("sc", rownames(io$scen_prop), sep = "_")

# human-readable scenario names, re-used below as column names of the distance and duration tables
orig_scen_names_updated <- unlist(lapply(rownames(scen_prop), FUN=get_qualified_scen_name))

# rename scenarios
rownames(scen_prop) <- orig_scen_names_updated

#kable(scen_prop, headings = "Scenario Proportions")

```

## City mode split for each distance category

```{r message=F, error=F, warning=FALSE}
# call get_scen_settings function to find the mode split for each distance category for each city

get_scen_settings <- ithimr::get_scenario_settings(cities = cities)

```

## Baseline mode split by distance category

```{r trip_mode_dist_prop, results = "asis", echo = F}

for (city in cities) {
  # baseline trips only, excluding rows with a trip_id or participant_id of 0
  td <- io[[city]]$trip_scen_sets |> 
    filter(scenario == "baseline" & trip_id != 0 & participant_id != 0) |>  
    distinct(trip_id, .keep_all = T) |> 
    count(trip_mode, trip_distance_cat) |> 
    # share of each distance category within a mode
    mutate(freq = prop.table(n), .by = trip_mode) |>
    filter(trip_mode %in% c('pedestrian', 'cycle', 'car', 'motorcycle', 'bus')) |> 
    dplyr::select(-n) |> 
    dplyr::mutate(freq = round(freq * 100, 1)) |> 
    pivot_wider(names_from = trip_distance_cat, values_from = freq) |> 
    janitor::adorn_totals('col') |> 
    mutate(Total = round(Total))
  
  # print mode share for individual cities
  print(kable(td, caption = paste("Trip proportion (%) by mode and by distance category for ", city)))
  cat("\n")
}


```

## Mode split for each scenario by city

City specific trip proportions by mode, for baseline and scenarios
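
The shares in these tables are counts of distinct trips per mode and scenario, divided by the number of distinct baseline trips. A minimal base R sketch under that reading is shown here; the column names follow `trip_scen_sets`, but the trips are invented and the snippet is not evaluated as part of the report.

```r
# Invented trips: trip 1 has two stage rows in the baseline
trips <- data.frame(
  trip_id   = c(1, 1, 2, 3, 4, 1, 2, 3, 4),
  scenario  = c(rep("baseline", 5), rep("sc_cycle", 4)),
  trip_mode = c("car", "car", "car", "bus", "cycle",
                "car", "cycle", "bus", "cycle"),
  stringsAsFactors = FALSE
)

# Keep one row per trip and scenario so multi-stage trips count once
trips <- unique(trips)

# Denominator: number of distinct trips in the baseline
n_baseline_trips <- length(unique(trips$trip_id[trips$scenario == "baseline"]))

# Mode share (%) per scenario, modes in rows and scenarios in columns
mode_share <- round(table(trips$trip_mode, trips$scenario) / n_baseline_trips * 100, 1)
print(mode_share)
```

Because every scenario is divided by the baseline trip count, a scenario column sums to 100 only when it contains the same number of trips as the baseline.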

```{r load_tidyverse, echo = F, message = F}
suppressWarnings({
  require(tidyverse)  
})
```

```{r trip_mode_dist, results = "asis", echo = F}
# find the modal share for all cities and scenarios

trip_prop <- list()

for (city in cities) { # Loop for each city
  
  # extract the trip data for all scenarios
  df <- io[[city]]$trip_scen_sets
  
  # find the number of unique trips in the Baseline
  u_trips <- df %>% dplyr::filter(scenario == "baseline") %>% 
    summarise(uid = n_distinct(trip_id)) %>% as.numeric()
  
  # find the proportion of trips made by each mode in each scenario
  td <- df %>% distinct(trip_id, scenario, .keep_all = T) %>% 
    filter(!trip_mode %in% c("bus_driver", "taxi", "rail", "auto_rickshaw", "truck", "other", "car_driver")) |> 
    group_by(trip_mode, scenario) %>% 
    summarise(p = round(dplyr::n() / u_trips * 100, 1)) %>% 
    spread(key = trip_mode, value = p) %>% 
    mutate(row_sums = rowSums(.[sapply(., is.numeric)], na.rm = TRUE))
  
  # find and update scenario names
  scen_names <- td$scenario
  scen_names_updated <- unlist(lapply(scen_names, FUN=get_qualified_scen_name))
  
  # take transpose
  td <- as.data.frame(t(td))
  
  # rename columns by using updated scenario names
  names(td) <- scen_names_updated
  
  # drop the first row (the scenario names, which are now the column names)
  # and re-order by the pre-defined mode order; modes missing from trip_order are placed last
  td <- td[-1, ]
  x <- match(row.names(td), trip_order$mode)
  td <- td[order(x),]
  
  # convert row names into column of data and create new dataframe
  td1 <- td %>% rownames_to_column()
  
  # re-name column
  names(td1)[1] <- 'stage_mode'
  
  # populate trip_prop list with stage modes
  if (length(trip_prop) == 0){
    trip_prop <- td1 %>% dplyr::select(stage_mode)
  }
  
  td1 <- td1 %>% dplyr::select(stage_mode, Baseline)
  
  names(td1)[2] <- city
  
  # add proportions for all cities but only keep stage modes that appear in the trip data of all cities
  trip_prop <- inner_join(trip_prop, td1, by = 'stage_mode')
  
  # print mode share for individual cities
  print(kable(td, caption = paste("Trip proportion (%) by mode for ", city)))
  cat("\n")
} # End loop city

```

## Distance tables

City specific distance tables showing average distance in km travelled by each person in the baseline population by mode for all scenarios (incl Baseline)
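
Each table is produced twice: once dividing the total distances by the size of the whole baseline population, and once dividing by the number of people who make at least one trip. The sketch below illustrates the difference with invented totals; `per_person()` is a hypothetical helper written for this example and is not part of the script.

```r
# Invented total daily distances (km) by mode and scenario
dist <- data.frame(
  stage_mode = c("pedestrian", "cycle", "car"),
  Baseline   = c(1200, 300, 4500),
  Cycling    = c(1150, 900, 4000),
  stringsAsFactors = FALSE
)

n_population <- 1000  # assumed number of rows in the baseline population
n_travellers <- 750   # assumed number of distinct participants with trips

# Hypothetical helper: divide all numeric columns by n and round
per_person <- function(d, n, digits = 2) {
  num <- vapply(d, is.numeric, logical(1))
  d[num] <- round(d[num] / n, digits)
  d
}

print(per_person(dist, n_population))  # average over everyone
print(per_person(dist, n_travellers))  # average over people with trips only
```

The second table always shows values at least as large as the first, because the same totals are spread over fewer people.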

```{r trip_dist, results = "asis"}
# find the average distance a person in the baseline population travels by each mode per day

trip_dist <- list()

for (city in cities) { # loop through cities
  
  # find number of rows / people in the baseline population
  count_people <- nrow(io[[city]]$base_pop) 
  
  # extract distances and divide by number of people
  td <- io[[city]]$dist %>% 
    # dplyr::filter(stage_mode != 'bus_driver' & stage_mode != 'car_driver') %>%
    filter(!stage_mode %in% c("taxi", "rail", "auto_rickshaw", "truck", "other")) |> 
    mutate_if(is.numeric, round, digits = round_to) %>% 
    mutate_if(is.numeric, list(~round((.) / count_people, round_to)))

  # update scenario names 
  colnames(td)[2:ncol(td)] <- c('Baseline', orig_scen_names_updated)
  
  # order rows by the pre-defined mode order (modes missing from trip_order are placed last)
  x <- match(td$stage_mode, trip_order$mode)
  td <- td[order(x),]
  row.names(td) <- NULL
  
  # write to .html
  print(kable(td, caption = paste("Distance table (km) per day for ", city, "(", count_people, ") per person in the baseline population")))
  cat("\n")
  
  td1 <- td
  
  # initialise trip_dist
  if (length(trip_dist) == 0){
    trip_dist <- td1 %>% dplyr::select(stage_mode)
  }
  
  # extract baseline distances
  td1 <- td1 %>% dplyr::select(stage_mode, Baseline)
  
  names(td1)[2] <- city
  
  # join baseline mode distances for all cities
  trip_dist <- inner_join(trip_dist, td1, by = 'stage_mode')
  
  
  ## repeat but only for people who are travelling
  # extract number of people travelling from the trip data rather than the baseline population
  count_people <- length(unique(io[[city]]$trip_scen_sets$participant_id))
  td <- io[[city]]$dist %>% 
    # dplyr::filter(stage_mode != 'bus_driver') %>%
    filter(!stage_mode %in% c("taxi", "rail", "auto_rickshaw", "truck", "other")) |> 
    mutate_if(is.numeric, round, digits = round_to) %>% 
    mutate_if(is.numeric, list(~round((.) / count_people, round_to)))
  colnames(td)[2:ncol(td)] <- c('Baseline', orig_scen_names_updated)
  # order rows by the pre-defined mode order (modes missing from trip_order are placed last)
  x <- match(td$stage_mode, trip_order$mode)
  td <- td[order(x),]
  row.names(td) <- NULL
  
  print(kable(td, caption = paste("Distance table (km) for ", city, "(", count_people, ") per person in the baseline population (people with trips ONLY)")))
  cat("\n")
  
}

```

<!-- ### Distance table showing average baseline distances in km travelled by each person in the baseline population by mode for all cities -->

<!-- ```{r} -->

<!-- # create table showing average baseline distances by mode for all cities -->

<!-- trip_dist <- trip_dist %>% janitor::adorn_totals('row') -->

<!-- kable(trip_dist, caption = "Avg. baseline distance (km) per person in the baseline population per day across all cities") -->

<!-- ``` -->

<!-- ## Distance tables scaled by using total model age population (per year in km) -->

<!-- City specific distance tables for baseline and scenarios for the entire population with ages considered in the model -->

<!-- ```{r trip_tot_dist = "asis", message=F} -->

<!-- # create distance tables for the yearly distances travelled by the entire population -->

<!-- bl_td <- list() -->

<!-- for (city in cities) { # loop through cities -->

<!--   # extract daily population distances for ages considered in model and multiply by 365 to to get yearly distances -->

<!--   td <- io[[city]]$true_dist %>% dplyr::filter(stage_mode != 'bus_driver' & stage_mode != 'car_driver') %>% -->

<!--     mutate_if(is.numeric, list(~round((.) * 365, 3))) -->

<!--   # update scenario names -->

<!--   colnames(td)[2:ncol(td)] <- scen_names_updated -->

<!--   # match with pre-defined mode order -->

<!--   x <- match(td$stage_mode, trip_order$mode) -->

<!--   x[!is.na(x)] -->

<!--   td <- td[order(x),] -->

<!--   row.names(td) <- NULL -->

<!--   # initialise bl_td with stage_mode -->

<!--   if (length(bl_td) == 0){ -->

<!--     bl_td <- td %>% dplyr::select(stage_mode) -->

<!--   } -->

<!--   # calculate row and column totals -->

<!--   td <- td %>% ungroup() %>% janitor::adorn_totals(c('row', 'col')) -->

<!--   # extract baseline distances -->

<!--   td1 <- td %>% dplyr::select(stage_mode, Baseline) -->

<!--   names(td1)[2] <- city -->

<!--   # join baseline distances for all cities -->

<!--   bl_td <- inner_join(bl_td, td1, by = 'stage_mode') -->

<!--   # format table -->

<!--   td1 <- td %>% gt() %>% fmt_number(columns = 2:ncol(td), decimals = T, suffixing = T) -->

<!--   # print to html -->

<!--   cat(paste("Yearly distance (km) travelled by the entire population with ages considered in the model for ", city)) -->

<!--   print(td1) -->

<!--   cat("\n") -->

<!-- } -->

<!-- ``` -->

<!-- ### Baseline total distance for all cities (per year in km) -->

<!-- Baseline distance travelled by entire population with ages considered in the model for all cities -->

<!-- ```{r trip_dist_all_cities = "asis", message=F} -->

<!-- # create baseline yearly distance table for the entire population with ages considered in the model for all cities -->

<!-- backup <- bl_td -->

<!-- # add row totals -->

<!-- td <- as.data.frame(bl_td) %>% janitor::adorn_totals('row') -->

<!-- # remove all spaces/blanks -->

<!-- for (i in 2:ncol(td)){ -->

<!--   td[, i] <- comprss(td[, i]) -->

<!-- } -->

<!-- print(kable(td, caption = paste("Yearly baseline distance table (km) for population with ages considered in the model"))) -->

<!-- ``` -->

<!-- ## Distance by distance category -->

<!-- ### Tables -->

<!-- ```{r trip_dist_mode_figs = "asis"} -->

<!-- # for each city create a table showing the total distance, the number of trips and the proportion of trips in each distance category -->

<!-- for (city in cities){ # loop through cities -->

<!--   # extract trip data for the baseline scenario and calculate the total distance, the number of trips and the proportion of trips in each distance category -->

<!--   df <- io[[city]]$trip_scen_sets %>% dplyr::filter(scenario == 'Baseline' | scenario == 'baseline') %>% distinct(trip_id, .keep_all = T) %>% group_by(trip_distance_cat) %>%  -->

<!--     summarise(sum_dist = sum(trip_distance), n_vals = dplyr::n(), proportion = round(n_vals / nrow(.) * 100, 1)) -->

<!--  # print to html -->

<!--   print(kable(df, caption = paste("Distance in travel survey by mode for the ages considered in the model for  ", city))) -->

<!-- } -->

<!-- ``` -->

<!-- ## Daily population distances (per day in km) for ages considered in model -->

<!-- ### Tables -->

<!-- ```{r trip_dist_mode_figs = "asis"} -->

<!-- for (city in cities){ # loop through distances -->

<!--   # extract total daily distances travelled by the entire population with ages considered in the model -->

<!--   df <- io[[city]]$true_dist -->

<!--   names(df)[-1] <- scen_names_updated -->

<!--   print(kable(df, caption = paste("Total population distance with ages considered in model by mode per day for  ", city))) -->

<!-- } -->

<!-- ``` -->

## Duration tables

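City-specific duration tables (minutes per day) for the baseline population, covering the baseline and all scenarios. Each city gets two tables: one averaged over everyone in the baseline population and one averaged only over people who made at least one trip.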

```{r trip_dur, results = "asis"}

# create tables showing the average duration each person in the baseline population spends travelling per mode 

trip_dur <- list()
l <- list()

for(city in cities){ # loop through cities
  count_people <- nrow(io[[city]]$base_pop)
  
  # extract the duration for the baseline population
  td <- io[[city]]$dur %>% 
    # dplyr::filter(stage_mode != 'bus_driver' & stage_mode != 'car_driver') %>% 
    filter(!stage_mode %in% c("taxi", "rail", "auto_rickshaw", "truck", "other")) |> 
    mutate_if(is.numeric, round, digits = round_to) %>% mutate_if(is.numeric, list(~round((.) / count_people, round_to)))

  # update scenario names
  colnames(td)[2:ncol(td)] <- c('Baseline', orig_scen_names_updated)
  
  # order by pre-defined mode order (modes missing from trip_order are placed last)
  x <- match(td$stage_mode, trip_order$mode)
  td <- td[order(x),]
  row.names(td) <- NULL
  
  # add output to list
  l[[city]] <- td
  
  td1 <- td
  
  # on the first iteration, initialise the all-city table with the mode column
  if (length(trip_dur) == 0){
    trip_dur <- td1 %>% dplyr::select(stage_mode)
  }
  
  # extract baseline data
  td1 <- td1 %>% dplyr::select(stage_mode, Baseline)
  
  names(td1)[2] <- city

  # create one baseline table containing all city information
  trip_dur <- inner_join(trip_dur, td1, by = 'stage_mode')
  
  # print to html
  print(kable(td, caption = paste("Duration table (mins) per day for baseline population for ", city, "(", count_people, ") per person (everyone)")))
  
  # repeat but only counting people who are actually taking a trip
  count_people <- length(unique(io[[city]]$trip_scen_sets$participant_id)) # count unique participant ids in the trip data rather than rows in the baseline population
  td <- io[[city]]$dur %>% 
    # dplyr::filter(stage_mode != 'bus_driver' & stage_mode != 'car_driver') %>% 
    filter(!stage_mode %in% c("taxi", "rail", "auto_rickshaw", "truck", "other")) |> 
    mutate_if(is.numeric, round, digits = round_to) %>% mutate_if(is.numeric, list(~round((.) / count_people, round_to)))
  colnames(td)[2:ncol(td)] <- c('Baseline', orig_scen_names_updated)
  # order by pre-defined mode order (modes missing from trip_order are placed last)
  x <- match(td$stage_mode, trip_order$mode)
  td <- td[order(x),]
  row.names(td) <- NULL
  
  print(kable(td, caption = paste("Duration table (mins) per day based on trips taken in travel survey for ", city, "(", count_people, ") per person (people with trips)")))
  cat("\n")
}

```
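
The per-person averaging and mode ordering used in the chunk above can be illustrated on toy data. This is a minimal sketch in base R, not the report's own code: the data frame `dur`, the head count `n_people` and the vector `mode_order` are hypothetical stand-ins for `io[[city]]$dur`, `count_people` and `trip_order$mode`, and all numbers are made up. For simplicity the sketch divides first and rounds once.

```r
# Toy total daily durations (minutes) by mode; all values are made up
dur <- data.frame(
  stage_mode = c("car", "pedestrian", "cycle", "bus"),
  Baseline   = c(1200, 900, 300, 600),
  Scenario_1 = c(1000, 950, 500, 600),
  stringsAsFactors = FALSE
)

n_people   <- 100                                    # hypothetical head count
mode_order <- c("pedestrian", "cycle", "bus", "car") # hypothetical display order

# divide every numeric column by the head count, then round once
num_cols <- vapply(dur, is.numeric, logical(1))
per_person <- dur
per_person[num_cols] <- round(dur[num_cols] / n_people, 1)

# match() returns each mode's position in the preferred order;
# order() places unmatched modes (NA) last by default
per_person <- per_person[order(match(per_person$stage_mode, mode_order)), ]
row.names(per_person) <- NULL
per_person
```

Changing `n_people` from the size of the baseline population to the number of people with at least one trip gives the second variant of the table without any other change.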

<!-- ### Baseline average duration (mins) per day for all cities -->

<!-- ```{r} -->

<!-- # Average baseline duration travelled on each mode for all cities for all people in baseline population -->

<!-- # calculate the row sum -->

<!-- trip_dur <- trip_dur %>% janitor::adorn_totals('row') -->

<!-- kable(trip_dur, caption = "Avg. duration (min) per person in the baseline population per day across all cities") -->

<!-- ``` -->

<!-- ## Health outcomes -->

<!-- ### Change in YLLs compared to reference scenario {#change_YLL} -->

<!-- ```{r scen_prop = "asis"} -->

<!-- cat(paste0('Reference scenario:  ', reference_scenario)) -->

<!-- # Calculate the changes in YLL compared to the reference scenario by age groups -->

<!-- # find ap ylls columns -->

<!-- ap_yll_cols <- which(sapply(colnames(io[[1]]$outcomes$pathway_hb$ylls  |> dplyr::select(-c(ends_with("lb") | ends_with("ub")))),function(x)grepl('ap',as.character(x)))) -->

<!-- # find pa yll columns -->

<!-- pa_yll_cols <- which(sapply(colnames(io[[1]]$outcomes$pathway_hb$ylls |> dplyr::select(-c(ends_with("lb") | ends_with("ub")))),function(x)grepl('pa',as.character(x)))) -->

<!-- # # find scenario names of health burden as the reference scenario is not necessarily the baseline scenario -->

<!-- # hb_columns <- names(io[[cities[1]]]$outcomes$hb$ylls)[3: length(names(io[[cities[1]]]$outcomes$hb$ylls))] -->

<!-- #  -->

<!-- # # find the unique scenario names in the hb columns -->

<!-- # inj_scen_cols <- grep("inj", hb_columns) # find location of injury columns -->

<!-- # # extract unique scenario names -->

<!-- # hb_scen_names <- unique(sapply(stringr::str_split(hb_columns[inj_scen_cols], "_yll_inj", n = 3), function(x) x[1])) -->

<!-- # # update scenario names -->

<!-- # hb_scen_names_upd <- unlist(lapply(hb_scen_names, FUN=get_qualified_scen_name)) -->

<!-- # from the known scenarios, remove the reference scenario -->

<!-- all_orig_scen_upd <- c('Baseline',orig_scen_names_updated) -->

<!-- hb_scen_names_upd <- all_orig_scen_upd[all_orig_scen_upd!=reference_scenario] -->

<!-- # calculate ylls by age category and scenario -->

<!-- # i.e. sum over all diseases and ignore sex -->

<!-- yll_totals <- lapply(io[-c(2,3)],function(x){ -->

<!--   temp <- sapply(1:NSCEN,function(y){# loop through scenarios -->

<!--     xx <- x$outcomes$hb$ylls |> dplyr::select(-c(ends_with("lb") | ends_with("ub"))) -->

<!--     xxx <- rowSums(xx[,seq(2 + y, ncol(xx), by = NSCEN)]) # sum over all diseases -->

<!--     sapply(sort(unique(xx$age_cat)),function(z) -->

<!--       sum(xxx[xx$age_cat == z])) # sum age categories ignoring sex -->

<!--   }) -->

<!--   colnames(temp) <- hb_scen_names_upd # update scenario names -->

<!--   temp -->

<!-- }) -->

<!-- # add the column total, round the results and print to html -->

<!-- sum_and_round_and_print(yll_totals,"Change total in YLLs compared to the reference scenario in ") -->

<!-- ``` -->

<!-- ### Change in YLLs compared to reference scenario per 100,000 people {#change_YLL_age_100k} -->

<!-- YLLs per 100,000 people by age group by scenario -->

<!-- ```{r scen_prop = "asis"} -->

<!-- cat(paste0('Reference scenario:  ', reference_scenario)) -->

<!-- # calculate change in YLLs compared to the reference scenario per 100,000 people -->

<!-- # extract population statistics -->

<!-- pop_by_age <- lapply(io[-c(2,3)], function(x) -->

<!--   sapply(unique(x$demographic$age),  -->

<!--          function(y) -->

<!--            sum(subset(x$demographic, age == y)$population) -->

<!--          ) -->

<!--   ) -->

<!-- # calculate ylls per 100,000 people -->

<!-- yll_rates <- lapply(cities, function(x) -->

<!--   rbind(yll_totals[[x]] / t(repmat(pop_by_age[[x]],NSCEN,1))*100000, # divide by population -->

<!--         Total = colSums(yll_totals[[x]])/rep(sum(pop_by_age[[x]]), # calculate rate for total YLLs -->

<!--                                              length = NSCEN)*100000 -->

<!--         ) -->

<!--   ) -->

<!-- names(yll_rates) <- cities -->

<!-- # round results and print to html -->

<!-- round_and_print(yll_rates,"Change in YLLs per 100,000 in ") -->

<!-- ``` -->

<!-- ### Change in YLLs compared to reference scenario per 100,000 people by age and sex  {#change_YLL_gender_age_100k} -->

<!-- Change in YLLs per 100,000 people by age and sex category for each scenario compared to the reference scenario -->

<!-- ```{r scen_prop = "asis"} -->

<!-- cat(paste0('Reference scenario:  ', reference_scenario)) -->

<!-- # calculate changes in YLLs for each scenario compared to the Baseline per 100,000 people by age and sex category -->

<!-- # calculate the total ylls for each scenario summing across all diseases -->

<!-- yll_totals <- lapply(io[-c(2,3)], function(x) { -->

<!--   temp <- sapply(1:NSCEN,function(y){ -->

<!--     xx <- x$outcomes$hb$ylls |> dplyr::select(-c(ends_with("lb") | ends_with("ub"))) -->

<!--     rowSums(xx[,seq(2 + y, ncol(xx), by = NSCEN)]) -->

<!--   }) -->

<!--   rownames(temp) <- apply(x$outcomes$hb$deaths[,c('sex','age_cat')], 1, -->

<!--                           function(z)paste0(z[2], '_', z[1])) -->

<!--   colnames(temp) <- orig_scen_names_updated -->

<!--   temp -->

<!-- }) -->

<!-- # calculate the YLLs per 100,000 people -->

<!-- yll_rates <- lapply(cities, function(x) -->

<!--   rbind(yll_totals[[x]][match(apply(io[[x]]$demographic[,c('sex','age')], 1, -->

<!--                                     function(z) paste0(z[2],'_',z[1])),  -->

<!--                               rownames(yll_totals[[x]])),] / -->

<!--           t(repmat(io[[x]]$demographic$population,NSCEN,1)) * 100000, -->

<!--         Total = colSums(yll_totals[[x]])/rep(sum(pop_by_age[[x]]), -->

<!--                                              length = NSCEN) * 100000)) -->

<!-- names(yll_rates) <- cities -->

<!-- # round and print ylls per 100,000 people to html -->

<!-- round_and_print(yll_rates,"Change in YLLs per 100,000 in ") -->

<!-- ``` -->

<!-- ### Change in YLLs due to injury compared to the reference scenario  {#change_yll_injury} -->

<!-- ```{r} -->

<!-- cat(paste0('Reference scenario:  ', reference_scenario)) -->

<!-- # calculate change in YLLs due to injury -->

<!-- # calculate change in YLLs due to injury by summing across all age categories for the injury results -->

<!-- injury_totals <- lapply(io[-c(2,3)], function(x) { -->

<!--   xx <- x$outcomes$hb$ylls |> dplyr::select(-c(ends_with("lb") | ends_with("ub"))) -->

<!--   injury_col <- grep("inj", names(xx))[1]  -->

<!--   xxx <- sapply(1:NSCEN, function(y) -->

<!--     sapply(sort(unique(xx$age_cat)), function(z) -->

<!--       sum(xx[xx$age_cat == z, injury_col - 1 + y]))) -->

<!--   colnames(xxx) <- hb_scen_names_upd -->

<!--   xxx -->

<!-- }) -->

<!-- # calculate the total, round the results and print to html -->

<!-- sum_and_round_and_print(injury_totals, "Change in YLLs due to injury in ") -->

<!-- ``` -->

<!-- ### Change in YLLs due to PA compared to the reference scenario {#change_yll_PA} -->

<!-- ```{r} -->

<!-- cat(paste0('Reference scenario:  ', reference_scenario)) -->

<!-- # calculate the changes in YLLs compared to the reference scenario caused by changes in PA for the unique age categories -->

<!-- # sum the YLLs for each scenario by age category -->

<!-- pa_totals <- lapply(io[-c(2,3)],function(x){ -->

<!--   xx <- x$outcomes$pathway_hb$ylls  |> dplyr::select(-c(ends_with("lb") | ends_with("ub"))) -->

<!--   xxx <- sapply(1:NSCEN, function(y){ -->

<!--     xxx <- rowSums(xx[, pa_yll_cols[seq(y, length(pa_yll_cols), by = NSCEN)]]) -->

<!--     sapply(sort(unique(xx$age_cat)), function(z) sum(xxx[xx$age_cat == z])) -->

<!--   }) -->

<!--   colnames(xxx) <- hb_scen_names_upd -->

<!--   xxx -->

<!-- }) -->

<!-- # calculate column sums, round the results and print to html -->

<!-- sum_and_round_and_print(pa_totals,"Change in YLLs due to PA in ") -->

<!-- ``` -->

<!-- ### Change in YLLs due to AP compared to the reference scenario {#change_yll_AP} -->

<!-- ```{r} -->

<!-- cat(paste0('Reference scenario:  ', reference_scenario)) -->

<!-- # calculate the changes in YLLs compared to the reference scenario due to changes in AP -->

<!-- # sum the YLLs by scenario and age category -->

<!-- ap_totals <- lapply(io[-c(2,3)],function(x){ -->

<!--   xx <- x$outcomes$pathway_hb$ylls  |> dplyr::select(-c(ends_with("lb") | ends_with("ub"))) -->

<!--   xxx <- sapply(1:NSCEN,function(y) { -->

<!--     xxx <- rowSums(xx[, ap_yll_cols[seq(y, length(ap_yll_cols), by = NSCEN)]]) -->

<!--     sapply(sort(unique(xx$age_cat)), function(z) sum(xxx[xx$age_cat == z])) -->

<!--   }) -->

<!--   colnames(xxx) <- hb_scen_names_upd -->

<!--   xxx -->

<!-- }) -->

<!-- # calculate column sums, round the results and print to html -->

<!-- sum_and_round_and_print(ap_totals,"Change in YLLs due to AP in ") -->

<!-- ``` -->

<!-- ### Change in YLLs due to injury compared to the reference scenario per 100,000 people {#change_yll_injury_100k} -->

<!-- YLLs per 100,000 people by age group by scenario - injury -->

<!-- ```{r scen_prop = "asis"} -->

<!-- cat(paste0('Reference scenario:  ', reference_scenario)) -->

<!-- # calculate the changes in YLLs compared to the reference scenario per 100,000 people due to injury -->

<!-- injury_rates <- lapply(cities, function(x) -->

<!--   rbind( -->

<!--     injury_totals[[x]]/t(repmat(pop_by_age[[x]],NSCEN,1)) * 100000, -->

<!--     Total = colSums(injury_totals[[x]])/rep(sum(pop_by_age[[x]]), -->

<!--                                             length = NSCEN) * 100000) -->

<!--   ) -->

<!-- names(injury_rates) <- cities -->

<!-- # round results and print to html -->

<!-- round_and_print(injury_rates,"Change in YLLs due to injury per 100,000 people in ") -->

<!-- ``` -->

<!-- ### Change in YLLs due to PA compared to the reference scenario per 100,000 people {#change_yll_PA_100k} -->

<!-- ```{r} -->

<!-- cat(paste0('Reference scenario:  ', reference_scenario)) -->

<!-- # calculate the changes in YLLs compared to the reference scenario per 100,000 people due to PA -->

<!-- pa_rates <- lapply(cities, function(x) -->

<!--   rbind( -->

<!--     pa_totals[[x]] / t(repmat(pop_by_age[[x]], NSCEN, 1)) * 100000, -->

<!--     Total = colSums(pa_totals[[x]])/rep(sum(pop_by_age[[x]]), -->

<!--                                         length = NSCEN) * 100000)) -->

<!-- names(pa_rates) <- cities -->

<!-- # round results and print to html -->

<!-- round_and_print(pa_rates,"Change in YLLs due to PA per 100,000 people in ") -->

<!-- ``` -->

<!-- ### Change in YLLs due to AP compared to the reference scenario per 100,000 people {#change_yll_AP_100k} -->

<!-- ```{r} -->

<!-- cat(paste0('Reference scenario:  ', reference_scenario)) -->

<!-- # calculate the changes in YLLs compared to the reference scenario per 100,000 people due to AP -->

<!-- ap_rates <- lapply(cities, function(x) -->

<!--   rbind( -->

<!--     ap_totals[[x]] / t(repmat(pop_by_age[[x]], NSCEN ,1)) * 100000, -->

<!--     Total = colSums(ap_totals[[x]]) / rep(sum(pop_by_age[[x]]), -->

<!--                                           length = NSCEN) * 100000)) -->

<!-- names(ap_rates) <- cities -->

<!-- # round results and print to html -->

<!-- round_and_print(ap_rates,"Change in YLLs due to AP per 100,000 people in ") -->

<!-- ``` -->

<!-- ## BY DISEASE -->

<!-- Change in total deaths compared to the reference scenario (for the city based on real population size but only for ages considered in the model) by scenario -->

<!-- ### Change in deaths by disease compared to the reference scenario {#change_death_disease} -->

<!-- ```{r} -->

<!-- cat(paste0('Reference scenario:  ', reference_scenario)) -->

<!-- # calculate the changes in deaths compared to the reference scenario by disease -->

<!-- # sum the total deaths by scenario -->

<!-- disease_totals <- lapply(io[-c(2,3)],function(x){ -->

<!--   # extract disease counts -->

<!--   xx <- x$outcomes$hb$deaths |> dplyr::select(-c(ends_with("lb") | ends_with("ub")))  -->

<!--   xxx <- sapply(1:NSCEN,function(y){ # sum over scenarios -->

<!--     colSums(xx[, seq(y + 2, ncol(xx), by = NSCEN)]) -->

<!--   }) -->

<!--   colnames(xxx) <- hb_scen_names_upd -->

<!--   # find scenario name of row names -->

<!--   row_scen <- unique(sapply(stringr::str_split(rownames(xxx), "_", n = 3), function(x) x[2])) -->

<!--   # update row names -->

<!--   rownames(xxx) <- gsub(paste0("sc_",row_scen,"_deaths_"), "", rownames(xxx)) -->

<!--   xxx -->

<!-- }) -->

<!-- # sum, round and print the results  -->

<!-- sum_and_round_and_print(disease_totals,"Change in deaths by disease in ") -->

<!-- ``` -->

<!-- ### Change in deaths by disease compared to the reference scenario per 100,000 people  {#change_death_disease_100k} -->

<!-- Change in deaths per 100,000 people (based on real population size but only for ages considered in the model) by scenario in comparison to the reference scenario -->

<!-- ```{r scen_prop = "asis"} -->

<!-- cat(paste0('Reference scenario:  ', reference_scenario)) -->

<!-- # calculate the change in deaths compared to the reference scenario per 100,000 people -->

<!-- # divide the total number of deaths by the population -->

<!-- disease_rates <- lapply(cities, function(x) -->

<!--   rbind( -->

<!--     disease_totals[[x]] / sum(pop_by_age[[x]]) * 100000, -->

<!--     Total = colSums(disease_totals[[x]]) / rep(sum(pop_by_age[[x]]),  -->

<!--                                                length = NSCEN)*100000)) -->

<!-- names(disease_rates) <- cities -->

<!-- # round results and print to html -->

<!-- round_and_print(disease_rates,"Change in deaths by disease per 100,000 people in ") -->

<!-- ``` -->

<!-- ### Change in total YLLs by disease compared to the reference scenario {#change_yll_disease} -->

<!-- Change in total YLLs by disease compared to the reference scenario (based on real population size, for ages considered within the ITHIM model)  -->

<!-- ```{r} -->

<!-- cat(paste0('Reference scenario:  ', reference_scenario)) -->

<!-- # sum the total ylls by scenario -->

<!-- disease_totals <- lapply(io[-c(2,3)],function(x){ -->

<!--   xx <- x$outcomes$hb$ylls |> dplyr::select(-c(ends_with("lb") | ends_with("ub"))) -->

<!--   xxx <- sapply(1:NSCEN,function(y) { -->

<!--     colSums(xx[,seq(y + 2, ncol(xx), by = NSCEN)]) -->

<!--   }) -->

<!--   colnames(xxx) <- hb_scen_names_upd -->

<!--   # find scenario name of row names -->

<!--   row_scen <- unique(sapply(stringr::str_split(rownames(xxx), "_", n = 3), function(x) x[2])) -->

<!--   # update row names -->

<!--   rownames(xxx) <- gsub(paste0("sc_",row_scen,"_ylls_"), "", rownames(xxx)) -->

<!--   xxx -->

<!-- }) -->

<!-- # sum and round results, print to html -->

<!-- sum_and_round_and_print(disease_totals,"Change in YLL by disease in ") -->

<!-- ``` -->

<!-- ### Change in YLL due to disease compared to the reference scenario per 100,000 people {#change_yll_disease_100k} -->

<!-- Change in total YLLs by disease compared to the reference scenario per 100,000 people (based on real population size, for ages considered within the ITHIM model)  -->

<!-- ```{r scen_prop = "asis"} -->

<!-- cat(paste0('Reference scenario:  ', reference_scenario)) -->

<!-- # calculate the change in YLL by disease by dividing by the population -->

<!-- disease_rates <- lapply(cities, function(x) -->

<!--   rbind( -->

<!--     disease_totals[[x]] / sum(pop_by_age[[x]]) * 100000, -->

<!--     Total = colSums(disease_totals[[x]]) / rep(sum(pop_by_age[[x]]), -->

<!--                                                length = NSCEN) * 100000)) -->

<!-- names(disease_rates) <- cities -->

<!-- # round and print results to html -->

<!-- round_and_print(disease_rates,"Change in YLL by disease per 100,000 in ") -->

<!-- ``` -->

```{r, echo=FALSE, results='hide', message=FALSE}

# Export all health results

# hb - gives all AP (for diseases affected by AP only), PA (for diseases affected by PA only), combined AP and PA (for diseases affected by both AP and PA) and inj results
# pathway_hb - gives all AP, PA and inj results (but does not combine the results for diseases affected by both AP and PA)

# initialise variables
health <- list()
measure <- c("ylls", "deaths")
pathway <- c("hb", "pathway_hb")

for (k in measure) { # loop for ylls or deaths
  for (j in pathway) { # loop for hb or pathway_hb
    for (i in names(io)[!names(io) %in% c('scen_prop','ithim_run' )]) { # loop for cities

      # Temporary dataset for health outcomes
      temp <- io[[i]]$outcomes[[j]][[k]]

      # Total population
      #overall_pop <- sum(io[[i]]$demographic$population)

      # Population by age and sex (drop the grouping so the result is a plain lookup table)
      pop_by_age_sex <- io[[i]]$demographic %>% group_by(age, sex) %>%
        summarize(pop_age_sex = sum(population), .groups = "drop")

      # get the health burden into the correct format by creating a long format and adding additional columns
      # create long format
      health[[k]][[j]][[i]] <- gather(temp, key = "cause", "measure",
                                      -sex, -age_cat) %>%
        mutate(city = i, # create city column
               scenario = case_when( # create scenario column
                 grepl("sc_cycle", cause) ~ get_qualified_scen_name('sc_cycle'),
                 grepl("sc_car", cause) ~ get_qualified_scen_name('sc_car'),
                 grepl("sc_bus", cause) ~ get_qualified_scen_name('sc_bus'),
                 grepl("sc_motorcycle", cause) ~ get_qualified_scen_name('sc_motorcycle')
               ),
               level1 = case_when( # create level 1 column
                 grepl("all_cause", cause) ~ "L1: All Cause",
                 grepl("inj", cause) ~ "L1: RTI"
               ),
               level2 = case_when( # create level 2 column
                 grepl("total_cancer", cause) ~ "L2: Cancer",
                 grepl("CVD", cause) ~ "L2: CVD",
                 grepl("respiratory", cause) ~ "L2: Respiratory",
                 grepl("T2D", cause) ~ "L2: Other",
                 grepl("total_dementia", cause) ~ "L2: Other",
                 grepl("Parkinson", cause) ~ "L2: Other",
                 grepl("inj", cause) ~ "L2: RTI"
               ),
               level3 = case_when( # create level 3 column
                 grepl("IHD", cause) ~ "L3: IHD",
                 grepl("lung_cancer", cause) ~ "L3: lung_cancer",
                 grepl("COPD", cause) ~ "L3: COPD",
                 grepl("stroke", cause) ~ "L3: stroke",
                 grepl("T2D", cause) ~ "L3: T2D",
                 grepl("LRI", cause) ~ "L3: LRI",
                 grepl("breast_cancer", cause) ~ "L3: breast_cancer",
                 grepl("colon_cancer", cause) ~ "L3: colon_cancer",
                 grepl("endo_cancer", cause) ~ "L3: endo_cancer",
                 grepl("liver_cancer", cause) ~ "L3: liver_cancer",
                 grepl("total_dementia", cause) ~ "L3: total_dementia",
                 grepl("myeloma", cause) ~ "L3: myeloma",
                 grepl("myeloid_leukemia", cause) ~ "L3: Myeloid Leukemia",
                 grepl("Parkinson", cause) ~ "L3: Parkinson",
                 grepl("head_neck_cancer", cause) ~ "L3: head_neck_cancer",
                 grepl("stomach_cancer", cause) ~ "L3: stomach_cancer",
                 grepl("inj", cause) ~ "L3: RTI"
               ),
               dose = case_when( # create dose column
                 grepl("_pa_ap_", cause) ~ "PA and AP",
                 grepl("_pa_", cause) ~ "PA",
                 grepl("_ap_", cause) ~ "AP",
                 grepl("_inj", cause) ~ "RTI"
               )
        ) %>% # End mutate
        # join with demographic information
        left_join(pop_by_age_sex, by = c("age_cat" = "age", "sex" = "sex"))

        # %>%
        # create new columns showing burden per 100000 people by age or by age and sex
        # mutate(metric_100k = measure / overall_pop * 100000,
        #        metric_100k_sex = measure / overall_pop * 100000)

    } # Loop for each city
  } # Loop for each pathway
} # Loop for each measure


# for each health burden and pathway join all cities into one dataframe
ylls <-  rbindlist(health$ylls$hb, use.names = T) #|> filter(!str_detect(cause, "_lb") & !str_detect(cause, "_ub"))
ylls_pathway <- rbindlist(health$ylls$pathway_hb, use.names = T) #|> filter(!str_detect(cause, "_lb") & !str_detect(cause, "_ub"))
deaths <- rbindlist(health$deaths$hb, use.names = T) #|> filter(!str_detect(cause, "_lb") & !str_detect(cause, "_ub"))
deaths_pathway <- rbindlist(health$deaths$pathway_hb, use.names = T) #|> filter(!str_detect(cause, "_lb") & !str_detect(cause, "_ub"))


# # Export ylls HB
# output_file <- "results/multi_city/health_impacts/data.xlsx"
# writexl::write_xlsx(list(ylls = ylls,
#                          ylls_pathway = ylls_pathway,
#                          deaths = deaths,
#                          deaths_pathway = deaths_pathway),
#                     path = output_file)

# Export the health results as csv files (file names carry no output version number)

# Export ylls HB
write.csv(ylls,
          'results/multi_city/health_impacts/ylls.csv',
          row.names = FALSE)
# Export ylls PATHWAY_HB
write.csv(ylls_pathway,
          'results/multi_city/health_impacts/ylls_pathway.csv',
          row.names = FALSE)
# Export deaths HB
write.csv(deaths,
          'results/multi_city/health_impacts/deaths.csv',
          row.names = FALSE)
# Export deaths PATHWAY_HB
write.csv(deaths_pathway,
          'results/multi_city/health_impacts/deaths_pathway.csv',
          row.names = FALSE)


```
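
The export chunk above reshapes each wide health-burden table into long format, labels every row by matching patterns in the column name, and joins the population by age and sex. The sketch below reproduces that idea on toy data in base R. It is an illustration under stated assumptions, not the report's code: the column names in `hb`, the population table `pop` and all values are made up, and nested `ifelse()` with `merge()` stand in for `case_when()` and `left_join()`.

```r
# Toy wide health-burden table; column names and values are hypothetical
hb <- data.frame(
  sex     = c("male", "female"),
  age_cat = c("15-49", "15-49"),
  sc_cycle_ylls_pa_ap_IHD = c(-1.2, -0.8),
  sc_cycle_ylls_pa_T2D    = c(-0.3, -0.2),
  sc_cycle_ylls_inj       = c(0.5, 0.1),
  stringsAsFactors = FALSE
)

# Toy population by age and sex (hypothetical)
pop <- data.frame(
  age = c("15-49", "15-49"),
  sex = c("male", "female"),
  pop_age_sex = c(1000, 1100),
  stringsAsFactors = FALSE
)

# wide -> long: one row per (sex, age_cat, cause)
id_cols    <- c("sex", "age_cat")
value_cols <- setdiff(names(hb), id_cols)
long <- do.call(rbind, lapply(value_cols, function(col) {
  data.frame(hb[id_cols], cause = col, measure = hb[[col]],
             stringsAsFactors = FALSE)
}))

# label the pathway; the most specific pattern must be tested first,
# because "_pa_ap_" also contains "_pa_" and "_ap_"
long$dose <- ifelse(grepl("_pa_ap_", long$cause), "PA and AP",
             ifelse(grepl("_pa_", long$cause), "PA",
             ifelse(grepl("_ap_", long$cause), "AP",
             ifelse(grepl("_inj", long$cause), "RTI", NA_character_))))

# attach the population; all.x = TRUE keeps rows without a match (left join)
long <- merge(long, pop,
              by.x = c("age_cat", "sex"), by.y = c("age", "sex"),
              all.x = TRUE)
long
```

The order of the pattern tests matters in the same way for `case_when()`, which returns the value of the first condition that is true.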