Commit ef09d51: Add vignette a3 documentation
dinilu committed Feb 13, 2023 (1 parent: fd675c8)

Showing 1 changed file: vignettes/a3_Downscale_with_Cross-validation.Rmd (48 additions, 4 deletions)

```{r setup}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, eval = FALSE)
```

Here, we document the downscaling, with cross-validation, of TraCE21ka paleoclimate data, using the UERRA dataset as the most up-to-date, finer-resolution "historical" climate data for the Western Mediterranean region, together with the methods and tools implemented in the `climate4R` framework. This evaluation allows us to test several configurations of the downscaling process and select the optimal one before downscaling the whole dataset.

More information can be found [here](https://github.com/SantanderMetGroup/notebooks/blob/master/2019_downscaleR_GMD.pdf), [here](https://github.com/SantanderMetGroup/notebooks), and [here](https://github.com/SantanderMetGroup/downscaleR/wiki).


## Downscaling

The downscaling process is divided into several steps.

### Defining variables and folds for cross-validation

Let's start by loading the R package `dsclim`. Then, we create a vector (`vars`) containing the names of the climate variables to be used as predictors in the downscaling process. The variable `local.var` is assigned the value "tasmin", the first variable that we aim to downscale. The next line creates a list of six elements splitting the years 1961 to 1990 into five-year groups; these define the folds for the cross-validation. We fit one downscaling model for each five-year period, using data from the remaining years for training and the held-out period to evaluate/validate the model; hence, we fit six models. Finally, the `trace.dir` variable is set to the path of a directory containing TraCE21ka data.

```{r defining_predictos_and_folds}
library(dsclim)

# ... (definitions of `vars`, `local.var`, and `folds` collapsed in the diff view)
trace.dir <- "../../Data/TraCE21ka/"
```
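
Since the diff collapses those definitions, here is a minimal hedged reconstruction based on the description above; the predictor names in `vars` are illustrative placeholders (the actual vector is hidden), while `local.var`, the folds, and `trace.dir` follow the text:

```{r reconstructed_definitions}
# Illustrative predictor set (the actual names are hidden by the collapsed diff)
vars <- c("tas", "tasmax", "tasmin", "hurs", "ps")

local.var <- "tasmin" # variable to downscale first

# Six five-year folds covering 1961-1990 for the cross-validation
folds <- list(1961:1965, 1966:1970, 1971:1975,
              1976:1980, 1981:1985, 1986:1990)

trace.dir <- "../../Data/TraCE21ka/"
```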

### Defining parameters for downscaling

The following code sets up parameters for the R package `downscaleR`. The `spatial.pars` list defines the variables to be combined using a Principal Component Analysis (PCA); here, we specify all predictor variables. Hence, models using the `spatial.pars` argument will use all the variables to fit a PCA and transform them. More specifically, the PCA will retain the axes that explain up to 90% of the variance (`v.exp`) and will not be rotated. The `local.pars.XXX` objects define different configurations of local predictors for the models: `M2` models use only the target variable as the local variable, whereas `M3` models use all predictor variables as local variables. Finally, models ending in 1 use the information of the co-located pixel as predictor, whereas models ending in 4 use the information from the 4 closest pixels.

```{r defining_ds_parameters}
library(downscaleR)

# ... (most parameter definitions collapsed in the diff view)
local.pars.M34 <- list(n = 4, vars = vars)
```
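
A hedged reconstruction of the collapsed definitions, following the description above and the argument conventions of `downscaleR::prepareData`; treat the exact values as assumptions:

```{r reconstructed_ds_parameters}
# PCA over all predictors: keep the components explaining up to 90% of the
# variance, without rotation
spatial.pars <- list(which.combine = vars, v.exp = 0.9, rot = FALSE)

# M2*: only the target variable as local predictor; M3*: all predictors.
# *1: co-located pixel only; *4: the 4 closest pixels.
local.pars.M21 <- list(n = 1, vars = local.var)
local.pars.M24 <- list(n = 4, vars = local.var)
local.pars.M31 <- list(n = 1, vars = vars)
local.pars.M34 <- list(n = 4, vars = vars)
```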

### Data loading

To load the data, we first retrieve the names of all files in the TraCE21ka directory. The first line assigns the TraCE21ka file names to the variable `trace.file.names`. Then, we load the TraCE21ka files into the variable `hist.trace`, specifying the years 1961-1990, since we do not intend to downscale the whole period here, only the historical period needed for the cross-validation. The following lines check that the data were loaded properly.

```{r load_trace_data}
trace.file.names <- traceFileNames(trace.dir)
hist.trace <- dsclim::loadTrace(trace.file.names, years = 1961:1990)
# ... (some checking lines collapsed in the diff view)
head(hist.trace$Dates[[1]]$end)
tail(hist.trace$Dates[[1]]$end)
```

Now, we load the data from the UERRA project, since this will be our current high-resolution climate data to be used as predictand. The first line loads the local variable (`local.var`) from the `2m_temperature/latlon/1961-90_2m_tmin.nc` file in the UERRA-HARMONIE directory and assigns it to the variable `uerra.data`. The following lines check that the data were loaded properly and that their dates match those of the loaded TraCE21ka data.

```{r load_uerra_data}
uerra.data <- loadUerra("../../Data/UERRA/UERRA-HARMONIE/2m_temperature/latlon/1961-90_2m_tmin.nc", local.var)
# ... (checking lines collapsed in the diff view)
tail(uerra.data$Dates$end)
```

### Downscaling {.tabset .tabset-pills}

Now, we can start fitting models with the cross-validation approach.

In general, the code uses the `downscaleR` package to fit a Generalized Linear Model (GLM) downscaling method, i.e. a model that predicts the high-resolution UERRA data from the coarse TraCE21ka predictors. The `family` argument sets the distribution of the model to a Gaussian distribution with an `identity` link. The `folds` argument sets the groups of years to use in the cross-validation. The `prepareData.args` argument is a list that sets the global, local, and spatial predictors to use in the model.

The result of the downscaling is a high-resolution dataset for the whole period (1961-1990), in which each five-year group is predicted by a model that was not fitted with that group's data. Finally, we use the `spatialPlot` function from the `visualizeR` package to plot an average of the output (e.g. a `climatology` calculated with the `transformeR` package).
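
The actual calls are collapsed in the diff view below. As a hedged template, each model is fitted with a call of this shape, following the `downscaleR::downscaleCV` interface; only `local.predictors` and `spatial.predictors` change between models:

```{r downscaleCV_template}
# Hedged template for the collapsed calls in the tabs below; shown here for
# GLM.1.sp, which uses spatial predictors only
GLM.1.sp <- downscaleR::downscaleCV(
  x = hist.trace,                        # coarse TraCE21ka predictors
  y = uerra.data,                        # high-resolution UERRA predictand
  method = "GLM",
  family = gaussian(link = "identity"),  # Gaussian GLM with identity link
  folds = folds,                         # six five-year folds
  prepareData.args = list(
    global.vars = NULL,                  # no global predictors in any model
    local.predictors = NULL,             # e.g. local.pars.M21 for GLM.21
    spatial.predictors = spatial.pars    # NULL for models without ".sp"
  )
)

# Map the 1961-1990 climatology of the downscaled output
visualizeR::spatialPlot(transformeR::climatology(GLM.1.sp))
```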

To see the code for each model, select a tab:

#### GLM.1.sp

This model uses the spatial predictors (`spatial.pars`) but no global or local predictors.

```{r fit_GLM.1.sp}
GLM.1.sp <- downscaleR::downscaleCV(
  x = hist.trace,
  # ... (remaining arguments collapsed in the diff view; see the template above)
)

visualizeR::spatialPlot(transformeR::climatology(GLM.1.sp))
```


#### GLM.21

This model uses the target variable as a local predictor at 1 local pixel, but no spatial or global predictors.

```{r fit_GLM.21}
GLM.21 <- downscaleR::downscaleCV(
  x = hist.trace,
  # ... (remaining arguments collapsed in the diff view; see the template above)
)

visualizeR::spatialPlot(transformeR::climatology(GLM.21))
```


#### GLM.21.sp

This model uses the target variable as a local predictor at 1 local pixel, plus the spatial predictors, but no global predictors.

```{r fit_GLM.21.sp}
GLM.21.sp <- downscaleR::downscaleCV(
  x = hist.trace,
  # ... (remaining arguments collapsed in the diff view; see the template above)
)

visualizeR::spatialPlot(transformeR::climatology(GLM.21.sp))
```


#### GLM.24

This model uses the target variable as a local predictor at the 4 closest pixels, but no spatial or global predictors.

```{r fit_GLM.24}
GLM.24 <- downscaleR::downscaleCV(
  x = hist.trace,
  # ... (remaining arguments collapsed in the diff view; see the template above)
)

visualizeR::spatialPlot(transformeR::climatology(GLM.24))
```


#### GLM.24.sp

This model uses the target variable as a local predictor at the 4 closest pixels, plus the spatial predictors, but no global predictors.

```{r fit_GLM.24.sp}
GLM.24.sp <- downscaleR::downscaleCV(
  x = hist.trace,
  # ... (remaining arguments collapsed in the diff view; see the template above)
)

visualizeR::spatialPlot(transformeR::climatology(GLM.24.sp))
```


#### GLM.31

This model uses all predictor variables as local predictors at 1 local pixel, but no spatial or global predictors.

```{r fit_GLM.31}
GLM.31 <- downscaleR::downscaleCV(
  x = hist.trace,
  # ... (remaining arguments collapsed in the diff view; see the template above)
)

visualizeR::spatialPlot(transformeR::climatology(GLM.31))
```


#### GLM.34

This model uses all predictor variables as local predictors at the 4 closest pixels, but no spatial or global predictors.

```{r fit_GLM.34}
GLM.34 <- downscaleR::downscaleCV(
  x = hist.trace,
  # ... (remaining arguments collapsed in the diff view; see the template above)
)

visualizeR::spatialPlot(transformeR::climatology(GLM.34))
```

## Evaluate downscaling performance

Now that we have downscaled versions of the data (one per model configuration), we can evaluate their values against the "real" data, represented here by the UERRA dataset.

### Create function to extract info

First, we define a function that will run the evaluation and extract the relevant metrics.

```{r function_for_accuracy_metrics}
ds_validation <- function(models, obs, measure.code = "bias", index.code) {
  l <- lapply(
    # ... (function body collapsed in the diff view; a hedged
    # reconstruction follows)
  )
}
```
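
Since the body is collapsed in the diff, here is a hedged sketch of what such a function might look like, assuming the `valueMeasure()` function from the `climate4R.value` package (whose `measure.code`/`index.code` arguments match the signature above); the body and the returned structure are assumptions:

```{r reconstructed_ds_validation}
# Hypothetical reconstruction: compute a VALUE measure (e.g. the bias of an
# index) for each downscaled model against the UERRA observations.
# `models` is a character vector of object names, as used below.
ds_validation <- function(models, obs, measure.code = "bias", index.code) {
  l <- lapply(
    models,
    function(m) {
      out <- climate4R.value::valueMeasure(
        y = obs,                      # observed (UERRA) grid
        x = get(m),                   # downscaled grid for this model
        measure.code = measure.code,  # e.g. "bias"
        index.code = index.code       # e.g. "mean", "sd", "skewness"
      )
      out$Measure  # assumed: keep the measure grid (plotted directly later)
    }
  )
  names(l) <- models
  l
}
```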

### Run evaluation

Then, we can use this function to evaluate all the models. To speed things up, we run this code on several cores using the `parallel` package.

```{r calculate_accuracy_metrics}
ds.methods <- c("GLM.1.sp", "GLM.21", "GLM.21.sp", "GLM.24", "GLM.24.sp", "GLM.31", "GLM.34")

# ... (parallel evaluation collapsed in the diff view; a hedged
# reconstruction follows)
names(val.results) <- value.indices
```
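
A hedged sketch of the collapsed evaluation, assuming `parallel::mclapply` and the three bias metrics used in the plots below; `value.indices` and the number of cores are assumptions:

```{r reconstructed_evaluation}
library(parallel)

value.indices <- c("mean", "sd", "skewness")  # assumed index codes

# One list element per index; each element holds the seven models' results
val.results <- mclapply(
  value.indices,
  function(i) {
    ds_validation(ds.methods, obs = uerra.data,
                  measure.code = "bias", index.code = i)
  },
  mc.cores = 3  # assumed
)
names(val.results) <- value.indices
```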

## Plot evaluation results

Finally, we can plot the results from the evaluations in several formats.

### Error (or bias) maps {.tabset .tabset-pills}

First, we plot the mean bias in each pixel (the mean includes 30 years × 12 months of data). To see the map for each model, select a tab:

#### GLM.1.sp
```{r map_biasses_glm1sp}
# (This chunk, and the tabs for the other six models, are collapsed in the
# diff view; each tab plots one element of val.results[[1]], the last being
# the seventh model)
visualizeR::spatialPlot(val.results[[1]][[7]])
```
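
As a hedged example of what each tab presumably contains (the names follow the reconstructed `ds_validation()`/`val.results` above; the styling argument is an assumption):

```{r example_bias_map}
# Mean bias map for the first model, GLM.1.sp
visualizeR::spatialPlot(
  val.results[["mean"]][["GLM.1.sp"]],
  backdrop.theme = "coastline"  # assumed styling
)
```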

### Violin plots summarizing results

To get an overall perspective on all the metrics for the different models, we can use violin plots for each model and each of the three bias metrics (mean, sd, and skewness).

```{r plot_results}
library(reshape2)
library(ggplot2)

# ... (reshaping into `test` and the remaining plot layers are collapsed in
# the diff view; a hedged reconstruction follows)
ggplot(test, aes(x = L2, y = value)) +
  geom_violin()
```
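
A hedged reconstruction of the collapsed reshaping and plotting, assuming `reshape2::melt()` over the nested results list (`L1`/`L2` are `melt()`'s default names for the outer and inner list levels, i.e. metric and model):

```{r reconstructed_violin_plots}
# Flatten the nested list (metric -> model -> grid) into a long data frame
test <- melt(lapply(val.results, function(metric) {
  lapply(metric, function(g) g$Data)  # keep only the data arrays
}))

ggplot(test, aes(x = L2, y = value)) +
  geom_violin() +
  facet_wrap(~L1, scales = "free_y") +  # one panel per bias metric (assumed)
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```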

### Error or bias histograms

Finally, we complement the previous evaluations by plotting the histograms of each metric.

```{r plot_residual_hist, out.width=500}
library(transformeR)
par(mfrow = c(3, 3))
# ... (histogram loop collapsed in the diff view; a hedged
# reconstruction follows)
```
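
A hedged sketch of what the collapsed loop might look like (the 3 × 3 layout suggests one panel per model; the chosen metric and labels are assumptions):

```{r reconstructed_histograms}
# One histogram of the per-pixel mean bias for each of the seven models
for (m in ds.methods) {
  hist(val.results[["mean"]][[m]]$Data,
       main = m, xlab = "Bias")
}
```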
