2_SDM tutorial.Rmd

---
title: "Practice SDM for Clubes de Ciencias!"
output: html_notebook
---

Here we go! We are going to be making our first SDM of clubes de ciencias, and doing it with computer programming in the R language!

Remeber, the first thing we have to do is let R know which packages we are using to make our SDMs.

The three R packages are called `raster`, `rgdal`, and `dismo`

First, remove the \# before the line of code, and then run the `install.packages` function. After you run the line of code, add the \# back into the line of code.

```{r}
# install.packages(c('raster', 'rgdal', 'dismo','rJava','maptools'))
```

### Trouble w/ installing `rJava`?

I was having some trouble getting `rJava` to load on my mac, so then I ran the following code: source of this tip [here](https://zhiyzuo.github.io/installation-rJava/). You will have to remove the `#` to get the code below to run

**BUT please ignore it if rJava loads for you.**

```{r}
# install.packages("rJava")
# dyn.load('/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/server/libjvm.dylib')
# library(rJava)
```

### Trouble installing `dismo`?

If the dismo package won't install, you may have to click the **console** tab below, and when asked

> Do you want to install from sources the packages which need compilation? (Yes/no/cancel)

type "no" and press "enter

### Time for a sloth SDM!

![](https://cdn6.dissolve.com/p/D1294_19_688/D1294_19_688_1200.jpg)

Now, we'll import the set of occurrence data that we'll be using for our first SDM. We have to first load the `dismo` R pacakge.

```{r}
library(dismo) # This line of code loads the dismo package
file <- paste(system.file(package="dismo"), "/ex/bradypus.csv", sep="") # This line of code loads a whole Excel database that contains GPS points for the sloth.
```

Run the code below to view a table of what the data we saved as the object `file` looks like in R. Notice, we save the table as **sloth_data** and then have to type `head(sloth_data)` to view the first five rows of the data table.

```{r}
sloth_data <- read.table(file,  header=TRUE,  sep=",") # This line of code turns our excel spreadsheet into a table that R can read.
head(sloth_data)  # this line of code lets us actually look at the table in R
```

The table with all the sloth data has lots of extra information. We don't need all of it to make an SDM, so we'll keep just the GPS points (lon, lat) for where the sloth has been observed.

```{r}
sloth_data <- sloth_data[,2:3] # This line of code tells R to keep only the longitude (lon) and latitude (lat) for the sloths
head(sloth_data)
```

### The climate data

Similar to how we pulled in sloth data, we now have to locate and bring into R all of the climate data.

```{r}
link_to_find_climate_data <- file.path(system.file(package="dismo"), 'ex') # This line of code tells R where we stored out climate data.

climate_files <- list.files(link_to_find_climate_data, pattern='grd$', full.names=TRUE )
climate_files # here we see that R found many files related to global climate on the computer.
```

Our next step is to group our climate data. R calls this **stacking** so now, let's **stack** all of our climate data.

```{r}
stacked_climate_data <- stack(climate_files)
stacked_climate_data
```

Let's view what our climate data are called. They have kind of weird names. Bio1, bio12, bio16. Basically each one represents global temperatures, global precipitation and more!

For more information on what each variable represents, you can visit this [link](https://www.worldclim.org/data/bioclim.html).

```{r}
names(stacked_climate_data)
```

Let's visualize our climate on a map. Areas that are colored in green mean those areas have higher annual temperature averages (bio1) or higher precipitation amounts (bio12).

```{r}
plot(stacked_climate_data)
```

We can even view one layer of climate data at a time if we want. So here, let's visualize bio1, which represents annual mean temperature.

```{r}
plot(stacked_climate_data, 1)
```

### Mapping climate data and occurrence data

Now we can visually show, in the same window, one layer of climate data, plus a map of the world, plus the occurrence data for our species. It will take a few steps, but let's start with loading a new library in R called `maptools` which is super useful for making maps in R.

```{r}
library(maptools)
data(wrld_simpl) # This line of data will download a map of the world to R.
plot(wrld_simpl) # Here we can get a look at what the map looks like.
```

Now, let's make a quick map of the sloth's range. We will:

1)  plot the world map, and then
2)  use the points() function to add in points wherever a sloth has been spotted.

You can switch out the color `blue` for another color if you want :)

```{r}
plot(wrld_simpl)
points(sloth_data, col='blue')

```

### Extracting data

Now, we gather climate information at every blue dot on our map. This is our way of knowing whether sloths like area with lots of rain, or warm temperatures. This is the best and most useful part of a species distribution model.

After taking the next few steps and writing the next lines of code, we have many data points corresponding to the type of climate that the species likes best, and other data points corresponding to climate the sloth doesn't like as much.

```{r}
set.seed(0) # We need this line of R code to make we get the same results every time we run this code.
group <- kfold(sloth_data, 5)
locations_sloths_like_train <- sloth_data[group != 1, ]
locations_sloths_like_test <- sloth_data[group == 1, ]
```

#### What kind of climate does the sloth **NOT** like?

```{r}
extent_of_our_map <- extent(-90, -32, -33, 23) # this is the GPS range for our whole map.
stacked_climate_data <- dropLayer(stacked_climate_data, 'biome') # get rid of one piece of climate data that is tough to work with
locations_sloths_dislike <- randomPoints(stacked_climate_data, n=1000, ext=extent_of_our_map, extf = 1.25) # This line of code gather data on the climate that the sloth DOES NOT like very much.
```

We don't need to understand the next lines of code that much, but they are important for our machine learning (in other words our statistical) models.  If you are interested in machine learning, here is where we create our **training** and **testing** datasets. This is a key step to letting us know if the predictions we make are accurate.

```{r}
colnames(locations_sloths_dislike) = c('lon', 'lat')
group <- kfold(locations_sloths_dislike, 5)
locations_sloth_dislikes_train <- locations_sloths_dislike[group != 1, ]
locations_sloth_dislikes_test <- locations_sloths_dislike[group == 1, ]
```

### Running the Maxent machine learning model

Now, we make our predictive maps using the modeling software called `Maxent`. This is a machine learning algorithm for making predictions. This part might take a little while, so be patient : )

The output from this step is a set of the most important climate variables for the sloth. Which variable is most important for the sloth? Bio7 and Bio 12, which relate to temperature range and precipitation.
```{r}
maxent()
# jar <- paste(system.file(package="dismo"), "/java/maxent.jar", sep='')
maxent_results <- maxent(stacked_climate_data, locations_sloths_like_train)
plot(maxent_results)
```

Next, we can create **response plots** for each of our climate variables. In the plots below, the y-axis represents the chance of finding a sloth at a particular value of a climate variable (the x-axis). Run the code below, the focus in on the *bio12* graph.  As precipitation values increase along the x-axis, you see that the chance of us finding a sloth increases (the red line goes up!).

```{r}
response(maxent_results)
```

One of the most useful parts of creating SDMs is the maps that result from our maxent model predictions.  Run the code block below. In the map on the left, you see bright green in areas where it's highly likely we find sloths.  In the map on the right you'll see black X's where sloths were seen, and areas of green where our model predicted sloths might be found.

```{r}
e <- evaluate(locations_sloths_like_test, locations_sloth_dislikes_test, maxent_results, stacked_climate_data)
e
px <- predict(stacked_climate_data, maxent_results, ext=extent_of_our_map, progress='')
par(mfrow=c(1,2))
plot(px, main='Maxent, raw values')
plot(wrld_simpl, add=TRUE, border='dark grey')
tr <- threshold(e, 'spec_sens')
plot(px > tr, main='presence/absence')
plot(wrld_simpl, add=TRUE, border='dark grey')
points(locations_sloths_like_train, pch='+')

```