Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update habitatmap_stdized and habitatmap_terr #50

Merged
merged 48 commits into from
May 12, 2021
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
30729be
add link to zenodo
ToonHub Feb 3, 2021
c1d752c
add correction for 3140_rbbmr
ToonHub Feb 3, 2021
d216633
aggregate two records of the same type within a polygon
ToonHub Feb 3, 2021
2b905bd
remove write_sessioninfo
ToonHub Feb 4, 2021
71cb190
add renv.lock file
ToonHub Feb 9, 2021
ecd3ad8
add openssl to calculate file hashes
ToonHub Feb 9, 2021
24cd961
update habmap_correction, only keep current code corrections applied
ToonHub Feb 9, 2021
6bd0668
check if correct file version is used based on file hashes
ToonHub Feb 9, 2021
9d7f196
add openssl to check data source version based on hashes
ToonHub Feb 9, 2021
6625e5b
check file hashes of used data sources
ToonHub Feb 9, 2021
81c508e
add renv.lock
ToonHub Feb 9, 2021
5e8b060
add some code to stop when evaluating the hashes gives a deviation fr…
w-jan Feb 10, 2021
597fc28
add stop-rules (cfr previous commit)
w-jan Feb 10, 2021
d1a3d9e
use only md5 hashes to check forr correct version (not sha256)
ToonHub Feb 24, 2021
20ec2da
update project name: process_habitatmap --> generate_habitatmap_stdized
ToonHub Feb 24, 2021
046f32a
add version geospatial libraries
ToonHub Feb 24, 2021
c080e0b
update title
ToonHub Feb 24, 2021
55c2eef
add not-ignored files
ToonHub Feb 24, 2021
e61d612
add code to check result
ToonHub Feb 24, 2021
1eacc10
for habitatmap code '31xx,rbbmr' set for both types certain = TRUE
ToonHub Feb 24, 2021
197fc39
add crs = 31370 when reading habitatmap
ToonHub Feb 24, 2021
fde539c
modified the aggregation of different records for the same type withi…
ToonHub Feb 24, 2021
73e4937
use only md5 hashes to check for correct version
ToonHub Feb 24, 2021
512f6f6
change link zenodo
ToonHub Feb 24, 2021
5a3d513
update session info
ToonHub Feb 24, 2021
d156420
add code for checking result habitatmap_terr
ToonHub Feb 24, 2021
689f139
change columns order in types table
ToonHub Feb 24, 2021
1b5602f
avoid unintended overwriting of result as file hashes will change
ToonHub Feb 24, 2021
50c363c
habmap_std/terr: also print sha256sum in the file evaluation chapter *
florisvdh Mar 2, 2021
01b2e65
habmap_std/terr: define filepath earlier
florisvdh Mar 2, 2021
b330115
habmap_stdized: create data folder (as in habitatmap_terr)
florisvdh Mar 2, 2021
943f46d
habmap_std/terr: apply explicit GPKG timestamp for reproducibility
florisvdh Mar 2, 2021
0ab5e58
habmap_std/terr: improve comment on ISO 8601 timestamp
florisvdh Mar 3, 2021
cd3d09b
habitatmap_terr: drop sessioninfo.txt
florisvdh Mar 3, 2021
14d530a
habmap_std/terr, timestamping: no need to call DBI explicitly
florisvdh Mar 3, 2021
ff4a8b4
habmap_std/terr, timestamping: drop RSQLite, use GDAL env var
florisvdh Mar 8, 2021
16f9d12
Merge pull request #52 from inbo/update_habitatmap_stdized_terr_fv
ToonHub Mar 11, 2021
a44de74
change table to convert 7140 into 7140_meso
ToonHub Mar 11, 2021
efc3a48
change timestamp to creation date
ToonHub Mar 11, 2021
0ea2617
change some of the documentation
ToonHub Mar 11, 2021
2dcfd7a
add checks
ToonHub Mar 11, 2021
3a08cac
change timestamp to creation date
ToonHub Mar 11, 2021
fe92f98
change md5_ref
ToonHub Mar 11, 2021
5afe0a9
drop eval = FALSE as md5-hash is stable due to fixed timestamp
ToonHub Mar 11, 2021
f007f41
provide the code from the original habitat map (for example 31xx_rbbm…
ToonHub Mar 11, 2021
f1b1eb6
change md5_ref
ToonHub Mar 11, 2021
b088c44
add some checks
ToonHub Mar 11, 2021
92ed3f8
Update src/generate_habitatmap_stdized/10_generate_habmap_stdized.Rmd
ToonHub May 12, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
174 changes: 132 additions & 42 deletions src/generate_habitatmap_stdized/10_generate_habmap_stdized.Rmd
Original file line number Diff line number Diff line change
@@ -1,58 +1,77 @@
# Generate standardized habitatmap

## Data source
The shapefile of the BWK and Natura 2000 habitat map of Flanders can be downloaded here[http://www.geopunt.be/download?container=bwk2%5C2018&title=Biologische%20waarderingskaart%20-%20Natura%202000%20Habitatkaart].

```{r read_raw_data}
The shapefile of the BWK and Natura 2000 habitat map of Flanders can be downloaded [here](https://zenodo.org/record/4428002#.YBqzu-hKigY).

path <- fileman_up("n2khab_data")
file <- "10_raw/habitatmap"
To be sure we will use the correct version of the data source (habitatmap_2020), we will derive the md5 and sha256 file hashes and compare it to the file hashes in the [data source version overview table](https://docs.google.com/spreadsheets/d/1E8ERlfYwP3OjluL8d7_4rR1W34ka4LRCE35JTxf3WMI/edit#gid=2100595853).

habitatmap_sf <- st_read(file.path(path, file),
layer = "habitatmap")
```
```{r}

## Correction of some of the codes in the Habitat map
path <- fileman_up("n2khab_data")
file <- "10_raw/habitatmap"

For some polygons in the habitat map the vegetation type is not certain. When
this is the case, the code contains the possible vegetation types separated with
a ','. However, in some cases this way of coding is not used. In that case we
change the code according to the general coding rules (for example 3130_rbbmr is
changed to 3130,rbbmr). This makes processing of the habitat map more
straightforward. Table \@ref(tab:code_corrected) shows the corrected codes.
mypath <- file.path(path, file)

hashes <-
tibble(filepath = str_c(mypath, "/",
list.files(path = mypath,
recursive = TRUE)
)) %>%
mutate(filename = str_match(filepath, "(.+\\/)*(.+)")[,3],
md5 = map(filepath, function(x) {
file(x) %>% md5 %>% str_c(collapse = '')
}) %>% as.character,
sha256 = map(filepath, function(x) {
file(x) %>% sha256 %>% str_c(collapse = '')
}) %>% as.character,
md5_ref = c("f767a12540b2bde435c6709c9c9675ad",
"358ff672c2fae48eba5bd09f8a671675",
"f881f61a6c07741b58cb618d8bbb0b99",
"1da89a5dc267bbd427c8c07fcf63f344",
"79ae8b4b3c6d970a1f10e78f3eca9ef7"),
sha256_ref = c("c76819ecf3f1276caea9ff8514b4c0f2d201a12e4b13dabd70dd7c42bc05318a",
"be029dba3d7ff730407842eee3b43bbec8b2536701ccf7e54c114ba6ee8ac6f4",
"a8225ebf83d19412567ce9b909f03631626eec43dffb1f067c7edf3d59ab3aa4",
"c457ab6ae5728b4c0da79793a6d328806aeca13234a76b7d41b1cf8c50b78a19",
"5f717cd7393869e396ddb174a618bbf615521968fad6af605ed5586d250768cd"),
match = md5 == md5_ref & sha256 == sha256_ref) %>%
select(filename,
md5,
sha256,
md5_ref,
sha256_ref,
match)



kable(hashes) %>%
kable_styling()

```{r, eval = FALSE}
if (!all.equal(hashes$md5, hashes$md5_ref) == TRUE | !all.equal(hashes$sha256, hashes$sha256_ref) == TRUE) {
stop(cat("The source map is NOT up to date ! Please check the datasource. "))
}
```

habmap_correction <- gs_key("1PU9MDyJKZz2DOrKqb-SIIVmSaR1mh4Ke6v0WCqhr8t4") %>%
gs_read(ws = 3)

write_vc(habmap_correction, "habmap_correction/habmap_correction",
sorting = "code")
```{r read_raw_data}

habitatmap_sf <- st_read(file.path(path, file),
layer = "habitatmap")
```

```{r code_corrected}

habmap_correction <- read_vc("habmap_correction/habmap_correction")

habmap_correction %>%
kable(caption = "Corrected codes in habitat map") %>%
kable_styling()

```

## Processing of the attribute table

Every polygon in the habitat map can consist of maximum 5 different vegetation types. This information is stored in the columns 'HAB1', 'HAB2',..., 'HAB5' of the attribute table. The estimated fraction of each vegetation type within the polygons is stored in the columns 'PHAB1', 'PHAB2', ..., 'PHAB5'.

We will convert the attribute table to a long format, so that every row contains one vegetation type.

In some cases, the vegetation type is uncertain, and 2 or 3 possible vegetation types are provided, separated with a ','. We will split the different possible vegetation types and create one row for each of them. An additional variable 'certain' will be FALSE if the orginal habitatmap code consists of 2 or 3 possible vegetation types, and TRUE if only one vegetation type is provided.


```{r select_polygons}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a note - feel free to make such changes. Gathering HAB & PHAB columns can now be done in one step using pivot_longer_spec(). E.g. first example below https://tidyr.tidyverse.org/articles/pivot.html#by-hand-1.


habmap_sf <- habmap_sf %>%
habmap_sf <- habitatmap_sf %>%
filter(!(HAB1 == "gh" & PHAB1 == 100)) %>%
mutate(polygon_id = TAG, # unieke id
description_orig = str_c(PHAB1, "% ", HAB1,
Expand Down Expand Up @@ -81,6 +100,38 @@ habmap_longHAB <- habmap_sf %>%
filter(!is.na(code)) %>%
filter(! code %in% c("gh", "x"))

```

## Correction of some of the codes in the Habitat map

For some polygons in the habitat map the vegetation type is not certain. When
this is the case, the code contains the possible vegetation types separated with
a ','. However, in some cases this way of coding is not used. In that case we
change the code according to the general coding rules (for example 3130_rbbmr is
changed to 3130,rbbmr). This makes processing of the habitat map more
straightforward. Table \@ref(tab:codeCorrected) shows the corrected codes and the number of polygons for which the correction is applied.


```{r }

habmap_correction <- read_vc("habmap_correction/habmap_correction")

overview_habmap_correction <- habmap_longHAB %>%
inner_join(habmap_correction, by = "code") %>%
group_by(code, code_corrected) %>%
summarise(n_polygons = n()) %>%
ungroup()

```


```{r codeCorrected}
overview_habmap_correction %>%
kable(caption = "Corrected codes in habitat map") %>%
kable_styling()
```

```{r}
if(sum(habmap_correction$code %in% habmap_longHAB$code) > 0){

habmap_longHAB <- habmap_longHAB %>%
Expand All @@ -96,6 +147,16 @@ if(sum(habmap_correction$code %in% habmap_longHAB$code) > 0){
habmap_longHAB <- habmap_longHAB %>%
mutate(code_orig = code)
}
```


## Splitting uncertain types

When the vegetation type is uncertain, and 2 or 3 possible vegetation types are provided, separated with a ','.
We will split the different possible vegetation types and create one row for each of them.
An additional variable 'certain' will be FALSE if the orginal habitatmap code consists of 2 or 3 possible vegetation types, and TRUE if only one vegetation type is provided.

```{r}

habmap_long <- habmap_longHAB %>%
left_join(habmap_longPHAB, by = c("polygon_id", "patch_id")) %>%
Expand All @@ -109,14 +170,47 @@ habmap_long <- habmap_longHAB %>%
filter(!(type %in% c("gh", "bos"))) %>%
select(-patch_id)

```



In some cases you get two records for the same type, one certain and one uncertain, within the same habitatmap polygon.
See for example the case below where you get two phab values for type 3150 within the same polygon.
Furthermore the sum of the phab values within the polygon is higher than 100.
Therefore we will distribute the phab value for uncertain types equally over the two possible types (in this case phab = 20 for type 3150 and phab = 20 for type rbbmr).
Next we sum the phab values over the types within the same polygon.

```{r}
habmap_long %>%
filter(polygon_id == "357179_v2020")
```


```{r}
habmap_long <- habmap_long %>%
group_by(polygon_id, code) %>%
mutate(n = n()) %>%
ungroup() %>%
group_by(polygon_id, type) %>%
summarise(phab = sum(phab/n),
code_orig = str_c(code, collapse = "; "),
certain = all(certain)) %>%
ungroup()
```
In this example we will get phab = 80 for 3150 and phab 20 for rbbmr as can be seen below.

```{r}
habmap_long %>%
filter(polygon_id == "357179_v2020")
```



## Select vegetation types that belong to the standard list of habitat and rbb types

Table \@ref(tab:select_types) shows the records with habitat types that do not belong to the standar list of habitat and rbb types. These records are filtered out.
Table \@ref(tab:selectTypes) shows the records with habitat types that do not belong to the standard list of habitat and rbb types. These records are filtered out.

```{r select_types}
```{r selectTypes}

types <- read_types() %>%
select(type, typelevel, main_type, typeclass)
Expand All @@ -126,7 +220,7 @@ habmap_long <- habmap_long %>%

habmap_types <- habmap_long %>%
filter(!is.na(typelevel)) %>%
select(polygon_id, code_orig = code, phab, certain, type) %>%
select(polygon_id, code_orig, phab, certain, type) %>%
mutate(type = factor(type,
levels = levels(types$type)
)
Expand All @@ -137,12 +231,13 @@ habmap_other_type <- habmap_long %>%
filter(is.na(typelevel))

habmap_other_type %>%
select(polygon_id, code, phab, certain, type) %>%
kable() %>%
select(polygon_id, code_orig, phab, certain, type) %>%
kable(caption = "Polygons with types that do not coorspond with the standard list of types and rbb") %>%
kable_styling()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: Steven's (BioDiv) comment on the fact that there are 2 polygons with 6110,gh:

  • officially no 6110 in Flanders
  • some steep slopes along the Albert Canal with a vegetation that could be classified in the Netherlands as a species-poor 6110, but it is dubious. There is consultation needed with the Dutch colleagues.

Seems ok to me to drop 6110 for the moment


```


## Select features that contain habitat or rbb types

```{r}
Expand All @@ -157,13 +252,8 @@ habmap_types_sf <- habmap_sf %>%

```{r}

st_write(habmap_types_sf, file.path(path,"20_processed/habitatmap_stdized/habitatmap_stdized.gpkg"), layer = "habitatmap_polygons", driver = "GPKG")

con = dbConnect(RSQLite::SQLite(),
dbname = file.path(path,"20_processed/habitatmap_stdized/habitatmap_stdized.gpkg"))

dbWriteTable(con, "habitatmap_types", habmap_types)
st_write(habmap_types_sf, file.path(path,"20_processed/habitatmap_stdized/habitatmap_stdized.gpkg"), layer = "habitatmap_polygons", driver = "GPKG", delete_dsn = TRUE)

dbDisconnect(con)
st_write(habmap_types, file.path(path,"20_processed/habitatmap_stdized/habitatmap_stdized.gpkg"), layer = "habitatmap_types", driver = "GPKG", append = TRUE)

```
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
code code_corrected
3130_rbbmr 3130,rbbmr
3140_rbbmr 3140,rbbmr
3150_rbbmr 3150,rbbmr
3160_rbbmr 3160,rbbmr
7140_cl 7140
91E0_vnva 91E0_vn,91E0_va
91E0_vnvm 91E0_vn,91E0_vm
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
..generic:
git2rdata: 0.0.5
git2rdata: 0.3.1
optimize: yes
NA string: NA
sorting: code
hash: e258eadad45dcddd9c388f37c17aaeb8e590e978
data_hash: 8b326b50c61c13db225e4a5e1234bb44c203bf74
data_hash: 79478f79fa7b9f4fd1623ebf46cf4800233a0072
code:
class: character
code_corrected:
Expand Down
6 changes: 4 additions & 2 deletions src/generate_habitatmap_stdized/index.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -37,18 +37,20 @@ output:
---

```{r setup, include=FALSE}

options(stringsAsFactors = FALSE,
scipen = 999,
digits = 15)

renv::restore()
library(sf)
library(tidyverse)
library(stringr)
library(knitr)
library(n2khab)
library(git2rdata)
library(googlesheets)
library(kableExtra)
library(DBI)
library(openssl)

opts_chunk$set(
echo = TRUE,
Expand Down
Loading