Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update habitatmap_stdized and habitatmap_terr #50

Merged
merged 48 commits into from
May 12, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
30729be
add link to zenodo
ToonHub Feb 3, 2021
c1d752c
add correction for 3140_rbbmr
ToonHub Feb 3, 2021
d216633
aggregate two records of the same type within a polygon
ToonHub Feb 3, 2021
2b905bd
remove write_sessioninfo
ToonHub Feb 4, 2021
71cb190
add renv.lock file
ToonHub Feb 9, 2021
ecd3ad8
add openssl to calculate file hashes
ToonHub Feb 9, 2021
24cd961
update habmap_correction, only keep current code corrections applied
ToonHub Feb 9, 2021
6bd0668
check if correct file version is used based on file hashes
ToonHub Feb 9, 2021
9d7f196
add openssl to check data source version based on hashes
ToonHub Feb 9, 2021
6625e5b
check file hashes of used data sources
ToonHub Feb 9, 2021
81c508e
add renv.lock
ToonHub Feb 9, 2021
5e8b060
add some code to stop when evaluating the hashes gives a deviation fr…
w-jan Feb 10, 2021
597fc28
add stop-rules (cfr previous commit)
w-jan Feb 10, 2021
d1a3d9e
use only md5 hashes to check forr correct version (not sha256)
ToonHub Feb 24, 2021
20ec2da
update project name: process_habitatmap --> generate_habitatmap_stdized
ToonHub Feb 24, 2021
046f32a
add version geospatial libraries
ToonHub Feb 24, 2021
c080e0b
update title
ToonHub Feb 24, 2021
55c2eef
add not-ignored files
ToonHub Feb 24, 2021
e61d612
add code to check result
ToonHub Feb 24, 2021
1eacc10
for habitatmap code '31xx,rbbmr' set for both types certain = TRUE
ToonHub Feb 24, 2021
197fc39
add crs = 31370 when reading habitatmap
ToonHub Feb 24, 2021
fde539c
modified the aggregation of different records for the same type withi…
ToonHub Feb 24, 2021
73e4937
use only md5 hashes to check for correct version
ToonHub Feb 24, 2021
512f6f6
change link zenodo
ToonHub Feb 24, 2021
5a3d513
update session info
ToonHub Feb 24, 2021
d156420
add code for checking result habitatmap_terr
ToonHub Feb 24, 2021
689f139
change columns order in types table
ToonHub Feb 24, 2021
1b5602f
avoid unintended overwriting of result as file hashes will change
ToonHub Feb 24, 2021
50c363c
habmap_std/terr: also print sha256sum in the file evaluation chapter *
florisvdh Mar 2, 2021
01b2e65
habmap_std/terr: define filepath earlier
florisvdh Mar 2, 2021
b330115
habmap_stdized: create data folder (as in habitatmap_terr)
florisvdh Mar 2, 2021
943f46d
habmap_std/terr: apply explicit GPKG timestamp for reproducibility
florisvdh Mar 2, 2021
0ab5e58
habmap_std/terr: improve comment on ISO 8601 timestamp
florisvdh Mar 3, 2021
cd3d09b
habitatmap_terr: drop sessioninfo.txt
florisvdh Mar 3, 2021
14d530a
habmap_std/terr, timestamping: no need to call DBI explicitly
florisvdh Mar 3, 2021
ff4a8b4
habmap_std/terr, timestamping: drop RSQLite, use GDAL env var
florisvdh Mar 8, 2021
16f9d12
Merge pull request #52 from inbo/update_habitatmap_stdized_terr_fv
ToonHub Mar 11, 2021
a44de74
change table to convert 7140 into 7140_meso
ToonHub Mar 11, 2021
efc3a48
change timestamp to creation date
ToonHub Mar 11, 2021
0ea2617
change some of the documentation
ToonHub Mar 11, 2021
2dcfd7a
add checks
ToonHub Mar 11, 2021
3a08cac
change timestamp to creation date
ToonHub Mar 11, 2021
fe92f98
change md5_ref
ToonHub Mar 11, 2021
5afe0a9
drop eval = FALSE as md5-hash is stable due to fixed timestamp
ToonHub Mar 11, 2021
f007f41
provide the code from the original habitat map (for example 31xx_rbbm…
ToonHub Mar 11, 2021
f1b1eb6
change md5_ref
ToonHub Mar 11, 2021
b088c44
add some checks
ToonHub Mar 11, 2021
92ed3f8
Update src/generate_habitatmap_stdized/10_generate_habmap_stdized.Rmd
ToonHub May 12, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions src/generate_habitatmap_stdized/.Rprofile
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
source("renv/activate.R")
207 changes: 159 additions & 48 deletions src/generate_habitatmap_stdized/10_generate_habmap_stdized.Rmd
Original file line number Diff line number Diff line change
@@ -1,58 +1,64 @@
# Generate standardized habitatmap

## Data source
The shapefile of the BWK and Natura 2000 habitat map of Flanders can be downloaded here[http://www.geopunt.be/download?container=bwk2%5C2018&title=Biologische%20waarderingskaart%20-%20Natura%202000%20Habitatkaart].

```{r read_raw_data}

path <- fileman_up("n2khab_data")
file <- "10_raw/habitatmap"

habitatmap_sf <- st_read(file.path(path, file),
layer = "habitatmap")
```
The shapefile of the BWK and Natura 2000 habitat map of Flanders can be downloaded [here](https://zenodo.org/record/4428002#.YBqzu-hKigY).
ToonHub marked this conversation as resolved.
Show resolved Hide resolved

## Correction of some of the codes in the Habitat map
To be sure we will use the correct version of the data source (habitatmap_2020), we will derive the md5 file hashes and compare it to the file hashes in the [data source version overview table](https://docs.google.com/spreadsheets/d/1E8ERlfYwP3OjluL8d7_4rR1W34ka4LRCE35JTxf3WMI/edit#gid=2100595853).

For some polygons in the habitat map the vegetation type is not certain. When
this is the case, the code contains the possible vegetation types separated with
a ','. However, in some cases this way of coding is not used. In that case we
change the code according to the general coding rules (for example 3130_rbbmr is
changed to 3130,rbbmr). This makes processing of the habitat map more
straightforward. Table \@ref(tab:code_corrected) shows the corrected codes.

```{r, eval = FALSE}
```{r}

habmap_correction <- gs_key("1PU9MDyJKZz2DOrKqb-SIIVmSaR1mh4Ke6v0WCqhr8t4") %>%
gs_read(ws = 3)
path <- fileman_up("n2khab_data")
file <- "10_raw/habitatmap"

write_vc(habmap_correction, "habmap_correction/habmap_correction",
sorting = "code")
mypath <- file.path(path, file)

hashes <-
tibble(filepath = str_c(mypath, "/",
list.files(path = mypath,
recursive = TRUE)
)) %>%
mutate(filename = str_match(filepath, "(.+\\/)*(.+)")[,3],
md5 = map(filepath, function(x) {
file(x) %>% md5 %>% str_c(collapse = '')
}) %>% as.character,
md5_ref = c("f767a12540b2bde435c6709c9c9675ad",
"358ff672c2fae48eba5bd09f8a671675",
"f881f61a6c07741b58cb618d8bbb0b99",
"1da89a5dc267bbd427c8c07fcf63f344",
"79ae8b4b3c6d970a1f10e78f3eca9ef7"),
match = md5 == md5_ref) %>%
select(filename,
md5,
md5_ref,
match)

kable(hashes) %>%
kable_styling()

if (!all.equal(hashes$md5, hashes$md5_ref)) {
stop(cat("The source map is NOT up to date ! Please check the datasource. "))
}
```

```{r code_corrected}

habmap_correction <- read_vc("habmap_correction/habmap_correction")

habmap_correction %>%
kable(caption = "Corrected codes in habitat map") %>%
kable_styling()
```{r read_raw_data}

habitatmap_sf <- st_read(file.path(path, file),
layer = "habitatmap", crs = 31370)
```


## Processing of the attribute table

Every polygon in the habitat map can consist of maximum 5 different vegetation types. This information is stored in the columns 'HAB1', 'HAB2',..., 'HAB5' of the attribute table. The estimated fraction of each vegetation type within the polygons is stored in the columns 'PHAB1', 'PHAB2', ..., 'PHAB5'.

We will convert the attribute table to a long format, so that every row contains one vegetation type.

In some cases, the vegetation type is uncertain, and 2 or 3 possible vegetation types are provided, separated with a ','. We will split the different possible vegetation types and create one row for each of them. An additional variable 'certain' will be FALSE if the orginal habitatmap code consists of 2 or 3 possible vegetation types, and TRUE if only one vegetation type is provided.


```{r select_polygons}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a note - feel free to make such changes. Gathering HAB & PHAB columns can now be done in one step using pivot_longer_spec(). E.g. first example below https://tidyr.tidyverse.org/articles/pivot.html#by-hand-1.


habmap_sf <- habmap_sf %>%
habmap_sf <- habitatmap_sf %>%
filter(!(HAB1 == "gh" & PHAB1 == 100)) %>%
mutate(polygon_id = TAG, # unieke id
description_orig = str_c(PHAB1, "% ", HAB1,
Expand Down Expand Up @@ -81,6 +87,35 @@ habmap_longHAB <- habmap_sf %>%
filter(!is.na(code)) %>%
filter(! code %in% c("gh", "x"))

```

## Correction of some of the codes in the Habitat map

Some polygons in the habitat map contain codes that do not correspond with the standardized list of habitat types.
We correct these codes to make processing of the habitat map more straightforward.
Table \@ref(tab:codeCorrected) shows the corrected codes and the number of polygons for which the correction is applied.
Comment on lines +94 to +96
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍



```{r }

habmap_correction <- read_vc("habmap_correction/habmap_correction")

overview_habmap_correction <- habmap_longHAB %>%
inner_join(habmap_correction, by = "code") %>%
group_by(code, code_corrected) %>%
summarise(n_polygons = n()) %>%
ungroup()

```


```{r codeCorrected}
overview_habmap_correction %>%
kable(caption = "Corrected codes in habitat map") %>%
kable_styling()
```

```{r}
if(sum(habmap_correction$code %in% habmap_longHAB$code) > 0){

habmap_longHAB <- habmap_longHAB %>%
Expand All @@ -96,10 +131,24 @@ if(sum(habmap_correction$code %in% habmap_longHAB$code) > 0){
habmap_longHAB <- habmap_longHAB %>%
mutate(code_orig = code)
}
```


## Splitting codes that contain different types

In several cases the code contains 2 or 3 possible vegetation types which are separated with a ','.
We will split the different possible vegetation types and create one row for each of them.
An additional variable 'certain' will be FALSE if the orginal habitatmap code consists of 2 or 3 possible vegetation types, and TRUE if only one vegetation type is provided.
An exception to this rule are following codes: `3130,rbbmr`, `3140,rbbmr`, `3150,rbbmr` and `3160,rbbmr`.
florisvdh marked this conversation as resolved.
Show resolved Hide resolved
These are standing water bodies that contain both `rbbmr` and the habitat type.
Therefore `certain` will be set to `TRUE` for both types included in `code`.

```{r}

habmap_long <- habmap_longHAB %>%
left_join(habmap_longPHAB, by = c("polygon_id", "patch_id")) %>%
mutate(certain = !str_detect(code, ",")) %>%
mutate(certain = !str_detect(code, ","),
certain = ifelse(code_orig %in% c("3130_rbbmr", "3140_rbbmr", "3150_rbbmr", "3160_rbbmr"), TRUE, certain)) %>%
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

separate(code,
into = c("type1", "type2", "type3"),
sep = ",",
Expand All @@ -109,40 +158,95 @@ habmap_long <- habmap_longHAB %>%
filter(!(type %in% c("gh", "bos"))) %>%
select(-patch_id)

```

Sometimes you get two records for the same type within the same habitatmap polygon.
We distinguish two cases:

+ two certain or two uncertain records of the same type, for example:

```{r}
habmap_long %>%
filter(polygon_id == "357179_v2020")
```

In this case we will sum the phab-values and create one record for each type.

+ a certain and an uncertain record of the same type, for example:

```{r}
habmap_long %>%
filter(polygon_id == "639375_v2014")
```

We will not aggregate the certain and uncertain record, because we will loose some information used to create `habitatmap_terr`. It is some of the uncertain records (such as `9120, gh`) are eliminated when processing `habitatmap_terr`.

```{r}
check_double_type <- habmap_long %>%
group_by(polygon_id, type) %>%
mutate(n_type = n()) %>%
ungroup() %>%
filter(n_type > 1)

```


```{r}
habmap_long_aggr <- habmap_long %>%
# group_by(polygon_id, code) %>%
# mutate(n = n()) %>%
# ungroup() %>%
group_by(polygon_id, type, certain) %>%
summarise(phab = sum(phab),
code_orig = str_c(code_orig, collapse = "; "),
certain = all(certain)) %>%
ungroup()
```
See below the result of the aggregation for both examples.

```{r}
habmap_long_aggr %>%
filter(polygon_id == "357179_v2020")
```

```{r}
habmap_long_aggr %>%
filter(polygon_id == "639375_v2014")
```


## Select vegetation types that belong to the standard list of habitat and rbb types

Table \@ref(tab:select_types) shows the records with habitat types that do not belong to the standar list of habitat and rbb types. These records are filtered out.
Table \@ref(tab:selectTypes) shows the records with habitat types that do not belong to the standard list of habitat and rbb types. These records are filtered out.

```{r select_types}
```{r selectTypes}

types <- read_types() %>%
select(type, typelevel, main_type, typeclass)

habmap_long <- habmap_long %>%
habmap_long_aggr <- habmap_long_aggr %>%
left_join(types, by = "type")

habmap_types <- habmap_long %>%
habmap_types <- habmap_long_aggr %>%
filter(!is.na(typelevel)) %>%
select(polygon_id, code_orig = code, phab, certain, type) %>%
select(polygon_id, type, certain, code_orig, phab) %>%
mutate(type = factor(type,
levels = levels(types$type)
)
) %>%
arrange(polygon_id, desc(phab))

habmap_other_type <- habmap_long %>%
habmap_other_type <- habmap_long_aggr %>%
filter(is.na(typelevel))

habmap_other_type %>%
select(polygon_id, code, phab, certain, type) %>%
kable() %>%
select(polygon_id, type, certain, code_orig, phab) %>%
kable(caption = "Polygons with types that do not coorspond with the standard list of types and rbb") %>%
kable_styling()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: Steven's (BioDiv) comment on the fact that there are 2 polygons with 6110,gh:

  • officially no 6110 in Flanders
  • some steep slopes along the Albert Canal with a vegetation that could be classified in the Netherlands as a species-poor 6110, but it is dubious. There is consultation needed with the Dutch colleagues.

Seems ok to me to drop 6110 for the moment


```


## Select features that contain habitat or rbb types

```{r}
Expand All @@ -156,14 +260,21 @@ habmap_types_sf <- habmap_sf %>%
## Write results into a geopackage

```{r}
dir.create(file.path(path, "20_processed/habitatmap_stdized"), recursive = TRUE)
filepath <- file.path(path,"20_processed/habitatmap_stdized/habitatmap_stdized.gpkg")
```

st_write(habmap_types_sf, file.path(path,"20_processed/habitatmap_stdized/habitatmap_stdized.gpkg"), layer = "habitatmap_polygons", driver = "GPKG")

con = dbConnect(RSQLite::SQLite(),
dbname = file.path(path,"20_processed/habitatmap_stdized/habitatmap_stdized.gpkg"))

dbWriteTable(con, "habitatmap_types", habmap_types)

dbDisconnect(con)
```{r}

st_write(habmap_types_sf,
filepath,
layer = "habitatmap_polygons",
driver = "GPKG",
delete_dsn = TRUE)

st_write(habmap_types,
filepath,
layer = "habitatmap_types",
driver = "GPKG",
append = TRUE)
```
59 changes: 59 additions & 0 deletions src/generate_habitatmap_stdized/20_check_result.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
## Checks on the data source

Checksums:

```{r}
file(filepath) %>%
openssl::md5() %>%
str_c(collapse = '') %>%
`names<-`("md5sum")
file(filepath) %>%
openssl::sha256() %>%
str_c(collapse = '') %>%
`names<-`("sha256sum")
```


```{r paged.print=FALSE, warning=FALSE}
(pol <- read_sf(filepath,
layer = "habitatmap_polygons"))
```

```{r paged.print=FALSE, warning=FALSE}
(types <- read_sf(filepath,
layer = "habitatmap_types"))
```

- Do all records in the habitatmap_types table have a corresponding spatial feature in the habitatmap_polygons layer?

```{r}
check_polygon_id_types <- types %>%
anti_join(pol, by = "polygon_id")

nrow(check_polygon_id_types) == 0
```

- Do all spatial features in the habitatmap_polygons layer have a corresponding record in the habitatmap_types table?

```{r}
check_polygon_id_polygons <- pol %>%
st_drop_geometry() %>%
anti_join(types, by = "polygon_id")

nrow(check_polygon_id_polygons) == 0
```

- Is the number of unique IDs in the dataframe the same as the number of polygons in the sf object?

```{r}
types %>%
distinct(polygon_id) %>%
nrow == nrow(pol)
```


- Is the CRS conform EPSG:31370?

```{r}
st_crs(pol) == st_crs(31370)
```
7 changes: 7 additions & 0 deletions src/generate_habitatmap_stdized/99_sessioninfo.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,13 @@
si <- devtools::session_info()
p <- si$platform %>%
do.call(what = "c")
if ("sf" %in% si$packages$package) {
p <- c(p, sf_extSoftVersion())
names(p)[names(p) == "proj.4"] <- "PROJ"
}
if ("rgrass7" %in% si$packages$package) {
p <- c(p, GRASS = link2GI::findGRASS()[1, "version"])
}
sprintf("- **%s**:\n %s\n", names(p), p) %>%
cat()
```
Expand Down
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
code code_corrected
3130_rbbmr 3130,rbbmr
3140_rbbmr 3140,rbbmr
3150_rbbmr 3150,rbbmr
3160_rbbmr 3160,rbbmr
7140_cl 7140
91E0_vnva 91E0_vn,91E0_va
91E0_vnvm 91E0_vn,91E0_vm
7140_cl 7140_meso
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
..generic:
git2rdata: 0.0.5
git2rdata: 0.3.1
optimize: yes
NA string: NA
sorting: code
hash: e258eadad45dcddd9c388f37c17aaeb8e590e978
data_hash: 8b326b50c61c13db225e4a5e1234bb44c203bf74
data_hash: 50567ead057ed055dc0b403b6a4f3a074c622951
code:
class: character
code_corrected:
Expand Down
Loading