Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate watercourse_100mseg.gpkg #44

Merged
merged 31 commits into from
Jan 4, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
d841380
Generate watercourse_100mseg & watercourse_100msegpoints: initial code
florisvdh Nov 27, 2020
a6345ea
Generate watercourse_100m...: minor updates & fixes
florisvdh Nov 27, 2020
9f4ad02
Generate watercourse_100m...: elaborate GRASS geoprocessing
florisvdh Nov 27, 2020
40ab1e5
Generate watercourse_100m...: improve GRASS setup
florisvdh Nov 27, 2020
3cc7615
Generate watercourse_100m...: minor additions
florisvdh Nov 27, 2020
5cc5343
Generate watercourse_100m...: add more narrative
florisvdh Nov 27, 2020
afcb7ae
Generate watercourse_100m...: add extra paragraph titles
florisvdh Nov 27, 2020
2b525f3
Generate watercourse_100m...: rename variable to 'grassdbase_exists'
florisvdh Nov 27, 2020
5fc7da0
Generate watercourse_100m...: textual fixes
florisvdh Nov 30, 2020
59e43b9
Generate watercourse_100m...: better attr table preparation directly …
florisvdh Nov 30, 2020
9f23b77
Generate watercourse_100m...: export both layers into the final GPKG …
florisvdh Nov 30, 2020
288a637
Generate watercourse_100m...: add some checks on the data source
florisvdh Nov 30, 2020
ea7281b
Generate watercourse_100m...: add parameter grass_reexport
florisvdh Nov 30, 2020
7105062
Generate watercourse_100m...: drop mapview, add stringr
florisvdh Nov 30, 2020
6c133bb
Generate watercourse_100m...: compute checksums of data source
florisvdh Nov 30, 2020
30faabf
Generate watercourse_100m...: use renv
florisvdh Nov 30, 2020
19b565c
Generate watercourse_100m...: more explanation
florisvdh Nov 30, 2020
f542d7c
Generate watercourse_100m...: section about attribute variables
florisvdh Nov 30, 2020
749e3ef
Generate watercourse_100m...: uses watercourses, not watercourse_segm…
florisvdh Dec 16, 2020
4044c83
Generate watercourse_100m...: take watercourses from n2khab_data
florisvdh Dec 16, 2020
fd75d91
Generate watercourse_100m...: improved GRASS setup
florisvdh Dec 16, 2020
5e18861
Generate watercourse_100m...: enforce EPSG:31370 in GRASS location
florisvdh Dec 16, 2020
3bd1bfe
Generate watercourse_100m...: use execshell function
florisvdh Dec 16, 2020
5e06b69
Generate watercourse_100m...: use watercourses_20200807
florisvdh Dec 16, 2020
71722b7
Generate watercourse_100m...: fix in code for v.out.ogr
florisvdh Dec 16, 2020
5721395
Generate watercourse_100m...: minor text fixes
florisvdh Dec 16, 2020
b7cd3a3
Generate watercourse_100m...: updates in the execution *
florisvdh Dec 16, 2020
a97d9cc
Generate watercourse_100m...: read_sf: take EPSG:31370 from GPKG
florisvdh Dec 16, 2020
9df7895
Generate watercourse_100m...: adjust cartographic extent
florisvdh Dec 16, 2020
46c742b
Generate watercourse_100m...: improve PROJ in sessioninfo report
florisvdh Dec 16, 2020
dc4d4d4
Generate watercourse_100m...: add renv::restore()
florisvdh Dec 17, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -19,3 +19,5 @@ src/private_code.R
*.qgs
*.bak
*.qgz
grass_*/
watercourse_segments/
1 change: 1 addition & 0 deletions src/generate_watercourse_100mseg/.Rprofile
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
source("renv/activate.R")
331 changes: 331 additions & 0 deletions src/generate_watercourse_100mseg/10_generate_watercourse_100mseg.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,331 @@
# Making the data source

We will create a master dataset of watercourse segments of 100 m (`watercourse_100mseg`), in order to derive sampling frames of lotic types (3260 in particular).

It is a processed data source, derived from the raw data source `watercourses` (version `watercourses_20200807`).
The segments will have a 100 m length, except at the end of the linestrings of `watercourses`.

Further we will also derive a second layer which represent the _endpoints_ of the 100 m segments.

```{r}
local_root <- find_root(has_file("generate_watercourse_100mseg.Rproj"))
datapath <- file.path(local_root, "data")
n2khab_datapath <- find_root_file("n2khab_data",
criterion = has_dir("n2khab_data"))
if (!dir.exists(datapath)) dir.create(datapath)
finalpath <- find_root_file("n2khab_data/20_processed/watercourse_100mseg",
criterion = has_dir("n2khab_data"))
if (!dir.exists(finalpath)) dir.create(finalpath)
```

```{r}
grass_location <- file.path(datapath, "grass_watercourses")
if (!dir.exists(grass_location)) dir.create(file.path(grass_location, "PERMANENT"),
recursive = TRUE)
grassdbase_exists <- file.exists(file.path(grass_location, "PERMANENT/PROJ_INFO"))
if (.Platform$OS.type == "unix") {
Sys.setenv(GRASS_PYTHON = system("which python3", intern = TRUE))
}
```


## Geoprocessing steps in GRASS

Read the `watercourses` data source.

```{r paged.print=FALSE}
watercourses <- read_sf(file.path(n2khab_datapath, "10_raw/watercourses"))
```

Initialize GRASS projectfolder if non-existant, or link to the already existing one:

```{r}
if (interactive()) {
gisbase_grass <-
if (.Platform$OS.type == "windows") link2GI::paramGRASSw()$gisbase_GRASS[1] else {
link2GI::paramGRASSx()$gisbase_GRASS[1]
}
}
initGRASS(gisBase = if (interactive()) gisbase_grass else params$gisbase_grass,
home = tempdir(),
gisDbase = datapath, location = "grass_watercourses",
mapset = "PERMANENT", override = TRUE)
```

Cross-platform function to pass full GRASS command syntax as a string from R:

```{r}
execshell <- function(commandstring, intern = FALSE) {
if (.Platform$OS.type == "windows") {
res <- shell(commandstring, intern = TRUE)
} else {
res <- system(commandstring, intern = TRUE)
}
if (!intern) cat(res, sep = "\n") else return(res)
}
```


```{r eval=!grassdbase_exists}
# setting CRS and extent of the GRASS location, based on 'watercourses'
execGRASS("g.proj", flags = c("c", "quiet"),
epsg = 31370)
```


```{r}
use_sf()
```

Add `watercourses` data source to GRASS database:

```{r eval=!grassdbase_exists}
execGRASS("v.in.ogr",
"o",
input = file.path(n2khab_datapath, "10_raw/watercourses"),
output = "watercourses")
```

### Generate `watercourse_100mseg` in GRASS

To do this, we first use the `v.edit` command to flip the direction of the linestrings, because the segments are to be generated from mouth to source, leaving segments of <100 m only at the source side.
After that, we will use the `v.split` command.
It will take each linestring of the raw `watercourses` data source and split it into segments of 100 m length.
This generally results in one segment (for each original linestring) that is shorter than 100 m!

<!-- g.manual v.split -->

```{r eval=!grassdbase_exists}
execshell("g.copy vector=watercourses,watercourses_original")
execshell("v.edit map=watercourses tool=flip where='1=1'")
execshell("v.split -f --overwrite input=watercourses output=watercourse_100mseg_pre1 length=100")
```

```{r}
execshell("v.category input=watercourse_100mseg_pre1 option=report")
```

As seen from the above output, in accordance with the `v.split` manual, the categories and the attribute table of the original vector map are simply copied over.
This means that the individual 100 m segments have no unique category, i.e. they don't have their own rows (with ID) in another attribute table.
To get that, as explained in the documentation, we define a second layer^[To do that, the vector map `watercourse_100mseg_pre1` is copied, including layers & the link to the same attribute table.] with a unique category per feature (100 m segment) and create a new attribute table that links to the new layer (an attribute table always has exactly 1 row for each category and uses 1 layer to take categories from).
Next, we copy the category of layer 1 as a mere attribute into a second column in the new attribute table (which is linked to layer 2), in order to define the link between the grouped category `cat_group` (categories of `watercourses`) and the unique category `cat`.

The `execshell()` command directly sends the below commands to the shell as one script for execution by GRASS.

```{r eval=!grassdbase_exists}
execshell('v.category input=watercourse_100mseg_pre1 option=add layer=2 output=watercourse_100mseg
# v.category input=watercourse_100mseg option=report
# v.db.connect -p map=watercourse_100mseg
v.db.addtable watercourse_100mseg layer=2 key="rank" columns="vhag_code int"
# v.db.connect -p map=watercourse_100mseg
# v.info -c watercourse_100mseg layer=2
# v.info -c watercourse_100mseg layer=1
v.to.db map=watercourse_100mseg layer=2 option=query query_layer=1 query_column=VHAG columns=vhag_code --overwrite
# v.db.select map=watercourse_100mseg layer=2 | head
')
```

<!-- Output from the v.to.db command: -->

<!-- WARNING: Values in column <vhag_code> will be overwritten -->
<!-- Reading features... -->
<!-- 100% -->
<!-- Querying database... -->
<!-- 100% -->
<!-- Updating database... -->
<!-- 100% -->
<!-- 271676 categories read from vector map (layer 2) -->
<!-- 271676 records selected from table (layer 1) -->
<!-- 271676 categories read from vector map exist in selection from table -->
<!-- 271676 records updated/inserted (layer 2) -->



<!-- v.db.connect -p map=watercourse_100mseg -->
<!-- Vector map <watercourse_100mseg> is connected by: -->
<!-- layer <1/watercourses> table <watercourse_100mseg> in database <(...)/n2khab-preprocessing/src/generate_watercourse_100mseg/data/grass_watercourses/PERMANENT/sqlite/sqlite.db> through driver <sqlite> with key <cat> -->
<!-- layer <2/watercourse_100mseg_2> table <watercourse_100mseg_2> in database <(...)/n2khab-preprocessing/src/generate_watercourse_100mseg/data/grass_watercourses/PERMANENT/sqlite/sqlite.db> through driver <sqlite> with key <rank> -->

### Generate `watercourse_100msegpoints` in GRASS

To do this, we use the `v.to.points` command.
We want each point at the downstream side of its corresponding segment.
We create one point per 100 m segment.
As we flipped the direction of the watercourse lines, we want the _startpoint_ of each segment.

<!-- g.manual v.to.points -->

```{r eval=!grassdbase_exists}
execGRASS("v.to.points",
flags = c("overwrite"),
input = "watercourse_100mseg",
layer = "2",
output = "watercourse_100msegpoints",
use = "start")
```

<!-- 100% -->
<!-- Building topology for vector map <watercourse_100msegpoints@PERMANENT>... -->
<!-- Registering primitives... -->
<!-- 270000 -->
<!-- v.to.points complete. 271676 points written to output vector map. -->


<!-- v.info watercourse_100msegpoints layer=1 -->
<!-- +----------------------------------------------------------------------------+ -->
<!-- | Name: watercourse_100msegpoints | -->
<!-- | Mapset: PERMANENT | -->
<!-- | Location: grass_watercourses | -->
<!-- | Database: /media/floris/DATA/PROJECTS/09685_NatuurlijkMilieu/160 Be | -->
<!-- | Title: | -->
<!-- | Map scale: 1:1 | -->
<!-- | Name of creator: floris | -->
<!-- | Organization: | -->
<!-- | Source date: Wed Dec 16 16:54:35 2020 | -->
<!-- | Timestamp (first layer): none | -->
<!-- |----------------------------------------------------------------------------| -->
<!-- | Map format: native | -->
<!-- |----------------------------------------------------------------------------| -->
<!-- | Type of map: vector (level: 2) | -->
<!-- | | -->
<!-- | Number of points: 271676 Number of centroids: 0 | -->
<!-- | Number of lines: 0 Number of boundaries: 0 | -->
<!-- | Number of areas: 0 Number of islands: 0 | -->
<!-- | | -->
<!-- | Map is 3D: No | -->
<!-- | Number of dblinks: 2 | -->
<!-- | | -->
<!-- | Projection: Belge 1972 / Belgian Lambert 72 | -->
<!-- | | -->
<!-- | N: 243962.6791 S: 153064.35460326 | -->
<!-- | E: 258447.81310536 W: 22979.76203745 | -->
<!-- | | -->
<!-- | Digitization threshold: 0 | -->
<!-- | Comment: | -->
<!-- | | -->
<!-- +----------------------------------------------------------------------------+ -->


<!-- v.category input=watercourse_100msegpoints option=report -->
<!-- Layer/table: 1/watercourse_100msegpoints_1 -->
<!-- type count min max -->
<!-- point 271676 1 271676 -->
<!-- line 0 0 0 -->
<!-- boundary 0 0 0 -->
<!-- centroid 0 0 0 -->
<!-- area 0 0 0 -->
<!-- face 0 0 0 -->
<!-- kernel 0 0 0 -->
<!-- all 271676 1 271676 -->
<!-- Layer/table: 2/watercourse_100msegpoints_2 -->
<!-- type count min max -->
<!-- point 271676 1 271676 -->
<!-- line 0 0 0 -->
<!-- boundary 0 0 0 -->
<!-- centroid 0 0 0 -->
<!-- area 0 0 0 -->
<!-- face 0 0 0 -->
<!-- kernel 0 0 0 -->
<!-- all 271676 1 271676 -->

<!-- v.info -c watercourse_100msegpoints layer=1 -->
<!-- Displaying column types/names for database connection of layer <1>: -->
<!-- INTEGER|rank -->
<!-- INTEGER|vhag_code -->

## Exporting resulting objects from GRASS

We already took care of standardized column names within GRASS; so no further changes are needed when exporting the data from GRASS.
We do it this way in order to minimize the number of times the data need to be written forth and back to a geopackage, which takes extra time (> 250e3 features) and file organization.

```{r eval=params$grass_reexport}
execGRASS("v.out.ogr",
flags = "overwrite",
input = "watercourse_100mseg",
layer = "2",
output = file.path(finalpath, "watercourse_100mseg.gpkg"),
output_layer = "watercourse_100mseg_lines")
```

```{r eval=params$grass_reexport}
# This export takes a very long time (> 60 min).
# There must be some bug in the export procedure
# (after all, the data are simpler than previous one).
execGRASS("v.out.ogr",
"a",
input = "watercourse_100msegpoints",
layer = "1",
output = file.path(finalpath, "watercourse_100mseg.gpkg"),
output_layer = "watercourse_100mseg_points")
```


# Attribute variables of the processed data source: explanation

Both layers in the GPKG file have exactly the same attribute variables:

- `rank`: a unique integer increment (1-2-3-4-5-...).
It is to be used in a _relative_ way: it ranks the 100 m segments / points along each original linestring in the raw data source.
- `vhag_code`: the unique VHAG code of the original linestring in the raw data source.
It is common to all segments / points in the processed data source that belong to the same original watercourse.

To obtain other attributes, one can make a join on the VHAG code column of the raw data source.


# Checks on the data source

```{r}
filepath <- file.path(finalpath, "watercourse_100mseg.gpkg")
```

Checksums:

```{r}
file(filepath) %>%
openssl::md5() %>%
str_c(collapse = '') %>%
`names<-`("md5sum")
file(filepath) %>%
openssl::sha256() %>%
str_c(collapse = '') %>%
`names<-`("sha256sum")
```


```{r paged.print=FALSE, warning=FALSE}
(seg <- read_sf(filepath,
layer = "watercourse_100mseg_lines"))
```

```{r paged.print=FALSE, warning=FALSE}
(pts <- read_sf(filepath,
layer = "watercourse_100mseg_points"))
```

```{r}
all.equal(seg$rank, pts$rank)
```

```{r}
all.equal(seg$vhag_code, pts$vhag_code)
```

```{r}
all.equal(unique(seg$vhag_code), unique(watercourses$VHAG))
```

Cartographic display of a row selection:

```{r}
seg %>%
filter(rank < 2e4) %>%
ggplot() +
geom_sf(colour = "grey50") +
geom_sf(data = pts %>% filter(rank < 2e4), colour = "red", size = 0.5) +
lims(x = c(36e3, 40e3), y = c(184e3, 188e3)) +
coord_sf(datum = 31370) +
theme_bw()+
theme(panel.grid = element_blank())
```


25 changes: 25 additions & 0 deletions src/generate_watercourse_100mseg/99_sessioninfo.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
# Used environment

```{r session-info, results = "asis", echo=FALSE}
si <- devtools::session_info()
p <- si$platform %>%
do.call(what = "c")
if ("sf" %in% si$packages$package) {
p <- c(p, sf_extSoftVersion())
names(p)[names(p) == "proj.4"] <- "PROJ"
}
if ("rgrass7" %in% si$packages$package) {
p <- c(p, GRASS = link2GI::findGRASS()[1, "version"])
}
sprintf("- **%s**:\n %s\n", names(p), p) %>%
cat()
```

```{r results = "asis", echo=FALSE}
si$packages %>%
as_tibble %>%
select(package, loadedversion, date, source) %>%
pander::pandoc.table(caption = "(\\#tab:sessioninfo)Loaded R packages",
split.table = Inf)
```

4 changes: 4 additions & 0 deletions src/generate_watercourse_100mseg/_bookdown.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
book_filename: "generating_watercourse_100mseg.Rmd"
new_session: FALSE
rmd_files: # specifies the order of Rmd files; NOT needed when you use index.Rmd and an alphabetical order for other Rmd files
# - index.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: knitr
LaTeX: XeLaTeX

AutoAppendNewline: Yes
StripTrailingWhitespace: Yes

BuildType: Website
Loading