Commit: much needed tidy up
btupper committed Jan 6, 2025
1 parent feb117f commit 43b258c
Showing 27 changed files with 931 additions and 999 deletions.
2 changes: 1 addition & 1 deletion C00_coding.qmd
@@ -16,7 +16,7 @@ RStudio is a free GUI that wraps around two languages (R and Python, and soon to be more). When you invoke RStudio you'll see that it is laid out as a multi-panel application that runs inside your browser. It will look something like this screenshot.
![Rstudio screenshot](images/Rstudio.png)
There are many [RStudio tutorials](https://duckduckgo.com/?q=introduction+to+rstudio&t=ffab&atb=v342-1&iax=videos&ia=videos&iai=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dw8ooTMStQV0&pn=1) online. We encourage you to check them out.

-# Getting the software you will need
+# Getting the software and data you will need

Use this [wiki page](https://github.com/BigelowLab/ColbyForecasting2025/wiki/Courseware) to guide you through the process of installing the software you'll need for this course, "the courseware".

9 changes: 4 additions & 5 deletions C01_observations.qmd
@@ -176,7 +176,7 @@ We need to load the Brickman database, and then filter it for the static variabl
```{r mask}
db = brickman_database() |>
filter(scenario == "STATIC", var == "mask")
-mask = read_brickman(db)
+mask = read_brickman(db, add_depth = FALSE)
mask
```

@@ -216,7 +216,7 @@ So, we dropped `{r} dropped_records` records which is about `{r} sprintf("%0.1f%
We have explored a data set, in particular for *Mola mola*; your species may present you with unique challenges. Our goal is to winnow the original data set down to just the most reliable observations for modeling.

# Coding Assignment
-::: {.callout-note appearance="simple"}

We went through many steps to filter out records that won't help us model. We'll need that filtered data many-many-many times in the days ahead. Wouldn't it be nice if we could sweep all of those filtering steps into a single function, call it `read_observations()`, that simply took care of it all for us? Yes - that would be really nice!

Open the "read_observations.R" file you'll find in the "functions" directory. We have started it for you. Edit the function so that it appropriately filters your species data set by adding optional arguments (like `minimum_year` has been added), and then adding the code steps needed to implement each filter.
@@ -225,12 +225,11 @@ Not every filter needs user input. For instance, `eventDate` can't be `NA`, and 

On the other hand, filtering by `basisOfRecord` or `individualCount` might need more flexibility, especially if you might switch to other species.

-SPeaking of which, we provided `scientificname` with a default value - we chose "Mola mola" because we are a bit lazy. If you are feeling lazy, you can change the default to your own species.
+Speaking of which, we provided `scientificname` with a default value - we chose "Mola mola" because we are a bit lazy. If you are feeling lazy, you can change the default to your own species.

As you build your function, pause every so often and run the following to test things out.

```
source("setup.R")
obs = read_observations()
```
-:::
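Such a function might start out something like the sketch below. This is only an illustration — `read_obis()` is a placeholder for however your courseware actually loads the raw species download, and the argument names are ours, not the course's:

```r
# Hypothetical sketch of functions/read_observations.R -- adapt to your species.
# read_obis() is a placeholder; swap in the real reader from your courseware.
read_observations = function(scientificname = "Mola mola",
                             minimum_year = 2000) {
  read_obis(scientificname) |>
    dplyr::filter(!is.na(eventDate)) |>     # required: no missing dates
    dplyr::filter(year >= minimum_year)     # optional: user-adjustable cutoff
}
```

The point of the pattern is that every required filter is hard-coded while every judgment call becomes an argument with a sensible default.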
10 changes: 3 additions & 7 deletions C02_background.qmd
@@ -23,7 +23,7 @@ obs = read_observations(scientificname = "Mola mola") |>
filter(month == "Aug")
db = brickman_database() |>
filter(scenario == "STATIC", var == "mask")
-mask = read_brickman(db)
+mask = read_brickman(db, add_depth = FALSE)
```

We have two approaches to what happens next. The first is the greedy approach that says: gather together lots of observations and background points. Lots and lots! The second approach is much more conservative as it considers the value (or not!) of having replicate measurements at locations that share the same array cell.
@@ -245,21 +245,17 @@ plot(coast, col = "orange", add = TRUE)
We have prepared what we call "model inputs", in particular for *Mola mola*, by selecting background points using two different approaches: greedy and conservative. There are lots of other approaches, too, but for the sake of learning we'll settle on just these two. We developed a function that will produce our model inputs for a given month, and saved them to disk. Then we read at least one back and showed that we can restore these from disk.

# Coding Assignment
-:::{.callout-note appearance="simple"}
-Use the [iterations tutorial](https://bigelowlab.github.io/handytandy/iterations.html) to apply your `make_model_input_by_month()` for each month.
+Use the [iterations tutorial](https://bigelowlab.github.io/handytandy/iterations.html) to apply your `make_model_input_by_month()` for each month. You'll know you have done it correctly if your result is a list filled with lists of greedy-conservative tables, **and** your `model_inputs` directory holds at least 24 files (12 months x 2 sampling schemes).
-
-You'll know you have done it correctly if your result is a list filled with lists of greedy-conservative tables.
-:::
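One way to do that iteration, sketched under the assumption that your `make_model_input_by_month()` accepts a month abbreviation:

```r
# Sketch: call make_model_input_by_month() once per month, collecting the
# results in a list named "Jan" ... "Dec". Each element should itself be a
# list of greedy/conservative tables.
model_inputs = sapply(month.abb,
                      function(mon) make_model_input_by_month(mon),
                      simplify = FALSE)
```

Because `month.abb` is a character vector, `sapply(..., simplify = FALSE)` returns a list whose names are the month abbreviations, which makes later lookups easy.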


# Challenge
And here we add one challenge...

-:::{.callout-note appearance="simple"}
Create a function to read the correct model input when given the species, month and approach.

Use the menu option `File > New File > R Script` to create a blank file. Save the file (even though it is empty) in the "functions" directory as "model_input.R". Use this file to build a function (or set of functions) that uses this set of arguments. Below is a template to help you get started.
-:::


```{r read_model_input}
#' Reads a model input file given species, month, approach and path
4 changes: 1 addition & 3 deletions C03_covariates.qmd
@@ -136,8 +136,7 @@ Use the `Files` pane to navigate to your personal data directory. Open the `g_A
We loaded the covariates for the "PRESENT" climate scenario and looked at collinearity across the entire study domain. We invoked a function that suggests which variables to keep and which to drop based upon collinearity. We examined the covariates at just the `presence` and `background` locations. We then saved a configuration for later reuse.

# Coding Assignment
-::: {.callout-note appearance="simple"}
-Open nd edit the file called `functions/select_covariates.R`. Within the file write the function(s) you need to select the "keeper" variables for a given approach (greedy or conservative) and a given month (Jan - Dec). Have the function return an appropriate configuration list. The function shoulkd start out approximately like this...
+Open and edit the file called `functions/select_covariates.R`. Within the file write the function(s) you need to select the "keeper" variables for a given approach (greedy or conservative) and a given month (Jan - Dec). Have the function return an appropriate configuration list. The function should start out approximately like this...

```
#' Given a species, month and sampling approach select variables for each month
@@ -163,7 +162,6 @@ select_covariates = function(approach = "greedy",
```

Use the [iterations tutorial](https://bigelowlab.github.io/handytandy/iterations.html) to apply your `select_covariates()` for each month using each approach. At each iteration write the configuration. When you are done, you should have 12 YAML files for each approach - so 24 YAML files written all together for each species.
-:::
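The nested iteration might be sketched like this. Note the assumptions: `write_configuration()` and the `mon` argument name are placeholders — use whatever saving helper and function signature your courseware actually provides:

```r
# Sketch: one configuration per approach-month combination (24 YAML files
# per species). write_configuration() is a hypothetical saving helper.
for (approach in c("greedy", "conservative")) {
  for (mon in month.abb) {
    cfg = select_covariates(approach = approach, mon = mon)
    write_configuration(cfg)   # placeholder: write one YAML per iteration
  }
}
```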



3 changes: 1 addition & 2 deletions C04_models.qmd
@@ -348,7 +348,6 @@ You can read it back later with `read_workflow()`.
We have built a random forest model using tools from the [tidymodels universe](https://www.tidymodels.org/). After reading in a suite of data, we split our data into training and testing sets, withholding the testing set until the very end. We looked at a variety of metrics including a simple tally, a confusion matrix, ROC and AUC, accuracy and partial dependencies. We saved the recipe and model together in a special container, called a workflow, to a file.

# Coding Assignment
-::: {.callout-note appearance="simple"}

Use the [iterations tutorial](https://bigelowlab.github.io/handytandy/iterations.html) to build a workflow for each month using one or both of your background selection methods. Save each workflow in the `models` directory. If you chose to do both background selection methods then you should end up with 24 workflows (12 months x 2 background sampling methods).

-:::
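A possible shape for that iteration, as a sketch only — `fit_workflow()` stands in for the recipe-plus-model steps of this chapter, and the file naming is just a suggestion:

```r
# Sketch: fit and save one workflow per month and background method (24 total).
# fit_workflow() is hypothetical; write_workflow() was introduced above.
for (approach in c("greedy", "conservative")) {
  for (mon in month.abb) {
    wf = fit_workflow(mon = mon, approach = approach)
    write_workflow(wf, file.path("models", paste0(approach, "_", mon, ".rds")))
  }
}
```

Encoding the month and approach in each filename makes it easy to locate the right workflow again at prediction time.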
9 changes: 3 additions & 6 deletions C05_prediction.qmd
@@ -153,15 +153,12 @@ write_prediction(forecast_2075, file = file.path(path, "g_Aug_RCP85_2075.tif"))
write_prediction(rcp85, file = file.path(path, "g_Aug_RCP85_all.tif"))
```

-To read it back simply provide the filename to `read_prediction()`. If you are reading back a multi-layer array, be sure to check out the `time` argument to assign values to the time dimension. Single layer arrays don't have the concept of time.
+To read it back simply provide the filename to `read_prediction()`. If you are reading back a multi-layer array, be sure to check out the `time` argument to assign values to the time dimension. Single layer arrays don't have the concept of time so the `time` argument is ignored.

# Recap

We made both a nowcast and a number of predictions using a previously saved workflow. Contrary to Yogi Berra's claim, it's actually pretty easy to predict the future. Perhaps more challenging is to interpret the prediction. We bundled these together to make time series plots, and we saved the `.pred_presence` values.

# Coding Assignment

For each climate scenario create a monthly forecast (so that's three: nowcast, forecast_2055 and forecast_2075) and save each in your `predictions` directory. Whether you choose to draw upon the greedy background sampling method, the conservative background sampling method or both is up to you. Keep in mind that some months may not have enough data to model without throwing an error. We suggest that you wrap your critical steps in a `try()` function which will catch the error without crashing your iterator. There is a tutorial on [error catching](https://bigelowlab.github.io/handytandy/try.html) that specifically uses `try()`.
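The `try()` pattern might look like this sketch, where `predict_month()` stands in for your own prediction steps:

```r
# Sketch: wrap the failure-prone step in try() so a month with too few
# observations doesn't crash the whole loop. predict_month() is hypothetical.
forecasts = lapply(month.abb, function(mon) {
  result = try(predict_month(mon), silent = TRUE)
  if (inherits(result, "try-error")) {
    warning("skipping ", mon, " - model failed")
    return(NULL)            # keep a placeholder so the list stays 12 long
  }
  result
})
```

Afterwards you can drop the failed months with something like `Filter(Negate(is.null), forecasts)`.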
46 changes: 24 additions & 22 deletions docs/C00_coding.html
@@ -223,7 +223,7 @@ <h2 id="toc-title">On this page</h2>
<ul>
<li><a href="#background" id="toc-background" class="nav-link active" data-scroll-target="#background"><span class="header-section-number">1</span> Background</a></li>
<li><a href="#one-of-many-available-guis-rstudio" id="toc-one-of-many-available-guis-rstudio" class="nav-link" data-scroll-target="#one-of-many-available-guis-rstudio"><span class="header-section-number">2</span> One of many available GUIs: RStudio</a></li>
<li><a href="#getting-the-software-you-will-need" id="toc-getting-the-software-you-will-need" class="nav-link" data-scroll-target="#getting-the-software-you-will-need"><span class="header-section-number">3</span> Getting the software you will need</a></li>
<li><a href="#getting-the-software-and-data-you-will-need" id="toc-getting-the-software-and-data-you-will-need" class="nav-link" data-scroll-target="#getting-the-software-and-data-you-will-need"><span class="header-section-number">3</span> Getting the software and data you will need</a></li>
<li><a href="#coding-with-r" id="toc-coding-with-r" class="nav-link" data-scroll-target="#coding-with-r"><span class="header-section-number">4</span> Coding with R</a>
<ul class="collapse">
<li><a href="#loading-the-necessary-tools" id="toc-loading-the-necessary-tools" class="nav-link" data-scroll-target="#loading-the-necessary-tools"><span class="header-section-number">4.1</span> Loading the necessary tools</a></li>
@@ -270,8 +270,8 @@ <h1 data-number="2"><span class="header-section-number">2</span> One of many ava
<p>RStudio is a free GUI that wraps around two languages (R and Python, and soon to be more). When you invoke RStudio you’ll see that it is laid out as a multi-panel application that runs inside your browser. It will look something like this screenshot.</p>
<p><img src="images/Rstudio.png" class="img-fluid" alt="Rstudio screenshot"> There are many <a href="https://duckduckgo.com/?q=introduction+to+rstudio&amp;t=ffab&amp;atb=v342-1&amp;iax=videos&amp;ia=videos&amp;iai=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dw8ooTMStQV0&amp;pn=1">RStudio tutorials</a> online. We encourage you to check them out.</p>
</section>
<section id="getting-the-software-you-will-need" class="level1" data-number="3">
<h1 data-number="3"><span class="header-section-number">3</span> Getting the software you will need</h1>
<section id="getting-the-software-and-data-you-will-need" class="level1" data-number="3">
<h1 data-number="3"><span class="header-section-number">3</span> Getting the software and data you will need</h1>
<p>Use this <a href="https://github.com/BigelowLab/ColbyForecasting2025/wiki/Courseware">wiki page</a> to guide you through the process of installing the software you’ll need for this course, “the courseware”.</p>
</section>
<section id="coding-with-r" class="level1" data-number="4">
@@ -423,26 +423,28 @@ <h2 data-number="4.4" class="anchored" data-anchor-id="array-data-aka-raster-dat
<div class="sourceCode cell-code" id="cb10"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>current <span class="ot">=</span> <span class="fu">read_brickman</span>(db)</span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>current</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
-<pre><code>stars object with 3 dimensions and 8 attributes
+<pre><code>stars object with 3 dimensions and 9 attributes
attribute(s):
Min. 1st Qu. Median Mean 3rd Qu.
MLD 1.011275e+00 5.583339810 15.967359543 18.910421492 2.809953e+01
Sbtm 2.324167e+01 32.136343956 34.232215881 33.507147254 3.491243e+01
SSS 1.644333e+01 30.735633373 31.104771614 31.492407921 3.203519e+01
SST -7.826599e-01 6.434107542 12.359498501 12.151707840 1.763068e+01
Tbtm -2.676387e-01 3.595118523 6.110801697 6.122372065 7.521761e+00
U -2.121380e-01 -0.010892980 -0.002634738 -0.010139401 7.229637e-04
V -1.883337e-01 -0.010722862 -0.002858645 -0.008474233 9.565173e-04
Xbtm 3.275602e-06 0.001458065 0.003088348 0.008360344 7.256525e-03
Max. NA's
MLD 106.69815063 59796
Sbtm 35.15741730 59796
SSS 35.59160995 59796
SST 26.43147278 59796
Tbtm 24.60999298 59796
U 0.07469980 59796
V 0.05264002 59796
Xbtm 0.18996811 59796
Min. 1st Qu. Median Mean 3rd Qu.
MLD 1.011275e+00 5.583339810 15.967359543 18.910421492 2.809953e+01
Sbtm 2.324167e+01 32.136343956 34.232215881 33.507147254 3.491243e+01
SSS 1.644333e+01 30.735633373 31.104771614 31.492407921 3.203519e+01
SST -7.826599e-01 6.434107542 12.359498501 12.151707840 1.763068e+01
Tbtm -2.676387e-01 3.595118523 6.110801697 6.122372065 7.521761e+00
U -2.121380e-01 -0.010892980 -0.002634738 -0.010139401 7.229637e-04
V -1.883337e-01 -0.010722862 -0.002858645 -0.008474233 9.565173e-04
Xbtm 3.275602e-06 0.001458065 0.003088348 0.008360344 7.256525e-03
depth 5.000000e+00 60.258880615 145.012619019 923.313763739 1.704049e+03
Max. NA's
MLD 1.066982e+02 59796
Sbtm 3.515742e+01 59796
SSS 3.559161e+01 59796
SST 2.643147e+01 59796
Tbtm 2.460999e+01 59796
U 7.469980e-02 59796
V 5.264002e-02 59796
Xbtm 1.899681e-01 59796
depth 4.964409e+03 59796
dimension(s):
from to offset delta refsys point values x/y
x 1 121 -74.93 0.08226 WGS 84 FALSE NULL [x]
13 changes: 2 additions & 11 deletions docs/C01_observations.html
@@ -553,7 +553,7 @@ <h2 data-number="5.6" class="anchored" data-anchor-id="geometry"><span class="he
<div class="cell">
<div class="sourceCode cell-code" id="cb37"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb37-1"><a href="#cb37-1" aria-hidden="true" tabindex="-1"></a>db <span class="ot">=</span> <span class="fu">brickman_database</span>() <span class="sc">|&gt;</span></span>
<span id="cb37-2"><a href="#cb37-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">filter</span>(scenario <span class="sc">==</span> <span class="st">"STATIC"</span>, var <span class="sc">==</span> <span class="st">"mask"</span>)</span>
<span id="cb37-3"><a href="#cb37-3" aria-hidden="true" tabindex="-1"></a>mask <span class="ot">=</span> <span class="fu">read_brickman</span>(db)</span>
<span id="cb37-3"><a href="#cb37-3" aria-hidden="true" tabindex="-1"></a>mask <span class="ot">=</span> <span class="fu">read_brickman</span>(db, <span class="at">add_depth =</span> <span class="cn">FALSE</span>)</span>
<span id="cb37-4"><a href="#cb37-4" aria-hidden="true" tabindex="-1"></a>mask</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>stars object with 2 dimensions and 1 attribute
@@ -631,23 +631,14 @@ <h1 data-number="6"><span class="header-section-number">6</span> Recap</h1>
</section>
<section id="coding-assignment" class="level1" data-number="7">
<h1 data-number="7"><span class="header-section-number">7</span> Coding Assignment</h1>
<div class="callout callout-style-simple callout-note">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>We went through many steps to filter out records that won’t help us model. We’ll need that filtered data many-many-many times in the days ahead. Wouldn’t it be nice if we could sweep all of those filtering steps into a single function, call it <code>read_observations()</code>, that simply took care of it all for us? Yes - that would be really nice!</p>
<p>Open the “read_observations.R” file you’ll find in the “functions” directory. We have started it for you. Edit the function so that it appropriately filters your species data set by adding optional arguments (like <code>minimum_year</code> has been added). And then adding the code steps needed to implement that filter.</p>
<p>Not every filter needs user input. For instance, <code>eventDate</code> can’t be <code>NA</code>, and all points must fall within the area covered by the Brickman data. So you can automatically add those filters without any user options.</p>
<p>On the other hand, filtering by <code>basisOfRecord</code> or <code>individualCount</code> might need more flexibility, especially if you might switch to other species.</p>
-<p>SPeaking of which, we provided <code>scientificname</code> with a default value - we chose “Mola mola” because we are a bit lazy. If you are feeling lazy, you can change the default to your own species.</p>
+<p>Speaking of which, we provided <code>scientificname</code> with a default value - we chose “Mola mola” because we are a bit lazy. If you are feeling lazy, you can change the default to your own species.</p>
<p>As you build your function, pause every so often and run the following to test things out.</p>
<pre><code>source("setup.R")
obs = read_observations()</code></pre>
-</div>
-</div>
-</div>


</section>