Commit: much needed tidy up
btupper committed Jan 6, 2025
1 parent feb117f commit 43b258c
Showing 27 changed files with 931 additions and 999 deletions.
2 changes: 1 addition & 1 deletion C00_coding.qmd
@@ -16,7 +16,7 @@ RStudio is a free GUI that wraps around two languages (R and Python, and soon to be more). When you invoke RStudio you'll see that it is laid out as a multi-panel application that runs inside your browser. It will look something like this screenshot.
![Rstudio screenshot](images/Rstudio.png)
There are many [RStudio tutorials](https://duckduckgo.com/?q=introduction+to+rstudio&t=ffab&atb=v342-1&iax=videos&ia=videos&iai=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dw8ooTMStQV0&pn=1) online. We encourage you to check them out.

-# Getting the software you will need
+# Getting the software and data you will need

Use this [wiki page](https://github.com/BigelowLab/ColbyForecasting2025/wiki/Courseware) to guide you through the process of installing the software you'll need for this course, "the courseware".

9 changes: 4 additions & 5 deletions C01_observations.qmd
@@ -176,7 +176,7 @@ We need to load the Brickman database, and then filter it for the static variabl
```{r mask}
db = brickman_database() |>
filter(scenario == "STATIC", var == "mask")
-mask = read_brickman(db)
+mask = read_brickman(db, add_depth = FALSE)
mask
```

@@ -216,7 +216,7 @@ So, we dropped `{r} dropped_records` records which is about `{r} sprintf("%0.1f%
We have explored a data set, in particular for *Mola mola*; your species may present you with unique challenges. Our goal is to winnow the original data set down to just the most reliable observations for modeling.

# Coding Assignment
-::: {.callout-note appearance="simple"}

We went through many steps to filter out records that won't help us model. We'll need that filtered data many-many-many times in the days ahead. Wouldn't it be nice if we could sweep all of those filtering steps into a single function, call it `read_observations()`, that simply took care of it all for us? Yes - that would be really nice!

Open the "read_observations.R" file you'll find in the "functions" directory. We have started it for you. Edit the function so that it appropriately filters your species data set by adding optional arguments (like `minimum_year` has been added), and then adding the code steps needed to implement each filter.
@@ -225,12 +225,11 @@ Not every filter needs user input. For instance, `eventDate` can't be `NA`, and 

On the other hand, filtering by `basisOfRecord` or `individualCount` might need more flexibility, especially if you might switch to other species.

-SPeaking of which, we provided `scientificname` with a default value - we chose "Mola mola" because we are a bit lazy. If you are feeling lazy, you can change the default to your own species.
+Speaking of which, we provided `scientificname` with a default value - we chose "Mola mola" because we are a bit lazy. If you are feeling lazy, you can change the default to your own species.

As you build your function, pause every so often and run the following to test things out.

```
source("setup.R")
obs = read_observations()
```
-:::
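Such a function might start out something like the sketch below. This is only an illustration — `read_obis()` is a placeholder for however your courseware actually loads the raw species download, and the argument names are ours, not the course's:

```r
# Hypothetical sketch of functions/read_observations.R -- adapt to your species.
# read_obis() is a placeholder; swap in the real reader from your courseware.
read_observations = function(scientificname = "Mola mola",
                             minimum_year = 2000) {
  read_obis(scientificname) |>
    dplyr::filter(!is.na(eventDate)) |>     # required: no missing dates
    dplyr::filter(year >= minimum_year)     # optional: user-adjustable cutoff
}
```

The point of the pattern is that every required filter is hard-coded while every judgment call becomes an argument with a sensible default.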
10 changes: 3 additions & 7 deletions C02_background.qmd
@@ -23,7 +23,7 @@ obs = read_observations(scientificname = "Mola mola") |>
filter(month == "Aug")
db = brickman_database() |>
filter(scenario == "STATIC", var == "mask")
-mask = read_brickman(db)
+mask = read_brickman(db, add_depth = FALSE)
```

We have two approaches to what happens next. The first is the greedy approach that says: gather together lots of observations and background points. Lots and lots! The second approach is much more conservative as it considers the value (or not!) of having replicate measurements at locations that share the same array cell.
@@ -245,21 +245,17 @@ plot(coast, col = "orange", add = TRUE)
We have prepared what we call "model inputs", in particular for *Mola mola*, by selecting background points using two different approaches: greedy and conservative. There are lots of other approaches, too, but for the sake of learning we'll settle on just these two. We developed a function that will produce our model inputs for a given month, and saved them to disk. Then we read at least one back and showed that we can restore these from disk.

# Coding Assignment
-:::{.callout-note appearance="simple"}
-Use the [iterations tutorial](https://bigelowlab.github.io/handytandy/iterations.html) to apply your `make_model_input_by_month()` for each month.
+Use the [iterations tutorial](https://bigelowlab.github.io/handytandy/iterations.html) to apply your `make_model_input_by_month()` for each month. You'll know you have done it correctly if your result is a list filled with lists of greedy-conservative tables, **and** your `model_inputs` directory holds at least 24 files (12 months x 2 sampling schemes).
-
-You'll know you have done it correctly if your result is a list filled with lists of greedy-conservative tables.
-:::
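One way to do that iteration, sketched under the assumption that your `make_model_input_by_month()` accepts a month abbreviation:

```r
# Sketch: call make_model_input_by_month() once per month, collecting the
# results in a list named "Jan" ... "Dec". Each element should itself be a
# list of greedy/conservative tables.
model_inputs = sapply(month.abb,
                      function(mon) make_model_input_by_month(mon),
                      simplify = FALSE)
```

Because `month.abb` is a character vector, `sapply(..., simplify = FALSE)` returns a list whose names are the month abbreviations, which makes later lookups easy.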


# Challenge
And here we add one challenge...

-:::{.callout-note appearance="simple"}
Create a function to read the correct model input when given the species, month and approach.

Use the menu option `File > New File > R Script` to create a blank file. Save the file (even though it is empty) in the "functions" directory as "model_input.R". Use this file to build a function (or set of functions) that uses this set of arguments. Below is a template to help you get started.
-:::


```{r read_model_input}
#' Reads a model input file given species, month, approach and path
4 changes: 1 addition & 3 deletions C03_covariates.qmd
@@ -136,8 +136,7 @@ Use the `Files` pane to navigate to your personal data directory. Open the `g_A
We loaded the covariates for the "PRESENT" climate scenario and looked at collinearity across the entire study domain. We invoked a function that suggests which variables to keep and which to drop based upon collinearity. We examined the covariates at just the `presence` and `background` locations. We then saved a configuration for later reuse.

# Coding Assignment
-::: {.callout-note appearance="simple"}
-Open nd edit the file called `functions/select_covariates.R`. Within the file write the function(s) you need to select the "keeper" variables for a given approach (greedy or conservative) and a given month (Jan - Dec). Have the function return an appropriate configuration list. The function shoulkd start out approximately like this...
+Open and edit the file called `functions/select_covariates.R`. Within the file write the function(s) you need to select the "keeper" variables for a given approach (greedy or conservative) and a given month (Jan - Dec). Have the function return an appropriate configuration list. The function should start out approximately like this...

```
#' Given a species, month and sampling approach select variables for each month
@@ -163,7 +162,6 @@ select_covariates = function(approach = "greedy",
```

Use the [iterations tutorial](https://bigelowlab.github.io/handytandy/iterations.html) to apply your `select_covariates()` for each month using each approach. At each iteration write the configuration. When you are done, you should have 12 YAML files for each approach - so 24 YAML files written all together for each species.
-:::
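The nested iteration might be sketched like this. Note the assumptions: `write_configuration()` and the `mon` argument name are placeholders — use whatever saving helper and function signature your courseware actually provides:

```r
# Sketch: one configuration per approach-month combination (24 YAML files
# per species). write_configuration() is a hypothetical saving helper.
for (approach in c("greedy", "conservative")) {
  for (mon in month.abb) {
    cfg = select_covariates(approach = approach, mon = mon)
    write_configuration(cfg)   # placeholder: write one YAML per iteration
  }
}
```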



3 changes: 1 addition & 2 deletions C04_models.qmd
@@ -348,7 +348,6 @@ You can read it back later with `read_workflow()`.
We have built a random forest model using tools from the [tidymodels universe](https://www.tidymodels.org/). After reading in a suite of data, we split our data into training and testing sets, withholding the testing set until the very end. We looked at a variety of metrics including a simple tally, a confusion matrix, ROC and AUC, accuracy and partial dependencies. We saved the recipe and model together in a special container, called a workflow, to a file.

# Coding Assignment
-::: {.callout-note appearance="simple"}

Use the [iterations tutorial](https://bigelowlab.github.io/handytandy/iterations.html) to build a workflow for each month using one or both of your background selection methods. Save each workflow in the `models` directory. If you chose to do both background selection methods then you should end up with 24 workflows (12 months x 2 background sampling methods).

-:::
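A possible shape for that iteration, as a sketch only — `fit_workflow()` stands in for the recipe-plus-model steps of this chapter, and the file naming is just a suggestion:

```r
# Sketch: fit and save one workflow per month and background method (24 total).
# fit_workflow() is hypothetical; write_workflow() was introduced above.
for (approach in c("greedy", "conservative")) {
  for (mon in month.abb) {
    wf = fit_workflow(mon = mon, approach = approach)
    write_workflow(wf, file.path("models", paste0(approach, "_", mon, ".rds")))
  }
}
```

Encoding the month and approach in each filename makes it easy to locate the right workflow again at prediction time.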
9 changes: 3 additions & 6 deletions C05_prediction.qmd
@@ -153,15 +153,12 @@ write_prediction(forecast_2075, file = file.path(path, "g_Aug_RCP85_2075.tif"))
write_prediction(rcp85, file = file.path(path, "g_Aug_RCP85_all.tif"))
```

-To read it back simply provide the filename to `read_prediction()`. If you are reading back a multi-layer array, be sure to check out the `time` argument to assign values to the time dimension. Single layer arrays don't have the concept of time.
+To read it back simply provide the filename to `read_prediction()`. If you are reading back a multi-layer array, be sure to check out the `time` argument to assign values to the time dimension. Single layer arrays don't have the concept of time so the `time` argument is ignored.

# Recap

We made both a nowcast and a number of predictions using a previously saved workflow. Contrary to Yogi Berra's claim, it's actually pretty easy to predict the future. Perhaps more challenging is to interpret the prediction. We bundled these together to make time series plots, and we saved the `.pred_presence` values.

# Coding Assignment

For each climate scenario create a monthly forecast (so that's three: nowcast, forecast_2055 and forecast_2075) and save each in your `predictions` directory. Whether you choose to draw upon the greedy background sampling method, the conservative background sampling method or both is up to you. Keep in mind that some months may not have enough data to model without throwing an error. We suggest that you wrap your critical steps in a `try()` function which will catch the error without crashing your iterator. There is a tutorial on [error catching](https://bigelowlab.github.io/handytandy/try.html) that specifically uses `try()`.
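The `try()` pattern might look like this sketch, where `predict_month()` stands in for your own prediction steps:

```r
# Sketch: wrap the failure-prone step in try() so a month with too few
# observations doesn't crash the whole loop. predict_month() is hypothetical.
forecasts = lapply(month.abb, function(mon) {
  result = try(predict_month(mon), silent = TRUE)
  if (inherits(result, "try-error")) {
    warning("skipping ", mon, " - model failed")
    return(NULL)            # keep a placeholder so the list stays 12 long
  }
  result
})
```

Afterwards you can drop the failed months with something like `Filter(Negate(is.null), forecasts)`.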
46 changes: 24 additions & 22 deletions docs/C00_coding.html
@@ -223,7 +223,7 @@ <h2 id="toc-title">On this page</h2>
<ul>
<li><a href="#background" id="toc-background" class="nav-link active" data-scroll-target="#background"><span class="header-section-number">1</span> Background</a></li>
<li><a href="#one-of-many-available-guis-rstudio" id="toc-one-of-many-available-guis-rstudio" class="nav-link" data-scroll-target="#one-of-many-available-guis-rstudio"><span class="header-section-number">2</span> One of many available GUIs: RStudio</a></li>
<li><a href="#getting-the-software-you-will-need" id="toc-getting-the-software-you-will-need" class="nav-link" data-scroll-target="#getting-the-software-you-will-need"><span class="header-section-number">3</span> Getting the software you will need</a></li>
<li><a href="#getting-the-software-and-data-you-will-need" id="toc-getting-the-software-and-data-you-will-need" class="nav-link" data-scroll-target="#getting-the-software-and-data-you-will-need"><span class="header-section-number">3</span> Getting the software and data you will need</a></li>
<li><a href="#coding-with-r" id="toc-coding-with-r" class="nav-link" data-scroll-target="#coding-with-r"><span class="header-section-number">4</span> Coding with R</a>
<ul class="collapse">
<li><a href="#loading-the-necessary-tools" id="toc-loading-the-necessary-tools" class="nav-link" data-scroll-target="#loading-the-necessary-tools"><span class="header-section-number">4.1</span> Loading the necessary tools</a></li>
@@ -270,8 +270,8 @@ <h1 data-number="2"><span class="header-section-number">2</span> One of many ava
<p>RStudio is a free GUI that wraps around two languages (R and Python, and soon to be more). When you invoke RStudio you’ll see that it is laid out as a multi-panel application that runs inside your browser. It will look something like this screenshot.</p>
<p><img src="images/Rstudio.png" class="img-fluid" alt="Rstudio screenshot"> There are many <a href="https://duckduckgo.com/?q=introduction+to+rstudio&amp;t=ffab&amp;atb=v342-1&amp;iax=videos&amp;ia=videos&amp;iai=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dw8ooTMStQV0&amp;pn=1">RStudio tutorials</a> online. We encourage you to check them out.</p>
</section>
<section id="getting-the-software-you-will-need" class="level1" data-number="3">
<h1 data-number="3"><span class="header-section-number">3</span> Getting the software you will need</h1>
<section id="getting-the-software-and-data-you-will-need" class="level1" data-number="3">
<h1 data-number="3"><span class="header-section-number">3</span> Getting the software and data you will need</h1>
<p>Use this <a href="https://github.com/BigelowLab/ColbyForecasting2025/wiki/Courseware">wiki page</a> to guide you through the process of installing the software you’ll need for this course, “the courseware”.</p>
</section>
<section id="coding-with-r" class="level1" data-number="4">
@@ -423,26 +423,28 @@ <h2 data-number="4.4" class="anchored" data-anchor-id="array-data-aka-raster-dat
<div class="sourceCode cell-code" id="cb10"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>current <span class="ot">=</span> <span class="fu">read_brickman</span>(db)</span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>current</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
-<pre><code>stars object with 3 dimensions and 8 attributes
+<pre><code>stars object with 3 dimensions and 9 attributes
attribute(s):
Min. 1st Qu. Median Mean 3rd Qu.
MLD 1.011275e+00 5.583339810 15.967359543 18.910421492 2.809953e+01
Sbtm 2.324167e+01 32.136343956 34.232215881 33.507147254 3.491243e+01
SSS 1.644333e+01 30.735633373 31.104771614 31.492407921 3.203519e+01
SST -7.826599e-01 6.434107542 12.359498501 12.151707840 1.763068e+01
Tbtm -2.676387e-01 3.595118523 6.110801697 6.122372065 7.521761e+00
U -2.121380e-01 -0.010892980 -0.002634738 -0.010139401 7.229637e-04
V -1.883337e-01 -0.010722862 -0.002858645 -0.008474233 9.565173e-04
Xbtm 3.275602e-06 0.001458065 0.003088348 0.008360344 7.256525e-03
Max. NA's
MLD 106.69815063 59796
Sbtm 35.15741730 59796
SSS 35.59160995 59796
SST 26.43147278 59796
Tbtm 24.60999298 59796
U 0.07469980 59796
V 0.05264002 59796
Xbtm 0.18996811 59796
Min. 1st Qu. Median Mean 3rd Qu.
MLD 1.011275e+00 5.583339810 15.967359543 18.910421492 2.809953e+01
Sbtm 2.324167e+01 32.136343956 34.232215881 33.507147254 3.491243e+01
SSS 1.644333e+01 30.735633373 31.104771614 31.492407921 3.203519e+01
SST -7.826599e-01 6.434107542 12.359498501 12.151707840 1.763068e+01
Tbtm -2.676387e-01 3.595118523 6.110801697 6.122372065 7.521761e+00
U -2.121380e-01 -0.010892980 -0.002634738 -0.010139401 7.229637e-04
V -1.883337e-01 -0.010722862 -0.002858645 -0.008474233 9.565173e-04
Xbtm 3.275602e-06 0.001458065 0.003088348 0.008360344 7.256525e-03
depth 5.000000e+00 60.258880615 145.012619019 923.313763739 1.704049e+03
Max. NA's
MLD 1.066982e+02 59796
Sbtm 3.515742e+01 59796
SSS 3.559161e+01 59796
SST 2.643147e+01 59796
Tbtm 2.460999e+01 59796
U 7.469980e-02 59796
V 5.264002e-02 59796
Xbtm 1.899681e-01 59796
depth 4.964409e+03 59796
dimension(s):
from to offset delta refsys point values x/y
x 1 121 -74.93 0.08226 WGS 84 FALSE NULL [x]
13 changes: 2 additions & 11 deletions docs/C01_observations.html
@@ -553,7 +553,7 @@ <h2 data-number="5.6" class="anchored" data-anchor-id="geometry"><span class="he
<div class="cell">
<div class="sourceCode cell-code" id="cb37"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb37-1"><a href="#cb37-1" aria-hidden="true" tabindex="-1"></a>db <span class="ot">=</span> <span class="fu">brickman_database</span>() <span class="sc">|&gt;</span></span>
<span id="cb37-2"><a href="#cb37-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">filter</span>(scenario <span class="sc">==</span> <span class="st">"STATIC"</span>, var <span class="sc">==</span> <span class="st">"mask"</span>)</span>
<span id="cb37-3"><a href="#cb37-3" aria-hidden="true" tabindex="-1"></a>mask <span class="ot">=</span> <span class="fu">read_brickman</span>(db)</span>
<span id="cb37-3"><a href="#cb37-3" aria-hidden="true" tabindex="-1"></a>mask <span class="ot">=</span> <span class="fu">read_brickman</span>(db, <span class="at">add_depth =</span> <span class="cn">FALSE</span>)</span>
<span id="cb37-4"><a href="#cb37-4" aria-hidden="true" tabindex="-1"></a>mask</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>stars object with 2 dimensions and 1 attribute
@@ -631,23 +631,14 @@ <h1 data-number="6"><span class="header-section-number">6</span> Recap</h1>
</section>
<section id="coding-assignment" class="level1" data-number="7">
<h1 data-number="7"><span class="header-section-number">7</span> Coding Assignment</h1>
<div class="callout callout-style-simple callout-note">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-body-container">
<p>We went through many steps to filter out records that won’t help us model. We’ll need that filtered data many-many-many times in the days ahead. Wouldn’t it be nice if we could sweep all of those filtering steps into a single function, call it <code>read_observations()</code>, that simply took care of it all for us? Yes - that would be really nice!</p>
<p>Open the “read_observations.R” file you’ll find in the “functions” directory. We have started it for you. Edit the function so that it appropriately filters your species data set by adding optional arguments (like <code>minimum_year</code> has been added). And then adding the code steps needed to implement that filter.</p>
<p>Not every filter needs user input. For instance, <code>eventDate</code> can’t be <code>NA</code>, and all points must fall within the area covered by the Brickman data. So you can automatically add those filters without any user options.</p>
<p>On the other hand, filtering by <code>basisOfRecord</code> or <code>individualCount</code> might need more flexibility, especially if you might switch to other species.</p>
-<p>SPeaking of which, we provided <code>scientificname</code> with a default value - we chose “Mola mola” because we are a bit lazy. If you are feeling lazy, you can change the default to your own species.</p>
+<p>Speaking of which, we provided <code>scientificname</code> with a default value - we chose “Mola mola” because we are a bit lazy. If you are feeling lazy, you can change the default to your own species.</p>
<p>As you build your function, pause every so often and run the following to test things out.</p>
<pre><code>source("setup.R")
obs = read_observations()</code></pre>
-</div>
-</div>
-</div>


</section>