diff --git a/C02_background.qmd b/C02_background.qmd
index 8d98a2e..7bc2899 100644
--- a/C02_background.qmd
+++ b/C02_background.qmd
@@ -48,7 +48,7 @@ greedy_input
 ```
 You may encounter a warning message that says, "There are fewer available cells for raster...". This is useful information, there simply weren't a lot of non-NA cells to sample from.  Let's plot this.
 
-```{r plot_greedye_input}
+```{r plot_greedy_input}
 plot(greedy_input['class'], 
      axes = TRUE,  
      pch = ".", 
@@ -67,26 +67,43 @@ Well, that's imbalanced with a different number presences than background points
 
 # The conservative approach - data thinning
 
-The conservative approach says that the environmental covariates (that's the Brickman data), or more specifically the resolution of the envirnomental covariates, should dictate the sampling.  The core thought here is that it doesn't produce more or better information to have replicate measurements of either presences or  In this approach we eliminate (thin) presences so that we have no more than one per covariate array cell. 
+The conservative approach says that the environmental covariates (that's the Brickman data), or more specifically the resolution of the environmental covariates, should dictate the sampling.  The core thought here is that it doesn't produce more or better information to have replicate measurements of either presences or background points within a single covariate cell.
+
+## Thin by cell
+
+In this approach we eliminate (thin) presences so that we have no more than one per covariate array cell. 
 
 ```{r thin_by_cell}
 dim_before = dim(obs)
-cat("number of rows before thinning:", dim_before[1], "\n")
-obs = thin_by_cell(obs, mask)
-dim_after = dim(obs)
-cat("number of rows after thinning:", dim_after[1], "\n")
+cat("number of rows before cell thinning:", dim_before[1], "\n")
+thinned_obs = thin_by_cell(obs, mask)
+dim_after = dim(thinned_obs)
+cat("number of rows after cell thinning:", dim_after[1], "\n")
+```
+
+So, that dropped quite a few!  
+
+## Make a weighted sampling map
+
+There is a technique we can use to make a weighted sampling map.  Simply counting the number of original observations per cell will indicate where we are most likely to observe `Mola mola`.  
+
+```{r sample_weight}
+samp_weight = rasterize_point_density(obs, mask)
+plot(samp_weight, axes = TRUE, breaks = "equal", col = rev(hcl.colors(10)), reset = FALSE)
+plot(coast, col = "orange", lwd = 2, add = TRUE)
 ```
 
-So, that dropped quite a few!  Now let's take a look at the background, but this time we'll try to match the count of presences.
+Now let's take a look at the background, but this time we'll try to match the count of presences.
 
 ```{r sample_background_conservative}
-conservative_input = sample_background(obs, mask, 
+conservative_input = sample_background(thinned_obs, samp_weight, 
                               n = 2 * nrow(obs),
                               class_label = "background",
-                              method = c("dist_max", 30000),
+                              method = "bias",
                               return_pres = TRUE)
 count(conservative_input, class)
 ```
+Whoa - that's many fewer background points.
 
 ```{r plot_conservative_input}
 plot(conservative_input['class'], 
@@ -97,6 +114,7 @@ plot(conservative_input['class'],
      reset = FALSE)
 plot(coast, col = "orange", add = TRUE)
 ```
+It appears that background points are essentially shadowing the thinned presence points.
 
 # Greedy or Conservative?
 
@@ -176,13 +194,16 @@ make_model_input_by_month  = function(mon = "Jan",
   write_sf(greedy_input, file.path(path, filename))
   
   # thin the obs
-  obs = thin_by_cell(obs, raster)
+  thinned_obs = thin_by_cell(obs, raster)
+  
+  # sampling weight
+  samp_weight = rasterize_point_density(obs, raster)
   
   # make the conservative model
-  conservative_input = sample_background(obs, raster,
+  conservative_input = sample_background(thinned_obs, samp_weight,
                                    n = 2 * nrow(obs),
                                    class_label = "background",
-                                   method = c("dist_max", 30000),
+                                   method = "bias",
                                    return_pres = TRUE)
   
   # save the conservative data
@@ -199,7 +220,6 @@ make_model_input_by_month  = function(mon = "Jan",
 }
 ```
 
-
 # Reusing the function in a loop
 More phew!  But that is it!  Now we use a for loop to run through the months, calling our function each time. Happily, the built-in variable `month.abb` has all of the month names in order.
 
@@ -245,8 +265,8 @@ plot(coast, col = "orange", add = TRUE)
 We have prepared what we call "model inputs", in particular for *Mola mola*, by selecting background points using two different approaches: greedy and conservative.  There are lots of other approaches, too, but for the sake of learning we'll settle on just these two.  We developed a function that will produce our model inputs for a given month, and saved them to disk.  Then we read at least one back and showed that we can restore these from disk.
 
 # Coding Assignment
-Use the [iterations tutorial](https://bigelowlab.github.io/handytandy/iterations.html) to apply your `make_model_input_by_month()` for each month.  You'll know you have done it correctly if your result is a list filled with lists of greedy-conservative tables, **and** your `model_inputs` directory holds at least 24 files (12 months x 2 sampling schemes).
 
+Use the [iterations tutorial](https://bigelowlab.github.io/handytandy/iterations.html) to apply your `make_model_input_by_month()` for each month.  You'll know you have done it correctly if your result is a list filled with lists of greedy-conservative tables, **and** your `model_inputs` directory holds at least 24 files (12 months x 2 sampling schemes).
 
 
 # Challenge
diff --git a/docs/C00_coding.html b/docs/C00_coding.html
index e95c1ae..d716d9c 100644
--- a/docs/C00_coding.html
+++ b/docs/C00_coding.html
@@ -345,7 +345,7 @@ <h3 data-number="4.2.1" class="anchored" data-anchor-id="point-data"><span class
         x = sf::st_as_sf(x, coords = c("lon", "lat"), crs = 4326)
     x
 }
-&lt;bytecode: 0x7fe9513d2b48&gt;</code></pre>
+&lt;bytecode: 0x7fcb072131b8&gt;</code></pre>
 </div>
 </div>
 <p>If that still doesn’t work, we highly recommend trying <a href="https://rseek.org/">Rseek.org</a> which is an R-language specific search engine.</p>
diff --git a/docs/C02_background.html b/docs/C02_background.html
index 0042361..1e8450a 100644
--- a/docs/C02_background.html
+++ b/docs/C02_background.html
@@ -226,7 +226,11 @@ <h2 id="toc-title">On this page</h2>
   <ul class="collapse">
   <li><a href="#sample-background" id="toc-sample-background" class="nav-link" data-scroll-target="#sample-background"><span class="header-section-number">2.1</span> Sample background</a></li>
   </ul></li>
-  <li><a href="#the-conservative-approach---data-thinning" id="toc-the-conservative-approach---data-thinning" class="nav-link" data-scroll-target="#the-conservative-approach---data-thinning"><span class="header-section-number">3</span> The conservative approach - data thinning</a></li>
+  <li><a href="#the-conservative-approach---data-thinning" id="toc-the-conservative-approach---data-thinning" class="nav-link" data-scroll-target="#the-conservative-approach---data-thinning"><span class="header-section-number">3</span> The conservative approach - data thinning</a>
+  <ul class="collapse">
+  <li><a href="#thin-by-cell" id="toc-thin-by-cell" class="nav-link" data-scroll-target="#thin-by-cell"><span class="header-section-number">3.1</span> Thin by cell</a></li>
+  <li><a href="#make-a-weighted-sampling-map" id="toc-make-a-weighted-sampling-map" class="nav-link" data-scroll-target="#make-a-weighted-sampling-map"><span class="header-section-number">3.2</span> Make a weighted sampling map</a></li>
+  </ul></li>
   <li><a href="#greedy-or-conservative" id="toc-greedy-or-conservative" class="nav-link" data-scroll-target="#greedy-or-conservative"><span class="header-section-number">4</span> Greedy or Conservative?</a></li>
   <li><a href="#model-input-per-month" id="toc-model-input-per-month" class="nav-link" data-scroll-target="#model-input-per-month"><span class="header-section-number">5</span> Model input per month</a>
   <ul class="collapse">
@@ -333,7 +337,7 @@ <h2 data-number="2.1" class="anchored" data-anchor-id="sample-background"><span
 <div class="cell-output-display">
 <div>
 <figure class="figure">
-<p><img src="C02_background_files/figure-html/plot_greedye_input-1.png" class="img-fluid figure-img" width="672"></p>
+<p><img src="C02_background_files/figure-html/plot_greedy_input-1.png" class="img-fluid figure-img" width="672"></p>
 </figure>
 </div>
 </div>
@@ -359,49 +363,73 @@ <h2 data-number="2.1" class="anchored" data-anchor-id="sample-background"><span
 </section>
 <section id="the-conservative-approach---data-thinning" class="level1" data-number="3">
 <h1 data-number="3"><span class="header-section-number">3</span> The conservative approach - data thinning</h1>
-<p>The conservative approach says that the environmental covariates (that’s the Brickman data), or more specifically the resolution of the envirnomental covariates, should dictate the sampling. The core thought here is that it doesn’t produce more or better information to have replicate measurements of either presences or In this approach we eliminate (thin) presences so that we have no more than one per covariate array cell.</p>
+<p>The conservative approach says that the environmental covariates (that’s the Brickman data), or more specifically the resolution of the environmental covariates, should dictate the sampling. The core thought here is that it doesn’t produce more or better information to have replicate measurements of either presences or background points within a single covariate cell.</p>
+<section id="thin-by-cell" class="level2" data-number="3.1">
+<h2 data-number="3.1" class="anchored" data-anchor-id="thin-by-cell"><span class="header-section-number">3.1</span> Thin by cell</h2>
+<p>In this approach we eliminate (thin) presences so that we have no more than one per covariate array cell.</p>
 <div class="cell">
 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>dim_before <span class="ot">=</span> <span class="fu">dim</span>(obs)</span>
-<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="fu">cat</span>(<span class="st">"number of rows before thinning:"</span>, dim_before[<span class="dv">1</span>], <span class="st">"</span><span class="sc">\n</span><span class="st">"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="fu">cat</span>(<span class="st">"number of rows before cell thinning:"</span>, dim_before[<span class="dv">1</span>], <span class="st">"</span><span class="sc">\n</span><span class="st">"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output cell-output-stdout">
-<pre><code>number of rows before thinning: 2459 </code></pre>
+<pre><code>number of rows before cell thinning: 2459 </code></pre>
 </div>
-<div class="sourceCode cell-code" id="cb12"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a>obs <span class="ot">=</span> <span class="fu">thin_by_cell</span>(obs, mask)</span>
-<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>dim_after <span class="ot">=</span> <span class="fu">dim</span>(obs)</span>
-<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="fu">cat</span>(<span class="st">"number of rows after thinning:"</span>, dim_after[<span class="dv">1</span>], <span class="st">"</span><span class="sc">\n</span><span class="st">"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+<div class="sourceCode cell-code" id="cb12"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a>thinned_obs <span class="ot">=</span> <span class="fu">thin_by_cell</span>(obs, mask)</span>
+<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>dim_after <span class="ot">=</span> <span class="fu">dim</span>(thinned_obs)</span>
+<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="fu">cat</span>(<span class="st">"number of rows after cell thinning:"</span>, dim_after[<span class="dv">1</span>], <span class="st">"</span><span class="sc">\n</span><span class="st">"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output cell-output-stdout">
-<pre><code>number of rows after thinning: 1204 </code></pre>
+<pre><code>number of rows after cell thinning: 1204 </code></pre>
 </div>
 </div>
-<p>So, that dropped quite a few! Now let’s take a look at the background, but this time we’ll try to match the count of presences.</p>
+<p>So, that dropped quite a few!</p>
+</section>
+<section id="make-a-weighted-sampling-map" class="level2" data-number="3.2">
+<h2 data-number="3.2" class="anchored" data-anchor-id="make-a-weighted-sampling-map"><span class="header-section-number">3.2</span> Make a weighted sampling map</h2>
+<p>There is a technique we can use to make a weighted sampling map. Simply counting the number of original observations per cell will indicate where we are most likely to observe <code>Mola mola</code>.</p>
 <div class="cell">
-<div class="sourceCode cell-code" id="cb14"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a>conservative_input <span class="ot">=</span> <span class="fu">sample_background</span>(obs, mask, </span>
-<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a>                              <span class="at">n =</span> <span class="dv">2</span> <span class="sc">*</span> <span class="fu">nrow</span>(obs),</span>
-<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a>                              <span class="at">class_label =</span> <span class="st">"background"</span>,</span>
-<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a>                              <span class="at">method =</span> <span class="fu">c</span>(<span class="st">"dist_max"</span>, <span class="dv">30000</span>),</span>
-<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a>                              <span class="at">return_pres =</span> <span class="cn">TRUE</span>)</span>
-<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a><span class="fu">count</span>(conservative_input, class)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+<div class="sourceCode cell-code" id="cb14"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a>samp_weight <span class="ot">=</span> <span class="fu">rasterize_point_density</span>(obs, mask)</span>
+<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a><span class="fu">plot</span>(samp_weight, <span class="at">axes =</span> <span class="cn">TRUE</span>, <span class="at">breaks =</span> <span class="st">"equal"</span>, <span class="at">col =</span> <span class="fu">rev</span>(<span class="fu">hcl.colors</span>(<span class="dv">10</span>)), <span class="at">reset =</span> <span class="cn">FALSE</span>)</span>
+<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a><span class="fu">plot</span>(coast, <span class="at">col =</span> <span class="st">"orange"</span>, <span class="at">lwd =</span> <span class="dv">2</span>, <span class="at">add =</span> <span class="cn">TRUE</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+<div class="cell-output-display">
+<div>
+<figure class="figure">
+<p><img src="C02_background_files/figure-html/sample_weight-1.png" class="img-fluid figure-img" width="672"></p>
+</figure>
+</div>
+</div>
+</div>
+<p>Now let’s take a look at the background, but this time we’ll try to match the count of presences.</p>
+<div class="cell">
+<div class="sourceCode cell-code" id="cb15"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a>conservative_input <span class="ot">=</span> <span class="fu">sample_background</span>(thinned_obs, samp_weight, </span>
+<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a>                              <span class="at">n =</span> <span class="dv">2</span> <span class="sc">*</span> <span class="fu">nrow</span>(obs),</span>
+<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a>                              <span class="at">class_label =</span> <span class="st">"background"</span>,</span>
+<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a>                              <span class="at">method =</span> <span class="st">"bias"</span>,</span>
+<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a>                              <span class="at">return_pres =</span> <span class="cn">TRUE</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+<div class="cell-output cell-output-stderr">
+<pre><code>Warning in sample_background(thinned_obs, samp_weight, n = 2 * nrow(obs), : There are fewer available cells for raster 'NA' (1204 presences) than the requested 4918 background points. Only 1204 will be returned.</code></pre>
+</div>
+<div class="sourceCode cell-code" id="cb17"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="fu">count</span>(conservative_input, class)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output cell-output-stdout">
 <pre><code>Simple feature collection with 2 features and 2 fields
 Geometry type: MULTIPOINT
 Dimension:     XY
-Bounding box:  xmin: -74.89169 ymin: 38.8679 xmax: -65.02004 ymax: 45.21401
+Bounding box:  xmin: -74.5867 ymin: 38.868 xmax: -65.02004 ymax: 45.1333
 Geodetic CRS:  WGS 84
 # A tibble: 2 × 3
   class          n                                                      geometry
 * &lt;fct&gt;      &lt;int&gt;                                              &lt;MULTIPOINT [°]&gt;
-1 presence    1204 ((-65.07 42.68), (-65.05 42.6), (-65.067 42.617), (-65.16 42…
-2 background  2408 ((-65.02004 42.25251), (-65.02004 42.74609), (-65.1023 42.66…</code></pre>
+1 presence    1204 ((-65.067 42.65), (-65.05 42.6), (-65.067 42.617), (-65.16 4…
+2 background  1204 ((-65.1023 42.66383), (-65.02004 42.58157), (-65.1023 42.581…</code></pre>
 </div>
 </div>
+<p>Whoa - that’s many fewer background points.</p>
 <div class="cell">
-<div class="sourceCode cell-code" id="cb16"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="fu">plot</span>(conservative_input[<span class="st">'class'</span>], </span>
-<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>     <span class="at">axes =</span> <span class="cn">TRUE</span>,  </span>
-<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a>     <span class="at">pch =</span> <span class="st">"."</span>, </span>
-<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a>     <span class="at">extent =</span> mask, </span>
-<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a>     <span class="at">main =</span> <span class="st">"August conservative class distribution"</span>,</span>
-<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a>     <span class="at">reset =</span> <span class="cn">FALSE</span>)</span>
-<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a><span class="fu">plot</span>(coast, <span class="at">col =</span> <span class="st">"orange"</span>, <span class="at">add =</span> <span class="cn">TRUE</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+<div class="sourceCode cell-code" id="cb19"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="fu">plot</span>(conservative_input[<span class="st">'class'</span>], </span>
+<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a>     <span class="at">axes =</span> <span class="cn">TRUE</span>,  </span>
+<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a>     <span class="at">pch =</span> <span class="st">"."</span>, </span>
+<span id="cb19-4"><a href="#cb19-4" aria-hidden="true" tabindex="-1"></a>     <span class="at">extent =</span> mask, </span>
+<span id="cb19-5"><a href="#cb19-5" aria-hidden="true" tabindex="-1"></a>     <span class="at">main =</span> <span class="st">"August conservative class distribution"</span>,</span>
+<span id="cb19-6"><a href="#cb19-6" aria-hidden="true" tabindex="-1"></a>     <span class="at">reset =</span> <span class="cn">FALSE</span>)</span>
+<span id="cb19-7"><a href="#cb19-7" aria-hidden="true" tabindex="-1"></a><span class="fu">plot</span>(coast, <span class="at">col =</span> <span class="st">"orange"</span>, <span class="at">add =</span> <span class="cn">TRUE</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output-display">
 <div>
 <figure class="figure">
@@ -410,6 +438,8 @@ <h1 data-number="3"><span class="header-section-number">3</span> The conservativ
 </div>
 </div>
 </div>
+<p>It appears that background points are essentially shadowing the thinned presence points.</p>
+</section>
 </section>
 <section id="greedy-or-conservative" class="level1" data-number="4">
 <h1 data-number="4"><span class="header-section-number">4</span> Greedy or Conservative?</h1>
@@ -445,73 +475,76 @@ <h2 data-number="5.1" class="anchored" data-anchor-id="a-function-we-can-reuse">
 <p>Phew! That’s a lot of steps. To manually run those steps 12 times would be tedious, so we roll that into a function that we can reuse 12 times instead.</p>
 <p>This function will have a name, <code>make_model_input_by_month</code>. It’s a long name, but it makes it obvious what it does. First we start with the documentation.</p>
 <div class="cell">
-<div class="sourceCode cell-code" id="cb18"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="co">#' Builds greedy and conservative model input data sets for a given month</span></span>
-<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a><span class="co">#' </span></span>
-<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param mon chr the month abbreviation for the month of interest ("Jan" by default)</span></span>
-<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param obs table, the complete observation data set</span></span>
-<span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param raster stars, the object that defines the sampling space, usually a mask</span></span>
-<span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param species chr, the name of the species prepended to the name of the output files.</span></span>
-<span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"></a><span class="co">#'   (By default "Mola mola" which gets converted to "Mola_mola")</span></span>
-<span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param path the output data path to store this data (be default "model_input")</span></span>
-<span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param min_obs num this sets a threshold below which we wont try to make a model. (Default is 3)</span></span>
-<span id="cb18-10"><a href="#cb18-10" aria-hidden="true" tabindex="-1"></a><span class="co">#' @return a named two element list of greedy and conservative model inputs - they are tables</span></span>
-<span id="cb18-11"><a href="#cb18-11" aria-hidden="true" tabindex="-1"></a>make_model_input_by_month  <span class="ot">=</span> <span class="cf">function</span>(<span class="at">mon =</span> <span class="st">"Jan"</span>,</span>
-<span id="cb18-12"><a href="#cb18-12" aria-hidden="true" tabindex="-1"></a>                                      <span class="at">obs =</span> <span class="fu">read_observations</span>(<span class="st">"Mola mola"</span>),</span>
-<span id="cb18-13"><a href="#cb18-13" aria-hidden="true" tabindex="-1"></a>                                      <span class="at">raster =</span> <span class="cn">NULL</span>,</span>
-<span id="cb18-14"><a href="#cb18-14" aria-hidden="true" tabindex="-1"></a>                                      <span class="at">species =</span> <span class="st">"Mola mola"</span>,</span>
-<span id="cb18-15"><a href="#cb18-15" aria-hidden="true" tabindex="-1"></a>                                      <span class="at">path =</span> <span class="fu">data_path</span>(<span class="st">"model_input"</span>),</span>
-<span id="cb18-16"><a href="#cb18-16" aria-hidden="true" tabindex="-1"></a>                                      <span class="at">min_obs =</span> <span class="dv">3</span>){</span>
-<span id="cb18-17"><a href="#cb18-17" aria-hidden="true" tabindex="-1"></a>  <span class="co"># the user *must* provide a raster</span></span>
-<span id="cb18-18"><a href="#cb18-18" aria-hidden="true" tabindex="-1"></a>  <span class="cf">if</span> (<span class="fu">is.null</span>(raster)) <span class="fu">stop</span>(<span class="st">"please provide a raster"</span>)</span>
-<span id="cb18-19"><a href="#cb18-19" aria-hidden="true" tabindex="-1"></a>  <span class="co"># filter the obs</span></span>
-<span id="cb18-20"><a href="#cb18-20" aria-hidden="true" tabindex="-1"></a>  obs <span class="ot">=</span> obs <span class="sc">|&gt;</span></span>
-<span id="cb18-21"><a href="#cb18-21" aria-hidden="true" tabindex="-1"></a>    <span class="fu">filter</span>(month <span class="sc">==</span> mon[<span class="dv">1</span>])</span>
-<span id="cb18-22"><a href="#cb18-22" aria-hidden="true" tabindex="-1"></a>  </span>
-<span id="cb18-23"><a href="#cb18-23" aria-hidden="true" tabindex="-1"></a>  <span class="co"># check that we have at least some records, if not enough then alert the user</span></span>
-<span id="cb18-24"><a href="#cb18-24" aria-hidden="true" tabindex="-1"></a>  <span class="co"># and return NULL</span></span>
-<span id="cb18-25"><a href="#cb18-25" aria-hidden="true" tabindex="-1"></a>  <span class="cf">if</span> (<span class="fu">nrow</span>(obs) <span class="sc">&lt;</span> min_obs){</span>
-<span id="cb18-26"><a href="#cb18-26" aria-hidden="true" tabindex="-1"></a>    <span class="fu">warning</span>(<span class="st">"sorry, this month has too few records: "</span>, mon)</span>
-<span id="cb18-27"><a href="#cb18-27" aria-hidden="true" tabindex="-1"></a>    <span class="fu">return</span>(<span class="cn">NULL</span>)</span>
-<span id="cb18-28"><a href="#cb18-28" aria-hidden="true" tabindex="-1"></a>  }</span>
-<span id="cb18-29"><a href="#cb18-29" aria-hidden="true" tabindex="-1"></a>  </span>
-<span id="cb18-30"><a href="#cb18-30" aria-hidden="true" tabindex="-1"></a>  <span class="co"># make sure the output path exists, if not, make it</span></span>
-<span id="cb18-31"><a href="#cb18-31" aria-hidden="true" tabindex="-1"></a>  <span class="fu">make_path</span>(path)</span>
-<span id="cb18-32"><a href="#cb18-32" aria-hidden="true" tabindex="-1"></a>  </span>
-<span id="cb18-33"><a href="#cb18-33" aria-hidden="true" tabindex="-1"></a>  </span>
-<span id="cb18-34"><a href="#cb18-34" aria-hidden="true" tabindex="-1"></a>  <span class="co"># make the greedy model input by sampling the background</span></span>
-<span id="cb18-35"><a href="#cb18-35" aria-hidden="true" tabindex="-1"></a>  greedy_input <span class="ot">=</span> <span class="fu">sample_background</span>(obs, raster,</span>
-<span id="cb18-36"><a href="#cb18-36" aria-hidden="true" tabindex="-1"></a>                                   <span class="at">n =</span> <span class="dv">2</span> <span class="sc">*</span> <span class="fu">nrow</span>(obs),</span>
-<span id="cb18-37"><a href="#cb18-37" aria-hidden="true" tabindex="-1"></a>                                   <span class="at">class_label =</span> <span class="st">"background"</span>,</span>
-<span id="cb18-38"><a href="#cb18-38" aria-hidden="true" tabindex="-1"></a>                                   <span class="at">method =</span> <span class="fu">c</span>(<span class="st">"dist_max"</span>, <span class="dv">30000</span>),</span>
-<span id="cb18-39"><a href="#cb18-39" aria-hidden="true" tabindex="-1"></a>                                   <span class="at">return_pres =</span> <span class="cn">TRUE</span>)</span>
-<span id="cb18-40"><a href="#cb18-40" aria-hidden="true" tabindex="-1"></a>  <span class="co"># save the greedy data</span></span>
-<span id="cb18-41"><a href="#cb18-41" aria-hidden="true" tabindex="-1"></a>  filename <span class="ot">=</span> <span class="fu">sprintf</span>(<span class="st">"%s-%s-greedy_input.gpkg"</span>, </span>
-<span id="cb18-42"><a href="#cb18-42" aria-hidden="true" tabindex="-1"></a>                     <span class="fu">gsub</span>(<span class="st">" "</span>, <span class="st">"_"</span>, species),</span>
-<span id="cb18-43"><a href="#cb18-43" aria-hidden="true" tabindex="-1"></a>                     mon)</span>
-<span id="cb18-44"><a href="#cb18-44" aria-hidden="true" tabindex="-1"></a>  <span class="fu">write_sf</span>(greedy_input, <span class="fu">file.path</span>(path, filename))</span>
-<span id="cb18-45"><a href="#cb18-45" aria-hidden="true" tabindex="-1"></a>  </span>
-<span id="cb18-46"><a href="#cb18-46" aria-hidden="true" tabindex="-1"></a>  <span class="co"># thin the obs</span></span>
-<span id="cb18-47"><a href="#cb18-47" aria-hidden="true" tabindex="-1"></a>  obs <span class="ot">=</span> <span class="fu">thin_by_cell</span>(obs, raster)</span>
-<span id="cb18-48"><a href="#cb18-48" aria-hidden="true" tabindex="-1"></a>  </span>
-<span id="cb18-49"><a href="#cb18-49" aria-hidden="true" tabindex="-1"></a>  <span class="co"># make the conservative model</span></span>
-<span id="cb18-50"><a href="#cb18-50" aria-hidden="true" tabindex="-1"></a>  conservative_input <span class="ot">=</span> <span class="fu">sample_background</span>(obs, raster,</span>
-<span id="cb18-51"><a href="#cb18-51" aria-hidden="true" tabindex="-1"></a>                                   <span class="at">n =</span> <span class="dv">2</span> <span class="sc">*</span> <span class="fu">nrow</span>(obs),</span>
-<span id="cb18-52"><a href="#cb18-52" aria-hidden="true" tabindex="-1"></a>                                   <span class="at">class_label =</span> <span class="st">"background"</span>,</span>
-<span id="cb18-53"><a href="#cb18-53" aria-hidden="true" tabindex="-1"></a>                                   <span class="at">method =</span> <span class="fu">c</span>(<span class="st">"dist_max"</span>, <span class="dv">30000</span>),</span>
-<span id="cb18-54"><a href="#cb18-54" aria-hidden="true" tabindex="-1"></a>                                   <span class="at">return_pres =</span> <span class="cn">TRUE</span>)</span>
-<span id="cb18-55"><a href="#cb18-55" aria-hidden="true" tabindex="-1"></a>  </span>
-<span id="cb18-56"><a href="#cb18-56" aria-hidden="true" tabindex="-1"></a>  <span class="co"># save the conservative data</span></span>
-<span id="cb18-57"><a href="#cb18-57" aria-hidden="true" tabindex="-1"></a>  filename <span class="ot">=</span> <span class="fu">sprintf</span>(<span class="st">"%s-%s-conservative_input.gpkg"</span>, </span>
-<span id="cb18-58"><a href="#cb18-58" aria-hidden="true" tabindex="-1"></a>                     <span class="fu">gsub</span>(<span class="st">" "</span>, <span class="st">"_"</span>, species),</span>
-<span id="cb18-59"><a href="#cb18-59" aria-hidden="true" tabindex="-1"></a>                     mon)</span>
-<span id="cb18-60"><a href="#cb18-60" aria-hidden="true" tabindex="-1"></a>  <span class="fu">write_sf</span>(conservative_input, <span class="fu">file.path</span>(path,filename))</span>
-<span id="cb18-61"><a href="#cb18-61" aria-hidden="true" tabindex="-1"></a>  </span>
-<span id="cb18-62"><a href="#cb18-62" aria-hidden="true" tabindex="-1"></a>  <span class="co"># make a list</span></span>
-<span id="cb18-63"><a href="#cb18-63" aria-hidden="true" tabindex="-1"></a>  r <span class="ot">=</span> <span class="fu">list</span>(<span class="at">greedy =</span> greedy_input, <span class="at">conservative =</span> conservative_input)</span>
-<span id="cb18-64"><a href="#cb18-64" aria-hidden="true" tabindex="-1"></a>  </span>
-<span id="cb18-65"><a href="#cb18-65" aria-hidden="true" tabindex="-1"></a>  <span class="co"># return, but disable automatic printing</span></span>
-<span id="cb18-66"><a href="#cb18-66" aria-hidden="true" tabindex="-1"></a>  <span class="fu">invisible</span>(r)</span>
-<span id="cb18-67"><a href="#cb18-67" aria-hidden="true" tabindex="-1"></a>}</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+<div class="sourceCode cell-code" id="cb21"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="co">#' Builds greedy and conservative model input data sets for a given month</span></span>
+<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a><span class="co">#' </span></span>
+<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param mon chr the month abbreviation for the month of interest ("Jan" by default)</span></span>
+<span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param obs table, the complete observation data set</span></span>
+<span id="cb21-5"><a href="#cb21-5" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param raster stars, the object that defines the sampling space, usually a mask</span></span>
+<span id="cb21-6"><a href="#cb21-6" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param species chr, the name of the species prepended to the name of the output files.</span></span>
+<span id="cb21-7"><a href="#cb21-7" aria-hidden="true" tabindex="-1"></a><span class="co">#'   (By default "Mola mola" which gets converted to "Mola_mola")</span></span>
+<span id="cb21-8"><a href="#cb21-8" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param path the output data path to store this data (by default "model_input")</span></span>
+<span id="cb21-9"><a href="#cb21-9" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param min_obs num this sets a threshold below which we won't try to make a model. (Default is 3)</span></span>
+<span id="cb21-10"><a href="#cb21-10" aria-hidden="true" tabindex="-1"></a><span class="co">#' @return a named two element list of greedy and conservative model inputs - they are tables</span></span>
+<span id="cb21-11"><a href="#cb21-11" aria-hidden="true" tabindex="-1"></a>make_model_input_by_month  <span class="ot">=</span> <span class="cf">function</span>(<span class="at">mon =</span> <span class="st">"Jan"</span>,</span>
+<span id="cb21-12"><a href="#cb21-12" aria-hidden="true" tabindex="-1"></a>                                      <span class="at">obs =</span> <span class="fu">read_observations</span>(<span class="st">"Mola mola"</span>),</span>
+<span id="cb21-13"><a href="#cb21-13" aria-hidden="true" tabindex="-1"></a>                                      <span class="at">raster =</span> <span class="cn">NULL</span>,</span>
+<span id="cb21-14"><a href="#cb21-14" aria-hidden="true" tabindex="-1"></a>                                      <span class="at">species =</span> <span class="st">"Mola mola"</span>,</span>
+<span id="cb21-15"><a href="#cb21-15" aria-hidden="true" tabindex="-1"></a>                                      <span class="at">path =</span> <span class="fu">data_path</span>(<span class="st">"model_input"</span>),</span>
+<span id="cb21-16"><a href="#cb21-16" aria-hidden="true" tabindex="-1"></a>                                      <span class="at">min_obs =</span> <span class="dv">3</span>){</span>
+<span id="cb21-17"><a href="#cb21-17" aria-hidden="true" tabindex="-1"></a>  <span class="co"># the user *must* provide a raster</span></span>
+<span id="cb21-18"><a href="#cb21-18" aria-hidden="true" tabindex="-1"></a>  <span class="cf">if</span> (<span class="fu">is.null</span>(raster)) <span class="fu">stop</span>(<span class="st">"please provide a raster"</span>)</span>
+<span id="cb21-19"><a href="#cb21-19" aria-hidden="true" tabindex="-1"></a>  <span class="co"># filter the obs</span></span>
+<span id="cb21-20"><a href="#cb21-20" aria-hidden="true" tabindex="-1"></a>  obs <span class="ot">=</span> obs <span class="sc">|&gt;</span></span>
+<span id="cb21-21"><a href="#cb21-21" aria-hidden="true" tabindex="-1"></a>    <span class="fu">filter</span>(month <span class="sc">==</span> mon[<span class="dv">1</span>])</span>
+<span id="cb21-22"><a href="#cb21-22" aria-hidden="true" tabindex="-1"></a>  </span>
+<span id="cb21-23"><a href="#cb21-23" aria-hidden="true" tabindex="-1"></a>  <span class="co"># check that we have at least some records, if not enough then alert the user</span></span>
+<span id="cb21-24"><a href="#cb21-24" aria-hidden="true" tabindex="-1"></a>  <span class="co"># and return NULL</span></span>
+<span id="cb21-25"><a href="#cb21-25" aria-hidden="true" tabindex="-1"></a>  <span class="cf">if</span> (<span class="fu">nrow</span>(obs) <span class="sc">&lt;</span> min_obs){</span>
+<span id="cb21-26"><a href="#cb21-26" aria-hidden="true" tabindex="-1"></a>    <span class="fu">warning</span>(<span class="st">"sorry, this month has too few records: "</span>, mon)</span>
+<span id="cb21-27"><a href="#cb21-27" aria-hidden="true" tabindex="-1"></a>    <span class="fu">return</span>(<span class="cn">NULL</span>)</span>
+<span id="cb21-28"><a href="#cb21-28" aria-hidden="true" tabindex="-1"></a>  }</span>
+<span id="cb21-29"><a href="#cb21-29" aria-hidden="true" tabindex="-1"></a>  </span>
+<span id="cb21-30"><a href="#cb21-30" aria-hidden="true" tabindex="-1"></a>  <span class="co"># make sure the output path exists, if not, make it</span></span>
+<span id="cb21-31"><a href="#cb21-31" aria-hidden="true" tabindex="-1"></a>  <span class="fu">make_path</span>(path)</span>
+<span id="cb21-32"><a href="#cb21-32" aria-hidden="true" tabindex="-1"></a>  </span>
+<span id="cb21-33"><a href="#cb21-33" aria-hidden="true" tabindex="-1"></a>  </span>
+<span id="cb21-34"><a href="#cb21-34" aria-hidden="true" tabindex="-1"></a>  <span class="co"># make the greedy model input by sampling the background</span></span>
+<span id="cb21-35"><a href="#cb21-35" aria-hidden="true" tabindex="-1"></a>  greedy_input <span class="ot">=</span> <span class="fu">sample_background</span>(obs, raster,</span>
+<span id="cb21-36"><a href="#cb21-36" aria-hidden="true" tabindex="-1"></a>                                   <span class="at">n =</span> <span class="dv">2</span> <span class="sc">*</span> <span class="fu">nrow</span>(obs),</span>
+<span id="cb21-37"><a href="#cb21-37" aria-hidden="true" tabindex="-1"></a>                                   <span class="at">class_label =</span> <span class="st">"background"</span>,</span>
+<span id="cb21-38"><a href="#cb21-38" aria-hidden="true" tabindex="-1"></a>                                   <span class="at">method =</span> <span class="fu">c</span>(<span class="st">"dist_max"</span>, <span class="dv">30000</span>),</span>
+<span id="cb21-39"><a href="#cb21-39" aria-hidden="true" tabindex="-1"></a>                                   <span class="at">return_pres =</span> <span class="cn">TRUE</span>)</span>
+<span id="cb21-40"><a href="#cb21-40" aria-hidden="true" tabindex="-1"></a>  <span class="co"># save the greedy data</span></span>
+<span id="cb21-41"><a href="#cb21-41" aria-hidden="true" tabindex="-1"></a>  filename <span class="ot">=</span> <span class="fu">sprintf</span>(<span class="st">"%s-%s-greedy_input.gpkg"</span>, </span>
+<span id="cb21-42"><a href="#cb21-42" aria-hidden="true" tabindex="-1"></a>                     <span class="fu">gsub</span>(<span class="st">" "</span>, <span class="st">"_"</span>, species),</span>
+<span id="cb21-43"><a href="#cb21-43" aria-hidden="true" tabindex="-1"></a>                     mon)</span>
+<span id="cb21-44"><a href="#cb21-44" aria-hidden="true" tabindex="-1"></a>  <span class="fu">write_sf</span>(greedy_input, <span class="fu">file.path</span>(path, filename))</span>
+<span id="cb21-45"><a href="#cb21-45" aria-hidden="true" tabindex="-1"></a>  </span>
+<span id="cb21-46"><a href="#cb21-46" aria-hidden="true" tabindex="-1"></a>  <span class="co"># thin the obs</span></span>
+<span id="cb21-47"><a href="#cb21-47" aria-hidden="true" tabindex="-1"></a>  thinned_obs <span class="ot">=</span> <span class="fu">thin_by_cell</span>(obs, raster)</span>
+<span id="cb21-48"><a href="#cb21-48" aria-hidden="true" tabindex="-1"></a>  </span>
+<span id="cb21-49"><a href="#cb21-49" aria-hidden="true" tabindex="-1"></a>  <span class="co"># sampling weight</span></span>
+<span id="cb21-50"><a href="#cb21-50" aria-hidden="true" tabindex="-1"></a>  samp_weight <span class="ot">=</span> <span class="fu">rasterize_point_density</span>(obs, raster)</span>
+<span id="cb21-51"><a href="#cb21-51" aria-hidden="true" tabindex="-1"></a>  </span>
+<span id="cb21-52"><a href="#cb21-52" aria-hidden="true" tabindex="-1"></a>  <span class="co"># make the conservative model</span></span>
+<span id="cb21-53"><a href="#cb21-53" aria-hidden="true" tabindex="-1"></a>  conservative_input <span class="ot">=</span> <span class="fu">sample_background</span>(thinned_obs, samp_weight,</span>
+<span id="cb21-54"><a href="#cb21-54" aria-hidden="true" tabindex="-1"></a>                                   <span class="at">n =</span> <span class="dv">2</span> <span class="sc">*</span> <span class="fu">nrow</span>(obs),</span>
+<span id="cb21-55"><a href="#cb21-55" aria-hidden="true" tabindex="-1"></a>                                   <span class="at">class_label =</span> <span class="st">"background"</span>,</span>
+<span id="cb21-56"><a href="#cb21-56" aria-hidden="true" tabindex="-1"></a>                                   <span class="at">method =</span> <span class="st">"bias"</span>,</span>
+<span id="cb21-57"><a href="#cb21-57" aria-hidden="true" tabindex="-1"></a>                                   <span class="at">return_pres =</span> <span class="cn">TRUE</span>)</span>
+<span id="cb21-58"><a href="#cb21-58" aria-hidden="true" tabindex="-1"></a>  </span>
+<span id="cb21-59"><a href="#cb21-59" aria-hidden="true" tabindex="-1"></a>  <span class="co"># save the conservative data</span></span>
+<span id="cb21-60"><a href="#cb21-60" aria-hidden="true" tabindex="-1"></a>  filename <span class="ot">=</span> <span class="fu">sprintf</span>(<span class="st">"%s-%s-conservative_input.gpkg"</span>, </span>
+<span id="cb21-61"><a href="#cb21-61" aria-hidden="true" tabindex="-1"></a>                     <span class="fu">gsub</span>(<span class="st">" "</span>, <span class="st">"_"</span>, species),</span>
+<span id="cb21-62"><a href="#cb21-62" aria-hidden="true" tabindex="-1"></a>                     mon)</span>
+<span id="cb21-63"><a href="#cb21-63" aria-hidden="true" tabindex="-1"></a>  <span class="fu">write_sf</span>(conservative_input, <span class="fu">file.path</span>(path,filename))</span>
+<span id="cb21-64"><a href="#cb21-64" aria-hidden="true" tabindex="-1"></a>  </span>
+<span id="cb21-65"><a href="#cb21-65" aria-hidden="true" tabindex="-1"></a>  <span class="co"># make a list</span></span>
+<span id="cb21-66"><a href="#cb21-66" aria-hidden="true" tabindex="-1"></a>  r <span class="ot">=</span> <span class="fu">list</span>(<span class="at">greedy =</span> greedy_input, <span class="at">conservative =</span> conservative_input)</span>
+<span id="cb21-67"><a href="#cb21-67" aria-hidden="true" tabindex="-1"></a>  </span>
+<span id="cb21-68"><a href="#cb21-68" aria-hidden="true" tabindex="-1"></a>  <span class="co"># return, but disable automatic printing</span></span>
+<span id="cb21-69"><a href="#cb21-69" aria-hidden="true" tabindex="-1"></a>  <span class="fu">invisible</span>(r)</span>
+<span id="cb21-70"><a href="#cb21-70" aria-hidden="true" tabindex="-1"></a>}</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 </div>
 </section>
 </section>
@@ -519,29 +552,65 @@ <h2 data-number="5.1" class="anchored" data-anchor-id="a-function-we-can-reuse">
 <h1 data-number="6"><span class="header-section-number">6</span> Reusing the function in a loop</h1>
 <p>More phew! But that is it! Now we use a for loop to run through the months, calling our function each time. Happily, the built-in variable <code>month.abb</code> has all of the month names in order.</p>
 <div class="cell">
-<div class="sourceCode cell-code" id="cb19"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> (this_month <span class="cf">in</span> month.abb){</span>
-<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a>  result <span class="ot">=</span> <span class="fu">make_model_input_by_month</span>(this_month,</span>
-<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a>                                     <span class="at">obs =</span> <span class="fu">read_observations</span>(<span class="at">scientificname =</span> <span class="st">"Mola mola"</span>),</span>
-<span id="cb19-4"><a href="#cb19-4" aria-hidden="true" tabindex="-1"></a>                                     <span class="at">raster =</span> mask,</span>
-<span id="cb19-5"><a href="#cb19-5" aria-hidden="true" tabindex="-1"></a>                                     <span class="at">species =</span> <span class="st">"Mola mola"</span>,</span>
-<span id="cb19-6"><a href="#cb19-6" aria-hidden="true" tabindex="-1"></a>                                     <span class="at">path =</span> <span class="fu">data_path</span>(<span class="st">"model_input"</span>),</span>
-<span id="cb19-7"><a href="#cb19-7" aria-hidden="true" tabindex="-1"></a>                                     <span class="at">min_obs =</span> <span class="dv">3</span>)</span>
-<span id="cb19-8"><a href="#cb19-8" aria-hidden="true" tabindex="-1"></a>}</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+<div class="sourceCode cell-code" id="cb22"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> (this_month <span class="cf">in</span> month.abb){</span>
+<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a>  result <span class="ot">=</span> <span class="fu">make_model_input_by_month</span>(this_month,</span>
+<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a>                                     <span class="at">obs =</span> <span class="fu">read_observations</span>(<span class="at">scientificname =</span> <span class="st">"Mola mola"</span>),</span>
+<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a>                                     <span class="at">raster =</span> mask,</span>
+<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a>                                     <span class="at">species =</span> <span class="st">"Mola mola"</span>,</span>
+<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a>                                     <span class="at">path =</span> <span class="fu">data_path</span>(<span class="st">"model_input"</span>),</span>
+<span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a>                                     <span class="at">min_obs =</span> <span class="dv">3</span>)</span>
+<span id="cb22-8"><a href="#cb22-8" aria-hidden="true" tabindex="-1"></a>}</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+<div class="cell-output cell-output-stderr">
+<pre><code>Warning in sample_background(thinned_obs, samp_weight, n = 2 * nrow(obs), : There are fewer available cells for raster 'NA' (4 presences) than the requested 8 background points. Only 4 will be returned.</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>Warning in sample_background(thinned_obs, samp_weight, n = 2 * nrow(obs), : There are fewer available cells for raster 'NA' (16 presences) than the requested 34 background points. Only 16 will be returned.</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>Warning in sample_background(thinned_obs, samp_weight, n = 2 * nrow(obs), : There are fewer available cells for raster 'NA' (13 presences) than the requested 28 background points. Only 13 will be returned.</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>Warning in sample_background(thinned_obs, samp_weight, n = 2 * nrow(obs), : There are fewer available cells for raster 'NA' (148 presences) than the requested 480 background points. Only 148 will be returned.</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>Warning in sample_background(thinned_obs, samp_weight, n = 2 * nrow(obs), : There are fewer available cells for raster 'NA' (364 presences) than the requested 1286 background points. Only 364 will be returned.</code></pre>
+</div>
 <div class="cell-output cell-output-stderr">
 <pre><code>Warning in sample_background(obs, raster, n = 2 * nrow(obs), class_label = "background", : There are fewer available cells for raster 'NA' (2325 presences) than the requested 4650 background points. Only 3882 will be returned.</code></pre>
 </div>
 <div class="cell-output cell-output-stderr">
+<pre><code>Warning in sample_background(thinned_obs, samp_weight, n = 2 * nrow(obs), : There are fewer available cells for raster 'NA' (996 presences) than the requested 4650 background points. Only 996 will be returned.</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>Warning in sample_background(thinned_obs, samp_weight, n = 2 * nrow(obs), : There are fewer available cells for raster 'NA' (968 presences) than the requested 4168 background points. Only 968 will be returned.</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
 <pre><code>Warning in sample_background(obs, raster, n = 2 * nrow(obs), class_label = "background", : There are fewer available cells for raster 'NA' (2459 presences) than the requested 4918 background points. Only 4818 will be returned.</code></pre>
 </div>
+<div class="cell-output cell-output-stderr">
+<pre><code>Warning in sample_background(thinned_obs, samp_weight, n = 2 * nrow(obs), : There are fewer available cells for raster 'NA' (1204 presences) than the requested 4918 background points. Only 1204 will be returned.</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>Warning in sample_background(thinned_obs, samp_weight, n = 2 * nrow(obs), : There are fewer available cells for raster 'NA' (541 presences) than the requested 1844 background points. Only 542 will be returned.</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>Warning in sample_background(thinned_obs, samp_weight, n = 2 * nrow(obs), : There are fewer available cells for raster 'NA' (311 presences) than the requested 832 background points. Only 312 will be returned.</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>Warning in sample_background(thinned_obs, samp_weight, n = 2 * nrow(obs), : There are fewer available cells for raster 'NA' (210 presences) than the requested 676 background points. Only 210 will be returned.</code></pre>
+</div>
+<div class="cell-output cell-output-stderr">
+<pre><code>Warning in sample_background(thinned_obs, samp_weight, n = 2 * nrow(obs), : There are fewer available cells for raster 'NA' (20 presences) than the requested 112 background points. Only 20 will be returned.</code></pre>
+</div>
 </div>
 </section>
 <section id="listing-the-output-files" class="level1" data-number="7">
 <h1 data-number="7"><span class="header-section-number">7</span> Listing the output files</h1>
 <p>You can always look in your output directory to see whether the files were made, but it may be even better to have the computer list them for you. If your species is found in sufficient numbers year-round, you’ll have 24 files: 12 months x 2 approaches (greedy vs. conservative).</p>
 <div class="cell">
-<div class="sourceCode cell-code" id="cb22"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a>path <span class="ot">=</span> <span class="fu">data_path</span>(<span class="st">"model_input"</span>)</span>
-<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a>files <span class="ot">=</span> <span class="fu">list.files</span>(path, <span class="at">full.names =</span> <span class="cn">TRUE</span>)</span>
-<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a>files</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+<div class="sourceCode cell-code" id="cb37"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb37-1"><a href="#cb37-1" aria-hidden="true" tabindex="-1"></a>path <span class="ot">=</span> <span class="fu">data_path</span>(<span class="st">"model_input"</span>)</span>
+<span id="cb37-2"><a href="#cb37-2" aria-hidden="true" tabindex="-1"></a>files <span class="ot">=</span> <span class="fu">list.files</span>(path, <span class="at">full.names =</span> <span class="cn">TRUE</span>)</span>
+<span id="cb37-3"><a href="#cb37-3" aria-hidden="true" tabindex="-1"></a>files</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output cell-output-stdout">
 <pre><code> [1] "/Users/ben/Dropbox/code/projects/ColbyForecasting_data/2025/ben/model_input/Mola_mola-Apr-conservative_input.gpkg"
  [2] "/Users/ben/Dropbox/code/projects/ColbyForecasting_data/2025/ben/model_input/Mola_mola-Apr-greedy_input.gpkg"      
@@ -574,15 +643,15 @@ <h1 data-number="7"><span class="header-section-number">7</span> Listing the out
 <h1 data-number="8"><span class="header-section-number">8</span> Reading the files</h1>
 <p>We know that each file should have a table with spatial information included. Let’s read one back and plot it.</p>
 <div class="cell">
-<div class="sourceCode cell-code" id="cb24"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1"><a href="#cb24-1" aria-hidden="true" tabindex="-1"></a>x <span class="ot">=</span> <span class="fu">read_sf</span>(files[<span class="dv">1</span>])</span>
-<span id="cb24-2"><a href="#cb24-2" aria-hidden="true" tabindex="-1"></a>filename <span class="ot">=</span> <span class="fu">basename</span>(files[<span class="dv">1</span>])</span>
-<span id="cb24-3"><a href="#cb24-3" aria-hidden="true" tabindex="-1"></a><span class="fu">plot</span>(x[<span class="st">'class'</span>], </span>
-<span id="cb24-4"><a href="#cb24-4" aria-hidden="true" tabindex="-1"></a>     <span class="at">axes =</span> <span class="cn">TRUE</span>,  </span>
-<span id="cb24-5"><a href="#cb24-5" aria-hidden="true" tabindex="-1"></a>     <span class="at">pch =</span> <span class="st">"+"</span>, </span>
-<span id="cb24-6"><a href="#cb24-6" aria-hidden="true" tabindex="-1"></a>     <span class="at">extent =</span> mask, </span>
-<span id="cb24-7"><a href="#cb24-7" aria-hidden="true" tabindex="-1"></a>     <span class="at">main =</span> filename,</span>
-<span id="cb24-8"><a href="#cb24-8" aria-hidden="true" tabindex="-1"></a>     <span class="at">reset =</span> <span class="cn">FALSE</span>)</span>
-<span id="cb24-9"><a href="#cb24-9" aria-hidden="true" tabindex="-1"></a><span class="fu">plot</span>(coast, <span class="at">col =</span> <span class="st">"orange"</span>, <span class="at">add =</span> <span class="cn">TRUE</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+<div class="sourceCode cell-code" id="cb39"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb39-1"><a href="#cb39-1" aria-hidden="true" tabindex="-1"></a>x <span class="ot">=</span> <span class="fu">read_sf</span>(files[<span class="dv">1</span>])</span>
+<span id="cb39-2"><a href="#cb39-2" aria-hidden="true" tabindex="-1"></a>filename <span class="ot">=</span> <span class="fu">basename</span>(files[<span class="dv">1</span>])</span>
+<span id="cb39-3"><a href="#cb39-3" aria-hidden="true" tabindex="-1"></a><span class="fu">plot</span>(x[<span class="st">'class'</span>], </span>
+<span id="cb39-4"><a href="#cb39-4" aria-hidden="true" tabindex="-1"></a>     <span class="at">axes =</span> <span class="cn">TRUE</span>,  </span>
+<span id="cb39-5"><a href="#cb39-5" aria-hidden="true" tabindex="-1"></a>     <span class="at">pch =</span> <span class="st">"+"</span>, </span>
+<span id="cb39-6"><a href="#cb39-6" aria-hidden="true" tabindex="-1"></a>     <span class="at">extent =</span> mask, </span>
+<span id="cb39-7"><a href="#cb39-7" aria-hidden="true" tabindex="-1"></a>     <span class="at">main =</span> filename,</span>
+<span id="cb39-8"><a href="#cb39-8" aria-hidden="true" tabindex="-1"></a>     <span class="at">reset =</span> <span class="cn">FALSE</span>)</span>
+<span id="cb39-9"><a href="#cb39-9" aria-hidden="true" tabindex="-1"></a><span class="fu">plot</span>(coast, <span class="at">col =</span> <span class="st">"orange"</span>, <span class="at">add =</span> <span class="cn">TRUE</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output-display">
 <div>
 <figure class="figure">
@@ -606,18 +675,18 @@ <h1 data-number="11"><span class="header-section-number">11</span> Challenge</h1
 <p>Create a function to read the correct model input when given the species, month and approach.</p>
 <p>Use the menu option <code>File &gt; New File &gt; R Script</code> to create a blank file. Save the file (even though it is empty) in the “functions” directory as “model_input.R”. Use this file to build a function (or set of functions) that uses this set of arguments. Below is a template to help you get started.</p>
 <div class="cell">
-<div class="sourceCode cell-code" id="cb25"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a><span class="co">#' Reads a model input file given species, month, approach and path</span></span>
-<span id="cb25-2"><a href="#cb25-2" aria-hidden="true" tabindex="-1"></a><span class="co">#' </span></span>
-<span id="cb25-3"><a href="#cb25-3" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param scientificname chr, the species name</span></span>
-<span id="cb25-4"><a href="#cb25-4" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param mon chr month abbreviation</span></span>
-<span id="cb25-5"><a href="#cb25-5" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param approach chr, one of "greedy" or "conservative"</span></span>
-<span id="cb25-6"><a href="#cb25-6" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param path chr the path to the data directory</span></span>
-<span id="cb25-7"><a href="#cb25-7" aria-hidden="true" tabindex="-1"></a>read_model_input <span class="ot">=</span> <span class="cf">function</span>(<span class="at">scientificname =</span> <span class="st">"Mola mola"</span>,</span>
-<span id="cb25-8"><a href="#cb25-8" aria-hidden="true" tabindex="-1"></a>                            <span class="at">mon =</span> <span class="st">"Jan"</span>,</span>
-<span id="cb25-9"><a href="#cb25-9" aria-hidden="true" tabindex="-1"></a>                            <span class="at">approach =</span> <span class="st">"greedy"</span>,</span>
-<span id="cb25-10"><a href="#cb25-10" aria-hidden="true" tabindex="-1"></a>                            <span class="at">path =</span> <span class="fu">data_path</span>(<span class="st">"model_input"</span>)){</span>
-<span id="cb25-11"><a href="#cb25-11" aria-hidden="true" tabindex="-1"></a>    <span class="co"># your part goes in here</span></span>
-<span id="cb25-12"><a href="#cb25-12" aria-hidden="true" tabindex="-1"></a>}</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+<div class="sourceCode cell-code" id="cb40"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb40-1"><a href="#cb40-1" aria-hidden="true" tabindex="-1"></a><span class="co">#' Reads a model input file given species, month, approach and path</span></span>
+<span id="cb40-2"><a href="#cb40-2" aria-hidden="true" tabindex="-1"></a><span class="co">#' </span></span>
+<span id="cb40-3"><a href="#cb40-3" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param scientificname chr, the species name</span></span>
+<span id="cb40-4"><a href="#cb40-4" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param mon chr month abbreviation</span></span>
+<span id="cb40-5"><a href="#cb40-5" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param approach chr, one of "greedy" or "conservative"</span></span>
+<span id="cb40-6"><a href="#cb40-6" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param path chr the path to the data directory</span></span>
+<span id="cb40-7"><a href="#cb40-7" aria-hidden="true" tabindex="-1"></a>read_model_input <span class="ot">=</span> <span class="cf">function</span>(<span class="at">scientificname =</span> <span class="st">"Mola mola"</span>,</span>
+<span id="cb40-8"><a href="#cb40-8" aria-hidden="true" tabindex="-1"></a>                            <span class="at">mon =</span> <span class="st">"Jan"</span>,</span>
+<span id="cb40-9"><a href="#cb40-9" aria-hidden="true" tabindex="-1"></a>                            <span class="at">approach =</span> <span class="st">"greedy"</span>,</span>
+<span id="cb40-10"><a href="#cb40-10" aria-hidden="true" tabindex="-1"></a>                            <span class="at">path =</span> <span class="fu">data_path</span>(<span class="st">"model_input"</span>)){</span>
+<span id="cb40-11"><a href="#cb40-11" aria-hidden="true" tabindex="-1"></a>    <span class="co"># your part goes in here</span></span>
+<span id="cb40-12"><a href="#cb40-12" aria-hidden="true" tabindex="-1"></a>}</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 </div>
 
 
diff --git a/docs/C02_background_files/figure-html/plot_conservative_input-1.png b/docs/C02_background_files/figure-html/plot_conservative_input-1.png
index 0d6a5fd..1973b24 100644
Binary files a/docs/C02_background_files/figure-html/plot_conservative_input-1.png and b/docs/C02_background_files/figure-html/plot_conservative_input-1.png differ
diff --git a/docs/C02_background_files/figure-html/plot_greedye_input-1.png b/docs/C02_background_files/figure-html/plot_greedy_input-1.png
similarity index 100%
rename from docs/C02_background_files/figure-html/plot_greedye_input-1.png
rename to docs/C02_background_files/figure-html/plot_greedy_input-1.png
diff --git a/docs/C02_background_files/figure-html/read_file-1.png b/docs/C02_background_files/figure-html/read_file-1.png
index 3a2c04c..df45f04 100644
Binary files a/docs/C02_background_files/figure-html/read_file-1.png and b/docs/C02_background_files/figure-html/read_file-1.png differ
diff --git a/docs/C02_background_files/figure-html/sample_weight-1.png b/docs/C02_background_files/figure-html/sample_weight-1.png
new file mode 100644
index 0000000..ee31307
Binary files /dev/null and b/docs/C02_background_files/figure-html/sample_weight-1.png differ
diff --git a/docs/C03_covariates.html b/docs/C03_covariates.html
index e986dbe..e4d8a8b 100644
--- a/docs/C03_covariates.html
+++ b/docs/C03_covariates.html
@@ -7,7 +7,7 @@
 <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
 
 
-<title>Prediction – Colby Forecasting 2025</title>
+<title>Covariates – Colby Forecasting 2025</title>
 <style>
 code{white-space: pre-wrap;}
 span.smallcaps{font-variant: small-caps;}
@@ -232,59 +232,6 @@ <h2 id="toc-title">On this page</h2>
   </ul></li>
   <li><a href="#recap" id="toc-recap" class="nav-link" data-scroll-target="#recap"><span class="header-section-number">3</span> Recap</a></li>
   <li><a href="#coding-assignment" id="toc-coding-assignment" class="nav-link" data-scroll-target="#coding-assignment"><span class="header-section-number">4</span> Coding Assignment</a></li>
-  <li><a href="#setup-1" id="toc-setup-1" class="nav-link" data-scroll-target="#setup-1"><span class="header-section-number">5</span> Setup</a></li>
-  <li><a href="#load-data---choose-a-month-and-sampling-approach" id="toc-load-data---choose-a-month-and-sampling-approach" class="nav-link" data-scroll-target="#load-data---choose-a-month-and-sampling-approach"><span class="header-section-number">6</span> Load data - choose a month and sampling approach</a></li>
-  <li><a href="#split-the-data-set-into-testing-and-training-data-sets" id="toc-split-the-data-set-into-testing-and-training-data-sets" class="nav-link" data-scroll-target="#split-the-data-set-into-testing-and-training-data-sets"><span class="header-section-number">7</span> Split the data set into testing and training data sets</a></li>
-  <li><a href="#create-a-workflow" id="toc-create-a-workflow" class="nav-link" data-scroll-target="#create-a-workflow"><span class="header-section-number">8</span> Create a workflow</a></li>
-  <li><a href="#build-a-recipe" id="toc-build-a-recipe" class="nav-link" data-scroll-target="#build-a-recipe"><span class="header-section-number">9</span> Build a recipe</a>
-  <ul class="collapse">
-  <li><a href="#modifying-the-recipe-with-steps" id="toc-modifying-the-recipe-with-steps" class="nav-link" data-scroll-target="#modifying-the-recipe-with-steps"><span class="header-section-number">9.1</span> Modifying the recipe with steps</a></li>
-  <li><a href="#add-the-recipe-to-the-workflow" id="toc-add-the-recipe-to-the-workflow" class="nav-link" data-scroll-target="#add-the-recipe-to-the-workflow"><span class="header-section-number">9.2</span> Add the recipe to the workflow</a></li>
-  </ul></li>
-  <li><a href="#build-a-model" id="toc-build-a-model" class="nav-link" data-scroll-target="#build-a-model"><span class="header-section-number">10</span> Build a model</a>
-  <ul class="collapse">
-  <li><a href="#create-the-model" id="toc-create-the-model" class="nav-link" data-scroll-target="#create-the-model"><span class="header-section-number">10.1</span> Create the model</a></li>
-  <li><a href="#add-the-model-to-the-workflow" id="toc-add-the-model-to-the-workflow" class="nav-link" data-scroll-target="#add-the-model-to-the-workflow"><span class="header-section-number">10.2</span> Add the model to the workflow</a></li>
-  </ul></li>
-  <li><a href="#fit-the-model" id="toc-fit-the-model" class="nav-link" data-scroll-target="#fit-the-model"><span class="header-section-number">11</span> Fit the model</a></li>
-  <li><a href="#making-predictions" id="toc-making-predictions" class="nav-link" data-scroll-target="#making-predictions"><span class="header-section-number">12</span> Making predictions</a>
-  <ul class="collapse">
-  <li><a href="#predict-with-the-training-data" id="toc-predict-with-the-training-data" class="nav-link" data-scroll-target="#predict-with-the-training-data"><span class="header-section-number">12.1</span> Predict with the training data</a></li>
-  <li><a href="#assess-the-model" id="toc-assess-the-model" class="nav-link" data-scroll-target="#assess-the-model"><span class="header-section-number">12.2</span> Assess the model</a>
-  <ul class="collapse">
-  <li><a href="#confusion-matrix" id="toc-confusion-matrix" class="nav-link" data-scroll-target="#confusion-matrix"><span class="header-section-number">12.2.1</span> Confusion matrix</a></li>
-  <li><a href="#roc-and-auc" id="toc-roc-and-auc" class="nav-link" data-scroll-target="#roc-and-auc"><span class="header-section-number">12.2.2</span> ROC and AUC</a></li>
-  <li><a href="#accuracy" id="toc-accuracy" class="nav-link" data-scroll-target="#accuracy"><span class="header-section-number">12.2.3</span> Accuracy</a></li>
-  <li><a href="#partial-dependence-plot" id="toc-partial-dependence-plot" class="nav-link" data-scroll-target="#partial-dependence-plot"><span class="header-section-number">12.2.4</span> Partial dependence plot</a></li>
-  </ul></li>
-  <li><a href="#predict-with-the-testing-data" id="toc-predict-with-the-testing-data" class="nav-link" data-scroll-target="#predict-with-the-testing-data"><span class="header-section-number">12.3</span> Predict with the testing data</a>
-  <ul class="collapse">
-  <li><a href="#predict" id="toc-predict" class="nav-link" data-scroll-target="#predict"><span class="header-section-number">12.3.1</span> Predict</a></li>
-  <li><a href="#confusion-matrix-1" id="toc-confusion-matrix-1" class="nav-link" data-scroll-target="#confusion-matrix-1"><span class="header-section-number">12.3.2</span> Confusion matrix</a></li>
-  <li><a href="#rocauc" id="toc-rocauc" class="nav-link" data-scroll-target="#rocauc"><span class="header-section-number">12.3.3</span> ROC/AUC</a></li>
-  <li><a href="#accuracy-1" id="toc-accuracy-1" class="nav-link" data-scroll-target="#accuracy-1"><span class="header-section-number">12.3.4</span> Accuracy</a></li>
-  <li><a href="#partial-dependence" id="toc-partial-dependence" class="nav-link" data-scroll-target="#partial-dependence"><span class="header-section-number">12.3.5</span> Partial Dependence</a></li>
-  </ul></li>
-  </ul></li>
-  <li><a href="#saving-recipes-and-models-to-disk-as-a-workflow" id="toc-saving-recipes-and-models-to-disk-as-a-workflow" class="nav-link" data-scroll-target="#saving-recipes-and-models-to-disk-as-a-workflow"><span class="header-section-number">13</span> Saving recipes and models to disk as a workflow</a></li>
-  <li><a href="#recap-1" id="toc-recap-1" class="nav-link" data-scroll-target="#recap-1"><span class="header-section-number">14</span> Recap</a></li>
-  <li><a href="#coding-assignment-1" id="toc-coding-assignment-1" class="nav-link" data-scroll-target="#coding-assignment-1"><span class="header-section-number">15</span> Coding Assignment</a></li>
-  <li><a href="#setup-2" id="toc-setup-2" class="nav-link" data-scroll-target="#setup-2"><span class="header-section-number">16</span> Setup</a></li>
-  <li><a href="#load-the-brickman-data" id="toc-load-the-brickman-data" class="nav-link" data-scroll-target="#load-the-brickman-data"><span class="header-section-number">17</span> Load the Brickman data</a></li>
-  <li><a href="#load-the-workflow" id="toc-load-the-workflow" class="nav-link" data-scroll-target="#load-the-workflow"><span class="header-section-number">18</span> Load the workflow</a></li>
-  <li><a href="#make-a-prediction" id="toc-make-a-prediction" class="nav-link" data-scroll-target="#make-a-prediction"><span class="header-section-number">19</span> Make a prediction</a>
-  <ul class="collapse">
-  <li><a href="#nowcast" id="toc-nowcast" class="nav-link" data-scroll-target="#nowcast"><span class="header-section-number">19.1</span> Nowcast</a></li>
-  <li><a href="#forecast" id="toc-forecast" class="nav-link" data-scroll-target="#forecast"><span class="header-section-number">19.2</span> Forecast</a></li>
-  </ul></li>
-  <li><a href="#time-series" id="toc-time-series" class="nav-link" data-scroll-target="#time-series"><span class="header-section-number">20</span> Time series</a>
-  <ul class="collapse">
-  <li><a href="#forecast-2055" id="toc-forecast-2055" class="nav-link" data-scroll-target="#forecast-2055"><span class="header-section-number">20.1</span> Forecast 2055</a></li>
-  <li><a href="#bind-time-series" id="toc-bind-time-series" class="nav-link" data-scroll-target="#bind-time-series"><span class="header-section-number">20.2</span> Bind time series</a></li>
-  <li><a href="#save-the-predictions" id="toc-save-the-predictions" class="nav-link" data-scroll-target="#save-the-predictions"><span class="header-section-number">20.3</span> Save the predictions</a></li>
-  </ul></li>
-  <li><a href="#recap-2" id="toc-recap-2" class="nav-link" data-scroll-target="#recap-2"><span class="header-section-number">21</span> Recap</a></li>
-  <li><a href="#coding-assignment-2" id="toc-coding-assignment-2" class="nav-link" data-scroll-target="#coding-assignment-2"><span class="header-section-number">22</span> Coding Assignment</a></li>
   </ul>
 </nav>
     </div>
@@ -293,7 +240,7 @@ <h2 id="toc-title">On this page</h2>
 
 <header id="title-block-header" class="quarto-title-block default">
 <div class="quarto-title">
-<h1 class="title">Prediction</h1>
+<h1 class="title">Covariates</h1>
 </div>
 
 
@@ -534,426 +481,6 @@ <h1 data-number="4"><span class="header-section-number">4</span> Coding Assignme
 <p>Use the <a href="https://bigelowlab.github.io/handytandy/iterations.html">iterations tutorial</a> to apply your <code>select_covariates()</code> for each month using each approach. At each iteration write the configuration. When you are done, you should have 12 YAML files for each approach - so 24 YAML files written all together for each species.</p>
 
 
-<blockquote class="blockquote">
-<dl>
-<dt>All models are wrong, but some are useful.</dt>
-<dd>
-<a href="https://en.wikipedia.org/wiki/George_E._P._Box">George Box</a>
-</dd>
-</dl>
-</blockquote>
-<p>Modeling starts with a collection of observations (presence and background for us!) and ends up with a collection of coefficients that can be used with one or more formulas to make a prediction for the past, the present or the future. We are using modeling specifically to make habitat suitability maps for select species under two climate scenarios (RCP45 and RCP85) at two different times (2055 and 2075) in the future.</p>
-<p>We can choose from a number of different models: <a href="https://en.wikipedia.org/wiki/Random_forest">random forest “rf”</a>, <a href="https://en.wikipedia.org/wiki/Principle_of_maximum_entropy">maximum entropy “maxent” or “maxnet”</a>, <a href="boosted regression trees">boosted regression trees “brt”</a>, <a href="https://en.wikipedia.org/wiki/General_linear_model">general linear models “glm”</a>, etc. The point of each is to make a mathematical representation of natural occurrences. It is important to consider what those occurrences might be - <em>categorical</em> like labels? <em>likelihoods</em> like probabilities? <em>continuous</em> like measurements? Here are examples of each…</p>
-<ul>
-<li><strong>Categorical</strong>
-<ul>
-<li>two class labels: “present/absence”, “red/green”, “shell/no shell”, “alive/dead”</li>
-<li>multi-class labels: “vanilla/chocolate/strawberry”, “immature/mature/aged”</li>
-</ul></li>
-<li><strong>Likelihood and Probability</strong>
-<ul>
-<li>probability: “50% chance of rain”, “80% chance of a fatal fall”</li>
-<li>relativity: “low likelihood of encounter”, “some likelihood of encounter”</li>
-</ul></li>
-<li><strong>Continuous</strong>
-<ul>
-<li>abundance: “48.2 mice per km^2”, “10,500 copepods per m^3”</li>
-<li>rate: “50 knot winds”, “28.2 Sverdrups”</li>
-<li>measure: “3.2 cm of rain”, “12.1 grams of carbon”</li>
-</ul></li>
-</ul>
-<p>We are modeling with known observations (presences) and a sampling of the background, so we are trying to model a likelihood that a species will be encountered (and reported) relative to the environmental conditions. We are looking for a model that can produce relative likelihood of an encounter that results in a report.</p>
-<p>We’ll be using a random forest model (rf). We were inspired to follow this route by using this <a href="https://oj713.github.io/tidymodels/index.html">tidy models tutorial</a> prepared by our colleague <a href="https://omi-johnson.netlify.app/">Omi Johnson</a>.</p>
-</section>
-<section id="setup-1" class="level1" data-number="5">
-<h1 data-number="5"><span class="header-section-number">5</span> Setup</h1>
-<p>As always, we start by running our setup function. Start RStudio/R, and reload your project with the menu <code>File &gt; Recent Projects</code>.</p>
-<pre class="{r setup}"><code>source("setup.R")</code></pre>
-</section>
-<section id="load-data---choose-a-month-and-sampling-approach" class="level1" data-number="6">
-<h1 data-number="6"><span class="header-section-number">6</span> Load data - choose a month and sampling approach</h1>
-<p>Let’s load what we need to build a model for August using the greedy sampling technique. We’ll also need the model configuration (which is “g_Aug”). And we’ll need the covariate data. Notice that we select the covariates that are included in our configuration.</p>
-<pre class="{r load_data}"><code>model_input = read_model_input(scientificname = "Mola mola", 
-                               approach = "greedy", 
-                               mon = "Aug")
-cfg = read_configuration(version = "g_Aug")
-db = brickman_database()
-covars = read_brickman(db |&gt; filter(scenario == "PRESENT", interval == "mon"))|&gt;
-  select(all_of(cfg$keep_vars))</code></pre>
-<p>Of course we need covariates for August only; for this we can use a function we prepared earlier, <code>prep_model_data()</code>. Note that we specifically ask for a plain table, which means we are dropping the spatial information for now. Also, we select only the variables required in the configuration, plus the <code>class</code> label.</p>
-<pre class="{r august_data}"><code>all_data = prep_model_data(model_input, 
-                           month = "Aug",
-                           covars = covars, 
-                           form = "table") |&gt;
-  select(all_of(c("class", cfg$keep_vars)))
-all_data</code></pre>
-</section>
-<section id="split-the-data-set-into-testing-and-training-data-sets" class="level1" data-number="7">
-<h1 data-number="7"><span class="header-section-number">7</span> Split the data set into testing and training data sets</h1>
-<p>We will split out a random sample of our dataset to a larger set used for training the model, and a smaller set we withhold to use for later testing of the model. Since we have labeled data (“presence” and “background”) we want to be sure we sample these in proportion, for that we’ll indicate that the data are stratified (into just two groups). Let’s first determine what the proportion is before splitting.</p>
-<pre class="{r prop_variables}"><code># A little function to compute the ratio of presences to background
-# @param x table with a "class" column
-# @return numeric ratio presences/background
-get_ratio = function(x){
-  counts = count(x, class)
-  np = filter(counts, class == "presence") |&gt; pull(n)
-  nb = filter(counts, class == "background") |&gt; pull(n)
-  return(np/nb)
-}
-
-cat("ratio of presence/background in full data set:", get_ratio(all_data), "\n")</code></pre>
-<p>Now let’s make the split with the training set comprising 75% of <code>all_data</code>. Note that we specifically identify <code>class</code> as the <code>strata</code> (or grouping) variable.</p>
-<pre class="{r split}"><code>split_data = initial_split(all_data, 
-                           prop = 3/4,
-                           strata = class)
-split_data</code></pre>
-<p>It prints the counts of the training data, the testing data and the entire data set. We can extract the training data and testing data using the <code>training()</code> and <code>testing()</code> functions. Let’s check the ratios for those.</p>
-<pre class="{r check_strata}"><code>cat("ratio of presence/background in training data:", 
-    training(split_data) |&gt; get_ratio(), "\n")
-
-cat("ratio of presence/background in testing data:", 
-    testing(split_data) |&gt; get_ratio(), "\n")</code></pre>
-<p>OK! The samples observed the original proportion of presence/background.</p>
-<blockquote class="blockquote">
-<p>Note! Did you notice that the function is called <code>initial_split()</code>, which implies a subsequent split - what do you suppose that is about?</p>
-</blockquote>
-</section>
-<section id="create-a-workflow" class="level1" data-number="8">
-<h1 data-number="8"><span class="header-section-number">8</span> Create a workflow</h1>
-<p><a href="https://workflows.tidymodels.org/">workflows</a> are containers for storing the data pre-processing steps and model specifications. Not too long ago it was quite a challenge to keep track of all the bits and pieces required to make good forecasts. The advent of <code>workflows</code> greatly simplifies the process. A <code>workflow</code> will house two important items for us: a recipe and a model. For now, we’ll create an empty workflow, then add to it as needed. At the very end, we’ll save the workflow.</p>
-<pre class="{r make_workflow}"><code>wflow = workflow()</code></pre>
-<p>That’s it!</p>
-</section>
-<section id="build-a-recipe" class="level1" data-number="9">
-<h1 data-number="9"><span class="header-section-number">9</span> Build a recipe</h1>
-<p>The first thing we’ll add to the workflow is a <code>recipe</code>. A <code>recipe</code> is a blueprint that guides the data handling and modeling process.</p>
-<p>A recipe at a bare minimum needs to know two things: what data it has to work with and what is the relationship among the variables within the data. The latter is expressed as a formula, very similar to how we specify the formula of a line with <code>y = mx + b</code> or a parabola <code>y = ax^2 + bx + c</code>.</p>
-<div class="callout callout-style-default callout-note callout-titled">
-<div class="callout-header d-flex align-content-center">
-<div class="callout-icon-container">
-<i class="callout-icon"></i>
-</div>
-<div class="callout-title-container flex-fill">
-Note
-</div>
-</div>
-<div class="callout-body-container callout-body">
-<p>We often think of formulas as left-hand side (LHS) and right-hand side (RHS) equalities. And usually, the LHS is the outcome while the RHS is about the inputs. For our modeling, the outcome is to predict the class across the entire domain. We can generalize the idea with the “is a function of” operator <code>~</code> (the tilde). For the classic formula for a line it looks like this… <code>y ~ x</code> and a parabola is also <code>y ~ x</code>.</p>
-<p>Consider a situation where we have reduced all of the suitable variables to <code>Sbtm</code>, <code>Tbtm</code>, <code>MLD</code> and <code>Xbtm</code>, which we have in a table along with a <code>class</code> variable. In our case the outcome is a prediction of <code>class</code>, which is a function of variables like <code>Sbtm</code>, <code>Tbtm</code>, <code>MLD</code>, <code>Xbtm</code>, <em>etc.</em> This formula would look like <code>class ~ Sbtm + Tbtm + MLD + Xbtm</code>. Unlike the specific equation for a line or parabola, we don’t pretend to know what the coefficients, powers and that sort of stuff look like. We are just saying that <code>class</code> is a function of all of those variables (somehow).</p>
-<p>In the case here where the outcome (<code>class</code>) is a function of all other variables in the table, we have a nice shorthand: <code>class ~ .</code>, where the dot means “every other variable”.</p>
-</div>
-</div>
-<p>First we fish out of our split data the training data, and then drop the spatial information.</p>
-<pre class="{r tr_data}"><code>tr_data = training(split_data)
-tr_data</code></pre>
-<p>Now we make the recipe. Note that no computation takes place.</p>
-<div class="callout callout-style-default callout-note callout-titled">
-<div class="callout-header d-flex align-content-center">
-<div class="callout-icon-container">
-<i class="callout-icon"></i>
-</div>
-<div class="callout-title-container flex-fill">
-Note
-</div>
-</div>
-<div class="callout-body-container callout-body">
-<p>Technically, <code>recipe()</code> only needs a small subset of the data set to establish the names and data types of the predictor and outcome variables. Just one row would suffice. That underscores that a recipe is simply building a template.</p>
-</div>
-</div>
-<pre class="{r recipe}"><code>rec = recipe(class ~ ., data = slice(tr_data,1))
-rec</code></pre>
-<p>This print out provides a very high level summary - all of the details are glossed over. To get a more detailed summary use the <code>summary()</code> function.</p>
-<pre class="{r recipe_summary}"><code>summary(rec)</code></pre>
-<p>Each variable in the input is assigned a role: “outcome” or “predictor”. The latter are the variables used in the creation of the model. There are other types of roles (see <code>?recipe</code>), including “case_weight” and “ID”, and others can be defined as needed. Some are used in building the model; others simply ride along and don’t change the model outcome.</p>
-<section id="modifying-the-recipe-with-steps" class="level2" data-number="9.1">
-<h2 data-number="9.1" class="anchored" data-anchor-id="modifying-the-recipe-with-steps"><span class="header-section-number">9.1</span> Modifying the recipe with steps</h2>
-<p>Steps are cumulative modifications, and that means the order in which they are added matters. These steps comprise the bulk of pre-processing steps.</p>
-<p>Some modifications are applied row-by-row. For example, rows of the input modeling data that have one or more missing values (NAs) can be problematic and they should be removed.</p>
-<p>Other modifications manipulate entire columns. Sometimes the recipe requires subsequent steps <em>before</em> the modeling begins in earnest. For example we know from experience that it is often useful to log scale (base 10) depth when working with biological models. If <code>depth</code> and <code>Xbtm</code> have made it this far, you’ll note that each ranges over 4 or more orders of magnitude. That’s not a problem by itself, but it can introduce a bias toward larger values whenever the mean is computed. So, we’ll add a step for log scaling these, but only if <code>depth</code> and <code>Xbtm</code> have made it this far (this may vary by species).</p>
-<pre class="{r step_log}"><code>rec = rec |&gt; 
-  step_naomit()
-if ("depth" %in% cfg$keep_vars){
-  rec = rec |&gt;
-    step_log(depth,  base = 10)
-}
-if ("Xbtm" %in% cfg$keep_vars){
-  rec = rec |&gt;
-    step_log(Xbtm,  base = 10)
-}
-rec</code></pre>
-<p>Next we state that we want to <strong>remove</strong> variables that might be highly correlated with other variables. If two variables are highly correlated, they will not provide the modeling system with more information, just redundant information which doesn’t necessarily help. <code>step_corr()</code> accepts a variety of arguments specifying which variables to test for correlation including some convenience selectors like <code>all_numeric()</code>, <code>all_string()</code> and friends. We want all predictors which happen to all be numeric, so we can use <code>all_predictors()</code> or <code>all_numeric_predictors()</code>. Specificity is better than generality so let’s choose numeric predictors.</p>
-<div class="callout callout-style-default callout-note callout-titled">
-<div class="callout-header d-flex align-content-center">
-<div class="callout-icon-container">
-<i class="callout-icon"></i>
-</div>
-<div class="callout-title-container flex-fill">
-Note
-</div>
-</div>
-<div class="callout-body-container callout-body">
-<p>We have already tested variables for high collinearity, but here we can add a slightly different filter, high correlation, for the same issue. Since we have dealt with this already we shouldn’t expect that this step will change the preprocessing very much. But it is instructive to see it in action.</p>
-</div>
-</div>
-<pre class="{r step_corr}"><code>rec = rec |&gt; 
-  step_corr(all_numeric_predictors())
-rec</code></pre>
-</section>
-<section id="add-the-recipe-to-the-workflow" class="level2" data-number="9.2">
-<h2 data-number="9.2" class="anchored" data-anchor-id="add-the-recipe-to-the-workflow"><span class="header-section-number">9.2</span> Add the recipe to the workflow</h2>
-<pre class="{r add_recipe}"><code>wflow = wflow |&gt;
-  add_recipe(rec)
-wflow</code></pre>
-</section>
-</section>
-<section id="build-a-model" class="level1" data-number="10">
-<h1 data-number="10"><span class="header-section-number">10</span> Build a model</h1>
-<p>We are going to build a <a href="https://www.geeksforgeeks.org/random-forest-algorithm-in-machine-learning/">random forest “rf”</a> model in classification mode which means for us that we have predictions of “presence” or “background”. That’s just two classes; random forests can predict multiple classes, too. Also, random forests can make regression models which are used for continuous data. Below we start the model, declare its mode and assign an engine (the package we prefer to use). We’ll be using the <a href="http://imbs-hl.github.io/ranger/">ranger R package</a>.</p>
-<section id="create-the-model" class="level2" data-number="10.1">
-<h2 data-number="10.1" class="anchored" data-anchor-id="create-the-model"><span class="header-section-number">10.1</span> Create the model</h2>
-<p>We create a random forest model, declare that it should be run in classification mode (not regression mode), and then specify that we want to use the <code>ranger</code> modeling engine (as opposed to, say, the <code>randomForest</code> engine). We additionally specify that it should be able to produce probabilities of a class, not just the class label. We also request that it saves bits of info so that we can compare the relative importance of the covariates.</p>
-<pre class="{r start_rf}"><code>model = rand_forest() |&gt;
-  set_mode("classification") |&gt;
-  set_engine("ranger", probability = TRUE, importance = "permutation") 
-model</code></pre>
-<p>Well, that feels underwhelming. We can pass arguments unique to the engine using the <code>set_args()</code> function, but, for now we’ll accept the defaults.</p>
-</section>
-<section id="add-the-model-to-the-workflow" class="level2" data-number="10.2">
-<h2 data-number="10.2" class="anchored" data-anchor-id="add-the-model-to-the-workflow"><span class="header-section-number">10.2</span> Add the model to the workflow</h2>
-<p>Now we simply add the model to the workflow.</p>
-<pre class="{r add_model}"><code>wflow = wflow |&gt;
-  add_model(model)
-wflow</code></pre>
-</section>
-</section>
-<section id="fit-the-model" class="level1" data-number="11">
-<h1 data-number="11"><span class="header-section-number">11</span> Fit the model</h1>
-<pre class="{r fit_rf}"><code>fitted_wflow = fit(wflow, data = tr_data)
-fitted_wflow</code></pre>
-</section>
-<section id="making-predictions" class="level1" data-number="12">
-<h1 data-number="12"><span class="header-section-number">12</span> Making predictions</h1>
-<p>Predicting is easy with this pattern: <code>predictions = predict(model, newdata, ...)</code>. We want to specify that we want probabilities of a particular class being predicted. In each case we bind to the prediction our original classification, <code>class</code>.</p>
-<section id="predict-with-the-training-data" class="level2" data-number="12.1">
-<h2 data-number="12.1" class="anchored" data-anchor-id="predict-with-the-training-data"><span class="header-section-number">12.1</span> Predict with the training data</h2>
-<p>First we shall predict with the same data we trained with. The results of this will not really tell us much about our model as it is very circular to predict using the very data used to build the model. So this next section is more about a first pass at using the tools at your disposal.</p>
-<pre class="{r predict_train}"><code>train_pred = predict_table(fitted_wflow, tr_data, type = "prob")
-train_pred</code></pre>
-<p>Here the variables prepended with a dot <code>.</code> are computed, while the <code>class</code> variable is our original. There are many metrics we can use to determine how well this model predicts. Let’s start with the simplest thing… we can make a simple tally of <code>.pred</code> and <code>class</code>.</p>
-<pre class="{r count_outcomes}"><code>count(train_pred, .pred, class)</code></pre>
-<p>There are false positives and false negatives, but many are correct. Of course, this is predicting with the very data we used to train the model; knowing that this is predicting on training data with so many misses might not inspire confidence. But let’s explore more.</p>
-</section>
-<section id="assess-the-model" class="level2" data-number="12.2">
-<h2 data-number="12.2" class="anchored" data-anchor-id="assess-the-model"><span class="header-section-number">12.2</span> Assess the model</h2>
-<p>Here we walk through a number of common assessment tools. We want to assess a model to ascertain how closely it models reality (or not!). Using the tools is always easy; interpreting the metrics is not always easy.</p>
-<section id="confusion-matrix" class="level3" data-number="12.2.1">
-<h3 data-number="12.2.1" class="anchored" data-anchor-id="confusion-matrix"><span class="header-section-number">12.2.1</span> Confusion matrix</h3>
-<p>The confusion matrix is the next step beyond a simple tally that we made above.</p>
-<pre class="{r conf_mat_train}"><code>train_confmat = conf_mat(train_pred, class, .pred)
-train_confmat</code></pre>
-<p>You’ll see this is the same as the simple tally we made, but it comes with handy plotting functionality (shown below). Note that a perfect model would have the upper left and lower right quadrants fully accounting for all points. The lower left quadrant shows us the number of false-negatives while the upper right quadrant shows the number of false-positives.</p>
-<pre class="{r plot_confmat_train}"><code>autoplot(train_confmat, type = "heatmap")</code></pre>
-</section>
-<section id="roc-and-auc" class="level3" data-number="12.2.2">
-<h3 data-number="12.2.2" class="anchored" data-anchor-id="roc-and-auc"><span class="header-section-number">12.2.2</span> ROC and AUC</h3>
-<p>The area under the curve (<a href="https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve">AUC</a>) of the receiver-operator curve (<a href="https://en.wikipedia.org/wiki/Receiver_operating_characteristic">ROC</a>) is a common metric. AUC values range from 0 to 1 with 1 reflecting a model that faithfully predicts correctly. Technically an AUC value of 0.5 represents a random model (yup, the result of a coin flip!), so values greater than 0.5 and less than 1.0 are expected.</p>
-<p>First we can plot the ROC.</p>
-<pre class="{r plot_roc}"><code>plot_roc(train_pred, class, .pred_presence)</code></pre>
-<p>We can assure you from practical experience that this is an atypical ROC. Typically they are not smooth, but this smoothness is an artifact of our use of training data. If you really only need the AUC, you can use the <code>roc_auc()</code> function directly.</p>
-<pre class="{r roc_auc}"><code>roc_auc(train_pred, class,  .pred_presence)</code></pre>
-</section>
-<section id="accuracy" class="level3" data-number="12.2.3">
-<h3 data-number="12.2.3" class="anchored" data-anchor-id="accuracy"><span class="header-section-number">12.2.3</span> Accuracy</h3>
-<p>Accuracy, much like our simple tally above, tells us what fraction of the predictions are correct. Note that here we explicitly provide the predicted class label (not the probability).</p>
-<pre class="{r accuracy}"><code>accuracy(train_pred, class, .pred)</code></pre>
-</section>
-<section id="partial-dependence-plot" class="level3" data-number="12.2.4">
-<h3 data-number="12.2.4" class="anchored" data-anchor-id="partial-dependence-plot"><span class="header-section-number">12.2.4</span> Partial dependence plot</h3>
-<p>Partial dependence reflects the relative contribution of each variable’s influence over its full range of values. The output is a grid of plots showing the relative distribution of the variable (bars) as well as the relative influence of the variable (line).</p>
-<pre class="{r pd_plot}"><code>partial_dependence_plot(fitted_wflow, data = tr_data)</code></pre>
-</section>
-</section>
-<section id="predict-with-the-testing-data" class="level2" data-number="12.3">
-<h2 data-number="12.3" class="anchored" data-anchor-id="predict-with-the-testing-data"><span class="header-section-number">12.3</span> Predict with the testing data</h2>
-<p>Finally, we can repeat these steps with the testing data. This should give us better information than using the training data.</p>
-<section id="predict" class="level3" data-number="12.3.1">
-<h3 data-number="12.3.1" class="anchored" data-anchor-id="predict"><span class="header-section-number">12.3.1</span> Predict</h3>
-<pre class="{r predict_test}"><code>test_data = testing(split_data)
-test_pred = predict_table(fitted_wflow, test_data, type = "prob")
-test_pred</code></pre>
-</section>
-<section id="confusion-matrix-1" class="level3" data-number="12.3.2">
-<h3 data-number="12.3.2" class="anchored" data-anchor-id="confusion-matrix-1"><span class="header-section-number">12.3.2</span> Confusion matrix</h3>
-<pre class="{r conf_mat_test}"><code>test_confmat = conf_mat(test_pred, class, .pred)
-autoplot(test_confmat, type = "heatmap")</code></pre>
-</section>
-<section id="rocauc" class="level3" data-number="12.3.3">
-<h3 data-number="12.3.3" class="anchored" data-anchor-id="rocauc"><span class="header-section-number">12.3.3</span> ROC/AUC</h3>
-<pre class="{r plot_roc_test}"><code>plot_roc(test_pred, class, .pred_presence)</code></pre>
-<p>This ROC is more typical of what we see in regular practice.</p>
-</section>
-<section id="accuracy-1" class="level3" data-number="12.3.4">
-<h3 data-number="12.3.4" class="anchored" data-anchor-id="accuracy-1"><span class="header-section-number">12.3.4</span> Accuracy</h3>
-<pre class="{r accuracy_test}"><code>accuracy(test_pred, class, .pred)</code></pre>
-</section>
-<section id="partial-dependence" class="level3" data-number="12.3.5">
-<h3 data-number="12.3.5" class="anchored" data-anchor-id="partial-dependence"><span class="header-section-number">12.3.5</span> Partial Dependence</h3>
-<pre class="{r pd_plot_test}"><code>partial_dependence_plot(fitted_wflow, data = test_data)</code></pre>
-</section>
-</section>
-</section>
-<section id="saving-recipes-and-models-to-disk-as-a-workflow" class="level1" data-number="13">
-<h1 data-number="13"><span class="header-section-number">13</span> Saving recipes and models to disk as a workflow</h1>
-<p>We can (and should!) save recipes and models to disk for later recall. We need the recipe because it handles the pre-processing of our covariates, while the model specifies both the form of the model and the necessary coefficients. When bundled together for later use we can be assured that the data pre-processing steps and model specifications will be available. A <a href="https://workflows.tidymodels.org/">workflow</a> is a container for recipes, models and other parts of the model process.</p>
-<p>Now we can save the workflow container.</p>
-<pre class="{r save_model}"><code>write_workflow(fitted_wflow, version = cfg$version)</code></pre>
-<p>You can read it back later with <code>read_workflow()</code>.</p>
-</section>
-<section id="recap-1" class="level1" data-number="14">
-<h1 data-number="14"><span class="header-section-number">14</span> Recap</h1>
-<p>We have built a random forest model using tools from the <a href="https://www.tidymodels.org/">tidymodels universe</a>. After reading in a suite of data, we split our data into training and testing sets, withholding the testing set until the very end. We looked at a variety of metrics including a simple tally, a confusion matrix, ROC and AUC, accuracy and partial dependencies. We saved the recipe and model together in a special container, called a workflow, to a file.</p>
-</section>
-<section id="coding-assignment-1" class="level1" data-number="15">
-<h1 data-number="15"><span class="header-section-number">15</span> Coding Assignment</h1>
-<p>Use the <a href="https://bigelowlab.github.io/handytandy/iterations.html">iterations tutorial</a> to build a workflow for each month using one or both of your background selection methods. Save each workflow in the <code>models</code> directory. If you chose to do both background selection methods then you should end up with 24 workflows (12 months x 2 background sampling methods).</p>
-<blockquote class="blockquote">
-<dl>
-<dt>It’s tough to make predictions, especially about the future.</dt>
-<dd>
-Yogi Berra
-</dd>
-</dl>
-</blockquote>
-<p>Finally we come to the end product of forecasting: the prediction. This last step is actually fairly simple: given a recipe and model (now bundled in a <code>workflow</code> container), run the same data-prep and predicting steps as we did earlier. One modification is that we now want to predict across the entire domain of our Brickman data set. You may recall that we are able to read these arrays, display them and extract point data from them. But we haven’t used them <em>en masse</em> as a variable yet.</p>
-</section>
-<section id="setup-2" class="level1" data-number="16">
-<h1 data-number="16"><span class="header-section-number">16</span> Setup</h1>
-<p>As always, we start by running our setup function. Start RStudio/R, and reload your project with the menu <code>File &gt; Recent Projects</code>.</p>
-<pre class="{r setup}"><code>source("setup.R")</code></pre>
-</section>
-<section id="load-the-brickman-data" class="level1" data-number="17">
-<h1 data-number="17"><span class="header-section-number">17</span> Load the Brickman data</h1>
-<p>Once again, we’ll use the August data where we started with a greedy sampling approach. We are going to make a prediction about the present, which means it is something akin to a <a href="https://en.wikipedia.org/wiki/Nowcasting_(economics)">nowcast</a>.</p>
-<pre class="{r load_covar}"><code>cfg = read_configuration(version = "g_Aug")
-db = brickman_database()
-covars = read_brickman(db |&gt; filter(scenario == "PRESENT", interval == "mon")) |&gt;
-  select(all_of(cfg$keep_vars)) |&gt;
-  slice("month", "Aug") </code></pre>
-</section>
-<section id="load-the-workflow" class="level1" data-number="18">
-<h1 data-number="18"><span class="header-section-number">18</span> Load the workflow</h1>
-<p>We read the recipe and model workflow bundle.</p>
-<pre class="{r load_workflow}"><code>wflow = read_workflow(version = cfg$version)</code></pre>
-<p>Recall that the workflow has two elements: pre-processing recipe and model. When we make a prediction with the workflow it will accept new data that then gets filtered and/or transformed as specified by the recipe steps. The data that survives the preprocessing will then be used to feed into the model that was trained on a specific domain (time and space).</p>
-</section>
-<section id="make-a-prediction" class="level1" data-number="19">
-<h1 data-number="19"><span class="header-section-number">19</span> Make a prediction</h1>
-<p>First we shall make a “nowcast” which is just a prediction of the current environmental conditions.</p>
-<section id="nowcast" class="level2" data-number="19.1">
-<h2 data-number="19.1" class="anchored" data-anchor-id="nowcast"><span class="header-section-number">19.1</span> Nowcast</h2>
-<p>First make the prediction. The function yields a <code>stars</code> array object that has three attributes: <code>.pred_presence</code>, <code>.pred_background</code> and <code>.pred</code>. The leading dot simply gives us the heads up that these three values are all computed. The first two range from 0-1 which implies a probability. The last, <code>.pred</code>, is the class label we would assign if we accept that any <code>.pred_presence &gt;= 0.5</code> should be considered suitable habitat where a <strong>reported observation</strong> might occur.</p>
-<pre class="{r nowcast}"><code>nowcast = predict_stars(wflow, covars)
-nowcast</code></pre>
-<p>Now we can plot what is often called a “habitat suitability index” (hsi) map.</p>
-<pre class="{r plot_nowcast}"><code>coast = read_coastline()
-plot(nowcast['.pred_presence'], main = "Nowcast August", 
-     axes = TRUE, breaks = seq(0, 1, by = 0.1), reset = FALSE)
-plot(coast, col = "orange", lwd = 2, add = TRUE)</code></pre>
-<p>We can also plot a presence/background labeled map, but keep in mind it is just a thresholded version of the above where “presence” means <code>.pred_presence &gt;= 0.5</code>.</p>
-<pre class="{r plot_class_labels}"><code>plot(nowcast['.pred'], main = "Nowcast August Labels", 
-     axes = TRUE, reset = FALSE)
-plot(coast, col = "black", lwd = 2, add = TRUE)</code></pre>
-</section>
-<section id="forecast" class="level2" data-number="19.2">
-<h2 data-number="19.2" class="anchored" data-anchor-id="forecast"><span class="header-section-number">19.2</span> Forecast</h2>
-<p>Now let’s try our hand at forecasting - let’s try RCP85 in 2075. First we load those parameters, then run the prediction and plot.</p>
-<pre class="{r load_2075_RCP85}"><code>covars_rcp85_2075 = read_brickman(db |&gt; filter(scenario == "RCP85", year == 2075, interval == "mon")) |&gt;
-  select(all_of(cfg$keep_vars)) |&gt;
-  slice("month", "Aug") </code></pre>
-<pre class="{r forecast}"><code>forecast_2075 = predict_stars(wflow, covars_rcp85_2075)
-forecast_2075</code></pre>
-<pre class="{r plot_forecast}"><code>coast = read_coastline()
-plot(forecast_2075['.pred_presence'], main = "RCP85 2075 August", 
-     axes = TRUE, breaks = seq(0, 1, by = 0.1), reset = FALSE)
-plot(coast, col = "orange", lwd = 2, add = TRUE)</code></pre>
-<p>Hmmm, that’s pretty different than what the nowcast predicts.</p>
-</section>
-</section>
-<section id="time-series" class="level1" data-number="20">
-<h1 data-number="20"><span class="header-section-number">20</span> Time series</h1>
-<p>It would be nice to see a time series: current, 2055 and 2075 on the same graphic. Let’s load RCP85 2055 data, and make yet another prediction.</p>
-<section id="forecast-2055" class="level2" data-number="20.1">
-<h2 data-number="20.1" class="anchored" data-anchor-id="forecast-2055"><span class="header-section-number">20.1</span> Forecast 2055</h2>
-<pre class="{r load_2055_RCP85}"><code>covars_rcp85_2055 = read_brickman(db |&gt; filter(scenario == "RCP85", year == 2055, interval == "mon")) |&gt;
-  select(all_of(cfg$keep_vars)) |&gt;
-  slice("month", "Aug") 
-forecast_2055 = predict_stars(wflow, covars_rcp85_2055)
-forecast_2055</code></pre>
-</section>
-<section id="bind-time-series" class="level2" data-number="20.2">
-<h2 data-number="20.2" class="anchored" data-anchor-id="bind-time-series"><span class="header-section-number">20.2</span> Bind time series</h2>
-<p>We want to bind the <code>.pred_presence</code> attribute for each of the predictions (nowcast, forecast_2055 and forecast_2075). Let’s assume the “present” means 2020 so we can assign a year.</p>
-<pre class="{r bind}"><code>rcp85 = c(nowcast, forecast_2055, forecast_2075, along = list(year = c("2020", "2055", "2075")))</code></pre>
-<div class="callout callout-style-default callout-note callout-titled">
-<div class="callout-header d-flex align-content-center">
-<div class="callout-icon-container">
-<i class="callout-icon"></i>
-</div>
-<div class="callout-title-container flex-fill">
-Note
-</div>
-</div>
-<div class="callout-body-container callout-body">
-<p>Curious about why we provide year as a vector of characters instead of a vector of integers? Try running the command above again and check out the 3rd dimension.</p>
-</div>
-</div>
-<p>Since we are plotting multiple arrays, we need to plot the coastline using a “hook” function.</p>
-<pre class="{r plot_hsi_series}"><code>plot_coast = function(){
-  plot(coast, col = "orange", lwd = 2, add = TRUE)
-}
-
-plot(rcp85['.pred_presence'], 
-     hook = plot_coast,
-     axes = TRUE, breaks = seq(0, 1, by = 0.1), join_zlim  = TRUE, reset = FALSE)</code></pre>
-<p>Hmmmm. Why does there seem to be a strong shift between 2020 and 2055, while the 2055 to 2075 shift seems less pronounced?</p>
-<div class="callout callout-style-default callout-note callout-titled">
-<div class="callout-header d-flex align-content-center">
-<div class="callout-icon-container">
-<i class="callout-icon"></i>
-</div>
-<div class="callout-title-container flex-fill">
-Note
-</div>
-</div>
-<div class="callout-body-container callout-body">
-<p>Don’t forget that <a href="https://github.com/BigelowLab/ColbyForecasting2025/wiki/Spatial">there are other ways</a> to plot array based spatial data.</p>
-</div>
-</div>
-</section>
-<section id="save-the-predictions" class="level2" data-number="20.3">
-<h2 data-number="20.3" class="anchored" data-anchor-id="save-the-predictions"><span class="header-section-number">20.3</span> Save the predictions</h2>
-<p>We could save all three attributes, but <code>.pred_background</code> is just <code>1 - .pred_presence</code>, and <code>.pred</code> is just coding “presence” where <code>.pred_presence &gt;= 0.5</code>, so we can always compute those as needed if we have <code>.pred_presence</code>. In that case, let’s just save the first attribute, <code>.pred_presence</code>, in a multilayer <a href="https://en.wikipedia.org/wiki/GeoTIFF">GeoTIFF</a> formatted image array file. The <code>write_prediction()</code> function will do just that.</p>
-<pre class="{r save_pred}"><code># make sure the output directory exists
-path = data_path("predictions")
-if (!dir.exists(path)) ok = dir.create(path, recursive = TRUE)
-
-# write individual arrays?
-write_prediction(nowcast, file = file.path(path,"g_Aug_RCP85_2020.tif"))
-write_prediction(forecast_2055, file = file.path(path, "g_Aug_RCP85_2055.tif"))
-write_prediction(forecast_2075, file = file.path(path, "g_Aug_RCP85_2075.tif"))
-
-# or write them together in a "multi-layer" file?
-write_prediction(rcp85, file = file.path(path, "g_Aug_RCP85_all.tif"))</code></pre>
-<p>To read it back simply provide the filename to <code>read_prediction()</code>. If you are reading back a multi-layer array, be sure to check out the <code>time</code> argument to assign values to the time dimension. Single layer arrays don’t have the concept of time so the <code>time</code> argument is ignored.</p>
-</section>
-</section>
-<section id="recap-2" class="level1" data-number="21">
-<h1 data-number="21"><span class="header-section-number">21</span> Recap</h1>
-<p>We made both a nowcast and a number of predictions using a previously saved workflow. Contrary to Yogi Berra’s claim, it’s actually pretty easy to predict the future. Perhaps more challenging is to interpret the prediction. We bundled these together to make time series plots, and we saved the <code>.pred_presence</code> values.</p>
-</section>
-<section id="coding-assignment-2" class="level1" data-number="22">
-<h1 data-number="22"><span class="header-section-number">22</span> Coding Assignment</h1>
-<p>For each climate scenario, create a monthly forecast (so that’s three: nowcast, forecast_2055 and forecast_2075) and save each in your <code>predictions</code> directory. Whether you choose to draw upon the greedy background sampling method, the conservative background sampling method or both is up to you. Keep in mind that some months may not have enough data to model without throwing an error. We suggest that you wrap your critical steps in a <code>try()</code> function which will catch the error without crashing your iterator. There is a tutorial on <a href="https://bigelowlab.github.io/handytandy/try.html">error catching</a> that specifically uses <code>try()</code>.</p>
 </section>
 
 <a onclick="window.scrollTo(0, 0); return false;" role="button" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a></main> <!-- /main -->
diff --git a/docs/C03_covariates_files/figure-html/pairs-1.png b/docs/C03_covariates_files/figure-html/pairs-1.png
index 9161992..f9a9617 100644
Binary files a/docs/C03_covariates_files/figure-html/pairs-1.png and b/docs/C03_covariates_files/figure-html/pairs-1.png differ
diff --git a/docs/C04_models.html b/docs/C04_models.html
index c0eb77f..7f6626b 100644
--- a/docs/C04_models.html
+++ b/docs/C04_models.html
@@ -7,7 +7,7 @@
 <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
 
 
-<title>Prediction – Colby Forecasting 2025</title>
+<title>Models – Colby Forecasting 2025</title>
 <style>
 code{white-space: pre-wrap;}
 span.smallcaps{font-variant: small-caps;}
@@ -258,22 +258,6 @@ <h2 id="toc-title">On this page</h2>
   <li><a href="#saving-recipes-and-models-to-disk-as-a-workflow" id="toc-saving-recipes-and-models-to-disk-as-a-workflow" class="nav-link" data-scroll-target="#saving-recipes-and-models-to-disk-as-a-workflow"><span class="header-section-number">9</span> Saving recipes and models to disk as a workflow</a></li>
   <li><a href="#recap" id="toc-recap" class="nav-link" data-scroll-target="#recap"><span class="header-section-number">10</span> Recap</a></li>
   <li><a href="#coding-assignment" id="toc-coding-assignment" class="nav-link" data-scroll-target="#coding-assignment"><span class="header-section-number">11</span> Coding Assignment</a></li>
-  <li><a href="#setup-1" id="toc-setup-1" class="nav-link" data-scroll-target="#setup-1"><span class="header-section-number">12</span> Setup</a></li>
-  <li><a href="#load-the-brickman-data" id="toc-load-the-brickman-data" class="nav-link" data-scroll-target="#load-the-brickman-data"><span class="header-section-number">13</span> Load the Brickman data</a></li>
-  <li><a href="#load-the-workflow" id="toc-load-the-workflow" class="nav-link" data-scroll-target="#load-the-workflow"><span class="header-section-number">14</span> Load the workflow</a></li>
-  <li><a href="#make-a-prediction" id="toc-make-a-prediction" class="nav-link" data-scroll-target="#make-a-prediction"><span class="header-section-number">15</span> Make a prediction</a>
-  <ul class="collapse">
-  <li><a href="#nowcast" id="toc-nowcast" class="nav-link" data-scroll-target="#nowcast"><span class="header-section-number">15.1</span> Nowcast</a></li>
-  <li><a href="#forecast" id="toc-forecast" class="nav-link" data-scroll-target="#forecast"><span class="header-section-number">15.2</span> Forecast</a></li>
-  </ul></li>
-  <li><a href="#time-series" id="toc-time-series" class="nav-link" data-scroll-target="#time-series"><span class="header-section-number">16</span> Time series</a>
-  <ul class="collapse">
-  <li><a href="#forecast-2055" id="toc-forecast-2055" class="nav-link" data-scroll-target="#forecast-2055"><span class="header-section-number">16.1</span> Forecast 2055</a></li>
-  <li><a href="#bind-time-series" id="toc-bind-time-series" class="nav-link" data-scroll-target="#bind-time-series"><span class="header-section-number">16.2</span> Bind time series</a></li>
-  <li><a href="#save-the-predictions" id="toc-save-the-predictions" class="nav-link" data-scroll-target="#save-the-predictions"><span class="header-section-number">16.3</span> Save the predictions</a></li>
-  </ul></li>
-  <li><a href="#recap-1" id="toc-recap-1" class="nav-link" data-scroll-target="#recap-1"><span class="header-section-number">17</span> Recap</a></li>
-  <li><a href="#coding-assignment-1" id="toc-coding-assignment-1" class="nav-link" data-scroll-target="#coding-assignment-1"><span class="header-section-number">18</span> Coding Assignment</a></li>
   </ul>
 </nav>
     </div>
@@ -282,7 +266,7 @@ <h2 id="toc-title">On this page</h2>
 
 <header id="title-block-header" class="quarto-title-block default">
 <div class="quarto-title">
-<h1 class="title">Prediction</h1>
+<h1 class="title">Models</h1>
 </div>
 
 
@@ -458,15 +442,15 @@ <h1 data-number="5"><span class="header-section-number">5</span> Build a recipe<
    class        MLD  Sbtm   SSS   SST  Tbtm          U           V
    &lt;fct&gt;      &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;      &lt;dbl&gt;       &lt;dbl&gt;
  1 background  3.53  29.5  29.0  17.7 15.6  -0.00355    0.000511  
- 2 background  3.92  31.1  28.9  17.9  7.64 -0.0000966  0.00158   
- 3 background  4.51  28.6  28.6  19.8 19.9   0.00193    0.00234   
- 4 background  4.56  28.4  28.3  20.2 20.3   0.00202   -0.00158   
- 5 background  4.95  28.0  28.0  20.8 20.8  -0.000153  -0.00000933
- 6 background  3.60  29.9  29.4  17.4 15.0  -0.00332    0.00153   
- 7 background  3.63  30.1  29.3  17.6 13.6  -0.000100   0.00167   
- 8 background  3.32  31.2  29.3  17.3  6.93  0.0000920  0.000934  
- 9 background  2.84  31.6  29.3  17.6  5.80 -0.00135    0.00327   
-10 background  4.02  31.9  29.3  17.8  5.38 -0.000583   0.00207   
+ 2 background  4.51  28.6  28.6  19.8 19.9   0.00193    0.00234   
+ 3 background  4.56  28.4  28.3  20.2 20.3   0.00202   -0.00158   
+ 4 background  4.95  28.0  28.0  20.8 20.8  -0.000153  -0.00000933
+ 5 background  3.60  29.9  29.4  17.4 15.0  -0.00332    0.00153   
+ 6 background  3.32  31.2  29.3  17.3  6.93  0.0000920  0.000934  
+ 7 background  4.34  32.4  29.2  18.1  5.50 -0.000321   0.00283   
+ 8 background  4.01  32.4  29.1  18.1  5.53  0.000172   0.00369   
+ 9 background  4.48  32.4  28.8  18.4  5.64  0.00486    0.00551   
+10 background  4.68  32.3  28.6  18.6  5.71  0.00707    0.00473   
 # ℹ 5,443 more rows</code></pre>
 </div>
 </div>
@@ -725,7 +709,7 @@ <h1 data-number="7"><span class="header-section-number">7</span> Fit the model</
 Target node size:                 10 
 Variable importance mode:         permutation 
 Splitrule:                        gini 
-OOB prediction error (Brier s.):  0.2084605 </code></pre>
+OOB prediction error (Brier s.):  0.2138066 </code></pre>
 </div>
 </div>
 </section>
@@ -742,16 +726,16 @@ <h2 data-number="8.1" class="anchored" data-anchor-id="predict-with-the-training
 <pre><code># A tibble: 5,453 × 4
    .pred_presence .pred_background .pred      class     
             &lt;dbl&gt;            &lt;dbl&gt; &lt;fct&gt;      &lt;fct&gt;     
- 1         0.0123            0.988 background background
- 2         0.0627            0.937 background background
- 3         0.328             0.672 background background
- 4         0.183             0.817 background background
- 5         0.119             0.881 background background
- 6         0.0146            0.985 background background
- 7         0.0182            0.982 background background
- 8         0.358             0.642 background background
- 9         0.0539            0.946 background background
-10         0.0264            0.974 background background
+ 1        0.00499            0.995 background background
+ 2        0.315              0.685 background background
+ 3        0.180              0.820 background background
+ 4        0.142              0.858 background background
+ 5        0.00261            0.997 background background
+ 6        0.0324             0.968 background background
+ 7        0.0295             0.971 background background
+ 8        0.0451             0.955 background background
+ 9        0.0469             0.953 background background
+10        0.0810             0.919 background background
 # ℹ 5,443 more rows</code></pre>
 </div>
 </div>
@@ -762,10 +746,10 @@ <h2 data-number="8.1" class="anchored" data-anchor-id="predict-with-the-training
 <pre><code># A tibble: 4 × 3
   .pred      class          n
   &lt;fct&gt;      &lt;fct&gt;      &lt;int&gt;
-1 presence   presence    1394
-2 presence   background   340
-3 background presence     446
-4 background background  3273</code></pre>
+1 presence   presence    1353
+2 presence   background   333
+3 background presence     487
+4 background background  3280</code></pre>
 </div>
 </div>
 <p>There are false positives and false negatives, but many are correct. Of course, this is predicting with the very data we used to train the model; knowing that this is predicting on training data, so many misses might not inspire confidence. But let’s explore more.</p>
@@ -782,8 +766,8 @@ <h3 data-number="8.2.1" class="anchored" data-anchor-id="confusion-matrix"><span
 <div class="cell-output cell-output-stdout">
 <pre><code>            Truth
 Prediction   presence background
-  presence       1394        340
-  background      446       3273</code></pre>
+  presence       1353        333
+  background      487       3280</code></pre>
 </div>
 </div>
 <p>You’ll see this is the same as the simple tally we made, but it comes with handy plotting functionality (shown below). Note that a perfect model would have the upper left and lower right quadrants fully accounting for all points. The lower left quadrant shows us the number of false-negatives while the upper right quadrant shows the number of false-positives.</p>
@@ -819,7 +803,7 @@ <h3 data-number="8.2.2" class="anchored" data-anchor-id="roc-and-auc"><span clas
 <pre><code># A tibble: 1 × 3
   .metric .estimator .estimate
   &lt;chr&gt;   &lt;chr&gt;          &lt;dbl&gt;
-1 roc_auc binary         0.942</code></pre>
+1 roc_auc binary         0.939</code></pre>
 </div>
 </div>
 </section>
@@ -832,7 +816,7 @@ <h3 data-number="8.2.3" class="anchored" data-anchor-id="accuracy"><span class="
 <pre><code># A tibble: 1 × 3
   .metric  .estimator .estimate
   &lt;chr&gt;    &lt;chr&gt;          &lt;dbl&gt;
-1 accuracy binary         0.856</code></pre>
+1 accuracy binary         0.850</code></pre>
 </div>
 </div>
 </section>
@@ -864,16 +848,16 @@ <h3 data-number="8.3.1" class="anchored" data-anchor-id="predict"><span class="h
 <pre><code># A tibble: 1,819 × 4
    .pred_presence .pred_background .pred      class   
             &lt;dbl&gt;            &lt;dbl&gt; &lt;fct&gt;      &lt;fct&gt;   
- 1          0.724           0.276  presence   presence
- 2          0.335           0.665  background presence
- 3          0.550           0.450  presence   presence
- 4          0.480           0.520  background presence
- 5          0.872           0.128  presence   presence
- 6          0.974           0.0262 presence   presence
- 7          0.166           0.834  background presence
- 8          0.194           0.806  background presence
- 9          0.787           0.213  presence   presence
-10          0.615           0.385  presence   presence
+ 1        0.539              0.461 presence   presence
+ 2        0.361              0.639 background presence
+ 3        0.648              0.352 presence   presence
+ 4        0.853              0.147 presence   presence
+ 5        0.221              0.779 background presence
+ 6        0.416              0.584 background presence
+ 7        0.445              0.555 background presence
+ 8        0.379              0.621 background presence
+ 9        0.00643            0.994 background presence
+10        0.438              0.562 background presence
 # ℹ 1,809 more rows</code></pre>
 </div>
 </div>
@@ -914,7 +898,7 @@ <h3 data-number="8.3.4" class="anchored" data-anchor-id="accuracy-1"><span class
 <pre><code># A tibble: 1 × 3
   .metric  .estimator .estimate
   &lt;chr&gt;    &lt;chr&gt;          &lt;dbl&gt;
-1 accuracy binary         0.688</code></pre>
+1 accuracy binary         0.713</code></pre>
 </div>
 </div>
 </section>
@@ -951,144 +935,6 @@ <h1 data-number="11"><span class="header-section-number">11</span> Coding Assign
 <p>Use the <a href="https://bigelowlab.github.io/handytandy/iterations.html">iterations tutorial</a> to build a workflow for each month using one or both of your background selection methods. Save each workflow in the <code>models</code> directory. If you chose to do both background selection methods then you should end up with 24 workflows (12 months x 2 background sampling methods).</p>
 
 
-<blockquote class="blockquote">
-<dl>
-<dt>It’s tough to make predictions, especially about the future.</dt>
-<dd>
-Yogi Berra
-</dd>
-</dl>
-</blockquote>
-<p>Finally we come to the end product of forecasting: the prediction. This last step is actually fairly simple: given a recipe and model (now bundled in a <code>workflow</code> container), run the same data-prep and predicting steps as we did earlier. One modification is that we now want to predict across the entire domain of our Brickman data set. You may recall that we are able to read these arrays, display them and extract point data from them. But we haven’t used them <em>en masse</em> as a variable yet.</p>
-</section>
-<section id="setup-1" class="level1" data-number="12">
-<h1 data-number="12"><span class="header-section-number">12</span> Setup</h1>
-<p>As always, we start by running our setup function. Start RStudio/R, and reload your project with the menu <code>File &gt; Recent Projects</code>.</p>
-<pre class="{r setup}"><code>source("setup.R")</code></pre>
-</section>
-<section id="load-the-brickman-data" class="level1" data-number="13">
-<h1 data-number="13"><span class="header-section-number">13</span> Load the Brickman data</h1>
-<p>Once again, we’ll use the August data where we started with a greedy sampling approach. We are going to make a prediction about the present, which means it is something akin to a <a href="https://en.wikipedia.org/wiki/Nowcasting_(economics)">nowcast</a>.</p>
-<pre class="{r load_covar}"><code>cfg = read_configuration(version = "g_Aug")
-db = brickman_database()
-covars = read_brickman(db |&gt; filter(scenario == "PRESENT", interval == "mon")) |&gt;
-  select(all_of(cfg$keep_vars)) |&gt;
-  slice("month", "Aug") </code></pre>
-</section>
-<section id="load-the-workflow" class="level1" data-number="14">
-<h1 data-number="14"><span class="header-section-number">14</span> Load the workflow</h1>
-<p>We read the recipe and model workflow bundle.</p>
-<pre class="{r load_workflow}"><code>wflow = read_workflow(version = cfg$version)</code></pre>
-<p>Recall that the workflow has two elements: pre-processing recipe and model. When we make a prediction with the workflow it will accept new data that then gets filtered and/or transformed as specified by the recipe steps. The data that survives the preprocessing will then be used to feed into the model that was trained on a specific domain (time and space).</p>
-</section>
-<section id="make-a-prediction" class="level1" data-number="15">
-<h1 data-number="15"><span class="header-section-number">15</span> Make a prediction</h1>
-<p>First we shall make a “nowcast” which is just a prediction of the current environmental conditions.</p>
-<section id="nowcast" class="level2" data-number="15.1">
-<h2 data-number="15.1" class="anchored" data-anchor-id="nowcast"><span class="header-section-number">15.1</span> Nowcast</h2>
-<p>First make the prediction. The function yields a <code>stars</code> array object that has three attributes: <code>.pred_presence</code>, <code>.pred_background</code> and <code>.pred</code>. The leading dot simply gives us the heads up that these three values are all computed. The first two range from 0-1 which implies a probability. The last, <code>.pred</code>, is the class label we would assign if we accept that any <code>.pred_presence &gt;= 0.5</code> should be considered suitable habitat where a <strong>reported observation</strong> might occur.</p>
-<pre class="{r nowcast}"><code>nowcast = predict_stars(wflow, covars)
-nowcast</code></pre>
-<p>Now we can plot what is often called a “habitat suitability index” (hsi) map.</p>
-<pre class="{r plot_nowcast}"><code>coast = read_coastline()
-plot(nowcast['.pred_presence'], main = "Nowcast August", 
-     axes = TRUE, breaks = seq(0, 1, by = 0.1), reset = FALSE)
-plot(coast, col = "orange", lwd = 2, add = TRUE)</code></pre>
-<p>We can also plot a presence/background labeled map, but keep in mind it is just a thresholded version of the above where “presence” means <code>.pred_presence &gt;= 0.5</code>.</p>
-<pre class="{r plot_class_labels}"><code>plot(nowcast['.pred'], main = "Nowcast August Labels", 
-     axes = TRUE, reset = FALSE)
-plot(coast, col = "black", lwd = 2, add = TRUE)</code></pre>
-</section>
-<section id="forecast" class="level2" data-number="15.2">
-<h2 data-number="15.2" class="anchored" data-anchor-id="forecast"><span class="header-section-number">15.2</span> Forecast</h2>
-<p>Now let’s try our hand at forecasting - let’s try RCP85 in 2075. First we load those parameters, then run the prediction and plot.</p>
-<pre class="{r load_2075_RCP85}"><code>covars_rcp85_2075 = read_brickman(db |&gt; filter(scenario == "RCP85", year == 2075, interval == "mon")) |&gt;
-  select(all_of(cfg$keep_vars)) |&gt;
-  slice("month", "Aug") </code></pre>
-<pre class="{r forecast}"><code>forecast_2075 = predict_stars(wflow, covars_rcp85_2075)
-forecast_2075</code></pre>
-<pre class="{r plot_forecast}"><code>coast = read_coastline()
-plot(forecast_2075['.pred_presence'], main = "RCP85 2075 August", 
-     axes = TRUE, breaks = seq(0, 1, by = 0.1), reset = FALSE)
-plot(coast, col = "orange", lwd = 2, add = TRUE)</code></pre>
-<p>Hmmm, that’s pretty different than what the nowcast predicts.</p>
-</section>
-</section>
-<section id="time-series" class="level1" data-number="16">
-<h1 data-number="16"><span class="header-section-number">16</span> Time series</h1>
-<p>It would be nice to see a time series: current, 2055 and 2075 on the same graphic. Let’s load RCP85 2055 data, and make yet another prediction.</p>
-<section id="forecast-2055" class="level2" data-number="16.1">
-<h2 data-number="16.1" class="anchored" data-anchor-id="forecast-2055"><span class="header-section-number">16.1</span> Forecast 2055</h2>
-<pre class="{r load_2055_RCP85}"><code>covars_rcp85_2055 = read_brickman(db |&gt; filter(scenario == "RCP85", year == 2055, interval == "mon")) |&gt;
-  select(all_of(cfg$keep_vars)) |&gt;
-  slice("month", "Aug") 
-forecast_2055 = predict_stars(wflow, covars_rcp85_2055)
-forecast_2055</code></pre>
-</section>
-<section id="bind-time-series" class="level2" data-number="16.2">
-<h2 data-number="16.2" class="anchored" data-anchor-id="bind-time-series"><span class="header-section-number">16.2</span> Bind time series</h2>
-<p>We want to bind the <code>.pred_presence</code> attribute for each of the predictions (nowcast, forecast_2055 and forecast_2075). Let’s assume the “present” means 2020 so we can assign a year.</p>
-<pre class="{r bind}"><code>rcp85 = c(nowcast, forecast_2055, forecast_2075, along = list(year = c("2020", "2055", "2075")))</code></pre>
-<div class="callout callout-style-default callout-note callout-titled">
-<div class="callout-header d-flex align-content-center">
-<div class="callout-icon-container">
-<i class="callout-icon"></i>
-</div>
-<div class="callout-title-container flex-fill">
-Note
-</div>
-</div>
-<div class="callout-body-container callout-body">
-<p>Curious about why we provide year as a vector of characters instead of a vector of integers? Try running the command above again and check out the 3rd dimension.</p>
-</div>
-</div>
-<p>Since we are plotting multiple arrays, we need to plot the coastline using a “hook” function.</p>
-<pre class="{r plot_hsi_series}"><code>plot_coast = function(){
-  plot(coast, col = "orange", lwd = 2, add = TRUE)
-}
-
-plot(rcp85['.pred_presence'], 
-     hook = plot_coast,
-     axes = TRUE, breaks = seq(0, 1, by = 0.1), join_zlim  = TRUE, reset = FALSE)</code></pre>
-<p>Hmmmm. Why does there seem to be a strong shift between 2020 and 2055, while the 2055 to 2075 shift seems less pronounced?</p>
-<div class="callout callout-style-default callout-note callout-titled">
-<div class="callout-header d-flex align-content-center">
-<div class="callout-icon-container">
-<i class="callout-icon"></i>
-</div>
-<div class="callout-title-container flex-fill">
-Note
-</div>
-</div>
-<div class="callout-body-container callout-body">
-<p>Don’t forget that <a href="https://github.com/BigelowLab/ColbyForecasting2025/wiki/Spatial">there are other ways</a> to plot array based spatial data.</p>
-</div>
-</div>
-</section>
-<section id="save-the-predictions" class="level2" data-number="16.3">
-<h2 data-number="16.3" class="anchored" data-anchor-id="save-the-predictions"><span class="header-section-number">16.3</span> Save the predictions</h2>
-<p>We could save all three attributes, but <code>.pred_background</code> is just <code>1 - .pred_presence</code>, and <code>.pred</code> is just coding “presence” where <code>.pred_presence &gt;= 0.5</code>, so we can always compute those as needed if we have <code>.pred_presence</code>. In that case, let’s just save the first attribute, <code>.pred_presence</code>, in a multilayer <a href="https://en.wikipedia.org/wiki/GeoTIFF">GeoTIFF</a> formatted image array file. The <code>write_prediction()</code> function will do just that.</p>
-<pre class="{r save_pred}"><code># make sure the output directory exists
-path = data_path("predictions")
-if (!dir.exists(path)) ok = dir.create(path, recursive = TRUE)
-
-# write individual arrays?
-write_prediction(nowcast, file = file.path(path,"g_Aug_RCP85_2020.tif"))
-write_prediction(forecast_2055, file = file.path(path, "g_Aug_RCP85_2055.tif"))
-write_prediction(forecast_2075, file = file.path(path, "g_Aug_RCP85_2075.tif"))
-
-# or write them together in a "multi-layer" file?
-write_prediction(rcp85, file = file.path(path, "g_Aug_RCP85_all.tif"))</code></pre>
-<p>To read it back simply provide the filename to <code>read_prediction()</code>. If you are reading back a multi-layer array, be sure to check out the <code>time</code> argument to assign values to the time dimension. Single layer arrays don’t have the concept of time so the <code>time</code> argument is ignored.</p>
-</section>
-</section>
-<section id="recap-1" class="level1" data-number="17">
-<h1 data-number="17"><span class="header-section-number">17</span> Recap</h1>
-<p>We made both a nowcast and a number of predictions using a previously saved workflow. Contrary to Yogi Berra’s claim, it’s actually pretty easy to predict the future. Perhaps more challenging is to interpret the prediction. We bundled these together to make time series plots, and we saved the <code>.pred_presence</code> values.</p>
-</section>
-<section id="coding-assignment-1" class="level1" data-number="18">
-<h1 data-number="18"><span class="header-section-number">18</span> Coding Assignment</h1>
-<p>For each climate scenario create a monthly forecast (so that’s three: nowcast, forecast_2055 and forecast_2075) and save each in your <code>predictions</code> directory. Whether you choose to draw upon the greedy background sampling method, the conservative background sampling method or both is up to you. Keep in mind that some months may not have enough data to model without throwing an error. We suggest that you wrap your critical steps in a <code>try()</code> function which will catch the error without crashing your iterator. There is a tutorial on <a href="https://bigelowlab.github.io/handytandy/try.html">error catching</a> that specifically uses <code>try()</code>.</p>
 </section>
 
 <a onclick="window.scrollTo(0, 0); return false;" role="button" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a></main> <!-- /main -->
diff --git a/docs/C04_models_files/figure-html/conf_mat_test-1.png b/docs/C04_models_files/figure-html/conf_mat_test-1.png
index 2e58416..2724a00 100644
Binary files a/docs/C04_models_files/figure-html/conf_mat_test-1.png and b/docs/C04_models_files/figure-html/conf_mat_test-1.png differ
diff --git a/docs/C04_models_files/figure-html/pd_plot-1.png b/docs/C04_models_files/figure-html/pd_plot-1.png
index ede9226..3d893de 100644
Binary files a/docs/C04_models_files/figure-html/pd_plot-1.png and b/docs/C04_models_files/figure-html/pd_plot-1.png differ
diff --git a/docs/C04_models_files/figure-html/pd_plot_test-1.png b/docs/C04_models_files/figure-html/pd_plot_test-1.png
index 26f6b24..68e080d 100644
Binary files a/docs/C04_models_files/figure-html/pd_plot_test-1.png and b/docs/C04_models_files/figure-html/pd_plot_test-1.png differ
diff --git a/docs/C04_models_files/figure-html/plot_confmat_train-1.png b/docs/C04_models_files/figure-html/plot_confmat_train-1.png
index 6c69f54..72f6c7b 100644
Binary files a/docs/C04_models_files/figure-html/plot_confmat_train-1.png and b/docs/C04_models_files/figure-html/plot_confmat_train-1.png differ
diff --git a/docs/C04_models_files/figure-html/plot_roc-1.png b/docs/C04_models_files/figure-html/plot_roc-1.png
index 36c20ff..1c9932a 100644
Binary files a/docs/C04_models_files/figure-html/plot_roc-1.png and b/docs/C04_models_files/figure-html/plot_roc-1.png differ
diff --git a/docs/C04_models_files/figure-html/plot_roc_test-1.png b/docs/C04_models_files/figure-html/plot_roc_test-1.png
index cda37fc..3960701 100644
Binary files a/docs/C04_models_files/figure-html/plot_roc_test-1.png and b/docs/C04_models_files/figure-html/plot_roc_test-1.png differ
diff --git a/docs/C05_prediction.html b/docs/C05_prediction.html
index 4631464..db0a68e 100644
--- a/docs/C05_prediction.html
+++ b/docs/C05_prediction.html
@@ -310,12 +310,12 @@ <h2 data-number="4.1" class="anchored" data-anchor-id="nowcast"><span class="hea
 <pre><code>stars object with 2 dimensions and 3 attributes
 attribute(s):
  .pred_presence  .pred_background         .pred     
- Min.   :0.000   Min.   :0.003     presence  : 618  
- 1st Qu.:0.031   1st Qu.:0.743     background:5168  
- Median :0.092   Median :0.908     NA's      :4983  
- Mean   :0.183   Mean   :0.817                      
- 3rd Qu.:0.257   3rd Qu.:0.969                      
- Max.   :0.997   Max.   :1.000                      
+ Min.   :0.000   Min.   :0.000     presence  : 584  
+ 1st Qu.:0.035   1st Qu.:0.748     background:5202  
+ Median :0.097   Median :0.903     NA's      :4983  
+ Mean   :0.184   Mean   :0.816                      
+ 3rd Qu.:0.252   3rd Qu.:0.965                      
+ Max.   :1.000   Max.   :1.000                      
  NA's   :4983    NA's   :4983                       
 dimension(s):
   from  to offset    delta refsys point x/y
@@ -366,12 +366,12 @@ <h2 data-number="4.2" class="anchored" data-anchor-id="forecast"><span class="he
 <pre><code>stars object with 2 dimensions and 3 attributes
 attribute(s):
  .pred_presence  .pred_background         .pred     
- Min.   :0.000   Min.   :0.302     presence  :  37  
- 1st Qu.:0.137   1st Qu.:0.689     background:5749  
- Median :0.257   Median :0.743     NA's      :4983  
- Mean   :0.228   Mean   :0.772                      
- 3rd Qu.:0.311   3rd Qu.:0.863                      
- Max.   :0.698   Max.   :1.000                      
+ Min.   :0.000   Min.   :0.323     presence  :  36  
+ 1st Qu.:0.141   1st Qu.:0.688     background:5750  
+ Median :0.260   Median :0.740     NA's      :4983  
+ Mean   :0.230   Mean   :0.770                      
+ 3rd Qu.:0.312   3rd Qu.:0.859                      
+ Max.   :0.677   Max.   :1.000                      
  NA's   :4983    NA's   :4983                       
 dimension(s):
   from  to offset    delta refsys point x/y
@@ -410,12 +410,12 @@ <h2 data-number="5.1" class="anchored" data-anchor-id="forecast-2055"><span clas
 <pre><code>stars object with 2 dimensions and 3 attributes
 attribute(s):
  .pred_presence  .pred_background         .pred     
- Min.   :0.000   Min.   :0.447     presence  :   6  
- 1st Qu.:0.122   1st Qu.:0.694     background:5780  
- Median :0.251   Median :0.749     NA's      :4983  
- Mean   :0.221   Mean   :0.779                      
- 3rd Qu.:0.306   3rd Qu.:0.878                      
- Max.   :0.553   Max.   :1.000                      
+ Min.   :0.000   Min.   :0.425     presence  :  31  
+ 1st Qu.:0.135   1st Qu.:0.682     background:5755  
+ Median :0.263   Median :0.737     NA's      :4983  
+ Mean   :0.231   Mean   :0.769                      
+ 3rd Qu.:0.318   3rd Qu.:0.865                      
+ Max.   :0.575   Max.   :1.000                      
  NA's   :4983    NA's   :4983                       
 dimension(s):
   from  to offset    delta refsys point x/y
diff --git a/docs/C05_prediction_files/figure-html/plot_class_labels-1.png b/docs/C05_prediction_files/figure-html/plot_class_labels-1.png
index 57cfd52..5824595 100644
Binary files a/docs/C05_prediction_files/figure-html/plot_class_labels-1.png and b/docs/C05_prediction_files/figure-html/plot_class_labels-1.png differ
diff --git a/docs/C05_prediction_files/figure-html/plot_forecast-1.png b/docs/C05_prediction_files/figure-html/plot_forecast-1.png
index ad89cd1..c272b9c 100644
Binary files a/docs/C05_prediction_files/figure-html/plot_forecast-1.png and b/docs/C05_prediction_files/figure-html/plot_forecast-1.png differ
diff --git a/docs/C05_prediction_files/figure-html/plot_hsi_series-1.png b/docs/C05_prediction_files/figure-html/plot_hsi_series-1.png
index 0f96491..ee49823 100644
Binary files a/docs/C05_prediction_files/figure-html/plot_hsi_series-1.png and b/docs/C05_prediction_files/figure-html/plot_hsi_series-1.png differ
diff --git a/docs/C05_prediction_files/figure-html/plot_nowcast-1.png b/docs/C05_prediction_files/figure-html/plot_nowcast-1.png
index c66eab8..b8c4b5f 100644
Binary files a/docs/C05_prediction_files/figure-html/plot_nowcast-1.png and b/docs/C05_prediction_files/figure-html/plot_nowcast-1.png differ
diff --git a/docs/search.json b/docs/search.json
index 7978d52..c069ffa 100644
--- a/docs/search.json
+++ b/docs/search.json
@@ -1,342 +1,342 @@
 [
   {
-    "objectID": "C02_background.html",
-    "href": "C02_background.html",
-    "title": "Background",
+    "objectID": "about.html",
+    "href": "about.html",
+    "title": "About",
     "section": "",
-    "text": "Traditional ecological surveys are systematic: for a given species, survey data sets tell us where the species is found and where it is absent. Using an observational data set (like OBIS) we only know where the species is found, which leaves us guessing about where they might not be found. This difference is what distinguishes a presence-absence data set from a presence-only data set, and this difference guides the modeling process.\nWhen we model, we are trying to define the environs where we should expect to find a species as well as the environs we would not expect to find a species. We have in hand the locations of observations, and we can extract the environmental data at those locations. But to characterize the less suitable environments we are going to have to sample what is called “background”. We want these background samples to roughly match the regional preferences of the observations; that is we want to avoid having observations that are mostly over Georges Bank while our background samples are primarily around the Bay of Fundy.",
+    "text": "Brought to you by the Tandy Center for Ocean Forecasting at Bigelow Laboratory for Ocean Science and Colby College.\n\n1 Contacts\nDr. Nick Record\nBen Tupper\nRaising questions or issues: If you have a question, start a new “issue” on the github issues tab. If a question has been posed by another, and you think you can help with the answer then please feel free to respond.\n\n\n2 Website\nWe build the website using quarto which is perfect for transforming [RMarkdown](https://rmarkdown.rstudio.com/) pages into a website with minimal investment. See this wiki page if you would like to add your work to your own fork of the class repository.\n\n\n\n\n Back to top",
     "crumbs": [
-      "Background"
+      "About"
     ]
   },
   {
-    "objectID": "C02_background.html#sample-background",
-    "href": "C02_background.html#sample-background",
-    "title": "Background",
-    "section": "2.1 Sample background",
-    "text": "2.1 Sample background\nWhen we sample the background, we are creating the input for the model if we request that the observations (presences) are joined with the background.\nNext we sample the background as guided by the density map. We’ll ask for 2x as many presences, but it is just a request. We also request that no background point be further than 30km (30000m) from it’s closest presence point.\n\ngreedy_input = sample_background(obs, mask, \n                              n = 2 * nrow(obs),\n                              class_label = \"background\",\n                              method = c(\"dist_max\", 30000),\n                              return_pres = TRUE)\n\nWarning in sample_background(obs, mask, n = 2 * nrow(obs), class_label = \"background\", : There are fewer available cells for raster 'NA' (2459 presences) than the requested 4918 background points. Only 4818 will be returned.\n\ngreedy_input\n\nSimple feature collection with 7277 features and 1 field\nGeometry type: POINT\nDimension:     XY\nBounding box:  xmin: -74.89169 ymin: 38.805 xmax: -65.02004 ymax: 45.21401\nGeodetic CRS:  WGS 84\n# A tibble: 7,277 × 2\n   class                geometry\n * &lt;fct&gt;             &lt;POINT [°]&gt;\n 1 presence    (-72.8074 39.056)\n 2 presence      (-71.343 40.52)\n 3 presence  (-68.7691 41.52448)\n 4 presence       (-67.79 43.32)\n 5 presence (-68.44324 42.61177)\n 6 presence    (-72.4328 40.213)\n 7 presence   (-71.8784 40.3569)\n 8 presence      (-65.78 43.195)\n 9 presence       (-70.5 42.767)\n10 presence   (-72.3024 40.1862)\n# ℹ 7,267 more rows\n\n\nYou may encounter a warning message that says, “There are fewer available cells for raster…”. This is useful information, there simply weren’t a lot of non-NA cells to sample from. 
Let’s plot this.\n\nplot(greedy_input['class'], \n     axes = TRUE,  \n     pch = \".\", \n     extent = mask, \n     main = \"August greedy class distribution\",\n     reset = FALSE)\nplot(coast, col = \"orange\", add = TRUE)\n\n\n\n\n\n\n\n\nHmmm, let’s tally the class labels.\n\ncount(greedy_input, class)\n\nSimple feature collection with 2 features and 2 fields\nGeometry type: MULTIPOINT\nDimension:     XY\nBounding box:  xmin: -74.89169 ymin: 38.805 xmax: -65.02004 ymax: 45.21401\nGeodetic CRS:  WGS 84\n# A tibble: 2 × 3\n  class          n                                                      geometry\n* &lt;fct&gt;      &lt;int&gt;                                              &lt;MULTIPOINT [°]&gt;\n1 presence    2459 ((-65.07 42.68), (-65.067 42.65), (-65.05 42.583), (-65.05 4…\n2 background  4818 ((-65.02004 42.25251), (-65.02004 42.74609), (-65.1023 42.66…\n\n\nWell, that’s imbalanced with a different number presences than background points. But, on the bright side, the background points are definitely in the region of observations.",
+    "objectID": "C04_models.html",
+    "href": "C04_models.html",
+    "title": "Models",
+    "section": "",
+    "text": "All models are wrong, but some are useful.\n\nGeorge Box\nModeling starts with a collection of observations (presence and background for us!) and ends up with a collection of coefficients that can be used with one or more formulas to make a prediction for the past, the present or the future. We are using modeling specifically to make habitat suitability maps for select species under two climate scenarios (RCP45 and RCP85) at two different times (2055 and 2075) in the future.\nWe can choose from a number of different models: random forest “rf”, maximum entropy “maxent” or “maxnet”, boosted regression trees “brt”, general linear models “glm”, etc. The point of each is to make a mathematical representation of natural occurrences. It is important to consider what those occurrences might be - categorical like labels? likelihoods like probabilities? continuous like measurements? Here are examples of each…\nWe are modeling with known observations (presences) and a sampling of the background, so we are trying to model a likelihood that a species will be encountered (and reported) relative to the environmental conditions. We are looking for a model that can produce relative likelihood of an encounter that results in a report.\nWe’ll be using a random forest model (rf). We were inspired to follow this route by using this tidy models tutorial prepared by our colleague Omi Johnson.",
     "crumbs": [
-      "Background"
+      "Models"
     ]
   },
   {
-    "objectID": "C02_background.html#a-function-we-can-reuse",
-    "href": "C02_background.html#a-function-we-can-reuse",
-    "title": "Background",
-    "section": "5.1 A function we can reuse",
-    "text": "5.1 A function we can reuse\nHere we make a function that needs at least three arguments: the complete set of observations, the mask used for sampling (and possibly thinning) and the month to filter the observations. The pseudo-code might look like this…\nfor a given month\n  filter the obs for that month\n  make the greedy model input by sampling the background\n    save the greedy model input\n  thin the obs\n  make the conservative model input by sampling background\n    save the conservative model input\n  return a list the greedy and conservative model inputs\nPhew! That’s a lot of steps. To manually run those steps 12 times would be tedious, so we roll that into a function that we can reuse 12 times instead.\nThis function will have a name, make_model_input_by_month. It’s a long name, but it makes it obvious what it does. First we start with the documentation.\n\n#' Builds greedy and conservative model input data sets for a given month\n#' \n#' @param mon chr the month abbreviation for the month of interest (\"Jan\" by default)\n#' @param obs table, the complete observation data set\n#' @param raster stars, the object that defines the sampling space, usually a mask\n#' @param species chr, the name of the species prepended to the name of the output files.\n#'   (By default \"Mola mola\" which gets converted to \"Mola_mola\")\n#' @param path the output data path to store this data (be default \"model_input\")\n#' @param min_obs num this sets a threshold below which we wont try to make a model. 
(Default is 3)\n#' @return a named two element list of greedy and conservative model inputs - they are tables\nmake_model_input_by_month  = function(mon = \"Jan\",\n                                      obs = read_observations(\"Mola mola\"),\n                                      raster = NULL,\n                                      species = \"Mola mola\",\n                                      path = data_path(\"model_input\"),\n                                      min_obs = 3){\n  # the user *must* provide a raster\n  if (is.null(raster)) stop(\"please provide a raster\")\n  # filter the obs\n  obs = obs |&gt;\n    filter(month == mon[1])\n  \n  # check that we have at least some records, if not enough then alert the user\n  # and return NULL\n  if (nrow(obs) &lt; min_obs){\n    warning(\"sorry, this month has too few records: \", mon)\n    return(NULL)\n  }\n  \n  # make sure the output path exists, if not, make it\n  make_path(path)\n  \n  \n  # make the greedy model input by sampling the background\n  greedy_input = sample_background(obs, raster,\n                                   n = 2 * nrow(obs),\n                                   class_label = \"background\",\n                                   method = c(\"dist_max\", 30000),\n                                   return_pres = TRUE)\n  # save the greedy data\n  filename = sprintf(\"%s-%s-greedy_input.gpkg\", \n                     gsub(\" \", \"_\", species),\n                     mon)\n  write_sf(greedy_input, file.path(path, filename))\n  \n  # thin the obs\n  obs = thin_by_cell(obs, raster)\n  \n  # make the conservative model\n  conservative_input = sample_background(obs, raster,\n                                   n = 2 * nrow(obs),\n                                   class_label = \"background\",\n                                   method = c(\"dist_max\", 30000),\n                                   return_pres = TRUE)\n  \n  # save the conservative data\n  filename = 
sprintf(\"%s-%s-conservative_input.gpkg\", \n                     gsub(\" \", \"_\", species),\n                     mon)\n  write_sf(conservative_input, file.path(path,filename))\n  \n  # make a list\n  r = list(greedy = greedy_input, conservative = conservative_input)\n  \n  # return, but disable automatic printing\n  invisible(r)\n}",
+    "objectID": "C04_models.html#modifying-the-recipe-with-steps",
+    "href": "C04_models.html#modifying-the-recipe-with-steps",
+    "title": "Models",
+    "section": "5.1 Modifying the recipe with steps",
+    "text": "5.1 Modifying the recipe with steps\nSteps are cumulative modifications, and that means the order in which they are added matters. These steps comprise the bulk of pre-processing steps.\nSome modifications are applied row-by-row. For example, rows of the input modeling data that have one or more missing values (NAs) can be problematic and they should be removed.\nOther modifications are to manipulate entire columns. Sometimes the recipe requires subsequent steps before the modeling begins in earnest. For example we know from experience that it is often useful to log scale (base 10) depth when working with biological models. If depth and Xbtm have made it this far, you’ll note that each range over 4 or more orders of magnitude. That’s not a problem by itself, but it can introduce a bias toward larger values whenever the mean is computed. So, we’ll add a step for log scaling these, but only if depth and Xbtm have made it this far (this may vary by species.)\n\nrec = rec |&gt; \n  step_naomit()\nif (\"depth\" %in% cfg$keep_vars){\n  rec = rec |&gt;\n    step_log(depth,  base = 10)\n}\nif (\"Xbtm\" %in% cfg$keep_vars){\n  rec = rec |&gt;\n    step_log(Xbtm,  base = 10)\n}\nrec\n\n\n\n\n── Recipe ──────────────────────────────────────────────────────────────────────\n\n\n\n\n\n── Inputs \n\n\nNumber of variables by role\n\n\noutcome:   1\npredictor: 7\n\n\n\n\n\n── Operations \n\n\n• Removing rows with NA values in: &lt;none&gt;\n\n\nNext we state that we want to remove variables that might be highly correlated with other variables. If two variables are highly correlated, they will not provide the modeling system with more information, just redundant information which doesn’t necessarily help. step_corr() accepts a variety of arguments specifying which variables to test for correlation including some convenience selectors like all_numeric(), all_string() and friends. 
We want all predictors which happen to all be numeric, so we can use all_predictors() or all_numeric_predictors(). Specificity is better than generality so let’s choose numeric predictors.\n\n\n\n\n\n\nNote\n\n\n\nWe have already tested variables for high collinearity, but here we can add a slightly different filter, high correlation, for the same issue. Since we have dealt with this already we shouldn’t expect that this step will change the preprocessing very much. But it is instructive to see it in action.\n\n\n\nrec = rec |&gt; \n  step_corr(all_numeric_predictors())\nrec\n\n\n\n\n── Recipe ──────────────────────────────────────────────────────────────────────\n\n\n\n\n\n── Inputs \n\n\nNumber of variables by role\n\n\noutcome:   1\npredictor: 7\n\n\n\n\n\n── Operations \n\n\n• Removing rows with NA values in: &lt;none&gt;\n\n\n• Correlation filter on: all_numeric_predictors()",
     "crumbs": [
-      "Background"
+      "Models"
     ]
   },
   {
-    "objectID": "C03_covariates.html",
-    "href": "C03_covariates.html",
-    "title": "Prediction",
-    "section": "",
-    "text": "“In the end that was the choice you made, and it doesn’t matter how hard it was to make it. It matters that you did.”\n\nCassandra Clare\nNow we turn our attention to what we know and guess about the environments. We are using the Brickman data to make habitat suitability maps for select species under two climate scenarios (RCP45 and RCP85) at two different times (2055 and 2075) in the future. Each variable we might use is called a covariate or predictor. Our covariates are nicely packaged up and tidy, but the reality is that it often requires a good deal of data wrangling if the data are messy.\nOur step here is to make sure that two or more covariates are not highly correlated; if they are, then we would likely want to drop all but one.",
+    "objectID": "C04_models.html#add-the-recipe-to-the-workflow",
+    "href": "C04_models.html#add-the-recipe-to-the-workflow",
+    "title": "Models",
+    "section": "5.2 Add the recipe to the workflow",
+    "text": "5.2 Add the recipe to the workflow\n\nwflow = wflow |&gt;\n  add_recipe(rec)\nwflow\n\n══ Workflow ════════════════════════════════════════════════════════════════════\nPreprocessor: Recipe\nModel: None\n\n── Preprocessor ────────────────────────────────────────────────────────────────\n2 Recipe Steps\n\n• step_naomit()\n• step_corr()",
     "crumbs": [
-      "Covariates"
+      "Models"
     ]
   },
   {
-    "objectID": "C03_covariates.html#reading-in-the-covariates",
-    "href": "C03_covariates.html#reading-in-the-covariates",
-    "title": "Prediction",
-    "section": "2.1 Reading in the covariates",
-    "text": "2.1 Reading in the covariates\nWe’ll read in the Brickman database, then filter two different subsets to read: “STATIC” covariate bathymetry that apply across all scenarios and times and monthly covariates for the “PRESENT” period. Note that depth is automatically included - that’s an option - see ?read_brickman for more information.\n\ndb = brickman_database()\npresent = read_brickman(filter(db, scenario == \"PRESENT\", interval == \"mon\"))\n\nWe have used August before as our example, let’s continue with August.\n\naug = present |&gt;\n  dplyr::slice(\"month\", \"Aug\")",
+    "objectID": "C04_models.html#create-the-model",
+    "href": "C04_models.html#create-the-model",
+    "title": "Models",
+    "section": "6.1 Create the model",
+    "text": "6.1 Create the model\nWe create a random forest model, declare that it should be run in classification mode (not regression mode), and then specify that we want to use the ranger modeling engine (as opposed to, say, the randomForest engine). We additionally specify that it should be able to produce probabilities of a class not just the class label. We also request that it saves bits of info so that we can compare the relative importance of the covariates.\n\nmodel = rand_forest() |&gt;\n  set_mode(\"classification\") |&gt;\n  set_engine(\"ranger\", probability = TRUE, importance = \"permutation\") \nmodel\n\nRandom Forest Model Specification (classification)\n\nEngine-Specific Arguments:\n  probability = TRUE\n  importance = permutation\n\nComputational engine: ranger \n\n\nWell, that feels underwhelming. We can pass arguments unique to the engine using the set_args() function, but, for now we’ll accept the defaults.",
     "crumbs": [
-      "Covariates"
+      "Models"
     ]
   },
   {
-    "objectID": "C03_covariates.html#make-a-pairs-plot",
-    "href": "C03_covariates.html#make-a-pairs-plot",
-    "title": "Prediction",
-    "section": "2.2 Make a pairs plot",
-    "text": "2.2 Make a pairs plot\nA pairs plot is a plot often used in exploratory data analysis. It makes a grid of mini-plots of a set of variables, and reveals the relationships among the variables pair-by-pair. It’s easy to make.\n\npairs(aug)\n\n\n\n\n\n\n\n\nIn the lower left portion of the plot we see paired scatter plots, at upper right we see the correlation values of the pairs, and long the diagonal we see a histogram of each variable. Some pairs are highly correlated, say over 0.7, and to include both in the modeling might not provide us with greater predictive power. It may feel counterintuitive to remove any variables - more data means more information, right? And more information means more informed models. Consider two measurements, human arm length and inseam. We might use these to predict if a person is tall, but since they are probably strongly collinear/correlated do we really need both?",
+    "objectID": "C04_models.html#add-the-model-to-the-workflow",
+    "href": "C04_models.html#add-the-model-to-the-workflow",
+    "title": "Models",
+    "section": "6.2 Add the model to the workflow",
+    "text": "6.2 Add the model to the workflow\nNow we simply add the model to the workflow.\n\nwflow = wflow |&gt;\n  add_model(model)\nwflow\n\n══ Workflow ════════════════════════════════════════════════════════════════════\nPreprocessor: Recipe\nModel: rand_forest()\n\n── Preprocessor ────────────────────────────────────────────────────────────────\n2 Recipe Steps\n\n• step_naomit()\n• step_corr()\n\n── Model ───────────────────────────────────────────────────────────────────────\nRandom Forest Model Specification (classification)\n\nEngine-Specific Arguments:\n  probability = TRUE\n  importance = permutation\n\nComputational engine: ranger",
     "crumbs": [
-      "Covariates"
+      "Models"
     ]
   },
   {
-    "objectID": "C03_covariates.html#identify-the-most-independent-variables-and-the-most-collinear",
-    "href": "C03_covariates.html#identify-the-most-independent-variables-and-the-most-collinear",
-    "title": "Prediction",
-    "section": "2.3 Identify the most independent variables (and the most collinear)",
-    "text": "2.3 Identify the most independent variables (and the most collinear)\nWe have a function that can help use select which variables to remove. filter_collinear() returns a listing of variables it suggests we keep. It attaches to the return value an attribute (like a post-it note stuck on a box) that lists the complementary variables that it suggests we drop. We are choosing a particular method, but you can learn more about using R’s help for ?filter_collinear.\n\nkeep = filter_collinear(aug, method = \"vif_step\")\nkeep\n\n[1] \"MLD\"  \"Sbtm\" \"SSS\"  \"SST\"  \"Tbtm\" \"U\"    \"V\"   \nattr(,\"to_remove\")\n[1] \"Xbtm\"  \"depth\"\n\n\nOf course, we can decide to ignore this advice, and pick which ever ones we want including keeping them all.\nWhatever selection of variables we decide to model with, we will save this listing to a file. That way we can refer to it progammatically. But that comes later.",
+    "objectID": "C04_models.html#predict-with-the-training-data",
+    "href": "C04_models.html#predict-with-the-training-data",
+    "title": "Models",
+    "section": "8.1 Predict with the training data",
+    "text": "8.1 Predict with the training data\nFirst we shall predict with the same data we trained with. The results of this will not really tell us much about our model as it is very circular to predict using the very data used to build the model. So this next section is more about a first pass at using the tools at your disposal.\n\ntrain_pred = predict_table(fitted_wflow, tr_data, type = \"prob\")\ntrain_pred\n\n# A tibble: 5,453 × 4\n   .pred_presence .pred_background .pred      class     \n            &lt;dbl&gt;            &lt;dbl&gt; &lt;fct&gt;      &lt;fct&gt;     \n 1        0.00499            0.995 background background\n 2        0.315              0.685 background background\n 3        0.180              0.820 background background\n 4        0.142              0.858 background background\n 5        0.00261            0.997 background background\n 6        0.0324             0.968 background background\n 7        0.0295             0.971 background background\n 8        0.0451             0.955 background background\n 9        0.0469             0.953 background background\n10        0.0810             0.919 background background\n# ℹ 5,443 more rows\n\n\nHere the variables prepended with a dot . are computed, while the class variable is our original. There are many metrics we can use to determine how well this model predicts. Let’s start with the simplest thing… we can make a simply tally of .pred and class.\n\ncount(train_pred, .pred, class)\n\n# A tibble: 4 × 3\n  .pred      class          n\n  &lt;fct&gt;      &lt;fct&gt;      &lt;int&gt;\n1 presence   presence    1353\n2 presence   background   333\n3 background presence     487\n4 background background  3280\n\n\nThere false positives and false negatives, but many are correct. Of course, this is predicting with the very data we used to train the model; knowing that this is predicicting on training data with some many misses might not inspire confidence. But let’s explore more.",
     "crumbs": [
-      "Covariates"
+      "Models"
     ]
   },
   {
-    "objectID": "C03_covariates.html#a-closer-look-at-the-model-input-data",
-    "href": "C03_covariates.html#a-closer-look-at-the-model-input-data",
-    "title": "Prediction",
-    "section": "2.4 A closer look at the model input data",
-    "text": "2.4 A closer look at the model input data\nBefore we do commit to a selection of variables, let’s turn our attention back to our presence-background points, and look at just those chosen values rather than at values drawn form across the entire domain. Let’s open the file that contains the “greedy” model input for August during the PRESENT climate scenario.\n\nmodel_input = read_model_input(scientificname = \"Mola mola\", \n                               approach = \"greedy\", \n                               mon = \"Aug\")\nmodel_input\n\nSimple feature collection with 7277 features and 1 field\nGeometry type: POINT\nDimension:     XY\nBounding box:  xmin: -74.89169 ymin: 38.805 xmax: -65.02004 ymax: 45.21401\nGeodetic CRS:  WGS 84\n# A tibble: 7,277 × 2\n   class                    geom\n   &lt;chr&gt;             &lt;POINT [°]&gt;\n 1 presence    (-72.8074 39.056)\n 2 presence      (-71.343 40.52)\n 3 presence  (-68.7691 41.52448)\n 4 presence       (-67.79 43.32)\n 5 presence (-68.44324 42.61177)\n 6 presence    (-72.4328 40.213)\n 7 presence   (-71.8784 40.3569)\n 8 presence      (-65.78 43.195)\n 9 presence       (-70.5 42.767)\n10 presence   (-72.3024 40.1862)\n# ℹ 7,267 more rows\n\n\nNext we’ll extract data values from our August covariates.\n\nvariables = extract_brickman(aug, model_input, form = \"wide\")\nvariables\n\n# A tibble: 7,277 × 10\n   point   MLD  Sbtm   SSS   SST  Tbtm         U         V     Xbtm depth\n   &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt; &lt;dbl&gt;\n 1 p0001  5.17  35.0  31.6  23.3  7.50 -0.00161  -0.00340  0.00133  304. \n 2 p0002  4.25  32.8  30.6  21.6  8.15 -0.00420  -0.00206  0.00166   71.6\n 3 p0003  4.64  34.0  30.7  20.2  7.05  0.00168   0.00148  0.000793 138. \n 4 p0004  5.58  34.6  30.7  18.8  7.55  0.00267  -0.000410 0.000957 234. \n 5 p0005  5.04  34.7  30.7  19.0  7.43 -0.00619  -0.00121  0.00224  205. 
\n 6 p0006  4.01  32.4  30.6  22.0  8.22 -0.00344  -0.000859 0.00126   62.6\n 7 p0007  4.10  32.9  30.5  21.8  8.34 -0.00565  -0.00226  0.00216   71.3\n 8 p0008  3.82  32.4  30.3  18.2  3.56 -0.00702  -0.00431  0.00293   81.6\n 9 p0009  3.20  32.4  30.6  17.9  5.73  0.000275 -0.00101  0.000372  70.6\n10 p0010  4.02  32.9  30.6  22.0  8.62 -0.000900 -0.00148  0.000614  64.9\n# ℹ 7,267 more rows\n\n\nWe are going to call a plotting function, plot_pres_vs_bg(), that wants some of the data from model_input and some of the data in variables. So, we have to do some data wrangling to combine those; we’ll add class to variables and then drop the point column.\n\nvariables = variables |&gt;\n  mutate(class = model_input$class) |&gt;    # the $ extracts a column \n  select(-point)                          # the - means \"deselect\" or \"drop\"\nvariables\n\n# A tibble: 7,277 × 10\n     MLD  Sbtm   SSS   SST  Tbtm         U         V     Xbtm depth class   \n   &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt;   \n 1  5.17  35.0  31.6  23.3  7.50 -0.00161  -0.00340  0.00133  304.  presence\n 2  4.25  32.8  30.6  21.6  8.15 -0.00420  -0.00206  0.00166   71.6 presence\n 3  4.64  34.0  30.7  20.2  7.05  0.00168   0.00148  0.000793 138.  presence\n 4  5.58  34.6  30.7  18.8  7.55  0.00267  -0.000410 0.000957 234.  presence\n 5  5.04  34.7  30.7  19.0  7.43 -0.00619  -0.00121  0.00224  205.  
presence\n 6  4.01  32.4  30.6  22.0  8.22 -0.00344  -0.000859 0.00126   62.6 presence\n 7  4.10  32.9  30.5  21.8  8.34 -0.00565  -0.00226  0.00216   71.3 presence\n 8  3.82  32.4  30.3  18.2  3.56 -0.00702  -0.00431  0.00293   81.6 presence\n 9  3.20  32.4  30.6  17.9  5.73  0.000275 -0.00101  0.000372  70.6 presence\n10  4.02  32.9  30.6  22.0  8.62 -0.000900 -0.00148  0.000614  64.9 presence\n# ℹ 7,267 more rows\n\n\nFinally, can make a specialized plot comparing our variables for each class: presence and background.\n\nplot_pres_vs_bg(variables, \"class\")\n\n\n\n\n\n\n\n\nHow does this inform our thinking about reducing the number of variables? For which variables do presence and background values mirror each other? Which have the least overlap? We know that the model works by finding optimal combinations of covariates for the species. If there is never a difference between the conditions for presences and background then how will it find the optimal niche conditions?",
+    "objectID": "C04_models.html#assess-the-model",
+    "href": "C04_models.html#assess-the-model",
+    "title": "Models",
+    "section": "8.2 Assess the model",
+    "text": "8.2 Assess the model\nHere we walk through a number of common assessment tools. We want to assess a model to ascertain how closely it models reality (or not!). Using the tools is easy; interpreting the metrics is not always easy.\n\n8.2.1 Confusion matrix\nThe confusion matrix is the next step beyond a simple tally that we made above.\n\ntrain_confmat = conf_mat(train_pred, class, .pred)\ntrain_confmat\n\n            Truth\nPrediction   presence background\n  presence       1353        333\n  background      487       3280\n\n\nYou’ll see this is the same as the simple tally we made, but it comes with handy plotting functionality (shown below). Note that a perfect model would have the upper left and lower right quadrants fully accounting for all points. The lower left quadrant shows us the number of false-negatives while the upper right quadrant shows the number of false-positives.\n\nautoplot(train_confmat, type = \"heatmap\")\n\n\n\n\n\n\n\n\n\n\n8.2.2 ROC and AUC\nThe area under the curve (AUC) of the receiver operating characteristic (ROC) curve is a common metric. AUC values range from 0 to 1, with 1 reflecting a model that faithfully predicts correctly. Technically an AUC value of 0.5 represents a random model (yup, the result of a coin flip!), so values greater than 0.5 and less than 1.0 are expected.\nFirst we can plot the ROC.\n\nplot_roc(train_pred, class, .pred_presence)\n\n\n\n\n\n\n\n\nWe can assure you from practical experience that this is an atypical ROC. Typically they are not smooth, but this smoothness is an artifact of our use of training data. If you really only need the AUC, you can use the roc_auc() function directly.\n\nroc_auc(train_pred, class,  .pred_presence)\n\n# A tibble: 1 × 3\n  .metric .estimator .estimate\n  &lt;chr&gt;   &lt;chr&gt;          &lt;dbl&gt;\n1 roc_auc binary         0.939\n\n\n\n\n8.2.3 Accuracy\nAccuracy, much like our simple tally above, tells us what fraction of the predictions are correct. 
Note that here we explicitly provide the predicted class label (not the probability).\n\naccuracy(train_pred, class, .pred)\n\n# A tibble: 1 × 3\n  .metric  .estimator .estimate\n  &lt;chr&gt;    &lt;chr&gt;          &lt;dbl&gt;\n1 accuracy binary         0.850\n\n\n\n\n8.2.4 Partial dependence plot\nPartial dependence reflects the relative contribution of each variable over its full range of values. The output is a grid of plots showing the relative distribution of the variable (bars) as well as the relative influence of the variable (line).\n\npartial_dependence_plot(fitted_wflow, data = tr_data)",
     "crumbs": [
-      "Covariates"
+      "Models"
     ]
   },
   {
-    "objectID": "C03_covariates.html#saving-a-file-to-keep-track-of-modeling-choices",
-    "href": "C03_covariates.html#saving-a-file-to-keep-track-of-modeling-choices",
-    "title": "Prediction",
-    "section": "2.5 Saving a file to keep track of modeling choices",
-    "text": "2.5 Saving a file to keep track of modeling choices\nYou may have noticed that we write a lot of things to files (aka, “writing to disk”). It’s a useful practice especially when working with a multi-step process. One particular file, a configuration file, is used frequently in data science to store information about the choices we make as we work through our project. Configuration files generally are simple text files that we can easily get the computer to read and write.\nIn R, a confguration is treated as a named list. Each element of a list is named, but beyond that there aren’t any particular rules about confugurations. You can learn more about configurations in this tutorial.\nLet’s make a confuguration list that holds 4 items: version identifier, species name, sampling approach and the names of the variables to model with.\n\ncfg = list(\n  version = \"g_Aug\",               # g for greedy!\n  scientificname = \"Mola mola\",\n  approach = \"greedy\",\n  mon = \"Aug\",\n  keep_vars =  keep)\n\nWe can access by name three ways using what is called “indexing” : using the [[ indexing brackets, using the $ indexing operator or using the getElement() function.\n\ncfg[['scientificname']]\n\n[1] \"Mola mola\"\n\ncfg[[2]]\n\n[1] \"Mola mola\"\n\ncfg$scientificname\n\n[1] \"Mola mola\"\n\ngetElement(cfg, \"scientificname\")\n\n[1] \"Mola mola\"\n\ngetElement(cfg, 2)\n\n[1] \"Mola mola\"\n\n\nNow we’ll write this list to a file. First let’s set up a pathwy where we might store these configurations, and for that matter, to store our modeling files. We’ll make a new directory, models/g008 and write the configuration there. We’ll use the famous “YAML” format to store the file. See the file functions/configuration.R for documentation on reading and writing.\n\nok = make_path(data_path(\"models\")) # make a directory for models\nwrite_configuration(cfg)            \n\nUse the Files pane to navigate to your personal data directory. 
Open the g_Aug.yaml file - this is what you configuration looks like in YAML. Fortunately we don’t mess manually with these much.",
+    "objectID": "C04_models.html#predict-with-the-testing-data",
+    "href": "C04_models.html#predict-with-the-testing-data",
+    "title": "Models",
+    "section": "8.3 Predict with the testing data",
+    "text": "8.3 Predict with the testing data\nFinally, we can repeat these steps with the testing data. This should give us better information than using the training data.\n\n8.3.1 Predict\n\ntest_data = testing(split_data)\ntest_pred = predict_table(fitted_wflow, test_data, type = \"prob\")\ntest_pred\n\n# A tibble: 1,819 × 4\n   .pred_presence .pred_background .pred      class   \n            &lt;dbl&gt;            &lt;dbl&gt; &lt;fct&gt;      &lt;fct&gt;   \n 1        0.539              0.461 presence   presence\n 2        0.361              0.639 background presence\n 3        0.648              0.352 presence   presence\n 4        0.853              0.147 presence   presence\n 5        0.221              0.779 background presence\n 6        0.416              0.584 background presence\n 7        0.445              0.555 background presence\n 8        0.379              0.621 background presence\n 9        0.00643            0.994 background presence\n10        0.438              0.562 background presence\n# ℹ 1,809 more rows\n\n\n\n\n8.3.2 Confusion matrix\n\ntest_confmat = conf_mat(test_pred, class, .pred)\nautoplot(test_confmat, type = \"heatmap\")\n\n\n\n\n\n\n\n\n\n\n8.3.3 ROC/AUC\n\nplot_roc(test_pred, class, .pred_presence)\n\n\n\n\n\n\n\n\nThis ROC is more typical of what we see in regular practice.\n\n\n8.3.4 Accuracy\n\naccuracy(test_pred, class, .pred)\n\n# A tibble: 1 × 3\n  .metric  .estimator .estimate\n  &lt;chr&gt;    &lt;chr&gt;          &lt;dbl&gt;\n1 accuracy binary         0.713\n\n\n\n\n8.3.5 Partial Dependence\n\npartial_dependence_plot(fitted_wflow, data = test_data)",
     "crumbs": [
-      "Covariates"
+      "Models"
     ]
   },
   {
-    "objectID": "C03_covariates.html#modifying-the-recipe-with-steps",
-    "href": "C03_covariates.html#modifying-the-recipe-with-steps",
-    "title": "Prediction",
-    "section": "9.1 Modifying the recipe with steps",
-    "text": "9.1 Modifying the recipe with steps\nSteps are cumulative modifications, and that means the order in which they are added matters. These steps comprise the bulk of pre-processing steps.\nSome modifications are applied row-by-row. For example, rows of the input modeling data that have one or more missing values (NAs) can be problematic and they should be removed.\nOther modifications are to manipulate entire columns. Sometimes the recipes requires subsequent steps before the modeling begins in earnest. For example we know from experience that it is often useful to log scale (base 10) depth when working with biological models. If depth and Xbtm have made it this far, you’ll note that each range over 4 or more orders of magnitude. That’s not a problem by itself, but it can introduce a bias toward larger values whenever the mean is computed. So, we’ll add a step for log scaling these, but only if depth and Xbtm have made it this far (this may vary by species.)\nrec = rec |&gt; \n  step_naomit()\nif (\"depth\" %in% cfg$keep_vars){\n  rec = rec |&gt;\n    step_log(depth,  base = 10)\n}\nif (\"Xbtm\" %in% cfg$keep_vars){\n  rec = rec |&gt;\n    step_log(Xbtm,  base = 10)\n}\nrec\nNext we state that we want to remove variables that might be highly correlated with other variables. If two variables are highly correlated, they will not provide the modeling system with more information, just redundant information which doesn’t neccessarily help. step_corr() accepts a variety of arguments specifying which variables to test to correlation including some convenience selectors like all_numeric(), all_string() and friends. We want all predictors which happen to all be numeric, so we can use all_predictors() or all_numeric_predictors(). 
Specificity is better then generality so let’s choose numeric predictors.\n\n\n\n\n\n\nNote\n\n\n\nWe have already tested variables for high collinearlity, but here we can add a slightly different filter, high correlation, for the same issue. Since we have dealt with this already we shouldn’t expect that step will change the preprocessing very much. But it is instructive to see it in action.\n\n\nrec = rec |&gt; \n  step_corr(all_numeric_predictors())\nrec",
+    "objectID": "C02_background.html",
+    "href": "C02_background.html",
+    "title": "Background",
+    "section": "",
+    "text": "Traditional ecological surveys are systematic: for a given species, survey data sets tell us where the species is found and where it is absent. Using an observational data set (like OBIS) we only know where the species is found, which leaves us guessing about where it might not be found. This difference is what distinguishes a presence-absence data set from a presence-only data set, and this difference guides the modeling process.\nWhen we model, we are trying to define the environs where we should expect to find a species as well as the environs we would not expect to find a species. We have in hand the locations of observations, and we can extract the environmental data at those locations. But to characterize the less suitable environments we are going to have to sample what is called “background”. We want these background samples to roughly match the regional preferences of the observations; that is, we want to avoid having observations that are mostly over Georges Bank while our background samples are primarily around the Bay of Fundy.",
     "crumbs": [
-      "Covariates"
+      "Background"
     ]
   },
   {
-    "objectID": "C03_covariates.html#add-the-recipe-to-the-workflow",
-    "href": "C03_covariates.html#add-the-recipe-to-the-workflow",
-    "title": "Prediction",
-    "section": "9.2 Add the recipe to the workflow",
-    "text": "9.2 Add the recipe to the workflow\nwflow = wflow |&gt;\n  add_recipe(rec)\nwflow",
+    "objectID": "C02_background.html#sample-background",
+    "href": "C02_background.html#sample-background",
+    "title": "Background",
+    "section": "2.1 Sample background",
+    "text": "2.1 Sample background\nWhen we sample the background, we are creating the input for the model if we request that the observations (presences) are joined with the background.\nNext we sample the background as guided by the density map. We’ll ask for 2x as many background points as presences, but it is just a request. We also request that no background point be further than 30km (30000m) from its closest presence point.\n\ngreedy_input = sample_background(obs, mask, \n                              n = 2 * nrow(obs),\n                              class_label = \"background\",\n                              method = c(\"dist_max\", 30000),\n                              return_pres = TRUE)\n\nWarning in sample_background(obs, mask, n = 2 * nrow(obs), class_label = \"background\", : There are fewer available cells for raster 'NA' (2459 presences) than the requested 4918 background points. Only 4818 will be returned.\n\ngreedy_input\n\nSimple feature collection with 7277 features and 1 field\nGeometry type: POINT\nDimension:     XY\nBounding box:  xmin: -74.89169 ymin: 38.805 xmax: -65.02004 ymax: 45.21401\nGeodetic CRS:  WGS 84\n# A tibble: 7,277 × 2\n   class                geometry\n * &lt;fct&gt;             &lt;POINT [°]&gt;\n 1 presence    (-72.8074 39.056)\n 2 presence      (-71.343 40.52)\n 3 presence  (-68.7691 41.52448)\n 4 presence       (-67.79 43.32)\n 5 presence (-68.44324 42.61177)\n 6 presence    (-72.4328 40.213)\n 7 presence   (-71.8784 40.3569)\n 8 presence      (-65.78 43.195)\n 9 presence       (-70.5 42.767)\n10 presence   (-72.3024 40.1862)\n# ℹ 7,267 more rows\n\n\nYou may encounter a warning message that says, “There are fewer available cells for raster…”. This is useful information; there simply weren’t a lot of non-NA cells to sample from. 
Let’s plot this.\n\nplot(greedy_input['class'], \n     axes = TRUE,  \n     pch = \".\", \n     extent = mask, \n     main = \"August greedy class distribution\",\n     reset = FALSE)\nplot(coast, col = \"orange\", add = TRUE)\n\n\n\n\n\n\n\n\nHmmm, let’s tally the class labels.\n\ncount(greedy_input, class)\n\nSimple feature collection with 2 features and 2 fields\nGeometry type: MULTIPOINT\nDimension:     XY\nBounding box:  xmin: -74.89169 ymin: 38.805 xmax: -65.02004 ymax: 45.21401\nGeodetic CRS:  WGS 84\n# A tibble: 2 × 3\n  class          n                                                      geometry\n* &lt;fct&gt;      &lt;int&gt;                                              &lt;MULTIPOINT [°]&gt;\n1 presence    2459 ((-65.07 42.68), (-65.067 42.65), (-65.05 42.583), (-65.05 4…\n2 background  4818 ((-65.02004 42.25251), (-65.02004 42.74609), (-65.1023 42.66…\n\n\nWell, that’s imbalanced, with a different number of presences than background points. But, on the bright side, the background points are definitely in the region of observations.",
     "crumbs": [
-      "Covariates"
+      "Background"
     ]
   },
   {
-    "objectID": "C03_covariates.html#create-the-model",
-    "href": "C03_covariates.html#create-the-model",
-    "title": "Prediction",
-    "section": "10.1 Create the model",
-    "text": "10.1 Create the model\nWe create a random forest model, declare that it should be run in classification mode (not regression mode), and then specify that we want to use the ranger modeling engine (as opposed to, say, the randForest engine). We additionally specify that it should be able to produce probablilites of a class not just the class label. We also request that it saves bits of info so that we can compare the relative importance of the covariates.\nmodel = rand_forest() |&gt;\n  set_mode(\"classification\") |&gt;\n  set_engine(\"ranger\", probability = TRUE, importance = \"permutation\") \nmodel\nWell, that feels underwhelming. We can pass arguments unique to the engine using the set_args() function, but, for now we’ll accept the defaults.",
+    "objectID": "C02_background.html#thin-by-cell",
+    "href": "C02_background.html#thin-by-cell",
+    "title": "Background",
+    "section": "3.1 Thin by cell",
+    "text": "3.1 Thin by cell\nIn this approach we eliminate (thin) presences so that we have no more than one per covariate array cell.\n\ndim_before = dim(obs)\ncat(\"number of rows before cell thinning:\", dim_before[1], \"\\n\")\n\nnumber of rows before cell thinning: 2459 \n\nthinned_obs = thin_by_cell(obs, mask)\ndim_after = dim(thinned_obs)\ncat(\"number of rows after cell thinning:\", dim_after[1], \"\\n\")\n\nnumber of rows after cell thinning: 1204 \n\n\nSo, that dropped quite a few!",
     "crumbs": [
-      "Covariates"
+      "Background"
     ]
   },
   {
-    "objectID": "C03_covariates.html#add-the-model-to-the-workflow",
-    "href": "C03_covariates.html#add-the-model-to-the-workflow",
-    "title": "Prediction",
-    "section": "10.2 Add the model to the workflow",
-    "text": "10.2 Add the model to the workflow\nNow we simply add the model to the workflow.\nwflow = wflow |&gt;\n  add_model(model)\nwflow",
+    "objectID": "C02_background.html#make-a-weighted-sampling-map",
+    "href": "C02_background.html#make-a-weighted-sampling-map",
+    "title": "Background",
+    "section": "3.2 Make a weighted sampling map",
+    "text": "3.2 Make a weighted sampling map\nThere is a technique we can use to make a weighted sampling map. Simply counting the number of original observations per cell will indicate where we are most likely to observe Mola mola.\n\nsamp_weight = rasterize_point_density(obs, mask)\nplot(samp_weight, axes = TRUE, breaks = \"equal\", col = rev(hcl.colors(10)), reset = FALSE)\nplot(coast, col = \"orange\", lwd = 2, add = TRUE)\n\n\n\n\n\n\n\n\nNow let’s take a look at the background, but this time we’ll try to match the count of presences.\n\nconservative_input = sample_background(thinned_obs, samp_weight, \n                              n = 2 * nrow(obs),\n                              class_label = \"background\",\n                              method = \"bias\",\n                              return_pres = TRUE)\n\nWarning in sample_background(thinned_obs, samp_weight, n = 2 * nrow(obs), : There are fewer available cells for raster 'NA' (1204 presences) than the requested 4918 background points. 
Only 1204 will be returned.\n\ncount(conservative_input, class)\n\nSimple feature collection with 2 features and 2 fields\nGeometry type: MULTIPOINT\nDimension:     XY\nBounding box:  xmin: -74.5867 ymin: 38.868 xmax: -65.02004 ymax: 45.1333\nGeodetic CRS:  WGS 84\n# A tibble: 2 × 3\n  class          n                                                      geometry\n* &lt;fct&gt;      &lt;int&gt;                                              &lt;MULTIPOINT [°]&gt;\n1 presence    1204 ((-65.067 42.65), (-65.05 42.6), (-65.067 42.617), (-65.16 4…\n2 background  1204 ((-65.1023 42.66383), (-65.02004 42.58157), (-65.1023 42.581…\n\n\nWhoa - that’s many fewer background points.\n\nplot(conservative_input['class'], \n     axes = TRUE,  \n     pch = \".\", \n     extent = mask, \n     main = \"August conservative class distribution\",\n     reset = FALSE)\nplot(coast, col = \"orange\", add = TRUE)\n\n\n\n\n\n\n\n\nIt appears that background points are essentially shadowing the thinned presence points.",
     "crumbs": [
-      "Covariates"
+      "Background"
     ]
   },
   {
-    "objectID": "C03_covariates.html#predict-with-the-training-data",
-    "href": "C03_covariates.html#predict-with-the-training-data",
-    "title": "Prediction",
-    "section": "12.1 Predict with the training data",
-    "text": "12.1 Predict with the training data\nFirst we shall predict with the same data we trained with. The results of this will not really tell us much about our model as it is very circular to predict using the very data used to build the model. So this next section is more about a first pass at using the tools at your disposal.\ntrain_pred = predict_table(fitted_wflow, tr_data, type = \"prob\")\ntrain_pred\nHere the variables prepended with a dot . are computed, while the class variable is our original. There are many metrics we can use to determine how well this model predicts. Let’s start with the simplest thing… we can make a simply tally of .pred and class.\ncount(train_pred, .pred, class)\nThere false positives and false negatives, but many are correct. Of course, this is predicting with the very data we used to train the model; knowing that this is predicicting on training data with some many misses might not inspire confidence. But let’s explore more.",
+    "objectID": "C02_background.html#a-function-we-can-reuse",
+    "href": "C02_background.html#a-function-we-can-reuse",
+    "title": "Background",
+    "section": "5.1 A function we can reuse",
+    "text": "5.1 A function we can reuse\nHere we make a function that needs at least three arguments: the complete set of observations, the mask used for sampling (and possibly thinning) and the month to filter the observations. The pseudo-code might look like this…\nfor a given month\n  filter the obs for that month\n  make the greedy model input by sampling the background\n    save the greedy model input\n  thin the obs\n  make the conservative model input by sampling background\n    save the conservative model input\n  return a list of the greedy and conservative model inputs\nPhew! That’s a lot of steps. To manually run those steps 12 times would be tedious, so we roll that into a function that we can reuse 12 times instead.\nThis function will have a name, make_model_input_by_month. It’s a long name, but it makes it obvious what it does. First we start with the documentation.\n\n#' Builds greedy and conservative model input data sets for a given month\n#' \n#' @param mon chr the month abbreviation for the month of interest (\"Jan\" by default)\n#' @param obs table, the complete observation data set\n#' @param raster stars, the object that defines the sampling space, usually a mask\n#' @param species chr, the name of the species prepended to the name of the output files.\n#'   (By default \"Mola mola\" which gets converted to \"Mola_mola\")\n#' @param path the output data path to store this data (by default \"model_input\")\n#' @param min_obs num this sets a threshold below which we won't try to make a model. 
(Default is 3)\n#' @return a named two element list of greedy and conservative model inputs - they are tables\nmake_model_input_by_month  = function(mon = \"Jan\",\n                                      obs = read_observations(\"Mola mola\"),\n                                      raster = NULL,\n                                      species = \"Mola mola\",\n                                      path = data_path(\"model_input\"),\n                                      min_obs = 3){\n  # the user *must* provide a raster\n  if (is.null(raster)) stop(\"please provide a raster\")\n  # filter the obs\n  obs = obs |&gt;\n    filter(month == mon[1])\n  \n  # check that we have at least some records, if not enough then alert the user\n  # and return NULL\n  if (nrow(obs) &lt; min_obs){\n    warning(\"sorry, this month has too few records: \", mon)\n    return(NULL)\n  }\n  \n  # make sure the output path exists, if not, make it\n  make_path(path)\n  \n  \n  # make the greedy model input by sampling the background\n  greedy_input = sample_background(obs, raster,\n                                   n = 2 * nrow(obs),\n                                   class_label = \"background\",\n                                   method = c(\"dist_max\", 30000),\n                                   return_pres = TRUE)\n  # save the greedy data\n  filename = sprintf(\"%s-%s-greedy_input.gpkg\", \n                     gsub(\" \", \"_\", species),\n                     mon)\n  write_sf(greedy_input, file.path(path, filename))\n  \n  # thin the obs\n  thinned_obs = thin_by_cell(obs, raster)\n  \n  # sampling weight\n  samp_weight = rasterize_point_density(obs, raster)\n  \n  # make the conservative model\n  conservative_input = sample_background(thinned_obs, samp_weight,\n                                   n = 2 * nrow(obs),\n                                   class_label = \"background\",\n                                   method = \"bias\",\n                                   return_pres 
= TRUE)\n  \n  # save the conservative data\n  filename = sprintf(\"%s-%s-conservative_input.gpkg\", \n                     gsub(\" \", \"_\", species),\n                     mon)\n  write_sf(conservative_input, file.path(path,filename))\n  \n  # make a list\n  r = list(greedy = greedy_input, conservative = conservative_input)\n  \n  # return, but disable automatic printing\n  invisible(r)\n}",
     "crumbs": [
-      "Covariates"
+      "Background"
     ]
   },
   {
-    "objectID": "C03_covariates.html#assess-the-model",
-    "href": "C03_covariates.html#assess-the-model",
-    "title": "Prediction",
-    "section": "12.2 Assess the model",
-    "text": "12.2 Assess the model\nHewre we walk through a number of common assessment tools. We want to assess a model to ascertain how closely it models reality (or not!) Using the tools is always easy, interpreting the metrics is not always easy.\n\n12.2.1 Confusion matrix\nThe confusion matrix is the next step beyond a simple tally that we made above.\ntrain_confmat = conf_mat(train_pred, class, .pred)\ntrain_confmat\nYou’ll see this is the same as the simple tally we made, but it comes with handy plotting functionality (shown below). Note that a perfect model would have the upper left and lower right quadrants fully accounting for all points. The lower left quadrant shows us the number of false-negatives while the upper right quadrant shows the number of false-positives.\nautoplot(train_confmat, type = \"heatmap\")\n\n\n12.2.2 ROC and AUC\nThe area under the curve (AUC) of the receiver-operator curve (ROC) is a common metric. AUC values range form 0-1 with 1 reflecting a model that faithfully predicts correctly. Technically an AUC value of 0.5 represents a random model (yup, the result of a coin flip!), so values greater than 0.5 and less than 1.0 are expected.\nFirst we can plot the ROC.\nplot_roc(train_pred, class, .pred_presence)\nWe can assure you from practical experience that this is an atypical ROC. Typically they are not smooth, but this smoothness is an artifact of our use of training data. If you really only need the AUC, you can use the roc_auc() function directly.\nroc_auc(train_pred, class,  .pred_presence)\n\n\n12.2.3 Accuracy\nAccuracy, much like our simple tally above, tells us what fraction of the predictions are correct. Not that here we explicitly provide the predicted class label (not the probability.)\naccuracy(train_pred, class, .pred)\n\n\n12.2.4 Partial dependence plot\nPartial dependence reflects the relative contrubution of each variable influence over it’s full range of values. 
The output is a grid grid of plots showing the relative distribution of the variable (bars) as well as the relative influenceof the variable (line).\npartial_dependence_plot(fitted_wflow, data = tr_data)",
+    "objectID": "C00_coding.html",
+    "href": "C00_coding.html",
+    "title": "Coding",
+    "section": "",
+    "text": "Coding is the practice of writing instructions for computers to follow; computers aren’t clever by themselves - they need to be told what to do. Most coding is text-based; people write coding instructions into simple text documents. But some coding is graphical or visual. We shall be using text-based coding. We are going to use a free and open source general programming language called R. The R programming language has its roots in statistics and science, but it really can be used for anything.\nIn the early days, coding was pretty barebones - all one needed was a text editor and access to the programming language - no frills there, no pretty images, no buttons to push, just typing. As time passed, volunteers added niceties to the text editor, like visualizing plots of data, buttons to save files, colorized text for the typed code, and other bells and whistles. These editors became known as graphical user interfaces (GUI for short). GUIs keep getting easier and easier for people to use. We will use the GUI known as RStudio. It’s best to think of GUIs as wrappers around the core programming language; they are really nice and pretty, but they can’t do math. The programming language itself (which does do math!) evolved only as needed to fix bugs and make general improvements.",
     "crumbs": [
-      "Covariates"
+      "Coding"
     ]
   },
   {
-    "objectID": "C03_covariates.html#predict-with-the-testing-data",
-    "href": "C03_covariates.html#predict-with-the-testing-data",
-    "title": "Prediction",
-    "section": "12.3 Predict with the testing data",
-    "text": "12.3 Predict with the testing data\nFinally, we can repeat these steps with the testing data. This should give use better information than using the training data\n\n12.3.1 Predict\ntest_data = testing(split_data)\ntest_pred = predict_table(fitted_wflow, test_data, type = \"prob\")\ntest_pred\n\n\n12.3.2 Confusion matrix\ntest_confmat = conf_mat(test_pred, class, .pred)\nautoplot(test_confmat, type = \"heatmap\")\n\n\n12.3.3 ROC/AUC\nplot_roc(test_pred, class, .pred_presence)\nThis ROC is more typical of what we see in regular practice.\n\n\n12.3.4 Accuracy\naccuracy(test_pred, class, .pred)\n\n\n12.3.5 Partial Dependence\npartial_dependence_plot(fitted_wflow, data = test_data)",
+    "objectID": "C00_coding.html#loading-the-necessary-tools",
+    "href": "C00_coding.html#loading-the-necessary-tools",
+    "title": "Coding",
+    "section": "4.1 Loading the necessary tools",
+    "text": "4.1 Loading the necessary tools\nFor any coding project you will need to access a select number of tools, often stored on your computer in what is called a package library (it’s just a directory/folder really). When the package is loaded from the library, all of the functionality the author built in to that package is exposed for you to use in your project. We have created a single file that will both install (if needed) and load (if not already loaded) each of these packages. It’s easy to run.\nFirst, make sure that you have loaded the project (File &gt; Open Project) if you haven’t already. Then at the R console pane type the following…\n\nsource(\"setup.R\")\n\nAfter a few moments the command prompt will return to focus. Be sure to run that command at the beginning of every new R session or anytime you are adding new functionality.\nNow we are ready to load some data into your R session.",
     "crumbs": [
-      "Covariates"
+      "Coding"
     ]
   },
   {
-    "objectID": "C03_covariates.html#nowcast",
-    "href": "C03_covariates.html#nowcast",
-    "title": "Prediction",
-    "section": "19.1 Nowcast",
-    "text": "19.1 Nowcast\nFirst make the prediction. The function yields a stars array object that has three attributes: .pred_presence, .pred_background and .pred. The leading dot simply gives us the heads up that these three values are all computed. The first two range from 0-1 which implies a probability. The last, .pred, is the class label we would assign if we accept that any .pred_presence &gt;= 0.5 should be considered suitable habitat where a reported observation might occur.\nnowcast = predict_stars(wflow, covars)\nnowcast\nNow we can plot what is often called a “habitat suitability index” (hsi) map.\ncoast = read_coastline()\nplot(nowcast['.pred_presence'], main = \"Nowcast August\", \n     axes = TRUE, breaks = seq(0, 1, by = 0.1), reset = FALSE)\nplot(coast, col = \"orange\", lwd = 2, add = TRUE)\nWe can also plot a presence/background labeled map, but keep in mind it is just a thresholded version of the above where “presence” means .pred_presence &gt;= 0.5.\nplot(nowcast['.pred'], main = \"Nowcast August Labels\", \n     axes = TRUE, reset = FALSE)\nplot(coast, col = \"black\", lwd = 2, add = TRUE)",
+    "objectID": "C00_coding.html#spatial-data",
+    "href": "C00_coding.html#spatial-data",
+    "title": "Coding",
+    "section": "4.2 Spatial data",
+    "text": "4.2 Spatial data\nSpatial data is any data that has been assigned to a location on a planet (or even between planets!); that means environmental data is mapped to locations on oblate spheroids (like Earth). The oblate spheroid shape presents interesting but challenging math to the data scientist. Modern spatial data is designed to make data science easier by handling all of the location information in a discrete and standardized manner. By discrete we mean that we don’t have to sweat the details.\n\n4.2.1 Point data\nMany spatial data sets come as point data - locations (longitude, latitude and maybe altitude/depth and/or time) with one or more measurements (temperature, cloudiness, probability of precipitation, abundance of fish, population density, etc) attached to that point. Here is an example of point data about long-term oceanographic monitoring buoys in the Gulf of Maine (“gom”). We’ll read the buoy data into a variable, buoys. Next we can print the result simply by typing the name (or you could type print(buoys) if you like all the extra typing.)\n\nbuoys = gom_buoys()\nbuoys\n\nSimple feature collection with 6 features and 3 fields\nGeometry type: POINT\nDimension:     XY\nBounding box:  xmin: -70.4277 ymin: 42.3233 xmax: -65.9267 ymax: 44.10163\nGeodetic CRS:  WGS 84\n# A tibble: 6 × 4\n  name  longname            id                geometry\n* &lt;chr&gt; &lt;chr&gt;               &lt;chr&gt;          &lt;POINT [°]&gt;\n1 wms   Western Maine Shelf B01    (-70.4277 43.18065)\n2 cms   Central Maine Shelf E01     (-69.3578 43.7148)\n3 pb    Penobscot Bay       F01   (-68.99689 44.05495)\n4 ems   Eastern Maine Shelf I01   (-68.11359 44.10163)\n5 jb    Jordan Basin        M01   (-67.88029 43.49041)\n6 nec   Northeast Channel   N01     (-65.9267 42.3233)\n\n\n\n\n\n\n\n\nNote\n\n\n\nYou can get the online documentation for functions a couple of ways. You can type ?name_of_function, or help(name_of_function). 
Try ?gom_buoys as an example.\nSometimes you need more - like seeing the function itself. You can always try typing the function name without any trailing parentheses.\n\ngom_buoys\n\nfunction (form = c(\"table\", \"sf\")[2]) \n{\n    x = structure(list(name = c(\"wms\", \"cms\", \"pb\", \"ems\", \"jb\", \n        \"nec\"), longname = c(\"Western Maine Shelf\", \"Central Maine Shelf\", \n        \"Penobscot Bay\", \"Eastern Maine Shelf\", \"Jordan Basin\", \n        \"Northeast Channel\"), id = c(\"B01\", \"E01\", \"F01\", \"I01\", \n        \"M01\", \"N01\"), lon = c(-70.4277, -69.3578, -68.99689, \n        -68.11359, -67.88029, -65.9267), lat = c(43.18065, 43.7148, \n        44.05495, 44.10163, 43.49041, 42.3233)), row.names = c(NA, \n        -6L), class = c(\"tbl_df\", \"tbl\", \"data.frame\"))\n    if (tolower(form[1]) == \"sf\") \n        x = sf::st_as_sf(x, coords = c(\"lon\", \"lat\"), crs = 4326)\n    x\n}\n&lt;bytecode: 0x7fcb072131b8&gt;\n\n\nIf that still doesn’t work, we highly recommend trying Rseek.org which is an R-language specific search engine.\n\n\nSo there are 6 buoys, each with the attached attributes “name”, “longname” and “id”, as well as the spatial location data in the “geometry” column (just longitude and latitude in this case). We can easily plot these using the “id” column as a color key. For more on plotting spatial data, see this wiki page.\n\nplot(buoys['id'], axes = TRUE, pch = 16)\n\n\n\n\n\n\n\n\nWell, that’s pretty, but without a shoreline it lacks context.",
     "crumbs": [
-      "Covariates"
+      "Coding"
     ]
   },
   {
-    "objectID": "C03_covariates.html#forecast",
-    "href": "C03_covariates.html#forecast",
-    "title": "Prediction",
-    "section": "19.2 Forecast",
-    "text": "19.2 Forecast\nNow let’s try our hand at forecasting - let’s try RCP85 in 2075. First we load those parameters, then run the prediction and plot.\ncovars_rcp85_2075 = read_brickman(db |&gt; filter(scenario == \"RCP85\", year == 2075, interval == \"mon\")) |&gt;\n  select(all_of(cfg$keep_vars)) |&gt;\n  slice(\"month\", \"Aug\") \nforecast_2075 = predict_stars(wflow, covars_rcp85_2075)\nforecast_2075\ncoast = read_coastline()\nplot(forecast_2075['.pred_presence'], main = \"RCP85 2075 August\", \n     axes = TRUE, breaks = seq(0, 1, by = 0.1), reset = FALSE)\nplot(coast, col = \"orange\", lwd = 2, add = TRUE)\nHmmm, that’s pretty different than what the nowcast predicts.",
+    "objectID": "C00_coding.html#linestrings-and-polygon-data",
+    "href": "C00_coding.html#linestrings-and-polygon-data",
+    "title": "Coding",
+    "section": "4.3 Linestrings and polygon data",
+    "text": "4.3 Linestrings and polygon data\nLinestrings (open shapes) and polygons (closed shapes) are much like point data, except that each geometry is a linestring or polygon. We have a set of polygons/linestrings that represent the coastline.\n\ncoast = read_coastline()\ncoast\n\nSimple feature collection with 14 features and 0 fields\nGeometry type: MULTILINESTRING\nDimension:     XY\nBounding box:  xmin: -74.9 ymin: 38.95218 xmax: -65 ymax: 46.06477\nGeodetic CRS:  WGS 84\n# A tibble: 14 × 1\n                                                                            geom\n                                                           &lt;MULTILINESTRING [°]&gt;\n 1 ((-72.1019 41.01504, -72.15127 41.05146, -72.18389 41.04678, -72.28745 41.02…\n 2 ((-73.68745 45.56143, -73.85293 45.51572, -73.96055 45.44141, -73.92021 45.4…\n 3 ((-73.69531 45.5855, -73.57236 45.69448, -73.72466 45.67183, -73.85771 45.57…\n 4 ((-66.32412 44.25732, -66.27378 44.29229, -66.21035 44.39204, -66.25049 44.3…\n 5 ((-68.69077 44.24873, -68.70303 44.23198, -68.70171 44.18267, -68.66118 44.1…\n 6 ((-66.89707 44.62891, -66.7625 44.68179, -66.75337 44.70981, -66.74541 44.79…\n 7 ((-68.29941 44.45649, -68.34702 44.43037, -68.40947 44.36426, -68.41172 44.2…\n 8 ((-71.39307 41.46675, -71.36533 41.48525, -71.35449 41.54229, -71.36431 41.5…\n 9 ((-74.25049 39.52939, -74.1332 39.68076, -74.10674 39.74644, -74.25317 39.55…\n10 ((-74.18818 40.6146, -74.23589 40.5187, -74.18813 40.52285, -74.13853 40.541…\n11 ((-70.67373 41.44854, -70.7605 41.37358, -70.8292 41.35898, -70.7853 41.3274…\n12 ((-71.34624 41.46938, -71.29092 41.4646, -71.24141 41.49194, -71.23203 41.65…\n13 ((-70.0627 41.32847, -70.08662 41.31758, -70.23306 41.28633, -70.05508 41.24…\n14 ((-74.9 39.14709, -74.89702 39.14546, -74.9 39.1329), (-74.9 38.95218, -74.7…\n\n\nIn this case, each record of geometry is a “MULTILINESTRING”, which is a group of one or more linestrings. 
Note that no other variables are in this table - it’s just the geometry.\nLet’s plot these geometries, and add the points on top.\n\nplot(coast, col = \"orange\", lwd = 2, axes = TRUE, reset = FALSE,\n     main = \"Buoys in the Gulf of Maine\")\nplot(st_geometry(buoys), pch = 1, cex = 0.5, add = TRUE)\ntext(st_geometry(buoys), labels = buoys$id, cex = 0.7, adj = c(1,-0.1))",
     "crumbs": [
-      "Covariates"
+      "Coding"
     ]
   },
   {
-    "objectID": "C03_covariates.html#forecast-2055",
-    "href": "C03_covariates.html#forecast-2055",
-    "title": "Prediction",
-    "section": "20.1 Forecast 2055",
-    "text": "20.1 Forecast 2055\ncovars_rcp85_2055 = read_brickman(db |&gt; filter(scenario == \"RCP85\", year == 2055, interval == \"mon\")) |&gt;\n  select(all_of(cfg$keep_vars)) |&gt;\n  slice(\"month\", \"Aug\") \nforecast_2055 = predict_stars(wflow, covars_rcp85_2055)\nforecast_2055",
+    "objectID": "C00_coding.html#array-data-aka-raster-data",
+    "href": "C00_coding.html#array-data-aka-raster-data",
+    "title": "Coding",
+    "section": "4.4 Array data (aka raster data)",
+    "text": "4.4 Array data (aka raster data)\nOften spatial data comes in grids, like regular arrays of pixels. These are great for all sorts of data like satellite images, bathymetry maps and environmental modeling data. We’ll be working with environmental modeling data which we call “Brickman data”. You can learn more about Brickman data in the wiki. We’ll be glossing over the details here, but there’s lots of detail in the wiki.\nWe’ll read in the database that tracks 82 Brickman data files, and then immediately filter out the rows that define the “PRESENT” scenario (where present means 1982–2013) and monthly climatology models.\n\ndb = brickman_database() |&gt;\n  filter(scenario == \"PRESENT\", interval == \"mon\") # note the double '==', it's comparative\ndb\n\n# A tibble: 8 × 4\n  scenario year    interval var  \n  &lt;chr&gt;    &lt;chr&gt;   &lt;chr&gt;    &lt;chr&gt;\n1 PRESENT  PRESENT mon      MLD  \n2 PRESENT  PRESENT mon      Sbtm \n3 PRESENT  PRESENT mon      SSS  \n4 PRESENT  PRESENT mon      SST  \n5 PRESENT  PRESENT mon      Tbtm \n6 PRESENT  PRESENT mon      U    \n7 PRESENT  PRESENT mon      V    \n8 PRESENT  PRESENT mon      Xbtm \n\n\nIf you are wondering about filtering a table, be sure to check out the wiki on tabular data to get started.\nYou might be wondering what that |&gt; is doing. It is called a pipe, and it delivers the output of one function to the next function as the first parameter (aka argument). For example, brickman_database() produces a table, that table is immediately passed into filter() to choose rows that match our criteria.\nNow that we have the database listing just the records we want, we pass it to the read_brickman() function.\n\ncurrent = read_brickman(db)\ncurrent\n\nstars object with 3 dimensions and 9 attributes\nattribute(s):\n                Min.      1st Qu.        
Median          Mean      3rd Qu.\nMLD     1.011275e+00  5.583339810  15.967359543  18.910421492 2.809953e+01\nSbtm    2.324167e+01 32.136343956  34.232215881  33.507147254 3.491243e+01\nSSS     1.644333e+01 30.735633373  31.104771614  31.492407921 3.203519e+01\nSST    -7.826599e-01  6.434107542  12.359498501  12.151707840 1.763068e+01\nTbtm   -2.676387e-01  3.595118523   6.110801697   6.122372065 7.521761e+00\nU      -2.121380e-01 -0.010892980  -0.002634738  -0.010139401 7.229637e-04\nV      -1.883337e-01 -0.010722862  -0.002858645  -0.008474233 9.565173e-04\nXbtm    3.275602e-06  0.001458065   0.003088348   0.008360344 7.256525e-03\ndepth   5.000000e+00 60.258880615 145.012619019 923.313763739 1.704049e+03\n               Max.  NA's\nMLD    1.066982e+02 59796\nSbtm   3.515742e+01 59796\nSSS    3.559161e+01 59796\nSST    2.643147e+01 59796\nTbtm   2.460999e+01 59796\nU      7.469980e-02 59796\nV      5.264002e-02 59796\nXbtm   1.899681e-01 59796\ndepth  4.964409e+03 59796\ndimension(s):\n      from  to offset    delta refsys point      values x/y\nx        1 121 -74.93  0.08226 WGS 84 FALSE        NULL [x]\ny        1  89  46.08 -0.08226 WGS 84 FALSE        NULL [y]\nmonth    1  12     NA       NA     NA    NA Jan,...,Dec    \n\n\nThis loads quite a complex set of arrays, but they have spatial information attached in the dimensions section. The x and y dimensions represent longitude and latitude respectively. The 3rd dimension, month, is time based.\nHere we plot all 12 months of sea surface temperature, SST. Note that they all share the same color scale so that they are easy to compare.\n\nplot(current['SST'])\n\n\n\n\n\n\n\n\nJust as we are able to plot linestrings/polygons alongside points, we can also plot these with arrays (rasters). 
To do this for one month (“Apr”) of one variable (“SSS”) we simply need to slice that data out of the current variable.\n\napril_sss = current['SSS'] |&gt;\n  slice(\"month\", \"Apr\")\napril_sss\n\nstars object with 2 dimensions and 1 attribute\nattribute(s):\n         Min. 1st Qu.   Median    Mean  3rd Qu.     Max. NA's\nSSS  16.44333 30.8342 31.10334 31.4641 31.93447 35.59161 4983\ndimension(s):\n  from  to offset    delta refsys point x/y\nx    1 121 -74.93  0.08226 WGS 84 FALSE [x]\ny    1  89  46.08 -0.08226 WGS 84 FALSE [y]\n\n\nThen it’s just plot, plot, plot.\n\nplot(april_sss, axes = TRUE, reset = FALSE)\nplot(st_geometry(coast), add = TRUE, col = \"orange\", lwd = 2)\nplot(st_geometry(buoys), add = TRUE, pch = 16, col = \"purple\")\n\n\n\n\n\n\n\n\nWe can plot ALL twelve months of a variable (“SST”) with the coast and points shown. There is one slight modification to be made since a single call to plot() actually gets invoked 12 times for this data. So where do we add in the buoys and coast? Fortunately, we can create what is called a “hook” function - who knows where the name hook came from? Once the hook function is defined, it will be applied to each of the 12 subplots.\n\n# a little function that gets called just after each sub-plot\n# it simply adds the coast and buoys\nadd_coast_and_buoys = function(){\n  plot(st_geometry(coast), col = \"orange\", lwd = 2, add = TRUE)\n  plot(st_geometry(buoys), pch = 16, col = \"purple\", add = TRUE)\n}\n\n# here we call the plot, and tell R to call `add_coast_and_buoys()` after\n# each subplot is made\nplot(current['SST'], hook = add_coast_and_buoys)",
     "crumbs": [
-      "Covariates"
+      "Coding"
     ]
   },
   {
-    "objectID": "C03_covariates.html#bind-time-series",
-    "href": "C03_covariates.html#bind-time-series",
-    "title": "Prediction",
-    "section": "20.2 Bind time series",
-    "text": "20.2 Bind time series\nWe want to bind the .pred_presence attribute for each of the predictions (nowcast, forecast_2055 and forecast_2075). Let’s assume the “present” mean 2020 so we can assign a year.\nrcp85 = c(nowcast, forecast_2055, forecast_2075, along = list(year = c(\"2020\", \"2055\", \"2075\")))\n\n\n\n\n\n\nNote\n\n\n\nCurious about we provide year as a vector of characters instead of a vector of integers? Try running the command above again and check out the 3rd dimension.\n\n\nSince we are plotting multiple arrays, we need to plot the coastline using a “hook” function.\nplot_coast = function(){\n  plot(coast, col = \"orange\", lwd = 2, add = TRUE)\n}\n\nplot(rcp85['.pred_presence'], \n     hook = plot_coast,\n     axes = TRUE, breaks = seq(0, 1, by = 0.1), join_zlim  = TRUE, reset = FALSE)\nHmmmm. Why does there seem to be a strong shift between 2020 and 2055, while the 2055 to 2075 shift seems less pronounced?\n\n\n\n\n\n\nNote\n\n\n\nDon’t forget that there are other ways to plot array based spatial data.",
+    "objectID": "C00_coding.html#coding-assignment",
+    "href": "C00_coding.html#coding-assignment",
+    "title": "Coding",
+    "section": "4.5 Coding Assignment",
+    "text": "4.5 Coding Assignment\n\n\n\n\n\n\nUse the menu option File &gt; New File &gt; R Script to create a blank file. Save the file (even though it is empty) in the “assignment” directory as “assignment_script_1.R”. Use this file to build a script that meets the following challenge. Note that the existing file, “assignment_script_0.R” is already there as an example.\nUse the Brickman tutorial to extract data from the location of Buoy M01 for RCP4.5 2055. Make a plot of SST (y-axis) as a function of month (x-axis). Here’s one possible outcome.\n\n\n\nBuoy M01, RCP4.5 2055",
     "crumbs": [
-      "Covariates"
+      "Coding"
     ]
   },
   {
-    "objectID": "C03_covariates.html#save-the-predictions",
-    "href": "C03_covariates.html#save-the-predictions",
-    "title": "Prediction",
-    "section": "20.3 Save the predictions",
-    "text": "20.3 Save the predictions\nWe could save all three attributes, but .pred_background is just 1 - .pred_presence, and .pred is just coding “presence” where .pred_presence &gt;= 0.5, so we can always compute those as needed if we have .pred_presence. In that case, let’s just save the first attribute, .pred_presence, in a multilayer GeoTIFF formatted image array file. The write_prediction() function will do just that.\n# make sure the output directory exists\npath = data_path(\"predictions\")\nif (!dir.exists(path)) ok = dir.create(path, recursive = TRUE)\n\n# write individual arrays?\nwrite_prediction(nowcast, file = file.path(path,\"g_Aug_RCP85_2020.tif\"))\nwrite_prediction(forecast_2055, file = file.path(path, \"g_Aug_RCP85_2055.tif\"))\nwrite_prediction(forecast_2075, file = file.path(path, \"g_Aug_RCP85_2075.tif\"))\n\n# or write them together in a \"multi-layer\" file?\nwrite_prediction(rcp85, file = file.path(path, \"g_Aug_RCP85_all.tif\"))\nTo read it back simply provide the filename to read_prediction(). If you are reading back a multi-layer array, be sure to check out the time argument to assign values to the time dimension. Single layer arrays don’t have the concept of time so the time argument is ignored.",
+    "objectID": "index.html",
+    "href": "index.html",
+    "title": "Colby Forecasting",
+    "section": "",
+    "text": "Welcome to the Colby Forecasting 2025 workbook!\nThis document comprises sections on forecasting and on coding with the R programming language.\n\n1 Contacts:\nDr. Nick Record and Ben Tupper\n\n\n2 Questions and issues\nWe have a saying at Bigelow Lab, “there’s no such thing as a dumb question, but the quality of the answers you get may vary widely.” This is so true!\nIf you have a class, coding or forecasting question, start a new “issue” on the github issues tab. If a question has been posed by another, and you think you can help with the answer, then please feel free to respond. If you have a personal question or issue, then contact the instructors directly.\n\n\n3 The wiki\nSome ancillary content for the course has been placed in what is called a wiki. In theory anyone can contribute to a wiki, but in practice only a few do. We are open to suggestions for improvements and additions.\n\n\n\n\n Back to top",
     "crumbs": [
-      "Covariates"
+      "Home"
     ]
   },
   {
-    "objectID": "C04_models.html",
-    "href": "C04_models.html",
-    "title": "Prediction",
+    "objectID": "C01_observations.html",
+    "href": "C01_observations.html",
+    "title": "Observations",
     "section": "",
-    "text": "All models are wrong, but some are useful.\n\nGeorge Box\nModeling starts with a collection of observations (presence and background for us!) and ends up with a collection of coeefficients that can be used with one or more formulas to make a predicition for the past, the present or the future. We are using modeling specifically to make habitat suitability maps for select species under two climate scenarios (RCP45 and RCP85) at two different times (2055 and 2075) in the future.\nWe can choose from a number of different models: random forest “rf”, maximum entropy “maxent” or “maxnet”, boosted regression trees “brt”, general linear models “glm”, etc. The point of each is to make a mathematical representation of natural occurrences. It is important to consider what those occurences might be - categorical like labels? likelihoods like probabilities? continuous like measurements? Here are examples of each…\nWe are modeling with known observations (presences) and a sampling of the background, so we are trying to model a likelihood that a species will be encountered (and reported) relative to the environmental conditions. We are looking for a model that can produce relative likelihood of an encounter that results in a report.\nWe’ll be using a random forest model (rf). We were inspired to follow this route by using this tidy models tutorial prepared by our colleague Omi Johnson.",
+    "text": "Follow this wiki page on obtaining data from OBIS. Keep in mind that you will probably want a species with a sufficient number of records in the northwest Atlantic. Just what constitutes “sufficient” is probably subject to some debate, but a couple of hundred as a minimum will be helpful for learning. One thing that might help is to be on the alert for species that congregate in only one area, such as right along the shoreline, or that appear in only a few months of the year. It isn’t that those species are not worthy of study, but they may make the learning process harder.\nYou should feel free to get the data for a couple of different species; if one becomes a headache with our given resources, then you can switch easily to another.",
     "crumbs": [
-      "Models"
+      "Observations"
     ]
   },
   {
-    "objectID": "C04_models.html#modifying-the-recipe-with-steps",
-    "href": "C04_models.html#modifying-the-recipe-with-steps",
-    "title": "Prediction",
-    "section": "5.1 Modifying the recipe with steps",
-    "text": "5.1 Modifying the recipe with steps\nSteps are cumulative modifications, and that means the order in which they are added matters. These steps comprise the bulk of pre-processing steps.\nSome modifications are applied row-by-row. For example, rows of the input modeling data that have one or more missing values (NAs) can be problematic and they should be removed.\nOther modifications are to manipulate entire columns. Sometimes the recipes requires subsequent steps before the modeling begins in earnest. For example we know from experience that it is often useful to log scale (base 10) depth when working with biological models. If depth and Xbtm have made it this far, you’ll note that each range over 4 or more orders of magnitude. That’s not a problem by itself, but it can introduce a bias toward larger values whenever the mean is computed. So, we’ll add a step for log scaling these, but only if depth and Xbtm have made it this far (this may vary by species.)\n\nrec = rec |&gt; \n  step_naomit()\nif (\"depth\" %in% cfg$keep_vars){\n  rec = rec |&gt;\n    step_log(depth,  base = 10)\n}\nif (\"Xbtm\" %in% cfg$keep_vars){\n  rec = rec |&gt;\n    step_log(Xbtm,  base = 10)\n}\nrec\n\n\n\n\n── Recipe ──────────────────────────────────────────────────────────────────────\n\n\n\n\n\n── Inputs \n\n\nNumber of variables by role\n\n\noutcome:   1\npredictor: 7\n\n\n\n\n\n── Operations \n\n\n• Removing rows with NA values in: &lt;none&gt;\n\n\nNext we state that we want to remove variables that might be highly correlated with other variables. If two variables are highly correlated, they will not provide the modeling system with more information, just redundant information which doesn’t neccessarily help. step_corr() accepts a variety of arguments specifying which variables to test to correlation including some convenience selectors like all_numeric(), all_string() and friends. 
We want all predictors which happen to all be numeric, so we can use all_predictors() or all_numeric_predictors(). Specificity is better then generality so let’s choose numeric predictors.\n\n\n\n\n\n\nNote\n\n\n\nWe have already tested variables for high collinearlity, but here we can add a slightly different filter, high correlation, for the same issue. Since we have dealt with this already we shouldn’t expect that step will change the preprocessing very much. But it is instructive to see it in action.\n\n\n\nrec = rec |&gt; \n  step_corr(all_numeric_predictors())\nrec\n\n\n\n\n── Recipe ──────────────────────────────────────────────────────────────────────\n\n\n\n\n\n── Inputs \n\n\nNumber of variables by role\n\n\noutcome:   1\npredictor: 7\n\n\n\n\n\n── Operations \n\n\n• Removing rows with NA values in: &lt;none&gt;\n\n\n• Correlation filter on: all_numeric_predictors()",
+    "objectID": "C01_observations.html#basisofrecord",
+    "href": "C01_observations.html#basisofrecord",
+    "title": "Observations",
+    "section": "5.1 basisOfRecord",
+    "text": "5.1 basisOfRecord\nNext we should examine the basisOfRecord variable to get an understanding of how these observations were made.\n\nobs |&gt; count(basisOfRecord)\n\nSimple feature collection with 4 features and 2 fields\nGeometry type: GEOMETRY\nDimension:     XY\nBounding box:  xmin: -74.65 ymin: 38.8 xmax: -65.00391 ymax: 45.1333\nGeodetic CRS:  WGS 84\n# A tibble: 4 × 3\n  basisOfRecord              n                                              geom\n* &lt;chr&gt;                  &lt;int&gt;                                    &lt;GEOMETRY [°]&gt;\n1 HumanObservation        9354 MULTIPOINT ((-65.07 42.68), (-65.067 42.65), (-6…\n2 NomenclaturalChecklist     1                        POINT (-65.80602 44.97985)\n3 Occurrence                 1                          POINT (-65.2852 42.6243)\n4 PreservedSpecimen        170 MULTIPOINT ((-67.05534 45.09908), (-66.35 45.133…\n\n\nIf you are using a different species you may have different values for basisOfRecord. Let’s take a closer look at the complete records for one from each group.\n\nhuman = obs |&gt;\n  filter(basisOfRecord == \"HumanObservation\") |&gt;\n  slice(1) |&gt;\n  browse_obis()\n\nPlease point your browser to the following url: \n\n\nhttps://api.obis.org/v3/occurrence/00040fa1-7acd-4731-bf1e-6dc16e30c7d4\n\npreserved = obs |&gt;\n  filter(basisOfRecord == \"PreservedSpecimen\") |&gt;\n  slice(1) |&gt;\n  browse_obis()\n\nPlease point your browser to the following url: \n\n\nhttps://api.obis.org/v3/occurrence/003abd48-a98a-4c2f-adc2-8f1d6f71dfa1\n\nchecklist = obs |&gt;\n  filter(basisOfRecord == \"NomenclaturalChecklist\") |&gt;\n  slice(1) |&gt;\n  browse_obis()\n\nPlease point your browser to the following url: \n\n\nhttps://api.obis.org/v3/occurrence/1b967631-4d90-44d0-b57e-cf71c554ee5c\n\noccurrence = obs |&gt;\n  filter(basisOfRecord == \"Occurrence\") |&gt;\n  slice(1) |&gt;\n  browse_obis()\n\nPlease point your browser to the following url: 
\n\n\nhttps://api.obis.org/v3/occurrence/d6e7882e-a850-435d-a546-73adaf625031\n\n\nNext let’s think about what our minimum requirements might be in order to build a model. To answer that we need to think about our environmental covariates in the [Brickman data](https://github.com/BigelowLab/ColbyForecasting2025/wiki/Brickman). That data has dimensions of x (longitude), y (latitude) and month. In order to match observations with that data, our observations must be complete in those three variables. Let’s take a look at a summary of the observations which will indicate the number of elements missing in each variable.\n\nsummary(obs)\n\n      id            basisOfRecord        eventDate               year     \n Length:9526        Length:9526        Min.   :1932-09-15   Min.   :1932  \n Class :character   Class :character   1st Qu.:2003-10-02   1st Qu.:2003  \n Mode  :character   Mode  :character   Median :2009-07-11   Median :2009  \n                                       Mean   :2006-10-02   Mean   :2006  \n                                       3rd Qu.:2016-11-05   3rd Qu.:2016  \n                                       Max.   :2021-10-14   Max.   :2021  \n                                       NA's   :7            NA's   :7     \n    month            eventTime         individualCount             geom     \n Length:9526        Length:9526        Min.   : 1.000   POINT        :9526  \n Class :character   Class :character   1st Qu.: 1.000   epsg:4326    :   0  \n Mode  :character   Mode  :character   Median : 1.000   +proj=long...:   0  \n                                       Mean   : 1.112                       \n                                       3rd Qu.: 1.000                       \n                                       Max.   :25.000                       \n                                       NA's   :318",
     "crumbs": [
-      "Models"
+      "Observations"
     ]
   },
   {
-    "objectID": "C04_models.html#add-the-recipe-to-the-workflow",
-    "href": "C04_models.html#add-the-recipe-to-the-workflow",
-    "title": "Prediction",
-    "section": "5.2 Add the recipe to the workflow",
-    "text": "5.2 Add the recipe to the workflow\n\nwflow = wflow |&gt;\n  add_recipe(rec)\nwflow\n\n══ Workflow ════════════════════════════════════════════════════════════════════\nPreprocessor: Recipe\nModel: None\n\n── Preprocessor ────────────────────────────────────────────────────────────────\n2 Recipe Steps\n\n• step_naomit()\n• step_corr()",
+    "objectID": "C01_observations.html#eventdate",
+    "href": "C01_observations.html#eventdate",
+    "title": "Observations",
+    "section": "5.2 eventDate",
+    "text": "5.2 eventDate\nFor Mola mola there are some rows where eventDate is NA. We need to filter those out. The filter function looks for a vector of TRUE/FALSE values - one for each row. In our case, we test the eventDate column to see if it is NA, but then we reverse the TRUE/FALSE logical with the preceding ! (pronounced “bang!”). Thus we retain only the rows where eventDate is not NA, and then we print the summary again.\n\nobs = obs |&gt;\n  filter(!is.na(eventDate))\nsummary(obs)\n\n      id            basisOfRecord        eventDate               year     \n Length:9519        Length:9519        Min.   :1932-09-15   Min.   :1932  \n Class :character   Class :character   1st Qu.:2003-10-02   1st Qu.:2003  \n Mode  :character   Mode  :character   Median :2009-07-11   Median :2009  \n                                       Mean   :2006-10-02   Mean   :2006  \n                                       3rd Qu.:2016-11-05   3rd Qu.:2016  \n                                       Max.   :2021-10-14   Max.   :2021  \n                                                                          \n    month            eventTime         individualCount             geom     \n Length:9519        Length:9519        Min.   : 1.000   POINT        :9519  \n Class :character   Class :character   1st Qu.: 1.000   epsg:4326    :   0  \n Mode  :character   Mode  :character   Median : 1.000   +proj=long...:   0  \n                                       Mean   : 1.112                       \n                                       3rd Qu.: 1.000                       \n                                       Max.   :25.000                       \n                                       NA's   :315",
     "crumbs": [
-      "Models"
+      "Observations"
     ]
   },
   {
-    "objectID": "C04_models.html#create-the-model",
-    "href": "C04_models.html#create-the-model",
-    "title": "Prediction",
-    "section": "6.1 Create the model",
-    "text": "6.1 Create the model\nWe create a random forest model, declare that it should be run in classification mode (not regression mode), and then specify that we want to use the ranger modeling engine (as opposed to, say, the randForest engine). We additionally specify that it should be able to produce probablilites of a class not just the class label. We also request that it saves bits of info so that we can compare the relative importance of the covariates.\n\nmodel = rand_forest() |&gt;\n  set_mode(\"classification\") |&gt;\n  set_engine(\"ranger\", probability = TRUE, importance = \"permutation\") \nmodel\n\nRandom Forest Model Specification (classification)\n\nEngine-Specific Arguments:\n  probability = TRUE\n  importance = permutation\n\nComputational engine: ranger \n\n\nWell, that feels underwhelming. We can pass arguments unique to the engine using the set_args() function, but, for now we’ll accept the defaults.",
+    "objectID": "C01_observations.html#individualcount",
+    "href": "C01_observations.html#individualcount",
+    "title": "Observations",
+    "section": "5.3 individualCount",
+    "text": "5.3 individualCount\nThat’s better, but we still have 315 NA values for individualCount. Let’s look at at least one record of those in detail; filter out one, and browse it.\n\nobs |&gt;\n  filter(is.na(individualCount)) |&gt;\n  slice(1) |&gt;\n  browse_obis()\n\nPlease point your browser to the following url: \n\n\nhttps://api.obis.org/v3/occurrence/003abd48-a98a-4c2f-adc2-8f1d6f71dfa1\n\n\nEeek! It’s a carcass that washed up on shore! We checked a number of others, and they are all carcasses. Is that a presence? Is that what we are modeling? If not, then we should filter those out.\n\nobs = obs |&gt;\n  filter(!is.na(individualCount))\nsummary(obs)\n\n      id            basisOfRecord        eventDate               year     \n Length:9204        Length:9204        Min.   :1932-09-15   Min.   :1932  \n Class :character   Class :character   1st Qu.:2003-07-26   1st Qu.:2003  \n Mode  :character   Mode  :character   Median :2009-07-11   Median :2009  \n                                       Mean   :2006-08-17   Mean   :2006  \n                                       3rd Qu.:2016-11-05   3rd Qu.:2016  \n                                       Max.   :2021-10-14   Max.   :2021  \n    month            eventTime         individualCount             geom     \n Length:9204        Length:9204        Min.   : 1.000   POINT        :9204  \n Class :character   Class :character   1st Qu.: 1.000   epsg:4326    :   0  \n Mode  :character   Mode  :character   Median : 1.000   +proj=long...:   0  \n                                       Mean   : 1.112                       \n                                       3rd Qu.: 1.000                       \n                                       Max.   :25.000                       \n\n\nWell now one has to wonder about a single observation of 25 animals. 
Let’s check that out.\n\nobs |&gt;\n  filter(individualCount == 25) |&gt;\n  browse_obis()\n\nPlease point your browser to the following url: \n\n\nhttps://api.obis.org/v3/occurrence/c907349a-2c52-4a51-a69a-5a338c5d492a\n\n\nOK, that seems legitimate. And it is possible; Mola mola can congregate for feeding, mating and possibly for karaoke parties.",
     "crumbs": [
-      "Models"
+      "Observations"
     ]
   },
   {
-    "objectID": "C04_models.html#add-the-model-to-the-workflow",
-    "href": "C04_models.html#add-the-model-to-the-workflow",
-    "title": "Prediction",
-    "section": "6.2 Add the model to the workflow",
-    "text": "6.2 Add the model to the workflow\nNow we simply add the model to the workflow.\n\nwflow = wflow |&gt;\n  add_model(model)\nwflow\n\n══ Workflow ════════════════════════════════════════════════════════════════════\nPreprocessor: Recipe\nModel: rand_forest()\n\n── Preprocessor ────────────────────────────────────────────────────────────────\n2 Recipe Steps\n\n• step_naomit()\n• step_corr()\n\n── Model ───────────────────────────────────────────────────────────────────────\nRandom Forest Model Specification (classification)\n\nEngine-Specific Arguments:\n  probability = TRUE\n  importance = permutation\n\nComputational engine: ranger",
+    "objectID": "C01_observations.html#year",
+    "href": "C01_observations.html#year",
+    "title": "Observations",
+    "section": "5.4 year",
+    "text": "5.4 year\nWe know that the “current” climate scenario for the Brickman model data defines “current” as the 1982-2013 window. It’s just an average, and if you have values from 1970 to the current year, you are probably safe in including them. But do your observations fall into those years? Let’s make a plot of the counts per year, with dashed lines showing the Brickman “current” climatology period.\n\nggplot(data = obs,\n       mapping = aes(x = year)) + \n  geom_bar() + \n  geom_vline(xintercept = c(1982, 2013), linetype = \"dashed\") + \n  labs(title = \"Counts per year\")\n\n\n\n\n\n\n\n\nFor this species, it seems like it is only the record from 1932 that might be a stretch, so let’s filter that out by rejecting records before 1970. This time, instead of asking for a summary, we’ll print the dimensions (rows, columns) of the table.\n\nobs = obs |&gt;\n  filter(year &gt;= 1970)\ndim(obs)\n\n[1] 9203    8\n\n\nThat’s still a lot of records. Now let’s check out the distribution across the months of the year.",
     "crumbs": [
-      "Models"
+      "Observations"
     ]
   },
   {
-    "objectID": "C04_models.html#predict-with-the-training-data",
-    "href": "C04_models.html#predict-with-the-training-data",
-    "title": "Prediction",
-    "section": "8.1 Predict with the training data",
-    "text": "8.1 Predict with the training data\nFirst we shall predict with the same data we trained with. The results of this will not really tell us much about our model as it is very circular to predict using the very data used to build the model. So this next section is more about a first pass at using the tools at your disposal.\n\ntrain_pred = predict_table(fitted_wflow, tr_data, type = \"prob\")\ntrain_pred\n\n# A tibble: 5,453 × 4\n   .pred_presence .pred_background .pred      class     \n            &lt;dbl&gt;            &lt;dbl&gt; &lt;fct&gt;      &lt;fct&gt;     \n 1         0.0123            0.988 background background\n 2         0.0627            0.937 background background\n 3         0.328             0.672 background background\n 4         0.183             0.817 background background\n 5         0.119             0.881 background background\n 6         0.0146            0.985 background background\n 7         0.0182            0.982 background background\n 8         0.358             0.642 background background\n 9         0.0539            0.946 background background\n10         0.0264            0.974 background background\n# ℹ 5,443 more rows\n\n\nHere the variables prepended with a dot . are computed, while the class variable is our original. There are many metrics we can use to determine how well this model predicts. Let’s start with the simplest thing… we can make a simply tally of .pred and class.\n\ncount(train_pred, .pred, class)\n\n# A tibble: 4 × 3\n  .pred      class          n\n  &lt;fct&gt;      &lt;fct&gt;      &lt;int&gt;\n1 presence   presence    1394\n2 presence   background   340\n3 background presence     446\n4 background background  3273\n\n\nThere false positives and false negatives, but many are correct. Of course, this is predicting with the very data we used to train the model; knowing that this is predicicting on training data with some many misses might not inspire confidence. But let’s explore more.",
+    "objectID": "C01_observations.html#month",
+    "href": "C01_observations.html#month",
+    "title": "Observations",
+    "section": "5.5 month",
+    "text": "5.5 month\nWe will be making models and predictions for each month of the year for the 4 future projection climates. Species and observers do show some seasonality, but is that seasonality so extreme that it might be impossible to model some months because of sparse data? Let’s make a plot of the counts per month.\n\nggplot(data = obs,\n       mapping = aes(x = month)) + \n  geom_bar() + \n  labs(title = \"Counts per month\")\n\n\n\n\n\n\n\n\nOh, rats! By default ggplot plots in alpha-numeric order, which scrambles our month order. To fix that we have to convert the month into a factor type while specifying the order of the factors, and we’ll use the mutate() function to help us.\n\nobs = obs |&gt;\n  mutate(month = factor(month, levels = month.abb))\n\nggplot(data = obs,\n       mapping = aes(x = month)) + \n  geom_bar() + \n  labs(title = \"Counts per month\")\n\n\n\n\n\n\n\n\nThat’s better! So, it may be that for Mola mola we might not be able to successfully model in the cold winter months. That’s good to keep in mind.",
     "crumbs": [
-      "Models"
+      "Observations"
     ]
   },
   {
-    "objectID": "C04_models.html#assess-the-model",
-    "href": "C04_models.html#assess-the-model",
-    "title": "Prediction",
-    "section": "8.2 Assess the model",
-    "text": "8.2 Assess the model\nHewre we walk through a number of common assessment tools. We want to assess a model to ascertain how closely it models reality (or not!) Using the tools is always easy, interpreting the metrics is not always easy.\n\n8.2.1 Confusion matrix\nThe confusion matrix is the next step beyond a simple tally that we made above.\n\ntrain_confmat = conf_mat(train_pred, class, .pred)\ntrain_confmat\n\n            Truth\nPrediction   presence background\n  presence       1394        340\n  background      446       3273\n\n\nYou’ll see this is the same as the simple tally we made, but it comes with handy plotting functionality (shown below). Note that a perfect model would have the upper left and lower right quadrants fully accounting for all points. The lower left quadrant shows us the number of false-negatives while the upper right quadrant shows the number of false-positives.\n\nautoplot(train_confmat, type = \"heatmap\")\n\n\n\n\n\n\n\n\n\n\n8.2.2 ROC and AUC\nThe area under the curve (AUC) of the receiver-operator curve (ROC) is a common metric. AUC values range form 0-1 with 1 reflecting a model that faithfully predicts correctly. Technically an AUC value of 0.5 represents a random model (yup, the result of a coin flip!), so values greater than 0.5 and less than 1.0 are expected.\nFirst we can plot the ROC.\n\nplot_roc(train_pred, class, .pred_presence)\n\n\n\n\n\n\n\n\nWe can assure you from practical experience that this is an atypical ROC. Typically they are not smooth, but this smoothness is an artifact of our use of training data. If you really only need the AUC, you can use the roc_auc() function directly.\n\nroc_auc(train_pred, class,  .pred_presence)\n\n# A tibble: 1 × 3\n  .metric .estimator .estimate\n  &lt;chr&gt;   &lt;chr&gt;          &lt;dbl&gt;\n1 roc_auc binary         0.942\n\n\n\n\n8.2.3 Accuracy\nAccuracy, much like our simple tally above, tells us what fraction of the predictions are correct. 
Not that here we explicitly provide the predicted class label (not the probability.)\n\naccuracy(train_pred, class, .pred)\n\n# A tibble: 1 × 3\n  .metric  .estimator .estimate\n  &lt;chr&gt;    &lt;chr&gt;          &lt;dbl&gt;\n1 accuracy binary         0.856\n\n\n\n\n8.2.4 Partial dependence plot\nPartial dependence reflects the relative contrubution of each variable influence over it’s full range of values. The output is a grid grid of plots showing the relative distribution of the variable (bars) as well as the relative influenceof the variable (line).\n\npartial_dependence_plot(fitted_wflow, data = tr_data)",
+    "objectID": "C01_observations.html#geometry",
+    "href": "C01_observations.html#geometry",
+    "title": "Observations",
+    "section": "5.6 geometry",
+    "text": "5.6 geometry\nLast, but certainly not least, we should consider the possibility that some observations might be on shore. It happens! We already know that some records included fish that were washed up on shore. It’s possible someone mis-keyed the longitude or latitude when entering the values into the database. It’s also possible that some observations fall just outside the areas where the Brickman data has values. To look for these points, we’ll load the Brickman mask (it defines land vs water; well, really it defines data vs no-data), and use that for further filtering.\nWe need to load the Brickman database, and then filter it for the static variable called “mask”.\n\ndb = brickman_database() |&gt;\n  filter(scenario == \"STATIC\", var == \"mask\")\nmask = read_brickman(db, add_depth = FALSE)\nmask\n\nstars object with 2 dimensions and 1 attribute\nattribute(s):\n      Min. 1st Qu. Median Mean 3rd Qu. Max. NA's\nmask     1       1      1    1       1    1 4983\ndimension(s):\n  from  to offset    delta refsys point x/y\nx    1 121 -74.93  0.08226 WGS 84 FALSE [x]\ny    1  89  46.08 -0.08226 WGS 84 FALSE [y]\n\n\nLet’s see what our mask looks like with the observations drizzled on top. The mask only has values of 1 (data) or NA (no-data). You’ll note that we only want to plot the locations of the observations, so we strip obs of everything except its geometry.\n\nplot(mask, breaks = \"equal\", axes = TRUE, reset = FALSE)\nplot(st_geometry(obs), pch = \".\", add = TRUE)\n\n\n\n\n\n\n\n\nMaybe with proper squinting we can see some that fall into no-data areas. 
The sure-fire way to tell is to extract the mask values at the point locations.\n\nhitOrMiss = extract_brickman(mask, obs)\nhitOrMiss\n\n# A tibble: 9,203 × 3\n   point name  value\n   &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt;\n 1 p0001 mask      1\n 2 p0002 mask      1\n 3 p0003 mask      1\n 4 p0004 mask      1\n 5 p0005 mask      1\n 6 p0006 mask      1\n 7 p0007 mask      1\n 8 p0008 mask      1\n 9 p0009 mask      1\n10 p0010 mask      1\n# ℹ 9,193 more rows\n\n\nOK, let’s tally the “value” variable.\n\ncount(hitOrMiss, value)\n\n# A tibble: 2 × 2\n  value     n\n  &lt;dbl&gt; &lt;int&gt;\n1     1  9170\n2    NA    33\n\n\nOoooo, 33 records in obs don’t line up with values in the mask (or in any Brickman data). We should filter those out; we’ll do so with a filter(). Note that we are “reaching” into the hitOrMiss table to access the value column when we use hitOrMiss$value. Let’s figure out how many records we have dropped with all of this filtering.\n\nobs = obs |&gt;\n  filter(!is.na(hitOrMiss$value))\ndim_end = dim(obs)\n\ndropped_records = dim_start[1] - dim_end[1]\ndropped_records\n\n[1] 356\n\n\nSo, we dropped 356 records which is about 3.7% of the raw OBIS data. Is it worth all that to drop just 4% of the data? Yes! Models are like all things computer… if you put garbage in you should expect to get garbage back out.",
     "crumbs": [
-      "Models"
+      "Observations"
     ]
   },
   {
-    "objectID": "C04_models.html#predict-with-the-testing-data",
-    "href": "C04_models.html#predict-with-the-testing-data",
-    "title": "Prediction",
-    "section": "8.3 Predict with the testing data",
-    "text": "8.3 Predict with the testing data\nFinally, we can repeat these steps with the testing data. This should give use better information than using the training data\n\n8.3.1 Predict\n\ntest_data = testing(split_data)\ntest_pred = predict_table(fitted_wflow, test_data, type = \"prob\")\ntest_pred\n\n# A tibble: 1,819 × 4\n   .pred_presence .pred_background .pred      class   \n            &lt;dbl&gt;            &lt;dbl&gt; &lt;fct&gt;      &lt;fct&gt;   \n 1          0.724           0.276  presence   presence\n 2          0.335           0.665  background presence\n 3          0.550           0.450  presence   presence\n 4          0.480           0.520  background presence\n 5          0.872           0.128  presence   presence\n 6          0.974           0.0262 presence   presence\n 7          0.166           0.834  background presence\n 8          0.194           0.806  background presence\n 9          0.787           0.213  presence   presence\n10          0.615           0.385  presence   presence\n# ℹ 1,809 more rows\n\n\n\n\n8.3.2 Confusion matrix\n\ntest_confmat = conf_mat(test_pred, class, .pred)\nautoplot(test_confmat, type = \"heatmap\")\n\n\n\n\n\n\n\n\n\n\n8.3.3 ROC/AUC\n\nplot_roc(test_pred, class, .pred_presence)\n\n\n\n\n\n\n\n\nThis ROC is more typical of what we see in regular practice.\n\n\n8.3.4 Accuracy\n\naccuracy(test_pred, class, .pred)\n\n# A tibble: 1 × 3\n  .metric  .estimator .estimate\n  &lt;chr&gt;    &lt;chr&gt;          &lt;dbl&gt;\n1 accuracy binary         0.688\n\n\n\n\n8.3.5 Partial Dependence\n\npartial_dependence_plot(fitted_wflow, data = test_data)",
+    "objectID": "C03_covariates.html",
+    "href": "C03_covariates.html",
+    "title": "Covariates",
+    "section": "",
+    "text": "“In the end that was the choice you made, and it doesn’t matter how hard it was to make it. It matters that you did.”\n\nCassandra Clare\nNow we turn our attention to what we know and guess about the environments. We are using the Brickman data to make habitat suitability maps for select species under two climate scenarios (RCP45 and RCP85) at two different times (2055 and 2075) in the future. Each variable we might use is called a covariate or predictor. Our covariates are nicely packaged up and tidy, but the reality is that it often requires a good deal of data wrangling if the data are messy.\nOur step here is to make sure that two or more covariates are not highly correlated; if they are, then we would likely want to drop all but one.",
     "crumbs": [
-      "Models"
+      "Covariates"
     ]
   },
   {
-    "objectID": "C04_models.html#nowcast",
-    "href": "C04_models.html#nowcast",
-    "title": "Prediction",
-    "section": "15.1 Nowcast",
-    "text": "15.1 Nowcast\nFirst make the prediction. The function yields a stars array object that has three attributes: .pred_presence, .pred_background and .pred. The leading dot simply gives us the heads up that these three values are all computed. The first two range from 0-1 which implies a probability. The last, .pred, is the class label we would assign if we accept that any .pred_presence &gt;= 0.5 should be considered suitable habitat where a reported observation might occur.\nnowcast = predict_stars(wflow, covars)\nnowcast\nNow we can plot what is often called a “habitat suitability index” (hsi) map.\ncoast = read_coastline()\nplot(nowcast['.pred_presence'], main = \"Nowcast August\", \n     axes = TRUE, breaks = seq(0, 1, by = 0.1), reset = FALSE)\nplot(coast, col = \"orange\", lwd = 2, add = TRUE)\nWe can also plot a presence/background labeled map, but keep in mind it is just a thresholded version of the above where “presence” means .pred_presence &gt;= 0.5.\nplot(nowcast['.pred'], main = \"Nowcast August Labels\", \n     axes = TRUE, reset = FALSE)\nplot(coast, col = \"black\", lwd = 2, add = TRUE)",
+    "objectID": "C03_covariates.html#reading-in-the-covariates",
+    "href": "C03_covariates.html#reading-in-the-covariates",
+    "title": "Covariates",
+    "section": "2.1 Reading in the covariates",
+    "text": "2.1 Reading in the covariates\nWe’ll read in the Brickman database, then filter two different subsets to read: the “STATIC” covariate bathymetry that applies across all scenarios and times, and monthly covariates for the “PRESENT” period. Note that depth is automatically included - that’s an option - see ?read_brickman for more information.\n\ndb = brickman_database()\npresent = read_brickman(filter(db, scenario == \"PRESENT\", interval == \"mon\"))\n\nWe have used August before as our example; let’s continue with August.\n\naug = present |&gt;\n  dplyr::slice(\"month\", \"Aug\")",
     "crumbs": [
-      "Models"
+      "Covariates"
     ]
   },
   {
-    "objectID": "C04_models.html#forecast",
-    "href": "C04_models.html#forecast",
-    "title": "Prediction",
-    "section": "15.2 Forecast",
-    "text": "15.2 Forecast\nNow let’s try our hand at forecasting - let’s try RCP85 in 2075. First we load those parameters, then run the prediction and plot.\ncovars_rcp85_2075 = read_brickman(db |&gt; filter(scenario == \"RCP85\", year == 2075, interval == \"mon\")) |&gt;\n  select(all_of(cfg$keep_vars)) |&gt;\n  slice(\"month\", \"Aug\") \nforecast_2075 = predict_stars(wflow, covars_rcp85_2075)\nforecast_2075\ncoast = read_coastline()\nplot(forecast_2075['.pred_presence'], main = \"RCP85 2075 August\", \n     axes = TRUE, breaks = seq(0, 1, by = 0.1), reset = FALSE)\nplot(coast, col = \"orange\", lwd = 2, add = TRUE)\nHmmm, that’s pretty different than what the nowcast predicts.",
+    "objectID": "C03_covariates.html#make-a-pairs-plot",
+    "href": "C03_covariates.html#make-a-pairs-plot",
+    "title": "Covariates",
+    "section": "2.2 Make a pairs plot",
+    "text": "2.2 Make a pairs plot\nA pairs plot is a plot often used in exploratory data analysis. It makes a grid of mini-plots of a set of variables, and reveals the relationships among the variables pair-by-pair. It’s easy to make.\n\npairs(aug)\n\n\n\n\n\n\n\n\nIn the lower left portion of the plot we see paired scatter plots, at upper right we see the correlation values of the pairs, and along the diagonal we see a histogram of each variable. Some pairs are highly correlated, say over 0.7, and to include both in the modeling might not provide us with greater predictive power. It may feel counterintuitive to remove any variables - more data means more information, right? And more information means more informed models. Consider two measurements, human arm length and inseam. We might use these to predict if a person is tall, but since they are probably strongly collinear/correlated do we really need both?",
     "crumbs": [
-      "Models"
+      "Covariates"
     ]
   },
   {
-    "objectID": "C04_models.html#forecast-2055",
-    "href": "C04_models.html#forecast-2055",
-    "title": "Prediction",
-    "section": "16.1 Forecast 2055",
-    "text": "16.1 Forecast 2055\ncovars_rcp85_2055 = read_brickman(db |&gt; filter(scenario == \"RCP85\", year == 2055, interval == \"mon\")) |&gt;\n  select(all_of(cfg$keep_vars)) |&gt;\n  slice(\"month\", \"Aug\") \nforecast_2055 = predict_stars(wflow, covars_rcp85_2055)\nforecast_2055",
+    "objectID": "C03_covariates.html#identify-the-most-independent-variables-and-the-most-collinear",
+    "href": "C03_covariates.html#identify-the-most-independent-variables-and-the-most-collinear",
+    "title": "Covariates",
+    "section": "2.3 Identify the most independent variables (and the most collinear)",
+    "text": "2.3 Identify the most independent variables (and the most collinear)\nWe have a function that can help us select which variables to remove. filter_collinear() returns a listing of variables it suggests we keep. It attaches to the return value an attribute (like a post-it note stuck on a box) that lists the complementary variables that it suggests we drop. We are choosing a particular method, but you can learn more using R’s help with ?filter_collinear.\n\nkeep = filter_collinear(aug, method = \"vif_step\")\nkeep\n\n[1] \"MLD\"  \"Sbtm\" \"SSS\"  \"SST\"  \"Tbtm\" \"U\"    \"V\"   \nattr(,\"to_remove\")\n[1] \"Xbtm\"  \"depth\"\n\n\nOf course, we can decide to ignore this advice, and pick whichever ones we want, including keeping them all.\nWhatever selection of variables we decide to model with, we will save this listing to a file. That way we can refer to it programmatically. But that comes later.",
     "crumbs": [
-      "Models"
+      "Covariates"
     ]
   },
   {
-    "objectID": "C04_models.html#bind-time-series",
-    "href": "C04_models.html#bind-time-series",
-    "title": "Prediction",
-    "section": "16.2 Bind time series",
-    "text": "16.2 Bind time series\nWe want to bind the .pred_presence attribute for each of the predictions (nowcast, forecast_2055 and forecast_2075). Let’s assume the “present” mean 2020 so we can assign a year.\nrcp85 = c(nowcast, forecast_2055, forecast_2075, along = list(year = c(\"2020\", \"2055\", \"2075\")))\n\n\n\n\n\n\nNote\n\n\n\nCurious about we provide year as a vector of characters instead of a vector of integers? Try running the command above again and check out the 3rd dimension.\n\n\nSince we are plotting multiple arrays, we need to plot the coastline using a “hook” function.\nplot_coast = function(){\n  plot(coast, col = \"orange\", lwd = 2, add = TRUE)\n}\n\nplot(rcp85['.pred_presence'], \n     hook = plot_coast,\n     axes = TRUE, breaks = seq(0, 1, by = 0.1), join_zlim  = TRUE, reset = FALSE)\nHmmmm. Why does there seem to be a strong shift between 2020 and 2055, while the 2055 to 2075 shift seems less pronounced?\n\n\n\n\n\n\nNote\n\n\n\nDon’t forget that there are other ways to plot array based spatial data.",
+    "objectID": "C03_covariates.html#a-closer-look-at-the-model-input-data",
+    "href": "C03_covariates.html#a-closer-look-at-the-model-input-data",
+    "title": "Covariates",
+    "section": "2.4 A closer look at the model input data",
+    "text": "2.4 A closer look at the model input data\nBefore we do commit to a selection of variables, let’s turn our attention back to our presence-background points, and look at just those chosen values rather than at values drawn from across the entire domain. Let’s open the file that contains the “greedy” model input for August during the PRESENT climate scenario.\n\nmodel_input = read_model_input(scientificname = \"Mola mola\", \n                               approach = \"greedy\", \n                               mon = \"Aug\")\nmodel_input\n\nSimple feature collection with 7277 features and 1 field\nGeometry type: POINT\nDimension:     XY\nBounding box:  xmin: -74.89169 ymin: 38.805 xmax: -65.02004 ymax: 45.21401\nGeodetic CRS:  WGS 84\n# A tibble: 7,277 × 2\n   class                    geom\n   &lt;chr&gt;             &lt;POINT [°]&gt;\n 1 presence    (-72.8074 39.056)\n 2 presence      (-71.343 40.52)\n 3 presence  (-68.7691 41.52448)\n 4 presence       (-67.79 43.32)\n 5 presence (-68.44324 42.61177)\n 6 presence    (-72.4328 40.213)\n 7 presence   (-71.8784 40.3569)\n 8 presence      (-65.78 43.195)\n 9 presence       (-70.5 42.767)\n10 presence   (-72.3024 40.1862)\n# ℹ 7,267 more rows\n\n\nNext we’ll extract data values from our August covariates.\n\nvariables = extract_brickman(aug, model_input, form = \"wide\")\nvariables\n\n# A tibble: 7,277 × 10\n   point   MLD  Sbtm   SSS   SST  Tbtm         U         V     Xbtm depth\n   &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt; &lt;dbl&gt;\n 1 p0001  5.17  35.0  31.6  23.3  7.50 -0.00161  -0.00340  0.00133  304. \n 2 p0002  4.25  32.8  30.6  21.6  8.15 -0.00420  -0.00206  0.00166   71.6\n 3 p0003  4.64  34.0  30.7  20.2  7.05  0.00168   0.00148  0.000793 138. \n 4 p0004  5.58  34.6  30.7  18.8  7.55  0.00267  -0.000410 0.000957 234. \n 5 p0005  5.04  34.7  30.7  19.0  7.43 -0.00619  -0.00121  0.00224  205. 
\n 6 p0006  4.01  32.4  30.6  22.0  8.22 -0.00344  -0.000859 0.00126   62.6\n 7 p0007  4.10  32.9  30.5  21.8  8.34 -0.00565  -0.00226  0.00216   71.3\n 8 p0008  3.82  32.4  30.3  18.2  3.56 -0.00702  -0.00431  0.00293   81.6\n 9 p0009  3.20  32.4  30.6  17.9  5.73  0.000275 -0.00101  0.000372  70.6\n10 p0010  4.02  32.9  30.6  22.0  8.62 -0.000900 -0.00148  0.000614  64.9\n# ℹ 7,267 more rows\n\n\nWe are going to call a plotting function, plot_pres_vs_bg(), that wants some of the data from model_input and some of the data in variables. So, we have to do some data wrangling to combine those; we’ll add class to variables and then drop the point column.\n\nvariables = variables |&gt;\n  mutate(class = model_input$class) |&gt;    # the $ extracts a column \n  select(-point)                          # the - means \"deselect\" or \"drop\"\nvariables\n\n# A tibble: 7,277 × 10\n     MLD  Sbtm   SSS   SST  Tbtm         U         V     Xbtm depth class   \n   &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt;   \n 1  5.17  35.0  31.6  23.3  7.50 -0.00161  -0.00340  0.00133  304.  presence\n 2  4.25  32.8  30.6  21.6  8.15 -0.00420  -0.00206  0.00166   71.6 presence\n 3  4.64  34.0  30.7  20.2  7.05  0.00168   0.00148  0.000793 138.  presence\n 4  5.58  34.6  30.7  18.8  7.55  0.00267  -0.000410 0.000957 234.  presence\n 5  5.04  34.7  30.7  19.0  7.43 -0.00619  -0.00121  0.00224  205.  
presence\n 6  4.01  32.4  30.6  22.0  8.22 -0.00344  -0.000859 0.00126   62.6 presence\n 7  4.10  32.9  30.5  21.8  8.34 -0.00565  -0.00226  0.00216   71.3 presence\n 8  3.82  32.4  30.3  18.2  3.56 -0.00702  -0.00431  0.00293   81.6 presence\n 9  3.20  32.4  30.6  17.9  5.73  0.000275 -0.00101  0.000372  70.6 presence\n10  4.02  32.9  30.6  22.0  8.62 -0.000900 -0.00148  0.000614  64.9 presence\n# ℹ 7,267 more rows\n\n\nFinally, we can make a specialized plot comparing our variables for each class: presence and background.\n\nplot_pres_vs_bg(variables, \"class\")\n\n\n\n\n\n\n\n\nHow does this inform our thinking about reducing the number of variables? For which variables do presence and background values mirror each other? Which have the least overlap? We know that the model works by finding optimal combinations of covariates for the species. If there is never a difference between the conditions for presences and background, then how will it find the optimal niche conditions?",
     "crumbs": [
-      "Models"
+      "Covariates"
     ]
   },
   {
-    "objectID": "C04_models.html#save-the-predictions",
-    "href": "C04_models.html#save-the-predictions",
-    "title": "Prediction",
-    "section": "16.3 Save the predictions",
-    "text": "16.3 Save the predictions\nWe could save all three attributes, but .pred_background is just 1 - .pred_presence, and .pred is just coding “presence” where .pred_presence &gt;= 0.5, so we can always compute those as needed if we have .pred_presence. In that case, let’s just save the first attribute, .pred_presence, in a multilayer GeoTIFF formatted image array file. The write_prediction() function will do just that.\n# make sure the output directory exists\npath = data_path(\"predictions\")\nif (!dir.exists(path)) ok = dir.create(path, recursive = TRUE)\n\n# write individual arrays?\nwrite_prediction(nowcast, file = file.path(path,\"g_Aug_RCP85_2020.tif\"))\nwrite_prediction(forecast_2055, file = file.path(path, \"g_Aug_RCP85_2055.tif\"))\nwrite_prediction(forecast_2075, file = file.path(path, \"g_Aug_RCP85_2075.tif\"))\n\n# or write them together in a \"multi-layer\" file?\nwrite_prediction(rcp85, file = file.path(path, \"g_Aug_RCP85_all.tif\"))\nTo read it back simply provide the filename to read_prediction(). If you are reading back a multi-layer array, be sure to check out the time argument to assign values to the time dimension. Single layer arrays don’t have the concept of time so the time argument is ignored.",
+    "objectID": "C03_covariates.html#saving-a-file-to-keep-track-of-modeling-choices",
+    "href": "C03_covariates.html#saving-a-file-to-keep-track-of-modeling-choices",
+    "title": "Covariates",
+    "section": "2.5 Saving a file to keep track of modeling choices",
+    "text": "2.5 Saving a file to keep track of modeling choices\nYou may have noticed that we write a lot of things to files (aka, “writing to disk”). It’s a useful practice, especially when working with a multi-step process. One particular file, a configuration file, is used frequently in data science to store information about the choices we make as we work through our project. Configuration files generally are simple text files that we can easily get the computer to read and write.\nIn R, a configuration is treated as a named list. Each element of a list is named, but beyond that there aren’t any particular rules about configurations. You can learn more about configurations in this tutorial.\nLet’s make a configuration list that holds 5 items: version identifier, species name, sampling approach, month and the names of the variables to model with.\n\ncfg = list(\n  version = \"g_Aug\",               # g for greedy!\n  scientificname = \"Mola mola\",\n  approach = \"greedy\",\n  mon = \"Aug\",\n  keep_vars =  keep)\n\nWe can access elements (by name or by position) three ways using what is called “indexing”: using the [[ indexing brackets, using the $ indexing operator or using the getElement() function.\n\ncfg[['scientificname']]\n\n[1] \"Mola mola\"\n\ncfg[[2]]\n\n[1] \"Mola mola\"\n\ncfg$scientificname\n\n[1] \"Mola mola\"\n\ngetElement(cfg, \"scientificname\")\n\n[1] \"Mola mola\"\n\ngetElement(cfg, 2)\n\n[1] \"Mola mola\"\n\n\nNow we’ll write this list to a file. First let’s set up a pathway where we might store these configurations, and for that matter, to store our modeling files. We’ll make a new directory, models/g008 and write the configuration there. We’ll use the famous “YAML” format to store the file. See the file functions/configuration.R for documentation on reading and writing.\n\nok = make_path(data_path(\"models\")) # make a directory for models\nwrite_configuration(cfg)            \n\nUse the Files pane to navigate to your personal data directory. 
Open the g_Aug.yaml file - this is what your configuration looks like in YAML. Fortunately, we rarely need to edit these by hand.",
     "crumbs": [
-      "Models"
+      "Covariates"
     ]
   },
   {
@@ -354,7 +354,7 @@
     "href": "C05_prediction.html#nowcast",
     "title": "Prediction",
     "section": "4.1 Nowcast",
-    "text": "4.1 Nowcast\nFirst make the prediction. The function yields a stars array object that has three attributes: .pred_presence, .pred_background and .pred. The leading dot simply gives us the heads up that these three values are all computed. The first two range from 0-1 which implies a probability. The last, .pred, is the class label we would assign if we accept that any .pred_presence &gt;= 0.5 should be considered suitable habitat where a reported observation might occur.\n\nnowcast = predict_stars(wflow, covars)\nnowcast\n\nstars object with 2 dimensions and 3 attributes\nattribute(s):\n .pred_presence  .pred_background         .pred     \n Min.   :0.000   Min.   :0.003     presence  : 618  \n 1st Qu.:0.031   1st Qu.:0.743     background:5168  \n Median :0.092   Median :0.908     NA's      :4983  \n Mean   :0.183   Mean   :0.817                      \n 3rd Qu.:0.257   3rd Qu.:0.969                      \n Max.   :0.997   Max.   :1.000                      \n NA's   :4983    NA's   :4983                       \ndimension(s):\n  from  to offset    delta refsys point x/y\nx    1 121 -74.93  0.08226 WGS 84 FALSE [x]\ny    1  89  46.08 -0.08226 WGS 84 FALSE [y]\n\n\nNow we can plot what is often called a “habitat suitability index” (hsi) map.\n\ncoast = read_coastline()\nplot(nowcast['.pred_presence'], main = \"Nowcast August\", \n     axes = TRUE, breaks = seq(0, 1, by = 0.1), reset = FALSE)\nplot(coast, col = \"orange\", lwd = 2, add = TRUE)\n\n\n\n\n\n\n\n\nWe can also plot a presence/background labeled map, but keep in mind it is just a thresholded version of the above where “presence” means .pred_presence &gt;= 0.5.\n\nplot(nowcast['.pred'], main = \"Nowcast August Labels\", \n     axes = TRUE, reset = FALSE)\nplot(coast, col = \"black\", lwd = 2, add = TRUE)",
+    "text": "4.1 Nowcast\nFirst make the prediction. The function yields a stars array object that has three attributes: .pred_presence, .pred_background and .pred. The leading dot simply gives us the heads up that these three values are all computed. The first two range from 0-1 which implies a probability. The last, .pred, is the class label we would assign if we accept that any .pred_presence &gt;= 0.5 should be considered suitable habitat where a reported observation might occur.\n\nnowcast = predict_stars(wflow, covars)\nnowcast\n\nstars object with 2 dimensions and 3 attributes\nattribute(s):\n .pred_presence  .pred_background         .pred     \n Min.   :0.000   Min.   :0.000     presence  : 584  \n 1st Qu.:0.035   1st Qu.:0.748     background:5202  \n Median :0.097   Median :0.903     NA's      :4983  \n Mean   :0.184   Mean   :0.816                      \n 3rd Qu.:0.252   3rd Qu.:0.965                      \n Max.   :1.000   Max.   :1.000                      \n NA's   :4983    NA's   :4983                       \ndimension(s):\n  from  to offset    delta refsys point x/y\nx    1 121 -74.93  0.08226 WGS 84 FALSE [x]\ny    1  89  46.08 -0.08226 WGS 84 FALSE [y]\n\n\nNow we can plot what is often called a “habitat suitability index” (hsi) map.\n\ncoast = read_coastline()\nplot(nowcast['.pred_presence'], main = \"Nowcast August\", \n     axes = TRUE, breaks = seq(0, 1, by = 0.1), reset = FALSE)\nplot(coast, col = \"orange\", lwd = 2, add = TRUE)\n\n\n\n\n\n\n\n\nWe can also plot a presence/background labeled map, but keep in mind it is just a thresholded version of the above where “presence” means .pred_presence &gt;= 0.5.\n\nplot(nowcast['.pred'], main = \"Nowcast August Labels\", \n     axes = TRUE, reset = FALSE)\nplot(coast, col = \"black\", lwd = 2, add = TRUE)",
     "crumbs": [
       "Prediction"
     ]
@@ -364,7 +364,7 @@
     "href": "C05_prediction.html#forecast",
     "title": "Prediction",
     "section": "4.2 Forecast",
-    "text": "4.2 Forecast\nNow let’s try our hand at forecasting - let’s try RCP85 in 2075. First we load those parameters, then run the prediction and plot.\n\ncovars_rcp85_2075 = read_brickman(db |&gt; filter(scenario == \"RCP85\", year == 2075, interval == \"mon\")) |&gt;\n  select(all_of(cfg$keep_vars)) |&gt;\n  slice(\"month\", \"Aug\") \n\n\nforecast_2075 = predict_stars(wflow, covars_rcp85_2075)\nforecast_2075\n\nstars object with 2 dimensions and 3 attributes\nattribute(s):\n .pred_presence  .pred_background         .pred     \n Min.   :0.000   Min.   :0.302     presence  :  37  \n 1st Qu.:0.137   1st Qu.:0.689     background:5749  \n Median :0.257   Median :0.743     NA's      :4983  \n Mean   :0.228   Mean   :0.772                      \n 3rd Qu.:0.311   3rd Qu.:0.863                      \n Max.   :0.698   Max.   :1.000                      \n NA's   :4983    NA's   :4983                       \ndimension(s):\n  from  to offset    delta refsys point x/y\nx    1 121 -74.93  0.08226 WGS 84 FALSE [x]\ny    1  89  46.08 -0.08226 WGS 84 FALSE [y]\n\n\n\ncoast = read_coastline()\nplot(forecast_2075['.pred_presence'], main = \"RCP85 2075 August\", \n     axes = TRUE, breaks = seq(0, 1, by = 0.1), reset = FALSE)\nplot(coast, col = \"orange\", lwd = 2, add = TRUE)\n\n\n\n\n\n\n\n\nHmmm, that’s pretty different than what the nowcast predicts.",
+    "text": "4.2 Forecast\nNow let’s try our hand at forecasting - let’s try RCP85 in 2075. First we load those parameters, then run the prediction and plot.\n\ncovars_rcp85_2075 = read_brickman(db |&gt; filter(scenario == \"RCP85\", year == 2075, interval == \"mon\")) |&gt;\n  select(all_of(cfg$keep_vars)) |&gt;\n  slice(\"month\", \"Aug\") \n\n\nforecast_2075 = predict_stars(wflow, covars_rcp85_2075)\nforecast_2075\n\nstars object with 2 dimensions and 3 attributes\nattribute(s):\n .pred_presence  .pred_background         .pred     \n Min.   :0.000   Min.   :0.323     presence  :  36  \n 1st Qu.:0.141   1st Qu.:0.688     background:5750  \n Median :0.260   Median :0.740     NA's      :4983  \n Mean   :0.230   Mean   :0.770                      \n 3rd Qu.:0.312   3rd Qu.:0.859                      \n Max.   :0.677   Max.   :1.000                      \n NA's   :4983    NA's   :4983                       \ndimension(s):\n  from  to offset    delta refsys point x/y\nx    1 121 -74.93  0.08226 WGS 84 FALSE [x]\ny    1  89  46.08 -0.08226 WGS 84 FALSE [y]\n\n\n\ncoast = read_coastline()\nplot(forecast_2075['.pred_presence'], main = \"RCP85 2075 August\", \n     axes = TRUE, breaks = seq(0, 1, by = 0.1), reset = FALSE)\nplot(coast, col = \"orange\", lwd = 2, add = TRUE)\n\n\n\n\n\n\n\n\nHmmm, that’s pretty different than what the nowcast predicts.",
     "crumbs": [
       "Prediction"
     ]
@@ -374,7 +374,7 @@
     "href": "C05_prediction.html#forecast-2055",
     "title": "Prediction",
     "section": "5.1 Forecast 2055",
-    "text": "5.1 Forecast 2055\n\ncovars_rcp85_2055 = read_brickman(db |&gt; filter(scenario == \"RCP85\", year == 2055, interval == \"mon\")) |&gt;\n  select(all_of(cfg$keep_vars)) |&gt;\n  slice(\"month\", \"Aug\") \nforecast_2055 = predict_stars(wflow, covars_rcp85_2055)\nforecast_2055\n\nstars object with 2 dimensions and 3 attributes\nattribute(s):\n .pred_presence  .pred_background         .pred     \n Min.   :0.000   Min.   :0.447     presence  :   6  \n 1st Qu.:0.122   1st Qu.:0.694     background:5780  \n Median :0.251   Median :0.749     NA's      :4983  \n Mean   :0.221   Mean   :0.779                      \n 3rd Qu.:0.306   3rd Qu.:0.878                      \n Max.   :0.553   Max.   :1.000                      \n NA's   :4983    NA's   :4983                       \ndimension(s):\n  from  to offset    delta refsys point x/y\nx    1 121 -74.93  0.08226 WGS 84 FALSE [x]\ny    1  89  46.08 -0.08226 WGS 84 FALSE [y]",
+    "text": "5.1 Forecast 2055\n\ncovars_rcp85_2055 = read_brickman(db |&gt; filter(scenario == \"RCP85\", year == 2055, interval == \"mon\")) |&gt;\n  select(all_of(cfg$keep_vars)) |&gt;\n  slice(\"month\", \"Aug\") \nforecast_2055 = predict_stars(wflow, covars_rcp85_2055)\nforecast_2055\n\nstars object with 2 dimensions and 3 attributes\nattribute(s):\n .pred_presence  .pred_background         .pred     \n Min.   :0.000   Min.   :0.425     presence  :  31  \n 1st Qu.:0.135   1st Qu.:0.682     background:5755  \n Median :0.263   Median :0.737     NA's      :4983  \n Mean   :0.231   Mean   :0.769                      \n 3rd Qu.:0.318   3rd Qu.:0.865                      \n Max.   :0.575   Max.   :1.000                      \n NA's   :4983    NA's   :4983                       \ndimension(s):\n  from  to offset    delta refsys point x/y\nx    1 121 -74.93  0.08226 WGS 84 FALSE [x]\ny    1  89  46.08 -0.08226 WGS 84 FALSE [y]",
     "crumbs": [
       "Prediction"
     ]
@@ -398,75 +398,5 @@
     "crumbs": [
       "Prediction"
     ]
-  },
-  {
-    "objectID": "C00_coding.html",
-    "href": "C00_coding.html",
-    "title": "Coding",
-    "section": "",
-    "text": "Coding is the practice of writing instructions for computers to follow; computers aren’t clever by themselves - they need to be told what to do. Most coding is text-based; people writing coding instructions into simple text documents. But some coding is graphical or visual. We shall be using text-based coding. We are going to use a free and open source general programming language called R. R programming language has its roots in statistics and science, but it really can be used for anything.\nIn the early days, coding was pretty barebones - all one needed was a text editor and access to the programming language - no frills there, no pretty images, no buttons to push, just typing. As time passed, volunteers added niceties to the text editor, like visualizing plots of data, buttons to save files, colorized text for the typed code, and other bells and whistles. These editors became know as graphical user interfaces (GUI for short.) GUIs keep getting easier and easier for people to use. We will use the GUI known as RStudio. It’s best to think of GUIs as wrappers around the core programming language; they are really nice and pretty, but they can’t do math. The programming language itself (which does do math!), evolved only as it needed to to fix bugs and make general improvements.",
-    "crumbs": [
-      "Coding"
-    ]
-  },
-  {
-    "objectID": "C00_coding.html#loading-the-necessary-tools",
-    "href": "C00_coding.html#loading-the-necessary-tools",
-    "title": "Coding",
-    "section": "4.1 Loading the necessary tools",
-    "text": "4.1 Loading the necessary tools\nFor any coding project you will need to access a select number of tools, often stored on your computer in what is called a package library (it’s just a directory/folder really). When the package is loaded from the library, all of the functionality the author built in to that package is exposed for you to use in your project. We have created a single file that will both install (if needed) and load (if not already loaded) each of these packages. It’s easy to run.\nFirst, make sure that you have loaded the project (File &gt; Open Project) if you haven’t already. Then at the R console pane type the following…\n\nsource(\"setup.R\")\n\nAfter a few moments the command prompt will return to focus. Be sure to run that command at the beginning of every new R session or anytime you are adding new functionality.\nNow we are ready to load some data into your R session.",
-    "crumbs": [
-      "Coding"
-    ]
-  },
-  {
-    "objectID": "C00_coding.html#spatial-data",
-    "href": "C00_coding.html#spatial-data",
-    "title": "Coding",
-    "section": "4.2 Spatial data",
-    "text": "4.2 Spatial data\nSpatial data is any data that has been assigned to a location on a planet (or even between planets!); that means environmental data is mapped to locations on oblate spheroids (like Earth). The oblate spheroid shape presents interesting but challenging math to the data scientist. Modern spatial data is designed to make data science easier by handling all of the location information in a discrete and standardized manner. By discrete we mean that we don’t have to sweat the details.\n\n4.2.1 Point data\nMany spatial data sets come as point data - locations (longitude, latitude and maybe altitude/depth and/or time) with one or more measurements (temperature, cloudiness, probability of precipitation, abundance of fish, population density, etc) attached to that point. Here is an example of point data about long-term oceanographic monitoring buoys in the Gulf of Maine (“gom”). We’ll read the buoy data into a variable, buoy. Next we can print the result simply by typing the name (or you could type print(buoys) if you like all the extra typing.)\n\nbuoys = gom_buoys()\nbuoys\n\nSimple feature collection with 6 features and 3 fields\nGeometry type: POINT\nDimension:     XY\nBounding box:  xmin: -70.4277 ymin: 42.3233 xmax: -65.9267 ymax: 44.10163\nGeodetic CRS:  WGS 84\n# A tibble: 6 × 4\n  name  longname            id                geometry\n* &lt;chr&gt; &lt;chr&gt;               &lt;chr&gt;          &lt;POINT [°]&gt;\n1 wms   Western Maine Shelf B01    (-70.4277 43.18065)\n2 cms   Central Maine Shelf E01     (-69.3578 43.7148)\n3 pb    Penobscot Bay       F01   (-68.99689 44.05495)\n4 ems   Eastern Maine Shelf I01   (-68.11359 44.10163)\n5 jb    Jordan Basin        M01   (-67.88029 43.49041)\n6 nec   Northeast Channel   N01     (-65.9267 42.3233)\n\n\n\n\n\n\n\n\nNote\n\n\n\nYou can get the online documention for functions a couple of ways. You can type ?name_of_function, or or help(name_of_function). 
Try ?gom_buoys as an example.\nSometimes you need more - like seeing the function itself. You can always try typing the function name without any trailing parentheses.\n\ngom_buoys\n\nfunction (form = c(\"table\", \"sf\")[2]) \n{\n    x = structure(list(name = c(\"wms\", \"cms\", \"pb\", \"ems\", \"jb\", \n        \"nec\"), longname = c(\"Western Maine Shelf\", \"Central Maine Shelf\", \n        \"Penobscot Bay\", \"Eastern Maine Shelf\", \"Jordan Basin\", \n        \"Northeast Channel\"), id = c(\"B01\", \"E01\", \"F01\", \"I01\", \n        \"M01\", \"N01\"), lon = c(-70.4277, -69.3578, -68.99689, \n        -68.11359, -67.88029, -65.9267), lat = c(43.18065, 43.7148, \n        44.05495, 44.10163, 43.49041, 42.3233)), row.names = c(NA, \n        -6L), class = c(\"tbl_df\", \"tbl\", \"data.frame\"))\n    if (tolower(form[1]) == \"sf\") \n        x = sf::st_as_sf(x, coords = c(\"lon\", \"lat\"), crs = 4326)\n    x\n}\n&lt;bytecode: 0x7fe9513d2b48&gt;\n\n\nIf that still doesn’t work, we highly recommend trying Rseek.org which is an R-language specific search engine.\n\n\nSo there are 6 buoys, each with an attached attribute “name”, “longname” and “id”, as well as the spatial location datain the “geometry” column (just longitude and latitude in this case). We can easily plot these using the “name” column as a color key. For more on plotting spatial data, see this wiki page.\n\nplot(buoys['id'], axes = TRUE, pch = 16)\n\n\n\n\n\n\n\n\nWell, that’s pretty, but without a shoreline it lacks context.",
-    "crumbs": [
-      "Coding"
-    ]
-  },
-  {
-    "objectID": "C00_coding.html#linestrings-and-polygon-data",
-    "href": "C00_coding.html#linestrings-and-polygon-data",
-    "title": "Coding",
-    "section": "4.3 Linestrings and polygon data",
-    "text": "4.3 Linestrings and polygon data\nLinestrings (open shapes) and polygons (closed shape) are much like point data, except that each geometry is linestring or polygon. We have a set of polygons/linestring that represent the coastline.\n\ncoast = read_coastline()\ncoast\n\nSimple feature collection with 14 features and 0 fields\nGeometry type: MULTILINESTRING\nDimension:     XY\nBounding box:  xmin: -74.9 ymin: 38.95218 xmax: -65 ymax: 46.06477\nGeodetic CRS:  WGS 84\n# A tibble: 14 × 1\n                                                                            geom\n                                                           &lt;MULTILINESTRING [°]&gt;\n 1 ((-72.1019 41.01504, -72.15127 41.05146, -72.18389 41.04678, -72.28745 41.02…\n 2 ((-73.68745 45.56143, -73.85293 45.51572, -73.96055 45.44141, -73.92021 45.4…\n 3 ((-73.69531 45.5855, -73.57236 45.69448, -73.72466 45.67183, -73.85771 45.57…\n 4 ((-66.32412 44.25732, -66.27378 44.29229, -66.21035 44.39204, -66.25049 44.3…\n 5 ((-68.69077 44.24873, -68.70303 44.23198, -68.70171 44.18267, -68.66118 44.1…\n 6 ((-66.89707 44.62891, -66.7625 44.68179, -66.75337 44.70981, -66.74541 44.79…\n 7 ((-68.29941 44.45649, -68.34702 44.43037, -68.40947 44.36426, -68.41172 44.2…\n 8 ((-71.39307 41.46675, -71.36533 41.48525, -71.35449 41.54229, -71.36431 41.5…\n 9 ((-74.25049 39.52939, -74.1332 39.68076, -74.10674 39.74644, -74.25317 39.55…\n10 ((-74.18818 40.6146, -74.23589 40.5187, -74.18813 40.52285, -74.13853 40.541…\n11 ((-70.67373 41.44854, -70.7605 41.37358, -70.8292 41.35898, -70.7853 41.3274…\n12 ((-71.34624 41.46938, -71.29092 41.4646, -71.24141 41.49194, -71.23203 41.65…\n13 ((-70.0627 41.32847, -70.08662 41.31758, -70.23306 41.28633, -70.05508 41.24…\n14 ((-74.9 39.14709, -74.89702 39.14546, -74.9 39.1329), (-74.9 38.95218, -74.7…\n\n\nIn this case, each record of geometry is a “MULTILINESTRING”, which is a group of one or more linestrings. 
Note that no other variables are in this table - it’s just the geometry.\nLet’s plot these geometries, and add the points on top.\n\nplot(coast, col = \"orange\", lwd = 2, axes = TRUE, reset = FALSE,\n     main = \"Buoys in the Gulf of Maine\")\nplot(st_geometry(buoys), pch = 1, cex = 0.5, add = TRUE)\ntext(st_geometry(buoys), labels = buoys$id, cex = 0.7, adj = c(1,-0.1))",
-    "crumbs": [
-      "Coding"
-    ]
-  },
-  {
-    "objectID": "C00_coding.html#array-data-aka-raster-data",
-    "href": "C00_coding.html#array-data-aka-raster-data",
-    "title": "Coding",
-    "section": "4.4 Array data (aka raster data)",
-    "text": "4.4 Array data (aka raster data)\nOften spatial data comes in grids, like regular arrays of pixels. These are great for all sorts of data like satellite images, bathymetry maps and environmental modeling data. We’ll be working with environmental modeling data which we call “Brickman data”. You can learn more about Brickman data in the wiki. We’ll be glossing over the details here, but there’s lots of detail in the wiki.\nWe’ll read in the database that tracks 82 Brickman data files, and then immediately filter out the rows that define the “PRESENT” scenario (where present means 1982–2013) and monthly climatology models.\n\ndb = brickman_database() |&gt;\n  filter(scenario == \"PRESENT\", interval == \"mon\") # note the double '==', it's comparative\ndb\n\n# A tibble: 8 × 4\n  scenario year    interval var  \n  &lt;chr&gt;    &lt;chr&gt;   &lt;chr&gt;    &lt;chr&gt;\n1 PRESENT  PRESENT mon      MLD  \n2 PRESENT  PRESENT mon      Sbtm \n3 PRESENT  PRESENT mon      SSS  \n4 PRESENT  PRESENT mon      SST  \n5 PRESENT  PRESENT mon      Tbtm \n6 PRESENT  PRESENT mon      U    \n7 PRESENT  PRESENT mon      V    \n8 PRESENT  PRESENT mon      Xbtm \n\n\nIf you are wondering about filtering a table, be sure to check out the wiki on tabular data to get started.\nYou might be wondering what that |&gt; is doing. It is called a pipe, and it delivers the output of one function to the next function as the first parameter (aka argument). For example, brickman_database() produces a table, that table is immediately passed into filter() to choose rows that match our criteria.\nNow that we have the database listing just the records we want, we pass it to the read_brickman() function.\n\ncurrent = read_brickman(db)\ncurrent\n\nstars object with 3 dimensions and 9 attributes\nattribute(s):\n                Min.      1st Qu.        
Median          Mean      3rd Qu.\nMLD     1.011275e+00  5.583339810  15.967359543  18.910421492 2.809953e+01\nSbtm    2.324167e+01 32.136343956  34.232215881  33.507147254 3.491243e+01\nSSS     1.644333e+01 30.735633373  31.104771614  31.492407921 3.203519e+01\nSST    -7.826599e-01  6.434107542  12.359498501  12.151707840 1.763068e+01\nTbtm   -2.676387e-01  3.595118523   6.110801697   6.122372065 7.521761e+00\nU      -2.121380e-01 -0.010892980  -0.002634738  -0.010139401 7.229637e-04\nV      -1.883337e-01 -0.010722862  -0.002858645  -0.008474233 9.565173e-04\nXbtm    3.275602e-06  0.001458065   0.003088348   0.008360344 7.256525e-03\ndepth   5.000000e+00 60.258880615 145.012619019 923.313763739 1.704049e+03\n               Max.  NA's\nMLD    1.066982e+02 59796\nSbtm   3.515742e+01 59796\nSSS    3.559161e+01 59796\nSST    2.643147e+01 59796\nTbtm   2.460999e+01 59796\nU      7.469980e-02 59796\nV      5.264002e-02 59796\nXbtm   1.899681e-01 59796\ndepth  4.964409e+03 59796\ndimension(s):\n      from  to offset    delta refsys point      values x/y\nx        1 121 -74.93  0.08226 WGS 84 FALSE        NULL [x]\ny        1  89  46.08 -0.08226 WGS 84 FALSE        NULL [y]\nmonth    1  12     NA       NA     NA    NA Jan,...,Dec    \n\n\nThis loads quite a complex set of arrays, but they have spatial information attached in the dimensions section. The x and y dimensions represent longitude and latitude respectively. The 3rd dimension, month, is time based.\nHere we plot all 12 months of sea surface temperature, SST. Note the they all share the same color scale so that they are easy to compare.\n\nplot(current['SST'])\n\n\n\n\n\n\n\n\nJust as we are able to plot linestrings/polygons along side points, we can also plot these with arrays (rasters). 
To do this for one month (“Apr”) of one variable (“SSS”) we simply need to slice that data out of the current variable.\n\napril_sss = current['SSS'] |&gt;\n  slice(\"month\", \"Apr\")\napril_sss\n\nstars object with 2 dimensions and 1 attribute\nattribute(s):\n         Min. 1st Qu.   Median    Mean  3rd Qu.     Max. NA's\nSSS  16.44333 30.8342 31.10334 31.4641 31.93447 35.59161 4983\ndimension(s):\n  from  to offset    delta refsys point x/y\nx    1 121 -74.93  0.08226 WGS 84 FALSE [x]\ny    1  89  46.08 -0.08226 WGS 84 FALSE [y]\n\n\nThen it’s just plot, plot, plot.\n\nplot(april_sss, axes = TRUE, reset = FALSE)\nplot(st_geometry(coast), add = TRUE, col = \"orange\", lwd = 2)\nplot(st_geometry(buoys), add = TRUE, pch = 16, col = \"purple\")\n\n\n\n\n\n\n\n\nWe can plot ALL twelve months of a variable (“SST”) with the coast and points shown. There is one slight modification to be made since a single call to plot() actually gets invoked 12 times for this data. So where do we add in the buoys and coast? Fortunately, we can create what is called a “hook” function - who knows where the name hook came from? Once the hook function is defined, it will be applied to the each of the 12 subplots.\n\n# a little function that gets called just after each sub-plot\n# it simple adds the coast and buoy\nadd_coast_and_buoys = function(){\n  plot(st_geometry(coast), col = \"orange\", lwd = 2, add = TRUE)\n  plot(st_geometry(buoys), pch = 16, col = \"purple\", add = TRUE)\n}\n\n# here we call the plot, and tell R where to call `add_coast_and_buoys()` after\n# each subplot is made\nplot(current['SST'], hook = add_coast_and_buoys)",
-    "crumbs": [
-      "Coding"
-    ]
-  },
-  {
-    "objectID": "C00_coding.html#coding-assignment",
-    "href": "C00_coding.html#coding-assignment",
-    "title": "Coding",
-    "section": "4.5 Coding Assignment",
-    "text": "4.5 Coding Assignment\n\n\n\n\n\n\nUse the menu option File &gt; New File &gt; R Script to create a blank file. Save the file (even though it is empty) in the “assignment” directory as “assignment_script_1.R”. Use this file to build a script that meets the following challenge. Note that the existing file, “assignment_script_0.R” is already there as an example.\nUse the Brickman tutorial to extract data from the location of Buoy M01 for RCP4.5 2055. Make a plot of SST (y-axis) as a function of month (x-axis). Here’s one possible outcome.\n\n\n\nBuoy M01, RCP4.5 2055",
-    "crumbs": [
-      "Coding"
-    ]
-  },
-  {
-    "objectID": "about.html",
-    "href": "about.html",
-    "title": "About",
-    "section": "",
-    "text": "Brought to you by the Tandy Center for Ocean Forecasting at Bigelow Laboratory for Ocean Science and Colby College.\n\n1 Contacts\nDr. Nick Record\nBen Tupper\nRaising questions or issues: If you have a question, start a new “issue” on the github issues tab. If a question has been posed by another, and you think you can help with the answer then please feel free to respond.\n\n\n2 Website\nWe build the website using quarto which is perfect from transforming [RMarkdown](https://rmarkdown.rstudio.com/ pages into a website with minimal investment. See this wiki page if you would like to add your work to you own fork of the class repository.\n\n\n\n\n Back to top",
-    "crumbs": [
-      "About"
-    ]
   }
 ]
\ No newline at end of file