-
Notifications
You must be signed in to change notification settings - Fork 0
/
cjs.qmd
224 lines (169 loc) · 9.08 KB
/
cjs.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
---
title: Cormack-Jolly-Seber
description: Estimating survival with CJS models in PyMC
author:
name: Philip T. Patton
affiliation:
- Marine Mammal Research Program
- Hawaiʻi Institute of Marine Biology
date: today
bibliography: refs.bib
jupyter: python3
---
In this notebook, I explore the Cormack-Jolly-Seber (CJS) model for estimating survival using capture-recapture data in PyMC. I have drawn considerable inspiration from Austin Rochford's [notebook](https://austinrochford.com/posts/2018-01-31-capture-recapture.html) on capture-recapture in PyMC, the [second chapter](https://github.com/philpatton/autocapture) of my dissertation (a work in progress), and @mccrea2014.
# Cormack-Jolly-Seber
The goal of the CJS model is to estimate survival, accounting for capture probabilities being less than one. There are many methods for estimating parameters in the model, including state-space formulations that explicitly model the latent alive/dead state $z.$ Following the theme of the previous notebooks, I instead marginalize this variable out of the model by using the so-called $M$-array. This is an array that contains the sufficient statistics for the CJS models. For example, $m_{1,2}$ is the number of individuals that were released on at $t=1$ and were *first* recaptured on $t=2.$
| | Number Released | | Number recaptured | | Never recaptured |
|------|:---------------:|----------:|:-----------------:|-----------|:----------------:|
| | | 1982 | 1983 | 1984 | |
| 1981 | $R_1$ | $m_{1,2}$ | $m_{1,3}$ | $m_{1,4}$ | $R_1-m_{1\cdot}$ |
| 1982 | $R_2$ | | $m_{2,3}$ | $m_{2,3}$ | $R_2-m_{2\cdot}$ |
| 1983 | $R_3$ | | | $m_{3,4}$ | $R_3-m_{3\cdot}$ |
: An example of the M-array from a four year capture-recapture survey. The number recaptured, $m_{i,j}$ refers to the number of individuals released at $i$ who were first recaptured at time $j$.
```{python}
from pymc.distributions.dist_math import factln
from scipy.linalg import circulant
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import arviz as az
import pymc as pm
import pytensor.tensor as pt
plt.rcParams['figure.dpi'] = 600
plt.style.use('fivethirtyeight')
plt.rcParams['axes.facecolor'] = 'white'
plt.rcParams['figure.facecolor'] = 'white'
plt.rcParams['axes.spines.left'] = False
plt.rcParams['axes.spines.right'] = False
plt.rcParams['axes.spines.top'] = False
plt.rcParams['axes.spines.bottom'] = False
sns.set_palette("tab10")
def create_recapture_array(history):
"""Create the recapture array from a capture history."""
_, occasion_count = history.shape
interval_count = occasion_count - 1
recapture_array = np.zeros((interval_count, interval_count), int)
for occasion in range(occasion_count - 1):
# which individuals, captured at t, were later recaptured?
captured_this_time = history[:, occasion] == 1
captured_later = (history[:, (occasion + 1):] > 0).any(axis=1)
now_and_later = captured_this_time & captured_later
# when were they next recaptured?
remaining_history = history[now_and_later, (occasion + 1):]
next_capture_occasion = (remaining_history.argmax(axis=1)) + occasion
# how many of them were there?
ind, count = np.unique(next_capture_occasion, return_counts=True)
recapture_array[occasion, ind] = count
return recapture_array.astype(int)
def create_m_array(history):
'''Create the m-array from a capture history.'''
# leftmost column of the m-array
number_released = history.sum(axis=0)
# core of the m-array
recapture_array = create_recapture_array(history)
number_recaptured = recapture_array.sum(axis=1)
# no animals that were released on the last occasion are recaptured
number_recaptured = np.append(number_recaptured, 0)
never_recaptured = number_released - number_recaptured
# add a dummy row at the end to make everything stack
zeros = np.zeros(recapture_array.shape[1])
recapture_array = np.row_stack((recapture_array, zeros))
# stack the relevant values into the m-array
m_array = np.column_stack((number_released, recapture_array, never_recaptured))
return m_array.astype(int)
def fill_lower_diag_ones(x):
'''Fill the lower diagonal of a matrix with ones.'''
return pt.triu(x) + pt.tril(pt.ones_like(x), k=-1)
```
As an example, I use the cormorant data from @mccrea2014, Table 4.6. These data come from an eleven year capture-recapture study between 1982 and 1993. These were breeding cormorants of unknown age. The data is summarized in the $M$-array below. The last column is the number that were never recapured. The number released can be calculated from the array.
```{python}
cormorant = np.array([
[ 10, 4, 2, 2, 0, 0, 0, 0, 0, 0, 12],
[ 0, 42, 12, 16, 1, 0, 1, 1, 1, 0, 83],
[ 0, 0, 85, 22, 5, 5, 2, 1, 0, 1, 53],
[ 0, 0, 0, 139, 39, 10, 10, 4, 2, 0, 94],
[ 0, 0, 0, 0, 175, 60, 22, 8, 4, 2, 199],
[ 0, 0, 0, 0, 0, 159, 46, 16, 5, 2, 193],
[ 0, 0, 0, 0, 0, 0, 191, 39, 4, 8, 171],
[ 0, 0, 0, 0, 0, 0, 0, 188, 19, 23, 284],
[ 0, 0, 0, 0, 0, 0, 0, 0, 101, 55, 274],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 84, 97]])
```
```{python}
interval_count, T = cormorant.shape
number_recaptured = cormorant[:,:-1]
never_recaputured = cormorant[:,-1]
number_released = number_recaptured.sum(axis=1) + never_recaputured
```
This PyMC model will look different than the ones in previous notebooks, simply because it requires many tricks to get the probabilities in the correct format for the $m$-array, then modeling the $m$-array as a multinomial with the associated cell probabilities. These probabilities correspond to the situations in the $m$-array, such as the probability that an animal survived and was not recaptured until a later date. In this example, I model survival as time-varying, i.e., $\phi(t).$
```{python}
#| fig-cap: Visual representation of the CJS model
#| label: fig-cjs
# utility vectors for creating arrays and array indices
intervals = np.arange(interval_count)
row_indices = np.reshape(intervals, (interval_count, 1))
col_indices = np.reshape(intervals, (1, interval_count))
# matrix indicating the number of intervals between sampling occassions
intervals_between = np.clip(col_indices - row_indices, 0, np.inf)
with pm.Model() as phit:
# priors for catchability and survival
p = pm.Uniform('p', 0, 1)
phi = pm.Uniform('phi', 0, 1, shape=interval_count)
# broadcast phi into a matrix
phi_mat = pt.ones_like(number_recaptured) * phi
phi_mat = fill_lower_diag_ones(phi_mat) # fill irrelevant values
# probability of surviving between i and j in the m-array
p_alive = pt.cumprod(phi_mat, axis=1)
p_alive = pt.triu(p_alive) # select relevant (upper triangle) values
# probability of not being captured between i and j
p_not_cap = pt.triu((1 - p) ** intervals_between)
# probabilities associated with each cell in the m-array
nu = p_alive * p_not_cap * p
# probability for the animals that were never recaptured
chi = 1 - nu.sum(axis=1)
# combine the probabilities into a matrix
chi = pt.reshape(chi, (interval_count, 1))
marr_probs = pt.horizontal_stack(nu, chi)
# distribution of the m-array
marr = pm.Multinomial(
'M-array',
n=number_released,
p=marr_probs,
observed=cormorant
)
pm.model_to_graphviz(phit)
```
```{python}
with phit:
cjs_idata = pm.sample()
```
```{python}
#| fig-cap: Traceplots for the cormorant CJS model with $p(\cdot)\phi(t)$. MLEs from @mccrea2014 shown by vertical and horizontal lines.
#| label: fig-cjs_trace
mccrea_p = [0.51]
mccrea_phi = [0.79, 0.56, 0.83, 0.86, 0.73, 0.70, 0.81, 0.64, 0.46, 0.95]
az.plot_trace(
cjs_idata,
figsize=(10, 4),
lines=[("phi", {}, [mccrea_phi]), ("p", {}, [mccrea_p])]
);
```
The model samples fairly quickly in this parameterization. The traceplots above include comparisons to the estimates from @mccrea2014. While a bit messy, the plots show a high level of agreement between their estimates and the ones here. To clean things up a bit, I plot the estimates for $\phi$ over time, along with the 94\% credible intervals
```{python}
#| fig-cap: Violin plots of the posterior for apparent surival over time from the cormorant CJS. Horizontal lines represent the posterior median.
#| label: fig-cjs_surv
fig, ax = plt.subplots(figsize=(6,4))
t = np.arange(1983, 1993)
phi_samps = az.extract(cjs_idata, var_names='phi').values.T
phi_median = np.median(phi_samps, axis=0)
ax.plot(t, phi_median, linestyle='dotted', color='lightgray', linewidth=2)
ax.violinplot(phi_samps, t, showmedians=True, showextrema=False)
ax.set_ylim((0,1))
ax.set_ylabel(r'Apparent survival $\phi$')
ax.set_title(r'Cormorant CJS')
plt.show()
```
```{python}
%load_ext watermark
%watermark -n -u -v -iv -w
```