-
Notifications
You must be signed in to change notification settings - Fork 0
/
appendixS4-comparison.qmd
executable file
·410 lines (307 loc) · 12.7 KB
/
appendixS4-comparison.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
---
title: "Appendix S4. Multistate Arnason-Schwarz: Robust design vs. single survey"
author: |
| Matthijs Hollanders & J. Andrew Royle
|
| Manuscript title: Know what you don't know: Embracing state uncertainty in disease-structured multistate models
|
| Quarto doc: [https://github.com/mhollanders/multievent](https://github.com/mhollanders/multievent)
format:
pdf:
toc: true
number-sections: true
fig-asp: 0.8
editor_options:
chunk_output_type: console
bibliography: multievent.bib
csl: methods-in-ecology-and-evolution.csl
nocite: |
@wickham2016, @youngflesh2018mcmcvis
geometry: margin = 1in
font: 11pt
execute:
output: false
warning: false
message: false
header-includes: |
\usepackage{titling}
\pretitle{\begin{flushleft}\Large\bfseries}
\posttitle{\end{flushleft}}
\preauthor{\begin{flushleft}\large}
\postauthor{\end{flushleft}}
---
```{r}
#| echo: false
options(scipen = 999, digits = 3)
```
\newpage
# Introduction
We simulate multistate mark-recapture Arnason-Schwarz [@arnason1972; @arnason1973; @schwarz1993] data for animals with two alive states using R 4.2.1 [@rcoreteam2022]. We generate two types of capture histories: (1) robust design, with multiple secondary surveys during primary occasions of assumed closure [@pollock1982], and (2) "traditional", consisting of a series of single surveys. We use Bayesian inference and Markov chain Monte Carlo (MCMC) methods using NIMBLE 0.12.2 [@devalpine2017; @devalpine2022] to analyze each capture history. We then examine the traceplots and effective sample sizes of each model, demonstrating that estimation of robust design formulations is superior to traditional designs.
# Simulating data
## Input parameters
We specify the sampling condition and define parameter values for the simulation. The three ecological states (*z*) are (1) alive in state 1, (2) alive in state 2, and (3) dead. The three observed states (*y*) are (1) recaptured in state 1, (2) recaptured in state 2, and (3) not recaptured.
```{r}
# Number of ecological (z, latent) and observed (y, data) states
n.states <- 3
# Sampling conditions
n.ind <- 200 # Number of individuals in simulation
n.prim <- 8 # Number of primary survey occasions
n.sec <- 2 # Number of secondary survey occasions (per primary)
# Parameters
pi <- 0.4 # Probability of being in state 2 at first capture
phi1 <- 0.9 # Survival probability of state 1
phi2 <- 0.7 # Survival probability of state 2
psi12 <- 0.6 # Probability of transitioning from state 1 to state 2
psi21 <- 0.3 # Probability of transitioning from state 2 to state 1
p1 <- 0.6 # Recapture probability of state 1
p2 <- 0.7 # Recapture probability of state 2
```
## Generating capture histories
We create the transition probability matrices (TPMs) of the ecological and observation processes, and then run the simulations, which yields the capture histories (*y*) that we use as data for the models [@kery2012a]. We simulate the same number of surveys for each model: `r n.prim` primary occasions with `r n.sec` secondary occasions for the robust design, and `r n.prim * n.sec` single surveys for the traditional. Note that this corresponds to `r n.prim - 1` ecological state transitions for the robust design, and `r n.prim * n.sec - 1` for the traditional.
```{r}
# Ecological process TPM
TPM.z <- matrix(c(phi1 * (1 - psi12), phi1 * psi12, 1 - phi1,
phi2 * psi21, phi2 * (1 - psi21), 1 - phi2,
0, 0, 1),
nrow = n.states, ncol = n.states, byrow = T)
# Observation process TPM
TPM.o <- matrix(c(p1, 0, 1 - p1,
0, p2, 1 - p2,
0, 0, 1),
nrow = n.states, ncol = n.states, byrow = T)
# Arrays for ecological (z) and observed (y) states
z.rd <- array(NA, c(n.ind, n.prim))
y.rd <- array(NA, c(n.ind, n.prim, n.sec))
z.t <- y.t <- array(NA, c(n.ind, n.prim * n.sec))
# Primary occasion and survey that individuals were first captured
first.rd <- sort(sample(1:(n.prim - 1), n.ind, replace = T))
first.t <- sort(sample(1:(n.prim * n.sec - 1), n.ind, replace = T))
# We start using NIMBLE's functionality here with the rcat() function
library(nimble)
# Simulation
for (i in 1:n.ind) {
# ROBUST DESIGN
# Ecological and observed state at first capture
z.rd[i,first.rd[i]] <- y.rd[i,first.rd[i],1] <- rcat(1, c(1 - pi, pi))
for (t in (first.rd[i] + 1):n.prim) {
# Ecological process
z.rd[i,t] <- rcat(1, TPM.z[z.rd[i,t-1],])
for (k in 1:n.sec) {
# Observation process
y.rd[i,t,k] <- rcat(1, TPM.o[z.rd[i,t],])
} # k
} # t
# SINGLE SURVEY
# Ecological and observed state at first capture
z.t[i,first.t[i]] <- y.t[i,first.t[i]] <- rcat(1, c(1 - pi, pi))
for (t in (first.t[i] + 1):(n.prim * n.sec)) {
# Ecological process
z.t[i,t] <- rcat(1, TPM.z[z.t[i,t-1],])
# Observation process
y.t[i,t] <- rcat(1, TPM.o[z.t[i,t],])
} # t
} # i
```
# Multistate models
We write the code for the two models using NIMBLE.
## Robust design code
```{r}
rdMScode <- nimbleCode({
# PRIORS
phi1 ~ dbeta(1, 1)
phi2 ~ dbeta(1, 1)
psi12 ~ dbeta(1, 1)
psi21 ~ dbeta(1, 1)
p1 ~ dbeta(1, 1)
p2 ~ dbeta(1, 1)
# ECOLOGICAL PROCESS (survival and state transitions)
# Alive in state 1
TPM.z[1,1] <- phi1 * (1 - psi12) # Survives, remains in state 1
TPM.z[1,2] <- phi1 * psi12 # Survives, transitions to state 2
TPM.z[1,3] <- 1 - phi1 # Dies
# Alive in state 2
TPM.z[2,1] <- phi2 * psi21 # Survives, transitions to state 1
TPM.z[2,2] <- phi2 * (1 - psi21) # Survives, remains in state 2
TPM.z[2,3] <- 1 - phi2 # Dies
# Dead
TPM.z[3,1] <- 0 # Transitions to state 1
TPM.z[3,2] <- 0 # Transitions to state 2
TPM.z[3,3] <- 1 # Remains dead
# OBSERVATION PROCESS (recapture)
# Alive in state 1
TPM.o[1,1] <- p1 # Recaptured in state 1
TPM.o[1,2] <- 0 # Recaptured in state 2
TPM.o[1,3] <- 1 - p1 # Not recaptured
# Alive in state 2
TPM.o[2,1] <- 0 # Recaptured in state 1
TPM.o[2,2] <- p2 # Recaptured in state 2
TPM.o[2,3] <- 1 - p2 # Not recaptured
# Dead
TPM.o[3,1] <- 0 # Recaptured in state 1
TPM.o[3,2] <- 0 # Recaptured in state 2
TPM.o[3,3] <- 1 # Not recaptured
# LIKELIHOOD
for (i in 1:n.ind) {
# Ecological state at first capture
z[i,first[i]] <- y[i,first[i],1]
for (t in (first[i] + 1):n.prim) {
# Ecological process
z[i,t] ~ dcat(TPM.z[z[i,t-1],1:3])
for (k in 1:n.sec) {
# Observation process
y[i,t,k] ~ dcat(TPM.o[z[i,t],1:3])
} # k
} # t
} # i
})
```
## Traditional (single survey) code
```{r}
tMScode <- nimbleCode({
# PRIORS
phi1 ~ dbeta(1, 1)
phi2 ~ dbeta(1, 1)
psi12 ~ dbeta(1, 1)
psi21 ~ dbeta(1, 1)
p1 ~ dbeta(1, 1)
p2 ~ dbeta(1, 1)
# ECOLOGICAL PROCESS (survival and state transitions)
# Alive in state 1
TPM.z[1,1] <- phi1 * (1 - psi12) # Survives, remains in state 1
TPM.z[1,2] <- phi1 * psi12 # Survives, transitions to state 2
TPM.z[1,3] <- 1 - phi1 # Dies
# Alive in state 2
TPM.z[2,1] <- phi2 * psi21 # Survives, transitions to state 1
TPM.z[2,2] <- phi2 * (1 - psi21) # Survives, remains in state 2
TPM.z[2,3] <- 1 - phi2 # Dies
# Dead
TPM.z[3,1] <- 0 # Transitions to state 1
TPM.z[3,2] <- 0 # Transitions to state 2
TPM.z[3,3] <- 1 # Remains dead
# OBSERVATION PROCESS (recapture)
# Alive in state 1
TPM.o[1,1] <- p1 # Recaptured in state 1
TPM.o[1,2] <- 0 # Recaptured in state 2
TPM.o[1,3] <- 1 - p1 # Not recaptured
# Alive in state 2
TPM.o[2,1] <- 0 # Recaptured in state 1
TPM.o[2,2] <- p2 # Recaptured in state 2
TPM.o[2,3] <- 1 - p2 # Not recaptured
# Dead
TPM.o[3,1] <- 0 # Recaptured in state 1
TPM.o[3,2] <- 0 # Recaptured in state 2
TPM.o[3,3] <- 1 # Not recaptured
# LIKELIHOOD
for (i in 1:n.ind) {
# Ecological state at first capture
z[i,first[i]] <- y[i,first[i]]
for (t in (first[i] + 1):n.surv) {
# Ecological process
z[i,t] ~ dcat(TPM.z[z[i,t-1],1:3])
# Observation process
y[i,t] ~ dcat(TPM.o[z[i,t],1:3])
} # t
} # i
})
```
## Run models
We first run the robust design and summarize the posterior distributions of model parameters.
```{r}
# Parameters to monitor
monitors <- c("phi1", "phi2", "psi12", "psi21", "p1", "p2")
# Number of chains and iterations
n.chains <- 3 ; n.iter <- 3000
# Robust design
rdMSstart <- Sys.time()
rdMSdraws <- nimbleMCMC(code = rdMScode,
constants = list(n.ind = dim(y.rd)[1],
n.prim = dim(y.rd)[2],
n.sec = dim(y.rd)[3],
first = first.rd),
data = list(y = y.rd),
inits = list(z = z.rd),
monitors = monitors,
nchains = n.chains,
niter = n.iter)
rdMSend <- Sys.time()
```
```{r}
#| output: true
library(MCMCvis)
rdMSend - rdMSstart
print(rdMSsum <- MCMCsummary(rdMSdraws, monitors))
```
And then do the same for the traditional model.
```{r}
# Traditional
tMSstart <- Sys.time()
tMSdraws <- nimbleMCMC(code = tMScode,
constants = list(n.ind = dim(y.t)[1],
n.surv = dim(y.t)[2],
first = first.t),
data = list(y = y.t),
inits = list(z = z.t),
monitors = monitors,
nchains = n.chains,
niter = n.iter)
tMSend <- Sys.time()
```
```{r}
#| output: true
tMSend - tMSstart
print(tMSsum <- MCMCsummary(tMSdraws, monitors))
```
# Results
## Traceplots
We see that mixing is much better in the robust design formulation, even with only `r n.sec` secondary surveys, and there is considerably more posterior correlation between parameters in the single survey model. The percent overlap between the prior and posterior distributions (PPO) is greater in the traditional model---particularly in the recapture and state transition probabilities---versus the robust design model, indicative reduced parameter identifiability [@gimenez2009]. Specifically, PPO of 35% is demonstrative of weak parameter identifiability. The red horizontal lines in the density plots are the prior distributions, and the red dashed lines mark the simulation input parameter values. The traditional model also takes longer to run with equal survey effort.
```{r}
#| output: true
# Prior for parameters for PPO
prior <- rbeta(n.iter, 1, 1)
# Robust design
MCMCtrace(rdMSdraws, params = monitors, gvals = c(phi1, phi2, psi12, psi21, p1, p2),
priors = prior, Rhat = T, n.eff = T, ind = T, pdf = F)
# Traditional
MCMCtrace(tMSdraws, params = monitors, gvals = c(phi1, phi2, psi12, psi21, p1, p2),
priors = prior, Rhat = T, n.eff = T, ind = T, pdf = F)
```
## Effective sample sizes
The effective sample sizes of parameters after `r n.chains * n.iter` iterations are higher with the robust design (red) compared to the single survey (grey) design.
```{r}
#| echo: false
#| output: true
#| fig-width: 6.5
#| fig-asp: 0.618
library(tidyverse)
# Plot theme
theme_set(theme_classic(base_size = 14))
theme_update(axis.ticks = element_line(color = "#333333"),
axis.line = element_line(color = "#333333"),
text = element_text(color = "#333333"),
axis.text = element_text(color = "#333333"))
# Plot details
n.param <- 6
params <- c("phi[1]", "phi[2]", "psi[12]", "psi[21]", "italic(p)[1]", "italic(p)[2]")
# Axis limits
n.break <- 4
ax.break <- round(max(rdMSsum$n.eff) / n.break, -2)
# Plot
tibble(rbind(rdMSsum, tMSsum)) |>
mutate(parameter = factor(rep(params, 2), levels = params),
model = factor(c(rep("Robust design", n.param), rep("Traditional", n.param)))) |>
ggplot(aes(x = parameter)) +
geom_col(aes(y = n.eff, fill = model),
alpha = 3/4,
position = position_dodge()) +
scale_x_discrete(labels = ggplot2:::parse_safe) +
scale_y_continuous(limits = c(0, ax.break * (n.break + 0.5)),
breaks = seq(ax.break, n.break * ax.break, ax.break),
expand = c(0, 0)) +
scale_fill_manual(values = c("#a4260f", "grey50")) +
coord_cartesian() +
labs(x = NULL,
y = "Effective sample size") +
guides(fill = "none")
```
\newpage
# References