-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path04-SpectralAnalysis.Rmd
335 lines (246 loc) · 11.1 KB
/
04-SpectralAnalysis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
# Spectral Analysis in R
Recall that in spectral analysis we step into a "bizarro world", switching from the time domain to the frequency domain. There, two essential features are of interest: peaks, and background. In this tutorial, we will perform spectral analysis on Rio Grande streamflow, and see if we can learn anything interesting about it in this way.
## Load Data and Packages
### Packages
```{r, message=FALSE}
library(tidyverse)
library(scales)
library(astrochron)
library(lomb)
library(biwavelet)
library(scales)
```
### Dataset
We will be looking at the daily discharge of the Rio Grande, which has been measured at Embudo, NM since 1889. The data and their associated metadata may be retrieved from the [USGS website](https://waterdata.usgs.gov/nwis/dv?cb_00060=on&format=rdb&site_no=08279500&legacy=&referred_module=sw&period=&begin_date=1889-01-01&end_date=2024-05-20). Let's load them and have a look:
```{r}
df <- read.table('https://waterdata.usgs.gov/nwis/dv?cb_00060=on&format=rdb&site_no=08279500&legacy=&referred_module=sw&period=&begin_date=1889-01-01&end_date=2024-05-20', skip=35, sep="\t")
names(df) <- c('agency', 'site_no', 'datetime','discharge (cf/s)','code')
df$datetime <- lubridate::as_datetime(df$datetime)
head(df)
```
## Data preview
First, let us inspect the distribution of values:
```{r}
hist(df$`discharge (cf/s)`, main = "Distribution of Rio Grande Discharge", xlab = "Discharge (cf/s)")
```
Like many other variables, streamflow is **positively skewed**: the distribution is asymmetric, with most values near zero and few instances of very high streamflow. Now let's inspect the temporal behavior:
```{r}
ggplot(df, aes(x = datetime, y = `discharge (cf/s)`)) +
geom_line() +
labs(x = "Time", y = "Discharge (cf/s)") +
theme_light()
```
## Data cleaning
### Aggregate to monthly
```{r}
discharge_monthly <- df |>
group_by(Date = floor_date(ymd(df$datetime), '1 month')) |>
summarise(discharge = mean(`discharge (cf/s)`, na.rm = TRUE), .groups = 'drop')
ggplot(discharge_monthly, aes(x=Date, y=discharge)) +
labs(title = "Rio Grande at Embudo, NM (monthly)",
x="Year (CE)",
y="dicharge (cf/s)") +
geom_line() +
ggtitle("Rio Grand Discharge") +
theme_light()
```
### Even sampling
```{r}
missing_vals <- discharge_monthly$Date[which(is.na(discharge_monthly$discharge))]
missing_vals
df3 <- discharge_monthly |>
dplyr::filter(Date > max(missing_vals))
hist(as.numeric(diff(df3$Date)),main = "Distribution of Time Steps", xlab = "Days")
df4 <- df3 |>
mutate(Date = decimal_date(Date)) |>
astrochron::linterp(dt=(1/12),genplot = F) |>
dplyr::filter(Date > max(missing_vals))
ggplot(df4, aes(x=Date, y=discharge)) +
labs(title = "Rio Grande at Embudo, NM (30.4375-day period)",
x="Year (CE)",
y="dicharge (cf/s)") +
geom_line() +
theme_light()
```
### Spectral analysis
#### multi-taper
```{r}
mtm1 <- mtm(df4,output = 1,verbose = F) |>
mutate(Period = 1/Frequency,
Power = Power/1000) |> #account for differing units in astrochrons MTM
dplyr::select(Period, Power)
reverselog_trans <- function(base = exp(1)) {
trans <- function(x) -log(x, base)
inv <- function(x) base^(-x)
trans_new(paste0("reverselog-", format(base)), trans, inv,
log_breaks(base = base),
domain = c(1e-100, Inf))
}
ggplot(mtm1, aes(x=Period, y=Power)) +
labs(title = "Rio Grande discharge spectrum (mtm)") +
geom_line() +
scale_y_log10() +
scale_x_continuous(trans = reverselog_trans(10),
breaks = c(100,50,20,10,5,2,1,0.5,0.2),
limits = c(100,0.2)) +
theme_light()
```
The prominent feature is a strong annual cycle and higher-order harmonics, super-imposed on a “warm colored” background (i.e. variations at long timescales are larger than variations at short timescales). There are hints of long-term scaling as well, one in the subannual range, and one from period of 1 to 50y. We may be interested in the shape of this background, and whether it can be fit by one or more power laws.
In addition, we may be interested in interannual (year-to-year) to interdecadal (decade-to-decade) variations in streamflow. We would like to know whether the peak observed around 3-4 years in the plot above is *significant* with respect to a reasonable null.
### STL
To do both of these things it would make sense to remove the annual cycle. We'll use STL decomposition for this.
#### Gaussianize
Now, because of the positive skew mentioned above, it turns out that the STL decomposition has trouble with this dataset. Instead, we can use the gaussianize() to map this dataset to a standard normal. There are applications for which this would be undesirable, but for this purpose it is good:
Gaussianize is available as part of the geoChronR package - but it's also pretty simple, so let's just define that here:
```{r}
gaussianize <- function(X) {
if (!is.matrix(X)) {
X = as.matrix(X)
}
p = NCOL(X)
n = NROW(X)
Xn = matrix(data = NA, nrow = n, ncol = p)
for (j in 1:p) {
nz <- which(!is.na(X[, j]))
N <- length(nz)
R <- rank(X[nz, j])
CDF <- R/N - 1/(2 * N)
Xn[nz, j] <- qnorm(CDF)
}
return(Xn)
}
```
#### MTM, again
Let's take a look at how this affects the spectrum:
```{r}
df5 <- df4 |>
mutate(discharge = gaussianize(discharge))
mtm2 <- mtm(df5,output = 1,verbose = F) |>
mutate(Period = 1/Frequency, #Calculate frequency
Power = Power/1000, #account for differing units in astrochrons MTM
AR1_95_power = AR1_95_power/1000) |> #account for differing units in astrochrons MTM
dplyr::select(Period, Power)
ggplot(mtm2, aes(x=Period, y=Power)) +
labs(title = "Rio Grande discharge spectrum (mtm)") +
geom_line() +
scale_y_log10() +
scale_x_continuous(trans = reverselog_trans(10),
breaks = c(100,50,20,10,5,2,1,0.5,0.2),
limits = c(100,0.2)) +
theme_light()
```
#### Applying STL
We see that the spectrum is nearly unchanged, but now we can apply STL:
```{r}
dis <- ts(as.numeric(df5$discharge), start = c(1889, 1), frequency = 12)
stl_res <- stl(dis, s.window = "periodic")
plot(stl_res)
```
#### The trend
Now let's apply spectral analysis to the trend component:
```{r}
df6 <- df5 |>
mutate(discharge = as.numeric(stl_res$time.series[,2]))
mtm3 <- mtm(df6,output = 1,verbose = F,) |>
mutate(Period = 1/Frequency,
Power = Power/1000, #account for differing units in astrochrons MTM
AR1_95_power = AR1_95_power/1000) #account for differing units in astrochrons MTM
trendOnlySpectrum <- ggplot(mtm3, aes(x=Period, y=Power)) +
labs(title = "Rio Grande discharge trend-only spectrum (mtm)") +
geom_line() +
geom_line(aes(y = AR1_95_power),color = "red") +
scale_y_log10() +
scale_x_continuous(trans = reverselog_trans(10),
breaks = c(100,50,20,10,5,2,1,0.5,0.2),
limits = c(100,0.2)) +
theme_light()
trendOnlySpectrum
```
Indeed this removed the annual cycle and its harmonics (for the most part). It's a fairly aggressive treatment, but we can now compare the significance of the interannual peaks w.r.t to an AR(1) benchmark.
We see that the broadband peak in the 2-10y range is above the AR(1) spectrum, so these frequencies could be considered significant.
### Estimation of scaling behavior
In this last example, we fit a single scaling exponent to the timeseries, spanning the entire range of frequencies (periods). Under the hood, all we are doing is fitting a straight line to the spectrum:
```{r}
# Log-transform both axes
toScale <- dplyr::filter(mtm3, between(Period,0.2,100))
log_period <- log10(toScale$Period)
log_spec <- log10(toScale$Power)
# Fit a line
fit <- lm(log_spec ~ log_period)
toScale$plotLine <- predict(fit)
# Plot
trendOnlySpectrum +
geom_line(data = toScale,aes(x = Period,y = 10^plotLine),color = "blue",linetype = 3)
```
This results in a fairly steep exponent near `r round(coef(fit)[2], 2)`. If we were specifically interested in the scaling exponent (spectral slope) between periods of 2-100y, you would do it like so:
```{r}
# Log-transform both axes
toScale <- dplyr::filter(mtm3, between(Period,2,100))
log_period <- log10(toScale$Period)
log_spec <- log10(toScale$Power)
# Fit a line
fitLong <- lm(log_spec ~ log_period)
toScale$plotLine <- predict(fitLong)
# Plot
trendOnlySpectrum +
geom_line(data = toScale,aes(x = Period,y = 10^plotLine),color = "blue",linetype = 3)
```
We see that this results in a much flatter line, i.e. a smaller scaling exponent around `r round(coef(fitLong)[2], 2)`.
## Gap-tolerant spectral analysis
### Lomb-Scargle
We return to the original series and apply a technique to obtain the spectrum, keeping gaps in the series, known as the Lomb-Scargle periodogram:
```{r}
#Let's average (or bin) the data into monthly intervals
dfBinned <- df |>
mutate(year = year(datetime),
month = month(datetime)) |>
group_by(year,month) |>
summarize(discharge = mean(`discharge (cf/s)`,na.rm = TRUE)) |>
mutate(yearDecimal = year + month/12 - 1/24) |>
dplyr::filter(is.finite(discharge))
# Compute Lomb-Scargle periodogram
lomb <- lomb::lsp(x = dfBinned$discharge,
times = dfBinned$yearDecimal,
ofac = 4,
scaleType = "period",normalize = "press",
plot = FALSE)
ltp <- data.frame(Period = 1/lomb$scanned, Power = lomb$power)
# Plot
ggplot(ltp, aes(x=Period, y=Power)) +
labs(title = "Rio Grande discharge Lomb-Scargle spectrum") +
geom_line() +
#geom_line(aes(y = AR1_95_power),color = "red") +
scale_y_log10() +
scale_x_continuous(trans = reverselog_trans(10),
breaks = c(100,50,20,10,5,2,1,0.5,0.2),
limits = c(100,0.2)) +
theme_light()
```
We can see that this resembles our interpolated MTM approach - but has some differences. Let's plot on the same graph to take a closer look:
```{r}
comboPlotData <- bind_rows(mutate(mtm1,method = "mtm"),
mutate(ltp,method = "lomb-scargle"))
ggplot(comboPlotData, aes(x=Period, y=Power,color = method)) +
labs(title = "Rio Grande discharge spectrum") +
geom_line() +
scale_y_log10() +
scale_x_continuous(trans = reverselog_trans(10),
breaks = c(100,50,20,10,5,2,1,0.5,0.2),
limits = c(100,0.2)) +
theme_light()
```
It is often useful to be able to compare methods and/or parameter choices, to see if the results are robust. We see that some choices lump peaks into broad bands, others tend to slice them up.
### Wavelet
Wavelet analysis using the Morlet wavelet:
```{r}
# Convert to matrix format required by biwavelet
dat <- cbind(time_vec, dis_vec)
dat <- dat[1:1880,]
# Compute wavelet transform
wav <- biwavelet::wt(dat)
# Plot wavelet power spectrum
biwavelet::plot.biwavelet(wav, plot.phase = FALSE, type = "power.norm")
```
We'll explore wavelets more in the next chapter.
## Takeaways
There are many methods for looking at the spectra of time series. For unevenly sampled series, we are more limited in methods, and interpolation may cause artifacts in our analyses.