README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
source(file.path("vignettes", "_common.R"))
knitr::opts_chunk$set(
  fig.path = "man/figures/README-"
)
```

# epiprocess

The `{epiprocess}` package works with epidemiological time series data and
provides tools to manage, analyze, and process the data in preparation for
modeling. It is designed to work in tandem with
[epipredict](https://cmu-delphi.github.io/epipredict/), which provides
pre-built epiforecasting models and as well as tools to build custom models.
Both packages are designed to lower the barrier to entry and implementation cost
for epidemiological time series analysis and forecasting.

`{epiprocess}` contains:

- `epi_df()` and `epi_archive()`, two data frame classes (that work like a
`{tibble}` with `{dplyr}` verbs) for working with epidemiological time
series data
  - `epi_df` is for working with a snapshot of data at a single point in time
  - `epi_archive` is for working with histories of data that changes over time
  - one of the most common uses of `epi_archive` is for accurate backtesting of
    forecasting models, see `vignette("backtesting", package="epipredict")`
- signal processing tools building on these data structures such as
  - `epi_slide()` for sliding window operations (aids with feature creation)
  - `epix_slide()` for sliding window operations on archives (aids with
  backtesting)
  - `growth_rate()` for computing growth rates
  - `detect_outlr()` for outlier detection
  - `epi_cor()` for computing correlations

If you are new to this set of tools, you may be interested learning through a
book format: [Introduction to Epidemiological
Forecasting](https://cmu-delphi.github.io/delphi-tooling-book/).

You may also be interested in:

- `{epidatr}`, for accessing wide range
of epidemiological data sets, including COVID-19 data, flu data, and more.
- [rtestim](https://github.com/dajmcdon/rtestim), a package for estimating
the time-varying reproduction number of an epidemic.

This package is provided by the [Delphi group](https://delphi.cmu.edu/) at
Carnegie Mellon University.

## Installation

To install:

```{r, eval=FALSE}
# Stable version
pak::pkg_install("cmu-delphi/epiprocess@main")

# Dev version
pak::pkg_install("cmu-delphi/epiprocess@dev")
```

The package is not yet on CRAN.

## Usage

Once `epiprocess` and `epidatr` are installed, you can use the following code to
get started:

```{r, results=FALSE, warning=FALSE, message=FALSE}
library(epiprocess)
library(epidatr)
library(dplyr)
library(magrittr)
```

Get COVID-19 confirmed cumulative case data from JHU CSSE for California,
Florida, New York, and Texas, from March 1, 2020 to January 31, 2022

```{r warning=FALSE, message=FALSE}
df <- pub_covidcast(
  source = "jhu-csse",
  signals = "confirmed_cumulative_num",
  geo_type = "state",
  time_type = "day",
  geo_values = "ca,fl,ny,tx",
  time_values = epirange(20200301, 20220131),
  as_of = as.Date("2024-01-01")
) %>%
  select(geo_value, time_value, cases_cumulative = value)
df
```

Convert the data to an epi_df object and sort by geo_value and time_value. You
can work with an `epi_df` like you can with a `{tibble}` by using `{dplyr}`
verbs

```{r}
edf <- df %>%
  as_epi_df(as_of = as.Date("2024-01-01")) %>%
  arrange_canonical() %>%
  group_by(geo_value) %>%
  mutate(cases_daily = cases_cumulative - lag(cases_cumulative, default = 0))
edf
```

Compute the 7 day moving average of the confirmed daily cases for each geo_value

```{r}
edf <- edf %>%
  group_by(geo_value) %>%
  epi_slide_mean(cases_daily, .window_size = 7, na.rm = TRUE, .prefix = "smoothed_")
edf
```

Autoplot the confirmed daily cases for each geo_value

```{r}
edf %>%
  autoplot(smoothed_cases_daily)
```