Update google-symptoms documentation with results of 2023 summer dip fix #1587

Merged on Jan 28, 2025 (1 commit)

13 changes: 10 additions & 3 deletions docs/api/covidcast-signals/google-symptoms.md
@@ -16,15 +16,15 @@ nav_order: 1
* **Time type:** day (see [date format docs](../covidcast_times.md))
* **License:** To download or use the data, you must agree to the Google [Terms of Service](https://policies.google.com/terms)

-<div style="background-color:#ff00001c; padding: 10px 30px;"><strong>Data issue:</strong> Between May 13 2024 and August 6 2024, signals values were 25%-50% lower compared to previous time periods. This affects <i>all</i> signals and symptom sets. Currently there is no explanation for the decrease in search volume, and the issue is under investigation by our data source partners.</div>

## Overview

This data source is based on the [COVID-19 Search Trends symptoms
dataset](https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/covid19-search-trends?hl=en-GB). Using
this search data, we estimate the volume of searches mapped to symptom sets related
to COVID-19. The resulting daily dataset for each region shows the average relative frequency of searches for each symptom set. The signals are measured in arbitrary units that are normalized for overall search users in the region and scaled by the maximum value of the normalized popularity within a geographic region across a specific time range. **Values are comparable across signals in the same location but NOT across geographic regions**. For example, within a state, we can compare `s01_smoothed_search` and `s02_smoothed_search`. However, we cannot compare `s01_smoothed_search` between states. Larger numbers represent increased relative popularity of symptom-related searches.

+Between May 13 2024 and August 6 2024, [signal values were much lower](#limitations) compared to previous time periods due to a data outage.

#### Symptom sets

* _s01_: Cough, Phlegm, Sputum, Upper respiratory tract infection
@@ -94,7 +94,7 @@ population-weighted averaging.

For aggregation purposes only, we assign a value of 0 to source regions that
have no data provided due to quality or privacy issues for a certain day (see
-Limitations for details). We do not report aggregated regions if none of their
+[Limitations](#limitations) for details). We do not report aggregated regions if none of their
source regions have data. Because of this censoring behavior, the resulting data
for aggregated regions does not fully match the _actual_ search volume for these
regions (which is not provided to us).
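Reviewer note: the aggregation rule described in this hunk can be sketched roughly as below. This is an illustration only, not the indicator's actual code. The function name `aggregate_region`, the dictionary layout, and the choice to keep censored regions' populations in the denominator are all assumptions made for the example.

```python
from typing import Dict, Optional


def aggregate_region(
    values: Dict[str, Optional[float]],
    populations: Dict[str, float],
) -> Optional[float]:
    """Population-weighted mean over source regions (hypothetical helper).

    `values` maps each source region to its signal value for one day,
    or None when the source withheld it for quality or privacy reasons.
    """
    # No aggregate is reported when every source region is missing.
    if all(v is None for v in values.values()):
        return None

    # Assumption: censored regions still count toward the total weight.
    total_pop = sum(populations[region] for region in values)

    # Censored regions contribute a value of 0, for aggregation only.
    weighted_sum = sum(
        (0.0 if v is None else v) * populations[region]
        for region, v in values.items()
    )
    return weighted_sum / total_pop
```

With one reporting region (value 2.0, population 100) and one censored region (population 300), the sketch returns 0.5 rather than 2.0, which shows why the aggregated value understates the actual search volume.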
@@ -106,6 +106,13 @@ As a result the delay can range from 3 to 10 days or even more. We check for
updates every day and provide the most up-to-date data.

## Limitations

+Between May 13 2024 and August 6 2024, signal values were 25%-50% lower compared to previous time periods.
+This affected _all_ signals and symptom sets.
+The drop does not reflect actual search term popularity during the affected period.
+The apparent decrease in search volume was caused by an outage in the data pipeline on the source side.
+The data was unfortunately not recoverable and the dip can not be repaired, but data outside the listed time period is unaffected.

When daily volume in a region does not meet quality or privacy thresholds, set
by Google, no daily value is reported. Weekly data may be available from Google
in these cases, but we do not yet support importation using weekly data.
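
Reviewer note: since the dip cannot be repaired, downstream users may want to exclude the affected window before analysis. A minimal sketch follows, assuming both endpoint dates are inclusive (the documentation does not say) and that the data is a list of `(date, value)` pairs. The names `DIP_START`, `DIP_END`, `in_dip`, and `drop_dip` are hypothetical and not part of any Delphi API.

```python
from datetime import date
from typing import List, Tuple

# Affected window from the Limitations text; inclusive endpoints are an assumption.
DIP_START = date(2024, 5, 13)
DIP_END = date(2024, 8, 6)


def in_dip(day: date) -> bool:
    """Return True if `day` falls inside the documented outage window."""
    return DIP_START <= day <= DIP_END


def drop_dip(rows: List[Tuple[date, float]]) -> List[Tuple[date, float]]:
    """Keep only observations outside the outage window."""
    return [(day, value) for day, value in rows if not in_dip(day)]
```

Dropping the rows, rather than rescaling them, matches the statement that the values in this period do not reflect actual search popularity.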