diff --git a/docs/api/covidcast-signals/google-symptoms.md b/docs/api/covidcast-signals/google-symptoms.md
index a8d403de2..684a429e8 100644
--- a/docs/api/covidcast-signals/google-symptoms.md
+++ b/docs/api/covidcast-signals/google-symptoms.md
@@ -16,8 +16,6 @@ nav_order: 1
 * **Time type:** day (see [date format docs](../covidcast_times.md))
 * **License:** To download or use the data, you must agree to the Google [Terms of Service](https://policies.google.com/terms)
-
Data issue: Between May 13 2024 and August 6 2024, signals values were 25%-50% lower compared to previous time periods. This affects all signals and symptom sets. Currently there is no explanation for the decrease in search volume, and the issue is under investigation by our data source partners.
-
 ## Overview
 This data source is based on the [COVID-19 Search Trends symptoms
@@ -25,6 +23,8 @@ dataset](https://console.cloud.google.com/marketplace/product/bigquery-public-da
 this search data, we estimate the volume of searches mapped to symptom sets related to COVID-19. The resulting daily dataset for each region shows the average relative frequency of searches for each symptom set. The signals are measured in arbitrary units that are normalized for overall search users in the region and scaled by the maximum value of the normalized popularity within a geographic region across a specific time range. **Values are comparable across signals in the same location but NOT across geographic regions**. For example, within a state, we can compare `s01_smoothed_search` and `s02_smoothed_search`. However, we cannot compare `s01_smoothed_search` between states. Larger numbers represent increased relative popularity of symptom-related searches.
+Between May 13 2024 and August 6 2024, [signal values were much lower](#limitations) compared to previous time periods due to a data outage.
+
 #### Symptom sets
 * _s01_: Cough, Phlegm, Sputum, Upper respiratory tract infection
@@ -94,7 +94,7 @@ population-weighted averaging.
 For aggregation purposes only, we assign a value of 0 to source regions that have no data provided due to quality or privacy issues for a certain day (see
-Limitations for details). We do not report aggregated regions if none of their
+[Limitations](#limitations) for details). We do not report aggregated regions if none of their
 source regions have data. Because of this censoring behavior, the resulting data for aggregated regions does not fully match the _actual_ search volume for these regions (which is not provided to us).
@@ -106,6 +106,13 @@ As a result the delay can range from 3 to 10 days or even more.
 We check for updates every day and provide the most up-to-date data.
 ## Limitations
+
+Between May 13 2024 and August 6 2024, signal values were 25%-50% lower compared to previous time periods.
+This affected _all_ signals and symptom sets.
+The drop does not reflect actual search term popularity during the affected period.
+The apparent decrease in search volume was caused by an outage in the data pipeline on the source side.
+The data was unfortunately not recoverable and the dip cannot be repaired, but data outside the listed time period is unaffected.
+
 When daily volume in a region does not meet quality or privacy thresholds, set by Google, no daily value is reported. Weekly data may be available from Google in these cases, but we do not yet support importation using weekly data.
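
Reviewer note: since the new Limitations text says the dip cannot be repaired, downstream users may want to mask the affected window before computing trends. A minimal sketch of that idea follows. It is not part of this change or of any Delphi client library. The `(date, value)` row layout, the helper names, and the sample values are hypothetical, and the window bounds are assumed to be inclusive.

```python
from datetime import date

# Affected window as stated in the Limitations text.
# Treating both endpoints as inclusive is an assumption.
OUTAGE_START = date(2024, 5, 13)
OUTAGE_END = date(2024, 8, 6)


def in_outage_window(day: date) -> bool:
    """Return True if `day` falls inside the documented outage window."""
    return OUTAGE_START <= day <= OUTAGE_END


def drop_outage_rows(rows):
    """Keep only rows outside the outage window.

    `rows` is an iterable of (date, value) pairs; this layout is a
    hypothetical stand-in for whatever structure the caller uses.
    """
    return [(day, value) for day, value in rows if not in_outage_window(day)]


# Made-up values, for illustration only.
rows = [
    (date(2024, 5, 12), 1.8),
    (date(2024, 5, 13), 1.1),
    (date(2024, 8, 6), 1.0),
    (date(2024, 8, 7), 1.7),
]
clean = drop_outage_rows(rows)
```

Dropping the rows, rather than rescaling them, matches the statement that values in the window do not reflect actual search popularity.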