Describe the bug
We had noticed some strange patterns before when plotting the wildfire data from the API. However, the statistics on the Hazard.intensity.data looked fine, so we assumed a plotting issue. Turns out, the Hazard.intensity is not in canonical format, meaning that multiple entries in intensity.data map to the same entry in the intensity matrix. This causes erroneously high intensities for about 6% of all centroids.
To Reproduce
Steps to reproduce the behavior/error:
1. Download a wildfire dataset (here: Mexico) and plot it. Observe weird rectangular patterns with high intensity.
2. Notice that the matrix is not in canonical format, and that its maximum is way larger than the data maximum.
3. To add to the confusion, calling max will bring it into canonical format.
Canonical format: False
Min data: 300.2
Max data: 506.2
Min value: 0.0
Max value: 1695.1
Canonical format: True
Min data: 300.2
Max data: 1695.1
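The same pattern can be reproduced without downloading anything. The following is a minimal sketch on a hand-built matrix: the values and the `report` helper are made up for illustration and are not taken from the wildfire dataset. The canonicalization is triggered explicitly with `sum_duplicates()` here, so the sketch does not depend on what `max` does internally.

```python
import numpy as np
from scipy import sparse

# Toy "event" with two stored entries for column 1 (hypothetical values)
data = np.array([300.2, 506.2, 400.0])
indices = np.array([1, 1, 3])
indptr = np.array([0, 3])
toy = sparse.csr_matrix((data, indices, indptr), shape=(1, 5))


def report(matrix):
    """Print and return the canonical flag, the max of .data, and the max matrix value."""
    canonical = matrix.has_canonical_format
    max_data = matrix.data.max()
    max_value = matrix.toarray().max()  # dense conversion sums duplicate entries
    print(f"Canonical format: {canonical}")
    print(f"Max data: {max_data}")
    print(f"Max value: {max_value}")
    return canonical, max_data, max_value


before = report(toy)

# Explicitly merge duplicates; afterwards .data holds the summed value
toy.sum_duplicates()
after = report(toy)
```

Before the call, the largest stored entry is smaller than the largest matrix value; afterwards the two agree because the duplicates have been summed into one entry.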
Background
The reason for this is that multiple entries in intensity.data point to the same matrix entries of intensity. csr_matrix supports this, and sums up these values.
import numpy as np
import pandas as pd
from climada.util.api_client import Client

client = Client()
wf_mexico = client.get_hazard("wildfire", properties={"country_iso3alpha": "MEX"})

# Column indices (centroids) stored for the first event
indices_event_0 = wf_mexico.intensity.indices[
    wf_mexico.intensity.indptr[0] : wf_mexico.intensity.indptr[1]
]

# Centroids that appear more than once in this event
value_counts = pd.Series(indices_event_0).value_counts()
print(value_counts[value_counts > 1])
print(
    "Data pointing to centroid 84509: "
    f"{wf_mexico.intensity.data[np.nonzero(indices_event_0 == 84509)]}"
)

# Even in non-canonical format, data entries are summed
print(f"Intensity for centroid 84509: {wf_mexico.intensity[0, 84509]}")
The canonical format can be explicitly reached by calling sum_duplicates(). Summing the values is the default behavior and causes the exacerbated intensities.
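To gauge how widespread the duplicates are across all events instead of only event 0, the stored positions can be counted in one pass. This is a rough sketch: `duplicate_summary` is a hypothetical helper, and the toy matrix stands in for `wf_mexico.intensity`.

```python
import numpy as np
from scipy import sparse


def duplicate_summary(matrix):
    """Return (number of distinct stored positions, number of positions stored more than once)."""
    # Expand indptr into one row index per stored entry
    rows = np.repeat(np.arange(matrix.shape[0]), np.diff(matrix.indptr))
    pairs = np.stack([rows, matrix.indices], axis=1)
    # Count how often each (row, column) pair occurs in the stored data
    _, counts = np.unique(pairs, axis=0, return_counts=True)
    return counts.size, int(np.count_nonzero(counts > 1))


# Toy matrix with two events; each has one duplicated column (hypothetical values)
toy_data = np.array([300.2, 506.2, 400.0, 350.0, 450.0])
toy_indices = np.array([1, 1, 3, 0, 0])
toy_indptr = np.array([0, 3, 5])
toy_matrix = sparse.csr_matrix((toy_data, toy_indices, toy_indptr), shape=(2, 5))

n_positions, n_duplicated = duplicate_summary(toy_matrix)
print(f"{n_duplicated} of {n_positions} stored positions have duplicate entries")
```

Applied to the real hazard, the call would be `duplicate_summary(wf_mexico.intensity)`. For a very large matrix, `np.unique` with `axis=0` may be slow, so counting per event as above remains a reasonable alternative.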
How to solve this
The following code prunes the doubled entries and calls an aggregation method. Unfortunately, the doubled entries are actually not doubled (as you can see above) but show different values. It is thus unclear how they should be merged into a single sensible value. It is possible to take the mean or the max.
I want to make clear that this is not just a fix. The following function modifies the hazard data. There seems to have been some underlying problem with merging the original wildfire data or with the hazard definition that should be fixed in order to avoid this inconsistent hazard definition from the start.
def prune_hazard(hazard, agg):
    intensity = hazard.intensity
    num_rows = intensity.shape[0]

    # Iterate over events (rows)
    for row in range(num_rows):
        # Extract indices and data for event
        indices_event = intensity.indices[
            intensity.indptr[row] : intensity.indptr[row + 1]
        ]
        data_event = intensity.data[
            intensity.indptr[row] : intensity.indptr[row + 1]
        ]

        # Iterate over duplicate indices and aggregate the data
        index_counts = pd.Series(indices_event).value_counts()
        for idx, count in index_counts[index_counts > 1].items():
            data_idx = np.nonzero(indices_event == idx)
            data_event[data_idx] = agg(data_event[data_idx]) / count

    # Bring into canonical format
    intensity.sum_duplicates()


# Call like this:
prune_hazard(wf_mexico, np.max)  # or np.mean
wf_mexico.plot_intensity(0)
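As a possible alternative to the row-by-row loop, the duplicates could also be merged in one vectorized step and a new matrix built from the result. This is only a sketch under the same caveat as above: it changes the hazard data. `aggregate_duplicates` is a hypothetical helper, not part of CLIMADA, and the toy matrix uses made-up values.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def aggregate_duplicates(matrix, agg="max"):
    """Return a new CSR matrix where duplicate entries are merged with `agg` instead of summed."""
    # One row index per stored entry, derived from indptr
    rows = np.repeat(np.arange(matrix.shape[0]), np.diff(matrix.indptr))
    frame = pd.DataFrame({"row": rows, "col": matrix.indices, "value": matrix.data})
    # Merge all entries that share the same (row, column) position
    merged = frame.groupby(["row", "col"], sort=True)["value"].agg(agg).reset_index()
    return sparse.csr_matrix(
        (
            merged["value"].to_numpy(),
            (merged["row"].to_numpy(), merged["col"].to_numpy()),
        ),
        shape=matrix.shape,
    )


# Toy matrix with duplicated positions (0, 1) and (1, 0); values are hypothetical
raw = sparse.csr_matrix(
    (
        np.array([300.2, 506.2, 400.0, 350.0, 450.0]),
        np.array([1, 1, 3, 0, 0]),
        np.array([0, 3, 5]),
    ),
    shape=(2, 5),
)

merged_max = aggregate_duplicates(raw, agg="max")
merged_mean = aggregate_duplicates(raw, agg="mean")
print(merged_max.toarray())
print(merged_mean.toarray())
```

For the hazard itself, this would presumably be used as `wf_mexico.intensity = aggregate_duplicates(wf_mexico.intensity)`. Unlike `prune_hazard`, it returns a new matrix instead of modifying the stored data in place, which makes it easier to compare the result against the original.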
Result:
Climada Version: develop
System Information (please complete the following information):
@emanuel-schmid @chahank @samluethi