Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Gridded data #3481

Open
leaver2000 opened this issue Apr 17, 2024 · 13 comments
Open

Support for Gridded data #3481

leaver2000 opened this issue Apr 17, 2024 · 13 comments
Labels
Type: Feature New functionality

Comments

@leaver2000
Copy link

What should we add?

I've been developing an application that works with gridded data. The metpy.calc.thermo module has been a great guide for my work thus far.

My moist_lapse function differs from metpy.calc.thermo.moist_lapse and is written in Cython, that seemed to be one of the major bottlenecks the vectorizing things for gridded data support.

I currently have implementations of the moist_lapse, wet_bulb_temperature, parcel_profile, ccl, and downdraft_cape that support 2d: (N, Z) array structure.

My implementations don't use pint, everything is assumed si units. Let me know if there is any interest I'm happy to share.

pressure = isobaric_levels()
(temperature, specific_humidity), shape = get_values(["temperatureair", "humidityspecific"])
dewpoint = thermo.dewpoint_from_specific_humidity(pressure[newaxis, :], specific_humidity)

dcape = thermo.downdraft_cape(pressure, temperature, dewpoint)
dcape = dcape.reshape((-1,) + shape[2:])


fig, axes = plt.subplots(1, dcape.shape[0])
fig.tight_layout()
fig.suptitle("dcape", fontsize=16, x=0.5, y=0.65)

for i, ax in enumerate(axes):
    ax.imshow(dcape[i, :, :], cmap="coolwarm")

image

T = temperature[::2500]
Td = dewpoint[::2500]
my_dcape = thermo.downdraft_cape(
    pressure=pressure,
    temperature=T,
    dewpoint=Td,
)
mp_dcape = np.array(
    [
        mpcalc.downdraft_cape(pressure * units.pascal, T[i] * units.kelvin, Td[i] * units.kelvin)[0].m
        for i in range(T.shape[0])
    ]
)

delta = np.abs(my_dcape - mp_dcape)
print(f"{delta.max() = } {delta.mean() = } {delta.std() = }")

delta.max() = 0.6819282354549614 delta.mean() = 0.1615940182263646 delta.std() = 0.13604876693905152

Reference

No response

@leaver2000 leaver2000 added the Type: Feature New functionality label Apr 17, 2024
@leaver2000
Copy link
Author

assert (260640, 24) == temperature.shape == dewpoint.shape # (T*Y*X, Z)
assert (24,) == pressure.shape
%timeit thermo.downdraft_cape(pressure, temperature, dewpoint)
# 218 ms ± 11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@Z-Richard
Copy link
Contributor

First of all, great work! Indeed, there has been considerable debate and effort on making MetPy faster through (a) vectorization, e.g., #1968; and (b) using faster variants of NumPy, e.g., #3432. I think this increasing need echoes the demand of calculating convective metrics (CAPE, CIN) with model data, as understanding the changing nature of thunderstorms in a warming world has become a very important research topic. The original development of many MetPy thermodynamic functions, however, likely has a meteorological audience in mind (who might only need to calculate CAPE/CIN at a few points). And yet, the scientific community currently lacks such a tool to calculate these quantities en masse.
It would be helpful (and deeply appreciated) if you can put your code in a public repository (as a reference point if the MetPy developers want to take a look at the code later) or directly submit a draft PR (like #1968). While I am not a core developer of MetPy and do not know how this effort will evolve, I believe that building a faster, vectorized version of thermodynamic functions as a community will benefit future users substantially.

@leaver2000
Copy link
Author

leaver2000 commented Apr 27, 2024

@dopplershift what type of precision are you looking for gridded solutions? I've put together a couple of benchmarks below.

Cython functions

moist_lapse

import numpy as np
import metpy.calc as mpcalc
from metpy.units import units

import nzthermo as nzt
N = 1000
Z = 20

P = np.linspace(101325, 10000, Z)[np.newaxis, :] # (1, Z)
T = np.random.uniform(300, 200, N) # (N,)

ml = nzt.moist_lapse(P, T)
%timeit nzt.moist_lapse(P, T)
# 1.22 ms ± 142 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
P = P[0] * units.Pa
T = T * units.kelvin
ml_ = [mpcalc.moist_lapse(P, T[i]).m for i in range(N)]  # type: ignore
%timeit [mpcalc.moist_lapse(P, T[i]).m for i in range(N)]
# 1.65 s ± 29.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
np.testing.assert_allclose(ml, ml_, rtol=1e-3)

lcl

P = np.random.uniform(101325, 10000, N) # (N,)
T = np.random.uniform(300, 200, N) # (N,)
Td = T - np.random.uniform(0, 10, N) # (N,)

lcl_p, lcl_t = nzt.lcl(P, T, Td) # ((N,), (N,))
%timeit nzt.lcl(P, T, Td)
# 1.4 ms ± 373 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
P *= units.Pa
T *= units.kelvin
Td *= units.kelvin
lcl_p_, lcl_t_ = (x.m for x in mpcalc.lcl(P, T, Td))  # type: ignore
%timeit mpcalc.lcl(P, T, Td)
# 1.57 s ± 7.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
np.testing.assert_allclose(lcl_p, lcl_p_, rtol=1e-3)
np.testing.assert_allclose(lcl_t, lcl_t_, rtol=1e-3)

Implementations

wet_bulb_temperature

P = np.random.uniform(101325, 10000, 1000).astype(np.float32)
T = np.random.uniform(300, 200, 1000).astype(np.float32)
Td = T - np.random.uniform(0, 10, 1000).astype(np.float32)

wb = nzt.wet_bulb_temperature(P, T, Td)
%timeit nzt.wet_bulb_temperature(P, T, Td)
# 1.17 ms ± 124 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
P *= units.Pa
T *= units.kelvin
Td *= units.kelvin
wb_ = mpcalc.wet_bulb_temperature(P, T, Td).m
%timeit mpcalc.wet_bulb_temperature(P, T, Td)
# 390 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
np.testing.assert_allclose(wb, wb_, rtol=1e-3)

downdraft_cape

NOTE: random values caused metpy's dcape function to throw interpolation warnings and return nan in many most cases.

pressure = np.array(
    [1013, 1000, 975, 950, 925, 900, 875, 850, 825, 800, 775, 750, 725, 700, 650, 600, 550, 500, 450, 400, 350, 300],
).astype(np.float32)
pressure *= 100.0
temperature = np.array(
    [
        [243, 242, 241, 240, 239, 237, 236, 235, 233, 232, 231, 229, 228, 226, 235, 236, 234, 231, 226, 221, 217, 211],
        [250, 249, 248, 247, 246, 244, 243, 242, 240, 239, 238, 236, 235, 233, 240, 239, 236, 232, 227, 223, 217, 211],
        [293, 292, 290, 288, 287, 285, 284, 282, 281, 279, 279, 280, 279, 278, 275, 270, 268, 264, 260, 254, 246, 237],
        [300, 299, 297, 295, 293, 291, 292, 291, 291, 289, 288, 286, 285, 285, 281, 278, 273, 268, 264, 258, 251, 242],
    ],
    dtype=np.float32,
)
dewpoint = np.array(
    [
        [224, 224, 224, 224, 224, 223, 223, 223, 223, 222, 222, 222, 221, 221, 233, 233, 231, 228, 223, 218, 213, 207],
        [233, 233, 232, 232, 232, 232, 231, 231, 231, 231, 230, 230, 230, 229, 237, 236, 233, 229, 223, 219, 213, 207],
        [288, 288, 287, 286, 281, 280, 279, 277, 276, 275, 270, 258, 244, 247, 243, 254, 262, 248, 229, 232, 229, 224],
        [294, 294, 293, 292, 291, 289, 285, 282, 280, 280, 281, 281, 278, 274, 273, 269, 259, 246, 240, 241, 226, 219],
    ],
    dtype=np.float32,
)

dcape = nzt.downdraft_cape(pressure, temperature, dewpoint)
%timeit nzt.downdraft_cape(pressure, temperature, dewpoint)
# 2.41 ms ± 877 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
P = pressure * units.Pa
T = temperature * units.kelvin
Td = dewpoint * units.kelvin
dcape_ = [mpcalc.downdraft_cape(P, T[i], Td[i])[0].m for i in range(temperature.shape[0])]  # type: ignore
%timeit [mpcalc.downdraft_cape(P, T[i], Td[i]) for i in range(temperature.shape[0])]
# 16.5 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
np.testing.assert_allclose(dcape, dcape_, rtol=1e-2)

I am currently looking into the _parcel_profile_helper and have a rough implementation developed.

@zhangslTHU
Copy link

hi, @leaver2000 , thanks a lot for your great work. From above examples, it seems that your implementations are much faster than metpy which takes a long long time to calculate wet bulb temperature.

From the codes and nzthermo website, it seems nzthermo only supports 1D data when calculating wet bulb temperature, could nzthermo calculate 3D data (e.g., time, lat, lon) ?

Last but not least, it would be very helpful if you can provide a way to install nzthermo package with conda (like conda install nzthermo).

@leaver2000
Copy link
Author

leaver2000 commented May 6, 2024

From the codes and nzthermo website, it seems nzthermo only supports 1D data when calculating wet bulb temperature, could nzthermo calculate 3D data (e.g., time, lat, lon) ?

For wetbulb temperature that can be accomplished as shown below.

TIME = 144
LAT = 721
LON = 1440
N = TIME * LAT * LON
P = np.random.uniform(101325, 10000, N).astype(np.float32).reshape(TIME, LAT, LON)
T = np.random.uniform(300, 200, N).astype(np.float32).reshape(TIME, LAT, LON)
Td = T - np.random.uniform(0, 10, N).astype(np.float32).reshape(TIME, LAT, LON)

wb = nzt.wet_bulb_temperature(P.ravel(), T.ravel(), Td.ravel()).reshape(TIME, LAT, LON)

Last but not least, it would be very helpful if you can provide a way to install nzthermo package with conda (like conda install nzthermo).

I can look into it publishing a release, the challenge is the only supported/tested OS is linux.

@zhangslTHU
Copy link

Thanks a lot. Hope we can see the released version soon.

How to install nzthermo in linux? I'm curious about the code pip install . shown in nzthermo Getting Started, may be pip install nzthermo ?

@leaver2000
Copy link
Author

leaver2000 commented May 6, 2024

Thanks a lot. Hope we can see the released version soon.

How to install nzthermo in linux? I'm curious about the code pip install . shown in nzthermo Getting Started, may be pip install nzthermo ?

@zhangslTHU You can use pip to and git to build from source, like...

~ mkdir myproject
➜  ~ cd myproject 
➜  myproject python3.11 -m venv .venv && source .venv/bin/activate  
(.venv) ➜  myproject pip install git+https://github.com/leaver2000/nzthermo@master
(.venv) ➜  myproject python -c 'import numpy as np;import nzthermo as nzt;print(nzt.wet_bulb_temperature(np.array([101325.]),np.array([300.0]),np.array([292.0])))'
[294.36783585]

pip install . assumes you are inside of the project directory

@zhangslTHU
Copy link

@leaver2000 thanks a lot. it works now!

It is much faster than Metpy according to test results. It took about 21 mins by Metpy while 0.5s by nzthermo for a array with shape (time=24, lat=181, lon=360),.

Here shows the actual time i spent to calculate wet bulb temperature for ERA5 hourly data in Jan 2022 (31 days) with Metpy, and same data with nzthermo but for 1 year (12 months for 2022):

running time for metpy
read 1th data: 2024-05-06 21:36
finish 1th wbt calculation: 2024-05-06 21:57
read 2th data: 2024-05-06 21:57
finish 2th wbt calculation: 2024-05-06 22:18
read 3th data: 2024-05-06 22:18
finish 3th wbt calculation: 2024-05-06 22:40
read 4th data: 2024-05-06 22:40
finish 4th wbt calculation: 2024-05-06 23:01
read 5th data: 2024-05-06 23:01
finish 5th wbt calculation: 2024-05-06 23:22
read 6th data: 2024-05-06 23:22
finish 6th wbt calculation: 2024-05-06 23:44
read 7th data: 2024-05-06 23:44
finish 7th wbt calculation: 2024-05-07 00:05
read 8th data: 2024-05-07 00:05
finish 8th wbt calculation: 2024-05-07 00:27
read 9th data: 2024-05-07 00:27
finish 9th wbt calculation: 2024-05-07 00:48
read 10th data: 2024-05-07 00:48
finish 10th wbt calculation: 2024-05-07 01:09
read 11th data: 2024-05-07 01:09
finish 11th wbt calculation: 2024-05-07 01:31
read 12th data: 2024-05-07 01:31
finish 12th wbt calculation: 2024-05-07 01:52
read 13th data: 2024-05-07 01:52
finish 13th wbt calculation: 2024-05-07 02:14
read 14th data: 2024-05-07 02:14
finish 14th wbt calculation: 2024-05-07 02:35
read 15th data: 2024-05-07 02:35
finish 15th wbt calculation: 2024-05-07 02:56
read 16th data: 2024-05-07 02:56
finish 16th wbt calculation: 2024-05-07 03:18
read 17th data: 2024-05-07 03:18
finish 17th wbt calculation: 2024-05-07 03:39
read 18th data: 2024-05-07 03:39
finish 18th wbt calculation: 2024-05-07 04:00
read 19th data: 2024-05-07 04:00
finish 19th wbt calculation: 2024-05-07 04:21
read 20th data: 2024-05-07 04:21
finish 20th wbt calculation: 2024-05-07 04:43
read 21th data: 2024-05-07 04:43
finish 21th wbt calculation: 2024-05-07 05:04
read 22th data: 2024-05-07 05:04
finish 22th wbt calculation: 2024-05-07 05:25
read 23th data: 2024-05-07 05:25
finish 23th wbt calculation: 2024-05-07 05:47
read 24th data: 2024-05-07 05:47
finish 24th wbt calculation: 2024-05-07 06:08
read 25th data: 2024-05-07 06:08
finish 25th wbt calculation: 2024-05-07 06:29
read 26th data: 2024-05-07 06:29
finish 26th wbt calculation: 2024-05-07 06:50
read 27th data: 2024-05-07 06:50
finish 27th wbt calculation: 2024-05-07 07:12
read 28th data: 2024-05-07 07:12
finish 28th wbt calculation: 2024-05-07 07:33
read 29th data: 2024-05-07 07:33
finish 29th wbt calculation: 2024-05-07 07:54
read 30th data: 2024-05-07 07:54
finish 30th wbt calculation: 2024-05-07 08:16
read 31th data: 2024-05-07 08:16
finish 31th wbt calculation: 2024-05-07 08:37
running time for nzthermo
read 1th mon data: 2024-05-07 14:35
finish 1th mon wbt calculation: 2024-05-07 14:35
read 2th mon data: 2024-05-07 14:35
finish 2th mon wbt calculation: 2024-05-07 14:35
read 3th mon data: 2024-05-07 14:35
finish 3th mon wbt calculation: 2024-05-07 14:36
read 4th mon data: 2024-05-07 14:36
finish 4th mon wbt calculation: 2024-05-07 14:36
read 5th mon data: 2024-05-07 14:36
finish 5th mon wbt calculation: 2024-05-07 14:36
read 6th mon data: 2024-05-07 14:36
finish 6th mon wbt calculation: 2024-05-07 14:37
read 7th mon data: 2024-05-07 14:37
finish 7th mon wbt calculation: 2024-05-07 14:37
read 8th mon data: 2024-05-07 14:37
finish 8th mon wbt calculation: 2024-05-07 14:37
read 9th mon data: 2024-05-07 14:37
finish 9th mon wbt calculation: 2024-05-07 14:38
read 10th mon data: 2024-05-07 14:38
finish 10th mon wbt calculation: 2024-05-07 14:38
read 11th mon data: 2024-05-07 14:38
finish 11th mon wbt calculation: 2024-05-07 14:39
read 12th mon data: 2024-05-07 14:39
finish 12th mon wbt calculation: 2024-05-07 14:39

thanks again for the great work and the package your developed, it is very helpful and save much time!

@DWesl
Copy link
Contributor

DWesl commented May 9, 2024

For the functions that require a profile, the signature you chose suggests NumPy gufuncs, have you looked into those? (They are possible in Cython, but dealing with the possibility of someone having a record array of 32-bit floats and 8-bit integers and passing just the floats to your function is annoying).

I'm pretty sure wet-bulb temperature is a point-by-point function (even if you use a definition with parcels rising and falling), so the cython.ufunc decorator will take care of that for you. (Or, since you already have that working, speed up the adaptation of other functions).

@leaver2000
Copy link
Author

I've have tested out the cython.ufunc decorator in the past, and I'll admit my current implementations are somewhat inconsistent with typical array broadcasting.

The wetbulb temperature is element-wise ie: (N,)(N,)(N,) but still calls into the moist_lapse ODE which even for a single level is still solved via recursive iteration.

Many other use cases of moist_lapse require (N,Z),(N,),(N,) and it gets a bit tricky when you need each parcel to start and stop at a specific level along Z

@dopplershift
Copy link
Member

@leaver2000 Apologies for not responding sooner. I want to give a more thoughtful response, but just wanted to say that I am very interested in this work and it aligns with what is our highest development priority, getting things fast enough to have thermo (esp. CAPE) work on input grids.

Regarding precision, I just want to feel confident that what's coming out is correct.

@leaver2000

This comment was marked as outdated.

@leaver2000
Copy link
Author

leaver2000 commented Jun 7, 2024

edit: I must have had radar on the brain, any reference to MRMS (multi radar multi sensor) should be HRRR (high resolution rapid refresh)

@dopplershift I just merged my development branch into master which provides support for CAPE and CIN calculations, here is a MRMS HRRR example from earlier this week. I'm able to compute the CAPE & CIN values in about 20 seconds for ~2,000,000 points over 40 vertical levels.

isobaric = xr.open_dataset(
    "hrrr.t00z.wrfprsf00.grib2",
    engine="cfgrib",
    backend_kwargs={"filter_by_keys": {"typeOfLevel": "isobaricInhPa"}},
)
surface = xr.open_dataset(
    "hrrr.t00z.wrfsfcf00.grib2",
    engine="cfgrib",
    backend_kwargs={"filter_by_keys": {"typeOfLevel": "surface", "stepType": "instant"}},
)
T = isobaric["t"].to_numpy()  # (K) (Z, Y, X)
Z, Y, X = T.shape
N = Y * X
T = T.reshape(Z, N).transpose()  # (N, Z)

P = isobaric["isobaricInhPa"].to_numpy().astype(np.float32) * 100.0  # (Pa)
Q = isobaric["q"].to_numpy()  # (kg/kg) (Z, Y, X)
Q = Q.reshape(Z, N).transpose()  # (N, Z)

Td = nzt.dewpoint_from_specific_humidity(P[np.newaxis], Q)

prof = nzt.parcel_profile(P, T[:, 0], Td[:, 0])

CAPE, CIN = nzt.cape_cin(P, T, Td, prof)

CAPE = CAPE.reshape(Y, X)
CIN = CIN.reshape(Y, X)


lat = isobaric["latitude"].to_numpy()
lon = isobaric["longitude"].to_numpy()
lon = (lon + 180) % 360 - 180
timestamp = datetime.datetime.fromisoformat(isobaric["time"].to_numpy().astype(str).item())

fig, axes = plt.subplots(2, 2, figsize=(24, 12), subplot_kw={"projection": ccrs.PlateCarree()})
fig.suptitle(f"{timestamp:%Y-%m-%dT%H:%M:%SZ} | shape {Z, Y, X} | size {Z*Y*X:,}", fontsize=16, y=0.9)

# I suspect that the difference between our cape calculations and the MRMS cape calculations is due
# to the fact we are not actually starting at the surface or accounting for surface elevation.
# leading to inflated cape values in areas of higher elevation.
cape = np.where(CAPE < 8000, CAPE, 8000)
cin = np.where(CIN > -1400, CIN, -1400)

for ax, data, title, cmap in zip(
    axes[0], [cape, cin], ["NZTHERMO CAPE", "NZTHERMO CIN"], ["inferno", "inferno_r"]
):
    ax.coastlines(color="white", linewidth=0.25)
    ax.set_title(title, fontsize=16)
    ax.set_global()
    ax.set_extent([lon.min(), lon.max(), lat.min(), lat.max()])
    cf = ax.contourf(lon, lat, data, transform=ccrs.PlateCarree(), cmap=cmap)
    plt.colorbar(cf, ax=ax, orientation="vertical", pad=0.05, label="J/kg", shrink=0.75)

MRMS_CAPE = surface["cape"].to_numpy()
MRMS_CIN = surface["cin"].to_numpy()
for ax, data, title, cmap in zip(
    axes[1], [MRMS_CAPE, MRMS_CIN], ["MRMS CAPE", "MRMS CIN"], ["inferno", "inferno_r"]
):
    ax.coastlines(color="white", linewidth=0.25)
    ax.set_title(title, fontsize=16)
    ax.set_global()
    ax.set_extent([lon.min(), lon.max(), lat.min(), lat.max()])
    cf = ax.contourf(lon, lat, data, transform=ccrs.PlateCarree(), cmap=cmap)
    plt.colorbar(cf, ax=ax, orientation="vertical", pad=0.05, label="J/kg", shrink=0.75)

image

I've also migrated several of the Cython functions over to C++. I like Cython a lot but there is very little tooling around it ie linters formatters. Several of those C++ have been exposed to the Python as a np.ufunc as @DWesl suggested

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature New functionality
Projects
None yet
Development

No branches or pull requests

5 participants