increase minimal xugrid version to speed up xu.open_dataset()
#968
This is almost certainly this change: https://docs.xarray.dev/en/stable/whats-new.html#performance
Relevant discussion: pydata/xarray#9058
(Funny, the discussion has even worse datasets with 8500 variables!) Excellent timing!
I am not sure whether this is the case: when I downgrade dfm_tools, xugrid, xarray, dask and netcdf4 to versions prior to this fix, I still see no memory increase upon merging the grid. I am a bit puzzled, but won't put additional time into it at the moment.
xugrid was improved so opening datasets is approx 30% faster, and merging partitions is approx 10% faster. These improvements are mainly relevant for datasets with many variables; the example dataset contained 2410 variables. For datasets with <100 variables, the difference might not be noteworthy.
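For context, a rough sketch of how such a comparison could be timed. The file pattern is hypothetical, and this simply times the two xugrid calls involved (`xu.open_dataset()` and `xu.merge_partitions()`), not the full dfm_tools workflow:

```python
# Rough sketch: time opening and merging of map partitions with xugrid.
# The glob pattern below is hypothetical; point it at actual partitioned map output.
import glob
import time

import xugrid as xu

file_nc = "path/to/model_output/*_map.nc"  # hypothetical partitioned map files

t0 = time.perf_counter()
partitions = [xu.open_dataset(f) for f in sorted(glob.glob(file_nc))]
print(f"open_dataset: {time.perf_counter() - t0:.2f} s")

t0 = time.perf_counter()
uds = xu.merge_partitions(partitions)
print(f"merge_partitions: {time.perf_counter() - t0:.2f} s")
```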
Todo:

- speed up `_get_topology()` by not accessing the dataArray each time: xugrid#285
- add `ds.close()` to `dfmt.open_partitioned_dataset()` (see below) >> won't do, since plotting (or any other action) fills up the memory again, and it would have some performance impact when closing and re-opening all partitions again.
- replace `ds[var]` or `ds[varn]` in dfm_tools with `ds.variables[varn]`, e.g. in `decode_default_fillvals()` and `remove_nan_fillvalue_attrs()`, or on their `uds` counterparts. At least the latter has quite some impact on the `dfmt.open_partitioned_dataset()` timings (see the access-pattern sketch after this list).
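For illustration, a minimal sketch of the access pattern meant in the last item. The file and loop are hypothetical, not the actual dfm_tools code:

```python
# Sketch of the suggested access pattern: ds.variables[varn] returns a plain
# xarray.Variable, whereas ds[varn] constructs a full DataArray with coordinates,
# which is noticeably slower when looping over thousands of variables.
import xarray as xr

ds = xr.open_dataset("path/to/one_partition_map.nc")  # hypothetical file

for varn in ds.variables:
    var = ds.variables[varn]      # cheap: plain Variable access
    # da = ds[varn]               # expensive: builds a DataArray per variable
    fillvalue = var.encoding.get("_FillValue", var.attrs.get("_FillValue"))
    if fillvalue is not None:
        pass  # e.g. decide whether to decode or drop the fill value
```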
Code to run with `memory_profiler`:
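A minimal sketch of such a profiling script (the file pattern is hypothetical; run e.g. with `mprof run` and `mprof plot`). The "with/without `ds.close()`" variants below refer to where the per-partition xarray datasets are closed inside `dfmt.open_partitioned_dataset()`:

```python
# Sketch of a script to run with memory_profiler:
#   mprof run profile_open.py && mprof plot
from memory_profiler import profile

import dfm_tools as dfmt

file_nc = "path/to/model_output/*_map.nc"  # hypothetical partitioned map files

@profile
def open_and_merge():
    # opens all partitions and merges them into one UgridDataset
    uds = dfmt.open_partitioned_dataset(file_nc)
    return uds

if __name__ == "__main__":
    open_and_merge()
```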
Without `ds.close()` (peaks at around 1150 MB):

With `ds.close()` (peaks at around 700 MB):

**Including ghostcell removal**
The same, but now including removal of ghostcells:
Without `ds.close()` and `remove_ghost=True` (peaks at around 1350 MB):

With `ds.close()` directly after `xu.core.wrap.UgridDataset` and `remove_ghost=True` (peaks at around 1350 MB):

With `ds.close()` after `remove_ghostcells()` and `remove_ghost=True` (peaks at around 900 MB):

**including plot**
When adding a plot (see the sketch below), the memory reduction disappears again:
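A sketch of the added plot step. The variable name is hypothetical, and this assumes the xugrid plotting accessor; the point is that plotting reads the data into memory, so the gain from closing the datasets largely disappears:

```python
# Sketch: same open/merge as above, now followed by a plot of one variable.
import matplotlib.pyplot as plt

import dfm_tools as dfmt

file_nc = "path/to/model_output/*_map.nc"  # hypothetical partitioned map files
uds = dfmt.open_partitioned_dataset(file_nc)

fig, ax = plt.subplots()
uds["mesh2d_s1"].isel(time=-1).ugrid.plot(ax=ax)  # hypothetical variable name
fig.savefig("map_plot.png")
```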
Without `ds.close()` (peaks at around 1200 MB):

Timings:

With `ds.close()` (peaks at around 1200 MB):

Timings:
**including plot with h5netcdf**
Without `ds.close()` and with `engine="h5netcdf"` (peaks at around 750 MB):

Timings:
So this shows far more time spent on opening the dataset (because of the pure-Python implementation of h5netcdf), but the memory usage is significantly lower. This might become very useful once h5netcdf is more performant: h5netcdf/h5netcdf#195. Follow-up issue: #484
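For reference, a minimal sketch of selecting the h5netcdf backend when opening a single partition (the file path is hypothetical; the `engine` keyword is forwarded to `xarray.open_dataset`):

```python
# Sketch: open one partition with the pure-Python h5netcdf backend instead of the
# default netcdf4 engine. Slower to open, but with a lower memory footprint in the
# profiling above.
import xugrid as xu

uds = xu.open_dataset(
    "path/to/one_partition_map.nc",  # hypothetical file
    engine="h5netcdf",               # forwarded to xarray.open_dataset
)
print(uds)
```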