flexible chunks again in dfmt.open_partitioned_dataset() #953

Closed
5 tasks done
veenstrajelmer opened this issue Aug 14, 2024 · 0 comments · Fixed by #961

veenstrajelmer commented Aug 14, 2024

Deltares/xugrid#253 made the chunks argument inflexible in some cases. The new code now gives merging errors for some large data files.

Todo:

Code to test DCSM with different chunks:

import dfm_tools as dfmt
import datetime as dt

file_nc = r'p:\1204257-dcsmzuno\2006-2012\3D-DCSM-FM\A18b_ntsu1\DFM_OUTPUT_DCSM-FM_0_5nm\DCSM-FM_0_5nm_0*_map.nc'

chunks = "auto" # 21% of memory increase
# chunks = {"time":1} # 2% of memory increase

uds = dfmt.open_partitioned_dataset(file_nc, chunks=chunks)

print('>> load single timestep of waterlevels: ', end='')
dtstart = dt.datetime.now()
uds['mesh2d_s1'].isel(time=365).load()
print(f'{(dt.datetime.now()-dtstart).total_seconds():.2f} sec')

Gives:

chunks = "auto"
>> xu.open_dataset() with 20 partition(s): 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 : 11.92 sec
>> xu.merge_partitions() with 20 partition(s): 4.37 sec
>> dfmt.open_partitioned_dataset() total: 16.31 sec
>> load single timestep of waterlevels: 54.47 sec
25% memory increase

chunks = {"time":1}
>> xu.open_dataset() with 20 partition(s): 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 : 11.30 sec
>> xu.merge_partitions() with 20 partition(s): 4.21 sec
>> dfmt.open_partitioned_dataset() total: 15.53 sec
>> load single timestep of waterlevels: 0.48 sec
2% memory increase
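
To make this comparison repeatable, the timing and memory measurement could be wrapped in a small helper. The sketch below is only an illustration: `measure` is a hypothetical name, not part of dfm_tools or xugrid, and it uses the standard-library `tracemalloc`, which reports peak Python-tracked allocations in MiB rather than the percentage of system memory quoted above. The workload is a stand-in so the snippet runs without access to the DCSM files.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def measure(label):
    """Hypothetical helper: time a block and record its peak traced memory."""
    result = {"label": label}
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield result
    finally:
        # Elapsed wall-clock time of the block
        result["seconds"] = time.perf_counter() - start
        # Peak size of memory blocks traced since tracemalloc.start()
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        result["peak_mib"] = peak / 2**20
        print(f"{label}: {result['seconds']:.2f} sec, "
              f"peak {result['peak_mib']:.1f} MiB")


# Stand-in workload (assumption). In the setting of this issue the block
# would contain uds['mesh2d_s1'].isel(time=365).load() for each chunks value.
with measure("dummy workload") as stats:
    data = [0.0] * 1_000_000
```

Running the same `with measure(...)` block once per `chunks` setting would give directly comparable numbers, although they will not match the system-level percentages listed above.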
@veenstrajelmer veenstrajelmer changed the title set chunks='auto' in dfmt.open_partitioned_dataset() set chunks='auto' in dfmt.open_partitioned_dataset() Aug 14, 2024
@veenstrajelmer veenstrajelmer changed the title set chunks='auto' in dfmt.open_partitioned_dataset() test chunks='auto' in dfmt.open_partitioned_dataset() Aug 15, 2024
@veenstrajelmer veenstrajelmer changed the title test chunks='auto' in dfmt.open_partitioned_dataset() flexible chunks again in dfmt.open_partitioned_dataset() Aug 16, 2024
@veenstrajelmer veenstrajelmer linked a pull request Aug 16, 2024 that will close this issue