
Handle large datasets efficiently #582

Open
dalonsoa opened this issue Oct 9, 2024 · 6 comments
Assignees
Labels
enhancement New feature or request

Comments

@dalonsoa
Collaborator

dalonsoa commented Oct 9, 2024

  • Some models are going to require data at a much higher temporal resolution than the wider model update tick. An example here is sub-daily or daily inputs to the Abiotic model.
  • The input data files for this use case can be very large – not something we really want to ingest into the Data object at model startup and hold in RAM.
  • So, where do we store this kind of data, and is there a way to load the data lazily as required? This might be something that dask is well suited to, as it handles lazy loading of chunked data.
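As a rough illustration of the lazy-loading idea above, here is a minimal sketch using dask arrays. The shapes, chunk sizes, and the daily-mean reduction are hypothetical stand-ins for real climate inputs; in practice the array would come from a NetCDF file opened lazily (e.g. via xarray with a `chunks` argument) rather than being generated in memory.

```python
import dask.array as da

# Hypothetical dimensions: hourly climate data on a 90x90 grid for one year.
hours, ny, nx = 24 * 365, 90, 90

# Build a lazily evaluated array chunked into one-day (24-hour) slabs.
# With a real file this would instead be something like:
#   xr.open_dataset("climate.nc", chunks={"time": 24})
lazy = da.random.random((hours, ny, nx), chunks=(24, ny, nx))

# No data is materialised yet: this only records the computation graph.
daily_mean = lazy.reshape(365, 24, ny, nx).mean(axis=1)

# Only on .compute() are chunks streamed through memory, a day at a time,
# so peak RAM stays far below the size of the full dataset.
result = daily_mean.compute()
print(result.shape)  # (365, 90, 90)
```

The key point is that the simulation would only ever hold the chunks it is currently using, rather than ingesting the whole file into the Data object at startup.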
@dalonsoa dalonsoa added the enhancement New feature or request label Oct 9, 2024
@dalonsoa
Collaborator Author

dalonsoa commented Oct 9, 2024

@vgro, we will need an example simulation with at least one BIG file and some indication of where it is used, so we can explore how best to handle it memory-wise.

@alexdewar alexdewar self-assigned this Oct 9, 2024
@alexdewar
Collaborator

@vgro Do you happen to have a big file like this lying around? No pressure -- I've got lots to be getting on with elsewhere -- but I won't be able to start on this until there's some data for me to work with, so if you do have a chance to look at it over the next few weeks, that'd be great.

@davidorme davidorme added this to the Core structures milestone Jan 14, 2025
@vgro
Collaborator

vgro commented Jan 14, 2025

@alexdewar I'm terribly sorry that I haven't replied to this; I never received an email about the issue, and we haven't checked the issues systematically in a while. I have a few urgent tasks this week, but I'll try to get something to you by the end of next week or so.

@alexdewar
Collaborator

No worries @vgro. If it had been really urgent I'd have sent an email... Whenever you can send it through is fine.

@vgro
Collaborator

vgro commented Jan 16, 2025

> Nw @vgro. If it had been really urgent I'd have sent an email... Whenever you can send it through is fine.

If you want to run a simulation, you would probably need all the input data to have the same dimensions? For example, would you need the climate data to have the same spatial extent and time steps? Or would it be enough to provide one variable, say precipitation?

@alexdewar
Collaborator

Ideally it would have the same dimensions. I haven't looked into it enough to know exactly what I'd need, but big files with somewhat realistic input data should do the trick. Don't spend too long on this -- feel free to just send it through once you've got something and I can let you know if I need anything different.
