This repository has been archived by the owner on Jan 14, 2025. It is now read-only.

Ensemble.calc_nobs expands the number of object partitions #347

Open
dougbrn opened this issue Jan 16, 2024 · 1 comment
Labels
bug Something isn't working

Comments

Collaborator

dougbrn commented Jan 16, 2024

This is likely not unique to calc_nobs; it should occur any time we do a groupby on source and merge the result into object without repartitioning. The groupby produces a result with npartitions equal to the number of source partitions, and when we merge this into the object table, the merged object table also ends up with npartitions equal to the number of source partitions. This introduces partition bloat, which can needlessly expand the task graph. Not necessarily a "bug", but it does not match intuition, which would suggest that an object table with the same number of rows should keep the same number of partitions.

Edit: On second look, it appears we are already attempting to repartition back to the number of object partitions in calc_nobs. Either this isn't working, or this may be a consequence of #342, since the repartition call is being used.
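The partition bookkeeping described above can be sketched with a toy model in plain Python. This is only an illustration under stated assumptions: it does not use Dask or TAPE, the helper names (`count_rows_per_object`, `coalesce`) are hypothetical, and the "merge" is simply assumed to inherit the partitioning of the grouped result, as the comment above reports.

```python
from collections import Counter

N_OBJECTS = 20
ROWS_PER_OBJECT = 3
SOURCE_NPARTITIONS = 10
OBJECT_NPARTITIONS = 2

# Toy source table: 10 partitions, each holding every row of two objects.
# Each row is represented only by its object id.
source_partitions = [
    [oid for oid in (2 * p, 2 * p + 1) for _ in range(ROWS_PER_OBJECT)]
    for p in range(SOURCE_NPARTITIONS)
]


def count_rows_per_object(partitions):
    """Groupby-count within each partition: one output partition per input."""
    return [dict(Counter(part)) for part in partitions]


def coalesce(partitions, npartitions):
    """Concatenate neighbouring partitions into at most `npartitions` chunks."""
    size = -(-len(partitions) // npartitions)  # ceiling division
    chunks = []
    for start in range(0, len(partitions), size):
        merged = {}
        for part in partitions[start:start + size]:
            merged.update(part)
        chunks.append(merged)
    return chunks


# Assumption: the merged object table follows the grouped result's partitioning.
merged_partitions = count_rows_per_object(source_partitions)

# Coalescing back to the object partition count removes the bloat.
restored_partitions = coalesce(merged_partitions, OBJECT_NPARTITIONS)

print(len(merged_partitions), len(restored_partitions))
```

In this model the row content is unchanged by coalescing; only the number of partitions differs, which is the property the repartition step in calc_nobs is meant to restore.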

dougbrn added the bug label Jan 16, 2024
Collaborator Author

dougbrn commented Jan 17, 2024

After some exploration, and testing on the #349 branch, I'm less clear on the actual cause of this issue.

This minimal example behaves correctly: the number of object partitions is conserved.

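from tape import Ensemble

# Load the example dataset without a Dask client
ens = Ensemble(client=False).from_dataset('s82_rrlyrae', table_sync=False)
# Give source more partitions (10) than object (2)
ens.update_frame(ens.source.repartition(npartitions=10))
ens.update_frame(ens.object.repartition(npartitions=2))

# Compute the number of observations per object
ens.calc_nobs(temporary=False, by_band=False)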

I'm seeing this issue with the full ZTF notebook, and it seems to extend to more than just Ensemble.calc_nobs. Will continue investigating.
