Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/unsequa multiprocessing #763

Merged
merged 34 commits into from
Aug 28, 2023
Merged

Conversation

chahank
Copy link
Member

@chahank chahank commented Aug 3, 2023

Changes proposed in this PR:
This PR is a small maintenance of the unsequa module.
Note: merge conflict is dependent on changes in PR #762 . Will be solved after the later is merged. SOLVED.

  • Replace Pathos with Multiprocess for parallel computing in unsequa module
  • Make the computation to explictly chunk the samples and distribute full chunks on nodes if parallelized.
  • Fix bugs with pandas 2.0 (iteritems -> items, append -> concat)
  • Remove the matplotlib parameters
  • Remove the to be deprecated impact.tot_value (optional, can be reversed if this should be done when the deprecated method is removed)

This PR fixes

PR Author Checklist

PR Reviewer Checklist

Chahan Kropf added 4 commits August 3, 2023 14:06
This fixes previously not working parallel computations and introduces a simpler handling of the pool. The users now only need to specify the number of processes.
@chahank chahank requested a review from peanutfun August 3, 2023 12:16
@chahank chahank mentioned this pull request Aug 3, 2023
13 tasks
Chahan Kropf and others added 2 commits August 4, 2023 11:40
# Conflicts:
#	climada/engine/unsequa/calc_cost_benefit.py
#	climada/engine/unsequa/calc_impact.py
#	climada/engine/unsequa/test/test_unsequa.py
@peanutfun peanutfun mentioned this pull request Aug 17, 2023
13 tasks
@timschmi95 timschmi95 mentioned this pull request Aug 21, 2023
13 tasks
@chahank chahank requested a review from emanuel-schmid August 22, 2023 13:19
Chahan Kropf and others added 3 commits August 23, 2023 13:05
* Allow to set loglevel

* Add method to sort samples

* Update CHANGELOG.md

* Add advanced examples for unsequa

* Remove logging control

* Update changelog

* Remove tipo

* Clarify docstring

* Correct docstring

* Update t.o.c.

* Remove unecessary output prints

* Remove linter issues

* Update changelog

---------

Co-authored-by: Chahan Kropf <[email protected]>
Copy link
Member

@peanutfun peanutfun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me but I cannot really assess if this is a clear improvement. My main trouble is the amount of doubled code and the lack of documentation regarding the parallelization process. Other than that, I see no major issues 👍

climada/engine/unsequa/calc_impact.py Outdated Show resolved Hide resolved
climada/engine/unsequa/calc_base.py Outdated Show resolved Hide resolved
climada/engine/unsequa/calc_cost_benefit.py Show resolved Hide resolved
climada/engine/unsequa/calc_cost_benefit.py Show resolved Hide resolved
climada/engine/unsequa/calc_cost_benefit.py Outdated Show resolved Hide resolved
climada/engine/unsequa/calc_cost_benefit.py Show resolved Hide resolved
climada/engine/unsequa/calc_cost_benefit.py Show resolved Hide resolved
climada/engine/unsequa/calc_cost_benefit.py Outdated Show resolved Hide resolved
climada/engine/unsequa/calc_cost_benefit.py Outdated Show resolved Hide resolved
samples_df.iterrows(),
chunksize=chunksize)

p_iterator = self._sample_parallel_iterator(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Make the entire iterator-computation-datamerge part a (local?) function that only takes a dataframe and the process number as arguments. Then you can pass the first DF row with processes=1 to estimate the compute time and afterwards pass the rest of the dataframe with processes=processes. No need to double the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 261415c

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With "local" I meant a function that is defined (only) in the scope of uncertainty. This way, it would need fewer parameters. But it works this way nonetheless

@peanutfun
Copy link
Member

peanutfun commented Aug 28, 2023

None of the "private" module functions (e.g. _multiprocess_chunksize) appear in the compiled docs. Either explicitly tell Sphinx to document private functions of the modules or make the functions public.

Edit: Follow-up issue: #774

Copy link
Member

@peanutfun peanutfun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done!!

@peanutfun peanutfun merged commit 8e0d851 into develop Aug 28, 2023
@emanuel-schmid emanuel-schmid deleted the feature/unsequa_multiprocessing branch August 29, 2023 05:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants