Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

progress bar and parallel compute for cartogram_ncont #36

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

e-kotov
Copy link

@e-kotov e-kotov commented Nov 3, 2024

Hi, trying to solve #35 myself I started with the simplest function cartogram_ncont(). I know there is an older pull request #28 , but this approach with just building on furrr and future seems simpler to me. Arguably it can be improved with better handling of the workers. E.g. instead of users specifying the n_cores in cartogram_ncont() they should be able to set the number of workers and type of parallel processing with future outside the function.

This PR results in cartogram_ncont() gaining both progress bar and parallel processing.

Before making changes to the code, I ran your default example from cartogram_ncont() function and obtained a control value of area of the resulting cartogram for nc from sf package. I wrote simple test for this area in the tests folder using {testthat}. The test passes with my updated code for cartogram_ncont().

@e-kotov
Copy link
Author

e-kotov commented Nov 3, 2024

Update:

  1. fixed missing CRS after applying cartogram_ncont()
  2. added test to prevent missing CRS after applying cartogram_ncont()

weight,
k = 1,
inplace = TRUE,
n_cpu = parallelly::availableCores()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that in general it is recommended to keep the default core use to 1.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

@@ -12,8 +12,16 @@ Authors@R: c(
Description: Construct continuous and non-contiguous area cartograms.
URL: https://github.com/sjewo/cartogram, https://sjewo.github.io/cartogram/
BugReports: https://github.com/sjewo/cartogram/issues
Imports: methods, sf, packcircles
Suggests:
Imports:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, the furrr, parallelly, and future packages could be moved to Suggests: and the code related to them may be only used if the progress bar is on.

(I also think there should be an option to disable the progress bar)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Nowosad On a second thought, that would imply using the for loop (or purrr::map) by default with no progress bar... (though purrr::map also has progress bar). So kind of defeats the purpose. And why would parallel processing rely on the progress bar setting?

Copy link
Owner

@sjewo sjewo Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your suggestions and your work on implementing parallel processing @e-kotov! I haven't had much time to keep an eye on this topic in the past.

I would like to keep the imports as small as possible.

  • Maybe utils::txtProgressBar() is sufficient for adding a progress bar without additional imports?
  • I would also prefer to move the parallelly, and future packages to Suggests, as @Nowosad suggested, and add an if-clause for n_cpu > 1 to use futures parallel execution.
  • I don't want to be petty, but is 'furrr' absolutely necessary? If the code can only be implemented with the future package, I would prefer that.

Best,
Sebastian

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sjewo points taken. I'm not sure excluding furrr is practical, as future package only provides the framework, and then you need some other package like furrr or future.apply to do the job. So why not furrr?

I will get to it next week. What do you think of the older PR #28 ? Did you have plans to merge it at all?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your reply, @e-kotov !

Regarding furrr: i would like to keep the number of dependencies low. future.apply has half the number of imported packages. Nevertheless, if furrr is more practical in your opinion, that's fine for me!

I have not gotten the PR #28 to work and the “future” approach looks promising to me. So if you are working on a different solution that results in some speed improvements, I would certainly accept a PR.

@e-kotov
Copy link
Author

e-kotov commented Nov 3, 2024

@Nowosad I will work some more on this PR in the coming days. Clearly it is not ready for merging.

There is also something funny happening with the resulting cartogram. It does not show itself when applied to the test nc data, but with my own data (cannot upload right now), I am getting an error with ggplot geom_sf:

Error in `st_transform.sfc()`:
! cannot transform sfc object with missing crs

This may have something to do with attributes that I fail to restore to the geom column. I will look into this and update the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants