Alternative to chromote dependency? #85

Open
courtiol opened this issue Dec 22, 2024 · 7 comments

@courtiol

Hi, thanks a lot for the good work on wdpar.

I don't know whether it could be done differently, but chromote is a tricky dependency because it requires Chrome or Chromium to be installed on the system.
For most local usage that is fine, but for remote computing it is a demanding requirement.
For example, I am running wdpar on a remote computer using podman with the rocker geospatial container (https://rocker-project.org/images/), which does not come with Chrome or Chromium.
I was not able to install either browser (the normal install asks you to use snap, which then fails; I also tried to install a headless version of Chrome, with little success).
There must be a way to install Chrome or Chromium, and I will explore that in more depth (perhaps a new rocker container could be set up for that), but the question is whether wdpar really needs to rely on so much software.
So my question is: why rely on chromote in the first place, and how difficult would it be to remove this dependency from wdpar?

Thanks for sharing your thoughts.

@jeffreyhanson
Collaborator

jeffreyhanson commented Dec 22, 2024

Hi @courtiol,

Thanks for reaching out!

Yeah, I've also found chromote to be a tricky dependency. In the past, I used wdman as a dependency instead (https://github.com/ropensci/wdman), but some people were having issues with it on macOS, so I switched to chromote. I chose chromote because it's well maintained by RStudio and seems to work on all platforms.

Just to clarify, did the wdpar installation instructions not work (see https://github.com/prioritizr/wdpar?tab=readme-ov-file#ubuntu)? Running them should install a headless Chromium browser, although I've only tested this on Ubuntu. Also, in case it's helpful, the GitHub Actions CI includes Ubuntu, so the workflow file might be useful (https://github.com/prioritizr/wdpar/blob/master/.github/workflows/R-CMD-check-ubuntu.yaml).

The reason wdpar has chromote as a dependency is to find the URLs for downloading data from Protected Planet (https://www.protectedplanet.net/en). Although the latest global dataset is available at a persistent URL, the country-specific datasets are not, so we have to do some web scraping to find the download URL for a given country. In particular, because of how the Protected Planet website is set up, the scraping has to be done with a tool that can execute JavaScript (which rules out the rvest and xml2 packages). To my knowledge, this means that we require a headless browser (such as Chromium or PhantomJS).

How does that sound?

@courtiol
Author

courtiol commented Dec 23, 2024

Hi @jeffreyhanson,

Thanks for the explanations.
I am using Ubuntu in the container, and I certainly could not run my huge job in a GitHub Action.
Everything installs smoothly; the issue occurs only at run time.
I did follow your installation instructions. Here is what I did after logging in to the remote computer over SSH:
(note: the issue does occur when running wdpar, but I isolated it by running chromote directly)

[XX@XX]$ podman pull rocker/geospatial
[XX@XX]$ podman run -ti rocker/geospatial bash                                                                                                                                                                                                
root@5a68e75a1731:/# apt-get -y update 
root@5a68e75a1731:/# apt-get install -y libgmp3-dev libmpfr-dev libudunits2-dev libgdal-dev libgeos-dev libproj-dev chromium-browser                                                                                                                        
Reading package lists... Done                                                                                                                                                                                                                               
Building dependency tree... Done                                                                                                                                                                                                                            
Reading state information... Done                                                                                                                                                                                                                           
libgmp3-dev is already the newest version (2:6.3.0+dfsg-2ubuntu6).                                                                                                                                                                                          
libmpfr-dev is already the newest version (4.2.1-1build1).                                                                                                                                                                                                  
libudunits2-dev is already the newest version (2.2.28-7build1).                                                                                                                                                                                             
libgdal-dev is already the newest version (3.8.4+dfsg-3ubuntu3).
libgeos-dev is already the newest version (3.12.1-3build1).
libproj-dev is already the newest version (9.4.0-1build2).
chromium-browser is already the newest version (2:1snap1-0ubuntu2).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

root@5a68e75a1731:/# R
                                                                                                                                                                                                                                                            
R version 4.4.2 (2024-10-31) -- "Pile of Leaves"                                                                                                                                                                                                            
Copyright (C) 2024 The R Foundation for Statistical Computing                                                                                                                                                                                               
Platform: x86_64-pc-linux-gnu                                                                                                                                                                                                                               

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> install.packages("chromote")                                                                                                 
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
also installing the dependencies ‘AsioHeaders’, ‘websocket’

trying URL 'https://p3m.dev/cran/__linux__/noble/latest/src/contrib/AsioHeaders_1.22.1-2.tar.gz'
Content type 'binary/octet-stream' length 649323 bytes (634 KB) 
==================================================
downloaded 634 KB

trying URL 'https://p3m.dev/cran/__linux__/noble/latest/src/contrib/websocket_1.4.2.tar.gz'
Content type 'binary/octet-stream' length 468359 bytes (457 KB) 
==================================================
downloaded 457 KB

trying URL 'https://p3m.dev/cran/__linux__/noble/latest/src/contrib/chromote_0.3.1.tar.gz'
Content type 'binary/octet-stream' length 391789 bytes (382 KB) 
==================================================
downloaded 382 KB

* installing *binary* package ‘AsioHeaders’ ...
* DONE (AsioHeaders)
* installing *binary* package ‘websocket’ ...
* DONE (websocket)
* installing *binary* package ‘chromote’ ...
* DONE (chromote)

The downloaded source packages are in
        ‘/tmp/RtmpCwBzky/downloaded_packages’

> chromote::ChromoteSession$new()
Error in `with_random_port()`:
! Cannot find an available port. Please try again.
Caused by error in `startup()`:
! Failed to start chrome. Error:

Command '/usr/bin/chromium-browser' requires the chromium snap to be installed.
Please install it with:

snap install chromium
Run `rlang::last_trace()` to see where the error occurred.

> sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3  
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] websocket_1.4.2 processx_3.8.4  compiler_4.4.2  fastmap_1.2.0  
 [5] magrittr_2.0.3  R6_2.5.1        cli_3.6.3       promises_1.3.2 
 [9] later_1.4.1     tools_4.4.2     Rcpp_1.0.13-1   chromote_0.3.1 
[13] jsonlite_1.8.9  ps_1.8.1        rlang_1.1.4

> quit()

root@5a68e75a1731:/# snap install chromium
error: cannot communicate with server: Post "http://localhost/v2/snaps/chromium": dial unix /run/snapd.socket: connect: no such file or directory

root@5a68e75a1731:/# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=24.04
DISTRIB_CODENAME=noble
DISTRIB_DESCRIPTION="Ubuntu 24.04.1 LTS"

Your comment made me think: is it actually possible to download the latest global dataset and use it with wdpar?
I haven't tried, but that would circumvent the problem, for my usage at least (I do need to download all countries, which for now requires me to jump through hoops since download failures are frequent).

Thanks for the help.

@jeffreyhanson
Collaborator

Thanks so much for providing all these details!

Yeah, this error message does suggest that it isn't finding a Chrome browser. This issue (ashbythorpe/selenider#22) has instructions for installing Google Chrome in a Docker image; maybe they might work?
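
For anyone hitting the same error, a rough way to check which browser command (if any) actually starts inside the container is sketched below. It is only a sketch: the candidate command names and the `find_working_browser` helper are assumptions made for illustration, not part of wdpar or chromote. It assumes that the `chromium-browser` stub from the log above exits with an error when snap is unavailable, and it relies on chromote reading the `CHROMOTE_CHROME` environment variable to locate the browser.

```shell
#!/bin/sh
# Print the path of the first candidate command that really starts.
# The candidate names passed in are assumptions; adjust them per image.
find_working_browser() {
  for candidate in "$@"; do
    resolved=$(command -v "$candidate" 2>/dev/null) || continue
    # A snap stub without snapd is expected to exit non-zero here,
    # so a successful --version call is used as the check.
    if "$resolved" --version >/dev/null 2>&1; then
      printf '%s\n' "$resolved"
      return 0
    fi
  done
  return 1
}

if browser=$(find_working_browser google-chrome-stable google-chrome chromium chromium-browser); then
  # chromote looks at this variable when locating Chrome.
  export CHROMOTE_CHROME="$browser"
  echo "CHROMOTE_CHROME=$CHROMOTE_CHROME"
else
  echo "no working Chrome/Chromium binary found" >&2
fi
```

An R session started from the same shell would inherit the variable, so chromote should then pick up whichever binary passed the check.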

Also, yeah, if you manually download the global dataset, you can then import it with the wdpa_read() function. Note that you would use wdpa_read() on the zip file that you download from Protected Planet. E.g., something like this:

file <- "WDPA_WDOECM_Jan2025_Public_LIE.zip"
wdpa_data <- wdpa_read(file)

@jeffreyhanson
Collaborator

Also, sorry I wasn't clear: I wasn't suggesting that you use GitHub Actions to do the processing. My aim was to provide additional commands/configurations that might help with getting it working. But now that I see this extra info, it probably won't be very useful - sorry!

@jeffreyhanson
Collaborator

Also, you might be able to use wdpa_fetch() with "global" without having chromote working, but I'm not sure. I say this because it's hard-coded to find the global dataset at "http://wcmc.io/wdpa_current_release".
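
If wdpa_fetch() turns out to still need chromote for the global dataset, one fallback might be to fetch the archive from the shell and pass it to wdpa_read(). The sketch below assumes `curl` is available in the container and that the URL above still redirects to a zip archive. The `looks_like_zip` and `fetch_wdpa_global` helpers and the destination file name are hypothetical; the example file name earlier in this thread starts with `WDPA_`, so the sketch keeps that prefix in case wdpa_read() checks for it.

```shell
#!/bin/sh
# URL quoted in this thread for the current global release.
WDPA_GLOBAL_URL="http://wcmc.io/wdpa_current_release"

# Succeed when the file starts with the zip signature "PK".
looks_like_zip() {
  [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}

# Download the global archive to the given path (hypothetical helper).
# -f fails on HTTP errors, -L follows redirects.
fetch_wdpa_global() {
  dest=$1
  curl -fL -o "$dest" "$WDPA_GLOBAL_URL" || return 1
  if ! looks_like_zip "$dest"; then
    echo "download did not produce a zip archive: $dest" >&2
    return 1
  fi
}

# Usage (file name is a placeholder):
#   fetch_wdpa_global WDPA_global_Public.zip
#   then, in R: wdpa_data <- wdpar::wdpa_read("WDPA_global_Public.zip")
```

The signature check is only there to catch the case where the server returns an HTML error page instead of the archive.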

@courtiol
Author

courtiol commented Dec 23, 2024

Thanks again,
the instructions given here (ashbythorpe/selenider#22) do not work for me:

> chromote::ChromoteSession$new()

Error in `with_random_port()`:
! Cannot find an available port. Please try again.
Caused by error in `startup()`:
! Failed to start chrome. Error:
Old Headless mode will be removed from the Chrome binary soon. Please use the new Headless mode (https://developer.chrome.com/docs/chromium/new-headless) or the chrome-headless-shell which is a standalone implementation of the old Headless mode (https://developer.chrome.com/blog/chrome-headless-shell).

[1223/084325.348199:ERROR:zygote_host_impl_linux.cc(101)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.
Run `rlang::last_trace()` to see where the error occurred.
> 

I did try to follow the headless route but I also failed there.
I will attempt using wdpa_read() and bypass the issue.
Thanks for the tip.
I will also ask rocker folks how to solve the chromote setup and report their answer here for other users.
++
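
For reference, the last error line of the log above is Chrome refusing to start as root without `--no-sandbox`. One possible workaround, sketched below, is a small launcher script that adds the flag and is then handed to chromote through the `CHROMOTE_CHROME` environment variable. The `make_no_sandbox_wrapper` helper and both paths in the usage comment are hypothetical, and disabling the sandbox weakens Chrome's isolation, so this probably only makes sense inside a throwaway container.

```shell
#!/bin/sh
# Write a launcher that inserts --no-sandbox before any other
# arguments and then hands off to the real browser binary.
make_no_sandbox_wrapper() {
  real_browser=$1
  wrapper=$2
  # Unquoted heredoc: $real_browser is expanded now,
  # while \$@ is left for the launcher to expand at run time.
  cat > "$wrapper" <<EOF
#!/bin/sh
exec "$real_browser" --no-sandbox "\$@"
EOF
  chmod +x "$wrapper"
}

# Usage inside the container (both paths are placeholders):
#   make_no_sandbox_wrapper /usr/bin/google-chrome /usr/local/bin/chrome-no-sandbox
#   export CHROMOTE_CHROME=/usr/local/bin/chrome-no-sandbox
```

chromote also exports `set_chrome_args()` and `default_chrome_args()`, so adding the flag from within R may be a cleaner route. Whether either approach also deals with the headless-mode warning in the same log is untested here.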

@courtiol
Author

I posted an issue here: rocker-org/rocker-versioned2#892
