arc_select fails on large dataset #2

Closed
ryanzomorrodi opened this issue Jul 8, 2024 · 11 comments

@ryanzomorrodi

Describe the bug
arc_select() crashes when downloading a large dataset. I'm not sure whether this is truly caused by the dataset's size or by some other aspect of the data, but I can confirm that I am able to download other, smaller datasets.

To Reproduce
Use arc_open() to open a feature service, then arc_select() to download it and return it as an sf object.

library(magrittr)
library(arcgis)

Sys.setenv(RUST_BACKTRACE = "1")

PRCP_pred <- "https://services.arcgis.com/GL0fWlNkwysZaKeV/arcgis/rest/services/TXLA_ZCTA_PRCPpred/FeatureServer/0" %>%
    arc_open() %>%
    arc_select()
#> thread '<unnamed>' panicked at arcpbf\src\parse.rs:11:22:
#> internal error: entered unreachable code
#> thread '<unnamed>' panicked at arcpbf\src\lib.rs:156:1:
#> explicit panic
#> Error in multi_resp_process_(resps) :
#>   User function panicked: multi_resp_process_

I made the feature service public, so you should be able to try downloading it yourself.

Expected behavior
I expected the feature to be downloaded and stored as an sf. Annoyingly, the error only happens after it seems to have downloaded the entire feature.

Additional context
R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64

@JosiahParry
Collaborator

Well here is the good news. I can reproduce the bug with only one feature! Bad news, there's a bug 😲. I'm sorry about that. I'll work on it.

@JosiahParry transferred this issue from R-ArcGIS/arcgislayers Jul 8, 2024

@JosiahParry
Collaborator

What is really interesting is that the protocol buffer reports this field as a small integer, but the field is actually a date, and it is not being processed as such.

I'm not sure if this is a bug in the feature service or the library to be honest!

Field type is: EsriFieldTypeOid
Field type is: EsriFieldTypeString
Field type is: EsriFieldTypeSmallInteger
Value { value_type: Some(StringValue("2017-08-01")) }

@JosiahParry
Collaborator

``` r
library(arcgis)
#> Attaching core arcgis packages:
#> → arcgisutils v0.3.0
#> → arcgislayers v0.3.0
#> → arcgisgeocode v0.1.3
#> → arcgisplaces v0.1.0
PRCP_pred <- "https://services.arcgis.com/GL0fWlNkwysZaKeV/arcgis/rest/services/TXLA_ZCTA_PRCPpred/FeatureServer/0" |> 
    arc_open() |> 
    arc_select(n_max = 100)

PRCP_pred 
#> Simple feature collection with 100 features and 6 fields
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -10171550 ymin: 3365293 xmax: -9906416 ymax: 3576996
#> Projected CRS: WGS 84 / Pseudo-Mercator
#> First 10 features:
#>    fid GEOID                DATE PRCPpred Shape__Area Shape__Length
#> 1    1 70001 2017-07-31 20:00:00        0    20781247      27489.86
#> 2    2 70002 2017-07-31 20:00:00        0    11972609      14513.87
#> 3    3 70003 2017-07-31 20:00:00        0    24840919      31051.86
#> 4    4 70005 2017-07-31 20:00:00        0    15136408      20291.43
#> 5    5 70006 2017-07-31 20:00:00        0     9506517      13857.12
#> 6    6 70030 2017-07-31 20:00:00        0   126821239      72154.08
#> 7    7 70031 2017-07-31 20:00:00        0    12218343      16125.83
#> 8    8 70032 2017-07-31 20:00:00        0     6061464      13245.77
#> 9    9 70036 2017-07-31 20:00:00        0    17300199      30751.63
#> 10  10 70037 2017-07-31 20:00:00        0   233311656     137159.07
#>                          geometry
#> 1  POLYGON ((-10041429 3500168...
#> 2  POLYGON ((-10038862 3506457...
#> 3  POLYGON ((-10045203 3500427...
#> 4  POLYGON ((-10035132 3501549...
#> 5  POLYGON ((-10042042 3507365...
#> 6  POLYGON ((-10077192 3475608...
#> 7  POLYGON ((-10054588 3494937...
#> 8  POLYGON ((-10019921 3497074...
#> 9  POLYGON ((-10035782 3468335...
#> 10 POLYGON ((-10027180 3474258...
```

@JosiahParry
Collaborator

I've gone ahead and pushed a change to the R package which should be available as a binary within the next hour or so.

https://r-arcgis.r-universe.dev/arcpbf

@JosiahParry
Collaborator

@ryanzomorrodi the new version hit CRAN this morning. Please let me know if this works for you!

@ryanzomorrodi
Author

ryanzomorrodi commented Jul 10, 2024

It seems there is now a different error, caused by x being NULL in post_process_single.

library(arcgislayers)

PRCP_pred <- "https://services.arcgis.com/GL0fWlNkwysZaKeV/arcgis/rest/services/TXLA_ZCTA_PRCPpred/FeatureServer/0" |> 
    arc_open() |> 
    arc_select(n_max = 1000)
#> Error in x[[1]]: subscript out of bounds

Created on 2024-07-10 with reprex v2.1.1

@JosiahParry
Collaborator

hm. I'm not running into this issue.

What versions of arcgislayers and arcgisutils are you running?

packageVersion("arcgislayers")
packageVersion("arcgisutils")

@JosiahParry
Collaborator

Perhaps you can run sessionInfo() as well.

@ryanzomorrodi
Author

ryanzomorrodi commented Jul 11, 2024

I have version 0.3.0 for both arcgislayers and arcgisutils. It may also help to not provide an n_max and see if you can reproduce it that way. I tried to reproduce the error again today with n_max = 1000, and everything works. When pulling the entire layer, I eventually encounter the error. I should also mention that the httr2 progress bar seems to have disappeared.

library(arcgislayers)

sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/Chicago
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] arcgislayers_0.3.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.36     fastmap_1.2.0     xfun_0.45         glue_1.7.0       
#>  [5] knitr_1.47        htmltools_0.5.8.1 rmarkdown_2.27    lifecycle_1.0.4  
#>  [9] cli_3.6.3         reprex_2.1.1      withr_3.0.0       compiler_4.4.1   
#> [13] tools_4.4.1       evaluate_0.24.0   yaml_2.3.8        arcgisutils_0.3.0
#> [17] rlang_1.1.4       fs_1.6.4

@JosiahParry
Collaborator

Thanks @ryanzomorrodi for your help here! This helped me identify a regression in arc_select(), which is being fixed (and a test added so it doesn't happen in the future).

When working with detailed and large geometries, it is suggested to drop the page_size argument down to something much smaller. When arc_select() is run, it checks the property x[["maxRecordCount"]] to identify the maximum number of features that can be returned per request.

With detailed geometries, AGOL/Enterprise can actually time out before it has prepared the geometries to be sent. To prevent this, we reduce the number of geometries sent per request. In this case, I ran the code below, which worked quite well and quite fast!

library(arcgislayers)

x <- "https://services.arcgis.com/GL0fWlNkwysZaKeV/arcgis/rest/services/TXLA_ZCTA_PRCPpred/FeatureServer/0" |> 
    arc_open() 

res <- x |> 
    arc_select(n_max = 25000, page_size = 10)

Note that you will need to install the development version of {arcgislayers} for this.
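
To make the trade-off behind a smaller page_size concrete, here is a minimal sketch of the paging arithmetic. The helper plan_pages() is hypothetical and is not part of arcgislayers; it only illustrates how n_max features split into requests of at most page_size features each, and it does not reproduce what arc_select() does internally.

```r
# Hypothetical helper (not part of arcgislayers): list the offset and
# feature count of each request needed to fetch `n_max` features when
# every request returns at most `page_size` features.
plan_pages <- function(n_max, page_size) {
  stopifnot(n_max >= 1, page_size >= 1)
  # Zero-based offset of the first feature in each page
  offsets <- seq(0, n_max - 1, by = page_size)
  # The last page may be shorter than page_size
  counts <- pmin(page_size, n_max - offsets)
  data.frame(offset = offsets, count = counts)
}

pages <- plan_pages(n_max = 25000, page_size = 10)
nrow(pages)       # number of requests that would be issued
sum(pages$count)  # total number of features requested
```

A smaller page_size means more requests, but each one asks the server to prepare far fewer geometries, which is what avoids the timeout described above.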

Looking at this service, though, there is a good chance that it is far too big to hold entirely in memory, so you might want to limit the scope of your queries using the fields, where, or filter_geom arguments.
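
As a rough sketch of what a narrower query could look like: the helpers geoid_where() and fetch_subset() below are hypothetical, and the example assumes that GEOID is a string field and that the field names match those printed in the output earlier in this thread (GEOID, DATE, PRCPpred). The download is wrapped in a function so nothing is requested until it is called.

```r
# Hypothetical helper: build a where clause matching a set of GEOID values.
# Assumes GEOID is a string field, so each value is single-quoted.
geoid_where <- function(geoids) {
  quoted <- paste0("'", geoids, "'")
  paste0("GEOID IN (", paste(quoted, collapse = ", "), ")")
}

# Hypothetical wrapper: request only a few columns for the chosen GEOIDs.
# Requires {arcgislayers}; nothing is downloaded until this is called.
fetch_subset <- function(url, geoids) {
  layer <- arcgislayers::arc_open(url)
  arcgislayers::arc_select(
    layer,
    fields = c("GEOID", "DATE", "PRCPpred"),
    where = geoid_where(geoids)
  )
}

# Inspect the clause that would be sent with the query
geoid_where(c("70001", "70002"))
```

Calling fetch_subset() with the feature service URL and a handful of GEOID values should then return only the matching rows rather than the whole layer.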

@ryanzomorrodi
Author

Thanks @JosiahParry ! Excited to see where these arcgis packages go.
