squish_df helper #38
Comments
This may be a question more relevant to {arcgislayers} than {arcgisutils} but: have you looked at I noticed that
I think this is actually something I want to formalize a bit more and get sorted out. I've had a chance to look at The workflow in general for these packages is pretty standard:
Question: how do we formalize this in a function signature / standard? Using I'd like to clean this up in
Error handling?
One of the challenges here is how do we handle errors in the responses? We want to keep all of the responses that work because 1) they might have cost us money to execute and 2) it might have been slow. I'd like to be able to capture the errors and then return them as an attribute to the result so that they can be handled afterwards. But what does that look like?
Row-binding results
Regarding combining results, collapse is without a doubt the fastest way to do this and it respects the input classes. You can also do this with the
x <- sf::read_sf(system.file("shape/nc.shp", package = "sf"))
bench::mark(
collapse = collapse::rowbind(x, x, x, x, x, x, x, x, x),
data.table = data.table::rbindlist(list(x, x, x, x, x, x, x, x, x)) |>
sf::st_sf(),
vctrs = vctrs::vec_rbind(x, x, x, x, x, x, x, x, x, .ptype = x[0,]),
check = FALSE
)
#> # A tibble: 3 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 collapse 19.8µs 28.9µs 31690. 1007.98KB 66.7
#> 2 data.table 228.21µs 246.2µs 3895. 3.34MB 28.3
#> 3 vctrs 7.18ms 7.6ms 131. 826.91KB 67.4
# illustrate ptype argument
vctrs::vec_rbind(x, x, x, x, x, x, x, x, x, .ptype = x[0,])
#> Simple feature collection with 900 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS: NAD27
#> # A tibble: 900 × 15
#> AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74 NWBIR74
#> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.114 1.44 1825 1825 Ashe 37009 37009 5 1091 1 10
#> 2 0.061 1.23 1827 1827 Alle… 37005 37005 3 487 0 10
#> 3 0.143 1.63 1828 1828 Surry 37171 37171 86 3188 5 208
#> 4 0.07 2.97 1831 1831 Curr… 37053 37053 27 508 1 123
#> 5 0.153 2.21 1832 1832 Nort… 37131 37131 66 1421 9 1066
#> 6 0.097 1.67 1833 1833 Hert… 37091 37091 46 1452 7 954
#> 7 0.062 1.55 1834 1834 Camd… 37029 37029 15 286 0 115
#> 8 0.091 1.28 1835 1835 Gates 37073 37073 37 420 0 254
#> 9 0.118 1.42 1836 1836 Warr… 37185 37185 93 968 4 748
#> 10 0.124 1.43 1837 1837 Stok… 37169 37169 85 1612 1 160
#> # ℹ 890 more rows
#> # ℹ 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
#> # geometry <MULTIPOLYGON [°]>
Created on 2024-03-21 with reprex v2.0.2
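The error-handling idea raised earlier in this comment (keep every response that succeeded and attach the failures as an attribute) could look roughly like the sketch below. This is only an illustration: `collect_results()`, `fetch_one()`, and the `"errors"` attribute name are hypothetical and are not part of arcgisutils or any other package.

```r
# Hypothetical helper (not an arcgisutils function): apply `f` to each input,
# keep the successful results, and attach the failures as an attribute.
collect_results <- function(inputs, f) {
  out <- lapply(inputs, function(x) {
    # return the condition object instead of aborting the whole batch
    tryCatch(f(x), error = function(e) e)
  })

  failed <- vapply(out, inherits, logical(1), what = "error")

  # base R row-binding keeps the sketch dependency-free
  res <- do.call(rbind.data.frame, out[!failed])

  # name each captured error by the position of the input that produced it
  errs <- out[failed]
  names(errs) <- which(failed)
  attr(res, "errors") <- errs
  res
}

# Stand-in for a request function; the second input fails on purpose.
fetch_one <- function(i) {
  if (i == 2) stop("request failed for input ", i)
  data.frame(id = i, value = i * 10)
}

res <- collect_results(1:3, fetch_one)
nrow(res)
#> [1] 2
names(attr(res, "errors"))
#> [1] "2"
```

Returning the condition objects rather than only their messages leaves the caller free to inspect or re-signal them afterwards.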
Also implement new rbind_results function added w/ R-ArcGIS/arcgisutils#38
A common need from processing many requests at once is to combine the results into a single data frame. This is done ad hoc in arcgislayers and arcpbf.
arcgislayers uses do.call(rbind.data.frame), which is the slowest approach. arcpbf has adopted a hierarchy of the fastest implementations using collapse, data.table, dplyr, and base R. This should be provided in arcgisutils. It is needed in arcgeocode at the moment as well.
See R-ArcGIS/arcgislayers#167
Also: https://github.com/R-ArcGIS/arcpbf/blob/main/R/post-process.R#L109-L121
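A minimal sketch of the kind of fallback hierarchy described above, assuming plain data frames as input. The function name `rbind_fallback()` is hypothetical, and this is not the arcpbf code linked above nor the eventual arcgisutils implementation. The version check on collapse is an assumption based on `rowbind()` having been added in collapse 2.0.0.

```r
# Hypothetical sketch: pick the fastest available row-binding backend,
# falling back to base R when no optional package is installed.
rbind_fallback <- function(.list) {
  if (requireNamespace("collapse", quietly = TRUE) &&
      utils::packageVersion("collapse") >= "2.0.0") {
    # rowbind() accepts a single list of data frames
    collapse::rowbind(.list)
  } else if (requireNamespace("data.table", quietly = TRUE)) {
    # rbindlist() returns a data.table; convert back to a data.frame
    as.data.frame(data.table::rbindlist(.list))
  } else if (requireNamespace("dplyr", quietly = TRUE)) {
    dplyr::bind_rows(.list)
  } else {
    do.call(rbind.data.frame, .list)
  }
}

dfs <- list(
  data.frame(a = 1:2, b = c("x", "y")),
  data.frame(a = 3L, b = "z")
)
out <- rbind_fallback(dfs)
nrow(out)
#> [1] 3
```

For sf inputs, some backends drop the sf class (the benchmark above pipes the data.table result through sf::st_sf() for this reason), so a real helper would need to restore it.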