-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #166 and issue w/ row-binding results when some elements are empty #167
Conversation
`do.call(rbind, res)` failed when some but not all elements of `res` are empty (issue w/ combining valid CRS with NA values) This refactor adds `rbind_results` and `rbind_res_list` helper functions for improved handling of results lists. Refactoring also includes: - Passing call and error_call to arcgisutils functions - Add set_page_size helper function to consolidate related input checks - Expose error_call arg in count_results - Expose on_error and progress parameters for `httr2::req_perform_parallel` (including setup for arcgislayers.progress option)
R/arc-select.R
Outdated
rbind_res_list <- function(x, error_call = rlang::caller_env()) { | ||
# TODO: enhance this with additional suggested packages | ||
if (rlang::is_installed("vctrs")) { | ||
x <- vctrs::vec_rbind(!!!x, error_call = error_call) | ||
} else { | ||
x <- do.call(rbind.data.frame, x) | ||
} | ||
x | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth extracting this into arcgisutils
https://github.com/R-ArcGIS/arcpbf/blob/main/R/post-process.R#L109-L121
squish_dfs <- function(x) {
if (rlang::is_installed("collapse", version = "2.0.0")) {
x <- collapse::rowbind(x)
} else if (rlang::is_installed("data.table")) {
x <- data.table::rbindlist(x)
data.table::setDF(x)
} else if (rlang::is_installed("dplyr")) {
x <- dplyr::bind_rows(x)
} else {
x <- do.call(rbind.data.frame, x)
}
x
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the speed boost for including data.table or collapse worth the added depencies? I'm mostly agnostic but didn't want to add a bunch of new dependencies without some clear confirmation.
FWIW – I've also been avoiding dplyr::bind_rows()
ever since purrr::map_dfr()
was superseded (and cited the "confusing semantics with edge cases" for bind_rows
the updated description).
if (is.null(page_size)) { | ||
return(max_records) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to ensure that the type is numeric. Because this might error or return NA
which is not checked later
if (!is.numeric(page_size)) { | |
cli::cli_abort("{.arg page_size} must be a numeric scalar not {.obj_type_friendly {class(page_size)}}", call = error_call) | |
} |
R/arc-select.R
Outdated
|
||
if (page_size > max_records) { | ||
cli::cli_abort( | ||
"{.arg page_size} ({page_size}) cannot excede layer's {.field maxRecordCount} property ({max_records})", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmmm. I agree with this check. But the presence of a default value for max_records
might mean that if we are I am lazy we might report the wrong number (25000) instead of the layers value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value should not be used as long as the layer has a maxRecordsCount value but I dropped it regardless.
empty_res <- vapply(res, rlang::is_empty, FALSE) | ||
|
||
if (all(empty_res)) { | ||
cli::cli_alert_info("No features returned from query") | ||
return(data.frame()) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a repro for this scenario? We should have a test for it. Because I feel like I fixed a related issue with parse_esri_json()
not too long ago! R-ArcGIS/arcgisutils#21
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see if I can get it to trigger. The existing code checked res after the row-binding was complete to see it it was NULL and then returned an empty data frame - this should catch the same situation. Ideally, it would have no features but the structure of the data frame with columns and CRS would be identical to the request if it had been successful.
Thank you!! I've provided some initial thoughts. I'm not all here mentally right now as I'm at our Esri DevSummit to talk about this. :) I'll give it a more thorough review next week. Regarding the MULTIPOLYGON situation, I might bother the lead dev for the ArcGIS API for Python while I'm here. There may be something else that we could be doing to support it more simply. One option I have in my head is that we can take the sfc and |
Also: - Correct typo in vctrs::vec_rbind call: error_call -> .error_call - Add validate_n_feats helper to consolidate cli_abort calls
`arc_select()` is now returning OBJECTID column in every query (even when fields is supplied)
Partial match errored
- Work w/ NULL n_feats values - Ensure numeric value for page_size and max_records - Remove default value for max_records
@JosiahParry no rush on my account, I ran into both of these bugs with a project today and figured I'd just get it fixed and open the PR while I had it on my mind. Hope the conference goes well! Curious to hear what your colleague recommends but I would strongly recommend against having the filter_geom error on MULTIPOLYGON inputs since so |
- Add `match_fields` helper to improve checks for fields (check length 1 values) - Move collapsing fields to validate_params - Avoid returning Object ID field if not selected - Restore sf class after dropped by vctrs::vec_rbind - Simplify update_params internals
Also - avoid duplicate call to validate_params w/in count_results - disable call to rbind_res_list (issue w/ dropping sf class wasn't fixed by last)
Changes to arc_select drop object ID as expected
Also guard against attempting to drop names from empty data frame
@elipousson I have time in the next few days to review this. Do you mind trying to write out what the objective of this PR is? It's a fairly large PR which makes it a bit tough to review! |
It is largely still the same as in the initial description – I ran into a handful of small bugs and found more when I went to fix them. Here are the issues addressed by this PR:
There is also some additional minor refactoring with no change in features that I addressed along the way including adding the |
I'm having a fair bit of trouble reviewing this PR because it's so large and broad. This PR is a bit too sweeping to merge at this moment. I think we should break this into smaller more bite-sized pieces that are a bit easier and clearer to tackle. Right now it's all a bit too entwined. I can try and piece apart what the objectives are and then we can start smaller? Some of the things that this PR does that I think can be broken down into smaller PRs. I'll work on creating issues.
|
Thanks for creating all those issues! Totally understand the need to split it up. I need to get a better handle on branching so I can keep developing on my fork and throw things into branches as appropriate. As is, these mega-PRs are just me fixing things on main as I work through an issue which can be impractical for collaboration. Did you want me to close this PR after I get the more focused ones opened? Or rename this one after pulling out anything beyond the initial scope? |
Let's keep this open for now or convert to draft. I think it would be nice to crib from this for each of the PRs. I think first of all, creating a utility row-binding function would be most immediately useful (selfishly as I'm working on arcgeocode). What I think would be easiest / most impactful in order:
|
@JosiahParry I have not forgotten about this – just been busy with work. I was able to pull a new branch off from main so I can use that to implement the changes and open a new PR (one at a time unless the changes cover completely separate files) for each of those 5 items. |
@elipousson no worries! I'm a bit distracted as well. I've done #1 in https://github.com/R-ArcGIS/arcgisutils/pull/42/files |
Closing this one out since all of the changes are either merged or included with #186 |
Checklist
devtools::document()
devtools::check()
passes locallyChanges
I'll update this description with more detail later this evening.
filter_geom
inputs with"MULTIPOLYGON"
geometry #166 (and add tests forprepare_spatial_filter()
arc_read()
created byarc_select()
now always returning OBJECTID (new issue sometime in last month?)collect_layer()
Issues that this closes
See changes. I'd add more GH issues but my existing reprex is gnarly and I don't know if/when I'll have time to pull together a good one. The
arc_read()
regression is evident in the existing errored tests – it maybe should get fixed in{arcgisutils}
if you can tell what the root source of the problem is.Follow up tasks
Update README with correct information on handling of
filter_geom
. That change could easily be added to this PR along with the updates to NEWS.