Count number of geometry primitives #202

Open
anthonynorth opened this issue Oct 9, 2023 · 5 comments

Comments

@anthonynorth
Contributor

I'm trying to count the number of geometry primitives (point, linestring, polygon) per element. Is there a better / more direct approach than this?

wk_size <- function(handleable) {
  vector_meta <- wk::wk_vector_meta(handleable)
  feature_id <- seq_len(vector_meta$size)

  # wk_handle() is exported, so `::` is sufficient
  count <- wk::wk_handle(
    handleable,
    wk::wk_flatten_filter(
      wk::wk_count_handler(),
      # ensure flat geoms
      max_depth = 1000L,
      add_details = TRUE
    )
  )

  count <- vctrs::new_data_frame(c(
    attr(count, "wk_details", exact = TRUE),
    count
  ))

  # get feature_ids and sizes
  feat_rle <- rle(count$feature_id)
  # or vctrs::vec_locate_sorted_groups(count$feature_id)

  # empty features may have been discarded from flatten. put those back
  idx <- match(feature_id, feat_rle$values)
  size <- replace(feat_rle$lengths[idx], is.na(idx), 0L)

  vctrs::new_data_frame(list(
    feature_id = feature_id,
    size = size
  ))
}

# example
wkt <- wk::wkt(c(
  "MULTIPOINT (1 1, 2 2)",
  "MULTIPOINT EMPTY",
  "POLYGON ((1 1))",
  "GEOMETRYCOLLECTION (POINT (1 1), MULTIPOLYGON (((2 2))))"
))

wk_size(wkt)
#>   feature_id size
#> 1          1    2
#> 2          2    0
#> 3          3    1
#> 4          4    2

Created on 2023-10-09 with reprex v2.0.2

@anthonynorth
Contributor Author

If counting primitives isn't currently directly supported, could we add an additional column to wk_count_handler() wk_details to capture it? Perhaps n_primitive?

@paleolimbot
Owner

I think adding a simple geometry count to whatever is returned by wk_count() would be useful! I probably won't get around to implementing it before the next release of wk but I'd be happy to review a PR.

I also want to avoid adding too many new features to the reader/handler system...for a variety of reasons, I'd like to move towards a pattern of making calculations from "chunks" of geoarrow arrays (but that functionality hasn't quite landed yet). The most important reason is that writing handlers is incredibly error-prone!

@anthonynorth
Contributor Author

This is now low priority for me also. I'm approaching this problem differently, working directly with coords (I needed the coords anyway) and counting unique parts from that. Since empty parts are dropped from coords, parts = simple geometries.
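For anyone landing here later, a minimal sketch of that coords-based approach might look like the following. It assumes `wk::wk_coords()` returns `feature_id` and `part_id` columns and that `wk::wk_vector_meta()` reports a known size for the input; `count_parts_from_coords()` is a hypothetical helper name, not part of wk.

```r
# Sketch only: count non-empty simple geometries per feature from coordinates.
# `count_parts_from_coords` is a hypothetical helper, not a wk function.
count_parts_from_coords <- function(handleable) {
  n_features <- wk::wk_vector_meta(handleable)$size
  coords <- wk::wk_coords(handleable)

  # one row per (feature, part) pair that contributed at least one coordinate
  parts <- unique(coords[c("feature_id", "part_id")])

  data.frame(
    feature_id = seq_len(n_features),
    # tabulate() yields 0 for features that have no coordinates at all
    size = tabulate(parts$feature_id, nbins = n_features)
  )
}

wkt <- wk::wkt(c(
  "MULTIPOINT (1 1, 2 2)",
  "MULTIPOINT EMPTY",
  "POLYGON ((1 1))",
  "GEOMETRYCOLLECTION (POINT (1 1), MULTIPOLYGON (((2 2))))"
))

count_parts_from_coords(wkt)
```

Because empty parts contribute no rows to the coordinate table, they are not counted here, so results may differ from a flatten-based count when a feature contains empty simple geometries.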

Related: n_geom includes collections. I didn't expect this. Should collections be a separate count instead?

wk::wkt("MULTIPOINT (1 1)") |>
  wk::wk_count()
#>   n_geom n_ring n_coord
#> 1      2      0       1

Created on 2023-10-13 with reprex v2.0.2

@paleolimbot
Owner

Yeah, I remember being surprised by that, too. I'm open to the best solution there. A new column? Make n_geom NA for non-collections?

@anthonynorth
Contributor Author

Separating collections and primitives makes sense to me. Something like this?

wk::wkt("MULTIPOINT (1 1)") |>
  wk::wk_count()
#>   n_multi_geom n_simple_geom n_ring n_coord
#> 1            1             1      0       1

I don't mind what these counts are named, provided they're clear enough.
