
summarise.sf running into a null polygon geometry causes R to crash #2053

Closed
alankjackson opened this issue Dec 5, 2022 · 8 comments


@alankjackson

Describe the bug
An sf data frame containing an empty polygon geometry causes `summarise` to crash R.

> bad_states %>% group_by(value) %>% summarize(num=sum(aland, na.rm = TRUE))

 *** caught segfault ***
address 0x8, cause 'memory not mapped'

Traceback:
 1: cpp_s2_unary_union(x, options)
 2: structure(x, class = c("s2_geography", "wk_vctr"))
 3: new_s2_geography(cpp_s2_unary_union(x, options))
 4: s2_union(x, options = options)
 5: cpp_s2_union_agg(s2_union(x, options = options), options, na.rm)
 6: structure(x, class = c("s2_geography", "wk_vctr"))
 7: new_s2_geography(cpp_s2_union_agg(s2_union(x, options = options),     options, na.rm))
 8: s2::s2_union_agg(x, ...)
 9: st_as_sfc(s2::s2_union_agg(x, ...), crs = st_crs(x))
10: st_union.sfc(geom[i == x], is_coverage = is_coverage)
11: st_union(geom[i == x], is_coverage = is_coverage)
12: FUN(X[[i]], ...)
13: lapply(sort(unique(i)), function(x) st_union(geom[i == x], is_coverage = is_coverage))
14: summarise.sf(., num = sum(aland, na.rm = TRUE))
15: summarize(., num = sum(aland, na.rm = TRUE))
16: bad_states %>% group_by(value) %>% summarize(num = sum(aland,     na.rm = TRUE))

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

To Reproduce

library(tidyverse)

#   Grab state boundaries as convenient polygon set
states <- USAboundaries::us_states(resolution = "low")

#   Create a bogus state to join
state_abbv <- c(states$state_abbr, "??") 
new_state <- tibble::tibble(value=1:53, state_abbr=state_abbv)

#   Join to original state dataframe to create a bad polygon
bad_states <- left_join(new_state, states, by="state_abbr")
bad_states <- sf::st_as_sf(bad_states)

#   The summarize call will crash the RStudio session
bad_states %>% 
  group_by(value) %>% 
  summarize(num=sum(aland, na.rm = TRUE))

If reporting a change from previous versions

Please read https://cran.r-project.org/web/packages/sf/news/news.html first.

Additional context
Ubuntu 20.04.5 LTS
R version 4.2.1 (2022-06-23)

Paste the output of your `sessionInfo()` and `sf::sf_extSoftVersion()`

sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.2.1

sf::sf_extSoftVersion()
          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H
       "3.8.0"        "3.0.4"        "6.3.1"         "true"         "true"
          PROJ
       "6.3.1"

@edzer
Member

edzer commented Dec 5, 2022

Could you please report the output of sessionInfo() after loading tidyverse and sf?

@rsbivand
Member

rsbivand commented Dec 5, 2022

I can see a segfault in s2 unary union. Am updating packages to get the freshest troubleverse versions, s2 as CRAN, sf local devel version. Will re-run with gdb later today.

@rsbivand
Member

rsbivand commented Dec 5, 2022

library(tidyverse)
library(sf)
states <- USAboundaries::us_states(resolution = "low")
state_abbv <- c(states$state_abbr, "??") 
new_state <- tibble::tibble(value=1:53, state_abbr=state_abbv)
bad_states <- left_join(new_state, states, by="state_abbr")
bad_states <- st_as_sf(bad_states)
st_is_longlat(bad_states)
a <- group_by(bad_states, value)
summarize(a, num=sum(aland, na.rm = TRUE))

# Thread 1 "R" received signal SIGSEGV, Segmentation fault.
s2geography::s2_find_validation_error (geog=..., error=error@entry=0x7ffffffbdb60) at /usr/include/c++/12/bits/unique_ptr.h:191
191	      pointer    _M_ptr() const noexcept { return std::get<0>(_M_t); }

(gdb) bt
#0  s2geography::s2_find_validation_error (geog=..., 
    error=error@entry=0x7ffffffbdb60)
    at /usr/include/c++/12/bits/unique_ptr.h:191
#1  0x00007fffdb75ca1d in s2geography::s2_find_validation_error (geog=..., 
    error=error@entry=0x7ffffffbdb60)
    at /usr/include/c++/12/bits/unique_ptr.h:191
#2  0x00007fffdb76077c in s2geography::s2_unary_union (geog=..., options=...)
    at s2geography/build.cc:202
#3  0x00007fffdb738051 in Op::processFeature (this=<optimized out>, 
    feature=..., i=<optimized out>) at s2-transformers.cpp:295
#4  0x00007fffdb7237b3 in UnaryGeographyOperator<Rcpp::Vector<19, Rcpp::PreserveStorage>, SEXPREC*>::processVector (this=0x7ffffffbded0, geog=...)
    at /tmp/RtmpxIxiha/R.INSTALL38cc872f3842/s2/src/geography-operator.h:35
#5  0x00007fffdb737f84 in cpp_s2_unary_union (geog=..., s2options=...)
    at s2-transformers.cpp:306
#6  0x00007fffdb747f9c in _s2_cpp_s2_unary_union (geogSEXP=0x80c6d68, 
    s2optionsSEXP=<optimized out>) at RcppExports.cpp:1293
#7  0x00007ffff7afd09c in R_doDotCall (
    ofun=ofun@entry=0x7fffdb747f10 <_s2_cpp_s2_unary_union(SEXP, SEXP)>, 
    nargs=nargs@entry=2, cargs=cargs@entry=0x7ffffffc0830, 
    call=call@entry=0x8ad5d30) at ../../../src/main/dotcode.c:604
#8  0x00007ffff7b3e2f3 in bcEval (body=<optimized out>, rho=<optimized out>, 
    useCache=<optimized out>) at ../../../src/main/eval.c:7692
#9  0x00007ffff7b53e90 in Rf_eval (e=0x8ad5dd8, rho=rho@entry=0xa2782c0)
    at ../../../src/main/eval.c:748
#10 0x00007ffff7b5598a in R_execClosure (call=call@entry=0x89d3d48, 
    newrho=newrho@entry=0xa2782c0, sysparent=<optimized out>, 
    rho=rho@entry=0xa1c8950, arglist=arglist@entry=0xa2783d8, 
    op=op@entry=0x8ad6190) at ../../../src/main/eval.c:1918
#11 0x00007ffff7b5684e in Rf_applyClosure (call=call@entry=0x89d3d48, 
    op=op@entry=0x8ad6190, arglist=arglist@entry=0xa2783d8, 
    rho=rho@entry=0xa1c8950, suppliedvars=<optimized out>)
    at ../../../src/main/eval.c:1844
#12 0x00007ffff7b419fe in bcEval (body=<optimized out>, rho=<optimized out>, 
    useCache=<optimized out>) at ../../../src/main/eval.c:7104
#13 0x00007ffff7b53e90 in Rf_eval (e=0x89d3680, rho=0xa1c8950)
    at ../../../src/main/eval.c:748
#14 0x00007ffff7b54812 in forcePromise (e=e@entry=0xa2748b8)

No problem with sf_use_s2(FALSE). @paleolimbot any idea why an empty geometry causes havoc here?
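
Until a fixed s2 is installed, two workarounds follow from the observations above: drop the empty geometries before summarising, or switch s2 off for the call. The sketch below is only an illustration on made-up data, not the maintainers' fix. The helper `sq()` and the object `toy` are hypothetical stand-ins for `bad_states`, with one empty polygon playing the role of the unmatched "??" row.

```r
library(sf)
library(dplyr)

# Hypothetical helper: a unit square whose lower-left corner is (x0, 0)
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}

# Toy stand-in for bad_states: two real polygons and one empty polygon,
# in a geographic (longitude/latitude) CRS so that s2 would be used
toy <- st_sf(
  value    = 1:3,
  aland    = c(10, 20, NA),
  geometry = st_sfc(sq(0), sq(2), st_polygon(), crs = 4326)
)

# Workaround 1: remove empty geometries before grouping and summarising
non_empty <- toy[!st_is_empty(toy), ]
res <- non_empty %>%
  group_by(value) %>%
  summarize(num = sum(aland, na.rm = TRUE))

# Workaround 2: switch s2 off around the call, then restore the old setting
old_s2 <- sf_use_s2(FALSE)
res_planar <- toy %>%
  group_by(value) %>%
  summarize(num = sum(aland, na.rm = TRUE))
sf_use_s2(old_s2)
```

The first option keeps spherical geometry but silently drops the rows without a geometry; the second keeps every row but unions the geometries as if the coordinates were planar.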

@paleolimbot
Contributor

Off the top of my head, no, but I remember that the unary union was something I had to implement "by hand" (i.e., it's not in S2 proper) so it's likely an error of mine somewhere. I'll step through the example in the debugger later today 🙂

@fmark

fmark commented Jan 12, 2023

I get a similar error when running summarise on a different data set that also contains an empty geometry.

@rsbivand
Member

@fmark Did you try to install the development version of s2 in which the problem was fixed? You give no details of your data set - GEOGCRS not PROJCRS, for example?
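
The details asked for here can be collected with a few calls. This is a rough sketch on a made-up object `x` (hypothetical, standing in for the data set in question); it only reports the CRS type, the number of empty geometries, the installed s2 version and whether s2 is switched on.

```r
library(sf)

# Hypothetical two-row data set: one point and one empty point, geographic CRS
x <- st_sf(
  id       = 1:2,
  geometry = st_sfc(st_point(c(0, 0)), st_point(), crs = 4326)
)

is_geographic <- st_is_longlat(x)      # TRUE means a geographic CRS, so s2 is used
n_empty       <- sum(st_is_empty(x))   # number of empty geometries
s2_version    <- packageVersion("s2")  # installed s2 version
s2_in_use     <- sf_use_s2()           # whether sf currently uses s2

print(list(is_geographic = is_geographic, n_empty = n_empty,
           s2_version = s2_version, s2_in_use = s2_in_use))
```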

@paleolimbot
Contributor

It will be a few days until there are binaries, but the s2 version that fixes this is now on CRAN!

@alankjackson
Author

Awesome! Thank you.
