predict for kmeans #1314

Closed
aloboa opened this issue Oct 17, 2023 · 13 comments

aloboa commented Oct 17, 2023

Applying kmeans to all pixels of a large multispectral image is not realistic.
A better way would be to apply kmeans to a random sample and then have
a method for predict() (or a specific predict()) to assign each pixel to its most
similar class.
Could this be implemented?

Contributor

kadyb commented Oct 17, 2023

I think you should use a k-means implementation that comes with a predict method, e.g. clue::cl_predict() or ClusterR::predict_KMeans(). You can also write a prediction method for the stats::kmeans() function: https://stackoverflow.com/questions/53352409/creation-prediction-function-for-kmean-in-r/53352914#53352914

Author

aloboa commented Oct 17, 2023

Is there a terra::kmeans() function? The one I see is stats::kmeans().

Contributor

kadyb commented Oct 17, 2023

Is there a terra::kmeans() function?

No, but you will find plenty of other packages for clustering on CRAN Task View.

Here is an example with stats::kmeans() (instead of defining the method yourself, you can use the {clue} or {ClusterR} package):

library("terra")

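# S3 predict method for stats::kmeans objects: assign each row of newdata
# to the nearest cluster center (squared Euclidean distance)
predict.kmeans = function(x, newdata) {
  apply(newdata, 1, function(r) which.min(colSums((t(x$centers) - r)^2)))
}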
  
f = system.file("ex/logo.tif", package = "terra")
r = rast(f)
smp = spatSample(r, size = 30)
mdl = kmeans(smp, centers = 3)
pr = predict(r, mdl)
pr

Author

aloboa commented Oct 17, 2023

Perfect, thanks.
My request is to add such a predict.kmeans() function as a standard function in terra.

aloboa closed this as completed Oct 17, 2023
aloboa reopened this Oct 18, 2023

Contributor

kadyb commented Oct 18, 2023

Personally, I think it would be better to add a predict method upstream in R, so that all packages could benefit, e.g. {raster}, {stars}, etc. The question remains why it should be added to {terra} only, if the function is already available in several packages and we can write our own.

rhijmans added a commit that referenced this issue Oct 18, 2023

@rhijmans
Member

I have added a k_means method (I could not use kmeans the way I would like to because it has no ellipsis (...) argument). I think it is worthwhile having as it is so commonly used, and because dealing with NAs is a bit cumbersome.
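
To illustrate why NAs are cumbersome with stats::kmeans(), here is a minimal base R sketch. It is not terra's k_means() implementation; the helper name kmeans_with_na() and the simulated two-band matrix are assumptions made for illustration. The idea is to cluster only the complete rows and then put the labels back in the original row order, leaving NA where any band is missing.

```r
# Hypothetical helper (not terra's k_means implementation): cluster a matrix of
# cell values that may contain NAs and return NA for incomplete rows.
kmeans_with_na <- function(v, centers, ...) {
  v <- as.matrix(v)
  ok <- stats::complete.cases(v)           # rows with no missing band value
  fit <- stats::kmeans(v[ok, , drop = FALSE], centers = centers, ...)
  out <- rep(NA_integer_, nrow(v))         # one label per input row
  out[ok] <- fit$cluster                   # labels for complete rows only
  out
}

# Simulated stand-in for raster cell values: 40 cells, 2 bands
set.seed(42)
v <- cbind(b1 = c(rnorm(20, 0), rnorm(20, 10)),
           b2 = c(rnorm(20, 0), rnorm(20, 10)))
v[c(1, 25), 1] <- NA                       # simulate missing cells
cl <- kmeans_with_na(v, centers = 2)
table(cl, useNA = "ifany")
```

Because the output has one entry per input row, it could be written back into a raster of the same geometry without shifting cells.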

Author

aloboa commented Oct 18, 2023

Cool, thanks.

aloboa closed this as completed Oct 18, 2023

Contributor

kadyb commented Oct 18, 2023

@rhijmans, maybe you could consider using a loop with a preallocated vector instead of apply(); it should run a little faster and use less memory.

predict_1 = function(x, newdata) {
  apply(newdata, 1, function(r) which.min(colSums((t(x$centers) - r)^2)))
}

predict_2 = function(x, newdata) {
  vec = integer(nrow(newdata))
  newdata = as.matrix(newdata)
  x = x$centers
  for (i in seq_len(nrow(newdata))) {
    vec[i] = which.min(colSums((t(x) - newdata[i, ])^2))
  }
  return(vec)
}

n = 1e6
df = data.frame(x = rnorm(n), y = rnorm(n), z = rnorm(n))
mdl = kmeans(df, centers = 5)

system.time(predict_1(mdl, df))
#>    user  system elapsed
#>   11.09    0.33   11.42
system.time(predict_2(mdl, df))
#>    user  system elapsed
#>    7.91    0.11    8.01
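
A further variant, sketched here as an assumption rather than as what terra does, loops over the k cluster centers instead of the n rows, so each iteration is a vectorised operation over all rows. The name predict_3() is hypothetical, and no timings are claimed for it. It uses the same squared Euclidean distance and the same first-wins tie rule as which.min().

```r
# Hypothetical alternative to predict_1()/predict_2(): loop over centers, not rows
predict_3 <- function(x, newdata) {
  newdata <- as.matrix(newdata)
  centers <- x$centers
  best   <- rep(1L, nrow(newdata))         # index of the nearest center so far
  best_d <- rep(Inf, nrow(newdata))        # its squared distance
  for (j in seq_len(nrow(centers))) {
    # squared distance of every row to center j in one vectorised step
    d <- rowSums(sweep(newdata, 2, centers[j, ])^2)
    closer <- d < best_d                   # strict "<" keeps the first center on ties
    best[closer]   <- j
    best_d[closer] <- d[closer]
  }
  best
}

# Small check against the apply() version on simulated data
set.seed(1)
df_small <- data.frame(x = rnorm(500), y = rnorm(500), z = rnorm(500))
mdl_small <- kmeans(df_small, centers = 5, iter.max = 50)
ref <- apply(as.matrix(df_small), 1,
             function(r) which.min(colSums((t(mdl_small$centers) - r)^2)))
all(predict_3(mdl_small, df_small) == ref)
```

With few clusters and many rows, this keeps the R-level loop short, at the cost of allocating one temporary matrix per center.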

rhijmans added a commit that referenced this issue Oct 19, 2023

Contributor

Nowosad commented Dec 22, 2023

@rhijmans the k_means() function does not work correctly:

library(terra)
#> terra 1.7.67
f <- system.file("ex/logo.tif", package = "terra")
r <- rast(f)
km <- k_means(r, centers=5)
#> Warning: [setValues] values were recycled
km
#> class       : SpatRaster 
#> dimensions  : 77, 101, 1  (nrow, ncol, nlyr)
#> resolution  : 1, 1  (x, y)
#> extent      : 0, 101, 0, 77  (xmin, xmax, ymin, ymax)
#> coord. ref. : Cartesian (Meter) 
#> source(s)   : memory
#> name        : lyr1 
#> min value   :    1 
#> max value   :    5
plot(km)

@rhijmans
Member

Thank you. Fixed.

Contributor

Nowosad commented Dec 27, 2023

It now works for rasters without NAs, but when NAs do exist:

library(terra)
#> terra 1.7.67
f <- system.file("ex/logo.tif", package = "terra")
r <- rast(f)
r[1] <- NA
km <- k_means(r, centers=5)
#> Error in do_one(nmeth): NA/NaN/Inf in foreign function call (arg 1)
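
The error text matches what stats::kmeans() itself raises when its input contains a missing value, which suggests the NA cell reached kmeans() unfiltered. A small base R sketch on a simulated matrix (an assumption, independent of terra) shows the failure and a cheap check that could be run beforehand:

```r
set.seed(1)
m <- matrix(rnorm(200), ncol = 2)          # simulated cell values, 100 rows x 2 bands
m[1, 1] <- NA                              # a single missing value

# stats::kmeans() has no na.action argument, so the NA triggers an error
res <- tryCatch(kmeans(m, centers = 3), error = function(e) e)
inherits(res, "error")

# Checking for missing values before clustering makes the cause explicit
anyNA(m)
n_complete <- sum(complete.cases(m))       # rows that could be clustered
n_complete
```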

rhijmans added a commit that referenced this issue Dec 27, 2023

@rhijmans
Member

This is embarrassing, but thank you for your patience. Looks like it works again.

Contributor

Nowosad commented Dec 27, 2023

Robert -- thank you a lot for your work!

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Dec 10, 2024
# version 1.7-83

## bug fixes

- `flip(direction="vertical")` failed in some cases
  [#1518](rspatial/terra#1518) by Ed Carnell

- `zonal(as.raster=TRUE)` failed when the zonal raster was categorical
  [#1514](rspatial/terra#1514) by Jessi L
  Brown

- `distance<data.frame,data.frame>` and `<matrix,matrix>` ignored the
  unit
  argument. [#1545](rspatial/terra#1545) by
  Wencheng Lau-Medrano

- NetCDF files with month time-step encoded from 0-11 made R crash
  [#1544](rspatial/terra#1544) by Martin
  Holdrege

- `split<SpatVector>` only worked well if the split field was of type
  character. [#1530](rspatial/terra#1530) by
  Igor Graczykowski

- `gridDist` (and probably some other methods) emitted a "cannot
  overwrite existing file" error when processing large datasets
  [#1522](rspatial/terra#1522) by Clare
  Pearson

- `terrain` did not accept multiple variables
  [#1561](rspatial/terra#1561) by Michael
  Mahoney

- `rotate` was vulnerable to an integer overflow
  [#1562](rspatial/terra#1562) by Sacha
  Ruzzante

- `getTileExtents` could return overlapping tiles or tiles with gaps
  due to floating point
  imprecision. [#1564](rspatial/terra#1564)
  by Michael Sumner


## enhancements

- `as.list<SpatRasterDataset>` sets the names of the list
  [#1513](rspatial/terra#1513)

- a SpatVectorCollection can now be subset with its names; and if made
  from a list it takes the names from the list.
  [#1515](rspatial/terra#1515) by jedgroev

- argument `fill_range` to `plot<SpatRaster>` and `plot<SpatVector>` to
  use the color of the extreme values of the specified range
  [#1553](rspatial/terra#1553) by Mike
  Koontz

- `plet<SpatRaster>` can now handle rasters with a "local" (Cartesian)
  CRS. [#1570](rspatial/terra#1570) by
  Agustin Lobo.

## new

- `map-region` returns the coordinates of the axes position of a map
  created with `plot<Spat*>`
  [https://github.com/rspatial/terra/issues/1517](https://github.com/rspatial/terra/issues/1517)
  by Daniel Schuch

- `polys<leaflet>` method
  [#1543](rspatial/terra#1543) by Márcia
  Barbosa

- `plot<SpatVectorCollection>` method
  [#1532](rspatial/terra#1532) by jedgroev

- `add_mtext` to add text around the margins of a
  map. [#1567](rspatial/terra#1567) by
  Daniel Schuch

# version 1.7-78

Released 2024-05-22

## bug fixes

- `writeVector` and `readVector` better handle empty geopackage layers
  [#1426](rspatial/terra#1426) by Andrew
  Gene Brown.

- `writeCDF` only wrote global variables if there was more than one
  [#1443](rspatial/terra#1443) by Daniel
  Schlaepfer

- `rasterize` with "by" returned odd layernames
  [#1435](rspatial/terra#1435) by Philippe
  Massicotte

- `convHull`, `minCircle` and `minRect` with a zero-row SpatVector
  crashed R [#1445](rspatial/terra#1445) by
  Andrew Gene Brown

- `rangeFill` with argument `circular=TRUE` did not work properly
  [#1460](rspatial/terra#1460) by Alice

- `crs(describe = TRUE)` returned a mis-ordered extent
  [#1485](rspatial/terra#1485) by Dimitri
  Falk

- `tapp` with a custom function and an index like "yearmonths" could
  shift time for not considering the time
  zone. [#1483](rspatial/terra#1483) by Finn
  Roberts

- `plot<SpatRaster>` could fail when there were multiple values with
  very small differences
  [#1491](rspatial/terra#1491) by srfall

- `as.data.frame<SpatRaster>` with "xy=TRUE" and "wide=FALSE" could
  fail if coordinates were very similar
  [#1476](rspatial/terra#1476) by Pascal
  Oettli

- `rasterizeGeom` now returns the correct layer name
  [#1472](rspatial/terra#1472) by
  HRodenhizer

- `cellSize` with "mask=TRUE" failed if the output was to be written
  to a temp file
  [#1496](rspatial/terra#1496) by Pascal
  Sauer

- `ext<SpatVectorProxy>` did not return the full extent
  [#1501](rspatial/terra#1501) by
  erkent-carb


## enhancements

- `extract` has new argument "small=TRUE" to allow for strict use of
  "touches=FALSE"
  [#1419](rspatial/terra#1419) by Floris
  Vanderhaeghe.

- `as.list<SpatRaster>` has new argument "geom=NULL"

- `rast<list>` now recognizes (x, y, z) base R "image" structures
  [stackoverflow](https://stackoverflow.com/questions/77949551/rspatial-convert-a-grid-list-to-a-raster-using-terra)
  by Ignacio Marzan.

- `inset` has new arguments "offset" and "add"
  [#1422](rspatial/terra#1422) by Armand-CT

- `expanse<SpatRaster>` has argument `usenames`
  [#1446](rspatial/terra#1446) by Bappa Das

- the default color palette is now `terra::map.pal("viridis")` instead
  of `terrain.colors`. The default can be changed with
  `options(terra.pal=...)`
  [#1474](rspatial/terra#1474) by Derek
  Friend

- `as.list<SpatRasterDataset>` now returns a named
  list. [#1513](rspatial/terra#1513) by Eric
  R. Scott


## new

- `bestMatch<SpatRaster>` method

- argument "pairs=TRUE" to `cells` [https://github.com/rspatial/terra/issues/1487](https://github.com/rspatial/terra/issues/1487) by Floris Vanderhaeghe

- `add_grid` to add a grid to a map


# version 1.7-71

Released 2024-01-31

## bug fixes

- `k_means` did not work if there were NAs
  [#1314](rspatial/terra#1314) by Jakub
  Nowosad

- `layerCor` with a custom function did not work anymore
  [#1387](rspatial/terra#1387) by Jakub
  Nowosad

- `plet` broke when using "panel=TRUE"
  [#1384](rspatial/terra#1384) by Elise
  Hellwig

- using /vis3/ to open a SpatRaster did not work
  [#1382](rspatial/terra#1382) by Mike
  Koontz

- `plot<SpatRaster>(add=TRUE)` sampled the raster data without
  considering the extent of the
  map. [#1394](rspatial/terra#1394) by
  Márcia Barbosa

- `plot<SpatRaster>(add=TRUE)` now only considers the first layer of a
  multi-layer SpatRaster
  [#1395](rspatial/terra#1395) by Márcia
  Barbosa

- `set.cats` failed when a tibble was used instead of a data.frame
  [#1406](rspatial/terra#1406) by Mike
  Koontz

- `polys` argument "alpha" was ignored if a single color was
  used. [#1413](rspatial/terra#1413) by
  Derek Friend

- `query` ignored the "vars" argument if all rows were
  selected. [#1398](rspatial/terra#1398) by
  erkent-carb.

- `spatSample` ignored "replace=TRUE" with random sampling,
  na.rm=TRUE, and a sample size larger than the number of non-NA
  cells. [#1411](rspatial/terra#1411) by
  Babak Naimi

- `spatSample` sometimes returned fewer values than requested and
  available for lonlat
  rasters. [#1396](rspatial/terra#1396) by
  Márcia Barbosa.


## enhancements

- `vect<character>` now has argument "opts" for GDAL open options,
  e.g. to declare a file
  encoding. [#1389](rspatial/terra#1389) by
  Mats Blomqvist

- `plot(plg=list(tic=""))` now allows choosing alternative continuous
  legend tic-mark styles ("in", "out", "through" or "none")

- `makeTiles` has new argument "buffer"
  [#1408](rspatial/terra#1408) by Joy
  Flowers.


## new

- `prcomp<SpatRaster>` method
  [#1361](rspatial/terra#1361 (comment))
  by Jakub Nowosad

- `add_box` to add a box around the map. The box is drawn where the
  axes are, not around the plotting region.

- `getTileExtents` provides the extents of tiles. These may be used in
  parallelization. See [#1391](https://github.com/rspatial/terra/issues/1391)
  by Alex Ilich.


# version 1.7-65

Released 2023-12-15

## bug fixes

- `flip` with argument `direction="vertical"` failed in some cases with
   large rasters processed in chunks
   [0b714b0](rspatial/terra@0b714b0)
   by Dulci on [stackoverflow](https://stackoverflow.com/questions/77304534/rspatial-terraflip-error-when-flipping-a-multi-layer-spatrast-object)

- SpatRaster now correctly handles `NA & FALSE` and `NA | TRUE`
  [#1316](rspatial/terra#1316) by John Baums

- `set.names` wasn't working properly for SpatRasterDataset or
  SpatRasterCollection
  [#1333](rspatial/terra#1333) by Derek Friend

- `extract` with argument "layer" not NULL shifted the layers
  [#1332](rspatial/terra#1332) by Ewan
  Wakefield

- `terraOptions` did not capture "memmin" on
  [stackoverflow](https://stackoverflow.com/questions/77552234/controlling-chunk-size-in-terra)
  by dww

- `rasterize` with points and a built-in function could crash if no
  field was used
  [#1369](rspatial/terra#1369) by
  anjelinejeline


## enhancements

- `mosaic` can now use `fun="modal"`

- `rast<matrix>` and `rast<data.frame>` now have option `type="xylz"`
  [#1318](rspatial/terra#1318) by Agustin
  Lobo

- `extract<SpatRaster,SpatVector>` can now use multiple summarizing
  functions [#1335](rspatial/terra#1335) by
  Derek Friend

- `disagg` and `focal` have more optimistic memory requirement
  estimation [#1334](rspatial/terra#1334) by
  Mikko Kuronen

## new

- `k_means<SpatRaster>` method
  [#1314](rspatial/terra#1314) by Agustin
  Lobo

- `princomp<SpatRaster>` method
  [#1361](rspatial/terra#1361) by Alex Ilich

- `has.time<SpatRaster>` method

- new argument "raw=FALSE" to `rast`, `sds`, and `sprc` to allow
  ignoring scale and offset
  [#1354](rspatial/terra#1354) by Insang Song


# version 1.7-55

Released 2023-10-14

## bug fixes

- `mosaic` ignored the filename argument if the SpatRasterCollection
  only had a single SpatRaster
  [#1267](rspatial/terra#1267) by Michael
  Mahoney

- Attempting to use `extract` with a raster file that had been deleted
  crashed R. [#1268](rspatial/terra#1268) by
  Derek Friend

- `split<SpatVector,SpatVector>` did not work well in all
  cases. [#1256](rspatial/terra#1256) by
  Derek Corcoran Barrios

- `intersect` with two SpatVectors crashed R if there was a date/time
  variable [#1273](rspatial/terra#1273) by
  Dave Dixon

- "values=FALSE" was ignored by
  `spatSample<SpatRaster>(method="weights")`
  [#1275](rspatial/terra#1275) by François
  Rousseu

- `coltab<-` again works with a list as value
[#1280](rspatial/terra#1280) by Diego
Hernangómez

- `stretch` with histogram equalization was not memory-safe
  [#1305](rspatial/terra#1305) by Evan Hersh

- `plot` now resets the "mar" parameter
  [#1297](rspatial/terra#1297) by Márcia
  Barbosa

- `plotRGB` ignored the "smooth" argument
  [#1307](rspatial/terra#1307) by Timothée
  Giraud


## enhancements

- argument "gdal" in `project` was renamed to "use_gdal"
  [#1269](rspatial/terra#1269) by Stuart
  Brown.

- SpatVector attributes can now be stored as an ordered factor
  [#1277](rspatial/terra#1277) by Ben Notkin

- `plot<SpatVector>` now uses an "interval" legend when breaks are
  supplied [#1303](rspatial/terra#1303) by
  Gonzalo Rizzo

- `crop<SpatRaster>` now keeps more metadata, including variable names
  [#1302](rspatial/terra#1302) by rhgof

- `extract(fun="table")` now returns an easier to use data.frame
[#1294](rspatial/terra#1294) by Fernando
Aramburu.


## new
- `metags<-` and `metags` to set arbitrary SpatRaster/file level
   metadata [#1304](https://github.com/rspatial/terra/issues/1304) by
   Francesco Chianucci

# version 1.7-46

Released 2023-09-06

## bug fixes

- `plot<SpatVector>` used the wrong main label in some cases
  [#1210](rspatial/terra#1210) by Márcia
  Barbosa

- `plotRGB` failed with an "ext=" argument
  [#1228](rspatial/terra#1228) by Dave Edge

- `rast<array>` failed badly when the array had less than three
  dimensions. [#1254](rspatial/terra#1254)
  by andreimirt.

- `all.equal` for a SpatRaster with multiple layers
  [#1236](rspatial/terra#1236) by Sarah
  Endicott

- `zonal(wide=FALSE)` could give wrong results if the zonal SpatRaster
  had "layer" as
  layername. [#1251](rspatial/terra#1251) by
  Jeff Hanson

- `panel` now supports argument "range"
  [#1241](rspatial/terra#1241) by Jakub
  Nowosad

- `rasterize` with `by=` returned wrong layernames if the by field was
  not sorted [#1266](rspatial/terra#1266) by
  Sebastian Dunnett

- `mosaic` with multiple layers was not correct
  [#1262](rspatial/terra#1262) by
  Jean-Romain


## enhancements

- `wrap<SpatRaster>` now stores color tables
  [#1215](rspatial/terra#1215) by Patrick
  Brown

- `global` now has a "maxcell" argument
  [#1213](rspatial/terra#1213) by Alex Ilich

- `layerCor` with fun='pearson' now returns output with the layer
  names [#1206](rspatial/terra#1206)

- `vrt` now has argument "set_names"
  [#1244](rspatial/terra#1244) by sam-a-levy

- `vrt` now has argument "return_filename"
  [#1258](rspatial/terra#1258) by Krzysztof
  Dyba

- `project<SpatRaster>` has new argument "by_util" exposing the GDAL
  warp utility [#1222](rspatial/terra#1222) by
  Michael Sumner.


## new
- `compareGeom` for list and SpatRasterCollection
  [#1207](rspatial/terra#1207) by Sarah
  Endicott

- `is.rotated<SpatRaster>` method
  [#1229](rspatial/terra#1229) by Andy Lyons

- `forceCCW<SpatVector>` method to force counter-clockwise orientation
  of polygons [#1249](rspatial/terra#1249)
  by srfall.

- `vrt_tiles` returns the filenames of the tiles in a vrt file
  [#1261](rspatial/terra#1261) by Derek
  Friend

- `extractAlong` to extract raster cell values for a line that are
  ordered along the
  line. [#1257](rspatial/terra#1257) by
  adamkc.