Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

better coercion from SpatRaster #484

Merged
merged 1 commit into from
Dec 17, 2021
Merged

better coercion from SpatRaster #484

merged 1 commit into from
Dec 17, 2021

Conversation

rhijmans
Copy link
Contributor

@rhijmans rhijmans commented Dec 7, 2021

The motivation for this pull request is two fold. First, an upcoming change to terra::sources breaks the current stars::st_as_stars; with the proposed changes it works with the CRAN and the development version. Second, the coercion from a file source ignored the bands; it always assumed that all bands were included. It also ignores the fact that there can be multiple source files (and perhaps the same source file multiple times). With the current version, you get

library(terra)
library(stars)
s <- rast(system.file("ex/logo.tif", package="terra"))   
x <- c(s[[2]], s[[3:1]])
names(x) <- letters[1:4]

### x has two sources (which happen to be the same file) but never the bands as in the file
x
#class       : SpatRaster 
#dimensions  : 77, 101, 4  (nrow, ncol, nlyr)
#resolution  : 1, 1  (x, y)
#extent      : 0, 101, 0, 77  (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs 
#sources     : logo.tif  
#              logo.tif  (3 layers) 
#names       :   a,   b,   c,   d 
#min values  :   0,   0,   0,   0 
#max values  : 255, 255, 255, 255 
 
### note the two values for sid and the band numbers
sources(x, bands=TRUE)
#  sid                                      source bands
#1   1 C:/soft/R/R-4.1.2/library/terra/ex/logo.tif     2
#2   2 C:/soft/R/R-4.1.2/library/terra/ex/logo.tif     3
#3   2 C:/soft/R/R-4.1.2/library/terra/ex/logo.tif     2
#4   2 C:/soft/R/R-4.1.2/library/terra/ex/logo.tif     1

### the result of this is not correct (and has a warning)
st_as_stars(x)

#stars object with 3 dimensions and 2 attributes
#attribute(s):
#            Min. 1st Qu. Median     Mean 3rd Qu. Max.
#logo.tif       0     138    206 186.8136     254  255
#logo.tif.1     0     138    206 186.8136     254  255
#dimension(s):
#     from  to offset delta  refsys point              values x/y
#x       1 101      0     1 unnamed FALSE                NULL [x]
#y       1  77     77    -1 unnamed FALSE                NULL [y]
#band    1   3     NA    NA      NA    NA red  , green, blue     
#Warning message:
#In if (file != "") { :
#  the condition has length > 1 and only the first element will be used

### ignore_file=TRUE works better
 
st_as_stars(x, ignore_file=TRUE)
#stars object with 3 dimensions and 1 attribute
#attribute(s):
#   Min. 1st Qu. Median    Mean 3rd Qu. Max.
#a     0     138    204 186.448     254  255
#dimension(s):
#     from  to offset delta  refsys point  values x/y
#x       1 101      0     1 unnamed    NA    NULL [x]
#y       1  77     77    -1 unnamed    NA    NULL [y]
#band    1   4     NA    NA      NA    NA a,...,d    

This pull request changes the behavior to taking the first file and the bands used from that file:

st_as_stars(x)
#stars object with 2 dimensions and 1 attribute
#attribute(s):
#          Min. 1st Qu. Median     Mean 3rd Qu. Max.
#logo.tif     0     138    199 185.3509     255  255
#dimension(s):
#  from  to offset delta  refsys point values x/y
#x    1 101      0     1 unnamed FALSE   NULL [x]
#y    1  77     77    -1 unnamed FALSE   NULL [y]

That is still suboptimal, but not wrong especially if a warning is given about dropped layers.
Better is possible, and I commented a bit more in the code, but I wanted to propose this small step first

@edzer edzer merged commit 3ce49f9 into r-spatial:main Dec 17, 2021
@edzer
Copy link
Member

edzer commented Dec 17, 2021

Thanks, Robert! Yes, I can see how sources(x, bands=TRUE) gives the information to recreate the same object from the sources on disk; not sure stars will go into that level of granularity. What was the use case for this?

For now, I guess if sources() is longer than 1, we should give a warning and suggestion to use ignore_file=TRUE.

edzer added a commit that referenced this pull request Dec 17, 2021
@edzer
Copy link
Member

edzer commented Dec 17, 2021

I think this nails it pretty much. Do you have an example with time series data, preferably with different amount of time instances per source?

The attribute name of the stars object is still odd (same as first band name); what should it be?

library(terra)
# terra version 1.5.2
library(stars)
# Loading required package: abind
# Loading required package: sf
# Linking to GEOS 3.9.1, GDAL 3.3.2, PROJ 7.2.1; sf_use_s2() is TRUE
s <- rast(system.file("ex/logo.tif", package="terra"))   
x <- c(s[[2]], s[[3:1]])
names(x) <- letters[1:4]

### x has two sources (which happen to be the same file) but never the bands as in the file
x
# class       : SpatRaster 
# dimensions  : 77, 101, 4  (nrow, ncol, nlyr)
# resolution  : 1, 1  (x, y)
# extent      : 0, 101, 0, 77  (xmin, xmax, ymin, ymax)
# coord. ref. : +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs 
# sources     : logo.tif  
#               logo.tif  (3 layers) 
# names       :   a,   b,   c,   d 
# min values  :   0,   0,   0,   0 
# max values  : 255, 255, 255, 255 
#class       : SpatRaster 
#dimensions  : 77, 101, 4  (nrow, ncol, nlyr)
#resolution  : 1, 1  (x, y)
#extent      : 0, 101, 0, 77  (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs 
#sources     : logo.tif  
#              logo.tif  (3 layers) 
#names       :   a,   b,   c,   d 
#min values  :   0,   0,   0,   0 
#max values  : 255, 255, 255, 255 
 
### note the two values for sid and the band numbers
sources(x, bands=TRUE)
#   sid                                                          source bands
# 1   1 /home/edzer/R/x86_64-pc-linux-gnu-library/4.0/terra/ex/logo.tif     2
# 2   2 /home/edzer/R/x86_64-pc-linux-gnu-library/4.0/terra/ex/logo.tif     3
# 3   2 /home/edzer/R/x86_64-pc-linux-gnu-library/4.0/terra/ex/logo.tif     2
# 4   2 /home/edzer/R/x86_64-pc-linux-gnu-library/4.0/terra/ex/logo.tif     1
#  sid                                      source bands
#1   1 C:/soft/R/R-4.1.2/library/terra/ex/logo.tif     2
#2   2 C:/soft/R/R-4.1.2/library/terra/ex/logo.tif     3
#3   2 C:/soft/R/R-4.1.2/library/terra/ex/logo.tif     2
#4   2 C:/soft/R/R-4.1.2/library/terra/ex/logo.tif     1

### the result of this is not correct (and has a warning)
st_as_stars(x) 
# stars object with 3 dimensions and 1 attribute
# attribute(s):
#    Min. 1st Qu. Median    Mean 3rd Qu. Max.
# a     0     138    204 186.448     254  255
# dimension(s):
#      from  to offset delta  refsys point  values x/y
# x       1 101      0     1 unnamed    NA    NULL [x]
# y       1  77     77    -1 unnamed    NA    NULL [y]
# band    1   4     NA    NA      NA    NA a,...,d    
st_as_stars(s) 
# stars object with 3 dimensions and 1 attribute
# attribute(s):
#      Min. 1st Qu. Median     Mean 3rd Qu. Max.
# red     0     138    206 186.8136     254  255
# dimension(s):
#      from  to offset delta  refsys point              values x/y
# x       1 101      0     1 unnamed    NA                NULL [x]
# y       1  77     77    -1 unnamed    NA                NULL [y]
# band    1   3     NA    NA      NA    NA red  , green, blue     

@rhijmans
Copy link
Contributor Author

Thanks, perhaps something like this then:

	if (!ignore_file) {
		file = src$source[1]
		if (file != "") {
			if (terra::nsrc(.x) > 1)  {
				warning("ignore_file=FALSE ignored because .x has multiple sources")
			} else {
                             process and return()

There is no particular use case for allowing combining (subsets of) different sources, it is a general usability feature.

@rhijmans
Copy link
Contributor Author

What you suggested was actually more like this, and probably better

       if (!ignore_file) {
		file = src$source[1]
		if (file != "") {
			if (terra::nsrc(.x) > 1)  {
				warning("only the first source is used. Use ignore_file=TRUE to read all layers")
			} 
                        process and return()

@Nowosad Nowosad mentioned this pull request Dec 17, 2021
3 tasks
@rhijmans
Copy link
Contributor Author

rhijmans commented Dec 17, 2021

Although stored by source, I consider "time" a property of the SpatRaster, with each layer having a single timestamp. That can be a POSIX or a Date or "raw" (just a number). Either all layers have a timestamp, or they are all NA. But they may not be consecutive.

s <- rast(system.file("ex/logo.tif", package="terra"))   
time(s) <- as.Date("2001-05-04") + 0:2
ss <- c(s, s)
time(ss)
#[1] "2001-05-04" "2001-05-05" "2001-05-06" "2001-05-04" "2001-05-05"
#[6] "2001-05-06"

time(ss) <- as.Date("2001-05-04") + 0:5
time(ss)
#[1] "2001-05-04" "2001-05-05" "2001-05-06" "2001-05-07" "2001-05-08"
#[6] "2001-05-09"

# first source
time(ss[[1:3]])
#[1] "2001-05-04" "2001-05-05" "2001-05-06"

edzer added a commit that referenced this pull request Dec 17, 2021
@edzer
Copy link
Member

edzer commented Dec 17, 2021

great, we now have

library(terra)
# terra version 1.5.2
s <- rast(system.file("ex/logo.tif", package="terra"))   
time(s) <- as.Date("2001-05-04") + 0:2
ss <- c(s, s)
time(ss) <- as.Date("2001-05-04") + 0:5
library(stars)
# Loading required package: abind
# Loading required package: sf
# Linking to GEOS 3.9.1, GDAL 3.3.2, PROJ 7.2.1; sf_use_s2() is TRUE
(x <- c(ss[[1:3]], ss[[1:2]]))
# class       : SpatRaster 
# dimensions  : 77, 101, 5  (nrow, ncol, nlyr)
# resolution  : 1, 1  (x, y)
# extent      : 0, 101, 0, 77  (xmin, xmax, ymin, ymax)
# coord. ref. : +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs 
# sources     : logo.tif  (3 layers) 
#               logo.tif  (2 layers) 
# names       : red, green, blue, red, green 
# min values  :   0,     0,    0,   0,     0 
# max values  : 255,   255,  255, 255,   255 
# time        : 2001-05-04 to 2001-05-06 
st_as_stars(x)
# stars object with 3 dimensions and 1 attribute
# attribute(s):
#      Min. 1st Qu. Median     Mean 3rd Qu. Max.
# red     0     136    203 185.6155     254  255
# dimension(s):
#      from  to offset delta  refsys point                    values x/y
# x       1 101      0     1 unnamed    NA                      NULL [x]
# y       1  77     77    -1 unnamed    NA                      NULL [y]
# time    1   5     NA    NA    Date    NA 2001-05-04,...,2001-05-05    

@edzer
Copy link
Member

edzer commented Dec 17, 2021

(still a funny attribute name: red)

@rhijmans
Copy link
Contributor Author

I am not sure, but I take it that an attribute is a (sub)-dataset? If so, there is no direct match, as a SpatRaster always is a single dataset. So perhaps just use a generic name like "data"?

@rhijmans
Copy link
Contributor Author

Actually, there are also varnames(x) but these will almost always be "". Currently they are only assigned when reading from a multi-variable file. However, I can also set them for other files, by using basename(filename_sans_extension), I think that is what stars does as well. Of course, given that you have the filename you can also do this in stars, but if I change it in terra, consistency would be ensured.

Also, any chance of getting this to CRAN soon? It would help as I could then update terra for UCRT.

edzer added a commit that referenced this pull request Dec 17, 2021
@edzer
Copy link
Member

edzer commented Dec 18, 2021

Also, any chance of getting this to CRAN soon? It would help as I could then update terra for UCRT.

submitted!

@edzer
Copy link
Member

edzer commented Dec 19, 2021

On CRAN now...

@ailich
Copy link
Contributor

ailich commented Jun 5, 2022

There still seems to be some friction when converting between raster, terra, and stars objects particularly for multiband rasters that have a categorical layer. Also potentially some issues related to how categories are written/read in from/to a file in each package. I've installed the latest development versions of each package.

library(raster) #‘3.5.19’
library(terra) #‘1.5.40’
library(stars) #‘0.5.6’
packageVersion("sf") #‘1.0.8’
# R.version
# platform       x86_64-w64-mingw32          
# arch           x86_64                      
# os             mingw32                     
# system         x86_64, mingw32             
# status                                     
# major          4                           
# minor          0.4                         
# year           2021                        
# month          02                          
# day            15                          
# svn rev        80002                       
# language       R                           
# version.string R version 4.0.4 (2021-02-15)
# nickname       Lost Library Book 

r<- rast(nrow=100, ncol=100, nlyr=2)
set.seed(5)
values(r[[1]])<- rnorm(n = ncell(r), mean = 20, sd = 3)
set.seed(5)
values(r[[2]])<- sample(1:7, size = ncell(r), replace = TRUE)

r[[2]]<- as.factor(r[[2]])
levels(r[[2]])<- data.frame(ID=1:7, features = c("c1","c2", "c3", "c4", "c5", "c6", "c7"))
names(r[[2]])<- "features"

writeRaster(r, "r.grd", overwrite=TRUE)
writeRaster(r, "r.tif", overwrite=TRUE)

raster::stack(r)[[2]] #Good
# attributes :
#   ID features
# from:  1       c1
# to :  7       c7

terra::rast("r.tif")[[2]] #Good
# name        : features 
# min value   :       c1 
# max value   :       c7 

is.factor(raster::brick("r.grd")[[2]])  #Loss of factor information + warning
# [1] FALSE
# Warning message:
#   In .rasterFromRasterFile(grdfile, band = band, objecttype, ...) :
#   NAs introduced by coercion

raster::brick("r.tif")[[2]] #Factor levels lost and assumed to be from 0-255
# attributes :
#   ID category
# from:   0         
# to : 255 

plot(st_as_stars(r[[2]])) # Bad

p1

plot(st_as_stars(terra::rast("r.tif")[[2]])) #some weirdness happening at the bottom

p2

st_as_stars(terra::rast("r.tif")[[2]]) #c2 appears to be other
# attribute(s):
#   r.tif     
# c7     :1469  
# c4     :1443  
# c3     :1441  
# c1     :1429  
# c6     :1429  
# c5     :1408  
# (Other):1381

plot(st_as_stars(terra::rast("r.tif")[[2]], ignore_file = TRUE)) #  ignore_file doesn't help for terra object

p3

plot(st_as_stars(raster::brick("r.tif")[[2]])) #some weirdness happening at the bottom

p4

st_as_stars(raster::brick("r.tif")[[2]]) #c2 appears to be other
# attribute(s):
#   features    
# c7     :1469  
# c4     :1443  
# c3     :1441  
# c1     :1429  
# c6     :1429  
# c5     :1408  
# (Other):1381
freq(raster::brick("r.tif")[[2]])
# value count
# [1,]     1  1429
# [2,]     2  1381
# [3,]     3  1441
# [4,]     4  1443
# [5,]     5  1408
# [6,]     6  1429
# [7,]     7  1469

plot(st_as_stars(raster::brick("r.tif")[[2]], ignore_file=TRUE)) # ignore_file helps for raster

p5

edzer added a commit that referenced this pull request Jun 5, 2022
@edzer
Copy link
Member

edzer commented Jun 5, 2022

#some weirdness happening at the bottom -> you can try droplevels() on the stars object to drop unused levels. I used raster from CRAN, and most problems seem to have disappeared now.

@ailich
Copy link
Contributor

ailich commented Jun 5, 2022

@edzer, thanks. droplevels helps in certain cases but not all. plot(droplevels(st_as_stars(terra::rast("r.tif")[[2]]))) and plot(droplevels(st_as_stars(raster::brick("r.tif")[[2]]))) work. However, when I do droplevels(st_as_stars(r[[2]])) I get Error in unique.default(x, nmax = nmax) : hash table is full. Also would you be able to provide some additional info on why droplevels helps as I don't fully follow as all the factors levels are occurring in the raster? I updated stars to the latest commit and downgraded raster to the CRAN version.

edzer added a commit that referenced this pull request Jun 5, 2022
@edzer
Copy link
Member

edzer commented Jun 5, 2022

That might work now.

@ailich
Copy link
Contributor

ailich commented Jun 6, 2022

@edzer, I updated stars to the latest github version but I'm still getting the Error in unique.default(x, nmax = nmax) : hash table is full for droplevels(st_as_stars(r[[2]]))

@edzer
Copy link
Member

edzer commented Jun 6, 2022

OK, can you please prepare a bit more of a reprex for this?

@edzer
Copy link
Member

edzer commented Jun 6, 2022

This is what I get:

library(raster) #‘3.5.19’
# Loading required package: sp
library(terra)
# terra 1.5.21
library(stars)
# Loading required package: abind
# Loading required package: sf
# Linking to GEOS 3.10.1, GDAL 3.4.0, PROJ 8.2.0; sf_use_s2() is TRUE
packageVersion("sf")
# [1] ‘1.0.7’
r<- rast(nrow=100, ncol=100, nlyr=2)
set.seed(5)
values(r[[1]])<- rnorm(n = ncell(r), mean = 20, sd = 3)
set.seed(5)
values(r[[2]])<- sample(1:7, size = ncell(r), replace = TRUE)
r[[2]]<- as.factor(r[[2]])
levels(r[[2]])<- data.frame(ID=1:7, features = c("c1","c2", "c3", "c4", "c5", "c6", "c7"))
names(r[[2]])<- "features"
droplevels(st_as_stars(r))
# stars object with 3 dimensions and 1 attribute
# attribute(s):
#         Min. 1st Qu.   Median     Mean  3rd Qu.     Max.
# values     1       4 7.275819 12.01187 19.98289 30.81064
# dimension(s):
#      from  to offset delta refsys point             values x/y
# x       1 100   -180   3.6 WGS 84    NA               NULL [x]
# y       1 100     90  -1.8 WGS 84    NA               NULL [y]
# band    1   2     NA    NA     NA    NA lyr.1   , features    
droplevels(st_as_stars(r[[2]]))
# stars object with 2 dimensions and 1 attribute
# attribute(s):
#  values   
#  c1:1429  
#  c2:1381  
#  c3:1441  
#  c4:1443  
#  c5:1408  
#  c6:1429  
#  c7:1469  
# dimension(s):
#   from  to offset delta refsys point values x/y
# x    1 100   -180   3.6 WGS 84    NA   NULL [x]
# y    1 100     90  -1.8 WGS 84    NA   NULL [y]

@ailich
Copy link
Contributor

ailich commented Jun 6, 2022

Thanks @edzer. If I downgrade terra to the CRAN version then droplevels(st_as_stars(r[[2]])) works. Perhaps @rhijmans might know which change in the development version is leading to the hitch.

edzer added a commit that referenced this pull request Jun 6, 2022
@ailich
Copy link
Contributor

ailich commented Jun 9, 2022

It does seem ignore_file is still needed, and the categorical raster will need droplevels if it is read in directly with read_stars, which can be undesirable if you want to show all categories even if they're not present, or make sure that each category has a specific color (e.g. if you have a vector of colors you put in for plotting that is the length of the number of levels, dropping levels will now cause a mismatch between the length of your color vector and the number of retained classes).

library(stars)
library(terra)

r<- rast(nrow=10, ncol=10)
set.seed(5)
values(r)<- sample(1:3, size = ncell(r), replace = TRUE)
values(r)[1]<- NA

levels(r)<- data.frame(ID=1:3, category = c("c1", "c2", "c3"))

plot(st_as_stars(r)) #Converted from terra object (Good)

pltnew1

writeRaster(r, "rnew.tif", overwrite=TRUE)

r2<- rast("rnew.tif") #read in from file

plot(r2) #Plotting works with terra

pltnew2

# But with stars we need ignore_file or droplevels even if read in directly with read_stars
plot(st_as_stars(r2)) # Converted terra object that is linked to file (Bad: Too many levels)

pltnew3

plot(st_as_stars(r2, ignore_file = TRUE)) # Converted terra object that is linked to file with ignore_file=TRUE (Good)

pltnew4

plot(droplevels(st_as_stars(r2))) # Converted terra object that is linked to file with droplevels (Good, but droplevels can be undesireable)

pltnew5

plot(read_stars("rnew.tif")) # Read in file directly to stars (Bad: Too many levels)

pltnew6

plot(droplevels(read_stars("rnew.tif"))) # Read in file directly to stars  with droplevels (Good but droplevels can be undesireable)

pltnew7

@ailich
Copy link
Contributor

ailich commented Jun 9, 2022

A simpler way to look at my post directly above is that the number of levels is set to 1-255 when reading in the file with stars. I'm not sure if this is a stars issue or if terra isn't clearly encoding the factor levels, as raster also reads it in as having 0-255 (I guess stars removes the 0 instead of bumping them all up 1). terra is the only one that reads it in correctly as having 3 factor levels.

library(terra)
library(stars)
library(raster)

r<- rast(nrow=10, ncol=10)
set.seed(5)
values(r)<- sample(1:3, size = ncell(r), replace = TRUE)
values(r)[1]<- NA

levels(r)<- data.frame(ID=1:3, category = c("c1", "c2", "c3"))

writeRaster(r, "rnew.tif", overwrite=TRUE)

nrow(levels(terra::rast("rnew.tif"))[[1]]) #3
length(levels(stars::read_stars("rnew.tif")[["rnew.tif"]])) #255
nrow(levels(raster::raster("rnew.tif"))[[1]]) #256

levels(stars::read_stars("rnew.tif")[["rnew.tif"]])
# [1] "c1" "c2" "c3" ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""  
# [28] ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""  
# [55] ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""  
# [82] ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""  
# [109] ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""  
# [136] ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""  
# [163] ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""  
# [190] ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""  
# [217] ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""  
# [244] ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""  

@edzer
Copy link
Member

edzer commented Jun 9, 2022

It looks like stars ignores the min & max values here; with gdalinfo rnew.tif I get

....
Band 1 Block=10x10 Type=Byte, ColorInterp=Gray
  Description = category
  Min=1.000 Max=3.000 
  Minimum=1.000, Maximum=3.000, Mean=-9999.000, StdDev=-9999.000
  NoData Value=255
  Categories:
      0: 
      1: c1
      2: c2
      3: c3
      4: 
....

edzer added a commit that referenced this pull request Jun 9, 2022
@edzer
Copy link
Member

edzer commented Jun 9, 2022

That should work better now.

@ailich
Copy link
Contributor

ailich commented Jun 10, 2022

@edzer, it seems to be mostly there now. This edge case however is still not accounted for: if classes are defined starting at 0 instead of 1, but that class 0 class is not present in the raster. When converting from a terra object, stars will link values to the incorrect classes since it looks for what the minimum value in the data is rather than the minimum ID of potential factor levels. When reading in from file, stars preserves the correct classes, but loses the first factor level.

library(terra)
library(stars)
library(raster)

r<- rast(nrow=10, ncol=10)
set.seed(5)
values(r)<- sample(1:2, size = ncell(r), replace = TRUE) #3 classes but only 2 are present (class 0 missing)
values(r)[1]<- NA

levels(r)<- data.frame(ID=0:2, category = c("c1", "c2", "c3")) #Define 3 classes starting from 0
freq(r) #raster only has classes c2 and c3
# layer value count
# 1     1    c2    48
# 2     1    c3    51
r_stars<- st_as_stars(r)
unique(r_stars["values"][[1]]) #stars says that raster instead is composed of class c1 and c2
# [1] <NA> c1   c2  
# Levels: c1 c2 c3

writeRaster(r, "rnew.tif", overwrite=TRUE)

levels(terra::rast("rnew.tif"))[[1]]
# value category
# 1     0       c1
# 2     1       c2
# 3     2       c3
unique(stars::read_stars("rnew.tif")[["rnew.tif"]]) # when read in from file stars loses a factor level
# [1] <NA> c2   c3  
# Levels: c2 c3

edzer added a commit that referenced this pull request Jun 10, 2022
@edzer
Copy link
Member

edzer commented Jun 10, 2022

Thanks for pushing the edge case! This gives

library(terra)
# terra 1.5.40
library(stars)
# Loading required package: abind
# Loading required package: sf
# Linking to GEOS 3.10.1, GDAL 3.4.0, PROJ 8.2.0; sf_use_s2() is TRUE
library(raster)
# Loading required package: sp
# Warning message:
# no function found corresponding to methods exports from ‘raster’ for: ‘area’ 

r<- rast(nrow=10, ncol=10)
set.seed(5)
values(r)<- sample(1:2, size = ncell(r), replace = TRUE) #3 classes but only 2 are present (class 0 missing)
values(r)[1]<- NA

levels(r)<- data.frame(ID=0:2, category = c("c1", "c2", "c3")) #Define 3 classes starting from 0
freq(r) #raster only has classes c2 and c3
#   layer value count
# 1     1    c2    48
# 2     1    c3    51
r_stars<- st_as_stars(r)
unique(r_stars["values"][[1]]) #stars says that raster instead is composed of class c1 and c2
# [1] <NA> c2   c3  
# Levels: c2 c3

writeRaster(r, "rnew.tif", overwrite=TRUE)

levels(terra::rast("rnew.tif"))[[1]]
#   value category
# 1     0       c1
# 2     1       c2
# 3     2       c3
unique(stars::read_stars("rnew.tif")[["rnew.tif"]]) # when read in from file stars loses a factor level
# [1] <NA> c2   c3  
# Levels: c2 c3

This loses factor level c1 in both cases, but I'd be inclined to consider this a feature, not a bug.

@ailich
Copy link
Contributor

ailich commented Jun 10, 2022

Thanks @edzer! I still think it'd be best if all factor levels are retained since droplevels isn't called, and that behavior should be consistent whether or not it was read from file, converted from terra, or starts with 0 or 1. For example, starting with ID=1 and not having that present keeps all factor levels for conversion with terra (while with ID=0 it dropped the first category), and when reading this in from file to stars I get "Error in as.character.factor(x) : malformed factor."

On a related note, if the final class (e.g. c3 in this example) is missing, conversion from terra keeps all factor levels, whereas read_stars drops the last class but does not throw an error. I also think including a warning message to indicate that values have shifted by 1 in st_as_stars like read_stars does would be a good idea.

library(terra)
library(stars)
library(raster)

r<- rast(nrow=10, ncol=10)
set.seed(5)
values(r)<- sample(2:3, size = ncell(r), replace = TRUE) #3 classes but only 2 are present (class 1 missing)
values(r)[1]<- NA

levels(r)<- data.frame(ID=1:3, category = c("c1", "c2", "c3")) #Define 3 classes starting from 1
freq(r)
# layer value count
# 1     1    c2    48
# 2     1    c3    51


r_stars<- st_as_stars(r)
unique(r_stars["values"][[1]]) # all factor levels preserved
#[1] <NA> c2   c3  
#Levels: c1 c2 c3

writeRaster(r, "rnew.tif", overwrite=TRUE)

levels(terra::rast("rnew.tif"))[[1]] #all factor levels preserved
# value category
# 1     1       c1
# 2     2       c2
# 3     3       c3

stars::read_stars("rnew.tif")[["rnew.tif"]]
# Error in as.character.factor(x) : malformed factor

edzer added a commit that referenced this pull request Jun 10, 2022
@edzer
Copy link
Member

edzer commented Jun 10, 2022

Good catch; that should work now.

@ailich
Copy link
Contributor

ailich commented Jun 11, 2022

@edzer thanks. The classes are now mapped correctly and there is no error. There is some slight inconsistency on how things are handled though as in some cases the first/last class will be dropped if it is not present and in others they will be kept. I think keeping all classes would be the optimal behavior as then the first and last factor levels are not treated as "special" (i.e. any class in the middle will be present in the levels whether or not it is present).

library(terra)
library(stars)

# Class 1 missing
r0<- rast(nrow=10, ncol=10)
set.seed(5)
values(r0)<- sample(c(0,2), size = ncell(r0), replace = TRUE)
values(r0)[1]<- NA

r1<- r0+1

levels(r0)<- data.frame(ID=0:2, category = c("c1", "c2", "c3")) #Define 3 classes starting from 0
levels(r1)<- data.frame(ID=1:3, category = c("c1", "c2", "c3")) #Define 3 classes starting from 1

writeRaster(r0, "r0.tif", overwrite=TRUE)
writeRaster(r1, "r1.tif", overwrite=TRUE)

# First level dropped

levels(stars::read_stars("r0.tif")[["r0.tif"]]) 
# "c2" "c3"
levels(st_as_stars(r0)[["values"]]) 
# "c2" "c3"

# All levels retained
levels(stars::read_stars("r1.tif")[["r1.tif"]]) 
# "c1" "c2" "c3"
levels(st_as_stars(r1)[["values"]]) 
# "c1" "c2" "c3"

# Class 3 missing
r0_max<- rast(nrow=10, ncol=10)
set.seed(5)
values(r0_max)<- sample(0:1, size = ncell(r0_max), replace = TRUE)
values(r0_max)[1]<- NA

r1_max<- r0_max+1

levels(r0_max)<- data.frame(ID=0:2, category = c("c1", "c2", "c3")) #Define 3 classes starting from 0
levels(r1_max)<- data.frame(ID=1:3, category = c("c1", "c2", "c3")) #Define 3 classes starting from 1

writeRaster(r0_max, "r0_max.tif", overwrite=TRUE)
writeRaster(r1_max, "r1_max.tif", overwrite=TRUE)

# Last level dropped
levels(stars::read_stars("r0_max.tif")[["r0_max.tif"]])
# "c1" "c2"
# Warning message:
#   In stars::read_stars("r0_max.tif") :
#   categorical data values starting at 0 are shifted with one to start at 1

levels(stars::read_stars("r1_max.tif")[["r1_max.tif"]])
"c1" "c2"

# All levels retained
levels(st_as_stars(r0_max)[["values"]]) 
# "c1" "c2" "c3"
levels(st_as_stars(r1_max)[["values"]])
# "c1" "c2" "c3"

edzer added a commit to r-spatial/sf that referenced this pull request Jun 16, 2022
* handle 0 and gaps in attribute and color tables
edzer added a commit that referenced this pull request Jun 16, 2022
@edzer
Copy link
Member

edzer commented Jun 16, 2022

I think this pretty much nails it: zeroes, and gaps in attribute / color tables, and roundtripping. Note that this (currently) needs the respective branches in stars and sf to work.

library(terra)
# terra 1.5.43
library(stars)
# Loading required package: abind
# Loading required package: sf
# Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.0; sf_use_s2() is TRUE

# Class 1 missing
r0 <- rast(nrow=2, ncol=2)
values(r0) <- 0:3

values(r0)[1] <- NA

r1 <- r0 + 1

# 1: all classes 
levels(r0) <- data.frame(ID=0:3, category = c("c0", "c1", "c2", "c3")) #Define 3 classes starting from 0
levels(r1) <- data.frame(ID=1:4, category = c("c1", "c2", "c3", "c4")) #Define 3 classes starting from 1

writeRaster(r0, "r0.tif", overwrite=TRUE)
writeRaster(r1, "r1.tif", overwrite=TRUE)

all.equal(levels(stars::read_stars("r0.tif")[["r0.tif"]]),
  levels(st_as_stars(r0)[["values"]]))
# [1] TRUE

all.equal(levels(stars::read_stars("r1.tif")[["r1.tif"]]),
  levels(st_as_stars(r1)[["values"]]))
# [1] TRUE

lc = read_stars(system.file("tif/lc.tif", package = "stars"))
write_stars(lc, "lc.tif")
lc2 = read_stars("lc.tif")
lc
# stars object with 2 dimensions and 1 attribute
# attribute(s):
#                         lc.tif     
#  Evergreen Forest           : 456  
#  Herbaceuous                : 270  
#  Open Water                 : 252  
#  Developed, Low Intensity   :  81  
#  Developed, Medium Intensity:  48  
#  (Other)                    : 142  
#  NA's                       :2615  
# dimension(s):
#   from to  offset delta                    refsys point values x/y
# x    1 84 3092415  3000 Albers Conical Equal Area FALSE   NULL [x]
# y    1 46   59415 -3000 Albers Conical Equal Area FALSE   NULL [y]
lc2
# stars object with 2 dimensions and 1 attribute
# attribute(s):
#                         lc.tif     
#  Evergreen Forest           : 456  
#  Herbaceuous                : 270  
#  Open Water                 : 252  
#  Developed, Low Intensity   :  81  
#  Developed, Medium Intensity:  48  
#  (Other)                    : 142  
#  NA's                       :2615  
# dimension(s):
#   from to  offset delta                    refsys point values x/y
# x    1 84 3092415  3000 Albers Conical Equal Area FALSE   NULL [x]
# y    1 46   59415 -3000 Albers Conical Equal Area FALSE   NULL [y]
all.equal(lc, lc2)
# [1] TRUE

edzer added a commit that referenced this pull request Jun 17, 2022
edzer added a commit that referenced this pull request Jun 17, 2022
@ailich
Copy link
Contributor

ailich commented Jun 17, 2022

Hmmn, @edzer I have the latest development version of stars and sf but I'm not getting the same results.

remotes::install_github("r-spatial/sf")
#> Skipping install of 'sf' from a github remote, the SHA1 (4f550e55) has not changed since last install.
#>   Use `force = TRUE` to force installation
remotes::install_github("r-spatial/stars")
#> Skipping install of 'stars' from a github remote, the SHA1 (78b96b92) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.9.1, GDAL 3.4.3, PROJ 7.2.1; sf_use_s2() is TRUE

lc = read_stars(system.file("tif/lc.tif", package = "stars"))
#> Warning in read_stars(system.file("tif/lc.tif", package = "stars")): categorical
#> data values starting at 0 are shifted with one to start at 1
write_stars(lc, "lc.tif")
lc2 = read_stars("lc.tif")
lc
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#>                         lc.tif     
#>                             :2615  
#>  Evergreen Forest           : 456  
#>  Herbaceuous                : 270  
#>  Open Water                 : 252  
#>  Developed, Low Intensity   :  81  
#>  Developed, Medium Intensity:  48  
#>  (Other)                    : 142  
#> dimension(s):
#>   from to  offset delta                    refsys point values x/y
#> x    1 84 3092415  3000 Albers Conical Equal Area FALSE   NULL [x]
#> y    1 46   59415 -3000 Albers Conical Equal Area FALSE   NULL [y]
lc2
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#>                         lc.tif     
#>                             :2615  
#>  Evergreen Forest           : 456  
#>  Herbaceuous                : 270  
#>  Open Water                 : 252  
#>  Developed, Low Intensity   :  81  
#>  Developed, Medium Intensity:  48  
#>  (Other)                    : 142  
#> dimension(s):
#>   from to  offset delta                    refsys point values x/y
#> x    1 84 3092415  3000 Albers Conical Equal Area FALSE   NULL [x]
#> y    1 46   59415 -3000 Albers Conical Equal Area FALSE   NULL [y]

all.equal(lc, lc2)
#> [1] "Component \"lc.tif\": Attributes: < Component \"colors\": Lengths (256, 255) differ (string compare on first 255) >"

Created on 2022-06-17 by the reprex package (v2.0.1)

@edzer
Copy link
Member

edzer commented Jun 17, 2022

You need the branches:

remotes::install_github("r-spatial/sf@stars_factor")
remotes::install_github("r-spatial/stars@factor")

edzer added a commit that referenced this pull request Jun 17, 2022
@ailich
Copy link
Contributor

ailich commented Jun 17, 2022

Thanks @edzer! Seems to be working great!

@ailich
Copy link
Contributor

ailich commented Jun 17, 2022

Categories are ignored for later layers though if there's more than one categorical raster.

library(terra)
#> terra 1.5.43
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.9.1, GDAL 3.4.3, PROJ 7.2.1; sf_use_s2() is TRUE

r<- rast(nrow=10, ncol=10)
set.seed(5)
values(r)<- sample(1:3, size = ncell(r), replace = TRUE)
levels(r) <- data.frame(ID=1:3, category = c("c1", "c2", "c3"))

r2<- rast(nrow=10, ncol=10)
set.seed(6)
values(r2)<- sample(1:3, size = ncell(r2), replace = TRUE)
levels(r2) <- data.frame(ID=1:3, category = c("x1", "x2", "x3"))


r3<- c(r,r2)
names(r3)<- c("lyr.1", "lyr.2")

stars_r3<- st_as_stars(r3)
#> Warning in st_as_stars.SpatRaster(r3): ignoring categories/levels for all but
#> first layer

Created on 2022-06-17 by the reprex package (v2.0.1)

@edzer
Copy link
Member

edzer commented Jun 18, 2022

That is by design, as currently bands in a SpatRaster are put in a band dimension, which means it must be homeneous in every respect. But one could do

> c(st_as_stars(r), st_as_stars(r2))
stars object with 2 dimensions and 2 attributes
attribute(s):
 values  values.1 
 c1:34   x1:28    
 c2:32   x2:37    
 c3:34   x3:35    
dimension(s):
  from to offset delta refsys point values x/y
x    1 10   -180    36 WGS 84    NA   NULL [x]
y    1 10     90   -18 WGS 84    NA   NULL [y]

and we could, in principle, make it an option for st_as_stars.SpatRaster to do this for you.

@edzer
Copy link
Member

edzer commented Jun 18, 2022

Not sure where the x$.self.finalize() messages come from, but I guess from terra. We now have

library(terra)
# terra 1.5.43
#> terra 1.5.43
library(stars)
# Loading required package: abind
# Loading required package: sf
# Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.0; sf_use_s2() is TRUE
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.9.1, GDAL 3.4.3, PROJ 7.2.1; sf_use_s2() is TRUE

r<- rast(nrow=10, ncol=10)
set.seed(5)
values(r)<- sample(1:3, size = ncell(r), replace = TRUE)
levels(r) <- data.frame(ID=1:3, category = c("c1", "c2", "c3"))

r2<- rast(nrow=10, ncol=10)
set.seed(6)
values(r2)<- sample(1:3, size = ncell(r2), replace = TRUE)
levels(r2) <- data.frame(ID=1:3, category = c("x1", "x2", "x3"))

r3<- c(r,r2)
names(r3)<- c("lyr.1", "lyr.2")

st_as_stars(r3)
# Error in x$.self$finalize() : attempt to apply non-function
# Error in x$.self$finalize() : attempt to apply non-function
# Error in x$.self$finalize() : attempt to apply non-function
# Error in x$.self$finalize() : attempt to apply non-function
# Error in x$.self$finalize() : attempt to apply non-function
# Error in x$.self$finalize() : attempt to apply non-function
# Error in (function (x)  : attempt to apply non-function
# Error in x$.self$finalize() : attempt to apply non-function
# Error in x$.self$finalize() : attempt to apply non-function
# Error in (function (x)  : attempt to apply non-function
# stars object with 2 dimensions and 2 attributes
# attribute(s):
#  lyr.1   lyr.2  
#  c1:34   x1:28  
#  c2:32   x2:37  
#  c3:34   x3:35  
# dimension(s):
#   from to offset delta refsys point values x/y
# x    1 10   -180    36 WGS 84    NA   NULL [x]
# y    1 10     90   -18 WGS 84    NA   NULL [y]
(a = c(st_as_stars(r3[[1]]), st_as_stars(r3[[2]])))
# stars object with 2 dimensions and 2 attributes
# attribute(s):
#  lyr.1   lyr.2  
#  c1:34   x1:28  
#  c2:32   x2:37  
#  c3:34   x3:35  
# dimension(s):
#   from to offset delta refsys point values x/y
# x    1 10   -180    36 WGS 84    NA   NULL [x]
# y    1 10     90   -18 WGS 84    NA   NULL [y]
(b = st_redimension(a))
# stars object with 3 dimensions and 1 attribute
# attribute(s):
#  lyr.1.lyr.2 
#  c1:34       
#  c2:32       
#  c3:34       
#  x1:28       
#  x2:37       
#  x3:35       
# dimension(s):
#         from to offset delta refsys point       values x/y
# x          1 10   -180    36 WGS 84    NA         NULL [x]
# y          1 10     90   -18 WGS 84    NA         NULL [y]
# new_dim    1  2     NA    NA     NA    NA lyr.1, lyr.2    
(c = split(b))
# stars object with 2 dimensions and 2 attributes
# attribute(s):
#  lyr.1   lyr.2  
#  c1:34   c1: 0  
#  c2:32   c2: 0  
#  c3:34   c3: 0  
#  x1: 0   x1:28  
#  x2: 0   x2:37  
#  x3: 0   x3:35  
# dimension(s):
#   from to offset delta refsys point values x/y
# x    1 10   -180    36 WGS 84    NA   NULL [x]
# y    1 10     90   -18 WGS 84    NA   NULL [y]
(d = droplevels(c))
# stars object with 2 dimensions and 2 attributes
# attribute(s):
#  lyr.1   lyr.2  
#  c1:34   x1:28  
#  c2:32   x2:37  
#  c3:34   x3:35  
# dimension(s):
#   from to offset delta refsys point values x/y
# x    1 10   -180    36 WGS 84    NA   NULL [x]
# y    1 10     90   -18 WGS 84    NA   NULL [y]

@ailich
Copy link
Contributor

ailich commented Jun 19, 2022

@edzer awesome! The info seems to get transferred properly. plot.stars doesn't seem to understand that it's a multiband stars object yet, but conveniently this does already work with tmap. As for the "x$.self$finalize()" thing, that's a nuisance message from terra that can be ignored (rspatial/terra#218 and rspatial/terra#30).

library(terra)
#> terra 1.5.44
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.9.1, GDAL 3.4.3, PROJ 7.2.1; sf_use_s2() is TRUE
library(tmap)

r<- rast(nrow=10, ncol=10)
set.seed(5)
values(r)<- sample(1:3, size = ncell(r), replace = TRUE)
levels(r) <- data.frame(ID=1:3, category = c("c1", "c2", "c3"))

r2<- rast(nrow=10, ncol=10)
set.seed(6)
values(r2)<- sample(1:3, size = ncell(r2), replace = TRUE)
levels(r2) <- data.frame(ID=1:3, category = c("x1", "x2", "x3"))

r3<- c(r,r2)
names(r3)<- c("lyr.1", "lyr.2")

r3_stars<- st_as_stars(r3)

plot(r3_stars)

plot(r3_stars["lyr.1"])

plot(r3_stars["lyr.2"])

tm_shape(r3_stars)+
  tm_raster()

Created on 2022-06-19 by the reprex package (v2.0.1)

@edzer
Copy link
Member

edzer commented Jun 20, 2022

plot.stars doesn't seem to understand that it's a multiband stars object yet,

It does, but it doesn't plot secondary attributes; the multi-band version assumes the array constitutes single factor:

plot(st_redimension(r3_stars))

x

@ailich
Copy link
Contributor

ailich commented Jun 20, 2022

Thanks! I still need to get a bit more familiar with stars objects and functions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants