Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected errors during subassignment; not mimicking arrays #17

Open
taylorpetty opened this issue Jul 8, 2024 · 3 comments
Open

Unexpected errors during subassignment; not mimicking arrays #17

taylorpetty opened this issue Jul 8, 2024 · 3 comments

Comments

@taylorpetty
Copy link

I'm unable to assign 1- or 2D-slices to vectors. I can loop through every element of an array with nested for-loops, but it is incredibly slow to do. I get a variety of error messages, demonstrated below. All errors were identical whether using = or <-, on a Windows laptop and a Linux server.

library(SparseArray)
a = array(1:45, dim = c(3,3,5))
s = SparseArray(a)

a[2,,4] = 7:9
s[2,,4] = 7:9 # error
# Error in .subassign_SVT_with_Rarray(x, index, value) : 
#   is.array(Rarray) is not TRUE

a[,,3] = matrix(1:9,nrow=3)
s[,,3] = matrix(1:9,nrow=3) # error; can also add as.array() outside to get same error message
# Error in .subassign_SVT_SparseArray_by_index(x, index, value) : 
#   the selection and supplied value must have the same dimensions

The errors differ whether subsetting by column or by row, and whether it's the entire column or a subset of it:

a = array(1:10, dim = c(5,2))
s = SparseArray(a)
a[1:5,1] = 8:12
s[1:5,1] = 8:12 # *no* error
s[1:4,1] = 8:11 # error
# Error in .Call2("C_subassign_SVT_with_short_Rvector", x@dim, x@type, x@SVT,  : 
#                   SparseArray internal error in init_left_bufs():
#                   invalid short Rvector length


a[1,1:2] = 8:9
s[1,1:2] = 8:9 # error
# Error in .subassign_SVT_with_Rarray(x, index, value) : 
#   is.array(Rarray) is not TRUE

Windows sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] SparseArray_1.2.4     S4Arrays_1.2.1        IRanges_2.36.0       
[4] abind_1.4-5           S4Vectors_0.40.2      MatrixGenerics_1.14.0
[7] matrixStats_1.3.0     BiocGenerics_0.48.1   Matrix_1.6-5         

loaded via a namespace (and not attached):
[1] zlibbioc_1.48.0 lattice_0.22-5  parallel_4.3.1  XVector_0.42.0 
[5] this.path_2.3.1 grid_4.3.1      compiler_4.3.1  tools_4.3.1    
[9] crayon_1.5.2 
Linux sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Rocky Linux 9.2 (Blue Onyx)

Matrix products: default
BLAS/LAPACK: FlexiBLAS OPENBLAS-OPENMP;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] SparseArray_1.2.4     S4Arrays_1.2.1        IRanges_2.36.0
[4] abind_1.4-5           S4Vectors_0.40.2      MatrixGenerics_1.14.0
[7] matrixStats_1.2.0     BiocGenerics_0.48.1   Matrix_1.6-4

loaded via a namespace (and not attached):
[1] zlibbioc_1.48.0 compiler_4.3.1  tools_4.3.1     XVector_0.42.0
[5] crayon_1.5.2    grid_4.3.1      lattice_0.22-5
@hpages
Copy link
Contributor

hpages commented Jul 8, 2024

Yeah subassignment is still a work-in-progress. In particular using a multidimensional index is not ready in general. It's on the TODO list but it might take a few more weeks before I get to it.

In the meantime, the workaround is to use a linear index i.e. a numeric vector or N-col matrix where N is the number of dimensions. In this case the right value must be a vector. So:

library(SparseArray)
a <- array(1:45, dim = c(3,3,5))
s <- SparseArray(a)

a[2,,4] <- 7:9
s[cbind(2,1:3,4)] <- 7:9
identical(as.array(s), a)  # TRUE

a[,,3] <- matrix(1:9,nrow=3)
s[19:27] <- as.vector(matrix(1:9,nrow=3))
identical(as.array(s), a)  # TRUE

a <- array(1:10, dim = c(5,2))
s <- SparseArray(a)
a[1,1:2] <- 8:9
s[cbind(1,1:2)] <- 8:9
identical(as.array(s), a)  # TRUE

Not always super easy/convenient to translate a multidimensional index [x,y,z] into a linear index though, sorry. However this form of subassignment is well tested and quite efficient.

Hopefully that'll get you going for now.

@taylorpetty
Copy link
Author

taylorpetty commented Jul 9, 2024

This is helpful. Thank you for the quick response. If anyone is reading this and wants a quick and dirty workaround solution for now, here's the helper function I came up with:

mygrid <- function(...) {
  as.matrix(data.table::CJ(...))
}

Then you can write things like this:

a = array(0L, dim = 5:3)
s = SparseArray(a)
a[1:2,1:3,2] = 2L
s[mygrid(1:2,1:3,2)] = 2L
identical(a, as.array(s)) # TRUE

x = poissonSparseArray(5:3, lambda=0)
y = x
x[rbind(cbind(1,1:3,1), cbind(2,1:3,1))] <- 10L
y[mygrid(1:2,1:3,1)] <- 10L
identical(x,y) # TRUE

@hpages
Copy link
Contributor

hpages commented Jul 9, 2024

Nice. Good to know about data.table::CJ().

I don't know the details of your use case but note that another way to build a big multidimensional sparse array is to construct slices of lower dimensions and abind() them together. For example, to build a big 3D sparse array, construct the individual 2D slices with something like this:

library(SparseArray)
my_2D_slices <- lapply(1:20, function(i) poissonSparseMatrix(8000, 2500, density=0.15))

or load them from a file, then stack them together with:

my_3D_svt <- do.call(abind, c(my_2D_slices, list(along=3)))
my_3D_svt
# <8000 x 2500 x 20 SparseArray> of type "integer" [nzcount=60008425 (15%)]:
# ,,1
#            [,1]    [,2]    [,3]    [,4] ... [,2497] [,2498] [,2499] [,2500]
#    [1,]       0       0       0       0   .       0       0       0       0
#    [2,]       0       0       0       0   .       0       0       0       0
#     ...       .       .       .       .   .       .       .       .       .
# [7999,]       0       0       0       0   .       0       0       0       1
# [8000,]       0       0       0       0   .       0       0       0       1
# 
# ...
# 
# ,,20
#            [,1]    [,2]    [,3]    [,4] ... [,2497] [,2498] [,2499] [,2500]
#    [1,]       1       0       0       0   .       0       0       0       1
#    [2,]       0       0       0       0   .       0       0       0       0
#     ...       .       .       .       .   .       .       .       .       .
# [7999,]       1       0       0       0   .       0       0       0       0
# [8000,]       0       2       0       0   .       0       0       0       0

Note that stacking SVT_SparseArray objects of identical dimensions like this is a very cheap operation (almost instantaneous!). In particular it will be a lot more efficient than creating a big allzero array and subassigning slices to it.

See ?S4Arrays::abind for more information.

Also be aware that the row/col summarization methods for SVT_SparseArray objects (a.k.a. matrixStats methods) can handle multidimensional objects and support the dims argument:

cv23 <- colVars(my_3D_svt)  # takes < 0.2s
dim(cv23)
# [1] 2500   20

rv12 <- rowVars(my_3D_svt, dims=2)  # takes < 4s
dim(rv12)
# [1] 8000 2500

See ?SparseArray::colSums for more information.

Hope this helps,
H.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants