CHANGES IN VERSION 0.32.0
-------------------------
NEW FEATURES
o Add nzwhich() method for DelayedArray objects (block-processed).
o Support %*%, crossprod(), and tcrossprod() between DelayedMatrix and
COO_SparseMatrix objects.
SIGNIFICANT USER-VISIBLE CHANGES
o The default realize() method now returns an SVT_SparseArray object
instead of a SparseArraySeed object when the array-like object to
realize is sparse and 'BACKEND' is NULL.
o Coercing a DelayedArray object or derivative to SparseArray should be
much more efficient (thanks to various tweaks that happened in the
SparseArray and HDF5Array packages).
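The realize() change above can be illustrated with a minimal sketch. It assumes DelayedArray >= 0.32.0 and the Matrix package are installed; the toy matrix and the variable names are made up for the example.

```r
library(DelayedArray)
library(Matrix)

# A small sparse matrix (toy data) wrapped in a DelayedArray object.
m <- sparseMatrix(i=c(1, 3), j=c(2, 4), x=c(1.5, 2.5), dims=c(4, 5))
x <- DelayedArray(m)
is_sparse(x)      # should be TRUE for a dgCMatrix seed

# With BACKEND=NULL the object is realized in memory.
y <- realize(x, BACKEND=NULL)
class(y)          # expected to be SVT_SparseArray with DelayedArray >= 0.32.0
```

With older versions of the package the class of 'y' will differ, but its content should be the same.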
DEPRECATED AND DEFUNCT
o Deprecate SparseArraySeed objects.
o Deprecate OLD_extract_sparse_array() and read_sparse_block() generics
and methods.
BUG FIXES
o Fix bug in coercion from DelayedArray to SparseArray when the object to
coerce has NAs. See https://github.com/Bioconductor/HDF5Array/issues/61
CHANGES IN VERSION 0.30.0
-------------------------
NEW FEATURES
o rowsum(<DelayedMatrix>)/colsum(<DelayedMatrix>) now acknowledge the
current "automatic realization backend" and "automatic BiocParallel
BPPARAM". See '?DelayedArray::rowsum' for more information.
SIGNIFICANT USER-VISIBLE CHANGES
o Rename supportedRealizationBackends -> registeredRealizationBackends.
o Slightly modify the behavior of 'realize(x, BACKEND=NULL)'.
See https://github.com/hansenlab/minfi/issues/256
o Two important changes to matrix multiplication of DelayedMatrix objects.
1. Now it returns an ordinary matrix by default (before this change an
ordinary matrix was returned wrapped in a DelayedMatrix object). The
user can change the default behavior by setting an "automatic
realization backend". See ?DelayedArray::`%*%` for more information.
2. Better block processing strategy when only one of the two operands is
a DelayedMatrix object (or derivative). The new strategy acknowledges
the geometry of the physical chunks of the data in the object. This
can make a huge difference in some cases. For example, using a subset
of the "1.3 Million Brain Cell Dataset" from 10x Genomics:
library(HDF5Array)
library(ExperimentHub)
hub <- ExperimentHub()
tenx <- TENxMatrix(hub[["EH1039"]], group="mm10")
M <- tenx[ , 1:25000]
m <- cbind(runif(ncol(M)), runif(ncol(M)))
M %*% m
Doing 'M %*% m' now takes 7.6s and uses 1.1Gb of memory, compared to
110s / 3.1Gb before this improvement. Furthermore, the new strategy
operates in linear time and at constant memory:
           with DelayedArray   with DelayedArray
ncol(M)         0.29.2             < 0.29.2
-------    -----------------   -----------------
  12500      4.3s / 1.1Gb         32s / 2.1Gb
  25000      7.6s / 1.1Gb        110s / 3.1Gb
  50000     13.4s / 1.1Gb        495s / 5.6Gb
 100000     24.0s / 1.2Gb       2409s / 9.1Gb
Note that the new strategy is implemented in internal helpers
DelayedArray:::BLOCK_mult_Lgrid() and
DelayedArray:::BLOCK_mult_Rgrid(). When the two operands are
DelayedMatrix objects, the old strategy (which is implemented in
DelayedArray:::.super_BLOCK_mult()) is still used.
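The first change above (an ordinary matrix is returned by default) can be checked on a small in-memory example. This is only a sketch on toy data, assuming DelayedArray >= 0.30.0 and that no automatic realization backend has been set in the session.

```r
library(DelayedArray)

set.seed(1)
a <- matrix(runif(20), nrow=4)   # toy 4 x 5 matrix
M <- DelayedArray(a)             # DelayedMatrix wrapping 'a'
m <- matrix(runif(10), nrow=5)   # ordinary 5 x 2 matrix

res <- M %*% m
class(res)   # an ordinary matrix unless an automatic realization
             # backend was set with setAutoRealizationBackend()
```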
CHANGES IN VERSION 0.28.0
-------------------------
NEW FEATURES
o Add coercion from DelayedArray to SparseArray.
o Add efficient rowVars/colVars methods for DelayedMatrix objects.
These methods, like all other row/col summarization methods implemented
in the DelayedArray package, use block processing and can handle blocks
of **arbitrary** geometry, that is, they can handle a grid of class
ArbitraryArrayGrid (the most general type of grid).
o Add 'useNames' arg to row/colMins, row/colMaxs, row/colRanges, and
row/colVars methods for DelayedMatrix objects.
o Add 'current_viewport' argument to set_grid_context().
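A quick way to try the rowVars/colVars methods mentioned above is to compare them with base R on a small matrix. This is a sketch on made-up data, not a benchmark.

```r
library(DelayedArray)

set.seed(1)
a <- matrix(rnorm(30), nrow=5)   # toy 5 x 6 matrix
M <- DelayedArray(a)

rv <- rowVars(M)   # block-processed row variances
cv <- colVars(M)   # block-processed column variances

# Both should agree with the variances computed on the ordinary matrix.
all.equal(unname(rv), apply(a, 1, var))
all.equal(unname(cv), apply(a, 2, var))
```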
SIGNIFICANT USER-VISIBLE CHANGES
o DelayedArray now depends on S4Arrays and SparseArray.
o Some improvements to the rowMeans/colMeans methods for DelayedMatrix
objects.
CHANGES IN VERSION 0.26.0
-------------------------
- No changes in this version.
CHANGES IN VERSION 0.24.0
-------------------------
SIGNIFICANT USER-VISIBLE CHANGES
o Move the aperm() S4 generic to BiocGenerics.
CHANGES IN VERSION 0.22.0
-------------------------
DEPRECATED AND DEFUNCT
o The following functions are now defunct after being deprecated in
previous versions of the package:
- blockGrid(): replaced with defaultAutoGrid()
- rowGrid(): replaced with rowAutoGrid()
- colGrid(): replaced with colAutoGrid()
- multGrids(): replaced with defaultMultAutoGrids()
- linearInd(): replaced with Mindex2Lindex()
- viewportApply(): replaced with gridApply()
- viewportReduce(): replaced with gridReduce()
- getRealizationBackend(): replaced with getAutoRealizationBackend()
- setRealizationBackend(): replaced with setAutoRealizationBackend()
- RealizationSink(): replaced with AutoRealizationSink()
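For code that still uses the defunct names, the migration could look like the sketch below. It only covers some of the replacements listed above, and the toy object 'x' is an assumption made for the example.

```r
library(DelayedArray)

x <- DelayedArray(matrix(1:12, nrow=3))   # toy 3 x 4 DelayedMatrix

rgrid <- rowAutoGrid(x, nrow=1)   # was: rowGrid(x, nrow=1)
cgrid <- colAutoGrid(x, ncol=2)   # was: colGrid(x, ncol=2)

# was: viewportApply(); the callback receives one viewport at a time.
sums <- gridApply(cgrid, function(viewport) sum(read_block(x, viewport)))

getAutoRealizationBackend()       # was: getRealizationBackend()
```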
BUG FIXES
o Small tweak to updateObject() method for DelayedArray objects (see
commit abcd154).
CHANGES IN VERSION 0.20.0
-------------------------
BUG FIXES
o Fix long-standing bugs in dense2sparse():
- mishandling of NAs/NaNs in input
- 1D case didn't work
CHANGES IN VERSION 0.18.0
-------------------------
NEW FEATURES
o Implement ConstantArray objects. The ConstantArray class is a
DelayedArray subclass to efficiently mimic an array containing a
constant value, without actually creating said array in memory.
o Add scale() method for DelayedMatrix objects.
o Add sinkApply(), a convenience function for walking on a RealizationSink
derivative and filling it with blocks of data.
o Proper support for dgRMatrix and lgRMatrix objects as DelayedArray
object seeds:
- is_sparse() now returns TRUE on dgRMatrix and lgRMatrix objects.
- Support coercion back and forth between SparseArraySeed objects
and dgRMatrix/lgRMatrix objects.
- Add extract_sparse_array() methods for dgRMatrix and lgRMatrix
objects.
These changes bring the treatment of dgRMatrix and lgRMatrix objects
to the same level as dgCMatrix and lgCMatrix objects. For example,
wrapping a dgRMatrix or lgRMatrix object in a DelayedArray object will
trigger the same sparse-optimized mechanisms during block processing
as when wrapping a dgCMatrix or lgCMatrix object.
o rbind() and cbind() on sparse DelayedArray objects are now fully
supported.
o Delayed operations of type DelayedUnaryIsoOpWithArgs now preserve
sparsity when appropriate.
o Implement DummyArrayGrid and DummyArrayViewport objects.
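As an illustration of the ConstantArray objects described above, the following sketch creates a very large constant matrix without allocating it. The dimensions and the value are arbitrary choices made for the example.

```r
library(DelayedArray)

# A 1e6 x 1e6 array of zeros; no array of that size is created in memory.
CA <- ConstantArray(c(1000000L, 1000000L), value=0)
dim(CA)

# Only the requested subset gets realized.
block <- as.matrix(CA[1:3, 1:2])
block
```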
SIGNIFICANT USER-VISIBLE CHANGES
o Rename viewportApply()/viewportReduce() -> gridApply()/gridReduce().
BUG FIXES
o Subsetting of a DelayedArray object now propagates the names/dimnames,
even when drop=TRUE and the result has only 1 dimension (issue #78).
o log() on a DelayedArray object now handles the 'base' argument.
o Fix issue in is_sparse() methods for DelayedUnaryIsoOpStack and
DelayedNaryIsoOp objects.
o cbind()/rbind() no longer coerce supplied objects to type of 1st object
(commit f1279e07).
o Fix small issue in dim() setter (commit c9488537).
CHANGES IN VERSION 0.16.0
-------------------------
NEW FEATURES
o Added 'as.sparse' argument to read_block() (see ?read_block) and to
AutoRealizationSink() (see ?AutoRealizationSink).
o SparseArraySeed objects now can hold dimnames. As a consequence
read_block() now also propagates the dimnames to sparse blocks,
not just to dense blocks.
o Matrix multiplication is now sparse-aware via sparseMatrices.
o Added is_sparse<- generic (with methods for HDF5Array/HDF5ArraySeed
objects only, see ?HDF5Array in the HDF5Array package).
o Added viewportApply() and viewportReduce() to the blockApply() family.
o Added set_grid_context() for testing/debugging callback functions passed
to blockApply() and family.
SIGNIFICANT USER-VISIBLE CHANGES
o Renamed first write_block() argument 'x' -> 'sink'
o Renamed:
RealizationSink() -> AutoRealizationSink()
get/setRealizationBackend() -> get/setAutoRealizationBackend()
blockGrid() -> defaultAutoGrid()
row/colGrid() -> row/colAutoGrid()
o Improved support of sparse data:
- Slightly more efficient coercion from SparseArraySeed to
dgCMatrix/lgCMatrix (small speedup and memory footprint reduction).
This provides a minor speedup to the sparse aware block-processed
row/col summarization methods for DelayedMatrix objects when the
object is sparse. (These methods are: row/colSums(), row/colMeans(),
row/colMins(), row/colMaxs(), and row/colRanges(). The methods defined
in DelayedMatrixStats are not sparse aware yet so are not affected.)
- Made the following block-processed operations on DelayedArray objects
sparse aware: anyNA(), which(), max(), min(), range(), sum(), prod(),
any(), all(), and mean(). This typically gives a 50%-60% speedup when
the DelayedArray object is sparse.
- Implemented a bunch of methods to operate natively on SparseArraySeed
objects. Their main purpose is to support the above i.e. to support
block processed methods for DelayedArray objects like sum(), mean(),
which(), etc... when the object is sparse. Note that more are needed
to also support the sparse aware block-processed row/col summarization
methods for DelayedMatrix objects so we can finally ditch the costly
coercion from SparseArraySeed to dgCMatrix/lgCMatrix that they currently
rely on.
o The utility functions for retrieving grid context for the current
block/viewport should now be called with no argument (previously
one needed to pass the current block to them). These functions are
effectiveGrid(), currentBlockId(), and currentViewport().
o DelayedArray now depends on the MatrixGenerics package.
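The grid context utilities mentioned above are meant to be called from within a callback function passed to blockApply() and family. A minimal sketch, assuming DelayedArray >= 0.16.0 and a toy matrix:

```r
library(DelayedArray)

x <- DelayedArray(matrix(1:24, nrow=4))   # toy 4 x 6 DelayedMatrix
grid <- colAutoGrid(x, ncol=2)            # 3 blocks of 2 columns each

# The callback only receives the block; the context is retrieved
# by calling currentBlockId() with no argument.
res <- blockApply(x, function(block) {
    c(id=currentBlockId(), total=sum(block))
}, grid=grid)

ids <- vapply(res, function(r) r[["id"]], numeric(1))
totals <- vapply(res, function(r) r[["total"]], numeric(1))
```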
BUG FIXES
o Various fixes and improvements to block processing of sparse logical
DelayedMatrix objects (e.g. DelayedMatrix object with a lgCMatrix
seed from the Matrix package).
o Fix extract_sparse_array() inefficiency on dgCMatrix and lgCMatrix
objects.
o Switch matrix multiplication to bplapply2() from bpiterate() to fix
error handling.
CHANGES IN VERSION 0.14.0
-------------------------
NEW FEATURES
o Support 'type(x) <- new_type' to change the type of a DelayedArray
object.
o 1D-style single bracket subsetting of DelayedArray objects now supports
subsetting by a numeric matrix with one column per dimension.
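Both features above can be tried on a small object. This is a sketch on toy data; the names 'a', 'x' and 'idx' are made up for the example.

```r
library(DelayedArray)

a <- matrix(1:12, nrow=3)
x <- DelayedArray(a)

# Change the type of the object (this is a delayed operation).
type(x) <- "double"
type(x)

# 1D-style subsetting by a numeric matrix with one column per dimension:
# each row of 'idx' selects one array element, here x[1, 2] and x[3, 4].
idx <- cbind(c(1, 3), c(2, 4))
vals <- x[idx]
```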
SIGNIFICANT USER-VISIBLE CHANGES
o No more parallel evaluation by default, that is, getAutoBPPARAM() now
returns NULL on a fresh session instead of one of the parallelization
backends defined in BiocParallel. It is now the responsibility of the
user to set the parallelization backend (with setAutoBPPARAM()) if they
wish things like matrix multiplication, rowsum() or rowSums() use
parallel evaluation again.
Also BiocParallel has been moved from Depends to Suggests.
o Replace arrayInd2() and linearInd() with Lindex2Mindex() and
Mindex2Lindex(). The new functions are implemented in C for better
performance and they properly handle L-index values greater than
INT_MAX (2^31 - 1) in the input and output.
o 2x speedup to coercion from DelayedArray to SparseArraySeed or dgCMatrix.
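The index conversion functions and the new parallelization default described above can be exercised as follows. This is a sketch; the dimensions are arbitrary, and it assumes that no parallelization backend has been set in the session.

```r
library(DelayedArray)

d <- c(3L, 4L)   # dimensions of a hypothetical 3 x 4 array

# Linear index 8 corresponds to array position (2, 3) in a 3 x 4 array.
Mindex <- Lindex2Mindex(8, d)
Lindex <- Mindex2Lindex(Mindex, d)   # and back to the linear index

# NULL on a fresh session, unless set with setAutoBPPARAM().
getAutoBPPARAM()
```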
DEPRECATED AND DEFUNCT
o arrayInd2() and linearInd() are now deprecated in favor of
Lindex2Mindex() and Mindex2Lindex().
BUG FIXES
o Fix handling of linear indices >= 2^31 in 1D-style single bracket
subsetting of DelayedArray objects.
o rowsum() & colsum() methods for DelayedArray objects now respect factor
level ordering (issue #59).
o Coercion from DelayedMatrix to dgCMatrix now propagates the dimnames.
o No more quotes around the NA values of a DelayedArray of type "character".
o Better error message when Ops methods for DelayedArray objects reject
their operands.
CHANGES IN VERSION 0.12.0
-------------------------
NEW FEATURES
o Add isPristine()
o Delayed subassignment now accepts a right value with dimensions that are
not strictly the same as the dimensions of the selection as long as the
"effective dimensions" are the same
o Small improvement to delayed dimnames setter: atomic vectors or factors
in the supplied 'dimnames' list are now accepted and passed thru
as.character()
SIGNIFICANT USER-VISIBLE CHANGES
o Improve show() method for DelayedArray objects (see commit 54540856)
BUG FIXES
o Setting and getting the dimnames of a DelayedArray object or derivative
now preserves the names on the dimnames
o Some fixes related to DelayedArray objects with list array seeds (see
commit 6c94eac7)
CHANGES IN VERSION 0.10.0
-------------------------
NEW FEATURES
o Many improvements to matrix multiplication (%*%) of DelayedMatrix
objects by Aaron Lun. Also add limited support for (t)crossprod methods.
o Add rowsum() and colsum() methods for DelayedMatrix objects.
These methods are block-processed operations.
o Many improvements to the RleArray() constructor (see messages for
commits 582234a7 and 0a36ee01 for more info).
o Add seedApply()
o Add multGrids() utility (still a work-in-progress, not documented yet)
CHANGES IN VERSION 0.8.0
------------------------
NEW FEATURES
o Add get/setAutoBlockSize(), getAutoBlockLength(),
get/setAutoBlockShape() and get/setAutoGridMaker().
o Add rowGrid() and colGrid(), in addition to blockGrid().
o Add get/setAutoBPPARAM() to control the automatic 'BPPARAM' used by
blockApply().
o Reduce memory usage when realizing a sparse DelayedArray to disk
On-disk realization of a DelayedArray object that is reported to be sparse
(by is_sparse()) to a "sparsity-optimized" backend (i.e. to a backend with
a memory efficient write_sparse_block() like the TENxMatrix backend
implemented in the HDF5Array package) now preserves sparse representation of
the data all the way. More precisely, each block of data is now kept in
a sparse form during the 3 steps that it goes thru: read from seed,
realize in memory, and write to disk.
o showtree() now displays whether a tree node or leaf is considered sparse
or not.
o Enhance "aperm" method and dim() setter for DelayedArray objects. In
addition to allowing dropping "ineffective dimensions" (i.e. dimensions
equal to 1) from a DelayedArray object, aperm() and the dim() setter now
allow adding "ineffective dimensions" to it.
o Enhance subassignment to a DelayedArray object.
So far subassignment to a DelayedArray object only supported the **linear
form** (i.e. x[i] <- value) with strong restrictions (the subscript 'i'
must be a logical DelayedArray of the same dimensions as 'x', and 'value'
must be an ordinary vector of length 1).
In addition to this linear form, subassignment to a DelayedArray object
now supports the **multi-dimensional form** (e.g. x[3:1, , 6] <- 0). In
this form, one subscript per dimension is supplied, and each subscript
can be missing or be anything that multi-dimensional subassignment to
an ordinary array supports. The replacement value (a.k.a. the right
value) can be an array-like object (e.g. ordinary array, dgCMatrix object,
DelayedArray object, etc...) or an ordinary vector of length 1. Like the
linear form, the multi-dimensional form is also implemented as a delayed
operation.
o Re-implement internal helper simple_abind() in C and support long arrays.
simple_abind() is the workhorse behind realization of arbind() and
acbind() operations on DelayedArray objects.
o Add "table" and (restricted) "unique" methods for DelayedArray objects,
both block-processed.
o range() (block-processed) now supports the 'finite' argument on a
DelayedArray object.
o %*% (block-processed) now works between a DelayedMatrix object and an
ordinary vector.
o Improve support for DelayedArray of type "list".
o Add TENxMatrix to list of supported realization backends.
o Add backend-agnostic RealizationSink() constructor.
o Add linearInd() utility for turning array indices into linear indices.
Note that linearInd() performs the reverse transformation of
base::arrayInd().
o Add low-level utilities mapToGrid() and mapToRef() for mapping reference
array positions to grid positions and vice-versa.
o Add downsample() for reducing the "resolution" of an ArrayGrid object.
o Add maxlength() generic and methods for ArrayGrid objects.
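The multi-dimensional form of subassignment described earlier in this list can be illustrated with a small 3D array. This is a sketch on toy data; the result is compared with the same operation performed on an ordinary array.

```r
library(DelayedArray)

a <- array(1:24, dim=c(2, 3, 4))
x <- DelayedArray(a)

# One subscript per dimension, the 2nd one is missing.
# The operation is delayed: nothing is realized at this point.
x[2:1, , 4] <- 0L

# Same operation on the ordinary array, for comparison.
a[2:1, , 4] <- 0L
all(as.array(x) == a)
```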
SIGNIFICANT USER-VISIBLE CHANGES
o Multi-dimensional subsetting is no longer delayed when drop=TRUE and the
result has only one dimension. In this case the result now is returned
as an **ordinary** vector (atomic or list). This is the only case of
multi-dimensional single bracket subsetting that is not delayed.
o Rename defaultGrid() -> blockGrid(). The 'max.block.length' argument
is replaced with the 'block.length' argument. 2 new arguments are
added: 'chunk.grid' and 'block.shape'.
o Major improvements to the block processing mechanism.
All block-processed operations (except realization by block) now support
blocks of **arbitrary** geometry instead of column-oriented blocks only.
'blockGrid(x)', which is called by the block-processed operations to get
the grid of blocks to use on 'x', has the following new features:
1) It's "chunk aware". This means that, when the chunk grid is known (i.e.
when 'chunkGrid(x)' is not NULL), 'blockGrid(x)' defines blocks that
are "compatible" with the chunks i.e. that any chunk is fully contained
in a block. In other words, blocks are chosen so that chunks don't
cross their boundaries.
2) When the chunk grid is unknown (i.e. when 'chunkGrid(x)' is NULL),
blocks are "isotropic", that is, they're as close as possible to a
hypercube instead of being "column-oriented" (column-oriented blocks,
also known as "linear blocks", are elongated along the 1st dimension,
then along the 2nd dimension, etc...)
3) The returned grid has the lowest "resolution" compatible with
'getAutoBlockSize()', that is, the blocks are made as big as possible
as long as their size in memory doesn't exceed 'getAutoBlockSize()'.
Note that this is not a new feature. What is new though is that an
exception now is made when the chunk grid is known and some chunks
are >= 'getAutoBlockSize()', in which case 'blockGrid(x)' returns a
grid that is the same as the chunk grid.
These new features are supposed to make the returned grid "optimal" for
block processing. (Some benchmarks still need to be done to
confirm/quantify this.)
o The automatic block size now is set to 100 Mb (instead of 4.5 Mb
previously) at package startup. Use setAutoBlockSize() to change the
automatic block size.
o No more 'BPREDO' argument to blockApply().
o Replace block_APPLY_and_COMBINE() with blockReduce().
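The effect of the automatic block size on the grid of blocks can be explored with the sketch below. Note that it uses defaultAutoGrid(), the name that replaced blockGrid() in later versions of the package, and that the 1 Mb block size is an arbitrary value chosen for the example.

```r
library(DelayedArray)

x <- DelayedArray(matrix(runif(1e6), nrow=1000))   # 8e6 bytes of doubles

old <- getAutoBlockSize()
setAutoBlockSize(1e6)        # 1 Mb blocks, for the sake of the example

grid <- defaultAutoGrid(x)   # blockGrid(x) in version 0.8.0
nblock <- length(grid)       # number of blocks in the grid
maxlen <- maxlength(grid)    # length of the biggest block (nb of elements)

setAutoBlockSize(old)        # restore the previous setting
```

No block should occupy more memory than the automatic block size, so 'maxlen * 8' is expected to be at most 1e6 here.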
BUG FIXES
o No-op operations on a DelayedArray derivative really act like no-ops.
Operating on a DelayedArray derivative (e.g. RleArray, HDF5Array or
GDSArray) will now return an object of the original class if the result
is "pristine" (i.e. if it doesn't carry delayed operations) instead of
degrading the object to a DelayedArray instance. This applies for example
to 't(t(x))' or 'dimnames(x) <- dimnames(x)' etc...