[python/r/c++] Revisit `shape` for component arrays #2407
johnkerl changed the title from "[python/r/c++] Revisit `shape` for sparse arrays" to "[python/r/c++] Revisit `shape` for sparse arrays [long-term tracker]" on Apr 8, 2024
johnkerl changed the title from "[python/r/c++] Revisit `shape` for sparse arrays [long-term tracker]" to "[python/r/c++] Revisit `shape` for sparse arrays" on May 15, 2024
#2785 is a quick-and-dirty concept-prover -- its sole function is to flush out any API misunderstandings we might have, in prep for 2.25.0 core release.
PRs
Merged PRs:
kerl/schevo-timestamp-methodize
kerl/name-neaten
kerl/ut-soma-exc-simplify
`test/common.cc` #2910 kerl/test-common-parameterize
kerl/cpp-test-deadstrip
kerl/minor-unit-test-helper-mod
kerl/cpp-ut-helper-neaten
`use_current_domain` unit-test parameterization #2938 kerl/more-cur-dom-parameterize
kerl/cpp-strict-int64-shape
kerl/arrow-util-current-domain-optional
kerl/step-two-temp
`resize` for `SparseNDArray` and `DenseNDArray` #2947 kerl/cpp-ndarray-resize-testing
kerl/dataframe-test-fixture
kerl/cpp-variant-indexed-dataframes
`DataFrame.shape` #2916 kerl/sdf-shape
`DataFrame` #2917 kerl/cpp-resizes
`upgrade_shape` for `SparseNDArray` and `DenseNDArray` #2948 kerl/upgrade-shape-int64
kerl/sdf-test-accessors
kerl/py-r-accessor-plumbing
kerl/sdf-domain-accessors
kerl/dense-link
`pybind11` exception-mapping #2963 kerl/nightly-fix
`DenseNDArray` write after create #2970 kerl/dense-writeable-after-create
kerl/minor-trim
`domain`/`maxdomain` #2969 kerl/more-py-domain-name-neaten
kerl/libtiledbsoma-env-logging-level
kerl/py-r-creation-paths
`resize` and `tiledbsoma_upgrade_shape` #2950 kerl/py-r-test-2
`nanoarrow` helpers #2994 kerl/nanoarrow-helpers
kerl/polydom3
kerl/polydom5
kerl/polydom6
`nnz` of variant-indexed dataframes #2990 kerl/variant-nnz-bug
`DataFrame` test case with `soma_joinid` not first #3019 kerl/index-swap
kerl/ut-max-shape
kerl/polydom4
kerl/fix-3020-merge
kerl/one-more-rename
kerl/ff-not
`valgrind` issue in unit-test code #3029 kerl/ut-vg
kerl/table-utils-memory
`DataFrame` #3067 kerl/improve-sdf-test-field-names
kerl/ut-generate
`DataFrame` domain for `libtiledbsoma` unit-test cases #3069 kerl/cpp-sdf-domain-at-create
kerl/hll-domainish
kerl/max-domain-int64
kerl/maybe-resize-soma-joinid-cpp-tweak
`domain` argument to `DataFrame` `create` #3032 kerl/sdf-domain-at-create -- fixes [r] `SOMADataFrame` `create` needs to accept a `domain` argument #2967
`DataFrame` resizer #3091 kerl/maybe-resize-soma-joinid-py-r
kerl/cpp-exp-resize-prep
`DataFrame` objects shapeable at ingest #3089 kerl/r-dataframe-shapeable
`domain` argument between `Collection.add_new_dataframe` and `DataFrame.create` SOMA#233 kerl/cpp-ut-name-shortens
kerl/helper-rename
kerl/cpp-can-resizers-names
kerl/cpp-dataframe-sizing-helpers
kerl/cpp-dataframe-upgrade-test
kerl/py-resizer-connects
kerl/py-can-upgrade-shape
kerl/registration-shape-acceessors
kerl/py-exp-shaping
kerl/py-exp-shaping2
kerl/py-exp-resize
kerl/py-domain-at-create-ut-1
kerl/py-domain-at-create-ut-2
kerl/py-domain-at-create-ut-3
kerl/py-domain-at-create-ut-4
kerl/py-domain-at-create-ut-5
kerl/min-size-2
kerl/r-min-sizing
`can_upgrade_domain` #3211 kerl/cpp-ugr-dom
kerl/ff-interop
kerl/ffon
kerl/docstring-prune
kerl/prefixing
kerl/fix-bad-merge
`upgrade_domain` #3235 kerl/py-r-ugr-dom
kerl/py-r-ugr-dom-2
`upgrade_domain` #3238 kerl/py-r-ugr-dom-3
`set_reader_coords` to `set_coords` #3253 kerl/set-coords-rename
`pybind11` shape methods #3261 kerl/pybind11-nda-sizing
kerl/dense-227-a
kerl/dense-range-trim
kerl/dim-explosion
kerl/python-227-dense-ned-read
kerl/r-227-dense-fixes
kerl/r-dense-227-more
`function_name_for_messages` #3286 kerl/more-fn4m
`.rst` files #3283 kerl/readthedocs-pre-1.15
`tiledbsoma_upgrade_shape` for `DenseNDArray` #3288 kerl/dense-ugrsh
kerl/notebook-shape-upgrade
kerl/new-shape-doc-updates
`.tgz` files in source control #3295 kerl/notebook-data-refresh
kerl/notebook-new-shape-refresh
kerl/ffena
kerl/r-data-refresh
kerl/sdf-sjid-lower-zero
kerl/dense-example-data-refresh
kerl/new-shape-notebook-and-vignette
kerl/upgrade-experiment-resources
kerl/fix-notebook-merge
kerl/more-use-shape
kerl/revert-3300
kerl/227a
`use_current_domain` unit-test/feature-flag teardown, part 1 of 4 #3369 kerl/ucd1
`use_current_domain` unit-test/feature-flag teardown, part 2 of 4 #3370 kerl/ucd2
`use_current_domain` unit-test/feature-flag teardown, part 3 of 4 #3371 kerl/ucd3
`use_current_domain` unit-test/feature-flag teardown, part 4 of 4 #3372 kerl/ucd4
`domain` argument to `create` #3396 kerl/domain-at-create-docstrings
kerl/new-shape-vignette
kerl/new-shape-more-docstrings
`check_only` support for domain/shape updates #3400 kerl/check-only-r
Closed/abandoned PRs:
kerl/feature-flag-temp -- folded into 2962
kerl/polydom
`tiledbsoma.io` [WIP] #2964 kerl/tiledbsoma-io-test
kerl/min-size
`upgrade_domain` for `DataFrame` #3220 kerl/cpp-ugr-dom-2
`dev` #3244 kerl/dense-227-fixes
`shape` accessor for `DataFrame` [RFC] #3276 kerl/dataframe-shape
Issues which are related but non-blocking:
`SparseNDArray`/`DenseNDArray` `create` methods need to accept tile extent from `PlatformConfig` #2966
`_cast_domainish` #3081
See also: [sc-51048].
Problem to be solved
Users want to know the `shape` of an array, in the SciPy sense:
- [...] `resize`.
- [...] `tiledbsoma.io`'s append mode, or subsequent writes using the `tiledbsoma` API.
Using TileDB-SOMA up until the present:
- `domain` is immutable after array creation
- [...] `shape`, users would need to set the `domain` at array-creation time. However, users lose the ability to grow their datasets later.
- `non_empty_domain` accessor
- [...] `X` array for 100 cells and 200 genes. If non-zero expression counts exist only for cell join IDs 2-17, then the `non_empty_domain` will indicate `(2,17)` along `soma_dim_0`.
- [...] `obsm["X_pca"]` within the same experiment. This may be 100 cells by 50 PCA components: we need a place to store the number 50.
- `shape` accessor.
- `used_shape` accessor since TileDB-SOMA 1.5.
- [...] `shape` accessor, in the SciPy sense, but it is not multi-writer safe.
New feature for TileDB-SOMA 1.15:
- `shape`
- `resize`
- `used_shape` accessor will be deprecated in TileDB-SOMA 1.13, and slated for removal in TileDB-SOMA 1.14.
Compatibility:
This will now require users to do an explicit `resize` before appending/growing TileDB-SOMA Experiments. Guidance in the form of example notebooks will be provided.
Tracking
See also: [sc-41074] and [sc-51048].
Scheduling
Support arrives in TileDB Core 2.25. Deprecations for TileDB-SOMA will be released with 1.13. Full support within TileDB-SOMA will be released in 1.14.
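To make the `X` example under "Problem to be solved" concrete, here is a minimal pure-Python sketch of why a non-empty domain cannot stand in for a shape. It does not call TileDB-SOMA; the helper `non_empty_domain` and the toy coordinates are assumptions made up for illustration only.

```python
from typing import Iterable, Tuple

Coord = Tuple[int, int]


def non_empty_domain(coords: Iterable[Coord]) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return the inclusive (lo, hi) range actually occupied along each of two dims.

    Hypothetical stand-in for an accessor of the same name; not the library call.
    """
    rows, cols = zip(*coords)
    return (min(rows), max(rows)), (min(cols), max(cols))


# Intended size of X: 100 cells by 200 genes (toy values taken from the text).
intended_shape = (100, 200)

# Non-zero counts exist only for cell join IDs 2..17; the gene columns are arbitrary.
written = [(cell, gene) for cell in range(2, 18) for gene in (0, 57, 199)]

ned = non_empty_domain(written)
print(ned[0])          # prints (2, 17), the occupied range along soma_dim_0
print(intended_shape)  # prints (100, 200)
```

The occupied range along `soma_dim_0` says nothing about the intended 100 cells, which is why a separately stored shape is needed.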
Details
SOMA API mods as we've discussed in a Google doc are as follows.
`SOMADataFrame`
- `create`: Retain the `domain` argument
  - `(lo, hi)` tuple per dim, e.g. `(0,99)` or `(10,19)`
  - [...] `SparseNDArray` and `DenseNDArray` core can have `(lo, hi)` and SOMA can have `count`
  - [...] `DataFrame` there can be multiple dims --- default is a single `soma_joinid`
  - [...] `(lo, hi)` fashion or `count` fashion
  - [...] `cell_type`) can be on any type, including strings, floats, etc. where there is no implicit lo=0
  - `DataFrame` takes a `domain` argument (in `(lo, hi)` fashion) and not a `shape` argument (in `count` fashion)
SparseNDArray and DenseNDArray
- `create`
  - [...] `Tuple[Int,...]` where each element is the cell count of the corresponding dimension
All three of `SOMADataFrame`, `SparseNDArray`, `DenseNDArray`
- `write`
  - [...] `write` time are within the current shape
  - [...] `tiledb.cc.TileDBError` to TileDB-SOMA, which will catch and raise `IndexError`, and R-standard behavior on the R side
- `used_shape` accessor
  - `NotImplementedError`
  - [...] `array.schema.version` (the core storage version).
- `shape` accessor
- `non_empty_domain` accessor
- `maxshape` accessor
  - [...] `(lo, hi)` accessor for domain to count-style accessor hi+1. E.g. if the core domain is either `(0,99)` or `(50,99)` then TileDB-SOMA `maxshape` will say 100.
  - [...] `domain` or `maxshape` (see h5py).
- `resize` mutator
  - `reshape` means something else in the community (numpy, zarr, h5py), e.g. a 5x20 (total 100 cells) being reinterpreted as 4x25 (still 100 cells). The standard name for changing cell-count is `resize`.
  - `NotImplementedError`.
  - `ValueError` if the new shape is smaller on any dim than currently in storage
  - `ValueError` if the new shape exceeds the TileDB domain from create time, which will serve TileDB-SOMA in a role of “max possible shape the user can reshape to”
- `tiledbsoma_upgrade_shape` method for SparseNDArray and DenseNDArray
  - [...] `array.schema.version` to see if an upgrade is needed
  - `create`
- `tiledbsoma_upgrade_domain` method for `DataFrame`
  - [...] `SparseNDArray`/`DenseNDArray` except it will take a domain at the SOMA-API level just as `DataFrame`'s create method
- `tiledbsoma.io`
  - [...] `tiledbsoma.io`, we’ll still ask the tiledbsoma API for the “big domain” (2 billionish)
- `resize` method at the `Experiment` level
  - `resize`
  - [...] `exp.resize(...)`, or (better) this could be `tiledbsoma.io.reshape_experiment`
  - [...] `obs` and `var` counts as inputs:
    - `exp.obs.reshape` to new `obs` count
    - `exp.ms[name].var.reshape` to new `var` count
    - `exp.ms[name].X[name].reshape` to new `obs` count x `var` count
    - `exp.ms[name].obsm[name].reshape` to new `obs` count x same width
    - `exp.ms[name].obsp[name].reshape` to new `obs` count x `obs` count
    - `exp.ms[name].varm[name].reshape` to new `var` count x same width
    - `exp.ms[name].varp[name].reshape` to new `var` count x `var` count
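As a rough illustration of the rules listed above, the following pure-Python sketch covers three of them: the count-style `maxshape` derived from a `(lo, hi)` domain, the two `ValueError` conditions for `resize`, and the per-array target shapes of an experiment-level resize. It does not use the TileDB-SOMA API. The function names, the `"kind/name"` path keys, and the toy shapes are all hypothetical, and "currently in storage" is read here as the current shape, which is an assumption. Only the N-d arrays are planned; `obs` and `var` are dataframes and would simply take the new counts.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


def maxshape_from_domain(domain: Tuple[Tuple[int, int], ...]) -> Shape:
    """Count-style upper bound, hi + 1 per dim: (0, 99) and (50, 99) both give 100."""
    return tuple(hi + 1 for _lo, hi in domain)


def check_resize(current: Shape, new: Shape, maxshape: Shape) -> None:
    """Raise ValueError if a resize would shrink a dim or exceed the create-time bound."""
    for dim, (cur, want, cap) in enumerate(zip(current, new, maxshape)):
        if want < cur:
            raise ValueError(f"dim {dim}: cannot shrink from {cur} to {want}")
        if want > cap:
            raise ValueError(f"dim {dim}: {want} exceeds maxshape {cap}")


def plan_experiment_resize(n_obs: int, n_var: int, shapes: Dict[str, Shape]) -> Dict[str, Shape]:
    """Map each array path (hypothetical 'kind/name' keys) to its post-resize shape."""
    plan: Dict[str, Shape] = {}
    for path, shape in shapes.items():
        kind = path.split("/")[0]
        if kind == "X":
            plan[path] = (n_obs, n_var)
        elif kind == "obsm":
            plan[path] = (n_obs, shape[1])  # same width
        elif kind == "obsp":
            plan[path] = (n_obs, n_obs)
        elif kind == "varm":
            plan[path] = (n_var, shape[1])  # same width
        elif kind == "varp":
            plan[path] = (n_var, n_var)
        else:
            raise KeyError(f"unrecognized array kind: {path}")
    return plan


# Toy experiment: 100 cells, 200 genes, 50 PCA components (made-up values).
shapes = {
    "X/data": (100, 200),
    "obsm/X_pca": (100, 50),
    "obsp/distances": (100, 100),
    "varm/PCs": (200, 50),
    "varp/corr": (200, 200),
}
big = maxshape_from_domain(((0, 2**31 - 2), (0, 2**31 - 2)))  # the "2 billionish" bound
plan = plan_experiment_resize(150, 200, shapes)
for path, new_shape in plan.items():
    check_resize(shapes[path], new_shape, big)
print(plan["obsm/X_pca"])  # prints (150, 50)
```

Keeping the planning step separate from the validation step means the whole plan can be checked before any array is touched, which is in the spirit of the `check_only` support mentioned in the PR list.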