Overhaul MAE #17

Merged
merged 25 commits into from
Jan 3, 2024
78dfa66
untested code changes
jkanche Jan 2, 2024
9c44837
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 2, 2024
07c54a1
theoretically all changes make sense
jkanche Jan 3, 2024
3764a5f
Merge branch 'overhaul-mae' of https://github.com/BiocPy/MultiAssayEx…
jkanche Jan 3, 2024
7a3f2c1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 3, 2024
fb681d8
all changes to fix tests
jkanche Jan 3, 2024
8597966
Merge branch 'overhaul-mae' of https://github.com/BiocPy/MultiAssayEx…
jkanche Jan 3, 2024
b50ac45
lint issues
jkanche Jan 3, 2024
4b0ee95
yet another lint
jkanche Jan 3, 2024
c4da4ec
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 3, 2024
18f8eac
readme and docs changes
jkanche Jan 3, 2024
5766c1b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 3, 2024
d8ecb94
add tests for experiment
jkanche Jan 3, 2024
597b0b1
Merge branch 'overhaul-mae' of https://github.com/BiocPy/MultiAssayEx…
jkanche Jan 3, 2024
a6983a9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 3, 2024
70b6a77
warn user if column names is None
jkanche Jan 3, 2024
fd57667
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 3, 2024
5db897a
ignore lint
jkanche Jan 3, 2024
2dabf95
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 3, 2024
451e959
typo from column_data to column_names
jkanche Jan 3, 2024
fc59bf1
Merge branch 'overhaul-mae' of https://github.com/BiocPy/MultiAssayEx…
jkanche Jan 3, 2024
de605f8
remove prints
jkanche Jan 3, 2024
8b9a13c
warn when row names of column data contain duplicates
jkanche Jan 3, 2024
9fbf319
create empty MAE
jkanche Jan 3, 2024
07c1df5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 3, 2024
65 changes: 33 additions & 32 deletions README.md
@@ -19,16 +19,18 @@ pip install multiassayexperiment
First create mock sample data

```python
import pandas as pd
from random import random

import numpy as np
from biocframe import BiocFrame
from genomicranges import GenomicRanges
from iranges import IRanges

nrows = 200
ncols = 6
counts = np.random.rand(nrows, ncols)
gr = GenomicRanges(
{
"seqnames": [
seqnames=[
"chr1",
"chr2",
"chr2",
@@ -39,66 +41,65 @@ gr = GenomicRanges(
"chr3",
"chr3",
"chr3",
]
* 20,
"starts": range(100, 300),
"ends": range(110, 310),
"strand": ["-", "+", "+", "*", "*", "+", "+", "+", "-", "-"] * 20,
] * 20,
ranges=IRanges(range(100, 300), range(110, 310)),
strand = ["-", "+", "+", "*", "*", "+", "+", "+", "-", "-"] * 20,
mcols=BiocFrame({
"score": range(0, 200),
"GC": [random() for _ in range(10)] * 20,
}
})
)

col_data_sce = pd.DataFrame(
{
"treatment": ["ChIP", "Input"] * 3,
},
index=["sce"] * 6,
col_data_sce = BiocFrame({"treatment": ["ChIP", "Input"] * 3},
row_names=[f"sce_{i}" for i in range(6)],
)

col_data_se = pd.DataFrame(
{
"treatment": ["ChIP", "Input"] * 3,
},
index=["se"] * 6,
col_data_se = BiocFrame({"treatment": ["ChIP", "Input"] * 3},
row_names=[f"se_{i}" for i in range(6)],
)

sample_map = pd.DataFrame(
{
"assay": ["sce", "se"] * 6,
"primary": ["sample1", "sample2"] * 6,
"colname": ["sce", "se"] * 6,
}
)
sample_map = BiocFrame({
"assay": ["sce", "se"] * 6,
"primary": ["sample1", "sample2"] * 6,
"colname": ["sce_0", "se_0", "sce_1", "se_1", "sce_2", "se_2", "sce_3", "se_3", "sce_4", "se_4", "sce_5", "se_5"]
})

sample_data = pd.DataFrame({"samples": ["sample1", "sample2"]})
sample_data = BiocFrame({"samples": ["sample1", "sample2"]}, row_names= ["sample1", "sample2"])
```

Now we can create an instance of an MAE -

```python
from multiassayexperiment import MultiAssayExperiment
from singlecellexperiment import SingleCellExperiment
from summarizedExperiment import SummarizedExperiment
from summarizedexperiment import SummarizedExperiment

tsce = SingleCellExperiment(
assays={"counts": counts}, row_data=df_gr, col_data=col_data_sce
assays={"counts": counts}, row_data=gr.to_pandas(), column_data=col_data_sce
)

tse2 = SummarizedExperiment(
assays={"counts": counts.copy()},
row_data=df_gr.copy(),
col_data=col_data_se.copy(),
row_data=gr.to_pandas().copy(),
column_data=col_data_se.copy(),
)

mae = MultiAssayExperiment(
experiments={"sce": tsce, "se": tse2},
col_data=sample_data,
column_data=sample_data,
sample_map=sample_map,
metadata={"could be": "anything"},
)
```

## output
class: MultiAssayExperiment containing 2 experiments
[0] sce: SingleCellExperiment with 200 rows and 6 columns
[1] se: SummarizedExperiment with 200 rows and 6 columns
column_data columns(1): ['samples']
sample_map columns(3): ['assay', 'primary', 'colname']
metadata(1): could be
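
The `sample_map` in this README ties each experiment column (`colname`) to an `assay` and a `primary` sample. A minimal plain-Python sketch of that consistency rule follows. `check_sample_map` is a hypothetical helper written for illustration, not a function of `multiassayexperiment`, and plain dicts stand in for the `BiocFrame` objects.

```python
# Illustrative only: `check_sample_map` is a hypothetical helper, not library API.
def check_sample_map(sample_map, experiment_columns, primary_names):
    """Return a list of problems; an empty list means the map is consistent."""
    problems = []
    rows = zip(sample_map["assay"], sample_map["primary"], sample_map["colname"])
    for assay, primary, colname in rows:
        if assay not in experiment_columns:
            problems.append(f"unknown assay: {assay}")
        elif colname not in experiment_columns[assay]:
            problems.append(f"{colname} is not a column of {assay}")
        if primary not in primary_names:
            problems.append(f"unknown primary sample: {primary}")
    return problems


# Same values as the README example, held in plain dicts.
sample_map = {
    "assay": ["sce", "se"] * 6,
    "primary": ["sample1", "sample2"] * 6,
    "colname": [f"{assay}_{i}" for i in range(6) for assay in ("sce", "se")],
}
experiment_columns = {
    "sce": [f"sce_{i}" for i in range(6)],
    "se": [f"se_{i}" for i in range(6)],
}

problems = check_sample_map(sample_map, experiment_columns, {"sample1", "sample2"})
print(problems)
```

If the `colname` values were left as `["sce", "se"] * 6` while the experiments use `sce_0` ... `se_5`, every row would be reported.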

For more use cases, checkout the [documentation](https://biocpy.github.io/MultiAssayExperiment/).

<!-- pyscaffold-notes -->
98 changes: 42 additions & 56 deletions docs/tutorial.md
@@ -22,16 +22,17 @@ An MAE contains three main entities,
Lets create these objects

```python
import pandas as pd
from biocframe import BiocFrame
from iranges import IRanges
import numpy as np
from genomicranges import GenomicRanges
from random import random

nrows = 200
ncols = 6
counts = np.random.rand(nrows, ncols)
df_gr = pd.DataFrame(
{
"seqnames": [
gr = GenomicRanges(
seqnames=[
"chr1",
"chr2",
"chr2",
@@ -42,57 +43,46 @@ df_gr = pd.DataFrame(
"chr3",
"chr3",
"chr3",
]
* 20,
"starts": range(100, 300),
"ends": range(110, 310),
"strand": ["-", "+", "+", "*", "*", "+", "+", "+", "-", "-"] * 20,
] * 20,
ranges=IRanges(range(100, 300), range(110, 310)),
strand = ["-", "+", "+", "*", "*", "+", "+", "+", "-", "-"] * 20,
mcols=BiocFrame({
"score": range(0, 200),
"GC": [random() for _ in range(10)] * 20,
}
})
)

gr = GenomicRanges.from_pandas(df_gr)

col_data_sce = pd.DataFrame(
{
"treatment": ["ChIP", "Input"] * 3,
},
index=["sce"] * 6,
col_data_sce = BiocFrame({"treatment": ["ChIP", "Input"] * 3},
row_names=["sce"] * 6,
)

col_data_se = pd.DataFrame(
{
"treatment": ["ChIP", "Input"] * 3,
},
index=["se"] * 6,
col_data_se = BiocFrame({"treatment": ["ChIP", "Input"] * 3},
row_names=["se"] * 6,
)

sample_map = pd.DataFrame(
{
"assay": ["sce", "se"] * 6,
"primary": ["sample1", "sample2"] * 6,
"colname": ["sce", "se"] * 6,
}
)
sample_map = BiocFrame({
"assay": ["sce", "se"] * 6,
"primary": ["sample1", "sample2"] * 6,
"colname": ["sce", "se"] * 6
})

sample_data = pd.DataFrame({"samples": ["sample1", "sample2"]})
sample_data = BiocFrame({"samples": ["sample1", "sample2"]}, row_names=["sample1", "sample2"])
```

Then, create various experiment classes,

```python
from singlecellexperiment import SingleCellExperiment
from summarizedExperiment import SummarizedExperiment
from summarizedexperiment import SummarizedExperiment

tsce = SingleCellExperiment(
assays={"counts": counts}, row_data=df_gr, col_data=col_data_sce
assays={"counts": counts}, row_data=gr.to_pandas(), column_data=col_data_sce
)

tse2 = SummarizedExperiment(
assays={"counts": counts.copy()},
row_data=df_gr.copy(),
col_data=col_data_se.copy(),
row_data=gr.to_pandas().copy(),
column_data=col_data_se.copy(),
)
```

@@ -101,9 +91,9 @@ Now that we have all the pieces together, we can now create an MAE,
```python
from multiassayexperiment import MultiAssayExperiment

maeObj = MultiAssayExperiment(
mae = MultiAssayExperiment(
experiments={"sce": tsce, "se": tse2},
col_data=sample_data,
column_data=sample_data,
sample_map=sample_map,
metadata={"could be": "anything"},
)
@@ -114,7 +104,8 @@ To make your life easier, we also provide methods to naively create sample mappi
**_This is not a recommended approach, but if you don't have sample mapping, then it doesn't matter._**

```python
maeObj = mae.make_mae(experiments={"sce": tsce, "se": tse2})
import multiassayexperiment
maeObj = multiassayexperiment.make_mae(experiments={"sce": tsce, "se": tse2})
```

## Import `MuData` and `AnnData` as `MultiAssayExperiment`
@@ -152,27 +143,30 @@ adata2.var_names = [f"var2_{j+1}" for j in range(d2)]
we can now construct a `MuData` object and convert that to an MAE

```python
from mudata import MuData
from multiassayexperiment import MultiAssayExperiment
mdata = MuData({"rna": adata, "spatial": adata2})

maeObj = mae.from_mudata(mudata=mdata)
maeObj = MultiAssayExperiment.from_mudata(input=mdata)
```

Methods are also available to convert an `AnnData` object to `MAE`.

```python
maeObj = mae.read_h5ad("tests/data/adata.h5ad")
import multiassayexperiment
maeObj = multiassayexperiment.read_h5ad("tests/data/adata.h5ad")
```

# Accessors

Multiple methods are available to access various slots of a `MultiAssayExperiment` object

```python
maeObj.assays
maeObj.col_data
maeObj.sample_map
maeObj.experiments
maeObj.metadata
mae.assays
mae.column_data
mae.sample_map
mae.experiments
mae.metadata
```

## Access experiments
@@ -181,7 +175,7 @@ if you want to access a specific experiment

```python
# access a specific experiment
maeObj.experiment(experiment_name)
mae.experiment("se")
```

This does not include the sample data stored in the MAE. If you want to include this information
@@ -199,7 +193,7 @@ expt_with_sampleData = maeObj.experiment(experiment_name, with_sample_data=True)
The structure for slicing,

```
maeObj[rows, columns, experiments]
mae[rows, columns, experiments]
```

- rows, columns: accepts either a slice, list of indices or a dictionary to specify slices per experiment.
@@ -217,29 +211,21 @@ maeObj[1:5, 0:4]
maeObj[1:5, 0:4, ["spatial"]]
```
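
As a rough mental model of the `mae[rows, columns, experiments]` form, the sketch below applies one row slice and one column slice to each selected experiment. It uses nested lists in place of real experiment objects, and `slice_experiments` and `make_matrix` are hypothetical names. How the library itself implements slicing is not shown in this diff.

```python
# Illustrative only: nested lists stand in for experiment matrices.
def slice_experiments(matrices, rows, columns, experiments=None):
    """Apply the same row/column slices to each selected experiment."""
    names = list(matrices) if experiments is None else list(experiments)
    return {
        name: [row[columns] for row in matrices[name][rows]]
        for name in names
    }


def make_matrix(nrows, ncols):
    """Build an nrows x ncols grid filled with running integers."""
    return [[r * ncols + c for c in range(ncols)] for r in range(nrows)]


matrices = {"rna": make_matrix(10, 6), "spatial": make_matrix(8, 5)}

# Rough analogue of maeObj[1:5, 0:4, ["spatial"]]
subset = slice_experiments(matrices, slice(1, 5), slice(0, 4), ["spatial"])
print(list(subset), len(subset["spatial"]), len(subset["spatial"][0]))
```

Only the named experiment is returned here, cut down to four rows and four columns. Omitting `experiments` slices every entry of the dictionary.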

## Specify slices per experiment

You can specify slices by experiment, rest of the experiments are not sliced.

```python
maeObj[{"rna": slice(0,10)}, {"spatial": slice(0,5)}, ["spatial"]]
```

Checkout other methods that perform similar operations - `subset_by_rows`, `subset_by_columns` & `subset_by_experiments`.

# Helper methods

## completedCases

This method returns a boolean vector that specifies which biospecimens have data across all experiments.
This method returns a boolean vector that specifies which bio specimens have data across all experiments.

```python
maeObj.completed_cases()
```
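
To make the idea of a complete case concrete, the sketch below marks a primary sample as complete when the sample map links it to every experiment. It is plain Python with a hypothetical `complete_cases` helper and made-up sample names (`s1`, `s2`). It is assumed, not confirmed by this diff, that `completed_cases()` follows the same rule.

```python
# Illustrative only: `complete_cases` is a hypothetical helper, not library API.
def complete_cases(sample_map, primary_names, experiment_names):
    """Return one boolean per primary sample: True if seen in every experiment."""
    seen = {name: set() for name in primary_names}
    for assay, primary in zip(sample_map["assay"], sample_map["primary"]):
        if primary in seen:
            seen[primary].add(assay)
    required = set(experiment_names)
    return [required <= seen[name] for name in primary_names]


# s1 has columns in both experiments, s2 only in "rna".
sample_map = {
    "assay": ["rna", "rna", "spatial"],
    "primary": ["s1", "s2", "s1"],
}

flags = complete_cases(sample_map, ["s1", "s2"], ["rna", "spatial"])
print(flags)
```

The result is ordered like `primary_names`, so it can be zipped back against the sample names to pick out the complete ones.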

## replicated

replicated identifies biospecimens that have multiple observations per experiment.
replicated identifies bio specimens that have multiple observations per experiment.

```python
maeObj.replicated()
18 changes: 14 additions & 4 deletions setup.cfg
@@ -41,17 +41,17 @@ package_dir =
=src

# Require a min/specific Python version (comma-separated conditions)
# python_requires = >=3.8
python_requires = >=3.8

# Add here dependencies of your project (line-separated), e.g. requests>=2.2,<3.0.
# Version specifiers like >=2.2,<3.0 avoid problems due to API changes in
# new major versions. This works if the required packages follow Semantic Versioning.
# For more information, check out https://semver.org/.
install_requires =
importlib-metadata; python_version<"3.8"
summarizedexperiment>=0.3.0
singlecellexperiment>=0.3.0
mudata
biocframe>=0.5.6,<0.6.0
biocutils>=0.1.4,<0.2.0
summarizedexperiment>=0.4.0,<0.5.0

[options.packages.find]
where = src
@@ -62,12 +62,22 @@ exclude =
# Add here additional requirements for extra features, to install with:
# `pip install MultiAssayExperiment[PDF]` like:
# PDF = ReportLab; RXP
optional =
singlecellexperiment
anndata
mudata
genomicranges

# Add here test requirements (semicolon/line-separated)
testing =
setuptools
pytest
pytest-cov
anndata
pandas
mudata
singlecellexperiment
genomicranges

[options.entry_points]
# Add here console scripts like: