Refactor with biocutils (#45)
- Refactor code with utility functions from biocutils and biocgenerics
- Use `rich` again for printing (`__repr__`)
- Update documentation, Update tests
- Clean up README
jkanche authored Oct 16, 2023
1 parent c84cd7d commit 3e99d12
Showing 12 changed files with 284 additions and 192 deletions.
2 changes: 1 addition & 1 deletion AUTHORS.md
@@ -1,3 +1,3 @@
# Contributors

* jkanche [[email protected]](mailto:[email protected])
* Jayaram Kancherla [[email protected]](mailto:[email protected])
177 changes: 139 additions & 38 deletions README.md
@@ -16,9 +16,9 @@

# BiocFrame

This package provides the `BiocFrame` class, a dataframe-like representation similar to a pandas `DataFrame`, with support for flexible and nested objects.
This package provides the `BiocFrame` class, an alternative to the pandas `DataFrame`.

`BiocFrame` makes no assumptions about the types of the columns; the minimum requirement is that each column implements the length (`__len__`) and slice (`__getitem__`) operations.
`BiocFrame` makes no assumptions about the types of the columns; the minimum requirement is that each column implements the length (`__len__`) and slice (`__getitem__`) dunder methods. This allows `BiocFrame` to accept nested representations or any supported class as columns.
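
To illustrate the column requirement stated above, here is a minimal sketch of a custom column type that implements only `__len__` and `__getitem__`. The class name `ConstantColumn` and its behaviour are hypothetical and not part of BiocFrame. The sketch assumes a column may be indexed with an integer, a `slice`, or a list of integer positions.

```python
class ConstantColumn:
    """Hypothetical column type: a single value repeated `length` times."""

    def __init__(self, value, length):
        self.value = value
        self.length = length

    def __len__(self):
        # Length operation required of every column.
        return self.length

    def _check(self, position):
        # Reject positions outside the column, mirroring list behaviour.
        if not -self.length <= position < self.length:
            raise IndexError("column index out of range")

    def __getitem__(self, index):
        # Slice operation required of every column.
        if isinstance(index, slice):
            # slice.indices() resolves start/stop/step against the length.
            size = len(range(*index.indices(self.length)))
            return ConstantColumn(self.value, size)
        if isinstance(index, (list, tuple)):
            for position in index:
                self._check(position)
            return ConstantColumn(self.value, len(index))
        self._check(index)
        return self.value


col = ConstantColumn("chr1", 5)
print(len(col))       # number of rows in the column
print(len(col[1:3]))  # a slice returns a shorter column
print(col[0])         # an integer returns a single value
```

Going by the requirement above, an object like this should in principle be usable as a column, although that has not been checked against the library here.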


To get started, install the package from [PyPI](https://pypi.org/project/biocframe/)
@@ -29,71 +29,172 @@ pip install biocframe

## Usage

Let's create a `BiocFrame` from a dictionary.
To construct a `BiocFrame` object, simply provide the data as a dictionary.

```python
from random import random
from biocframe import BiocFrame

bframe = BiocFrame(
data = {
"seqnames": [
"chr1",
"chr2",
"chr2",
"chr2",
"chr1",
"chr1",
"chr3",
"chr3",
"chr3",
"chr3",
]
* 20,
"starts": range(100, 300),
"ends": range(110, 310),
"strand": ["-", "+", "+", "*", "*", "+", "+", "+", "-", "-"] * 20,
"score": range(0, 200),
"GC": [random() for _ in range(10)] * 20,
}
)
obj = {
"ensembl": ["ENS00001", "ENS00002", "ENS00003"],
"symbol": ["MAP1A", "BIN1", "ESR1"],
}
bframe = BiocFrame(obj)
print(bframe)
```

## output
BiocFrame with 3 rows & 2 columns
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ ensembl <list> ┃ symbol <list> ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ ENS00001 │ MAP1A │
│ ENS00002 │ BIN1 │
│ ENS00003 │ ESR1 │
└────────────────┴───────────────┘

You can specify complex representations as columns, for example

```python
obj = {
"ensembl": ["ENS00001", "ENS00002", "ENS00002"],
"symbol": ["MAP1A", "BIN1", "ESR1"],
"ranges": BiocFrame({
"chr": ["chr1", "chr2", "chr3"],
"start": [1000, 1100, 5000],
"end": [1100, 4000, 5500]
}),
}

bframe2 = BiocFrame(obj, row_names=["row1", "row2", "row3"])
print(bframe2)
```

### Access Properties
## output
BiocFrame with 3 rows & 3 columns
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ row_names ┃ ensembl <list> ┃ symbol <list> ┃ ranges <BiocFrame> ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ row1 │ ENS00001 │ MAP1A │ {'chr': 'chr1', 'start': 1000, 'end': 1100} │
│ row2 │ ENS00002 │ BIN1 │ {'chr': 'chr2', 'start': 1100, 'end': 4000} │
│ row3 │ ENS00002 │ ESR1 │ {'chr': 'chr3', 'start': 5000, 'end': 5500} │
└───────────┴────────────────┴───────────────┴─────────────────────────────────────────────┘

### Properties

Accessor methods/properties are available to access column names, row names and dims.
Properties such as the column names, row names, and dimensions of the `BiocFrame` can be accessed directly from the object.

```python
# find the dimensions
# Dimensionality or shape
print(bframe.dims)

## output
## (3, 2)

# get the column names
print(bframe.column_names)

## output
## ['ensembl', 'symbol']
```

### Setters
#### Setters

Using the Pythonic way to set properties
To set properties, assign to them directly,

```python
# set new column names
bframe.column_names = [..., new_column_names, ...]
print(bframe.column_names)
bframe.column_names = ["column1", "column2"]
print(bframe)
```

# add or reassign columns
## output
BiocFrame with 3 rows & 2 columns
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ column1 <list> ┃ column2 <list> ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ ENS00001 │ MAP1A │
│ ENS00002 │ BIN1 │
│ ENS00003 │ ESR1 │
└────────────────┴────────────────┘

bframe["score"] = range(200, 400)
To add new columns,

```python
bframe["score"] = range(2, 5)
print(bframe)
```

### Slice the `BiocFrame`
## output
BiocFrame with 3 rows & 3 columns
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ column1 <list> ┃ column2 <list> ┃ score <range> ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ ENS00001 │ MAP1A │ 2 │
│ ENS00002 │ BIN1 │ 3 │
│ ENS00003 │ ESR1 │ 4 │
└────────────────┴────────────────┴───────────────┘

Currently slicing is only supported by indices or names (column names or row names). A future version may implement pandas query-like operations.
### Subset `BiocFrame`

Use the subset (`[]`) operator to **slice** the object,

```python
sliced_bframe = bframe[3:7, 2:5]
sliced = bframe[1:2, [True, False, False]]
print(sliced)
```

## output
BiocFrame with 1 row & 1 column
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ row_names ┃ column1 <list> ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ 1 │ ENS00002 │
└───────────┴────────────────┘

This operation accepts several input types: you can specify a boolean vector, a `slice` object, a list of indices, or row/column names to subset.
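
As a rough illustration of how the input types listed above could be reduced to one common form, the sketch below converts a subset argument into a list of integer positions. The function `normalize_index` is hypothetical and is not BiocFrame's implementation; it only assumes that names along an axis are unique strings.

```python
def normalize_index(index, names):
    """Convert a subset argument into a list of integer positions.

    `names` holds the row or column names along the axis being subset.
    This is an illustrative helper, not part of BiocFrame.
    """
    size = len(names)

    if isinstance(index, slice):
        # Resolve start/stop/step against the axis length.
        return list(range(*index.indices(size)))
    if isinstance(index, str):
        return [names.index(index)]
    if isinstance(index, int) and not isinstance(index, bool):
        return [index]

    index = list(index)
    if index and all(isinstance(i, bool) for i in index):
        # A boolean vector must cover the whole axis.
        if len(index) != size:
            raise ValueError("boolean vector length must match the axis")
        return [pos for pos, keep in enumerate(index) if keep]
    if all(isinstance(i, str) for i in index):
        # Look up each name; list.index raises ValueError if it is missing.
        return [names.index(i) for i in index]
    return index


columns = ["column1", "column2", "score"]
print(normalize_index(slice(1, 2), columns))
print(normalize_index([True, False, False], columns))
print(normalize_index(["score", "column1"], columns))
```

Reducing every input to integer positions first means the actual subsetting only has to handle one case.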


### Combine

`BiocFrame` implements the combine generic from [biocgenerics](https://github.com/BiocPy/generics). To combine multiple objects,

```python
bframe1 = BiocFrame(
{
"odd": [1, 3, 5, 7, 9],
"even": [0, 2, 4, 6, 8],
}
)

bframe2 = BiocFrame(
{
"odd": [11, 33, 55, 77, 99],
"even": [0, 22, 44, 66, 88],
}
)

from biocgenerics.combine import combine
combined = combine(bframe1, bframe2)

# OR an object-oriented approach

combined = bframe1.combine(bframe2)
```

For more use cases, including subsetting, check out the [documentation](https://biocpy.github.io/BiocFrame/)
## output
BiocFrame with 10 rows & 2 columns
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ odd <list> ┃ even <list> ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ 1 │ 0 │
│ 3 │ 2 │
│ ... │ ... │
│ 99 │ 88 │
└────────────┴─────────────┘
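
Conceptually, the combine shown above appends the rows of the second object to the first, column by column. The following pure-Python sketch mimics that on plain dictionaries of lists. The function `combine_rows` is hypothetical and does not reflect how biocgenerics or `BiocFrame` implement the operation; it assumes all inputs share the same column names in the same order.

```python
def combine_rows(*frames):
    """Row-wise combine of dict-of-lists 'frames' that share the same columns."""
    first = frames[0]
    for frame in frames[1:]:
        # Assumption for this sketch: identical column names, identical order.
        if list(frame.keys()) != list(first.keys()):
            raise ValueError("all frames must share the same column names")

    # Concatenate each column across the frames, in the order given.
    return {
        name: [value for frame in frames for value in frame[name]]
        for name in first
    }


frame1 = {"odd": [1, 3, 5, 7, 9], "even": [0, 2, 4, 6, 8]}
frame2 = {"odd": [11, 33, 55, 77, 99], "even": [0, 22, 44, 66, 88]}

combined_columns = combine_rows(frame1, frame2)
print(len(combined_columns["odd"]))
print(combined_columns["even"])
```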

For more details, check out the BiocFrame class [reference](https://biocpy.github.io/BiocFrame/api/biocframe.html#biocframe.BiocFrame.BiocFrame).


<!-- pyscaffold-notes -->
1 change: 1 addition & 0 deletions docs/conf.py
@@ -72,6 +72,7 @@
"sphinx.ext.ifconfig",
"sphinx.ext.mathjax",
"sphinx.ext.napoleon",
"sphinx_autodoc_typehints",
]

# Add any paths that contain templates here, relative to this directory.
65 changes: 0 additions & 65 deletions docs/getting_started.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/index.md
@@ -17,7 +17,6 @@ pip install biocframe
:maxdepth: 2
Overview <readme>
Getting Started <getting_started>
Module Reference <api/modules>
Contributions & Help <contributing>
License <license>
1 change: 1 addition & 0 deletions docs/requirements.txt
@@ -5,3 +5,4 @@ furo
# sphinx_rtd_theme
myst-parser[linkify]
sphinx>=3.2.1
sphinx-autodoc-typehints
7 changes: 4 additions & 3 deletions setup.cfg
@@ -5,8 +5,8 @@

[metadata]
name = BiocFrame
description = flexible dataframes to support nested structures.
author = jkanche
description = Flexible dataframe representation to support nested structures.
author = Jayaram Kancherla
author_email = [email protected]
license = MIT
license_files = LICENSE.txt
@@ -49,8 +49,9 @@ package_dir =
# For more information, check out https://semver.org/.
install_requires =
importlib-metadata; python_version<"3.8"
prettytable
rich
biocgenerics>=0.1.1
biocutils>=0.0.3

[options.packages.find]
where = src