Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Refactor #3

Merged
merged 15 commits into from
Dec 7, 2022
Merged
7 changes: 7 additions & 0 deletions .coveragerc
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
[run]
omit = */tests/*, slkspec/_version.py
concurrency = multiprocessing

[report]
exclude_lines =
pragma: no cover
6 changes: 6 additions & 0 deletions .github/dependabot.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
version: 2
updates:
- package-ecosystem: "pip"
directory: "/.github/dependabot/"
schedule:
interval: "daily"
2 changes: 2 additions & 0 deletions .github/dependabot/constraints.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
fsspec==2022.11.0
pyslk==1.4.1
23 changes: 23 additions & 0 deletions .github/workflows/lint_job.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
name: Linter
run-name: ${{ github.actor }} is linting the code

on: [push, pull_request]
jobs:
lint:
runs-on: ubuntu-latest
steps:
-
name: Ceckout
uses: actions/checkout@v3
-
name: Set up Python 3.11
uses: actions/setup-python@v3
with:
python-version: "3.11"
-
name: Install packages
run: |
python3 -m pip install -e .[tests]
-
name: Linting
run: make lint
45 changes: 45 additions & 0 deletions .github/workflows/tests.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
name: Tests
run-name: ${{ github.actor }} is testing the code

on:
push:
branches: [ master ]
pull_request:

jobs:
tests:

runs-on: ubuntu-latest
strategy:
max-parallel: 5
matrix:
python-version: [3.7, 3.8, 3.9, "3.10", "3.11"]

steps:
- name: Checkout
uses: actions/checkout@v3

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v3
with:
python-version: ${{ matrix.python-version }}

- name: Add conda to system path
run: |
# $CONDA is an environment variable pointing to the root of the miniconda directory
echo $CONDA/bin >> $GITHUB_PATH

- name: Creating conda environment for ${{ matrix.python-version }}
run: |
conda create -n test -y python=${{matrix.python-version}} pip make

- name: Install dependencies
run: |
conda run -n test python3 -m pip install -e .[tests]

- name: Test with pytest
run: |
conda run -n test make test_coverage

- name: Upload coverage to Codecov
uses: codecov/codecov-action@v1
32 changes: 32 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# Coverage reports
.coverage
coverage.xml
report.xml
24 changes: 24 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
# makefile used for testing
#
#
all: install test

.PHONY: docs
install:
python3 -m pip install -e .[tests]

test:
python3 -m pytest -vv $(PWD)/slkspec/tests

test_coverage:
python3 -m pytest -vv \
--cov=$(PWD)/slkspec --cov-report html:coverage_report \
--cov-report=xml --junitxml report.xml
rm -rf '='
python3 -m coverage report


lint:
mypy --install-types --non-interactive
black --check -t py310 .
flake8 slkspec --count --max-complexity=10 --max-line-length=88 --statistics --doctests
42 changes: 23 additions & 19 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,38 +7,42 @@ Pull requests are welcomed!
```python
import fsspec

with fsspec.open("slk://arch/project/user/file", "r") as f:
with fsspec.open("slk:///arch/project/user/file", "r") as f:
print(f.read())
```
### Loading datasets

```python

import ffspec
import xarray as xr

url = fsspec.open("slk:////arch/project/file.nc", slk_cache="/scratch/b/b12346").open()
dset = xr.open_dataset(url)
```


## Usage in combination with preffs
### Installation of additional requirements
```
git clone [email protected]:observingClouds/slkspec.git
cd slkspec
```console
mamba env create
mamba activate slkspec
pip install .
```

### Loading dataset

Set location for tape archive retrievals on local machine
```bash
export SLK_CACHE="/path/to/local/cache/directory/"
```
Load the slk module
```bash
module load slk
pip install .[preffs]
```

Open parquet referenced zarr-file
```python
import xarray as xr
ds=xr.open_zarr(f"preffs::/path/to/preffs/data.preffs", storage_options={"preffs":{"prefix":"slk:///arch/<project>/<user>/slk/archive/prefix/"}
ds = xr.open_zarr(f"preffs::/path/to/preffs/data.preffs",
storage_options={"preffs":{"prefix":"slk:///arch/<project>/<user>/slk/archive/prefix/"}
```
Now only those files are retrieved from tape which are needed for any requested dataset operation. In the beginning only the file containing the metadata (e.g. .zattrs, .zmetadata) and coordinates are requested (e.g. time). After the files have been retrieved once, they are saved at the path given in `SLK_CACHE` and accessed directly from there.

Now only those files are retrieved from tape which are needed for any requested
dataset operation. In the beginning only the file containing the metadata
(e.g. .zattrs, .zmetadata) and coordinates are requested (e.g. time). After the
files have been retrieved once, they are saved at the path given in
`SLK_CACHE` and accessed directly from there.


## Current limitations (among others)
- tape retrievals are currently not combined, leading to single/inefficient requests
Copy link
Owner

@observingClouds observingClouds Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still true, so far ( see one of my other comments)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the _retrieve_items method gets only called once, at which point it has information on all items that need to be downloaded. So the retrievals are combined - somewhat. There are limitations though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do a pyslk.slk_recall(SEARCH_ID_ALL_FILES), first, and, then, the individual retrievals.

- tape retrievals are done within the currently selected compute resources, which might not be needed for a simple tape retrieval
21 changes: 21 additions & 0 deletions mypy.ini
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
[mypy]
files = slkspec/core.py
strict = False

warn_unused_ignores = True
warn_unreachable = True
show_error_codes = True
install_types = True
non_interactive = True
warn_unused_configs = True
disallow_untyped_defs = True
disallow_incomplete_defs = True
check_untyped_defs = True
disallow_untyped_decorators = True
warn_redundant_casts = True

[mypy-fsspec.*]
ignore_missing_imports = True

[mypy-pyslk.*]
ignore_missing_imports = True
4 changes: 0 additions & 4 deletions requirements.txt

This file was deleted.

27 changes: 23 additions & 4 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,6 @@
with open("README.md", "r") as fh:
long_description = fh.read()

with open("requirements.txt") as f:
requirements = f.read().strip().split("\n")

setuptools.setup(
name="slkspec",
version=versioneer.get_version(),
Expand All @@ -25,7 +22,29 @@
"Operating System :: OS Independent",
],
python_requires=">=3.7",
install_requires=requirements,
install_requires=[
"fsspec>=0.9.0",
"pyslk @ git+https://gitlab.dkrz.de/hsm-tools/pyslk.git@master",
],
extras_require={
"tests": [
"mypy",
"black",
"flake8",
"mock",
"pandas",
"pytest",
"pytest-env",
"pytest-cov",
"testpath",
"flake8",
"xarray",
],
"preffs": [
"preffs @ git+https://github.com/observingClouds/preffs.git@slkspec_patch",
"aiohttp",
],
},
entry_points={
"fsspec.specs": [
"slk=slkspec.SLKFileSystem",
Expand Down
4 changes: 2 additions & 2 deletions slkspec/__init__.py
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
from fsspec import register_implementation

from . import _version
from .core import SLKFileSystem
from .core import SLKFileSystem, logger

__version__ = _version.get_versions()["version"]

register_implementation(SLKFileSystem.protocol, SLKFileSystem)

__all__ = ["__version__", "SLKFileSystem"]
__all__ = ["__version__", "SLKFileSystem", "logger"]
Loading