feat: update docs and ci (#37)
* feat: update python docs

* feat: update ci
gadomski authored Nov 19, 2024
1 parent e3f4c2e commit ff1608e
Showing 14 changed files with 284 additions and 69 deletions.
12 changes: 7 additions & 5 deletions .github/workflows/ci.yaml
@@ -26,9 +26,11 @@ jobs:
           key: ${{ runner.os }}-nextjs-${{ hashFiles('**/yarn.lock') }}-${{ hashFiles('**.[jt]s', '**.[jt]sx') }}
           restore-keys: |
             ${{ runner.os }}-nextjs-${{ hashFiles('**/yarn.lock') }}-
-      - name: Install dependencies
-        run: yarn install
+      - name: Setup
+        run: scripts/setup
       - name: Lint
-        run: yarn lint
-      - name: Build with Next.js
-        run: yarn build
+        run: scripts/lint
+      - name: Test
+        run: scripts/test
+      - name: Build
+        run: scripts/build
111 changes: 111 additions & 0 deletions README-python.md
@@ -0,0 +1,111 @@
# heystac

[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/gadomski/heystac/ci.yaml?style=for-the-badge)](https://github.com/gadomski/heystac/actions/workflows/ci.yaml)

A command-line utility (CLI) for rating and crawling [STAC](https://stacspec.org/) catalogs.
**heystac** generates the ratings for <https://www.gadom.ski/heystac/>.

## Usage

```shell
python -m pip install heystac
heystac --help
```

To [rate](#rate) a STAC catalog, collection, or item:

```shell
$ heystac rate https://landsatlook.usgs.gov/stac-server/collections/landsat-c2l2-st/items/LC09_L2SP_090091_20241118_20241119_02_T2_ST
5.0 ★★★★★
```

Any issues will be printed to standard output:

```shell
$ heystac rate https://landsatlook.usgs.gov/stac-server/collections/landsat-c2l2-st
1.7 ★★

High importance issues
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Rule id | Message |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| validate-core | Validation failed for Collection with ID landsat-c2l2-st against schema at https://schemas.stacspec.org/v1.0.0/collection-spec/json-schema/collection.json |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
```

To run [json-schema](https://json-schema.org/) validation on a STAC value:

```shell
$ heystac validate https://landsatlook.usgs.gov/stac-server/collections/landsat-c2l2-st 2>&1 | tail -n7
Failed validating 'pattern' in schema['allOf'][0]['properties']['license']:
    {'title': 'Collection License Name',
     'type': 'string',
     'pattern': '^[\\w\\-\\.\\+]+$'}

On instance['license']:
    'https://d9-wret.s3.us-west-2.amazonaws.com/assets/palladium/production/s3fs-public/atoms/files/Landsat_Data_Policy.pdf'
```
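`heystac validate` accepts either a URL or a local file path: the `validate` command in `src/heystac/cli.py` fetches the value over HTTP when the argument has a URL scheme and reads it from disk otherwise. Here is a stdlib-only sketch of that loading step. It swaps `httpx` for `urllib.request`, leaves out the json-schema validation itself (done by `pystac.validation.validate_dict` in the CLI), and `load_stac_value` is a hypothetical name.

```python
import json
import urllib.parse
import urllib.request
from typing import Any


def load_stac_value(href: str) -> dict[str, Any]:
    """Load a STAC value from a URL or a local path (hypothetical helper).

    Mirrors the branch in heystac's `validate` command: anything with a URL
    scheme is fetched over the network, everything else is read from disk.
    """
    if urllib.parse.urlparse(href).scheme:
        # urlopen raises HTTPError for 4xx/5xx responses, much like
        # httpx's raise_for_status() does in the CLI.
        with urllib.request.urlopen(href) as response:
            return json.load(response)
    with open(href) as f:
        return json.load(f)
```

Note that this scheme check is deliberately simple: a Windows path such as `C:\item.json` parses with scheme `c`, so it would be treated as a URL.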

To [crawl](#crawl) a catalog and save the crawl to a directory:

```shell
heystac crawl https://landsatlook.usgs.gov/stac-server usgs-landsat
```

## Definitions

We've made some opinionated decisions about behavior in this CLI.

### Rate

A `Rating` is generated by applying a set of `Rules` to a STAC value.
This produces one `Check` per rule.
Each `Check` has a score between zero and one:

- `0`: the STAC value failed the check
- `1`: the STAC value passed the check
- Something between `0` and `1`: the STAC value partially passed the check, e.g. if the check was for valid links and only some of the links were valid
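As an illustration of a partial score, here is a minimal sketch of a link check that scores the fraction of links that resolved. The function name `links_score`, its input (one HTTP status code per link), and the choice to score a value with no links as `1.0` are all assumptions for illustration, not **heystac**'s actual rule signature.

```python
def links_score(status_codes: list[int]) -> float:
    """Score a links check as the fraction of links that resolved (hypothetical).

    A 2xx status counts as a passing link. With no links to check, nothing
    can fail, so this sketch returns 1.0.
    """
    if not status_codes:
        return 1.0
    ok = sum(1 for code in status_codes if 200 <= code < 300)
    return ok / len(status_codes)


print(links_score([200, 200, 404, 500]))  # prints 0.5
```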

Each rule also has an `Importance`:

- `high`
- `medium`
- `low`

**heystac** applies a configurable weight to each check based on its importance to produce a `score` for the STAC value.
That score is converted to `stars` with the formula `5 * score / total`, where `total` is the maximum possible score, i.e. the score the value would get if every check passed.
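The weighting step can be sketched as a weighted average. This assumes `score` is the sum of each check's score times its importance weight, so that `total` is simply the sum of the weights. The weights `high = 8`, `medium = 2`, `low = 1` are the ones this commit removes from the repository's `heystac.toml`; **heystac**'s built-in defaults may differ, and `stars` is a hypothetical name.

```python
# Assumed importance weights: the [weights] table that this commit removes from
# the repository's heystac.toml. heystac's built-in defaults may differ.
WEIGHTS = {"high": 8, "medium": 2, "low": 1}


def stars(checks: list[tuple[str, float]]) -> float:
    """Turn (importance, check score) pairs into a 0-5 star rating (hypothetical).

    score is the weighted sum of the check scores; total is the weighted sum
    if every check had scored 1, i.e. the maximum possible score.
    """
    total = sum(WEIGHTS[importance] for importance, _ in checks)
    if total == 0:
        raise ValueError("cannot rate a value with no checks")
    score = sum(WEIGHTS[importance] * value for importance, value in checks)
    return 5 * score / total


# One failed high-importance check and three passed medium-importance checks:
# score = 2 + 2 + 2 = 6, total = 8 + 2 + 2 + 2 = 14.
print(round(stars([("high", 0.0), ("medium", 1.0), ("medium", 1.0), ("medium", 1.0)]), 1))  # prints 2.1
```

With these weights a single failed high-importance check outweighs several passed medium-importance ones, which is the point of weighting by importance.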

### Crawl

When **heystac** crawls a STAC API, it gets every collection and one item from each collection.
The catalog is saved to the local filesystem in the following layout:

- `catalog.json`
- `collection-a/collection.json`
- `collection-a/item-from-collection-a.json`
- `collection-b/collection.json`
- `collection-b/item-from-collection-b.json`

The item file names are generated from the item ID, with all `/` characters replaced by `_`.
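The layout above can be sketched as a function that maps a collection ID and an item ID to the files a crawl writes (`catalog.json` sits at the root). `crawl_paths` is a hypothetical name, using the collection ID as the directory name is an assumption based on the `collection-a/` example, and the slash-containing item ID below is made up to show the replacement.

```python
from pathlib import PurePosixPath


def crawl_paths(root: str, collection_id: str, item_id: str) -> tuple[PurePosixPath, PurePosixPath]:
    """Return (collection path, item path) for one crawled collection (hypothetical).

    A "/" in an item ID would otherwise create nested directories, so every
    "/" is replaced with "_" before building the item's file name.
    """
    collection_dir = PurePosixPath(root) / collection_id
    item_file = item_id.replace("/", "_") + ".json"
    return collection_dir / "collection.json", collection_dir / item_file


collection, item = crawl_paths("usgs-landsat", "collection-a", "region/2024-11-18/scene-1")
print(collection)  # prints usgs-landsat/collection-a/collection.json
print(item)        # prints usgs-landsat/collection-a/region_2024-11-18_scene-1.json
```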

## Configuration

**heystac** comes with a default configuration that should work for most use cases.
If you want to customize anything, such as the importance weights or the rule descriptions, save the default configuration to a file called `heystac.toml`:

```shell
heystac config > heystac.toml
```

You can then edit that file to your heart's content.
By default, the CLI will read `heystac.toml` in your current working directory.
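To use a config file in another location, pass `--config` (or `-c`) before the subcommand:

```shell
heystac --config a/nother/path/config.toml rate https://landsatlook.usgs.gov/stac-server/collections/landsat-c2l2-st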
```

## License

MIT
31 changes: 11 additions & 20 deletions README.md
@@ -4,7 +4,7 @@
[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/gadomski/heystac/pages.yaml?style=for-the-badge&label=pages)](https://github.com/gadomski/heystac/actions/workflows/pages.yaml)

> [!WARNING]
> This is a work in progress, _and_ @gadomski is 🗑️ at front-end dev, so set your expectations low.
> <https://gadom.ski/heystac> is a Proof-of-Concept and not intended to be used as a Real Website™ (yet). The backend **heystac** command-line utility _is_ fit-for-purpose.
A curated geospatial asset discovery experience™.
**heystac** lives on [Github Pages](https://github.com/gadomski/heystac/deployments/github-pages) and has no other infrastructure.
@@ -13,38 +13,29 @@ A curated geospatial asset discovery experience™.
 
 ## Developing
 
-Get [yarn](https://yarnpkg.com/).
+Get [yarn](https://yarnpkg.com/) and [uv](https://docs.astral.sh/uv/getting-started/installation/).
 Then:
 
 ```shell
-yarn install
-yarn dev
+scripts/setup
 ```
 
-### Frontend
+To start the development server:
 
-The frontend is built in [next.js](https://nextjs.org/), using [tailwind css](https://tailwindcss.com/) and [Development Seed's UI components](https://ui.ds.io).
-The frontend code lives in [app](./app/).
-
-### Backend
-
-We have a command-line interface (CLI), also called **heystac**, for generating pre-rendered content.
-The Python code for the CLI lives in [src](./src/).
-The CLI builds our STAC catalog, which lives in a submodule at [heystac-catalog](https://github.com/gadomski/heystac-catalog).
+```shell
+scripts/start
+```
 
-If you want to build the catalog from scratch:
+To run all tests:
 
 ```shell
-heystac bootstrap
-heystac crawl all
-heystac rate
+scripts/test
 ```
 
-However, most of the time you'll just be (re)crawling catalogs and then rating them:
+To run all linters and format checkers:
 
 ```shell
-heystac crawl my-new-catalog
-heystac rate
+scripts/lint
 ```
 
 ## License
31 changes: 0 additions & 31 deletions heystac.toml
@@ -26,34 +26,3 @@ title = "Earth Search by Element 84"
 [catalogs.usgs-landsat]
 href = "https://landsatlook.usgs.gov/stac-server"
 title = "USGS Landsat"
-
-[weights]
-high = 8
-medium = 2
-low = 1
-
-[rules.validate-core]
-# TODO can we use the docstring for the description?
-description = "Validate the STAC object against its core json-schema"
-importance = "high"
-function = "heystac.rules:validate_core"
-
-[rules.validate-geometry]
-description = "Validate item geometries"
-importance = "high"
-function = "heystac.rules:validate_geometry"
-
-[rules.links]
-description = "Check that all http and https links are accessible"
-importance = "medium"
-function = "heystac.rules:links"
-
-[rules.validate-extensions]
-description = "Validate the STAC object against all it's extension schemas"
-importance = "medium"
-function = "heystac.rules:validate_extensions"
-
-[rules.version]
-description = "Ensure the STAC version is \"modern\""
-importance = "medium"
-function = "heystac.rules:version"
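Each rule table in the removed configuration names its implementation with a `function = "module:attribute"` string, e.g. `heystac.rules:validate_core`. A common way to resolve such a reference is `importlib.import_module` plus `getattr`; this is a sketch of that general technique, not necessarily how **heystac** loads its rules, and `resolve_function` is a hypothetical name. Dotted attributes such as `module:Class.method` are not handled.

```python
import importlib
from typing import Any, Callable


def resolve_function(reference: str) -> Callable[..., Any]:
    """Resolve a "package.module:function" string to the callable it names (hypothetical)."""
    module_name, sep, attribute = reference.partition(":")
    if not sep or not module_name or not attribute:
        raise ValueError(f"expected 'module:function', got {reference!r}")
    module = importlib.import_module(module_name)
    function = getattr(module, attribute)
    if not callable(function):
        raise TypeError(f"{reference!r} does not name a callable")
    return function


# Works with any importable module, e.g. the standard library:
print(resolve_function("json:dumps")({"stac_version": "1.0.0"}))  # prints {"stac_version": "1.0.0"}
```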
Binary file modified img/home.png
19 changes: 16 additions & 3 deletions pyproject.toml
@@ -1,8 +1,11 @@
 [project]
 name = "heystac"
-version = "0.0.0"
-description = "Command-line interface (CLI) for heystac"
-readme = "README.md"
+authors = [{ name = "Pete Gadomski", email = "[email protected]" }]
+version = "0.0.1"
+description = "Command-line interface (CLI) to rate and crawl STAC"
+keywords = ["stac"]
+readme = "README-python.md"
+license = { file = "LICENSE.txt" }
 requires-python = ">=3.12"
 dependencies = [
     "click>=8.1.7",
@@ -14,6 +17,15 @@ dependencies = [
     "shapely>=2.0.6",
     "tabulate>=0.9.0",
     "tqdm>=4.67.0",
+    "toml>=0.10.2",
 ]
+classifiers = [
+    "Topic :: Scientific/Engineering :: GIS",
+    "Natural Language :: English",
+    "License :: OSI Approved :: MIT License",
+    "Programming Language :: Python",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+]
 
 [project.scripts]
@@ -30,6 +42,7 @@ dev = [
     "ruff>=0.7.3",
     "types-requests>=2.32.0.20241016",
     "types-tabulate>=0.9.0.20240106",
+    "types-toml>=0.10.8.20240310",
     "types-tqdm>=4.66.0.20240417",
 ]
 
5 changes: 5 additions & 0 deletions scripts/build
@@ -0,0 +1,5 @@
#!/usr/bin/env sh

set -e

yarn build
5 changes: 5 additions & 0 deletions scripts/lint
@@ -0,0 +1,5 @@
#!/usr/bin/env sh

set -e

. ./.husky/pre-commit
6 changes: 6 additions & 0 deletions scripts/setup
@@ -0,0 +1,6 @@
#!/usr/bin/env sh

set -e

yarn install
uv sync
5 changes: 5 additions & 0 deletions scripts/start
@@ -0,0 +1,5 @@
#!/usr/bin/env sh

set -e

yarn dev
6 changes: 6 additions & 0 deletions scripts/test
@@ -0,0 +1,6 @@
#!/usr/bin/env sh

set -e

uv run pytest
# yarn test
42 changes: 40 additions & 2 deletions src/heystac/cli.py
@@ -5,7 +5,9 @@
 
 import click
 import httpx
+import pystac.validation
 import tabulate
+import toml
 from click import Context
 
 from .check import Check
@@ -15,11 +17,22 @@
 
 
 @click.group()
+@click.option(
+    "-c",
+    "--config",
+    help="Path to a heystac configuration TOML. If not provided, config will be read from heystac.toml in the current directory",
+    type=click.Path(exists=True, dir_okay=False, path_type=Path),
+)
 @click.pass_context
-def cli(ctx: Context) -> None:
+def cli(ctx: Context, config: Path | None) -> None:
     """Crawl and rate STAC catalogs"""
     ctx.ensure_object(dict)
-    ctx.obj["config"] = Config()
+    if config:
+        with open(config) as f:
+            data = toml.load(f)
+        ctx.obj["config"] = Config.model_validate(data)
+    else:
+        ctx.obj["config"] = Config()
 
 
 @click.command()
@@ -121,9 +134,34 @@ def rate_catalog(ctx: Context, exclude: list[str]) -> None:
     node.write_to(config.catalog.path)
 
 
+@click.command
+@click.argument("href")
+def validate(href: str) -> None:
+    """Validate a STAC value with json-schema."""
+    url = urllib.parse.urlparse(href)
+    if url.scheme:
+        response = httpx.get(href)
+        response.raise_for_status()
+        data = response.json()
+    else:
+        with open(href) as f:
+            data = json.load(f)
+    pystac.validation.validate_dict(data)
+
+
+@click.command
+@click.pass_context
+def config(ctx: Context) -> None:
+    """Print the current configuration to standard output."""
+    config: Config = ctx.obj["config"]
+    toml.dump(json.loads(config.model_dump_json()), sys.stdout)
+
+
 cli.add_command(crawl)
 cli.add_command(rate)
 cli.add_command(rate_catalog)
+cli.add_command(validate)
+cli.add_command(config)
 
 if __name__ == "__main__":
     cli()