Merge pull request #6 from slott56/dev/v5
Dev/v5 to Main
slott56 authored May 26, 2022
2 parents 6be63d1 + dc366ec commit 7e9a46c
Showing 502 changed files with 47,169 additions and 88,265 deletions.
12 changes: 11 additions & 1 deletion .gitignore
@@ -7,7 +7,17 @@

.idea/*
.tox/*
docs/html/.doctrees/
.notes
.ipynb_checkpoints/*
.mypy_cache/*
.coverage
stingray/__pycache__/*
demo/__pycache__/*
tests/__pycache__/*
stingray.egg-info/*
build/lib/*

v4.5/*
docs/build/doctrees/*
*.doctree
*.doctree
14 changes: 14 additions & 0 deletions Makefile
@@ -0,0 +1,14 @@
# Stingray Reader
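.PHONY: test docs docs-coverage apidoc_gen

# $(CURDIR) is the directory make runs in (the project root).
# A bare $(pwd) would be expanded by make as an empty variable.

test:
	tox --skip-missing-interpreters

docs:
	cd docs && PYTHONPATH=$(CURDIR) $(MAKE) html

docs-coverage:
	cd docs && PYTHONPATH=$(CURDIR) SPHINXOPTS="-b coverage" $(MAKE) html

apidoc_gen:
	PYTHONPATH=$(CURDIR) sphinx-apidoc --separate -o apidoc stingray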
79 changes: 53 additions & 26 deletions README.rst
@@ -3,50 +3,77 @@ The Stingray Schema-Based File Reader
================================================================

Spreadsheet format files are the *lingua franca* of data processing.
CSV, Tab, XLS, XLSX and ODS files are used widely. Python's ``csv`` module
and the XLRD project (http://www.lexicon.net/sjmachin/xlrd.htm) help
us handle spreadsheet files.
CSV, Tab, XLS, XLSX and ODS files are used widely. Python's ``csv``
module handles two common formats. Add-on packages are required for the
variety of other physical file formats.

By themselves, however, they aren't a very complete solution.
The problem is that each add-on package has a unique view of the underlying
data.

The Stingray Schema-Based File Reader offers several features to help
process files in spreadsheet formats.

1. It wraps `csv`, `xlrd`, plus several XML parsers into a single, unified
"workbook" structure to make applications that work with any
of the common physical formats.
1. It wraps format-specific modules with a unified
"workbook" Facade so that applications can work with any
of the physical formats.

2. It extends the workbook to include fixed format files (with no delimiters)
and even COBOL files in EBCDIC.
2. It extends the workbook concept to include non-delimited files, including
COBOL files encoded in any of the Unicode encodings, as well as ASCII and EBCDIC.

3. It provides a uniform way to load and use schema information. This can
be header rows in the individual sheets of a workbook, or it can be separate
schema information.
3. It provides a uniform way to load and use schema information based on JSONSchema.
A schema can be as small as header rows in the individual sheets of a workbook, or it can be separate
schema information in another spreadsheet, a JSONSchema document, or COBOL "copybook"
data definitions.

4. It provides a suite of data conversions that cover the most common cases.
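
As a rough illustration of item 3, a header row can be turned into a small JSONSchema document. This is a minimal sketch and not Stingray's API: ``schema_from_header`` is a hypothetical helper, and treating every column as a required string is an assumption made only for the example.

```python
import csv
import io
import json


def schema_from_header(header):
    """Build a JSONSchema object schema from a list of column names.

    Hypothetical helper for illustration; not part of Stingray.
    Every column is assumed to hold a string.
    """
    return {
        "type": "object",
        "properties": {name: {"type": "string"} for name in header},
        "required": list(header),
    }


# A tiny in-memory CSV "sheet" whose first row is the header.
source = io.StringIO("name,city\nAda,London\n")
reader = csv.reader(source)
header = next(reader)
schema = schema_from_header(header)

# The remaining rows become dictionaries keyed by the header names.
rows = [dict(zip(header, row)) for row in reader]

print(json.dumps(schema, indent=2))
```

A real application would likely refine the per-column types; the point here is only that a header row carries enough information to act as a minimal schema.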

Additionally, stingray provides some guidance on how to structure
Additionally, the Stingray Reader provides some guidance on how to structure
file-processing applications so that they are testable and composable.

Stingray 4.5 requires Python >=3.5
Stingray 5.0 requires Python >= 3.9. The code is fully annotated with type hints.

It depends on this project to read .XLS files:
This depends on additional projects to read .XLS, .XLSX, .ODS, and .NUMBERS files.

- xlrd. http://www.lexicon.net/sjmachin/xlrd.htm
- CSV files are built-in using the ``csv`` module.

Changes
=======
- COBOL files are built-in using the ``estruct`` and ``cobol_parser`` modules.

If you want to build from scratch and create documentation, you'll need these
other two projects:
- NDJSON or JSON Newline files are JSON with an extra provision that each document must be complete on one physical line.
These use the built-in ``json`` module.

- PyLit3. https://github.com/slott56/PyLit-3
- XLS files can be read via the ``xlrd`` project: http://www.lexicon.net/sjmachin/xlrd.htm

- Sphinx. http://sphinx.pocoo.org/
- ODS and XLSX can be read via two projects: https://openpyxl.readthedocs.io/en/stable/ and http://docs.pyexcel.org/en/v0.0.6-rc2/.

Since Stingray is a *Literate Programming* project, the documentation is also
the source. And vice-versa.
- Numbers (v13 and higher) uses protobuf and snappy compression. See https://pypi.org/project/numbers-parser/.

The ``build.py`` runs **PyLit3** to convert the RST docs to Python as well
as HTML.
- YAML files can be a sequence of documents, permitting a direct mapping to a Workbook with a single Sheet.

- TOML files are -- in effect -- giant dictionaries with flexible syntax and can be described by a JSONSchema.

- XML files can be wrapped in a Workbook. There's no automated translation from XSD to JSONSchema here.
A sample is provided, but this may not solve very many problems in general.
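
The NDJSON rule in the list above (each document complete on one physical line) can be sketched with the built-in ``json`` module alone. This is an illustrative reader, not Stingray's implementation; ``ndjson_rows`` is a hypothetical name, and skipping blank lines is an assumption.

```python
import io
import json


def ndjson_rows(lines):
    """Yield one decoded JSON document per non-blank physical line.

    Hypothetical helper for illustration; not part of Stingray.
    """
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Two documents, each complete on its own line.
sample = io.StringIO('{"x": 1, "y": "a"}\n{"x": 2, "y": "b"}\n')
documents = list(ndjson_rows(sample))
```

Each decoded document maps naturally onto one row of a single-sheet workbook.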

A file-suffix registry is used to map a suffix to a Workbook subclass that handles the physical format.
A decorator is used to add or replace file suffix mappings.
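
The registry-plus-decorator idea can be sketched as follows. All names here (``SUFFIX_REGISTRY``, ``file_suffix``, ``open_workbook``, and the stub classes) are hypothetical and chosen for illustration; Stingray's actual names and signatures may differ.

```python
from pathlib import Path

# Hypothetical registry mapping a lowercase file suffix to a Workbook subclass.
SUFFIX_REGISTRY = {}


def file_suffix(*suffixes):
    """Class decorator that adds or replaces suffix-to-class mappings."""
    def decorator(cls):
        for suffix in suffixes:
            SUFFIX_REGISTRY[suffix.lower()] = cls
        return cls
    return decorator


class Workbook:
    """Stub base class; a real Workbook would open and read the file."""
    def __init__(self, path):
        self.path = Path(path)


@file_suffix(".csv")
class CSVWorkbook(Workbook):
    pass


@file_suffix(".ndjson", ".jsonl")
class NDJSONWorkbook(Workbook):
    pass


def open_workbook(path):
    """Look up the Workbook subclass registered for the path's suffix."""
    suffix = Path(path).suffix.lower()
    try:
        cls = SUFFIX_REGISTRY[suffix]
    except KeyError:
        raise ValueError(f"no Workbook registered for {suffix!r}") from None
    return cls(path)
```

Because the decorator simply overwrites the dictionary entry, decorating a second class with the same suffix replaces the earlier mapping, which is one way an application could substitute its own handler for a physical format.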

Environment Setup
=================

Here's an example of building a working environment using
Miniconda. See https://conda.io/miniconda.html for more information
on how to use this environment management tool. With ``conda``
all commands are the same on Windows, Linux, and macOS.

::

    conda create -n stingray python=3.9
    conda activate stingray
    python -m pip install tox
    python -m pip install --requirement requirements.txt

This makes sure everything you need is in a tidy, self-contained
environment.

This can also be done entirely with **PIP**. A virtual environment
is strongly encouraged to make sure the dependencies are all installed properly.