Commit

feat(ts-etl): Add etl cli (#42)
* add github actions with templates
* add eslint config and prettier config
* add initial passing test for a crossref API call
* add integration test and reduce object size
* update issue template
* add documentations
* add usage to readme
* include reference from sdk lib readme to etl
* add to release train
* update pnpm-lock
ships authored May 16, 2023
1 parent d003311 commit 44a6042
Showing 31 changed files with 4,892 additions and 109 deletions.
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -10,6 +10,7 @@ Packages affected:

 - [ ] OWL/SHACL definitions
 - [ ] ts-sdk
+- [ ] ts-etl

 ### Expected behavior

1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/documentation.md
@@ -10,6 +10,7 @@ Packages related to documentation request:

 - [ ] OWL/SHACL definitions
 - [ ] ts-sdk
+- [ ] ts-etl

 ### Description

1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -10,6 +10,7 @@ Packages to improve:

 - [ ] OWL/SHACL definitions
 - [ ] ts-sdk
+- [ ] ts-etl

 ### Description

3 changes: 3 additions & 0 deletions .github/workflows/release.yaml
@@ -17,11 +17,14 @@ env:
 jobs:
   test-ts-sdk:
     uses: ./.github/workflows/ts-sdk-tests.yaml
+  test-ts-etl:
+    uses: ./.github/workflows/ts-etl-tests.yaml
   test-specification:
     uses: ./.github/workflows/specification-tests.yaml
   nodejs_release:
     needs:
       - test-ts-sdk
+      - test-ts-etl
       - test-specification

     runs-on: ubuntu-latest
67 changes: 67 additions & 0 deletions .github/workflows/ts-etl-tests.yaml
@@ -0,0 +1,67 @@
name: Test ts-etl

on:
  push:
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:
  workflow_call:

env:
  PKG_DIR: "packages/ts-etl"

jobs:
  nodejs_test:
    runs-on: ubuntu-latest

    strategy:
      matrix:
        node-version: [18.14.0]

    steps:
      - uses: actions/checkout@v3

      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.node-version }}

      - uses: pnpm/action-setup@v2
        name: Install pnpm
        id: pnpm-install
        with:
          version: 7
          run_install: false

      - name: Get pnpm store directory
        id: pnpm-cache
        shell: bash
        run: |
          echo "STORE_PATH=$(pnpm store path)" >> $GITHUB_OUTPUT
      - uses: actions/cache@v3
        name: Setup pnpm cache
        with:
          path: ${{ steps.pnpm-cache.outputs.STORE_PATH }}
          key: ${{ runner.os }}-pnpm-store-${{ hashFiles('${{env.PKG_DIR}}/pnpm-lock.yaml') }}
          restore-keys: |
            ${{ runner.os }}-pnpm-store-
      - name: Install dependencies
        run: |
          cd ${{env.PKG_DIR}} ;
          pnpm install;
      - name: Verify builds
        run: |
          cd ${{env.PKG_DIR}} ;
          pnpm build;
      - name: Test
        run: |
          cd ${{env.PKG_DIR}} ;
          pnpm test;
      - name: Lint Check
        run: |
          cd ${{env.PKG_DIR}} ;
          pnpm lint;
14 changes: 8 additions & 6 deletions .github/workflows/ts-sdk-tests.yaml
@@ -6,6 +6,9 @@ on:
   workflow_dispatch:
   workflow_call:

+env:
+  PKG_DIR: "packages/ts-sdk"
+
 jobs:
   nodejs_test:
     runs-on: ubuntu-latest
@@ -33,33 +36,32 @@ jobs:
         id: pnpm-cache
         shell: bash
         run: |
-          cd packages/ts-sdk;
           echo "STORE_PATH=$(pnpm store path)" >> $GITHUB_OUTPUT
       - uses: actions/cache@v3
         name: Setup pnpm cache
         with:
           path: ${{ steps.pnpm-cache.outputs.STORE_PATH }}
-          key: ${{ runner.os }}-pnpm-store-${{ hashFiles('packages/ts-sdk/pnpm-lock.yaml') }}
+          key: ${{ runner.os }}-pnpm-store-${{ hashFiles('${{env.PKG_DIR}}/pnpm-lock.yaml') }}
           restore-keys: |
             ${{ runner.os }}-pnpm-store-
       - name: Install dependencies
         run: |
-          cd packages/ts-sdk;
+          cd ${{env.PKG_DIR}} ;
           pnpm install;
       - name: Verify builds
         run: |
-          cd packages/ts-sdk;
+          cd ${{env.PKG_DIR}} ;
           pnpm build;
       - name: Test
         run: |
-          cd packages/ts-sdk;
+          cd ${{env.PKG_DIR}} ;
           pnpm test;
       - name: Lint Check
         run: |
-          cd packages/ts-sdk;
+          cd ${{env.PKG_DIR}} ;
           pnpm lint;
9 changes: 8 additions & 1 deletion README.md
@@ -35,8 +35,15 @@ that library natively integrates with `fp-ts` and enables easy encoding & decodi
 from raw data types at runtime by creating Prototypical classes in runtime namespace
 along with the types/interfaces in type namespace.

+### [ts-etl](/packages/ts-etl)
+
+This package contains a CLI tool based on `commander.js` for generating docmaps. Currently,
+it supports generating a docmap for a given DOI if that DOI is indexed on Crossref, and
+will traverse the Crossref API to find related preprints and reviews for that DOI. It is
+still in a pre-release state while we gather feedback.
+
 ## Governance

-As stated in CODE_OF_CONDUCT.md:
+As stated in [CODE_OF_CONDUCT.md](/CODE_OF_CONDUCT.md):

 This project is governed by the [Knowledge Futures, Inc Organizational Code of Conduct](https://github.com/knowledgefutures/general/blob/master/CODE_OF_CONDUCT.md).
2 changes: 1 addition & 1 deletion package.json
@@ -12,7 +12,7 @@
     "@rdfjs/formats-common": "^3.1.0",
     "@rdfjs/parser-jsonld": "^2.1.0",
     "@rdfjs/parser-n3": "^2.0.1",
-    "@rdfjs/serializer-turtle": "^1.0.1",
+    "@rdfjs/serializer-turtle": "^1.1.1",
     "rdf-ext": "^2.2.0",
     "rdf-validate-shacl": "^0.4.5"
   },
22 changes: 22 additions & 0 deletions packages/ts-etl/.eslintrc.cjs
@@ -0,0 +1,22 @@
module.exports = {
  extends: [
    'eslint:recommended',
    'plugin:@typescript-eslint/recommended',
    'plugin:@typescript-eslint/eslint-recommended',
    'plugin:prettier/recommended',
  ],
  parser: '@typescript-eslint/parser',
  plugins: ['@typescript-eslint', 'prettier'],
  root: true,
  ignorePatterns: ['dist/'],
  rules: {
    '@typescript-eslint/no-unused-vars': [
      'error',
      {
        varsIgnorePattern: '^_',
        argsIgnorePattern: '^_',
        caughtErrorsIgnorePattern: '^_',
      },
    ],
  },
}
8 changes: 8 additions & 0 deletions packages/ts-etl/.prettierrc.cjs
@@ -0,0 +1,8 @@
module.exports = {
  semi: false,
  trailingComma: 'all',
  singleQuote: true,
  quoteProps: 'as-needed',
  printWidth: 100,
  tabWidth: 2,
}
59 changes: 59 additions & 0 deletions packages/ts-etl/CONTRIBUTING.md
@@ -0,0 +1,59 @@
# Contributing to docmaps ts-etl

We welcome contributions from anyone interested in improving the docmaps project! Before you get started, please read through the guidelines below to ensure that your contributions are effective and useful.

## Workflow
1. Fork the repository and clone it locally, or create a branch.
2. [Recommended] Install pnpm if you haven't already: `npm install -g pnpm`
3. Run `pnpm install` in the package directory to install all dependencies for the project.
4. Add, commit, and push your changes to your fork/branch.
5. Submit a pull request (PR) to the main branch of the `docmaps-project/docmaps` repository.

## Contributing Guidelines
1. Follow the code of conduct.
2. Before starting any work, make sure to check the issues and pull requests to see if your contribution has already been discussed or implemented.
3. If you are working on a new feature or bug fix, create a new issue to discuss it with the maintainers and other contributors.
4. Before submitting a PR, make sure your code is properly formatted, tested, and documented.
5. Make sure your commit messages are descriptive and follow the Conventional Commits format, written in the imperative mood. Your PR will be squash-merged.

Write new tests to cover any new functionality or bug fixes.

## Code Review
All PRs will be reviewed by at least one maintainer or contributor.
Reviewers may request changes or ask for clarifications on the PR.
Once the changes have been made, the PR will be merged by a maintainer or contributor.

## Local development

[`nvm`](https://github.com/nvm-sh/nvm) is a good local Node version manager.

```bash
nvm use 18.14.0
```

We recommend `pnpm` for best performance. Alternatively, you can use `npm`.

```bash
pnpm install
pnpm test && pnpm build
```

If these exit zero, you're good to get started with your changes.

## Tests

Tests are written BDD-style. You should make meaningful assertions that cover
any new complex logic. You don't have to cover every possible case. It is recommended
to follow the red-green-refactor pattern by writing tests first. As a rule of thumb,
if your code change can be reverted while leaving your test changes in place, and the
suites still pass, your test coverage or specificity should be increased.
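
As a rough illustration of that rule of thumb, here is a dependency-free sketch. The function `normalizeDoi` and the `check` helper are hypothetical and not part of this package; in the real suites each check would be an AVA `test(...)` callback using assertions such as `t.is`.

```typescript
// Hypothetical unit under test (an assumption for illustration, not package code).
function normalizeDoi(input: string): string {
  return input
    .trim()
    .replace(/^https?:\/\/(dx\.)?doi\.org\//i, '')
    .toLowerCase()
}

// Tiny stand-in for a test runner so the sketch has no dependencies.
const failures: string[] = []
function check(behavior: string, assertion: () => boolean): void {
  if (!assertion()) failures.push(behavior)
}

// Each description names a behavior, BDD-style.
check(
  'strips a doi.org resolver prefix',
  () => normalizeDoi('https://doi.org/10.5194/angeo-40-247-2022') === '10.5194/angeo-40-247-2022',
)
check(
  'lowercases the DOI, since DOIs are case-insensitive',
  () => normalizeDoi('10.5194/ANGEO-40-247-2022') === '10.5194/angeo-40-247-2022',
)
```

If the `.toLowerCase()` call were reverted, the second check would fail, which is the property the rule of thumb asks for: each behavior change is pinned by at least one assertion.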

**Hanging tests.**
Tests are run using [AVA](https://github.com/avajs/ava), which has a much smaller dependency footprint than Jest.
However, it runs `tsc` in a hidden way, so if compilation fails you will get `Timed out while running tests`
rather than a useful error. Diagnose this issue by running `pnpm build` yourself to get a better error message.

Every PR is validated by a GitHub Actions workflow for EVERY package in the repo, not just the
one you are developing on.

Thanks for contributing!
46 changes: 46 additions & 0 deletions packages/ts-etl/README.md
@@ -0,0 +1,46 @@
# Extract-Transform-Load CLI for Docmaps

This TypeScript library is designed to provide core, highly-general docmaps
functionality for ease-of-use in TypeScript. It provides out-of-the-box
validation of JSON-LD documents interpreted as docmaps directly. It is intended
to additionally support validation of Docmap sub-elements, such as individual
Actions or Actors that might be published separately from a whole Docmap. It
will also be integrated into concrete tools such as a docmap-from-meca ETL pipeline
and general visualization tools.

## Usage

In this repository:

```bash
pnpm install # or npm install
pnpm start item --source crossref-api 10.5194/angeo-40-247-2022 # or npm start
```
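
As a rough illustration of what the `item` command has to do with its positional argument, the sketch below checks that the input looks like a DOI and builds the public Crossref REST API URL for that work (`https://api.crossref.org/works/{doi}`). The helper names `isLikelyDoi` and `crossrefWorksUrl` are hypothetical and not part of this package; the actual tool goes through `crossref-openapi-client-ts` rather than building URLs by hand.

```typescript
// Loose DOI shape check: "10.", a 4-9 digit registrant code, "/", then a suffix.
const DOI_PATTERN = /^10\.\d{4,9}\/\S+$/

// Hypothetical helper: returns true when the input has the general shape of a DOI.
function isLikelyDoi(input: string): boolean {
  return DOI_PATTERN.test(input.trim())
}

// Hypothetical helper: builds the Crossref REST API URL for a single work.
function crossrefWorksUrl(doi: string): string {
  if (!isLikelyDoi(doi)) {
    throw new Error(`not a DOI: ${doi}`)
  }
  // Encode each path segment separately so the "/" inside the DOI is preserved.
  const path = doi.trim().split('/').map(encodeURIComponent).join('/')
  return `https://api.crossref.org/works/${path}`
}

// exampleUrl is "https://api.crossref.org/works/10.5194/angeo-40-247-2022"
const exampleUrl = crossrefWorksUrl('10.5194/angeo-40-247-2022')
```

Validating before any network call keeps malformed input from turning into a confusing API error later on.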

## Implementation

This tool and library are written using the [`docmaps-sdk` package](/packages/ts-sdk)
in this repository, as well as the [`crossref-openapi-client-ts`](https://github.com/Docmaps-Project/crossref-openapi-client-ts)
client, also maintained by Knowledge Futures, Inc. As seen in [`src/crossref.ts`](src/crossref.ts),
Codecs from the SDK are processed using functional paradigms provided conveniently by
`fp-ts`.
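
The following is a minimal sketch of that style, assuming nothing about the SDK's real codecs. It hand-rolls a tiny `Either` type shaped like the one in `fp-ts` so that it runs without dependencies; `RawWork`, `Publication`, `decodeDoi` and `toPublication` are invented for illustration and do not appear in `src/crossref.ts`.

```typescript
// Minimal Either, mirroring the tagged-union shape used by fp-ts (not imported here).
type Either<E, A> = { _tag: 'Left'; left: E } | { _tag: 'Right'; right: A }

const left = <E, A = never>(e: E): Either<E, A> => ({ _tag: 'Left', left: e })
const right = <A, E = never>(a: A): Either<E, A> => ({ _tag: 'Right', right: a })

// chain: run the next step only if the previous one succeeded.
const chain =
  <E, A, B>(f: (a: A) => Either<E, B>) =>
  (ma: Either<E, A>): Either<E, B> =>
    ma._tag === 'Left' ? ma : f(ma.right)

// Hypothetical input and output shapes (assumptions, not SDK types).
interface RawWork {
  DOI?: unknown
  title?: unknown
}
interface Publication {
  doi: string
  title: string
}

// Decoding step: succeeds with the DOI or fails with a message.
const decodeDoi = (raw: RawWork): Either<string, string> =>
  typeof raw.DOI === 'string' ? right(raw.DOI) : left('missing DOI')

// Compose decoding steps; the first failure short-circuits the rest.
const toPublication = (raw: RawWork): Either<string, Publication> =>
  chain((doi: string) =>
    typeof raw.title === 'string'
      ? right<Publication, string>({ doi, title: raw.title })
      : left<string, Publication>('missing title'),
  )(decodeDoi(raw))
```

Errors are carried as values instead of being thrown, so a caller can decide in one place whether a failed decode should abort the run or just skip a record.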

## Documentation

Documentation is comments-only for now. See [relevant issue](https://github.com/Docmaps-Project/docmaps/issues/20).

## Contributing

For Code of Conduct, see the repository-wide [CODE_OF_CONDUCT.md](/CODE_OF_CONDUCT.md).

For info about local development of this repository, see [CONTRIBUTING.md](CONTRIBUTING.md).

## Current next steps

Review the issues on this repository for up-to-date information on desired improvements.
There are also expressive TODOs in the codebase.
Here are some examples:

- [ ] Enable direct configuration of the publisher information for generated Docmaps.
- [ ] Handle paginated requests for efficient parallel processing.
- [ ] Make the ETL interface generic enough to handle at least one data source other than Crossref.
63 changes: 63 additions & 0 deletions packages/ts-etl/package.json
@@ -0,0 +1,63 @@
{
  "name": "@docmaps/etl",
  "version": "0.1.2",
  "description": "ETL tool for Docmaps",
  "type": "module",
  "main": "dist/index.js",
  "scripts": {
    "test": "ava",
    "clean": "rm -rf dist/",
    "test:integration": "ava test/integration/",
    "test:unit": "ava test/unit/",
    "start": "node --loader=ts-node/esm --experimental-specifier-resolution=node --nolazy -r ts-node/register/transpile-only src/cli.ts",
    "lint": "npx eslint .",
    "lint:fix": "npx eslint --fix .",
    "prepare": "tsc --declaration",
    "build": "tsc"
  },
  "bin": {
    "docmaps-etl": "dist/cli.js"
  },
  "keywords": [],
  "author": "eve github.com/ships",
  "license": "ISC",
  "files": [
    "dist/",
    "README.md",
    "package.json",
    "tsconfig.json"
  ],
  "dependencies": {
    "@commander-js/extra-typings": "^10.0.3",
    "commander": "^10.0.1",
    "crossref-openapi-client-ts": "^1.3.0",
    "docmaps-sdk": "^0.5.1",
    "fp-ts": "^2.14.0"
  },
  "devDependencies": {
    "@tsconfig/node-lts-strictest-esm": "^18.12.1",
    "@types/node": "^18.16.2",
    "@typescript-eslint/eslint-plugin": "^5.59.1",
    "@typescript-eslint/parser": "^5.59.1",
    "ava": "^5.2.0",
    "eslint": "^8.39.0",
    "eslint-config-prettier": "^8.8.0",
    "eslint-plugin-prettier": "^4.2.1",
    "prettier": "^2.8.8",
    "ts-mockito": "^2.6.1",
    "ts-node": "^10.9.1",
    "typescript": "^4.9.5"
  },
  "ava": {
    "extensions": {
      "ts": "module"
    },
    "nodeArguments": [
      "--loader=ts-node/esm",
      "--experimental-specifier-resolution=node"
    ],
    "files": [
      "**/*.test.ts"
    ]
  }
}