Add etl cli #42

Merged
merged 48 commits into from
May 16, 2023
Commits
fbf0233
Package initialization for ETL CLI
ships Mar 29, 2023
722556b
add github actions with templates
ships Mar 29, 2023
7458fe6
amend use of template workflow
ships Mar 29, 2023
3174521
fix new pipeline as well
ships Mar 29, 2023
e5e9ae3
add eslint config and prettier config
ships Mar 29, 2023
f8586d7
run lint fix
ships Mar 29, 2023
6b7faad
add initial passing test for a crossref API call
ships Mar 30, 2023
b52aa4c
lint fix
ships Apr 3, 2023
3432398
refactor test fixture setup
ships Apr 3, 2023
7e1d223
fix test running for ts-etl
ships Apr 12, 2023
7846d18
add relation type to filter query when paging
ships Apr 13, 2023
9e482e0
add initial algorithm for extracting crossref
ships Apr 25, 2023
ed632ee
fix: correct ordering of args in call
ships Apr 25, 2023
2788c00
add missing src/types file to git
ships Apr 26, 2023
d90ed3c
add integration test and reduce object size
ships Apr 27, 2023
1648e3b
lint:fix
ships Apr 27, 2023
c8b0aef
finish fixing lint
ships Apr 27, 2023
ad52b80
fix lint again
ships Apr 27, 2023
0349bad
update issue template
ships May 1, 2023
baf77bb
refactor: extract creation of Action from a Work
ships May 1, 2023
04629a0
refactor: convert step array to docmap body
ships May 2, 2023
561f5c6
handle extra recursive cases
ships May 3, 2023
1b8d6d2
lint fix
ships May 3, 2023
193f9ec
update README
ships May 3, 2023
3bd382a
add documentations
ships May 3, 2023
2d81b50
add usage to readme
ships May 3, 2023
680229b
Update crossref.ts
ships May 4, 2023
eaf8c8a
use helper constructor
ships May 4, 2023
9871f01
syntax cleanup
ships May 8, 2023
9c9e1f8
more refactors
ships May 8, 2023
61f6603
restructure cli directory for plugins
ships May 9, 2023
87afab5
cleaner error handling in parsing of actions
ships May 9, 2023
73364aa
lint fix
ships May 9, 2023
abcf20b
add flags for publisher
ships May 9, 2023
d91082e
include reference from sdk lib readme to etl
ships May 9, 2023
c938d5e
adds some comments
ships May 9, 2023
283732a
add to release train
ships May 10, 2023
99f58ef
update pnpm-lock
ships May 10, 2023
12f4210
move things around for better exports
ships May 10, 2023
b05b811
run release on this branch to attempt to release etl
ships May 10, 2023
0443dbf
lint fix
ships May 10, 2023
754c600
enable workflow call
ships May 10, 2023
e318c34
change releaserc to test releasing etl-docmaps package
ships May 11, 2023
e0d77e9
move devDependency to dependency
ships May 11, 2023
46ef9d6
update docmaps-sdk dep
ships May 11, 2023
ce282a8
update docmaps-etl
ships May 11, 2023
26f799f
patch packaging
ships May 15, 2023
12d52de
no longer release from feature branch
ships May 16, 2023
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -10,6 +10,7 @@ Packages affected:

 - [ ] OWL/SHACL definitions
 - [ ] ts-sdk
+- [ ] ts-etl

 ### Expected behavior

1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/documentation.md
@@ -10,6 +10,7 @@ Packages related to documentation request:

 - [ ] OWL/SHACL definitions
 - [ ] ts-sdk
+- [ ] ts-etl

 ### Description

1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -10,6 +10,7 @@ Packages to improve:

 - [ ] OWL/SHACL definitions
 - [ ] ts-sdk
+- [ ] ts-etl

 ### Description

3 changes: 3 additions & 0 deletions .github/workflows/release.yaml
@@ -17,11 +17,14 @@ env:
 jobs:
   test-ts-sdk:
     uses: ./.github/workflows/ts-sdk-tests.yaml
+  test-ts-etl:
+    uses: ./.github/workflows/ts-etl-tests.yaml
   test-specification:
     uses: ./.github/workflows/specification-tests.yaml
   nodejs_release:
     needs:
       - test-ts-sdk
+      - test-ts-etl
       - test-specification

     runs-on: ubuntu-latest
67 changes: 67 additions & 0 deletions .github/workflows/ts-etl-tests.yaml
@@ -0,0 +1,67 @@
name: Test ts-etl

on:
  push:
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:
  workflow_call:

env:
  PKG_DIR: "packages/ts-etl"

jobs:
  nodejs_test:
    runs-on: ubuntu-latest

    strategy:
      matrix:
        node-version: [18.14.0]

    steps:
      - uses: actions/checkout@v3

      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.node-version }}

      - uses: pnpm/action-setup@v2
        name: Install pnpm
        id: pnpm-install
        with:
          version: 7
          run_install: false

      - name: Get pnpm store directory
        id: pnpm-cache
        shell: bash
        run: |
          echo "STORE_PATH=$(pnpm store path)" >> $GITHUB_OUTPUT

      - uses: actions/cache@v3
        name: Setup pnpm cache
        with:
          path: ${{ steps.pnpm-cache.outputs.STORE_PATH }}
          key: ${{ runner.os }}-pnpm-store-${{ hashFiles('${{env.PKG_DIR}}/pnpm-lock.yaml') }}
          restore-keys: |
            ${{ runner.os }}-pnpm-store-

      - name: Install dependencies
        run: |
          cd ${{env.PKG_DIR}} ;
          pnpm install;

      - name: Verify builds
        run: |
          cd ${{env.PKG_DIR}} ;
          pnpm build;

      - name: Test
        run: |
          cd ${{env.PKG_DIR}} ;
          pnpm test;

      - name: Lint Check
        run: |
          cd ${{env.PKG_DIR}} ;
          pnpm lint;
14 changes: 8 additions & 6 deletions .github/workflows/ts-sdk-tests.yaml
@@ -6,6 +6,9 @@ on:
   workflow_dispatch:
   workflow_call:

+env:
+  PKG_DIR: "packages/ts-sdk"
+
 jobs:
   nodejs_test:
     runs-on: ubuntu-latest
@@ -33,33 +36,32 @@ jobs:
         id: pnpm-cache
         shell: bash
         run: |
-          cd packages/ts-sdk;
           echo "STORE_PATH=$(pnpm store path)" >> $GITHUB_OUTPUT

       - uses: actions/cache@v3
         name: Setup pnpm cache
         with:
           path: ${{ steps.pnpm-cache.outputs.STORE_PATH }}
-          key: ${{ runner.os }}-pnpm-store-${{ hashFiles('packages/ts-sdk/pnpm-lock.yaml') }}
+          key: ${{ runner.os }}-pnpm-store-${{ hashFiles('${{env.PKG_DIR}}/pnpm-lock.yaml') }}
           restore-keys: |
             ${{ runner.os }}-pnpm-store-

       - name: Install dependencies
         run: |
-          cd packages/ts-sdk;
+          cd ${{env.PKG_DIR}} ;
           pnpm install;

       - name: Verify builds
         run: |
-          cd packages/ts-sdk;
+          cd ${{env.PKG_DIR}} ;
           pnpm build;

       - name: Test
         run: |
-          cd packages/ts-sdk;
+          cd ${{env.PKG_DIR}} ;
           pnpm test;

       - name: Lint Check
         run: |
-          cd packages/ts-sdk;
+          cd ${{env.PKG_DIR}} ;
           pnpm lint;
9 changes: 8 additions & 1 deletion README.md
@@ -35,8 +35,15 @@ that library natively integrates with `fp-ts` and enables easy encoding & decodi
 from raw data types at runtime by creating Prototypical classes in runtime namespace
 along with the types/interfaces in type namespace.

+### [ts-etl](/packages/ts-etl)
+
+This package contains a CLI tool based on `commander.js` for generating docmaps. Currently,
+it supports generating a docmap for a given DOI if that DOI is indexed on Crossref, and
+will traverse the Crossref API to find related preprints and reviews for that DOI. It is
+still in a pre-release state while we gather feedback.
+
 ## Governance

-As stated in CODE_OF_CONDUCT.md:
+As stated in [CODE_OF_CONDUCT.md](/CODE_OF_CONDUCT.md):

 This project is governed by the [Knowledge Futures, Inc Organizational Code of Conduct](https://github.com/knowledgefutures/general/blob/master/CODE_OF_CONDUCT.md).
2 changes: 1 addition & 1 deletion package.json
@@ -12,7 +12,7 @@
     "@rdfjs/formats-common": "^3.1.0",
     "@rdfjs/parser-jsonld": "^2.1.0",
     "@rdfjs/parser-n3": "^2.0.1",
-    "@rdfjs/serializer-turtle": "^1.0.1",
+    "@rdfjs/serializer-turtle": "^1.1.1",
     "rdf-ext": "^2.2.0",
     "rdf-validate-shacl": "^0.4.5"
   },
22 changes: 22 additions & 0 deletions packages/ts-etl/.eslintrc.cjs
@@ -0,0 +1,22 @@
module.exports = {
extends: [
'eslint:recommended',
'plugin:@typescript-eslint/recommended',
'plugin:@typescript-eslint/eslint-recommended',
'plugin:prettier/recommended',
],
parser: '@typescript-eslint/parser',
plugins: ['@typescript-eslint', 'prettier'],
root: true,
ignorePatterns: ['dist/'],
rules: {
'@typescript-eslint/no-unused-vars': [
'error',
{
varsIgnorePattern: '^_',
argsIgnorePattern: '^_',
caughtErrorsIgnorePattern: '^_',
},
],
},
}
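
The `no-unused-vars` override in this config exempts any variable, argument, or caught error whose name starts with `_`. A minimal sketch of code that should pass under that rule; the function names here are hypothetical and not taken from the package:

```typescript
// `_index` is unused but kept for the callback's shape; the leading
// underscore matches argsIgnorePattern: '^_'.
function labels(dois: string[]): string[] {
  return dois.map((doi, _index) => `doi:${doi}`)
}

// `_item` is never read; it matches varsIgnorePattern: '^_'.
function countItems(items: Iterable<unknown>): number {
  let count = 0
  for (const _item of items) count += 1
  return count
}

// `_err` is deliberately discarded; it matches caughtErrorsIgnorePattern: '^_'.
function parseOrNull(json: string): unknown {
  try {
    return JSON.parse(json)
  } catch (_err) {
    return null
  }
}

console.log(labels(['10.1234/a']), countItems(['x', 'y']), parseOrNull('not json'))
```

Dropping the underscore from any of these three names would be expected to trigger the `error`-level rule.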
8 changes: 8 additions & 0 deletions packages/ts-etl/.prettierrc.cjs
@@ -0,0 +1,8 @@
module.exports = {
semi: false,
trailingComma: 'all',
singleQuote: true,
quoteProps: 'as-needed',
printWidth: 100,
tabWidth: 2,
}
59 changes: 59 additions & 0 deletions packages/ts-etl/CONTRIBUTING.md
@@ -0,0 +1,59 @@
# Contributing to docmaps ts-etl

We welcome contributions from anyone interested in improving the docmaps project! Before you get started, please read through the guidelines below to ensure that your contributions are effective and useful.

## Workflow
1. Fork the repository and clone it locally, or create a branch.
2. [Recommended] Install pnpm if you haven't already: `npm install -g pnpm`
3. Run `pnpm install` in the package directory to install all dependencies for the project.
4. Add, commit, and push your changes to your fork/branch.
5. Submit a pull request (PR) to the main branch of the `docmaps-project/docmaps` repository.

## Contributing Guidelines
1. Follow the code of conduct.
2. Before starting any work, make sure to check the issues and pull requests to see if your contribution has already been discussed or implemented.
3. If you are working on a new feature or bug fix, create a new issue to discuss it with the maintainers and other contributors.
4. Before submitting a PR, make sure your code is properly formatted, tested, and documented.
5. Make sure your commit messages are descriptive and follow the conventional commit format (imperative mood). Your PR will be merged with a squash.

Write new tests to cover any new functionality or bug fixes.

## Code Review
All PRs will be reviewed by at least one maintainer or contributor.
Reviewers may request changes or ask for clarifications on the PR.
Once the changes have been made, the PR will be merged by a maintainer or contributor.

## Local development

[`nvm`](https://github.com/nvm-sh/nvm) is a good local Node version manager.

```
nvm use 18.14.0
```

I recommend you use `pnpm` for best performance. Alternatively you can use `npm`.

```bash
pnpm install
pnpm test && pnpm build
```

If these exit zero, you're good to get started with your changes.

## Tests

Tests are written BDD-style. You should make meaningful assertions that cover
any new complex logic. You don't have to cover every possible case. It is recommended
to follow the red-green-refactor pattern by writing tests first. As a rule of thumb,
if your code change can be reverted while leaving your test changes in place, and the
suites still pass, your test coverage or specificity should be increased.
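
As a rough illustration of the rule of thumb above, here is a minimal sketch. It uses a tiny stand-in `scenario` helper rather than AVA so that it runs on its own; `normalizeDoi` is a hypothetical function, not part of this package.

```typescript
// Minimal stand-in for a test runner; the project itself uses AVA.
type Check = { name: string; passed: boolean }
const results: Check[] = []

function scenario(name: string, body: () => boolean): void {
  results.push({ name, passed: body() })
}

// Hypothetical function under test: strips a resolver prefix from a DOI.
function normalizeDoi(input: string): string {
  return input.replace(/^https?:\/\/(dx\.)?doi\.org\//, '')
}

// This scenario fails if normalizeDoi is reverted to a pass-through,
// so it is specific to the new behavior.
scenario('given a resolver URL, it returns the bare DOI', () => {
  return normalizeDoi('https://doi.org/10.5194/angeo-40-247-2022') === '10.5194/angeo-40-247-2022'
})

// This scenario guards existing behavior; on its own it would still pass
// after a revert, so it is not sufficient coverage for the change.
scenario('given a bare DOI, it leaves it unchanged', () => {
  return normalizeDoi('10.5194/angeo-40-247-2022') === '10.5194/angeo-40-247-2022'
})

for (const r of results) console.log(`${r.passed ? 'ok' : 'FAIL'} - ${r.name}`)
```

If only the second scenario existed, reverting the change would leave the suite green, which is the signal that coverage should be increased.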

**Hanging tests.**
Tests are run using [AVA](https://github.com/avajs/ava), which has a much smaller dependency footprint than Jest.
However, it runs `tsc` in a hidden way, so if compilation fails you will get `Timed out while running tests`
rather than a useful error. Diagnose this issue by running `pnpm build` yourself to get a better error message.

Every PR is validated by a GitHub Actions workflow for EVERY package in the repo, not just the
one you are developing on.

Thanks for contributing!
46 changes: 46 additions & 0 deletions packages/ts-etl/README.md
@@ -0,0 +1,46 @@
# Extract-Transform-Load CLI for Docmaps

This TypeScript library is designed to provide core, highly-general docmaps
functionality for ease-of-use in TypeScript. It provides out-of-the-box
validation of JSON-LD documents interpreted as docmaps directly. It is intended
to additionally support validation of Docmap sub-elements, such as individual
Actions or Actors that might be published separately from a whole Docmap. It
will also be integrated into concrete tools such as a docmap-from-meca ETL pipeline
and general visualization tools.

## Usage

In this repository:

```bash
pnpm install # or npm install
pnpm start item --source crossref-api 10.5194/angeo-40-247-2022 # or npm start
```
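
The repository-level README describes this command as traversing the Crossref API to find preprints and reviews related to a DOI. The following is a minimal sketch of that kind of traversal, not the package's actual algorithm: the `Work` shape, the `collectRelated` helper, and every DOI in the fixture are hypothetical, and an in-memory map stands in for the Crossref API.

```typescript
// Hypothetical, simplified stand-in for a Crossref work record.
type RelationType = 'has-preprint' | 'has-review'

interface Work {
  doi: string
  relations: Partial<Record<RelationType, string[]>>
}

// In-memory fixture standing in for the Crossref API (all DOIs are made up).
const fixture: Record<string, Work> = {
  '10.1234/article': {
    doi: '10.1234/article',
    relations: { 'has-preprint': ['10.1234/preprint'] },
  },
  '10.1234/preprint': {
    doi: '10.1234/preprint',
    relations: { 'has-review': ['10.1234/review-1', '10.1234/review-2'] },
  },
  '10.1234/review-1': { doi: '10.1234/review-1', relations: {} },
  '10.1234/review-2': { doi: '10.1234/review-2', relations: {} },
}

// Depth-first walk from a starting DOI, collecting each reachable work once.
function collectRelated(
  start: string,
  lookup: (doi: string) => Work | undefined,
  seen: Set<string> = new Set(),
): string[] {
  if (seen.has(start)) return [] // already visited: avoid cycles
  seen.add(start)
  const work = lookup(start)
  if (work === undefined) return [] // DOI not indexed: stop this branch
  const related = Object.values(work.relations).flatMap((dois) => dois ?? [])
  return [work.doi, ...related.flatMap((doi) => collectRelated(doi, lookup, seen))]
}

const found = collectRelated('10.1234/article', (doi) => fixture[doi])
console.log(found)
```

Passing the lookup as a function keeps the walk independent of the data source, so a test can substitute a fixture for a live client.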

## Implementation

This tool and library are written using the [`docmaps-sdk` package](/packages/ts-sdk)
in this repository, as well as the [`crossref-openapi-client-ts`](https://github.com/Docmaps-Project/crossref-openapi-client-ts)
package, also maintained by Knowledge Futures, Inc. As seen in [`src/crossref.ts`](src/crossref.ts),
Codecs from the SDK are processed using functional paradigms provided conveniently by
`fp-ts`.
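
To give a feel for that functional style, here is a minimal sketch that chains two decoding steps through an `Either`. It is an assumption-laden illustration, not the code in `src/crossref.ts`: the `Either` type and `chain` helper are hand-rolled stand-ins for what `fp-ts` provides, and `RawWork`, `Step`, and the mapping from Crossref work types to step kinds are hypothetical.

```typescript
// Hand-rolled Either, standing in for the fp-ts Either type.
type Either<E, A> = { _tag: 'Left'; left: E } | { _tag: 'Right'; right: A }

const left = <E, A = never>(e: E): Either<E, A> => ({ _tag: 'Left', left: e })
const right = <A, E = never>(a: A): Either<E, A> => ({ _tag: 'Right', right: a })

// Run `f` only if the previous step succeeded; otherwise carry the error forward.
const chain = <E, A, B>(fa: Either<E, A>, f: (a: A) => Either<E, B>): Either<E, B> =>
  fa._tag === 'Left' ? fa : f(fa.right)

// Hypothetical raw input and decoded output shapes.
interface RawWork {
  DOI?: unknown
  type?: unknown
}
type Kind = 'preprint' | 'review'
interface Step {
  doi: string
  kind: Kind
}

const decodeDoi = (raw: RawWork): Either<string, string> =>
  typeof raw.DOI === 'string' ? right<string>(raw.DOI) : left('missing DOI')

// Assumed mapping from Crossref work types to step kinds.
const decodeKind = (raw: RawWork): Either<string, Kind> => {
  if (raw.type === 'posted-content') return right<Kind>('preprint')
  if (raw.type === 'peer-review') return right<Kind>('review')
  return left(`unsupported type: ${String(raw.type)}`)
}

// Compose the steps; the first failure short-circuits the rest.
const toStep = (raw: RawWork): Either<string, Step> =>
  chain(decodeDoi(raw), (doi) => chain(decodeKind(raw), (kind) => right<Step>({ doi, kind })))

console.log(toStep({ DOI: '10.1234/review-1', type: 'peer-review' }))
console.log(toStep({ type: 'peer-review' }))
```

The benefit of this shape is that error handling lives in `chain` once, rather than in `if` checks after every decoding step.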

## Documentation

Documentation is comments-only for now. See [relevant issue](https://github.com/Docmaps-Project/docmaps/issues/20).

## Contributing

For Code of Conduct, see the repository-wide [CODE_OF_CONDUCT.md](/CODE_OF_CONDUCT.md).

For info about local development of this repository, see [CONTRIBUTING.md](CONTRIBUTING.md).

## Current next steps

Review the issues on this repository for up-to-date information on desired improvements.
There are also expressive TODOs in the codebase.
Here are some examples:

- [ ] Enable direct configuration of the publisher information for generated Docmaps
- [ ] Handle paginated requests for efficient parallel processing.
- [ ] Make the ETL interface generic enough to handle at least one other data source than Crossref.
63 changes: 63 additions & 0 deletions packages/ts-etl/package.json
@@ -0,0 +1,63 @@
{
"name": "@docmaps/etl",
"version": "0.1.2",
"description": "ETL tool for Docmaps",
"type": "module",
"main": "dist/index.js",
"scripts": {
"test": "ava",
"clean": "rm -rf dist/",
"test:integration": "ava test/integration/",
"test:unit": "ava test/unit/",
"start": "node --loader=ts-node/esm --experimental-specifier-resolution=node --nolazy -r ts-node/register/transpile-only src/cli.ts",
"lint": "npx eslint .",
"lint:fix": "npx eslint --fix .",
"prepare": "tsc --declaration",
"build": "tsc"
},
"bin": {
"docmaps-etl": "dist/cli.js"
},
"keywords": [],
"author": "eve github.com/ships",
"license": "ISC",
"files": [
"dist/",
"README.md",
"package.json",
"tsconfig.json"
],
"dependencies": {
"@commander-js/extra-typings": "^10.0.3",
"commander": "^10.0.1",
"crossref-openapi-client-ts": "^1.3.0",
"docmaps-sdk": "^0.5.1",
"fp-ts": "^2.14.0"
},
"devDependencies": {
"@tsconfig/node-lts-strictest-esm": "^18.12.1",
"@types/node": "^18.16.2",
"@typescript-eslint/eslint-plugin": "^5.59.1",
"@typescript-eslint/parser": "^5.59.1",
"ava": "^5.2.0",
"eslint": "^8.39.0",
"eslint-config-prettier": "^8.8.0",
"eslint-plugin-prettier": "^4.2.1",
"prettier": "^2.8.8",
"ts-mockito": "^2.6.1",
"ts-node": "^10.9.1",
"typescript": "^4.9.5"
},
"ava": {
"extensions": {
"ts": "module"
},
"nodeArguments": [
"--loader=ts-node/esm",
"--experimental-specifier-resolution=node"
],
"files": [
"**/*.test.ts"
]
}
}