Add etl cli #42

Merged
ships merged 48 commits into main from ships/add-etl-cli
May 16, 2023

@ships ships commented Mar 29, 2023

Description

This PR adds a CLI tool, written in TypeScript, that creates docmaps from an external source of similar data. In this MVP, it infers docmap-like information from the Crossref API and makes several assumptions. The goal is to extract the upstream adapter as a pluggable component so that this CLI becomes generic.
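As a rough illustration of what a pluggable upstream adapter could look like, here is a minimal sketch. Every name in it (`ItemRecord`, `SourceAdapter`, `registerAdapter`, `resolveAdapter`, and the `in-memory` stub) is hypothetical and is not taken from this PR; the only assumption drawn from the description is that a source is selected by name, as with `--source crossref-api`.

```typescript
// Hypothetical: the minimal data any upstream source would return for one DOI.
interface ItemRecord {
  doi: string;
  title?: string;
}

// Hypothetical adapter contract: one implementation per upstream source.
interface SourceAdapter {
  readonly name: string;
  fetchItem(doi: string): Promise<ItemRecord>;
}

// Registry keyed by the name a user would pass on the command line.
const adapters = new Map<string, SourceAdapter>();

function registerAdapter(adapter: SourceAdapter): void {
  adapters.set(adapter.name, adapter);
}

function resolveAdapter(name: string): SourceAdapter {
  const adapter = adapters.get(name);
  if (adapter === undefined) {
    throw new Error(`unknown source: ${name}`);
  }
  return adapter;
}

// Stub adapter that makes no network calls; a real one would query its API.
registerAdapter({
  name: "in-memory",
  fetchItem: async (doi) => ({ doi }),
});
```

With a registry like this, the CLI would only need to look up the adapter named by `--source`, and the rest of the pipeline could stay source-agnostic.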

The basic usage can be discovered with pnpm start help. It is something like this:

pnpm start item --source crossref-api 10.5194/angeo-40-247-2022

Recall that a Docmap's core datapoints are a collection of interconnected steps. Additional metadata is also included. However, the docmap does not explicitly have a "subject" that is a DOI -- this can sometimes be inferred from its ID or its steps.

The CLI follows a basic recursive routine: it creates a Step for the identified DOI, and if any review articles refer to that DOI, an additional step is included after the main step. Further, if there is a Preprint for the identified DOI, the CLI recursively invokes this routine on the Preprint and prepends the result to the step list. Once all the Steps are identified, they are wired together using next-step and previous-step, in a slightly hacky way, to be fed into a Docmap.
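The routine described above can be sketched roughly as follows. This is not the code from the PR: the types, the function names `collectSteps` and `wireSteps`, the `_:bN` step IDs, the `seen` guard against cycles, and the `10.0000/...` DOIs are all assumptions made for illustration, and an in-memory lookup stands in for the Crossref API.

```typescript
// Hypothetical stand-in for what the upstream source reports about a DOI.
type SourceRecord = {
  doi: string;
  reviewDois: string[];
  preprintDoi?: string;
};

// Lookup function used in place of a network call.
type Source = (doi: string) => SourceRecord | undefined;

// Simplified step shapes; real docmap steps carry more information.
type UnwiredStep = {
  doi: string;
  kind: "manuscript" | "reviews";
  reviews?: string[];
};
type WiredStep = UnwiredStep & {
  id: string;
  "next-step"?: string;
  "previous-step"?: string;
};

// Main step for the DOI, an extra step if it has reviews, and the
// preprint's steps (computed recursively) prepended to the list.
function collectSteps(
  doi: string,
  source: Source,
  seen: Set<string> = new Set<string>(),
): UnwiredStep[] {
  const record = source(doi);
  if (record === undefined || seen.has(doi)) return []; // assumed cycle guard
  seen.add(doi);

  const own: UnwiredStep[] = [{ doi, kind: "manuscript" }];
  if (record.reviewDois.length > 0) {
    own.push({ doi, kind: "reviews", reviews: record.reviewDois });
  }
  const earlier =
    record.preprintDoi !== undefined
      ? collectSteps(record.preprintDoi, source, seen)
      : [];
  return [...earlier, ...own];
}

// Link neighbours by ID so the list can be read as a chain of steps.
function wireSteps(steps: UnwiredStep[]): WiredStep[] {
  return steps.map((step, i) => ({
    ...step,
    id: `_:b${i}`,
    ...(i > 0 ? { "previous-step": `_:b${i - 1}` } : {}),
    ...(i < steps.length - 1 ? { "next-step": `_:b${i + 1}` } : {}),
  }));
}

// Made-up records: an article whose preprint has one review.
const records: Record<string, SourceRecord> = {
  "10.0000/article": {
    doi: "10.0000/article",
    reviewDois: [],
    preprintDoi: "10.0000/preprint",
  },
  "10.0000/preprint": {
    doi: "10.0000/preprint",
    reviewDois: ["10.0000/review-1"],
  },
};
const wired = wireSteps(collectSteps("10.0000/article", (doi) => records[doi]));
```

Here `wired` holds three steps in order: the preprint, the preprint's reviews, and the article itself, with each step pointing to its neighbours.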

Related Issues

#37 - Crossref-to-Docmaps
#24 - Example using fp-ts to parse

Checklist

  • I have tested these changes locally and they work as expected.
  • I have added or updated tests to cover any new functionality or bug fixes.
  • I have updated the documentation to reflect any changes or additions to the project.
  • I have followed the project's code of conduct and conventions for commit messages.


@ships ships force-pushed the ships/add-etl-cli branch 2 times, most recently from 2164cae to 9a83b47 May 3, 2023 23:56
@ships ships changed the title WIP: add etl cli Add etl cli May 3, 2023
packages/ts-etl/package.json (review thread, outdated and resolved)
@ships ships force-pushed the ships/add-etl-cli branch from 08b1b4c to c938d5e May 10, 2023 22:48
@ships ships mentioned this pull request May 10, 2023

3mcd commented May 16, 2023

Looks great to me! Tests look comprehensive enough and I feel I have a good grasp for how it works after your walkthrough. And thank you for adding those comments!

@ships ships merged commit 44a6042 into main May 16, 2023
@ships ships deleted the ships/add-etl-cli branch August 22, 2023 17:46