Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to a custom Github-Api module #34

Closed
carterm opened this issue Jul 30, 2021 · 26 comments · Fixed by #118
Closed

Switch to a custom Github-Api module #34

carterm opened this issue Jul 30, 2021 · 26 comments · Fixed by #118
Assignees

Comments

@carterm
Copy link
Contributor

carterm commented Jul 30, 2021

The github-api module is causing many exceptions related to "Cannot read property 'status' of undefined" when there is a fetch issue. Now that we are using tree updates only it would be easy to switch back to a custom module that doesn't need all the other github-api features.

This would incorporate all the features in the current gitTreeCommon module, that is used by multiple projects.

New module features

  • Take content map (filename + content) and compare it with a GitHub project. Create a commit that updates the project.
  • Option to include the commits in a PR
  • Option to add assignees to the PR
  • Uses the GitHub hashing algorithm to improve the ability to determine binary file changes and prevent unnecessary content uploads.

@cagov\github-sync ?

For custom REST examples, use our old GitHub API connector here

@aaronhans
Copy link
Contributor

Makes sense, glad we tried the github module, shame it isn't well supported

@carterm
Copy link
Contributor Author

carterm commented Oct 6, 2021

Latest issue...
github-tools/github#439

@carterm carterm self-assigned this Oct 6, 2021
@carterm carterm changed the title Switch to custom Github module Switch to a custom Github-Api module Oct 20, 2021
@carterm
Copy link
Contributor Author

carterm commented Oct 26, 2021

The new github module will focus on tree updates, and only support the github api functions that we use on a regular basis. Considering names for a new repo. cagov/github-connector

@chachasikes
Copy link
Contributor

Any updates on this?

@chachasikes
Copy link
Contributor

@carterm - I have a couple of questions:
I see the thread on github-tools.

I've handled a similar issue before, and introduced a fetch timeout (on our end).
The issue from github-tools is requesting that they handle network detection, which would be nice - but it's also something we can do.

Q: can you share the lines of code you are proposing to replace (so we can all see them?)

I know you were just asking about what to name the module. It's helpful to see the code. Not sure it needs to be a module yet.

@carterm
Copy link
Contributor Author

carterm commented Nov 8, 2021

This would be a replacement for all references to github-api.

@chachasikes
Copy link
Contributor

Ah.

@carterm
Copy link
Contributor Author

carterm commented Nov 8, 2021

This would incorporate all the features in the current gitTreeCommon module, that is used by multiple projects.
https://github.com/cagov/wordpress-to-github/blob/main/wordpress-to-github/gitTreeCommon/index.js

@chachasikes
Copy link
Contributor

(context: Carter is asking us in Slack about what to name a module.)

https://github.com/github-tools/github - "provides a minimal higher-level wrapper around Github's API."

So you are talking about parsing data from the github-api for this module for wordpress-to-github.

To name it, I would look through the code you wrote, & share with us a bullet pointed list of the main features that are going to go into this new module. Then we can name it better.

For example:

  • Get github tree data and check for recent commits
  • Process sha to see if we need to create a new pull request.

What else?

@carterm
Copy link
Contributor Author

carterm commented Nov 8, 2021

  • Push commits
  • Push commits as Pull request
  • Add issue assignees
  • get commit reports from actions that can be used for logging

@carterm
Copy link
Contributor Author

carterm commented Nov 8, 2021

Take a list of content -> compare content to GitHub -> make updates where needed -> list what was changed

@chachasikes
Copy link
Contributor

Great - so trying to come up with some names for the features you have described:

(brainstorming)

  • github-content-api-pr
  • github-pr-from-content-api
  • github-headless-diffs
  • github-content-diff
  • github-content-api-diff

PR's are an important part of this right?
Also - would this module be useful to the data pipelines?

@carterm
Copy link
Contributor Author

carterm commented Nov 8, 2021

PRs are an option, but the main thing is synchronizing files. It could just be a commit.

@chachasikes
Copy link
Contributor

  • github-sync-content-api
  • github-api-sync

I'm assuming the source is an API. It definitely is for what we are doing & that might be okay as a specialty.

Does this include the media handling, or can that be a wordpress specific add on that's part of this repo?

@carterm
Copy link
Contributor Author

carterm commented Nov 8, 2021

API seems overused. Everything uses an API. All types of files, binaries too.

@carterm
Copy link
Contributor Author

carterm commented Nov 8, 2021

It's also for pushing content syncs into GitHub, not for syncing a file system download like a pull would do.

@chachasikes
Copy link
Contributor

chachasikes commented Nov 8, 2021

If the interface is an api, it's okay to overuse it - if it's designed to work with APIs. (Which, as you saw in my presentation, is overused but not understood - also "content APIs" is underutilized and we might use that term more if it's more clear to people we are working with - and that's one of our most important features.

More questions:

  • Is it (in State context) only for github with WordPress API (or also Snowflake or other API commonly used by the State?)
  • Can it read a binary folder directory, or does it take an array of paths and then downloads them, diffing binaries as needed?

@carterm
Copy link
Contributor Author

carterm commented Nov 8, 2021

  • The only dependency would be GitHub as a service (via using its API).
  • The current gitTreeCommon has the ability to generate hashes from binary files that match the GitHub hashes, which greatly improve the update process. Determining the hashes of the source files still requires downloading them from the content source, unless there was a way to push the GitHub hashes back to the source files (like WordPress).

@carterm
Copy link
Contributor Author

carterm commented Nov 8, 2021

We had a custom GitHub connector before using the NPM module. Here is the last version...
https://github.com/cagov/wordpress-integration-api/blob/dc4fe93267079ccae1ff5a77b0149ceacea50fd2/WordPressService/gitHub.js

@chachasikes
Copy link
Contributor

chachasikes commented Nov 9, 2021

Another thought question: What are the inputs/interfaces to this module - a bunch of arrays?

I found this:

    const newTree = await createTreeFromFileMap(
      gitRepo,
      gitHubTarget.Branch,
      fileMap,
      filePath
    );

And this interface:

* @param {*} gitRepo from github-api
* @param {string} masterBranch usually "master" or "main"
* @param {Map<string,any>} filesMap contains the data to push
* @param {string} outputPath the root path for all files
* @param {boolean} [cleanoutputPath] true to delete all unmatched files in outputPath

This is helpful

@chachasikes
Copy link
Contributor

More terrible options
(github-sync is close but it feels like one more specifier & a little more context awareness will help)

More brainstorming:

  • github-sync-map
  • github-push-data-map - it's a one way sync (pushing into github), right?
  • github-data-push
  • github-map-push - (easy to confuse with maps)
  • github-api-push
  • github-content-push
  • github-content-api-push - it is only for content / designed for content API's right?

More use cases to think about:

  • non-wordpress APIs (headless CMSs that are not wordpress)
  • data APIs yes or no?

@carterm
Copy link
Contributor Author

carterm commented Nov 9, 2021

One important key component is that all the updates will be performed using Git Trees, which are much more transactional for updates and reduce the amount of conflicts for large updates. They work fine for single file updates too. So maybe tree can be in the name? But, that's transparent to the user of this module. So maybe not.

I'm not a fan of putting API in the name because it's just a way that computers talk, and unless understanding how the API works is a pre-requisite for use, I'd suggest we leave it out.

This is CMS agnostic. Downloading content from the CMS (custom data API or not) is outside of the scope of this module.

My favorites from your list...

  • github-data-push
  • github-content-push

I'm thinking for keywords, for 3rd parties looking to use this, tree might be good to have in there. So possibly...

  • github-tree-push
  • github-tree-planter

@carterm
Copy link
Contributor Author

carterm commented Nov 10, 2021

@carterm
Copy link
Contributor Author

carterm commented Nov 19, 2021

Going to start creating the repo with github-tree-push as the name. Will hold off on publishing to NPM.

@carterm
Copy link
Contributor Author

carterm commented Dec 1, 2021

Work has moved to new repo...
https://github.com/cagov/github-tree-push

@carterm
Copy link
Contributor Author

carterm commented Dec 3, 2021

New module deployed!
https://www.npmjs.com/package/@cagov/github-tree-push

@carterm carterm linked a pull request Dec 10, 2021 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants