Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scraper only commits to main #444

Open
CGBassPlayer opened this issue Oct 4, 2022 · 10 comments
Open

Scraper only commits to main #444

CGBassPlayer opened this issue Oct 4, 2022 · 10 comments
Labels
dev Related to the development environment enhancement New feature, enhancement, or request feedback requested

Comments

@CGBassPlayer
Copy link
Collaborator

CGBassPlayer commented Oct 4, 2022

Currently the scraper only commits new episodes to main via the GitHub Action. I think we need the episodes to go into develop and main. I am drafting a PR and it would commit the 3 missing episodes to develop and I am not sure if this will cause issues when develop is merged into main in the future.

I am aware that this only affects contributors that forked the repo before the switch of develop being the main branch, but it is still something we need to think about.

My guess would be

  1. We update the action to commit to both develop and main
    • I'm not 100% sure about this one because I think that this could cause merge conflicts even if the commits are identical (not sure if git/GitHub is smart enough to see they are the same commit)
  2. We could make a branch for the scraped data and automatically create/merge PRs for both develop and main
    • It sounds a bit complicated, but it should relieve the concern I have about in idea 1

Definitely looking for some ideas on this one...

@gerbrent gerbrent added dev Related to the development environment enhancement New feature, enhancement, or request feedback requested labels Oct 4, 2022
@elreydetoda
Copy link
Collaborator

Ya...I noticed this last night with @gerbrent 's PR. I hadn't been able to fix it yet but my plan was to try and do some type of GH action after the scrape to sync over the changes. But when thinking about it more this morning I realized it's a bit more complicated than that, because main could be missing some commits that the develop has...

Also number 1 I believer you're correct it either would cause conflicts or at minimum would probably duplicate the amounts of commits. Number 2 sounds like the most promising, but I personally don't know how to allow auto-merged PRs for a specific "type of commit" without turning it on for the whole repo: https://docs.github.com/en/github-ae@latest/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-auto-merge-for-pull-requests-in-your-repository

For now I've manually ran git pull --ff-only origin main and then git push on the develop branch. So it should be up to date with main now.

@CGBassPlayer
Copy link
Collaborator Author

I was thinking the exact same thing @elreydetoda

Could we have auto-merges part of the GitHub action? I see this action on the marketplace after a quick google search. and it got me thinking we could have the action scrape, create the PRs with an automerge label then immediately merge it.

I have no idea what our branch protection rules look like right now, but I will do a little more research to see what I can learn about handling this problem in the meantime

@elreydetoda
Copy link
Collaborator

That was what I was thinking was potentially via GH action, but I hadn't been able to look for one yet.

I know we could definitely have an action trigger after another one finishes (unfortunately we can't say only if it successfully finishes, but just if it finishes) like here:

https://github.com/JupiterBroadcasting/jupiterbroadcasting.com/blob/develop/.github/workflows/deploy-prod.yml#L14-L17

My biggest concern would be of the security of those GH actions (most seem to be nodejs based) and if the project is actively maintained or not (i.e. when was the last time the author responded to an issue/PR or committed).

@CGBassPlayer
Copy link
Collaborator Author

If we don't find a GitHub action that is still maintained or that isn't JS, maybe we can create our own?

I might look into this regardless because I've been thinking about learning GitHub Actions anyway. This would just give me a good excuse.

elreydetoda added a commit to elreydetoda/jupiterbroadcasting.com that referenced this issue Oct 16, 2022
elreydetoda added a commit to elreydetoda/jupiterbroadcasting.com that referenced this issue Oct 16, 2022
@CGBassPlayer
Copy link
Collaborator Author

@elreydetoda I see you are trying repo-sync. I was looking at that action as well. Have you had any luck with it?

@elreydetoda
Copy link
Collaborator

Ya I was looking at it, but honestly while there are possible actions for us out there I don't think there's going to be a good solution...the issue being that both branches are getting committed to simultaneously... all the actions are expecting either the downstream branch to be stagnant or automatically open PRs for them to be merged by someone...

I think there's a potential (once #475) is merged, that we could have PRs be opened and merged automatically (once we get our testing really solid 😅). Then we could only have a PR reviewed from a human if there is a merge conflict or a test fails.

Till then, we could have an automation that just runs what I've been doing manually after develop is synchronized with main.

git checkout main && git pull && git checkout develop && git pull && git pull --ff-only origin main && git push

Which essentially syncs develop with main after I've merged develop into main.

@CGBassPlayer
Copy link
Collaborator Author

CGBassPlayer commented Nov 13, 2022

So looking into this a little bit, my first draft at an idea is to update the scrape action to have a few extra steps

    - name: 'Commit'
      uses: stefanzweifel/git-auto-commit-action@v4
      with:
        commit_message: 'Automatically scraped and committed via a GitHub Action.'
        repository: ./jbsite
        branch: scraping

    - name: 'Create PR for production'
      uses: devops-infra/[email protected]
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}
        source_branch: scraping
        target_branch: main
        title: Scrape data into main
        body: "** Automated PR **"
        label: scraper

    - name: 'Create PR for develop'
      uses: devops-infra/[email protected]
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}
        source_branch: scraping
        target_branch: develop
        title: Scrape data into develop
        body: "** Automated PR **"
        label: scraper

    - name: 'Merge scraper PRs'
      uses: devmasx/merge-branch@master
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}
        label_name: scraper

So basically the scrape would do the following:

  1. Commit into a branch called scraping
  2. Create 2 PRs to merge this branch into main and develop with a label (currently set to scraper)
  3. Merge PRs with with the scraper label

Looking for some feedback on this approach. I don't think we would really need to sync the scraping branch since it only handles scraped data so conflicts shouldn't really happen. But I haven't tested any of this. Just spit-balling here.

@gerbrent
Copy link
Collaborator

good idea!

One worry would be auto-merges of scraper labelled PRs, since anyone could presumably tag a non-moderated PR and thus have it get merged on the next scrape without approval/knowledge. Could be an issue..

@gerbrent
Copy link
Collaborator

then again, I think only mods can label issues, but still a consideration.

@CGBassPlayer
Copy link
Collaborator Author

One worry would be auto-merges of scraper labelled PRs...

I am only using scraper as an example. We can reserve a label just for this action if we wanted to.

then again, I think only mods can label issues, but still a consideration.

You are correct. I was only able to edit labels after getting triage permissions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dev Related to the development environment enhancement New feature, enhancement, or request feedback requested
Projects
None yet
Development

No branches or pull requests

3 participants