Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: Trigger preview of the web/docs #50899

Merged
merged 14 commits into from
Feb 27, 2023

Conversation

datapythonista
Copy link
Member

xref #50832

This call tells a simple service in our new server to preview the docs to download the artifact and uncompress it in a directory with the number of the PR in: https://pandas.pydata.org/preview/

Ideally we want to get #50897 merged before this one (otherwise the website can't be navigated clicking at the links).

Still need to check if the artifact is ready when the service is called, I think it's only ready after the CI job is over. So I may need to return from the webservice asynchronously and introduce a delay.

@datapythonista datapythonista added Docs CI Continuous Integration Web pandas website labels Jan 20, 2023
@datapythonista
Copy link
Member Author

Seems like indeed, the artifact is not yet ready when triggering the preview from the workflow. Returning from the webservice asynchronously and delaying and retrying the github api call to get the artifacts seems to be the way forward.

@datapythonista datapythonista changed the title WIP: Trigger preview of the web/docs CI: Trigger preview of the web/docs Jan 26, 2023
@datapythonista
Copy link
Member Author

Preview docs at: https://pandas.pydata.org/preview/50899/

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Feb 26, 2023
@datapythonista
Copy link
Member Author

@pandas-dev/pandas-core this (the web/doc previewer) is now ready. After the feedback of the dev call where this was discussed, and trying many approaches (github makes this a bit tricky), the behavior that makes more sense to me is:

  • Once the doc build finishes, the CI job makes a call to a webservice to generate the preview
  • The webservice responds immediately, and asynchronously will wait until the artifact is ready, and will publish its content in our preview service
  • I added an action, that when a comment with the text /preview is written, it answers with another comment with the url where the docs preview is available (you can see how it works here: Test preview action datapythonista/pandas#4)

This is now working. The web service needs a bit more work (it's using my personal token, better error logging in case github is broken would be nice, I didn't implement the purging of old previews yet, and it should better be installed as a systemd service), and I need to publish it somewhere too (not sure if in a repo in the pandas org, the pydata org, or my own user, let me know if you've got any preference).

But I think we can merge this, and get the preview working already.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. This is going to be awesome

@@ -89,3 +89,10 @@ jobs:
name: website
path: web/build
retention-days: 14

- name: Trigger web/doc preview
run: curl -X POST https://pandas.pydata.org/preview/submit/$RUN_ID/$PR_ID/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this endpoint have any type of rate limiting or session management? Not sure how much we care about a bad actor trying to take it down

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point. We're using Cloudflare, hopefully that can help us prevent some problems. The service itself is implemented in asynchronous rust, and it's very efficient, should be able to support quite a big work load. That being said, your comment makes me think of some things that can be improved. Happy to discuss them in private. In any case, I think it's good as it is to start, and if there are attacks, we can consider options, including pausing the service until we can prevent the problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. As long as you feel like there are other ways to mitigate and react happy to defer to your expertise. Change lgtm overall

@topper-123
Copy link
Contributor

I think this is really great!

A thought: Maybe discoverability for this service can be an issue for non-regular submitters/reviewers. Maybe to start with (after we're sure everything runs smoothly) always let the service make a comment in the PR showing the link to the generated preview?

@datapythonista
Copy link
Member Author

Very good point @topper-123. What I had in mind is that this is going to be more useful for reviewers than the original authors (who probably rendered the web/docs locally). That being said, I think it'd be good to add this to the documentation eventually (I wouldn't just yet, as it'd be good to have a experimental period and see how this works). For the automated comment, I agree it could be useful, but at the same time is going to cause a lot of noise. For people who are watching all activity in our repo, this would cause a new email for each opened PR. Open to it, but not sure if it's worth. I guess with documentation, and seeing that people who knows about it uses the comment may be enough.

But I'm surely open to try it, and if it's really too noisy, we can shut it down anytime.

@topper-123
Copy link
Contributor

Yeah, I hadn't thought about the emails, that would be noisy I agree, good point. I think it's fine then to use a /preview command. As you say we can always discuss later if people appear to have trouble with discoverability.

Copy link
Member

@lithomas1 lithomas1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this. Just a few questions/comments.

Really looking forward to seeing it.

@@ -0,0 +1,22 @@
name: Assign
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This should be renamed to preview docs or something like that

contents: read

jobs:
issue_assign:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

env:
RUN_ID: ${{ github.run_id }}
PR_ID: ${{ github.event.pull_request.number }}
if: github.event_name == 'pull_request'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this would run every time the PR is pushed to, so we might run out of disk space on the VM hosting our docs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, and for finding the pending renamed from copying the assign job. I addressed them.

IIUC, this would run every time the PR is pushed to, so we might run out of disk space on the VM hosting our docs?

Yes, you're correct. We overwrite the docs, so even if it's updated at every push, we only need one version per PR. But we'll eventually run out of space, since no purging is yet implemented. My idea is to get this working and see how it works for a bit, and if there are no surprises, and this is useful and convenient, I'll continue the pending things. There are some other things pending besides purging, like stop using my personal token, set up the service with systemd, improve the logging. Nothing big, but I think it makes more sense to start using the current MVP first, just in case any major change is needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying. As long as the service doesn't run on the main docs VM, this is fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are independent. I don't think even the CI will fail if the disk becomes full. But as said, planning to implement the purging and anything else needeed before that happens, once we see everythinf is working fine.

@datapythonista datapythonista merged commit 1ccad12 into pandas-dev:main Feb 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration Docs Web pandas website
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants