-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI: Trigger preview of the web/docs #50899
Conversation
Seems like indeed, the artifact is not yet ready when triggering the preview from the workflow. Returning from the webservice asynchronously and delaying and retrying the github api call to get the artifacts seems to be the way forward. |
Preview docs at: https://pandas.pydata.org/preview/50899/ |
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
@pandas-dev/pandas-core this (the web/doc previewer) is now ready. After the feedback of the dev call where this was discussed, and trying many approaches (github makes this a bit tricky), the behavior that makes more sense to me is:
This is now working. The web service needs a bit more work (it's using my personal token, better error logging in case github is broken would be nice, I didn't implement the purging of old previews yet, and it should better be installed as a systemd service), and I need to publish it somewhere too (not sure if in a repo in the pandas org, the pydata org, or my own user, let me know if you've got any preference). But I think we can merge this, and get the preview working already. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. This is going to be awesome
@@ -89,3 +89,10 @@ jobs: | |||
name: website | |||
path: web/build | |||
retention-days: 14 | |||
|
|||
- name: Trigger web/doc preview | |||
run: curl -X POST https://pandas.pydata.org/preview/submit/$RUN_ID/$PR_ID/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this endpoint have any type of rate limiting or session management? Not sure how much we care about a bad actor trying to take it down
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good point. We're using Cloudflare, hopefully that can help us prevent some problems. The service itself is implemented in asynchronous rust, and it's very efficient, should be able to support quite a big work load. That being said, your comment makes me think of some things that can be improved. Happy to discuss them in private. In any case, I think it's good as it is to start, and if there are attacks, we can consider options, including pausing the service until we can prevent the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. As long as you feel like there are other ways to mitigate and react happy to defer to your expertise. Change lgtm overall
I think this is really great! A thought: Maybe discoverability for this service can be an issue for non-regular submitters/reviewers. Maybe to start with (after we're sure everything runs smoothly) always let the service make a comment in the PR showing the link to the generated preview? |
Very good point @topper-123. What I had in mind is that this is going to be more useful for reviewers than the original authors (who probably rendered the web/docs locally). That being said, I think it'd be good to add this to the documentation eventually (I wouldn't just yet, as it'd be good to have a experimental period and see how this works). For the automated comment, I agree it could be useful, but at the same time is going to cause a lot of noise. For people who are watching all activity in our repo, this would cause a new email for each opened PR. Open to it, but not sure if it's worth. I guess with documentation, and seeing that people who knows about it uses the comment may be enough. But I'm surely open to try it, and if it's really too noisy, we can shut it down anytime. |
Yeah, I hadn't thought about the emails, that would be noisy I agree, good point. I think it's fine then to use a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening this. Just a few questions/comments.
Really looking forward to seeing it.
.github/workflows/preview-docs.yml
Outdated
@@ -0,0 +1,22 @@ | |||
name: Assign |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This should be renamed to preview docs or something like that
.github/workflows/preview-docs.yml
Outdated
contents: read | ||
|
||
jobs: | ||
issue_assign: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
env: | ||
RUN_ID: ${{ github.run_id }} | ||
PR_ID: ${{ github.event.pull_request.number }} | ||
if: github.event_name == 'pull_request' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this would run every time the PR is pushed to, so we might run out of disk space on the VM hosting our docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, and for finding the pending renamed from copying the assign job. I addressed them.
IIUC, this would run every time the PR is pushed to, so we might run out of disk space on the VM hosting our docs?
Yes, you're correct. We overwrite the docs, so even if it's updated at every push, we only need one version per PR. But we'll eventually run out of space, since no purging is yet implemented. My idea is to get this working and see how it works for a bit, and if there are no surprises, and this is useful and convenient, I'll continue the pending things. There are some other things pending besides purging, like stop using my personal token, set up the service with systemd, improve the logging. Nothing big, but I think it makes more sense to start using the current MVP first, just in case any major change is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying. As long as the service doesn't run on the main docs VM, this is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are independent. I don't think even the CI will fail if the disk becomes full. But as said, planning to implement the purging and anything else needeed before that happens, once we see everythinf is working fine.
xref #50832
This call tells a simple service in our new server to preview the docs to download the artifact and uncompress it in a directory with the number of the PR in: https://pandas.pydata.org/preview/
Ideally we want to get #50897 merged before this one (otherwise the website can't be navigated clicking at the links).
Still need to check if the artifact is ready when the service is called, I think it's only ready after the CI job is over. So I may need to return from the webservice asynchronously and introduce a delay.