Fix: GitHub request to increase per_page values to 100 #94
Where is your current code that will break? https://pypi.org/project/PyGithub/ is definitely one of the packages that would suit your needs if you want a Python interface. Otherwise, you can go straight to GitHub's GraphQL API. Here is an example: https://github.com/scientific-python/devstats/blob/main/devstats/query.py (its output is then ingested by https://github.com/scientific-python/devstats.scientific-python.org).
P.S. If you use the GitHub REST API directly, the response tells you how many pages there are in total (via the Link header), so you can send subsequent queries with an increasing page number. I am less sure about GraphQL output since I never implemented it myself.
P.P.S. You might want to put in a sleep timer too, to avoid being blocked as spam.
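For reference, a minimal sketch of that REST approach, assuming a placeholder OWNER/REPO and the public issues endpoint; GitHub signals additional pages through the Link response header, and per_page caps out at 100:

```python
import time

import requests

# Placeholder name; substitute the actual review repository.
URL = "https://api.github.com/repos/OWNER/REPO/issues"


def fetch_all_issues(url):
    """Collect issues across pages; per_page maxes out at 100."""
    issues, page = [], 1
    while True:
        resp = requests.get(
            url,
            params={"state": "open", "per_page": 100, "page": page},
            headers={"Accept": "application/vnd.github+json"},
        )
        resp.raise_for_status()
        issues.extend(resp.json())
        # GitHub advertises further pages in the Link response header.
        if 'rel="next"' not in resp.headers.get("Link", ""):
            return issues
        page += 1
        time.sleep(1)  # polite pause between requests, per the note above
```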
@pllim my knowledge of working with APIs and pagination is a big WIP. I have request-get and return-response methods here. Right now I'm only grabbing accepted reviews, but I also want to add other steps so we can document our entire review process (how many packages are under review, etc.) on our website! And we will be above 30 accepted soon at the rate we're going :) we have ~13 in review now. So essentially the options here would be to:

1. Bump the per_page value on our existing requests calls up to 100.
2. Switch to PyGithub and let it handle pagination for us.
3. Move to GitHub's GraphQL API.
Are there benefits to the GraphQL approach? I did play with devstats (and want to use it for our work here too!!). Thank you so much for this input!!
FWIW, Option 2 would be easier on you in the long run. I don't think it has a timer, but its PaginatedList object supports multiple pages natively and is iterable, so theoretically you would loop through it and then, if you want, put in a timer yourself on each iteration. I hope that makes sense. List of issues: https://pygithub.readthedocs.io/en/stable/examples/Repository.html#get-list-of-open-issues List of PRs: https://pygithub.readthedocs.io/en/stable/github_objects/Repository.html#github.Repository.Repository.get_pulls
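A minimal sketch of that iteration pattern, assuming a token in the GITHUB_TOKEN environment variable and a placeholder OWNER/REPO name:

```python
import os
import time

from github import Github

gh = Github(os.environ["GITHUB_TOKEN"])
# Placeholder name; substitute the actual review repository.
repo = gh.get_repo("OWNER/REPO")

# get_issues() returns a PaginatedList; iterating it fetches further
# pages from the API behind the scenes, so no manual page math is needed.
for issue in repo.get_issues(state="open"):
    print(issue.number, issue.title)
    time.sleep(0.1)  # optional per-iteration pause, per the spam note above
```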
Ok, update: I'm going to update the header of this issue with more specifics.
Option 3 will take a bit more effort, but we have plenty of time to implement it, so I can begin to set us up for that approach! I'm not sure what the benefits of GraphQL are over REST calls at this point.
Re: GraphQL -- https://docs.github.com/en/graphql/overview/about-the-graphql-api They advertise it as having a smaller footprint, since you get exactly what you ask for, whereas the REST API returns everything. (Glad you found a solution.)
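To make the footprint point concrete, here is a rough sketch of a cursor-paginated GraphQL query that asks for only three fields per issue; the owner/name values and the GITHUB_TOKEN environment variable are placeholders:

```python
import os

import requests

# Only the fields named under nodes {...} come back, nothing else.
QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    issues(first: 100, after: $cursor, states: OPEN) {
      pageInfo { hasNextPage endCursor }
      nodes { number title createdAt }
    }
  }
}
"""


def fetch_issues(owner, name):
    """Page through issues with GraphQL cursors instead of page numbers."""
    headers = {"Authorization": f"bearer {os.environ['GITHUB_TOKEN']}"}
    cursor, nodes = None, []
    while True:
        resp = requests.post(
            "https://api.github.com/graphql",
            json={"query": QUERY,
                  "variables": {"owner": owner, "name": name, "cursor": cursor}},
            headers=headers,
        )
        resp.raise_for_status()
        page = resp.json()["data"]["repository"]["issues"]
        nodes.extend(page["nodes"])
        if not page["pageInfo"]["hasNextPage"]:
            return nodes
        cursor = page["pageInfo"]["endCursor"]
```

Something like `fetch_issues("OWNER", "REPO")` would then return every open issue in 100-item batches while transferring only the three requested fields.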
Currently I think we went down a more difficult path by using only requests to parse GitHub issues. Right now our workflow is set to fail once we hit 30 packages, because the REST API returns only 30 items per page by default and our API code does not handle pagination.
Using PyGithub, you can easily grab issues from a specific repository; it handles pagination and also makes it easy to grab metadata for each issue using built-in methods.
The code below is one example of quickly parsing through issues. I'm thinking it's worth considering this seriously, and soon, to ensure our build doesn't break. We have 13 packages in review now, so I suspect we will hit 30 in the upcoming months rather quickly.
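A minimal sketch of that kind of parsing, assuming a placeholder OWNER/REPO name and an arbitrary choice of metadata fields:

```python
import os

from github import Github

gh = Github(os.environ["GITHUB_TOKEN"])
# Placeholder name; substitute the actual submission/review repository.
repo = gh.get_repo("OWNER/REPO")

reviews = []
for issue in repo.get_issues(state="all"):
    # Built-in attributes replace hand-rolled JSON parsing, and the
    # PaginatedList iteration transparently walks every page.
    reviews.append({
        "number": issue.number,
        "title": issue.title,
        "labels": [label.name for label in issue.labels],
        "opened": issue.created_at.isoformat(),
        "submitter": issue.user.login,
    })

print(f"{len(reviews)} issues collected")
```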
This is both a bug that hasn't surfaced (yet) and an enhancement needed in the API.