[FEA] Create a Change Log #573

Closed
sameerz opened this issue Aug 18, 2020 · 5 comments · Fixed by #599
Assignees
Labels
build Related to CI / CD or cleanly building P1 Nice to have for release

Comments

@sameerz
Collaborator

sameerz commented Aug 18, 2020

Create a changelog file that includes all PRs that were merged and Issues that include the labels bug, feature request, sql, performance and shuffle, minus any issues with the labels wontfix, invalid or duplicate.

The change log should have a section for each project.

For each project there should be an issue subsection for each of the following:

  1. Features: all issues with the label feature request or sql
  2. Performance: all issues with the label performance or shuffle
  3. Bugs: all issues with the label bug

If an issue has labels that fall under more than one subsection, it may appear in multiple subsections. This will cause duplicates, and I am interested to hear feedback about that.

All merged PRs for a given project should appear in a "PRs" subsection.

The most recent project should be first.

Issues and PRs in the changelog should appear sorted in chronological order by date of merge or close, with the most recent being on top. The issue or PR number should precede the Issue or PR title, with an html link to the issue or PR.

The result should be a markdown file named CHANGELOG.md located in the default branch. The change log should be generated on a regular basis (nightly?) and include a "Generated on" date in the heading.

For example,

Change log

Generated on YYYY-MM-DD

Release 0.3

Release 0.2

Features

#566 [FEA] Add support for StringSplit with an array index
#524 [FEA] Add GPU specific metrics to GpuFileSourceScanExec
#500 [FEA] Add maven profiles for testing with AQE on or off
...

Performance

#15 [FEA] Multiple threads sharing the same GPU
...

Bugs fixed

#483 [BUG] Multiple scans for the same parquet data source
...

PRs

#564 Add GPU decode time metric to scans
...

Release 0.1

@mythrocks started an effort in this direction with https://gist.github.com/mythrocks/06a86e1681a7203107e41b1ff12d44c4#file-get_issues-py-L69
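
The layout requested above can be sketched as a small renderer. This is only an illustration of the spec, not the tool that was eventually merged: the `Entry` type, the `render_changelog` function, the markdown heading levels and the use of markdown links are all assumptions made for the example. The caller is assumed to pass releases with the most recent first.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, List, Tuple

# Subsection order follows the example in this issue.
SECTION_ORDER = ["Features", "Performance", "Bugs fixed", "PRs"]


@dataclass
class Entry:
    """Hypothetical record for one issue or PR in the changelog."""
    number: int
    title: str
    closed_at: datetime  # date of merge (PR) or close (issue)
    is_pr: bool = False


def link(entry: Entry, repo_url: str) -> str:
    # GitHub web URLs use /pull/N for PRs and /issues/N for issues.
    kind = "pull" if entry.is_pr else "issues"
    return f"[#{entry.number}]({repo_url}/{kind}/{entry.number})"


def render_changelog(
    releases: List[Tuple[str, Dict[str, List[Entry]]]],
    repo_url: str,
    generated_on: date,
) -> str:
    """Render releases (already ordered most recent first) as markdown."""
    lines = ["# Change log", f"Generated on {generated_on.isoformat()}", ""]
    for name, sections in releases:
        lines += [f"## {name}", ""]
        for section in SECTION_ORDER:
            entries = sections.get(section, [])
            if not entries:
                continue
            lines.append(f"### {section}")
            # Most recently merged or closed entries go on top.
            for e in sorted(entries, key=lambda e: e.closed_at, reverse=True):
                lines.append(f"- {link(e, repo_url)} {e.title}")
            lines.append("")
    return "\n".join(lines)
```

Keeping the rendering separate from the data fetching means the output format can be reviewed without calling the GitHub API.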

@sameerz added the feature request, ? - Needs Triage and build labels and removed the ? - Needs Triage and feature request labels Aug 18, 2020
@pxLi self-assigned this Aug 18, 2020
@sameerz added the P1 label Aug 18, 2020
@pxLi
Collaborator

pxLi commented Aug 20, 2020

@sameerz

Hi, two more questions:

  1. I was seeing duplicates across subsections, which could look weird (e.g. shuffle vs bug). I think it might be better to keep just one entry? Any kind of priority among subsections would be nice 👍 Also, some issues were put in both the 0.2 and 0.3 projects.
  2. There are four projects: Project(name="Triage and Backlog"), Project(name="Release 0.1"), Project(name="Release 0.2"), Project(name="Release 0.3"). I assume the tool should only process Release*? Also, can I assume that only ProjectCards under the Done column should be in our changelog?

thx!

@sameerz
Collaborator Author

sameerz commented Aug 20, 2020

  1. I was seeing duplicates across subsections, which could look weird (e.g. shuffle vs bug). I think it might be better to keep just one entry? Any kind of priority among subsections would be nice 👍 Also, some issues were put in both the 0.2 and 0.3 projects.

To deduplicate, the section priority should be (highest first):

  • Bugs fixed
  • Performance
  • Features

So an issue marked bug and performance should only appear in Bugs fixed.
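
As a minimal sketch of that priority rule, the function below maps an issue's labels to a single subsection. The label names and the excluded labels come from this issue; the function name `section_for` and the constants are hypothetical.

```python
from typing import Iterable, Optional

# Issues carrying any of these labels are left out of the changelog.
EXCLUDED_LABELS = {"wontfix", "invalid", "duplicate"}

# Highest priority first: Bugs fixed, then Performance, then Features.
SECTION_PRIORITY = [
    ("Bugs fixed", {"bug"}),
    ("Performance", {"performance", "shuffle"}),
    ("Features", {"feature request", "sql"}),
]


def section_for(labels: Iterable[str]) -> Optional[str]:
    """Return the one subsection an issue belongs to, or None to skip it."""
    label_set = set(labels)
    if label_set & EXCLUDED_LABELS:
        return None
    for section, wanted in SECTION_PRIORITY:
        if label_set & wanted:
            return section  # first match wins, so no duplicates
    return None
```

With this rule, an issue labeled both bug and performance lands only in Bugs fixed, and an issue with none of the listed labels is skipped.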

  2. There are four projects: Project(name="Triage and Backlog"), Project(name="Release 0.1"), Project(name="Release 0.2"), Project(name="Release 0.3"). I assume the tool should only process Release*? Also, can I assume that only ProjectCards under the Done column should be in our changelog?

Let's only use the Release* projects, where cards are in the Done column.
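
A rough sketch of that selection, working on plain names rather than API objects. It assumes release projects are named "Release" followed by a dotted version number, as in the four projects listed above; `release_projects` and `done_cards` are hypothetical helpers, and the column mapping is an assumed shape for illustration.

```python
import re
from typing import Dict, List

# Assumed naming scheme: "Release " followed by a dotted version number.
_RELEASE_NAME = re.compile(r"Release (\d+(?:\.\d+)*)")


def release_projects(project_names: List[str]) -> List[str]:
    """Keep only Release* projects, most recent version first."""
    versioned = []
    for name in project_names:
        match = _RELEASE_NAME.fullmatch(name)
        if match:
            # Compare versions numerically so 0.10 sorts after 0.9.
            version = tuple(int(part) for part in match.group(1).split("."))
            versioned.append((version, name))
    return [name for _, name in sorted(versioned, reverse=True)]


def done_cards(columns: Dict[str, List[str]]) -> List[str]:
    """Return the cards in the Done column; other columns are ignored."""
    return list(columns.get("Done", []))
```

Sorting by the parsed version also covers the earlier requirement that the most recent project appear first.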

@pxLi
Collaborator

pxLi commented Aug 20, 2020

@sameerz
https://gist.github.com/pxLi/2e0b148c74baf951234a4ad88d8d2494
This is the CHANGELOG.md generated by my WIP tool.
Please let me know if anything is wrong so I can adjust my code in advance. thx!

@jlowe
Member

jlowe commented Aug 20, 2020

Thanks, @pxLi this is a great start!

There are some errors of omission in the changelog. It appears the PR search is by project, but there are some PRs merged to branch-0.2 that are not in a project and are therefore missing from the changelog despite having been merged. We either need to change the query to search based on PRs merged to a release branch in the range of time a release was "open" (which should eliminate the possibility of PRs being dropped from the changelog), or flag at generation time any PRs merged to a release branch that did not have a project label and need to be corrected before regenerating the changelog. Example query for branch-0.2: https://github.com/NVIDIA/spark-rapids/pulls?q=is%3Apr+is%3Amerged+no%3Aproject+base%3Abranch-0.2. That query currently picks up three PRs which I'll fix.

I also noticed the changelog lists issues being fixed in 0.3, but we haven't started on 0.3 yet. I'll fix the projects of those issues as well.
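
The example query above can be produced for any release branch with a few lines. The search qualifiers (`is:pr`, `is:merged`, `no:project`, `base:`) are the ones in the URL quoted in this comment; the helper name `unassigned_pr_query` is hypothetical.

```python
from urllib.parse import urlencode


def unassigned_pr_query(repo_url: str, branch: str) -> str:
    """Build the web search URL for merged PRs on `branch` with no project."""
    query = f"is:pr is:merged no:project base:{branch}"
    # urlencode escapes ':' as %3A and spaces as '+', matching the URL above.
    return f"{repo_url}/pulls?{urlencode({'q': query})}"


print(unassigned_pr_query("https://github.com/NVIDIA/spark-rapids", "branch-0.2"))
```

A generator could run the same search at generation time and list any hits as PRs that need a project before the changelog is regenerated.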

@pxLi
Collaborator

pxLi commented Aug 21, 2020

Thanks, @pxLi this is a great start!

There are some errors of omission in the changelog. It appears the PR search is by project, but there are some PRs merged to branch-0.2 that are not in a project and are therefore missing from the changelog despite having been merged. We either need to change the query to search based on PRs merged to a release branch in the range of time a release was "open" (which should eliminate the possibility of PRs being dropped from the changelog), or flag at generation time any PRs merged to a release branch that did not have a project label and need to be corrected before regenerating the changelog. Example query for branch-0.2: https://github.com/NVIDIA/spark-rapids/pulls?q=is%3Apr+is%3Amerged+no%3Aproject+base%3Abranch-0.2. That query currently picks up three PRs which I'll fix.

I also noticed the changelog lists issues being fixed in 0.3, but we haven't started on 0.3 yet. I'll fix the projects of those issues as well.

hi @jlowe thx for your feedback.

Since the projects feature was added to GitHub much later than issues and PRs, the REST API does not provide an easy way to get the actual project info from an issue/PR object. So I took the approach of going from project (Release 0.2) -> column (Done) -> ProjectCard -> content, then telling whether it is an issue or a PR by the content link 🤣. It took ~1 min to fetch all the information for the changelog. I also tried the newer GraphQL API, which is fast but has terrible search/filter and pagination support for nested objects in a query. Let me test more.
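
The last step of that traversal, telling an issue from a PR by its link, might look like the sketch below. It assumes links end in `/issues/N` or `/pull/N` (the GitHub web URL shapes) and also accepts `/pulls/N`; the exact form of the link a project card exposes is not stated in this thread, so treat the pattern and the `classify_content` name as assumptions.

```python
import re
from typing import Optional, Tuple

# Assumed link shapes: .../issues/<n>, .../pull/<n> or .../pulls/<n>.
_CONTENT_LINK = re.compile(r"/(issues|pulls?)/(\d+)/?$")


def classify_content(link: str) -> Optional[Tuple[str, int]]:
    """Return ("issue" | "pr", number), or None if the link matches neither."""
    match = _CONTENT_LINK.search(link)
    if match is None:
        return None  # e.g. a card that is not backed by an issue or PR
    kind = "pr" if match.group(1).startswith("pull") else "issue"
    return kind, int(match.group(2))
```

Returning `None` for unrecognized links lets the caller skip such cards instead of failing partway through a one-minute fetch.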

flag at generation time any PRs merged to a release branch that did not have a project label and need to be corrected before regenerating the changelog

This sounds good. I can print that info when the tool is run. And if we want a nightly auto-doc-gen workflow, I can probably post it as a comment in the PR thread.

@pxLi closed this as completed in #599 Sep 1, 2020
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
[auto-merge] bot-auto-merge-branch-22.10 to branch-22.12 [skip ci] [bot]