-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add commit message merge functionality #193
Conversation
In order to test new functionality (i.e. the read.commit.messages function) new files containing test commit data were needed. Add two files containing messages corresponding to the test commit data that already exists. See se-sic#180 Signed-off-by: Niklas Schneider <[email protected]>
Add the read.commit.messages and create.empty.commits.list functions to util-read.R as well as a unit test to test the new functions. This allows to read 'commitMessages.list' files and return the commit data separated into message title and body. See se-sic#180 Signed-off-by: Niklas Schneider <[email protected]>
Discussion has shown that codeface separates lines with five spaces, not four. So the two test files have been modified to account for that fact. See discussion in se-sic#180 Signed-off-by: Niklas Schneider <[email protected]>
Change the line breaks in the expected output to \n's. See se-sic#180 Signed-off-by: Niklas Schneider <[email protected]>
Replace five spaces with \n, remove any white space at the beginning and the end of a commit message. See se-sic#180 Signed-off-by: Niklas Schneider <[email protected]>
Get the commit messsage data using the new read function and merge either nothing, the title or message and title into the commit.data of the proj.conf instance. See se-sic#180 Signed-off-by: Niklas Schneider <[email protected]>
Add the new attribute "commit.messages" to the project configuration class with options "none", "title" and "message" to make it possible to specify what exactly of the commit message data is to be merged to the commit data. See se-sic#180 Signed-off-by: Niklas Schneider <[email protected]>
Signed-off-by: Niklas Schneider <[email protected]>
Add two tests for testing the merge functionality for both full commit messages and titles only. Fix bug that merges message body instead of title when selecting option "title" See se-sic#180 Signed-off-by: Niklas Schneider <[email protected]>
Signed-off-by: Niklas Schneider <[email protected]>
Also exchange the merge attribute when merging data frames of commit messages from commit.id to hash. Signed-off-by: Niklas Schneider <[email protected]>
Signed-off-by: Niklas Schneider <[email protected]>
As commit.id was the first column of the data frame anyway, merging has not changed the order. But when using the hash column it is taken as the first colum of the resulting data frame. Change the order of the columns in order to not break anything that relies on the order. See se-sic#180 Signed-off-by: Niklas Schneider <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @nlschn!
Thank you very much for your PR! 😀👍 And congrats for opening your (kind of) first one for coronet! 🥳 Your work integrates a feature that @bockthom requested for a long time. He is definitely very happy about this! 😁
I added some comments to your changes, with the main goal to improve maintainability:
- There is an implementation error in the method
ProjectData$get.commit.messages
. I highlighted the corresponding comment with the symbol 🐛. - You access and modify data.frames using indices in several places – instead of using columns names. This may cause problems when functionality changes in the future. You should always avoid using indices in any case if possible (at least, outside the file
util-read.R
). - The algorithm of post-processing commit messages (after they have initially been read from disk) can be tweaked to speed things up and make the implementation more comprehensible.
- You have added a configuration option to the class
ProjectConf
, but you have not updated the README file accordingly. I added a corresponding comment to your update of the fileNEWS.md
.
Your additions to coronet are a very nice piece of work! Keep it up! 👍
If you have any questions regarding my comments or anything else, please do not hesitate to ask. 🙂
By the way, if you got an e-mail indicating a single new and, in the end, non-existing comment on PR #192: Sorry for this! I accidentally clicked the wrong button when starting to review the PR and, then, deleted the comment right away. 🤦♂️
Follow the review suggestions of @clhunsen. See se-sic#180 Signed-off-by: Niklas Schneider <[email protected]>
Following the review of se-sic#193 Signed-off-by: Niklas Schneider <[email protected]>
Remove some empty lines and indent some lines. Also remove commit.message.data.unprocessed variable and use the commit.message.data variable from the beginning. Add column names beforehand in order to enable access without indices. See se-sic#193 Signed-off-by: Niklas Schneider <[email protected]>
Create private function update.commit.message.data in util-data.R which handles the merge and change the location where it is called in set.commits. See se-sic#193 Signed-off-by: Niklas Schneider <[email protected]>
Fix an error where the value of a variable that is defined in an if block is returned outside that if block. See se-sic#193 Signed-off-by: Niklas Schneider <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nlschn, your implementation looks really good. Sorry for my late review, I have hoped to achieve that earlier. Anyway, I just have found a few minor issues addressing consistency. So, please have a look at them (and please don't overstate the number of my comments, most of my comments go into the same direction). Don't hesitate to ask if something is unclear to you.
Also thanks to @clhunsen for your review, you already have pointed out the most important issues.
Replaced a loop with a conversion from a list of vectors in a data frame and access its columns directly See se-sic#193 Signed-off-by: Niklas Schneider <[email protected]>
Move functions concerning reading commit messages and the constants used by them to a new section in util.read. Replace subset with proper indexing and minor comment fixes. See se-sic#193 Signed-off-by: Niklas Schneider <[email protected]>
Also adapt 'update.commit.messages' to better match the implementation of similar methods. Add 'set.commit.messages' in order to be able to set the commit messages to NULL. See se-sic#193. Signed-off-by: Niklas Schneider <[email protected]>
Introduce new function 'format.commit.ids' in along with new section in util-read.R. Also put format "<commit-%s>" into a constant. See se-sic#193 Signed-off-by: Niklas Schneider <[email protected]>
Take advice by @clhunsen to replace if else cascade for rearranging columns with better merge call. Also modify test-data tests regarding commit messages: Row names are no longer ignored. See se-sic#193 Signed-off-by: Niklas Schneider <[email protected]>
Change order in 'README.md', 'util-conf.R' and 'util-data.R' Also fix table of contents in the readme. See se-sic#193 Signed-off-by: Niklas Schneider <[email protected]>
Add (empty) commit message data to all data split tests in 'tests-split.R'. Also sor the additional data sources alphabetically in the tests. See se-sic#193 Signed-off-by: Niklas Schneider <[email protected]>
Add (empty) commit message data to all data split tests in 'test-split-sliding-window.R'. Also sort the additional data sources alphabetically in the tests. See se-sic#193 Signed-off-by: Niklas Schneider <[email protected]>
See se-sic#193 Signed-off-by: Niklas Schneider <[email protected]>
These functions act like cleanup.pasta, they remove rows from the data that are not part of a range data object anymore. See se-sic#193 Signed-off-by: Niklas Schneider <[email protected]>
Fix a copy-paste-error where a varibale name was not changed from a prior code snippet. See se-sic#193 Signed-off-by: Niklas Schneider <[email protected]>
Yes, this problem will occur everywhere where package |
After spending more than two hours on investigating why the pipeline fails for R 3.3, eventually, I was able to find out what is going wrong there: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the changes in this PR look good already. I have only found some minor issues. And this time I really hope that we can merge the PR afterwards...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just spotted two other things:
For the first one: We have a warning notifying the user about cleaning up the PaStA data, which we should also add for the synchronicity data. Could you please add the following warning in line 450 of util-data.R
, in update.synchronicity.data
(in a similary way as in update.pasta.data
):
logging::logwarn("There might be synchronicity data that does not appear in the commit data.
To clean this up you can call the function 'cleanup.synchronicity.data()'.")
And also in a similar way for commit messages.
The second one is just an update of the NEWS.md, as we have added these cleanup functions, see comment below.
(Now I am finished with my review. I won't look at the existing parts of this PR any more unless someone else spots any other issues.)
Just for documentation reasons: The discussion about the failing pipeline for R 3.3.3 will be continued here: #161 (comment) |
See se-sic#193 Signed-off-by: Niklas Schneider <[email protected]>
Warn user when updating commit messages or synchronicity that they should call the corresponding cleanup method. Remove calls to unlist in those cleanup functions. Fix copy-pasted comments. See se-sic#193 Signed-off-by: Niklas Schneider <[email protected]>
Fix the regex that removes spaces at the end of a commit message. Change the assignment of the 'commit.message.data' data frame such that no new data frame is instantiated anymore. See se-sic#193 Signed-off-by: Niklas Schneider <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! 🥳👍
Our CI pipeline does currently not work for "R-3.3" (but for all the other R versions in the pipeline). The reason for that is the following: As of 2021-01-26, there is a new version of package `memoise` (2.0.0), which is imported by `RSQLite`, which in turn is imported by `sqldf`. The new version of `memoise` imports package `cachem`, which in turn imports package `fastmap`. Using all these packages is not a problem per se, they are compatible with R 3.3.3. However, the docker container `r-base:3.3.3`, which we use in our pipeline "R-3.3", uses g++ version 6.3.0-9, which contains a bug. Due to the bug in the compiler, `fastmap` cannot be compiled and installed in tje `r-base:3.3.3` docker container, letting the CI pipeline fail. To circumvent this problem but still keep "R-3.3" in our CI pipeline, we now use docker image `r-base:3.3.2` instead, as there is another g++ version installed in this docker image. With that, the CI pipeline should succeed for "R-3.3" again. Signed-off-by: Thomas Bock <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work @nlschn, thanks for your endurance.
As everyone has approved this PR and also the test pipeline succeeds again as of the pipeline fix, I will merge right away.
Create a new helper function to be used by 'metrics.centrality' in order to compute the authors from the data.sources of the network's relations automatically. See se-sic#193
Prerequisites
dev
.Description
[Now from mydev branch]
In order to address issue #180 this PR adds functionality to read and process commit message data in "commitMessages.list" files. The first non-white-space line of the message is considered the title, the rest the body of the commit message.
This functionality can be found in "util-read.R"
The ability to merge the commit messages to the commit data is implemented in "set.commits" in "util-data.R" and a new attribute "commit.messages" has been added to "proj.conf" to control what data shall be merged.
For each new functionality tests are provided.
Changelog
commit.messages
inproj.conf
:none
is the default value and does not change the previous behavior ofproj.data$set.commits
.title
merges the commit message titles (i.e. the first non white space line of a commit message) to the commit data. This gives the data frame an additional columntitle
.messages
merges both titles and message bodies to the commit data frame. This adds two new columnstitle
andmessage.body
.