Add commit message merge functionality #193

Merged: 43 commits, Feb 3, 2021
Changes from 18 commits
Commits
9589e52
Add test commit message data
nlschn Dec 13, 2020
85b1d05
Add new function to read commit messages
nlschn Dec 13, 2020
61618db
Change commitMessages.list test files to have for the right line breaks
nlschn Dec 16, 2020
17a61ed
Adapt commit message read test to new test files
nlschn Dec 16, 2020
c624c90
Adapt read.commit.messages to handle line breaks correctly
nlschn Dec 16, 2020
fdc414a
Add functions that enable merging commit messages into data
nlschn Dec 21, 2020
5db90d8
Add new configuration option for commit messages
nlschn Dec 21, 2020
f80b24b
Replace seq with seq_along and add missing log statement in util-read.R
nlschn Dec 21, 2020
9414357
Add tests for merging and fix bug when merging only titles
nlschn Dec 28, 2020
359b12c
Add description of changes to unversioned section of NEWS.md
nlschn Jan 2, 2021
70c8395
Remove unnecessary empty lines from several files
nlschn Jan 7, 2021
89a6ea6
Fix a syntax error in util-read
nlschn Jan 7, 2021
6e9147e
Fix merging by hash instead of commit.id
nlschn Jan 8, 2021
c9c7ff7
Modify README and NEWS
nlschn Jan 13, 2021
0457dd5
Rename "message.body" column to "message" everywhere
nlschn Jan 13, 2021
7e61dcb
Fix style issues and improve message processing
nlschn Jan 13, 2021
8e28a1f
Put merge functionality into own function
nlschn Jan 13, 2021
703ab3e
Fix error when returning a variable that is not defined
nlschn Jan 13, 2021
7caaa8d
Simplify data frame creation in read.commit.messages
nlschn Jan 15, 2021
8dd410c
Reorder functions in util read and replace special functions
nlschn Jan 15, 2021
eb1cec8
Fix comments in and change order in 'set.commits'
nlschn Jan 15, 2021
d5c8c78
Add helper function to format 'commit.id' column
nlschn Jan 15, 2021
43e1894
Change commit message merge process
nlschn Jan 15, 2021
70b3cb6
Change order of data sources to be alphabetical
nlschn Jan 16, 2021
31e0f85
Update 'NEWS.md' with commit hashes
nlschn Jan 16, 2021
a0d5e32
Add package 'data.table' to coronet and refactor README
nlschn Jan 20, 2021
4c49269
Increase perfomance of commit message read
nlschn Jan 20, 2021
19655dd
Update my copyright notices
nlschn Jan 20, 2021
a36bde4
Fix spelling errors in 'README.md' and 'util-conf.R'
nlschn Jan 20, 2021
aab0751
Use new helper function in tests to format commit ids
nlschn Jan 25, 2021
0859b9a
Replace for-loop with lapply call in function to read commit messages
nlschn Jan 25, 2021
fc5d20f
Fix minor comment issues and add checks before updating commit messages
nlschn Jan 25, 2021
686459e
Initialize commit message data on RangeData-objects in 'util-split.R'
nlschn Jan 25, 2021
613a773
Fix minor spelling errors
nlschn Jan 25, 2021
98e83b0
Change all data split tests to include commit message data
nlschn Jan 25, 2021
2e42fca
Change all sliding window data tests to include commit message data
nlschn Jan 25, 2021
c052dfb
Fix minor comment issue in 'test-split-sliding-window.R'
nlschn Jan 26, 2021
d3bbae0
Add new cleanup functions for commit messages and synchronicity
nlschn Jan 30, 2021
9385084
Fix wrong variable name in 'cleanup.synchronicity'
nlschn Jan 30, 2021
63b6f79
Add cleanup functions to NEWS.md
nlschn Feb 1, 2021
c63a25a
Remove unnecassary function calls and add logging output
nlschn Feb 1, 2021
e1e1ba8
Fix regex when filtering out spaces and change data frame assignment
nlschn Feb 1, 2021
18843a8
Fix problems in CI pipeline for R-3.3
bockthom Feb 3, 2021
4 changes: 4 additions & 0 deletions NEWS.md
@@ -2,6 +2,10 @@

## Unversioned

### Added
- Add functionality to read and process commit messages in order to merge them into the commit data (see issue #180). Three values are available for the new attribute `commit.messages` in `ProjectConf`: `none`, `title` and `message`.

### Changed/Improved
- Add `.drone.yml` to enable running our CI pipelines on drone.io (PR #191, 1c5804b59c582cf34af6970b435add51452fbd11)

77 changes: 42 additions & 35 deletions README.md
@@ -11,40 +11,43 @@ If you wonder: The name `coronet` derives as an acronym from the words "configur

## Table of contents

- [Integration](#integration)
* [Requirements](#requirements)
* [R](#r)
* [packrat (recommended)](#packrat)
* [Folder structure of the input data](#folder-structure-of-the-input-data)
* [Needed R packages](#needed-r-packages)
* [Submodule](#submodule)
* [Selecting the correct version](#selecting-the-correct-version)
- [Functionality](#functionality)
* [Configuration](#configuration)
* [Data sources](#data-sources)
* [Network construction](#network-construction)
* [Data sources for network construction](#data-sources-for-network-construction)
* [Types of networks](#types-of-networks)
* [Relations](#relations)
* [Edge-construction algorithms for author networks](#edge-construction-algorithms-for-author-networks)
* [Vertex and edge attributes](#vertex-and-edge-attributes)
* [Further functionalities](#further-functionalities)
* [Splitting data and networks based on defined time windows](#splitting-data-and-networks-based-on-defined-time-windows)
* [Cutting data to unified date ranges](#cutting-data-to-unified-date-ranges)
* [Handling data independently](#handling-data-independently)
* [How-to](#how-to)
* [File/Module overview](#filemodule-overview)
- [Configuration classes](#configuration-classes)
* [ProjectConf](#projectconf)
* [Basic information](#basic-information)
* [Artifact-related information](#artifact-related-information)
* [Revision-related information](#revision-related-information)
* [Data paths](#data-paths)
* [Splitting information](#splitting-information)
* [(Configurable) Data-retrieval-related parameters](#configurable-data-retrieval-related-parameters)
* [NetworkConf](#networkconf)
- [License](#license)
- [Work in progress](#work-in-progress)
- [coronet - The network library](#coronet---the-network-library)
- [Table of contents](#table-of-contents)
- [Integration](#integration)
- [Requirements](#requirements)
- [`R`](#r)
- [`packrat` (recommended)](#packrat-recommended)
- [Folder structure of the input data](#folder-structure-of-the-input-data)
- [Needed R packages](#needed-r-packages)
- [Submodule](#submodule)
- [Selecting the correct version](#selecting-the-correct-version)
- [Functionality](#functionality)
- [Configuration](#configuration)
- [Data sources](#data-sources)
- [Network construction](#network-construction)
- [Data sources for network construction](#data-sources-for-network-construction)
- [Types of networks](#types-of-networks)
- [Relations](#relations)
- [Edge-construction algorithms for author networks](#edge-construction-algorithms-for-author-networks)
- [Vertex and edge attributes](#vertex-and-edge-attributes)
- [Further functionalities](#further-functionalities)
- [Splitting data and networks based on defined time windows](#splitting-data-and-networks-based-on-defined-time-windows)
- [Cutting data to unified date ranges](#cutting-data-to-unified-date-ranges)
- [Handling data independently](#handling-data-independently)
- [How-to](#how-to)
- [File/Module overview](#filemodule-overview)
- [Configuration classes](#configuration-classes)
- [ProjectConf](#projectconf)
- [Basic information](#basic-information)
- [Artifact-related information](#artifact-related-information)
- [Revision-related information](#revision-related-information)
- [Data paths](#data-paths)
- [Splitting information](#splitting-information)
- [(Configurable) Data-retrieval-related parameters](#configurable-data-retrieval-related-parameters)
- [NetworkConf](#networkconf)
- [Contributing](#contributing)
- [License](#license)
- [Work in progress](#work-in-progress)


## Integration
@@ -183,7 +186,11 @@ There are two distinguishable types of data sources that are both handled by the
* Patch-stack analysis to link patches sent to mailing lists and upstream commits
* Synchronicity information on commits (see also the parameter `synchronicity` in the [`ProjectConf`](#configurable-data-retrieval-related-parameters) class)
* Synchronous commits are commits that change a source-code artifact that has also been changed by another author within a reasonable time-window.

* Commit messages are available through the parameter `commit.messages`. Three values can be used:
1. `none` is the default value and leaves the commit data unchanged.
2. `title` merges the commit message titles (i.e., the first non-whitespace line of each commit message) into the commit data. This gives the data frame an additional column `title`.
3. `message` merges both titles and message bodies into the commit data frame. This adds two new columns, `title` and `message`.
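
To make the distinction between `title` and `message` concrete, here is a minimal sketch in base R of how a raw commit message could be split into the two columns. The helper `sketch.split.message` is hypothetical and not part of coronet; it only mirrors the behavior suggested by the test data in this PR (title = first non-whitespace line, message = the remaining lines joined by `\n`).

```r
## Hypothetical helper, not coronet's implementation: split a raw commit
## message into a title (first non-whitespace line) and a body (the rest).
sketch.split.message = function(raw.message) {
    ## split into lines and trim surrounding whitespace on every line
    lines = trimws(strsplit(raw.message, "\n", fixed = TRUE)[[1]])
    ## indices of lines that contain more than whitespace
    non.empty = which(nzchar(lines))
    if (length(non.empty) == 0) {
        return(list(title = "", message = ""))
    }
    first = non.empty[1]
    last = non.empty[length(non.empty)]
    ## body starts at the first non-empty line after the title
    rest = non.empty[non.empty > first]
    body = if (length(rest) == 0) "" else paste(lines[rest[1]:last], collapse = "\n")
    return(list(title = lines[first], message = body))
}

result = sketch.split.message(" \nI added important things\n\nthe things are\nnothing\n")
result$title    # "I added important things"
result$message  # "the things are\nnothing"
```

Blank lines between the title and the body are dropped in this sketch; whether coronet keeps blank lines inside the body is not visible from this diff.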

The important difference is that the *main data sources* are used internally to construct artifact vertices in relevant types of networks. Additionally, these data sources can be used as a basis for splitting `ProjectData` in a time-based or activity-based manner – obtaining `RangeData` instances as a result (see file `split.R` and the contained functions). Thus, `RangeData` objects contain only data of a specific period of time.

The *additional data sources* are orthogonal to the main data sources, can augment them by additional information, and, thus, are not split at any time.
@@ -0,0 +1,7 @@
32712;"72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0";"Add stuff"
32713;"5a5ec9675e98187e1e92561e1888aa6f04faa338";" Add some more stuff "
32710;"3a0ed78458b3976243db6829f63eba3eead26774";" I added important things the things are nothing"
32714;"1143db502761379c2bfcecc2007fc34282e7ee61";" I wish it would work now"
32715;"418d1dc4929ad1df251d2aeb833dd45757b04a6f";"Wish intensifies"
32716;"d01921773fae4bed8186b0aa411d6a2f7a6626e6";" ... still doesn't work as expected "
32711;"0a1a5c523d835459c42f33e863623138555e2526";""
@@ -0,0 +1,7 @@
32712;"72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0";"Add stuff"
32713;"5a5ec9675e98187e1e92561e1888aa6f04faa338";" Add some more stuff "
32710;"3a0ed78458b3976243db6829f63eba3eead26774";" I added important things the things are nothing"
32714;"1143db502761379c2bfcecc2007fc34282e7ee61";" I wish it would work now"
32715;"418d1dc4929ad1df251d2aeb833dd45757b04a6f";"Wish intensifies"
32716;"d01921773fae4bed8186b0aa411d6a2f7a6626e6";" ... still doesn't work as expected "
32711;"0a1a5c523d835459c42f33e863623138555e2526";""
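
The test files above use a semicolon-separated layout with three fields: numeric commit id, hash, and quoted message. As a rough sketch only, such a file could be parsed with base R's `read.table`; the column names and the `<commit-...>` formatting are taken from the tests in this PR, but the actual `read.commit.messages` in coronet may work differently (later commits mention `data.table`). The two sample rows are single-line messages to keep the example simple.

```r
## Two rows in the same layout as the test data: id;"hash";"message"
raw.lines = c(
    '32712;"72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0";"Add stuff"',
    '32711;"0a1a5c523d835459c42f33e863623138555e2526";""'
)

## Sketch only: parse with base R, not with coronet's own reader
messages = read.table(text = raw.lines, sep = ";", quote = "\"", header = FALSE,
                      col.names = c("commit.id", "hash", "message"),
                      colClasses = c("integer", "character", "character"),
                      comment.char = "", stringsAsFactors = FALSE)

## format the numeric id the way the tests expect it
messages[["commit.id"]] = sprintf("<commit-%s>", messages[["commit.id"]])
print(messages)
```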
101 changes: 99 additions & 2 deletions tests/test-data.R
@@ -14,6 +14,7 @@
## Copyright 2018 by Christian Hechtl <[email protected]>
## Copyright 2018-2019 by Claus Hunsen <[email protected]>
## Copyright 2019 by Jakob Kronawitter <[email protected]>
## Copyright 2020 by Niklas Schneider <[email protected]>
## All Rights Reserved.


@@ -33,7 +34,7 @@ if (!dir.exists(CF.DATA)) CF.DATA = file.path(".", "tests", "codeface-data")

test_that("Compare two ProjectData objects", {

##initialize a ProjectData object with the ProjectConf and clone it into another one
## initialize a ProjectData object with the ProjectConf and clone it into another one
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("pasta", TRUE)
proj.data.one = ProjectData$new(project.conf = proj.conf)
@@ -44,7 +45,7 @@ test_that("Compare two ProjectData objects", {
## Always change one data source in the one object, test for inequality, change it in the
## second object, as well, and test for equality.

##change the second data object
## change the second data object

proj.data.two$get.pasta()

@@ -179,3 +180,99 @@ test_that("Filter patchstack mails with PaStA enabled", {
## ensure that there are no other entries than the ones that have been verified to exist above
expect_equal(6, nrow(filtered.pasta))
})


test_that("Merge commit messages to commit data", {
## initialize a ProjectData object with the ProjectConf
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("commit.messages", "message")
proj.data = ProjectData$new(project.conf = proj.conf)

commits = proj.data$get.commits()

commit.data.expected = data.frame(commit.id = sprintf("<commit-%s>", c(32712, 32713, 32710, 32714, 32715, 32716,
32711, 32711)),
date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 16:00:45", "2016-07-12 16:05:41",
"2016-07-12 16:06:10", "2016-07-12 16:06:20", "2016-07-12 16:06:30",
"2016-07-12 16:06:32", "2016-07-12 16:06:32")),
author.name = c("Björn", "Olaf", "Olaf", "Karl", "Karl", "Thomas", "Thomas", "Thomas"),
author.email = c("[email protected]", "[email protected]", "[email protected]", "[email protected]",
"[email protected]", "[email protected]", "[email protected]", "[email protected]"),
committer.date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-20 10:00:44", "2016-07-12 17:05:55",
"2016-07-12 16:06:10", "2016-07-12 16:06:20", "2016-07-12 16:06:30",
"2016-07-12 16:06:32", "2016-07-12 16:06:32")),
committer.name = c("Björn", "Björn", "Thomas", "Karl", "Karl", "Thomas", "Thomas", "Thomas"),
committer.email = c("[email protected]", "[email protected]", "[email protected]", "[email protected]",
"[email protected]", "[email protected]", "[email protected]", "[email protected]"),
hash = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "5a5ec9675e98187e1e92561e1888aa6f04faa338",
"3a0ed78458b3976243db6829f63eba3eead26774", "1143db502761379c2bfcecc2007fc34282e7ee61",
"418d1dc4929ad1df251d2aeb833dd45757b04a6f", "d01921773fae4bed8186b0aa411d6a2f7a6626e6",
"0a1a5c523d835459c42f33e863623138555e2526", "0a1a5c523d835459c42f33e863623138555e2526"),
changed.files = as.integer(c(1, 1, 1, 1, 1, 1, 1, 1)),
added.lines = as.integer(c(1, 1, 1, 1, 1, 1, 1, 1)),
deleted.lines = as.integer(c(1, 0, 0, 0, 0, 0, 0, 0)),
diff.size = as.integer(c(2, 1, 1, 1, 1, 1, 1, 1)),
file = c("test.c", "test.c", "test2.c", "test3.c", UNTRACKED.FILE,
UNTRACKED.FILE, "test2.c", "test2.c"),
artifact = c("A", "A", "Base_Feature", "Base_Feature",
UNTRACKED.FILE.EMPTY.ARTIFACT, UNTRACKED.FILE.EMPTY.ARTIFACT, "Base_Feature", "foo"),
artifact.type = c("Feature", "Feature", "Feature","Feature", UNTRACKED.FILE.EMPTY.ARTIFACT.TYPE,
UNTRACKED.FILE.EMPTY.ARTIFACT.TYPE, "Feature", "Feature"),
artifact.diff.size = as.integer(c(1, 1, 1, 1, 0, 0, 1, 1)),
title = c("Add stuff", "Add some more stuff", "I added important things", "I wish it would work now", "Wish", "...", "", ""),
message = c("", "", "the things are\nnothing", "", "intensifies", "still\ndoesn't\nwork\nas expected", "", ""))

# throw away the row names as they are permuted when merging and
# we do not care for their order in the test
rownames(commits) = NULL
rownames(commit.data.expected) = NULL

expect_identical(commits, commit.data.expected)
})

test_that("Merge commit message titles to commit data", {
## initialize a ProjectData object with the ProjectConf
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("commit.messages", "title")
proj.data = ProjectData$new(project.conf = proj.conf)

commits = proj.data$get.commits()

commit.data.expected = data.frame(commit.id = sprintf("<commit-%s>", c(32712, 32713, 32710, 32714, 32715, 32716,
32711, 32711)),
date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 16:00:45", "2016-07-12 16:05:41",
"2016-07-12 16:06:10", "2016-07-12 16:06:20", "2016-07-12 16:06:30",
"2016-07-12 16:06:32", "2016-07-12 16:06:32")),
author.name = c("Björn", "Olaf", "Olaf", "Karl", "Karl", "Thomas", "Thomas", "Thomas"),
author.email = c("[email protected]", "[email protected]", "[email protected]", "[email protected]",
"[email protected]", "[email protected]", "[email protected]", "[email protected]"),
committer.date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-20 10:00:44", "2016-07-12 17:05:55",
"2016-07-12 16:06:10", "2016-07-12 16:06:20", "2016-07-12 16:06:30",
"2016-07-12 16:06:32", "2016-07-12 16:06:32")),
committer.name = c("Björn", "Björn", "Thomas", "Karl", "Karl", "Thomas", "Thomas", "Thomas"),
committer.email = c("[email protected]", "[email protected]", "[email protected]", "[email protected]",
"[email protected]", "[email protected]", "[email protected]", "[email protected]"),
hash = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "5a5ec9675e98187e1e92561e1888aa6f04faa338",
"3a0ed78458b3976243db6829f63eba3eead26774", "1143db502761379c2bfcecc2007fc34282e7ee61",
"418d1dc4929ad1df251d2aeb833dd45757b04a6f", "d01921773fae4bed8186b0aa411d6a2f7a6626e6",
"0a1a5c523d835459c42f33e863623138555e2526", "0a1a5c523d835459c42f33e863623138555e2526"),
changed.files = as.integer(c(1, 1, 1, 1, 1, 1, 1, 1)),
added.lines = as.integer(c(1, 1, 1, 1, 1, 1, 1, 1)),
deleted.lines = as.integer(c(1, 0, 0, 0, 0, 0, 0, 0)),
diff.size = as.integer(c(2, 1, 1, 1, 1, 1, 1, 1)),
file = c("test.c", "test.c", "test2.c", "test3.c", UNTRACKED.FILE,
UNTRACKED.FILE, "test2.c", "test2.c"),
artifact = c("A", "A", "Base_Feature", "Base_Feature",
UNTRACKED.FILE.EMPTY.ARTIFACT, UNTRACKED.FILE.EMPTY.ARTIFACT, "Base_Feature", "foo"),
artifact.type = c("Feature", "Feature", "Feature","Feature", UNTRACKED.FILE.EMPTY.ARTIFACT.TYPE,
UNTRACKED.FILE.EMPTY.ARTIFACT.TYPE, "Feature", "Feature"),
artifact.diff.size = as.integer(c(1, 1, 1, 1, 0, 0, 1, 1)),
title = c("Add stuff", "Add some more stuff", "I added important things", "I wish it would work now", "Wish", "...", "", ""))

# throw away the row names as they are permuted when merging and
# we do not care for their order in the test
rownames(commits) = NULL
rownames(commit.data.expected) = NULL

expect_identical(commits, commit.data.expected)
})
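
Both tests reset the row names because merging permutes them. A small base-R sketch of that effect, using made-up hashes and assuming the merge key is `hash` (the data frames here are illustrative, not coronet's internal ones):

```r
## Illustrative commit data: one row per (commit, file) pair, so a hash can repeat
commits = data.frame(
    hash = c("c3", "a1", "b2", "b2"),
    file = c("test3.c", "test.c", "test2.c", "test2.c"),
    stringsAsFactors = FALSE
)
## Illustrative message data: one row per commit
messages = data.frame(
    hash = c("a1", "b2", "c3"),
    title = c("Add stuff", "", "Wish"),
    stringsAsFactors = FALSE
)

## Left join on the hash; by default merge() sorts the result by the key,
## so the rows no longer appear in the order of 'commits'
merged = merge(commits, messages, by = "hash", all.x = TRUE)

## drop the row names so that comparisons do not depend on them
rownames(merged) = NULL
print(merged)
```

Because every row of `commits` keeps its place in the result, a commit that touches several artifacts (like `b2` here) carries the same title on each of its rows, which matches the duplicated `32711` entries in the expected data above.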
22 changes: 21 additions & 1 deletion tests/test-read.R
@@ -17,6 +17,7 @@
## Copyright 2018 by Thomas Bock <[email protected]>
## Copyright 2018 by Jakob Kronawitter <[email protected]>
## Copyright 2018-2019 by Anselm Fehnker <[email protected]>
## Copyright 2020 by Niklas Schneider <[email protected]>
## All Rights Reserved.


@@ -91,7 +92,6 @@ test_that("Read the raw commit data with the feature artifact.", {
expect_identical(dates, dates.expected, info = "Ordering by date.")
})


test_that("Read the raw commit data with the file artifact.", {

## configuration object for the datapath
@@ -137,6 +137,26 @@ test_that("Read the raw commit data with the file artifact.", {
expect_identical(dates, dates.expected, info = "Ordering by date.")
})

test_that("Read the commit message data.", {

## configuration object for the datapath
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "file")

## read the actual data
commit.message.data.read = read.commit.messages(proj.conf$get.value("datapath"))

## build the expected data.frame
commit.data.expected = data.frame(commit.id = sprintf("<commit-%s>", c(32712, 32713, 32710, 32714, 32715, 32716, 32711)),
hash = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "5a5ec9675e98187e1e92561e1888aa6f04faa338",
"3a0ed78458b3976243db6829f63eba3eead26774", "1143db502761379c2bfcecc2007fc34282e7ee61",
"418d1dc4929ad1df251d2aeb833dd45757b04a6f", "d01921773fae4bed8186b0aa411d6a2f7a6626e6",
"0a1a5c523d835459c42f33e863623138555e2526"),
title = c("Add stuff", "Add some more stuff", "I added important things", "I wish it would work now", "Wish", "...", ""),
message = c("", "", "the things are\nnothing", "", "intensifies", "still\ndoesn't\nwork\nas expected", "" ))

## check the results
expect_identical(commit.message.data.read, commit.data.expected, info = "Commit message data.")
})

test_that("Read the synchronicity data.", {
## configuration object