Merge pull request #193 from nlschn/mydev
Add commit message merge functionality

Reviewed-by: Claus Hunsen <[email protected]>
Reviewed-by: Thomas Bock <[email protected]>
Reviewed-by: Christian Hechtl <[email protected]>
bockthom authored Feb 3, 2021
2 parents b1eeaf6 + 18843a8 commit c72188e
Showing 14 changed files with 1,008 additions and 431 deletions.
2 changes: 1 addition & 1 deletion .drone.yml
@@ -64,7 +64,7 @@ steps:

- name: R-3.3
pull: if-not-exists
image: r-base:3.3.3
image: r-base:3.3.2
commands: *runTests
depends_on: [clone]

4 changes: 4 additions & 0 deletions NEWS.md
@@ -2,6 +2,10 @@

## Unversioned

### Added
- Add functionality to read and process commit messages in order to merge them into the commit data (see issue #180). Three values are available for the new attribute `commit.messages` in `ProjectConf`: `none`, `title` and `messages` (PR #193, 85b1d0572c0fb9f4c062bceb1363b0398f98b85f, fdc414ade1a640f533e809a25cfe012e42b3cffa, 43e1894998e18faff3a65114fa65ee54e1d2f66e)
- Add functions `cleanup.commit.message.data` and `cleanup.synchronicity.data` to remove, from the commit message data or synchronicity data, commit hashes that are no longer present in the commit data (PR #193, 98e83b037ecc88d9a29e8e4ca93598a9978e85a2)

### Changed/Improved
- Add `.drone.yml` to enable running our CI pipelines on drone.io (PR #191, 1c5804b59c582cf34af6970b435add51452fbd11)
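The two cleanup functions named above drop rows whose commit hash no longer occurs in the commit data. A minimal base-R sketch of that idea, assuming both tables are `data.frame`s with a character column `hash`; the helper `cleanup.by.hash.sketch` is hypothetical and not coronet's actual implementation:

```r
## Hypothetical sketch: keep only the rows of `data` whose hash still occurs in `commits`.
## Assumes both data frames have a character column named `hash`.
cleanup.by.hash.sketch = function(data, commits) {
    keep = data[["hash"]] %in% commits[["hash"]]
    data[keep, , drop = FALSE]
}

## Toy data (hashes shortened for readability); "3a0ed7" is no longer in the commit data
commits = data.frame(hash = c("72c8dd", "5a5ec9"), stringsAsFactors = FALSE)
messages = data.frame(hash = c("72c8dd", "3a0ed7", "5a5ec9"),
                      title = c("Add stuff", "I added important things", "Add some more stuff"),
                      stringsAsFactors = FALSE)
cleaned = cleanup.by.hash.sketch(messages, commits)
print(cleaned$hash)
```

The same filter applies unchanged to synchronicity data, as long as it is keyed by the same `hash` column.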

97 changes: 53 additions & 44 deletions README.md
@@ -10,41 +10,41 @@ If you wonder: The name `coronet` derives as an acronym from the words "configur


## Table of contents

- [Integration](#integration)
* [Requirements](#requirements)
* [R](#r)
* [packrat (recommended)](#packrat)
* [Folder structure of the input data](#folder-structure-of-the-input-data)
* [Needed R packages](#needed-r-packages)
* [Submodule](#submodule)
* [Selecting the correct version](#selecting-the-correct-version)
- [Functionality](#functionality)
* [Configuration](#configuration)
* [Data sources](#data-sources)
* [Network construction](#network-construction)
* [Data sources for network construction](#data-sources-for-network-construction)
* [Types of networks](#types-of-networks)
* [Relations](#relations)
* [Edge-construction algorithms for author networks](#edge-construction-algorithms-for-author-networks)
* [Vertex and edge attributes](#vertex-and-edge-attributes)
* [Further functionalities](#further-functionalities)
* [Splitting data and networks based on defined time windows](#splitting-data-and-networks-based-on-defined-time-windows)
* [Cutting data to unified date ranges](#cutting-data-to-unified-date-ranges)
* [Handling data independently](#handling-data-independently)
* [How-to](#how-to)
* [File/Module overview](#filemodule-overview)
- [Configuration classes](#configuration-classes)
* [ProjectConf](#projectconf)
* [Basic information](#basic-information)
* [Artifact-related information](#artifact-related-information)
* [Revision-related information](#revision-related-information)
* [Data paths](#data-paths)
* [Splitting information](#splitting-information)
* [(Configurable) Data-retrieval-related parameters](#configurable-data-retrieval-related-parameters)
* [NetworkConf](#networkconf)
- [License](#license)
- [Work in progress](#work-in-progress)
- [Integration](#integration)
- [Requirements](#requirements)
- [`R`](#r)
- [`packrat` (recommended)](#packrat-recommended)
- [Folder structure of the input data](#folder-structure-of-the-input-data)
- [Needed R packages](#needed-r-packages)
- [Submodule](#submodule)
- [Selecting the correct version](#selecting-the-correct-version)
- [Functionality](#functionality)
- [Configuration](#configuration)
- [Data sources](#data-sources)
- [Network construction](#network-construction)
- [Data sources for network construction](#data-sources-for-network-construction)
- [Types of networks](#types-of-networks)
- [Relations](#relations)
- [Edge-construction algorithms for author networks](#edge-construction-algorithms-for-author-networks)
- [Vertex and edge attributes](#vertex-and-edge-attributes)
- [Further functionalities](#further-functionalities)
- [Splitting data and networks based on defined time windows](#splitting-data-and-networks-based-on-defined-time-windows)
- [Cutting data to unified date ranges](#cutting-data-to-unified-date-ranges)
- [Handling data independently](#handling-data-independently)
- [How-to](#how-to)
- [File/Module overview](#filemodule-overview)
- [Configuration classes](#configuration-classes)
- [ProjectConf](#projectconf)
- [Basic information](#basic-information)
- [Artifact-related information](#artifact-related-information)
- [Revision-related information](#revision-related-information)
- [Data paths](#data-paths)
- [Splitting information](#splitting-information)
- [(Configurable) Data-retrieval-related parameters](#configurable-data-retrieval-related-parameters)
- [NetworkConf](#networkconf)
- [Contributing](#contributing)
- [License](#license)
- [Work in progress](#work-in-progress)


## Integration
@@ -123,6 +123,7 @@ Alternatively, you can run `Rscript install.R` to install the packages.
- `parallel`: For parallelization
- `logging`: Logging
- `sqldf`: For advanced aggregation of `data.frame` objects
- `data.table`: For faster data processing
- `testthat`: For the test suite
- `patrick`: For the test suite
- `ggplot2`: For plotting of data
@@ -179,11 +180,16 @@ There are two distinguishable types of data sources that are both handled by the
* Issue data (called `"issues"` internally)

- Additional (orthogonal) data sources (augmentable to main data sources, not splittable)
* Commit messages are available through the parameter `commit.messages` in the [`ProjectConf`](#configurable-data-retrieval-related-parameters) class. Three values can be used:
1. `none` is the default value; no commit messages are added to the commit data.
2. `title` merges the commit message titles (i.e., the first non-whitespace line of a commit message) into the commit data, which adds the column `title` to the data frame.
3. `messages` merges both titles and message bodies into the commit data frame, which adds the two columns `title` and `message`.
* [PaStA](https://github.com/lfd/PaStA/) data (patch-stack analysis, see also the parameter `pasta` in the [`ProjectConf`](#configurable-data-retrieval-related-parameters) class)
* Patch-stack analysis to link patches sent to mailing lists and upstream commits
* Synchronicity information on commits (see also the parameter `synchronicity` in the [`ProjectConf`](#configurable-data-retrieval-related-parameters) class)
* Synchronous commits are commits that change a source-code artifact that has also been changed by another author within a configurable time window.
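The three `commit.messages` values differ only in which columns end up in the commit data. A minimal base-R sketch of the intended effect, assuming a commit `data.frame` keyed by `hash` and a message table with the columns `hash`, `title`, and `message`; the function `add.commit.messages.sketch` is hypothetical and not coronet's implementation:

```r
## Hypothetical sketch of the three `commit.messages` modes.
add.commit.messages.sketch = function(commits, messages, mode = c("none", "title", "messages")) {
    mode = match.arg(mode)
    if (mode == "none") {
        return(commits)  # default: the commit data stays unchanged
    }
    columns = if (mode == "title") c("hash", "title") else c("hash", "title", "message")
    ## left join: every commit is kept, even if no message is available for it
    merge(commits, messages[, columns, drop = FALSE], by = "hash", all.x = TRUE, sort = FALSE)
}

commits = data.frame(hash = c("a1", "b2"), author = c("alice", "bob"), stringsAsFactors = FALSE)
messages = data.frame(hash = c("a1", "b2"), title = c("Add stuff", "Fix bug"),
                      message = c("", "Details here"), stringsAsFactors = FALSE)

names(add.commit.messages.sketch(commits, messages, "none"))
names(add.commit.messages.sketch(commits, messages, "title"))
names(add.commit.messages.sketch(commits, messages, "messages"))
```

The left join (`all.x = TRUE`) is a design choice of this sketch: commits without a stored message are kept and receive `NA` in the new columns.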



The important difference is that the *main data sources* are used internally to construct artifact vertices in relevant types of networks. Additionally, these data sources can be used as a basis for splitting `ProjectData` in a time-based or activity-based manner – obtaining `RangeData` instances as a result (see file `split.R` and the contained functions). Thus, `RangeData` objects contain only data of a specific period of time.

The *additional data sources* are orthogonal to the main data sources, can augment them by additional information, and, thus, are not split at any time.
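For the synchronicity data source, the `ProjectConf` documentation states that a commit is synchronous if at least one of its artifacts has been edited by more than one developer within the configured time window (five days by default). coronet reads this data from disk, so the following base-R sketch only illustrates the rule. It assumes a per-artifact change table with the columns `hash`, `author`, `artifact`, and `date` (`POSIXct`) and a symmetric window around each change; the function `compute.synchronicity.sketch` is hypothetical:

```r
## Hypothetical sketch: mark a commit as synchronous if another author changed
## one of its artifacts within `window.days` (before or after) of the commit.
compute.synchronicity.sketch = function(changes, window.days = 5) {
    window.secs = window.days * 24 * 60 * 60
    row.sync = vapply(seq_len(nrow(changes)), function(i) {
        same.artifact = changes$artifact == changes$artifact[i]
        other.author = changes$author != changes$author[i]
        gap = abs(as.numeric(difftime(changes$date, changes$date[i], units = "secs")))
        any(same.artifact & other.author & gap <= window.secs)
    }, logical(1))
    ## a commit is synchronous if at least one of its artifacts qualifies
    sync = tapply(row.sync, changes$hash, any)
    data.frame(hash = names(sync), synchronicity = as.vector(sync), stringsAsFactors = FALSE)
}

changes = data.frame(
    hash = c("c1", "c1", "c2", "c3"),
    author = c("alice", "alice", "bob", "carol"),
    artifact = c("a.R", "b.R", "b.R", "c.R"),
    date = as.POSIXct("2021-02-01", tz = "UTC") + c(0, 0, 2, 30) * 86400,
    stringsAsFactors = FALSE)
result = compute.synchronicity.sketch(changes, window.days = 5)
```

Here `c1` and `c2` both touch `b.R` two days apart by different authors, so both count as synchronous, while `c3` does not.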
@@ -532,16 +538,23 @@ There is no way to update the entries, except for the revision-based parameters.
- `commits.filter.untracked.files`
* Remove all information concerning untracked files from the commit data. The effect is visible when retrieving commits using `get.commits.filtered`: the result does not contain any commits that solely changed untracked files. Networks built on top of this `ProjectData` also do not contain any information about untracked files.
* [*`TRUE`*, `FALSE`]
- `mails.filter.patchstack.mails`
* Filter patchstack mails from the mail data. In a thread, a patchstack spans the first sequence of mails where each mail has been authored by the thread creator and has been sent within a short time window after the preceding mail. The mails spanned by a patchstack are called 'patchstack mails', and for each patchstack, every patchstack mail except the first one is filtered when `mails.filter.patchstack.mails = TRUE`.
* [`TRUE`, *`FALSE`*]
- `commit.messages`
* Read and add commit messages to commits. The column `title` will contain the first line of the message and, if selected, the column `message` will contain the rest.
* [*`none`*, `title`, `messages`]
- `issues.only.comments`
* Only use comments from the issue data on disk and no further events such as references and label changes
* [*`TRUE`*, `FALSE`]
- `issues.from.source`
* Choose from which sources the issue data on disk is read in. Multiple sources can be chosen.
* [*`github`, `jira`*]
- `mails.filter.patchstack.mails`
* Filter patchstack mails from the mail data. In a thread, a patchstack spans the first sequence of mails where each mail has been authored by the thread creator and has been sent within a short time window after the preceding mail. The mails spanned by a patchstack are called 'patchstack mails', and for each patchstack, every patchstack mail except the first one is filtered when `mails.filter.patchstack.mails = TRUE`.
* [`TRUE`, *`FALSE`*]
- `pasta`
* Read and integrate [PaStA](https://github.com/lfd/PaStA/) data with commit and mail data (columns `pasta` and `revision.set.id`)
* [`TRUE`, *`FALSE`*]
* **Note**: To include PaStA-based edge attributes, you need to add `"pasta"` to `edge.attributes`.
- `synchronicity`
* Read and add synchronicity data to commits (column `synchronicity`)
* [`TRUE`, *`FALSE`*]
@@ -550,10 +563,6 @@ There is no way to update the entries, except for the revision-based parameters.
* The time window (in days) to use for synchronicity data if enabled by `synchronicity = TRUE`
* [1, *5*, 10, 15]
* **Note**: If at least one artifact in a commit has been edited by more than one developer within the configured time window, then the whole commit is considered to be synchronous.
- `pasta`
* Read and integrate [PaStA](https://github.com/lfd/PaStA/) data with commit and mail data (columns `pasta` and `revision.set.id`)
* [`TRUE`, *`FALSE`*]
* **Note**: To include PaStA-based edge attributes, you need to add `"pasta"` to `edge.attributes`.
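The patchstack definition given for `mails.filter.patchstack.mails` can be sketched in base R as follows. This is an illustration under assumptions, not coronet's code: the columns `thread`, `author`, and `date` (`POSIXct`), the helper name `filter.patchstack.sketch`, and the 60-second default for the "short time window" are all hypothetical, since the README does not state the window length.

```r
## Hypothetical sketch: drop every patchstack mail except the first one in each thread.
filter.patchstack.sketch = function(mails, window.secs = 60) {
    mails = mails[order(mails$thread, mails$date), , drop = FALSE]
    is.filtered = logical(nrow(mails))
    for (rows in split(seq_len(nrow(mails)), mails$thread)) {
        creator = mails$author[rows[1]]  # the thread creator wrote the first mail
        i = 2
        while (i <= length(rows)) {
            current = rows[i]
            previous = rows[i - 1]
            gap = as.numeric(difftime(mails$date[current], mails$date[previous], units = "secs"))
            if (mails$author[current] != creator || gap > window.secs) {
                break  # the patchstack ends at the first mail violating either condition
            }
            is.filtered[current] = TRUE  # a patchstack mail other than the first one
            i = i + 1
        }
    }
    mails[!is.filtered, , drop = FALSE]
}

mails = data.frame(
    thread = c("t1", "t1", "t1", "t1", "t2"),
    author = c("alice", "alice", "alice", "bob", "carol"),
    date = as.POSIXct("2021-02-03 10:00:00", tz = "UTC") + c(0, 20, 40, 50, 0),
    stringsAsFactors = FALSE)
filtered = filter.patchstack.sketch(mails)
```

In thread `t1`, the second and third mails by `alice` follow the first within the window and are filtered; the reply by `bob` ends the patchstack.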

### NetworkConf

1 change: 1 addition & 0 deletions install.R
@@ -32,6 +32,7 @@ packages = c(
"parallel",
"logging",
"sqldf",
"data.table",
"testthat",
"patrick",
"ggplot2",
@@ -0,0 +1,7 @@
32712;"72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0";"Add stuff"
32713;"5a5ec9675e98187e1e92561e1888aa6f04faa338";" Add some more stuff "
32710;"3a0ed78458b3976243db6829f63eba3eead26774";" I added important things the things are nothing"
32714;"1143db502761379c2bfcecc2007fc34282e7ee61";" I wish it would work now"
32715;"418d1dc4929ad1df251d2aeb833dd45757b04a6f";"Wish intensifies"
32716;"d01921773fae4bed8186b0aa411d6a2f7a6626e6";" ... still doesn't work as expected "
32711;"0a1a5c523d835459c42f33e863623138555e2526";""
@@ -0,0 +1,7 @@
32712;"72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0";"Add stuff"
32713;"5a5ec9675e98187e1e92561e1888aa6f04faa338";" Add some more stuff "
32710;"3a0ed78458b3976243db6829f63eba3eead26774";" I added important things the things are nothing"
32714;"1143db502761379c2bfcecc2007fc34282e7ee61";" I wish it would work now"
32715;"418d1dc4929ad1df251d2aeb833dd45757b04a6f";"Wish intensifies"
32716;"d01921773fae4bed8186b0aa411d6a2f7a6626e6";" ... still doesn't work as expected "
32711;"0a1a5c523d835459c42f33e863623138555e2526";""
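The added test data appears to store one commit per line as `id;"hash";"message"`, with deliberately padded whitespace and an empty message. Assuming that layout, the following base-R sketch reads such lines and derives `title` as the first non-whitespace line and `message` as the remaining text, following the README's description of `commit.messages`. The column names and the helper `split.title.sketch` are hypothetical; this is not coronet's reader.

```r
## Split a raw commit message into its title (first non-whitespace line) and body.
split.title.sketch = function(text) {
    lines = strsplit(text, "\n", fixed = TRUE)[[1]]
    non.empty = which(trimws(lines) != "")
    if (length(non.empty) == 0) {
        return(c(title = "", message = ""))  # empty message: no title, no body
    }
    first = non.empty[1]
    rest = lines[-seq_len(first)]  # everything after the title line
    c(title = trimws(lines[first]), message = trimws(paste(rest, collapse = "\n")))
}

## Three lines copied from the test data; `text =` stands in for the real file path.
raw = read.table(
    text = c('32712;"72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0";"Add stuff"',
             '32713;"5a5ec9675e98187e1e92561e1888aa6f04faa338";" Add some more stuff "',
             '32711;"0a1a5c523d835459c42f33e863623138555e2526";""'),
    sep = ";", quote = "\"", comment.char = "", header = FALSE,
    colClasses = c("integer", "character", "character"),
    col.names = c("id", "hash", "raw.message"))

parts = t(vapply(raw$raw.message, split.title.sketch, c(title = "", message = ""),
                 USE.NAMES = FALSE))
result = data.frame(hash = raw$hash, parts, stringsAsFactors = FALSE)
print(result$title)
```

Setting `comment.char = ""` keeps messages such as "Fix #12" intact, because `read.table` would otherwise treat `#` as the start of a comment.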