Additional Core/Peripheral Classification Methods #276

Open · wants to merge 8 commits into base: `dev`
1 change: 1 addition & 0 deletions NEWS.md
@@ -15,6 +15,7 @@
- Add commit network as a new type of network. It uses commits as vertices and connects them either via cochange or commit interactions. This includes adding new config parameters and the function `add.vertex.attribute.commit.network` for adding vertex attributes to a commit network (PR #263, ab73271781e8e9a0715f784936df4b371d64c338, cd9a930fcb54ff465c2a5a7c43cfe82ac15c134d)
- Add `remove.duplicate.edges` function that takes a network as input and conflates identical edges (PR #268, d9a4be417b340812b744f59398ba6460ba527e1c, 0c2f47c4fea6f5f2f582c0259f8cf23af985058a, c6e90dd9cb462232563f753f414da14a24b392a3)
- Add `cumulative` as an argument to `construct.ranges` which enables the creation of cumulative ranges from given revisions (PR #268, a135f6bb6f83ccb03ae27c735c2700fccc1ee0c8, 8ec207f1e306ef6a641fb0205a9982fa89c7e0d9)
- Add four new metrics that can be used for the classification of authors into core and peripheral: Betweenness, Closeness, Pagerank and Eccentricity (PR #276, 65d5c9cc86708777ef458b0c2e744ab4b846bdd1, b392d1a125d0f306b4bce8d95032162a328a3ce2, c5d37d40024e32ad5778fa5971a45bc08f7631e0)

### Changed/Improved

52 changes: 52 additions & 0 deletions README.md
@@ -34,6 +34,9 @@ If you wonder: The name `coronet` derives as an acronym from the words "configur
- [Splitting data and networks based on defined time windows](#splitting-data-and-networks-based-on-defined-time-windows)
- [Cutting data to unified date ranges](#cutting-data-to-unified-date-ranges)
- [Handling data independently](#handling-data-independently)
- [Core/Peripheral classification](#coreperipheral-classification)
- [Count-based metrics](#count-based-metrics)
- [Network-based metrics](#network-based-metrics)
- [How-to](#how-to)
- [File/Module overview](#filemodule-overview)
- [Configuration classes](#configuration-classes)
@@ -375,6 +378,55 @@ Analogously, the `NetworkConf` parameter `unify.date.ranges` enables this very f

In some cases, it is not necessary to build a network to get the information you need. Therefore, please remember that we offer the possibility to get the raw data or mappings between, e.g., authors and the files they edited. The data inside an instance of `ProjectData` can be accessed independently. Examples can be found in the file `showcase.R`.
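
For instance, a minimal sketch of such direct access (the getter name `get.commits` is an assumption here; see `showcase.R` for the authoritative examples):

```R
## access the raw commit data of a "ProjectData" instance directly,
## without building any network (getter name assumed, not confirmed here)
commit.data = proj.data$get.commits()
head(commit.data)
```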

#### Core/Peripheral classification

Core/Peripheral classification describes the process of dividing the authors of a project into `core` and `peripheral` developers, based on the principle that the core developers contribute most of the work in a given project. The concrete threshold can be configured in `CORE.THRESHOLD` and is set to 80% by default, a value commonly used in the literature. In practice, this is done by assigning scores to developers to approximate their importance in a project and then dividing the authors into `core` and `peripheral` based on these scores such that the desired split is achieved.
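
To illustrate this principle, consider the following minimal R sketch (illustrative only, not coronet's actual implementation; the author scores are made-up values): authors are sorted by descending score, and the smallest set of top-scoring authors that jointly reaches the threshold forms the core.

```R
CORE.THRESHOLD = 0.8 # default: core developers cover 80% of the work

## hypothetical importance scores per author (e.g., commit counts)
scores = c(Olaf = 50, Thomas = 30, Karl = 15, Udo = 5)
scores = sort(scores, decreasing = TRUE)

## cumulative share of the total score covered by the top-k authors
cumulative.share = cumsum(scores) / sum(scores)

## the smallest prefix of authors that reaches the threshold is the core
num.core = which(cumulative.share >= CORE.THRESHOLD)[1]
core = names(scores)[seq_len(num.core)]   # "Olaf", "Thomas"
peripheral = setdiff(names(scores), core) # "Karl", "Udo"
```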

##### Count-based metrics

In this section, we describe the count-based metrics we provide for classifying authors as core or peripheral; a usage sketch follows the list.
- `commit.count`
* calculates scores based on the number of commits per author
- `loc.count`
* calculates scores based on the number of lines of code changed by each author
- `mail.count`
* calculates scores based on the number of mails sent per author
- `mail.thread.count`
* calculates scores based on the number of mail threads each author participated in
- `issue.count`
* calculates scores based on the number of issues each author participated in
- `issue.comment.count`
* calculates scores based on the number of comments each author made in issues
- `issue.commented.in.count`
* calculates scores based on the number of issues each author commented in
- `issue.created.count`
* calculates scores based on the number of issues each author created
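
Each of these metrics corresponds to a classification function operating on the project data. As a hedged sketch (the function name `get.author.class.commit.count` is extrapolated from the naming scheme of the network-based classifiers exercised in the tests and is an assumption here; `proj.data` stands for a previously constructed `ProjectData` instance):

```R
## assumed call pattern for a count-based classifier; the name mirrors
## the network-based classifiers and is not confirmed by this diff
result = get.author.class.commit.count(proj.data)

## the classifiers return a list of two data.frames, "core" and
## "peripheral", each holding author names and their scores
result[["core"]]
result[["peripheral"]]
```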

##### Network-based metrics

In this section, we describe the metrics we provide for classifying authors as core or peripheral based on author networks; a usage sketch follows the list.
- `network.degree`
* calculates scores for authors based on the vertex degrees in an author network
* the degree of a vertex is the number of adjacent edges
- `network.eigen`
* calculates scores for authors based on the eigenvector centralities in an author network
* eigenvector centrality measures the importance of vertices within a network by assigning each vertex a score proportional to the sum of the scores of its neighbors
- `network.hierarchy`
* calculates scores for authors based on the hierarchy found within an author network
* hierarchical scores are calculated by dividing the vertex degree by the clustering coefficient of each vertex
- `network.betweenness`
* calculates scores for authors based on the betweenness of vertices in an author network
* betweenness counts, for each vertex, the number of shortest paths between other pairs of vertices that pass through it
- `network.closeness`
* calculates scores for authors based on the closeness of vertices in an author network
* closeness measures how close a vertex is to all other vertices; it is computed as the inverse of the sum of the shortest-path distances from the vertex to all other vertices
- `network.pagerank`
* calculates scores for authors based on the pagerank of vertices in an author network
* pagerank refers to the PageRank algorithm originally employed by Google's web search, which is closely related to eigenvector centrality
- `network.eccentricity`
* calculates scores for authors based on the eccentricity of vertices in an author network
* eccentricity is the shortest-path distance from a vertex to the vertex farthest away from it, i.e., the maximum shortest-path distance from that vertex
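
The corresponding classifier functions appear in the tests of this pull request. A minimal usage sketch, assuming `network` is an author network that has been built beforehand:

```R
## classify authors by the betweenness centrality of their vertices
result = get.author.class.network.betweenness(network)

## as the tests show, the result is a list of two data.frames,
## "core" and "peripheral", each with author names and metric values
core.authors = result[["core"]][["author.name"]]
peripheral.authors = result[["peripheral"]][["author.name"]]
```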

### How-to

In this section, we give a short example of how to initialize all needed objects and build a bipartite network.
112 changes: 112 additions & 0 deletions tests/test-core-peripheral.R
@@ -18,6 +18,7 @@
## Copyright 2019 by Christian Hechtl <[email protected]>
## Copyright 2021 by Christian Hechtl <[email protected]>
## Copyright 2023-2024 by Maximilian Löffler <[email protected]>
## Copyright 2024 by Leo Sendelbach <[email protected]>
Review comment (Collaborator): Please update the copyright header and include 2025 😉

## All Rights Reserved.


@@ -105,6 +106,117 @@ test_that("Eigenvector classification", {
expect_equal(expected, result, tolerance = 0.0001)
})

test_that("Hierarchy classification", {

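## Arrange: construct a small commit-interaction network with three authors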
vertices = data.frame(
name = c("Olaf", "Thomas", "Karl"),
kind = TYPE.AUTHOR,
type = TYPE.AUTHOR
)
edges = data.frame(
from = c("Olaf", "Thomas", "Karl", "Thomas"),
to = c("Thomas", "Karl", "Olaf", "Thomas"),
func = c("GLOBAL", "test2.c::test2", "GLOBAL", "test2.c::test2"),
hash = c("0a1a5c523d835459c42f33e863623138555e2526",
"418d1dc4929ad1df251d2aeb833dd45757b04a6f",
"5a5ec9675e98187e1e92561e1888aa6f04faa338",
"d01921773fae4bed8186b0aa411d6a2f7a6626e6"),
file = c("GLOBAL", "test2.c", "GLOBAL", "test2.c"),
base.hash = c("3a0ed78458b3976243db6829f63eba3eead26774",
"0a1a5c523d835459c42f33e863623138555e2526",
"1143db502761379c2bfcecc2007fc34282e7ee61",
"0a1a5c523d835459c42f33e863623138555e2526"),
base.func = c("test2.c::test2", "test2.c::test2",
"test3.c::test_function", "test2.c::test2"),
base.file = c("test2.c", "test2.c", "test3.c", "test2.c"),
artifact.type = c("CommitInteraction", "CommitInteraction", "CommitInteraction", "CommitInteraction"),
weight = c(1, 1, 1, 1),
type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA),
relation = c("commit.interaction", "commit.interaction", "commit.interaction", "commit.interaction")
)
test.network = igraph::graph_from_data_frame(edges, directed = FALSE, vertices = vertices)

## Act
result = get.author.class.network.hierarchy(test.network)
## Assert
Review comment (Collaborator): Could you please add a blank line before the comment?

expected.core = data.frame(author.name = c("Thomas"),
hierarchy = c(4))
expected.peripheral = data.frame(author.name = c("Olaf", "Karl"),
hierarchy = c(2, 2))
expected = list(core = expected.core, peripheral = expected.peripheral)
row.names(result[["core"]]) = NULL
row.names(result[["peripheral"]]) = NULL
expect_equal(expected, result)
})

test_that("Betweenness classification", {

## Act
result = get.author.class.network.betweenness(network)

## Assert
expected.core = data.frame(author.name = c("Olaf"),
betweenness.centrality = c(1))
expected.peripheral = data.frame(author.name = c("Björn", "udo", "Thomas", "Fritz [email protected]",
"georg", "Hans"),
betweenness.centrality = c(0, 0, 0, 0, 0, 0))
expected = list(core = expected.core, peripheral = expected.peripheral)
row.names(result[["core"]]) = NULL
row.names(result[["peripheral"]]) = NULL
expect_equal(expected, result)
})

test_that("Closeness classification", {

## Act
result = get.author.class.network.closeness(network)

## Assert
expected.core = data.frame(author.name = c("Olaf"),
closeness.centrality = c(0.5))
expected.peripheral = data.frame(author.name = c("Björn", "Thomas", "udo", "Fritz [email protected]",
"georg", "Hans"),
closeness.centrality = c(0.33333, 0.33333, 0.0, 0.0, 0.0, 0.0))
expected = list(core = expected.core, peripheral = expected.peripheral)
row.names(result[["core"]]) = NULL
row.names(result[["peripheral"]]) = NULL
expect_equal(expected, result, tolerance = 0.0001)
})

test_that("Pagerank classification", {

## Act
result = get.author.class.network.pagerank(network)

## Assert
expected.core = data.frame(author.name = c("Olaf"),
pagerank.centrality = c(0.40541))
expected.peripheral = data.frame(author.name = c("Björn", "Thomas", "udo", "Fritz [email protected]",
"georg", "Hans"),
pagerank.centrality = c(0.21396, 0.21396, 0.041667, 0.041667, 0.041667, 0.041667))
expected = list(core = expected.core, peripheral = expected.peripheral)
row.names(result[["core"]]) = NULL
row.names(result[["peripheral"]]) = NULL
expect_equal(expected, result, tolerance = 0.0001)
})

test_that("Eccentricity classification", {

## Act
result = get.author.class.network.eccentricity(network)

## Assert
expected.core = data.frame(author.name = c("Olaf"),
eccentricity = c(1))
expected.peripheral = data.frame(author.name = c("Björn", "udo", "Thomas", "Fritz [email protected]",
"georg", "Hans"),
eccentricity = c(0, 0, 0, 0, 0, 0))
expected = list(core = expected.core, peripheral = expected.peripheral)
row.names(result[["core"]]) = NULL
row.names(result[["peripheral"]]) = NULL
expect_equal(expected, result)
})

# TODO: Add a test for hierarchy classification
Review comment (@bockthom, Jan 14, 2025): Please remove this TODO comment as soon as you have added another test for hierarchy classification. Thanks!


test_that("Commit-count classification using 'result.limit'" , {