diff --git a/NEWS.md b/NEWS.md
index 297aa190..0f51cbc0 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -16,12 +16,19 @@ and their respective centrality values (d3cd528609480f87658601ef13326e950a74cce7)
 - Add `get.data.sources.from.relations` to `util-networks.R` which extracts the data sources of a network that were used when building it (PR #195, d1e4413a49ab83a115a7f26719592876371ab264)
 - Add tests for the `get.data.sources.from.relations` function (PR #195, add0c746dde8279da41d180deecf52b91a46095c)
 - Add logo directory containing several logo variants (PR #196, 82f99719b8a43b0a44914b67b26bf1a34bb076c6, dc4659ea354e97159f8ee6167811db544f0b7087, fdc5e677325225f92d1f99948cb9441bfe1d150d, 752a9b376ffeffd5d6b380b4fdba838a890e3ef7)
+- Add function `preprocess.issue.data`, which implements common issue-data filtering operations (fcf5cee64c809d62a33275cbd3272b8087869eea, a566caec6d7e649cc495d292a19eca8a7ffccde8, 5ba6feb988c44e2ba398bccce6c88e69d3bb552e)
+- Add function `get.issues.filtered.uncached`, which retrieves the filtered issues without reading from or writing to the cache (eb919fad9519d6e1a23261977bb3bfa2b899aaf9)
+- Add per-author vertex attributes that count issues, issue creations, issue comments, mails, and mail threads (e.g., mail-thread count and issue-creation count) (PR #194, issue #188, 9f9150a97ffbb64607df0ddcbce299e16c2580da, 7260d62cf6f1470584753f76970d19664638eeed, 139f70b67903909bcd4c57e26afa458681a869f2, eb4f649c04155e22195627072e0f08bb8fe65dc4, 627873c641410182ca8fee0e78b95d7bda1e8e6b, 1e1406f4a0898cac3e61a7bc9a5aa665dceef79f, 98e11abc651b5fe0ec994eddea078635b0d6f0b2, a566caec6d7e649cc495d292a19eca8a7ffccde8)
 
 ### Changed/Improved
 - Add `.drone.yml` to enable running our CI pipelines on drone.io (PR #191, 1c5804b59c582cf34af6970b435add51452fbd11)
 - Update documentation in `util-network-metrics.R` (PR #195, f929248182594613bd203e100268e3e3dce87f34, de9988cc171cafdd084701d5a2693a74176a802a)
 - Add check for empty network in `metrics.hub.degree` function. In the case of an empty network, a warning is being printed and `NA` is returned (PR #195, 4b164bebea1e8258cb93febf51271a4b6f486779)
 - Adjust the function `ProjectData$get.artifacts`: Rename its single parameter to `data.sources` and change the function so that it can extract the artifacts for multiple data sources at once. The default is still that only artifacts from the commit data are extracted. (PR #195, cf795f26652b00de5d717c703c688af55a972943, 70c05ecd1e3c0f10810acc2b2ae06a3eb8856317, 5a46ff4d428af7f301fe57d6e9e10421f553a9cc, fd767bb37ca608c28d9ff4a449415cc0e863d7ee)
+- Rename `get.issues` to `get.issues.filtered`, and add a new `get.issues` that returns the unfiltered issues, so that these methods follow the naming scheme known from the respective methods for commits (b9dd94c8575b8cab40d0d1185368854f84299d87)
+
+### Fixed
+- Fix the fencing of issue timing data so that issue events can only "happen" after the issue was created. Only `commit_added` events are affected, so only these are adjusted. (issue #185, 627873c641410182ca8fee0e78b95d7bda1e8e6b, 6ff585d9da1da3432668605f0c09f8e182ad0d2f)
 
 
 ## 3.7
diff --git a/README.md b/README.md
index d6109ebb..592be28d 100644
--- a/README.md
+++ b/README.md
@@ -424,6 +424,8 @@ Additionally, for more examples, the file `showcase.R` is worth a look.
* Functionality to add vertex attributes to existing networks - `util-networks-metrics.R` * A set of network-metric functions +- `util-data-misc.R` + * Helper functions for data handling and the calculation of associated metrics - `util-networks-misc.R` * Helper functions for network creation (e.g., create adjacency matrices) - `util-tensor.R` diff --git a/showcase.R b/showcase.R index e6a1d00c..bd3e8e22 100644 --- a/showcase.R +++ b/showcase.R @@ -186,7 +186,9 @@ y = NetworkBuilder$new(project.data = y.data, network.conf = net.conf) # sample.cumulative = add.vertex.attribute.commit.count.author(my.networks, x.data, aggregation.level = "cumulative") # ## add email-address vertex attribute # sample.mail = add.vertex.attribute.author.email(my.networks, x.data, "author.email") - +# sample.mail.thread = add.vertex.attribute.mail.thread.count(my.networks, x.data) +# sample.issues.created = add.vertex.attribute.issue.creation.count(my.networks, x.data) +# sample.pull.requests = add.vertex.attribute.issue.count(my.networks, x.data, issue.type = "pull.requests") # ## add vertex attributes for the project-level network # x.net.as.list = list("1970-01-01 00:00:00-2030-01-01 00:00:00" = x$get.author.network()) # sample.entire = add.vertex.attribute.commit.count.author(x.net.as.list, x.data, aggregation.level = "complete") diff --git a/tests/codeface-data/results/testing/test_feature/feature/issues-github.list b/tests/codeface-data/results/testing/test_feature/feature/issues-github.list index 706b8c40..e287bc5b 100644 --- a/tests/codeface-data/results/testing/test_feature/feature/issues-github.list +++ b/tests/codeface-data/results/testing/test_feature/feature/issues-github.list @@ -1,10 +1,10 @@ -3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"[]";"created";"Karl";"karl@example.org";"2016-07-12 15:59:25";"open";"[]" -3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"[]";"assigned";"Olaf";"olaf@example.org";"2016-07-12 15:59:25";"";"""""" -3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"[]";"commented";"Karl";"karl@example.org";"2016-07-12 15:59:59";"open";"[]" -3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"[]";"state_updated";"Olaf";"olaf@example.org";"2016-07-12 16:06:30";"closed";"""open""" -3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"[]";"add_link";"Karl";"karl@example.org";"2016-08-07 15:37:02";"930af63a030fb92e48eddff01f53284c3eeba80e";"""commit""" -3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"[]";"referenced";"Karl";"karl@example.org";"2016-08-31 16:45:09";"";"""""" -3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"[]";"referenced";"Thomas";"thomas@example.org";"2016-10-05 16:45:09";"";"""""" +3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-07-12 
16:06:30";"[]";"created";"Karl";"karl@example.org";"2016-07-12 15:59:25";"open";"[]" +3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-07-12 16:06:30";"[]";"assigned";"Olaf";"olaf@example.org";"2016-07-12 15:59:25";"";"""""" +3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-07-12 16:06:30";"[]";"commented";"Karl";"karl@example.org";"2016-07-12 15:59:59";"open";"[]" +3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-07-12 16:06:30";"[]";"state_updated";"Olaf";"olaf@example.org";"2016-07-12 16:06:30";"closed";"""open""" +3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-07-12 16:06:30";"[]";"add_link";"Karl";"karl@example.org";"2016-08-07 15:37:02";"930af63a030fb92e48eddff01f53284c3eeba80e";"""commit""" +3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-07-12 16:06:30";"[]";"referenced";"Karl";"karl@example.org";"2016-08-31 16:45:09";"";"""""" +3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-07-12 16:06:30";"[]";"referenced";"Thomas";"thomas@example.org";"2016-10-05 16:45:09";"";"""""" 6;"Distinguish directedness of networks and edge-construction algorithm";"[""issue"", ""bug"", ""enhancement""]";"open";"[]";"2016-07-12 14:30:13";"";"[]";"mentioned";"udo";"udo@example.org";"2016-07-12 15:30:02";"Thomas";"""thomas@example.org""" 6;"Distinguish directedness of networks and edge-construction algorithm";"[""issue"", ""bug"", ""enhancement""]";"open";"[]";"2016-07-12 14:30:13";"";"[]";"subscribed";"udo";"udo@example.org";"2016-07-12 15:30:02";"Thomas";"""thomas@example.org""" 6;"Distinguish directedness of networks and edge-construction algorithm";"[""issue"", ""bug"", ""enhancement""]";"open";"[]";"2016-07-12 14:30:13";"";"[]";"commented";"Thomas";"thomas@example.org";"2016-07-12 16:03:59";"open";"[]" @@ -15,3 +15,16 @@ 6;"Distinguish directedness of networks and edge-construction algorithm";"[""issue"", ""bug"", ""enhancement""]";"open";"[]";"2016-07-12 14:30:13";"";"[]";"subscribed";"Björn";"bjoern@example.org";"2016-12-07 15:30:02";"udo";"""udo@example.org""" 6;"Distinguish directedness of networks and edge-construction algorithm";"[""issue"", ""bug"", ""enhancement""]";"open";"[]";"2016-07-12 14:30:13";"";"[]";"labeled";"Olaf";"olaf@example.org";"2017-05-23 12:31:34";"decided";"""""" 6;"Distinguish directedness of networks and edge-construction algorithm";"[""issue"", ""bug"", ""enhancement""]";"open";"[]";"2016-07-12 14:30:13";"";"[]";"commented";"Björn";"bjoern@example.org";"2017-05-23 12:32:39";"open";"[]" +"1";"Example pull request 1";"[""pull request""]";"reopened";"[]";"2016-07-14 13:37:00";"";"[]";"created";"Thomas";"thomas@example.org";"2016-07-12 15:59:25";"open";"[]" +"1";"Example pull request 1";"[""pull request""]";"reopened";"[]";"2016-07-14 13:37:00";"";"[]";"commented";"Thomas";"thomas@example.org";"2016-07-12 15:59:25";"open";"[]" +"1";"Example pull request 1";"[""pull request""]";"reopened";"[]";"2016-07-14 13:37:00";"";"[]";"state_updated";"Thomas";"thomas@example.org";"2016-07-12 15:59:59";"closed";"""open""" +"1";"Example pull request 1";"[""pull 
request""]";"reopened";"[]";"2016-07-14 13:37:00";"";"[]";"commented";"Olaf";"olaf@example.org";"2016-07-12 16:01:01";"closed";"[]" +"1";"Example pull request 1";"[""pull request""]";"reopened";"[]";"2016-07-14 13:37:00";"";"[]";"commented";"Björn";"bjoern@example.org";"2016-07-12 16:06:01";"closed";"[]" +"1";"Example pull request 1";"[""pull request""]";"reopened";"[]";"2016-07-14 13:37:00";"";"[]";"state_updated";"Olaf";"olaf@example.org";"2016-07-14 13:37:00";"open";"""closed""" +"2";"Example pull request 2";"[""pull request""]";"closed";"[]";"2016-07-12 14:59:25";"2016-07-12 16:04:59";"[]";"created";"Björn";"bjoern@example.org";"2016-07-12 14:59:25";"open";"[]" +"2";"Example pull request 2";"[""pull request""]";"closed";"[]";"2016-07-12 14:59:25";"2016-07-12 16:04:59";"[]";"commented";"Björn";"bjoern@example.org";"2016-07-12 14:59:25";"open";"[]" +"2";"Example pull request 2";"[""pull request""]";"closed";"[]";"2016-07-12 14:59:25";"2016-07-12 16:04:59";"[]";"merged";"Olaf";"olaf@example.org";"2016-07-12 16:04:59";"";"""""" +"2";"Example pull request 2";"[""pull request""]";"closed";"[]";"2016-07-12 14:59:25";"2016-07-12 16:04:59";"[]";"state_updated";"Olaf";"olaf@example.org";"2016-07-12 16:04:59";"closed";"""open""" +"4";"Example pull request 4";"[""pull request"", ""enhancement""]";"open";"[]";"2016-07-12 16:02:02";"";"[]";"commit_added";"Björn";"bjoern@example.org";"2016-07-12 15:58:59";"72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0";"""""" +"4";"Example pull request 4";"[""pull request"", ""enhancement""]";"open";"[]";"2016-07-12 16:02:02";"";"[]";"created";"Olaf";"olaf@example.org";"2016-07-12 16:02:02";"open";"[]" +"4";"Example pull request 4";"[""pull request"", ""enhancement""]";"open";"[]";"2016-07-12 16:02:02";"";"[]";"commented";"Olaf";"olaf@example.org";"2016-07-12 16:02:02";"open";"[]" diff --git a/tests/codeface-data/results/testing/test_feature/feature/issues-jira.list b/tests/codeface-data/results/testing/test_feature/feature/issues-jira.list index 529fa520..3740aa58 100644 --- a/tests/codeface-data/results/testing/test_feature/feature/issues-jira.list +++ b/tests/codeface-data/results/testing/test_feature/feature/issues-jira.list @@ -11,9 +11,9 @@ "ZEPPELIN-328";"[ZEPPELIN-328] Interpreter page should clarify the % magic syntax for interpreter group.name";"[""issue"", ""bug""]";"closed";"[""fixed""]";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"[""GUI"", ""Interpreters""]";"commented";"Olaf";"olaf@example.org";"2013-05-25 06:22:23";"open";"[""unresolved""]" "ZEPPELIN-328";"[ZEPPELIN-328] Interpreter page should clarify the % magic syntax for interpreter group.name";"[""issue"", ""bug""]";"closed";"[""fixed""]";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"[""GUI"", ""Interpreters""]";"commented";"Olaf";"olaf@example.org";"2013-06-01 06:50:26";"open";"[""unresolved""]" "ZEPPELIN-328";"[ZEPPELIN-328] Interpreter page should clarify the % magic syntax for interpreter group.name";"[""issue"", ""bug""]";"closed";"[""fixed""]";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"[""GUI"", ""Interpreters""]";"resolution_updated";"Björn";"bjoern@example.org";"2013-06-01 06:53:06";"fixed";"""unresolved""" -"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-04-17 02:06:38";"";"[""Interpreters""]";"created";"Björn";"bjoern@example.org";"2016-07-12 16:01:30";"open";"[""unresolved""]" -"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", 
""bug""]";"open";"[""unresolved""]";"2016-04-17 02:06:38";"";"[""Interpreters""]";"commented";"Björn";"bjoern@example.org";"2016-07-12 16:02:30";"open";"[""unresolved""]" -"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-04-17 02:06:38";"";"[""Interpreters""]";"commented";"Björn";"bjoern@example.org";"2016-07-15 19:55:39";"open";"[""unresolved""]" -"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-04-17 02:06:38";"";"[""Interpreters""]";"commented";"Max";"max@example.org";"2016-07-15 20:07:47";"open";"[""unresolved""]" -"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-04-17 02:06:38";"";"[""Interpreters""]";"commented";"Max";"max@example.org";"2016-07-27 20:12:08";"open";"[""unresolved""]" -"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-04-17 02:06:38";"";"[""Interpreters""]";"commented";"Max";"max@example.org";"2016-07-28 06:27:52";"open";"[""unresolved""]" +"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"created";"Björn";"bjoern@example.org";"2016-07-12 16:01:30";"open";"[""unresolved""]" +"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"commented";"Björn";"bjoern@example.org";"2016-07-12 16:02:30";"open";"[""unresolved""]" +"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"commented";"Björn";"bjoern@example.org";"2016-07-15 19:55:39";"open";"[""unresolved""]" +"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"commented";"Max";"max@example.org";"2016-07-15 20:07:47";"open";"[""unresolved""]" +"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"commented";"Max";"max@example.org";"2016-07-27 20:12:08";"open";"[""unresolved""]" +"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"commented";"Max";"max@example.org";"2016-07-28 06:27:52";"open";"[""unresolved""]" diff --git a/tests/codeface-data/results/testing/test_proximity/proximity/issues-github.list b/tests/codeface-data/results/testing/test_proximity/proximity/issues-github.list index 706b8c40..e287bc5b 100644 --- a/tests/codeface-data/results/testing/test_proximity/proximity/issues-github.list +++ b/tests/codeface-data/results/testing/test_proximity/proximity/issues-github.list @@ -1,10 +1,10 @@ -3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"[]";"created";"Karl";"karl@example.org";"2016-07-12 15:59:25";"open";"[]" -3;"Error in construct.networks.from.list for 
openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"[]";"assigned";"Olaf";"olaf@example.org";"2016-07-12 15:59:25";"";"""""" -3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"[]";"commented";"Karl";"karl@example.org";"2016-07-12 15:59:59";"open";"[]" -3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"[]";"state_updated";"Olaf";"olaf@example.org";"2016-07-12 16:06:30";"closed";"""open""" -3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"[]";"add_link";"Karl";"karl@example.org";"2016-08-07 15:37:02";"930af63a030fb92e48eddff01f53284c3eeba80e";"""commit""" -3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"[]";"referenced";"Karl";"karl@example.org";"2016-08-31 16:45:09";"";"""""" -3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-12-07 15:37:02";"[]";"referenced";"Thomas";"thomas@example.org";"2016-10-05 16:45:09";"";"""""" +3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-07-12 16:06:30";"[]";"created";"Karl";"karl@example.org";"2016-07-12 15:59:25";"open";"[]" +3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-07-12 16:06:30";"[]";"assigned";"Olaf";"olaf@example.org";"2016-07-12 15:59:25";"";"""""" +3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-07-12 16:06:30";"[]";"commented";"Karl";"karl@example.org";"2016-07-12 15:59:59";"open";"[]" +3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-07-12 16:06:30";"[]";"state_updated";"Olaf";"olaf@example.org";"2016-07-12 16:06:30";"closed";"""open""" +3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-07-12 16:06:30";"[]";"add_link";"Karl";"karl@example.org";"2016-08-07 15:37:02";"930af63a030fb92e48eddff01f53284c3eeba80e";"""commit""" +3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-07-12 16:06:30";"[]";"referenced";"Karl";"karl@example.org";"2016-08-31 16:45:09";"";"""""" +3;"Error in construct.networks.from.list for openssl function networks";"[""issue"", ""bug""]";"closed";"[]";"2016-07-12 15:59:25";"2016-07-12 16:06:30";"[]";"referenced";"Thomas";"thomas@example.org";"2016-10-05 16:45:09";"";"""""" 6;"Distinguish directedness of networks and edge-construction algorithm";"[""issue"", ""bug"", ""enhancement""]";"open";"[]";"2016-07-12 14:30:13";"";"[]";"mentioned";"udo";"udo@example.org";"2016-07-12 15:30:02";"Thomas";"""thomas@example.org""" 6;"Distinguish directedness of networks and edge-construction algorithm";"[""issue"", ""bug"", ""enhancement""]";"open";"[]";"2016-07-12 14:30:13";"";"[]";"subscribed";"udo";"udo@example.org";"2016-07-12 
15:30:02";"Thomas";"""thomas@example.org""" 6;"Distinguish directedness of networks and edge-construction algorithm";"[""issue"", ""bug"", ""enhancement""]";"open";"[]";"2016-07-12 14:30:13";"";"[]";"commented";"Thomas";"thomas@example.org";"2016-07-12 16:03:59";"open";"[]" @@ -15,3 +15,16 @@ 6;"Distinguish directedness of networks and edge-construction algorithm";"[""issue"", ""bug"", ""enhancement""]";"open";"[]";"2016-07-12 14:30:13";"";"[]";"subscribed";"Björn";"bjoern@example.org";"2016-12-07 15:30:02";"udo";"""udo@example.org""" 6;"Distinguish directedness of networks and edge-construction algorithm";"[""issue"", ""bug"", ""enhancement""]";"open";"[]";"2016-07-12 14:30:13";"";"[]";"labeled";"Olaf";"olaf@example.org";"2017-05-23 12:31:34";"decided";"""""" 6;"Distinguish directedness of networks and edge-construction algorithm";"[""issue"", ""bug"", ""enhancement""]";"open";"[]";"2016-07-12 14:30:13";"";"[]";"commented";"Björn";"bjoern@example.org";"2017-05-23 12:32:39";"open";"[]" +"1";"Example pull request 1";"[""pull request""]";"reopened";"[]";"2016-07-14 13:37:00";"";"[]";"created";"Thomas";"thomas@example.org";"2016-07-12 15:59:25";"open";"[]" +"1";"Example pull request 1";"[""pull request""]";"reopened";"[]";"2016-07-14 13:37:00";"";"[]";"commented";"Thomas";"thomas@example.org";"2016-07-12 15:59:25";"open";"[]" +"1";"Example pull request 1";"[""pull request""]";"reopened";"[]";"2016-07-14 13:37:00";"";"[]";"state_updated";"Thomas";"thomas@example.org";"2016-07-12 15:59:59";"closed";"""open""" +"1";"Example pull request 1";"[""pull request""]";"reopened";"[]";"2016-07-14 13:37:00";"";"[]";"commented";"Olaf";"olaf@example.org";"2016-07-12 16:01:01";"closed";"[]" +"1";"Example pull request 1";"[""pull request""]";"reopened";"[]";"2016-07-14 13:37:00";"";"[]";"commented";"Björn";"bjoern@example.org";"2016-07-12 16:06:01";"closed";"[]" +"1";"Example pull request 1";"[""pull request""]";"reopened";"[]";"2016-07-14 13:37:00";"";"[]";"state_updated";"Olaf";"olaf@example.org";"2016-07-14 13:37:00";"open";"""closed""" +"2";"Example pull request 2";"[""pull request""]";"closed";"[]";"2016-07-12 14:59:25";"2016-07-12 16:04:59";"[]";"created";"Björn";"bjoern@example.org";"2016-07-12 14:59:25";"open";"[]" +"2";"Example pull request 2";"[""pull request""]";"closed";"[]";"2016-07-12 14:59:25";"2016-07-12 16:04:59";"[]";"commented";"Björn";"bjoern@example.org";"2016-07-12 14:59:25";"open";"[]" +"2";"Example pull request 2";"[""pull request""]";"closed";"[]";"2016-07-12 14:59:25";"2016-07-12 16:04:59";"[]";"merged";"Olaf";"olaf@example.org";"2016-07-12 16:04:59";"";"""""" +"2";"Example pull request 2";"[""pull request""]";"closed";"[]";"2016-07-12 14:59:25";"2016-07-12 16:04:59";"[]";"state_updated";"Olaf";"olaf@example.org";"2016-07-12 16:04:59";"closed";"""open""" +"4";"Example pull request 4";"[""pull request"", ""enhancement""]";"open";"[]";"2016-07-12 16:02:02";"";"[]";"commit_added";"Björn";"bjoern@example.org";"2016-07-12 15:58:59";"72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0";"""""" +"4";"Example pull request 4";"[""pull request"", ""enhancement""]";"open";"[]";"2016-07-12 16:02:02";"";"[]";"created";"Olaf";"olaf@example.org";"2016-07-12 16:02:02";"open";"[]" +"4";"Example pull request 4";"[""pull request"", ""enhancement""]";"open";"[]";"2016-07-12 16:02:02";"";"[]";"commented";"Olaf";"olaf@example.org";"2016-07-12 16:02:02";"open";"[]" diff --git a/tests/codeface-data/results/testing/test_proximity/proximity/issues-jira.list 
b/tests/codeface-data/results/testing/test_proximity/proximity/issues-jira.list index 529fa520..3740aa58 100644 --- a/tests/codeface-data/results/testing/test_proximity/proximity/issues-jira.list +++ b/tests/codeface-data/results/testing/test_proximity/proximity/issues-jira.list @@ -11,9 +11,9 @@ "ZEPPELIN-328";"[ZEPPELIN-328] Interpreter page should clarify the % magic syntax for interpreter group.name";"[""issue"", ""bug""]";"closed";"[""fixed""]";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"[""GUI"", ""Interpreters""]";"commented";"Olaf";"olaf@example.org";"2013-05-25 06:22:23";"open";"[""unresolved""]" "ZEPPELIN-328";"[ZEPPELIN-328] Interpreter page should clarify the % magic syntax for interpreter group.name";"[""issue"", ""bug""]";"closed";"[""fixed""]";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"[""GUI"", ""Interpreters""]";"commented";"Olaf";"olaf@example.org";"2013-06-01 06:50:26";"open";"[""unresolved""]" "ZEPPELIN-328";"[ZEPPELIN-328] Interpreter page should clarify the % magic syntax for interpreter group.name";"[""issue"", ""bug""]";"closed";"[""fixed""]";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"[""GUI"", ""Interpreters""]";"resolution_updated";"Björn";"bjoern@example.org";"2013-06-01 06:53:06";"fixed";"""unresolved""" -"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-04-17 02:06:38";"";"[""Interpreters""]";"created";"Björn";"bjoern@example.org";"2016-07-12 16:01:30";"open";"[""unresolved""]" -"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-04-17 02:06:38";"";"[""Interpreters""]";"commented";"Björn";"bjoern@example.org";"2016-07-12 16:02:30";"open";"[""unresolved""]" -"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-04-17 02:06:38";"";"[""Interpreters""]";"commented";"Björn";"bjoern@example.org";"2016-07-15 19:55:39";"open";"[""unresolved""]" -"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-04-17 02:06:38";"";"[""Interpreters""]";"commented";"Max";"max@example.org";"2016-07-15 20:07:47";"open";"[""unresolved""]" -"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-04-17 02:06:38";"";"[""Interpreters""]";"commented";"Max";"max@example.org";"2016-07-27 20:12:08";"open";"[""unresolved""]" -"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-04-17 02:06:38";"";"[""Interpreters""]";"commented";"Max";"max@example.org";"2016-07-28 06:27:52";"open";"[""unresolved""]" +"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"created";"Björn";"bjoern@example.org";"2016-07-12 16:01:30";"open";"[""unresolved""]" +"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"commented";"Björn";"bjoern@example.org";"2016-07-12 16:02:30";"open";"[""unresolved""]" +"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", 
""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"commented";"Björn";"bjoern@example.org";"2016-07-15 19:55:39";"open";"[""unresolved""]" +"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"commented";"Max";"max@example.org";"2016-07-15 20:07:47";"open";"[""unresolved""]" +"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"commented";"Max";"max@example.org";"2016-07-27 20:12:08";"open";"[""unresolved""]" +"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"commented";"Max";"max@example.org";"2016-07-28 06:27:52";"open";"[""unresolved""]" diff --git a/tests/test-data.R b/tests/test-data.R index 8794abef..89404202 100644 --- a/tests/test-data.R +++ b/tests/test-data.R @@ -15,6 +15,7 @@ ## Copyright 2018-2019 by Claus Hunsen ## Copyright 2019 by Jakob Kronawitter ## Copyright 2020-2021 by Niklas Schneider +## Copyright 2021 by Johannes Hostert ## All Rights Reserved. @@ -71,11 +72,11 @@ test_that("Compare two ProjectData objects", { expect_true(proj.data.one$equals(proj.data.two), "Two identical ProjectData objects.") - proj.data.one$get.issues() + proj.data.one$get.issues.filtered() expect_false(proj.data.one$equals(proj.data.two), "Two not identical ProjectData objects.") - proj.data.two$get.issues() + proj.data.two$get.issues.filtered() expect_true(proj.data.one$equals(proj.data.two), "Two identical ProjectData objects.") diff --git a/tests/test-networks-author.R b/tests/test-networks-author.R index 911a604e..4fee98ac 100644 --- a/tests/test-networks-author.R +++ b/tests/test-networks-author.R @@ -18,6 +18,7 @@ ## Copyright 2018 by Thomas Bock ## Copyright 2018 by Jakob Kronawitter ## Copyright 2018-2019 by Anselm Fehnker +## Copyright 2021 by Johannes Hostert ## All Rights Reserved. 
@@ -465,26 +466,39 @@ test_that("Network construction of the undirected author-issue network with all network.built = network.builder$get.author.network() ## vertex attributes - vertices = data.frame(name = c("Karl", "Olaf", "Thomas", "udo", "Björn", "Max"), + vertices = data.frame(name = c("Thomas", "Olaf", "Björn", "Karl", "udo", "Max"), kind = TYPE.AUTHOR, type = TYPE.AUTHOR) ## edge attributes - edges = data.frame(from = c(rep("Karl", 6), rep("Karl", 5), rep("Olaf", 3), # + edges = data.frame(from = c(rep("Thomas", 5), rep("Thomas", 4), rep("Olaf", 3), # + rep("Olaf", 4), # + rep("Karl", 6), rep("Karl", 5), rep("Olaf", 3), # + rep("Olaf", 3), # rep("udo", 4), rep("udo", 7), rep("udo", 3), rep("Thomas", 7), rep("Thomas", 3), rep("Björn", 6), # rep("Thomas", 9), rep("Thomas", 6), rep("Björn", 11), # rep("Björn", 6) # ), - to = c(rep("Olaf", 6), rep("Thomas", 5), rep("Thomas", 3), # + to = c(rep("Olaf", 5), rep("Björn", 4), rep("Björn", 3), # + rep("Björn", 4), # + rep("Olaf", 6), rep("Thomas", 5), rep("Thomas", 3), # + rep("Björn", 3), # rep("Thomas", 4), rep("Björn", 7), rep("Olaf", 3), rep("Björn", 7), rep("Olaf", 3), rep("Olaf", 6), # rep("Björn", 9), rep("Olaf", 6), rep("Olaf", 11), # rep("Max", 6) # ), - date = get.date.from.string(c( "2016-07-12 15:59:25", "2016-07-12 15:59:59", "2016-08-07 15:37:02", # + date = get.date.from.string(c( "2016-07-12 15:59:25", "2016-07-12 15:59:25", "2016-07-12 15:59:59", # + "2016-07-12 16:01:01", "2016-07-14 13:37:00", "2016-07-12 15:59:25", + "2016-07-12 15:59:25", "2016-07-12 15:59:59", "2016-07-12 16:06:01", + "2016-07-12 16:01:01", "2016-07-14 13:37:00", "2016-07-12 16:06:01", + "2016-07-12 14:59:25", "2016-07-12 14:59:25", "2016-07-12 16:04:59", # + "2016-07-12 16:04:59", + "2016-07-12 15:59:25", "2016-07-12 15:59:59", "2016-08-07 15:37:02", # "2016-08-31 16:45:09", "2016-07-12 15:59:25", "2016-07-12 16:06:30", "2016-07-12 15:59:25", "2016-07-12 15:59:59", "2016-08-07 15:37:02", "2016-08-31 16:45:09", "2016-10-05 16:45:09", "2016-07-12 15:59:25", "2016-07-12 16:06:30", "2016-10-05 16:45:09", + "2016-07-12 16:02:02", "2016-07-12 16:02:02", "2016-07-12 16:02:02", # "2016-07-12 15:30:02", "2016-07-12 15:30:02", "2016-07-12 16:03:59", # "2016-10-13 15:30:02", "2016-07-12 15:30:02", "2016-07-12 15:30:02", "2016-08-31 15:30:02", "2016-10-05 15:30:02", "2016-12-07 15:30:02", @@ -508,10 +522,15 @@ test_that("Network construction of the undirected author-issue network with all "2016-07-15 20:07:47", "2016-07-27 20:12:08", "2016-07-28 06:27:52" )), artifact.type = "IssueEvent", - issue.id = c( rep("", 14), rep("", 30), rep("", 26), + issue.id = c( rep("", 12), rep("", 4), rep("", 14), + rep("", 3), rep("", 30), rep("", 26), rep("", 6)), - event.name = c("created", "commented", "add_link", "referenced", "assigned", "state_updated", "created", # + event.name = c("created", "commented", "state_updated", "commented", "state_updated", "created", # + "commented", "state_updated", "commented", "commented", "state_updated", "commented", + "created", "commented", "merged", "state_updated", # + "created", "commented", "add_link", "referenced", "assigned", "state_updated", "created", # "commented", "add_link", "referenced", "referenced", "assigned", "state_updated", "referenced", + "commit_added", "created", "commented", # "mentioned", "subscribed", "commented", "add_link", "mentioned", "subscribed", "mentioned", # "subscribed", "mentioned", "subscribed", "commented", "mentioned", "subscribed", "labeled", "commented", "add_link", "mentioned", "subscribed", 
"mentioned", "subscribed", "commented", @@ -550,20 +569,24 @@ test_that("Network construction of the undirected author-issue network with just network.built = network.builder$get.author.network() ## vertex attributes - vertices = data.frame(name = c("Karl", "Thomas", "Björn", "Olaf", "Max"), + vertices = data.frame(name = c("Thomas", "Olaf", "Björn", "Karl", "Max"), kind = TYPE.AUTHOR, type = TYPE.AUTHOR) ## edge attributes - edges = data.frame(from = c(rep("Thomas", 2), # + edges = data.frame(from = c(rep("Thomas", 2), rep("Thomas", 2), rep("Olaf", 2), # + rep("Thomas", 2), # rep("Thomas", 7), rep("Thomas", 5), rep("Björn", 10), # rep("Björn", 5) # ), - to = c(rep("Björn", 2), # + to = c(rep("Olaf", 2), rep("Björn", 2), rep("Björn", 2), # + rep("Björn", 2), # rep("Björn", 7), rep("Olaf", 5), rep("Olaf", 10), # rep("Max", 5) # ), - date = get.date.from.string(c( "2016-07-12 16:03:59", "2017-05-23 12:32:39", # + date = get.date.from.string(c( "2016-07-12 15:59:25", "2016-07-12 16:01:01", "2016-07-12 15:59:25", # + "2016-07-12 16:06:01", "2016-07-12 16:01:01", "2016-07-12 16:06:01", + "2016-07-12 16:03:59", "2017-05-23 12:32:39", # "2013-04-21 23:52:09", "2013-05-05 21:46:30", "2013-05-05 21:49:21", # "2013-05-05 21:49:34", "2013-05-06 01:04:34", "2013-05-25 03:48:41", "2013-05-25 04:08:07", "2013-04-21 23:52:09", "2013-05-25 03:25:06", @@ -576,7 +599,8 @@ test_that("Network construction of the undirected author-issue network with just "2016-07-27 20:12:08", "2016-07-28 06:27:52" )), artifact.type = "IssueEvent", - issue.id = c( rep("", 2), rep("", 22), rep("", 5) ), + issue.id = c( rep("", 6), rep("", 2), + rep("", 22), rep("", 5) ), event.name = "commented", weight = 1, type = TYPE.EDGES.INTRA, diff --git a/tests/test-networks-bipartite.R b/tests/test-networks-bipartite.R index a9cca1d3..6177d6b1 100644 --- a/tests/test-networks-bipartite.R +++ b/tests/test-networks-bipartite.R @@ -17,6 +17,7 @@ ## Copyright 2018 by Thomas Bock ## Copyright 2018 by Jakob Kronawitter ## Copyright 2018-2019 by Anselm Fehnker +## Copyright 2021 by Johannes Hostert ## All Rights Reserved. 
@@ -266,7 +267,8 @@ test_that("Construction of the bipartite network for the feature artifact with a type = TYPE.AUTHOR ) artifacts = data.frame( - name = c("", "", "", ""), + name = c("", "", "", + "", "", "", ""), kind = "Issue", type = TYPE.ARTIFACT ) @@ -274,22 +276,30 @@ test_that("Construction of the bipartite network for the feature artifact with a ## 2) construct expected edge attributes (issues ordered by 'author.name') network.expected.data = data.frame( - from = c("Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Karl", "Max", - "Max", "Max", "Olaf", "Olaf", "Olaf", "Olaf", "Thomas", "Thomas"), - to = c("", "", "", "", "", "", - "", "", "", "", "", "", "", - "", "", "", "", "", ""), - date = get.date.from.string(c("2013-05-05 21:46:30", "2013-05-05 21:49:21", "2013-05-05 21:49:34", + from = c("Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Karl", "Max", + "Max", "Max", "Olaf", "Olaf", "Olaf", "Olaf", "Olaf", "Olaf", "Thomas", "Thomas", "Thomas"), + to = c("", "", "", "", + "", "", + "", "", "", "", "", + "", "", "", "", + "", "", "", "", + "", "", "", "", ""), + date = get.date.from.string(c("2013-05-05 21:46:30", "2013-05-05 21:49:21", "2013-05-05 21:49:34", # Björn "2013-05-06 01:04:34", "2013-05-25 03:48:41", "2013-05-25 04:08:07", - "2016-07-12 16:02:30", "2016-07-15 19:55:39", "2017-05-23 12:32:39", - "2016-07-12 15:59:59", "2016-07-15 20:07:47", "2016-07-27 20:12:08", - "2016-07-28 06:27:52", "2013-05-25 03:25:06", "2013-05-25 06:06:53", - "2013-05-25 06:22:23", "2013-06-01 06:50:26", "2013-04-21 23:52:09", - "2016-07-12 16:03:59")), + "2016-07-12 14:59:25", "2016-07-12 16:02:30", "2016-07-12 16:06:01", + "2016-07-15 19:55:39", "2017-05-23 12:32:39", + "2016-07-12 15:59:59", #Karl + "2016-07-15 20:07:47", "2016-07-27 20:12:08", "2016-07-28 06:27:52", # Max + "2013-05-25 03:25:06", "2013-05-25 06:06:53", "2013-05-25 06:22:23", + "2013-06-01 06:50:26", "2016-07-12 16:01:01", "2016-07-12 16:02:02", # Olaf + "2013-04-21 23:52:09", "2016-07-12 15:59:25", "2016-07-12 16:03:59")), # Thomas artifact.type = "IssueEvent", - issue.id = c("", "", "", "", "", "", - "", "", "", "", "", "", "", - "", "", "", "", "", ""), + issue.id = c("", "", "", "", + "", "", + "", "", "", "", "", + "", "", "", "", + "", "", "", "", + "", "", "", "", ""), event.name = "commented", weight = 1, type = TYPE.EDGES.INTER, diff --git a/tests/test-networks-covariates.R b/tests/test-networks-covariates.R index eb7d71e2..41be7e8d 100644 --- a/tests/test-networks-covariates.R +++ b/tests/test-networks-covariates.R @@ -17,6 +17,7 @@ ## Copyright 2018-2019 by Thomas Bock ## Copyright 2018-2019 by Klara Schlüter ## Copyright 2018-2019 by Jakob Kronawitter +## Copyright 2021 by Johannes Hostert ## All Rights Reserved. 
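
The covariates tests below exercise the new per-author counting attributes. Typical calls mirror the test code; `my.networks` stands in for any list of split networks with a matching `ProjectData` object `project.data`:

    nets = add.vertex.attribute.mail.thread.count(my.networks, project.data)
    nets = add.vertex.attribute.issue.creation.count(nets, project.data, issue.type = "issues")
    nets = add.vertex.attribute.issue.count(nets, project.data, aggregation.level = "complete",
                                            issue.type = "pull.requests", name = "pull.request.count")
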
@@ -39,6 +40,8 @@ if (!dir.exists(CF.DATA)) CF.DATA = file.path(".", "tests", "codeface-data")
 
 mybins = c("2016-07-12 15:00:00", "2016-07-12 16:00:00", "2016-07-12 16:05:00", "2016-08-31 18:00:00")
 myranges = construct.ranges(mybins, sliding.window = FALSE)
+mybins.since.2010 = c("2010-07-12 12:00:00", "2016-07-12 15:00:00", "2016-07-12 16:00:00", "2016-07-12 16:05:00", "2016-08-31 18:00:00")
+myranges.since.2010 = construct.ranges(mybins.since.2010, sliding.window = FALSE)
 
 ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
 
@@ -46,9 +49,17 @@ myranges = construct.ranges(mybins, sliding.window = FALSE)
 
 #' Load test data and generate test networks
 #'
+#' @param network.type which network to get (\code{"author"} or \code{"artifact"}). [default: "author"]
+#' @param issues whether to retain issue data. If \code{FALSE}, issue data is deleted. [default: FALSE]
+#' @param author.relation passed to the network config. [default: "cochange"]
+#' @param bins the bins which control splitting. [default: mybins]
+#'
 #' @return Tuple containing project data and list of networks
-get.network.covariates.test.networks = function(network.type = c("author", "artifact")) {
+get.network.covariates.test.networks = function(network.type = c("author", "artifact"), issues = FALSE,
+                                                author.relation = c("cochange", "issue", "mail"),
+                                                bins = mybins) {
+    author.relation = match.arg(author.relation)
     network.type.function = paste("get", match.arg(network.type), "network", sep = ".")
 
     ## configuration and data objects
@@ -57,14 +68,16 @@ get.network.covariates.test.networks = function(network.type = c("author", "arti
     proj.conf$update.value("commits.filter.untracked.files", TRUE)
     proj.conf$update.value("issues.only.comments", FALSE)
     net.conf = NetworkConf$new()
-    net.conf$update.values(list(author.relation = "cochange", simplify = FALSE))
+    net.conf$update.values(list(author.relation = author.relation, simplify = FALSE))
 
-    ## retrieve project data and network builder
+    ## retrieve project data
     project.data = ProjectData$new(proj.conf)
-    project.data$set.issues(NULL)
+    if (!issues) {
+        project.data$set.issues(NULL)
+    }
 
     ## split data
-    input.data = split.data.time.based(project.data, bins = mybins)
+    input.data = split.data.time.based(project.data, bins = bins)
     input.data.networks = lapply(input.data, function(d) NetworkBuilder$new(d, net.conf)[[network.type.function]]())
 
     return(list("networks" = input.data.networks, "project.data" = project.data))
@@ -360,6 +373,20 @@ network.covariates.test.build.expected = function(x, y, z) {
     return(arguments)
 }
 
+#' Build list with appropriate range names
+#'
+#' @param w Value for first range
+#' @param x Value for second range
+#' @param y Value for third range
+#' @param z Value for fourth range
+#'
+#' @return The list of w, x, y, z with range names
+network.covariates.test.build.expected.since.2010 = function(w, x, y, z) {
+    arguments = list(w, x, y, z)
+    names(arguments) = myranges.since.2010
+
+    return(arguments)
+}
+
 ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
 ## Unit tests for author networks ------------------------------------------
 
@@ -486,6 +513,329 @@ test_that("Test add.vertex.attribute.commit.count.committer.or.author", {
     })
 })
 
+#' Test the add.vertex.attribute.mail.count method
+test_that("Test add.vertex.attribute.mail.count", {
+    ## Test setup
+    networks.and.data = get.network.covariates.test.networks(author.relation = "mail", bins = mybins.since.2010)
+
+    expected.attributes = list(
+        range = network.covariates.test.build.expected.since.2010(c(1L, 
7L), c(1L, 1L), c(1L), c(1L)),
+        cumulative = network.covariates.test.build.expected.since.2010(c(1L, 7L), c(1L, 1L), c(1L), c(2L)),
+        all.ranges = network.covariates.test.build.expected.since.2010(c(1L, 7L), c(1L, 2L), c(1L), c(2L)),
+        project.cumulative = network.covariates.test.build.expected.since.2010(c(1L, 7L), c(3L, 1L), c(1L), c(2L)),
+        project.all.ranges = network.covariates.test.build.expected.since.2010(c(1L, 7L), c(3L, 2L), c(1L), c(2L)),
+        complete = network.covariates.test.build.expected.since.2010(c(1L, 7L), c(3L, 2L), c(1L), c(2L))
+    )
+
+    ## Test
+
+    lapply(AGGREGATION.LEVELS, function(level) {
+        networks.with.attr = add.vertex.attribute.mail.count(
+            networks.and.data[["networks"]], networks.and.data[["project.data"]], aggregation.level = level
+        )
+
+        actual.attributes = lapply(networks.with.attr, igraph::get.vertex.attribute, name = "mail.count")
+
+        expect_identical(expected.attributes[[level]], actual.attributes)
+    })
+})
+
+#' Test the add.vertex.attribute.mail.thread.count method
+test_that("Test add.vertex.attribute.mail.thread.count", {
+    ## Test setup
+    networks.and.data = get.network.covariates.test.networks(author.relation = "mail", bins = mybins.since.2010)
+
+    expected.attributes = list(
+        range = network.covariates.test.build.expected.since.2010(c(1L, 2L), c(1L, 1L), c(1L), c(1L)),
+        cumulative = network.covariates.test.build.expected.since.2010(c(1L, 2L), c(1L, 1L), c(1L), c(2L)),
+        all.ranges = network.covariates.test.build.expected.since.2010(c(1L, 2L), c(1L, 2L), c(1L), c(2L)),
+        project.cumulative = network.covariates.test.build.expected.since.2010(c(1L, 2L), c(3L, 1L), c(1L), c(2L)),
+        project.all.ranges = network.covariates.test.build.expected.since.2010(c(1L, 2L), c(3L, 2L), c(1L), c(2L)),
+        complete = network.covariates.test.build.expected.since.2010(c(1L, 2L), c(3L, 2L), c(1L), c(2L))
+    )
+
+    ## Test
+
+    lapply(AGGREGATION.LEVELS, function(level) {
+        networks.with.attr = add.vertex.attribute.mail.thread.count(
+            networks.and.data[["networks"]], networks.and.data[["project.data"]], aggregation.level = level
+        )
+
+        actual.attributes = lapply(networks.with.attr, igraph::get.vertex.attribute, name = "mail.thread.count")
+
+        expect_identical(expected.attributes[[level]], actual.attributes)
+    })
+})
+
+#' Helper function for add.vertex.attribute.issue.*.count tests
+sum.expected.attributes = function(expected.attributes.issues.only, expected.attributes.prs.only) {
+    result = lapply(names(expected.attributes.issues.only), function(n) {
+        issue.attr = expected.attributes.issues.only[[n]]
+        pr.attr = expected.attributes.prs.only[[n]]
+        sum.attr = lapply(names(issue.attr), function(n2) {
+            issue.attr[[n2]] + pr.attr[[n2]]
+        })
+        names(sum.attr) = names(issue.attr)
+        return(sum.attr)
+    })
+    names(result) = names(expected.attributes.issues.only)
+    return(result)
+}
+
+#' Test the add.vertex.attribute.issue.count method
+test_that("Test add.vertex.attribute.issue.count", {
+    ## Test setup
+    networks.and.data = get.network.covariates.test.networks(issues = TRUE, author.relation = "issue")
+
+    expected.attributes.issues.only = list(
+        range = network.covariates.test.build.expected(c(0L, 1L, 1L, 1L), c(0L, 1L, 1L), c(2L, 1L, 1L, 1L)),
+        cumulative = network.covariates.test.build.expected(c(0L, 1L, 1L, 1L), c(1L, 1L, 1L), c(2L, 1L, 1L, 1L)),
+        all.ranges = network.covariates.test.build.expected(c(1L, 1L, 1L, 1L), c(1L, 2L, 1L), c(2L, 1L, 1L, 1L)),
+        project.cumulative = network.covariates.test.build.expected(c(1L, 1L, 2L, 1L), c(2L, 2L, 2L), c(3L, 2L, 1L, 1L)),
+        
project.all.ranges = network.covariates.test.build.expected(c(2L, 1L, 2L, 1L), c(2L, 3L, 2L), c(3L, 2L, 1L, 1L)), + complete = network.covariates.test.build.expected(c(3L, 1L, 3L, 1L), c(3L, 3L, 3L), c(3L, 3L, 1L, 1L)) + ) + + expected.attributes.prs.only = list( + range = network.covariates.test.build.expected(c(1L, 0L, 0L, 0L), c(3L, 1L, 0L), c(1L, 1L, 0L, 0L)), + cumulative = network.covariates.test.build.expected(c(1L, 0L, 0L, 0L), c(3L, 1L, 1L), c(2L, 3L, 0L, 0L)), + all.ranges = network.covariates.test.build.expected(c(1L, 0L, 3L, 0L), c(3L, 2L, 1L), c(2L, 3L, 0L, 0L)), + project.cumulative = network.covariates.test.build.expected(c(1L, 0L, 0L, 0L), c(3L, 2L, 1L), c(3L, 3L, 0L, 0L)), + project.all.ranges = network.covariates.test.build.expected(c(1L, 0L, 3L, 0L), c(3L, 3L, 1L), c(3L, 3L, 0L, 0L)), + complete = network.covariates.test.build.expected(c(1L, 0L, 3L, 0L), c(3L, 3L, 1L), c(3L, 3L, 0L, 0L)) + ) + + expected.attributes.both = sum.expected.attributes(expected.attributes.issues.only, expected.attributes.prs.only) + + ## Test issues only + + lapply(AGGREGATION.LEVELS, function(level) { + networks.with.attr = add.vertex.attribute.issue.count( + networks.and.data[["networks"]], networks.and.data[["project.data"]], aggregation.level = level, issue.type = "issues" + ) + + actual.attributes = lapply(networks.with.attr, igraph::get.vertex.attribute, name = "issue.count") + + expect_identical(expected.attributes.issues.only[[level]], actual.attributes) + }) + + # Test PRs only + + lapply(AGGREGATION.LEVELS, function(level) { + networks.with.attr = add.vertex.attribute.issue.count( + networks.and.data[["networks"]], networks.and.data[["project.data"]], aggregation.level = level, + issue.type = "pull.requests", name = "pull.request.count" + ) + + actual.attributes = lapply(networks.with.attr, igraph::get.vertex.attribute, name = "pull.request.count") + + expect_identical(expected.attributes.prs.only[[level]], actual.attributes) + }) + + # Test both + + lapply(AGGREGATION.LEVELS, function(level) { + networks.with.attr = add.vertex.attribute.issue.count( + networks.and.data[["networks"]], networks.and.data[["project.data"]], aggregation.level = level, issue.type = "all" + ) + + actual.attributes = lapply(networks.with.attr, igraph::get.vertex.attribute, name = "issue.count") + + expect_identical(expected.attributes.both[[level]], actual.attributes) + }) +}) + + +#' Test the add.vertex.attribute.issues.commented.count method +test_that("Test add.vertex.attribute.issues.commented.count", { + ## Test setup + networks.and.data = get.network.covariates.test.networks(issues = TRUE, author.relation = "issue") + + expected.attributes.issues.only = list( + range = network.covariates.test.build.expected(c(0L, 1L, 0L, 0L), c(0L, 1L, 1L), c(1L, 0L, 0L, 1L)), + cumulative = network.covariates.test.build.expected(c(0L, 1L, 0L, 0L), c(0L, 1L, 1L), c(1L, 0L, 1L, 1L)), + all.ranges = network.covariates.test.build.expected(c(1L, 1L, 0L, 0L), c(0L, 1L, 1L), c(1L, 0L, 1L, 1L)), + project.cumulative = network.covariates.test.build.expected(c(1L, 1L, 1L, 0L), c(1L, 2L, 2L), c(2L, 1L, 1L, 1L)), + project.all.ranges = network.covariates.test.build.expected(c(2L, 1L, 1L, 0L), c(1L, 2L, 2L), c(2L, 1L, 1L, 1L)), + complete = network.covariates.test.build.expected(c(2L, 1L, 1L, 0L), c(1L, 3L, 2L), c(3L, 1L, 1L, 1L)) + ) + + expected.attributes.prs.only = list( + range = network.covariates.test.build.expected(c(1L, 0L, 0L, 0L), c(2L, 0L, 0L), c(1L, 0L, 0L, 0L)), + cumulative = 
network.covariates.test.build.expected(c(1L, 0L, 0L, 0L), c(2L, 0L, 1L), c(1L, 2L, 0L, 0L)), + all.ranges = network.covariates.test.build.expected(c(1L, 0L, 2L, 0L), c(2L, 1L, 1L), c(1L, 2L, 0L, 0L)), + project.cumulative = network.covariates.test.build.expected(c(1L, 0L, 0L, 0L), c(2L, 1L, 1L), c(2L, 2L, 0L, 0L)), + project.all.ranges = network.covariates.test.build.expected(c(1L, 0L, 2L, 0L), c(2L, 2L, 1L), c(2L, 2L, 0L, 0L)), + complete = network.covariates.test.build.expected(c(1L, 0L, 2L, 0L), c(2L, 2L, 1L), c(2L, 2L, 0L, 0L)) + ) + + expected.attributes.both = sum.expected.attributes(expected.attributes.issues.only, expected.attributes.prs.only) + + ## Test issues only + + lapply(AGGREGATION.LEVELS, function(level) { + networks.with.attr = add.vertex.attribute.issues.commented.count( + networks.and.data[["networks"]], networks.and.data[["project.data"]], aggregation.level = level, issue.type = "issues" + ) + + actual.attributes = lapply(networks.with.attr, igraph::get.vertex.attribute, name = "issues.commented.count") + + expect_identical(expected.attributes.issues.only[[level]], actual.attributes) + }) + + # Test PRs only + + lapply(AGGREGATION.LEVELS, function(level) { + networks.with.attr = add.vertex.attribute.issues.commented.count( + networks.and.data[["networks"]], networks.and.data[["project.data"]], aggregation.level = level, + issue.type = "pull.requests", name = "pull.requests.commented.count" + ) + + actual.attributes = lapply(networks.with.attr, igraph::get.vertex.attribute, name = "pull.requests.commented.count") + + expect_identical(expected.attributes.prs.only[[level]], actual.attributes) + }) + + # Test both + + lapply(AGGREGATION.LEVELS, function(level) { + networks.with.attr = add.vertex.attribute.issues.commented.count( + networks.and.data[["networks"]], networks.and.data[["project.data"]], aggregation.level = level, issue.type = "all" + ) + + actual.attributes = lapply(networks.with.attr, igraph::get.vertex.attribute, name = "issues.commented.count") + + expect_identical(expected.attributes.both[[level]], actual.attributes) + }) +}) + +#' Test the add.vertex.attribute.issue.creation.count method +test_that("Test add.vertex.attribute.issue.creation.count", { + ## Test setup + networks.and.data = get.network.covariates.test.networks(issues = TRUE, author.relation = "issue") + + expected.attributes.issues.only = list( + range = network.covariates.test.build.expected(c(0L, 1L, 0L, 0L), c(0L, 1L, 0L), c(0L, 0L, 0L, 0L)), + cumulative = network.covariates.test.build.expected(c(0L, 1L, 0L, 0L), c(0L, 1L, 0L), c(1L, 0L, 1L, 0L)), + all.ranges = network.covariates.test.build.expected(c(0L, 1L, 0L, 0L), c(0L, 1L, 0L), c(1L, 0L, 1L, 0L)), + project.cumulative = network.covariates.test.build.expected(c(1L, 1L, 0L, 0L), c(0L, 1L, 1L), c(1L, 0L, 1L, 0L)), + project.all.ranges = network.covariates.test.build.expected(c(1L, 1L, 0L, 0L), c(0L, 1L, 1L), c(1L, 0L, 1L, 0L)), + complete = network.covariates.test.build.expected(c(1L, 1L, 0L, 0L), c(0L, 1L, 1L), c(1L, 0L, 1L, 0L)) + ) + + expected.attributes.prs.only = list( + range = network.covariates.test.build.expected(c(1L, 0L, 0L, 0L), c(1L, 0L, 0L), c(0L, 0L, 0L, 0L)), + cumulative = network.covariates.test.build.expected(c(1L, 0L, 0L, 0L), c(1L, 0L, 1L), c(0L, 1L, 0L, 0L)), + all.ranges = network.covariates.test.build.expected(c(1L, 0L, 1L, 0L), c(1L, 0L, 1L), c(0L, 1L, 0L, 0L)), + project.cumulative = network.covariates.test.build.expected(c(1L, 0L, 0L, 0L), c(1L, 1L, 1L), c(1L, 1L, 0L, 0L)), + project.all.ranges = 
network.covariates.test.build.expected(c(1L, 0L, 1L, 0L), c(1L, 1L, 1L), c(1L, 1L, 0L, 0L)), + complete = network.covariates.test.build.expected(c(1L, 0L, 1L, 0L), c(1L, 1L, 1L), c(1L, 1L, 0L, 0L)) + ) + + expected.attributes.both = sum.expected.attributes(expected.attributes.issues.only, expected.attributes.prs.only) + + ## Test issues only + + lapply(AGGREGATION.LEVELS, function(level) { + networks.with.attr = add.vertex.attribute.issue.creation.count( + networks.and.data[["networks"]], networks.and.data[["project.data"]], aggregation.level = level, issue.type = "issues" + ) + + actual.attributes = lapply(networks.with.attr, igraph::get.vertex.attribute, name = "issue.creation.count") + + expect_identical(expected.attributes.issues.only[[level]], actual.attributes) + }) + + # Test PRs only + + lapply(AGGREGATION.LEVELS, function(level) { + networks.with.attr = add.vertex.attribute.issue.creation.count( + networks.and.data[["networks"]], networks.and.data[["project.data"]], aggregation.level = level, + issue.type = "pull.requests", name = "pull.request.creation.count" + ) + + actual.attributes = lapply(networks.with.attr, igraph::get.vertex.attribute, name = "pull.request.creation.count") + + expect_identical(expected.attributes.prs.only[[level]], actual.attributes) + }) + + # Test both + + lapply(AGGREGATION.LEVELS, function(level) { + networks.with.attr = add.vertex.attribute.issue.creation.count( + networks.and.data[["networks"]], networks.and.data[["project.data"]], aggregation.level = level, issue.type = "all" + ) + + actual.attributes = lapply(networks.with.attr, igraph::get.vertex.attribute, name = "issue.creation.count") + + expect_identical(expected.attributes.both[[level]], actual.attributes) + }) +}) + +#' Test the add.vertex.attribute.issue.comment.count method +test_that("Test add.vertex.attribute.issue.comment.count", { + ## Test setup + networks.and.data = get.network.covariates.test.networks(issues = TRUE, author.relation = "issue") + + expected.attributes.issues.only = list( + range = network.covariates.test.build.expected(c(0L, 1L, 0L, 0L), c(0L, 1L, 1L), c(1L, 0L, 0L, 3L)), + cumulative = network.covariates.test.build.expected(c(0L, 1L, 0L, 0L), c(0L, 1L, 1L), c(2L, 0L, 1L, 3L)), + all.ranges = network.covariates.test.build.expected(c(1L, 1L, 0L, 0L), c(0L, 2L, 1L), c(2L, 0L, 1L, 3L)), + project.cumulative = network.covariates.test.build.expected(c(1L, 1L, 4L, 0L), c(4L, 7L, 2L), c(8L, 4L, 1L, 3L)), + project.all.ranges = network.covariates.test.build.expected(c(2L, 1L, 4L, 0L), c(4L, 8L, 2L), c(8L, 4L, 1L, 3L)), + complete = network.covariates.test.build.expected(c(2L, 1L, 4L, 0L), c(4L, 9L, 2L), c(9L, 4L, 1L, 3L)) + ) + + expected.attributes.prs.only = list( + range = network.covariates.test.build.expected(c(1L, 0L, 0L, 0L), c(2L, 0L, 0L), c(1L, 0L, 0L, 0L)), + cumulative = network.covariates.test.build.expected(c(1L, 0L, 0L, 0L), c(2L, 0L, 1L), c(1L, 2L, 0L, 0L)), + all.ranges = network.covariates.test.build.expected(c(1L, 0L, 2L, 0L), c(2L, 1L, 1L), c(1L, 2L, 0L, 0L)), + project.cumulative = network.covariates.test.build.expected(c(1L, 0L, 0L, 0L), c(2L, 1L, 1L), c(2L, 2L, 0L, 0L)), + project.all.ranges = network.covariates.test.build.expected(c(1L, 0L, 2L, 0L), c(2L, 2L, 1L), c(2L, 2L, 0L, 0L)), + complete = network.covariates.test.build.expected(c(1L, 0L, 2L, 0L), c(2L, 2L, 1L), c(2L, 2L, 0L, 0L)) + ) + + expected.attributes.both = sum.expected.attributes(expected.attributes.issues.only, expected.attributes.prs.only) + + ## Test issues only + + 
lapply(AGGREGATION.LEVELS, function(level) { + networks.with.attr = add.vertex.attribute.issue.comment.count( + networks.and.data[["networks"]], networks.and.data[["project.data"]], aggregation.level = level, issue.type = "issues" + ) + + actual.attributes = lapply(networks.with.attr, igraph::get.vertex.attribute, name = "issue.comment.count") + + expect_identical(expected.attributes.issues.only[[level]], actual.attributes) + }) + + # Test PRs only + + lapply(AGGREGATION.LEVELS, function(level) { + networks.with.attr = add.vertex.attribute.issue.comment.count( + networks.and.data[["networks"]], networks.and.data[["project.data"]], aggregation.level = level, + issue.type = "pull.requests", name = "pull.request.comment.count" + ) + + actual.attributes = lapply(networks.with.attr, igraph::get.vertex.attribute, name = "pull.request.comment.count") + + expect_identical(expected.attributes.prs.only[[level]], actual.attributes) + }) + + # Test both + + lapply(AGGREGATION.LEVELS, function(level) { + networks.with.attr = add.vertex.attribute.issue.comment.count( + networks.and.data[["networks"]], networks.and.data[["project.data"]], aggregation.level = level, issue.type = "all" + ) + + actual.attributes = lapply(networks.with.attr, igraph::get.vertex.attribute, name = "issue.comment.count") + + expect_identical(expected.attributes.both[[level]], actual.attributes) + }) +}) + + #' Test the add.vertex.attribute.author.email method test_that("Test add.vertex.attribute.author.email", { diff --git a/tests/test-networks-multi-relation.R b/tests/test-networks-multi-relation.R index 3bed7cd2..3d1aeb72 100644 --- a/tests/test-networks-multi-relation.R +++ b/tests/test-networks-multi-relation.R @@ -17,6 +17,7 @@ ## Copyright 2018-2019 by Anselm Fehnker ## Copyright 2018-2019 by Claus Hunsen ## Copyright 2019 by Anselm Fehnker +## Copyright 2021 by Johannes Hostert ## All Rights Reserved. 
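
`preprocess.issue.data` from NEWS.md centralizes the issue filtering behind the counting attributes tested above. Its signature is not part of this diff; a plausible call shape, with `issue.type` taking the same values as in the tests ("all", "issues", "pull.requests"), is sketched here as an assumption:

    issue.data = preprocess.issue.data(project.data, issue.type = "pull.requests")   ## assumed signature
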
@@ -124,7 +125,8 @@ test_that("Construction of the bipartite network for the feature artifact with a type = TYPE.AUTHOR ) issues = data.frame( - name = c("", "", "", ""), + name = c("", "", "", + "", "", "", ""), kind = "Issue", type = TYPE.ARTIFACT ) @@ -137,33 +139,35 @@ test_that("Construction of the bipartite network for the feature artifact with a vertices = plyr::rbind.fill(authors1, issues, authors2, threads) ## 2) construct expected edge attributes (data sorted by 'author.name') network.expected.data = data.frame( - from = c("Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Karl", "Max", # issue - "Max", "Max", "Olaf", "Olaf", "Olaf", "Olaf", "Thomas", "Thomas", + from = c("Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Karl", "Max", # issue + "Max", "Max", "Olaf", "Olaf", "Olaf", "Olaf", "Olaf", "Olaf", "Thomas", "Thomas", "Thomas", "Björn", "Björn", "Björn", "Fritz fritz@example.org", "georg", "Hans", "Hans", "Hans", # mail "Hans", "Hans", "Hans", "Hans", "Olaf", "Olaf", "Thomas", "udo"), to = c("", "", "", "", # issue - "", "", "", "", - "", "", "", "", "", + "", "", + "", "", "", "", "", + "", "", "", "", "", "", "", "", - "", "", + "", "", "", "", "", "", "", "", "", "", "", "", # mail "", "", "", "", "", "", "", "", ""), date = get.date.from.string(c("2013-05-05 21:46:30", "2013-05-05 21:49:21", "2013-05-05 21:49:34", # issue "2013-05-06 01:04:34", "2013-05-25 03:48:41", "2013-05-25 04:08:07", - "2016-07-12 16:02:30", "2016-07-15 19:55:39", "2017-05-23 12:32:39", - "2016-07-12 15:59:59", "2016-07-15 20:07:47", "2016-07-27 20:12:08", - "2016-07-28 06:27:52", "2013-05-25 03:25:06", "2013-05-25 06:06:53", - "2013-05-25 06:22:23", "2013-06-01 06:50:26", "2013-04-21 23:52:09", - "2016-07-12 16:03:59", + "2016-07-12 14:59:25", "2016-07-12 16:02:30", "2016-07-12 16:06:01", + "2016-07-15 19:55:39", "2017-05-23 12:32:39", "2016-07-12 15:59:59", + "2016-07-15 20:07:47", "2016-07-27 20:12:08", "2016-07-28 06:27:52", + "2013-05-25 03:25:06", "2013-05-25 06:06:53", "2013-05-25 06:22:23", + "2013-06-01 06:50:26", "2016-07-12 16:01:01", "2016-07-12 16:02:02", + "2013-04-21 23:52:09", "2016-07-12 15:59:25", "2016-07-12 16:03:59", "2004-10-09 18:38:13", "2005-02-09 18:49:49", "2016-07-12 15:58:40", # mail "2010-07-12 11:05:35", "2010-07-12 12:05:34", "2010-07-12 12:05:40", "2010-07-12 12:05:41", "2010-07-12 12:05:42", "2010-07-12 12:05:43", "2010-07-12 12:05:44", "2010-07-12 12:05:45", "2010-07-12 12:05:46", "2016-07-12 15:58:50", "2016-07-12 16:05:37", "2016-07-12 16:04:40", "2010-07-12 10:05:36")), - artifact.type = c(rep("IssueEvent", 19), rep("Mail", 16)), - message.id = c(rep(NA, 19), + artifact.type = c(rep("IssueEvent", 24), rep("Mail", 16)), + message.id = c(rep(NA, 24), "", "<1107974989.17910.6.camel@jmcmullan>", "<4cbaa9ef0802201124v37f1eec8g89a412dfbfc8383a@mail.gmail.com>", "", "", @@ -172,20 +176,22 @@ test_that("Construction of the bipartite network for the feature artifact with a "", "<6784529b0802032245r5164f984l342f0f0dc94aa420@mail.gmail.com>", "<9b06e8d20801220234h659c18a3g95c12ac38248c7e0@mail.gmail.com>", "<65a1sf31sagd684dfv31@mail.gmail.com>", ""), - thread = c(rep(NA, 19), + thread = c(rep(NA, 24), "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""), - issue.id = c("", "", "", "", - "", "", "", "", - "", "", "", "", "", + issue.id = c("", "", "", "", # issue + "", "", + "", "", "", "", "", + "", "", "", "", "", "", "", "", - "", "", rep(NA,16)), - event.name = 
c(rep("commented", 19), + "", "", "", "", "", + rep(NA,16)), + event.name = c(rep("commented", 24), rep(NA, 16)), weight = 1, type = TYPE.EDGES.INTER, - relation = c(rep("issue", 19), rep("mail", 16)) + relation = c(rep("issue", 24), rep("mail", 16)) ) ## 3) build expected network @@ -216,28 +222,30 @@ test_that("Construction of the multi network for the feature artifact with autho ## 1) construct expected vertices vertices = data.frame( name = c("Björn", "Olaf", "Karl", "Thomas", "udo", "Fritz fritz@example.org", "georg", "Hans", - "Base_Feature", "foo", "A", "", "", "", ""), - kind = c(rep(TYPE.AUTHOR, 8), rep("Feature", 3), rep("Issue", 4)), - type = c(rep(TYPE.AUTHOR, 8), rep(TYPE.ARTIFACT, 7)) + "Base_Feature", "foo", "A", "", "", "", "", + "", "", ""), + kind = c(rep(TYPE.AUTHOR, 8), rep("Feature", 3), rep("Issue", 7)), + type = c(rep(TYPE.AUTHOR, 8), rep(TYPE.ARTIFACT, 10)) ) row.names(vertices) = c("Björn", "Olaf", "Karl", "Thomas", "udo", "Fritz fritz@example.org", "georg", "Hans", - "Base_Feature", "foo", "A", "", "", "", "") + "Base_Feature", "foo", "A", "", "", "", "", + "", "", "") ## 2) construct expected edge attributes (data sorted by 'author.name') edges = data.frame(from = c("Björn", "Björn", "Olaf", "Olaf", "Olaf", "Olaf", "Karl", "Karl", # author cochange "Björn", "Björn", "Olaf", "Olaf", # author mail "Base_Feature", "Base_Feature", # artifact cochange "Björn", "Olaf", "Olaf", "Karl", "Thomas", "Thomas", # bipartite cochange - "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", # bipartite issue - "Olaf", "Olaf", "Olaf", "Olaf", "Karl", "Thomas", "Thomas"), + "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", "Björn", # bipartite issue + "Olaf", "Olaf", "Olaf", "Olaf", "Olaf", "Olaf", "Karl", "Thomas", "Thomas", "Thomas"), to = c("Olaf", "Olaf", "Karl", "Karl", "Thomas", "Thomas", "Thomas", "Thomas", # author cochange "Olaf", "Olaf", "Thomas", "Thomas", # author mail "foo", "foo", # artifact cochange "A", "A", "Base_Feature", "Base_Feature", "Base_Feature", "foo", # bipartite cochange "", "", "", "", # bipartite issue - "", "", "", "", + "", "", "", "", "", "", "", "", "", "", "", - "", "", ""), + "", "", "", "", "", ""), date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 16:00:45", "2016-07-12 16:05:41", # author cochange "2016-07-12 16:06:10", "2016-07-12 16:05:41", "2016-07-12 16:06:32", "2016-07-12 16:06:10", "2016-07-12 16:06:32", @@ -247,13 +255,14 @@ test_that("Construction of the multi network for the feature artifact with autho "2016-07-12 15:58:59", "2016-07-12 16:00:45", "2016-07-12 16:05:41", # bipartite cochange "2016-07-12 16:06:10", "2016-07-12 16:06:32", "2016-07-12 16:06:32", "2013-05-05 21:46:30", "2013-05-05 21:49:21", "2013-05-05 21:49:34", # bipartite issue - "2013-05-06 01:04:34", "2013-05-25 03:48:41", "2013-05-25 04:08:07", - "2016-07-12 16:02:30", "2016-07-15 19:55:39", "2017-05-23 12:32:39", + "2013-05-06 01:04:34", "2013-05-25 03:48:41", "2013-05-25 04:08:07", "2016-07-12 14:59:25", + "2016-07-12 16:02:30", "2016-07-12 16:06:01", "2016-07-15 19:55:39", "2017-05-23 12:32:39", "2013-05-25 03:25:06", "2013-05-25 06:06:53", "2013-05-25 06:22:23", - "2013-06-01 06:50:26", "2016-07-12 15:59:59", "2013-04-21 23:52:09", + "2013-06-01 06:50:26", "2016-07-12 16:01:01", "2016-07-12 16:02:02", + "2016-07-12 15:59:59", "2013-04-21 23:52:09", "2016-07-12 15:59:25", "2016-07-12 16:03:59")), artifact.type = c(rep("Feature", 8), rep("Mail", 4), rep("Feature", 2), rep("Feature", 
6), - rep("IssueEvent", 16)), + rep("IssueEvent", 21)), hash = c("72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "5a5ec9675e98187e1e92561e1888aa6f04faa338", # author cochange "3a0ed78458b3976243db6829f63eba3eead26774", "1143db502761379c2bfcecc2007fc34282e7ee61", "3a0ed78458b3976243db6829f63eba3eead26774", "0a1a5c523d835459c42f33e863623138555e2526", @@ -263,37 +272,37 @@ test_that("Construction of the multi network for the feature artifact with autho "72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "5a5ec9675e98187e1e92561e1888aa6f04faa338", # bipartite cochange "3a0ed78458b3976243db6829f63eba3eead26774", "1143db502761379c2bfcecc2007fc34282e7ee61", "0a1a5c523d835459c42f33e863623138555e2526", "0a1a5c523d835459c42f33e863623138555e2526", - rep(NA, 16)), # bipartite issue + rep(NA, 21)), # bipartite issue file = c("test.c", "test.c", "test2.c", "test3.c", "test2.c", "test2.c", "test3.c", "test2.c", # author cochange NA, NA, NA, NA, "test2.c", "test2.c", # artifact cochange "test.c", "test.c", "test2.c", "test3.c", "test2.c", "test2.c", # bipartite cochange - rep(NA, 16)), + rep(NA, 21)), artifact = c("A", "A", "Base_Feature", "Base_Feature", "Base_Feature", "Base_Feature", "Base_Feature", # author cochange "Base_Feature", rep(NA, 4), "Base_Feature", "foo", # bipartite cochange "A", "A", "Base_Feature", "Base_Feature", "Base_Feature", "foo", # bipartite cochange - rep(NA, 16)), + rep(NA, 21)), weight = 1, - type = c(rep(TYPE.EDGES.INTRA, 14), rep(TYPE.EDGES.INTER, 22)), + type = c(rep(TYPE.EDGES.INTRA, 14), rep(TYPE.EDGES.INTER, 27)), relation = c(rep("cochange", 8), rep("mail", 4), rep("cochange", 2), rep("cochange", 6), - rep("issue", 16)), + rep("issue", 21)), message.id = c(rep(NA, 8), "<4cbaa9ef0802201124v37f1eec8g89a412dfbfc8383a@mail.gmail.com>", "<6784529b0802032245r5164f984l342f0f0dc94aa420@mail.gmail.com>", "<65a1sf31sagd684dfv31@mail.gmail.com>", "<9b06e8d20801220234h659c18a3g95c12ac38248c7e0@mail.gmail.com>", - rep(NA, 24)), + rep(NA, 29)), thread = c(rep(NA, 8), "", "", "", "", - rep(NA, 24)), + rep(NA, 29)), issue.id = c(rep(NA, 20), "", "", "", "", # bipartite issue - "", "", "", "", - "", "", "", "", - "", "", "", ""), - event.name = c(rep(NA, 20), rep("commented", 16)) + "", "", "", "", "", "", + "", "", "", "", "", + "", "", "", "", "", ""), + event.name = c(rep(NA, 20), rep("commented", 21)) ) ## 3) build expected network diff --git a/tests/test-read.R b/tests/test-read.R index 0e78b848..5fb379dd 100644 --- a/tests/test-read.R +++ b/tests/test-read.R @@ -18,6 +18,7 @@ ## Copyright 2018 by Jakob Kronawitter ## Copyright 2018-2019 by Anselm Fehnker ## Copyright 2020-2021 by Niklas Schneider +## Copyright 2021 by Johannes Hostert ## All Rights Reserved. 
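The adjusted vertex and edge lists above encode that pull-request events now contribute additional `Issue` vertices and `commented` edges to the bipartite and multi networks. The following sketch outlines how such a multi-relation network is constructed; it is not part of the patch, and the `NetworkConf`/`NetworkBuilder` calls as well as the `proj.conf` object are assumptions based on the test setup and `showcase.R`.

```R
## combine several author relations in one network configuration (assumed API)
net.conf = NetworkConf$new()
net.conf$update.values(list(author.relation = c("cochange", "mail", "issue"),
                            artifact.relation = "cochange"))

## build the multi network from the testing data behind `proj.conf`
proj.data = ProjectData$new(project.conf = proj.conf)
netbuilder = NetworkBuilder$new(project.data = proj.data, network.conf = net.conf)
multi.network = netbuilder$get.multi.network()
```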
@@ -291,34 +292,53 @@ test_that("Read and parse the issue data.", { ## build the expected data.frame issue.data.expected = data.frame(issue.id = c(rep("", 13), rep("", 6), - rep("", 7), rep("", 10)), + rep("", 7), rep("", 10), + rep("", 6), rep("", 4), rep("", 3)), issue.title = c(rep("[ZEPPELIN-328] Interpreter page should clarify the % magic syntax for interpreter group.name", 13), rep("[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table", 6), rep("Error in construct.networks.from.list for openssl function networks", 7), - rep("Distinguish directedness of networks and edge-construction algorithm", 10)), + rep("Distinguish directedness of networks and edge-construction algorithm", 10), + rep("Example pull request 1", 6), + rep("Example pull request 2", 4), + rep("Example pull request 4", 3)), issue.type = I(c(rep(list(list("issue" , "bug")), 13), rep(list(list("issue" , "bug")), 6), - rep(list(list("issue" , "bug")), 7), rep(list(list("issue", "bug", "enhancement")), 10))), - issue.state = c(rep("closed", 13), rep("open", 6), rep("closed", 7), rep("open", 10)), + rep(list(list("issue" , "bug")), 7), rep(list(list("issue", "bug", "enhancement")), 10), + rep(list(list("pull request")), 6), rep(list(list("pull request")), 4), rep(list(list("pull request", "enhancement")), 3))), + issue.state = c(rep("closed", 13), rep("open", 6), rep("closed", 7), rep("open", 10), + rep("reopened", 6), rep("closed", 4), rep("open", 3)), issue.resolution = I(c(rep(list(list("fixed")), 13), rep(list(list("unresolved")), 6), - rep(list(list()), 7), rep(list(list()), 10))), + rep(list(list()), 7), rep(list(list()), 10), + rep(list(list()), 6), rep(list(list()), 4), rep(list(list()), 3))), creation.date = get.date.from.string(c(rep("2013-04-21 23:52:09", 13), - rep("2016-04-17 02:06:38", 6), + rep("2016-07-12 16:01:30", 6), rep("2016-07-12 15:59:25", 7), - rep("2016-07-12 14:30:13", 10))), + rep("2016-07-12 14:30:13", 10), + rep("2016-07-14 13:37:00", 6), + rep("2016-07-12 14:59:25", 4), + rep("2016-07-12 16:02:02", 3))), closing.date = get.date.from.string(c(rep("2013-05-25 20:02:08", 13), rep(NA, 6), - rep("2016-12-07 15:37:02", 7), rep(NA, 10))), + rep("2016-07-12 16:06:30", 7), rep(NA, 10), + rep(NA, 6), + rep("2016-07-12 16:04:59", 4), + rep(NA, 3))), issue.components = I(c(rep(list(list("GUI" , "Interpreters")), 13), rep(list(list("Interpreters")), 6), - rep(list(list()), 7), rep(list(list()), 10))), + rep(list(list()), 7), rep(list(list()), 10), + rep(list(list()), 6), rep(list(list()), 4), rep(list(list()), 3))), event.name = c("created", "commented", "commented", "commented", "commented", "commented", "commented", "commented", "commented", "commented", "commented", "commented", "resolution_updated", "created", "commented", "commented", "commented", "commented", "commented", "created", "assigned", "commented", "state_updated", "add_link", "referenced", "referenced", "mentioned", "subscribed", "commented", "mentioned", - "subscribed", "add_link", "mentioned", "subscribed", "labeled", "commented"), + "subscribed", "add_link", "mentioned", "subscribed", "labeled", "commented", + "created", "commented", "state_updated", "commented", "commented", "state_updated", + "created", "commented", "merged", "state_updated", + "commit_added", "created", "commented"), author.name = c("Thomas", "Thomas", "Björn", "Björn", "Björn", "Björn", "Olaf", "Björn", "Björn", "Olaf", "Olaf", "Olaf", "Björn", "Björn", "Björn", "Björn", "Max", "Max", "Max", "Karl", "Olaf", "Karl", "Olaf", "Karl", "Karl", "Thomas", 
"udo", - "udo", "Thomas", "Björn", "Björn", "Thomas", "Björn", "Björn", "Olaf", "Björn"), + "udo", "Thomas", "Björn", "Björn", "Thomas", "Björn", "Björn", "Olaf", "Björn", + "Thomas", "Thomas", "Thomas", "Olaf", "Björn", "Olaf", + "Björn", "Björn", "Olaf", "Olaf", "Björn", "Olaf", "Olaf"), author.email = c("thomas@example.org", "thomas@example.org", "bjoern@example.org", "bjoern@example.org", "bjoern@example.org", "bjoern@example.org", "olaf@example.org", "bjoern@example.org", "bjoern@example.org", @@ -330,7 +350,12 @@ test_that("Read and parse the issue data.", { "karl@example.org", "thomas@example.org", "udo@example.org", "udo@example.org", "thomas@example.org", "bjoern@example.org", "bjoern@example.org", "thomas@example.org", "bjoern@example.org", - "bjoern@example.org", "olaf@example.org", "bjoern@example.org"), + "bjoern@example.org", "olaf@example.org", "bjoern@example.org", + "thomas@example.org", "thomas@example.org", "thomas@example.org", + "olaf@example.org", "bjoern@example.org", "olaf@example.org", + "bjoern@example.org", "bjoern@example.org", "olaf@example.org", + "olaf@example.org", "bjoern@example.org", "olaf@example.org", + "olaf@example.org"), date = get.date.from.string(c("2013-04-21 23:52:09", "2013-04-21 23:52:09", "2013-05-05 21:46:30", "2013-05-05 21:49:21", "2013-05-05 21:49:34", "2013-05-06 01:04:34", @@ -348,15 +373,25 @@ test_that("Read and parse the issue data.", { "2016-07-12 16:03:59", "2016-08-31 15:30:02", "2016-10-05 15:30:02", "2016-10-13 15:30:02", "2016-12-07 15:30:02", "2016-12-07 15:30:02", - "2017-05-23 12:31:34", "2017-05-23 12:32:39")), + "2017-05-23 12:31:34", "2017-05-23 12:32:39", + "2016-07-12 15:59:25", "2016-07-12 15:59:25", + "2016-07-12 15:59:59", "2016-07-12 16:01:01", + "2016-07-12 16:06:01", "2016-07-14 13:37:00", + "2016-07-12 14:59:25", "2016-07-12 14:59:25", + "2016-07-12 16:04:59", "2016-07-12 16:04:59", + "2016-07-12 16:02:02", "2016-07-12 16:02:02", + "2016-07-12 16:02:02")), event.info.1 = c("open", "open", "open", "open", "open", "open", "open", "open", "open", "open", "open", "open", "fixed", "open", "open", "open", "open", "open", "open", "open", "", "open", "closed", "930af63a030fb92e48eddff01f53284c3eeba80e", "", "", "Thomas", "Thomas", "open", "Thomas", "Thomas", "fb52357f05958007b867da06f4077abdc04fa0d8", - "udo", "udo", "decided", "open"), + "udo", "udo", "decided", "open", + "open", "open", "closed", "closed", "closed", "open", + "open", "open", "", "closed", + "72c8dd25d3dd6d18f46e2b26a5f5b1e2e8dc28d0", "open", "open"), event.info.2 = NA, # is assigned later event.id = NA, # is assigned later - issue.source = c(rep("jira", 19), rep("github", 17)), + issue.source = c(rep("jira", 19), rep("github", 17), rep("github", 13)), artifact.type = "IssueEvent" ) @@ -368,7 +403,10 @@ test_that("Read and parse the issue data.", { list("unresolved"), list("unresolved"), list("unresolved"), list(), "", list(), "open", "commit", "", "", "thomas@example.org", "thomas@example.org", list(), "thomas@example.org", "thomas@example.org", "commit", "udo@example.org", - "udo@example.org", "", list() + "udo@example.org", "", list(), + list(), list(), "open", list(), list(), "closed", + list(), list(), "", "open", + "2016-07-12 15:58:59", list(), list() )) ## calculate event IDs diff --git a/tests/test-split-sliding-window.R b/tests/test-split-sliding-window.R index 258e6f2c..f37beabf 100644 --- a/tests/test-split-sliding-window.R +++ b/tests/test-split-sliding-window.R @@ -19,6 +19,7 @@ ## Copyright 2018 by Jakob Kronawitter ## Copyright 2019 by 
Anselm Fehnker ## Copyright 2021 by Niklas Schneider +## Copyright 2021 by Johannes Hostert ## All Rights Reserved. @@ -68,7 +69,7 @@ test_that("Split a data object time-based (split.basis = 'commits', sliding.wind data = list( commits = project.data$get.commits(), commit.messages = project.data$get.commit.messages(), - issues = project.data$get.issues(), + issues = project.data$get.issues.filtered(), mails = project.data$get.mails(), pasta = project.data$get.pasta(), synchronicity = project.data$get.synchronicity() @@ -107,11 +108,11 @@ test_that("Split a data object time-based (split.basis = 'commits', sliding.wind "2016-07-12 16:04:59-2016-07-12 16:06:33" = data$commit.messages ), issues = list( - "2016-07-12 15:58:59-2016-07-12 16:01:59" = data$issues[rownames(data$issues) %in% c(14, 20:22), ], - "2016-07-12 16:00:29-2016-07-12 16:03:29" = data$issues[rownames(data$issues) %in% c(14:15), ], - "2016-07-12 16:01:59-2016-07-12 16:04:59" = data$issues[rownames(data$issues) %in% c(15, 29), ], - "2016-07-12 16:03:29-2016-07-12 16:06:29" = data$issues[rownames(data$issues) == 29, ], - "2016-07-12 16:04:59-2016-07-12 16:06:33" = data$issues[rownames(data$issues) == 23, ] + "2016-07-12 15:58:59-2016-07-12 16:01:59" = data$issues[rownames(data$issues) %in% c(14, 20:22, 37:40), ], + "2016-07-12 16:00:29-2016-07-12 16:03:29" = data$issues[rownames(data$issues) %in% c(14:15, 40, 47:49), ], + "2016-07-12 16:01:59-2016-07-12 16:04:59" = data$issues[rownames(data$issues) %in% c(15, 29, 47:49), ], + "2016-07-12 16:03:29-2016-07-12 16:06:29" = data$issues[rownames(data$issues) %in% c(29,41,45,46), ], + "2016-07-12 16:04:59-2016-07-12 16:06:33" = data$issues[rownames(data$issues) %in% c(23,41,45,46), ] ), mails = list( "2016-07-12 15:58:59-2016-07-12 16:01:59" = data$mails[0, ], @@ -138,7 +139,7 @@ test_that("Split a data object time-based (split.basis = 'commits', sliding.wind results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -164,7 +165,7 @@ test_that("Split a data object time-based (split.basis = 'mails', sliding.window data = list( commits = project.data$get.commits(), commit.messages = project.data$get.commit.messages(), - issues = project.data$get.issues(), + issues = project.data$get.issues.filtered(), mails = project.data$get.mails(), pasta = project.data$get.pasta(), synchronicity = project.data$get.synchronicity() @@ -215,7 +216,7 @@ test_that("Split a data object time-based (split.basis = 'mails', sliding.window "2009-04-10 09:38:13-2012-04-10 03:38:13" = data$issues[0, ], "2010-10-10 06:38:13-2013-10-10 00:38:13" = data$issues[rownames(data$issues) %in% 1:13, ], "2012-04-10 03:38:13-2015-04-10 21:38:13" = data$issues[rownames(data$issues) %in% 1:13, ], - "2013-10-10 00:38:13-2016-07-12 16:05:38" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 27:29), ] + "2013-10-10 00:38:13-2016-07-12 16:05:38" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 27:29, 37:40, 43:49), ] ), mails = list( "2004-10-09 18:38:13-2007-10-10 12:38:13" = data$mails[rownames(data$mails) %in% 1:2, ], @@ -248,7 +249,7 @@ test_that("Split 
a data object time-based (split.basis = 'mails', sliding.window results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -274,7 +275,7 @@ test_that("Split a data object time-based (split.basis = 'issues', sliding.windo data = list( commits = project.data$get.commits(), commit.messages = project.data$get.commit.messages(), - issues = project.data$get.issues(), + issues = project.data$get.issues.filtered(), mails = project.data$get.mails(), pasta = project.data$get.pasta(), synchronicity = project.data$get.synchronicity() @@ -312,8 +313,8 @@ test_that("Split a data object time-based (split.basis = 'issues', sliding.windo issues = list( "2013-04-21 23:52:09-2015-04-22 11:52:09" = data$issues[rownames(data$issues) %in% 1:13, ], "2014-04-22 05:52:09-2016-04-21 17:52:09" = data$issues[0, ], - "2015-04-22 11:52:09-2017-04-21 23:52:09" = data$issues[rownames(data$issues) %in% 14:34, ], - "2016-04-21 17:52:09-2017-05-23 12:32:40" = data$issues[rownames(data$issues) %in% 14:36, ] + "2015-04-22 11:52:09-2017-04-21 23:52:09" = data$issues[rownames(data$issues) %in% c(14:34, 37:49), ], + "2016-04-21 17:52:09-2017-05-23 12:32:40" = data$issues[rownames(data$issues) %in% c(14:36, 37:49), ] ), mails = list( "2013-04-21 23:52:09-2015-04-22 11:52:09" = data$mails[0, ], @@ -337,7 +338,7 @@ test_that("Split a data object time-based (split.basis = 'issues', sliding.windo results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -364,7 +365,7 @@ test_that("Split a data object time-based (bins = ... , sliding.window = TRUE)." data = list( commits = project.data$get.commits(), commit.messages = project.data$get.commit.messages(), - issues = project.data$get.issues(), + issues = project.data$get.issues.filtered(), mails = project.data$get.mails(), pasta = project.data$get.pasta(), synchronicity = project.data$get.synchronicity() @@ -394,7 +395,7 @@ test_that("Split a data object time-based (bins = ... , sliding.window = TRUE)." "2016-12-31 23:59:59-2017-06-03 03:03:03" = data$commit.messages ), issues = list( - "2016-01-01 00:00:00-2016-12-31 23:59:59" = data$issues[rownames(data$issues) %in% 14:34, ], + "2016-01-01 00:00:00-2016-12-31 23:59:59" = data$issues[rownames(data$issues) %in% c(14:34, 37:49), ], "2016-12-31 23:59:59-2017-06-03 03:03:03" = data$issues[rownames(data$issues) %in% 35:36, ] ), mails = list( @@ -413,7 +414,7 @@ test_that("Split a data object time-based (bins = ... , sliding.window = TRUE)." 
results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -439,7 +440,7 @@ test_that("Split a data object activity-based (activity.type = 'commits', slidin data = list( commits = project.data$get.commits(), commit.messages = project.data$get.commit.messages(), - issues = project.data$get.issues(), + issues = project.data$get.issues.filtered(), mails = project.data$get.mails(), pasta = project.data$get.pasta(), synchronicity = project.data$get.synchronicity() @@ -474,8 +475,8 @@ test_that("Split a data object activity-based (activity.type = 'commits', slidin "2016-07-12 16:06:20-2016-07-12 16:06:33" = data$commit.messages ), issues = list( - "2016-07-12 15:58:59-2016-07-12 16:06:10" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 29), ], - "2016-07-12 16:00:45-2016-07-12 16:06:20" = data$issues[rownames(data$issues) %in% c(14:15, 29), ], + "2016-07-12 15:58:59-2016-07-12 16:06:10" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 29, 37:41, 45:49), ], + "2016-07-12 16:00:45-2016-07-12 16:06:20" = data$issues[rownames(data$issues) %in% c(14:15, 29, 40:41, 45:49), ], "2016-07-12 16:06:10-2016-07-12 16:06:32" = data$issues[rownames(data$issues) == 23, ], "2016-07-12 16:06:20-2016-07-12 16:06:33" = data$issues[rownames(data$issues) == 23, ] ), @@ -501,7 +502,7 @@ test_that("Split a data object activity-based (activity.type = 'commits', slidin results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -532,7 +533,7 @@ test_that("Split a data object activity-based (activity.type = 'commits', slidin "2016-07-12 15:58:59-2016-07-12 16:06:33" = data$commit.messages ), issues = list( - "2016-07-12 15:58:59-2016-07-12 16:06:33" = data$issues[rownames(data$issues) %in% c(14:15, 20:23, 29), ] + "2016-07-12 15:58:59-2016-07-12 16:06:33" = data$issues[rownames(data$issues) %in% c(14:15, 20:23, 29, 37:41, 45:49), ] ), mails = list( "2016-07-12 15:58:59-2016-07-12 16:06:33" = data$mails[rownames(data$mails) %in% 16:17, ] @@ -547,7 +548,7 @@ test_that("Split a data object activity-based (activity.type = 'commits', slidin results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -581,7 +582,7 @@ test_that("Split 
a data object activity-based (activity.type = 'commits', slidin "2016-07-12 16:06:20-2016-07-12 16:06:33" = data$commit.messages ), issues = list( - "2016-07-12 15:58:59-2016-07-12 16:06:20" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 29), ], + "2016-07-12 15:58:59-2016-07-12 16:06:20" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 29, 37:41, 45:49), ], "2016-07-12 16:06:20-2016-07-12 16:06:33" = data$issues[rownames(data$issues) == 23, ] ), mails = list( @@ -600,7 +601,7 @@ test_that("Split a data object activity-based (activity.type = 'commits', slidin results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -643,7 +644,7 @@ test_that("Split a data object activity-based (activity.type = 'commits', slidin data = list( commits = project.data$get.commits(), commit.messages = project.data$get.commit.messages(), - issues = project.data$get.issues(), + issues = project.data$get.issues.filtered(), mails = project.data$get.mails(), pasta = project.data$get.pasta(), synchronicity = project.data$get.synchronicity() @@ -681,8 +682,8 @@ test_that("Split a data object activity-based (activity.type = 'commits', slidin "2016-07-12 16:06:32-2016-07-12 16:06:33" = data$commit.messages ), issues = list( - "2016-07-12 15:58:59-2016-07-12 16:06:10" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 29), ], - "2016-07-12 16:00:45-2016-07-12 16:06:20" = data$issues[rownames(data$issues) %in% c(14:15, 29), ], + "2016-07-12 15:58:59-2016-07-12 16:06:10" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 29, 37:41, 45:49), ], + "2016-07-12 16:00:45-2016-07-12 16:06:20" = data$issues[rownames(data$issues) %in% c(14:15, 29, 40:41, 45:49), ], "2016-07-12 16:06:10-2016-07-12 16:06:32" = data$issues[rownames(data$issues) == 23, ], "2016-07-12 16:06:20-2016-07-12 16:06:33" = data$issues[rownames(data$issues) == 23, ], "2016-07-12 16:06:32-2016-07-12 16:06:33" = data$issues[0, ] @@ -712,7 +713,7 @@ test_that("Split a data object activity-based (activity.type = 'commits', slidin results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -738,7 +739,7 @@ test_that("Split a data object activity-based (activity.type = 'mails', sliding. data = list( commits = project.data$get.commits(), commit.messages = project.data$get.commit.messages(), - issues = project.data$get.issues(), + issues = project.data$get.issues.filtered(), mails = project.data$get.mails(), pasta = project.data$get.pasta(), synchronicity = project.data$get.synchronicity() @@ -797,10 +798,10 @@ test_that("Split a data object activity-based (activity.type = 'mails', sliding. 
"2010-07-12 12:05:34-2010-07-12 12:05:42" = data$issues[0, ], "2010-07-12 12:05:41-2010-07-12 12:05:44" = data$issues[0, ], "2010-07-12 12:05:42-2010-07-12 12:05:45" = data$issues[0, ], - "2010-07-12 12:05:44-2016-07-12 15:58:40" = data$issues[rownames(data$issues) %in% c(1:13, 27:28), ], - "2010-07-12 12:05:45-2016-07-12 15:58:50" = data$issues[rownames(data$issues) %in% c(1:13, 27:28), ], - "2016-07-12 15:58:40-2016-07-12 16:05:37" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 29), ], - "2016-07-12 15:58:50-2016-07-12 16:05:38" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 29), ] + "2010-07-12 12:05:44-2016-07-12 15:58:40" = data$issues[rownames(data$issues) %in% c(1:13, 27:28, 43:44), ], + "2010-07-12 12:05:45-2016-07-12 15:58:50" = data$issues[rownames(data$issues) %in% c(1:13, 27:28, 43:44), ], + "2016-07-12 15:58:40-2016-07-12 16:05:37" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 29, 37:40, 45:49), ], + "2016-07-12 15:58:50-2016-07-12 16:05:38" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 29, 37:40, 45:49), ] ), mails = list( "2004-10-09 18:38:13-2010-07-12 11:05:35" = data$mails[rownames(data$mails) %in% 1:3, ], @@ -842,7 +843,7 @@ test_that("Split a data object activity-based (activity.type = 'mails', sliding. results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -873,7 +874,7 @@ test_that("Split a data object activity-based (activity.type = 'mails', sliding. "2004-10-09 18:38:13-2016-07-12 16:05:38" = data$commit.messages ), issues = list( - "2004-10-09 18:38:13-2016-07-12 16:05:38" = data$issues[rownames(data$issues) %in% c(1:15, 20:22, 27:29), ] + "2004-10-09 18:38:13-2016-07-12 16:05:38" = data$issues[rownames(data$issues) %in% c(1:15, 20:22, 27:29, 37:40, 43:49), ] ), mails = list( "2004-10-09 18:38:13-2016-07-12 16:05:38" = data$mails @@ -888,7 +889,7 @@ test_that("Split a data object activity-based (activity.type = 'mails', sliding. results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -923,7 +924,7 @@ test_that("Split a data object activity-based (activity.type = 'mails', sliding. ), issues = list( "2004-10-09 18:38:13-2010-07-12 12:05:43" = data$issues[0, ], - "2010-07-12 12:05:43-2016-07-12 16:05:38" = data$issues[rownames(data$issues) %in% c(1:15, 20:22, 27:29), ] + "2010-07-12 12:05:43-2016-07-12 16:05:38" = data$issues[rownames(data$issues) %in% c(1:15, 20:22, 27:29, 37:40, 43:49), ] ), mails = list( "2004-10-09 18:38:13-2010-07-12 12:05:43" = data$mails[rownames(data$mails) %in% 1:8, ], @@ -941,7 +942,7 @@ test_that("Split a data object activity-based (activity.type = 'mails', sliding. 
results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -979,7 +980,7 @@ test_that("Split a data object activity-based (activity.type = 'issues', sliding data = list( commits = project.data$get.commits(), commit.messages = project.data$get.commit.messages(), - issues = project.data$get.issues(), + issues = project.data$get.issues.filtered(), mails = project.data$get.mails(), pasta = project.data$get.pasta(), synchronicity = project.data$get.synchronicity() @@ -992,10 +993,12 @@ test_that("Split a data object activity-based (activity.type = 'issues', sliding ## check time ranges expected = c( "2013-04-21 23:52:09-2013-05-25 06:22:23", - "2013-05-06 01:04:34-2016-07-12 15:59:25", - "2013-05-25 06:22:23-2016-07-12 16:03:59", - "2016-07-12 15:59:25-2016-07-27 20:12:08", - "2016-07-12 16:03:59-2016-10-05 15:30:02", + "2013-05-06 01:04:34-2016-07-12 15:30:02", + "2013-05-25 06:22:23-2016-07-12 15:59:59", + "2016-07-12 15:30:02-2016-07-12 16:02:02", + "2016-07-12 15:59:59-2016-07-12 16:06:30", + "2016-07-12 16:02:02-2016-07-27 20:12:08", + "2016-07-12 16:06:30-2016-10-05 15:30:02", "2016-07-27 20:12:08-2017-05-23 12:31:34", "2016-10-05 15:30:02-2017-05-23 12:32:40" ) @@ -1006,55 +1009,67 @@ test_that("Split a data object activity-based (activity.type = 'issues', sliding expected.data = list( commits = list( "2013-04-21 23:52:09-2013-05-25 06:22:23" = data$commits[0, ], - "2013-05-06 01:04:34-2016-07-12 15:59:25" = data$commits[1, ], - "2013-05-25 06:22:23-2016-07-12 16:03:59" = data$commits[1:2, ], - "2016-07-12 15:59:25-2016-07-27 20:12:08" = data$commits[2:8, ], - "2016-07-12 16:03:59-2016-10-05 15:30:02" = data$commits[3:8, ], + "2013-05-06 01:04:34-2016-07-12 15:30:02" = data$commits[0, ], + "2013-05-25 06:22:23-2016-07-12 15:59:59" = data$commits[1, ], + "2016-07-12 15:30:02-2016-07-12 16:02:02" = data$commits[1:2, ], + "2016-07-12 15:59:59-2016-07-12 16:06:30" = data$commits[2:5, ], + "2016-07-12 16:02:02-2016-07-27 20:12:08" = data$commits[3:8, ], + "2016-07-12 16:06:30-2016-10-05 15:30:02" = data$commits[6:8, ], "2016-07-27 20:12:08-2017-05-23 12:31:34" = data$commits[0, ], "2016-10-05 15:30:02-2017-05-23 12:32:40" = data$commits[0, ] ), commit.messages = list( "2013-04-21 23:52:09-2013-05-25 06:22:23" = data$commit.messages, - "2013-05-06 01:04:34-2016-07-12 15:59:25" = data$commit.messages, - "2013-05-25 06:22:23-2016-07-12 16:03:59" = data$commit.messages, - "2016-07-12 15:59:25-2016-07-27 20:12:08" = data$commit.messages, - "2016-07-12 16:03:59-2016-10-05 15:30:02" = data$commit.messages, + "2013-05-06 01:04:34-2016-07-12 15:30:02" = data$commit.messages, + "2013-05-25 06:22:23-2016-07-12 15:59:59" = data$commit.messages, + "2016-07-12 15:30:02-2016-07-12 16:02:02" = data$commit.messages, + "2016-07-12 15:59:59-2016-07-12 16:06:30" = data$commit.messages, + "2016-07-12 16:02:02-2016-07-27 20:12:08" = data$commit.messages, + "2016-07-12 16:06:30-2016-10-05 15:30:02" = data$commit.messages, "2016-07-27 20:12:08-2017-05-23 12:31:34" = data$commit.messages, "2016-10-05 15:30:02-2017-05-23 12:32:40" = data$commit.messages ), 
issues = list( "2013-04-21 23:52:09-2013-05-25 06:22:23" = data$issues[rownames(data$issues) %in% 1:10, ], - "2013-05-06 01:04:34-2016-07-12 15:59:25" = data$issues[rownames(data$issues) %in% c(6:13, 27:28), ], - "2013-05-25 06:22:23-2016-07-12 16:03:59" = data$issues[rownames(data$issues) %in% c(11:15, 20:22, 27:28), ], - "2016-07-12 15:59:25-2016-07-27 20:12:08" = data$issues[rownames(data$issues) %in% c(14:17, 20:23, 29),], - "2016-07-12 16:03:59-2016-10-05 15:30:02" = data$issues[rownames(data$issues) %in% c(16:19, 23:25, 29:30), ], - "2016-07-27 20:12:08-2017-05-23 12:31:34" = data$issues[rownames(data$issues) %in% c(18:19, 24:26, 30:34),], + "2013-05-06 01:04:34-2016-07-12 15:30:02" = data$issues[rownames(data$issues) %in% c(6:13, 43:44), ], + "2013-05-25 06:22:23-2016-07-12 15:59:59" = data$issues[rownames(data$issues) %in% c(11:13, 20:21, 27:28, 37:38, 43:44), ], + "2016-07-12 15:30:02-2016-07-12 16:02:02" = data$issues[rownames(data$issues) %in% c(14, 20:22, 27:28, 37:40),], + "2016-07-12 15:59:59-2016-07-12 16:06:30" = data$issues[rownames(data$issues) %in% c(14:15, 22, 29, 39:41, 45:49), ], + "2016-07-12 16:02:02-2016-07-27 20:12:08" = data$issues[rownames(data$issues) %in% c(15:17, 23, 29, 41:42, 45:49),], + "2016-07-12 16:06:30-2016-10-05 15:30:02" = data$issues[rownames(data$issues) %in% c(16:19, 23:25, 30, 42), ], + "2016-07-27 20:12:08-2017-05-23 12:31:34" = data$issues[rownames(data$issues) %in% c(18:19, 24:26, 30:34), ], "2016-10-05 15:30:02-2017-05-23 12:32:40" = data$issues[rownames(data$issues) %in% c(26, 31:36), ] ), mails = list( "2013-04-21 23:52:09-2013-05-25 06:22:23" = data$mails[0, ], - "2013-05-06 01:04:34-2016-07-12 15:59:25" = data$mails[rownames(data$mails) %in% 14:15, ], - "2013-05-25 06:22:23-2016-07-12 16:03:59" = data$mails[rownames(data$mails) %in% 14:15, ], - "2016-07-12 15:59:25-2016-07-27 20:12:08" = data$mails[rownames(data$mails) %in% 16:17, ], - "2016-07-12 16:03:59-2016-10-05 15:30:02" = data$mails[rownames(data$mails) %in% 16:17, ], + "2013-05-06 01:04:34-2016-07-12 15:30:02" = data$mails[0, ], + "2013-05-25 06:22:23-2016-07-12 15:59:59" = data$mails[rownames(data$mails) %in% 14:15, ], + "2016-07-12 15:30:02-2016-07-12 16:02:02" = data$mails[rownames(data$mails) %in% 14:15, ], + "2016-07-12 15:59:59-2016-07-12 16:06:30" = data$mails[rownames(data$mails) %in% 16:17, ], + "2016-07-12 16:02:02-2016-07-27 20:12:08" = data$mails[rownames(data$mails) %in% 16:17, ], + "2016-07-12 16:06:30-2016-10-05 15:30:02" = data$mails[0, ], "2016-07-27 20:12:08-2017-05-23 12:31:34" = data$mails[0, ], "2016-10-05 15:30:02-2017-05-23 12:32:40" = data$mails[0, ] ), pasta = list( "2013-04-21 23:52:09-2013-05-25 06:22:23" = data$pasta, - "2013-05-06 01:04:34-2016-07-12 15:59:25" = data$pasta, - "2013-05-25 06:22:23-2016-07-12 16:03:59" = data$pasta, - "2016-07-12 15:59:25-2016-07-27 20:12:08" = data$pasta, - "2016-07-12 16:03:59-2016-10-05 15:30:02" = data$pasta, + "2013-05-06 01:04:34-2016-07-12 15:30:02" = data$pasta, + "2013-05-25 06:22:23-2016-07-12 15:59:59" = data$pasta, + "2016-07-12 15:30:02-2016-07-12 16:02:02" = data$pasta, + "2016-07-12 15:59:59-2016-07-12 16:06:30" = data$pasta, + "2016-07-12 16:02:02-2016-07-27 20:12:08" = data$pasta, + "2016-07-12 16:06:30-2016-10-05 15:30:02" = data$pasta, "2016-07-27 20:12:08-2017-05-23 12:31:34" = data$pasta, "2016-10-05 15:30:02-2017-05-23 12:32:40" = data$pasta ), synchronicity = list( "2013-04-21 23:52:09-2013-05-25 06:22:23" = data$synchronicity, - "2013-05-06 01:04:34-2016-07-12 15:59:25" = data$synchronicity, - 
"2013-05-25 06:22:23-2016-07-12 16:03:59" = data$synchronicity, - "2016-07-12 15:59:25-2016-07-27 20:12:08" = data$synchronicity, - "2016-07-12 16:03:59-2016-10-05 15:30:02" = data$synchronicity, + "2013-05-06 01:04:34-2016-07-12 15:30:02" = data$synchronicity, + "2013-05-25 06:22:23-2016-07-12 15:59:59" = data$synchronicity, + "2016-07-12 15:30:02-2016-07-12 16:02:02" = data$synchronicity, + "2016-07-12 15:59:59-2016-07-12 16:06:30" = data$synchronicity, + "2016-07-12 16:02:02-2016-07-27 20:12:08" = data$synchronicity, + "2016-07-12 16:06:30-2016-10-05 15:30:02" = data$synchronicity, "2016-07-27 20:12:08-2017-05-23 12:31:34" = data$synchronicity, "2016-10-05 15:30:02-2017-05-23 12:32:40" = data$synchronicity ) @@ -1062,7 +1077,7 @@ test_that("Split a data object activity-based (activity.type = 'issues', sliding results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -1108,7 +1123,7 @@ test_that("Split a data object activity-based (activity.type = 'issues', sliding results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -1125,8 +1140,8 @@ test_that("Split a data object activity-based (activity.type = 'issues', sliding ## check time ranges expected = c( - "2013-04-21 23:52:09-2016-07-12 16:02:30", - "2016-07-12 16:02:30-2017-05-23 12:32:40" + "2013-04-21 23:52:09-2016-07-12 16:02:02", + "2016-07-12 16:02:02-2017-05-23 12:32:40" ) result = proj.conf$get.value("ranges") expect_equal(result, expected, info = "Time ranges (number.windows).") @@ -1134,34 +1149,34 @@ test_that("Split a data object activity-based (activity.type = 'issues', sliding ## check data for all ranges expected.data = list( commits = list( - "2013-04-21 23:52:09-2016-07-12 16:02:30" = data$commits[1:2, ], - "2016-07-12 16:02:30-2017-05-23 12:32:40" = data$commits[3:8, ] + "2013-04-21 23:52:09-2016-07-12 16:02:02" = data$commits[1:2, ], + "2016-07-12 16:02:02-2017-05-23 12:32:40" = data$commits[3:8, ] ), commit.messages = list( - "2013-04-21 23:52:09-2016-07-12 16:02:30" = data$commit.messages, - "2016-07-12 16:02:30-2017-05-23 12:32:40" = data$commit.messages + "2013-04-21 23:52:09-2016-07-12 16:02:02" = data$commit.messages, + "2016-07-12 16:02:02-2017-05-23 12:32:40" = data$commit.messages ), issues = list( - "2013-04-21 23:52:09-2016-07-12 16:02:30" = data$issues[rownames(data$issues) %in% c(1:14, 20:22, 27:28), ], - "2016-07-12 16:02:30-2017-05-23 12:32:40" = data$issues[rownames(data$issues) %in% c(15:19, 23:26, 29:36), ] + "2013-04-21 23:52:09-2016-07-12 16:02:02" = data$issues[rownames(data$issues) %in% c(1:14, 20:22, 27:28, 37:40, 43:44), ], + "2016-07-12 16:02:02-2017-05-23 12:32:40" = 
data$issues[rownames(data$issues) %in% c(15:19, 23:26, 29:36, 41:42, 45:49), ] ), mails = list( - "2013-04-21 23:52:09-2016-07-12 16:02:30" = data$mails[rownames(data$mails) %in% 14:15, ], - "2016-07-12 16:02:30-2017-05-23 12:32:40" = data$mails[rownames(data$mails) %in% 16:17, ] + "2013-04-21 23:52:09-2016-07-12 16:02:02" = data$mails[rownames(data$mails) %in% 14:15, ], + "2016-07-12 16:02:02-2017-05-23 12:32:40" = data$mails[rownames(data$mails) %in% 16:17, ] ), pasta = list( - "2013-04-21 23:52:09-2016-07-12 16:02:30" = data$pasta, - "2016-07-12 16:02:30-2017-05-23 12:32:40" = data$pasta + "2013-04-21 23:52:09-2016-07-12 16:02:02" = data$pasta, + "2016-07-12 16:02:02-2017-05-23 12:32:40" = data$pasta ), synchronicity = list( - "2013-04-21 23:52:09-2016-07-12 16:02:30" = data$synchronicity, - "2016-07-12 16:02:30-2017-05-23 12:32:40" = data$synchronicity + "2013-04-21 23:52:09-2016-07-12 16:02:02" = data$synchronicity, + "2016-07-12 16:02:02-2017-05-23 12:32:40" = data$synchronicity ) ) results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -1172,7 +1187,7 @@ test_that("Split a data object activity-based (activity.type = 'issues', sliding expect_error( split.data.activity.based(project.data, activity.type = "issues", - number.windows = nrow(project.data$get.issues()) + 10, sliding.window = TRUE), + number.windows = nrow(project.data$get.issues.filtered()) + 10, sliding.window = TRUE), info = "Error expected (number.windows) (1)." ) diff --git a/tests/test-split.R b/tests/test-split.R index bd8c8e52..947768a9 100644 --- a/tests/test-split.R +++ b/tests/test-split.R @@ -19,6 +19,7 @@ ## Copyright 2018 by Jakob Kronawitter ## Copyright 2019 by Anselm Fehnker ## Copyright 2021 by Niklas Schneider +## Copyright 2021 by Johannes Hostert ## All Rights Reserved. 
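This file, like the sliding-window tests above, now calls `get.issues.filtered()` wherever it previously called `get.issues()`, and the expected row sets grow by the rows of the new pull-request events (rows 37 to 49). A minimal sketch of the pattern these tests exercise, not part of the patch; `project.data` is assumed to be a fully configured `ProjectData` object:

```R
## split the data into two windows based on issue activity, then collect the
## filtered issue data per time range, as the tests below do
results = split.data.activity.based(project.data, activity.type = "issues",
                                    number.windows = 2)
issues.per.range = lapply(results, function(cf.data) cf.data$get.issues.filtered())

## requesting more windows than there are issue events is expected to fail:
## split.data.activity.based(project.data, activity.type = "issues",
##                           number.windows = nrow(project.data$get.issues.filtered()) + 10,
##                           sliding.window = TRUE)
```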
@@ -75,7 +76,7 @@ test_that("Split a data object time-based (split.basis = 'commits').", { data = list( commits = project.data$get.commits(), commit.messages = project.data$get.commit.messages(), - issues = project.data$get.issues(), + issues = project.data$get.issues.filtered(), mails = project.data$get.mails(), pasta = project.data$get.pasta(), synchronicity = project.data$get.synchronicity() @@ -107,9 +108,9 @@ test_that("Split a data object time-based (split.basis = 'commits').", { "2016-07-12 16:04:59-2016-07-12 16:06:33" = data$commit.messages ), issues = list( - "2016-07-12 15:58:59-2016-07-12 16:01:59" = data$issues[rownames(data$issues) %in% c(14, 20:22), ], - "2016-07-12 16:01:59-2016-07-12 16:04:59" = data$issues[rownames(data$issues) %in% c(15,29), ], - "2016-07-12 16:04:59-2016-07-12 16:06:33" = data$issues[rownames(data$issues) == 23, ] + "2016-07-12 15:58:59-2016-07-12 16:01:59" = data$issues[rownames(data$issues) %in% c(14, 20:22, 37:40), ], + "2016-07-12 16:01:59-2016-07-12 16:04:59" = data$issues[rownames(data$issues) %in% c(15,29, 47:49), ], + "2016-07-12 16:04:59-2016-07-12 16:06:33" = data$issues[rownames(data$issues) %in% c(23,41,45:46), ] ), mails = list( "2016-07-12 15:58:59-2016-07-12 16:01:59" = data$mails[0, ], @@ -130,7 +131,7 @@ test_that("Split a data object time-based (split.basis = 'commits').", { results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -157,7 +158,7 @@ test_that("Split a data object time-based (split.basis = 'mails').", { data = list( commits = project.data$get.commits(), commit.messages = project.data$get.commit.messages(), - issues = project.data$get.issues(), + issues = project.data$get.issues.filtered(), mails = project.data$get.mails(), pasta = project.data$get.pasta(), synchronicity = project.data$get.synchronicity() @@ -196,7 +197,7 @@ test_that("Split a data object time-based (split.basis = 'mails').", { "2004-10-09 18:38:13-2007-10-10 12:38:13" = data$issues[0, ], "2007-10-10 12:38:13-2010-10-10 06:38:13" = data$issues[0, ], "2010-10-10 06:38:13-2013-10-10 00:38:13" = data$issues[rownames(data$issues) %in% 1:13, ], - "2013-10-10 00:38:13-2016-07-12 16:05:38" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 27:29), ] + "2013-10-10 00:38:13-2016-07-12 16:05:38" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 27:29, 37:40, 43:49), ] ), mails = list( "2004-10-09 18:38:13-2007-10-10 12:38:13" = data$mails[rownames(data$mails) %in% 1:2, ], @@ -220,7 +221,7 @@ test_that("Split a data object time-based (split.basis = 'mails').", { results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -246,7 
+247,7 @@ test_that("Split a data object time-based (split.basis = 'issues').", { data = list( commits = project.data$get.commits(), commit.messages = project.data$get.commit.messages(), - issues = project.data$get.issues(), + issues = project.data$get.issues.filtered(), mails = project.data$get.mails(), pasta = project.data$get.pasta(), synchronicity = project.data$get.synchronicity() @@ -280,7 +281,7 @@ test_that("Split a data object time-based (split.basis = 'issues').", { ), issues = list( "2013-04-21 23:52:09-2015-04-22 11:52:09" = data$issues[rownames(data$issues) %in% 1:13, ], - "2015-04-22 11:52:09-2017-04-21 23:52:09" = data$issues[rownames(data$issues) %in% 14:34, ], + "2015-04-22 11:52:09-2017-04-21 23:52:09" = data$issues[rownames(data$issues) %in% c(14:34, 37:49), ], "2017-04-21 23:52:09-2017-05-23 12:32:40" = data$issues[rownames(data$issues) %in% 35:36, ] ), mails = list( @@ -302,7 +303,7 @@ test_that("Split a data object time-based (split.basis = 'issues').", { results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -330,7 +331,7 @@ test_that("Split a data object time-based (bins = ... ).", { data = list( commits = project.data$get.commits(), commit.messages = project.data$get.commit.messages(), - issues = project.data$get.issues(), + issues = project.data$get.issues.filtered(), mails = project.data$get.mails(), pasta = project.data$get.pasta(), synchronicity = project.data$get.synchronicity() @@ -356,7 +357,7 @@ test_that("Split a data object time-based (bins = ... ).", { "2016-01-01 00:00:00-2016-12-31 23:59:59" = data$commit.messages ), issues = list( - "2016-01-01 00:00:00-2016-12-31 23:59:59" = data$issues[rownames(data$issues) %in% 14:34, ] + "2016-01-01 00:00:00-2016-12-31 23:59:59" = data$issues[rownames(data$issues) %in% c(14:34, 37:49), ] ), mails = list( "2016-01-01 00:00:00-2016-12-31 23:59:59" = data$mails[rownames(data$mails) %in% 13:17, ] @@ -371,7 +372,7 @@ test_that("Split a data object time-based (bins = ... 
).", { results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -476,7 +477,7 @@ test_that("Test splitting data by ranges", { expected.data = list( commits = lapply(expected.results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(expected.results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(expected.results, function(cf.data) cf.data$get.issues()), + issues = lapply(expected.results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(expected.results, function(cf.data) cf.data$get.mails()), pasta = lapply(expected.results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(expected.results, function(cf.data) cf.data$get.synchronicity()) @@ -484,7 +485,7 @@ test_that("Test splitting data by ranges", { results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -510,7 +511,7 @@ test_that("Split a data object activity-based (activity.type = 'commits').", { data = list( commits = project.data$get.commits(), commit.messages = project.data$get.commit.messages(), - issues = project.data$get.issues(), + issues = project.data$get.issues.filtered(), mails = project.data$get.mails(), pasta = project.data$get.pasta(), synchronicity = project.data$get.synchronicity() @@ -542,7 +543,7 @@ test_that("Split a data object activity-based (activity.type = 'commits').", { "2016-07-12 16:06:32-2016-07-12 16:06:33" = data$commit.messages ), issues = list( - "2016-07-12 15:58:59-2016-07-12 16:06:10" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 29), ], + "2016-07-12 15:58:59-2016-07-12 16:06:10" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 29, 37:41, 45:49), ], "2016-07-12 16:06:10-2016-07-12 16:06:32" = data$issues[rownames(data$issues) == 23, ], "2016-07-12 16:06:32-2016-07-12 16:06:33" = data$issues[0, ] ), @@ -565,7 +566,7 @@ test_that("Split a data object activity-based (activity.type = 'commits').", { results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -596,7 +597,7 @@ test_that("Split a data object activity-based (activity.type = 'commits').", { "2016-07-12 15:58:59-2016-07-12 16:06:33" = data$commit.messages ), issues = list( 
- "2016-07-12 15:58:59-2016-07-12 16:06:33" = data$issues[rownames(data$issues) %in% c(14:15, 20:23, 29), ] + "2016-07-12 15:58:59-2016-07-12 16:06:33" = data$issues[rownames(data$issues) %in% c(14:15, 20:23, 29, 37:41, 45:49), ] ), mails = list( "2016-07-12 15:58:59-2016-07-12 16:06:33" = data$mails[rownames(data$mails) %in% 16:17, ] @@ -611,7 +612,7 @@ test_that("Split a data object activity-based (activity.type = 'commits').", { results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -645,7 +646,7 @@ test_that("Split a data object activity-based (activity.type = 'commits').", { "2016-07-12 16:06:20-2016-07-12 16:06:33" = data$commit.messages ), issues = list( - "2016-07-12 15:58:59-2016-07-12 16:06:20" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 29), ], + "2016-07-12 15:58:59-2016-07-12 16:06:20" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 29, 37:41, 45:49), ], "2016-07-12 16:06:20-2016-07-12 16:06:33" = data$issues[rownames(data$issues) == 23, ] ), mails = list( @@ -664,7 +665,7 @@ test_that("Split a data object activity-based (activity.type = 'commits').", { results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -702,7 +703,7 @@ test_that("Split a data object activity-based (activity.type = 'mails').", { data = list( commits = project.data$get.commits(), commit.messages = project.data$get.commit.messages(), - issues = project.data$get.issues(), + issues = project.data$get.issues.filtered(), mails = project.data$get.mails(), pasta = project.data$get.pasta(), synchronicity = project.data$get.synchronicity() @@ -746,8 +747,8 @@ test_that("Split a data object activity-based (activity.type = 'mails').", { "2004-10-09 18:38:13-2010-07-12 11:05:35" = data$issues[0, ], "2010-07-12 11:05:35-2010-07-12 12:05:41" = data$issues[0, ], "2010-07-12 12:05:41-2010-07-12 12:05:44" = data$issues[0, ], - "2010-07-12 12:05:44-2016-07-12 15:58:40" = data$issues[rownames(data$issues) %in% c(1:13, 27:28), ], - "2016-07-12 15:58:40-2016-07-12 16:05:37" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 29), ], + "2010-07-12 12:05:44-2016-07-12 15:58:40" = data$issues[rownames(data$issues) %in% c(1:13, 27:28, 43:44), ], + "2016-07-12 15:58:40-2016-07-12 16:05:37" = data$issues[rownames(data$issues) %in% c(14:15, 20:22, 29, 37:40, 45:49), ], "2016-07-12 16:05:37-2016-07-12 16:05:38" = data$issues[0, ] ), mails = list( @@ -778,7 +779,7 @@ test_that("Split a data object activity-based (activity.type = 'mails').", { results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) 
cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -809,7 +810,7 @@ test_that("Split a data object activity-based (activity.type = 'mails').", { "2004-10-09 18:38:13-2016-07-12 16:05:38" = data$commit.messages ), issues = list( - "2004-10-09 18:38:13-2016-07-12 16:05:38" = data$issues[rownames(data$issues) %in% c(1:15, 20:22, 27:29), ] + "2004-10-09 18:38:13-2016-07-12 16:05:38" = data$issues[rownames(data$issues) %in% c(1:15, 20:22, 27:29, 37:40, 43:45, 46:49), ] ), mails = list( "2004-10-09 18:38:13-2016-07-12 16:05:38" = data$mails @@ -824,7 +825,7 @@ test_that("Split a data object activity-based (activity.type = 'mails').", { results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -859,7 +860,7 @@ test_that("Split a data object activity-based (activity.type = 'mails').", { ), issues = list( "2004-10-09 18:38:13-2010-07-12 12:05:43" = data$issues[0, ], - "2010-07-12 12:05:43-2016-07-12 16:05:38" = data$issues[rownames(data$issues) %in% c(1:15, 20:22, 27:29), ] + "2010-07-12 12:05:43-2016-07-12 16:05:38" = data$issues[rownames(data$issues) %in% c(1:15, 20:22, 27:29, 37:40, 43:45, 46:49), ] ), mails = list( "2004-10-09 18:38:13-2010-07-12 12:05:43" = data$mails[rownames(data$mails) %in% 1:8, ], @@ -877,7 +878,7 @@ test_that("Split a data object activity-based (activity.type = 'mails').", { results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -914,7 +915,7 @@ test_that("Split a data object activity-based (activity.type = 'issues').", { data = list( commits = project.data$get.commits(), commit.messages = project.data$get.commit.messages(), - issues = project.data$get.issues(), + issues = project.data$get.issues.filtered(), mails = project.data$get.mails(), pasta = project.data$get.pasta(), synchronicity = project.data$get.synchronicity() @@ -927,8 +928,9 @@ test_that("Split a data object activity-based (activity.type = 'issues').", { ## check time ranges expected = c( "2013-04-21 23:52:09-2013-05-25 06:22:23", - "2013-05-25 06:22:23-2016-07-12 16:03:59", - "2016-07-12 16:03:59-2016-10-05 15:30:02", + "2013-05-25 06:22:23-2016-07-12 15:59:59", + "2016-07-12 15:59:59-2016-07-12 16:06:30", + "2016-07-12 16:06:30-2016-10-05 15:30:02", "2016-10-05 15:30:02-2017-05-23 12:32:40" ) result = proj.conf$get.value("ranges") @@ -938,45 +940,51 
@@ test_that("Split a data object activity-based (activity.type = 'issues').", { expected.data = list( commits = list( "2013-04-21 23:52:09-2013-05-25 06:22:23" = data$commits[0, ], - "2013-05-25 06:22:23-2016-07-12 16:03:59" = data$commits[1:2, ], - "2016-07-12 16:03:59-2016-10-05 15:30:02" = data$commits[3:8, ], + "2013-05-25 06:22:23-2016-07-12 15:59:59" = data$commits[1, ], + "2016-07-12 15:59:59-2016-07-12 16:06:30" = data$commits[2:5, ], + "2016-07-12 16:06:30-2016-10-05 15:30:02" = data$commits[6:8, ], "2016-10-05 15:30:02-2017-05-23 12:32:40" = data$commits[0, ] ), commit.messages = list( "2013-04-21 23:52:09-2013-05-25 06:22:23" = data$commit.messages, - "2013-05-25 06:22:23-2016-07-12 16:03:59" = data$commit.messages, - "2016-07-12 16:03:59-2016-10-05 15:30:02" = data$commit.messages, + "2013-05-25 06:22:23-2016-07-12 15:59:59" = data$commit.messages, + "2016-07-12 15:59:59-2016-07-12 16:06:30" = data$commit.messages, + "2016-07-12 16:06:30-2016-10-05 15:30:02" = data$commit.messages, "2016-10-05 15:30:02-2017-05-23 12:32:40" = data$commit.messages ), issues = list( "2013-04-21 23:52:09-2013-05-25 06:22:23" = data$issues[rownames(data$issues) %in% 1:10, ], - "2013-05-25 06:22:23-2016-07-12 16:03:59" = data$issues[rownames(data$issues) %in% c(11:15, 20:22, 27:28), ], - "2016-07-12 16:03:59-2016-10-05 15:30:02" = data$issues[rownames(data$issues) %in% c(16:19, 23:25, 29:30), ], + "2013-05-25 06:22:23-2016-07-12 15:59:59" = data$issues[rownames(data$issues) %in% c(11:13, 20:21, 27:28, 43:44, 37:38), ], + "2016-07-12 15:59:59-2016-07-12 16:06:30" = data$issues[rownames(data$issues) %in% c(14:15, 22, 29, 39:41, 45:49), ], + "2016-07-12 16:06:30-2016-10-05 15:30:02" = data$issues[rownames(data$issues) %in% c(16:19, 23:25, 30, 42), ], "2016-10-05 15:30:02-2017-05-23 12:32:40" = data$issues[rownames(data$issues) %in% c(26, 31:36), ] ), mails = list( "2013-04-21 23:52:09-2013-05-25 06:22:23" = data$mails[0, ], - "2013-05-25 06:22:23-2016-07-12 16:03:59" = data$mails[rownames(data$mails) %in% 14:15, ], - "2016-07-12 16:03:59-2016-10-05 15:30:02" = data$mails[rownames(data$mails) %in% 16:17, ], + "2013-05-25 06:22:23-2016-07-12 15:59:59" = data$mails[rownames(data$mails) %in% 14:15, ], + "2016-07-12 15:59:59-2016-07-12 16:06:30" = data$mails[rownames(data$mails) %in% 16:17, ], + "2016-07-12 16:06:30-2016-10-05 15:30:02" = data$mails[0, ], "2016-10-05 15:30:02-2017-05-23 12:32:40" = data$mails[0, ] ), pasta = list( "2013-04-21 23:52:09-2013-05-25 06:22:23" = data$pasta, - "2013-05-25 06:22:23-2016-07-12 16:03:59" = data$pasta, - "2016-07-12 16:03:59-2016-10-05 15:30:02" = data$pasta, + "2013-05-25 06:22:23-2016-07-12 15:59:59" = data$pasta, + "2016-07-12 15:59:59-2016-07-12 16:06:30" = data$pasta, + "2016-07-12 16:06:30-2016-10-05 15:30:02" = data$pasta, "2016-10-05 15:30:02-2017-05-23 12:32:40" = data$pasta ), synchronicity = list( "2013-04-21 23:52:09-2013-05-25 06:22:23" = data$synchronicity, - "2013-05-25 06:22:23-2016-07-12 16:03:59" = data$synchronicity, - "2016-07-12 16:03:59-2016-10-05 15:30:02" = data$synchronicity, + "2013-05-25 06:22:23-2016-07-12 15:59:59" = data$synchronicity, + "2016-07-12 15:59:59-2016-07-12 16:06:30" = data$synchronicity, + "2016-07-12 16:06:30-2016-10-05 15:30:02" = data$synchronicity, "2016-10-05 15:30:02-2017-05-23 12:32:40" = data$synchronicity ) ) results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, 
function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -1022,7 +1030,7 @@ test_that("Split a data object activity-based (activity.type = 'issues').", { results.data = list( commits = lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -1039,8 +1047,8 @@ test_that("Split a data object activity-based (activity.type = 'issues').", { ## check time ranges expected = c( - "2013-04-21 23:52:09-2016-07-12 16:02:30", - "2016-07-12 16:02:30-2017-05-23 12:32:40" + "2013-04-21 23:52:09-2016-07-12 16:02:02", + "2016-07-12 16:02:02-2017-05-23 12:32:40" ) result = proj.conf$get.value("ranges") expect_equal(result, expected, info = "Time ranges (number.windows).") @@ -1048,34 +1056,34 @@ test_that("Split a data object activity-based (activity.type = 'issues').", { ## check data for all ranges expected.data = list( commits = list( - "2013-04-21 23:52:09-2016-07-12 16:02:30" = data$commits[1:2, ], - "2016-07-12 16:02:30-2017-05-23 12:32:40" = data$commits[3:8, ] + "2013-04-21 23:52:09-2016-07-12 16:02:02" = data$commits[1:2, ], + "2016-07-12 16:02:02-2017-05-23 12:32:40" = data$commits[3:8, ] ), commit.messages = list( - "2013-04-21 23:52:09-2016-07-12 16:02:30" = data$commit.messages, - "2016-07-12 16:02:30-2017-05-23 12:32:40" = data$commit.messages + "2013-04-21 23:52:09-2016-07-12 16:02:02" = data$commit.messages, + "2016-07-12 16:02:02-2017-05-23 12:32:40" = data$commit.messages ), issues = list( - "2013-04-21 23:52:09-2016-07-12 16:02:30" = data$issues[rownames(data$issues) %in% c(1:14, 20:22, 27:28), ], - "2016-07-12 16:02:30-2017-05-23 12:32:40" = data$issues[rownames(data$issues) %in% c(15:19, 23:26, 29:36), ] + "2013-04-21 23:52:09-2016-07-12 16:02:02" = data$issues[rownames(data$issues) %in% c(1:14, 20:22, 27:28, 37:40, 43:44), ], + "2016-07-12 16:02:02-2017-05-23 12:32:40" = data$issues[rownames(data$issues) %in% c(15:19, 23:26, 29:36, 41:42, 45:49), ] ), mails = list( - "2013-04-21 23:52:09-2016-07-12 16:02:30" = data$mails[rownames(data$mails) %in% 14:15, ], - "2016-07-12 16:02:30-2017-05-23 12:32:40" = data$mails[rownames(data$mails) %in% 16:17, ] + "2013-04-21 23:52:09-2016-07-12 16:02:02" = data$mails[rownames(data$mails) %in% 14:15, ], + "2016-07-12 16:02:02-2017-05-23 12:32:40" = data$mails[rownames(data$mails) %in% 16:17, ] ), pasta = list( - "2013-04-21 23:52:09-2016-07-12 16:02:30" = data$pasta, - "2016-07-12 16:02:30-2017-05-23 12:32:40" = data$pasta + "2013-04-21 23:52:09-2016-07-12 16:02:02" = data$pasta, + "2016-07-12 16:02:02-2017-05-23 12:32:40" = data$pasta ), synchronicity = list( - "2013-04-21 23:52:09-2016-07-12 16:02:30" = data$synchronicity, - "2016-07-12 16:02:30-2017-05-23 12:32:40" = data$synchronicity + "2013-04-21 23:52:09-2016-07-12 16:02:02" = data$synchronicity, + "2016-07-12 16:02:02-2017-05-23 12:32:40" = data$synchronicity ) ) results.data = list( commits = 
lapply(results, function(cf.data) cf.data$get.commits()), commit.messages = lapply(results, function(cf.data) cf.data$get.commit.messages()), - issues = lapply(results, function(cf.data) cf.data$get.issues()), + issues = lapply(results, function(cf.data) cf.data$get.issues.filtered()), mails = lapply(results, function(cf.data) cf.data$get.mails()), pasta = lapply(results, function(cf.data) cf.data$get.pasta()), synchronicity = lapply(results, function(cf.data) cf.data$get.synchronicity()) @@ -1085,7 +1093,7 @@ test_that("Split a data object activity-based (activity.type = 'issues').", { ## too large number of windows expect_error( - split.data.activity.based(project.data, activity.type = "issues", number.windows = nrow(project.data$get.issues()) + 10), + split.data.activity.based(project.data, activity.type = "issues", number.windows = nrow(project.data$get.issues.filtered()) + 10), info = "Error expected (number.windows) (1)." ) diff --git a/util-core-peripheral.R b/util-core-peripheral.R index b098336d..d385add9 100644 --- a/util-core-peripheral.R +++ b/util-core-peripheral.R @@ -20,6 +20,7 @@ ## Copyright 2018 by Klara Schlüter ## Copyright 2019 by Thomas Bock ## Copyright 2019 by Jakob Kronawitter +## Copyright 2021 by Johannes Hostert ## All Rights Reserved. ## ## This file is derived from following Codeface script: @@ -578,15 +579,15 @@ get.author.class.network.hierarchy = function(network, result.limit = NULL, rest return(result) } - ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / ## Commit-based classification --------------------------------------------- ## * Count-based classification -------------------------------------------- + #' Classify authors into "core" and "peripheral" based on authors' commit-counts and return the classification result. #' -#' The details of the classification algorithm is explained in the documentation of \code{get.author.class.by.type}. +#' The details of the classification algorithm are explained in the documentation of \code{get.author.class.by.type}. #' #' @param proj.data the \code{ProjectData} containing the authors' commit data #' @param result.limit the maximum number of authors contained in the classification result. Only the top @@ -615,147 +616,6 @@ get.author.class.commit.count = function(proj.data, result.limit = NULL, restric return(result) } -#' Get the commit count per comitter in the given range data, where the committer -#' does not match the author of the respective commits -#' -#' @param range.data The data to count on -#' -#' @return A data frame in descending order by the commit count -get.committer.not.author.commit.count = function(range.data) { - logging::logdebug("get.committer.not.author.commit.count: starting.") - - ## Get commit data - commits.df = range.data$get.commits.filtered() - - ## For each commit hash, make sure there is only one row - commits.df = commits.df[!duplicated(commits.df[["hash"]]), ] - - ## Restrict commits to relevant columns - commits.df = commits.df[c("author.name", "committer.name")] - - ## Execute a query to get the commit count per author - res = sqldf::sqldf("SELECT *, COUNT(*) AS `freq` FROM `commits.df` - WHERE `committer.name` <> `author.name` - GROUP BY `committer.name`, `author.name` - ORDER BY `freq` DESC, `author.name` ASC") - - logging::logdebug("get.committer.not.author.commit.count: finished.") - return(res) -} - -#' Get the commit count per person in the given range data for commits where the author equals the committer. 
-#' -#' @param range.data The data to count on -#' -#' @return A data frame in descending order by the commit count -get.committer.and.author.commit.count = function(range.data) { - logging::logdebug("get.committer.and.author.commit.count: starting.") - - ## Get commit data - commits.df = range.data$get.commits.filtered() - - ## For each commit hash, make sure there is only one row - commits.df = commits.df[!duplicated(commits.df[["hash"]]), ] - - ## Restrict commits to relevant columns - commits.df = commits.df[c("author.name", "committer.name")] - - ## Execute a query to get the commit count per person - res = sqldf::sqldf("SELECT *, COUNT(*) AS `freq` FROM `commits.df` - WHERE `committer.name` = `author.name` - GROUP BY `committer.name`, `author.name` - ORDER BY `freq` DESC, `author.name` ASC") - - logging::logdebug("get.committer.and.author.commit.count: finished.") - return(res) -} - -#' Get the commit count per person in the given range data where the person is committer or author or both. -#' -#' @param range.data The data to count on -#' -#' @return A data frame in descending order by the commit count -get.committer.or.author.commit.count = function(range.data) { - logging::logdebug("get.committer.or.author.commit.count: starting.") - - ## Get commit data - commits.df = range.data$get.commits.filtered() - - ## For each commit hash, make sure there is only one row - commits.df = commits.df[!duplicated(commits.df[["hash"]]), ] - - ## Restrict commits to relevant columns - commits.df = commits.df[c("author.name", "committer.name")] - - ## Execute queries to get the commit count per person - ungrouped = sqldf::sqldf("SELECT `committer.name` AS `name` FROM `commits.df` - WHERE `committer.name` = `author.name` - UNION ALL - SELECT `author.name` AS `name` FROM `commits.df` - WHERE `author.name` <> `committer.name` - UNION ALL - SELECT `committer.name` AS `name` FROM `commits.df` - WHERE `author.name` <> `committer.name`") - - res = sqldf::sqldf("SELECT *, COUNT(*) AS `freq` FROM `ungrouped` - GROUP BY `name` - ORDER BY `freq` DESC, `name` ASC") - - logging::logdebug("get.committer.or.author.commit.count: finished.") - return(res) -} - -#' Get the commit count per committer in the given range data, where the committer -#' may match the author of the respective commits -#' -#' @param range.data The data to count on -#' -#' @return A data frame in descending order by the commit count. -get.committer.commit.count = function(range.data) { - logging::logdebug("get.committer.commit.count: starting.") - - ## Get commit data - commits.df = range.data$get.commits.filtered() - - ## For each commit hash, make sure there is only one row - commits.df = commits.df[!duplicated(commits.df[["hash"]]), ] - - ## Restrict commits to relevant columns - commits.df = commits.df[c("committer.name")] - - ## Execute a query to get the commit count per author - res = sqldf::sqldf("SELECT *, COUNT(*) AS `freq` FROM `commits.df` - GROUP BY `committer.name` ORDER BY `freq` DESC, `committer.name` ASC") - - logging::logdebug("get.committer.commit.count: finished.") - return(res) -} - -#' Get the commit count for each author based on the commit data contained in the specified \code{ProjectData}. 
-#' -#' @param proj.data the \code{ProjectData} containing the commit data -#' -#' @return a dataframe consisting of two columns, the first of which holding the authors' names and the second holding -#' their respective commit counts -get.author.commit.count = function(proj.data) { - logging::logdebug("get.author.commit.count: starting.") - - ## Get commit data - commits.df = proj.data$get.commits.filtered() - - ## For each commit hash, make sure there is only one row - commits.df = commits.df[!duplicated(commits.df[["hash"]]), ] - - ## Restrict commits to relevant columns - commits.df = commits.df[c("author.name")] - - ## Execute a query to get the commit count per author - res = sqldf::sqldf("SELECT `author.name`, COUNT(*) AS `freq` FROM `commits.df` - GROUP BY `author.name` ORDER BY `freq` DESC, `author.name` ASC") - - logging::logdebug("get.author.commit.count: finished.") - return(res) -} ## * LOC-based classification ---------------------------------------------- diff --git a/util-data-misc.R b/util-data-misc.R new file mode 100644 index 00000000..f6d22106 --- /dev/null +++ b/util-data-misc.R @@ -0,0 +1,421 @@ +## This file is part of coronet, which is free software: you +## can redistribute it and/or modify it under the terms of the GNU General +## Public License as published by the Free Software Foundation, version 2. +## +## This program is distributed in the hope that it will be useful, +## but WITHOUT ANY WARRANTY; without even the implied warranty of +## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +## GNU General Public License for more details. +## +## You should have received a copy of the GNU General Public License along +## with this program; if not, write to the Free Software Foundation, Inc., +## 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + +## Copyright 2017 by Ferdinand Frank +## Copyright 2017 by Sofie Kemper +## Copyright 2017-2020 by Claus Hunsen +## Copyright 2017 by Felix Prasse +## Copyright 2018 by Klara Schlüter +## Copyright 2019 by Jakob Kronawitter +## Copyright 2021 by Johannes Hostert +## All Rights Reserved. + +## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / +## Libraries --------------------------------------------------------------- + +requireNamespace("sqldf") # for SQL-selections on data.frames +requireNamespace("logging") # for logging + +#' Helper function to mask all issues in the issue data frame. +#' +#' \code{ProjectData$get.issues()} returns a dataframe that mixes issue and PR data. +#' This helper function creates a vector of length \code{nrow(issue.data)} which has +#' entry \code{TRUE} iff the corresponding row in \code{issue.data} is an issue. +#' +#' @param issue.data the issue data, returned from calling \code{get.issues()} +#' or \code{get.issues.filtered()} on a project data object +#' +#' @return a vector containing \code{TRUE} or \code{FALSE} +#' +#' @seealso ProjectData$get.issues.filtered() +#' @seealso ProjectData$get.issues() +mask.issues = function(issue.data) { + return(sapply(issue.data[["issue.type"]], function(tags) {return("issue" %in% tags)})) +} + +#' Helper function to mask all pull requests in the issue data frame. +#' +#' \code{ProjectData$get.issues()} returns a dataframe that mixes issue and PR data. +#' This helper function creates a vector of length \code{nrow(issue.data)} which has +#' entry \code{TRUE} iff the corresponding row in \code{issue.data} is a pull request. 
+#'
+#' @param issue.data the issue data, returned from calling \code{get.issues()}
+#'                   or \code{get.issues.filtered()} on a project data object
+#'
+#' @return a vector containing \code{TRUE} or \code{FALSE}
+#'
+#' @seealso ProjectData$get.issues.filtered()
+#' @seealso ProjectData$get.issues()
+mask.pull.requests = function(issue.data) {
+    return(sapply(issue.data[["issue.type"]], function(tags) {return("pull request" %in% tags)}))
+}
+
+#' Get and preprocess issue data, removing unnecessary columns and rows we are not interested in.
+#'
+#' Retained columns are given in \code{retained.cols}, which defaults to
+#' \code{author.name}, \code{issue.id} and \code{event.name}.
+#'
+#' Retained rows depend on the parameter \code{type}. If it is \code{"all"}, then all rows are retained.
+#' Otherwise, only the rows containing information about either issues or pull requests are retained.
+#'
+#' Note that we preprocess the unfiltered issue data, since common filtering options typically
+#' strip out some of the data we might explicitly want to retain.
+#'
+#' @param proj.data the \code{ProjectData} containing the issue data
+#' @param retained.cols the columns to be retained. [default: c("author.name", "issue.id", "event.name")]
+#' @param type which issue type to consider.
+#'             One of \code{"issues"}, \code{"pull.requests"} or \code{"all"}
+#'             [default: "all"]
+#'
+#' @return a filtered sub-data frame of the unfiltered issue data from \code{proj.data}.
+preprocess.issue.data = function(proj.data, retained.cols = c("author.name", "issue.id", "event.name"),
+                                 type = c("all", "pull.requests", "issues")) {
+    type = match.arg(type)
+    df = proj.data$get.issues()
+
+    ## for any subsetting vector `k`, `df[k, ]` fails if `nrow(df) == 0`,
+    ## so we return early in that case
+    if (nrow(df) == 0) {
+        return(df[retained.cols])
+    }
+
+    switch(
+        type,
+        all = {
+            df = df[retained.cols]
+        },
+        issues = {
+            df = df[mask.issues(df), retained.cols]
+        },
+        pull.requests = {
+            df = df[mask.pull.requests(df), retained.cols]
+        },
+        logging::logerror("Requested unknown issue type %s", type)
+    )
+    return(df)
+}
+
+
+#' Helper function that aggregates counts of things like commits, mails, ... on a per-author basis.
+#'
+#' For example, called with \code{name = "commit.count"}, \code{data.source = "commits"},
+#' \code{grouping.keys = c("committer.name")}, \code{remove.duplicates = TRUE} and
+#' \code{remove.duplicates.by = c("hash")}, the returned function will:
+#'
+#' 1. get the proper data frame (using \code{DATASOURCE.TO.ARTIFACT.FUNCTION}),
+#' 2. remove duplicate entries so that there is only one entry per commit hash,
+#' 3. project away unneeded columns, leaving only "committer.name",
+#' 4. count the commits grouped by the committer name,
+#' 5. return a data frame with columns "committer.name" and "freq",
+#'    which contains the number of commits committed by each person.
+#'
+#' The signature of the returned function is \code{function(project.data)}.
+#'
+#' @param name the name the function will be bound to, for logging
+#' @param data.source one of \code{"commits"}, \code{"mails"}, \code{"issues"} [default: "commits"]
+#' @param grouping.keys the dataframe keys to group by
+#' @param remove.duplicates whether to remove duplicates
+#' @param remove.duplicates.by if \code{remove.duplicates}, then the key by which to remove duplicates
+#'
+#' @return a function that aggregates data according to the above specification contained in a given \code{ProjectData}.
+#' This function itself returns a dataframe consisting of |grouping.keys|+1 columns, the last holding the count,
+#' and the others holding the respective grouping keys
+group.data.by.key = function(name, data.source = c("commits", "mails", "issues"),
+                             grouping.keys, remove.duplicates, remove.duplicates.by) {
+    data.source = match.arg(data.source)
+    data.extractor = DATASOURCE.TO.ARTIFACT.FUNCTION[[data.source]]
+    return(function(proj.data) {
+        logging::logdebug("%s: starting", name)
+
+        ## get the data we want to group
+        df = proj.data[[data.extractor]]()
+        ## if necessary, make sure that there is only one entry for each remove-duplicate key (combination)
+        if (remove.duplicates) {
+            df = df[!duplicated(df[[remove.duplicates.by]]), ]
+        }
+
+        ## throw away unnecessary columns
+        df = df[grouping.keys]
+        ## join all grouping keys into one string usable inside the SQL statement
+        ## (use `collapse` rather than `sep` so that multiple keys are actually joined)
+        grouping.keys.formatted = paste(grouping.keys, collapse = "`, `")
+
+        ## execute a query that counts the number of occurrences of the grouping.keys
+        stmt = paste0("SELECT `", grouping.keys.formatted, "`, COUNT(*) AS `freq` FROM `df`
+                       GROUP BY `", grouping.keys.formatted, "` ORDER BY `freq` DESC, `", grouping.keys.formatted, "`")
+        logging::logdebug("%s: running SQL %s", name, stmt)
+        res = sqldf::sqldf(stmt)
+
+        logging::logdebug("%s: finished", name)
+        return(res)
+    })
+}
+
+
+## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
+## Commit-based statistics -------------------------------------------------
+
+
+#' Get the commit count for each author based on the commit data contained in the specified \code{ProjectData}.
+#'
+#' @param proj.data the \code{ProjectData} containing the commit data
+#'
+#' @return a dataframe consisting of two columns, the first of which holding the authors' names and the second holding
+#'         their respective commit counts
+get.author.commit.count = group.data.by.key("get.author.commit.count", "commits",
+                                            c("author.name"), TRUE, c("hash"))
+
+#' Get the commit count for each committer based on the commit data contained in the specified \code{ProjectData}.
+#' The count is aggregated like in \code{get.author.commit.count}, but based on the "committer" commit attribute.
+#'
+#' @param proj.data the data to count on
+#'
+#' @return a data frame in descending order by the commit count.
+get.committer.commit.count = group.data.by.key("get.committer.commit.count", "commits",
+                                               c("committer.name"), TRUE, c("hash"))
+
+#' Get the commit count for each committer based on the commit data contained in the specified \code{ProjectData}.
+#' The count is aggregated like in \code{get.author.commit.count}, but based on the "committer" commit attribute.
+#' However, only commits where the committer is *not* the author are considered.
+#'
+#' @param range.data the range data to count on
+#'
+#' @return a data frame in descending order by the commit count.
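+#'
+#' @examples
+#' ## A minimal usage sketch (not run): `range.data` stands for any configured
+#' ## data object, e.g., a \code{ProjectData}, that provides commit data.
+#' \dontrun{
+#' counts = get.committer.not.author.commit.count(range.data)
+#' ## `counts` now holds the columns `author.name`, `committer.name`, and `freq`,
+#' ## sorted by decreasing `freq`
+#' }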
+get.committer.not.author.commit.count = function(range.data) {
+    logging::logdebug("get.committer.not.author.commit.count: starting.")
+
+    ## Get commit data
+    commits.df = range.data$get.commits.filtered()
+
+    ## For each commit hash, make sure there is only one row
+    commits.df = commits.df[!duplicated(commits.df[["hash"]]), ]
+
+    ## Restrict commits to relevant columns
+    commits.df = commits.df[c("author.name", "committer.name")]
+
+    ## Execute a query to get the commit count per committer-author pair
+    res = sqldf::sqldf("SELECT *, COUNT(*) AS `freq` FROM `commits.df`
+                        WHERE `committer.name` <> `author.name`
+                        GROUP BY `committer.name`, `author.name`
+                        ORDER BY `freq` DESC, `author.name` ASC")
+
+    logging::logdebug("get.committer.not.author.commit.count: finished.")
+    return(res)
+}
+
+#' Get the commit count for each person based on the commit data contained in the specified \code{ProjectData}.
+#' The count is aggregated like in \code{get.author.commit.count}, but only considers commits where the "committer" and
+#' "author" fields are identical.
+#'
+#' @param range.data the range data to count on
+#'
+#' @return a data frame in descending order by the commit count.
+get.committer.and.author.commit.count = function(range.data) {
+    logging::logdebug("get.committer.and.author.commit.count: starting.")
+
+    ## Get commit data
+    commits.df = range.data$get.commits.filtered()
+
+    ## For each commit hash, make sure there is only one row
+    commits.df = commits.df[!duplicated(commits.df[["hash"]]), ]
+
+    ## Restrict commits to relevant columns
+    commits.df = commits.df[c("author.name", "committer.name")]
+
+    ## Execute a query to get the commit count per person
+    res = sqldf::sqldf("SELECT *, COUNT(*) AS `freq` FROM `commits.df`
+                        WHERE `committer.name` = `author.name`
+                        GROUP BY `committer.name`, `author.name`
+                        ORDER BY `freq` DESC, `author.name` ASC")
+
+    logging::logdebug("get.committer.and.author.commit.count: finished.")
+    return(res)
+}
+
+#' Get the commit count for each person based on the commit data contained in the specified \code{ProjectData}.
+#' The count is aggregated like in \code{get.author.commit.count}, but only considers commits where a person is
+#' "committer" or "author" (or both, but one suffices).
+#'
+#' @param range.data the range data to count on
+#'
+#' @return a data frame in descending order by the commit count.
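+#'
+#' @examples
+#' ## A minimal usage sketch (not run): `range.data` stands for any configured
+#' ## data object, e.g., a \code{ProjectData}, that provides commit data.
+#' \dontrun{
+#' counts = get.committer.or.author.commit.count(range.data)
+#' ## `counts` now holds the columns `name` and `freq`, sorted by decreasing `freq`
+#' }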
+get.committer.or.author.commit.count = function(range.data) {
+    logging::logdebug("get.committer.or.author.commit.count: starting.")
+
+    ## Get commit data
+    commits.df = range.data$get.commits.filtered()
+
+    ## For each commit hash, make sure there is only one row
+    commits.df = commits.df[!duplicated(commits.df[["hash"]]), ]
+
+    ## Restrict commits to relevant columns
+    commits.df = commits.df[c("author.name", "committer.name")]
+
+    ## Execute queries to get the commit count per person
+    ungrouped = sqldf::sqldf("SELECT `committer.name` AS `name` FROM `commits.df`
+                              WHERE `committer.name` = `author.name`
+                              UNION ALL
+                              SELECT `author.name` AS `name` FROM `commits.df`
+                              WHERE `author.name` <> `committer.name`
+                              UNION ALL
+                              SELECT `committer.name` AS `name` FROM `commits.df`
+                              WHERE `author.name` <> `committer.name`")
+
+    res = sqldf::sqldf("SELECT *, COUNT(*) AS `freq` FROM `ungrouped`
+                        GROUP BY `name`
+                        ORDER BY `freq` DESC, `name` ASC")
+
+    logging::logdebug("get.committer.or.author.commit.count: finished.")
+    return(res)
+}
+
+
+## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
+## Mail-based statistics ---------------------------------------------------
+
+#' Get the mail count for each author based on the mail data contained in the specified \code{ProjectData}.
+#'
+#' @param proj.data the \code{ProjectData} containing the mail data
+#'
+#' @return a dataframe consisting of two columns, the first of which holding the authors' names and the second holding
+#'         their respective mail counts
+get.author.mail.count = group.data.by.key("get.author.mail.count", "mails",
+                                          c("author.name"), TRUE, c("message.id"))
+
+#' Get the mail-thread count for each author based on the mail data contained in the specified \code{ProjectData}.
+#' This is the number of threads the author participated in, i.e., contributed at least one e-mail to.
+#'
+#' @param proj.data the \code{ProjectData} containing the mail data
+#'
+#' @return a dataframe consisting of two columns, the first of which holding the authors' names and the second holding
+#'         their respective mail thread counts
+get.author.mail.thread.count = function(proj.data) {
+    logging::logdebug("get.author.mail.thread.count: starting.")
+
+    mails.df = proj.data$get.mails()
+    ## Remove unnecessary rows and columns
+    mails.df = mails.df[!duplicated(mails.df[["message.id"]]), ]
+    mails.df = mails.df[c("author.name", "message.id", "thread")]
+    ## Only count each thread once
+    stmt = "SELECT `author.name`, COUNT(DISTINCT `thread`) AS `freq` FROM `mails.df`
+            GROUP BY `author.name` ORDER BY `freq` DESC, `author.name` ASC"
+    logging::logdebug("get.author.mail.thread.count: running SQL %s", stmt)
+    res = sqldf::sqldf(stmt)
+    logging::logdebug("get.author.mail.thread.count: finished.")
+    return(res)
+}
+
+## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
+## Issue-/PR-based statistics ----------------------------------------------
+
+#' Get the issue/PR count for each author based on the issue data contained in the specified \code{ProjectData}.
+#' The issue count here is the number of issues the author participated in (which can mean anything,
+#' from commenting to closing to assigning the issue to others, to labeling, referencing it in other issues,
+#' adding commits, ...).
+#'
+#' The type argument specifies whether we count PRs alone, issues alone, or both (\code{"all"}).
+#'
+#' @param proj.data the \code{ProjectData} containing the issue data
+#' @param type which issue type to consider (see \code{preprocess.issue.data}).
+#'             One of \code{"issues"}, \code{"pull.requests"} or \code{"all"}
+#'             [default: "all"]
+#'
+#' @return a dataframe consisting of two columns, the first of which holding the authors' names and the second holding
+#'         their respective issue counts
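+#'
+#' @examples
+#' ## A minimal usage sketch (not run): `proj.data` stands for any configured
+#' ## \code{ProjectData} with issue data.
+#' \dontrun{
+#' ## count only pull requests instead of all issue types
+#' pr.counts = get.author.issue.count(proj.data, type = "pull.requests")
+#' }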
+get.author.issue.count = function(proj.data, type = c("all", "issues", "pull.requests")) {
+    type = match.arg(type)
+    logging::logdebug("get.author.issue.count: starting.")
+    df = preprocess.issue.data(proj.data, type = type)
+    ## count distinct since an author may appear in the same issue multiple times
+    stmt = "SELECT `author.name`, COUNT(DISTINCT `issue.id`) AS `freq` FROM `df`
+            GROUP BY `author.name` ORDER BY `freq` DESC, `author.name` ASC"
+    res = sqldf::sqldf(stmt)
+    logging::logdebug("get.author.issue.count: finished.")
+    return(res)
+}
+
+#' Get the issue/PR count for each author based on the issue data contained in the specified \code{ProjectData}.
+#' The issue count here is the number of issues the author created.
+#'
+#' The type argument specifies whether we count PRs alone, issues alone, or both (\code{"all"}).
+#'
+#' @param proj.data the \code{ProjectData} containing the issue data
+#' @param type which issue type to consider (see \code{preprocess.issue.data}).
+#'             One of \code{"issues"}, \code{"pull.requests"} or \code{"all"}
+#'             [default: "all"]
+#'
+#' @return a dataframe consisting of two columns, the first of which holding the authors' names and the second holding
+#'         their respective issue counts
+get.author.issues.created.count = function(proj.data, type = c("all", "issues", "pull.requests")) {
+    type = match.arg(type)
+    logging::logdebug("get.author.issues.created.count: starting.")
+    df = preprocess.issue.data(proj.data, type = type)
+    ## count distinct since an author may appear in the same issue multiple times
+    stmt = "SELECT `author.name`, COUNT(DISTINCT `issue.id`) AS `freq` FROM `df`
+            WHERE `event.name` = 'created'
+            GROUP BY `author.name` ORDER BY `freq` DESC, `author.name` ASC"
+    res = sqldf::sqldf(stmt)
+    logging::logdebug("get.author.issues.created.count: finished.")
+    return(res)
+}
+
+#' Get the issue/PR count for each author based on the issue data contained in the specified \code{ProjectData}.
+#' The issue count here is the number of issues the author commented in.
+#'
+#' The type argument specifies whether we count PRs alone, issues alone, or both (\code{"all"}).
+#'
+#' @param proj.data the \code{ProjectData} containing the issue data
+#' @param type which issue type to consider (see \code{preprocess.issue.data}).
+#'             One of \code{"issues"}, \code{"pull.requests"} or \code{"all"}
+#'             [default: "all"]
+#'
+#' @return a dataframe consisting of two columns, the first of which holding the authors' names and the second holding
+#'         their respective issue counts
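+#'
+#' @examples
+#' ## A minimal usage sketch (not run): `proj.data` stands for any configured
+#' ## \code{ProjectData} with issue data.
+#' \dontrun{
+#' commenters = get.author.issues.commented.in.count(proj.data, type = "issues")
+#' }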
+#' One of \code{"issues"}, \code{"pull.requests"} or \code{"all"} +#' [default: "all"] +#' +#' @return a dataframe consisting of two columns, the first of which holding the authors' names and the second holding +#' their respective issue counts +get.author.issues.commented.in.count = function(proj.data, type = c("all", "issues", "pull.requests")) { + type = match.arg(type) + logging::logdebug("get.author.issues.commented.in.count: starting.") + df = preprocess.issue.data(proj.data, type = type) + ## count distinct since an author may appear in the same issue multiple times + stmt = "SELECT `author.name`, COUNT( DISTINCT `issue.id`) as `freq` FROM `df` + WHERE `event.name` = 'commented' + GROUP BY `author.name` ORDER BY `freq` DESC, `author.name` ASC" + res = sqldf::sqldf(stmt) + logging::logdebug("get.author.issues.commented.in.count: finished") + return(res) +} + +#' Get the issue/pr comment count for each author based on the issue data contained in the specified \code{ProjectData}. +#' The issue comment count here is the number of comments the author created summed across all issues +#' +#' The type argument specifies whether we count PRs alone, issues alone, or both (\code{"all"}). +#' +#' @param proj.data the \code{ProjectData} containing the issue data +#' @param type which issue type to consider (see \code{preprocess.issue.data}). +#' One of \code{"issues"}, \code{"pull.requests"} or \code{"all"} +#' [default: "all"] +#' +#' @return a dataframe consisting of two columns, the first of which holding the authors' names and the second holding +#' their respective comment counts +get.author.issue.comment.count = function(proj.data, type = c("all", "issues", "pull.requests")) { + type = match.arg(type) + logging::logdebug("get.author.issue.comment.count: starting.") + df = preprocess.issue.data(proj.data, type = type) + stmt = "SELECT `author.name`, COUNT(*) as `freq` FROM `df` + WHERE `event.name` = 'commented' + GROUP BY `author.name` ORDER BY `freq` DESC, `author.name` ASC" + res = sqldf::sqldf(stmt) + logging::logdebug("get.author.issue.comment.count: finished") + return(res) +} diff --git a/util-data.R b/util-data.R index 0a47118d..fb0a119b 100644 --- a/util-data.R +++ b/util-data.R @@ -22,6 +22,7 @@ ## Copyright 2018-2019 by Jakob Kronawitter ## Copyright 2019-2020 by Anselm Fehnker ## Copyright 2020-2021 by Niklas Schneider +## Copyright 2021 by Johannes Hostert ## All Rights Reserved. @@ -56,7 +57,7 @@ BASE.ARTIFACTS = c( DATASOURCE.TO.ARTIFACT.FUNCTION = list( "commits" = "get.commits.filtered", "mails" = "get.mails", - "issues" = "get.issues" + "issues" = "get.issues.filtered" ) ## mapping of data source to artifact column @@ -111,6 +112,7 @@ ProjectData = R6::R6Class("ProjectData", mails.patchstacks = NULL, # list ## issues issues = NULL, #data.frame + issues.filtered = NULL, #data.frame ## authors authors = NULL, # data.frame ## additional data sources @@ -147,6 +149,25 @@ ProjectData = R6::R6Class("ProjectData", return(commits) }, + ## * * issue filtering -------------------------------------------- + + #' Filter issue by potentially removing all issue events that are not comments. 
+        #'
+        #' @param issues the data.frame of issues on which filtering will be applied
+        #' @param issues.only.comments flag whether non-comment issue events are removed
+        #'
+        #' @return the issues after all filters have been applied
+        filter.issues = function(issues, issues.only.comments) {
+            logging::logdebug("filter.issues: starting.")
+
+            if (issues.only.comments) {
+                issues = issues[issues[["event.name"]] == "commented", ]
+            }
+
+            logging::logdebug("filter.issues: finished.")
+            return(issues)
+        },
+
         ## * * mail filtering ----------------------------------------------

         #' Filter patchstack mails from the mails that are currently cached in the field \code{mails} and return them.
@@ -545,6 +566,7 @@ ProjectData = R6::R6Class("ProjectData",
             private$commit.messages = NULL
             private$mails = NULL
             private$issues = NULL
+            private$issues.filtered = NULL
             private$authors = NULL
             private$synchronicity = NULL
             private$pasta = NULL
@@ -1066,10 +1088,47 @@ ProjectData = R6::R6Class("ProjectData",
             private$authors = data
         },

-        #' Get the issue data.
+        #' Get the issue data, filtered according to options in the project configuration:
+        #' * The option \code{issues.only.comments} removes all events that are not comments
+        #'   from the issue data.
+        #'
+        #' If it does not already exist, call the read method.
+        #'
+        #' @return the issue data
+        get.issues.filtered = function() {
+            logging::loginfo("Getting issue data")
+
+            ## filter the issues if this has not been done yet
+            if (is.null(private$issues.filtered)) {
+                private$issues.filtered = private$filter.issues(
+                    self$get.issues(),
+                    private$project.conf$get.value("issues.only.comments"))
+            }
+            return(private$issues.filtered)
+        },
+
+        #' Get the issue data, filtered according to the given parameters.
+        #'
+        #' Unlike \code{get.issues.filtered}, this method does not use caching. If you want caching, please use
+        #' that method instead.
+        #'
+        #' @param issues.only.comments flag whether issue events that are not comments
+        #'                             (i.e., opening, closing, ...) are removed
+        #'
+        #' @return the issue data
+        #'
+        #' @seealso get.issues.filtered
+        get.issues.filtered.uncached = function(issues.only.comments) {
+            logging::loginfo("Getting issue data")
+            return(private$filter.issues(self$get.issues(), issues.only.comments))
+        },
+
+        #' Get the issue data, unfiltered.
         #' If it does not already exist call the read method.
         #'
         #' @return the issue data
+        #'
+        #' @seealso get.issues.filtered for a detailed description of filtering options
         get.issues = function() {
             logging::loginfo("Getting issue data")
@@ -1078,13 +1137,7 @@ ProjectData = R6::R6Class("ProjectData",
                 private$issues = read.issues(self$get.data.path.issues(), private$project.conf$get.value("issues.from.source"))
             }
             private$extract.timestamps(source = "issues")
-
-            if (private$project.conf$get.value("issues.only.comments")) {
-                df = private$issues[private$issues[["event.name"]] == "commented", ]
-                return(df)
-            } else {
-                return(private$issues)
-            }
+            return(private$issues)
         },

         #' Set the issue data to the given new data.
@@ -1098,6 +1151,7 @@ ProjectData = R6::R6Class("ProjectData",
             }

             private$issues = data
+            private$issues.filtered = NULL
         },

         #' Get the list of artifacts from the given \code{data.source} of the project.
diff --git a/util-init.R b/util-init.R
index 30c7cc2e..7e2d54d1 100644
--- a/util-init.R
+++ b/util-init.R
@@ -18,6 +18,7 @@
 ## Copyright 2017 by Felix Prasse
 ## Copyright 2019 by Klara Schlüter
 ## Copyright 2019-2020 by Anselm Fehnker
+## Copyright 2021 by Johannes Hostert
 ## All Rights Reserved.
@@ -54,6 +55,7 @@ source("util-misc.R")
 source("util-conf.R")
 source("util-read.R")
 source("util-data.R")
+source("util-data-misc.R")
 source("util-networks.R")
 source("util-split.R")
 source("util-motifs.R")
diff --git a/util-networks-covariates.R b/util-networks-covariates.R
index 1aa00603..76e54376 100644
--- a/util-networks-covariates.R
+++ b/util-networks-covariates.R
@@ -17,6 +17,7 @@
 ## Copyright 2018-2019 by Klara Schlüter
 ## Copyright 2018 by Jakob Kronawitter
 ## Copyright 2020 by Christian Hechtl
+## Copyright 2021 by Johannes Hostert
 ## All Rights Reserved.

 ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
@@ -345,6 +346,193 @@ add.vertex.attribute.commit.count.helper = function(list.of.networks, project.da
     return(nets.with.attr)
 }

+
+## * Mail count ----------------------------------------------------------
+
+#' Add mail-count attribute based on the total number of mails sent,
+#' where the person represented by the vertex is the author.
+#'
+#' @param list.of.networks The network list
+#' @param project.data The project data
+#' @param name The attribute name to add [default: "mail.count"]
+#' @param aggregation.level Determines the data to use for the attribute calculation.
+#'                          One of \code{"range"}, \code{"cumulative"}, \code{"all.ranges"},
+#'                          \code{"project.cumulative"}, \code{"project.all.ranges"}, and
+#'                          \code{"complete"}. See \code{split.data.by.networks} for
+#'                          more details. [default: "range"]
+#' @param default.value The default value to add if a vertex has no matching value [default: 0L]
+#'
+#' @return A list of networks with the added attribute
+add.vertex.attribute.mail.count = function(list.of.networks, project.data,
+                                           name = "mail.count",
+                                           aggregation.level = c("range", "cumulative", "all.ranges",
+                                                                 "project.cumulative", "project.all.ranges",
+                                                                 "complete"),
+                                           default.value = 0L) {
+    nets.with.attr = add.vertex.attribute.commit.count.helper(
+        list.of.networks, project.data, name, aggregation.level,
+        default.value, get.author.mail.count, "author.name"
+    )
+
+    return(nets.with.attr)
+}
+
+#' Add mail-thread-count attribute based on the number of mail threads participated in,
+#' where the person represented by the vertex is the author.
+#'
+#' @param list.of.networks The network list
+#' @param project.data The project data
+#' @param name The attribute name to add [default: "mail.thread.count"]
+#' @param aggregation.level Determines the data to use for the attribute calculation.
+#'                          One of \code{"range"}, \code{"cumulative"}, \code{"all.ranges"},
+#'                          \code{"project.cumulative"}, \code{"project.all.ranges"}, and
+#'                          \code{"complete"}. See \code{split.data.by.networks} for
+#'                          more details. [default: "range"]
+#' @param default.value The default value to add if a vertex has no matching value [default: 0L]
+#'
+#' @return A list of networks with the added attribute
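+#'
+#' @examples
+#' ## A minimal usage sketch (not run): `networks` stands for a list of author
+#' ## networks and `project.data` for the corresponding \code{ProjectData}
+#' ## (cf. `showcase.R`).
+#' \dontrun{
+#' networks = add.vertex.attribute.mail.thread.count(networks, project.data)
+#' }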
[default: "range"] +#' @param default.value The default value to add if a vertex has no matching value [default: 0L] +#' +#' @return A list of networks with the added attribute +add.vertex.attribute.mail.thread.count = function(list.of.networks, project.data, + name = "mail.thread.count", + aggregation.level = c("range", "cumulative", "all.ranges", + "project.cumulative", "project.all.ranges", + "complete"), + default.value = 0L) { + nets.with.attr = add.vertex.attribute.commit.count.helper( + list.of.networks, project.data, name, aggregation.level, + default.value, get.author.mail.thread.count, "author.name" + ) + + return(nets.with.attr) +} + +## * Issue / PR count -------------------------------------------------------------- + +#' Add issue-count attribute based on the number of issues participated in, +#' where the person represented by the vertex is the author. +#' +#' @param list.of.networks The network list +#' @param project.data The project data +#' @param name The attribute name to add. You might want to change this [default: "issue.count"] +#' @param aggregation.level Determines the data to use for the attribute calculation. +#' One of \code{"range"}, \code{"cumulative"}, \code{"all.ranges"}, +#' \code{"project.cumulative"}, \code{"project.all.ranges"}, and +#' \code{"complete"}. See \code{split.data.by.networks} for +#' more details. [default: "range"] +#' @param default.value The default value to add if a vertex has no matching value [default: 0L] +#' @param issue.type The issue kind,see \code{preprocess.issue.data} [default: "all"] +#' +#' @return A list of networks with the added attribute +add.vertex.attribute.issue.count = function(list.of.networks, project.data, + name = "issue.count", + aggregation.level = c("range", "cumulative", "all.ranges", + "project.cumulative", "project.all.ranges", + "complete"), + default.value = 0L, issue.type = c("all", "pull.requests", "issues")) { + if (name == "issue.count" && identical(issue.type, "pull.requests")) { + name = "pull.request.count" + } + nets.with.attr = add.vertex.attribute.commit.count.helper( + list.of.networks, project.data, name, aggregation.level, + default.value, function(data) {return(get.author.issue.count(data, type = issue.type))}, "author.name" + ) + + return(nets.with.attr) +} + +#' Add issue-count attribute based on the number of issues participated in by commenting, +#' where the person represented by the vertex is the author. +#' +#' @param list.of.networks The network list +#' @param project.data The project data +#' @param name The attribute name to add [default: "issues.commented.count"] +#' @param aggregation.level Determines the data to use for the attribute calculation. +#' One of \code{"range"}, \code{"cumulative"}, \code{"all.ranges"}, +#' \code{"project.cumulative"}, \code{"project.all.ranges"}, and +#' \code{"complete"}. See \code{split.data.by.networks} for +#' more details. 
[default: "range"] +#' @param default.value The default value to add if a vertex has no matching value [default: 0L] +#' @param issue.type The issue kind,see \code{preprocess.issue.data} [default: "all"] +#' +#' @return A list of networks with the added attribute +add.vertex.attribute.issues.commented.count = function(list.of.networks, project.data, + name = "issues.commented.count", + aggregation.level = c("range", "cumulative", "all.ranges", + "project.cumulative", "project.all.ranges", + "complete"), + default.value = 0L, issue.type = c("all", "pull.requests", "issues")) { + if (name == "issues.commented.count" && identical(issue.type, "pull.requests")) { + name = "pull.requests.commented.count" + } + nets.with.attr = add.vertex.attribute.commit.count.helper( + list.of.networks, project.data, name, aggregation.level, + default.value, function(data) {return(get.author.issues.commented.in.count(data, type = issue.type))}, "author.name" + ) + + return(nets.with.attr) +} + +#' Add issue-count attribute based on the number of issues created, +#' where the person represented by the vertex is the author. +#' +#' @param list.of.networks The network list +#' @param project.data The project data +#' @param name The attribute name to add [default: "issue.creation.count"] +#' @param aggregation.level Determines the data to use for the attribute calculation. +#' One of \code{"range"}, \code{"cumulative"}, \code{"all.ranges"}, +#' \code{"project.cumulative"}, \code{"project.all.ranges"}, and +#' \code{"complete"}. See \code{split.data.by.networks} for +#' more details. [default: "range"] +#' @param default.value The default value to add if a vertex has no matching value [default: 0L] +#' @param issue.type The issue kind,see \code{preprocess.issue.data} [default: "all"] +#' +#' @return A list of networks with the added attribute +add.vertex.attribute.issue.creation.count = function(list.of.networks, project.data, + name = "issue.creation.count", + aggregation.level = c("range", "cumulative", "all.ranges", + "project.cumulative", "project.all.ranges", + "complete"), + default.value = 0L, issue.type = c("all", "pull.requests", "issues")) { + if (name == "issue.creation.count" && identical(issue.type, "pull.requests")) { + name = "pull.request.creation.count" + } + nets.with.attr = add.vertex.attribute.commit.count.helper( + list.of.networks, project.data, name, aggregation.level, + default.value, function(data) {return(get.author.issues.created.count(data, type = issue.type))}, "author.name" + ) + + return(nets.with.attr) +} + +#' Add issue-comments-count attribute based on the number of comments in issues, where the person represented by the vertex is the author. +#' +#' @param list.of.networks The network list +#' @param project.data The project data +#' @param name The attribute name to add [default: "issue.comment.count"] +#' @param aggregation.level Determines the data to use for the attribute calculation. +#' One of \code{"range"}, \code{"cumulative"}, \code{"all.ranges"}, +#' \code{"project.cumulative"}, \code{"project.all.ranges"}, and +#' \code{"complete"}. See \code{split.data.by.networks} for +#' more details. 
[default: "range"] +#' @param default.value The default value to add if a vertex has no matching value [default: 0L] +#' @param issue.type The issue kind,see \code{preprocess.issue.data} [default: "all"] +#' +#' @return A list of networks with the added attribute +add.vertex.attribute.issue.comment.count = function(list.of.networks, project.data, + name = "issue.comment.count", + aggregation.level = c("range", "cumulative", "all.ranges", + "project.cumulative", "project.all.ranges", + "complete"), + default.value = 0L, issue.type = c("all", "pull.requests", "issues")) { + if (name == "issue.comment.count" && identical(issue.type, "pull.requests")) { + name = "pull.request.comment.count" + } + nets.with.attr = add.vertex.attribute.commit.count.helper( + list.of.networks, project.data, name, aggregation.level, + default.value, function(data) {return(get.author.issue.comment.count(data, type = issue.type))}, "author.name" + ) + + return(nets.with.attr) +} + ## * Meta-data ------------------------------------------------------------- #' Add author email attribute diff --git a/util-read.R b/util-read.R index 57d4559a..ae6b42a6 100644 --- a/util-read.R +++ b/util-read.R @@ -20,6 +20,7 @@ ## Copyright 2018 by Jakob Kronawitter ## Copyright 2018-2019 by Anselm Fehnker ## Copyright 2020-2021 by Niklas Schneider +## Copyright 2021 by Johannes Hostert ## All Rights Reserved. ## Note: @@ -283,6 +284,10 @@ ISSUES.LIST.DATA.TYPES = c( #' Read and parse the issue data from the 'issues.list' file. #' +#' Note: The dates in the \code{"date"} column may be remapped to the creation date of the corresponding issue, +#' especially for \code{"commit_added"} events. This happens when the event has happened before the issue creation date. +#' The original date of these events can always be found in the \code{"event.info.2"} column. +#' #' @param data.path the path to the issue data #' @param issues.sources the sources of the issue data. One or both of \code{"jira"} and \code{"github"}. #' @@ -293,7 +298,7 @@ read.issues = function(data.path, issues.sources = c("jira", "github")) { ## check arguments issues.sources = match.arg(arg = issues.sources, several.ok = TRUE) - ## read data from choosen sources + ## read data from chosen sources issue.data = lapply(issues.sources, function(issue.source) { ## get file name of source issue data @@ -342,6 +347,15 @@ read.issues = function(data.path, issues.sources = c("jira", "github")) { issue.data[["date"]] = get.date.from.string(issue.data[["date"]]) issue.data[["creation.date"]] = get.date.from.string(issue.data[["creation.date"]]) issue.data[["closing.date"]] = get.date.from.string(issue.data[["closing.date"]]) + + ## fix all dates to be after the creation date. + ## violations can happen for "commit_added" events if the commit was made before the PR was opened + ## the original date for "commit_added" events is stored in "event.info.2" in any case + commit.added.events = issue.data[["event.name"]] == "commit_added" + issue.data[commit.added.events, "event.info.2"] = get.date.string(issue.data[commit.added.events, "date"]) + commit.added.events.before.creation = commit.added.events & + !is.na(issue.data["creation.date"]) & (issue.data["date"] < issue.data["creation.date"]) + issue.data[commit.added.events.before.creation, "date"] = issue.data[commit.added.events.before.creation, "creation.date"] issue.data = issue.data[order(issue.data[["date"]], decreasing = FALSE), ] # sort! ## generate a unique event ID from issue ID, author, and date