diff --git a/NEWS.md b/NEWS.md index e58c86118..7df8b15f8 100644 --- a/NEWS.md +++ b/NEWS.md @@ -2,6 +2,18 @@ # coronet – Changelog +## unversioned + +### Added + +- Add commit-interaction data and add functions `read.commit.interactions` for reading, as well as `get.commit.interactions`, `set.commit.interactions` and utility functions for working with commit-interaction data (PR #252, d82857fbebd1111bb16588a4223bb24a8dcd07de, b4fd2a29c9b5fd561b1106c6febb54a32b0085ab, fd0aa05f824b93545ae8e05833b95b3bd9809286, bca35760eb0aac86c04923f2d534b2d8cece204e) as well as tests for these features (PR #252, eeba7e29932bc973513c963fb9e716e9230d570f, 8bb39f4df39b49dfaff8f19feb6db5e5fbd81fac, 54b6f655248720436af116fe72521f9cb0348429, 7a5497aaf9114017d1b3b9b68b6cccd7ca8ac114, 7b8585f87675795822c07230192d6454de31dcc7, ef725407bf8818c8fff96ea6f343338b7162cbe0) +- Add commit-interaction networks that can be created with `create.author.network` and `create.artifact.network` if the `artifact.relation` and `author.relation` is configured to be `commit.interaction` (PR #252, d82857fbebd1111bb16588a4223bb24a8dcd07de, 329d97ec3de36a9e1bcadc0c7a53c1d92e8b481c) as well as tests for these features (PR #252, 07e7ed744209b0251217fa8f7f35d9b9875face2, 7068cfa10d993dcae3f5e3f76f8cafa99fa8b350) +- Add helper function for prefixing function names with file names in `util-read.R` (PR #252, f8ea987b138173cf0509c7910e0572d8ee1b3f1f) + +### Changed/Improved + +### Fixed + ## 4.4 ### Announcement diff --git a/README.md b/README.md index 62c029b33..e8bc0877f 100644 --- a/README.md +++ b/README.md @@ -142,6 +142,8 @@ Alternatively, you can run `Rscript install.R` to install the packages. 
- `jsonlite`: For parsing the issue data - `rTensor`: For calculating EDCPTD centrality - `Matrix`: For sparse matrix representation of large adjacency matrices +- `fastmap`: For fast implementation of a map +- `purrr`: For fast implementation of a mapping function ### Submodule @@ -264,6 +266,11 @@ Relations determine which information is used to construct edges among the verti * For artifact networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), source-code artifacts are connected when they reference each other (i.e., one artifact calls a function contained in the other artifact). * For bipartite networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), authors get linked to all source-code artifacts they have changed in their respective commits (same as for the relation `cochange`). +- `commit.interaction` + * For author networks (configured via `author.relation` in the [`NetworkConf`](#networkconf)), authors who contribute to interacting commits are connected with an edge. + * For artifact networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), artifacts are connected when there is an interaction between two commits that occur in the artifacts. + * This relation does not apply for bipartite networks. + #### Edge-construction algorithms for author networks When constructing author networks, we use events in time (i.e., commits, e-mails, issue events) to model interactions among authors on the same artifact as edges. Therefore, we group the events on artifacts, based on the configured relation (see the [previous section](#relations)). @@ -597,6 +604,12 @@ There is no way to update the entries, except for the revision-based parameters. - `custom.event.timestamps.locked`: * Lock custom event timestamps to prevent them from being read if empty or not yet present when calling the getter. 
* [`TRUE`, *`FALSE`*] +- `commit.interactions`: + * Allow construction of author and artifact networks using commit-interaction data + * [`TRUE`, *`FALSE`*] +- `commit.interactions.filter.global`: + * Filter out entries from commit interaction data that are not matched to a specific function or file + * [*`TRUE`*, `FALSE`] ### NetworkConf diff --git a/install.R b/install.R index 99f047ccc..5a8d57438 100644 --- a/install.R +++ b/install.R @@ -19,6 +19,7 @@ ## Copyright 2020-2023 by Thomas Bock ## Copyright 2019 by Anselm Fehnker ## Copyright 2021 by Christian Hechtl +## Copyright 2024 by Leo Sendelbach ## All Rights Reserved. ## ## Adapted from https://github.com/siemens/codeface/blob/be382e9171fb91b4aa99b99b09b2ef64a6dba0d5/packages.r @@ -44,7 +45,9 @@ packages = c( "viridis", "jsonlite", "rTensor", - "Matrix" + "Matrix", + "fastmap", + "purrr" ) diff --git a/tests/README.md b/tests/README.md index 6eb557919..b6558dc13 100644 --- a/tests/README.md +++ b/tests/README.md @@ -16,6 +16,7 @@ We have two test projects you can use when writing your tests: * Commit messages * Pasta * Synchronicity + * Commit interactions * Custom event timestamps in `custom-events.list` * Revisions 2. 
- Casestudy: `test_empty` diff --git a/tests/codeface-data/results/testing/test_empty_proximity/proximity/commit-interactions.yaml b/tests/codeface-data/results/testing/test_empty_proximity/proximity/commit-interactions.yaml new file mode 100644 index 000000000..e69de29bb diff --git a/tests/codeface-data/results/testing/test_proximity/proximity/commit-interactions.yaml b/tests/codeface-data/results/testing/test_proximity/proximity/commit-interactions.yaml new file mode 100644 index 000000000..8e8b01867 --- /dev/null +++ b/tests/codeface-data/results/testing/test_proximity/proximity/commit-interactions.yaml @@ -0,0 +1,55 @@ +scope: REGION +result-map: + test_function: + demangled-name: test_function + file: test3.c + num-instructions: 30 + insts: + - base-hash: + region: 45620620587549 + function: test_function + commit: 1143db502761379c2bfcecc2007fc34282e7ee61 + repository: test-repo + interacting-hashes: + - region: 87546092348456 + commit: 5a5ec9675e98187e1e92561e1888aa6f04faa338 + repository: test-repo + amount: 2 + callees: + - test_callee + commits: + - commit: 3383d8e5561dfc6fb2b65e0a194df94ccb5e08af + repository: test-repo + test2: + demangled-name: test2 + file: test2.c + num-instructions: 26 + insts: + - base-hash: + region: 50956672345141 + commit: 3a0ed78458b3976243db6829f63eba3eead26774 + repository: test-repo + interacting-hashes: + - region: 98750276234511 + commit: 0a1a5c523d835459c42f33e863623138555e2526 + repository: test-repo + amount: 1 + - base-hash: + region: 67230588834344 + commit: 0a1a5c523d835459c42f33e863623138555e2526 + repository: test-repo + interacting-hashes: + - region: 33295067820043 + function: test2 + commit: 418d1dc4929ad1df251d2aeb833dd45757b04a6f + repository: test-repo + - region: 20194653678423 + function: test2 + commit: d01921773fae4bed8186b0aa411d6a2f7a6626e6 + repository: test-repo + amount: 3 + callees: + - test_callee + commits: + - commit: 3383d8e5561dfc6fb2b65e0a194df94ccb5e08af + repository: test-repo diff --git 
a/tests/test-data.R b/tests/test-data.R index 9c6f4f8cb..aa665ac48 100644 --- a/tests/test-data.R +++ b/tests/test-data.R @@ -20,6 +20,7 @@ ## Copyright 2021 by Mirabdulla Yusifli ## Copyright 2022 by Jonathan Baumann ## Copyright 2023 by Maximilian Löffler +## Copyright 2024 by Leo Sendelbach ## All Rights Reserved. @@ -98,6 +99,13 @@ test_that("Compare two ProjectData objects on empty data", { proj.data.two$set.project.conf.entry("commit.messages", "message") proj.data.two$get.commit.messages() expect_true(proj.data.one$equals(proj.data.two), "Two identical ProjectData objects (commit.messages).") + + proj.data.one$set.project.conf.entry("commit.interactions", TRUE) + proj.data.one$get.commit.interactions() + expect_false(proj.data.one$equals(proj.data.two), "Two non-identical ProjectData objects (commit.interactions).") + proj.data.two$set.project.conf.entry("commit.interactions", TRUE) + proj.data.two$get.commit.interactions() + expect_true(proj.data.one$equals(proj.data.two), "Two identical ProjectData objects (commit.interactions).") }) test_that("Compare two ProjectData objects on non-empty data", { @@ -511,3 +519,78 @@ test_that("Create RangeData objects from Codeface ranges and check data path", { expect_identical(range.paths, expected.paths, "RangeData data paths") }) + +test_that("Compare two ProjectData Objects with commit.interactions", { + ## configuration object for the datapath + proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "file") + proj.conf$update.value("commit.interactions", TRUE) + proj.conf$update.value("commits.filter.untracked.files", FALSE) + proj.conf$update.value("commits.filter.base.artifact", FALSE) + proj.conf$update.value("commit.interactions.filter.global", FALSE) + + proj.data.one = ProjectData$new(project.conf = proj.conf) + proj.data.two = proj.data.one$clone(deep = TRUE) + + ## test if the project data is equal and the commit interactions are as well + expect_equal(proj.data.one$get.commit.interactions(), 
proj.data.two$get.commit.interactions()) + expect_true(proj.data.one$equals(proj.data.two)) + + ## change commit interactions of one project data and assert that equality check fails + proj.data.two$set.commit.interactions(create.empty.commit.interaction.list()) + expect_false(proj.data.one$equals(proj.data.two)) + + ## change commit data in one to test if commit-interactions are correctly updated + ## call get.commit.interactions() once to restore read interactions + proj.data.two$get.commit.interactions() + + ## change commits in one project data + commit.data = proj.data.one$get.commits() + commit.data[["hash"]][[5]] = 1 + proj.data.one$set.commits(commit.data) + + ## use isTRUE to compress result of all.equal into a single boolean + expect_false(isTRUE(all.equal(proj.data.one$get.commit.interactions(), + proj.data.two$get.commit.interactions()))) + + ## The data frame should still have 4 entries: + expect_true(nrow(proj.data.one$get.commit.interactions()) == 4) + ## after cleanup is called, the data frame should only have 3 entries: + proj.data.one$cleanup.commit.interactions() + expect_true(nrow(proj.data.one$get.commit.interactions()) == 3) + + ## set commit list of one project data to empty and test that last + ## two rows of result data frame are empty + proj.data.two$set.commits(create.empty.commits.list()) + + ## create empty data frame of correct size + commit.interactions.data.expected = data.frame(matrix(nrow = 4, ncol = 8)) + ## assure that the correct type is used + for(i in seq_len(8)) { + commit.interactions.data.expected[[i]] = as.character(commit.interactions.data.expected[[i]]) + } + ## set everything except for authors as expected + colnames(commit.interactions.data.expected) = c("commit.hash", "base.hash", "func", "file", + "base.func", "base.file", "base.author", + "interacting.author") + commit.interactions.data.expected[["commit.hash"]] = + c("0a1a5c523d835459c42f33e863623138555e2526", + "418d1dc4929ad1df251d2aeb833dd45757b04a6f", + 
"5a5ec9675e98187e1e92561e1888aa6f04faa338", + "d01921773fae4bed8186b0aa411d6a2f7a6626e6") + commit.interactions.data.expected[["base.hash"]] = + c("3a0ed78458b3976243db6829f63eba3eead26774", + "0a1a5c523d835459c42f33e863623138555e2526", + "1143db502761379c2bfcecc2007fc34282e7ee61", + "0a1a5c523d835459c42f33e863623138555e2526") + commit.interactions.data.expected[["func"]] = c("GLOBAL", "test2.c::test2", "GLOBAL", "test2.c::test2") + commit.interactions.data.expected[["file"]] = c("GLOBAL", "test2.c", "GLOBAL", "test2.c") + commit.interactions.data.expected[["base.func"]] = c("test2.c::test2", "test2.c::test2", + "test3.c::test_function", "test2.c::test2") + commit.interactions.data.expected[["base.file"]] = c("test2.c", "test2.c", "test3.c", "test2.c") + + expect_equal(proj.data.two$get.commit.interactions(), commit.interactions.data.expected) + + ## reactivate filtering of commit interactions + proj.data.two$set.project.conf.entry("commit.interactions.filter.global", TRUE) + expect_true(nrow(proj.data.two$get.commit.interactions()) == 2) +}) diff --git a/tests/test-networks-artifact.R b/tests/test-networks-artifact.R index 253e08ba5..79251c606 100644 --- a/tests/test-networks-artifact.R +++ b/tests/test-networks-artifact.R @@ -212,3 +212,101 @@ patrick::with_parameters_test_that("Network construction of an empty 'comments-o "directed: FALSE" = list(test.directed = FALSE), "directed: TRUE" = list(test.directed = TRUE) )) + +patrick::with_parameters_test_that("Network construction with commit-interactions as relation, artifact type 'file'", { + ## configuration object for the datapath + proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "file") + proj.conf$update.value("commit.interactions", TRUE) + proj.conf$update.value("commits.filter.untracked.files", FALSE) + proj.conf$update.value("commits.filter.base.artifact", FALSE) + proj.conf$update.value("commit.interactions.filter.global", FALSE) + proj.data = ProjectData$new(project.conf = proj.conf) 
+ + net.conf = NetworkConf$new() + net.conf$update.values(updated.values = list(artifact.relation = "commit.interaction", + artifact.directed = test.directed)) + + network.builder = NetworkBuilder$new(project.data = proj.data, network.conf = net.conf) + network.built = network.builder$get.artifact.network() + ## build the expected network + vertices = data.frame( + name = c("test2.c", "test3.c", "GLOBAL"), + kind = "File", + type = TYPE.ARTIFACT + ) + edges = data.frame( + from = c("GLOBAL", "test2.c", "GLOBAL", "test2.c"), + to = c("test2.c", "test2.c", "test3.c", "test2.c"), + func = c("GLOBAL", "test2.c::test2", "GLOBAL", "test2.c::test2"), + hash = c("0a1a5c523d835459c42f33e863623138555e2526", + "418d1dc4929ad1df251d2aeb833dd45757b04a6f", + "5a5ec9675e98187e1e92561e1888aa6f04faa338", + "d01921773fae4bed8186b0aa411d6a2f7a6626e6"), + base.hash = c("3a0ed78458b3976243db6829f63eba3eead26774", + "0a1a5c523d835459c42f33e863623138555e2526", + "1143db502761379c2bfcecc2007fc34282e7ee61", + "0a1a5c523d835459c42f33e863623138555e2526"), + base.func = c("test2.c::test2", "test2.c::test2", + "test3.c::test_function", "test2.c::test2"), + base.author = c("Olaf", "Thomas", "Karl", "Thomas"), + interacting.author = c("Thomas", "Karl", "Olaf", "Thomas"), + weight = c(1, 1, 1, 1), + type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA), + relation = c("commit.interaction", "commit.interaction", "commit.interaction", "commit.interaction") + ) + network = igraph::graph.data.frame(edges, directed = test.directed, vertices = vertices) + + expect_true(igraph::identical_graphs(network.built, network)) +}, patrick::cases( + "directed: FALSE" = list(test.directed = FALSE), + "directed: TRUE" = list(test.directed = TRUE) +)) + +patrick::with_parameters_test_that("Network construction with commit-interactions as relation, artifact type 'function'", { + ## configuration object for the datapath + proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, 
CASESTUDY, "function") + proj.conf$update.value("commit.interactions", TRUE) + proj.conf$update.value("commits.filter.untracked.files", FALSE) + proj.conf$update.value("commits.filter.base.artifact", FALSE) + proj.conf$update.value("commit.interactions.filter.global", FALSE) + proj.data = ProjectData$new(project.conf = proj.conf) + + net.conf = NetworkConf$new() + net.conf$update.values(updated.values = list(artifact.relation = "commit.interaction", + artifact.directed = test.directed)) + + network.builder = NetworkBuilder$new(project.data = proj.data, network.conf = net.conf) + network.built = network.builder$get.artifact.network() + ## build the expected network + vertices = data.frame( + name = c("test2.c::test2", "test3.c::test_function", "GLOBAL"), + kind = "Function", + type = TYPE.ARTIFACT + ) + edges = data.frame( + from = c("GLOBAL", "test2.c::test2", "GLOBAL", "test2.c::test2"), + to = c("test2.c::test2", "test2.c::test2", + "test3.c::test_function", "test2.c::test2"), + hash = c("0a1a5c523d835459c42f33e863623138555e2526", + "418d1dc4929ad1df251d2aeb833dd45757b04a6f", + "5a5ec9675e98187e1e92561e1888aa6f04faa338", + "d01921773fae4bed8186b0aa411d6a2f7a6626e6"), + file = c("GLOBAL", "test2.c", "GLOBAL", "test2.c"), + base.hash = c("3a0ed78458b3976243db6829f63eba3eead26774", + "0a1a5c523d835459c42f33e863623138555e2526", + "1143db502761379c2bfcecc2007fc34282e7ee61", + "0a1a5c523d835459c42f33e863623138555e2526"), + base.file = c("test2.c", "test2.c", "test3.c", "test2.c"), + base.author = c("Olaf", "Thomas", "Karl", "Thomas"), + interacting.author = c("Thomas", "Karl", "Olaf", "Thomas"), + weight = c(1, 1, 1, 1), + type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA), + relation = c("commit.interaction", "commit.interaction", "commit.interaction", "commit.interaction") + ) + network = igraph::graph.data.frame(edges, directed = test.directed, vertices = vertices) + + expect_true(igraph::identical_graphs(network.built, network)) +}, 
patrick::cases( + "directed: FALSE" = list(test.directed = FALSE), + "directed: TRUE" = list(test.directed = TRUE) +)) diff --git a/tests/test-networks-author.R b/tests/test-networks-author.R index d4d0e9faa..8f9dd11bb 100644 --- a/tests/test-networks-author.R +++ b/tests/test-networks-author.R @@ -22,6 +22,8 @@ ## Copyright 2018-2019 by Anselm Fehnker ## Copyright 2021 by Johannes Hostert ## Copyright 2023-2024 by Maximilian Löffler +## Copyright 2024 by Leo Sendelbach + ## All Rights Reserved. @@ -676,3 +678,53 @@ test_that("Network construction with only untracked files (no edges expected)", ## test expect_true(igraph::identical_graphs(network.built, network.expected)) }) + +patrick::with_parameters_test_that("Network construction with commit-interactions as relation", { + ## configuration object for the datapath + proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "file") + proj.conf$update.value("commit.interactions", TRUE) + proj.conf$update.value("commits.filter.untracked.files", FALSE) + proj.conf$update.value("commits.filter.base.artifact", FALSE) + proj.conf$update.value("commit.interactions.filter.global", FALSE) + proj.data = ProjectData$new(project.conf = proj.conf) + + net.conf = NetworkConf$new() + net.conf$update.values(updated.values = list(author.relation = "commit.interaction", + author.directed = test.directed)) + + network.builder = NetworkBuilder$new(project.data = proj.data, network.conf = net.conf) + network.built = network.builder$get.author.network() + + ## build the expected network + vertices = data.frame( + name = c("Olaf", "Thomas", "Karl"), + kind = TYPE.AUTHOR, + type = TYPE.AUTHOR + ) + edges = data.frame( + from = c("Olaf", "Thomas", "Karl", "Thomas"), + to = c("Thomas", "Karl", "Olaf", "Thomas"), + func = c("GLOBAL", "test2.c::test2", "GLOBAL", "test2.c::test2"), + hash = c("0a1a5c523d835459c42f33e863623138555e2526", + "418d1dc4929ad1df251d2aeb833dd45757b04a6f", + "5a5ec9675e98187e1e92561e1888aa6f04faa338", + 
"d01921773fae4bed8186b0aa411d6a2f7a6626e6"), + file = c("GLOBAL", "test2.c", "GLOBAL", "test2.c"), + base.hash = c("3a0ed78458b3976243db6829f63eba3eead26774", + "0a1a5c523d835459c42f33e863623138555e2526", + "1143db502761379c2bfcecc2007fc34282e7ee61", + "0a1a5c523d835459c42f33e863623138555e2526"), + base.func = c("test2.c::test2", "test2.c::test2", + "test3.c::test_function", "test2.c::test2"), + base.file = c("test2.c", "test2.c", "test3.c", "test2.c"), + weight = c(1, 1, 1, 1), + type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA), + relation = c("commit.interaction", "commit.interaction", "commit.interaction", "commit.interaction") + ) + network = igraph::graph.data.frame(edges, directed = test.directed, vertices = vertices) + + expect_true(igraph::identical_graphs(network.built, network)) +}, patrick::cases( + "directed: FALSE" = list(test.directed = FALSE), + "directed: TRUE" = list(test.directed = TRUE) +)) \ No newline at end of file diff --git a/tests/test-read.R b/tests/test-read.R index db3645d4d..58c9bd3c6 100644 --- a/tests/test-read.R +++ b/tests/test-read.R @@ -22,6 +22,7 @@ ## Copyright 2021 by Mirabdulla Yusifli ## Copyright 2022 by Jonathan Baumann ## Copyright 2022-2024 by Maximilian Löffler +## Copyright 2024 by Leo Sendelbach ## All Rights Reserved. 
@@ -497,3 +498,61 @@ test_that("Read and parse the issue data.", { expect_identical(issue.data.read.github, issue.data.expected.github, info = "Issue data github.") }) +test_that("Read the commit-interactions data.", { + ## configuration object for the datapath + proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "file") + proj.conf$update.value("commit.interactions", TRUE) + + ## read the actual data + commit.interactions.data.read = read.commit.interactions(proj.conf$get.value("datapath")) + ## build the expected data.frame + + commit.interactions.data.expected = data.frame(matrix(nrow = 4, ncol = 8)) + ## assure that the correct type is used + for(i in seq_len(8)) { + commit.interactions.data.expected[[i]] = as.character(commit.interactions.data.expected[[i]]) + } + ## set everything except for authors as expected + colnames(commit.interactions.data.expected) = c("func", "commit.hash", "file", "base.hash", + "base.func", "base.file", "base.author", + "interacting.author") + commit.interactions.data.expected[["commit.hash"]] = + c("5a5ec9675e98187e1e92561e1888aa6f04faa338", + "0a1a5c523d835459c42f33e863623138555e2526", + "418d1dc4929ad1df251d2aeb833dd45757b04a6f", + "d01921773fae4bed8186b0aa411d6a2f7a6626e6") + commit.interactions.data.expected[["base.hash"]] = + c("1143db502761379c2bfcecc2007fc34282e7ee61", + "3a0ed78458b3976243db6829f63eba3eead26774", + "0a1a5c523d835459c42f33e863623138555e2526", + "0a1a5c523d835459c42f33e863623138555e2526") + commit.interactions.data.expected[["func"]] = c("GLOBAL", "GLOBAL", "test2.c::test2", "test2.c::test2") + commit.interactions.data.expected[["file"]] = c("GLOBAL", "GLOBAL", "test2.c", "test2.c") + commit.interactions.data.expected[["base.func"]] = c("test3.c::test_function", "test2.c::test2", + "test2.c::test2", "test2.c::test2") + commit.interactions.data.expected[["base.file"]] = c("test3.c", "test2.c", "test2.c", "test2.c") + ## check the results + expect_identical(commit.interactions.data.read, 
commit.interactions.data.expected, + info = "commit interaction data.") +}) + +test_that("Read the empty commit-interactions data.", { + ## configuration object for the datapath + proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "file") + proj.conf$update.value("commit.interactions", TRUE) + + ## read the actual data + commit.interactions.data.read = read.commit.interactions("./codeface-data/results/testing/ + test_empty_proximity/proximity") + ## build the expected data.frame + commit.interactions.data.expected = data.frame(matrix(nrow = 0, ncol = 8)) + colnames(commit.interactions.data.expected) = c("func", "commit.hash", "file", + "base.hash", "base.func", "base.file", + "base.author", "interacting.author") + for(i in seq_len(8)) { + commit.interactions.data.expected[[i]] = as.character(commit.interactions.data.expected[[i]]) + } + ## check the results + expect_identical(commit.interactions.data.read, commit.interactions.data.expected, + info = "commit interaction data.") +}) \ No newline at end of file diff --git a/util-conf.R b/util-conf.R index 0031771a4..9ae2fd73d 100644 --- a/util-conf.R +++ b/util-conf.R @@ -15,7 +15,7 @@ ## Copyright 2016 by Wolfgang Mauerer ## Copyright 2017 by Raphael Nömmer ## Copyright 2017-2018 by Christian Hechtl -## Copyright 2020-2021 by Christian Hechtl +## Copyright 2020-2021, 2024 by Christian Hechtl ## Copyright 2017 by Felix Prasse ## Copyright 2017-2019 by Thomas Bock ## Copyright 2021, 2023-2024 by Thomas Bock @@ -26,6 +26,7 @@ ## Copyright 2021 by Johannes Hostert ## Copyright 2021 by Mirabdulla Yusifli ## Copyright 2022 by Jonathan Baumann +## Copyright 2024 by Leo Sendelbach ## All Rights Reserved. 
@@ -468,6 +469,18 @@ ProjectConf = R6::R6Class("ProjectConf", inherit = Conf, allowed = c(TRUE, FALSE), allowed.number = 1 ), + commit.interactions = list( + default = FALSE, + type = "logical", + allowed = c(TRUE, FALSE), + allowed.number = 1 + ), + commit.interactions.filter.global = list( + default = TRUE, + type = "logical", + allowed = c(TRUE, FALSE), + allowed.number = 1 + ), custom.event.timestamps.file = list( default = NA, type = "character", @@ -629,6 +642,9 @@ ProjectConf = R6::R6Class("ProjectConf", inherit = Conf, conf$datapath.synchronicity = private$get.results.folder(data, selection.process, casestudy, "synchronicity") ## store path to PaStA data conf$datapath.pasta = private$get.results.folder(data, selection.process, casestudy, "pasta") + ## store path to commit interaction data + conf$datapath.commit.interaction = + private$get.results.folder(data, selection.process, casestudy, tagging, subfolder = tagging) ## store path to gender data conf$datapath.gender = private$get.results.folder(data, selection.process, casestudy, "gender") ## store path to issue data @@ -781,7 +797,7 @@ NetworkConf = R6::R6Class("NetworkConf", inherit = Conf, author.relation = list( default = "mail", type = "character", - allowed = c("mail", "cochange", "issue"), + allowed = c("mail", "cochange", "issue", "commit.interaction"), allowed.number = Inf ), author.directed = list( @@ -812,7 +828,7 @@ NetworkConf = R6::R6Class("NetworkConf", inherit = Conf, artifact.relation = list( default = "cochange", type = "character", - allowed = c("cochange", "callgraph", "mail", "issue"), + allowed = c("cochange", "callgraph", "mail", "issue", "commit.interaction"), allowed.number = Inf ), artifact.directed = list( diff --git a/util-data.R b/util-data.R index e8c9ee4d1..988146a5f 100644 --- a/util-data.R +++ b/util-data.R @@ -16,7 +16,7 @@ ## Copyright 2020-2021, 2023-2024 by Thomas Bock ## Copyright 2017 by Raphael Nömmer ## Copyright 2017-2018 by Christian Hechtl -## Copyright 2020 by 
Christian Hechtl +## Copyright 2020, 2024 by Christian Hechtl ## Copyright 2017 by Felix Prasse ## Copyright 2017 by Ferdinand Frank ## Copyright 2018-2019 by Jakob Kronawitter @@ -26,6 +26,7 @@ ## Copyright 2021 by Mirabdulla Yusifli ## Copyright 2022 by Jonathan Baumann ## Copyright 2022-2023 by Maximilian Löffler +## Copyright 2024 by Leo Sendelbach ## All Rights Reserved. @@ -77,6 +78,7 @@ DATASOURCE.TO.ADDITIONAL.ARTIFACT.FUNCTION = list( "synchronicity" = "get.synchronicity", "pasta" = "get.pasta", "gender" = "get.gender", + "commit.interactions" = "get.commit.interactions", "custom.event.timestamps" = "get.custom.event.timestamps" ) @@ -123,7 +125,8 @@ CONF.PARAMETERS.NO.RESET.ENVIRONMENT = c("commit.messages", "issues.locked", "mails.locked", "custom.event.timestamps", - "custom.event.timestamps.locked") + "custom.event.timestamps.locked", + "commit.interactions") ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / @@ -162,6 +165,7 @@ ProjectData = R6::R6Class("ProjectData", commits = create.empty.commits.list(), # data.frame commits.unfiltered = create.empty.commits.list(), # data.frame commit.messages = create.empty.commit.message.list(), # data.frame + commit.interactions = create.empty.commit.interaction.list(), # data.frame ## mails mails.unfiltered = create.empty.mails.list(), # data.frame mails = create.empty.mails.list(), # data.frame @@ -404,6 +408,58 @@ ProjectData = R6::R6Class("ProjectData", To clean this up you can call the function 'cleanup.commit.message.data()'.") } }, + + ## * * Commit Interaction data -------------------------------------------------- + + #' Update the commit-interactions + #' + #' This method should be called whenever the field \code{commit.interactions} is changed. 
+ update.commit.interactions = function() { + if (self$is.data.source.cached("commit.interactions")) { + if (!self$is.data.source.cached("commits.unfiltered")) { + self$get.commits() + } + + ## remove existing columns named 'base.author' and 'interacting.author' + indices.to.remove = which("base.author" == colnames(private$commit.interactions)) + if (length(indices.to.remove) > 0) { + private$commit.interactions = private$commit.interactions[, -indices.to.remove] + } + indices.to.remove = which("interacting.author" == colnames(private$commit.interactions)) + if (length(indices.to.remove) > 0) { + private$commit.interactions = private$commit.interactions[, -indices.to.remove] + } + + ## get relevant data from commits + commit.data.subset = data.frame(hash = private$commits.unfiltered[["hash"]], + author.name = private$commits.unfiltered[["author.name"]]) + commit.data.subset = commit.data.subset[!duplicated(commit.data.subset[["hash"]]),] + + ## merge commit interactions with commits and change colnames to avoid duplicates + commit.interaction.data = merge(private$commit.interactions, commit.data.subset, + by.x = "base.hash", by.y = "hash", all.x = TRUE) + + author.index = match("author.name", colnames(commit.interaction.data)) + colnames(commit.interaction.data)[[author.index]] = "base.author" + + commit.interaction.data = merge(commit.interaction.data, commit.data.subset, + by.x = "commit.hash", by.y = "hash", all.x = TRUE) + + author.index = match("author.name", colnames(commit.interaction.data)) + colnames(commit.interaction.data)[[author.index]] = "interacting.author" + + ## warning if we have interactions without authors + if (anyNA(commit.interaction.data[["base.author"]]) || + anyNA(commit.interaction.data[["interacting.author"]])) { + logging::logwarn("There are commits in the commit-interactions that are not in + the commit data, possibly due to incomplete commit data or deleted users. + This results in the commit-interactions having empty entries. 
+ To clean up these entries, call cleanup.commit.interactions.") + } + private$commit.interactions = commit.interaction.data + } + + }, ## * * Gender data -------------------------------------------------- #' Update the gender related fields of: \code{authors} @@ -806,6 +862,7 @@ ProjectData = R6::R6Class("ProjectData", private$pasta.commits = create.empty.pasta.list() private$gender = create.empty.gender.list() private$synchronicity = create.empty.synchronicity.list() + private$commit.interactions = create.empty.commit.interaction.list() }, ## * * configuration ----------------------------------------------- @@ -1111,6 +1168,17 @@ ProjectData = R6::R6Class("ProjectData", } } + ## add commit interaction data if wanted + if (private$project.conf$get.value("commit.interactions")) { + if (!self$is.data.source.cached("commit.interactions")) { + ## get data (no assignment because we just want to trigger anything commit.interaction related) + self$get.commit.interactions() + } else { + ## update all commit.interaction-related data + private$update.commit.interactions() + } + } + ## sort by date private$commits.unfiltered = private$commits.unfiltered[order(private$commits.unfiltered[["date"]], decreasing = FALSE), ] @@ -1186,6 +1254,79 @@ ProjectData = R6::R6Class("ProjectData", } }, + #' Get the commit interaction data. If no data.path is given, the standard data.path + #' will be used. 
+ #' + #' @param data.path an optional different data path to the commit-interaction data + #' + #' @return the commit-interaction data + get.commit.interactions = function(data.path = NULL) { + logging::loginfo("Getting commit interactions.") + + ## if commit-interaction data are to be read, do this + if (private$project.conf$get.value("commit.interactions")) { + ## if the commit-interaction data have not yet been read do this + if (!self$is.data.source.cached("commit.interactions")) { + if (is.null(data.path)) { + commit.interaction.data = read.commit.interactions(self$get.data.path()) + } else { + commit.interaction.data = read.commit.interactions(data.path) + } + + ## filter commit interactions if configured + if (private$project.conf$get.value("commit.interactions.filter.global")) { + commit.interaction.data = subset(commit.interaction.data, + file != COMMIT.INTERACTION.GLOBAL.FILE.FUNCTION.NAME) + } + ## cache the result + private$commit.interactions = commit.interaction.data + private$update.commit.interactions() + } + } else { + logging::logwarn("You have not set the ProjectConf parameter + 'commit.interactions' to 'TRUE'! Ignoring...") + ## mark commit-interaction data as empty + private$commit.interactions = NULL + } + return(private$commit.interactions) + }, + + #' Set the commit-interaction data to the new given data. + #' + #' @param data the new commit-interaction data + set.commit.interactions = function(data) { + logging::loginfo("Setting commit-interaction data.") + + if (is.null(data)) { + data = create.empty.commit.interaction.list() + } else { + ## verify the format of the given dataframe + verify.data.frame.columns(data, COMMIT.INTERACTION.LIST.COLUMNS, COMMIT.INTERACTION.LIST.DATA.TYPES) + } + + ## set the actual data + private$commit.interactions = data + }, + + #' Remove lines in the commit-interaction data for which the corresponding commit is missing in the + #' commit data, indicated by a missing author in the commit-interaction data. 
+ #' This should only be called AFTER \code{update.commit.interactions} has already been called, as otherwise + #' all commit-interactions data will be removed. + cleanup.commit.interactions = function() { + logging::loginfo("Cleaning up commit-interactions") + + ## remove commit-interactions that do not contain author in 'base.author' + indices.to.remove = which(is.na(private$commit.interactions[["base.author"]])) + if (length(indices.to.remove) > 0) { + private$commit.interactions = private$commit.interactions[-indices.to.remove, ] + } + ## remove commit-interactions that do not contain author in 'interacting.author' + indices.to.remove = which(is.na(private$commit.interactions[["interacting.author"]])) + if (length(indices.to.remove) > 0) { + private$commit.interactions = private$commit.interactions[-indices.to.remove, ] + } + }, + #' Get the synchronicity data. If it is not already stored in the ProjectData, this function triggers a read in #' from disk. #' @@ -1756,6 +1897,7 @@ ProjectData = R6::R6Class("ProjectData", "commit.messages" = "commit.messages", "synchronicity" = "synchronicity", "pasta" = "pasta", + "commit.interactions" = "commit.interactions", "custom.event.timestamps" = "custom.event.timestamps" ) ) @@ -1788,7 +1930,7 @@ ProjectData = R6::R6Class("ProjectData", ## define the data sources unfiltered.data.sources = c("commits.unfiltered", "mails.unfiltered", "issues.unfiltered") additional.data.sources = c("authors", "commit.messages", "synchronicity", "pasta", - "gender", "custom.event.timestamps") + "gender", "commit.interactions", "custom.event.timestamps") main.data.sources = c("issues", "commits", "mails") ## set the right data sources to look for according to the argument @@ -1825,7 +1967,8 @@ ProjectData = R6::R6Class("ProjectData", #' \code{"commits"}, and \code{"issues"}. 
[default: "commits"] #' #' @return a named list of data classes, with the corresponding data columns as names - get.data.columns.for.data.source = function(data.source = c("commits", "mails", "issues")) { + get.data.columns.for.data.source = function(data.source = c("commits", "mails", + "issues", "commit.interactions")) { ## check arguments data.source = match.arg(arg = data.source, several.ok = FALSE) @@ -1833,6 +1976,11 @@ ProjectData = R6::R6Class("ProjectData", ## get the needed data method first data.fun = DATASOURCE.TO.ARTIFACT.FUNCTION[[data.source]] + ## if 'data.fun' is NULL, check 'DATASOURCE.TO.ADDITIONAL.ARTIFACT.FUNCTION' + if (is.null(data.fun)) { + data.fun = DATASOURCE.TO.ADDITIONAL.ARTIFACT.FUNCTION[[data.source]] + } + ## get the column classes with corresponding names columns = lapply(self[[data.fun]](), class) diff --git a/util-networks-misc.R b/util-networks-misc.R index a183f6039..c9abd08a9 100644 --- a/util-networks-misc.R +++ b/util-networks-misc.R @@ -151,7 +151,7 @@ get.expanded.adjacency = function(network, authors, weighted = FALSE) { # write a warning with the number of authors from the network that we ignore warning.string = sprintf("The network had %d authors that will not be displayed in the matrix!", network.authors.num - nrow(matrix.data)) - warning(warning.string) + logging::logwarn(warning.string) } ## save the activity data per author diff --git a/util-networks.R b/util-networks.R index b02eab694..aa9511b27 100644 --- a/util-networks.R +++ b/util-networks.R @@ -14,6 +14,7 @@ ## Copyright 2016-2019 by Claus Hunsen ## Copyright 2017 by Raphael Nömmer ## Copyright 2017-2018 by Christian Hechtl +## Copyright 2024 by Christian Hechtl ## Copyright 2017-2019 by Thomas Bock ## Copyright 2021, 2023-2024 by Thomas Bock ## Copyright 2018 by Barbara Eckl @@ -22,6 +23,7 @@ ## Copyright 2021 by Niklas Schneider ## Copyright 2022 by Jonathan Baumann ## Copyright 2023-2024 by Maximilian Löffler +## Copyright 2024 by Leo Sendelbach ## All 
Rights Reserved. @@ -132,10 +134,11 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", get.vertex.kind.for.relation = function(relation) { vertex.kind = switch(relation, - cochange = private$proj.data$get.project.conf.entry("artifact.codeface"), - callgraph = private$proj.data$get.project.conf.entry("artifact.codeface"), - mail = "MailThread", - issue = "Issue" + cochange = private$proj.data$get.project.conf.entry("artifact.codeface"), + callgraph = private$proj.data$get.project.conf.entry("artifact.codeface"), + mail = "MailThread", + issue = "Issue", + commit.interaction = private$proj.data$get.project.conf.entry("artifact.codeface") ) return(vertex.kind) @@ -225,6 +228,36 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", return(author.net) }, + #' Build and get the author network with commit-interactions as the relation. + #' + #' @return the commit-interaction author network + get.author.network.commit.interaction = function() { + ## get the authors that appear in the commit-interaction data as the vertices of the network + vertices = unique(c(private$proj.data$get.commit.interactions()[["base.author"]], + private$proj.data$get.commit.interactions()[["interacting.author"]])) + vertices = data.frame(name = vertices) + + ## get the commit-interaction data as the edge data of the network + edges = private$proj.data$get.commit.interactions() + ## set the authors as the 'to' and 'from' of the network and order the dataframe + edges = edges[, c("base.author", "interacting.author", "func", "commit.hash", + "file", "base.hash", "base.func", "base.file")] + colnames(edges)[1] = "to" + colnames(edges)[2] = "from" + colnames(edges)[4] = "hash" + author.net.data = list(vertices = vertices, edges = edges) + ## construct the network + author.net = construct.network.from.edge.list( + author.net.data[["vertices"]], + author.net.data[["edges"]], + network.conf = private$network.conf, + directed = private$network.conf$get.value("author.directed"), + available.edge.attributes = 
private$proj.data$ + get.data.columns.for.data.source("commit.interactions") + ) + return(author.net) + }, + #' Get the thread-based author relation as network. #' If it does not already exist build it first. #' @@ -345,6 +378,58 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", return(artifacts.net) }, + #' Build and get the commit-interaction based artifact network. + #' + #' @return the commit-interaction based artifact network + get.artifact.network.commit.interaction = function() { + ## initialize the vertices. They will be set correctly depending on the used config. + vertices = c() + ## get the commit-interaction data as the edge data of the network + edges = private$proj.data$get.commit.interactions() + + ## set 'to' and 'from' of the network according to the config + ## and order the dataframe accordingly + proj.conf.artifact = private$proj.data$get.project.conf.entry("artifact") + if (proj.conf.artifact == "file") { + ## change the vertices to the files from the commit-interaction data + vertices = unique(c(private$proj.data$get.commit.interactions()[["base.file"]], + private$proj.data$get.commit.interactions()[["file"]])) + vertices = data.frame(name = vertices) + + edges = edges[, c("file", "base.file", "func", "commit.hash", + "base.hash", "base.func", "base.author", "interacting.author")] + colnames(edges)[colnames(edges) == "commit.hash"] = "hash" + } else if (proj.conf.artifact == "function") { + ## change the vertices to the functions from the commit-interaction data + vertices = unique(c(private$proj.data$get.commit.interactions()[["base.func"]], + private$proj.data$get.commit.interactions()[["func"]])) + vertices = data.frame(name = vertices) + + edges = edges[, c("func", "base.func", "commit.hash", "file", "base.hash", + "base.file", "base.author", "interacting.author")] + colnames(edges)[colnames(edges) == "commit.hash"] = "hash" + } else { + ## If neither 'function' nor 'file' was configured, send a warning + ## and return an empty network + 
logging::logwarn("when creating a commit-interaction artifact network, + the artifact should be either 'file' or 'function'!") + return(create.empty.network(directed = private$network.conf$get.value("artifact.directed"))) + } + colnames(edges)[1] = "to" + colnames(edges)[2] = "from" + artifact.net.data = list(vertices = vertices, edges = edges) + ## construct the network + artifact.net = construct.network.from.edge.list( + artifact.net.data[["vertices"]], + artifact.net.data[["edges"]], + network.conf = private$network.conf, + directed = private$network.conf$get.value("artifact.directed"), + available.edge.attributes = private$proj.data$ + get.data.columns.for.data.source("commit.interactions") + ) + return(artifact.net) + }, + #' Get the call-graph-based artifact network. #' If it does not already exist build it first. #' IMPORTANT: This only works for range-level analyses! @@ -743,6 +828,7 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", network = switch( relation, cochange = private$get.author.network.cochange(), + commit.interaction = private$get.author.network.commit.interaction(), mail = private$get.author.network.mail(), issue = private$get.author.network.issue(), stop(sprintf("The author relation '%s' does not exist.", rel)) @@ -810,6 +896,7 @@ NetworkBuilder = R6::R6Class("NetworkBuilder", callgraph = private$get.artifact.network.callgraph(), mail = private$get.artifact.network.mail(), issue = private$get.artifact.network.issue(), + commit.interaction = private$get.artifact.network.commit.interaction(), stop(sprintf("The artifact relation '%s' does not exist.", relation)) ) diff --git a/util-read.R b/util-read.R index 8cfe1a802..f4fe70256 100644 --- a/util-read.R +++ b/util-read.R @@ -14,7 +14,7 @@ ## Copyright 2016-2019 by Claus Hunsen ## Copyright 2017 by Raphael Nömmer ## Copyright 2017-2018 by Christian Hechtl -## Copyright 2020-2022 by Christian Hechtl +## Copyright 2020-2022, 2024 by Christian Hechtl ## Copyright 2017 by Felix Prasse ## Copyright 
2017-2018 by Thomas Bock ## Copyright 2023-2024 by Thomas Bock @@ -25,6 +25,7 @@ ## Copyright 2021 by Mirabdulla Yusifli ## Copyright 2022 by Jonathan Baumann ## Copyright 2022-2023 by Maximilian Löffler +## Copyright 2024 by Leo Sendelbach ## All Rights Reserved. ## Note: @@ -42,6 +43,9 @@ requireNamespace("plyr") requireNamespace("digest") # for sha1 hashing of IDs requireNamespace("sqldf") # for SQL-selections on data.frames requireNamespace("data.table") # for faster data.frame processing +requireNamespace("yaml") # for reading commit interaction data +requireNamespace("fastmap") # for fast implementation of a map +requireNamespace("purrr") # for fast mapping function ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / ## Helper functions -------------------------------------------------------- @@ -66,6 +70,16 @@ remove.deleted.and.empty.user = function(data, columns = c("author.name")) { return(data) } +#' Concatenate function and file name, i.e. 'file::function' +#' +#' @param file.name the name of the file +#' @param function.name the name of the function +#' +#' @return the concatenated function name +prefix.function.with.file.name = function(file.name, function.name) { + return(paste(file.name, function.name, sep = "::")) +} + ## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / ## Main data sources ------------------------------------------------------- @@ -164,7 +178,7 @@ read.commits = function(data.path, artifact) { ## (we have proximity-based data as foundation) if (artifact == "function") { ## artifact = file name + "::" . 
function name - artifacts.new = paste(commit.data[["file"]], commit.data[["artifact"]], sep = "::") + artifacts.new = prefix.function.with.file.name(commit.data[["file"]], commit.data[["artifact"]]) ## clean up empty artifacts and File_Level artifact artifacts.new = gsub("^::$", "", artifacts.new) @@ -843,6 +857,119 @@ create.empty.pasta.list = function() { return(create.empty.data.frame(PASTA.LIST.COLUMNS, PASTA.LIST.DATA.TYPES)) } +## * Commit interaction data ----------------------------------------------- + +## column names of a dataframe containing commit interaction data (see function \code{read.commit.interactions}) +COMMIT.INTERACTION.LIST.COLUMNS = c( + "func", "commit.hash", "file", + "base.hash", "base.func", "base.file", + "base.author", "interacting.author" +) + +## declare the datatype for each column in the constant 'COMMIT.INTERACTION.LIST.COLUMNS' +COMMIT.INTERACTION.LIST.DATA.TYPES = c( + "character", "character", "character", + "character", "character", "character", + "character", "character" +) + +COMMIT.INTERACTION.GLOBAL.FILE.FUNCTION.NAME = "GLOBAL" + +#' Read and parse the commit-interaction data. This data is present in a `.yaml` file which +#' needs to be broken down. Within the yaml file, there are different lists in which each +#' commit (hash) gets mapped to all commits it interacts with and the file/function because of +#' which they interact. 
+#' +#' @param data.path the path to the commit-interaction data +#' +#' @return the read and parsed commit-interaction data +read.commit.interactions = function(data.path = NULL) { + + file = file.path(data.path, "commit-interactions.yaml") + + commit.interaction.base = try(yaml::read_yaml(file = file, + handlers = list(int = function(x) {as.character(x)})), + silent = TRUE) + + ## handle the case that the list of commit-interactions is empty + if (inherits(commit.interaction.base, "try-error")) { + logging::logwarn("There are no commit-interactions available for the current environment.") + logging::logwarn("Datapath: %s", data.path) + + # return a dataframe with the correct columns but zero rows + return(create.empty.commit.interaction.list()) + } + + ## extract the top level list of the yaml file which is called 'result-map' + result.map = commit.interaction.base[["result-map"]] + + ## extract a mapping of functions to files to be able to determine what file the current interaction is + ## based on + ## 1) create an empty map + file.name.map = fastmap::fastmap() + ## 2) create a mapping between functions and files as a named list + ## which can be directly converted to a map + function.file.list = purrr::map(result.map, "file") + ## 3) set the map using the list + file.name.map$mset(.list = function.file.list) + list.names = names(result.map) + + ## build the result dataframe by iterating over the 'result-map' list + commit.interaction.data = data.table::setDF(data.table::rbindlist( + parallel::mcmapply(result.map, + list.names, + SIMPLIFY = FALSE, + FUN = function(current.interaction, function.name) { + ## get all commits that interact with the current one + insts = current.interaction[["insts"]] + interactions = data.table::setDF(data.table::rbindlist(lapply(insts, function(current.inst) { + base.hash = current.inst[["base-hash"]][["commit"]] + interacting.hashes = current.inst[["interacting-hashes"]] + interacting.hashes.df = 
data.table::setDF(data.table::rbindlist(lapply(interacting.hashes, function(hash) { + ## if there is no function name in the current interaction, we set the function name to 'GLOBAL' + ## as this is most likely code outside of functions, else we set the function name + if (!"function" %in% names(hash)) { + return(data.frame(func = COMMIT.INTERACTION.GLOBAL.FILE.FUNCTION.NAME, + commit.hash = hash[["commit"]], + file = COMMIT.INTERACTION.GLOBAL.FILE.FUNCTION.NAME)) + } else if (is.null(file.name.map$get(hash[["function"]]))) { + ## This case should never occur if the data was generated correctly! + warning("An interacting hash specifies a function that does not exist in the data!") + return(data.frame(matrix(nrow = 3, ncol = 0))) + } else { + file.name = file.name.map$get(hash[["function"]]) + func.name = prefix.function.with.file.name(file.name, hash[["function"]]) + return(data.frame(func = func.name, commit.hash = hash[["commit"]], file = file.name)) + } + }))) + base.file.name = file.name.map$get(function.name) + interacting.hashes.df[["base.hash"]] = base.hash + interacting.hashes.df[["base.func"]] = prefix.function.with.file.name(base.file.name, function.name) + interacting.hashes.df[["base.file"]] = base.file.name + return(interacting.hashes.df) + }))) + ## Initialize author data as 'NA', since it is not available from the commit-interaction data. + ## Author data will be merged from commit data in \code{update.commit.interactions}.
+ interactions["base.author"] = NA_character_ + interactions["interacting.author"] = NA_character_ + return(interactions) + }))) + + ## remove all duplicate entries from the resulting dataframe + commit.interaction.data = commit.interaction.data[!duplicated(commit.interaction.data), ] + verify.data.frame.columns(commit.interaction.data, COMMIT.INTERACTION.LIST.COLUMNS, COMMIT.INTERACTION.LIST.DATA.TYPES) + return(commit.interaction.data) +} + +#' Create an empty dataframe which has the same shape as a dataframe containing commit-interaction data. +#' The dataframe has the column names and column datatypes defined in \code{COMMIT.INTERACTION.LIST.COLUMNS} +#' and \code{COMMIT.INTERACTION.LIST.DATA.TYPES}, respectively. +#' +#' @return the empty dataframe +create.empty.commit.interaction.list = function() { + return(create.empty.data.frame(COMMIT.INTERACTION.LIST.COLUMNS, COMMIT.INTERACTION.LIST.DATA.TYPES)) +} + ## * Synchronicity data ---------------------------------------------------- ## column names of a dataframe containing synchronicity data (see function \code{read.synchronicity})