Add Commit Networks #263

Merged on Aug 28, 2024 (16 commits)
Changes from 1 commit
7 changes: 7 additions & 0 deletions README.md
@@ -234,6 +234,11 @@ There are four types of networks that can be built using this library: author ne
* The vertices in an artifact network denote any kind of artifact, e.g., source-code artifact (such as features or files) or communication artifact (such as mail threads or issues). All artifact-type vertices are uniquely identifiable by their name. There are only unipartite edges among artifacts in this type of network.
* The relations (i.e., the edges' meaning and source) can be configured using the [`NetworkConf`](#networkconf) attribute `artifact.relation`. The relation also describes which kinds of artifacts are represented as vertices in the network. (For example, if "mail" is selected as `artifact.relation`, only mail-thread vertices are included in the network.)

- Commit networks
* The vertices in a commit network denote the commits in the data. All vertices are uniquely identifiable by the hash of the commit. There are only unipartite edges among commits in this type of network.
* The relations (i.e., the edges' meaning and source) can be configured using the [`NetworkConf`](#networkconf) attribute `commit.relation`. The relation also describes the type of data used for network construction (`cochange` uses commit data, `commit.interaction` uses commit-interaction data).

- Bipartite networks
* The vertices in a bipartite network denote both authors and artifacts. There are only bipartite edges from authors to artifacts in this type of network.
* The relations (i.e., the edges' meaning and source) can be configured using the [`NetworkConf`](#networkconf) attribute `artifact.relation`.
@@ -249,6 +254,7 @@ Relations determine which information is used to construct edges among the verti
- `cochange`
* For author networks (configured via `author.relation` in the [`NetworkConf`](#networkconf)), authors who change the same source-code artifact are connected with an edge.
* For artifact networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), source-code artifacts that are concurrently changed in the same commit are connected with an edge.
* For commit networks (configured via `commit.relation` in the [`NetworkConf`](#networkconf)), commits are connected if they change the same artifact.
* For bipartite networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), authors get linked to all source-code artifacts they have changed in their respective commits.

- `mail`
@@ -269,6 +275,7 @@ Relations determine which information is used to construct edges among the verti
- `commit.interaction`
* For author networks (configured via `author.relation` in the [`NetworkConf`](#networkconf)), authors who contribute to interacting commits are connected with an edge.
* For artifact networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), artifacts are connected when two commits that occur in these artifacts interact.
* For commit networks (configured via `commit.relation` in the [`NetworkConf`](#networkconf)), commits are connected when they interact in the commit interaction data.
* This relation does not apply for bipartite networks.
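
To illustrate the `cochange` relation for commit networks, here is a minimal base-R sketch. It is not coronet's implementation: the input data frame with columns `hash` and `artifact` and the helper `build.cochange.commit.edges` are hypothetical. The sketch connects every pair of distinct commits that change the same artifact and records that artifact on the edge.

```r
## Hypothetical sketch, not coronet's implementation: build the edge list of a
## 'cochange' commit network, where two commits are connected if they change
## the same artifact.
build.cochange.commit.edges = function(commits) {
    ## group the commit hashes by the artifact they change
    hashes.per.artifact = split(commits[["hash"]], commits[["artifact"]])
    edges = lapply(names(hashes.per.artifact), function(artifact) {
        hashes = unique(hashes.per.artifact[[artifact]])
        if (length(hashes) < 2) {
            return(NULL)  # a single commit on an artifact yields no edge
        }
        ## connect every pair of distinct commits that changed this artifact
        pairs = t(combn(hashes, 2))
        data.frame(from = pairs[, 1], to = pairs[, 2], artifact = artifact,
                   stringsAsFactors = FALSE)
    })
    ## rbind drops the NULL entries of artifacts without edges
    do.call(rbind, edges)
}

## hypothetical sample data: one row per (commit, changed artifact)
commits = data.frame(
    hash = c("c1", "c1", "c2", "c3", "c3"),
    artifact = c("A", "B", "A", "B", "C"),
    stringsAsFactors = FALSE
)
edges = build.cochange.commit.edges(commits)
print(edges)
```

With this sample data, commits `c1` and `c2` are linked via artifact `A`, and `c1` and `c3` via artifact `B`; only `c3` changes `C`, so `C` contributes no edge.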

#### Edge-construction algorithms for author networks
4 changes: 2 additions & 2 deletions showcase.R
@@ -239,8 +239,8 @@ sample.pull.requests = add.vertex.attribute.author.issue.count(my.networks, x.da
## add vertex attributes for the project-level network
x.net.as.list = list("1970-01-01 00:00:00-2030-01-01 00:00:00" = x$get.author.network())
sample.entire = add.vertex.attribute.author.commit.count(x.net.as.list, x.data, aggregation.level = "complete")
## add vertex attributes to commit network
add.vertex.attribute.commit.network(x$get.commit.network(), x.data, "author.name", "NO_AUTHOR")
## add vertex attributes to commit network. Default value 'NO_AUTHOR' is used if vertex is not in commit data
add.vertex.attribute.commit.network(x$get.commit.network(), x.data, attr.name = "author.name", default.value = "NO_AUTHOR")


## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
10 changes: 6 additions & 4 deletions tests/test-data.R
@@ -564,15 +564,15 @@ test_that("Compare two ProjectData Objects with commit.interactions", {
proj.data.two$set.commits(create.empty.commits.list())

## create empty data frame of correct size
commit.interactions.data.expected = data.frame(matrix(nrow = 4, ncol = 8))
commit.interactions.data.expected = data.frame(matrix(nrow = 4, ncol = 9))
## assure that the correct type is used
for(i in seq_len(8)) {
for(i in seq_len(9)) {
commit.interactions.data.expected[[i]] = as.character(commit.interactions.data.expected[[i]])
}
## set everything except for authors as expected
colnames(commit.interactions.data.expected) = c("commit.hash", "base.hash", "func", "file",
"base.func", "base.file", "base.author",
"interacting.author")
"base.func", "base.file", "artifact.type",
"base.author", "interacting.author")
commit.interactions.data.expected[["commit.hash"]] =
c("0a1a5c523d835459c42f33e863623138555e2526",
"418d1dc4929ad1df251d2aeb833dd45757b04a6f",
@@ -588,6 +588,8 @@ test_that("Compare two ProjectData Objects with commit.interactions", {
commit.interactions.data.expected[["base.func"]] = c("test2.c::test2", "test2.c::test2",
"test3.c::test_function", "test2.c::test2")
commit.interactions.data.expected[["base.file"]] = c("test2.c", "test2.c", "test3.c", "test2.c")
commit.interactions.data.expected[["artifact.type"]] = c("CommitInteraction", "CommitInteraction",
"CommitInteraction", "CommitInteraction")

expect_equal(proj.data.two$get.commit.interactions(), commit.interactions.data.expected)

3 changes: 3 additions & 0 deletions tests/test-networks-commit.R
@@ -83,6 +83,9 @@ patrick::with_parameters_test_that("Network construction with commit-interaction
)
network = igraph::graph.data.frame(edges, directed = test.directed, vertices = vertices)
expect_true(igraph::identical_graphs(network.built, network))

network.new.attr = add.vertex.attribute.commit.network(network.built, proj.data, "deleted.lines", "NO_DATA")
expect_identical(igraph::V(network.new.attr)$deleted.lines, c("0", "0", "0", "NO_DATA", "0", "NO_DATA"))
}, patrick::cases(
"directed: FALSE" = list(test.directed = FALSE),
"directed: TRUE" = list(test.directed = TRUE)
14 changes: 8 additions & 6 deletions tests/test-read.R
@@ -505,15 +505,15 @@ test_that("Read the commit-interactions data.", {
commit.interactions.data.read = read.commit.interactions(proj.conf$get.value("datapath"))
## build the expected data.frame

commit.interactions.data.expected = data.frame(matrix(nrow = 4, ncol = 8))
commit.interactions.data.expected = data.frame(matrix(nrow = 4, ncol = 9))
## assure that the correct type is used
for(i in seq_len(8)) {
for(i in seq_len(ncol(commit.interactions.data.expected))) {
commit.interactions.data.expected[[i]] = as.character(commit.interactions.data.expected[[i]])
}
## set everything except for authors as expected
colnames(commit.interactions.data.expected) = c("func", "commit.hash", "file", "base.hash",
"base.func", "base.file", "base.author",
"interacting.author")
"interacting.author", "artifact.type")
commit.interactions.data.expected[["commit.hash"]] =
c("5a5ec9675e98187e1e92561e1888aa6f04faa338",
"0a1a5c523d835459c42f33e863623138555e2526",
@@ -529,6 +529,8 @@ test_that("Read the commit-interactions data.", {
commit.interactions.data.expected[["base.func"]] = c("test3.c::test_function", "test2.c::test2",
"test2.c::test2", "test2.c::test2")
commit.interactions.data.expected[["base.file"]] = c("test3.c", "test2.c", "test2.c", "test2.c")
commit.interactions.data.expected[["artifact.type"]] = c("CommitInteraction", "CommitInteraction",
"CommitInteraction", "CommitInteraction")
## check the results
expect_identical(commit.interactions.data.read, commit.interactions.data.expected,
info = "commit interaction data.")
@@ -543,11 +545,11 @@ test_that("Read the empty commit-interactions data.", {
commit.interactions.data.read = read.commit.interactions("./codeface-data/results/testing/test_empty_proximity/proximity")
## build the expected data.frame
commit.interactions.data.expected = data.frame(matrix(nrow = 0, ncol = 8))
commit.interactions.data.expected = data.frame(matrix(nrow = 0, ncol = 9))
colnames(commit.interactions.data.expected) = c("func", "commit.hash", "file",
"base.hash", "base.func", "base.file",
"base.author", "interacting.author")
for(i in seq_len(8)) {
"base.author", "interacting.author", "artifact.type")
for(i in seq_len(ncol(commit.interactions.data.expected))) {
commit.interactions.data.expected[[i]] = as.character(commit.interactions.data.expected[[i]])
}
## check the results
12 changes: 6 additions & 6 deletions util-data.R
@@ -415,7 +415,10 @@ ProjectData = R6::R6Class("ProjectData",
#'
#' This method should be called whenever the field \code{commit.interactions} is changed.
update.commit.interactions = function() {
if (self$is.data.source.cached("commit.interactions")) {
stacktrace = get.stacktrace(sys.calls())
caller = get.second.last.element(stacktrace)
if (self$is.data.source.cached("commit.interactions") &&
(is.na(caller) || paste(caller, collapse = " ") != "self$set.commits(commit.data)")) {
if (!self$is.data.source.cached("commits.unfiltered")) {
self$get.commits()
}
@@ -2143,8 +2146,6 @@ ProjectData = R6::R6Class("ProjectData",
return(mylist)
},

## * * processed data ----------------------------------------------

#' Group the commits of the given \code{data.source} by the given \code{group.column}.
#' For each group, the column \code{"hash"} is duplicated and prepended to each
#' group's data as first column (see below for details).
@@ -2162,12 +2163,11 @@ ProjectData = R6::R6Class("ProjectData",
#' as first column (with name \code{"data.vertices"})
#'
#' @seealso ProjectData$group.data.by.column
group.commits.by.data.column = function(data.source = c("commits", "mails", "issues"),
group.column = "artifact") {
group.commits.by.data.column = function(group.column = "artifact") {
logging::loginfo("Grouping commits by data column.")

## store the commits per group that is determined by 'group.column'
mylist = self$group.data.by.column(data.source, group.column, "hash")
mylist = self$group.data.by.column("commits", group.column, "hash")

return(mylist)
},
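
The grouping documented for `group.commits.by.data.column` (the `hash` column duplicated and prepended as a first column named `data.vertices`, then the commits grouped by `group.column`) can be approximated in base R. This is a hypothetical sketch, not the body of `ProjectData$group.data.by.column`; the helper `group.commits.by.column.sketch` and the sample data are assumptions.

```r
## Hypothetical sketch, not ProjectData's implementation: group commit rows by
## 'group.column' after prepending a copy of 'hash' named 'data.vertices'.
group.commits.by.column.sketch = function(commits, group.column = "artifact") {
    ## duplicate the 'hash' column and put the copy first
    data = data.frame(data.vertices = commits[["hash"]], commits,
                      stringsAsFactors = FALSE)
    ## one data frame per distinct value of 'group.column'
    split(data, data[[group.column]])
}

## hypothetical sample data: one row per (commit, changed artifact)
commits = data.frame(
    hash = c("c1", "c1", "c2"),
    artifact = c("A", "B", "A"),
    stringsAsFactors = FALSE
)
groups = group.commits.by.column.sketch(commits)
print(groups[["A"]])
```

Here group `A` holds the rows of commits `c1` and `c2`, and group `B` holds only `c1`. Each group starts with the `data.vertices` column.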
5 changes: 2 additions & 3 deletions util-networks-covariates.R
@@ -149,8 +149,8 @@ add.vertex.attribute = function(net.to.range.list, attr.name, default.value, com
#' @param network the commit network
#' @param project.data the project data from which to extract the values
#' @param attr.name the name of the attribute
#' @param default.value the dafault value of the attribute
#' if it does not occur in the commit data
#' @param default.value the default value that is used if the current hash
#' is not contained in the commit data at all
#'
#' @return a network with new vertex attribute
add.vertex.attribute.commit.network = function(network, project.data,
@@ -174,7 +174,6 @@ add.vertex.attribute.commit.network = function(network, project.data,
attribute.values = c(attribute.values, value)
}
net.with.attr = igraph::set.vertex.attribute(network, attr.name, value = attribute.values)

}
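
The default-value behaviour documented for `add.vertex.attribute.commit.network` can be sketched as a per-vertex lookup. This is a hypothetical base-R helper (`lookup.commit.attribute`), not the function's actual body. One part of the sketch is an assumption: when a commit hash occurs in several rows, it takes the value from the first matching row. The resulting vector could then be attached with `igraph::set.vertex.attribute(network, attr.name, value = ...)`, as the function above does.

```r
## Hypothetical sketch: for each commit-network vertex (a commit hash), look up
## 'attr.name' in the commit data, falling back to 'default.value' when the
## hash does not occur in the commit data at all.
lookup.commit.attribute = function(vertex.hashes, commit.data, attr.name, default.value) {
    vapply(vertex.hashes, function(hash) {
        rows = commit.data[commit.data[["hash"]] == hash, , drop = FALSE]
        if (nrow(rows) == 0) {
            return(as.character(default.value))
        }
        ## assumption: a hash may occur in several rows (e.g., one per changed
        ## artifact); this sketch takes the value from the first of them
        as.character(rows[[attr.name]][1])
    }, character(1), USE.NAMES = FALSE)
}

## hypothetical sample commit data
commit.data = data.frame(
    hash = c("h1", "h1", "h2"),
    deleted.lines = c(0, 3, 5),
    stringsAsFactors = FALSE
)
values = lookup.commit.attribute(c("h1", "h3", "h2"), commit.data,
                                 "deleted.lines", "NO_DATA")
print(values)
```

Because the default here is a string, all values are returned as character, which matches the mix of `"0"` and `"NO_DATA"` expected in the commit-network test.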

