Adding commit-interactions #252

Merged 38 commits on Apr 23, 2024
38 commits
d82857f
Add commit-interaction data functionality
Leo-Send Feb 12, 2024
b4fd2a2
Add functionality for equals function
Leo-Send Feb 12, 2024
b3394ee
Remove outdated comment int 'util-read.R'
Leo-Send Feb 12, 2024
eeba7e2
Add test for new functionality of 'equals'
Leo-Send Feb 12, 2024
8bb39f4
Add test for new read functionality
Leo-Send Feb 12, 2024
54b6f65
Add test data files with commit interactions
Leo-Send Feb 12, 2024
7a5497a
Add test for reading empty commit-interactions data
Leo-Send Feb 14, 2024
7b8585f
Add test for change in set.commits
Leo-Send Feb 14, 2024
d7dc713
Add comments for update.commit.interactions
Leo-Send Feb 20, 2024
f25632c
Change indexes for 'match' calls
Leo-Send Feb 20, 2024
8fcc6d5
Fix test to correctly check for inequality
Leo-Send Feb 20, 2024
9117be8
Change colnames used for empty commit-interactions
Leo-Send Feb 20, 2024
49acd59
Remove previously added columns to avoid duplication
Leo-Send Feb 20, 2024
3efb38b
Change merge in 'update.commit.interactions'
Leo-Send Feb 27, 2024
099a096
Add additional columns to commit-interactions
Leo-Send Feb 27, 2024
6f73cff
Change test to reflect change to dataframe columns
Leo-Send Feb 27, 2024
fd0aa05
Add 'cleanup.commit.interactions' function
Leo-Send Feb 27, 2024
ef72540
Add test for cleanup function
Leo-Send Feb 27, 2024
7068cfa
Add test for author network
Leo-Send Mar 5, 2024
329d97e
Change 'util-networks.R' to use colnames
Leo-Send Mar 5, 2024
07e7ed7
Add tests for artifact networks
Leo-Send Mar 5, 2024
dbd07e9
Fix artifact network construction
Leo-Send Mar 5, 2024
169dbfe
Change tests for artifact networks
Leo-Send Mar 8, 2024
8736025
Change vertex kind for artifact networks
Leo-Send Mar 8, 2024
a924e86
Add commits to 'NEWS.md'
Leo-Send Mar 12, 2024
48d9de1
Change warning to use 'logging::logwarn'
Leo-Send Mar 13, 2024
91b9c3b
Fix issues pointed out on PR comments
Leo-Send Mar 19, 2024
8d4965a
Change call to 'read_yaml'
Leo-Send Mar 20, 2024
1addce9
Change to adress comments by @bockthom
Leo-Send Apr 4, 2024
1335965
Add global variable and change function names
Leo-Send Apr 5, 2024
8ce1f07
Change tests to match new function names
Leo-Send Apr 5, 2024
7c92b72
Fix typos and change data frame access
Leo-Send Apr 10, 2024
bc49386
Change NEWS.md with new commit hashes after rebase
Leo-Send Apr 10, 2024
bca3576
Add Configuration for filtering commit interactions
Leo-Send Apr 18, 2024
f8ea987
Add helper function for prefixing function names
Leo-Send Apr 18, 2024
7d8be96
Change 'NEWS.md' to include new commits
Leo-Send Apr 18, 2024
b8857cf
Change some comments and variable names
Leo-Send Apr 23, 2024
ee54b1a
Add missing copyright headers
Leo-Send Apr 23, 2024
12 changes: 12 additions & 0 deletions NEWS.md
@@ -2,6 +2,18 @@

# coronet – Changelog

## unversioned

### Added

- Add commit-interaction data and add functions `read.commit.interactions` for reading, as well as `get.commit.interactions`, `set.commit.interactions` and utility functions for working with commit-interaction data (PR #252, d82857fbebd1111bb16588a4223bb24a8dcd07de, b4fd2a29c9b5fd561b1106c6febb54a32b0085ab, fd0aa05f824b93545ae8e05833b95b3bd9809286, bca35760eb0aac86c04923f2d534b2d8cece204e) as well as tests for these features (PR #252, eeba7e29932bc973513c963fb9e716e9230d570f, 8bb39f4df39b49dfaff8f19feb6db5e5fbd81fac, 54b6f655248720436af116fe72521f9cb0348429, 7a5497aaf9114017d1b3b9b68b6cccd7ca8ac114, 7b8585f87675795822c07230192d6454de31dcc7, ef725407bf8818c8fff96ea6f343338b7162cbe0)
- Add commit-interaction networks that can be created with `create.author.network` and `create.artifact.network` if `author.relation` and `artifact.relation`, respectively, are configured to be `commit.interaction` (PR #252, d82857fbebd1111bb16588a4223bb24a8dcd07de, 329d97ec3de36a9e1bcadc0c7a53c1d92e8b481c) as well as tests for these features (PR #252, 07e7ed744209b0251217fa8f7f35d9b9875face2, 7068cfa10d993dcae3f5e3f76f8cafa99fa8b350)
- Add helper function for prefixing function names with file names in `util-read.R` (PR #252, f8ea987b138173cf0509c7910e0572d8ee1b3f1f)

### Changed/Improved

### Fixed

## 4.4

### Announcement
13 changes: 13 additions & 0 deletions README.md
@@ -142,6 +142,8 @@ Alternatively, you can run `Rscript install.R` to install the packages.
- `jsonlite`: For parsing the issue data
- `rTensor`: For calculating EDCPTD centrality
- `Matrix`: For sparse matrix representation of large adjacency matrices
- `fastmap`: For fast implementation of a map
- `purrr`: For fast implementation of a mapping function

### Submodule

@@ -264,6 +266,11 @@ Relations determine which information is used to construct edges among the verti
* For artifact networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), source-code artifacts are connected when they reference each other (i.e., one artifact calls a function contained in the other artifact).
* For bipartite networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), authors get linked to all source-code artifacts they have changed in their respective commits (same as for the relation `cochange`).

- `commit.interaction`
* For author networks (configured via `author.relation` in the [`NetworkConf`](#networkconf)), authors who contribute to interacting commits are connected with an edge.
* For artifact networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), artifacts are connected when there is an interaction between two commits that occur in the artifacts.
* This relation does not apply to bipartite networks.
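
As a rough illustration of the author-network case described above, the following base-R sketch turns a commit-interaction table into an edge list. The column names and author values are taken from the tests in this PR; the abbreviated hashes and the edge orientation (from the interacting side to the base side, mirroring the expected edges in the artifact-network tests) are assumptions, and this is not the code in `util-networks.R`.

```r
## Toy commit-interaction table. Column names follow the tests in this PR;
## the hashes are abbreviated for readability.
interactions = data.frame(
    commit.hash = c("0a1a5c5", "418d1dc", "5a5ec96", "d019217"),
    base.hash = c("3a0ed78", "0a1a5c5", "1143db5", "0a1a5c5"),
    base.author = c("Olaf", "Thomas", "Karl", "Thomas"),
    interacting.author = c("Thomas", "Karl", "Olaf", "Thomas"),
    stringsAsFactors = FALSE
)

## Assumption: one edge per interaction, pointing from the author of the
## interacting commit to the author of the base commit.
author.edges = data.frame(
    from = interactions[["interacting.author"]],
    to = interactions[["base.author"]],
    hash = interactions[["commit.hash"]],
    base.hash = interactions[["base.hash"]],
    relation = "commit.interaction",
    stringsAsFactors = FALSE
)

print(author.edges)
```

An edge list of this shape could then be handed to `igraph::graph.data.frame`, as the tests in this PR do for artifact networks.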

#### Edge-construction algorithms for author networks

When constructing author networks, we use events in time (i.e., commits, e-mails, issue events) to model interactions among authors on the same artifact as edges. Therefore, we group the events on artifacts, based on the configured relation (see the [previous section](#relations)).
@@ -597,6 +604,12 @@ There is no way to update the entries, except for the revision-based parameters.
- `custom.event.timestamps.locked`:
* Lock custom event timestamps to prevent them from being read if empty or not yet present when calling the getter.
* [`TRUE`, *`FALSE`*]
- `commit.interactions`:
* Allow construction of author and artifact networks using commit-interaction data
* [`TRUE`, *`FALSE`*]
- `commit.interactions.filter.global`:
* Filter out entries from commit interaction data that are not matched to a specific function or file
* [*`TRUE`*, `FALSE`]
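
The effect of `commit.interactions.filter.global` can be pictured with a small base-R sketch. It assumes that entries without a matching function or file carry the value `"GLOBAL"` in the `func` and `file` columns, as in the test data of this PR; the variable `GLOBAL.MARKER` is a hypothetical name, and the actual filtering code in coronet may differ.

```r
## Toy data with the `func` and `file` values used in the tests of this PR
interactions = data.frame(
    func = c("GLOBAL", "test2.c::test2", "GLOBAL", "test2.c::test2"),
    file = c("GLOBAL", "test2.c", "GLOBAL", "test2.c"),
    base.file = c("test2.c", "test2.c", "test3.c", "test2.c"),
    stringsAsFactors = FALSE
)

## Hypothetical marker for entries that are not matched to a function or file
GLOBAL.MARKER = "GLOBAL"

## Keep only the rows that are matched to a specific function and file
is.global = interactions[["func"]] == GLOBAL.MARKER | interactions[["file"]] == GLOBAL.MARKER
filtered = interactions[!is.global, , drop = FALSE]

print(nrow(filtered))
```

On this toy data, two of the four rows remain, which is consistent with the row count the test in `tests/test-data.R` expects once the filter is re-enabled.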

### NetworkConf

5 changes: 4 additions & 1 deletion install.R
@@ -19,6 +19,7 @@
## Copyright 2020-2023 by Thomas Bock <[email protected]>
## Copyright 2019 by Anselm Fehnker <[email protected]>
## Copyright 2021 by Christian Hechtl <[email protected]>
## Copyright 2024 by Leo Sendelbach <[email protected]>
## All Rights Reserved.
##
## Adapted from https://github.com/siemens/codeface/blob/be382e9171fb91b4aa99b99b09b2ef64a6dba0d5/packages.r
@@ -44,7 +45,9 @@ packages = c(
     "viridis",
     "jsonlite",
     "rTensor",
-    "Matrix"
+    "Matrix",
+    "fastmap",
+    "purrr"
)


1 change: 1 addition & 0 deletions tests/README.md
@@ -16,6 +16,7 @@ We have two test projects you can use when writing your tests:
* Commit messages
* Pasta
* Synchronicity
* Commit interactions
* Custom event timestamps in `custom-events.list`
* Revisions
2. - Casestudy: `test_empty`
@@ -0,0 +1,55 @@
scope: REGION
result-map:
  test_function:
    demangled-name: test_function
    file: test3.c
    num-instructions: 30
    insts:
      - base-hash:
          region: 45620620587549
          function: test_function
          commit: 1143db502761379c2bfcecc2007fc34282e7ee61
          repository: test-repo
        interacting-hashes:
          - region: 87546092348456
            commit: 5a5ec9675e98187e1e92561e1888aa6f04faa338
            repository: test-repo
        amount: 2
    callees:
      - test_callee
    commits:
      - commit: 3383d8e5561dfc6fb2b65e0a194df94ccb5e08af
        repository: test-repo
  test2:
    demangled-name: test2
    file: test2.c
    num-instructions: 26
    insts:
      - base-hash:
          region: 50956672345141
          commit: 3a0ed78458b3976243db6829f63eba3eead26774
          repository: test-repo
        interacting-hashes:
          - region: 98750276234511
            commit: 0a1a5c523d835459c42f33e863623138555e2526
            repository: test-repo
        amount: 1
      - base-hash:
          region: 67230588834344
          commit: 0a1a5c523d835459c42f33e863623138555e2526
          repository: test-repo
        interacting-hashes:
          - region: 33295067820043
            function: test2
            commit: 418d1dc4929ad1df251d2aeb833dd45757b04a6f
            repository: test-repo
          - region: 20194653678423
            function: test2
            commit: d01921773fae4bed8186b0aa411d6a2f7a6626e6
            repository: test-repo
        amount: 3
    callees:
      - test_callee
    commits:
      - commit: 3383d8e5561dfc6fb2b65e0a194df94ccb5e08af
        repository: test-repo
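
To make the shape of this test file easier to follow, here is a minimal base-R sketch that flattens the `test2` entry into one row per interacting commit. It is not the PR's `read.commit.interactions` implementation: the nested list is written out by hand instead of being parsed with `yaml::read_yaml`, the file-name prefixing of function names is left out, and using `"GLOBAL"` for interacting hashes without a `function` field is an assumption inferred from the expected values in `tests/test-data.R`.

```r
## Hand-written stand-in for the parsed `result-map`; only the fields needed
## for the sketch are included.
result.map = list(
    test2 = list(
        file = "test2.c",
        insts = list(
            list(
                "base-hash" = list(commit = "3a0ed78458b3976243db6829f63eba3eead26774"),
                "interacting-hashes" = list(
                    list(commit = "0a1a5c523d835459c42f33e863623138555e2526")
                )
            ),
            list(
                "base-hash" = list(commit = "0a1a5c523d835459c42f33e863623138555e2526"),
                "interacting-hashes" = list(
                    list(commit = "418d1dc4929ad1df251d2aeb833dd45757b04a6f", "function" = "test2"),
                    list(commit = "d01921773fae4bed8186b0aa411d6a2f7a6626e6", "function" = "test2")
                )
            )
        )
    )
)

## Collect one row per (base commit, interacting commit) pair
rows = list()
for (func.name in names(result.map)) {
    entry = result.map[[func.name]]
    for (inst in entry[["insts"]]) {
        for (interacting in inst[["interacting-hashes"]]) {
            interacting.func = interacting[["function"]]
            rows[[length(rows) + 1]] = data.frame(
                commit.hash = interacting[["commit"]],
                base.hash = inst[["base-hash"]][["commit"]],
                ## assumption: a missing function is recorded as "GLOBAL"
                func = if (is.null(interacting.func)) "GLOBAL" else interacting.func,
                base.func = func.name,
                base.file = entry[["file"]],
                stringsAsFactors = FALSE
            )
        }
    }
}
interactions = do.call(rbind, rows)

print(interactions[, c("func", "base.func", "base.file")])
```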
83 changes: 83 additions & 0 deletions tests/test-data.R
@@ -20,6 +20,7 @@
## Copyright 2021 by Mirabdulla Yusifli <[email protected]>
## Copyright 2022 by Jonathan Baumann <[email protected]>
## Copyright 2023 by Maximilian Löffler <[email protected]>
## Copyright 2024 by Leo Sendelbach <[email protected]>
## All Rights Reserved.


@@ -98,6 +99,13 @@ test_that("Compare two ProjectData objects on empty data", {
    proj.data.two$set.project.conf.entry("commit.messages", "message")
    proj.data.two$get.commit.messages()
    expect_true(proj.data.one$equals(proj.data.two), "Two identical ProjectData objects (commit.messages).")

    proj.data.one$set.project.conf.entry("commit.interactions", TRUE)
    proj.data.one$get.commit.interactions()
    expect_false(proj.data.one$equals(proj.data.two), "Two non-identical ProjectData objects (commit.interactions).")
    proj.data.two$set.project.conf.entry("commit.interactions", TRUE)
    proj.data.two$get.commit.interactions()
    expect_true(proj.data.one$equals(proj.data.two), "Two identical ProjectData objects (commit.interactions).")
})

test_that("Compare two ProjectData objects on non-empty data", {
@@ -511,3 +519,78 @@ test_that("Create RangeData objects from Codeface ranges and check data path", {

    expect_identical(range.paths, expected.paths, "RangeData data paths")
})

test_that("Compare two ProjectData Objects with commit.interactions", {
    ## configuration object for the datapath
    proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "file")
    proj.conf$update.value("commit.interactions", TRUE)
    proj.conf$update.value("commits.filter.untracked.files", FALSE)
    proj.conf$update.value("commits.filter.base.artifact", FALSE)
    proj.conf$update.value("commit.interactions.filter.global", FALSE)

    proj.data.one = ProjectData$new(project.conf = proj.conf)
    proj.data.two = proj.data.one$clone(deep = TRUE)

    ## test if the project data is equal and the commit interactions are as well
    expect_equal(proj.data.one$get.commit.interactions(), proj.data.two$get.commit.interactions())
    expect_true(proj.data.one$equals(proj.data.two))

    ## change commit interactions of one project data and assert that equality check fails
    proj.data.two$set.commit.interactions(create.empty.commit.interaction.list())
    expect_false(proj.data.one$equals(proj.data.two))

    ## change commit data in one to test if commit-interactions are correctly updated
    ## call get.commit.interactions() once to restore read interactions
    proj.data.two$get.commit.interactions()

    ## change commits in one project data
    commit.data = proj.data.one$get.commits()
    commit.data[["hash"]][[5]] = 1
    proj.data.one$set.commits(commit.data)

    ## use isTRUE to compress result of all.equal into a single boolean
    expect_false(isTRUE(all.equal(proj.data.one$get.commit.interactions(),
                                  proj.data.two$get.commit.interactions())))

    ## The data frame should still have 4 entries:
    expect_true(nrow(proj.data.one$get.commit.interactions()) == 4)
    ## after cleanup is called, the data frame should only have 3 entries:
    proj.data.one$cleanup.commit.interactions()
    expect_true(nrow(proj.data.one$get.commit.interactions()) == 3)

    ## set commit list of one project data to empty and test that the last
    ## two columns (the author columns) of the result data frame are empty
    proj.data.two$set.commits(create.empty.commits.list())

    ## create empty data frame of correct size
    commit.interactions.data.expected = data.frame(matrix(nrow = 4, ncol = 8))
    ## assure that the correct type is used
    for(i in seq_len(8)) {
        commit.interactions.data.expected[[i]] = as.character(commit.interactions.data.expected[[i]])
    }
    ## set everything except for authors as expected
    colnames(commit.interactions.data.expected) = c("commit.hash", "base.hash", "func", "file",
                                                    "base.func", "base.file", "base.author",
                                                    "interacting.author")
    commit.interactions.data.expected[["commit.hash"]] =
        c("0a1a5c523d835459c42f33e863623138555e2526",
          "418d1dc4929ad1df251d2aeb833dd45757b04a6f",
          "5a5ec9675e98187e1e92561e1888aa6f04faa338",
          "d01921773fae4bed8186b0aa411d6a2f7a6626e6")
    commit.interactions.data.expected[["base.hash"]] =
        c("3a0ed78458b3976243db6829f63eba3eead26774",
          "0a1a5c523d835459c42f33e863623138555e2526",
          "1143db502761379c2bfcecc2007fc34282e7ee61",
          "0a1a5c523d835459c42f33e863623138555e2526")
    commit.interactions.data.expected[["func"]] = c("GLOBAL", "test2.c::test2", "GLOBAL", "test2.c::test2")
    commit.interactions.data.expected[["file"]] = c("GLOBAL", "test2.c", "GLOBAL", "test2.c")
    commit.interactions.data.expected[["base.func"]] = c("test2.c::test2", "test2.c::test2",
                                                         "test3.c::test_function", "test2.c::test2")
    commit.interactions.data.expected[["base.file"]] = c("test2.c", "test2.c", "test3.c", "test2.c")

    expect_equal(proj.data.two$get.commit.interactions(), commit.interactions.data.expected)

    ## reactivate filtering of commit interactions
    proj.data.two$set.project.conf.entry("commit.interactions.filter.global", TRUE)
    expect_true(nrow(proj.data.two$get.commit.interactions()) == 2)
})
98 changes: 98 additions & 0 deletions tests/test-networks-artifact.R
@@ -212,3 +212,101 @@ patrick::with_parameters_test_that("Network construction of an empty 'comments-o
    "directed: FALSE" = list(test.directed = FALSE),
    "directed: TRUE" = list(test.directed = TRUE)
))

patrick::with_parameters_test_that("Network construction with commit-interactions as relation, artifact type 'file'", {
    ## configuration object for the datapath
    proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "file")
    proj.conf$update.value("commit.interactions", TRUE)
    proj.conf$update.value("commits.filter.untracked.files", FALSE)
    proj.conf$update.value("commits.filter.base.artifact", FALSE)
    proj.conf$update.value("commit.interactions.filter.global", FALSE)
    proj.data = ProjectData$new(project.conf = proj.conf)

    net.conf = NetworkConf$new()
    net.conf$update.values(updated.values = list(artifact.relation = "commit.interaction",
                                                 artifact.directed = test.directed))

    network.builder = NetworkBuilder$new(project.data = proj.data, network.conf = net.conf)
    network.built = network.builder$get.artifact.network()
    ## build the expected network
    vertices = data.frame(
        name = c("test2.c", "test3.c", "GLOBAL"),
        kind = "File",
        type = TYPE.ARTIFACT
    )
    edges = data.frame(
        from = c("GLOBAL", "test2.c", "GLOBAL", "test2.c"),
        to = c("test2.c", "test2.c", "test3.c", "test2.c"),
        func = c("GLOBAL", "test2.c::test2", "GLOBAL", "test2.c::test2"),
        hash = c("0a1a5c523d835459c42f33e863623138555e2526",
                 "418d1dc4929ad1df251d2aeb833dd45757b04a6f",
                 "5a5ec9675e98187e1e92561e1888aa6f04faa338",
                 "d01921773fae4bed8186b0aa411d6a2f7a6626e6"),
        base.hash = c("3a0ed78458b3976243db6829f63eba3eead26774",
                      "0a1a5c523d835459c42f33e863623138555e2526",
                      "1143db502761379c2bfcecc2007fc34282e7ee61",
                      "0a1a5c523d835459c42f33e863623138555e2526"),
        base.func = c("test2.c::test2", "test2.c::test2",
                      "test3.c::test_function", "test2.c::test2"),
        base.author = c("Olaf", "Thomas", "Karl", "Thomas"),
        interacting.author = c("Thomas", "Karl", "Olaf", "Thomas"),
        weight = c(1, 1, 1, 1),
        type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA),
        relation = c("commit.interaction", "commit.interaction", "commit.interaction", "commit.interaction")
    )
    network = igraph::graph.data.frame(edges, directed = test.directed, vertices = vertices)

    expect_true(igraph::identical_graphs(network.built, network))
}, patrick::cases(
    "directed: FALSE" = list(test.directed = FALSE),
    "directed: TRUE" = list(test.directed = TRUE)
))

patrick::with_parameters_test_that("Network construction with commit-interactions as relation, artifact type 'function'", {
    ## configuration object for the datapath
    proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "function")
    proj.conf$update.value("commit.interactions", TRUE)
    proj.conf$update.value("commits.filter.untracked.files", FALSE)
    proj.conf$update.value("commits.filter.base.artifact", FALSE)
    proj.conf$update.value("commit.interactions.filter.global", FALSE)
    proj.data = ProjectData$new(project.conf = proj.conf)

    net.conf = NetworkConf$new()
    net.conf$update.values(updated.values = list(artifact.relation = "commit.interaction",
                                                 artifact.directed = test.directed))

    network.builder = NetworkBuilder$new(project.data = proj.data, network.conf = net.conf)
    network.built = network.builder$get.artifact.network()
    ## build the expected network
    vertices = data.frame(
        name = c("test2.c::test2", "test3.c::test_function", "GLOBAL"),
        kind = "Function",
        type = TYPE.ARTIFACT
    )
    edges = data.frame(
        from = c("GLOBAL", "test2.c::test2", "GLOBAL", "test2.c::test2"),
        to = c("test2.c::test2", "test2.c::test2",
               "test3.c::test_function", "test2.c::test2"),
        hash = c("0a1a5c523d835459c42f33e863623138555e2526",
                 "418d1dc4929ad1df251d2aeb833dd45757b04a6f",
                 "5a5ec9675e98187e1e92561e1888aa6f04faa338",
                 "d01921773fae4bed8186b0aa411d6a2f7a6626e6"),
        file = c("GLOBAL", "test2.c", "GLOBAL", "test2.c"),
        base.hash = c("3a0ed78458b3976243db6829f63eba3eead26774",
                      "0a1a5c523d835459c42f33e863623138555e2526",
                      "1143db502761379c2bfcecc2007fc34282e7ee61",
                      "0a1a5c523d835459c42f33e863623138555e2526"),
        base.file = c("test2.c", "test2.c", "test3.c", "test2.c"),
        base.author = c("Olaf", "Thomas", "Karl", "Thomas"),
        interacting.author = c("Thomas", "Karl", "Olaf", "Thomas"),
        weight = c(1, 1, 1, 1),
        type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA),
        relation = c("commit.interaction", "commit.interaction", "commit.interaction", "commit.interaction")
    )
    network = igraph::graph.data.frame(edges, directed = test.directed, vertices = vertices)

    expect_true(igraph::identical_graphs(network.built, network))
}, patrick::cases(
    "directed: FALSE" = list(test.directed = FALSE),
    "directed: TRUE" = list(test.directed = TRUE)
))