Additional Core/Peripheral Classification Methods #276

Draft
wants to merge 4 commits into
base: dev

Leo-Send
Contributor

@Leo-Send Leo-Send commented Dec 11, 2024

Prerequisites

  • I adhere to the coding conventions (described here) in my code.
  • I have updated the copyright headers of the files I have modified.
  • I have written appropriate commit messages, i.e., I have recorded the goal, the need, the needed changes, and the location of my code modifications for each commit. This also includes, e.g., referencing relevant issues.
  • I have put signed-off tags in all commits.
  • I have updated the changelog file NEWS.md appropriately.
  • I have checked whether I need to adjust the showcase file showcase.R with respect to my changes.
  • The pull request is opened against the branch dev.

Description

Add four new metrics that can be used for the classification of authors into core and peripheral:
Betweenness, which measures the number of shortest paths between developers that go through a given developer vertex;
Closeness, which measures how close a developer is to all others by taking the inverse of the sum of its shortest-path distances to all other developers;
PageRank, which is based on Google's PageRank algorithm and is closely related to eigenvector centrality;
Eccentricity, which measures the distance to the furthest developer vertex.
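
As a rough illustration of the four metrics (not the code of this pull request), the following sketch computes them with igraph on a hypothetical toy network. The network and all variable names are assumptions made only for this example.

```r
library(igraph)

## Hypothetical undirected toy network (not part of this PR):
## path a - b - c - d, plus vertex e attached to b
toy.network = igraph::graph_from_literal(a - b - c - d, b - e)

## Betweenness: number of shortest paths between other vertices that pass through a vertex
betweenness.vec = igraph::betweenness(toy.network, directed = FALSE)

## Closeness: inverse of the sum of the shortest-path distances to all other vertices
closeness.vec = igraph::closeness(toy.network, mode = "all")

## PageRank: 'page_rank' returns a list, the scores are stored in its 'vector' element
pagerank.vec = igraph::page_rank(toy.network)$vector

## Eccentricity: distance to the furthest vertex
eccentricity.vec = igraph::eccentricity(toy.network, mode = "all")

## Collect all metrics in one data frame for a side-by-side comparison
metrics.dataframe = data.frame(author.name = names(betweenness.vec),
                               betweenness = as.vector(betweenness.vec),
                               closeness = as.vector(closeness.vec),
                               pagerank = as.vector(pagerank.vec),
                               eccentricity = as.vector(eccentricity.vec))
print(metrics.dataframe)
```

In this toy network, vertex b lies on the largest number of shortest paths, whereas the end vertices a, d, and e have the largest eccentricity. Unlike the other three metrics, a lower eccentricity indicates a more central vertex.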

Changelog

Added

Base implementation for new classification metrics.
Documentation and testing still missing.

Signed-off-by: Leo Sendelbach <[email protected]>
Tests use an already existing network, so the test cases are quite small.
Additional research into potential rounding errors may be required.

Signed-off-by: Leo Sendelbach <[email protected]>
Add default documentation, same as for already existing
classification methods

Signed-off-by: Leo Sendelbach <[email protected]>
Add new entry under 'unversioned'

Signed-off-by: Leo Sendelbach <[email protected]>
@Leo-Send Leo-Send changed the title from "Pullrequest" to "Additional Core/Peripheral Classification Methods" Dec 11, 2024
Collaborator

@bockthom bockthom left a comment

I had a quick look at the implementation (but not yet at the tests).
Please find my initial comments below.

@@ -96,7 +101,7 @@ CLASSIFICATION.TYPE.TO.CATEGORY = list(
 #' Network-based options/metrics (parameter \code{network} has to be specified):
 #' - "network.degree"
 #' - "network.eigen"
-#' - "network.hierarchy"
+#' - "network.hierarchy" ###TODO check all documentation

Please don't forget about this TODO 😉

Comment on lines +261 to +263
## since core developers are expected to have a lower eccentricity,
## we need to invert all non-zero values
indices = which(eccentricity.vec > 0)

From the description it is not clear what happens for zero values...
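
To make the question concrete, here is a minimal sketch of one plausible reading of the comment above. It assumes that "invert" means taking the reciprocal, which the excerpt does not show. The example values are hypothetical; igraph reports an eccentricity of 0 for isolated vertices, which is where zero values can come from.

```r
## Hypothetical eccentricity values; "isolated" stands for a vertex without any edges
eccentricity.vec = c(a = 3, b = 2, c = 2, isolated = 0)

## Assumption: inverting means taking the reciprocal of all non-zero values
indices = which(eccentricity.vec > 0)
inverted.vec = eccentricity.vec
inverted.vec[indices] = 1 / eccentricity.vec[indices]

print(inverted.vec)
```

Under this reading, zero values stay at 0 and therefore end up below every inverted value, so isolated vertices would be ranked last. Whether that is the intended behavior should be stated explicitly in the code comment.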

Comment on lines +249 to +253
} else if (type == "network.closeness") {
    closeness.centrality.vec = igraph::closeness(network)
    ## Construct centrality dataframe
    centrality.dataframe = data.frame(author.name = names(closeness.centrality.vec),
                                      centrality = as.vector(closeness.centrality.vec))

I am not sure about the mode parameter for closeness. For the degree, we use "all" (if I am not mistaken, please check what we really use), but for closeness, the default seems to be "out". While this looks like an inconsistency in igraph that both functions have different default values, I am not sure whether there is an actual reason why closeness has "out" as default.

Could you please check that with igraph documentation and with small examples of directed networks whether we should use "out" or "all" here? In general, I would like to preserve consistency, but there might be reasons to deviate from consistency 😉
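
A small experiment along these lines could look like the following sketch. The three-vertex network is a hypothetical example, and the values for mode = "out" may depend on the installed igraph version, since, as far as I know, the handling of unreachable vertices has changed between releases.

```r
library(igraph)

## Hypothetical directed toy network: a -> b -> c
directed.network = igraph::graph_from_literal(a -+ b -+ c)

## mode = "out": only paths that follow the edge directions are considered,
## so the sink vertex c cannot reach any other vertex
closeness.out = igraph::closeness(directed.network, mode = "out")

## mode = "all": edge directions are ignored, as in an undirected network
closeness.all = igraph::closeness(directed.network, mode = "all")

print(data.frame(author.name = names(closeness.all),
                 closeness.out = as.vector(closeness.out),
                 closeness.all = as.vector(closeness.all)))
```

With mode = "all", the middle vertex b gets the highest closeness, which matches the intuition for the undirected case. With mode = "out", the result reflects only whom a vertex can reach, so it is worth checking how vertices without outgoing edges are treated before choosing it.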

Comment on lines +706 to +707
get.author.class.network.betweenness = function(network, result.limit = NULL,
restrict.classification.to.authors = NULL) {

Indentation wrong. (Also applies to some of the functions below.)

@@ -15,6 +15,7 @@
- Add commit network as a new type of network. It uses commits as vertices and connects them either via cochange or commit interactions. This includes adding new config parameters and the function `add.vertex.attribute.commit.network` for adding vertex attributes to a commit network (PR #263, ab73271781e8e9a0715f784936df4b371d64c338, ab73271781e8e9a0715f784936df4b371d64c338, cd9a930fcb54ff465c2a5a7c43cfe82ac15c134d)
- Add `remove.duplicate.edges` function that takes a network as input and conflates identical edges (PR #268, d9a4be417b340812b744f59398ba6460ba527e1c, 0c2f47c4fea6f5f2f582c0259f8cf23af985058a, c6e90dd9cb462232563f753f414da14a24b392a3)
- Add `cumulative` as an argument to `construct.ranges` which enables the creation of cumulative ranges from given revisions (PR #268, a135f6bb6f83ccb03ae27c735c2700fccc1ee0c8, 8ec207f1e306ef6a641fb0205a9982fa89c7e0d9)
- Add four new metric which can be used for the classification of authors into core and peripheral: Betweenness, Closeness, Pagerank and Eccentricity (PR #276, 65d5c9cc86708777ef458b0c2e744ab4b846bdd1, b392d1a125d0f306b4bce8d95032162a328a3ce2, c5d37d40024e32ad5778fa5971a45bc08f7631e0)

metric ➡️ metrics
which ➡️ that
And there is no need to capitalize the metrics' names. But please put a comma before the final occurrence of "and".

2 participants