[#569] Allow finding previous authors for ignored commits (#1565)
The current implementation of RepoSense assigns lines last modified by
a commit in the IgnoreCommitList to an unknown author. As a result,
when a bulk change (e.g., a formatting change) is made in one commit
and that commit is added to the ignore commit list, the
`authorship.json` file generated by RepoSense contains little
meaningful information, as most of the lines are assigned to an
unknown author.

Let's add a --find-previous-authors flag (alias -F) that tells
RepoSense to use git blame's --ignore-revs-file argument, and generate
a .git-blame-ignore-revs file in each analyzed repository. Lines
changed by an ignored commit are then attributed to the previous
author who modified them (where available), giving users a more
meaningful authorship report when a bulk change occurs in the
repositories analyzed.
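A minimal sketch of the file described above, assuming SHA-1 repositories: git documents `--ignore-revs-file` as taking one unabbreviated object name per line (the same format as `fsck.skipList`). The class and method names are hypothetical and this is not RepoSense's actual implementation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper that writes the file read by `git blame --ignore-revs-file`. */
class IgnoreRevsFileWriter {
    static final String FILE_NAME = ".git-blame-ignore-revs";

    // git expects unabbreviated object names here (40 hex digits in SHA-1 repositories)
    private static final Pattern FULL_SHA1 = Pattern.compile("[0-9a-f]{40}");

    /** Builds the file content: one full commit hash per line. */
    static String buildContent(List<String> fullHashes) {
        StringBuilder content = new StringBuilder();
        for (String hash : fullHashes) {
            if (!FULL_SHA1.matcher(hash).matches()) {
                throw new IllegalArgumentException("Expected a full 40-character hash: " + hash);
            }
            content.append(hash).append('\n');
        }
        return content.toString();
    }

    /** Writes the file into the repository root and returns its path. */
    static Path write(Path repoRoot, List<String> fullHashes) throws IOException {
        Path file = repoRoot.resolve(FILE_NAME);
        Files.write(file, buildContent(fullHashes).getBytes(StandardCharsets.UTF_8));
        return file;
    }
}
```

The file name matches `GitBlame.IGNORE_COMMIT_LIST_FILE_NAME` in the diff below, and git blame is then run from the repository root with `--ignore-revs-file .git-blame-ignore-revs`.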
FH-30 authored Sep 10, 2021
1 parent dc26880 commit 387401f
Showing 73 changed files with 767 additions and 100 deletions.
1 change: 0 additions & 1 deletion .gitignore
@@ -19,5 +19,4 @@ package-lock.json
reposense-report/

docs/_site/

**/.DS_Store
20 changes: 10 additions & 10 deletions config/repo-config.csv
@@ -1,10 +1,10 @@
-Repository's Location,Branch,File formats,Ignore Glob List,Ignore standalone config,Ignore Commits List,Ignore Authors List,Shallow Cloning
-https://github.com/reposense/testrepo-Alpha.git,master,,,,2fb6b9b2dd9fa40bf0f9815da2cb0ae8731436c7;c5a6dc774e22099cd9ddeb0faff1e75f9cf4f151;cd7f610e0becbdf331d5231887d8010a689f87c7;768015345e70f06add2a8b7d1f901dc07bf70582,,
-https://github.com/reposense/testrepo-Beta.git,master,fxml,docs**,yes,,,
-https://github.com/reposense/testrepo-Beta.git,add-config-json,fxml,docs**,yes,,,
-https://github.com/reposense/testrepo-Delta.git,master,override:java;md,,,,,
-https://github.com/reposense/testrepo-Delta.git,nonExistentBranch,,,,,,
-https://github.com/reposense/testrepo-Delta.git,add-binary-file,,,,,,
-https://github.com/reposense/RepoSense.git,master,,,,,,
-https://github.com/reposense/testrepo-Empty.git,master,,,,,,
-ftp://github.com/reposense/RepoSense.git,master,,,,,,
+Repository's Location,Branch,File formats,Ignore Glob List,Ignore standalone config,Ignore Commits List,Ignore Authors List,Shallow Cloning,Find Previous Authors
+https://github.com/reposense/testrepo-Alpha.git,master,,,,2fb6b9b2dd9fa40bf0f9815da2cb0ae8731436c7;c5a6dc774e22099cd9ddeb0faff1e75f9cf4f151;cd7f610e0becbdf331d5231887d8010a689f87c7;768015345e70f06add2a8b7d1f901dc07bf70582,,,
+https://github.com/reposense/testrepo-Beta.git,master,fxml,docs**,yes,,,,
+https://github.com/reposense/testrepo-Beta.git,add-config-json,fxml,docs**,yes,,,,
+https://github.com/reposense/testrepo-Delta.git,master,override:java;md,,,,,,
+https://github.com/reposense/testrepo-Delta.git,nonExistentBranch,,,,,,,
+https://github.com/reposense/testrepo-Delta.git,add-binary-file,,,,,,,
+https://github.com/reposense/RepoSense.git,master,,,,,,,
+https://github.com/reposense/testrepo-Empty.git,master,,,,,,,
+ftp://github.com/reposense/RepoSense.git,master,,,,,,,
1 change: 1 addition & 0 deletions docs/dg/architecture.md
@@ -38,6 +38,7 @@
* [`GitRevParse`](https://github.com/reposense/RepoSense/blob/master/src/main/java/reposense/git/GitRevParse.java): Wrapper class for `git rev-parse` functionality. Ensures that the branch of the repo is to be analyzed exists.
* [`GitShortlog`](https://github.com/reposense/RepoSense/blob/master/src/main/java/reposense/git/GitShortlog.java): Wrapper class for `git shortlog` functionality. Obtains the list of authors who have contributed to the target repo.
* [`GitUtil`](https://github.com/reposense/RepoSense/blob/master/src/main/java/reposense/git/GitUtil.java): Contains helper functions used by the other Git classes above.
+* [`GitVersion`](https://github.com/reposense/RepoSense/blob/master/src/main/java/reposense/git/GitVersion.java): Wrapper class for `git --version` functionality. Obtains the current git version of the environment that RepoSense is being run on.

<!-- ==================================================================================================== -->

14 changes: 12 additions & 2 deletions docs/ug/cli.md
@@ -16,10 +16,10 @@ The command `java -jar RepoSense.jar` takes several flags.
**Examples**:

An example of a command using most parameters:<br>
-`java -jar RepoSense.jar --repos https://github.com/reposense/RepoSense.git --output ./report_folder --since 31/1/2017 --until 31/12/2018 --formats java adoc xml --view --ignore-standalone-config --last-modified-date --timezone UTC+08`
+`java -jar RepoSense.jar --repos https://github.com/reposense/RepoSense.git --output ./report_folder --since 31/1/2017 --until 31/12/2018 --formats java adoc xml --view --ignore-standalone-config --last-modified-date --timezone UTC+08 --find-previous-authors`

Same command as above but using most parameters in alias format:<br>
-`java -jar RepoSense.jar -r https://github.com/reposense/RepoSense.git -o ./report_folder -s 31/1/2017 -u 31/12/2018 -f java adoc xml -v -i -l -t UTC+08`
+`java -jar RepoSense.jar -r https://github.com/reposense/RepoSense.git -o ./report_folder -s 31/1/2017 -u 31/12/2018 -f java adoc xml -v -i -l -t UTC+08 -F`
</box>

The section below provides explanations for each of the flags.
@@ -73,6 +73,16 @@ The section below provides explanations for each of the flags.

Binary file formats, such as `jpg`, `png`,`exe`,`zip`, `rar`, `docx`, and `pptx`, all will be labelled as the file type `binary` in the generated report.
</box>

+<!-- ------------------------------------------------------------------------------------------------------ -->
+
+### `--find-previous-authors`, `-F`
+
+**`--find-previous-authors`**: Utilizes Git blame's ignore revisions functionality, RepoSense will attempt to blame the line changes caused by commits in the ignore commit list to the previous authors who altered those lines (if available).
+* Default: RepoSense will assume that no authors are responsible for the code changes in the lines altered by commits in the ignore commit list.
+* Alias: `-F` (uppercase F)
+* Example:`--find-previous-authors` or `-F`
+
+<!-- ------------------------------------------------------------------------------------------------------ -->

### `--help`, `-h`
1 change: 1 addition & 0 deletions docs/ug/configFiles.md
@@ -34,6 +34,7 @@ Column Name | Explanation
Repository's Location {{ mandatory }}| The `GitHub URL` or `Disk Path` to the git repository e.g., `https://github.com/foo/bar.git` or `C:\Users\user\Desktop\GitHub\foo\bar`
Branch | The branch to analyze in the target repository e.g., `master`. Default: the default branch of the repo
File formats<sup>*+</sup> | The file extensions to analyze. Binary file formats, such as `png` and `jpg`, will be automatically labelled as the file type `binary` in the generated report. Default: all file formats
+Find Previous Authors| Enter **`yes`** to utilize Git blame's ignore revisions functionality, RepoSense will attempt to blame the line changes caused by commits in the ignore commit list to the previous authors who altered those lines (if available).
Ignore Glob List<sup>*+</sup> | The list of file path globs to ignore during analysis for each author e.g., `test/**;temp/**`. Refer to the [_glob format_](https://docs.oracle.com/javase/tutorial/essential/io/fileOps.html#glob) for the path glob syntax.
Ignore standalone config | To ignore the standalone config file (if any) in target repository, enter **`yes`**. If the cell is empty, the standalone config file in the repo (if any) will take precedence over configurations provided in the csv files.
Ignore Commits List<sup>*+</sup> | The list of commits to ignore during analysis. For accurate results, the commits should be provided with their full hash. Additionally, a range of commits can be specified using the `..` notation e.g. `abc123..def456` (both inclusive).
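The new Find Previous Authors column takes **`yes`** or an empty cell, like Ignore standalone config, and every existing CSV row gains one trailing comma so its field count matches the new header. A minimal sketch of reading that cell, assuming a naive comma split with no quoted fields and treating only a case-insensitive `yes` as enabled; both are assumptions, and the class name is hypothetical rather than RepoSense's CSV parser.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical reader for the "Find Previous Authors" column of repo-config.csv. */
class FindPreviousAuthorsColumn {
    static final String HEADER = "Find Previous Authors";

    /** Returns true if the row's "Find Previous Authors" cell is "yes" (case-insensitive). */
    static boolean isEnabled(String headerLine, String rowLine) {
        // A limit of -1 keeps trailing empty cells, such as the ",,," ending most rows
        List<String> headers = Arrays.asList(headerLine.split(",", -1));
        String[] cells = rowLine.split(",", -1);
        int index = headers.indexOf(HEADER);
        if (index < 0 || index >= cells.length) {
            return false; // column absent (older config format) or row too short
        }
        return cells[index].trim().equalsIgnoreCase("yes");
    }
}
```

Without the `-1` limit, `String.split` drops trailing empty strings, so a row ending in empty cells would look shorter than the header.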
10 changes: 5 additions & 5 deletions docs/ug/repo-config.csv
@@ -1,5 +1,5 @@
-Repository's Location,Branch,File formats,Ignore Glob List,Ignore standalone config,Ignore Commits List,Ignore Authors List,Shallow Cloning
-https://github.com/reposense/testrepo-Alpha.git,master,,,,2fb6b9b2dd9fa40bf0f9815da2cb0ae8731436c7;c5a6dc774e22099cd9ddeb0faff1e75f9cf4f151;cd7f610e0becbdf331d5231887d8010a689f87c7;768015345e70f06add2a8b7d1f901dc07bf70582,,
-https://github.com/reposense/testrepo-Beta.git,master,java;adocs,docs**,yes,,,
-https://github.com/reposense/testrepo-Delta.git,master,html;css,,,,Ahmad Syafiq;Jordan Chong,
-https://github.com/reposense/RepoSense.git,master,,,,,,
+Repository's Location,Branch,File formats,Ignore Glob List,Ignore standalone config,Ignore Commits List,Ignore Authors List,Shallow Cloning,Find Previous Authors
+https://github.com/reposense/testrepo-Alpha.git,master,,,,2fb6b9b2dd9fa40bf0f9815da2cb0ae8731436c7;c5a6dc774e22099cd9ddeb0faff1e75f9cf4f151;cd7f610e0becbdf331d5231887d8010a689f87c7;768015345e70f06add2a8b7d1f901dc07bf70582,,,
+https://github.com/reposense/testrepo-Beta.git,master,java;adocs,docs**,yes,,,,
+https://github.com/reposense/testrepo-Delta.git,master,html;css,,,,Ahmad Syafiq;Jordan Chong,,
+https://github.com/reposense/RepoSense.git,master,,,,,,,
10 changes: 10 additions & 0 deletions src/main/java/reposense/RepoSense.java
@@ -11,6 +11,7 @@
import java.util.logging.Logger;

import net.sourceforge.argparse4j.helper.HelpScreenException;
+import reposense.git.GitVersion;
import reposense.model.AuthorConfiguration;
import reposense.model.CliArguments;
import reposense.model.ConfigCliArguments;
@@ -74,6 +75,15 @@ public static void main(String[] args) {
cliArguments.isLastModifiedDateIncluded());
RepoConfiguration.setIsShallowCloningPerformedToRepoConfigs(configs,
cliArguments.isShallowCloningPerformed());
+RepoConfiguration.setIsFindingPreviousAuthorsPerformedToRepoConfigs(configs,
+cliArguments.isFindingPreviousAuthorsPerformed());
+
+if (RepoConfiguration.isAnyRepoFindingPreviousAuthors(configs)
+&& !GitVersion.isGitVersionSufficientForFindingPreviousAuthors()) {
+logger.warning(GitVersion.FINDING_PREVIOUS_AUTHORS_INVALID_VERSION_WARNING_MESSAGE);
+RepoConfiguration.setToFalseIsFindingPreviousAuthorsPerformedToRepoConfigs(configs);
+}
+
List<Path> reportFoldersAndFiles = ReportGenerator.generateReposReport(configs,
cliArguments.getOutputFilePath().toAbsolutePath().toString(),
cliArguments.getAssetsFilePath().toAbsolutePath().toString(), reportConfig,
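The check added to `main()` follows a simple rule: if any repository asks for previous-author lookup but the installed git is older than 2.23, log one warning and switch the option off for every repository instead of failing partway through the run. A self-contained sketch of that rule, using a hypothetical `RepoSetting` class in place of `RepoConfiguration` and a boolean in place of the git version probe; it leaves out how the CLI flag and per-repo CSV values are combined.

```java
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical stand-in for RepoConfiguration, holding only the flag relevant here. */
class RepoSetting {
    boolean isFindingPreviousAuthorsPerformed;

    RepoSetting(boolean isFindingPreviousAuthorsPerformed) {
        this.isFindingPreviousAuthorsPerformed = isFindingPreviousAuthorsPerformed;
    }
}

/** Hypothetical sketch of the version gate applied before report generation. */
class FindPreviousAuthorsGate {
    private static final Logger logger = Logger.getLogger(FindPreviousAuthorsGate.class.getName());

    /** Disables the feature for every repo when any repo requests it on an unsupported git version. */
    static void apply(List<RepoSetting> settings, boolean isGitVersionSufficient) {
        boolean isAnyRequested = settings.stream().anyMatch(s -> s.isFindingPreviousAuthorsPerformed);
        if (isAnyRequested && !isGitVersionSufficient) {
            logger.warning("--find-previous-authors/-F requires git version 2.23 and above. "
                    + "Feature will be disabled for this run");
            settings.forEach(s -> s.isFindingPreviousAuthorsPerformed = false);
        }
    }
}
```

Checking once up front means the version probe (`git --version`) runs a single time per run rather than once per repository.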
16 changes: 15 additions & 1 deletion src/main/java/reposense/authorship/FileInfoAnalyzer.java
@@ -138,7 +138,14 @@ private static FileResult generateBinaryFileResult(RepoConfiguration config, Fil
* on the file.
*/
private static void aggregateBlameAuthorModifiedAndDateInfo(RepoConfiguration config, FileInfo fileInfo) {
-String blameResults = getGitBlameResult(config, fileInfo.getPath());
+String blameResults;
+
+if (!config.isFindingPreviousAuthorsPerformed()) {
+blameResults = getGitBlameResult(config, fileInfo.getPath());
+} else {
+blameResults = getGitBlameWithPreviousAuthorsResult(config, fileInfo.getPath());
+}
+
String[] blameResultLines = blameResults.split("\n");
Path filePath = Paths.get(fileInfo.getPath());
Long sinceDateInMs = config.getSinceDate().getTime();
@@ -180,4 +187,11 @@ private static void aggregateBlameAuthorModifiedAndDateInfo(RepoConfiguration co
private static String getGitBlameResult(RepoConfiguration config, String filePath) {
return GitBlame.blame(config.getRepoRoot(), filePath);
}
+
+/**
+* Returns the analysis result from running git blame with finding previous authors enabled on {@code filePath}.
+*/
+private static String getGitBlameWithPreviousAuthorsResult(RepoConfiguration config, String filePath) {
+return GitBlame.blameWithPreviousAuthors(config.getRepoRoot(), filePath);
+}
}
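Both branches above return `--line-porcelain` output filtered through the same regex, so the rest of the method parses them identically: `--ignore-revs-file` changes which commit each line is attributed to, not the output format. With `--line-porcelain`, git repeats the full header, including an `author <name>` line, for every line of the file, so per-author line counts can be sketched as below. The class name is hypothetical, and this simplified parser skips the date filtering RepoSense applies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical simplified parser for `git blame --line-porcelain` output. */
class PorcelainAuthorCounter {
    /** Counts how many lines of the file each author is blamed for. */
    static Map<String, Integer> countLinesPerAuthor(String porcelainOutput) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : porcelainOutput.split("\n")) {
            // "author <name>" appears once per blamed line; "author-mail" and "author-time"
            // fail the check because of the trailing space, and file content lines start
            // with a tab, so source text that happens to begin with "author " never matches.
            if (line.startsWith("author ")) {
                String author = line.substring("author ".length());
                counts.merge(author, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```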
15 changes: 15 additions & 0 deletions src/main/java/reposense/git/GitBlame.java
@@ -13,6 +13,7 @@
* Git blame is responsible for showing which revision and author last modified each line of a file.
*/
public class GitBlame {
+public static final String IGNORE_COMMIT_LIST_FILE_NAME = ".git-blame-ignore-revs";

private static final String COMMIT_HASH_REGEX = "(^[0-9a-f]{40} .*)";
private static final String AUTHOR_NAME_REGEX = "(^author .*)";
@@ -33,4 +34,18 @@ public static String blame(String root, String fileDirectory) {

return StringsUtil.filterText(runCommand(rootPath, blameCommand), COMBINATION_REGEX);
}
+
+/**
+* Returns the raw git blame result with finding previous authors enabled for the {@code fileDirectory},
+* performed at the {@code root} directory.
+*/
+public static String blameWithPreviousAuthors(String root, String fileDirectory) {
+Path rootPath = Paths.get(root);
+
+String blameCommandWithFindingPreviousAuthors = "git blame -w --line-porcelain --ignore-revs-file";
+blameCommandWithFindingPreviousAuthors += " " + addQuote(IGNORE_COMMIT_LIST_FILE_NAME);
+blameCommandWithFindingPreviousAuthors += " " + addQuote(fileDirectory);
+
+return StringsUtil.filterText(runCommand(rootPath, blameCommandWithFindingPreviousAuthors), COMBINATION_REGEX);
+}
}
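`blameWithPreviousAuthors` builds one command string and quotes each argument with `addQuote`. An alternative, sketched here under the assumption that the command is launched with `ProcessBuilder` rather than RepoSense's `CommandRunner.runCommand`, is to pass the arguments as a list, so no shell quoting is needed and a `--` separator keeps file names starting with `-` from being read as options. The class and method names are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical builder for the git blame invocation, with or without ignored revisions. */
class BlameCommand {
    static final String IGNORE_REVS_FILE = ".git-blame-ignore-revs";

    /** Returns the argument list; each element reaches git verbatim, so no quoting is needed. */
    static List<String> build(String filePath, boolean isFindingPreviousAuthors) {
        List<String> args = new ArrayList<>(Arrays.asList("git", "blame", "-w", "--line-porcelain"));
        if (isFindingPreviousAuthors) {
            // Relative path, resolved against the working directory set in processFor()
            args.add("--ignore-revs-file");
            args.add(IGNORE_REVS_FILE);
        }
        args.add("--"); // everything after this is a path, never an option
        args.add(filePath);
        return args;
    }

    /** Creates (but does not start) a process that runs the blame inside repoRoot. */
    static ProcessBuilder processFor(String repoRoot, String filePath, boolean isFindingPreviousAuthors) {
        return new ProcessBuilder(build(filePath, isFindingPreviousAuthors))
                .directory(new File(repoRoot));
    }
}
```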
26 changes: 26 additions & 0 deletions src/main/java/reposense/git/GitShow.java
@@ -6,11 +6,14 @@
import java.nio.file.Paths;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.logging.Logger;
import java.util.stream.Collectors;

import reposense.git.exception.CommitNotFoundException;
import reposense.model.CommitHash;
import reposense.system.LogsManager;

/**
@@ -21,6 +24,29 @@ public class GitShow {

private static final Logger logger = LogsManager.getLogger(GitShow.class);

+/**
+* Returns expanded form of the commit hash associated with the {@code shortCommitHash}
+*/
+public static CommitHash getExpandedCommitHash(String root, String shortCommitHash) throws CommitNotFoundException {
+Path rootPath = Paths.get(root);
+String showCommand = "git show -s --format=%H " + shortCommitHash;
+
+try {
+String output = runCommand(rootPath, showCommand);
+List<CommitHash> commitHashes = Arrays.stream(output.split("\n"))
+.map(CommitHash::new).collect(Collectors.toList());
+if (commitHashes.size() > 1) {
+logger.warning(String.format("%s can be expanded to %d different commits, "
++ "assuming %s refers to commit hash %s",
+shortCommitHash, commitHashes.size(), shortCommitHash, commitHashes.get(0)));
+}
+
+return commitHashes.get(0);
+} catch (RuntimeException re) {
+throw new CommitNotFoundException("Commit not found: " + shortCommitHash);
+}
+}
+
/**
* Returns date of commit associated with commit hash.
*/
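Git documents that `--ignore-revs-file` expects unabbreviated object names, which is presumably why abbreviated hashes from the Ignore Commits List are expanded with `git show -s --format=%H`. A sketch of two small helpers around that step, with hypothetical names: one checks whether a hash is already full-length (assuming SHA-1, i.e. 40 hex digits), and the other takes the first non-blank line of the command's output, returning an empty `Optional` rather than an empty hash when there is nothing to read.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helpers around `git show -s --format=%H <hash>` output. */
class CommitHashExpansion {
    private static final Pattern FULL_SHA1 = Pattern.compile("[0-9a-f]{40}");

    /** True if the hash is already in the unabbreviated form the ignore-revs file expects. */
    static boolean isFullHash(String hash) {
        return FULL_SHA1.matcher(hash).matches();
    }

    /** Returns the first non-blank line of the command output, if any. */
    static Optional<String> firstExpandedHash(String showOutput) {
        // "".split("\n") yields one empty element, so blank entries are filtered out
        return Arrays.stream(showOutput.split("\n"))
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .findFirst();
    }
}
```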
38 changes: 38 additions & 0 deletions src/main/java/reposense/git/GitVersion.java
@@ -0,0 +1,38 @@
package reposense.git;

import static reposense.system.CommandRunner.runCommand;

import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

/**
* Contains git version related functionalities.
* Git version is responsible for finding out the version of git the user of RepoSense is running.
*/
public class GitVersion {
public static final String FINDING_PREVIOUS_AUTHORS_INVALID_VERSION_WARNING_MESSAGE =
"--find-previous-authors/-F requires git version 2.23 and above. Feature will be disabled for this run";

/** Regex for matching Git version 2.23 and above */
public static final Pattern FINDING_PREVIOUS_AUTHORS_VALID_GIT_VERSION_PATTERN =
Pattern.compile("(((2\\d*\\.(2[3-9]\\d*|[3-9]\\d+|\\d{3,}))|((([3-9]\\d*)|(1\\d+))\\.\\d+))\\.\\d*)");

/**
* Get current git version of RepoSense user
*/
public static String getGitVersion() {
Path rootPath = Paths.get("/");
String versionCommand = "git --version";

return runCommand(rootPath, versionCommand);
}

/**
* Returns a boolean indicating whether the current user has a version valid for running
* Find Previous Authors functionality in RepoSense.
*/
public static boolean isGitVersionSufficientForFindingPreviousAuthors() {
return FINDING_PREVIOUS_AUTHORS_VALID_GIT_VERSION_PATTERN.matcher(getGitVersion()).find();
}
}
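Because `isGitVersionSufficientForFindingPreviousAuthors` calls `find()`, the pattern can match any substring of the `git --version` output, not only the version number itself. For example, in `git version 1.8.3.1` the substring `8.3.1` satisfies the `[3-9]\d*\.\d+` alternative followed by the trailing `\.\d*`, so that output is accepted even though 1.8 predates 2.23. A sketch of an alternative that anchors on the `git version` prefix and compares major and minor numbers numerically; the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical numeric check of `git --version` output against a minimum major.minor version. */
class GitVersionCheck {
    // Anchored on the "git version" prefix so digits elsewhere in the output cannot match
    private static final Pattern VERSION = Pattern.compile("^git version (\\d+)\\.(\\d+)");

    /** Returns true if the reported version is at least minMajor.minMinor; false if unparseable. */
    static boolean isAtLeast(String versionOutput, int minMajor, int minMinor) {
        Matcher m = VERSION.matcher(versionOutput.trim());
        if (!m.find()) {
            return false;
        }
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        return major > minMajor || (major == minMajor && minor >= minMinor);
    }
}
```

Usage: `GitVersionCheck.isAtLeast(GitVersion.getGitVersion(), 2, 23)`. Platform suffixes such as `.windows.2` or `(Apple Git-…)` are ignored because only the first two numbers after the prefix are read.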
8 changes: 7 additions & 1 deletion src/main/java/reposense/model/CliArguments.java
@@ -23,6 +23,7 @@ public abstract class CliArguments {
protected int numCloningThreads;
protected int numAnalysisThreads;
protected ZoneId zoneId;
+protected boolean isFindingPreviousAuthorsPerformed;

public ZoneId getZoneId() {
return zoneId;
@@ -80,6 +81,10 @@ public int getNumAnalysisThreads() {
return numAnalysisThreads;
}

+public boolean isFindingPreviousAuthorsPerformed() {
+return isFindingPreviousAuthorsPerformed;
+}
+
@Override
public boolean equals(Object other) {
// short circuit if same object
@@ -106,6 +111,7 @@ public boolean equals(Object other) {
&& this.isStandaloneConfigIgnored == otherCliArguments.isStandaloneConfigIgnored
&& this.numCloningThreads == otherCliArguments.numCloningThreads
&& this.numAnalysisThreads == otherCliArguments.numAnalysisThreads
-&& this.zoneId.equals(otherCliArguments.zoneId);
+&& this.zoneId.equals(otherCliArguments.zoneId)
+&& this.isFindingPreviousAuthorsPerformed == otherCliArguments.isFindingPreviousAuthorsPerformed;
}
}
4 changes: 3 additions & 1 deletion src/main/java/reposense/model/ConfigCliArguments.java
@@ -28,7 +28,8 @@ public ConfigCliArguments(Path configFolderPath, Path outputFilePath, Path asset
Date untilDate, boolean isSinceDateProvided, boolean isUntilDateProvided, int numCloningThreads,
int numAnalysisThreads, List<FileType> formats, boolean isLastModifiedDateIncluded,
boolean isShallowCloningPerformed, boolean isAutomaticallyLaunching,
-boolean isStandaloneConfigIgnored, ZoneId zoneId, ReportConfiguration reportConfiguration) {
+boolean isStandaloneConfigIgnored, ZoneId zoneId, ReportConfiguration reportConfiguration,
+boolean isFindingPreviousAuthorsPerformed) {
this.configFolderPath = configFolderPath.equals(EMPTY_PATH)
? configFolderPath.toAbsolutePath()
: configFolderPath;
@@ -51,6 +52,7 @@ public ConfigCliArguments(Path configFolderPath, Path outputFilePath, Path asset
this.numAnalysisThreads = numAnalysisThreads;
this.zoneId = zoneId;
this.reportConfiguration = reportConfiguration;
+this.isFindingPreviousAuthorsPerformed = isFindingPreviousAuthorsPerformed;
}

public Path getConfigFolderPath() {
3 changes: 2 additions & 1 deletion src/main/java/reposense/model/LocationsCliArguments.java
@@ -15,7 +15,7 @@ public LocationsCliArguments(List<String> locations, Path outputFilePath, Path a
Date untilDate, boolean isSinceDateProvided, boolean isUntilDateProvided, int numCloningThreads,
int numAnalysisThreads, List<FileType> formats, boolean isLastModifiedDateIncluded,
boolean isShallowCloningPerformed, boolean isAutomaticallyLaunching,
-boolean isStandaloneConfigIgnored, ZoneId zoneId) {
+boolean isStandaloneConfigIgnored, ZoneId zoneId, boolean isFindingPreviousAuthorsPerformed) {
this.locations = locations;
this.outputFilePath = outputFilePath;
this.assetsFilePath = assetsFilePath;
@@ -31,6 +31,7 @@ public LocationsCliArguments(List<String> locations, Path outputFilePath, Path a
this.numCloningThreads = numCloningThreads;
this.numAnalysisThreads = numAnalysisThreads;
this.zoneId = zoneId;
+this.isFindingPreviousAuthorsPerformed = isFindingPreviousAuthorsPerformed;
}

public List<String> getLocations() {