Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added blame info feature for "cmd results" command #4008

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

cservakt
Copy link
Collaborator

Blame information for reports was only available on the GUI. Now, if we wish to check the git commit info in the cli, we can do that with "CodeChecker cmd results --details" command We can only check blame info for runs that have a git repository. The server address and the run name should also be given, e.g.: "CodeChecker cmd results --details --url http://localhost:8001/Default test". The syntax is different in the GUI, as there you can check the blame info for each line of a reported file. In case of the cli, it would be huge output result to print long blame info for every report. A filtered one is added to the report data, which contains only the commit data of the given section.

@cservakt cservakt added API change 📄 Content of patch changes API! WIP 💣 Work In Progress CLI 💻 Related to the command-line interface, such as the cmd, store, etc. commands CI 📦 new feature 👍 New feature request labels Sep 13, 2023
@cservakt cservakt added this to the release 6.23.0 milestone Sep 13, 2023
@cservakt cservakt requested a review from Szelethus September 13, 2023 15:27
@cservakt cservakt self-assigned this Sep 13, 2023
@cservakt cservakt force-pushed the cmd-results-blameinfo branch 2 times, most recently from 76e443e to 5baf2f7 Compare September 14, 2023 11:37
@cservakt cservakt removed the WIP 💣 Work In Progress label Sep 22, 2023
17: optional string analyzerName, // Analyzer name.
15: i64 bugPathLength, // Length of the bug path.
16: optional ReportDetails details, // Details of the report.
17: optional BlameInfo blameInfo, // Blmae info.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New parameters should be added to the end of the list without changing previous numbers. As far as I know, these numbers play a role in API backward compatibility.

Copy link
Collaborator Author

@cservakt cservakt Oct 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I've fixed it and added the new prop to the end of the struct.

Comment on lines 1928 to 1929
blame_info = blame_infos[report.id] \
if report.id in blame_infos else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
blame_info = blame_infos[report.id] \
if report.id in blame_infos else None
blame_info = blame_infos.get(report.id)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestion!

report_ids, blames = zip(*[
(
r[0].id,
(r[0].id, self.getBlameInfo(r[0].file_id))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could consider caching fetched blame info in case there are many reports in a file.

@cservakt cservakt force-pushed the cmd-results-blameinfo branch from 5baf2f7 to 2bc73d1 Compare October 2, 2023 12:47
@cservakt cservakt requested a review from bruntib October 2, 2023 13:39
Copy link
Contributor

@Szelethus Szelethus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely can't land this without tests.

Also, I saw the following blame info from a tinyxml2 analysis in json format:

    "blameInfo": {
      "commits": {
        "a9cf3f9f3fe65df392caa5aecd3b77a260d7921f": {
          "author": {
            "name": "Lee Thomason",
            "email": "[email protected]"
          },
          "summary": "Switched to Artistic Style auto-formatting to allow integration of patches from other coding styles.",
          "message": "Switched to Artistic Style auto-formatting to allow integration of patches from other coding styles.\n",
          "committedDateTime": "2012-10-11 16:56:51-07:00"
        }
      },
      "blame": [
        {
          "startLine": 2646,
          "endLine": 2646,
          "commitHash": "a9cf3f9f3fe65df392caa5aecd3b77a260d7921f"
        }
      ]
    }

It may not be your doing, but are we sure we want the commit hash to be a key?

@@ -325,6 +325,7 @@ struct ReportData {
// of custom labels that describe some properties of a report. For example the
// timestamp in case of dynamic analyzers when the report was actually emitted.
18: optional map<string, string> annotations,
19: optional BlameInfo blameInfo, // Blmae info.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a typo, but more importantly, maybe this isn't the comment we need here :) How about
"Contains the git blame information of the report. May be NULL if the analysis was not done in a git repository."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestion! I completely agree. It would be better if there was a more precise description.

@cservakt
Copy link
Collaborator Author

cservakt commented Oct 6, 2023

We definitely can't land this without tests.

Also, I saw the following blame info from a tinyxml2 analysis in json format:

    "blameInfo": {
      "commits": {
        "a9cf3f9f3fe65df392caa5aecd3b77a260d7921f": {
          "author": {
            "name": "Lee Thomason",
            "email": "[email protected]"
          },
          "summary": "Switched to Artistic Style auto-formatting to allow integration of patches from other coding styles.",
          "message": "Switched to Artistic Style auto-formatting to allow integration of patches from other coding styles.\n",
          "committedDateTime": "2012-10-11 16:56:51-07:00"
        }
      },
      "blame": [
        {
          "startLine": 2646,
          "endLine": 2646,
          "commitHash": "a9cf3f9f3fe65df392caa5aecd3b77a260d7921f"
        }
      ]
    }

It may not be your doing, but are we sure we want the commit hash to be a key?

I think this is the best way to print the result in cli. Blame info belongs to a file, not to a report and each commit is identified by the commit hash. So we can only filter it to get a shorter form for the given report. If we want to get a different syntax when using the cli, we need to modify the blame info stuct in thirift, which is also used by the GUI.

Blame information for reports was only available on the GUI. Now, if we wish to check the git commit info in the cli, we can do that with "CodeChecker cmd results --details" command
We can only check blame info for runs that have a git repository. The server address and the run name should also be given, e.g.: "CodeChecker cmd results --details --url http://localhost:8001/Default test".
The syntax is different in the GUI, as there you can check the blame info for each line of a reported file. In case of the cli, it would be huge output result to print long blame info for every report. A filtered one is added to the report data, which contains only the commit data of the given section.
@cservakt cservakt force-pushed the cmd-results-blameinfo branch 2 times, most recently from dd5734f to 9d0ac9e Compare October 10, 2023 12:47
@cservakt cservakt requested a review from Szelethus October 10, 2023 13:21
@cservakt cservakt force-pushed the cmd-results-blameinfo branch 3 times, most recently from 781796f to 987e196 Compare October 11, 2023 09:29
Adding test to check cmd results blmae info feature.
@cservakt cservakt force-pushed the cmd-results-blameinfo branch from 987e196 to 5fc5059 Compare October 11, 2023 09:34
@Szelethus
Copy link
Contributor

Shouldn't the value for key commits be a list instead of a dict? How about this:

"blameInfo": {
  "commits": [
    {
      "commit_hash": "a9cf3f9f3fe65df392caa5aecd3b77a260d7921f",
      "author": {
        "name": "Lee Thomason",
        "email": "[email protected]"
      },
      "summary": "Switched to Artistic Style auto-formatting to allow integration of patches from other coding styles.",
      "message": "Switched to Artistic Style auto-formatting to allow integration of patches from other coding styles.\n",
      "committedDateTime": "2012-10-11 16:56:51-07:00"
    },
    {
      "commit_hash": "a9cf3f9f3fe65df392caa5aecd3b77a260d7921f",
      "author": {
        "name": "Lee Thomason",
        "email": "[email protected]"
      },
      "summary": "Switched to Artistic Style auto-formatting to allow integration of patches from other coding styles.",
      "message": "Switched to Artistic Style auto-formatting to allow integration of patches from other coding styles.\n",
      "committedDateTime": "2012-10-11 16:56:51-07:00"
    }
  ],
  "blame": [
    {
      "startLine": 2646,
      "endLine": 2646,
      "commitHash": "a9cf3f9f3fe65df392caa5aecd3b77a260d7921f"
    }
  ]
}

Mind that the hash is no longer a key here either, but a value. This looks more idiomatic to me.

@@ -325,6 +325,7 @@ struct ReportData {
// of custom labels that describe some properties of a report. For example the
// timestamp in case of dynamic analyzers when the report was actually emitted.
18: optional map<string, string> annotations,
19: optional BlameInfo blameInfo, // Contains the git blame information of the report if it exists.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is longer than 79 columns.

Comment on lines +1904 to +1912
blame_infos = {}
if get_details and len(query_result):
report_ids, blames = zip(*[
(
r[0].id,
(r[0].id, self.getBlameInfo(r[0].file_id))
) for r in query_result])
report_details = get_report_details(session, report_ids)
blame_infos = dict(blames)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getRunResults is already stomach-churningly long. Can we put these lines a new function instead?

Comment on lines +1928 to +1942
blame_info = blame_infos.get(report.id)
if blame_info and blame_info.commits and blame_info.blame:
blame_data = [b for b in blame_info.blame
if report.line >= b.startLine
and report.line <= b.endLine]
commitHash = blame_data[0].commitHash \
if len(blame_data) else None
commitInfo = {cHash: commit for cHash, commit
in blame_info.commits.items()
if cHash == commitHash}
blame_info = BlameInfo(
commits=commitInfo,
blame=blame_data
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here.

commits=commitInfo,
blame=blame_data
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These code snippets are basically copy-paste from above, right? Can we do something about that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is more readable form.

Comment on lines +458 to +464
run_results = self._cc_client.getRunResults([runid],
100,
0,
None,
simple_filter,
None,
True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name the parameters we're assign to here? LLVM Coding Standards has a well written guidince on this. By no means are we strictly abiding all of these, but they make a lot of sense :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not name the parameters in the above functions either. If we want to do it, then in all the other cases we must also assigne them.

None,
True)

self.assertTrue(any(res.blameInfo for res in run_results))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we test the json output as well? At least that is doesn't crash.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is sufficient to check only the existence of the blame info. If someone modified the test files, the blame info whould be also changed, which whould break the test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API change 📄 Content of patch changes API! CI 📦 CLI 💻 Related to the command-line interface, such as the cmd, store, etc. commands new feature 👍 New feature request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants