Summary statistics for checkers #4018

cservakt · 2023-09-22T12:58:03Z

The new --summary flag of the parse command prints the number of reports per checker. Before storing, it helps to check which checkers generate too many reports in a large report directory. The result is a table with two columns, the checker name and the number of reports, sorted by report count in descending order.

Example command: CodeChecker parse reports/ --summary

The result table would be:

---==== Checkers Summary Statistics ====----
-------------------------------------------------------------------------------
Checker name                                                | Number of reports
-------------------------------------------------------------------------------
readability-avoid-const-params-in-decls                     |            266836
modernize-use-trailing-return-type                          |            255116
readability-magic-numbers                                   |            138216
modernize-avoid-c-arrays                                    |            116741
-------------------------------------------------------------------------------
----=================----
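
For illustration, a minimal sketch of how such a table could be produced from a list of checker names, one entry per report. This is not the code from the patch: the function name `print_checker_summary` and its input shape are assumptions made for the example.

```python
import sys
from collections import Counter


def print_checker_summary(checker_names, out=sys.stdout):
    """Write a 'checker name | number of reports' table, largest count first.

    Hypothetical helper: `checker_names` holds one checker name per report.
    """
    # most_common() returns (name, count) pairs sorted by descending count.
    rows = Counter(checker_names).most_common()

    name_header = "Checker name"
    count_header = "Number of reports"
    name_width = max([len(name_header)] + [len(name) for name, _ in rows])
    count_width = len(count_header)
    rule = "-" * (name_width + 3 + count_width)

    out.write(rule + "\n")
    out.write(f"{name_header:<{name_width}} | {count_header:>{count_width}}\n")
    out.write(rule + "\n")
    for name, count in rows:
        out.write(f"{name:<{name_width}} | {count:>{count_width}}\n")
    out.write(rule + "\n")
```

Checker names are left-aligned and counts right-aligned, matching the layout of the table above.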

bruntib

This will be a useful feature, so thanks for the development. However, I'm not sure that its current implementation goes in a good direction.

This solution relies on the grep utility being installed.

Another problem is that the output is printed in a custom format by additional code. If I were looking for the implementation where this new summary table is assembled and printed, I would start with the file where all the other statistics are put together: https://github.com/Ericsson/codechecker/blob/master/tools/report-converter/codechecker_report_converter/report/statistics.py

As far as I know, the goal of this patch is to provide an option that prints statistics quickly without parsing the .plist files. The implementation could use metadata.json, which could carry some statistics data for this purpose. I'm not too familiar with the report-converter tool, but after a quick search I found that it knows about the metadata file and also writes its content, so reading it would not be out of scope for this tool. The Statistics class could then either fetch its data from metadata.json or fall back to the .plist files.
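
A rough sketch of the "metadata.json first, .plist fallback" idea, to make the proposal concrete. Everything here is an assumption for illustration: the function name `load_checker_counts`, the `checker_counts` key (metadata.json is not known to contain such a field), and the simplified plist layout with a `diagnostics` list whose entries carry a `check_name`.

```python
import json
import plistlib
from collections import Counter
from pathlib import Path


def load_checker_counts(report_dir):
    """Return a Counter mapping checker name to report count."""
    report_dir = Path(report_dir)
    metadata_file = report_dir / "metadata.json"

    # Fast path: use precomputed counts if the metadata file carries them.
    # "checker_counts" is a hypothetical key, not an existing field.
    if metadata_file.is_file():
        with metadata_file.open(encoding="utf-8") as f:
            metadata = json.load(f)
        if isinstance(metadata, dict):
            counts = metadata.get("checker_counts")
            if isinstance(counts, dict):
                return Counter(counts)

    # Fallback: parse every .plist file and count diagnostics per checker.
    counts = Counter()
    for plist_file in sorted(report_dir.glob("*.plist")):
        with plist_file.open("rb") as f:
            data = plistlib.load(f)
        for diagnostic in data.get("diagnostics", []):
            counts[diagnostic.get("check_name", "unknown")] += 1
    return counts
```

With this shape, the slow parsing path only runs when the metadata file is missing or carries no counts.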

What do you think of this direction? Maybe we could discuss it further, and in the meantime I'll get familiar with the report-converter.
Thank you!

analyzer/codechecker_analyzer/cmd/parse.py

Szelethus

We need tests before landing this. Maybe an analyze_and_parse test file like this one?
https://github.com/Ericsson/codechecker/blob/master/analyzer/tests/functional/analyze_and_parse/test_files/diagnostic_message_hash_clang_tidy.output

Szelethus · 2023-10-11T12:58:02Z

docs/analyzer/user_guide.md

+The number of reports per checker can be verified with
+
+```sh
+CodeChecker parse ./my_plists --summary


I would like us to start dissociating from plists, because even though that has been the only file format CodeChecker has used since its inception, that's really not the point here. This is a report directory, regardless of what kinds of files we have in there (not to mention that we have issues requesting us to handle SARIF natively as well).

Internally, I also hear a lot of well-intentioned but misleading discussions on this. We talk about plists, not reports, even well past the point of parsing. Unless we are specifically talking about the parsing process (not even parsing in general), we should omit mentioning the file format.

This is less of a comment on your PR, just my general gripe with CodeChecker.

Suggested change

-CodeChecker parse ./my_plists --summary
+CodeChecker parse ./my_results --summary

Szelethus · 2023-10-11T12:59:28Z

tools/report-converter/codechecker_report_converter/report/statistics.py

@@ -90,3 +90,20 @@ def add_report(self, report: Report):
        self.checker_statistics[
            Checker(report.checker_name, report.severity)] += 1
        self.file_statistics[report.file.original_path] += 1
+
+    def write_checker_summary(self, checker_stats, out=sys.stdout):
+        """ Print checker summary statistics if it's available. """


When is it available? I don't think the external factors are expected to change much, so we can say a little more.

analyzer/codechecker_analyzer/cmd/parse.py

dkrupp

Please add test cases. Otherwise looks ok.

dkrupp

The output of the summary and that of the normal CodeChecker parse command differ.
I guess they should print the same numbers.

I suspect the root cause is that we don't do bug deduplication when calculating this summary.
This is related:
#4042
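
A toy illustration of how missing deduplication could inflate the counts. It assumes, for the sake of the example only, that two reports with the same checker name and report hash count as one; the (checker name, report hash) pair is a simplification and not the actual report model.

```python
from collections import Counter


def count_per_checker(reports, deduplicate=False):
    """Count reports per checker.

    `reports` is an iterable of (checker_name, report_hash) pairs, a
    shape made up for this example.
    """
    if deduplicate:
        # Keep a single report for each (checker, hash) pair.
        reports = set(reports)
    return Counter(checker for checker, _ in reports)


reports = [
    ("bugprone-sizeof-expression", "hash-1"),
    ("bugprone-sizeof-expression", "hash-1"),  # same report found twice
    ("cert-err09-cpp", "hash-2"),
]
raw = count_per_checker(reports)
unique = count_per_checker(reports, deduplicate=True)
```

Here `raw` counts the repeated report twice while `unique` counts it once, which is the kind of gap described above.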

This summary:

Checker name | Number of reports

Parse:
----==== Checker Statistics ====----

Checker name | Severity | Number of reports

cppcoreguidelines-special-member-functions | LOW | 533
misc-misplaced-const | LOW | 7
clang-diagnostic-implicit-int-conversion | MEDIUM | 31
clang-diagnostic-switch-enum | MEDIUM | 65
bugprone-sizeof-expression | HIGH | 80
cert-err09-cpp | HIGH | 37
clang-diagnostic-implicit-fallthrough | MEDIUM | 13
clang-diagnostic-misleading-indentation | MEDIUM | 11
clang-diagnostic-float-conversion | MEDIUM | 1
bugprone-misplaced-widening-cast | HIGH | 3
clang-diagnostic-non-virtual-dtor | MEDIUM | 3
clang-diagnostic-unused-parameter | MEDIUM | 2
clang-diagnostic-implicit-int-float-conversion | MEDIUM | 17
clang-diagnostic-unused-variable | MEDIUM | 1
misc-unconventional-assign-operator | MEDIUM | 1
clang-diagnostic-sign-compare | MEDIUM | 1
google-global-names-in-headers | STYLE | 26
clang-diagnostic-implicit-float-conversion | MEDIUM | 6
clang-diagnostic-double-promotion | MEDIUM | 3
clang-diagnostic-implicitly-unsigned-literal | MEDIUM | 4

cservakt · 2024-04-02T12:04:50Z

This PR is not the appropriate implementation. Once the report directory has been redesigned, there will be a proper way to summarize the statistics.

cservakt added the CLI 💻 and new feature 👍 labels Sep 22, 2023

cservakt added this to the release 6.23.0 milestone Sep 22, 2023

cservakt requested a review from Szelethus September 22, 2023 12:58

cservakt requested review from bruntib and vodorok as code owners September 22, 2023 12:58

bruntib requested changes Sep 25, 2023

cservakt added the WIP 💣 Work In Progress label Sep 25, 2023

cservakt force-pushed the checker-summary branch from 59b1ddb to a122cf3 Compare September 26, 2023 13:17

cservakt requested a review from dkrupp as a code owner September 26, 2023 13:17

cservakt force-pushed the checker-summary branch 4 times, most recently from 695d3ea to 9b564df Compare September 28, 2023 13:41

cservakt requested a review from bruntib September 28, 2023 13:59

cservakt force-pushed the checker-summary branch from 9b564df to c088fcf Compare October 3, 2023 09:54

Merge branch 'master' into checker-summary

13cbdb8

bruntib requested changes Oct 10, 2023

analyzer/codechecker_analyzer/cmd/parse.py

Szelethus mentioned this pull request Oct 11, 2023

[server] Rate limit based on report count #3843

Merged

Szelethus requested changes Oct 11, 2023

dkrupp requested changes Oct 11, 2023

vodorok added 2 commits October 16, 2023 00:40

Fix review comments

f473627

Add tests for parse summary

4f1746b

vodorok force-pushed the checker-summary branch from ac389dd to 4f1746b Compare October 16, 2023 09:39

dkrupp requested changes Oct 16, 2023

dkrupp modified the milestones: release 6.23.0, release 6.24.0 Oct 19, 2023

whisperity marked this pull request as draft March 27, 2024 11:13

cservakt closed this Apr 2, 2024
