-
Notifications
You must be signed in to change notification settings - Fork 383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Summary statistics for checkers #4018
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be a useful feature, so thanks for the development. However, I'm not sure if its current implementation is a good direction.
This solution relies on the fact that grep
utility is installed.
Another problem is that the output is printed in a custom format by an additional code. If I would search the implementation where this new summary table is assembled and printed, then I started with the files where all other statistics are put together: https://github.com/Ericsson/codechecker/blob/master/tools/report-converter/codechecker_report_converter/report/statistics.py
As far as I know, the goal of this patch is to provide an option which prints statistics quickly without parsing the .plist files. The implementation could use the metadata.json
. This file could contain some statistics data that could be used. I'm not too familiar with the report-converter
tool, but after some quick search I found that it knows about the metadata file. It also writes its content. So it is not an out-of-scope thing for this tool to read it too. So the Statistics
class could either fetch its data from the metadata.json
or fallback to the .plist files.
What do you think of this direction? Maybe we could continue some consultation about it, and in the meantime I'll get familiar with the report-converter
.
Thank you!
59b1ddb
to
a122cf3
Compare
695d3ea
to
9b564df
Compare
It can be used to see the number of reports per checker for 'parse' command. Before storing it is helpful to verify with the '--summary' flag, which checkers are generating too many reports in a large report directory. The result is a table that has a checker name and number of reports columns. It shows the results in descending order. Example command: 'CodeChecker parse reports/ --summary' the result table would be: ---==== Checkers Summary Statistics ====---- ------------------------------------------------------------------------------- Checker name | Number of reports ------------------------------------------------------------------------------- readability-avoid-const-params-in-decls | 266836 modernize-use-trailing-return-type | 255116 readability-magic-numbers | 138216 modernize-avoid-c-arrays | 116741 ------------------------------------------------------------------------------- ----=================----
9b564df
to
c088fcf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need tests before landing this. Maybe an analyze_and_parse
test file like this?:
https://github.com/Ericsson/codechecker/blob/master/analyzer/tests/functional/analyze_and_parse/test_files/diagnostic_message_hash_clang_tidy.output
docs/analyzer/user_guide.md
Outdated
The number of reports per checker can be verified with | ||
|
||
```sh | ||
CodeChecker parse ./my_plists --summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like us to start dissociating from plists, because even though that has been the only file format CodeChecker since its inception, thats really not the point here. This is a report directory, regardless of what kinds of files we have in there (not to mention that have issues requesting us to handle sarif natively as well).
Internally, I also hear a lot of well intentioned but misleading discussions on this. We talk about plists, not reports, even well past the point of parsing. Unless we are specifically talking about the parsing process (not even parsing in general), we should omit mentioning the file format.
This is less of a comment on your PR, just my general gripe with CodeChecker.
CodeChecker parse ./my_plists --summary | |
CodeChecker parse ./my_results --summary |
@@ -90,3 +90,20 @@ def add_report(self, report: Report): | |||
self.checker_statistics[ | |||
Checker(report.checker_name, report.severity)] += 1 | |||
self.file_statistics[report.file.original_path] += 1 | |||
|
|||
def write_checker_summary(self, checker_stats, out=sys.stdout): | |||
""" Print checker summary statistics if it's available. """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When is it available? I don't think the external factors are expected to change much, we can say a little more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add test cases. Otherwise looks ok.
ac389dd
to
4f1746b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output of the summary and the normal CodeChecker parse command differs.
I guess they should print the same.
I suspect the root cause is that we don't do bug deduplacation at the calculation of this summary.
This is related:
#4042
This summary:
Checker name | Number of reports
cppcoreguidelines-special-member-functions | 23068
misc-misplaced-const | 345
bugprone-sizeof-expression | 335
clang-diagnostic-implicit-int-float-conversion | 85
clang-diagnostic-non-virtual-dtor | 68
clang-diagnostic-switch-enum | 65
clang-diagnostic-implicit-int-conversion | 48
google-global-names-in-headers | 44
cert-err09-cpp | 37
clang-diagnostic-implicit-fallthrough | 13
clang-diagnostic-misleading-indentation | 13
bugprone-misplaced-widening-cast | 6
clang-diagnostic-implicit-float-conversion | 6
clang-diagnostic-implicitly-unsigned-literal | 4
clang-diagnostic-double-promotion | 3
clang-diagnostic-unused-parameter | 2
clang-diagnostic-float-conversion | 1
clang-diagnostic-unused-variable | 1
misc-unconventional-assign-operator | 1
clang-diagnostic-sign-compare | 1
Parse:
----==== Checker Statistics ====----
Checker name | Severity | Number of reports
cppcoreguidelines-special-member-functions | LOW | 533
misc-misplaced-const | LOW | 7
clang-diagnostic-implicit-int-conversion | MEDIUM | 31
clang-diagnostic-switch-enum | MEDIUM | 65
bugprone-sizeof-expression | HIGH | 80
cert-err09-cpp | HIGH | 37
clang-diagnostic-implicit-fallthrough | MEDIUM | 13
clang-diagnostic-misleading-indentation | MEDIUM | 11
clang-diagnostic-float-conversion | MEDIUM | 1
bugprone-misplaced-widening-cast | HIGH | 3
clang-diagnostic-non-virtual-dtor | MEDIUM | 3
clang-diagnostic-unused-parameter | MEDIUM | 2
clang-diagnostic-implicit-int-float-conversion | MEDIUM | 17
clang-diagnostic-unused-variable | MEDIUM | 1
misc-unconventional-assign-operator | MEDIUM | 1
clang-diagnostic-sign-compare | MEDIUM | 1
google-global-names-in-headers | STYLE | 26
clang-diagnostic-implicit-float-conversion | MEDIUM | 6
clang-diagnostic-double-promotion | MEDIUM | 3
clang-diagnostic-implicitly-unsigned-literal | MEDIUM | 4
This PR is not the appropriate implementation. After redesigning the report directory, there will be a correct solution to sum statistics. |
It can be used to see the number of reports per checker for
parse
command. Before storing it is helpful to verify with the--summary
flag, which checkers are generating too many reports in a large report directory. The result is a table that has a checker name and number of reports columns. It shows the results in descending order.Example command:
CodeChecker parse reports/ --summary
The result table would be: