LSS-USP/kiskadee-ranking-data

Feature examples

position, tool, category, severity, redundancy_level, category_frequency, tool_fp_rate, neighbors, positive?
foo.c:47, cppcheck, buffer overflow, critical, 1, 10, 0.3, ?, true

* neighbors would be a feature capturing other warnings reported close to this warning
* since we are collecting regular expressions for the warning messages to
  label the warnings, we can also use these regexes to cluster the warnings
  into specific categories (buffer, div0, pointer, etc)
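A minimal sketch of how such regex-based clustering could look. The category names and patterns below are illustrative assumptions, not the expressions actually collected for this experiment.

```python
import re

# Hypothetical category patterns; the real regexes are collected while labelling.
CATEGORY_PATTERNS = [
    ("buffer", re.compile(r"buffer|out of bounds|overflow", re.IGNORECASE)),
    ("div0", re.compile(r"division by zero|divide by zero", re.IGNORECASE)),
    ("pointer", re.compile(r"null pointer|dereference", re.IGNORECASE)),
]


def categorize(message):
    """Return the first category whose pattern matches the message, else None."""
    for category, pattern in CATEGORY_PATTERNS:
        if pattern.search(message):
            return category
    return None
```

The patterns are tried in order, so a message matching more than one category gets the first one listed.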

It is also possible that we would benefit from a binary feature for each of the static analyzers, which is true when that analyzer reports the same bug and false otherwise.
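One way to derive such per-analyzer flags is sketched below. It assumes that "the same bug" means the same file, line and category, and that each warning is a dict with the hypothetical keys `file`, `line`, `category` and `tool`; neither assumption comes from the repository's scripts.

```python
from collections import defaultdict

# The four analyzers used in this experiment.
TOOLS = ("cppcheck", "flawfinder", "frama-c", "scan-build")


def tool_presence_features(warnings):
    """Map each (file, line, category) to one boolean per analyzer.

    `warnings` is an iterable of dicts with the hypothetical keys
    "file", "line", "category" and "tool".
    """
    reported_by = defaultdict(set)
    for w in warnings:
        reported_by[(w["file"], w["line"], w["category"])].add(w["tool"])
    return {
        key: {tool: (tool in tools) for tool in TOOLS}
        for key, tools in reported_by.items()
    }
```

Matching on the exact line is strict; tools may report the same flaw on slightly different lines, which is where the neighbors feature could help.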

Notes on Juliet 1.2 test cases

  • There are regular expressions to identify GOOD and BAD functions
  • There are makefiles for some CWEs, where a binary is built from the files without Windows dependencies
  • Some test cases are bad-only test cases. They should not be used if you want to determine the number of false positives generated by a tool (I do believe they may be useful for this experiment). These cases are listed in Appendix D of the Juliet User Guide under juliet/doc
  • Accidental flaws (i.e. non-intentional bugs in Juliet) may exist, and they should be ignored.

How to run

To run this experiment, you need the following software installed:

  • python >= 3.6
  • RPM
  • firehose
  • ctags
  • RPM packages for
    • cppcheck
    • flawfinder
    • frama-c
    • scan-build (clang-analyzer)

Just run make to download and prepare the test suite and start running the analyzers.

The results will be stored under the reports directory.

Work Log

2017-10-01

  • some entries in the function scopes list end with ':'. They seem to belong to C++ test cases; this needs further investigation
  • there will be duplicates for class names when trying to determine function scopes. In these cases, the largest range should be considered (hoping it covers the whole class)
  • to confirm the latest script, check the scope of the s01/CWE690_NULL_Deref_From_Return__int64_t_realloc_83_bad.cpp file
  • note that for the cpp cases, we can just check the bad|good string in the file names

2017-10-02

  • You can check that there are no repeated file names in the set of test cases used for this experiment with the following command:

test `cat c_testcases.list cpp_testcases.list | sed 's/.*\/\([^/]*\.c[p]*\).*/\1/' | sort -u | wc -l` == `cat c_testcases.list cpp_testcases.list | wc -l` && echo 'There are no repeated file names in the used set of the test suite'

2017-10-14

  • Added new feature with number of warnings in a file, reaching 69% precision
  • when I add the file name (we do not want it) it goes up to 71%
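The file-name uniqueness check from the 2017-10-02 entry can also be expressed in Python. This is only a sketch: it assumes each line of the list files holds a single path, and the `repeated_basenames` helper is hypothetical, not part of the repository's scripts.

```python
import os
from collections import Counter


def repeated_basenames(paths):
    """Return the sorted base names that occur more than once in `paths`."""
    counts = Counter(os.path.basename(p.strip()) for p in paths if p.strip())
    return sorted(name for name, n in counts.items() if n > 1)
```

Unlike the shell one-liner, this reports which names are repeated rather than only whether any are.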

Collecting static analysis reports:

Static analysis reports were collected by running four static analysis tools on Juliet 1.2:

  • Frama-C
  • flawfinder
  • cppcheck
  • scan-build (clang analyzer)

The parameters used to run each tool can be seen in the run_analyses.sh script.

We ran the analyzers on a subset of Juliet, removing the test cases that call types or functions specific to Windows systems. A complete list of the files analyzed is generated with the bootstrap.sh script.

It is worth mentioning that for Frama-C, we also had to ignore the C++ test cases, since this tool can only analyze C programs.

Preprocessing the reports to generate the training set

After generating the reports, we need to pre-process them before we are able to use them to train our model. We need to:

  • Label warnings as true/false positives
  • Remove warnings not related to the CWE being tested generated for each test case (this is needed because accidental flaws may exist)
  • Collect potential features for the training set

To aid this task, we first convert all the reports to a common report syntax (firehose).

Labelling true/false positives

We want to use the regular expressions provided by the Juliet documentation to match the bad functions. We also need to map the warning messages from each tool to the CWE in question. After this:

  • Warnings related to the CWE, in bad functions, will be considered true positives
  • Warnings related to the CWE, in good functions, will be considered false positives
  • All other warnings will be ignored and will not be used in our training set
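The three rules above can be sketched as a small function. The `label_warning` name and its arguments are hypothetical; deciding whether a message relates to the CWE and which kind of function contains the warning is assumed to happen elsewhere.

```python
def label_warning(matches_cwe, scope):
    """Return True (true positive), False (false positive) or None (ignored).

    `matches_cwe`: whether the warning message maps to the CWE under test.
    `scope`: "bad", "good", or None for a warning outside any such function.
    """
    if not matches_cwe or scope not in ("bad", "good"):
        return None
    return scope == "bad"
```

Warnings labelled None would simply be left out of the training set.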

TODO: GENERATE TABLE WITH MAPPINGS WARNING-MESSAGE=>CWE FOR EACH TOOL

TODO: GENERATE TABLES OR CHARTS WITH DATA CONTAINING NUMBERS OF WARNINGS GENERATED BY EACH TOOL FOR EACH TEST CASE. HOW MANY WERE USED? HOW MANY WERE IGNORED? (FOR EACH CASE)

cwe   tool  total_warnings  warnings_inside_good_or_bad_functions  warnings_inside_AND_related_to_cwe
total

To label a warning, we first need to find which function it belongs to. The get_functions_info.sh script outputs a file with all function and class locations.
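Given such locations, the lookup could be sketched as below. The `(name, first_line, last_line)` tuple layout is an assumption and not the actual output format of get_functions_info.sh; picking the narrowest enclosing range is likewise a design choice made only for this sketch.

```python
def enclosing_scope(line, scopes):
    """Return the name of the innermost scope containing `line`, or None.

    `scopes` is a list of (name, first_line, last_line) tuples; this layout is
    an assumption, not the real output format of get_functions_info.sh.
    """
    containing = [s for s in scopes if s[1] <= line <= s[2]]
    if not containing:
        return None
    # Prefer the narrowest range so a method wins over its enclosing class.
    return min(containing, key=lambda s: s[2] - s[1])[0]
```

For example, with a class spanning lines 10 to 80 and a method spanning lines 20 to 30, a warning on line 25 resolves to the method and a warning on line 50 resolves to the class.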

As per the Juliet 1.2 documentation:

  • Warnings in a function with the word "bad" in its name are TRUE POSITIVES
  • Warnings in a class with the word "bad" in the FILE NAME are TRUE POSITIVES
  • Warnings in a function with the word "good" in its name are FALSE POSITIVES
  • Warnings in a class with the word "good" in the FILE NAME are FALSE POSITIVES

The warnings must match the CWE flaw category to fit in any of the above classifications. For example, if a warning is triggered in a function with the word bad in its name for a division by zero test case, and the warning message says a null pointer dereference was found, the warning must be ignored and not included in our training set. This was done manually, by checking each distinct message string of the triggered warnings against each CWE.

It is important that the warnings match the CWE precisely, so that our training set is labeled correctly (less is more). This is VERY important, since a test case will often contain a similar case that triggers a false positive, which we will label as a false positive. If we accept message strings for flaws that are only related to the CWE of a test case, and such a related flaw shows up in the fix for that CWE, we will assign wrong labels to some warnings. Hence, it is better not to include those warnings at all. When in doubt, we did not consider a warning category for the test cases.

The file used as a base for the manual inspections in this repository is raw_cwe_versus_warning_msg.txt. Note that although the firehose_report_parser.py file does output this raw list, the one in the repository was sorted and repeated entries were already removed. One can do that with cat raw_cwe_versus_warning_msg.txt | sort -u. TODO: SORT THE raw file automatically (in the make file?)

Note that for the cpp cases, we can just check for the bad|good string in the file names (not just names ending in good.cpp or bad.cpp, since there may be suffixes or prefixes, like goodG2B.cpp).
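A sketch of that file-name check follows. The `cpp_file_scope` helper is hypothetical, and it assumes that the lowercase strings "bad" and "good" never appear elsewhere in a file's base name.

```python
import os
import re

# Match "bad" or "good" anywhere in the base name, e.g. "..._goodG2B.cpp".
_SCOPE_RE = re.compile(r"bad|good")


def cpp_file_scope(path):
    """Return "bad" or "good" if the C++ file's base name contains it, else None."""
    match = _SCOPE_RE.search(os.path.basename(path))
    return match.group(0) if match else None
```

Only the base name is inspected, so directory names such as s01 cannot influence the result.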

With this data, we should start generating a CSV file with information about the tool, file, line, and label.
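A minimal sketch of writing such a file with Python's csv module. The column names follow the sentence above; the row shown is borrowed from the feature example at the top of this document and is not actual experiment data.

```python
import csv
import io

FIELDS = ["tool", "file", "line", "label"]


def write_rows(rows, out):
    """Write labelled warnings (dicts keyed by FIELDS) as CSV to `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Usage: write one illustrative row to an in-memory buffer.
buf = io.StringIO()
write_rows([{"tool": "cppcheck", "file": "foo.c", "line": 47, "label": True}], buf)
```

Passing an open file object instead of the buffer would write the same content to disk.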

Adding features

For each of the desired features, a new column is added to the CSV file generated in the step above.
