UPDATE: Code scanning with GitHub CodeQL #46

Open: wants to merge 3 commits into base: main

a-a-ron
Collaborator

@a-a-ron a-a-ron commented Sep 28, 2022

The purpose of this PR is to see how we can increase the completion rate and improve the user experience. Below is the August data for this module:

| Module | Completion Rate | # completed / # started |
| --- | --- | --- |
| Code scanning with GitHub CodeQL | 15.87% | 33/208 |

For reference: See the parent issue for context

Module file structure

[Screenshot: module file structure]

Items to review for this module

We need to conduct a performance review on each of these modules to see how we can increase the completion rates, improve the user experience, and maintain the level of quality that we set for our learners.

Items for consideration

  • Thoroughly review the module content for flow, narrative, and length
    • Review use of images, tables, and other visual elements
    • Does the content read like a doc or like a conversation guide?
    • Does the estimated time to complete each unit remain accurate?
  • Review the introduction unit
    • Does it capture your attention?
    • Does it contain repetitive content from the course title page?
    • Do the learning objectives/scope align with the content?
    • Does the target audience match the content?
    • Do we accurately describe the module and learning objectives? (Is it possible that learners start the course thinking it's one thing to find out it's another?)
  • Review the module exercise
    • Does it work?
    • Does it enforce a learning objective?
    • How is it placed in the module? Should we move it around?
    • Do we need other exercises?
  • Review the questions at the end of the module
    • Do the questions test on content taught in the module?
    • Do the questions follow best practices for item writing?

Estimated Work Effort

| Task | Hours |
| --- | --- |
| Detailed module review | 18 hours |
| Action items as identified from review | TBD |

@rmallorybpc rmallorybpc self-assigned this Sep 28, 2022
@rmallorybpc
Collaborator

rmallorybpc commented Sep 28, 2022

Plan of Action

Launch page

  • From the launch page, this is a long course.

We need to figure out how to split the module.

  • Could the module be broken into CodeQL and then Custom?
  • Move to a new Custom module: Reference a custom CodeQL query, Learn how to use the CodeQL CLI to generate code scanning results and upload them to GitHub, and Implement custom build steps
    • Those launch bullet points seem to match the units: Customize your code scanning workflow with CodeQL - Part 1, Customize your code scanning workflow with CodeQL - Part 2, and Customize languages and builds for code scanning

Introduction

  • Interesting introduction. Same questions as launch page about the amount of content.
  • Yes, the learning objectives/scope aligns with the content.

Content Units

  • The content could be more conversational. Unit 2, What is CodeQL?, does not continue the story from the intro.
  • Quizzes in the units?
  • Transitions across all units.

Exercise

  • Add the exercise length to the exercise summary, instead of the summary page length (e.g., 6 min)

New modules

  • Plan the content in the 3 new modules
  • Create scaffolding for the 3 new modules
  • Add current content for the new Introduction to Code Scanning with GitHub CodeQL module
  • Write new content for the new Introduction to Code Scanning with GitHub CodeQL module
  • Write knowledge check questions for the new Introduction to Code Scanning with GitHub CodeQL module
  • Check durationinminutes for Introduction to Code Scanning with GitHub CodeQL module
  • Add, How to provide feedback
  • Review the new Introduction to the new Analyze code by using CodeQL module
  • Determine new module name - Analyze code by using CodeQL
  • Add current content for the new Introduction to Analyze code by using CodeQL module
  • Write new content for the new Analyze code by using CodeQL module
  • Write knowledge check questions for the new Analyze code by using CodeQL module
  • Check durationinminutes in all units for Analyze code by using CodeQL module
  • Update the unit file titles in yaml and md for Analyze code by using CodeQL module
  • Add, How to provide feedback
  • Review the new Introduction to the new Customize Code Scanning with GitHub CodeQL module
  • Add current content for the new Customize Code Scanning with GitHub CodeQL module
  • Write new content Introduction to the new Customize Code Scanning with GitHub CodeQL module
  • Write knowledge check questions for the new Customize Code Scanning with GitHub CodeQL module
  • Review the new Introduction to the new Customize Code Scanning with GitHub CodeQL module
  • Check durationinminutes in all units for Customize Code Scanning with GitHub CodeQL module
  • Update the unit file titles in yaml and md for Customize Code Scanning with GitHub CodeQL module
  • Add, How to provide feedback

@rmallorybpc
Collaborator

rmallorybpc commented Oct 4, 2022

Change approach here, content moved back in.

Move Introduction information

This comment was used to save the moved content, copy and paste, and compare the content across the two files.

Learning objectives

By the end of this module, you will be able to:

  • Understand CodeQL and how it analyzes code
  • Understand QL, a unique logic programming language
  • Set up CodeQL-based code scanning in a GitHub repository
  • Reference a custom CodeQL query
  • Configure the language matrix in a CodeQL workflow
  • Learn how to use the CodeQL CLI to generate code scanning results and upload them to GitHub
  • Implement custom build steps

Prerequisites

  • A GitHub enterprise account with a GitHub Advanced Security license
  • Necessary permissions to administrate your repository
  • Knowledge of GitHub Advanced Security's code scanning feature
  • Knowledge of GitHub Actions

Moved the learning objectives and prerequisites to the launch page, and added a transition sentence.
@rmallorybpc
Collaborator

rmallorybpc commented Oct 5, 2022

New module order

Proposed: split the original module, Code scanning with GitHub CodeQL, into 3 modules.

Introduction to Code scanning with GitHub CodeQL

1-introduction.md - edited existing unit
2-what-is-codeql.md - existing unit
3-how-does-codeql-analyze-code.md - existing unit
4-what-is-ql.md - existing unit
Exercise - new exercise needed - new unit
12-summary.md - edited existing unit

New module

Analyze code by using CodeQL

introduction - for new content
about integration with code scanning - new unit
5-code-scanning-codeql.md - existing unit
9-use-codeql-cli.md - more content if needed - existing unit
Recommended hardware resources for running CodeQL - new unit
Exercise - new exercise needed - new unit
summary.md - edited, revised new unit

New module

Customize Code Scanning with GitHub CodeQL

introduction.md - edited, revised new unit
Configuring the CodeQL workflow for compiled languages - new unit
6-customize-your-scanning-workflow-with-codeql.md - existing unit
7-exercise-reference-codeql-query.md - existing unit
8-customize-your-scanning-workflow-with-codeql-2.md - existing unit
11-exercise-configure-language-matrix.md - existing unit
10-custom-build-steps-for-code-scanning.md - existing unit
Troubleshooting the CodeQL workflow - new unit
summary.md, revised new unit

Added the learning objectives and prerequisites back
@rmallorybpc
Collaborator

rmallorybpc commented Oct 21, 2022

Introduction to Code scanning with GitHub CodeQL

intro-code-scanning-codeql

1-introduction.md - edited existing unit

Code scanning using CodeQL provides an extensible method to automate vulnerability scanning across your organization's GitHub repositories.

Imagine that you are a senior developer at a start-up company specializing in health care software. Your flagship product is a Java-based web portal that allows physicians to manage patient records. A recent penetration test of this product revealed a number of serious vulnerabilities that could compromise patient information. The CIO has asked you to implement automated code vulnerability scanning. Because your code is already hosted in a private repository on GitHub, you have decided to use the code scanning feature with CodeQL. You will need to understand how the feature works to persuade other developers and management to use the feature. You will also need to understand the various configuration options and how to implement and maintain a code scanning pipeline to assist other developers at your company in configuring and deploying code scanning correctly.

In this module, you will learn about the CodeQL static analysis tool and how the code scanning feature in GitHub uses it to automate vulnerability scanning. You will also learn how to customize a code scanning workflow that uses CodeQL, how to include additional queries, and how to adapt your workflow to repositories that have multiple languages.

Learning objectives

By the end of this module, you will be able to:

  • Understand CodeQL and how it analyzes code
  • Understand QL, a unique logic programming language
  • Set up CodeQL-based code scanning in a GitHub repository
  • Reference a custom CodeQL query
  • Configure the language matrix in a CodeQL workflow
  • Learn how to use the CodeQL CLI to generate code scanning results and upload them to GitHub
  • Implement custom build steps

Prerequisites

  • A GitHub enterprise account with a GitHub Advanced Security license
  • Necessary permissions to administrate your repository
  • Knowledge of GitHub Advanced Security's code scanning feature
  • Knowledge of GitHub Actions

Next up, you'll learn how CodeQL is used by developers.

2-what-is-codeql.md - existing unit

CodeQL is the analysis engine used by developers to automate security checks, and by security researchers to perform variant analysis.

In CodeQL, code is treated like data. Security vulnerabilities, bugs, and other errors are modeled as queries that can be executed against databases extracted from code. You can run the standard CodeQL queries, written by GitHub researchers and community contributors, or write your own to use in custom analyses. Queries that find potential bugs highlight the result directly in the source file.

In this unit, you will learn about the CodeQL static analysis tool and how it uses databases, query suites and query language packs to perform variant analysis.

Variant analysis

Variant analysis is the process of using a known security vulnerability as a seed to find similar problems in your code. It’s a technique that security engineers use to identify potential vulnerabilities, and ensure these threats are properly fixed across multiple codebases.

Querying code using CodeQL is the most efficient way to perform variant analysis. You can use the standard CodeQL queries to identify seed vulnerabilities, or find new vulnerabilities by writing your own custom CodeQL queries. Then, develop or iterate over the query to automatically find logical variants of the same bug that could be missed using traditional manual techniques.

CodeQL databases

CodeQL databases contain queryable data extracted from a codebase, for a single language at a particular point in time. The database contains a full, hierarchical representation of the code, including a representation of the abstract syntax tree, the data flow graph, and the control flow graph.

Each language has its own unique database schema that defines the relations used to create a database. The schema provides an interface between the initial lexical analysis performed during the extraction process, and the actual complex analysis of the CodeQL query evaluator. The schema specifies, for instance, that there is a table for every language construct.

For each language, the CodeQL libraries define classes to provide a layer of abstraction over the database tables. This provides an object-oriented view of the data which makes it easier to write queries.

For example, in a CodeQL database for a Java program, two key tables are:

  • The expressions table containing a row for every single expression in the source code that was analyzed during the build process.
  • The statements table containing a row for every single statement in the source code that was analyzed during the build process.

The CodeQL library defines classes to provide a layer of abstraction over each of these tables (and the related auxiliary tables): Expr and Stmt.

Query suites

CodeQL query suites provide a way of selecting queries, based on their filename, location on disk or in a QL pack, or metadata properties. Create query suites for the queries that you want to frequently use in your CodeQL analyses.

Query suites allow you to pass multiple queries to CodeQL without having to specify the path to each query file individually. Query suite definitions are stored in YAML files with the extension .qls. A suite definition is a sequence of instructions, where each instruction is a YAML mapping with (usually) a single key. The instructions are executed in the order they appear in the query suite definition. After all the instructions in the suite definition have been executed, the result is a set of selected queries.
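
As a rough illustration of the instruction sequence described above, a small suite definition might look like the sketch below. The file name and the specific filter values are assumptions chosen for the example; the pack name codeql/java-queries is the one that appears in the qlpack.yml example later in this unit.

```yaml
# my-java-security.qls (hypothetical file name)
# Each list item is one instruction; instructions run top to bottom.

# Human-readable description of the suite.
- description: Security queries for Java (illustrative example)

# Start by selecting every query in the codeql/java-queries pack.
- queries: .
  from: codeql/java-queries

# Keep only alert-style queries that are tagged as security-related.
- include:
    kind:
      - problem
      - path-problem
    tags contain:
      - security

# Drop queries whose metadata marks them as low precision.
- exclude:
    precision:
      - low
```

Because the instructions are applied in order, the exclude step only ever narrows what the earlier queries and include steps selected.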

Default query suites

There are three default query suites for CodeQL:

  • code-scanning: queries run by default in CodeQL code scanning on GitHub.
  • security-extended: queries from code-scanning, plus extra security queries with slightly lower precision and severity.
  • security-and-quality: queries from code-scanning, security-extended, plus extra maintainability and reliability queries.
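
To make the choice of suite concrete, here is a minimal sketch of how a code scanning workflow might select one of these suites through the queries input of the CodeQL init action. The action version and the language value are assumptions for illustration; keep whatever your workflow already pins.

```yaml
# Fragment of a code scanning workflow (steps only).
- name: Initialize CodeQL
  uses: github/codeql-action/init@v2   # version is an assumption
  with:
    languages: java                    # illustrative language
    # Run the security-extended suite instead of the default
    # code-scanning suite.
    queries: security-extended
```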

Query Language (QL) packs

QL packs are used to organize the files used in CodeQL analysis. They contain queries, library files, query suites, and important metadata.

The CodeQL repository contains QL packs for C/C++, C#, Java, JavaScript, Python, and Ruby. The CodeQL for Go repository contains a QL pack for Go analysis. You can also make custom QL packs to contain your own queries and libraries.

QL pack structure

A QL pack must contain a file called qlpack.yml in its root directory. The other files and directories within the pack should be logically organized. For example:

  • Queries are organized into directories for specific categories.
  • Queries for specific products, libraries, and frameworks are organized into their own top-level directories.
  • There is a top-level directory named <owner>/<language> for query library (.qll) files. Within this directory, .qll files should be organized into subdirectories for specific categories.

An example qlpack.yml file is shown below.

name: codeql/java-queries
version: 0.0.6-dev
groups: java
suites: codeql-suites
extractor: java
defaultSuiteFile: codeql-suites/java-code-scanning.qls
dependencies:
    codeql/java-all: "*"
    codeql/suite-helpers: "*"
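
A custom QL pack usually needs far fewer fields than the standard pack shown above. The following is a minimal sketch only; the pack name and version are hypothetical, and the single dependency assumes the queries target Java.

```yaml
# qlpack.yml for a hypothetical custom query pack
name: my-org/java-custom-queries   # hypothetical <owner>/<pack> name
version: 0.0.1
# Depend on the standard Java library pack so that queries in this
# pack can import the Java CodeQL libraries.
dependencies:
  codeql/java-all: "*"
```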

3-how-does-codeql-analyze-code.md - existing unit

Implementing code scanning with CodeQL requires an understanding of how the tool analyzes code.

CodeQL analysis consists of three steps:

  1. Preparing the code, by creating a CodeQL database.
  2. Running CodeQL queries against the database.
  3. Interpreting the query results.

In this unit, you will learn about the three phases of CodeQL analysis.

Database creation

To create a database, CodeQL first extracts a single relational representation of each source file in the codebase.

For compiled languages, extraction works by monitoring the normal build process. Each time a compiler is invoked to process a source file, a copy of that file is made, and all relevant information about the source code is collected. This includes syntactic data about the abstract syntax tree and semantic data about name binding and type information.

For interpreted languages, the extractor runs directly on the source code, resolving dependencies to give an accurate representation of the codebase.

There is one extractor for each language supported by CodeQL to ensure that the extraction process is as accurate as possible. For multi-language codebases, databases are generated one language at a time.

After extraction, all the data required for analysis (relational data, copied source files, and a language-specific database schema, which specifies the mutual relations in the data) is imported into a single directory, known as a CodeQL database.
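
In a GitHub Actions code scanning workflow, the one-language-at-a-time behavior is typically expressed with a job matrix. The fragment below is a sketch under assumptions: the two languages and the action version are illustrative, not taken from this module.

```yaml
# Fragment of a code scanning workflow job; values are illustrative.
strategy:
  fail-fast: false
  matrix:
    # One job run, and therefore one CodeQL database, per language.
    language: [java, javascript]
steps:
  - name: Initialize CodeQL
    uses: github/codeql-action/init@v2   # version is an assumption
    with:
      languages: ${{ matrix.language }}
```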

Query execution

After you’ve created a CodeQL database, one or more queries are executed against it. CodeQL queries are written in a specially designed object-oriented query language called QL.

You can run the queries checked out from the CodeQL repo (or custom queries that you’ve written yourself) using the CodeQL for VS Code extension or the CodeQL CLI.

Query results

The final step converts results produced during query execution into a form that is more meaningful in the context of the source code. That is, the results are interpreted in a way that highlights the potential issue that the queries are designed to find.

:::image type="content" source="../media/codeql-query-results.png" alt-text="Screenshot of CodeQL query results.":::

Queries contain metadata properties that indicate how the results should be interpreted. For instance, some queries display a simple message at a single location in the code. Others display a series of locations that represent steps along a data-flow or control-flow path, along with a message explaining the significance of the result. Queries that don’t have metadata are not interpreted—their results are output as a table and not displayed in the source code.

Following interpretation, results are output for code review and triaging. In CodeQL for Visual Studio Code, interpreted query results are automatically displayed in the source code. Results generated by the CodeQL CLI can be output into a number of different formats for use with different tools.

4-what-is-ql.md - existing unit

QL is a declarative, object-oriented query language that is optimized to enable efficient analysis of hierarchical data structures, in particular, databases representing software artifacts.

A database is an organized collection of data. The most commonly used database model is the relational model, which stores data in tables, and SQL (Structured Query Language) is the most commonly used query language for relational databases.

The purpose of a query language is to provide a programming platform where you can ask questions about information stored in a database. A database management system manages the storage and administration of data and provides the querying mechanism. A query typically refers to the relevant database entities and specifies various conditions (called predicates) that must be satisfied by the results. Query evaluation involves checking these predicates and generating the results. Some of the desirable properties of a good query language and its implementation include:

  • Declarative specifications - a declarative specification describes properties that the result must satisfy, rather than providing the procedure to compute the result. In the context of database query languages, declarative specifications abstract away the details of the underlying database management system and query processing techniques. This greatly simplifies query writing.
  • Expressiveness - a powerful query language allows you to write complex queries. This makes the language widely applicable.
  • Efficient execution - queries can be complex and databases can be very large, so it is crucial for a query language implementation to process and execute queries efficiently.

In this unit, you will learn about the basic features of the QL programming language so that you can write your own custom queries or better understand the pre-existing open source queries available.

The QL syntax

The syntax of QL is similar to SQL, but the semantics of QL are based on Datalog, a declarative logic programming language often used as a query language. This makes QL primarily a logic language, and all operations in QL are logical operations. Furthermore, QL inherits recursive predicates from Datalog, and adds support for aggregates, making even complex queries concise and simple. For example, consider a database containing parent-child relationships for people. If you want to find the number of descendants of a person, typically you would:

  1. Find a descendant of the given person, that is, a child or a descendant of a child.
  2. Count the number of descendants found using the previous step.

When you write this process in QL, it closely resembles the above structure. Notice that the example used recursion to find all descendants of the given person, and an aggregate to count the number of descendants. Translating these steps into the final query without adding any procedural details is possible due to the declarative nature of the language. The QL code would look something like this:

Person getADescendant(Person p) {
  result = p.getAChild() or
  result = getADescendant(p.getAChild())
}

int getNumberOfDescendants(Person p) {
  result = count(getADescendant(p))
}

Object orientation

Object orientation is an important feature of QL. The benefits of object orientation are well-known – it increases modularity, enables information hiding, and allows code reuse. QL offers all these benefits without compromising on its logical foundation. This is achieved by defining a simple object model where classes are modeled as predicates and inheritance as implication. The libraries made available for all supported languages make extensive use of classes and inheritance.

QL and general purpose programming languages

Here are a few prominent conceptual and functional differences between general purpose programming languages and QL:

  • QL does not have any imperative features such as assignments to variables or file system operations.
  • QL operates on sets of tuples and a query can be viewed as a complex sequence of set operations that defines the result of the query.
  • QL’s set-based semantics makes it very natural to process collections of values without having to worry about efficiently storing, indexing and traversing them.

In object-oriented programming languages, instantiating a class involves creating an object by allocating physical memory to hold the state of that instance of the class. In QL, classes are just logical properties describing sets of already existing values.

Exercise - new exercise needed - new unit

Knowledge check

Alternatively, quizzes in each unit

12-summary.md - edited existing unit

You are a senior developer responsible for implementing automated code vulnerability scanning at your company. You need to understand how code scanning with CodeQL works and how to configure it, so that you can help your entire organization adopt it.

You did some research on code scanning with CodeQL and found the following:

  • Code scanning with CodeQL uses a workflow file that specifies the location of queries, which languages to analyze, and whether they should be built with autobuild, or manual build steps
  • GitHub supports integration of third party scanning and alerting tools in the code scanning process
  • CodeQL has a CLI that allows you to create and analyze databases offline and then upload the results to GitHub using a SARIF file

Without using GitHub code scanning with CodeQL, it would be very difficult to automate both the scanning of your code and the generation of pull requests to fix the vulnerable code. In addition, CodeQL provides an extensive, growing library of queries in multiple languages that help you create more secure code with little engineering effort.

Successfully rolling out automated code vulnerability scanning across your organization has made developers more productive and the product at your company more secure.

References

@rmallorybpc
Collaborator

rmallorybpc commented Oct 21, 2022

Analyze code by using CodeQL

code-scanning-github-codeql-2 - to be changed to - analyze-code-using-codeql (or similar)

introduction - edit existing unit

Content to include
About CodeQL


Code scanning using CodeQL provides an extensible method to automate vulnerability scanning across your organization's GitHub repositories.

Imagine that you are a senior developer at a start-up company specializing in health care software. Your flagship product is a Java-based web portal that allows physicians to manage patient records. A recent penetration test of this product revealed a number of serious vulnerabilities that could compromise patient information. The CIO has asked you to implement automated code vulnerability scanning. Because your code is already hosted in a private repository on GitHub, you have decided to use the code scanning feature with CodeQL. You will need to understand how the feature works to persuade other developers and management to use the feature. You will also need to understand the various configuration options and how to implement and maintain a code scanning pipeline to assist other developers at your company in configuring and deploying code scanning correctly.

In this module, you will learn about the CodeQL static analysis tool and how the code scanning feature in GitHub uses it to automate vulnerability scanning. You will also learn how to customize a code scanning workflow that uses CodeQL, how to include additional queries, and how to adapt your workflow to repositories that have multiple languages.

Learning objectives

By the end of this module, you will be able to:

  • Understand CodeQL and how it analyzes code
  • Understand QL, a unique logic programming language
  • Set up CodeQL-based code scanning in a GitHub repository
  • Reference a custom CodeQL query
  • Configure the language matrix in a CodeQL workflow
  • Learn how to use the CodeQL CLI to generate code scanning results and upload them to GitHub
  • Implement custom build steps

Prerequisites

  • A GitHub enterprise account with a GitHub Advanced Security license
  • Necessary permissions to administrate your repository
  • Knowledge of GitHub Advanced Security's code scanning feature
  • Knowledge of GitHub Actions

Next up, you'll learn how CodeQL is used by developers.

Integration with code scanning

about integration with code scanning - content to include

5-code-scanning-codeql.md - existing unit

Depending on which tool you want to use for analysis and how you want to generate alerts, there are a few different options for setting up a code scanning workflow on your repository:

| Analysis tool | Alert generation |
| --- | --- |
| CodeQL | GitHub Actions |
| CodeQL | CodeQL in a third-party continuous integration (CI) system |
| Third-party | GitHub Actions |
| Third-party | Generated externally and then uploaded to GitHub |

In this unit, you will learn how to set up code scanning with GitHub Actions, as well as how to perform bulk setup of code scanning for multiple repositories.

Code scanning with GitHub Actions and CodeQL

To set up code scanning with GitHub Actions and CodeQL on a repository, do the following:

  1. Go to the Security tab of your repository.

    :::image type="content" source="../media/security-tab.png" alt-text="Screenshot of the Security tab.":::

  2. To the right of Code scanning alerts, click Set up code scanning. If Code scanning alerts is missing, you need to enable GitHub Advanced Security.

  3. Under Get started with code scanning, click Set up this workflow on the CodeQL analysis workflow or on a third-party workflow.

    [!Note]
    Workflows are only displayed if they are relevant for the programming languages detected in the repository. The CodeQL analysis workflow is always displayed, but the "Set up this workflow" button is only enabled if CodeQL analysis supports the languages present in the repository.

  4. To customize how code scanning scans your code, edit the workflow. Generally you can commit the CodeQL analysis workflow without making any changes to it. However, many of the third-party workflows require additional configuration, so read the comments in the workflow before committing.

  5. Use the Start commit drop-down, and type a commit message.

  6. Choose whether you'd like to commit directly to the default branch, or create a new branch and start a pull request.

  7. Click Commit new file or Propose new file.

In the default CodeQL analysis workflow, code scanning is configured to analyze your code each time you either push a change to the default branch or any protected branches, or raise a pull request against the default branch. As a result, code scanning will now commence.

The on:pull_request and on:push triggers for code scanning are each useful for different purposes.
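
To show how these triggers fit into a complete file, here is a minimal sketch of a CodeQL analysis workflow. It is not the exact file GitHub generates: the file name, the default branch name main, the language, and the action versions are all assumptions for illustration.

```yaml
# .github/workflows/codeql-analysis.yml (illustrative sketch)
name: CodeQL

on:
  # Analyze every push to the default branch (assumed to be main).
  push:
    branches: [main]
  # Analyze pull requests that target the default branch.
  pull_request:
    branches: [main]

jobs:
  analyze:
    runs-on: ubuntu-latest
    permissions:
      security-events: write   # needed to upload code scanning results
      contents: read
      actions: read
    steps:
      - name: Check out the repository
        uses: actions/checkout@v3
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v2
        with:
          languages: java
      - name: Build the code automatically
        uses: github/codeql-action/autobuild@v2
      - name: Perform CodeQL analysis
        uses: github/codeql-action/analyze@v2
```

The push trigger keeps the alerts for the default branch current, while the pull_request trigger surfaces new alerts before a change is merged.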

Bulk setup of code scanning

You can set up code scanning in many repositories at once using a script. If you'd like to use a script to raise pull requests that add a GitHub Actions workflow to multiple repositories, see the jhutchings1/Create-ActionsPRs repository for an example using PowerShell, or nickliffen/ghas-enablement for an example using NodeJS.

9-use-codeql-cli.md - existing unit

more content if needed
Using the CodeQL CLI


In addition to the graphical user interface on GitHub.com, you can also access many of the same primary CodeQL features through a command line interface.

This unit will cover using the CodeQL CLI to create databases, analyze databases and upload the results to GitHub.

CodeQL CLI commands

Once you've made the CodeQL CLI available to servers in your CI system, and ensured that they can authenticate with GitHub, you're ready to generate data.

You use three different commands to generate results and upload them to GitHub:

  • database create to create a CodeQL database to represent the hierarchical structure of each supported programming language in the repository.
  • database analyze to run queries to analyze each CodeQL database and summarize the results in a SARIF file.
  • github upload-results to upload the resulting SARIF files to GitHub where the results are matched to a branch or pull request and displayed as code scanning alerts.

You can display the command-line help for any command using the --help option.

Uploading SARIF data to display as code scanning results in GitHub is supported for organization-owned repositories with GitHub Advanced Security enabled, and public repositories on GitHub.com.

Create CodeQL databases to analyze

Follow the steps below to create CodeQL databases to analyze:

  1. Check out the code that you want to analyze:
    • For a branch, check out the head of the branch that you want to analyze.
    • For a pull request, check out either the head commit of the pull request, or check out a GitHub-generated merge commit of the pull request.
  2. Set up the environment for the codebase, making sure that any dependencies are available.
  3. Find the build command, if any, for the codebase. Typically this is available in a configuration file in the CI system.
  4. Run codeql database create from the checkout root of your repository and build the codebase:
    • To create one CodeQL database for a single supported language, use the following command:

      codeql database create <database> --command=<build> --language=<language-identifier>
    • To create one CodeQL database per language for multiple supported languages, use the following command:

      codeql database create <database> --command=<build> \
        --db-cluster --language=<language-identifier>,<language-identifier>

Note

If you use a containerized build, you need to run the CodeQL CLI inside the container where your build task takes place.

The full list of parameters for the database create command is shown in the table below.

| Option | Required | Usage |
| --- | --- | --- |
| <database> | Yes | Specify the name and location of a directory to create for the CodeQL database. The command will fail if you try to overwrite an existing directory. If you also specify --db-cluster, this is the parent directory and a subdirectory is created for each language analyzed. |
| --language | Yes | Specify the identifier for the language to create a database for, one of: cpp, csharp, go, java, javascript, python, and ruby (use javascript to analyze TypeScript code). When used with --db-cluster, the option accepts a comma-separated list, or can be specified more than once. |
| --command | | Recommended. Use to specify the build command or script that invokes the build process for the codebase. Commands are run from the current folder or, where it is defined, from --source-root. Not needed for Python and JavaScript/TypeScript analysis. |
| --db-cluster | | Optional. Use in multi-language codebases to generate one database for each language specified by --language. |
| --no-run-unnecessary-builds | | Recommended. Use to suppress the build command for languages where the CodeQL CLI does not need to monitor the build (for example, Python and JavaScript/TypeScript). |
| --source-root | | Optional. Use if you run the CLI outside the checkout root of the repository. By default, the database create command assumes that the current directory is the root directory for the source files; use this option to specify a different location. |

Single language example

This example creates a CodeQL database for the repository checked out at /checkouts/example-repo. It uses the JavaScript extractor to create a hierarchical representation of the JavaScript and TypeScript code in the repository. The resulting database is stored in /codeql-dbs/example-repo.

$ codeql database create /codeql-dbs/example-repo --language=javascript \
    --source-root /checkouts/example-repo

> Initializing database at /codeql-dbs/example-repo.
> Running command [/codeql-home/codeql/javascript/tools/autobuild.cmd]
    in /checkouts/example-repo.
> [build-stdout] Single-threaded extraction.
> [build-stdout] Extracting
...
> Finalizing database at /codeql-dbs/example-repo.
> Successfully created database at /codeql-dbs/example-repo.

Multiple languages example

This example creates two CodeQL databases for the repository checked out at /checkouts/example-repo-multi. It uses:

  • --db-cluster to request analysis of more than one language.
  • --language to specify which languages to create databases for.
  • --command to tell the tool the build command for the codebase, here make.
  • --no-run-unnecessary-builds to tell the tool to skip the build command for languages where it is not needed (like Python).

The resulting databases are stored in python and cpp subdirectories of /codeql-dbs/example-repo-multi.

$ codeql database create /codeql-dbs/example-repo-multi \
    --db-cluster --language python,cpp \
    --command make --no-run-unnecessary-builds \
    --source-root /checkouts/example-repo-multi
Initializing databases at /codeql-dbs/example-repo-multi.
Running build command: [make]
[build-stdout] Calling python3 /codeql-bundle/codeql/python/tools/get_venv_lib.py
[build-stdout] Calling python3 -S /codeql-bundle/codeql/python/tools/python_tracer.py -v -z all -c /codeql-dbs/example-repo-multi/python/working/trap_cache -p ERROR: 'pip' not installed.
[build-stdout] /usr/local/lib/python3.6/dist-packages -R /checkouts/example-repo-multi
[build-stdout] [INFO] Python version 3.6.9
[build-stdout] [INFO] Python extractor version 5.16
[build-stdout] [INFO] [2] Extracted file /checkouts/example-repo-multi/hello.py in 5ms
[build-stdout] [INFO] Processed 1 modules in 0.15s
[build-stdout] <output from calling 'make' to build the C/C++ code>
Finalizing databases at /codeql-dbs/example-repo-multi.
Successfully created databases at /codeql-dbs/example-repo-multi.
$
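The two examples above differ only in which options are passed, so the combination can be sketched as a small Python helper that assembles the argument list for codeql database create. The helper name database_create_args is hypothetical, and adding --db-cluster whenever more than one language is requested is an assumption made for this sketch rather than documented CLI behavior.

```python
from typing import List, Optional


def database_create_args(
    database: str,
    languages: List[str],
    command: Optional[str] = None,
    source_root: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for `codeql database create` (illustrative only)."""
    if not languages:
        raise ValueError("at least one language identifier is required")
    args = ["codeql", "database", "create", database]
    if len(languages) > 1:
        # Assumption: request one database per language when several are given.
        args.append("--db-cluster")
    args.append("--language=" + ",".join(languages))
    if command is not None:
        # Pass the build command and skip it for languages that do not need a build.
        args += ["--command=" + command, "--no-run-unnecessary-builds"]
    if source_root is not None:
        args += ["--source-root", source_root]
    return args


if __name__ == "__main__":
    print(" ".join(database_create_args(
        "/codeql-dbs/example-repo-multi",
        ["python", "cpp"],
        command="make",
        source_root="/checkouts/example-repo-multi",
    )))
```

Returning a list instead of a single string means the result can be handed to a process launcher without shell quoting concerns.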

Analyze a CodeQL database

After creating your CodeQL database, follow the steps below to analyze it:

  1. Optionally run codeql pack download <packs> to download any CodeQL packs (beta) that you want to run during analysis.
  2. Run codeql database analyze on the database and specify which packs and/or queries to use.
      codeql database analyze <database> --format=<format> \
          --output=<output> <packs,queries>

Note

If you analyze more than one CodeQL database for a single commit, you must specify a SARIF category for each set of results generated by this command. When you upload the results to GitHub, code scanning uses this category to store the results for each language separately. If you forget to do this, each upload overwrites the previous results.

codeql database analyze <database> --format=<format> \
    --sarif-category=<language-specifier> --output=<output> \
    <packs,queries>

The full list of parameters for the database analyze command is shown in the table below.

| Option | Required | Usage |
| --- | --- | --- |
| <database> | Yes | Specify the path for the directory that contains the CodeQL database to analyze. |
| <packs,queries> | | Specify CodeQL packs or queries to run. To run the standard queries used for code scanning, omit this parameter. To see the other query suites included in the CodeQL CLI bundle, look in /<extraction-root>/codeql/qlpacks/codeql-<language>/codeql-suites. For information about creating your own query suite, see Creating CodeQL query suites in the documentation for the CodeQL CLI. |
| --format | Yes | Specify the format for the results file generated by the command. For upload to GitHub this should be: sarif-latest. |
| --output | Yes | Specify where to save the SARIF results file. |
| --sarif-category | | Optional for single database analysis. Required to define the language when you analyze multiple databases for a single commit in a repository. Specify a category to include in the SARIF results file for this analysis. A category is used to distinguish multiple analyses for the same tool and commit, but performed on different languages or different parts of the code. |
| --sarif-add-query-help | | Optional. Use if you want to include any available markdown-rendered query help for custom queries used in your analysis. Any query help for custom queries included in the SARIF output will be displayed in the code scanning UI if the relevant query generates an alert. |
| <packs> | | Optional. Use if you have downloaded CodeQL query packs and want to run the default queries or query suites specified in the packs. |
| --threads | | Optional. Use if you want to use more than one thread to run queries. The default value is 1. You can specify more threads to speed up query execution. To set the number of threads to the number of logical processors, specify 0. |
| --verbose | | Optional. Use to get more detailed information about the analysis process and diagnostic data from the database creation process. |

Basic example

This example analyzes a CodeQL database stored at /codeql-dbs/example-repo and saves the results as a SARIF file: /temp/example-repo-js.sarif. It uses --sarif-category to include extra information in the SARIF file that identifies the results as JavaScript. This is essential when you have more than one CodeQL database to analyze for a single commit in a repository.

$ codeql database analyze /codeql-dbs/example-repo \
    javascript-code-scanning.qls --sarif-category=javascript \
    --format=sarif-latest --output=/temp/example-repo-js.sarif

> Running queries.
> Compiling query plan for /codeql-home/codeql/qlpacks/
    codeql-javascript/AngularJS/DisablingSce.ql.
...
> Shutting down query evaluator.
> Interpreting results.
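When a database cluster holds several languages, each analysis needs its own SARIF category. The Python sketch below shows one way to generate a codeql database analyze argument list per language. It assumes that each database lives in a subdirectory named after its language (as described for --db-cluster) and that the language identifier is an acceptable category value; the function name and the output file naming are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def analyze_commands(
    cluster_dir: str, languages: List[str], output_dir: str
) -> Dict[str, List[str]]:
    """Build one `codeql database analyze` argument list per language."""
    commands = {}
    for language in languages:
        # Assumption: --db-cluster created one subdirectory per language.
        database = str(PurePosixPath(cluster_dir) / language)
        output = str(PurePosixPath(output_dir) / f"{language}.sarif")
        commands[language] = [
            "codeql", "database", "analyze", database,
            "--format=sarif-latest",
            # A distinct category keeps one upload from overwriting another.
            f"--sarif-category={language}",
            f"--output={output}",
        ]
    return commands


if __name__ == "__main__":
    result = analyze_commands("/codeql-dbs/example-repo-multi", ["python", "cpp"], "/temp")
    for language, args in result.items():
        print(language, " ".join(args))
```

No packs or queries are listed in the generated commands, so each analysis would run the standard code scanning queries.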

Upload results to GitHub

SARIF upload supports a maximum of 5,000 results per upload. Any results over this limit are ignored. If a tool generates too many results, you should update the configuration to focus on results for the most important rules or queries.

For each upload, SARIF upload supports a maximum size of 10 MB for the gzip-compressed SARIF file. Any uploads over this limit will be rejected. If your SARIF file is too large because it contains too many results, you should update the configuration to focus on results for the most important rules or queries.
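Both limits can be checked before an upload is attempted. The following Python sketch counts the results in a SARIF document and measures its gzip-compressed size. It assumes that the 5,000-result limit applies to the total across all runs in the file and that 10 MB means 10 × 1024 × 1024 bytes; the function name check_sarif_limits is hypothetical.

```python
import gzip
import json
from typing import List

MAX_RESULTS = 5_000
MAX_GZIP_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB


def check_sarif_limits(sarif_text: str) -> List[str]:
    """Return a list of limit violations for a SARIF document (empty if none)."""
    problems = []
    sarif = json.loads(sarif_text)
    # Assumption: results are counted across every run in the file.
    result_count = sum(len(run.get("results", [])) for run in sarif.get("runs", []))
    if result_count > MAX_RESULTS:
        problems.append(f"{result_count} results exceeds the limit of {MAX_RESULTS}")
    compressed_size = len(gzip.compress(sarif_text.encode("utf-8")))
    if compressed_size > MAX_GZIP_BYTES:
        problems.append(f"{compressed_size} compressed bytes exceeds {MAX_GZIP_BYTES}")
    return problems


if __name__ == "__main__":
    sample = json.dumps({"version": "2.1.0", "runs": [{"results": [{}, {}]}]})
    print(check_sarif_limits(sample))
```

A CI job could run a check like this after codeql database analyze and fail early with a clear message when a limit is exceeded.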

Before you can upload results to GitHub, you must determine the best way to pass the GitHub App or personal access token you created earlier to the CodeQL CLI. We recommend that you review your CI system's guidance on the secure use of a secret store. The CodeQL CLI supports:

  • Passing the token to the CLI via standard input using the --github-auth-stdin option (recommended).
  • Saving the secret in the environment variable GITHUB_TOKEN and running the CLI without including the --github-auth-stdin option.

When you have decided on the most secure and reliable method for your CI server, run codeql github upload-results on each SARIF results file and include --github-auth-stdin unless the token is available in the environment variable GITHUB_TOKEN.

echo "$UPLOAD_TOKEN" | codeql github upload-results --repository=<repository-name> \
      --ref=<ref> --commit=<commit> --sarif=<file> \
      --github-auth-stdin

The full list of parameters for the github upload-results command is shown in the table below.

| Option | Required | Usage |
| --- | --- | --- |
| --repository | Yes | Specify the OWNER/NAME of the repository to upload data to. The owner must be an organization within an enterprise that has a license for GitHub Advanced Security and GitHub Advanced Security must be enabled for the repository, unless the repository is public. |
| --ref | Yes | Specify the name of the ref you checked out and analyzed so that the results can be matched to the correct code. For a branch use: refs/heads/BRANCH-NAME, for the head commit of a pull request use refs/pull/NUMBER/head, or for the GitHub-generated merge commit of a pull request use refs/pull/NUMBER/merge. |
| --commit | Yes | Specify the full SHA of the commit you analyzed. |
| --sarif | Yes | Specify the SARIF file to load. |
| --github-auth-stdin | | Optional. Use to pass the CLI the GitHub App or personal access token created for authentication with GitHub's REST API via standard input. This is not needed if the command has access to a GITHUB_TOKEN environment variable set with this token. |
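The same upload can be driven from a script. The Python sketch below builds the codeql github upload-results argument list and writes the token to the child process's standard input, so the token never appears in the argument list. The function names and the repository and commit values are hypothetical, and the run parameter exists only so the launcher can be swapped out, for example in a dry run.

```python
import subprocess
from typing import Callable, List


def upload_results_args(repository: str, ref: str, commit: str, sarif: str) -> List[str]:
    """Build the argument list for `codeql github upload-results`."""
    return [
        "codeql", "github", "upload-results",
        f"--repository={repository}",
        f"--ref={ref}",
        f"--commit={commit}",
        f"--sarif={sarif}",
        "--github-auth-stdin",
    ]


def upload_results(args: List[str], token: str, run: Callable = subprocess.run):
    """Run the upload, passing the token on standard input rather than in argv."""
    return run(args, input=token, text=True, check=True)


if __name__ == "__main__":
    # Hypothetical values; printing only, so nothing is uploaded.
    print(upload_results_args(
        "octo-org/example-repo",
        "refs/heads/main",
        "0123456789abcdef0123456789abcdef01234567",
        "/temp/example-repo-js.sarif",
    ))
```

In a CI system the token would typically be read from the secret store and passed to upload_results, mirroring the echo pipeline shown earlier.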

Hardware resources for running CodeQL - new unit

Recommended hardware resources for running CodeQL
System requirements

Exercise - new exercise needed - new unit

Content needed

Knowledge check

Alternatively, quizzes in each unit

summary.md - edited, revised new unit

You are a senior developer responsible for implementing automated code vulnerability scanning at your company. You need to understand how code scanning with CodeQL works and how to configure it, so that you can help your entire organization adopt it.

You did some research on code scanning with CodeQL and found the following:

  • Code scanning with CodeQL uses a workflow file that specifies the location of queries, which languages to analyze, and whether they should be built with autobuild, or manual build steps
  • GitHub supports integration of third party scanning and alerting tools in the code scanning process
  • CodeQL has a CLI that allows you to create and analyze databases offline and then upload the results to GitHub using a SARIF file

Without GitHub code scanning with CodeQL, it would be very difficult to automate both the scanning of your code and the generation of pull requests to fix the vulnerable code. In addition, CodeQL provides an extensive, growing library of queries in multiple languages that help you create more secure code with little engineering effort.

Successfully rolling out automated code vulnerability scanning across your organization has made developers more productive and the product at your company more secure.

References

@rmallorybpc
Collaborator

rmallorybpc commented Oct 21, 2022

Customize Code Scanning with GitHub CodeQL

customize-code-scanning-codeql

introduction.md - edited, revised new unit

Code scanning using CodeQL provides an extensible method to automate vulnerability scanning across your organization's GitHub repositories.

Imagine that you are a senior developer at a start-up company specializing in health care software. Your flagship product is a Java-based web portal that allows physicians to manage patient records. A recent penetration test of this product revealed a number of serious vulnerabilities that could compromise patient information. The CIO has asked you to implement automated code vulnerability scanning. Because your code is already hosted in a private repository on GitHub, you have decided to use the code scanning feature with CodeQL. You will need to understand how the feature works to persuade other developers and management to use the feature. You will also need to understand the various configuration options and how to implement and maintain a code scanning pipeline to assist other developers at your company in configuring and deploying code scanning correctly.

In this module, you will learn about the CodeQL static analysis tool and how the code scanning feature in GitHub uses it to automate vulnerability scanning. You will also learn how to customize a code scanning workflow that uses CodeQL, how to include additional queries, and how to adapt your workflow to repositories that have multiple languages.

Learning objectives

By the end of this module, you will be able to:

  • Understand CodeQL and how it analyzes code
  • Understand QL, a unique logic programming language
  • Set up CodeQL-based code scanning in a GitHub repository
  • Reference a custom CodeQL query
  • Configure the language matrix in a CodeQL workflow
  • Use the CodeQL CLI to generate code scanning results and upload them to GitHub
  • Implement custom build steps

Prerequisites

  • A GitHub enterprise account with a GitHub Advanced Security license
  • Necessary permissions to administer your repository
  • Knowledge of GitHub Advanced Security's code scanning feature
  • Knowledge of GitHub Actions

Next up, you'll learn how CodeQL is used by developers.

Configuring the CodeQL workflow for compiled languages - new unit

Configuring the CodeQL workflow for compiled languages
QL language specification

6-customize-your-scanning-workflow-with-codeql.md - existing unit

Code scanning workflows that use CodeQL have various configuration options that can be adjusted to better suit the needs of your organization.

When you use CodeQL to scan code, the CodeQL analysis engine generates a database from the code and runs queries on it. CodeQL analysis uses a default set of queries, but you can specify more queries to run, in addition to the default queries.

You can run extra queries if they are part of a CodeQL pack (beta) published to the GitHub Container registry or a QL pack stored in a repository.

There are two options for specifying which queries you want to run with CodeQL code scanning:

  • Using your code scanning workflow
  • Using a custom configuration file

In this unit, you will learn how to edit a workflow file to reference additional queries, how to use queries from query packs and how to combine queries from a workflow file and a custom configuration file.

Specify additional queries in a workflow file

The options available to specify the additional queries you want to run are:

  • packs to install one or more CodeQL query packs (beta) and run the default query suite or queries for those packs.
  • queries to specify a single .ql file, a directory containing multiple .ql files, a .qls query suite definition file, or any combination.

You can use both packs and queries in the same workflow.

We don't recommend referencing query suites directly from the github/codeql repository, like github/codeql/cpp/ql/src@main. Such queries may not be compiled with the same version of CodeQL as used for your other queries, which could lead to errors during analysis.

Use CodeQL query packs

Note

The CodeQL package management functionality, including CodeQL packs, is currently in beta and subject to change.

To add one or more CodeQL query packs (beta), add a with: packs: entry within the uses: github/codeql-action/init@v1 section of the workflow. Within packs you specify one or more packages to use and, optionally, which version to download. Where you don't specify a version, the latest version is downloaded. If you want to use packages that are not publicly available, you need to set the GITHUB_TOKEN environment variable to a secret that has access to the packages.

In the example below, scope is the organization or personal account that published the package. When the workflow runs, the three CodeQL query packs are downloaded from GitHub and the default queries or query suite for each pack run. The latest version of pack1 is downloaded as no version is specified. Version 1.2.3 of pack2 is downloaded, as well as the latest version of pack3 that is compatible with version 1.2.3.

- uses: github/codeql-action/init@v1
  with:
    # Comma-separated list of packs to download
    packs: scope/pack1,scope/pack2@1.2.3,scope/pack3@~1.2.3

Note

For workflows that generate CodeQL databases for multiple languages, you must instead specify the CodeQL query packs in a configuration file.
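To make the comma-separated packs format concrete, the Python sketch below splits such a value into scope, pack name, and optional version. It is only an illustration of the syntax described above, not the parsing logic used by the CodeQL action; the PackSpec type and parse_packs function are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class PackSpec(NamedTuple):
    scope: str
    name: str
    version: Optional[str]  # None means "download the latest version"


def parse_packs(value: str) -> List[PackSpec]:
    """Split a comma-separated packs value into its parts (illustrative only)."""
    specs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        full_name, _, version = item.partition("@")
        scope, separator, name = full_name.partition("/")
        if not separator or not scope or not name:
            raise ValueError(f"expected scope/name, got {item!r}")
        specs.append(PackSpec(scope, name, version or None))
    return specs


if __name__ == "__main__":
    for spec in parse_packs("scope/pack1,scope/pack2@1.2.3,scope/pack3@~1.2.3"):
        print(spec)
```

The version string is kept verbatim, so a range such as ~1.2.3 is passed through without being interpreted.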

Use queries in QL packs

To add one or more queries, add a with: queries: entry within the uses: github/codeql-action/init@v1 section of the workflow. If the queries are in a private repository, use the external-repository-token parameter to specify a token that has access to check out the private repository.

- uses: github/codeql-action/init@v1
  with:
    queries: COMMA-SEPARATED LIST OF PATHS
    # Optional. Provide a token to access queries stored in private repositories.
    external-repository-token: ${{ secrets.ACCESS_TOKEN }}

You can also specify query suites in the value of queries. Query suites are collections of queries, usually grouped by purpose or language.

The following query suites are built into CodeQL code scanning and are available for use.

| Query suite | Description |
| --- | --- |
| code-scanning | Queries run by default in CodeQL code scanning on GitHub. |
| security-extended | Queries of lower severity and precision than the default queries. |
| security-and-quality | Queries from security-extended, plus maintainability and reliability queries. |

When you specify a query suite, the CodeQL analysis engine will run the queries contained within the suite for you, in addition to the default set of queries.

Combine queries from a workflow file and a custom configuration file

If you also use a configuration file for custom settings, any additional packs or queries specified in your workflow are used instead of those specified in the configuration file. If you want to run the combined set of additional packs or queries, prefix the value of packs or queries in the workflow with the + symbol.

In the following example, the + symbol ensures that the specified additional packs and queries are used together with any specified in the referenced configuration file.

- uses: github/codeql-action/init@v1
  with:
    config-file: ./.github/codeql/codeql-config.yml
    queries: +security-and-quality,octo-org/python-qlpack/show_ifs.ql@main
    packs: +scope/pack1,scope/pack2@1.2.3
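The override-or-combine rule can be modeled in a few lines of Python. In this sketch, a workflow value without the + prefix replaces the configuration file's entries, while a value with the prefix is merged with them. The function name is hypothetical, and the order of the merged list (configuration file entries first) is an assumption made for the example.

```python
from typing import List, Optional


def effective_entries(workflow_value: Optional[str], config_entries: List[str]) -> List[str]:
    """Model how a workflow `queries` or `packs` value interacts with a config file."""
    if workflow_value is None or not workflow_value.strip():
        # Nothing set in the workflow: the configuration file applies as-is.
        return list(config_entries)
    value = workflow_value.strip()
    combine = value.startswith("+")
    if combine:
        value = value[1:]
    workflow_entries = [item.strip() for item in value.split(",") if item.strip()]
    if combine:
        # Assumption: config entries are listed before workflow entries.
        return list(config_entries) + workflow_entries
    return workflow_entries


if __name__ == "__main__":
    config = ["./my-queries"]
    print(effective_entries("security-and-quality", config))
    print(effective_entries("+security-and-quality", config))
```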

7-exercise-reference-codeql-query.md - existing unit

This exercise checks your knowledge on referencing a CodeQL query in a CodeQL workflow.

This GitHub exercise is graded automatically once you have attempted
a solution to the challenge. The results of your actions, as well as
helpful feedback, are provided in real-time within the grade-learner workflow logs.

Here are some helpful tips before you begin the exercise:

  • Read the About this exercise section in the exercise's
    repository README to understand how the exercise works.
  • Follow the steps provided in the Instructions
    section to successfully complete the exercise.
  • To see the results of your exercise, navigate to the Actions
    tab of your cloned repository and click on the most recent run on the Grading workflow.
  • Stuck on what to do? Revisit the content in the last unit or
    check out the Useful resources section in
    the exercise's repository README for some additional resources.

Note

A grading script exists under .github/workflows/grading.yml.
You do not need to modify this workflow to complete this exercise.
Altering the contents in this workflow can break the exercise's
ability to validate your actions, provide feedback, or grade the results.

This exercise is a challenge based on content covered in this module.
It may take several attempts to complete the exercise. You can revisit
previous content in this module, or navigate to some of the additional resources provided, as many times as you want to find the solution.

When you've finished the exercise in GitHub, return here for the next part on customizing your scanning workflow with CodeQL.

[!div class="nextstepaction"]
Start the exercise on GitHub

8-customize-your-scanning-workflow-with-codeql-2.md - existing unit

Code scanning workflows that use CodeQL have various configuration options that can be adjusted to better suit the needs of your organization.

In this unit, you will learn how to reference additional queries in a custom configuration file.

Additional queries in a custom configuration file

A custom configuration file is an alternative way to specify additional packs and queries to run. You can also use the file to disable the default queries and to specify which directories to scan during analysis.

In the workflow file, use the config-file parameter of the init action to specify the path to the configuration file you want to use. This example loads the configuration file ./.github/codeql/codeql-config.yml.

- uses: github/codeql-action/init@v1
  with:
    config-file: ./.github/codeql/codeql-config.yml

The configuration file can be located within the repository you are analyzing, or in an external repository. Using an external repository allows you to specify configuration options for multiple repositories in a single place. When you reference a configuration file located in an external repository, you can use the OWNER/REPOSITORY/FILENAME@BRANCH syntax. For example, octo-org/shared/codeql-config.yml@main.
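The external reference syntax can be illustrated with a short Python sketch that splits a value into its owner, repository, file name, and branch. It assumes that any value containing an @ is an external reference and that anything else is a path in the current repository; the real action may decide this differently. The type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class ConfigReference(NamedTuple):
    owner: str
    repository: str
    filename: str
    branch: str


def parse_config_reference(value: str) -> Optional[ConfigReference]:
    """Parse OWNER/REPOSITORY/FILENAME@BRANCH, or return None for a local path."""
    if "@" not in value:
        # Assumption: no '@' means a path inside the repository being analyzed.
        return None
    path, _, branch = value.rpartition("@")
    parts = path.split("/", 2)
    if len(parts) != 3 or not all(parts) or not branch:
        raise ValueError(f"expected OWNER/REPOSITORY/FILENAME@BRANCH, got {value!r}")
    return ConfigReference(parts[0], parts[1], parts[2], branch)


if __name__ == "__main__":
    print(parse_config_reference("octo-org/shared/codeql-config.yml@main"))
    print(parse_config_reference("./.github/codeql/codeql-config.yml"))
```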

If the configuration file is located in an external private repository, use the external-repository-token parameter of the init action to specify a token that has access to the private repository.

- uses: github/codeql-action/init@v1
  with:
    external-repository-token: ${{ secrets.ACCESS_TOKEN }}

The settings in the configuration file are written in YAML format.

Specify CodeQL query packs in custom configuration files

Note

The CodeQL package management functionality, including CodeQL packs, is currently in beta and subject to change.

You specify CodeQL query packs in an array. Note that the format is different from the format used by the workflow file.

packs:
  # Use the latest version of 'pack1' published by 'scope'
  - scope/pack1
  # Use version 1.2.3 of 'pack2'
  - scope/pack2@1.2.3
  # Use the latest version of 'pack3' compatible with 1.2.3
  - scope/pack3@~1.2.3

If you have a workflow that generates more than one CodeQL database, you can specify any CodeQL query packs to run in a custom configuration file using a nested map of packs.

packs:
  # Use these packs for JavaScript analysis
  javascript:
    - scope/js-pack1
    - scope/js-pack2
  # Use these packs for Java analysis
  java:
    - scope/java-pack1
    - scope/java-pack2@1.0.0

Specify additional queries in a custom configuration

You specify additional queries in a queries array. Each element of the array contains a uses parameter with a value that identifies a single query file, a directory containing query files, or a query suite definition file.

queries:
  - uses: ./my-basic-queries/example-query.ql
  - uses: ./my-advanced-queries
  - uses: ./query-suites/my-security-queries.qls
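The three kinds of uses value in this example can be told apart by file extension. The Python sketch below classifies a value as a single query, a query suite, or a directory. It is a simplification for illustration: the function name is hypothetical, and names of built-in suites such as security-extended would need separate handling.

```python
def classify_uses(value: str) -> str:
    """Classify a `uses` value by its file extension (illustrative only)."""
    # Drop an optional @ref suffix, as used for queries in external repositories.
    path = value.rsplit("@", 1)[0]
    if path.endswith(".qls"):
        return "query suite"
    if path.endswith(".ql"):
        return "single query"
    # Assumption: anything else is a directory of query files.
    return "directory"


if __name__ == "__main__":
    for uses in (
        "./my-basic-queries/example-query.ql",
        "./my-advanced-queries",
        "./query-suites/my-security-queries.qls",
    ):
        print(uses, "->", classify_uses(uses))
```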

Optionally, you can give each array element a name, as shown in the example configuration file below.

name: "My CodeQL config"

disable-default-queries: true

queries:
  - name: Use an in-repository QL pack (run queries in the my-queries directory)
    uses: ./my-queries
  - name: Use an external JavaScript QL pack (run queries from an external repo)
    uses: octo-org/javascript-qlpack@main
  - name: Use an external query (run a single query from an external QL pack)
    uses: octo-org/python-qlpack/show_ifs.ql@main
  - name: Use a query suite file (run queries from a query suite in this repo)
    uses: ./codeql-qlpacks/complex-python-qlpack/rootAndBar.qls

paths:
  - src
paths-ignore:
  - src/node_modules
  - '**/*.test.js'

Disable the default queries

If you only want to run custom queries, you can disable the default security queries by using disable-default-queries: true. This flag should also be used if you are trying to construct a custom query suite that excludes a particular rule. This is to avoid having all of the queries run twice.

Specify directories to scan

For the interpreted languages that CodeQL supports (Python, Ruby and JavaScript/TypeScript), you can restrict code scanning to files in specific directories by adding a paths array to the configuration file. You can exclude the files in specific directories from analysis by adding a paths-ignore array.

paths:
  - src
paths-ignore:
  - src/node_modules
  - '**/*.test.js'

Note

  • The paths and paths-ignore keywords, used in the context of the code scanning configuration file, should not be confused with the same keywords when used for on.<push|pull_request>.paths in a workflow. When they are used to modify on.<push|pull_request> in a workflow, they determine whether the actions will be run when someone modifies code in the specified directories.
  • The filter pattern characters ?, +, [, ], and ! are not supported and will be matched literally.
  • ** characters can only be at the start or end of a line, or surrounded by slashes, and you can't mix ** and other characters. For example, foo/**, **/foo, and foo/**/bar are all allowed syntax, but **foo isn't. However, you can use single stars along with other characters, as shown in the example. You'll need to quote anything that contains a * character.
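The pattern rules in this note lend themselves to a quick pre-flight check. The Python sketch below flags the unsupported filter characters and any path segment that mixes ** with other characters. It is a simplified reading of the rules above, not the validation performed by code scanning itself, and the function name is hypothetical.

```python
from typing import List

UNSUPPORTED_CHARACTERS = set("?+[]!")


def check_path_pattern(pattern: str) -> List[str]:
    """Return warnings for a paths or paths-ignore pattern (empty if none)."""
    warnings = []
    literal = sorted(set(pattern) & UNSUPPORTED_CHARACTERS)
    if literal:
        warnings.append("matched literally: " + "".join(literal))
    for segment in pattern.split("/"):
        # '**' must stand alone between slashes or at either end of the pattern.
        if "**" in segment and segment != "**":
            warnings.append(f"'**' mixed with other characters in {segment!r}")
    return warnings


if __name__ == "__main__":
    for pattern in ("foo/**", "**/foo", "foo/**/bar", "**foo", "**/*.test.js"):
        print(pattern, check_path_pattern(pattern))
```

Single stars combined with other characters, as in **/*.test.js, pass the check because only the ** token is restricted.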

For compiled languages, if you want to limit code scanning to specific directories in your project, you must specify appropriate build steps in the workflow. The commands you need to use to exclude a directory from the build will depend on your build system.

You can quickly analyze small portions of a monorepo when you modify code in specific directories. You'll need to both exclude directories in your build steps and use the paths-ignore and paths keywords for on.<push|pull_request> in your workflow.

11-exercise-configure-language-matrix.md - existing unit

This exercise checks your knowledge on configuring the language matrix in a CodeQL workflow.

This GitHub exercise is graded automatically once you have attempted
a solution to the challenge. The results of your actions, as well as
helpful feedback, are provided in real-time within the grade-learner workflow logs.

Here are some helpful tips before you begin the exercise:

  • Read the About this exercise section in the exercise's
    repository README to understand how the exercise works.
  • Follow the steps provided in the Instructions
    section to successfully complete the exercise.
  • To see the results of your exercise, navigate to the Actions
    tab of your cloned repository and click on the most recent run on the Grading workflow.
  • Stuck on what to do? Revisit the content in the last unit or
    check out the Useful resources section in
    the exercise's repository README for some additional resources.

Note

A grading script exists under .github/workflows/grading.yml.
You do not need to modify this workflow to complete this exercise.
Altering the contents in this workflow can break the exercise's
ability to validate your actions, provide feedback, or grade the results.

This exercise is a challenge based on content covered in this module.
It may take several attempts to complete the exercise. You can revisit
previous content in this module, or navigate to some of the additional resources provided, as many times as you want to find the solution.

When you've finished the exercise in GitHub, return here for:

[!div class="checklist"]

10-custom-build-steps-for-code-scanning.md - existing unit

CodeQL code scanning supports many languages by default with an autobuild feature. If your code uses a non-standard build process, however, you may need to customize your workflow with custom build steps.

This unit will describe how to change the languages analyzed by code scanning and how to add custom build steps to a CodeQL code scanning workflow.

Change the languages that are analyzed

CodeQL code scanning automatically detects code written in the following supported languages: C/C++, C#, Go, Java, JavaScript/TypeScript, Python, and Ruby.

Note

CodeQL analysis for Ruby is currently in beta. During the beta, analysis of Ruby will be less comprehensive than CodeQL analysis of other languages.

The default CodeQL analysis workflow file contains a build matrix called language which lists the languages in your repository that are analyzed. CodeQL automatically populates this matrix when you add code scanning to a repository. Using the language matrix optimizes CodeQL to run each analysis in parallel. We recommend that all workflows adopt this configuration due to the performance benefits of parallelizing builds.

If your repository contains code in more than one of the supported languages, you can choose which languages you want to analyze. There are several reasons you might want to prevent a language being analyzed. For example, the project might have dependencies in a different language to the main body of your code, and you might prefer not to see alerts for those dependencies.

If your workflow uses the language matrix then CodeQL is hardcoded to analyze only the languages in the matrix. To change the languages you want to analyze, edit the value of the matrix variable. You can remove a language to prevent it being analyzed or you can add a language that was not present in the repository when code scanning was set up. For example, if the repository initially only contained JavaScript when code scanning was set up, and you later added Python code, you will need to add python to the matrix.

jobs:
  analyze:
    name: Analyze
    ...
    strategy:
      fail-fast: false
      matrix:
        language: ['javascript', 'python']

If your workflow does not contain a matrix called language, then CodeQL is configured to run analysis sequentially. If you don't specify languages in the workflow, CodeQL automatically detects, and attempts to analyze, any supported languages in the repository. If you want to choose which languages to analyze, without using a matrix, you can use the languages parameter under the init action.

- uses: github/codeql-action/init@v1
  with:
    languages: cpp, csharp, python
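Language names as people write them do not always match the identifiers CodeQL expects. The Python sketch below maps display names to the identifiers used in this module (for example, TypeScript is analyzed as javascript) and produces a de-duplicated list suitable for the language matrix or the languages parameter. The mapping table and function name are hypothetical helpers written for this example.

```python
from typing import Iterable, List

# Display name -> CodeQL language identifier, as listed in this module.
CODEQL_IDENTIFIERS = {
    "C": "cpp",
    "C++": "cpp",
    "C#": "csharp",
    "Go": "go",
    "Java": "java",
    "JavaScript": "javascript",
    "TypeScript": "javascript",
    "Python": "python",
    "Ruby": "ruby",
}


def matrix_languages(repo_languages: Iterable[str]) -> List[str]:
    """Return sorted, de-duplicated CodeQL identifiers; unsupported names are skipped."""
    identifiers = {
        CODEQL_IDENTIFIERS[name]
        for name in repo_languages
        if name in CODEQL_IDENTIFIERS
    }
    return sorted(identifiers)


if __name__ == "__main__":
    print(matrix_languages(["TypeScript", "JavaScript", "Python", "Shell"]))
```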

Custom build steps for code scanning

For the supported compiled languages, you can use the autobuild action in the CodeQL analysis workflow to build your code. This avoids you having to specify explicit build commands for C/C++, C#, and Java. CodeQL also runs a build for Go projects to set up the project. However, in contrast to the other compiled languages, all Go files in the repository are extracted, not just those that are built. You can use custom build commands to skip extracting Go files that are not touched by the build.

Add build steps for a compiled language

If the C/C++, C#, or Java code in your repository has a non-standard build process, autobuild may fail. You will need to remove the autobuild step from the workflow, and manually add build steps.

After removing the autobuild step, uncomment the run step and add build commands that are suitable for your repository. The workflow run step runs command-line programs using the operating system's shell. You can modify these commands and add more commands to customize the build process.

- run: |
    make bootstrap
    make release

If your repository contains multiple compiled languages, you can specify language-specific build commands. For example, if your repository contains C/C++, C# and Java, and autobuild correctly builds C/C++ and C# but fails to build Java, you could use the following configuration in your workflow, after the init step. This specifies build steps for Java while still using autobuild for C/C++ and C#:

- if: matrix.language == 'cpp' || matrix.language == 'csharp'
  name: Autobuild
  uses: github/codeql-action/autobuild@v1

- if: matrix.language == 'java'
  name: Build Java
  run: |
    make bootstrap
    make release

Troubleshooting the CodeQL workflow - new unit

Troubleshooting the CodeQL workflow
Troubleshooting query performance

summary.md, revised new unit

You are a senior developer responsible for implementing automated code vulnerability scanning at your company. You need to understand how code scanning with CodeQL works and how to configure it, so that you can help your entire organization adopt it.

You did some research on code scanning with CodeQL and found the following:

  • Code scanning with CodeQL uses a workflow file that specifies the location of queries, which languages to analyze, and whether they should be built with autobuild, or manual build steps
  • GitHub supports integration of third party scanning and alerting tools in the code scanning process
  • CodeQL has a CLI that allows you to create and analyze databases offline and then upload the results to GitHub using a SARIF file

Without GitHub code scanning with CodeQL, it would be very difficult to automate both scanning your code and generating pull requests to fix the vulnerable code. In addition, CodeQL provides an extensive, growing library of queries in multiple languages that help you create more secure code with little engineering effort.

Successfully rolling out automated code vulnerability scanning across your organization has made developers more productive and the product at your company more secure.

References

@rmallorybpc rmallorybpc added enhancement New feature or request in progress and removed in progress labels Oct 25, 2022
@rmallorybpc
Collaborator

@a-a-ron below are the file titles to update.

Update yaml files' title
2-learning-content.yml to 2-integrate-code-scanning.yml
3-learning-content.yml to 3-code-scanning-github-actions.yml
4-learning-content.yml to 4-codeql-cli-commands.yml
5a-learning-content.yml to 5a-hardware-resources.yaml

Update md files' titles
2-learning-content.md to 2-integrate-code-scan.md
3-learning-content.md to 3-code-scanning-github-actions.md
4-learning-content.md to 4-codeql-cli-commands.md
5a-learning-content.md to 5a-hardware-resources.md

@camihmerhar
Collaborator

camihmerhar commented Nov 7, 2022

Feedback on Introduction to Code scanning with GitHub CodeQL

Howdy @rmallorybpc , apologies for the delay on the feedback! First of all great work on all of the CodeQL content! It's extensive!

I did my first grammatical pass on this on Wednesday, but then I needed a bit more time to do a structural pass.

Anyway, here are my thoughts, let me know what you think!

Grammatical Edits:

Introduction:

  • Add an apostrophe in the first sentence in organizations, so the sentence should read as a possessive, "Code scanning using CodeQL provides an extensible method to automate vulnerability scanning across your organization's GitHub repositories."
  • To avoid redundancy in the unit breakdown I would reword the second bullet point to, "Learn about QL, a unique logic programming language"

2-what-is-codeql.md

  • Totally up to you, but I would break down the introduction of the subunits into a bullet point list to make it easier for learners to read through.
  • Another example of helping learners with content is providing a sentence within the subunit that ties it back to why this subject is relevant. I tried to do this in the first unit, in the Variant analysis subunit, by taking one of the introduction paragraphs in unit 1 about variant analysis and putting it in the subunit. So now it reads as,

Variant analysis

CodeQL is the analysis engine used by developers to automate security checks, and by security researchers to perform variant analysis.

Variant analysis is the process of using a known security vulnerability as a seed to find similar problems in your code. It’s a technique that security engineers use to identify potential vulnerabilities, and ensure these threats are properly fixed across multiple codebases.

Querying code using CodeQL is the most efficient way to perform variant analysis. You can use the standard CodeQL queries to identify seed vulnerabilities, or find new vulnerabilities by writing your own custom CodeQL queries. Then, develop or iterate over the query to automatically find logical variants of the same bug that could be missed using traditional manual techniques.


  • Okay, get ready for me deleting commas right and left. Feel free to disregard. I think I have bad memories of getting points taken off writing assignments because I was taught that if you naturally pause in a sentence you should use a comma. But during my junior year of high school my English teacher told me that for the most part commas make the most sense when you're:
    1. providing a list of things
    2. setting off nonrestrictive clauses
    3. making direct addresses (example: John, I think you should stop.)
    4. introducing direct quotes

This proved to be useful for college, especially as an English major. All of that is to say: death to commas.

  • Remove comma after "...automate security checks..." so the sentence reads like, "CodeQL is the analysis engine used by developers to automate security checks and by security researchers to perform variant analysis."
  • Remove comma after "...engineers use to identify potential vulnerabilities..." so the sentence should read like, "It’s a technique that security engineers use to identify potential vulnerabilities and ensure these threats are properly fixed across multiple codebases."
  • Remove comma after "...during the extraction process" so the sentence should read like, "The schema provides an interface between the initial lexical analysis performed during the extraction process and the actual complex analysis of the CodeQL query evaluator."
  • In the second paragraph of the subunit Query Suites I retooled the 2nd sentence to remove the parenthesis, "A suite definition is a sequence of instructions, where each instruction is usually a YAML mapping with a single key." I feel like it flows just a bit more and you don't have to include parentheses.
  • I also added a transition statement to the end of the unit to help with the flow to unit 2, "Next up, you will learn about the three phases of CodeQL analysis."

3-how-does-codeql-analyze-code.md

  • Remove comma so the sentence reads like, "To create a database CodeQL first extracts a single relational representation of each source file in the codebase."
  • Remove comma so the sentence reads like, "For compiled languages extraction works by monitoring the normal build process."
  • Rewrote this sentence for better flow, "For interpreted languages the extractor runs directly on the source code, which resolves dependencies in order to give an accurate representation of the codebase."
  • Remove comma so the sentence reads like, "After extraction all the data required for analysis (relational data, copied source files, and a language-specific database schema, which specifies the mutual relations in the data) is imported into a single directory, known as a CodeQL database."
  • In the last sentence of unit 2, I would change it to say "Next up, you'll learn about QL."

4-what-is-ql.md

  • I would rewrite the last sentence to be a little more colloquial as "Next up, you'll tackle an exercise that'll utilize the knowledge you just learned."

Summary

  • I feel like you don't need the first sentence in the introductory paragraph of the summary unit. Although you're recalling back to the hypothetical at the beginning, I don't think it fits here because you don't follow through with the example in previous units so it feels out of place.

Overall Context and Structural Edits:

  • It might be because of my lack of knowledge of CodeQL, but overall I was craving insight into how all of the content relates back to CodeQL and how it makes organizations more secure. At times, especially in the first and third units, I felt like I was learning terms in a vacuum. One thing that I think might help is providing a sentence within every subunit that ties it back.
    • For example, I think it would help to have a short sentence after we lay out the subunits so that learners can anticipate what they will learn. Like in Unit 1 we tell learners they're going to learn about variant analysis, CodeQL databases, query suites, and query language packs. All great, but why is this important? Although redundant, repetition helps learners retain important concepts.

Overall amazing job, good call breaking this down into multiple modules!

@rmallorybpc
Collaborator

"For example, I think it would help to have a short sentence after we layout the subunits that way learners can anticipate what they will learn. Like in Unit 1 we tell learners they're going to learn about variant analysis, CodeQL databases, query suites, and query language packs. All great, but why is this important? Although redundant, repetition helps learners retain important concepts."

@camihmerhar where else specifically?

@rmallorybpc
Collaborator

rmallorybpc commented Nov 9, 2022

@a-a-ron here is the revised outline for the customize-code-scanning-with-github-codeql module.
customize-code-scanning-codeql - branch
customize-code-scanning-with-github-codeql - folder
https://github.com/githubpartners/microsoft-learn/tree/customize-code-scanning-codeql/github/customize-code-scanning-with-github-codeql

YML files
2-learning-content.yml to 2-configure-codeql.yml
3-learning-content.yml to 3-customize-your-scanning-workflow-with-codeql.yml
5-exercise.yml to 5-exercise-reference-codeql-query.yml
4-learning-content.yml to 5a.customize-your-scanning-workflow-with-codeql-2.yml
add new yml file: 5b-exercise-configure-language-matrix.yml
add new yml file: 5c-custom-build-steps-for-code-scanning.yml
add new yml file: 5d-troubleshooting-the-codeql-workflow.yml
6-knowledge-check.yml - here to show where this fits, no change needed

MD files
2-learning-content.md to 2-configure-codeql.md
3-learning-content.md to 3-customize-your-scanning-workflow-with-codeql.md
5-exercise.md to 5-exercise-reference-codeql-query.md
4-learning-content.md to 5a.customize-your-scanning-workflow-with-codeql-2.md
add new md file: 5b-exercise-configure-language-matrix.md
add new md file: 5c-custom-build-steps-for-code-scanning.md
add new md file: 5d-troubleshooting-the-codeql-workflow.md
7-summary.md - here to show where this fits, no change needed

@camihmerhar
Collaborator

camihmerhar commented Nov 9, 2022

Howdy @rmallorybpc! I took another quick pass at the introduction module and here are my thoughts.

Structurally, I think there are 2 things we can do to make the module more intuitive for learners:

  1. Ensure that each unit name aligns with the Learning Objectives.
    • Ultimately the Learning Objectives in the Introduction and Summary should read as:
      • Learning objectives:
        • Understanding the relevancy of variant analysis, CodeQL databases, query suites, and query language packs to code scanning
        • How to run a CodeQL analysis
        • What is QL and how to use it with Code Scanning
  2. Add a relevancy statement for each subunit to tie it back to how it helps code scanning with CodeQL.
    • For example, for the 2nd unit, What is CodeQL, the intro paragraph should have something like this:
      • In this unit, you will learn about:
        • variant analysis: the process of using a known security vulnerability as a seed to find similar problems in your code
        • CodeQL databases: insert relevancy statement
        • query suites: insert relevancy statement
        • and query language packs: insert relevancy statement

Based on my experience, online learners and readers only take 7-10 seconds to scan a page, so making these two adjustments to help learners know what to expect will help them digest information more effectively and easily.

Let me know if you have any questions!

@camihmerhar
Collaborator

camihmerhar commented Nov 21, 2022

Feedback for Analyze Code Using CodeQL

Howdy @rmallorybpc, great work as always! Let me know if you have any questions.

Grammatical and Structural Suggestions:

Introduction

  • I feel like we can do away with this sentence, "In this module, you will learn about integrating with code scanning, set up code scanning with GitHub Actions, use the CodeQL command-line interface (CLI), and understand what hardware resources to use with CodeQL." It seems redundant to the learning objectives.

Unit 2

  • Yay table of contents! It looks great, but I'm thinking the short description after the topic looks text heavy. I think we should just have the table of contents with the title of the subunit and then have a strong relevancy/transitional sentence at the beginning of every subunit. Thoughts?
  • Minor edit, but I feel like we can break up the first paragraph in the "Integrate with code scanning" subunit so it reads easier so it'll look like...

"You can perform analysis elsewhere and then upload the code scanning results to GitHub. The alerts for code scanning that you run externally are displayed in the same way as those for code scanning that you run within GitHub.

When you use a third-party static analysis tool that can produce results as Static Analysis Results Interchange Format (SARIF) 2.1.0 data, you can upload the results to GitHub."

  • Similarly, I feel like we can do the same to the first paragraph subunit "Integrate with Webhooks," so it reads like...

"You can use code scanning webhooks to build or set up integrations that subscribe to code scanning events in your repository, such as GitHub Apps or OAuth Apps.

For example, you could build an integration that creates an issue on GitHub or sends you a Slack notification when a new code scanning alert is added in your repository."

Unit 3

  • In the second-to-last paragraph of the first subunit of the second learning unit (wow that's a mouthful), I would recommend rewording it to flow a bit better, something along the lines of, "As a result of either of these two occurrences, code scanning will automatically start."

Unit 4

  • Add a serial comma in the sentence "This unit will cover using the CodeQL CLI to create databases, analyze databases, and upload the results to GitHub."
  • Delete comma in the sentence "Once you've made the CodeQL CLI available to servers in your CI system and ensured that they can authenticate with GitHub, you're ready to generate data."
  • In the transition statement to the exercise, I would recommend rewording to something along the lines of, "Next up, you'll take on the role of a detective in a hands-on exercise to find the thief in a fictional QL village." I think this will help the readers know they'll be switching gears to something else.

Summary

  • I would recommend removing the first sentence of the opening paragraph, it feels a touch out of place.

@rmallorybpc
Collaborator

@camihmerhar

Unit 2

  • Yay table of contents! It looks great, but I'm thinking the short description after the topic looks text heavy. I think we should just have the table of contents with the title of the subunit and then have a strong relevancy/transitional sentence at the beginning of every subunit. Thoughts?

I kept in one start sentence, but removed the second sentence. A sentence to launch the module, but shorter.

@camihmerhar
Collaborator

camihmerhar commented Dec 8, 2022

Feedback on Customize code scanning with GitHub CodeQL

Introduction

  • I would recommend removing the first sentence of the opening paragraph, "Imagine that you continue to be a senior developer at a start-up company specializing in health care software." We don't use it as an example in the units so I feel like we don't need it.

  • I recommend adding in sub bullets in the learning objectives for the subunits that you include to help learners anticipate what might be coming. An example would be:

  • Configure the language matrix in a CodeQL workflow

  • Customize scanning workflow with CodeQL

    • Customize your scanning workflow with CodeQL workflow
    • Custom build steps for code scanning
    • Troubleshooting the CodeQL workflow
  • Implement custom build steps

Unit 2 Configure CodeQL

  • I would recommend adding an intro statement leading into the unit to help prep learners for the content, potentially some type of explanation as to why we need to go over the QL language before we dive into the title of the unit, Configure CodeQL.

Unit 3 Customize Code Scanning with GitHub

  • Since the transition statement/table of contents of the unit has more than 2 topics you'll cover, I would bullet out the topics.
  • I recommend making the titles of the subunits exactly the same as the ones you list in the transition statement/table of contents. That way, if learners are skimming to refresh, they know the order and the title they need to look for to get to the content they want.
  • Should the sub sub unit Use CodeQL Query Packs be a sub unit?

5b Exercise

  • Minor edit, there needs to be a space between the first sentence of the unit and the second sentence.

Great work as always @rmallorybpc! Let me know if you have any questions!
