UPDATE: Code scanning with GitHub CodeQL #46
base: main
Plan of Action
Launch page
We need to figure out how to split the module.
Introduction
Content Units
Exercise
New modules
|
Change approach here, content moved back in.
Move Introduction information
This comment was used to save the moved content, copy and paste, and compare the content across the two files.
Learning objectives
By the end of this module, you will be able to:
Prerequisites
|
Moved the learning objectives and prerequisites to the launch page, and added a transition sentence.
New module order
Proposed: the original module, Code scanning with GitHub CodeQL, is to be split into 3 modules.
Introduction to Code scanning with GitHub CodeQL
1-introduction.md - edited existing unit
New module
Analyze code by using CodeQL
introduction - for new content
New module
Customize Code Scanning with GitHub CodeQL
introduction.md - edited, revised new unit |
Added the learning objectives and prerequisites back
Introduction to Code scanning with GitHub CodeQL
intro-code-scanning-codeql
1-introduction.md - edited existing unit
Code scanning using CodeQL provides an extensible method to automate vulnerability scanning across your organization's GitHub repositories.
Imagine that you are a senior developer at a start-up company specializing in health care software. Your flagship product is a Java-based web portal that allows physicians to manage patient records. A recent penetration test of this product revealed a number of serious vulnerabilities that could compromise patient information. The CIO has asked you to implement automated code vulnerability scanning. Because your code is already hosted in a private repository on GitHub, you have decided to use the code scanning feature with CodeQL.
You will need to understand how the feature works to persuade other developers and management to use it. You will also need to understand the various configuration options and how to implement and maintain a code scanning pipeline, so that you can help other developers at your company configure and deploy code scanning correctly.
In this module, you will learn about the CodeQL static analysis tool and how the code scanning feature in GitHub uses it to automate vulnerability scanning. You will also learn how to customize a code scanning workflow that uses CodeQL, how to include additional queries, and how to adapt your workflow to repositories that have multiple languages.
Learning objectives
By the end of this module, you will be able to:
Prerequisites
Next up, you'll learn how CodeQL is used by developers.
2-what-is-codeql.md - existing unit
CodeQL is the analysis engine used by developers to automate security checks, and by security researchers to perform variant analysis.
In CodeQL, code is treated like data. Security vulnerabilities, bugs, and other errors are modeled as queries that can be executed against databases extracted from code. You can run the standard CodeQL queries, written by GitHub researchers and community contributors, or write your own to use in custom analyses. Queries that find potential bugs highlight the result directly in the source file.
In this unit, you will learn about the CodeQL static analysis tool and how it uses databases, query suites and query language packs to perform variant analysis.
Variant analysis
Variant analysis is the process of using a known security vulnerability as a seed to find similar problems in your code. It’s a technique that security engineers use to identify potential vulnerabilities, and ensure these threats are properly fixed across multiple codebases.
Querying code using CodeQL is the most efficient way to perform variant analysis. You can use the standard CodeQL queries to identify seed vulnerabilities, or find new vulnerabilities by writing your own custom CodeQL queries. Then, develop or iterate over the query to automatically find logical variants of the same bug that could be missed using traditional manual techniques.
CodeQL databases
CodeQL databases contain queryable data extracted from a codebase, for a single language at a particular point in time. The database contains a full, hierarchical representation of the code, including a representation of the abstract syntax tree, the data flow graph, and the control flow graph.
Each language has its own unique database schema that defines the relations used to create a database.
The schema provides an interface between the initial lexical analysis performed during the extraction process, and the actual complex analysis of the CodeQL query evaluator. The schema specifies, for instance, that there is a table for every language construct. For each language, the CodeQL libraries define classes to provide a layer of abstraction over the database tables. This provides an object-oriented view of the data which makes it easier to write queries. For example, in a CodeQL database for a Java program, two key tables are:
The CodeQL library defines classes to provide a layer of abstraction over each of these tables (and the related auxiliary tables):
Query suites
CodeQL query suites provide a way of selecting queries, based on their filename, location on disk or in a QL pack, or metadata properties. Create query suites for the queries that you want to frequently use in your CodeQL analyses.
Query suites allow you to pass multiple queries to CodeQL without having to specify the path to each query file individually. Query suite definitions are stored in YAML files with the extension
Default query suites
There are three default query suites for CodeQL:
Query Language (QL) packs
QL packs are used to organize the files used in CodeQL analysis. They contain queries, library files, query suites, and important metadata.
The CodeQL repository contains QL packs for C/C++, C#, Java, JavaScript, Python, and Ruby. The CodeQL for Go repository contains a QL pack for Go analysis. You can also make custom QL packs to contain your own queries and libraries.
QL pack structure
A QL pack must contain a file called
An example
name: codeql/java-queries
version: 0.0.6-dev
groups: java
suites: codeql-suites
extractor: java
defaultSuiteFile: codeql-suites/java-code-scanning.qls
dependencies:
    codeql/java-all: "*"
    codeql/suite-helpers: "*"
3-how-does-codeql-analyze-code.md - existing unit
Implementing code scanning with CodeQL requires an understanding of how the tool analyzes code. CodeQL analysis consists of three steps:
In this unit, you will learn about the three phases of CodeQL analysis.
Database creation
To create a database, CodeQL first extracts a single relational representation of each source file in the codebase.
For compiled languages, extraction works by monitoring the normal build process. Each time a compiler is invoked to process a source file, a copy of that file is made, and all relevant information about the source code is collected. This includes syntactic data about the abstract syntax tree and semantic data about name binding and type information.
For interpreted languages, the extractor runs directly on the source code, resolving dependencies to give an accurate representation of the codebase.
There is one extractor for each language supported by CodeQL to ensure that the extraction process is as accurate as possible. For multi-language codebases, databases are generated one language at a time.
After extraction, all the data required for analysis (relational data, copied source files, and a language-specific database schema, which specifies the mutual relations in the data) is imported into a single directory, known as a CodeQL database.
Query execution
After you’ve created a CodeQL database, one or more queries are executed against it. CodeQL queries are written in a specially designed object-oriented query language called QL. You can run the queries checked out from the CodeQL repo (or custom queries that you’ve written yourself) using the CodeQL for VS Code extension or the CodeQL CLI.
Query results
The final step converts results produced during query execution into a form that is more meaningful in the context of the source code. That is, the results are interpreted in a way that highlights the potential issue that the queries are designed to find.
:::image type="content" source="../media/codeql-query-results.png" alt-text="Screenshot of CodeQL query results.":::
Queries contain metadata properties that indicate how the results should be interpreted.
For instance, some queries display a simple message at a single location in the code. Others display a series of locations that represent steps along a data-flow or control-flow path, along with a message explaining the significance of the result. Queries that don’t have metadata are not interpreted; their results are output as a table and not displayed in the source code.
Following interpretation, results are output for code review and triaging. In CodeQL for Visual Studio Code, interpreted query results are automatically displayed in the source code. Results generated by the CodeQL CLI can be output into a number of different formats for use with different tools.
4-what-is-ql.md - existing unit
QL is a declarative, object-oriented query language that is optimized to enable efficient analysis of hierarchical data structures, in particular, databases representing software artifacts.
A database is an organized collection of data. The most commonly used database model is a relational model which stores data in tables, and SQL (Structured Query Language) is the most commonly used query language for relational databases.
The purpose of a query language is to provide a programming platform where you can ask questions about information stored in a database. A database management system manages the storage and administration of data and provides the querying mechanism. A query typically refers to the relevant database entities and specifies various conditions (called predicates) that must be satisfied by the results. Query evaluation involves checking these predicates and generating the results. Some of the desirable properties of a good query language and its implementation include:
In this unit, you will learn about the basic features of the QL programming language so that you can write your own custom queries or better understand the pre-existing open source queries available.
The QL syntax
The syntax of QL is similar to SQL, but the semantics of QL are based on Datalog, a declarative logic programming language often used as a query language. This makes QL primarily a logic language, and all operations in QL are logical operations. Furthermore, QL inherits recursive predicates from Datalog, and adds support for aggregates, making even complex queries concise and simple.
For example, consider a database containing parent-child relationships for people. If you want to find the number of descendants of a person, typically you would:
When you write this process in QL, it closely resembles the above structure. Notice that the example uses recursion to find all descendants of the given person, and an aggregate to count the number of descendants. Translating these steps into the final query without adding any procedural details is possible due to the declarative nature of the language. The QL code would look something like this:
Person getADescendant(Person p) {
  result = p.getAChild() or
  result = getADescendant(p.getAChild())
}
int getNumberOfDescendants(Person p) {
  result = count(getADescendant(p))
}
Object orientation
Object orientation is an important feature of QL. The benefits of object orientation are well-known: it increases modularity, enables information hiding, and allows code reuse. QL offers all these benefits without compromising on its logical foundation. This is achieved by defining a simple object model where classes are modeled as predicates and inheritance as implication. The libraries made available for all supported languages make extensive use of classes and inheritance.
QL and general purpose programming languages
Here are a few prominent conceptual and functional differences between general purpose programming languages and QL:
In object-oriented programming languages, instantiating a class involves creating an object by allocating physical memory to hold the state of that instance of the class. In QL, classes are just logical properties describing sets of already existing values.
Exercise - new exercise needed - new unit
Knowledge check
Alternatively, quizzes in each unit
12-summary.md - edited existing unit
You are a senior developer responsible for implementing automated code vulnerability scanning at your company. You need to understand how code scanning with CodeQL works and how to configure it, so that you can help your entire organization adopt it.
You did some research on code scanning with CodeQL and found the following:
Without using GitHub code scanning with CodeQL, it would be very difficult to automate both the scanning of your code and the generation of pull requests to fix the vulnerable code. In addition, CodeQL provides an extensive, growing library of queries in multiple languages that help you create more secure code with little engineering effort.
Successfully rolling out automated code vulnerability scanning across your organization has made developers more productive and the product at your company more secure.
References |
Analyze code by using CodeQL
code-scanning-github-codeql-2 - to be changed to - analyze-code-using-codeql (or similar)
introduction - edit existing unit
Content to include
Code scanning using CodeQL provides an extensible method to automate vulnerability scanning across your organization's GitHub repositories.
Imagine that you are a senior developer at a start-up company specializing in health care software. Your flagship product is a Java-based web portal that allows physicians to manage patient records. A recent penetration test of this product revealed a number of serious vulnerabilities that could compromise patient information. The CIO has asked you to implement automated code vulnerability scanning. Because your code is already hosted in a private repository on GitHub, you have decided to use the code scanning feature with CodeQL.
You will need to understand how the feature works to persuade other developers and management to use it. You will also need to understand the various configuration options and how to implement and maintain a code scanning pipeline, so that you can help other developers at your company configure and deploy code scanning correctly.
In this module, you will learn about the CodeQL static analysis tool and how the code scanning feature in GitHub uses it to automate vulnerability scanning. You will also learn how to customize a code scanning workflow that uses CodeQL, how to include additional queries, and how to adapt your workflow to repositories that have multiple languages.
Learning objectives
By the end of this module, you will be able to:
Prerequisites
Next up, you'll learn how CodeQL is used by developers.
Integration with code scanning
about integration with code scanning - content to include
5-code-scanning-codeql.md - existing unit
Depending on which tool you want to use for analysis and how you want to generate alerts, there are a few different options for setting up a code scanning workflow on your repository:
In this unit, you will learn how to set up code scanning with GitHub Actions, as well as how to perform bulk setup of code scanning for multiple repositories.
Code scanning with GitHub Actions and CodeQL
To set up code scanning with GitHub Actions and CodeQL on a repository, do the following:
In the default CodeQL analysis workflow, code scanning is configured to analyze your code each time you either push a change to the default branch or any protected branches, or raise a pull request against the default branch. As a result, code scanning will now commence. The
Bulk setup of code scanning
You can set up code scanning in many repositories at once using a script. If you'd like to use a script to raise pull requests that add a GitHub Actions workflow to multiple repositories, see the jhutchings1/Create-ActionsPRs repository for an example using PowerShell, or nickliffen/ghas-enablement for an example using Node.js.
9-use-codeql-cli.md - existing unit
more content if needed
In addition to the graphical user interface on GitHub.com, you can also access many of the same primary CodeQL features through a command line interface. This unit will cover using the CodeQL CLI to create databases, analyze databases and upload the results to GitHub.
CodeQL CLI commands
Once you've made the CodeQL CLI available to servers in your CI system, and ensured that they can authenticate with GitHub, you're ready to generate data.
You use three different commands to generate results and upload them to GitHub:
You can display the command-line help for any command using the
Uploading SARIF data to display as code scanning results in GitHub is supported for organization-owned repositories with GitHub Advanced Security enabled, and public repositories on GitHub.com.
Create CodeQL databases to analyze
Follow the steps below to create CodeQL databases to analyze:
Note
If you use a containerized build, you need to run the CodeQL CLI inside the container where your build task takes place.
The full list of parameters for the
Single language example
This example creates a CodeQL database for the repository checked out at
$ codeql database create /codeql-dbs/example-repo --language=javascript \
    --source-root /checkouts/example-repo
> Initializing database at /codeql-dbs/example-repo.
> Running command [/codeql-home/codeql/javascript/tools/autobuild.cmd]
in /checkouts/example-repo.
> [build-stdout] Single-threaded extraction.
> [build-stdout] Extracting
...
> Finalizing database at /codeql-dbs/example-repo.
> Successfully created database at /codeql-dbs/example-repo.
Multiple languages example
This example creates two CodeQL databases for the repository checked out at
The resulting databases are stored in
$ codeql database create /codeql-dbs/example-repo-multi \
--db-cluster --language python,cpp \
--command make --no-run-unnecessary-builds \
--source-root /checkouts/example-repo-multi
Initializing databases at /codeql-dbs/example-repo-multi.
Running build command: [make]
[build-stdout] Calling python3 /codeql-bundle/codeql/python/tools/get_venv_lib.py
[build-stdout] Calling python3 -S /codeql-bundle/codeql/python/tools/python_tracer.py -v -z all -c /codeql-dbs/example-repo-multi/python/working/trap_cache -p ERROR: 'pip' not installed.
[build-stdout] /usr/local/lib/python3.6/dist-packages -R /checkouts/example-repo-multi
[build-stdout] [INFO] Python version 3.6.9
[build-stdout] [INFO] Python extractor version 5.16
[build-stdout] [INFO] [2] Extracted file /checkouts/example-repo-multi/hello.py in 5ms
[build-stdout] [INFO] Processed 1 modules in 0.15s
[build-stdout] <output from calling 'make' to build the C/C++ code>
Finalizing databases at /codeql-dbs/example-repo-multi.
Successfully created databases at /codeql-dbs/example-repo-multi.
$
Analyze a CodeQL database
After creating your CodeQL database, follow the steps below to analyze it:
codeql database analyze <database> --format=<format> \
    --output=<output> <packs,queries>
Note
If you analyze more than one CodeQL database for a single commit, you must specify a SARIF category for each set of results generated by this command. When you upload the results to GitHub, code scanning uses this category to store the results for each language separately. If you forget to do this, each upload overwrites the previous results.
codeql database analyze <database> --format=<format> \
    --sarif-category=<language-specifier> --output=<output> \
    <packs,queries>
The full list of parameters for the
Basic example
This example analyzes a CodeQL database stored at
$ codeql database analyze /codeql-dbs/example-repo \
    javascript-code-scanning.qls --sarif-category=javascript \
    --format=sarif-latest --output=/temp/example-repo-js.sarif
> Running queries.
> Compiling query plan for /codeql-home/codeql/qlpacks/
codeql-javascript/AngularJS/DisablingSce.ql.
...
> Shutting down query evaluator.
> Interpreting results.
Upload results to GitHub
SARIF upload supports a maximum of 5,000 results per upload. Any results over this limit are ignored. If a tool generates too many results, you should update the configuration to focus on results for the most important rules or queries.
For each upload, SARIF upload supports a maximum size of 10 MB for the gzip-compressed SARIF file. Any uploads over this limit will be rejected. If your SARIF file is too large because it contains too many results, you should update the configuration to focus on results for the most important rules or queries.
Before you can upload results to GitHub, you must determine the best way to pass the GitHub App or personal access token you created earlier to the CodeQL CLI. We recommend that you review your CI system's guidance on the secure use of a secret store. The CodeQL CLI supports:
When you have decided on the most secure and reliable method for your CI server, run
echo "$UPLOAD_TOKEN" | codeql github upload-results --repository=<repository-name> \
    --ref=<ref> --commit=<commit> --sarif=<file> \
    --github-auth-stdin
The full list of parameters for the
Hardware resources for running CodeQL - new unit
Recommended hardware resources for running CodeQL
Exercise - new exercise needed - new unit
Content needed
Knowledge check
Alternatively, quizzes in each unit
summary.md - edited, revised new unit
You are a senior developer responsible for implementing automated code vulnerability scanning at your company. You need to understand how code scanning with CodeQL works and how to configure it, so that you can help your entire organization adopt it.
You did some research on code scanning with CodeQL and found the following:
Without using GitHub code scanning with CodeQL, it would be very difficult to automate both the scanning of your code and the generation of pull requests to fix the vulnerable code. In addition, CodeQL provides an extensive, growing library of queries in multiple languages that help you create more secure code with little engineering effort.
Successfully rolling out automated code vulnerability scanning across your organization has made developers more productive and the product at your company more secure.
References |
Customize Code Scanning with GitHub CodeQL
customize-code-scanning-codeql
introduction.md - edited, revised new unit
Code scanning using CodeQL provides an extensible method to automate vulnerability scanning across your organization's GitHub repositories.
Imagine that you are a senior developer at a start-up company specializing in health care software. Your flagship product is a Java-based web portal that allows physicians to manage patient records. A recent penetration test of this product revealed a number of serious vulnerabilities that could compromise patient information. The CIO has asked you to implement automated code vulnerability scanning. Because your code is already hosted in a private repository on GitHub, you have decided to use the code scanning feature with CodeQL.
You will need to understand how the feature works to persuade other developers and management to use it. You will also need to understand the various configuration options and how to implement and maintain a code scanning pipeline, so that you can help other developers at your company configure and deploy code scanning correctly.
In this module, you will learn about the CodeQL static analysis tool and how the code scanning feature in GitHub uses it to automate vulnerability scanning. You will also learn how to customize a code scanning workflow that uses CodeQL, how to include additional queries, and how to adapt your workflow to repositories that have multiple languages.
Learning objectives
By the end of this module, you will be able to:
Prerequisites
Next up, you'll learn how CodeQL is used by developers.
Configuring the CodeQL workflow for compiled languages - new unit
Configuring the CodeQL workflow for compiled languages
6-customize-your-scanning-workflow-with-codeql.md - existing unit
Code scanning workflows that use CodeQL have various configuration options that can be adjusted to better suit the needs of your organization.
When you use CodeQL to scan code, the CodeQL analysis engine generates a database from the code and runs queries on it. CodeQL analysis uses a default set of queries, but you can specify more queries to run, in addition to the default queries.
You can run extra queries if they are part of a CodeQL pack (beta) published to the GitHub Container registry or a QL pack stored in a repository.
There are two options for specifying which queries you want to run with CodeQL code scanning:
In this unit, you will learn how to edit a workflow file to reference additional queries, how to use queries from query packs and how to combine queries from a workflow file and a custom configuration file.
Specify additional queries in a workflow file
The options available to specify the additional queries you want to run are:
You can use both packs and queries in the same workflow.
We don't recommend referencing query suites directly from the
Use CodeQL query packs
Note
The CodeQL package management functionality, including CodeQL packs, is currently in beta and subject to change.
To add one or more CodeQL query packs (beta), add a
In the example below, scope is the organization or personal account that published the package. When the workflow runs, the three CodeQL query packs are downloaded from GitHub and the default queries or query suite for each pack run. The latest version of
- uses: github/codeql-action/init@v1
  with:
    # Comma-separated list of packs to download
    packs: scope/pack1,scope/pack2@1.2.3,scope/pack3@~1.2.3
Note
For workflows that generate CodeQL databases for multiple languages, you must instead specify the CodeQL query packs in a configuration file.
Use queries in QL packs
To add one or more queries, add a
- uses: github/codeql-action/init@v1
  with:
    queries: COMMA-SEPARATED LIST OF PATHS
    # Optional. Provide a token to access queries stored in private repositories.
    external-repository-token: ${{ secrets.ACCESS_TOKEN }}
You can also specify query suites in the value of queries. Query suites are collections of queries, usually grouped by purpose or language.
The following query suites are built into CodeQL code scanning and are available for use.
When you specify a query suite, the CodeQL analysis engine will run the queries contained within the suite for you, in addition to the default set of queries.
Combine queries from a workflow file and a custom configuration file
If you also use a configuration file for custom settings, any additional packs or queries specified in your workflow are used instead of those specified in the configuration file. If you want to run the combined set of additional packs or queries, prefix the value of packs or queries in the workflow with the
In the following example, the
- uses: github/codeql-action/init@v1
  with:
    config-file: ./.github/codeql/codeql-config.yml
    queries: +security-and-quality,octo-org/python-qlpack/show_ifs.ql@main
    packs: +scope/pack1,scope/pack2@1.2.3
7-exercise-reference-codeql-query.md - existing unit
This exercise checks your knowledge on referencing a CodeQL query in a CodeQL workflow.
Note
A grading script exists under
8-customize-your-scanning-workflow-with-codeql-2.md - existing unit
Code scanning workflows that use CodeQL have various configuration options that can be adjusted to better suit the needs of your organization.
In this unit, you will learn how to reference additional queries in a custom configuration file.
Additional queries in a custom configuration file
A custom configuration file is an alternative way to specify additional packs and queries to run. You can also use the file to disable the default queries and to specify which directories to scan during analysis.
In the workflow file, use the
- uses: github/codeql-action/init@v1
  with:
    config-file: ./.github/codeql/codeql-config.yml
The configuration file can be located within the repository you are analyzing, or in an external repository. Using an external repository allows you to specify configuration options for multiple repositories in a single place. When you reference a configuration file located in an external repository, you can use the
If the configuration file is located in an external private repository, use the
- uses: github/codeql-action/init@v1
  with:
    external-repository-token: ${{ secrets.ACCESS_TOKEN }}
The settings in the configuration file are written in YAML format.
Specify CodeQL query packs in custom configuration files
Note
The CodeQL package management functionality, including CodeQL packs, is currently in beta and subject to change.
You specify CodeQL query packs in an array. Note that the format is different from the format used by the workflow file.
packs:
  # Use the latest version of 'pack1' published by 'scope'
  - scope/pack1
  # Use version 1.2.3 of 'pack2'
  - scope/pack2@1.2.3
  # Use the latest version of 'pack3' compatible with 1.2.3
  - scope/pack3@~1.2.3
If you have a workflow that generates more than one CodeQL database, you can specify any CodeQL query packs to run in a custom configuration file using a nested map of packs.
packs:
  # Use these packs for JavaScript analysis
  javascript:
    - scope/js-pack1
    - scope/js-pack2
  # Use these packs for Java analysis
  java:
    - scope/java-pack1
    - scope/[email protected]
Specify additional queries in a custom configuration
You specify additional queries in a queries array. Each element of the array contains a uses parameter with a value that identifies a single query file, a directory containing query files, or a query suite definition file.
queries:
  - uses: ./my-basic-queries/example-query.ql
  - uses: ./my-advanced-queries
  - uses: ./query-suites/my-security-queries.qls
Optionally, you can give each array element a name, as shown in the example configuration file below.
name: "My CodeQL config"
disable-default-queries: true
queries:
  - name: Use an in-repository QL pack (run queries in the my-queries directory)
    uses: ./my-queries
  - name: Use an external JavaScript QL pack (run queries from an external repo)
    uses: octo-org/javascript-qlpack@main
  - name: Use an external query (run a single query from an external QL pack)
    uses: octo-org/python-qlpack/show_ifs.ql@main
  - name: Use a query suite file (run queries from a query suite in this repo)
    uses: ./codeql-qlpacks/complex-python-qlpack/rootAndBar.qls
paths:
  - src
paths-ignore:
  - src/node_modules
  - '**/*.test.js'
Disable the default queries
If you only want to run custom queries, you can disable the default security queries by using
Specify directories to scan
For the interpreted languages that CodeQL supports (Python, Ruby and JavaScript/TypeScript), you can restrict code scanning to files in specific directories by adding a paths array to the configuration file. You can exclude the files in specific directories from analysis by adding a
paths:
  - src
paths-ignore:
  - src/node_modules
  - '**/*.test.js'
Note
For compiled languages, if you want to limit code scanning to specific directories in your project, you must specify appropriate build steps in the workflow. The commands you need to use to exclude a directory from the build will depend on your build system.
You can quickly analyze small portions of a monorepo when you modify code in specific directories. You'll need to both exclude directories in your build steps and use the
11-exercise-configure-language-matrix.md - existing unit
This exercise checks your knowledge on configuring the language matrix in a CodeQL workflow.
Note A grading script exists under
10-custom-build-steps-for-code-scanning.md - existing unit
CodeQL code scanning supports many languages by default with an autobuild feature. If your code uses a non-standard build process, however, you may need to customize your workflow with custom build steps. This unit describes how to change the languages analyzed by code scanning and how to add custom build steps to a CodeQL code scanning workflow.
Change the languages that are analyzed
CodeQL code scanning automatically detects code written in the following supported languages: C/C++, C#, Go, Java, JavaScript/TypeScript, Python, and Ruby.
Note
CodeQL analysis for Ruby is currently in beta. During the beta, analysis of Ruby will be less comprehensive than CodeQL analysis of other languages.
The default CodeQL analysis workflow file contains a build matrix called language, which lists the languages in your repository that are analyzed. CodeQL automatically populates this matrix when you add code scanning to a repository. Using the language matrix lets CodeQL run each analysis in parallel. We recommend that all workflows adopt this configuration because of the performance benefits of parallelizing builds.
If your repository contains code in more than one of the supported languages, you can choose which languages you want to analyze. There are several reasons you might want to prevent a language from being analyzed. For example, the project might have dependencies in a different language from the main body of your code, and you might prefer not to see alerts for those dependencies.
If your workflow uses the language matrix, then CodeQL is hardcoded to analyze only the languages in the matrix. To change the languages you want to analyze, edit the value of the matrix variable. You can remove a language to prevent it from being analyzed, or you can add a language that was not present in the repository when code scanning was set up.
For example, if the repository initially only contained JavaScript when code scanning was set up, and you later added Python code, you will need to add python to the matrix.
jobs:
  analyze:
    name: Analyze
    ...
    strategy:
      # Keep analyzing the other languages if one language's job fails
      fail-fast: false
      matrix:
        language: ['javascript', 'python']
If your workflow does not contain a matrix called language, then CodeQL is configured to run analysis sequentially. If you don't specify languages in the workflow, CodeQL automatically detects, and attempts to analyze, any supported languages in the repository. If you want to choose which languages to analyze, without using a matrix, you can use the languages parameter under the init action.
- uses: github/codeql-action/init@v1
  with:
    languages: cpp, csharp, python
Custom build steps for code scanning
For the supported compiled languages, you can use the autobuild action in the CodeQL analysis workflow to build your code. This saves you from having to specify explicit build commands for C/C++, C#, and Java. CodeQL also runs a build for Go projects to set up the project. However, in contrast to the other compiled languages, all Go files in the repository are extracted, not just those that are built. You can use custom build commands to skip extracting Go files that are not touched by the build.
Add build steps for a compiled language
If the C/C++, C#, or Java code in your repository has a non-standard build process, autobuild may fail. After removing the autobuild step, add a run step with build commands that are suitable for your repository.
- run: |
    make bootstrap
    make release
If your repository contains multiple compiled languages, you can specify language-specific build commands. For example, if your repository contains C/C++, C#, and Java, and autobuild correctly builds C/C++ and C# but fails to build Java, you could use the following configuration in your workflow.
- if: matrix.language == 'cpp' || matrix.language == 'csharp'
  name: Autobuild
  uses: github/codeql-action/autobuild@v1
- if: matrix.language == 'java'
  name: Build Java
  run: |
    make bootstrap
    make release
Troubleshooting the CodeQL workflow - new unit
Troubleshooting the CodeQL workflow
summary.md, revised new unit
You are a senior developer responsible for implementing automated code vulnerability scanning at your company. You need to understand how code scanning with CodeQL works and how to configure it, so that you can help your entire organization adopt it. You did some research on code scanning with CodeQL and found the following:
Without GitHub code scanning with CodeQL, it would be very difficult to automate both the scanning of your code and the generation of pull requests to fix the vulnerable code. In addition, CodeQL provides an extensive, growing library of queries in multiple languages that help you create more secure code with little engineering effort. Successfully rolling out automated code vulnerability scanning across your organization has made developers more productive and the product at your company more secure.
References |
@a-a-ron below are the file titles to update. Update yaml files' title Update md files' titles |
Feedback on Introduction to Code scanning with GitHub CodeQL
Howdy @rmallorybpc, apologies for the delay on the feedback! First of all, great work on all of the CodeQL content! It's extensive! I did my first grammatical pass on this on Wednesday, but then I needed a bit more time to do a structural pass. Anyway, here are my thoughts; let me know what you think!
Grammatical Edits:
Introduction:
2-what-is-codeql.md
Variant analysis
CodeQL is the analysis engine used by developers to automate security checks, and by security researchers to perform variant analysis. Variant analysis is the process of using a known security vulnerability as a seed to find similar problems in your code. It’s a technique that security engineers use to identify potential vulnerabilities, and ensure these threats are properly fixed across multiple codebases. Querying code using CodeQL is the most efficient way to perform variant analysis. You can use the standard CodeQL queries to identify seed vulnerabilities, or find new vulnerabilities by writing your own custom CodeQL queries. Then, develop or iterate over the query to automatically find logical variants of the same bug that could be missed using traditional manual techniques.
This proved to be useful for college, especially as an English major. All of that is to say: death to commas.
3-how-does-codeql-analyze-code.md
4-what-is-ql.md
Summary
Overall Context and Structural Edits:
Overall amazing job, good call breaking this down into multiple modules! |
"For example, I think it would help to have a short sentence after we layout the subunits that way learners can anticipate what they will learn. Like in Unit 1 we tell learners they're going to learn about variant analysis, CodeQL databases, query suites, and query language packs. All great, but why is this important? Although redundant, repetition helps learners retain important concepts." @camihmerhar where else specifically? |
@a-a-ron here is the revised outline for the customize-code-scanning-with-github-codeql module. YML files MD files |
Howdy @rmallorybpc! I took another quick pass at the introduction module and here are my thoughts. Structurally, I think there are 2 things we can do to make the module more intuitive for learners:
Based on my experience, online learners and readers take only 7-10 seconds to scan a page, so these two adjustments, which help learners know what to expect, will help them digest information more effectively and easily. Let me know if you have any questions! |
Feedback for Analyze Code Using CodeQL
Howdy @rmallorybpc, great work as always! Let me know if you have any questions.
Grammatical and Structural Suggestions:
Introduction
Unit 2
"You can perform analysis elsewhere and then upload the code scanning results to GitHub. The alerts for code scanning that you run externally are displayed in the same way as those for code scanning that you run within GitHub. When you use a third-party static analysis tool that can produce results as Static Analysis Results Interchange Format (SARIF) 2.1.0 data, you can upload the results to GitHub."
"You can use code scanning webhooks to build or set up integrations that subscribe to code scanning events in your repository, such as GitHub Apps or OAuth Apps. For example, you could build an integration that creates an issue on GitHub or sends you a Slack notification when a new code scanning alert is added in your repository." Unit 3
Unit 4
Summary
|
I kept one opening sentence but removed the second, so the module still has a sentence to launch it, just a shorter one. |
Feedback on Customize code scanning with GitHub CodeQL
Introduction
Unit 2 Configure CodeQL
Unit 3 Customize Code Scanning with GitHub
5b Exercise
Great work as always @rmallorybpc! Let me know if you have any questions! |
The purpose of this PR is to see how we can increase the completion rate and improve the user experience. Below is the August data for this module:
For reference: See the parent issue for context
Module file structure
Items to review for this module
We need to conduct a performance review on each of these modules to see how we can increase the completion rates, improve the user experience, and maintain the level of quality that we set for our learners.
Items for consideration
Estimated Work Effort