Replace shell=true commands in subprocess function #738

dlpzx · 2023-09-07T15:24:37Z

Is your idea related to a problem? Please describe.
From several scanning tools we get the following error regarding shell=true. Description: Found 'subprocess' function 'run' with 'shell=True'. This is dangerous because this call will spawn the command using a shell process. Doing so propagates current shell settings and variables, which makes it much easier for a malicious actor to execute commands. Use 'shell=False' instead. Details: https://sg.run/J92w

This happens in:

backend/dataall/base/cdkproxy/cdk_cli_wrapper.py (2 findings)
backend/dataall/modules/datapipelines/cdk/datapipelines_cdk_pipeline.py (4 findings)
backend/dataall/modules/datapipelines/cdk/datapipelines_pipeline.py (2 findings)
deploy/stacks/solution_bundling.py (2 findings)

Describe the solution you'd like
We have evaluated that there is no risk as there is no input introduced by users in this command. Nevertheless, we can fix the issue and avoid shell=true, so we should do it.

P.S. Don't attach files. Please, prefer add code snippets directly in the message body.

… merge package (#751) ### Feature or Bugfix - Feature - Bugfix ### Detail - add npm-audit github workflow on PR - add semgrep worflow on PR and on schedule - ignore semgrep issues with explanation. Those to be fixed will be fixed in #739 and #738 - remove make security checks that uses safety library and rename it as linting - upgrade all pacakges, add package-lock and pin merge to version 2.1.1 ### Relates ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

### Feature or Bugfix - Feature - Bugfix ### Detail As explained in the [semgrep docs](https://semgrep.dev/docs/cheat-sheets/python-command-injection/#1b-shelltrue): "Functions from the subprocess module have the shell argument for specifying if the command should be executed through the shell. Using shell=True is dangerous because it propagates current shell settings and variables. This means that variables, glob patterns, and other special shell features in the command string are processed before the command is run, making it much easier for a malicious actor to execute commands. The subprocess module allows you to start new processes, connect to their input/output/error pipes, and obtain their return codes. Methods such as Popen, run, call, check_call, check_output are intended for running commands provided as an argument ('args'). Allowing user input in a command that is passed as an argument to one of these methods can create an opportunity for a command injection vulnerability." In our case the risk is not exposed as no user input is directly taken into the subprocess commands. Nevertheless we should strive for the highest standards on security and this PR works on replacing all the `shell=True` executions in the data.all code. In this PR: - when possible we have set `shell=False` - in cases where the command was too complex a `CommandSanitizer` ensures that the input arguments are strings following the regex=`[a-zA-Z0-9-_]` Testing: - [X] local testing - deployment of any stack (`backend/dataall/base/cdkproxy/cdk_cli_wrapper.py`) - [X] local testing - deployment of cdk pipeline stack (`backend/dataall/modules/datapipelines/cdk/datapipelines_cdk_pipeline.py`) - [X] local testing - deployment of codepipeline pipeline stack (`backend/dataall/modules/datapipelines/cdk/datapipelines_pipeline.py`) - [ ] AWS testing - deployment of data.all - [ ] AWS testing - deployment of any stack (`backend/dataall/base/cdkproxy/cdk_cli_wrapper.py`) - [ ] AWS testing - deployment of cdk pipeline stack (`backend/dataall/modules/datapipelines/cdk/datapipelines_cdk_pipeline.py`) - [ ] AWS testing - deployment of codepipeline pipeline stack (`backend/dataall/modules/datapipelines/cdk/datapipelines_pipeline.py`) ### Relates - #738 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? ---> 🆗 This is exactly what this PR is trying to do - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

dlpzx · 2023-11-08T08:33:04Z

Merged and released with v2.1.0 🚀

dlpzx added type: enhancement Feature enhacement priority: high status: not-picked-yet At the moment we have not picked this item. Anyone can pick it up labels Sep 7, 2023

anmolsgandhi added the effort: low label Sep 8, 2023

anmolsgandhi added this to the v2.1.0 milestone Sep 8, 2023

anmolsgandhi assigned dlpzx Sep 8, 2023

dlpzx mentioned this issue Sep 13, 2023

feat: add npmaudit and semgrep in github workflows. Fix dependabot on merge package #751

Merged

This was referenced Sep 13, 2023

feat: add ASH, Bandit and CDK Nag #530

Merged

Fix: shell true semgrep #760

Merged

noah-paige linked a pull request Sep 15, 2023 that will close this issue

Fix: shell true semgrep #760

Merged

7 tasks

dlpzx closed this as completed Nov 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace shell=true commands in subprocess function #738

Replace shell=true commands in subprocess function #738

dlpzx commented Sep 7, 2023

dlpzx commented Nov 8, 2023

Replace shell=true commands in subprocess function #738

Replace shell=true commands in subprocess function #738

Comments

dlpzx commented Sep 7, 2023

dlpzx commented Nov 8, 2023