Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace shell=true commands in subprocess function #738

Closed
dlpzx opened this issue Sep 7, 2023 · 1 comment · Fixed by #760
Closed

Replace shell=true commands in subprocess function #738

dlpzx opened this issue Sep 7, 2023 · 1 comment · Fixed by #760
Assignees
Labels
effort: low priority: high status: not-picked-yet At the moment we have not picked this item. Anyone can pick it up type: enhancement Feature enhacement
Milestone

Comments

@dlpzx
Copy link
Contributor

dlpzx commented Sep 7, 2023

Is your idea related to a problem? Please describe.
From several scanning tools we get the following error regarding shell=true. Description: Found 'subprocess' function 'run' with 'shell=True'. This is dangerous because this call will spawn the command using a shell process. Doing so propagates current shell settings and variables, which makes it much easier for a malicious actor to execute commands. Use 'shell=False' instead. Details: https://sg.run/J92w

This happens in:

  • backend/dataall/base/cdkproxy/cdk_cli_wrapper.py (2 findings)
  • backend/dataall/modules/datapipelines/cdk/datapipelines_cdk_pipeline.py (4 findings)
  • backend/dataall/modules/datapipelines/cdk/datapipelines_pipeline.py (2 findings)
  • deploy/stacks/solution_bundling.py (2 findings)

Describe the solution you'd like
We have evaluated that there is no risk as there is no input introduced by users in this command. Nevertheless, we can fix the issue and avoid shell=true, so we should do it.

P.S. Don't attach files. Please, prefer add code snippets directly in the message body.

@dlpzx dlpzx added type: enhancement Feature enhacement priority: high status: not-picked-yet At the moment we have not picked this item. Anyone can pick it up labels Sep 7, 2023
@anmolsgandhi anmolsgandhi added this to the v2.1.0 milestone Sep 8, 2023
dlpzx added a commit that referenced this issue Sep 13, 2023
… merge package (#751)

### Feature or Bugfix
- Feature
- Bugfix

### Detail
- add npm-audit github workflow on PR
- add semgrep worflow on PR and on schedule
- ignore semgrep issues with explanation. Those to be fixed will be
fixed in #739 and
#738
- remove make security checks that uses safety library and rename it as
linting
- upgrade all pacakges, add package-lock and pin merge to version 2.1.1

### Relates

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use a standard proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
@noah-paige noah-paige linked a pull request Sep 15, 2023 that will close this issue
7 tasks
dlpzx added a commit that referenced this issue Oct 16, 2023
### Feature or Bugfix
- Feature
- Bugfix

### Detail
As explained in the [semgrep
docs](https://semgrep.dev/docs/cheat-sheets/python-command-injection/#1b-shelltrue):
"Functions from the subprocess module have the shell argument for
specifying if the command should be executed through the shell. Using
shell=True is dangerous because it propagates current shell settings and
variables. This means that variables, glob patterns, and other special
shell features in the command string are processed before the command is
run, making it much easier for a malicious actor to execute commands.
The subprocess module allows you to start new processes, connect to
their input/output/error pipes, and obtain their return codes. Methods
such as Popen, run, call, check_call, check_output are intended for
running commands provided as an argument ('args'). Allowing user input
in a command that is passed as an argument to one of these methods can
create an opportunity for a command injection vulnerability."

In our case the risk is not exposed as no user input is directly taken
into the subprocess commands. Nevertheless we should strive for the
highest standards on security and this PR works on replacing all the
`shell=True` executions in the data.all
code.

In this PR:
- when possible we have set `shell=False`
- in cases where the command was too complex a `CommandSanitizer`
ensures that the input arguments are strings following the
regex=`[a-zA-Z0-9-_]`

Testing: 
- [X] local testing - deployment of any stack
(`backend/dataall/base/cdkproxy/cdk_cli_wrapper.py`)
- [X] local testing - deployment of cdk pipeline stack
(`backend/dataall/modules/datapipelines/cdk/datapipelines_cdk_pipeline.py`)
- [X] local testing - deployment of codepipeline pipeline stack
(`backend/dataall/modules/datapipelines/cdk/datapipelines_pipeline.py`)
- [ ] AWS testing - deployment of data.all
- [ ] AWS testing - deployment of any stack
(`backend/dataall/base/cdkproxy/cdk_cli_wrapper.py`)
- [ ] AWS testing - deployment of cdk pipeline stack
(`backend/dataall/modules/datapipelines/cdk/datapipelines_cdk_pipeline.py`)
- [ ] AWS testing - deployment of codepipeline pipeline stack
(`backend/dataall/modules/datapipelines/cdk/datapipelines_pipeline.py`)

### Relates
- #738 

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
- Is the input sanitized? ---> 🆗 This is exactly what this PR is trying
to do
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use a standard proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
@dlpzx
Copy link
Contributor Author

dlpzx commented Nov 8, 2023

Merged and released with v2.1.0 🚀

@dlpzx dlpzx closed this as completed Nov 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort: low priority: high status: not-picked-yet At the moment we have not picked this item. Anyone can pick it up type: enhancement Feature enhacement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants