Fix: shell true semgrep #760

dlpzx · 2023-09-14T09:40:20Z

Feature or Bugfix

Feature
Bugfix

Detail

As explained in the semgrep docs: "Functions from the subprocess module have the shell argument for specifying if the command should be executed through the shell. Using shell=True is dangerous because it propagates current shell settings and variables. This means that variables, glob patterns, and other special shell features in the command string are processed before the command is run, making it much easier for a malicious actor to execute commands. The subprocess module allows you to start new processes, connect to their input/output/error pipes, and obtain their return codes. Methods such as Popen, run, call, check_call, check_output are intended for running commands provided as an argument ('args'). Allowing user input in a command that is passed as an argument to one of these methods can create an opportunity for a command injection vulnerability."

In our case the risk is not exposed as no user input is directly taken into the subprocess commands. Nevertheless we should strive for the highest standards on security and this PR works on replacing all the shell=True executions in the data.all
code.

In this PR:

when possible we have set shell=False
in cases where the command was too complex a CommandSanitizer ensures that the input arguments are strings following the regex=[a-zA-Z0-9-_]

Testing:

local testing - deployment of any stack (backend/dataall/base/cdkproxy/cdk_cli_wrapper.py)
local testing - deployment of cdk pipeline stack (backend/dataall/modules/datapipelines/cdk/datapipelines_cdk_pipeline.py)
local testing - deployment of codepipeline pipeline stack (backend/dataall/modules/datapipelines/cdk/datapipelines_pipeline.py)
AWS testing - deployment of data.all
AWS testing - deployment of any stack (backend/dataall/base/cdkproxy/cdk_cli_wrapper.py)
AWS testing - deployment of cdk pipeline stack (backend/dataall/modules/datapipelines/cdk/datapipelines_cdk_pipeline.py)
AWS testing - deployment of codepipeline pipeline stack (backend/dataall/modules/datapipelines/cdk/datapipelines_pipeline.py)

Relates

Replace shell=true commands in subprocess function #738

Security

Please answer the questions below briefly where applicable, or write N/A. Based on
OWASP 10.

Does this PR introduce or modify any input fields or queries - this includes
fetching data from storage outside the application (e.g. a database, an S3 bucket)?
- Is the input sanitized? ---> 🆗 This is exactly what this PR is trying to do
- What precautions are you taking before deserializing the data you consume?
- Is injection prevented by parametrizing queries?
- Have you ensured no eval or similar functions are used?
Does this PR introduce any functionality or component that requires authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
- Are you logging failed auth attempts?
Are you using or adding any cryptographic features?
- Do you use a standard proven implementations?
- Are the used keys controlled by the customer? Where are they stored?
Are you introducing any new policies/roles/users?
- Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

# Conflicts: # deploy/stacks/solution_bundling.py

…/shell-true-semgrep # Conflicts: # backend/dataall/base/cdkproxy/cdk_cli_wrapper.py

noah-paige · 2023-10-03T15:10:57Z

Testing this PR in an AWS Deployment - found errrors on the following

ERROR - CDK Pipeline Creation
- Stack created successfully but then receiving an error on deactivate command - I think originally used to deactivate venv before cleaning up but maybe we can remove this command not sure if it is needed

ERROR - DataPipeline or any other stack creation
- Throwing error botocore.exceptions.NoCredentialsError: Unable to locate credentials
- Earlier on when trying to resolve default credentials receive the error Unable to determine the default AWS account (EINVAL): EC2 Metadata roleName request returned error
- May be originating from aws_configure() requiring shell=True for curl command?

noah-paige

Found some errors (mentioned above) when testing AWS Deployment - may need to look into other workarounds when setting shell=False for cdkproxy

deploy/stacks/solution_bundling.py

dlpzx · 2023-10-11T12:11:00Z

backend/dataall/base/cdkproxy/cdk_cli_wrapper.py

@@ -46,10 +47,15 @@ def aws_configure(profile_name='default'):
    print('..............................................')
    print(f"AWS_CONTAINER_CREDENTIALS_RELATIVE_URI: {os.getenv('AWS_CONTAINER_CREDENTIALS_RELATIVE_URI')}")
    cmd = ['curl', '169.254.170.2$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI']
-    process = subprocess.run(' '.join(cmd), text=True, shell=True, encoding='utf-8', capture_output=True)  # nosec  # nosemgrep
+    process = subprocess.run(cmd, text=True, shell=False, encoding='utf-8', capture_output=True)


I am working on this item, after plotting the error logs I think the issue is on the url to get the ECS-EC2 credentials. I will use f-string to create the curl command since we can obtain it as environment variable. I am now testing it in AWS

@noah-paige I tested in AWS and this way of defining the commands works :)

dlpzx · 2023-10-11T12:19:25Z

backend/dataall/modules/datapipelines/cdk/datapipelines_cdk_pipeline.py

-                f"{path}"
-            ]
-
+            cmd = ['rm', '-rf', f"{path}"]


I removed the deactivate command. I think we do not need it. I am testing if there are any implications when deploying to AWS

Tested in AWS and working fine from my end

noah-paige

Tested the below scenarios in AWS as well:

AWS testing - deployment of data.all
AWS testing - deployment of any stack (backend/dataall/base/cdkproxy/cdk_cli_wrapper.py)
AWS testing - deployment of cdk pipeline stack (backend/dataall/modules/datapipelines/cdk/datapipelines_cdk_pipeline.py)
AWS testing - deployment of codepipeline pipeline stack (backend/dataall/modules/datapipelines/cdk/datapipelines_pipeline.py)

NOTE - When the CDK Pipeline ran I received the following error in the Build Step related to aws_ddk_core package:

ERROR: Could not find a version that satisfies the requirement aws_ddk_core==0.5.1 (from versions: 1.0.0b0, 1.0.0b1, 1.0.0rc0, 1.0.0, 1.0.1, 1.1.0, 1.2.0, 1.3.0)

Approving this PR since the error is outside the scope of semgrep checks and the PR looks good but may need to open a new issue if you can also replicate to resolve

dlpzx · 2023-10-16T06:51:10Z

We should also consider upgrading to DDK to the newest major release. They do not make use of the core library anymore and the bootstraping of accounts is replaced by CDK bootstraping, which simplified the usage of pipelines: #747

@dlpzx

### Feature or Bugfix - Feature - Bugfix - Refactoring ### Detail #### Features * Limit pivot role S3 permissions by @dlpzx in #780 * Limit pivot role KMS permissions by @dlpzx in #830 * Add configurable session timeout to IDP by @manjulaK in #786 * Allow to submit a share when you are both an approver and a requester by @zsaltys in #793 * Redirect upon creating a share request by @zsaltys in #799 * Handle Pre-filtering of tables by @anushka-singh in #811 * Email Notification on Share Workflow - Issue - 734 by @TejasRGitHub in #818 * Refactor notifications from core to modules by @dlpzx in #822 * Add frontend and backend feature flags by @zsaltys in #817 * Make hosted_zone_id optional by @lorchda in #812 #### Fixes * Add Additional Error Messages for KMS Key lookup on imported dataset by @noah-paige in #748 * Handle Environment Import of IAM service roles by @noah-paige in #749 * Build Compliant Names for Opensearch Resources by @noah-paige in #750 * Update Lambda runtime by @nikpodsh in #782 * Ensure valid environments for share request and other objects creation by @dlpzx in #781 * Fix shell true semgrep by @dlpzx in #760 * Add condition when there are no public subnets by @lorchda in #794 * Remove unused variable by @zsaltys in #815 * Check other share exists before clean up by @noah-paige in #769 ### Relates - v2.1.0 minor release ## New Contributors * @manjulaK made their first contribution in #786 * @zsaltys made their first contribution in #793 * @anushka-singh made their first contribution in #811 * @TejasRGitHub made their first contribution in #818 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Noah Paige <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: jaidisido <[email protected]> Co-authored-by: mourya-33 <[email protected]> Co-authored-by: nikpodsh <[email protected]> Co-authored-by: MK <[email protected]> Co-authored-by: Zilvinas Saltys <[email protected]> Co-authored-by: Daniel Lorch <[email protected]> Co-authored-by: Anushka Singh <[email protected]> Co-authored-by: trajopadhye <[email protected]>

dlpzx added 6 commits September 13, 2023 10:00

Fix shell=true in solutions bundling

4cdc02b

Merge branch 'main' into fix/shell-true-semgrep

90ca669

# Conflicts: # deploy/stacks/solution_bundling.py

Merge branch 'main' into fix/shell-true-semgrep

91e3e1e

shell=false in cdk_cli_wrapper

ad3b590

linting

178c076

Add command sanitizer

bc88a09

noah-paige linked an issue Sep 15, 2023 that may be closed by this pull request

Replace shell=true commands in subprocess function #738

Closed

anmolsgandhi assigned dlpzx Sep 19, 2023

dlpzx added 3 commits September 26, 2023 14:50

Add shell utils for command sanitization

a8a21f1

Merge remote-tracking branch 'origin/fix/shell-true-semgrep' into fix…

8577639

…/shell-true-semgrep # Conflicts: # backend/dataall/base/cdkproxy/cdk_cli_wrapper.py

Apply command sanitizer or transform to shell=false - all commands

83c3e9e

dlpzx marked this pull request as ready for review October 2, 2023 08:08

Small missing semgrep

d131f01

dlpzx changed the base branch from main to v2m1m0 October 2, 2023 08:26

dlpzx requested a review from noah-paige October 2, 2023 09:08

dlpzx changed the base branch from v2m1m0 to main October 2, 2023 09:08

noah-paige reviewed Oct 3, 2023

View reviewed changes

deploy/stacks/solution_bundling.py Outdated Show resolved Hide resolved

dlpzx changed the base branch from main to v2m1m0 October 10, 2023 15:21

dlpzx added 3 commits October 11, 2023 10:26

Semgrep fix AWS issues

1ddb04a

Remove deactivate and add logs

0733b91

ECS Credentials as variable

0b808b9

dlpzx commented Oct 11, 2023

View reviewed changes

noah-paige approved these changes Oct 12, 2023

View reviewed changes

dlpzx merged commit 599fc1a into v2m1m0 Oct 16, 2023

dlpzx modified the milestone: v2.1.0 Oct 30, 2023

dlpzx mentioned this pull request Oct 30, 2023

v2.1.0 features #840

Merged

dlpzx deleted the fix/shell-true-semgrep branch November 8, 2023 08:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: shell true semgrep #760

Fix: shell true semgrep #760

dlpzx commented Sep 14, 2023 •

edited

Loading

noah-paige commented Oct 3, 2023

noah-paige left a comment •

edited

Loading

dlpzx Oct 11, 2023

dlpzx Oct 11, 2023

dlpzx Oct 11, 2023

dlpzx Oct 11, 2023

noah-paige left a comment

dlpzx commented Oct 16, 2023 •

edited

Loading

Fix: shell true semgrep #760

Fix: shell true semgrep #760

Conversation

dlpzx commented Sep 14, 2023 • edited Loading

Feature or Bugfix

Detail

Relates

Security

noah-paige commented Oct 3, 2023

noah-paige left a comment • edited Loading

Choose a reason for hiding this comment

dlpzx Oct 11, 2023

Choose a reason for hiding this comment

dlpzx Oct 11, 2023

Choose a reason for hiding this comment

dlpzx Oct 11, 2023

Choose a reason for hiding this comment

dlpzx Oct 11, 2023

Choose a reason for hiding this comment

noah-paige left a comment

Choose a reason for hiding this comment

dlpzx commented Oct 16, 2023 • edited Loading

dlpzx commented Sep 14, 2023 •

edited

Loading

noah-paige left a comment •

edited

Loading

dlpzx commented Oct 16, 2023 •

edited

Loading