Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: shell true semgrep #760

Merged
merged 13 commits into from
Oct 16, 2023
Merged

Fix: shell true semgrep #760

merged 13 commits into from
Oct 16, 2023

Conversation

dlpzx
Copy link
Contributor

@dlpzx dlpzx commented Sep 14, 2023

Feature or Bugfix

  • Feature
  • Bugfix

Detail

As explained in the semgrep docs: "Functions from the subprocess module have the shell argument for specifying if the command should be executed through the shell. Using shell=True is dangerous because it propagates current shell settings and variables. This means that variables, glob patterns, and other special shell features in the command string are processed before the command is run, making it much easier for a malicious actor to execute commands. The subprocess module allows you to start new processes, connect to their input/output/error pipes, and obtain their return codes. Methods such as Popen, run, call, check_call, check_output are intended for running commands provided as an argument ('args'). Allowing user input in a command that is passed as an argument to one of these methods can create an opportunity for a command injection vulnerability."

In our case the risk is not exposed as no user input is directly taken into the subprocess commands. Nevertheless we should strive for the highest standards on security and this PR works on replacing all the shell=True executions in the data.all
code.

In this PR:

  • when possible we have set shell=False
  • in cases where the command was too complex a CommandSanitizer ensures that the input arguments are strings following the regex=[a-zA-Z0-9-_]

Testing:

  • local testing - deployment of any stack (backend/dataall/base/cdkproxy/cdk_cli_wrapper.py)
  • local testing - deployment of cdk pipeline stack (backend/dataall/modules/datapipelines/cdk/datapipelines_cdk_pipeline.py)
  • local testing - deployment of codepipeline pipeline stack (backend/dataall/modules/datapipelines/cdk/datapipelines_pipeline.py)
  • AWS testing - deployment of data.all
  • AWS testing - deployment of any stack (backend/dataall/base/cdkproxy/cdk_cli_wrapper.py)
  • AWS testing - deployment of cdk pipeline stack (backend/dataall/modules/datapipelines/cdk/datapipelines_cdk_pipeline.py)
  • AWS testing - deployment of codepipeline pipeline stack (backend/dataall/modules/datapipelines/cdk/datapipelines_pipeline.py)

Relates

Security

Please answer the questions below briefly where applicable, or write N/A. Based on
OWASP 10.

  • Does this PR introduce or modify any input fields or queries - this includes
    fetching data from storage outside the application (e.g. a database, an S3 bucket)?
    • Is the input sanitized? ---> 🆗 This is exactly what this PR is trying to do
    • What precautions are you taking before deserializing the data you consume?
    • Is injection prevented by parametrizing queries?
    • Have you ensured no eval or similar functions are used?
  • Does this PR introduce any functionality or component that requires authorization?
    • How have you ensured it respects the existing AuthN/AuthZ mechanisms?
    • Are you logging failed auth attempts?
  • Are you using or adding any cryptographic features?
    • Do you use a standard proven implementations?
    • Are the used keys controlled by the customer? Where are they stored?
  • Are you introducing any new policies/roles/users?
    • Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@noah-paige noah-paige linked an issue Sep 15, 2023 that may be closed by this pull request
@dlpzx dlpzx marked this pull request as ready for review October 2, 2023 08:08
@dlpzx dlpzx changed the base branch from main to v2m1m0 October 2, 2023 08:26
@dlpzx dlpzx requested a review from noah-paige October 2, 2023 09:08
@dlpzx dlpzx changed the base branch from v2m1m0 to main October 2, 2023 09:08
@noah-paige
Copy link
Contributor

Testing this PR in an AWS Deployment - found errrors on the following

  • ERROR - CDK Pipeline Creation
    • Stack created successfully but then receiving an error on deactivate command - I think originally used to deactivate venv before cleaning up but maybe we can remove this command not sure if it is needed

Screenshot 2023-10-03 at 10 55 28 AM

  • ERROR - DataPipeline or any other stack creation
    • Throwing error botocore.exceptions.NoCredentialsError: Unable to locate credentials
    • Earlier on when trying to resolve default credentials receive the error Unable to determine the default AWS account (EINVAL): EC2 Metadata roleName request returned error
    • May be originating from aws_configure() requiring shell=True for curl command?

Copy link
Contributor

@noah-paige noah-paige left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found some errors (mentioned above) when testing AWS Deployment - may need to look into other workarounds when setting shell=False for cdkproxy

deploy/stacks/solution_bundling.py Outdated Show resolved Hide resolved
@dlpzx dlpzx changed the base branch from main to v2m1m0 October 10, 2023 15:21
@@ -46,10 +47,15 @@ def aws_configure(profile_name='default'):
print('..............................................')
print(f"AWS_CONTAINER_CREDENTIALS_RELATIVE_URI: {os.getenv('AWS_CONTAINER_CREDENTIALS_RELATIVE_URI')}")
cmd = ['curl', '169.254.170.2$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI']
process = subprocess.run(' '.join(cmd), text=True, shell=True, encoding='utf-8', capture_output=True) # nosec # nosemgrep
process = subprocess.run(cmd, text=True, shell=False, encoding='utf-8', capture_output=True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am working on this item, after plotting the error logs I think the issue is on the url to get the ECS-EC2 credentials. I will use f-string to create the curl command since we can obtain it as environment variable. I am now testing it in AWS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@noah-paige I tested in AWS and this way of defining the commands works :)

f"{path}"
]

cmd = ['rm', '-rf', f"{path}"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the deactivate command. I think we do not need it. I am testing if there are any implications when deploying to AWS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested in AWS and working fine from my end

Copy link
Contributor

@noah-paige noah-paige left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested the below scenarios in AWS as well:

  • AWS testing - deployment of data.all
  • AWS testing - deployment of any stack (backend/dataall/base/cdkproxy/cdk_cli_wrapper.py)
  • AWS testing - deployment of cdk pipeline stack (backend/dataall/modules/datapipelines/cdk/datapipelines_cdk_pipeline.py)
  • AWS testing - deployment of codepipeline pipeline stack (backend/dataall/modules/datapipelines/cdk/datapipelines_pipeline.py)

NOTE - When the CDK Pipeline ran I received the following error in the Build Step related to aws_ddk_core package:

ERROR: Could not find a version that satisfies the requirement aws_ddk_core==0.5.1 (from versions: 1.0.0b0, 1.0.0b1, 1.0.0rc0, 1.0.0, 1.0.1, 1.1.0, 1.2.0, 1.3.0)

Approving this PR since the error is outside the scope of semgrep checks and the PR looks good but may need to open a new issue if you can also replicate to resolve

@dlpzx
Copy link
Contributor Author

dlpzx commented Oct 16, 2023

We should also consider upgrading to DDK to the newest major release. They do not make use of the core library anymore and the bootstraping of accounts is replaced by CDK bootstraping, which simplified the usage of pipelines: #747

@dlpzx dlpzx merged commit 599fc1a into v2m1m0 Oct 16, 2023
@dlpzx dlpzx modified the milestone: v2.1.0 Oct 30, 2023
@dlpzx dlpzx mentioned this pull request Oct 30, 2023
dlpzx added a commit that referenced this pull request Nov 8, 2023
### Feature or Bugfix
- Feature
- Bugfix
- Refactoring

### Detail

#### Features
* Limit pivot role S3 permissions by @dlpzx in
#780
* Limit pivot role KMS permissions by @dlpzx in
#830
* Add configurable session timeout to IDP by @manjulaK in
#786
* Allow to submit a share when you are both an approver and a requester
by @zsaltys in #793
* Redirect upon creating a share request by @zsaltys in
#799
* Handle Pre-filtering of tables by @anushka-singh in
#811
* Email Notification on Share Workflow - Issue - 734 by @TejasRGitHub in
#818
* Refactor notifications from core to modules by @dlpzx in
#822
* Add frontend and backend feature flags by @zsaltys in
#817
* Make hosted_zone_id optional by @lorchda in
#812

#### Fixes
* Add Additional Error Messages for KMS Key lookup on imported dataset
by @noah-paige in #748
* Handle Environment Import of IAM service roles by @noah-paige in
#749
* Build Compliant Names for Opensearch Resources by @noah-paige in
#750
* Update Lambda runtime by @nikpodsh in
#782
* Ensure valid environments for share request and other objects creation
by @dlpzx in #781
* Fix shell true semgrep by @dlpzx in
#760
* Add condition when there are no public subnets by @lorchda in
#794
* Remove unused variable by @zsaltys in
#815
* Check other share exists before clean up by @noah-paige in
#769


### Relates
- v2.1.0 minor release

## New Contributors
* @manjulaK made their first contribution in
#786
* @zsaltys made their first contribution in
#793
* @anushka-singh made their first contribution in
#811
* @TejasRGitHub made their first contribution in
#818

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use a standard proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Noah Paige <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: jaidisido <[email protected]>
Co-authored-by: mourya-33 <[email protected]>
Co-authored-by: nikpodsh <[email protected]>
Co-authored-by: MK <[email protected]>
Co-authored-by: Zilvinas Saltys <[email protected]>
Co-authored-by: Daniel Lorch <[email protected]>
Co-authored-by: Anushka Singh <[email protected]>
Co-authored-by: trajopadhye <[email protected]>
@dlpzx dlpzx deleted the fix/shell-true-semgrep branch November 8, 2023 08:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace shell=true commands in subprocess function
2 participants