-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: shell true semgrep #760
Conversation
# Conflicts: # deploy/stacks/solution_bundling.py
…/shell-true-semgrep # Conflicts: # backend/dataall/base/cdkproxy/cdk_cli_wrapper.py
Testing this PR in an AWS Deployment - found errrors on the following
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found some errors (mentioned above) when testing AWS Deployment - may need to look into other workarounds when setting shell=False
for cdkproxy
@@ -46,10 +47,15 @@ def aws_configure(profile_name='default'): | |||
print('..............................................') | |||
print(f"AWS_CONTAINER_CREDENTIALS_RELATIVE_URI: {os.getenv('AWS_CONTAINER_CREDENTIALS_RELATIVE_URI')}") | |||
cmd = ['curl', '169.254.170.2$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI'] | |||
process = subprocess.run(' '.join(cmd), text=True, shell=True, encoding='utf-8', capture_output=True) # nosec # nosemgrep | |||
process = subprocess.run(cmd, text=True, shell=False, encoding='utf-8', capture_output=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am working on this item, after plotting the error logs I think the issue is on the url to get the ECS-EC2 credentials. I will use f-string to create the curl command since we can obtain it as environment variable. I am now testing it in AWS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@noah-paige I tested in AWS and this way of defining the commands works :)
f"{path}" | ||
] | ||
|
||
cmd = ['rm', '-rf', f"{path}"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the deactivate
command. I think we do not need it. I am testing if there are any implications when deploying to AWS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested in AWS and working fine from my end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested the below scenarios in AWS as well:
- AWS testing - deployment of data.all
- AWS testing - deployment of any stack (backend/dataall/base/cdkproxy/cdk_cli_wrapper.py)
- AWS testing - deployment of cdk pipeline stack (backend/dataall/modules/datapipelines/cdk/datapipelines_cdk_pipeline.py)
- AWS testing - deployment of codepipeline pipeline stack (backend/dataall/modules/datapipelines/cdk/datapipelines_pipeline.py)
NOTE - When the CDK Pipeline ran I received the following error in the Build Step related to aws_ddk_core
package:
ERROR: Could not find a version that satisfies the requirement aws_ddk_core==0.5.1 (from versions: 1.0.0b0, 1.0.0b1, 1.0.0rc0, 1.0.0, 1.0.1, 1.1.0, 1.2.0, 1.3.0)
Approving this PR since the error is outside the scope of semgrep checks and the PR looks good but may need to open a new issue if you can also replicate to resolve
We should also consider upgrading to DDK to the newest major release. They do not make use of the core library anymore and the bootstraping of accounts is replaced by CDK bootstraping, which simplified the usage of pipelines: #747 |
### Feature or Bugfix - Feature - Bugfix - Refactoring ### Detail #### Features * Limit pivot role S3 permissions by @dlpzx in #780 * Limit pivot role KMS permissions by @dlpzx in #830 * Add configurable session timeout to IDP by @manjulaK in #786 * Allow to submit a share when you are both an approver and a requester by @zsaltys in #793 * Redirect upon creating a share request by @zsaltys in #799 * Handle Pre-filtering of tables by @anushka-singh in #811 * Email Notification on Share Workflow - Issue - 734 by @TejasRGitHub in #818 * Refactor notifications from core to modules by @dlpzx in #822 * Add frontend and backend feature flags by @zsaltys in #817 * Make hosted_zone_id optional by @lorchda in #812 #### Fixes * Add Additional Error Messages for KMS Key lookup on imported dataset by @noah-paige in #748 * Handle Environment Import of IAM service roles by @noah-paige in #749 * Build Compliant Names for Opensearch Resources by @noah-paige in #750 * Update Lambda runtime by @nikpodsh in #782 * Ensure valid environments for share request and other objects creation by @dlpzx in #781 * Fix shell true semgrep by @dlpzx in #760 * Add condition when there are no public subnets by @lorchda in #794 * Remove unused variable by @zsaltys in #815 * Check other share exists before clean up by @noah-paige in #769 ### Relates - v2.1.0 minor release ## New Contributors * @manjulaK made their first contribution in #786 * @zsaltys made their first contribution in #793 * @anushka-singh made their first contribution in #811 * @TejasRGitHub made their first contribution in #818 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Noah Paige <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: jaidisido <[email protected]> Co-authored-by: mourya-33 <[email protected]> Co-authored-by: nikpodsh <[email protected]> Co-authored-by: MK <[email protected]> Co-authored-by: Zilvinas Saltys <[email protected]> Co-authored-by: Daniel Lorch <[email protected]> Co-authored-by: Anushka Singh <[email protected]> Co-authored-by: trajopadhye <[email protected]>
Feature or Bugfix
Detail
As explained in the semgrep docs: "Functions from the subprocess module have the shell argument for specifying if the command should be executed through the shell. Using shell=True is dangerous because it propagates current shell settings and variables. This means that variables, glob patterns, and other special shell features in the command string are processed before the command is run, making it much easier for a malicious actor to execute commands. The subprocess module allows you to start new processes, connect to their input/output/error pipes, and obtain their return codes. Methods such as Popen, run, call, check_call, check_output are intended for running commands provided as an argument ('args'). Allowing user input in a command that is passed as an argument to one of these methods can create an opportunity for a command injection vulnerability."
In our case the risk is not exposed as no user input is directly taken into the subprocess commands. Nevertheless we should strive for the highest standards on security and this PR works on replacing all the
shell=True
executions in the data.allcode.
In this PR:
shell=False
CommandSanitizer
ensures that the input arguments are strings following the regex=[a-zA-Z0-9-_]
Testing:
backend/dataall/base/cdkproxy/cdk_cli_wrapper.py
)backend/dataall/modules/datapipelines/cdk/datapipelines_cdk_pipeline.py
)backend/dataall/modules/datapipelines/cdk/datapipelines_pipeline.py
)backend/dataall/base/cdkproxy/cdk_cli_wrapper.py
)backend/dataall/modules/datapipelines/cdk/datapipelines_cdk_pipeline.py
)backend/dataall/modules/datapipelines/cdk/datapipelines_pipeline.py
)Relates
Security
Please answer the questions below briefly where applicable, or write
N/A
. Based onOWASP 10.
fetching data from storage outside the application (e.g. a database, an S3 bucket)?
eval
or similar functions are used?By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.