📢 Pod Security Standards and OpenShift changes in 4.11 and 4.12 that might affect your workloads (Operator, Operands) #1417
---
This is a bit misleading. On OCP 4.11, I don't think this^ is documented anywhere; I had to figure it out via trial and error:
CC @taylorjordanNC since it looks like @camilamacedo86 is no longer with Red Hat.
---
Hey @camilamacedo86, @taylorjordanNC - a bit of a tricky question for you, between 4.10 and 4.11.

**The problem**

Our solution (a database engine) requires […]. This worked fine in OpenShift 4.10, since the […]. But in 4.11, our container fails to come up unless we add this to the container:

```yaml
# StatefulSet
securityContext:
  capabilities:
    add:
      - NET_BIND_SERVICE
```

This is good; the container comes up in 4.11 as […]. Now, the problem is, we need to support backward compatibility for customers on 4.10. And in 4.10 […].

So this means: […]

**Possible solutions**

[…]

**Question for you**

Any other solutions you can think of, besides 1?
---
@mdrakiburrahman We've run into this as well. You simply can't have a Pod Template generically built to use NET_BIND_SERVICE that runs on both OCP 4.10 and 4.11 simultaneously if they are configured to use the out-of-box […].

We followed a pattern similar to your option 1. But instead of detecting which version of OCP or Kube we are running on, we test for the MODE of the target namespace for the Pod. Your Operator would then test the effective security model for that namespace, answering: […]

One generic way to do this test is to create a test Deployment using the same ServiceAccount prior to creating your real StatefulSet and see if it starts successfully. Or just try to run your StatefulSet one way and revert it to the other way if you detect the error. Another way is to check the Namespace statically using various API, Resource, and Label checks (a bit fragile). I'm not aware of a simple API to accomplish this (someone should write one). A fairly narrow test that carries some assumptions with it is to simply test if the […]
---
The Kubernetes API has been changing: the PodSecurityPolicy API is deprecated and will no longer be served as of Kubernetes 1.25. It is replaced by a new built-in admission controller (KEP-2579: Pod Security Admission Control), which allows cluster admins to enforce the Pod Security Standards with namespace labels. OpenShift security context constraints (SCC) have also been changing to address these needs.
💁What are the changes?
With the introduction of the new built-in admission controller that enforces the Pod Security Standards, Namespaces and Pods can be defined with three different policies: Privileged, Baseline, and Restricted. Pods that are not configured according to the security standards enforced globally or at the namespace level will not be admitted and cannot run.
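For illustration, this is how a cluster admin can enforce a policy at the namespace level with Pod Security Admission labels (the namespace name is hypothetical):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace   # hypothetical
  labels:
    # Reject Pods that do not meet the "restricted" standard
    pod-security.kubernetes.io/enforce: restricted
    # Optionally also warn and audit at the same level
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```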
In OpenShift 4.11:
Note: To learn more, check the Pod Security Admission release notes for the OCP 4.11 release. Also, if you would like to know more about the OpenShift changes and their motivations, check out the blog post Pod Security Admission in OpenShift 4.11.
Who is affected by these changes?
Access to the restricted SCC policy is no longer granted to all users by default in new 4.11 clusters. This means that workloads which previously used the restricted SCC may not have access to it in v4.11. If the workload does not qualify for another SCC that it has access to (such as the restricted-v2 policy), the workload will not be admitted onto the cluster.
Note that workloads can be categorized into `restricted-v2` regardless of which capabilities they explicitly drop, so any workload that otherwise qualifies as `restricted-v2` will automatically have ALL capabilities dropped.

What is planned for OpenShift 4.12?
By the global PSA configuration, all namespaces will run in the “restricted” enforcement mode. This PSA level aligns with the `restricted-v2` SCC policy, so if your workload is categorized into `restricted-v2`, it will be able to run in a namespace that enforces PSA as restricted.

In cases where the workload has access to higher SCC privileges (via its ServiceAccount), OCP/OLM (via the new label syncer component) will label namespaces to synchronize the PSA enforcement level to “Privileged” or “Baseline” in order to keep workloads running.

⚠️ However, be aware:
The label syncer does not cover namespaces prefixed with `"openshift-"` that do not have an OLM Operator (CSV) installed in them.

Who is affected by these changes?
Note that the change mainly enforces the restricted policy by default on namespaces prefixed with `openshift-`, so workloads that cannot qualify for the `restricted-v2` SCC will not be admitted and consequently cannot run in them.

In OpenShift, ideally, no one distributing solutions integrated with and managed by OLM is affected (unless your workload was created in a namespace that has no CSV installed, or one that is part of the OpenShift system).
Your Operator solution might be affected in the following ways:
- Workloads which cannot be accepted under the `restricted-v2` SCC will fail to run in the OCP payload namespaces (see the list) or in namespaces prefixed with `"openshift-"` which do not have an OLM Operator (CSV) installed in them.
- Workloads that can run as `restricted-v2` but are not configured accordingly, or those that require escalated permissions, will be rejected with an error such as:
```
error creating new pod: oo-g4t6w-: pods "oo-g4t6w-xlt6m" is forbidden: violates PodSecurity "restricted:latest": runAsNonRoot != true (pod or container "registry-server" must set securityContext.runAsNonRoot=true)
```
🙆 So, what should I do?
Check if you are affected by the changes on 4.11:
The best recommendation is to ensure workloads can run as restricted unless they require escalated privileges to function, and to properly set their security context configuration.
Action 1: Check if your solution has one or more workloads that:

- set `spec.containers[*].securityContext.allowPrivilegeEscalation = true` and are NOT using other security context configurations that would disqualify the workload from running as restricted on OpenShift up to 4.10 (i.e. `runAsUser`), OR
- add capabilities via `spec.containers[*].securityContext.capabilities`. These capabilities will be automatically dropped in 4.11 (v2 SCC) but were not dropped in 4.10 (v1 SCC).

Then your solution might have run as restricted before (OpenShift versions up to 4.10), but now (in OCP 4.11+) it does not qualify for the `restricted-v2` SCC. Therefore, OCP/OLM users might be unable to run these workloads on their cluster, OR (SCENARIO 2) your workload will not have the required permissions to do what it must do.
In this case, you can:
OPTION A) If or when possible, re-develop your product so that any Pods and containers created do not require escalated privileges
Example:
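A minimal sketch of a container `securityContext` that qualifies for restricted/`restricted-v2` (field values follow the recommendations in this post; note that `seccompProfile` is deliberately left unset for compatibility with OCP <= 4.10, as described in the caveats below):

```yaml
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  # seccompProfile is intentionally left unset: the SCC defaults it to
  # runtime/default, and setting it explicitly disqualifies the workload
  # from the restricted SCC on OCP <= 4.10.
```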
OR
OPTION B) Ensure that the ServiceAccount managing the workloads has the correct privileges to use the SCCs that allow it to run the workloads it is responsible for.
Note: You may define this in the operator metadata bundle via the RBAC specified in the CSV, which is used for the ServiceAccount managing the workloads. To learn which SCC is most appropriate, check the SCC documentation. Also, regarding the related changes in 4.12, see the item "If your solutions require escalated permissions" in the detailed information provided below.
Example:
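For illustration, RBAC that grants a ServiceAccount permission to use a specific SCC could look like the following (the Role name and the `nonroot-v2` choice are illustrative; pick the least-privileged SCC your workload actually needs):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-operand-scc   # illustrative name
rules:
  - apiGroups: ["security.openshift.io"]
    resources: ["securitycontextconstraints"]
    # Grant "use" only on the specific SCC required
    resourceNames: ["nonroot-v2"]
    verbs: ["use"]
```

Bind this Role to the workload's ServiceAccount with a RoleBinding. Remember that `restricted-v2` needs no extra permission, since all users have access to it.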
Therefore, if your application fails in SCENARIO 2 (i.e. it requires escalated privileges because it needs specific capabilities to run and perform operations), then make sure to explicitly write out the required capabilities in the security context of your workload.
Example:
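For example, a sketch of a container `securityContext` that explicitly requests a capability (`NET_BIND_SERVICE` here is illustrative; list the capabilities your workload actually needs):

```yaml
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    # Explicitly request the needed capability...
    add:
      - NET_BIND_SERVICE
    # ...while still dropping everything else
    drop:
      - ALL
```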
Action 2: If your workloads (Operator, Operand(s)) should run as restricted-v2 (i.e. they are not configured to escalate permissions and have no reason to): test your solutions against a NEW OpenShift 4.11 cluster and ensure that all workloads are admitted via the restricted-v2 SCC. (See the FAQ section for more details.)
The most straightforward way to ensure that your workloads can work in a restricted namespace (coming in 4.12) is to label the namespaces where they should run to enforce the restricted policy, then verify that they are admitted and run successfully. You can also check that they are assigned the restricted-v2 SCC in 4.11. It is recommended to ensure the desired behaviour via an e2e test.
💡 If you need to check or verify the SCC a workload is using, you can find the value in the annotation openshift.io/scc, for example:
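An illustrative excerpt of a Pod that was admitted via `restricted-v2` (the Pod name is hypothetical; the annotation is set by OpenShift at admission time):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod   # hypothetical
  annotations:
    # Set by the cluster; shows which SCC admitted the Pod
    openshift.io/scc: restricted-v2
```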
IMPORTANT NOTE: Please ensure you test and validate your workloads (Operator, Operands) specifically against NEW OpenShift 4.11 installations. If you do not do this, you may not see the full scope of failures because your workloads may still qualify for the restricted (v1) SCC policies.
Best Practices Moving Forward: What is recommended?
The best approach is to ensure that any workload has its security context properly set. Therefore, the best option for any new publication is to ensure that workloads (Operators, Operands) can run under restricted permissions. In this way, your Operator can run in restricted namespaces on vanilla Kubernetes (and likely on other vendors' distributions), can use restricted-v2 on OpenShift, and will not be pushed back by cluster admins who are looking to enforce the restrictions. (See the caveats below.)
However, If your solutions require escalated permissions: it is recommended that you ensure the namespace containing your solution is labeled accordingly so you are not reliant on the label syncer. You can either update your operator to manage the namespace labels or include the namespace labelling as part of the manual install instructions.
Assuming your operator already grants the appropriate SCC permissions to your operator ServiceAccount, the label syncer should already be setting the PSA label on the namespace appropriately. If you choose one of those options, you also MUST disable the label syncer to ensure it does not reset the PSA label after your operator or user sets it.
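A sketch of a namespace that opts out of the label syncer and pins its own PSA level (the namespace name is hypothetical; `security.openshift.io/scc.podSecurityLabelSync` is the opt-out label honored by the OpenShift label syncer):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-operator-ns   # hypothetical
  labels:
    # Opt this namespace out of automatic PSA label syncing
    security.openshift.io/scc.podSecurityLabelSync: "false"
    # Pin the enforcement level your workloads require
    pod-security.kubernetes.io/enforce: privileged
```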
On top of that, cluster admins will be looking to understand why your solution requires raised permissions. In this way, please ensure that you properly describe the reasons. You can add this information and the prerequisites to the description of your Operator Bundle (CSV).
You can find code examples, tips and tools to check your operator solutions here.
Note that while the K8s restricted profile requires workloads to set seccompProfile=runtime/default, in OCP 4.10 and earlier setting the seccompProfile explicitly disqualified the workload from the restricted SCC. To be compatible with both 4.10 and 4.11, the seccompProfile value must be left unset (the SCC itself will default it to runtime/default, so it is OK to leave it empty).
With some configurations, your solution might comply with the restricted Kubernetes definition but will not be accepted under the `restricted-v2` SCC in OCP. In K8s you can specify `runAsUser` and get the Pod/Container running in a restricted namespace; however, for OpenShift's `restricted`/`restricted-v2` SCC you MUST leave the `runAsUser` field empty, or provide a value that falls within the specific user range for the namespace. Otherwise, you will only be able to run the Pod if it has access to the `nonroot-v2` or `anyuid` SCC. If the image used requires a user, the best option is to ensure that the user ID is properly defined in the image and not via the security context.

TL;DR (detailed info; ignore if you do not need it):
Below you will find a detailed description of how to configure and test your solutions. Also, we provide an FAQ section which we hope can answer the questions that you might have.
FAQ
🙋♀️ What happens if my solution is impacted when the cluster admin upgrades the OCP cluster from 4.10 to 4.11 with it installed? Will my workload(s) fail because the restricted SCC policy is no longer granted by default?
Default access to the legacy restricted SCC policy is not granted in NEW clusters but is carried forward in upgraded clusters. Therefore, all 4.11 clusters (upgraded and new installs) will have both a restricted and a restricted-v2 SCC, but only upgraded clusters grant permission to use the restricted SCC policy by default.
That means your workload would not be impacted by a cluster upgrade. However, you must provide solutions that work for both scenarios and for the default OpenShift configuration, so you must ensure that your workloads work on new OCP installs.
🙋♀️Could you please provide details on how I can publish my Operator bundle versions following the recommendations on OpenShift?
If your solution(s) do NOT need escalated permissions: (the most common scenario)
If you can publish a specific solution for OpenShift 4.11+ only (i.e. com.redhat.openshift.versions: 4.11): it is recommended that any Pod/Container created/managed by your Operator, as well as the Operator itself (the deployment strategy defined in your CSV), is configured as restricted:
Example:
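A sketch of the pod template securityContext for a bundle that targets OCP 4.11+ only (the container name is illustrative; because only 4.11+ is targeted, seccompProfile can safely be set explicitly here):

```yaml
spec:
  securityContext:
    runAsNonRoot: true
    # Safe to set explicitly when targeting OCP 4.11+ only
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: manager   # illustrative
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
```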
If you need to publish your solution on previous OCP releases (< 4.11): ensure that any Pod/Container that is created/managed by your Operator and the Operator itself (a deployment strategy defined in your CSV) can be admitted as restricted-v2 by OCP 4.11 and work on the previous OCP versions (< 4.11).
Then, check the RBAC permissions used by the ServiceAccount defined in the CSV. Ensure that you are not allowing access to a higher-priority SCC by giving permissions to the apiGroup security.openshift.io (see the example). Note that you do not need to give permission to access restricted-v2 (all users have access to it).
If your solutions require escalated permissions:
You can publish this new Operator bundle version for both newer (4.11+) and older (4.10-) clusters as the namespace labels will have no effect on older clusters.
🙋♀️When/if the Pod/Container cannot run as restricted-v2 and has no access to a more permissive SCC to be admitted and run, how does it fail? Could you provide an example?
Following is an example where `spec.containers[0].securityContext.allowPrivilegeEscalation = true`:

💂How can I verify if my Operators or Operands are being accepted under the restricted-v2 criteria?
Example of a Pod admitted as restricted-v2
Example of a Pod admitted as privileged (when they are defined as requiring escalated permissions):
👩🏭I created a Pod in OCP following the recommendations and the value of the annotation openshift.io/scc is anyuid and not restricted-v2. Why does this happen? What is wrong?
You will see an SCC set in the annotations of your Pods/Containers (openshift.io/scc: value) according to the Pod/Container(s) securityContext configuration and the permissions of the ServiceAccount used to create them. If you are logged in as the cluster-admin, you have higher permissions and you might get an SCC such as nonroot-v2 or anyuid. (More info)
To do this as a cluster admin, the easiest approach is to create a Deployment and ensure that the deployment's ServiceAccount does not have any special permissions. The Deployment will be created using your admin permissions, but the Pod itself will be created (and checked for admission/SCC) using the SA permissions.
However, an easier approach is to log in with an account that only has access to the restricted-v2 SCC (not cluster-admin) and create the Pod. Then you can verify whether it is admitted and qualified as "restricted-v2".
However, if you are ONLY looking to check a manifest, you can apply it against OCP 4.11+ (i.e. oc apply -f mypod.yaml) or test locally with Kind. With Kind you MUST label the namespace and then apply the manifest to check the warnings (i.e. with Kind v0.14.0):
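A sketch of the Kind-based check described above (the namespace and manifest file are illustrative; the warning text varies with the actual violation):

```console
$ kind create cluster
$ kubectl label --overwrite ns default pod-security.kubernetes.io/warn=restricted
$ kubectl apply -f mypod.yaml
Warning: would violate PodSecurity "restricted:latest": ...
```

With the `warn` label, Pods are still admitted, but any violations are printed as warnings, which makes this useful for checking manifests without blocking them.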
Note: More info about the metrics can be found in the doc.
👮Aren’t the pipelines checking my Operator solutions against new OCP installs and for all versions where my Operator bundle will be distributed?
All pipelines only check whether your Operator bundle can be installed and the deployment defined in the CSV. They are unable to check whether the Operand workloads will be able to run. Note that in these tests the CRs are NOT applied, and any logic called in your reconciliations cannot be checked.
🙋♀️Can I use the audit logs?
By looking at the audit logs (`oc adm must-gather -- /usr/bin/gather_audit_logs`) you can find more info about what can violate PodSecurity.

👩🏭I followed the recommendations to publish solutions for OpenShift < 4.11 and I am still seeing warnings when I apply a manifest. Did I do something wrong?
Note that to publish workable solutions for OpenShift < 4.11 you need to leave `securityContext.seccompProfile` empty; you will then still see warnings for the missing field even when the other specs are set, such as […]
👩🏭What happens if I use `runAsUser` in my workloads?

On vanilla Kubernetes clusters, your workload will run in a namespace enforced as restricted as long as the user is defined with a value that is NOT 0 (root). However, as described in the CAVEATs above, the workload will not be accepted on OCP as qualifying for restricted-v2. Therefore you will face issues like:
Note that the range of users is per namespace, and we are unable to know a fixed value beforehand. The best option is to ensure that your workload can run without setting the `runAsUser` field, so that it can run as restricted on K8s clusters and qualify for the `restricted-v2` SCC on OpenShift.

However, if the image used does not define a USER ID, you will see the following error:
`container has runAsNonRoot and image will run as root …`

when the `runAsNonRoot` field is set to true. In that case, you can define a USER ID in the Dockerfile instead of using `runAsUser` in the security context, such as:

If you are unable to change the image, for OpenShift distributions you MUST leave the `runAsUser` and `runAsNonRoot` fields empty and let the OpenShift SCC inject the values when your workload qualifies for the SCC policy.
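The Dockerfile approach mentioned above (defining the USER in the image rather than via `runAsUser`) might look like this; the base image, UID, and entrypoint are illustrative:

```dockerfile
FROM registry.access.redhat.com/ubi8/ubi-minimal:latest  # illustrative base image
# ... install and copy the application ...
# Declare a fixed non-root numeric UID so runAsNonRoot checks can pass
USER 65532:65532
ENTRYPOINT ["/manager"]  # illustrative entrypoint
```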