📢 Pod Security Standards and Openshift changes on 4.11 and 4.12 that might affect your workloads(Operator, Operands) #1417

camilamacedo86 · 2022-07-15T12:54:45Z

camilamacedo86
Jul 15, 2022

Kubernetes API has been changing, and the PodSecurityPolicy API is deprecated and will no longer be served from k8s 1.25. This API is replaced by a new built-in admission controller (KEP-2579: Pod Security Admission Control), allowing cluster admins to enforce the Pod Security Standards with Namespace Labels. Also, OpenShift security context constraints (SCC) have been changing to address these needs.

💁What are the changes?

With the introduction of the new built-in admission controller that enforces the Pod Security Standards, Namespace and Pods can be defined with three different policies: Privileged, Baseline and Restricted. Therefore, Pod(s) that are not configured according to the enforced security standards defined globally or on the namespace level will not be admitted and cannot run.

In Openshift 4.11:

The new SCC policies [restricted-v2, nonroot-v2, and hostnetwork-v2] are introduced with new criteria to admit workloads according to the PodSecurity Standards Kubernetes definitions. Permissions to use the restricted-v2 SCC is granted to all users.
The new (v2) policy versions drop ALL capabilities, while the previous versions [v1] only drop a subset.

v1 vs. v2 SCC policies

V2 does not permit allowPrivilegeEscalation=true
** Empty or false is compatible with v1 SCC and therefore works on OCP versions < 4.11

V2 requires you to leave the dropped capabilities empty, set it to ALL, or add only NET_BIND_SERVICE
** By being accepted as v2 the SCC will always drop ALL. V1 only dropped KILL, MKNOD, SETUID, SETGID capabilities.

V2 still allows explicitly adding the NET_BIND_SERVICE capability

V2 requires you to either leave SeccompProfile empty or set it to runtime/default
** Empty is compatible with v1 and works on OCP versions < 4.11

Note: To know more about check the Pod security admission release notes for OCP 4.11 release. Also, If you would like to know more about the OpenShit changes and its motivations check out the blog Pod Security Admission in OpenShift 4.11.

Who is affected by these changes?

SCENARIO 1) Workloads (Operator, Operands) qualified for restricted before (4.10) but now can not qualify for restricted-v2(4.11).

Access to the restricted SCC policy is no longer granted to all users by default in new 4.11 clusters. This means that workloads which previously used the restricted SCC may not have access to it in v4.11. If the workload does not qualify for another SCC that it has access to (such as the restricted-v2 policy), the workload will not be admitted onto the cluster.

Who are these ones?
Workloads that are using spec.containers[].securityContext.allowPrivilegeEscalation = true AND are NOT using other security context configurations that would disqualify the workload to run as restricted up Openshift 4.10 (i.e. runAsUser )

SCENARIO 2) If your workload is categorized into the restricted-v2 or nonroot-v2 or hostnetwork-v2 SCC you may find it fails to run because it tries to use a capability that is now being dropped.

Note that workloads can be categorized into restricted-v2 regardless of what capabilities they explicitly drop, so any workload that otherwise qualifies as restricted-v2 will automatically have ALL capabilities dropped.

Who are these ones?
Any workload that requires capabilities other than KILL, MKNOD, SETUID, SETGID, NET_BIND_SERVICE AND does not described in spec.containers[].securityContext.capabilities these other capabilities which will be automatically dropped in 4.11(restricted-v2 SCC) but were not dropped in 4.10(restricted SCC). Therefore, the workload will not have the required permission to do what it must do.

What is planned for Openshift 4.12?

By the global PSA configuration, all namespaces will run in the “restricted” enforcement mode. This PSA level aligns with the restricted-v2 SCC policy, so if your workload is categorized into restricted-v2, it will be able to run in a namespace that enforces PSA as restricted.

In cases where the workload has access to higher SCC privileges (via its Service Account) OCP/OLM (via the new label syncer component) will label namespaces to synchronize the PSA enforcement level to “Privileged” or “Baseline” in order to keep workloads running. ⚠️However, be aware:

this automatic enforcement labeling logic will NOT be applied to system namespaces that are managed as part of the Openshift system or those which are prefixed with "openshift-" and do not have an OLM Operator(CSV) installed in them.
cluster admins can enforce the PSA level for a namespace to be restricted.

Who is affected by these changes?

Note that the change is mainly in enforcing the restricted policy by default to namespaces prefixed with openshift- to not allow workloads which can not be qualified to restricted-v2 SCC to be admitted and consequently run on them.

In Openshift, ideally, no one who is distributing solutions integrated and managed by OLM. (unless your workload was created in a namespace that has no CSV installed OR those which are part of Openshift system)

Your Operator solution might be affected in the following ways:

Workloads which cannot be accepted as restricted-v2 SCC will fail to run in the OCP payload namespace (see the list) or in namespaces prefixed with "openshift-" which do not have an OLM Operator(CSV) installed in them.
Workloads that can run as restricted-v2 but are not configured accordingly or those that require escalated permissions.

Note that the namespaces where your workloads run might be labeled as "Privileged" or "Baseline" by the label syncer. This will work, but cluster admins may ask why escalated permissions are necessary and push for the workload definition to be changed so that it can use the restricted-v2 SCC, and thereby allow the NS to be labeled as restricted.

Operator solutions which are also distributed on vanilla k8s clusters or other vendors.

This label syncing behaviour is an Openshift feature and it will not occur on vanilla k8s clusters, bear in mind that you may need to explicitly state Pod Security admission requirements for your workloads. So if your workload doesn’t meet the requirements, it will fail to run.

If you or your solution creates Pod(s) directly (rather than using a workload resource like Deployment or StatefulSet to manage the pod) and those pods are created using an account which has broader SCC privileges than any other ServiceAccount in that namespace it might fail unless you configure the pod security context such that it can run under the PSA constraints according to the namespace PSA enforcement level.

From 4.12 all namespaces will have the label to enforce the restricted policy by default ( pod-security.kubernetes.io/enforce=restricted) and it brings a special case when you try to create Pod(s) directly. Therefore, to avoid this scenario it would be recommended to ensure that Pods are created and managed by Deployments, rather than being created directly such as in your Golang code implementations, and/or ansible playbooks and/or helm charts.

Be aware that if you set the service account and be logged in, for example, as cluster admin which has permission to the privileged SCC and tries to create a Pod directly it will fail unless you set all security context settings to run as restricted. The same does not happen if you are creating a Deployment instead because the Deployment will be created as cluster admin then, assuming that only restricted-v2 is granted to the SA informed/used by this Deployment, SCC admission will set the values for the fields which are empty in the pod definition to the values required to admit the pod under the restricted-v2 SCC which will also allow it to meet the requirements of PSA-restricted.

If you create the pod using an account which has access to a higher permission SCC such as “privileged”, and if the pod does not explicitly define its security context to meet the requirements of PSA-restricted, the pod may be admitted under that SCC in which case SCC admission will not mutate the pod and the resulting pod may not meet the requirements of PSA-restricted, resulting in the pod being rejected by PSA.

To create the Pod directly and avoid this problem, there are 3 options:

Raise the PSA namespace permissions to baseline or privileged (i.e. pod-security.kubernetes.io/enforce=previleged)

Use an account that only has access to the restricted-v2 SCC when directly creating pods (e.g. oc create -f my-pod.yaml --as system:serviceaccount:your-namespacep1:default)

Ensure the pod’s security context configuration fulfils the requirements of PSA-restricted

If you or your solution creates workloads in special Openshift system namespaces where the SCC where admission is not enforced, these workloads can only be admitted if they can run under the enforced policy and are with all security context set accordingly

Ideally, nobody should be using these namespaces. However, be aware that in those namespaces (i.e. openshift, openshift-config, openshift-config-managed) SCC admission is not enforced and therefore SCC mutation is not performed. In this way if your workload does NOT set all security context criteria, for example, to run as restricted, since all namespaces are by default enforced as restricted, it will fail. Example:

error creating new pod: oo-g4t6w-: pods "oo-g4t6w-xlt6m" is forbidden: violates PodSecurity "restricted:latest": runAsNonRoot != true (pod or container "registry-server" must set securityContext.runAsNonRoot=true)

Note: By running oc get ns -l"openshift.io/run-level in (0,1)", you can check the namespaces that do not perform SCC mutation.

🙆 So, what should I do?

Check if you are affected by the changes on 4.11:

The best recommendation is to ensure workloads can run as restricted unless they require escalating privileges to function. Also, you will properly set their security context configuration.

Action 1: Check If your solution has one or more workloads that:

(SCENARIO 1) are using spec.containers[*].securityContext.allowPrivilegeEscalation = true
and are NOT using other security context configurations that would disqualify the workload to run as restricted up Openshift 4.10 (i.e. runAsUser )
OR
(SCENARIO 2) requires capabilities other than KILL, MKNOD, SETUID, SETGID, NET_BIND_SERVICE to run and which are not described in spec.containers[*].securityContext.capabilities. These other capabilities will be automatically dropped in 4.11(v2 SCC) but were not dropped in 4.10(v1 SCC).

Then, your solution might run as restricted before(Openshift versions up to 4.10) but now(in OCP 4.11+) it does not qualify for restricted-v2 SCC. Therefore, OCP/OLM users might be unable to run these workloads on their cluster OR for the item (SCENARIO 2) your workload will not have the required permission to do what it must do.

In this case, you can:

OPTION A) If or when possible, re-develop your product so any Pod(s) and container(s) created do not require escalating privileges

Example:

spec:
 securityContext:
   # Do not use SeccompProfile if your project must work on 
   # old k8s versions < 1.19 and Openshift < 4.11
   seccompProfile:
      type: RuntimeDefault
   runAsNonRoot: true
 containers:
   - name: my-container
     securityContext:
       allowPrivilegeEscalation: false
       capabilities:
         drop:
           - ALL

OR

OPTION B) Ensure that the ServiceAccount managing the workloads has the correct privileges to use the SCCs that allow it to run the workloads it is responsible for.

Note: You may define this into the operator metadata bundle via the RBAC specified in the CSV which is used for the ServiceAccount managing the workloads. To know what is more appropriate SCC checks its documentation Also, for the purpose of the related changes on 4.12 see the item "If your solutions require escalated permissions" in the detailed information provided below.

Example:

 install:
    spec:
      clusterPermissions:
      - rules:
        - apiGroups:
          - security.openshift.io
          resourceNames:
          - privileged
          resources:
          - securitycontextconstraints
          verbs:
          - use
        serviceAccountName: default

Therefore, if your application fails in SCENARIO 2 (i.e. it requires scaling privileges because it requires specific capabilities to run and perform operations) then make sure to explicitly write out the required capabilities in the security context of your workload.

Example:

 containers:
   - name: my-container
     securityContext:
       allowPrivilegeEscalation: false
       capabilities:
         add:
           - "NET_ADMIN"

Action 2: If your Workloads (Operator, Operand(s)) should run as restricted-v2 (it is not configured and/or has no reason to escalate permissions): test your solutions against a NEW Openshift 4.11 and ensure that all workloads are admitted via the restricted-v2 SCC. (See the FAQ section to know more about)

The most straightforward way to ensure that your workloads can work on a restricted namespace(coming in 4.12) is by labelling the namespaces where they should run by enforcing the restricted policy and verifying if they are admitted and are successfully running. You can also check if they are assigned the restricted-v2 SCC in 4.11. It is recommended to ensure the desired behaviour via an e2e test.

💡 If you need to check or verify the SCC a workload is using, you can find the value in the annotation openshift.io/scc, for example:

kind: Pod
apiVersion: v1
metadata:
 annotations:
   openshift.io/scc: privileged

IMPORTANT NOTE: Please ensure you test and validate your workloads (Operator, Operands) specifically against NEW OpenShift 4.11 installations. If you do not do this, you may not see the full scope of failures because your workloads may still qualify for the restricted (v1) SCC policies.

Best Practices Moving Forward: What is recommended?

The best approach is to ensure that any workload has its security context properly set. Therefore, the best option for any new publication is to ensure that workloads (Operators, Operands) can run under restricted permissions. In this way, your Operator can run on restricted namespaces on Kubernetes vanilla or likely in other vendors and can have access to the restricted-v2 on Openshift and will not be pushed back by cluster admins who are looking to enforce the restrictions. (see the caveats below)

However, If your solutions require escalated permissions: it is recommended that you ensure the namespace containing your solution is labeled accordingly so you are not reliant on the label syncer. You can either update your operator to manage the namespace labels or include the namespace labelling as part of the manual install instructions.

Assuming your operator already grants the appropriate SCC permissions to your operator service account, the label syncer should already be setting the PSA label on the namespace appropriately. If you choose one of those options, you also MUST to disable the labe syncer to ensure it does not reset the PSA label after your operator or user sets it.

On top of that, cluster admins will be looking to understand why your solution requires raised permissions. In this way, please ensure that you properly describe the reasons. You can add this information and the prerequisites to the description of your Operator Bundle (CSV).

You can find code examples, tips and tools to check your operator solutions here.

⚠️ CAVEATS:

Note that while K8S restricted requires workloads to set seccompProfile=runtime/default, in OCP 4.10 and earlier, setting the seccompProfile explicitly disqualified the workload from the restricted SCC. To be compatible with 4.10 and 4.11, the seccompProfile value must be left unset(the SCC itself will default it to runtime/default so it is ok to leave it empty).
With some configurations, your solution might comply with the restricted Kubernetes definition but will not be accepted under the SCC restricted-v2 in OCP. In K8s you can specify the runAsUser and get the Pod/Container running in a restricted namespace however, for Openshift’s restricted/restricted-v2 SCC you MUST leave the runAsUser field empty, or provide a value that falls within the specific user range for the namespace. Otherwise, you will only be able to run the Pod if it has access to the SCC nonroot-v2 or anyuid. If the image used requires a user, the best option is to ensure that the userID is properly defined in the image and not via the security context.

TL'DR ( detailed info/ignore if you do not need it):

Below you will find a detailed description of how to configure and test your solutions. Also, we provide an FAQ section which we hope can answer the questions that you might have.

FAQ

🙋‍♀️ What happens if my solution is impacted when the cluster admin upgrades the OCP cluster from 4.10 to 4.11 with it installed? Will my workload(s) fail because the restricted SCC policy is no longer granted by default?

Default access to the legacy restricted SCC policy is not granted in NEW clusters but is carried forward in upgraded clusters. Therefore, all 4.11 clusters(upgraded and new installs) will have a restricted and a restricted-v2 SCC. But only upgraded clusters will have the permission to use the restricted SCC policy by default.

That means, your workload would not be impacted by a cluster upgrade. However, you must provide solutions that work for both scenarios and for the default OpenShift configuration. In this way, you must ensure that your workloads work for new OCP installs.

🙋‍♀️Could you please provide the details on how can I publish my Operator bundle versions following the recommendations on Openshif?

If your solution(s) do NOT need escalated permissions: (the most common scenario)

If you can publish a specific solution for Openshift 4.11+ only: (i.e. com.redhat.openshift.versions: 4.11): It is recommended that any Pod/Container created/managed by your Operator as your Operator itself(deployment strategy defined on your CSV) is configured as restricted:

Example:

spec:
 securityContext:
   # Do not use SeccompProfile if your project must work on 
   # old k8s versions < 1.19 and Openshift < 4.11
   seccompProfile:
      type: RuntimeDefault
   runAsNonRoot: true
 containers:
   - name: my-container
     securityContext:
       allowPrivilegeEscalation: false
       capabilities:
         drop:
           - ALL

Note: if you need to run a container using a User ID please check the question What happens if I use runAsUser in my workloads?

If you need to publish your solution on previous OCP releases (< 4.11): ensure that any Pod/Container that is created/managed by your Operator and the Operator itself (a deployment strategy defined in your CSV) can be admitted as restricted-v2 by OCP 4.11 and work on the previous OCP versions (< 4.11).

To simplify, it mainly means ensuring that:
All workloads(Operator and Operands Pods) are NOT configured with: seccompProfile(must leave this unset) and do not set allowPrivilegeEscalation: true

Example:
spec:
 securityContext:
   runAsNonRoot: true
 containers:
   - name: my-container
     securityContext:
       allowPrivilegeEscalation: false
       capabilities:
         drop:
           - ALL
Note: by testing against an Openshift 4.11 (if your distribution will be added on 4.11 as well) to ensure that all workloads(Operator and Operands Pods) will receive the annotation openshift.io/scc:restricted-v2

Then, check the RBAC permissions used by the ServiceAccount defined in the CSV. Ensure that you are not allowing access to a higher SCC priority by giving permissions to the apiGroup security.openshift.io ( see the example ). Note that you do not need to give permission to access the restricted-v2. (all users have access to this one).

If your solutions require escalated permissions:

You can publish this new Operator bundle version for both newer (4.11+) and older (4.10-) clusters as the namespace labels will have no effect on older clusters.

🙋‍♀️When/if the Pod/Container can not run as restricted-v2 and has no access to more permissive SCC to be admitted and run, how does it fail? Could you provide an example?

Following an example where the spec.containers[0].securityContext.allowPrivilegeEscalation = true:

reason: FailedCreate
     message: >-
       pods "test-79cbf469c7-" is forbidden: unable to validate against any  security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, spec.containers[0].securityContext.allowPrivilegeEscalation: Invalidvalue: true: Allowing privilege escalation for containers is not allowed, provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler":
       Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]

💂How can I verify if my Operators or Operands are been accepted to the restrictive-v2 criteria?

Spinning a new Openshift 4.11 cluster
Then, install your Operator and/or your Operands
Check the Pod YAML after they are applied and look for the annotation openshift.io/scc

Example of a Pod admitted as restricted-v2

kind: Pod
apiVersion: v1
metadata:
 generateName: skupper-site-controller-74f654c66d-
 annotations:
   openshift.io/scc: restricted-v2
   operators.operatorframework.io/builder: operator-sdk-v1.4.0+git
   operators.operatorframework.io/project_layout: go.kubebuilder.io/v3

Example of a Pod admitted as privileged (when they are defined as requiring escalated permissions):

kind: Pod
apiVersion: v1
metadata:
 generateName: k8s-triliovault-resource-cleaner-27598110-
 annotations:
   openshift.io/scc: privileged

👩‍🏭I created a Pod in the OCP following the recommendations and the value of the annotation openshift.io/scc is anyuid and not restricted-v2? Why does it happen? What is wrong?

You will check an SCC set on the annotations of your Pod/Containers (openshift.io/scc:value) according to the Pod/Container(s) securityContext configuration and the permissions of the ServiceAccount used to create them. If you are logged as the cluster-admin you have higher permissions and you might get SCC such as non-root-v2 or anyuid. (More info)

To do this as a cluster admin, the easiest approach is to create a Deployment and ensure that the deployment's ServiceAccount does not have any special permissions. The Deployment will be created using your admin permissions, but the Pod itself will be created (and checked for admission/SCC) using the SA permissions.

However, another easier approach can be logged with an account that only has access to the restricted-v2 SCC (not cluster-admin) to create the Pod. Then, you can verify if it will be admitted and qualified as "restricted-v2"

However, If you are ONLY looking to check a manifest you can use the tool to apply it against OCP 4.11+ (i.e. oc apply -f mypod.yaml ) or locally with Kind. By using Kind you MUST label the namespace and then apply the manifest to check the WARN, see (i.e, with Kind v0.14.0):

kubectl label --overwrite ns --all pod-security.kubernetes.io/enforce-version=v1.24   pod-security.kubernetes.io/warn=restrictedkubectl apply -f test.yaml 

Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "test" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "test" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "test" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "test" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")pod/example created

Note: More info about the metrics can be found in the doc.

👮Aren’t the pipelines checking my Operator solutions against new OCP installs and for all versions where my Operator bundle will be distributed?

All pipelines only check if your Operator bundle can be installed and the deployment defined on the CSV. They are unable to check if the Operands workloads will be able to run. Note that on these tests the CRs are NOT applied and any logic called in your reconciliations can NOT be checked.

🙋‍♀️Can I use the audit logs?

By looking at the audit logs ($ oc adm must-gather -- /usr/bin/gather_audit_logs) you can find more info about what can violate the PodSecurity.

👩‍🏭I followed the recommendations to publish solutions for Openshift < 4.11 and I still checking the warnings when I apply a manifest. Did I make something wrong?

See that to publish workable solutions for Openshift < 4.11 you will need to leave the securityContext.seccompProfile empty and you will still check warns without the other specs like

Warning: would violate PodSecurity "restricted:latest": seccompProfile (pod or containers "kube-rbac-proxy", "manager" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")

👩‍🏭What happens if I use `runAsUser` in my workloads?

On Kubernetes vanilla clusters, your workload will run in a namespaced enforced as restricted unless the user is defined with a value that is NOT 0(root). However, as described in the CAVEATs above, the workload will not be accepted on OCP to be qualified for the restricted-v2. Therefore you will face issues like:

Leave the runAsUser field empty, or provide a value that falls within the specific user range for the namespace. provide a value in the ranges: [1000680000, 1000689999]

Note that the range of users is per namespace and we are unable to know beforehand a fixed value. The best option is to ensure that you can workload can run without set theRunAsUser field. So that, your workload can run as restricted on K8s clusters and be qualified to restricted-v2 SCC on Openshift.

However, if the image used does not define a USER ID you will check the following error:"container has runAsNonRoot and image will run as root … when the RunAsNonRoot field is set with a true value. Then, you can define a USER ID in the Dockerfile instead of using runAsUser in the security context, such as:

USER 65532:65532 

OR

USER 1001

If you are unable to change the image, for Openshift distributions you MUST leave the RunAsUser and RunAsNonRoot fields empty and let the Openshift SCC inject the values when your workload is qualified to the SCC policy.

mdrakiburrahman · 2023-01-01T19:54:25Z

mdrakiburrahman
Jan 1, 2023

To be compatible with 4.10 and 4.11, the seccompProfile value must be left unset (the SCC itself will default it to runtime/default so it is ok to leave it empty).

This is a bit misleading.

On OCP 4.11, seccompProfile is auto-populated to runtime/default for a Pod only when you use higher level abstractions like deployment or job. If you try to create a barebone pod without seccompProfile, it doesn't have seccompProfile auto-populated, and is denied admission.

I don't think this^ is documented anywhere, I had to figure this out via trial-and-error:

Pod didn't auto-populate seccompProfile, so it was denied admission, but when you take the same Pod's spec and wrap it in a kind: Deployment, the Pod that's created by the ReplicaSet controller is admitted, because seccompProfile is auto-populated by whatever OpenShift operator that does the auto-population.

CC - @taylorjordanNC since it looks like @camilamacedo86 is no longer with Red Hat.

2 replies

camilamacedo86 Jan 2, 2023
Author

Hi @mdrakiburrahman,

Note that is unlike solutions to create Pods directly, however, this scenario is covered above in the section: Your Operator solution might be affected in the following ways:

mdrakiburrahman Jan 2, 2023

@camilamacedo86 ah I had missed this, thanks!

mdrakiburrahman · 2023-01-03T21:29:16Z

mdrakiburrahman
Jan 3, 2023

Hey @camilamacedo86, @taylorjordanNC - a bit of a tricky question for you between 4.10 and 4.11

The problem

Our solution (a database engine) requires NET_BIND_SERVICE, because we bind to ports below <1024.

This worked fine in OpenShift 4.10, since the restricted SCC was not force-dropping ALL in securityContext.capabilities.drop. Our StatefulSet and the container inside it would come up fine, since it looks like CRI-O offers this OOTB for all containers.

But in 4.11, our container fails to come up, unless we add this to the container spec, as per your guide above:

# StatefulSet
securityContext:
 capabilities:
  add:
   - NET_BIND_SERVICE

This is good, container comes up in 4.11 as restricted-v2.

Now, the problem is, we need to backward-compat support Customers on 4.10. And in 4.10 or restricted SCC, securityContext.capabilities.add - NET_BIND_SERVICE is not allowed:

# Openshift 4.10, if you try to use capabilities
create Pod controldb-0 in StatefulSet controldb failed error: pods "controldb-0" is forbidden: spec.containers[1].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added
....

So this means:

Openshift 4.10 restricted allowed NET_BIND_SERVICE via CRI-O by default to all containers, without manifest specifying the capability.
Openshift 4.11 restricted-v2 forces container to drop ALL, which means you need to explicitly ask for NET_BIND_SERVICE
Problem is, if the product explicitly asks for NET_BIND_SERVICE, it works on 4.11 restricted-v2, but it stops working in restricted <---- This is a problem

Possible solutions

There has to be some logic in our operator to "know" it's running on 4.10 vs 4.11, and ask for NET_BIND_SERVICE only if not <4.10.

This is a very brittle solution
We'd need a good way for our operator to "programmatically know" it's running on Openshift 4.10, not sure what this is

Redevelop the product from the ground up to not use NET_BIND_SERVICE

This could be tricky for us (Database engine)

Question for you

Any other solutions you can think of, besides 1?

2 replies

camilamacedo86 Jan 4, 2023
Author

HI @mdrakiburrahman,

Then, see Now, the problem is, we need to backward-compat support Customers on 4.10. And in 4.10 or restricted SCC,

If you cannot cannot change to use Deployment for example then, you will need to create an version to target OCP < 4.10 and another to target OCP >= 4.10

There has to be some logic in our operator to "know" it's running on 4.10 vs 4.11, and ask for NET_BIND_SERVICE only if not <4.10.

The most straightforward way to ensure that your workloads can work properly is via e2e tests. You can check an example in the sample inside of operator-sdk repo to know how to develop your e2e tests. (see: https://master.sdk.operatorframework.io/docs/best-practices/pod-security-standards/#recommended-how-can-i-automate-this-check-using-e2e-testing-to-ensure-that-my-solutions-can-run-under-the-policies and look at the code: https://github.com/operator-framework/operator-sdk/tree/master/testdata/go/v3/memcached-operator/test ). Therefore, you can use it as a base and change it accordingly your scenario targeting OCP instead of k8s vanilla .

Redevelop the product from the ground up to not use NET_BIND_SERVICE

It is vey unlike to happen for many reasons.

I hope that can help you out.

mdrakiburrahman Jan 4, 2023

Thanks @camilamacedo86 - so our operator doesn't use OperatorSDK (we built it using Kubernetes C# client), but I'll take a look at the patterns you shared.

cdjohnson · 2023-01-04T15:18:46Z

cdjohnson
Jan 4, 2023

@mdrakiburrahman We've run into this as well. You simply can't have a Pod Template generically built to use NET_BIND_SERVICE that runs on both OCP 4.10 and 4.11 simultaneously if they are configured to use the out-of-box restricted and restricted-v2 policies by default (or restricted and PSA restricted).

We followed a pattern similar to your option 1. But instead of detecting what version of OCP or Kube we are running on, we instead test for the MODE of the Target namespace for the Pod. Your Operator would then test the effective security model for that namespace, answering:

Does the namespace and ServiceAccount implicitly or explicitly support NET_BIND_SERVICE?

One generic way to do this test is to create a Test Deployment using the same Service Account prior to creating your real StatefulSet and see if it starts successfully. Or just try and run your StatefulSet one way and revert it to the other way if you detect a the error.

Another way is to check the Namespace statically using various API, Resource and Label checks (a bit for fragile). I'm not aware of a simple API to accomplish this (someone should write one). A fairly narrow test that carries some assumptions with it, is to simply test if the SecurityContextConstraint API exists and if the restricted-v2 resource exists. If it does, then it's "probably" OCP 4.11+ with restricted-v2 enabled by default. However... if SCC's go away in the future (I hope they do), then you'll need to also adjust for that....

2 replies

mdrakiburrahman Jan 4, 2023

Thanks Chris.

The first method you mentioned (spin up whatever workload, see if it comes up) is pretty clever. I guess our Operator would either need to cache that empirical knowledge somewhere (after a first try, maybe in it's CR), or each reconcile would always have a "bad deploy" followed by a "good deploy". Not ideal since it's always trial-and-error, and end-users of the operator would be confused as well.

I've thought about the second method, the problem is we didn't really design our Operator from the ground up to "tweak for upcoming OpenShift API Versions", like we don't have the necessary fundamental interfaces to inject things like that at runtime. Also because our Operator supports a large number of K8s distros (Rancher, TKG etc, not just OCP), implementing this would basically be a hack in a corner of the codebase as you mentioned, because every scan for SecurityContextConstraint CR in those platforms would throw.

Thinking more about this problem, I think the true gap in OpenShift 4.10 was the fact that NET_BIND_SERVICE was blanket offered via CRI-O to all containers, which has lead to us all developing workloads that banks on that feature without specifying it in the spec.

Equivalently, if Openshift 4.11 "blanket injected" NET_BIND_SERVICE to all Pods running restricted-v2, that would be the "equivalent" user experience for Operator developers - but I guess that's not possible anymore since we're coming up on 4.12. That being said, I guess this would have been tricky for 4.11 decision makers, since there's no way to know a Pod's binary needs NET_BIND_SERVICE, until it's admitted and Kubelet is rejected at the binary execution (which is what our workloads are facing).

I think the most elegant solution would have been, if OpenShift 4.10 restricted just allowed NET_BIND_SERVICE (just like 4.11 does, I'm guessing this was added in restricted-v2 because it's so common for so many workloads), then all would be well.

cdjohnson Jan 4, 2023

@mdrakiburrahman The real problem is that restricted wasn't restricted enough (new features rolled out in Kubernetes that had non-restrictive defaults for backward compatibility). restricted-v2 aligns with the new restricted "Pod Security Standard". So this had to break at some point. They decided to break it in 4.11. More incremental runway would have been nice, I agree (e.g. restricted-v2, restricted-v3, etc).

You'll have this problem in some other K8s distro's as well, depending on how the customer chooses to implement their own Pod Security policies (PSPs, OPA, etc.). So having some sort of generic testing (my first option) is the only bullet proof way to do it: Test what I'm allowed to do, and make adjustments dynamically based on my environment.

Your operator probably needs to consider "re-testing" as Kubernetes is upgraded, new policies are introduced after your operator and operands are deployed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

📢 Pod Security Standards and Openshift changes on 4.11 and 4.12 that might affect your workloads(Operator, Operands) #1417

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 3 comments 6 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

📢 Pod Security Standards and Openshift changes on 4.11 and 4.12 that might affect your workloads(Operator, Operands) #1417

💁What are the changes?

In Openshift 4.11:

Who is affected by these changes?

What is planned for Openshift 4.12?

Who is affected by these changes?

Your Operator solution might be affected in the following ways:

🙆 So, what should I do?

Check if you are affected by the changes on 4.11:

Best Practices Moving Forward: What is recommended?

You can find code examples, tips and tools to check your operator solutions here.

⚠️ CAVEATS:

TL'DR ( detailed info/ignore if you do not need it):

FAQ

🙋‍♀️ What happens if my solution is impacted when the cluster admin upgrades the OCP cluster from 4.10 to 4.11 with it installed? Will my workload(s) fail because the restricted SCC policy is no longer granted by default?

🙋‍♀️Could you please provide the details on how can I publish my Operator bundle versions following the recommendations on Openshif?

If your solution(s) do NOT need escalated permissions: (the most common scenario)

If your solutions require escalated permissions:

🙋‍♀️When/if the Pod/Container can not run as restricted-v2 and has no access to more permissive SCC to be admitted and run, how does it fail? Could you provide an example?

💂How can I verify if my Operators or Operands are been accepted to the restrictive-v2 criteria?

👩‍🏭I created a Pod in the OCP following the recommendations and the value of the annotation openshift.io/scc is anyuid and not restricted-v2? Why does it happen? What is wrong?

👮Aren’t the pipelines checking my Operator solutions against new OCP installs and for all versions where my Operator bundle will be distributed?

🙋‍♀️Can I use the audit logs?

👩‍🏭I followed the recommendations to publish solutions for Openshift < 4.11 and I still checking the warnings when I apply a manifest. Did I make something wrong?

👩‍🏭What happens if I use runAsUser in my workloads?

Replies: 3 comments · 6 replies

camilamacedo86 Jan 2, 2023 Author

The problem

Possible solutions

Question for you

camilamacedo86 Jan 4, 2023 Author

👩‍🏭What happens if I use `runAsUser` in my workloads?

Replies: 3 comments 6 replies

camilamacedo86 Jan 2, 2023
Author

camilamacedo86 Jan 4, 2023
Author