
need help to install Admiralty 0.14.1 in OpenShift 4.7 #128

Open
hfwen0502 opened this issue Nov 23, 2021 · 13 comments

@hfwen0502

I am trying to explore the capabilities that Admiralty can offer on an OCP cluster provisioned on IBM Cloud. Below is the info about the OCP cluster and the cert-manager version installed there:

[root@hf-ocp-login1 ]# oc version
Client Version: 4.9.7
Server Version: 4.7.36
Kubernetes Version: v1.20.0+bbbc079

[root@hf-ocp-login1 ]# helm ls -A
NAME        	NAMESPACE   	REVISION	UPDATED                                	STATUS  	CHART              	APP VERSION
cert-manager	cert-manager	1       	2021-11-23 18:27:06.126167907 +0000 UTC	deployed	cert-manager-v1.6.1	v1.6.1     

However, when trying to install Admiralty, I encountered the error shown below:

Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: [unable to recognize "": no matches for kind "Certificate" in version "cert-manager.io/v1alpha2", unable to recognize "": no matches for kind "Issuer" in version "cert-manager.io/v1alpha2"]

Any idea how to fix this?

@adrienjt
Contributor

cert-manager 1.6 stopped serving alpha and beta APIs: https://github.com/jetstack/cert-manager/releases/tag/v1.6.0

helm template ... | cmctl convert -f - | kubectl apply -f -

instead of

helm install ...

should work. Please feel free to submit a PR to implement the conversions in the chart (so that helm install works again). We haven't upgraded to cert-manager 1.6 yet on our side, so we haven't had an urgent need for the conversion.
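
For what it's worth, a concrete sketch of that pipeline (the chart repo URL, chart/release names, namespace, and chart version below are assumptions based on a typical Admiralty 0.14.1 install, not something verified in this thread):

helm repo add admiralty https://charts.admiralty.io
helm repo update
# Render the chart locally, convert the cert-manager.io/v1alpha2 resources to
# the v1 API that cert-manager 1.6 still serves, then apply the result:
helm template admiralty admiralty/multicluster-scheduler \
  --namespace admiralty --version 0.14.1 --include-crds \
  | cmctl convert -f - \
  | kubectl apply -n admiralty -f -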

@hfwen0502
Author

@adrienjt Thanks. I also just found out how to get around the helm install issue using the "helm template" route. Things seem to be working fine now.

hfwen0502 reopened this Jan 24, 2022
@hfwen0502
Author

hfwen0502 commented Jan 24, 2022

Everything works fine out of the box on plain Kubernetes clusters. However, there are quite a few things that users need to change in order to get it working on OpenShift (e.g. clusterroles). Now I am facing an issue with the virtual node that represents the workload cluster:

oc describe node admiralty-default-ocp-eu2-1-6198a17ca3

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests  Limits
  --------           --------  ------
  cpu                0 (0%)    0 (0%)
  memory             0 (0%)    0 (0%)
  ephemeral-storage  0 (0%)    0 (0%)

Any idea why the resources (cpu/memory) on the virtual node are all 0? I am using a service account to authenticate between the target and workload clusters. It works fine on K8s but not on OpenShift.

@hfwen0502
Author

I was able to figure out how to set up a kubeconfig secret for OpenShift clusters. Everything works beautifully. Love the tool!
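
For anyone landing here with the same question, a rough sketch of the general shape of that setup, assuming a service-account-token kubeconfig (the namespace, service account, and secret/target names are placeholders, and `oc sa get-token` is deprecated on newer OpenShift releases; the exact OpenShift-specific steps are what ended up going into the docs):

# On the target (workload) cluster: mint a token for the service account that
# the source cluster should authenticate as, and wrap it in a kubeconfig.
# Names ("hfwen" namespace, "admiralty-source" service account) are placeholders.
TOKEN=$(oc sa get-token admiralty-source -n hfwen)   # newer releases use "oc create token" instead
SERVER=$(oc whoami --show-server)
oc --kubeconfig=target.kubeconfig config set-cluster target --server="$SERVER" --insecure-skip-tls-verify=true  # embed the CA cert instead, if you have it
oc --kubeconfig=target.kubeconfig config set-credentials admiralty-source --token="$TOKEN"
oc --kubeconfig=target.kubeconfig config set-context target --cluster=target --user=admiralty-source
oc --kubeconfig=target.kubeconfig config use-context target

# On the source cluster: store the kubeconfig as a secret (key "config") in the
# same namespace as the Target object that references it.
oc -n hfwen create secret generic ocp-eu2-1 --from-file=config=target.kubeconfig

The Target then points at that secret, along the lines of:

apiVersion: multicluster.admiralty.io/v1alpha1
kind: Target
metadata:
  name: ocp-eu2-1
  namespace: hfwen
spec:
  kubeconfigSecret:
    name: ocp-eu2-1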

@adrienjt
Contributor

Hi @hfwen0502, I'm glad you were able to figure this out. Would you care to contribute how to set up a kubeconfig secret for OpenShift clusters to the Admiralty documentation? (PR under docs/)

@hfwen0502
Author

Of course, I would be happy to contribute the documentation. Can the platform be based on the IKS and ROKS services on IBM Cloud? I am working in the hybrid cloud organization in IBM Research.
By the way, RBAC also needs to be adjusted on OpenShift:

oc edit clusterrole admiralty-multicluster-scheduler-source

- apiGroups:
  - ""
  resources:
  - pods
  # add the line below
  - pods/finalizers
  verbs:
  - list
  # add the line below
  - '*'
- apiGroups:
  - multicluster.admiralty.io
  resources:
  - podchaperons
  # add the three lines below
  - podchaperons/finalizers
  - sources
  - sources/finalizers
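
A quick way to check that the edited role actually grants what the controllers need is to impersonate the bound service account and ask the API server directly (the namespace and service account below are placeholders; use whatever subject your clusterrolebinding for admiralty-multicluster-scheduler-source actually lists):

# Find the subject bound to the source ClusterRole, then impersonate it:
oc get clusterrolebinding -o wide | grep multicluster-scheduler-source
oc auth can-i update pods/finalizers -n hfwen \
  --as=system:serviceaccount:<namespace>:<service-account>
oc auth can-i list podchaperons.multicluster.admiralty.io -n hfwen \
  --as=system:serviceaccount:<namespace>:<service-account>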

@adrienjt
Contributor

Can the platform be based on the IKS and ROKS services on IBM Cloud?

Yes, no problem.

By the way, RBAC needs to be adjusted as well on OpenShift.

Could you contribute the RBAC changes to the Helm chart?

@hfwen0502
Author

A PR has been submitted that includes both the RBAC and doc changes: #134

@hfwen0502
Author

Things only work in the default namespace on OpenShift. There are issues related to SCCs when we set up Admiralty in a non-default namespace. Errors are shown below:

E0128 20:33:01.214968       1 controller.go:117] error syncing 'hfwen/test-job-hscvc-ms6r7': pods "test-job-hscvc-ms6r7" 
is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user 
or serviceaccount, provider restricted: .spec.securityContext.fsGroup: Invalid value: []int64{1000720000}: 1000720000 is
not an allowed group, provider restricted: .spec.securityContext.seLinuxOptions.level: Invalid value: "s0:c27,c9": must be s0:c26,c25,
spec.containers[0].securityContext.runAsUser: Invalid value: 1000720000: must be in the ranges: [1000700000, 1000709999], 
spec.containers[0].securityContext.seLinuxOptions.level: Invalid value: "s0:c27,c9": must be s0:c26,c25, provider 
"ibm-restricted-scc": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by 
user or serviceaccount, provider "ibm-anyuid-scc": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": 
Forbidden: not usable by user or serviceaccount, provider "ibm-anyuid-hostpath-scc": Forbidden: not usable by user or serviceaccount,
provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden:
not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider 
"ibm-anyuid-hostaccess-scc": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by 
user or serviceaccount, provider "ibm-privileged-scc": Forbidden: not usable by user or serviceaccount, provider "privileged": 
Forbidden: not usable by user or serviceaccount], requeuing

@adrienjt
Contributor

when we set up Admiralty in the non-default namespace

Do you mean when Admiralty is installed in a non-default namespace, and/or when Sources/Targets are set up (and pods created) in a non-default namespace?

Which SCC are you expecting to apply? restricted (the only one allowed, but not passing) or something else? If restricted, have you tried configuring your test job's security context to make it pass the policy? If something else, have you tried allowing the SCC for the pod's service account in that namespace?

@hfwen0502
Author

@adrienjt Sorry, I should have made myself clear. Admiralty is always installed in the Admiralty namespace. The SCC issue occurs when sources/targets are set up in a non-default namespace. Let's assume sources/targets are in the hfwen namespace. In the annotations of the proxy pod at the source, we have the following:

* Source proxy pod
Annotations:    multicluster.admiralty.io/elect:
                multicluster.admiralty.io/sourcepod-manifest:
                  apiVersion: v1
                  kind: Pod
                  spec:
                    containers:
                      securityContext:
                        capabilities:
                          drop:
                          - KILL
                          - MKNOD
                          - SETGID
                          - SETUID
                        runAsUser: 1000690000
                    securityContext:
                      fsGroup: 1000680000
                      seLinuxOptions:
                        level: s0:c26,c15

On the target cluster, the PodChaperon object has this:

oc get podchaperons hf1-job-tvlrx-p7sp2 -o yaml
apiVersion: multicluster.admiralty.io/v1alpha1
kind: PodChaperon
spec:
  containers:
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1000680000
  securityContext:
    fsGroup: 1000680000
    seLinuxOptions:
      level: s0:c26,c15

This is a problem because the SCC on the target cluster actually expects the following for the hfwen namespace:

  securityContext:
    fsGroup: 1000780000 <= something within the range [1000700000, 1000709999]
    seLinuxOptions:
      level: s0:c26,c15 <= should be s0:c26,c25

Any idea how to resolve this? When sources/targets are in the default namespace, the securityContext stays empty, which is why we did not hit this problem there. I have also tried adjusting the SCC on the service account, which did not work.
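
One thing that makes the mismatch easy to see: the ranges the restricted SCC enforces come from annotations that OpenShift sets on each namespace, so comparing the source and target namespaces shows exactly where the rejected values come from (the values shown are just the ones implied by the error message above):

# On the target cluster:
oc get namespace hfwen -o yaml | grep 'sa.scc'
#   openshift.io/sa.scc.mcs: s0:c26,c25
#   openshift.io/sa.scc.supplemental-groups: 1000700000/10000
#   openshift.io/sa.scc.uid-range: 1000700000/10000
# Running the same command against the source cluster's hfwen namespace shows a
# different range, which is where the rejected fsGroup/seLinux values originate.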

hfwen0502 reopened this Feb 1, 2022
@hfwen0502
Author

On OpenShift, every namespace comes with 3 service accounts by default:

NAME       SECRETS   AGE
builder    2         44m
default    2         44m
deployer   2         44m

Adding the privileged SCC to the default service account in my hfwen namespace (on both the source and target clusters) seems to fix the SCC issue.

oc adm policy add-scc-to-user privileged -z default -n hfwen

@adrienjt Is this what you had in mind? Is it good practice, or the only way to resolve it?

@hfwen0502
Author

OK, I found a better solution. The OpenShift clusters on IBM Cloud come with other preconfigured SCCs. We can use a less-privileged one instead of privileged.
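
For example (which SCC is sufficient depends on what the delegated pods actually need; anyuid and the IBM-preinstalled ibm-anyuid-scc from the error log above are illustrative choices, not a maintainer recommendation):

# Grant a less-privileged SCC to the default service account in the
# source/target namespace, and drop the earlier privileged grant:
oc adm policy add-scc-to-user anyuid -z default -n hfwen          # stock OpenShift SCC
oc adm policy add-scc-to-user ibm-anyuid-scc -z default -n hfwen  # SCC preinstalled on IBM Cloud clusters
oc adm policy remove-scc-from-user privileged -z default -n hfwen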
