NROP CR status is not aligned with the specs #920

shajmakh · 2024-04-10T14:54:12Z

Steps to reproduce:

Add tolerations to the spec.
Delete them and observe that the status still contains them.
However, the RTE pods are restarted with no tolerations, so this appears to be only wrong reporting in the NROP CR status.

This is not specific to tolerations; it can happen with the other config fields too.

spec:
  logLevel: Trace
  nodeGroups:
  - machineConfigPoolSelector:
      matchLabels:
        pools.operator.machineconfiguration.openshift.io/worker: ""
status:
  conditions:
  - lastTransitionTime: "2024-04-10T14:31:37Z"
    message: ""
    reason: Available
    status: "True"
    type: Available
  - lastTransitionTime: "2024-04-10T14:31:37Z"
    message: ""
    reason: Upgradeable
    status: "True"
    type: Upgradeable
  - lastTransitionTime: "2024-04-10T14:31:37Z"
    message: ""
    reason: Progressing
    status: "False"
    type: Progressing
  - lastTransitionTime: "2024-04-10T14:31:37Z"
    message: ""
    reason: Degraded
    status: "False"
    type: Degraded
  daemonsets:
  - name: numaresourcesoperator-worker
    namespace: openshift-numaresources
  machineconfigpools:
  - config:
      infoRefreshMode: Periodic
      infoRefreshPeriod: 1s
      podsFingerprinting: Enabled
      tolerations:
      - effect: NoSchedule
        key: sriov
        operator: Equal
        value: "true"
    name: worker

We have seen several cases where the operator status is not updated correctly, but it has been hard to reproduce and track down the cause (see https://issues.redhat.com/browse/OCPBUGS-16058 and https://issues.redhat.com/browse/CNF-9080).

While updating the controller tests to extend coverage of how the node group config is reflected under the status, further debugging showed that updating the operator status was skipped when two successive conditions were similar. This is wrong because the same condition can be reached with different underlying state. For example, the operator succeeds with one set of NodeGroupConfig settings and is later configured with another set that also succeeds, and we still want the status to reflect the new settings.

Fortunately, this affects only status reporting, which explains why, for example, the RTE daemonset gets updated even when the status on the NROP CR is not up to date.

Fix this by relaxing updateStatus() so that it no longer requires successive conditions to differ, which allows the status to always be updated at the end of each reconciliation loop.

candidate fix for openshift-kni#920

Signed-off-by: Shereen Haj <[email protected]>
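
The skip-on-equal-conditions behavior described above can be illustrated with a minimal Go sketch. This is not the operator's code: the types `Condition`, `NodeGroupConfig`, and `Status`, and the functions `updateStatusStrict` and `updateStatusRelaxed`, are hypothetical stand-ins, since the source does not show the real signature of updateStatus(). It only models why comparing conditions alone can leave a stale config in the status.

```go
package main

// Condition is a simplified, hypothetical stand-in for a status condition.
type Condition struct {
	Type   string
	Status string
}

// NodeGroupConfig is a trimmed, hypothetical view of the per-pool config
// reported under the status (only toleration keys are modeled here).
type NodeGroupConfig struct {
	Tolerations []string
}

// Status models the part of the CR status relevant to this issue.
type Status struct {
	Conditions []Condition
	Config     NodeGroupConfig
}

// conditionsEqual reports whether two condition lists match element by element.
func conditionsEqual(a, b []Condition) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// updateStatusStrict models the reported behavior: the write is skipped
// when the conditions did not change, even if the config did.
func updateStatusStrict(stored *Status, computed Status) bool {
	if conditionsEqual(stored.Conditions, computed.Conditions) {
		return false // status left untouched, possibly stale
	}
	*stored = computed
	return true
}

// updateStatusRelaxed models the proposed fix: always write the freshly
// computed status at the end of a reconciliation.
func updateStatusRelaxed(stored *Status, computed Status) bool {
	*stored = computed
	return true
}
```

With a stored status that is Available and lists a `sriov` toleration, and a newly computed status that is also Available but has no tolerations, the strict variant returns false and keeps the old toleration, while the relaxed variant overwrites it. A real controller would likely still want to avoid redundant API writes, for example by comparing the whole status rather than only the conditions; that is a design option, not something the source states.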
