chore: bump training-operator v1.7 -> v1.8 #162

DnPlas · 2024-05-31T11:33:26Z

This commit introduces the following changes:

* The charm now renders and applies a ValidatingWebhookConfiguration resource for training-operator CRDs.
* The charm will render the Service to also serve on port 9443 for the webhook service.
* The oci-image is updated to v1.8 of the training-operator.
* The training-operator Deployment now has a volume mount for mounting the secret that is used by the cert-controller to generate and rotate certificates for the ValidatingWebhookConfiguration.
* The training-operator Deployment will now take an argument so the webhook service can use the training-operator workload's Service instead of the default.
* Updates the examples directory with examples from kubeflow/training-operator v1.8-branch.

Fixes #159
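To make the Service and webhook wiring described above more concrete, here is a minimal sketch in plain Python dictionaries. It is not the charm's actual template: only port 9443 comes from this PR, while the resource names, the `app` label, the 8080 monitoring port, and the webhook path are assumptions for illustration.

```python
WEBHOOK_PORT = 9443  # from the PR description; everything else below is assumed


def build_service(name: str, namespace: str) -> dict:
    """Return a Service manifest exposing the webhook port next to an existing port."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "selector": {"app": name},  # hypothetical label
            "ports": [
                # Assumed pre-existing port; not stated in this PR.
                {"name": "monitoring-port", "port": 8080, "targetPort": 8080},
                # The port added for the webhook service.
                {"name": "webhook-server", "port": WEBHOOK_PORT, "targetPort": WEBHOOK_PORT},
            ],
        },
    }


def build_webhook_client_config(service_name: str, namespace: str, path: str) -> dict:
    """Return the clientConfig stanza that points a webhook at the workload's Service."""
    return {
        "service": {
            "name": service_name,
            "namespace": namespace,
            "path": path,  # hypothetical path supplied by the caller
            "port": WEBHOOK_PORT,
        }
    }


if __name__ == "__main__":
    import json

    service = build_service("training-operator", "kubeflow")
    client_config = build_webhook_client_config(
        service["metadata"]["name"], "kubeflow", "/validate-example"
    )
    print(json.dumps({"service": service, "clientConfig": client_config}, indent=2))
```

The point of the sketch is that the ValidatingWebhookConfiguration's `clientConfig.service` has to reference the same Service name and port that the workload actually serves on.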

This commit refactors the way the training-operator is deployed: instead of using a sidecar container that runs the workload, the charm now applies the Deployment and all the Kubernetes resources that the training-operator controller needs to manage training resources. We are introducing this change in preparation for the upcoming 1.8 version, which introduces a hard dependency on a Kubernetes Secret mounted in a volume for the training-operator workload to start. For more details, please refer to #159.
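As a rough illustration of the Secret-in-a-volume dependency, the sketch below builds a Deployment manifest whose container mounts a Secret. It is an assumption-laden example rather than the charm's real manifest: the volume name, the label, the image reference, and the mount path (controller-runtime's usual default certificate directory) are not taken from this PR.

```python
# Assumed mount path: the directory controller-runtime uses by default for webhook certs.
DEFAULT_CERT_DIR = "/tmp/k8s-webhook-server/serving-certs"


def build_deployment(
    name: str,
    namespace: str,
    image: str,
    secret_name: str,
    cert_dir: str = DEFAULT_CERT_DIR,
) -> dict:
    """Return a Deployment manifest that mounts `secret_name` into the workload."""
    volume_name = "cert"  # hypothetical volume name
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # The workload reads its serving certificates from here.
                            "volumeMounts": [
                                {"name": volume_name, "mountPath": cert_dir, "readOnly": True}
                            ],
                        }
                    ],
                    # The Secret must exist, or the Pod cannot start.
                    "volumes": [
                        {"name": volume_name, "secret": {"secretName": secret_name}}
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    import json

    manifest = build_deployment(
        "training-operator",
        "kubeflow",
        "example.invalid/training-operator:v1.8",  # placeholder image reference
        "training-operator-webhook-cert",  # hypothetical Secret name
    )
    print(json.dumps(manifest, indent=2))
```

Because a Pod that references a missing Secret volume stays pending, the Secret has to be applied alongside the Deployment, which a sidecar-only deployment could not express.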

This commit introduces the following changes:

* The charm now renders and applies a ValidatingWebhookConfiguration resource for training-operator CRDs.
* The charm will now patch the Service to also serve on port 9443 for the webhook service.
* The oci-image is updated to v1.8 of the training-operator.
* The training-operator Deployment now has a volume mount for mounting the secret that is used by the cert-controller to generate and rotate certificates for the ValidatingWebhookConfiguration.

Fixes #159

* pin integration test dependencies, refactor constants in tests (#155)

  Pins dependencies in the integration tests to their corresponding channels for this release.
  Ref: canonical/bundle-kubeflow#866
  Co-authored-by: Andrew Scribner <[email protected]>

* refactor: deploy the training-operator with kubernetes resources

  This commit refactors the way the training-operator is deployed: instead of using a sidecar container that runs the workload, the charm now applies the Deployment and all the Kubernetes resources that the training-operator controller needs to manage training resources. We are introducing this change in preparation for the upcoming 1.8 version, which introduces a hard dependency on a Kubernetes Secret mounted in a volume for the training-operator workload to start. For more details, please refer to #159.

orfeas-k

Thank you @DnPlas, leaving some comments plus IIUC the following changes are missing:

* image + `secretGenerator` field https://github.com/kubeflow/manifests/pull/2699/files#diff-23b73c27a3cd384ed58c680eac804ff86be38f48688cc94742c51c530e4164e8
* changes in aggregation roles https://github.com/kubeflow/manifests/pull/2699/files#diff-8b80f589b6814910a625e07ed23ff511ef76e81cf009dad169eba0f26ed04497
* changes in training-operator role https://github.com/kubeflow/manifests/pull/2699/files#diff-a88903de8c7c015c3e145b77718f9cee059aedfc24e3b0f37d246f979a4bb15b

src/charm.py

src/templates/validatingwebhookconfiguration.yaml.j2

DnPlas · 2024-06-19T16:32:27Z

> Thank you @DnPlas, leaving some comments plus IIUC the following changes are missing:
>
> * image + `secretGenerator` field https://github.com/kubeflow/manifests/pull/2699/files#diff-23b73c27a3cd384ed58c680eac804ff86be38f48688cc94742c51c530e4164e8
> * changes in aggregation roles https://github.com/kubeflow/manifests/pull/2699/files#diff-8b80f589b6814910a625e07ed23ff511ef76e81cf009dad169eba0f26ed04497
> * changes in training-operator role https://github.com/kubeflow/manifests/pull/2699/files#diff-a88903de8c7c015c3e145b77718f9cee059aedfc24e3b0f37d246f979a4bb15b

I have updated the ClusterRole and aggregated rules, as well as the image tag. The `secretGenerator`, however, is a field in the kustomization.yaml file rather than in the manifests; it is only used for rendering the Secret's `metadata.name` field.
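Since the charm renders its own Secret template (`src/templates/secret.yaml.j2`) instead of relying on kustomize, the Secret name can be passed in as a template variable. The sketch below illustrates the idea with the standard library's `string.Template` as a stand-in for Jinja2; the template body, variable names, and Secret name are assumptions, not the contents of the real template.

```python
from string import Template

# Hypothetical stand-in for src/templates/secret.yaml.j2; the real file may differ.
SECRET_TEMPLATE = Template(
    "apiVersion: v1\n"
    "kind: Secret\n"
    "metadata:\n"
    "  name: $secret_name\n"
    "  namespace: $namespace\n"
)


def render_secret(secret_name: str, namespace: str) -> str:
    """Render the Secret manifest with a caller-provided name and namespace."""
    return SECRET_TEMPLATE.substitute(secret_name=secret_name, namespace=namespace)


if __name__ == "__main__":
    # Both values are illustrative; nothing is hardcoded in the template itself.
    print(render_secret("training-operator-webhook-cert", "kubeflow"))
```

Whatever name is rendered here has to match the Secret name that the Deployment's volume references.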

DnPlas · 2024-06-19T18:53:34Z

kubeflow/training-operator#2143 could be intermittently affecting the CI.

orfeas-k

Some small comments

src/charm.py

.github/workflows/integrate.yaml

src/templates/secret.yaml.j2

DnPlas · 2024-06-26T13:38:19Z

Please review #171 before this; it should help work around the CI issue.

DnPlas · 2024-07-01T20:16:35Z

This PR is now affected by #174

NohaIhab

thx @DnPlas, almost there.
We need to update the yamls in /examples; I see some images for the examples were updated in the training-operator 1.8 branch.

NohaIhab

LGTM

…` for workload, also bumps training-operator 1.7->1.8 (#167)

* pin integration test dependencies, refactor constants in tests (#164)
* refactor: deploy the training-operator with kubernetes resources (#161)
* chore: bump training-operator v1.7 -> v1.8 (#162)
* refactor: apply a workload Service instead of using juju created one (#173)
* tests: skip test_upgrade due to #170 (#171)
* build, tests: bump charmed-kubeflow-chisme 0.4.0 -> 0.4.1 (#172)

Fixes #159

DnPlas added 5 commits May 31, 2024 12:21

skip: remove unused build-packages

7eff44d

skip: remove oci-image resource from test

2e31dbe

skip: fix lint

5b85fe5

github-actions bot added the Libraries: Out of sync label May 31, 2024

DnPlas mentioned this pull request May 31, 2024

Update training-operator manifests #159

Closed

DnPlas and others added 3 commits June 18, 2024 16:37

Merge branch 'KF-5692-training-1.8-dev-branch' into KF-5692-upgrade-trainig-operator-1.8

9d788cf

DnPlas marked this pull request as ready for review June 19, 2024 12:18

DnPlas requested a review from a team as a code owner June 19, 2024 12:18

DnPlas added 2 commits June 19, 2024 14:23

skip: fix lint

2beecdd

skip: remove hardcoded ns and change name of secret

00e46bd

orfeas-k reviewed Jun 19, 2024

src/charm.py (outdated)

src/charm.py (outdated)

src/templates/validatingwebhookconfiguration.yaml.j2

skip: update based on feedback

44f1eea

DnPlas force-pushed the KF-5692-training-1.8-dev-branch branch 2 times, most recently from 5bb0371 to 32ecee6 Compare June 19, 2024 17:07

DnPlas added 5 commits June 19, 2024 11:16

Merge branch 'KF-5692-training-1.8-dev-branch' into KF-5692-upgrade-trainig-operator-1.8

c98d016

skip: skip special char

563df72

skip: fix port and target port

1571144

skip: fix service patch

5f1849d

skip: debug

4cb8272

orfeas-k reviewed Jun 20, 2024

src/charm.py

.github/workflows/integrate.yaml

src/templates/secret.yaml.j2

DnPlas added 4 commits June 20, 2024 14:56

skip: debug

cb41aef

skip: debug

85fd364

skip: fix webhook port

5be98a5

skip: debug

22020b6

DnPlas added 13 commits June 20, 2024 22:13

Revert "skip: debug"

af85703

This reverts commit 22020b6.

Revert "skip: change limits"

e574b40

This reverts commit 4d8c656.

Revert "skip: debug add resource limits"

2afbf2f

This reverts commit 161da4b.

skip: add async

440794e

skip: add wait

8a9581b

skip: fix await

ef6e2d5

skip: fix helper method

f3776fd

skip: fix typo

de47f25

skip: skip some test cases for debugging

99e2422

skip: re-order apply

d61d61f

Revert "skip: skip some test cases for debugging"

80c0b5d

This reverts commit 99e2422.

skip: add retry on create

6ece45e

skip: add description to retry

c99b0e2

DnPlas added 4 commits June 26, 2024 19:08

skip: update 1.8.0-rc.1

2820d87

Merge branch 'KF-5692-training-1.8-dev-branch' into KF-5692-upgrade-trainig-operator-1.8

9039891

skip: add webhook port

d2b0591

skip: fix missing var

2a3d7fe

skip: edit webhook server

ce8ae3e

NohaIhab reviewed Jul 3, 2024

DnPlas added 3 commits July 3, 2024 15:53

skip: update all examples for 1.8

1fd8cd5

skip: update examples

06b1cea

skip: remove namespace from pyjob

246569e

NohaIhab approved these changes Jul 4, 2024

DnPlas merged commit 8f21f06 into KF-5692-training-1.8-dev-branch Jul 4, 2024
8 checks passed

DnPlas deleted the KF-5692-upgrade-trainig-operator-1.8 branch July 4, 2024 11:56

DnPlas mentioned this pull request Jul 4, 2024

refactor, chore: refactor charm to use Deployment for workload, also bumps training-operator 1.7->1.8 #167

Merged
