Commit

(merge dev branch) refactor, chore: refactor charm to use `Deployment` for workload, also bumps training-operator 1.7->1.8 (#167)

* pin integration test dependencies, refactor constants in tests (#164)
* refactor: deploy the training-operator with kubernetes resources (#161)
* chore: bump training-operator v1.7 -> v1.8 (#162)
* refactor: apply a workload Service instead of using juju created one (#173)
* tests: skip test_upgrade due to #170 (#171)
* build, tests: bump charmed-kubeflow-chisme 0.4.0 -> 0.4.1 (#172)

Fixes #159
DnPlas authored Jul 9, 2024
1 parent 4c66daf commit d9197c5
Showing 23 changed files with 41,610 additions and 46,758 deletions.
16 changes: 14 additions & 2 deletions .github/workflows/integrate.yaml
@@ -75,8 +75,20 @@ jobs:
run: juju status
if: failure()

- - name: Get workload logs
- run: kubectl logs --tail 100 -ntesting -lapp.kubernetes.io/name=training-operator -ccharm
+ - name: Get validatingwebhookconfigurations
+ run: kubectl get validatingwebhookconfigurations validator.training-operator.kubeflow.org -oyaml
+ if: failure()
+
+ - name: Get secret
+ run: kubectl get secret -ntesting training-operator-webhook-cert -oyaml
+ if: failure()
+
+ - name: Describe pod
+ run: kubectl describe pod -ntesting -lapp.kubernetes.io/name=training-operator
+ if: failure()
+
+ - name: Get pods
+ run: kubectl get pods -A
if: failure()

- name: Get operator logs
1 change: 0 additions & 1 deletion charmcraft.yaml
@@ -12,4 +12,3 @@ bases:
parts:
charm:
charm-python-packages: [setuptools, pip]
- build-packages: [git]
6 changes: 6 additions & 0 deletions config.yaml
@@ -0,0 +1,6 @@
+ options:
+ training-operator-image:
+ default: kubeflow/training-operator:v1-4485b0a
+ description: |
+ Container image to be used by the training-operator workload.
+ type: string
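
Note on the `training-operator-image` option: the charm code that consumes it is not part of this excerpt, so the following is only a minimal sketch of how such a config value might feed the workload `Deployment` that the commit message mentions. The helper name `render_deployment`, the object name, the container name, and the label are assumptions for illustration, not the charm's actual implementation.

```python
from typing import Any

# Default copied from the config.yaml hunk in this commit.
DEFAULT_IMAGE = "kubeflow/training-operator:v1-4485b0a"


def render_deployment(config: dict[str, Any], namespace: str) -> dict[str, Any]:
    """Build a minimal Deployment manifest as a plain dict.

    Hypothetical helper: the real charm may render its manifests differently.
    """
    # Fall back to the documented default when the option is unset or empty.
    image = config.get("training-operator-image") or DEFAULT_IMAGE
    # Assumed label; it mirrors the selector used by the CI debug steps.
    labels = {"app.kubernetes.io/name": "training-operator"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {
            "name": "training-operator",  # assumed name
            "namespace": namespace,
            "labels": dict(labels),
        },
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {"name": "training-operator", "image": image}
                    ]
                },
            },
        },
    }
```

Calling `render_deployment({}, "testing")` would yield a manifest that uses the default image, while passing a different `training-operator-image` value swaps only the container image and leaves the rest of the manifest unchanged.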
3 changes: 2 additions & 1 deletion examples/paddlejob.yaml
@@ -1,3 +1,4 @@
+ # From https://github.com/kubeflow/training-operator/blob/v1.8-branch/examples/paddlepaddle/simple-cpu.yaml
apiVersion: "kubeflow.org/v1"
kind: PaddleJob
metadata:
@@ -21,4 +22,4 @@ spec:
ports:
- containerPort: 37777
name: master
- imagePullPolicy: Always
+ imagePullPolicy: Always
1 change: 1 addition & 0 deletions examples/pytorchjob.yaml
@@ -1,3 +1,4 @@
+ # From https://github.com/kubeflow/training-operator/blob/v1.8-branch/examples/pytorch/simple.yaml
apiVersion: "kubeflow.org/v1"
kind: PyTorchJob
metadata:
25 changes: 13 additions & 12 deletions examples/tfjob.yaml
@@ -1,17 +1,18 @@
+ # From https://github.com/kubeflow/training-operator/blob/v1.8-branch/examples/tensorflow/simple.yaml
apiVersion: "kubeflow.org/v1"
kind: TFJob
metadata:
name: tfjob-simple
spec:
- tfReplicaSpecs:
- Worker:
- replicas: 2
- restartPolicy: OnFailure
- template:
- spec:
- containers:
- - name: tensorflow
- image: gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0
- command:
- - "python"
- - "/var/tf_mnist/mnist_with_summaries.py"
+ tfReplicaSpecs:
+ Worker:
+ replicas: 2
+ restartPolicy: OnFailure
+ template:
+ spec:
+ containers:
+ - name: tensorflow
+ image: kubeflow/tf-mnist-with-summaries:latest
+ command:
+ - "python"
+ - "/var/tf_mnist/mnist_with_summaries.py"
7 changes: 4 additions & 3 deletions examples/xgboostjob.yaml
@@ -1,7 +1,8 @@
+ # From https://github.com/kubeflow/training-operator/blob/v1.8-branch/examples/xgboost/xgboostjob.yaml
apiVersion: kubeflow.org/v1
kind: XGBoostJob
metadata:
- name: xgboost-simple
+ name: xgboost-dist-iris-test-train
spec:
xgbReplicaSpecs:
Master:
@@ -11,7 +12,7 @@ spec:
spec:
containers:
- name: xgboost
- image: docker.io/merlintang/xgboost-dist-iris:1.1
+ image: docker.io/kubeflow/xgboost-dist-iris:latest
ports:
- containerPort: 9991
name: xgboostjob-port
@@ -30,7 +31,7 @@ spec:
spec:
containers:
- name: xgboost
- image: docker.io/merlintang/xgboost-dist-iris:1.1
+ image: docker.io/kubeflow/xgboost-dist-iris:latest
ports:
- containerPort: 9991
name: xgboostjob-port