Fix add node pool macro when the definition contains GPU (#70)
* Define env before querying for tolerations on gpu driver

* Update plugin.json and changelog for release

* Update changelog

* Add private networking flag on cluster config generation

* Revert "Add private networking flag on cluster config generation"

This reverts commit ef74302.
vrutz authored May 21, 2024
1 parent 33f6e5e commit 6a45501
Showing 3 changed files with 7 additions and 3 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,8 @@
# Changelog

+## Version 1.4.1 - Bugfix release
+- Fix an issue when using the action to add a node pool with GPU
+
## Version 1.4.0 - Feature and bugfix release
- Allowing for multiple node pool definitions on cluster startup
- Adding labels and taints support for node pools
2 changes: 1 addition & 1 deletion plugin.json
@@ -1,6 +1,6 @@
{
"id": "eks-clusters",
-"version": "1.4.0",
+"version": "1.4.1",
"meta": {
"label": "EKS clusters",
"description": "Interact with Amazon Elastic Kubernetes Service clusters",
5 changes: 3 additions & 2 deletions python-lib/dku_kube/gpu_driver.py
@@ -14,6 +14,9 @@ def has_gpu_driver(kube_config_path):
return len(out.strip()) > 0

def add_gpu_driver_if_needed(cluster_id, kube_config_path, connection_info, taints):
+env = os.environ.copy()
+env['KUBECONFIG'] = kube_config_path
+
# Get the Nvidia driver plugin configuration from the repository
nvidia_config_raw = requests.get('https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/main/deployments/static/nvidia-device-plugin.yml').text
nvidia_config = yaml.safe_load(nvidia_config_raw)
@@ -48,6 +51,4 @@ def add_gpu_driver_if_needed(cluster_id, kube_config_path, connection_info, tain
logging.info('Running command to install Nvidia drivers: %s', ' '.join(cmd))
logging.info('NVIDIA GPU driver config: %s' % yaml.safe_dump(nvidia_config, default_flow_style=False))

-env = os.environ.copy()
-env['KUBECONFIG'] = kube_config_path
run_with_timeout(cmd, env=env, timeout=5)
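
The change above moves the `env` setup to the top of `add_gpu_driver_if_needed`, which the commit message describes as defining `env` before querying for tolerations. The code between the two hunks is not shown, so the following is only a minimal sketch of why that ordering matters, assuming the elided code passed `env` to an earlier command. The helper names (`make_kube_env`, `env_defined_too_late`, `env_defined_first`), the `run` callable, and the `kubectl` argument lists are hypothetical and are not taken from the plugin.

```python
import os


def make_kube_env(kube_config_path):
    # Copy the process environment so os.environ itself is left untouched,
    # then point KUBECONFIG at the cluster's config file.
    env = os.environ.copy()
    env['KUBECONFIG'] = kube_config_path
    return env


def env_defined_too_late(kube_config_path, run):
    # Hypothetical pre-fix ordering: `env` is read before it is assigned.
    # Because `env` is assigned later in this function, Python treats it as
    # a local name and the first read raises UnboundLocalError.
    run(['kubectl', 'get', 'daemonset'], env=env)
    env = make_kube_env(kube_config_path)
    run(['kubectl', 'apply', '-f', '-'], env=env)


def env_defined_first(kube_config_path, run):
    # Post-fix ordering: build `env` once, before any command needs it.
    env = make_kube_env(kube_config_path)
    run(['kubectl', 'get', 'daemonset'], env=env)
    run(['kubectl', 'apply', '-f', '-'], env=env)
```

Passing a stand-in for `run` that only records its arguments lets both orderings be exercised without a cluster or a `kubectl` binary.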
