-
Notifications
You must be signed in to change notification settings - Fork 288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
move nvidia-ctk hook command into own binary #474
Conversation
I also am concerned that I missed some of the tests, or that there is a config property that needs to be added. |
ab99f33
to
41cc661
Compare
@deitch thanks for the contribution. I'll have a look at it early next week. |
cmd/nvidia-cdi-hook/README.md
Outdated
* `create-symlinks` - Create symlinks inside the directory path to be mounted into a container. | ||
* `update-ldcache` - Update the dynamic linker cache inside the directory path to be mounted into a container. | ||
|
||
The CDI itself is created separately, normally using the `nvidia-ctk cdi` command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you document how to actually use the hook? Is it triggered automatically or users need to configure something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand. It doesn't change from current behaviour. It's inserted into the cdi yaml when you run nvidia-ctk cdi generate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, can you document the current behavior?
For me it does not work by default though, nvidia-ctk cdi generate
creates a CDI spec where nvidia-ctk
is used only in hooks:
sections, but it is missing in the mounts:
section. Then e.g. when I run podman run --rm -it --gpus all nvidia/cuda:12.4.1-devel-ubuntu22.04
, /usr/bin/nvidia-ctk
is not available in the container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, can you document the current behavior?
Sure, on its way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me it does not work by default though, nvidia-ctk cdi generate creates a CDI spec where nvidia-ctk is used only in hooks: sections, but it is missing in the mounts: section. Then e.g. when I run podman run --rm -it --gpus all nvidia/cuda:12.4.1-devel-ubuntu22.04, /usr/bin/nvidia-ctk is not available in the container.
Do you mind posting your CDI here? I have 2 Jetsons, but unfortunately no access to them for the next week, so I have nothing ti generate CDIs on, in order to tests it. I also don't have remote access to anything with proper GPUs, so I am stuck working this through here without proper testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I see the same behavior on the main branch, but not on the release-1.15 branch. I've compiled the same way from all 3 branches so the issue does not seem related to compiler flags etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting. So I didn't introduce the bug.
If you've compiled it from main, do you mind generating the CDI yaml from main, so we can compare? It doesn't get us all of the way, but it helps.
Hopefully @klueska or one of the other maintainers will know what we need to do to complete the devices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @lahwaacz . Those are identical, except for the executable. That is a good sign.
It means that whatever introduced the issue with the busId
, is in main, and was not introduced by this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick check of main
vs v1.15.0
, there is quite a bit in there that I do not understand. I think we are stuck until someone who gets how this works explains how to fix it in main (and we can apply here).
41cc661
to
bd2f3cc
Compare
Following up on this. Can someone help us understand what is different in |
I was able to reproduce the issue regarding the bus information for the device being empty for |
Excellent, thank you @elezar . It looks like that already is in, so I should be able to rebase on main and it will work? |
61554a6
to
af030d0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deitch I think I'm ok with the mechanics of this. The one question I have is how to support the transition from the use of nvidia-ctk hook
to nvidia-cdi-hook
.
Note that we would also need to include the new executable in the deb and rpm package definitions.
Name: "nvidia-ctk-path", | ||
Usage: "Specify the path to use for the nvidia-ctk in the generated CDI specification. If this is left empty, the path will be searched.", | ||
Destination: &opts.nvidiaCTKPath, | ||
Name: "nvidia-cdi-hook-path", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is a breaking change. Does it make sense to add an Alias
here instead and then add logic somewhere(TM) for determining whether the nvidia-cdi-hook
or nvidia-ctk
is used to know how we will generate the CDI spec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
breaking change
Do you mean that someone who might have some command process (human or scripted) that called this with --nvidia-ctk-path
will now break, because that option doesn't exist?
I would think (cautiously, and probably wrongly) that it isn't that heavily used an option, and if you install a new version of the toolkit, it is ok for this to break.
If we do not want to allow that, then, sure, I can add an Alias
. Your call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I know that this may not be the most used option, but I would prefer to either add an alias or add a new option -- possibly marking the old one as deprecated. Given the discussion around a hook constructor below, maybe a separate option is better -- caling out in the usage text that this is intended to replace the nvidia-ctk
command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Will add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of behaviour do we want for that? The deprecation is easy to add to the comment:
&cli.StringFlag{
Name: "nvidia-ctk-path",
Usage: "Specify the path to use for the nvidia-ctk in the generated CDI specification. If this is left empty, the path will be searched. DEPRECATED. Use flag --nvidia-cdi-hook-path instead.",
},
but when someone uses it, should we:
- treat it like they called
--nvidia-cdi-hook-path
and set that? I don't much like it, as they are distinct binaries. - do nothing with it? That doesn't seem to add value.
- save it to somewhere and use it? But we don't use that anywhere.
🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this ASSUMES that users have not renamed their nvidia-ctk executable for the purpose of injecting the hook
If a user does that, they only have themselves to blame. You always can break something if you try hard enough. If they replace /usr/bin/containerd
with a symlink to /usr/sbin/halt
, I don't think we can help them 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would include logic to handle both the nvidia-cdi-hook and nvidia-ctk case from the perspective of CDI spec generation. This means that we can add an aliases --nvidia-cdi-hook-path and --nvidia-ctk-path and have them treated consistently
So
- if the user uses
--nvidia-ctk-path
, then it would injectnvidia-ctk hook
(2 words) everywhere in CDI - if the user uses
--nvidia-cdi-hook-path
, then it would injectnvidia-cdi-hook
(1 word) everywhere in CDI - if the user uses both, it is an error
- if the user uses neither, it uses
nvidia-cdi-hook
as the default
If I understood you correctly, sure, that works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adopted your suggestions right in. Once I saw it inside, totally made sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything else we need for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does:
diff --git a/cmd/nvidia-ctk/cdi/generate/generate.go b/cmd/nvidia-ctk/cdi/generate/generate.go
index fba22301..9f9e994b 100644
--- a/cmd/nvidia-ctk/cdi/generate/generate.go
+++ b/cmd/nvidia-ctk/cdi/generate/generate.go
@@ -132,13 +132,11 @@ func (m command) build() *cli.Command {
Destination: &opts.librarySearchPaths,
},
&cli.StringFlag{
- Name: "nvidia-cdi-hook-path",
- Usage: "Specify the path to use for the nvidia-cdi-hook in the generated CDI specification. If this is left empty, the path will be searched.",
- Destination: &opts.nvidiaCDIHookPath,
- },
- &cli.StringFlag{
- Name: "nvidia-ctk-path",
- Usage: "Specify the path to use for the nvidia-ctk in the generated CDI specification. If this is left empty, the path will be searched. DEPRECATED. Use flag --nvidia-cdi-hook-path instead.",
+ Name: "nvidia-cdi-hook-path",
+ Aliases: []string{"nvidia-ctk-path"},
+ Usage: "Specify the path to use for the nvidia-cdi-hook in the generated CDI specification. " +
+ "If not specified, the PATH will be searched for `nvidia-cdi-hook`. " +
+ "NOTE: That if this is specified as `nvidia-ctk`, the PATH will be searched for `nvidia-ctk` instead.",
Destination: &opts.nvidiaCDIHookPath,
},
&cli.StringFlag{
make sense?
Is there what to do for this? Since they are packaged together, if you install and run a newer version of a package with ctk, it installs
Is that in here? |
Yes. That's the location where the executable should be added (to the |
af030d0
to
17a05d2
Compare
17a05d2
to
7f2cf83
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @deitch. It's looking much better already.
There are some minor changes required.
cmd/nvidia-cdi-hook/README.md
Outdated
[`nvidia-ctk cdi`](../nvidia-ctk/) command. | ||
|
||
When `nvidia-ctk cdi generate` is run, the CDI specification is generated as a yaml file. | ||
The CDI file provides instructions for a container runtime to set up devices, files and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CDI file provides instructions for a container runtime to set up devices, files and | |
The CDI specification provides instructions for a container runtime to set up devices, files and |
Name: "nvidia-ctk-path", | ||
Usage: "Specify the path to use for the nvidia-ctk in the generated CDI specification. If this is left empty, the path will be searched.", | ||
Destination: &opts.nvidiaCTKPath, | ||
Name: "nvidia-cdi-hook-path", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does:
diff --git a/cmd/nvidia-ctk/cdi/generate/generate.go b/cmd/nvidia-ctk/cdi/generate/generate.go
index fba22301..9f9e994b 100644
--- a/cmd/nvidia-ctk/cdi/generate/generate.go
+++ b/cmd/nvidia-ctk/cdi/generate/generate.go
@@ -132,13 +132,11 @@ func (m command) build() *cli.Command {
Destination: &opts.librarySearchPaths,
},
&cli.StringFlag{
- Name: "nvidia-cdi-hook-path",
- Usage: "Specify the path to use for the nvidia-cdi-hook in the generated CDI specification. If this is left empty, the path will be searched.",
- Destination: &opts.nvidiaCDIHookPath,
- },
- &cli.StringFlag{
- Name: "nvidia-ctk-path",
- Usage: "Specify the path to use for the nvidia-ctk in the generated CDI specification. If this is left empty, the path will be searched. DEPRECATED. Use flag --nvidia-cdi-hook-path instead.",
+ Name: "nvidia-cdi-hook-path",
+ Aliases: []string{"nvidia-ctk-path"},
+ Usage: "Specify the path to use for the nvidia-cdi-hook in the generated CDI specification. " +
+ "If not specified, the PATH will be searched for `nvidia-cdi-hook`. " +
+ "NOTE: That if this is specified as `nvidia-ctk`, the PATH will be searched for `nvidia-ctk` instead.",
Destination: &opts.nvidiaCDIHookPath,
},
&cli.StringFlag{
make sense?
@@ -49,6 +49,7 @@ type nvcdilib struct { | |||
driverRoot string | |||
devRoot string | |||
nvidiaCTKPath string | |||
nvidiaCDIHookPath string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're effectively replacing nvidiaCTKPath
with this member, I would recommend applying the following diff:
diff --git a/internal/modifier/cdi.go b/internal/modifier/cdi.go
index c53cb996..c5af4f88 100644
--- a/internal/modifier/cdi.go
+++ b/internal/modifier/cdi.go
@@ -185,7 +185,7 @@ func newAutomaticCDISpecModifier(logger logger.Interface, cfg *config.Config, de
func generateAutomaticCDISpec(logger logger.Interface, cfg *config.Config, devices []string) (spec.Interface, error) {
cdilib, err := nvcdi.New(
nvcdi.WithLogger(logger),
- nvcdi.WithNVIDIACTKPath(cfg.NVIDIACTKConfig.Path),
+ nvcdi.WithNVIDIACDIHookPath(cfg.NVIDIACTKConfig.Path),
nvcdi.WithDriverRoot(cfg.NVIDIAContainerCLIConfig.Root),
nvcdi.WithVendor("runtime.nvidia.com"),
nvcdi.WithClass("gpu"),
diff --git a/internal/modifier/csv.go b/internal/modifier/csv.go
index 54304428..0905d5da 100644
--- a/internal/modifier/csv.go
+++ b/internal/modifier/csv.go
@@ -62,7 +62,7 @@ func NewCSVModifier(logger logger.Interface, cfg *config.Config, image image.CUD
cdilib, err := nvcdi.New(
nvcdi.WithLogger(logger),
nvcdi.WithDriverRoot(cfg.NVIDIAContainerCLIConfig.Root),
- nvcdi.WithNVIDIACTKPath(cfg.NVIDIACTKConfig.Path),
+ nvcdi.WithNVIDIACDIHookPath(cfg.NVIDIACTKConfig.Path),
nvcdi.WithMode(nvcdi.ModeCSV),
nvcdi.WithCSVFiles(csvFiles),
)
diff --git a/internal/platform-support/tegra/csv_test.go b/internal/platform-support/tegra/csv_test.go
index 2e4aca34..2e8e42fe 100644
--- a/internal/platform-support/tegra/csv_test.go
+++ b/internal/platform-support/tegra/csv_test.go
@@ -187,7 +187,6 @@ func TestDiscovererFromCSVFiles(t *testing.T) {
o := tegraOptions{
logger: logger,
- nvidiaCTKPath: "/usr/bin/nvidia-ctk",
nvidiaCDIHookPath: "/usr/bin/nvidia-cdi-hook",
csvFiles: []string{"dummy"},
ignorePatterns: tc.ignorePatterns,
diff --git a/internal/platform-support/tegra/tegra.go b/internal/platform-support/tegra/tegra.go
index 597d8b16..1031fc72 100644
--- a/internal/platform-support/tegra/tegra.go
+++ b/internal/platform-support/tegra/tegra.go
@@ -30,7 +30,6 @@ type tegraOptions struct {
csvFiles []string
driverRoot string
devRoot string
- nvidiaCTKPath string
nvidiaCDIHookPath string
ldconfigPath string
librarySearchPaths []string
@@ -134,13 +133,6 @@ func WithCSVFiles(csvFiles []string) Option {
}
}
-// WithNVIDIACTKPath sets the path to the nvidia-container-toolkit binary.
-func WithNVIDIACTKPath(nvidiaCTKPath string) Option {
- return func(o *tegraOptions) {
- o.nvidiaCTKPath = nvidiaCTKPath
- }
-}
-
// WithNVIDIACDIHookPath sets the path to the nvidia-cdi-hook binary.
func WithNVIDIACDIHookPath(nvidiaCDIHookPath string) Option {
return func(o *tegraOptions) {
diff --git a/pkg/nvcdi/lib-csv.go b/pkg/nvcdi/lib-csv.go
index f6f652ba..649b801a 100644
--- a/pkg/nvcdi/lib-csv.go
+++ b/pkg/nvcdi/lib-csv.go
@@ -44,7 +44,6 @@ func (l *csvlib) GetAllDeviceSpecs() ([]specs.Device, error) {
tegra.WithLogger(l.logger),
tegra.WithDriverRoot(l.driverRoot),
tegra.WithDevRoot(l.devRoot),
- tegra.WithNVIDIACTKPath(l.nvidiaCTKPath),
tegra.WithNVIDIACDIHookPath(l.nvidiaCDIHookPath),
tegra.WithLdconfigPath(l.ldconfigPath),
tegra.WithCSVFiles(l.csvFiles),
diff --git a/pkg/nvcdi/lib.go b/pkg/nvcdi/lib.go
index 31e08e82..c56c18e1 100644
--- a/pkg/nvcdi/lib.go
+++ b/pkg/nvcdi/lib.go
@@ -48,7 +48,6 @@ type nvcdilib struct {
deviceNamers DeviceNamers
driverRoot string
devRoot string
- nvidiaCTKPath string
nvidiaCDIHookPath string
ldconfigPath string
configSearchPaths []string
@@ -88,9 +87,6 @@ func New(opts ...Option) (Interface, error) {
if l.devRoot == "" {
l.devRoot = l.driverRoot
}
- if l.nvidiaCTKPath == "" {
- l.nvidiaCTKPath = "/usr/bin/nvidia-ctk"
- }
if l.nvidiaCDIHookPath == "" {
l.nvidiaCDIHookPath = "/usr/bin/nvidia-cdi-hook"
}
diff --git a/pkg/nvcdi/options.go b/pkg/nvcdi/options.go
index a90450f6..5a490619 100644
--- a/pkg/nvcdi/options.go
+++ b/pkg/nvcdi/options.go
@@ -63,10 +63,10 @@ func WithLogger(logger logger.Interface) Option {
}
// WithNVIDIACTKPath sets the path to the NVIDIA Container Toolkit CLI path for the library
+//
+// Deprecated: Use WithNVIDIACDIHookPath instead.
func WithNVIDIACTKPath(path string) Option {
- return func(l *nvcdilib) {
- l.nvidiaCTKPath = path
- }
+ return WithNVIDIACDIHookPath(path)
}
// WithNVIDIACDIHookPath sets the path to the NVIDIA Container Toolkit CLI path for the library
Which switches to using nvidiaCDIHookPath
everywhere -- except where we read nvidia-ctk.path
config optionand pass that to the constructor. We also mark the
nvcdi.WithNVIDIACTKPath` option as deprecated (although we could also not do so and just set the member).
The internal usage in the tegra
package has also been changed to directly refer to nvidiaCDIHookPath
instead. I don't think we need to deprecat that option since we don't consume this in any of our othe components.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense?
Sure, a clean alias, and an explanation. Added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
9fc3845
to
cdc878b
Compare
All comments and suggestions added; thanks for them @elezar . Back to you. |
Is anything else needed for this @elezar ? |
cdc878b
to
bad12a5
Compare
Was a golang unit test error. Fixed. |
bad12a5
to
a647926
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to the comment below, we also need:
diff --git a/internal/discover/ldconfig_test.go b/internal/discover/ldconfig_test.go
index 67149845..0b214c77 100644
--- a/internal/discover/ldconfig_test.go
+++ b/internal/discover/ldconfig_test.go
@@ -25,8 +25,8 @@ import (
)
const (
- testNvidiaCDIPath = "/foo/bar/nvidia-cdi-path"
- testLdconfigPath = "/bar/baz/ldconfig"
+ testNvidiaCDIHookPath = "/foo/bar/nvidia-cdi-hook"
+ testLdconfigPath = "/bar/baz/ldconfig"
)
func TestLDCacheUpdateHook(t *testing.T) {
@@ -92,12 +92,12 @@ func TestLDCacheUpdateHook(t *testing.T) {
},
}
expectedHook := Hook{
- Path: testNvidiaCDIPath,
+ Path: testNvidiaCDIHookPath,
Args: tc.expectedArgs,
Lifecycle: "createContainer",
}
- d, err := NewLDCacheUpdateHook(logger, mountMock, testNvidiaCDIPath, tc.ldconfigPath)
+ d, err := NewLDCacheUpdateHook(logger, mountMock, testNvidiaCDIHookPath, tc.ldconfigPath)
require.NoError(t, err)
hooks, err := d.Hooks()
@@ -186,6 +187,22 @@ func ResolveNVIDIACTKPath(logger logger.Interface, nvidiaCTKPath string) string | |||
) | |||
} | |||
|
|||
// ResolveNVIDIACDIHookPath resolves the path to the nvidia-cdi-hook binary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this and ResolveNVIDIACTKPath
above again, what do you think of rewriting this as:
// ResolveNVIDIACTKPath resolves the path to the nvidia-ctk binary.
// This executable is used in hooks and needs to be an absolute path.
// If the path is specified as an absolute path, it is used directly
// without checking for existence of an executable at that path.
//
// Deprecated: Use ResolveNVIDIACDIHookPath directly instead.
func ResolveNVIDIACTKPath(logger logger.Interface, nvidiaCTKPath string) string {
return resolveWithDefault(
logger,
"NVIDIA Container Toolkit CLI",
nvidiaCTKPath,
nvidiaCTKDefaultFilePath,
)
}
// ResolveNVIDIACDIHookPath resolves the path to the nvidia-cdi-hook binary.
// This executable is used in hooks and needs to be an absolute path.
// If the path is specified as an absolute path, it is used directly
// without checking for existence of an executable at that path.
func ResolveNVIDIACDIHookPath(logger logger.Interface, nvidiaCDIHookPath string) string {
if filepath.Base(nvidiaCDIHookPath) == "nvidia-ctk" {
return resolveWithDefault(
logger,
"NVIDIA Container Toolkit CLI",
nvidiaCDIHookPath,
nvidiaCTKDefaultFilePath,
)
}
return resolveWithDefault(
logger,
"NVIDIA CDI Hook CLI",
nvidiaCDIHookPath,
nvidiaCDIHookDefaultFilePath,
)
}
and then adding:
//nolint:staticcheck // TODO(elezar): We should swith the nvidia-container-runtime from using nvidia-ctk to using nvidia-cdi-hook.
cfg.NVIDIACTKConfig.Path = config.ResolveNVIDIACTKPath(r.logger, cfg.NVIDIACTKConfig.Path)
where we call ResolveNVIDIACTKPath
?
We can work on properly handling the config changes for the runtime in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it. Change made and pushed.
a647926
to
940579d
Compare
Yup already working on that one. Since it causes CI to fail, I get to see it. :-) |
940579d
to
764209a
Compare
// This executable is used in hooks and needs to be an absolute path. | ||
// If the path is specified as an absolute path, it is used directly | ||
// without checking for existence of an executable at that path. | ||
func ResolveNVIDIACDIHookPath(logger logger.Interface, nvidiaCDIHookPath string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In responding to cncf-tags/container-device-interface#203 (comment) I had the thought that the concept of a CDI Hook executable could be standardized so that the nvidia-cdi-hook
is a vendor-specific implementation. Does that mean that we should use cdiHookPath
instead of using nvidiaCDIHookPath
for our variables and config values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it matters all that much. Until such time as that actually happens, might as well get this in and iterate on it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. It can definitely be done as a follow-up.
764209a
to
0b65105
Compare
@elezar do you mind changing deitch#1 to be a PR against the branch I used for this PR separate-hook-binary rather than against |
@deitch updated. |
Merged them in. Should probably kick off CI again here. |
This change creates an nvidia-cdi-hook binary for implementing CDI hooks. This allows for these hooks to be separated from the nvidia-ctk command which may, for example, require libnvidia-ml to support other functionality. The nvidia-ctk hook subcommand is maintained as an alias for the time being to allow for existing CDI specifications referring to this path to work as expected. Signed-off-by: Avi Deitcher <[email protected]>
667881d
to
179d865
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work on this @deitch.
I think this is cleaner than having the hooks as a subcommand of the nvidia-ctk -- especially given the discussion around cncf-tags/container-device-interface#203
My pleasure. I am glad it worked out. |
Fixes #435
This does two things.
First: creates a new binary called
nvidia-cdi-hook
, as suggested by @klueska in #435. For backwards compatibility and ease of migration, I kept the functionality innvidia-ctk
as well as innvidia-cdi-hook
, so the following are equal:To avoid duplication, all of the command code that was under
cmd/nvidia-ctk/hook/{create-symlinks,update-ldcache,chmod}/
was moved tocmd/nvidia-cdi-hook/
, andcmd/nvidia-ctk/hook/hook.go
references those. This keeps the same functionality in place fornvidia-ctk
, while avoiding duplication of code.Second, I changed references in
nvidia-ctk cdi generate
from putting hooks that callnvidia-ctk hook {subcommand}
withnvidia-cdi-generate {subcommand}
. I am not convinced I got it correctly, and kept properties that referencenvidiaCTKPath
in addition tonvidiaCDIHookPath
, just in case.There are 16 places left where
nvidiaCTKPath
remains in the code, almost all of them properties or setting options, which means they probably could be removed.As far as I could tell, and I am not positive, references to
nvidia-ctk
inside the code are used only for creating hooks. But I am not 100% sure, hence leaving them around. Looking forward to comments.