Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option in toolkit container to enable CDI in runtime #838

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 2 additions & 19 deletions cmd/nvidia-ctk/runtime/configure/configure.go
Original file line number Diff line number Diff line change
Expand Up @@ -292,9 +292,8 @@ func (m command) configureConfigFile(c *cli.Context, config *config) error {
return fmt.Errorf("unable to update config: %v", err)
}

err = enableCDI(config, cfg)
if err != nil {
return fmt.Errorf("failed to enable CDI in %s: %w", config.runtime, err)
if config.cdi.enabled {
cfg.EnableCDI()
}

outputPath := config.getOutputConfigPath()
Expand Down Expand Up @@ -354,19 +353,3 @@ func (m *command) configureOCIHook(c *cli.Context, config *config) error {
}
return nil
}

// enableCDI enables the use of CDI in the corresponding container engine
func enableCDI(config *config, cfg engine.Interface) error {
if !config.cdi.enabled {
return nil
}
switch config.runtime {
case "containerd":
cfg.Set("enable_cdi", true)
case "docker":
cfg.Set("features", map[string]bool{"cdi": true})
default:
return fmt.Errorf("enabling CDI in %s is not supported", config.runtime)
}
return nil
}
1 change: 1 addition & 0 deletions pkg/config/engine/api.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ type Interface interface {
RemoveRuntime(string) error
Save(string) (int64, error)
GetRuntimeConfig(string) (RuntimeConfig, error)
EnableCDI()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Set is only used to enable CDI. Let's remove that from the interface if that's the case.

}

// RuntimeConfig defines the interface to query container runtime handler configuration
Expand Down
4 changes: 4 additions & 0 deletions pkg/config/engine/containerd/config_v1.go
Original file line number Diff line number Diff line change
Expand Up @@ -163,3 +163,7 @@ func (c *ConfigV1) GetRuntimeConfig(name string) (engine.RuntimeConfig, error) {
tree: runtimeData,
}, nil
}

func (c *ConfigV1) EnableCDI() {
c.Set("enable_cdi", true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be able to cast c to a Config type and call EnableCDI here instead of reimplementing it.

}
4 changes: 4 additions & 0 deletions pkg/config/engine/containerd/containerd.go
Original file line number Diff line number Diff line change
Expand Up @@ -126,6 +126,10 @@ func (c *Config) GetRuntimeConfig(name string) (engine.RuntimeConfig, error) {
}, nil
}

func (c *Config) EnableCDI() {
c.Set("enable_cdi", true)
}

// CommandLineSource returns the CLI-based containerd config loader
func CommandLineSource(hostRoot string) toml.Loader {
return toml.FromCommandLine(chrootIfRequired(hostRoot, "containerd", "config", "dump")...)
Expand Down
3 changes: 3 additions & 0 deletions pkg/config/engine/crio/crio.go
Original file line number Diff line number Diff line change
Expand Up @@ -153,6 +153,9 @@ func (c *Config) GetRuntimeConfig(name string) (engine.RuntimeConfig, error) {
}, nil
}

// no-op since CDI is always enabled in versions where CDI is supported
func (c *Config) EnableCDI() {}

// CommandLineSource returns the CLI-based crio config loader
func CommandLineSource(hostRoot string) toml.Loader {
return toml.LoadFirst(
Expand Down
4 changes: 4 additions & 0 deletions pkg/config/engine/docker/docker.go
Original file line number Diff line number Diff line change
Expand Up @@ -166,3 +166,7 @@ func (c *Config) GetRuntimeConfig(name string) (engine.RuntimeConfig, error) {
}
return &dockerRuntime{}, nil
}

func (c *Config) EnableCDI() {
c.Set("features", map[string]bool{"cdi": true})
}
19 changes: 12 additions & 7 deletions tools/container/container.go
Original file line number Diff line number Diff line change
Expand Up @@ -36,13 +36,14 @@ const (

// Options defines the shared options for the CLIs to configure containers runtimes.
type Options struct {
Config string
Socket string
RuntimeName string
RuntimeDir string
SetAsDefault bool
RestartMode string
HostRootMount string
Config string
Socket string
RuntimeName string
RuntimeDir string
SetAsDefault bool
RestartMode string
HostRootMount string
RuntimeEnableCDI bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree with the Runtime* prefix here. Runtime in this context refers to the NVIDIA runtime. I would just use EnableCDI here.

}

// ParseArgs parses the command line arguments to the CLI
Expand Down Expand Up @@ -111,6 +112,10 @@ func (o Options) UpdateConfig(cfg engine.Interface) error {
}
}

if o.RuntimeEnableCDI {
cfg.EnableCDI()
}

return nil
}

Expand Down
45 changes: 45 additions & 0 deletions tools/container/runtime/containerd/config_v1_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -415,6 +415,51 @@ func TestUpdateV1ConfigWithRuncPresent(t *testing.T) {
}
}

func TestUpdateV1EnableCDI(t *testing.T) {
logger, _ := testlog.NewNullLogger()
const runtimeDir = "/test/runtime/dir"

testCases := []struct {
runtimeEnableCDI bool
expectedEnableCDIValue interface{}
}{
{},
{
runtimeEnableCDI: false,
expectedEnableCDIValue: nil,
},
{
runtimeEnableCDI: true,
expectedEnableCDIValue: true,
},
}

for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
o := &container.Options{
RuntimeName: "nvidia",
RuntimeDir: runtimeDir,
RuntimeEnableCDI: tc.runtimeEnableCDI,
}

cfg, err := toml.Empty.Load()
require.NoError(t, err, "%d: %v", i, tc)

v1 := &containerd.ConfigV1{
Logger: logger,
Tree: cfg,
RuntimeType: runtimeType,
}

err = o.UpdateConfig(v1)
require.NoError(t, err, "%d: %v", i, tc)

enableCDIValue := v1.GetPath([]string{"plugins", "cri", "containerd", "enable_cdi"})
require.EqualValues(t, tc.expectedEnableCDIValue, enableCDIValue, "%d: %v", i, tc)
})
}
}

func TestRevertV1Config(t *testing.T) {
testCases := []struct {
config map[string]interface {
Expand Down
46 changes: 46 additions & 0 deletions tools/container/runtime/containerd/config_v2_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -369,6 +369,52 @@ func TestUpdateV2ConfigWithRuncPresent(t *testing.T) {
}
}

func TestUpdateV2ConfigEnableCDI(t *testing.T) {
logger, _ := testlog.NewNullLogger()
const runtimeDir = "/test/runtime/dir"

testCases := []struct {
runtimeEnableCDI bool
expectedEnableCDIValue interface{}
}{
{},
{
runtimeEnableCDI: false,
expectedEnableCDIValue: nil,
},
{
runtimeEnableCDI: true,
expectedEnableCDIValue: true,
},
}

for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
o := &container.Options{
RuntimeName: "nvidia",
RuntimeDir: runtimeDir,
SetAsDefault: false,
RuntimeEnableCDI: tc.runtimeEnableCDI,
}

cfg, err := toml.LoadMap(map[string]interface{}{})
require.NoError(t, err)

v2 := &containerd.Config{
Logger: logger,
Tree: cfg,
RuntimeType: runtimeType,
}

err = o.UpdateConfig(v2)
require.NoError(t, err)

enableCDIValue := cfg.GetPath([]string{"plugins", "io.containerd.grpc.v1.cri", "enable_cdi"})
require.EqualValues(t, tc.expectedEnableCDIValue, enableCDIValue)
})
}
}

func TestRevertV2Config(t *testing.T) {
testCases := []struct {
config map[string]interface {
Expand Down
4 changes: 4 additions & 0 deletions tools/container/runtime/docker/docker.go
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,10 @@ func Setup(c *cli.Context, o *container.Options) error {
return fmt.Errorf("unable to configure docker: %v", err)
}

if o.RuntimeEnableCDI {
cfg.Set("features", map[string]bool{"cdi": true})
}

err = RestartDocker(o)
if err != nil {
return fmt.Errorf("unable to restart docker: %v", err)
Expand Down
15 changes: 13 additions & 2 deletions tools/container/runtime/runtime.go
Original file line number Diff line number Diff line change
Expand Up @@ -30,8 +30,9 @@ import (
const (
defaultSetAsDefault = true
// defaultRuntimeName specifies the NVIDIA runtime to be use as the default runtime if setting the default runtime is enabled
defaultRuntimeName = "nvidia"
defaultHostRootMount = "/host"
defaultRuntimeName = "nvidia"
defaultHostRootMount = "/host"
defaultRuntimeEnableCDI = false

runtimeSpecificDefault = "RUNTIME_SPECIFIC_DEFAULT"
)
Expand Down Expand Up @@ -89,6 +90,13 @@ func Flags(opts *Options) []cli.Flag {
EnvVars: []string{"NVIDIA_RUNTIME_SET_AS_DEFAULT", "CONTAINERD_SET_AS_DEFAULT", "DOCKER_SET_AS_DEFAULT"},
Hidden: true,
},
&cli.BoolFlag{
Name: "runtime-enable-cdi",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let the flag be passed as --enable-cdi. The envar and the Option variable can have the Runtime/RUNTIME prefix. This is consistent with the other flags like socket, restart-mode etc

Suggested change
Name: "runtime-enable-cdi",
Name: "enable-cdi",

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to be consistent with the option for the nvidia-ctk runtime configure command.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that we already have a cdi-enabled / enable-cdi flag defined in the toolkit options which triggers the generation of a CDI spec:

&cli.BoolFlag{
Name: "cdi-enabled",
Aliases: []string{"enable-cdi"},
Usage: "enable the generation of a CDI specification",
Destination: &opts.cdiEnabled,
EnvVars: []string{"CDI_ENABLED", "ENABLE_CDI"},
},

Any ideas on what the name of the new flag should be?

Usage: "Enable CDI in the configured runtime",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Usage: "Enable CDI in the configured runtime",
Usage: "Enable Container Device Interface (CDI) in the configured runtime",

Value: defaultRuntimeEnableCDI,
Destination: &opts.RuntimeEnableCDI,
EnvVars: []string{"RUNTIME_ENABLE_CDI"},
},
}

flags = append(flags, containerd.Flags(&opts.containerdOptions)...)
Expand Down Expand Up @@ -124,6 +132,9 @@ func ValidateOptions(opts *Options, runtime string, toolkitRoot string) error {
if opts.RestartMode == runtimeSpecificDefault {
opts.RestartMode = crio.DefaultRestartMode
}
if opts.RuntimeEnableCDI {
opts.RuntimeEnableCDI = false
}
case docker.Name:
if opts.Config == runtimeSpecificDefault {
opts.Config = docker.DefaultConfig
Expand Down
Loading