Multiply module by a learnable constant #2392
-
Hi,

TL;DR: I want to multiply an LBANN module like `lbann.Gaussian`, which has a 3-dimensional output, e.g. `(512, 8, 8)`, by a learnable scalar.

I am trying to convert StyleGAN2 from PyTorch to LBANN. One part of this is that at each layer we create a feature map of random noise, scale it by a learnable constant, and add it to the output of the convolution layer. For example, in PyTorch it would look like this:

```python
def __init__(self):
    # learnable noise constant (B box in Karras 2019 Figure 1)
    self.noise_strength = torch.nn.Parameter(torch.zeros([]))

def forward(self, x):
    ...
    x = self.conv(x, ...)
    # create a noise map and scale it by the learnable constant
    noise = self.noise_function(
        [x.shape[0], 1, self.resolution, self.resolution],
        device=x.device,
    ) * self.noise_strength
    x += noise
    ...
```

In LBANN, I have the following:

```python
def __init__(self):
    self.noise_strength = lbann.Weights(lbann.ValueInitializer(values=0.0))
    self.noise_strength_layer = lbann.WeightsLayer(
        dims=[1],
        name=f'{name}_noise_multiplier',
        weights=self.noise_strength,
    )
    ...

def forward(self, x):
    x = self.conv(x)
    noise = lbann.Scale(
        lbann.Gaussian(
            mean=0,
            stdev=1,
            neuron_dims=[self.out_channels, self.resolution, self.resolution],
            # hint_layer=x
        ),
        constant=self.noise_strength_layer,
    )
    x = lbann.Add(x, noise)
```

But this did not work. Similarly, I have tried other variants without success. Any tips on how to do this seemingly simple task in LBANN? Thanks.
-
A potential solution, though I am not sure whether it still creates more than one learnable parameter per layer:

```python
def __init__(self):
    ...
    self.expected_shape = [self.output_channels, self.resolution, self.resolution]
    self.noise_strength = lbann.Weights(lbann.ValueInitializer(values=0.0))
    self.noise_strength_layer_orig = lbann.WeightsLayer(
        dims=[1, 1, 1],
        name=f'{name}_noise_multiplier',
        weights=self.noise_strength,
    )
    ...

def forward(self, x):
    ...
    # Repeatedly concatenate the 1x1x1 weights layer with itself, doubling
    # each dimension until it matches expected_shape (this assumes every
    # entry of expected_shape is a power of two).
    self.noise_strength_layer = lbann.Identity(self.noise_strength_layer_orig)
    for _ in range(int(np.log2(self.expected_shape[0]))):
        self.noise_strength_layer = lbann.Concatenation(
            self.noise_strength_layer,
            lbann.Identity(self.noise_strength_layer),
            axis=0,
        )
    for _ in range(int(np.log2(self.expected_shape[1]))):
        self.noise_strength_layer = lbann.Concatenation(
            self.noise_strength_layer,
            lbann.Identity(self.noise_strength_layer),
            axis=1,
        )
    for _ in range(int(np.log2(self.expected_shape[2]))):
        self.noise_strength_layer = lbann.Concatenation(
            self.noise_strength_layer,
            lbann.Identity(self.noise_strength_layer),
            axis=2,
        )
    gaussian_noise = lbann.Gaussian(
        mean=0,
        stdev=1,
        neuron_dims=self.expected_shape,
    )
    noise = lbann.Multiply(gaussian_noise, self.noise_strength_layer)
    x = lbann.Add(x, noise)
    ...
```

Is there a simpler solution?
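To sanity-check the shape logic, the doubling loop can be simulated in plain NumPy (this is only a stand-in for the LBANN graph, not LBANN itself; `tile_by_doubling` is a hypothetical helper I wrote for illustration). It confirms that, when every dimension is a power of two, repeatedly self-concatenating a 1x1x1 tensor fills the expected shape while every element remains the same single value:

```python
import numpy as np

def tile_by_doubling(shape, value=0.37):
    """Mimic the concatenation loops: start from a 1x1x1 array holding a
    single parameter and double it along each axis until it reaches `shape`.
    Assumes every entry of `shape` is a power of two."""
    t = np.full((1, 1, 1), value)  # one scalar, like the WeightsLayer
    for axis, dim in enumerate(shape):
        for _ in range(int(np.log2(dim))):
            t = np.concatenate([t, t], axis=axis)
    return t

tiled = tile_by_doubling([512, 8, 8])
print(tiled.shape)        # (512, 8, 8)
print(np.unique(tiled))   # [0.37] -- a single shared value everywhere
```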
Beta Was this translation helpful? Give feedback.
-
Your Python solution is close to what I'd recommend doing. Instead of concatenating the values over three dimensions, you should use the Tessellate layer with the right dimensions. Both the Tessellate solution and yours will have only one learnable parameter per layer.
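For reference, Tessellate tiles its input tensor until it fills the requested output dimensions, so the three concatenation loops collapse into a single layer. A rough sketch of the substitution (the exact `lbann.Tessellate` argument names are my best guess from the Python front end, so please double-check against your LBANN version), with the tiling semantics illustrated in NumPy:

```python
import numpy as np

# Hypothetical LBANN substitution for the concatenation loops
# (verify the Tessellate signature against your LBANN version):
#
#   self.noise_strength_layer = lbann.Tessellate(
#       self.noise_strength_layer_orig,
#       dims=self.expected_shape,
#   )
#
# Tessellate's tiling behavior corresponds to np.tile:
param = np.full((1, 1, 1), 0.37)   # the single learnable scalar
expected_shape = (512, 8, 8)
tessellated = np.tile(param, expected_shape)
print(tessellated.shape)       # (512, 8, 8)
# Every element is the same one parameter, so the layer still holds
# exactly one learnable value.
print(np.unique(tessellated))  # [0.37]
```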