Any hints on resuming training (load_state_dict) for the Orthogonal module? #7

Open
wtomin opened this issue Oct 19, 2021 · 1 comment

wtomin commented Oct 19, 2021

Hi, authors. Thanks for providing this repo.

I'm currently using the Orthogonal module and have defined it as part of my model's weights. When I tried to resume training from a checkpoint, calling load_state_dict raised an unexpected error:

 Unexpected key(s) in state_dict: "rotation_matrices._B"

rotation_matrices is the name of the Orthogonal object. I think the error occurs because rotation_matrices._B is None when the model is initialized, so the _B entry in the state_dict has nothing to be loaded into.
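The diagnosis above can be reproduced with a small toy module. This is a sketch, not the repository's code: CachedRetraction is a hypothetical stand-in, and it assumes the cache is a persistent buffer registered as None. That assumption is consistent with the "rotation_matrices._B" key in the error but is not confirmed in this thread. A buffer that is None is skipped by state_dict(), so a checkpoint written after a forward pass contains a key that a freshly built model cannot accept under strict loading.

```python
import torch
import torch.nn as nn


class CachedRetraction(nn.Module):
    """Hypothetical stand-in for a module that caches a derived matrix in `_B`."""

    def __init__(self, n: int):
        super().__init__()
        self.A = nn.Parameter(0.1 * torch.randn(n, n))
        # A buffer registered as None is left out of state_dict() until it holds a tensor.
        self.register_buffer("_B", None)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Recompute the cache: the matrix exponential of a skew-symmetric matrix is orthogonal.
        self._B = torch.matrix_exp(self.A - self.A.t())
        return x @ self._B


saved = CachedRetraction(3)
saved(torch.randn(2, 3))  # after one forward pass, "_B" is part of the state dict
state = saved.state_dict()

fresh = CachedRetraction(3)  # freshly built model: fresh._B is still None
try:
    fresh.load_state_dict(state)  # strict=True by default
    strict_failed = False
except RuntimeError as err:
    strict_failed = "Unexpected key(s)" in str(err)

# Method 1 below, in miniature: populate the cache first, then load.
with torch.no_grad():
    fresh(torch.zeros(1, 3))
fresh.load_state_dict(state)
```

Populating the cache only removes the load error. It does not address the optimizer errors described next.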

I tried two methods to solve this problem, but both failed.

  1. Retract _B before load_state_dict:

      import torch

      mod = rotation_matrices
      not_B = mod._B is None
      if not_B or (not mod._B.grad_fn and torch.is_grad_enabled()):
          mod._B = mod.retraction(mod.A, mod.base).detach()
          # Just to be safe
          mod._B.requires_grad_()
          # After detach(), _B is already a leaf tensor, so retain_grad() is a no-op here
          mod._B.retain_grad()

      ... ...
      # Then, in main.py, I run
      model.load_state_dict(state_dict)

This time load_state_dict did not raise an error. Instead, the error appeared in the training step after backpropagation (loss.backward()), inside the optimizer update:

 exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
RuntimeError: output with shape [64] doesn't match the broadcast shape [128, 768, 1, 64]

  2. load_state_dict(state_dict, strict=False)
      Instead of re-defining _B, I changed the strict argument passed to load_state_dict. The error again appeared in the training step after loss.backward():

 exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
RuntimeError: The size of tensor a (64) must match the size of tensor b (32) at non-singleton dimension 0
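Passing strict=False hides every mismatched key, including genuinely missing or misnamed weights. An alternative is to drop only the cached entries and keep strict checking for everything else. This sketch assumes _B is a cache that the module recomputes whenever it is None, as the B logic in method 1 does. The helper load_without_cache and the "._B" suffix filter are hypothetical, and, like method 1, this only addresses the load error, not the optimizer error.

```python
from collections import OrderedDict

import torch
import torch.nn as nn


def load_without_cache(model: nn.Module, state_dict, cache_suffix: str = "._B"):
    """Load strictly after dropping cached entries that the module recomputes on demand."""
    filtered = OrderedDict(
        (key, value) for key, value in state_dict.items() if not key.endswith(cache_suffix)
    )
    # Keep PyTorch's per-module version metadata, which load_state_dict may consult.
    metadata = getattr(state_dict, "_metadata", None)
    if metadata is not None:
        filtered._metadata = metadata
    return model.load_state_dict(filtered)  # strict=True: any other mismatch still raises


# Hypothetical checkpoint: an ordinary submodule named like the one in the issue,
# plus a stale cache entry of the kind that triggered the error.
model = nn.ModuleDict({"rotation_matrices": nn.Linear(4, 4)})
checkpoint = model.state_dict()
checkpoint["rotation_matrices._B"] = torch.eye(4)

result = load_without_cache(model, checkpoint)
```

The returned object lists missing_keys and unexpected_keys. After filtering, both should be empty, so any non-empty list points to a real mismatch rather than the cache.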

I suspect it has something to do with the optimizer. Could you give me some suggestions?
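One hypothesis consistent with both tracebacks, though not confirmed in this thread, is a misalignment in how the optimizer state is restored. torch.optim.Optimizer.load_state_dict matches saved per-parameter state to the current parameters by position and checks only group sizes, not tensor shapes. If the model's parameters are collected in a different order, or from a different set, than when the checkpoint was written, Adam's exp_avg can end up attached to a tensor of another shape. The sketch below reproduces that failure with plain tensors. mismatched_optimizer_state is a hypothetical diagnostic helper, not a PyTorch API.

```python
import torch


def mismatched_optimizer_state(optimizer: torch.optim.Optimizer):
    """List (param shape, state key, state shape) wherever per-parameter state
    does not match the parameter it is now attached to."""
    problems = []
    for group in optimizer.param_groups:
        for p in group["params"]:
            for key, value in optimizer.state.get(p, {}).items():
                # Skip scalar entries such as Adam's "step" counter.
                if torch.is_tensor(value) and value.dim() > 0 and value.shape != p.shape:
                    problems.append((tuple(p.shape), key, tuple(value.shape)))
    return problems


# Save Adam state for parameters listed as [weight, bias] ...
weight = torch.nn.Parameter(torch.randn(4, 3))
bias = torch.nn.Parameter(torch.randn(3))
opt = torch.optim.Adam([weight, bias], lr=1e-3)
(weight.sum() + bias.sum()).backward()
opt.step()
saved = opt.state_dict()

# ... and restore it into an optimizer built over [bias, weight].
weight2 = torch.nn.Parameter(torch.randn(4, 3))
bias2 = torch.nn.Parameter(torch.randn(3))
opt2 = torch.optim.Adam([bias2, weight2], lr=1e-3)
opt2.load_state_dict(saved)  # no error: group sizes match, shapes are not checked
problems = mismatched_optimizer_state(opt2)

# The next update then fails with a shape error, much like the tracebacks above.
(weight2.sum() + bias2.sum()).backward()
try:
    opt2.step()
    step_failed = False
except RuntimeError:
    step_failed = True
```

Running a check like this right after optimizer.load_state_dict, and before the first step, would show whether the saved state lines up with the current parameter order.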

lezcano (repository owner) commented Oct 19, 2021

As it says in the README, this repo has been superseded by https://github.com/Lezcano/geotorch
Have you tried with the tools in that repo?

Moreover, torch.nn.utils.parametrizations.orthogonal, already in PyTorch master and to be released soon in PyTorch 1.11, brings an improved version of this as well.
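A minimal resume sketch with torch.nn.utils.parametrizations.orthogonal, assuming PyTorch 1.11 or later (the release named above). The two-layer model, the helpers make_model_and_optimizer and train_step, and the in-memory checkpoint are illustrative. The sketch rebuilds the model and optimizer the same way on resume, restores both state dicts, and then takes another Adam step. In this setup, a strict load is expected to succeed and the optimizer state should line up with the parameters.

```python
import io

import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal


def make_model_and_optimizer():
    # Build model and optimizer identically on every run, so the optimizer sees
    # the parameters in the same order when its state is restored.
    model = nn.Sequential(orthogonal(nn.Linear(8, 8)), nn.Linear(8, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    return model, optimizer


def train_step(model, optimizer, x, y):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()


x, y = torch.randn(16, 8), torch.randn(16, 1)

model, optimizer = make_model_and_optimizer()
train_step(model, optimizer, x, y)

# Write a checkpoint holding both the model and the optimizer state.
buffer = io.BytesIO()
torch.save({"model": model.state_dict(), "optimizer": optimizer.state_dict()}, buffer)
buffer.seek(0)

# Resume: rebuild from scratch, then restore both state dicts.
resumed_model, resumed_optimizer = make_model_and_optimizer()
checkpoint = torch.load(buffer)
resumed_model.load_state_dict(checkpoint["model"])  # strict=True by default
resumed_optimizer.load_state_dict(checkpoint["optimizer"])
weights_match = torch.allclose(resumed_model[0].weight, model[0].weight)

train_step(resumed_model, resumed_optimizer, x, y)  # continue training

# The parametrized weight should stay orthogonal: W^T W is close to the identity.
W = resumed_model[0].weight
orthogonal_ok = torch.allclose(W.T @ W, torch.eye(8), atol=1e-5)
```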
