Allow for initial hidden state in RNNs #230

Closed
fdlm opened this issue Dec 2, 2016 · 8 comments

@fdlm
Contributor

fdlm commented Dec 2, 2016

Madmom's RNNs do not support a learned initial hidden state. Since some deep learning frameworks (e.g. Lasagne) support this, it might be useful to have such a thing.

Should we adapt the current RecurrentLayer or create a new RecurrentLayer class?
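
For reference, Lasagne exposes this via the hid_init and learn_init arguments (keyword names from memory, so please double-check against the docs):

    import lasagne

    # a learned initial hidden state in Lasagne: pass an initialiser as
    # hid_init and set learn_init=True to make it a trainable parameter
    l_in = lasagne.layers.InputLayer(shape=(None, None, 10))
    l_rec = lasagne.layers.RecurrentLayer(l_in, num_units=32,
                                          hid_init=lasagne.init.Constant(0.),
                                          learn_init=True)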

@superbock
Collaborator

superbock commented Dec 2, 2016

Yes, this will be supported soon, since streaming mode (#185) requires the hidden state to be buffered anyways.

EDIT: It should be safe to add an init variable (with default=None) to the existing class.

@fdlm
Contributor Author

fdlm commented Dec 2, 2016

  1. In the GRULayer class, this parameter is called hid_init, so maybe we should go with this.

  2. If we want to set a default value for this parameter, it must come after the activation_fn parameter:

    def __init__(self, weights, bias, recurrent_weights, activation_fn, hid_init=None):

    Isn't this weird? All other layers have activation_fn as last parameter (if they have an activation function).
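
For illustration, a rough sketch of what the adapted class could look like (this is not the actual madmom code, just the proposed signature with the forward pass spelled out):

    import numpy as np

    class RecurrentLayer(object):
        """Recurrent layer with an optional (possibly learned) initial state."""

        def __init__(self, weights, bias, recurrent_weights, activation_fn,
                     hid_init=None):
            self.weights = weights
            self.bias = bias
            self.recurrent_weights = recurrent_weights
            self.activation_fn = activation_fn
            # fall back to a zero vector if no initial hidden state is given
            self.hid_init = np.zeros(bias.size) if hid_init is None else hid_init

        def activate(self, data):
            # iterate over the sequence, feeding the previous output back in
            out = np.zeros((len(data), self.bias.size))
            prev = self.hid_init
            for i, frame in enumerate(data):
                prev = self.activation_fn(np.dot(frame, self.weights) +
                                          np.dot(prev, self.recurrent_weights) +
                                          self.bias)
                out[i] = prev
            return out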

@superbock
Collaborator

superbock commented Dec 3, 2016

Yes, this is a bit unfortunate, but I don't see a real problem. I think it is more important that the parameters have the same ordering and naming, rather than having a certain parameter name always last.

I don't like the hid_init name, but that's the way it is now. But if we want to change it, I'd propose to change it now and update all other classes consistently. This could be done together with streaming mode #185, as it requires access to previous outputs / states anyways. We can keep/access the previous output of the layers as attribute of the class. These can be initialised accordingly then.
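
Roughly like this, i.e. the previous output lives next to the initial state as an attribute and can be reset (only the state handling is sketched, attribute names are placeholders):

    class RecurrentLayer(object):
        # weights, bias, activation function etc. omitted for brevity

        def __init__(self, hid_init):
            self.hid_init = hid_init   # initial state (zeros or a learned vector)
            self.prev = hid_init       # previous output, updated while activating

        def reset(self):
            # restore the (possibly learned) initial state, e.g. before
            # processing a new sequence or when a stream is restarted
            self.prev = self.hid_init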

superbock pushed a commit that referenced this issue Dec 4, 2016
The previous output and state of recurrent layers are saved. This makes the
layers more flexible and also compatible with streaming mode. Fixes #230.
@superbock
Collaborator

After digging into this issue a bit, I realise that we will move straight into if/elif/else hell if we implement this properly.

The problem is the following:

  • if we want to save the state, we need to add another attribute, which is not available in the saved models
  • if we want to initialise these states, we need to save this information independently (otherwise we cannot reset to this initial state later on)
  • we would need to check via hasattr() at every single time step whether both attributes are present (see the sketch below)
  • when we want to reset the layer, we need to decide if we reset to 0 or to the initial state
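
A sketch of that guard (attribute names are made up):

    import numpy as np

    def ensure_state(layer):
        # old pickled models lack the new attributes, so they would have to be
        # checked (and patched) at every single time step
        if not hasattr(layer, 'hid_init'):
            layer.hid_init = np.zeros(layer.bias.size)
        if not hasattr(layer, 'prev'):
            layer.prev = layer.hid_init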

Thus the question is: is there a real need for this, or is it just for completeness? E.g., Lasagne does not learn these initial states by default.

@fdlm
Contributor Author

fdlm commented Dec 5, 2016

We will have to deal with this problem anyways, as soon as we want to implement any streaming-capable algorithms for e.g. HMMs or CRFs, because they always have initial distributions/factors.

I think all points can be solved by re-creating the models with the appropriate attributes. If there are models "in the wild" that do not have these attributes, we could override the load functions in the respective processors, add the missing attribute when loading, and issue a deprecation warning for some future release.
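
A pickle hook would be one way to do that patching transparently; a minimal sketch (the attribute name is just an example):

    import warnings

    class RecurrentLayer(object):
        # weights, activation function etc. omitted

        def __setstate__(self, state):
            # patch layers pickled before the attribute existed and warn, so
            # that old model files can be re-created at some point
            if 'hid_init' not in state:
                warnings.warn('model has no initial hidden state, assuming zeros',
                              DeprecationWarning)
                state['hid_init'] = None
            self.__dict__.update(state)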

Regarding your last question, yes, I currently learn the initial state for some models. I have not evaluated if this reduces the loss significantly, but preliminary experiments indicate so. I don't need this solved right now, because I work on a locally patched copy of madmom with initial states enabled. But as I wrote earlier, I think we will face this problem sooner or later anyways.

@superbock
Collaborator

Yes, re-creating the models with the needed attributes would solve all problems. I have tried to avoid this so far, but it looks like it is the only safe way to not clutter the code unnecessarily.

Although patching the objects when they are loaded is a very simple and elegant solution for saved models, I am not sure if it works for pickled processors in general. But maybe it is ok not to be backwards-compatible in all scenarios and to require old code to process old pickled processors.

@fdlm
Contributor Author

fdlm commented Dec 6, 2016

You are right, if the RNN is part of a pickled processing chain, the solution would fail. I thus don't think it is a viable option.

But even if we disregard the initial state problem (and assume it is all zeros), we still have to tackle the resetting problem: we need to ensure that calling the Processor in non-streaming mode twice with the same data returns the exact same results. I think this is what we should focus on, because once we solve it, setting an initial value should just be a matter of checking once per sequence whether an initial value was given.
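
A toy example of the guarantee I mean, i.e. processing the same data twice must give identical results unless the state is kept on purpose:

    class ToyStatefulLayer(object):
        """Toy layer that keeps its previous output between calls."""

        def __init__(self, init=0.):
            self.init = init
            self.prev = init

        def reset(self):
            # restore the initial state
            self.prev = self.init

        def activate(self, data):
            out = []
            for x in data:
                # simple leaky accumulator standing in for the real recurrence
                self.prev = 0.5 * self.prev + x
                out.append(self.prev)
            return out

    layer = ToyStatefulLayer()
    first = layer.activate([1., 2., 3.])
    layer.reset()  # without this, the second call returns different values
    second = layer.activate([1., 2., 3.])
    assert first == second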

I do not have any solution for this right now, because I am not very familiar with streaming mode yet.

Should we start a separate issue to discuss this?

@superbock
Collaborator

I started issue #237 for the handling of states of stateful classes.

superbock pushed a commit that referenced this issue Dec 6, 2016
The previous output and state of recurrent layers are saved. This makes the
layers more flexible and also compatible with streaming mode. Fixes #230
superbock pushed a commit that referenced this issue Dec 8, 2016
added initialisation of hidden states to layers; fixes #230
renamed GRU parameters to be consistent with all other layers
superbock pushed a commit that referenced this issue Jan 18, 2017
added initialisation of hidden states to layers; fixes #230
renamed GRU parameters, to be consistend with all other layers
superbock pushed a commit that referenced this issue Jan 19, 2017
added initialisation of hidden states to layers; fixes #230
renamed GRU parameters to be consistent with all other layers