Allow for initial hidden state in RNNs #230

Closed
fdlm opened this issue Dec 2, 2016 · 8 comments

@fdlm
Contributor

fdlm commented Dec 2, 2016

Madmom's RNNs do not support a learned initial hidden state. Since some deep learning frameworks (e.g. Lasagne) support this, it might be useful to have such a thing.

Should we adapt the current RecurrentLayer or create a new RecurrentLayer class?
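
For reference, Lasagne exposes this via the hid_init and learn_init arguments (keyword names from memory, so please double-check against the docs):

    import lasagne

    # a learned initial hidden state in Lasagne: pass an initialiser as
    # hid_init and set learn_init=True to make it a trainable parameter
    l_in = lasagne.layers.InputLayer(shape=(None, None, 10))
    l_rec = lasagne.layers.RecurrentLayer(l_in, num_units=32,
                                          hid_init=lasagne.init.Constant(0.),
                                          learn_init=True)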

@superbock
Collaborator

superbock commented Dec 2, 2016

Yes, this will be supported soon, since streaming mode (#185) requires the hidden state to be buffered anyways.

EDIT: It should be safe to add an init variable (with default=None) to the existing class.

@fdlm
Contributor Author

fdlm commented Dec 2, 2016

  1. In the GRULayer class, this parameter is called hid_init, so maybe we should go with this.

  2. If we want to set a default value for this parameter, it must come after the activation_fn parameter:

    def __init__(self, weights, bias, recurrent_weights, activation_fn, hid_init=None):

    Isn't this weird? All other layers have activation_fn as last parameter (if they have an activation function).
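
For illustration, a rough sketch of what the adapted class could look like (this is not the actual madmom code, just the proposed signature with the forward pass spelled out):

    import numpy as np

    class RecurrentLayer(object):
        """Recurrent layer with an optional (possibly learned) initial state."""

        def __init__(self, weights, bias, recurrent_weights, activation_fn,
                     hid_init=None):
            self.weights = weights
            self.bias = bias
            self.recurrent_weights = recurrent_weights
            self.activation_fn = activation_fn
            # fall back to a zero vector if no initial hidden state is given
            self.hid_init = np.zeros(bias.size) if hid_init is None else hid_init

        def activate(self, data):
            # iterate over the sequence, feeding the previous output back in
            out = np.zeros((len(data), self.bias.size))
            prev = self.hid_init
            for i, frame in enumerate(data):
                prev = self.activation_fn(np.dot(frame, self.weights) +
                                          np.dot(prev, self.recurrent_weights) +
                                          self.bias)
                out[i] = prev
            return out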

@superbock
Collaborator

superbock commented Dec 3, 2016

Yes, this is a bit unfortunate, but I don't see a real problem. I think it is more important that the parameters have the same ordering and naming, rather than having a certain parameter name always last.

I don't like the hid_init name, but that's the way it is now. But if we want to change it, I'd propose to change it now and update all other classes consistently. This could be done together with streaming mode #185, as it requires access to previous outputs / states anyways. We can keep/access the previous output of the layers as attribute of the class. These can be initialised accordingly then.
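
Roughly like this, i.e. the previous output lives next to the initial state as an attribute and can be reset (only the state handling is sketched, attribute names are placeholders):

    class RecurrentLayer(object):
        # weights, bias, activation function etc. omitted for brevity

        def __init__(self, hid_init):
            self.hid_init = hid_init   # initial state (zeros or a learned vector)
            self.prev = hid_init       # previous output, updated while activating

        def reset(self):
            # restore the (possibly learned) initial state, e.g. before
            # processing a new sequence or when a stream is restarted
            self.prev = self.hid_init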

superbock pushed a commit that referenced this issue Dec 4, 2016
The previous output and state of recurrent layers are saved. This makes the
layers more flexible and also compatible with streaming mode. Fixes #230.
@superbock
Collaborator

After digging into this issue a bit, I realise that we will move straight into if/elif/else hell if we implement this properly.

The problem is the following:

  • if we want to save the state, we need to add another attribute, which is not available in the saved models
  • if we want to initialise these states, we need to save this information independently (otherwise we cannot reset to this initial state later on)
  • we would need to check via hasattr() at every single time step whether both attributes are present (see the sketch below)
  • when we want to reset the layer, we need to decide if we reset to 0 or to the initial state
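
A sketch of that guard (attribute names are made up):

    import numpy as np

    def ensure_state(layer):
        # old pickled models lack the new attributes, so they would have to be
        # checked (and patched) at every single time step
        if not hasattr(layer, 'hid_init'):
            layer.hid_init = np.zeros(layer.bias.size)
        if not hasattr(layer, 'prev'):
            layer.prev = layer.hid_init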

Thus the question is: is there a real need for this, or is it just for completeness? E.g., Lasagne does not learn these initial states by default.

@fdlm
Contributor Author

fdlm commented Dec 5, 2016

We will have to deal with this problem anyways, as soon as we want to implement any streaming-capable algorithms for e.g. HMMs or CRFs, because they always have initial distributions/factors.

I think all points can be solved by re-creating the models with the appropriate attributes. If there are models "in the wild" that do not have these attributes, we could override the load functions in the respective processors, add the missing attribute when loading, and issue a deprecation warning for some future release.
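
A pickle hook would be one way to do that patching transparently; a minimal sketch (the attribute name is just an example):

    import warnings

    class RecurrentLayer(object):
        # weights, activation function etc. omitted

        def __setstate__(self, state):
            # patch layers pickled before the attribute existed and warn, so
            # that old model files can be re-created at some point
            if 'hid_init' not in state:
                warnings.warn('model has no initial hidden state, assuming zeros',
                              DeprecationWarning)
                state['hid_init'] = None
            self.__dict__.update(state)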

Regarding your last question, yes, I currently learn the initial state for some models. I have not evaluated if this reduces the loss significantly, but preliminary experiments indicate so. I don't need this solved right now, because I work on a locally patched copy of madmom with initial states enabled. But as I wrote earlier, I think we will face this problem sooner or later anyways.

@superbock
Collaborator

Yes, re-creating the models with the needed attributes would solve all problems. I have tried to avoid this so far, but it looks like it is the only safe way to not clutter the code unnecessarily.

Although patching the objects when they are loaded is a very simple and elegant solution for saved models, I am not sure if it works for pickled processors in general. But maybe it is ok not to be backwards-compatible in all scenarios and to require old code to process old pickled processors.

@fdlm
Contributor Author

fdlm commented Dec 6, 2016

You are right, if the RNN is part of a pickled processing chain, the solution would fail. I thus don't think it is a viable option.

But even if we disregard the initial state problem (and assume it is all zeros), we still have to tackle the resetting problem: we need to ensure that calling the Processor in non-streaming mode twice with the same data returns the exact same results. I think this is what we should focus on, because once we solve it, setting an initial value should just be a matter of checking once per sequence whether an initial value was given.
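
A toy example of the guarantee I mean, i.e. processing the same data twice must give identical results unless the state is kept on purpose:

    class ToyStatefulLayer(object):
        """Toy layer that keeps its previous output between calls."""

        def __init__(self, init=0.):
            self.init = init
            self.prev = init

        def reset(self):
            # restore the initial state
            self.prev = self.init

        def activate(self, data):
            out = []
            for x in data:
                # simple leaky accumulator standing in for the real recurrence
                self.prev = 0.5 * self.prev + x
                out.append(self.prev)
            return out

    layer = ToyStatefulLayer()
    first = layer.activate([1., 2., 3.])
    layer.reset()  # without this, the second call returns different values
    second = layer.activate([1., 2., 3.])
    assert first == second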

I do not have any solution for this right now, because I am not very familiar with streaming mode yet.

Should we start a separate issue to discuss this?

@superbock
Collaborator

I started issue #237 for the handling of states of stateful classes.

superbock pushed a commit that referenced this issue Dec 6, 2016
The previous output and state of recurrent layers are saved. This makes the
layers more flexible and also compatible with streaming mode. Fixes #230
superbock pushed a commit that referenced this issue Dec 8, 2016
added initialisation of hidden states to layers; fixes #230
renamed GRU parameters to be consistent with all other layers
superbock pushed a commit that referenced this issue Jan 18, 2017
added initialisation of hidden states to layers; fixes #230
renamed GRU parameters, to be consistend with all other layers
superbock pushed a commit that referenced this issue Jan 19, 2017
added initialisation of hidden states to layers; fixes #230
renamed GRU parameters to be consistent with all other layers