Allow for initial hidden state in RNNs #230
Madmom's RNNs do not support a learned initial hidden state. Since some deep learning frameworks (e.g. Lasagne) support this, it might be useful to have it as well.
Should we adapt the current RecurrentLayer or create a new layer class?
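For illustration only, here is a minimal NumPy sketch of what a recurrent layer with an optional initial hidden state could look like. The parameter name `init`, the tanh activation and the overall structure are assumptions made for this sketch, not madmom's actual implementation:

```python
import numpy as np

class RecurrentLayer(object):
    """Sketch of a recurrent layer with an optional initial hidden state."""

    def __init__(self, weights, bias, recurrent_weights, init=None):
        self.weights = weights                      # input-to-hidden weights
        self.bias = bias                            # hidden bias
        self.recurrent_weights = recurrent_weights  # hidden-to-hidden weights
        # initial hidden state; defaults to zeros (the current behaviour)
        self.init = np.zeros(len(bias)) if init is None else init

    def activate(self, data):
        # start each sequence from the (possibly learned) initial state
        prev = self.init
        out = np.zeros((len(data), len(self.bias)))
        for i, frame in enumerate(data):
            prev = np.tanh(np.dot(frame, self.weights) +
                           np.dot(prev, self.recurrent_weights) + self.bias)
            out[i] = prev
        return out
```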
Yes, this will be supported soon, since streaming mode (#185) requires the hidden state to be buffered anyway.
Yes, this is a bit unfortunate, but I don't see a real problem. I think it is more important that the parameters have the same ordering and naming than that a certain parameter name always comes last. I don't like the …
The previous output and state of recurrent layers are saved. This makes the layers more flexible and also compatible with streaming mode. Fixes #230.
After digging into this issue a bit, I realise that we will move straight into if/elif/else hell if we implement this right. The problem is the following: …

Thus the question is: is there a real need for this, or is it just for completeness? E.g., Lasagne does not learn these initial states by default.
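For reference, Lasagne can be told to learn the initial hidden state by setting `learn_init=True` (it is off by default, as noted above). A rough sketch; treat the exact call as an approximation of the Lasagne API rather than a verified recipe:

```python
import lasagne

# input shape: (batch size, sequence length, feature dimension)
l_in = lasagne.layers.InputLayer(shape=(None, None, 100))
# recurrent layer whose initial hidden state is a trainable parameter
l_rec = lasagne.layers.RecurrentLayer(
    l_in, num_units=64,
    hid_init=lasagne.init.Constant(0.),  # starting value of the hidden state
    learn_init=True)                     # update hid_init during training
```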
We will have to deal with this problem anyway as soon as we want to implement any streaming-capable algorithms, e.g. for HMMs or CRFs, because they always have initial distributions/factors. I think all points can be solved by re-creating the models with the appropriate attributes. If there are models "in the wild" that do not have these attributes, we could patch in the missing attributes when loading them.

Regarding your last question: yes, I currently learn the initial state for some models. I have not evaluated whether this reduces the loss significantly, but preliminary experiments indicate so. I don't need this solved right now, because I work on a locally patched copy of madmom with initial states enabled. But as I wrote earlier, I think we will face this problem sooner or later anyway.
Yes, re-creating the models with the needed attributes would solve all problems. However, I have tried to avoid this so far, but it looks like this is the only safe way to not clutter the code unnecessarily. Although patching the objects when they are loaded is a very simple and elegant solution for saved models, I am not sure if it works for pickled processors in general. But maybe it is ok to not be backwards-compatible in all scenarios and to require old code for processing old pickled processors.
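A sketch of the load-time patching idea for saved models; `load_model` and the `init` attribute are hypothetical names, not madmom's API, and, as the next comment points out, this does not help when the layer is buried inside a pickled processor chain:

```python
import pickle
import numpy as np

def load_model(filename):
    """Load a saved layer and patch attributes missing in old files."""
    with open(filename, 'rb') as f:
        layer = pickle.load(f)
    # layers saved before the initial-state attribute existed get zeros
    if not hasattr(layer, 'init'):
        layer.init = np.zeros(len(layer.bias))
    return layer
```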
You are right, if the RNN is part of a pickled processing chain, the solution would fail. I thus don't think it is a viable option. But even if we disregard the initial state problem (and assume it is all zeros), we still have to tackle the resetting problem: we need to ensure that calling the Processor in non-streaming mode twice with the same data returns exactly the same results. I think this is what we should focus on, because once we solve this, setting an initial value should just be a matter of checking once per sequence whether an initial value was given. I do not have a solution for this right now, because I am not very familiar with streaming mode yet. Should we start a separate issue to discuss this?
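Building on the earlier sketch, the resetting behaviour being discussed could look roughly like this (replacing the sketch's `activate`); the `reset` flag and the `_prev` attribute are purely illustrative, not the actual madmom interface:

```python
    def activate(self, data, reset=True):
        # non-streaming mode (reset=True): start from the initial state, so
        # processing the same data twice gives identical results; streaming
        # mode (reset=False) keeps the hidden state between chunks
        if reset or not hasattr(self, '_prev'):
            self._prev = np.copy(self.init)
        out = np.zeros((len(data), len(self.bias)))
        for i, frame in enumerate(data):
            self._prev = np.tanh(np.dot(frame, self.weights) +
                                 np.dot(self._prev, self.recurrent_weights) +
                                 self.bias)
            out[i] = self._prev
        return out
```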
I started issue #237 for the handling of states of stateful classes.
added initialisation of hidden states to layers; fixes #230; renamed GRU parameters to be consistent with all other layers