[WIP] added initial states to RNN and LSTM layers #233

Closed · wants to merge 1 commit
Conversation

@fdlm (Contributor) commented Dec 2, 2016

This is a first implementation of initial states for RNN and LSTM layers. Needs some testing. Addresses #230.
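
For context, a minimal usage sketch of what this change is meant to enable; the module path, the constructor signature and the `hid_init` keyword are assumptions based on the diff below, not a final API:

```python
import numpy as np
from madmom.ml.nn.layers import RecurrentLayer

num_inputs, num_hiddens = 4, 3
layer = RecurrentLayer(
    weights=np.random.randn(num_inputs, num_hiddens),
    bias=np.zeros(num_hiddens),
    recurrent_weights=np.random.randn(num_hiddens, num_hiddens),
    activation_fn=np.tanh,
    # learned initial state of the hidden units instead of implicit zeros
    hid_init=np.random.randn(num_hiddens),
)
out = layer.activate(np.random.randn(10, num_inputs))
```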

@superbock (Collaborator) left a comment

Although the proposed changes will work as they are now, I think we should address this together with #185, since we need to store the previous outputs and states anyway. These could then be initialised accordingly.

@@ -98,12 +98,16 @@ class RecurrentLayer(FeedForwardLayer):
        Recurrent weights.
    activation_fn : numpy ufunc
        Activation function.
    hid_init : numpy array, shape (), optional
        Initial state of hidden units.
@superbock (Collaborator)

I'm not really happy with the name hid_init; I'd prefer simply init, or -- if that is too generic -- hidden_init.

@@ -125,6 +129,9 @@ def activate(self, data):
        return super(RecurrentLayer, self).activate(data)
    # weight input and add bias
    out = np.dot(data, self.weights) + self.bias
    # if we have a pre-initialised hidden state, add it
    if self.hid_init is not None:
        out[0] += np.dot(self.hid_init, self.recurrent_weights)
@superbock (Collaborator)

I think the logic here should be refactored.

Since we have to keep the last output anyway for streaming mode, we might just save out as self.out to be able to access it in the next step. The initialisation of self.out could then be done in __init__ with the given (learned) value.

The logic in the for loop below has to be changed to always access self.out -- or, in case of block-wise processing, the last item thereof.
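
To make this concrete, here is a minimal, self-contained sketch of that idea (not the actual madmom classes; the init argument and the reset() helper are assumptions) where self.out carries the hidden state across steps and blocks:

```python
import numpy as np


class RecurrentLayerSketch(object):
    """Minimal sketch of a recurrent layer that keeps its last output."""

    def __init__(self, weights, bias, recurrent_weights,
                 activation_fn=np.tanh, init=None):
        self.weights = weights
        self.bias = bias
        self.recurrent_weights = recurrent_weights
        self.activation_fn = activation_fn
        # the (possibly learned) initial hidden state; used as the
        # "previous output" before the very first step
        self.init = np.zeros(len(bias)) if init is None else init
        self.out = self.init

    def activate(self, data):
        # weight the input and add the bias
        out = np.dot(data, self.weights) + self.bias
        # loop over the time steps, always feeding back self.out, so that
        # block-wise (streaming) processing continues where the previous
        # block stopped
        for i in range(len(data)):
            out[i] = self.activation_fn(
                out[i] + np.dot(self.out, self.recurrent_weights))
            self.out = out[i]
        return out

    def reset(self):
        # restore the initial hidden state, e.g. before a new sequence
        self.out = self.init
```

With such a layer, block-wise processing of a long sequence is just repeated calls to activate(), and the learned initial value only influences the first block after a reset().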

    self.input_gate = input_gate
    self.forget_gate = forget_gate
    self.cell = cell
    self.output_gate = output_gate
    self.activation_fn = activation_fn
    self.out_init = out_init
    self.state_init = state_init
@superbock (Collaborator)

As above, I'm not really happy with the names here.
Also, for streaming mode, the output of the previous step must be accessible anyway, e.g. as self.out.

For the state_init I am thinking of moving it to the cell as init and storing the cell's state there -- this is basically what it is. This needs some changes in the activate() method, but it is the cleaner way to solve this (IMO).
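
A rough sketch of that idea (simplified, self-contained classes, not madmom's actual Gate/Cell API; in particular the activate() signatures are assumptions): the cell owns init and state, while the layer only keeps the previous output:

```python
import numpy as np


class CellSketch(object):
    """Simplified cell/gate: the cell keeps its own (initial) state."""

    def __init__(self, weights, bias, recurrent_weights,
                 activation_fn=np.tanh, init=None):
        self.weights = weights
        self.bias = bias
        self.recurrent_weights = recurrent_weights
        self.activation_fn = activation_fn
        # the (possibly learned) initial cell state lives with the cell
        self.init = np.zeros(len(bias)) if init is None else init
        self.state = self.init

    def activate(self, data, prev_out):
        # weight the input, add the bias and the weighted previous output
        return self.activation_fn(np.dot(data, self.weights) + self.bias +
                                  np.dot(prev_out, self.recurrent_weights))


class LSTMLayerSketch(object):
    """Simplified LSTM layer: keeps only the previous output itself."""

    def __init__(self, input_gate, forget_gate, cell, output_gate,
                 activation_fn=np.tanh, init=None):
        # gates would normally be CellSketch instances with a sigmoid
        # activation_fn; the cell uses tanh
        self.input_gate = input_gate
        self.forget_gate = forget_gate
        self.cell = cell
        self.output_gate = output_gate
        self.activation_fn = activation_fn
        # previous output, initialised with the (possibly learned) value
        self.init = np.zeros(len(cell.bias)) if init is None else init
        self.out = self.init

    def activate(self, data):
        out = np.zeros((len(data), len(self.out)))
        for i, data_t in enumerate(data):
            ig = self.input_gate.activate(data_t, self.out)
            fg = self.forget_gate.activate(data_t, self.out)
            # the cell state is stored on the cell, not on the layer
            self.cell.state = (fg * self.cell.state +
                               ig * self.cell.activate(data_t, self.out))
            og = self.output_gate.activate(data_t, self.out)
            self.out = og * self.activation_fn(self.cell.state)
            out[i] = self.out
        return out
```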

@fdlm (Contributor, Author) commented Dec 3, 2016

Since some of your proposed changes require more refactoring, I guess it would really be better to do this together with #185. It will be less work in total.

@superbock (Collaborator)

Yes, I will work on this next week, since we need NNs in #185 anyway.

@superbock (Collaborator)

Please see #235 for my first draft. If that PR addresses all your points, we can close this PR.

@superbock (Collaborator)

Closing, since #243 is merged

superbock closed this on Jan 20, 2017
superbock deleted the rnn_hid_init branch on Jan 20, 2017