[WIP] added initial states to RNN and LSTM layers #233
Conversation
Although the proposed changes will work the way they are now, I think we should address this together with #185, since we need to store the previous outputs and states anyways. These could then be initialised accordingly.
@@ -98,12 +98,16 @@ class RecurrentLayer(FeedForwardLayer):
        Recurrent weights.
    activation_fn : numpy ufunc
        Activation function.
    hid_init : numpy array, shape (), optional
        Initial state of hidden units.
I'm not really happy with the name `hid_init`, I'd prefer simply `init`, or -- if this is too generic -- `hidden_init`.
@@ -125,6 +129,9 @@ def activate(self, data):
            return super(RecurrentLayer, self).activate(data)
        # weight input and add bias
        out = np.dot(data, self.weights) + self.bias
        # if we have a pre-initialised hidden state, add it
        if self.hid_init is not None:
            out[0] += np.dot(self.hid_init, self.recurrent_weights)
I think the logic here should be refactored. Since we have to keep the last output anyways for streaming mode, we might just save `out` as `self.out` to be able to access it in the next step. The initialisation of `self.out` could then be done in `__init__` with the given (learned) value. The logic in the `for` loop below has to be changed to always access `self.out` -- or, in the case of block-wise processing, the last item thereof.
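A minimal sketch of how that refactoring could look, assuming a plain NumPy layer with `weights`, `bias` and `recurrent_weights` attributes; the class body and the `out_init` name are illustrative assumptions, not the actual madmom code:

```python
import numpy as np

class RecurrentLayer(object):
    """Illustrative sketch only, not the actual madmom implementation."""

    def __init__(self, weights, bias, recurrent_weights,
                 activation_fn=np.tanh, out_init=None):
        self.weights = weights
        self.bias = bias
        self.recurrent_weights = recurrent_weights
        self.activation_fn = activation_fn
        # self.out holds the previous output; it starts at the (learned)
        # initial value and is carried over between calls in streaming mode
        self.out = out_init if out_init is not None else np.zeros_like(bias)

    def activate(self, data):
        # weight input and add bias
        out = np.dot(data, self.weights) + self.bias
        # loop over all frames, always feeding back self.out, which is
        # either the initial value or the last output of the previous block
        for i in range(len(data)):
            out[i] += np.dot(self.out, self.recurrent_weights)
            out[i] = self.activation_fn(out[i])
            self.out = out[i]
        return out

# hypothetical usage with random weights (all shapes are assumptions)
layer = RecurrentLayer(weights=np.random.randn(10, 6),
                       bias=np.zeros(6),
                       recurrent_weights=np.random.randn(6, 6),
                       out_init=np.zeros(6))
block_1 = layer.activate(np.random.randn(100, 10))
block_2 = layer.activate(np.random.randn(100, 10))  # continues from block_1
```

With this layout, a second call to `activate()` in streaming or block-wise mode automatically continues from the last output of the previous block.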
        self.input_gate = input_gate
        self.forget_gate = forget_gate
        self.cell = cell
        self.output_gate = output_gate
        self.activation_fn = activation_fn
        self.out_init = out_init
        self.state_init = state_init
As above, I'm not really happy with the names here. Also, for streaming mode, the output of the previous step must be accessible anyways, e.g. as `self.out`.
As for `state_init`, I am thinking of moving this as `init` to the cell and storing the cell's state there -- this is basically what it is. This needs some changes in the `activate()` method, but is the cleaner way to solve this (IMO).
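A rough sketch of that direction, assuming a simple NumPy cell; the `Cell` class, the `init` keyword and the `activate()` signature are assumptions for illustration, not the actual madmom API:

```python
import numpy as np

class Cell(object):
    """Sketch: the cell owns its own (possibly learned) initial state."""

    def __init__(self, weights, bias, recurrent_weights, init=None):
        self.weights = weights
        self.bias = bias
        self.recurrent_weights = recurrent_weights
        # the cell's state lives on the cell itself; `init` replaces the
        # layer-level `state_init` and can hold a learned value
        self.state = init if init is not None else np.zeros_like(bias)

    def activate(self, data, prev_out):
        # candidate cell input from the current frame and the previous output
        return np.tanh(np.dot(data, self.weights) +
                       np.dot(prev_out, self.recurrent_weights) + self.bias)
```

The LSTM layer's `activate()` would then read the previous state from `self.cell.state` and write the updated state back to it after every step, instead of keeping a separate `state_init` on the layer.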
Since some of your proposed changes require more refactoring, I guess it would really be better to do this together with #185. It will be less work in total.
Yes, I will work on this next week, since we need NNs in #185 anyways.
Please see #235 for my first draft. If that PR addresses all your points we can close this PR.
Closing, since #243 is merged.
This is a first implementation of initial states for RNN and LSTM layers. Needs some testing. Addresses #230.