Long Short-Term Memory (LSTM) is a type of Recurrent Neural Network (RNN) that can capture long-term dependencies in sequential data. LSTMs can process and analyze sequential data such as time series, text, and speech. They use a memory cell and gates to control the flow of information, allowing them to selectively retain or discard information as needed and thus avoid the vanishing gradient problem that plagues traditional RNNs. LSTMs are widely used in applications such as natural language processing, speech recognition, and time series forecasting.
An LSTM improves on a simple RNN because, in addition to the state of the previous input, it carries memories of earlier inputs in the sequence, which is not the case with a plain RNN. LSTMs are therefore well suited to processing long sequences, and in conversational AI they are used to predict the next word.
Code and Resources Used
Language: Python 3.11
Dataset: Chatterbot Kaggle English Dataset
Packages Used: numpy, tensorflow, pickle, keras
Model Used: Seq2Seq LSTM model
API Used: Keras Functional API
- The dataset hails from chatterbot/english on Kaggle.com by kausr25.
- It contains pairs of questions and answers on a number of subjects like food, history, AI, etc.
Parse each .yml file:
- Concatenate two or more sentences if the answer has two or more of them.
- Remove unwanted data types which are produced while parsing the data.
- Append <START> and <END> tags to all the answers.
- Create a Tokenizer and load the whole vocabulary ( questions + answers ) into it.
- The three arrays required by the model are encoder_input_data, decoder_input_data and decoder_output_data (a data-preparation sketch follows this list).
- encoder_input_data: Tokenize the questions and pad them to the maximum length.
- decoder_input_data: Tokenize the answers and pad them to the maximum length.
- decoder_output_data: Tokenize the answers and remove the first element from all the tokenized answers. This is the <START> element which we added earlier.
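The snippet below is a minimal sketch of these preparation steps, not the repository's exact code; the data directory name and the variable names (questions, answers, max_q, max_a, VOCAB_SIZE) are assumptions introduced here for illustration.

```python
import os
import yaml
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

# Parse every .yml file and collect (question, answer) pairs.
questions, answers = [], []
for fname in os.listdir('data'):                                  # assumed data directory
    with open(os.path.join('data', fname)) as f:
        for pair in yaml.safe_load(f)['conversations']:
            q = pair[0]
            a = ' '.join(str(s) for s in pair[1:])                # concatenate multi-sentence answers
            if isinstance(q, str):                                # drop unwanted (non-string) entries
                questions.append(q)
                answers.append('<START> ' + a + ' <END>')         # append start/end tags
                # NB: the default Tokenizer filters strip '<' and '>',
                # so the tags end up as the plain tokens 'start' and 'end'.

# One Tokenizer over the whole vocabulary (questions + answers).
tokenizer = Tokenizer()
tokenizer.fit_on_texts(questions + answers)
VOCAB_SIZE = len(tokenizer.word_index) + 1

tok_q = tokenizer.texts_to_sequences(questions)
tok_a = tokenizer.texts_to_sequences(answers)
max_q = max(len(s) for s in tok_q)
max_a = max(len(s) for s in tok_a)

# The three arrays required by the model.
encoder_input_data = pad_sequences(tok_q, maxlen=max_q, padding='post')
decoder_input_data = pad_sequences(tok_a, maxlen=max_a, padding='post')
decoder_output_data = to_categorical(                             # answers shifted left by one token
    pad_sequences([s[1:] for s in tok_a], maxlen=max_a, padding='post'),
    num_classes=VOCAB_SIZE)
```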
- The model will have Embedding, LSTM and Dense layers. The basic configuration is as follows (a sketch using the Keras Functional API follows this list):
- 2 Input layers: one for encoder_input_data and another for decoder_input_data.
- Embedding layer: for converting integer tokens to fixed-size dense vectors. (Note: don't forget the mask_zero=True argument here.)
- LSTM layer: provides access to Long Short-Term Memory cells.
- The encoder_input_data goes into the Embedding layer (encoder_embedding).
- The output of the Embedding layer goes to the LSTM cell, which produces 2 state vectors (h and c, the encoder_states).
- These states are set as the initial states of the decoder's LSTM cell.
- The decoder_input_data comes in through its own Embedding layer.
- The embeddings go into the LSTM cell (which has the encoder states) to produce sequences, and a final Dense layer with softmax maps each step to the vocabulary.
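Below is a minimal sketch of this architecture with the Keras Functional API; the 200-dimensional embedding/state size is an assumption for illustration, and variable-length Input shapes are used so the same layers can be reused at inference time.

```python
from tensorflow.keras import layers, models

# Encoder: question tokens -> embedding -> LSTM -> (h, c) state vectors.
encoder_inputs = layers.Input(shape=(None,))
encoder_embedding = layers.Embedding(VOCAB_SIZE, 200, mask_zero=True)(encoder_inputs)
_, state_h, state_c = layers.LSTM(200, return_state=True)(encoder_embedding)
encoder_states = [state_h, state_c]

# Decoder: answer tokens -> embedding -> LSTM seeded with the encoder states.
decoder_inputs = layers.Input(shape=(None,))
decoder_embedding = layers.Embedding(VOCAB_SIZE, 200, mask_zero=True)(decoder_inputs)
decoder_lstm = layers.LSTM(200, return_state=True, return_sequences=True)
decoder_outputs, _, _ = decoder_lstm(decoder_embedding, initial_state=encoder_states)

# Dense softmax layer maps every decoder step to a distribution over the vocabulary.
decoder_dense = layers.Dense(VOCAB_SIZE, activation='softmax')
output = decoder_dense(decoder_outputs)

model = models.Model([encoder_inputs, decoder_inputs], output)
model.summary()
```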
- Long Short-Term Memory (LSTM) networks are a type of recurrent neural network capable of learning order dependence in sequence prediction problems.
- This is a behavior required in complex problem domains like machine translation, speech recognition, and more.
- The success of LSTMs lies in being one of the first architectures to overcome these technical problems and deliver on the promise of recurrent neural networks.
- An LSTM network is composed of different memory blocks called cells (the rectangles we see in the diagram).
- There are two states that are being transferred to the next cell; the cell state and the hidden state.
- The memory blocks are responsible for remembering things, and manipulations of this memory are done through three major mechanisms, called gates.
- The key to LSTMs is the cell state, the horizontal line running through the top of the diagram.
- The cell state is kind of like a conveyor belt. It runs straight down the entire chain, with only some minor linear interactions. It’s very easy for information to just flow along it unchanged.
- The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures called gates.
- Gates are a way to optionally let information through. They are composed out of a sigmoid neural net layer and a pointwise multiplication operation.
- The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through. A value of zero means “let nothing through,” while a value of one means “let everything through!”
- LSTMs are explained in much more detail in Colah's blog post, Understanding LSTM Networks. A single-step sketch of the gate equations follows.
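To make the gate mechanics above concrete, here is a small NumPy sketch of a single LSTM cell step; the stacked-weight layout and names are assumptions for illustration only and are unrelated to the chatbot code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the stacked weights for the
    forget, input, candidate and output transforms (4 * hidden rows)."""
    hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b                     # all four pre-activations at once
    f = sigmoid(z[0 * hidden:1 * hidden])            # forget gate: what to discard from c_prev
    i = sigmoid(z[1 * hidden:2 * hidden])            # input gate: what new info to store
    g = np.tanh(z[2 * hidden:3 * hidden])            # candidate cell values
    o = sigmoid(z[3 * hidden:4 * hidden])            # output gate: what to expose as h_t
    c_t = f * c_prev + i * g                         # the "conveyor belt" cell state update
    h_t = o * np.tanh(c_t)                           # new hidden state
    return h_t, c_t

# Tiny usage example with random weights (purely illustrative).
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
h, c = np.zeros(n_hid), np.zeros(n_hid)
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
```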
- We train the model for 150 epochs with the RMSprop optimizer and the categorical_crossentropy loss function (a training sketch follows).
- Model training accuracy = 0.96, i.e. 96%.
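A minimal sketch of this training step, assuming the model and data arrays from the earlier sketches; the batch size and saved filename are assumptions.

```python
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit([encoder_input_data, decoder_input_data], decoder_output_data,
          batch_size=50,                  # assumed batch size
          epochs=150)
model.save('seq2seq_chatbot.h5')          # assumed output filename
```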
- Encoder inference model: takes the question as input and outputs the LSTM states (h and c).
- Decoder inference model: takes 2 inputs, the LSTM states (the output of the encoder model) and the answer input sequences (the ones not having the <START> tag). It outputs the answers for the question which we fed to the encoder model, along with its state values. Both inference models are sketched below.
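Both inference models can be built by reusing the layers from the training model above, roughly as follows; the 200-unit state size carries over from the earlier assumption.

```python
# Encoder inference model: question in, LSTM states (h and c) out.
enc_model = models.Model(encoder_inputs, encoder_states)

# Decoder inference model: answer tokens plus previous states in,
# next-token probabilities plus updated states out.
decoder_state_input_h = layers.Input(shape=(200,))
decoder_state_input_c = layers.Input(shape=(200,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

dec_outputs, dec_state_h, dec_state_c = decoder_lstm(
    decoder_embedding, initial_state=decoder_states_inputs)
dec_model = models.Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_dense(dec_outputs), dec_state_h, dec_state_c])
```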
- First, we define a method str_to_tokens which converts a question string to integer tokens with padding.
- We take a question as input and predict the state values using enc_model.
- We set the state values in the decoder's LSTM.
- Then, we generate a sequence which contains the <START> element.
- We input this sequence into the dec_model.
- We replace the <START> element with the element which was predicted by the dec_model and update the state values.
- We carry out the above steps iteratively till we hit the <END> tag or the maximum answer length. The full decoding loop is sketched below.
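Putting the pieces together, the decoding loop looks roughly like this under the same assumptions (enc_model, dec_model, tokenizer, max_q, max_a from the earlier sketches); exact helper names in the repository may differ. Note that with the default Tokenizer filters the <START>/<END> tags are stored as the plain tokens 'start' and 'end'.

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def str_to_tokens(sentence):
    # Convert a question string to padded integer tokens.
    words = sentence.lower().split()
    tokens = [tokenizer.word_index[w] for w in words if w in tokenizer.word_index]
    return pad_sequences([tokens], maxlen=max_q, padding='post')

def chat(question):
    states = enc_model.predict(str_to_tokens(question))        # encoder states [h, c]
    target_seq = np.zeros((1, 1))
    target_seq[0, 0] = tokenizer.word_index['start']            # seed with the start tag
    answer, stop = '', False
    while not stop:
        output, h, c = dec_model.predict([target_seq] + states)
        idx = int(np.argmax(output[0, -1, :]))                  # most likely next token
        word = tokenizer.index_word.get(idx, '')
        if word == 'end' or len(answer.split()) >= max_a:       # stop on end tag or max length
            stop = True
        else:
            answer += ' ' + word
        target_seq = np.zeros((1, 1))                           # feed the prediction back in
        target_seq[0, 0] = idx
        states = [h, c]                                         # update the state values
    return answer.strip()

print(chat('How are you?'))
```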