
Structured Self-Attentive Sentence Embedding

This is an implementation of the paper "A Structured Self-Attentive Sentence Embedding" (https://arxiv.org/pdf/1703.03130.pdf), published at ICLR 2017. The paper proposes a model for extracting an interpretable sentence embedding by introducing self-attention: instead of a single vector, the embedding is a 2-D matrix, with each row of the matrix attending to a different part of the sentence.
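The core of the mechanism is computing an annotation matrix A = softmax(W_s2 · tanh(W_s1 · Hᵀ)) over the BiLSTM hidden states H, and then the embedding matrix M = A · H. The following is a minimal NumPy sketch of that computation (illustrative only, not taken from self-attention.py); the sizes n, u, da, r mirror the parameters listed further below.

```python
# Illustrative NumPy sketch of the self-attention step described above.
# H is the (n x 2u) matrix of BiLSTM hidden states for one sentence.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

n, u, da, r = 200, 64, 32, 16          # example sizes matching the parameters below
H = np.random.randn(n, 2 * u)          # BiLSTM hidden states
W_s1 = np.random.randn(da, 2 * u)      # first attention weight matrix
W_s2 = np.random.randn(r, da)          # second attention weight matrix

A = softmax(W_s2 @ np.tanh(W_s1 @ H.T), axis=1)   # (r x n) annotation matrix
M = A @ H                                         # (r x 2u) sentence embedding matrix
```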


The implementation is trained on the IMDB dataset with the following parameters:

top_words = 10000
learning_rate = 0.001
max_seq_len = 200
emb_dim = 300
batch_size = 500
u = 64
da = 32
r = 16

top_words: only the 10,000 most common words are kept
u: number of hidden units in each unidirectional LSTM
da: dimension of the hidden layer in the attention MLP (a hyperparameter that can be set arbitrarily)
r: number of different parts of the sentence to attend to, i.e. the number of rows of the embedding matrix
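As a rough sketch of how top_words and max_seq_len are typically applied when loading IMDB with Keras (the actual preprocessing in self-attention.py may differ):

```python
# Hedged sketch: standard Keras IMDB loading with the parameters listed above.
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

top_words = 10000    # keep only the 10,000 most frequent words
max_seq_len = 200    # pad/truncate every review to 200 tokens

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=top_words)
x_train = pad_sequences(x_train, maxlen=max_seq_len)
x_test = pad_sequences(x_test, maxlen=max_seq_len)
```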

To Run:

python self-attention.py

Running this for 4 epochs gives a training accuracy of 94% and test accuracy of 87%.

To Do:

Penalization term
Results on other datasets
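For reference, the penalization term proposed in the paper (not yet implemented here) is P = ||AAᵀ − I||_F², which encourages the r attention rows to focus on different parts of the sentence. A small NumPy sketch:

```python
# NumPy sketch of the paper's penalization term P = ||A·Aᵀ - I||_F²
# (listed above as a to-do; shown here only for reference).
import numpy as np

def penalization(A):
    """A: (r x n) annotation matrix from the self-attention step."""
    r = A.shape[0]
    return np.sum((A @ A.T - np.eye(r)) ** 2)   # squared Frobenius norm
```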