mimno edited this page Jul 23, 2018 · 2 revisions

Support for training word embeddings in Mallet is included in the current development release on GitHub. It is not available in the 2.0.8 release.

Importing data

Word embeddings can be trained from data files in the same format as those used for topic models. The main difference is that embedding training typically does not remove high-frequency words, because these can carry information about the syntactic function of words. For that reason the import command omits the --remove-stopwords option.

bin/mallet import-file --input history.txt --keep-sequence --output history.seq
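
For readers who want to try the import step on a small file first, here is a minimal sketch. It assumes the default import-file layout of one document per line, with a name, a label, and then the text. The file name sample.txt, the document names, the "en" labels, and the text itself are invented for illustration and are not part of the Mallet distribution.

```shell
# Hypothetical sample data: one document per line, in the assumed default
# import-file layout of <name> <label> <text>.
printf '%s\n' \
  'doc1 en The history of the printing press begins in the fifteenth century' \
  'doc2 en The press spread quickly across the cities of Europe' \
  > sample.txt

# Run the import only if a Mallet checkout is present in the current directory.
# Stopwords are deliberately left in, as described above.
if [ -x bin/mallet ]; then
  bin/mallet import-file --input sample.txt --keep-sequence --output sample.seq
fi
```

If the import runs, sample.seq can be passed to the training command in place of history.seq.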

Training embeddings

bin/mallet run cc.mallet.topics.WordEmbeddings --input history.seq