KATE: K-Competitive Autoencoder for Text

Code accompanying the paper "KATE: K-Competitive Autoencoder for Text"

Prerequisites

This code is written in Python. To use it, you will need:

Python 2.7
A recent version of NumPy
A recent version of NLTK
TensorFlow >= 1.0
Keras >= 2.0

Getting started

To preprocess a corpus (e.g., 20 Newsgroups), run the following:

    python construct_20news.py -train [train_dir] -test [test_dir] -o [out_dir] -threshold [word_freq_threshold] -topn [top_n_words]

It writes four JSON files to the [out_dir] directory: train_data, train_label, test_data, and test_label.
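The `-threshold` and `-topn` options control vocabulary selection. As a rough illustration, the sketch below assumes that words occurring fewer than `threshold` times in the corpus are dropped and that the `topn` most frequent of the remaining words form the vocabulary. `build_vocab` and `to_bow` are hypothetical helpers, and the regex tokenizer is a stand-in: the actual script may tokenize, stem, or filter stop words differently (NLTK is listed as a prerequisite).

```python
import re
from collections import Counter


def tokenize(doc):
    # Simple lowercase alphabetic tokenizer (a stand-in for the real preprocessing).
    return re.findall(r"[a-z]+", doc.lower())


def build_vocab(docs, threshold, topn):
    """Hypothetical vocabulary builder mirroring -threshold / -topn."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    # Assumption: keep words whose corpus frequency is at least `threshold`.
    kept = [(w, c) for w, c in counts.items() if c >= threshold]
    # Most frequent first; ties broken alphabetically so the result is deterministic.
    kept.sort(key=lambda wc: (-wc[1], wc[0]))
    return {w: i for i, (w, _) in enumerate(kept[:topn])}


def to_bow(doc, vocab):
    """Map a document to a sparse {word_id: count} bag of words."""
    bow = Counter(vocab[w] for w in tokenize(doc) if w in vocab)
    return dict(bow)
```

For example, `build_vocab(["the cat sat", "the cat ran", "a dog ran"], threshold=2, topn=2)` keeps only words seen at least twice and returns the two most frequent of them.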

To train the KATE model, just run the following:

    python train.py -i [train_data] -nd [num_topics] -ne [num_epochs] -bs [batch_size] -nv [num_validation] -ctype kcomp -ck [top_k] -sm [model_file]
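The `-ctype kcomp -ck [top_k]` options select the k-competitive hidden layer that gives KATE its name. The NumPy sketch below follows our reading of the competition rule described in the KATE paper: on a tanh hidden vector, the ceil(k/2) largest positive units and the floor(k/2) most negative units win, all other units are set to zero, and the summed activation of each side's losers is amplified by a factor `alpha` and added to that side's winners. `k_competitive` and `alpha` are illustrative names, not the repository's API, and `alpha` has no default here because this README does not state the value used.

```python
import numpy as np


def k_competitive(h, k, alpha):
    """Sketch of the k-competitive step on one tanh hidden vector h.

    Positive side: the (k + 1) // 2 largest positive units win.
    Negative side: the k // 2 most negative units win.
    Losers are zeroed; each side's loser energy, scaled by alpha,
    is added to that side's winners.
    """
    h = np.asarray(h, dtype=float)
    out = np.zeros_like(h)
    k_pos, k_neg = (k + 1) // 2, k // 2

    # Positive side: rank positive units from largest to smallest.
    pos = np.where(h > 0)[0]
    pos = pos[np.argsort(-h[pos], kind="stable")]
    winners, losers = pos[:k_pos], pos[k_pos:]
    out[winners] = h[winners] + alpha * h[losers].sum()

    # Negative side: rank negative units from most to least negative.
    neg = np.where(h < 0)[0]
    neg = neg[np.argsort(h[neg], kind="stable")]
    winners, losers = neg[:k_neg], neg[k_neg:]
    out[winners] = h[winners] + alpha * h[losers].sum()  # loser sum is <= 0 here
    return out
```

For instance, `k_competitive([0.9, 0.1, 0.2, -0.8, -0.3, 0.0], k=2, alpha=1.0)` keeps only the strongest positive and strongest negative unit, giving roughly `[1.2, 0, 0, -1.1, 0, 0]`. This only illustrates the forward computation; in training, the competition sits inside the model so that gradients flow through the winning units.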

To predict on the test set, just run the following:

    python pred.py -i [test_data] -lm [model_file] -o [output_doc_vec_file] -st [output_topics] -sw [output_sample_words] -wc [output_word_clouds]
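The document vectors written by `-o` come from KATE's encoder. A minimal sketch, assuming the input is a log-normalized word-count vector (log(1 + count) divided by its maximum over the document, per our reading of the paper), that the encoder weight matrix `W` has shape (vocab_size, num_topics), and that the document vector is simply the tanh encoder activation. `log_normalize` and `encode` are hypothetical helpers, and the sketch does not claim to reproduce exactly what `pred.py` computes.

```python
import numpy as np


def log_normalize(counts):
    """Assumed input scaling: log(1 + count), divided by the document's maximum."""
    x = np.log1p(np.asarray(counts, dtype=float))
    m = x.max()
    return x / m if m > 0 else x


def encode(counts, W, b):
    """Document vector = tanh(x W + b), with W of shape (vocab_size, num_topics)."""
    return np.tanh(log_normalize(counts).dot(W) + b)
```

With `-nd 20`, each document would map to a 20-dimensional vector like the ones visualized below.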

Architecture

Experiment results on 20 Newsgroups

PCA on the 20-D document vectors
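A 2-D PCA view like this one can be produced by centering the document vectors and projecting them onto their top two principal directions. The sketch assumes the vectors have been loaded into an (n_docs, 20) NumPy array; `pca_2d` is a hypothetical helper, and the original figure may have been made with a different library (e.g., scikit-learn).

```python
import numpy as np


def pca_2d(X):
    """Project rows of X (n_docs x dim) onto their top two principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each dimension
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc.dot(Vt[:2].T)
```

The first output column always carries at least as much variance as the second, so it serves as the horizontal axis in a scatter plot colored by newsgroup label.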

TSNE on the 20-D document vectors

Five nearest neighbors in the word representation space
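One way to obtain such neighbor lists is sketched below, under two assumptions not stated in this README: each word is represented by its row of the encoder weight matrix `W` (shape vocab_size x num_topics), and similarity is cosine similarity. `nearest_words` is a hypothetical helper, and `vocab` maps words to row indices.

```python
import numpy as np


def nearest_words(word, vocab, W, n=5):
    """Return the n words whose rows of W are most cosine-similar to `word`'s row."""
    inv = {i: w for w, i in vocab.items()}
    W = np.asarray(W, dtype=float)
    # L2-normalize rows so that dot products equal cosine similarities.
    E = W / np.maximum(np.linalg.norm(W, axis=1, keepdims=True), 1e-12)
    q = vocab[word]
    sims = E.dot(E[q])
    sims[q] = -np.inf  # exclude the query word itself
    order = np.argsort(-sims, kind="stable")[:n]
    return [inv[i] for i in order]
```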

Extracted topics
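A common way to read topics from an autoencoder, and our understanding of the approach taken for KATE, is to treat each hidden unit as a topic and list the words with the largest weights feeding into it. The sketch assumes an encoder weight matrix `W` of shape (vocab_size, num_topics); `extract_topics` is a hypothetical helper rather than the repository's `-st` implementation.

```python
import numpy as np


def extract_topics(W, vocab, topn=10):
    """For each hidden unit (column of W), list the words with the largest weights."""
    inv = {i: w for w, i in vocab.items()}
    W = np.asarray(W, dtype=float)
    topics = []
    for j in range(W.shape[1]):
        top = np.argsort(-W[:, j], kind="stable")[:topn]
        topics.append([inv[i] for i in top])
    return topics
```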

Reference

If you find this code useful, please cite the following paper:

Yu Chen and Mohammed J. Zaki. "KATE: K-Competitive Autoencoder for Text." In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug 2017.

    @inproceedings{chen2017kate,
      author = {Yu Chen and Mohammed J. Zaki},
      title = {KATE: K-Competitive Autoencoder for Text},
      booktitle = {Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
      doi = {10.1145/3097983.3098017},
      year = {2017},
      month = {Aug}
    }
