MABED

Mention-anomaly-based Event Detection and Tracking in Twitter

Author: Adrien GUILLE

Details of this program are described in the following paper:

Adrien Guille and Cécile Favre (2014) 
Mention-Anomaly-Based Event Detection and Tracking in Twitter.
In Proceedings of the 2014 IEEE/ACM International Conference on
Advances in Social Network Mining and Analysis (ASONAM 2014),
pp. 375-382, DOI: 10.1109/ASONAM.2014.6921613

Please cite this paper when using the program.

Files in the Directory

input/: input files that describe the corpus in which we want to detect events
MABED.jar: Java program that does the event detection
README.txt: this file
parameters.txt: Java properties file in which parameters are set
stopwords.txt: a list of common stopwords to remove when generating the vocabulary
lib/: program dependencies

Preparing the Input from a File with All Tweets (Optional)

If the program is called with the argument "-split", it expects the following file in the "dataset/" directory:

<name_file.csv>.text: content of all tweets, one line per tweet. Each line should follow the format "timestamp","tweet message", with the timestamp formatted as YYYY-MM-DD HH:mm:ss.S (see Input Format below).
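
As an illustration of the line format above, here is a minimal sketch of how one such line could be split into its two fields. It is not taken from MABED's source: the class name TweetLineParser and the rule that the first quote-comma-quote sequence separates the fields are assumptions made for this example, and escaped quotes inside the message are not handled.

```java
// Illustrative sketch only; MABED's actual parsing code may differ.
class TweetLineParser {
    // Splits one line of the form "timestamp","tweet message" into
    // {timestamp, message}. Assumes both fields are double-quoted and that
    // the first quote-comma-quote sequence separates them (an assumption).
    static String[] parse(String line) {
        String trimmed = line.trim();
        int separator = trimmed.indexOf("\",\"");
        boolean wellFormed = trimmed.startsWith("\"")
                && trimmed.endsWith("\"")
                && separator >= 1
                && separator + 3 <= trimmed.length() - 1;
        if (!wellFormed) {
            throw new IllegalArgumentException("Unexpected line format: " + line);
        }
        // Skip the opening quote; stop before the closing quote of the field.
        String timestamp = trimmed.substring(1, separator);
        // Skip the three separator characters; drop the final closing quote.
        String message = trimmed.substring(separator + 3, trimmed.length() - 1);
        return new String[] { timestamp, message };
    }

    public static void main(String[] args) {
        String[] fields = parse("\"2009-11-01 00:01:24.0\",\"Hello, world\"");
        System.out.println(fields[0]); // 2009-11-01 00:01:24.0
        System.out.println(fields[1]); // Hello, world
    }
}
```

A comma inside the message is preserved, because only the quote-comma-quote sequence is treated as the separator.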

Input Format

If the program is called with the argument "-run", it expects two sets of files in the "input/" directory:

<time_slice>.text: content of the messages, one line per message;
<time_slice>.time: timestamps of the messages, one per line; each line corresponds to the message on the same line number in <time_slice>.text. Timestamps should follow the format YYYY-MM-DD HH:mm:ss.S (e.g. 2009-11-01 00:01:24.0).

Time-slices are expected to be numbered starting from 0, and files are expected to be named with 8 zero-padded digits (e.g. 00000000.text, 00000000.time, 00000001.text, 00000001.time).
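
If you generate the input files yourself, the naming and timestamp conventions above can be checked with a few lines of Java. The following is a minimal sketch, not part of MABED: the class TimeSliceNaming and its methods are hypothetical, and the SimpleDateFormat pattern "yyyy-MM-dd HH:mm:ss.S" is our reading of the documented format.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Illustrative helper only; not part of MABED.
class TimeSliceNaming {
    // Pattern assumed to match the documented format, e.g. 2009-11-01 00:01:24.0
    static final String TIMESTAMP_PATTERN = "yyyy-MM-dd HH:mm:ss.S";

    // Builds the 8-digit, zero-padded base name for a time-slice index.
    static String baseName(int sliceIndex) {
        return String.format(Locale.ROOT, "%08d", sliceIndex);
    }

    // Returns true if the value parses strictly against TIMESTAMP_PATTERN.
    static boolean isValidTimestamp(String value) {
        SimpleDateFormat format = new SimpleDateFormat(TIMESTAMP_PATTERN, Locale.ROOT);
        format.setLenient(false); // reject out-of-range fields such as month 13
        try {
            format.parse(value);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(baseName(0) + ".text"); // 00000000.text
        System.out.println(baseName(1) + ".time"); // 00000001.time
        System.out.println(isValidTimestamp("2009-11-01 00:01:24.0")); // true
    }
}
```

SimpleDateFormat is used here instead of java.time so that the sketch also compiles on Java 7.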

Parameter Setting

All the parameters are set in the parameters.txt file:

prepareCorpus (boolean): if you are running MABED for the first time, or if the content of the input directory has been modified, this parameter should be set to 'true', otherwise 'false';
timeSliceLength (int): length of each time-slice, expressed in minutes (e.g. 30);
numberOfThreads (int): the number of threads used by MABED (if > 1, then the parallelized implementation of MABED is executed);
k (int): desired number of events (e.g. 40);
p (int): maximum number of related words describing each event (e.g. 10);
theta (double): minimum weight of each related word (e.g. 0.7);
sigma (double): merging threshold (e.g. 0.5);
stopwords (String): name of the file that lists the stopwords, one word per line (e.g. stopwords.txt);
minSupport (double): minimum support of words in the vocabulary (e.g. 0);
maxSupport (double): maximum support of words in the vocabulary (e.g. 1).
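
Since parameters.txt is a Java properties file, it can be read with java.util.Properties. The sketch below shows one way to load and type-convert the parameters listed above. It is not MABED's own loading code: the class ParametersExample is hypothetical, the key=value layout is assumed from the properties format, and the sample values for prepareCorpus and numberOfThreads are chosen for illustration (the others are the examples given above).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Illustrative sketch only; MABED's own loading code may differ.
class ParametersExample {
    // Sample content in key=value form. The prepareCorpus and
    // numberOfThreads values are assumptions made for this example.
    static final String SAMPLE =
            "prepareCorpus=true\n"
            + "timeSliceLength=30\n"
            + "numberOfThreads=1\n"
            + "k=40\n"
            + "p=10\n"
            + "theta=0.7\n"
            + "sigma=0.5\n"
            + "stopwords=stopwords.txt\n"
            + "minSupport=0\n"
            + "maxSupport=1\n";

    // Parses properties-formatted text into a Properties object.
    static Properties load(String content) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(content));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read parameters", e);
        }
        return properties;
    }

    public static void main(String[] args) {
        Properties parameters = load(SAMPLE);
        // Values are stored as strings and converted to the documented types.
        boolean prepareCorpus = Boolean.parseBoolean(parameters.getProperty("prepareCorpus"));
        int k = Integer.parseInt(parameters.getProperty("k"));
        double theta = Double.parseDouble(parameters.getProperty("theta"));
        System.out.println(prepareCorpus + " " + k + " " + theta); // true 40 0.7
    }
}
```

To read the actual file rather than the embedded sample, pass a java.io.FileReader opened on parameters.txt to Properties.load.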

Running the program

Requirements: Java (7+)
Execute the program MABED.jar with the following command: "java -jar MABED.jar -run". It should process the input and save the output in the "output/" directory.
To generate input files from a '.csv' file containing all tweets (timestamps and messages), execute the program MABED.jar with the following command: "java -jar MABED.jar -split timeSliceLength name_file.csv". It should process the file containing all tweets, and save the split files in the "input/" directory.
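
To give an intuition for how timeSliceLength relates to the numbered files produced by "-split", here is a minimal sketch of one plausible way to assign a message to a time-slice. It is an assumption, not a description of what MABED.jar does internally: the class TimeSliceIndex is hypothetical, and the sketch assumes slices are counted from the earliest timestamp in the file.

```java
// Illustrative sketch only; the actual -split logic in MABED.jar may differ.
class TimeSliceIndex {
    // Returns the zero-based slice index of a message, assuming slices are
    // counted from the earliest timestamp in the file (an assumption).
    static int sliceIndex(long timestampMillis, long firstTimestampMillis,
                          int timeSliceLengthMinutes) {
        long sliceMillis = timeSliceLengthMinutes * 60000L;
        return (int) ((timestampMillis - firstTimestampMillis) / sliceMillis);
    }

    public static void main(String[] args) {
        long first = 0L;
        long fortyFiveMinutesLater = 45 * 60000L;
        // With 30-minute slices, a message 45 minutes in falls in slice 1.
        System.out.println(sliceIndex(fortyFiveMinutesLater, first, 30)); // 1
    }
}
```

Under this assumption, the message in the example would be written to 00000001.text and its timestamp to 00000001.time.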
