This repository contains the GraphDocExplore framework. GraphDocExplore provides
- an intuitive web interface for graph-based document exploration
- a generic interface to plug in different methods to extract graphs from text
- an example implementation of this interface that creates entity graphs
- extensive logging capabilities to support user studies
You can access a demo of the user interface here.
For more details on the features of this framework, please refer to our publication at EMNLP 2017.
Please use the following citation if you make use of the framework in your own work:
@inproceedings{TUD-CS-2017-0153,
title = {GraphDocExplore: A Framework for the Experimental Comparison of Graph-based Document Exploration Techniques},
author = {Falke, Tobias and Gurevych, Iryna},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
pages = {(to appear)},
year = {2017},
location = {Copenhagen, Denmark}
}
Abstract: Graphs have long been proposed as a tool to browse and navigate in a collection of documents in order to support exploratory search. Many techniques to automatically extract different types of graphs, showing for example entities or concepts and different relationships between them, have been suggested. While experimental evidence that they are indeed helpful exists for some of them, it is largely unknown which type of graph is most helpful for a specific exploratory task. However, carrying out experimental comparisons with human subjects is challenging and time-consuming. Towards this end, we present the GraphDocExplore framework. It provides an intuitive web interface for graph-based document exploration that is optimized for experimental user studies. Through a generic graph interface, different methods to extract graphs from text can be plugged into the system. Hence, they can be compared at minimal implementation effort in an environment that ensures controlled comparisons. The system is publicly available under an open-source license.
Contacts
Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
The framework is built in Java (backend) and AngularJS (web UI). Documents are indexed and searched with Solr.
It is a Java Maven project with several modules:
- graphs: central Java data structures, used by the webapp and graph extraction modules
- webapp: backend and frontend code of the web application
- indexer: simple command-line application to index documents in Solr
- graphs-ne-impl: example implementation of a graph extraction module, creating entity co-occurrence graphs with Stanford NER
For more details, please refer to the respective module's README file.
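To give a rough idea of what an entity co-occurrence graph is, the following is a minimal, self-contained sketch. It is not the code of this repository: the `GraphExtractor` interface, the `EntityCooccurrenceExtractor` class and the map-based graph representation are hypothetical names chosen for illustration, and the entity mentions are hard-coded instead of being produced by Stanford NER. It only assumes that two entities are linked whenever they are mentioned in the same sentence, with the edge weight counting the number of such sentences.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class CooccurrenceSketch {

    /** Hypothetical plug-in point; NOT the interface defined in the graphs module. */
    interface GraphExtractor {
        /** Maps each node to its neighbours and the weight of the connecting edge. */
        Map<String, Map<String, Integer>> extract(List<List<String>> entitiesPerSentence);
    }

    /** Links two entities whenever they are mentioned in the same sentence. */
    static class EntityCooccurrenceExtractor implements GraphExtractor {
        @Override
        public Map<String, Map<String, Integer>> extract(List<List<String>> entitiesPerSentence) {
            Map<String, Map<String, Integer>> graph = new TreeMap<>();
            for (List<String> mentions : entitiesPerSentence) {
                // Deduplicate and sort so each unordered pair is counted once per sentence.
                List<String> entities = new ArrayList<>(new TreeSet<>(mentions));
                for (int i = 0; i < entities.size(); i++) {
                    for (int j = i + 1; j < entities.size(); j++) {
                        // Store the undirected edge in both directions.
                        addEdge(graph, entities.get(i), entities.get(j));
                        addEdge(graph, entities.get(j), entities.get(i));
                    }
                }
            }
            return graph;
        }

        private static void addEdge(Map<String, Map<String, Integer>> graph, String from, String to) {
            graph.computeIfAbsent(from, k -> new TreeMap<>()).merge(to, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        // Assumption: an NER tagger has already produced the entity mentions of each sentence.
        List<List<String>> sentences = Arrays.asList(
                Arrays.asList("Copenhagen", "Denmark"),
                Arrays.asList("Copenhagen", "Denmark", "EMNLP"),
                Arrays.asList("EMNLP"));
        GraphExtractor extractor = new EntityCooccurrenceExtractor();
        extractor.extract(sentences)
                .forEach((node, edges) -> System.out.println(node + " -> " + edges));
    }
}
```

In this sketch, an entity that never shares a sentence with another entity does not appear as a node. Whether isolated nodes should be kept is a design choice that the actual module may handle differently.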
To build the full project, run Maven for the parent POM (this folder), which should compile, test and package all modules. The WAR file created in the webapp module can then be deployed on a web server (e.g. Tomcat).
For instructions on setting up the system in your environment, please refer to webapp/README.md.