Home
The EXCITEMENT Open Platform (EOP) is a generic architecture and a comprehensive implementation for textual inference in multiple languages. The platform includes state-of-the-art algorithms, a large number of knowledge resources, and facilities for experimenting with and testing innovative approaches.
New Features:
- AdArte (A Transformation-Driven Approach for Recognizing Textual Entailment) models entailment recognition as a classification problem. Each T-H pair is first represented by the sequence of edit operations (deleting, replacing, and inserting pieces of text), called transformations, needed to turn T into H; these transformations are then used as features for a supervised classifier that labels the pair as a positive or negative example.
- An installation script that installs the EOP and, once you have read and agreed to its licence, TreeTagger.
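
The transformation idea behind AdArte can be illustrated with a minimal sketch. It is not the EOP implementation: it assumes whitespace tokenisation and uses Python's standard `difflib` as a stand-in for the real edit-operation extraction, and the function names `transformations` and `featurize` are hypothetical. It only shows how the operations needed to turn T into H can become a bag of features for a supervised classifier.

```python
from collections import Counter
from difflib import SequenceMatcher


def transformations(text, hypothesis):
    """Return token-level edit operations that turn `text` (T) into `hypothesis` (H).

    Assumption: lower-cased whitespace tokens; the real system may work on
    richer linguistic structures.
    """
    t = text.lower().split()
    h = hypothesis.lower().split()
    ops = []
    matcher = SequenceMatcher(a=t, b=h, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            ops.extend(f"del:{w}" for w in t[i1:i2])
        elif tag == "insert":
            ops.extend(f"ins:{w}" for w in h[j1:j2])
        elif tag == "replace":
            ops.append(f"rep:{' '.join(t[i1:i2])}->{' '.join(h[j1:j2])}")
        # "equal" spans need no transformation and are skipped
    return ops


def featurize(text, hypothesis):
    """Bag-of-transformations feature vector for one T-H pair."""
    return Counter(transformations(text, hypothesis))


if __name__ == "__main__":
    print(featurize("the black cat sleeps", "the cat sleeps"))
```

The resulting counters could be passed to any supervised learner (for example after vectorising them) together with the positive or negative labels of the training pairs.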
Bug Fixes:
- Wrong Italian part-of-speech mapping.
Known Bugs and Limitations:
- The scorer evaluates binary classification problems only.
- Some dependency versions in the adarte project file are wrong (1.2.0 instead of 1.2.1). For the fix, see: https://github.com/hltfbk/EOP-1.2.1/wiki/AdArte
- The EXCITEMENT data sets (Kotlerman et al., forthcoming) contain negative customer feedback in which customers state their reasons for dissatisfaction with a given company. The data sets are available for English and Italian. For each language, the release comprises four data sets, organised along two orthogonal dimensions: balanced-unbalanced and mixed-pure. Balanced-unbalanced indicates whether the data set contains a comparable number of positive and negative examples (balanced) or not (unbalanced). Mixed-pure indicates whether the T-H pairs of a given topic are distributed equally between the training and test sets (mixed) or appear in only one of them (pure).
- RTE-3 for the Bulgarian language (thanks to Iliana Simova and the BulTreeBank team)
- SICK (Marelli et al., 2014) is the data set that was used at SemEval-2014 for the two subtasks of (i) Relatedness (i.e., predicting the degree of semantic similarity between two sentences), and (ii) Entailment (i.e., detecting the entailment relation holding between two sentences).
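
The two dimensions used to organise the EXCITEMENT data sets (balanced-unbalanced and mixed-pure) can be checked on a given split with a short sketch. Everything here is an assumption made for illustration: the pair representation (dictionaries with `topic` and `label` keys), the `ENTAILMENT` label string, the 10% tolerance for "comparable", and the function names. The mixed check is also simplified: it only tests whether topics are shared between training and test data, not whether they are distributed equally.

```python
def topic_kind(train, test):
    """Return "pure" if no topic occurs in both sets, otherwise "mixed"."""
    train_topics = {pair["topic"] for pair in train}
    test_topics = {pair["topic"] for pair in test}
    return "pure" if train_topics.isdisjoint(test_topics) else "mixed"


def balance_kind(pairs, positive_label="ENTAILMENT", tolerance=0.1):
    """Return "balanced" if the positive share is within `tolerance` of 50%."""
    positives = sum(1 for pair in pairs if pair["label"] == positive_label)
    share = positives / len(pairs)
    return "balanced" if abs(share - 0.5) <= tolerance else "unbalanced"


if __name__ == "__main__":
    train = [
        {"topic": "billing", "label": "ENTAILMENT"},
        {"topic": "billing", "label": "NONENTAILMENT"},
    ]
    test = [
        {"topic": "delivery", "label": "ENTAILMENT"},
        {"topic": "delivery", "label": "NONENTAILMENT"},
    ]
    print(topic_kind(train, test), balance_kind(train + test))
```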