rzanoli edited this page Jan 23, 2014 · 120 revisions

These quick start guides provide a few examples (i.e. use cases) that you can follow to install and use EOP in a few minutes. No advanced installation options are discussed here, just the basics that will work for most users who want to get started.

The section proposes these use cases:

  • Downloading and Installing EOP [d:2, t:2]
  • Annotating a single text/hypothesis pair by using a pre-trained model [d:1, t:1]
  • Annotating a data set of multiple text/hypothesis pairs by using a pre-trained model [d:1, t:2]
  • Creating and testing a new model [d:1, t:3]
  • Installing and using BIUTEE EDA [d:3, t:5]

To the right of each use case you can see the level of difficulty of the task (e.g. d:1) and the lead time needed to complete it (e.g. t:2). Each measure is a number ranging from 1 to 5:

  • d:1 (easy), d:5 (difficult)
  • t:1 (quick), t:5 (time consuming)

Downloading and Installing EOP

Goal: downloading and installing EOP by using the .tar.gz (gzip) archive file of its distribution.

Prerequisite: EOP hardware and software requirements have been met.

Main steps:

  1. downloading the .tar.gz (gzip) archive file
  2. building the code
  3. downloading and installing the file of the resources (i.e. configuration files, models and lexical resources) needed to use the platform.
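
Part of the prerequisite can be checked from the shell before starting. The following is a minimal sketch, not an official requirements check: it only tests that the command-line tools invoked later in this guide (tar, unzip, java, mvn) can be found on the PATH. The function name check_tools is an illustrative assumption, and the sketch does not verify versions or hardware requirements.

```shell
# Report which of the given command-line tools cannot be found on the PATH.
# Prints one line per missing tool and returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools invoked by the commands in this guide.
check_tools tar unzip java mvn || echo "Install the missing tools before continuing."
```

If nothing is printed, all four tools were found.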

Downloading the .tar.gz (gzip) archive file

EOP provides different distributions for users. In the running example we will use the source code in the .tar.gz (gzip) archive file that has to be downloaded and unpacked before using it.

  • Download the Excitement-Open-Platform-1.0.2.tar.gz archive file of the code.

  • Copy the archive file from the directory where it has been saved into the directory where you want to have it, e.g. your home directory:

> cp Excitement-Open-Platform-1.0.2.tar.gz ~/
  • Go into your home directory and extract/unpack it, i.e.
> cd ~/
> tar -xvzf Excitement-Open-Platform-1.0.2.tar.gz

It will create the directory Excitement-Open-Platform-1.0.2 containing the source code.
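
If you prefer to script this step, the extraction can be combined with a check that the expected directory was really created. This is only a sketch under stated assumptions: unpack_and_verify is a hypothetical helper name, and it relies on the -C option of tar, which the common GNU and BSD implementations support.

```shell
# Extract a .tar.gz archive and confirm that the expected directory appears.
unpack_and_verify() {
    archive=$1     # path to the .tar.gz file
    dest=$2        # existing directory to extract into
    expected=$3    # directory name the archive is expected to create

    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    tar -xzf "$archive" -C "$dest" || return 1
    if [ ! -d "$dest/$expected" ]; then
        echo "expected directory not found: $dest/$expected" >&2
        return 1
    fi
    echo "unpacked: $dest/$expected"
}

# Example, matching the commands above:
#   unpack_and_verify ~/Excitement-Open-Platform-1.0.2.tar.gz ~ Excitement-Open-Platform-1.0.2
```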

Building the EOP code

To compile the source code and assemble the produced files, directories and the needed dependencies, the Maven tool needs to be used:

From your home directory go into the Excitement-Open-Platform-1.0.2 directory, i.e.

> cd Excitement-Open-Platform-1.0.2

Then, build the EOP code using the maven command, i.e.

> mvn -Dmaven.test.skip=true package assembly:assembly

It creates a directory called target in Excitement-Open-Platform-1.0.2 containing a zip file (i.e. eop-1.0.2-bin.zip) of the generated binary code.

Go into the target directory, i.e.

> cd target

and from this directory unzip the zip file created in the previous step (eop-1.0.2-bin.zip), i.e.

> unzip eop-1.0.2-bin.zip

It creates a new directory (i.e. EOP-1.0.2) containing the binary files (i.e. jar files) that you have to use to run EOP.

Downloading and installing the file of the resources

Resources like WordNet and Wikipedia, as well as the configuration files of the platform and the pre-trained models, are distributed in a separate archive file that has to be downloaded and unpacked before using it:

  • Click on this link eop-resources-1.0.2.tar.gz to download the archive file of the resources.

  • Copy the archive file into the EOP-1.0.2 directory created in the previous step, e.g.

> cp eop-resources-1.0.2.tar.gz ~/Excitement-Open-Platform-1.0.2/target/EOP-1.0.2/eop-resources-1.0.2.tar.gz
  • From the EOP-1.0.2 directory where the archive file has been saved, extract/unpack it, i.e.
> cd  ~/Excitement-Open-Platform-1.0.2/target/EOP-1.0.2/
> tar -xvzf eop-resources-1.0.2.tar.gz

It will create the directory eop-resources-1.0.2 containing all the needed files.
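
A quick way to confirm that the resources are in place is to look for the sub-directories that the commands in the following sections refer to (configuration-files, data-set and model). The sketch below does only that. The function name check_resources_layout is hypothetical, the list of sub-directories is taken from the paths used in this guide rather than from an official specification, and the results directory is left out because this guide only uses it as an output location.

```shell
# Check that a resources directory contains the sub-directories that the
# commands in this guide refer to. Prints each one that is missing and
# returns non-zero if any is missing.
check_resources_layout() {
    base=$1
    status=0
    for sub in configuration-files data-set model; do
        if [ ! -d "$base/$sub" ]; then
            echo "missing directory: $base/$sub"
            status=1
        fi
    done
    return "$status"
}

# Example (run from the EOP-1.0.2 directory):
#   check_resources_layout ./eop-resources-1.0.2
```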


Annotating a single text/hypothesis pair by using a pre-trained model

Goal: Given two text fragments, one named text and the other named hypothesis, the Entailment task consists of recognizing whether the hypothesis can be inferred from the text. The purpose of this section is to annotate a single text/hypothesis pair (i.e. to predict whether the hypothesis can be inferred from the text) by using one of the pre-trained models of Edit Distance EDA.

Prerequisite: EOP has to be already installed.

Main steps:

  • pre-processing the text/hypothesis pair with the required linguistic pipeline
  • annotating the pair with Edit Distance EDA

The Demo class is a utility class provided with EOP that can call both the linguistic analysis pipeline, to pre-process the data to be annotated, and the selected entailment algorithm (EDA). It is the simplest way to perform the proposed task.

Go into the EOP-1.0.2 directory, i.e.

> cd  ~/Excitement-Open-Platform-1.0.2/target/EOP-1.0.2/

and call the Demo class with the needed parameters as reported below, i.e.

> java -Djava.ext.dirs=../EOP-1.0.2/ eu.excitementproject.eop.gui.Demo -config \
./eop-resources-1.0.2/configuration-files/EditDistanceEDA_EN.xml -test \
-text "Hubble is a telescope" -hypothesis "Hubble is an instrument" \
-output ./eop-resources-1.0.2/results/

where:

  • EditDistanceEDA_EN.xml is the configuration file that specifies the linguistic analysis pipeline, the EDA and the pre-trained model used to annotate the data.
  • test means that the selected EDA has to make its annotation by using a pre-trained model.
  • text is the text.
  • hypothesis is the hypothesis.
  • output is the directory where the result file (results.xml) containing the prediction has to be stored.

The prediction is saved in the results.xml file in the eop-resources-1.0.2/results/ directory and you can take a look at it with the following command:

> cat eop-resources-1.0.2/results/results.xml

The content of the results.xml file is reported below:

<entailment-corpus lang="null">
<pair id="1" entailment="Entailment" benchmark="N/A" score="1.0658141036401503E-14" confidence="1" task="EOP test">
<t>Hubble is a telescope</t>
<h>Hubble is an instrument</h>
</pair>
</entailment-corpus>

The prediction made by the EDA (i.e. entailment="Entailment") means that for Edit Distance EDA there is a relation of Entailment between the text: Hubble is a telescope and the hypothesis: Hubble is an instrument.
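
To read the decision without opening the file, the entailment attribute can be pulled out with standard text tools. The following is a rough sketch, not part of EOP: list_decisions is a hypothetical helper name, and the extraction is purely text-based, assuming that all attributes of a pair element sit on a single line as in the example above. A real XML parser would be more robust.

```shell
# Print "<id> <decision>" for every <pair ...> line of a results file.
# Text-based sketch: it assumes the attributes of each pair sit on one line.
list_decisions() {
    awk '/<pair / {
        id = ""; decision = ""
        if (match($0, /id="[^"]*"/)) id = substr($0, RSTART + 4, RLENGTH - 5)
        if (match($0, /entailment="[^"]*"/)) decision = substr($0, RSTART + 12, RLENGTH - 13)
        print id, decision
    }' "$1"
}

# Try it on a copy of the results.xml content shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<entailment-corpus lang="null">
<pair id="1" entailment="Entailment" benchmark="N/A" score="1.0658141036401503E-14" confidence="1" task="EOP test">
<t>Hubble is a telescope</t>
<h>Hubble is an instrument</h>
</pair>
</entailment-corpus>
EOF
list_decisions "$sample"    # prints: 1 Entailment
rm -f "$sample"
```

On a real run you would call it as list_decisions eop-resources-1.0.2/results/results.xml from the EOP-1.0.2 directory.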


Annotating a data set of multiple text/hypothesis pairs by using a pre-trained model

Goal: Given two text fragments, one named text and the other named hypothesis, the Entailment task consists of recognizing whether the hypothesis can be inferred from the text. The purpose of this section is to annotate the English RTE-3 data set, which contains multiple text/hypothesis pairs (i.e. for each of the pairs in the data set we want to know whether the hypothesis can be inferred from the text), by using one of the pre-trained models of Edit Distance EDA.

Prerequisite: EOP has to be already installed.

Main steps:

  • pre-processing the data set of text/hypothesis pairs with the required linguistic pipeline
  • annotating the data set with Edit Distance EDA

The Demo class is a utility class provided with EOP that can call both the linguistic analysis pipeline, to pre-process the data to be annotated, and the selected entailment algorithm (EDA). It is the simplest way to perform the proposed task.

Go into the EOP-1.0.2 directory, i.e.

> cd  ~/Excitement-Open-Platform-1.0.2/target/EOP-1.0.2/

and call the Demo class with the needed parameters as reported below:

> java -Djava.ext.dirs=../EOP-1.0.2/ eu.excitementproject.eop.gui.Demo -config \
./eop-resources-1.0.2/configuration-files/EditDistanceEDA_EN.xml -test \
-testFile ./eop-resources-1.0.2/data-set/English_test.xml \
-output ./eop-resources-1.0.2/results/

where:

  • EditDistanceEDA_EN.xml is the configuration file that specifies the linguistic analysis pipeline, the EDA and the pre-trained model used to annotate the data.
  • test means that the selected EDA has to make its annotation by using a pre-trained model.
  • testFile is the data set of the text/hypothesis pairs that has to be annotated.
  • output is the directory where the result file (EditDistanceEDA_EN.xml_results.xml) containing the predictions has to be stored.

The predictions are saved in the EditDistanceEDA_EN.xml_results.xml file in the eop-resources-1.0.2/results/ directory and you can take a look at them with the following command:

> cat eop-resources-1.0.2/results/EditDistanceEDA_EN.xml_results.xml

The output file contains the annotated text/hypothesis pairs, e.g.

<pair id="1" entailment="Entailment" benchmark="ENTAILMENT" score="1.0658141036401503E-14" confidence="1" task="IE">
<t>Claude Chabrol (born June 24, 1930) is a French movie director and has become well-known in the 40 years since his first film, Le Beau Serge , for his chilling tales of murder, including Le Boucher.</t>
<h>Le Beau Serge was directed by Chabrol.</h>
</pair>

The prediction made by the EDA (entailment="Entailment") means that for Edit Distance EDA there is a relation of Entailment between the text t and the hypothesis h.
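
Since each annotated pair carries both the prediction (entailment) and the gold label (benchmark), a rough agreement count can be computed from the output file. The sketch below is an illustration, not an EOP tool: benchmark_agreement is a hypothetical name, the parsing is text-based and assumes one pair element per line as in the example above, and labels are compared case-insensitively because the example shows "Entailment" next to "ENTAILMENT". The sample lines are made up for the demonstration, and the label NONENTAILMENT is an assumption about how negative pairs are marked.

```shell
# Count how many predictions agree with the benchmark (gold) annotation.
# Pairs whose benchmark is missing or "N/A" (no gold label) are skipped.
benchmark_agreement() {
    awk '/<pair / {
        pred = ""; gold = ""
        if (match($0, /entailment="[^"]*"/)) pred = substr($0, RSTART + 12, RLENGTH - 13)
        if (match($0, /benchmark="[^"]*"/)) gold = substr($0, RSTART + 11, RLENGTH - 12)
        if (gold == "" || gold == "N/A") next
        total++
        if (tolower(pred) == tolower(gold)) correct++
    }
    END { printf "%d of %d pairs match the benchmark\n", correct, total }' "$1"
}

# Made-up sample lines: only the attribute layout follows the example above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<pair id="1" entailment="Entailment" benchmark="ENTAILMENT" score="0.1" confidence="1" task="IE">
<pair id="2" entailment="Entailment" benchmark="NONENTAILMENT" score="0.2" confidence="1" task="IE">
<pair id="3" entailment="Entailment" benchmark="N/A" score="0.3" confidence="1" task="IE">
EOF
benchmark_agreement "$sample"    # prints: 1 of 2 pairs match the benchmark
rm -f "$sample"
```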


Creating and testing a new model

Goal: Supervised learning takes a known set of input data and known responses to the data, and seeks to build a predictor model that generates reasonable predictions for the response to new data. The purpose of this section is to use TIE EDA to create a new model on the English RTE-3 development data set and to test it on the English RTE-3 test data set: we want to use the pairs in the training data set to build a model that can predict, for each of the pairs in the test data set, whether the hypothesis can be inferred from the text.

Prerequisite: EOP has to be already installed.

Main steps:

  • Training TIE on the training data set
      • pre-processing the English RTE-3 development data set with the required linguistic pipeline
      • learning the new model with TIE EDA
  • Testing the learned model on the test data set
      • pre-processing the English RTE-3 test data set with the required linguistic pipeline
      • testing the new model on the RTE-3 test data set

The Demo class is a utility class provided with EOP that can call both the linguistic analysis pipeline, to pre-process the data to be annotated, and the selected entailment algorithm (EDA). It is the simplest way to perform the proposed task. We will use the Demo class both for training the EDA and for testing the learned model on the test data set.

Training TIE on the training data set

Go into the EOP-1.0.2 directory, i.e.

> cd  ~/Excitement-Open-Platform-1.0.2/target/EOP-1.0.2/

Before training, TIE requires that the model we want to create does not already exist; if it exists, we need to remove it. Supposing that we want to create a model called MaxEntClassificationEDAModel_Base+OpenNLP_EN and a model with that name already exists, we first need to remove it, i.e.

> rm  ~/Excitement-Open-Platform-1.0.2/target/EOP-1.0.2/eop-resources-1.0.2/model/MaxEntClassificationEDAModel_Base+OpenNLP_EN

After that we can call the Demo class with the needed parameters as reported below:

> java -Djava.ext.dirs=../EOP-1.0.2/ eu.excitementproject.eop.gui.Demo -config \
./eop-resources-1.0.2/configuration-files/MaxEntClassificationEDA_Base+OpenNLP_EN.xml \
-train -trainFile ./eop-resources-1.0.2/data-set/English_dev.xml

where:

  • MaxEntClassificationEDA_Base+OpenNLP_EN.xml is the configuration file that specifies the linguistic analysis pipeline, the EDA and the model to be created by training the EDA on the training data set.
  • train means that the selected EDA has to be trained on the specified training data set.
  • trainFile is the data set of the text/hypothesis pairs that has to be used to train the EDA.

At the end of this phase the new model MaxEntClassificationEDAModel_Base+OpenNLP_EN should be available in the eop-resources-1.0.2/model/ directory.
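
The two conditions on the model in this phase (absent before training, present afterwards) can be scripted. This is a minimal sketch with hypothetical helper names (remove_model_if_present, model_created). It assumes the model is a regular file, as the plain rm command above suggests, and it treats an empty file as a failed training run, which is an assumption rather than documented EOP behaviour.

```shell
# Remove a stale model file, if any, so that training can create it afresh.
remove_model_if_present() {
    if [ -e "$1" ]; then
        rm -- "$1" && echo "removed: $1"
    else
        echo "nothing to remove: $1"
    fi
}

# Succeed only if the model file exists and is not empty.
model_created() {
    [ -s "$1" ]
}

# Example (run from the EOP-1.0.2 directory):
#   model=./eop-resources-1.0.2/model/MaxEntClassificationEDAModel_Base+OpenNLP_EN
#   remove_model_if_present "$model"
#   ... run the training command ...
#   model_created "$model" && echo "model is in place"
```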

Testing the learned model on the test data set

In this phase the model learned in the previous phase is used to annotate the test data set:

> java -Djava.ext.dirs=../EOP-1.0.2/ eu.excitementproject.eop.gui.Demo -config \
./eop-resources-1.0.2/configuration-files/MaxEntClassificationEDA_Base+OpenNLP_EN.xml \
-test -testFile ./eop-resources-1.0.2/data-set/English_test.xml \
-output ./eop-resources-1.0.2/results/

where:

  • MaxEntClassificationEDA_Base+OpenNLP_EN.xml is the configuration file that specifies the linguistic analysis pipeline, the EDA and the model (here, the one created in the previous phase) used to annotate the data.
  • test means that the selected EDA has to make its annotation by using an already trained model.
  • testFile is the data set of the text/hypothesis pairs that has to be annotated.
  • output is the directory where the result file (MaxEntClassificationEDA_Base+OpenNLP_EN.xml_results.xml) containing the predictions has to be stored.

The predictions are saved in the MaxEntClassificationEDA_Base+OpenNLP_EN.xml_results.xml file in the eop-resources-1.0.2/results/ directory and you can take a look at them with the following command:

> cat eop-resources-1.0.2/results/MaxEntClassificationEDA_Base+OpenNLP_EN.xml_results.xml

The output file contains the annotated text/hypothesis pairs, e.g.

<pair id="2" entailment="Entailment" benchmark="ENTAILMENT" score="0.5155268135406939" confidence="1" task="IE">
<t>Claude Chabrol (born June 24, 1930) is a French movie director and has become well-known in the 40 years since his first film, Le Beau Serge , for his chilling tales of murder, including Le Boucher .</t>
<h>Le Boucher was made by a French movie director.</h>
</pair>

The prediction made by the EDA (entailment="Entailment") means that for TIE EDA there is a relation of Entailment between the text t and the hypothesis h.


Installing and using BIUTEE EDA

In the current release, installing and using BIUTEE EDA is quite different from using the other EDAs. BIUTEE is by far the most relevant, accurate and complex EDA; for this reason we refer you to its web page for more information.

