Step by Step, TreeTagger Installation
TreeTagger is a tool for annotating text with part-of-speech and lemma information. It is essential for the EOP German linguistic processing pipelines and is also needed for some of the English pre-processing: the examples reported in Appendix B require TreeTagger to be installed. The Excitement Open Platform (EOP) cannot ship this tool because it has its own license, which is not compatible with the EOP one.
If you decide to install TreeTagger, first read the license agreement and make sure you agree with it: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/Tagger-Licence The actual installation is almost fully automated by a script. (The script makes you read the license agreement and will not proceed unless you accept it.) Installing TreeTagger requires these 4 steps:
- Installing ant
- Making a copy of the build.xml file
- Using the ant tool to download and install TreeTagger
- Adding the TreeTagger Maven dependency
Installing ant: Apache Ant is a Java library and command-line tool that drives processes described in build files as targets and extension points that depend on each other. Its best-known use is building Java applications. We use Ant to install TreeTagger; Ant 1.8.x or later is required.
There are two ways of installing Ant in Ubuntu:
- Using the Ubuntu Software Center
- Downloading Ant and installing it manually (http://ant.apache.org/)
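Whichever way Ant is installed, it is worth confirming that the version meets the 1.8 minimum before continuing. The following is a minimal sketch, assuming GNU sort (for its -V version ordering) and a banner line that contains a dotted version number, as printed by ant -version. The function names version_ge and ant_version_from_banner are hypothetical helpers, not part of Ant or EOP.

```shell
#!/usr/bin/env bash
# Sketch: check that an Ant version string satisfies the 1.8 minimum.
# version_ge and ant_version_from_banner are hypothetical helper names.

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Prints the first dotted version number found in the banner text $1,
# e.g. the line printed by `ant -version`.
ant_version_from_banner() {
    printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}
```

A possible use is version_ge "$(ant_version_from_banner "$(ant -version)")" 1.8 && echo "Ant is recent enough".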
Making a copy of the build.xml file: build.xml is a script provided by DKPro (thanks, DKPro!) that is used in the next step to install TreeTagger.
- Move into the home directory of your project (e.g. myProject), e.g.
> cd ~/programs/myProject/
- Create the new directory treetagger, i.e.
> mkdir -p src/scripts/treetagger/
- In this directory, create a new file called build.xml with your favourite text editor. Then copy and paste the content of build.xml reported in Appendix B into the new file.
Using the ant tool to download and install TreeTagger:
- Move into the treetagger directory, i.e.
> cd ~/programs/myProject/src/scripts/treetagger/
- Run the installation script by calling the Ant build tool, i.e.
> ant local-maven
This command downloads the binary and the models, wraps them as Maven modules, and installs them in your local Maven repository (i.e. ~/.m2/).
The TreeTagger installation takes some time (about 1 minute). If it completes successfully, it prints “BUILD SUCCESSFUL”.
After that, to complete the installation, you need to decide whether to use EOP via the Application Program Interface (API) or via the Command Line Interface (CLI). The two options involve different procedures to complete the TreeTagger installation, as reported below.
If you have decided to use EOP via API, you have to add the following dependencies to the pom.xml file of your project:
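Besides looking for “BUILD SUCCESSFUL”, the result can be checked by confirming that the artifacts reached the local Maven repository. The sketch below assumes the repository layout used elsewhere on this page (de/tudarmstadt/ukp/dkpro/core/ under the repository root) and assumes that the binary plus the German, English, and Italian models are the artifacts of interest. The function name missing_treetagger_artifacts is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which TreeTagger artifacts are absent from a local Maven repository.
# missing_treetagger_artifacts is a hypothetical helper name.

# Prints, one per line, each artifact that has no version subdirectory under
# the repository root given as $1 (default: ~/.m2/repository).
missing_treetagger_artifacts() {
    local repo="${1:-$HOME/.m2/repository}"
    local base="$repo/de/tudarmstadt/ukp/dkpro/core"
    local name dir d found
    for name in treetagger-bin treetagger-model-de treetagger-model-en treetagger-model-it; do
        dir="$base/de.tudarmstadt.ukp.dkpro.core.$name"
        found=0
        # An artifact counts as installed if it has at least one subdirectory.
        for d in "$dir"/*/; do
            if [ -d "$d" ]; then
                found=1
            fi
        done
        if [ "$found" -eq 0 ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

Calling missing_treetagger_artifacts with no argument inspects ~/.m2/repository; empty output means every listed artifact has at least one installed version.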
<!-- TreeTagger related dependencies -->
<dependency>
<groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
<artifactId>de.tudarmstadt.ukp.dkpro.core.treetagger-bin</artifactId>
<version>20131118.0</version>
</dependency>
<dependency>
<groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
<artifactId>de.tudarmstadt.ukp.dkpro.core.treetagger-model-de</artifactId>
<version>20121207.0</version>
</dependency>
<dependency>
<groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
<artifactId>de.tudarmstadt.ukp.dkpro.core.treetagger-model-en</artifactId>
<version>20111109.0</version>
</dependency>
<dependency>
<groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
<artifactId>de.tudarmstadt.ukp.dkpro.core.treetagger-model-it</artifactId>
<version>20101115.0</version>
</dependency>
<!-- end of TreeTagger related dependencies -->
where 20131118.0 is the version of the treetagger-bin artifact that has been installed. To find out which version has been installed on your machine, look at the artifact in your local Maven repository in the .m2 directory, i.e.
> ls ~/.m2/repository/de/tudarmstadt/ukp/dkpro/core/de.tudarmstadt.ukp.dkpro.core.treetagger-bin/
20131118.0
Here 20131118.0 is the version of the installed artifact, and it is the version that has to be reported in the dependency information. Likewise, 20101115.0 is the version of the Italian model treetagger-model-it. To see that:
> ls ~/.m2/repository/de/tudarmstadt/ukp/dkpro/core/de.tudarmstadt.ukp.dkpro.core.treetagger-model-it/
20101115.0
The same check should be done for the other two models: treetagger-model-de, treetagger-model-en.
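The manual lookup above can also be scripted so that the version written into pom.xml is read directly from the local repository. This is only a sketch: it assumes GNU sort for version ordering and picks the highest version directory when several are present. The names installed_version and print_dependency are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print a <dependency> element using the version found in the local repository.
# installed_version and print_dependency are hypothetical helper names.

GROUP_ID="de.tudarmstadt.ukp.dkpro.core"

# Prints the highest version directory of artifact $2 under repository root $1.
# Prints nothing when the artifact is not installed.
installed_version() {
    local dir="$1/de/tudarmstadt/ukp/dkpro/core/$2"
    local d
    for d in "$dir"/*/; do
        [ -d "$d" ] && basename "$d"
    done | sort -V | tail -n 1
}

# Prints a Maven dependency element for artifact $2; fails if it is not installed.
print_dependency() {
    local version
    version="$(installed_version "$1" "$2")"
    [ -n "$version" ] || return 1
    printf '<dependency>\n  <groupId>%s</groupId>\n  <artifactId>%s</artifactId>\n  <version>%s</version>\n</dependency>\n' \
        "$GROUP_ID" "$2" "$version"
}
```

For example, print_dependency ~/.m2/repository de.tudarmstadt.ukp.dkpro.core.treetagger-model-it prints an element that can be pasted into the dependency list.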
If you have decided to use EOP via CLI, you have to uncomment the TreeTagger dependencies (i.e. the dependencies reported in the previous section) in the pom.xml file of the LAP module and rebuild the code:
- Open the pom.xml file in the directory Excitement-Open-Platform-{version}/lap/ with your favourite text editor (e.g. emacs) and uncomment the TreeTagger dependencies.
- Check that the TreeTagger dependency versions (e.g. 20131118.0) reported in the pom file correspond to the artifacts installed in your local Maven repository in the .m2 directory. To do that, follow the same procedure described for using EOP via API, e.g.
> ls ~/.m2/repository/de/tudarmstadt/ukp/dkpro/core/de.tudarmstadt.ukp.dkpro.core.treetagger-bin/
20131118.0
20101115.0 is the version of the Italian model treetagger-model-it. To see that:
> ls ~/.m2/repository/de/tudarmstadt/ukp/dkpro/core/de.tudarmstadt.ukp.dkpro.core.treetagger-model-it/
20101115.0
The same check should be done for the other two models: treetagger-model-de, treetagger-model-en.
- Go into the Excitement-Open-Platform-{version} directory, i.e.
> cd Excitement-Open-Platform-{version}
- Then build the code as usual with the Maven command:
> mvn package assembly:assembly
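The version check in the steps above can be sketched as a script that compares the TreeTagger versions in a pom.xml with the directories in the local Maven repository. It is a rough illustration rather than a real XML parser: it assumes each artifactId and version element sits on its own line, as in the dependency list on this page, and it does not skip dependencies that are still commented out. The names pom_treetagger_versions and check_pom_against_repo are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare TreeTagger versions in a pom.xml with the local Maven repository.
# pom_treetagger_versions and check_pom_against_repo are hypothetical helper names.

# Prints "artifactId version" pairs for every treetagger dependency in pom file $1.
# Line-based matching only; commented-out dependencies are not filtered.
pom_treetagger_versions() {
    awk '
        /<artifactId>.*treetagger/ {
            gsub(/.*<artifactId>|<\/artifactId>.*/, "")
            artifact = $0
            next
        }
        artifact != "" && /<version>/ {
            gsub(/.*<version>|<\/version>.*/, "")
            print artifact, $0
            artifact = ""
        }
    ' "$1"
}

# Reports each pair from pom file $1 that has no matching directory under
# repository root $2. Returns 1 if any pair is missing, 0 otherwise.
check_pom_against_repo() {
    local pom="$1" repo="$2" status=0 artifact version
    while read -r artifact version; do
        [ -n "$artifact" ] || continue
        if [ ! -d "$repo/de/tudarmstadt/ukp/dkpro/core/$artifact/$version" ]; then
            printf 'not installed: %s %s\n' "$artifact" "$version"
            status=1
        fi
    done <<EOF
$(pom_treetagger_versions "$pom")
EOF
    return "$status"
}
```

Run from the Excitement-Open-Platform-{version} directory, a call such as check_pom_against_repo lap/pom.xml ~/.m2/repository lists any declared version that is not installed.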