
Step by Step, TreeTagger Installation

rzanoli edited this page Mar 4, 2014 · 8 revisions

TreeTagger is a tool for annotating text with part-of-speech and lemma information. It is essential for the EOP German linguistic processing pipelines and is also needed for some English pre-processing: the examples reported in Appendix B require TreeTagger to be installed. The Excitement Open Platform (EOP) cannot ship this tool because TreeTagger has its own license, which is not compatible with the EOP license.

If you decide to install TreeTagger, first read the license agreement and make sure you agree with it: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/Tagger-Licence The actual installation is almost fully automated by a script. (The script will make you read the license agreement and won’t proceed unless you agree with it.) Installing TreeTagger requires these 4 steps:

  1. Installing ant
  2. Making a copy of the build.xml file
  3. Using the ant tool to download and install TreeTagger
  4. Adding the TreeTagger Maven dependency

Installing ant: Apache Ant is a Java library and command-line tool that drives processes described in build files as targets and extension points that depend on each other. Ant is mainly used to build Java applications. We use Ant to install TreeTagger; Ant 1.8.x or later is required.

There are two ways of installing Ant in Ubuntu:
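
A minimal sketch of the package-manager route, assuming an Ubuntu system with apt-get (the other usual route, also an assumption here, is unpacking a binary release from the Apache Ant site and adding its bin directory to PATH). The helper names parse_ant_version and ant_version_at_least_1_8 are hypothetical; they only check that the installed Ant satisfies the 1.8.x requirement stated above.

```shell
# Install Ant from the Ubuntu package archive (assumption: apt-get is available):
#   sudo apt-get install ant

# Hypothetical helper: read the banner printed by `ant -version` on stdin
# and print just the version number (e.g. 1.9.6).
parse_ant_version() {
    sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p'
}

# Hypothetical helper: succeed if the given version string is 1.8 or later.
ant_version_at_least_1_8() {
    version="$1"
    major="${version%%.*}"      # text before the first dot
    rest="${version#*.}"        # text after the first dot
    minor="${rest%%.*}"         # second component
    # Reject anything that is not a plain number.
    case "$major" in ''|*[!0-9]*) return 1 ;; esac
    case "$minor" in ''|*[!0-9]*) return 1 ;; esac
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 8 ]; }
}

# Usage, once Ant is installed:
#   ant_version_at_least_1_8 "$(ant -version | parse_ant_version)" \
#       && echo "Ant is recent enough" || echo "Ant 1.8.x or later is required"
```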

Making a copy of the build.xml file: build.xml is a script provided by DKPro (thanks, DKPro!); it is used in the next step to install TreeTagger.

  • Move into the home directory of your project (here, myProject), e.g.

    > cd ~/programs/myProject/
    
  • Create the new directory treetagger, i.e.

    > mkdir -p src/scripts/treetagger/
    
  • In this directory, create a new file called build.xml with your favourite text editor. Then copy and paste the content of build.xml reported in Appendix B into the new file.

Using the ant tool to download and install TreeTagger:

  • Move into the treetagger directory, i.e.

    > cd ~/programs/myProject/src/scripts/treetagger/
    
  • Run the installation script by calling the Ant build tool, i.e.

    > ant local-maven 
    

This command downloads the binary and models, wraps them as Maven modules, and installs them in your local Maven repository (i.e. ~/.m2/).

The TreeTagger installation takes some time (about 1 minute). If it succeeds, it outputs “BUILD SUCCESSFUL”.

After that, to complete the installation, decide whether you want to use EOP via the Application Program Interface (API) or via the Command Line Interface (CLI). The two options involve different procedures for completing the TreeTagger installation, as reported below.

via API: Adding the TreeTagger Maven dependency to your own project

If you have decided to use EOP via the API, add the following dependencies to the pom.xml file of your project:

<!-- TreeTagger related dependencies -->
        <dependency>
                <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
                <artifactId>de.tudarmstadt.ukp.dkpro.core.treetagger-bin</artifactId>
                <version>20131118.0</version>
        </dependency>
        <dependency>
                <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
                <artifactId>de.tudarmstadt.ukp.dkpro.core.treetagger-model-de</artifactId>
                <version>20121207.0</version>
        </dependency>
        <dependency>
                <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
                <artifactId>de.tudarmstadt.ukp.dkpro.core.treetagger-model-en</artifactId>
                <version>20111109.0</version>
        </dependency>
        <dependency>
                <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
                <artifactId>de.tudarmstadt.ukp.dkpro.core.treetagger-model-it</artifactId>
                <version>20101115.0</version>
        </dependency>
 <!-- end of TreeTagger related dependencies -->

where 20131118.0 is the version of the treetagger-bin artifact that has been installed. To find out which version has been installed on your machine, look at the artifact in your local Maven repository in the .m2 directory, i.e.

> ls ~/.m2/repository/de/tudarmstadt/ukp/dkpro/core/de.tudarmstadt.ukp.dkpro.core.treetagger-bin/
20131118.0

Here 20131118.0 is the version of the artifact that has actually been installed, and this is the version that must be given in the dependency information. Likewise, 20101115.0 is the version of the Italian model treetagger-model-it. To see that:

> ls ~/.m2/repository/de/tudarmstadt/ukp/dkpro/core/de.tudarmstadt.ukp.dkpro.core.treetagger-model-it/
20101115.0

The same check should be done for the other two models: treetagger-model-de, treetagger-model-en.
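
The per-artifact checks above can be run in one pass. The following is a sketch only: the function name list_treetagger_versions is hypothetical, and it assumes the default local repository location ~/.m2/repository unless another path is passed as the first argument.

```shell
# Hypothetical helper: print the installed version(s) of the four TreeTagger
# artifacts found in a local Maven repository.
list_treetagger_versions() {
    repo="${1:-$HOME/.m2/repository}"           # default location is an assumption
    base="$repo/de/tudarmstadt/ukp/dkpro/core"
    for suffix in bin model-de model-en model-it; do
        dir="$base/de.tudarmstadt.ukp.dkpro.core.treetagger-$suffix"
        if [ -d "$dir" ]; then
            # Each installed version is a sub-directory named after the version.
            for v in "$dir"/*/; do
                if [ -d "$v" ]; then
                    echo "treetagger-$suffix $(basename "$v")"
                fi
            done
        else
            echo "treetagger-$suffix not-installed"
        fi
    done
}

# Usage:
#   list_treetagger_versions
```

Each printed version is the value to put in the corresponding <version> element of the dependency.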

via CLI: Enabling the TreeTagger Maven dependency in the EOP project

If you have decided to use EOP via the CLI, uncomment the TreeTagger dependencies (i.e. the dependencies reported in the previous section) in the pom.xml file of the LAP module and rebuild the code:

  • With your favourite text editor (e.g. emacs), open the pom.xml file in the directory Excitement-Open-Platform-{version}/lap/ and uncomment the TreeTagger dependencies.

  • Check that the TreeTagger dependency versions (e.g. 20131118.0) given in the pom file match the artifacts installed in your local Maven repository in the .m2 directory. To do that, follow the same procedure described for using EOP via the API, e.g.

> ls ~/.m2/repository/de/tudarmstadt/ukp/dkpro/core/de.tudarmstadt.ukp.dkpro.core.treetagger-bin/
20131118.0

Likewise, 20101115.0 is the version of the Italian model treetagger-model-it. To see that:

> ls ~/.m2/repository/de/tudarmstadt/ukp/dkpro/core/de.tudarmstadt.ukp.dkpro.core.treetagger-model-it/
20101115.0

The same check should be done for the other two models: treetagger-model-de, treetagger-model-en.

  • Go into the Excitement-Open-Platform-{version} directory, i.e.

    > cd Excitement-Open-Platform-{version}
    
  • Then build the code as usual with the Maven command:

    > mvn package assembly:assembly
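
Before rebuilding, the version declared in the pom file can be read out with a small script and compared against the directory names in ~/.m2. This is a rough sketch: pom_version_of is a hypothetical helper, and it assumes the version is written literally on the first <version> line after the matching <artifactId> line, as in the dependency snippet shown for the API case. It is purely text-based, so it does not tell whether the dependency is still commented out.

```shell
# Hypothetical helper: print the <version> that follows a given <artifactId>
# in a pom.xml file.
pom_version_of() {
    artifact="$1"
    pom="$2"
    awk -v id="$artifact" '
        # Remember that the wanted artifactId line has been seen.
        index($0, "<artifactId>" id "</artifactId>") { found = 1; next }
        # The next <version> line belongs to that dependency.
        found && /<version>/ {
            sub(/.*<version>/, "")
            sub(/<\/version>.*/, "")
            print
            exit
        }
    ' "$pom"
}

# Usage (run from the Excitement-Open-Platform-{version} directory):
#   pom_version_of de.tudarmstadt.ukp.dkpro.core.treetagger-bin lap/pom.xml
```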
    