SoDA API

This document describes the SoDA API. The SoDA API uses JSON over HTTP and is thus language agnostic. A SoDA request is built as a JSON document and is sent to the SoDA JSON endpoint over HTTP POST. SoDA responds with another JSON document indicating success or failure.
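
The request/response cycle described above can be sketched with the Python standard library alone. This is only an illustration: the base URL `http://localhost:8080/soda`, the helper names `build_request` and `parse_response`, and the `Content-Type` header are assumptions and not part of SoDA. The request object is built but not sent, so the sketch runs without a server.

```python
import json
import urllib.request

# Hypothetical base URL; substitute the host and port of your SoDA server.
BASE_URL = "http://localhost:8080/soda"


def build_request(endpoint, payload=None):
    """Build (but do not send) a request for a SoDA endpoint.

    Without a payload a plain GET request is built. With a payload the
    body is the JSON-encoded document, which makes the request a POST.
    """
    url = "{}/{}".format(BASE_URL, endpoint)
    if payload is None:
        return urllib.request.Request(url)
    body = json.dumps(payload).encode("utf-8")
    # Setting Content-Type is an assumption; the client examples in this
    # document post the JSON string without it.
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})


def parse_response(body_text):
    """Decode a SoDA response body (a JSON document) into Python objects."""
    return json.loads(body_text)


req = build_request("annot.json", {
    "lexicon": "countries",
    "text": "China",
    "matching": "exact",
})
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; the response body can then be decoded with `parse_response`.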

The following sections provide details about each of the endpoints.

Index

Returns a JSON message with status ok. Use it to test whether the SoDA web component is alive.

URL http://host:port/soda/index.json

INPUT

None

OUTPUT

    { "status": "ok" }

EXAMPLE PYTHON CLIENT

    import json
    import requests

    resp = requests.get("http://host:port/soda/index.json")
    print(json.loads(resp.text))

EXAMPLE SCALA CLIENT

    import com.elsevier.soda.SodaClient

    val sodaClient = new SodaClient()
    val resp = sodaClient.get("http://host:port/soda/index.json")
    Console.println(resp)

List Lexicons

Returns a list of lexicons available to annotate against. Currently a document can only be annotated against a single lexicon per request. When annotations against multiple lexicons are required, it is recommended to annotate the document separately against each lexicon, then merge the annotations.

URL http://host:port/soda/dicts.json

INPUT

None

OUTPUT

    [
        { "lexicon" : "countries", "numEntries" : 248 }
    ]

EXAMPLE PYTHON CLIENT

    import json
    import requests

    resp = requests.get("http://host:port/soda/dicts.json")
    print(json.loads(resp.text))

EXAMPLE SCALA CLIENT

    import com.elsevier.soda.SodaClient

    val sodaClient = new SodaClient()
    val resp = sodaClient.get("http://host:port/soda/dicts.json")
    Console.println(resp)
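
The recommendation in this section to annotate against each lexicon separately and then merge the results could look roughly like the sketch below. It assumes each per-lexicon result is a list of annotation objects with the `id`, `lexicon`, `begin` and `end` fields shown in the Annotate Document output. The helper `merge_annotations`, the `cities` lexicon and its `example.org` id are made up for illustration.

```python
def merge_annotations(*annotation_lists):
    """Merge per-lexicon annotation lists into one list ordered by position.

    Exact duplicates (same lexicon, id, begin and end) are kept only once.
    """
    seen = set()
    merged = []
    for annotations in annotation_lists:
        for ann in annotations:
            key = (ann["lexicon"], ann["id"], ann["begin"], ann["end"])
            if key not in seen:
                seen.add(key)
                merged.append(ann)
    # Order by start offset, then by end offset.
    return sorted(merged, key=lambda a: (a["begin"], a["end"]))


# Result of annotating against the "countries" lexicon.
countries = [
    {"id": "http://www.geonames.org/CHN", "lexicon": "countries",
     "begin": 41, "end": 46, "coveredText": "China"},
    {"id": "http://www.geonames.org/CHN", "lexicon": "countries",
     "begin": 102, "end": 107, "coveredText": "China"},
]
# Hypothetical result from a second, made-up "cities" lexicon.
cities = [
    {"id": "http://example.org/cities/shanghai", "lexicon": "cities",
     "begin": 85, "end": 93, "coveredText": "Shanghai"},
]

merged = merge_annotations(countries, cities)
print([a["coveredText"] for a in merged])
```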

Annotate Document

Annotates text against a specified lexicon and match type. Match type can be one of the following.

  • exact - matches text against the FST maintained in Solr by SolrTextTagger. This will match segments in text that are identical to a dictionary entry.
  • lower - same as exact, but matches are now case insensitive.
  • punct - removes punctuation from the input and splits it into phrase chunks, then matches the phrases against lexicon entries. Also case insensitive.
  • sort - same as punct, except that words in phrase chunks are sorted and matched against similarly sorted lexicon entries.
  • stem - same as sort, except words in phrases are stemmed using Porter stemmer and matched against similarly stemmed lexicon entries.
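
To make the difference between the match types more concrete, the sketch below approximates the normalization each one implies for a single phrase. It is not SoDA's implementation: the function `normalize` is hypothetical, deleting punctuation (rather than replacing it with spaces) is an assumption, phrase chunking is skipped, and `stem` is left out because it needs a Porter stemmer.

```python
import string


def normalize(phrase, matching):
    """Approximate the phrase normalization implied by a match type."""
    if matching == "exact":
        return phrase
    if matching == "lower":
        return phrase.lower()
    # punct and sort both drop punctuation and ignore case.
    no_punct = phrase.translate(str.maketrans("", "", string.punctuation))
    words = no_punct.lower().split()
    if matching == "punct":
        return " ".join(words)
    if matching == "sort":
        # Sorting the words makes the comparison independent of word order.
        return " ".join(sorted(words))
    raise ValueError("unsupported match type: " + matching)


for matching in ("exact", "lower", "punct", "sort"):
    print(matching, "->", normalize("Korea, Republic of", matching))
```

Under these assumptions, "Korea, Republic of" and "Republic of Korea" normalize to the same string for `sort`, but to different strings for the other match types.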

URL http://host:port/soda/annot.json

INPUT

    {
        "lexicon" : "countries",
        "text" : "Institute of Clean Coal Technology, East China University of Science and Technology, Shanghai 200237, China",
        "matching" : "exact"
    }

OUTPUT

    [
        {
            "id" : "http://www.geonames.org/CHN", 
            "lexicon" : "countries", 
            "begin" : 41, 
            "end" : 46,
            "coveredText" : "China", 
            "confidence" : "1.0"
        }, 
        {
            "id" : "http://www.geonames.org/CHN", 
            "lexicon" : "countries",
            "begin" : 102, 
            "end" : 107,
            "coveredText" : "China", 
            "confidence" : "1.0"
        }
    ]
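
In the sample output above, `begin` and `end` behave like zero-based character offsets into the input text, with `end` exclusive, so they can be used directly as a Python slice. The following sketch checks this against the sample and marks the annotated spans. The helpers `offsets_consistent` and `highlight` are illustrative names, and `highlight` assumes the annotations do not overlap.

```python
text = ("Institute of Clean Coal Technology, East China University of "
        "Science and Technology, Shanghai 200237, China")

annotations = [
    {"id": "http://www.geonames.org/CHN", "lexicon": "countries",
     "begin": 41, "end": 46, "coveredText": "China", "confidence": "1.0"},
    {"id": "http://www.geonames.org/CHN", "lexicon": "countries",
     "begin": 102, "end": 107, "coveredText": "China", "confidence": "1.0"},
]


def offsets_consistent(text, annotation):
    """Check that slicing text with begin/end yields coveredText."""
    span = text[annotation["begin"]:annotation["end"]]
    return span == annotation["coveredText"]


def highlight(text, annotations):
    """Wrap every annotated span in square brackets (no overlaps assumed)."""
    parts = []
    pos = 0
    for ann in sorted(annotations, key=lambda a: a["begin"]):
        parts.append(text[pos:ann["begin"]])
        parts.append("[" + text[ann["begin"]:ann["end"]] + "]")
        pos = ann["end"]
    parts.append(text[pos:])
    return "".join(parts)


print(all(offsets_consistent(text, a) for a in annotations))
print(highlight(text, annotations))
```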

EXAMPLE PYTHON CLIENT

    import json
    import requests

    params = {
        "lexicon" : "countries",
        "text" : "Institute of Clean Coal Technology, East China University of Science and Technology, Shanghai 200237, China",
        "matching" : "exact"
    }
    req = json.dumps(params)
    resp = requests.post("http://host:port/soda/annot.json", data=req)
    print(json.loads(resp.text))

EXAMPLE SCALA CLIENT

    import com.elsevier.soda.{SodaClient, SodaUtils}

    val sodaClient = new SodaClient()
    val req = SodaUtils.jsonBuild(Map(
        "lexicon" -> "countries",
        "text" -> "Institute of Clean Coal Technology, East China University of Science and Technology, Shanghai 200237, China",
        "matching" -> "exact"))
    val resp = sodaClient.post("http://host:port/soda/annot.json", req)
    Console.println(SodaUtils.jsonParseList(resp))

Delete Lexicon

A single SoDA index can contain entries from multiple lexicons. This operation deletes all entries belonging to the named lexicon.

URL http://host:port/soda/delete.json

INPUT

    { "lexicon" : "lexicon_name" }

OUTPUT

    {"status": "ok"}

EXAMPLE PYTHON CLIENT

    import json
    import requests

    params = { "lexicon" : "countries" }
    req = json.dumps(params)
    resp = requests.post("http://host:port/soda/delete.json", data=req)
    print(json.loads(resp.text))

EXAMPLE SCALA CLIENT

    import com.elsevier.soda.{SodaClient, SodaUtils}

    val sodaClient = new SodaClient()
    val params = Map("lexicon" -> "countries")
    val req = SodaUtils.jsonBuild(params)
    val resp = sodaClient.post("http://host:port/soda/delete.json", req)
    Console.println(SodaUtils.jsonParse(resp))

Add Lexicon Entries

Adds a new entry to a named lexicon. Each request adds one entry, which can carry several names.

URL http://host:port/soda/save.json

INPUT

    {
        "lexicon" : "lexicon_name", 
        "id" : "unique_url_of_entry",
        "names" : ["name_1", "name_2", "name_3"],
        "commit" : true_or_false
    }

The id value we have chosen to use is the RDF URI of the entity as reported in the imported lexicon. The names are the strings to match for that entity; a single entry can have multiple names. The commit field is optional. If it is omitted, each add operation results in a commit, which is inefficient. It is better to set commit to false on the individual additions, then commit at regular intervals and once at the end. To send a commit request using the save.json endpoint, omit the id and names entries, like this:

    {
        "lexicon" : "lexicon_name", 
        "commit" : true
    }
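
One way to follow the advice on committing at intervals is to generate the sequence of save.json request bodies up front. The sketch below only builds the payloads and sends nothing. The function `build_save_payloads`, its `commit_every` parameter and the `example.org` ids are hypothetical, while the payload fields are the ones documented above.

```python
def build_save_payloads(lexicon, entries, commit_every=1000):
    """Return the save.json payloads for loading entries in bulk.

    entries is an iterable of (id, names) pairs. Every entry is sent with
    commit set to False; a commit-only payload follows each full batch of
    commit_every entries, and one more closes a trailing partial batch.
    """
    payloads = []
    pending = 0  # entries added since the last commit
    for entry_id, names in entries:
        payloads.append({
            "lexicon": lexicon,
            "id": entry_id,
            "names": list(names),
            "commit": False,
        })
        pending += 1
        if pending == commit_every:
            payloads.append({"lexicon": lexicon, "commit": True})
            pending = 0
    if pending:
        # Final commit for entries not covered by an interval commit.
        payloads.append({"lexicon": lexicon, "commit": True})
    return payloads


# Made-up entries, for illustration only.
entries = [
    ("http://example.org/entry/1", ["name_1"]),
    ("http://example.org/entry/2", ["name_2", "name_2b"]),
    ("http://example.org/entry/3", ["name_3"]),
]
for payload in build_save_payloads("lexicon_name", entries, commit_every=2):
    print(payload)
```

Each payload would then be serialized with `json.dumps` and posted to the save.json URL, as in the client examples in this section.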

OUTPUT

    {"status": "ok"}

EXAMPLE PYTHON CLIENT

    import json
    import requests

    params = {
        "lexicon" : "countries",
        "id" : "http://www.geonames.org/AND",
        "names" : ["AND", "Andorra", "Andorre"],
        "commit" : False
    }
    req = json.dumps(params)
    resp = requests.post("http://host:port/soda/save.json", data=req)
    print(json.loads(resp.text))

EXAMPLE SCALA CLIENT

    import com.elsevier.soda.{SodaClient, SodaUtils}

    val sodaClient = new SodaClient()
    val params = Map(
        "lexicon" -> "countries",
        "id" -> "http://www.geonames.org/AND",
        "names" -> List("AND", "Andorra", "Andorre"),
        "commit" -> false
    )
    val req = SodaUtils.jsonBuild(params)
    val resp = sodaClient.post("http://host:port/soda/save.json", req)
    Console.println(SodaUtils.jsonParse(resp))

Coverage Info

This can be used to find which lexicons are appropriate for annotating your text. The service matches a piece of text against all hosted lexicons and returns the number of matches found in each.

URL http://host:port/soda/coverage.json

INPUT

    { "text" : "the text to annotate" }

OUTPUT

    [
        { "lexicon" : "lexicon_name", "numEntries" : 10 },
        { "lexicon" : "another_lexicon", "numEntries" : 100 }
    ]
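
A coverage response like the one above can be turned into a ranked list of candidate lexicons. The sketch assumes that `numEntries` in this response holds the number of matches found for the lexicon, as the description of this endpoint suggests. The function `rank_lexicons` and its `min_matches` threshold are illustrative.

```python
def rank_lexicons(coverage, min_matches=1):
    """Return lexicon names ordered from most to fewest matches.

    Lexicons with fewer than min_matches matches are left out.
    """
    ranked = sorted(coverage, key=lambda c: c["numEntries"], reverse=True)
    return [c["lexicon"] for c in ranked if c["numEntries"] >= min_matches]


# The sample coverage output shown above.
coverage = [
    {"lexicon": "lexicon_name", "numEntries": 10},
    {"lexicon": "another_lexicon", "numEntries": 100},
]
print(rank_lexicons(coverage))
print(rank_lexicons(coverage, min_matches=50))
```

The first name in the ranked list is a reasonable candidate for the `lexicon` field of a subsequent annot.json request.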

EXAMPLE PYTHON CLIENT

    import json
    import requests

    params = { "text" : "Institute of Clean Coal Technology, East China University of Science and Technology, Shanghai 200237, China" }
    req = json.dumps(params)
    resp = requests.post("http://host:port/soda/coverage.json", data=req)
    print(json.loads(resp.text))

EXAMPLE SCALA CLIENT

    import com.elsevier.soda.{SodaClient, SodaUtils}

    val sodaClient = new SodaClient()
    val req = SodaUtils.jsonBuild(Map(
        "text" -> "Institute of Clean Coal Technology, East China University of Science and Technology, Shanghai 200237, China"
    ))
    val resp = sodaClient.post("http://host:port/soda/coverage.json", req)
    Console.println(SodaUtils.jsonParseList(resp))