This repository contains a tiny web service that lets you tokenize and lemmatize Japanese text.
The service wraps the MeCab morphological analyzer (paper) in a Sanic app, which is what makes both tokenization and lemmatization possible from the same backend.
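For orientation, here is a minimal sketch of what such a wrapper might look like. It assumes the mecab-python3 bindings and an IPAdic-style dictionary (where field 6 of a morpheme's feature string is the base form); the handler names mirror the endpoints below, but this is illustrative, not the repository's actual implementation:

```python
# Hypothetical sketch of the service, not the repo's actual code.
# Assumes: pip install sanic mecab-python3 (plus a MeCab dictionary).
import MeCab
from sanic import Sanic
from sanic.response import text as plain_text

app = Sanic("jp_tokenizer")
wakati = MeCab.Tagger("-Owakati")  # space-delimited surface forms
tagger = MeCab.Tagger()            # full morpheme features, used for lemmas

@app.post("/tokenize")
async def tokenize(request):
    # -Owakati output is already space-separated tokens
    return plain_text(wakati.parse(request.body.decode("utf-8")).strip())

@app.post("/lemmatize")
async def lemmatize(request):
    lemmas = []
    node = tagger.parseToNode(request.body.decode("utf-8"))
    while node:
        if node.surface:  # skip the BOS/EOS sentinel nodes
            features = node.feature.split(",")
            # Assumption: IPAdic feature layout, where index 6 is the
            # base (dictionary) form and "*" means no base form is listed.
            if len(features) > 6 and features[6] != "*":
                lemmas.append(features[6])
            else:
                lemmas.append(node.surface)
        node = node.next
    return plain_text(" ".join(lemmas))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80)
```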
Ensure that your server has at least 2-3 GB of available RAM (e.g. an Azure Standard DS1_v2 instance), then run:
# start a container for the service and its dependencies
docker run -p 8080:80 cwolff/jp_tokenizer
# call the API
curl -X POST 'http://localhost:8080/tokenize' --data 'サザエさんは走った'
curl -X POST 'http://localhost:8080/lemmatize' --data 'サザエさんは走った'
Each endpoint responds with a plain-text, space-delimited string of tokens or lemmas, respectively.
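For the example request above, the responses would look roughly like the following. The exact segmentation and base forms depend on the MeCab dictionary bundled in the image, so these lines are illustrative:

/tokenize  → サザエ さん は 走っ た
/lemmatize → サザエ さん は 走る た

Note how the lemmatize endpoint maps the inflected 走っ back to its dictionary form 走る.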