[Pipelines] pipeline with nif-version=2.1 fails #113

Open
jnehring opened this issue Oct 5, 2016 · 16 comments

jnehring commented Oct 5, 2016

This curl request

curl -X POST -H "Content-Type: application/json" -H "Cache-Control: no-cache" -H "Postman-Token: cc799c16-5d39-accf-b81d-1aa4a48fb5c9" -d '[
{
  "method": "POST",
  "endpoint": "https://api-dev.freme-project.eu/current/e-entity/freme-ner/documents",
  "parameters": {
    "language": "en",
    "dataset": "dbpedia",
    "nif-version": "2.1"
  },
  "headers": {
    "content-type": "text/html",
    "accept": "text/turtle"
  },
  "body": "<p>This summer there is the Zomerbar in Antwerp, one of the most beautiful cities in Belgium.</p>"
},
{
  "method": "POST",
  "endpoint": "https://api-dev.freme-project.eu/current/e-terminology/tilde",
  "parameters": {
    "source-lang": "en",
    "target-lang": "de",
    "nif-version": "2.1"
  },
  "headers": {
    "content-type": "text/turtle",
    "accept": "text/html"
  }
}
]
' "http://api-dev.freme-project.eu/current/pipelining/chain"

fails with the following error message:

{
  "exception": "eu.freme.common.exception.InternalServerErrorException",
  "path": "/pipelining/chain",
  "message": "For input string: \"//freme-project.eu/#offset_38_45\"",
  "error": "Internal Server Error",
  "status": 500,
  "timestamp": 1475653048652
}

It works when I remove nif-version=2.1 from both API calls.

jnehring commented Oct 5, 2016

The error seems to originate from within internationalization.

Error log:

ERROR   2016-10-05 11:06:28,805 [http-nio-8089-exec-1] eu.freme.bservices.controllers.pipelines.PipelinesController  - For input string: "//freme-project.eu/#offset_38_45"
java.lang.NumberFormatException: For input string: "//freme-project.eu/#offset_38_45"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:569)
        at java.lang.Integer.valueOf(Integer.java:766)
        at eu.freme.bservices.internationalization.okapi.nif.converter.HTMLBackConverter$TextUnitResource.<init>(HTMLBackConverter.java:450)
        at eu.freme.bservices.internationalization.okapi.nif.converter.HTMLBackConverter.listTextUnitResources(HTMLBackConverter.java:359)
        at eu.freme.bservices.internationalization.okapi.nif.converter.HTMLBackConverter.convertBack(HTMLBackConverter.java:165)
        at eu.freme.bservices.internationalization.okapi.nif.converter.HTMLBackConverter.convertBack(HTMLBackConverter.java:115)
        at eu.freme.bservices.internationalization.okapi.nif.converter.HTMLBackConverter.convertBack(HTMLBackConverter.java:82)
        at eu.freme.bservices.internationalization.api.InternationalizationAPI.convertBack(InternationalizationAPI.java:131)
        at eu.freme.bservices.controllers.pipelines.core.Conversion.convertBack(Conversion.java:62)
        at eu.freme.bservices.controllers.pipelines.core.PipelineService.chain(PipelineService.java:115)
        at eu.freme.bservices.controllers.pipelines.PipelinesController.pipeline(PipelinesController.java:92)
        at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.springframework.web.method.support.InvocableHandlerMethod.doInvok

and

ERROR   2016-10-05 11:06:28,807 [http-nio-8089-exec-1] eu.freme.common.exception.ExceptionHandlerService  - Request: http://rv1443.1blu.de:8089/pipelining/chain raised
eu.freme.common.exception.InternalServerErrorException: For input string: "//freme-project.eu/#offset_38_45"
        at eu.freme.bservices.controllers.pipelines.PipelinesController.pipeline(PipelinesController.java:120)
        at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source)

@jnehring jnehring changed the title [Pipelines] pipeline with nif-version=2.1 fails [Internationalization] pipeline with nif-version=2.1 fails Oct 5, 2016
@katia-vistatec
Contributor

Hi, I tested using these two requests in sequence:

curl -X POST --header "Content-Type: text/html" --header "Accept: text/html" --header "Cache-Control: no-cache" --data "@input.txt" "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=spot%2Clink&nif-version=2.1" > output.txt

curl -X POST --header "Content-Type: text/html" --header "Accept: text/html" --header "Cache-Control: no-cache" --data "@output.txt" "http://api-dev.freme-project.eu/current/e-terminology/tilde?source-lang=en&target-lang=de&nif-version=2.1" > out-output.txt

Here input.txt is an HTML file (content type text/html), and output.txt is the text/html output of the first request, which is sent as input to the second request.
The files are attached below.
I do not get the error. Can you try again now?

jnehring commented Oct 6, 2016

The error happens when executing the pipeline; I could not reproduce it using individual curl commands. The pipeline does not convert html -> turtle -> html in every step. It converts html -> turtle at the beginning, performs all pipeline steps on turtle, and converts back to html at the end. So the equivalent curl commands are:

curl -X POST --header "Content-Type: text/html" --header "Accept: text/turtle" --header "Cache-Control: no-cache" --data "@input.txt" "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=spot%2Clink&nif-version=2.1" > output.txt

curl -X POST --header "Content-Type: text/turtle" --header "Accept: text/turtle" --header "Cache-Control: no-cache" --data "@output.txt" "http://api-dev.freme-project.eu/current/e-terminology/tilde?source-lang=en&target-lang=de&nif-version=2.1"

Executing the two API requests one after another works. Since we cannot reproduce this behaviour using separate curl requests, I think the problem happens when the output HTML is created.

katia-vistatec commented Oct 6, 2016

Based on the log, a java.lang.NumberFormatException occurs in HTMLBackConverter.java, that is, in the second step of the pipeline when calling terminology, and in particular when reading the begin index from "//freme-project.eu/#offset_38_45".
I think there is a problem with the nif-version parameter. Even though nif-version=2.1 is set, the parameter is not received correctly and the version defaults to 2.0. The converter then parses a NIF 2.1 document as if it were NIF 2.0 and fails with a NumberFormatException when reading the begin index, because it expects the wrong identifier format. Maybe we could add some logging to verify the nif version.
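
To illustrate the mismatch, here is a minimal sketch of reading the offsets from both identifier styles. It is not the actual HTMLBackConverter code, which is not shown in this thread. The class and method names are hypothetical, and it assumes NIF 2.0 identifiers end in a fragment like "#char=38,45" while NIF 2.1 identifiers end in "#offset_38_45".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class NifOffsetSketch {
    // NIF 2.0 style fragment (assumed form), e.g. "#char=38,45"
    private static final Pattern NIF_20 = Pattern.compile("#char=(\\d+),(\\d+)$");
    // NIF 2.1 style fragment as seen in the error message, e.g. "#offset_38_45"
    private static final Pattern NIF_21 = Pattern.compile("#offset_(\\d+)_(\\d+)$");

    /** Returns {begin, end} for either fragment style; throws if neither matches. */
    static int[] parseOffsets(String uri) {
        for (Pattern pattern : new Pattern[] {NIF_20, NIF_21}) {
            Matcher matcher = pattern.matcher(uri);
            if (matcher.find()) {
                return new int[] {
                    Integer.parseInt(matcher.group(1)),
                    Integer.parseInt(matcher.group(2))
                };
            }
        }
        throw new IllegalArgumentException("Unrecognized NIF offset fragment: " + uri);
    }
}
```

A parser that only knows one of the two styles ends up handing a non-numeric string to Integer.valueOf when it sees the other style, which would be consistent with the NumberFormatException in the log.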

katia-vistatec commented Oct 6, 2016

I debugged locally using this curl:

curl -X POST -H "Content-Type: application/json" -H "Cache-Control: no-cache" -H "Postman-Token: cc799c16-5d39-accf-b81d-1aa4a48fb5c9" --data "@json.txt" "http://localhost:8080/pipelining/chain"

and with the attached json.txt (see the attachment below), in which I use http://localhost:8080/e-terminology/tilde as the endpoint.
I found that the nifVersion parameter that arrives at the InternationalizationAPI.java method Reader convertBack(InputStream markupsFile, InputStream enrichedFile, String nifVersion) is null. This causes the problem described above: when no value is set for the nif-version parameter, the version defaults to 2.0, so a NIF 2.1 document is handled as if it were NIF 2.0. The string freme-project.eu/#offset_38_45 is therefore not parsed correctly, which causes the errors.

@katia-vistatec
Contributor

json.txt

jnehring commented Oct 6, 2016

Thanks for the investigation. This is a tough bug. The pipeline itself has no knowledge of the nif version; it could only guess the version by analyzing the nif content. I chose another solution: I scan all pipeline requests, and if one of them contains a parameter "nif-version", I submit this nif version to e-internationalization. This implementation does not fix the bug yet, so I need to debug it once again. I will do it on Monday.

But I do not like this solution. Guessing the nif version from the content might be better. @m1ci do you know of an implementation that guesses the nif version that we can reuse here?
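
The scanning approach described above could look roughly like the following sketch. It is not the code that was merged. The class name is hypothetical, and each pipeline request is reduced to a plain map of its "parameters" object, as in the JSON body of the original curl request.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, for illustration only.
final class NifVersionScanSketch {
    /**
     * Returns the first "nif-version" value found among the parameter maps
     * of the pipeline requests, or an empty Optional if no request sets it.
     */
    static Optional<String> firstNifVersion(List<Map<String, String>> requestParameters) {
        return requestParameters.stream()
                .map(parameters -> parameters.get("nif-version"))
                .filter(version -> version != null)
                .findFirst();
    }
}
```

One weakness of this approach is that it picks the first value it finds, so requests that disagree about the version are not detected.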

m1ci commented Oct 6, 2016

> @m1ci do you know of an implementation that guesses the nif version that we can reuse here?

yes, see
https://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=all&outformat=turtle&informat=text&input=Diego%20Maradona%20is%20from%20Argentina.&nif-version=2.1

In the RDF you can see

<http://freme-project.eu/#collection>
        a               nif:ContextCollection ;
        nif:hasContext  <http://freme-project.eu/#offset_0_33> ;
        <http://purl.org/dc/terms/conformsTo>
                <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/2.1> .

which says that the collection containing the context http://freme-project.eu/#offset_0_33 conforms to NIF 2.1. This should help.
Also, we agreed that the default version is 2.0, and that the nif-version parameter can be used to set the version to 2.0 or 2.1.
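
A minimal sketch of guessing the version from this annotation follows. The class and method names are hypothetical, and it deliberately uses a plain text check that assumes the Turtle is serialized with full IRIs as in the snippet above. A real implementation would more likely parse the document with an RDF library and query for the dcterms:conformsTo statement, since a prefixed form would not be found by this check.

```java
// Hypothetical helper, for illustration only.
final class NifVersionGuessSketch {
    static final String CONFORMS_TO = "http://purl.org/dc/terms/conformsTo";
    static final String NIF_21_URI =
            "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/2.1";

    /**
     * Returns "2.1" if the Turtle text contains a conformsTo property and the
     * NIF 2.1 ontology IRI; otherwise falls back to the agreed default "2.0".
     */
    static String guessNifVersion(String turtle) {
        boolean declaresConformance = turtle.contains("<" + CONFORMS_TO + ">");
        boolean mentionsNif21 = turtle.contains("<" + NIF_21_URI + ">");
        return (declaresConformance && mentionsNif21) ? "2.1" : "2.0";
    }
}
```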

jnehring commented Oct 10, 2016

I added source code to guess the nif version. It checks whether the version is annotated in the nif document. I also added this code to the pipelines module. It still does not work, although the error message has changed. A debug message says that it detects nif version 2.1, so it hands the right version to internationalizationApi.convertBack(). The new version of pipelines is already merged into master and installed on freme-dev. The problem can be reproduced with the curl request above.

I think that the problem is now within e-internationalization.

The error stack trace is here: stacktrace.txt.

katia-vistatec commented Oct 10, 2016

Hi Jan, debugging locally I found that the nifConvertedFile-skeleton, which is parsed to obtain the HTML file as a string, contains "#char=" and not "#offset_" as expected.
The InternationalizationAPI method convertToTurtleWithMarkups(InputStream is, String mimeType, String nifVersion) throws ConversionException receives null for the nifVersion parameter. So the version is not set, and the converted nif files are produced as nif 2.0.

@jnehring
Member Author

Thank you for investigating this. I think we should create a nif-version parameter for pipelines, so that we set the version explicitly instead of guessing it. Therefore I created #115.

@jnehring jnehring changed the title [Internationalization] pipeline with nif-version=2.1 fails [Pipelines] pipeline with nif-version=2.1 fails Oct 12, 2016
@jnehring
Member Author

I am putting the solution here and closing #115.

We need the nif-version parameter in pipelines as well. It determines the nif version that is submitted to e-Internationalization at the beginning and at the end of the pipeline. The nif-version parameter of individual pipeline requests is not influenced by the nif-version parameter of the pipeline. It will be a parameter similar to visibility or persist, which gets its own field in the database. Currently it can take the values 2.0 and 2.1. This requires changes in

  • e-Internationalization itself
  • endpoint POST /pipelining/chain
  • endpoint POST /pipelining/templates
  • endpoint PUT /pipelining/templates/{id}
  • maybe GET /pipelining/templates/{id}
  • respective API documentation
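
As a rough sketch of how such a pipeline-level parameter might be validated, the following enum accepts only the two values named above and falls back to 2.0 when the parameter is absent. The enum and its method names are hypothetical and not part of the FREME code base. The default of 2.0 follows the agreement mentioned earlier in this thread.

```java
// Hypothetical type, for illustration only.
enum NifVersionSketch {
    NIF_2_0("2.0"),
    NIF_2_1("2.1");

    private final String label;

    NifVersionSketch(String label) {
        this.label = label;
    }

    /** The value as it appears in the nif-version request parameter. */
    String label() {
        return label;
    }

    /** Maps the raw parameter to a version; a missing value defaults to 2.0. */
    static NifVersionSketch fromParameter(String value) {
        if (value == null || value.isEmpty()) {
            return NIF_2_0;
        }
        for (NifVersionSketch version : values()) {
            if (version.label.equals(value)) {
                return version;
            }
        }
        throw new IllegalArgumentException("Unsupported nif-version: " + value);
    }
}
```

Rejecting unknown values at the endpoint would keep an unsupported version from reaching the conversion step.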

@jnehring
Member Author

For now we will not fix the bug.

@katia-vistatec
Contributor

Ok.

ArneBinder commented Oct 13, 2016

@jnehring if this is implemented and the pipeline model is changed, I think it would be really useful to also put useI18n into the pipeline.

But I don't think that many changes are needed to fix this bug in general; just three files of the Pipelines service need minor changes. The parameter nif-version has to be added to the endpoints POST /pipelining/chain and POST /pipelining/chain/{id}, which just forward it to PipelineService.chain. In the roundtripping case, this method should default it to 2.0 if necessary (and then possibly put it into every single PipelineRequest), and the methods convertToNif and convertBack have to use it. convertToNif needs a minor modification: it just has to forward the parameter to convertToTurtleWithMarkups and convertToTurtle (by analogy to convertBack), which are called with null at the moment.
That way the same nif version is used for conversion and back conversion, and no guessing is necessary. I don't know whether it should be possible, or whether it makes sense at all, to allow different nif versions within a single pipeline that does roundtripping, so I put that part in parentheses above.
Am I missing something?
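
The idea can be sketched as follows. This is an illustration under assumptions, not the actual PipelineService code: the Converter interface is a hypothetical stand-in for the e-Internationalization calls, and each pipeline step is reduced to a function from NIF text to NIF text.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical types, for illustration only.
final class RoundtripSketch {
    /** Stand-in for the conversion calls into e-Internationalization. */
    interface Converter {
        String convertToNif(String html, String nifVersion);
        String convertBack(String nif, String nifVersion);
    }

    /**
     * Converts the HTML input to NIF, runs all steps on the NIF document,
     * and converts back, using one and the same nif version for both conversions.
     */
    static String chain(String html, String nifVersion, Converter converter,
                        List<UnaryOperator<String>> steps) {
        // Default to 2.0 when the caller did not set nif-version.
        String version = (nifVersion == null) ? "2.0" : nifVersion;
        String nif = converter.convertToNif(html, version);
        for (UnaryOperator<String> step : steps) {
            nif = step.apply(nif);
        }
        return converter.convertBack(nif, version);
    }
}
```

Because the version is resolved once and passed to both conversions, the skeleton written at the beginning and the back conversion at the end cannot disagree about the identifier format.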
