[Pipelines] pipeline with nif-version=2.1 fails #113
Comments
The error seems to originate from within internationalization. Error log:
and
Hi, I tested using these two requests in sequence:

    curl -X POST --header "Content-Type: text/html" --header "Accept: text/html" \
         --header "Cache-Control: no-cache" --data "@input.txt" \
         "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=spot%2Clink&nif-version=2.1" \
         > output.txt

    curl -X POST --header "Content-Type: text/html" --header "Accept: text/html" \
         --header "Cache-Control: no-cache" --data "@output.txt" \
         "http://api-dev.freme-project.eu/current/e-terminology/tilde?source-lang=en&target-lang=de&nif-version=2.1" \
         > out-output.txt

Here input.txt is a file whose content type is text/html, and output.txt (the output of the first request, also text/html) is sent as input to the second request.
The error happens when executing the pipeline. I could not reproduce it using individual curl commands.

The pipeline does not convert from html -> turtle -> html in every step. It converts from html -> turtle at the beginning, performs all pipeline steps with turtle, and converts back to html at the end. So the curl commands are:

    curl -X POST --header "Content-Type: text/html" --header "Accept: text/turtle" \
         --header "Cache-Control: no-cache" --data "@input.txt" \
         "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=spot%2Clink&nif-version=2.1" \
         > output.txt

    curl -X POST --header "Content-Type: text/turtle" --header "Accept: text/turtle" \
         --header "Cache-Control: no-cache" --data "@output.txt" \
         "http://api-dev.freme-project.eu/current/e-terminology/tilde?source-lang=en&target-lang=de&nif-version=2.1"

Executing the two API requests one after another works. Since we cannot reproduce the behaviour with separate curl requests, I think the problem happens when the output HTML is created.
Based on the log, a java.lang.NumberFormatException occurs in HTMLBackConverter.java, that is, in the second step of the pipeline when calling terminology, and specifically when getting the begin index of "//freme-project.eu/#offset_38_45".
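To make the failing step concrete, here is a minimal sketch of extracting the begin and end indices from such a URI. It is not the code of HTMLBackConverter.java: `parse_offset_fragment` is a hypothetical helper, and it assumes the fragment has the shape `offset_<begin>_<end>` seen in the URI quoted above. It rejects any other fragment with an explicit message rather than failing on a number conversion.

```python
import re
from typing import Tuple
from urllib.parse import urldefrag

# Assumed fragment layout, taken from the URI quoted in the log:
# offset_<begin>_<end>
_OFFSET_FRAGMENT = re.compile(r"offset_(\d+)_(\d+)")


def parse_offset_fragment(uri: str) -> Tuple[int, int]:
    """Return (begin, end) from a URI whose fragment is offset_<begin>_<end>."""
    fragment = urldefrag(uri).fragment
    match = _OFFSET_FRAGMENT.fullmatch(fragment)
    if match is None:
        # Report the offending fragment instead of a bare conversion error.
        raise ValueError(f"unexpected fragment {fragment!r} in {uri!r}")
    begin, end = int(match.group(1)), int(match.group(2))
    if begin > end:
        raise ValueError(f"begin index {begin} is after end index {end}")
    return begin, end


if __name__ == "__main__":
    print(parse_offset_fragment("//freme-project.eu/#offset_38_45"))
```

Validating the whole fragment up front means a differently shaped fragment is reported by name, which makes this kind of failure easier to trace in a log.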
I debugged locally using this curl:

    curl -X POST -H "Content-Type: application/json" -H "Cache-Control: no-cache" \
         -H "Postman-Token: cc799c16-5d39-accf-b81d-1aa4a48fb5c9" --data "@json.txt" \
         "http://localhost:8080/pipelining/chain"

and with the attached json.txt (see the attachment below), in which I use http://localhost:8080/e-terminology/tilde as the endpoint.
Thanks for the investigation. This is a tough bug. The pipeline itself has no idea of the NIF version. We can only guess the NIF version by analyzing the NIF content.

I chose another solution. I scan all pipeline requests, and if one of the requests contains a parameter "nif-version", I submit this NIF version to e-internationalization. This implementation does not fix the bug yet; I need to debug it once again and will do it on Monday. But I do not like this solution. Guessing the NIF version from the content might be better.

@m1ci do you know of an implementation that guesses the NIF version that we can reuse here?
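The scanning approach described above could look roughly like the following sketch. The JSON layout is an assumption made for illustration: a pipeline is taken to be an array of request objects with an `endpoint` string and an optional `parameters` object. `find_nif_version` is a hypothetical name, not a function of the Pipelines service.

```python
import json
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def find_nif_version(pipeline_json: str) -> Optional[str]:
    """Return the first nif-version found in any pipeline request, else None."""
    for request in json.loads(pipeline_json):
        # Assumed layout: explicit parameters live in a "parameters" object.
        version = request.get("parameters", {}).get("nif-version")
        if version is None:
            # Fall back to a query string embedded in the endpoint URL.
            query = parse_qs(urlsplit(request.get("endpoint", "")).query)
            version = query.get("nif-version", [None])[0]
        if version is not None:
            return str(version)
    return None
```

Taking the first match is a simplification. If two requests named different versions, this sketch would silently pick the earlier one, which is one reason an explicit pipeline-level setting may be preferable to scanning.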
In the RDF you can see
which says that the context http://freme-project.eu/#offset_0_33 conforms to NIF 2.1. This should help.
I added source code to guess the NIF format. It checks whether the version is annotated in the NIF document. I also added this code to the pipelines module. It still does not work, although the error message has changed. A debug message says that it detects NIF version 2.1, so it hands the right version to internationalizationApi.convertBack(). The new version of pipelines is already merged into master and installed on freme-dev. The problem can be reproduced with the curl request above. I think the problem is now within e-internationalization. The error stack trace is here: stacktrace.txt.
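As an illustration of what "checks whether the version is annotated" can mean, here is a text-level sketch. It is not the code that was merged. It assumes the annotation is a `conformsTo` statement whose object IRI ends in the version number, and it assumes 2.0 as the fallback when no annotation is present; both are assumptions, and the IRI in the usage example is a placeholder.

```python
import re

# Matches "...conformsTo <IRI>" (prefixed or full-IRI predicate) and captures
# the object IRI. This is a heuristic on the serialized text, not an RDF parser.
_CONFORMS_TO = re.compile(r"conformsTo>?\s+<([^>]+)>")
# Assumption: the version is the last path segment of the IRI, e.g. ".../2.1".
_TRAILING_VERSION = re.compile(r"(\d+\.\d+)/?$")


def guess_nif_version(turtle: str, default: str = "2.0") -> str:
    """Return the annotated NIF version, or `default` if none is annotated."""
    for iri in _CONFORMS_TO.findall(turtle):
        match = _TRAILING_VERSION.search(iri)
        if match:
            return match.group(1)
    return default


if __name__ == "__main__":
    # Placeholder IRI for illustration only.
    doc = ("<http://freme-project.eu/#offset_0_33> "
           "dcterms:conformsTo <http://example.org/nif-core/2.1> .")
    print(guess_nif_version(doc))
```

A real implementation would more likely query the parsed RDF model for the statement instead of matching on text, since serializations can differ in whitespace and prefix usage.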
Hi Jan, debugging locally I found that the nifConvertedFile-skeleton, which is parsed to get the HTML file as a string, has "#char=" and not "#offset_" as expected.
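A quick way to confirm such a mismatch is to count which fragment style a document actually uses. The sketch below is a hypothetical diagnostic, not part of FREME. It assumes that "#char=" fragments look like `#char=<begin>,<end>` and "#offset_" fragments like `#offset_<begin>_<end>`.

```python
import re
from typing import Dict

# Assumed fragment shapes; only the "#char=" and "#offset_" prefixes are
# confirmed by the thread, the numeric layout is an assumption.
_SCHEMES = {
    "offset": re.compile(r"#offset_\d+_\d+"),
    "char": re.compile(r"#char=\d+,\d+"),
}


def count_fragment_schemes(text: str) -> Dict[str, int]:
    """Count occurrences of each fragment style in a serialized document."""
    return {name: len(pattern.findall(text)) for name, pattern in _SCHEMES.items()}


def uses_only(text: str, expected: str) -> bool:
    """True if `expected` occurs at least once and no other style occurs."""
    counts = count_fragment_schemes(text)
    others = sum(n for name, n in counts.items() if name != expected)
    return counts[expected] > 0 and others == 0
```

Running `uses_only(skeleton_text, "offset")` on the skeleton before converting back would flag the situation described above, where the skeleton carries "#char=" fragments while the annotated document uses "#offset_".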
Thank you for investigating this. I think we should create a nif-version parameter for pipelines, so that we do not guess the version but set it explicitly. Therefore I created #115.
I put the solution here and close #115.

We need the nif-version parameter in pipelines as well. It determines the NIF version that is submitted to e-Internationalization at the beginning and at the end of the pipeline. The nif-version parameter of individual pipeline requests is not influenced by the nif-version parameter of the pipeline. This will be a parameter similar to visibility or persist, which gets its own field in the database. Currently it can take the values 2.0 and 2.1. This requires changes in
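The proposed pipeline-level parameter can be sketched as a validated field next to visibility and persist. This is only an illustration of the idea: `PipelineSettings`, its field names, and its default values are hypothetical and do not describe the actual pipeline model or database schema.

```python
from dataclasses import dataclass

# The two values named in the proposal.
ALLOWED_NIF_VERSIONS = ("2.0", "2.1")


@dataclass
class PipelineSettings:
    """Hypothetical pipeline-level settings; defaults are assumptions."""
    visibility: str = "public"
    persist: bool = False
    nif_version: str = "2.0"

    def __post_init__(self) -> None:
        # Reject anything other than the currently supported versions.
        if self.nif_version not in ALLOWED_NIF_VERSIONS:
            raise ValueError(
                f"nif-version must be one of {ALLOWED_NIF_VERSIONS}, "
                f"got {self.nif_version!r}"
            )
```

In this sketch the setting only governs the conversions at the start and end of the pipeline. The nif-version parameters of the individual requests are left as they are, in line with the proposal.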
For now we will not fix the bug.
Ok.
@jnehring if this is implemented and the pipeline model is changed, I think it would be really useful to put also
But to fix this bug in general, I don't think that many changes are needed; just three files of the Pipelines service need minor changes. The parameter
This curl
fails with error message
It works when I remove nif-version=2.1 from both API calls.