Problem with some Unicode chars #4
Comments
when parsed without explicit encoding, are the string literals in the store then correct, or is the text corrupt?
Yep, the interesting thing is that without an explicit encoding it works just fine. The problem happens when an encoding is specified. The encoding can be omitted when reading from a file, but it must be specified in order to parse Drakma
The problematic code is in the
The function performs a kind of string trimming:
(collapse-whitespace " a b c ") => "a b c"
(collapse-whitespace "aaaa a b c ") => "aaaa a b c"
P.S. It looks like the binary bit-wise operations are meant to detect the kind of Unicode character or character group, i.e. see here. I think this code was written before the primary Lisp implementations supported Unicode/UTF, so Ora had to employ these binary tricks. It can be written much more simply now...
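For illustration, here is a minimal sketch of how the trimming behaviour described above could be written on a Unicode-aware Lisp by working on characters instead of bit patterns. It is not Wilbur's actual code: the names `whitespace-char-p` and `collapse-whitespace-sketch` are hypothetical, and the set of characters treated as whitespace is an assumption.

```lisp
;; Hypothetical helper (not part of Wilbur): true for common ASCII whitespace.
;; Which characters count as whitespace is an assumption made for this sketch.
(defun whitespace-char-p (char)
  (member char '(#\Space #\Tab #\Newline #\Return #\Page)))

;; Sketch of the behaviour described in the comment above: drop leading and
;; trailing whitespace and collapse each inner run of whitespace into a
;; single space. It iterates over characters, so non-ASCII text passes
;; through untouched on implementations with Unicode strings (SBCL, CCL).
(defun collapse-whitespace-sketch (string)
  (with-output-to-string (out)
    (let ((pending-space nil)   ; a whitespace run has been seen
          (seen-text nil))      ; some non-whitespace has been written
      (loop for char across string
            do (cond ((whitespace-char-p char)
                      (setf pending-space t))
                     (t
                      ;; Emit one space only between two pieces of text.
                      (when (and pending-space seen-text)
                        (write-char #\Space out))
                      (write-char char out)
                      (setf pending-space nil
                            seen-text t)))))))
```

Calling `(collapse-whitespace-sketch "  a   b  c  ")` returns `"a b c"`. Whether Wilbur's parser relies on any other side effects of the original function is not checked here.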
I had the same experience as @vityok.
I suspect that Wilbur's whole external interface is weak and based on old limitations of libraries and Lisp implementations. If so, rather than trying to fix Wilbur's Unicode support, we should try to replace Wilbur's parser with the cl-rdfxml parser. Something like the code below worked for me. I still have to improve it a lot and handle the blank-node instances created by cl-rdfxml (http://www.cs.rpi.edu/~tayloj/CL-RDFXML/#blank_nodes).
;; Convert a PURI URI object into a Wilbur node; pass anything else through.
(defun puri-to-node (s)
  (if (typep s 'puri:uri)
      (w:node (puri:render-uri s nil))
      s))
;; Start with a fresh in-memory database.
(setf w:*db* (make-instance 'wilbur:db))
;; Parse the RDF/XML document at PATH with cl-rdfxml and add each triple to Wilbur.
(defun parse-rdfxml (path)
  (cl-rdfxml:parse-document (lambda (a b c)
                              (w:add-triple (w:triple (puri-to-node a)
                                                      (puri-to-node b)
                                                      (puri-to-node c))))
                            path))
What do you think? Is that a good direction? Of course this adds dependencies to Wilbur, but that, in my opinion, is good and follows the recent suggestion at http://fare.livejournal.com/169346.html
good evening, alex;
On 2013-01-10, at 21:19, Alexandre Rademaker wrote:
yes, in general fare is correct. the problem is, it is not always what else?
Currently Wilbur works with in-memory RDF databases, but I've found that there are already efforts to create a persistence layer for Wilbur (see Wiki) and there is I guess that it would be very nice to bring some of them together to make a feature-rich RDF storage/processing engine. P.S. Here is, for example, Twinql, a SPARQL engine built on top of Wilbur. But the project is not actively developed (according to the description), and it would be very unfortunate if it remained so...
Can we have a solution for this issue? Actually, for me it doesn't work with or without the
Sorry @vityok, I just saw your PR #5 from 5 years ago. It looks like this repo is abandoned, so I will fork it. But how do we get Quicklisp updated? I opened an issue at quicklisp/quicklisp-projects#1593
It looks like Wilbur has a problem with certain Unicode chars in certain circumstances.
Code to reproduce:
Produces error both on CCL and SBCL:
But everything works fine if the external format is not specified:
Produces:
That then can be successfully queried.
The problem is even more evident when using flexi-streams.