<html>
<head></head>
<body>
<div id="emid"> <p≯̢̩̫̠̉̊ͦͤͭ̊..̷͙ͯ̊̽̓͆̉ͫ.͇̪ͧ̅́>
< p="">
</p≯̢̩̫̠̉̊ͦͤͭ̊..̷͙ͯ̊̽̓͆̉ͫ.͇̪ͧ̅́><>
</div>
</body>
</html>
org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 21; Element type "p" must be followed by either attribute specifications, ">" or "/>".
at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
at Test2.main(Test2.java:31)
Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 21; Element type "p" must be followed by either attribute specifications, ">" or "/>".
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
at Test2.main(Test2.java:31)
Process finished with exit code 1
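Test2.java itself isn't shown in the thread, so the following is only a hypothetical stand-in for what it appears to do: hand the serialized output to a SAX parser and see whether parsing completes. It uses the JDK's built-in SAX parser rather than the standalone org.apache.xerces jar from the trace, so the exact error wording may differ. The class name WellFormedCheck and the minimal input fragment (a tag name "p" immediately followed by U+226F) are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    /** Returns null if xml parses as well-formed XML, otherwise a description of the first error. */
    public static String check(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler ignores content events and rethrows fatal errors,
            // so parsing either completes or throws.
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            // ParserConfigurationException, other SAXExceptions, IOException
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Hypothetical minimal stand-in for the jsoup output: "p" followed by U+226F in a tag name.
        String broken = "<html><body><p\u226F></p\u226F></body></html>";
        String fine = "<html><body><p>ok</p></body></html>";
        System.out.println(check(fine));   // prints "null"
        System.out.println(check(broken)); // prints the parser's fatal-error description
    }
}
```

If the fragment reproduces the report, the second line should name element type "p", because the parser ends the name at the first character that isn't allowed in an XML Name.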
Hi, we are a student group and we would like to take a crack at this. We can't guarantee that we'll be able to complete it with high enough quality, but we'd like to try.
Hello! I don't think there is an error with document.outputSettings().charset("ASCII");. If you paste "\u226F\u0322\u0329\u032B\u0320\u0309\u030A" into an online Unicode converter, you can see that it does decode to "≯̢̩̫̠̉̊". Note also that a character such as "\u226F" has no corresponding ASCII character.
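The same check can be done with the JDK alone instead of an online converter. The sketch below lists each code point of the quoted sequence and asks a US-ASCII encoder whether it can represent it; in Java, the charset name "ASCII" is an alias for US-ASCII. The class name AsciiCheck and its method are hypothetical, chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class AsciiCheck {

    // The escape sequence quoted above, as a Java string literal (seven code points).
    public static final String SAMPLE = "\u226F\u0322\u0329\u032B\u0320\u0309\u030A";

    /** True if every character of s can be encoded with the charset named "ASCII" (US-ASCII). */
    public static boolean encodableAsAscii(String s) {
        CharsetEncoder encoder = Charset.forName("ASCII").newEncoder();
        return encoder.canEncode(s);
    }

    public static void main(String[] args) {
        // One line per code point, in U+XXXX form, with its ASCII encodability.
        SAMPLE.codePoints().forEach(cp -> System.out.printf(
                "U+%04X encodable=%b%n", cp, encodableAsAscii(new String(Character.toChars(cp)))));
        System.out.println(encodableAsAscii("p"));    // true
        System.out.println(encodableAsAscii(SAMPLE)); // false
    }
}
```

Every code point in the sample is above U+007F, so none of them can be written as a raw ASCII byte; the serializer has to escape them or the output is not ASCII.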
You can try the code below, which shows that jsoup behaves correctly.
The parsed html is clearly weird and broken, but my assumption is that the output, after re-serializing it, should be valid.
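Whether re-serialized output can be valid XML depends partly on the element names themselves. The sketch below encodes the Name productions [4] (NameStartChar) and [4a] (NameChar) from the XML 1.0 Fifth Edition; the parser in the trace may implement an earlier edition's character tables, so treat this as an approximation. It suggests why Xerces stops right after "p": U+226F lies outside both ranges, whereas the combining marks U+0300 to U+036F that follow it are allowed as non-initial name characters. XmlNames and its methods are hypothetical names for illustration.

```java
public class XmlNames {

    /** NameStartChar, XML 1.0 (Fifth Edition) production [4]. */
    public static boolean isNameStartChar(int c) {
        return c == ':' || (c >= 'A' && c <= 'Z') || c == '_' || (c >= 'a' && c <= 'z')
                || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6) || (c >= 0xF8 && c <= 0x2FF)
                || (c >= 0x370 && c <= 0x37D) || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
                || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF) || (c >= 0x3001 && c <= 0xD7FF)
                || (c >= 0xF900 && c <= 0xFDCF) || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
    }

    /** NameChar, production [4a]: NameStartChar plus '-', '.', digits, U+00B7 and two more ranges. */
    public static boolean isNameChar(int c) {
        return isNameStartChar(c) || c == '-' || c == '.' || (c >= '0' && c <= '9') || c == 0xB7
                || (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
    }

    /** Length in chars of the longest prefix of s that is a valid XML Name (0 if none). */
    public static int namePrefixLength(String s) {
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            boolean ok = (i == 0) ? isNameStartChar(cp) : isNameChar(cp);
            if (!ok) {
                break;
            }
            i += Character.charCount(cp);
        }
        return i;
    }

    /** True if the whole of s is a valid XML Name. */
    public static boolean isValidName(String s) {
        return !s.isEmpty() && namePrefixLength(s) == s.length();
    }

    public static void main(String[] args) {
        // Start of the tag name from the output: "p", U+226F, then two combining marks.
        String tag = "p\u226F\u0322\u0329";
        System.out.println(isValidName(tag));        // false
        System.out.println(namePrefixLength(tag));   // 1, i.e. only "p" is a Name
        System.out.println(isValidName("p\u0322"));  // true: combining marks may follow the first char
    }
}
```

A name prefix of length 1 lines up with the message in the trace, Element type "p" must be followed by either attribute specifications, ">" or "/>".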
document.outputSettings().charset("ASCII");
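With the output charset set to ASCII, one would expect jsoup to write characters it can't encode as character references rather than raw bytes. The following is a minimal sketch of that idea, assuming hexadecimal numeric references; escapeNonAscii is a hypothetical helper, not jsoup's implementation, and jsoup may choose named entities depending on its escape mode.

```java
public class AsciiEscape {

    /**
     * Hypothetical helper: escapes the markup characters '&', '<', '>' and replaces every
     * code point above U+007F with a hexadecimal numeric character reference such as "&#x226f;".
     */
    public static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String name = "p\u226F";
        // In text content the reference is legal XML and decodes back to the original character.
        System.out.println("<p>" + escapeNonAscii(name) + "</p>"); // <p>p&#x226f;</p>
        // In a tag name the same escaping gives "<p&#x226f;>", which is still not well-formed.
        System.out.println("<" + escapeNonAscii(name) + ">");      // <p&#x226f;>
    }
}
```

The second line shows the limitation: XML recognizes character references only in character data and attribute values, never inside element names, so no escaping of this kind can turn such a tag name into a valid Name.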
Version: 1.13.1
output: