w3cDom.asString: Namespace for prefix 'xxx' has not been declared #2087

Open
richardmorleysmith opened this issue Dec 19, 2023 · 5 comments
Labels
needs-more-info More information is needed from the reporter to progress the issue

Comments

@richardmorleysmith

I noticed this crash when calling w3CDom.asString() on a site built with Vue.js. The page used "v-bind" as the attribute prefix, where the test below uses "xxx".

To reproduce you can run the following test case:

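import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;

import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;
import org.jsoup.nodes.Document;
import org.junit.jupiter.api.Test;

@Test
void testNameSpaceCrash()
{
    final W3CDom w3CDom = new W3CDom().namespaceAware(false);
    // The "xxx" prefix is never declared with an xmlns:xxx attribute.
    final String html = """
        <html>
        <body>
        <div xxx:class="test"></div>
        </body>
        </html>""";
    final Document jSoupDoc = Jsoup.parse(html);
    // Convert the jsoup document into a W3C DOM document.
    final org.w3c.dom.Document w3CDoc = w3CDom.fromJsoup(jSoupDoc);

    // Currently fails: the serializer rejects the undeclared prefix.
    assertDoesNotThrow(() -> w3CDom.asString(w3CDoc));
}
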
@jhy
Owner

jhy commented Dec 23, 2023

I'm not sure what the best way to handle this is. The exception is coming out of the JDK's XML serializer (com.sun.org.apache.xml.internal.serializer), and it's always going to throw an exception if an attribute has an undeclared prefix.

A couple of options:

  1. When creating the attribute, jsoup could set an arbitrary namespace URI for an undeclared prefix. The output would be something like: <div xmlns:xxx="undefined" xxx:class="test"></div>
  2. Or, we could escape the : in the attribute key and so the output would be: <div xxxU00003Aclass="test"></div>

Option 1 is probably more compatible (in that in this instance of Vue, the JS would still execute).
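
For illustration, here is a minimal sketch of what option 1 amounts to at the W3C DOM level, using only JDK classes. It is not how jsoup implements (or would implement) the fix. The class name `UndeclaredPrefixSketch`, the method `serializeWithDeclaredPrefix`, and the constant `PLACEHOLDER_NS` are hypothetical names introduced here. The idea is to bind the prefix to an arbitrary namespace URI on the element before adding the prefixed attribute, so the serializer has a declaration to resolve.

```java
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch class; not part of jsoup.
class UndeclaredPrefixSketch {
    // Assumption: an arbitrary placeholder URI, mirroring the "undefined" value from option 1.
    static final String PLACEHOLDER_NS = "undefined";

    // Builds <div> with a declared prefix and a prefixed attribute, then serializes it.
    static String serializeWithDeclaredPrefix(String prefix, String localName, String value) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element div = doc.createElementNS(null, "div");
            doc.appendChild(div);

            // Declare the prefix on the element, e.g. xmlns:xxx="undefined".
            div.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:" + prefix, PLACEHOLDER_NS);
            // Add the prefixed attribute bound to that placeholder namespace.
            div.setAttributeNS(PLACEHOLDER_NS, prefix + ":" + localName, value);

            // Identity transform: serialize the DOM tree to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // The printed markup should carry both the xmlns:xxx declaration and xxx:class.
        System.out.println(serializeWithDeclaredPrefix("xxx", "class", "test"));
    }
}
```

The attribute keeps its original qualified name in the output, which is why this route leaves markup such as Vue's prefixed attributes recognizable, at the cost of an extra xmlns declaration.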

Can you add some detail to your use case -- what are you trying to do with using this W3C interface and serialization vs the jsoup document serialization?

@jhy changed the title from "Namespace for prefix 'xxx' has not been declared" to "w3cDom.asString: Namespace for prefix 'xxx' has not been declared" on Dec 23, 2023
@jhy added the needs-more-info label (More information is needed from the reporter to progress the issue) on Dec 30, 2023
@richardmorleysmith
Author

Hi @jhy,

I'm using a JavaFX WebView to load web pages. It gives us a W3C document, which I then convert into a String using the W3CDom class provided by jsoup.

@richardmorleysmith
Author

Hey @jhy, just wondering if there were any updates on this one? Is there still more info you need? :)

@jhy
Owner

jhy commented Jul 1, 2024

Sorry for the late reply. Thanks for the use case info. So, I think my suggested option 1 would be best? Or do you have another suggestion?

@richardmorleysmith
Author

Hi @jhy, I agree that option 1 would work best!
