Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mathml #200

Open
wants to merge 14 commits into
base: master
Choose a base branch
from
Open

Mathml #200

wants to merge 14 commits into from

Conversation

xaviripo
Copy link

Hi,

My name is Xavier Ripoll, I'm a developer at MathType and colleague of @manuelwiris, who contacted you a couple of months ago asking about a MathML module.

As he stated, we are working on a MathML module of our own for HTML Purifier. Currently, the module has been completed and is ready for release. We share the concerns you raised regarding security so we did an exhaustive job implementing the specification. To back our work, we also created three suites of tests.

One of them is included within the mathml branch (the one intended to merge) and consists of the Mozilla MathML Torture Test.

The two others are the W3C MathML Test Suite and 140+ custom MathML snippets. These are included only in the testing branch because of their size. If you prefer, though, they can be included in the main branch as well.

All three suites of tests are passing in our local environment with all PHP versions. Nonetheless, in older PHP versions (<=5.5) the execution fails in Travis due to an out-of-memory exception we believe to be caused by the size of the schema.ser file. We'd appreciate any feedback from your side about what might be causing the error.

You can check the results of the mathml (https://travis-ci.org/wiris/htmlpurifier/builds/453971161) and testing (https://travis-ci.org/wiris/htmlpurifier/builds/453986321) branches on Travis.

Given all of this, we think HTML Purifier should benefit from the inclusion of the MathML module we propose.

Thank you,
Xavier

P.S. We have also detected some other issues with the HTML Purifier package, is the GitHub tracker the appropriate place to report these?

@ezyang
Copy link
Owner

ezyang commented Nov 13, 2018

P.S. We have also detected some other issues with the HTML Purifier package, is the GitHub tracker the appropriate place to report these?

Yes please.

I haven't looked at the patch yet, but we'll have to work out some solution to the Travis OOM problem. Maybe as simple as just bumping up the max memory size in PHP config.

$E['XLINK.prefix'] = 'xlink';

$proprietary_att_wrs = array(
'wrs:valign' => 'CDATA',
Copy link
Owner

@ezyang ezyang Nov 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enum#middle,middle-baseline?

// Prefix used for xlink attrs; is not specified by the MathML DTD
$E['XLINK.prefix'] = 'xlink';

$proprietary_att_wrs = array(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these attributes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dsi:* and wrs:* are a set of proprietary styling attributes which are pretty common in many MathML documents.
In fact, they appear in the standard W3C testing suite (see e.g. https://www.w3.org/Math/testsuite/mml2-testsuite/Presentation/DynamicExpressions/maction/mactionBhigh1.xml) which we use to validate the MathML module in HTMLPurifier.
Even thought they are not part of the standard, we believe them to be important enough to be validated just like the rest of the attributes.

$E['CommonAtt'] = array_merge(
array(
'xmlns' => 'Bool#http://www.w3.org/1998/Math/MathML',
$E['XLINK.prefix'] . ':href' => 'CDATA',
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks XSSable


$E['DefEncAtt'] = array(
'encoding' => 'CDATA',
'definitionurl' => 'CDATA'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks XSSable

'id' => 'CDATA', // MathML allows multiple elements with same ID
'xref' => 'CDATA',
'class' => 'CDATA',
'style' => 'CDATA',
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely XSSable

@ezyang
Copy link
Owner

ezyang commented Nov 13, 2018

This is a really good start. However, there is too heavy use of CDATA, especially on fields which are known to be XSS-able if you have arbitrary string data. The next step is to go through the CDATA fields and audit them. Use existing attribute definitions as much as possible, and otherwise clamp down on possible values.

@xaviripo
Copy link
Author

I went and updated the most important fields that were previously CDATA, some new AttrDefs were added in order to do so.

@ezyang
Copy link
Owner

ezyang commented Nov 17, 2018

OK. This may take some time to review because I need to audit every remaining occurrence of CDATA to confirm that there isn't security risk.

@mcagigas-at-wiris
Copy link

Hi @ezyang,

Hope you are well.

My name is Manuel, Plugin Manager at WIRIS.

I'm, writing to see if there are some prevision to include this update in a new version of HTML Purifier and to see if we can help in any way to accelerate the integration process.

Best,

@rohitcoder
Copy link

Hi there, i am going through multiple sanitizer libs to use it with CKeditor MathType / ChemType and i found this thread, can anyone please tell me how to sanitize MathType tags properly? It's getting little bit confusing.

@rohitcoder
Copy link

Just in case, if anyone is looking for sanitizer for MathMl or ChemType / MathType you can use this https://github.com/rohitcoder/html-sanitizer/ this is improved and upgraded version of https://github.com/tgalopin/html-sanitizer/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants