Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(PHP Template) A single apostrophe ' breaks php-template highlighting #4152

Open
samuelgfeller opened this issue Oct 24, 2024 · 16 comments
Open
Labels
bug good first issue Should be easier for first time contributors help welcome Could use help from community language

Comments

@samuelgfeller
Copy link

samuelgfeller commented Oct 24, 2024

Describe the issue
An apostrophe ' that exists on its own without being closed seems to break highlighting of everything that comes after it even if it's in a comment or escaped but only with the language php-template. With php it works fine as it should.

Funnily if anywhere later this ' is "closed" it works again.

Which language seems to have the issue?
php-template

Are you using highlight or highlightAuto?
Highlight, not auto.

Sample Code to Reproduce
Code: See expected behaviour

Not highlighted:

<?php
// This won't be highlighted unless there is another 
// single apostrophe later.
?>
<p>Some HTML <?= '\code' ?></p>

image

Correctly highlighted:

<?php
// This won't be highlighted unless there is another 
// single apostrophe later.
?>
<p>Some HTML <?= '\'code' ?></p>

image

Expected behavior
Github highlights the php bits:

<?php
// This won't be highlighted unless there is another 
// single apostrophe later.
?>
<p>Some HTML <?= '\code' ?></p>

I expect it to highlight the code correctly even if not every ' is closed. In a comment for instance, English words allow for shortcuts with ' for instance "don't" or "can't".

Additional observations
Its breaks only if the apostrophe is in PHP code (not HTML) and if it's not in a code block comment /* such as this, that won't cause any troubles */.
It's always everything that comes after it that is broken not before so if half the file doesn't contain a ' and then suddenly a php part with a ' either in a // comment ' or in a string 'string \' with an apostrophe' the bottom half of the file is not highlighted.

@samuelgfeller samuelgfeller added bug help welcome Could use help from community language labels Oct 24, 2024
@joshgoebel
Copy link
Member

joshgoebel commented Oct 24, 2024

If this is correct PHP then I don't think we properly fully support all types of PHP comments or all nuances of strings. Our grammar specifically says that single quoted strings may NOT contain any type of escape characters... where as \' above appears to be an escape...

Thoughts?

@samuelgfeller
Copy link
Author

samuelgfeller commented Oct 25, 2024

Hmm thats a good question. But in the bug I tried to demonstrate the escape char is not relevant. I've just used it in the example so it stays valid php but without it or with normal quotes around that string, the same issue appears. Also if there is no escape at all, just php code and then one comment in a code block that contains a "don't" or "can't" all highlight breaks aftwards. Have you tried it in the demo?

@samuelgfeller
Copy link
Author

samuelgfeller commented Oct 25, 2024

Our grammar specifically says that single quoted strings may NOT contain any type of escape characters

That's interesting. To my knowledge a lot of programming/script languages (php, python, js, ruby, java, perl) support escape chars inside a single quoted string.

I've asked chatgpt about it and here is the answer on what can and cannot be escaped in single quote php strings:

In PHP, single-quoted strings support very limited escaping. Within single quotes, only two characters can be escaped with a backslash:

  1. Single Quote (') - To include a single quote inside a single-quoted string.
  2. Backslash (\) - To include an actual backslash character.

Other escape sequences, like \n for newlines or \t for tabs, will not be interpreted in single-quoted strings in PHP. They will simply be treated as literal characters.

@joshgoebel
Copy link
Member

joshgoebel commented Oct 25, 2024

the escape char is not relevant. ... Also if there is no escape

It is very relevant - if we understood the escape we'd know that that quote isn't a terminator at all. The issue here is two fold:

  • Our grammar doesn't fully understand comments
  • Our grammar doesn't understand single quote escaping at all

These combined create the issues you are seeing. To fix this both php and php-template need to fully comprehend the rules for both comments and strings.

  • php will need to support all comment types (it may already)
  • php will need to support strings fully (including limited escapes within ' strings)
  • php-template will need to understand all comment types (and skip them)
  • php-template will need to understand all strings (and skip them)

The way sublanguage processing works (language within language) is almost like "two phase"... the first parsing pass has to consume the entire block as a single unit - and then pass it off to be reparsed by the second grammar. That means a lot of rules that half way understand the content (skipping strings, comments, etc), and just skip it - until we find the closing pattern. We have to skip strings and comments because a ?> hidden within a string or comment wouldn't be a termination at all - so to prevent those false positives we have to match strings and comments in the "outer" grammar...

@joshgoebel joshgoebel added the good first issue Should be easier for first time contributors label Oct 25, 2024
@joshgoebel
Copy link
Member

Have you tried it in the demo?

I use the developer tool, it's much more powerful than the demo. :-)

@joshgoebel
Copy link
Member

If I whip up a fix are you comfortable building from source and validating - perhaps throwing addition test cases at it trying to break it?

@samuelgfeller
Copy link
Author

If you explain to me each step in detail I'd gladly do that! See me as a beginner.

@joshgoebel
Copy link
Member

Might want to start here:

https://highlightjs.readthedocs.io/en/latest/building-testing.html

Once you've built a node build you can run markup tests (and write a few)... once you build a browser build you can use the developer tool for testing. I'd suggest reading all the docs and looking at other grammars (and built in modes) for examples of how things work.

I'd start with php, making sure it highlights all comments, strings, and escapes correctly... after that it should just be coping those key rules into php-template and making then scopeless/skip rules. Do one small change at a time, test, repeat.

For example you might first want to start with the inherited single quoted string, where it removes all the escapes by niling contains - and then add the two needed escapes instead, etc...

@samuelgfeller
Copy link
Author

Okay I built it locally and experimented with the developer tool.

I tried to write test cases that cover the missing highlighting of the escaped char (php) and single quote issue in the comments and escaped in a string (php-template).

In the phpstorm screenshot I show what would be correct.

php

The language php doesn't have an issue with ' in the comments. It skips it correctly.

php test case 1

  • Highlight.js interprets ?> as part of the comment on the first line.
  • Escaped chars are highlighted the same as if they were part of the literal string.
<?php // The closing php tag on the same line is not part of this comment ?>

<?php
// I don't know if you want to highlight the escaped single quote and backslash differently
// as they are escaped and not part of the literal string.
// PHPStorm does highlight it differently, which I find very pleasing, but GitHub does not.
echo 'escaped \' and \n and \\';
?>

<?php
// Only variables are highlighted differently. Would be awesome to be able to distinguish escaped chars.
echo "\n is and \t are interpreted and therefore should not have the same color as \"string\". 
The same goes for the escaped double quotes and $variables";
?>

Highlight.js
image

PHPStorm
image

php-template

I found out it's not only the single quote but also the double quote that breaks highlighting if escaped in a string
or in a comment.

The quotes issue vanishes with the same snippet if "php" is chosen as language.

php-template test case 1

  • Escaped single quote within single-quote strings breaks highlighting
  • Escaped double quote within double-quote string breaks highlighting
<?= "--> has no impact: ' " ?>
<?= '--> also has no impact: " ' ?>

<?= 'breaks it \'' ?>
<?= "breaks it \"" ?>

<?php // Escaped char \\ and \". \n \t not interpreted.
echo '\n \t \\ \"'; ?>

<?php // Escaped chars \n \t \\ \' all interpreted.
echo "\n \t \\"; ?>

<!-- HTML comment -->
<span spellcheck="false"><?= 'php string' ?></span>

Highlight.js
image
PHPStorm
Everything correct.
image

php-template test case 2

  • Single quote alone in a comment breaks highlighting
  • Double quote alone in a comment breaks highlighting
<?= "--> has no impact: ' " ?>
<?= '--> also has no impact: " ' ?>

<?php // breaks it ' ?>
<?php // breaks it " ?>

<!-- HTML comment -->
<span spellcheck="false"><?= 'php string' ?></span>

Highlight.js
image

PHPStorm
image

@joshgoebel
Copy link
Member

So seems for PHP we need to modify the line comment rule to also recognize ?> as a comment terminator. I'd think we'd want a look-ahead regex there - since we just want to recognize the terminator, but we don't want to consume it as the end of the comment.

And of course we need to add contains modes within string to recognize the escapes... you should be able to find examples from other languages or even the default modes.

@samuelgfeller
Copy link
Author

Can you implement this fix? I don't have the capacity right now. I'll gladly test and try to break it afterwards.

@samuelgfeller
Copy link
Author

I found something else for the php language that I don't quite understand. The following code block is wrongly highlighted after the use ($extra) {

$extra = 'Some optional additional value for the closure';
$validator->add('title', 'custom', [
    'rule' => function ($value, $context) use ($extra) {
        // Custom logic that returns true if the validation passes and 
        // false if the error message below should be shown
    },
    'message' => 'The title is not valid'
]);

image

@joshgoebel
Copy link
Member

You really need to fire up the developer tool so you can see the spans visually.

@samuelgfeller
Copy link
Author

Would that have helped you if I included the visual html spans in the screenshot or as text below? I can still do it but I also provided the code.

@joshgoebel
Copy link
Member

I was thinking it might help YOU. :)

@samuelgfeller
Copy link
Author

Unfortunately I really don't have the capacity right now and in the near future to dive into the code and figure out a fix.

Would be really thankful if you do it and I'll gladly help to test it afterwards!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug good first issue Should be easier for first time contributors help welcome Could use help from community language
Projects
None yet
Development

No branches or pull requests

2 participants