Tracking Issue: Remove index/count confusion. #4

Open
dmsnell opened this issue Jan 31, 2024 · 0 comments

When diffs report deleted or equivalent characters, they report each span's length as a count of characters. Unfortunately, there is no common definition of "character".
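To make the ambiguity concrete, here is a rough sketch that measures one string three different ways. The helper name `countUnits` is made up for illustration and is not part of diff-match-patch.

```javascript
// Hypothetical helper: measure the same string under three definitions of "character".
function countUnits(text) {
  return {
    // What JavaScript's String.length reports.
    utf16CodeUnits: text.length,
    // Array.from iterates a string by whole code points.
    codePoints: Array.from(text).length,
    // Size of the string once encoded as UTF-8.
    utf8Bytes: new TextEncoder().encode(text).length,
  };
}

const sample = 'a😀'; // U+0061 followed by U+1F600
console.log(countUnits(sample));
// { utf16CodeUnits: 3, codePoints: 2, utf8Bytes: 5 }
```

A diff that says "delete 2" after the `a` removes the emoji under one definition and overruns the string under another, which is the confusion this issue tracks.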

diff-match-patch could resolve this in a backwards-compatible way by adding a preamble to its patches that indicates which definition is in use, through the use of semantically empty diff groups.

For example, a leading group of zero length or an empty insert operation should have no impact on the diffed files, so it may be used to communicate very small amounts of information.
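A minimal sketch of why such groups are harmless, assuming a simplified op shape (`{op, count}` for equal/delete, `{op, text}` for insert) that is invented here and is not the library's real data structure. Counts are in UTF-16 code units only because that is what `String.prototype.slice` uses.

```javascript
// Hypothetical op shapes: {op: 'equal', count}, {op: 'delete', count}, {op: 'insert', text}.
function applyDiff(source, ops) {
  let cursor = 0;   // position in the source text
  let result = '';
  for (const step of ops) {
    if (step.op === 'equal') {
      // Copy `count` units from the source unchanged.
      result += source.slice(cursor, cursor + step.count);
      cursor += step.count;
    } else if (step.op === 'delete') {
      // Skip `count` units of the source.
      cursor += step.count;
    } else if (step.op === 'insert') {
      result += step.text;
    } else {
      throw new Error(`unknown op: ${step.op}`);
    }
  }
  return result;
}

const ops = [
  { op: 'equal', count: 4 },
  { op: 'delete', count: 3 },
  { op: 'insert', text: 'dog' },
];
// Prefix the same diff with a zero-length equal group and an empty insert.
const withEmptyGroups = [{ op: 'equal', count: 0 }, { op: 'insert', text: '' }, ...ops];

console.log(applyDiff('the cat', ops));             // the dog
console.log(applyDiff('the cat', withEmptyGroups)); // the dog
```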

Consider:

  • EQUAL(0), EQUAL(0), ...rest of diff: indices/counts represent UTF-16 code units.
  • EQUAL(0), ...rest of diff: indices/counts represent Unicode code points.
  • ...rest of diff (no preamble): indices/counts represent whatever they did before in their respective libraries.
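The three cases above could be written and read back roughly like this. It is only a sketch: the op shape, the function names `addPreamble` and `readPreamble`, and the unit labels `'legacy'`, `'unicode'`, and `'utf16'` are all assumptions made for illustration.

```javascript
// Hypothetical table: index = number of leading EQUAL(0) groups, value = unit label.
const UNITS_BY_EMPTY_GROUPS = ['legacy', 'unicode', 'utf16'];

const isEmptyEqual = (step) => step.op === 'equal' && step.count === 0;

// Prepend the number of EQUAL(0) groups that signals the requested units.
function addPreamble(ops, units) {
  const groups = UNITS_BY_EMPTY_GROUPS.indexOf(units);
  if (groups === -1) throw new Error(`unknown units: ${units}`);
  const marker = Array.from({ length: groups }, () => ({ op: 'equal', count: 0 }));
  return [...marker, ...ops];
}

// Count the leading EQUAL(0) groups, then return the units and the remaining ops.
function readPreamble(ops) {
  let groups = 0;
  while (groups < ops.length && isEmptyEqual(ops[groups])) groups++;
  if (groups >= UNITS_BY_EMPTY_GROUPS.length) {
    throw new Error(`unrecognized preamble of ${groups} empty groups`);
  }
  return { units: UNITS_BY_EMPTY_GROUPS[groups], ops: ops.slice(groups) };
}

const body = [
  { op: 'equal', count: 4 },
  { op: 'delete', count: 3 },
  { op: 'insert', text: 'dog' },
];
const tagged = addPreamble(body, 'utf16');

console.log(tagged.length);              // 5
console.log(readPreamble(tagged).units); // utf16
console.log(readPreamble(body).units);   // legacy
```

This relies on existing libraries never emitting a leading zero-length equal group on their own; if one does, its output would be misread as carrying a preamble.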

A new parameter to the diffing functions could set a mode so that clients can request counts in specific units. For example, diff_main(a, b, {units: 'unicode'}).
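As a rough illustration of what a `units` option might change, the sketch below takes `[op, text]` pairs (the shape the JavaScript diff-match-patch is assumed to return from `diff_main`) and reports span lengths in the requested units. `lengthIn`, `summarize`, and the `'utf16'` value are hypothetical; only `'unicode'` comes from the example above, and none of this is an existing API.

```javascript
// Operation codes as used by the JavaScript diff-match-patch (assumed unchanged).
const DIFF_DELETE = -1;
const DIFF_INSERT = 1;
const DIFF_EQUAL = 0;

// Hypothetical: length of `text` in the requested units.
function lengthIn(text, units) {
  switch (units) {
    case 'utf16':
      return text.length;             // UTF-16 code units
    case 'unicode':
      return Array.from(text).length; // Unicode code points
    default:
      throw new Error(`unsupported units: ${units}`);
  }
}

// Hypothetical: turn [op, text] pairs into counted spans in the requested units.
function summarize(diffs, { units = 'utf16' } = {}) {
  return diffs.map(([op, text]) => {
    if (op === DIFF_INSERT) return { op: 'insert', text };
    const name = op === DIFF_DELETE ? 'delete' : 'equal';
    return { op: name, count: lengthIn(text, units) };
  });
}

const diffs = [
  [DIFF_EQUAL, 'hi '],
  [DIFF_DELETE, '😀'],
  [DIFF_INSERT, '🙂'],
];

console.log(summarize(diffs, { units: 'utf16' })[1]);   // { op: 'delete', count: 2 }
console.log(summarize(diffs, { units: 'unicode' })[1]); // { op: 'delete', count: 1 }
```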

dmsnell pinned this issue Jan 31, 2024