[SwiftParser] Improve diagnostics for misspelled keywords #2794

AppAppWorks · 2024-08-09T07:33:43Z

This is preliminary work on emitting better diagnostics for misspelled keywords. Further discussion is needed to determine the scope of misspelling corrections.

fixes swiftlang#2198 swiftlang#2180

Sources/SwiftParser/TokenSpecSet.swift

ahoppen

Great changes! One thought after skimming over the PR:

I think it would be great if we could make misspelling correction even easier to use. What do you think of the following:

- Every keyword has a list of possible misspellings + maybe an implicit list of single-character typos (the latter is probably more involved and could be a separate PR)
- canRecover always takes misspellings into account and automatically generates a TokenConsumptionHandle that generates a missing token if it discovers a misspelling.
- expect takes a parameter to decide whether it should take misspellings into account.
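
To make the proposal concrete, here is a minimal sketch of a per-keyword misspelling list with an opt-in flag. All names below (`Keyword`, `Match`, `match(_:against:allowMisspellings:)`) are hypothetical and are not the SwiftParser APIs; the misspelling entries are made up for illustration.

```swift
// Hypothetical types for illustration only; none of these are SwiftParser APIs.
enum Keyword: String {
  case funcKeyword = "func"
  case protocolKeyword = "protocol"
  case structKeyword = "struct"

  /// Assumed, hand-written list of known misspellings for each keyword.
  var misspellings: [String] {
    switch self {
    case .funcKeyword: return ["fun", "function", "fn"]
    case .protocolKeyword: return ["protcol", "interface"]
    case .structKeyword: return ["strcut", "stuct"]
    }
  }
}

/// Result of comparing the text of a token with an expected keyword.
enum Match: Equatable {
  /// The token is spelled exactly like the keyword.
  case exact(Keyword)
  /// The token is a known misspelling; a caller could synthesize the
  /// missing keyword and attach a Fix-It that replaces `found`.
  case misspelled(Keyword, found: String)
}

/// Returns how `text` matches `expected`, or nil if it does not match.
/// `allowMisspellings` plays the role of the opt-in parameter suggested for `expect`.
func match(_ text: String, against expected: Keyword, allowMisspellings: Bool) -> Match? {
  if text == expected.rawValue {
    return .exact(expected)
  }
  if allowMisspellings && expected.misspellings.contains(text) {
    return .misspelled(expected, found: text)
  }
  return nil
}
```

With this shape, a recovery-style check would always pass `allowMisspellings: true`, while a strict check could pass `false` and keep its current behavior.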

mateusrodriguesxyz · 2024-08-09T16:28:10Z

FWIW, a few months ago I implemented a general solution for keyword misspelling using Levenshtein distance:
https://github.com/mateusrodriguesxyz/swift-syntax/tree/keywords-correction

MisspelledKeywordsTest

AppAppWorks · 2024-08-10T03:10:18Z

Every keyword has a list of possible misspellings

I think we could store the mapping of keyword misspellings in resource files and use code generation to create the boilerplate code (there would be a lot of it). Resource files would also be friendly to crowdsourcing.
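
As a rough sketch of what that could look like: the resource format below (one `keyword: misspelling, ...` line per keyword), the sample entries, and the function names are all assumptions for illustration, not something that exists in the PR.

```swift
import Foundation

// Assumed resource format: `keyword: misspelling, misspelling, ...`, one keyword per line.
// The entries are illustrative examples only.
let resource = """
func: fun, function, fn, def
protocol: interface, trait
"""

/// Parses the assumed resource format into (keyword, misspellings) pairs.
func parseMisspellings(_ text: String) -> [(keyword: String, misspellings: [String])] {
  var result: [(keyword: String, misspellings: [String])] = []
  for line in text.split(separator: "\n") {
    let parts = line.split(separator: ":", maxSplits: 1)
    guard parts.count == 2 else { continue }  // skip malformed lines
    let keyword = parts[0].trimmingCharacters(in: .whitespaces)
    let misspellings = parts[1]
      .split(separator: ",")
      .map { $0.trimmingCharacters(in: .whitespaces) }
    result.append((keyword: keyword, misspellings: misspellings))
  }
  return result
}

/// Generates the body of a lookup function as Swift source text.
func generateSwitch(_ entries: [(keyword: String, misspellings: [String])]) -> String {
  var lines = ["switch text {"]
  for entry in entries {
    let patterns = entry.misspellings.map { "\"\($0)\"" }.joined(separator: ", ")
    lines.append("case \(patterns): return \"\(entry.keyword)\"")
  }
  lines.append("default: return nil")
  lines.append("}")
  return lines.joined(separator: "\n")
}

print(generateSwitch(parseMisspellings(resource)))
```

Contributors would only touch the resource file; the generated `switch` is the boilerplate nobody has to write by hand.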

maybe an implicit list of single-character typos (the latter is probably more involved and could be a separate PR)

@mateusrodriguesxyz has done some great work on correcting keywords with single-character typos based on Levenshtein distance, but I'm not sure whether we should instead precompute typo permutations statically, since computing the Levenshtein distance takes O(mn) time for strings of lengths m and n.
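
For reference, this is the textbook dynamic-programming formulation behind the O(mn) figure, written as a standalone sketch. It is not taken from the linked branch, and the function name is my own.

```swift
/// Classic two-row Levenshtein distance: O(m * n) time, O(n) extra space,
/// where m and n are the lengths of `a` and `b`.
func levenshteinDistance(_ a: String, _ b: String) -> Int {
  let s = Array(a)
  let t = Array(b)
  if s.isEmpty { return t.count }
  if t.isEmpty { return s.count }

  // previous[j] is the distance between the first i - 1 characters of s
  // and the first j characters of t.
  var previous = Array(0...t.count)
  var current = [Int](repeating: 0, count: t.count + 1)

  for i in 1...s.count {
    current[0] = i  // deleting i characters turns the prefix of s into ""
    for j in 1...t.count {
      let substitution = previous[j - 1] + (s[i - 1] == t[j - 1] ? 0 : 1)
      let deletion = previous[j] + 1
      let insertion = current[j - 1] + 1
      current[j] = min(substitution, deletion, insertion)
    }
    swap(&previous, &current)
  }
  return previous[t.count]
}
```

Note that plain Levenshtein distance counts a transposition such as "fucn" for "func" as two edits, so a threshold of one would not catch swapped letters.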

canRecover always takes misspellings into account and automatically generates a TokenConsumptionHandle that generates a missing token if it discovers a misspelling.

Would it lead to performance regression as the search space for the parser might explode?

AppAppWorks · 2024-08-10T03:11:09Z

FWIW, a few months ago I implemented a general solution for keyword misspelling using Levenshtein distance: https://github.com/mateusrodriguesxyz/swift-syntax/tree/keywords-correction

MisspelledKeywordsTest

Great work! It'll take some time for me to digest it.

mateusrodriguesxyz · 2024-08-10T20:59:53Z

Great work! It'll take some time for me to digest it.

Thanks! Most of the relevant code is in TokenSpec. I do quite a few checks before even trying to find the correct keyword, to avoid false diagnostics and unnecessary distance computations.
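
To illustrate the kind of cheap pre-checks meant here, a hedged sketch follows. The specific checks (a minimum length, a length-difference filter, rejecting ambiguous matches) and the function names are assumptions for illustration and are not necessarily the ones in the branch. It also uses a linear-time "one edit away" test in place of a full distance computation.

```swift
/// Returns true if `a` becomes `b` with exactly one insertion, deletion or
/// substitution. Runs in linear time and needs no distance matrix.
func isOneEditAway(_ a: String, _ b: String) -> Bool {
  let s = Array(a)
  let t = Array(b)
  // Cheap filters first: identical strings and large length gaps never qualify.
  if s == t || abs(s.count - t.count) > 1 { return false }

  let (short, long) = s.count <= t.count ? (s, t) : (t, s)
  var i = 0
  var j = 0
  var edits = 0
  while i < short.count && j < long.count {
    if short[i] == long[j] {
      i += 1
      j += 1
      continue
    }
    edits += 1
    if edits > 1 { return false }
    if short.count == long.count { i += 1 }  // substitution: advance both
    j += 1  // insertion or deletion: skip the extra character in `long`
  }
  // A leftover trailing character in `long` accounts for the single edit.
  return true
}

/// Suggests a keyword for `identifier`, or nil if no suggestion is safe.
func suggestKeyword(for identifier: String, among keywords: [String]) -> String? {
  // Assumed threshold: very short identifiers produce too many false positives.
  guard identifier.count >= 3 else { return nil }
  let candidates = keywords.filter { isOneEditAway(identifier, $0) }
  // Only suggest when the correction is unambiguous.
  return candidates.count == 1 ? candidates[0] : nil
}
```

For example, `suggestKeyword(for: "vor", among: ["for", "var"])` returns nil because both keywords are one edit away, which avoids emitting a misleading Fix-It.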

(cherry picked from commit 2a3e108bf7b36539930298a9b78333983eb61e76)

AppAppWorks · 2024-08-15T03:56:15Z

With the latest commit, I've created a kitchen sink to facilitate crowdsourcing of keyword "false friends" in popular programming languages and other general misspellings that won't be captured by one-character Levenshtein Distance permutations.

Truth be told, it might be infeasible to keep in sync with the evolution of all these languages in the future, but at this stage let's just brainstorm :)

ahoppen

That’s a great list. I don’t think it needs to be 100% complete but just having the infrastructure to add more typo Fix-Its is amazing because it makes it very easy to extend.

I think the next big step is to integrate this into the main parsing infrastructure so typo-correction is taken care of automatically in a variety of places, as I described in #2794 (review).

foundation for the recovery facility of misspelled keywords

97ecfb1

fixes swiftlang#2198 swiftlang#2180

AppAppWorks requested review from ahoppen and bnbarham as code owners August 9, 2024 07:33

AppAppWorks marked this pull request as draft August 9, 2024 07:33

ahoppen reviewed Aug 9, 2024

testing ground for code generation for false friend recognition

6d64eaf

(cherry picked from commit 2a3e108bf7b36539930298a9b78333983eb61e76)

ahoppen reviewed Aug 15, 2024
