[SwiftParser] Improve diagnostics for misspelled keywords #2794
base: main
Great changes! One thought after skimming over the PR:
I think it would be great if we could make misspelling correction even easier to use. What do you think of the following:
- Every keyword has a list of possible misspellings + maybe an implicit list of single-character typos (the latter is probably more involved and could be a separate PR)
- `canRecover` always takes misspellings into account and automatically generates a `TokenConsumptionHandle` that generates a missing token if it discovers a misspelling.
- `expect` takes a parameter to decide whether it should take misspellings into account.
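To make the proposal concrete, here is a rough sketch of the shape it could take. Everything in it (`ToyKeyword`, `ToyParser`, `ExpectResult`, the `allowMisspellings` parameter, and the misspelling lists) is hypothetical and only illustrates the idea; it is not SwiftParser's actual API.

```swift
// Hypothetical sketch only: these types are NOT part of SwiftParser.
enum ToyKeyword: String {
    case funcKeyword = "func"
    case varKeyword = "var"
    case structKeyword = "struct"

    // Hand-maintained list of known misspellings (illustrative entries).
    var misspellings: [String] {
        switch self {
        case .funcKeyword: return ["fun", "function", "def", "fn"]
        case .varKeyword: return ["variable"]
        case .structKeyword: return ["strcut", "structure"]
        }
    }
}

enum ExpectResult: Equatable {
    case matched
    // The caller would diagnose this and offer a Fix-It to `expected`.
    case misspelled(found: String, expected: String)
    case missing
}

struct ToyParser {
    var tokens: [String]
    var index = 0

    // Consumes the current token if it is the keyword or, when
    // `allowMisspellings` is true, one of its known misspellings.
    mutating func expect(_ keyword: ToyKeyword, allowMisspellings: Bool = true) -> ExpectResult {
        guard index < tokens.count else { return .missing }
        let text = tokens[index]
        if text == keyword.rawValue {
            index += 1
            return .matched
        }
        if allowMisspellings, keyword.misspellings.contains(text) {
            index += 1
            return .misspelled(found: text, expected: keyword.rawValue)
        }
        return .missing
    }
}
```

With this shape, call sites opt out of typo correction by passing `allowMisspellings: false`, and nothing else at the call site has to change.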
FWIW, a few months ago I implemented a general solution for keyword misspelling using Levenshtein distance.
I think we could store the mapping of keyword misspellings in some resource files and make use of code generation to create boilerplate code (there would be a lot of them). Resource files would be friendly for crowdsourcing too.
@mateusrodriguesxyz has done some great work on correcting keywords with single-character typos based on Levenshtein Distance, but I'm not sure if we should precompute typo permutations statically as computation for Levenshtein Distance takes O(mn) time.
Would it lead to a performance regression, as the search space for the parser might explode?
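For reference, the O(mn) cost mentioned above comes from the standard dynamic-programming formulation of Levenshtein distance. The following is a minimal, self-contained sketch of that textbook algorithm, not the implementation from this PR; the function name is an assumption.

```swift
// Textbook Levenshtein distance using two rolling rows.
// Time is O(m * n), memory is O(n), for inputs of length m and n.
func levenshteinDistance(_ lhs: String, _ rhs: String) -> Int {
    let a = Array(lhs), b = Array(rhs)
    if a.isEmpty { return b.count }
    if b.isEmpty { return a.count }

    // previous[j] is the distance between a[0..<i-1] and b[0..<j].
    var previous = Array(0...b.count)
    var current = [Int](repeating: 0, count: b.count + 1)

    for i in 1...a.count {
        current[0] = i  // deleting all i characters of the prefix of a
        for j in 1...b.count {
            let substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1)
            current[j] = min(previous[j] + 1,     // deletion
                             current[j - 1] + 1,  // insertion
                             substitution)
        }
        swap(&previous, &current)
    }
    return previous[b.count]
}
```

Because keywords are short, m and n stay small for any single comparison; the concern raised here is more about how often the parser would run such comparisons.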
Great work! It'll take some time for me to digest it.
Thanks! Most of the relevant code is in TokenSpec. I do quite a few checks before even trying to find the correct keyword, to avoid false diagnostics and unnecessary distance computation.
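As an illustration of the kind of cheap pre-check that can skip a distance computation, here is a small sketch. It is an assumption about what such checks might look like, not the actual logic in TokenSpec; `plausibleKeywords` and its parameters are hypothetical names. It relies on the fact that the difference in length between two strings is a lower bound on their edit distance.

```swift
// Hypothetical pre-filter: returns only the keywords worth running a
// full edit-distance computation against.
func plausibleKeywords(
    for identifier: String,
    among keywords: [String],
    maxDistance: Int = 1
) -> [String] {
    // An exact keyword is not a typo, so there is nothing to correct.
    if keywords.contains(identifier) { return [] }
    // Assumption: one-character identifiers are too ambiguous to correct.
    guard identifier.count >= 2 else { return [] }
    // The length difference is a lower bound on edit distance, so any
    // keyword whose length differs by more than maxDistance is ruled out.
    return keywords.filter { abs($0.count - identifier.count) <= maxDistance }
}
```

For example, with the keywords `func`, `var`, `struct`, and `protocol`, the identifier `fucn` only needs to be compared against `func` and `var`.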
(cherry picked from commit 2a3e108bf7b36539930298a9b78333983eb61e76)
With the latest commit, I've created a kitchen sink to facilitate crowdsourcing of keyword "false friends" in popular programming languages and other general misspellings that won't be captured by one-character Levenshtein Distance permutations. Truth be told, it might be infeasible to keep in sync with the evolution of all these languages in the future, but at this stage let's just brainstorm :)
That’s a great list. I don’t think it needs to be 100% complete but just having the infrastructure to add more typo Fix-Its is amazing because it makes it very easy to extend.
I think the next big step is to integrate this into the main parsing infrastructure so typo-correction is taken care of automatically in a variety of places, as I described in #2794 (review).
fixes #2198 #2180
This is preliminary work on emitting better diagnostics for misspelled keywords. More discussions would be needed to ascertain the scope of misspelling corrections.