Add Support for UTF-8, detect other wide-char sequences, and cope with BOMs #30

Open
duncanmac99 opened this issue Sep 17, 2018 · 3 comments

Comments

@duncanmac99

As it stands, this program seems close to handling UTF-8 already. The main task would be reworking one particular function, and possibly adding command-line arguments to handle certain peculiar (and often undesirable) situations.

@logological
Owner

Further details on the proposed solution, or better yet, a pull request, would be most welcome.

@duncanmac99
Author

However, the rest of the program expects regular (byte-size) characters, not wide characters. It would be possible to assemble a multi-byte sequence without sending back a wide character, but that would require more buffering in the function itself, which would be messy.
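To make the buffering concern concrete, here is a minimal sketch of how a byte-oriented reader could tell how many bytes belong to one UTF-8 sequence by looking only at the lead byte. The name `utf8_seq_len` is hypothetical and is not taken from GPP's source; it only illustrates the kind of check such a function would need.

```c
#include <stddef.h>

/* Hypothetical helper (not part of GPP): given the first byte of a UTF-8
   sequence, return how many bytes the whole sequence occupies (1 to 4),
   or 0 if the byte cannot start a sequence. */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;                  /* plain ASCII */
    if (lead >= 0xC2 && lead <= 0xDF) return 2; /* two-byte sequence */
    if (lead >= 0xE0 && lead <= 0xEF) return 3; /* three-byte sequence */
    if (lead >= 0xF0 && lead <= 0xF4) return 4; /* four-byte sequence */
    /* 0x80-0xBF are continuation bytes; 0xC0, 0xC1 and 0xF5-0xFF
       never appear as lead bytes in well-formed UTF-8. */
    return 0;
}
```

A reader that keeps passing individual bytes to the rest of the program would only need this length to avoid splitting a sequence in the middle; it would not have to decode the code point itself.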

As for BOMs (byte order marks), Windows now expects one at the beginning of every UTF-8 and UTF-16 file. For more on that (for UTF-8), see:

https://social.msdn.microsoft.com/Forums/windowsapps/en-US/dd352270-8790-4b48-8492-17a4a6875e99/why-the-utf8-with-bom-marker-requirement?forum=winappswithhtml5

Also (for UTF-16):

https://docs.microsoft.com/en-us/windows/desktop/intl/using-byte-order-marks
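For illustration, detecting a BOM at the start of an input buffer could look roughly like the sketch below. The names `bom_kind` and `detect_bom` are hypothetical and do not come from GPP; the byte values are the standard BOM encodings (EF BB BF for UTF-8, FF FE and FE FF for UTF-16 little- and big-endian). The sketch does not try to tell a UTF-16 LE BOM apart from a UTF-32 LE one, which starts with the same two bytes.

```c
#include <stddef.h>

/* Hypothetical types and helper, not part of GPP. */
typedef enum { BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE } bom_kind;

/* Inspect the first bytes of buf and report which BOM, if any, is present.
   *bom_len receives the number of bytes the caller should skip. */
bom_kind detect_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    *bom_len = 0;
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return BOM_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return BOM_UTF16_LE;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return BOM_UTF16_BE;
    }
    return BOM_NONE; /* no BOM: treat the input as plain bytes */
}
```

Whether the BOM should then be skipped, copied to the output, or reported as an error is exactly the sort of choice a command-line argument could control.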

@logological
Owner

I'm afraid I still don't understand the problem. Can you post a minimal example of a UTF-8 or UTF-16 file that GPP doesn't handle correctly, along with the expected and observed output?

2 participants