-
Notifications
You must be signed in to change notification settings - Fork 344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Accented characters are always displayed in lowercase #724
Comments
As you are right, @Jaffacakelover the font does provide this character (and also the "Cabin Bold" (which is used in the game list for Carbon Theme). However, root cause is that how the uppercase function is implemented in ES. #include <iostream>
#include <locale>
#include <codecvt>
// yarked from current master:
// https://github.com/RetroPie/EmulationStation/blob/771f2a608b84b5262666af7888f377a5b3f280ea/es-core/src/utils/StringUtil.cpp#L163-L172
std::string toUpper(const std::string& _string)
{
std::string string;
for(size_t i = 0; i < _string.length(); ++i)
string += (char)toupper(_string[i]);
return string;
}
// proposed change/fix for #724
std::string toUpperNew(const std::string& _string)
{
std::wstring w_str = std::wstring_convert<std::codecvt_utf8<wchar_t> >().from_bytes(_string);
auto& fct = std::use_facet<std::ctype<wchar_t> >(std::locale());
fct.toupper(&w_str[0], &w_str[0] + w_str.size());
return std::wstring_convert<std::codecvt_utf8<wchar_t> >().to_bytes(w_str);
}
int main() {
std::locale::global(std::locale(""));
std::string str = "Déjà Vu: A Nightmare Come True";
std::cout << "\nInput string : " << str << std::endl;
std::cout << "\nUpper (current) : " << toUpper(str) << std::endl;
std::cout << "\nUpper (proposed): " << toUpperNew(str) << std::endl;
// enable for profiling
/*
std::string tmp;
for (int i = 0 ; i < 20 * 1000; i++) {
//tmp = toUpper(str);
tmp = toUpperNew(str);
}
*/
return 0;
} (used: Will output with locale
Verified ok on Linux/recent RetroPie 4.7.1 (the impact on runtime can be neglected IMHO). |
According to cppreference, Does work on macos, just needed to include the new headers. |
Ah, thanks. If/if not MacOS is supported by this repo escaped my mind. However, it boils down to a design decision: leave it as it, use this approach (albeit deprecated), use a library (which, if not boost?) or roll your own. |
Could verify successfully on Windows too (W10/VS2019). Works with same output and with |
Only done some quick test, and as mentioned, this requires locale to be changed from "C" to "", which breaks other things, (most noticeable is that Carbon theme layout broke) but I have not had time to investigate how or why it broke. So if the choice is between broken themes and lowercase accented e, I know which one I prefer. |
No need to make a choice. Works for me: |
Yeah that's not the latest VIP Carbon, and that's beside the point, if this breaks the current VIP Carbon, it might break many other themes, which is why this needs to be investigated further, I don't remember the details anymore but I know I did a lot of testing back when I removed boost before coming to the conclusion that locale( "C" ) was the one that behaved as it should. |
Now we are getting closer to the facts. If you let me know where this "VIP Carbon" can be found and also can provide specific details what broke in this "VIP Carbon" we might be able to tackle this issue. Or maybe you table it on the RetroPie forum and ask for testers (for Windows builds). |
I just tested it on "normal" Carbon and it flatout crashes with locale( "" ), and now I remember why I chose to use locale( "C" ) when I removed boost. Swedish for example does not use numbers with periods, but with comma, so it doesn't handle xml values like 0.30, since it expects it to be 0,30, so when it sees 0.30 it just becomes 0. In this case it meant that fontSize became 0 which is invalid. I'm sure there might be other ways around this, but this is where we are right now. This is the "dangers" with locale, what works on your computer, might crash on someone elses computer using a different locale. |
I also did some benchmarking
It added 7 seconds to starting up ES and that is not an impact on runtime that can be neglected IMHO since this is even on pc which is way faster than a pi. (and yes, this was on an optimized release build, on a debug build it adds 26 seconds) I'm tired and might be wrong. but my quick math says it's about 117 times slower, just to get an uppercase "é" |
This is gonna be a long comment, but I have now done quite a bit of testing. I wanna point out that all my tests have been done on Windows 10 so the results might differ on a pi. First off, the local that in my tests works the best is: std::locale::global(std::locale("en_US.UTF-8")); Simply because this gives us the ability to do upper case é without breaking the parsing of floats from xml files. And onto the toUpper itself I made 4 versions and profiled them (PR #716 adds a profiling utility) But first I added 2 new functions to StringUtil which will be used later on: std::wstring multiByteToWideChar(const std::string& _string)
{
std::wstring string;
string.resize(_string.length());
mbstowcs((wchar_t*)string.c_str(), _string.c_str(), string.length());
return string;
} // multiByteToWideChar
std::string wideCharToMultiByte(const std::wstring& _string)
{
std::string string;
string.resize(_string.length());
wcstombs((char*)string.c_str(), _string.c_str(), string.length());
return string;
} // wideCharToMultiByte Then I added these 4 versions of toUpper: std::string toUpperStd(const std::string& _string)
{
ProfileScope("toUpperStd");
std::wstring w_str = std::wstring_convert<std::codecvt_utf8<wchar_t> >().from_bytes(_string);
auto& fct = std::use_facet<std::ctype<wchar_t> >(std::locale());
fct.toupper(&w_str[0], &w_str[0] + w_str.size());
std::string string = std::wstring_convert<std::codecvt_utf8<wchar_t> >().to_bytes(w_str);
return string;
} // toUpperStd
std::string toUpperC2U(const std::string& _string)
{
ProfileScope("toUpperC2U");
const std::ctype<wchar_t>& facet = std::use_facet<std::ctype<wchar_t>>(std::locale());
std::wstring wstring;
std::string string;
size_t i = 0;
while(i < _string.length())
wstring += facet.toupper((wchar_t)Utils::String::chars2Unicode(_string, i));
for(size_t i = 0; i < wstring.length(); ++i)
string += unicode2Chars(wstring[i]);
return string;
} // toUpperC2U
std::string toUpperMB2WC(const std::string& _string)
{
ProfileScope("toUpperMB2WC");
const std::ctype<wchar_t>& facet = std::use_facet<std::ctype<wchar_t>>(std::locale());
std::wstring wstring = multiByteToWideChar(_string);
for(size_t i = 0; i < wstring.length(); ++i)
wstring[i] = facet.toupper(wstring[i]);
return wideCharToMultiByte(wstring);
} // toUpperMB2WC
std::string toUpperMB2WC2(const std::string& _string)
{
ProfileScope("toUpperMB2WC2");
const std::ctype<wchar_t>& facet = std::use_facet<std::ctype<wchar_t>>(std::locale());
std::wstring wstring = multiByteToWideChar(_string);
facet.toupper(&wstring[0], &wstring[0] + wstring.size());
return wideCharToMultiByte(wstring);
} // toUpperMB2WC2 A quick explanation of each function: I also modfied toUpper to call all 4 of these just so they could be profiled, all 4 variants have been tested and verified to make é to the proper uppercase, everything have also been tested as to not break chinese characters: std::string toUpper(const std::string& _string)
{
toUpperC2U(_string);
toUpperStd(_string);
toUpperMB2WC(_string);
toUpperMB2WC2(_string);
ProfileScope("toUpper");
std::string string;
for(size_t i = 0; i < _string.length(); ++i)
string += (char)toupper(_string[i]);
return string;
} // toUpper None of the results from the 4 new methods in the profiling are actually used as to not favor any of them. The results were as follow on PC: Debug:
The winner here is toUpperMB2WC2. Release:
The winner is once again toUpperMB2WC2. What's everyones thought on this? |
@tomaz82 nicely done. |
They use mbstowcs and wcstombs which should not be windows specific. |
@tomaz82 yes, I forgot to include the implementation and assumed to be the Win32 equivalents. Here's a result from a Pi4 (no Debug/Release configurations):
|
@tomaz82 thanks for the throughout analysis and data. If there are better solutions to this which work on Linux and Windows with satisfaction I am open to it. Looks |
Well no, not really, that thing has been solved by setting std::setlocale(LC_NUMERIC, "C"); The problem is what locale the global locale should be set to, it can't be "" because that uses whatever is the default on the running system, which might not even be UTF-8. And that would break even more things. This is exactly why it was left as "C", 99.9% of the things works, and it's guaranteed to work identically on all systems. |
Ah, yes. RetroPie ships locale |
Looking at previous changes for locale settings via
Looks like the locale was set to the user's runtime locale.
- std::locale::global(std::locale(std::locale(""), "C", std::locale::numeric));
+ std::locale::global(std::locale("C")); I cannot find a reference of the issue that triggered the latest change (on Github), but testing the previous version of the code with Japanese characters does indeed shows garbled text (mojibake) when the game title is in Japanese (UTF8) and the theme has Looks like one way or the other there's some breakage in displaying text. Most likely using something like ICU would solve the issue without relying on the locale support for upcase/lowercase. |
The content scraper (and any rom lists for systems that have been scraped) displays accented characters in lower case, while everything else is upper case.
Most noticeable with Pokémon games. For example:
The font that RetroPie uses (Open Sans Condensed) does include upper case accented characters: For example,
É
isU+00C9
in unicode.The text was updated successfully, but these errors were encountered: