LaTeX hyphenation patterns encoded in iso-8859-1 #688
Comments
Hi @lopippo, thanks for your ideas. 👍 Well, I think you can ignore these files, especially for Debian. Probably nobody there uses XEP as a formatter (it's a commercial product). These hyphenation files date from a time when we still used XEP; back then, FOP was not yet as advanced as it is now. Nevertheless, I've tried to convert them to UTF-8. I used the following XEP config file (I removed the other parts and show only the structure that has changed):
XEP configuration file
<config xmlns="http://www.renderx.com/XEP/config" xml:base="/usr/share/xep/">
<options>
<!-- The following two options are moved into the /usr/bin/xep
script:
-->
<option name="LICENSE" value="file:///etc/xep/license.xml"/>
<option name="BROKENIMAGE" value="file:///usr/share/xep/images/images/404.gif"/>
<option name="TMPDIR" value="none"/>
<option name="LOGO" value="file:///usr/share/xep/images/logo-renderx.svg"/>
<option name="STAMP_PNG" value="file:///usr/share/xep/images/stamp-renderx.png"/>
<!-- ... -->
</options>
<fonts xmlns="http://www.renderx.com/XEP/config"
xml:base="fonts/"
default-family="Helvetica">
<!-- ... -->
<font-group label="SUSE" embed="true">
<font-family name="OpenSans">
<font><font-data ttf="/usr/share/fonts/truetype/OpenSans-Regular.ttf"/></font>
<font style="italic"><font-data ttf="/usr/share/fonts/truetype/OpenSans-Italic.ttf"/></font>
<font weight="bold"><font-data ttf="/usr/share/fonts/truetype/OpenSans-Bold.ttf"/></font>
<font weight="bold" style="italic"><font-data ttf="/usr/share/fonts/truetype/OpenSans-BoldItalic.ttf"/></font>
</font-family>
<font-family name="DejaVuSansMono">
<font><font-data ttf="/usr/share/fonts/truetype/DejaVuSansMono.ttf"/></font>
<font style="italic"><font-data ttf="/usr/share/fonts/truetype/DejaVuSansMono-Oblique.ttf"/></font>
<font weight="bold"><font-data ttf="/usr/share/fonts/truetype/DejaVuSansMono-Bold.ttf"/></font>
<font weight="bold" style="italic"><font-data ttf="/usr/share/fonts/truetype/DejaVuSansMono-BoldOblique.ttf"/></font>
</font-family>
</font-group>
</fonts>
<languages default-language="en-US" xml:base="file:///home/tom/.config/daps/xep-hyphen/">
<language name="German" codes="de deu ger">
<!-- old <hyphenation pattern="dehyph_rx.tex"/> -->
<hyphenation encoding="UTF-8" pattern="dehyph_rx-utf8.tex"/>
</language>
</languages>
</config>
With that config, I built a German guide and got the following message:
I don't get this warning when I use the original file (
You can add an encoding attribute in the config file. However, that doesn't change the output. Perhaps these are old files (the XEP tool is quite old).
Greetings Tom, thank you for your quick response. I also suspect that nobody on Debian is going to use XEP, so I think I'll scrap the /etc/daps/xep directory altogether. We'll see if we get bug reports. Is removing the whole /etc/daps/xep directory acceptable? Sincerely,
Greetings Filippo,
I would, but it seems we still need it internally for the Security Guide. If we could fix that part, maybe we could remove them altogether. @fsundermeyer, could we circumvent the issue with the Security Guide?
Problem description
Greetings,
in my quest to have a proper package for daps in official Debian, I have stumbled upon this series of lintian warnings:
W: daps: national-encoding [etc/daps/xep/hyphen/dehyph_rx.tex]
W: daps: national-encoding [etc/daps/xep/hyphen/huhyph_rx.tex]
W: daps: national-encoding [etc/daps/xep/hyphen/ithyph_rx.tex]
W: daps: national-encoding [etc/daps/xep/hyphen/ruhyphal.tex]
The reason for the warning is the following:
A file is not valid UTF-8.
Debian has used UTF-8 for many years. Support for national encodings is being phased out. This file probably appears to users in mangled characters (also called mojibake).
Packaging control files must be encoded in valid UTF-8.
Please convert the file to UTF-8 using iconv or a similar tool.
I can see that this makes perfect sense with respect to the hyphenation files: they are indeed for languages that require diacritical signs not present in us-ascii, for example.
So my question is the following: would it be correct to convert these files to UTF-8 with a command line like the following?
iconv --from-code=ISO_8859-1 --to-code=UTF-8// -o file-new file
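For anyone who prefers to script the conversion and check the result, here is a minimal Python sketch of the same idea. The helper names `is_valid_utf8` and `recode_to_utf8` are made up for illustration, and the source encoding is passed in explicitly because it is an assumption that has to be confirmed per file.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8.

    This is the property the lintian tag complains about; it is not
    lintian's own implementation.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def recode_to_utf8(src: Path, dst: Path, source_encoding: str) -> None:
    """Decode src using source_encoding and write the text to dst as UTF-8.

    source_encoding is an assumption supplied by the caller; a wrong value
    can still decode without error and silently yield the wrong characters.
    """
    raw = src.read_bytes()
    text = raw.decode(source_encoding)  # strict mode: raises on undecodable bytes
    dst.write_bytes(text.encode("utf-8"))
```

A call such as `recode_to_utf8(Path("dehyph_rx.tex"), Path("dehyph_rx-utf8.tex"), "iso-8859-1")` would mirror the iconv invocation, and `is_valid_utf8(Path("dehyph_rx-utf8.tex").read_bytes())` can confirm the output afterwards.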
Note: the file huhyph_rx.tex has the following first line:
% ISO8859-2
but when checked using file -i huhyph_rx.tex, the encoding seems to actually be charset=iso-8859-1 (which is confirmed by vim which says the encoding is latin-1, a synonym of iso-8859-1, I think).
This means that the first line of the file is not binding and we can freely recode these files to UTF-8.
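One caveat is worth checking before recoding: as far as I know, ISO-8859-1 and ISO-8859-2 both assign a character to every byte value, so a detection tool has no byte-level evidence to tell them apart, and a report of iso-8859-1 may simply be a default guess. The sketch below (plain Python, with sample bytes chosen for illustration rather than taken from huhyph_rx.tex) shows how the same bytes read under each charset.

```python
def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if data can be decoded with the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Every possible byte value decodes under both charsets, so "it decodes"
# proves nothing about which one the author actually used.
every_byte = bytes(range(256))
both_accept_everything = (
    decodes_cleanly(every_byte, "iso-8859-1")
    and decodes_cleanly(every_byte, "iso-8859-2")
)

# Two byte values that differ between the charsets (Hungarian long vowels).
sample = b"\xf5 \xfb"
as_latin1 = sample.decode("iso-8859-1")
as_latin2 = sample.decode("iso-8859-2")

print(both_accept_everything)  # True
print(as_latin1)  # õ û
print(as_latin2)  # ő ű
```

If the Hungarian patterns contain the bytes 0xF5 or 0xFB, converting with ISO-8859-1 as the source would produce õ and û where ő and ű are expected, so the `% ISO8859-2` header may well be the more reliable hint. The same check would apply to ruhyphal.tex, whose encoding is not stated in this thread.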
If this idea is neither silly nor erroneous, would you do this upstream so that it is there in the next version? For the time being, unless you object, I'll do the re-encoding myself.
What's your take on this?
Sincerely,
Filippo