uTeXer
======
About uTeXer
------------
uTeXer is a helper script written in Python that translates Unicode math signs
and Latin ligatures into plain text, to make them readable for blind computer
users. It can be used for:
- translating formulas from web sites or from PDFs into LaTeX commands
- cleaning up the text versions of PDFs (generated e.g. by pdftotext from
  poppler-utils); these often contain Latin ligatures or other signs which make
  the texts harder to read.
This allows blind users, for the first time, to download papers or other
scientific material and read them without a sighted person correcting the
formulas.
Download/Installation
---------------------
You can obtain a copy by using git:

    git clone https://github.com/humenda/utexer.git

or by downloading [a zip file](https://codeload.github.com/humenda/utexer/zip/master).
To run uTeXer, you need a working python3 installation. You can use

    ./install

which installs the program to /usr/local/*, or set PREFIX="/" to install it to
/bin and /share directly (or set it to /opt or /usr accordingly).
You can also run it directly from the source.
Using uTeXer
------------
uTeXer is a simple program; the help screen should explain most of it:

    Usage: utexer [options] INPUTFILE

    If no output file is specified with the -o option, the input file will be
    overwritten. If no input file is specified, stdin/stdout will be used (but you
    can redirect stdout with -o too).

    Options:
      -h, --help            show this help message and exit
      -e ENC, --encoding=ENC
                            Set encoding for stdin (default UTF-8)
      -l, --ligature        replace ligatures through normal letters (at least in
                            Latin languages where they are only for better
                            readibility)
      -o FILE, --output=FILE
                            set output file (if unset, overwrite input file)
      -p, --pdftotext       Replace some signs generated just by PDFtotext
      -s, --strip-pagebreak  Strip the newpage character
      -u FILE, --userdict=FILE
                            set path to user-defined replacements/additions for
                            unicode mappings (format described in README)

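
To give an idea of what the -l and -s options do, here is a minimal sketch in
Python. It is not uTeXer's actual code: the function names are made up for
this example, the ligatures are expanded with Unicode's own compatibility
decomposition rather than with uTeXer's table, and the "newpage character" is
assumed to be the form feed that pdftotext writes between pages.

```python
import unicodedata

# Assumption: the "newpage character" is the form feed (U+000C).
FORM_FEED = "\x0c"


def replace_ligatures(text):
    """Expand Latin ligatures (e.g. U+FB01) into their plain letters.

    Hypothetical helper for illustration, not part of uTeXer.
    """
    result = []
    for char in text:
        name = unicodedata.name(char, "")
        if name.startswith("LATIN") and "LIGATURE" in name:
            # NFKD splits ligatures that have a compatibility decomposition;
            # characters without one (such as U+0153) are left unchanged.
            result.append(unicodedata.normalize("NFKD", char))
        else:
            result.append(char)
    return "".join(result)


def strip_pagebreaks(text):
    """Remove every form feed from the text (hypothetical helper)."""
    return text.replace(FORM_FEED, "")


sample = "e\ufb03cient \ufb01lters\n\x0cnext page"
print(strip_pagebreaks(replace_ligatures(sample)))
```

Restricting the replacement to characters whose Unicode name marks them as
Latin ligatures keeps all other characters, including math signs, untouched.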
Where Do The LaTeX-commands Come From / How Do I Customize Them?
----------------------------------------------------------------
The initial Unicode table was downloaded from:

    http://www.w3.org/Math/characters/unicode.xml

With the -u switch you can supply an additional Unicode table to override (or
even add) code points. The format is simple:

    <decimal_number><tab><replacement>

Example (123 is the decimal code point of `{`):

    123	\{

This allows you to customize the LaTeX commands. For example, I don't like
\\varnothing; \\emptyset seems more intuitive to me.
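
The following Python sketch shows how such a table could be read and applied.
It only illustrates the file format described above and is not taken from
uTeXer's source; `parse_userdict`, `apply_table` and the one-entry default
table are assumptions made for this example.

```python
def parse_userdict(lines):
    """Parse '<decimal_number><tab><replacement>' lines into a dict.

    Hypothetical helper: maps the code point (int) to its replacement (str).
    """
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip empty lines
        number, _, replacement = line.partition("\t")
        table[int(number)] = replacement
    return table


def apply_table(text, table):
    """Replace every character whose code point is a key in the table."""
    # str.translate accepts a dict from code points to replacement strings.
    return text.translate(table)


# Stand-in for the built-in table: 8709 is U+2205 (EMPTY SET).
default_table = {8709: "\\varnothing"}

# Entries as they would appear in a file passed with -u.
user_table = parse_userdict(["8709\t\\emptyset\n", "123\t\\{\n"])

# User entries override existing code points and add new ones.
merged = {**default_table, **user_table}
print(apply_table("A = \u2205", merged))
```

Merging the user table last is what makes an entry for an existing code point
an override and an entry for a new code point an addition.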
Known Issues
------------
uTeXer cannot fully translate formulas. In particular, formulas that span more
than one line (e.g. fractions) as well as indices and powers are often not
recognized, because they are not marked in Unicode but by changing their
relative height. This only matters for PDF output; on web pages, people often
use tags to indicate subscripts and so on.
Overlines and underlines are also lost!
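
The difference can be illustrated with a short Python sketch (an illustration
only, not uTeXer code; `script_kind` is a made-up helper). A dedicated
superscript or subscript character carries its role in the Unicode data, while
a digit that was merely raised by the PDF layout arrives as an ordinary digit,
so there is nothing left to detect.

```python
import unicodedata


def script_kind(char):
    """Return 'super', 'sub' or None, based on the Unicode decomposition tag."""
    decomposition = unicodedata.decomposition(char)
    if decomposition.startswith("<super>"):
        return "super"
    if decomposition.startswith("<sub>"):
        return "sub"
    return None


# U+00B2 is a real superscript character, U+2081 a real subscript character.
for char in ("\u00b2", "\u2081", "2"):
    print(repr(char), script_kind(char))
```

For the plain "2", which is what a raised exponent usually becomes in
extracted text, the function returns None.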
There are signs in the Unicode table which should not be translated or which
are translated to LaTeX commands that are not commonly used:
- \\varnothing instead of the more common \\emptyset
- { } instead of \\lbrace and \\rbrace, since braces in source code get
  replaced as well