-
Notifications
You must be signed in to change notification settings - Fork 0
/
README
42 lines (28 loc) · 1.21 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
NAME
similarity - find and print text similarity between sets of strings
SYNOPSIS
similarity combfile
similarity reffile testfile
DESCRIPTION
similarity is a program that finds the Damerau-Levenshtein distance
between strings.
The first form of invocation treats the contents of the file
'combfile' to be strings that each needs to be compared with all the
others in the file.
The second form treats those from the file 'reffile' to be correct
reference strings, against which those in the file 'testfile' should
be compared.
In all cases, the input files should have one string per line.
Blank lines are ignored. The program does trim the strings, but
users should take care of non-printable characters themselves.
The output will be one line printed for each combination of strings,
and has the following format:
pd d tl rl tstr rstr
where, 'd' is the Damerau-Levenshtein distance between strings 'tstr'
'rstr'; 'tl' and 'rl' are the line numbers of test string 'tstr' and
reference string 'rstr', respectively; and 'pd' is calculated as:
d / (len(tstr) + len(rstr)).
AUTHOR
JONNALAGADDA Srinivas
LICENSE
New BSD License