cuts and x_bboxes #79

Open
kba opened this issue Oct 22, 2016 · 4 comments

Comments

@kba
Owner

kba commented Oct 22, 2016

Why have both relative (cuts) and absolute (x_bboxes) mechanisms for positioning code points within a word/cinfo?

Why not a bboxes attribute without the engine-specific prefix?

Related to #69

@kba
Owner Author

kba commented Oct 26, 2016

#17 (comment)

The "cuts" attribute is for representing cuts. It exists as a compact,
pixel-accurate representation of a character segmentation. Cuts are not
bounding boxes, and, in fact, are not all that useful unless you have the
original page image available.

@kba
Owner Author

kba commented Oct 26, 2016

#17 (comment)

Cuts are for pixel-accurate segmentation in the presence of kerning,
something bounding boxes can't represent.

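def decode_cuts(s, x=0, ymax=None):
    """Decode a cuts string into a list of paths, each a list of (x, y) points."""
    cuts = []
    for path in s.split():
        # Each path is a comma-separated list of integer deltas.
        turns = [int(p) for p in path.split(",")]
        # The first delta is horizontal, relative to the start of the previous cut.
        x += turns[0]
        pos = [x, 0]
        cut = [tuple(pos)]
        # The remaining deltas alternate between vertical and horizontal moves.
        for i, d in enumerate(turns[1:]):
            pos[(i + 1) % 2] += d
            cut.append(tuple(pos))
        # Optionally extend the path vertically to y = ymax.
        if ymax is not None:
            pos[1] = ymax
            cut.append(tuple(pos))
        cuts.append(cut)
    return cuts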

To convert these to tight bounding boxes, you need the original binary
image (it's another 10-20 lines to do that conversion).
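As a rough illustration of that conversion, here is a minimal sketch, not the code referred to in the comment. It makes several assumptions: the binary image is a list of rows of 0/1 values indexed in the same coordinate frame as the decoded cuts, every cut runs monotonically from y = 0 to y = ymax, and the left and right edges of the image act as the outer boundaries. The helpers `cut_x_at` and `tight_bboxes` are hypothetical names, and `decode_cuts` is a Python 3 port of the decoder quoted in this comment, without its debug output.

```python
def decode_cuts(s, x=0, ymax=None):
    """Python 3 port of the quoted decoder (debug output omitted)."""
    cuts = []
    for path in s.split():
        turns = [int(p) for p in path.split(",")]
        x += turns[0]
        pos = [x, 0]
        cut = [tuple(pos)]
        for i, d in enumerate(turns[1:]):
            pos[(i + 1) % 2] += d
            cut.append(tuple(pos))
        if ymax is not None:
            pos[1] = ymax
            cut.append(tuple(pos))
        cuts.append(cut)
    return cuts


def cut_x_at(cut, y):
    """Hypothetical helper: x position of a cut path in pixel row y."""
    for (x0, y0), (x1, y1) in zip(cut, cut[1:]):
        # Only vertical segments that span row y determine the position.
        if x0 == x1 and min(y0, y1) <= y < max(y0, y1):
            return x0
    raise ValueError("cut does not cross row %d" % y)


def tight_bboxes(image, cuts):
    """Hypothetical helper: inclusive (x0, y0, x1, y1) ink box per region.

    Assumes image[y][x] is 1 for ink and 0 for background, and that the
    regions lie between consecutive cuts plus the image's own side edges.
    """
    height, width = len(image), len(image[0])
    boxes = [None] * (len(cuts) + 1)
    for y in range(height):
        edges = [0] + [cut_x_at(c, y) for c in cuts] + [width]
        for k in range(len(boxes)):
            for x in range(edges[k], edges[k + 1]):
                if not image[y][x]:
                    continue
                if boxes[k] is None:
                    boxes[k] = [x, y, x, y]
                else:
                    b = boxes[k]
                    boxes[k] = [min(b[0], x), min(b[1], y),
                                max(b[2], x), max(b[3], y)]
    return [tuple(b) if b else None for b in boxes]


# Toy 6x4 image with two kerned "glyphs" separated by one stepped cut.
image = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
]
cuts = decode_cuts("3,2,-1", ymax=4)
boxes = tight_bboxes(image, cuts)
```

In this toy case the two resulting boxes share pixel column 2, even though the stepped cut separates the ink cleanly. That is the kerning situation the comment describes: the boxes alone cannot express where one character ends and the next begins.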

@kba
Owner Author

kba commented Oct 26, 2016

@mttagessen in #17 (comment)

My point about x_cuts, x_confs, x_* still stands even if you cut it down to a single engine and to re-encoding existing output. Without access to the particular model, it is still impossible to align confidences/bboxes with code points, even when you can make sure that nobody "tampered" with the file by renormalizing it to another Unicode normalization form. The fundamental reason is that there is no mapping between Unicode code points and recognition units. Formats like AbbyyXML actually allow this alignment because they are designed bottom-up (glyph-first) instead of top-down like hOCR. I use "glyph" to mean the lowest level of label an engine may produce.

While per-character bounding boxes are indeed rather useless (and techniques like CTC layers may or may not produce them randomly), quite a few people seem keen on confidences for postprocessing.
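The renormalization problem mentioned in this comment can be reproduced with the standard library alone. The following is only a sketch: the word, the confidence values, and the `align` helper are made up for illustration and do not come from any OCR engine.

```python
import unicodedata

# Hypothetical engine output: one confidence per code point of the NFC text.
text_nfc = unicodedata.normalize("NFC", "\u00c4rger")  # "Ärger", 5 code points
confs = [0.91, 0.99, 0.97, 0.98, 0.99]                 # made-up values

# Renormalizing to NFD splits U+00C4 into "A" plus U+0308 (combining diaeresis).
text_nfd = unicodedata.normalize("NFD", text_nfc)


def align(text, confs):
    """Pair code points with confidences, or return None if counts differ."""
    if len(text) != len(confs):
        return None
    return list(zip(text, confs))


aligned_nfc = align(text_nfc, confs)  # five (code point, confidence) pairs
aligned_nfd = align(text_nfd, confs)  # None: six code points, five values
```

Here the mismatch is at least detectable because the counts differ. The comment's broader point is that even with matching counts, nothing in the file says which recognition unit produced which code point.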

@kba
Owner Author

kba commented Oct 26, 2016

Kerning:

image
