Access Non-Axis-Aligned Bounding Boxes #359

Open
zkalson opened this issue Apr 17, 2024 · 2 comments
Labels
enhancement New feature or request

Comments


zkalson commented Apr 17, 2024

Hi all,

Based on my understanding, Textract provides both an axis-aligned BoundingBox object and a Polygon object made up of finer-grained points (https://docs.aws.amazon.com/textract/latest/dg/text-location.html). It seems that Textractor only exposes the BoundingBox object.

When documents contain significant skew or rotation, axis-aligned boxes will be much larger than non-axis-aligned boxes, and they won't neatly match up with the actual position of the text.
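To put a rough number on that, here is a minimal sketch that compares the area of an axis-aligned box with the area of the rotated rectangle it encloses. The function names and the 200 x 20 example line are made up for illustration; they are not part of Textract or Textractor.

```python
import math


def aabb_of_rotated_rect(width, height, angle_deg):
    """Return (aabb_width, aabb_height) for a width x height rectangle
    rotated by angle_deg around its center."""
    theta = math.radians(angle_deg)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    return width * c + height * s, width * s + height * c


def aabb_inflation(width, height, angle_deg):
    """Ratio of the axis-aligned box area to the true rectangle area."""
    aabb_w, aabb_h = aabb_of_rotated_rect(width, height, angle_deg)
    return (aabb_w * aabb_h) / (width * height)


# Hypothetical text line: 200 units wide, 20 units tall, skewed by 10 degrees.
ratio = aabb_inflation(200, 20, 10)
```

For this hypothetical line, the axis-aligned box covers roughly 2.7 times the area of the text itself, and the gap grows for longer lines or larger skew angles.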

I've attached an example input document, an output text layer using Textractor results, and an output text layer from a different OCR inference that provided non-axis-aligned bounding boxes to hopefully make this easy to visualize.

input_document.pdf
text_layer_non-aabb.pdf
text_layer_textractor_aabb.pdf

Is it possible to add the Polygon object in Textractor? It would be a big help!

Author

zkalson commented Apr 17, 2024

As a temporary workaround, I am taking the id field from each word/line and looking up the associated polygon in Document.response.
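A minimal sketch of that lookup, assuming the response is a single Textract JSON dict with a top-level "Blocks" list, where each block carries "Id" and "Geometry" with a "Polygon" list. The helper names and the sample values are hypothetical; a multi-page or list-shaped response would need an extra loop.

```python
from typing import Dict, List, Optional

Point = Dict[str, float]


def index_polygons(response: dict) -> Dict[str, List[Point]]:
    """Map each block Id to its Geometry.Polygon, skipping blocks without one."""
    polygons: Dict[str, List[Point]] = {}
    for block in response.get("Blocks", []):
        polygon = block.get("Geometry", {}).get("Polygon")
        if polygon is not None:
            polygons[block["Id"]] = polygon
    return polygons


def polygon_for(entity_id: str, polygons: Dict[str, List[Point]]) -> Optional[List[Point]]:
    """Return the polygon for an entity id, or None if the id is unknown."""
    return polygons.get(entity_id)


# Made-up response fragment in the Textract block layout, for illustration only.
sample_response = {
    "Blocks": [
        {
            "Id": "w1",
            "BlockType": "WORD",
            "Geometry": {
                "BoundingBox": {"Left": 0.1, "Top": 0.2, "Width": 0.3, "Height": 0.1},
                "Polygon": [
                    {"X": 0.1, "Y": 0.2},
                    {"X": 0.4, "Y": 0.22},
                    {"X": 0.39, "Y": 0.3},
                    {"X": 0.09, "Y": 0.28},
                ],
            },
        }
    ]
}

polygons = index_polygons(sample_response)
word_polygon = polygon_for("w1", polygons)
```

With a real document this would presumably be called as `index_polygons(document.response)` followed by `polygon_for(word.id, polygons)`. Building the index once avoids rescanning every block for each word.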

Contributor

Belval commented May 6, 2024

You can use the word's or line's raw_object member to get the polygon without doing an id-based lookup.

https://github.com/aws-samples/amazon-textract-textractor/blob/master/textractor/parsers/response_parser.py#L226
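A minimal sketch of reading the polygon that way, assuming raw_object holds the unmodified Textract block dict for the entity (not verified against every Textractor version). The helper name `polygon_pixels` and the sample block are hypothetical. Textract polygon coordinates are ratios of the page width and height, so the helper scales them to pixels.

```python
from typing import List, Tuple


def polygon_pixels(raw_block: dict, page_width: float, page_height: float) -> List[Tuple[float, float]]:
    """Convert a block's normalized Geometry.Polygon points to pixel coordinates.

    Returns an empty list if the block has no polygon.
    """
    polygon = raw_block.get("Geometry", {}).get("Polygon", [])
    return [(p["X"] * page_width, p["Y"] * page_height) for p in polygon]


# Made-up block in the Textract layout, standing in for word.raw_object.
sample_block = {
    "Id": "w1",
    "BlockType": "WORD",
    "Geometry": {
        "Polygon": [
            {"X": 0.5, "Y": 0.25},
            {"X": 0.75, "Y": 0.25},
            {"X": 0.75, "Y": 0.5},
            {"X": 0.5, "Y": 0.5},
        ]
    },
}

points = polygon_pixels(sample_block, page_width=1000, page_height=800)
```

With a real document the call would presumably look like `polygon_pixels(word.raw_object, width, height)`, with the page dimensions taken from the rendered image.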

In the future we would definitely like to support Polygon objects, but it will require some work as a lot of the code is tightly coupled with the BoundingBox object.

Belval added the enhancement label May 6, 2024