In the future we would definitely like to support Polygon objects, but it will require some work as a lot of the code is tightly coupled with the BoundingBox object.
Hi all,
Based on my understanding, Textract provides an axis-aligned BoundingBox object and a Polygon object which is composed of more specific points (https://docs.aws.amazon.com/textract/latest/dg/text-location.html). It seems that Textractor only provides the BoundingBox object.
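For reference, here is a minimal sketch of how the two objects sit side by side in a raw Textract response block. The `Geometry`, `BoundingBox` (`Width`, `Height`, `Left`, `Top`) and `Polygon` (`X`, `Y`) keys follow the Textract docs linked above; the block itself, its coordinate values, and the helper names `polygon_points` and `bbox_corners` are made up for illustration and are not part of Textractor.

```python
# Hypothetical WORD block with made-up coordinates, shaped like one entry
# of the "Blocks" list in a raw Textract response.
block = {
    "BlockType": "WORD",
    "Text": "example",
    "Geometry": {
        # Axis-aligned box, as ratios of the page width/height.
        "BoundingBox": {"Width": 0.30, "Height": 0.20, "Left": 0.10, "Top": 0.10},
        # Corner points of the tighter outline around the skewed text.
        "Polygon": [
            {"X": 0.10, "Y": 0.25},
            {"X": 0.35, "Y": 0.10},
            {"X": 0.40, "Y": 0.15},
            {"X": 0.15, "Y": 0.30},
        ],
    },
}


def polygon_points(block):
    """Return the polygon as a list of (x, y) tuples."""
    return [(p["X"], p["Y"]) for p in block["Geometry"]["Polygon"]]


def bbox_corners(block):
    """Return the four corners of the axis-aligned bounding box."""
    b = block["Geometry"]["BoundingBox"]
    left, top = b["Left"], b["Top"]
    right, bottom = left + b["Width"], top + b["Height"]
    return [(left, top), (right, top), (right, bottom), (left, bottom)]
```

In this made-up block the bounding box spans the polygon's extreme X and Y values, which is what I'd expect from an axis-aligned box around rotated text.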
When documents contain significant skew or rotation, axis-aligned boxes will be much larger than non-axis-aligned boxes, and they won't neatly match up with the actual position of the text.
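To put a rough number on this, here is a small standalone sketch (plain Python, no Textract or Textractor calls) that rotates a thin rectangle standing in for a line of text and compares the area of its axis-aligned box with the area of the rectangle itself. The dimensions and the 10 degree skew are arbitrary assumptions chosen for illustration.

```python
import math


def rotated_rect(cx, cy, w, h, angle_deg):
    """Corners of a w x h rectangle centred at (cx, cy), rotated by angle_deg."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    corners = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * cos_a - y * sin_a, cy + x * sin_a + y * cos_a) for x, y in corners]


def shoelace_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def aabb_area(points):
    """Area of the axis-aligned bounding box enclosing the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# A long, thin "text line" (60% of page width, 2% of page height) skewed by 10 degrees.
line = rotated_rect(0.5, 0.5, 0.6, 0.02, 10)
ratio = aabb_area(line) / shoelace_area(line)
print(f"AABB is {ratio:.1f}x the area of the rotated box")
```

With these assumed numbers the axis-aligned box comes out roughly six times larger than the text region it encloses, and the gap grows with the skew angle and with how long and thin the line is.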
I've attached an example input document, an output text layer using Textractor results, and an output text layer from a different OCR inference that provided non-axis-aligned bounding boxes to hopefully make this easy to visualize.
input_document.pdf
text_layer_non-aabb.pdf
text_layer_textractor_aabb.pdf
Is it possible to add the Polygon object in Textractor? It would be a big help!