Hi,
My customer is receiving the error below when using Textractor with a large multi-page PDF file.
899858907a773d1d5932a263c039a8fced6b281b0e716fbd31366bff7c4392c
Traceback (most recent call last):
File "C:\Users\YADAVA66\PycharmProjects\pythonProject\main.py", line 80, in <module>
doc = Document(response)
File "C:\Users\YADAVA66\PycharmProjects\Textract\lib\site-packages\trp\__init__.py", line 633, in __init__
self._parse()
File "C:\Users\YADAVA66\PycharmProjects\Textract\lib\site-packages\trp\__init__.py", line 667, in _parse
page = Page(documentPage["Blocks"], self._blockMap)
File "C:\Users\YADAVA66\PycharmProjects\Textract\lib\site-packages\trp\__init__.py", line 516, in __init__
self._parse(blockMap)
File "C:\Users\YADAVA66\PycharmProjects\Textract\lib\site-packages\trp\__init__.py", line 530, in _parse
l = Line(item, blockMap)
File "C:\Users\YADAVA66\PycharmProjects\Textract\lib\site-packages\trp\__init__.py", line 142, in __init__
if(blockMap[cid]["BlockType"] == "WORD"):
KeyError: '5e06e009-03ac-42cc-9abf-4df8f606c2af'
schadem
transferred this issue from aws-samples/amazon-textract-textractor
Jun 12, 2023
This is not a bug. The JSON passed to trp is incomplete, so it is missing a block ID that another block references. This usually happens when an asynchronous API (Start*) is called, the result is paginated, and only the first JSON response is used.
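To confirm that a response is incomplete before handing it to the parser, a check along these lines may help. It is only a sketch: `find_missing_block_ids` is a hypothetical helper name, not part of trp, and it assumes the usual Textract response shape in which each block has an `Id` and optional `Relationships` entries listing `Ids`.

```python
def find_missing_block_ids(response):
    """Return the set of block IDs that are referenced but not present.

    Hypothetical helper (not part of trp). Assumes `response` is a dict with
    a "Blocks" list, as returned by the Textract Get* APIs.
    """
    blocks = response.get("Blocks", [])
    known_ids = {block["Id"] for block in blocks}

    missing = set()
    for block in blocks:
        # "Relationships" can be absent (or None) for leaf blocks such as WORD.
        for relationship in block.get("Relationships") or []:
            for referenced_id in relationship.get("Ids", []):
                if referenced_id not in known_ids:
                    missing.add(referenced_id)
    return missing
```

A non-empty result suggests that only part of a paginated result was collected, which would produce the same kind of `KeyError` as in the traceback above.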
Use get_full_json_from_output_config or get_full_json from https://pypi.org/project/amazon-textract-caller/ to get the full JSON object, and pass that to the Textract response parser.
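For illustration, collecting every page by hand could look roughly like the sketch below. This is not the implementation of the amazon-textract-caller helpers; `get_all_blocks` is a hypothetical name, `client` is assumed to be a boto3 Textract client (`boto3.client("textract")`), and the job is assumed to have already finished with status SUCCEEDED. For jobs started with StartDocumentAnalysis, `get_document_analysis` would be called instead.

```python
def get_all_blocks(client, job_id):
    """Merge all paginated results of an async text-detection job.

    Hypothetical helper. `client` is assumed to be a boto3 Textract client
    and the job identified by `job_id` is assumed to be complete.
    """
    first_page = None
    blocks = []
    next_token = None

    while True:
        kwargs = {"JobId": job_id}
        if next_token:
            kwargs["NextToken"] = next_token
        page = client.get_document_text_detection(**kwargs)

        if first_page is None:
            first_page = page
        blocks.extend(page.get("Blocks", []))

        # A missing NextToken means this was the last page.
        next_token = page.get("NextToken")
        if not next_token:
            break

    # Keep the metadata of the first page, but with the full list of blocks.
    merged = dict(first_page)
    merged["Blocks"] = blocks
    merged.pop("NextToken", None)
    return merged
```

The merged dict can then be passed to `Document(...)` in place of the single first response.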
I'm keeping this issue open as a reminder to update the error message so that it points here and recommends getting the full JSON.
athewsey
changed the title
Error parsing multiple page pdf
Improve error messages for missing blocks when parsing incomplete JSON
Jun 7, 2024