RapidOCR Error - Leaked Semaphore Objects & OOM Killer #231
Comments
I suspect that some of the 1000 images are very large, so the memory requested while recognizing them exceeds the limit. I will add logic to the code to keep memory usage under the limit. |
@SWHL, Thank you for the response! I have a couple of follow-up questions based on your suggestions:
|
These two points are already under development; please refer to the develop branch. They will be included in a new release soon. |
You can try it again with the |
@SWHL, Thanks for the update. I tried version 1.3.25, but it does not work for me; I am still facing the same issue. |
Can you confirm whether specific images among the 1000 consistently trigger the OOM issue? If it can be reproduced reliably, please provide such an image. |
@SWHL, I believe this issue is related to image dimensions. In my case, the OOM killer was triggered by an image 1px wide and 602px high. Could you clarify the minimum and maximum supported width and height?

I also have a suggestion for improving the plugin: implement an internal image size check. If an image’s dimensions are outside the supported range, the plugin could resize it; if resizing isn’t possible, the image could be skipped during OCR. This would be particularly useful when the plugin is integrated with other tools such as LangChain. For instance, LangChain’s PDF loader (when extract_image is set to true) uses RapidOCR internally for OCR. Since we cannot predict the dimensions of images embedded in a PDF, a dimension check before processing each image would make the workflow more robust.

I have attached the sample 1px-wide image here. |
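As a rough illustration of the dimension check suggested above, here is a minimal sketch of a caller-side guard that runs before images are handed to OCR. The function names and both thresholds (`MIN_SIDE_PX`, `MAX_ASPECT_RATIO`) are hypothetical and are not part of RapidOCR; the right limits depend on the documents being processed.

```python
from typing import Iterable, List, Tuple

# Hypothetical thresholds chosen for illustration; not values from RapidOCR.
MIN_SIDE_PX = 8          # shorter side must be at least this many pixels
MAX_ASPECT_RATIO = 50.0  # longer side / shorter side must not exceed this


def is_ocr_candidate(width: int, height: int,
                     min_side: int = MIN_SIDE_PX,
                     max_aspect: float = MAX_ASPECT_RATIO) -> bool:
    """Return True if the dimensions look reasonable to send to OCR."""
    if width <= 0 or height <= 0:
        return False
    shorter, longer = min(width, height), max(width, height)
    if shorter < min_side:
        return False
    return longer / shorter <= max_aspect


def filter_sizes(sizes: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Keep only the (width, height) pairs that pass the guard."""
    return [(w, h) for w, h in sizes if is_ocr_candidate(w, h)]


if __name__ == "__main__":
    # The 1px x 602px image from this thread is rejected; a normal page passes.
    print(filter_sizes([(1, 602), (800, 600)]))
```

In practice the `(width, height)` pair could come from Pillow's `Image.size`, so the check can run without decoding the image into a full array first.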
Thanks for the suggestion. There is definitely something wrong with the image resizing here: RapidOCR/python/rapidocr_onnxruntime/main.py Lines 129 to 140 in 62bc487
The original image is 1px wide and 602px high. After preprocessing, the image shape is height=18048px, width=32px. It then enters the following function: RapidOCR/python/rapidocr_onnxruntime/main.py Lines 142 to 159 in 62bc487
Before entering the text detection model, the image is still 32px wide and 18048px high, which triggers the OOM problem. I'm thinking about how to avoid this, or how to filter out this kind of image before it is sent to OCR. |
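The size blow-up described in this comment can be reproduced with simple arithmetic. The sketch below is not the RapidOCR code. It assumes the shorter side is scaled up to a minimum of 30px and both sides are then rounded to a multiple of 32. Both values are assumptions, chosen only because they reproduce the 32px by 18048px shape reported above.

```python
from typing import Tuple


def upscale_to_min_side(width: int, height: int,
                        min_side: int = 30,     # assumed, not confirmed
                        multiple: int = 32      # assumed, not confirmed
                        ) -> Tuple[int, int]:
    """Scale so the shorter side reaches min_side, then snap to a multiple."""
    ratio = min_side / min(width, height)
    new_w = round(width * ratio / multiple) * multiple
    new_h = round(height * ratio / multiple) * multiple
    return new_w, new_h


if __name__ == "__main__":
    w, h = upscale_to_min_side(1, 602)
    # Pixel count grows from 1 * 602 = 602 to w * h.
    print(w, h, w * h)
```

Under these assumptions a 1px by 602px input becomes 32px by 18048px, so the pixel count grows from 602 to 577,536. The cost is driven by the aspect ratio rather than by the original file size.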
@SWHL, Any update on it? |
Problem Description:
While processing a large number of images (approximately 1000) using RapidOCR, I encountered the following errors midway through the process:
System Information:
rapidocr-onnxruntime 1.3.24
Reproducible Code:
Research & Findings:
These errors seem to be related to memory leaks during batch image processing. I am uncertain about how to resolve these issues within RapidOCR, especially when handling large numbers of images.
Additional Questions:
Any guidance or solutions would be greatly appreciated!