[Bug]: Hebrew not supported with LibreOffice(?) #2494

Open
1 task done
MatanMaimon opened this issue Dec 18, 2024 · 5 comments

MatanMaimon commented Dec 18, 2024

Installation Method

Docker

The Problem

When I use "Convert any file to PDF", any file (docx, txt, etc.) written in Hebrew produces a PDF with broken text: every character is rendered as an empty square (screenshot attached).

You can check the resulting PDF file:
hebrew_text_convertedToPDF.pdf

You can reproduce the problem with this simple text file:
hebrew_text.txt

Version of Stirling-PDF

0.36.2

Last Working Version of Stirling-PDF

No response

Page Where the Problem Occurred

http://localhost:8080/file-to-pdf

Docker Configuration

version: '3.3'
services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:latest
    ports:
      - '8080:8080'
    volumes:
      - ./trainingData:/usr/share/tessdata
    environment:
      - DOCKER_ENABLE_SECURITY=false
      - INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false
      - LANGS=en_GB

Relevant Log Output

No response

Additional Information

I assume this is related to issue #420.

Browsers Affected

No response

No Duplicate of the Issue

  • I have verified that there are no existing issues raised related to my problem.

dosubot bot commented Dec 18, 2024

Hey there, @MatanMaimon! I'm Dosu, a bot here to help you with bug fixes, answer questions, and guide you on becoming a contributor. Let's work on resolving the issue you're facing while we wait for a human maintainer.

A similar closed issue was found regarding font-related problems with non-Latin characters, specifically Cyrillic characters. The solution suggested in that issue was to address missing fonts by specifying the language code in the configuration file. You can refer to the example provided in the Docker configuration file here: docker-compose-latest.yml. This should help in downloading the correct fonts needed for Hebrew text [1].

In your Docker configuration, you can try adding the Hebrew language code to the LANGS environment variable. For example, you might modify it to include he_IL or another appropriate Hebrew language code. This change should help ensure that the necessary fonts for Hebrew are installed and used during the conversion process.


@MatanMaimon
Author

@dosu I tried this configuration and it didn't help (I restarted the container, and also removed and recreated it, etc.):

version: '3.3'
services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:latest
    ports:
      - '8080:8080'
    volumes:
      - ./trainingData:/usr/share/tessdata
    environment:
      DOCKER_ENABLE_SECURITY: false
      INSTALL_BOOK_AND_ADVANCED_HTML_OPS: false
      LANGS: "en_GB,en_US,he_IL"
      SYSTEM_DEFAULTLOCALE: he_IL

@Frooodle
Member

Can you try setting LANGS to "ALL"?
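
For reference, applied to the compose file from the report, that suggestion might look like the sketch below. Only the LANGS value differs from the reporter's configuration; the boolean-like values are quoted here as a precaution so that the YAML parser passes them through as strings. This is an illustration of the suggestion, not a tested configuration.

```yaml
services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:latest
    ports:
      - '8080:8080'
    volumes:
      - ./trainingData:/usr/share/tessdata
    environment:
      # Quoted so YAML does not turn them into booleans
      DOCKER_ENABLE_SECURITY: "false"
      INSTALL_BOOK_AND_ADVANCED_HTML_OPS: "false"
      # Suggested change: "ALL" instead of a list of specific language codes
      LANGS: "ALL"
```

After changing the value, the container has to be recreated (not just restarted) for the new environment to take effect.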


dosubot bot commented Dec 18, 2024

If adding the Hebrew language code to the LANGS environment variable didn't resolve the issue, you might want to check if the necessary fonts for Hebrew are installed in the Docker container. Sometimes, the issue with non-Latin scripts like Hebrew can be due to missing fonts.

Here are a few steps you can try:

  1. Install Hebrew Fonts: Ensure that Hebrew fonts are installed in the Docker container. You can do this by modifying the Dockerfile to include the installation of Hebrew fonts. For example, you can add a line to install ttf-freefont or any other package that includes Hebrew fonts.

  2. Verify Font Configuration: Check if the font configuration in LibreOffice (used by Stirling-PDF for conversion) is set to use the installed Hebrew fonts. This might require accessing the LibreOffice configuration files within the container.

  3. Check Locale Settings: Ensure that the locale settings in the container are correctly configured for Hebrew. You can do this by setting the LANG and LC_ALL environment variables to he_IL.UTF-8.

  4. Rebuild the Docker Image: After making these changes, rebuild the Docker image and restart the container to apply the new configurations.

These steps should help in resolving the issue with Hebrew text not displaying correctly in the converted PDF files.
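
As a rough sketch of step 3 only, the locale variables could be set through the compose file as shown below. This assumes the he_IL.UTF-8 locale is actually available inside the image, which nothing in this thread confirms, and setting these variables does not by itself install any fonts.

```yaml
services:
  stirling-pdf:
    environment:
      # Assumption: the he_IL.UTF-8 locale exists in the container image
      LANG: he_IL.UTF-8
      LC_ALL: he_IL.UTF-8
```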


@MatanMaimon
Author

MatanMaimon commented Dec 18, 2024

@Frooodle that worked!
However, the resulting PDF is laid out LTR. Can't it auto-detect the direction of the source file and use RTL when needed? Alternatively, can I explicitly set the direction of the resulting PDF to RTL?
