PDF to Audio Conversion Script

This project extracts text from a PDF document, formats it using Google Generative AI, and converts the formatted text into an audio file using Google Cloud Text-to-Speech.

Prerequisites

Python 3.x
pdfminer.six library
google-cloud-texttospeech library
google-generativeai library
python-dotenv library
Google Cloud credentials JSON file

Installation

Clone the repository:

git clone https://github.com/your-username/pdf-to-audio.git
cd pdf-to-audio

Install the necessary libraries:

pip install pdfminer.six google-cloud-texttospeech google-generativeai python-dotenv

Set up Google Cloud:
- Obtain a Google Cloud credentials JSON file and place it in a secure location on your machine.
- Enable the Text-to-Speech API on Google Cloud.

Set up environment variables:
- Create a .env file in the project directory with the following content:
```
API_KEY=your_google_generative_ai_api_key
```

Usage

Ensure the environment variables are loaded:

import os
from dotenv import load_dotenv

load_dotenv()  # read variables from the .env file into the process environment
API_KEY = os.getenv("API_KEY")
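
Because os.getenv returns None when a variable is missing, a typo in the .env file would otherwise only surface later as an authentication error. The following is a minimal sketch of a stricter lookup; the helper name require_env is hypothetical and not part of this project's main.py.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name, "").strip()
    if not value:
        # Hypothetical error wording; adjust to taste.
        raise RuntimeError(
            f"Environment variable {name!r} is missing or empty; check your .env file."
        )
    return value
```

Call it after load_dotenv(), for example API_KEY = require_env("API_KEY"), so that a missing key stops the script before any API request is made.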

Set the environment variable for Google Cloud credentials:

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "/path/to/your/credentials_key.json"

Extract text from a PDF:

from pdfminer.high_level import extract_text
text = extract_text("/path/to/your/pdf_document.pdf")
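
Text extracted from a PDF usually keeps the hard line breaks of the page layout, and pdfminer separates pages with a form feed character. An optional cleanup pass before sending the text to the model might look like the sketch below. The function name clean_extracted_text is hypothetical, and the de-hyphenation rule is a simple heuristic that can also join genuinely hyphenated compounds that happen to break across lines.

```python
import re


def clean_extracted_text(raw: str) -> str:
    """Normalize whitespace in text extracted from a PDF."""
    # Treat the form feed between pages as a paragraph break.
    text = raw.replace("\f", "\n\n")
    # Heuristic: rejoin words hyphenated across a line break ("conver-\nsion").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Split on blank lines, then collapse all whitespace inside each paragraph.
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", text)]
    return "\n\n".join(p for p in paragraphs if p)
```

With this in place, text = clean_extracted_text(extract_text("/path/to/your/pdf_document.pdf")) yields one line per paragraph, which also reduces the amount of text sent to the model.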

Configure the Generative AI API:

import google.generativeai as genai
genai.configure(api_key=API_KEY)
model = genai.GenerativeModel('gemini-1.5-flash')

Format the text using Generative AI:

response = model.generate_content(
    "In the following text, remove all numbers and special characters, make it more readable, "
    "and return the result as paragraphs only, not bullet points. Here is the text:\n" + text
)
formatted_text = response.text

Set up the Google Cloud Text-to-Speech client:

from google.cloud import texttospeech
client = texttospeech.TextToSpeechClient()

Convert the formatted text to audio:

synthesis_input = texttospeech.SynthesisInput(text=formatted_text)

voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    ssml_gender=texttospeech.SsmlVoiceGender.MALE
)

audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

with open("output.mp3", "wb") as out:
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
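
The single synthesize_speech call above works for short documents. For longer ones, the Text-to-Speech API rejects input above a per-request size limit (5,000 bytes to the best of my knowledge; check the current quota documentation). One way around this is to split the formatted text into smaller pieces and synthesize each in turn. The sketch below shows only the splitting step; the function names and the 4,500-byte default are assumptions chosen to leave some headroom.

```python
import re


def _pack(units: list[str], max_bytes: int) -> list[str]:
    """Greedily join units with spaces while each chunk stays within max_bytes."""
    chunks: list[str] = []
    current = ""
    for unit in units:
        candidate = f"{current} {unit}" if current else unit
        if current and len(candidate.encode("utf-8")) > max_bytes:
            chunks.append(current)
            current = unit
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def chunk_text(text: str, max_bytes: int = 4500) -> list[str]:
    """Split text at sentence boundaries into chunks of at most max_bytes (UTF-8)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    units: list[str] = []
    for sentence in sentences:
        if len(sentence.encode("utf-8")) > max_bytes:
            # Oversized sentence: fall back to word-level units.
            # A single word longer than max_bytes is left as-is.
            units.extend(sentence.split())
        else:
            units.append(sentence)
    return _pack(units, max_bytes)
```

Each chunk can then be passed to client.synthesize_speech as in the step above, writing every response.audio_content to the same output file opened in "wb" mode. Appending MP3 segments this way generally plays back correctly, though that is an assumption worth verifying with your player.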

Notes

Ensure that the paths to your PDF document and Google Cloud credentials JSON file are correct.
Wrap the file and API calls in exception handling so that a missing file, invalid credentials, or a failed request produces a clear error message instead of a raw traceback.
You can customize the voice and audio settings by changing the arguments passed to VoiceSelectionParams and AudioConfig.
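
As an illustration of the first two notes, the paths can be checked up front so that a wrong location fails immediately with a readable message. This is a minimal sketch; check_input_files is a hypothetical helper, not something defined in this repository.

```python
from pathlib import Path


def check_input_files(pdf_path: str, credentials_path: str) -> None:
    """Raise FileNotFoundError if either input path does not point to a file."""
    inputs = (("PDF document", pdf_path), ("credentials file", credentials_path))
    for label, raw_path in inputs:
        path = Path(raw_path).expanduser()
        if not path.is_file():
            raise FileNotFoundError(f"{label} not found: {path}")
```

Calling check_input_files(...) before extract_text and before setting GOOGLE_APPLICATION_CREDENTIALS keeps path mistakes separate from API errors.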

License

This project is licensed under the MIT License. See the LICENSE file for details.
