This project extracts text from a PDF document, formats it using Google Generative AI, and converts the formatted text into an audio file using Google Cloud Text-to-Speech.
Prerequisites:
- Python 3.x
- pdfminer.six library
- google-cloud-texttospeech library
- google-generativeai library
- python-dotenv library
- Google Cloud credentials JSON file
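Before running anything, you can check that the Python packages listed above are installed. The script below is a minimal sketch that uses only the standard library. It assumes the import names are the modules these pip packages normally provide (`pdfminer`, `google.cloud.texttospeech`, `google.generativeai`, `dotenv`). The helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Import names (not pip names) of the libraries this project uses.
# Assumption: these are the modules the pip packages install.
REQUIRED_MODULES = [
    "pdfminer",                   # pdfminer.six
    "google.cloud.texttospeech",  # google-cloud-texttospeech
    "google.generativeai",        # google-generativeai
    "dotenv",                     # python-dotenv
]


def missing_modules(names):
    """Return the module names that cannot be found (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            # For dotted names, find_spec imports the parent packages first,
            # which raises ImportError (ModuleNotFoundError) if a parent is absent.
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(REQUIRED_MODULES)
    if absent:
        print("Missing modules:", ", ".join(absent))
    else:
        print("All required modules are installed.")
```

If any names are printed, install the corresponding packages with the pip command from the installation steps.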
- Clone the repository:
    git clone https://github.com/your-username/pdf-to-audio.git
    cd pdf-to-audio
- Install the necessary libraries:
    pip install pdfminer.six google-cloud-texttospeech google-generativeai python-dotenv
- Set up Google Cloud:
  - Obtain a Google Cloud credentials JSON file and place it in a secure location on your machine.
  - Enable the Text-to-Speech API in your Google Cloud project.
- Set up environment variables:
  - Create a .env file in the project directory with the following content:
      API_KEY=your_google_generative_ai_api_key
- Ensure the environment variables are loaded:
    import os

    from dotenv import load_dotenv

    # Load the key-value pairs from .env into os.environ
    load_dotenv()
    API_KEY = os.getenv("API_KEY")
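- Set the environment variable for Google Cloud credentials (do this before creating the Text-to-Speech client):
    import os

    # The Google Cloud client libraries read this variable when a client is created
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your/credentials_key.json"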
- Extract text from a PDF:
    from pdfminer.high_level import extract_text

    text = extract_text("/path/to/your/pdf_document.pdf")
- Configure the Generative AI API:
    import google.generativeai as genai

    genai.configure(api_key=API_KEY)
    model = genai.GenerativeModel("gemini-1.5-flash")
- Format the text using Generative AI:
    prompt = (
        "In the following text, remove all numbers and special characters, "
        "make it more readable, and give the response in paragraphs. "
        "Do not use bullet points; use paragraphs only. Here is the text:\n"
    )
    response = model.generate_content(prompt + text)
    formatted_text = response.text
- Set up the Google Cloud Text-to-Speech client:
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
- Convert the formatted text to audio:
    synthesis_input = texttospeech.SynthesisInput(text=formatted_text)
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        ssml_gender=texttospeech.SsmlVoiceGender.MALE,
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # Write the binary MP3 data returned by the API
    with open("output.mp3", "wb") as out:
        out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
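Google Cloud Text-to-Speech limits the text input of a single `synthesize_speech` request to 5,000 bytes. Check the current quotas page, since this is an assumption about present limits. Because of this limit, the text from a long PDF will usually be rejected in one call. One option is to split `formatted_text` into chunks that fit under the limit, synthesize each chunk separately, and append the audio to one file. The sketch below covers only the splitting step. `split_for_tts` is a hypothetical helper. Its regex-based sentence split is a simplification, so abbreviations such as "e.g." may be treated as sentence ends.

```python
import re

# Assumed per-request input limit for synthesize_speech, in bytes.
MAX_TTS_BYTES = 5000


def split_for_tts(text, max_bytes=MAX_TTS_BYTES):
    """Split text into chunks whose UTF-8 encoding is at most max_bytes.

    Prefers sentence boundaries, then word boundaries. A single word longer
    than max_bytes is cut by characters as a last resort.
    """
    if max_bytes < 4:
        # A UTF-8 character can take up to 4 bytes, so smaller limits cannot work.
        raise ValueError("max_bytes must be at least 4")

    def size(s):
        return len(s.encode("utf-8"))

    def pieces():
        # Yield sentences, broken into words or character runs if too large.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if size(sentence) <= max_bytes:
                yield sentence
                continue
            for word in sentence.split():
                while size(word) > max_bytes:
                    # Find the longest prefix of the word that fits.
                    cut = max_bytes
                    while size(word[:cut]) > max_bytes:
                        cut -= 1
                    yield word[:cut]
                    word = word[cut:]
                if word:
                    yield word

    chunks = []
    current = ""
    for piece in pieces():
        candidate = f"{current} {piece}" if current else piece
        if size(candidate) <= max_bytes:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

To use it, run the synthesis call above once per chunk and write each `response.audio_content` to the same file opened in "wb" mode. Concatenated MP3 segments generally play back as a single file, but confirm this with the player you plan to use.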
Notes:
- Ensure that the paths to your PDF document and Google Cloud credentials JSON file are correct.
- Add exception handling (for example, around file access and API calls) so the script fails with clear error messages.
- You can customize the voice and audio settings by modifying VoiceSelectionParams and AudioConfig.
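The notes above mention incorrect paths and error handling. Several common problems can be caught before any API call: a wrong PDF path, an unreadable credentials file, or a missing API key. The following is one possible preflight check using only the standard library. The function name `preflight_errors` is hypothetical. The fields it looks for (`type`, `project_id`, `private_key`, `client_email`) are those found in a typical Google Cloud service account key file. This assumes you use a service account key rather than another credential type.

```python
import json
import os

# Fields normally present in a Google Cloud service account key file (assumption).
SERVICE_ACCOUNT_FIELDS = ("type", "project_id", "private_key", "client_email")


def preflight_errors(pdf_path, credentials_path, env_var="API_KEY"):
    """Return a list of readable problems; an empty list means all checks passed."""
    errors = []

    if not os.path.isfile(pdf_path):
        errors.append(f"PDF not found: {pdf_path}")

    if not os.path.isfile(credentials_path):
        errors.append(f"Credentials file not found: {credentials_path}")
    else:
        try:
            with open(credentials_path, encoding="utf-8") as f:
                creds = json.load(f)
        except (OSError, json.JSONDecodeError) as exc:
            errors.append(f"Credentials file is not readable JSON: {exc}")
        else:
            if not isinstance(creds, dict):
                errors.append("Credentials file does not contain a JSON object")
            else:
                missing = [k for k in SERVICE_ACCOUNT_FIELDS if k not in creds]
                if missing:
                    errors.append(
                        "Credentials file is missing fields: " + ", ".join(missing)
                    )

    # The API key is expected in the environment (e.g. loaded from .env).
    if not os.getenv(env_var):
        errors.append(f"Environment variable {env_var} is not set")

    return errors


if __name__ == "__main__":
    problems = preflight_errors(
        "/path/to/your/pdf_document.pdf",
        "/path/to/your/credentials_key.json",
    )
    for problem in problems:
        print("Problem:", problem)
```

Call it after `load_dotenv()` and before creating any clients. The script can then stop early with a clear message instead of a stack trace from inside an API call.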
This project is licensed under the MIT License. See the LICENSE file for details.