Flashcard Generator - Web Scraping Project

📚 Overview

This project is a web scraping tool that automates extracting flashcard data from the language learning website IKnowJP! and saving it to a database. The data includes vocabulary, translations, example sentences, pronunciation, and other relevant details.

The extracted data feeds another project, Flashcard Generator, which organizes it and generates the final flashcards for language learning. See that repository for how the data is used in practice.

Features

Scrapes vocabulary details (source word, translation, pronunciation, and usage examples).
Converts proficiency levels to standardized CEFR levels.
Saves the data directly into a SQL Server database.
Logs errors and saves failed items to a JSON file (failed_items.json) for troubleshooting.
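
The CEFR conversion listed above could be sketched as a simple lookup. This is only an illustration: the level labels the site actually uses are not documented in this README, so the numeric levels 1 to 6, the `SITE_LEVEL_TO_CEFR` table, and the `to_cefr` function are all hypothetical names and values, not the mapping used in scraper.py.

```python
from typing import Optional

# Hypothetical mapping: numeric levels 1-6 stand in for whatever labels
# the site really uses. Replace with the real values before relying on it.
SITE_LEVEL_TO_CEFR = {1: "A1", 2: "A2", 3: "B1", 4: "B2", 5: "C1", 6: "C2"}


def to_cefr(site_level: int) -> Optional[str]:
    """Return the CEFR label for a site level, or None if the level is unknown."""
    return SITE_LEVEL_TO_CEFR.get(site_level)
```

Returning None for unknown levels, rather than raising, lets the caller decide whether to skip the item or record it as failed.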

🛠️ Technologies Used

Python (Core scripting language)
Selenium WebDriver (Web scraping automation)
SQL Server (Database)
pyodbc (Database connection library)
Logging Module (Error and process logging)

For additional context, the repository includes HTML reference files that show the sections of the website the scraper reads.

📋 Setup Instructions

Prerequisites

Python 3.8+
Google Chrome and ChromeDriver installed
SQL Server with the specified database and table structure
Necessary Python libraries (Selenium and pyodbc; see Technologies Used)

Installation

Clone the repository:

git clone https://github.com/your-username/flashcard-generator.git
cd flashcard-generator

🗂️ Repository Contents

scraper.py: Main script for scraping and saving data.
failed_items.json: Log of failed items for troubleshooting.
HTML Reference Files (coursedata_estructure.html, divs_estructure.html, item_estructure.html): Examples of website structure used during scraping.

🌟 Additional Information

Error Handling:

Errors encountered during scraping are logged, and failed items are saved to a JSON file for easy review and reprocessing.
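
A minimal sketch of this log-and-save pattern, using only the standard library. The function name `process_items`, the `handler` callback, and the shape of each failed entry are assumptions made for illustration; the actual structure in scraper.py and failed_items.json may differ.

```python
import json
import logging
from pathlib import Path
from typing import Callable, Iterable, List

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("scraper")


def process_items(
    items: Iterable[dict],
    handler: Callable[[dict], None],
    failed_path: str = "failed_items.json",
) -> List[dict]:
    """Run handler on each item; log failures and write them to a JSON file."""
    failed = []
    for item in items:
        try:
            handler(item)
        except Exception as exc:
            # Keep going after a failure so one bad item does not stop the run.
            logger.error("Failed to process %r: %s", item, exc)
            failed.append({"item": item, "error": str(exc)})
    # ensure_ascii=False keeps non-Latin text readable in the output file.
    Path(failed_path).write_text(
        json.dumps(failed, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return failed
```

Storing the original item next to its error message means the file can be loaded later and passed back through `process_items` for reprocessing.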

Database Integration:

The script targets SQL Server but can be adapted for other relational databases by modifying the connection settings in the code.
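
One way to keep the database swappable is to write the save logic against a generic DB-API connection, so only the code that opens the connection changes. The sketch below is an assumption-laden illustration: the `flashcards` table, its columns, and the helper names `save_item` and `open_sqlite` are hypothetical and do not reflect the project's real schema. It uses SQLite so it runs without a server; for SQL Server, a connection from `pyodbc.connect(...)` could be passed to `save_item` instead, since pyodbc also uses `?` placeholders.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
INSERT_SQL = (
    "INSERT INTO flashcards (word, translation, cefr_level) VALUES (?, ?, ?)"
)


def save_item(conn, item: dict) -> None:
    """Insert one scraped item through any DB-API connection using '?' parameters."""
    cursor = conn.cursor()
    cursor.execute(
        INSERT_SQL, (item["word"], item["translation"], item["cefr_level"])
    )
    conn.commit()


def open_sqlite(path: str = ":memory:"):
    """Open a SQLite database and create the illustrative table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flashcards "
        "(word TEXT, translation TEXT, cefr_level TEXT)"
    )
    return conn
```

Parameterized queries also avoid building SQL from scraped strings, which matters when the scraped text contains quotes.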

🚀 Related Projects

Flashcard Generator: Uses the data from this project to generate language learning flashcards.


MonykPenafor/Raspagem-de-dados
