Collection and analysis of PGA Tour data. Python is used to collect and organize the data. Analysis may be performed in Python, R, Julia, or MATLAB.
1. Download the results webpages for every year of every PGA Tour tournament
2. Parse webpages into data by golfer
3. Collect list of courses
4. Get geolocation data of all courses in list
5. Split golfer data by round
6. Link geolocation data with golfer round data
7. Get weather conditions for every course on tournament days and write them to a CSV file
8. Link weather conditions to each existing round (move this step to before the split by round?)
9. (?) Get course length, rating, and slope for every course
10. Create a list of data in which each element is itself a list of data
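
As a rough illustration of steps 5 and 6 above (splitting golfer data by round, then linking geolocation data), here is a minimal sketch using only the standard library. The record layout, the field names (`golfer`, `course`, `scores`), and the functions `split_by_round` and `link_geolocation` are hypothetical and are not taken from the project code. The sample values are made up.

```python
from typing import Dict, List, Optional, Tuple

Coords = Tuple[Optional[float], Optional[float]]


def split_by_round(result: dict) -> List[dict]:
    """Turn one golfer's tournament result into one record per round."""
    return [
        {
            "golfer": result["golfer"],
            "course": result["course"],
            "round": round_number,
            "score": score,
        }
        for round_number, score in enumerate(result["scores"], start=1)
    ]


def link_geolocation(rounds: List[dict], course_coords: Dict[str, Coords]) -> List[dict]:
    """Attach latitude and longitude to each round record by course name.

    Courses missing from course_coords get None for both fields.
    """
    linked = []
    for record in rounds:
        lat, lon = course_coords.get(record["course"], (None, None))
        linked.append({**record, "lat": lat, "lon": lon})
    return linked


# Made-up sample data, for illustration only.
result = {"golfer": "Golfer A", "course": "Example Course", "scores": [70, 68, 72, 69]}
course_coords = {"Example Course": (33.5, -82.0)}

round_records = link_geolocation(split_by_round(result), course_coords)
```

Joining on the course name keeps the geolocation lookup (step 4) independent of the per-round split, so either step can be rerun on its own.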
-
Find which weather elements strongly correlate with each other and consider removing the redundant ones
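
One possible way to approach this is to compute the Pearson correlation for every pair of weather variables and flag the pairs above a cutoff. The sketch below assumes the weather data is available as equal-length numeric columns. The column names, the 0.9 threshold, and the sample values are all illustrative assumptions, not project data.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = sqrt(var_x * var_y)
    # A constant column has no defined correlation; treat it as 0 here.
    return cov / denom if denom else 0.0


def strongly_correlated(
    columns: Dict[str, List[float]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Made-up sample values, for illustration only.
weather = {
    "temperature": [10, 15, 20, 25, 30],
    "dew_point": [5, 9, 15, 19, 25],
    "wind_speed": [12, 3, 9, 4, 10],
}
flagged = strongly_correlated(weather)
```

With these sample values only the temperature and dew point pair is flagged, so one of the two would be a candidate for removal.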
-
Plan and perform analysis
Change the written CSVs to omit index columns (low priority)
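
If the CSVs are written with pandas (an assumption; the notes do not say which library is used), passing `index=False` to `DataFrame.to_csv` leaves out the index column. A minimal sketch with made-up data, writing to an in-memory buffer rather than a file:

```python
import io

import pandas as pd

# Made-up sample data, for illustration only.
rounds = pd.DataFrame({"golfer": ["Golfer A", "Golfer B"], "score": [70, 68]})

buffer = io.StringIO()
# index=False omits the row index, so the CSV has only the data columns.
rounds.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

The same keyword applies when a file path is passed instead of a buffer.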
Refactor html_parser.py and the scraper module.