For this project we'll be taking data from Reddit, specifically the r/DataEngineering subreddit.
Feel free to change the subreddit in the extract_reddit_etl.py script.
To extract Reddit data, we need to use its Application Programming Interface (API). There are a few steps you'll need to follow to set this up.
- Create a Reddit account.
- Navigate here and create an app. Make sure you select "script" from the radio buttons during the setup process.
- Take note of a few things once this is set up:
  - the App name
  - the App ID
  - the API Secret Key
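
One way to keep the three values noted above out of the script itself is to store them in a small config file and load them at runtime. The sketch below is only an illustration using Python's standard library: the section name `reddit_config`, the keys `app_name`, `app_id`, and `secret`, and the helper `load_reddit_credentials` are all hypothetical names chosen for this example, not something the project prescribes.

```python
import configparser
from dataclasses import dataclass


@dataclass(frozen=True)
class RedditCredentials:
    """The three values noted when the Reddit app was created."""
    app_name: str
    app_id: str
    secret: str


def load_reddit_credentials(path, section="reddit_config"):
    """Read the credentials from an INI-style file (hypothetical layout).

    Expected file contents, assuming the names used in this sketch:

        [reddit_config]
        app_name = <your app name>
        app_id = <your app ID>
        secret = <your API secret key>
    """
    # interpolation=None so a '%' inside the secret is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        raise FileNotFoundError(f"Could not read config file: {path}")

    values = {}
    for key in ("app_name", "app_id", "secret"):
        # fallback="" covers both a missing section and a missing key.
        value = parser.get(section, key, fallback="").strip()
        if not value:
            raise ValueError(f"Missing '{key}' in [{section}] of {path}")
        values[key] = value
    return RedditCredentials(**values)
```

If you take this approach, remember to keep the config file out of version control, since it contains your API secret key.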