This python script transforms data from authentication logs
We currently receive one 80mb log file per month from our Identity Management team that we use to visualize authentication traffic in Tableau, and compare it to web traffic from other sources. These files have over 1 million rows, and transformations must be done to:
- reformat the timestamp field into an actual timestamp data type (as opposed to a string)
- aggregate the authentication records up to a total count by date and hour
This is a great start, but I would like to create a mostly-automated workflow that:
- Places the log file into a network directory on a scheduled basis
- Runs a scheduled job to execute this python script on those files
- Appends the new, aggregated data set to a master file
- Updates a Tableau Data Extract based on the master file