This repository contains all the data analysis and data collection projects that I have worked on. It encompasses projects that involved statistical analysis with hypothesis testing, as well as those that involved data manipulation and extraction.
This project involved extracting data from multiple PDF files and converting them to CSV files for further use and analysis.
This project focused on analysing trends in COVID-19 data sourced from Worldometer. An initial analysis covered worldwide data, comparing the trends in cases and deaths across the five countries with the most cases. The report then focused on transmission within India and aimed to answer two questions:
- What are the factors that led to Maharashtra developing such a high number of cases?
- Which state has the highest death rate, and why?
All analyses were performed using the pandas and matplotlib modules in Python 3.
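As a rough illustration of the two calculations described above (ranking regions by case count and comparing death rates), here is a minimal sketch. It is written in plain Python so that it runs without extra packages; the project itself used pandas. The records are made-up placeholders, the function names are hypothetical, and "death rate" is assumed here to mean deaths as a percentage of confirmed cases, which the report may define differently.

```python
# Hypothetical records: (region, total_cases, total_deaths).
# These figures are invented for illustration, not real data.
records = [
    ("A", 1000, 20),
    ("B", 500, 25),
    ("C", 2000, 10),
    ("D", 0, 0),
]


def top_by_cases(rows, n=5):
    """Return the n rows with the most cases, largest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


def death_rate(cases, deaths):
    """Deaths as a percentage of cases (assumed definition); None if no cases."""
    if cases == 0:
        return None
    return 100.0 * deaths / cases


def highest_death_rate(rows):
    """Return (region, rate) for the region with the highest death rate."""
    rated = [(name, death_rate(cases, deaths)) for name, cases, deaths in rows]
    rated = [(name, rate) for name, rate in rated if rate is not None]
    return max(rated, key=lambda item: item[1])


top_two = top_by_cases(records, n=2)
worst = highest_death_rate(records)
```

Regions with zero cases are skipped rather than given a rate of zero, so they cannot distort the comparison.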
In this report, we examine data from an online survey of students enrolled in the Data Analytics unit (DATA2x02) in Semester 2, 2021 at The University of Sydney. The survey was conducted via the online discussion forum Ed and was open to all students in the unit, both mainstream (DATA2002) and advanced (DATA2902). The survey consisted of 23 questions in total.
The report aims to answer four questions:
- Does the number of COVID tests a student has taken in the past two months follow a Poisson distribution?
- Do students in the advanced unit (DATA2902) experience higher levels of stress than students in the mainstream unit (DATA2002)?
- How do living arrangements affect the levels of loneliness that students experience?
- Do students who find the unit easier have a higher self-rated math/coding ability?
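The first question above is typically approached with a chi-square goodness-of-fit test. The sketch below shows one way to compute the test statistic in Python using only the standard library. It is not the code from the report: the function names, the binning choice (a final "max_category or more" bin), and the sample data are all assumptions made for illustration.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """Probability of observing exactly k events under Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_gof_statistic(counts, max_category):
    """Chi-square goodness-of-fit statistic for a Poisson model.

    Categories are 0, 1, ..., max_category - 1, plus a final bin for
    'max_category or more'. The rate is estimated by the sample mean.
    """
    n = len(counts)
    lam = sum(counts) / n
    if lam <= 0:
        raise ValueError("sample mean must be positive")

    # Observed frequencies, with large values pooled into the last bin.
    observed = Counter(min(c, max_category) for c in counts)

    # Expected frequencies; the last bin takes the remaining probability.
    probs = [poisson_pmf(k, lam) for k in range(max_category)]
    probs.append(1.0 - sum(probs))
    expected = [n * p for p in probs]

    stat = sum((observed.get(k, 0) - e) ** 2 / e for k, e in enumerate(expected))
    # Degrees of freedom: bins minus 1, minus 1 for the estimated rate.
    df = len(expected) - 2
    return stat, df, expected


# Made-up responses: number of tests reported by eight hypothetical students.
sample = [0, 0, 1, 1, 1, 2, 2, 3]
stat, df, expected = poisson_gof_statistic(sample, max_category=3)
```

A common rule of thumb is to pool bins until every expected count is at least 5, which this toy sample does not satisfy. The p-value would come from the upper tail of a chi-square distribution with `df` degrees of freedom, for example via `scipy.stats.chi2.sf(stat, df)` if SciPy is available.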
The aim of this project was to use spatial data to calculate a "fire risk score" based on the type of vegetation, the risk category assigned by the NSW Rural Fire Service, and other parameters such as the number of emergency services and health services. We then investigated the relationship between this fire risk score and neighborhood affluence.
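To make the idea concrete, here is a minimal Python sketch of combining such inputs into a score and then measuring its linear association with an affluence measure. The scoring formula, the weight on services, the field names, and all the numbers are hypothetical placeholders; the actual score used in the project is not reproduced here.

```python
import math


def fire_risk_score(vegetation_risk, rfs_category, services):
    """Hypothetical score: hazard terms raise it, nearby services lower it."""
    return vegetation_risk + rfs_category - 0.5 * services


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up neighborhoods; median_income stands in for an affluence measure.
neighborhoods = [
    {"vegetation_risk": 3, "rfs_category": 2, "services": 2, "median_income": 40},
    {"vegetation_risk": 1, "rfs_category": 1, "services": 4, "median_income": 75},
    {"vegetation_risk": 2, "rfs_category": 3, "services": 1, "median_income": 55},
]

scores = [
    fire_risk_score(n["vegetation_risk"], n["rfs_category"], n["services"])
    for n in neighborhoods
]
incomes = [n["median_income"] for n in neighborhoods]
r = pearson(scores, incomes)
```

A correlation coefficient only captures linear association, so a scatter plot of score against the affluence measure is a useful companion check.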