data.html
: pure raw data from the data source, retrieved with the browser's "Inspect Element" tool and saved as HTML.
raw_data.csv
: table extracted from data.html. Columns are fixed (see schema).
clean_data.csv
: cleaned with these steps, in order:
- Lowercase all features with a string data type
- Normalize salary to monthly IDR
- Fix company names
- Remove duplicates
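The four cleaning steps can be sketched in plain Python as shown below. This is a minimal illustration, not the notebook's actual code. Three pieces are hypothetical placeholders: the exchange rates in `RATES_TO_IDR`, the alias table in `COMPANY_ALIASES`, and the assumption that the `period` column reads "monthly" or "annually" after lowercasing. Rows are plain dicts keyed by the schema column names.

```python
# HYPOTHETICAL conversion rates to IDR, for illustration only. The real
# cleaning used whatever rates the original notebook chose.
RATES_TO_IDR = {"idr": 1, "usd": 15_000}

# HYPOTHETICAL alias table for the "fix company names" step.
COMPANY_ALIASES = {"tiket": "tiket.com", "go-jek": "gojek"}

# String-typed columns according to the schema.
STRING_FIELDS = {"role", "company", "city", "country", "gender",
                 "currency", "mode", "period", "compensation", "verified"}


def clean(rows):
    """Apply the four cleaning steps, in order, to a list of dict rows."""
    # 1. Lowercase every string-typed feature (new dicts, input untouched).
    rows = [{k: (v.lower() if k in STRING_FIELDS and isinstance(v, str) else v)
             for k, v in r.items()} for r in rows]

    # 2. Normalize salary to monthly IDR. An unknown currency raises KeyError.
    for r in rows:
        amount = int(r["salary"]) * RATES_TO_IDR[r["currency"]]
        if r["period"] == "annually":  # assumed label for annual pay
            amount = round(amount / 12)
        r["salary"], r["currency"], r["period"] = amount, "idr", "monthly"

    # 3. Fix company names through the (hypothetical) alias table.
    for r in rows:
        r["company"] = COMPANY_ALIASES.get(r["company"], r["company"])

    # 4. Remove exact duplicate rows, keeping the first occurrence.
    seen, result = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            result.append(r)
    return result
```

Because deduplication runs last, two submissions that differ only in letter case collapse into a single row once steps 1 to 3 have been applied.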
Note that no outliers were removed in clean_data.csv. Please see the notebook for more details on data cleaning.
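The description does not say how raw_data.csv was produced from data.html. The sketch below uses only the standard library's `html.parser` and assumes that the salary entries are rendered as an HTML `<table>` with explicit closing `</tr>`, `<td>`, and `<th>` tags. The real page structure is unknown, so treat `TableExtractor` and `html_table_to_csv` as hypothetical helpers.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <tr> row, cell by cell (<th> and <td>)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <span> or <b> also lands here.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace left over from the page layout.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html_text):
    """Return the table rows found in html_text as CSV text."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    buf = io.StringIO()
    csv.writer(buf).writerows(parser.rows)
    return buf.getvalue()
```

To produce a file, read data.html as UTF-8 and write the returned string to raw_data.csv opened with `newline=""`, so that the `csv` module's `\r\n` line endings are written through unchanged.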
role
: str. Role/title of the job.
company
: str. Company name. Not all companies are disclosed; only Gojek, Shopee, Tiket.com, Traveloka, Tokopedia, and Bukalapak are disclosed by default. An undisclosed company is set as "Purchase to unlock 👆".
years_of_exp
: int. Years of experience, ranging from 0 to 20 or more.
city
: str. City where the salary owner lives.
country
: str. Country code such as "ID", "SG", "MY", "DE".
gender
: str. Male, female, or prefer not to tell.
currency
: str. Salary currency code such as "IDR" or "USD".
salary
: int. Salary amount in the given currency and period.
mode
: str. Gross or net salary.
period
: str. Monthly or annually paid salary.
compensation
: str. Non-cash compensation.
verified
: str. Whether the salary owner also attached a salary slip when submitting the data.
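The schema above maps onto a typed record. The sketch below is a minimal loader that assumes the CSV header uses exactly these column names and that `years_of_exp` and `salary` hold plain integers. It maps the "Purchase to unlock 👆" placeholder to `None`, comparing case-insensitively because clean_data.csv is lowercased. `SalaryRecord`, `parse_row`, and `read_records` are hypothetical names chosen for illustration.

```python
import csv
import io
from dataclasses import dataclass, fields
from typing import List, Optional

# Placeholder the source uses for undisclosed companies.
UNDISCLOSED = "Purchase to unlock 👆"
INT_FIELDS = {"years_of_exp", "salary"}


@dataclass
class SalaryRecord:
    role: str
    company: Optional[str]  # None when the company is undisclosed
    years_of_exp: int
    city: str
    country: str
    gender: str
    currency: str
    salary: int
    mode: str
    period: str
    compensation: str
    verified: str


def parse_row(row: dict) -> SalaryRecord:
    """Convert one CSV row (column name -> text) into a SalaryRecord."""
    values = {}
    for f in fields(SalaryRecord):
        raw = (row[f.name] or "").strip()
        if f.name in INT_FIELDS:
            values[f.name] = int(raw)  # assumes a plain integer string
        elif f.name == "company" and raw.lower() == UNDISCLOSED.lower():
            values[f.name] = None
        else:
            values[f.name] = raw
    return SalaryRecord(**values)


def read_records(csv_text: str) -> List[SalaryRecord]:
    """Parse CSV text whose header row matches the schema column names."""
    return [parse_row(r) for r in csv.DictReader(io.StringIO(csv_text))]
```

Keeping `company` as `Optional[str]` lets later analysis filter undisclosed rows with `r.company is None` instead of matching the placeholder string.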