Analysis and Auto Label #29

Open
andreaschandra opened this issue Mar 20, 2021 · 4 comments
Assignees: andreaschandra
Labels: pipeline: analysis (something need to be prove hypothesis), pipeline: exploratory (explore the data), priority: now (approaching deadline), type: discussion, work: obvious

Comments

@andreaschandra
Member

andreaschandra commented Mar 20, 2021

[Image: boxplot_agg]

Checking for outliers in both the hashtag count and the mention count. From this, we can automatically label an account as a buzzer when its counts reach a threshold, such as the boxplot maximum or anything above the average.

This distribution will be combined with the initial 200 accounts and the 7000 accounts.
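The thresholding idea above could be sketched as follows. This is only an illustration, not the project's actual pipeline: it assumes the "threshold max" means the usual boxplot upper whisker limit (Q3 + 1.5 × IQR), that an account is flagged when either count exceeds its limit, and the field names `user`, `hashtag_count`, and `mention_count` are hypothetical.

```python
from statistics import quantiles


def upper_fence(values, k=1.5):
    """Return Q3 + k * IQR, the upper limit a boxplot typically draws."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def auto_label(accounts, k=1.5):
    """Label each account 'buzzer' if either count is above its upper fence.

    Assumption: flagging on hashtag OR mention outliers; the issue does not
    say how the two counts should be combined.
    """
    hashtag_limit = upper_fence([a["hashtag_count"] for a in accounts], k)
    mention_limit = upper_fence([a["mention_count"] for a in accounts], k)
    labels = {}
    for a in accounts:
        is_outlier = (
            a["hashtag_count"] > hashtag_limit
            or a["mention_count"] > mention_limit
        )
        labels[a["user"]] = "buzzer" if is_outlier else "non-buzzer"
    return labels


# Made-up sample data, for illustration only.
accounts = [
    {"user": "u1", "hashtag_count": 10, "mention_count": 5},
    {"user": "u2", "hashtag_count": 12, "mention_count": 6},
    {"user": "u3", "hashtag_count": 11, "mention_count": 4},
    {"user": "u4", "hashtag_count": 13, "mention_count": 5},
    {"user": "u5", "hashtag_count": 12, "mention_count": 6},
    {"user": "u6", "hashtag_count": 200, "mention_count": 5},
]
labels = auto_label(accounts)
print(labels)
```

Using the average as the cutoff instead would only require swapping `upper_fence` for a mean; which cutoff works better is exactly what the manual check is meant to show.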

@andreaschandra andreaschandra self-assigned this Mar 20, 2021
@andreaschandra andreaschandra added the labels pipeline: analysis, pipeline: exploratory, priority: now, type: discussion, work: obvious on Mar 20, 2021
@andreaschandra
Member Author

Also check whether applying this threshold to the 7200 accounts is effective and accurate enough.
After applying the threshold, manually check whether any labels differ.
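One way to support the manual check is to compare the automatic labels against a hand-labeled subset and list the accounts where they disagree. The sketch below assumes both label sets are dictionaries keyed by a user identifier; the function name `compare_labels` and the sample data are hypothetical.

```python
def compare_labels(auto, manual):
    """Return (agreement rate, users whose labels differ).

    Only users present in both label sets are compared.
    """
    shared = sorted(set(auto) & set(manual))
    mismatches = [u for u in shared if auto[u] != manual[u]]
    agreement = 1 - len(mismatches) / len(shared) if shared else 0.0
    return agreement, mismatches


# Made-up labels, for illustration only.
auto = {"u1": "buzzer", "u2": "non-buzzer", "u3": "buzzer", "u4": "non-buzzer"}
manual = {"u1": "buzzer", "u2": "buzzer", "u3": "buzzer", "u4": "non-buzzer"}

agreement, mismatches = compare_labels(auto, manual)
print(agreement, mismatches)
```

The mismatch list gives the reviewer a short set of accounts to inspect by hand rather than all 7200.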

@andreaschandra
Member Author

[Image: boxplot_agg (1)]

Combination of 46K users + 7.2K users

@andreaschandra
Member Author

In total, we have 41,756 users.

@andreaschandra
Member Author

andreaschandra commented Mar 20, 2021

After removing accounts that have fewer than 100 tweets, we end up with 39,343 users.
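The filtering step can be sketched as a simple minimum-activity cutoff. The field name `tweet_count` and the sample records are assumptions for illustration; only the 100-tweet cutoff comes from the comment above.

```python
MIN_TWEETS = 100  # cutoff stated in the issue


def filter_active(users, min_tweets=MIN_TWEETS):
    """Keep only users with at least `min_tweets` tweets."""
    return [u for u in users if u["tweet_count"] >= min_tweets]


# Made-up records, for illustration only.
users = [
    {"user": "u1", "tweet_count": 99},
    {"user": "u2", "tweet_count": 100},
    {"user": "u3", "tweet_count": 2500},
]
kept = filter_active(users)
print([u["user"] for u in kept])
```

Accounts with exactly 100 tweets are kept here, since the comment removes only those with fewer than 100.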
