Spam submissions filter? #2

Open
joshuaeckroth opened this issue May 24, 2011 · 1 comment

Comments

@joshuaeckroth
Member

Suppose somebody submits a news article via the website, and Bruce is emailed but opts not to upload the submission on the wiki. Does the AINews software respect this decision or does the software still process the submission regardless?

@joshuaeckroth
Member Author

From Bruce:

The articles that are submitted via the Submit Content button seem to fall into four categories: (a) true spam with no redeeming information content; (b) self-serving pointers to irrelevant articles or blogs; (c) occasional articles about AI; (d) stories I have found.

The first two are the main reasons we want to review submissions before putting them on the site.
About half of the small number of case (c) submissions are missing information or have not been published in legitimate publications. When I ask for the missing information, I either get no response or (a few times a year) receive what I need to add the article to the site.
With case (d), about half of the articles I submit are recent news stories; the rest are good expositions of an issue or a concept that I happen to find more than a week after their publication.

The case you hypothesize has never occurred, as you correctly suppose, probably because we only crawl legitimate sources and the scoring function is reasonably accurate.

So you may see the spam article in the AINews results. I doubt that this contingency has ever occurred, however.

We do have a mechanism for catching things like this, however. If one of us used the News Viewer every weekend to peruse the contents of the database of stories accumulated throughout the week prior to Monday a.m. publication, we would have a chance to mark the bad apples as irrelevant. This would keep them from being considered for publication (I believe) and would add a negative example to the training set for retraining the SVM.
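
To make the review step above concrete, here is a minimal sketch of how a reviewer's "irrelevant" mark could both keep a story out of the publication pool and produce a negative training example. This is an illustration only, not the actual AINews code: the `Story` fields, the function names, and the 0.5 threshold are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Story:
    """A crawled or submitted story (hypothetical schema, not AINews's own)."""
    url: str
    text: str
    score: float                              # relevance score from the classifier
    reviewed_relevant: Optional[bool] = None  # None means nobody has reviewed it


def mark_irrelevant(story: Story) -> None:
    """Record a reviewer's decision that the story should not be published."""
    story.reviewed_relevant = False


def publication_candidates(stories: List[Story], threshold: float = 0.5) -> List[Story]:
    """Keep stories scoring at or above the (assumed) threshold that no reviewer rejected."""
    return [s for s in stories
            if s.score >= threshold and s.reviewed_relevant is not False]


def training_examples(stories: List[Story]) -> List[Tuple[str, int]]:
    """Turn reviewed stories into (text, label) pairs: 1 = relevant, 0 = irrelevant."""
    return [(s.text, 1 if s.reviewed_relevant else 0)
            for s in stories if s.reviewed_relevant is not None]


stories = [
    Story("http://example.com/ai", "New results in machine learning", 0.9),
    Story("http://example.com/spam", "Buy cheap watches", 0.7),
]
mark_irrelevant(stories[1])  # reviewer flags the spam story during the weekend pass
print([s.url for s in publication_candidates(stories)])
print(training_examples(stories))
```

In this sketch the spam story scores high enough to be published, but the reviewer's mark removes it from the candidates, and the same mark yields a labeled negative example that could be fed into the next SVM retraining run.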
