This is the backend (frontend here) for News X-Ray, a Chrome extension identifying agency reports that were used for news articles, as well as the exact sentences used. That way, you know which information is well-founded and what might be controversial. Through analysis we can show how biases differ between news sources. News X-Ray was created by Daniel Thevessen, Carl Goedecken and Nils Strelow at HackZurich 2017 within just under two days.
We all read news articles in some shape or form almost everyday. But in times of fake news and clickbait, how can you know who to trust? No matter what source you're reading, news should be transparent. That's why we want to show what information is behind a news article, enabling us to show what the news source might have added as well. That way, you can better identify the biases of individual news sources, and check if this is a trend for this source.
We have developed a browser extension that looks at the article you are currently reading, and identifies a related report from a news agency such as Reuters where the basic information and facts come from. A backend finds the parts of the article that were derived from this report - even if the author has rewritten them completely - through Natural Language Processing. The extension highlights the similarities and saves sentiment trends to show how news sources differ.
We identify related Reuter reports by finding keywords with IBM Bluemix and searching recent news history through the Reuters API. Once it has found one, the extension communicates with a Python Flask backend, which runs semantic sentence analysis on both articles to find similar meaning. We do so by generating sentence vectors with Facebook's fasttext and a model pre-trained on the English language. The extension takes this information to highlight things, and run sentiment analysis on the parts written by the news outlet itself.
Semantic sentence similarity is far from easy and still an open topic of research! There is no clear way of extracting meaning from whole sentences, so we had to try out many things. We even got to the limit of what free hosting vouchers could offer. With a model larger than 10 GB, we were lucky to have a laptop with 16 GB of RAM.
We're proud to have found a solution that not only works (no mockups!), but can also provide great insight on real articles. And, despite getting little sleep, we have all made it through HackZurich in one piece!
There is still so much AI can do for us, that we haven't even begun to think about!
Decomposing articles using data is great, but the data generated from that itself can be used as well. It would be great to collect more data and explore what we can do with that.