Combating Fake News with AI

It’s tempting to assume that political half-truths and celebrity hoaxes are recent phenomena. In reality, fake news – misinformation designed to deceive its audience – has been a problem since ancient times.

Fake news has long been used by those in power to suit their agendas: monarchs, for instance, would fabricate stories to defame rivals who threatened their hold on power, lowering the public’s opinion of them. With the invention of the printing press and other mass printing technologies, news agencies gained the power to spread fake news too, and to far more people. Still, relatively few had the means to distribute news – real or fake – to a large audience.

However, all of this changed with the birth of the internet.

On the web, anyone can upload anything, and there is no governing body to police the information shared. The ability to perpetrate fake news trickled down to the average Joe. The internet’s easy accessibility and high visibility made it the prime venue for those with malicious intent to disseminate misinformation. These two qualities made fake news a more difficult problem than it had ever been before.

Even now, new technologies are being developed that, in the wrong hands, could become still more powerful weapons for creating and transmitting fake news. Deepfakes, for example, let perpetrators make their victim appear to say anything on video.

It’s crucial that we combat this issue because the extent to which fake news can impact us – as individuals and as a society – is limitless.

We humans, by nature, are thinking machines that make our decisions based on the information we have. If your eardrums receive the sound of a child crying for help, you make the decision to find and help them. If your sense of touch tells you that the pot handle you’re grabbing is burning hot, you make the decision to let go of it. And if your eyes tell you that there is a bear chasing after you, you make the sensible decision to run! The same concept applies to news – and consequently, fake news. If one day you hear on the news that a parliamentary candidate has committed tax fraud, you might decide not to vote for them. The trouble arises, of course, if this news was made up; you can see how such a story, if fake (and convincing enough), could damage both the candidate and the whole country.

Unlike the information we get directly from our physical senses, external information can deceive us greatly. So, as long as the perpetrator of false information is skilled enough, people can be led to do whatever suits the perpetrator’s agenda. Countless people fall prey to fake news and make wrongly informed decisions every day. Many are even trapped in a bubble of fake news and see the world through glasses tinted one way or another.

What makes fake news such a powerful tool for evil is that it’s often popular!

Fake news can go viral, quickly. It doesn’t face the same limitations as factual news; a catchy fictional headline can be crafted specifically to give it the best chance of getting the most clicks and views. For news agencies, there is a monetary incentive attached to greater publicity – albeit at the potential cost of distrust and a damaged reputation. But there will always be those who don’t face the same consequences for circulating false information: anonymous users, nameless organizations, and nomadic corporations, to name a few.

On the internet, since there is no one to verify that a given piece of uploaded information is legitimate, the responsibility falls to the individual platforms and websites (e.g. Twitter, Facebook, Reddit) to manage it themselves. Often, they don’t – and can’t – do a very good job of it, given the sensitivity of the topic and the sheer volume of information that must be filtered by hand. Platforms like Twitter have made an effort to flag and remove high-profile fake information; but since this is a manual process, it cannot come close to covering each of the roughly 6,000 tweets made every second. As a result, much of social media (along with the rest of the internet) has become a sullied breeding ground for fake news. In short, manual human intervention alone cannot solve the fake news epidemic.

Solution

If we humans can’t do it, then who can? Sometimes, it isn’t a “who”, but rather, a “what”.

Let me explain.

In the same way that technology has enabled more effective and efficient means of producing and spreading fake news, we can also use it to combat fake news in ways we couldn’t before.

Because of our existing biases, preconceptions, and agendas, it can be impossible for us to reach a consensus on whether a given piece of information qualifies as fake news. So, instead of arguing amongst ourselves about who is right or wrong, why not utilise machines – which have no interests or agendas of their own, and simply make their decisions based on algorithms – and have them decide for us?

If we could do this, there would be myriad benefits. Two major advantages that machines bring are speed and objectivity.

An efficient machine could determine the truthfulness of information much more rapidly than humans ever could. It learns from new information and never forgets. If such machine learning models were implemented effectively to support platforms and websites, they might be able to flag or eliminate virtually all fake news on the internet.

“If only we could create such a machine.” Although we might not be able to build a flawless one, we can come close with the technology of today. When trained on a ground-truth set of data, machine-learning models can predict a target output from the input variables – in our case, predicting whether something is fake news or not, given its text.

To run this experiment, I used Braintoy mlOS, a platform that lets me model at scale.

First, I sourced a news dataset from Kaggle containing ~45,000 entries, each of which comprises a news article’s title text, body text, publication date, news subject, and whether it was real or fake news (a ~50-50 split in this dataset).

Figure: Raw data sourced from Kaggle
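For anyone who wants to follow along outside mlOS, here is a minimal sketch of how such a dataset could be assembled in Python with pandas. The file names and the 1 = fake labelling are my assumptions, based on the popular Kaggle fake-and-real-news dataset; adjust them to match your copy.

```python
import pandas as pd

# Assumed file names from the Kaggle fake-and-real-news dataset;
# each file has title, text, subject, and date columns.
fake = pd.read_csv("Fake.csv")
real = pd.read_csv("True.csv")

# Label the classes: 1 = fake, 0 = real.
fake["label"] = 1
real["label"] = 0

# Combine and shuffle so the two classes are interleaved.
df = pd.concat([fake, real], ignore_index=True)
df = df.sample(frac=1, random_state=42).reset_index(drop=True)

print(df.shape)            # roughly (45000, 5) for this dataset
print(df["label"].mean())  # close to 0.5 -- the ~50-50 split
```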

My goal was to use this data to “teach” a model to detect fake news using the text.

Models never perform their best unless the data is appropriate. So I started by uploading the raw data as a .csv file into the Data Engine.

Much of my time was spent wrangling and preprocessing the data. I made datasets and built models, using the ML Engine to create classification models with a variety of algorithms. I iterated on my experiments, improving model performance by wrangling and preprocessing the data better each time.
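I did this work inside mlOS, but as an illustration, a bare-bones text-preprocessing pipeline might look like the sketch below, continuing from the loading snippet above. The specific cleaning steps here are my assumptions rather than an exact record of what I did.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

def clean_text(text: str) -> str:
    """Lowercase, strip URLs and non-letters -- a typical minimal cleanup."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters only
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace

# Merge title and body into one text field, then clean it.
df["content"] = (df["title"] + " " + df["text"]).map(clean_text)

# Hold out 20% of the articles for testing.
X_train, X_test, y_train, y_test = train_test_split(
    df["content"], df["label"], test_size=0.2, random_state=42
)

# Turn the raw text into TF-IDF features that models can learn from.
vectorizer = TfidfVectorizer(max_features=50_000, stop_words="english")
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)
```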

Figure: Above, an early version of the models. Below, the final models.

I tested several algorithms on my datasets and found that the best one was a Multilayer Perceptron, a type of neural network. It gave an accuracy of 94.88%.
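In open-source terms, a comparable multilayer perceptron trained on the TF-IDF features from the sketch above might look like this; the layer sizes are illustrative guesses, not the exact configuration I used in mlOS.

```python
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier

# A small feed-forward neural network; the hidden-layer sizes are illustrative.
mlp = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=20, random_state=42)
mlp.fit(X_train_vec, y_train)

print(f"MLP accuracy: {accuracy_score(y_test, mlp.predict(X_test_vec)):.4f}")
```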

It was a good first performance, but there was room for improvement.

After some more wrestling with the datasets and countless more headaches, I was able to create a Random Forest model with 99.82% accuracy!
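Again, a rough scikit-learn equivalent of that step (with an illustrative number of trees, not my exact mlOS settings) would be:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# An ensemble of decision trees; 200 trees is an illustrative choice.
forest = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=42)
forest.fit(X_train_vec, y_train)

print(f"Random Forest accuracy: {accuracy_score(y_test, forest.predict(X_test_vec)):.4f}")
```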

Here are some more performance metrics for the data scientists reading this.

Figure: Precision-recall graph

Figure: ROC curve

Figure: Confusion matrix
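These plots came from mlOS, but for reference, the same metrics can be computed with scikit-learn along these lines, continuing with the Random Forest from the sketch above:

```python
from sklearn.metrics import (
    confusion_matrix,
    precision_recall_curve,
    roc_auc_score,
    roc_curve,
)

# Predicted probability that each test article is fake.
scores = forest.predict_proba(X_test_vec)[:, 1]

# Points for the precision-recall and ROC curves.
precision, recall, _ = precision_recall_curve(y_test, scores)
fpr, tpr, _ = roc_curve(y_test, scores)

print(f"ROC AUC: {roc_auc_score(y_test, scores):.4f}")
print(confusion_matrix(y_test, forest.predict(X_test_vec)))
```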

Now imagine if such models were built into social media platforms and search engines. They could warn users that what they’re about to read likely contains fake news, or potentially even purge false information altogether! You can see how much of an impact a model like this could have.
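As a toy illustration of what that warning could look like, continuing with the vectorizer and Random Forest from the sketches above (the threshold is an arbitrary choice, not a tuned value):

```python
def flag_if_suspect(article_text: str, threshold: float = 0.8) -> str:
    """Warn when the model's fake-news probability exceeds a chosen threshold."""
    features = vectorizer.transform([clean_text(article_text)])
    prob_fake = forest.predict_proba(features)[0, 1]
    if prob_fake >= threshold:
        return f"Warning: likely fake news (score {prob_fake:.2f})"
    return "No warning"

print(flag_if_suspect("Candidate X caught in shocking tax fraud scandal!"))
```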

Now, I must point out that my models aren’t perfect. The dataset they were trained on isn’t representative of all fake news. They don’t work for any medium besides text. And even if other models were built to avoid these shortcomings, perpetrators can and will devise ways to keep their misinformation from being flagged as fake news by exploiting a model’s inner mechanisms, so models would have to be continually tuned. In many ways, this is much like the unceasing battle between black-hat hackers and cybersecurity experts.

Although the deficiencies of my models are undeniable, so is their potential. They should be taken as a proof of concept rather than a final product. I’m only a high school student with limited knowledge and experience in this matter. Data science and machine learning experts are already exploring this topic, with promising results (e.g. detecting deepfake images). Imagine what they could create in a few years! Perhaps a more robust model: one that could remove the need to fact-check online information, alleviate partisanship, and boost societal productivity.

With the recent and accelerating developments in machine learning, maybe the age of the information utopia isn’t too far off.

Author

Andrew Han

Andrew Han is a Grade 11 high school student at North Toronto CI. Well-versed in computer science and math, he is cognizant of the boundless potential of AI and Machine Learning.
