AI and the Fight against Fake News & Fake Stats
One of the biggest challenges facing society today is the proliferation of fake news and fake statistics. It is relatively easy today to produce statistics and charts that appear to bolster dubious claims, but it is much more difficult to counter them, and especially difficult to do so in a timely manner, before those claims spread like wildfire on social media.
Our recent history has been littered with fake statistics of all kinds. Let’s just briefly examine a couple of them.
In September 2015, the following chart was shown in the US Congress to support the claim that Planned Parenthood was shifting its focus from cancer screenings to abortions. As drawn, the chart appears to suggest that 327,000 abortions somehow outweigh 935,573 cancer screenings.
Yet a closer examination reveals that the chart has no defined y-axis. Plotted against a well-defined y-axis, the data show that the number of abortions has held relatively steady. The number of cancer screenings has declined (for a variety of reasons unrelated to the claim), yet it still far outstrips the number of abortions. Politifact used publicly available data from Planned Parenthood to present a much more accurate depiction of both figures; see the Politifact and Datapine articles for a more detailed analysis of this story.
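Reproducing the corrected view takes only a few lines of plotting code once the raw numbers are in hand. Below is a minimal sketch in Python with matplotlib; the 2013 figures are the two cited above, while the 2006 values are rough placeholders standing in for the annual-report numbers Politifact drew on.

```python
# Re-plot both series on a single, well-defined y-axis that starts at zero.
# The 2013 figures (327,000 abortions; 935,573 cancer screenings) are the ones
# cited above; the 2006 values are rough placeholders for the annual-report
# numbers Politifact used.
import matplotlib.pyplot as plt

years = [2006, 2013]
abortions = [290_000, 327_000]            # placeholder, then cited 2013 value
cancer_screenings = [2_000_000, 935_573]  # placeholder, then cited 2013 value

plt.plot(years, cancer_screenings, marker="o", label="Cancer screenings & prevention")
plt.plot(years, abortions, marker="o", label="Abortions")
plt.ylim(0, 2_200_000)  # an explicit y-axis, starting at zero
plt.xlabel("Year")
plt.ylabel("Services per year")
plt.title("Planned Parenthood services, plotted with a defined y-axis")
plt.legend()
plt.show()
```

With the axis made explicit, the visual gap the original chart implied between the two series disappears, which is exactly the point the fact-checkers made.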
Another example from a few years back concerned crime stats. One of the charts that was making its way around social media was this:
However, this was debunked by FactCheck.org, which used the FBI’s publicly available crime statistics reports to show that most of the numbers were grossly inaccurate, especially those for whites killed by blacks and whites killed by whites. The actual statistics they reported are:
- Blacks killed by whites: 7.6 percent.
- Whites killed by whites: 82.4 percent.
- Whites killed by blacks: 14.8 percent.
- Blacks killed by blacks: 90 percent.
In both cases, the source data behind these fake statistics and fake news was publicly available and verifiable. However, it fell to fact-checkers to present the real numbers. That can take days or weeks, and by then the fake statistics may already have spread widely on social media.
The general public, though, has little means to verify such claims independently. Even when the raw data is available, most people lack the data modeling or programming skills needed to organize and analyze it and draw their own conclusions.
In my view, one way to address this problem is with the help of AI. More specifically, I would like the general public to be able to analyze datasets without needing technical expertise, and one very promising approach to this is “Conversational Analytics.”
Conversational Analytics allows users to ask questions of datasets in plain natural language; that is, they can converse with a bot to better understand one or more datasets. The bot should be able to understand the user’s questions, analyze the data, and present back relevant charts and analyses. This empowers users to form their own opinions about the data rather than relying on charts created by others who may have their own biases and agendas.
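To make the pattern concrete, here is a toy sketch of the flow: turn a plain-language question into a query over a dataset and answer it with a chart. Real products such as qbo use proper natural-language-understanding models; the keyword matching, column names, and numbers below are invented purely to illustrate the idea.

```python
# Toy conversational-analytics loop: match a question to dataset columns,
# then answer with a chart. All names and numbers here are illustrative only.
import pandas as pd
import matplotlib.pyplot as plt

# Entirely illustrative dataset of two services per year.
services = pd.DataFrame({
    "year": [2010, 2011, 2012, 2013],
    "cancer screenings": [1_800_000, 1_500_000, 1_200_000, 950_000],
    "abortions": [330_000, 334_000, 328_000, 320_000],
})

def answer(question: str) -> None:
    """Naive 'understanding': chart every column whose name appears in the question."""
    q = question.lower()
    matched = [c for c in services.columns if c != "year" and c in q]
    if not matched:
        print("I couldn't match your question to the data. Try mentioning:",
              ", ".join(c for c in services.columns if c != "year"))
        return
    services.plot(x="year", y=matched, marker="o")
    plt.ylim(bottom=0)  # keep the y-axis honest
    plt.ylabel("Number of services")
    plt.title(question)
    plt.show()

answer("How have cancer screenings and abortions changed over time?")
```

A production system would replace the keyword matching with a trained language model and add dialogue, drill-downs, and data provenance, but the core promise is the same: the user asks in plain language and gets an honest chart back.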
Check out qbo insights from Unscrambl for a better understanding of how this might work in practice. You can also learn more about how conversational analytics can help tackle the scourge of fake news and fake stats in my upcoming talk, “Chat with your Data: Accessing Insights through Conversational Analytics,” at ODSC East 2021.
Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels! Subscribe to our weekly newsletter here and receive the latest news every Thursday. You can also get data science training on-demand wherever you are with our Ai+ Training platform.