
Monday, May 21, 2018

Everybody lies with visualizations

Photo by Ashkan Forouzani on Unsplash



As I wrote here, before it became trendy to point out the problem of fake news, I explored how data visualizations can mislead people. I’ve noticed that in the last couple of years, data visualization has become a major focal point of that discussion. The old maxim "Seeing is believing" is the real driving force behind visualizations of data. While not all of us relate to spreadsheets, we tend to respond well to graphs, charts, and other visually appealing renderings of those numbers.
  

While any of us can fall for a misleading visual, we can immunize ourselves to some extent with vitamin C.

Back in 2016, in a Baseline article titled "Data Visualization: You Must 'C' It to Believe It," I identified three key C's: Context, Correlation, and Causation.
Context: This includes contextual information for the graphs, which sometimes reveals that the results visualized represent outliers rather than typical results. Getting the context also requires getting the baseline for the survey, including timelines, locations, and the size and type of the population used to get the numbers.
As data visualization tools include ways to slice and dice your data, it is not all that difficult to zero in on just the segment that yields the results you want. So you need to know the larger context, as well as any added-in points that are outside that particular context.
Correlation: This is the supposed strong point of visualizations: revealing correlations. But correlations are easily manipulated and misleading, as many things that change over time together are not necessarily causally connected, though visualizations can make them appear to be.
Causation: This is what real insight is all about: finding out what causes what. There is no substitute for thinking this through, no matter how seductive it may be to simply go with the correlations presented by the visualization.
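To make the Correlation point concrete, here is a minimal sketch using two made-up series (the names `series_a` and `series_b` and their formulas are hypothetical, chosen only for illustration). Neither series influences the other; each simply grows over time with its own unrelated wiggle. Their raw correlation comes out very high, but once the shared upward trend is stripped out by looking at period-to-period changes, the apparent relationship nearly vanishes.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def differences(series):
    """Period-to-period changes: a simple way to remove a shared upward trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical data: two series with no causal link. Both just grow over
# time, each with its own unrelated repeating wiggle.
periods = range(50)
series_a = [2 * t + (t % 3) for t in periods]
series_b = [0.5 * t + (2 * t) % 5 for t in periods]

raw_r = pearson(series_a, series_b)
detrended_r = pearson(differences(series_a), differences(series_b))

print(f"raw correlation:    {raw_r:.2f}")
print(f"after differencing: {detrended_r:.2f}")
```

Differencing is only one crude way to expose a time-driven correlation, and a low detrended correlation does not settle the causal question either way. It does show how a chart of two rising lines can suggest a connection that comes entirely from the passage of time.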
In revisiting an argument that offered data visualization as proof, I've come to add some more C tests:
  • Correspondence to reality: Just because someone claims expertise doesn't mean they are completely correct in their assertions. For example, when I was in labor with my first baby, the doctors and nurses at the hospital dismissed my pains, claiming the contractions were "mild" and that the birth was far from imminent. I was not the expert; they were, but I knew that I felt the baby coming. As it turned out, the resident barely got to me in time. I learned from that experience that you should not be gaslighted by expert views that directly contradict not merely what you think you know but what you actually know and directly experience.
  • Convenience: This pertains to both means and ends. Convenience of means refers to using the data that is on hand or easily measured, even if it's not the most relevant data. It's rather like measuring how much snow fell on your windowsill because it's easy to reach, rather than going out to measure the snow on the street and in the drifts to get a more accurate reading. Convenience of ends is about selecting data that you can easily fit into the conclusion you wish to draw, also known as cherry-picking.
  • Confirmation Bias: In general, when you look for data on something, you have to bear in mind that absolute objectivity is rare. Many of us have deep-seated values and beliefs that will not allow us to entertain the possibility that we are on the wrong track, which skews our results through what we allow into and exclude from the data set. It is the equivalent of painting a bull's-eye around wherever your arrow landed. So ask yourself: does the person have a personal agenda that could be coloring the outcome? If so, treat the findings with the same healthy skepticism you would apply to smoking studies sponsored by tobacco companies.
  • Certainty Camouflaging Contingencies: Few things are absolutes, so if someone states something without qualifiers, something is likely being hidden or glossed over. That could be the fact that the data is out of date, or that searches for racist terms and jokes are being taken as a proxy for a person being racist, with the label then shifted from what is actually measured to what the author says the measurement signifies. This leads to a triple F: Fudging Figures and Facts.

    All of these were inspired by an argument made in Seth Stephens-Davidowitz's book Everybody Lies. Read more about it in Sex, Lies, and Data Profiles.
