Monday, February 9, 2015

Selective Correlation

Sociologist Gabriel Rossman of UCLA has an article, "When Correlation Is Not Causation, But Something Much More Screwy," in The Atlantic. It shows graphically how calculating correlations on selected samples can produce misleading results. He imagines two characteristics of aspiring actors, ability (mind) and attractiveness (body), plotted in a scatterplot. Each variable is drawn at random from an independent standard normal distribution to represent a population of actors. Most actors fall near zero on both mind and body, with fewer well above or well below. Because mind and body are drawn independently, the plot shows no correlation, that is, no tendency for the points to tilt with a positive or negative slope.

But then he imagines computing the correlation between mind and body from a sample of working actors only. To get work, an actor must be selected by casting directors, who choose only those with a high value for the sum of mind and body. He marks these working, observed actors with small triangles in the plot; the remaining aspiring actors, non-working and unobserved, are marked with small circles. The triangles show a definite negative correlation, suggesting, wrongly, that the more able actors are not much to look at or that the most attractive actors are dunces at acting. Although this might fit some stereotypes, it is entirely an artifact of the selection rule that produced the observed sample: among actors whose sum clears the cutoff, one with a low value on one trait must have a high value on the other. He continues with an SAT example as well.
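The selection effect described above is easy to reproduce. What follows is a minimal sketch in Python using only the standard library, not Rossman's own code. The population size, the seed, and the cutoff of 2.0 on the sum of mind and body are assumptions chosen for illustration, and `pearson` is a small helper written here.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(2015)   # fixed seed so the run is repeatable (assumption)
n_actors = 10_000           # population size (assumption)
cutoff = 2.0                # casting threshold on mind + body (assumption)

# Mind and body are independent standard normal draws.
mind = [rng.gauss(0, 1) for _ in range(n_actors)]
body = [rng.gauss(0, 1) for _ in range(n_actors)]

# Casting directors hire only actors whose sum clears the cutoff.
working = [(m, b) for m, b in zip(mind, body) if m + b > cutoff]
working_mind, working_body = zip(*working)

r_all = pearson(mind, body)
r_working = pearson(working_mind, working_body)

print(f"all aspiring actors: n = {n_actors}, r = {r_all:.3f}")
print(f"working actors only: n = {len(working)}, r = {r_working:.3f}")
```

The correlation over all aspiring actors should come out close to zero, while the correlation among the working actors alone should be clearly negative, even though nothing in the simulation links mind to body.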

His scatterplot can be improved. Because the two measurements have equal variability, they should be drawn in a square plot with the same scale on both axes. The plotting characters for the unobserved and observed actors could also be more distinct. See an example below.
