Monday, February 23, 2015

Gas Pump Wear


The wear pattern in this image shows a frequency distribution of customers’ finger placement as they take their receipt from a gasoline pump. They likely aim their grab at the center of the receipt. As they remove the receipt, the paper and their fingers rub paint off the dispenser. The wear pattern appears symmetric about the center; that is, the right side is a near mirror image of the left side. This suggests that users favored neither left nor right in removing their receipts. There is more wear near the center of the dispenser and less toward the edges, and it is balanced around the central target point. The bell-shaped curve appears to be a very good match for a normal curve.

Monday, February 16, 2015

Waldo, revisited

We've seen the scatterplot and marginal distributions of Waldo's location in a collection of Where's Waldo books. These were from Ben Blatt at Slate.com. Now Randal Olson has added a plot of the contours of a kernel density estimate of the joint frequency distribution of Waldo's location on the facing pages of the Waldo books. Darker color indicates higher density. Olson then dynamically computes the optimal search path. Try it out. Can you find Waldo more quickly?


Monday, February 9, 2015

Selective Correlation

Sociologist Gabriel Rossman of UCLA has an article, "When Correlation Is Not Causation, But Something Much More Screwy," in The Atlantic. It shows graphically how calculating correlations on selected samples can produce misleading results. He imagines two characteristics of aspiring actors, ability (mind) and attractiveness (body), plotted in a scatterplot. Random observations from independent standard normal distributions are drawn to represent these variables in a population of actors. Most actors fall near zero, with fewer well above or well below, on either mind or body. Since mind and body are drawn independently, the plot shows no correlation, that is, no tendency to tilt with a positive or negative slope.

But then he imagines computing the correlation between mind and body from a sample of working actors. To get work, his actors have been selected by casting directors only if they have a high value for the sum of mind and body. He has marked these working, observed actors with small triangles in the plot. The remaining aspiring actors, non-working and unobserved, are marked with small circles. The plotted pattern of triangles for the working, observed actors has a definite negative correlation, suggesting, wrongly, that either the more able actors are not much to look at or the most attractive actors are dunces in their acting ability. Although this might fit some stereotypical caricatures, it is entirely an artifact of the selection rule that produced the observed sample. He continues with an SAT example as well.
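The selection effect described above can be reproduced with a short simulation. This is only a minimal sketch, not Rossman's own code: the population size, the cutoff on the sum, the seed, and the names `pearson` and `simulate` are assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=10_000, cutoff=2.0, seed=2015):
    """Draw independent mind and body scores, then keep only 'working' actors.

    n, cutoff, and seed are illustrative assumptions, not values from the article.
    Returns (correlation among all actors, correlation among working actors,
    number of working actors).
    """
    rng = random.Random(seed)
    mind = [rng.gauss(0, 1) for _ in range(n)]
    body = [rng.gauss(0, 1) for _ in range(n)]

    # Casting directors hire only actors whose mind + body exceeds the cutoff.
    working = [(m, b) for m, b in zip(mind, body) if m + b > cutoff]
    w_mind, w_body = zip(*working)

    return pearson(mind, body), pearson(w_mind, w_body), len(working)


if __name__ == "__main__":
    r_all, r_working, n_working = simulate()
    print(f"all actors:     r = {r_all:+.3f} (n = 10000)")
    print(f"working actors: r = {r_working:+.3f} (n = {n_working})")
```

Under these assumptions, the correlation among all actors should be close to zero, while the correlation among the selected actors should be clearly negative. Raising the cutoff makes the selection stricter and the spurious negative correlation stronger.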

His scatterplot can be improved. Measurements with equal variability should be plotted in a square plot. The plotting characters for the unobserved and observed actors could be more pronounced. See an example below.