# statpics

Devoted to images that illustrate statistical ideas

## Monday, February 23, 2015

### Gas Pump Wear

Labels:
bell-shaped,
distribution,
frequency,
normal

## Monday, February 16, 2015

### Waldo, revisited

We've seen the scatterplot and marginal distributions of Waldo's location in a collection of Where's Waldo books. These were from Ben Blatt at Slate.com. Now Ronald Olsen has added a plot of the contours of a kernel density estimate of the joint frequency distribution of Waldo's location on the facing pages of the Waldo books. Darker color indicates higher density. Olsen then dynamically computes the optimal search path. Try it out. Can you find Waldo quicker?
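For readers curious about what a kernel density estimate of a joint distribution involves, here is a minimal sketch in Python. It is not Olsen's code: the coordinates in `waldo_locations`, the bandwidth of 1.0, and the function name `gaussian_kde_2d` are all made up for illustration. The idea is simply to place a small Gaussian bump on each observed location and average the bumps.

```python
import math

def gaussian_kde_2d(points, x, y, bandwidth):
    """Estimate the joint density at (x, y) by averaging Gaussian bumps centred on each point."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for px, py in points:
        sq_dist = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return norm * total / len(points)

# Hypothetical locations (inches from the lower-left corner of the two-page spread).
# These are NOT the real Waldo data; they only illustrate the calculation.
waldo_locations = [(2.0, 6.5), (2.5, 6.0), (3.0, 6.8), (9.0, 2.0), (9.5, 2.5), (6.0, 4.0)]

dense = gaussian_kde_2d(waldo_locations, 2.5, 6.4, bandwidth=1.0)
sparse = gaussian_kde_2d(waldo_locations, 12.0, 7.5, bandwidth=1.0)

print(f"estimated density near a cluster of points: {dense:.4f}")
print(f"estimated density far from every point:     {sparse:.4f}")
```

Evaluating the function over a grid of (x, y) values and drawing contours of the result would give a picture of the same kind as the one described above, with the darker regions where the estimate is higher.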

Labels:
density,
distribution,
frequency,
joint,
kernel

## Monday, February 9, 2015

### Selective Correlation

Sociologist Gabriel Rossman of UCLA has an article "When Correlation Is Not Causation, But Something Much More Screwy" in The Atlantic. It graphically shows how calculating correlations on selected samples can produce misleading results. He imagines two characteristics of aspiring actors, ability (mind) and attractiveness (body) plotted in a scatterplot. Random observations from independent standard normal distributions are drawn to represent these variables in a population of actors. Most actors are centered around zero with fewer well above or well below, on mind or body. But since mind and body are chosen independently there is no correlation shown in the plot, that is, no tendency of the plot to tilt with a positive or negative slope.

But then he imagines computing the correlation between mind and body from a sample of working actors. To get work, his actors have been selected by casting directors only if they have a high value for the sum of both mind and body. He has marked these working, observed actors with small triangles in the plot. The remaining aspiring non-working and unobserved actors are marked with small circles. The plotted pattern of triangles for the working, observed actors has a definite negative correlation suggesting, wrongly, that either the more able actors are not much to look at or the most attractive actors are dunces in their acting ability. Although this might fit some stereotypical caricatures, it is entirely due to the selective sampling based on the definition that produced our observed sample. He continues with an SAT example as well.
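The selection effect is easy to reproduce. The following is a rough sketch, not Rossman's own code: the population size of 10,000, the random seed, and the cutoff of 1.5 on the sum of mind and body are assumptions chosen only for illustration. It draws independent standard normal values for mind and body, keeps the actors whose sum exceeds the cutoff, and compares the correlation in the whole population with the correlation among the selected actors.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2015)   # fixed seed so the run is repeatable
n_actors = 10_000           # assumed population size
cutoff = 1.5                # assumed casting threshold on mind + body

# Independent standard normal draws: no built-in relationship between the two.
mind = [rng.gauss(0.0, 1.0) for _ in range(n_actors)]
body = [rng.gauss(0.0, 1.0) for _ in range(n_actors)]

# Casting directors only hire actors with a high combined score.
working = [(m, b) for m, b in zip(mind, body) if m + b > cutoff]

r_all = pearson_r(mind, body)
r_working = pearson_r([m for m, _ in working], [b for _, b in working])

print(f"all aspiring actors: n = {n_actors}, r = {r_all:.3f}")
print(f"working actors only: n = {len(working)}, r = {r_working:.3f}")
```

The correlation for the full population should come out close to zero, while the correlation among the working actors should be clearly negative, even though nothing in the simulation links mind to body.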

His scatterplot can be improved. Measurements with equal variability should be plotted in a square plot. The plotting characters for the unobserved and observed actors could be more pronounced. See an example below.

Labels:
causation,
correlation,
normal,
scatterplot

## Saturday, January 31, 2015

### Deflater Mess

The statistical rumble over the fumble continues. Did the first analysis use the wrong statistic (similar to miles per gallon versus the gallons per mile discussions of a few years ago), the wrong comparison distribution, or did it not account for the myriad ways and opportunities for fumbling the football? Above is the histogram of fumbles per play (the reciprocal of plays per fumble in the first analysis). The New England Patriots are still separated, now on the left, from the rest of the league, but now they do not look as extreme. The histogram looks much more like what might be obtained from a normal distribution. Now the team is 2.45 standard deviations below the mean (as compared to 3.9 standard deviations above the mean using the plays per fumble statistic). Such results would occur slightly less than 1% of the time. Certainly not the one in 16,000 chance that the plays per fumble would suggest. What we have here is a statistical "teachable moment."
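To see why the choice between a ratio and its reciprocal matters, here is a small sketch. The ten fumble rates below are hypothetical, not Sharp's data, and the helper `z_score` is just a name used here. It computes the z-score of the same unusual team on both scales, then the lower-tail normal probability for the 2.45 standard deviations quoted above.

```python
from statistics import NormalDist, mean, stdev

def z_score(values, target):
    """Number of sample standard deviations that target lies from the sample mean."""
    return (target - mean(values)) / stdev(values)

# Hypothetical fumbles-per-play rates for ten teams; the first team is unusually low.
# These numbers are made up purely to illustrate the effect of taking reciprocals.
fumbles_per_play = [0.005, 0.008, 0.009, 0.009, 0.010, 0.010, 0.010, 0.011, 0.011, 0.012]
plays_per_fumble = [1.0 / rate for rate in fumbles_per_play]

z_rate = z_score(fumbles_per_play, fumbles_per_play[0])
z_reciprocal = z_score(plays_per_fumble, plays_per_fumble[0])

print(f"z on the fumbles-per-play scale: {z_rate:.2f}")
print(f"z on the plays-per-fumble scale: {z_reciprocal:.2f}")

# Lower-tail probability under a normal model for the z-score quoted in the post.
lower_tail = NormalDist().cdf(-2.45)
print(f"P(Z < -2.45) = {lower_tail:.4f}")
```

Because the reciprocal stretches small rates into large values, the same team lands in the lower tail on one scale and further out in the upper tail on the other. The one-sided tail probability for 2.45 standard deviations is what lies behind the "slightly less than 1%" figure.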

Labels:
distribution,
normal,
outlier,
probability

## Monday, January 26, 2015

### Super Hold

As the build-up to next Sunday's NFL Super Bowl continues, more evidence is surfacing about the amazing ability of the New England Patriots to hold onto the football. First, there was the football deflation controversy, termed InflateGate by some and Ballghazi by others. Somehow 11 of the 12 balls that the Patriots used in the first half of their rainy game against the Indianapolis Colts were under-inflated by as much as 2 psi. The twelfth ball was also under-inflated but not by as much. It is claimed that this makes the ball easier to hold onto, especially in the rain.

Now, from football data analyst Warren Sharp we get the histogram above. This shows data from 2010-2014 on the number of offensive plays run divided by the number of fumbles lost. This ratio of Plays per Fumble for the Patriots was 187, far out in the tail of the histogram. Based on Sharp's data on the 32 NFL teams, the mean number of Plays per Fumble was just over 105 and the standard deviation was just over 21. This makes the Patriots' value of 187 over 3.9 standard deviations above the mean. Assuming a normal distribution for this measure, as Sharp reported, such results have only one chance in over 16,000 of occurring. A clear outlier.
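The arithmetic can be checked in a few lines. This sketch uses the rounded values quoted above (mean 105, standard deviation 21), which is an assumption: the actual values are "just over" those figures, and the tail probability is sensitive to that rounding, so the odds printed here will not match Sharp's reported figure exactly.

```python
from statistics import NormalDist

mean_ppf = 105.0   # rounded mean plays per fumble (the post says "just over 105")
sd_ppf = 21.0      # rounded standard deviation (the post says "just over 21")
patriots = 187.0   # Patriots' plays per fumble

# Standardize, then find the upper-tail area under a standard normal curve.
z = (patriots - mean_ppf) / sd_ppf
upper_tail = 1.0 - NormalDist().cdf(z)
one_in = 1.0 / upper_tail

print(f"z = {z:.2f}")
print(f"upper-tail probability = {upper_tail:.2e}")
print(f"roughly one chance in {one_in:,.0f}")
```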

Have they practiced and honed the skill of holding onto the football, or is deflation possibly also at work here?

Labels:
distribution,
normal,
outlier,
probability

## Monday, January 19, 2015

### A Color-Coded One Sentence Explanation of the Fourier Transform

Ingenious use of concepts and color to help explain the formula for the (inverse) Fourier Transform. It was devised by graphics/systems programmer Stuart Riffle. He includes a scale factor 1/N and, as he has explained, deals with the inverse of the Fourier Transform, but this is still clever. It is similar to Epps' geometric explanation of the Characteristic Function in probability theory, which we illustrated some time ago with a cut-out from a CD canister.
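A minimal sketch of the "spin the signal around a circle and average" idea follows. The positive sign in the exponent and the 1/N scale factor follow the description above, but the exact form of the formula, the function name `spin_and_average`, and the 16-point test signal are assumptions made for illustration.

```python
import cmath
import math

def spin_and_average(signal, k):
    """Wrap the signal around the unit circle at frequency k and average the points."""
    n_points = len(signal)
    total = 0j
    for n, x_n in enumerate(signal):
        # Each sample is placed at angle 2*pi*k*n/N on the circle, scaled by its value.
        total += x_n * cmath.exp(2j * math.pi * k * n / n_points)
    return total / n_points   # the 1/N scale factor turns the sum into an average

# Illustrative signal: a cosine that completes 3 cycles over 16 samples.
n_points = 16
signal = [math.cos(2.0 * math.pi * 3 * n / n_points) for n in range(n_points)]

strength_at_3 = abs(spin_and_average(signal, 3))
strength_at_1 = abs(spin_and_average(signal, 1))

print(f"magnitude at frequency 3: {strength_at_3:.3f}")
print(f"magnitude at frequency 1: {strength_at_1:.3f}")
```

When the spinning frequency matches a frequency present in the signal, the points pile up on one side of the circle and the average lies away from the origin. At other frequencies the points spread evenly around the circle and the average cancels to essentially zero.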

Via Revolutions Analytics via Stuart Riffle post on i-programmer.

Labels:
characteristic function,
fourier transform

## Monday, January 12, 2015
