Monday, August 26, 2013

Monday, August 19, 2013

Variance Rules

[An earlier version of this post had errors. Thanks, Kevin. I was thinking sequentially instead of group-wise. My mistake is corrected here for reference. The overall conclusions have not changed.]

Here is an interesting probability paradox from Futility Closet, which credits Gábor J. Székely’s Paradoxes in Probability Theory and Mathematical Statistics via Mark Chang’s Paradoxes in Scientific Inference.

Variance in a jury's judgement seems to be better than taking one person's word for it. As Futility Closet mentions:
Chang writes, “This paradox implies it is better to have your own opinion even if it is not as good as the leader’s opinion, in general.”
From Futility Closet consider:
"A, B, C, D, and E make up a five-member jury. They’ll decide the guilt of a prisoner by a simple majority vote. The probability that A gives the wrong verdict is 5%; for B, C, and D it’s 10%; for E it’s 20%. When the five jurors vote independently, the probability that they’ll bring in the wrong verdict is about 1%".
For such a five-member jury, the possible combinations with at least one mistaken juror are listed below (mistaken = 1, correct = 0):
 
A    B    C    D    E        P(A)    P(B)   P(C)   P(D)   P(E)   Product
1    0    0    0    0        0.05    0.9    0.9    0.9    0.8    0.02916
0    1    0    0    0        0.95    0.1    0.9    0.9    0.8    0.06156
0    0    1    0    0        0.95    0.9    0.1    0.9    0.8    0.06156
0    0    0    1    0        0.95    0.9    0.9    0.1    0.8    0.06156
0    0    0    0    1        0.95    0.9    0.9    0.9    0.2    0.13851
1    1    0    0    0        0.05    0.1    0.9    0.9    0.8    0.00324
1    0    1    0    0        0.05    0.9    0.1    0.9    0.8    0.00324
1    0    0    1    0        0.05    0.9    0.9    0.1    0.8    0.00324
1    0    0    0    1        0.05    0.9    0.9    0.9    0.2    0.00729
0    1    1    0    0        0.95    0.1    0.1    0.9    0.8    0.00684
0    1    0    1    0        0.95    0.1    0.9    0.1    0.8    0.00684
0    1    0    0    1        0.95    0.1    0.9    0.9    0.2    0.01539
0    0    1    1    0        0.95    0.9    0.1    0.1    0.8    0.00684
0    0    1    0    1        0.95    0.9    0.1    0.9    0.2    0.01539
0    0    0    1    1        0.95    0.9    0.9    0.1    0.2    0.01539
0    0    1    1    1        0.95    0.9    0.1    0.1    0.2    0.00171
0    1    0    1    1        0.95    0.1    0.9    0.1    0.2    0.00171
0    1    1    0    1        0.95    0.1    0.1    0.9    0.2    0.00171
0    1    1    1    0        0.95    0.1    0.1    0.1    0.8    0.00076
1    0    0    1    1        0.05    0.9    0.9    0.1    0.2    0.00081
1    0    1    0    1        0.05    0.9    0.1    0.9    0.2    0.00081
1    0    1    1    0        0.05    0.9    0.1    0.1    0.8    0.00036
1    1    0    0    1        0.05    0.1    0.9    0.9    0.2    0.00081
1    1    0    1    0        0.05    0.1    0.9    0.1    0.8    0.00036
1    1    1    0    0        0.05    0.1    0.1    0.9    0.8    0.00036
0    1    1    1    1        0.95    0.1    0.1    0.1    0.2    0.00019
1    0    1    1    1        0.05    0.9    0.1    0.1    0.2    0.00009
1    1    0    1    1        0.05    0.1    0.9    0.1    0.2    0.00009
1    1    1    0    1        0.05    0.1    0.1    0.9    0.2    0.00009
1    1    1    1    0        0.05    0.1    0.1    0.1    0.8    0.00004
1    1    1    1    1        0.05    0.1    0.1    0.1    0.2    0.00001


The last 16 rows, those with three or more mistaken jurors, are the mistaken coalitions. Their probabilities total 0.00991.
[This is slightly smaller than the result originally posted, which overestimated this value, as a comment suggested.]
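
The total above can be checked by brute-force enumeration. The following is a minimal sketch, assuming the jurors vote independently with the error rates quoted from Futility Closet; the function name `majority_error` is only an illustrative choice.

```python
from itertools import product

# Error probabilities quoted in the post: A = 5%, B = C = D = 10%, E = 20%.
p_wrong = {"A": 0.05, "B": 0.10, "C": 0.10, "D": 0.10, "E": 0.20}

def majority_error(p_wrong):
    """Probability that a simple majority of independent jurors is mistaken."""
    jurors = list(p_wrong)
    total = 0.0
    # Each outcome is one row of the table: 1 = mistaken, 0 = correct.
    for outcome in product([0, 1], repeat=len(jurors)):
        if sum(outcome) * 2 > len(jurors):  # strict majority is mistaken
            prob = 1.0
            for juror, wrong in zip(jurors, outcome):
                prob *= p_wrong[juror] if wrong else 1 - p_wrong[juror]
            total += prob
    return total

print(round(majority_error(p_wrong), 5))  # 0.00991
```

The enumeration visits all 32 outcomes, including the all-correct row that the table omits, but only the 16 rows with three or more mistaken jurors contribute to the sum.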


From Futility Closet:
"But if E (whose judgment is poorest) abandons his autonomy and echoes the vote of A (whose judgment is best), the chance of an error rises to 1.5%".
In this situation juror E always agrees with juror A, so if A is mistaken the coalition already holds two votes and needs only one more juror from B, C, or D to form a simple majority. Of course A might be correct, in which case a mistaken coalition needs all of jurors B, C, and D. The possibilities and their probabilities are shown below:

A    B    C    D        P(A)    P(B)   P(C)   P(D)   Product
1    0    0    0        0.05    0.9    0.9    0.9    0.03645
0    1    0    0        0.95    0.1    0.9    0.9    0.07695
0    0    1    0        0.95    0.9    0.1    0.9    0.07695
0    0    0    1        0.95    0.9    0.9    0.1    0.07695
1    1    0    0        0.05    0.1    0.9    0.9    0.00405
1    0    1    0        0.05    0.9    0.1    0.9    0.00405
1    0    0    1        0.05    0.9    0.9    0.1    0.00405
0    1    1    0        0.95    0.1    0.1    0.9    0.00855
0    1    0    1        0.95    0.1    0.9    0.1    0.00855
0    0    1    1        0.95    0.9    0.1    0.1    0.00855
0    1    1    1        0.95    0.1    0.1    0.1    0.00095
1    0    1    1        0.05    0.9    0.1    0.1    0.00045
1    1    0    1        0.05    0.1    0.9    0.1    0.00045
1    1    1    0        0.05    0.1    0.1    0.9    0.00045
1    1    1    1        0.05    0.1    0.1    0.1    0.00005


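The mistaken coalitions are the rows in which A is mistaken along with at least one of B, C, or D, plus the row in which A is correct but B, C, and D are all mistaken. Their probabilities total 0.0145.
[This is slightly smaller than the result originally posted, as a comment suggested.]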
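
The echoing case can be checked the same way. This is a rough sketch rather than a definitive implementation: the `follows` argument is a hypothetical mapping from a follower to the juror being echoed, and it assumes every leader votes independently (no chains of followers).

```python
from itertools import product

# Error probabilities quoted in the post: A = 5%, B = C = D = 10%, E = 20%.
p_wrong = {"A": 0.05, "B": 0.10, "C": 0.10, "D": 0.10, "E": 0.20}

def jury_error(p_wrong, follows=None):
    """Majority-error probability when some jurors echo another juror's vote.

    `follows` maps follower -> leader (an assumption made for this sketch);
    each leader must be an independent voter.
    """
    follows = follows or {}
    independent = [j for j in p_wrong if j not in follows]
    total = 0.0
    for outcome in product([0, 1], repeat=len(independent)):  # 1 = mistaken
        votes = dict(zip(independent, outcome))
        prob = 1.0
        for juror, wrong in zip(independent, outcome):
            prob *= p_wrong[juror] if wrong else 1 - p_wrong[juror]
        # Followers add a vote but no extra randomness: they copy their leader.
        for follower, leader in follows.items():
            votes[follower] = votes[leader]
        if sum(votes.values()) * 2 > len(p_wrong):
            total += prob
    return total

print(round(jury_error(p_wrong, {"E": "A"}), 5))  # 0.0145
```

With an empty mapping the function reduces to the independent case, and with B, C, D, and E all following A it returns 0.05, which is simply A's own error rate.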
Again from Futility Closet:
"Even more surprisingly, if B, C, D, and E all follow A, then the chance of a bad verdict rises to 5%, five times worse than if they vote independently, even though A is nominally the best leader".
Variance is good!

Monday, August 12, 2013

Plastic Feet Peaks



Here is an example of a remarkably symmetric pattern of wear and use, but most assuredly not bell-shaped. The pattern on the top of this trash bin shows two prominent areas of wear at the left and right side of the opening. These two areas show greater wear than a large fairly uniform area of use in the center between the peaks. The two extreme areas of use tell us something about the modes of customers’ and restaurant workers’ actions.

Fast food is often delivered on plastic serving trays. As diners leave, they collect the assorted packaging and wrappings from their meals and deposit them in the trash bin near the exit. The diners then return their serving trays to the top of the trash bin. The plastic trays have small raised ridges on the bottom of each corner. These small ridged “feet” provide a tiny gap between stacked trays to make them easier to separate.

When the top of the trash bin is empty, trays are returned by sliding them back along the front edge of the bin. The plastic feet on the bottom of the trays scrape along the top of the bin. This leaves prominent peaks in the wear pattern on the bin. As the trays are slid further, the central portion of each tray sags and also scrapes the bin, producing the almost uniform wear between these two peaks.

Of course we would expect to produce this type of wear mainly when the top of the bin is empty, allowing the sliding tray to wear down the top. Later trays may not produce any wear along these edges if they are just placed on top of trays already in position. But here is where the restaurant’s workers contribute to the pattern.

After a while, the trays stack up and must be returned for what is hoped to be a good washing. As they are retrieved, the pile of trays is slid forward to be picked up. This produces the uniform center wear and the peaks along the right-hand and left-hand edges as the trays and their feet again scrape the top of the bin. These actions produce the pattern of nearly equal left and right peaks of use with more uniform wear in between, resulting in a symmetric but bimodal frequency distribution.

Monday, August 5, 2013

Earliest Living Histogram Revisited (and Reversed)

While preparing for my talk at the Joint Statistical Meetings in Montreal this week, I had the occasion to consider again the Earliest Living Histogram that I posted in 2008. This image appeared on page 450 of Popular Science Monthly, September 1901, in a paper "The Statistical Study of Evolution," by C.B. Davenport. Forty University of Chicago students are arranged by height in bins of two-inch width. When viewed from above we see what was much later called a "living histogram" of the heights of this sample of men. It is described in more detail in Graphical Methods for Presenting Facts (1914) (Figure 141) by W.C. Brinton where, on page 165, he writes:
In Fig. 141 a group of men have been arranged in different rows. There is only one man in the shortest class at the left, and only one man in each of the tallest two classes at the right. Most of the men are of that height shown by the row to the right of the center of the diagram. A glance at the photograph taken looking down on this group of men shows that there are more men shorter than the most frequent height than there are men taller.
Davenport's original publication of this photograph also contains another image of the forty students:
Davenport says they are "arranged (approximately) in order of height." Examine the tallest few students shown here:
Note the shading of the hats that the tallest five men are holding: Gray, White, Black, White, and Black, reading from tallest downward (right to left). Now consider their arrangement in classes by height in the image above. Note that the five men on the far left are now wearing hats with shading Gray, White, Black, White, and Black.

The histogram image is reversed!

The taller men are shown on the left and the shorter on the right. This is exactly the reverse of the description given by Brinton and it reverses his reasoning and conclusions about the frequency of tall and short men in this sample. But there is more evidence. Reversing this image along a properly oriented and indicated number line we get the image below along with counts of the men standing in two of the histogram classes.
The five numbered men in dark hats, on the right, stand in a row of about the same extent as the seven numbered, smaller men in the row on the left. The men on the right are not just taller but also broader, and the smaller men on the left take up much less space in this regard. This is another indication that this is the proper view of this histogram. The original published histogram should be reversed.
The correctly oriented Earliest Living Histogram is shown below: