Is it better to be right or to be happy? A British Medical Journal report investigates this research question in its lighthearted Christmas issue. The research sample consisted of one married couple (n=2). The female was blind to the null hypothesis being tested: it is better to be right than happy. The female was assigned the "right" condition. The male was assigned the condition of agreeing with the female's "every opinion and request without complaint." Happiness was measured on a 10-point Likert scale labeled Quality of Life. Unfortunately this study was terminated early because of "severe adverse outcomes." The male's Quality of Life fell 4 points in 12 days, whereas the female's Quality of Life increased slightly from 8 to 8.5. The researcher's findings: "The results of this trial show that the availability of unbridled power adversely affects the quality of life of those on the receiving end."
Be nice in the new year and be happy.
Monday, December 30, 2013
Monday, December 23, 2013
Probability of a White Christmas
The map above is the current snow cover (as of 13 December 2013) in the US according to weather.com. Compare this to the National Oceanic and Atmospheric Administration maps of Christmas morning snow cover over the last five years.
Combining such maps from 1908 to 2010, NOAA maps out the probability of a white Christmas (that is, 1" of snow on the ground).
They also have a report from 1995 that maps the probability of 1", 5", or 10" of snow on Christmas morning.
Except for the Northeast, many of the areas with the greatest chances are not densely populated.
I wonder, what is the expected percentage of the US population that experiences a white Christmas?
Happy Holidays.
Monday, December 16, 2013
Scatterplot Waldo
Slate.com staff writer Ben Blatt has examined the wide range of Where's Waldo picture books, looking for a useful search strategy to find Martin Handford's elusive cartoon character. Blatt plotted the horizontal and vertical page location of Waldo in 68 pictures in what he calls the seven "primary" Where's Waldo books. He claims to have sat for three hours with a tape measure in a Barnes & Noble bookstore and measured Waldo's location on each 20" X 12.5" two-page spread. The image above shows these locations and two horizontal bands of 1.5" each: one three inches from the bottom of the page and the other seven inches from the bottom. Blatt found that Waldo can be found in these bands in 53% (36) of the 68 images.
We can see the higher frequency of occurrence in these bands by finding the marginal histograms from digitized locations in Blatt's scatterplot (data shown below). Here is the marginal histogram of the vertical locations of Waldo. The regions found by Blatt stand out prominently as the two modal peaks in the histogram.
Horizontally there is a less prominent pattern of the locations across the two-page spread.
Perhaps we could improve on the two-horizontal-bands strategy by concentrating on the far left and far right of the two-page spread, followed by a glance just left of center?
Here are the data:
horizontal | vertical | horizontal | vertical | horizontal | vertical | horizontal | vertical |
1.02 | 11.97 | 9.52 | 7.21 | 11.79 | 6.22 | 8.8 | 2.95 |
7.78 | 10.22 | 10.51 | 7.71 | 14.78 | 5.72 | 12.57 | 3.71 |
8.51 | 9.99 | 11.52 | 7.97 | 14.51 | 4.96 | 17.76 | 3.97 |
9.26 | 9.46 | 11.29 | 6.95 | 18.03 | 5.43 | 18.03 | 4.21 |
10.77 | 10.48 | 12.19 | 7.77 | 18.75 | 5.43 | 19.01 | 4.47 |
11.99 | 11.48 | 12.51 | 8.24 | 1.8 | 3.97 | 16.81 | 2.95 |
13.27 | 10.48 | 12.51 | 7.48 | 1.31 | 3.45 | 1.04 | 2.72 |
16.26 | 11.21 | 14.51 | 8.21 | 2.26 | 3.18 | 1.54 | 1.93 |
19.48 | 12 | 14.25 | 7.21 | 3.77 | 4.21 | 1.54 | 1.43 |
16.26 | 9.99 | 15.62 | 7.48 | 3.51 | 3.45 | 3.28 | 1.2 |
17.5 | 9.99 | 17.24 | 7.48 | 3.8 | 3.45 | 7.29 | 1.93 |
17.74 | 9.46 | 18.49 | 7.21 | 5.31 | 3.97 | 8.27 | 1.2 |
15.5 | 8.97 | 19.25 | 7.48 | 4.79 | 2.98 | 8.53 | 0.44 |
5.78 | 8.24 | 2.76 | 6.75 | 5.54 | 3.21 | 10.54 | 2.45 |
6.76 | 7.97 | 3.8 | 6.98 | 6.76 | 4.44 | 17.5 | 2.69 |
7.52 | 8.47 | 3.28 | 5.96 | 8.27 | 4.21 | 19.48 | 2.45 |
8.51 | 8.47 | 5.54 | 6.72 | 9.03 | 3.45 | 18.98 | 1.93 |
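The marginal histograms described above can be rebuilt from this table. Here is a minimal Python sketch, assuming the values are inches measured from the left and bottom edges of the spread and using one-inch bins (the bin width and the function names are my own choices for illustration, not Blatt's):

```python
import math
from collections import Counter

# The 68 digitized (horizontal, vertical) pairs, copied row by row from the table.
ROWS = """
1.02 11.97 9.52 7.21 11.79 6.22 8.8 2.95
7.78 10.22 10.51 7.71 14.78 5.72 12.57 3.71
8.51 9.99 11.52 7.97 14.51 4.96 17.76 3.97
9.26 9.46 11.29 6.95 18.03 5.43 18.03 4.21
10.77 10.48 12.19 7.77 18.75 5.43 19.01 4.47
11.99 11.48 12.51 8.24 1.8 3.97 16.81 2.95
13.27 10.48 12.51 7.48 1.31 3.45 1.04 2.72
16.26 11.21 14.51 8.21 2.26 3.18 1.54 1.93
19.48 12 14.25 7.21 3.77 4.21 1.54 1.43
16.26 9.99 15.62 7.48 3.51 3.45 3.28 1.2
17.5 9.99 17.24 7.48 3.8 3.45 7.29 1.93
17.74 9.46 18.49 7.21 5.31 3.97 8.27 1.2
15.5 8.97 19.25 7.48 4.79 2.98 8.53 0.44
5.78 8.24 2.76 6.75 5.54 3.21 10.54 2.45
6.76 7.97 3.8 6.98 6.76 4.44 17.5 2.69
7.52 8.47 3.28 5.96 8.27 4.21 19.48 2.45
8.51 8.47 5.54 6.72 9.03 3.45 18.98 1.93
"""


def parse_points(text):
    """Turn the table rows into a list of (horizontal, vertical) tuples."""
    points = []
    for line in text.strip().splitlines():
        values = [float(v) for v in line.split()]
        # Each row holds four (horizontal, vertical) pairs side by side.
        points.extend(zip(values[0::2], values[1::2]))
    return points


def marginal_counts(values, bin_width=1.0):
    """Count how many values fall in each bin; keys are bin indices."""
    return Counter(math.floor(v / bin_width) for v in values)


points = parse_points(ROWS)
vertical = marginal_counts([v for _, v in points])
horizontal = marginal_counts([h for h, _ in points])

# Text histogram of the vertical margin, top of the page first.
for b in range(max(vertical), -1, -1):
    print(f"{b:2d}-{b + 1:2d} in | " + "*" * vertical.get(b, 0))
```

Swapping `vertical` for `horizontal` in the final loop gives the other margin.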
Labels:
frequency,
histogram,
marginal,
scatterplot
Monday, December 9, 2013
Scatterplot Artists
Here is a scatterplot of writers placed, very subjectively, by J. Chen at htmlgiant on scales of Mediocrity to Genius on the horizontal axis and Modesty to Arrogance on the vertical. Some eclectic combinations along straight lines: Tom Wolfe, John Updike, T.S. Eliot, Jonathan Swift(?), and D.F. Wallace along a line of decreasing arrogance and increasing genius. He's also produced similar scatterplots for musicians: rappers and rockers (+Miles Davis?). For the writers, those that are Mediocre and Modest appear under-represented in his evaluation. Perhaps not surprising for writers, but check out his third quadrant for rappers.
This type of scatterplot is regularly published in New York Magazine. Last year, the actress Meryl Streep was treated to a scatterplot that placed her various movie roles on axes from Cold to Warm and Frivolous to Serious.
And here's another from New York Magazine for Bruce Willis.
And in 2001 Newsweek magazine did the same for TV shows. For several years I referred to this in my lecture on scatterplots for Basic Statistics students. Of course, it's quite dated now. My current students were six years old when these shows were on TV!
Friday, November 29, 2013
Looking at Literary Lives
Here is a graphic showing the event history in the literary careers of famous 20th century authors. It was produced by the design firm Accurat for one of my favorite blogs, Brain Pickings. (click here for the image for all the authors).
As the legend indicates, begin at an author's birth (at noon on the top of the circle) then move clockwise (around to midnight) representing an elapsed time of 100 years. Triangles are drawn connecting notable events in these literary lives (birth, publication (debut, masterpieces), and death). Authors' ages at their debut mark the vertex of one triangle (with birth and death). Ages at the publication of their masterpieces are marked by the vertices of other shaded triangles. For many authors, their careers are displayed as a single triangle, showing that their masterpiece was their debut work. Others have several notable works, represented by overlapping shaded triangles. This provides a stylish depiction of these literary lives. But a long-lived author with a lone, early debut masterpiece (e.g. Norman Mailer) might have a triangle of the same size as one whose notable life was cut short (e.g. Jack London). Our eyes/minds are drawn to the sizes of the triangles. What are we to learn from the most central feature of these displays?
Labels:
data representation,
graphics,
infographic,
multivariate,
time series
Monday, November 25, 2013
Transitions
Thanksgiving is this week, and the stretch from now through Christmas is the most heavily traveled period of the year. It brings to mind how mobile the US population is. Not just for holiday travel, but even for places to call home. Americans are restless and we move.
Here is a clever interactive graphic illustrating this from data journalist Chris Walker and his site Vizynary (also posted by Wired). The flows of migration from one US state to another are shown as arcs drawn between two states if at least 10,000 people moved between them in 2012. The width of each arc indicates the frequency of migration along that path. The data came from the U.S. Census American Community Survey. Interactively you can select a state and see the arcs of migration flow to other states. It reminds me of blasts of fireworks. Very well done.
Monday, November 18, 2013
Normal Snare Distribution
Here is the head of a snare drum (thanks Sean) showing the two-dimensional, joint distribution of his drumstick hits. The maker, Evans, produces drum heads with two plies of plastic bonded together. In this image the pattern of wear and use reveals itself through contour lines, closed curves that indicate regions of similar frequency of use. The greatest wear is in the lightest colored central region, which has seen the greatest frequency of hits. This central region has worn through the first ply, showing the remaining plastic support underneath. Sean seems to have a very stable left hand, consistently hitting this small central region in a nearly circular pattern. In this region, the horizontal location of his hits seems independent of the vertical, producing this nearly circular pattern of the joint distribution. Surrounding this is the darker region of the top ply of plastic. This layer retains more dirt and grime than the underlying supporting plastic. Again the pattern of these hits is nearly circular. It turns out that with simple assumptions, like radial symmetry and independence, the pattern can be shown to be that of a bivariate normal distribution. This result was once thought to have been first published (p. 398) by John Herschel in 1850, but it was actually discovered much earlier, in 1808, by the American mathematician Robert Adrain. More details on that in a future post.
But here, perhaps we see slight deviations from normality. The darker ring seems to show slightly more variability extending vertically and a greater clustering of hits on the bottom of this image. This indicates a bit of skewness towards the top of the picture. Less use and wear is finally shown in the cream-colored outer region that has seen very few hits. Thanks again Sean.
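For readers who want the argument behind the radial symmetry and independence claim, here is a brief sketch in LaTeX. The symbols g, \phi, \psi, c, and \sigma are my own notation, and the sketch assumes the horizontal and vertical coordinates share the same continuous marginal density g centered at zero:

```latex
% Independence: the joint density factors into the two marginals.
% Radial symmetry: the joint density depends only on x^2 + y^2.
\[
  f(x, y) = g(x)\, g(y) = \phi(x^2 + y^2).
\]
% Setting y = 0 gives g(x)\, g(0) = \phi(x^2), so
\[
  \phi(x^2 + y^2) = \frac{\phi(x^2)\, \phi(y^2)}{g(0)^2}.
\]
% With \psi(u) = \phi(u) / g(0)^2 this is the functional equation
\[
  \psi(u + v) = \psi(u)\, \psi(v),
\]
% whose continuous solutions are \psi(u) = e^{c u}. For f to integrate to one
% we need c < 0; writing c = -1/(2\sigma^2) gives
\[
  f(x, y) = \frac{1}{2 \pi \sigma^2}
            \exp\!\left( -\frac{x^2 + y^2}{2 \sigma^2} \right),
\]
% a bivariate normal density with circular contours.
```

Circular contours are exactly what the central worn region of the drum head suggests.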
Labels:
bivariate,
contour,
distribution,
independence,
joint,
normal
Monday, November 11, 2013
Top of the Heap? Mediocre Still
Monday, November 4, 2013
Correlation Ellipse Matrix
Here is an informative graphical representation of the correlation matrix of a weather data set with 16 variables. The image is via the RevolutionAnalytics blog and their post on big data available in R. The data mining R package Rattle produced the image. The magnitudes of correlations are shown with concentration ellipses. Blue ellipses, with positive slopes, display variables with positive correlation, with darker blue shading depicting higher positive correlation and lighter blue shading depicting lower positive correlation. Near zero correlations are shown as open, unfilled ellipses that are nearly circular. Variables with negative correlation are shown similarly by red ellipses with negative slopes. Dark red shading depicts greater negative correlation and lighter red shading depicts negative correlations closer to zero. This is a very useful tool to guide the eye through the relationships of the many variables used. Of course, as is well known, correlation is an appropriate measure of the strength of the linear relationship between two variables only when their scatterplot shows approximately the elliptical shape shown in these shaded icons. Examples abound of scatterplots where representing them as above can be misleading. My favorite, below, from the book by Chambers, et al. shows eight scatterplots all with the same positive correlation of 0.7. Correlation, and its representation with an ellipsoidal icon, is appropriate for only one of these scatterplots. This exhorts us all to remember to look at the data.
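A tiny numerical illustration of why the correlation coefficient alone can mislead: the sketch below uses two made-up five-point data sets (they are not from Chambers et al. or from the weather data) and a hand-written `pearson_r` helper. One set lies exactly on a line, the other exactly on a parabola, yet the second has a correlation of zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data chosen only for illustration.
xs = [-2, -1, 0, 1, 2]
line = [2 * x + 1 for x in xs]   # perfectly linear relationship
parabola = [x * x for x in xs]   # perfectly determined, but not linear

print("line:    ", pearson_r(xs, line))
print("parabola:", pearson_r(xs, parabola))
```

Both y-variables are completely determined by x, but only the first relationship would be fairly summarized by an ellipse icon.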
Labels:
correlation,
data representation,
matrix,
multivariate,
scatterplot
Monday, October 28, 2013
Six Decades of Most Popular Girls Names by US State
Animated maps showing the most popular girls names from 1960 to 2012. There are some quick country-wide shifts from one year to the next. Consider Lisa in 1969 and Jennifer in 1970.
Monday, October 21, 2013
Best American Infographics of 2013
And another showing how the DNA analysis of canine genes clusters into four broad categories of dogs.
The video above gives more examples. It's easy to get engrossed in this elegant and insightful book.
Labels:
data representation,
frequency,
infographic,
video,
visualization
Monday, October 14, 2013
Central Limit Rabbits
A cute video illustrating the Central Limit Theorem in a "Creature Cast" by animator Shuyi Chiou via the New York Times. Here a normal distribution is presented correctly. On the horizontal axis is a number line of the weight of rabbits. Small rabbits are on the left and large rabbits are on the right. A smooth curve of their relative frequency of occurrence is also shown. The curve is low on each end indicating that at each extreme there are few small rabbits and few large ones. The curve is high in the middle where the relative frequency of average weight rabbits is large. In this case, the population distribution of rabbit weights appears to follow a normal curve. In previous posts, we have seen this number line aspect of such a normal curve indicated correctly and also incorrectly.
In illustrating the Central Limit Theorem the animation goes on to examine the average weight of groups or samples of rabbits.
Such average weights of samples build up the sampling distribution of these averages. The Central Limit Theorem tells us that these group averages will more closely follow a normal distribution as the group size increases. An illustration starting with the bi-modal distribution of dragon wings is also shown.
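The same idea can be tried in a few lines of Python. This is only a sketch under made-up assumptions: the bi-modal "wing length" population below (two equally likely normal clusters centered at 2 and 8) is hypothetical and is not taken from the video.

```python
import random
import statistics


def draw_wing_length(rng):
    """One draw from a hypothetical bi-modal population."""
    center = 2.0 if rng.random() < 0.5 else 8.0
    return rng.gauss(center, 0.5)


def sample_means(sample_size, replicates, rng):
    """Averages of `replicates` independent samples of size `sample_size`."""
    return [
        statistics.fmean(draw_wing_length(rng) for _ in range(sample_size))
        for _ in range(replicates)
    ]


rng = random.Random(2013)  # fixed seed so repeated runs agree
for n in (1, 2, 30):
    means = sample_means(n, 2000, rng)
    print(
        f"group size {n:2d}: mean of averages = {statistics.fmean(means):.2f}, "
        f"spread of averages = {statistics.pstdev(means):.2f}"
    )
```

With groups of one the averages are just the bi-modal population again. As the group size grows, the averages pile up around the population mean and their spread shrinks, and a histogram of `means` looks more and more bell-shaped.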
Labels:
bi-modal,
central limit theorem,
distribution,
frequency,
normal
Monday, October 7, 2013
Little Fruit Punch Love
Monday, September 30, 2013
Monday, September 23, 2013
Simpson's Paradox
Simpson's paradox fools many. Percentages can favor women over men across each of several subgroups but then reverse, favoring men over women, when the subgroups are combined into one. At one level this seems illogical. We seem to expect that patterns observed consistently for portions of a whole should also apply when the portions are aggregated together into one. This simple view misses lurking variables. In a famous example, graduate admission to Berkeley seemed biased against women when considered overall, but when the admissions were considered by individual departments there was no bias or bias in favor of women. The lurking variable is that "not all departments are equally easy to enter" and "the proportion of women applicants tends to be high in departments that are hard to get into and low in those departments that are easy to get into."
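A small made-up example shows the reversal. The counts below are hypothetical, chosen only so the arithmetic is easy to follow (they are not the Berkeley figures): women do better than men in each department, yet worse overall, because most women apply to the hard department.

```python
# Hypothetical (admitted, applicants) counts for two departments.
ADMISSIONS = {
    "easy department": {"men": (80, 100), "women": (9, 10)},
    "hard department": {"men": (2, 10), "women": (30, 100)},
}


def rate(admitted, applicants):
    """Admission rate as a fraction."""
    return admitted / applicants


def overall_rate(group):
    """Admission rate for one group after pooling all departments."""
    admitted = sum(dept[group][0] for dept in ADMISSIONS.values())
    applicants = sum(dept[group][1] for dept in ADMISSIONS.values())
    return rate(admitted, applicants)


for name, dept in ADMISSIONS.items():
    print(f"{name}: men {rate(*dept['men']):.0%}, women {rate(*dept['women']):.0%}")
print(f"overall: men {overall_rate('men'):.1%}, women {overall_rate('women'):.1%}")
```

The department is the lurking variable: pooling hides the fact that the two groups applied to very different mixes of departments.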
Lewis Lehe and Victor Powell at UC Berkeley have produced interactive applets to illustrate Simpson's paradox. As Flowing Data mentions "Sometimes when you zoom in, you see a completely opposite trend of what you saw overall".
We've considered Simpson's paradox before where even microbes can be used to illustrate it.
Monday, September 16, 2013
Top of the Line
Upscale neighborhood. Greatest frequency of wear is on the premium.
Forwarded by a colleague (thanks Jun). Originally, I think, from Reddit.
Monday, September 9, 2013
Probability WONK
Robert Jernigan WONK Challenge from American University on Vimeo.
I finally saw my American University WONK Challenge Spot on the Jumbotron at the Washington Nationals game on August 27. Here's me pointing, and the Nationals won!
Foul balls have really hit the news lately, with a fan in Cleveland catching 4 in one game this last month! As I mention in the spot, some put the probability of catching a foul ball in any game at about 1 in 1000. This, of course, varies with where you sit. Defending or attacking this figure was not possible in such a short spot, so if we accept it, we can compute the probability that you catch at least one foul ball in, say, $n$ games. We compute this probability by first finding the probability of its complement. The complementary event of catching at least one foul ball is catching no foul balls. In one game our chance of not catching a foul ball is $1 - 0.001 = 0.999$. If our catching a foul ball is independent from game to game, then our chance of not catching a foul in $n$ games is $(0.999)^n$. Subtracting this from one, we get the probability of catching at least one foul ball in $n$ games: $1 - (0.999)^n$. If we want this result to be at least 50-50 (that is, 0.50) we need to find the value of $n$ so that $0.50 \le 1 - (0.999)^n$. You can do this by trial and error on a calculator or by using logarithms to solve for $n$. This will be the number of home games you must attend to increase your chances of catching at least one foul ball to at least 50-50. Now convert this to seasons of home games.
There are 162 games in a season, but only 81 are home games. You should get an answer of 8 home seasons plus about half of a ninth season, hence choice B in the video.
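If you would rather let a computer do the logarithms, here is a short Python sketch of the calculation. It takes the 1-in-1000 figure and game-to-game independence as given, just as the spot does, and the function name is my own.

```python
import math


def games_needed(p_catch=0.001, target=0.5):
    """Smallest n with 1 - (1 - p_catch)**n >= target, assuming independent games."""
    # Rearranging gives (1 - p_catch)**n <= 1 - target, so
    # n >= log(1 - target) / log(1 - p_catch).
    return math.ceil(math.log(1 - target) / math.log(1 - p_catch))


HOME_GAMES_PER_SEASON = 81

n = games_needed()
full_seasons, extra_games = divmod(n, HOME_GAMES_PER_SEASON)
print(f"{n} home games: {full_seasons} full seasons plus {extra_games} more games")
print(f"chance after {n} games: {1 - 0.999 ** n:.4f}")
```

This gives 693 home games, which is 8 full home seasons plus 45 games of a ninth.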
This was fun to do.
Monday, September 2, 2013
Wear Pattern in "Bedrock"
Monday, August 26, 2013
Monday, August 19, 2013
Variance Rules
[Earlier this post had errors. Thanks Kevin. I was thinking sequentially instead of group-wise. For correct reference, my mistake is corrected here. The overall conclusions have not changed.]
An interesting probability paradox from Futility Closet, who credits Gábor J. Székely’s Paradoxes in Probability Theory and Mathematical Statistics via Mark Chang’s Paradoxes in Scientific Inference.
Variance in a jury's judgement seems to be better than taking one person's word for it. As Futility Closet mentions:
Chang writes, “This paradox implies it is better to have your own opinion even if it is not as good as the leader’s opinion, in general.”
From Futility Closet consider:
"A, B, C, D, and E make up a five-member jury. They’ll decide the guilt of a prisoner by a simple majority vote. The probability that A gives the wrong verdict is 5%; for B, C, and D it’s 10%; for E it’s 20%. When the five jurors vote independently, the probability that they’ll bring in the wrong verdict is about 1%".
For such a five-member jury the possibilities with at least one mistaken juror are listed below (mistaken=1, correct=0):
A B C D E P(A) P(B) P(C) P(D) P(E) Product
1 0 0 0 0 0.05 0.9 0.9 0.9 0.8 0.02916
0 1 0 0 0 0.95 0.1 0.9 0.9 0.8 0.06156
0 0 1 0 0 0.95 0.9 0.1 0.9 0.8 0.06156
0 0 0 1 0 0.95 0.9 0.9 0.1 0.8 0.06156
0 0 0 0 1 0.95 0.9 0.9 0.9 0.2 0.13851
1 1 0 0 0 0.05 0.1 0.9 0.9 0.8 0.00324
1 0 1 0 0 0.05 0.9 0.1 0.9 0.8 0.00324
1 0 0 1 0 0.05 0.9 0.9 0.1 0.8 0.00324
1 0 0 0 1 0.05 0.9 0.9 0.9 0.2 0.00729
0 1 1 0 0 0.95 0.1 0.1 0.9 0.8 0.00684
0 1 0 1 0 0.95 0.1 0.9 0.1 0.8 0.00684
0 1 0 0 1 0.95 0.1 0.9 0.9 0.2 0.01539
0 0 1 1 0 0.95 0.9 0.1 0.1 0.8 0.00684
0 0 1 0 1 0.95 0.9 0.1 0.9 0.2 0.01539
0 0 0 1 1 0.95 0.9 0.9 0.1 0.2 0.01539
0 0 1 1 1 0.95 0.9 0.1 0.1 0.2 0.00171
0 1 0 1 1 0.95 0.1 0.9 0.1 0.2 0.00171
0 1 1 0 1 0.95 0.1 0.1 0.9 0.2 0.00171
0 1 1 1 0 0.95 0.1 0.1 0.1 0.8 0.00076
1 0 0 1 1 0.05 0.9 0.9 0.1 0.2 0.00081
1 0 1 0 1 0.05 0.9 0.1 0.9 0.2 0.00081
1 0 1 1 0 0.05 0.9 0.1 0.1 0.8 0.00036
1 1 0 0 1 0.05 0.1 0.9 0.9 0.2 0.00081
1 1 0 1 0 0.05 0.1 0.9 0.1 0.8 0.00036
1 1 1 0 0 0.05 0.1 0.1 0.9 0.8 0.00036
0 1 1 1 1 0.95 0.1 0.1 0.1 0.2 0.00019
1 0 1 1 1 0.05 0.9 0.1 0.1 0.2 0.00009
1 1 0 1 1 0.05 0.1 0.9 0.1 0.2 0.00009
1 1 1 0 1 0.05 0.1 0.1 0.9 0.2 0.00009
1 1 1 1 0 0.05 0.1 0.1 0.1 0.8 0.00004
1 1 1 1 1 0.05 0.1 0.1 0.1 0.2 0.00001
All those possibilities in red (the rows with three or more mistaken jurors) are mistaken coalitions with probability totaling: 0.00991.
[This is slightly smaller than the result originally posted which over-estimated this value as a comment suggested.]
From Futility Closet:
"But if E (whose judgment is poorest) abandons his autonomy and echoes the vote of A (whose judgment is best), the chance of an error rises to 1.5%".
In this situation juror E always agrees with juror A, so a mistaken coalition that includes A automatically includes E and needs only one more juror (B, C, or D) to form a simple majority. Of course A might not be included; then a mistaken coalition needs all of jurors B, C, and D. The possibilities and their probabilities are shown below:
A B C D P(A) P(B) P(C) P(D) Product
1 0 0 0 0.05 0.9 0.9 0.9 0.03645
0 1 0 0 0.95 0.1 0.9 0.9 0.07695
0 0 1 0 0.95 0.9 0.1 0.9 0.07695
0 0 0 1 0.95 0.9 0.9 0.1 0.07695
1 1 0 0 0.05 0.1 0.9 0.9 0.00405
1 0 1 0 0.05 0.9 0.1 0.9 0.00405
1 0 0 1 0.05 0.9 0.9 0.1 0.00405
0 1 1 0 0.95 0.1 0.1 0.9 0.00855
0 1 0 1 0.95 0.1 0.9 0.1 0.00855
0 0 1 1 0.95 0.9 0.1 0.1 0.00855
0 1 1 1 0.95 0.1 0.1 0.1 0.00095
1 0 1 1 0.05 0.9 0.1 0.1 0.00045
1 1 0 1 0.05 0.1 0.9 0.1 0.00045
1 1 1 0 0.05 0.1 0.1 0.9 0.00045
1 1 1 1 0.05 0.1 0.1 0.1 0.00005
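All those possibilities in red (the rows where A and at least one other juror are mistaken, or where B, C, and D are all mistaken) are mistaken coalitions with probability totaling: 0.0145.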
[This is slightly smaller than the result originally posted as a comment suggested.]
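Both totals can be checked by brute-force enumeration instead of adding up table rows by hand. The Python sketch below is one way to do it; the `followers` argument, which maps a juror to the juror whose vote they copy, is a device of my own for this illustration.

```python
from itertools import product

# Probability that each juror gives the wrong verdict (from the quoted puzzle).
P_WRONG = {"A": 0.05, "B": 0.10, "C": 0.10, "D": 0.10, "E": 0.20}


def wrong_verdict_probability(p_wrong, followers=None):
    """Probability that a simple majority of the jury is mistaken.

    `followers` maps a juror to the independent juror whose vote they echo.
    """
    followers = followers or {}
    independent = [j for j in p_wrong if j not in followers]
    total = 0.0
    # Enumerate every mistaken(1)/correct(0) pattern of the independent jurors.
    for outcome in product([0, 1], repeat=len(independent)):
        wrong = dict(zip(independent, outcome))
        prob = 1.0
        for juror, is_wrong in wrong.items():
            prob *= p_wrong[juror] if is_wrong else 1 - p_wrong[juror]
        # Followers simply copy their leader's vote.
        for juror, leader in followers.items():
            wrong[juror] = wrong[leader]
        if sum(wrong.values()) > len(p_wrong) / 2:
            total += prob
    return total


print(round(wrong_verdict_probability(P_WRONG), 5))              # all independent
print(round(wrong_verdict_probability(P_WRONG, {"E": "A"}), 5))  # E echoes A
```

The two printed values are 0.00991 and 0.0145, matching the totals above. Passing `{"B": "A", "C": "A", "D": "A", "E": "A"}` makes everyone echo A, and the function then returns A's own error rate of 0.05.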
Again from Futility Closet:
"Even more surprisingly, if B, C, D, and E all follow A, then the chance of a bad verdict rises to 5%, five times worse than if they vote independently, even though A is nominally the best leader".
Variance is good!
Monday, August 12, 2013
Plastic Feet Peaks
Here is an example of a remarkably symmetric pattern of wear and use,
but most assuredly not bell-shaped. The pattern on the top of this trash bin shows two prominent areas of
wear at the left and right side of the opening. These two areas show
greater wear than a large fairly uniform area of use in the center between the
peaks. The two extreme areas of use tell us something about the modes of
customers’ and restaurant workers’ actions.
Fast food is often delivered on plastic serving trays. As
diners leave, they collect the assorted packaging and wrappings from their
meals and deposit them in the trash bin near the exit. The diners then return
their serving trays to the top of the trash bin. The plastic trays have small
raised ridges on the bottom of each corner. These small ridged “feet” act to
provide a tiny gap between stacked trays to make them easier to separate.
When the top of the trash bin is empty, trays are returned
by sliding them back along the front edge of the bin. The plastic feet on the
bottom of the trays scrape along the top of the bin. This leaves prominent
peaks in the wear pattern on the bin. As the trays are slid further the central
portion of the trays sag and also scrape the bin to produce the pattern of use
showing almost uniform wear between these two peaks.
Of course we would expect to produce this type of wear
mainly when the top of the bin is empty, allowing the sliding tray to wear down
the top. Later trays may not produce any wear along these edges if they are
just placed on top of trays already in position. But here is where the
restaurant’s workers contribute to the pattern.
After a while, the trays stack up and must be returned for,
what is hoped, a good washing. As they are retrieved, the pile of trays is slid
forward to be picked up. This produces the uniform center wear and the peaks
along the right-hand and left-hand edges as the trays and their feet again
scrape the top of the bin. These actions produce the pattern of nearly equal
left and right peaks of use with more uniform wear in between, resulting in a
symmetric, but bi-modal frequency distribution.
Labels:
bell-shaped,
bi-modal,
distribution,
frequency
Monday, August 5, 2013
Earliest Living Histogram Revisited (and Reversed)
While preparing for my talk at the Joint Statistical Meetings in Montreal this week, I had the occasion to consider again the Earliest Living Histogram that I posted in 2008. This image appeared on page 450 of Popular Science Monthly, September 1901 in a paper "The Statistical Study of Evolution," by C.B. Davenport. Forty University of Chicago students are arranged by height in bins of two inch width. When viewed from above we see what was much later called a "living histogram" of the heights of this sample of men. It is described in more detail in Graphical Methods for Presenting Facts (1914) (Figure 141) by W.C. Brinton where, on page 165, he writes:
In Fig. 141 a group of men have been arranged in different rows. There is only one man in the shortest class at the left, and only one man in each of the tallest two classes at the right. Most of the men are of that height shown by the row to the right of the center of the diagram. A glance at the photograph taken looking down on this group of men shows that there are more men shorter than the most frequent height than there are men taller.
Davenport's original publication of this photograph also contains another image of the forty students:
Davenport says they are "arranged (approximately) in order of height." Examine the tallest few students shown here:
Note the shading of the hats that the tallest five men are holding: Gray, White, Black, White, and Black, reading from tallest downward (right to left). Now consider their arrangement in classes by height in the image above. Note that the five men on the far left are now wearing hats with shading Gray, White, Black, White, and Black.
The histogram image is reversed!
The taller men are shown on the left and the shorter on the right. This is exactly the reverse of the description given by Brinton and it reverses his reasoning and conclusions about the frequency of tall and short men in this sample. But there is more evidence. Reversing this image along a properly oriented and indicated number line we get the image below along with counts of the men standing in two of the histogram classes.
The five numbered men in dark hats, on the right, stand in a row of about the same extent as the seven numbered, smaller men in the row on the left. The men on the right are not just taller but also broader, and the smaller men on the left take up much less space in this regard. This is another indication that this is the proper view of this histogram. The original published histogram should be reversed.
The correctly oriented Earliest Living Histogram is shown below: