Monday, August 27, 2012

Physics as a Continent

Here is a post from the blog Strange Maps showing Bernard H. Porter's 1939 depiction of the history of physics as a continental map, complete with rivers of thought, villages named after the pioneers of physics, and a scattering of symbols. Several thinkers prominent in probability and statistics, natural philosophers of their time, can be found on the map. Included are Pascal, Laplace, the Bernoullis, and Galton, although Galton's dates are wrong: he was born in 1822, but he died in 1911, not 1899 as listed. We need a map like this for just the field of statistics. It also brings to mind the old IBM timeline of the history of mathematics, which IBM has updated for the iPad. From Strange Maps at Big Think via The Quantum Pontiff.

Monday, August 13, 2012

Great graphics! Time series plots and scatterplot movies by Kevin Quealy, Graham Roberts, and Amanda Cox of the New York Times graphics staff on the history of Olympic swimming and track and field results. Wonderful.

Tuesday, August 7, 2012

Law of Large Crowds (Monty Pythony)



A NOVA video about the Wisdom of Crowds. In his paper "Vox Populi," published in Nature on March 7, 1907, Sir Francis Galton investigated this wisdom at a Fat Stock and Poultry Exhibition, writing:

In these democratic days, any investigation into the trustworthiness and peculiarities of popular judgments is of interest. The material about to be discussed refers to a small matter, but is much to the point. A weight-judging competition was carried on at the annual show of the West of England Fat Stock and Poultry Exhibition recently held at Plymouth. A fat ox having been selected, competitors bought stamped and numbered cards, for 6d. each, on which to inscribe their respective names, addresses, and estimates of what the ox would weigh after it had been slaughtered and "dressed." Those who guessed most successfully received prizes.

Galton received the 787 cards with the estimates and tabulated the results. The actual "dressed" weight of the ox was 1198 lbs. The median of the crowd's guesses was 1207 lbs., so the crowd was off by only 9 pounds out of 1198, or less than 0.8%. This was better than any individual guess, even from the experts. Galton also found the upper quartile (q3) of the guesses to be 1236 and the lower quartile (q1) to be 1162, so the inter-quartile range is 74. He modeled the data with a normal distribution and summarized its spread by 1/2 of this inter-quartile range, or 37, which he calls the probable error (p.e.). For a normal distribution the standard deviation is larger than the probable error, roughly 3/4 of the inter-quartile range, or about 55. He notes the skewness of the distribution of guesses, with the lower portion best modeled by a normal distribution with a probable error of 45 (the distance from q1 to the median) and the upper portion by one with a probable error of 29 (the distance from the median to q3). He graphs horizontally the cumulative distribution function of the fitted normal (solid line) along with the cumulative relative frequencies of the data (dotted line).
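The arithmetic in this paragraph is easy to check. Here is a minimal Python sketch that uses only the figures quoted above. The variable names are my own, and the conversion from probable error to standard deviation assumes the data really are normal, which Galton's own remarks on skewness call into question.

```python
from statistics import NormalDist

# Figures quoted from Galton's "Vox Populi" (1907)
actual = 1198          # dressed weight of the ox, lbs
median_guess = 1207    # median of the 787 guesses, lbs
q1, q3 = 1162, 1236    # lower and upper quartiles of the guesses, lbs

# How far off was the crowd?
error = median_guess - actual
pct_error = 100 * error / actual

# Galton's "probable error" is half the inter-quartile range
iqr = q3 - q1
probable_error = iqr / 2

# Under a normal model the quartiles sit about 0.6745 standard
# deviations from the center, so sigma = p.e. / 0.6745
z75 = NormalDist().inv_cdf(0.75)
sigma = probable_error / z75

# Separate spreads below and above the median show the skewness
lower_pe = median_guess - q1
upper_pe = q3 - median_guess

print(f"error: {error} lbs ({pct_error:.2f}%)")
print(f"IQR: {iqr}, probable error: {probable_error}")
print(f"normal-model sigma: {sigma:.1f} (3/4 of IQR: {0.75 * iqr})")
print(f"lower p.e.: {lower_pe}, upper p.e.: {upper_pe}")
```

The exact normal-model factor is 1/(2 × 0.6745), about 0.741 of the inter-quartile range, so the 3/4 rule of thumb is close.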

His overall finding:
This result is, I think, more creditable to the trust-worthiness of a democratic judgment than might have been expected.

Monday, August 6, 2012

Climate Dice Back in the Day

There's been much on blogs about "Climate Dice" and how humans are perhaps loading them. See the article by Andrew Revkin (New York Times) and, more recently, the one by Paul Krugman (New York Times) discussing the findings of James Hansen et al. on public perceptions of climate change (links via Andrew Gelman). Basically, the data show that hotter extremes are becoming more likely.

Investigating the variability in climate by analogy with random tosses of dice has a long history, though the analogy has not always been applied correctly. C. F. Marvin, former chief of the US Weather Bureau, can be seen using such random methods in this article from Popular Science, 1932, page 46. Describing these methods a bit more, this newspaper article from 1931 begins with an especially lyrical view of his work:
Common dice, inventions of the ancients and purveyors of financial distress to unlucky moderns, have risen to a new dignity. After having been rolled in many places, ranging from the cobblestones of side streets to the green covered tables in palaces of chance, they are now being tossed with analytical earnestness by the hands of science.
In this new field the "galloping dominoes" are being used as a means of increasing man's working knowledge of the weather and its pranks. The greenback and silver involved when glassy eyed gamblers seek to get something for nothing are supplanted by graphs and slide rules in this new environment, where scientists seek the answer to the high sounding question, "Are meteorological sequences fortuitous?"
I think here one should read "random" for "fortuitous". The article's final question is the title of a paper by Marvin that appeared in the December 1930 Monthly Weather Review (pdf). In that paper he uses such random methods to simulate graphs of precipitation and compares the results to actual records, asking "who could pick out the natural from the chance order"?

He does complicate things by confusing a sample and a population. He marvels at the fact that the products of the results on four tossed dice produce a histogram of measurements that is sparse and skewed to the right, resembling a sample record of actual precipitation. Of course, this is the population of all possible products, and its sparseness will always be present. As he notes, there is no way to fill in the gaps between the products that four dice can produce. This has no relation to the sampling variability and sparse histograms, of many shapes, that could result from samples drawn from a skewed but continuous population of measurements. Of course, it was 1931!
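To see the point about the population, one can simply enumerate it. Here is a minimal Python sketch, assuming four fair six-sided dice and taking the product of the faces. This is my own illustration, not Marvin's procedure, and the variable names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import prod
from statistics import median

faces = range(1, 7)

# The whole population: every ordered outcome of four dice (6**4 of them)
products = [prod(roll) for roll in product(faces, repeat=4)]
counts = Counter(products)

# Right skew: a long upper tail pulls the mean above the median
mean_product = sum(products) / len(products)
median_product = median(products)

# Sparseness: many integers can never occur as a product,
# for example anything divisible by a prime larger than 5
gaps = [v for v in range(1, 37) if v not in counts]

print(f"outcomes: {len(products)}, distinct products: {len(counts)}")
print(f"range: {min(products)} to {max(products)}")
print(f"mean: {mean_product}, median: {median_product}")
print(f"unattainable values up to 36: {gaps}")
```

The gaps are a fixed feature of this discrete population. Rolling more often changes how many times each attainable product appears, not which products are attainable.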