Monday, July 30, 2012

Happy Tweets from a Jellyfish

 Average Happiness
A clever "jellyfish plot" from researchers at the Center for Complex Systems at the University of Vermont shows indexed percentiles from a study of the Positivity of the English Language. Each word is scored by 50 people on a happiness scale (1=unhappy, 5=neutral, 9=happy), and the 50 scores are averaged to produce an average happiness measure for that word. This is done for the 5000 most frequently occurring words on Twitter. Each point represents a word plotted by its average happiness and its rank by frequency of use (1=most frequent, 5000=least frequent). Percentiles (here deciles) of average happiness are graphed in a sliding window of 500 ranked words.
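The sliding-window deciles described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' code: the ratings are synthetic random numbers, the function names are made up for this post, and the step of 100 ranks between windows is a guess (the description gives only the window width of 500).

```python
import random
from statistics import mean, quantiles

def average_happiness(ratings_per_word):
    """Average each word's individual ratings (1..9) into a single score."""
    return [mean(ratings) for ratings in ratings_per_word]

def sliding_deciles(values_by_rank, window=500, step=100):
    """Deciles of average happiness inside a window that slides over the ranks.

    values_by_rank[0] belongs to rank 1 (the most frequent word).
    Returns a list of (center_rank, [nine decile cut points]) pairs.
    The step size is an assumption; only the window width comes from the post.
    """
    bands = []
    for start in range(0, len(values_by_rank) - window + 1, step):
        chunk = values_by_rank[start:start + window]
        center_rank = start + (window + 1) / 2  # midpoint of ranks start+1 .. start+window
        bands.append((center_rank, quantiles(chunk, n=10)))
    return bands

# Synthetic stand-in data: 5000 words, 50 ratings each. NOT the study's data.
random.seed(0)
ratings = [[random.randint(1, 9) for _ in range(50)] for _ in range(5000)]

happiness = average_happiness(ratings)
bands = sliding_deciles(happiness)

for center_rank, cuts in bands[:3]:
    print(center_rank, [round(c, 2) for c in cuts])
```

Plotting each of the nine cut points against `center_rank` would give the trailing "tentacles" of the jellyfish; the marginal distribution on top is just a histogram of `happiness` over all 5000 words.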

The plot shows that the distributions of happiness, for frequently used and less frequently used words alike, are skewed towards happier words, generally matching the marginal distribution curve shown on top. This finding was consistent across words from the New York Times, Google Books, and music lyrics.

But note that the spread in the deciles decreases as word usage decreases, and average happiness generally gets lower for less frequently used words. More details and graphs can also be found here.

Monday, July 23, 2012

Curve of the fruit length of Oenothera lamarckiana

From the blog Scientific Illustration and Eyeball Mansion, curves emphasizing the size (length) distribution of the fruit of the Evening-Primrose (Oenothera lamarckiana). Drawing by botanist and geneticist Hugo de Vries. The top view is a cumulative distribution function rotated by ninety degrees (what NIST calls the percent point function). The bottom view shows a larger sketched bell-shaped curve that is just a multiple of the smaller one.
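For readers who want to see the relationship between the cumulative distribution function and the percent point function, here is a small Python sketch. It assumes a normal model with made-up parameters (a mean of 25 and a standard deviation of 3, in arbitrary length units); these numbers are hypothetical and are not taken from de Vries's drawing.

```python
from statistics import NormalDist

# Hypothetical parameters for illustration only; not de Vries's measurements.
fruit_length = NormalDist(mu=25.0, sigma=3.0)

def percent_point(p):
    """Percent point function: the length below which a fraction p of fruits fall.

    This is the inverse of the cumulative distribution function, i.e. the
    CDF plot with its axes swapped.
    """
    return fruit_length.inv_cdf(p)

length_90 = percent_point(0.90)           # length at the 90th percentile
round_trip = fruit_length.cdf(length_90)  # applying the CDF takes us back to 0.90

print(round(length_90, 2), round(round_trip, 2))
```

Reading the CDF from the vertical axis across to the horizontal axis is the same operation as calling `percent_point`, which is why the rotated drawing and the percent point function are the same picture.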

Monday, July 16, 2012

YADDA, Yet Another Door Distribution Again

Bell-shaped wear pattern on the door of Sackets Harbor Brewing Company, Sackets Harbor, New York.

Wednesday, July 11, 2012

Here's the right idea

G. S. Springer, a geology professor at Ohio University, has a cartoon that emphasizes the fact that the normal curve describes the distribution of measurements on a number line. Small measurements are on the left and large measurements are on the right. Those in the middle at "the top of the bell curve" are just average. These two getting ready to sled are "at the top" because they are average and there are so many others that are also average. The real "top" of the normal distribution is the set of large measurements (and sleds) on the extreme right of the number line.

Monday, July 9, 2012

The Puzzle of You

The previous post offered a Puzzle that showed a t-shirt design that conveyed the wrong idea about a normal curve. The problem is the point meant to indicate "YOU". This point is placed to indicate that "YOU" are on the low end of a scale of measurements.

The problem is, the design places the point on the curve, rather than on the number line that underlies the curve. To see the difficulty, compare the previous t-shirt design with the manipulated image above (this is not an available shirt). Placing the point on the curve suggests that it slides along the curve.

Imagine the impression conveyed when it reaches the peak of the curve, as illustrated here. No doubt, "YOU" would say you were at the top of the heap, even though your measurement would be very common, the most likely one that we could observe. This "top of the curve" error is common (no pun intended). The curve is not the distribution; the curve only describes the distribution. The labeled point should be on the supporting number line, not on the curve.
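A quick numerical check makes the same point. The sketch below assumes a standard normal distribution (mean 0, standard deviation 1), chosen only for illustration, and compares the height of the curve with the position on the number line at two measurements.

```python
from statistics import NormalDist

# Standard normal, used purely as an illustrative assumption.
standard = NormalDist(mu=0.0, sigma=1.0)

def describe(x):
    """Return (height of the curve at x, fraction of measurements below x)."""
    return standard.pdf(x), standard.cdf(x)

peak_height, peak_fraction = describe(0.0)  # the "top of the curve"
tail_height, tail_fraction = describe(2.0)  # far right on the number line

print(round(peak_height, 3), round(peak_fraction, 3))
print(round(tail_height, 3), round(tail_fraction, 3))
```

The curve is tallest at the mean, yet a measurement there sits exactly in the middle of the pack, with half of all measurements below it. The measurement at 2 sits where the curve is low, but it exceeds more than 97 percent of the others. Height on the curve says how common a measurement is, not how large it is.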

Monday, July 2, 2012

Puzzle: How does this t-shirt design get it wrong?

Here is a t-shirt design available from ThinkGeek. The point labeled "YOU ARE HERE" is placed to convey an incorrect idea about the normal distribution. What is wrong here? Answer in the next post.