Statpics: June 2013

Monday, June 24, 2013

Counting Gym Freqs

So let's measure the amount of wear shown in Jon Purdy's image of exercise weights from a previous post. Here is a Photoshop image of the weights, modified by rotation and perspective correction. Unfortunately, the weight selection plug (in red) has hidden the wear at 30lbs/14kgs, and its coiled cord has partially hidden the wear patterns at 45lbs/20kgs and at 60lbs/27kgs. We can attempt to impute these last two. The amount of wear behind the coiled cord at 45lbs/20kgs appears to be about the same as at 150lbs/68kgs, so we will replicate the 150lbs/68kgs pattern over the top of the image of the coiled cord at 45lbs/20kgs. The cord also obscures the upper portion of the wear pattern at 60lbs/27kgs. We will replace only that upper portion with the upper portion from its neighbor at 75lbs/34kgs, resulting in the image below.

We then convert the selected wear region to black and white and increase the contrast. A portion of this is shown below.

Notice that, as a result of our imputation, the first panel of this sequence (45lbs/20kgs) is a duplicate of the eighth panel (150lbs/68kgs) and the upper half of the second panel is a duplicate of the upper half of the third panel. The overall balance of dark and light pixels can be seen with the Photoshop histogram option, which displays lines whose heights represent the count of pixels at each level, from total black (a level of 0, on the left) to total white (a level of 255, on the right), as shown below.

We see peaks of predominantly dark pixel levels (on the left) and light pixel levels (on the right). On this histogram I have selected the approximate midpoint between the two peaks (level 133) and shaded in all the pixels at that level or lighter (levels 133...255). Of the 106512 pixels in this image, 44443 were thus selected. For each weight, a region of the same size was placed around the panel and the number of light pixels (levels 133...255) was computed, as shown below for panel 6 (120lbs/55kgs).

Of the 7504 pixels selected in each panel, this panel had 4781 light pixels (those with levels 133...255). Repeating this for all the panels (1 through 14), the total number of light pixels is 33651, so panel 6 accounts for 4781/33651, or 14%, of the light-colored wear. The other panels can be computed similarly, giving the rounded wear percentages shown in the bar chart below.
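The counting step can be sketched in a few lines of Python. This is only an illustration of the thresholding arithmetic, not the Photoshop procedure itself: the function name count_light and the tiny toy_panel grid are made up for the example, while the threshold of 133 and the totals 4781, 7504 and 33651 are the figures quoted above.

```python
def count_light(pixels, threshold=133):
    """Count grayscale pixels at or above the threshold (levels 133...255)."""
    return sum(1 for row in pixels for level in row if level >= threshold)

# Hypothetical 2 x 3 grayscale patch, only to show the function in use.
toy_panel = [
    [12, 140, 200],
    [133, 90, 255],
]
toy_light = count_light(toy_panel)

# Figures quoted in the post for panel 6 (120lbs).
panel6_light = 4781
panel_size = 7504
total_light = 33651

panel6_share = panel6_light / total_light    # share of all light-colored wear
panel6_coverage = panel6_light / panel_size  # fraction of the panel that is light

print(f"toy panel light pixels: {toy_light}")
print(f"panel 6 share of wear: {panel6_share:.1%}")
print(f"panel 6 light coverage: {panel6_coverage:.1%}")
```

Dividing by the total number of light pixels, rather than by the panel size, is what turns the counts into a frequency distribution that sums to 100% across the 14 panels.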

Monday, June 17, 2013

Using Metadata to find Colonial Traitors to the English Crown (c. 1770s)

Last week Duke sociologist Kieran Healy had a fun and informative post showing how much information about social networks can be gleaned from so little metadata. Metadata have been in the news lately with the disclosure of Prism, the US Government's communication surveillance system. Healy's post shows how, knowing only which groups individuals belong to and applying a little matrix arithmetic, individuals can be "linked through the groups they belong to [and] groups can be linked by the people they share". Using data from the book "Paul Revere's Ride" by Fischer, Healy produced the basic image above showing the links between various Colonials through the clubs, committees, caucuses, etc. that they belonged to. In particular, this plot shows the people who serve as bridges between the groups. I've added the portrait of Paul Revere, "ready to ride and spread the alarm", a central bridge to many political groups, marking him as a notable Colonial traitor to the British Crown of the 1770s. Healy also has a plot showing the links between the groups. So much information from so little metadata. We've used other similarity plotting techniques here to cluster the Justices of the Supreme Court.
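The "little matrix arithmetic" can be sketched as follows. The people, groups and memberships here are entirely hypothetical placeholders, not Healy's or Fischer's data; the sketch only shows how multiplying a membership matrix by its transpose yields person-to-person and group-to-group links.

```python
# Hypothetical membership data: rows are people, columns are groups.
people = ["Person A", "Person B", "Person C"]
groups = ["Club X", "Club Y"]
membership = [
    [1, 1],  # Person A belongs to both clubs
    [1, 0],  # Person B belongs to Club X only
    [0, 1],  # Person C belongs to Club Y only
]

def transpose(matrix):
    return [list(column) for column in zip(*matrix)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Entry [i][j] counts the groups that person i and person j share.
person_by_person = matmul(membership, transpose(membership))

# Entry [i][j] counts the people that group i and group j share.
group_by_group = matmul(transpose(membership), membership)

for name, row in zip(people, person_by_person):
    print(name, row)
for name, row in zip(groups, group_by_group):
    print(name, row)
```

In this toy example Person A is the only link between Person B and Person C, the kind of bridging role the plot highlights.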

(The follow-up to Gym Freqs, promised last week, will appear next week).

Monday, June 10, 2013

Gym Freqs

A submission from teacher Jon Purdy. Jon writes of his picture of exercise weights in a gym:

[I] noticed this while exercising today. More wear on the middle weights indicate greater usage, with usage dropping off as the weights get heavier and slight drop off with the lightest weights.

A great example of a discrete frequency distribution of use. Thanks Jon.
Next week, I'll have some measurements of this use and wear.

Monday, June 3, 2013

Testing Poissonness Petals

Last week’s image was of cherry blossom petals that had fallen on a stone walk, a random realization of a spatial Poisson process. In such a process, the expected number of petals falling on any stone is proportional to the area of the stone. The mean number of petals falling on the larger stones is 8.71. The mean number of petals falling on the smaller stones is 5.58. Their ratio is 1.56. The ratio of the stones’ areas is 1.5. Such a close agreement is what a Poisson process should produce.

Not so fast, implies commenter Kevin. He suggests that we should test whether our observed ratio of 1.56 is significantly different from the ratio of the stones’ area of 1.5. My intuition tells me that, with the large variability in a Poisson process and such a small sample of fallen petals, we would need a much larger difference between the observed and expected ratios for the difference to be significant. But let’s test it.

First, consider the larger stones. If we count those laid down horizontally, left to right, from top to bottom, my counts (again likely prone to error) are 12,9,10,9,11,13,3,13,11,7. For the larger stones laid down vertically, left to right, from top to bottom, I count 12,6,7,6,11,6,6,10,10,4,7 petals. For the smaller, square stones, left to right, from top to bottom, I count 3,3,2,6,8,4,8,11,4,5,7,6 petals. Let’s investigate whether these data can be modeled with a Poisson distribution. For example, the mean and variance for the larger stones are 8.71 and 8.61, respectively, values close to the equal mean and variance expected for a Poisson distribution. But we can do more.
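The summary figures can be reproduced from the counts listed above. This is a minimal sketch; the variable names are my own, and the variance is the sample variance (dividing by n - 1).

```python
import statistics

# Petal counts as listed in the post.
large_horizontal = [12, 9, 10, 9, 11, 13, 3, 13, 11, 7]
large_vertical = [12, 6, 7, 6, 11, 6, 6, 10, 10, 4, 7]
small = [3, 3, 2, 6, 8, 4, 8, 11, 4, 5, 7, 6]

large = large_horizontal + large_vertical

large_mean = statistics.mean(large)
large_var = statistics.variance(large)  # sample variance (n - 1 denominator)
small_mean = statistics.mean(small)
ratio = large_mean / small_mean

print(f"larger stones: n={len(large)}, mean={large_mean:.2f}, variance={large_var:.2f}")
print(f"smaller stones: n={len(small)}, mean={small_mean:.2f}")
print(f"ratio of means: {ratio:.2f}")
```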

The image above is called a Poissonness plot (Hoaglin 1980). This one is for the sample of petals from the larger stones. It allows us to graphically test whether a Poisson distribution is an appropriate model for the petal counts. Let x[k] be the count of stones collecting k petals. We plot log(x[k])+log(k!) on the vertical axis against k on the horizontal axis. For N stones with Poisson mean m, the expected x[k] is N exp(-m) m^k / k!, so log(x[k])+log(k!) = log(N) - m + k log(m). In such a plot, Poisson counts therefore fall upon a straight line with a slope equal to the logarithm of the Poisson mean. This is what we see for the larger stones. The plot for the smaller stones is also straight.
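The plotted points for the larger stones can be computed directly from the counts. As a rough check on straightness, the sketch below also fits an unweighted least-squares line and compares its slope with the logarithm of the sample mean. The line fit is my own addition for illustration and is not necessarily how the plot was judged.

```python
import math
from collections import Counter

# Petal counts for the 21 larger stones, as listed in the post.
large = [12, 9, 10, 9, 11, 13, 3, 13, 11, 7,
         12, 6, 7, 6, 11, 6, 6, 10, 10, 4, 7]

freq = Counter(large)  # freq[k] = x[k], the number of stones with k petals
ks = sorted(freq)

# Vertical coordinate: log(x[k]) + log(k!), using lgamma(k + 1) = log(k!).
ys = [math.log(freq[k]) + math.lgamma(k + 1) for k in ks]

# Unweighted least-squares slope through the (k, y) points.
k_bar = sum(ks) / len(ks)
y_bar = sum(ys) / len(ys)
slope = (sum((k - k_bar) * (y - y_bar) for k, y in zip(ks, ys))
         / sum((k - k_bar) ** 2 for k in ks))

log_mean = math.log(sum(large) / len(large))

for k, y in zip(ks, ys):
    print(f"k={k:2d}  x[k]={freq[k]}  y={y:.3f}")
print(f"fitted slope = {slope:.3f}, log(mean) = {log_mean:.3f}")
```

Only values of k that were actually observed contribute a point, since log(x[k]) is undefined when x[k] is zero.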

Kevin suggests testing our observed ratio to see if it is significantly different from the expected ratio of 1.5. Since we are dealing with discrete distributions, we can compute the exact probability distribution for this ratio. The larger stones have a length of 9.375. The shorter, square stones have a length of 6.25, for a ratio of 1.5. We take these as the null parameters of two independent Poisson distributions. Our samples of sizes 21 and 12 have sums that are also Poisson. The joint distribution is simply their product, allowing easy computation of the probabilities for the ratio of the means. Its discrete probability distribution is shown below (ignoring the small probability that the denominator is zero, 2.6 x 10^-33).

Although this appears to be a darkly colored continuous probability density, it is actually a discrete distribution of probability mass plotted as densely packed individual vertical lines or spikes of probability. For this null distribution our observed ratio of means, 1.56, has an upper tail p-value of 0.39. This, of course, is not significantly different from 1.5. In fact, our observed ratio would have to exceed 1.89 to be significant at the 5% level. Thanks for the prompting, Kevin.
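The upper-tail calculation can be sketched as below. This is my reading of the setup rather than the original computation: I assume the two sums are Poisson with means 21 x 9.375 = 196.875 and 12 x 6.25 = 75, compare ratios of sample means using exact integer arithmetic, ignore the case of a zero denominator, and truncate the sums at limits (A_MAX, B_MAX) chosen far out in the tails.

```python
import math

def poisson_pmf(k, lam):
    # Computed on the log scale to avoid overflow in lam**k and k!.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

N_LARGE, N_SMALL = 21, 12
MU_LARGE = N_LARGE * 9.375  # null mean of the sum over the larger stones
MU_SMALL = N_SMALL * 6.25   # null mean of the sum over the smaller stones

# Truncation limits (assumed; both lie many standard deviations above the means).
A_MAX, B_MAX = 500, 250

# tail_large[a] = P(sum over larger stones >= a)
tail_large = [0.0] * (A_MAX + 2)
for a in range(A_MAX, -1, -1):
    tail_large[a] = tail_large[a + 1] + poisson_pmf(a, MU_LARGE)

def upper_tail(obs_large_sum, obs_small_sum):
    """P(ratio of means >= observed ratio) under the null, skipping b = 0.

    (a/21)/(b/12) >= (A/21)/(B/12) is equivalent to a >= A*b/B.
    """
    total = 0.0
    for b in range(1, B_MAX + 1):
        a_min = (obs_large_sum * b + obs_small_sum - 1) // obs_small_sum  # ceiling
        if a_min <= A_MAX:
            total += poisson_pmf(b, MU_SMALL) * tail_large[a_min]
    return total

p_value = upper_tail(183, 67)          # observed sums of the petal counts
p_zero_denominator = math.exp(-MU_SMALL)

print(f"upper-tail p-value: {p_value:.2f}")
print(f"P(denominator is zero): {p_zero_denominator:.2e}")
```

The observed sums 183 and 67 are the totals of the petal counts for the 21 larger and 12 smaller stones.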