Here is an informative graphical representation of the correlation matrix of a weather data set with 16 variables. The image comes via the RevolutionAnalytics blog, from their post on big data available in R. It was produced by Rattle, the data mining package for R.

The correlations are drawn as concentration ellipses. Blue ellipses with positive slopes mark positively correlated variables: darker blue shading means a stronger positive correlation, lighter blue a weaker one. Correlations near zero appear as open, unfilled ellipses that are nearly circular. Negative correlations are shown the same way with red ellipses that slope downward: dark red means a strong negative correlation, and lighter red means a negative correlation closer to zero. This is a very useful tool for guiding the eye through the relationships among the many variables.

Of course, as is well known, correlation is an appropriate measure of the strength of the linear relationship between two variables only when their scatterplot has roughly the elliptical shape shown in these shaded icons. Examples abound of scatterplots for which this kind of representation is misleading. My favorite, below, from the book by Chambers et al., shows eight scatterplots that all have the same positive correlation of 0.7. Correlation, and its representation with an elliptical icon, is appropriate for only one of them. This exhorts us all to remember to look at the data.
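The claim that very different scatterplots can share one correlation is easy to check numerically. The sketch below does not reproduce the Chambers et al. figure; it uses synthetic data, and the helper names (`pearson`, `standardize`, `with_correlation`) are hypothetical ones chosen for illustration. It builds three y-series with quite different shapes against the same x (an elliptical cloud, a curve, and a straight line plus one outlier) and forces each to have a sample correlation of 0.7 with x.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(v):
    """Shift and scale v to mean 0 and (population) standard deviation 1."""
    n = len(v)
    m = sum(v) / n
    s = math.sqrt(sum((a - m) ** 2 for a in v) / n)
    return [(a - m) / s for a in v]


def with_correlation(x, shape, r):
    """Build y whose sample correlation with x is r.

    `shape` supplies the pattern of the part of y that x does not
    explain linearly; it must not be an exact linear function of x.
    """
    xs = standardize(x)
    zs = standardize(shape)
    # Remove the component of the shape that is linear in x, so the
    # remainder is uncorrelated with x, then rescale it to unit variance.
    slope = sum(a * c for a, c in zip(xs, zs)) / len(xs)
    resid = standardize([c - slope * a for a, c in zip(xs, zs)])
    # Mixing with weights r and sqrt(1 - r^2) gives correlation r.
    return [r * a + math.sqrt(1 - r * r) * e for a, e in zip(xs, resid)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 200
x = [rng.gauss(0, 1) for _ in range(n)]

# Illustrative shapes only; none of these come from the original figure.
shapes = {
    "elliptical cloud": [rng.gauss(0, 1) for _ in range(n)],
    "curved": [a * a for a in x],
    "single outlier": [0.0] * (n - 1) + [1.0],
}

results = {
    name: pearson(x, with_correlation(x, z, 0.7)) for name, z in shapes.items()
}
for name, r in results.items():
    print(f"{name:18s} r = {r:.3f}")
```

By construction every series has a correlation of 0.7 with x, up to floating-point rounding, yet only the first would look like the ellipse icon if plotted. Plotting each y against x makes the difference obvious, which is exactly why the number alone is not enough.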
