Correlation Resources

This is not a Stats class

Even though a solid knowledge of statistics would help inform your research methods, we don’t have the luxury of providing a 2 or 3 course sequence in Stats. Start with a supplemental 2 Summers chapter 14a. Here I have begun to enter the content from HuskyCT into the TopHat text format; let me know if it reads better below or in the glossy textbook format. Chapter 14 in your TopHat text has a pretty good overview of key ideas, including variance, confidence intervals, degrees of freedom, standardizing your data with Z scores, and on to the strength of relationships between 2 variables with covariance and correlation. Only this last idea, the Pearson correlation statistic, is the focus of this module. It leads to several other basic statistics that are very common in social research, such as regression (predicting one variable on the basis of another). Note that prediction is different from simply detecting whether 2 variables are related. Steady your courage and let’s consider correlation, jumping right into the stats end, but with our focus remaining on the design and methods of research, not the stats.

They are highly correlated, one must cause the other, right?

Consider some of the implications of assuming that correlation equals causality. From the spurious correlations site, we would have to conclude that increased spending on space science causes suicides by hanging, strangulation and suffocation, since the two are nearly perfectly correlated (r = .9979). We would also have to conclude that eating more cheese causes you to get entangled to death in your bedsheets (r = .947). Next consider these two correlations:

  • The more firemen are sent to a fire, the more damage is done (high positive correlation).
  • Children who get tutored get worse grades than children who do not get tutored (high positive correlation between tutoring and low grades).

So let’s think about this. Causality is a one-way relationship.

  • Firemen cause more fire damage.
  • Tutoring causes worse grades.

Correlation, by contrast, is two-way.

  • More firemen are present when there is more damage and more damage is seen at fires with many firemen.
  • Students who get low grades get more tutoring and more tutoring is given to students with low grades.

In fact, often some third factor, undisclosed or as yet unknown, is at work.

  • Large fires bring more firemen and cause more damage.
  • Poor study skills cause low grades and result in more tutoring.
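The third-factor explanation can be tried out with a small simulation. This is only a sketch under made-up assumptions: the numbers, the noise levels, and the names `fire_size`, `firemen`, and `damage` are all hypothetical. In this toy model, fire size drives both the number of firemen and the damage, and firemen never enter the damage formula at all.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical model: fire size is the hidden third factor.
# It drives BOTH variables; firemen do not appear in the damage formula.
fire_size = [rng.uniform(1, 10) for _ in range(5000)]
firemen = [3 * s + rng.gauss(0, 2) for s in fire_size]
damage = [5 * s + rng.gauss(0, 4) for s in fire_size]

# Across all fires, firemen and damage look strongly related.
r_all = pearson_r(firemen, damage)

# Now compare only fires of nearly the same size (the third factor held steady).
similar = [i for i, s in enumerate(fire_size) if 5.0 <= s < 5.5]
r_similar = pearson_r([firemen[i] for i in similar],
                      [damage[i] for i in similar])

print(f"all fires:          r = {r_all:.2f}")
print(f"similar-size fires: r = {r_similar:.2f}")
```

Across all fires the correlation comes out high, but among fires of about the same size it should fall close to zero. The relationship between firemen and damage in this model comes entirely from fire size.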

[Image: stick figure correlation cartoon]

Correlations of +1 and -1, what does perfectly correlated mean?

Student absences are negatively correlated with grades (as absences increase, grades go down). Is this causal?

As students exercise more, their weight and body fat are less, so negatively correlated. Is this causal?

On a snow day, the more it snows, the fewer drivers are on the roads. Is this negative correlation causal?

So when every increase in one thing comes with a proportional decrease in another, that’s a perfect negative correlation, with a value of -1.0. These are also known as inverse relationships.

As the temperature goes up, ice cream sales increase. Does the Sun cause ice cream sales?

The more gasoline put into your car, the farther you can drive it.

As the amount of tread on your car tires goes down, the amount of road traction is reduced. Don’t get confused: you could also state this as the more tread, the more traction. Think of this as looking right or left, up or down the trend line shown here.

So these are positive correlations. Every time one thing goes up or down, the other thing does the same. This is a correlation that nears +1.0, or perfect positive correlation.

And of course there are variables with no consistent relationship at all. These have a correlation around zero. Most real data do not fall perfectly on a straight line, but instead visibly trend one way or another, so the correlation is something less than perfect.

This is described further in TopHat Chapter 14, particularly section 14.7.

If you’d like an interactive visual of the magnitude of a correlation and how that looks with regard to a scatterplot of scores, there is a nice tool by Berkeley stats Professor P.B. Stark available. I suggest you click all the lines off and vary the r correlation; perhaps you can even imagine the regression line. Have fun with this.
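The +1, -1, near-zero, and "less than perfect" cases described above can also be checked with a few lines of code. The sketch below computes Pearson's r directly from its definition (the covariance of the two variables divided by the product of their spreads). The tiny data sets are invented purely to illustrate each case.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y move together around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable spreads around its own mean
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]

# Invented example data, one list per case
cases = {
    "perfect positive": [2 * v + 3 for v in x],  # rises in lockstep with x
    "perfect negative": [10 - v for v in x],     # falls in lockstep with x
    "noisy positive":   [2, 4, 5, 4, 5],         # trends upward, not on a line
    "no trend":         [3, 1, 2, 1, 3],         # no overall rise or fall
}

for label, y in cases.items():
    print(f"{label:>16}: r = {pearson_r(x, y):+.2f}")

# prints:
# perfect positive: r = +1.00
# perfect negative: r = -1.00
#   noisy positive: r = +0.77
#         no trend: r = +0.00
```

The "noisy positive" case is the one most like real data: the points trend upward without falling on a straight line, so r lands between 0 and +1.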

Research Methods and Correlations

It is interesting to think how using correlation leads to slightly different Research Questions and methods. For instance, when comparing a tech-using class with a comparison group, instead of asking whether a technology causes a particular difference, you would be asking about the relationship between tech and the target (dependent) variable.

  • How is increased use of a digital tuner related to playing a musical instrument in tune? Does it vary linearly (more use associated with more playing in tune, or the opposite)?
  • How is more use of Google Docs related to student engagement? Does it vary linearly (more use of GDocs associated with more student engagement, or the opposite)?

There are ethical implications of this, since you are seeking evidence of how things vary together instead of seeking causality. You can try a technology with all your students, rather than withholding it from a comparison group, to see whether those who use it more show any stable relationship with your dependent variables of interest, and use correlation to quantify that relationship.

Other Readings:

Ariely, Loewenstein, & Prelec (2003) show how correlation can be used to demonstrate how easily human judgment can be influenced and biased.

Essential Understanding:

It is very easy to mistake correlation for causality. Things can appear related because they really are, just by chance, or because both are related to some third variable that causes them to change together. Many complex issues are oversimplified by use of correlations. Determining causality requires direct experimental intervention. Correlations can help us construct theories, but intervention (experimental) research is required to test for causation.