In a multi-variable scientific inquiry process, correlation is used to describe the degree of dependence between two variables.

I. Introduction to correlational reasoning

In a multi-variable scientific inquiry process, variables can be either independent or dependent, and relationships can exist among these variables.  In everyday life, people pay attention to correlational relationships.  Examples include the correlations between smoking and the chance of getting lung cancer, between drinking tea and losing weight, between the physical statures of parents and their offspring, and between the demand for a product and its price.  One variable may have a very strong dependence, a weak dependence, or no dependence at all on another variable.  Correlation is used to describe the degree of dependence between two variables.  (Correlation can exist among more than two variables, but our discussion focuses only on the link between two variables.)

Lawson’s definition of correlational reasoning

Correlational reasoning is defined as the thought patterns individuals use to determine the strength of mutual or reciprocal relationships between variables.  Correlational reasoning is fundamental to the establishment of relationships between variables; such relationships allow for the making of predictions during scientific exploration. (Lawson, Adi, & Karplus, 1979)

Though there are variations on the definition of correlation, there are two typical features that are always present:

1. When we see two variables, there are two different ways to look at their relationship.  One is to see if there is a link between them.  The other is to see how these two variables are related, that is, the mechanism of their relationship.  Researchers studying correlational reasoning mainly focus on people’s ability to identify relationships between variables and make predictions given a data set.  Correlational reasoning does not require people to identify the existence of a mechanism or a causal relationship between two variables.

2. Correlational reasoning is closely related to conditional probability.  When a correlation exists between variables A and B, knowing that A has occurred changes the probability of B, and vice versa.
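
The link to conditional probability in point 2 can be stated a little more formally. The following is a sketch in standard probability notation, which the discussion above does not itself use: if A and B are unrelated, learning that A occurred leaves the probability of B unchanged, whereas a correlation between them means the conditional and unconditional probabilities differ.

```latex
\text{No link (independence):}\quad P(B \mid A) = P(B)
\qquad
\text{Link (correlation):}\quad P(B \mid A) \neq P(B)
```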


II. Sample questions of correlational reasoning

1. Question ID: 20300121100

Farmer Brown was observing the mice that live in his field.  He discovered that all of them were either fat or thin.  Also, all of them had either black tails or white tails.  This made him wonder if there might be a link between the size of the mice and the color of their tails.  So he captured all of the mice in one part of his field and observed them. The picture shows the mice that he captured.

Based on the captured mice, do you think there is a link between the size of the mice and the color of their tails?

A. appears to be a link
B. appears not to be a link
C. cannot make a reasonable guess

Answer: A

Note:  This question is asking people to judge whether or not there exists a correlation between the size of the mice and the color of their tails.  We should compare the 4 groups of mice based on their properties (fat or thin, black or white tail), which gives us the following table:


                          Fat mouse     Thin mouse

Mouse with black tail

Mouse with white tail

We can see that most of the fat mice have black tails while most of the thin mice have white tails.  Therefore, there exists a correlation between the size of the mouse and the color of its tail.

Viewed another way, this is a conditional probability question.  There is a link between the size of the mouse and the color of its tail.  This means that if we catch a mouse with a black tail, it will most likely be a fat mouse; if we catch a thin mouse, it will most likely have a white tail.  We can ask the question in multiple ways.  For instance, suppose you catch a mouse with a white tail; what size do you think it will most likely be?
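
The judgment in this note can be made concrete with a small sketch. The counts below are hypothetical (the actual counts from Farmer Brown's picture are not reproduced here), and the function names are illustrative only. The sketch compares the proportion of black tails among fat mice with the proportion among thin mice, and also computes the phi coefficient, one common measure of association for a 2-by-2 table.

```python
import math

# Hypothetical counts, NOT the counts from Farmer Brown's picture.
# Keys are (size, tail_color).
counts = {
    ("fat", "black"): 8,
    ("fat", "white"): 2,
    ("thin", "black"): 3,
    ("thin", "white"): 7,
}

def conditional_proportion(counts, size, tail):
    """Proportion of mice of the given size that have the given tail color."""
    total_for_size = sum(n for (s, _), n in counts.items() if s == size)
    return counts[(size, tail)] / total_for_size

def phi_coefficient(counts):
    """Phi coefficient of the 2-by-2 table (between -1 and 1; 0 means no link)."""
    a = counts[("fat", "black")]
    b = counts[("fat", "white")]
    c = counts[("thin", "black")]
    d = counts[("thin", "white")]
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator

p_black_given_fat = conditional_proportion(counts, "fat", "black")
p_black_given_thin = conditional_proportion(counts, "thin", "black")
phi = phi_coefficient(counts)
print(p_black_given_fat, p_black_given_thin, round(phi, 3))
```

If the two proportions were about equal, phi would be near zero and answer B ("appears not to be a link") would be the reasonable choice. With these made-up counts the proportions differ clearly, which corresponds to answer A.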


2. The following is a probability-based question:

You and your friends bought some apples from the store. The apples are either small or large and are either dark red or light yellow (see the diagram below).  Imagine that all the apples were put into a bag so that you cannot see the color or size of the apple from outside.

Suppose you close your eyes and reach into the bag to pull out an apple. You feel that it is a really big apple. What color do you think it will most likely be?

A. red
B. yellow     
C. equally possible for it to be yellow or red
D. cannot be determined

Answer: A

Note:  This question asks people to make a prediction based on the correlation between two variables.  To answer it correctly, we should first judge whether or not there exists a correlation between the variables.  This judgment step is similar to the previous question:  we can see that most of the red apples are big while most of the yellow apples are small.  Therefore, there is a correlation between apple color (red or yellow) and apple size (big or small).  Knowing this correlation, we can predict that if we pick a big apple, it will most likely be red, and if we pick a small apple, it will most likely be yellow.
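
The prediction step can be sketched in a few lines. The apple counts here are hypothetical (the diagram's actual counts are not reproduced), and `color_probabilities` and `most_likely_color` are illustrative names. The idea is simply to pick the color with the highest conditional probability given the size that was felt.

```python
# Hypothetical counts, NOT the counts from the apple diagram.
# Keys are (size, color).
apple_counts = {
    ("big", "red"): 6,
    ("big", "yellow"): 2,
    ("small", "red"): 1,
    ("small", "yellow"): 5,
}

def color_probabilities(counts, size):
    """Return P(color | size) for each color, estimated from the counts."""
    matching = {color: n for (s, color), n in counts.items() if s == size}
    total = sum(matching.values())
    return {color: n / total for color, n in matching.items()}

def most_likely_color(counts, size):
    """Predict the color with the highest conditional probability for this size."""
    probabilities = color_probabilities(counts, size)
    return max(probabilities, key=probabilities.get)

print(color_probabilities(apple_counts, "big"))
print(most_likely_color(apple_counts, "big"))
print(most_likely_color(apple_counts, "small"))
```

With these made-up counts, a big apple is predicted to be red and a small one yellow, matching the reasoning in the note.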


III. Importance of correlational reasoning

Correlational reasoning plays a role in both everyday life and scientific research.  There are two main skills associated with this reasoning.  One skill is the ability to make a judgment about whether or not there exists a link between two variables.  The other skill is using existing correlations to make reasonable predictions to guide our decisions in life and in research.

1. An example of correlational reasoning in everyday life

Because there is a correlation between a mushroom’s color and its being poisonous, when people see a brightly colored mushroom, they tend to throw it away rather than eat it.

2. Correlational reasoning in science

In scientific research, the first step of data analysis is to check whether or not there exist correlations between multiple variables.  Then scientists will compare two variables which seem to correlate with each other and make further judgments about the relationship between them.  This could include finding possible causal or mechanistic relationships, though correlation does not necessarily imply causation.


IV. Research on correlational reasoning

Within Piaget's developmental theory, correlational reasoning is considered to be a formal stage acquisition due to its hypothesized dependence on the development of formal operations (Inhelder & Piaget, 1958).  Although many other investigators have taken issue with Piaget's theoretical model of formal operational thought, it is generally agreed that correlational reasoning is developmentally advanced as it involves an understanding of relations between previous relations, e.g., proportions (Lunzer, 1965), acceptance of ambiguity as a starting point for further reasoning (Collis, 1972), and second-order operations (Lovell, 1971).

Development of correlational reasoning, as measured by the mice puzzle (and the fish puzzle not shown here), appears not to be enhanced by the study of high school biology (Lawson, Adi, & Karplus, 1979).

Deficiencies in reasoning abilities in the areas of proportionality, probability, and correlational reasoning can be successfully addressed with classroom intervention (Vass, Schiller, & Nappi, 2000).


The following literature review is from Zieffler, A. (2006), A Longitudinal Investigation of the Development of College Students’ Reasoning About Bivariate Data During an Introductory Statistics Course, unpublished Ph.D. dissertation, University of Minnesota, pages 11-23.

Findings from the Field of Psychology

People’s prior beliefs about the relationship between two variables have a great deal of influence on their judgments of the covariation between those variables (e.g., Alloy & Tabachnik, 1984; Crocker, 1982; Jennings, Amabile, & Ross, 1982; Kuhn, Amsel, & O’Loughlin, 1988; Kuhn, Garcia-Mila, Zohar, & Andersen, 1995; Nisbett & Ross, 1980; Peterson, 1980; Smedslund, 1963; Snyder, 1981; Snyder & Swann, 1978; Trolier & Hamilton, 1986; Ward & Jenkins, 1965; Wason & Johnson-Laird, 1972).

This finding is related to another major finding from the field of psychology, that of illusory correlation. Chapman (1967) defined illusory correlation as a perceived correlation between two events that “(a) are not correlated, or (b) are correlated to a lesser extent than reported, or (c) are correlated in the opposite direction from that which is reported” (p. 151). The finding of illusory correlation has been very consistent in the psychological literature (e.g., Chapman & Chapman, 1967, 1969; Crocker, 1981; Fiedler, 1991; Hamilton & Gifford, 1976; Hamilton & Rose, 1980; Haslam & McGarty, 1994; McGahan, Flynn, Williamson, & McDougal, 1997; McGahan, McDougal, Williamson, & Pryor, 2000; McGarty, Haslam, Turner, & Oakes, 1993; Mullen & Johnson, 1990; Yates, McGahan, & Williamson, 2000).

A second robust finding from the psychological research has suggested that people tend not to treat the four cells of a 2-by-2 contingency table as equally important.  In fact, the findings suggest people’s judgments seem to be most influenced by the joint presence of variables and least influenced by the joint absence of variables (e.g., Kao & Wasserman, 1993; Levin, Wasserman, & Kao, 1993; Lipe, 1990; Schustack & Sternberg, 1981; Wasserman, Dorner, & Kao, 1990).
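
To see why all four cells matter, consider a minimal sketch with two hypothetical tables that share the same joint-presence count but differ in the other cells. It uses Delta-P, the difference between the proportion of cases with the second variable present when the first is present and the same proportion when the first is absent, which is a standard contingency measure. The cell labels a, b, c, d, the counts, and the function name are assumptions made for illustration and are not taken from the studies cited above.

```python
def delta_p(a, b, c, d):
    """Delta-P for a 2-by-2 table.

    a: both variables present         b: first present, second absent
    c: first absent, second present   d: both absent
    """
    return a / (a + b) - c / (c + d)

# Two hypothetical tables with the same joint-presence cell (a = 20).
table_positive = dict(a=20, b=5, c=5, d=20)
table_none = dict(a=20, b=5, c=20, d=5)

print(delta_p(**table_positive))  # clearly positive contingency
print(delta_p(**table_none))      # no contingency
```

A judge who attends mainly to cell a would rate both tables alike, although only the first shows a relationship once cells c and d are taken into account.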

Other findings that tend to be consistent throughout this body of literature are that subjects have difficulty when the relationship is negative (e.g., Beyth-Marom, 1982; Erlick, 1966; Erlick & Mills, 1967; Gray, 1968), and that people’s covariational judgment of the relationship between two variables tends to be less than optimum (i.e., smaller than the actual correlation presented in the data or graph) (e.g., Bobko & Karren, 1979; Cleveland, Diaconis, & McGill, 1982; Jennings, Amabile, & Ross, 1982; Konarski, 2005; Kuhn, 1989; Lane, Anderson, & Kellam, 1985; Meyer, Taieb, & Flascher, 1997; Shaklee & Mims, 1981; Shaklee & Paszek, 1985). Still another consistent finding in these studies is that subjects have a tendency to form causal relationships based on a covariational analysis (e.g., Crocker, 1981; Heider, 1958; Inhelder & Piaget, 1958; Kelley, 1967; Ross & Cousins, 1993; Smedslund, 1963; Shaklee & Tucker, 1980).

There is also a fair amount of research from this field that has examined the conditions and accommodations under which people tend to make better covariational judgments. For instance, researchers have found that subjects tend to make more accurate judgments when the variables to be examined are continuous rather than dichotomous (e.g., Beach & Scopp, 1966; Erlick & Mills, 1967; Jennings et al., 1982), and other studies from this field have suggested that certain accommodations such as detailed instructions (Alloy & Abramson, 1979), easy-to-process formats (Ward & Jenkins, 1965), subjects being told non-contingency is possible (Peterson, 1980), and low frequency of data/cases (Inhelder & Piaget, 1958) might help subjects more accurately judge covariation. Subjects have also been shown to make more accurate judgments when data are presented simultaneously rather than one case at a time (Seggie & Endersby, 1972; Smedslund, 1963).

Findings from the Fields of Science and Mathematics Education

One prominent study from the field of mathematics education (Carlson et al., 2002) examined the mental actions that students apply when reasoning about covariation. The five mental actions of covariational reasoning that were used by Carlson et al. (2002) to classify students were as follows:

• Mental Action One (MA1): The coordination of the value of one variable with changes in the other.

• Mental Action Two (MA2): The coordination of the direction of change of one variable with changes in the other.

• Mental Action Three (MA3): The coordination of the amount of change of one variable with the amount of change in the other.

• Mental Action Four (MA4): The coordination of the average rate of change of the function with uniform increments of change in the input variable.

• Mental Action Five (MA5): The coordination of the instantaneous rate of change of the function with continuous change in the independent variable for the entire domain of the function.

The study just described showed that most students could determine the direction of change (MA2) but that many had difficulties constructing images of continuous rate of change, even after completion of a second course in calculus. The authors also found that students have particular problems representing and interpreting graphical displays. In some cases, mathematics education researchers have found that kinesthetic or physical enactment of certain problems appeared to aid students in their ability to reason correctly about covariation (e.g., Carlson, 1998; Carlson, 2002; Carlson et al., 2002; Carlson, Larsen, & Jacobs, 2001). These studies have suggested the need for teachers to have students think about covariation as it occurs in functions in terms of real-life dynamic events.

Studies by Researchers in Statistics Education      

Research has suggested that technology seems to improve subjects’ strategies to evaluate and judge covariation (e.g., Batanero et al., 1997; Batanero et al., 1998; Morris, 1997; Stockburger, 1982).

Subjects have trouble with negative relationships (e.g., Batanero et al., 1997; Batanero et al., 1998; Morris, 1997); subjects perceive a relationship only in the positive direction (e.g., Batanero et al., 1997; Batanero et al., 1998).

Subjects’ trouble with negative relationships is especially pronounced when those relationships run contrary to their prior beliefs (Moritz, 2004).

Subjects tend to form causal relationships from correlational data (e.g., Batanero et al., 1997; Batanero et al., 1998).


V. References

Piaget, J. (1972). Intellectual evolution from adolescence to adulthood. Human Development, 15, 1-12.

Vass, E., Schiller, D., & Nappi, A. J. (2000). The effects of instructional intervention on improving proportional, probabilistic, and correlational reasoning skills among undergraduate education majors. Journal of Research in Science Teaching, 37, 981-995.

Lawson, A. E., Adi, H., & Karplus, R. (1979). Development of correlational reasoning in secondary schools: Do biology courses make a difference? The American Biology Teacher, 41, 420-425.

Zieffler, A. (2006). A longitudinal investigation of the development of college students’ reasoning about bivariate data during an introductory statistics course. Unpublished Ph.D. dissertation, University of Minnesota.

Alloy, L. B., & Tabachnik, N. (1984). Assessment of covariation by humans and animals: The joint influence of prior expectations and current situational information. Psychological Review, 91, 112-149.

Chapman, L. J. (1967). Illusory correlation in observational report. Journal of Verbal Learning and Verbal Behavior, 6, 151-155.

Kao, S. F., & Wasserman, E. A. (1993). Assessment of an information integration account of contingency judgment with examination of subjective cell importance and method of information presentation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 1363-1386.

Beyth-Marom, R. (1982). Perception of correlation reexamined. Memory & Cognition, 10, 511-519.

Beach, L. R., & Scopp, T. S. (1966). Inferences about correlations. Psychonomic Science, 6, 253-254.

Carlson, M. (1998). A cross-sectional investigation of the development of the function concept. In E. Dubinsky, A. H. Schoenfeld, & J. J. Kaput (Eds.), Research in collegiate mathematics education III, Issues in mathematics education, 7, 115-162.

Carlson, M. P. (2002). Physical enactment: A powerful representational tool for understanding the nature of covarying relationships. In F. Hitt (Ed.), Representations and mathematics visualization (pp. 63-77). Mexico: CINVESTAV.

Batanero, C., Godino, J. D., & Estepa, A. (1998). Building the meaning of statistical association through data analysis activities. In J. Garfield & D. Ben-Zvi (Eds.), First International Research Forum on Statistical Reasoning, Thinking, and Literacy (pp. 37-53). Kibbutz Be’eri, Israel.