
How to Write a Hypothesis for Correlation

A hypothesis for correlation predicts a statistically significant relationship.

A hypothesis is a testable statement about how something works in the natural world. While some hypotheses predict a causal relationship between two variables, other hypotheses predict a correlation between them. According to the Research Methods Knowledge Base, a correlation is a single number that describes the relationship between two variables. If you do not predict a causal relationship or cannot measure one objectively, state clearly in your hypothesis that you are merely predicting a correlation.

Research the topic in depth before forming a hypothesis. Without adequate knowledge about the subject matter, you will not be able to decide whether to write a hypothesis for correlation or causation. Read the findings of similar experiments before writing your own hypothesis.

Identify the independent variable and dependent variable. Your hypothesis will be concerned with what happens to the dependent variable when a change is made in the independent variable. In a correlation, the two variables undergo changes at the same time in a significant number of cases. However, this does not mean that the change in the independent variable causes the change in the dependent variable.

Construct an experiment to test your hypothesis. In a correlative experiment, you must be able to quantify the relationship between the two variables. This means measuring how consistently changes in one variable accompany changes in the other across your sample.

Establish the requirements of the experiment with regard to statistical significance. Tell readers exactly how strong the correlation must be, and at what confidence level, to count as statistically significant. This threshold varies considerably by field: a highly technical scientific study may demand a 99 percent confidence level, while a 90 percent confidence level may suffice in a sociological study. Look at other studies in your particular field to determine the conventional requirements for statistical significance.

State the null hypothesis. The null hypothesis gives an exact value, typically zero, that implies there is no correlation between the two variables. If your results fail to exceed that value at the required significance level, the data do not provide evidence that the variables correlate.

Record and summarize the results of your experiment. State whether or not the experiment met the minimum requirements of your hypothesis in terms of both percentage and significance.
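As a concrete sketch of these steps in Python: the data and variable names below are hypothetical, invented purely for illustration. The function scipy.stats.pearsonr returns the sample correlation r and a two-sided p-value for the null hypothesis of zero correlation.

```python
# Hedged sketch: testing for a correlation between two hypothetical variables.
# The data below are invented placeholder measurements, not real results.
from scipy import stats

hours_studied = [2, 4, 5, 7, 8, 10, 11, 13]       # hypothetical independent variable
exam_scores = [52, 58, 63, 70, 74, 80, 83, 90]    # hypothetical dependent variable

# pearsonr returns the sample correlation r and a two-sided p-value
# for the null hypothesis H0: rho = 0.
r, p = stats.pearsonr(hours_studied, exam_scores)

alpha = 0.05  # significance threshold; conventions vary by field
if p < alpha:
    decision = "reject H0: the data show a statistically significant correlation"
else:
    decision = "fail to reject H0: no evidence of correlation"
```

Note that rejecting the null hypothesis here supports a correlation, not a causal claim.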

References

  • University of New England: "Steps in Hypothesis Testing for Correlation," 2000.
  • William M.K. Trochim, Research Methods Knowledge Base: "Correlation," 2006.
  • Science Buddies: "Hypothesis."

About the Author

Brian Gabriel has been a writer and blogger since 2009, contributing to various online publications. He earned his Bachelor of Arts in history from Whitworth University.


Correlation coefficient review

What is a correlation coefficient?

The correlation coefficient r measures the direction and strength of a linear relationship between two quantitative variables.

  • It always has a value between −1 and 1.
  • Strong positive linear relationships have values of r closer to 1.
  • Strong negative linear relationships have values of r closer to −1.
  • Weaker relationships have values of r closer to 0.
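These properties can be checked directly from the defining formula. The following minimal Python sketch (data invented for illustration) computes r from its definition:

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient r (always between -1 and 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # perfect positive line, r = 1
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # perfect negative line, r = -1
r_weak = pearson_r(x, [5, 1, 4, 2, 3])   # no clear trend, r near 0
```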

Conducting a hypothesis test for the population correlation coefficient ρ

There is one more point we haven't stressed yet in our discussion about the correlation coefficient r and the coefficient of determination r²: namely, the two measures summarize the strength of a linear relationship in samples only. If we obtained a different sample, we would obtain different correlations, different r² values, and therefore potentially different conclusions. As always, we want to draw conclusions about populations, not just samples. To do so, we either have to conduct a hypothesis test or calculate a confidence interval. In this section, we learn how to conduct a hypothesis test for the population correlation coefficient ρ (the Greek letter "rho").

Incidentally, where does this topic fit in among the four regression analysis steps?

  • Model formulation
  • Model estimation
  • Model evaluation
  • Model use

It belongs to model use: it's a situation in which we use the model to answer a specific research question, namely whether or not a linear relationship exists between two quantitative variables.

In general, a researcher should use the hypothesis test for the population correlation ρ to learn whether a linear association exists between two variables when it isn't obvious which variable should be regarded as the response. Let's clarify this point with examples of two different research questions.

We previously learned that to evaluate whether or not a linear relationship exists between skin cancer mortality and latitude, we can perform either of the following tests:

  • t -test for testing H 0 : β 1 = 0
  • ANOVA F -test for testing H 0 : β 1 = 0

That's because it is fairly obvious that latitude should be treated as the predictor variable and skin cancer mortality as the response. But suppose we want to evaluate whether or not a linear relationship exists between a husband's age and his wife's age. In this case, one could treat the husband's age as the response, or one could just as reasonably treat the wife's age as the response; either way, the sample correlation is the same:

Pearson correlation of HAge and WAge = 0.939

In cases such as these, we answer our research question concerning the existence of a linear relationship by using the t -test for testing the population correlation coefficient H 0 : ρ = 0.

Let's jump right to it! We follow standard hypothesis test procedures in conducting a hypothesis test for the population correlation coefficient ρ . First, we specify the null and alternative hypotheses:

Null hypothesis H 0 : ρ = 0 Alternative hypothesis H A : ρ ≠ 0 or H A : ρ < 0 or H A : ρ > 0

Second, we calculate the value of the test statistic using the following formula:

Test statistic :  \(t^*=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\) 

Third, we use the resulting test statistic to calculate the P -value. As always, the P -value is the answer to the question "how likely is it that we’d get a test statistic t* as extreme as we did if the null hypothesis were true?" The P -value is determined by referring to a t- distribution with n -2 degrees of freedom.

Finally, we make a decision:

  • If the P -value is smaller than the significance level α, we reject the null hypothesis in favor of the alternative. We conclude "there is sufficient evidence at the α level to conclude that there is a linear relationship in the population between the predictor x and response y ."
  • If the P -value is larger than the significance level α, we fail to reject the null hypothesis. We conclude "there is not enough evidence at the α level to conclude that there is a linear relationship in the population between the predictor x and response y ."

Let's perform the hypothesis test on the husband's age and wife's age data in which the sample correlation based on n = 170 couples is r = 0.939. To test H 0 : ρ = 0 against the alternative H A : ρ ≠ 0, we obtain the following test statistic:

\[t^*=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}=\frac{0.939\sqrt{170-2}}{\sqrt{1-0.939^2}}=35.39\]

To obtain the P-value, we need to compare the test statistic to a t-distribution with 168 degrees of freedom (since 170 − 2 = 168). In particular, we need to find the probability that we'd observe a test statistic more extreme than 35.39 and then, since we're conducting a two-sided test, multiply the probability by 2. We can let statistical software such as Minitab do all of the dirty work for us; it reports a P-value of 0.000, that is, P < 0.001.
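The test statistic is also easy to reproduce without Minitab. Here is a small Python sketch of the calculation above, using the values from the husband's and wife's age example:

```python
import math

r, n = 0.939, 170  # sample correlation and sample size from the example above

# t* = r * sqrt(n - 2) / sqrt(1 - r^2)
t_star = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
# t_star comes out to roughly 35.39, matching the hand calculation
```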

It should be noted that the three hypothesis tests we learned for testing the existence of a linear relationship (the t-test for H 0 : β 1 = 0, the ANOVA F-test for H 0 : β 1 = 0, and the t-test for H 0 : ρ = 0) will always yield the same results. For example, if we treat the husband's age ("HAge") as the response and the wife's age ("WAge") as the predictor, each test yields a P-value of 0.000... < 0.001.

And similarly, if we treat the wife's age ("WAge") as the response and the husband's age ("HAge") as the predictor, each test yields a P-value of 0.000... < 0.001.

Technically, then, it doesn't matter what test you use to obtain the P -value. You will always get the same P -value. But, you should report the results of the test that make sense for your particular situation:

  • If one of the variables can be clearly identified as the response, report the results of the t-test or F-test for testing H 0 : β 1 = 0. (Does it make sense to use x to predict y?)
  • If it is not obvious which variable is the response, report that you conducted a t -test for testing H 0 : ρ = 0. (Does it only make sense to look for an association between x and y ?)

One final note... as always, we should clarify when it is okay to use the t-test for testing H 0 : ρ = 0. The guidelines are a straightforward extension of the "LINE" assumptions made for the simple linear regression model. It's okay to use the test when it is not obvious which variable is the response, provided that:

  • For each x, the y's are normal with equal variances.
  • For each y, the x's are normal with equal variances.
  • Either y can be considered a linear function of x, or x can be considered a linear function of y.
  • The (x, y) pairs are independent.

12.4 Testing the Significance of the Correlation Coefficient

The correlation coefficient, r , tells us about the strength and direction of the linear relationship between x and y . However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient r and the sample size n , together.

We perform a hypothesis test of the "significance of the correlation coefficient" to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.

The sample data are used to compute r , the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r , is our estimate of the unknown population correlation coefficient.

  • The symbol for the population correlation coefficient is ρ , the Greek letter "rho."
  • ρ = population correlation coefficient (unknown)
  • r = sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is "close to zero" or "significantly different from zero". We decide this based on the sample correlation coefficient r and the sample size n .

If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is "significant."

  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.
  • What the conclusion means: There is a significant linear relationship between x and y . We can use the regression line to model the linear relationship between x and y in the population.

If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is "not significant."

  • Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is not significantly different from zero."
  • What the conclusion means: There is not a significant linear relationship between x and y . Therefore, we CANNOT use the regression line to model a linear relationship between x and y in the population.
  • If r is significant and the scatter plot shows a linear trend, the line can be used to predict the value of y for values of x that are within the domain of observed x values.
  • If r is not significant OR if the scatter plot does not show a linear trend, the line should not be used for prediction.
  • If r is significant and if the scatter plot shows a linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed x values in the data.

PERFORMING THE HYPOTHESIS TEST

  • Null Hypothesis: H 0 : ρ = 0
  • Alternate Hypothesis: H a : ρ ≠ 0

WHAT THE HYPOTHESES MEAN IN WORDS:

  • Null Hypothesis H 0 : The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between x and y in the population.
  • Alternate Hypothesis H a : The population correlation coefficient IS significantly DIFFERENT FROM zero. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between x and y in the population.

DRAWING A CONCLUSION: There are two methods of making the decision. The two methods are equivalent and give the same result.

  • Method 1: Using the p -value
  • Method 2: Using a table of critical values

In this chapter of this textbook, we will always use a significance level of 5%, α = 0.05.

Using the p -value method, you could choose any appropriate significance level you want; you are not limited to using α = 0.05. But the table of critical values provided in this textbook assumes that we are using a significance level of 5%, α = 0.05. (If we wanted to use a different significance level than 5% with the critical value method, we would need different tables of critical values that are not provided in this textbook.)

METHOD 1: Using a p -value to make a decision

Using the TI-83, 83+, 84, 84+ Calculator

To calculate the p -value using LinRegTTEST: On the LinRegTTEST input screen, on the line prompt for β or ρ , highlight " ≠ 0 " The output screen shows the p-value on the line that reads "p =". (Most computer statistical software can calculate the p -value.)

  • If the p-value is less than the significance level (α = 0.05): Decision: Reject the null hypothesis. Conclusion: "There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero."
  • If the p-value is NOT less than the significance level (α = 0.05): Decision: DO NOT REJECT the null hypothesis. Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is NOT significantly different from zero."

Calculation notes:

  • You will use technology to calculate the p-value. The following describes the calculations behind the test statistic and the p-value:
  • The p-value is calculated using a t-distribution with n − 2 degrees of freedom.
  • The formula for the test statistic is \(t=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\). The value of the test statistic, t, is shown in the computer or calculator output along with the p-value. The test statistic t has the same sign as the correlation coefficient r.
  • The p-value is the combined area in both tails.

An alternative way to calculate the p -value (p) given by LinRegTTest is the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.
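For readers without a TI calculator, the same two-tailed p-value can be sketched in Python with scipy (the function name two_tailed_p is invented for illustration):

```python
from scipy import stats

def two_tailed_p(t_stat, n):
    """Two-tailed p-value for testing H0: rho = 0, using a t-distribution
    with n - 2 degrees of freedom. Mirrors 2*tcdf(abs(t), 10^99, n-2)."""
    # sf is the survival function 1 - cdf; doubling gives the area in both tails
    return 2 * stats.t.sf(abs(t_stat), df=n - 2)
```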

  • Consider the third exam/final exam example .
  • The line of best fit is: ŷ = -173.51 + 4.83 x with r = 0.6631 and there are n = 11 data points.
  • Can the regression line be used for prediction? Given a third exam score ( x value), can we use the line to predict the final exam score (predicted y value)?
  • H 0 : ρ = 0
  • H a : ρ ≠ 0
  • The p -value is 0.026 (from LinRegTTest on your calculator or from computer software).
  • The p -value, 0.026, is less than the significance level of α = 0.05.
  • Decision: Reject the Null Hypothesis H 0
  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score ( x ) and the final exam score ( y ) because the correlation coefficient is significantly different from zero.

Because r is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.
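The numbers in this example can be double-checked with a short Python sketch (scipy assumed available):

```python
import math
from scipy import stats

r, n = 0.6631, 11   # correlation and sample size from the third exam example

t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)   # test statistic
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)         # two-tailed p-value
# t_stat is roughly 2.66 and p_value roughly 0.026, so at alpha = 0.05
# we reject H0: rho = 0, in agreement with the LinRegTTest output
```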

METHOD 2: Using a table of Critical Values to make a decision

The 95% Critical Values of the Sample Correlation Coefficient Table can be used to give you a good idea of whether the computed value of r is significant or not. Compare r to the appropriate critical value in the table. If r is not between the positive and negative critical values, then the correlation coefficient is significant. If r is significant, then you may want to use the line for prediction.

Example 12.7

Suppose you computed r = 0.801 using n = 10 data points. df = n – 2 = 10 – 2 = 8. The critical values associated with df = 8 are –0.632 and +0.632. If r < negative critical value or r > positive critical value, then r is significant. Since r = 0.801 and 0.801 > 0.632, r is significant and the line may be used for prediction. If you view this example on a number line, it will help you.

Try It 12.7

For a given line of best fit, you computed that r = 0.6501 using n = 12 data points and the critical value is 0.576. Can the line be used for prediction? Why or why not?

Example 12.8

Suppose you computed r = –0.624 with 14 data points. df = 14 – 2 = 12. The critical values are –0.532 and 0.532. Since –0.624 < –0.532, r is significant and the line can be used for prediction.

Try It 12.8

For a given line of best fit, you compute that r = 0.5204 using n = 9 data points, and the critical value is 0.666. Can the line be used for prediction? Why or why not?

Example 12.9

Suppose you computed r = 0.776 and n = 6. df = 6 – 2 = 4. The critical values are –0.811 and 0.811. Since –0.811 < 0.776 < 0.811, r is not significant, and the line should not be used for prediction.

Try It 12.9

For a given line of best fit, you compute that r = –0.7204 using n = 8 data points, and the critical value is 0.707. Can the line be used for prediction? Why or why not?

THIRD-EXAM vs FINAL-EXAM EXAMPLE: critical value method

Consider the third exam/final exam example . The line of best fit is: ŷ = –173.51+4.83 x with r = 0.6631 and there are n = 11 data points. Can the regression line be used for prediction? Given a third-exam score ( x value), can we use the line to predict the final exam score (predicted y value)?

  • Use the "95% Critical Value" table for r with df = n – 2 = 11 – 2 = 9.
  • The critical values are –0.602 and +0.602
  • Since 0.6631 > 0.602, r is significant.
  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score ( x ) and the final exam score ( y ) because the correlation coefficient is significantly different from zero.

Example 12.10

Suppose you computed the following correlation coefficients. Using the table at the end of the chapter, determine if r is significant and the line of best fit associated with each r can be used to predict a y value. If it helps, draw a number line.

  • r = –0.567 and the sample size, n , is 19. The df = n – 2 = 17. The critical value is –0.456. –0.567 < –0.456 so r is significant.
  • r = 0.708 and the sample size, n , is nine. The df = n – 2 = 7. The critical value is 0.666. 0.708 > 0.666 so r is significant.
  • r = 0.134 and the sample size, n , is 14. The df = 14 – 2 = 12. The critical value is 0.532. 0.134 is between –0.532 and 0.532 so r is not significant.
  • r = 0 and the sample size, n , is five. No matter what the dfs are, r = 0 is between the two critical values so r is not significant.
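The critical values in the table are not arbitrary: since r is significant exactly when |t| exceeds the two-tailed t critical value, solving the test statistic formula for r reconstructs the table. A Python sketch (the function name critical_r is invented for illustration):

```python
import math
from scipy import stats

def critical_r(n, alpha=0.05):
    """Critical value of |r| for a two-tailed test of H0: rho = 0.
    Derived by solving t_crit = r*sqrt(n-2)/sqrt(1-r^2) for r."""
    df = n - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-tailed t critical value
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Reproduces the textbook's 95% table values, e.g.
# critical_r(10) is about 0.632, critical_r(14) about 0.532,
# critical_r(6) about 0.811, critical_r(19) about 0.456
```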

Try It 12.10

For a given line of best fit, you compute that r = 0 using n = 100 data points. Can the line be used for prediction? Why or why not?

Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between x and y in the sample data provides strong enough evidence so that we can conclude that there is a linear relationship between x and y in the population.

The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatterplot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.

  • There is a linear relationship in the population that models the average value of y for varying values of x . In other words, the expected value of y for each particular value lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population.)
  • The y values for any particular x value are normally distributed about the line. This implies that there are more y values scattered closer to the line than are scattered farther away. Assumption (1) implies that these normal distributions are centered on the line: the means of these normal distributions of y values lie on the line.
  • The standard deviations of the population y values about the line are equal for each value of x . In other words, each of these normal distributions of y values has the same shape and spread about the line.
  • The residual errors are mutually independent (no pattern).
  • The data are produced from a well-designed, random sample or randomized experiment.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.

Access for free at https://openstax.org/books/introductory-statistics/pages/1-introduction
  • Authors: Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Introductory Statistics
  • Publication date: Sep 19, 2013
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/introductory-statistics/pages/1-introduction
  • Section URL: https://openstax.org/books/introductory-statistics/pages/12-4-testing-the-significance-of-the-correlation-coefficient

© Jun 23, 2022 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.

Module 12: Linear Regression and Correlation

Hypothesis test for correlation, learning outcomes.

  • Conduct a linear regression t-test using p-values and critical values and interpret the conclusion in context

The correlation coefficient,  r , tells us about the strength and direction of the linear relationship between x and y . However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient r and the sample size n , together.

We perform a hypothesis test of the “ significance of the correlation coefficient ” to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.

The sample data are used to compute  r , the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we only have sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r , is our estimate of the unknown population correlation coefficient.

  • The symbol for the population correlation coefficient is ρ , the Greek letter “rho.”
  • ρ = population correlation coefficient (unknown)
  • r = sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient  ρ is “close to zero” or “significantly different from zero.” We decide this based on the sample correlation coefficient r and the sample size n .

If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is “significant.”

  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.
  • What the conclusion means: There is a significant linear relationship between x and y . We can use the regression line to model the linear relationship between x and y in the population.

If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is “not significant.”

  • Conclusion: “There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is not significantly different from zero.”
  • What the conclusion means: There is not a significant linear relationship between x and y . Therefore, we CANNOT use the regression line to model a linear relationship between x and y in the population.
  • If r is significant and the scatter plot shows a linear trend, the line can be used to predict the value of y for values of x that are within the domain of observed x values.
  • If r is not significant OR if the scatter plot does not show a linear trend, the line should not be used for prediction.
  • If r is significant and if the scatter plot shows a linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed x values in the data.

Performing the Hypothesis Test

  • Null Hypothesis: H 0 : ρ = 0
  • Alternate Hypothesis: H a : ρ ≠ 0

What the Hypotheses Mean in Words

  • Null Hypothesis H 0 : The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between x and y in the population.
  • Alternate Hypothesis H a : The population correlation coefficient IS significantly DIFFERENT FROM zero. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between x and y in the population.

Drawing a Conclusion

There are two methods of making the decision. The two methods are equivalent and give the same result.

  • Method 1: Using the p -value
  • Method 2: Using a table of critical values

In this chapter of this textbook, we will always use a significance level of 5%, α = 0.05.

Using the  p -value method, you could choose any appropriate significance level you want; you are not limited to using α = 0.05. But the table of critical values provided in this textbook assumes that we are using a significance level of 5%, α = 0.05. (If we wanted to use a different significance level than 5% with the critical value method, we would need different tables of critical values that are not provided in this textbook).

Method 1: Using a p -value to make a decision

Using the TI-83, 83+, 84, or 84+ calculator.

To calculate the  p -value using LinRegTTEST:

  • On the LinRegTTEST input screen, on the line prompt for β or ρ, highlight “≠ 0”.
  • The output screen shows the p-value on the line that reads “p =”.
  • (Most computer statistical software can calculate the  p -value).

If the p -value is less than the significance level ( α = 0.05)

  • Decision: Reject the null hypothesis.
  • Conclusion: “There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.”

If the p -value is NOT less than the significance level ( α = 0.05)

  • Decision: DO NOT REJECT the null hypothesis.
  • Conclusion: “There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is NOT significantly different from zero.”

Calculation Notes:

  • You will use technology to calculate the p -value. The following describes the calculations to compute the test statistic and the p -value:
  • The p -value is calculated using a t -distribution with n – 2 degrees of freedom.
  • The formula for the test statistic is [latex]\displaystyle{t}=\dfrac{{{r}\sqrt{{{n}-{2}}}}}{\sqrt{{{1}-{r}^{{2}}}}}[/latex]. The value of the test statistic, t , is shown in the computer or calculator output along with the p -value. The test statistic t has the same sign as the correlation coefficient r .
  • The p -value is the combined area in both tails.

Recall: ORDER OF OPERATIONS

First, find the numerator:

Step 1: Find [latex]n-2[/latex], and then take the square root.

Step 2: Multiply the value in Step 1 by [latex]r[/latex].

Second, find the denominator:

Step 3: Find the square of [latex]r[/latex], which is [latex]r[/latex] multiplied by [latex]r[/latex].

Step 4: Subtract this value from 1, [latex]1 -r^2[/latex].

Step 5: Find the square root of Step 4.

Third, divide the numerator by the denominator.

An alternative way to calculate the  p -value (p) given by LinRegTTest is the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.
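The calculation notes above can be reproduced outside the calculator. Below is a minimal sketch in Python using `scipy.stats.t` (the function name `corr_test` is illustrative, not part of any library):

```python
from math import sqrt
from scipy.stats import t as t_dist

def corr_test(r, n):
    """Test statistic and two-tailed p-value for H0: rho = 0."""
    df = n - 2
    t_stat = r * sqrt(df) / sqrt(1 - r**2)    # t = r*sqrt(n-2) / sqrt(1-r^2)
    # combined area in both tails; same as 2*tcdf(abs(t), 1E99, n-2)
    p_value = 2 * t_dist.sf(abs(t_stat), df)
    return t_stat, p_value

# third exam/final exam example: r = 0.6631, n = 11
t_stat, p = corr_test(0.6631, 11)
print(round(t_stat, 2), round(p, 3))  # t ≈ 2.66, p ≈ 0.026
```

Note that `t_stat` carries the same sign as r, so taking the absolute value before computing the upper-tail area gives the correct two-tailed p -value for either sign.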

THIRD-EXAM vs FINAL-EXAM EXAMPLE: p -value method

  • Consider the  third exam/final exam example (example 2).
  • The line of best fit is: [latex]\hat{y}[/latex] = -173.51 + 4.83 x  with  r  = 0.6631 and there are  n  = 11 data points.
  • Can the regression line be used for prediction?  Given a third exam score ( x  value), can we use the line to predict the final exam score (predicted  y  value)?
  • H 0 :  ρ  = 0
  • H a :  ρ  ≠ 0
  • The  p -value is 0.026 (from LinRegTTest on your calculator or from computer software).
  • The  p -value, 0.026, is less than the significance level of  α  = 0.05.
  • Decision: Reject the Null Hypothesis  H 0
  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score ( x ) and the final exam score ( y ) because the correlation coefficient is significantly different from zero.

Because  r  is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.

Method 2: Using a table of Critical Values to make a decision

The 95% Critical Values of the Sample Correlation Coefficient Table can be used to give you a good idea of whether the computed value of r is significant or not. Compare r to the appropriate critical value in the table. If r is not between the positive and negative critical values, then the correlation coefficient is significant. If r is significant, then you may want to use the line for prediction.

Suppose you computed  r = 0.801 using n = 10 data points. df = n – 2 = 10 – 2 = 8. The critical values associated with df = 8 are -0.632 and + 0.632. If r < negative critical value or r > positive critical value, then r is significant. Since r = 0.801 and 0.801 > 0.632, r is significant and the line may be used for prediction. If you view this example on a number line, it will help you.

Horizontal number line with values of -1, -0.632, 0, 0.632, 0.801, and 1. A dashed line above values -0.632, 0, and 0.632 indicates not significant values.

r is not significant between -0.632 and +0.632. r = 0.801 > +0.632. Therefore, r is significant.
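The comparison on the number line reduces to a single absolute-value check. A small sketch for the example above, using the values given in the text:

```python
r, critical = 0.801, 0.632        # n = 10, df = 8; critical values are ±0.632
significant = abs(r) > critical   # is r outside the interval [-0.632, +0.632]?
print(significant)                # True, so the line may be used for prediction
```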

For a given line of best fit, you computed that  r = 0.6501 using n = 12 data points and the critical value is 0.576. Can the line be used for prediction? Why or why not?

If the scatter plot looks linear then, yes, the line can be used for prediction, because  r > the positive critical value.

Suppose you computed  r = –0.624 with 14 data points. df = 14 – 2 = 12. The critical values are –0.532 and 0.532. Since –0.624 < –0.532, r is significant and the line can be used for prediction.

Horizontal number line with values of -0.624, -0.532, and 0.532.

r = –0.624 < –0.532. Therefore, r is significant.

For a given line of best fit, you compute that  r = 0.5204 using n = 9 data points, and the critical value is 0.666. Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction, because  r < the positive critical value.

Suppose you computed  r = 0.776 and n = 6. df = 6 – 2 = 4. The critical values are –0.811 and 0.811. Since –0.811 < 0.776 < 0.811, r is not significant, and the line should not be used for prediction.

Horizontal number line with values -0.811, 0.776, and 0.811.

–0.811 <  r = 0.776 < 0.811. Therefore, r is not significant.

For a given line of best fit, you compute that  r = –0.7204 using n = 8 data points, and the critical value is 0.707. Can the line be used for prediction? Why or why not?

Yes, the line can be used for prediction, because  r < the negative critical value.

THIRD-EXAM vs FINAL-EXAM EXAMPLE: critical value method

Consider the  third exam/final exam example  again. The line of best fit is: [latex]\hat{y}[/latex] = –173.51+4.83 x  with  r  = 0.6631 and there are  n  = 11 data points. Can the regression line be used for prediction?  Given a third-exam score ( x  value), can we use the line to predict the final exam score (predicted  y  value)?

  • Use the “95% Critical Value” table for  r  with  df  =  n  – 2 = 11 – 2 = 9.
  • The critical values are –0.602 and +0.602
  • Since 0.6631 > 0.602,  r  is significant.

Suppose you computed the following correlation coefficients. Using the table at the end of the chapter, determine if  r is significant and the line of best fit associated with each r can be used to predict a y value. If it helps, draw a number line.

  • r = –0.567 and the sample size, n , is 19. The df = n – 2 = 17. The critical value is –0.456. –0.567 < –0.456 so r is significant.
  • r = 0.708 and the sample size, n , is nine. The df = n – 2 = 7. The critical value is 0.666. 0.708 > 0.666 so r is significant.
  • r = 0.134 and the sample size, n , is 14. The df = 14 – 2 = 12. The critical value is 0.532. 0.134 is between –0.532 and 0.532 so r is not significant.
  • r = 0 and the sample size, n , is five. No matter what the df is, r = 0 is between the two critical values, so r is not significant.
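The four cases above can be checked with one comparison per case. In the sketch below, the critical values for df = 17, 7, and 12 are the ones quoted in the text; the df = 3 value (0.878) is taken from the standard 95% table, though r = 0 would fail against any positive critical value:

```python
def r_is_significant(r, critical_value):
    # significant when r lies outside [-critical_value, +critical_value]
    return abs(r) > critical_value

cases = [(-0.567, 0.456),   # n = 19, df = 17
         ( 0.708, 0.666),   # n = 9,  df = 7
         ( 0.134, 0.532),   # n = 14, df = 12
         ( 0.0,   0.878)]   # n = 5,  df = 3; r = 0 is never significant
results = [r_is_significant(r, cv) for r, cv in cases]
print(results)  # [True, True, False, False]
```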

For a given line of best fit, you compute that  r = 0 using n = 100 data points. Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction no matter what the sample size is.

Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between  x and y in the sample data provides strong enough evidence so that we can conclude that there is a linear relationship between x and y in the population.

The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatterplot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.

The assumptions underlying the test of significance are:

  • There is a linear relationship in the population that models the average value of y for varying values of x . In other words, the expected value of y for each particular value lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population).
  • The y values for any particular x value are normally distributed about the line. This implies that there are more y values scattered closer to the line than are scattered farther away. Assumption (1) implies that these normal distributions are centered on the line: the means of these normal distributions of y values lie on the line.
  • The standard deviations of the population y values about the line are equal for each value of x . In other words, each of these normal distributions of y  values has the same shape and spread about the line.
  • The residual errors are mutually independent (no pattern).
  • The data are produced from a well-designed, random sample or randomized experiment.

The left graph shows three sets of points. Each set falls in a vertical line. The points in each set are normally distributed along the line — they are densely packed in the middle and more spread out at the top and bottom. A downward sloping regression line passes through the mean of each set. The right graph shows the same regression line plotted. A vertical normal curve is shown for each line.

The  y values for each x value are normally distributed about the line with the same standard deviation. For each x value, the mean of the y values lies on the regression line. More y values lie near the line than are scattered further away from the line.

  • Provided by : Lumen Learning. License : CC BY: Attribution
  • Testing the Significance of the Correlation Coefficient. Provided by : OpenStax. Located at : https://openstax.org/books/introductory-statistics/pages/12-4-testing-the-significance-of-the-correlation-coefficient . License : CC BY: Attribution . License Terms : Access for free at https://openstax.org/books/introductory-statistics/pages/1-introduction
  • Introductory Statistics. Authored by : Barbara Illowsky, Susan Dean. Provided by : OpenStax. Located at : https://openstax.org/books/introductory-statistics/pages/1-introduction . License : CC BY: Attribution . License Terms : Access for free at https://openstax.org/books/introductory-statistics/pages/1-introduction

Correlation Hypothesis

Understanding the relationships between variables is pivotal in research. Correlation hypotheses predict the degree of association between two or more variables. In this guide, explore an array of correlation hypothesis examples, followed by a step-by-step tutorial on crafting these hypothesis statements effectively, along with practical tips for investigating correlations.

What is Correlation Hypothesis?

A correlation hypothesis is a statement that predicts a specific relationship between two or more variables based on the assumption that changes in one variable are associated with changes in another variable. It suggests that there is a correlation or statistical relationship between the variables, meaning that when one variable changes, the other variable is likely to change in a consistent manner.

What is an example of a Correlation Hypothesis Statement?

Example: “If the amount of exercise increases, then the level of physical fitness will also increase.”

In this example, the correlation hypothesis suggests that there is a positive correlation between the amount of exercise a person engages in and their level of physical fitness. As exercise increases, the hypothesis predicts that physical fitness will increase as well. This hypothesis can be tested by collecting data on exercise levels and physical fitness levels and analyzing the relationship between the two variables using statistical methods.
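In practice, such a hypothesis is examined by computing Pearson's r on the collected data, for example with `scipy.stats.pearsonr`. The numbers below are invented purely for illustration:

```python
from scipy.stats import pearsonr

# hypothetical data: weekly exercise hours and a fitness score (invented values)
exercise_hours = [1, 2, 3, 4, 5, 6, 7, 8]
fitness_score = [52, 55, 60, 58, 66, 70, 69, 75]

r, p_value = pearsonr(exercise_hours, fitness_score)
print(round(r, 3), round(p_value, 4))  # strong positive r with a small p-value
```

A positive r with a small p -value would support the hypothesis; with real data, the scatter plot should also be examined for a linear trend before drawing conclusions.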

100 Correlation Hypothesis Statement Examples

Discover the intriguing world of correlation through a collection of examples that illustrate how variables can be linked in research. Explore diverse scenarios where changes in one variable may correspond to changes in another, forming the basis of correlation hypotheses. These real-world instances shed light on the essence of correlation analysis and its role in uncovering connections between different aspects of data.

  • Study Hours and Exam Scores : If students study more hours per week, then their exam scores will show a positive correlation, indicating that increased study time might lead to better performance.
  • Income and Education : If the level of education increases, then income levels will also rise, demonstrating a positive correlation between education attainment and earning potential.
  • Social Media Usage and Well-being : If individuals spend more time on social media platforms, then their self-reported well-being might exhibit a negative correlation, suggesting that excessive use could impact mental health.
  • Temperature and Ice Cream Sales : If temperatures rise, then the sales of ice cream might increase, displaying a positive correlation due to the weather’s influence on consumer behavior.
  • Physical Activity and Heart Rate : If the intensity of physical activity rises, then heart rate might increase, signifying a positive correlation between exercise intensity and heart rate.
  • Age and Reaction Time : If age increases, then reaction time might show a positive correlation, indicating that as people age, their reaction times might slow down.
  • Smoking and Lung Capacity : If the number of cigarettes smoked daily increases, then lung capacity might decrease, suggesting a negative correlation between smoking and respiratory health.
  • Stress and Sleep Quality : If stress levels elevate, then sleep quality might decline, reflecting a negative correlation between psychological stress and restorative sleep.
  • Rainfall and Crop Yield : If the amount of rainfall decreases, then crop yield might also decrease, illustrating a negative correlation between precipitation and agricultural productivity.
  • Screen Time and Academic Performance : If screen time usage increases among students, then academic performance might show a negative correlation, suggesting that excessive screen time could be detrimental to studies.
  • Exercise and Body Weight : If individuals engage in regular exercise, then their body weight might exhibit a negative correlation, implying that physical activity can contribute to weight management.
  • Income and Crime Rates : If income levels decrease in a neighborhood, then crime rates might show a positive correlation, indicating a potential link between socio-economic factors and crime.
  • Social Support and Mental Health : If the level of social support increases, then individuals’ mental health scores may exhibit a positive correlation, highlighting the potential positive impact of strong social networks on psychological well-being.
  • Study Time and GPA : If students spend more time studying, then their Grade Point Average (GPA) might display a positive correlation, suggesting that increased study efforts may lead to higher academic achievement.
  • Parental Involvement and Academic Success : If parents are more involved in their child’s education, then the child’s academic success may show a positive correlation, emphasizing the role of parental support in shaping student outcomes.
  • Alcohol Consumption and Reaction Time : If alcohol consumption increases, then reaction time might slow down, indicating a negative correlation between alcohol intake and cognitive performance.
  • Social Media Engagement and Loneliness : If time spent on social media platforms increases, then feelings of loneliness might show a positive correlation, suggesting a potential connection between excessive online interaction and emotional well-being.
  • Temperature and Insect Activity : If temperatures rise, then the activity of certain insects might increase, demonstrating a potential positive correlation between temperature and insect behavior.
  • Education Level and Voting Participation : If education levels rise, then voter participation rates may also increase, showcasing a positive correlation between education and civic engagement.
  • Work Commute Time and Job Satisfaction : If work commute time decreases, then job satisfaction might show a positive correlation, indicating that shorter commutes could contribute to higher job satisfaction.
  • Sleep Duration and Cognitive Performance : If sleep duration increases, then cognitive performance scores might also rise, suggesting a potential positive correlation between adequate sleep and cognitive functioning.
  • Healthcare Access and Mortality Rate : If access to healthcare services improves, then the mortality rate might decrease, highlighting a potential negative correlation between healthcare accessibility and mortality.
  • Exercise and Blood Pressure : If individuals engage in regular exercise, then their blood pressure levels might exhibit a negative correlation, indicating that physical activity can contribute to maintaining healthy blood pressure.
  • Social Media Use and Academic Distraction : If students spend more time on social media during study sessions, then their academic focus might show a negative correlation, suggesting that excessive online engagement can hinder concentration.
  • Age and Technological Adaptation : If age increases, then the speed of adapting to new technologies might exhibit a negative correlation, suggesting that younger individuals tend to adapt more quickly.
  • Temperature and Plant Growth : If temperatures rise, then the rate of plant growth might increase, indicating a potential positive correlation between temperature and biological processes.
  • Music Exposure and Mood : If individuals listen to upbeat music, then their reported mood might show a positive correlation, suggesting that music can influence emotional states.
  • Income and Healthcare Utilization : If income levels increase, then the frequency of healthcare utilization might decrease, suggesting a potential negative correlation between income and healthcare needs.
  • Distance and Communication Frequency : If physical distance between individuals increases, then their communication frequency might show a negative correlation, indicating that proximity tends to facilitate communication.
  • Study Group Attendance and Exam Scores : If students regularly attend study groups, then their exam scores might exhibit a positive correlation, suggesting that collaborative study efforts could enhance performance.
  • Temperature and Disease Transmission : If temperatures rise, then the transmission of certain diseases might increase, pointing to a potential positive correlation between temperature and disease spread.
  • Interest Rates and Consumer Spending : If interest rates decrease, then consumer spending might show a positive correlation, suggesting that lower interest rates encourage increased economic activity.
  • Digital Device Use and Eye Strain : If individuals spend more time on digital devices, then the occurrence of eye strain might show a positive correlation, suggesting that prolonged screen time can impact eye health.
  • Parental Education and Children’s Educational Attainment : If parents have higher levels of education, then their children’s educational attainment might display a positive correlation, highlighting the intergenerational impact of education.
  • Social Interaction and Happiness : If individuals engage in frequent social interactions, then their reported happiness levels might show a positive correlation, indicating that social connections contribute to well-being.
  • Temperature and Energy Consumption : If temperatures decrease, then energy consumption for heating might increase, suggesting a potential positive correlation between temperature and energy usage.
  • Physical Activity and Stress Reduction : If individuals engage in regular physical activity, then their reported stress levels might display a negative correlation, indicating that exercise can help alleviate stress.
  • Diet Quality and Chronic Diseases : If diet quality improves, then the prevalence of chronic diseases might decrease, suggesting a potential negative correlation between healthy eating habits and disease risk.
  • Social Media Use and Body Image Dissatisfaction : If time spent on social media increases, then feelings of body image dissatisfaction might show a positive correlation, suggesting that online platforms can influence self-perception.
  • Income and Access to Quality Education : If household income increases, then access to quality education for children might improve, suggesting a potential positive correlation between financial resources and educational opportunities.
  • Workplace Diversity and Innovation : If workplace diversity increases, then the rate of innovation might show a positive correlation, indicating that diverse teams often generate more creative solutions.
  • Physical Activity and Bone Density : If individuals engage in weight-bearing exercises, then their bone density might exhibit a positive correlation, suggesting that exercise contributes to bone health.
  • Screen Time and Attention Span : If screen time increases, then attention span might show a negative correlation, indicating that excessive screen exposure can impact sustained focus.
  • Social Support and Resilience : If individuals have strong social support networks, then their resilience levels might display a positive correlation, suggesting that social connections contribute to coping abilities.
  • Weather Conditions and Mood : If sunny weather persists, then individuals’ reported mood might exhibit a positive correlation, reflecting the potential impact of weather on emotional states.
  • Nutrition Education and Healthy Eating : If individuals receive nutrition education, then their consumption of fruits and vegetables might show a positive correlation, suggesting that knowledge influences dietary choices.
  • Physical Activity and Cognitive Aging : If adults engage in regular physical activity, then their cognitive decline with aging might show a slower rate, indicating a potential negative correlation between exercise and cognitive aging.
  • Air Quality and Respiratory Illnesses : If air quality deteriorates, then the incidence of respiratory illnesses might increase, suggesting a potential positive correlation between air pollutants and health impacts.
  • Reading Habits and Vocabulary Growth : If individuals read regularly, then their vocabulary size might exhibit a positive correlation, suggesting that reading contributes to language development.
  • Sleep Quality and Stress Levels : If sleep quality improves, then reported stress levels might display a negative correlation, indicating that sleep can impact psychological well-being.
  • Social Media Engagement and Academic Performance : If students spend more time on social media, then their academic performance might exhibit a negative correlation, suggesting that excessive online engagement can impact studies.
  • Exercise and Blood Sugar Levels : If individuals engage in regular exercise, then their blood sugar levels might display a negative correlation, indicating that physical activity can influence glucose regulation.
  • Screen Time and Sleep Duration : If screen time before bedtime increases, then sleep duration might show a negative correlation, suggesting that screen exposure can affect sleep patterns.
  • Environmental Pollution and Health Outcomes : If exposure to environmental pollutants increases, then the occurrence of health issues might show a positive correlation, suggesting that pollution can impact well-being.
  • Time Management and Academic Achievement : If students improve time management skills, then their academic achievement might exhibit a positive correlation, indicating that effective planning contributes to success.
  • Physical Fitness and Heart Health : If individuals improve their physical fitness, then their heart health indicators might display a positive correlation, indicating that exercise benefits cardiovascular well-being.
  • Weather Conditions and Outdoor Activities : If weather is sunny, then outdoor activities might show a positive correlation, suggesting that favorable weather encourages outdoor engagement.
  • Media Exposure and Body Image Perception : If exposure to media images increases, then body image dissatisfaction might show a positive correlation, indicating media’s potential influence on self-perception.
  • Community Engagement and Civic Participation : If individuals engage in community activities, then their civic participation might exhibit a positive correlation, indicating an active citizenry.
  • Social Media Use and Productivity : If individuals spend more time on social media, then their productivity levels might exhibit a negative correlation, suggesting that online distractions can affect work efficiency.
  • Income and Stress Levels : If income levels increase, then reported stress levels might exhibit a negative correlation, suggesting that financial stability can impact psychological well-being.
  • Social Media Use and Interpersonal Skills : If individuals spend more time on social media, then their interpersonal skills might show a negative correlation, indicating potential effects on face-to-face interactions.
  • Parental Involvement and Academic Motivation : If parents are more involved in their child’s education, then the child’s academic motivation may exhibit a positive correlation, highlighting the role of parental support.
  • Technology Use and Sleep Quality : If screen time increases before bedtime, then sleep quality might show a negative correlation, suggesting that technology use can impact sleep.
  • Outdoor Activity and Mood Enhancement : If individuals engage in outdoor activities, then their reported mood might display a positive correlation, suggesting the potential emotional benefits of nature exposure.
  • Income Inequality and Social Mobility : If income inequality increases, then social mobility might exhibit a negative correlation, suggesting that higher inequality can hinder upward mobility.
  • Vegetable Consumption and Heart Health : If individuals increase their vegetable consumption, then heart health indicators might show a positive correlation, indicating the potential benefits of a nutritious diet.
  • Online Learning and Academic Achievement : If students engage in online learning, then their academic achievement might display a positive correlation, highlighting the effectiveness of digital education.
  • Emotional Intelligence and Workplace Performance : If emotional intelligence improves, then workplace performance might exhibit a positive correlation, indicating the relevance of emotional skills.
  • Community Engagement and Mental Well-being : If individuals engage in community activities, then their reported mental well-being might show a positive correlation, emphasizing social connections’ impact.
  • Rainfall and Agriculture Productivity : If rainfall levels increase, then agricultural productivity might exhibit a positive correlation, indicating the importance of water for crops.
  • Social Media Use and Body Posture : If screen time increases, then poor body posture might show a positive correlation, suggesting that screen use can influence physical habits.
  • Marital Satisfaction and Relationship Length : If marital satisfaction decreases, then relationship length might show a negative correlation, indicating potential challenges over time.
  • Exercise and Anxiety Levels : If individuals engage in regular exercise, then reported anxiety levels might exhibit a negative correlation, indicating the potential benefits of physical activity on mental health.
  • Music Listening and Concentration : If individuals listen to instrumental music, then their concentration levels might display a positive correlation, suggesting music’s impact on focus.
  • Internet Usage and Attention Deficits : If screen time increases, then attention deficits might show a positive correlation, implying that excessive internet use can affect concentration.
  • Financial Literacy and Debt Levels : If financial literacy improves, then personal debt levels might exhibit a negative correlation, suggesting better financial decision-making.
  • Time Spent Outdoors and Vitamin D Levels : If time spent outdoors increases, then vitamin D levels might show a positive correlation, indicating sun exposure’s role in vitamin synthesis.
  • Family Meal Frequency and Nutrition : If families eat meals together frequently, then nutrition quality might display a positive correlation, emphasizing family dining’s impact on health.
  • Temperature and Allergy Symptoms : If temperatures rise, then allergy symptoms might increase, suggesting a potential positive correlation between temperature and allergen exposure.
  • Financial Stress and Health Outcomes : If financial stress increases, then the occurrence of health issues might show a positive correlation, suggesting potential health impacts of economic strain.
  • Study Hours and Test Anxiety : If students study more hours, then test anxiety might show a negative correlation, suggesting that increased preparation can reduce anxiety.
  • Music Tempo and Exercise Intensity : If music tempo increases, then exercise intensity might display a positive correlation, indicating music’s potential to influence workout vigor.
  • Green Space Accessibility and Stress Reduction : If access to green spaces improves, then reported stress levels might exhibit a negative correlation, highlighting nature’s stress-reducing effects.
  • Parenting Style and Child Behavior : If authoritative parenting increases, then positive child behaviors might display a positive correlation, suggesting parenting’s influence on behavior.
  • Sleep Quality and Productivity : If sleep quality improves, then work productivity might show a positive correlation, emphasizing the connection between rest and efficiency.
  • Media Consumption and Political Beliefs : If media consumption increases, then alignment with specific political beliefs might exhibit a positive correlation, suggesting media’s influence on ideology.
  • Workplace Satisfaction and Employee Retention : If workplace satisfaction increases, then employee retention rates might show a positive correlation, indicating the link between job satisfaction and tenure.
  • Digital Device Use and Eye Discomfort : If screen time increases, then reported eye discomfort might show a positive correlation, indicating potential impacts of screen exposure.
  • Age and Adaptability to Technology : If age increases, then adaptability to new technologies might exhibit a negative correlation, indicating generational differences in tech adoption.
  • Physical Activity and Mental Health : If individuals engage in regular physical activity, then reported mental health scores might exhibit a positive correlation, showcasing exercise’s impact.
  • Video Gaming and Attention Span : If time spent on video games increases, then attention span might display a negative correlation, indicating potential effects on focus.
  • Social Media Use and Empathy Levels : If social media use increases, then reported empathy levels might show a negative correlation, suggesting possible effects on emotional understanding.
  • Reading Habits and Creativity : If individuals read diverse genres, then their creative thinking might exhibit a positive correlation, emphasizing reading’s cognitive benefits.
  • Weather Conditions and Outdoor Exercise : If weather is pleasant, then outdoor exercise might show a positive correlation, suggesting weather’s influence on physical activity.
  • Parental Involvement and Bullying Prevention : If parents are actively involved, then instances of bullying might exhibit a negative correlation, emphasizing parental impact on behavior.
  • Digital Device Use and Sleep Disruption : If screen time before bedtime increases, then sleep disruption might show a positive correlation, indicating technology’s influence on sleep.
  • Friendship Quality and Psychological Well-being : If friendship quality increases, then reported psychological well-being might show a positive correlation, highlighting social support’s impact.
  • Income and Environmental Consciousness : If income levels increase, then environmental consciousness might also rise, indicating potential links between affluence and sustainability awareness.

Correlational Hypothesis Interpretation Statement Examples

Explore the art of interpreting correlation hypotheses with these illustrative examples. Understand the implications of positive, negative, and zero correlations, and learn how to deduce meaningful insights from data relationships.

  • Relationship Between Exercise and Mood : A positive correlation between exercise frequency and mood scores suggests that increased physical activity might contribute to enhanced emotional well-being.
  • Association Between Screen Time and Sleep Quality : A negative correlation between screen time before bedtime and sleep quality indicates that higher screen exposure could lead to poorer sleep outcomes.
  • Connection Between Study Hours and Exam Performance : A positive correlation between study hours and exam scores implies that increased study time might correspond to better academic results.
  • Link Between Stress Levels and Meditation Practice : A negative correlation between stress levels and meditation frequency suggests that engaging in meditation could be associated with lower perceived stress.
  • Relationship Between Social Media Use and Loneliness : A positive correlation between social media engagement and feelings of loneliness implies that excessive online interaction might contribute to increased loneliness.
  • Association Between Income and Happiness : A positive correlation between income and self-reported happiness indicates that higher income levels might be linked to greater subjective well-being.
  • Connection Between Parental Involvement and Academic Performance : A positive correlation between parental involvement and students’ grades suggests that active parental engagement might contribute to better academic outcomes.
  • Link Between Time Management and Stress Levels : A negative correlation between effective time management and reported stress levels implies that better time management skills could lead to lower stress.
  • Relationship Between Outdoor Activities and Vitamin D Levels : A positive correlation between time spent outdoors and vitamin D levels suggests that increased outdoor engagement might be associated with higher vitamin D concentrations.
  • Association Between Water Consumption and Skin Hydration : A positive correlation between water intake and skin hydration indicates that higher fluid consumption might lead to improved skin moisture levels.

Alternative Correlational Hypothesis Statement Examples

Explore alternative scenarios and potential correlations in these examples. Learn to articulate different hypotheses that could explain data relationships beyond the conventional assumptions.

  • Alternative to Exercise and Mood : An alternative hypothesis could suggest a non-linear relationship between exercise and mood, indicating that moderate exercise might have the most positive impact on emotional well-being.
  • Alternative to Screen Time and Sleep Quality : An alternative hypothesis might propose that screen time has a curvilinear relationship with sleep quality, suggesting that moderate screen exposure leads to optimal sleep outcomes.
  • Alternative to Study Hours and Exam Performance : An alternative hypothesis could propose that there’s an interaction effect between study hours and study method, influencing the relationship between study time and exam scores.
  • Alternative to Stress Levels and Meditation Practice : An alternative hypothesis might consider that the relationship between stress levels and meditation practice is moderated by personality traits, resulting in varying effects.
  • Alternative to Social Media Use and Loneliness : An alternative hypothesis could posit that the relationship between social media use and loneliness depends on the quality of online interactions and content consumption.
  • Alternative to Income and Happiness : An alternative hypothesis might propose that the relationship between income and happiness differs based on cultural factors, leading to varying happiness levels at different income ranges.
  • Alternative to Parental Involvement and Academic Performance : An alternative hypothesis could suggest that the relationship between parental involvement and academic performance varies based on students’ learning styles and preferences.
  • Alternative to Time Management and Stress Levels : An alternative hypothesis might explore the possibility of a curvilinear relationship between time management and stress levels, indicating that extreme time management efforts might elevate stress.
  • Alternative to Outdoor Activities and Vitamin D Levels : An alternative hypothesis could consider that the relationship between outdoor activities and vitamin D levels is moderated by sunscreen usage, influencing vitamin synthesis.
  • Alternative to Water Consumption and Skin Hydration : An alternative hypothesis might propose that the relationship between water consumption and skin hydration is mediated by dietary factors, influencing fluid retention and skin health.

Correlational Hypothesis Pearson Interpretation Statement Examples

Discover how the Pearson correlation coefficient enhances your understanding of data relationships with these examples. Learn to interpret correlation strength and direction using this valuable statistical measure.

  • Strong Positive Correlation : A Pearson correlation coefficient of +0.85 between study time and exam scores indicates a strong positive relationship, suggesting that increased study time is strongly associated with higher grades.
  • Moderate Negative Correlation : A Pearson correlation coefficient of -0.45 between screen time and sleep quality reflects a moderate negative correlation, implying that higher screen exposure is moderately linked to poorer sleep outcomes.
  • Weak Positive Correlation : A Pearson correlation coefficient of +0.25 between social media use and loneliness suggests a weak positive correlation, indicating that increased online engagement is weakly related to higher loneliness.
  • Strong Negative Correlation : A Pearson correlation coefficient of -0.75 between stress levels and meditation practice indicates a strong negative relationship, implying that engaging in meditation is strongly associated with lower stress.
  • Moderate Positive Correlation : A Pearson correlation coefficient of +0.60 between income and happiness signifies a moderate positive correlation, suggesting that higher income is moderately linked to greater happiness.
  • Weak Negative Correlation : A Pearson correlation coefficient of -0.30 between parental involvement and academic performance represents a weak negative correlation, indicating that higher parental involvement is weakly associated with lower academic performance.
  • Strong Negative Correlation : A Pearson correlation coefficient of -0.80 between time management and stress levels reveals a strong negative relationship, suggesting that effective time management is strongly linked to lower stress.
  • Weak Negative Correlation : A Pearson correlation coefficient of -0.20 between outdoor activities and vitamin D levels signifies a weak negative correlation, implying that higher outdoor engagement is weakly related to lower vitamin D levels.
  • Moderate Positive Correlation : A Pearson correlation coefficient of +0.50 between water consumption and skin hydration denotes a moderate positive correlation, suggesting that increased fluid intake is moderately linked to better skin hydration.
  • Strong Negative Correlation : A Pearson correlation coefficient of -0.70 between screen time and attention span indicates a strong negative relationship, implying that higher screen exposure is strongly associated with shorter attention spans.
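The rule-of-thumb interpretations above can be reproduced in code. Below is a minimal Python sketch (the data, cutoff values, and helper names are illustrative, not taken from any particular study) that computes Pearson's r from paired observations and labels its strength and direction:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for paired samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def describe(r):
    """Label r using common (but not universal) rule-of-thumb cutoffs."""
    strength = "strong" if abs(r) >= 0.7 else "moderate" if abs(r) >= 0.4 else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"

# Hypothetical study-time (hours) and exam-score data
study_hours = [2, 4, 5, 7, 8, 10]
exam_scores = [55, 60, 68, 75, 81, 90]
r = pearson_r(study_hours, exam_scores)
print(f"r = {r:.2f}: {describe(r)}")
```

Note that the 0.4 and 0.7 cutoffs are conventions that vary by field; always report the coefficient itself alongside the verbal label.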

Correlational Hypothesis Statement Examples in Psychology

Explore how correlation hypotheses apply to psychological research with these examples. Understand how psychologists investigate relationships between variables to gain insights into human behavior.

  • Sleep Patterns and Cognitive Performance : There is a positive correlation between consistent sleep patterns and cognitive performance, suggesting that individuals with regular sleep schedules exhibit better cognitive functioning.
  • Anxiety Levels and Social Media Use : There is a positive correlation between anxiety levels and excessive social media use, indicating that individuals who spend more time on social media might experience higher anxiety.
  • Self-Esteem and Body Image Satisfaction : There is a positive correlation between self-esteem and body image satisfaction, implying that individuals with higher self-esteem tend to be more satisfied with their physical appearance.
  • Parenting Styles and Child Aggression : There is a negative correlation between authoritative parenting styles and child aggression, suggesting that children raised by authoritative parents might exhibit lower levels of aggression.
  • Emotional Intelligence and Conflict Resolution : There is a positive correlation between emotional intelligence and effective conflict resolution, indicating that individuals with higher emotional intelligence tend to resolve conflicts more successfully.
  • Personality Traits and Career Satisfaction : There is a positive correlation between certain personality traits (e.g., extraversion, openness) and career satisfaction, suggesting that individuals with specific traits experience higher job contentment.
  • Stress Levels and Coping Mechanisms : There is a negative correlation between stress levels and adaptive coping mechanisms, indicating that individuals with lower stress levels are more likely to employ effective coping strategies.
  • Attachment Styles and Romantic Relationship Quality : There is a positive correlation between secure attachment styles and higher romantic relationship quality, suggesting that individuals with secure attachments tend to have healthier relationships.
  • Social Support and Mental Health : There is a negative correlation between perceived social support and mental health issues, indicating that individuals with strong social support networks tend to experience fewer mental health challenges.
  • Motivation and Academic Achievement : There is a positive correlation between intrinsic motivation and academic achievement, implying that students who are internally motivated tend to perform better academically.

Does Correlational Research Have a Hypothesis?

Correlational research examines the relationship between two or more variables to determine whether they are related and how they change together. While correlational studies do not establish causation, they still use hypotheses to formulate expectations about the relationships between variables. These hypotheses predict the presence, direction, and strength of correlations. In correlational research, however, the focus is on measuring and analyzing the degree of association rather than establishing cause-and-effect relationships.

How Do You Write a Null-Hypothesis for a Correlational Study?

The null hypothesis in a correlational study states that there is no significant correlation between the variables being studied. It assumes that any observed correlation is due to chance and lacks meaningful association. When writing a null hypothesis for a correlational study, follow these steps:

  • Identify the Variables: Clearly define the variables you are studying and their relationship (e.g., “There is no significant correlation between X and Y”).
  • Specify the Population: Indicate the population from which the data is drawn (e.g., “In the population of [target population]…”).
  • Include the Direction of Correlation: If relevant, specify the direction of correlation (positive, negative, or zero) that you are testing (e.g., “…there is no significant positive/negative correlation…”).
  • State the Hypothesis: Write the null hypothesis as a clear statement that there is no significant correlation between the variables (e.g., “…there is no significant correlation between X and Y”).
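Once stated, the null hypothesis of zero correlation can be checked against sample data. A common approach, sketched below under the assumption of roughly bivariate-normal data (the sample values and the hardcoded critical value are illustrative), converts the observed r into a t statistic with n − 2 degrees of freedom:

```python
import math

def t_from_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Suppose a sample of n = 10 pairs yields r = 0.70
r, n = 0.70, 10
t = t_from_r(r, n)

# Two-tailed critical value for alpha = .05 with df = 8, taken from a t table
t_critical = 2.306
if abs(t) > t_critical:
    print(f"t = {t:.2f}: reject H0 (significant correlation)")
else:
    print(f"t = {t:.2f}: fail to reject H0")
```

In practice a statistics library would return an exact p-value instead of a table lookup, but the decision rule is the same: reject the null hypothesis only when the test statistic exceeds the critical value for your chosen significance level.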

What Is the Correlation Hypothesis Formula?

The correlation hypothesis is often expressed in the form of a statement that predicts the presence and nature of a relationship between two variables. It typically follows the “If-Then” structure, indicating the expected change in one variable based on changes in another. The correlation hypothesis formula can be written as:

“If [Variable X] changes, then [Variable Y] will also change [in a specified direction] because [rationale for the expected correlation].”

For example, “If the amount of exercise increases, then mood scores will improve because physical activity has been linked to better emotional well-being.”

What Is a Correlational Hypothesis in Research Methodology?

A correlational hypothesis in research methodology is a testable hypothesis statement that predicts the presence and nature of a relationship between two or more variables. It forms the basis for conducting a correlational study, where the goal is to measure and analyze the degree of association between variables. Correlational hypotheses are essential in guiding the research process, collecting relevant data, and assessing whether the observed correlations are statistically significant.

How Do You Write a Hypothesis for Correlation? – A Step-by-Step Guide

Writing a hypothesis for correlation involves crafting a clear and testable statement about the expected relationship between variables. Here’s a step-by-step guide:

  • Identify Variables : Clearly define the variables you are studying and their nature (e.g., “There is a relationship between X and Y…”).
  • Specify Direction : Indicate the expected direction of correlation (positive, negative, or zero) based on your understanding of the variables and existing literature.
  • Formulate the If-Then Statement : Write an “If-Then” statement that predicts the change in one variable based on changes in the other variable (e.g., “If [Variable X] changes, then [Variable Y] will also change [in a specified direction]…”).
  • Provide Rationale : Explain why you expect the correlation to exist, referencing existing theories, research, or logical reasoning.
  • Quantitative Prediction (Optional) : If applicable, provide a quantitative prediction about the strength of the correlation (e.g., “…for every one unit increase in [Variable X], [Variable Y] is predicted to increase by [numerical value].”).
  • Specify Population : Indicate the population to which your hypothesis applies (e.g., “In a sample of [target population]…”).

Tips for Writing Correlational Hypothesis

  • Base on Existing Knowledge : Ground your hypothesis in existing literature, theories, or empirical evidence to ensure it’s well-informed.
  • Be Specific : Clearly define the variables and direction of correlation you’re predicting to avoid ambiguity.
  • Avoid Causation Claims : Remember that correlational hypotheses do not imply causation. Focus on predicting relationships, not causes.
  • Use Clear Language : Write in clear and concise language, avoiding jargon that may confuse readers.
  • Consider Alternative Explanations : Acknowledge potential confounding variables or alternative explanations that could affect the observed correlation.
  • Be Open to Results : Correlation results can be unexpected. Be prepared to interpret findings even if they don’t align with your initial hypothesis.
  • Test Statistically : Once you collect data, use appropriate statistical tests to determine if the observed correlation is statistically significant.
  • Revise as Needed : If your findings don’t support your hypothesis, revise it based on the data and insights gained.

Crafting a well-structured correlational hypothesis is crucial for guiding your research, conducting meaningful analysis, and contributing to the understanding of relationships between variables.

Correlational Research | When & How to Use

Published on July 7, 2021 by Pritha Bhandari. Revised on June 22, 2023.

A correlational research design investigates relationships between variables without the researcher controlling or manipulating any of them.

A correlation reflects the strength and/or direction of the relationship between two (or more) variables. The direction of a correlation can be either positive or negative.

Table of contents

  • Correlational vs. experimental research
  • When to use correlational research
  • How to collect correlational data
  • How to analyze correlational data
  • Correlation and causation
  • Other interesting articles
  • Frequently asked questions about correlational research

Correlational and experimental research both use quantitative methods to investigate relationships between variables. But there are important differences in data collection methods and the types of conclusions you can draw.

Correlational research is ideal for gathering data quickly from natural settings. That helps you generalize your findings to real-life situations in an externally valid way.

There are a few situations where correlational research is an appropriate choice.

To investigate non-causal relationships

You want to find out if there is an association between two variables, but you don’t expect to find a causal relationship between them.

Correlational research can provide insights into complex real-world relationships, helping researchers develop theories and make predictions.

To explore causal relationships between variables

You think there is a causal relationship between two variables, but it is impractical, unethical, or too costly to conduct experimental research that manipulates one of the variables.

Correlational research can provide initial indications or additional support for theories about causal relationships.

To test new measurement tools

You have developed a new instrument for measuring your variable, and you need to test its reliability or validity.

Correlational research can be used to assess whether a tool consistently or accurately captures the concept it aims to measure.

There are many different methods you can use in correlational research. In the social and behavioral sciences, the most common data collection methods for this type of research include surveys, observations, and secondary data.

It’s important to carefully choose and plan your methods to ensure the reliability and validity of your results. You should carefully select a representative sample so that your data reflects the population you’re interested in without research bias.

In survey research, you can use questionnaires to measure your variables of interest. You can conduct surveys online, by mail, by phone, or in person.

Surveys are a quick, flexible way to collect standardized data from many participants, but it’s important to ensure that your questions are worded in an unbiased way and capture relevant insights.

Naturalistic observation

Naturalistic observation is a type of field research where you gather data about a behavior or phenomenon in its natural environment.

This method often involves recording, counting, describing, and categorizing actions and events. Naturalistic observation can include both qualitative and quantitative elements, but to assess correlation, you collect data that can be analyzed quantitatively (e.g., frequencies, durations, scales, and amounts).

Naturalistic observation lets you easily generalize your results to real-world contexts, and you can study experiences that aren’t replicable in lab settings. But data analysis can be time-consuming and unpredictable, and researcher bias may skew the interpretations.

Secondary data

Instead of collecting original data, you can also use data that has already been collected for a different purpose, such as official records, polls, or previous studies.

Using secondary data is inexpensive and fast, because data collection is complete. However, the data may be unreliable, incomplete or not entirely relevant, and you have no control over the reliability or validity of the data collection procedures.

After collecting data, you can statistically analyze the relationship between variables using correlation or regression analyses, or both. You can also visualize the relationships between variables with a scatterplot.

Different types of correlation coefficients and regression analyses are appropriate for your data based on their levels of measurement and distributions.

Correlation analysis

Using a correlation analysis, you can summarize the relationship between variables into a correlation coefficient: a single number that describes the strength and direction of the relationship between variables. With this number, you’ll quantify the degree of the relationship between variables.

The Pearson product-moment correlation coefficient, also known as Pearson’s r, is commonly used for assessing a linear relationship between two quantitative variables.

Correlation coefficients are usually found for two variables at a time, but you can use a multiple correlation coefficient for three or more variables.

Regression analysis

With a regression analysis, you can predict how much a change in one variable will be associated with a change in the other variable. The result is a regression equation that describes the line on a graph of your variables.

You can use this equation to predict the value of one variable based on the given value(s) of the other variable(s). It’s best to perform a regression analysis after testing for a correlation between your variables.
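As a concrete sketch with made-up data (the variable names and values are illustrative), an ordinary least-squares fit of one variable on another reduces to a slope and an intercept, which together form the regression equation used for prediction:

```python
def least_squares(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical hours-outdoors vs. vitamin D readings
hours = [1, 2, 3, 4, 5]
vit_d = [20, 24, 27, 31, 33]
slope, intercept = least_squares(hours, vit_d)

# Predict the reading for a new value of x using the regression equation
predicted = slope * 3.5 + intercept
print(f"y = {slope:.2f}x + {intercept:.2f}; prediction at x = 3.5: {predicted:.1f}")
```

The slope answers the "how much change" question from the paragraph above: it is the predicted change in y for a one-unit increase in x.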

It’s important to remember that correlation does not imply causation. Just because you find a correlation between two things doesn’t mean you can conclude that one of them causes the other, for a few reasons.

Directionality problem

If two variables are correlated, it could be because one of them is a cause and the other is an effect. But the correlational research design doesn’t allow you to infer which is which. To err on the side of caution, researchers don’t conclude causality from correlational studies.

Third variable problem

A confounding variable is a third variable that influences other variables to make them seem causally related even though they are not. Instead, there are separate causal links between the confounder and each variable.

In correlational research, there’s limited or no researcher control over extraneous variables. Even if you statistically control for some potential confounders, there may still be other hidden variables that disguise the relationship between your study variables.

Although a correlational study can’t demonstrate causation on its own, it can help you develop a causal hypothesis that’s tested in controlled experiments.

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Degrees of freedom
  • Null hypothesis
  • Discourse analysis
  • Control groups
  • Mixed methods research
  • Non-probability sampling
  • Quantitative research
  • Ecological validity

Research bias

  • Rosenthal effect
  • Implicit bias
  • Cognitive bias
  • Selection bias
  • Negativity bias
  • Status quo bias

A correlation reflects the strength and/or direction of the association between two or more variables.

  • A positive correlation means that both variables change in the same direction.
  • A negative correlation means that the variables change in opposite directions.
  • A zero correlation means there’s no relationship between the variables.

A correlational research design investigates relationships between two variables (or more) without the researcher controlling or manipulating any of them. It’s a non-experimental type of quantitative research.

Controlled experiments establish causality, whereas correlational studies only show associations between variables.

  • In an experimental design, you manipulate an independent variable and measure its effect on a dependent variable. Other variables are controlled so they can’t impact the results.
  • In a correlational design, you measure variables without manipulating any of them. You can test whether your variables change together, but you can’t be sure that one variable caused a change in another.

In general, correlational research is high in external validity while experimental research is high in internal validity.

A correlation is usually tested for two variables at a time, but you can test correlations between three or more variables.

A correlation coefficient is a single number that describes the strength and direction of the relationship between your variables.

Different types of correlation coefficients might be appropriate for your data based on their levels of measurement and distributions. The Pearson product-moment correlation coefficient (Pearson’s r) is commonly used to assess a linear relationship between two quantitative variables.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation below.

Bhandari, P. (2023, June 22). Correlational Research | When & How to Use. Scribbr. Retrieved April 9, 2024, from https://www.scribbr.com/methodology/correlational-research/


How to Write a Great Hypothesis

Hypothesis Format, Examples, and Tips

Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."

Amy Morin, LCSW, is a psychotherapist and international bestselling author. Her books, including "13 Things Mentally Strong People Don't Do," have been translated into more than 40 languages. Her TEDx talk,  "The Secret of Becoming Mentally Strong," is one of the most viewed talks of all time.

  • The Scientific Method
  • Hypothesis Format
  • Falsifiability of a Hypothesis
  • Operational Definitions
  • Types of Hypotheses
  • Hypotheses Examples
  • Collecting Data
  • Frequently Asked Questions

A hypothesis is a tentative statement about the relationship between two or more variables. It is a specific, testable prediction about what you expect to happen in a study.

For example, a study designed to look at the relationship between sleep deprivation and test performance might have a hypothesis that states: "This study is designed to assess the hypothesis that sleep-deprived people will perform worse on a test than individuals who are not sleep-deprived."

This article explores how a hypothesis is used in psychology research, how to write a good hypothesis, and the different types of hypotheses you might use.

The Hypothesis in the Scientific Method

In the scientific method, whether it involves research in psychology, biology, or some other area, a hypothesis represents what the researchers think will happen in an experiment. The scientific method involves the following steps:

  • Forming a question
  • Performing background research
  • Creating a hypothesis
  • Designing an experiment
  • Collecting data
  • Analyzing the results
  • Drawing conclusions
  • Communicating the results

The hypothesis is a prediction, but it involves more than a guess. Most of the time, the hypothesis begins with a question, which is then explored through background research. It is only at this point that researchers begin to develop a testable hypothesis. Unless you are creating an exploratory study, your hypothesis should always explain what you expect to happen.

In a study exploring the effects of a particular drug, the hypothesis might be that researchers expect the drug to have some type of effect on the symptoms of a specific illness. In psychology, the hypothesis might focus on how a certain aspect of the environment might influence a particular behavior.

Remember, a hypothesis does not have to be correct. While the hypothesis predicts what the researchers expect to see, the goal of the research is to determine whether this guess is right or wrong. When conducting an experiment, researchers might explore a number of factors to determine which ones might contribute to the ultimate outcome.

In many cases, researchers may find that the results of an experiment  do not  support the original hypothesis. When writing up these results, the researchers might suggest other options that should be explored in future studies.

In many cases, researchers might draw a hypothesis from a specific theory or build on previous research. For example, prior research has shown that stress can impact the immune system. So a researcher might hypothesize: "People with high-stress levels will be more likely to contract a common cold after being exposed to the virus than people who have low-stress levels."

In other instances, researchers might look at commonly held beliefs or folk wisdom. "Birds of a feather flock together" is one example of folk wisdom that a psychologist might try to investigate. The researcher might pose a specific hypothesis that "People tend to select romantic partners who are similar to them in interests and educational level."

Elements of a Good Hypothesis

So how do you write a good hypothesis? When trying to come up with a hypothesis for your research or experiments, ask yourself the following questions:

  • Is your hypothesis based on your research on a topic?
  • Can your hypothesis be tested?
  • Does your hypothesis include independent and dependent variables?

Before you come up with a specific hypothesis, spend some time doing background research. Once you have completed a literature review, start thinking about potential questions you still have. Pay attention to the discussion section in the journal articles you read. Many authors will suggest questions that still need to be explored.

To form a hypothesis, you should take these steps:

  • Collect as many observations about a topic or problem as you can.
  • Evaluate these observations and look for possible causes of the problem.
  • Create a list of possible explanations that you might want to explore.
  • After you have developed some possible hypotheses, think of ways that you could confirm or disprove each hypothesis through experimentation. This is known as falsifiability.

In the scientific method, falsifiability is an important part of any valid hypothesis. In order to test a claim scientifically, it must be possible that the claim could be proven false.

Students sometimes confuse falsifiability with the idea that a claim is false, which is not the case. Falsifiability means that if a claim were false, it would be possible to demonstrate that it is false.

One of the hallmarks of pseudoscience is that it makes claims that cannot be refuted or proven false.

A variable is a factor or element that can be changed and manipulated in ways that are observable and measurable. However, the researcher must also define how the variable will be manipulated and measured in the study.

For example, a researcher might operationally define the variable "test anxiety" as the results of a self-report measure of anxiety experienced during an exam. A "study habits" variable might be defined by the amount of studying that actually occurs as measured by time.

These precise descriptions are important because many things can be measured in a number of different ways. One of the basic principles of any type of scientific research is that the results must be replicable.   By clearly detailing the specifics of how the variables were measured and manipulated, other researchers can better understand the results and repeat the study if needed.

Some variables are more difficult than others to define. How would you operationally define a variable such as aggression? For obvious ethical reasons, researchers cannot create a situation in which a person behaves aggressively toward others.

In order to measure this variable, the researcher must devise a measurement that assesses aggressive behavior without harming other people. In this situation, the researcher might utilize a simulated task to measure aggressiveness.

Hypothesis Checklist

  • Does your hypothesis focus on something that you can actually test?
  • Does your hypothesis include both an independent and dependent variable?
  • Can you manipulate the variables?
  • Can your hypothesis be tested without violating ethical standards?

The hypothesis you use will depend on what you are investigating and hoping to find. Some of the main types of hypotheses that you might use include:

  • Simple hypothesis : This type of hypothesis suggests that there is a relationship between one independent variable and one dependent variable.
  • Complex hypothesis : This type of hypothesis suggests a relationship between three or more variables, such as two independent variables and a dependent variable.
  • Null hypothesis : This hypothesis suggests no relationship exists between two or more variables.
  • Alternative hypothesis : This hypothesis states the opposite of the null hypothesis.
  • Statistical hypothesis : This hypothesis uses statistical analysis to evaluate a representative sample of the population and then generalizes the findings to the larger group.
  • Logical hypothesis : This hypothesis assumes a relationship between variables without collecting data or evidence.

A hypothesis often follows a basic format of "If {this happens} then {this will happen}." One way to structure your hypothesis is to describe what will happen to the dependent variable if you change the independent variable.

The basic format might be: "If {these changes are made to a certain independent variable}, then we will observe {a change in a specific dependent variable}."

A few examples of simple hypotheses:

  • "Students who eat breakfast will perform better on a math exam than students who do not eat breakfast."
  • "Students who experience test anxiety before an English exam will get lower scores than students who do not experience test anxiety."
  • "Motorists who talk on the phone while driving will be more likely to make errors on a driving course than those who do not talk on the phone."

Examples of a complex hypothesis include:

  • "People with high-sugar diets and sedentary activity levels are more likely to develop depression."
  • "Younger people who are regularly exposed to green, outdoor areas have better subjective well-being than older adults who have limited exposure to green spaces."

Examples of a null hypothesis include:

  • "Children who receive a new reading intervention will have scores that do not differ from students who do not receive the intervention."
  • "There will be no difference in scores on a memory recall task between children and adults."

Examples of an alternative hypothesis:

  • "Children who receive a new reading intervention will perform better than students who did not receive the intervention."
  • "Adults will perform better on a memory task than children." 

Collecting Data on Your Hypothesis

Once a researcher has formed a testable hypothesis, the next step is to select a research design and start collecting data. The research method depends largely on exactly what they are studying. There are two basic types of research methods: descriptive research and experimental research.

Descriptive Research Methods

Descriptive research such as  case studies ,  naturalistic observations , and surveys are often used when it would be impossible or difficult to  conduct an experiment . These methods are best used to describe different aspects of a behavior or psychological phenomenon.

Once a researcher has collected data using descriptive methods, a correlational study can then be used to look at how the variables are related. This type of research method might be used to investigate a hypothesis that is difficult to test experimentally.

Experimental Research Methods

Experimental methods  are used to demonstrate causal relationships between variables. In an experiment, the researcher systematically manipulates a variable of interest (known as the independent variable) and measures the effect on another variable (known as the dependent variable).

Unlike correlational studies, which can only be used to determine if there is a relationship between two variables, experimental methods can be used to determine the actual nature of the relationship—whether changes in one variable actually  cause  another to change.

A Word From Verywell

The hypothesis is a critical part of any scientific exploration. It represents what researchers expect to find in a study or experiment. In situations where the hypothesis is unsupported by the research, the research still has value. Such research helps us better understand how different aspects of the natural world relate to one another. It also helps us develop new hypotheses that can then be tested in the future.

Some examples of how to write a hypothesis include:

  • "Staying up late will lead to worse test performance the next day."
  • "People who consume one apple each day will visit the doctor fewer times each year."
  • "Breaking study sessions up into three 20-minute sessions will lead to better test results than a single 60-minute study session."

The four parts of a hypothesis are:

  • The research question
  • The independent variable (IV)
  • The dependent variable (DV)
  • The proposed relationship between the IV and DV


By Kendra Cherry, MSEd Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."

Population, sample and hypothesis testing

What is a hypothesis?

A hypothesis is an assumption that is neither proven nor disproven. In the research process, a hypothesis is made at the very beginning and the goal is to either reject or not reject it. In order to reject or not reject a hypothesis, data, e.g. from an experiment or a survey, are needed, which are then evaluated using a hypothesis test .

Usually, hypotheses are formulated starting from a literature review. Based on the literature review, you can then justify why you formulated the hypothesis in this way.

An example of a hypothesis could be: "Men earn more than women in the same job in Austria."


To test this hypothesis, you need data, e.g. from a survey, and a suitable hypothesis test such as the t-test or correlation analysis . Don't worry, DATAtab will help you choose the right hypothesis test.

How do I formulate a hypothesis?

In order to formulate a hypothesis, a research question must first be defined. A precisely formulated hypothesis about the population can then be derived from the research question, e.g. men earn more than women in the same job in Austria.


Hypotheses are not simple statements; they are formulated in such a way that they can be tested with collected data in the course of the research process.

To test a hypothesis, it is necessary to define exactly which variables are involved and how the variables are related. Hypotheses, then, are assumptions about the cause-and-effect relationships or the associations between variables.

What is a variable?

A variable is a property of an object or event that can take on different values. For example, eye color is a variable: it is a property of the object "eye" and can take on different values (blue, brown, ...).

If you are researching in the social sciences, your variables may be:

  • Attitude towards environmental protection

If you are researching in the medical field, your variables may be:

  • Body weight
  • Smoking status

What is the null and alternative hypothesis?

There are always two hypotheses that claim exactly the opposite of each other. These opposing hypotheses are called the null and alternative hypothesis and are abbreviated H0 and H1 .

Null hypothesis H0:

The null hypothesis assumes that there is no difference between two or more groups with respect to a characteristic.

The salary of men and women does not differ in Austria.

Alternative hypothesis H1:

Alternative hypotheses, on the other hand, assume that there is a difference between two or more groups.

The salary of men and women differs in Austria.

The hypothesis that you want to test, or that you have derived from theory, usually states that there is an effect, e.g. that gender has an effect on salary. This hypothesis is called the alternative hypothesis.

The null hypothesis usually states that there is no effect, e.g. that gender has no effect on salary. In a hypothesis test, only the null hypothesis is tested; the goal is to find out whether the null hypothesis is rejected or not.
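As a minimal sketch of this logic, the salary example can be run as a two-sample t-test in Python (the salary figures below are made up purely for illustration, and scipy is assumed to be available):

```python
from scipy import stats

# Hypothetical salary samples (illustrative numbers only)
salaries_men = [3100, 2900, 3300, 3500, 3000, 3200, 3400]
salaries_women = [2800, 2700, 3000, 2900, 2600, 2950, 2850]

# H0: the salaries of men and women do not differ
# H1: the salaries of men and women differ (two-sided)
t_stat, p_value = stats.ttest_ind(salaries_men, salaries_women)

# Only H0 is tested: reject it if p falls below the significance level
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print("Reject H0" if p_value < 0.05 else "Do not reject H0")
```

With other data the test might fail to reject H0; that binary decision is exactly what the hypothesis test formalizes.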

Types of hypotheses

The most common distinction is between difference and correlation hypotheses , as well as directional and non-directional hypotheses .

Difference and correlation hypotheses

Difference hypotheses are used when different groups are to be distinguished, e.g., the group of men and the group of women. Correlation hypotheses are used when the relationship or correlation between variables is to be tested, e.g., the relationship between age and height.

Difference hypotheses

Difference hypotheses test whether there is a difference between two or more groups.


Examples of difference hypotheses are:

  • The "group" of men earns more than the "group" of women.
  • Smokers have a higher risk of heart attack than non-smokers.
  • There is a difference between Germany, Austria and France in terms of hours worked per week.

Thus, one variable is always a categorical variable, e.g., gender (male, female), smoking status (smoker, nonsmoker), or country (Germany, Austria, and France); the other variable is at least ordinally scaled, e.g., salary, percent risk of heart attack, or hours worked per week.

Correlation hypotheses

Correlation hypotheses test correlations between two variables, for example height and body weight.


Correlation hypotheses are, for example:

  • The taller a person is, the heavier they are.
  • The more horsepower a car has, the higher its fuel consumption.
  • The better the math grade, the higher the future salary.

As can be seen from the examples, correlation hypotheses often take the form "The more..., the higher/lower...". Thus, at least two ordinally scaled variables are being examined.

Directional and non-directional hypotheses

Hypotheses are divided into directional and non-directional or one-sided and two-sided hypotheses. If the hypothesis contains words like "better than" or "worse than", the hypothesis is usually directional.


In the case of a non-directional hypothesis, one often finds building blocks such as "there is a difference between" in the formulation, but it is not stated in which direction the difference lies.

  • With a non-directional hypothesis , the only thing of interest is whether there is a difference in a value between the groups under consideration.
  • In a directional hypothesis , what is of interest is whether one group has a higher or lower value than the other.


Non-directional hypotheses

Non-directional hypotheses test whether there is a relationship or a difference, and it does not matter in which direction the relationship or difference goes. In the case of a difference hypothesis, this means there is a difference between two groups, but it does not say whether one of the groups has a higher value.

  • There is a difference between the salary of men and women (but it is not said who earns more!).
  • There is a difference in the risk of heart attack between smokers and non-smokers (but it is not said who has the higher risk!).

In regard to a correlation hypothesis, this means there is a relationship or correlation between two variables, but it is not said whether this relationship is positive or negative.

  • There is a correlation between height and weight.
  • There is a correlation between horsepower and fuel consumption in cars.

In both cases it is not said whether this correlation is positive or negative!

Directional hypotheses

Directional hypotheses additionally indicate the direction of the relationship or the difference. In the case of the difference hypothesis a statement is made which group has a higher or lower value.

  • Men earn more than women.

In the case of a correlation hypothesis, a statement is made as to whether the correlation is positive or negative.

  • The taller a person is, the heavier they are.
  • The more horsepower a car has, the higher its fuel consumption.

The p-value for directional hypotheses

Statistical software usually calculates the non-directional test and outputs the corresponding p-value.

To obtain the p-value for the directional hypothesis, it must first be checked whether the effect is in the hypothesized direction. Then the p-value is divided by two, because the significance level is no longer split between the two tails but placed entirely on one side. More about this in the tutorial about the p-value .
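This conversion can be sketched in a few lines of Python (the t statistic and sample size below are hypothetical, and scipy is assumed to be available):

```python
from scipy import stats

# Hypothetical result of a two-sided test
t_stat = 2.1   # test statistic
n = 30         # sample size
df = n - 2     # degrees of freedom

# Two-sided p-value, as most software reports it
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)

# Directional hypothesis: check that the effect points in the hypothesized
# direction (here: t > 0), then halve the two-sided p-value
p_one_sided = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2

print(round(p_two_sided, 4), round(p_one_sided, 4))
```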

If you select a directed alternative hypothesis in DATAtab for the calculated hypothesis test, the conversion is done automatically and you only need to read the result.

Step-by-step instructions for testing hypotheses

  • Conduct a literature review
  • Formulate the hypothesis
  • Define the scale level of the variables
  • Determine the significance level
  • Determine the hypothesis type
  • Choose a hypothesis test suitable for the scale level and hypothesis type

Next tutorial about hypothesis testing

The next tutorial is about hypothesis testing. You will learn what hypothesis tests are, how to find the right one and how to interpret it.


Cite DATAtab: DATAtab Team (2024). DATAtab: Online Statistics Calculator. DATAtab e.U. Graz, Austria. URL https://datatab.net

Statology

Statistics Made Easy

How to Perform a Correlation Test in R (With Examples)

One way to quantify the relationship between two variables is to use the Pearson correlation coefficient , which is a measure of the linear association between two variables .

It always takes on a value between -1 and 1 where:

  • -1 indicates a perfectly negative linear correlation between two variables
  • 0 indicates no linear correlation between two variables
  • 1 indicates a perfectly positive linear correlation between two variables

To determine if a correlation coefficient is statistically significant, you can calculate the corresponding t-score and p-value.

The formula to calculate the t-score of a correlation coefficient (r) is:

t = r√(n-2) / √(1-r²)

The p-value is calculated as the corresponding two-sided p-value for the t-distribution with n-2 degrees of freedom.
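The two formulas above translate directly into code. A small Python sketch (scipy is assumed for the t-distribution, and the r and n values below are hypothetical):

```python
import math
from scipy import stats

def correlation_test(r, n):
    """t-score and two-sided p-value for a sample correlation
    coefficient r computed from n pairs of observations."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
    p = 2 * stats.t.sf(abs(t), n - 2)  # t-distribution, n-2 degrees of freedom
    return t, p

# Hypothetical example: r = 0.93 observed from n = 12 pairs
t, p = correlation_test(0.93, 12)
print(f"t = {t:.4f}, p = {p:.6f}")
```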

Example: Correlation Test in R

To determine if the correlation coefficient between two variables is statistically significant, you can perform a correlation test in R using the following syntax:

cor.test(x, y, method=c("pearson", "kendall", "spearman"))

  • x, y: Numeric vectors of data.
  • method: Method used to calculate correlation between two vectors. Default is "pearson".

For example, suppose we have the following two vectors in R:

Before we perform a correlation test between the two variables, we can create a quick scatterplot to view their relationship:


There appears to be a positive correlation between the two variables. That is, as one increases the other tends to increase as well.

To see if this correlation is statistically significant, we can perform a correlation test:

The correlation coefficient between the two vectors turns out to be 0.9279869 .

The test statistic turns out to be 7.8756 and the corresponding p-value is  1.35e-05 .

Since this value is less than .05, we have sufficient evidence to say that the correlation between the two variables is statistically significant.
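The same kind of test is easy to reproduce outside R as a sanity check. For example, Python's scipy.stats.pearsonr returns the coefficient and the two-sided p-value in one call (the vectors below are made up for illustration and are not the data from this example):

```python
from scipy import stats

# Illustrative vectors only (not the article's data)
x = [2, 4, 4, 5, 7, 8, 10, 12, 13, 15]
y = [1, 3, 2, 5, 6, 6, 9, 10, 12, 13]

r, p = stats.pearsonr(x, y)
print(f"r = {r:.4f}, p = {p:.6f}")

# Same decision rule as above: significant at the 5% level?
print("significant" if p < 0.05 else "not significant")
```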

Additional Resources

The following tutorials provide additional information about correlation coefficients:

  • An Introduction to the Pearson Correlation Coefficient
  • What is Considered to Be a "Strong" Correlation?
  • The Five Assumptions for Pearson Correlation


Published by Zach



Statistics LibreTexts

12.5: Testing the Significance of the Correlation Coefficient


The correlation coefficient, \(r\), tells us about the strength and direction of the linear relationship between \(x\) and \(y\). However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient \(r\) and the sample size \(n\), together. We perform a hypothesis test of the "significance of the correlation coefficient" to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.

The sample data are used to compute \(r\), the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, \(r\), is our estimate of the unknown population correlation coefficient.

  • The symbol for the population correlation coefficient is \(\rho\), the Greek letter "rho."
  • \(\rho =\) population correlation coefficient (unknown)
  • \(r =\) sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient \(\rho\) is "close to zero" or "significantly different from zero". We decide this based on the sample correlation coefficient \(r\) and the sample size \(n\).

If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is "significant."

  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between \(x\) and \(y\) because the correlation coefficient is significantly different from zero.
  • What the conclusion means: There is a significant linear relationship between \(x\) and \(y\). We can use the regression line to model the linear relationship between \(x\) and \(y\) in the population.

If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that correlation coefficient is "not significant".

  • Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between \(x\) and \(y\) because the correlation coefficient is not significantly different from zero."
  • What the conclusion means: There is not a significant linear relationship between \(x\) and \(y\). Therefore, we CANNOT use the regression line to model a linear relationship between \(x\) and \(y\) in the population.
  • If \(r\) is significant and the scatter plot shows a linear trend, the line can be used to predict the value of \(y\) for values of \(x\) that are within the domain of observed \(x\) values.
  • If \(r\) is not significant OR if the scatter plot does not show a linear trend, the line should not be used for prediction.
  • If \(r\) is significant and if the scatter plot shows a linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed \(x\) values in the data.

PERFORMING THE HYPOTHESIS TEST

  • Null Hypothesis: \(H_{0}: \rho = 0\)
  • Alternate Hypothesis: \(H_{a}: \rho \neq 0\)

WHAT THE HYPOTHESES MEAN IN WORDS:

  • Null Hypothesis \(H_{0}\): The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between \(x\) and \(y\) in the population.
  • Alternate Hypothesis \(H_{a}\): The population correlation coefficient IS significantly DIFFERENT FROM zero. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between \(x\) and \(y\) in the population.

DRAWING A CONCLUSION: There are two methods of making the decision. The two methods are equivalent and give the same result.

  • Method 1: Using the \(p\text{-value}\)
  • Method 2: Using a table of critical values

In this chapter of this textbook, we will always use a significance level of 5%, \(\alpha = 0.05\).

Using the \(p\text{-value}\) method, you could choose any appropriate significance level you want; you are not limited to using \(\alpha = 0.05\). But the table of critical values provided in this textbook assumes that we are using a significance level of 5%, \(\alpha = 0.05\). (If we wanted to use a different significance level than 5% with the critical value method, we would need different tables of critical values that are not provided in this textbook.)

METHOD 1: Using a \(p\text{-value}\) to make a decision

Using the TI-83, 83+, 84, or 84+ calculator:

To calculate the \(p\text{-value}\) using LinRegTTEST:

On the LinRegTTEST input screen, on the line prompt for \(\beta\) or \(\rho\), highlight "\(\neq 0\)"

The output screen shows the \(p\text{-value}\) on the line that reads "\(p =\)".

(Most computer statistical software can calculate the \(p\text{-value}\).)

If the \(p\text{-value}\) is less than the significance level ( \(\alpha = 0.05\) ):

  • Decision: Reject the null hypothesis.
  • Conclusion: "There is sufficient evidence to conclude that there is a significant linear relationship between \(x\) and \(y\) because the correlation coefficient is significantly different from zero."

If the \(p\text{-value}\) is NOT less than the significance level ( \(\alpha = 0.05\) )

  • Decision: DO NOT REJECT the null hypothesis.
  • Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between \(x\) and \(y\) because the correlation coefficient is NOT significantly different from zero."

Calculation Notes:

  • You will use technology to calculate the \(p\text{-value}\). The following describes the calculations to compute the test statistics and the \(p\text{-value}\):
  • The \(p\text{-value}\) is calculated using a \(t\)-distribution with \(n - 2\) degrees of freedom.
  • The formula for the test statistic is \(t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}\). The value of the test statistic, \(t\), is shown in the computer or calculator output along with the \(p\text{-value}\). The test statistic \(t\) has the same sign as the correlation coefficient \(r\).
  • The \(p\text{-value}\) is the combined area in both tails.

An alternative way to calculate the \(p\text{-value}\) ( \(p\) ) given by LinRegTTest is the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.

THIRD-EXAM vs FINAL-EXAM EXAMPLE: \(p\text{-value}\) method

  • Consider the third exam/final exam example.
  • The line of best fit is: \(\hat{y} = -173.51 + 4.83x\) with \(r = 0.6631\) and there are \(n = 11\) data points.
  • Can the regression line be used for prediction? Given a third exam score ( \(x\) value), can we use the line to predict the final exam score (predicted \(y\) value)?
  • \(H_{0}: \rho = 0\)
  • \(H_{a}: \rho \neq 0\)
  • \(\alpha = 0.05\)
  • The \(p\text{-value}\) is 0.026 (from LinRegTTest on your calculator or from computer software).
  • The \(p\text{-value}\), 0.026, is less than the significance level of \(\alpha = 0.05\).
  • Decision: Reject the Null Hypothesis \(H_{0}\)
  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score (\(x\)) and the final exam score (\(y\)) because the correlation coefficient is significantly different from zero.

Because \(r\) is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.
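These numbers can be checked with a short Python computation (scipy is assumed; r and n are taken from the example above):

```python
import math
from scipy import stats

r, n = 0.6631, 11   # values from the third exam / final exam example
df = n - 2          # 9 degrees of freedom

t = r * math.sqrt(df) / math.sqrt(1 - r**2)  # test statistic
p = 2 * stats.t.sf(abs(t), df)               # two-sided p-value

print(f"t = {t:.3f}, p = {p:.3f}")  # p is approximately 0.026, matching LinRegTTest
assert p < 0.05                     # so H0: rho = 0 is rejected
```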

METHOD 2: Using a table of Critical Values to make a decision

The 95% Critical Values of the Sample Correlation Coefficient Table can be used to give you a good idea of whether the computed value of \(r\) is significant or not . Compare \(r\) to the appropriate critical value in the table. If \(r\) is not between the positive and negative critical values, then the correlation coefficient is significant. If \(r\) is significant, then you may want to use the line for prediction.

Example \(\PageIndex{1}\)

Suppose you computed \(r = 0.801\) using \(n = 10\) data points. \(df = n - 2 = 10 - 2 = 8\). The critical values associated with \(df = 8\) are \(-0.632\) and \(+0.632\). If \(r <\) negative critical value or \(r >\) positive critical value, then \(r\) is significant. Since \(r = 0.801\) and \(0.801 > 0.632\), \(r\) is significant and the line may be used for prediction. If you view this example on a number line, it will help you.
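The table's critical values can themselves be reproduced from the t-distribution: \(r\) is significant at \(\alpha = 0.05\) exactly when its test statistic exceeds the t critical value, which rearranges to \(r_{crit} = t_{crit}/\sqrt{df + t_{crit}^{2}}\). A short Python sketch (scipy assumed):

```python
import math
from scipy import stats

def critical_r(n, alpha=0.05):
    """Two-sided critical value of the sample correlation
    coefficient for n data points (df = n - 2)."""
    df = n - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return t_crit / math.sqrt(df + t_crit**2)

print(round(critical_r(10), 3))  # df = 8 -> approximately 0.632, as in the example
```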


Exercise \(\PageIndex{1}\)

For a given line of best fit, you computed that \(r = 0.6501\) using \(n = 12\) data points and the critical value is 0.576. Can the line be used for prediction? Why or why not?

If the scatter plot looks linear then, yes, the line can be used for prediction, because \(r >\) the positive critical value.

Example \(\PageIndex{2}\)

Suppose you computed \(r = –0.624\) with 14 data points. \(df = 14 – 2 = 12\). The critical values are \(-0.532\) and \(0.532\). Since \(-0.624 < -0.532\), \(r\) is significant and the line can be used for prediction.


Exercise \(\PageIndex{2}\)

For a given line of best fit, you compute that \(r = 0.5204\) using \(n = 9\) data points, and the critical value is \(0.666\). Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction, because \(r <\) the positive critical value.

Example \(\PageIndex{3}\)

Suppose you computed \(r = 0.776\) and \(n = 6\). \(df = 6 - 2 = 4\). The critical values are \(-0.811\) and \(0.811\). Since \(-0.811 < 0.776 < 0.811\), \(r\) is not significant, and the line should not be used for prediction.


Exercise \(\PageIndex{3}\)

For a given line of best fit, you compute that \(r = -0.7204\) using \(n = 8\) data points, and the critical value is \(0.707\). Can the line be used for prediction? Why or why not?

Yes, the line can be used for prediction, because \(r <\) the negative critical value.

THIRD-EXAM vs FINAL-EXAM EXAMPLE: critical value method

Consider the third exam/final exam example. The line of best fit is: \(\hat{y} = -173.51 + 4.83x\) with \(r = 0.6631\) and there are \(n = 11\) data points. Can the regression line be used for prediction? Given a third-exam score ( \(x\) value), can we use the line to predict the final exam score (predicted \(y\) value)?

  • Use the "95% Critical Value" table for \(r\) with \(df = n - 2 = 11 - 2 = 9\).
  • The critical values are \(-0.602\) and \(+0.602\)
  • Since \(0.6631 > 0.602\), \(r\) is significant.
  • Conclusion:There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score (\(x\)) and the final exam score (\(y\)) because the correlation coefficient is significantly different from zero.

Example \(\PageIndex{4}\)

Suppose you computed the following correlation coefficients. Using the table at the end of the chapter, determine if \(r\) is significant and whether the line of best fit associated with each \(r\) can be used to predict a \(y\) value. If it helps, draw a number line.

  • \(r = –0.567\) and the sample size, \(n\), is \(19\). The \(df = n - 2 = 17\). The critical value is \(-0.456\). \(-0.567 < -0.456\) so \(r\) is significant.
  • \(r = 0.708\) and the sample size, \(n\), is \(9\). The \(df = n - 2 = 7\). The critical value is \(0.666\). \(0.708 > 0.666\) so \(r\) is significant.
  • \(r = 0.134\) and the sample size, \(n\), is \(14\). The \(df = 14 - 2 = 12\). The critical value is \(0.532\). \(0.134\) is between \(-0.532\) and \(0.532\) so \(r\) is not significant.
  • \(r = 0\) and the sample size, \(n\), is five. No matter what the \(df\) is, \(r = 0\) lies between the two critical values, so \(r\) is not significant.
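The four cases above reduce to a single rule: \(r\) is significant exactly when it falls outside the interval between the negative and positive critical values. A short sketch in pure Python (the function name is ours):

```python
def is_significant(r, critical_value):
    """r is significant when it lies outside [-critical_value, +critical_value]."""
    return abs(r) > critical_value

# (r, critical value) pairs from the four cases above
cases = [(-0.567, 0.456), (0.708, 0.666), (0.134, 0.532), (0.0, 0.532)]
print([is_significant(r, c) for r, c in cases])  # [True, True, False, False]
```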

Exercise \(\PageIndex{4}\)

For a given line of best fit, you compute that \(r = 0\) using \(n = 100\) data points. Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction no matter what the sample size is.

Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between \(x\) and \(y\) in the sample data provides strong enough evidence so that we can conclude that there is a linear relationship between \(x\) and \(y\) in the population.

The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatter plot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.

The assumptions underlying the test of significance are:

  • There is a linear relationship in the population that models the average value of \(y\) for varying values of \(x\). In other words, the expected value of \(y\) for each particular value lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population.)
  • The \(y\) values for any particular \(x\) value are normally distributed about the line. This implies that there are more \(y\) values scattered closer to the line than are scattered farther away. Assumption (1) implies that these normal distributions are centered on the line: the means of these normal distributions of \(y\) values lie on the line.
  • The standard deviations of the population \(y\) values about the line are equal for each value of \(x\). In other words, each of these normal distributions of \(y\) values has the same shape and spread about the line.
  • The residual errors are mutually independent (no pattern).
  • The data are produced from a well-designed, random sample or randomized experiment.

The left graph shows three sets of points. Each set falls in a vertical line. The points in each set are normally distributed along the line — they are densely packed in the middle and more spread out at the top and bottom. A downward sloping regression line passes through the mean of each set. The right graph shows the same regression line plotted. A vertical normal curve is shown for each line.

Linear regression is a procedure for fitting a straight line of the form \(\hat{y} = a + bx\) to data. The conditions for regression are:

  • Linear In the population, there is a linear relationship that models the average value of \(y\) for different values of \(x\).
  • Independent The residuals are assumed to be independent.
  • Normal The \(y\) values are distributed normally for any value of \(x\).
  • Equal variance The standard deviation of the \(y\) values is equal for each \(x\) value.
  • Random The data are produced from a well-designed random sample or randomized experiment.

The slope \(b\) and intercept \(a\) of the least-squares line estimate the slope \(\beta\) and intercept \(\alpha\) of the population (true) regression line. To estimate the population standard deviation of \(y\), \(\sigma\), use the standard deviation of the residuals, \(s\): \(s = \sqrt{\frac{SSE}{n-2}}\). The variable \(\rho\) (rho) is the population correlation coefficient. To test the null hypothesis \(H_{0}: \rho =\) hypothesized value, use a linear regression t-test. The most common null hypothesis is \(H_{0}: \rho = 0\), which indicates there is no linear relationship between \(x\) and \(y\) in the population. The TI-83, 83+, 84, and 84+ calculator function LinRegTTest can perform this test (STAT > TESTS > LinRegTTest).
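For readers without a TI calculator, the same t-test can be sketched in Python using the standard test statistic \(t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}\). The function name is ours, and we apply it to the third exam/final exam example (\(r = 0.6631\), \(n = 11\)):

```python
from math import sqrt

from scipy.stats import t

def correlation_t_test(r, n):
    """Two-tailed t-test of H0: rho = 0 against Ha: rho != 0."""
    df = n - 2
    t_stat = r * sqrt(df / (1 - r**2))
    p_value = 2 * t.sf(abs(t_stat), df)  # sf = 1 - cdf (upper-tail probability)
    return t_stat, p_value

t_stat, p = correlation_t_test(0.6631, 11)
print(round(t_stat, 2), round(p, 3))  # t ~ 2.66, p ~ 0.026 -> reject H0 at alpha = 0.05
```

Because \(p < 0.05\), this agrees with the critical-value method: the correlation is significantly different from zero.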

Formula Review

Least Squares Line or Line of Best Fit:

\[\hat{y} = a + bx\]

\[a = y\text{-intercept}\]

\[b = \text{slope}\]

Standard deviation of the residuals:

\[s = \sqrt{\frac{SSE}{n-2}}\]

\[SSE = \text{sum of squared errors}\]

\[n = \text{the number of data points}\]
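As a sketch, the residual standard deviation formula above can be computed directly from observed \(y\) values and fitted \(\hat{y}\) values; the data below are made up purely for illustration:

```python
from math import sqrt

def residual_std(y, y_hat):
    """s = sqrt(SSE / (n - 2)), where SSE is the sum of squared errors."""
    sse = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    return sqrt(sse / (len(y) - 2))

# Hypothetical observed values y and fitted values y-hat
y = [1.0, 2.0, 3.0, 5.0]
y_hat = [1.0, 2.0, 4.0, 4.0]
print(residual_std(y, y_hat))  # SSE = 2, n = 4 -> s = 1.0
```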
