
The Complete Guide: Hypothesis Testing in R

A hypothesis test is a formal statistical test we use to reject or fail to reject some statistical hypothesis.

This tutorial explains how to perform the following hypothesis tests in R:

  • One sample t-test
  • Two sample t-test
  • Paired samples t-test

We can use the t.test() function in R to perform each type of test. Its main arguments are listed below, and a sketch of the syntax follows the list:

  • x, y: The two samples of data.
  • alternative: The alternative hypothesis of the test.
  • mu: The value of the mean (or difference in means) under the null hypothesis.
  • paired: Whether to perform a paired t-test or not.
  • var.equal: Whether to assume the variances are equal between the samples.
  • conf.level: The confidence level to use.
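For reference, here is the default signature of t.test() from base R's stats package (a sketch of the documented interface; see ?t.test for details):

    t.test(x, y = NULL,
           alternative = c("two.sided", "less", "greater"),
           mu = 0, paired = FALSE, var.equal = FALSE,
           conf.level = 0.95, ...)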

The following examples show how to use this function in practice.

Example 1: One Sample t-test in R

A one sample t-test is used to test whether or not the mean of a population is equal to some value.

For example, suppose we want to know whether or not the mean weight of a certain species of turtle is equal to 310 pounds. We go out and collect a simple random sample of turtles with the following weights:

Weights : 300, 315, 320, 311, 314, 309, 300, 308, 305, 303, 305, 301, 303

The following code shows how to perform this one sample t-test in R:
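The code itself is not reproduced above; here is a sketch that matches the reported output (the vector name turtle_weights is our own choice):

    # Turtle weights from the sample above
    turtle_weights <- c(300, 315, 320, 311, 314, 309, 300,
                        308, 305, 303, 305, 301, 303)

    # One sample t-test: is the true mean weight equal to 310 pounds?
    t.test(x = turtle_weights, mu = 310)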

From the output we can see:

  • t-test statistic: -1.5848
  • degrees of freedom:  12
  • p-value:  0.139
  • 95% confidence interval for true mean:  [303.4236, 311.0379]
  • mean of turtle weights:  307.2308

Since the p-value of the test (0.139) is not less than .05, we fail to reject the null hypothesis.

This means we do not have sufficient evidence to say that the mean weight of this species of turtle is different from 310 pounds.

Example 2: Two Sample t-test in R

A two sample t-test is used to test whether or not the means of two populations are equal.

For example, suppose we want to know whether or not the mean weight between two different species of turtles is equal. To test this, we collect a simple random sample of turtles from each species with the following weights:

Sample 1 : 300, 315, 320, 311, 314, 309, 300, 308, 305, 303, 305, 301, 303

Sample 2 : 335, 329, 322, 321, 324, 319, 304, 308, 305, 311, 307, 300, 305

The following code shows how to perform this two sample t-test in R:
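The code block itself is not shown above; a sketch consistent with the reported output (a Welch test, since var.equal defaults to FALSE; the vector names follow the sample labels above) is:

    # Turtle weights for each species
    sample1 <- c(300, 315, 320, 311, 314, 309, 300, 308, 305, 303, 305, 301, 303)
    sample2 <- c(335, 329, 322, 321, 324, 319, 304, 308, 305, 311, 307, 300, 305)

    # Two sample (Welch) t-test
    t.test(x = sample1, y = sample2)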

  • t-test statistic: -2.1009
  • degrees of freedom:  19.112
  • p-value:  0.04914
  • 95% confidence interval for true mean difference: [-14.74, -0.03]
  • mean of sample 1 weights: 307.2308
  • mean of sample 2 weights:  314.6154

Since the p-value of the test (0.04914) is less than .05, we reject the null hypothesis.

This means we have sufficient evidence to say that the mean weight between the two species is not equal.

Example 3: Paired Samples t-test in R

A paired samples t-test is used to compare the means of two samples when each observation in one sample can be paired with an observation in the other sample.

For example, suppose we want to know whether or not a certain training program is able to increase the max vertical jump (in inches) of basketball players.

To test this, we may recruit a simple random sample of 12 college basketball players and measure each of their max vertical jumps. Then, we may have each player use the training program for one month and then measure their max vertical jump again at the end of the month.

The following data shows the max jump height (in inches) before and after using the training program for each player:

Before : 22, 24, 20, 19, 19, 20, 22, 25, 24, 23, 22, 21

After : 23, 25, 20, 24, 18, 22, 23, 28, 24, 25, 24, 20

The following code shows how to perform this paired samples t-test in R:
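As above, the code is not reproduced in the text; a sketch using the before/after values (vector names are our own) is:

    # Max vertical jump (in inches) before and after the training program
    before <- c(22, 24, 20, 19, 19, 20, 22, 25, 24, 23, 22, 21)
    after  <- c(23, 25, 20, 24, 18, 22, 23, 28, 24, 25, 24, 20)

    # Paired samples t-test
    t.test(x = before, y = after, paired = TRUE)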

  • t-test statistic: -2.5289
  • degrees of freedom:  11
  • p-value:  0.02803
  • 95% confidence interval for true mean difference: [-2.34, -0.16]
  • mean difference between before and after: -1.25

Since the p-value of the test (0.02803) is less than .05, we reject the null hypothesis.

This means we have sufficient evidence to say that the mean jump height before and after using the training program is not equal.

Additional Resources

Use the following online calculators to automatically perform various t-tests:

  • One Sample t-test Calculator
  • Two Sample t-test Calculator
  • Paired Samples t-test Calculator


Using R for Hypothesis Testing

This tutorial demonstrates step by step how to use R and Jupyter Notebook to conduct a two-tailed (two-sided) hypothesis test. It follows a step-by-step approach that begins with formulating the null and alternative hypotheses and ends by stating what the results of the test mean in plain English.

What is hypothesis testing?

Hypothesis testing is one of the cornerstones of inferential statistics. It is generally used to test whether some phenomenon observed in a sample is likely the result of random chance or is "real", that is, statistically significant.

What are the steps of hypothesis testing?

Hypothesis testing in statistics is usually challenging at first, but a step-by-step approach can help keep everything straight. We always start with a claim we would like to prove, e.g., "human activity is contributing to global warming." This is what we need to prove, and it is called the alternative hypothesis. Its opposite, "humans are not contributing to global warming," is the null hypothesis. To support the alternative hypothesis we need to collect data, and if the sample data deviates enough from what we can attribute to normal variation, we will have shown that the alternative hypothesis is much more likely than not. The global warming question is devilishly hard to test, and as you may have heard the jury is still out. But it does serve as a nice vignette.

When presented with the sorts of questions found in introductory statistics, you can use the following steps:

  • State the null and alternative hypotheses - the null is the status quo, and requires sufficient evidence to disprove. It is often easier to start with the alternative hypothesis and then present its opposite as the null.
  • Choose the level of significance at which you would reject the null - at what point would you be reasonably sure that the difference between the sample mean and the status quo mean is not the result of random chance? The level of significance is usually 0.05, but not always.
  • Choose a sample size to test your hypothesis. The textbook problem usually does this for you, but you will need to use the sample size to calculate your test statistic.
  • Determine the appropriate statistical technique to use, i.e. Z-test or t-test.
  • Determine the critical value(s) that separate the rejection region from the non-rejection region. You will always need to either reject or fail to reject the null hypothesis after you calculate your test statistic. This is the line (or lines) in the sand that allows you to do that.
  • Collect data and calculate the test statistic. Again, this step is usually done for you in textbook problems.
  • Compare the test statistic to the critical value(s). Is the test statistic beyond your line in the sand?
  • State the statistical decision - if the test statistic is beyond a boundary established by a critical value, then you reject the null hypothesis; otherwise you do not reject the null.
  • State your decision in terms of the original question, i.e. the results either do or do not support the alternative hypothesis.

The video tutorial presented here is an example of a so-called two-tailed hypothesis test, and walks through a problem using these steps.

You can download the Jupyter Notebook used in the video here

Two Sample t-test in R, with examples

In this article, we will discuss how to do a two sample t-test in R with some practical examples.

What is Two-sample t-test for mean?

A two sample t-test is used to determine whether there is a significant difference between two population means, based on two samples, when the population variances are unknown.

Conditions required to conduct two sample t-test for mean

Assumptions for Two Sample Mean t-test

  • Both population variances are unknown.
  • Both sample sizes are small (for large samples, a z-test is typically used instead).
  • Both samples are drawn at random from their respective populations.
  • The two samples are independent of each other.
  • Both populations follow a normal distribution.

Hypothesis for the two sample t-test for mean

Let x̅1 denote the sample mean for a random sample from population 1,

x̅2 denote the sample mean for a random sample from population 2,

µ1 denote the mean for population 1, and

µ2 denote the mean for population 2.

Null Hypothesis:

H0 : µ1 = µ2  (both population means are equal)

Alternative Hypothesis: Three forms of the alternative hypothesis are possible:

  • Ha : µ1 – µ2 < 0  The difference between the two population means is less than 0, i.e. the mean for population 1 is less than the mean for population 2. This is called a lower tail test (left-tailed test).
  • Ha : µ1 – µ2 > 0  The difference between the two population means is greater than 0, i.e. the mean for population 1 is greater than the mean for population 2. This is called an upper tail test (right-tailed test).
  • Ha : µ1 – µ2 ≠ 0  The difference between the two population means is not equal to 0, i.e. the mean for population 1 is not equal to the mean for population 2. This is called a two tail test.

The formula for the two sample t-test statistic (with unequal variances) is:

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}}} \]

where

x̅1 : sample mean for the sample from population 1

x̅2 : sample mean for the sample from population 2

µ1 : mean for population 1

µ2 : mean for population 2

n1 : size of the sample from population 1

n2 : size of the sample from population 2

s²1 : variance of sample 1

s²2 : variance of sample 2

df : degrees of freedom

When the variances are unequal and unknown, the degrees of freedom are given by the Welch–Satterthwaite formula:

\[ df = \frac{\left(\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}\right)^2}{\frac{\left(\frac{s^2_1}{n_1}\right)^2}{n_1 - 1} + \frac{\left(\frac{s^2_2}{n_2}\right)^2}{n_2 - 1}} \]

Function in R for t-test

To perform a two sample t-test for the mean we will use the t.test() function in R, from the stats package.

The t.test() function uses the following basic syntax:
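(The call below is a sketch showing the documented default values of the main arguments; see ?t.test for the full signature.)

    t.test(x, y, alternative = "two.sided", mu = 0,
           paired = FALSE, var.equal = FALSE, conf.level = 0.95)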

x, y: the two samples (numeric vectors) used in the test.

alternative: the alternative hypothesis for the test ("two.sided", "less", or "greater").

mu: the value of the mean (or difference in means) under the null hypothesis.

paired: whether the test is a paired t-test or not.

var.equal: a logical value indicating whether to treat the two variances as equal.

conf.level: the confidence level of the interval.

Summary of the two sample t-test for the mean

How to do a two sample t-test for the mean in R

We will calculate the test statistic by using a two sample t-test for the mean.

Procedure for the two sample t-test for the mean

Step 1:  Define the Null Hypothesis and Alternate Hypothesis.

Step 2:  Decide the level of significance α (alpha).

Step 3:  Calculate the test statistic using the t.test() function from R.

Step 4:  Interpret the two sample t-test results.

Step 5:  Determine the rejection criteria for the given confidence level and conclude the results whether the test statistic lies in the rejection region or non-rejection region.

Let’s see practical examples that show how to use the t.test() function in R.

Example for Two Sample t-test

Example 1: Two-tailed test in R with unknown but equal variances

Body weights of boys and girls in a class are known to be normally distributed, with a sample standard deviation of 25 for girls and 23 for boys.

A teacher wants to know if the mean body weight differs between girls and boys in the class, so she selects two random samples of boys and girls, each of size 20, from the class and records their weights.

We want to determine whether the mean weight is different between boys and girls at the 5% level of significance.

Solution: Given data:

sample size for boys (n1) = 20

sample size for girls (n2) = 20

sample standard deviation for boys (s1) = 23

sample standard deviation for girls (s2) = 25

Now we will solve this example with the step-by-step procedure.

Step 1: Define the null hypothesis and the alternate hypothesis.

µ1 denotes the mean weight for boys

µ2 denotes the mean weight for girls

Null Hypothesis: The mean body weights for girls and boys are equal.

H0 : µ1 = µ2

Alternate Hypothesis: The mean body weights for girls and boys are not equal.

Ha : µ1 ≠ µ2

Step 2: Level of significance (α) = 0.05

Step 3: Calculate the test statistic using the t.test() function in R with the code below.

Specify the alternative hypothesis as "two.sided" because we are performing a two-tailed test. The results are as follows.
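The article's code and raw measurements are not reproduced here, so the following is only an illustrative sketch: the vectors boys_weight and girls_weight are hypothetical placeholders, not the data behind the results interpreted below.

    # Hypothetical weight samples of size 20, for illustration only
    boys_weight  <- c(60, 72, 58, 65, 80, 55, 62, 70, 68, 75,
                      59, 66, 73, 61, 69, 77, 64, 71, 57, 74)
    girls_weight <- c(54, 63, 49, 70, 58, 66, 52, 61, 72, 57,
                      65, 50, 68, 55, 62, 74, 59, 53, 67, 60)

    # Two-tailed two sample t-test assuming equal (but unknown) variances
    t.test(x = boys_weight, y = girls_weight,
           alternative = "two.sided", var.equal = TRUE)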

Step 4:  Interpret the two sample test results.

How to interpret t-test results in R?

Let's see the interpretation of the t-test results in R.

data: This gives information about the data set used in the one-sample t-test; here the data vector is supplied as data.

t: The test statistic of the t-test. In our case the test statistic = -1.5119.

df: The degrees of freedom for the t-test statistic. In our case df = 5.

p-value: The p-value corresponding to the t-test statistic (-1.5119) and the degrees of freedom (5). In our case, the p-value is 0.09549.

alternative: The alternative hypothesis used for the t-test. In our case, the alternative hypothesis is that the true mean is less than 553, i.e. a left-tailed test.

95 percent confidence interval: This gives a 95% confidence interval for the true mean. Here the 95% confidence interval is (-∞, 553.8875].

sample estimates: This gives the sample mean. In our case the sample mean is 550.33.

Step 5: Determine the rejection criteria for the given confidence level and conclude whether the test statistic lies in the rejection region or the non-rejection region.

Conclusion:

Since the p-value (0.09549) is not less than the level of significance (α) = 0.05, we fail to reject the null hypothesis.

This means we do not have sufficient evidence to say that the mean weight of the almonds in the dry fruits is different from 553 grams.

data: This gives information about the vectors used in the z-test. x represents the data set for boys and y represents the data set for girls.

z: The test statistic of the z-test. In our case, the test statistic = -0.68424.

p-value: The p-value corresponding to the test statistic. In our case, the p-value is 0.4938.

alternative: The alternative hypothesis used for the z-test. In our case, the alternative hypothesis is that the IQ levels for girls and boys are not equal, i.e. a two-tailed test.

95 percent confidence interval: This gives a 95% confidence interval for the true difference in means. Here the 95% confidence interval is [-14.781532, 7.131532].

sample estimates: This gives the sample means. In our case, the sample mean for boys = 106.350 and the sample mean for girls = 110.175.

Since the p-value (0.4938) is greater than the level of significance (α) = 0.05, we fail to reject the null hypothesis.

This means we do not have sufficient evidence to say that the IQ levels for boys and girls in the 10th class are different.

Example 2: Left-tailed two sample test in R with known unequal variances

Two independent samples are taken from two shops in a small town. The first shop, A, sells "traditional" lime juice, while the second shop, B, sells a "special" Mojito. We select a random sample of daily sales for each drink (shop) and record sales for 35 days, to determine whether sales of the "special" Mojito outperformed sales of the "traditional" lime juice at the 5% level of significance. The population standard deviation for lime juice sales is 15 and for Mojito sales is 12.

sample size for lime juice sales (n1) = 35

sample size for Mojito sales (n2) = 35

population standard deviation for lime juice sales (σ1) = 15

population standard deviation for Mojito sales (σ2) = 12

Let's perform the z-test in this example with the step-by-step procedure.

µ1 denotes the mean for lime juice sales

µ2 denotes the mean for Mojito sales

Null Hypothesis: The mean sales for lime juice and Mojito are equal.

H0 : µ1 = µ2

Alternate Hypothesis: The mean sales for Mojito are greater than the mean sales for lime juice.

Ha : µ1 – µ2 < 0, i.e. µ2 > µ1

Step 3: Calculate the test statistic using the z.test() function in R with the code below.

    # z.test() is not in base R; it is available in the BSDA package
    library(BSDA)

    # Define the datasets for both drinks
    lime_juice_sales = c(56,65,37,47,66,76,75,31,80,45,34,42,42,23,67,47,45,50,45,42,
                         59,34,50,48,65,41,53,41,36,39,51,69,30,52,42)
    mojito_sales = c(51,47,53,40,70,49,63,71,47,62,65,62,56,74,49,33,80,60,46,65,
                     48,61,54,67,65,48,46,66,52,65,62,59,63,44,50)

    # Perform the two sample z-test
    z.test(x = lime_juice_sales, y = mojito_sales, mu = 0,
           sigma.x = 15, sigma.y = 12, alternative = "less")

Specify the alternative hypothesis as "less" because we are performing a left-tailed test. The results are as follows.

    # Results
    Two-sample z-Test

    data:  lime_juice_sales and mojito_sales
    z = -2.3582, p-value = 0.009181
    alternative hypothesis: true difference in means is less than 0
    95 percent confidence interval:
            NA -2.316342
    sample estimates:
    mean of x mean of y 
     49.28571  56.94286

data: This gives information about the vectors used in the z-test. x represents the data set for lime juice sales and y represents the data set for Mojito sales.

z: The test statistic of the z-test. In our case, the test statistic = -2.3582.

p-value: The p-value corresponding to the test statistic. In our case, the p-value is 0.009181.

alternative: The alternative hypothesis used for the z-test. In our case, the alternative hypothesis is that sales of Mojito are greater than sales of lime juice, i.e. a left-tailed test.

95 percent confidence interval: This gives a 95% confidence interval for the true difference in means (one-sided here, so only the upper bound is reported).

sample estimates: This gives the sample means. In our case, the sample mean for lime juice sales = 49.28571 and the sample mean for Mojito sales = 56.94286.

Since the p-value (0.009181) is less than the level of significance (α) = 0.05, we reject the null hypothesis.

This means we have sufficient evidence to say that sales of the Mojito drink outperformed sales of lime juice in the town.

A statistician claims that the average score on a logical reasoning test taken by students who major in Physics is less than that of students who major in English. The results of the exams, given to 22 Physics students and 33 English students, are shown here. Is there enough evidence to reject the statistician's claim at α = 0.05? Assume that the standard deviations for the two populations are not equal.

The two sample t-test is used to determine whether two population means are equal or not when the population variances are unknown.

I hope you found the above article on two sample t-test in R with Examples informative and educational.


Two-Tailed Test of Population Proportion

The null hypothesis of the two-tailed test about a population proportion can be expressed as follows:

\[ H_0 : p = p_0 \]

where p0 is a hypothesized value of the true population proportion p.

Let us define the test statistic z in terms of the sample proportion p̂ and the sample size n:

\[ z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}} \]

Then the null hypothesis of the two-tailed test is to be rejected if z ≤ −zα/2 or z ≥ zα/2, where zα/2 is the 100(1 − α/2) percentile of the standard normal distribution.

Suppose a coin toss turns up 12 heads out of 20 trials. At .05 significance level, can one reject the null hypothesis that the coin toss is fair?

The null hypothesis is that p = 0.5. We begin with computing the test statistic.
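The R code is not shown in the text above; a sketch that reproduces the reported value (variable names are illustrative) is:

    # 12 heads out of 20 tosses; hypothesized proportion p0 = 0.5
    pbar <- 12/20                                 # sample proportion
    p0   <- 0.5                                   # hypothesized value
    n    <- 20
    z    <- (pbar - p0) / sqrt(p0 * (1 - p0) / n)
    z                                             # about 0.89443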

We then compute the critical values at .05 significance level.
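Again as a sketch, the critical values can be computed with qnorm():

    alpha <- 0.05
    z.half.alpha <- qnorm(1 - alpha/2)            # upper critical value
    c(-z.half.alpha, z.half.alpha)                # about -1.9600  1.9600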

The test statistic 0.89443 lies between the critical values -1.9600 and 1.9600. Hence, at .05 significance level, we do not reject the null hypothesis that the coin toss is fair.

Alternative Solution 1

Instead of using the critical value, we apply the pnorm function to compute the two-tailed p-value of the test statistic. It doubles the upper tail p-value as the sample proportion is greater than the hypothesized value. Since it turns out to be greater than the .05 significance level, we do not reject the null hypothesis that p = 0 . 5 .
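A sketch of this computation, reusing the test statistic z computed above (about 0.89443):

    # Two-tailed p-value: double the upper tail area beyond the test statistic
    pval <- 2 * pnorm(z, lower.tail = FALSE)
    pval                                          # about 0.3711, greater than 0.05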

Alternative Solution 2

We apply the prop.test function to compute the p-value directly. The Yates continuity correction is disabled for pedagogical reasons.
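A sketch of the call (its p-value should agree with the one above, about 0.37):

    prop.test(x = 12, n = 20, p = 0.5, correct = FALSE)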


Two-Tailed Test of Population Proportion in R


A statistical hypothesis test puts your assumption about a population parameter to the test: you check whether the assumption remains plausible by computing a test statistic from sample data and comparing it against a reference value or distribution.

The conventional steps followed when carrying out a hypothesis test are listed as follows:

  • State null hypothesis (Ho) and alternate hypothesis (Ha)
  • Collect a relevant sample of data to test the hypothesis.
  • Choose a significance level for the hypothesis test.
  • Perform an appropriate statistical test.
  • Based on test statistics and p-value decide whether to reject or fail to reject your null hypothesis.

Generally, hypothesis testing is performed to draw conclusions about a population mean or a population proportion; in this article we discuss how to perform a two-tailed test of a population proportion. A two-tailed test is a method in which the critical area of the distribution is two-sided (both extremes), so it tests whether the sample statistic is significantly greater than or significantly less than the hypothesized value.

[Figure: Two-tailed test with α = 0.05 (0.025 in each tail)]

Hypothesis testing for proportions involves measuring two outcomes, like success or failure, true or false, good or bad, and so on in a defined set of trials. The probability of getting success or failure should be the same throughout the trial.  

Let us take a more realistic example. Today cybercrime is a major threat online, so consider a credit card fraud case. Assume a popular banking firm has conducted a study over several years and come up with a standard reference value: the probability of any transaction being fraudulent is 5%, which conversely means that the probability of any transaction being non-fraudulent is 95%. So here the probability of success is 0.95 and the probability of failure is 0.05, and assume these numbers hold for any credit card transaction.

One fine day a competing banking firm wishes to challenge these numbers and wants to conduct a hypothesis test to show that the probability of a fraudulent transaction is not equal to 5%. The competing firm samples 25 random transactions and finds that 2 of the 25 are fraudulent. Let us frame the population proportion hypotheses based on the above problem.

Null Hypothesis: The probability of any transaction being fraudulent is 5%: p = p0 (here, p0 = 0.05)
Alternate Hypothesis: The probability of any transaction being fraudulent is not equal to 5%: p ≠ p0
alpha: 0.05

Let us define the test statistic as follows:

z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}

  • p̂ = sample proportion (here 2/25 = 0.08)
  • p = hypothesized population proportion (here p0 = 0.05)
  • n = sample size (here 25)

Let us compute the test statistic using R as shown below.
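The code is not reproduced in the text; a sketch that matches the reported value (variable names are our own) is:

    # 2 fraudulent transactions out of 25; hypothesized proportion p0 = 0.05
    p_hat <- 2/25
    p0    <- 0.05
    n     <- 25
    z     <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    z                                             # about 0.6882472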

Now let us compute the critical values at a 0.05 significance level.
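Again as a sketch:

    alpha  <- 0.05
    z_crit <- qnorm(1 - alpha/2)
    c(-z_crit, z_crit)                            # about -1.96  1.96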

The computed test statistic 0.68824720 lies between the critical values -1.9600 and 1.9600. Hence, at the .05 significance level, we fail to reject the null hypothesis. In other words, we don’t have enough evidence to conclude that the probability of a fraudulent transaction is not equal to 5%.


Hypothesis Testing for Means & Proportions


Hypothesis Testing: Upper-, Lower, and Two Tailed Tests


The procedure for hypothesis testing is based on the ideas described above. Specifically, we set up competing hypotheses, select a random sample from the population of interest and compute summary statistics. We then determine whether the sample data supports the null or alternative hypotheses. The procedure can be broken down into the following five steps.  

  • Step 1. Set up hypotheses and select the level of significance α.

H 0 : Null hypothesis (no change, no difference);  

H 1 : Research hypothesis (investigator's belief); α =0.05

  • Step 2. Select the appropriate test statistic.  

The test statistic is a single number that summarizes the sample information. An example of a test statistic is the Z statistic for a test about a single mean, computed as follows:

\[ Z = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} \]

When the sample size is small, we will use t statistics (just as we did when constructing confidence intervals for small samples). As we present each scenario, alternative test statistics are provided along with conditions for their appropriate use.

  • Step 3.  Set up decision rule.  

The decision rule is a statement that tells under what circumstances to reject the null hypothesis. The decision rule is based on specific values of the test statistic (e.g., reject H 0 if Z > 1.645). The decision rule for a specific test depends on 3 factors: the research or alternative hypothesis, the test statistic and the level of significance. Each is discussed below.

  • The decision rule depends on whether an upper-tailed, lower-tailed, or two-tailed test is proposed. In an upper-tailed test the decision rule has investigators reject H 0 if the test statistic is larger than the critical value. In a lower-tailed test the decision rule has investigators reject H 0 if the test statistic is smaller than the critical value.  In a two-tailed test the decision rule has investigators reject H 0 if the test statistic is extreme, either larger than an upper critical value or smaller than a lower critical value.
  • The exact form of the test statistic is also important in determining the decision rule. If the test statistic follows the standard normal distribution (Z), then the decision rule will be based on the standard normal distribution. If the test statistic follows the t distribution, then the decision rule will be based on the t distribution. The appropriate critical value will be selected from the t distribution again depending on the specific alternative hypothesis and the level of significance.  
  • The third factor is the level of significance. The level of significance which is selected in Step 1 (e.g., α =0.05) dictates the critical value.   For example, in an upper tailed Z test, if α =0.05 then the critical value is Z=1.645.  

The following figures illustrate the rejection regions defined by the decision rule for upper-, lower- and two-tailed Z tests with α=0.05. Notice that the rejection regions are in the upper, lower and both tails of the curves, respectively. The decision rules are written below each figure.

[Figure: standard normal distribution with rejection region in the lower tail below -1.645, α = 0.05]

Rejection Region for Lower-Tailed Z Test (H1: μ < μ0) with α = 0.05

The decision rule is: Reject H0 if Z < -1.645.

[Figure: standard normal distribution with rejection regions in both tails beyond ±1.960, α = 0.05]

Rejection Region for Two-Tailed Z Test (H1: μ ≠ μ0) with α = 0.05

The decision rule is: Reject H0 if Z < -1.960 or if Z > 1.960.

The complete table of critical values of Z for upper, lower and two-tailed tests can be found in the table of Z values to the right in "Other Resources."

Critical values of t for upper, lower and two-tailed tests can be found in the table of t values in "Other Resources."

  • Step 4. Compute the test statistic.  

Here we compute the test statistic by substituting the observed sample data into the test statistic identified in Step 2.

  • Step 5. Conclusion.  

The final conclusion is made by comparing the test statistic (which is a summary of the information observed in the sample) to the decision rule. The final conclusion will be either to reject the null hypothesis (because the sample data are very unlikely if the null hypothesis is true) or not to reject the null hypothesis (because the sample data are not very unlikely).  

If the null hypothesis is rejected, then an exact significance level is computed to describe the likelihood of observing the sample data assuming that the null hypothesis is true. The exact level of significance is called the p-value and it will be less than the chosen level of significance if we reject H 0 .

Statistical computing packages provide exact p-values as part of their standard output for hypothesis tests. In fact, when using a statistical computing package, the steps outlined about can be abbreviated. The hypotheses (step 1) should always be set up in advance of any analysis and the significance criterion should also be determined (e.g., α =0.05). Statistical computing packages will produce the test statistic (usually reporting the test statistic as t) and a p-value. The investigator can then determine statistical significance using the following: If p < α then reject H 0 .  

  • Step 1. Set up hypotheses and determine level of significance

H0: μ = 191    H1: μ > 191    α = 0.05

The research hypothesis is that weights have increased, and therefore an upper tailed test is used.

  • Step 2. Select the appropriate test statistic.

Because the sample size is large (n > 30) the appropriate test statistic is

  • Step 3. Set up decision rule.  

In this example, we are performing an upper tailed test (H1: μ > 191), with a Z test statistic and selected α = 0.05. Reject H0 if Z > 1.645.

We now substitute the sample data into the formula for the test statistic identified in Step 2.  

We reject H0 because 2.38 > 1.645. We have statistically significant evidence at α = 0.05 to show that the mean weight of men in 2006 is more than 191 pounds. Because we rejected the null hypothesis, we now approximate the p-value, which is the likelihood of observing the sample data if the null hypothesis is true. An alternative definition of the p-value is the smallest level of significance where we can still reject H0. In this example, we observed Z = 2.38 and for α = 0.05, the critical value was 1.645. Because 2.38 exceeded 1.645 we rejected H0. In our conclusion we reported a statistically significant increase in mean weight at a 5% level of significance. Using the table of critical values for upper tailed tests, we can approximate the p-value. If we select α = 0.025, the critical value is 1.96, and we still reject H0 because 2.38 > 1.960. If we select α = 0.010 the critical value is 2.326, and we still reject H0 because 2.38 > 2.326. However, if we select α = 0.005, the critical value is 2.576, and we cannot reject H0 because 2.38 < 2.576. Therefore, the smallest α where we still reject H0 is 0.010. This is the p-value. A statistical computing package would produce a more precise p-value which would be in between 0.005 and 0.010. Here we are approximating the p-value and would report p < 0.010.
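For reference (not part of the original module, which works from tables), such an exact p-value can be obtained in R from the observed Z:

    # Upper tail area beyond the observed Z = 2.38
    1 - pnorm(2.38)      # about 0.0087, i.e. between 0.005 and 0.010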

Type I and Type II Errors

In all tests of hypothesis, there are two types of errors that can be committed. The first is called a Type I error and refers to the situation where we incorrectly reject H0 when in fact it is true. This is also called a false positive result (as we incorrectly conclude that the research hypothesis is true when in fact it is not). When we run a test of hypothesis and decide to reject H0 (e.g., because the test statistic exceeds the critical value in an upper tailed test) then either we make a correct decision because the research hypothesis is true or we commit a Type I error. The different conclusions are summarized in the table below. Note that we will never know whether the null hypothesis is really true or false (i.e., we will never know which row of the following table reflects reality).

Table - Conclusions in Test of Hypothesis

                    Do Not Reject H0       Reject H0
  H0 is True        Correct decision       Type I error
  H0 is False       Type II error          Correct decision

In the first step of the hypothesis test, we select a level of significance, α, and α= P(Type I error). Because we purposely select a small value for α, we control the probability of committing a Type I error. For example, if we select α=0.05, and our test tells us to reject H 0 , then there is a 5% probability that we commit a Type I error. Most investigators are very comfortable with this and are confident when rejecting H 0 that the research hypothesis is true (as it is the more likely scenario when we reject H 0 ).

When we run a test of hypothesis and decide not to reject H 0 (e.g., because the test statistic is below the critical value in an upper tailed test) then either we make a correct decision because the null hypothesis is true or we commit a Type II error. Beta (β) represents the probability of a Type II error and is defined as follows: β=P(Type II error) = P(Do not Reject H 0 | H 0 is false). Unfortunately, we cannot choose β to be small (e.g., 0.05) to control the probability of committing a Type II error because β depends on several factors including the sample size, α, and the research hypothesis. When we do not reject H 0 , it may be very likely that we are committing a Type II error (i.e., failing to reject H 0 when in fact it is false). Therefore, when tests are run and the null hypothesis is not rejected we often make a weak concluding statement allowing for the possibility that we might be committing a Type II error. If we do not reject H 0 , we conclude that we do not have significant evidence to show that H 1 is true. We do not conclude that H 0 is true.


 The most common reason for a Type II error is a small sample size.



Student's t-test in r and by hand: how to compare two groups under different scenarios.


Introduction


One of the most important tests within the branch of inferential statistics is the Student’s t-test. 1 The Student’s t-test for two samples is used to test whether two groups (two populations) are different in terms of a quantitative variable, based on the comparison of two samples drawn from these two groups. In other words, a Student’s t-test for two samples allows us to determine whether the two populations from which your two samples are drawn are different (with the two samples being measured on a quantitative continuous variable). 2

The reasoning behind this statistical test is that if your two samples are markedly different from each other, it can be assumed that the two populations from which the samples are drawn are different. On the contrary, if the two samples are rather similar, we cannot reject the hypothesis that the two populations are similar, so there is no sufficient evidence in the data at hand to conclude that the two populations from which the samples are drawn are different. Note that this statistical tool belongs to the branch of inferential statistics because conclusions drawn from the study of the samples are generalized to the population, even though we do not have the data on the entire population.

To compare two samples, it is usual to compare a measure of central tendency computed for each sample. In the case of the Student’s t-test, the mean is used to compare the two samples. However, in some cases, the mean is not appropriate to compare two samples so the median is used to compare them via the Wilcoxon test . This article being already quite long and complete, the Wilcoxon test is covered in a separate article , together with some illustrations on when to use one test or the other.

These two tests (Student’s t-test and Wilcoxon test) have the same final goal, that is, compare two samples in order to determine whether the two populations from which they were drawn are different or not. Note that the Student’s t-test is more powerful than the Wilcoxon test (i.e., it more often detects a significant difference if there is a true difference, so a smaller difference can be detected with the Student’s t-test) but the Student’s t-test is sensitive to outliers and data asymmetry. Furthermore, within each of these two tests, several versions exist, with each version using different formulas to arrive at the final result. It is thus necessary to understand the difference between the two tests and which version to use in order to carry out the appropriate analyses depending on the question and the data at hand.

In this article, I will first detail step by step how to perform all versions of the Student’s t-test for independent and paired samples by hand. The analyses will be done on a small set of observations for the sake of illustration and easiness. I will then show how to perform this test in R with the exact same data in order to verify the results found by hand. Reminders about the reasoning behind hypothesis testing , interpretations of the p -value and the results, and assumptions of this test will also be presented.

Note that the aim of this article is to show how to compute the Student’s t-test by hand and in R, so we refrain from testing the assumptions and we assume all of them are met for this exercise. For completeness, we still mention the assumptions, how to test them and what other tests exist if one is not met. Interested readers are invited to have a look at the end of the article for more information about these assumptions.

Null and alternative hypotheses

Before diving into the computations of the Student’s t-test by hand, let’s recap the null and alternative hypotheses of this test:

  • \(H_0\) : \(\mu_1 = \mu_2\)
  • \(H_1\) : \(\mu_1 \ne \mu_2\)

where \(\mu_1\) and \(\mu_2\) are the means of the two populations from which the samples were drawn.

As mentioned in the introduction, although technically the Student’s t-test is based on the comparison of the means of the two samples, the final goal of this test is actually to test the following hypotheses:

  • \(H_0\) : the two populations are similar
  • \(H_1\) : the two populations are different

This is in the general case where we simply want to determine whether the two populations are different or not (in terms of the dependent variable). In this sense, we have no prior belief about a particular population mean being larger or smaller than the other. This type of test is referred as a two-sided or bilateral test.

If we have some prior beliefs about one population mean being larger or smaller than the other, the Student’s t-test also allows to test the following hypotheses:

  • \(H_1\) : \(\mu_1 > \mu_2\)
  • \(H_1\) : \(\mu_1 < \mu_2\)

In the first case, we want to test if the mean of the first population is significantly larger than the mean of the second, while in the latter case, we want to test if the mean of the first population is significantly smaller than the mean of the second. This type of test is referred as a one-sided or unilateral test.

Some authors argue that one-sided tests should not be used in practice, for the simple reason that if a researcher is so sure that the mean of one population is larger (smaller) than the mean of the other and would never be smaller (larger) than the other, why would she need to test for significance at all? This is a rather philosophical question and it is beyond the scope of this article. Interested readers are invited to see part of the discussion in Rowntree ( 2000 ) .

Hypothesis testing

In statistics, many statistical tests take the form of hypothesis tests . Hypothesis tests are used to determine whether a certain belief can be deemed true (plausible) or not, based on the data at hand (i.e., the sample(s)). Most hypothesis tests boil down to the following 4 steps: 3

  • State the null and alternative hypothesis.
  • Compute the test statistic, denoted t-stat. Formulas to compute the test statistic differ among the different versions of the Student’s t-test but they have the same structure. See scenarios 1 to 5 below to see the different formulas.
  • Find the critical value given the theoretical statistical distribution of the test, the parameters of the distribution and the significance level \(\alpha\) . For a Student’s t-test and its extended version, it is either the normal or the Student’s t distribution ( t denoting the Student distribution and z denoting the normal distribution).
  • Conclude by comparing the t-stat (found in step 2.) with the critical value (found in step. 3). If the t-stat lies in the rejection region (determined thanks to the critical value and the direction of the test), we reject the null hypothesis, otherwise we do not reject the null hypothesis. These two alternatives (reject or do not reject the null hypothesis) are the only two possible solutions, we never “accept” an hypothesis. It is also a good practice to always interpret the decision in the terms of the initial question.

For the interested reader, see these 4 steps of hypothesis testing in more details in this article .

Different versions of the Student’s t-test

There are several versions of the Student’s t-test for two samples, depending on whether the samples are independent or paired and depending on whether the variances of the populations are (un)equal and/or (un)known:

On the one hand, independent samples means that the two samples are collected on different experimental units or different individuals, for instance when we are working on women and men separately, or working on patients who have been randomly assigned to a control and a treatment group (and a patient belongs to only one group). On the other hand, we face paired samples when measurements are collected on the same experimental units, same individuals. This is often the case, for example in medical studies, when testing the efficiency of a treatment at two different times. The same patients are measured twice, before and after the treatment, and the dependency between the two samples must be taken into account in the computation of the test statistic by working on the differences of measurements for each subject. Paired samples are usually the result of measurements at two different times, but not exclusively. Suppose we want to test the difference in vision between the left and right eyes of 50 athletes. Although the measurements are not made at two different time (before-after), it is clear that both eyes are dependent within each subject. Therefore, the Student’s t-test for paired samples should be used to account for the dependency between the two samples instead of the standard Student’s t-test for independent samples.

Another criteria for choosing the appropriate version of the Student’s t-test is whether the variances of the populations (not the variances of the samples!) are known or unknown and equal or unequal. This criteria is rather straightforward, we either know the variances of the populations or we do not. The variances of the populations cannot be computed because if you can compute the variance of a population, it means you have the data for the whole population, then there is no need to do a hypothesis test anymore… So the variances of the populations are either given in the statement (use them in that case), or there is no information about these variances and in this case, it is assumed that the variances are unknown. In practice, the variances of the populations are most of the time unknown and the only thing to do in order to choose the appropriate version of the test is to check whether the variances are equal or not. However, we still illustrate how to do all versions of this test by hand and in R in the next sections following the 4 steps of hypothesis testing.

How to compute Student’s t-test by hand?

Note that the data are artificial and do not represent any real variable. Furthermore, remember that the assumptions may or may not be met. The point of the article is to detail how to compute the different versions of the test by hand and in R, so all assumptions are assumed to be met. Moreover, assume that the significance level \(\alpha = 5\) % for all tests.

If you are interested in applying these tests by hand without having to do the computations yourself, here is a Shiny app which does it for you. You just need to enter the data and choose the appropriate version of the test thanks to the sidebar menu. There is also a graphical representation that helps you to visualize the test statistic and the rejection region. I hope you will find it useful!

Scenario 1: Independent samples with 2 known variances

For the first scenario, suppose the data below. Moreover, suppose that the two samples are independent, that the variances \(\sigma^2 = 1\) in both populations and that we would like to test whether the two population means are different.

So we have:

  • 5 observations in each sample: \(n_1 = n_2 = 5\)
  • mean of sample 1: \(\bar{x}_1 = 0.02\)
  • mean of sample 2: \(\bar{x}_2 = 0.06\)
  • variances of both populations: \(\sigma^2_1 = \sigma^2_2 = 1\)

Following the 4 steps of hypothesis testing we have:

  • \(H_0: \mu_1 = \mu_2\) and \(H_1: \mu_1 - \mu_2 \ne 0\) . ( \(\ne\) because we want to test whether the two means are different, we do not impose a direction in the test.)
  • Test statistic: \[z_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma^2_1}{n_1} + \frac{\sigma^2_2}{n_2}}}\] \[= \frac{0.02-0.06-0}{0.632} = -0.063\]
  • Critical value: \(\pm z_{\alpha / 2} = \pm z_{0.025} = \pm 1.96\) (see a guide on how to read statistical tables if you struggle to find the critical value)
  • Conclusion: The rejection regions are thus from \(-\infty\) to -1.96 and from 1.96 to \(+\infty\) . The test statistic is outside the rejection regions so we do not reject the null hypothesis \(H_0\) . In terms of the initial question: At the 5% significance level, we do not reject the hypothesis that the two population means are the same, or there is no sufficient evidence in the data to conclude that the two populations considered are different.

Scenario 2: Independent samples with 2 equal but unknown variances

For the second scenario, suppose the data below. Moreover, suppose that the two samples are independent, that the variances in both populations are unknown but equal ( \(\sigma^2_1 = \sigma^2_2\) ) and that we would like to test whether the mean of population 1 is larger than the mean of population 2.

  • 6 observations in sample 1: \(n_1 = 6\)
  • 5 observations in sample 2: \(n_2 = 5\)
  • mean of sample 1: \(\bar{x}_1 = 1.247\)
  • mean of sample 2: \(\bar{x}_2 = 0.1\)
  • variance of sample 1: \(s^2_1 = 0.303\)
  • variance of sample 2: \(s^2_2 = 0.315\)
  • \(H_0: \mu_1 = \mu_2\) and \(H_1: \mu_1 - \mu_2 > 0\) . (> because we want to test if the mean of the first population is larger than the mean of the second population.)
  • Test statistic: \[t_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\] where \[s_p = \sqrt{\frac{(n_1-1)s^2_1+ (n_2 - 1)s^2_2}{n_1 + n_2 - 2}} = 0.555\] so \[t_{obs} = \frac{1.247-0.1-0}{0.555 * 0.606} = 3.411\] (Note that as it is assumed the variances of the two populations are equal, a pooled (common) variance, denoted \(s_p\) , is computed.)
  • Critical value: \(t_{\alpha, n_1 + n_2 - 2} = t_{0.05, 9} = 1.833\)
  • Conclusion: The rejection region is thus from 1.833 to \(+\infty\) (there is only one rejection region because it is a one-sided test). The test statistic lies within the rejection region so we reject the null hypothesis \(H_0\) . In terms of the initial question: At the 5% significance level, we conclude that the mean of population 1 is larger than the mean of population 2.

Scenario 3: Independent samples with 2 unequal and unknown variances

For the third scenario, suppose the data below. Moreover, suppose that the two samples are independent, that the variances in both populations are unknown and unequal ( \(\sigma^2_1 \ne \sigma^2_2\) ) and that we would like to test whether the mean of population 1 is smaller than the mean of population 2.

  • 5 observations in sample 1: \(n_1 = 5\)
  • 6 observations in sample 2: \(n_2 = 6\)
  • mean of sample 1: \(\bar{x}_1 = 0.42\)
  • mean of sample 2: \(\bar{x}_2 = 1.247\)
  • variance of sample 1: \(s^2_1 = 0.107\)
  • variance of sample 2: \(s^2_2 = 0.303\)
  • \(H_0: \mu_1 = \mu_2\) and \(H_1: \mu_1 - \mu_2 < 0\) . (< because we want to test if the mean of the first population is smaller than the mean of the second population.)
  • Test statistic: \[t_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}}}\] \[= \frac{0.42-1.247-0}{0.268} = -3.084\]
  • Critical value: \(-t_{\alpha, \upsilon}\) where \[\upsilon = \frac{\bigg(\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2} \bigg)^2}{\frac{\bigg(\frac{s^2_1}{n_1}\bigg)^2}{n_1 - 1} + \frac{\bigg(\frac{s^2_2}{n_2}\bigg)^2}{n_2 - 1}} = 8.28\] so \[-t_{0.05, 8.28} = -1.851\] Note: The degrees of freedom 8.28 does not exist in the standard Student distribution table, so simply take 8, or compute it in R with qt(p = 0.05, df = 8.28) . For simplicity, this number of degrees of freedom is sometimes approximated as \(df = min(n_1 - 1, n_2 - 1)\) , so in this case it would be \(df = 4\) .
  • Conclusion: The rejection region is thus from \(-\infty\) to -1.851. The test statistic lies within the rejection region so we reject the null hypothesis \(H_0\) . In terms of the initial question: At the 5% significance level, we conclude that the mean of population 1 is smaller than the mean of population 2.

Scenario 4: Paired samples where the variance of the differences is known

The Student’s t-test with paired samples is a bit different from the test with independent samples; it is actually more similar to a one sample Student’s t-test . Here is how it works. We actually compute the difference between the two samples for each pair of observations, and then we work on these differences as if we were doing a one sample Student’s t-test, by computing the test statistic on these differences.

In case it is not clear, here is the fourth scenario as an illustration. Suppose the data below. Moreover, suppose that the two samples are dependent (matched), that the variance of the differences in the population is known and equal to 1 ( \(\sigma^2_D = 1\) ) and that we would like to test whether the mean difference between the two populations is different than 0.

The first thing to do is to compute the differences for all pairs of observations:

  • number of pairs: \(n = 5\)
  • mean of the difference: \(\bar{D} = 0.04\)
  • variance of the difference in the population: \(\sigma^2_D = 1\)
  • standard deviation of the difference in the population: \(\sigma_D = 1\)
  • \(H_0: \mu_D = 0\) and \(H_1: \mu_D \ne 0\)
  • Test statistic: \[z_{obs} = \frac{\bar{D} - \mu_0}{\frac{\sigma_D}{\sqrt{n}}} = \frac{0.04-0}{0.447} = 0.089\] (This formula is exactly the same as the one for the one sample Student’s t-test with a known variance, except that we work on the mean of the differences.)
  • Critical value: \(\pm z_{\alpha/2} = \pm z_{0.025} = \pm 1.96\)
  • Conclusion: The rejection regions are thus from \(-\infty\) to -1.96 and from 1.96 to \(+\infty\) . The test statistic is outside the rejection regions so we do not reject the null hypothesis \(H_0\) . In terms of the initial question: At the 5% significance level, we do not reject the hypothesis that the mean difference between the two populations is equal to 0.

For the fifth and final scenario, suppose the data below. Moreover, suppose that the two samples are dependent (matched), that the variance of the differences in the population is unknown and that we would like to test whether a treatment is effective in increasing running capabilities (the higher the value, the better in terms of running capabilities).

  • number of pairs: \(n = 5\)
  • mean of the difference: \(\bar{D} = 8\)
  • variance of the difference in the sample: \(s^2_D = 16\)
  • standard deviation of the difference in the sample: \(s_D = 4\)
  • \(H_0: \mu_D = 0\) and \(H_1: \mu_D > 0\) (> because we would like to test whether the treatment is effective, so whether the treatment has a positive impact on the running capabilities.)
  • Test statistic: \[t_{obs} = \frac{\bar{D} - \mu_0}{\frac{s_D}{\sqrt{n}}} = \frac{8-0}{1.789} = 4.472\] (This formula is exactly the same as the one for the one sample Student’s t-test with an unknown variance, except that we work on the mean of the differences.)
  • Critical value: \(t_{\alpha, n-1} = t_{0.05, 4} = 2.132\) ( n is the number of pairs, not the number of observations!)
  • Conclusion: The rejection region is thus from 2.132 to \(+\infty\) . The test statistic lies within the rejection region so we reject the null hypothesis \(H_0\) . In terms of the initial question: At the 5% significance level, we conclude that the treatment has a positive impact on the running capabilities (because the mean of the differences is greater than 0).

This concludes how to perform the different versions of the Student’s t-test for two samples by hand. In the next sections, we detail how to perform the exact same tests in R.

How to compute Student’s t-test in R?

A good practice before doing t-tests in R is to visualize the data by group thanks to a boxplot (or a density plot , or possibly both). A boxplot with the two boxes overlapping each other gives a first indication that the two samples are similar, and thus, that the null hypothesis of equal means may not be rejected. On the contrary, if the two boxes are not overlapping, it indicates that the two samples are not similar, and thus, that the populations may be different in terms of the considered variable. However, even if boxplots or density plots are great in showing a comparison between the two groups, only a sound statistical test will confirm our first impression.

After a visualization of the data by group, we replicate in R the results found by hand. We will see that for some versions of the t-test, there is no default function built into R (at least to my knowledge, do not hesitate to let me know in the comments if I’m mistaken). In these cases, a function is written to replicate the results by hand.

Note that we use the same data, the same assumptions and the same question for all 5 scenarios to facilitate the comparison between the tests performed by hand and in R.


Note that you can use the {esquisse} RStudio addin if you want to draw a boxplot with the package {ggplot2} without writing the code yourself. If you prefer the default graphics, use the boxplot() function:
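For instance, a base-R boxplot could look like this, assuming the scenario data are stored in a long-format data frame dat with a numeric column value and a grouping column sample (these object and column names are my assumption):

```r
# Boxplot of the measurements by group, base R graphics
boxplot(value ~ sample,
        data = dat,
        ylab = "Value",
        main = "Scenario 1")
```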


The two boxes seem to overlap, which illustrates that the two samples are quite similar, so we tend to believe that we will not be able to reject the null hypothesis that the two population means are similar. However, only a formal statistical test will confirm this belief.

Below is a function to perform a t-test with known variances, with arguments accepting:

  • the two samples ( x and y ),
  • the two variances of the populations ( V1 and V2 ),
  • the difference in means under the null hypothesis ( m0 , default is 0 ),
  • the significance level ( alpha , default is 0.05 )
  • and the alternative ( alternative , one of "two.sided" (default), "less" or "greater" ):
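Since the original code block is not reproduced here, below is a minimal sketch of such a function. The name t.test_knownvar and the exact output format are my choices; it implements the known-variance statistic described in the by-hand section (which follows a standard normal distribution under the null hypothesis):

```r
# Two-sample test with known population variances
t.test_knownvar <- function(x, y, V1, V2, m0 = 0, alpha = 0.05,
                            alternative = "two.sided") {
  M1 <- mean(x)
  M2 <- mean(y)
  n1 <- length(x)
  n2 <- length(y)

  # Test statistic (standard normal under H0)
  zobs <- (M1 - M2 - m0) / sqrt(V1 / n1 + V2 / n2)

  # p-value according to the chosen alternative
  p <- switch(alternative,
              two.sided = 2 * pnorm(-abs(zobs)),
              less      = pnorm(zobs),
              greater   = 1 - pnorm(zobs))

  list(statistic = zobs, p.value = p, alternative = alternative,
       mean_x = M1, mean_y = M2, var_pop_x = V1, var_pop_y = V2,
       alpha = alpha)
}

# Example call (x1 and x2 being the two samples of scenario 1,
# with both population variances known and equal to 1):
# res <- t.test_knownvar(x1, x2, V1 = 1, V2 = 1)
```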

The output above recaps all the information needed to perform the test: the test statistic, the p -value, the alternative used, the two sample means and the two variances of the populations (compare these results found in R with the results found by hand).

The p -value can be extracted as usual:
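Assuming the result of the sketch above has been stored in an object res, the p-value is simply one element of the returned list:

```r
res$p.value
```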

The p -value is 0.95 so at the 5% significance level we do not reject the null hypothesis of equal means. There is no sufficient evidence in the data to reject the hypothesis that the two means in the populations are similar. This result confirms what we found by hand.

Note that a similar function exists in the {BSDA} package: 4
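Something along these lines, bearing in mind that the z.test() function from {BSDA} expects the population standard deviations rather than the variances (x1 and x2 are assumed sample vectors):

```r
# install.packages("BSDA")
library(BSDA)

# Known population standard deviations passed via sigma.x and sigma.y
z.test(x1, x2, sigma.x = 1, sigma.y = 1)
```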

For those unfamiliar with the concept of p -value, the p -value is a probability and as any probability it goes from 0 to 1. The p -value is the probability of having observations at least as extreme as the one we measured (via the samples) if the null hypothesis were true . In other words, it is the probability of having a test statistic at least as extreme as the one we computed, given that the null hypothesis is true. In some sense, it gives you an indication of how plausible your null hypothesis is . It is also defined as the smallest level of significance for which the data indicate rejection of the null hypothesis.

If the observations are not so extreme—not unlikely to occur if the null hypothesis were true—we do not reject this null hypothesis because it is deemed plausible to be true. And if the observations are considered too extreme—too unlikely to happen under the null hypothesis—we reject the null hypothesis because it is deemed too implausible to be true. Note that it does not mean that we are 100% sure that it is too unlikely, it happens sometimes that the null hypothesis is rejected although it is true (see the significance level \(\alpha\) later on).

In our example above, the observations are not really extreme and the difference between the two means is not extreme, so the test statistic is not extreme (since the test statistic is partially based on the difference of the means of the two samples). Having a test statistic which is not extreme is not unlikely, and that is the reason why the p -value is quite high. The p -value of 0.95 actually tells us that the probability of observing two samples with a difference in means at least as extreme as -0.04 (= 0.02 - 0.06), given that the difference in means in the populations is 0 (the null hypothesis), equals 95%. A probability of 95% is definitely considered plausible, so we do not reject the null hypothesis of equal means in the populations.

One may then wonder, “What is too extreme for a test statistic?” Most of the time, we consider that a test statistic is too extreme to happen just by chance when the probability of having such an extreme test statistic, given that the null hypothesis is true, is below 5%. The threshold of 5% ( \(\alpha = 0.05\) ) that you very often see in statistics courses or textbooks is the threshold used in many fields. With a p -value under that threshold of 5%, we consider that the observations (and thus the test statistic) are too unlikely to have happened just by chance if the null hypothesis were true, so the null hypothesis is rejected. With a p -value above that threshold of 5%, we consider that it is not really implausible to face the observations we have if the null hypothesis were true, and we therefore do not reject the null hypothesis.

Note that I wrote “we do not reject the null hypothesis”, and not “we accept the null hypothesis”. This is because it may be the case that the null hypothesis is in fact false, but we failed to prove it with the samples. Consider the analogy of a suspect accused of murder when we do not know the truth. On the one hand, if we have collected enough evidence that the suspect committed the murder, he is considered guilty: we reject the null hypothesis that he is innocent. On the other hand, if we have not collected enough evidence against the suspect, he is presumed to be innocent although he may in fact have committed the crime: we fail to reject the null hypothesis of him being innocent. We are never sure that he did not commit the crime even if he is released; we just did not find sufficient evidence against the null hypothesis of the suspect being innocent. This is the reason why we do not reject the null hypothesis instead of accepting it, and why you will often read things like “there is not sufficient evidence in the data to reject the null hypothesis” or “based on the samples we fail to reject the null hypothesis”.

The significance level \(\alpha\) , derived from the threshold of 5% mentioned earlier, is the probability of rejecting the null hypothesis when it is in fact true . In this sense, it is an error (of 5%) that we accept to deal with, in order to be able to draw conclusions. If we accepted no error (an error of 0%), we would not be able to draw any conclusion about the population(s) since we only have access to a limited portion of the population(s) via the sample(s). As a consequence, we will never be 100% sure when interpreting the result of a hypothesis test unless we have access to the data for the entire population, but then there is no reason to do a hypothesis test anymore since we can simply compare the two populations. We usually allow this error (called the Type I error) to be 5%, but in order to be more certain when concluding that we reject the null hypothesis, the alpha level can also be set to 1% (or even to 0.1% in some rare cases).

To sum up what you need to remember about p -value and significance level \(\alpha\) :

  • If the p -value is smaller than the predetermined significance level \(\alpha\) (usually 5%) so if p -value < 0.05 \(\rightarrow H_0\) is unlikely \(\rightarrow\) we reject the null hypothesis
  • If the p -value is greater than or equal to the predetermined significance level \(\alpha\) (usually 5%) so if p -value \(\ge\) 0.05 \(\rightarrow H_0\) is likely \(\rightarrow\) we do not reject the null hypothesis

This applies to all statistical tests without exception. Of course, the null and alternative hypotheses change depending on the test.

A rule of thumb is that, for most hypothesis tests, the alternative hypothesis is what you want to test and the null hypothesis is the status quo. Take this with extreme caution (!) because, even if it works for all versions of the Student’s t-test, it does not apply to ALL statistical tests. For example, when testing for normality, you usually want to test whether your distribution follows a normal distribution. Following this piece of advice, you would write the alternative hypothesis \(H_1:\) the distribution follows a normal distribution. Nonetheless, for normality tests such as the Shapiro-Wilk or Kolmogorov-Smirnov test, it is the opposite; the alternative hypothesis is \(H_1:\) the distribution does not follow a normal distribution. So for every test, make sure to use the correct hypotheses, otherwise the conclusion and interpretation of your test will be wrong.

Last but not least, note that statistical significance is not equal to scientific significance. To this end, a result may be statistically significant (a p -value < \(\alpha\) ), but of little or no interest from a scientific point of view (because the effect is so small that it is negligible and/or useless for instance).


Unlike the previous scenario, the two boxes do not overlap which illustrates that the two samples are different from each other. From this boxplot, we can expect the test to reject the null hypothesis of equal means in the populations. Nonetheless, only a formal statistical test will confirm this expectation.

There is a function in R, and it is simply the t.test() function. This version of the test is actually the “standard” Student’s t-test for two samples. Note that it is assumed that the variances of the two populations are equal so we need to specify it in the function with the argument var.equal = TRUE (the default is FALSE ) and the alternative hypothesis is \(H_1: \mu_1 - \mu_2 > 0\) so we need to add the argument alternative = "greater" as well:
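A sketch of that call, assuming the two samples are stored in vectors x1 and x2 (the object names are assumptions, not the article's code):

```r
# Standard two-sample Student's t-test (equal variances assumed),
# one-sided alternative: mu1 - mu2 > 0
t.test(x1, x2,
       var.equal = TRUE,
       alternative = "greater")
```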

The output above recaps all the information needed to perform the test: the name of the test, the test statistic, the degrees of freedom, the p -value, the alternative used and the two sample means (compare these results found in R with the results found by hand).

The p -value is 0.004 so at the 5% significance level we reject the null hypothesis of equal means. This result confirms what we found by hand.

Unlike the first scenario, the p -value in this scenario is below 5% so we reject the null hypothesis. At the 5% significance level, we can conclude that the mean of population 1 is larger than the mean of population 2.

A nice and easy way to report results of a Student’s t-test in R is with the report() function from the {report} package:
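Something along these lines (the {report} package can summarise the object returned by t.test(); the exact wording of the output may vary with the package version):

```r
# install.packages("report")
library(report)

report(t.test(x1, x2, var.equal = TRUE, alternative = "greater"))
```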

As you can see, the function interprets the test (together with the p -value) for you.

Note that the report() function can be used for other analyses. See more tips and tricks in R if you find this one useful.

If your data is formatted in the long format (which is even better), simply use the tilde ( ~ ). For instance, imagine the exact same data presented like this:
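For illustration, here is one way the same measurements could be reshaped into a long format, with one column holding the values and one column holding the group labels (dat2, value and sample are assumed names):

```r
# From two vectors x1 and x2 to a long-format data frame
dat2 <- data.frame(
  value  = c(x1, x2),
  sample = rep(c("1", "2"), times = c(length(x1), length(x2)))
)

head(dat2)
```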

Here is how to perform the Student’s t-test in R with data in the long format:
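With that layout, the call uses a formula value ~ sample instead of two vectors (same arguments otherwise):

```r
t.test(value ~ sample,
       data = dat2,
       var.equal = TRUE,
       alternative = "greater")
```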

The results are exactly the same.


There is a function in R for this version of the test as well, and it is simply the t.test() function with the var.equal = FALSE argument. FALSE is the default option for the var.equal argument so you actually do not need to specify it. This version of the test is actually the Welch t-test, used when the variances of the populations are unknown and unequal. To test whether two population variances are equal, you can use Levene’s test ( leveneTest(dat3$value, dat3$sample) from the {car} package), or simply compare the dispersion of the two samples via a dotplot or a boxplot. Note that the alternative hypothesis is \(H_1: \mu_1 - \mu_2 < 0\) so we need to add the argument alternative = "less" as well:
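A sketch of that call (x1 and x2, or equivalently a long-format data frame dat3 with columns value and sample, are assumed object names):

```r
# Welch two-sample t-test (unequal variances),
# one-sided alternative: mu1 - mu2 < 0
t.test(x1, x2,
       var.equal = FALSE,   # the default, shown here for clarity
       alternative = "less")
```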

The output above recaps all the information needed to perform the test (compare these results found in R with the results found by hand).

The p -value is 0.007 so at the 5% significance level we reject the null hypothesis of equal means, meaning that we can conclude that the mean of population 1 is smaller than the mean of population 2. This result confirms what we found by hand.

For the fourth scenario, suppose the data below. Moreover, suppose that the two samples are dependent (matched), that the variance of the differences in the population is known and equal to 1 ( \(\sigma^2_D = 1\) ) and that we would like to test whether the mean difference between the two populations is different from 0.


Since there is no function in R to perform a t-test with paired samples where the variance of the differences is known, here is one with arguments accepting:

  • the differences between the two samples ( x ),
  • the variance of the differences in the population ( V ),
  • the mean of the differences under the null hypothesis ( m0 , default is 0 ),
  • and, as in the previous function, the significance level ( alpha ) and the alternative ( alternative ):
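Here is a minimal sketch of such a function. The name z.test_paired and the exact output format are my choices; it simply applies the one-sample known-variance statistic to the differences, as in the by-hand section:

```r
# Paired test when the variance of the differences is known
z.test_paired <- function(x, V, m0 = 0, alpha = 0.05,
                          alternative = "two.sided") {
  n <- length(x)                        # number of pairs (x = differences)
  zobs <- (mean(x) - m0) / sqrt(V / n)  # test statistic

  p <- switch(alternative,
              two.sided = 2 * pnorm(-abs(zobs)),
              less      = pnorm(zobs),
              greater   = 1 - pnorm(zobs))

  list(statistic = zobs, p.value = p, alternative = alternative,
       mean_diff = mean(x), var_pop_diff = V, alpha = alpha)
}

# Example: differences between the two (hypothetical) measurement vectors,
# with the variance of the differences known to be 1
# z.test_paired(x = after - before, V = 1)
```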

The p -value is 0.929 so at the 5% significance level we do not reject the null hypothesis of the mean of the differences being equal to 0. There is no sufficient evidence in the data to reject the hypothesis that the mean difference between the two populations is equal to 0. This result confirms what we found by hand.


There is a function in R for this version of the test, and it is simply the t.test() function with the paired = TRUE argument. This version of the test is actually the standard version of the Student’s t-test with paired samples. Note that the alternative hypothesis is \(H_1: \mu_D > 0\) so we need to add the argument alternative = "greater" as well:
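A sketch of the call, assuming the two measurement vectors are named after and before (hypothetical names, consistent with the note that follows):

```r
# Paired Student's t-test, one-sided alternative: mu_D > 0
t.test(after, before,
       paired = TRUE,
       alternative = "greater")
```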

Note that we wrote after and then before in this order. If you write before and then after , make sure to change the alternative to alternative = "less" .

If your data is in the long format, use the tilde ~ :
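Something like the following, assuming a long-format data frame dat5 with a value column and a time column (before/after). Note that recent R versions have deprecated the use of paired = TRUE with the formula interface of t.test(), so the two-vector form above may be the safer option:

```r
t.test(value ~ time,
       data = dat5,
       paired = TRUE,
       alternative = "greater")
```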

The p -value is 0.006 so at the 5% significance level we reject the null hypothesis of the mean of the differences being equal to 0, meaning that we can conclude that the treatment is effective in increasing the running capabilities (because the mean of the differences is greater than 0). This result confirms what we found by hand.

Combination of plot and statistical test

After having written this article, I discovered the {ggstatsplot} package which I believe is worth mentioning here, in particular the ggbetweenstats() and ggwithinstats() functions for independent and paired samples, respectively.

These two functions combine a boxplot—representing the distribution for each group—and the results of the statistical test displayed in the subtitle of the plot.

See examples below for scenarios 2, 3 and 5. Unfortunately, the package does not allow running tests for scenarios 1 and 4.

The ggbetweenstats() function is used for independent samples:
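A sketch of such a call, reusing the assumed long-format data frame dat2 with columns sample and value (the defaults of {ggstatsplot}, such as a two-sided alternative, apply):

```r
# install.packages("ggstatsplot")
library(ggstatsplot)

ggbetweenstats(
  data = dat2,
  x    = sample,     # grouping variable
  y    = value,      # measurements
  var.equal = TRUE   # pooled-variance Student's t-test
)
```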


The p -value is displayed after p = in the subtitle of the plot. Based on this plot and the p -value being lower than 5% ( p -value = 0.008), we reject the null hypothesis that the two population means are equal.

Note that the p -value is twice as large as the one obtained with the t.test() function because when we ran t.test() we specified alternative = "greater" (i.e., a one-sided test). In our plot with the ggbetweenstats() function, a two-sided test is performed by default, that is, alternative = "two.sided" .

We also have independent samples so we use the ggbetweenstats() function again, but this time the two population variances are not assumed to be equal so we specify the argument var.equal = FALSE :
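For instance (dat3, with columns sample and value, is an assumed name for the scenario-3 data):

```r
ggbetweenstats(
  data = dat3,
  x    = sample,
  y    = value,
  var.equal = FALSE  # Welch test
)
```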


Based on the output, we reject the null hypothesis that the two population means are equal ( p -value = 0.01).

Note that the p -value displayed in the subtitle of the plot is also two times larger than with the t.test() function, for the same reason as above.

In this case, the samples are paired so we use the ggwithinstats() function:
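A sketch, assuming a long-format data frame dat5 with a time column (before/after) and a value column, as before:

```r
ggwithinstats(
  data = dat5,
  x    = time,    # before / after
  y    = value    # measurements
)
```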


Based on the output, we reject the null hypothesis that the mean of the differences between the two populations is equal to 0 ( p -value = 0.01).

Again, the p -value in the subtitle of the plot is twice the one obtained with the t.test() function, for the same reason as above.

The point of this section was to illustrate how to easily draw plots together with statistical results, which is exactly the aim of the {ggstatsplot} package. See more details and examples in this article .

As for many statistical tests , there are some assumptions that need to be met in order to be able to interpret the results. When one or several of them are not met, although it is technically possible to perform these tests, it would be incorrect to interpret the results or trust the conclusions.

Below are the assumptions of the Student’s t-test for two samples, how to test them and which other tests exist if an assumption is not met:

  • Variable type : A Student’s t-test requires a mix of one quantitative dependent variable (which corresponds to the measurements to which the question relates) and one qualitative independent variable (with exactly 2 levels which will determine the groups to compare).
  • Independence : The data, collected from a representative and randomly selected portion of the population , should be independent between groups and within each group. The assumption of independence is most often verified based on the design of the experiment and on the good control of experimental conditions rather than via a formal test. If you are still unsure about independence based on the experiment design, ask yourself if one observation is related to another (if one observation has an impact on another) within each group or between the groups themselves. If not, it is most likely that you have independent samples . If observations between samples (forming the different groups to be compared) are dependent (for example, if two measurements have been collected on the same individuals as it is often the case in medical studies when measuring a metric (i) before and (ii) after a treatment), the paired version of the Student’s t-test, called the Student’s t-test for paired samples, should be preferred in order to take into account the dependency between the two groups to be compared.
  • With small samples (usually \(n < 30\) ), when the two samples are independent, observations in both samples should follow a normal distribution . When using the Student’s t-test for paired samples, it is the difference between the observations of the two samples that should follow a normal distribution. The normality assumption can be tested visually thanks to a histogram and a QQ-plot , and/or formally via a normality test such as the Shapiro-Wilk or Kolmogorov-Smirnov test. If, even after a transformation (e.g., logarithmic transformation, square root, etc.), your data still do not follow a normal distribution, the Wilcoxon test ( wilcox.test(variable1 ~ variable2, data = dat) in R) can be applied. This non-parametric test, robust to non-normal distributions, compares the medians instead of the means in order to compare the two populations.
  • With large samples (usually \(n \ge 30\) ), normality of the data is not required (this is a common misconception!). By the central limit theorem , sample means of large samples are often well-approximated by a normal distribution even if the data are not normally distributed ( Stevens 2013 ) . It is therefore not required to test the normality assumption when the number of observations in each group/sample is large.
  • Equality of variances : When the two samples are independent, the variances of the two groups should be equal in the populations (an assumption called homogeneity of the variances , sometimes also referred to as homoscedasticity, as opposed to heteroscedasticity if variances are different across groups). This assumption can be tested graphically (by comparing the dispersion in a boxplot or dotplot for instance), or more formally via Levene’s test ( leveneTest(variable ~ group) from the {car} package) or via an F test ( var.test(variable ~ group) ). If the hypothesis of equal variances is rejected, another version of the Student’s t-test can be used: the Welch test ( t.test(variable ~ group, var.equal = FALSE) ). Note that the Welch test does not require homogeneity of the variances, but the distributions should still follow a normal distribution in case of small sample sizes. If your distributions are not normally distributed or the variances are unequal, the Wilcoxon test should be used. This test does not require the assumptions of normality nor homoscedasticity of the variances.
If one of these assumptions is not met (or if extreme observations are driving the results), you can:

  • use the non-parametric version (i.e., the Wilcoxon test ),
  • transform your data (logarithmic or Box-Cox transformation, among others),
  • or remove the offending observations, such as outliers (be careful when doing so).

This concludes a relatively long article. Thanks for reading.

I hope this article helped you to understand how the different versions of the Student’s t-test for two samples work and how to perform them by hand and in R. If you are interested, here is a Shiny app to perform these tests by hand easily (you just need to enter your data and select the appropriate version of the test thanks to the sidebar menu).

Moreover, I invite you to read:

  • this article if you would like to know how to compute the Student’s t-test but this time, for one sample,
  • this article if you would like to compare 2 groups under the non-normality assumption, or
  • this article if you would like to use an ANOVA to compare 3 or more groups.

As always, if you have a question or a suggestion related to the topic covered in this article, please add it as a comment so other readers can benefit from the discussion.

Recall that inferential statistics, as opposed to descriptive statistics , is a branch of statistics defined as the science of drawing conclusions about a population from observations made on a representative sample of that population. See the difference between population and sample . ↩︎

For the rest of the present article, when we write Student’s t-test, we refer to the case of 2 samples. See one sample t-test if you want to compare only one sample. ↩︎

It is at least the case for parametric hypothesis tests. A parametric test means that it is based on a theoretical statistical distribution, which depends on some defined parameters. In the case of the Student’s t-test for two samples, it is based on the Student’s t distribution with a single parameter, the degrees of freedom ( \(df = n_1 + n_2 - 2\) where \(n_1\) and \(n_2\) are the two sample sizes), or the normal distribution. ↩︎

Thanks gmacar for pointing it out to me. ↩︎





S.3.1 Hypothesis Testing (Critical Value Approach)

The critical value approach involves determining "likely" or "unlikely" by determining whether or not the observed test statistic is more extreme than would be expected if the null hypothesis were true. That is, it entails comparing the observed test statistic to some cutoff value, called the " critical value ." If the test statistic is more extreme than the critical value, then the null hypothesis is rejected in favor of the alternative hypothesis. If the test statistic is not as extreme as the critical value, then the null hypothesis is not rejected.

Specifically, the four steps involved in using the critical value approach to conducting any hypothesis test are:

  • Specify the null and alternative hypotheses.
  • Using the sample data and assuming the null hypothesis is true, calculate the value of the test statistic. To conduct the hypothesis test for the population mean μ , we use the t -statistic \(t^*=\frac{\bar{x}-\mu}{s/\sqrt{n}}\) which follows a t -distribution with n - 1 degrees of freedom.
  • Determine the critical value by finding the value of the known distribution of the test statistic such that the probability of making a Type I error — which is denoted \(\alpha\) (greek letter "alpha") and is called the " significance level of the test " — is small (typically 0.01, 0.05, or 0.10).
  • Compare the test statistic to the critical value. If the test statistic is more extreme in the direction of the alternative than the critical value, reject the null hypothesis in favor of the alternative hypothesis. If the test statistic is less extreme than the critical value, do not reject the null hypothesis.

Example S.3.1.1

Mean GPA

In our example concerning the mean grade point average, suppose we take a random sample of n = 15 students majoring in mathematics. Since n = 15, our test statistic t * has n - 1 = 14 degrees of freedom. Also, suppose we set our significance level α at 0.05 so that we have only a 5% chance of making a Type I error.

Right-Tailed

The critical value for conducting the right-tailed test H 0 : μ = 3 versus H A : μ > 3 is the t -value, denoted t \(\alpha\) , n - 1 , such that the probability to the right of it is \(\alpha\). It can be shown using either statistical software or a t -table that the critical value t 0.05,14 is 1.7613. That is, we would reject the null hypothesis H 0 : μ = 3 in favor of the alternative hypothesis H A : μ > 3 if the test statistic t * is greater than 1.7613. Visually, the rejection region is shaded red in the graph.

[Figure: t-distribution with the rejection region t* > 1.7613 shaded]

Left-Tailed

The critical value for conducting the left-tailed test H 0 : μ = 3 versus H A : μ < 3 is the t -value, denoted -t ( \(\alpha\) , n - 1) , such that the probability to the left of it is \(\alpha\). It can be shown using either statistical software or a t -table that the critical value -t 0.05,14 is -1.7613. That is, we would reject the null hypothesis H 0 : μ = 3 in favor of the alternative hypothesis H A : μ < 3 if the test statistic t * is less than -1.7613. Visually, the rejection region is shaded red in the graph.

[Figure: t-distribution with the rejection region t* < -1.7613 shaded]

Two-Tailed

There are two critical values for the two-tailed test H 0 : μ = 3 versus H A : μ ≠ 3: one for the left-tail denoted -t ( \(\alpha\) / 2, n - 1) and one for the right-tail denoted t ( \(\alpha\) / 2, n - 1) . The value - t ( \(\alpha\) /2, n - 1) is the t -value such that the probability to the left of it is \(\alpha\)/2, and the value t ( \(\alpha\) /2, n - 1) is the t -value such that the probability to the right of it is \(\alpha\)/2. It can be shown using either statistical software or a t -table that the critical value -t 0.025,14 is -2.1448 and the critical value t 0.025,14 is 2.1448. That is, we would reject the null hypothesis H 0 : μ = 3 in favor of the alternative hypothesis H A : μ ≠ 3 if the test statistic t * is less than -2.1448 or greater than 2.1448. Visually, the rejection region is shaded red in the graph.

[Figure: t-distribution for the two-tailed test at the 0.05 significance level, with rejection regions t* < -2.1448 and t* > 2.1448 shaded]
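These critical values can be reproduced in R with qt() (not part of the original text, just a quick check of the numbers quoted above):

```r
alpha <- 0.05
df <- 14  # n - 1 with n = 15

qt(1 - alpha, df)                      # right-tailed:  1.7613
qt(alpha, df)                          # left-tailed:  -1.7613
qt(c(alpha / 2, 1 - alpha / 2), df)    # two-tailed:   -2.1448, 2.1448
```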

What Is a Two-Tailed Test? Definition and Example


A two-tailed test, in statistics, is a method in which the critical area of a distribution is two-sided and tests whether a sample is greater than or less than a certain range of values. It is used in null-hypothesis testing and testing for statistical significance . If the sample being tested falls into either of the critical areas, the alternative hypothesis is accepted instead of the null hypothesis.

Key Takeaways

  • In statistics, a two-tailed test is a method in which the critical area of a distribution is two-sided and tests whether a sample is greater or less than a range of values.
  • It is used in null-hypothesis testing and testing for statistical significance.
  • If the sample being tested falls into either of the critical areas, the alternative hypothesis is accepted instead of the null hypothesis.
  • By convention two-tailed tests are used to determine significance at the 5% level, meaning each side of the distribution is cut at 2.5%.

A basic concept of inferential statistics is hypothesis testing , which determines whether a claim is true or not given a population parameter. A hypothesis test that is designed to show whether the mean of a sample is significantly greater than and significantly less than the mean of a population is referred to as a two-tailed test. The two-tailed test gets its name from testing the area under both tails of a normal distribution , although the test can be used in other non-normal distributions.

A two-tailed test is designed to examine both sides of a specified data range as designated by the probability distribution involved. The probability distribution should represent the likelihood of a specified outcome based on predetermined standards. This requires the setting of a limit designating the highest (or upper) and lowest (or lower) accepted variable values included within the range. Any data point that exists above the upper limit or below the lower limit is considered out of the acceptance range and in an area referred to as the rejection range.

There is no inherent standard about the number of data points that must exist within the acceptance range. In instances where precision is required, such as in the creation of pharmaceutical drugs, a rejection rate of 0.001% or less may be instituted. In instances where precision is less critical, such as the number of food items in a product bag, a rejection rate of 5% may be appropriate.

A two-tailed test can also be used practically during certain production activities in a firm, such as with the production and packaging of candy at a particular facility. If the production facility designates 50 candies per bag as its goal, with an acceptable distribution of 45 to 55 candies, any bag found with an amount below 45 or above 55 is considered within the rejection range.

To confirm the packaging mechanisms are properly calibrated to meet the expected output, random sampling may be taken to confirm accuracy. A simple random sample takes a small, random portion of the entire population to represent the entire data set, where each member has an equal probability of being chosen.

For the packaging mechanisms to be considered accurate, an average of 50 candies per bag with an appropriate distribution is desired. Additionally, the number of bags that fall within the rejection range needs to fall within the probability distribution limit considered acceptable as an error rate. Here, the null hypothesis would be that the mean is 50 while the alternate hypothesis would be that it is not 50.

If, after conducting the two-tailed test, the z-score falls in the rejection region, meaning that the deviation is too far from the desired mean, then adjustments to the facility or associated equipment may be required to correct the error. Regular use of two-tailed testing methods can help ensure production stays within limits over the long term.

Be careful to note if a statistical test is one- or two-tailed as this will greatly influence a model's interpretation.

When a hypothesis test is set up to show that the sample mean would be only higher than the population mean, this is referred to as a  one-tailed test . A formulation of this hypothesis would be, for example, that "the returns on an investment fund would be  at least  x%." One-tailed tests could also be set up to show that the sample mean could be only less than the population mean. The key difference from a two-tailed test is that in a two-tailed test, the sample mean could be different from the population mean by being  either  higher or lower than it.

If the sample being tested falls into the one-sided critical area, the alternative hypothesis will be accepted instead of the null hypothesis. A one-tailed test is also known as a directional hypothesis or directional test.

A two-tailed test, on the other hand, is designed to examine both sides of a specified data range to test whether a sample is greater than or less than the range of values.

Example of a Two-Tailed Test

As a hypothetical example, imagine that a new  stockbroker , named XYZ, claims that their brokerage fees are lower than those of your current stockbroker, ABC. Data available from an independent research firm indicates that the mean and standard deviation of all ABC broker clients are $18 and $6, respectively.

A sample of 100 clients of ABC is taken, and brokerage charges are calculated with the new rates of XYZ broker. If the mean of the sample is $18.75 and the sample standard deviation is $6, can any inference be made about the difference in the average brokerage bill between ABC and XYZ broker?

  • H 0 : Null Hypothesis: mean = 18
  • H 1 : Alternative Hypothesis: mean ≠ 18 (This is what we want to prove.)
  • Rejection region: Z ≤ -Z 2.5%  and Z ≥ Z 2.5%  (assuming a 5% significance level, split as 2.5% on either side).
  • Z = (sample mean - mean) / (std-dev / sqrt(no. of samples)) = (18.75 - 18) / (6 / sqrt(100)) = 1.25

This calculated Z value falls between the two limits defined by: -Z 2.5% = -1.96 and Z 2.5% = 1.96.

We conclude that there is insufficient evidence to infer that there is any difference between the rates of your existing broker and the new broker; therefore, the null hypothesis cannot be rejected. Alternatively, the p-value = P(Z < -1.25) + P(Z > 1.25) = 2 * 0.1056 = 0.2112 = 21.12%, which is greater than 0.05 or 5%, leading to the same conclusion.
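The same numbers can be checked quickly in R (the values are taken from the example above):

```r
sample_mean <- 18.75
pop_mean    <- 18
pop_sd      <- 6
n           <- 100

z <- (sample_mean - pop_mean) / (pop_sd / sqrt(n))  # 1.25
p_value <- 2 * pnorm(-abs(z))                       # about 0.211

c(z = z, p_value = p_value)
```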

How Is a Two-Tailed Test Designed?

A two-tailed test is designed to determine whether a claim is true or not given a population parameter. It examines both sides of a specified data range as designated by the probability distribution involved. As such, the probability distribution should represent the likelihood of a specified outcome based on predetermined standards.

What Is the Difference Between a Two-Tailed and One-Tailed Test?

A two-tailed hypothesis test is designed to show whether the sample mean is significantly greater than  or  significantly less than the mean of a population. The two-tailed test gets its name from testing the area under both tails (sides) of a normal distribution. A one-tailed hypothesis test, on the other hand, is set up to show only one test; that the sample mean would be higher than the population mean, or, in a separate test, that the sample mean would be lower than the population mean.

What Is a Z-score?

A Z-score numerically describes a value's relationship to the mean of a group of values and is measured in terms of the number of standard deviations from the mean. If a Z-score is 0, it indicates that the data point's score is identical to the mean score whereas Z-scores of 1.0 and -1.0 would indicate values one standard deviation above or below the mean. In most large data sets, 99% of values have a Z-score between -3 and 3, meaning they lie within three standard deviations above and below the mean.



One-Sample T-test in R


What is one-sample t-test?


Generally, the theoretical mean comes from:

  • a previous experiment. For example, compare whether the mean weight of mice differs from 200 mg, a value determined in a previous study.
  • or from an experiment where you have control and treatment conditions. If you express your data as “percent of control”, you can test whether the average value of treatment condition differs significantly from 100.

Note that the one-sample t-test can be used only when the data are normally distributed . This can be checked using the Shapiro-Wilk test .


Typical research questions are:

  • whether the mean ( \(m\) ) of the sample is equal to the theoretical mean ( \(\mu\) )?
  • whether the mean ( \(m\) ) of the sample is less than the theoretical mean ( \(\mu\) )?
  • whether the mean ( \(m\) ) of the sample is greater than the theoretical mean ( \(\mu\) )?

In statistics, we can define the corresponding null hypothesis ( \(H_0\) ) as follow:

  • \(H_0: m = \mu\)
  • \(H_0: m \leq \mu\)
  • \(H_0: m \geq \mu\)

The corresponding alternative hypotheses ( \(H_a\) ) are as follow:

  • \(H_a: m \ne \mu\) (different)
  • \(H_a: m > \mu\) (greater)
  • \(H_a: m < \mu\) (less)
  • Hypotheses 1) are called two-tailed tests
  • Hypotheses 2) and 3) are called one-tailed tests

The t-statistic can be calculated as follow:

\[ t = \frac{m-\mu}{s/\sqrt{n}} \]

  • m is the sample mean
  • n is the sample size
  • s is the sample standard deviation with \(n-1\) degrees of freedom
  • \(\mu\) is the theoretical value

We can compute the p-value corresponding to the absolute value of the t-test statistics (|t|) for the degrees of freedom (df): \(df = n - 1\) .
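As a quick illustration of the formula, here is a sketch with made-up summary values (these numbers are not the mice data used later in this section):

```r
m  <- 19.5   # sample mean (hypothetical)
mu <- 25     # theoretical mean
s  <- 2      # sample standard deviation (hypothetical)
n  <- 10     # sample size

t_stat <- (m - mu) / (s / sqrt(n))   # t-statistic
df     <- n - 1                      # degrees of freedom
p_two_sided <- 2 * pt(-abs(t_stat), df)

c(t = t_stat, df = df, p = p_two_sided)
```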

How to interpret the results?

If the p-value is less than or equal to the significance level 0.05, we can reject the null hypothesis and accept the alternative hypothesis. In other words, we conclude that the sample mean is significantly different from the theoretical mean.

Visualize your data and compute one-sample t-test in R

You can draw R base graphs as described at this link: R base graphs . Here, we’ll use the ggpubr R package for an easy ggplot2-based data visualization.

  • Install the latest version from GitHub as follows (recommended):
  • Or, install from CRAN as follows:
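A sketch of both options (the GitHub repository name is the one usually associated with {ggpubr}; treat it as an assumption):

```r
# Option 1: development version from GitHub
# install.packages("devtools")
devtools::install_github("kassambara/ggpubr")

# Option 2: stable version from CRAN
install.packages("ggpubr")
```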

To perform a one-sample t-test, the R function t.test() can be used as follows:
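In outline, the call looks like this (the arguments are described just below):

```r
t.test(x, mu = 0, alternative = "two.sided")
```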

  • x : a numeric vector containing your data values
  • mu : the theoretical mean. Default is 0 but you can change it.
  • alternative : the alternative hypothesis. Allowed value is one of “two.sided” (default), “greater” or “less”.

Prepare your data as specified here: Best practices for preparing your data set for R

Save your data in an external .txt (tab-separated) or .csv file.

Import your data into R as follows:
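For example, something like this (read.delim() and read.csv() are standard base-R readers; file.choose() opens an interactive file picker):

```r
# Tab-separated .txt file
my_data <- read.delim(file.choose())

# Or a .csv file
# my_data <- read.csv(file.choose())
```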

Here, we’ll use an example data set containing the weight of 10 mice.

We want to know whether the average weight of the mice differs from 25 g.
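A quick look at the data (my_data and its weight column are the names assumed throughout this section):

```r
# Print the first rows
head(my_data)

# Summary statistics of the weights
summary(my_data$weight)
```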

  • Min. : the minimum value
  • 1st Qu. : The first quartile. 25% of values are lower than this.
  • Median : the median value. Half the values are lower; half are higher.
  • 3rd Qu. : the third quartile. 75% of values are lower than this.
  • Max. : the maximum value

One-Sample Student’s T-test in R

  • Is this a large sample ? - No, because n < 30.
  • Since the sample size is not large enough (less than 30, central limit theorem), we need to check whether the data follow a normal distribution .

How to check the normality?

Read this article: Normality Test in R .

Briefly, it’s possible to use the Shapiro-Wilk normality test and to look at the normality plot .

  • Null hypothesis: the data are normally distributed
  • Alternative hypothesis: the data are not normally distributed
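The Shapiro-Wilk test on the weights would look like this (using the column name assumed above):

```r
shapiro.test(my_data$weight)
```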

From the output, the p-value is greater than the significance level 0.05, implying that the distribution of the data is not significantly different from the normal distribution. In other words, we can assume normality.

  • Visual inspection of the data normality using Q-Q plots (quantile-quantile plots). Q-Q plot draws the correlation between a given sample and the normal distribution.
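With {ggpubr}, for instance, a Q-Q plot of the weights could be drawn as follows:

```r
library(ggpubr)

# Q-Q plot of the weights against a theoretical normal distribution
ggqqplot(my_data$weight)
```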

From the normality plots, we conclude that the data may come from normal distributions.

Note that, if the data are not normally distributed, it’s recommended to use the non-parametric one-sample Wilcoxon signed-rank test.

We want to know whether the average weight of the mice differs from 25 g (two-tailed test).
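The call, storing the result for later use (res is an assumed object name):

```r
# One-sample t-test against the theoretical mean of 25 g
res <- t.test(my_data$weight, mu = 25)
res
```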

In the result above:

  • t is the t-test statistic value (t = -9.078),
  • df is the degrees of freedom (df = 9),
  • p-value is the p-value of the test (p-value = 7.953 × 10⁻⁶),
  • conf.int is the 95% confidence interval of the mean (conf.int = [17.8172, 20.6828]),
  • sample estimates is the mean value of the sample (mean = 19.25).
  • if you want to test whether the mean weight of mice is less than 25g (one-tailed test), type this:
  • Or, if you want to test whether the mean weight of mice is greater than 25g (one-tailed test), type this:
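The two one-sided variants are sketched below (same data, only the alternative changes):

```r
# H1: mean weight less than 25 g
t.test(my_data$weight, mu = 25, alternative = "less")

# H1: mean weight greater than 25 g
t.test(my_data$weight, mu = 25, alternative = "greater")
```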

The p-value of the test is 7.953 × 10⁻⁶, which is less than the significance level alpha = 0.05. We can conclude that the mean weight of the mice is significantly different from 25 g (p-value = 7.953 × 10⁻⁶).

The result of t.test() function is a list containing the following components:

  • statistic : the value of the t test statistics
  • parameter : the degrees of freedom for the t test statistics
  • p.value : the p-value for the test
  • conf.int : a confidence interval for the mean appropriate to the specified alternative hypothesis .
  • estimate : the means of the two groups being compared (in the case of independent t test ) or difference in means (in the case of paired t test ).

The format of the R code to use for getting these values is as follows:
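Assuming the result was stored in res as above:

```r
res$statistic  # t-test statistic
res$parameter  # degrees of freedom
res$p.value    # p-value
res$conf.int   # confidence interval of the mean
res$estimate   # sample mean
```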

You can perform a one-sample t-test online, without any installation, by clicking the following link:

One-sample wilcoxon test (non-parametric)

This analysis has been performed using R software (ver. 3.2.4).

