How to Conduct a Two Sample T-Test in Python
In this article, we are going to see how to conduct a two-sample T-test in Python.
This test is also known as the independent samples t-test. It is used to check whether the unknown population means of a given pair of groups are equal. It allows one to test the null hypothesis that the means of the two groups are equal.
Before conducting the two-sample t-test using Python let us discuss the assumptions of this parametric test. Basically, there are three assumptions that we can make regarding the data groups:
- Whether the two sample data groups are independent.
- Whether the data elements in respective groups follow any normal distribution.
- Whether the given two samples have similar variances. This assumption is also known as the homogeneity assumption.
Note that the test can still be used even if our data groups do not satisfy all three assumptions: an alternative test exists if our data do not follow the normal distribution, or we can transform the dependent data group using techniques such as the square root, log, etc.
Two sample T-Test in Python
Let us consider an example, we are given two-sample data, each containing heights of 15 students of a class. We need to check whether two different class students have the same mean height. There are three ways to conduct a two-sample T-Test in Python.
Method 1: Using Scipy library
SciPy stands for scientific Python: as the name implies, it is a scientific Python library, and it uses NumPy under the hood. This library provides a variety of functions that can be quite useful in data science. Firstly, let's create the sample data; then we can perform the two-sample t-test. For this purpose, we have the ttest_ind() function in Python.
Syntax: ttest_ind(data_group1, data_group2, equal_var=True/False)
Here,
- data_group1: first data group
- data_group2: second data group
- equal_var=True: conduct the standard independent two-sample t-test, which assumes equal population variances
- equal_var=False: conduct Welch's t-test, which does not assume equal population variances
Note that by default equal_var is True
Before conducting the two-sample t-test, we need to find whether the given data groups have the same variance. If the ratio of the larger sample variance to the smaller sample variance is less than 4:1, then we can consider that the given data groups have equal variance. To find the variance of a data group, we can use the syntax below.
Syntax: print(np.var(data_group))
Here,
- data_group: the given data group
Two sample T-Test
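The article's code cell is not reproduced above, so here is a minimal sketch. The height arrays below are illustrative values chosen to reproduce the variances quoted in the text (7.7275 and 12.260):

```python
import numpy as np

# Illustrative height data for the two classes (values chosen to match
# the variances quoted in the article)
data_group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14,
                        19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14,
                        17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

# np.var defaults to the population variance (ddof=0)
print(np.var(data_group1))  # 7.7275
print(np.var(data_group2))  # 12.26
```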
Here, the variance ratio is 12.260 / 7.7275 ≈ 1.59, which is less than 4:1, so we can assume equal variances.
Performing Two-Sample T-Test
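A sketch of the test itself, using the same illustrative height arrays as above:

```python
import numpy as np
from scipy import stats

# Illustrative height data consistent with the variances quoted in the article
data_group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14,
                        19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14,
                        17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

# Standard independent two-sample t-test (equal population variances assumed)
res = stats.ttest_ind(data_group1, data_group2, equal_var=True)
print(res.statistic, res.pvalue)  # ≈ -0.6337, ≈ 0.5300
```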
Analyzing the result:
Two sample t-test has the following hypothesis:
H₀: µ₁ = µ₂ (the population mean of dataset1 is equal to that of dataset2)
Hₐ: µ₁ ≠ µ₂ (the population mean of dataset1 is different from that of dataset2)
Here, since the p-value (0.53004) is greater than alpha = 0.05, we cannot reject the null hypothesis of the test. We do not have sufficient evidence to say that the mean height of students between the two data groups is different.
Method 2: Two-Sample T-Test with Pingouin
Pingouin is a statistical package based on Pandas and NumPy that provides a wide range of features. The package is used not only to conduct the t-test but also to compute degrees of freedom, Bayes factors, etc.
Firstly, let's create the sample data as two arrays, and then perform the two-sample t-test. For this purpose, we have the ttest() function in the pingouin package of Python. The syntax is given below.
Syntax: ttest(data_group1, data_group2, correction=True/False)
Here,
- data_group1: first data group
- data_group2: second data group
- correction=True: conduct Welch's t-test, which corrects for unequal variances and does not rely on the homogeneity assumption
- correction=False: conduct the standard pooled-variance two-sample t-test, which relies on the homogeneity assumption
Two-Sample T-Test with Pingouin
Interpreting the result
This is the time to analyze the result. The p-value of the test comes out to be 0.523, which is greater than the significance level alpha (that is, 0.05). This implies that the average height of students in one class is not statistically different from the average height of students in another class. The t-test also reports Cohen's d, which measures the relative strength (effect size) of the difference. According to Cohen:
- Cohen's d = 0.2 is considered a 'small' effect size
- Cohen's d = 0.5 is considered a 'medium' effect size
- Cohen's d = 0.8 is considered a 'large' effect size
This implies that if the two group means do not differ by at least 0.2 standard deviations, the difference is trivial, even if it is statistically significant.
Method 3: Two-Sample T-Test with Statsmodels
Statsmodels is a Python library specifically used to estimate different statistical models and to conduct statistical tests. It supports R-style formulas and DataFrames.
Firstly, let's create the sample data as two arrays, and then perform the two-sample t-test. The statsmodels library provides the ttest_ind() function to conduct a two-sample t-test, whose syntax is given below.
Syntax: ttest_ind(data_group1, data_group2)
Here,
- data_group1: first data group
- data_group2: second data group
Two-Sample T-Test with Statsmodels
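A sketch using statsmodels, again with the illustrative height arrays from earlier (the article's own arrays, which produced p = 0.521, are not reproduced here):

```python
import numpy as np
from statsmodels.stats.weightstats import ttest_ind

data_group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14,
                        19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14,
                        17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

# Returns (t-statistic, p-value, degrees of freedom); pooled variances by default
tstat, pvalue, df = ttest_ind(data_group1, data_group2)
print(tstat, pvalue, df)
```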
Interpreting the result:
This is the time to analyze the result. The p-value of the test comes out to be equal to 0.521, which is greater than the significance level alpha (that is, 0.05). This implies that we can say that the average height of students in one class is statistically not different from the average height of students in another class.
Please Login to comment...
- Geeks Premier League
- Machine Learning
- 10 Best WhatsApp Call Recorder Apps for iOS
- The 10 Best 'Hunger Games: The Ballad of Songbirds & Snakes' Characters, Ranked
- How To Download Instagram Videos
- How To Change Page Orientation in Google Docs
- 30 OOPs Interview Questions and Answers (2023)
- Top 50 C++ Project Ideas For Beginners & Advanced
Improve your Coding Skills with Practice
Two-Sample Hypothesis Tests, with Python
The complete beginner’s guide to perform two-sample hypothesis tests (with code).
Level Up Coding
A hypothesis test is a statistical test that is used to test an assumption or hypothesis and draw a conclusion about the entire population. In the previous article, I introduced how to do one-sample hypothesis tests under different situations. In this article, I will share how hypothesis tests extend to comparing samples from 2 populations instead of one.
The FIVE steps process of hypothesis testing is the same as one-sample hypothesis tests except for the calculation of test statistics, in summary:
- Define the Null Hypothesis (H₀)
- Define the Alternative Hypothesis (H₁)
- Set the Level of Significance (α)
- Collect data and calculate the Test Statistic
- Construct the Rejection and Non-Rejection Regions and make a conclusion
A. Hypothesis tests for comparing 2 independent populations
These tests involve 2 independent populations or samples instead of just one, so there are twice as many parameters as in one-sample hypothesis tests. A common assumption for 2 independent populations is that samples are randomly and independently drawn from each population.
If the population variances σ₁² and σ₂² are unknown, we first need to test whether the variances of the independent populations are equal using the F test. After the F test, we can decide which t-test to use: the separate-variance t-test or the pooled-variance t-test.
A1. F test for comparing variance between 2 independent populations
In the F test, populations are assumed normally distributed. When testing the assumption of equal variance,
H₀: σ₁² = σ₂²
H₁: σ₁² ≠ σ₂²
Given α, and taking s₁ ≥ s₂ so that F ≥ 1, only the right-tail critical value matters. The test statistic is F = s₁² / s₂² ~ F(n₁−1, n₂−1).
The F statistic follows the F distribution, with two degrees of freedom: (n₁-1) and (n₂-1) respectively.
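The F test above can be sketched as follows. The samples are hypothetical, and SciPy's f distribution supplies the tail probability:

```python
import numpy as np
from scipy import stats

# Hypothetical samples from two independent, roughly normal populations
s1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14, 19, 20, 21, 15, 15])
s2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14, 17, 22, 24, 16, 13])

# Sample variances (ddof=1); put the larger one in the numerator so F >= 1
v1, v2 = np.var(s1, ddof=1), np.var(s2, ddof=1)
F = max(v1, v2) / min(v1, v2)
df1 = (len(s1) if v1 >= v2 else len(s2)) - 1
df2 = (len(s2) if v1 >= v2 else len(s1)) - 1

# Right-tail probability under F(df1, df2); double (capped at 1) for two-sided
p_right = stats.f.sf(F, df1, df2)
p_two_sided = min(1.0, 2 * p_right)
print(F, p_two_sided)
```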
A2. Separate-Variances t test for comparing means from 2 independent populations
When σ₁ and σ₂ are both unknown, AND σ₁ and σ₂ are unequal.
A3. Pooled-Variances t test for comparing means from 2 independent populations
When σ₁ and σ₂ are both unknown, AND σ₁ and σ₂ are equal.
In 2000, an experiment regarding the relation of sex and sense of direction to spatial orientation in an unfamiliar environment was conducted, wherein the sense of direction of 30 female and 30 male psychology students at the University of Boston was put to the test. The students were given spatial orientation tests (pointing to the south) after being taken to an unfamiliar wooded park. The students pointed by moving a pointer attached to a 360° protractor. Is there any evidence that, on average, males have a better sense of direction than females, at the 5% significance level?
First given the absolute pointing errors of students, test the equality of the population variances of the pointing errors for the males and females.
Let σ₁ be the true standard deviation (std) of pointing error for females (in °) and σ₂ be the true standard deviation (std) of pointing error for males (in °). First, calculate the sample mean and standard deviation for the male and female groups.
Follow the F test steps of the 5-step process given above:
H₀: σ₁² = σ₂²
H₁: σ₁² ≠ σ₂²
α = 0.01
Assuming the pointing errors are approximately normal for each population, F = s₁² / s₂² ~ F(29, 29). Following Figure 1 Part A1:
Since we do not have enough evidence that the true std of pointing error for females is different from that of males, we can use the pooled-variance t-test to test whether males have a better sense of direction than females.
Let μ₁ be the true mean pointing error for females (in °) and μ₂ the true mean pointing error for males (in °). Now let's follow the five steps:
H₀: μ₁ ≤ μ₂
H₁: μ₁ > μ₂ (right-tailed test)
α = 0.05
Following Figure 1 Part A3:
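The study's raw measurements are not reproduced in the article, so the sketch below uses stand-in arrays only to show the mechanics of the right-tailed pooled-variance test; the article's conclusion comes from the study's actual data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Stand-in absolute pointing errors (degrees) for 30 females and 30 males
female = np.abs(rng.normal(55, 40, 30))
male = np.abs(rng.normal(40, 40, 30))

# Pooled-variance, right-tailed test of H1: mu_female > mu_male
t_stat, p_val = stats.ttest_ind(female, male, equal_var=True,
                                alternative='greater')
print(t_stat, p_val)
```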
Conclusion: At the 5% significance level, we do not have enough evidence that the true mean pointing error for males is less than that for females (i.e., that males have a better sense of direction).
A4. Two-Sample Z test for comparing means from 2 independent populations
When σ₁ & σ₂ are both known .
B. Hypothesis tests for comparing 2 dependent populations
If groups are paired or matched according to some characteristic, or when repeated measurements are obtained from the same set of groups, samples are considered dependent.
B1. Paired-samples Z test for Mean Difference & B2. Paired-samples t test for Mean Difference
The goal of a paired-sample test is to determine whether there is any significant difference between the 2 dependent groups. One example is investigating which store in Singapore is cheaper: Cold Storage vs NTUC Fairprice, or Watsons vs Guardian Pharmacy.
The paired-sample Z test and t test can be reduced to the one-sample numerical hypothesis test as in the previous article.
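That reduction can be checked directly: running a one-sample test on the pairwise differences of hypothetical paired prices gives the same result as SciPy's paired test.

```python
import numpy as np
from scipy import stats

# Hypothetical prices of the same basket of items at two stores
store_a = np.array([3.2, 5.1, 2.8, 4.0, 6.3, 3.9, 2.5, 4.4])
store_b = np.array([3.0, 5.4, 2.5, 3.8, 6.0, 3.5, 2.6, 4.1])

# Paired test == one-sample test on the pairwise differences
res_paired = stats.ttest_rel(store_a, store_b)
res_diff = stats.ttest_1samp(store_a - store_b, popmean=0.0)
print(res_paired.pvalue, res_diff.pvalue)  # identical
```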
C. Two-Sample Z test for comparing proportion from 2 independent populations
A two-sample test for proportion follows the same hypothesis testing principles as those in one-sample tests for the proportion.
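One way to run such a test in Python is statsmodels' proportions_ztest; the conversion counts below are made up for illustration:

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical conversions: 45 of 200 in group 1, 30 of 180 in group 2
count = np.array([45, 30])
nobs = np.array([200, 180])

# Two-sided two-sample z-test for equality of proportions
stat, pval = proportions_ztest(count, nobs)
print(stat, pval)
```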
References:
[1] “F-tests for Equality of Two Variances.” [Online]. Available: https://saylordotorg.github.io/text_introductory-statistics/s15-03-f-tests-for-equality-of-two-va.html
[2] M. Jeanne Sholl, J. C. Acacio, R. O. Makar, and C. Leon, “The relation of sex and sense of direction to spatial orientation in an unfamiliar environment,” J. Environ. Psychol., vol. 20, no. 1, pp. 17–28, 2000.
[3] “Standard Deviations Not Assumed Equal • SOGA • Department of Earth Sciences.” [Online]. Available: https://www.geo.fu-berlin.de/en/v/soga/Basics-of-statistics/Hypothesis-Tests/Hypothesis-Tests-for-Two-Population-Means/Standard-Deviations-Not-Assumed-Equal/index.html
[4] “Standard Deviations Assumed Equal • SOGA • Department of Earth Sciences.” [Online]. Available: https://www.geo.fu-berlin.de/en/v/soga/Basics-of-statistics/Hypothesis-Tests/Hypothesis-Tests-for-Two-Population-Means/Standard-Deviations-Assumed-Equal/index.html
Written by Chao De-Yu
Data Analyst | MSc. Artificial Intelligence | LinkedIn — https://www.linkedin.com/in/thet-thet-yee-deyu/
Calculate the T-test for the means of two independent samples of scores.
This is a test for the null hypothesis that 2 independent samples have identical average (expected) values. This test assumes that the populations have identical variances by default.
The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default).
axis: If an int, the axis of the input along which to compute the statistic. The statistic of each axis-slice (e.g. row) of the input will appear in a corresponding element of the output. If None, the input will be raveled before computing the statistic.
equal_var: If True (default), perform a standard independent 2 sample test that assumes equal population variances. If False, perform Welch’s t-test, which does not assume equal population variance.
New in version 0.11.0.
nan_policy: Defines how to handle input NaNs.
propagate : if a NaN is present in the axis slice (e.g. row) along which the statistic is computed, the corresponding entry of the output will be NaN.
omit : NaNs will be omitted when performing the calculation. If insufficient data remains in the axis slice along which the statistic is computed, the corresponding entry of the output will be NaN.
raise : if a NaN is present, a ValueError will be raised.
permutations: If 0 or None (default), use the t-distribution to calculate p-values. Otherwise, permutations is the number of random permutations that will be used to estimate p-values using a permutation test. If permutations equals or exceeds the number of distinct partitions of the pooled data, an exact test is performed instead (i.e. each distinct partition is used exactly once). See Notes for details.
New in version 1.7.0.
random_state: {None, int, numpy.random.Generator, numpy.random.RandomState}, optional. If seed is None (or np.random), the numpy.random.RandomState singleton is used. If seed is an int, a new RandomState instance is used, seeded with seed. If seed is already a Generator or RandomState instance, then that instance is used.
Pseudorandom number generator state used to generate permutations (used only when permutations is not None).
alternative: Defines the alternative hypothesis. The following options are available (default is ‘two-sided’):
‘two-sided’: the means of the distributions underlying the samples are unequal.
‘less’: the mean of the distribution underlying the first sample is less than the mean of the distribution underlying the second sample.
‘greater’: the mean of the distribution underlying the first sample is greater than the mean of the distribution underlying the second sample.
New in version 1.6.0.
trim: If nonzero, performs a trimmed (Yuen’s) t-test. Defines the fraction of elements to be trimmed from each end of the input samples. If 0 (default), no elements will be trimmed from either side. The number of trimmed elements from each tail is the floor of the trim times the number of elements. Valid range is [0, .5).
New in version 1.7.
keepdims: If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
An object with the following attributes:
pvalue: The p-value associated with the given alternative.
df: The number of degrees of freedom used in calculation of the t-statistic. This is always NaN for a permutation t-test.
New in version 1.11.0.
The object also has the following method:
Computes a confidence interval around the difference in population means for the given confidence level. The confidence interval is returned in a namedtuple with fields low and high . When a permutation t-test is performed, the confidence interval is not computed, and fields low and high contain NaN.
Suppose we observe two independent samples, e.g. flower petal lengths, and we are considering whether the two samples were drawn from the same population (e.g. the same species of flower or two species with similar petal characteristics) or two different populations.
The t-test quantifies the difference between the arithmetic means of the two samples. The p-value quantifies the probability of observing as or more extreme values assuming the null hypothesis, that the samples are drawn from populations with the same population means, is true. A p-value larger than a chosen threshold (e.g. 5% or 1%) indicates that our observation is not so unlikely to have occurred by chance. Therefore, we do not reject the null hypothesis of equal population means. If the p-value is smaller than our threshold, then we have evidence against the null hypothesis of equal population means.
By default, the p-value is determined by comparing the t-statistic of the observed data against a theoretical t-distribution. When 1 < permutations < binom(n, k) , where
k is the number of observations in a ,
n is the total number of observations in a and b , and
binom(n, k) is the binomial coefficient ( n choose k ),
the data are pooled (concatenated), randomly assigned to either group a or b, and the t-statistic is calculated. This process is performed repeatedly (permutations times), generating a distribution of the t-statistic under the null hypothesis, and the t-statistic of the observed data is compared to this distribution to determine the p-value. Specifically, the p-value reported is the “achieved significance level” (ASL) as defined in 4.4 of Efron and Hastie. Note that there are other ways of estimating p-values using randomized permutation tests; for other options, see the more general permutation_test.
When permutations >= binom(n, k) , an exact test is performed: the data are partitioned between the groups in each distinct way exactly once.
The permutation test can be computationally expensive and not necessarily more accurate than the analytical test, but it does not make strong assumptions about the shape of the underlying distribution.
Use of trimming is commonly referred to as the trimmed t-test. At times called Yuen’s t-test, this is an extension of Welch’s t-test, with the difference being the use of winsorized means in calculation of the variance and the trimmed sample size in calculation of the statistic. Trimming is recommended if the underlying distribution is long-tailed or contaminated with outliers.
The statistic is calculated as (np.mean(a) - np.mean(b))/se , where se is the standard error. Therefore, the statistic will be positive when the sample mean of a is greater than the sample mean of b and negative when the sample mean of a is less than the sample mean of b .
Beginning in SciPy 1.9, np.matrix inputs (not recommended for new code) are converted to np.ndarray before the calculation is performed. In this case, the output will be a scalar or np.ndarray of appropriate shape rather than a 2D np.matrix . Similarly, while masked elements of masked arrays are ignored, the output will be a scalar or np.ndarray rather than a masked array with mask=False .
B. Efron and T. Hastie. Computer Age Statistical Inference. (2016).
Yuen, Karen K. “The Two-Sample Trimmed t for Unequal Population Variances.” Biometrika, vol. 61, no. 1, 1974, pp. 165-170. JSTOR, www.jstor.org/stable/2334299. Accessed 30 Mar. 2021.
Yuen, Karen K., and W. J. Dixon. “The Approximate Behaviour and Performance of the Two-Sample Trimmed t.” Biometrika, vol. 60, no. 2, 1973, pp. 369-374. JSTOR, www.jstor.org/stable/2334550. Accessed 30 Mar. 2021.
Test with sample with identical means:
ttest_ind underestimates p for unequal variances:
When n1 != n2 , the equal variance t-statistic is no longer equal to the unequal variance t-statistic:
T-test with different means, variance, and n:
When performing a permutation test, more permutations typically yields more accurate results. Use a np.random.Generator to ensure reproducibility:
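The example code was not carried over, so here is a minimal sketch of a permutation-based call with synthetic samples (newer SciPy versions steer users toward the more general permutation_test mentioned above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)
a = rng.normal(0.0, 1.0, size=11)
b = rng.normal(0.5, 1.0, size=13)

# p-value estimated from 10,000 random relabelings of the pooled data
res = stats.ttest_ind(a, b, permutations=10_000, random_state=rng)
print(res.statistic, res.pvalue)
```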
Take these two samples, one of which has an extreme tail.
Use the trim keyword to perform a trimmed (Yuen) t-test. For example, using 20% trimming, trim=.2 , the test will reduce the impact of one ( np.floor(trim*len(a)) ) element from each tail of sample a . It will have no effect on sample b because np.floor(trim*len(b)) is 0.
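A sketch of the comparison, with sample values shaped as described (one sample carrying an extreme tail value):

```python
import numpy as np
from scipy import stats

# One sample with an extreme tail value, one without
a = np.array([56, 128.6, 12, 123.8, 64.34, 78, 763.3])
b = np.array([1.1, 2.9, 4.2])

res_plain = stats.ttest_ind(a, b)
# trim=.2 drops floor(0.2 * 7) = 1 element from each tail of `a`
# (and floor(0.2 * 3) = 0 elements from `b`)
res_trim = stats.ttest_ind(a, b, trim=.2)
print(res_plain.pvalue, res_trim.pvalue)
```

Trimming reduces the influence of the outlier, so the trimmed test reports a smaller p-value here.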
How to run a two-sample t-test using SciPy
A t-test is a hypothesis test used to compare the mean of a population to a certain value (one-sample t-test) or to compare two means (two-sample t-test). In this post, we'll show you how to conduct a two-sample t-test in Python using the SciPy library. We'll cover the basic syntax and a few key arguments you can use to further configure your hypothesis test. For this example, we're using a dataset about Adidas sales in the United States.
Since t-tests are a statistical test, there are certain assumptions that the data has to meet in order for there to be high confidence in the results of the test. For example:
- Both datasets should have equivalent variance
- Each value should be independent of other values in the dataset
- The data should be normally distributed
- The data should be continuous
Assuming we’ve met these criteria, we need to establish a null and alternative hypothesis to run the test:
In order to reject the null hypothesis with a 95% confidence level, the test needs to yield a p-value of less than 0.05. While you can calculate the test statistic and p-value by hand, SciPy has a convenient function, ttest_ind() , which will run a two-sample t-test for you.
Basic Syntax: stats.ttest_ind(a, b)
The function takes 2 key arguments: a and b , which represent the two samples you are comparing. In this case, we're comparing the price per unit for sales in the Northeast and West regions. We should expect that the prices per unit are not meaningfully different.
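The dataset itself is not included here, so this sketch uses made-up price arrays in place of the Northeast and West price-per-unit columns:

```python
from scipy import stats

# Hypothetical stand-ins for the price-per-unit values by region
northeast = [44.5, 47.2, 45.9, 48.1, 46.3, 47.0, 45.5, 46.8, 47.4, 46.1]
west = [49.0, 50.5, 49.8, 51.2, 48.7, 50.1, 49.5, 50.9, 49.2, 50.3]

# Two-sample t-test comparing the two regions
result = stats.ttest_ind(a=northeast, b=west)
print(result.statistic, result.pvalue)
```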
From the results of the t-test, we can see the p-value is very small, ~1.08e-16 . This means we can reject the null hypothesis that the means of the underlying distributions are the same.
Let's run through a quick gut check if the results make sense. First we can plot the underlying distributions:
Based on the graph, we can see that the distribution of prices for the West region is heavier for the higher prices than the distribution of prices for the Northeast region.
If we calculate the average for the two samples, we see this as well:
We can see that the average price for the Northeast is $46.69, while the average price for the West is $49.94.
Additional Arguments: permutations , random_state , alternative
If you have a small sample size, for example, you can use the permutations argument, which will run a permutation test using your data. The datasets are pooled together, and each value is randomly assigned to group a or b. The t-statistic is calculated, the process is repeated, and you can then compare the t-statistic for the observed data with the distribution of simulated t-statistics.
If you’re running a permutations test, and want reproducible results, it’s advised to set the random_state . It does not matter what value you set the random_state to, just that it is set to some value.
Lastly, the alternative argument helps define what kind of two-sample t-test you are running: one-sided or two-sided.
- ‘two-sided’ : default, two-sided t-test, alternative hypothesis states that the means of the distributions are unequal
- ‘less’ : one-sided t-test, alternative hypothesis states that the first sample comes from a distribution whose mean is less than the distribution underlying the second sample
- ‘greater’ : one-sided t-test, alternative hypothesis states that the first sample comes from a distribution whose mean is greater than the distribution underlying the second sample
How to Conduct a Two Sample T-Test in Python
A two sample t-test is used to test whether or not the means of two populations are equal.
This tutorial explains how to conduct a two sample t-test in Python.
Example: Two Sample t-Test in Python
Researchers want to know whether or not two different species of plants have the same mean height. To test this, they collect a simple random sample of 20 plants from each species.
Use the following steps to conduct a two sample t-test to determine if the two species of plants have the same height.
Step 1: Create the data.
First, we’ll create two arrays to hold the measurements of each group of 20 plants:
Step 2: Conduct a two sample t-test.
Next, we’ll use the ttest_ind() function from the scipy.stats library to conduct a two sample t-test, which uses the following syntax:
ttest_ind(a, b, equal_var=True)
- a: an array of sample observations for group 1
- b: an array of sample observations for group 2
- equal_var: if True, perform a standard independent 2 sample t-test that assumes equal population variances. If False, perform Welch’s t-test , which does not assume equal population variances. This is True by default.
Before we perform the test, we need to decide if we’ll assume the two populations have equal variances or not. As a rule of thumb, we can assume the populations have equal variances if the ratio of the larger sample variance to the smaller sample variance is less than 4:1.
The ratio of the larger sample variance to the smaller sample variance is 12.26 / 7.73 = 1.586 , which is less than 4. This means we can assume that the population variances are equal.
Thus, we can proceed to perform the two sample t-test with equal variances:
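The tutorial's code cells are not reproduced above, so here is a sketch consistent with the variance figures (7.73 and 12.26) and the reported result:

```python
import numpy as np
from scipy import stats

# Plant height measurements consistent with the sample variances quoted above
group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14,
                   19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14,
                   17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

# Two sample t-test assuming equal population variances
res = stats.ttest_ind(a=group1, b=group2, equal_var=True)
print(res)  # statistic ≈ -0.6337, pvalue ≈ 0.53005
```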
The t test statistic is -0.6337 and the corresponding two-sided p-value is 0.53005 .
Step 3: Interpret the results.
The two hypotheses for this particular two sample t-test are as follows:
H₀: µ₁ = µ₂ (the two population means are equal)
Hₐ: µ₁ ≠ µ₂ (the two population means are not equal)
Because the p-value of our test (0.53005) is greater than alpha = 0.05, we fail to reject the null hypothesis of the test. We do not have sufficient evidence to say that the mean height of plants between the two populations is different.
Published by Zach
- Apr 15, 2022
An Interactive Guide to Hypothesis Testing in Python
Updated: Jun 12, 2022
What is hypothesis testing?
Hypothesis testing is an essential part in inferential statistics where we use observed data in a sample to draw conclusions about unobserved data - often the population.
Applications of hypothesis testing:
- Clinical research: widely used in psychology, biology and healthcare research to examine the effectiveness of clinical trials
- A/B testing: applied in a business context to improve conversions by testing different versions of campaign incentives, website designs, etc.
- Feature selection in machine learning: filter-based feature selection methods use different statistical tests to determine feature importance
- College or university: if you major in statistics or data science, it is likely to appear in your exams
For a brief video walkthrough along with the blog, check out my YouTube channel.
4 Steps in Hypothesis testing
Step 1. Define null and alternative hypothesis
The null hypothesis (H0) can be stated differently depending on the statistical test, but it generalizes to the claim that no difference, no relationship or no dependency exists between two or more variables.
The alternative hypothesis (H1) contradicts the null hypothesis, claiming that relationships do exist. It is the hypothesis that we would like to prove right. However, a more conservative approach is favored in statistics: we always assume the null hypothesis is true and try to find evidence to reject it.
Step 2. Choose the appropriate test
Common types of statistical tests include t-tests, z-tests, ANOVA tests and chi-square tests:
T-test: compare two groups/categories of numeric variables with small sample size
Z-test: compare two groups/categories of numeric variables with large sample size
ANOVA test: compare the difference between two or more groups/categories of numeric variables
Chi-Squared test: examine the relationship between two categorical variables
Correlation test: examine the relationship between two numeric variables
Step 3. Calculate the p-value
How the p-value is calculated depends primarily on the statistical test selected. First, based on the mean and standard deviation of the observed sample data, we derive the test statistic (e.g. t-statistic, f-statistic). Then, by calculating the probability of getting this test statistic given the distribution under the null hypothesis, we find the p-value. We will use some examples to demonstrate this in more detail.
Step 4. Determine the statistical significance
The p-value is then compared against the significance level (also noted as the alpha value) to determine whether there is sufficient evidence to reject the null hypothesis. The significance level is a predetermined probability threshold, commonly 0.05. If the p-value is larger than the threshold, the observed value is likely to occur in the distribution when the null hypothesis is true. On the other hand, if it is lower than the significance level, the observed value is very unlikely to occur under the null hypothesis distribution, hence we reject the null hypothesis.
Hypothesis Testing with Examples
The Kaggle dataset “Customer Personality Analysis” is used in this case study to demonstrate the different types of statistical tests. The t-test, ANOVA and Chi-Squared test are sensitive to large sample sizes and will almost certainly generate very small p-values when the sample size is large. Therefore, I took a random sample (size of 100) from the original data:
The t-test is used when we want to test the relationship between a numeric variable and a categorical variable. There are three main types of t-test:
one sample t-test: test the mean of one group against a constant value
two sample t-test: test the difference of means between two groups
paired sample t-test: test the difference of means between two measurements of the same subject
For example, if I would like to test whether “Recency” (the number of days since customer’s last purchase - numeric value) contributes to the prediction of “Response” (whether the customer accepted the offer in the last campaign - categorical value), I can use a two sample t-test.
The first sample would be the “Recency” of customers who accepted the offer:
The second sample would be the “Recency” of customers who rejected the offer:
To compare the “Recency” of these two groups intuitively, we can use a histogram (or distplot) to show the distributions.
It appears that positive responses have lower Recency compared to negative responses. To quantify the difference and make it more scientific, let's follow the steps in hypothesis testing and carry out a t-test.
Step 1. define null and alternative hypothesis
null: there is no difference in Recency between the customers who accepted the offer in the last campaign and who did not accept the offer
alternative: customers who accepted the offer have lower Recency compared to customers who did not accept the offer
Step 2. choose the appropriate test
To test the difference between two independent samples, the two-sample t-test is the most appropriate statistical test, and it follows the Student's t-distribution. The shape of the Student's t-distribution is determined by the degrees of freedom, calculated as the sum of the two sample sizes minus 2.
In python, simply import the library scipy.stats and create the t-distribution as below.
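The original snippet is not shown here, so this is a minimal sketch of what creating that distribution might look like; the sample sizes are assumptions, not the actual group counts from the data:

```python
from scipy import stats

# Degrees of freedom = n1 + n2 - 2; these sample sizes are assumed,
# not the actual counts from the Kaggle sample.
n1, n2 = 35, 65
rv = stats.t(df=n1 + n2 - 2)
```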
Step 3. calculate the p-value
There are some handy functions in Python to calculate probabilities in a distribution. For any x covered in the range of the distribution, pdf(x) is the probability density function of x, which can be represented as the orange line below, and cdf(x) is the cumulative distribution function of x, which can be seen as the cumulative area. In this example, we are testing the alternative hypothesis that the Recency of positive responses minus the Recency of negative responses is less than 0. Therefore we should use a one-tailed test and look at the lower tail of this distribution, so the p-value can be calculated as cdf(t_statistic) in this case.
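As a sketch of this calculation (the degrees of freedom and t-statistic below are assumed values, not the ones computed from the actual data):

```python
from scipy import stats

rv = stats.t(df=98)            # df = n1 + n2 - 2, assuming sample sizes of 35 and 65
t_statistic = -2.0             # hypothetical t-statistic from the data
p_value = rv.cdf(t_statistic)  # one-tailed (left tail) p-value
```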
ttest_ind() is a handy function for the independent t-test in Python that does all of this for us automatically. Pass the two samples recency_P and recency_N as the parameters, and we get the t-statistic and p-value.
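A self-contained sketch with made-up stand-in samples (the real recency_P and recency_N come from the Kaggle data); note that the alternative='less' parameter, which gives the one-tailed p-value directly, requires SciPy 1.6 or later:

```python
from scipy import stats

# Hypothetical stand-ins for the two "Recency" samples
recency_P = [10, 35, 20, 45, 15, 30, 25, 40, 5, 50]   # accepted the offer
recency_N = [60, 45, 70, 55, 80, 40, 65, 50, 75, 35]  # rejected the offer

# alternative='less' tests H1: mean(recency_P) < mean(recency_N)
t_stat, p_val = stats.ttest_ind(recency_P, recency_N, alternative="less")
```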
Here I use plotly to visualize the p-value in t-distribution. Hover over the line and see how point probability and p-value changes as the x shifts. The area with filled color highlights the p-value we get for this specific test.
Check out the code in our Code Snippet section, if you want to build this yourself.
An interactive visualization of t-distribution with t-statistics vs. significance level.
Step 4. determine the statistical significance
The commonly used significance level threshold is 0.05. Since the p-value here (0.024) is smaller than 0.05, we can say that the result is statistically significant based on the collected sample. The lower Recency of customers who accepted the offer is unlikely to have occurred by chance. This indicates the feature "Response" may be a strong predictor of the target variable "Recency", and if we were to perform feature selection for a model predicting the "Recency" value, "Response" would likely have high importance.
Now we know the t-test is used to compare the mean of one or two sample groups. What if we want to test more than two samples? Use the ANOVA test.
ANOVA examines the difference among groups by calculating the ratio of variance across different groups vs. variance within a group. A larger ratio indicates that the difference across groups is a result of the group difference rather than just random chance.
As an example, I use the feature “Kidhome” for the prediction of “NumWebPurchases”. There are three values of “Kidhome” - 0, 1, 2 which naturally forms three groups.
Firstly, visualize the data. I found the box plot to be the visual representation most aligned with the ANOVA test.
It appears there are distinct differences among the three groups, so let's carry out an ANOVA test to see if that's the case.
1. define hypothesis:
null hypothesis: there is no difference among the three groups
alternative hypothesis: there is a difference between at least two groups
2. choose the appropriate test: the ANOVA test is for examining the relationship of a numeric variable against a categorical variable with more than two groups. Similar to the t-test, the null hypothesis of the ANOVA test also follows a distribution defined by degrees of freedom. The degrees of freedom in ANOVA are determined by the total number of samples (n) and the number of groups (k).
dfn = k - 1
dfd = n - k
3. calculate the p-value: To calculate the p-value of the f-statistics, we use the right tail cumulative area of the f-distribution, which is 1 - rv.cdf(x).
To easily get the f-statistics and p-value using Python, we can use the function stats.f_oneway() which returns p-value: 0.00040.
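A self-contained sketch with hypothetical group data (the real groups come from "NumWebPurchases" split by "Kidhome"); it also verifies the manual right-tail calculation from step 3:

```python
from scipy import stats

# Hypothetical "NumWebPurchases" values grouped by Kidhome = 0, 1, 2
g0 = [6, 8, 5, 9, 7, 4, 8, 7]
g1 = [5, 3, 6, 2, 4, 7, 5, 4]
g2 = [5, 2, 4, 6, 3, 4]

f_stat, p_val = stats.f_oneway(g0, g1, g2)

# The same p-value from the right tail of the f-distribution
n = len(g0) + len(g1) + len(g2)  # total number of samples
k = 3                            # number of groups
rv = stats.f(dfn=k - 1, dfd=n - k)
p_manual = 1 - rv.cdf(f_stat)
```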
An interactive visualization of f-distribution with f-statistics vs. significance level. (Check out the code in our Code Snippet section, if you want to build this yourself. )
4. determine the statistical significance: comparing the p-value against the significance level of 0.05, we can infer that there is strong evidence against the null hypothesis, and it is very likely that there is a difference in “NumWebPurchases” between at least two groups.
The Chi-Squared test is for testing the relationship between two categorical variables. The underlying principle is that if two categorical variables are independent, then one categorical variable should have a similar composition when the other categorical variable changes. Let's look at the example of whether “Education” and “Response” are independent.
First, use a stacked bar chart and a contingency table to summarize the counts of each category.
If these two variables are completely independent of each other (the null hypothesis is true), then the proportion of positive and negative Responses should be the same across all Education groups. It seems like the compositions are slightly different, but is that significant enough to claim dependency? Let's run a Chi-Squared test.
1. define hypothesis:
null hypothesis: “Education” and “Response” are independent of each other.
alternative hypothesis: “Education” and “Response” are dependent on each other.
2. choose the appropriate test: the Chi-Squared test is chosen, and you have probably found a pattern here: the chi-squared distribution is also determined by the degrees of freedom, which is (rows - 1) x (columns - 1).
3. calculate the p-value: the p-value is calculated as the right tail cumulative area: 1 - rv.cdf(x).
Python also provides a useful function to get the chi statistics and p-value given the contingency table.
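That function is scipy.stats.chi2_contingency(). A sketch with a hypothetical contingency table (the real counts come from the Education vs. Response table above):

```python
import numpy as np
from scipy import stats

# Hypothetical contingency table: Education levels (rows) vs Response (columns)
table = np.array([
    [10,  2],
    [40,  8],
    [25, 10],
    [ 4,  1],
])

chi2, p_val, dof, expected = stats.chi2_contingency(table)
# dof = (rows - 1) * (columns - 1) = (4 - 1) * (2 - 1) = 3
```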
An interactive visualization of chi-distribution with chi-statistics vs. significance level. (Check out the code in our Code Snippet section, if you want to build this yourself. )
4. determine the statistical significance: the p-value here is 0.41, suggesting that the result is not statistically significant. Therefore, we cannot reject the null hypothesis that these two categorical variables are independent. This further indicates that “Education” may not be a strong predictor of “Response”.
Thanks for reading this far. We have covered a lot of content in this article, but there are still two important hypothesis tests that are worth discussing separately in upcoming posts.
z-test: test the difference between two categories of numeric variables - when sample size is LARGE
correlation: test the relationship between two numeric variables
Hope you found this article helpful.
Take-home message
In this article, we interactively explore and visualize the difference between three common statistical tests: t-test, ANOVA test and Chi-Squared test. We also use examples to walk through essential steps in hypothesis testing:
1. define the null and alternative hypothesis
2. choose the appropriate test
3. calculate the p-value
4. determine the statistical significance
T Test in Python: Easily Test Hypothesis in Python
The T test in Python is a statistical method to test a hypothesis. Moreover, it helps you validate your study in terms of statistical values. There are several types of T test you can perform, and we are going to cover them all along with their implementation in Python.
What Is T Test?
First, let's understand what the T test, also known as Student's t-test, is. It is an inferential statistical approach to finding the relation between two samples using their means and variances. The T test is basically used to accept or reject a null hypothesis H0. Whether we accept or reject the null hypothesis depends on the p-value. Namely, if p < alpha (which in most cases is 0.05), we reject the null hypothesis and consider that there is a significant difference between the two samples.
If you want to understand more about statistics in terms of programming, check out this post .
Types Of T Test In Python
There are four types of T test you can perform in Python. They are as follows:
- One sample T test
- Two sample T test (paired)
- Two sample T test (independent)
- Welch T test
Let’s understand each of the tests and how we can implement every single of the tests accordingly.
One Sample Test
In a one sample T test, we test the difference between the mean of a sample from a particular group and a mean that we know or have hypothesized. For example, we hypothesize the mean height of a person in a classroom of 25 students to be 5 feet. We then carry out a T test to know whether the mean height is actually 5 feet or not.
The test statistic is t = (x̄ - μ) / (s / √n), where x̄ is the sample mean, μ is the hypothesized or known mean, s is the sample standard deviation and n is the sample size.
Two Sample Test (paired)
In the two sample test that is paired, we carry out a T test between the means of two samples taken from the same population or group. For example, we apply pesticide on one part of a crop field and then compare the mean yield from the part without pesticide against the mean yield from the part where the pesticide is applied.
Here x̄1 and x̄2 are the sample means, s1² and s2² are the variances of the two samples, respectively, and n1 and n2 are the sample sizes.
Two Sample Test (unpaired)
On the other hand, in the two sample unpaired test, we carry out a T test between the means of two samples taken from different populations or groups. For example, similar to the last one, we apply pesticide on crops, but now in another field. Then we take samples from both fields, the one with pesticide and the one without. Finally, we calculate the mean yield of both samples and carry out a T test between them to see whether they differ.
Welch T Test
The Welch test is the same as the Student's T test, except that it does not assume the two samples have equal variances. We can also say that the Welch test takes into account the standard deviations of both samples.
The test statistic is t = (x̄1 - x̄2) / √(s1²/N1 + s2²/N2), where x̄1 and x̄2 are the sample means, s1 and s2 are the standard deviations of both samples, respectively, and N1 and N2 are the sample sizes.
Implementing T Test In Python
Let's understand how to implement the T test in Python. In this article, we will see how to implement it using the SciPy package. We will go through each of the types of tests discussed above one by one with an example.
Example 1 (Single sample)
Firstly we will discuss Implementing one sample T test in Python. Let’s take an example where we take blood samples of people who work out and measure their LDL cholesterol levels. We hypothesize the mean cholesterol level is 100mg/dL.
- Null hypothesis: There is no difference between the hypothesized mean and the actual mean.
- Alternate hypothesis: There is a difference between the hypothesized and actual mean.
Steps to implement the test:
- First, import the stats module from the SciPy library (from scipy import stats)
- Define your alpha value
- Use the syntax given in the example code to calculate the p and t values
- Perform the T test and check the results
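The steps above can be sketched as follows; the cholesterol readings below are made-up values, not real measurements:

```python
import numpy as np
from scipy import stats

alpha = 0.05
# Hypothetical LDL cholesterol readings (mg/dL) from people who work out
sample = np.array([98, 102, 95, 105, 99, 101, 97, 103, 96, 100])

# Null hypothesis: the population mean is 100 mg/dL
t_stat, p_val = stats.ttest_1samp(sample, popmean=100)
if p_val < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```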
Example 2 (two sample T test paired)
Let's take an example where we use fertilizer on one part of the field while the other part is left as it is. Finally, we calculate the mean yields of both parts, x̄1 and x̄2, respectively.
- Null Hypothesis: There is no difference between the mean of yields from both parts of the field.
- Alternative Hypothesis: There is a significant difference between the mean of yields from both parts of the field.
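The article's original code is not shown; here is a minimal sketch of a paired test using scipy.stats.ttest_rel, with hypothetical yield measurements (assuming each plot is measured under both conditions, which is what makes the samples paired):

```python
import numpy as np
from scipy import stats

# Hypothetical yields from the same ten plots, without and with fertilizer
without_fert = np.array([20, 23, 21, 25, 18, 17, 18, 24, 20, 24])
with_fert = np.array([23, 25, 21, 26, 20, 20, 23, 25, 24, 25])

t_stat, p_val = stats.ttest_rel(without_fert, with_fert)
```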
The p-value is less than the alpha value. Therefore, we reject the null hypothesis and conclude there is a difference between the mean yields of the two parts of the field.
In the same manner, you can perform a T test for paired samples just by changing the values of variance and number of samples.
Example 3 (two sample T test unpaired)
This T test is done when we take samples from different populations. Assume that one of the fields of the crop is covered in fertilizers and one is not. We carry out a T test to find if fertilizers make any difference between the two fields.
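A minimal sketch using scipy.stats.ttest_ind; the yield samples are hypothetical stand-ins for the two fields:

```python
import numpy as np
from scipy import stats

# Hypothetical yields sampled from two different fields
field_fert = np.array([23, 25, 22, 26, 24, 27, 23, 25])  # with fertilizer
field_none = np.array([19, 24, 17, 25, 20, 15, 22, 18])  # without fertilizer

t_stat, p_val = stats.ttest_ind(field_fert, field_none)
```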
P-value is less than the alpha value. Therefore, we reject the null hypothesis and conclude there is a difference between the two means of yield in the different fields.
Example 4 (Welch T test)
The Welch T test is the same as the Student's T test, but it is used for samples with different variances. To perform the Welch T test, you have to set equal_var=False.
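A sketch of the Welch variant; the yield arrays are hypothetical, with the second field showing a visibly larger spread:

```python
import numpy as np
from scipy import stats

# Hypothetical yield samples from two fields with unequal spread
field_fert = np.array([23, 25, 22, 26, 24, 27, 23, 25])
field_none = np.array([19, 24, 17, 25, 20, 15, 22, 18])

# equal_var=False runs Welch's t-test instead of Student's t-test
t_stat, p_val = stats.ttest_ind(field_fert, field_none, equal_var=False)
```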
T Test In Pandas
Further, let’s see how to implement the T test in pandas. We will see paired and unpaired t tests both. Firstly we will understand how to implement an unpaired two-sample T-test in pandas.
Two Sample T Test (unpaired)
We will understand using an example. Assume the same example that one of the fields of the crop is covered in fertilizers and one is not. We carry out a T test to find if fertilizers make any difference between the two fields.
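A sketch assuming the data sits in a pandas DataFrame with one row per plot; the column names here are made up for illustration:

```python
import pandas as pd
from scipy import stats

# Hypothetical DataFrame: one row per plot, with a fertilizer flag
df = pd.DataFrame({
    "fertilizer": ["yes"] * 8 + ["no"] * 8,
    "yield_kg": [23, 25, 22, 26, 24, 27, 23, 25,
                 19, 24, 17, 25, 20, 15, 22, 18],
})

# Select each group's yields, then run an independent t-test
treated = df.loc[df["fertilizer"] == "yes", "yield_kg"]
control = df.loc[df["fertilizer"] == "no", "yield_kg"]
t_stat, p_val = stats.ttest_ind(treated, control)
```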
The p-value is less than 0.05. Therefore, we can say fertilizers make a difference in the final yields of crops.
Two Sample T Test (paired)
In this T test, we will test samples taken from the same population, for example, if one part of a field is covered with fertilizers and another part of the same field is not. We will check our hypothesis if fertilizers make any difference in the final yields of both parts.
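A sketch assuming the paired measurements are two columns of a pandas DataFrame, one row per plot; the column names and values are made up:

```python
import pandas as pd
from scipy import stats

# Hypothetical paired measurements: each row is one plot of the same field
df = pd.DataFrame({
    "no_fertilizer": [20, 23, 21, 25, 18, 17, 18, 24, 20, 24],
    "fertilizer": [23, 25, 21, 26, 20, 20, 23, 25, 24, 25],
})

t_stat, p_val = stats.ttest_rel(df["no_fertilizer"], df["fertilizer"])
```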
We can see the p-value is less than 0.05. Therefore, we reject the null hypothesis and conclude that fertilizers make a difference in the final yields of the two parts of the field.
FAQs on T Test in Python
The T-test is used for testing a hypothesis by comparing the means of two samples.
There are four types of t-test you can perform in Python: the one sample t-test, the two sample t-test (paired), the two sample t-test (unpaired), and the Welch t-test.
The p-value is the probability of obtaining results at least as extreme as ours purely by chance. If the p-value is less than 0.05, it means the results are unlikely to be due to chance alone, which supports the validity of the test.
A null hypothesis is the hypothesis we assume to be true before testing. If the p-value is less than 0.05, we reject the null hypothesis in favor of our alternative hypothesis.
In conclusion, we can say that the T test in Python helps programmers test their hypotheses quickly and gives accurate results. Further, it provides statistical information about the samples, which can be of great use.