hypothesis testing in statistics analytics vidhya

What is Hypothesis Testing in statistics

31 October 2023 16 November 2023

Hypothesis testing is a technique for deciding whether sample data provide enough evidence to reject a statement about the population that the sample represents. In this article, we will discuss what hypothesis testing is, its applications, and its role in machine learning.


What is Hypothesis Testing / Significance testing?

Hypothesis testing is a type of statistical analysis in which we make an assumption about a population parameter and then run statistical tests on sample data to decide whether to reject that assumption. The sample data is drawn from a larger population.

This image shows the workflow for hypothesis testing

Example of Hypothesis Test

Following are examples of questions that hypothesis testing can help answer:

Are the parts manufactured on two different machines of the same quality?
Is there a relation between gender and a person’s height?
Is customer churn due to poor product quality or delayed delivery time?

How does hypothesis testing work?

In hypothesis testing, we use sample data from the population to weigh the evidence against the null hypothesis. In other words, we use hypothesis testing to decide whether to reject or fail to reject the null hypothesis.

Now we will discuss more about these steps in hypothesis testing. We recommend you read our articles on statistical tests to understand hypothesis testing with live examples and their implementation in Python.


Define Hypothesis: Null and Alternate Hypothesis

The first step in hypothesis testing is to define the null and alternate hypotheses, along with an acceptable significance or confidence level, from the problem statement. Now we will look at what the significance level, the null hypothesis, and the alternate hypothesis are.

What is Level of Significance (α)?

The level of significance, or alpha, is the threshold used to decide whether a test result is statistically significant.

This image shows the relation between the significance level and type of error

The significance level gives the probability of a Type I error (rejecting the null hypothesis when it is actually true). Its value lies between 0 and 1. A value of zero means we will not accept any risk of a Type I error.

In practice, alpha is commonly set to 0.01, 0.05, or 0.1, which corresponds to a 1%, 5%, or 10% chance of a Type I error.
A lower significance level reduces the risk of a Type I error but increases the risk of a Type II error.


In practice, we need to strike a balance between Type I and Type II errors. Therefore, it is important to define alpha carefully.
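The link between alpha and the Type I error rate can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the population is normal with a known standard deviation, the null hypothesis is true by construction, and the function name, sample size, and trial count are hypothetical choices.

```python
import random
from statistics import NormalDist


def type_one_error_rate(alpha, n_trials=2000, n=30, mu0=0.0, sigma=1.0, seed=0):
    """Fraction of trials in which a TRUE null hypothesis gets rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(n_trials):
        # Draw a sample from a population where H0 (mean == mu0) really holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = (sample_mean - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1  # a Type I error: H0 is true but was rejected
    return rejections / n_trials


for alpha in (0.01, 0.05, 0.10):
    print(alpha, type_one_error_rate(alpha))
```

The observed rejection rate should land close to the chosen alpha, which is why a smaller alpha means fewer false rejections.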

What is the Null Hypothesis (H0)?

The null hypothesis is an assumption that states one of the following:

There is no significant difference between the sample data and the population.
The observed difference between two groups or between sample data and the population is by chance or due to sampling error.
There is no relationship between dependent and independent variables.

The null hypothesis is a statement that is considered true until the experimental data provide enough evidence against it. Researchers try to reject the null hypothesis using hypothesis testing.

What is the Alternate Hypothesis (H1)?

The alternate hypothesis is the opposite of the null hypothesis. As per the alternate hypothesis:

There is a significant difference between the sample data and the population.
The observed difference between two groups or between sample data and the population is not by chance or due to sampling error.
There is a relationship between dependent and independent variables.

How to write the Null and Alternate Hypothesis?

The research question or problem statement is the first input for the null hypothesis. From this statement we derive the null and alternate hypotheses.

Example of null hypothesis in Manufacturing

Null Hypothesis: The average part weight manufactured using machine-1 is 10 grams.

Alternate Hypothesis: The average part weight manufactured using machine-1 is greater or less than 10 grams.

Example of null hypothesis in Investment

Null Hypothesis: The average yearly return for a mutual fund is 9%.

Alternate Hypothesis: The average yearly return for the mutual fund is less than 9%.
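The manufacturing and investment examples above differ in the direction of the alternate hypothesis: the first is two-sided, the second is one-sided. A minimal sketch of how that choice changes the p-value is shown below. It assumes a large-sample z-test on summary statistics; the function name and all the numbers are hypothetical, and for small samples a t-test would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, sample_std, n, mu0, alternative="two-sided"):
    """Large-sample z-test of H0: population mean == mu0."""
    standard_error = sample_std / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    cdf = NormalDist().cdf(z)
    if alternative == "two-sided":   # H1: mean != mu0
        p_value = 2 * min(cdf, 1 - cdf)
    elif alternative == "less":      # H1: mean < mu0
        p_value = cdf
    elif alternative == "greater":   # H1: mean > mu0
        p_value = 1 - cdf
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return z, p_value


# Manufacturing (hypothetical data): 50 parts, mean 10.12 g, std 0.4 g, H0: mean = 10 g
print(one_sample_z_test(10.12, 0.4, 50, 10.0))

# Investment (hypothetical data): 40 periods, mean 8.1%, std 3.0%, H0: mean = 9%
print(one_sample_z_test(8.1, 3.0, 40, 9.0, alternative="less"))
```

In both cases the null hypothesis is rejected when the p-value falls below the chosen alpha.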

Example of null hypothesis in Stock Market

We can use hypothesis testing to determine whether two stocks are correlated (that is, whether the correlation between Stock X and Stock Y is different from zero).

Null Hypothesis: The two stocks are not correlated (the correlation between X and Y is zero).

Alternate Hypothesis: The two stocks are correlated (the correlation between X and Y is not zero).
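As a rough sketch of how the stock example could be tested, the code below computes the Pearson correlation and an approximate p-value using the Fisher z-transformation. The return series are made up for illustration, the function names are hypothetical, and the normal approximation is an assumption that works best with larger samples.

```python
from math import atanh, sqrt
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_test(x, y):
    """Approximate two-sided test of H0: correlation == 0 (Fisher z-transform)."""
    n = len(x)
    r = pearson_r(x, y)
    z = atanh(r) * sqrt(n - 3)  # roughly standard normal when H0 is true
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return r, p_value


# Hypothetical daily returns (%) for Stock X and Stock Y
stock_x = [0.5, -1.2, 0.8, 1.5, -0.3, 0.9, -0.7, 1.1, 0.2, -1.0, 0.6, -0.4]
stock_y = [0.4, -0.9, 1.0, 1.2, -0.5, 0.7, -0.2, 0.9, 0.4, -1.3, 0.3, -0.1]

r, p_value = correlation_test(stock_x, stock_y)
print(r, p_value)
```

A small p-value here is evidence against the null hypothesis that the two stocks are uncorrelated.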

Example of null hypothesis in Medical

There is a drug that is claimed to reduce the risk of heart attack.

Null Hypothesis: Drug X does not reduce the risk of heart attack.

Alternate Hypothesis: Drug X reduces the risk of heart attack.

Types of Statistical tests in hypothesis Testing

We can classify statistical tests into the following two categories:

Parametric Tests
Non-parametric Test

Parametric Test

In parametric tests, we assume that the population follows a known distribution, typically the normal distribution. We use parameters such as the mean and standard deviation to decide whether to reject the null hypothesis.

Non-Parametric Test

In non-parametric tests, we do not assume a particular distribution for the population, and we typically work with the median or ranks instead of the mean to decide whether to reject the null hypothesis.
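One simple non-parametric test is the sign test, which checks a claim about the population median without assuming a normal distribution. The sketch below is illustrative only: the delivery-time data is hypothetical and the function name is our own.

```python
from math import comb


def sign_test(sample, median0):
    """Two-sided exact sign test of H0: population median == median0."""
    above = sum(1 for value in sample if value > median0)
    below = sum(1 for value in sample if value < median0)
    n = above + below  # values equal to median0 are dropped
    if n == 0:
        return 1.0
    k = min(above, below)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical delivery times in days; the value 90 is an outlier
delivery_days = [12, 15, 11, 14, 13, 16, 15, 90, 14, 17, 13, 15]
print(sign_test(delivery_days, median0=10))
```

Because the test only counts how many values fall above or below the hypothesized median, the outlier has no more influence than any other observation.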

Parametric vs Non-Parametric Tests

Hypothesis Testing Result Interpretation

After we have statistical test results, the next step is to compare them with critical values to decide whether to reject the null hypothesis. These critical values are determined by the significance level.
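The comparison against critical values can be sketched as follows for a z-statistic. This is a minimal illustration that assumes the test statistic follows a standard normal distribution under the null hypothesis; the function name is hypothetical.

```python
from statistics import NormalDist


def critical_value_decision(z_stat, alpha=0.05, alternative="two-sided"):
    """Return (critical value, reject H0?) for a z-statistic."""
    std_normal = NormalDist()
    if alternative == "two-sided":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z_stat) > critical
    elif alternative == "greater":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z_stat > critical
    elif alternative == "less":
        critical = std_normal.inv_cdf(alpha)
        reject = z_stat < critical
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return critical, reject


# The same test statistic leads to different decisions at different alpha levels
print(critical_value_decision(2.1, alpha=0.05))
print(critical_value_decision(2.1, alpha=0.01))
```

A stricter (smaller) alpha pushes the critical value further out, so stronger evidence is needed before the null hypothesis is rejected.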

These hypothesis testing results are not 100% correct. We can get the following two types of errors during hypothesis testing.

Type I Error

Type II Error

A Type I error is where we reject the null hypothesis when it’s true. We can understand this using the following example.

Null Hypothesis:

The average yearly return for a mutual fund is 9%.

Alternate Hypothesis:

The average yearly return for the mutual fund is less than 9%.

Type I Error:

In reality, the actual yearly returns are 9%. However, during hypothesis testing, we reject the null hypothesis and accept the alternate hypothesis.

A Type II error is where we fail to reject the null hypothesis when it’s false. We can understand this using the following example.

Null Hypothesis:

The yearly return for a mutual fund is 9%

Alternate Hypothesis:

The yearly return for the mutual fund is less than 9%.

Type II Error:

In reality, the actual yearly returns are 3%, so the null hypothesis is false. However, during hypothesis testing, we fail to reject the null hypothesis and therefore keep it.
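For the mutual fund example, the probability of a Type II error (often called beta) can be computed once we assume a spread for the returns and a sample size. The sketch below assumes a one-sided z-test with a known standard deviation; the standard deviation of 12 and the sample sizes are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def type_two_error(mu0, mu_true, sigma, n, alpha=0.05):
    """Beta for a lower-tail z-test of H0: mean == mu0 when the true mean is mu_true."""
    standard_error = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(alpha)  # H0 is rejected when z < z_crit
    # When the true mean is mu_true, z is normal with this centre and unit variance.
    shift = (mu_true - mu0) / standard_error
    power = NormalDist(mu=shift, sigma=1).cdf(z_crit)  # P(reject H0 | H0 is false)
    return 1 - power


# H0: mean return = 9%, true mean = 3%, assumed standard deviation = 12
for n in (10, 40):
    print(n, type_two_error(mu0=9, mu_true=3, sigma=12, n=n))
```

Under these assumptions, a larger sample lowers the chance of a Type II error without changing alpha.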

Hypothesis Testing and Confidence Interval

Both hypothesis testing and confidence intervals are inferential techniques that rely on the sampling distribution of a statistic. Confidence intervals use sample data to estimate a population parameter, whereas hypothesis testing uses sample data to test a hypothesis.

The problem statement determines if we should use hypothesis testing or confidence intervals to get answers for the problem.

Hypothesis Testing

An inferential technique that uses sample data to decide statistically whether to reject the null hypothesis.

Application Example: Determine if a person’s weight is equal to / less than or greater than 60kg.

Confidence Interval

An inferential technique that uses sample data to estimate a population parameter.

CI = Mean ± Estimated Variation

Application Example: Determine the variation in a person’s weight from the mean value.
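A confidence interval for a mean can be sketched as the sample mean plus or minus a margin of error. The code below uses a normal approximation, which is an assumption; for a small sample like this one, a t-distribution would give a somewhat wider interval. The weights are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * stdev(sample) / sqrt(n)  # the "estimated variation" term
    centre = mean(sample)
    return centre - margin, centre + margin


# Hypothetical body weights in kg
weights_kg = [58.2, 61.5, 59.8, 63.1, 60.4, 57.9, 62.3, 60.0, 59.1, 61.7]

print(mean_confidence_interval(weights_kg, confidence=0.95))
print(mean_confidence_interval(weights_kg, confidence=0.99))
```

The two techniques are linked: if a 95% interval built this way contains 60 kg, a two-sided z-test of "mean = 60 kg" at alpha = 0.05 on the same data would not reject the null hypothesis.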

Examples to select between Hypothesis Testing and Confidence Interval

Problem Statement:

How strong is the correlation between a teenager’s height and weight?

In the above problem, we need a quantitative answer about the correlation between weight and height. We don’t need to test any specific correlation.

In other words, we want to estimate the strength of the relationship between the height and weight of a teenager rather than test a specific claim about it. Therefore, we can use a confidence interval for this problem.

Is the mean part weight produced from Machine A different from the mean part weight from Machine B?

In the above problem, we want to know whether the part weights from machine A and machine B are similar. In other words, our answer is a yes or no rather than a number. Therefore, we can use hypothesis testing for this problem.
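One way to test the machine A versus machine B question without distributional assumptions is a permutation test, sketched below. The part weights are hypothetical, the function name is our own, and the number of permutations is an arbitrary choice; a two-sample t-test would be a common parametric alternative.

```python
import random


def permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test of H0: both groups have the same mean."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are interchangeable, so reshuffle them.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the estimated p-value above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical part weights in grams
machine_a = [10.1, 9.9, 10.2, 10.0, 10.3, 9.8, 10.1, 10.0]
machine_b = [10.6, 10.4, 10.7, 10.5, 10.8, 10.3, 10.6, 10.5]

print(permutation_test(machine_a, machine_b))
```

The p-value is the share of label shuffles that produce a difference at least as large as the one observed.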

Is there any correlation between the average outside temperature and energy consumption?

Both energy consumption and outside temperature are quantitative parameters. The answer to this question will be yes if the correlation between these two variables is significantly different from zero. Therefore, we can use hypothesis testing for this problem.

Advantages of Hypothesis Testing

Here is the list of advantages of hypothesis testing.

We can make conclusions about the population based on sample data.
We can statistically support or reject a statement based on random sample data from a population.
It helps in avoiding false claims and conclusions.
Decision-making is according to data instead of personal opinions.

Limitations of Hypothesis Testing

Here is the list of limitations of hypothesis testing.

Results depend on the quality of available data.
The selection of the statistical method for hypothesis testing may impact the results.
There are chances of type 1 (Rejecting the True Null Hypothesis) and type 2 errors (Failing to reject the False Null Hypothesis).

Formulation of the null hypothesis is critical. Inaccuracy in null hypothesis formulation results in wrong interpretation.

Application Examples for Hypothesis Testing

Following are the application examples of hypothesis tests in the real world.

In Marketing

Measure the impact of a new ad campaign on product sales.
Determine whether customers prefer new or old products.
We can use hypothesis testing to make an informed decision on whether changing a button color affects sales.
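The button color question is a typical A/B test on conversion rates. A minimal sketch using a two-proportion z-test is shown below; the visitor and purchase counts are hypothetical, and the pooled normal approximation assumes reasonably large samples.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test of H0: both groups have the same conversion rate."""
    rate_a, rate_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: old button 200 sales from 5000 visitors, new button 260 from 5000
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(z, p_value)
```

If the p-value is below the chosen alpha, the difference in sales between the two button colors is unlikely to be due to chance alone.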

In Manufacturing

Has the new manufacturing process improved the product quality?
Does a new machine in the manufacturing plant reduce the number of failures?
What is the impact of implementing predictive maintenance on the number of machine breakdowns?
Are the products produced on multiple manufacturing lines similar?
Is the proportion of defective parts less than 0.5%?

In Agriculture

Do the fertilizers have an impact on plant growth?

Electricity Distribution

When we have two processes to distribute energy to households, which method will give the best efficiency?

Role of Hypothesis Testing in Machine Learning

Hypothesis testing has a key role in ensuring ML algorithm results fulfill the business expectations.

The goal of a machine learning application is to identify patterns in the training data and make predictions on unseen data. But sometimes the available training data contains outliers, noise, and random fluctuations. Hypothesis testing helps us check that what a model learns from the training data is reliable. We can use hypothesis testing in ML in the following ways:
Understand the business problem.
We can use hypothesis testing to validate the patterns in data. In other words, we can use hypothesis tests to determine if the patterns in data are actual or just a result of chance.
Selection of machine learning algorithms.
Feature Selection.
A/B testing.
Anomaly Detection.
Hyperparameter tuning: Select the best hyperparameters.
Select the best statistical and probability distribution.
Compare two or more sets of predictions.
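As one illustration of comparing two sets of predictions, the sketch below applies an exact McNemar test to two classifiers evaluated on the same examples. The labels and predictions are made up, and the function name is hypothetical; this is one possible approach rather than the only way to compare models.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact McNemar test of H0: models A and B have the same error rate."""
    # b: A correct and B wrong; c: A wrong and B correct
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c  # only examples where the models disagree carry information
    if n == 0:
        return b, c, 1.0
    k = min(b, c)
    # Two-sided binomial tail probability with success probability 0.5
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical labels and predictions for 20 test examples
y_true = [1] * 20
pred_a = [0, 0] + [1] * 18        # model A is wrong on 2 examples
pred_b = [1] + [0] * 10 + [1] * 9  # model B is wrong on 10 examples

print(mcnemar_exact(y_true, pred_a, pred_b))
```

A small p-value suggests the difference in accuracy between the two models is unlikely to be a result of chance.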

Hypothesis or significance testing consists of a set of statistical tests. We use significance testing to make inferences about the population and draw conclusions from sample data. Hypothesis testing is a scientific process to test assumptions backed by data.

We suggest you read this article on how to convert a business problem into a machine learning problem.
