ANOVA (Analysis of variance) – Formulas, Types, and Examples

Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA) is a statistical method used to test differences between two or more means. It is similar to the t-test, but the t-test is generally used for comparing two means, while ANOVA is used when you have more than two means to compare.

ANOVA is based on comparing the variance (or variation) between the data samples to the variation within each particular sample. If the between-group variance is high and the within-group variance is low, this provides evidence that the means of the groups are significantly different.

ANOVA Terminology

When discussing ANOVA, there are several key terms to understand:

  • Factor : This is another term for the independent variable in your analysis. In a one-way ANOVA, there is one factor, while in a two-way ANOVA, there are two factors.
  • Levels : These are the different groups or categories within a factor. For example, if the factor is ‘diet’ the levels might be ‘low fat’, ‘medium fat’, and ‘high fat’.
  • Response Variable : This is the dependent variable or the outcome that you are measuring.
  • Within-group Variance : This is the variance or spread of scores within each level of your factor.
  • Between-group Variance : This is the variance or spread of scores between the different levels of your factor.
  • Grand Mean : This is the overall mean when you consider all the data together, regardless of the factor level.
  • Treatment Sums of Squares (SS) : This represents the between-group variability. It is the sum of the squared differences between the group means and the grand mean.
  • Error Sums of Squares (SS) : This represents the within-group variability. It’s the sum of the squared differences between each observation and its group mean.
  • Total Sums of Squares (SS) : This is the sum of the Treatment SS and the Error SS. It represents the total variability in the data.
  • Degrees of Freedom (df) : The degrees of freedom are the number of values that have the freedom to vary when computing a statistic. For example, if you have ‘n’ observations in one group, then the degrees of freedom for that group is ‘n-1’.
  • Mean Square (MS) : Mean Square is the average squared deviation and is calculated by dividing the sum of squares by the corresponding degrees of freedom.
  • F-Ratio : This is the test statistic for ANOVAs, and it’s the ratio of the between-group variance to the within-group variance. If the between-group variance is significantly larger than the within-group variance, the F-ratio will be large and likely significant.
  • Null Hypothesis (H0) : This is the hypothesis that there is no difference between the group means.
  • Alternative Hypothesis (H1) : This is the hypothesis that there is a difference between at least two of the group means.
  • p-value : This is the probability of obtaining a test statistic as extreme as the one that was actually observed, assuming that the null hypothesis is true. If the p-value is less than the significance level (usually 0.05), then the null hypothesis is rejected in favor of the alternative hypothesis.
  • Post-hoc tests : These are follow-up tests conducted after an ANOVA when the null hypothesis is rejected, to determine which specific groups’ means (levels) are different from each other. Examples include Tukey’s HSD, Scheffe, Bonferroni, among others.

Types of ANOVA

Types of ANOVA are as follows:

One-way (or one-factor) ANOVA

This is the simplest type of ANOVA, which involves one independent variable . For example, comparing the effect of different types of diet (vegetarian, pescatarian, omnivore) on cholesterol level.

Two-way (or two-factor) ANOVA

This involves two independent variables. This allows for testing the effect of each independent variable on the dependent variable , as well as testing if there’s an interaction effect between the independent variables on the dependent variable.

Repeated Measures ANOVA

This is used when the same subjects are measured multiple times under different conditions, or at different points in time. This type of ANOVA is often used in longitudinal studies.

Mixed Design ANOVA

This combines features of both between-subjects (independent groups) and within-subjects (repeated measures) designs. In this model, one factor is a between-subjects variable and the other is a within-subjects variable.

Multivariate Analysis of Variance (MANOVA)

This is used when there are two or more dependent variables. It tests whether changes in the independent variable(s) correspond to changes in the dependent variables.

Analysis of Covariance (ANCOVA)

This combines ANOVA and regression. ANCOVA tests whether certain factors have an effect on the outcome variable after removing the variance for which quantitative covariates (interval variables) account. This allows the comparison of one variable outcome between groups, while statistically controlling for the effect of other continuous variables that are not of primary interest.

Nested ANOVA

This model is used when the groups can be clustered into categories. For example, if you were comparing students’ performance from different classrooms and different schools, “classroom” could be nested within “school.”

ANOVA Formulas

ANOVA Formulas are as follows:

Sum of Squares Total (SST)

This represents the total variability in the data. It is the sum of the squared differences between each observation and the overall mean:

SST = Σ (yi - y_mean)²

  • yi represents each individual data point
  • y_mean represents the grand mean (mean of all observations)

Sum of Squares Within (SSW)

This represents the variability within each group or factor level. It is the sum of the squared differences between each observation and its group mean:

SSW = Σ Σ (yij - y_meani)²

  • yij represents each individual data point within a group
  • y_meani represents the mean of the ith group

Sum of Squares Between (SSB)

This represents the variability between the groups. It is the sum of the squared differences between each group mean and the grand mean, weighted by the number of observations in each group:

SSB = Σ ni (y_meani - y_mean)²

  • ni represents the number of observations in each group
  • y_mean represents the grand mean

Degrees of Freedom

The degrees of freedom are the number of values that have the freedom to vary when calculating a statistic.

For within groups (dfW): dfW = N - k

For between groups (dfB): dfB = k - 1

For total (dfT): dfT = N - 1 (note that dfT = dfB + dfW)

  • N represents the total number of observations
  • k represents the number of groups

Mean Squares

Mean squares are the sum of squares divided by the respective degrees of freedom.

Mean Squares Between (MSB): MSB = SSB / dfB

Mean Squares Within (MSW): MSW = SSW / dfW

F-Statistic

The F-statistic is used to test whether the variability between the groups is significantly greater than the variability within the groups:

F = MSB / MSW

If the F-statistic is significantly higher than what would be expected by chance, we reject the null hypothesis that all group means are equal.
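To make these formulas concrete, here is a minimal sketch in Python (NumPy and SciPy) that computes the sums of squares, degrees of freedom, mean squares, F-ratio and p-value; the three groups and their values are made up purely for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical data: three groups of observations
groups = {
    "A": np.array([4.1, 3.9, 4.3, 4.0, 4.2]),
    "B": np.array([4.8, 5.1, 4.9, 5.0, 5.2]),
    "C": np.array([3.2, 3.5, 3.1, 3.4, 3.3]),
}

all_values = np.concatenate(list(groups.values()))
grand_mean = all_values.mean()
k = len(groups)             # number of groups
N = all_values.size         # total number of observations

# Sums of squares
ssb = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups.values())
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups.values())
sst = ((all_values - grand_mean) ** 2).sum()   # equals ssb + ssw

# Degrees of freedom, mean squares, F-ratio and p-value
df_between, df_within = k - 1, N - k
msb, msw = ssb / df_between, ssw / df_within
F = msb / msw
p_value = stats.f.sf(F, df_between, df_within)  # upper-tail probability

print(f"SSB={ssb:.3f}, SSW={ssw:.3f}, SST={sst:.3f}")
print(f"F({df_between}, {df_within}) = {F:.2f}, p = {p_value:.4f}")
```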

Examples of ANOVA

Example 1:

Suppose a psychologist wants to test the effect of three different types of exercise (yoga, aerobic exercise, and weight training) on stress reduction. The dependent variable is the stress level, which can be measured using a stress rating scale.

Here are hypothetical stress ratings for a group of participants after they followed each of the exercise regimes for a period:

  • Yoga: [3, 2, 2, 1, 2, 2, 3, 2, 1, 2]
  • Aerobic Exercise: [2, 3, 3, 2, 3, 2, 3, 3, 2, 2]
  • Weight Training: [4, 4, 5, 5, 4, 5, 4, 5, 4, 5]

The psychologist wants to determine if there is a statistically significant difference in stress levels between these different types of exercise.

To conduct the ANOVA:

1. State the hypotheses:

  • Null Hypothesis (H0): There is no difference in mean stress levels between the three types of exercise.
  • Alternative Hypothesis (H1): There is a difference in mean stress levels between at least two of the types of exercise.

2. Calculate the ANOVA statistics:

  • Compute the Sum of Squares Between (SSB), Sum of Squares Within (SSW), and Sum of Squares Total (SST).
  • Calculate the Degrees of Freedom (dfB, dfW, dfT).
  • Calculate the Mean Squares Between (MSB) and Mean Squares Within (MSW).
  • Compute the F-statistic (F = MSB / MSW).

3. Check the p-value associated with the calculated F-statistic.

  • If the p-value is less than the chosen significance level (often 0.05), then we reject the null hypothesis in favor of the alternative hypothesis. This suggests there is a statistically significant difference in mean stress levels between the three exercise types.

4. Post-hoc tests

  • If we reject the null hypothesis, we conduct a post-hoc test to determine which specific groups’ means (exercise types) are different from each other.
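Assuming SciPy and statsmodels are available, a brief sketch of steps 2 to 4 on the stress ratings listed above, using SciPy for the F-test and statsmodels for Tukey's HSD:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

yoga    = [3, 2, 2, 1, 2, 2, 3, 2, 1, 2]
aerobic = [2, 3, 3, 2, 3, 2, 3, 3, 2, 2]
weights = [4, 4, 5, 5, 4, 5, 4, 5, 4, 5]

# Steps 2-3: one-way ANOVA (F-statistic and p-value)
f_stat, p_value = stats.f_oneway(yoga, aerobic, weights)
print(f"F = {f_stat:.2f}, p = {p_value:.5f}")

# Step 4: post-hoc Tukey HSD if the null hypothesis is rejected
if p_value < 0.05:
    scores = np.concatenate([yoga, aerobic, weights])
    labels = ["yoga"] * 10 + ["aerobic"] * 10 + ["weights"] * 10
    print(pairwise_tukeyhsd(scores, labels, alpha=0.05))
```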

Example 2:

Suppose an agricultural scientist wants to compare the yield of three varieties of wheat. The scientist randomly selects four fields for each variety and plants them. After harvest, the yield from each field is measured in bushels.

The scientist wants to know if the differences in yields are due to the different varieties or just random variation.

Here’s how to apply the one-way ANOVA to this situation:

  • Null Hypothesis (H0): The means of the three populations are equal.
  • Alternative Hypothesis (H1): At least one population mean is different.
  • Compute the Sum of Squares Between (SSB), Sum of Squares Within (SSW), and Sum of Squares Total (SST), and calculate the Degrees of Freedom (dfB for between groups, dfW for within groups, dfT for total).
  • Calculate the Mean Squares Between (MSB) and Mean Squares Within (MSW), compute the F-statistic (F = MSB / MSW), and find its p-value.
  • If the p-value is less than the chosen significance level (often 0.05), then we reject the null hypothesis in favor of the alternative hypothesis. This would suggest there is a statistically significant difference in mean yields among the three varieties.
  • If we reject the null hypothesis, we conduct a post-hoc test to determine which specific groups’ means (wheat varieties) are different from each other.

How to Conduct ANOVA

Conducting an Analysis of Variance (ANOVA) involves several steps. Here’s a general guideline on how to perform it:

  • Null Hypothesis (H0): The means of all groups are equal.
  • Alternative Hypothesis (H1): At least one group mean is different from the others.
  • The significance level (often denoted as α) is usually set at 0.05. This implies that you are willing to accept a 5% chance that you are wrong in rejecting the null hypothesis.
  • Data should be collected for each group under study. Make sure that the data meet the assumptions of an ANOVA: normality, independence, and homogeneity of variances.
  • Compute the sums of squares: Sum of Squares Between (SSB), Sum of Squares Within (SSW), and Sum of Squares Total (SST).
  • Calculate the Degrees of Freedom (df) for each sum of squares (dfB, dfW, dfT).
  • Compute the Mean Squares Between (MSB) and Mean Squares Within (MSW) by dividing the sum of squares by the corresponding degrees of freedom.
  • Compute the F-statistic as the ratio of MSB to MSW.
  • Determine the critical F-value from the F-distribution table using dfB and dfW.
  • If the calculated F-statistic is greater than the critical F-value, reject the null hypothesis.
  • If the p-value associated with the calculated F-statistic is smaller than the significance level (0.05 typically), you reject the null hypothesis.
  • If you rejected the null hypothesis, you can conduct post-hoc tests (like Tukey’s HSD) to determine which specific groups’ means (if you have more than two groups) are different from each other.
  • Regardless of the result, report your findings in a clear, understandable manner. This typically includes reporting the test statistic, p-value, and whether the null hypothesis was rejected.

When to use ANOVA

ANOVA (Analysis of Variance) is used when you have three or more groups and you want to compare their means to see if they are significantly different from each other. It is a statistical method that is used in a variety of research scenarios. Here are some examples of when you might use ANOVA:

  • Comparing Groups : If you want to compare the performance of more than two groups, for example, testing the effectiveness of different teaching methods on student performance.
  • Evaluating Interactions : In a two-way or factorial ANOVA, you can test for an interaction effect. This means you are not only interested in the effect of each individual factor, but also whether the effect of one factor depends on the level of another factor.
  • Repeated Measures : If you have measured the same subjects under different conditions or at different time points, you can use repeated measures ANOVA to compare the means of these repeated measures while accounting for the correlation between measures from the same subject.
  • Experimental Designs : ANOVA is often used in experimental research designs when subjects are randomly assigned to different conditions and the goal is to compare the means of the conditions.

Here are the assumptions that must be met to use ANOVA:

  • Normality : The data should be approximately normally distributed.
  • Homogeneity of Variances : The variances of the groups you are comparing should be roughly equal. This assumption can be tested using Levene’s test or Bartlett’s test.
  • Independence : The observations should be independent of each other. This assumption is met if the data is collected appropriately with no related groups (e.g., twins, matched pairs, repeated measures).
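These assumption checks can be run directly in software. A minimal sketch with SciPy, using hypothetical group data: the Shapiro-Wilk test for normality within each group, and Levene's and Bartlett's tests for homogeneity of variances.

```python
from scipy import stats

# Hypothetical groups being compared
g1 = [23, 25, 21, 24, 26, 22]
g2 = [30, 28, 31, 29, 27, 32]
g3 = [24, 26, 25, 23, 27, 25]

# Normality: Shapiro-Wilk test per group (p > 0.05 suggests normality is plausible)
for name, g in [("g1", g1), ("g2", g2), ("g3", g3)]:
    stat, p = stats.shapiro(g)
    print(f"Shapiro-Wilk {name}: W = {stat:.3f}, p = {p:.3f}")

# Homogeneity of variances: Levene's test (or Bartlett's test)
print("Levene:", stats.levene(g1, g2, g3))
print("Bartlett:", stats.bartlett(g1, g2, g3))
```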

Applications of ANOVA

The Analysis of Variance (ANOVA) is a powerful statistical technique that is used widely across various fields and industries. Here are some of its key applications:

Agriculture

ANOVA is commonly used in agricultural research to compare the effectiveness of different types of fertilizers, crop varieties, or farming methods. For example, an agricultural researcher could use ANOVA to determine if there are significant differences in the yields of several varieties of wheat under the same conditions.

Manufacturing and Quality Control

ANOVA is used to determine if different manufacturing processes or machines produce different levels of product quality. For instance, an engineer might use it to test whether there are differences in the strength of a product based on the machine that produced it.

Marketing Research

Marketers often use ANOVA to test the effectiveness of different advertising strategies. For example, a marketer could use ANOVA to determine whether different marketing messages have a significant impact on consumer purchase intentions.

Healthcare and Medicine

In medical research, ANOVA can be used to compare the effectiveness of different treatments or drugs. For example, a medical researcher could use ANOVA to test whether there are significant differences in recovery times for patients who receive different types of therapy.

Education

ANOVA is used in educational research to compare the effectiveness of different teaching methods or educational interventions. For example, an educator could use it to test whether students perform significantly differently when taught with different teaching methods.

Psychology and Social Sciences

Psychologists and social scientists use ANOVA to compare group means on various psychological and social variables. For example, a psychologist could use it to determine if there are significant differences in stress levels among individuals in different occupations.

Biology and Environmental Sciences

Biologists and environmental scientists use ANOVA to compare different biological and environmental conditions. For example, an environmental scientist could use it to determine if there are significant differences in the levels of a pollutant in different bodies of water.

Advantages of ANOVA

Here are some advantages of using ANOVA:

Comparing Multiple Groups: One of the key advantages of ANOVA is the ability to compare the means of three or more groups. This makes it more powerful and flexible than the t-test, which is limited to comparing only two groups.

Control of Type I Error: When comparing multiple groups, the chances of making a Type I error (false positive) increase. One of the strengths of ANOVA is that it controls the Type I error rate across all comparisons. This is in contrast to performing multiple pairwise t-tests, which can inflate the Type I error rate.

Testing Interactions: In factorial ANOVA, you can test not only the main effect of each factor, but also the interaction effect between factors. This can provide valuable insights into how different factors or variables interact with each other.

Handling Continuous and Categorical Variables: ANOVA can handle both continuous and categorical variables . The dependent variable is continuous and the independent variables are categorical.

Robustness: ANOVA is considered robust to violations of the normality assumption when group sizes are equal. This means that even if your data do not perfectly meet the normality assumption, you might still get valid results.

Provides Detailed Analysis: ANOVA provides a detailed breakdown of variances and interactions between variables which can be useful in understanding the underlying factors affecting the outcome.

Capability to Handle Complex Experimental Designs: Advanced types of ANOVA (like repeated measures ANOVA, MANOVA, etc.) can handle more complex experimental designs, including those where measurements are taken on the same subjects over time, or when you want to analyze multiple dependent variables at once.

Disadvantages of ANOVA

Here are some limitations and disadvantages that are important to consider:

Assumptions: ANOVA relies on several assumptions including normality (the data follows a normal distribution), independence (the observations are independent of each other), and homogeneity of variances (the variances of the groups are roughly equal). If these assumptions are violated, the results of the ANOVA may not be valid.

Sensitivity to Outliers: ANOVA can be sensitive to outliers. A single extreme value in one group can affect the sum of squares and consequently influence the F-statistic and the overall result of the test.

Dichotomous Variables: ANOVA is not suitable for dichotomous variables (variables that can take only two values, like yes/no or male/female). It is used to compare the means of groups for a continuous dependent variable.

Lack of Specificity: Although ANOVA can tell you that there is a significant difference between groups, it doesn’t tell you which specific groups are significantly different from each other. You need to carry out further post-hoc tests (like Tukey’s HSD or Bonferroni) for these pairwise comparisons.

Complexity with Multiple Factors: When dealing with multiple factors and interactions in factorial ANOVA, interpretation can become complex. The presence of interaction effects can make main effects difficult to interpret.

Requires Larger Sample Sizes: To detect an effect of a certain size, ANOVA generally requires larger sample sizes than a t-test.

Equal Group Sizes: While not always a strict requirement, ANOVA is most powerful and its assumptions are most likely to be met when groups are of equal or similar sizes.


Analysis of Variance (ANOVA): Types, Examples & Uses


ANOVA is an acronym that stands for “analysis of variance.” The ANOVA test is used to determine whether a significant difference exists between the means of three or more groups. This article will look at the types of ANOVA and their uses.

What is ANOVA?

ANOVA, or analysis of variance, is a statistical method used to determine whether there are significant differences between the means of two or more groups. It separates the observed variation found within a data set into components attributable to different sources of variation.

The null hypothesis states that the means of all groups are the same and that any difference between group means observed in the data is due to random chance. The one-way ANOVA examines the effect of one independent variable on one dependent variable by comparing means across three or more groups.

A repeated measures (within-subjects) design might be used when you want to know:

  • how much something changes over time,
  • what effect it has on individuals or groups, and
  • how those changes vary based on other factors.

ANOVA helps you determine whether the groups differ from each other and allows you to see if any two groups are statistically similar.


Because it can be a complex procedure, it’s not often used in journalism (unless you’re one of those fancy data-driven journalists) but it is frequently used in academic research. For example, let’s say you’re studying how different brands of salad dressing affect the taste of salad (the dependent variable).

You would have the different brands as the levels of your independent variable: maybe Caesar dressing, Italian dressing, Blue Cheese dressing, and Thousand Island dressing. You could poll your participants on their preferred salad flavor before and after trying all four dressings.

The ANOVA method would help you evaluate whether or not there was a statistically significant difference between the four dressings' effects on participants' preferred flavor profiles. Then you might be able to say something like "participants' preferred salad flavors were most highly influenced by Blue Cheese dressing."


How Does ANOVA Work?

ANOVA is a statistical analysis that tests the differences between the means of two or more treatment groups. When you want to know if there’s a difference between two or more groups, you can run a t-test, but that’s only useful when you have two groups. What do you do when you have more than two? That’s where ANOVA comes in.

ANOVA lets you compare multiple groups at once and see if they differ significantly from each other. It’s like running a bunch of t-tests all at the same time, which is great because it saves time and helps you avoid making errors with multiple comparisons.

The basic logic behind the ANOVA test is quite similar to the t-test. In a nutshell, it compares the variability within each sample against the variability between each sample.


For example, let’s say you want to know whether there’s a significant difference in height among your three friends. You take three measurements of each person: once before breakfast, once after breakfast, and once after lunch.

After calculating the mean height for each individual and for each time period (pre-breakfast, post-breakfast, post-lunch), you plug all these values into the ANOVA formula. The ANOVA will then tell you whether there’s a statistically significant difference in height among these three time periods.
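A layout like this, with the same subjects measured at several time points, is the kind of within-subjects design a repeated measures ANOVA handles. A rough sketch follows; the long-format data are invented, and statsmodels' AnovaRM is one possible tool.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: each subject measured at three time points
data = pd.DataFrame({
    "subject": ["s1", "s1", "s1", "s2", "s2", "s2", "s3", "s3", "s3"],
    "time":    ["pre_breakfast", "post_breakfast", "post_lunch"] * 3,
    "height":  [170.1, 170.0, 169.9, 182.4, 182.3, 182.5, 165.0, 165.2, 164.9],
})

# Repeated measures ANOVA: does mean height differ across the three time points?
result = AnovaRM(data, depvar="height", subject="subject", within=["time"]).fit()
print(result)
```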

How Formplus Can Help Aggregate Data for ANOVA

Formplus makes it easy to aggregate data from multiple sources. You can use the platform to create surveys, forms, and other documents that require data collection and automatically import them into Google Sheets.

Formplus also offers a number of tools that help researchers collect data for ANOVA tests. These tools include fields for Likert scales and multiple-choice questions, which allow you to provide the respondent with a list of options from which they can select their answers.


You can also use Formplus to create forms for surveys, questionnaires, interviews, and other data collection methods. The information provided by your respondents can then be exported as CSV files for further analysis in a statistical software package like SPSS.

Now, the first step in collecting data for an ANOVA test is to create a survey that will collect the relevant information about your population. With Formplus, you can create custom surveys using its form builder but first, you will need to log in to your account.


The form builder is intuitive and easy to use, so you don’t need to be an expert in web design to use it. You can easily add question fields including multiple-choice questions and matrix questions and add logic to your survey so that respondents only see questions that are relevant to them.


Once you’ve created your survey, you can share it with respondents via a QR code, email, or a link. You can also embed your survey right into your website using Formplus’s advanced HTML code generator. 



Once you’ve collected responses, you can export them in CSV format or display them as charts and graphs within Formplus’s dashboard.

Types of ANOVA

There are two main types of ANOVA: one-way (or unidirectional) and two-way.

1. Unidirectional or one-way

One-way ANOVA compares the means of three or more independent groups to see if they’re statistically different. In a one-way experiment, the experimenter is interested in studying how a response variable changes according to the levels of one single factor.

For example, in an agricultural field trial, the farmer may be interested in studying how the average yield of corn varies when three different types of fertilizer are used. The three types of fertilizer are levels of a single factor and the corn yield is a response variable. Here, the interest is in comparing the mean values of only one single factor.

In another example, you want to test the effect of adding four different levels of magnesium (mg) into a plant’s water on the growth of that plant. You grow 50 plants, each with a different level of mg (0, 5, 10, 15), and measure their height every week for one month. Then you would use one-way ANOVA to determine if there was a statistically significant difference in the mean heights of plants watered with the different amounts of mg.

2. Two-way ANOVA

Two-way ANOVA determines the effect of two factors, such as product and gender, on a dependent variable like sales revenue. In a two-way experiment, the experimenter studies how two factors affect a response variable. For example, the farmer may be interested not only in seeing how different fertilizers affect corn yields but also in studying whether or not yields vary when corn is planted at different times during the year. In this example, fertilizer type and time of planting each contribute to variation in corn yield and we call them factors.
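As a sketch of how such a two-factor design could be analyzed in Python, the snippet below uses statsmodels' formula interface to fit both main effects and the interaction; the corn-yield numbers and column names are invented for illustration.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical corn-yield data: two factors, fertilizer type and planting time
df = pd.DataFrame({
    "fertilizer": ["A", "A", "B", "B", "C", "C"] * 4,
    "planting":   ["early"] * 12 + ["late"] * 12,
    "yield_bu":   [52, 55, 60, 62, 58, 57, 51, 54, 61, 63, 59, 56,
                   48, 50, 57, 59, 54, 53, 47, 49, 58, 60, 55, 52],
})

# Two-way ANOVA: main effect of each factor plus their interaction
model = smf.ols("yield_bu ~ C(fertilizer) * C(planting)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```

The resulting table contains one row for each main effect and one for the interaction term, so the interaction test described above can be read directly from it.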


When Might You Use ANOVA?

  • You might use ANOVA to compare three different treatments against each other, to compare two different diets against each other, or compare two different exercise programs against each other. For example, let’s say you want to know if there’s a difference between the average heights of four different types of trees in a forest. Instead of calculating whether each pair is statistically different from one another, you could run one ANOVA test to find out whether any of them are significantly different from one another.
  • ANOVA is also used as a method of testing how well different groups of data fit together. Let’s say you have a group of dogs, and you want to know whether they are all the same size or if some dogs are bigger than others. You can use ANOVA to test whether the groups differ from each other. You can also use ANOVA to compare more than two groups at once. You could test whether German Shepherds are the same size as Poodles, teacup Poodles, teacup Chihuahuas, and regular-sized Chihuahuas. This way, you could see if all of these dog breeds are the same size, or if one breed is larger than another breed.
  • You can use ANOVA to test for statistical differences between two or more groups to see if there is a significant difference between the means of those groups. ANOVA determines whether a test is valid by looking at the variation between and within groups.
  • If the variation within groups is large relative to the variation between groups, then the observed differences are likely due to random chance; however, if the variation between groups is large relative to the variation within groups, then the differences are more likely to reflect real differences between the groups.

An important thing to know about ANOVA tests is that they assume all groups are sampled from populations with equal variances. If the variances between groups are not equal, you’ll need to use Welch’s ANOVA instead.


Examples of Using ANOVA

Let’s say you want to compare the heights of men and women. Here’s how you might do that with ANOVA:

  • Gather your data
  • Calculate the mean (average) height of all people in the sample
  • Calculate the mean height for men and women separately
  • Find out how much these numbers differ from the overall mean and square them. This will tell you how much each group differs from the overall sample. Squaring makes sure that we’re only dealing with positive values since it’s not meaningful for you to talk about “negative differences.”
  • Add up all of these squared difference values. This is called your SST, or “sum of squares total.” It tells you how much variation there is in your whole sample.
  • Find out how much each group contributes to the between-group variation by multiplying the squared difference value we calculated in step 4 by the number of people in each group (this will be two values, one for men and one for women); these weighted values sum to the between-group sum of squares, which feeds into the F-ratio.

Understanding ANOVA Assumptions

To run an ANOVA, you need to make sure your data meet certain assumptions:

  • Your dependent variable should be measured at the continuous level (interval or ratio data).
  • Your independent variable should consist of two or more categorical, independent groups.
  • Samples should be random, independent, and come from a normal population.
  • Variances between groups should be equal. This assumption is tested using Levene’s test.

These assumptions are fairly strict and somewhat limiting—and if you find yourself not able to meet them, you may need to look into other statistical techniques.

What are the Limitations of ANOVA?

The limitations of ANOVA include:

  • It is inflexible in terms of the number of groups you can have, and there are other tests that are more powerful.
  • ANOVA can only be used when the dependent variable is continuous. If you want to compare three or more groups on a categorical dependent variable, you may need to use a chi-square test instead.
  • You are limited to one dependent variable. You cannot use ANOVA if you have multiple dependent variables that you want to analyze simultaneously.

ANOVA is a great tool to use when you want to compare a continuous variable across 3 or more independent groups. Keep in mind that if your data fails the ANOVA assumption of homogeneity of variance, it can lead to some inaccurate results.



Unit 16: Analysis of variance (ANOVA)

About this unit

Analysis of variance, or ANOVA, is an approach to comparing data with multiple means across different groups, and allows us to see patterns and trends within complex and varied data. See three examples of ANOVA in action as you learn how it can be applied to more complex statistical analyses.

Analysis of variance (ANOVA)

  • ANOVA 1: Calculating SST (total sum of squares)
  • ANOVA 2: Calculating SSW and SSB (total sum of squares within and between)
  • ANOVA 3: Hypothesis test with F-statistic

What Is An ANOVA Test In Statistics: Analysis Of Variance


An ANOVA test is a statistical test used to determine if there is a statistically significant difference between two or more categorical groups by testing for differences of means using variance.

Another key part of ANOVA is that it splits the independent variable into two or more groups.

For example, one or more groups might be expected to influence the dependent variable, while the other group is used as a control group and is not expected to influence the dependent variable.

Assumptions of ANOVA

The assumptions of the ANOVA test are the same as the general assumptions for any parametric test:

  • An ANOVA can only be conducted if there is no relationship between the subjects in each sample. This means that subjects in the first group cannot also be in the second group (e.g., independent samples/between groups).
  • The different groups/levels must have equal sample sizes .
  • An ANOVA can only be conducted if the dependent variable is normally distributed so that the middle scores are the most frequent and the extreme scores are the least frequent.
  • Population variances must be equal (i.e., homoscedastic). Homogeneity of variance means that the deviation of scores (measured by the range or standard deviation, for example) is similar between populations.

Types of ANOVA Tests

There are different types of ANOVA tests. The two most common are a “One-Way” and a “Two-Way.”

The difference between these two types depends on the number of independent variables in your test.

One-way ANOVA

A one-way ANOVA (analysis of variance) has one categorical independent variable (also known as a factor) and a normally distributed continuous (i.e., interval or ratio level) dependent variable.

The independent variable divides cases into two or more mutually exclusive levels, categories, or groups.

The one-way ANOVA test for differences in the means of the dependent variable is broken down by the levels of the independent variable.

An example of a one-way ANOVA includes testing a therapeutic intervention (CBT, medication, placebo) on the incidence of depression in a clinical sample.

Note : Both the One-Way ANOVA and the Independent Samples t-Test can compare the means for two groups. However, only the One-Way ANOVA can compare the means across three or more groups.

Two-way (factorial) ANOVA

A two-way ANOVA (analysis of variance) has two or more categorical independent variables (also known as a factor) and a normally distributed continuous (i.e., interval or ratio level) dependent variable.

The independent variables divide cases into two or more mutually exclusive levels, categories, or groups. A two-way ANOVA is also called a factorial ANOVA.

An example of a factorial ANOVA is testing the effects of social contact (high, medium, low), job status (employed, self-employed, unemployed, retired), and family history (no family history, some family history) on the incidence of depression in a population.

What are “Groups” or “Levels”?

In ANOVA, “groups” or “levels” refer to the different categories of the independent variable being compared.

For example, if the independent variable is “eggs,” the levels might be Non-Organic, Organic, and Free Range Organic. The dependent variable could then be the price per dozen eggs.

ANOVA F-value

The test statistic for an ANOVA is denoted as F . The formula for ANOVA is F = variance caused by treatment/variance due to random chance.

The ANOVA F value can tell you if there is a significant difference between the levels of the independent variable, when p < .05. So, a higher F value indicates that the treatment variables are significant.

Note that the ANOVA alone does not tell us specifically which means were different from one another. To determine that, we would need to follow up with multiple comparisons (or post-hoc) tests.

When the initial F test indicates that significant differences exist between group means, post hoc tests are useful for determining which specific means are significantly different when you do not have specific hypotheses that you wish to test.

Post hoc tests compare each pair of means (like t-tests), but unlike t-tests, they correct the significance estimate to account for the multiple comparisons.

What Does “Replication” Mean?

Replication requires a study to be repeated with different subjects and experimenters. This would enable a statistical analyzer to confirm a prior study by testing the same hypothesis with a new sample.

How to run an ANOVA?

For large datasets, it is best to run an ANOVA in statistical software such as R or Stata. Let’s refer to our Egg example above.

Non-Organic, Organic, and Free-Range Organic Eggs would be assigned quantitative values (1,2,3). They would serve as our independent treatment variable, while the price per dozen eggs would serve as the dependent variable. Other extraneous variables may include “Brand Name” or “Laid Egg Date.”

Using data and the aov() command in R, we could then determine the impact Egg Type has on the price per dozen eggs.
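The passage refers to R's aov(); a roughly equivalent sketch in Python, using hypothetical egg prices and the statsmodels formula interface, might look like this:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical prices per dozen for three egg types
eggs = pd.DataFrame({
    "egg_type": ["non_organic"] * 5 + ["organic"] * 5 + ["free_range_organic"] * 5,
    "price":    [2.10, 2.25, 1.95, 2.30, 2.15,
                 3.40, 3.55, 3.25, 3.60, 3.45,
                 4.10, 4.25, 3.95, 4.30, 4.05],
})

# One-way ANOVA: does mean price differ by egg type? (analogue of R's aov(price ~ egg_type))
model = smf.ols("price ~ C(egg_type)", data=eggs).fit()
print(sm.stats.anova_lm(model, typ=2))
```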

ANOVA vs. t-test?

T-tests and ANOVA tests are both statistical techniques used to compare differences in means and spreads of the distributions across populations.

The t-test determines whether two populations are statistically different from each other, whereas ANOVA tests are used when an individual wants to test more than two levels within an independent variable.

Referring back to our egg example, testing Non-Organic vs. Organic would require a t-test while adding in Free Range as a third option demands ANOVA.

Rather than generate a t-statistic, ANOVA results in an f-statistic to determine statistical significance.

What does ANOVA stand for?

ANOVA stands for Analysis of Variance. It’s a statistical method to analyze differences among group means in a sample. ANOVA tests the hypothesis that the means of two or more populations are equal, generalizing the t-test to more than two groups.

It’s commonly used in experiments where various factors’ effects are compared. It can also handle complex experiments with factors that have different numbers of levels.

When to use ANOVA?

ANOVA should be used when one independent variable has three or more levels (categories or groups). It’s designed to compare the means of these multiple groups.

What does an ANOVA test tell you?

An ANOVA test tells you if there are significant differences between the means of three or more groups. If the test result is significant, it suggests that at least one group’s mean differs from the others. It does not, however, specify which groups are different from each other.

Why do you use chi-square instead of ANOVA?

You use the chi-square test instead of ANOVA when dealing with categorical data to test associations or independence between two categorical variables. In contrast, ANOVA is used for continuous data to compare the means of three or more groups.


Analysis of Variance (ANOVA)

What is an analysis of variance?

An analysis of variance (ANOVA) tests whether statistically significant differences exist between more than two samples. For this purpose, the means and variances of the respective groups are compared with each other. In contrast to the t-test , which tests whether there is a difference between two samples, the ANOVA tests whether there is a difference between more than two groups.

There are different types of analysis of variance; the one-way and two-way analyses of variance are the most common, and each can be calculated either with or without repeated measurements.

In this tutorial you will learn the basics of ANOVA; for each of the four types of analysis of variance you will find a separate detailed tutorial:

  • One-factor (or one-way) ANOVA
  • Two-factor (or two-way) ANOVA
  • One-way ANOVA with repeated measurements
  • Two-factor ANOVA with repeated measurements


Tip: You can easily calculate all four variants of the ANOVA online on DATAtab. Just visit the ANOVA calculator .

Why not calculate multiple t-tests?

ANOVA is used when there are more than two groups. Of course, you could also calculate a t-test for each pair of groups. The problem, however, is that every hypothesis test carries some probability of error. This probability of error is usually set at 5%, so that, from a purely statistical point of view, every 20th test gives a wrong result.

If, for example, 20 comparisons are made between groups in which there is actually no difference, on average one of the tests will show a significant difference purely due to sampling.

Difference between one-way and two-way ANOVA

The one-way analysis of variance only checks whether an independent variable has an influence on a metric dependent variable. This is the case, for example, if it is to be examined whether the place of residence (independent variable) has an influence on the salary (dependent variable). However, if two factors, i.e. two independent variables, are considered, a two-way analysis of variance must be used.

Two-way analysis of variance tests whether there is a difference between more than two independent samples that are split between two variables or factors.


Analysis of variance with and without repeated measures

Depending on whether the sample is independent or dependent , either analysis of variance with or without repeated measures is used. If the same person was interviewed at several points in time, the sample is a dependent sample and analysis of variance with repeated measurements is used.

One-way ANOVA

The one-way analysis of variance is an extension of the t-test for independent groups. With the t-test only a maximum of two groups can be compared; this is now extended to more than two groups. For two groups (k = 2), the analysis of variance is therefore equivalent to the t-test. The independent variable is accordingly a nominally scaled variable with at least two characteristic values. The dependent variable is on a metric scale. In the case of the analysis of variance, the independent variable is referred to as the factor.
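That equivalence can be checked numerically: for two groups, the one-way ANOVA F-statistic equals the square of the independent-samples t-statistic, and the p-values match. A small sketch with made-up data:

```python
from scipy import stats

# Two hypothetical groups
group1 = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
group2 = [13.0, 12.8, 13.4, 12.9, 13.1, 13.2]

t_stat, p_t = stats.ttest_ind(group1, group2)    # independent-samples t-test
f_stat, p_f = stats.f_oneway(group1, group2)     # one-way ANOVA with k = 2

print(f"t = {t_stat:.3f}, t^2 = {t_stat**2:.3f}")
print(f"F = {f_stat:.3f}")                        # F equals t squared
print(f"p (t-test) = {p_t:.5f}, p (ANOVA) = {p_f:.5f}")
```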

Is there a difference in the population between the different groups of the independent variable with respect to the dependent variable?

The aim of ANOVA is to explain as much variance as possible in the dependent variable by dividing the data into groups. Let us consider the following example.

One way ANOVA example

With the help of the independent variable, e.g. "highest educational qualification" with the three categories group 1, group 2 and group 3, as much variance as possible of the dependent variable "salary" should be explained. Consider two cases, A) and B): in case A) a lot of variance can be explained by the three groups, while in case B) only very little variance can be explained.


Accordingly, in case A) the groups have a very high influence on the salary and in case B) they do not.

In the case of A), the values in the respective groups deviate only slightly from the group mean, the variance within the groups is therefore very small. In the case of B), however, the variance within the groups is large. The variance between the groups is the other way round; it is large in the case of A) and small in the case of B). In the case of B) the group means are close together, in the case of A) they are not.

Analysis of variance hypotheses

For a one-way analysis of variance, the null hypothesis and the alternative hypothesis are as follows:

  • Null hypothesis H0: The mean value of all groups is the same.
  • Alternative hypothesis H1: There are differences in the mean values of the groups.

The results of the ANOVA can only make a statement about whether there are differences between at least two groups. However, they cannot determine which groups are exactly different; a post-hoc test is needed for that. There are various methods to choose from, with Duncan, Dunnett's C and Scheffé being among the most common.

In a screw factory, a screw is produced by three different production lines. You now want to find out whether all production lines produce screws with the same weight. To do this, take 50 screws from each production line and measure the weight. Now you use the ANOVA procedure to determine whether the average weight of the screws from the three production lines differs significantly from one another.

An example of the one-way analysis of variance would be to investigate whether the daily coffee consumption of students from different fields of study differs significantly.

Assumptions for one-way analysis of variance

  • Scale level: The scale level of the dependent variable must be metric, whereas the independent variable must be nominally scaled.
  • Homogeneity: The variances in each group should be roughly the same. This can be checked with the Levene test.
  • Normal distribution: The data within the groups should be normally distributed. This means that the majority of the values are in the average range, while very few values are significantly below or significantly above. If this condition is not met, the Kruskal-Wallis test can be used (a quick check of this fallback is sketched below).
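As a quick sketch of that fallback logic (hypothetical coffee-consumption data, with the Shapiro-Wilk test as the normality check):

```python
from scipy import stats

# Hypothetical daily coffee consumption (cups) for three fields of study
law      = [2, 3, 4, 2, 3, 5, 3, 2, 4, 3]
medicine = [4, 5, 6, 5, 4, 6, 5, 7, 5, 4]
arts     = [1, 2, 2, 3, 1, 2, 3, 2, 1, 2]

# Check normality within each group with the Shapiro-Wilk test
normality_ok = all(stats.shapiro(group)[1] > 0.05 for group in (law, medicine, arts))

# Use the one-way ANOVA if normality looks plausible, otherwise fall back to Kruskal-Wallis
if normality_ok:
    print(stats.f_oneway(law, medicine, arts))
else:
    print(stats.kruskal(law, medicine, arts))
```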

If there are no independent samples but dependent ones, then a one-way analysis of variance with repeated measures is used.

Welch's ANOVA

If the condition of variance homogeneity is not fulfilled, Welch's ANOVA can be calculated instead of the "normal" ANOVA. If the Levene test shows a significant difference between the group variances, DATAtab automatically calculates Welch's ANOVA in addition.


Effect size Eta squared (η²)

The best known measures of effect size for analysis of variance are the Eta squared and the partial Eta squared. For a one-way ANOVA, the Eta squared and the partial Eta squared are identical.

Eta squared estimates the proportion of variance that a variable explains. However, it should be noted that the explained variance is always overestimated. Eta squared is calculated by dividing the sum of squares between groups by the total sum of squares.
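A minimal sketch of that calculation, using made-up groups and computing eta squared as the between-group sum of squares divided by the total sum of squares:

```python
import numpy as np

# Hypothetical data for three groups
groups = [np.array([3.0, 2.5, 3.2, 2.8]),
          np.array([4.1, 4.5, 4.0, 4.3]),
          np.array([2.0, 1.8, 2.2, 2.1])]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_total   = ((all_values - grand_mean) ** 2).sum()

eta_squared = ss_between / ss_total   # proportion of variance explained by the grouping
print(f"eta squared = {eta_squared:.3f}")
```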

Two-factor analysis of variance

As the name suggests, two-way analysis of variance examines the influence of two factors on a dependent variable. This extends the one-way analysis of variance by a further factor, i.e. by a further nominally scaled independent variable. The question is again whether the mean of the groups differs significantly.

In a screw factory, screws are produced on three different production lines (factor 1) in two shifts (factor 2). You now want to find out whether the production lines or the shifts have an influence on the weight of the screws. To do this, take 50 screws from each production line and each shift and measure the weight. Now you use two-factor ANOVA to determine whether the average weight of the screws from the three production lines and the two shifts differs significantly.

Example with DATAtab

One-way analysis of variance

You want to check whether there is a difference in coffee consumption between students in different subjects. To do this, ask 10 students from each field of study.

After the data have been copied into the hypothesis test calculator, simply click on Hypothesis test and select the relevant variables. The output table reports N, df, F and p.

Where N is the number of cases for each category, df is the degrees of freedom, F is the F-statistic from the calculated analysis of variance and p is the p-value.


Hypothesis Testing - Analysis of Variance (ANOVA)

Lisa Sullivan, PhD

Professor of Biostatistics

Boston University School of Public Health


Introduction

This module will continue the discussion of hypothesis testing, where a specific statement or hypothesis is generated about a population parameter, and sample statistics are used to assess the likelihood that the hypothesis is true. The hypothesis is based on available information and the investigator's belief about the population parameters. The specific test considered here is called analysis of variance (ANOVA) and is a test of hypothesis that is appropriate to compare means of a continuous variable in two or more independent comparison groups. For example, in some clinical trials there are more than two comparison groups. In a clinical trial to evaluate a new medication for asthma, investigators might compare an experimental medication to a placebo and to a standard treatment (i.e., a medication currently being used). In an observational study such as the Framingham Heart Study, it might be of interest to compare mean blood pressure or mean cholesterol levels in persons who are underweight, normal weight, overweight and obese.  

The technique to test for a difference in more than two independent means is an extension of the two independent samples procedure discussed previously which applies when there are exactly two independent comparison groups. The ANOVA technique applies when there are two or more than two independent groups. The ANOVA procedure is used to compare the means of the comparison groups and is conducted using the same five step approach used in the scenarios discussed in previous sections. Because there are more than two groups, however, the computation of the test statistic is more involved. The test statistic must take into account the sample sizes, sample means and sample standard deviations in each of the comparison groups.

If one is examining the means observed among, say, three groups, it might be tempting to perform three separate group to group comparisons, but this approach is incorrect because each of these comparisons fails to take into account the total data, and it increases the likelihood of incorrectly concluding that there are statistically significant differences, since each comparison adds to the probability of a type I error. Analysis of variance avoids these problems by asking a more global question, i.e., whether there are significant differences among the groups, without addressing differences between any two groups in particular (although there are additional tests that can do this if the analysis of variance indicates that there are differences among the groups).

The fundamental strategy of ANOVA is to systematically examine variability within groups being compared and also examine variability among the groups being compared.

Learning Objectives

After completing this module, the student will be able to:

  • Perform analysis of variance by hand
  • Appropriately interpret results of analysis of variance tests
  • Distinguish between one and two factor analysis of variance tests
  • Identify the appropriate hypothesis testing procedure based on type of outcome variable and number of samples

The ANOVA Approach

Consider an example with four independent groups and a continuous outcome measure. The independent groups might be defined by a particular characteristic of the participants such as BMI (e.g., underweight, normal weight, overweight, obese) or by the investigator (e.g., randomizing participants to one of four competing treatments, call them A, B, C and D). Suppose that the outcome is systolic blood pressure, and we wish to test whether there is a statistically significant difference in mean systolic blood pressures among the four groups. The sample data are organized by comparison group, with the sample size, sample mean, and sample standard deviation recorded for each group.

The hypotheses of interest in an ANOVA are as follows:

  • H0: μ1 = μ2 = μ3 = ... = μk
  • H1: Means are not all equal.

where k = the number of independent comparison groups.

In this example, the hypotheses are:

  • H0: μ1 = μ2 = μ3 = μ4
  • H1: The means are not all equal.

The null hypothesis in ANOVA is always that there is no difference in means. The research or alternative hypothesis is always that the means are not all equal and is usually written in words rather than in mathematical symbols. The research hypothesis captures any difference in means and includes, for example, the situation where all four means are unequal, where one is different from the other three, where two are different, and so on. The alternative hypothesis, as shown above, captures all possible situations other than equality of all means specified in the null hypothesis.

Test Statistic for ANOVA

The test statistic for testing H0: μ1 = μ2 = ... = μk is:

F = MSB / MSE, where MSB = SSB / (k-1) is the mean square between treatments and MSE = SSE / (N-k) is the error (residual) mean square,

and the critical value is found in a table of probability values for the F distribution with degrees of freedom df1 = k-1 and df2 = N-k.

NOTE: The test statistic F assumes equal variability in the k populations (i.e., the population variances are equal, or s1² = s2² = ... = sk²). This means that the outcome is equally variable in each of the comparison populations. This assumption is the same as that assumed for appropriate use of the test statistic to test equality of two independent means. It is possible to assess the likelihood that the assumption of equal variances is true and the test can be conducted in most statistical computing packages. If the variability in the k comparison groups is not similar, then alternative techniques must be used.

The F statistic is computed by taking the ratio of what is called the "between treatment" variability to the "residual or error" variability. This is where the name of the procedure originates. In analysis of variance we are testing for a difference in means (H 0 : means are all equal versus H 1 : means are not all equal) by evaluating variability in the data. The numerator captures between treatment variability (i.e., differences among the sample means) and the denominator contains an estimate of the variability in the outcome. The test statistic is a measure that allows us to assess whether the differences among the sample means (numerator) are more than would be expected by chance if the null hypothesis is true. Recall in the two independent sample test, the test statistic was computed by taking the ratio of the difference in sample means (numerator) to the variability in the outcome (estimated by Sp).  

The decision rule for the F test in ANOVA is set up in a similar way to decision rules we established for t tests. The decision rule again depends on the level of significance and the degrees of freedom. The F statistic has two degrees of freedom. These are denoted df 1 and df 2 , and called the numerator and denominator degrees of freedom, respectively. The degrees of freedom are defined as follows:

df1 = k-1 and df2 = N-k,

where k is the number of comparison groups and N is the total number of observations in the analysis. If the null hypothesis is true, the between treatment variation (numerator) will not substantially exceed the residual or error variation (denominator), and the F statistic will be small. If the null hypothesis is false, then the F statistic will be large. The rejection region for the F test is always in the upper (right-hand) tail of the distribution as shown below.

Rejection Region for F Test with α = 0.05, df1 = 3 and df2 = 36 (k=4, N=40)

Graph of rejection region for the F statistic with alpha=0.05

For the scenario depicted here, the decision rule is: Reject H 0 if F > 2.87.
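The critical value for a given α and pair of degrees of freedom can also be obtained directly from software instead of a printed F table. A minimal R sketch for the scenario above (α = 0.05, df1 = 3, df2 = 36):

```r
# Upper 5% point of the F distribution with 3 and 36 degrees of freedom
qf(0.95, df1 = 3, df2 = 36)
# approximately 2.87, the cutoff used in the decision rule above
```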

The ANOVA Procedure

We will next illustrate the ANOVA procedure using the five step approach. Because the computation of the test statistic is involved, the computations are often organized in an ANOVA table. The ANOVA table breaks down the components of variation in the data into variation between treatments and error or residual variation. Statistical computing packages also produce ANOVA tables as part of their standard output for ANOVA, and the ANOVA table is set up as follows: 

where  

  • X = individual observation,
  • k = the number of treatments or independent comparison groups, and
  • N = total number of observations or total sample size.

The ANOVA table above is organized as follows.

  • The first column is entitled "Source of Variation" and delineates the between treatment and error or residual variation. The total variation is the sum of the between treatment and error variation.
  • The second column is entitled "Sums of Squares (SS)" . The between treatment sums of squares is

SSB = Σ nj ( X̄j - X̄ )²

and is computed by summing the squared differences between each treatment (or group) mean X̄j and the overall mean X̄. The squared differences are weighted by the sample sizes per group (nj). The error sums of squares is:

SSE = ΣΣ ( X - X̄j )²

and is computed by summing the squared differences between each observation and its group mean (i.e., the squared differences between each observation in group 1 and the group 1 mean, the squared differences between each observation in group 2 and the group 2 mean, and so on). The double summation (ΣΣ) indicates summation of the squared differences within each treatment and then summation of these totals across treatments to produce a single value. (This will be illustrated in the following examples.) The total sums of squares is:

SST = ΣΣ ( X - X̄ )²

and is computed by summing the squared differences between each observation and the overall sample mean. In an ANOVA, data are organized by comparison or treatment groups. If all of the data were pooled into a single sample, SST would reflect the numerator of the sample variance computed on the pooled or total sample. SST does not figure into the F statistic directly. However, SST = SSB + SSE, so if two of the sums of squares are known, the third can be computed from them.

  • The third column contains degrees of freedom . The between treatment degrees of freedom is df 1 = k-1. The error degrees of freedom is df 2 = N - k. The total degrees of freedom is N-1 (and it is also true that (k-1) + (N-k) = N-1).
  • The fourth column contains "Mean Squares (MS)" which are computed by dividing sums of squares (SS) by degrees of freedom (df), row by row. Specifically, MSB = SSB/(k-1) and MSE = SSE/(N-k). Dividing SST by (N-1) produces the variance of the total sample. The F statistic is in the rightmost column of the ANOVA table and is computed by taking the ratio MSB/MSE.

A clinical trial is run to compare weight loss programs and participants are randomly assigned to one of the comparison programs and are counseled on the details of the assigned program. Participants follow the assigned program for 8 weeks. The outcome of interest is weight loss, defined as the difference in weight measured at the start of the study (baseline) and weight measured at the end of the study (8 weeks), measured in pounds.  

Three popular weight loss programs are considered. The first is a low calorie diet. The second is a low fat diet and the third is a low carbohydrate diet. For comparison purposes, a fourth group is considered as a control group. Participants in the fourth group are told that they are participating in a study of healthy behaviors with weight loss only one component of interest. The control group is included here to assess the placebo effect (i.e., weight loss due to simply participating in the study). A total of twenty patients agree to participate in the study and are randomly assigned to one of the four diet groups. Weights are measured at baseline and patients are counseled on the proper implementation of the assigned diet (with the exception of the control group). After 8 weeks, each patient's weight is again measured and the difference in weights is computed by subtracting the 8 week weight from the baseline weight. Positive differences indicate weight losses and negative differences indicate weight gains. For interpretation purposes, we refer to the differences in weights as weight losses and the observed weight losses are shown below.

Is there a statistically significant difference in the mean weight loss among the four diets?  We will run the ANOVA using the five-step approach.

  • Step 1. Set up hypotheses and determine level of significance

H0: μ1 = μ2 = μ3 = μ4        H1: Means are not all equal        α = 0.05

  • Step 2. Select the appropriate test statistic.  

The test statistic is the F statistic for ANOVA, F=MSB/MSE.

  • Step 3. Set up decision rule.  

The appropriate critical value can be found in a table of probabilities for the F distribution (see "Other Resources"). In order to determine the critical value of F we need degrees of freedom, df1 = k-1 and df2 = N-k. In this example, df1 = k-1 = 4-1 = 3 and df2 = N-k = 20-4 = 16. The critical value is 3.24 and the decision rule is as follows: Reject H0 if F > 3.24.

  • Step 4. Compute the test statistic.  

To organize our computations we complete the ANOVA table. In order to compute the sums of squares we must first compute the sample means for each group and the overall mean based on the total sample.  

We can now compute

So, in this case:

Next we compute,

SSE requires computing the squared differences between each observation and its group mean. We will compute SSE in parts. For the participants in the low calorie diet:  

For the participants in the low fat diet:  

For the participants in the low carbohydrate diet:  

For the participants in the control group:

We can now construct the ANOVA table .

  • Step 5. Conclusion.  

We reject H 0 because 8.43 > 3.24. We have statistically significant evidence at α=0.05 to show that there is a difference in mean weight loss among the four diets.    

ANOVA is a test that provides a global assessment of a statistical difference in more than two independent means. In this example, we find that there is a statistically significant difference in mean weight loss among the four diets considered. In addition to reporting the results of the statistical test of hypothesis (i.e., that there is a statistically significant difference in mean weight losses at α=0.05), investigators should also report the observed sample means to facilitate interpretation of the results. In this example, participants in the low calorie diet lost an average of 6.6 pounds over 8 weeks, as compared to 3.0 and 3.4 pounds in the low fat and low carbohydrate groups, respectively. Participants in the control group lost an average of 1.2 pounds which could be called the placebo effect because these participants were not participating in an active arm of the trial specifically targeted for weight loss. Are the observed weight losses clinically meaningful?

Another ANOVA Example

Calcium is an essential mineral that regulates the heart, is important for blood clotting and for building healthy bones. The National Osteoporosis Foundation recommends a daily calcium intake of 1000-1200 mg/day for adult men and women. While calcium is contained in some foods, most adults do not get enough calcium in their diets and take supplements. Unfortunately some of the supplements have side effects such as gastric distress, making them difficult for some patients to take on a regular basis.  

 A study is designed to test whether there is a difference in mean daily calcium intake in adults with normal bone density, adults with osteopenia (a low bone density which may lead to osteoporosis) and adults with osteoporosis. Adults 60 years of age with normal bone density, osteopenia and osteoporosis are selected at random from hospital records and invited to participate in the study. Each participant's daily calcium intake is measured based on reported food intake and supplements. The data are shown below.   

Is there a statistically significant difference in mean calcium intake in patients with normal bone density as compared to patients with osteopenia and osteoporosis? We will run the ANOVA using the five-step approach.

H0: μ1 = μ2 = μ3        H1: Means are not all equal        α = 0.05

In order to determine the critical value of F we need degrees of freedom, df1 = k-1 and df2 = N-k. In this example, df1 = k-1 = 3-1 = 2 and df2 = N-k = 18-3 = 15. The critical value is 3.68 and the decision rule is as follows: Reject H0 if F > 3.68.

To organize our computations we will complete the ANOVA table. In order to compute the sums of squares we must first compute the sample means for each group and the overall mean.  

 If we pool all N=18 observations, the overall mean is 817.8.

We can now compute:

Substituting:

SSE requires computing the squared differences between each observation and its group mean. We will compute SSE in parts. For the participants with normal bone density:

For participants with osteopenia:

For participants with osteoporosis:

We do not reject H0 because 1.395 < 3.68. We do not have statistically significant evidence at α = 0.05 to show that there is a difference in mean calcium intake in patients with normal bone density as compared to patients with osteopenia and osteoporosis. Are the differences in mean calcium intake clinically meaningful? If so, what might account for the lack of statistical significance?

One-Way ANOVA in R

The video below by Mike Marin demonstrates how to perform analysis of variance in R. It also covers some other statistical issues, but the initial part of the video will be useful to you.
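As a complement to the video, a minimal R sketch for the weight-loss example above is shown below. The data frame and variable names (wl, loss, diet) are placeholders, since the raw observations are not reproduced here:

```r
# One-way ANOVA for the weight-loss trial, assuming the observed weight
# losses are stored in a data frame `wl` with a numeric column `loss`
# and a factor `diet` with levels "LowCal", "LowFat", "LowCarb", "Control"
# (names are illustrative).
fit <- aov(loss ~ diet, data = wl)
summary(fit)                        # Df, Sum Sq, Mean Sq, F value, Pr(>F)
model.tables(fit, type = "means")   # group means to aid interpretation
```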

Two-Factor ANOVA

The ANOVA tests described above are called one-factor ANOVAs. There is one treatment or grouping factor with k > 2 levels and we wish to compare the means across the different categories of this factor. The factor might represent different diets, different classifications of risk for disease (e.g., osteoporosis), different medical treatments, different age groups, or different racial/ethnic groups. There are situations where it may be of interest to compare means of a continuous outcome across two or more factors. For example, suppose a clinical trial is designed to compare five different treatments for joint pain in patients with osteoarthritis. Investigators might also hypothesize that there are differences in the outcome by sex. This is an example of a two-factor ANOVA where the factors are treatment (with 5 levels) and sex (with 2 levels). In the two-factor ANOVA, investigators can assess whether there are differences in means due to the treatment, by sex or whether there is a difference in outcomes by the combination or interaction of treatment and sex. Higher order ANOVAs are conducted in the same way as one-factor ANOVAs presented here and the computations are again organized in ANOVA tables with more rows to distinguish the different sources of variation (e.g., between treatments, between men and women). The following example illustrates the approach.

Consider the clinical trial outlined above in which three competing treatments for joint pain are compared in terms of their mean time to pain relief in patients with osteoarthritis. Because investigators hypothesize that there may be a difference in time to pain relief in men versus women, they randomly assign 15 participating men to one of the three competing treatments and randomly assign 15 participating women to one of the three competing treatments (i.e., stratified randomization). Participating men and women do not know to which treatment they are assigned. They are instructed to take the assigned medication when they experience joint pain and to record the time, in minutes, until the pain subsides. The data (times to pain relief) are shown below and are organized by the assigned treatment and sex of the participant.

Table of Time to Pain Relief by Treatment and Sex

The analysis in two-factor ANOVA is similar to that illustrated above for one-factor ANOVA. The computations are again organized in an ANOVA table, but the total variation is partitioned into that due to the main effect of treatment, the main effect of sex and the interaction effect. The results of the analysis are shown below (and were generated with a statistical computing package - here we focus on interpretation). 

 ANOVA Table for Two-Factor ANOVA

There are 4 statistical tests in the ANOVA table above. The first test is an overall test to assess whether there is a difference among the 6 cell means (cells are defined by treatment and sex). The F statistic is 20.7 and is highly statistically significant with p=0.0001. When the overall test is significant, focus then turns to the factors that may be driving the significance (in this example, treatment, sex or the interaction between the two). The next three statistical tests assess the significance of the main effect of treatment, the main effect of sex and the interaction effect. In this example, there is a highly significant main effect of treatment (p=0.0001) and a highly significant main effect of sex (p=0.0001). The interaction between the two does not reach statistical significance (p=0.91). The table below contains the mean times to pain relief in each of the treatments for men and women (Note that each sample mean is computed on the 5 observations measured under that experimental condition).  

Mean Time to Pain Relief by Treatment and Gender

Treatment A appears to be the most efficacious treatment for both men and women. The mean times to relief are lower in Treatment A for both men and women and highest in Treatment C for both men and women. Across all treatments, women report longer times to pain relief (See below).  

Graph of two-factor ANOVA

Notice that there is the same pattern of time to pain relief across treatments in both men and women (treatment effect). There is also a sex effect - specifically, time to pain relief is longer in women in every treatment.  
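A two-factor ANOVA of this kind is usually fit with statistical software. A minimal R sketch, assuming the times to pain relief are stored in a data frame pain with columns time, treatment (A, B, C) and sex (the names are illustrative placeholders):

```r
# Two-factor ANOVA with interaction: main effect of treatment, main effect
# of sex, and the treatment-by-sex interaction.
fit2 <- aov(time ~ treatment * sex, data = pain)
summary(fit2)

# An interaction plot helps show whether the treatment effect differs by
# sex; roughly parallel profiles suggest little or no interaction.
with(pain, interaction.plot(treatment, sex, time))
```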

Suppose that the same clinical trial is replicated in a second clinical site and the following data are observed.

Table - Time to Pain Relief by Treatment and Sex - Clinical Site 2

The ANOVA table for the data measured in clinical site 2 is shown below.

Table - Summary of Two-Factor ANOVA - Clinical Site 2

Notice that the overall test is significant (F=19.4, p=0.0001), there is a significant treatment effect, sex effect and a highly significant interaction effect. The table below contains the mean times to relief in each of the treatments for men and women.  

Table - Mean Time to Pain Relief by Treatment and Gender - Clinical Site 2

Notice that now the differences in mean time to pain relief among the treatments depend on sex. Among men, the mean time to pain relief is highest in Treatment A and lowest in Treatment C. Among women, the reverse is true. This is an interaction effect (see below).  

Graphic display of the results in the preceding table

Notice above that the treatment effect varies depending on sex. Thus, we cannot summarize an overall treatment effect (in men, treatment C is best, in women, treatment A is best).    

When interaction effects are present, some investigators do not examine main effects (i.e., do not test for treatment effect because the effect of treatment depends on sex). This issue is complex and is discussed in more detail in a later module. 


10.1 - Introduction to Analysis of Variance

Let's use the following example to look at the logic behind what an analysis of variance is trying to accomplish.

Application: Tar Content Comparisons

We want to see whether the tar contents (in milligrams) for three different brands of cigarettes are different. Two different labs took samples, Lab Precise and Lab Sloppy.

Lab Precise

Lab Precise took six samples from each of the three brands and got the following measurements:

Lab Precise Dotplot

Dotplot of the 3 brands for lab precise

Lab Sloppy also took six samples from each of the three brands and got the following measurements:

Lab Sloppy Dotplot

Dotplot of the six samples taken from each brand for the Sloppy Lab

The sample means for each brand turned out to be the same in the two labs, and thus the differences in the sample means between the two labs are zero.

From which data set can you draw more conclusive evidence that the means from the three populations are different?

We need to compare the between-sample variation to the within-sample variation. Because the within-sample variation is small relative to the between-sample variation in the data from Lab Precise, we will be more inclined to conclude that the three population means are different using the data from Lab Precise. Since such an analysis is based on analyzing the variances in the data set, we call this statistical method the Analysis of Variance (or ANOVA).
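The same point can be illustrated with a small simulation; the brand means and spreads below are made up purely for illustration and are not the labs' actual measurements:

```r
# Three brands with the same true means in both "labs", but different
# within-sample spread (precise vs. sloppy measurement).
set.seed(1)
brands  <- factor(rep(c("A", "B", "C"), each = 6))
truth   <- rep(c(10, 12, 14), each = 6)      # hypothetical brand means
precise <- truth + rnorm(18, sd = 0.5)       # small within-sample variation
sloppy  <- truth + rnorm(18, sd = 4)         # large within-sample variation

summary(aov(precise ~ brands))   # large F: strong evidence of a difference
summary(aov(sloppy  ~ brands))   # much smaller F: the same mean differences
                                 # are much harder to detect
```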

Statology

Statistics Made Easy

One-Way ANOVA: Definition, Formula, and Example

A one-way ANOVA  (“analysis of variance”) compares the means of three or more independent groups to determine if there is a statistically significant difference between the corresponding population means.

This tutorial explains the following:

  • The motivation for performing a one-way ANOVA.
  • The assumptions that should be met to perform a one-way ANOVA.
  • The process to perform a one-way ANOVA.
  • An example of how to perform a one-way ANOVA.

One-Way ANOVA: Motivation

Suppose we want to know whether or not three different exam prep programs lead to different mean scores on a college entrance exam. Since there are millions of high school students around the country, it would be too time-consuming and costly to go around to each student and let them use one of the exam prep programs.

Instead, we might select three  random samples  of 100 students from the population and allow each sample to use one of the three test prep programs to prepare for the exam. Then, we could record the scores for each student once they take the exam.

Selecting samples from a population

However, it’s virtually guaranteed that the mean exam score between the three samples will be at least a little different.  The question is whether or not this difference is statistically significant . Fortunately, a one-way ANOVA allows us to answer this question.

One-Way ANOVA: Assumptions

For the results of a one-way ANOVA to be valid, the following assumptions should be met:

1. Normality  – Each sample was drawn from a normally distributed population.

2. Equal Variances  – The variances of the populations that the samples come from are equal. You can use Bartlett’s Test to verify this assumption.

3. Independence  – The observations in each group are independent of each other and the observations within groups were obtained by a random sample.

Read this article for in-depth details on how to check these assumptions.
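For instance, the equal-variances assumption mentioned above can be checked in R with Bartlett's test. A short sketch, assuming the scores are stored in a data frame dat with columns score and group (placeholder names):

```r
# Bartlett's test of homogeneity of variances across groups.
# A small p-value suggests the equal-variance assumption is questionable.
bartlett.test(score ~ group, data = dat)
```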

One-Way ANOVA: The Process

A one-way ANOVA uses the following null and alternative hypotheses:

  • H0 (null hypothesis): μ1 = μ2 = μ3 = … = μk (all the population means are equal)
  • H1 (alternative hypothesis): at least one population mean is different from the rest

You will typically use some statistical software (such as R, Excel, Stata, SPSS, etc.) to perform a one-way ANOVA since it’s cumbersome to perform by hand.

No matter which software you use, you will receive the following table as output:

  • SSR: regression sum of squares
  • SSE: error sum of squares
  • SST: total sum of squares (SST = SSR + SSE)
  • df r : regression degrees of freedom (df r  = k-1)
  • df e : error degrees of freedom (df e  = n-k)
  • k:  total number of groups
  • n:  total observations
  • MSR:  regression mean square (MSR = SSR/df r )
  • MSE: error mean square (MSE = SSE/df e )
  • F:  The F test statistic (F = MSR/MSE)
  • p:  The p-value that corresponds to F dfr, dfe

If the p-value is less than your chosen significance level (e.g. 0.05), then you can reject the null hypothesis and conclude that at least one of the population means is different from the others.

Note: If you reject the null hypothesis, this indicates that at least one of the population means is different from the others, but the ANOVA table doesn’t specify which  population means are different. To determine this, you need to perform post hoc tests , also known as “multiple comparisons” tests.
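In R, for example, such pairwise comparisons can be obtained with Tukey's HSD applied to a fitted one-way ANOVA object (a sketch; fit is assumed to be the object returned by aov()):

```r
# Tukey's Honest Significant Difference: all pairwise group comparisons,
# with an adjustment for multiple testing.
TukeyHSD(fit)
```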

One-Way ANOVA: Example

Suppose we want to know whether or not three different exam prep programs lead to different mean scores on a certain exam. To test this, we recruit 30 students to participate in a study and split them into three groups.

The students in each group are randomly assigned to use one of the three exam prep programs for the next three weeks to prepare for an exam. At the end of the three weeks, all of the students take the same exam. 

The exam scores for each group are shown below:

Example one-way ANOVA data

To perform a one-way ANOVA on this data, we will use the Statology One-Way ANOVA Calculator with the following input:

One-way ANOVA calculation example

From the output table we see that the F test statistic is  2.358  and the corresponding p-value is  0.11385 .

ANOVA output table interpretation

Since this p-value is not less than 0.05, we fail to reject the null hypothesis.

This means  we don’t have sufficient evidence to say that there is a statistically significant difference between the mean exam scores of the three groups.

Additional Resources

The following articles explain how to perform a one-way ANOVA using different statistical softwares:

  • How to Perform a One-Way ANOVA in Excel
  • How to Perform a One-Way ANOVA in R
  • How to Perform a One-Way ANOVA in Python
  • How to Perform a One-Way ANOVA in SAS
  • How to Perform a One-Way ANOVA in SPSS
  • How to Perform a One-Way ANOVA in Stata
  • How to Perform a One-Way ANOVA on a TI-84 Calculator
  • Online One-Way ANOVA Calculator


Analysis of Variance

  • Reference work entry
  • First Online: 03 December 2021

  • Jan R. Landwehr


Experiments are becoming increasingly important in marketing research. Suppose a company has to decide which of three potential new brand logos should be used in the future. An experiment in which three groups of participants rate their liking of one of the logos would provide the necessary information to make this decision. The statistical challenge is to determine which (if any) of the three logos is liked significantly more than the others. The adequate statistical technique to assess the statistical significance of such mean differences between groups of participants is called analysis of variance (ANOVA). The present chapter provides an introduction to the key statistical principles of ANOVA and compares this method to the closely related t -test, which can alternatively be used if exactly two means need to be compared. Moreover, it provides introductions to the key variants of ANOVA that have been developed for use when participants are exposed to more than one experimental condition (repeated-measures ANOVA), when more than one dependent variable is measured (multivariate ANOVA), or when a continuous control variable is considered (analysis of covariance). This chapter is intended to provide an applied introduction to ANOVA and its variants. Therefore, it is accompanied by an exemplary dataset and self-explanatory command scripts for the statistical software packages R and SPSS, which can be found in the Web-Appendix.


The naming of the variables throughout the chapter follows the key characteristic of the respective experimental scenario. “_2” refers to the two factor levels employed in the present experiment. All variable names are constructed following the same logic.

All barplots in this chapter were produced using the ggplot2 -library in R (Wickham 2009 ).

R denotes the p -value by “Pr(>F),” which refers to the probability of observing the empirical F -value given the null hypothesis. R uses exponential notation to show small numbers. Hence, the value 1.45e-05 in Fig. 3 is equivalent to 0.0000145.

In real data collections, we would collect a second independent dataset from new participants. Please assume that although the data for the second experiment (and all further studies) are stored in the same dataset, these datasets are independent and come from different participants.

Please note how the df of the ANOVA changed compared to Fig. 3 due to three rather than two factor levels.

The interested reader can find more information about a priori contrasts (also called planned contrasts) in the textbooks of Field ( 2013 ), Field et al. ( 2012 ), and of Klockars and Sax ( 1986 ).

For the example with 2 × 2 experimental cells provided in Table 4 , dummy-coding Factor 1 (simple = 0; complex = 1) and Factor 2 (business = 0; leisure = 1) would mean that the effect of Factor 1 compares the cell denoted by {0,0} (i.e., “simple and business”) to the two cells for which Factor 1 has the value 1 (i.e., “complex and business” and “complex and leisure”). The cell “simple and leisure” would be omitted from the test of the main effect, which is an undesirable feature of dummy coding when applied to ANOVA models.

It is important to note that the term “Type I” is used to denote more than just one statistical concept, which can be confusing. We already encountered the term in the context of the statistical p -value, where falsely rejecting the null hypothesis is called an alpha or Type I error. In the present context, “Type I” refers to a specific way of computing the sum of squares in an ANOVA model, which is completely unrelated to the “Type I error” in statistical hypothesis testing.

Please note that the residual degrees of freedom (i.e., 116) for the simple effects are the same as in the initial factorial ANOVA. This is the reason why simple effects have higher statistical power than other post hoc approaches that would just compare the two means, such as an independent-samples t -test.

The term demand artifact indicates that participants guess the hypothesis of an experiment and demonstrate behavior that is consistent with their guess instead of their natural behavior. Therefore, the occurrence of a demand artifact destroys the external validity of the observed effects. Sawyer ( 1975 ) provides an excellent discussion of this problem and potential solutions.

A third possible approach would be an extension of the regression framework called linear mixed models (LMM; for an applied introduction, see West et al. 2015 ).

For example, when the effect of funny vs. rational advertisement is examined, one usually shows several funny and several rational advertisements and compares the aggregated mean evaluations. The random variation between advertisements can be controlled by LMM.

An excellent introduction to the use of effect size measures and a comparison of different approaches can be found in the referred article by Lakens ( 2013 ).

Beaujean, A.A. (2012). BaylorEdPsych: R package for Baylor University educational psychology quantitative courses . R package version 0.5.


Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale: Lawrence Erlbaum Associates.

Cumming, G., & Finch, S. (2005). Inference by eye: Confidence intervals and how to read pictures of data. American Psychologist, 60 (2), 170–180.


Field, A. (2013). Discovering statistics using R (4th ed.). Los Angeles: Sage.

Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R . Los Angeles: Sage.

Fisher, R. A. (1935). The design of experiments . Edinburgh: Oliver & Boyd.

Fox, J., & Weisberg, S. (2011). An {R} companion to applied regression (2nd ed.). Thousand Oaks: Sage.

Greenhouse, S. W., & Geisser, S. (1959). On methods in the analysis of profile data. Psychometrika, 24 (2), 95–112.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6 (2), 65–70.

Huynh, H., & Feldt, L. S. (1976). Estimation of the box correction for degrees of freedom from sample data in randomized block and split-plot designs. Journal of Educational Statistics, 1 (1), 69–82.

Klockars, A. J., & Sax, G. (1986). Multiple comparisons . Newbury Park: Sage.


Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology . https://doi.org/10.3389/fpsyg.2013.00863 .

Lawrence, M.A. (2015). ez: Easy analysis and visualization of factorial experiments . R package version 4.3.

Levene, H. (1960). Robust tests for equality of variances. In I. Olkin et al. (Eds.), Contributions to probability and statistics (pp. 278–292). Stanford: University Press.

Malhotra, N. K., Peterson, M., & Kleiser, S. B. (1999). Marketing research: A state-of-the-art review and directions for the twenty-first century. Journal of the Academy of Marketing Science, 27 (2), 160–183.

Miller, G. A., & Chapman, J. P. (2001). Misunderstanding analysis of covariance. Journal of Abnormal Psychology, 110 (1), 40–48.

Richardson, J. T. E. (2011). Eta squared and partial eta squared as measures of effect size in educational research. Educational Research Review, 6 (2), 135–147.

Rodger, R. S., & Roberts, M. (2013). Comparison of power for multiple comparison procedures. Journal of Methods and Measurements in the Social Sciences, 4 (1), 20–47.

Rutherford, A. (2001). Introducing ANOVA and MANOVA: A GLM approach . London: Sage.

Sawyer, A. G. (1975). Demand artifacts in laboratory experiments in consumer research. Journal of Consumer Research, 1 (4), 20–30.

West, B. T., Welch, K. B., & Galecki, A. T. (2015). Linear mixed models: A practical guide using statistical software (2nd ed.). Boca Raton: Chapman & Hall.

Westfall, J., Kenny, D. A., & Judd, C. M. (2014). Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli. Journal of Experimental Psychology: General, 143 (5), 2020–2045.

Wickham, H. (2009). ggplot2: Elegant graphics for data analysis . New York: Springer.


What Is Analysis of Variance (ANOVA)?


Learn how to use this statistical analysis tool


Analysis of variance (ANOVA) is a statistical test used to evaluate the difference between the means of more than two groups. This statistical analysis tool separates the total variability within a data set into two components: random and systematic factors.

A one-way ANOVA uses one independent variable. A two-way ANOVA uses two independent variables. Analysts use the ANOVA test to determine independent variables' influence on the dependent variable in a regression study.

Key Takeaways

  • Analysis of variance (ANOVA) is a statistical test used to evaluate the difference between the means of more than two groups.
  • A one-way ANOVA uses one independent variable. A two-way ANOVA uses two independent variables.
  • If no true variance exists between the groups, the ANOVA's F-ratio should equal close to 1.

An ANOVA test can be applied when data needs to be experimental. Analysis of variance is employed if there is no access to statistical software and ANOVA must be calculated by hand. It is simple to use and best suited for small samples. It is employed with subjects, test groups, and between and among groups.

ANOVA is similar to multiple two-sample t-tests . However, it results in fewer type I errors . ANOVA groups differences by comparing each group's means and includes spreading the variance into diverse sources. Analysts use a one-way ANOVA with collected data about one independent variable and one dependent variable. A two-way ANOVA uses two independent variables . The independent variable should have at least three different groups or categories. ANOVA determines if the dependent variable changes according to the level of the independent variable. 

A researcher might test students from multiple colleges to see if students from one of the colleges consistently outperform students from the other schools. In a business application, an R&D researcher might test two different processes of creating a product to see if one is better than the other in terms of cost efficiency.

F = MST / MSE

where:

F = ANOVA coefficient
MST = Mean sum of squares due to treatment
MSE = Mean sum of squares due to error

History of ANOVA

The t- and z-test methods developed in the 20th century were used for statistical analysis until 1918 when Ronald Fisher created the analysis of variance method. ANOVA is also called the Fisher analysis of variance, and it is the extension of the t- and z-tests. The term became well-known in 1925, after appearing in Fisher's book, "Statistical Methods for Research Workers." It was employed in experimental psychology and later expanded to more complex subjects.

The ANOVA test is the initial step in analyzing factors that affect a given data set. Once the test is finished, an analyst performs additional testing on the methodical factors that measurably contribute to the data set's inconsistency. The analyst utilizes the ANOVA test results in an f-test to generate additional data that aligns with the proposed regression models.

ANOVA splits an observed aggregate variability inside a data set into two parts: systematic factors and random factors. The systematic factors influence the given data set, while the random factors do not.

The ANOVA test allows a comparison of more than two groups simultaneously to determine whether a relationship exists between them. The result of the ANOVA formula, the F statistic or F-ratio, allows for the analysis of multiple data groups to determine the variability between samples and within samples.

If no real difference exists between the tested groups, called the null hypothesis , the result of the ANOVA's F-ratio statistic will be close to 1. The distribution of all possible values of the F statistic is the F-distribution. This is a group of distribution functions, with two characteristic numbers, called the numerator degrees of freedom and the denominator degrees of freedom.

A one-way ANOVA evaluates the impact of a sole factor on a sole response variable. It determines whether all the samples are the same. The one-way ANOVA is used to determine whether there are any statistically significant differences between the means of three or more independent groups.

A two-way ANOVA is an extension of the one-way ANOVA. With a one-way, you have one independent variable affecting a dependent variable. With a two-way ANOVA, there are two independents. For example, a two-way ANOVA allows a company to compare worker productivity based on two independent variables, such as salary and skill set. It is utilized to observe the interaction between the two factors and test the effect of two factors simultaneously.

MANOVA (multivariate ANOVA), differs from ANOVA as it tests for multiple dependent variables simultaneously while the ANOVA assesses only one dependent variable at a time.

How Does ANOVA Differ From a T Test?

ANOVA differs from T tests in that ANOVA can compare three or more groups while T tests are only useful for comparing two groups at one time.

What Is Analysis of Covariance (ANCOVA)?

Analysis of Covariance combines ANOVA and regression. It can be useful for understanding within-group variance that ANOVA tests do not explain.

Does ANOVA Rely on Any Assumptions?

Yes, ANOVA tests assume that the data is normally distributed and that variance levels in each group are roughly equal. Finally, it assumes that all observations are made independently. If these assumptions are inaccurate, ANOVA may not be useful for comparing groups.

ANOVA can compare more than two groups to identify relationships between them. The technique can be used in scholarly settings to analyze research or finance to predict future movements in stock prices.

Genetic Epidemiology, Translational Neurogenomics, Psychiatric Genetics and Statistical Genetics-QIMR Berghofer Medical Research Institute. " The Correlation Between Relatives on the Supposition of Mendelian Inheritance ."

Ronald Fisher. " Statistical Methods for Research Workers ." Springer-Verlag New York, 1992.


Statistics LibreTexts

5.1: Analysis of Variance


  • Diane Kiernan
  • SUNY College of Environmental Science and Forestry via OpenSUNY


Variance Analysis

Previously, we have tested hypotheses about two population means. This chapter examines methods for comparing more than two means. Analysis of variance (ANOVA) is an inferential method used to test the equality of three or more population means.

\(H_0: \mu_1= \mu_2= \mu_3= \cdots =\mu_k\)

This method is also referred to as single-factor ANOVA because we use a single property, or characteristic, for categorizing the populations. This characteristic is sometimes referred to as a treatment or factor.

A treatment (or factor) is a property, or characteristic, that allows us to distinguish the different populations from one another.

The objectives of ANOVA are (1) to estimate treatment means, and the differences between treatment means, and (2) to test hypotheses for statistical significance of comparisons of treatment means, where "treatment" or "factor" is the characteristic that distinguishes the populations.

For example, a biologist might compare the effect that three different herbicides may have on seed production of an invasive species in a forest environment. The biologist would want to estimate the mean annual seed production under the three different treatments, while also testing to see which treatment results in the lowest annual seed production. The null and alternative hypotheses are:

\(H_0: \mu_1= \mu_2= \mu_3\) versus \(H_1:\) at least one mean is different.

It would be tempting to test this null hypothesis \(H_0: \mu_1= \mu_2= \mu_3\) by comparing the population means two at a time. If we continue this way, we would need to test three different pairs of hypotheses: \(\mu_1= \mu_2\), \(\mu_1= \mu_3\), and \(\mu_2= \mu_3\).

If we used a 5% level of significance, each test would have a probability of a Type I error (rejecting the null hypothesis when it is true) of α = 0.05. Each test would have a 95% probability of correctly not rejecting the null hypothesis. The probability that all three tests correctly do not reject the null hypothesis is 0.95³ = 0.86. There is a 1 – 0.95³ = 0.14 (14%) probability that at least one test will lead to an incorrect rejection of the null hypothesis. A 14% probability of a Type I error is much higher than the desired alpha of 5% (remember: α is the same as Type I error). As the number of populations increases, the probability of making a Type I error using multiple t-tests also increases. Analysis of variance allows us to test the null hypothesis (all means are equal) against the alternative hypothesis (at least one mean is different) with a specified value of α.
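The familywise error rate quoted above is easy to verify; a one-line check in R:

```r
# Probability of at least one Type I error across three independent tests,
# each conducted at alpha = 0.05
1 - 0.95^3   # 0.142625, i.e., about 14%
```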


In the previous chapter, we used a two-sample t-test to compare the means from two independent samples with a common variance. The sample data are used to compute the test statistic:

\(t=\dfrac {\bar {x_1}-\bar {x_2}}{s_p\sqrt {\dfrac {1}{n_1}+\dfrac {1}{n_2}}}\) where \(S_p^2 = \dfrac {(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}\)

is the pooled estimate of the common population variance \(\sigma^2\). To test more than two populations, we must extend this idea of pooled variance to include all samples as shown below:

\[s^2_w= \frac {(n_1-1)s_1^2 + (n_2-1)s_2^2 + ...+(n_k - 1)s_k^2}{n_1+n_2+...+n_k-k}\]

where \(s_w^2\) represents the pooled estimate of the common variance \(\sigma^2\), and it measures the variability of the observations within the different populations whether or not \(H_0\) is true. This is often referred to as the variance within samples (variation due to error).

If the null hypothesis IS true (all the means are equal), then all the populations are the same, with a common mean \(\mu\) and variance \(\sigma^2\). Instead of randomly selecting different samples from different populations, we are actually drawing k different samples from one population. We know that the sampling distribution for k means based on n observations will have mean \(\mu_{\bar x}\) and variance \(\frac {\sigma^2}{n}\) (squared standard error). Since we have drawn k samples of n observations each, we can estimate the variance of the k sample means (\(\frac {\sigma^2}{n}\)) by

\[\dfrac {\sum(\bar {x}_i - \mu_{\bar x} )^2}{k-1} = \dfrac {\sum \bar {x}_i^2 - \dfrac {[\sum \bar {x}_i]^2}{k}}{k-1} \approx \frac {\sigma^2}{n}\]

Consequently, n times the sample variance of the means estimates \(\sigma^2\). We designate this quantity as \(S_B^2\) such that

\[S_B^2 = n\cdot\dfrac {\sum (\bar {x}_i-\mu_{\bar x})^2}{k-1}=n\cdot\dfrac {\sum \bar {x}_i^2 -\dfrac {[\sum \bar {x}_i]^2}{k}}{k-1}\]

where \(S_B^2\) is also an unbiased estimate of the common variance \(\sigma^2\), IF \(H_0\) IS TRUE. This is often referred to as the variance between samples (variation due to treatment).

Under the null hypothesis that all k populations are identical, we have two estimates of \(\sigma^2\) (\(S_W^2\) and \(S_B^2\)). We can use the ratio \(S_B^2/ S_W^2\) as a test statistic to test the null hypothesis that \(H_0: \mu_1= \mu_2= \mu_3= \cdots= \mu_k\), which follows an F-distribution with degrees of freedom \(df_1= k - 1\) and \(df_2= N - k\), where k is the number of populations and N is the total number of observations (\(N = n_1 + n_2+\cdots+ n_k\)). The numerator of the test statistic measures the variation between sample means. The estimate of the variance in the denominator depends only on the sample variances and is not affected by the differences among the sample means.

When the null hypothesis is true, the ratio of \(S_B^2\) and \(S_W^2\) will be close to 1. When the null hypothesis is false, \(S_B^2\) will tend to be larger than \(S_W^2\) due to the differences among the populations. We will reject the null hypothesis if the F test statistic is larger than the F critical value at a given level of significance (or if the p-value is less than the level of significance).

Tables are a convenient format for summarizing the key results in ANOVA calculations. The following one-way ANOVA table illustrates the required computations and the relationships between the various ANOVA table elements.

Table 1. One-way ANOVA table.

The sum of squares for the ANOVA table has the relationship of SSTo = SSTr + SSE where:

\[SSTo = \sum_{i=1}^k \sum_{j=1}^n (x_{ij} - \bar {\bar{x}})^2\]

\[SSTr = \sum_{i=1}^k n_i(\bar {x_i} -\bar {\bar{x}})^2\]

\[SSE = \sum_{i=1}^k \sum^n_{j=1} (x_{ij}-\bar {x_i})^2\]

Total variation (SSTo) = explained variation (SSTr) + unexplained variation (SSE)

The degrees of freedom also have a similar relationship: df(SSTo) = df(SSTr) + df(SSE)

The Mean Sum of Squares for the treatment and error are found by dividing the Sums of Squares by the degrees of freedom for each. While the Sums of Squares are additive, the Mean Sums of Squares are not. The F-statistic is then found by dividing the Mean Sum of Squares for the treatment (MSTr) by the Mean Sum of Squares for the error (MSE). The MSTr is the \(S_B^2\) and the MSE is the \(S_W^2\).

\[F=\dfrac {S_B^2}{S_W^2}=\dfrac {MSTr}{MSE}\]

Example \(\PageIndex{1}\):

An environmentalist wanted to determine if the mean acidity of rain differed among Alaska, Florida, and Texas. He randomly selected six rain dates at each site obtained the following data:

Table 2. Data for Alaska, Florida, and Texas.

\(H_0: \mu_A = \mu_F = \mu_T\)

\(H_1\): at least one of the means is different

Table 3. Summary Table.

Notice that there are differences among the sample means. Are the differences small enough to be explained solely by sampling variability? Or are they of sufficient magnitude so that a more reasonable explanation is that the μ’s are not all equal? The conclusion depends on how much variation among the sample means (based on their deviations from the grand mean) compares to the variation within the three samples.

The grand mean is equal to the sum of all observations divided by the total sample size:

\(\bar {\bar{x}}\)= grand total/N = 90.52/18 = 5.0289

\[SSTo = (5.11-5.0289)^2 + (5.01-5.0289)^2 +…+(5.24-5.0289)^2+ (4.87-5.0289)^2 + (4.18-5.0289)^2 +…+(4.09-5.0289)^2 + (5.46-5.0289)^2 + (6.29-5.0289)^2 +…+(5.30-5.0289)^2 = 4.6384\]

\[SSTr = 6(5.033-5.0289)^2 + 6(4.517-5.0289)^2 + 6(5.537-5.0289)^2 = 3.1214\]

\[SSE = SSTo – SSTr = 4.6384 – 3.1214 = 1.5170\]

Table 4. One-way ANOVA Table.

This test is based on \(df_1 = k – 1 = 2\) and \(df_2 = N – k = 15\). For α = 0.05, the F critical value is 3.68. Since the observed F = 15.4372 is greater than the F critical value of 3.68, we reject the null hypothesis. There is enough evidence to state that at least one of the means is different.

Software Solutions


One-way ANOVA: pH vs. State

The p-value (0.000) is less than the level of significance (0.05) so we will reject the null hypothesis.


ANOVA: Single Factor

The p-value (0.000229) is less than alpha (0.05) so we reject the null hypothesis. There is enough evidence to support the claim that at least one of the means is different.
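The software output above can also be reproduced from the sums of squares computed earlier. A short R sketch using the rounded values from this example:

```r
# Rebuild the F statistic and p-value for the rain-pH example from the
# sums of squares reported above (values rounded as in the text).
SSTr <- 3.1214; SSE <- 1.5170
df1  <- 2;      df2 <- 15                 # k - 1 and N - k
MSTr <- SSTr / df1                        # about 1.561
MSE  <- SSE  / df2                        # about 0.101
Fstat <- MSTr / MSE                       # about 15.4
pf(Fstat, df1, df2, lower.tail = FALSE)   # about 0.00023, matching the output
```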

Once we have rejected the null hypothesis and found that at least one of the treatment means is different, the next step is to identify those differences. There are two approaches that can be used to answer this type of question: contrasts and multiple comparisons.

Contrasts can be used only when there are clear expectations BEFORE starting an experiment, and these are reflected in the experimental design. Contrasts are planned comparisons . For example, mule deer are treated with drug A, drug B, or a placebo to treat an infection. The three treatments are not symmetrical. The placebo is meant to provide a baseline against which the other drugs can be compared. Contrasts are more powerful than multiple comparisons because they are more specific. They are more able to pick up a significant difference. Contrasts are not always readily available in statistical software packages (when they are, you often need to assign the coefficients), or may be limited to comparing each sample to a control.

Multiple comparisons should be used when there are no justified expectations. They are a posteriori, pair-wise tests of significance. For example, we compare the gas mileage for six brands of all-terrain vehicles. We have no prior knowledge to expect any vehicle to perform differently from the rest. Pair-wise comparisons should be performed here, but only if an ANOVA test on all six vehicles rejected the null hypothesis first.

It is NOT appropriate to use a contrast test when suggested comparisons appear only after the data have been collected. We are going to focus on multiple comparisons instead of planned contrasts.


  14. 15.1: Introduction to ANOVA

    Describe the uses of ANOVA. Analysis of Variance (ANOVA) is a statistical method used to test differences between two or more means. It may seem odd that the technique is called "Analysis of Variance" rather than "Analysis of Means." As you will see, the name is appropriate because inferences about means are made by analyzing variance.

  15. Analysis of variance

    Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures (such as the "variation" among and between groups) used to analyze the differences among means. ANOVA was developed by the statistician Ronald Fisher.ANOVA is based on the law of total variance, where the observed variance in a particular variable is partitioned into components ...

  16. 10.1

    Since the between-sample-variation from Lab Sloppy is large compared to the within-sample-variation for data from Lab Precise, we will be more inclined to conclude that the three population means are different using the data from Lab Precise. Since such analysis is based on the analysis of variances for the data set, we call this statistical ...

  17. One-Way ANOVA: Definition, Formula, and Example

    A one-way ANOVA ("analysis of variance") compares the means of three or more independent groups to determine if there is a statistically significant difference between the corresponding population means.. This tutorial explains the following: The motivation for performing a one-way ANOVA. The assumptions that should be met to perform a one-way ANOVA.

  18. How to Calculate Variance

    With samples, we use n - 1 in the formula because using n would give us a biased estimate that consistently underestimates variability. The sample variance would tend to be lower than the real variance of the population. Reducing the sample n to n - 1 makes the variance artificially large, giving you an unbiased estimate of variability: it is better to overestimate rather than ...

  19. PDF Introduction to analysis of variance

    Analysis of variance, often abbreviated to ANOVA, is a powerful statistic and a core technique for testing causality in biological data. Researchers use ANOVA to explain variation in the magnitude of a response variable of interest. For example, an investigator might be interested in the sources of variation in patients' blood cholesterol ...

  20. Analysis of Variance

    The higher the empirical t-value , the less likely is a purely random mean difference.The theoretical t-distribution can be used to compute the exact likelihood of observing an empirical mean difference given the null hypothesis that both means are the same, which is denoted as the p-value.For the present example, the mean is 4.32 for the simple font condition and 5.53 for the complex font ...

  21. What Is Analysis of Variance (ANOVA)?

    Analysis Of Variance - ANOVA: Analysis of variance (ANOVA) is an analysis tool used in statistics that splits the aggregate variability found inside a data set into two parts: systematic factors ...

  22. 5.1: Analysis of Variance

    A general rule of thumb is as follows: One-way ANOVA may be used if the largest sample standard deviation is no more than twice the smallest sample standard deviation. In the previous chapter, we used a two-sample t-test to compare the means from two independent samples with a common variance. The sample data are used to compute the test statistic: