Teach yourself statistics

How to Test for Normality: Three Simple Tests

Many statistical techniques (regression, ANOVA, t-tests, etc.) rely on the assumption that data is normally distributed. For these techniques, it is good practice to examine the data to confirm that the assumption of normality is tenable.

With that in mind, here are three simple ways to test interval-scale data or ratio-scale data for normality.

  • Check descriptive statistics.
  • Generate a histogram.
  • Conduct a chi-square test.

Each option is easy to implement with Excel, as long as you have Excel's Analysis ToolPak.

The Analysis ToolPak

To conduct the tests for normality described below, you need a free Microsoft add-in called the Analysis ToolPak, which may or may not be already installed on your copy of Excel.

To determine whether you have the Analysis ToolPak, click the Data tab in the main Excel menu. If you see Data Analysis in the Analysis section, you're good. You have the ToolPak.

If you don't have the ToolPak, you need to get it. Go to: How to Install the Data Analysis ToolPak in Excel .

Descriptive Statistics

Perhaps the easiest way to test for normality is to examine several common descriptive statistics. Here's what to look for:

  • Central tendency. The mean and the median are summary measures used to describe central tendency - the most "typical" value in a set of values. With a normal distribution, the mean is equal to the median.
  • Skewness. Skewness is a measure of the asymmetry of a probability distribution. If observations are equally distributed around the mean, the skewness value is zero; otherwise, the skewness value is positive or negative. As a rule of thumb, skewness between -2 and +2 is consistent with a normal distribution.
  • Kurtosis. Kurtosis is a measure of whether observations cluster around the mean of the distribution or in the tails of the distribution. The normal distribution has a kurtosis value of zero. As a rule of thumb, kurtosis between -2 and +2 is consistent with a normal distribution.

Together, these descriptive measures provide a useful basis for judging whether a data set satisfies the assumption of normality.

To see how to compute descriptive statistics in Excel, consider the following data set:

Begin by entering data in a column or row of an Excel spreadsheet:

Next, from the navigation menu in Excel, click Data / Data analysis . That displays the Data Analysis dialog box. From the Data Analysis dialog box, select Descriptive Statistics and click the OK button:

Then, in the Descriptive Statistics dialog box, enter the input range, and click the Summary Statistics check box. The dialog box, with entries, should look like this:

And finally, to display summary statistics, click the OK button on the Descriptive Statistics dialog box. Among other outputs, you should see the following:

The mean is nearly equal to the median. And both skewness and kurtosis are between -2 and +2.

Conclusion: These descriptive statistics are consistent with a normal distribution.
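The same three checks can be scripted outside Excel. Below is a minimal sketch in Python, assuming NumPy and SciPy are available; the sample values are placeholders rather than the data set from the example above.

    import numpy as np
    from scipy import stats

    # Placeholder sample; substitute your own interval- or ratio-scale data.
    data = np.array([2.5, 3.1, 4.0, 4.6, 5.0, 5.2, 5.5, 6.1, 6.8, 7.4])

    mean = np.mean(data)
    median = np.median(data)
    skew = stats.skew(data)          # zero for a perfectly symmetric distribution
    kurt = stats.kurtosis(data)      # excess kurtosis; zero for a normal distribution

    print(f"mean={mean:.2f}, median={median:.2f}, skew={skew:.2f}, kurtosis={kurt:.2f}")

    # Rule of thumb from the text: mean close to median, and both
    # skewness and kurtosis between -2 and +2.
    print("Consistent with normality?", abs(skew) < 2 and abs(kurt) < 2)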

Histogram

Another easy way to test for normality is to plot the data in a histogram and see if the histogram reveals the bell-shaped pattern characteristic of a normal distribution. With Excel, this is a four-step process:

  • Enter data. This means entering data values in an Excel spreadsheet. The column, row, or range of cells that holds data is the input range .
  • Define bins. In Excel, bins are category ranges. To define a bin, you enter the upper range of each bin in a column, row, or range of cells. The block of cells that holds upper-range entries is called the bin range .
  • Plot the data in a histogram. In Excel, access the histogram function through: Data / Data analysis / Histogram .
  • In the Histogram dialog box, enter the input range and the bin range ; and check the Chart Output box. Then, click OK.

If the resulting histogram looks like a bell-shaped curve, your work is done. The data set is normal or nearly normal. If the curve is not bell-shaped, the data may not be normal.

To see how to test data for normality with a histogram in Excel, we'll use the same data set (shown below) that we used in Example 1.

Begin by entering data to define an input range and a bin range. Here is what data entry looks like in an Excel spreadsheet:

Next, from the navigation menu in Excel, click Data / Data analysis . That displays the Data Analysis dialog box. From the Data Analysis dialog box, select Histogram and click the OK button:

Then, in the Histogram dialog box, enter the input range, enter the bin range, and click the Chart Output check box. The dialog box, with entries, should look like this:

And finally, to display the histogram, click the OK button on the Histogram dialog box. Here is what you should see:

The plot is fairly bell-shaped - an almost-symmetric pattern with one peak in the middle. Given this result, it would be safe to assume that the data were drawn from a normal distribution. On the other hand, if the plot were not bell-shaped, you might suspect the data were not from a normal distribution.
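For readers working outside Excel, the same visual check can be sketched with Python and matplotlib. The simulated data and bin edges below are assumptions for illustration, not the example data set.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(seed=0)
    data = rng.normal(loc=5, scale=2, size=200)       # illustrative sample

    # The bin edges play the role of Excel's "bin range".
    plt.hist(data, bins=[-1, 1, 3, 5, 7, 9, 11], edgecolor="black")
    plt.xlabel("Value")
    plt.ylabel("Frequency")
    plt.title("Is the histogram roughly bell-shaped?")
    plt.show()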

Chi-Square Test

The chi-square test for normality is another good option for determining whether a set of data was sampled from a normal distribution.

Note: All chi-square tests assume that the data under investigation was sampled randomly.

Hypothesis Testing

The chi-square test for normality is an actual hypothesis test , where we examine observed data to choose between two statistical hypotheses:

  • Null hypothesis: Data is sampled from a normal distribution.
  • Alternative hypothesis: Data is not sampled from a normal distribution.

Like many other techniques for testing hypotheses, the chi-square test for normality involves computing a test statistic and finding the P-value for the test statistic, given degrees of freedom and significance level. If the P-value is bigger than the significance level, we cannot reject the null hypothesis; if it is smaller, we reject the null hypothesis.

How to Conduct the Chi-Square Test

The steps required to conduct a chi-square test of normality are listed below:

  • Specify the significance level.
  • Find the mean, standard deviation, and sample size for the sample.
  • Define non-overlapping bins.
  • Count observations in each bin, based on actual dependent variable scores.
  • Find the cumulative probability for each bin endpoint.
  • Find the probability that an observation would land in each bin, assuming a normal distribution.
  • Find the expected number of observations in each bin, assuming a normal distribution.
  • Compute a chi-square statistic.
  • Find the degrees of freedom, based on the number of bins.
  • Find the P-value for the chi-square statistic, based on degrees of freedom.
  • Reject or fail to reject the null hypothesis, based on the P-value and significance level.

To show how to accomplish each step, let's work through an example, one step at a time.

To demonstrate how to conduct a chi-square test for normality in Excel, we'll use the same data set (shown below) that we've used for the previous two examples. Here it is again:

Now, using this data, let's check for normality.

Specify Significance Level

The significance level is the probability of rejecting the null hypothesis when it is true. Researchers often choose 0.05 or 0.01 for a significance level. For the purpose of this exercise, let's choose 0.05.

Find the Mean, Standard Deviation, and Sample Size

To compute a chi-square test statistic, we need to know the mean, standard deviation, and sample size. Excel's Descriptive Statistics output (shown earlier) provides this information: for this data set, the mean is 5.1, the standard deviation is 2.0, and the sample size is 20.

Define Bins

To conduct a chi-square analysis, we need to define bins, based on dependent variable scores. Each bin is defined by a non-overlapping range of values.

For the chi-square test to be valid, each bin should hold at least five observations. With that in mind, we'll define four bins for this example, as shown below:

Bin 1 will hold dependent variable scores that are less than 4; Bin 2, scores between 4 and 5; Bin 3, scores between 5.1 and 6; and Bin 4, scores greater than 6.

Note: The number of bins is an arbitrary decision made by the experimenter, as long as the experimenter chooses at least four bins and at least five observations per bin. With fewer than four bins, there are not enough degrees of freedom for the analysis. For this example, we chose to define only four bins. Given the small sample, if we used more bins, at least one bin would have fewer than five observations.

Count Observed Data Points in Each Bin

The next step is to count the observed data points in each bin. The figure below shows sample observations allocated to bins, with a frequency count for each bin in the final row.

Note: We have five observed data points in each bin - the minimum required for a valid chi-square test of normality.

Find Cumulative Probability

A cumulative probability refers to the probability that a random variable is less than or equal to a specific value. In Excel, the NORMDIST function computes cumulative probabilities from a normal distribution.

Assuming our data follows a normal distribution, we can use the NORMDIST function to find cumulative probabilities for the upper endpoints in each bin. Here is the formula we use:

Pj = NORMDIST(MAXj, x̄, s, TRUE)

where Pj is the cumulative probability for the upper endpoint in Bin j, MAXj is the upper endpoint for Bin j, x̄ is the mean of the data set, and s is the standard deviation of the data set.

When we execute the formula in Excel, we get the following results:

P1 = NORMDIST(4, 5.1, 2.0, TRUE) = 0.29

P2 = NORMDIST(5, 5.1, 2.0, TRUE) = 0.48

P3 = NORMDIST(6, 5.1, 2.0, TRUE) = 0.67

P4 = NORMDIST(999999999, 5.1, 2.0, TRUE) = 1.00

Note: For Bin 4, the upper endpoint is positive infinity (∞), a quantity that is too large to be represented in an Excel function. To estimate the cumulative probability for Bin 4 (P4) with Excel, you can use a very large number (e.g., 999999999) in place of positive infinity (as shown above). Or you can recognize that the probability that any random variable is less than or equal to positive infinity is 1.00.
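Outside Excel, the same cumulative probabilities can be reproduced with SciPy's normal CDF, which plays the role of NORMDIST(..., TRUE). The mean (5.1) and standard deviation (2.0) are the values from the example above.

    from scipy.stats import norm

    mean, sd = 5.1, 2.0           # sample mean and standard deviation from the example
    upper_endpoints = [4, 5, 6]   # upper endpoints of Bins 1-3; Bin 4 is unbounded above

    cum_probs = [norm.cdf(x, loc=mean, scale=sd) for x in upper_endpoints]
    cum_probs.append(1.0)         # P(X <= +infinity) = 1 for Bin 4

    for j, p in enumerate(cum_probs, start=1):
        print(f"P{j} = {p:.2f}")  # approximately 0.29, 0.48, 0.67, 1.00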

Find Bin Probability

Given the cumulative probabilities shown above, it is possible to find the probability that a randomly selected observation would fall in each bin, using the following formulas:

P(Bin = 1) = P1 = 0.29

P(Bin = 2) = P2 - P1 = 0.48 - 0.29 = 0.19

P(Bin = 3) = P3 - P2 = 0.67 - 0.48 = 0.19

P(Bin = 4) = P4 - P3 = 1.00 - 0.67 = 0.33

Find Expected Number of Observations

Assuming a normal distribution, the expected number of observations in each bin can be found by using the following formula:

Expj = P(Bin = j) * n

where Expj is the expected number of observations in Bin j, P(Bin = j) is the probability that a randomly selected observation would fall in Bin j, and n is the sample size.

Applying the above formula to each bin, we get the following:

Exp1 = P(Bin = 1) * 20 = 0.29 * 20 = 5.8

Exp2 = P(Bin = 2) * 20 = 0.19 * 20 = 3.8

Exp3 = P(Bin = 3) * 20 = 0.19 * 20 = 3.8

Exp4 = P(Bin = 4) * 20 = 0.33 * 20 = 6.6

Compute Chi-Square Statistic

Finally, we can compute the chi-square statistic (χ²), using the following formula:

χ² = Σ [ (Obsj - Expj)² / Expj ]

where Obsj is the observed number of observations in Bin j, and Expj is the expected number of observations in Bin j.
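Plugging in the observed counts (five per bin) and the expected counts computed above gives a statistic of roughly 1.26, the value used in the remaining steps. A minimal sketch of the arithmetic in Python:

    observed = [5, 5, 5, 5]           # observed counts from the example
    expected = [5.8, 3.8, 3.8, 6.6]   # expected counts computed above

    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    print(round(chi_square, 2))       # approximately 1.26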

Find Degrees of Freedom

Assuming a normal distribution, the degrees of freedom (df) for a chi-square test of normality equals the number of bins (nb) minus the number of estimated parameters (np) minus one. We used four bins, so nb equals four. And to conduct this analysis, we estimated two parameters (the mean and the standard deviation), so np equals two. Therefore,

df = nb - np - 1 = 4 - 2 - 1 = 1

Find P-Value

The P-value is the probability of seeing a chi-square test statistic that is more extreme (bigger) than the observed chi-square statistic. For this problem, we found that the observed chi-square statistic was 1.26. Therefore, we want to know the probability of seeing a chi-square test statistic bigger than 1.26, given one degree of freedom.

Use Stat Trek's Chi-Square Calculator to find that probability. Enter the degrees of freedom (1) and the observed chi-square statistic (1.26) into the calculator; then, click the Calculate button.

From the calculator, we see that P( X 2 > 1.26 ) equals 0.26.
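If you prefer code to an online calculator, SciPy's chi-square survival function gives the same probability:

    from scipy.stats import chi2

    p_value = chi2.sf(1.26, df=1)   # P(chi-square > 1.26) with 1 degree of freedom
    print(round(p_value, 2))        # approximately 0.26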

Test Null Hypothesis

When the P-Value is bigger than the significance level, we cannot reject the null hypothesis. Here, the P-Value (0.26) is bigger than the significance level (0.05), so we cannot reject the null hypothesis that the data tested follows a normal distribution.

Normality Tests for Statistical Analysis: A Guide for Non-Statisticians

Asghar Ghasemi

1 Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, IR Iran

Saleh Zahediasl

Statistical errors are common in scientific literature and about 50% of the published articles have at least one error. The assumption of normality needs to be checked for many statistical procedures, namely parametric tests, because their validity depends on it. The aim of this commentary is to overview checking for normality in statistical analysis using SPSS.

1. Background

Statistical errors are common in scientific literature, and about 50% of the published articles have at least one error ( 1 ). Many of the statistical procedures including correlation, regression, t tests, and analysis of variance, namely parametric tests, are based on the assumption that the data follows a normal distribution or a Gaussian distribution (after Johann Karl Gauss, 1777–1855); that is, it is assumed that the populations from which the samples are taken are normally distributed ( 2 - 5 ). The assumption of normality is especially critical when constructing reference intervals for variables ( 6 ). Normality and other assumptions should be taken seriously, for when these assumptions do not hold, it is impossible to draw accurate and reliable conclusions about reality ( 2 , 7 ).

With large enough sample sizes (> 30 or 40), the violation of the normality assumption should not cause major problems ( 4 ); this implies that we can use parametric procedures even when the data are not normally distributed ( 8 ). If we have samples consisting of hundreds of observations, we can ignore the distribution of the data ( 3 ). According to the central limit theorem, (a) if the sample data are approximately normal then the sampling distribution too will be normal; (b) in large samples (> 30 or 40), the sampling distribution tends to be normal, regardless of the shape of the data ( 2 , 8 ); and (c) means of random samples from any distribution will themselves have normal distribution ( 3 ). Although true normality is considered to be a myth ( 8 ), we can look for normality visually by using normal plots ( 2 , 3 ) or by significance tests, that is, comparing the sample distribution to a normal one ( 2 , 3 ). It is important to ascertain whether data show a serious deviation from normality ( 8 ). The purpose of this report is to overview the procedures for checking normality in statistical analysis using SPSS.

2. Visual Methods

Visual inspection of the distribution may be used for assessing normality, although this approach is usually unreliable and does not guarantee that the distribution is normal ( 2 , 3 , 7 ). However, when data are presented visually, readers of an article can judge the distribution assumption by themselves ( 9 ). The frequency distribution (histogram), stem-and-leaf plot, boxplot, P-P plot (probability-probability plot), and Q-Q plot (quantile-quantile plot) are used for checking normality visually ( 2 ). The frequency distribution, which plots the observed values against their frequency, provides both a visual judgment about whether the distribution is bell shaped and insights about gaps in the data and outlying values (outliers) ( 10 ). The stem-and-leaf plot is a method similar to the histogram, although it retains information about the actual data values ( 8 ). The P-P plot plots the cumulative probability of a variable against the cumulative probability of a particular distribution (e.g., normal distribution). After data are ranked and sorted, the corresponding z-score is calculated for each rank as follows: z = (x - x̄) / s. This is the expected value that the score should have in a normal distribution. The scores are then themselves converted to z-scores. The actual z-scores are plotted against the expected z-scores. If the data are normally distributed, the result would be a straight diagonal line ( 2 ). A Q-Q plot is very similar to the P-P plot except that it plots the quantiles (values that split a data set into equal portions) of the data set instead of every individual score in the data. Moreover, the Q-Q plots are easier to interpret in case of large sample sizes ( 2 ). The boxplot shows the median as a horizontal line inside the box and the interquartile range (range between the 25th and 75th percentiles) as the length of the box. The whiskers (lines extending from the top and bottom of the box) represent the minimum and maximum values when they are within 1.5 times the interquartile range from either end of the box ( 10 ). Scores greater than 1.5 times the interquartile range are out of the boxplot and are considered as outliers, and those greater than 3 times the interquartile range are extreme outliers. A boxplot that is symmetric with the median line at approximately the center of the box and with symmetric whiskers that are slightly longer than the subsections of the center box suggests that the data may have come from a normal distribution ( 8 ).

3. Normality Tests

The normality tests are supplementary to the graphical assessment of normality ( 8 ). The main tests for the assessment of normality are Kolmogorov-Smirnov (K-S) test ( 7 ), Lilliefors corrected K-S test ( 7 , 10 ), Shapiro-Wilk test ( 7 , 10 ), Anderson-Darling test ( 7 ), Cramer-von Mises test ( 7 ), D’Agostino skewness test ( 7 ), Anscombe-Glynn kurtosis test ( 7 ), D’Agostino-Pearson omnibus test ( 7 ), and the Jarque-Bera test ( 7 ). Among these, K-S is a much used test ( 11 ) and the K-S and Shapiro-Wilk tests can be conducted in the SPSS Explore procedure (Analyze → Descriptive Statistics → Explore → Plots → Normality plots with tests) ( 8 ).

The tests mentioned above compare the scores in the sample to a normally distributed set of scores with the same mean and standard deviation; the null hypothesis is that “sample distribution is normal.” If the test is significant, the distribution is non-normal. For small sample sizes, normality tests have little power to reject the null hypothesis and therefore small samples most often pass normality tests ( 7 ). For large sample sizes, significant results would be derived even in the case of a small deviation from normality ( 2 , 7 ), although this small deviation will not affect the results of a parametric test ( 7 ). The K-S test is an empirical distribution function (EDF) in which the theoretical cumulative distribution function of the test distribution is contrasted with the EDF of the data ( 7 ). A limitation of the K-S test is its high sensitivity to extreme values; the Lilliefors correction renders this test less conservative ( 10 ). It has been reported that the K-S test has low power and it should not be seriously considered for testing normality ( 11 ). Moreover, it is not recommended when parameters are estimated from the data, regardless of sample size ( 12 ).

The Shapiro-Wilk test is based on the correlation between the data and the corresponding normal scores ( 10 ) and provides better power than the K-S test even after the Lilliefors correction ( 12 ). Power is the most frequent measure of the value of a test for normality—the ability to detect whether a sample comes from a non-normal distribution ( 11 ). Some researchers recommend the Shapiro-Wilk test as the best choice for testing the normality of data ( 11 ).

4. Testing Normality Using SPSS

We consider two examples from previously published data: serum magnesium levels in 12–16 year old girls (with normal distribution, n = 30) ( 13 ) and serum thyroid stimulating hormone (TSH) levels in adult control subjects (with non-normal distribution, n = 24) ( 14 ). SPSS provides the K-S (with Lilliefors correction) and the Shapiro-Wilk normality tests and recommends these tests only for a sample size of less than 50 ( 8 ).

In Figure , both frequency distributions and P-P plots show that serum magnesium data follow a normal distribution while serum TSH levels do not. Results of K-S with Lilliefors correction and Shapiro-Wilk normality tests for serum magnesium and TSH levels are shown in Table . It is clear that for serum magnesium concentrations, both tests have a p-value greater than 0.05, which indicates normal distribution of data, while for serum TSH concentrations, data are not normally distributed as both p values are less than 0.05. Lack of symmetry (skewness) and pointiness (kurtosis) are two main ways in which a distribution can deviate from normal. The values for these parameters should be zero in a normal distribution. These values can be converted to a z-score as follows:


Z Skewness = (Skewness - 0) / SE Skewness and Z Kurtosis = (Kurtosis - 0) / SE Kurtosis.

An absolute value of the score greater than 1.96 (or less than -1.96) is significant at P < 0.05, greater than 2.58 (or less than -2.58) is significant at P < 0.01, and greater than 3.29 (or less than -3.29) is significant at P < 0.001. In small samples, the ±1.96 criterion is sufficient for assessing the normality of the data. However, in large samples (200 or more) with small standard errors, this criterion should be changed to ± 2.58, and in very large samples no criterion should be applied (that is, significance tests of skewness and kurtosis should not be used) ( 2 ). Results presented in Table indicate that parametric statistics should be used for serum magnesium data and non-parametric statistics should be used for serum TSH data.
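As a rough sketch of this z-score approach in Python: SciPy does not report the standard errors that SPSS uses, so the common large-sample approximations SE(skewness) ≈ √(6/n) and SE(kurtosis) ≈ √(24/n) are used here as an assumption, and the sample itself is simulated.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    x = rng.normal(size=30)                        # illustrative sample, n = 30

    n = len(x)
    z_skew = stats.skew(x) / np.sqrt(6 / n)        # approximate SE of skewness
    z_kurt = stats.kurtosis(x) / np.sqrt(24 / n)   # approximate SE of excess kurtosis

    # |z| < 1.96 corresponds to the small-sample criterion in the text (P < 0.05).
    print(z_skew, z_kurt)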

5. Conclusions

According to the available literature, assessing the normality assumption should be taken into account for using parametric statistical tests. It seems that the most popular test for normality, that is, the K-S test, should no longer be used owing to its low power. It is preferable that normality be assessed both visually and through normality tests, of which the Shapiro-Wilk test, provided by the SPSS software, is highly recommended. The normality assumption also needs to be considered for validation of data presented in the literature as it shows whether correct statistical tests have been used.

Acknowledgments

The authors thank Ms. N. Shiva for critical editing of the manuscript for English grammar and syntax and Dr. F. Hosseinpanah for statistical comments.

Implication for health policy/practice/research/medical education: Data presented in this article could help with the selection of appropriate statistical analyses based on the distribution of data.

Please cite this paper as: Ghasemi A, Zahediasl S. Normality Tests for Statistical Analysis: A Guide for Non-Statisticians. Int J Endocrinol Metab. 2012;10(2):486-9. DOI: 10.5812/ijem.3505

Financial Disclosure: None declared.

Funding/Support: None declared.


Statistics LibreTexts

11.9: Checking the Normality of a Sample


  • Danielle Navarro
  • University of New South Wales

All of the tests that we have discussed so far in this chapter have assumed that the data are normally distributed. This assumption is often quite reasonable, because the central limit theorem (Section 10.3.3) does tend to ensure that many real world quantities are normally distributed: any time that you suspect that your variable is actually an average of lots of different things, there’s a pretty good chance that it will be normally distributed; or at least close enough to normal that you can get away with using t-tests. However, life doesn’t come with guarantees; and besides, there are lots of ways in which you can end up with variables that are highly non-normal. For example, any time you think that your variable is actually the minimum of lots of different things, there’s a very good chance it will end up quite skewed. In psychology, response time (RT) data is a good example of this. If you suppose that there are lots of things that could trigger a response from a human participant, then the actual response will occur the first time one of these trigger events occurs. 198 This means that RT data are systematically non-normal. Okay, so if normality is assumed by all the tests, and is mostly but not always satisfied (at least approximately) by real world data, how can we check the normality of a sample? In this section I discuss two methods: QQ plots, and the Shapiro-Wilk test.


The Shapiro-Wilk statistic associated with the data in Figures 13.14 and 13.15 is W=.99, indicating that no significant departures from normality were detected (p=.73). As you can see, these data form a pretty straight line; which is no surprise given that we sampled them from a normal distribution! In contrast, have a look at the two data sets shown in Figures 13.16, 13.17, 13.18, 13.19. Figures 13.16 and 13.17 show the histogram and a QQ plot for a data set that is highly skewed: the QQ plot curves upwards. Figures 13.18 and 13.19 show the same plots for a heavy tailed (i.e., high kurtosis) data set: in this case, the QQ plot flattens in the middle and curves sharply at either end.


The skewness of the data in Figures 13.16 and 13.17 is 1.94, and is reflected in a QQ plot that curves upwards. As a consequence, the Shapiro-Wilk statistic is W=.80, reflecting a significant departure from normality (p<.001).


Figures 13.18 and 13.19 shows the same plots for a heavy tailed data set, again consisting of 100 observations. In this case, the heavy tails in the data produce a high kurtosis (2.80), and cause the QQ plot to flatten in the middle, and curve away sharply on either side. The resulting Shapiro-Wilk statistic is W=.93, again reflecting significant non-normality (p<.001).

One way to check whether a sample violates the normality assumption is to draw a “quantile-quantile” plot (QQ plot). This allows you to visually check whether you’re seeing any systematic violations. In a QQ plot, each observation is plotted as a single dot. The x co-ordinate is the theoretical quantile that the observation should fall in, if the data were normally distributed (with mean and variance estimated from the sample), and the y co-ordinate is the actual quantile of the data within the sample. If the data are normal, the dots should form a straight line. For instance, let’s see what happens if we generate data by sampling from a normal distribution, and then drawing a QQ plot using the R function qqnorm() . The qqnorm() function has a few arguments, but the only one we really need to care about here is y , a vector specifying the data whose normality we’re interested in checking. Here are the R commands:
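The R commands themselves are missing from this copy. Given the variable name normal.data used later and the sample size of 100 mentioned below, they were presumably along these lines:

    normal.data <- rnorm( n = 100 )  # generate normally distributed data
    hist( x = normal.data )          # draw a histogram of these data
    qqnorm( y = normal.data )        # draw the QQ plot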


Shapiro-Wilk tests

Although QQ plots provide a nice way to informally check the normality of your data, sometimes you’ll want to do something a bit more formal. And when that moment comes, the Shapiro-Wilk test (Shapiro and Wilk 1965) is probably what you’re looking for. 199 As you’d expect, the null hypothesis being tested is that a set of N observations is normally distributed. The test statistic that it calculates is conventionally denoted as W, and it’s calculated as follows. First, we sort the observations in order of increasing size, and let X1 be the smallest value in the sample, X2 be the second smallest and so on. Then the value of W is given by

\(W=\dfrac{\left(\sum_{i=1}^{N} a_{i} X_{i}\right)^{2}}{\sum_{i=1}^{N}\left(X_{i}-\bar{X}\right)^{2}}\)

where \(\ \bar{X}\) is the mean of the observations, and the \(a_i\) values are … mumble, mumble … something complicated that is a bit beyond the scope of an introductory text.

Because it’s a little hard to explain the maths behind the W statistic, a better idea is to give a broad brush description of how it behaves. Unlike most of the test statistics that we’ll encounter in this book, it’s actually small values of W that indicate departure from normality. The W statistic has a maximum value of 1, which arises when the data look “perfectly normal”. The smaller the value of W, the less normal the data are. However, the sampling distribution for W – which is not one of the standard ones that I discussed in Chapter 9 and is in fact a complete pain in the arse to work with – does depend on the sample size N. To give you a feel for what these sampling distributions look like, I’ve plotted three of them in Figure 13.20. Notice that, as the sample size starts to get large, the sampling distribution becomes very tightly clumped up near W=1, and as a consequence, for larger samples W doesn’t have to be very much smaller than 1 in order for the test to be significant.


To run the test in R, we use the shapiro.test() function. It has only a single argument x , which is a numeric vector containing the data whose normality needs to be tested. For example, when we apply this function to our normal.data , we get the following:
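The call itself is missing from this copy; it would have been simply the following, which reports W (about .99 for these data) and the p-value (about .73), as quoted earlier:

    shapiro.test( x = normal.data )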

So, not surprisingly, we have no evidence that these data depart from normality. When reporting the results for a Shapiro-Wilk test, you should (as usual) make sure to include the test statistic W and the p value, though given that the sampling distribution depends so heavily on N it would probably be a politeness to include N as well.


Choosing the Right Statistical Test | Types & Examples

Published on January 28, 2020 by Rebecca Bevans . Revised on June 22, 2023.

Statistical tests are used in hypothesis testing . They can be used to:

  • determine whether a predictor variable has a statistically significant relationship with an outcome variable.
  • estimate the difference between two or more groups.

Statistical tests assume a null hypothesis of no relationship or no difference between groups. Then they determine whether the observed data fall outside of the range of values predicted by the null hypothesis.

If you already know what types of variables you’re dealing with, you can use the flowchart to choose the right statistical test for your data.

Statistical tests flowchart


Statistical tests work by calculating a test statistic – a number that describes how much the relationship between variables in your test differs from the null hypothesis of no relationship.

It then calculates a p value (probability value). The p -value estimates how likely it is that you would see the difference described by the test statistic if the null hypothesis of no relationship were true.

If the value of the test statistic is more extreme than the statistic calculated from the null hypothesis, then you can infer a statistically significant relationship between the predictor and outcome variables.

If the value of the test statistic is less extreme than the one calculated from the null hypothesis, then you can infer no statistically significant relationship between the predictor and outcome variables.


You can perform statistical tests on data that have been collected in a statistically valid manner – either through an experiment , or through observations made using probability sampling methods .

For a statistical test to be valid , your sample size needs to be large enough to approximate the true distribution of the population being studied.

To determine which statistical test to use, you need to know:

  • whether your data meets certain assumptions.
  • the types of variables that you’re dealing with.

Statistical assumptions

Statistical tests make some common assumptions about the data they are testing:

  • Independence of observations (a.k.a. no autocorrelation): The observations/variables you include in your test are not related (for example, multiple measurements of a single test subject are not independent, while measurements of multiple different test subjects are independent).
  • Homogeneity of variance : the variance within each group being compared is similar among all groups. If one group has much more variation than others, it will limit the test’s effectiveness.
  • Normality of data : the data follows a normal distribution (a.k.a. a bell curve). This assumption applies only to quantitative data .

If your data do not meet the assumptions of normality or homogeneity of variance, you may be able to perform a nonparametric statistical test , which allows you to make comparisons without any assumptions about the data distribution.

If your data do not meet the assumption of independence of observations, you may be able to use a test that accounts for structure in your data (repeated-measures tests or tests that include blocking variables).

Types of variables

The types of variables you have usually determine what type of statistical test you can use.

Quantitative variables represent amounts of things (e.g. the number of trees in a forest). Types of quantitative variables include:

  • Continuous (aka ratio variables): represent measures and can usually be divided into units smaller than one (e.g. 0.75 grams).
  • Discrete (aka integer variables): represent counts and usually can’t be divided into units smaller than one (e.g. 1 tree).

Categorical variables represent groupings of things (e.g. the different tree species in a forest). Types of categorical variables include:

  • Ordinal : represent data with an order (e.g. rankings).
  • Nominal : represent group names (e.g. brands or species names).
  • Binary : represent data with a yes/no or 1/0 outcome (e.g. win or lose).

Choose the test that fits the types of predictor and outcome variables you have collected (if you are doing an experiment , these are the independent and dependent variables ). Consult the tables below to see which test best matches your variables.

Parametric tests usually have stricter requirements than nonparametric tests, and are able to make stronger inferences from the data. They can only be conducted with data that adheres to the common assumptions of statistical tests.

The most common types of parametric test include regression tests, comparison tests, and correlation tests.

Regression tests

Regression tests look for cause-and-effect relationships . They can be used to estimate the effect of one or more continuous variables on another variable.

Comparison tests

Comparison tests look for differences among group means . They can be used to test the effect of a categorical variable on the mean value of some other characteristic.

T-tests are used when comparing the means of precisely two groups (e.g., the average heights of men and women). ANOVA and MANOVA tests are used when comparing the means of more than two groups (e.g., the average heights of children, teenagers, and adults).
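As a hedged illustration of the two cases in Python (the group data here are invented placeholders, not real measurements):

    from scipy import stats

    men = [175, 180, 169, 182, 177]        # illustrative heights in cm
    women = [162, 168, 159, 171, 165]

    # Exactly two groups: independent-samples t-test
    t_stat, t_p = stats.ttest_ind(men, women)

    children = [120, 130, 125, 135]
    teenagers = [150, 160, 155, 165]
    adults = [170, 175, 172, 178]

    # More than two groups: one-way ANOVA
    f_stat, f_p = stats.f_oneway(children, teenagers, adults)

    print(t_p, f_p)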

Correlation tests

Correlation tests check whether variables are related without hypothesizing a cause-and-effect relationship.

These can be used to test whether two variables you want to use in (for example) a multiple regression test are autocorrelated.

Non-parametric tests don’t make as many assumptions about the data, and are useful when one or more of the common statistical assumptions are violated. However, the inferences they make aren’t as strong as with parametric tests.

This flowchart helps you choose among parametric tests. For nonparametric alternatives, check the table above.

Choosing the right statistical test


Statistical tests commonly assume that:

  • the data are normally distributed
  • the groups that are being compared have similar variance
  • the data are independent

If your data does not meet these assumptions you might still be able to use a nonparametric statistical test , which have fewer requirements but also make weaker inferences.

A test statistic is a number calculated by a  statistical test . It describes how far your observed data is from the  null hypothesis  of no relationship between  variables or no difference among sample groups.

The test statistic tells you how different two or more groups are from the overall population mean , or how different a linear slope is from the slope predicted by a null hypothesis . Different test statistics are used in different statistical tests.

Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test . Significance is usually denoted by a p -value , or probability value.

Statistical significance is arbitrary – it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that the data is likely to occur less than 5% of the time under the null hypothesis .

When the p -value falls below the chosen alpha value, then we say the result of the test is statistically significant.

Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).

Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).

You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results .

Discrete and continuous variables are two types of quantitative variables :

  • Discrete variables represent counts (e.g. the number of objects in a collection).
  • Continuous variables represent measurable amounts (e.g. water volume or weight).

Cite this Scribbr article

Bevans, R. (2023, June 22). Choosing the Right Statistical Test | Types & Examples. Scribbr. Retrieved March 26, 2024, from https://www.scribbr.com/statistics/statistical-tests/

What question does the normality test answer?

The normality tests all report a P value. To understand any P value, you need to know the null hypothesis. In this case, the null hypothesis is that all the values were sampled from a population that follows a Gaussian distribution.

The P value answers the question:

If that null hypothesis were true, what is the chance that a random sample of data would deviate from the Gaussian ideal as much as these data do?

Prism also uses the traditional 0.05 cut-off to answer the question whether the data passed the normality test. If the P value is greater than 0.05, the answer is Yes. If the P value is less than or equal to 0.05, the answer is No.  

What should I conclude if the P value from the normality test is high?

All you can say is that the data are not inconsistent with a Gaussian distribution. A normality test cannot prove the data were sampled from a Gaussian distribution. All the normality test can do is demonstrate that the deviation from the Gaussian ideal is not more than you’d expect to see with chance alone. With large data sets, this is reassuring. With smaller data sets, the normality tests don’t have much power to detect modest deviations from the Gaussian ideal.

What should I conclude if the P value from the normality test is low?

The null hypothesis is that the data are sampled from a Gaussian distribution. If the P value is small enough, you reject that null hypothesis and so accept the alternative hypothesis that the data are not sampled from a Gaussian population. The distribution could be close to Gaussian (with large data sets) or very far from it. The normality test tells you nothing about the alternative distributions.

If your P value is small enough to declare the deviations from the Gaussian ideal to be "statistically significant", you then have four choices:

• The data may come from another identifiable distribution. If so, you may be able to transform your values to create a Gaussian distribution. For example, if the data come from a lognormal distribution, transform all values to their logarithms (see the sketch after this list).

• The presence of one or a few outliers might be causing the normality test to fail. Run an outlier test. Consider excluding the outlier(s).

• If the departure from normality is small, you may choose to do nothing. Statistical tests tend to be quite robust to mild violations of the Gaussian assumption.

• Switch to nonparametric tests that don’t assume a Gaussian distribution. But the decision to use (or not use) nonparametric tests is a big decision. It should not be based on a single normality test and should not be automated .
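Here is a minimal sketch of the first option above, assuming the data are lognormal and using SciPy's Shapiro-Wilk test as the normality check; the simulated values and parameters are illustrative assumptions only.

    import numpy as np
    from scipy.stats import shapiro

    rng = np.random.default_rng(0)
    values = rng.lognormal(mean=0.0, sigma=0.8, size=100)   # skewed, lognormal sample

    print("raw data p-value:       ", shapiro(values).pvalue)          # typically very small
    print("log-transformed p-value:", shapiro(np.log(values)).pvalue)  # typically well above 0.05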


An Introduction to the Shapiro-Wilk Test for Normality


Data scientists usually have to check if data is normally distributed . An example is the normality check on the residuals of linear regression in order to correctly use the F-test. One way to do that is through the Shapiro-Wilk test, which is a hypothesis test applied to a sample with a null hypothesis that the sample stems from a normal distribution.

Shapiro-Wilk Test Explained

The Shapiro-Wilk test is a hypothesis test that evaluates whether a data set is normally distributed. It evaluates data from a sample against the null hypothesis that the data set is normally distributed. A large p-value is consistent with a normal distribution, while a low p-value indicates that the data set isn’t normally distributed.

Let’s see how we can check the normality of a data set.


What Is Normality?

Normality means that a particular sample has been generated from a Gaussian distribution . It doesn’t necessarily have to be a standardized normal distribution, with a zero mean and a variance equal to one.

There are several situations in which data scientists may need normally distributed data:

  • To compare the residuals of linear regression in the training test with the residuals in the test set using an F-test.
  • To compare the mean value of a variable across different groups using a one-way analysis of variance (ANOVA) test or a student’s t-test .
  • To assess the linear correlation between two variables using a proper test on their Pearson’s correlation coefficient.
  • To assess if the likelihood of a feature against a target in a Naive Bayes model allows us to use a Gaussian Naive Bayes classification model .

These are all different examples that may occur frequently in a data scientist’s everyday job.

Unfortunately, data is not always normally distributed. However, we can sometimes apply a transformation, such as a power transformation, to make a distribution more symmetrical.

A good way to assess the normality of a data set would be to use a Q-Q plot , which gives us a graphical visualization of normality. But we often need a quantitative result, and a chart alone may not be enough. That’s why we can use a hypothesis test to assess the normality of a sample.

What Is the Shapiro-Wilk Test?

The Shapiro-Wilk test is a hypothesis test that is applied to a sample with a null hypothesis that the sample has been generated from a normal distribution. If the  p-value is low, we can reject such a null hypothesis and say that the sample has not been generated from a normal distribution.

It’s an easy-to-use statistical tool that can help us find an answer to the normality check we need, but it has one flaw: It doesn’t work well with large data sets. The maximum allowed size for a data set depends on the implementation, but in Python , we see that a sample size larger than 5,000 will give us an approximate calculation for the p-value.

However, this test is still a very powerful tool we can use. Let’s see a practical example in Python.


Shapiro-Wilk Test Example in Python

First of all, let’s import NumPy and Matplotlib .

Now, we have to import the function that calculates the p-value of a Shapiro-Wilk test. It’s the “shapiro” function in scipy.stats .

Let’s now simulate two data sets, one generated from a normal distribution and the other generated from a uniform distribution.
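The code snippets themselves did not survive in this copy. A minimal reconstruction follows; the sample sizes and distribution parameters are assumptions for illustration.

    import numpy as np
    import matplotlib.pyplot as plt   # used below to draw the histograms of x and y
    from scipy.stats import shapiro

    np.random.seed(0)
    x = np.random.normal(loc=0, scale=1, size=500)   # sample from a normal distribution
    y = np.random.uniform(low=0, high=1, size=500)   # sample from a uniform distribution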

This is the histogram for “x”:

Normally distributed histogram for the x value of the data set.

We can clearly see that the distribution is very similar to a normal distribution.

And this is the histogram for “y”:

Histogram for Y data set not normal distribution

As expected, the distribution is very far from a normal one.

So, we expect a Shapiro-Wilk test to give us a pretty large p-value for the “x” sample and a small p-value for the “y” sample, because it’s not normally distributed.

Let’s calculate such p-values:
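Reconstructing the missing snippet again (the exact p-values depend on the random sample and are not reproduced here):

    print(shapiro(x))   # expect a large p-value: cannot reject normality
    print(shapiro(y))   # expect a very small p-value: reject normality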

As we can see, the p-value for the “x” sample is not low enough for us to reject the null hypothesis.

If we calculate the p-value on “y,” we get a different result.

The p-value is lower than 5 percent, so we can reject the null hypothesis of the normality of the data set.

If we try to calculate the p-value on a sample larger than 5,000 points, we get a warning that the calculated p-value may only be approximate.

So, that’s how we can perform the Shapiro-Wilk test for normality in Python. Just make sure to use an appropriately sized data set so that you don’t have to work with approximate p-values.


Advantages of the Shapiro-Wilk Test

The Shapiro-Wilk test for normality is a very simple-to-use statistical tool for assessing the normality of a data set. I typically apply it after creating a data visualization, either via a histogram and/or a Q-Q plot. It’s a very useful tool to ensure that a normality requirement is satisfied every time we need it, and it must be present in every data scientist’s toolbox.


Normality test

One of the most common assumptions for statistical tests is that the data used are normally distributed. For example, if you want to run a t-test or an ANOVA , you must first test whether the data or variables are normally distributed.

The assumption of normal distribution is also important for linear regression analysis , but in this case it is important that the error made by the model is normally distributed, not the data itself.

Nonparametric tests

If the data are not normally distributed, the above procedures cannot be used and non-parametric tests must be used. Non-parametric tests do not assume that the data are normally distributed.

How is the normal distribution tested?

Normal distribution can be tested either analytically (statistical tests) or graphically. The most common analytical tests to check data for normal distribution are:

  • Kolmogorov-Smirnov Test
  • Shapiro-Wilk Test
  • Anderson-Darling Test

For graphical verification, either a histogram or, better, the Q-Q plot is used. Q-Q stands for quantile-quantile plot, where the actually observed distribution is compared with the theoretically expected distribution.

Statistical tests for normal distribution

To test your data analytically for normal distribution, there are several test procedures, the best known being the Kolmogorov-Smirnov test, the Shapiro-Wilk test, and the Anderson-Darling test.

Analytically test data for normal distribution

In all of these tests, you are testing the null hypothesis that your data are normally distributed, that is, that the frequency distribution of your data follows a normal distribution. To decide whether to reject the null hypothesis, all of these tests give you a p-value. What matters is whether this p-value is less than or greater than 0.05.


If the p-value is less than 0.05, this is interpreted as a significant deviation from the normal distribution and it can be assumed that the data are not normally distributed. If the p-value is greater than 0.05 and you want to be statistically clean, you cannot necessarily say that the frequency distribution is normal, you just cannot reject the null hypothesis.

In practice, a normal distribution is assumed for values greater than 0.05, although this is not entirely correct. Nevertheless, the graphical solution should always be considered.

Note: The Kolmogorov-Smirnov test and the Anderson-Darling test can also be used to test distributions other than the normal distribution.

Disadvantage of the analytical tests for normal distribution

Unfortunately, the analytical method has a major drawback, which is why more and more attention is being paid to graphical methods.

The problem is that the calculated p-value is affected by the size of the sample. Therefore, if you have a very small sample, your p-value may be much larger than 0.05, but if you have a very large sample from the same population, your p-value may be smaller than 0.05.


If we assume that the distribution in the population deviates only slightly from the normal distribution, we will get a very large p-value with a very small sample and therefore assume that the data are normally distributed. However, if you take a larger sample, the p-value gets smaller and smaller, even though the samples are from the same population with the same distribution. With a very large sample, you can even get a p-value of less than 0.05, rejecting the null hypothesis of normal distribution.

To avoid this problem, graphical methods are increasingly being used.

Graphical test for normal distribution

If the normal distribution is tested graphically, one looks either at the histogram or even better the QQ plot.

If you want to check the normal distribution using a histogram, plot the normal distribution on the histogram of your data and check that the distribution curve of the data approximately matches the normal distribution curve.

Testing normality with histogram

A better way to do this is to use a quantile-quantile plot, or Q-Q plot for short. This compares the theoretical quantiles that the data should have if they were perfectly normal with the quantiles of the measured values.

Testing normality with QQ-Plot

If the data were perfectly normally distributed, all points would lie on the line. The further the data deviates from the line, the less normally distributed the data is.

In addition, DATAtab plots the 95% confidence interval. If all or almost all of the data fall within this interval, this is a very strong indication that the data are normally distributed. They are not normally distributed if, for example, they form an arc and are far from the line in some areas.
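Outside DATAtab, the same kind of Q-Q plot can be drawn with SciPy and matplotlib; this is only a sketch with simulated data.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    rng = np.random.default_rng(42)
    data = rng.normal(loc=10, scale=2, size=80)    # illustrative sample

    stats.probplot(data, dist="norm", plot=plt)    # theoretical vs. observed quantiles
    plt.title("Q-Q plot: points near the line suggest normality")
    plt.show()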

Test Normal distribution in DATAtab

When you test your data for normal distribution with DATAtab, you get the following evaluation, first the analytical test procedures clearly arranged in a table, then the graphical test procedures.


If you want to test your data for normal distribution, simply copy your data into the table on DATAtab, click on descriptive statistics and then select the variable you want to test for normal distribution. Then, just click on Test Normal Distribution and you will get the results.

Furthermore, if you are calculating a hypothesis test with DATAtab, you can test the assumptions for each hypothesis test, if one of the assumptions is the normal distribution, then you will get the test for normal distribution in the same way.


Cite DATAtab: DATAtab Team (2024). DATAtab: Online Statistics Calculator. DATAtab e.U. Graz, Austria. URL https://datatab.net

Testing for Normality using SPSS Statistics

Introduction.

An assessment of the normality of data is a prerequisite for many statistical tests because normal data is an underlying assumption in parametric testing. There are two main methods of assessing normality: graphically and numerically.

This "quick start" guide will help you to determine whether your data is normal, and therefore, that this assumption is met in your data for statistical tests. The approaches can be divided into two main themes: relying on statistical tests or visual inspection. Statistical tests have the advantage of making an objective judgement of normality, but are disadvantaged by sometimes not being sensitive enough at low sample sizes or overly sensitive to large sample sizes. As such, some statisticians prefer to use their experience to make a subjective judgement about the data from plots/graphs. Graphical interpretation has the advantage of allowing good judgement to assess normality in situations when numerical tests might be over or under sensitive, but graphical methods do lack objectivity. If you do not have a great deal of experience interpreting normality graphically, it is probably best to rely on the numerical methods.

If you want to be guided through the testing for normality procedure in SPSS Statistics for the specific statistical test you are using to analyse your data, we provide comprehensive guides in our enhanced content. For each statistical test where you need to test for normality, we show you, step-by-step, the procedure in SPSS Statistics, as well as how to deal with situations where your data fails the assumption of normality (e.g., where you can try to "transform" your data to make it "normal"; something we also show you how to do using SPSS Statistics). You can learn about our enhanced content in general on our Features: Overview page or how we help with assumptions on our Features: Assumptions page. However, in this "quick start" guide, we take you through the basics of testing for normality in SPSS Statistics.


Methods of assessing normality.

SPSS Statistics allows you to run all of these procedures from within its Explore... command. The Explore... command can be used on its own if you are testing normality in one group or when splitting your dataset into groups using a single categorical variable. For example, if you have a group of participants and you need to know if their height is normally distributed, everything can be done within the Explore... command. If you split your group into males and females (i.e., you have one categorical independent variable), you can test for normality of height within both the male group and the female group using just the Explore... command. This applies even if you have more than two groups. However, if you have two or more categorical independent variables, the Explore... command on its own is not enough and you will also have to use the Split File... command.

Note: The procedures that follow are identical for SPSS Statistics versions 17 to 28, as well as the subscription version of SPSS Statistics, with version 28 and the subscription version being the latest versions. However, in version 27 and the subscription version, SPSS Statistics introduced a new look to the interface called "SPSS Light", replacing the previous look for versions 26 and earlier, which was called "SPSS Standard". Therefore, if you have SPSS Statistics version 27 or 28 (or the subscription version), the images that follow will be light grey rather than blue. However, the procedures are identical.


Procedure for none or one grouping variable

The following example comes from our guide on how to perform a one-way ANOVA in SPSS Statistics.

[Screenshot: the Explore... dialog used to run the normality tests in SPSS Statistics. Published with written permission from SPSS Statistics, IBM Corporation.]

SPSS Statistics outputs many tables and graphs with this procedure. One of the reasons for this is that the Explore... command is not used solely for testing normality, but also for describing data in many different ways. When testing for normality, we are mainly interested in the Tests of Normality table and the Normal Q-Q Plots, our numerical and graphical methods to test for the normality of data, respectively.

Shapiro-Wilk Test of Normality

[Screenshot: the Tests of Normality table, showing the Kolmogorov-Smirnov and Shapiro-Wilk results.]

The above table presents the results from two well-known tests of normality, namely the Kolmogorov-Smirnov Test and the Shapiro-Wilk Test. The Shapiro-Wilk Test is more appropriate for small sample sizes (< 50 samples), but can also handle sample sizes as large as 2000. For this reason, we will use the Shapiro-Wilk test as our numerical means of assessing normality.

We can see from the above table that for the "Beginner", "Intermediate" and "Advanced" Course Groups the dependent variable, "Time", was normally distributed. How do we know this? If the Sig. value of the Shapiro-Wilk test is greater than 0.05, the data do not deviate significantly from a normal distribution. If it is below 0.05, the data deviate significantly from a normal distribution.
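If you want to check the same decision rule outside SPSS, the following is a minimal sketch using Python's scipy.stats.shapiro on a hypothetical sample (not the course-time data from the screenshots); the p > 0.05 rule is the same as described above.

```python
# Minimal sketch of the Shapiro-Wilk decision rule using scipy (not SPSS).
# "times" is a hypothetical sample, not the data shown in the screenshots.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
times = rng.normal(loc=25, scale=4, size=30)   # hypothetical course times

w_stat, p_value = stats.shapiro(times)
print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_value:.3f}")

if p_value > 0.05:
    print("No significant deviation from normality detected.")
else:
    print("Data deviate significantly from a normal distribution.")
```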

If you need to use skewness and kurtosis values to determine normality, rather than the Shapiro-Wilk test, you will find these in our enhanced testing for normality guide. You can learn more about our enhanced content on our Features: Overview page.

Normal Q-Q Plot

In order to determine normality graphically, we can use the output of a normal Q-Q Plot. If the data are normally distributed, the data points will be close to the diagonal line. If the data points stray from the line in an obvious non-linear fashion, the data are not normally distributed. As we can see from the normal Q-Q plot below, the data is normally distributed. If you are at all unsure of being able to correctly interpret the graph, rely on the numerical methods instead because it can take a fair bit of experience to correctly judge the normality of data based on plots.

[Screenshot: a Normal Q-Q plot with the data points lying close to the diagonal line.]

If you need to know what Normal Q-Q Plots look like when distributions are not normal (e.g., negatively skewed), you will find these in our enhanced testing for normality guide. You can learn more about our enhanced content on our Features: Overview page.
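For readers who want to reproduce a normal Q-Q plot outside SPSS, here is a minimal sketch assuming matplotlib and scipy are available; the interpretation (points close to the line suggest normality) is the same as above.

```python
# Minimal sketch of a normal Q-Q plot with scipy and matplotlib (not SPSS).
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(size=100)                  # hypothetical data

stats.probplot(sample, dist="norm", plot=plt)  # points near the line => normal
plt.title("Normal Q-Q plot")
plt.show()
```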

Testing for normality with neural networks

  • Original Article
  • Published: 21 July 2021
  • Volume 33 , pages 16279–16313, ( 2021 )


  • Miloš Simić (ORCID: orcid.org/0000-0003-1506-3728)


In this paper, we treat the problem of testing for normality as a binary classification problem and construct a feedforward neural network that can act as a powerful normality test. We show that by changing its decision threshold, we can control the frequency of false non-normal predictions and thus make the network more similar to standard statistical tests. We also find the optimal decision thresholds that minimize the total error probability for each sample size. The experiments conducted on the samples with no more than 100 elements suggest that our method is more accurate and more powerful than the selected standard tests of normality for almost all the types of alternative distributions and sample sizes. In particular, the neural network was the most powerful method for testing normality of the samples with fewer than 30 elements regardless of the alternative distribution type. Its total accuracy increased with the sample size. Additionally, when the optimal decision-thresholds were used, the network was very accurate for larger samples with 250–1000 elements. With AUROC equal to almost 1, the network was the most accurate method overall. Since the normality of data is an assumption of numerous statistical techniques, the network constructed in this study has a very high potential for use in everyday practice of statistics, data analysis and machine learning.



Availability of data and material

The data and the code are in the following GitHub repository: https://github.com/milos-simic/neural-normality .

https://github.com/rmcelreath/rethinking/blob/master/data/Howell1.csv.

References

Nornadiah Mohd Razali and Yap Bee Wah (2011) Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. J Statistical Model Anal 2(1):21–33


Thode HC (2002) Testing For Normality. Statistics, textbooks and monographs. Taylor & Francis. ISBN 9780203910894

Sigut J, Piñeiro J, Estévez J, and Toledo P (2006) A neural network approach to normality testing. Intell Data Anal, 10(6):509–519, 12

Esteban MD, Castellanos ME, Morales D, Vajda I (2001) Monte carlo comparison of four normality tests using different entropy estimates. Commun Statistics - Simul Comput 30(4):761–785


Hadi Alizadeh Noughabi and Naser Reza Arghami (2011) Monte carlo comparison of seven normality tests. J Statistical Comput Simul 81(8):965–972

Hain Johannes (August 2010) Comparison of common tests for normality. Diploma thesis, Julius-Maximilians-Universität Würzburg Institut für Mathematik und Informatik

Yap BW and Sim CH (2011) Comparisons of various types of normality tests. Journal of Statistical Computation and Simulation , 81 (12):2141–2155, 12

Ahmad Fiaz, Khan Rehan (2015) A power comparison of various normality tests. Pakistan Journal of Statistics and Operation Research 11(3):331–345

Marmolejo-Ramos Fernando and González-Burgos Jorge (2013) A power comparison of various tests of univariate normality on ex-gaussian distributions. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences , 9(4):137

Mbah Alfred K, Paothong Arnut (2015) Shapiro-francia test compared to other normality test using expected p-value. Journal of Statistical Computation and Simulation 85(15):3002–3016

binti Yusoff S and Bee Wah Y (Sept 2012) Comparison of conventional measures of skewness and kurtosis for small sample size. In 2012 International Conference on Statistics in Science, Business and Engineering (ICSSBE) , pages 1–6. 10.1109/ICSSBE.2012.6396619

Patrício Miguel, Ferreira Fábio, Oliveiros Bárbara, and Caramelo Francisco (2017) Comparing the performance of normality tests with roc analysis and confidence intervals. Communications in Statistics-Simulation and Computation , pages 1–17

Wijekularathna Danush K, Manage Ananda BW, and Scariano Stephen M (September 2019) Power analysis of several normality tests: A monte carlo simulation study. Communications in Statistics - Simulation and Computation , pages 1–17

Wilson PR and Engel AB (1990) Testing for normality using neural networks. In [1990] Proceedings. First International Symposium on Uncertainty Modeling and Analysis , pages 700–704

Gel Yulia, Miao Weiwen, and Gastwirth Joseph L (2005) The importance of checking the assumptions underlying statistical analysis: Graphical methods for assessing normality. Jurimetrics , 46(1):3–29. ISSN 08971277, 21544344. http://www.jstor.org/stable/29762916

Stehlík M, Střelec L, and Thulin M (2014) On robust testing for normality in chemometrics. Chemometrics and Intelligent Laboratory Systems , 130: 98–108. ISSN 0169-7439. https://doi.org/10.1016/j.chemolab.2013.10.010. https://www.sciencedirect.com/science/article/pii/S0169743913001913

Lopez-Paz David and Oquaba Maxime (2017) Revisiting classifier two-sample tests. In ICLR

Ojala Markus and Garriga Gemma C (2010) Permutation tests for studying classifier performance. Journal of Machine Learning Research , 11 (Jun):1833–1863

Al-Rawi Mohammed Sadeq and Paulo Silva Cunha João (2012) Using permutation tests to study how the dimensionality, the number of classes, and the number of samples affect classification analysis. In Aurélio Campilho and Mohamed Kamel, editors, Image Analysis and Recognition , pages 34–42, Berlin, Heidelberg. Springer Berlin Heidelberg. ISBN 978-3-642-31295-3

Kim Ilmun, Ramdas Aaditya, Singh Aarti, and Wasserman Larry (Feb 2016) Classification accuracy as a proxy for two sample testing. arXiv e-prints , art. arXiv:1602.02210

Blanchard Gilles, Lee Gyemin, and Scott Clayton (2010) Semi-supervised novelty detection. Journal of Machine Learning Research , 11 (Nov):2973–3009

Rosenblatt Jonathan D, Benjamini Yuval, Gilron Roee, Mukamel Roy, and Goeman Jelle J (2019) Better-than-chance classification for signal detection. Biostatistics , 10 . kxz035

Gretton Arthur, Borgwardt Karsten M, Rasch Malte J, Schölkopf Bernhard, and Smola Alexander (March 2012) A kernel two-sample test. J. Mach. Learn. Res. , 13(null):723–773

Borgwardt Karsten M, Gretton Arthur, Rasch Malte J, Kriegel Hans-Peter, Schölkopf Bernhard, and Smola Alex J (2006) Integrating structured biological data by Kernel Maximum Mean Discrepancy. Bioinformatics , 22(14):e49–e57, 7

Gretton Arthur, Borgwardt Karsten, Rasch Malte, Schölkopf Bernhard, and Smola Alex J (2007a) A kernel method for the two-sample-problem. In B. Schölkopf, J. C. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19 , pages 513–520. MIT Press

Gretton Arthur, Borgwardt Karsten M, Rasch Malte J, Schölkopf Bernhard, and Smola Alexander J (2007b) A kernel approach to comparing distributions. In AAAI , pages 1637–1641

Smola Alex, Gretton Arthur, Song Le, and Schölkopf Bernhard (2007) A hilbert space embedding for distributions. In Marcus Hutter, Rocco A. Servedio, and Eiji Takimoto, editors, Algorithmic Learning Theory , pages 13–31, Berlin, Heidelberg. Springer Berlin Heidelberg. ISBN 978-3-540-75225-7

Gretton Arthur, Fukumizu Kenji, Harchaoui Zaïd, and Sriperumbudur Bharath K (2009) A fast, consistent kernel two-sample test. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22 , pages 673–681. Curran Associates, Inc.

Schölkopf Bernhard and Smola Alexander J (2001) Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (Adaptive Computation and Machine Learning) . The MIT Press. ISBN 0262194759

Steinwart Ingo, Christmann Andreas (2008) Support Vector Machines. Information Science and Statistics. Springer-Verlag, New York, NY, USA

Gretton Arthur, Fukumizu Kenji, Teo Choon H, Song Le, Schölkopf Bernhard, and Smola Alex J (2008) A kernel statistical test of independence. In J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, editors, Advances in Neural Information Processing Systems 20 , pages 585–592. Curran Associates, Inc.

Gretton Arthur, Bousquet Olivier, Smola Alex, and Schölkopf Bernhard (2005) Measuring statistical dependence with hilbert-schmidt norms. In Sanjay Jain, Hans Ulrich Simon, and Etsuji Tomita, editors, Algorithmic Learning Theory , pages 63–77, Berlin, Heidelberg. Springer Berlin Heidelberg. ISBN 978-3-540-31696-1

Chwialkowski Kacper and Gretton Arthur (Jun 2014) A kernel independence test for random processes. In Eric P. Xing and Tony Jebara, editors, Proceedings of the 31st International Conference on Machine Learning , volume 32 of Proceedings of Machine Learning Research , pages 1422–1430, Bejing, China, 22–24 . PMLR

Pfister Niklas, Bühlmann Peter, Schölkopf Bernhard, and Peters Jonas (2017) Kernel-based tests for joint independence. Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 80(1):5–31, 5

Chwialkowski Kacper, Strathmann Heiko, and Gretton Arthur (2016) A kernel test of goodness of fit. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48 , ICML’16, page 2606–2615. JMLR.org

Stein Charles (1972) A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2: Probability Theory , pages 583–602, Berkeley, Calif. University of California Press

Shao Xiaofeng (2010) The dependent wild bootstrap. Journal of the American Statistical Association 105(489):218–235

Leucht Anne, Neumann Michael H (2013) Dependent wild bootstrap for degenerate U- and V-statistics. Journal of Multivariate Analysis 117:257–280

Liu Qiang, Lee Jason, and Jordan Michael (2016) A kernelized stein discrepancy for goodness-of-fit tests. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning , volume 48 of Proceedings of Machine Learning Research , pages 276–284, New York, New York, USA, 20–22 Jun . PMLR

Arcones Miguel A and Gine Evarist (1992) On the bootstrap of \(u\) and \(v\) statistics. Ann. Statist. , 20(2):655–674, 06

Huskova Marie and Janssen Paul (1993) Consistency of the generalized bootstrap for degenerate \(u\) -statistics. Ann. Statist. , 21(4):1811–1823, 12

Wittawat Jitkrittum, Wenkai Xu, Zoltan Szabo, Kenji Fukumizu, and Arthur Gretton. A linear-time kernel goodness-of-fit test. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30 , pages 262–271. Curran Associates, Inc., 2017

Chwialkowski Kacper, Ramdas Aaditya, Sejdinovic Dino, and Gretton Arthur (2015) Fast two-sample testing with analytic representations of probability measures. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2 , NIPS’15, page 1981–1989, Cambridge, MA, USA. MIT Press

Lloyd James R and Ghahramani Zoubin (2015) Statistical model criticism using kernel two sample tests. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28 , pages 829–837. Curran Associates, Inc.

Jérémie Kellner and Alain Celisse (2019) A one-sample test for normality with kernel methods. Bernoulli , 25(3):1816–1837, 08

Kojadinovic Ivan, Yan Jun (2012) Goodness-of-fit testing based on a weighted bootstrap: A fast large-sample alternative to the parametric bootstrap. Canadian Journal of Statistics 40(3):480–500

Johnson NL (1949a) Bivariate distributions based on simple translation systems. Biometrika 36(3/4):297–304

Johnson NL (1949b) Systems of frequency curves generated by methods of translation. Biometrika 36(1/2):149–176

Shapiro SS, Wilk MB (1965) An analysis of variance test for normality (complete samples). Biometrika 52(3/4):591–611

Lin Ching-Chuong, Mudholkar Govind S (1980) A simple test for normality against asymmetric alternatives. Biometrika 67(2):455–461

Vasicek Oldrich (1976) A test for normality based on sample entropy. Journal of the Royal Statistical Society. Series B (Methodological) , 38(1):54–59

Henderson A. Ralph (2006) Testing experimental data for univariate normality. Clinica Chimica Acta , 366(1–2):112 – 129

Gel Yulia R, Miao Weiwen, and Gastwirth Joseph L (2007) Robust directed tests of normality against heavy-tailed alternatives. Computational Statistics & Data Analysis , 51(5):2734–2746. ISSN 0167-9473. https://doi.org/10.1016/j.csda.2006.08.022. https://www.sciencedirect.com/science/article/pii/S0167947306002805

Seier Edith (2011) Normality tests: Power comparison. In International Encyclopedia of Statistical Science. Springer. ISBN 978-3-642-04897-5

Dufour Jean-Marie, Farhat Abdeljelil, Gardiol Lucien, Khalaf Lynda (1998) Simulation-based finite sample normality tests in linear regressions. The Econometrics Journal 1(1):154–173


Lilliefors Hubert W (1967) On the kolmogorov-smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association 62(318):399–402

Anderson TW and Darling DA (1952) Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. Ann. Math. Statist. , 23(2):193–212, 06

Anderson TW, Darling DA (1954) A test of goodness of fit. Journal of the American Statistical Association 49(268):765–769

Jarque Carlos M, Bera Anil K (1980) Efficient tests for normality, homoscedasticity and serial independence of regression residuals. Economics Letters 6(3):255–259

Jarque Carlos M, Bera Anil K (1987) A test for normality of observations and regression residuals. International Statistical Review / Revue Internationale de Statistique 55(2):163–172


Shapiro SS, Francia RS (1972) An approximate analysis of variance test for normality. Journal of the American Statistical Association 67(337):215–216

Cramér Harald (1928) On the composition of elementary errors. Scandinavian Actuarial Journal 1928(1):13–74

von Mises Richard (1928) Wahrscheinlichkeit Statistik und Wahrheit. Springer, Berlin Heidelberg


Richard von Mises. Wahrscheinlichkeitsrechnung und Ihre Anwendung in der Statistik und Theoretischen Physik . F. Deuticke, 1931

Nikolai Vasilyevich Smirnov (1936) Sur la distribution de \(\omega ^2\) . CR Acad. Sci. Paris , 202(S 449)

D’Agostino Ralph and Pearson ES (1973) Tests for departure from normality. empirical results for the distributions of \(b_2\) and \(\sqrt{b_1}\) . Biometrika , 60(3):613–622. ISSN 00063444. http://www.jstor.org/stable/2335012

D’agostino Ralph B, Belanger Albert, and D’agostino Jr Ralph B (1990) A suggestion for using powerful and informative tests of normality. The American Statistician , 44(4):316–321. 10.1080/00031305.1990.10475751. https://www.tandfonline.com/doi/abs/10.1080/00031305.1990.10475751

Gel Yulia R and Gastwirth Joseph L (2008) A robust modification of the jarque-bera test of normality. Economics Letters , 99(1):30–32. ISSN 0165-1765. https://doi.org/10.1016/j.econlet.2007.05.022. https://www.sciencedirect.com/science/article/pii/S0165176507001838

Brys Guy, Hubert Mia, Struyf Anja (2007) Goodness-of-fit tests based on a robust measure of skewness. Computational Statistics 23(3):429–442

Stehlík Milan, Fabián Zdeněk, Střelec Luboš (2012) Small sample robust testing for normality against pareto tails. Communications in Statistics - Simulation and Computation 41(7):1167–1194

Wolf-Dieter Richter, Luboš Střelec, Hamid Ahmadinezhad,and Milan Stehlík. Geometric aspects of robust testing for normality and sphericity. Stochastic Analysis and Applications , 35 (3):511–532, 2017. 10.1080/07362994.2016.1273785. https://doi.org/10.1080/07362994.2016.1273785

John B. Hampshire and Barak Pearlmutter. Equivalence proofs for multi-layer perceptron classifiers and the bayesian discriminant function. In Connectionist Models , pages 159–172. Elsevier, 1991

Richard Michael D, Lippmann Richard P (1991) Neural network classifiers estimate bayesian a posteriori probabilities. Neural Computation 3(4):461–483

Grinstead Charles M, Snell J Laurie (2003) Introduction to Probability. AMS

Pearson Karl (1895) Contributions to the mathematical theory of evolution. ii. skew variation in homogeneous material. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences , 186:343–414

Pearson Karl (1901) Mathematical contributions to the theory of evolution. x. supplement to a memoir on skew variation. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences , 197 (287-299):443–459

Pearson Karl (1916) Mathematical contributions to the theory of evolution. xix. second supplement to a memoir on skew variation. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences , 216 (538-548):429–457

Juha Karvanen, Jan Eriksson, and Visa Koivunen (2000) Pearson system based method for blind separation. In Proceedings of Second International Workshop on Independent Component Analysis and Blind Signal Separation (ICA2000), Helsinki, Finland , pages 585–590

Howell Nancy (2010) Life Histories of the Dobe !Kung: food, fatness and well-being over the life span, volume 4 of Origins of Human Behavior and Culture. University of California Press

Howell Nancy (2017) Demography of the Dobe !Kung. Taylor & Francis. ISBN 9781351522694

Northern California Earthquake Data Center. Berkeley digital seismic network (bdsn), 2014a

Northern California Earthquake Data Center. Northern california earthquake data center, 2014b

Kursa Miron B, Rudnicki Witold R (2010) Feature selection with the Boruta package. Journal of Statistical Software 36(11)

Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger (2017) On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70 , pages 1321–1330

Kingma Diederik P and Ba Jimmy (December 2014) Adam: A Method for Stochastic Optimization. arXiv e-prints , art. arXiv:1412.6980

Jarrett Kevin, Kavukcuoglu Koray, Ranzato Marc'Aurelio, and LeCun Yann (2009) What is the best multi-stage architecture for object recognition? In 2009 IEEE 12th International Conference on Computer Vision , pages 2146–2153. doi:10.1109/ICCV.2009.5459469

Vinod Nair and Geoffrey E. Hinton (2010) Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th International Conference on International Conference on Machine Learning , ICML’10, page 807–814, Madison, WI, USA. Omnipress. ISBN 9781605589077

Xavier Glorot, Antoine Bordes, and Yoshua Bengio (Apr 2011) Deep sparse rectifier neural networks. In Geoffrey Gordon, David Dunson, and Miroslav Dudík, editors, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics , volume 15 of Proceedings of Machine Learning Research , pages 315–323, Fort Lauderdale, FL, USA, 11–13 . PMLR

Murphy Allan H, Winkler Robert L (1977) Reliability of subjective probability forecasts of precipitation and temperature. Journal of the Royal Statistical Society: Series C (Applied Statistics) 26(1):41–47

Murphy Allan H, Winkler Robert L (1987) A general framework for forecast verification. Monthly Weather Review 115(7):1330–1338

Bröcker Jochen (2008) Some remarks on the reliability of categorical probability forecasts. Monthly Weather Review 136(11):4488–4502

Fenlon Caroline, O’Grady Luke, Doherty Michael L, Dunnion John (2018) A discussion of calibration techniques for evaluating binary and categorical predictive models. Preventive Veterinary Medicine 149:107–114

John Shawe-Taylor and Nello Cristianini (june 2004) Kernel Methods for Pattern Analysis . Cambridge University Press,

Steven Goodman. A dirty dozen: Twelve p-value misconceptions. Seminars in Hematology , 45(3):135 – 140, 2008. Interpretation of Quantitative Research

John C. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In ADVANCES IN LARGE MARGIN CLASSIFIERS , pages 61–74. MIT Press, 1999

Bianca Zadrozny and Charles Elkan (2002) Transforming classifier scores into accurate multiclass probability estimates. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , KDD 02, page 694-699, New York, NY, USA. Association for Computing Machinery. ISBN 158113567X

Alex Rosenberg and Lee McIntyre. Philosophy of science: A contemporary introduction . Routledge, 4 edition, 2019. ISBN 9781138331518

Royston JP (1982a) An extension of shapiro and wilk’s w test for normality to large samples. Journal of the Royal Statistical Society. Series C (Applied Statistics) , 31(2):115–124

Patrick Royston. Remark as r94: A remark on algorithm as 181: The w-test for normality. Journal of the Royal Statistical Society. Series C (Applied Statistics) , 44(4):547–551, 1995

J. P. Royston. Algorithm as 177: Expected normal order statistics (exact and approximate). Journal of the Royal Statistical Society. Series C (Applied Statistics) , 31(2):161–165, 1982b. ISSN 00359254, 14679876. http://www.jstor.org/stable/2347982

van Soest J (1967) Some experimental results concerning tests of normality*. Statistica Neerlandica 21(1):91–97

Kolmogorov AN (1933) Sulla determinazione empirica di una legge di distribuzione. Giornale dell’ Instituto Italiano degli Attuari 4:83–91


Juergen Gross and Uwe Ligges. nortest: Tests for normality, 2015. R package version 1.0-4

Stephens MA (1986) Tests based on edf statistics. In: D’Agostino RB, Stephens MA (eds) Goodness-of-Fit Techniques. Marcel Dekker, New York

K. O. Bowman and L. R. Shenton. Omnibus test contours for departures from normality based on \(\sqrt{b}_1\) and \(b_2\) . Biometrika , 62(2):243–250, 08 1975

Urzua Carlos (1996) On the correct use of omnibus tests for normality. Economics Letters 53(3):247–251

Carmeli C, de Vito E, Toigo A, Umanità V (2010) VECTOR VALUED REPRODUCING KERNEL HILBERT SPACES AND UNIVERSALITY. Analysis and Applications 08(01):19–61

Damien Garreau, Wittawat Jitkrittum, and Motonobu Kanagawa. Large sample analysis of the median heuristic. arXiv e-prints , art. arXiv:1707.07269 , July 2017


Acknowledgements

The author would like to thank his advisor, Dr. Miloš Stanković (Innovation Center, School of Electrical Engineering, University of Belgrade), for useful discussions and advice, and Dr. Wittawat Jitkrittum (Google Research) for advice on the kernel tests of goodness-of-fit. The earthquake data for this study come from the Berkeley Digital Seismic Network (BDSN), doi:10.7932/BDSN, operated by the UC Berkeley Seismological Laboratory, which is archived at the Northern California Earthquake Data Center (NCEDC), doi:10.7932/NCEDC, and were accessed through NCEDC.

No funding has been received for this research.

Author information

Authors and affiliations.

University of Belgrade, Studentski trg 1, 11000, Belgrade, Serbia

Miloš Simić


Contributions

The whole study was conducted and the paper written by Miloš Simić.

Corresponding author

Correspondence to Miloš Simić .

Ethics declarations

Conflict of interest/competing interests.

There are no conflicts of interest and no competing interests regarding this study.

Code availability

The data and the code are in the following GitHub repository: https://github.com/milos-simic/neural-normality .

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Standard statistical tests of normality

Throughout this Appendix, \({\mathbf {x}}=[x_1, x_2, \ldots , x_n]\) will denote a sample drawn from the distribution whose normality we want to test. The same holds for other Appendices.

A.1 The Shapiro–Wilk Test (SW)

Let \({\mathbf {b}} = [b_1, b_2, \ldots , b_n]^T\) denote the vector of the expected values of order statistics of independent and identically distributed random variables sampled from the standard normal distribution. Let \({\mathbf {V}}\) denote the corresponding covariance matrix.

The intuition behind the SW test is as follows. If a random variable X follows normal distribution \(N(\mu ,\sigma ^2)\) , and \(Z\sim N(0, 1)\) , then \(X=\mu +\sigma Z\) [ 49 ]. For the ordered random samples \({\mathbf {x}}=[x_1,x_2,\ldots ,x_n]\sim N(\mu , \sigma ^2)\) and \({\mathbf {z}}=[z_1,z_2,\ldots ,z_n]\sim N(0,1)\) , the best linear unbiased estimate of \(\sigma\) is [ 49 ]:

In that case, \({\hat{\sigma }}^2\) should be equal to the usual estimate of variance \(\text {S}({\mathbf {x}})^2\) :

The value of the test statistic, W , is a scaled ratio of \({\hat{\sigma }}^2\) and \(\text {S}({\mathbf {x}})^2\) :

The range of W is [0, 1], with higher values indicating stronger evidence in support of normality. The original formulation required use of the tables of the critical values of W [ 52 ] at the most common levels of \(\alpha\) and was limited to smaller samples with \(n \in [3, 20]\) elements because the values of \({\mathbf {b}}\) and \({\mathbf {V}}^{-1}\) were known only for small samples at the time [ 6 ]. Royston [ 98 ] extended the upper limit of n to 2000 and presented a normalizing transformation algorithm suitable for computer implementation. The upper limit was further improved by Royston [ 99 ] who formulated algorithm AS 194 which allowed the test to be used for the samples with \(n \in [3, 5000]\) .
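The displayed equations did not survive extraction here. For reference, the commonly cited form of the W statistic (not copied from the paper, but standard in the literature) is, with the notation above and the sample sorted in ascending order:

\[
W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^{2}}{\sum_{i=1}^{n}\left(x_i - \overline{x}\right)^{2}},
\qquad
\mathbf{a} = \frac{\mathbf{b}^{T}\mathbf{V}^{-1}}{\left(\mathbf{b}^{T}\mathbf{V}^{-1}\mathbf{V}^{-1}\mathbf{b}\right)^{1/2}}.
\]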

A.2 The Shapiro–Francia Test (SF)

The Shapiro–Francia test is a simplification of the SW test. It was initially proposed for the larger samples, for which the matrix \({\mathbf {V}}\) in the SW test was unknown [ 61 ]. By assuming that the order statistics are independent, we can substitute the identity matrix \({\mathbf {I}}\) for \({\mathbf {V}}\) in Equation ( 16 ) to obtain the Shapiro-Francia test statistic:

Then, SF is actually the squared Pearson correlation between \(\mathbf{a }\) and \({\mathbf {x}}\) , i.e., the \(R^2\) of the regression of \({\mathbf {x}}\) on \(\mathbf{a }\) [ 2 ]. Only the expected values of the order statistics are needed to conduct this test. They can be calculated using algorithm AS 177 proposed by Royston [ 100 ].

As in the SW test, \({\mathbf {x}}\) is assumed to be ordered.
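As a rough illustration of the idea, here is a minimal Python sketch of the SF statistic as the squared Pearson correlation between the sorted sample and the expected normal order statistics; it approximates the expected order statistics with Blom scores rather than algorithm AS 177, so it is only a sketch.

```python
# Minimal sketch of the Shapiro-Francia statistic: squared correlation between
# the sorted sample and (approximate) expected normal order statistics.
import numpy as np
from scipy import stats

def shapiro_francia(x):
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # Blom's approximation to the expected order statistics of N(0, 1).
    m = stats.norm.ppf((np.arange(1, n + 1) - 0.375) / (n + 0.25))
    r = np.corrcoef(m, x)[0, 1]
    return r ** 2          # values near 1 support normality

rng = np.random.default_rng(2)
print(shapiro_francia(rng.normal(size=50)))       # close to 1
print(shapiro_francia(rng.exponential(size=50)))  # noticeably smaller
```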

A.3 The Lilliefors Test (LF)

This test was introduced independently by Lilliefors [ 56 ] and van Soest [ 101 ]. The LF test checks if the given sample \({\mathbf {x}}\) comes from a normal distribution whose parameters \(\mu\) and \(\sigma\) are taken to be the sample mean ( \(\overline{x}\) ) and standard deviation ( \(\text {sd}({\mathbf {x}})\) ). It is equivalent to the Kolmogorov–Smirnov test of goodness-of-fit [ 102 ] for those particular choices of \(\mu\) and \(\sigma\) [ 13 ]. The LF test is conducted as follows. If the sample at hand comes from \(N(\overline{x},\text {sd}({\mathbf {x}})^2)\) , then its transformation \({\mathbf {z}}=[z_1,z_2,\ldots ,z_n]\) , where:

should follow N (0, 1). The difference between the EDF of \({\mathbf {z}}\) , \(edf_{{\mathbf {z}}}\) , and the CDF of N (0, 1), \(\Phi\) , quantifies how well the sample \({\mathbf {x}}\) fits the normal distribution. In the LF test, that difference is calculated as follows

Higher values of D indicate greater deviation from normality.
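A minimal sketch of the D statistic, standardizing with the sample mean and standard deviation and taking the largest gap between the empirical CDF and the standard normal CDF (for p-values one would use Lilliefors' tables or, for example, statsmodels' lilliefors function):

```python
# Minimal sketch of the Lilliefors D statistic described above.
import numpy as np
from scipy import stats

def lilliefors_D(x):
    x = np.asarray(x, dtype=float)
    z = np.sort((x - x.mean()) / x.std(ddof=1))
    n = z.size
    cdf = stats.norm.cdf(z)
    edf_hi = np.arange(1, n + 1) / n   # EDF just after each observation
    edf_lo = np.arange(0, n) / n       # EDF just before each observation
    return max(np.max(edf_hi - cdf), np.max(cdf - edf_lo))

rng = np.random.default_rng(3)
print(lilliefors_D(rng.normal(size=80)))    # small D
print(lilliefors_D(rng.uniform(size=80)))   # larger D
```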

A.4 The Anderson–Darling Test (AD)

Whereas the Lilliefors test focuses on the largest difference between the sample’s EDF and the CDF of the hypothesized model distribution, the AD test calculates the expected weighted difference between those two [ 58 ], with the weighting function designed to make use of the specific properties of the model. The AD statistic is:

where \(\psi\) is the weighting function, edf is the sample’s EDF, and \(F^*\) is the CDF of the distribution we want to test. When testing for normality, the weighting function is chosen to be sensitive to the tails [ 13 , 58 ]:

Then, the statistic ( 21 ) can be calculated in the following way [ 2 , 103 ]:

where \([z_{(1)},z_{(2)},\ldots ,z_{(n)}]\) ( \(z_{(i)} \le z_{(i+1)}, i=1,2,\ldots ,n-1\) ) is the ordered permutation of \({\mathbf {z}}=[z_1,z_2,\ldots ,z_n]\) that is obtained from \({\mathbf {x}}\) as in the LF test. The p -values are computed from the modified statistic [ 103 , 104 ]:

Larger values of the AD statistic indicate stronger arguments in favor of non-normality.
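A minimal sketch using scipy's implementation of the AD test for normality, which compares the statistic against tabulated critical values:

```python
# Minimal sketch: Anderson-Darling test for normality via scipy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
result = stats.anderson(rng.normal(size=100), dist="norm")
print("A^2 =", result.statistic)
for crit, sig in zip(result.critical_values, result.significance_level):
    decision = "reject" if result.statistic > crit else "fail to reject"
    print(f"{decision} normality at the {sig}% level")
```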

A.5 The Cramér–von Mises Test (CVM)

As noted by Wijekularathna et al. [ 13 ], the AD test is a generalization of the Cramér–von Mises test [ 62 , 63 , 64 , 65 ]. When \(\psi (\cdot )=1\) , the AD test’s statistic reduces to that of the CVM test:

Because \(\psi (\cdot )\) takes into account the specific properties of the model distribution, the AD test may be more sensitive than the CVM test [ 13 ]. As is the case with the AD test, larger values of the CVM statistic ( 25 ) are more compatible with departures from normality.
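A minimal sketch using scipy's Cramér–von Mises test (available in recent scipy versions) against a normal CDF whose parameters are estimated from the sample; because the parameters are estimated rather than specified in advance, the reported p-value is only approximate:

```python
# Minimal sketch: Cramér-von Mises test against a fitted normal CDF (scipy).
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(loc=10, scale=2, size=100)
res = stats.cramervonmises(x, "norm", args=(x.mean(), x.std(ddof=1)))
print(res.statistic, res.pvalue)   # parameters estimated => p-value approximate
```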

A.6 The Jarque–Bera Test (JB)

The Jarque–Bera test [ 59 , 60 ] checks how much the sample’s skewness ( \(\sqrt{\beta _1}\) ) and kurtosis ( \(\beta _2\) ) match those of normal distributions. Namely, for every normal distribution it holds that \(\sqrt{\beta _1}=0\) and \(\beta _2=3\) . The statistic of the test, as originally defined by Jarque and Bera [ 59 ], is computed as follows:

We see that higher values of J indicate greater deviation from the skewness and kurtosis of normal distributions. The same idea was examined by Bowman and Shenton [ 105 ]. The asymptotic expected values of the estimators of skewness and kurtosis are 0 and 3, while the asymptotic variances are 6/ n and 24/ n for the sample of size n [ 105 ]. The J statistic is then a sum of two asymptotically independent standardized normals. However, the estimator of kurtosis slowly converges to normality, which is why the original statistic is not useful for small and medium-sized samples [ 106 ]. Urzua [ 106 ] adjusted the statistic by using the exact expressions for the means and variances of the estimators of skewness and kurtosis:

which allowed the test to be applied to smaller samples.
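A minimal sketch of the classical (large-sample) JB statistic, computed both by hand from the sample skewness and kurtosis and via scipy:

```python
# Minimal sketch of the classical Jarque-Bera statistic: compare sample
# skewness and kurtosis with the normal values 0 and 3.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.standard_t(df=5, size=500)        # heavy-tailed alternative

skew = stats.skew(x)
kurt = stats.kurtosis(x, fisher=False)    # raw kurtosis, 3 for a normal law
J = x.size / 6 * (skew**2 + (kurt - 3)**2 / 4)
print("manual J:", J)
print("scipy   :", stats.jarque_bera(x))  # classical statistic and p-value
```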

A.7 The D’Agostino–Pearson Test (DP)

The DP test is a combination of the individual skewness and kurtosis tests of normality [ 66 , 67 ]. Its statistic is:

where \(Z_1\) and \(Z_2\) are normal approximations to the skewness and kurtosis. The exact formulas for \(Z_1\) and \(Z_2\) can be found in D’Agostino et al. [ 67 ]. Under the null hypothesis of normality, \(K^2\) has an approximate \(\chi ^2\) distribution with two degrees of freedom [ 67 ].

The rationale behind the DP test is that the statistic combining both the sample skewness and kurtosis can detect departures from normality in terms of both moments, unlike the tests based on only one standardized moment.
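scipy's normaltest implements this combined statistic, so a minimal usage sketch is:

```python
# Minimal sketch: D'Agostino-Pearson K^2 test via scipy.stats.normaltest.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
stat, p = stats.normaltest(rng.normal(size=200))
print(f"K^2 = {stat:.3f}, p = {p:.3f}")   # chi-square with 2 df under H0
```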

Appendix B: Robustified tests of normality

B.1 The Robustified Jarque–Bera Tests (RJB)

The statistic of the classical JB test, introduced in Section A.6, is based on the classical estimators of the first four moments. Those estimators are sensitive to outliers, which is why the JB test is not robust. The robustified Jarque–Bera tests are obtained when the non-robust moment estimators are replaced with their robust alternatives. Since there are multiple ways to define a robust estimator of a moment, there is not one, but a plethora of the RJB tests, which were defined by Stehlík et al. [ 70 ] and named the RT tests.

Let \({\mathbf {x}}=[x_1, x_2, \ldots , x_n]\) be the ordered sample under consideration. Let \(T_0\) be the sample mean, \(T_1\) its median, \(T_2 \equiv T_{(2, s)}=\frac{1}{n-2s}\sum _{i=s+1}^{n-s}x_i\) the trimmed sample mean, and \(T_3=\text {median}_{i\le j}\{(x_i+x_j)/2\}\) the Lehman–Hodges pseudo-median of \({\mathbf {x}}\) . All of these location estimators except \(T_0\) are insensitive to outliers. Then, the robust central moment estimators can be defined as follows:

With the notation set up, the general statistic of the robustified Jarque–Bera tests can be defined as follows [ 16 , 70 ]:

With the right choice of the parameters ( \(C_1, j_1, a_1 \ldots\) ), the RJB statistic can be reduced to that of the classical Jarque–Bera test.

For the purpose of this study, we used the same four RJB tests that Stehlík et al. [ 16 ] evaluated in their study: MMRT \(_1\) , MMRT \(_2\) , TTRT \(_1\) , TTRT \(_2\) . We refer readers to Stehlík et al. [ 16 ] for more details on those particular tests’ parameters.

B.2 The Robustified Lin–Mudholkar Test (RLM)

The Lin–Mudholkar test (LM) is based on the fact that the estimators of mean and variance are independent if and only if the sample at hand comes from a normal distribution [ 50 ]. The correlation coefficient between \(\overline{x}\) and \(\text {sd}({\mathbf {x}})\) serves as the statistic of the LM test. Stehlík et al. [ 16 ] use the following bootstrap estimator of the coefficient:

The robustified LM tests rely on robust estimation of moments to obtain robust estimators of the skewness and kurtosis. As is the case with the RJB tests, RLM is a class of tests each defined by the particular choice of the estimators’ parameters. The RLM test considered in this study is the same as the one used by Stehlík et al. [ 16 ], in which

estimates skewness and

estimates kurtosis.

B.3 The Robustified Shapiro–Wilk Test (RSW)

Let \(J({\mathbf {x}})\) be the scaled average absolute deviation from the sample median \(M=\text {median}({\mathbf {x}})\) :

The RSW test statistic is the ratio of the usual standard deviation estimate \(\text {sd}({\mathbf {x}})\) and \(J(\mathbf{x})\) [ 53 ]:

Under the null hypothesis of normality, J is asymptotically normally distributed and a consistent estimate of the true standard deviation [ 53 ]. Therefore, values of RSW close to 1 are expected when the sample at hand does come from a normal distribution.
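A minimal sketch of this ratio, assuming the usual scaling constant \(\sqrt{\pi /2}\) that makes J a consistent estimator of the standard deviation under normality:

```python
# Minimal sketch of the RSW ratio sd(x) / J(x), assuming the sqrt(pi/2) scaling.
import numpy as np

def rsw(x):
    x = np.asarray(x, dtype=float)
    J = np.sqrt(np.pi / 2) * np.mean(np.abs(x - np.median(x)))
    return x.std(ddof=1) / J        # close to 1 for normal samples

rng = np.random.default_rng(13)
print(rsw(rng.normal(size=200)))         # near 1
print(rsw(rng.standard_t(3, size=200)))  # typically further from 1
```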

Appendix C: Machine-learning methods for normality testing

C.1 The FSSD Kernel Test of Normality

As mentioned in Section  2 , there are several kernel tests of goodness-of-fit. Like our approach, they represent a blend of machine learning and statistics. We evaluate the test of Jitkrittum et al. [ 42 ] against our network because that test is computationally less complex than the original kernel tests of goodness-of-fit proposed by Chwialkowski et al. [ 35 ], but comparable with them in terms of statistical power. Of the other candidates, the approach of Kellner and Celisse [ 45 ] is of quadratic complexity and requires bootstrap, and a drawback of the approach of Lloyd and Ghahramani [ 44 ], as Jitkrittum et al. [ 42 ] point out, is that it requires a model to be fit and new data to be simulated from it. This approach also fails to exploit our prior knowledge of the characteristics of the distribution for which goodness-of-fit is being determined. So, the test formulated by Jitkrittum et al. [ 42 ] was our choice as the representative of the kernel tests of goodness-of-fit. It is a distribution-free test, so we first describe its general version before we show how we used it to test for normality. The test is defined for multidimensional distributions, but we present it for the case of one-dimensional distributions because we are interested in one-dimensional Gaussians.

Let \({\mathcal {F}}\) be the RKHS of real-valued functions over \({\mathcal {X}}\in \mathrm{I\!R}\) with the reproducing kernel k . Let q be the density of model \(\Psi\) . As in Chwialkowski et al. [ 35 ], a Stein operator \(T_{q}\) [ 36 ] can be defined over \({\mathcal {F}}\) :

Let us note that for

it holds that:

If \(Z \sim \Psi\) , then \({\mathrm{I\!E}}(T_{q}f)(Z)= 0\) [ 35 ]. Let X be the random variable which follows the distribution from which the sample \({\mathbf {x}}\) was drawn. The Stein discrepancy \(S_q\) between X and Z is defined as follows [ 35 ]:

where \(g(\cdot )={\mathrm{I\!E}}\xi _q(X,\cdot )\) is called the Stein witness function and belongs to \({\mathcal {F}}\) . Chwialkowski et al. [ 35 ] show that if k is a cc-universal kernel [ 107 ], then \(S_{q}(X)=0\) if and only if \(X \sim \Psi\) , provided a couple of mathematical conditions are satisfied.

Jitkrittum et al. [ 42 ] follow the same approach as Chwialkowski et al. [ 43 ] for kernel two-sample tests and present the statistic that is comparable to the original one of Chwialkowski et al. [ 35 ] in terms of power, but faster to compute. The idea is to use a real analytic kernel k that makes the witness function g real analytic. In that case, the values of \(g(v_1),g(v_2),\ldots ,g(v_m)\) for a sample of points \(\{v_j\}_{j=1}^{m}\) , drawn from X , are almost surely zero w.r.t. the density of X if \(X\sim \Psi\) . Jitkrittum et al. [ 42 ] define the following statistic which they call the finite set Stein discrepancy:

If \(X\sim \Psi\) , \(FSSD^2=0\) almost surely. Jitkrittum et al. [ 42 ] use the following estimate of \(FSSD^2\) :

In our case, we want to test if \(\Psi\) is equal to any normal distribution. Similarly to the LF test, we can use the sample estimates of the mean and variance as the parameters of the normal model. Then, we can randomly draw m numbers from \(N(\overline{x}, \text {sd}({\mathbf {x}})^2)\) and use them as the points \(\{v_j\}_{j=1}^{m}\) , calculating the estimate ( 42 ) for \({\mathbf {x}}\) and \(N(\overline{x}, \text {sd}({\mathbf {x}})^2)\) . For the kernel, we chose the Gaussian kernel as it fulfills the conditions laid out by Jitkrittum et al. [ 42 ]. To set its bandwidth, we used the median heuristic [ 108 ], which sets it to the median of the absolute differences \(|x_i-x_j|\) ( \(x_i, x_j \in {\mathbf {x}}, 1\le i < j \le n\) ). The exact number of locations, m , was set to 10.

Since g is always zero if the sample comes from the normal distribution, the larger the value of \(FSSD^2\) , the more likely it is that the sample came from a non-normal distribution. We refer to this test as the FSSD test.
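As a small illustration of the median heuristic mentioned above (this is only the bandwidth-selection step, not the full FSSD test):

```python
# Minimal sketch of the median heuristic for the Gaussian kernel bandwidth.
import numpy as np

def median_heuristic_bandwidth(x):
    x = np.asarray(x, dtype=float)
    diffs = np.abs(x[:, None] - x[None, :])
    return np.median(diffs[np.triu_indices(x.size, k=1)])  # median |x_i - x_j|

rng = np.random.default_rng(8)
print(median_heuristic_bandwidth(rng.normal(size=100)))
```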

C.2 The Statistic-Based Neural Network

Since neural networks are designed with a fixed-size input in mind, but samples can have any number of elements, Sigut et al. [ 3 ] represent the samples with the statistics of several normality tests that were chosen in advance. The rationale behind this method is that, taken together, the statistics of different normality tests examine samples from complementary perspectives, so a neural network that combines the statistics could be more accurate than the individual tests. Sigut et al. [ 3 ] use the estimates of the following statistics:

the W statistic of the Shapiro-Wilk test (see Equation ( 14 )),

the statistic of the test proposed by Lin and Mudholkar [ 50 ]:

and the statistic of the Vasicek test [ 51 ]:

where m is a positive integer smaller than n /2, \([x_{(1)},x_{(2)},\ldots ,x_{(n)}]\) is the non-decreasingly sorted sample \({\mathbf {x}}\) , \(x_{(i)}=x_{(1)}\) for \(i<1\) , and \(x_{(i)}=x_{(n)}\) for \(i>n\) .

It is not clear which activation function Sigut et al. [ 3 ] use. They train three networks with a single hidden layer containing 3, 5, and 10 neurons, respectively. One of the networks is designed to take the sample size into account as well, so that it can be more flexible. Like the other two, this network proved capable of modeling the posterior Bayesian probability of an input sample being normal. Sigut et al. [ 3 ] focus on samples with no more than 200 elements.

In addition to our network, presented in Section  4 , we trained one that follows the approach of Sigut et al. [ 3 ]. We refer to that network as Statistic-Based Neural Network (SBNN) because it expects an array of statistics as its input. More precisely, prior to being fed to the network, each sample \({\mathbf {x}}\) is transformed to the following array:

just as in Sigut et al. [ 3 ] ( n is the sample size). ReLU was used as the activation function. To make the comparison fair, we trained the SBNN in the same way as our network, which we design in Section  4 .
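To make the idea concrete, here is a minimal illustrative sketch of a statistic-based classifier; note that it substitutes easily computed features (Shapiro–Wilk W, skewness, kurtosis, sample size) for the SW, Lin–Mudholkar and Vasicek statistics used by the actual SBNN, and a scikit-learn MLP for the paper's network, so it is not the authors' model.

```python
# Minimal illustrative sketch of a statistic-based normality classifier.
# Features and model are stand-ins, not the SBNN described in the paper.
import numpy as np
from scipy import stats
from sklearn.neural_network import MLPClassifier

def features(sample):
    w, _ = stats.shapiro(sample)
    return [w, stats.skew(sample), stats.kurtosis(sample), len(sample)]

rng = np.random.default_rng(9)
X, y = [], []
for _ in range(500):
    n = int(rng.integers(20, 100))
    is_normal = rng.random() < 0.5
    s = rng.normal(size=n) if is_normal else rng.exponential(size=n)
    X.append(features(s))
    y.append(int(is_normal))

clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```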

Appendix D: Results for set \({\mathcal {F}}\)

Detailed results for each n in each subset of \({\mathcal {F}}\) are presented in Tables  15 , 16 , 17 and 18 .


About this article

Simić, M. Testing for normality with neural networks. Neural Comput & Applic 33 , 16279–16313 (2021). https://doi.org/10.1007/s00521-021-06229-7


Received : 03 November 2020

Accepted : 13 June 2021

Published : 21 July 2021

Issue Date : December 2021

DOI : https://doi.org/10.1007/s00521-021-06229-7


  • Neural Networks
  • Binary Classification
  • Normal Distribution
  • Goodness-of-Fit

Statistics add-in software for statistical analysis in Excel


Normality hypothesis test

A hypothesis test formally tests whether the population the sample represents is normally distributed.

The null hypothesis states that the population is normally distributed, against the alternative hypothesis that it is not normally distributed. If the test p-value is less than the predefined significance level, you can reject the null hypothesis and conclude the data are not from a population with a normal distribution. If the p-value is greater than the predefined significance level, you cannot reject the null hypothesis.

Note that small deviations from normality can produce a statistically significant p-value when the sample size is large, and conversely it can be impossible to detect non-normality with a small sample. You should always examine the normal plot and use your judgment, rather than rely solely on the hypothesis test. Many statistical tests and estimators are robust against moderate departures from normality due to the central limit theorem.
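A minimal sketch of this caveat, using Python's scipy for the test (any implementation would show the same pattern):

```python
# Minimal sketch: the same mild departure from normality is "significant"
# with a large sample but often undetected with a small one.
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
x = rng.normal(size=50_000) + 0.5 * rng.exponential(size=50_000)  # mildly skewed

print(stats.normaltest(x).pvalue)       # essentially zero
print(stats.normaltest(x[:50]).pvalue)  # frequently above 0.05
```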


Normality Test in SPSS

Discover the Normality Test in SPSS! Learn how to perform it, understand the SPSS output, and report results in APA style. Check out this simple, easy-to-follow guide below for a quick read!



Introduction

In the realm of statistical analysis , ensuring the data conforms to a normal distribution is pivotal. Researchers often turn to Normality Tests in SPSS to evaluate the distribution of their data. As statistical significance relies on certain assumptions, assessing normality becomes a crucial step in the analytical process. This blog post delves into the intricacies of Normality Tests, shedding light on tools like the Kolmogorov-Smirnov test and Shapiro-Wilk test, and exploring the steps involved in examining normal distribution using SPSS .

Normal Distribution Test

A Normal Distribution Test, as the name implies, is a statistical method employed to determine if a dataset follows a normal distribution . The assumption of normality is fundamental in various statistical analyses, such as t-tests and ANOVA. In the context of SPSS, researchers utilize tests like the Kolmogorov-Smirnov test and Shapiro-Wilk test to ascertain whether their data conforms to the bell-shaped curve characteristic of a normal distribution. This initial step is crucial as it influences the choice of subsequent statistical tests, ensuring the robustness and reliability of the analytical process. Moving forward, we will dissect the significance and objectives of conducting Normality Tests in SPSS .

Aim of Normality Test

Exploring the data is a fundamental step in statistical analysis, and SPSS offers a comprehensive tool called Explore Analysis for this purpose.

The analysis provides a detailed overview of the dataset, presenting essential descriptive statistics and measures of central tendency.

The primary aim of a Normality Test is to evaluate whether a dataset adheres to the assumptions of a normal distribution. This is pivotal because many statistical analyses, including parametric tests, assume that the data is normally distributed.

Assumption of Normality Test

Understanding the assumptions underpinning the Normality Test is crucial for accurate interpretation. Firstly, it’s essential to acknowledge that many parametric tests assume a normal distribution of data for valid results. Consequently, the assumption of normality ensures that the sampling distribution of a statistic is approximately normal, which, in turn, facilitates the application of inferential statistics. Therefore, by subjecting the data to a Normality Test, researchers validate this assumption, providing a solid foundation for subsequent analyses.

How to Check Normal Distribution in SPSS

To comprehensively check for normal distribution in SPSS, researchers can employ a multifaceted approach. Firstly, visual inspection through a Histogram can reveal the shape of the distribution, offering a quick overview. The Normal Q-Q plot provides a graphical representation of how closely the data follows a normal distribution. Additionally, assessing skewness and kurtosis values adds a numerical dimension to the evaluation. High skewness and kurtosis values can indicate departures from normality. Lastly, we can check with statistical tests such as the Kolmogorov-Smirnov test, and the Shapiro-Wilk test . This section will guide users through the practical steps of executing these checks within the SPSS interface, ensuring a thorough examination of the dataset’s distributional characteristics.

 1. Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov test, often abbreviated as the K-S test , is a non-parametric method for determining whether a sample follows a specific distribution. In the context of SPSS, this test is a powerful tool for assessing the normality of a dataset. By comparing the empirical distribution function of the sample with the expected cumulative distribution function of a normal distribution, the K-S test quantifies the degree of similarity.
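Outside SPSS, the same comparison can be sketched with scipy's kstest; note that plugging in the sample mean and standard deviation (which is what the Lilliefors correction in SPSS's Tests of Normality table accounts for) makes the plain K-S p-value conservative:

```python
# Minimal sketch: one-sample K-S test against a fitted normal distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)
x = rng.normal(loc=100, scale=15, size=60)
print(stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1))))
```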

2. Shapiro-Wilk Test

An alternative to the Kolmogorov-Smirnov test, the Shapiro-Wilk test is another statistical method used to assess the normality of a dataset. Particularly effective for smaller sample sizes , the Shapiro-Wilk test evaluates the null hypothesis that a sample is drawn from a normal distribution. SPSS facilitates the application of this test, offering a straightforward process for researchers.

3. Histogram Plot

Utilizing a Histogram Plot in SPSS is a visual and intuitive method for determining normal distribution. By representing the distribution of data in a graphical format, researchers can promptly identify patterns and deviations.

Here are some common shapes of histograms and their explanations:

  • Normal Distribution (Bell Curve): It has a symmetrical, bell-shaped curve. The data is evenly distributed around the mean, forming a characteristic bell curve. The majority of observations cluster around the mean, with fewer observations towards the tails.
  • Positively Skewed (Skewed Right): The right tail is longer than the left. Most of the data is concentrated on the left side, and a few extreme values pull the mean to the right. This shape is often seen in datasets with a floor effect, where values cannot go below a certain point.
  •  Negatively Skewed (Skewed Left): It has a longer left tail. The majority of data points are concentrated on the right side, with a few extreme values dragging the mean to the left. This shape is common in datasets with a ceiling effect, where values cannot exceed a certain point.


Other Distributions

  • Bimodal : It has two distinct peaks, indicating the presence of two separate modes or patterns in the data. This shape suggests that the dataset is a combination of two different underlying distributions.
  • Uniform : All values have roughly the same frequency, resulting in a flat, rectangular shape. There are no clear peaks or valleys, and each value has an equal chance of occurrence.
  • Multimodal : It has more than two peaks, indicating multiple modes in the dataset. Each peak represents a distinct pattern or subgroup within the data.
  • Exponential : It has a rapidly decreasing frequency as values increase. It is characterized by a steep decline in the right tail. The shape is common in datasets where the likelihood of an event decreases exponentially with time.
  • Comb: It has alternating high and low frequencies, creating a pattern that resembles the teeth of a comb. This shape suggests periodicity or systematic variation in the data.

4. Normal Q-Q Plot

Furthermore, the Normal Q-Q plot complements the Histogram by providing a visual comparison between the observed data quantiles and the quantiles expected in a normal distribution. By comparing observed quantiles with expected quantiles, researchers gain insights into the conformity of the data to a normal distribution. Clear instructions ensure a seamless incorporation of this method into the normality checking process.

5. Skewness and Kurtosis

Skewness is a statistical measure that quantifies the asymmetry of a probability distribution or a dataset. In the context of data analysis, skewness helps us understand the distribution of values in a dataset and whether it is symmetric or not. The value can be positive, negative, or zero.

  • Positive: it indicates that the data distribution is skewed to the right. In other words, the tail on the right side of the distribution is longer or fatter than the left side, and the majority of the data points are concentrated towards the left.
  • Negative: Conversely, the data distribution is skewed to the left. The tail on the left side is longer or fatter than the right side, and the majority of data points are concentrated towards the right.
  • Zero : A skewness value of zero suggests that the distribution is perfectly symmetrical, with equal tails on both sides.

In summary, skewness provides insights into the shape of the distribution and the relative concentration of data points on either side of the mean.

Kurtosis is a statistical measure that describes the distribution’s “tailedness”, or the sharpness of the peak of a dataset. It helps to identify whether the tails of a distribution contain extreme values. Like skewness, kurtosis can be positive, negative, or zero.

  • Positive: The distribution has heavier tails and a sharper peak than the normal distribution. So, It suggests that the dataset has more outliers or extreme values than would be expected in a normal distribution.
  • Negative: Conversely,  the distribution has lighter tails and a flatter peak than the normal distribution. Therefore, It implies that the dataset has fewer outliers or extreme values than a normal distribution.
  • Zero: A kurtosis value of zero, also known as mesokurtic, indicates a distribution with tails and a peak similar to the normal distribution.

In data analysis, for a normal distribution skewness is close to zero and excess kurtosis (the value SPSS reports) is also close to zero, corresponding to a raw kurtosis of about 3 (known as mesokurtic). Deviations from these values may suggest non-normality and guide researchers in choosing appropriate statistical methods.
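A minimal sketch of this numerical check using scipy; note that scipy's kurtosis returns excess kurtosis by default (0 for a normal distribution) and raw kurtosis (about 3) with fisher=False:

```python
# Minimal sketch of the skewness/kurtosis check for normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.normal(size=500)

print("skewness       :", stats.skew(x))
print("excess kurtosis:", stats.kurtosis(x))                # about 0
print("raw kurtosis   :", stats.kurtosis(x, fisher=False))  # about 3
```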

Example of Normality Test

To provide practical insights, this section will present a hypothetical example illustrating the application of normality tests in SPSS. Through a step-by-step walkthrough, readers will gain a tangible understanding of how to apply the Kolmogorov-Smirnov and Shapiro-Wilk tests to a real-world dataset, reinforcing the theoretical concepts discussed earlier.

Imagine you are a researcher conducting a study on the screen time of a group of individuals. You have collected data on each participant’s daily screen time in minutes. As part of your analysis, you want to assess whether screen time follows a normal distribution. Let’s see how to run the Explore analysis in SPSS.

How to Perform Normality Test in SPSS

Step by Step: Running Normality Analysis in SPSS Statistics

Practicality is paramount, and this section will guide researchers through the step-by-step process of performing a Normality Test in SPSS. From importing the dataset to interpreting the results, this comprehensive guide ensures seamless execution of the normality testing procedure, fostering confidence in the analytical journey.

  • STEP: Load Data into SPSS

Commence by launching SPSS and loading your dataset, which should contain the variable of interest (here, the continuous screen-time variable, measured in minutes per day). If your data is not already in SPSS format, you can import it by navigating to File > Open > Data and selecting your data file.

  • STEP: Access the Analyze Menu

In the top menu, locate and click on “Analyze.” Within the “Analyze” menu, navigate to “Descriptive Statistics” and choose “Explore”: Analyze > Descriptive Statistics > Explore

  • STEP: Specify Variables 

Upon selecting “Explore,” a dialog box will appear. Choose the variable of interest and move it to the “Dependent List” box.

  • STEP: Define Statistics

Click on the ‘Statistics’ button to include Descriptives, Outliers, and Percentiles.

  • STEP: Define the Normality Plot with the Test

Click on the ‘Plots’ button to include visual representations, such as the histogram and stem-and-leaf plot. Check “Normality plots with tests” to obtain a Normal Q-Q plot along with the Kolmogorov-Smirnov and Shapiro-Wilk tests.

  • Final STEP: Generate the Normality Tests and Charts

Once you have specified your variables and chosen options, click the “OK” button to perform the analysis. SPSS will generate a comprehensive output, including the requested descriptive statistics, normality tests, and plots for your dataset.

Conducting the Normality Test in SPSS provides a robust foundation for understanding the key features of your data. This guide is tailored to SPSS version 25; steps may differ slightly in other versions, so refer to the documentation for your release for accurate and updated instructions.
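
If you want to reproduce the numerical part of the Explore output outside SPSS, the sketch below uses Python (numpy, scipy, and statsmodels assumed installed, with simulated screen-time data standing in for the real file). Note that the Kolmogorov-Smirnov statistic SPSS reports uses the Lilliefors significance correction, which corresponds to statsmodels' lilliefors function rather than a plain one-sample K-S test.

    # Sketch of the numerical Explore output: descriptives plus normality tests.
    # numpy, scipy and statsmodels assumed installed; the data are hypothetical.
    import numpy as np
    from scipy import stats
    from statsmodels.stats.diagnostic import lilliefors

    rng = np.random.default_rng(1)
    screen_time = rng.normal(loc=180, scale=40, size=100)  # minutes per day

    print(f"mean = {np.mean(screen_time):.1f}, median = {np.median(screen_time):.1f}")
    print(f"skewness = {stats.skew(screen_time):.3f}, "
          f"excess kurtosis = {stats.kurtosis(screen_time):.3f}")

    # Shapiro-Wilk test (null hypothesis: the data come from a normal distribution).
    w_stat, w_p = stats.shapiro(screen_time)
    print(f"Shapiro-Wilk: W = {w_stat:.3f}, p = {w_p:.3f}")

    # Lilliefors-corrected Kolmogorov-Smirnov test, as reported by SPSS Explore.
    ks_stat, ks_p = lilliefors(screen_time, dist="norm")
    print(f"Kolmogorov-Smirnov (Lilliefors): D = {ks_stat:.3f}, p = {ks_p:.3f}")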

SPSS Output for Normality Test

[Screenshot: SPSS Explore output for the Screen Time variable]

How to Interpret SPSS Output of Normality Test

Interpreting the output of a normality test is a critical skill for researchers. This section will dissect the SPSS output, explaining how to analyze results from the Kolmogorov-Smirnov and Shapiro-Wilk tests, as well as interpret visual aids like Histograms and Normal Q-Q plots.

  • Skewness and Kurtosis: The skewness of approximately 0 suggests a symmetrical distribution, while the negative kurtosis of -0.293 indicates lighter tails compared to a normal distribution.
  • Kolmogorov-Smirnov Test: The Kolmogorov-Smirnov test yields a statistic of 0.051 with a p-value of 0.200 (approximately), indicating no significant evidence to reject the null hypothesis of normality.
  • Shapiro-Wilk Test: The Shapiro-Wilk test produces a statistic of 0.993 with a p-value of 0.876, providing further support for the assumption of normality.
  • Histogram and Normal Q-Q Plot: The histogram shows a single central peak, reflecting a symmetric distribution, and the points on the Normal Q-Q plot lie close to the straight line, affirming the approximate normality of the “Screen Time (in min)” variable.

How to Report Results of Normality Analysis in APA

Effective communication of research findings is essential, and this section will guide researchers on how to report the results of normality tests following the guidelines of the American Psychological Association (APA). From structuring sentences to incorporating statistical values, this segment ensures that researchers convey their findings accurately and professionally.

[Screenshot: example APA-style write-up of the normality test results]

Get Help From SPSSanalysis.com

Embark on a seamless research journey with SPSSAnalysis.com, where our dedicated team provides expert data analysis assistance for students, academicians, and individuals. We ensure your research is elevated with precision. Explore our pages:

  • SPSS Data Analysis Help – SPSS Helper ,
  • Quantitative Analysis Help ,
  • Qualitative Analysis Help ,
  • SPSS Dissertation Analysis Help ,
  • Dissertation Statistics Help ,
  • Statistical Analysis Help ,
  • Medical Data Analysis Help .

Connect with us at SPSSAnalysis.com to empower your research endeavors and achieve impactful results. Get a Free Quote Today !

Reporting Normality Test in SPSS

Looking for Normality Test in SPSS? Doing it yourself is always cheaper, but it can also be a lot more time-consuming. If you’re not good at SPSS, you can pay someone to do your SPSS task for you.

How to Run Normality Test in SPSS: Explanation Step by Step

From the SPSS menu, choose Analyze > Descriptive Statistics > Explore.

Normality Test in SPSS menu

A new window will appear. From the left box, transfer the variables Age and Height into the Dependent List box. Click Both in the Display box.

Normality Test in SPSS

Click on the Statistics… button. A new window will open. Check Descriptives, then click Continue to return to the previous dialog.

Kolmogorov Smirnov Test in SPSS

Click on the Plots… button; a new window will open. In the Boxplots group, choose Factor levels together. In the Descriptive group, check Stem-and-leaf, and also check Normality plots with tests. Click Continue to return to the previous dialog, then click OK.

SPSS MENU

The test of normality results will appear in the output window.

SPSS Output for Normality Test

How to report a Normality Test results: Explanation Step by Step

How to Report the Case Processing Summary Table in SPSS Output?

The first table is the Case Processing summary table. It shows the number and percent of valid, missing and total cases for variables Age and Height.

SPSS output for Normality Test

How to Report Descriptive Statistics Table in SPSS Output?

The second table shows descriptive statistics for the variables Age and Height.

descriptive statistics output in spss

How to Report the P-Values of the Kolmogorov-Smirnov and Shapiro-Wilk Tests of Normality in SPSS Output?

The third table shows the results of the Kolmogorov-Smirnov and Shapiro-Wilk tests of normality (test statistic, degrees of freedom, p-value). Since we have fewer than 50 observations (N = 32 < 50), we will interpret the Shapiro-Wilk test results.

If p (Sig.) > 0.05, we fail to reject the null hypothesis and conclude that the data are consistent with a normal distribution, so parametric tests can be used.

If the p-value is less than 0.05, we reject the null hypothesis; in other words, the data are not normally distributed, and we should use nonparametric tests.

In our example, the p-value for age is 0.018 < 0.05. Therefore, we must reject the null hypothesis and conclude that age is not normally distributed.
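
The decision rule above can be written down compactly. The sketch below (Python with numpy and scipy assumed installed, and a hypothetical, right-skewed Age sample standing in for the real data) picks the Shapiro-Wilk test because N = 32 < 50 and flags the variable as non-normal when p < 0.05.

    # Sketch of the decision rule: Shapiro-Wilk for N < 50, alpha = 0.05.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    age = 18 + rng.exponential(scale=10, size=32)  # hypothetical, right-skewed ages

    alpha = 0.05
    stat, p = stats.shapiro(age)  # appropriate here because N = 32 < 50
    if p < alpha:
        print(f"p = {p:.3f} < {alpha}: reject H0, treat Age as non-normal "
              "and prefer nonparametric tests")
    else:
        print(f"p = {p:.3f} >= {alpha}: fail to reject H0, normality is tenable "
              "and parametric tests may be used")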

test of normality spss output

How to Report Normal Q-Q Plot in SPSS output?

The output also shows the Normal Q-Q Plot for Age and Height.

If the data points lie close to the diagonal line on the chart, we conclude that the data are normally distributed; otherwise, the dataset does not follow a normal distribution.

From the chart for Age, the data points are not close to the diagonal line, so we conclude that the Age data are not normally distributed.

Normal Q-Q

How to Interpret a Normality Test Results in APA Style?

A Shapiro-Wilk test of normality was conducted to determine whether the Age and Height data are normally distributed. The results indicate that we must reject the null hypothesis for the Age data (p = 0.018) and conclude that the data are not normally distributed. In contrast, we fail to reject the null hypothesis for the Height data (p = 0.256) and conclude that the data are normally distributed.

Visit our “How to Run Normality Test in SPSS” page for more details. Moreover, go to the general page to check Other Reporting Statistical Tests in SPSS. Finally, if you want to watch SPSS videos, please visit our YouTube Channel.

GET HELP FROM US

There is a lot of statistical software out there, but SPSS is one of the most popular. If you’re a student who needs help with SPSS, there are a few different resources you can turn to. The first is SPSS Video Tutorials. We prepared a page for SPSS Tutor for Beginners. All contents can guide you through step-by-step SPSS data analysis tutorials, and you can see How to Run Statistical Analysis in SPSS.

The second option is that you can get help from us, we give  SPSS help for students  with their assignments, dissertation, or research. Doing it yourself is always cheaper, but it can also be a lot more time-consuming. If you’re not the best at SPSS, then this might not be a good idea. It can take days just to figure out how to do some of the easier things in SPSS. So  paying someone to do your SPSS  will save you a ton of time and make your life a lot easier.

The procedure of the SPSS help service at  OnlineSPSS.com   is fairly simple. There are three easy-to-follow steps.

1. Click and Get a FREE Quote
2. Make the Payment
3. Get the Solution

Our purpose is to provide quick, reliable, and understandable information about SPSS data analysis to our clients.

What to do with nonnormal data

You have several options when you want to perform a hypothesis test with nonnormal data.

Proceed with the analysis if the sample is large enough

Although many hypothesis tests are formally based on the assumption of normality, you can still obtain good results with nonnormal data if your sample is large enough. The amount of data you need depends on how nonnormal your data are, but a sample size of 20 is often adequate. The relationship between robustness to normality and sample size is based on the central limit theorem. This theorem states that the distribution of the mean of data from essentially any distribution (with finite variance) approaches the normal distribution as the sample size increases. Therefore, if you're interested in making an inference about a population mean, the normality assumption is not critical as long as your sample is large enough.
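
A short simulation makes this concrete: even when the raw values come from a strongly skewed population, the means of samples of about 20 observations are far less skewed and much closer to normal (a sketch in Python with numpy and scipy assumed installed; the exponential population is chosen purely for illustration).

    # Sketch: sample means from a skewed (exponential) population are much
    # closer to symmetric/normal than the raw values themselves.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    raw = rng.exponential(scale=1.0, size=(2000, 20))  # 2000 samples of n = 20

    print(f"skewness of the raw values        = {stats.skew(raw.ravel()):.2f}")   # about 2
    print(f"skewness of the 2000 sample means = {stats.skew(raw.mean(axis=1)):.2f}")  # about 0.4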

Use a nonparametric test

Nonparametric tests do not assume a specific distribution for the population. Minitab provides several nonparametric tests that you can use instead of tests that assume normality. These tests can be especially useful when you have a small sample that is skewed or a sample that contains several outliers.

Nonparametric tests are not completely free of assumptions about your data: for example, they still require the data to be an independent random sample.
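
For example, a Mann-Whitney U test is a common stand-in for the two-sample t-test when the samples are small and skewed. The sketch below uses Python's scipy (assumed installed) with two small simulated groups, purely to illustrate the idea; it is not tied to the Minitab procedures discussed here.

    # Sketch: Mann-Whitney U as a nonparametric alternative to the two-sample t-test.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    group_a = rng.exponential(scale=2.0, size=15)  # small, skewed samples
    group_b = rng.exponential(scale=3.0, size=15)

    u_stat, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    print(f"Mann-Whitney U = {u_stat:.1f}, p = {p:.3f}")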

Transform the data

Sometimes you can transform your data by applying a function to make your data fit a normal distribution, so that you can finish your analysis.
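
For strictly positive, right-skewed data, a log or Box-Cox transformation is a common choice. The sketch below (Python with numpy and scipy assumed installed, on simulated lognormal data) shows how a Box-Cox transform can pull skewed values towards symmetry.

    # Sketch: Box-Cox transformation of right-skewed, strictly positive data.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(11)
    skewed = rng.lognormal(mean=0.0, sigma=0.8, size=100)  # strictly positive data

    transformed, lmbda = stats.boxcox(skewed)  # lambda chosen by maximum likelihood
    print(f"estimated lambda = {lmbda:.2f}")
    print(f"skewness before = {stats.skew(skewed):.2f}, "
          f"after = {stats.skew(transformed):.2f}")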
