Descriptive Statistics: Definition, Overview, Types, and Example



Descriptive statistics are brief summary measures that describe a given data set, which can represent either an entire population or a sample of a population. Descriptive statistics are broken down into measures of central tendency and measures of variability (spread). Measures of central tendency include the mean, median, and mode, while measures of variability include the standard deviation, variance, minimum and maximum values, kurtosis, and skewness.

Key Takeaways

  • Descriptive statistics summarize or describe the characteristics of a data set.
  • Descriptive statistics consist of three basic categories of measures: measures of central tendency, measures of variability (or spread), and frequency distribution.
  • Measures of central tendency describe the center of the data set (mean, median, mode).
  • Measures of variability describe the dispersion of the data set (variance, standard deviation).
  • Measures of frequency distribution describe the occurrence of data within the data set (count).


Understanding Descriptive Statistics

Descriptive statistics help describe and understand the features of a specific data set by giving short summaries about the sample and measures of the data. The most recognized types of descriptive statistics are measures of center. For example, the mean, median, and mode, which are used at almost all levels of math and statistics, are used to define and describe a data set. The mean, or the average, is calculated by adding all the figures within the data set and then dividing by the number of figures within the set.

For example, the sum of the following data set is 20: (2, 3, 4, 5, 6). The mean is 4 (20/5). The mode of a data set is the value appearing most often, and the median is the figure situated in the middle of the data set. It is the figure separating the higher figures from the lower figures within a data set. However, there are less common types of descriptive statistics that are still very important.
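
To make this concrete, here is a minimal sketch using Python's standard-library statistics module (Python 3.8+ for multimode) that reproduces these measures for the small data set above:

```python
import statistics

data = [2, 3, 4, 5, 6]

print(statistics.mean(data))       # 4 -> the sum (20) divided by the count (5)
print(statistics.median(data))     # 4 -> the middle value of the ordered data
print(statistics.multimode(data))  # [2, 3, 4, 5, 6] -> no value repeats, so every value ties as a mode
```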

People use descriptive statistics to distill hard-to-understand quantitative insights from a large data set into bite-sized descriptions. A student's grade point average (GPA), for example, provides a good illustration of descriptive statistics: it takes data points from a wide range of exams, classes, and grades and averages them together to provide a general picture of a student's overall academic performance. A student's personal GPA reflects their mean academic performance.

Descriptive statistics, especially in fields such as medicine, often visually depict data using scatter plots, histograms, line graphs, or stem and leaf displays. We'll talk more about visuals later in this article.

Types of Descriptive Statistics

All descriptive statistics are either measures of central tendency or measures of variability, also known as measures of dispersion.

Central Tendency

Measures of central tendency focus on the average or middle values of data sets, whereas measures of variability focus on the dispersion of data. These two measures use graphs, tables and general discussions to help people understand the meaning of the analyzed data.

Measures of central tendency describe the center position of a distribution for a data set. A person analyzes the frequency of each data point in the distribution and describes it using the mean, median, or mode, which measures the most common patterns of the analyzed data set.

Measures of Variability

Measures of variability (or measures of spread) aid in analyzing how dispersed the distribution is for a set of data. For example, while measures of central tendency may give a person the average of a data set, they do not describe how the data are distributed within the set.

So while the average of the data may be 65 out of 100, there can still be data points at both 1 and 100. Measures of variability help communicate this by describing the shape and spread of the data set. Range, quartiles, absolute deviation, and variance are all examples of measures of variability.

Consider the following data set: 5, 19, 24, 62, 91, 100. The range of that data set is 95, which is calculated by subtracting the lowest number (5) in the data set from the highest (100).
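
As a quick sketch, the same range calculation, together with the variance and standard deviation mentioned above, can be done with Python's standard library (treating the six values as a complete population):

```python
import statistics

data = [5, 19, 24, 62, 91, 100]

data_range = max(data) - min(data)     # 100 - 5 = 95
variance = statistics.pvariance(data)  # population variance: mean of the squared deviations from the mean
std_dev = statistics.pstdev(data)      # population standard deviation: square root of the variance

print(data_range, variance, std_dev)
```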

Distribution

Distribution (or frequency distribution) refers to the number of times a data point occurs or, alternatively, how often a data point fails to occur. Consider this data set: male, male, female, female, female, other. The distribution of this data can be classified as follows (and tallied in the short code sketch after the list):

  • The number of males in the data set is 2.
  • The number of females in the data set is 3.
  • The number of individuals identifying as other is 1.
  • The number of non-males is 4.
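
As a minimal sketch, Python's collections.Counter tallies exactly this kind of frequency distribution:

```python
from collections import Counter

responses = ["male", "male", "female", "female", "female", "other"]

counts = Counter(responses)
print(counts)  # Counter({'female': 3, 'male': 2, 'other': 1})

non_males = sum(n for category, n in counts.items() if category != "male")
print(non_males)  # 4
```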

In descriptive statistics, univariate data analyzes only one variable. It is used to identify characteristics of a single trait and is not used to analyze any relationships or causations.

For example, imagine a room full of high school students. Say you wanted to gather the average age of the individuals in the room. This univariate data is only dependent on one factor: each person's age. By gathering this one piece of information from each person and dividing by the total number of people, you can determine the average age.

Bivariate data, on the other hand, attempts to link two variables by searching for correlation. Two types of data are collected, and the relationship between the two pieces of information is analyzed together. When more than two variables are analyzed, the approach is referred to as multivariate.

Let's say each high school student in the example above takes a college assessment test, and we want to see whether older students are testing better than younger students. In addition to gathering the age of the students, we need to gather each student's test score. Then, using data analytics, we mathematically or graphically depict whether there is a relationship between student age and test scores.
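
As a rough sketch of that bivariate step, the following uses NumPy's correlation coefficient on made-up ages and scores (hypothetical numbers, purely for illustration):

```python
import numpy as np

# Hypothetical ages and test scores for six students (illustrative values only)
ages = np.array([14, 15, 15, 16, 17, 18])
scores = np.array([61, 64, 70, 72, 75, 83])

r = np.corrcoef(ages, scores)[0, 1]  # Pearson correlation coefficient between the two variables
print(round(r, 2))                   # a value near +1 would suggest older students tend to score higher
```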

The preparation and reporting of financial statements is an example of descriptive statistics. Analyzing that financial information to make decisions on the future is inferential statistics.

One essential aspect of descriptive statistics is graphical representation. Visualizing data distributions effectively can be incredibly powerful, and this is done in several ways.

Histograms are tools for displaying the distribution of numerical data. They divide the data into bins or intervals and represent the frequency or count of data points falling into each bin through bars of varying heights. Histograms help identify the shape of the distribution, central tendency, and variability of the data.

Another visualization is boxplots. Boxplots, also known as box-and-whisker plots, provide a concise summary of a data distribution by highlighting key summary statistics including the median (middle line inside the box), quartiles (edges of the box), and potential outliers (points outside the "whiskers"). Boxplots visually depict the spread and skewness of the data and are particularly useful for comparing distributions across different groups or variables.
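
As a sketch, assuming matplotlib and NumPy are available and using randomly generated placeholder values rather than any real dataset, both plots can be produced in a few lines:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.normal(loc=65, scale=10, size=200)  # placeholder scores, roughly bell-shaped

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(data, bins=15)   # histogram: how many values fall into each bin
ax1.set_title("Histogram")
ax2.boxplot(data)         # boxplot: median, quartiles, and potential outliers
ax2.set_title("Boxplot")
plt.tight_layout()
plt.show()
```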

Anytime descriptive statistics are being discussed, it's important to note outliers. Outliers are data points that significantly differ from other observations in a dataset. These could be errors, anomalies, or rare events within the data.

Detecting and managing outliers is a step in descriptive statistics to ensure accurate and reliable data analysis. To identify outliers, you can use graphical techniques (such as boxplots or scatter plots) or statistical methods (such as Z-score or IQR method). These approaches help pinpoint observations that deviate substantially from the overall pattern of the data.
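
Here is a minimal sketch of both approaches on a small made-up sample; the 1.5 × IQR rule and a z-score cutoff of roughly 2.5 to 3 are common conventions, not the only possible choices:

```python
import numpy as np

data = np.array([12, 14, 15, 15, 16, 17, 18, 95])  # hypothetical sample with one suspicious value

# IQR method: flag points lying more than 1.5 * IQR outside the quartiles
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
iqr_outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]

# Z-score method: flag points far from the mean in standard-deviation units
z = (data - data.mean()) / data.std()
z_outliers = data[np.abs(z) > 2.5]

print(iqr_outliers, z_outliers)  # both methods flag 95 here
```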

The presence of outliers can have a notable impact on descriptive statistics, skewing results and affecting the interpretation of data. Outliers can disproportionately influence measures of central tendency, such as the mean, pulling it toward their extreme values. For example, the mean of the data set (1, 1, 1, 997) is 250, even though that value is hardly representative of the data. This distortion can lead to misleading conclusions about the typical behavior of the data set.

Depending on the context, outliers can be removed if they are genuinely erroneous or irrelevant. Alternatively, they may hold important information and should be kept for the value they can demonstrate. As you analyze your data, consider what outliers can contribute and whether it makes more sense to strike those data points from your descriptive statistics calculations.

Descriptive Statistics vs. Inferential Statistics

Descriptive statistics have a different function than inferential statistics, which are used to make decisions or to apply characteristics from one data set (typically a sample) to another (such as a broader population).

Imagine another example where a company sells hot sauce. The company gathers data such as the count of sales , average quantity purchased per transaction , and average sale per day of the week. All of this information is descriptive, as it tells a story of what actually happened in the past. In this case, it is not being used beyond being informational.

Let's say the same company wants to roll out a new hot sauce. It gathers the same sales data above, but it uses that information to make predictions about what the sales of the new hot sauce will be. The act of using descriptive statistics and applying their characteristics to a different data set makes this an exercise in inferential statistics. We are no longer simply summarizing data; we are using it to predict what will happen with an entirely different body of data (the new hot sauce product).

What Is Descriptive Statistics?

Descriptive statistics is a means of describing the features of a data set by generating summaries about data samples. It is often presented as a summary that explains the contents of the data. For example, a population census may include descriptive statistics such as the ratio of men to women in a specific city.

What Are Examples of Descriptive Statistics?

Descriptive statistics are informational and meant to describe the actual characteristics of a data set. When analyzing numbers regarding the prior Major League Baseball season, descriptive statistics include the highest batting average for a single player, the number of runs allowed per team, and the average wins per division.

What Is the Main Purpose of Descriptive Statistics?

The main purpose of descriptive statistics is to provide information about a data set. In the example above, there are hundreds of baseball players who play in thousands of games. Descriptive statistics summarize that large amount of data into several useful bits of information.

What Are the Types of Descriptive Statistics?

The three main types of descriptive statistics are frequency distribution, central tendency, and variability of a data set. The frequency distribution records how often data occurs, central tendency records the data's center point of distribution, and variability of a data set records its degree of dispersion.

Can Descriptive Statistics Be Used to Make Inference or Predictions?

No. While these descriptives help understand data attributes, inferential statistical techniques—a separate branch of statistics—are required to understand how variables interact with one another in a data set.

Descriptive statistics refers to the analysis, summary, and communication of findings that describe a data set. Often not useful for decision-making, descriptive statistics still hold value in explaining high-level summaries of a set of information such as the mean, median, mode, variance, range, and count of information.




Quant Analysis 101: Descriptive Statistics

Everything You Need To Get Started (With Examples)

By: Derek Jansen (MBA) | Reviewers: Kerryn Warren (PhD) | October 2023

If you’re new to quantitative data analysis , one of the first terms you’re likely to hear being thrown around is descriptive statistics. In this post, we’ll unpack the basics of descriptive statistics, using straightforward language and loads of examples . So grab a cup of coffee and let’s crunch some numbers!

Overview: Descriptive Statistics

  • What are descriptive statistics?
  • Descriptive vs inferential statistics
  • Why the descriptives matter
  • The “Big 7” descriptive statistics
  • Key takeaways

At the simplest level, descriptive statistics summarise and describe relatively basic but essential features of a quantitative dataset – for example, a set of survey responses. They provide a snapshot of the characteristics of your dataset and allow you to better understand, roughly, how the data are “shaped” (more on this later). For example, a descriptive statistic could include the proportion of males and females within a sample or the percentages of different age groups within a population.

Another common descriptive statistic is the humble average (which in statistics-talk is called the mean). For example, if you undertook a survey and asked people to rate their satisfaction with a particular product on a scale of 1 to 10, you could then calculate the average rating. This is a very basic statistic, but as you can see, it gives you some idea of how the data for this variable are shaped.

Descriptive statistics summarise and describe relatively basic but essential features of a quantitative dataset, including its “shape”

What about inferential statistics?

Now, you may have also heard the term inferential statistics being thrown around, and you’re probably wondering how that’s different from descriptive statistics. Simply put, descriptive statistics describe and summarise the sample itself , while inferential statistics use the data from a sample to make inferences or predictions about a population .

Put another way, descriptive statistics help you understand your dataset , while inferential statistics help you make broader statements about the population , based on what you observe within the sample. If you’re keen to learn more, we cover inferential stats in another post , or you can check out the explainer video below.

Why do descriptive statistics matter?

While descriptive statistics are relatively simple from a mathematical perspective, they play a very important role in any research project . All too often, students skim over the descriptives and run ahead to the seemingly more exciting inferential statistics, but this can be a costly mistake.

The reason for this is that descriptive statistics help you, as the researcher, comprehend the key characteristics of your sample without getting lost in vast amounts of raw data. In doing so, they provide a foundation for your quantitative analysis . Additionally, they enable you to quickly identify potential issues within your dataset – for example, suspicious outliers, missing responses and so on. Just as importantly, descriptive statistics inform the decision-making process when it comes to choosing which inferential statistics you’ll run, as each inferential test has specific requirements regarding the shape of the data.

Long story short, it’s essential that you take the time to dig into your descriptive statistics before looking at more “advanced” inferentials. It’s also worth noting that, depending on your research aims and questions, descriptive stats may be all that you need in any case . So, don’t discount the descriptives! 


The “Big 7” descriptive statistics

With the what and why out of the way, let’s take a look at the most common descriptive statistics. Beyond the counts, proportions and percentages we mentioned earlier, we have what we call the “Big 7” descriptives. These can be divided into two categories – measures of central tendency and measures of dispersion.

Measures of central tendency

True to the name, measures of central tendency describe the centre or “middle section” of a dataset. In other words, they provide some indication of what a “typical” data point looks like within a given dataset. The three most common measures are:

  • The mean, which is the mathematical average of a set of numbers – in other words, the sum of all numbers divided by the count of all numbers.
  • The median, which is the middlemost number in a set of numbers, when those numbers are ordered from lowest to highest.
  • The mode, which is the most frequently occurring number in a set of numbers (in any order). Naturally, a dataset can have one mode, no mode (no number occurs more than once) or multiple modes.

To make this a little more tangible, let’s look at a sample dataset, along with the corresponding mean, median and mode. This dataset reflects the service ratings (on a scale of 1 – 10) from 15 customers.

Example set of descriptive stats

As you can see, the mean of 5.8 is the average rating across all 15 customers. Meanwhile, 6 is the median . In other words, if you were to list all the responses in order from low to high, Customer 8 would be in the middle (with their service rating being 6). Lastly, the number 5 is the most frequent rating (appearing 3 times), making it the mode.

Together, these three descriptive statistics give us a quick overview of how these customers feel about the service levels at this business. In other words, most customers feel rather lukewarm and there’s certainly room for improvement. From a more statistical perspective, this also means that the data tend to cluster around the 5-6 mark , since the mean and the median are fairly close to each other.

To take this a step further, let’s look at the frequency distribution of the responses . In other words, let’s count how many times each rating was received, and then plot these counts onto a bar chart.

Example frequency distribution of descriptive stats

As you can see, the responses tend to cluster toward the centre of the chart , creating something of a bell-shaped curve. In statistical terms, this is called a normal distribution .

As you delve into quantitative data analysis, you’ll find that normal distributions are very common , but they’re certainly not the only type of distribution. In some cases, the data can lean toward the left or the right of the chart (i.e., toward the low end or high end). This lean is reflected by a measure called skewness , and it’s important to pay attention to this when you’re analysing your data, as this will have an impact on what types of inferential statistics you can use on your dataset.

Example of skewness

Measures of dispersion

While the measures of central tendency provide insight into how “centred” the dataset is, it’s also important to understand how dispersed that dataset is . In other words, to what extent the data cluster toward the centre – specifically, the mean. In some cases, the majority of the data points will sit very close to the centre, while in other cases, they’ll be scattered all over the place. Enter the measures of dispersion, of which there are three:

  • Range, which measures the difference between the largest and smallest number in the dataset. In other words, it indicates how spread out the dataset really is.
  • Variance, which measures how much each number in a dataset varies from the mean (average). More technically, it calculates the average of the squared differences between each number and the mean. A higher variance indicates that the data points are more spread out, while a lower variance suggests that the data points are closer to the mean.
  • Standard deviation, which is the square root of the variance. It serves the same purposes as the variance, but is a bit easier to interpret as it presents a figure that is in the same unit as the original data. You’ll typically present this statistic alongside the means when describing the data in your research.
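
For readers who prefer to see this in code, here is a short sketch using pandas on a hypothetical set of 15 service ratings (illustrative numbers, not the dataset shown in the figures above) that reports the central tendency, dispersion and skewness in one go:

```python
import pandas as pd

# Hypothetical ratings on a 1-10 scale (illustrative only, not the article's dataset)
ratings = pd.Series([2, 3, 4, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10])

print(ratings.mean(), ratings.median(), ratings.mode().tolist())    # central tendency
print(ratings.max() - ratings.min(), ratings.var(), ratings.std())  # range, sample variance, sample std dev
print(ratings.skew())                                               # skewness: sign shows which way the data lean
```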

Again, let’s look at our sample dataset to make this all a little more tangible.


As you can see, the range of 8 reflects the difference between the highest rating (10) and the lowest rating (2). The standard deviation of 2.18 tells us that on average, results within the dataset are 2.18 away from the mean (of 5.8), reflecting a relatively dispersed set of data .

For the sake of comparison, let’s look at another much more tightly grouped (less dispersed) dataset.

Example of skewed data

As you can see, all the ratings lie between 5 and 8 in this dataset, resulting in a much smaller range, variance and standard deviation. You might also notice that the data are clustered toward the right side of the graph – in other words, the data are skewed. If we calculate the skewness for this dataset, we get a result of -0.12; the negative sign confirms this lean toward the higher end (statisticians would call this a slight negative, or left, skew, since the longer tail points toward the lower values).

In summary, range, variance and standard deviation all provide an indication of how dispersed the data are. These measures are important because they help you interpret the measures of central tendency within context. In other words, if your measures of dispersion are all fairly high numbers, you need to interpret your measures of central tendency with some caution, as the results are not particularly centred. Conversely, if the data are all tightly grouped around the mean (i.e., low dispersion), the mean becomes a much more “meaningful” statistic.

Key Takeaways

We’ve covered quite a bit of ground in this post. Here are the key takeaways:

  • Descriptive statistics, although relatively simple, are a critically important part of any quantitative data analysis.
  • Measures of central tendency include the mean (average), median and mode.
  • Skewness indicates whether a dataset leans to one side or another.
  • Measures of dispersion include the range, variance and standard deviation.

If you’d like hands-on help with your descriptive statistics (or any other aspect of your research project), check out our private coaching service , where we hold your hand through each step of the research journey. 


Descriptive Statistics: Reporting the Answers to the 5 Basic Questions of Who, What, Why, When, Where, and a Sixth, So What?

Affiliation.

  • 1 From the Department of Surgery and Perioperative Care, Dell Medical School at the University of Texas at Austin, Austin, Texas.
  • PMID: 28891910
  • DOI: 10.1213/ANE.0000000000002471

Descriptive statistics are specific methods basically used to calculate, describe, and summarize collected research data in a logical, meaningful, and efficient way. Descriptive statistics are reported numerically in the manuscript text and/or in its tables, or graphically in its figures. This basic statistical tutorial discusses a series of fundamental concepts about descriptive statistics and their reporting. The mean, median, and mode are 3 measures of the center or central tendency of a set of data. In addition to a measure of its central tendency (mean, median, or mode), another important characteristic of a research data set is its variability or dispersion (ie, spread). In simplest terms, variability is how much the individual recorded scores or observed values differ from one another. The range, standard deviation, and interquartile range are 3 measures of variability or dispersion. The standard deviation is typically reported for a mean, and the interquartile range for a median. Testing for statistical significance, along with calculating the observed treatment effect (or the strength of the association between an exposure and an outcome), and generating a corresponding confidence interval are 3 tools commonly used by researchers (and their collaborating biostatistician or epidemiologist) to validly make inferences and more generalized conclusions from their collected data and descriptive statistics. A number of journals, including Anesthesia & Analgesia, strongly encourage or require the reporting of pertinent confidence intervals. A confidence interval can be calculated for virtually any variable or outcome measure in an experimental, quasi-experimental, or observational research study design. Generally speaking, in a clinical trial, the confidence interval is the range of values within which the true treatment effect in the population likely resides. In an observational study, the confidence interval is the range of values within which the true strength of the association between the exposure and the outcome (eg, the risk ratio or odds ratio) in the population likely resides. There are many possible ways to graphically display or illustrate different types of data. While there is often latitude as to the choice of format, ultimately, the simplest and most comprehensible format is preferred. Common examples include a histogram, bar chart, line chart or line graph, pie chart, scatterplot, and box-and-whisker plot. Valid and reliable descriptive statistics can answer basic yet important questions about a research data set, namely: "Who, What, Why, When, Where, How, How Much?"



Descriptive Statistics – Types, Methods and Examples

Descriptive Statistics

Descriptive statistics is a branch of statistics that deals with the summarization and description of collected data. This type of statistics is used to simplify and present data in a manner that is easy to understand, often through visual or numerical methods. Descriptive statistics is primarily concerned with measures of central tendency, variability, and distribution, as well as graphical representations of data.

Here are the main components of descriptive statistics:

  • Measures of Central Tendency : These provide a summary statistic that represents the center point or typical value of a dataset. The most common measures of central tendency are the mean (average), median (middle value), and mode (most frequent value).
  • Measures of Dispersion or Variability : These provide a summary statistic that represents the spread of values in a dataset. Common measures of dispersion include the range (difference between the highest and lowest values), variance (average of the squared differences from the mean), standard deviation (square root of the variance), and interquartile range (difference between the upper and lower quartiles).
  • Measures of Position : These are used to understand the distribution of values within a dataset. They include percentiles and quartiles.
  • Graphical Representations : Data can be visually represented using various methods like bar graphs, histograms, pie charts, box plots, and scatter plots. These visuals provide a clear, intuitive way to understand the data.
  • Measures of Association : These measures provide insight into the relationships between variables in the dataset, such as correlation and covariance.

Descriptive Statistics Types

Descriptive statistics can be classified into two types:

Measures of Central Tendency

These measures help describe the center point or average of a data set. There are three main types:

  • Mean : The average value of the dataset, obtained by adding all the data points and dividing by the number of data points.
  • Median : The middle value of the dataset, obtained by ordering all data points and picking out the one in the middle (or the average of the two middle numbers if the dataset has an even number of observations).
  • Mode : The most frequently occurring value in the dataset.

Measures of Variability (or Dispersion)

These measures describe the spread or variability of the data points in the dataset. There are four main types:

  • Range : The difference between the largest and smallest values in the dataset.
  • Variance : The average of the squared differences from the mean.
  • Standard Deviation : The square root of the variance, giving a measure of dispersion that is in the same units as the original dataset.
  • Interquartile Range (IQR) : The range between the first quartile (25th percentile) and the third quartile (75th percentile), which provides a measure of variability that is resistant to outliers.

Descriptive Statistics Formulas

Here are some of the most commonly used formulas in descriptive statistics:

Mean (μ or x̄) :

The average of all the numbers in the dataset. It is computed by summing all the observations and dividing by the number of observations.

Formula : μ = Σx/n or x̄ = Σx/n (where Σx is the sum of all observations and n is the number of observations)

Median :

The middle value in the dataset when the observations are arranged in ascending or descending order. If there is an even number of observations, the median is the average of the two middle numbers.

Mode :

The most frequently occurring number in the dataset. There’s no formula for this as it’s determined by observation.

Range :

The difference between the highest (max) and lowest (min) values in the dataset.

Formula : Range = max – min

Variance (σ² or s²) :

The average of the squared differences from the mean. Variance is a measure of how spread out the numbers in the dataset are.

Population Variance formula : σ² = Σ(x – μ)² / N

Sample Variance formula : s² = Σ(x – x̄)² / (n – 1)

(where x is each individual observation, μ is the population mean, x̄ is the sample mean, N is the size of the population, and n is the size of the sample)

Standard Deviation (σ or s) :

The square root of the variance. It measures the amount of variability or dispersion for a set of data.

Population Standard Deviation formula : σ = √σ²

Sample Standard Deviation formula : s = √s²

Interquartile Range (IQR) :

The range between the first quartile (Q1, 25th percentile) and the third quartile (Q3, 75th percentile). It measures statistical dispersion, or how far apart the data points are.

Formula : IQR = Q3 – Q1
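
These formulas map directly onto code. Below is a brief sketch with NumPy on a small made-up sample; note the ddof argument, which switches between the population and sample versions of the variance and standard deviation:

```python
import numpy as np

x = np.array([4, 8, 6, 5, 3, 7])  # hypothetical observations

pop_var = np.var(x)            # population variance: sigma^2 = sum((x - mu)^2) / N   (ddof=0, the default)
samp_var = np.var(x, ddof=1)   # sample variance:     s^2 = sum((x - xbar)^2) / (n - 1)
pop_sd = np.sqrt(pop_var)      # population standard deviation
samp_sd = np.sqrt(samp_var)    # sample standard deviation

q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1                  # IQR = Q3 - Q1

print(pop_var, samp_var, pop_sd, samp_sd, iqr)
```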

Descriptive Statistics Methods

Here are some of the key methods used in descriptive statistics:

Tabulation

This method involves arranging data into a table format, making it easier to understand and interpret. Tables often show the frequency distribution of variables.

Graphical Representation

This method involves presenting data visually to help reveal patterns, trends, outliers, or relationships between variables. There are many types of graphs used, such as bar graphs, histograms, pie charts, line graphs, box plots, and scatter plots.

Calculation of Central Tendency Measures

This involves determining the mean, median, and mode of a dataset. These measures indicate where the center of the dataset lies.

Calculation of Dispersion Measures

This involves calculating the range, variance, standard deviation, and interquartile range. These measures indicate how spread out the data is.

Calculation of Position Measures

This involves determining percentiles and quartiles, which tell us about the position of particular data points within the overall data distribution.

Calculation of Association Measures

This involves calculating statistics like correlation and covariance to understand relationships between variables.

Summary Statistics

Often, a collection of several descriptive statistics is presented together in what’s known as a “summary statistics” table. This provides a comprehensive snapshot of the data at a glance.

Descriptive Statistics Examples

Descriptive Statistics Examples are as follows:

Example 1: Student Grades

Let’s say a teacher has the following set of grades for 7 students: 85, 90, 88, 92, 78, 88, and 94. The teacher could use descriptive statistics to summarize this data:

  • Mean (average) : (85 + 90 + 88 + 92 + 78 + 88 + 94)/7 = 615/7 ≈ 87.9
  • Median (middle value) : First, rearrange the grades in ascending order (78, 85, 88, 88, 90, 92, 94). The median grade is 88.
  • Mode (most frequent value) : The grade 88 appears twice, more frequently than any other grade, so it’s the mode.
  • Range (difference between highest and lowest) : 94 (highest) – 78 (lowest) = 16
  • Variance and Standard Deviation : These would be calculated using the appropriate formulas, providing a measure of the dispersion of the grades; the code sketch below computes them alongside the other measures.
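
A quick way to check these numbers is to run the grades through Python's standard-library statistics module; this sketch reproduces the summary above:

```python
import statistics

grades = [85, 90, 88, 92, 78, 88, 94]

print(statistics.mean(grades))      # ~87.86
print(statistics.median(grades))    # 88
print(statistics.mode(grades))      # 88 (appears twice)
print(max(grades) - min(grades))    # 16
print(statistics.variance(grades))  # sample variance
print(statistics.stdev(grades))     # sample standard deviation
```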

Example 2: Survey Data

A researcher conducts a survey on the number of hours of TV watched per day by people in a particular city. They collect data from 1,000 respondents and can use descriptive statistics to summarize this data:

  • Mean : Calculate the average hours of TV watched by adding all the responses and dividing by the total number of respondents.
  • Median : Sort the data and find the middle value.
  • Mode : Identify the most frequently reported number of hours watched.
  • Histogram : Create a histogram to visually display the frequency of responses. This could show, for example, that the majority of people watch 2-3 hours of TV per day.
  • Standard Deviation : Calculate this to find out how much variation there is from the average.

Importance of Descriptive Statistics

Descriptive statistics are fundamental in the field of data analysis and interpretation, as they provide the first step in understanding a dataset. Here are a few reasons why descriptive statistics are important:

  • Data Summarization : Descriptive statistics provide simple summaries about the measures and samples you have collected. With a large dataset, it’s often difficult to identify patterns or tendencies just by looking at the raw data. Descriptive statistics provide numerical and graphical summaries that can highlight important aspects of the data.
  • Data Simplification : They simplify large amounts of data in a sensible way. Each descriptive statistic reduces lots of data into a simpler summary, making it easier to understand and interpret the dataset.
  • Identification of Patterns and Trends : Descriptive statistics can help identify patterns and trends in the data, providing valuable insights. Measures like the mean and median can tell you about the central tendency of your data, while measures like the range and standard deviation tell you about the dispersion.
  • Data Comparison : By summarizing data into measures such as the mean and standard deviation, it’s easier to compare different datasets or different groups within a dataset.
  • Data Quality Assessment : Descriptive statistics can help identify errors or outliers in the data, which might indicate issues with data collection or entry.
  • Foundation for Further Analysis : Descriptive statistics are typically the first step in data analysis. They help create a foundation for further statistical or inferential analysis. In fact, advanced statistical techniques often assume that one has first examined their data using descriptive methods.

When to use Descriptive Statistics

They can be used in a wide range of situations, including:

  • Understanding a New Dataset : When you first encounter a new dataset, using descriptive statistics is a useful first step to understand the main characteristics of the data, such as the central tendency, dispersion, and distribution.
  • Data Exploration in Research : In the initial stages of a research project, descriptive statistics can help to explore the data, identify trends and patterns, and generate hypotheses for further testing.
  • Presenting Research Findings : Descriptive statistics can be used to present research findings in a clear and understandable way, often using visual aids like graphs or charts.
  • Monitoring and Quality Control : In fields like business or manufacturing, descriptive statistics are often used to monitor processes, track performance over time, and identify any deviations from expected standards.
  • Comparing Groups : Descriptive statistics can be used to compare different groups or categories within your data. For example, you might want to compare the average scores of two groups of students, or the variance in sales between different regions.
  • Reporting Survey Results : If you conduct a survey, you would use descriptive statistics to summarize the responses, such as calculating the percentage of respondents who agree with a certain statement.

Applications of Descriptive Statistics

Descriptive statistics are widely used in a variety of fields to summarize, represent, and analyze data. Here are some applications:

  • Business : Businesses use descriptive statistics to summarize and interpret data such as sales figures, customer feedback, or employee performance. For instance, they might calculate the mean sales for each month to understand trends, or use graphical representations like bar charts to present sales data.
  • Healthcare : In healthcare, descriptive statistics are used to summarize patient data, such as age, weight, blood pressure, or cholesterol levels. They are also used to describe the incidence and prevalence of diseases in a population.
  • Education : Educators use descriptive statistics to summarize student performance, like average test scores or grade distribution. This information can help identify areas where students are struggling and inform instructional decisions.
  • Social Sciences : Social scientists use descriptive statistics to summarize data collected from surveys, experiments, and observational studies. This can involve describing demographic characteristics of participants, response frequencies to survey items, and more.
  • Psychology : Psychologists use descriptive statistics to describe the characteristics of their study participants and the main findings of their research, such as the average score on a psychological test.
  • Sports : Sports analysts use descriptive statistics to summarize athlete and team performance, such as batting averages in baseball or points per game in basketball.
  • Government : Government agencies use descriptive statistics to summarize data about the population, such as census data on population size and demographics.
  • Finance and Economics : In finance, descriptive statistics can be used to summarize past investment performance or economic data, such as changes in stock prices or GDP growth rates.
  • Quality Control : In manufacturing, descriptive statistics can be used to summarize measures of product quality, such as the average dimensions of a product or the frequency of defects.

Limitations of Descriptive Statistics

While descriptive statistics are a crucial part of data analysis and provide valuable insights about a dataset, they do have certain limitations:

  • Lack of Depth : Descriptive statistics provide a summary of your data, but they can oversimplify the data, resulting in a loss of detail and potentially significant nuances.
  • Vulnerability to Outliers : Some descriptive measures, like the mean, are sensitive to outliers. A single extreme value can significantly skew your mean, making it less representative of your data.
  • Inability to Make Predictions : Descriptive statistics describe what has been observed in a dataset. They don’t allow you to make predictions or generalizations about unobserved data or larger populations.
  • No Insight into Correlations : While some descriptive statistics can hint at potential relationships between variables, they don’t provide detailed insights into the nature or strength of these relationships.
  • No Causality or Hypothesis Testing : Descriptive statistics cannot be used to determine cause and effect relationships or to test hypotheses. For these purposes, inferential statistics are needed.
  • Can Mislead : When used improperly, descriptive statistics can be used to present a misleading picture of the data. For instance, choosing to only report the mean without also reporting the standard deviation or range can hide a large amount of variability in the data.


Purdue Online Writing Lab (Purdue OWL)

Descriptive Statistics


The mean, the mode, the median, the range, and the standard deviation are all examples of descriptive statistics. Descriptive statistics are used because in most cases, it isn't possible to present all of your data in any form that your reader will be able to quickly interpret.

Generally, when writing descriptive statistics, you want to present at least one form of central tendency (or average), that is, either the mean, median, or mode. In addition, you should present one form of variability , usually the standard deviation.

Measures of Central Tendency and Other Commonly Used Descriptive Statistics

The mean, median, and the mode are all measures of central tendency. They attempt to describe what the typical data point might look like. In essence, they are all different forms of 'the average.' When writing statistics, you never want to say 'average' because it is difficult, if not impossible, for your reader to understand if you are referring to the mean, the median, or the mode.

The mean is the most common form of central tendency, and is what most people usually are referring to when they say average. It is simply the total sum of all the numbers in a data set, divided by the total number of data points. For example, the following data set has a mean of 4: {-1, 0, 1, 16}. That is, the sum of the values (16) divided by the number of data points (4) is 4. If there isn't a good reason to use one of the other forms of central tendency, then you should use the mean to describe the central tendency.

The median is simply the middle value of a data set. In order to calculate the median, all values in the data set need to be ordered, from either highest to lowest, or vice versa. If there is an odd number of values in a data set, then the median is easy to identify. If there is an even number of values, the usual convention is to take the mean of the two middle values. The median is useful when describing data sets that are skewed or have extreme values. Incomes of baseball players, for example, are commonly reported using a median because a small minority of baseball players makes a lot of money, while most players make more modest amounts. The median is less influenced by extreme scores than the mean.

The mode is the most commonly occurring number in the data set. The mode is best used when you want to indicate the most common response or item in a data set. For example, if you wanted to predict the score of the next football game, you may want to know what the most common score is for the visiting team, but having an average score of 15.3 won't help you if it is impossible to score 15.3 points. Likewise, a median score may not be very informative either, if you are interested in what score is most likely.

Standard Deviation

The standard deviation is a measure of variability (it is not a measure of central tendency). Conceptually it is best viewed as the 'average distance that individual data points are from the mean.' Data sets that are highly clustered around the mean have lower standard deviations than data sets that are spread out.

For example, the first data set would have a higher standard deviation than the second data set:

Notice that both groups have the same mean (5) and median (also 5), but the two groups contain different numbers and are organized much differently. This organization of a data set is often referred to as a distribution. Because the two data sets above have the same mean and median, but different standard deviation, we know that they also have different distributions. Understanding the distribution of a data set helps us understand how the data behave.
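
The two example data sets from the original page are not reproduced here, so the sketch below uses illustrative stand-in values chosen to share a mean and median of 5 while differing in spread:

```python
import statistics

spread_out = [1, 3, 5, 7, 9]  # hypothetical "first" data set: values far from the mean
clustered = [4, 5, 5, 5, 6]   # hypothetical "second" data set: values close to the mean

for data in (spread_out, clustered):
    print(statistics.mean(data),    # both means are 5
          statistics.median(data),  # both medians are 5
          statistics.stdev(data))   # but the standard deviations differ sharply
```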


Descriptive Statistics | Definitions, Types, Examples

Published on 4 November 2022 by Pritha Bhandari . Revised on 9 January 2023.

Descriptive statistics summarise and organise characteristics of a data set. A data set is a collection of responses or observations from a sample or entire population .

In quantitative research , after collecting data, the first step of statistical analysis is to describe characteristics of the responses, such as the average of one variable (e.g., age), or the relation between two variables (e.g., age and creativity).

The next step is inferential statistics , which help you decide whether your data confirms or refutes your hypothesis and whether it is generalisable to a larger population.

Table of contents

  • Types of descriptive statistics
  • Frequency distribution
  • Measures of central tendency
  • Measures of variability
  • Univariate descriptive statistics
  • Bivariate descriptive statistics
  • Frequently asked questions

There are 3 main types of descriptive statistics:

  • The distribution concerns the frequency of each value.
  • The central tendency concerns the averages of the values.
  • The variability or dispersion concerns how spread out the values are.

Types of descriptive statistics

You can apply these to assess only one variable at a time, in univariate analysis, or to compare two or more, in bivariate and multivariate analysis.

As a running example, imagine a survey that asks respondents how many times in the past year they did each of the following:

  • Go to a library
  • Watch a movie at a theater
  • Visit a national park

A data set is made up of a distribution of values, or scores. In tables or graphs, you can summarise the frequency of every possible value of a variable in numbers or percentages.

  • Simple frequency distribution table
  • Grouped frequency distribution table

From this table, you can see that more women than men or people with another gender identity took part in the study. In a grouped frequency distribution, you can group numerical response values and add up the number of responses for each group. You can also convert each of these numbers to percentages.

Measures of central tendency estimate the center, or average, of a data set. The mean , median and mode are 3 ways of finding the average.

Here we will demonstrate how to calculate the mean, median, and mode using the first 6 responses of our survey.

The mean , or M , is the most commonly used method for finding the average.

To find the mean, simply add up all response values and divide the sum by the total number of responses. The total number of responses or observations is called N .

The median is the value that’s exactly in the middle of a data set.

To find the median, order each response value from the smallest to the biggest. Then, the median is the number in the middle. If there are two numbers in the middle, find their mean.

The mode is simply the most popular or most frequent response value. A data set can have no mode, one mode, or more than one mode.

To find the mode, order your data set from lowest to highest and find the response that occurs most frequently.

Measures of variability give you a sense of how spread out the response values are. The range, standard deviation and variance each reflect different aspects of spread.

The range gives you an idea of how far apart the most extreme response scores are. To find the range , simply subtract the lowest value from the highest value.

Standard deviation

The standard deviation ( s ) is the average amount of variability in your dataset. It tells you, on average, how far each score lies from the mean. The larger the standard deviation, the more variable the data set is.

There are six steps for finding the standard deviation:

  • List each score and find their mean.
  • Subtract the mean from each score to get the deviation from the mean.
  • Square each of these deviations.
  • Add up all of the squared deviations.
  • Divide the sum of the squared deviations by N – 1.
  • Find the square root of the number you found.

In the survey example (N = 6), the sum of the squared deviations is 421.5, so step 5 gives 421.5/5 = 84.3 and step 6 gives √84.3 ≈ 9.18.
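
Translating the six steps into code makes them easy to follow. This sketch uses hypothetical responses (not the survey data from the original table, which is not reproduced here):

```python
import math

scores = [15, 25, 30, 18, 22, 35]        # hypothetical responses (N = 6)

mean = sum(scores) / len(scores)         # step 1: list each score and find the mean
deviations = [x - mean for x in scores]  # step 2: subtract the mean from each score
squared = [d ** 2 for d in deviations]   # step 3: square each deviation
total = sum(squared)                     # step 4: add up the squared deviations
variance = total / (len(scores) - 1)     # step 5: divide by N - 1
std_dev = math.sqrt(variance)            # step 6: take the square root

print(round(std_dev, 2))
```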

The variance is the average of squared deviations from the mean. Variance reflects the degree of spread in the data set. The more spread the data, the larger the variance is in relation to the mean.

To find the variance, simply square the standard deviation. The symbol for variance is s 2 .

Univariate descriptive statistics focus on only one variable at a time. It’s important to examine data from each variable separately using multiple measures of distribution, central tendency and spread. Programs like SPSS and Excel can be used to easily calculate these.

If you were to only consider the mean as a measure of central tendency, your impression of the ‘middle’ of the data set can be skewed by outliers, unlike the median or mode.

Likewise, while the range is sensitive to extreme values, you should also consider the standard deviation and variance to get easily comparable measures of spread.

If you’ve collected data on more than one variable, you can use bivariate or multivariate descriptive statistics to explore whether there are relationships between them.

In bivariate analysis, you simultaneously study the frequency and variability of two variables to see if they vary together. You can also compare the central tendency of the two variables before performing further statistical tests .

Multivariate analysis is the same as bivariate analysis but with more than two variables.

Contingency table

In a contingency table, each cell represents the intersection of two variables. Usually, an independent variable (e.g., gender) appears along the vertical axis and a dependent one appears along the horizontal axis (e.g., activities). You read ‘across’ the table to see how the independent and dependent variables relate to each other.

Interpreting a contingency table is easier when the raw data is converted to percentages. Percentages make each row comparable to the other by making it seem as if each group had only 100 observations or participants. When creating a percentage-based contingency table, you add the N for each independent variable on the end.

From this table, it is more clear that similar proportions of children and adults go to the library over 17 times a year. Additionally, children most commonly went to the library between 5 and 8 times, while for adults, this number was between 13 and 16.
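
As a sketch, assuming pandas is available and using hypothetical survey records, a contingency table of this kind can be built with pandas.crosstab; the normalize option converts raw counts into row percentages:

```python
import pandas as pd

# Hypothetical survey records: age group and favourite activity (illustrative only)
df = pd.DataFrame({
    "group": ["child", "child", "child", "adult", "adult", "adult", "adult"],
    "activity": ["library", "movies", "library", "park", "movies", "library", "movies"],
})

counts = pd.crosstab(df["group"], df["activity"])                             # raw counts per cell
row_pct = pd.crosstab(df["group"], df["activity"], normalize="index") * 100   # row percentages

print(counts)
print(row_pct.round(1))
```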

Scatter plots

A scatter plot is a chart that shows you the relationship between two or three variables. It’s a visual representation of the strength of a relationship.

In a scatter plot, you plot one variable along the x-axis and another one along the y-axis. Each data point is represented by a point in the chart.

From your scatter plot, you see that as the number of movies seen at movie theaters increases, the number of visits to the library decreases. Based on your visual assessment of a possible linear relationship, you perform further tests of correlation and regression.

Descriptive statistics: Scatter plot

Descriptive statistics summarise the characteristics of a data set. Inferential statistics allow you to test a hypothesis or assess whether your data is generalisable to the broader population.

The 3 main types of descriptive statistics concern the frequency distribution, central tendency, and variability of a dataset.

  • Distribution refers to the frequencies of different responses.
  • Measures of central tendency give you the average for each response.
  • Measures of variability show you the spread or dispersion of your dataset.
  • Univariate statistics summarise only one variable  at a time.
  • Bivariate statistics compare two variables .
  • Multivariate statistics compare more than two variables .



Introduction to Descriptive Statistics

Submitted: 04 July 2023 Reviewed: 20 July 2023 Published: 07 September 2023

DOI: 10.5772/intechopen.1002475


From the Edited Volume

Recent Advances in Biostatistics [Working Title]

B. Santhosh Kumar


This chapter offers a comprehensive exploration of descriptive statistics, tracing its historical development from Condorcet’s “average” concept to Galton and Pearson’s contributions. Emphasizing its pivotal role in academia, descriptive statistics serve as a fundamental tool for summarizing and analyzing data across disciplines. The chapter underscores how descriptive statistics drive research inspiration and guide analysis, and provide a foundation for advanced statistical techniques. It delves into their historical context, highlighting their organizational and presentational significance. Furthermore, the chapter accentuates the advantages of descriptive statistics in academia, including their ability to succinctly represent complex data, aid decision-making, and enhance research communication. It highlights the potency of visualization in discerning data patterns and explores emerging trends like large dataset analysis, Bayesian statistics, and nonparametric methods. Sources of variance intrinsic to descriptive statistics, such as sampling fluctuations, measurement errors, and outliers, are discussed, stressing the importance of considering these factors in data interpretation.

  • academic research
  • data analysis
  • data visualization
  • decision-making
  • research methodology
  • data summarization

Author Information

Olubunmi Alabi*

  • African University of Science and Technology, Abuja, Nigeria

Tosin Bukola

  • University of Greenwich, London, United Kingdom

*Address all correspondence to: [email protected]

1. Introduction

Descriptive statistics trace their origins to the French mathematician and philosopher Condorcet, who established the idea of the “average” as a means of summarizing data. Yet the widespread use of descriptive statistics in academic study did not begin until the 19th century. Francis Galton, who was interested in the study of human traits and attributes, was one of the major forerunners of descriptive statistics. Galton created various statistical methods that are still frequently applied in academic research today, such as the concepts of correlation and regression analysis. In the early 20th century, the English mathematician and statistician Karl Pearson popularized the “normal distribution,” the bell-shaped curve that characterizes the distribution of many natural phenomena. Moreover, Pearson developed a number of correlational measures and introduced the chi-square test, which evaluates the significance of variations between observed and expected frequencies. In the middle of the 20th century, the development of electronic computers sparked a revolution in statistical analysis and enabled new methods such as multivariate analysis and factor analysis. Descriptive statistics is the analysis and summarization of data to gain insights into its characteristics and distribution [ 1 ].

Descriptive statistics help researchers generate study ideas and guide further analysis by allowing them to explore data patterns and trends [ 2 ]. Descriptive statistics came to be used more often in academic research because they helped researchers better comprehend their datasets and served as a basis for more sophisticated statistical techniques. Similarly, descriptive statistics are used to summarize and analyze data in a variety of academic areas, including psychology, sociology, economics, education, and epidemiology [ 3 ]. Descriptive statistics continue to be a crucial research tool in academia today, giving researchers a method to compile and analyze data from many fields. Thanks to the development of new statistical techniques and computer tools, it is now simpler than ever to analyze and understand data, enabling researchers to make better-informed judgments based on their results. Descriptive statistics can benefit researchers in hypothesis creation and exploratory analysis by identifying trends, patterns, and correlations between variables in huge datasets [ 4 ]. Descriptive statistics are important in data-driven decision-making processes because they allow stakeholders to make educated decisions based on reliable data [ 5 ].

2. Background

The history of descriptive statistics may be traced back to the 17th century, when early pioneers like John Graunt and William Petty laid the groundwork for statistical analysis [ 6 ]. Descriptive statistics is a fundamental concept in academia that is widely used across many disciplines, including social sciences, economics, medicine, engineering, and business. Descriptive statistics provides a comprehensive background for understanding data by organizing, summarizing, and presenting information effectively [ 7 ]. In academia, descriptive statistics is used to summarize and analyze data, providing insights into the patterns, trends, and characteristics of a dataset. Similarly, in academic research, descriptive statistics are often used as a preliminary analysis technique to gain a better understanding of the dataset before applying more complex statistical methods. Descriptive statistics lay the groundwork for inferential statistics by assisting researchers in drawing inferences about a population based on observed sample data [ 8 ]. Descriptive statistics aid in the identification and analysis of outliers, which can give useful insights into unusual observations or data collection problems [ 9 ].

Descriptive statistics enable researchers to synthesize both quantitative and qualitative data, allowing for a thorough examination of factors [ 10 ]. Descriptive statistics can provide valuable information about the central tendency, variability, and distribution of the data, allowing researchers to make informed decisions about the appropriate statistical techniques to use. Descriptive statistics are an essential component of survey research technique, allowing researchers to efficiently summarize and display survey results [ 11 ]. Descriptive statistics may be used to summarize data as well as spot outliers, or observations that dramatically depart from the trend of the data as a whole. Finding outliers can help researchers spot any issues or abnormalities in the data so they can make the necessary modifications or repairs. In academic research, descriptive statistics are frequently employed to address research issues and evaluate hypotheses. Descriptive statistics, for instance, can be used to compare the average scores of two groups to see if there is a significant difference between them. In order to create new hypotheses or validate preexisting ideas, descriptive statistics may also be used to find patterns and correlations in the data.

There are several sources of variation that can affect the descriptive statistics of a data set, including:

  • Sampling variation: Descriptive statistics are often calculated using a sample of data rather than the entire population, so they can vary depending on the particular sample that is selected. This is known as sampling variation.
  • Measurement variation: Different measurement methods can produce different results, leading to variation in descriptive statistics. For example, if a scale is used to measure the weight of objects, slight differences in how the scale is used can produce slightly different measurements.
  • Data entry errors: Mistakes made during the data entry process can lead to variation in descriptive statistics. Even small errors, such as transposing two digits, can significantly impact the results.
  • Outliers: Outliers are extreme values that fall outside the expected range of values. These values can skew the descriptive statistics, making them appear more or less extreme than they actually are.
  • Natural variation: Natural variation refers to the inherent variability in the data itself. For example, if a data set contains measurements of the heights of trees, there will naturally be variation in those heights.

It is important to understand these sources of variation when interpreting and using descriptive statistics in academia. Properly accounting for them can help ensure that the descriptive statistics accurately reflect the underlying data.

Some emerging patterns in descriptive statistics in academia include:

  • Big data analysis: With the increasing availability of large data sets, researchers are using descriptive statistics to identify patterns and trends in the data. The use of big data analysis techniques, such as machine learning and data mining, is becoming more common in academic research.
  • Visualization techniques: Advances in data visualization techniques are enabling researchers to more easily identify patterns in data sets. For example, heat maps and scatter plots can be used to visualize the relationship between different variables.
  • Bayesian statistics: Bayesian statistics is an emerging area of research in academia that uses probability theory to make inferences about data. Bayesian statistics can provide more accurate estimates of descriptive statistics, particularly when dealing with complex data sets.
  • Non-parametric statistics: Non-parametric methods are becoming increasingly popular in academia, particularly when dealing with data sets that do not meet the assumptions of traditional parametric statistical tests. Non-parametric tests do not require the data to be normally distributed and can be more robust to outliers.
  • Open science practices: Practices such as pre-registration and data sharing are becoming more common in academia. This enables researchers to more easily replicate and verify the results of descriptive statistical analyses, which can improve the quality and reliability of research findings.

Overall, the emerging patterns in descriptive statistics in academia reflect the increasing availability of data, the need for more accurate and robust statistical techniques, and a growing emphasis on transparency and openness in research practices.

3. Benefits of descriptive statistics

The advantages of descriptive statistics extend beyond research and academia, with applications in commercial decision-making, public policy, and strategic planning [ 12 ]. The benefits of descriptive statistics include providing a clear and concise summary of data, aiding in decision-making processes, and facilitating effective communication of findings [ 13 ]. Descriptive statistics provide numerous benefits to academia, some of which include:

  • Summarization of data: Descriptive statistics allow researchers to quickly and efficiently summarize large data sets, providing a snapshot of the key characteristics of the data. This can help researchers identify patterns and trends in the data and can also help to simplify complex data sets.
  • Better decision-making: Descriptive statistics can help researchers make data-driven decisions. For example, if a researcher is comparing the effectiveness of two different treatments, descriptive statistics can be used to identify which treatment is more effective based on the data.
  • Visualization of data: Descriptive statistics can be used to create visualizations of data, which can make it easier to communicate research findings to others. Histograms, bar charts, and scatterplots are examples of data visualization techniques that may be used to graphically depict data in order to detect trends, outliers, and correlations [ 14 ]. Visualizations can also help to identify patterns and trends in the data that might not be immediately apparent from the raw data.
  • Hypothesis testing: Descriptive statistics are often used in hypothesis testing, which allows researchers to determine whether a particular hypothesis about a data set is supported by the data. This can help to validate research findings and increase confidence in the conclusions drawn from the data.
  • Improved data quality: Descriptive statistics can help to identify errors or inconsistencies in the data, which can help researchers improve the quality of the data. This can lead to more accurate research findings and a better understanding of the underlying phenomena.

Overall, the benefits of descriptive statistics in academia are many and varied. They help researchers summarize large data sets, make data-driven decisions, visualize data, validate research findings, and improve the quality of the data. By using descriptive statistics, researchers can gain valuable insights into complex data sets and make more informed decisions based on the data.

4. Practical applications of descriptive statistics

Descriptive statistics has practical applications in disciplines such as business, social sciences, healthcare, finance, and market research [ 15 ]. Descriptive statistics have a wide range of practical applications in academia, some of which include:

  • Data summarization: Descriptive statistics can be used to summarize large data sets, making it easier for researchers to understand the key characteristics of the data. This is particularly useful when dealing with complex data sets that contain many variables.
  • Hypothesis testing: Descriptive statistics can be used to test hypotheses about a data set. For example, researchers can use descriptive statistics to test whether the mean value of a particular variable is significantly different from a hypothesized value.
  • Data visualization: Descriptive statistics can be used to create visualizations of data, which can make it easier to identify patterns and trends. For example, a histogram or boxplot can be used to visualize the distribution of a variable.
  • Comparing groups: Descriptive statistics can be used to compare different groups within a data set. For example, researchers may compare the mean values of a particular variable between different demographic groups, such as age or gender.
  • Predictive modeling: Descriptive statistics can be used to build predictive models, which can be used to forecast future trends or outcomes. For example, a researcher might use descriptive statistics to identify the key variables that predict student performance in a particular course.

The practical applications of descriptive statistics in academia are wide-ranging and varied. They can be used in many different fields, including psychology, economics, sociology, and biology, among others, to provide insights into complex data sets and help researchers make data-driven decisions ( Figure 1 ).

Figure 1. Types of descriptive statistics. Ref: https://www.analyticssteps.com/blogs/types-descriptive-analysis-examples-steps .

Descriptive statistics is a useful tool for researchers in a variety of sectors since it allows them to express the major characteristics of a dataset, such as its frequency, central tendency, variability, and distribution.

4.1 Central tendency measurements

Central tendency metrics, such as the mean, median, and mode, are essential descriptive statistics that offer information about the average or typical value in a collection [ 16 ]. One of the primary purposes of descriptive statistics is to summarize data in a succinct and useful manner. Measures of central tendency, such as the median, are resistant to outliers and offer a more representative assessment of the average value in a skewed distribution [ 17 ]. The mean, median, and mode are measures of central tendency that are used to characterize the typical or central value of a dataset. The mean of a dataset is the arithmetic average, while the median is the middle value when the data are ordered by magnitude. The mode is the most frequently occurring value in the dataset. Central tendency measurements are one of the most important aspects of descriptive statistics, as they provide a summary of the “typical” value of a data set.

The three most commonly used measures of central tendency are:

  • Mean: The mean is calculated by adding up all the values in a data set and dividing by the total number of values. The mean is sensitive to outliers, as even one extreme value can greatly affect it.
  • Median: The median is the middle value in a data set when the values are ordered from smallest to largest. If the data set has an odd number of values, the median is the middle value; if it has an even number of values, the median is the average of the two middle values. The median is more robust to outliers than the mean.
  • Mode: The mode is the most common value in a data set. In some cases, there may be multiple modes (i.e., bimodal or multimodal distributions). The mode is useful for identifying the most frequently occurring value in a data set.

Each of these measures of central tendency provides a different perspective on the “typical” value of a data set, and which measure is most appropriate to use depends on the nature of the data and the research question being addressed. For example, if the data set contains extreme outliers, the median may be a better measure of central tendency than the mean. Conversely, if the data set is symmetrical and normally distributed, the mean may provide the best measure of central tendency.
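
A minimal Python sketch of these three measures, using the standard library's statistics module. The values are invented; the single large value is included only to show how the mean reacts to an outlier while the median barely moves.

```python
# Minimal sketch: mean, median, and mode with the statistics module (invented data).
import statistics

values = [2, 3, 3, 4, 5, 6, 40]            # 40 acts as an outlier

print(statistics.mean(values))              # 9  -> dragged upward by the outlier
print(statistics.median(values))            # 4  -> robust to the outlier
print(statistics.mode(values))              # 3  -> most frequently occurring value
```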

4.2 Variability indices

Another key part of descriptive statistics is determining data variability. The spread or dispersion of data points about the central tendency is quantified by variability indices such as the range, variance, and standard deviation [ 18 ]. Variability measures reveal information about the spread or dispersion of the data. Variability indices, such as the coefficient of variation, allow you to compare variability across datasets with different scales or units of measurement [ 19 ]. The range is the difference between the dataset’s largest and smallest values, while the variance and standard deviation measure how much the data values depart from the mean. Variability indices are measures used in descriptive statistics to provide information about how much the data varies or how spread out it is. Variability indices, such as the interquartile range, give insights into the data distribution while being less affected by extreme values than the standard deviation [ 20 ]. Some commonly used variability indices include:

  • Range: The range is the difference between the largest and smallest values in a data set. It provides a simple measure of the spread of the data but is sensitive to outliers.
  • Interquartile range (IQR): The IQR is the range of the middle 50% of the data. It is calculated by subtracting the 25th percentile (lower quartile) from the 75th percentile (upper quartile). The IQR is more robust to outliers than the range.
  • Variance: The variance is a measure of how spread out the data is around the mean. It is calculated by taking the average of the squared differences between each data point and the mean. The variance is sensitive to outliers.
  • Standard deviation: The standard deviation is the square root of the variance. It provides a measure of how much the data varies from the mean and is more commonly used than the variance because it has the same units as the original data.
  • Coefficient of variation (CV): The CV is a measure of relative variability, expressed as a percentage. It is calculated by dividing the standard deviation by the mean and multiplying by 100. The CV is useful for comparing variability across different data sets that have different units or scales.

These variability indices provide important information about the spread and variability of the data, which can help researchers better understand the characteristics of the data and draw meaningful conclusions from it.
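
A minimal NumPy sketch of the indices listed above, computed on a small set of invented values; the sample (n − 1) formulas are used here, which is one common convention rather than the only one.

```python
# Minimal sketch: range, IQR, variance, standard deviation, and CV (invented data).
import numpy as np

data = np.array([4.0, 6.0, 7.0, 8.0, 8.0, 9.0, 10.0])

data_range = data.max() - data.min()          # range
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                                 # interquartile range
variance = data.var(ddof=1)                   # sample variance (divides by n - 1)
std_dev = data.std(ddof=1)                    # sample standard deviation
cv = std_dev / data.mean() * 100              # coefficient of variation, in percent

print(data_range, iqr, round(variance, 2), round(std_dev, 2), round(cv, 1))
```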

4.3 Data visualization

Data may be represented visually using graphical approaches in addition to numerical metrics. Graphs and charts, such as histograms, box plots, and scatterplots, allow researchers to investigate data patterns and correlations, and the application of graphical approaches, such as scatterplots and heat maps, improves comprehension of correlations and patterns in large datasets [ 22 ]. Box plots and violin plots are efficient data visualization approaches for showing data distribution and spotting potential outliers [ 21 ]. Graphs may also be used to detect outliers, or data points that deviate dramatically from the rest of the data. Data visualization is an important aspect of descriptive statistics, as it allows researchers to communicate complex data in a visual and easily understandable format. Some common types of data visualization used in descriptive statistics include:

  • Histograms: Histograms are used to display the distribution of a continuous variable. The data is divided into intervals (or “bins”), and the number of observations falling into each bin is displayed on the vertical axis. Histograms provide a visual representation of the shape of the distribution and can help to identify outliers or skewness.
  • Box plots: Box plots provide a graphical representation of the distribution of a continuous variable. The box represents the middle 50% of the data, with the median displayed as a horizontal line inside the box. The whiskers extend to the minimum and maximum values in the data set, and any outliers are displayed as points outside the whiskers. Box plots are useful for comparing distributions across different groups or for identifying outliers.
  • Scatter plots: Scatter plots are used to display the relationship between two continuous variables. Each data point is represented as a point on the graph, with one variable displayed on the horizontal axis and the other on the vertical axis. Scatter plots can help to identify patterns or relationships in the data, such as a positive or negative correlation.
  • Bar charts: Bar charts are used to display the distribution of a categorical variable. The categories are displayed on the horizontal axis, and the frequency or percentage of observations falling into each category is displayed on the vertical axis. Bar charts can help to compare the frequency of different categories or to display the results of a survey or questionnaire.
  • Heat maps: Heat maps are used to display the relationship between two categorical variables. The categories are displayed on both the horizontal and vertical axes, and the frequency or percentage of observations falling into each combination of categories is displayed using a color scale. Heat maps can help to identify patterns or relationships in the data, such as a higher frequency of observations in certain combinations of categories.

These types of data visualizations can help researchers communicate complex data in a clear and understandable format, and can also provide insights into characteristics of the data that may not be immediately apparent from the raw data.
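
As a small illustration of two of the plot types described above, here is a matplotlib sketch that draws a histogram and a box plot of the same randomly generated data. The data and figure settings are arbitrary choices made only for the example.

```python
# Minimal sketch: histogram and box plot of one simulated continuous variable.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=42)
data = rng.normal(loc=50, scale=10, size=500)   # simulated continuous variable

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.hist(data, bins=20)                         # shape of the distribution
ax1.set_title("Histogram")

ax2.boxplot(data)                               # median, IQR, and potential outliers
ax2.set_title("Box plot")

plt.tight_layout()
plt.show()
```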

4.4 Data cleaning and preprocessing

Data cleaning and preprocessing procedures, such as imputation methods for missing data, aid in the preservation of data integrity and the reduction of bias in descriptive analysis [ 23 ]. Before beginning any statistical analysis, be certain that the data is clean and well organized. Data cleaning is the process of discovering and fixing flaws or inconsistencies in data, such as missing values or outliers. Data preprocessing is the process of putting data into an appropriate format for analysis, for example by scaling or normalizing it. To ensure the correctness and dependability of descriptive statistics, data cleaning and preprocessing require finding and dealing with missing values, outliers, and data inconsistencies [ 25 ]. Data cleaning and preprocessing are essential steps in descriptive analysis, as they help to ensure that the data is accurate, complete, and ready for analysis. Some common data cleaning and preprocessing steps include:

  • Handling missing data: Missing data is a common problem in datasets and can impact the accuracy of the analysis. Depending on the amount of missing data, researchers may choose to remove incomplete cases or impute missing values using techniques such as mean imputation, regression imputation, or multiple imputation.
  • Handling outliers: Outliers are extreme values that differ from the majority of the data points and can distort the analysis. Outlier identification and removal procedures help increase the accuracy and reliability of descriptive statistics [ 24 ]. Researchers may choose to remove or transform outliers to better reflect the characteristics of the data.
  • Data transformation: Data transformation is used to normalize the data or to make it easier to analyze. Common transformations include logarithmic, square-root, or Box-Cox transformations.
  • Handling categorical data: Categorical data, such as nominal or ordinal data, may need to be recoded into numerical form before analysis. Researchers may also need to handle missing or inconsistent categories within the data.
  • Standardizing data: Standardizing data involves scaling the data to have a mean of zero and a standard deviation of one. This can be useful for comparing variables with different units or scales.
  • Data integration: Data integration involves merging or linking multiple datasets to create a single, comprehensive dataset for analysis. This may involve matching or merging datasets based on common variables or identifiers.

By performing these data cleaning and preprocessing steps, researchers can ensure that the data is accurate and ready for analysis, which can lead to more reliable and meaningful insights from the data.
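
A hypothetical pandas sketch of three of the steps above: median imputation of a missing value, flagging outliers with the common 1.5 × IQR rule, and standardization to z-scores. The column name and the values (including the extreme 120,000) are invented for illustration.

```python
# Hypothetical sketch: median imputation, IQR-based outlier flagging, and standardization.
import numpy as np
import pandas as pd

df = pd.DataFrame({"income": [2500, 3000, np.nan, 4200, 3600, 120000, 3300]})

# 1. Impute missing values with the column median.
df["income"] = df["income"].fillna(df["income"].median())

# 2. Flag values that fall outside 1.5 * IQR of the quartiles as potential outliers.
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
df["is_outlier"] = ~df["income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# 3. Standardize to mean 0 and standard deviation 1 (z-scores).
df["income_z"] = (df["income"] - df["income"].mean()) / df["income"].std()

print(df)
```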

5. Descriptive statistics in academic methodology

Descriptive statistics are important in academic methodology because they enable researchers to synthesize and describe data collected for research objectives [ 26 ]. Descriptive statistics is often used in combination with other statistical techniques, such as inferential statistics, to draw conclusions and make predictions from the data. In academic research, descriptive statistics is used in a variety of ways:

  • Describing sample characteristics: Descriptive statistics is used to describe the characteristics of a sample, such as the mean, median, and standard deviation of a variable. This information can be used to identify patterns, trends, or differences within the sample.
  • Identifying data outliers: Descriptive statistics can help researchers identify potential outliers or anomalies in the data, which can affect the validity of the results. For example, identifying extreme values in a dataset can help researchers investigate whether these values are due to measurement error or a true characteristic of the population.
  • Communicating research findings: Descriptive statistics is used to summarize and communicate research findings in a clear and concise manner. Graphs, charts, and tables can be used to display descriptive statistics in a way that is easy to understand and interpret.
  • Testing assumptions: Descriptive statistics can be used to test assumptions about the data, such as normality or homogeneity of variance, which are important for selecting appropriate statistical tests and interpreting the results.

Overall, descriptive statistics is a critical methodology in academic research that helps researchers describe and understand the characteristics of their data. By using descriptive statistics, researchers can draw meaningful insights and conclusions from their data and communicate these findings to others in a clear and concise manner.

6. Pitfalls of descriptive statistics

The possibility of misinterpretation, reliance on summary measures alone, and susceptibility to extreme values or outliers are all disadvantages of descriptive statistics [ 27 ]. While descriptive statistics is an essential tool in academic statistics, there are several potential pitfalls that researchers should be aware of:

  • Limited scope: Descriptive statistics can provide a useful summary of the characteristics of a dataset, but it is limited in its ability to provide insights into the underlying causes or mechanisms that drive the data. Descriptive statistics alone cannot establish causal relationships or test hypotheses.
  • Misleading interpretations: Descriptive statistics can be misleading if not interpreted correctly. For example, a small sample size may not accurately represent the population, and summary statistics such as the mean may not be meaningful if the data is not normally distributed.
  • Incomplete analysis: Descriptive statistics can only provide a limited view of the data, and researchers may need to use additional statistical techniques to fully analyze it. For example, hypothesis testing and regression analysis may be needed to establish relationships between variables and make predictions.
  • Biased data: Descriptive statistics can be biased if the data is not representative of the population of interest. Sampling bias, measurement bias, or non-response bias can all impact the validity of descriptive statistics.
  • Over-reliance on summary statistics: Summary statistics such as the mean or median may not provide a complete picture of the data. Visualizations and other descriptive statistics, such as measures of variability, can provide additional insight into the data.

To avoid these pitfalls, researchers should carefully consider the scope and limitations of descriptive statistics and use additional statistical techniques as needed. They should also ensure that their data is representative of the population of interest and interpret their descriptive statistics in a thoughtful and nuanced manner.

7. Conclusion

Researchers can test the normality assumptions of their data by using relevant descriptive statistics techniques such as measures of skewness and kurtosis [ 28 ]. Descriptive statistics has become a fundamental methodology in academic research that is used to summarize and describe the characteristics of a dataset, such as the central tendency, variability, and distribution of the data. It is used in a wide range of disciplines, including social sciences, natural sciences, engineering, and business. Descriptive statistics can be used to describe sample characteristics, identify data outliers, communicate research findings, and test assumptions. The kind of data, research topic, and particular aims of the study all influence the right choice and implementation of descriptive statistical approaches [ 29 ].

However, there are several potential pitfalls of descriptive statistics, including limited scope, misleading interpretations, incomplete analysis, biased data, and over-reliance on summary statistics. The use of descriptive statistics in data presentation can improve the interpretability of study findings, making complicated material more accessible to a larger audience [ 30 ]. To use descriptive statistics effectively in academic research, researchers should carefully consider the limitations and scope of the methodology, use additional statistical techniques as needed, ensure that their data is representative of the population of interest, and interpret their descriptive statistics in a thoughtful and nuanced manner.

Conflict of interest

The authors declare no conflict of interest.

  • 1. Agresti A, Franklin C. Statistics: The Art and Science of Learning from Data. Upper Saddle River, NJ: Pearson; 2009
  • 2. Norman GR, Streiner DL. Biostatistics: The Bare Essentials. 4th ed. Shelton (CT): PMPH-USA; 2014
  • 3. Cohen J, Cohen P, West SG, Aiken LS. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. New York: Routledge; 2013
  • 4. Osborne J. Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment. 2019; 10 (7):1-9
  • 5. Field A, Hole G. How to Design and Report Experiments. London: Sage; 2003
  • 6. Anders H. A History of Mathematical Statistics from 1750 to 1930. New York: Wiley; 1998. p. xvii+795. ISBN 0-471-17912-4
  • 7. Warner RM. Applied Statistics: From Bivariate Through Multivariate Techniques. 2nd ed. Thousand Oaks, CA: SAGE Publications; 2012
  • 8. Sullivan LM, Artino AR Jr. Analyzing and interpreting continuous data using ordinal regression. Journal of Graduate Medical Education. 2013; 5 (4):542-543
  • 9. Hoaglin DC, Mosteller F, Tukey JW. Understanding Robust and Exploratory Data Analysis. John Wiley & Sons; 2011
  • 10. Maxwell SE, Delaney HD, Kelley K. Designing Experiments and Analyzing Data: A Model Comparison Perspective. Routledge; 2017
  • 11. De Leeuw ED, Hox JJ. International Handbook of Survey Methodology. Routledge; 2008
  • 12. Chatfield C. The Analysis of Time Series: An Introduction. CRC Press; 2016
  • 13. Tabachnick BG, Fidell LS. Using Multivariate Statistics. Pearson; 2013
  • 14. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer; 2016
  • 15. Field A, Miles J, Field Z. Discovering Statistics Using R. Sage; 2012
  • 16. Howell DC. Statistical Methods for Psychology. Cengage Learning; 2013
  • 17. Wilcox RR. Modern Statistics for the Social and Behavioral Sciences: A Practical Introduction. CRC Press; 2017
  • 18. Hair JF, Black WC, Babin BJ, Anderson RE. Multivariate Data Analysis. Pearson; 2019
  • 19. Beasley TM, Schumacker RE. Multiple regression approach to analyzing contingency tables: Post hoc and planned comparison procedures. Journal of Experimental Education. 2013; 81 (3):310-312
  • 20. Dodge Y. The Concise Encyclopedia of Statistics. Springer Science & Business Media; 2008
  • 21. Krzywinski M, Altman N. Points of significance: Visualizing samples with box plots. Nature Methods. 2014; 11 (2):119-120
  • 22. Cleveland WS. Visualizing data. Hobart Press; 1993
  • 23. Little RJ, Rubin DB. Statistical Analysis with Missing Data. John Wiley & Sons; 2019
  • 24. Filzmoser P, Maronna R, Werner M. Outlier identification in high dimensions. Computational Statistics & Data Analysis. 2008; 52 (3):1694-1711
  • 25. Shmueli G, Bruce PC, Yahav I, Patel NR, Lichtendahl KC Jr, Desarbo WS. Data Mining for Business Analytics: Concepts, Techniques, and Applications in R. John Wiley & Sons; 2017
  • 26. Aguinis H, Gottfredson RK. Statistical power analysis in HRM research. Organizational Research Methods. 2013; 16 (2):289-324
  • 27. Stevens JP. Applied Multivariate Statistics for the Social Sciences. Routledge; 2012
  • 28. Byrne BM. Structural Equation Modeling with AMOS: Basic Concepts, Applications, and Programming. Routledge; 2016
  • 29. Everitt BS, Hothorn T. An Introduction to Applied Multivariate Analysis with R. Springer; 2011
  • 30. Kosslyn SM. Graph Design for the Eye and Mind. Oxford University Press; 2006

© The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Child Care and Early Education Research Connections

Descriptive Statistics

This page describes graphical and pictorial methods of descriptive statistics and the three most common measures of descriptive statistics (central tendency, dispersion, and association).

Descriptive statistics can be useful for two purposes: 1) to provide basic information about variables in a dataset and 2) to highlight potential relationships between variables. The three most common descriptive statistics can be displayed graphically or pictorially and are measures of:

Central tendency

Dispersion

Association

Graphical/Pictorial Methods

There are several graphical and pictorial methods that enhance researchers' understanding of individual variables and the relationships between variables. Graphical and pictorial methods provide a visual representation of the data. Some of these methods include:

Histograms

Scatter plots

Geographic Information Systems (GIS)

Sociograms

Histograms

Visually represent the frequencies with which values of variables occur

Each value of a variable is displayed along the bottom of a histogram, and a bar is drawn for each value

The height of the bar corresponds to the frequency with which that value occurs

Scatter Plots

Display the relationship between two quantitative or numeric variables by plotting one variable against the value of another variable

For example, one axis of a scatter plot could represent height and the other could represent weight. Each person in the data would receive one data point on the scatter plot that corresponds to his or her height and weight

Geographic Information Systems (GIS)

A GIS is a computer system capable of capturing, storing, analyzing, and displaying geographically referenced information; that is, data identified according to location

Using a GIS program, a researcher can create a map to represent data relationships visually

Sociograms

Display networks of relationships among variables, enabling researchers to identify the nature of relationships that would otherwise be too complex to conceptualize

Visit the following websites for more information:

Graphical Analytic Techniques

Geographic Information Systems

Glossary terms related to graphical and pictorial methods:

GIS Histogram Scatter Plot Sociogram

Measures of Central Tendency

Measures of central tendency are the most basic and, often, the most informative description of a population's characteristics. They describe the "average" member of the population of interest. There are three measures of central tendency:

Mean -- the sum of a variable's values divided by the total number of values
Median -- the middle value of a variable
Mode -- the value that occurs most often

Example: The incomes of five randomly selected people in the United States are $10,000, $10,000, $45,000, $60,000, and $1,000,000.

Mean Income = (10,000 + 10,000 + 45,000 + 60,000 + 1,000,000) / 5 = $225,000
Median Income = $45,000
Modal Income = $10,000

The mean is the most commonly used measure of central tendency. Medians are generally used when a few values are extremely different from the rest of the values (this is called a skewed distribution). For example, the median income is often the best measure of the average income because, while most individuals earn between $0 and $200,000, a handful of individuals earn millions.

Basic Statistics

Measures of Position

Glossary terms related to measures of central tendency:

Average Central Tendency Confidence Interval Mean Median Mode Moving Average Point Estimate Univariate Analysis

Measures of Dispersion

Measures of dispersion provide information about the spread of a variable's values. There are four key measures of dispersion:

Range

Variance

Standard Deviation

Skew

Range  is simply the difference between the smallest and largest values in the data. The interquartile range is the difference between the values at the 75th percentile and the 25th percentile of the data.

Variance  is the most commonly used measure of dispersion. It is calculated by taking the average of the squared differences between each value and the mean.

Standard deviation , another commonly used statistic, is the square root of the variance.

Skew  is a measure of whether some values of a variable are extremely different from the majority of the values. For example, income is skewed because most people make between $0 and $200,000, but a handful of people earn millions. A variable is positively skewed if the extreme values are higher than the majority of values. A variable is negatively skewed if the extreme values are lower than the majority of values.

Example: The incomes of five randomly selected people in the United States are $10,000, $10,000, $45,000, $60,000, and $1,000,000:

Range = 1,000,000 - 10,000 = 990,000
Variance = [(10,000 - 225,000)^2 + (10,000 - 225,000)^2 + (45,000 - 225,000)^2 + (60,000 - 225,000)^2 + (1,000,000 - 225,000)^2] / 5 = 150,540,000,000
Standard Deviation = Square Root (150,540,000,000) = 387,995
Skew = Income is positively skewed
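
A quick Python check of these dispersion figures, as a minimal sketch using the standard library's statistics module with population formulas (dividing by N = 5, as the example does).

```python
# Minimal sketch: verifying the range, variance, and standard deviation above.
import statistics

incomes = [10_000, 10_000, 45_000, 60_000, 1_000_000]

print(max(incomes) - min(incomes))        # 990000 (range)
print(statistics.pvariance(incomes))      # 150540000000 (population variance)
print(round(statistics.pstdev(incomes)))  # 387995 (population standard deviation)
# The mean (225,000) is far above the median (45,000), consistent with positive skew.
```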

Survey Research Tools

Variance and Standard Deviation

Summarizing and Presenting Data

Skewness Simulation

Glossary terms related to measures of dispersion:

Confidence Interval Distribution Kurtosis Point Estimate Quartiles Range Skewness Standard Deviation Univariate Analysis Variance

Measures of Association

Measures of association indicate whether two variables are related. Two measures are commonly used:

Chi-square

Correlation

As a measure of association between variables, chi-square tests are used on nominal data (i.e., data that are put into classes: e.g., gender [male, female] and type of job [unskilled, semi-skilled, skilled]) to determine whether they are associated*

A chi-square is called significant if there is an association between two variables, and nonsignificant if there is not an association

To test for associations, a chi-square is calculated in the following way: Suppose a researcher wants to know whether there is a relationship between gender and two types of jobs, construction worker and administrative assistant. To perform a chi-square test, the researcher counts up the number of female administrative assistants, the number of female construction workers, the number of male administrative assistants, and the number of male construction workers in the data. These counts are compared with the number that would be expected in each category if there were no association between job type and gender (this expected count is based on statistical calculations). If there is a large difference between the observed values and the expected values, the chi-square test is significant, which indicates there is an association between the two variables.

*The chi-square test can also be used as a measure of goodness of fit, to test if data from a sample come from a population with a specific distribution, as an alternative to Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests. As such, the chi square test is not restricted to nominal data; with non-binned data, however, the results depend on how the bins or classes are created and the size of the sample
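
A hypothetical sketch of the chi-square test of association described above, using scipy. The gender-by-job counts are invented for illustration; only the mechanics of the test are the point.

```python
# Hypothetical sketch: chi-square test of association on a 2x2 table of invented counts.
from scipy.stats import chi2_contingency

#           admin. assistant   construction worker
observed = [[70, 10],    # female
            [20, 60]]    # male

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.1f}, p = {p_value:.4f}")
# A small p-value (e.g., below 0.05) indicates an association between gender and job type;
# `expected` holds the counts that would be expected if there were no association.
```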

A correlation coefficient is used to measure the strength of the relationship between numeric variables (e.g., weight and height)

The most common correlation coefficient is  Pearson's r , which can range from -1 to +1.

If the coefficient is between 0 and 1, as one variable increases, the other also increases. This is called a positive correlation. For example, height and weight are positively correlated because taller people usually weigh more

If the correlation coefficient is between -1 and 0, as one variable increases the other decreases. This is called a negative correlation. For example, age and hours slept per night are negatively correlated because older people usually sleep fewer hours per night
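
A minimal sketch of computing Pearson's r with scipy, using invented height and weight values that echo the example above; with these numbers r comes out close to +1, indicating a strong positive correlation.

```python
# Minimal sketch: Pearson's r for two numeric variables (invented data).
from scipy.stats import pearsonr

height_cm = [150, 160, 165, 170, 175, 180, 185]
weight_kg = [52, 58, 63, 68, 72, 80, 85]

r, p_value = pearsonr(height_cm, weight_kg)
print(f"r = {r:.2f}")   # close to +1: taller people in this sample tend to weigh more
```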

Chi-Square Procedures for the Analysis of Categorical Frequency Data

Chi-square Analysis

Glossary terms related to measures of association:

Association Chi Square Correlation Correlation Coefficient Measures of Association Pearson's Correlational Coefficient Product Moment Correlation Coefficient


What is Descriptive Statistics: Definition, Types, Applications, and Examples

Reviewed and fact-checked by Sayantoni Das

If you work with datasets long enough, you will eventually need to deal with statistics. Ask the average person what statistics are, and they’ll probably throw around words like “numbers,” “figures,” and “research.”

Statistics is the science, or a branch of mathematics, that involves collecting, classifying, analyzing, interpreting, and presenting numerical facts and data. It is especially handy when dealing with populations too numerous and extensive for specific, detailed measurements. Statistics are crucial for drawing general conclusions relating to a dataset from a data sample.

Statistics further breaks down into two types: descriptive and inferential. Today, we look at descriptive statistics, including a definition, the types of descriptive statistics, and the differences between descriptive statistics and inferential statistics.

Descriptive statistics refers to a branch of statistics that involves summarizing, organizing, and presenting data meaningfully and concisely. It focuses on describing and analyzing a dataset's main features and characteristics without making any generalizations or inferences to a larger population.

The primary goal of descriptive statistics is to provide a clear and concise summary of the data, enabling researchers or analysts to gain insights and understand patterns, trends, and distributions within the dataset. This summary typically includes measures such as central tendency (e.g., mean, median, mode), dispersion (e.g., range, variance, standard deviation), and shape of the distribution (e.g., skewness, kurtosis).

Descriptive statistics also involves a graphical representation of data through charts, graphs, and tables, which can further aid in visualizing and interpreting the information. Common graphical techniques include histograms, bar charts, pie charts, scatter plots, and box plots.

By employing descriptive statistics, researchers can effectively summarize and communicate the key characteristics of a dataset, facilitating a better understanding of the data and providing a foundation for further statistical analysis or decision-making processes.


Example 1: Exam Scores

Suppose you have the following scores of 20 students on an exam:

85, 90, 75, 92, 88, 79, 83, 95, 87, 91, 78, 86, 89, 94, 82, 80, 84, 93, 88, 81

To calculate descriptive statistics:

  • Mean: Add up all the scores and divide by the number of scores. Mean = (85 + 90 + 75 + 92 + 88 + 79 + 83 + 95 + 87 + 91 + 78 + 86 + 89 + 94 + 82 + 80 + 84 + 93 + 88 + 81) / 20 = 1720 / 20 = 86
  • Median: Arrange the scores in ascending order and find the middle value. With 20 scores, the median is the average of the two middle values (86 and 87). Median = 86.5
  • Mode: Identify the score(s) that appear(s) most frequently. Mode = 88
  • Range: Calculate the difference between the highest and lowest scores. Range = 95 - 75 = 20
  • Variance: Calculate the average of the squared differences from the mean. Variance = [(85-86)^2 + (90-86)^2 + ... + (81-86)^2] / 20 = 614 / 20 = 30.7
  • Standard Deviation: Take the square root of the variance. Standard Deviation = √30.7 ≈ 5.54

Example 2: Monthly Income

Consider a sample of 50 individuals and their monthly incomes:

$2,500, $3,000, $3,200, $4,000, $2,800, $3,500, $4,500, $3,200, $3,800, $3,500, $2,800, $4,200, $3,900, $3,600, $3,000, $2,700, $2,900, $3,700, $3,500, $3,200, $3,600, $4,300, $4,100, $3,800, $3,600, $2,500, $4,200, $4,200, $3,400, $3,300, $3,800, $3,900, $3,500, $2,800, $4,100, $3,200, $3,600, $4,000, $3,700, $3,000, $3,100, $2,900, $3,400, $3,800, $4,000, $3,300, $3,100, $3,200, $4,200, $3,400.

  • Mean: Add up all the incomes and divide by the number of incomes. Mean = ($2,500 + $3,000 + ... + $3,400) / 50 = $174,500 / 50 = $3,490
  • Median: Arrange the incomes in ascending order and find the middle value. With 50 values, the median is the average of the 25th and 26th values, both of which are $3,500. Median = $3,500
  • Range: Calculate the difference between the highest and lowest incomes. Range = $4,500 - $2,500 = $2,000
  • Variance: Calculate the average of the squared differences from the mean. Variance = [($2,500 - $3,490)^2 + ($3,000 - $3,490)^2 + ... + ($3,400 - $3,490)^2] / 50 = 12,325,000 / 50 = 246,500
  • Standard Deviation: Take the square root of the variance. Standard Deviation = √246,500 ≈ $496.49

These calculations provide descriptive statistics that summarize the central tendency, dispersion, and shape of the data in these examples.
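
A quick way to double-check both worked examples is to run the raw values through Python's statistics module; this optional sketch uses population formulas (dividing by N, as the examples do).

```python
# Minimal sketch: verifying the exam-score and monthly-income examples.
import statistics

scores = [85, 90, 75, 92, 88, 79, 83, 95, 87, 91,
          78, 86, 89, 94, 82, 80, 84, 93, 88, 81]
print(statistics.mean(scores))               # 86
print(statistics.median(scores))             # 86.5
print(statistics.mode(scores))               # 88
print(statistics.pvariance(scores))          # 30.7
print(round(statistics.pstdev(scores), 2))   # 5.54

incomes = [2500, 3000, 3200, 4000, 2800, 3500, 4500, 3200, 3800, 3500,
           2800, 4200, 3900, 3600, 3000, 2700, 2900, 3700, 3500, 3200,
           3600, 4300, 4100, 3800, 3600, 2500, 4200, 4200, 3400, 3300,
           3800, 3900, 3500, 2800, 4100, 3200, 3600, 4000, 3700, 3000,
           3100, 2900, 3400, 3800, 4000, 3300, 3100, 3200, 4200, 3400]
print(statistics.mean(incomes))              # 3490
print(statistics.median(incomes))            # 3500
print(statistics.pvariance(incomes))         # 246500
print(round(statistics.pstdev(incomes), 2))  # 496.49
```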

Descriptive statistics break down into several types, characteristics, or measures. Some authors say that there are two types. Others say three or even four. 

Distribution (Also Called Frequency Distribution)

Datasets consist of a distribution of scores or values. Statisticians use graphs and tables to summarize the frequency of every possible value of a variable, rendered in percentages or numbers. For instance, if you held a poll to determine people’s favorite Beatle, you’d set up one column with all possible variables (John, Paul, George, and Ringo), and another with the number of votes.

Statisticians depict frequency distributions as either a graph or as a table.
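
A minimal sketch of tabulating such a frequency distribution in Python. The votes are invented and stand in for the favourite-Beatle poll described above.

```python
# Minimal sketch: frequency distribution as counts and percentages (invented votes).
from collections import Counter

votes = ["Paul", "John", "Ringo", "Paul", "George",
         "John", "Paul", "John", "George", "Paul"]

counts = Counter(votes)
total = len(votes)
for beatle, n in counts.most_common():
    print(f"{beatle:<7} {n:>3}  ({n / total:.0%})")
```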

Measures of Central Tendency

Measures of central tendency estimate a dataset's average or center, finding the result using three methods: mean, mode, and median.

Mean: The mean is also known as “M” and is the most common method for finding averages. You get the mean by adding all the response values together, and dividing the sum by the number of responses, or “N.” For instance, say someone is trying to figure out how many hours a day they sleep in a week. So, the data set would be the hour entries (e.g., 6,8,7,10,8,4,9), and the sum of those values is 52. There are seven responses, so N=7. You divide the value sum of 52 by N, or 7, to find M, which in this instance is about 7.43.

Mode: The mode is just the most frequent response value. Datasets may have any number of modes, including “zero.” You can find the mode by arranging your dataset in order from the lowest to highest value and then looking for the most common response. So, in using our sleep study from the last part: 4,6,7,8,8,9,10. As you can see, the mode is eight.

Median: Finally, we have the median, defined as the value in the precise center of the dataset. Arrange the values in ascending order (like we did for the mode) and look for the number in the set’s middle. In this case, the median is eight.

Variability (Also Called Dispersion)

The measure of variability gives the statistician an idea of how spread out the responses are. The spread has three aspects — range, standard deviation, and variance.

Range: Use range to determine how far apart the most extreme values are. Start by subtracting the dataset’s lowest value from its highest value. Once again, we turn to our sleep study: 4,6,7,8,8,9,10. We subtract four (the lowest) from ten (the highest) and get six. There’s your range.

Standard Deviation: This aspect takes a little more work. The standard deviation (s) is your dataset’s average amount of variability, showing you how far each score lies from the mean. The larger your standard deviation, the more variable your dataset. Follow these six steps:

  • List the scores and their means.
  • Find the deviation by subtracting the mean from each score.
  • Square each deviation.
  • Total up all the squared deviations.
  • Divide the sum of the squared deviations by N-1.
  • Find the result’s square root.

When you divide the sum of the squared deviations (about 23.71) by 6 (N-1), you get about 3.95, and the square root of that result is about 1.99. As a result, we now know that each score deviates from the mean by an average of roughly 1.99 points.

Variance: Variance reflects the dataset’s degree of spread. The greater the degree of data spread, the larger the variance relative to the mean. You can get the variance by simply squaring the standard deviation. Using the above example, we square 1.99 and arrive at roughly 3.95.
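
A quick Python check of the sleep-study figures, as a minimal sketch using the statistics module with sample formulas (dividing by N − 1, as in the six steps above).

```python
# Minimal sketch: mean, range, sample standard deviation, and sample variance.
import statistics

hours = [6, 8, 7, 10, 8, 4, 9]

print(round(statistics.mean(hours), 2))      # about 7.43
print(max(hours) - min(hours))               # 6 (range)
print(round(statistics.stdev(hours), 2))     # about 1.99
print(round(statistics.variance(hours), 2))  # about 3.95
```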


Univariate Descriptive Statistics

Univariate descriptive statistics examine only one variable at a time and do not compare variables. Rather, they allow the researcher to describe individual variables; as a result, this sort of analysis is sometimes referred to simply as descriptive statistics. The patterns identified in this sort of data may be explained using the following:

  • Measures of central tendency (mean, mode, and median)
  • Data dispersion (standard deviation, variance, range, minimum, maximum, and quartiles)
  • Tables of frequency distribution
  • Frequency polygons and histograms

Bivariate Descriptive Statistics

When using bivariate descriptive statistics, two variables are analyzed (compared) concurrently to see whether they are correlated. Generally, by convention, the independent variable is represented by the columns, and the rows represent the dependent variable.

There are numerous real-world applications for bivariate data. For example, estimating when a natural event will occur is quite valuable. Bivariate data analysis is a tool in the statistician's toolbox. Sometimes, something as simple as plotting one variable against the other on a two-dimensional plane can give you a better sense of what the data is trying to tell you. For example, the classic scatter plot of Old Faithful eruptions demonstrates the link between the waiting time between eruptions and the duration of the eruption.

Descriptive statistics can be useful for two things: 1) providing basic information about variables in a dataset and 2) highlighting potential relationships between variables. The three most common descriptive statistics can be displayed graphically or pictorially, which is why graphical and pictorial methods are often used to summarise data. Descriptive statistics only make statements about the data set used to calculate them; they never go beyond your data.

Scatter Plots

A scatter plot employs dots to indicate values for two separate numeric variables. Each dot's location on the horizontal and vertical axes represents a data point's values. Scatter plots are used to observe relationships between variables.

The main purposes of scatter plots are to examine and display relationships between two numerical variables. The points in a scatter plot record the values of individual observations, and taken as a whole they reveal patterns in the data. Scatter plots are commonly used to identify correlational relationships; in these situations, we want to know what a good vertical-value prediction would be for a given horizontal value.

When there are many data points to plot, this can lead to overplotting. Overplotting occurs when data points overlap to the degree that it is difficult to see the connections between points and variables. It can be hard to discern how densely packed data points are when many of them sit in a small area.

There are a couple of simple methods to relieve this issue. One approach is to plot only a subset of the data: a random sample of points should still convey the basic patterns in the whole data set. Additionally, we can change how the dots are drawn, increasing transparency so that overlaps remain visible or decreasing point size so that fewer overlaps occur.
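
A minimal matplotlib sketch of both remedies just mentioned, applied to randomly generated data: a random subset of the points is drawn with small, semi-transparent markers. The data and parameter values are arbitrary choices for illustration.

```python
# Minimal sketch: reducing overplotting by sampling points and adding transparency.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)
x = rng.normal(size=20_000)
y = x + rng.normal(scale=0.5, size=20_000)

# Remedy 1: plot only a random subset of the points.
idx = rng.choice(len(x), size=2_000, replace=False)

# Remedy 2: shrink the markers and make them semi-transparent.
plt.scatter(x[idx], y[idx], s=5, alpha=0.2)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Reducing overplotting with sampling and transparency")
plt.show()
```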

So, what’s the difference between the two statistical forms? We’ve already touched upon this when we mentioned that descriptive statistics doesn’t infer any conclusions or predictions, which implies that inferential statistics do so.

Inferential statistics takes a random sample of data from a portion of the population and describes and makes inferences about the entire population. For instance, in asking 50 people if they liked the movie they had just seen, inferential statistics would build on that and assume that those results would hold for the rest of the moviegoing population in general.

Therefore, if you stood outside that movie theater and surveyed 50 people who had just seen Rocky 20: Enough Already! and 38 of them disliked it (about 76 percent), you could extrapolate that 76% of the rest of the movie-watching world will dislike it too, even though you haven’t the means, time, and opportunity to ask all those people.

Simply put: Descriptive statistics give you a clear picture of what your current data shows. Inferential statistics makes projections based on that data.


FAQs

1. What do you mean by descriptive statistics?

Descriptive statistics refers to a set of methods used to summarize and describe the main features of a dataset, such as its central tendency, variability, and distribution. These methods provide an overview of the data and help identify patterns and relationships.

2. What is descriptive statistics? Explain with examples.

Descriptive statistics are methods used to summarize and describe the main features of a dataset. Examples include measures of central tendency, such as mean, median, and mode, which provide information about the typical value in the dataset. Measures of variability, such as range, variance, and standard deviation, describe the spread or dispersion of the data. Descriptive statistics can also include graphical methods, including histograms, box plots, and scatter plots, to visually represent the data.

3. What are the four types of descriptive statistics?

The four types of descriptive statistics are:

  • Measures of central tendency
  • Measures of variability
  • Measures of relative position
  • Graphical methods

Measures of central tendency describe the typical value in the dataset and include mean, median, and mode. Measures of variability represent the spread or dispersion of the data and include range, variance, and standard deviation. Measures of relative position describe the location of a specific value within the dataset, such as percentiles. Graphical methods use charts, histograms, and other visual representations to display data.

4. What is the main purpose of descriptive statistics?

The primary objective of descriptive statistics is to effectively summarize and describe the main features of a dataset, providing an overview of the data and helping to identify patterns and relationships within it. Descriptive statistics provide a useful starting point for analyzing data, as they can help to identify outliers, summarize key characteristics of the data, and inform the selection of appropriate statistical methods for further analysis. They are commonly used in multiple fields, including social sciences, business, and healthcare.

5. Can Descriptive Statistics be used to make inferences or predictions?

Descriptive statistics are primarily used to summarize and describe data; they do not involve making inferences or predictions beyond the data itself. To make inferences or predictions about a larger population, statistical inference methods are needed, which go beyond descriptive statistics and involve estimating parameters and testing hypotheses.

6. Why is descriptive statistics important?

Descriptive statistics is important because it allows us to summarize and describe data meaningfully. It helps us understand a dataset's main features and characteristics, identify patterns and trends, and gain insights from the data. Descriptive statistics provide a foundation for further analysis, decision-making, and communication of findings.

7. What is descriptive statistics used for? 

Descriptive statistics is used to summarize and present data concisely and meaningfully. It is commonly used in various fields such as research, business, economics, social sciences, and healthcare. Descriptive statistics helps researchers and analysts to describe the central tendency (mean, median, mode), dispersion (range, variance, and standard deviation), and shape of the distribution of a dataset. It also involves graphical representation of data to aid visualization and understanding.

8. Explain the difference between inferential and descriptive statistics.

The main difference between descriptive and inferential statistics lies in their purpose and scope. Descriptive statistics focuses on summarizing and describing the characteristics of a sample or population, without making inferences or generalizations to a larger population. It aims to provide a concise summary of data and reveal patterns within the observed dataset.

In contrast, inferential statistics involves drawing conclusions, making predictions, or testing hypotheses about a population based on a sample of data. It uses probability theory and statistical techniques to generalize findings from a sample to a larger population. Inferential statistics allows researchers to make inferences, estimate parameters, assess relationships, and make predictions beyond the observed data.


What is Descriptive Statistics? Definition, Types, Examples

Appinio Research · 23.11.2023 · 38min read


Have you ever wondered how we make sense of the vast sea of data surrounding us? In a world overflowing with information, the ability to distill complex datasets into meaningful insights is a skill of immense importance.

This guide will equip you with the knowledge and tools to unravel the stories hidden within data. Whether you're a data analyst, a researcher, a business professional, or simply curious about the art of data interpretation, this guide will demystify the fundamental concepts and techniques of descriptive statistics, empowering you to explore, understand, and communicate data like a seasoned expert.

What is Descriptive Statistics?

Descriptive statistics  refers to a set of mathematical and graphical tools used to summarize and describe essential features of a dataset. These statistics provide a clear and concise representation of data, enabling researchers, analysts, and decision-makers to gain valuable insights, identify patterns, and understand the characteristics of the information at hand.

Purpose of Descriptive Statistics

The primary purpose of descriptive statistics is to simplify and condense complex data into manageable, interpretable summaries. Descriptive statistics serve several key objectives:

  • Data Summarization:  They provide a compact summary of the main characteristics of a dataset, allowing individuals to grasp the essential features quickly.
  • Data Visualization:  Descriptive statistics often accompany visual representations, such as histograms, box plots, and bar charts, making it easier to interpret and communicate data trends and distributions.
  • Data Exploration:  They facilitate the exploration of data to identify outliers, patterns, and potential areas of interest or concern.
  • Data Comparison:  Descriptive statistics enable the comparison of datasets, groups, or variables, aiding in decision-making and hypothesis testing.
  • Informed Decision-Making:  By providing a clear understanding of data, descriptive statistics support informed decision-making across various domains, including business, healthcare, social sciences, and more.

Importance of Descriptive Statistics in Data Analysis

Descriptive statistics play a pivotal role in data analysis by providing a foundation for understanding, summarizing, and interpreting data. Their importance is underscored by their widespread use in diverse fields and industries.

Here are key reasons why descriptive statistics are crucial in data analysis:

  • Data Simplification:  Descriptive statistics simplify complex datasets, making them more accessible to analysts and decision-makers. They condense extensive information into concise metrics and visual representations.
  • Initial Data Assessment:  Descriptive statistics are often the first step in data analysis. They help analysts gain a preliminary understanding of the data's characteristics and identify potential areas for further investigation.
  • Data Visualization:  Descriptive statistics are often paired with visualizations, enhancing data interpretation. Visual representations, such as histograms and scatter plots, provide intuitive insights into data patterns.
  • Communication and Reporting:  Descriptive statistics serve as a common language for conveying data insights to a broader audience. They are instrumental in research reports, presentations, and data-driven decision-making.
  • Quality Control:  In manufacturing and quality control processes, descriptive statistics help monitor and maintain product quality by identifying deviations from desired standards.
  • Risk Assessment:  In finance and insurance, descriptive statistics, such as standard deviation and variance, are used to assess and manage risk associated with investments and policies.
  • Healthcare Decision-Making:  Descriptive statistics inform healthcare professionals about patient demographics , treatment outcomes, and disease prevalence, aiding in clinical decision-making and healthcare policy formulation.
  • Market Analysis :  In marketing and consumer research, descriptive statistics reveal customer preferences, market trends, and product performance, guiding marketing strategies and product development .
  • Scientific Research:  In scientific research, descriptive statistics are fundamental for summarizing experimental results, comparing groups, and identifying meaningful patterns in data.
  • Government and Policy:  Government agencies use descriptive statistics to collect and analyze data on demographics, economics, and social trends to inform policy decisions and resource allocation.

Descriptive statistics serve as a critical foundation for effective data analysis and decision-making across a wide range of disciplines. They empower individuals and organizations to extract meaningful insights from data, enabling more informed and evidence-based choices.

Data Collection and Preparation

First, let's delve deeper into the crucial initial data collection and preparation steps. These initial stages lay the foundation for effective descriptive statistics.

Data Sources

When embarking on a data analysis journey, you must first identify your data sources. These sources can be categorized into two main types:

  • Primary Data :  This data is collected directly from original sources. It includes surveys, experiments, and observations tailored to your specific research objectives. Primary data offers high relevance and control over the data collection process.
  • Secondary Data :  Secondary data, on the other hand, is data that already exists and has been collected by someone else for a different purpose. It can include publicly available datasets, reports, and databases. Secondary data can save time and resources but may not always align perfectly with your research needs.

Understanding the nature of your data is fundamental. Data can be classified into two primary types:

  • Quantitative Data :  Quantitative data consists of numeric values and is often used for measurements and calculations. Examples include age, income, temperature, and test scores. Quantitative data can further be categorized as discrete (countable) or continuous (measurable).
  • Qualitative Data :  Qualitative data, also known as categorical data, represents categories or labels and cannot be measured numerically. Examples include gender, color, and product categories. Qualitative data can be nominal (categories with no specific order) or ordinal (categories with a meaningful order).

Data Cleaning and Preprocessing

Once you have your data in hand, preparing it for analysis is essential. Data cleaning and preprocessing involve several critical steps:

Handling Missing Data

Missing data can significantly impact your analysis. There are various approaches to address missing values:

  • Deletion:  You can remove rows or columns with missing data, but this may lead to a loss of valuable information.
  • Imputation:  Imputing missing values involves estimating or filling in the missing data using methods such as mean imputation, median imputation, or advanced techniques like regression imputation (both deletion and simple imputation are sketched below).
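The following is a minimal sketch of these two approaches using pandas (my choice of tool here); the small DataFrame and its missing values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with a few missing values (np.nan marks the gaps).
df = pd.DataFrame({"age": [34, 41, np.nan, 29, np.nan, 52],
                   "income": [48_000, 61_000, 55_000, np.nan, 43_000, 75_000]})

# Deletion: drop any row that contains a missing value (simple, but loses information).
dropped = df.dropna()

# Imputation: fill missing values with each column's mean or median instead.
mean_imputed = df.fillna(df.mean(numeric_only=True))
median_imputed = df.fillna(df.median(numeric_only=True))

print(mean_imputed)
```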

Outlier Detection

Outliers are data points that deviate significantly from the rest of the data. Detecting and handling outliers is crucial to prevent them from skewing your results. Popular methods for identifying outliers include box plots and z-scores.
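As a rough illustration, here is how both checks might look in Python with NumPy (the data is simulated; the |z| > 3 and 1.5 * IQR cutoffs are common rules of thumb, not the only options):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# 1,000 roughly normal readings plus one implausible value appended at the end.
data = np.append(rng.normal(loc=120, scale=10, size=1_000), 230.0)

# z-score rule: flag points more than 3 standard deviations from the mean.
z = (data - data.mean()) / data.std()
z_outliers = data[np.abs(z) > 3]   # flags the 230.0 reading (a few genuine extremes may also appear)

# Box-plot rule: flag points beyond 1.5 * IQR outside the quartiles.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
box_outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]
```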

Data Transformation

Data transformation aims to normalize or standardize the data to make it more suitable for analysis. Common transformations include:

  • Normalization:  Scaling data to a standard range, often between 0 and 1.
  • Standardization:  Transforming data to have a mean of 0 and a standard deviation of 1 (both transformations are sketched below).
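A minimal sketch of both transformations with NumPy (the array is arbitrary example data):

```python
import numpy as np

x = np.array([10.0, 12.0, 15.0, 18.0, 30.0])

# Normalization (min-max scaling): rescale values to the 0-1 range.
x_norm = (x - x.min()) / (x.max() - x.min())

# Standardization (z-scoring): shift and scale to mean 0 and standard deviation 1.
x_std = (x - x.mean()) / x.std()
```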

Data Organization and Presentation

Organizing and presenting your data effectively is essential for meaningful analysis and communication. Here's how you can achieve this:

Data Tables

Data tables are a straightforward way to present your data, especially when dealing with smaller datasets. They allow you to list data in rows and columns, making it easy to review and perform basic calculations.

Graphs and Charts

Visualizations play a pivotal role in conveying the message hidden within your data. Some common types of graphs and charts include:

  • Histograms:  Histograms display the distribution of continuous data by dividing it into intervals or bins and showing the frequency of data points within each bin.
  • Bar Charts:  Bar charts are excellent for representing categorical or discrete data . They display categories on one axis and corresponding values on the other.
  • Line Charts:  Line charts are useful for identifying trends over time, making them suitable for time series data.
  • Scatter Plots:  Scatter plots help visualize the relationship between two variables, making them valuable for identifying correlations.
  • Pie Charts:  Pie charts are suitable for displaying the composition of a whole in terms of its parts, often as percentages.

Summary Statistics

Calculating summary statistics, such as the mean, median, and standard deviation, provides a quick snapshot of your data's central tendencies and variability.
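If you are working in Python, a single pandas call can produce this snapshot; the scores below are made up for illustration:

```python
import pandas as pd

scores = pd.Series([67, 72, 88, 91, 55, 79, 84, 73, 73, 95])

print(scores.describe())                        # count, mean, std, min, quartiles, max
print(scores.median(), scores.mode().tolist())  # median 76.0, mode [73]
```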


Measures of Central Tendency

Measures of central tendency are statistics that provide insight into the central or typical value of a dataset. They help you understand where the data tends to cluster, which is crucial for drawing meaningful conclusions.

The mean, also known as the average, is the most widely used measure of central tendency. It is calculated by summing all the values in a dataset and then dividing by the total number of values. The formula for the mean (μ) is:

μ = (Σx) / N
  • μ represents the mean.
  • Σx represents the sum of all individual data points.
  • N is the total number of data points.

The mean is highly sensitive to outliers and extreme values in the dataset. It's an appropriate choice for normally distributed data.

The median is another measure of central tendency that is less influenced by outliers compared to the mean. To find the median, you first arrange the data in ascending or descending order and then locate the middle value. If there's an even number of data points, the median is the average of the two middle values.

For example, in the dataset [3, 5, 7, 8, 10], the median is 7.

The mode is the value that appears most frequently in a dataset. Unlike the mean and median, which are influenced by the actual values, the mode represents the data point with the highest frequency of occurrence.

In the dataset [3, 5, 7, 8, 8], the mode is 8.
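Using the same small datasets as above, Python's built-in statistics module (one of several tools that could be used) computes all three measures directly:

```python
import statistics

data_median = [3, 5, 7, 8, 10]
data_mode = [3, 5, 7, 8, 8]

statistics.mean(data_median)    # 6.6  (sum of values divided by their count)
statistics.median(data_median)  # 7    (middle value of the sorted list)
statistics.mode(data_mode)      # 8    (most frequently occurring value)
```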

Choosing the Right Measure

Selecting the appropriate measure of central tendency depends on the nature of your data and your research objectives:

  • Use the  mean  for normally distributed data without significant outliers.
  • Choose the  median  when dealing with skewed data or data with outliers.
  • The  mode  is most useful for categorical data  or nominal data .

Understanding these measures and when to apply them is crucial for accurate data analysis and interpretation.

Measures of Variability

The measures of variability provide insights into how spread out or dispersed your data is. These measures complement the central tendency measures discussed earlier and are essential for a comprehensive understanding of your dataset.

The range is the simplest measure of variability and is calculated as the difference between the maximum and minimum values in your dataset. It offers a quick assessment of the spread of your data.

Range = Maximum Value - Minimum Value

For example, consider a dataset of daily temperatures in Celsius for a month:

  • Maximum temperature: 30°C
  • Minimum temperature: 10°C

The range would be 30°C - 10°C = 20°C, indicating a 20-degree Celsius spread in temperature over the month.

Variance measures the average squared deviation of each data point from the mean. It quantifies the overall dispersion of data points. The formula for variance (σ²) is as follows:

σ² = Σ(x - μ)² / N
  • σ² represents the variance.
  • Σ represents the summation symbol.
  • x represents each individual data point.
  • μ is the mean of the dataset.
  • N is the total number of data points.

Calculating the variance involves the following:

  • Find the mean (μ) of the dataset.
  • For each data point, subtract the mean (x - μ).
  • Square the result for each data point [(x - μ)²].
  • Sum up all the squared differences [(Σ(x - μ)²)].
  • Divide by the total number of data points (N) to get the variance.

A higher variance indicates greater variability among data points, while a lower variance suggests data points are closer to the mean.

Standard Deviation

The standard deviation is a widely used measure of variability and is simply the square root of the variance. It provides a more interpretable value and is often preferred for reporting. The formula for standard deviation (σ) is:

σ = √σ² = √(Σ(x - μ)² / N)

Calculating the standard deviation follows the same process as the variance, with the additional step of taking the square root. It represents the average deviation of data points from the mean, expressed in the same units as the data.

For example, if the variance is calculated as 16 (square units), the standard deviation would be 4 (the same units as the data). A smaller standard deviation indicates data points are closer to the mean, while a larger standard deviation indicates greater variability.
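Here is a minimal sketch of the variance and standard deviation calculations in NumPy (the array is arbitrary; ddof=0 matches the population formulas used above):

```python
import numpy as np

x = np.array([4, 8, 6, 5, 3, 7])

mu = x.mean()                  # mean of the dataset
var = ((x - mu) ** 2).mean()   # population variance, exactly as in the formula above
std = np.sqrt(var)             # standard deviation = square root of the variance

# NumPy's built-ins give the same results when ddof=0 (the population versions).
assert np.isclose(var, x.var(ddof=0)) and np.isclose(std, x.std(ddof=0))
```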

Interquartile Range (IQR)

The interquartile range (IQR) is a robust measure of variability that is less influenced by extreme values (outliers) than the range, variance, or standard deviation. It is based on the quartiles of the dataset. To calculate the IQR:

  • Arrange the data in ascending order.
  • Calculate the first quartile (Q1), which is the median of the lower half of the data.
  • Calculate the third quartile (Q3), which is the median of the upper half of the data.
  • Subtract Q1 from Q3 to find the IQR.
IQR = Q3 - Q1

The IQR represents the range within which the central 50% of your data falls. It provides valuable information about the middle spread of your dataset, making it a useful measure for skewed or non-normally distributed data.
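A short sketch of the range and IQR calculations with NumPy, reusing the temperature example from above (the individual daily values are invented to match the stated maximum and minimum):

```python
import numpy as np

temps = np.array([10, 14, 15, 18, 19, 21, 22, 24, 27, 30])  # daily temperatures, °C

data_range = temps.max() - temps.min()    # 30 - 10 = 20°C

q1, q3 = np.percentile(temps, [25, 75])   # first and third quartiles
iqr = q3 - q1                             # spread of the middle 50% of the data
```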

Data Distribution

Understanding the distribution of your data is essential for making meaningful inferences and choosing appropriate statistical methods. In this section, we will explore different aspects of data distribution.

Normal Distribution

The normal distribution, also known as the Gaussian distribution or bell curve, is a fundamental concept in statistics. It is characterized by a symmetric, bell-shaped curve. In a normal distribution:

  • The mean, median, and mode are all equal and located at the center of the distribution.
  • Data points are symmetrically distributed around the mean.
  • The distribution is defined by two parameters: mean (μ) and standard deviation (σ).

The normal distribution is essential in various statistical tests and modeling techniques. Many natural phenomena, such as heights and IQ scores, closely follow a normal distribution. It serves as a reference point for understanding other distributions and statistical analyses.

Skewness and Kurtosis

Skewness and kurtosis are measures that provide insights into the shape of a data distribution:

Skewness quantifies the asymmetry of a distribution. A distribution can be:

  • Positively Skewed (Right-skewed):  In a positively skewed distribution, the tail extends to the right, and the majority of data points are concentrated on the left side of the distribution. The mean is typically greater than the median.
  • Negatively Skewed (Left-skewed):  In a negatively skewed distribution, the tail extends to the left, and the majority of data points are concentrated on the right side of the distribution. The mean is typically less than the median.

Skewness is calculated using various formulas, including Pearson's first coefficient of skewness.

Kurtosis measures the "tailedness" of a distribution, indicating whether the distribution has heavy or light tails compared to a normal distribution. Kurtosis can be:

  • Leptokurtic:  A distribution with positive kurtosis has heavier tails and a more peaked central region than a normal distribution.
  • Mesokurtic:  A distribution with kurtosis equal to that of a normal distribution.
  • Platykurtic:  A distribution with negative kurtosis has lighter tails and a flatter central region than a normal distribution.

Kurtosis is calculated using different formulas, including the fourth standardized moment.

Understanding skewness and kurtosis helps you assess the departure of your data from normality and choose appropriate statistical methods.
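If you want to quantify these shape measures in practice, SciPy provides standard estimators; the sketch below uses simulated right-skewed data (the exponential distribution is chosen only for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
right_skewed = rng.exponential(scale=2.0, size=10_000)  # long right tail

print(stats.skew(right_skewed))      # positive value -> right-skewed
print(stats.kurtosis(right_skewed))  # excess kurtosis; 0 corresponds to a normal distribution
```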

Other Types of Distributions

While the normal distribution is prevalent, real-world data often follows different distributions. Some other types of distributions you may encounter include:

  • Exponential Distribution:  Commonly used for modeling the time between events in a Poisson process, such as arrival times in a queue.
  • Poisson Distribution:  Used for counting the number of events in a fixed interval of time or space, such as the number of phone calls received in an hour.
  • Binomial Distribution:  Suitable for modeling the number of successes in a fixed number of independent Bernoulli trials.
  • Lognormal Distribution:  Often used for data that is the product of many small, independent, positive factors, such as stock prices.
  • Uniform Distribution:  Represents a constant probability over a specified range of values, making all outcomes equally likely.

Understanding the characteristics and properties of these distributions is crucial for selecting appropriate statistical techniques and making accurate interpretations in various fields of study and data analysis.

Visualizing Data

Visualizing data is a powerful way to gain insights and understand the patterns and characteristics of your dataset. Below are several standard methods of data visualization.

Histograms

Histograms are a widely used graphical representation of the distribution of continuous data. They are particularly useful for understanding the shape of the data's frequency distribution. Here's how they work:

  • Data is divided into intervals, or "bins."
  • The number of data points falling into each bin is represented by the height of bars on a graph.
  • The bars are typically adjacent and do not have gaps between them.

Histograms help you visualize the central tendency, spread, and skewness of your data. They can reveal whether your data is normally distributed, skewed to the left or right, or exhibits multiple peaks.

Histograms are especially useful when you have a large dataset and want to quickly assess its distribution. They are commonly used in fields like finance to analyze stock returns, biology to study species distribution, and quality control to monitor manufacturing processes.
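A minimal Matplotlib sketch of a histogram, using simulated daily returns rather than real financial data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=3)
returns = rng.normal(loc=0.05, scale=1.2, size=1_000)  # hypothetical daily returns, %

plt.hist(returns, bins=30, edgecolor="black")  # 30 bins; bar height = frequency per bin
plt.xlabel("Daily return (%)")
plt.ylabel("Frequency")
plt.title("Histogram of simulated daily returns")
plt.show()
```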

Box Plots

Box plots, also known as box-and-whisker plots, are excellent tools for visualizing the distribution of data, particularly for identifying outliers and comparing multiple datasets. Here's how they are constructed:

  • The box represents the interquartile range (IQR), with the lower edge of the box at the first quartile (Q1) and the upper edge at the third quartile (Q3).
  • A vertical line inside the box indicates the median (Q2).
  • Whiskers extend from the edges of the box to the minimum and maximum values within a certain range.
  • Outliers, which are data points significantly outside the whiskers, are often shown as individual points.

Box plots provide a concise summary of data distribution, including central tendency and variability. They are beneficial when comparing data distribution across different categories or groups.

Box plots are commonly used in fields like healthcare to compare patient outcomes by treatment, in education to assess student performance across schools, and in market research to analyze customer ratings for different products.
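A short sketch of a side-by-side box plot in Matplotlib, with two simulated groups standing in for, say, outcomes under two treatments:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=4)
group_a = rng.normal(70, 8, 200)    # hypothetical scores under treatment A
group_b = rng.normal(75, 12, 200)   # hypothetical scores under treatment B

plt.boxplot([group_a, group_b])     # box = IQR, line = median, whiskers plus outlier points
plt.xticks([1, 2], ["Group A", "Group B"])
plt.ylabel("Score")
plt.show()
```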

Scatter Plots

Scatter plots  are a valuable tool for visualizing the relationship between two continuous variables. They are handy for identifying patterns, trends, and correlations in data. Here's how they work:

  • Each data point is represented as a point on the graph, with one variable on the x-axis and the other on the y-axis.
  • The resulting plot shows the dispersion and clustering of data points, allowing you to assess the strength and direction of the relationship.

Scatter plots help you determine whether there is a positive, negative, or no correlation between the variables. Additionally, they can reveal outliers and influential data points that may affect the relationship.

Scatter plots are commonly used in fields like economics to analyze the relationship between income and education, environmental science to study the correlation between temperature and plant growth, and marketing to understand the relationship between advertising spend and sales.
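The sketch below plots two simulated variables (advertising spend and sales, with an assumed positive relationship) and reports their Pearson correlation:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=5)
ad_spend = rng.uniform(1, 10, 100)                   # hypothetical ad spend ($k)
sales = 5 + 2.5 * ad_spend + rng.normal(0, 3, 100)   # sales with some noise added

plt.scatter(ad_spend, sales)
plt.xlabel("Advertising spend ($k)")
plt.ylabel("Sales ($k)")
plt.show()

print(np.corrcoef(ad_spend, sales)[0, 1])  # Pearson correlation coefficient (close to +1 here)
```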

Frequency Distributions

Frequency distributions  are a tabular way to organize and display categorical or discrete data. They show the count or frequency of each category within a dataset. Here's how to create a frequency distribution:

  • Identify the distinct categories or values in your dataset.
  • Count the number of occurrences of each category.
  • Organize the results in a table, with categories in one column and their respective frequencies in another.

Frequency distributions help you understand the distribution of categorical data, identify dominant categories, and detect any rare or uncommon values. They are commonly used in fields like marketing to analyze customer demographics, in education to assess student grades, and in social sciences to study survey responses.
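In pandas, a one-variable frequency distribution is a single call; the survey responses below are invented for illustration:

```python
import pandas as pd

responses = pd.Series(["vanilla", "chocolate", "vanilla", "strawberry",
                       "chocolate", "vanilla", "chocolate", "vanilla"])

freq = responses.value_counts()                    # count per category
rel_freq = responses.value_counts(normalize=True)  # proportion per category

print(pd.DataFrame({"count": freq, "proportion": rel_freq}))
```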

Descriptive Statistics for Categorical Data

Categorical data requires its own set of descriptive statistics to gain insights into the distribution and characteristics of these non-numeric variables. There are various methods for describing categorical data.

Frequency Tables

Frequency tables summarize categorical data by displaying the count or frequency of each category within one or more variables; when two variables are crossed against each other, the result is known as a contingency table. Here's how they are created:

  • List the categories or values of the categorical variable(s) in rows or columns.
  • Count the occurrences of each category and record the frequencies.

Frequency tables are best used for summarizing and comparing categorical data across different groups or dimensions. They provide a straightforward way to understand data distribution and identify patterns or associations.

For example, in a survey about favorite ice cream flavors , a frequency table might show how many respondents prefer vanilla, chocolate, strawberry, and other flavors.
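Continuing the ice cream example, pandas can build both a one-way frequency table and a two-way contingency table (the flavor and age-group data here are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "flavor": ["vanilla", "chocolate", "vanilla", "strawberry", "chocolate", "vanilla"],
    "age_group": ["18-29", "18-29", "30-44", "30-44", "45+", "45+"],
})

print(df["flavor"].value_counts())                 # one-way frequency table
print(pd.crosstab(df["flavor"], df["age_group"]))  # two-way contingency table
```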

Bar Charts

Bar charts are a common graphical representation of categorical data. They are similar to histograms but are used for displaying categorical variables. Here's how they work:

  • Categories are listed on one axis (usually the x-axis), while the corresponding frequencies or counts are shown on the other axis (usually the y-axis).
  • Bars are drawn for each category, with the height of each bar representing the frequency or count of that category.

Bar charts make it easy to compare the frequencies of different categories visually. They are especially helpful for presenting categorical data in a visually appealing and understandable way.

Bar charts are commonly used in fields like market research to display survey results, in social sciences to illustrate demographic information, and in business to show product sales by category.

Pie Charts

Pie charts are circular graphs that represent the distribution of categorical data as "slices of a pie." Here's how they are constructed:

  • Categories or values are represented as segments or slices of the pie, with each segment's size proportional to its frequency or count.

Pie charts are effective for showing the relative proportions of different categories within a dataset. They are instrumental when you want to emphasize the composition of a whole in terms of its parts.

Pie charts are commonly used in areas such as marketing to display market share, in finance to show budget allocations, and in demographics to illustrate the distribution of ethnic groups within a population.
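The sketch below draws a bar chart and a pie chart from the same (made-up) flavor counts, so the two representations can be compared side by side:

```python
import matplotlib.pyplot as plt

flavors = ["Vanilla", "Chocolate", "Strawberry", "Other"]
counts = [38, 29, 18, 15]   # hypothetical survey counts

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.bar(flavors, counts)                            # bar chart: compare counts across categories
ax1.set_ylabel("Respondents")
ax2.pie(counts, labels=flavors, autopct="%1.0f%%")  # pie chart: each category's share of the whole
plt.show()
```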

These methods for visualizing and summarizing categorical data are essential for gaining insights into non-numeric variables and making informed decisions based on the distribution of categories within a dataset.

Descriptive Statistics Summary and Interpretation

Summarizing and interpreting descriptive statistics gives you the skills to extract meaningful insights from your data and apply them to real-world scenarios.

Summarizing Descriptive Statistics

Once you've collected and analyzed your data using descriptive statistics, the next step is to summarize the findings. This involves condensing the wealth of information into a few key points:

  • Central Tendency:  Summarize the central tendency of your data. If it's a numeric dataset, mention the mean, median, and mode as appropriate. For categorical data, highlight the most frequent categories.
  • Variability:  Describe the spread of the data using measures like range, variance, and standard deviation. Discuss whether the data is tightly clustered or widely dispersed.
  • Distribution:  Mention the shape of the data distribution. Is it normal, skewed, or bimodal? Use histograms or box plots to illustrate the distribution visually.
  • Outliers:  Identify any outliers and discuss their potential impact on the analysis. Consider whether outliers should be treated or investigated further.
  • Key Observations: Highlight any notable observations or patterns that emerged during your analysis. Are there clear trends or interesting findings in the data?

Interpreting Descriptive Statistics

Interpreting descriptive statistics involves making sense of the numbers and metrics you've calculated. It's about understanding what the data is telling you about the underlying phenomenon. Here are some steps to guide your interpretation:

  • Context Matters:  Always consider the context of your data. What does a specific value or pattern mean in the real-world context of your study? For example, a mean salary value may vary significantly depending on the industry.
  • Comparisons:  If you have multiple datasets or groups, compare their descriptive statistics. Are there meaningful differences or similarities between them? Statistical tests may be needed for formal comparisons.
  • Correlations:  If you've used scatter plots to visualize relationships, interpret the direction and strength of correlations. Are variables positively or negatively correlated, or is there no clear relationship?
  • Causation:  Be cautious about inferring causation from descriptive statistics alone. Correlation does not imply causation, so consider additional research or experimentation to establish causal relationships.
  • Consider Outliers:  If you have outliers, assess their impact on the overall interpretation. Do they represent genuine data points or measurement errors?

Descriptive Statistics Examples

To better understand how descriptive statistics are applied in real-world scenarios, let's explore a range of practical examples across various fields and industries. These examples illustrate how descriptive statistics provide valuable insights and inform decision-making processes.

Financial Analysis

Example:  Investment Portfolio Analysis

Description:  An investment analyst is tasked with evaluating the performance of a portfolio of stocks over the past year. They collect daily returns for each stock and want to provide a comprehensive summary of the portfolio's performance.

Use of Descriptive Statistics:

  • Central Tendency:  Calculate the portfolio's average daily return (mean) to assess its overall performance during the year.
  • Variability:  Compute the portfolio's standard deviation to measure the risk or volatility associated with the investment.
  • Distribution:  Create a histogram to visualize the distribution of daily returns, helping the analyst understand the nature of the portfolio's gains and losses.
  • Outliers:  Identify any outliers in daily returns that may require further investigation.

The resulting descriptive statistics will guide the analyst in making recommendations to investors, such as adjusting the portfolio composition to manage risk or improve returns.

Healthcare

Example:  Hospital Patient Demographics

Description:  A hospital administrator wants to understand the demographics of patients admitted to their facility over the past year. They have data on patient age, gender, and medical conditions.

  • Central Tendency:  Calculate the average age of patients to assess the typical age of admissions.
  • Variability:  Compute the standard deviation of patient ages to understand how age varies among patients.
  • Distribution:  Create bar charts or pie charts to visualize the gender distribution of patients and frequency tables to analyze the prevalence of different medical conditions.
  • Key Observations:  Identify any trends, such as seasonal variations in admissions or common medical conditions among specific age groups.

These descriptive statistics help the hospital administration allocate resources effectively, plan for future patient needs, and tailor healthcare services to the demographics of their patient population.

Marketing Research

Example:  Product Sales Analysis

Description:  A marketing team wants to evaluate the sales performance of different products in their product line. They have monthly sales data for the past two years.

  • Central Tendency:  Calculate the mean monthly sales for each product to determine their average performance.
  • Variability:  Compute the standard deviation of monthly sales to identify products with the most variable sales.
  • Distribution:  Create box plots to visualize the sales distribution for each product, helping to understand the range and variability.
  • Comparisons:  Compare sales trends over the two years for each product to identify growth or decline patterns.

Descriptive statistics allow the marketing team to make informed decisions about product marketing strategies, inventory management, and product development.

Social Sciences

Example:  Survey Analysis on Happiness Levels

Description:  A sociologist conducts a survey to assess the happiness levels of residents in different neighborhoods within a city. Respondents rate their happiness on a scale of 1 to 10.

  • Central Tendency:  Calculate the mean happiness score for each neighborhood to identify areas with higher or lower average happiness levels.
  • Variability:  Compute the standard deviation of happiness scores to understand the degree of variation within each neighborhood.
  • Distribution:  Create histograms to visualize the distribution of happiness scores, revealing whether happiness levels are normally distributed or skewed.
  • Comparisons:  Compare the happiness levels across neighborhoods to identify potential factors influencing happiness disparities.

Descriptive statistics help sociologists pinpoint areas that may require interventions to improve residents' overall well-being and identify potential research directions.

These examples demonstrate how descriptive statistics play a vital role in summarizing and interpreting data across diverse domains. By applying these statistical techniques, professionals can make data-driven decisions, identify trends and patterns, and gain valuable insights into various aspects of their work.

Common Descriptive Statistics Mistakes and Pitfalls

While descriptive statistics are valuable tools, they can be misused or misinterpreted if not handled carefully. Here are some common mistakes and pitfalls to avoid when working with descriptive statistics.

Misinterpretation of Descriptive Statistics

  • Assuming Causation:  One of the most common mistakes is inferring causation from correlation . Just because two variables are correlated does not mean that one causes the other. Always be cautious about drawing causal relationships from descriptive statistics alone.
  • Ignoring Context:  Failing to consider the context of the data can lead to misinterpretation. A descriptive statistic may seem significant, but it might not have practical relevance in the specific context of your study.
  • Neglecting Outliers:  Ignoring outliers or treating them as errors without investigation can lead to incomplete and inaccurate conclusions. Outliers may hold valuable information or reveal unusual phenomena.
  • Overlooking Distribution Assumptions:  When applying statistical tests or methods, it's important to check whether your data meets the assumptions of those techniques. For example, using methods designed for normally distributed data on skewed data can yield misleading results.

Data Reporting Errors

  • Inadequate Data Documentation:  Failing to provide clear documentation about data sources, collection methods, and preprocessing steps can make it challenging for others to replicate your analysis or verify your findings.
  • Mislabeling Variables:  Accurate labeling of variables and units is crucial. Mislabeling or using inconsistent units can lead to erroneous calculations and interpretations.
  • Failure to Report Measures of Uncertainty:  Descriptive statistics provide point estimates of central tendency and variability. It's crucial to report measures of uncertainty, such as confidence intervals or standard errors, to convey the range of possible values.

Avoiding Biases in Descriptive Statistics

  • Sampling Bias :  Ensure that your sample is representative of the population you intend to study. Sampling bias can occur when certain groups or characteristics are over- or underrepresented in the sample, leading to biased results.
  • Selection Bias:  Be cautious of selection bias, where specific data points are systematically included or excluded based on criteria that are unrelated to the research question. This can distort the analysis.
  • Confirmation Bias:  Avoid the tendency to seek, interpret, or remember information in a way that confirms preexisting beliefs or hypotheses. This bias can lead to selective attention and misinterpretation of data.
  • Reporting Bias:  Be transparent in reporting all relevant data, even if the results do not support your hypothesis or are inconclusive. Omitting such data can create a biased view of the overall picture.

Awareness of these common mistakes and pitfalls can help you conduct more robust and accurate analyses using descriptive statistics, leading to more reliable and meaningful conclusions in your research and decision-making processes.

Descriptive statistics are the essential building blocks of data analysis. They provide us with the means to summarize, visualize, and comprehend the often intricate world of data. By mastering these techniques, you have gained a valuable skill that can be applied across a multitude of fields and industries. From making informed business decisions to advancing scientific research, from understanding market trends to improving healthcare outcomes, descriptive statistics serve as our trusted guides in the realm of data.

You've learned how to calculate measures of central tendency, assess variability, explore data distributions, and employ powerful visualization tools. You've seen how descriptive statistics bring clarity to the chaos of data, revealing patterns and outliers, guiding your decisions, and enabling you to communicate insights effectively . As you continue to work with data, remember that descriptive statistics are your steadfast companions, ready to help you navigate the data landscape, extract valuable insights, and make informed choices based on evidence rather than guesswork.



Descriptive and Inferential Statistics

When analysing data, such as the marks achieved by 100 students for a piece of coursework, it is possible to use both descriptive and inferential statistics in your analysis of their marks. Typically, in most research conducted on groups of people, you will use both descriptive and inferential statistics to analyse your results and draw conclusions. So what are descriptive and inferential statistics? And what are their differences?

Descriptive Statistics

Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data in a meaningful way such that, for example, patterns might emerge from the data. Descriptive statistics do not, however, allow us to make conclusions beyond the data we have analysed or reach conclusions regarding any hypotheses we might have made. They are simply a way to describe our data.

Descriptive statistics are very important because if we simply presented our raw data it would be hard to visualize what the data was showing, especially if there was a lot of it. Descriptive statistics therefore enables us to present the data in a more meaningful way, which allows simpler interpretation of the data. For example, if we had the results of 100 pieces of students' coursework, we may be interested in the overall performance of those students. We would also be interested in the distribution or spread of the marks. Descriptive statistics allow us to do this. How to properly describe data through statistics and graphs is an important topic and discussed in other Laerd Statistics guides. Typically, there are two general types of statistic that are used to describe data:

  • Measures of central tendency: these are ways of describing the central position of a frequency distribution for a group of data. In this case, the frequency distribution is simply the distribution and pattern of marks scored by the 100 students from the lowest to the highest. We can describe this central position using a number of statistics, including the mode, median, and mean. You can learn more in our guide: Measures of Central Tendency .
  • Measures of spread: these are ways of summarizing a group of data by describing how spread out the scores are. For example, the mean score of our 100 students may be 65 out of 100. However, not all students will have scored 65 marks. Rather, their scores will be spread out. Some will be lower and others higher. Measures of spread help us to summarize how spread out these scores are. To describe this spread, a number of statistics are available to us, including the range, quartiles, absolute deviation, variance and standard deviation .

When we use descriptive statistics it is useful to summarize our group of data using a combination of tabulated description (i.e., tables), graphical description (i.e., graphs and charts) and statistical commentary (i.e., a discussion of the results).

Inferential Statistics

We have seen that descriptive statistics provide information about our immediate group of data. For example, we could calculate the mean and standard deviation of the exam marks for the 100 students and this could provide valuable information about this group of 100 students. Any group of data like this, which includes all the data you are interested in, is called a population . A population can be small or large, as long as it includes all the data you are interested in. For example, if you were only interested in the exam marks of 100 students, the 100 students would represent your population. Descriptive statistics are applied to populations, and the properties of populations, like the mean or standard deviation, are called parameters as they represent the whole population (i.e., everybody you are interested in).

Often, however, you do not have access to the whole population you are interested in investigating, but only a limited number of data instead. For example, you might be interested in the exam marks of all students in the UK. It is not feasible to measure all exam marks of all students in the whole of the UK so you have to measure a smaller sample of students (e.g., 100 students), which are used to represent the larger population of all UK students. Properties of samples, such as the mean or standard deviation, are not called parameters, but statistics . Inferential statistics are techniques that allow us to use these samples to make generalizations about the populations from which the samples were drawn. It is, therefore, important that the sample accurately represents the population. The process of achieving this is called sampling (sampling strategies are discussed in detail in the section, Sampling Strategy , on our sister site). Inferential statistics arise out of the fact that sampling naturally incurs sampling error and thus a sample is not expected to perfectly represent the population. The methods of inferential statistics are (1) the estimation of parameter(s) and (2) testing of statistical hypotheses .



Research 101: Descriptive statistics

Use these tools to analyze data vital to practice-improvement projects.

By Brian Conner, PhD, RN, CNE and Emily Johnson, PhD

  • Nurses at every level should be able to understand and apply basic statistical analyses related to performance improvement projects.
  • Measures of central tendency (such as mean) and variability (such as standard deviation) are fairly common and easy to use.

How many times have you said (or heard), “Statistics are too complicated”? A significant percentage of graduate students and nurses in clinical practice report feeling anxious when working with statistics. And although some statistical analysis is pretty complicated, you don’t need a doctoral degree to understand and use descriptive statistics.

What are descriptive statistics?


Sometimes, descriptive statistics are the only analyses completed in a research or evidence-based practice study; however, they don’t typically help us reach conclusions about hypotheses. Instead, they’re used as preliminary data, which can provide the foundation for future research by defining initial problems or identifying essential analyses in more complex investigations.

Common descriptive statistics

The most common types of descriptive statistics are the measures of central tendency (mean, median, and mode) that are used in most levels of math, research, evidence-based practice, and quality improvement. These measures describe the central portion of frequency distribution for a data set.

The most familiar of these is the mean , or average, which most people use and understand. It’s calculated by adding the sum of values in the data and dividing by the total number of observations.

The median is a number found at the exact middle of a set of data. If there are two numbers at the middle of the data set (which occurs when there is an even number of data points), these two numbers are averaged to identify the median. It’s typically used to describe a data set that has extreme outliers (very low or very high numbers, distant from the majority of data points), in which case the mean will not accurately represent the data. (See What to do with outliers .) To calculate a mean or median, data must be quantitative/continuous (have an infinite number of possibilities).

What to do with outliers

The mode represents the most frequently occurring number or item in a data set. Some data sets have more than one mode, making them bimodal (two modes) or multimodal (more than two modes). The mode can be calculated with data that are quantitative/continuous or qualitative/categorical (have a finite number of categories or groups, such as sex, race, or education level). The mode is the only measure of central tendency that can be analyzed with qualitative/categorical data.

Less common descriptive statistics

Measures of variability or dispersion are less common descriptive statistics, but they're still important because they describe the spread of values across a data set. Although the central tendency of data is vital, the range of values (the difference between the maximum and minimum values in the data) also may be important to note. The range not only sets boundaries for your data set and indicates the spread, but it also can identify errors in the data. For example, if a data set has a highest diastolic blood pressure of 230 and a lowest of 25, the range is 230 - 25 = 205, and an error probably exists in your data because values of 230 and 25 aren't valid diastolic blood pressure measures in most studies. Other measures of variability include standard deviation, variance, and quartiles. (See Other variability measures.)
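As a simple illustration of that kind of range check, here is a short Python sketch; the readings and the 40 to 130 mmHg plausibility window are hypothetical, chosen only to mirror the example above:

```python
# Hypothetical diastolic readings (mmHg); the plausibility window is an assumption for illustration.
diastolic = [82, 76, 90, 25, 88, 230, 79, 84]

lo, hi = min(diastolic), max(diastolic)
print("range:", hi - lo)                               # 230 - 25 = 205

# Readings outside a plausible clinical window deserve a second look before analysis.
suspect = [v for v in diastolic if not 40 <= v <= 130]
print("suspect readings:", suspect)                    # [25, 230]
```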

Other variability measures

Practical application of descriptive statistics

To put all of this information into perspective, here’s an example of how these measures can be used in a clinical setting.

A rural primary care clinic has a high percentage of patients with diabetes whose glycated hemoglobin (HbA1c) levels are greater than 7% (uncontrolled HbA1c) and body mass index (BMI) is over 30. The clinic implements a 9-month quality-improvement initiative to lower these numbers. The initiative includes a wellness education program focused on exercise, healthy eating, and understanding the importance of regular blood glucose monitoring. Before implementing the program, the clinic collects 3 months of aggregate data (3, 6, and 9 months before the intervention) for all patients with diabetes in the clinic, including HbA1c levels, BMIs, and patients with uncontrolled HbA1c. Gender and age also are collected. The clinic then collects the same data 3, 6, and 9 months after implementation of the program. (See Snapshot of aggregate data.) Because of the different types of data collected, different measures of central tendency and variability can help describe outcomes. (See Statistical analysis examples.)

Snapshot of aggregate data


Statistical analysis examples


Implications for practice

Nurses are increasingly asked to participate in and lead evidence-based practice and quality-improvement projects. Many healthcare organizations, including those aspiring to or holding Magnet® recognition from the American Nurses Credentialing Center, require that nurses take part in these activities to achieve higher levels of professional development within clinical ladder programs. Nurses can and should learn how to use descriptive statistics to analyze and depict vital data related to practice-improvement projects.

Brian Conner is adjunct faculty at the School of Nursing and Health Sciences for Simmons College in Boston, Massachusetts. Emily Johnson is an assistant professor at the Medical University of South Carolina College of Nursing in Charleston.

Selected references

American Nurses Credentialing Center (ANCC). Magnet Recognition Program® Overview. 2016.

Heavey E. Statistics for Nursing: A Practical Approach. 2nd ed. Burlington, MA: Jones and Bartlett Learning; 2015.

Thabane L, Akhtar-Danesh N. Guidelines for reporting descriptive statistics in health research. Nurse Res. 2008;15(2):72-81.

Zhang Y, Shang L, Wang R, et al. Attitudes toward statistics in medical postgraduates: Measuring, evaluating and monitoring. BMC Med Educ. 2012;12:117.



Descriptive vs. Inferential Statistics: What’s the Difference?

There are two main branches in the field of statistics:

  • Descriptive Statistics
  • Inferential Statistics

This tutorial explains the difference between the two branches and why each one is useful in certain situations.

Descriptive Statistics

In a nutshell, descriptive statistics aims to describe a chunk of raw data using summary statistics, graphs, and tables.

Descriptive statistics are useful because they allow you to understand a group of data much more quickly and easily compared to just staring at rows and rows of raw data values.

For example, suppose we have a set of raw data that shows the test scores of 1,000 students at a particular school. We might be interested in the average test score along with the distribution of test scores.

Using descriptive statistics, we could find the average score and create a graph that helps us visualize the distribution of scores.

This allows us to understand the test scores of the students much more easily compared to just staring at the raw data.

Common Forms of Descriptive Statistics

There are three common forms of descriptive statistics:

1. Summary statistics. These are statistics that summarize the data using a single number. There are two popular types of summary statistics:

  • Measures of central tendency: these numbers describe where the center of a dataset is located. Examples include the mean and the median.
  • Measures of dispersion: these numbers describe how spread out the values are in the dataset. Examples include the range, interquartile range, standard deviation, and variance.

2. Graphs. Graphs help us visualize data. Common types of graphs used to visualize data include boxplots, histograms, stem-and-leaf plots, and scatterplots.

3. Tables. Tables can help us understand how data is distributed. One common type of table is a frequency table, which tells us how many data values fall within certain ranges.
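The first two forms can be produced in a few lines of Python. The sketch below uses 1,000 simulated test scores, so its numbers won't match the worked example that follows; a frequency-table sketch appears after that example.

```python
# Summary statistics and a graph for 1,000 simulated test scores.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
scores = np.clip(rng.normal(loc=82, scale=9, size=1000), 0, 100)

print(f"mean:   {scores.mean():.2f}")        # measure of central tendency
print(f"median: {np.median(scores):.2f}")
print(f"std:    {scores.std(ddof=1):.2f}")   # measures of dispersion
print(f"range:  {scores.max() - scores.min():.2f}")

plt.hist(scores, bins=20)                    # histogram of the distribution
plt.xlabel("Test score")
plt.ylabel("Number of students")
plt.show()
```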

Example of Using Descriptive Statistics

The following example illustrates how we might use descriptive statistics in the real world.

Suppose 1,000 students at a certain school all take the same test. We are interested in understanding the distribution of test scores, so we use the following descriptive statistics:

1. Summary Statistics

Mean: 82.13. This tells us that the average test score among all 1,000 students is 82.13.

Median: 84. This tells us that half of all students scored higher than 84 and half scored lower than 84.

Max: 100. Min: 45. This tells us the maximum score that any student obtained was 100 and the minimum score was 45. The range – which tells us the difference between the max and the min – is 55.

To visualize the distribution of test scores, we can create a histogram – a type of chart that uses rectangular bars to represent frequencies.

[Histogram of the 1,000 students' test scores]

Based on this histogram, we can see that the distribution of test scores is roughly bell-shaped. Most of the students scored between 70 and 90, while very few scored above 95 and fewer still scored below 50.

Another easy way to gain an understanding of the distribution of scores is to create a frequency table. For example, the following frequency table shows what percentage of students scored between various ranges:

[Frequency table: percentage of students scoring within each range]

We can see that just 4% of the students scored above a 95. We can also see that 25% of all students (12% + 9% + 4%) scored an 85 or higher.

A frequency table is particularly helpful if we want to know what percentage of the data values fall above or below a certain value. For example, suppose the school considers an “acceptable” test score to be any score above a 75.

By looking at the frequency table, we can easily see that 67% of the students (20% + 22% + 12% + 9% + 4%) received an acceptable test score.
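A frequency table like this can be built directly from the raw scores. The sketch below uses simulated scores, so the percentages won't reproduce the article's exact figures:

```python
# Build a frequency table of test scores and read off the share of
# "acceptable" scores (above 75). Scores are simulated, not the article's data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
scores = np.clip(rng.normal(loc=82, scale=9, size=1000), 0, 100)

bins = [0, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
freq_table = (pd.Series(pd.cut(scores, bins=bins))
                .value_counts(normalize=True, sort=False) * 100)
print(freq_table.round(1))                 # percent of students in each range

pct_acceptable = (scores > 75).mean() * 100
print(f"{pct_acceptable:.1f}% of students scored above 75")
```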

Inferential Statistics

In a nutshell, inferential statistics uses a small sample of data to draw inferences about the larger population that the sample came from.

For example, we might be interested in understanding the political preferences of millions of people in a country.

However, it would take too long and be too expensive to actually survey every individual in the country. Thus, we would instead take a smaller survey of, say, 1,000 people, and use the results of the survey to draw inferences about the population as a whole.

This is the whole premise behind inferential statistics – we want to answer some question about a population, so we obtain data for a small sample of that population and use the data from the sample to draw inferences about the population.

The Importance of a Representative Sample

In order to be confident in our ability to use a sample to draw inferences about a population, we need to make sure that we have a representative sample – that is, a sample in which the characteristics of the individuals in the sample closely match the characteristics of the overall population.

Ideally, we want our sample to be like a “mini version” of our population. So, if we want to draw inferences on a population of students composed of 50% girls and 50% boys, our sample would not be representative if it included 90% boys and only 10% girls.


If our sample is not similar to the overall population, then we cannot generalize the findings from the sample to the overall population with any confidence.

How to Obtain a Representative Sample

To maximize the chances that you obtain a representative sample, you need to focus on two things:

1. Make sure you use a random sampling method.

There are several different random sampling methods that you can use that are likely to produce a representative sample, including:

  • A simple random sample
  • A systematic random sample
  • A cluster random sample
  • A stratified random sample

Random sampling methods tend to produce representative samples because every member of the population has an equal chance of being included in the sample.
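As a quick sketch of what two of these methods look like in code (the population DataFrame and its grade_level column are made up for illustration):

```python
# Simple random sampling vs. stratified random sampling (hypothetical data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
population = pd.DataFrame({
    "student_id": range(10_000),
    "grade_level": rng.choice(["freshman", "sophomore", "junior", "senior"],
                              size=10_000),
})

# 1. Simple random sample: every member has an equal chance of selection.
simple_sample = population.sample(n=500, random_state=42)

# 2. Stratified random sample: draw the same fraction from each stratum so the
#    sample mirrors the population's composition on that variable.
stratified_sample = (population.groupby("grade_level", group_keys=False)
                               .apply(lambda g: g.sample(frac=0.05,
                                                         random_state=42)))

print(simple_sample["grade_level"].value_counts(normalize=True))
print(stratified_sample["grade_level"].value_counts(normalize=True))
```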

2. Make sure your sample size is large enough.

Along with using an appropriate sampling method, it’s important to ensure that the sample is large enough so that you have enough data to generalize to the larger population.

To determine how large your sample should be, you have to consider the population size you’re studying, the confidence level you’d like to use, and the margin of error you consider to be acceptable.

Fortunately, you can use online calculators to plug in these values and see how large your sample needs to be.
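Those calculators typically implement a standard formula. One common choice, which the article doesn't name, is Cochran's formula with a finite population correction; a rough sketch:

```python
# Rough sample-size estimate for a proportion (Cochran's formula with a
# finite population correction); this is what such calculators typically compute.
import math

def required_sample_size(population_size: int,
                         z: float = 1.96,            # z-score for 95% confidence
                         margin_of_error: float = 0.05,
                         proportion: float = 0.5) -> int:
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    n = n0 / (1 + (n0 - 1) / population_size)        # finite population correction
    return math.ceil(n)

print(required_sample_size(100_000))  # about 383 respondents needed
```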

Common Forms of Inferential Statistics

There are three common forms of inferential statistics:

1. Hypothesis Tests.

Often we’re interested in answering questions about a population such as:

  • Is the percentage of people in Ohio in support of candidate A higher than 50%?
  • Is the mean height of a certain plant equal to 14 inches?
  • Is there a difference between the mean height of students at School A compared to School B?

To answer these questions we can perform a hypothesis test, which allows us to use data from a sample to draw conclusions about populations.
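For instance, the plant-height question above could be checked with a one-sample t-test. The sketch below uses simulated heights, so it's illustrative rather than real data:

```python
# One-sample t-test: is the mean plant height equal to 14 inches?
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
heights = rng.normal(loc=14.3, scale=1.2, size=40)   # simulated sample of 40 plants

t_stat, p_value = stats.ttest_1samp(heights, popmean=14)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

alpha = 0.05
if p_value < alpha:
    print("Reject H0: the mean height likely differs from 14 inches.")
else:
    print("Fail to reject H0: no evidence the mean differs from 14 inches.")
```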

2. Confidence Intervals . 

Sometimes we’re interested in estimating some value for a population. For example, we might be interested in the mean height of a certain plant species in Australia.

Instead of going around and measuring every single plant in the country, we might collect a small sample of plants and measure each one. Then, we can use the mean height of the plants in the sample to estimate the mean height for the population.

However, our sample is unlikely to provide a perfect estimate for the population. Fortunately, we can account for this uncertainty by creating a confidence interval, which provides a range of values that we're confident the true population parameter falls in.

For example, we might produce a 95% confidence interval of [13.2, 14.8], which says we’re 95% confident that the true mean height of this plant species is between 13.2 inches and 14.8 inches.
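A sketch of that calculation, with simulated heights (so the interval won't match the [13.2, 14.8] example), might look like this:

```python
# 95% confidence interval for a mean, using a t-distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
heights = rng.normal(loc=14.0, scale=1.5, size=35)   # simulated plant heights

mean = heights.mean()
sem = stats.sem(heights)                             # standard error of the mean
ci_low, ci_high = stats.t.interval(0.95, len(heights) - 1, loc=mean, scale=sem)
print(f"95% CI for the mean height: [{ci_low:.1f}, {ci_high:.1f}] inches")
```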

3. Regression .

Sometimes we’re interested in understanding the relationship between two variables in a population.

For example, suppose we want to know if hours spent studying per week is related to test scores. To answer this question, we could perform a technique known as regression analysis.

So, we may observe the number of hours studied along with the test scores for 100 students and perform a regression analysis to see if there is a significant relationship between the two variables.

If the p-value of the regression slope turns out to be statistically significant, then we can conclude that there is a significant relationship between these two variables in the overall population of students.
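A minimal version of that analysis, with simulated data for 100 students (so the relationship is built in by construction), could look like this:

```python
# Simple linear regression of test score on hours studied (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
hours = rng.uniform(0, 20, size=100)                    # hours studied per week
scores = 60 + 1.5 * hours + rng.normal(0, 8, size=100)  # scores with noise

result = stats.linregress(hours, scores)
print(f"slope = {result.slope:.2f}, p-value = {result.pvalue:.4f}")

if result.pvalue < 0.05:
    print("Significant relationship between hours studied and test scores.")
```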

The Difference Between Descriptive and Inferential Statistics

In summary, the difference between descriptive and inferential statistics can be described as follows:

Descriptive statistics use summary statistics, graphs, and tables to describe a data set.

This is useful for helping us gain a quick and easy understanding of a data set without poring over all of the individual data values.

Inferential statistics use samples to draw inferences about larger populations.

Depending on the question you want to answer about a population, you may decide to use one or more of the following methods: hypothesis tests, confidence intervals, and regression analysis.

If you do choose to use one of these methods, keep in mind that your sample needs to be representative of your population, or the conclusions you draw will be unreliable.

