A Refresher on Regression Analysis

Understanding one of the most important types of data analysis.

You probably know by now that whenever possible you should be making data-driven decisions at work. But do you know how to parse through all the data available to you? The good news is that you probably don’t need to do the number crunching yourself (hallelujah!), but you do need to correctly understand and interpret the analysis created by your colleagues. One of the most important types of data analysis is called regression analysis.

  • Amy Gallo is a contributing editor at Harvard Business Review, cohost of the Women at Work podcast, and the author of two books: Getting Along: How to Work with Anyone (Even Difficult People) and the HBR Guide to Dealing with Conflict. She writes and speaks about workplace dynamics.

Research Method

Regression Analysis – Methods, Types and Examples

Regression Analysis

Regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (or ‘predictors’).

Regression Analysis Methodology

Here is a general methodology for performing regression analysis:

  • Define the research question: Clearly state the research question or hypothesis you want to investigate. Identify the dependent variable (also called the response variable or outcome variable) and the independent variables (also called predictor variables or explanatory variables) that you believe are related to the dependent variable.
  • Collect data: Gather the data for the dependent variable and independent variables. Ensure that the data is relevant, accurate, and representative of the population or phenomenon you are studying.
  • Explore the data: Perform exploratory data analysis to understand the characteristics of the data, identify any missing values or outliers, and assess the relationships between variables through scatter plots, histograms, or summary statistics.
  • Choose the regression model: Select an appropriate regression model based on the nature of the variables and the research question. Common regression models include linear regression, multiple regression, logistic regression, polynomial regression, and time series regression, among others.
  • Assess assumptions: Check the assumptions of the regression model. Some common assumptions include linearity (the relationship between variables is linear), independence of errors, homoscedasticity (constant variance of errors), and normality of errors. Violation of these assumptions may require additional steps or alternative models.
  • Estimate the model: Use a suitable method to estimate the parameters of the regression model. The most common method is ordinary least squares (OLS), which minimizes the sum of squared differences between the observed and predicted values of the dependent variable.
  • Interpret the results: Analyze the estimated coefficients, p-values, confidence intervals, and goodness-of-fit measures (e.g., R-squared) to interpret the results. Determine the significance and direction of the relationships between the independent variables and the dependent variable.
  • Evaluate model performance: Assess the overall performance of the regression model using appropriate measures, such as R-squared, adjusted R-squared, and root mean squared error (RMSE). These measures indicate how well the model fits the data and how much of the variation in the dependent variable is explained by the independent variables.
  • Test assumptions and diagnose problems: Check the residuals (the differences between observed and predicted values) for any patterns or deviations from assumptions. Conduct diagnostic tests, such as examining residual plots, testing for multicollinearity among independent variables, and assessing heteroscedasticity or autocorrelation, if applicable.
  • Make predictions and draw conclusions: Once you have a satisfactory model, use it to make predictions on new or unseen data. Draw conclusions based on the results of the analysis, considering the limitations and potential implications of the findings.
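The "Evaluate model performance" step above can be sketched in a few lines. This is a minimal illustration, not part of the original article: the function name `evaluate_fit` and the sample values are hypothetical, and the predictions are assumed to come from a model that has already been fitted.

```python
import math

def evaluate_fit(observed, predicted, n_predictors):
    """Return (R-squared, adjusted R-squared, RMSE) for a fitted model."""
    n = len(observed)
    mean_y = sum(observed) / n
    # Residual sum of squares: variation the model leaves unexplained.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    # Total sum of squares: variation of the outcome around its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R-squared penalizes the number of predictors used.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    rmse = math.sqrt(ss_res / n)
    return r2, adj_r2, rmse

# Hypothetical observed values and model predictions.
observed = [3.0, 5.0, 7.0, 9.0, 11.0]
predicted = [3.2, 4.8, 7.1, 8.9, 11.0]
r2, adj_r2, rmse = evaluate_fit(observed, predicted, n_predictors=1)
print(f"R2={r2:.4f}  adjusted R2={adj_r2:.4f}  RMSE={rmse:.4f}")
```

Adjusted R-squared is never larger than R-squared, which is why it is often preferred when comparing models with different numbers of predictors.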

Types of Regression Analysis

The main types of regression analysis are as follows:

Linear Regression

Linear regression is the most basic and widely used form of regression analysis. It models the linear relationship between a dependent variable and one or more independent variables. The goal is to find the best-fitting line that minimizes the sum of squared differences between observed and predicted values.

Multiple Regression

Multiple regression extends linear regression by incorporating two or more independent variables to predict the dependent variable. It allows for examining the simultaneous effects of multiple predictors on the outcome variable.

Polynomial Regression

Polynomial regression models non-linear relationships between variables by adding polynomial terms (e.g., squared or cubic terms) to the regression equation. It can capture curved or nonlinear patterns in the data.

Logistic Regression

Logistic regression is used when the dependent variable is binary or categorical. It models the probability of the occurrence of a certain event or outcome based on the independent variables. The coefficients are typically estimated by maximum likelihood, and the logistic function transforms the linear combination of predictors into a probability.

Ridge Regression and Lasso Regression

Ridge regression and Lasso regression are techniques used for addressing multicollinearity (high correlation between independent variables) and variable selection. Both methods add a penalty term to the least-squares objective that shrinks the coefficients. Ridge regression uses L2 regularization, which shrinks coefficients toward zero without setting them exactly to zero; Lasso regression uses L1 regularization, which can shrink some coefficients exactly to zero and so removes those variables from the model.
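The difference between the two penalties is easiest to see with a single centered predictor and no intercept, where both estimators have a closed form. The sketch below assumes the objectives Σ(y − βx)² + λβ² for ridge and Σ(y − βx)² + λ|β| for Lasso; the function names and data are hypothetical.

```python
import math

def ridge_coefficient(x, y, lam):
    """Minimize sum((y - b*x)^2) + lam * b^2 for one centered predictor."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

def lasso_coefficient(x, y, lam):
    """Minimize sum((y - b*x)^2) + lam * |b| via soft-thresholding."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)
    return math.copysign(shrunk, sxy) / sxx

# Hypothetical centered data (both x and y have mean zero).
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-1.0, -0.4, 0.1, 0.5, 0.8]

ols = ridge_coefficient(x, y, lam=0.0)     # no penalty: ordinary least squares
ridge = ridge_coefficient(x, y, lam=10.0)  # shrunk, but still non-zero
lasso = lasso_coefficient(x, y, lam=10.0)  # shrunk all the way to zero
print(ols, ridge, lasso)
```

With the same penalty strength, the ridge estimate is halved while the Lasso estimate is exactly zero, which is the behaviour that makes Lasso usable for variable selection.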

Time Series Regression

Time series regression analyzes the relationship between a dependent variable and independent variables when the data is collected over time. It accounts for autocorrelation and trends in the data and is used in forecasting and studying temporal relationships.

Nonlinear Regression

Nonlinear regression models are used when the relationship between the dependent variable and independent variables is not linear. These models can take various functional forms and require estimation techniques different from those used in linear regression.

Poisson Regression

Poisson regression is employed when the dependent variable represents count data. It models the relationship between the independent variables and the expected count, assuming a Poisson distribution for the dependent variable.
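As a rough illustration, the sketch below assumes the usual log link, under which the expected count is exp(β0 + β1X). The coefficient values are hypothetical and not estimated from any data.

```python
import math

def expected_count(beta0, beta1, x):
    """Expected count under a Poisson model with a log link."""
    return math.exp(beta0 + beta1 * x)

# Hypothetical coefficients, for illustration only.
beta0, beta1 = 0.5, 0.2

count_at_3 = expected_count(beta0, beta1, 3)
count_at_4 = expected_count(beta0, beta1, 4)
# A one-unit increase in x multiplies the expected count by exp(beta1).
rate_ratio = count_at_4 / count_at_3
print(count_at_3, count_at_4, rate_ratio)
```

Because of the log link, coefficients act multiplicatively on the expected count rather than additively as in linear regression.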

Generalized Linear Models (GLM)

GLMs are a flexible class of regression models that extend the linear regression framework to handle different types of dependent variables, including binary, count, and continuous variables. GLMs incorporate various probability distributions and link functions.

Regression Analysis Formulas

Regression analysis involves estimating the parameters of a regression model to describe the relationship between the dependent variable (Y) and one or more independent variables (X). Here are the basic formulas for linear regression, multiple regression, and logistic regression:

Linear Regression:

Simple Linear Regression Model: Y = β0 + β1X + ε

Multiple Linear Regression Model: Y = β0 + β1X1 + β2X2 + … + βnXn + ε

In both formulas:

  • Y represents the dependent variable (response variable).
  • X represents the independent variable(s) (predictor variable(s)).
  • β0, β1, β2, …, βn are the regression coefficients or parameters that need to be estimated.
  • ε represents the error term: the part of Y that the linear combination of predictors does not explain. Its sample estimate is the residual (the difference between the observed and predicted values).
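For the simple linear model, the ordinary least squares estimates have a closed form: β1 is the ratio of the covariation of X and Y to the variation of X, and β0 makes the line pass through the point of means. The sketch below uses hypothetical data, and `fit_simple_ols` is an illustrative name.

```python
def fit_simple_ols(x, y):
    """Return (beta0, beta1) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

beta0, beta1 = fit_simple_ols(x, y)
# Residuals: observed minus predicted values.
residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
print(beta0, beta1, residuals)
```

When the model includes an intercept, the residuals always sum to zero (up to rounding), which is a quick sanity check on a fit.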

Multiple Regression:

Multiple regression extends the concept of simple linear regression by including multiple independent variables.

Multiple Regression Model: Y = β0 + β1X1 + β2X2 + … + βnXn + ε

The formulas are similar to those in linear regression, with the addition of more independent variables.
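With several predictors, the least-squares coefficients solve the normal equations (XᵀX)β = XᵀY. The sketch below builds and solves that system in plain Python; the helper names are hypothetical, and in practice a statistics package would normally do this.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit_multiple_ols(rows, y):
    """rows holds [X1, ..., Xn] per observation; returns [b0, b1, ..., bn]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical noise-free data generated from Y = 1 + 2*X1 + 3*X2.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefs = fit_multiple_ols(rows, y)
print(coefs)  # approximately [1.0, 2.0, 3.0]
```

If two predictors are perfectly collinear, XᵀX becomes singular and this system has no unique solution, which is one reason multicollinearity is checked before fitting.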

Logistic Regression:

Logistic regression is used when the dependent variable is binary or categorical. The logistic regression model applies a logistic or sigmoid function to the linear combination of the independent variables.

Logistic Regression Model: p = 1 / (1 + e^-(β0 + β1X1 + β2X2 + … + βnXn))

In the formula:

  • p represents the probability of the event occurring (e.g., the probability of success or belonging to a certain category).
  • X1, X2, …, Xn represent the independent variables.
  • e is the base of the natural logarithm.

The logistic function ensures that the predicted probabilities lie between 0 and 1, allowing for binary classification.
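The logistic formula above translates directly into code. In this sketch the coefficients are hypothetical rather than estimated, and the 0.5 cut-off used for classification is a common convention, not something the model itself prescribes.

```python
import math

def predict_probability(coefs, x):
    """coefs = [b0, b1, ..., bn]; x = [X1, ..., Xn]. Returns p in (0, 1)."""
    z = coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for a single predictor.
coefs = [-1.5, 0.8]

p_low = predict_probability(coefs, [0.0])     # linear predictor is -1.5
p_mid = predict_probability(coefs, [1.875])   # linear predictor is about 0
p_high = predict_probability(coefs, [5.0])    # linear predictor is 2.5

# Assumed decision rule: classify as an event when p >= 0.5.
labels = [int(p >= 0.5) for p in (p_low, p_high)]
print(p_low, p_mid, p_high, labels)
```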

Regression Analysis Examples

Some examples of regression analysis in practice:

  • Stock Market Prediction: Regression analysis can be used to predict stock prices based on various factors such as historical prices, trading volume, news sentiment, and economic indicators. Traders and investors can use this analysis to make informed decisions about buying or selling stocks.
  • Demand Forecasting: In retail and e-commerce, regression analysis can help forecast demand for products. By analyzing historical sales data along with real-time data such as website traffic, promotional activities, and market trends, businesses can adjust their inventory levels and production schedules to meet customer demand more effectively.
  • Energy Load Forecasting: Utility companies often use real-time regression analysis to forecast electricity demand. By analyzing historical energy consumption data, weather conditions, and other relevant factors, they can predict future energy loads. This information helps them optimize power generation and distribution, ensuring a stable and efficient energy supply.
  • Online Advertising Performance: Regression analysis can be used to assess the performance of online advertising campaigns. By analyzing real-time data on ad impressions, click-through rates, conversion rates, and other metrics, advertisers can adjust their targeting, messaging, and ad placement strategies to maximize their return on investment.
  • Predictive Maintenance: Regression analysis can be applied to predict equipment failures or maintenance needs. By continuously monitoring sensor data from machines or vehicles, regression models can identify patterns or anomalies that indicate potential failures. This enables proactive maintenance, reducing downtime and optimizing maintenance schedules.
  • Financial Risk Assessment: Real-time regression analysis can help financial institutions assess the risk associated with lending or investment decisions. By analyzing real-time data on factors such as borrower financials, market conditions, and macroeconomic indicators, regression models can estimate the likelihood of default or assess the risk-return tradeoff for investment portfolios.

Importance of Regression Analysis

Regression analysis is important for the following reasons:

  • Relationship Identification: Regression analysis helps in identifying and quantifying the relationship between a dependent variable and one or more independent variables. It allows us to determine how changes in independent variables impact the dependent variable. This information is crucial for decision-making, planning, and forecasting.
  • Prediction and Forecasting: Regression analysis enables us to make predictions and forecasts based on the relationships identified. By estimating the values of the dependent variable using known values of independent variables, regression models can provide valuable insights into future outcomes. This is particularly useful in business, economics, finance, and other fields where forecasting is vital for planning and strategy development.
  • Causality Assessment: Correlation does not imply causation, and regression on its own does not prove it either. It does, however, provide a framework for examining possible causal relationships: researchers can control for other factors and estimate the association between a specific independent variable and the dependent variable with those factors held constant. Combined with a sound study design, this helps identify the factors that plausibly influence outcomes.
  • Model Building and Variable Selection: Regression analysis aids in model building by determining the most appropriate functional form of the relationship between variables. It helps researchers select relevant independent variables and eliminate irrelevant ones, reducing complexity and improving model accuracy. This process is crucial for creating robust and interpretable models.
  • Hypothesis Testing: Regression analysis provides a statistical framework for hypothesis testing. Researchers can test the significance of individual coefficients, assess the overall model fit, and determine if the relationship between variables is statistically significant. This allows for rigorous analysis and validation of research hypotheses.
  • Policy Evaluation and Decision-Making: Regression analysis plays a vital role in policy evaluation and decision-making processes. By analyzing historical data, researchers can evaluate the effectiveness of policy interventions and identify the key factors contributing to certain outcomes. This information helps policymakers make informed decisions, allocate resources effectively, and optimize policy implementation.
  • Risk Assessment and Control: Regression analysis can be used for risk assessment and control purposes. By analyzing historical data, organizations can identify risk factors and develop models that predict the likelihood of certain outcomes, such as defaults, accidents, or failures. This enables proactive risk management, allowing organizations to take preventive measures and mitigate potential risks.

When to Use Regression Analysis

  • Prediction: Regression analysis is often employed to predict the value of the dependent variable based on the values of independent variables. For example, you might use regression to predict sales based on advertising expenditure, or to predict a student’s academic performance based on variables like study time, attendance, and previous grades.
  • Relationship analysis: Regression can help determine the strength and direction of the relationship between variables. It can be used to examine whether there is a linear association between variables, identify which independent variables have a significant impact on the dependent variable, and quantify the magnitude of those effects.
  • Causal inference: Regression analysis can be used to explore cause-and-effect relationships by controlling for other variables. For example, in a medical study, you might use regression to determine the impact of a specific treatment while accounting for other factors like age, gender, and lifestyle.
  • Forecasting: Regression models can be utilized to forecast future trends or outcomes. By fitting a regression model to historical data, you can make predictions about future values of the dependent variable based on changes in the independent variables.
  • Model evaluation: Regression analysis can be used to evaluate the performance of a model or test the significance of variables. You can assess how well the model fits the data, determine if additional variables improve the model’s predictive power, or test the statistical significance of coefficients.
  • Data exploration: Regression analysis can help uncover patterns and insights in the data. By examining the relationships between variables, you can gain a deeper understanding of the data set and identify potential patterns, outliers, or influential observations.

Applications of Regression Analysis

Here are some common applications of regression analysis:

  • Economic Forecasting: Regression analysis is frequently employed in economics to forecast variables such as GDP growth, inflation rates, or stock market performance. By analyzing historical data and identifying the underlying relationships, economists can make predictions about future economic conditions.
  • Financial Analysis: Regression analysis plays a crucial role in financial analysis, such as predicting stock prices or evaluating the impact of financial factors on company performance. It helps analysts understand how variables like interest rates, company earnings, or market indices influence financial outcomes.
  • Marketing Research: Regression analysis helps marketers understand consumer behavior and make data-driven decisions. It can be used to predict sales based on advertising expenditures, pricing strategies, or demographic variables. Regression models provide insights into which marketing efforts are most effective and help optimize marketing campaigns.
  • Health Sciences: Regression analysis is extensively used in medical research and public health studies. It helps examine the relationship between risk factors and health outcomes, such as the impact of smoking on lung cancer or the relationship between diet and heart disease. Regression analysis also helps in predicting health outcomes based on various factors like age, genetic markers, or lifestyle choices.
  • Social Sciences: Regression analysis is widely used in social sciences like sociology, psychology, and education research. Researchers can investigate the impact of variables like income, education level, or social factors on various outcomes such as crime rates, academic performance, or job satisfaction.
  • Operations Research: Regression analysis is applied in operations research to optimize processes and improve efficiency. For example, it can be used to predict demand based on historical sales data, determine the factors influencing production output, or optimize supply chain logistics.
  • Environmental Studies: Regression analysis helps in understanding and predicting environmental phenomena. It can be used to analyze the impact of factors like temperature, pollution levels, or land use patterns on phenomena such as species diversity, water quality, or climate change.
  • Sports Analytics: Regression analysis is increasingly used in sports analytics to gain insights into player performance, team strategies, and game outcomes. It helps analyze the relationship between various factors like player statistics, coaching strategies, or environmental conditions and their impact on game outcomes.

Advantages and Disadvantages of Regression Analysis

About the author.

Muhammad Hassan

Researcher, Academic Writer, Web developer

Research-Methodology

Regression Analysis

Regression analysis is a quantitative research method used when a study involves modelling and analysing several variables, where the relationship includes a dependent variable and one or more independent variables. In simple terms, it is used to test the nature of the relationship between a dependent variable and one or more independent variables.

The basic form of regression models includes unknown parameters (β), independent variables (X), and the dependent variable (Y).

A regression model specifies the dependent variable (Y) as a function of the independent variables (X) and the unknown parameters (β):

                                    Y  ≈  f (X, β)   

A regression equation can be used to predict the value of ‘y’ when the value of ‘x’ is given, where ‘x’ and ‘y’ are two sets of measurements from a sample of size ‘n’. For the simple linear case y = a + bx, the standard least-squares formulae are:

b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)

a = (Σy − bΣx) / n

Do not be intimidated by the visual complexity of the correlation and regression formulae above. You don’t have to apply the formulae manually; correlation and regression analyses can be run in popular analytical software such as Microsoft Excel, Microsoft Access, SPSS and others.

Linear regression analysis is based on the following set of assumptions:

1. Assumption of linearity. There is a linear relationship between the dependent and independent variables.

2. Assumption of homoscedasticity. The errors (residuals) have constant variance across all values of the independent variables.

3. Assumption of absence of collinearity or multicollinearity. The independent variables are not highly correlated with one another.

4. Assumption of normal distribution. The errors (residuals) are normally distributed; the variables themselves do not need to be.

John Dudovskiy

  • Published: 31 January 2022

The clinician’s guide to interpreting a regression analysis

  • Sofia Bzovsky 1 ,
  • Mark R. Phillips   ORCID: orcid.org/0000-0003-0923-261X 2 ,
  • Robyn H. Guymer   ORCID: orcid.org/0000-0002-9441-4356 3 , 4 ,
  • Charles C. Wykoff 5 , 6 ,
  • Lehana Thabane   ORCID: orcid.org/0000-0003-0355-9734 2 , 7 ,
  • Mohit Bhandari   ORCID: orcid.org/0000-0001-9608-4808 1 , 2 &
  • Varun Chaudhary   ORCID: orcid.org/0000-0002-9988-4146 1 , 2

on behalf of the R.E.T.I.N.A. study group

Eye volume 36, pages 1715–1717 (2022)

  • Outcomes research

Introduction

When researchers are conducting clinical studies to investigate factors associated with, or treatments for disease and conditions to improve patient care and clinical practice, statistical evaluation of the data is often necessary. Regression analysis is an important statistical method that is commonly used to determine the relationship between several factors and disease outcomes or to identify relevant prognostic factors for diseases [ 1 ].

This editorial will acquaint readers with the basic principles of and an approach to interpreting results from two types of regression analyses widely used in ophthalmology: linear, and logistic regression.

Linear regression analysis

Linear regression is used to quantify a linear relationship or association between a continuous response/outcome variable or dependent variable with at least one independent or explanatory variable by fitting a linear equation to observed data [ 1 ]. The variable that the equation solves for, which is the outcome or response of interest, is called the dependent variable [ 1 ]. The variable that is used to explain the value of the dependent variable is called the predictor, explanatory, or independent variable [ 1 ].

In a linear regression model, the dependent variable must be continuous (e.g. intraocular pressure or visual acuity), whereas, the independent variable may be either continuous (e.g. age), binary (e.g. sex), categorical (e.g. age-related macular degeneration stage or diabetic retinopathy severity scale score), or a combination of these [ 1 ].

When investigating the effect or association of a single independent variable on a continuous dependent variable, this type of analysis is called a simple linear regression [ 2 ]. In many circumstances though, a single independent variable may not be enough to adequately explain the dependent variable. Often it is necessary to control for confounders and in these situations, one can perform a multivariable linear regression to study the effect or association with multiple independent variables on the dependent variable [ 1 , 2 ]. When incorporating numerous independent variables, the regression model estimates the effect or contribution of each independent variable while holding the values of all other independent variables constant [ 3 ].

When interpreting the results of a linear regression, there are a few key outputs for each independent variable included in the model:

Estimated regression coefficient: The estimated regression coefficient indicates the direction and strength of the relationship or association between the independent and dependent variables [ 4 ]. Specifically, the regression coefficient describes the average change in the dependent variable for each one-unit change in the independent variable, if continuous [ 4 ]. For instance, if examining the relationship between a continuous predictor variable and intra-ocular pressure (dependent variable), a regression coefficient of 2 means that for every one-unit increase in the predictor, there is, on average, a two-unit increase in intra-ocular pressure. If the independent variable is binary or categorical, then the one-unit change represents switching from the reference category to another category [ 4 ]. For instance, if examining the relationship between a binary predictor variable, such as sex, where ‘female’ is set as the reference category, and intra-ocular pressure (dependent variable), a regression coefficient of 2 means that, on average, males have an intra-ocular pressure that is 2 mm Hg higher than females.

Confidence Interval (CI): The CI, typically set at 95%, is a measure of the precision of the coefficient estimate of the independent variable [ 4 ]. A wide CI indicates a low level of precision, whereas a narrow CI indicates higher precision [ 5 ].

P value: The p value for the regression coefficient indicates whether the relationship between the independent and dependent variables is statistically significant [ 6 ].
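The interpretation of a coefficient for a binary predictor can be checked numerically: with the reference category coded 0 and the other category coded 1, the least-squares slope equals the difference between the two group means, and the intercept equals the reference-group mean. The intra-ocular pressure values below are invented for illustration only.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical intra-ocular pressure readings (mm Hg).
female_iop = [14.0, 15.0, 16.0]   # reference category, coded 0
male_iop = [16.0, 17.0, 18.0]     # coded 1

sex = [0] * len(female_iop) + [1] * len(male_iop)
iop = female_iop + male_iop

intercept, slope = fit_simple_ols(sex, iop)
mean_difference = sum(male_iop) / len(male_iop) - sum(female_iop) / len(female_iop)
print(intercept, slope, mean_difference)
```

Here the slope of 2 reads exactly as in the text: on average, the group coded 1 is 2 mm Hg higher than the reference group.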

Logistic regression analysis

As with linear regression, logistic regression is used to estimate the association between one or more independent variables with a dependent variable [ 7 ]. However, the distinguishing feature in logistic regression is that the dependent variable (outcome) must be binary (or dichotomous), meaning that the variable can only take two different values or levels, such as ‘1 versus 0’ or ‘yes versus no’ [ 2 , 7 ]. The effect size of predictor variables on the dependent variable is best explained using an odds ratio (OR) [ 2 ]. ORs are used to compare the relative odds of the occurrence of the outcome of interest, given exposure to the variable of interest [ 5 ]. An OR equal to 1 means that the odds of the event in one group are the same as the odds of the event in another group; there is no difference [ 8 ]. An OR > 1 implies that one group has a higher odds of having the event compared with the reference group, whereas an OR < 1 means that one group has a lower odds of having an event compared with the reference group [ 8 ]. When interpreting the results of a logistic regression, the key outputs include the OR, CI, and p-value for each independent variable included in the model.
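Two standard ways of arriving at an odds ratio are sketched below: exponentiating a logistic regression coefficient, and dividing the odds in one group by the odds in the reference group. The counts and function names are hypothetical.

```python
import math

def odds_ratio_from_coefficient(beta):
    """A logistic regression coefficient is a log odds ratio."""
    return math.exp(beta)

def odds_ratio_from_counts(events, non_events, ref_events, ref_non_events):
    """Odds of the event in one group divided by odds in the reference group."""
    return (events / non_events) / (ref_events / ref_non_events)

# Hypothetical counts: 20 of 100 exposed patients had the event,
# compared with 10 of 100 patients in the reference group.
or_counts = odds_ratio_from_counts(20, 80, 10, 90)

# A coefficient of 0 corresponds to OR = 1, i.e. no difference in odds.
or_null = odds_ratio_from_coefficient(0.0)
# A negative coefficient corresponds to OR < 1 (lower odds than reference).
or_lower = odds_ratio_from_coefficient(-0.5)
print(or_counts, or_null, or_lower)
```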

Clinical example

Sen et al. investigated the association between several factors (independent variables) and visual acuity outcomes (dependent variable) in patients receiving anti-vascular endothelial growth factor therapy for macular oedema secondary to central retinal vein occlusion, by means of both linear and logistic regression [ 9 ]. Multivariable linear regression demonstrated that age (Estimate −0.33, 95% CI −0.48 to −0.19, p < 0.001) was significantly associated with best-corrected visual acuity (BCVA) at 100 weeks at the alpha = 0.05 significance level [ 9 ]. The regression coefficient of −0.33 means that the BCVA at 100 weeks decreases by 0.33, on average, with each additional year of age.

Multivariable logistic regression also demonstrated that age and ellipsoid zone status were significantly associated with achieving a BCVA letter score >70 letters at 100 weeks at the alpha = 0.05 significance level. Patients ≥75 years of age had decreased odds of achieving a BCVA letter score >70 letters at 100 weeks compared to those <50 years of age, since the OR is less than 1 (OR 0.96, 95% CI 0.94 to 0.98, p = 0.001) [ 9 ]. Similarly, patients between the ages of 50 and 74 years also had decreased odds of achieving a BCVA letter score >70 letters at 100 weeks compared to those <50 years of age, since the OR is less than 1 (OR 0.15, 95% CI 0.04 to 0.48, p = 0.001) [ 9 ]. Likewise, those with a non-intact ellipsoid zone had decreased odds of achieving a BCVA letter score >70 letters at 100 weeks compared to those with an intact ellipsoid zone (OR 0.20, 95% CI 0.07 to 0.56; p = 0.002). On the other hand, patients with an ungradable/questionable ellipsoid zone had increased odds of achieving a BCVA letter score >70 letters at 100 weeks compared to those with an intact ellipsoid zone, since the OR is greater than 1 (OR 2.26, 95% CI 1.14 to 4.48; p = 0.02) [ 9 ].

The narrower the CI, the more precise the estimate is; and the smaller the p value (relative to alpha = 0.05), the greater the evidence against the null hypothesis of no effect or association.
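The link between precision and CI width can be sketched with the common normal approximation, in which a 95% CI is the estimate plus or minus 1.96 standard errors. That approximation is an assumption here (statistical software often uses a t distribution), and the estimates and standard errors below are hypothetical.

```python
def confidence_interval(estimate, std_error, z=1.96):
    """Approximate 95% CI using the normal approximation (assumed here)."""
    return estimate - z * std_error, estimate + z * std_error

def excludes_zero(interval):
    """For a linear regression coefficient, 0 means no association."""
    lower, upper = interval
    return lower > 0 or upper < 0

# The same hypothetical coefficient estimated with different precision.
precise = confidence_interval(2.0, std_error=0.5)
imprecise = confidence_interval(2.0, std_error=1.5)

precise_width = precise[1] - precise[0]
imprecise_width = imprecise[1] - imprecise[0]
print(precise, excludes_zero(precise))
print(imprecise, excludes_zero(imprecise))
```

For odds ratios the same check is made against 1 rather than 0, because an OR of 1 corresponds to no difference in odds.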

Simply put, linear and logistic regression are useful tools for appreciating the relationship between predictor/explanatory and outcome variables for continuous and dichotomous outcomes, respectively, that can be applied in clinical practice, such as to gain an understanding of risk factors associated with a disease of interest.

References

1. Schneider A, Hommel G, Blettner M. Linear regression analysis. Dtsch Ärztebl Int. 2010;107:776–82.
2. Bender R. Introduction to the use of regression models in epidemiology. In: Verma M, editor. Cancer epidemiology. Methods in molecular biology. Humana Press; 2009:179–95.
3. Schober P, Vetter TR. Confounding in observational research. Anesth Analg. 2020;130:635.
4. Schober P, Vetter TR. Linear regression in medical research. Anesth Analg. 2021;132:108–9.
5. Szumilas M. Explaining odds ratios. J Can Acad Child Adolesc Psychiatry. 2010;19:227–9.
6. Thiese MS, Ronna B, Ott U. P value interpretations and considerations. J Thorac Dis. 2016;8:E928–31.
7. Schober P, Vetter TR. Logistic regression in medical research. Anesth Analg. 2021;132:365–6.
8. Zabor EC, Reddy CA, Tendulkar RD, Patil S. Logistic regression in clinical studies. Int J Radiat Oncol Biol Phys. 2022;112:271–7.
9. Sen P, Gurudas S, Ramu J, Patrao N, Chandra S, Rasheed R, et al. Predictors of visual acuity outcomes after anti-vascular endothelial growth factor treatment for macular edema secondary to central retinal vein occlusion. Ophthalmol Retin. 2021;5:1115–24.

R.E.T.I.N.A. study group

Varun Chaudhary 1,2 , Mohit Bhandari 1,2 , Charles C. Wykoff 5,6 , Sobha Sivaprasad 8 , Lehana Thabane 2,7 , Peter Kaiser 9 , David Sarraf 10 , Sophie J. Bakri 11 , Sunir J. Garg 12 , Rishi P. Singh 13,14 , Frank G. Holz 15 , Tien Y. Wong 16,17 , and Robyn H. Guymer 3,4

Author information

Authors and affiliations.

Department of Surgery, McMaster University, Hamilton, ON, Canada

Sofia Bzovsky, Mohit Bhandari & Varun Chaudhary

Department of Health Research Methods, Evidence & Impact, McMaster University, Hamilton, ON, Canada

Mark R. Phillips, Lehana Thabane, Mohit Bhandari & Varun Chaudhary

Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, Australia

Robyn H. Guymer

Department of Surgery (Ophthalmology), The University of Melbourne, Melbourne, VIC, Australia

Retina Consultants of Texas (Retina Consultants of America), Houston, TX, USA

Charles C. Wykoff

Blanton Eye Institute, Houston Methodist Hospital, Houston, TX, USA

Biostatistics Unit, St. Joseph’s Healthcare Hamilton, Hamilton, ON, Canada

Lehana Thabane

NIHR Moorfields Biomedical Research Centre, Moorfields Eye Hospital, London, UK

Sobha Sivaprasad

Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA

Peter Kaiser

Retinal Disorders and Ophthalmic Genetics, Stein Eye Institute, University of California, Los Angeles, CA, USA

David Sarraf

Department of Ophthalmology, Mayo Clinic, Rochester, MN, USA

Sophie J. Bakri

The Retina Service at Wills Eye Hospital, Philadelphia, PA, USA

Sunir J. Garg

Center for Ophthalmic Bioinformatics, Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA

Rishi P. Singh

Cleveland Clinic Lerner College of Medicine, Cleveland, OH, USA

Department of Ophthalmology, University of Bonn, Bonn, Germany

Frank G. Holz

Singapore Eye Research Institute, Singapore, Singapore

Tien Y. Wong

Singapore National Eye Centre, Duke-NUS Medical School, Singapore, Singapore


Contributions

SB was responsible for writing, critical review and feedback on manuscript. MRP was responsible for conception of idea, critical review and feedback on manuscript. RHG was responsible for critical review and feedback on manuscript. CCW was responsible for critical review and feedback on manuscript. LT was responsible for critical review and feedback on manuscript. MB was responsible for conception of idea, critical review and feedback on manuscript. VC was responsible for conception of idea, critical review and feedback on manuscript.

Corresponding author

Correspondence to Varun Chaudhary .

Ethics declarations

Competing interests.

SB: Nothing to disclose. MRP: Nothing to disclose. RHG: Advisory boards: Bayer, Novartis, Apellis, Roche, Genentech Inc. (unrelated to this study). CCW: Consultant: Acuela, Adverum Biotechnologies, Inc, Aerpio, Alimera Sciences, Allegro Ophthalmics, LLC, Allergan, Apellis Pharmaceuticals, Bayer AG, Chengdu Kanghong Pharmaceutical Group Co, Ltd, Clearside Biomedical, DORC (Dutch Ophthalmic Research Center), EyePoint Pharmaceuticals, Genentech/Roche, GyroscopeTx, IVERIC bio, Kodiak Sciences Inc, Novartis AG, ONL Therapeutics, Oxurion NV, PolyPhotonix, Recens Medical, Regeneron Pharmaceuticals, Inc, REGENXBIO Inc, Santen Pharmaceutical Co, Ltd, and Takeda Pharmaceutical Company Limited; Research funds: Adverum Biotechnologies, Inc, Aerie Pharmaceuticals, Inc, Aerpio, Alimera Sciences, Allergan, Apellis Pharmaceuticals, Chengdu Kanghong Pharmaceutical Group Co, Ltd, Clearside Biomedical, Gemini Therapeutics, Genentech/Roche, Graybug Vision, Inc, GyroscopeTx, Ionis Pharmaceuticals, IVERIC bio, Kodiak Sciences Inc, Neurotech LLC, Novartis AG, Opthea, Outlook Therapeutics, Inc, Recens Medical, Regeneron Pharmaceuticals, Inc, REGENXBIO Inc, Samsung Pharm Co, Ltd, Santen Pharmaceutical Co, Ltd, and Xbrane Biopharma AB (unrelated to this study). LT: Nothing to disclose. MB: Research funds: Pendopharm, Bioventus, Acumed (unrelated to this study). VC: Advisory Board Member: Alcon, Roche, Bayer, Novartis; Grants: Bayer, Novartis (unrelated to this study).

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article.

Bzovsky, S., Phillips, M.R., Guymer, R.H. et al. The clinician’s guide to interpreting a regression analysis. Eye 36 , 1715–1717 (2022). https://doi.org/10.1038/s41433-022-01949-z


Received : 08 January 2022

Revised : 17 January 2022

Accepted : 18 January 2022

Published : 31 January 2022

Issue Date : September 2022

DOI : https://doi.org/10.1038/s41433-022-01949-z


MIT News | Massachusetts Institute of Technology

Explained: Regression analysis

Understanding and interpreting regression analysis

Volume 24, Issue 4

  • Parveen Ali 1,2 (http://orcid.org/0000-0002-7839-8130)
  • Ahtisham Younas 3,4 (http://orcid.org/0000-0003-0157-5319)
  • 1 School of Nursing and Midwifery, University of Sheffield, Sheffield, South Yorkshire, UK
  • 2 Sheffield University Interpersonal Violence Research Group, The University of Sheffield SEAS, Sheffield, UK
  • 3 Faculty of Nursing, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador, Canada
  • 4 Swat College of Nursing, Mingora, Swat, Pakistan
  • Correspondence to Ahtisham Younas, Memorial University of Newfoundland, St. John's, NL A1C 5S7, Canada; ay6133{at}mun.ca

https://doi.org/10.1136/ebnurs-2021-103425


  • statistics & research methods

Introduction

A nurse educator is interested in finding out the academic and non-academic predictors of success in nursing students. Because educational and clinical learning environments are complex, many demographic, clinical and academic factors (age, gender, previous educational training, personal stressors, learning demands, motivation, assignment workload, etc) may influence nursing students’ success, and she was able to list various potential contributing factors relatively easily. Nevertheless, not all of the identified factors will be plausible predictors of increased success. Therefore, she could use a powerful statistical procedure called regression analysis to identify whether the likelihood of increased success is influenced by factors such as age, stressors, learning demands, motivation and education.

What is regression?

Purposes of regression analysis.

Regression analysis has four primary purposes: description, estimation, prediction and control. 1,2 By description, regression can explain the relationship between dependent and independent variables. Estimation means that, by using the observed values of the independent variables, the value of the dependent variable can be estimated. 2 Regression analysis can be useful for predicting the outcomes and changes in dependent variables based on the relationships between dependent and independent variables. Finally, regression makes it possible to control for the effect of one or more independent variables while investigating the relationship of one independent variable with the dependent variable. 1

Types of regression analyses

Three types of regression analysis are commonly used: linear, logistic and multiple regression. The differences among these types are outlined in table 1 in terms of their purpose, nature of dependent and independent variables, underlying assumptions, and nature of curve. 1,3 Linear regression is discussed in more detail below.

Table 1 Comparison of linear, logistic and multiple regression

Linear regression and interpretation

Linear regression analysis involves examining the relationship between one independent variable and one dependent variable. Statistically, the relationship between one independent variable (x) and a dependent variable (y) is expressed as: $y = \beta_0 + \beta_1 x + \varepsilon$. In this equation, $\beta_0$ is the y intercept and refers to the estimated value of y when x is equal to 0. The coefficient $\beta_1$ is the regression coefficient and denotes the estimated increase in the dependent variable for every unit increase in the independent variable. The symbol $\varepsilon$ is a random error component and signifies the imprecision of regression, indicating that, in actual practice, the independent variables cannot perfectly predict the change in any dependent variable. 1 Multiple linear regression follows the same logic as univariate linear regression except that (a) in multiple regression there is more than one independent variable and (b) the independent variables should not be collinear.
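For reference, the ordinary least squares estimates of the two coefficients in this equation have a standard closed form. The article does not say which estimation method its examples relied on, so the formulas below are the usual textbook ones; the sample means $\bar{x}$ and $\bar{y}$ and the sample size $n$ are notation introduced here.

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```

The fitted value for a given x is then $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, and the residual $y - \hat{y}$ is the observed counterpart of the error term $\varepsilon$.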

Factors affecting regression

Linear and multiple regression analyses are affected by several factors, namely sample size, missing data and the nature of the sample. 2

A small sample may only reveal connections among variables that are strongly related. Therefore, the sample size must be chosen based on the number of independent variables and the expected strength of the relationship.

Many missing values in the data set may reduce the effective sample size. Therefore, all missing values should be adequately dealt with before conducting regression analyses.

Subsamples within the larger sample may mask the actual relationship between independent and dependent variables. Therefore, if subsamples are predefined, a regression within each subsample could be used to detect true relationships. Otherwise, the analysis should be undertaken on the whole sample.

Building on the nurse educator’s research interest described in the introduction, let us consider a study by Ali and Naylor. 4 They were interested in identifying the academic and non-academic factors that predict the academic success of nursing diploma students. This purpose is consistent with one of the above-mentioned purposes of regression analysis (ie, prediction). Ali and Naylor’s chosen academic independent variables were preadmission qualification, previous academic performance and school type, and the non-academic variables were age, gender, marital status and time gap. To achieve their purpose, they collected data from 628 nursing students aged 15–34 years. They used both linear and multiple regression analyses to identify the predictors of student success. For analysis, they examined the relationship of academic and non-academic variables across different years of study and noted that academic factors accounted for 36.6%, 44.3% and 50.4% of the variability in academic success of students in year 1, year 2 and year 3, respectively. 4

Ali and Naylor presented the relationship among these variables using scatter plots, which are commonly used graphs for data display in regression analysis (see examples of various scatter plots in figure 1). 4 In a scatter plot, the clustering of the dots denotes the strength of the relationship, whereas the direction indicates the nature of the relationship among variables as positive (ie, an increase in one variable is accompanied by an increase in the other) or negative (ie, an increase in one variable is accompanied by a decrease in the other).

Figure 1 An example of a scatter plot for regression.
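The strength and direction that a scatter plot conveys visually can also be summarised with the Pearson correlation coefficient: its sign gives the direction and its magnitude (between 0 and 1) gives the strength. The sketch below is only an illustration; the data are invented and are not taken from Ali and Naylor's study, and the variable names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented scores for six students (not real study data)
previous_grade = [55, 60, 65, 70, 75, 80]
final_score = [58, 61, 69, 70, 78, 83]
stress_rating = [9, 8, 6, 6, 3, 2]

r_positive = pearson_r(previous_grade, final_score)  # dots rise together
r_negative = pearson_r(stress_rating, final_score)   # one rises as the other falls
print(round(r_positive, 2), round(r_negative, 2))
```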

Table 2 presents the results of regression analysis for academic and non-academic variables for year 4 students’ success. The significant predictors of student success are denoted with a significant p value. For every significant predictor, the beta value indicates the percentage increase in students’ academic success associated with a one-unit increase in that variable.

Table 2 Regression model for the final year students (N=343)

Conclusions

Regression analysis is a powerful and useful statistical procedure with many implications for nursing research. It enables researchers to describe, predict and estimate relationships and to draw plausible conclusions about the interrelated variables in relation to any studied phenomenon. Regression also allows for controlling one or more variables when researchers are interested in examining the relationship among specific variables. This article has presented some key considerations that may be useful for researchers undertaking regression analysis. While planning and conducting regression analysis, researchers should consider the type and number of dependent and independent variables as well as the nature and size of the sample. Choosing the wrong type of regression analysis, particularly with a small sample, may result in erroneous conclusions about the studied phenomenon.

Ethics statements

Patient consent for publication.

Not required.

  • Montgomery DC ,
  • Schneider A ,

Twitter @parveenazamali, @Ahtisham04

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Provenance and peer review Commissioned; internally peer reviewed.


The complete guide to regression analysis.

What is regression analysis and why is it useful? While most of us have heard the term, understanding regression analysis in detail may be something you need to brush up on. Here’s what you need to know about this popular method of analysis.

When you rely on data to drive and guide business decisions, as well as predict market trends, just gathering and analyzing what you find isn’t enough — you need to ensure it’s relevant and valuable.

The challenge, however, is that so many variables can influence business data: market conditions, economic disruption, even the weather! As such, it’s essential you know which variables are affecting your data and forecasts, and what data you can discard.

And one of the most effective ways to determine data value and monitor trends (and the relationships between them) is to use regression analysis, a set of statistical methods used for the estimation of relationships between independent and dependent variables.

In this guide, we’ll cover the fundamentals of regression analysis, from what it is and how it works to its benefits and practical applications.


What is regression analysis?

Regression analysis is a statistical method. It’s used for analyzing different factors that might influence an objective – such as the success of a product launch, business growth, a new marketing campaign – and determining which factors are important and which ones can be ignored.

Regression analysis can also help leaders understand how different variables impact each other and what the outcomes are. For example, when forecasting financial performance, regression analysis can help leaders determine how changes in the business can influence revenue or expenses in the future.

Running an analysis of this kind, you might find that there’s a high correlation between the number of marketers employed by the company, the leads generated, and the opportunities closed.

This seems to suggest that a high number of marketers and a high number of leads generated influences sales success. But do you need both factors to close those sales? By analyzing the effects of these variables on your outcome,  you might learn that when leads increase but the number of marketers employed stays constant, there is no impact on the number of opportunities closed, but if the number of marketers increases, leads and closed opportunities both rise.

Regression analysis can help you tease out these complex relationships so you can determine which areas you need to focus on in order to get your desired results, and avoid wasting time with those that have little or no impact. In this example, that might mean hiring more marketers rather than trying to increase leads generated.

How does regression analysis work?

Regression analysis starts with variables that are categorized into two types: dependent and independent variables. The variables you select depend on the outcomes you’re analyzing.

Understanding variables:

1. Dependent variable

This is the main variable that you want to analyze and predict. For example, it could be operational (O) data, such as your quarterly or annual sales, or experience (X) data, such as your net promoter score (NPS) or customer satisfaction score (CSAT).

These variables are also called response variables, outcome variables, or left-hand-side variables (because they appear on the left-hand side of a regression equation).

There are three easy ways to identify them:

  • Is the variable measured as an outcome of the study?
  • Does the variable depend on another in the study?
  • Do you measure the variable only after other variables are altered?

2. Independent variable

Independent variables are the factors that could affect your dependent variables. For example, a price rise in the second quarter could make an impact on your sales figures.

You can identify independent variables with the following list of questions:

  • Is the variable manipulated, controlled, or used as a subject grouping method by the researcher?
  • Does this variable come before the other variable in time?
  • Are you trying to understand whether or how this variable affects another?

Independent variables are often referred to differently in regression depending on the purpose of the analysis. You might hear them called:

Explanatory variables

Explanatory variables are those which explain an event or an outcome in your study. For example, explaining why your sales dropped or increased.

Predictor variables

Predictor variables are used to predict the value of the dependent variable. For example, predicting how much sales will increase when new product features are rolled out.

Experimental variables

These are variables that can be manipulated or changed directly by researchers to assess the impact. For example, assessing how different product pricing ($10 vs $15 vs $20) will impact the likelihood to purchase.

Subject variables (also called fixed effects)

Subject variables can’t be changed directly, but vary across the sample. For example, age, gender, or income of consumers.

Unlike experimental variables, you can’t randomly assign or change subject variables, but you can design your regression analysis to determine the different outcomes of groups of participants with the same characteristics. For example, ‘how do price rises impact sales based on income?’

Carrying out regression analysis


So regression is about the relationships between dependent and independent variables. But how exactly do you do it?

Assuming you have your data collection done already, the first and foremost thing you need to do is plot your results on a graph. Doing this makes interpreting regression analysis results much easier as you can clearly see the correlations between dependent and independent variables.

Let’s say you want to carry out a regression analysis to understand the relationship between the number of ads placed and revenue generated.

On the Y-axis, you place the revenue generated. On the X-axis, the number of digital ads. By plotting the information on the graph, and drawing a line (called the regression line) through the middle of the data, you can see the relationship between the number of digital ads placed and revenue generated.


This regression line is the line that provides the best description of the relationship between your independent variables and your dependent variable. In this example, we’ve used a simple linear regression model.


Statistical analysis software can calculate the regression line precisely and draw it for you. The software then provides the equation of the line (its slope and intercept), adding further context to the relationship between your dependent and independent variables.
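To make the ads-and-revenue example concrete, here is a minimal sketch of what such software does for a single predictor, using the standard least-squares formulas. The figures are invented for illustration and the function name fit_line is hypothetical; a real analysis would normally rely on a statistics package rather than hand-written code.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for one independent variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented monthly figures: digital ads placed (X-axis) and revenue in $1,000s (Y-axis)
ads = [10, 20, 30, 40, 50]
revenue = [25, 41, 62, 79, 103]

slope, intercept = fit_line(ads, revenue)
predicted_at_35 = intercept + slope * 35  # read a value off the fitted line

print(f"revenue = {intercept:.1f} + {slope:.2f} * ads")
print(f"predicted revenue at 35 ads: {predicted_at_35:.1f}")
```

The slope is the estimated change in revenue for each additional ad, and the intercept is the estimated revenue when no ads are placed.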

Simple linear regression analysis

A simple linear model uses a single straight line to determine the relationship between a single independent variable and a dependent variable.

This regression model is mostly used when you want to determine the relationship between two variables (like price increases and sales) or the value of the dependent variable at certain points of the independent variable (for example the sales levels at a certain price rise).

While linear regression is useful, it does require you to make some assumptions.

For example, it requires you to assume that:

  • the data was collected using a statistically valid sample collection method that is representative of the target population
  • the observed relationship between the variables can’t be explained by a ‘hidden’ third variable; in other words, there are no spurious correlations
  • the relationship between the independent variable and dependent variable is linear, meaning that the best fit along the data points is a straight line and not a curved one

Multiple regression analysis

As the name suggests, multiple regression analysis is a type of regression that uses multiple variables. It uses multiple independent variables to predict the outcome of a single dependent variable. Of the various kinds of multiple regression, multiple linear regression is one of the best-known.

Multiple linear regression is a close relative of the simple linear regression model, except that it looks at the impact of several independent variables on one dependent variable. Like simple linear regression, multiple regression analysis also requires you to make some basic assumptions.

For example, you will be assuming that:

  • there is a linear relationship between the dependent and independent variables (it creates a straight line and not a curve through the data points)
  • the independent variables aren’t highly correlated in their own right
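One common way to check the second assumption is the variance inflation factor (VIF): each independent variable is regressed on the others, and VIF = 1 / (1 - R²). The guide does not prescribe this particular check, so treat the sketch below as one possible approach. It assumes NumPy is installed, the data are invented, and the function name vif is hypothetical.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of a 2-D array of predictors."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress this predictor on the remaining ones (with an intercept)
        design = np.column_stack([np.ones(len(target)), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        factors.append(1.0 / (1.0 - r_squared))
    return factors


# Invented data: revenue_growth is almost exactly twice marketing_spend
marketing_spend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
revenue_growth = [2.1, 4.1, 5.9, 7.9, 10.0, 12.0]
market_sentiment = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]

factors = vif(np.column_stack([marketing_spend, revenue_growth, market_sentiment]))
print([round(f, 1) for f in factors])
```

A frequently quoted rule of thumb treats values above roughly 5 to 10 as a warning sign; here the first two variables would be flagged while the third would not. Perfectly collinear columns would make R² equal to 1 and the division fail, which this sketch does not guard against.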

An example of multiple linear regression would be an analysis of how marketing spend, revenue growth, and general market sentiment affect the share price of a company.

With multiple linear regression models you can estimate how these variables will influence the share price, and to what extent.
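The share price example can be sketched with an ordinary least squares fit. The code below assumes NumPy is available and uses invented numbers; the share price is generated from known coefficients with no noise, purely so that the recovered estimates can be checked against them.

```python
import numpy as np

# Invented observations for six periods
marketing_spend = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
revenue_growth = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
market_sentiment = np.array([2.0, 7.0, 1.0, 8.0, 2.0, 8.0])

# Synthetic outcome built from known coefficients (an assumption for illustration)
share_price = 10.0 + 1.5 * marketing_spend + 0.8 * revenue_growth + 0.3 * market_sentiment

# Design matrix: a column of ones for the intercept, then one column per predictor
design = np.column_stack(
    [np.ones(len(share_price)), marketing_spend, revenue_growth, market_sentiment]
)
coefficients, *_ = np.linalg.lstsq(design, share_price, rcond=None)
intercept, b_marketing, b_growth, b_sentiment = coefficients

print(f"intercept={intercept:.2f}, marketing={b_marketing:.2f}, "
      f"growth={b_growth:.2f}, sentiment={b_sentiment:.2f}")
```

Each coefficient is the estimated change in share price for a one-unit change in that variable while the other two are held constant, which is the "to what extent" part of the question.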

Multivariate linear regression

Multivariate linear regression involves more than one dependent variable as well as multiple independent variables, making it more complicated than linear or multiple linear regressions. However, this also makes it much more powerful and capable of making predictions about complex real-world situations.

For example, if an organization wants to establish or estimate how the COVID-19 pandemic has affected employees in its different markets, it can use multivariate linear regression, with the different geographical regions as dependent variables and the different facets of the pandemic as independent variables (such as mental health self-rating scores, proportion of employees working at home, lockdown durations and employee sick days).

Through multivariate linear regression, you can look at relationships between variables in a holistic way and quantify the relationships between them. As you can clearly visualize those relationships, you can make adjustments to dependent and independent variables to see which conditions influence them. Overall, multivariate linear regression provides a more realistic picture than looking at a single variable.

However, because multivariate techniques are complex, they involve high-level mathematics that requires a statistical program to analyze the data.

Logistic regression

Logistic regression models the probability of a binary outcome based on independent variables.

So, what is a binary outcome? It’s an outcome with only two possible scenarios: either the event happens (1) or it doesn’t (0). Examples include yes/no outcomes, pass/fail outcomes, and so on. In other words, the outcome can be described as falling into one of two categories.

Logistic regression makes predictions based on independent variables that are assumed or known to have an influence on the outcome. For example, the probability of a sports team winning their game might be affected by independent variables like weather, day of the week, whether they are playing at home or away and how they fared in previous matches.
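As a small sketch of the mechanics, a logistic model adds up the weighted independent variables to get the log-odds of the event and then passes that through the logistic function to obtain a probability between 0 and 1. The coefficients below are hypothetical values chosen for illustration, not estimates from real match data, and only two of the predictors mentioned above are included.

```python
import math


def win_probability(home_game, recent_win_rate,
                    intercept=-1.0, b_home=0.8, b_form=2.0):
    """Probability of a win under a logistic model with hypothetical coefficients."""
    log_odds = intercept + b_home * home_game + b_form * recent_win_rate
    return 1.0 / (1.0 + math.exp(-log_odds))


# Same recent form (60% of previous matches won), playing at home vs away
p_home = win_probability(home_game=1, recent_win_rate=0.6)
p_away = win_probability(home_game=0, recent_win_rate=0.6)
print(f"home: {p_home:.2f}, away: {p_away:.2f}")
```

In practice the coefficients would be estimated from past results by statistical software; the point here is only how a fitted model converts inputs into a predicted probability.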

What are some common mistakes with regression analysis?

Across the globe, businesses are increasingly relying on quality data and insights to drive decision-making — but to make accurate decisions, it’s important that the data collected and statistical methods used to analyze it are reliable and accurate.

Using the wrong data or the wrong assumptions can result in poor decision-making, lead to missed opportunities to improve efficiency and savings, and — ultimately — damage your business long term.

  • Assumptions

When running regression analysis, be it a simple linear or multiple regression, it’s really important to check that the assumptions your chosen method requires have been met. If your data points don’t conform to a straight line of best fit, for example, you need to apply additional statistical modifications to accommodate the non-linear data. For example, if you are looking at income data, which is typically skewed and better described on a logarithmic scale, you should take the natural log of income as your variable and then adjust the outcome after the model is created.

  • Correlation vs. causation

It’s a well-worn phrase that bears repeating: correlation does not equal causation. While variables that are linked by causality will generally show correlation, the reverse is not always true. Moreover, there is no statistic that can determine causality (although the design of your study overall can).

If you observe a correlation in your results, such as in the first example we gave in this article where there was a correlation between leads and sales, you can’t assume that one thing has influenced the other. Instead, you should use it as a starting point for investigating the relationship between the variables in more depth.

  • Choosing the wrong variables to analyze

Before you use any kind of statistical method, it’s important to understand the subject you’re researching in detail. Doing so means you’re making informed choices of variables and you’re not overlooking something important that might have a significant bearing on your dependent variable.

  • Model building

The variables you include in your analysis are just as important as the variables you choose to exclude. That’s because the strength of each independent variable is influenced by the other variables in the model. Other techniques, such as Key Drivers Analysis, are able to account for these variable interdependencies.
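Returning to the income example under Assumptions, the sketch below fits a straight line to the natural log of income rather than to income itself. It assumes NumPy is available and the figures are invented. One reading of "adjust the outcome" is to translate the fitted coefficient back to the original income scale, which is what the last step does.

```python
import numpy as np

# Invented data: each income is double the previous one
income = np.array([20_000, 40_000, 80_000, 160_000, 320_000], dtype=float)
monthly_spend = np.array([1.0, 1.9, 3.1, 4.0, 5.0])  # hypothetical outcome

# Use the natural log of income as the independent variable
log_income = np.log(income)
slope, intercept = np.polyfit(log_income, monthly_spend, deg=1)


def predict(income_value):
    """Predicted outcome for an income given on the original (unlogged) scale."""
    return intercept + slope * np.log(income_value)


# Interpretation on the original scale: doubling income changes the
# predicted outcome by slope * ln(2), whatever the starting income
effect_of_doubling = slope * np.log(2)
print(f"effect of doubling income: {effect_of_doubling:.2f}")
```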

Benefits of using regression analysis

There are several benefits to using regression analysis to judge how changing variables will affect your business and to ensure you focus on the right things when forecasting.

Here are just a few of those benefits:

Make accurate predictions

Regression analysis is commonly used when forecasting and forward planning for a business. For example, when predicting sales for the year ahead, a number of different variables will come into play to determine the eventual result.

Regression analysis can help you determine which of these variables are likely to have the biggest impact based on previous events and help you make more accurate forecasts and predictions.

Identify inefficiencies

Using a regression equation a business can identify areas for improvement when it comes to efficiency, either in terms of people, processes, or equipment.

For example, regression analysis can help a car manufacturer determine order numbers based on external factors like the economy or environment.

They can then use the same regression equation to determine how many members of staff and how much equipment they need to meet those orders.

Drive better decisions

Improving processes or business outcomes is always on the minds of owners and business leaders, but without actionable data, they’re simply relying on instinct, and this doesn’t always work out.

This is particularly true when it comes to issues of price. For example, to what extent will raising the price (and to what level) affect next quarter’s sales?

There’s no way to know this without data analysis. Regression analysis can help provide insights into the correlation between price rises and sales based on historical data.

How do businesses use regression? A real-life example

Marketing and advertising spending are common topics for regression analysis. Companies use regression when trying to assess the value of ad spend and marketing spend on revenue.

A typical example is using a regression equation to assess the correlation between ad costs and conversions of new customers. In this instance,

  • our dependent variable (the factor we’re trying to assess the outcomes of) will be our conversions
  • the independent variable (the factor we’ll change to assess how it changes the outcome) will be the daily ad spend
  • the regression equation will try to determine whether an increase in ad spend has a direct correlation with the number of conversions we have

The analysis is relatively straightforward — using historical data from an ad account, we can use daily data to judge ad spend vs conversions and how changes to the spend alter the conversions.

By assessing this data over time, we can make predictions not only on whether increasing ad spend will lead to increased conversions but also what level of spending will lead to what increase in conversions. This can help to optimize campaign spend and ensure marketing delivers good ROI.
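The ad spend example can be sketched as a simple least-squares fit. The daily figures below are invented for illustration, and `fit_line` and `predict_conversions` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical daily history from an ad account.
daily_ad_spend = [100, 200, 300, 400, 500]   # independent variable
conversions = [12, 19, 31, 42, 48]           # dependent variable

intercept, slope = fit_line(daily_ad_spend, conversions)

def predict_conversions(spend):
    # Expected conversions at a given daily spend, according to the fitted line.
    return intercept + slope * spend
```

Here the slope is the estimated change in conversions per extra unit of daily spend, which is the quantity a campaign manager would use when deciding on a spending level.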

This is an example of a simple linear model. If we wanted to build a more complex regression model, we could also factor in other independent variables such as seasonality, GDP, and the current reach of our chosen advertising networks.

By increasing the number of independent variables, we can get a better understanding of whether ad spend is resulting in an increase in conversions, whether it’s exerting an influence in combination with another set of variables, or if we’re dealing with a correlation with no causal impact – which might be useful for predictions anyway, but isn’t a lever we can use to increase sales.

Using this predicted value of each independent variable, we can more accurately predict how spend will change the conversion rate of advertising.
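Adding a second independent variable turns the model into a multiple regression. The sketch below solves the least-squares normal equations directly in plain Python; the data are hypothetical and noise-free, and the `holiday` flag is an assumed stand-in for seasonality.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares fit; returns [intercept, coef_1, coef_2, ...]."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical rows: (daily ad spend, holiday flag as a seasonality proxy).
rows = [(100, 0), (200, 1), (300, 0), (400, 1), (500, 0), (600, 1)]
conversions = [15, 33, 35, 53, 55, 73]

intercept, per_unit_spend, holiday_lift = ols(rows, conversions)
```

Each coefficient is the estimated effect of one variable with the other held fixed, which is how the model separates the contribution of ad spend from that of the season.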

Regression analysis tools

Regression analysis is an important tool when it comes to better decision-making and improved business outcomes. To get the best out of it, you need to invest in the right kind of statistical analysis software.

The best option is likely to be one that sits at the intersection of powerful statistical analysis and intuitive ease of use, as this will empower everyone from beginners to expert analysts to uncover meaning from data, identify hidden trends and produce predictive models without statistical training being required.

IQ stats in action

To help prevent costly errors, choose a tool that automatically runs the right statistical tests and visualizations and then translates the results into simple language that anyone can put into action.

With software that’s both powerful and user-friendly, you can isolate key experience drivers, understand what influences the business, apply the most appropriate regression methods, identify data issues, and much more.


With Qualtrics’ Stats iQ™, you don’t have to worry about the regression equation because our statistical software will run the appropriate equation for you automatically based on the variable type you want to monitor. You can also use several equations, including linear regression and logistic regression, to gain deeper insights into business outcomes and make more accurate, data-driven decisions.


Regression Analysis


Regression analysis is a technique that permits one to study and measure the relation between two or more variables. Starting from data registered in a  sample , regression analysis seeks to determine an estimate of a mathematical relation between two or more variables. The goal is to estimate the value of one variable as a function of one or more other variables. The estimated variable is called the dependent variable and is commonly denoted by Y . In contrast, the variables that explain the variations in Y are called independent variables, and they are denoted by  X .

When Y depends on only one X , we have simple regression analysis, but when Y depends on more than one independent variable, we have multiple regression analysis. If the relation between the dependent and the independent variables is linear, then we have linear regression analysis.

The pioneer in linear regression analysis, Roger Joseph Boscovich, an astronomer as well as a physician, was one of the first to find...


Eisenhart, C.: Boscovich and the Combination of Observations. In: Kendall, M., Plackett, R.L. (eds.) Studies in the History of Statistics and Probability, vol. II. Griffin, London (1977)


Galton, F.: Natural Inheritance. Macmillan, London (1889)

Gauss, C.F.: Theoria Motus Corporum Coelestium. Werke, 7 (1809)

Laplace, P.S. de: Sur les degrés mesurés des méridiens, et sur les longueurs observées sur pendule. Histoire de l'Académie royale des inscriptions et belles lettres, avec les Mémoires de littérature tirées des registres de cette académie. Paris (1789)

Legendre, A.M.: Nouvelles méthodes pour la détermination des orbites des comètes. Courcier, Paris (1805)

Plackett, R.L.: Studies in the history of probability and statistics. In: Kendall, M., Plackett, R.L. (eds.) The discovery of the method of least squares. vol. II. Griffin, London (1977)

Stigler, S.: The History of Statistics, the Measurement of Uncertainty Before 1900. Belknap, London (1986)



© 2008 Springer-Verlag


(2008). Regression Analysis. In: The Concise Encyclopedia of Statistics. Springer, New York, NY. https://doi.org/10.1007/978-0-387-32833-1_348


DOI : https://doi.org/10.1007/978-0-387-32833-1_348

Publisher Name : Springer, New York, NY

Print ISBN : 978-0-387-31742-7

Online ISBN : 978-0-387-32833-1

eBook Packages : Mathematics and Statistics Reference Module Computer Science and Engineering


Cardiopulm Phys Ther J. 2009 Sep; 20(3)

Regression Analysis for Prediction: Understanding the Process

Phillip B Palmer

1 Hardin-Simmons University, Department of Physical Therapy, Abilene, TX

Dennis G O'Connell

2 Hardin-Simmons University, Department of Physical Therapy, Abilene, TX

Research related to cardiorespiratory fitness often uses regression analysis in order to predict cardiorespiratory status or future outcomes. Reading these studies can be tedious and difficult unless the reader has a thorough understanding of the processes used in the analysis. This feature seeks to “simplify” the process of regression analysis for prediction in order to help readers understand this type of study more easily. Examples of the use of this statistical technique are provided in order to facilitate better understanding.

INTRODUCTION

Graded, maximal exercise tests that directly measure maximum oxygen consumption (VO₂max) are impractical in most physical therapy clinics because they require expensive equipment and personnel trained to administer the tests. Performing these tests in the clinic may also require medical supervision; as a result researchers have sought to develop exercise and non-exercise models that would allow clinicians to predict VO₂max without having to perform direct measurement of oxygen uptake. In most cases, the investigators utilize regression analysis to develop their prediction models.

Regression analysis is a statistical technique for determining the relationship between a single dependent (criterion) variable and one or more independent (predictor) variables. The analysis yields a predicted value for the criterion resulting from a linear combination of the predictors. According to Pedhazur, 15 regression analysis has 2 uses in scientific literature: prediction, including classification, and explanation. The following provides a brief review of the use of regression analysis for prediction. Specific emphasis is given to the selection of the predictor variables (assessing model efficiency and accuracy) and cross-validation (assessing model stability). The discussion is not intended to be exhaustive. For a more thorough explanation of regression analysis, the reader is encouraged to consult one of many books written about this statistical technique (eg, Fox; 5 Kleinbaum, Kupper, & Muller; 12 Pedhazur; 15 and Weisberg 16 ). Examples of the use of regression analysis for prediction are drawn from a study by Bradshaw et al. 3 In this study, the researchers' stated purpose was to develop an equation for prediction of cardiorespiratory fitness (CRF) based on non-exercise (N-EX) data.

SELECTING THE CRITERION (OUTCOME MEASURE)

The first step in regression analysis is to determine the criterion variable. Pedhazur 15 suggests that the criterion have acceptable measurement qualities (ie, reliability and validity). Bradshaw et al 3 used VO₂max as the criterion of choice for their model and measured it using a maximum graded exercise test (GXT) developed by George. 6 George 6 indicated that his protocol for testing compared favorably with the Bruce protocol in terms of predictive ability and had good test-retest reliability (ICC = .98–.99). The American College of Sports Medicine indicates that measurement of VO₂max is the “gold standard” for measuring cardiorespiratory fitness. 1 These facts support that the criterion selected by Bradshaw et al 3 was appropriate and meets the requirements for acceptable reliability and validity.

SELECTING THE PREDICTORS: MODEL EFFICIENCY

Once the criterion has been selected, predictor variables should be identified (model selection). The aim of model selection is to minimize the number of predictors which account for the maximum variance in the criterion. 15 In other words, the most efficient model maximizes the value of the coefficient of determination (R²). This coefficient estimates the amount of variance in the criterion score accounted for by a linear combination of the predictor variables. The higher the value is for R², the less error or unexplained variance and, therefore, the better prediction. R² is dependent on the multiple correlation coefficient (R), which describes the relationship between the observed and predicted criterion scores. If there is no difference between the predicted and observed scores, R equals 1.00. This represents a perfect prediction with no error and no unexplained variance (R² = 1.00). When R equals 0.00, there is no relationship between the predictor(s) and the criterion and no variance in scores has been explained (R² = 0.00). The chosen variables cannot predict the criterion. The goal of model selection is, as stated previously, to develop a model that results in the highest estimated value for R².
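The relationship between R and R² described above can be checked numerically. The sketch below uses hypothetical observed scores and the predictions of a least-squares line fitted to them; the function names are illustrative only.

```python
import math

def r_squared(observed, predicted):
    """Share of variance in the observed scores explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_error = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_error / ss_total

def multiple_r(observed, predicted):
    """Correlation between observed and predicted criterion scores."""
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

# Hypothetical criterion scores and the least-squares predictions for them.
observed = [12, 19, 31, 42, 48]
predicted = [11.4, 20.9, 30.4, 39.9, 49.4]

r = multiple_r(observed, predicted)
r2 = r_squared(observed, predicted)
```

For a least-squares model that includes an intercept, squaring `r` gives the same value as `r2`, which is why the two statistics are reported together.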

According to Pedhazur, 15 the value of R is often overestimated. The reasons for this are beyond the scope of this discussion; however, the degree of overestimation is affected by sample size. The larger the ratio is between the number of predictors and subjects, the larger the overestimation. To account for this, sample sizes should be large and there should be 15 to 30 subjects per predictor. 11 , 15 Of course, the most effective way to determine optimal sample size is through statistical power analysis. 11 , 15

Another method of determining the best model for prediction is to test the significance of adding one or more variables to the model using the partial F-test. This process, which is further discussed by Kleinbaum, Kupper, and Muller, 12 allows for exclusion of predictors that do not contribute significantly to the prediction, allowing determination of the most efficient model of prediction. In general, the partial F-test is similar to the F-test used in analysis of variance. It assesses the statistical significance of the difference between values for R² derived from 2 or more prediction models using a subset of the variables from the original equation. For example, Bradshaw et al 3 indicated that all variables contributed significantly to their prediction. Though the researchers do not detail the procedure used, it is highly likely that different models were tested, excluding one or more variables, and the resulting values for R² assessed for statistical difference.
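For reference, one common way of writing the partial F statistic in terms of R² is shown below. The symbols are notation introduced here, not taken from the article: n is the number of subjects, k the number of predictors in the full model, and q the number of predictors dropped to form the reduced model.

```latex
F = \frac{\left(R^2_{\text{full}} - R^2_{\text{reduced}}\right) / q}
         {\left(1 - R^2_{\text{full}}\right) / (n - k - 1)}
```

The statistic is compared against an F distribution with q and n − k − 1 degrees of freedom; a large value suggests that the dropped predictors contributed significantly to the prediction.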

Although the techniques discussed above are useful in determining the most efficient model for prediction, theory must be considered in choosing the appropriate variables. Previous research should be examined and predictors selected for which a relationship between the criterion and predictors has been established. 12 , 15

It is clear that Bradshaw et al 3 relied on theory and previous research to determine the variables to use in their prediction equation. The 5 variables they chose for inclusion (gender, age, body mass index (BMI), perceived functional ability (PFA), and physical activity rating (PAR)) had been shown in previous studies to contribute to the prediction of VO₂max (eg, Heil et al; 8 George, Stone, & Burkett 7 ). These 5 predictors accounted for 87% (R = .93, R² = .87) of the variance in the predicted values for VO₂max. Based on a ratio of 1:20 (predictor:sample size), this estimate of R, and thus R², is not likely to be overestimated. The researchers used changes in the value of R² to determine whether to include or exclude these or other variables. They reported that removal of perceived functional ability (PFA) as a variable resulted in a decrease in R from .93 to .89. Without this variable, the remaining 4 predictors would account for only 79% of the variance in VO₂max. The investigators did note that each predictor variable contributed significantly (p < .05) to the prediction of VO₂max (see above discussion related to the partial F-test).
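As a quick check of the figures quoted above, squaring each reported value of R reproduces the reported share of explained variance (rounded to two decimals):

```latex
R^2 = (0.93)^2 = 0.8649 \approx 0.87,
\qquad
R^2 = (0.89)^2 = 0.7921 \approx 0.79
```

Dropping PFA therefore lowers the explained variance by about 8 percentage points.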

ASSESSING ACCURACY OF THE PREDICTION

Assessing accuracy of the model is best accomplished by analyzing the standard error of estimate ( SEE ) and the percentage that the SEE represents of the predicted mean ( SEE % ). The SEE represents the degree to which the predicted scores vary from the observed scores on the criterion measure, similar to the standard deviation used in other statistical procedures. According to Jackson, 10 lower values of the SEE indicate greater accuracy in prediction. Comparison of the SEE for different models using the same sample allows for determination of the most accurate model to use for prediction. SEE % is calculated by dividing the SEE by the mean of the criterion ( SEE /mean criterion) and can be used to compare different models derived from different samples.
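The two accuracy measures can be sketched as follows. This assumes the usual definition of the SEE as the square root of the residual sum of squares divided by n − k − 1 (k being the number of predictors), which the text above does not spell out; the data and function names are hypothetical.

```python
import math

def standard_error_of_estimate(observed, predicted, n_predictors):
    """SEE: typical size of the prediction error, in the criterion's units."""
    ss_error = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    degrees_of_freedom = len(observed) - n_predictors - 1
    return math.sqrt(ss_error / degrees_of_freedom)

def see_percent(see, observed):
    """SEE%: the SEE expressed as a percentage of the criterion mean."""
    mean_criterion = sum(observed) / len(observed)
    return 100 * see / mean_criterion

# Hypothetical criterion scores and predictions from a one-predictor model.
observed = [12, 19, 31, 42, 48]
predicted = [11.4, 20.9, 30.4, 39.9, 49.4]

see = standard_error_of_estimate(observed, predicted, n_predictors=1)
see_pct = see_percent(see, observed)
```

Because `see_pct` is scaled by the criterion mean, it can be compared across models built on different samples, whereas `see` is only comparable within one sample.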

Bradshaw et al 3 report a SEE of 3.44 mL·kg⁻¹·min⁻¹ (approximately 1 MET) using all 5 variables in the equation (gender, age, BMI, PFA, PA-R). When the PFA variable is removed from the model, leaving only 4 variables for the prediction (gender, age, BMI, PA-R), the SEE increases to 4.20 mL·kg⁻¹·min⁻¹. The increase in the error term indicates that the model excluding PFA is less accurate in predicting VO₂max. This is confirmed by the decrease in the value for R (see discussion above). The researchers compare their model of prediction with that of George, Stone, and Burkett, 7 indicating that their model is as accurate. It is not advisable to compare models based on the SEE if the data were collected from different samples as they were in these 2 studies. That type of comparison should be made using SEE%. Bradshaw and colleagues 3 report SEE% for their model (8.62%), but do not report values from other models in making comparisons.

Some advocate the use of statistics derived from the predicted residual sum of squares ( PRESS ) as a means of selecting predictors. 2 , 4 , 16 These statistics are used more often in cross-validation of models and will be discussed in greater detail later.

ASSESSING STABILITY OF THE MODEL FOR PREDICTION

Once the most efficient and accurate model for prediction has been determined, it is prudent that the model be assessed for stability. A model, or equation, is said to be “stable” if it can be applied to different samples from the same population without losing the accuracy of the prediction. This is accomplished through cross-validation of the model. Cross-validation determines how well the prediction model developed using one sample performs in another sample from the same population. Several methods can be employed for cross-validation, including the use of 2 independent samples, split samples, and PRESS -related statistics developed from the same sample.

Using 2 independent samples involves random selection of 2 groups from the same population. One group becomes the “training” or “exploratory” group used for establishing the model of prediction. 5 The second group, the “confirmatory” or “validatory” group, is used to assess the model for stability. The researcher compares R² values from the 2 groups, and “shrinkage,” the difference between the two values for R², is used as an indicator of model stability. There is no rule of thumb for interpreting the differences, but Kleinbaum, Kupper, and Muller 12 suggest that “shrinkage” values of less than 0.10 indicate a stable model. While preferable, independent samples are rarely used due to cost considerations.

A similar technique of cross-validation uses split samples. Once the sample has been selected from the population, it is randomly divided into 2 subgroups. One subgroup becomes the “exploratory” group and the other is used as the “validatory” group. Again, values for R² are compared and model stability is assessed by calculating “shrinkage.”
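Split-sample cross-validation can be sketched as below. The sample is hypothetical, the seed is fixed only so the random split is reproducible, and the helper names are illustrative.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def r_squared(xs, ys, a, b):
    """R-squared of the line y = a + b*x evaluated on the given cases."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_error = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_error / ss_total

# Hypothetical sample of 20 cases with a roughly linear relationship.
xs = list(range(20))
ys = [2 + 3 * x + 0.5 * (i % 3) * (-1) ** i for i, x in enumerate(xs)]

cases = list(zip(xs, ys))
random.Random(0).shuffle(cases)              # random division into 2 subgroups
exploratory, validatory = cases[:10], cases[10:]

ex_x, ex_y = zip(*exploratory)
va_x, va_y = zip(*validatory)

a, b = fit_line(ex_x, ex_y)                  # model built on one subgroup only
r2_exploratory = r_squared(ex_x, ex_y, a, b)
r2_validatory = r_squared(va_x, va_y, a, b)  # same equation, other subgroup
shrinkage = r2_exploratory - r2_validatory
```

The key point is that the equation is estimated once, on the exploratory subgroup, and then applied unchanged to the validatory subgroup; a small `shrinkage` suggests a stable model.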

Holiday, Ballard, and McKeown 9 advocate the use of PRESS-related statistics for cross-validation of regression models as a means of dealing with the problems of data-splitting. The PRESS method is a jackknife analysis that is used to address the issue of estimate bias associated with the use of small sample sizes. 13 In general, a jackknife analysis calculates the desired test statistic multiple times with individual cases omitted from the calculations. In the case of the PRESS method, residuals, or the differences between the actual values of the criterion for each individual and the predicted value using the formula derived with the individual's data removed from the prediction, are calculated. The PRESS statistic is the sum of the squares of the residuals derived from these calculations and is similar to the sum of squares for the error (SS error ) used in analysis of variance (ANOVA). Myers 14 discusses the use of the PRESS statistic and describes in detail how it is calculated. The reader is referred to this text and the article by Holiday, Ballard, and McKeown 9 for additional information.

Once determined, the PRESS statistic can be used to calculate a modified form of R² and the SEE. The PRESS-based R² is calculated using the following formula: $R^2_{\mathrm{PRESS}} = 1 - \mathrm{PRESS} / SS_{\mathrm{total}}$, where $SS_{\mathrm{total}}$ equals the sum of squares for the original regression equation. 14 Standard error of the estimate for PRESS is calculated as follows: $SEE_{\mathrm{PRESS}} = \sqrt{\mathrm{PRESS} / n}$, where n equals the number of individual cases. 14 The smaller the difference between the 2 values for R² and SEE, the more stable the model for prediction. Bradshaw et al 3 used this technique in their investigation. They reported a value for $R^2_{\mathrm{PRESS}}$ of .83, a decrease of .04 from R² for their prediction model. Using the standard set by Kleinbaum, Kupper, and Muller, 12 the model developed by these researchers would appear to have stability, meaning it could be used for prediction in samples from the same population. This is further supported by the small difference between the SEE and the $SEE_{\mathrm{PRESS}}$, 3.44 and 3.63 mL·kg⁻¹·min⁻¹, respectively.
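The PRESS calculation can be sketched as a leave-one-out loop. The data are hypothetical, and two assumptions are made that should be checked against Myers 14: SS total is taken as the total sum of squares about the criterion mean, and the PRESS standard error is taken as the square root of PRESS divided by n.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical predictor and criterion scores for 8 individuals.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
n = len(xs)

a, b = fit_line(xs, ys)
mean_y = sum(ys) / n
ss_total = sum((y - mean_y) ** 2 for y in ys)
ss_error = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

press = 0.0
for i in range(n):
    # Refit with individual i removed, then predict that individual's score.
    a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    press += (ys[i] - (a_i + b_i * xs[i])) ** 2

r2 = 1 - ss_error / ss_total
r2_press = 1 - press / ss_total      # assumed: SS total about the mean
see_press = math.sqrt(press / n)     # assumed form of the PRESS standard error
```

Because each case is predicted by a model that never saw it, `press` is at least as large as `ss_error`, so `r2_press` cannot exceed `r2`; the size of the gap is the stability indicator.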

COMPARING TWO DIFFERENT PREDICTION MODELS

A comparison of 2 different models for prediction may help to clarify the use of regression analysis in prediction. Table 1 presents data from 2 studies and will be used in the following discussion.

Comparison of Two Non-exercise Models for Predicting CRF

As noted above, the first step is to select an appropriate criterion, or outcome measure. Bradshaw et al 3 selected VO₂max as their criterion for measuring cardiorespiratory fitness. Heil et al 8 used VO₂peak. These 2 measures are often considered to be the same; however, VO₂peak assumes that conditions for measuring maximum oxygen consumption were not met. 17 It would be optimal to compare models based on the same criterion, but that is not essential, especially since both criteria measure cardiorespiratory fitness in much the same way.

The second step involves selection of variables for prediction. As can be seen in Table 1, both groups of investigators selected 5 variables to use in their model. The 5 variables selected by Bradshaw et al 3 provide a better prediction based on the values for R² (.87 and .77), indicating that their model accounts for more variance (87% versus 77%) in the prediction than the model of Heil et al. 8 It should also be noted that the SEE calculated in the Bradshaw 3 model (3.44 mL·kg⁻¹·min⁻¹) is less than that reported by Heil et al 8 (4.90 mL·kg⁻¹·min⁻¹). Remember, however, that comparison of the SEE should only be made when both models are developed using samples from the same population. Comparing predictions developed from different populations can be accomplished using the SEE%. Review of values for the SEE% in Table 1 would seem to indicate that the model developed by Bradshaw et al 3 is more accurate because the percentage of the mean value for VO₂max represented by error is less than that reported by Heil et al. 8 In summary, the Bradshaw 3 model would appear to be more efficient, accounting for more variance in the prediction using the same number of variables. It would also appear to be more accurate based on comparison of the SEE%.

The 2 models cannot be compared based on stability of the models. Each set of researchers used different methods for cross-validation. Both models, however, appear to be relatively stable based on the data presented. A clinician can assume that either model would perform fairly well when applied to samples from the same populations as those used by the investigators.

The purpose of this brief review has been to demystify regression analysis for prediction by explaining it in simple terms and to demonstrate its use. When reviewing research articles in which regression analysis has been used for prediction, physical therapists should ensure that the: (1) criterion chosen for the study is appropriate and meets the standards for reliability and validity, (2) processes used by the investigators to assess both model efficiency and accuracy are appropriate, (3) predictors selected for use in the model are reasonable based on theory or previous research, and (4) investigators assessed model stability through a process of cross-validation, providing the opportunity for others to utilize the prediction model in different samples drawn from the same population.


Regression Analysis: Definition, Types, Usage & Advantages


Regression analysis is perhaps one of the most widely used statistical methods for investigating or estimating the relationship between a set of independent and dependent variables. In statistical analysis , distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities.

It is also used as a blanket term for various data analysis techniques used in quantitative research for modeling and analyzing numerous variables. In the regression method, the independent variable is a predictor or an explanatory element, and the dependent variable is the outcome or a response to a specific query.


Content Index

  • Definition of regression analysis
  • Types of regression analysis
  • Regression analysis usage in market research
  • How regression analysis derives insights from surveys
  • Advantages of using regression analysis in an online survey

Definition of Regression Analysis

Regression analysis is often used to model or analyze data. Most survey analysts use it to understand the relationship between the variables, which can be further utilized to predict the precise outcome.

For Example – Suppose a soft drink company wants to expand its manufacturing unit to a newer location. Before moving forward, the company wants to analyze its revenue generation model and the various factors that might impact it. Hence, the company conducts an online survey with a specific questionnaire.

After using regression analysis, it becomes easier for the company to analyze the survey results and understand the relationship between different variables like electricity and revenue – here, revenue is the dependent variable.


In addition, understanding the relationship between different independent variables like pricing, number of workers, and logistics with the revenue helps the company estimate the impact of varied factors on sales and profits.

Survey researchers often use this technique to examine and find a correlation between different variables of interest. It provides an opportunity to gauge the influence of different independent variables on a dependent variable.

Overall, regression analysis saves the survey researchers’ additional efforts in arranging several independent variables in tables and testing or calculating their effect on a dependent variable. Different types of analytical research methods are widely used to evaluate new business ideas and make informed decisions.

Types of Regression Analysis

Researchers usually start by learning linear and logistic regression first. Due to the widespread knowledge of these two methods and ease of application, many analysts think there are only two types of models. Each model has its own specialty and ability to perform if specific conditions are met.

This blog explains seven commonly used types of regression analysis that can be used to interpret data in various formats.

01. Linear Regression Analysis

It is one of the most widely known modeling techniques, as it is among the first regression analysis methods people pick up when learning predictive modeling. Here, the dependent variable is continuous, the independent variable can be continuous or discrete, and the relationship between them is modeled with a straight regression line.

Please note that multiple linear regression has more than one independent variable, whereas simple linear regression has only one. Either way, linear regression is best used only when there is a linear relationship between the independent variables and the dependent variable.

A business can use linear regression to measure the effectiveness of the marketing campaigns, pricing, and promotions on sales of a product. Suppose a company selling sports equipment wants to understand if the funds they have invested in the marketing and branding of their products have given them substantial returns or not.

Linear regression is well suited to interpreting the results. It also helps in analyzing the individual impact of each marketing and branding activity on sales while controlling for the effect of the others.

If the company is running two or more advertising campaigns simultaneously, one on television and two on radio, then linear regression can easily analyze the independent and combined influence of running these advertisements together.


02. Logistic Regression Analysis

Logistic regression is commonly used to determine the probability of event success and event failure. It is used whenever the dependent variable is binary, like 0/1, True/False, or Yes/No. Thus, logistic regression is well suited to analyzing closed-ended survey questions that have two possible answers.

Please note logistic regression does not need a linear relationship between a dependent and an independent variable, just like linear regression. Logistic regression applies a non-linear log transformation for predicting the odds ratio; therefore, it easily handles various types of relationships between a dependent and an independent variable.

Logistic regression is widely used to analyze categorical data, particularly for binary response data in business data modeling. More often, logistic regression is used when the dependent variable is categorical, like to predict whether the health claim made by a person is real(1) or fraudulent, to understand if the tumor is malignant(1) or not.

Businesses use logistic regression to predict whether the consumers in a particular demographic will purchase their product or will buy from the competitors based on age, income, gender, race, state of residence, previous purchase, etc.
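To make the log-odds idea concrete, here is a small sketch that fits a one-predictor logistic regression by plain gradient descent. The data are invented (a centered income score and a 0/1 purchase flag), and the function names, learning rate, and step count are assumptions chosen for this illustration rather than recommended settings.

```python
import math


def sigmoid(z):
    """Map a log-odds value to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, learning_rate=0.1, steps=5000):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for xi, yi in zip(x, y):
            error = sigmoid(b0 + b1 * xi) - yi  # predicted probability minus label
            grad0 += error
            grad1 += error * xi
        b0 -= learning_rate * grad0 / n
        b1 -= learning_rate * grad1 / n
    return b0, b1


# Hypothetical data: centered income score and whether the customer purchased.
income = [-3, -2, -1, -0.5, 0.5, 1, 2, 3]
purchased = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(income, purchased)
p_low = sigmoid(b0 + b1 * -3)   # estimated purchase probability at a low income score
p_high = sigmoid(b0 + b1 * 3)   # estimated purchase probability at a high income score
print(round(p_low, 3), round(p_high, 3))
```

The model is linear in the log-odds, `b0 + b1*x`, but the predicted probability follows an S-shaped curve, which is why the output always stays between 0 and 1.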

03. Polynomial Regression Analysis

Polynomial regression is commonly used to analyze curvilinear data, where an independent variable is raised to a power greater than 1. In this regression analysis method, the best-fit line is not a straight line but a curve fitted to the data points.

Please note that polynomial regression is a better choice when some variables enter the model with exponents and others do not.

Additionally, it can model non-linearly separable data offering the liberty to choose the exact exponent for each variable, and that too with full control over the modeling features available.

When combined with response surface analysis, polynomial regression is considered one of the sophisticated statistical methods commonly used in multisource feedback research. Polynomial regression is used mostly in finance and insurance-related industries where the relationship between dependent and independent variables is curvilinear.

Suppose a person wants to plan a budget by determining how long it would take to earn a given sum. By taking into account their income and predicted expenses, polynomial regression can estimate how long they need to work to earn that amount.
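A polynomial regression is still fitted by least squares: the powers of the predictor are simply added as extra columns. The sketch below, using invented noise-free data and assumed helper names, fits a quadratic curve this way.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(x, y, degree):
    """Least-squares fit of y = c0 + c1*x + ... + c_degree*x**degree."""
    design = [[xi ** p for p in range(degree + 1)] for xi in x]
    k = degree + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Invented curvilinear data that follow y = 1 + 2x + 0.5x^2 exactly.
x = [-2, -1, 0, 1, 2, 3]
y = [1 + 2 * xi + 0.5 * xi ** 2 for xi in x]

c0, c1, c2 = fit_polynomial(x, y, degree=2)
print(round(c0, 4), round(c1, 4), round(c2, 4))
```

The choice of degree is left to the analyst here; a higher degree follows the sample more closely but is more likely to overfit.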

04. Stepwise Regression Analysis

This is a semi-automated process in which a statistical model is built by adding or removing independent variables based on the t-statistics of their estimated coefficients.

If used properly, stepwise regression can be a powerful way to narrow down a model. It works well when you are working with a large number of independent variables. It fine-tunes the model by adding or dropping variables one at a time according to a statistical criterion, not at random.

Stepwise regression analysis is recommended to be used when there are multiple independent variables, wherein the selection of independent variables is done automatically without human intervention.

Please note that in stepwise regression modeling, variables are added to or removed from the set of explanatory variables one at a time. Which variable is added or removed depends on the test statistics of the estimated coefficients.

Suppose you have a set of independent variables like age, weight, body surface area, duration of hypertension, basal pulse, and stress index, and you want to analyze their impact on blood pressure.

In stepwise regression, the best subset of the independent variable is automatically chosen; it either starts by choosing no variable to proceed further (as it adds one variable at a time) or starts with all variables in the model and proceeds backward (removes one variable at a time).

Thus, using regression analysis, you can calculate the impact of each or a group of variables on blood pressure.
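The forward variant described above can be sketched as follows. Note the simplification: instead of t-statistics, this sketch adds whichever variable most reduces the residual sum of squares and stops when the improvement is negligible. The blood pressure data are invented, and all function names and the `min_improvement` threshold are assumptions for illustration.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def residual_sum_of_squares(rows, y):
    """Fit OLS with an intercept and return the sum of squared residuals."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def forward_selection(candidates, y, min_improvement=1e-6):
    """Greedily add the variable that lowers the residual sum of squares most."""
    selected = []
    best_rss = residual_sum_of_squares([()] * len(y), y)  # intercept-only model
    while len(selected) < len(candidates):
        trials = {}
        for name in candidates:
            if name not in selected:
                cols = [candidates[v] for v in selected + [name]]
                trials[name] = residual_sum_of_squares(list(zip(*cols)), y)
        winner = min(trials, key=trials.get)
        if best_rss - trials[winner] < min_improvement:
            break  # no remaining variable improves the fit meaningfully
        selected.append(winner)
        best_rss = trials[winner]
    return selected


# Invented data: blood pressure depends on age and weight only.
candidates = {
    "age": [30, 35, 40, 45, 50, 55, 60, 65],
    "weight": [70, 82, 75, 90, 68, 85, 77, 95],
    "stress": [5, 3, 8, 2, 7, 4, 6, 1],
}
blood_pressure = [50 + 0.8 * a + 0.5 * w
                  for a, w in zip(candidates["age"], candidates["weight"])]

selected = forward_selection(candidates, blood_pressure)
print(selected)
```

Backward elimination works the same way in reverse: start with every variable in the model and drop the least useful one at each step.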

05. Ridge Regression Analysis

Ridge regression builds on the ordinary least squares method and is used to analyze data that suffer from multicollinearity (data where independent variables are highly correlated). Collinearity can be explained as a near-linear relationship between variables.

Whenever there is multicollinearity, the least squares estimates remain unbiased, but their variances are large, so they may be far from the true value. Ridge regression reduces the standard errors by adding some degree of bias to the regression estimates, with the aim of providing more reliable estimates.

Please note that the assumptions of ridge regression are similar to those of least squares regression, except that normality is not assumed. Although ridge regression shrinks the coefficient values, they never reach exactly zero, which means ridge regression cannot select variables.

Suppose you are crazy about two guitarists performing live at an event near you, and you go to watch their performance with a motive to find out who is a better guitarist. But when the performance starts, you notice that both are playing black-and-blue notes at the same time.

Is it possible to find out which guitarist has the biggest impact on the sound when they are both playing loud and fast? Because both are playing at the same time, it is very difficult to tell their contributions apart. This is a good analogy for multicollinearity, which tends to increase the standard errors of the coefficients.

Ridge regression addresses multicollinearity in cases like these and includes bias or a shrinkage estimation to derive results.
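The shrinkage effect can be seen in a tiny two-predictor sketch. With centered data and no intercept, the ridge estimate solves (XᵀX + λI)β = Xᵀy, which is a 2×2 system here. The data are invented so that the two predictors are nearly collinear, and `ridge_two_predictors` is a hypothetical helper written for this example.

```python
import math


def ridge_two_predictors(x1, x2, y, lam):
    """Solve (X'X + lam*I) beta = X'y for two centered predictors (Cramer's rule)."""
    a11 = sum(v * v for v in x1) + lam
    a22 = sum(v * v for v in x2) + lam
    a12 = sum(u * v for u, v in zip(x1, x2))
    c1 = sum(u * v for u, v in zip(x1, y))
    c2 = sum(u * v for u, v in zip(x2, y))
    det = a11 * a22 - a12 * a12
    return ((c1 * a22 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det)


# Invented, centered data: x2 is almost a copy of x1 (strong multicollinearity).
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.1, 0.9, 2.0]
y = [-4.2, -1.7, 0.3, 1.8, 3.8]

ols = ridge_two_predictors(x1, x2, y, lam=0.0)    # lam = 0 is ordinary least squares
ridge = ridge_two_predictors(x1, x2, y, lam=1.0)  # lam > 0 adds the ridge penalty

print("OLS:  ", [round(b, 3) for b in ols])
print("Ridge:", [round(b, 3) for b in ridge])
print(math.hypot(*ols) > math.hypot(*ridge))
```

In this toy data set the least squares coefficients take opposite signs even though both predictors move with `y`, while the ridge coefficients are both positive and similar in size. The penalty value `lam=1.0` is arbitrary; in practice it is usually tuned by cross-validation.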

06. Lasso Regression Analysis

Lasso (Least Absolute Shrinkage and Selection Operator) is similar to ridge regression; however, it penalizes the absolute values of the coefficients instead of the squared values penalized in ridge regression.

It was popularized in 1996 by Robert Tibshirani as an alternative to the traditional least-squares estimate, with the intention of reducing problems related to overfitting when the data has a large number of independent variables.

Lasso can both select variables and regularize them, using a soft threshold. Applying lasso regression makes it easier to derive a subset of predictors that minimizes prediction error for a quantitative response.

Please note that regression coefficients that reach zero after shrinkage are excluded from the lasso model. In contrast, predictors whose coefficients remain nonzero are the ones most strongly associated with the response variable. The explanatory variables can be quantitative, categorical, or both.

Suppose an automobile company wants to perform a research analysis on average fuel consumption by cars in the US. For samples, they chose 32 models of car and 10 features of automobile design: number of cylinders, displacement, gross horsepower, rear axle ratio, weight, ¼ mile time, engine shape (V-shaped or straight), transmission type, number of gears, and number of carburetors.

The response variable mpg (miles per gallon) is strongly correlated with some variables, like weight, displacement, number of cylinders, and horsepower. The problem can be analyzed by using the glmnet package in R and lasso regression for feature selection.
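The article points to glmnet in R for the car example. As a language-neutral illustration of how the soft threshold produces exact zeros, here is a small coordinate descent sketch in plain Python. The two predictors and the response are invented, the columns are assumed to be centered, and the function names and penalty value are assumptions of this sketch.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(columns, y, lam, sweeps=100):
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 for centered columns."""
    n = len(y)
    coef = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Residual with feature j's own contribution left out.
            partial = [
                yi - sum(coef[k] * columns[k][i] for k in range(len(columns)) if k != j)
                for i, yi in enumerate(y)
            ]
            rho = sum(c * p for c, p in zip(col, partial)) / n
            scale = sum(c * c for c in col) / n
            coef[j] = soft_threshold(rho, lam) / scale
    return coef


# Invented, centered predictors: x1 matters a lot, x2 only slightly.
x1 = [-2, -1, 0, 1, 2]
x2 = [1, -2, 0, 2, -1]
y = [3 * a + 0.1 * b for a, b in zip(x1, x2)]

b1, b2 = lasso_coordinate_descent([x1, x2], y, lam=0.5)
print(round(b1, 4), b2)
```

The weak predictor's coefficient is set to exactly zero, which is the variable selection behavior described above, while the strong predictor's coefficient is shrunk below its least squares value of 3.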

07. Elastic Net Regression Analysis

It is a mixture of ridge and lasso regression models trained with L1 and L2 norms. The elastic net brings about a grouping effect wherein strongly correlated predictors tend to be in/out of the model together. Using the elastic net regression model is recommended when the number of predictors is far greater than the number of observations.

Please note that the elastic net regression model came into existence as an alternative to the lasso regression model because lasso's variable selection depended too heavily on the data, making it unstable. By using elastic net regression, statisticians can combine the penalties of ridge and lasso regression to get the best of both models.

A clinical research team having access to a microarray data set on leukemia (LEU) was interested in constructing a diagnostic rule based on the expression level of presented gene samples for predicting the type of leukemia. The data set they had, consisted of a large number of genes and a few samples.

Apart from that, they were given a specific set of samples to be used as training samples, out of which some were infected with type 1 leukemia (acute lymphoblastic leukemia) and some with type 2 leukemia (acute myeloid leukemia).

Model fitting and tuning parameter selection by tenfold CV were carried out on the training data. Then they compared the performance of those methods by computing their prediction mean-squared error on the test data to get the necessary results.
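For reference, one common way to write the elastic net objective (the parametrization used by glmnet; other software scales the terms differently) is shown below. The symbols are assumptions of this note: $n$ is the number of observations, $\lambda \ge 0$ is the overall penalty strength, and $\alpha \in [0, 1]$ is the mixing weight, both typically chosen by cross-validation.

```latex
\hat{\beta} = \arg\min_{\beta}\;
\frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
+ \lambda \left( \alpha \lVert \beta \rVert_1
+ \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 \right)
```

Setting $\alpha = 1$ gives the lasso penalty and $\alpha = 0$ gives the ridge penalty; values in between mix the two.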

A market research survey focuses on three major metrics: Customer Satisfaction, Customer Loyalty, and Customer Advocacy. Remember, although these metrics tell us about customer health and intentions, they fail to tell us ways of improving the position. Therefore, an in-depth survey questionnaire intended to ask consumers the reason behind their dissatisfaction is definitely a way to gain practical insights.

However, it has been found that people often struggle to put forth their motivation or demotivation or describe their satisfaction or dissatisfaction. In addition to that, people often give undue importance to some rational factors, such as price, packaging, etc. This is where regression analysis helps: it acts as a predictive analytics and forecasting tool in market research.

When used as a forecasting tool, regression analysis can estimate an organization's sales figures by taking into account external market data. For example, a multinational company conducts a market research survey to understand the impact of various factors such as GDP (Gross Domestic Product), CPI (Consumer Price Index), and other similar factors on its revenue generation model.

Regression analysis based on forecasted marketing indicators is then used to predict a tentative revenue for future quarters and even future years. However, the further into the future you forecast, the less reliable the data become, leaving a wide margin of error.

Case study of using regression analysis

A water purifier company wanted to understand the factors leading to brand favorability. The survey was the best medium for reaching out to existing and prospective customers. A large-scale consumer survey was planned, and a discreet questionnaire was prepared using the best survey tool .

A number of questions related to the brand, favorability, satisfaction, and probable dissatisfaction were effectively asked in the survey. After getting optimum responses to the survey, regression analysis was used to narrow down the top ten factors responsible for driving brand favorability.

All the ten attributes derived (mentioned in the image below) in one or the other way highlighted their importance in impacting the favorability of that specific water purifier brand.

Regression Analysis in Market Research

It is easy to run a regression analysis using Excel or SPSS, but while doing so, the importance of four numbers in interpreting the data must be understood.

The first two numbers out of the four numbers directly relate to the regression model itself.

  • F-Value: It helps in measuring the statistical significance of the overall model. Strictly speaking, the number to check is the p-value attached to the F-statistic (labeled "Significance F" in Excel output). A value below 0.05 is conventionally taken to mean the survey analysis output is unlikely to be due to chance.
  • R-Squared: This is the share of the movement (variance) in the dependent variable that the independent variables explain. With an R-Squared value of 0.7, the independent variables together explain 70% of the dependent variable's movement, which suggests the survey analysis output has substantial predictive power.

The other two numbers relate to each of the independent variables while interpreting regression analysis.

  • P-Value: Like the significance of the F-Value, the P-Value is a measure of statistical significance. Here, it indicates how statistically significant each independent variable's effect is. Once again, we are looking for a value of less than 0.05.
  • Interpretation: The fourth number is the coefficient estimated for each independent variable. It tells us by what value the dependent variable is expected to increase when the independent variable we are considering increases by one while all other independent variables are held at the same value.

In a few cases, the simple coefficient is replaced by a standardized coefficient demonstrating the contribution from each independent variable to move or bring about a change in the dependent variable.
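To show where two of these numbers come from, the sketch below computes the coefficient, R-Squared, and the F-statistic for a one-predictor regression on invented data. The function and key names are assumptions of this example. The p-values are not computed here because they require the F and t distributions, which statistical packages provide.

```python
def simple_regression_summary(x, y):
    """Return slope, intercept, R-squared and F-statistic for y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]
    ss_total = sum((yi - mean_y) ** 2 for yi in y)                # total variation in y
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained variation
    r_squared = 1 - ss_resid / ss_total
    k = 1  # number of independent variables
    f_stat = ((ss_total - ss_resid) / k) / (ss_resid / (n - k - 1))
    return {"slope": slope, "intercept": intercept,
            "r_squared": r_squared, "f_stat": f_stat}


# Invented survey-style data: ad spend (x) and sales (y).
ad_spend = [1, 2, 3, 4, 5, 6]
sales = [3, 5, 6, 9, 10, 13]

summary = simple_regression_summary(ad_spend, sales)
for name, value in summary.items():
    print(name, round(value, 4))
```

A large F-statistic corresponds to a small "Significance F" value, and the slope is the coefficient described in the fourth bullet.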

01. Get access to predictive analytics

Do you know utilizing regression analysis to understand the outcome of a business survey is like having the power to unveil future opportunities and risks?

For example, by predicting how many people will see a particular television advertisement slot, businesses can use that data to estimate a maximum bid for that slot. The finance and insurance industry as a whole depends a lot on regression analysis of survey data to identify trends and opportunities for more accurate planning and decision-making.

02. Enhance operational efficiency

Do you know businesses use regression analysis to optimize their business processes?

For example, before launching a new product line, businesses conduct consumer surveys to better understand the impact of various factors on the product’s production, packaging, distribution, and consumption.

A data-driven foresight helps eliminate the guesswork, hypothesis, and internal politics from decision-making. A deeper understanding of the areas impacting operational efficiencies and revenues leads to better business optimization.

03. Quantitative support for decision-making

Business surveys today generate a lot of data related to finance, revenue, operation, purchases, etc., and business owners are heavily dependent on various data analysis models to make informed business decisions.

For example, regression analysis helps enterprises to make informed strategic workforce decisions. Conducting and interpreting the outcome of employee surveys like Employee Engagement Surveys, Employee Satisfaction Surveys, Employer Improvement Surveys, Employee Exit Surveys, etc., boosts the understanding of the relationship between employees and the enterprise.

It also helps get a fair idea of certain issues impacting the organization’s working culture, working environment, and productivity. Furthermore, intelligent business-oriented interpretations reduce the huge pile of raw data into actionable information to make a more informed decision.

04. Prevent mistakes from happening due to intuitions

By knowing how to use regression analysis for interpreting survey results, one can easily provide factual support to management for making informed decisions. But do you know that it also helps in keeping out faults in judgment?

For example, a mall manager thinks that extending the closing time of the mall will result in more sales. Regression analysis may contradict this belief by predicting that the increased revenue from additional sales won't cover the increased operating expenses arising from longer working hours.

Regression analysis is a useful statistical method for modeling and comprehending the relationships between variables. It provides numerous advantages to various data types and interactions. Researchers and analysts may gain useful insights into the factors influencing a dependent variable and use the results to make informed decisions. 

With QuestionPro Research, you can improve the efficiency and accuracy of regression analysis by streamlining the data gathering, analysis, and reporting processes. The platform’s user-friendly interface and wide range of features make it a valuable tool for researchers and analysts conducting regression analysis as part of their research projects.


What Is Regression Analysis in Business Analytics?

  • 14 Dec 2021

Countless factors impact every facet of business. How can you consider those factors and know their true impact?

Imagine you seek to understand the factors that influence people’s decision to buy your company’s product. They range from customers’ physical locations to satisfaction levels among sales representatives to your competitors' Black Friday sales.

Understanding the relationships between each factor and product sales can enable you to pinpoint areas for improvement, helping you drive more sales.

To learn how each factor influences sales, you need to use a statistical analysis method called regression analysis .

If you aren’t a business or data analyst, you may not run regressions yourself, but knowing how analysis works can provide important insight into which factors impact product sales and, thus, which are worth improving.

Foundational Concepts for Regression Analysis

Before diving into regression analysis, you need to build foundational knowledge of statistical concepts and relationships.

Independent and Dependent Variables

Start with the basics. What relationship are you aiming to explore? Try formatting your answer like this: “I want to understand the impact of [the independent variable] on [the dependent variable].”

The independent variable is the factor that could impact the dependent variable . For example, “I want to understand the impact of employee satisfaction on product sales.”

In this case, employee satisfaction is the independent variable, and product sales is the dependent variable. Identifying the dependent and independent variables is the first step toward regression analysis.

Correlation vs. Causation

One of the cardinal rules of statistically exploring relationships is to never assume correlation implies causation. In other words, just because two variables move in the same direction doesn’t mean one caused the other to occur.

If two or more variables are correlated , their directional movements are related. If two variables are positively correlated , it means that as one goes up or down, so does the other. Alternatively, if two variables are negatively correlated , one goes up while the other goes down.

A correlation’s strength can be quantified by calculating the correlation coefficient , sometimes represented by r . The correlation coefficient falls between negative one and positive one.

r = -1 indicates a perfect negative correlation.

r = 1 indicates a perfect positive correlation.

r = 0 indicates no correlation.
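As a small illustration of these three cases, the function below computes the Pearson correlation coefficient from its definition. The data sets are invented, and `pearson_r` is simply a name chosen for this sketch.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])        # y rises in step with x
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])        # y falls in step with x
r_none = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])    # y = x squared: no linear trend

print(round(r_positive, 6), round(r_negative, 6), round(r_none, 6))
```

The last case is worth noting: r measures only linear association, so two variables can be clearly related (here, y is x squared) and still have a correlation coefficient of zero.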

Causation means that one variable caused the other to occur. Proving a causal relationship between variables requires a true experiment with a control group (which doesn’t receive the independent variable) and an experimental group (which receives the independent variable).

While regression analysis provides insights into relationships between variables, it doesn’t prove causation. It can be tempting to assume that one variable caused the other—especially if you want it to be true—which is why you need to keep this in mind any time you run regressions or analyze relationships between variables.

With the basics under your belt, here’s a deeper explanation of regression analysis so you can leverage it to drive strategic planning and decision-making.

What Is Regression Analysis?

Regression analysis is the statistical method used to determine the structure of a relationship between two variables (single linear regression) or three or more variables (multiple regression).

According to the Harvard Business School Online course Business Analytics , regression is used for two primary purposes:

  • To study the magnitude and structure of the relationship between variables
  • To forecast a variable based on its relationship with another variable

Both of these insights can inform strategic business decisions.

“Regression allows us to gain insights into the structure of that relationship and provides measures of how well the data fit that relationship,” says HBS Professor Jan Hammond, who teaches Business Analytics, one of three courses that comprise the Credential of Readiness (CORe) program . “Such insights can prove extremely valuable for analyzing historical trends and developing forecasts.”

One way to think of regression is by visualizing a scatter plot of your data with the independent variable on the X-axis and the dependent variable on the Y-axis. The regression line is the line that best fits the scatter plot data. The regression equation represents the line’s slope and the relationship between the two variables, along with an estimation of error.

Physically creating this scatter plot can be a natural starting point for parsing out the relationships between variables.

Types of Regression Analysis

There are two types of regression analysis: single variable linear regression and multiple regression.

Single variable linear regression is used to determine the relationship between two variables: the independent and dependent. The equation for a single variable linear regression looks like this:

$$Y = \alpha + \beta x + \varepsilon, \qquad \hat{y} = \alpha + \beta x$$

In the equation:

  • ŷ is the expected value of Y (the dependent variable) for a given value of X (the independent variable).
  • x is the independent variable.
  • α is the Y-intercept, the point at which the regression line intersects with the vertical axis.
  • β is the slope of the regression line, or the average change in the dependent variable as the independent variable increases by one.
  • ε is the error term, equal to Y – ŷ, or the difference between the actual value of the dependent variable and its expected value.

Multiple regression , on the other hand, is used to determine the relationship between three or more variables: the dependent variable and at least two independent variables. The multiple regression equation looks complex but is similar to the single variable linear regression equation:

$$Y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$$

Each component of this equation represents the same thing as in the previous equation, with the addition of the subscript k, which is the total number of independent variables being examined. For each independent variable you include in the regression, multiply the slope of the regression line by the value of the independent variable, and add it to the rest of the equation.
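Using the symbols defined above, a single variable linear regression can be sketched in a few lines. The employee satisfaction and sales figures below are invented, and the variable names are assumptions of this example. The code estimates α and β by least squares, computes the error term for each observation, and then predicts sales for a new satisfaction score.

```python
# Invented data: employee satisfaction score (x) and product sales (Y).
satisfaction = [60, 65, 70, 75, 80, 85]
sales = [200, 220, 235, 260, 270, 295]

n = len(satisfaction)
mean_x = sum(satisfaction) / n
mean_y = sum(sales) / n

# beta: average change in Y when x increases by one (slope of the line).
beta = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(satisfaction, sales))
    / sum((x - mean_x) ** 2 for x in satisfaction)
)
# alpha: where the regression line crosses the vertical axis (Y-intercept).
alpha = mean_y - beta * mean_x


def predict(x):
    """Expected value of Y (y-hat) for a given x."""
    return alpha + beta * x


# epsilon: actual value minus expected value for each observation.
residuals = [y - predict(x) for x, y in zip(satisfaction, sales)]

print("alpha:", round(alpha, 3), "beta:", round(beta, 3))
print("predicted sales at satisfaction 90:", round(predict(90), 1))
```

With an intercept in the model, the residuals of a least squares fit sum to zero, which is a quick sanity check on the calculation. Predicting at 90 extrapolates slightly beyond the observed scores, so it should be read with extra caution.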

How to Run Regressions

You can use a host of statistical programs—such as Microsoft Excel, SPSS, and STATA—to run both single variable linear and multiple regressions. If you’re interested in hands-on practice with this skill, Business Analytics teaches learners how to create scatter plots and run regressions in Microsoft Excel, as well as make sense of the output and use it to drive business decisions.

Calculating Confidence and Accounting for Error

It’s important to note: This overview of regression analysis is introductory and doesn’t delve into calculations of confidence level, significance, variance, and error. When working in a statistical program, these calculations may be provided or require that you implement a function. When conducting regression analysis, these metrics are important for gauging how significant your results are and how much importance to place on them.

Why Use Regression Analysis?

Once you’ve generated a regression equation for a set of variables, you effectively have a roadmap for the relationship between your independent and dependent variables. If you input a specific X value into the equation, you can see the expected Y value.

This can be critical for predicting the outcome of potential changes, allowing you to ask, “What would happen if this factor changed by a specific amount?”

Returning to the earlier example, running a regression analysis could allow you to find the equation representing the relationship between employee satisfaction and product sales. You could input a higher level of employee satisfaction and see how sales might change accordingly. This information could lead to improved working conditions for employees, backed by data that shows the tie between high employee satisfaction and sales.

Whether predicting future outcomes, determining areas for improvement, or identifying relationships between seemingly unconnected variables, understanding regression analysis can enable you to craft data-driven strategies and determine the best course of action with all factors in mind.

Do you want to become a data-driven professional? Explore our eight-week Business Analytics course and our three-course Credential of Readiness (CORe) program to deepen your analytical skills and apply them to real-world business problems.


Regression: Definition, Analysis, Calculation, and Example

What Is Regression?

Regression is a statistical method used in finance, investing, and other disciplines that attempts to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as independent variables).

Also called simple regression or ordinary least squares (OLS), linear regression is the most common form of this technique. Linear regression establishes the linear relationship between two variables based on a line of best fit . Linear regression is thus graphically depicted using a straight line with the slope defining how the change in one variable impacts a change in the other. The y-intercept of a linear regression relationship represents the value of one variable when the value of the other is zero. Nonlinear regression models also exist, but are far more complex.

Regression analysis is a powerful tool for uncovering the associations between variables observed in data, but cannot easily indicate causation. It is used in several contexts in business, finance, and economics. For instance, it is used to help investment managers value assets and understand the relationships between factors such as commodity prices and the stocks of businesses dealing in those commodities.

Regression as a statistical technique should not be confused with the concept of regression to the mean ( mean reversion ).

Key Takeaways

  • A regression is a statistical technique that relates a dependent variable to one or more independent (explanatory) variables.
  • A regression model is able to show whether changes observed in the dependent variable are associated with changes in one or more of the explanatory variables.
  • It does this by essentially fitting a best-fit line and seeing how the data is dispersed around this line.
  • Regression helps economists and financial analysts in things ranging from asset valuation to making predictions.
  • For regression results to be properly interpreted, several assumptions about the data and the model itself must hold.

Understanding Regression

Regression captures the correlation between variables observed in a data set and quantifies whether those correlations are statistically significant or not.

The two basic types of regression are simple linear regression and  multiple linear regression , although there are nonlinear regression methods for more complicated data and analysis. Simple linear regression uses one independent variable to explain or predict the outcome of the dependent variable Y, while multiple linear regression uses two or more independent variables to predict the outcome (while holding all others constant). Analysts can use stepwise regression to examine each independent variable contained in the linear regression model.

Regression can help finance and investment professionals as well as professionals in other businesses. Regression can also help predict sales for a company based on weather, previous sales, gross domestic product (GDP) growth, or other types of conditions. The capital asset pricing model (CAPM) is an often-used regression model in finance for pricing assets and discovering the costs of capital.

Regression and Econometrics

Econometrics is a set of statistical techniques used to analyze data in finance and economics. An example of the application of econometrics is to study the income effect using observable data. An economist may, for example, hypothesize that as a person increases their income , their spending will also increase.

If the data show that such an association is present, a regression analysis can then be conducted to understand the strength of the relationship between income and consumption and whether or not that relationship is statistically significant—that is, it appears to be unlikely that it is due to chance alone.

Note that you can have several explanatory variables in your analysis, for example, changes to GDP and inflation in addition to unemployment in explaining stock market prices. When more than one explanatory variable is used, it is referred to as multiple linear regression. This is the most commonly used tool in econometrics.

Econometrics is sometimes criticized for relying too heavily on the interpretation of regression output without linking it to economic theory or looking for causal mechanisms. It is crucial that the findings revealed in the data are able to be adequately explained by a theory, even if that means developing your own theory of the underlying processes.

Calculating Regression

Linear regression models often use a least-squares approach to determine the line of best fit. The least-squares technique chooses the line that minimizes the sum of squares. Each square is, in turn, obtained by squaring the distance between a data point and the regression line or mean value of the data set.

Once this process has been completed (usually done today with software), a regression model is constructed. The general form of each type of regression model is:

Simple linear regression:

$$Y = a + bX + u$$

Multiple linear regression:

$$Y = a + b_1X_1 + b_2X_2 + b_3X_3 + \dots + b_tX_t + u$$

where:

  • Y = The dependent variable you are trying to predict or explain
  • X = The explanatory (independent) variable(s) you are using to predict or associate with Y
  • a = The y-intercept
  • b = (beta coefficient) is the slope of the explanatory variable(s)
  • u = The regression residual or error term
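As a rough illustration of the least-squares idea for the simple model Y = a + bX + u, the sketch below computes the intercept and slope from the standard closed-form formulas. The function names and the five data points are made up for this example and are not from the article; real analyses would normally rely on statistical software.

```python
from statistics import mean


def fit_simple_ols(xs, ys):
    """Return (a, b) that minimize the sum of squared residuals of y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = y_bar - b * x_bar
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared vertical distances between the data and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_simple_ols(xs, ys)
ssr = sum_squared_residuals(xs, ys, a, b)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}, SSR = {ssr:.4f}")
```

Any other choice of a and b gives a larger sum of squared residuals on the same data, which is what "line of best fit" means here.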

Example of How Regression Analysis Is Used in Finance

Regression is often used to determine how specific factors, such as the price of a commodity, interest rates, particular industries, or sectors, influence the price movement of an asset. The aforementioned CAPM is based on regression, and is utilized to project the expected returns for stocks and to generate costs of capital. A stock’s returns are regressed against the returns of a broader index, such as the S&P 500, to generate a beta for the particular stock.

Beta is the stock’s risk in relation to the market or index and is reflected as the slope in the CAPM. The return for the stock in question would be the dependent variable Y, while the independent variable X would be the market risk premium.

Additional variables such as the market capitalization of a stock, valuation ratios, and recent returns can be added to the CAPM to get better estimates for returns. These additional factors are known as the Fama-French factors, named after the professors who developed the multiple linear regression model to better explain asset returns.
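To make the beta calculation concrete, here is a minimal sketch that regresses a stock's excess returns on the market's excess returns and reads beta off as the slope. The return series are invented for illustration (the stock series is constructed so that its true beta is 1.5), and `estimate_beta` is a hypothetical helper, not a function from any finance library.

```python
def estimate_beta(stock_excess, market_excess):
    """Return (alpha, beta) from regressing stock excess returns on market excess returns."""
    n = len(market_excess)
    m_bar = sum(market_excess) / n
    s_bar = sum(stock_excess) / n
    # The regression slope equals covariance(stock, market) / variance(market).
    cov = sum((m - m_bar) * (s - s_bar) for m, s in zip(market_excess, stock_excess))
    var = sum((m - m_bar) ** 2 for m in market_excess)
    beta = cov / var
    alpha = s_bar - beta * m_bar  # intercept of the regression line
    return alpha, beta


# Hypothetical excess returns (return minus the risk-free rate) over five periods.
market_excess = [0.02, -0.01, 0.03, 0.00, 0.01]
stock_excess = [0.001 + 1.5 * m for m in market_excess]

alpha, beta = estimate_beta(stock_excess, market_excess)
print(f"alpha = {alpha:.4f}, beta = {beta:.2f}")
```

A beta above 1 in this setup would suggest the stock tends to move more than the index; adding further factors would turn this into a multiple regression.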

Why Is It Called Regression?

Although there is some debate about the origins of the name, the statistical technique described above most likely was termed “regression” by Sir Francis Galton in the 19th century to describe the statistical feature of biological data (such as heights of people in a population) to regress to some mean level. In other words, while there are shorter and taller people, only outliers are very tall or short, and most people cluster somewhere around (or “regress” to) the average.

What Is the Purpose of Regression?

In statistical analysis, regression is used to identify the associations between variables occurring in some data. It can show the magnitude of such an association and determine its statistical significance (i.e., whether or not the association is likely due to chance). Regression is a powerful tool for statistical inference and has been used to try to predict future outcomes based on past observations.

How Do You Interpret a Regression Model?

A regression model output may be in the form of $Y = 1.0 + 3.2X_1 - 2.0X_2 + 0.21$.

Here we have a multiple linear regression that relates some variable Y with two explanatory variables $X_1$ and $X_2$. We would interpret the model as follows: the value of Y changes by 3.2 units for every one-unit change in $X_1$ (if $X_1$ goes up by 2, Y goes up by 6.4, etc.), holding all else constant (all else equal). That means that, controlling for $X_2$, $X_1$ has this observed relationship. Likewise, holding $X_1$ constant, every one-unit increase in $X_2$ is associated with a decrease of 2.0 units in Y. We can also note the y-intercept of 1.0, meaning that Y = 1 when $X_1$ and $X_2$ are both zero. The error term (residual) is 0.21.
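The interpretation above can be checked numerically. The short sketch below encodes the example coefficients in a hypothetical `predict` function (the residual of 0.21 is left out, since it is not part of the fitted value) and compares predictions as one variable changes while the other is held fixed.

```python
def predict(x1, x2, intercept=1.0, b1=3.2, b2=-2.0):
    """Fitted value of Y for the example model (error term excluded)."""
    return intercept + b1 * x1 + b2 * x2


baseline = predict(0, 0)              # the y-intercept
effect_x1 = predict(2, 0) - baseline  # X1 up by 2, X2 held constant
effect_x2 = predict(0, 1) - baseline  # X2 up by 1, X1 held constant

print(f"Y at X1 = X2 = 0: {baseline}")
print(f"Change in Y when X1 rises by 2: {effect_x1:+.1f}")
print(f"Change in Y when X2 rises by 1: {effect_x2:+.1f}")
```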

What Are the Assumptions That Must Hold for Regression Models?

To properly interpret the output of a regression model, the following main assumptions about the underlying data process of what you are analyzing must hold:

  • The relationship between variables is linear.
  • There must be homoskedasticity, or the variance of the variables and error term must remain constant.
  • All explanatory variables are independent of one another.
  • All variables are normally distributed.

The Bottom Line

Regression is a statistical method that tries to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as independent variables). It is used in finance, investing, and other disciplines.

Regression analysis uncovers the associations between variables observed in data, but cannot easily indicate causation.


  • Open access
  • Published: 14 May 2024

Exploring predictors and prevalence of postpartum depression among mothers: Multinational study

  • Samar A. Amer   ORCID: orcid.org/0000-0002-9475-6372 1 ,
  • Nahla A. Zaitoun   ORCID: orcid.org/0000-0002-5274-6061 2 ,
  • Heba A. Abdelsalam 3 ,
  • Abdallah Abbas   ORCID: orcid.org/0000-0001-5101-5972 4 ,
  • Mohamed Sh Ramadan 5 ,
  • Hassan M. Ayal 6 ,
  • Samaher Edhah Ahmed Ba-Gais 7 ,
  • Nawal Mahboob Basha 8 ,
  • Abdulrahman Allahham 9 ,
  • Emmanuael Boateng Agyenim 10 &
  • Walid Amin Al-Shroby 11  

BMC Public Health volume 24, Article number: 1308 (2024)


Postpartum depression (PPD) affects around 10% of women after giving birth, with some estimates as high as 1 in 7 women. Undiagnosed PPD was observed among 50% of mothers. PPD has an unfavorable relationship with women’s functioning, marital and personal relationships, the quality of the mother-infant connection, and the social, behavioral, and cognitive development of children. We aim to determine the frequency of PPD and explore associated determinants or predictors (demographic, obstetric, infant-related, and psychosocial factors) and coping strategies from June to August 2023 in six countries.

An analytical cross-sectional study included a total of 674 mothers who visited primary health care centers (PHCs) in Egypt, Yemen, Iraq, India, Ghana, and Syria. They were asked to complete self-administered assessments using the Edinburgh Postnatal Depression Scale (EPDS). The data underwent logistic regression analysis using SPSS-IBM 27 to list potential factors that could predict PPD.

The overall frequency of PPD in the total sample was 92 (13.6%). It ranged from 2.3% in Syria to 26% in Ghana. Only 42 (6.2%) were diagnosed. Multiple logistic regression analysis revealed several significant predictors of PPD. These included having an unhealthy baby (adjusted odds ratio (aOR) 11.685, 95% CI: 1.405–97.139, p = 0.023), having a precious baby (aOR 7.717, 95% CI: 1.822–32.689, p = 0.006), not receiving support (aOR 9.784, 95% CI: 5.373–17.816, p = 0.001), and suffering from PPD. However, being married and being comfortable discussing mental health with family relatives were significant protective factors (aOR 0.141, 95% CI: 0.04–0.494, p = 0.002, and aOR 0.369, 95% CI: 0.146–0.933, p = 0.035, respectively).

The frequency of PPD among the mothers varied significantly across different countries. PPD has many protective and potential factors. We recommend further research and screenings of PPD for all mothers to promote the well-being of the mothers and create a favorable environment for the newborn and all family members.


Introduction

Postpartum depression (PPD) is among the most prevalent mental health issues [ 1 ]. The onset of depressive episodes after childbirth occurs at a pivotal point in a woman’s life and can last for an extended period of 3 to 6 months; however, this varies based on several factors [ 2 ]. PPD can develop at any time within the first year after childbirth and last for years [ 2 ]. It refers to depressive symptoms that a mother experiences during the postpartum period, which are vastly different from “baby blues,” which many mothers experience within three to five days after the birth of their child [ 3 ].

Depressive episodes are twice as likely to occur during pregnancy compared to other times in a woman’s life, and they frequently go undetected and untreated [ 4 ]. According to estimates, almost 50% of mothers with PPD go undiagnosed [ 4 ]. The Diagnostic and Statistical Manual of Mental Disorders (DSM-5) criteria for PPD include mood instability, loss of interest, feelings of guilt, sleep disturbances, and changes in appetite [ 5 ], as well as decreased libido, crying spells, anxiety, irritability, feelings of isolation, mental lability, thoughts of hurting oneself and/or the infant, and even suicidal ideation [ 6 ].

Approximately 1 in 10 women will experience PPD after giving birth, with some studies reporting 1 in 7 women [ 7 ]. Globally, the prevalence of PPD is estimated to be 17.22% (95% CI: 16.00–18.05) [ 4 ], with a prevalence of up to 15% in the previous year in eighty different countries or regions [ 1 ]. This estimate is lower than the 19% prevalence rate of PPD found in studies from low- and middle-income countries and higher than the 13% prevalence rate (95% CI: 12.3–13.4%) stated in a different meta-analysis of data from high-income countries [ 8 ].

The occurrence of postpartum depression is influenced by various factors, including social aspects like marital status, education level, lack of social support, violence, and financial difficulties, as well as other factors such as maternal age (particularly among younger women), obstetric stressors, parity, and unplanned pregnancy [ 4 ]. When a mother experiences depression, she may face challenges in forming a satisfying bond with her child, which can negatively affect both her partner and the emotional and cognitive development of infants and adolescents [ 4 ]. As a result, adverse effects may be observed in children during their toddlerhood, preschool years, and beyond [ 9 ].

Around one in seven women can develop PPD [ 7 ]. While women experiencing baby blues tend to recover quickly, PPD tends to last longer and severely affects women’s ability to return to normal function. PPD affects the mother and her relationship with the infant [ 7 ]. The prevalence of postpartum depression varies depending on the assessment method, timing of assessment, and cultural disparities among countries [ 7 ]. To address these aspects, we conducted a cross-sectional study focusing on mothers who gave birth within the previous 18 months. Objectives: to determine the frequency of PPD and explore associated determinants or predictors, including demographic, obstetric, infant-related, and psychosocial factors, and coping strategies from June to August 2023 in six countries.

Study design and participants

This analytical cross-sectional study involved 674 mothers during the childbearing period (CBP) from six countries, chosen based on the authors’ working settings: Egypt, Syria, Yemen, Ghana, India, and Iraq. It was conducted from June to August 2023. It included all mothers who had given birth within the previous 18 months, were citizens of one of the targeted countries, and were older than 18 years and younger than 40 years. Women who visited for a routine postpartum follow-up visit and immunization of their newborns were surveyed.

The exclusion criteria were: multiple pregnancies; illiteracy; being deemed unfit to participate by healthcare authorities; inability to access or use the Internet; inability to read or speak Arabic or English or to use the online platform or smart devices; having a baby who was diagnosed with serious health problems, was stillborn, or experienced intrauterine fetal death; and having complicated medical, mental, or psychological disorders that interfered with completing the questionnaire. No incentives were offered to encourage participation.

Sample size and techniques

The sample size was estimated according to the following equation: $n = Z^2 P(1-P)/d^2$. This calculation was based on the results of a systematic review and meta-analysis in 2020 of 17% as the worldwide prevalence of PPD and 12% as the worldwide incidence of PPD, as well as a 5% precision percentage, 80% power of the study, a 95% confidence level, and an 80% response rate [ 11 ]. The total calculated sample size is 675. The sample was diverse in terms of nationality, including Egyptian (16.3%), Yemeni (24.3%), and Indian (19.1%) mothers, based on many factors discussed in the limitation section.
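For readers who want to see the formula in action, the sketch below evaluates n = Z²P(1−P)/d² for a single proportion. It assumes Z = 1.96 for the 95% confidence level and uses a hypothetical helper name; it does not attempt to reproduce the reported total of 675, which depends on further adjustments (power, response rate, number of sites) that the text does not spell out step by step.

```python
import math


def sample_size_for_proportion(p, d, z=1.96):
    """Minimum n to estimate a proportion p within +/- d (z = 1.96 assumed for 95% confidence)."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


# Inputs quoted in the text: 17% prevalence, 12% incidence, 5% precision.
n_prevalence = sample_size_for_proportion(p=0.17, d=0.05)
n_incidence = sample_size_for_proportion(p=0.12, d=0.05)

print(f"n for P = 17%: {n_prevalence}")
print(f"n for P = 12%: {n_incidence}")
```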

The sampling process for recruiting mothers utilized a multistage approach. Two governorates were randomly selected from each country. Moreover, we selected one rural and one urban area from each governorate. Through random selection, participants were chosen for the study. Popular and officially recognized online platforms, including websites and social media platforms such as Facebook, Twitter, WhatsApp groups, and registered emails across various health centers, were utilized for reaching out to participants. Furthermore, a community-based sample was obtained from different public locations, including well-baby clinics, PHCs, and family planning units.

Mothers completed the questionnaire using either tablets or cellphones provided by the data collectors or by scanning the QR code. All questions were mandatory to prevent incomplete forms. Once they provided their informed consent, they received the questionnaire, which they completed and submitted. To enhance the response rate, reminder messages and follow-up communications were employed until the desired sample size was achieved or until the end of August. Data collection ended before the meteorological autumn season, which begins on the 1st day of September, to avoid seasonal affective disorders: autumn may be associated with depressive symptoms that could confound or affect our results.

Data collection tool

Questionnaire development and structure

The questionnaire was developed and adapted based on data obtained from previous studies [ 7 , 8 , 9 , 10 , 11 , 12 ]. Initially, it was created in English and subsequently translated into Arabic. To ensure accuracy, a bilingual panel consisting of two healthcare experts and an externally qualified medical translator translated the English version into Arabic. Additionally, two English-speaking translators performed a back translation, and the original panel was consulted if any concerns arose.

Questionnaire validation

To collect the data, an online, self-administered questionnaire was utilized, designed in Arabic with a well-structured format. We conducted an assessment of the questionnaire’s reliability and validity to ensure a consistent interpretation of the questions. The questionnaire underwent validation by psychiatrists, obstetricians, and gynecologists. Furthermore, in a pilot study involving 20 women of childbearing age (CBA), the questionnaire’s clarity and comprehensibility were evaluated. It is important to note that the findings from the pilot study were not included in our main study.

The participants were asked to rate the questionnaire’s organization, clarity, and length, as well as provide a general opinion. Following that, certain questions were revised in light of their input. To check for reliability and reproducibility, the questionnaire was tested again on the same people one week later. The data collected during the pilot test were not included in the final data analysis. We calculated a Cronbach’s alpha of 0.76 for the questionnaire.

The structure of the questionnaire

After giving their permission to take part in the study, participants completed the questionnaire, which consisted of the following sections:

Study information and electronic solicitation of informed consent.

Demographic and health-related factors: age, gender, place of residence, educational level, occupation, marital status, weight, height, and the fees of access to healthcare services.

Obstetric history: number of pregnancies, gravida, history of abortions, number of live children, history of dead children, inter-pregnancy space (y), current pregnancy status, type of the last delivery, weight gain during pregnancy (kg), baby age (months), premature labor, healthy baby, baby admitted to the NICU, feeding difficulties, pregnancy problems, postnatal problems, natal problems, and the nature of baby feeding.

Assessment of postpartum depression (PPD) levels using the Edinburgh 10-question scale: This scale is a simple and effective screening tool for identifying individuals at risk of perinatal depression. The EPDS (Edinburgh Postnatal Depression Scale) is a valuable instrument that helps identify the likelihood of a mother experiencing depressive symptoms of varying severity. A score exceeding 13 indicates an increased probability of a depressive illness. However, clinical discretion should not be disregarded when interpreting the EPDS score. This scale captures the mother’s feelings over the past week, and in cases of uncertainty, it may be beneficial to repeat the assessment after two weeks. It is important to note that this scale is not capable of identifying mothers with anxiety disorders, phobias, or personality disorders.

For Questions 1, 2, and 4 (without asterisks): Scores range from 0 to 3, with the top box assigned a score of 0 and the bottom box assigned a score of 3. For Questions 3 and 5–10 (with asterisks): Scores are reversed, with the top box assigned a score of 3 and the bottom box assigned a score of 0. The maximum score achievable is 30, and a probability of depression is considered when the score is 10 or higher. It is important to always consider item 10, which pertains to suicidal ideation [ 12 ].
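The scoring rule described above can be sketched in a few lines. The input encoding (0 for the top box through 3 for the bottom box of each question) and the function name are assumptions made for this illustration, and the default cutoff of 10 simply follows the threshold quoted in the preceding paragraph; it is not a clinical recommendation.

```python
# Questions 1, 2, and 4 are scored 0..3 from top to bottom; the rest are reversed.
FORWARD_SCORED = {1, 2, 4}


def score_epds(box_indices, cutoff=10):
    """Score one EPDS form.

    box_indices: 10 integers, one per question, where 0 is the top box
    and 3 is the bottom box (a hypothetical input encoding).
    """
    if len(box_indices) != 10 or any(i not in (0, 1, 2, 3) for i in box_indices):
        raise ValueError("expected 10 answers, each in 0..3")
    item_scores = [
        idx if question in FORWARD_SCORED else 3 - idx
        for question, idx in enumerate(box_indices, start=1)
    ]
    total = sum(item_scores)
    return {
        "total": total,
        "possible_depression": total >= cutoff,
        "item10": item_scores[9],  # suicidal-ideation item, always reviewed separately
    }


# Example: top box ticked on every question.
result = score_epds([0] * 10)
print(result)
```

Returning item 10 separately reflects the instruction to always consider it, whatever the total score is.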

Psychological and social characteristics: received support or treatment for PPD, awareness of symptoms and risk factors, experienced cultural stigma or judgment about PPD in the community, suffer from any disease or mental or psychiatric disorder, have you ever been diagnosed with PPD, problems with the husband, and financial problems.

Coping strategies and causes for not receiving the treatment and reactions to PPD, in descending order: social norms, cultural or traditional beliefs, personal barriers, geographical or regional disparities in mental health resources, language or communication barriers, and financial constraints.

Statistical analysis

The collected data was computerized and statistically analyzed using the SPSS program (Statistical Package for Social Science), version 27. The data was tested for normal distribution using the Shapiro-Wilk test. Qualitative data was represented as frequencies and relative percentages. Quantitative data was expressed as mean ± SD (standard deviation) if it was normally distributed; otherwise, median and interquartile range (IQR) were used. The Mann-Whitney test (MW) was used to calculate the difference between quantitative variables in two groups for non-parametric variables. Correlation analysis (using Spearman’s method) was used to assess the relationship between two nonparametric quantitative variables. All results were considered statistically significant when the p-value was < 0.05. The chi-square test (χ²) and Fisher’s exact test were used to calculate the difference between qualitative variables.

The frequency of PPD among mothers (Fig.  1 )

Figure 1: The frequency of PPD among the studied mothers

The frequency of PPD in the total sample using the Edinburgh 10-question scale was 13.5% (Table S1), reported as 92 (13.6%) mothers. It varied significantly (p = 0.001) across countries, being highest among Ghanaian mothers, 13 (26.0%) out of 50, and Indian mothers, 28 (21.7%) out of 129, followed in descending order by Egyptian mothers, 21 (19.1%) out of 110; Yemeni mothers, 14 (8.5%) out of 164; Iraqi mothers, 13 (7.7%) out of 168; and Syrian mothers, 1 (2.3%) out of 43. Nationality is also significantly associated with PPD (p = 0.001).

Demographic, and health-related characteristics and their association with PPD (Table  1 )

The study included 674 participants. The median age was 27 years, with 407 (60.3%) of participants falling in the >25 to 40-year-old age group. The majority of participants were married, 650 (96.4%); had sufficient monthly income, 449 (66.6%); had at least a preparatory or high school level of education, 498 (73.9%); and were urban. Regarding health-related factors, 270 (40.01%) smoked, 645 (95.7%) smoked, 365 (54.2%) got the COVID-19 vaccine, and 297 (44.1%) got COVID-19. Moreover, 557 (82.6%) had no comorbidities, 623 (92.4%) had no psychiatric illness or family history, and 494 (73.3%) paid for health care services themselves.

PPD was significantly (p < 0.05) higher among single or widowed women, 9 (56.3%); mothers who had both medical and mental or psychological problems, 2 (66.7%); ex-cigarette smokers, 5 (35.7%) (p = 0.033); mothers who consumed alcohol (p = 0.022); and mothers who paid for health care services themselves, 59 (11.9%).

Obstetric, current pregnancy, and infant-related characteristics and their association with PPD (Table  2 )

The majority of the studied mothers were on no hormonal treatment or contraceptive pills, 411 (60.9%); had a current pregnancy that was unplanned and wanted, 311 (46.1%); gained ≥ 10 kg, 463 (68.6%); delivered vaginally, 412 (61.1%); had a healthy baby, 613 (90.9%); and were breastfeeding only, 325 (48.2%).

A significant (p < 0.05) association was observed: PPD was significantly more frequent among mothers on contraceptive methods; those who had 1–2 live births (76.1%); those who had an interpregnancy space of less than 2 years, 86 (93.5%); those who had a history of dead children; and those who had postnatal problems (27.2%).

The psychosocial characteristics and their association with PPD (Table  3 )

Regarding the psychological and social characteristics of the mothers, the majority of mothers were unaware of the symptoms of PPD (75%), and only 236 (35.3%) experienced cultural stigma or judgment about PPD in the community. About 41 (6.1%) were diagnosed with PPD during the previous pregnancy, and only 42 (6.2%) were diagnosed and on medications.

A p-value of less than 0.001 demonstrates a highly statistically significant association with the presence of PPD. Mothers with PPD were significantly more likely to have a history of or be currently diagnosed with PPD, as well as to have financial and marital problems. They were also more likely to have experienced cultural stigma or judgment about PPD and to have received more support.

Coping strategies and causes for not receiving the treatment and reaction to PPD (Table  3 ; Fig.  2 )

Figure 2: Causes for not receiving the treatment and reaction to PPD

Around half of the mothers didn’t feel comfortable discussing mental health: 292 (43.3%) with a physician, 307 (45.5%) with a husband, 326 (48.4%) with family, and 472 (70.0%) with the community. Moreover, mothers with PPD felt significantly more comfortable discussing mental health in descending order: 46 (50.0%) with a physician, 41 (44.6%) with a husband, and 39 (42.3%) with a family (Table  3 ).

There were different causes for not receiving the treatment and reactions to PPD, in descending order: 65.7% social norms, 60.5% cultural or traditional beliefs, 56.5% personal barriers, 48.5% geographical or regional disparities in mental health resources, 47.4% language or communication barriers, and 39.7% financial constraints.

Prediction of PPD (significant demographic, obstetric, current pregnancy, infant-related, and psychosocial factors, and coping strategies) derived from multiple logistic regression analysis (Table 4)

Significant demographic predictors of PPD

Marital Status (Married or Single): The adjusted odds ratio (aOR) among PPD mothers who were married in comparison to their single counterparts was 0.141 (95% CI: 0.04–0.494; p -value = 0.002).

Nationality: For PPD Mothers of Yemeni nationality compared to those with Egyptian nationality, the aOR was 0.318 (95% CI: 0.123–0.821, p  = 0.018). Similarly, for Syrian nationality in comparison to Egyptian nationality, the aOR was 0.111 (95% CI: 0.0139–0.887, p  = 0.038), and for Iraqi nationality compared to Egyptian nationality, the aOR was 0.241 (95% CI: 0.0920–0.633, p  = 0.004).

Significant obstetric, current pregnancy, and infant-related characteristics predictors of PPD

Current Pregnancy Status (Precious Baby—Planned): The aOR for the occurrence of PPD among women with a “precious baby” relative to those with a “planned” pregnancy was 7.717 (95% CI: 1.822–32.689, p  = 0.006).

Healthy Baby (No-Yes): The aOR for the occurrence of PPD among women with unhealthy babies in comparison to those with healthy ones is 11.685 (95% CI: 1.405–97.139, p  = 0.023).

Postnatal Problems (No–Yes): The aOR among PPD mothers reporting postnatal problems relative to those not reporting such problems was 0.234 (95% CI: 0.0785–0.696, p  = 0.009).

Significant psychological and social predictors of PPD

Receiving support or treatment for PPD (No-Yes): The aOR among PPD mothers who were not receiving support or treatment relative to those receiving support or treatment was 9.784 (95% CI: 5.373–17.816, p  = 0.001).

Awareness of symptoms and risk factors (No-Yes): The aOR among PPD mothers who lack awareness of symptoms and risk factors relative to those with awareness was 2.902 (95% CI: 1.633–5.154, p  = 0.001).

Experienced cultural stigma or judgement about PPD in the community (No-Yes): The aOR among PPD mothers who had experienced cultural stigma or judgment in the community relative to those who have not was 4.406 (95% CI: 2.394–8.110, p  < 0.001).

Suffering from any disease or mental or psychiatric disorder: For “Now I am suffering—not at all,” the aOR among PPD mothers was 12.871 (95% CI: 3.063–54.073, p  = 0.001). Similarly, for “Had a past history but was treated—not at all,” the adjusted odds ratio was 16.6 (95% CI: 2.528–108.965, p  = 0.003), and for “Had a family history—not at all,” the adjusted odds ratio was 3.551 (95% CI: 1.012–12.453, p  = 0.048).

Significant coping predictors of PPD comfort: discussing mental health with family (maybe yes)

The aOR among PPD mothers who were maybe more comfortable discussing mental health with family relatives was 0.369 (95% CI: 0.146–0.933, p  = 0.035).
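One way to read the adjusted odds ratios above is to move them back to the log-odds scale on which logistic regression is fitted. The sketch below assumes the reported intervals are symmetric Wald intervals on that scale (an assumption; the text does not say how they were computed) and recovers an approximate coefficient, standard error, and two-sided p-value from a published aOR and its 95% CI. The helper name is hypothetical.

```python
import math


def wald_summary(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate log-odds coefficient, SE, z, and p from an odds ratio and its 95% CI."""
    beta = math.log(odds_ratio)  # logistic regression coefficient
    # A Wald interval is beta +/- z_crit * SE on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return {
        "beta": beta,
        "se": se,
        "z": z,
        "p": p,
        # An interval that excludes 1 corresponds to significance at the 5% level.
        "excludes_one": ci_low > 1 or ci_high < 1,
    }


married = wald_summary(0.141, 0.04, 0.494)        # protective: aOR below 1
unhealthy_baby = wald_summary(11.685, 1.405, 97.139)  # risk factor: aOR above 1

for name, res in [("married", married), ("unhealthy baby", unhealthy_baby)]:
    print(f"{name}: beta = {res['beta']:.3f}, SE = {res['se']:.3f}, p = {res['p']:.4f}")
```

A negative coefficient corresponds to an aOR below 1 (lower odds of PPD), and a very wide interval, as for the unhealthy-baby predictor, signals an imprecise estimate even when it is statistically significant.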

PPD is a debilitating mental disorder that has many risk and protective factors that should be considered to promote the mental and psychological well-being of the mothers and to create a favorable environment for the newborn and all family members. This multinational cross-sectional survey was conducted in six different countries to determine the frequency of PPD using EPDS and to explore its predictors. It was found that PPD was a prevalent problem that varied across different nations.

The frequency of PPD across the studied countries

Using the widely used EPDS to determine the current PPD, we found that the overall frequency of PPD in the total sample was 92 (13.6%). It varied significantly (p = 0.001) across countries, being highest among Ghanaian mothers, 13 (26.0%) out of 50, and Indian mothers, 28 (21.7%) out of 129, followed in descending order by Egyptian mothers, 21 (19.1%) out of 110; Yemeni mothers, 14 (8.5%) out of 164; Iraqi mothers, 13 (7.7%) out of 169; and Syrian mothers, 1 (2.3%) out of 43. This prevalence was similar to that reported by Hairol et al. (2021) in Malaysia (14.3%) [ 13 ], Yusuff et al. (2010) in Malaysia (14.3%) [ 14 ], and Nakku et al. (2006) in New Delhi (12.75%) [ 15 ].

The frequency of PPD varied greatly based on the timing, setting, and existence of many psychosocial factors and post-partum periods. For example, our frequency was higher than that reported in Italy (2012), which was 4.7% [ 16 ]; in Turkey (2017), 9.1%/110 [ 17 ]; in Sudan, 9.2% [ 18 ]; in Eritrea (2020), 7.4% [ 19 ]; in the capital Kuala Lumpur (2001), 3.9% [ 20 ]; in Malaysia (2002), 9.8% [ 21 ]; and in European countries (2021), 13–19% [ 22 ].

Our frequency was lower than those reported elsewhere. PPD is a predominant problem in Asia: in Pakistan, in the three-month period after childbirth, it ranged from 28.8% in 2003 to 36% in 2006 to 94% in 2007, while 12 months after childbirth it was 62% in 2021 [ 23 – 24 ]. It was 45% after first labour in Afghanistan in 2022 [ 25 ]; 40% in Canada (2015) [ 26 ]; 22% among primiparas in a 2022 systematic review in India [ 27 ]; 22.8% in Malaysia (2006) [ 28 ]; 21.5% in India (2019) [ 29 ]; 19% in the Tigray zone in Ethiopia (2017) [ 30 ]; between 20.3% and 35% in Iran [ 31 – 32 ]; and 499 (27.37%) out of 1823 in China [ 33 ]. A possible explanation might be the differences in the study setting and the type of design utilized. Other differences should be considered, like different populations with different socioeconomic characteristics and the variation in the timing of post-partum follow-up. It is vital to consider the role of culture, the impact of patients’ beliefs, and the cultural support for receiving help for PPD.

Demographic and health-related associations, or predictors of PPD (Tables  1 and 4 )

Regarding age, our study found no significant difference between PPD and non-PPD mothers, in agreement with previous studies [ 12 , 34 , 35 ]. In contrast, other studies [ 36 , 37 , 38 ] found an inverse association between women’s age and PPD, with a significantly increased risk of PPD (higher EPDS scores) at a younger age, as teenage mothers, being primiparous, encounter difficulty during the postpartum period due to their inability to cope with financial and emotional difficulties, as well as the challenge of motherhood. Cultural factors and social perspectives of young mothers in different countries could be a reason for this difference [ 38 – 39 ]. Abdollahi et al. [ 36 ] reported that older maternal age was a protective factor for PPD (OR = 0.88, 95% CI: 0.84–0.92).

Regarding marital status, after controlling for other variables, married mothers had significantly lower odds of experiencing PPD than single women (aOR = 0.141; 95% CI: 0.04–0.494; p = 0.002). Gebregziabher et al. [ 19 ] also reported statistically significant differences in the proportion of mothers with PPD by marital status.

Regarding the mother’s education, in agreement with our study, Ahmed et al. [ 34 ] showed that there was no statistically significant difference between PPD and a mother’s education, while Agarwala et al. [ 29 ] showed that a higher level of mother’s education increases the risk of PPD. Gebregziabher et al. [ 19 ] showed that housewives were less likely to develop PPD than employed mothers (aOR = 0.24, 95% CI: 0.06–0.97; p = 0.046), and that mothers who perceived their socioeconomic status (SES) as low were 13 times more likely to develop PPD than mothers who had good SES (aOR = 13.33, 95% CI: 2.66–66.78; p = 0.002).

Regarding SES or monthly income, other studies [ 18 , 40 ] found a statistically significant association between PPD and different domains of SES: 34% of depressed women were found to live under low SES conditions, in comparison to only 15.4% who lived in high SES conditions and experienced PPD. In disagreement with our study, Hairol et al. [ 12 ] demonstrated that the incidence of PPD was significantly higher ( p = 0.01) for participants from the low-income group (27.27%), who were 2.58 times more likely to have PPD symptoms (OR: 2.58, 95% CI: 1.23–5.19; p = 0.01) than those from the middle- and high-income groups (8.33%). Low household income (OR = 3.57, 95% CI: 1.49–8.5) also increased the odds of PPD [ 41 ].

Adeyemo et al. (2020) and Al Nasr et al. (2020) revealed that there was no significant difference between the occurrence of PPD and socio-demographic characteristics. This difference may be due to a different sample size and ethnicity [ 42 , 43 ]. In agreement with our findings, Abdollahi et al. [ 36 ] demonstrated, using multiple logistic regression analyses, increased odds of PPD with a lower state of general health (OR = 1.08, 95% CI: 1.06–1.11), gestational diabetes (OR = 2.93, 95% CI: 1.46–5.88), and low household income (OR = 3.57, 95% CI: 1.49–8.5). The odds of PPD decreased.

Regarding access to health care, in agreement with studies conducted at Gondar University Hospital, Ethiopia [ 18 ], North Carolina, Colorado [ 21 ], Khartoum, Sudan [ 44 ], and by Asaye et al. [ 45 ], the current study found that participants who did not have free access to the healthcare system were at higher risk of developing PPD. The study results may be affected by the care given during the antenatal care (ANC) visits. This can be explained by the fact that PPD was four times higher among mothers who did not have ANC, during which counseling and anticipatory guidance are given that build maternal self-esteem and resiliency, along with knowledge about normal and problematic complications to discuss at care visits and about mothers’ right to mental and physical wellness, including access to care. Increased access to care (including postpartum visits) will increase the diagnosis of PPD and provide guidance, reassurance, and appropriate referrals. Healthcare professionals have the ability to both educate and empower mothers as they care for their babies, their families, and themselves [ 46 ].

Regarding nationality, for PPD mothers of Yemeni nationality compared to those of Egyptian nationality, the aOR is 0.318 (95% CI: 0.123–0.821, p  = 0.018). Similarly, for Syrian nationality in comparison to Egyptian nationality, the aOR is 0.111 (95% CI: 0.0139–0.887, p  = 0.038), and for Iraqi nationality compared to Egyptian nationality, the aOR is 0.241 (95% CI: 0.0920–0.633, p  = 0.004). These findings indicated that, while accounting for other covariates, individuals from the aforementioned nationalities were less predisposed to experiencing PPD than their Egyptian counterparts. These findings can be explained by the fact that, in Egypt, the younger age of marriage, especially in rural areas, poor mental health services, being illiterate, dropping out of school early, unemployment, and the stigma of psychiatric illnesses are cultural factors that hinder the diagnosis and treatment of PPD [ 40 ].
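The adjusted odds ratios above can be read as a percentage reduction in odds, and their reported p values can be cross-checked against the confidence intervals. The following is a minimal sketch, assuming the intervals are Wald intervals that are symmetric on the log-odds scale (the usual output of logistic regression software). The function name and the normal approximation are assumptions made here for illustration and are not taken from the study's analysis.

```python
import math


def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Back out the log-odds coefficient, its standard error, the Wald z
    statistic, and a two-sided p value from an odds ratio and its 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided p value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p_value


# aOR and 95% CI as reported in the text, each relative to Egyptian nationality.
reported = {
    "Yemeni": (0.318, 0.123, 0.821),
    "Syrian": (0.111, 0.0139, 0.887),
    "Iraqi": (0.241, 0.0920, 0.633),
}

results = {name: wald_from_or_ci(*values) for name, values in reported.items()}
for name, (beta, se, z, p_value) in results.items():
    reduction = 100 * (1 - reported[name][0])
    print(f"{name}: odds lower by {reduction:.1f}%, z = {z:.2f}, p = {p_value:.3f}")
```

Under these assumptions the recovered p values land close to the reported ones (0.018, 0.038, and 0.004), which suggests the intervals and p values are internally consistent. Note that an aOR describes odds, not risk, so "68% lower odds" is not the same as "68% lower probability".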

Obstetric, current pregnancy, and infant-related characteristics and their association or predictors of PPD (Tables  2 and 4 )

In the present study, the number of dead children was significantly associated with PPD. This finding is supported by studies conducted among Gujarati postpartum women [ 41 ] and in rural southern Ethiopia [ 43 ]. This might be because mothers who have lost children face different psychosocial problems and might fear complications developing during their pregnancy. Agarwala et al. [ 29 ] found that a history of previous abortions and having more than two children increased the risk of developing PPD due to a greater psychological burden. The inconsistencies in the findings of these studies indicate that the occurrence of postpartum depression is not solely determined by the number of childbirths.

Among the obstetric and current pregnancy characteristics, there was no significant difference regarding the baby’s age, number of miscarriages, type of last delivery, premature labour, healthy baby, baby admitted to the neonatal intensive care unit (NICU), or feeding difficulties. This agrees with Al Nasr et al. [ 42 ] but is inconsistent with Asaye et al. [ 45 ], who showed in multivariable logistic regression analysis that abortion history, birth weight, and gestational age were significantly associated with postpartum depression ( p < 0.05).

However, a close association was noted between the mode of delivery and the presence of PPD in mothers, with p = 0.107. A high tendency towards depression was seen in mothers who had delivered more than three times (44%). This disagrees with Adeyemo et al. [ 41 ], who reported that having more than five children ( p = 0.027), cesarean section delivery ( p = 0.002), and mothers’ poor state of health since delivery ( p < 0.001) were associated with an increased risk of PPD [ 47 ]. An increased risk of PPD with cesarean section as the mode of delivery was also observed (OR = 1.958, p = 0.049) in a study by Al Nasr et al. [ 42 ].

We found that breastfeeding mothers had a lower, non-significant frequency of PPD compared to non-breastfeeding mothers (36.6% vs. 45%). This agrees with Ahmed et al. [ 34 ], who showed that about 67.3% of women who depended on breastfeeding reported no PPD, while only 32.7% had PPD. It is inconsistent with Adeyemo et al. [ 41 ], who reported that non-exclusive breastfeeding ( p = 0.003) was associated with PPD, while Shao et al. [ 40 ] reported that mothers who were exclusively formula feeding had a higher prevalence of PPD.
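To relate the two percentages above (36.6% vs. 45%) to the odds ratios used elsewhere in this discussion, the following minimal sketch converts each proportion to odds and takes their ratio. It is a crude, unadjusted calculation made here for illustration only. The group sizes are not given in this passage, so no confidence interval or significance test is attempted, and the function names are hypothetical.

```python
def odds(proportion):
    """Convert a proportion (0 < proportion < 1) to odds."""
    if not 0 < proportion < 1:
        raise ValueError("proportion must be strictly between 0 and 1")
    return proportion / (1 - proportion)


def crude_odds_ratio(p_group, p_reference):
    """Unadjusted odds ratio of the group relative to the reference."""
    return odds(p_group) / odds(p_reference)


# PPD frequency quoted in the text: breastfeeding vs. non-breastfeeding mothers.
p_breastfeeding = 0.366
p_not_breastfeeding = 0.45

or_breastfeeding = crude_odds_ratio(p_breastfeeding, p_not_breastfeeding)
risk_ratio = p_breastfeeding / p_not_breastfeeding
print(f"crude OR = {or_breastfeeding:.2f}, risk ratio = {risk_ratio:.2f}")
```

The odds ratio (about 0.71) is further from 1 than the risk ratio (about 0.81). This is expected when the outcome is common, and it is a reason to avoid reading odds ratios as "times more likely".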

Regarding postnatal problems, our results revealed that postnatal problems display a significant association with PPD. In line with our results, Agarwala et al. [ 29 ] and Gebregziabher et al. [ 19 ] showed that mothers who experienced complications during childbirth, those who became ill after delivery, and those whose babies were unhealthy had a statistically significant higher proportion of PPD.

Hormone-related contraception methods were found to have a statistically significant association with PPD, consistent with the literature [ 46 ]; this can be explained by the hormones and neurotransmitters as biological factors that play significant roles in the onset of PPD. Estrogen hormones act as regulators of transcription from brain neurotransmitters and modulate the action of serotonin receptors. This hormone stimulates neurogenesis, the process of generating new neurons in the brain, and promotes the synthesis of neurotransmitters. In the hypothalamus, estrogen modulates neurotransmitters and governs sleep and temperature regulation. Variations in the levels of this hormone or its absence are linked to depression [ 19 ].

Participants whose last pregnancy was unplanned were 3.39 times more likely to have postpartum depression (aOR = 3.39, 95% CI: 1.24–9.28; p = 0.017). Mothers who experienced illness after delivery were more likely to develop PPD as compared to their counterparts (aOR = 7.42, 95% CI: 1.44–34.2; p = 0.016) [ 40 ]. In agreement with Asaye et al. [ 45 ] and Abdollahi et al. [ 36 ], unplanned pregnancy was associated with the development of PPD compared with planned pregnancy (aOR = 2.02, 95% CI: 1.24–3.31, and OR = 2.5, 95% CI: 1.69–3.7, respectively).

The psychosocial characteristics and their association with PPD

A family history of mental illness was significantly associated with PPD. This finding was in accordance with studies conducted in Istanbul, Turkey [ 47 ], and Bahrain [ 48 ]. Other studies also showed that women with PPD were most likely to have psychological symptoms during pregnancy [ 43 , 44 , 45 , 46 , 47 , 48 , 49 ]. A meta-analysis of 24,000 mothers concluded that having depression and anxiety during pregnancy and a previous history of psychiatric illness or a history of depression are strong risk factors for developing PPD [ 50 , 51 , 52 ]. Asaye et al. [ 45 ] reported that mothers whose relatives had a history of mental illness were more likely to be depressed than those whose relatives did not (aOR = 1.20, 95% CI: 1.09–3.05).

This can be attributed to the links between genetic predisposition and mood disorders, considering both nature and nurture are important to address PPD. PPD may be seen as a “normal” condition for those who are acquainted with relatives with mood disorders, especially during the CBP. A family history of mental illness can be easily elicited in the ANC first visit history and requires special attention during the postnatal period. There are various risk factors for PPD, including stressful life events, low social support, the infant’s gender preference, and low income [ 53 ].

Concerning familial support and possible PPD, a statistically significant association was found between them. We reported that mothers who did not have social support (a partner or the father of the baby) had higher odds (aOR = 5.8, 95% CI: 1.33–25.29; p = 0.019) of experiencing PPD. Furthermore, Al Nasr et al. [ 42 ] revealed a significant association between PPD and an unsupportive spouse ( P value = 0.023), while it was noted that 66.5% of women who received good familial support after giving birth had no depression, compared to only 33.5% who suffered from possible PPD [ 40 ]. Also, Adeyemo et al. [ 41 ] showed that some psychosocial factors were significantly associated with having PPD: having an unsupportive partner ( p < 0.001), experiencing intimate partner violence ( p < 0.001), and not getting help in taking care of their baby ( p < 0.001). Al Nasr et al. (2020) revealed that the predictor of PPD was an unsupportive spouse (OR = 4.53, P = 0.049) [ 48 ].

Regarding the perceived stigma, in agreement with our study, Bina (2020) found that shame, stigma, the fear of being labeled mentally ill, and language and communication barriers were significant factors in women’s decisions to seek treatment or accept help [ 53 ]. Other mothers were hesitant about mental health services [ 54 ]. It is noteworthy that some PPD mothers refused to seek treatment due to perceived insufficient time and the inconvenience of attending appointments [ 55 ].

PPD was significantly higher among mothers with financial problems or problems with their husbands. This agrees with Ahmed et al. [ 34 ], who showed a statistically significant association between stressful conditions and PPD, with a higher percentage of PPD among mothers who had a history of stressful conditions (59.3%) compared to those with no history of stressful conditions (40.7%). Furthermore, Al Nasr et al. (2020) revealed that stressful life events contributed significantly ( P value = 0.003) to the development of PPD in the sample population and were a predictor of PPD (OR = 2.677, p = 0.005) [ 42 ].

Coping strategies: causes of fearing and not seeking

Feeling at ease discussing mental health topics with one’s husband, family, community, and physician and experiencing cultural stigma or judgment regarding PPD within the community was significantly associated with the presence of PPD. In the current study, there were different reasons for not receiving the treatment, including cultural or traditional beliefs, language or communication barriers, social norms, and geographical or regional disparities in mental health resources. Haque and Malebranche [ 56 ] portrayed culture and the various conceptualizations of the maternal role as barriers to women seeking help and treatment.

In the present study, marital status, nationality, current pregnancy status, healthy baby, postnatal problems, receiving support or treatment for PPD, having awareness of symptoms and risk factors of PPD, suffering from any disease or mental or psychiatric disorder, comfort discussing mental health with family, and experiencing cultural stigma or judgment about PPD in the community were the significant predictors of PPD. In agreement with Ahmed et al. [ 34 ], the final logistic regression model contained seven predictors for PPD symptoms: SES, history of depression, history of PPD, history of stressful conditions, familial support, unwanted pregnancy, and male preference.

PPD has been recognized as a public health problem and may cause negative consequences for infants. It is estimated that 20 to 40% of women living in low-income countries experience depression during pregnancy or the postpartum period. The prevalence of PPD shows a wide variation, affecting 8–50% of postnatal mothers across countries [ 19 ].

Strengths and limitations

Strengths of our study include its multinational scope, which involved participants from six different countries, enhancing the generalizability of the findings. The study also boasted a large sample size of 674 participants, increasing the statistical power and reliability of the results. Standardized measures, such as the Edinburgh Postnatal Depression Scale (EPDS), were used for assessing postpartum depression, ensuring consistency and comparability across diverse settings. Additionally, the study explored a comprehensive range of predictors and associated factors of postpartum depression, including demographic, obstetric, health-related, and psychosocial characteristics. Rigorous analysis techniques, including multiple logistic regression analyses, were employed to identify significant predictors of postpartum depression, controlling for potential confounders and providing robust statistical evidence.

However, the study has several limitations that should be considered. Firstly, its cross-sectional design limits causal inference, as it does not allow for the determination of temporal relationships between variables. Secondly, the reliance on self-reported data, including information on postpartum depression symptoms and associated factors, may be subject to recall bias and social desirability bias. Thirdly, the use of convenience sampling methods may introduce selection bias and limit the generalizability of the findings to a broader population. Lastly, cultural differences in the perception and reporting of postpartum depression symptoms among participants from different countries could influence the results.

Moreover, the variation in sample size and response rates among countries can be attributed to several factors. (1) The methodology showed that the sample size was determined by considering several parameters, such as allocating proportionately to the mothers who gave birth and fulfilled the selection criteria during the data collection period served by each health center. (2) The political turmoil in Syria affects how often and how well people can use the Internet, and because the data were gathered using an online survey link, this led to a relatively low number of responses from those areas. (3) Language barrier in Ghana: we used the Arabic and English validated versions of the EPDS, but Ghana is a multilingual country with approximately eighty languages spoken. Although English is considered an official language, the primarily spoken languages in the southern region are Akan, specifically the Akuapem Twi, Asante Twi, and Fante dialects. In the northern region, the primarily spoken languages are the Mole-Dagbani ethnic languages, Dagaare and Dagbanli. Moreover, there are around seventy ethnic groups, each with its own unique language [ 57 ]. (4) Timing: data collection ended as the meteorological autumn season began on the 1st day of September; this was intended to avoid seasonal affective disorders, since autumn depressive symptoms may confound or affect our results. Furthermore, the sampling methods were not universal across all Arabic countries, potentially constraining the generalizability of our findings.

Recommendations

The antenatal programme should incorporate health education programmes about the symptoms of PPD.

Mass media awareness campaigns have a vital role in raising public awareness about PPD-related issues.

The ANC first visit history should elicit a family history of mental illness, enabling early detection of at-risk mothers.

For effective management of PPD, effective support (from husband, friends, and family) is an essential component.

The maternal (antenatal, natal, and postnatal) services should be provided for free and be of high quality.

It should be stressed that although numerous studies have been carried out on PPD, further investigation needs to be conducted on the global prevalence and incidence of depressive symptoms in pregnant women and related risk factors, especially in other populations.

Around 14% of the studied mothers had PPD; the frequency varied across countries, and half of the affected mothers did not know they had it. Our study identified significant associations and predictors of postpartum depression (PPD) among mothers. Marital status was significantly associated with PPD, with married mothers having lower odds of experiencing PPD compared to single mothers. Nationality also emerged as a significant predictor, with Yemeni, Syrian, and Iraqi mothers showing lower odds of PPD compared to Egyptian mothers. Significant obstetric, current pregnancy, and infant-related predictors included the pregnancy status, the health status of the baby, and the presence of postnatal problems. Among psychological and social predictors, receiving support or treatment for PPD, awareness of symptoms and risk factors, experiencing cultural stigma or judgment about PPD, and suffering from any disease or mental disorder were significantly associated with PPD. Additionally, mothers who were maybe more comfortable discussing mental health with family relatives had lower odds of experiencing PPD.

These findings underscore the importance of considering various demographic, obstetric, psychosocial, and coping factors in the identification and management of PPD among mothers. Targeted interventions addressing these predictors could potentially mitigate the risk of PPD and improve maternal mental health outcomes.

Data availability

The data are available on request from the corresponding author.

Abbreviations

aOR: Adjusted odds ratio

PPD: Postpartum depression

PHC: Primary health care centers

SES: Socioeconomic status

SPSS: Statistical Package for Social Science program

EPDS: The Edinburgh Postnatal Depression Scale

NICU: The Neonatal Intensive Care Unit

Sultan P, Ando K, Elkhateb R, George RB, Lim G, Carvalho B et al. (2022). Assessment of Patient-Reported Outcome Measures for Maternal Postpartum Depression Using the Consensus-Based Standards for the Selection of Health Measurement Instruments Guideline: A Systematic Review. JAMA Network Open; 1;5(6).

Crotty F, Sheehan J. Prevalence and detection of postnatal depression in an Irish community sample. Ir J Psychol Med. 2004;21:117–21.

Goodman SH, Brand SR. Parental psychopathology and its relation to child psychopathology. In: Hersen M, Gross AM, editors. Handbook of clinical psychology vol 2: children and adolescents. Hoboken, NJ: Wiley; 2008. pp. 937–65.

Wang Z, Liu J, Shuai H, Cai Z, Fu X, Liu Y, Xiao et al. (2021). Mapping global prevalence of depression among postpartum women. Transl Psychiatry. 2021;11(1):543. https://doi.org/10.1038/s41398-021-01663-6 . Erratum in: Transl Psychiatry; 20;11(1):640. PMID: 34671011; PMCID: PMC8528847. Last accessed Jan 2024.

American Psychiatric Association. (2013). Diagnostic and Statistical Manual of Mental Disorders. 5th edition. Arlington, VA: American Psychiatric Association. Last accessed October 2023.

Robertson E, Grace S, Wallington T, Stewart DE. Antenatal risk factors for postpartum depression: a synthesis of recent literature. Gen Hosp Psychiatry. 2004;26:289–95.

Gaynes BN, Gavin N, Meltzer-Brody S, Lohr KN, Swinson T, Gartlehner G, Brody S, Miller WC. Perinatal depression: prevalence, screening accuracy, and screening outcomes: Summary. AHRQ evidence report summaries; 2005. pp. 71–9.

O’hara MW, Swain AM. (1996). Rates and risk of postpartum depression: a meta-analysis. Int Rev Psychiatry. 1996; 8:37–54.

Goodman SH, Brand SR. (2008). Parental psychopathology and its relation to child psychopathology. In: Hersen M, Gross AM, editors. Handbook of clinical psychology Vol 2: Children and adolescents. Hoboken, NJ: John Wiley & Sons; 2008. pp. 937–65.

Cox JL, Holden JM, Sagovsky R. Detection of postnatal depression: development of the 10-item Edinburgh postnatal depression scale. Br J Psychiatry. 1987;150:782–6.

Martín-Gómez C, Moreno-Peral P, Bellón J, SC Cerón S, Campos-Paino H, Gómez-Gómez I, Rigabert A, Benítez I, Motrico E. Effectiveness of psychological, psychoeducational and psychosocial interventions to prevent postpartum depression in adolescent and adult mothers: study protocol for a systematic review and meta-analysis of randomised controlled trials. BMJ open. 2020;10(5):e034424. [accessed Mar 16 2024].

Sehairi Z. (2020). Validation Of The Arabic Version Of The Edinburgh Postnatal Depression Scale And Prevalence Of Postnatal Depression On An Algerian Sample. https://api.semanticscholar.org/CorpusID:216391386 . Accessed August 2023.

Hairol MI, Ahmad SA, Sharanjeet-Kaur S et al. (2021). Incidence and predictors of postpartum depression among postpartum mothers in Kuala Lumpur, Malaysia: a cross-sectional study. PLoS ONE, 16(11), e0259782.

Yusuff AS, Tang L, Binns CW, Lee AH. Prevalence and risk factors for postnatal depression in Sabah, Malaysia: a cohort study. Women Birth. 2015;1(1):25–9. pmid:25466643

Nakku JE, Nakasi G, Mirembe F. Postpartum major depression at six weeks in primary health care: prevalence and associated factors. Afr Health Sci. 2006;6(4):207–14. https://doi.org/10.5555/afhs.2006.6.4.207 . PMID: 17604509; PMCID: PMC1832062

Clavenna A, Seletti E, Cartabia M, Didoni A, Fortinguerra F, Sciascia T, et al. Postnatal depression screening in a paediatric primary care setting in Italy. BMC Psychiatry. 2017;17(1):42. pmid:28122520

Serhan N, Ege E, Ayrancı U, Kosgeroglu N. (2013). Prevalence of postpartum depression in mothers and fathers and its correlates. Journal of clinical nursing; 1;22(1–2):279–84. pmid:23216556

Deribachew H, Berhe D, Zaid T, et al. Assessment of prevalence and associated factors of postpartum depression among postpartum mothers in eastern zone of Tigray. Eur J Pharm Med Res. 2016;3(10):54–60.

Gebregziabher NK, Netsereab TB, Fessaha YG, et al. Prevalence and associated factors of postpartum depression among postpartum mothers in central region, Eritrea: a health facility based survey. BMC Public Health. 2020;20:1–10.

Grace J, Lee K, Ballard C, et al. The relationship between post-natal depression, somatization and behaviour in Malaysian women. Transcult Psychiatry. 2001;38(1):27–34.

Mahmud WMRW, Shariff S, Yaacob MJ. Postpartum depression: a survey of the incidence and associated risk factors among malay women in Beris Kubor Besar, Bachok, Kelantan. The Malaysian journal of medical sciences. Volume 9. MJMS; 2002. p. 41. 1.

Anna S. Postpartum depression and birth experience in Russia. Psychol Russia: State of the Art. 2021;14(1):28–38.

Yadav T, Shams R, Khan AF, Azam H, Anwar M et al. (2020). Postpartum Depression: Prevalence and Associated Risk Factors Among Women in Sindh, Pakistan. Cureus. 22;12(12):e12216. https://doi.org/10.7759/cureus.12216 . PMID: 33489623; PMCID: PMC7815271.

Abdullah M, Ijaz S, Asad S. (2024). Postpartum depression-an exploratory mixed method study for developing an indigenous tool. BMC Pregnancy Childbirth 24, 49 (2024). https://doi.org/10.1186/s12884-023-06192-2 .

Upadhyay RP, Chowdhury R, Salehi A, Sarkar K, Singh SK, Sinha B et al. (2022). Postpartum depression in India: a systematic review and meta-analysis. Bull World Health Organ [Internet]. 2017 October 10 [cited 2022 October 6];95(10):706. https://doi.org/10.2471/BLT.17.192237/ .

Khalifa DS, Glavin K, Bjertness E et al. (2016). Determinants of postnatal depression in Sudanese women at 3 months postpartum: a cross-sectional study. BMJ open, 6(3), e00944327).

Khadija Sharifzade BK, Padhi S, Manna et al. (2022). Prevalence and associated factors of postpartum depression among Afghan women: a phase-wise cross-sectional study in Rezaie maternal hospital in Herat province. Razi International Medical Journal; 2(2):59. https://doi.org/10.56101/rimj.v2i2.59 .

Azidah A, Shaiful B, Rusli N, et al. Postnatal depression and socio-cultural practices among postnatal mothers in Kota Bahru, Kelantan, Malaysia. Med J Malay. 2006;61(1):76–83.

Agarwala A, Rao PA, Narayanan P. Prevalence and predictors of postpartum depression among mothers in the rural areas of Udupi Taluk, Karnataka, India: a cross-sectional study. Clin Epidemiol Global Health. 2019;7(3):342–5.

Arikan I, Korkut Y, Demir BK et al. (2017). The prevalence of postpartum depression and associated factors: a hospital-based descriptive study.

Azimi-Lolaty HMD, Hosaini SH, Khalilian A, et al. Prevalence and predictors of postpartum depression among pregnant women referred to mother-child health care clinics (MCH). Res J Biol Sci. 2007;2:285–90.

Najafi KFA, Nazifi F, Sabrkonandeh S. Prevalence of postpartum depression in Alzahra Hospital in Rasht in 2004. Guilan Univ Med Sci J. 2006;15:97–105. (In Persian.).

Deng AW, Xiong RB, Jiang TT, Luo YP, Chen WZ. (2014). Prevalence and risk factors of postpartum depression in a population-based sample of women in Tangxia Community, Guangzhou. Asian Pacific journal of tropical medicine; 1;7(3):244–9. pmid:24507649

Ahmed GK, Elbeh K, Shams RM, et al. Prevalence and predictors of postpartum depression in Upper Egypt: a multicenter primary health care study. J Affect Disord. 2021;290:211–8.

Cantilino A, Zambaldi CF, Albuquerque T, et al. Postpartum depression in Recife–Brazil: prevalence and association with bio-socio-demographic factors. J Bras Psiquiatr. 2010;59:1–9.

Abdollahi F, Zarghami M, Azhar MZ, et al. Predictors and incidence of post-partum depression: a longitudinal cohort study. J Obstet Gynecol Res. 2014;40(12):2191–200.

McCoy SJB, Beal JM, et al. Risk factors for postpartum depression: a retrospective investigation at 4-weeks postnatal and a review of the literature. JAOA. 2006;106:193–8.

Sierra J. (2008). Risk Factors Related to Postpartum Depression in Low-Income Latina Mothers. Ann Arbor: ProQuest Information and Learning Company, 2008.

Çankaya S. The effect of psychosocial risk factors on postpartum depression in antenatal period: a prospective study. Arch Psychiatr Nurs. 2020;34(3):176–83.

Shao HH, Lee SC, Huang JP, et al. Prevalence of postpartum depression and associated predictors among Taiwanese women in a mother-child friendly hospital. Asia Pac J Public Health. 2021;33(4):411–7.

Adeyemo EO, Oluwole EO, Kanma-Okafor OJ, et al. Prevalence and predictors of postpartum depression among postnatal women in Lagos, Nigeria. Afr Health Sci. 2020;20(4):1943–54.

Al Nasr RS, Altharwi K, Derbah MS et al. (2020). Prevalence and predictors of postpartum depression in Riyadh, Saudi Arabia: a cross sectional study. PLoS ONE, 15(2), e0228666.

Desai ND, Mehta RY, Ganjiwale J. Study of prevalence and risk factors of postpartum depression. Natl J Med Res. 2012;2(02):194–8.

Azale T, Fekadu A, Medhin G, et al. Coping strategies of women with postpartum depression symptoms in rural Ethiopia: a cross-sectional community study. BMC Psychiatry. 2018;18(1):1–13.

Asaye MM, Muche HA, Zelalem ED. (2020). Prevalence and predictors of postpartum depression: Northwest Ethiopia. Psychiatry journal, 2020.

Ayele TA, Azale T, Alemu K et al. (2016). Prevalence and associated factors of antenatal depression among women attending antenatal care service at Gondar University Hospital, Northwest Ethiopia. PLoS ONE, 11(5), e0155125.

Saraswat N, Wal P, Pal RS et al. (2021). A detailed Biological Approach on Hormonal Imbalance Causing Depression in critical periods (Postpartum, Postmenopausal and Perimenopausal Depression) in adult women. Open Biology J, 9.

Guida J, Sundaram S, Leiferman J. Antenatal physical activity: investigating the effects on postpartum depression. Health. 2012;4:1276–86.

Robertson E, Grace S, Wallington T, et al. Antenatal risk factors for postpartum depression: a synthesis of recent literature. Gen Hosp Psychiatry. 2004;26:289–95.

Watanabe M, Wada K, Sakata Y, et al. Maternity blues as predictor of postpartum depression: a prospective cohort study among Japanese women. J Psychosom Obstet Gynecol. 2008;29:211–7.

Kirpinar İ, Gözüm S, Pasinlioğlu T. Prospective study of post-partum depression in eastern Turkey prevalence, socio-demographic and obstetric correlates, prenatal anxiety and early awareness. J Clin Nurs. 2009;19:422–31.

Zhao XH, Zhang ZH. Risk factors for postpartum depression: an evidence-based systematic review of systematic reviews and meta-analyses. Asian J Psychiatry. 2020;53:102353.

Bina R. Predictors of postpartum depression service use: a theory-informed, integrative systematic review. Women Birth. 2020;33(1):e24–32.

Jannati N, Farokhzadian J, Ahmadian L. The experience of healthcare professionals providing mental health services to mothers with postpartum depression: a qualitative study. Sultan Qaboos Univ Med J. 2021;21(4):554.

Dennis CL, Chung-Lee L. Postpartum depression help‐seeking barriers and maternal treatment preferences: a qualitative systematic review. Birth. 2006;33(4):323–31.

Haque S, Malebranche M. (2020). Impact of culture on refugee women’s conceptualization and experience of postpartum depression in high-income countries of resettlement: a scoping review. PLoS ONE, 15(9), e0238109.

https://www.statista.com/statistics/1285335/population-in-ghana-by-languages-spoken/ .

Acknowledgements

We would like to express our deep thanks to Rovan Hossam Abdulnabi Ali for her role in completing this study and her unlimited support. Special thanks to Dr. Mohamed Liaquat Raza for his role in reviewing the questionnaire. Moreover, we would like to thank all the mothers who participated in this study.

No funding for this project.

Author information

Authors and affiliations.

Department of Public Health and Community Medicine, Faculty of Medicine, Zagazig University, Zagazig, Egypt

Samar A. Amer

Department of Family Medicine, Faculty of Medicine, Zagazig University, Zagazig, Egypt

Nahla A. Zaitoun

Department of Psychiatry, Faculty of Medicine, Zagazig University, Zagazig, Egypt

Heba A. Abdelsalam

Faculty of Medicine, Al-Azhar University, Damietta, Egypt

Abdallah Abbas

Department of Obstetrics and Gynecology, Faculty of Medicine, Zagazig University, Zagazig, Egypt

Mohamed Sh Ramadan

Hammurabi Medical College, University of Babylon, Al-Diwaniyah, Iraq

Hassan M. Ayal

Hardamout University College of Medicine, Almukalla, Yemen

Samaher Edhah Ahmed Ba-Gais

Department of General Medicine, Shadan Institute of Medical Science, Hyderabad, India

Nawal Mahboob Basha

College of Medicine, Sulaiman Alrajhi University, Albukayriah, Al-Qassim, Saudi Arabia

Abdulrahman Allahham

Department of Virology, Noguchi Memorial Institute for Medical Research, University of Ghana Legon, Accra, Ghana

Emmanuael Boateng Agyenim

Department of Public Health and Community Medicine, Faculty of Medicine, Beni-Suef University, Beni-Suef, Egypt

Walid Amin Al-Shroby

Contributions

Conceptualization: Samar A. Amer (SA); Methodology: SA, Nahla A. Zaitoun (NZ); Validation: Mohamed Ramadan Ali Shaaban (MR), Hassan Majid Abdulameer Aya (HM), Samaher Edhah Ahmed Ba-Gais (SG), Nawal Mahboob Basha (NB), Abdulrahman Allahham (AbAl), Emmanuael Boateng Agyenim (EB); Formal analysis: Abdallah Abbas (AA); Data curation: MR, HM, SG, NB, AbAl, NZ, and EB; Writing original draft preparation: SA, Heba Ahmed Abdelsalam (HAA), and NZ; Writing review and editing: MR, AA, Walid Amin Elshrowby (WE); Visualization: SA, AA; Supervision: SA; Project administration: AA. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Samar A. Amer .

Ethics declarations

Ethical approval and consent to participate.

All participants were provided with electronic informed consent after receiving clear explanations regarding the study’s objectives, data confidentiality, voluntary participation, and the right to withdraw. The questionnaire did not contain any sensitive questions, and data collection was performed anonymously. We affirm that all relevant ethical guidelines have been adhered to, and any necessary approvals from the ethics committee have been obtained. Approval was received from the ethical committee of the family medicine department, the faculty of medicine at Zagazig University, and from the patients included in the study. IRP#ZU-IRP#11079-8/10-2023.

Practicing ethical decision-making is crucial for providing clinical treatment. Such decisions are frequently made challenging due to a lack of knowledge and the mother’s ability to handle the associated complexities and uncertainties that affect the patient’s current level of functioning and ability to take care of her child. At the end of the survey, we raised concerns regarding the red flags, such as suicidal thoughts, and called for a revisit for the psychiatrist’s evaluation of the discussion of the risks, benefits, and alternatives to using medication.

Consent for publication

All authors have read and agreed to the published version of the manuscript.

Previous publication

We declare that this research paper has not been published elsewhere in any other academic journal or platform.

Generative AI in scientific writing

We declare that we have not used AI in writing any part of this manuscript.

Conflict of interest

No conflict of interest.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Amer, S.A., Zaitoun, N.A., Abdelsalam, H.A. et al. Exploring predictors and prevalence of postpartum depression among mothers: Multinational study. BMC Public Health 24 , 1308 (2024). https://doi.org/10.1186/s12889-024-18502-0

Received : 06 February 2024

Accepted : 02 April 2024

Published : 14 May 2024

DOI : https://doi.org/10.1186/s12889-024-18502-0

  • The Edinburgh postnatal depression scale (EPDS)
  • Determinants
  • Psychosocial

BMC Public Health

ISSN: 1471-2458

  • Open access
  • Published: 08 May 2024

CD163 + macrophages in the triple-negative breast tumor microenvironment are associated with improved survival in the Women’s Circle of Health Study and the Women’s Circle of Health Follow-Up Study

  • Angela R. Omilian 1 ,
  • Rikki Cannioto 1 ,
  • Lucas Mendicino 1 ,
  • Leighton Stein 2 ,
  • Wiam Bshara 2 ,
  • Bo Qin 3 , 4 ,
  • Elisa V. Bandera 3 , 4 ,
  • Nur Zeinomar 3 , 4 ,
  • Scott I. Abrams 5 ,
  • Chi-Chen Hong 1 ,
  • Song Yao 1 ,
  • Thaer Khoury 2 &
  • Christine B. Ambrosone 1  

Breast Cancer Research volume  26 , Article number:  75 ( 2024 ) Cite this article

Tumor-associated macrophages (TAMs) are a prominent immune subpopulation in the tumor microenvironment that could potentially serve as therapeutic targets for breast cancer. Thus, it is important to characterize this cell population across different tumor subtypes including patterns of association with demographic and prognostic factors, and breast cancer outcomes.

We investigated CD163 + macrophages in relation to clinicopathologic variables and breast cancer outcomes in the Women’s Circle of Health Study and Women’s Circle of Health Follow-up Study populations of predominantly Black women with breast cancer. We evaluated 611 invasive breast tumor samples (507 from Black women, 104 from White women) with immunohistochemical staining of tissue microarray slides followed by digital image analysis. Multivariable Cox proportional hazards models were used to estimate hazard ratios for overall survival (OS) and breast cancer-specific survival (BCSS) for 546 cases with available survival data (median follow-up time 9.68 years; IQR: 7.43–12.33).

Women with triple-negative breast cancer showed significantly improved OS in relation to increased levels of tumor-infiltrating CD163 + macrophages in age-adjusted (Q3 vs. Q1: HR = 0.36; 95% CI 0.16–0.83) and fully adjusted models (Q3 vs. Q1: HR = 0.30; 95% CI 0.12–0.73). A similar, but non-statistically significant, association was observed for BCSS. Macrophage infiltration in luminal and HER2+ tumors was not associated with OS or BCSS. In a multivariate regression model that adjusted for age, subtype, grade, and tumor size, there was no significant difference in CD163 + macrophage density between Black and White women (RR = 0.88; 95% CI 0.71–1.10).

Conclusions

In contrast to previous studies, we observed that higher densities of CD163 + macrophages are independently associated with improved OS and BCSS in women with invasive triple-negative breast cancer.

Trial registration

Not applicable.

The tumor-immune microenvironment (TIME) has a key role in pathologic complete response and patient survival in breast cancer [ 1 , 2 , 3 , 4 ]. While tumor-infiltrating lymphocytes (TILs) in aggregate and various T cell subpopulations have been routinely examined, tumor-associated macrophages (TAMs) and other cells of the myeloid lineage have received less attention, despite being a prevalent immune subpopulation in breast carcinoma. Typically, high macrophage counts in breast tumors are regarded as being associated with tumor progression and poorer survival [ 5 , 6 , 7 , 8 ]. However, much prior work on macrophage markers in relation to breast cancer outcomes had small study samples that precluded analyses stratified by subtype, or adequately powered analyses adjusted for prognostic factors that are known to influence breast cancer survival. Moreover, most of these earlier studies were overwhelmingly conducted in populations of White or Asian women, and representation of Black women on this topic is poor, with only a handful of studies to date [ 9 , 10 , 11 ].

Novel therapeutic approaches that target macrophages are an increasingly important area of clinical study, and thus it is important to understand how specific macrophage markers vary in accordance with demographic and clinical factors [ 12 , 13 ]. As part of our ongoing work that investigates the breast TIME in relation to aggressive disease and poorer outcomes in Black women, we investigated the macrophage marker CD163 among women participating in the Women’s Circle of Health Study and Women’s Circle of Health Follow-up Study. Our objective was to compare macrophage infiltration between Black and White women and to investigate the association of CD163 + cells with overall and breast cancer-specific survival in a study sample that was large enough to allow stratification by subtype and adjustment for known prognostic factors in breast cancer.

Study population

We used data and tissue samples from women participating in the Women’s Circle of Health Study (WCHS), a multi-site, case–control study designed to evaluate the risk factors for aggressive breast cancer in Black and White women, and the Women’s Circle of Health Follow-up Study (WCHFS), a population-based cohort study of Black breast cancer survivors, both of which have been described extensively in our previous work and in the Additional file 1 : Methods [ 14 , 15 , 16 , 17 ]. The WCHS and WCHFS used the same methods for recruitment, interviews, and eligibility. Briefly, participants were 20–75 years old; self-identified as Black or White (for WCHS); had primary, histologically confirmed invasive breast cancer or ductal carcinoma in situ (DCIS); and had no previous history of cancer other than non-melanoma skin cancer. Women in WCHS were diagnosed between 2001 and 2013 and included Black and White cases from New York City and New Jersey; while cases in WCHFS included only Black women diagnosed from 2013 to 2019 in New Jersey. Clinical and tumor pathology variables were extracted from the pathology reports. All women provided informed consent and the study protocol was approved by the Institutional Review Boards at Rutgers Cancer Institute of New Jersey and Roswell Park Comprehensive Cancer Center.

Tissue samples

Formalin-fixed and paraffin-embedded (FFPE) invasive breast tumor tissues were built into tissue microarrays (TMAs) under the guidance of an experienced breast pathologist (TK). TMA cores ranged in size from 0.6 to 1.2 mm in diameter, and the majority of patient tumors (67.2%) were represented by at least 3 TMA cores (range 1–6 cores). We aimed to include both tumor nests and stromal regions when selecting regions for coring and avoid the tumor margins. TMA construction was completed in 2017 from patients recruited up until this point with incident, primary, and treatment-naïve invasive breast cancer. As the WCHS and WCHFS focused on recruiting Black women, the number of cases from Black women in our dataset exceeds the number of White cases (Black: N = 507, White: N = 104).

Immunohistochemical staining and image analysis

CD163 has long been established as a clinical antibody for detecting histiocytes that has greater specificity than CD68 [ 18 ], and is commonly used to represent immunosuppressive macrophages in the TIME in research studies [ 19 ]. Immunohistochemistry (IHC) was performed by the Pathology Network Shared Resource at Roswell Park following standard procedures. To reduce staining variability that can occur with IHC, we used an automated staining platform, clinical-grade reagents, and stained all TMAs in a single batch. Briefly, TMA sections were cut at 4 μm, placed on charged slides, dried, and deparaffinized. Bond Epitope Retrieval 2 (Leica AR9640) was used for antigen retrieval. Slides were stained on the Leica Bond Rx autostainer with the CD163 antibody (Leica Biosystems, clone 10D6) and the Bond Polymer Refine Detection kit (Leica DS9800). Diaminobenzidine (DAB) was used for marker visualization. TMA cores were excluded if the tumor could not be reliably scored for CD163 marker expression (e.g., the tissue was folded or damaged) or there was insufficient tumor cellularity (cutoff set at 100 tumor cells).

Slides were digitally scanned using Aperio AT2 (Leica Biosystems, Inc., Buffalo Grove, IL) with 20X bright-field microscopy. Aperio ImageScope version 12.4.3.8007 (Leica Biosystems, Inc., Buffalo Grove, IL) was used for image analysis. Slide image data fields were populated, and images were visually examined for quality and amended as necessary (e.g., core excluded if there was excessive folding or damage). An annotation layer was created for each core, and our study pathologist, who was blinded to sample characteristics, made an image analysis algorithm macro that was used to quantify the number of cells that were positive for CD163 stain. Details pertaining to the algorithm and scoring are described in the Additional file 1 : Methods. The number of CD163 + cells in each patient sample was reported per square millimeter of tumor tissue, and the average CD163 + cell density across multiple cores from each patient was used for analyses.
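The per-patient density described above can be sketched in a few lines of Python. This is only an illustration, not the authors' ImageScope macro: the function names, the core areas, and the choice to average per-core densities (rather than pooling counts and areas) are assumptions made for the example.

```python
from statistics import mean


def core_density(positive_cells: int, tumor_area_mm2: float) -> float:
    """CD163+ cells per square millimeter of tumor tissue for one TMA core."""
    if tumor_area_mm2 <= 0:
        raise ValueError("tumor area must be positive")
    return positive_cells / tumor_area_mm2


def patient_density(cores: list[tuple[int, float]]) -> float:
    """Average the per-core densities of one patient (assumed averaging rule)."""
    return mean(core_density(count, area) for count, area in cores)


# Hypothetical patient with three cores: (positive cell count, tumor area in mm^2)
example_density = patient_density([(120, 0.5), (90, 0.3), (200, 0.4)])
print(round(example_density, 1))
```

Averaging per-core densities gives each core equal weight regardless of its area; pooling counts and areas would instead weight larger cores more heavily.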

Epidemiological and tumor variables

Women self-identified their race in the baseline questionnaire. Tumor and clinicopathological factors were abstracted from the patient pathology report and included AJCC stage, grade, tumor size, node status, and treatment (surgery, chemotherapy, radiation therapy, and/or hormone therapy). Breast cancer subtypes were inferred from estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) status information from the pathology reports as follows: luminal (HR+/HER2−), HER2-positive (HR+/HER2+ or HR-/HER2+), and triple-negative (HR-/HER2−), where hormone receptor (HR+) refers to ER+ and/or PR+. Other factors, including age and body mass index (BMI), were obtained by interviewer- and self-administered questionnaires at baseline and have been previously described [ 20 ].
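The subtype rule stated above maps directly onto a small function. The sketch below is illustrative; the function name and the boolean inputs are assumptions, and it does not handle missing or equivocal receptor results, which the pathology reports may contain.

```python
def breast_cancer_subtype(er_positive: bool, pr_positive: bool, her2_positive: bool) -> str:
    """Infer subtype from receptor status using the rule described in the text."""
    # Hormone receptor positive (HR+) means ER+ and/or PR+.
    hr_positive = er_positive or pr_positive
    if her2_positive:
        # HR+/HER2+ and HR-/HER2+ are both grouped as HER2-positive.
        return "HER2-positive"
    if hr_positive:
        return "luminal"  # HR+/HER2-
    return "triple-negative"  # HR-/HER2-


print(breast_cancer_subtype(er_positive=False, pr_positive=True, her2_positive=False))
```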

Breast cancer outcomes

Data on vital status, including dates and causes of death, were ascertained through linkage with the New Jersey State Cancer Registry files, and were available for 546 cases. Primary outcomes of interest in the study were overall survival (OS) and breast cancer-specific survival (BCSS). The ICD-10 code (C50) was used to identify breast cancer mortality. Time to follow-up was calculated from the date of diagnosis until the date of last follow-up (August 2023) or death from any cause or death from breast cancer.
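As a rough illustration of how follow-up time and the two event indicators could be derived from these definitions, consider the sketch below. The function name, the use of 31 August 2023 as the exact censoring date, and the 365.25-day year are assumptions for the example; the registry linkage itself is not reproduced.

```python
from datetime import date
from typing import Optional

# Assumption: "August 2023" is taken as the last day of that month.
LAST_FOLLOW_UP = date(2023, 8, 31)


def survival_record(
    diagnosis: date,
    death: Optional[date],
    death_icd10: Optional[str] = None,
    last_follow_up: date = LAST_FOLLOW_UP,
) -> tuple[float, bool, bool]:
    """Return (follow-up years, OS event, BCSS event) for one case."""
    died = death is not None and death <= last_follow_up
    end = death if died else last_follow_up
    years = (end - diagnosis).days / 365.25
    os_event = died
    # Breast cancer death is identified by ICD-10 code C50; other deaths
    # count as censored observations in the BCSS analysis.
    bcss_event = died and death_icd10 is not None and death_icd10.startswith("C50")
    return years, os_event, bcss_event


print(survival_record(date(2010, 1, 1), date(2015, 1, 1), "C50.9"))
```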

Statistical analyses

Demographic and clinical factors were summarized using the mean and standard deviation for normally distributed continuous variables and the median and interquartile range (IQR) otherwise, and number and percentage for categorical variables. A negative binomial regression model was used to resolve overdispersion of CD163 + cell density and non-normally distributed residuals seen with a linear model. A zero-inflation parameter was included due to underfitting of zero values and an offset term for the log of total cell density to account for tumor cellularity differences across patients. Model assumptions were verified graphically. Beta coefficients were exponentiated to obtain Rate Ratios (RR) and 95% Confidence Intervals (CI) representing the change in CD163 + cell density in terms of percentage increase or decrease. Separate models were used to model CD163 + cell density as a function of race and clinical/tumor factors. F tests about the appropriate contrasts of model estimates were used to evaluate, within race, the association between CD163 + cell density and each factor. A multivariable model was formulated to assess the association between race and CD163 + macrophage density, adjusted for age, subtype, grade, and tumor size.
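To make the exponentiation step concrete, the sketch below converts a log-scale coefficient and its standard error into a rate ratio with a Wald 95% confidence interval and a percentage change. The coefficient and standard error are hypothetical values chosen for illustration, not estimates from the study, and the model fitting itself (the authors used R) is not shown.

```python
import math


def rate_ratio(beta: float, se: float, z: float = 1.959964) -> tuple[float, float, float]:
    """Exponentiate a log-scale coefficient into (RR, lower 95% CI, upper 95% CI)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def percent_change(rr: float) -> float:
    """Express a rate ratio as a percentage increase (+) or decrease (-)."""
    return (rr - 1.0) * 100.0


# Hypothetical coefficient: beta = log(0.88) with an assumed standard error of 0.11
rr, lower, upper = rate_ratio(math.log(0.88), 0.11)
print(f"RR = {rr:.2f} (95% CI {lower:.2f}-{upper:.2f}), change = {percent_change(rr):.0f}%")
```

An interval that includes 1 on the ratio scale corresponds to a coefficient whose interval includes 0 on the log scale, which is why such a result is read as no significant difference.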

Multivariable Cox regression models were used to compute hazard ratios (HRs) and 95% confidence intervals (CIs) for the association of CD163 + cell density with OS and BCSS for each breast cancer subtype. As there are currently no established cutoffs in the literature, CD163 + cell density was divided into tertiles. Other cutoffs were examined, including dividing CD163 + cell density at the median, and by quartiles and quintiles. Variables that were significantly associated with CD163 + cell density or survival in the univariate setting were added to a multivariable model and sequentially removed while assessing model fit using a likelihood ratio test (Additional file 1 : Tables S1 and S2). Covariates were retained in the final model if their inclusion improved model fit. Model covariates differed by breast cancer subtype. Model 1 was adjusted for age at diagnosis. For OS, Model 2 was adjusted for age, BMI, stage, and tumor size in the luminal subtype; age plus tumor size for the HER2+ subtype; and age, stage, grade, and node status for the triple-negative subtype. For BCSS, Model 2 was adjusted for age and BMI in the luminal subtype; no additional covariates were retained for the HER2+ subtype; and age and stage for the triple-negative subtype. The proportional hazards assumption was verified graphically by analyzing the correlation between time and scaled Schoenfeld residuals. All statistical analyses were conducted in R (version 4.2.0) and two-sided p values ≤ 0.05 were considered statistically significant. Analyses are reported according to REMARK guidelines [ 21 ].
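The tertile split used as the exposure variable can be sketched as follows. This is an illustrative Python version with an assumed function name and made-up densities; the quantile interpolation method is also an assumption, since the text does not specify how the cut points were computed.

```python
from statistics import quantiles


def tertile_labels(densities: list[float]) -> list[int]:
    """Label each density 1, 2, or 3 according to the cohort tertile it falls in."""
    # n=3 returns the two cut points that split the data into three groups.
    cut1, cut2 = quantiles(densities, n=3, method="inclusive")
    labels = []
    for value in densities:
        if value <= cut1:
            labels.append(1)
        elif value <= cut2:
            labels.append(2)
        else:
            labels.append(3)
    return labels


# Hypothetical CD163+ densities (cells per square millimeter)
print(tertile_labels([10, 20, 30, 40, 50, 60, 70, 80, 90]))
```

The resulting labels would then enter the Cox model as a categorical covariate, with the lowest tertile as the reference group.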

Characteristics of the cohort

Cohort characteristics are shown in Table 1 and the study sampling schema is shown in Additional file 1 : Figure S1. In total, there were 611 women with invasive breast cancer (507 Black and 104 White); of these, 546 women had available survival data. Compared with White women, Black women had a significantly higher BMI (30.5 vs 26.6 kg/m²) and were significantly more likely to have tumors that were ER-negative (33.5% vs 21.2%, p = 0.01), triple-negative (25.5% vs 14.4%, p = 0.04), and grade 3 (54.1% vs 35.6%, p = 0.003). Black women were also more likely than White women to have received radiation therapy (68.5% vs 46.5%, p < 0.001). There were no statistically significant differences between Black and White women in age, the distribution of breast cancer stage, mean tumor size, node status, or the receipt of surgery, chemotherapy, or hormone therapy.

Macrophage densities, race, and clinical prognostic factors

Staining is shown for cores representative of low, intermediate, and high levels of CD163 + macrophage infiltration in Fig. 1. Almost all women in the WCHS had macrophages in their tumors; CD163 + macrophages were not detected in only 6 out of 611 women. In univariate analyses, Black women had a significantly higher density of CD163 + cells ( p = 0.0099, Fig. 2a). CD163 + macrophage densities were also higher in triple-negative tumors ( p < 0.0001, Fig. 2b) and higher-grade tumors ( p < 0.0001, Fig. 2c). Black women with the triple-negative subtype (median 574.3 cells/mm², p < 0.001), Black women with the HER2+ subtype (314.6 cells/mm², p < 0.001), and White women with the HER2+ subtype (281.5 cells/mm², p = 0.035) had significantly higher densities of CD163 + macrophages compared to White women with the luminal subtype (Fig. 2d). In the overall study population, CD163 + macrophage density was significantly associated with age ( p = 0.025), breast cancer subtype ( p < 0.001), stage ( p < 0.001), grade ( p < 0.001), and tumor size ( p = 0.002); similar associations were observed when the Black population was examined separately (Table 2). In a multivariate negative binomial regression model that adjusted for age, subtype, grade, and tumor size, there were no significant differences in CD163 + macrophage densities between Black and White women (RR = 0.88; 95% CI 0.71–1.10). To investigate a possible cohort effect given that recruitment for White women ended earlier than that for Black women, we compared Black and White cases up until the last timepoint that White women were enrolled and observed similar results (RR = 0.88; 95% CI 0.67–1.16).

Fig. 1 Representative CD163 immunohistochemical staining in breast tissue microarray cores. Two representative cores are shown from each of three categories of infiltration: a low, b intermediate, c high

Fig. 2 Boxplots comparing CD163 + cell density by a race, b breast cancer subtype, c tumor grade, and d combination of race and breast cancer subtype. Comparisons tested using negative binomial regression. ns non-significant, * p < 0.05, ** p < 0.01, *** p < 0.001, **** p < 0.0001

Survival outcomes and CD163 + macrophages

Data for survival analyses were available for 546 women, with 127 deaths, 66 of which were due to breast cancer. The median follow-up time was 9.68 years (IQR: 7.43–12.33). For the overall cohort, increasing tertiles of CD163 + macrophage density were not associated with a significant improvement in OS or BCSS in the age-adjusted models (Table 3). For the fully adjusted models, there was a significant association for OS (Q3 vs. Q1: HR = 0.59; 95% CI 0.37–0.94), but not BCSS (Q3 vs. Q1: HR = 0.59; 95% CI 0.30–1.14; Table 3). In both age-adjusted and fully adjusted models stratified by subtype, increasing tertiles of CD163 + macrophage density were associated with a significant improvement in OS (Q3 vs. Q1: HR = 0.30; 95% CI 0.12–0.73; Table 4) in the triple-negative subtype. A statistically significant association between CD163 + macrophage densities and OS was not observed for the luminal and HER2+ subtypes. A similar pattern was observed for BCSS, in which increasing CD163 + macrophage densities were associated with better survival in the triple-negative subtype only (Q3 vs. Q1: HR = 0.38; 95% CI 0.10–1.44), although the associations were not significant (Table 4).

To ensure that race and grade were not confounding the associations that we observed in the triple-negative subtype, additional multivariable analyses that added race and grade as variables in the fully adjusted models were investigated. Again, we observed that increasing CD163 + macrophage density was associated with a significant improvement in OS for the triple-negative subtype (Q3 vs. Q1: HR = 0.28; 95% CI 0.11–0.69), but not for the luminal or HER2+ subtypes (Additional file 1 : Table S3). Several additional sensitivity analyses were performed to ensure our results were robust. Additional cut points of CD163 marker density were examined, such as dividing at the cohort median to differentiate high vs low CD163 density, as well as quartiles and quintiles (Additional file 1 : Tables S4 and S5). We stratified by ER status rather than breast cancer subtype (Additional file 1 : Table S6). Lastly, we performed the analysis in Black patients only (Additional file 1 : Table S7). For all these additional analyses, we observed that increasing levels of CD163 + macrophage infiltration were associated with improved OS in the triple-negative subtype (or ER-negative group for analyses stratified by ER status), and this effect was not observed for the luminal or HER2+ subtypes.

In this study, we found that increasing densities of CD163 + macrophages in the breast TIME were associated with a pronounced and significant improvement in OS for women with the triple-negative subtype. Prior studies investigating the association between TAMs and breast cancer prognosis have contributed to a general consensus that high levels of TAMs in the breast TIME, especially M2-like macrophages, are associated with adverse survival outcomes [ 5 , 6 , 7 , 9 , 22 ]. So, what might explain the differing results in our study? First, we have a relatively large population of Black women allowing us to stratify by subtype and adjust for confounding factors. As subtypes of breast cancer differ in their patterns of short and long-term survival, stratification by subtype can reveal different associations in relation to prognostic or risk factors [ 23 , 24 , 25 ]. This holds true for patterns of immune infiltration in the breast TIME that are known to vary by subtype and show differing associations with survival [ 1 , 26 , 27 ]. The majority of prior studies that examined TAM infiltration in breast carcinoma were underpowered for subtype-specific associations, especially for the triple-negative subtype, in which sample sizes were extremely small [ 5 , 6 , 9 , 11 ].

Second, macrophages are a complex immune cell population with a variety of phenotypes and functional states that can be tissue specific and dependent on microenvironmental cues and/or spatial proximity to other immune subsets [ 28 , 29 , 30 ]. Moreover, there are no standardized methods for macrophage detection, and different studies have used different markers (e.g., CD68, CD163, or CD206) and staining platforms to draw conclusions about the prognostic value of macrophages in invasive breast cancer. Methods for quantifying macrophages in the breast TIME are also heterogeneous (e.g., density, percentage), as is the tissue compartment in which macrophages are assessed (e.g., tumor compartment vs. stroma or both). The cutoff values for what constitutes high versus low macrophage infiltration also vary by study, as do the factors included in multivariable models.

We conducted several quality controls and performed several sensitivity analyses to ensure that our findings were robust. First, we used a clinical-grade CD163 antibody that is approved for in vitro diagnostic purposes. Second, quality control for staining specificity was performed by an experienced breast pathologist. Third, automated image analysis was performed, ensuring that the quantification of CD163-positive cells was standardized and objective across each TMA core. Fourth, all TMAs were stained in a single batch to eliminate inter-batch variability that is known to occur with IHC. From an analysis standpoint, we examined different cutoffs for what constitutes high or low CD163 + macrophage infiltration, dividing the cohort at the median, tertiles, quartiles, and quintiles. We examined associations when stratifying by ER status instead of subtype. Lastly, we examined Black women separately. The same general patterns of improved OS and BCSS in the triple-negative subtype (or ER- group) were observed across all these additional analyses.

As shown in our results and in the literature, high macrophage infiltration in breast cancer is correlated with several factors that indicate poor survival, like the triple-negative subtype, and higher grade and stage [ 5 , 6 , 7 , 8 ]. In prior studies that could not account for these factors, the associations of high macrophage densities with poor survival may have been largely driven by these correlated factors. A recent study that investigated multiple macrophage markers in relation to breast cancer outcomes showed that when examining the ER-positive versus ER-negative groups separately, high expression of CD163 was associated with improved OS in ER− cases, but not in ER+ cancers [ 31 ]. When examining CD163 expression by tumor locations, Fortis et al. found that disease-free survival (DFS) and OS were prolonged in patients with CD163 expression that was low in the tumor center but high at the invasive margins compared to the inverse (i.e., high in tumor center and low in the invasive margin) [ 32 ]. Collectively, these findings together with those reported in our study add to the existing body of evidence suggesting that tumor-associated macrophages have distinct programs that vary by tissue context or breast cancer subtype. While CD163 + macrophages are usually regarded as immune-suppressing and tumor-promoting, human macrophages are likely to concurrently exhibit phenotypic characteristics of both M1-like and M2-like subtypes. Therefore, to gain a broader appreciation of the macrophage response in breast cancer outcomes, phenotypic studies combined with comprehensive functional and transcriptomic analyses may strengthen translational relevance to prognosis.

Univariate analyses showed that CD163 + cell densities differed between Black and White women, but these differences were attenuated in the multivariable analyses that adjusted for age, grade, tumor size, and breast cancer subtype. Earlier work has shown that immune profiles vary in breast tumors from Black and White women [ 14 , 15 , 33 , 34 ]. While other studies have compared macrophage markers in Black and White women, to our knowledge, only a couple of studies have compared CD163 marker expression specifically [ 9 , 10 ]. Koru-Sengul et al. reported that Black women had higher levels of CD163 + macrophages; however, multivariable analyses were not performed [ 11 ]. In a more recent study, Bauer et al. found that the frequency of CD163 + macrophages varied by region within African populations and a population from Germany; West African women had the highest numbers of CD163 + macrophages [ 35 ].

The strengths of this work are accompanied by some limitations. While our study sample exceeds that of several prior studies of CD163 in relation to breast cancer prognosis, it is nonetheless not as large as studies of more well-characterized T cell populations such as CD8 + T cells [ 4 ], and our findings need to be replicated in additional cohorts. As the WCHS and WCHFS prioritized recruitment of Black women, our findings may not be generalizable to more demographically or clinically diverse populations. As the vast majority (89.5%) of our cases were obtained through the New Jersey State Cancer Registry, our sample is largely population-based. Nonetheless, potential sources of bias include differences between women who agreed to participate versus those who did not. However, the distributions of tumor stage and grade are similar among participants in the WCHFS and all eligible breast cancer cases in the New Jersey State Cancer Registry in the same counties, suggesting that tumor characteristics in our study are representative of Black women diagnosed with breast cancer in New Jersey [ 16 ]. Recall bias is minimized as the data pertaining to the tumor characteristics were obtained by independent review of pathology reports. Despite adjusting for important clinical and demographic prognostic factors, we cannot rule out the possibility of residual confounding due to unmeasured variables. Lastly, although whole sections are ideal for studies of the TIME, a study of this size is not practicable with whole sections, and therefore TMAs are commonly used in large studies of marker expression in breast cancer [ 4 , 36 , 37 ]. Importantly, we cored the interior of the tumor block for TMA construction and thus our results are specific to this region and do not apply to the tumor interface or other non-tumor regions.

Macrophages are a complex population and our future work will build on this fundamental finding, making use of multiplexed panels to more fully define macrophage phenotypes in women with invasive breast cancer, as well as their spatial distribution, which could further influence their prognostic relevance [ 32 ].

We observed that higher densities of CD163 + macrophages are independently associated with improved OS and BCSS in the triple-negative subtype. Future investigations will expand upon this work in a larger cohort, incorporating more comprehensive multiplexed staining technologies to further define the complexity of macrophage functional states and compare their localization within the TIME to prognosis in women with invasive breast cancer.

Availability of data and materials

Epidemiological data and image data are available from the corresponding author upon reasonable request.

Denkert C, von Minckwitz G, Darb-Esfahani S, et al. Tumour-infiltrating lymphocytes and prognosis in different subtypes of breast cancer: a pooled analysis of 3771 patients treated with neoadjuvant therapy. Lancet Oncol. 2018;19:40–50.


Fridman WH, Zitvogel L, Sautes-Fridman C, Kroemer G. The immune contexture in cancer prognosis and treatment. Nat Rev Clin Oncol. 2017;14:717–34.


Savas P, Salgado R, Denkert C, et al. Clinical relevance of host immunity in breast cancer: from TILs to the clinic. Nat Rev Clin Oncol. 2016;13:228–41.

Ali HR, Provenzano E, Dawson SJ, et al. Association between CD8+ T-cell infiltration and breast cancer survival in 12,439 patients. Ann Oncol. 2014;25:1536–43.

Medrek C, Ponten F, Jirstrom K, Leandersson K. The presence of tumor associated macrophages in tumor stroma as a prognostic marker for breast cancer patients. BMC Cancer. 2012;12:306.


Tiainen S, Tumelius R, Rilla K, et al. High numbers of macrophages, especially M2-like (CD163-positive), correlate with hyaluronan accumulation and poor outcome in breast cancer. Histopathology. 2015;66:873–83.

Ni C, Yang L, Xu Q, et al. CD68- and CD163-positive tumor infiltrating macrophages in non-metastatic breast cancer: a retrospective study and meta-analysis. J Cancer. 2019;10:4463–72.

Jamiyan T, Kuroda H, Yamaguchi R, Abe A, Hayashi M. CD68- and CD163-positive tumor-associated macrophages in triple negative cancer of the breast. Virchows Arch. 2020;477:767–75.

Mukhtar RA, Moore AP, Nseyo O, et al. Elevated PCNA+ tumor-associated macrophages in breast cancer are associated with early recurrence and non-Caucasian ethnicity. Breast Cancer Res Treat. 2011;130:635–44.

Carrio R, Koru-Sengul T, Miao F, et al. Macrophages as independent prognostic factors in small T1 breast cancers. Oncol Rep. 2013;29:141–8.

Koru-Sengul T, Santander AM, Miao F, et al. Breast cancers from black women exhibit higher numbers of immunosuppressive macrophages with proliferative activity and of crown-like structures associated with lower survival compared to non-black Latinas and Caucasians. Breast Cancer Res Treat. 2016;158:113–26.

Mantovani A, Allavena P, Marchesi F, Garlanda C. Macrophages as tools and targets in cancer therapy. Nat Rev Drug Discov. 2022;21:799–820.

Goswami S, Anandhan S, Raychaudhuri D, Sharma P. Myeloid cell-targeted therapies for solid tumours. Nat Rev Immunol. 2023;23:106–20.

Yao S, Cheng TD, Elkhanany A, et al. Breast tumor microenvironment in black women: a distinct signature of CD8+ T-cell exhaustion. J Natl Cancer Inst. 2021;113:1036–43.


Abdou Y, Attwood K, Cheng TD, et al. Racial differences in CD8(+) T cell infiltration in breast tumors from Black and White women. Breast Cancer Res. 2020;22:62.

Bandera EV, Demissie K, Qin B, et al. The Women’s Circle of Health Follow-Up Study: a population-based longitudinal study of Black breast cancer survivors in New Jersey. J Cancer Surviv. 2020;14:331–46.

Ambrosone CB, Ciupak GL, Bandera EV, et al. Conducting molecular epidemiological research in the age of HIPAA: a multi-institutional case-control study of breast cancer in African-American and European-American Women. J Oncol. 2009;2009:871250.

Lau SK, Chu PG, Weiss LM. CD163: a specific marker of macrophages in paraffin-embedded tissue samples. Am J Clin Pathol. 2004;122:794–801.

Mantovani A, Marchesi F, Malesci A, Laghi L, Allavena P. Tumour-associated macrophages as treatment targets in oncology. Nat Rev Clin Oncol. 2017;14:399–416.

Bandera EV, Qin B, Lin Y, et al. Association of body mass index, central obesity, and body composition with mortality among black breast cancer survivors. JAMA Oncol. 2021;7:1–10.

McShane LM, Altman DG, Sauerbrei W, et al. Reporting recommendations for tumor marker prognostic studies (REMARK). J Natl Cancer Inst. 2005;97:1180–4.

Sousa S, Brion R, Lintunen M, et al. Human breast cancer cells educate macrophages toward the M2 activation status. Breast Cancer Res. 2015;17:101.

Ambrosone CB, Zirpoli G, Ruszczyk M, et al. Parity and breastfeeding among African-American women: differential effects on breast cancer risk by estrogen receptor status in the Women’s Circle of Health Study. Cancer Causes Control. 2014;25:259–65.

Millikan RC, Newman B, Tse CK, et al. Epidemiology of basal-like breast cancer. Breast Cancer Res Treat. 2008;109:123–39.

Blows FM, Driver KE, Schmidt MK, et al. Subtyping of breast cancer by immunohistochemistry to investigate a relationship between subtype and short and long term survival: a collaborative analysis of data for 10,159 cases from 12 studies. PLoS Med. 2010;7:e1000279.

Stanton SE, Adams S, Disis ML. Variation in the incidence and magnitude of tumor-infiltrating lymphocytes in breast cancer subtypes: a systematic review. JAMA Oncol. 2016;2:1354–60.

Hammerl D, Massink MPG, Smid M, et al. Clonality, antigen recognition, and suppression of CD8(+) T cells differentially affect prognosis of breast cancer subtypes. Clin Cancer Res. 2020;26:505–17.

Cassetta L, Fragkogianni S, Sims AH, et al. Human tumor-associated macrophage and monocyte transcriptional landscapes reveal cancer-specific reprogramming, biomarkers, and therapeutic targets. Cancer Cell. 2019;35:588–602.

DeNardo DG, Ruffell B. Macrophages as regulators of tumour immunity and immunotherapy. Nat Rev Immunol. 2019;19:369–82.

Laviron M, Petit M, Weber-Delacroix E, et al. Tumor-associated macrophage heterogeneity is driven by tissue territories in breast cancer. Cell Rep. 2022;39:110865.

Pelekanou V, Villarroel-Espindola F, Schalper KA, Pusztai L, Rimm DL. CD68, CD163, and matrix metalloproteinase 9 (MMP-9) co-localization in breast tumor microenvironment predicts survival differently in ER-positive and -negative cancers. Breast Cancer Res. 2018;20:154.

Fortis SP, Sofopoulos M, Sotiriadou NN, et al. Differential intratumoral distributions of CD8 and CD163 immune cells as prognostic biomarkers in breast cancer. J Immunother Cancer. 2017;5:39.

Martin DN, Boersma BJ, Yi M, et al. Differences in the tumor microenvironment between African-American and European-American breast cancer patients. PLoS ONE. 2009;4:e4531.

Hamilton AM, Hurson AN, Olsson LT, et al. The landscape of immune microenvironments in racially diverse breast cancer patients. Cancer Epidemiol Biomarkers Prev. 2022;31:1341–50.

Bauer M, Vetter M, Stuckrath K, et al. Regional variation in the tumor microenvironment, immune escape and prognostic factors in breast cancer in Sub-Saharan Africa. Cancer Immunol Res. 2023;11:720–31.

Salgado R, Denkert C, Demaria S, et al. The evaluation of tumor-infiltrating lymphocytes (TILs) in breast cancer: recommendations by an International TILs Working Group 2014. Ann Oncol. 2015;26:259–71.

Denkert C, Loibl S, Noske A, et al. Tumor-associated lymphocytes as an independent predictor of response to neoadjuvant chemotherapy in breast cancer. J Clin Oncol. 2010;28:105–13.


Acknowledgements

Biospecimens or research pathology services for this study were provided by the Pathology Network Shared Resource and the DataBank and Biorepository Shared Resource, which are funded by the National Cancer Institute (NCI P30CA16056) as Cancer Center Support Grant shared resources.

This work was supported by the National Cancer Institute (R01 CA10059, R01 CA185623, R01 CA247281, R01 CA133264, P01 CA151135, R03 CA238792, P30 CA16056). The New Jersey State Cancer Registry, Cancer Epidemiology Services, New Jersey Department of Health are funded by the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute under contract No. HHSN261201300021I and No. N01-PC-2013-00021, the National Program of Cancer Registries (NPCR), Centers for Disease Control and Prevention under Grant No. NU5U58DP006279-02-00, and the State of New Jersey and the Rutgers Cancer Institute of New Jersey. Dr. Ambrosone is supported by the Breast Cancer Research Foundation.

Author information

Authors and affiliations.

Department of Cancer Prevention and Control, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA

Angela R. Omilian, Rikki Cannioto, Lucas Mendicino, Chi-Chen Hong, Song Yao & Christine B. Ambrosone

Department of Pathology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA

Leighton Stein, Wiam Bshara & Thaer Khoury

Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Piscataway, NJ, USA

Bo Qin, Elisa V. Bandera & Nur Zeinomar

Cancer Epidemiology and Health Outcomes, Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA

Department of Immunology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA

Scott I. Abrams


Contributions

Conception and design of the work: A.R.O., R.C. Acquisition and/or analysis of the data: A.R.O., R.C., L.M., L.S., W.B., B.Q., E.V.B., C.H., T.K., S.Y., C.B.A. Interpretation of data: A.R.O., R.C., L.M., S.I.A., S.Y., T.K., C.B.A., N.Z., E.V.B., B.Q. Drafted the manuscript: A.R.O. Approved the submitted version: All authors. All authors have agreed both to be personally accountable for their contributions and to ensure that questions related to the accuracy or integrity of any part of the work, even ones in which the author was not personally involved, are appropriately investigated, resolved, and the resolution documented in the literature.

Corresponding author

Correspondence to Angela R. Omilian.

Ethics declarations

Ethics approval and consent to participate.

All women provided informed consent and the study protocol was approved by the Institutional Review Boards at Rutgers Cancer Institute of New Jersey and Roswell Park Comprehensive Cancer Center.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table S1.

Univariate Cox regression models assessing associations of additional CD163 + cell density cutoffs and cohort characteristics with overall survival (OS) within subtype. Table S2. Univariate Cox regression models assessing associations of additional CD163 + cell density cutoffs and cohort characteristics with breast cancer-specific survival (BCSS) within subtype. Table S3. Multivariable Cox regression models assessing associations between CD163 + cell density tertiles with OS and BCSS within subtype, additionally adjusting for self-identified race and grade in Model 2. Table S4. Multivariable Cox regression models assessing associations of additional CD163 + cell density cutoffs with OS within subtype. Table S5. Multivariable Cox regression models assessing associations of additional CD163 + cell density cutoffs with BCSS within subtype. Table S6. Multivariable Cox regression models assessing associations between CD163 + cell density tertiles with OS and BCSS by estrogen receptor (ER) status. Table S7. Multivariable Cox regression models assessing associations between CD163 + cell density tertiles with OS and BCSS within Black cases. Figure S1. Diagram of participant availability for CD163 profiling in the Women’s Circle of Health Study.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Omilian, A.R., Cannioto, R., Mendicino, L. et al. CD163 + macrophages in the triple-negative breast tumor microenvironment are associated with improved survival in the Women’s Circle of Health Study and the Women’s Circle of Health Follow-Up Study. Breast Cancer Res 26 , 75 (2024). https://doi.org/10.1186/s13058-024-01831-8


Received : 16 February 2024

Accepted : 25 April 2024

Published : 08 May 2024

DOI : https://doi.org/10.1186/s13058-024-01831-8


  • Tumor-associated macrophages
  • Breast cancer

Breast Cancer Research

ISSN: 1465-542X


  • Open access
  • Published: 10 May 2024

Digital media use, depressive symptoms and support for violent radicalization among young Canadians: a latent profile analysis

  • Diana Miconi 1 ,
  • Tara Santavicca 2 ,
  • Rochelle L. Frounfelker 3 ,
  • Aoudou Njingouo Mounchingam 4 &
  • Cécile Rousseau 5  

BMC Psychology volume 12, Article number: 260 (2024)


Despite the prominent role that digital media play in the lives and mental health of young people as well as in violent radicalization (VR) processes, empirical research investigating the association between Internet use, depressive symptoms and support for VR among young people is scant. We adopt a person-centered approach to investigate patterns of digital media use and their association with depressive symptoms and support for VR.

A sample of 2,324 Canadian young people (M age = 30.10; SD age = 5.44; 59% women) responded to an online questionnaire. We used latent profile analysis to identify patterns of digital media use and linear regression to estimate the associations between class membership, depressive symptoms and support for VR.

We identified four classes of individuals with regard to digital media use, named Average Internet Use/Institutional Trust, Average Internet Use/Undifferentiated Trust, Limited Internet Use/Low Trust and Online Relational and Political Engagement/Social Media Trust. Linear regression indicated that individuals in the Online Relational and Political Engagement/Social Media Trust and Average Internet Use/Institutional Trust profiles reported the highest and lowest scores of both depression and support for VR, respectively.

Conclusions

It is essential to tailor prevention and intervention efforts to mitigate risks of VR to the specific needs and experiences of different groups in society, within a socio-ecological perspective. Prevention should consider both strengths and risks of digital media use and simultaneously target both online and offline experiences and networks, with a focus on the sociopolitical and relational/emotional components of Internet use.


The recent increase in support for – and direct engagement in – ideologically motivated violence among youth can be associated with the increase in social polarization in society [ 1 ] as well as the specificities of adolescence and early adulthood, a seminal period for the development of ideologies [ 2 , 3 ]. Violent radicalization (VR) is a complex and multidimensional phenomenon [ 4 ] defined as a process whereby an individual or a group increases support for violence as a legitimate means to reach a specific (e.g., political, social, religious) goal [ 5 ]. Notably, VR processes are increasingly occurring online [ 6 , 7 ]. Internet use has been primarily investigated in the field of terrorism studies and with samples of radicalized individuals [ 6 , 8 ]. Less is known about the association of digital media use, social polarization and attitudes towards support for VR among young people. Although the association between attitudes and behaviors is not a linear one, positive attitudes towards VR can contribute to the creation of socially polarized environments that fuel conflicts and shatter social solidarities, resulting in some cases in extremist ideologies and the normalization of violence. In such contexts, vulnerable individuals - such as those experiencing significant social grievances - are at higher risk of engaging in violent acts and extremism. Thus, in a primary prevention perspective, a reduction in support of VR among youth can result in an overall decrease of violence in our societies in the short and long term [ 9 , 10 , 11 ].

Although numerous interventions target online literacy and social media use as potential ways to counter violent extremism [ 7 ], empirical research on their effectiveness is scarce and the role that Internet use plays in the development of positive attitudes towards VR among young people is largely understudied. While depressive symptomatology, which has also been increasing among young people in the past decade [ 12 , 13 ], is associated with both digital media use and support for VR [ 14 , 15 , 16 , 17 ], empirical research has not yet examined the associations between these variables simultaneously in one study. The current study aims to fill this gap in the literature by empirically investigating if and how patterns of digital media use are differently associated with depressive symptoms and support for VR among a sample of Canadian young people via a person-centered approach. Given the prominent role that digital media use plays in both VR processes and mental health among young people, a better understanding of risk and protective factors associated with digital media use is warranted to inform and tailor evidence-based prevention programs that could significantly help reduce social ruptures and the associated risk of violence.

Digital media use and support for VR

The online space has become a central developmental context for young people [ 18 , 19 ]. Empirical evidence remains mixed, suggesting that digital media use can be either a risk or protective factor across multiple developmental outcomes depending on a complex interplay between both online and offline factors [ 18 ]. A consensus is now emerging that the specific behaviors in which youth engage online, rather than overall digital media per se, are key determinants of well-being. Yet, gaps in knowledge remain [ 20 ].

On the one hand, digital media can be used to connect with peers and to counter isolation, thus extending or reinforcing one’s social support network and possibly one’s trust in institutions and in democracy. On the other hand, the Internet can provide instant and unfiltered access to content and groups that propagate fake news, extreme beliefs and encourage violent actions, representing one of the main settings that can facilitate disaffiliation phenomena and recruitment of young people by extremist groups [ 7 , 21 , 22 , 23 , 24 ]. Notably, whereas the majority of young people go online, only a minority of them get involved in VR processes. As such, it is likely that digital media use does not have a linear relationship with support for VR, but that specific constellations of digital media use are differentially associated with support for VR [ 8 , 25 ].

Young people’s use of digital media is complex and heterogeneous [ 18 ], making the measurement and conceptualization of digital media use a challenging area [ 26 , 27 ]. In this study, we focus on some aspects of digital media use that have been theoretically and/or empirically associated with VR, namely time on social media, reasons for Internet use (work, informational, entertainment, social), news literacy, trust in specific online sources of information (news, peers, influencers, government, YouTube), preference for online social interactions and online political interactions.

The Internet can be used for multiple purposes, spanning from work or entertainment, to relational maintenance and social interaction [ 18 , 28 ]. Although spending more time online has been associated with increased exposure to extremist content [ 23 ], whether this exposure is associated with risks of VR is yet unclear [ 29 ]. Overall, the impact of time spent on social media on a variety of social and health outcomes including VR varies based on the specific online activities and experiences [ 8 , 18 , 20 ].

Of importance, the Internet is currently the most important source of information for young people [ 30 , 31 ], but trust in the validity of information from official governmental websites as well as from social media (e.g., Instagram, Twitter, YouTube) can vary between individuals. Misinformation and beliefs in conspiracy theories have been associated with higher support for VR [ 32 , 33 ]. News literacy is considered a potential avenue for countering misinformation, social polarization and online extremism [ 34 , 35 ]. News literacy is defined as the ability to find, identify and recognize news, and to critically evaluate and produce it [ 36 ]. However, empirical research that examines the association between news literacy and support for VR is lacking.

Prior research has found that preference for online social interactions over face-to-face relationships represents a risk factor for support for VR [ 37 , 38 ]. Preference for online social interactions is characterized by beliefs that one is safer, more confident, more comfortable and appreciated when online as opposed to offline [ 39 ] and is considered a component of problematic Internet use as it implies problematic relational experiences offline.

Some studies suggest that actively seeking and engaging with extremist content online is associated with higher risk of VR [ 8 , 22 , 25 ]. Although online interactions with strangers have been associated with higher risk of psychological distress [ 17 , 40 ], the extent to which interactions with known and unknown people around political or current issues are associated, if at all, with support for VR has yet to be explored [ 23 ].

Given the variety in online experiences and type of digital media use, a person-centered approach via a latent profile analysis (LPA) facilitates examining different constellations of digital media use among young adults and associations between latent groups and support for VR. As VR is the result of complex and unique interplays between personal and social/contextual variables [ 4 , 41 , 42 ], identifying patterns of vulnerabilities online via a person-centered approach can inform the development of tailored VR prevention programs targeting digital media use.

The present study

The present study adopts a person-centered approach to investigate: 1) patterns of digital media use among young Canadians. Specifically we focus on reasons for digital media use (i.e., work, entertainment, socialization, information), reported trust in different sources of online information (i.e., official government and news websites and social media), news literacy, time on social media, preference for online social interactions and online political interactions (e.g., posting/discussing with peers vs strangers, having conflicts online about these issues); 2) the association between patterns of digital media use and levels of depressive symptoms; and 3) the association between patterns of digital media use and support for VR. We expect to identify at least two groups of young people who differ in their reported digital media use. Given that we do not have a priori knowledge of the class structure in the data, we did not have a priori hypotheses about the association between each profile and depressive symptoms. However, we anticipate that the group(s) that will report the highest levels of depressive symptoms will also be at higher risk of supporting VR.

Participants

A total of 2,695 participants answered an online survey. Participants with missing outcome data (n = 362) were removed, as were individuals identifying as “other” gender (n = 9), for methodological reasons given the very small size of this gender group. The final sample size was 2,324 participants (59.3% women; mean age = 30.10; SD = 5.44). Socio-demographic characteristics are presented in Table 1.

Data were collected in November 2021, during the COVID-19 pandemic in Alberta, Ontario, and Quebec. Participants were recruited through the Leger360 online platform with over 500,0000 registered members and answered the survey in either English or French [ 43 ]. Informed consent to participate was obtained electronically from all of the participants in the study, and the response rate was 53.8%. Individuals under the age of 18 or above 41 were excluded. Study protocol and procedures were approved by the Institutional Review Board of.

Support for VR

The Radicalism Intention Scale (RIS) is a 4-item subscale of the Activism and Radicalism Intention Scales (ARIS) [ 44 ]. It assesses an individual’s readiness to participate in illegal and violent behavior in the name of one’s group or organization. Respondents rated their agreement with four statements on a seven-point Likert scale, with higher scores indicating more support for VR (range 4-28). The scale has good psychometric properties among young adults [ 45 ] (α = .89; Ω = 0.89).

Time spent on social media (daily)

Participants were asked to identify how many hours they spend on social media on a typical day (i.e., less than 2 hours, 2-4 hours, 4-6 hours, and 6 hours or more).

Reasons for Internet use

Four statements on Internet use were presented (i.e., using Internet: for personal relationships, to actively search for information/news, for entertainment, and for work). Participants were asked to indicate on a 5-point Likert scale how much they used the Internet for each reason (not at all, a little, moderately, a lot, most of the time).

News literacy

News literacy was measured with a subscale of the literacy scale by Jones-Jang et al. [ 36 ]. Participants were asked to indicate on a 5-point Likert scale how much they agreed with each statement (six items, from 1 = strongly disagree to 5 = strongly agree, range 6 - 30) (α = .80; Ω = 0.80).

Trust in online sources of information

Five statements around trusting different sources of online information were presented, namely trust in news, peers, influencers, government, and YouTube sources of information. Participants were asked to indicate how often they trust each source of information on a 4-point Likert scale (never, rarely, sometimes, often).

Preference for online social interactions

Preference for online social interactions (PFOSI) was measured with the 13-item Social Comfort subscale of the Online Cognition Scale [ 46 ]. Participants rated on an 8-point Likert scale (range 0 – 91) how much they agreed with statements describing their relationships with people whom they know primarily through the Internet (e.g., chat rooms, message boards, online gaming communities). Higher scores indicate more preference for online social interactions (α = .92; Ω = 0.92).

Online political interactions

Participants were asked to indicate on a 6-point Likert scale (from “None/No time at all”, to “Several times a day”) how often their online interactions were oriented around these four statements: posted information about politics/current affairs on social media, discussed politics/current affairs with people you know, discussed politics/current affairs with people you do not know, had verbal conflicts with known people around information shared/posted online.

Depressive symptoms

Depressive symptoms were measured by using the 15-item subscale of the Hopkins Symptom Checklist-25 (HSCL-25) [ 47 ]. Items are rated on a Likert scale from 1 (not at all) to 4 (extremely) based on how much discomfort that problem has caused them during the past seven days, and a total score is obtained by computing the mean of all items. The clinical cut-off is set at 1.75 (score range from 1 to 4) and scores have been recoded as below (0) or above (1) this cut-off. The HSCL-25’s psychometric qualities have been well established [ 48 ] (α = .94; Ω = 0.94).
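The scoring rule described above (mean of 15 items rated 1 to 4, then recoding against the 1.75 cut-off) can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function name `score_depression` and the example responses are hypothetical.

```python
# Illustrative scoring of a 15-item depression subscale (items rated 1-4).
# The 1.75 cut-off is taken from the text; everything else is hypothetical.

CUTOFF = 1.75
N_ITEMS = 15


def score_depression(items):
    """Return (mean score, indicator) for one respondent.

    The indicator is 0 when the mean is below the cut-off and 1 when it is
    above. With 15 integer responses the mean can never equal 1.75 exactly
    (that would need a total of 26.25), so no tie rule is required.
    """
    if len(items) != N_ITEMS:
        raise ValueError("expected 15 item responses")
    if any(not 1 <= x <= 4 for x in items):
        raise ValueError("responses must lie between 1 and 4")
    mean_score = sum(items) / N_ITEMS
    return mean_score, 1 if mean_score > CUTOFF else 0


# Hypothetical respondent: mostly "a little" (2) with a few "not at all" (1).
example = [1, 1, 1] + [2] * 12
mean_score, above_cutoff = score_depression(example)
```

A respondent needs a total of at least 27 across the 15 items to be coded as above the cut-off.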

Socio-demographic variables

Participants provided information on their age, gender (man or woman), education (None/Less than high school; High school graduate; Apprenticeship, technical institute, trade or vocational school; College, CEGEP or other non-university certificate or diploma; or University certificate, diploma or degree), income ($19,999 or less, $20,000 - $39,999, $40,000 - $59,999, $60,000 - $79,999, $80,000 - $99,999, $100,000 or more), employment (not employed, employed – essential, employed – non-essential), generational status (first-generation immigrant, second-generation immigrant, and third generation or more immigrant/non-immigrant), province (Alberta, Ontario, Quebec), and religious beliefs (no religion, religion).

Statistical analyses

Analyses were conducted using R software [ 49 ]. Missing data were imputed using the Random Forest method via the mice package [ 50 , 51 ]. Sensitivity analysis suggested that missing data and multiple imputations did not alter the observed patterns of associations. First, we estimated the LPA model around variables related to digital media use via the tidyLPA package [ 52 ]. LPA is an analytic strategy that attempts to identify subgroups of people within a heterogeneous population that have a high degree of homogeneity in responses on a set of indicators. The appropriate number of latent profiles was selected based on the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the sample-size-adjusted BIC (SABIC), the Bootstrap Likelihood Ratio Test (BLRT), characteristics of the profiles (interpretability of response profiles or uniqueness) and a conservative profile sample size (>10%) [ 53 , 54 , 55 , 56 ]. Lower AIC, BIC and SABIC values and a statistically significant BLRT indicate a better model fit [ 53 , 54 ]. Once the best LPA solution was identified, the level of entropy (acceptable if >.70) and Average Posterior Class Probability (AvePP; acceptable if >.70) were examined to determine the accuracy of classification [ 57 ].
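The information criteria and the entropy statistic named above follow standard textbook formulas, which the Python sketch below spells out. It is not the authors' R code. The function names and example inputs are hypothetical, the SABIC uses the common (n + 2) / 24 sample-size adjustment, and "entropy" is taken to mean the relative entropy usually reported for mixture models. Whether tidyLPA computes each index in exactly this way is an assumption.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """AIC, BIC and sample-size-adjusted BIC from a model log-likelihood."""
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        # Common adjustment: replace n with (n + 2) / 24.
        "SABIC": deviance + n_params * math.log((n_obs + 2) / 24),
    }


def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; values near 1 mean clear separation.

    `posteriors` is a list of rows, one per participant, each holding the
    posterior probabilities of belonging to each of K profiles.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:  # p * log(p) tends to 0 as p tends to 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))


# Hypothetical log-likelihood and parameter count for one candidate model.
fit = information_criteria(log_lik=-100.0, n_params=5, n_obs=2324)
```

Comparing such dictionaries across candidate models with different numbers of profiles reproduces the "lower is better" comparison described in the text; the BLRT requires bootstrapping and is not shown.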

Next, based on the predicted probabilities of profile membership made by the LPA, we assigned each participant to a specific profile. Analyses were then conducted on the univariable associations between sociodemographic characteristics and profile membership. Frequencies of profile membership by sociodemographic characteristics can be found in Table 3 .
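Assigning each participant to the profile with the highest posterior probability (modal assignment) and summarizing classification quality with the AvePP can be sketched as follows. This is an illustrative Python version with hypothetical names and toy probabilities, not the procedure as implemented in the authors' R workflow.

```python
def assign_profiles(posteriors):
    """Modal assignment: index of the largest posterior in each row.

    Ties are resolved in favour of the lowest profile index.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in posteriors]


def average_posterior_probability(posteriors):
    """AvePP per profile: mean posterior probability of the assigned
    profile among the participants assigned to it."""
    assigned = assign_profiles(posteriors)
    sums, counts = {}, {}
    for row, k in zip(posteriors, assigned):
        sums[k] = sums.get(k, 0.0) + row[k]
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


# Toy posterior probabilities for three participants and two profiles.
toy = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
membership = assign_profiles(toy)
avepp = average_posterior_probability(toy)
```

In this toy example the first two participants are assigned to the first profile and the third to the second. An AvePP above .70 for every profile would meet the threshold quoted in the text.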

Lastly, we conducted linear regression analyses that estimated support for VR as a function of profile membership. A sequential model building approach was used to evaluate the associations between profiles and support for VR. Model 1 presents the unadjusted association between profile and support for VR; model 2 adjusts for sociodemographic characteristics, and model 3 adjusts for sociodemographic characteristics and depression.
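For the unadjusted model (model 1), regressing an outcome on dummy-coded profile membership has a simple closed form: the intercept equals the mean of the reference profile, and each coefficient equals the difference between a profile's mean and the reference mean. The Python sketch below illustrates only that special case. The function name, the choice of reference profile and the scores are hypothetical, and the adjusted models 2 and 3 would need a full multivariable least-squares fit that is not shown.

```python
from statistics import mean


def unadjusted_profile_model(scores, profiles, reference):
    """Least-squares fit of score on dummy-coded profile, no covariates.

    Returns (intercept, coefficients), where the intercept is the mean of
    the reference profile and each coefficient is a profile mean minus the
    reference mean.
    """
    groups = {}
    for y, g in zip(scores, profiles):
        groups.setdefault(g, []).append(y)
    intercept = mean(groups[reference])
    coefs = {
        g: mean(ys) - intercept
        for g, ys in groups.items()
        if g != reference
    }
    return intercept, coefs


# Hypothetical support-for-VR scores (possible range 4-28) and profile labels.
scores = [4.0, 6.0, 10.0, 12.0, 7.0]
labels = ["P3", "P3", "P4", "P4", "P1"]
intercept, coefs = unadjusted_profile_model(scores, labels, reference="P3")
```

A positive coefficient means that the profile's mean outcome is higher than that of the reference profile.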

Latent Profile Structure

LPA models with 2 through 6 profiles are presented in Table 2 along with the associated BIC and log-likelihood values. The four-class solution was selected as the best fit based on the sample size of profiles (>10%) and interpretability of findings, despite not having the lowest BIC value. Figure 1 presents profile membership item response probabilities for digital media use. Participants in all profiles had a high probability of reporting average levels of news literacy. Unique class characteristics emerged around time spent on social media and overall Internet use, preference for online social interactions, online political interactions and trusting multiple sources of information. Profile 1, named Average Internet use/Undifferentiated trust , is characterized by individuals who demonstrated average Internet use yet infrequently used the Internet for interactions around politics/current affairs and showed undifferentiated trust towards information found on-line, regardless of the source. Participants in Profile 2, named Limited Internet use/Low Trust , infrequently used the Internet across the considered reasons and reported a low probability of trusting news and government sources compared to information from social media (e.g., peers, influencers, youtube). Profile 3, named Average Internet use/Institutional trust , is characterized by participants with average and undifferentiated internet use, who were more likely to report greater trust in institutional sources of online information (e.g., news, government) compared to other social media sources. Profile 4, named Online relational and political engagement/Social media trust , consists of individuals with a high probability of preferring online, as opposed to in person, social interactions and spending a large amount of time on social media on a daily basis.
In addition, participants in Profile 4 had a high probability of using the Internet for discussing politics and other issues with both peers and strangers, actively posting on-line about politics, and were more likely to report conflicts online compared to all other profiles. Profile 4 participants had a lower probability of trusting news and government sources compared to other sources of information online (peers, influencers, youtube). Overall, Profile 1 and 3 included 46.8% and 27.9% of participants, respectively. Profile 2 was smaller and included 11.4% of participants, while Profile 4 included 13.9% of participants.

Figure 1. Four-Profile Solution with Standardized Mean by Item Responses ( N = 2324)

Note. PFOSI = Preference for online social interactions

Profile belonging, sociodemographic characteristics and depressive symptoms

Table 3 presents sociodemographic characteristics and depressive symptoms by profile for study participants. All variables were significantly associated ( p < 0.05) with profile belonging at the univariable level. The Average Internet use/Undifferentiated trust profile included a higher representation of women, non-immigrant, employed and non-religious participants, as well as participants who reported high education and income. A total of 45% of participants in this profile scored above the clinical cut-off for depressive symptoms. Participants in the Limited Internet use/Low Trust profile had a higher probability of being less educated, reporting a lower income and more unemployment. Participants in this profile were more likely to live in Alberta. Profile 3, Average Internet use/Institutional trust , included participants who were highly educated, had high income, without an immigration background (third generation or more) and without a religion. Participants in this profile also reported the lowest levels of depression (37.5% above clinical cut-off) and more of them lived in Quebec. Finally, the Online relational and political engagement/Social media trust profile had an overrepresentation of men, immigrants, participants with a religion and participants who lived mainly in Ontario. In addition, participants in this group were overall educated but reported low income and high unemployment. A total of 70% of participants in this profile scored above the clinical cut-off on our measure of depressive symptoms.

Associations of profile membership with support for VR

Profile membership was associated with scores on the RIS ( p < 0.001). Participants in the Online relational and political engagement/Social media trust profile were more likely to report higher levels of support for VR compared to the other profiles in both unadjusted and adjusted models. Specifically, belonging to this profile was associated with a 0.91 ( SE = 0.06, p < 0.001) increase in support for VR compared to the Average Internet use/Undifferentiated trust profile when controlling for sociodemographic variables and depressive symptoms (Table 4 ). Belonging to the Average Internet use/Institutional trust profile was associated with a 0.267 ( SE = 0.046, p < 0.001) decrease in support for VR compared to the Average Internet use/Undifferentiated trust profile when controlling for sociodemographic variables and depressive symptoms (Table 4 ). Gender, generation, province, age, and depressive symptoms were also associated with support for VR ( p < 0.05). Men, first generation immigrants, participants from Ontario, younger participants, and participants reporting more depressive symptoms were more likely to report higher support for VR. Religion, income and education were not significantly associated with support for VR (Table 4 ).

The current study investigated patterns of digital media use in a sample of young adults from three Canadian provinces. In addition, we examined whether these patterns were differentially associated with depressive symptoms and support for VR. Four profiles emerged from our LPA, confirming the pertinence of using a person-centered approach to shed light on the complex patterns of digital media use among young people. Overall, profiles differentiated participants mostly in terms of trust in specific sources of information and level and type of online engagement.

The two largest profiles ( Average Internet use/Undifferentiated trust and Average Internet use/Institutional trust) differed primarily in their trust of online sources of information. Specifically, individuals in the Average Internet use/Institutional trust profile more frequently reported trusting institutional sources of information (i.e., government and news) rather than social media (i.e., youtube, influencers, and peers), suggesting an overall acceptance of mainstream information and of the status quo. In contrast, the Average Internet use/Undifferentiated trust group showed average levels of trust in all sources of information alike. This group spent slightly more time online than the Average Internet use/Institutional trust group, but overall these two groups did not differ much in their online social or political interactions. These two groups included 74.7% of participants, indicating a divide in the population mostly linked to what online sources to trust for information. The remaining participants were roughly evenly distributed between the Online relational and political engagement/Social media trust and the Limited Internet use/Low trust profiles. Participants in both of these profiles more frequently trusted alternative social media sources of information compared to institutional ones, but they differed in overall levels of trust, with the Limited Internet use/Low trust group reporting overall low levels of trust, especially for institutional sources of information. Participants in the Online relational and political engagement/Social media trust profile reported high levels of trust in alternative social media sources of information and were more actively and politically engaged online with both peers and especially with strangers. They spent more time online and preferred online social interactions more compared to the other profiles.
Taken together, these findings suggest that patterns of digital media use echo the increasing polarization in our societies [ 58 , 59 ] around issues of trust/distrust, engagement/disengagement as well as a variety of negative/positive online experiences. Indeed, the most important variables to differentiate the four profiles were related to the frequency of trusting different online sources of information as well as specific social and political interactions online, rather than reasons for Internet use or news literacy, which on the contrary did not seem to play a significant role in determining profile membership.

We suggest that the divide around trust in online information and engagement needs to be situated in the broader socio-political context, which can partly explain the socio-demographic differences we found across profiles. The Average Internet use/Institutional trust and the Average Internet use/Undifferentiated trust profiles consisted of more affluent and more educated participants, mostly employed and without an immigration background. Participants in these profiles may benefit from more privileges in society, which can favor their trust in mainstream institutional sources of information online [ 60 , 61 , 62 ]. Indeed, participants in the Average Internet use/Institutional trust group were more likely to report the highest levels of education and income as well as the lowest levels of depressive symptoms, followed by the Average Internet use/Undifferentiated trust profile. The difference in levels of depression between these two profiles can also be associated with the presence of younger participants and more women in the Average Internet use/Undifferentiated trust profile compared to the Average Internet use/Institutional trust one. The Limited Internet use/Low trust and Online relational and political engagement/Social media trust profiles included more participants reporting lower income. The Online relational and political engagement/Social media trust group included a higher percentage of men, participants with an immigration background and participants professing a religion, although participants in this profile reported an education level similar to the two larger profiles. This profile reported concerning levels of depression (70.2% above clinical cut-off). Relying on the internet for relational and political purposes, combined with more frequent trust in alternative social media sources of information and fewer privileges in society, can jeopardize young people’s mental health.
Within a socio-ecological perspective, the fact that this profile is made up of primarily educated men with an immigrant background may represent a form of double-bind in which some groups may feel alienated because official discourses and stances about equity in Canada are contradicted by daily life experiences. This group’s pattern of digital media use may be related to the hardships, grievances and social deprivation experienced by minorities both online and offline. The combination of negative life experiences with high emotional distress may lead to overall negative and conflictual online social and political exchanges, subsequently legitimizing violence as an ultimate solution [ 16 , 17 , 63 ]. Besides reporting low income similarly to the Online relational and political engagement/Social media trust group, the Limited Internet use/Low trust profile included less educated and more unemployed participants compared to all other profiles, mostly without an immigration background. Participants in this group may not be content with their socio-political reality, and disengage from social and political issues, at least online. Notably, our profiles suggest that digital media use is closely intertwined with social experiences offline. Interventions should consider this complex interaction and adopt a socio-ecological approach to both research and intervention, tailored not only to the different groups in society but also addressing the gap between them to mend the social fabric.

With regard to depressive symptoms and support for VR, the Online relational and political engagement/Social media trust profile reported the highest levels of depression and support for VR, followed by the Limited Internet use/Low trust profile. The fact that the two groups that reported less trust in institutional sources of information compared to alternative social media showed more depressive symptoms and support for VR indicates that issues of trust are important to address with young people in prevention and intervention efforts. Given that individuals in these groups had overall a lower status in society, compared to the other two profiles, it is possible that they may have been experiencing more social deprivation and grievances during the pandemic and have been more sensitive to the anti-system rhetoric which provided meaning to this perceived injustice [ 60 ]. This divide aligns with the emergence of polarized social movements in the whole of Canada (e.g., pro- and anti-vaxx groups during the pandemic). Promoting a sense of agency and belonging as well as ensuring that young people can express their opinions and have a purpose in life may help decrease depressive symptoms and reduce overall socio-political distrust and disengagement both online and offline, which can in turn contribute to reducing the legitimation of violence. However, such interventions need to consider the social adversity and deprivation experienced by young people and be tailored to the specific needs and challenges that they face. Multi-level systemic interventions that target online and offline socio-political macro-determinants of mental health and injustices in our societies are needed above and beyond individual intervention programs.

The association between membership in the Online relational and political engagement/Social media trust profile and support for VR aligns with prior studies pointing to an association between active online political engagement and interactions and support for VR [ 8 , 23 , 35 ]. Notably, this was a characteristic that clearly distinguished the Online relational and political engagement/Social media trust profile from all other profiles. Online relational and political engagement should be addressed in prevention and intervention, while also addressing possible isolation and injustices experienced offline. The association between membership in the Limited Internet use/Low trust profile and support for VR can be related to an overall distrust in society and especially in government and official institutions, which has been found to represent a risk factor for VR [ 32 ].

As expected, the group that was at higher risk of supporting VR was also the one that reported the highest level of depressive symptoms, which were significantly and positively associated with support for VR, confirming prior evidence [ 38 , 64 , 65 , 66 , 67 ]. Depressive symptoms do not necessarily lead to greater risk of VR [ 68 ]. Yet, multiple studies indicate a positive association between depressive symptoms and support for VR [ 38 , 64 , 65 , 66 , 67 ]. Although the directionality of associations remains to be established, available evidence suggests that youth who interact more with strangers online [ 17 , 40 ], who prefer online social interactions [ 69 , 70 , 71 ] and who experience more social adversity [ 14 , 67 ] are at higher risk of depression, which can partly explain the higher scores of depressive symptoms found in the Online relational and political engagement/Social media trust profile. Identifying as a man and being younger were also risk factors for support for VR, in line with prior studies [ 7 , 15 ], underlining the pertinence for future studies to focus on young people and to consider specificities by gender in VR studies [ 14 , 29 , 32 , 45 ].

Limitations

This study has several limitations. Most importantly, the cross-sectional design prevents us from drawing any conclusions about causality. Longitudinal studies are needed to shed light on the trajectories of associations between patterns of digital media use, depressive symptoms and support for VR. Second, our study is based on a convenience sample with a relatively high socio-economic level and education. This means that our results may not be generalizable to a larger, general population of young adults. Nonetheless, our online method of recruitment is appropriate given the sensitivity of the topic and the challenges of conducting research during a pandemic. Third, all data are based on young people’s self-reports and social desirability biases cannot be excluded. Fourth, our measures of digital media use were limited and not comprehensive of the broad range of possible online experiences. Given the rapidly evolving and dynamic aspects of the Internet, the availability of validated measures for different facets of Internet use remains a challenge for future studies. Last, our data were collected during the COVID-19 pandemic in three Canadian provinces, and results cannot be easily generalized to other provinces or countries, nor to a non-pandemic context.

Despite these limitations, our findings suggest that digital media use, psychological distress and their interaction play a role in processes of VR among young people and need to be situated and understood within a socio-ecological and social justice perspective. Specifically, trust in different sources of information and social and political experiences online are as relevant as the emotional and relational experiences of young people. The dynamic associations among these key elements have to be considered simultaneously when reflecting on VR prevention and digital media use among young people. Prevention efforts should be adapted to the needs of specific populations and consider the diversity of their online/offline experiences. Indeed, our results suggest that online experiences are intertwined with offline experiences in society, in particular with grievances, and that attention to the rapidly evolving socio-political scenario is warranted when designing intervention programs that target young people’s digital media use to prevent processes of VR. The fact that self-reported news literacy did not differ across profiles questions the pertinence of VR prevention programs that target mainly news literacy skills among youth. Our findings support preliminary results that showed that media literacy did not protect youth from exposure to extremist content online [ 35 ] or risks of VR [ 25 ]. It has been argued that programs aimed at fostering digital literacy may be associated with improved technical competence but leave participants “critically naïve” [ 72 ], failing to situate digital competence within the broader socio-political context. Although digital literacies may still be relevant skills to promote among young people, our findings suggest that, when it comes to the prevention of VR processes, critical thinking skills, supportive environments and a social justice approach to intervention may be equally important.

Availability of data and materials

The datasets generated during and/or analyzed during the current study are available from the corresponding author upon request.

Venkatesh V, Rousseau C, Morin D, Hassan G. Violence as collateral damage of the COVID-19 pandemic. The Conversation. 2021. https://theconversation.com/collateral-damage-of-covid-19-rising-rates-of-domestic-and-social-violence-143345 .

Harpviken AN. Psychological vulnerabilities and extremism among western youth: A literature review. Adolescent Research Review. 2020;5(1):1–26.

Steinberg LD. Age of opportunity: Lessons from the new science of adolescence. Boston: Houghton Mifflin Harcourt; 2014.

McGilloway A, Ghosh P, Bhui K. A systematic review of pathways to and processes associated with radicalization and extremism amongst Muslims in Western societies. International review of psychiatry. 2015;27(1):39–50.

Schmid AP. Radicalisation, de-radicalisation, counter-radicalisation: A conceptual discussion and literature review. International Centre for Counter-terrorism (ICCT). Research Paper. 2013. https://www.icct.nl/sites/default/files/import/publication/ICCT-Schmid-Radicalisation-De-Radicalisation-Counter-Radicalisation-March-2013_2.pdf .

Oppetit A, Campelo N, Bouzar L, Pellerin H, Hefez S, Bronsard G, et al. Do radicalized minors have different social and psychological profiles from radicalized adults? Frontiers in Psychiatry. 2019;10:644.

Amit S, Kafy AA. A systematic literature review on preventing violent extremism. J Adolesc. 2022;94(8):1068–80.

Hassan G, Brouillette-Alarie S, Alava S, Frau-Meigs D, Lavoie L, Fetiu A, et al. Exposure to extremist online content could lead to violent radicalization: A systematic review of empirical evidence. International Journal of Developmental Science. 2018(Preprint):1-18.

Eisenman DP, Flavahan L. Canaries in the coal mine: Interpersonal violence, gang violence, and violent extremism through a public health prevention lens. International review of psychiatry. 2017;29(4):341–9.

Weine S, Eisenman DP, Kinsler J, Glik DC, Polutnik C. Addressing violent extremism as public health policy and practice. Behavioral Sciences of Terrorism and Political Aggression. 2017;9(3):208–21.

Adam-Troian J, Tecmen A, Kaya A. Youth extremism as a response to global threats? European Psychologist. 2021;26(1):15–28.

Knapstad M, Sivertsen B, Knudsen AK, Smith ORF, Aarø LE, Lønning KJ, et al. Trends in self-reported psychological distress among college and university students from 2010 to 2018. Psychological medicine. 2021;51(3):470–8.

Lipson SK, Phillips MV, Winquist N, Eisenberg D, Lattie EG. Mental health conditions among community college students: A national study of prevalence and use of treatment services. Psychiatric services. 2021:appi.ps.202000437.

Miconi D, Levinsson A, Frounfelker RL, Li ZY, Oulhote Y, Rousseau C. Cumulative and independent effects of experiences of social adversity on support for violent radicalization during the COVID-19 pandemic: the mediating role of depression. Soc Psychiatry Psychiatr Epidemiol. 2022;57:1–13.

Rousseau C, Miconi D, Frounfelker RL, Hassan G, Oulhote Y. A repeated cross-sectional study of sympathy for violent radicalization in Canadian college students. American Journal of Orthopsychiatry. 2020;90(4):406–18.

Baker DA, Algorta GP. The relationship between online social networking and depression: A systematic review of quantitative studies. Cyberpsychology, Behavior, and Social Networking. 2016;19(11):638–48.

Ybarra ML, Alexander C, Mitchell KJ. Depressive symptomatology, youth Internet use, and online interactions: A national survey. Journal of adolescent health. 2005;36(1):9–18.

Stavropoulos V, Motti-Stefanidi F, Griffiths MD. Risks and opportunities for youth in the digital era: A cyber-developmental approach to mental health. European Psychologist. 2022;27(2):86–101.

Subrahmanyam K, Šmahel D. Connecting online behavior to adolescent development: A theoretical framework. Digital youth: Springer; 2011. p. 27–39.

Hamilton L, Gross B. How Has the Pandemic Affected Students’ Social-Emotional Well-Being? A Review of the Evidence to Date: Center on Reinventing Public Education; 2021.

Kaakinen M, Keipi T, Räsänen P, Oksanen A. Cybercrime victimization and subjective well-being: An examination of the buffering effect hypothesis among adolescents and young adults. Cyberpsychology, Behavior, and Social Networking. 2018;21(2):129–37.

Turner N, Holt TJ, Brewer R, Cale J, Goldsmith A. Exploring the relationship between opportunity and self-control in youth exposure to and sharing of online hate content. Terrorism and Political Violence. 2022;35(7):1–16.

Costello M, Barrett-Fox R, Bernatzky C, Hawdon J, Mendes K. Predictors of viewing online extremism among America’s youth. Youth & Society. 2020;52(5):710–27.

Morris E. Children: extremism and online radicalization. Journal of Children and Media. 2016;10(4):508–14.

Wolfowicz M, Hasisi B, Weisburd D. What are the effects of different elements of media on radicalization outcomes? A systematic review. Campbell Systematic Reviews. 2022;18(2):e1244.

Shah DV, Kwak N, Holbert RL. Connecting and disconnecting with civic life: Patterns of Internet use and the production of social capital. Political Commun. 2001;18(2):141–62.

Lee N-J, Shah DV, McLeod JM. Processes of political socialization: A communication mediation approach to youth civic engagement. Communication Research. 2013;40(5):669–97.

Ekström M, Olsson T, Shehata A. Spaces for public orientation? Longitudinal effects of Internet use in adolescence. Information, Communication & Society. 2014;17(2):168–83.

Conway M. Determining the role of the internet in violent extremism and terrorism: Six suggestions for progressing research. Studies in Conflict & Terrorism. 2017;40(1):77–98.

Anderson M, Jiang J. Teens, Social Media, and Technology 2018. Pew Research Center; 2018.

Ohme J. Updating citizenship? The effects of digital media use on citizenship understanding and political participation. Information, Communication & Society. 2019;22(13):1903–28.

Levinsson A, Miconi D, Li Z, Frounfelker RL, Rousseau C. Conspiracy theories, psychological distress, and sympathy for violent radicalization in young adults during the CoViD-19 pandemic: a cross-sectional study. International journal of environmental research and public health. 2021;18(15):7846.

van Mulukom V, Pummerer LJ, Alper S, Bai H, Čavojová V, Farias J, et al. Antecedents and consequences of COVID-19 conspiracy beliefs: A systematic review. Soc Sci Med. 2022;301:114912.

Henschke A, Reed A. Toward an ethical framework for countering extremist propaganda online. Studies in Conflict and Terrorism. 2021:1–18. https://doi.org/10.1080/1057610X.2020.1866744 .

Schmuck D, Fawzi N, Reinemann C, Riesmeyer C. Social media use and political cynicism among German youth: the role of information-orientation, exposure to extremist content, and online media literacy. Journal of Children and Media. 2022;16(3):313–31.

Jones-Jang SM, Mortensen T, Liu J. Does media literacy help identification of fake news? Information literacy helps, but other literacies don’t. American Behavioral Scientist. 2021;65(2):371–88.

Ellis H, Miller E, Sideridis G, Frounfelker RL, Miconi D, Abdi S, et al. Risk and protective factors associated with attitudes in support of violent radicalization: variations by geographic location. Int J Public Health. 2021;66:617053.

Miconi D, Geenen G, Frounfelker RL, Levinsson A, Rousseau C. Meaning in life, future orientation and support for violent radicalization among Canadian college students during the CoViD-19 pandemic. Front Psychiatry. 2022;13:765908.

Caplan SE. Problematic Internet use and psychosocial well-being: development of a theory-based cognitive–behavioral measurement instrument. Computers in human behavior. 2002;18(5):553–75.

Gross EF, Juvonen J, Gable SL. Internet use and well-being in adolescence. Journal of social issues. 2002;58(1):75–90.

Miconi D, Rousseau C. Another way out: A positive youth development (PYD) approach to the study of violent radicalization in Quebec (Canada). In: Dimitrova R, Wiium N, editors. Handbook of positive youth development: Advancing research, policy, and practice in global contexts. New York: Springer; 2021. p. 415–29.

Gøtzsche-Astrup O, Van den Bos K, Hogg MA. Radicalization and violent extremism: Perspectives from research on group processes and intergroup relations. Group Processes & Intergroup Relations. 2020;23(8):1127–36.

Leger 360. Leger Marketing. 2022. Available from: https://leger360.com/ .

Moskalenko S, McCauley C. Measuring political mobilization: The distinction between activism and radicalism. Terrorism and political violence. 2009;21(2):239–60.

Frounfelker RL, Frissen T, Miconi D, Lawson J, Brennan RT, d’Haenens L, et al. Transnational evaluation of the Sympathy for Violent Radicalization Scale: Measuring population attitudes toward violent radicalization in two countries. Transcult Psychiatry. 2021;58(5):669–82.

Davis RA, Flett GL, Besser A. Validation of a new scale for measuring problematic Internet use: Implications for pre-employment screening. Cyberpsychology & behavior. 2002;5(4):331–45.

Derogatis LR, Lipman RS, Rickels K, Uhlenhuth EH, Covi L. The Hopkins Symptom Checklist (HSCL): A self-report symptom inventory. Systems Research and Behavioral Science. 1974;19(1):1–15.

Mollica RF, Caspi-Yavin Y, Bollini P, Truong T, Tor S, Lavelle J. The Harvard Trauma Questionnaire: validating a cross-cultural instrument for measuring torture, trauma, and posttraumatic stress disorder in Indochinese refugees. Journal of nervous and mental disease. 1992;180(2):111–6.

R Core Team. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2017.

van Buuren S, Groothuis-Oudshoorn K. mice: Multivariate Imputation by Chained Equations in R. J Stat Softw. 2011;45(3):1–67.

Sterne JA, White IR, Carlin JB, Spratt M, Royston P, Kenward MG, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009;338:b2393.

Rosenberg JM, Beymer PN, Anderson DJ, Van Lissa C, Schmidt JA. tidyLPA: An R package to easily carry out latent profile analysis (LPA) using open-source or commercial software. J Open Source Software. 2019;3(30):978.

Ferguson SL, Moore EWG, Hull DM. Finding latent groups in observed data: A primer on latent profile analysis in Mplus for applied researchers. Int J Behav Devel. 2020;44(5):458–68.

Tein J-Y, Coxe S, Cham H. Statistical power to detect the correct number of classes in latent profile analysis. Struct Equation Model: a multidisciplinary J. 2013;20(4):640–57.

Acknowledgements

Not applicable.

Funding

Our work is funded by a Digital Citizen Contribution Program grant awarded to DM by Canadian Heritage and by a project grant awarded to CR by the Canadian Institutes of Health Research (CIHR).

Author information

Authors and affiliations

Department of Educational Psychology and Adult Education, University of Montréal, Montréal, QC, Canada

Diana Miconi

MUHC Research Institute, Montréal, QC, Canada

Tara Santavicca

College of Health, Lehigh University, Bethlehem, PA, USA

Rochelle L. Frounfelker

Department of Sociology, Université du Québec à Montréal, Montréal, QC, Canada

Aoudou Njingouo Mounchingam

Division of Social and Cultural Psychiatry, McGill University, Montréal, QC, Canada

Cécile Rousseau

Contributions

Author DM contributed to conception and design of the study, interpretation of study findings, and writing the manuscript. Authors TS, RLF and ANM contributed to data analysis and drafting parts of the manuscript. Author CR contributed to the interpretation of study findings and provided feedback on several versions of the manuscript. The authors listed in the byline have agreed to the byline order and to submission of the manuscript in this form. All authors agreed to act as guarantor of the work.

Corresponding author

Correspondence to Diana Miconi.

Ethics declarations

Ethics approval and consent to participate

The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008. Informed consent to participate was obtained from all of the participants in the study. All procedures involving human subjects were approved by the Educational and Psychological Institutional Review Board of the University of Montreal.

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Miconi, D., Santavicca, T., Frounfelker, R.L. et al. Digital media use, depressive symptoms and support for violent radicalization among young Canadians: a latent profile analysis. BMC Psychol 12, 260 (2024). https://doi.org/10.1186/s40359-024-01739-0

Received: 24 October 2023

Accepted: 18 April 2024

Published: 10 May 2024

DOI: https://doi.org/10.1186/s40359-024-01739-0

Keywords

  • Violent radicalization
  • Digital media use
  • Young people
  • Person-centered approach

BMC Psychology

ISSN: 2050-7283

COMMENTS

  1. A Refresher on Regression Analysis

    A Refresher on Regression Analysis. Understanding one of the most important types of data analysis. By Amy Gallo. November 04, 2015. You probably know by now that ...

  2. Regression Analysis

    Regression analysis also helps in predicting health outcomes based on various factors like age, genetic markers, or lifestyle choices. Social Sciences: Regression analysis is widely used in social sciences like sociology, psychology, and education research. Researchers can investigate the impact of variables like income, education level, or ...

  3. Regression Analysis

    Regression analysis is a quantitative research method which is used when the study involves modelling and analysing several variables, where the relationship includes a dependent variable and one or more independent variables. In simple terms, regression analysis is a quantitative method used to test the nature of relationships between a dependent variable and one or more independent variables.

  4. Regression analysis

    In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one or more independent variables (often called 'predictors', 'covariates', 'explanatory variables ...

  5. The clinician's guide to interpreting a regression analysis

    Regression analysis is an important statistical method that is commonly used to determine the relationship between ... Schober P, Vetter TR. Linear regression in medical research. Anesth Analg ...

  6. Regression Analysis

    The aim of linear regression analysis is to estimate the coefficients of the regression equation \( b_0 \) and \( b_k \) (\( k \in K \)) so that the sum of the squared residuals (i.e., the sum over all squared differences between the observed values of the \( i \)th observation of \( y_i \) and the corresponding predicted values \( \hat{y}_i \)) is minimized. The lower part of Fig. 1 illustrates this approach, which is ...

  7. Regression Tutorial with Analysis Examples

    My tutorial helps you go through the regression content in a systematic and logical order. This tutorial covers many facets of regression analysis including selecting the correct type of regression analysis, specifying the best model, interpreting the results, assessing the fit of the model, generating predictions, and checking the assumptions.

  8. Explained: Regression analysis

    The regression analysis creates the single line that best summarizes the distribution of points. Mathematically, the line representing a simple linear regression is expressed through a basic equation: \( Y = a_0 + a_1 X \). Here X is hours spent studying per week, the "independent variable." Y is the exam scores, the "dependent variable ...

  9. Simple Linear Regression

    Regression allows you to estimate how a dependent variable changes as the independent variable(s) change. Simple linear regression example. You are a social researcher interested in the relationship between income and happiness. You survey 500 people whose incomes range from 15k to 75k and ask them to rank their happiness on a scale from 1 to ...

  10. Regression: Models, Methods and Applications

    His main research interests include semiparametric regression, longitudinal data analysis and spatial statistics, with applications ranging from social science and risk management to public health and neuroscience. Thomas Kneib is a Professor of Statistics at the University of Göttingen, Germany, where he is the Speaker of the ...

  11. Understanding and interpreting regression analysis

    Conclusions. Regression analysis is a powerful and useful statistical procedure with many implications for nursing research. It enables researchers to describe, predict and estimate the relationships and draw plausible conclusions about the interrelated variables in relation to any studied phenomena.

  12. Regression Analysis: The Complete Guide

    Regression analysis is a statistical method. It's used for analyzing different factors that might influence an objective - such as the success of a product launch, business growth, a new marketing campaign - and determining which factors are important and which ones can be ignored.

  13. Regression Analysis

    Regression analysis is a technique that permits one to study and measure the relation between two or more variables. Starting from data registered in a sample, regression analysis seeks to determine an estimate of a mathematical relation between two or more variables. The goal is to estimate the value of one variable as a function of one or more other variables.

  14. Regression Analysis for Prediction: Understanding the Process

    Regression analysis is a statistical technique for determining the relationship between a single dependent (criterion) variable and one or more independent (predictor) variables. The analysis yields a predicted value for the criterion resulting from a linear combination of the predictors. ... When reviewing research articles in which regression ...

  15. Regression Analysis

    Author's research in combining clustering and dimensionality reduction for indexing high dimensional data which appeared in (Castelli et al., 2000 ... Regression analysis is a statistical method for analyzing a relationship between two or more variables in such a manner that one variable can be predicted or explained by using information on the ...

  16. Regression Analysis

    Regression analysis is a set of statistical methods used to estimate relationships between a dependent variable and one or more independent variables. ... Programs, hundreds of resources, expert reviews and support, the chance to work with real-world finance and research tools, and more. Discover Full-Immersion Membership.

  17. Regression Analysis

    Analysis and Interpretation of Multivariate Data. D.J. Bartholomew, in International Encyclopedia of Education (Third Edition), 2010 Regression Analysis. Regression analysis is the oldest, and probably, most widely used multivariate technique in the social sciences. Unlike the preceding methods, regression is an example of dependence analysis in which the variables are not treated symmetrically.

  18. What Is Regression Analysis? Types, Importance, and Benefits

    In such a linear regression model, a response variable has a single corresponding predictor variable that impacts its value. For example, consider the linear regression formula: y = 5x + 4. If the value of x is defined as 3, only one outcome of y is possible. Multiple linear regression analysis. In most cases, simple linear regression analysis can't explain the connections between data.

  19. (PDF) Regression Analysis

    Regression analysis allows researchers to understand the relationship between two or more variables by estimating the mathematical relationship between them (Sarstedt & Mooi, 2014). In this case ...

  20. A Beginner's Guide to Regression Analysis

    Logistic Regression. Logistic Regression comes into play when the dependent variable is discrete. This means that the target value will only have one of two values. For instance, a true or false, a yes or no, a 0 or 1, and so on. In this case, a sigmoid curve describes the relationship between the independent and dependent variables.

  21. Regression Analysis: Definition, Types, Usage & Advantages

    Overall, regression analysis saves the survey researchers' additional efforts in arranging several independent variables in tables and testing or calculating their effect on a dependent variable. Different types of analytical research methods are widely used to evaluate new business ideas and make informed decisions.

  22. What Is Regression Analysis in Business Analytics?

    Regression analysis is the statistical method used to determine the structure of a relationship between two variables (single linear regression) or three or more variables (multiple regression). According to the Harvard Business School Online course Business Analytics, regression is used for two primary purposes: To study the magnitude and ...

  23. Regression: Definition, Analysis, Calculation, and Example

    Regression is a statistical measure used in finance, investing and other disciplines that attempts to determine the strength of the relationship between one dependent variable (usually denoted by ...

  24. Multiple Linear Regression Model

    Multiple Linear Regression Model. December 2023. DOI: 10.1007/978-3-031-37865-2_18. In book: Introduction to Probability, Statistics & R (pp. 405-440). Authors: Sujit K. Sahu.

  25. Harness BI with Regression Analysis Benefits

    Regression analysis is a statistical method used to determine the relationship between a dependent variable and one or more independent variables. The goal is to model the expected value of the ...

  26. Sequential estimation for the multiple linear regression models with

    Sequential analysis (SA) as a sampling technique has notable advantages like smaller average sample size and reduced value of risk compared to similarly comparable fixed-sample techniques. In this study, we first propose a few models for the estimation of the regression parameters or functions of parameters under the multiple linear regression ...

  27. Exploring predictors and prevalence of postpartum depression among

    The data underwent logistic regression analysis using SPSS-IBM 27 to list potential factors that could predict PPD. The overall frequency of PPD in the total sample was 92 (13.6%). It ranged from 2.3% in Syria to 26% in Ghana. Only 42 (6.2%) were diagnosed. Multiple logistic regression analysis revealed there were significant predictors of PPD.

  28. Geospatial pattern of level of minimum acceptable diet and its

    ArcGIS Pro and SaTScan version 9.6 were used to map the geographical distribution of failure to achieve the minimum acceptable diet. A multiscale geographically weighted regression analysis was done to identify significant determinants of the level of minimum acceptable diet.

  29. CD163

    Multivariable Cox regression models assessing associations between CD163+ cell density tertiles with OS and BCSS within subtype, additionally adjusting for self-identified race and grade in Model 2. Table S4. Multivariable Cox regression models assessing associations of additional CD163+ cell density cutoffs with OS within subtype. Table S5.

  30. Digital media use, depressive symptoms and support for violent

    Background: Despite the prominent role that digital media play in the lives and mental health of young people as well as in violent radicalization (VR) processes, empirical research aimed to investigate the association between Internet use, depressive symptoms and support for VR among young people is scant. We adopt a person-centered approach to investigate patterns of digital media use and ...
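
Several of the excerpts above (items 6, 8 and 18) describe simple linear regression as choosing the line Y = a0 + a1 X that minimizes the sum of squared residuals. The sketch below is one minimal way to illustrate that idea in plain Python, using the standard closed-form least-squares solution for a single predictor; it is not taken from any of the listed sources. The function names and the hours/scores data are hypothetical, and the data are constructed to lie exactly on y = 5x + 4 so that the recovered coefficients are easy to check by hand.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (a0, a1) minimizing sum((y_i - (a0 + a1 * x_i)) ** 2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a1 = sxy / sxx            # slope
    a0 = y_bar - a1 * x_bar   # intercept
    return a0, a1


def sum_squared_residuals(x, y, a0, a1):
    """Sum of squared differences between observed and predicted y."""
    return sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: hours studied per week and exam scores,
# built to satisfy y = 5x + 4 exactly (no noise).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [9.0, 14.0, 19.0, 24.0, 29.0]

a0, a1 = fit_simple_linear_regression(hours, scores)
print("intercept:", a0, "slope:", a1)
print("sum of squared residuals:", sum_squared_residuals(hours, scores, a0, a1))
```

On real, noisy data the sum of squared residuals would be positive, and a statistics package would normally be preferred, since it also reports standard errors and fit diagnostics.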