Korean J Anesthesiol. v.71(2); 2018 Apr

Introduction to systematic review and meta-analysis

1 Department of Anesthesiology and Pain Medicine, Inje University Seoul Paik Hospital, Seoul, Korea

2 Department of Anesthesiology and Pain Medicine, Chung-Ang University College of Medicine, Seoul, Korea

Systematic reviews and meta-analyses present results by combining and analyzing data from different studies conducted on similar research topics. In recent years, systematic reviews and meta-analyses have been actively performed in various fields including anesthesiology. These research methods are powerful tools that can overcome the difficulties in performing large-scale randomized controlled trials. However, the inclusion of studies with any biases or improperly assessed quality of evidence in systematic reviews and meta-analyses could yield misleading results. Therefore, various guidelines have been suggested for conducting systematic reviews and meta-analyses to help standardize them and improve their quality. Nonetheless, accepting the conclusions of many studies without understanding the meta-analysis can be dangerous. Therefore, this article provides an easy introduction to clinicians on performing and understanding meta-analyses.

Introduction

A systematic review collects all possible studies related to a given topic and design, and reviews and analyzes their results [ 1 ]. During the systematic review process, the quality of studies is evaluated, and a statistical meta-analysis of the study results is conducted on the basis of their quality. A meta-analysis is a valid, objective, and scientific method of analyzing and combining different results. Usually, in order to obtain more reliable results, a meta-analysis is mainly conducted on randomized controlled trials (RCTs), which have a high level of evidence [ 2 ] ( Fig. 1 ). Since 1999, various papers have presented guidelines for reporting meta-analyses of RCTs. Following the Quality of Reporting of Meta-analyses (QUOROM) statement [ 3 ], and the appearance of registers such as Cochrane Library’s Methodology Register, a large number of systematic literature reviews have been registered. In 2009, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [ 4 ] was published, and it greatly helped standardize and improve the quality of systematic reviews and meta-analyses [ 5 ].

Fig. 1. Levels of evidence.

In anesthesiology, the importance of systematic reviews and meta-analyses has been highlighted, and they provide diagnostic and therapeutic value to various areas, including not only perioperative management but also intensive care and outpatient anesthesia [6–13]. Systematic reviews and meta-analyses include various topics, such as comparing various treatments of postoperative nausea and vomiting [ 14 , 15 ], comparing general anesthesia and regional anesthesia [ 16 – 18 ], comparing airway maintenance devices [ 8 , 19 ], comparing various methods of postoperative pain control (e.g., patient-controlled analgesia pumps, nerve block, or analgesics) [ 20 – 23 ], comparing the precision of various monitoring instruments [ 7 ], and meta-analysis of dose-response in various drugs [ 12 ].

Thus, literature reviews and meta-analyses are being conducted in diverse medical fields, and the aim of highlighting their importance is to help better extract accurate, good quality data from the flood of data being produced. However, a lack of understanding about systematic reviews and meta-analyses can lead to incorrect outcomes being derived from the review and analysis processes. If readers indiscriminately accept the results of the many meta-analyses that are published, incorrect data may be obtained. Therefore, in this review, we aim to describe the contents and methods used in systematic reviews and meta-analyses in a way that is easy to understand for future authors and readers of systematic reviews and meta-analyses.

Study Planning

It is easy to confuse systematic reviews and meta-analyses. A systematic review is an objective, reproducible method to find answers to a certain research question, by collecting all available studies related to that question and reviewing and analyzing their results. A meta-analysis differs from a systematic review in that it uses statistical methods on estimates from two or more different studies to form a pooled estimate [ 1 ]. Following a systematic review, if it is not possible to form a pooled estimate, it can be published as is without progressing to a meta-analysis; however, if it is possible to form a pooled estimate from the extracted data, a meta-analysis can be attempted. Systematic reviews and meta-analyses usually proceed according to the flowchart presented in Fig. 2 . We explain each of the stages below.

Fig. 2. Flowchart illustrating a systematic review.

Formulating research questions

A systematic review attempts to gather all available empirical research by using clearly defined, systematic methods to obtain answers to a specific question. A meta-analysis is the statistical process of analyzing and combining results from several similar studies. Here, the definition of the word “similar” is not made clear, but when selecting a topic for the meta-analysis, it is essential to ensure that the different studies present data that can be combined. If the studies contain data on the same topic that can be combined, a meta-analysis can even be performed using data from only two studies. However, study selection via a systematic review is a precondition for performing a meta-analysis, and it is important to clearly define the Population, Intervention, Comparison, Outcomes (PICO) parameters that are central to evidence-based research. In addition, selection of the research topic should be based on logical evidence, and it is important to choose a topic that is familiar to readers but for which the evidence has not yet been clearly established [ 24 ].

Protocols and registration

In systematic reviews, prior registration of a detailed research plan is very important. In order to make the research process transparent, primary/secondary outcomes and methods are set in advance, and in the event of changes to the method, other researchers and readers are informed of when, how, and why the changes were made. Many studies are registered with an organization like PROSPERO ( http://www.crd.york.ac.uk/PROSPERO/ ), and the registration number is recorded when reporting the study, in order to share the protocol at the time of planning.

Defining inclusion and exclusion criteria

Information is included on the study design, patient characteristics, publication status (published or unpublished), language used, and research period. If there is a discrepancy between the number of patients included in the study and the number of patients included in the analysis, this needs to be clearly explained while describing the patient characteristics, to avoid confusing the reader.

Literature search and study selection

In order to secure a proper basis for evidence-based research, it is essential to perform a broad search that includes as many studies as possible that meet the inclusion and exclusion criteria. Typically, the three bibliographic databases Medline, Embase, and Cochrane Central Register of Controlled Trials (CENTRAL) are used. In domestic studies, the Korean databases KoreaMed, KMBASE, and RISS4U may be included. Effort is required to identify not only published studies but also abstracts, ongoing studies, and studies awaiting publication. Among the studies retrieved in the search, the researchers remove duplicate studies, select studies that meet the inclusion/exclusion criteria based on the abstracts, and then make the final selection of studies based on their full text. In order to maintain transparency and objectivity throughout this process, study selection is conducted independently by at least two investigators. When opinions are inconsistent, the disagreement is resolved through discussion or by a third reviewer. The methods for this process also need to be planned in advance. It is essential to ensure the reproducibility of the literature selection process [ 25 ].

Quality of evidence

However well planned the systematic review or meta-analysis is, if the quality of evidence in the studies is low, the quality of the meta-analysis decreases and incorrect results can be obtained [ 26 ]. Even when using randomized studies with a high quality of evidence, evaluating the quality of evidence precisely helps determine the strength of recommendations in the meta-analysis. One method of evaluating the quality of evidence in non-randomized studies is the Newcastle-Ottawa Scale, provided by the Ottawa Hospital Research Institute 1) . However, we are mostly focusing on meta-analyses that use randomized studies.

If the Grading of Recommendations, Assessment, Development and Evaluations (GRADE) system ( http://www.gradeworkinggroup.org/ ) is used, the quality of evidence is evaluated on the basis of the study limitations, inaccuracies, incompleteness of outcome data, indirectness of evidence, and risk of publication bias, and this is used to determine the strength of recommendations [ 27 ]. As shown in Table 1 , the study limitations are evaluated using the “risk of bias” method proposed by Cochrane 2) . This method classifies bias in randomized studies as “low,” “high,” or “unclear” on the basis of the presence or absence of six processes (random sequence generation, allocation concealment, blinding participants or investigators, incomplete outcome data, selective reporting, and other biases) [ 28 ].

The Cochrane Collaboration’s Tool for Assessing the Risk of Bias [ 28 ]

Data extraction

Two different investigators extract data based on the objectives and form of the study; thereafter, the extracted data are reviewed. Since the size and format of each variable are different, the size and format of the outcomes are also different, and slight changes may be required when combining the data [ 29 ]. If there are differences in the size and format of the outcome variables that cause difficulties combining the data, such as the use of different evaluation instruments or different evaluation timepoints, the analysis may be limited to a systematic review. The investigators resolve differences of opinion by debate, and if they fail to reach a consensus, a third reviewer is consulted.

Data Analysis

The aim of a meta-analysis is to derive a conclusion with greater power and accuracy than could be achieved in any individual study. Therefore, before analysis, it is crucial to evaluate the direction of effect, size of effect, homogeneity of effects among studies, and strength of evidence [ 30 ]. Thereafter, the data are reviewed qualitatively and quantitatively. If it is determined that the different research outcomes cannot be combined, all the results and characteristics of the individual studies are displayed in a table or in a descriptive form; this is referred to as a qualitative review. A meta-analysis is a quantitative review, in which the clinical effectiveness is evaluated by calculating the weighted pooled estimate for the interventions in at least two separate studies.

The pooled estimate is the outcome of the meta-analysis, and is typically explained using a forest plot ( Figs. 3 and 4 ). The black squares in the forest plot are the odds ratios (ORs) and 95% confidence intervals in each study. The area of the squares represents the weight reflected in the meta-analysis. The black diamond represents the OR and 95% confidence interval calculated across all the included studies. The bold vertical line represents a lack of therapeutic effect (OR = 1); if the confidence interval includes OR = 1, it means no significant difference was found between the treatment and control groups.
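As a concrete illustration of a single row of a forest plot, the OR and its 95% confidence interval can be computed from one study's 2×2 table. The counts below are entirely hypothetical, and the log-scale (Woolf) standard error is one common choice rather than the only one:

```python
import math

# Hypothetical 2x2 table for a single trial (all counts are assumed,
# purely for illustration):
a, b = 12, 88   # treatment group: events, non-events
c, d = 24, 76   # control group:   events, non-events

odds_ratio = (a * d) / (b * c)

# 95% CI computed on the log-odds scale (Woolf method).
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Because the resulting interval excludes OR = 1, this hypothetical study on its own would show a significant difference between groups; on a forest plot it would appear as one square whose horizontal line does not cross the vertical OR = 1 line.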

Fig. 3. Forest plot analyzed by two different models using the same data. (A) Fixed-effect model. (B) Random-effect model. The figure depicts individual trials as filled squares with the relative sample size and the solid line as the 95% confidence interval of the difference. The diamond shape indicates the pooled estimate and uncertainty for the combined effect. The vertical line indicates that the treatment group shows no effect (OR = 1). Moreover, if the confidence interval includes 1, then the result shows no evidence of a difference between the treatment and control groups.

Fig. 4. Forest plot representing homogeneous data.

Dichotomous variables and continuous variables

In data analysis, outcome variables can be considered broadly in terms of dichotomous variables and continuous variables. When combining data from continuous variables, the mean difference (MD) and standardized mean difference (SMD) are used ( Table 2 ).

Summary of Meta-analysis Methods Available in RevMan [ 28 ]

The MD is the absolute difference in mean values between the groups, and the SMD is the mean difference between groups divided by the standard deviation. When results are presented in the same units, the MD can be used, but when results are presented in different units, the SMD should be used. When the MD is used, the combined units must be shown. A value of “0” for the MD or SMD indicates that the effects of the new treatment method and the existing treatment method are the same. A value lower than “0” means the new treatment method is less effective than the existing method, and a value greater than “0” means the new treatment is more effective than the existing method.
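The MD and SMD calculations above can be sketched in a few lines. The summary statistics below are invented for illustration; the SMD here is Cohen's d (pooled-SD version), whereas software such as RevMan applies an additional small-sample correction (Hedges' g):

```python
import math

# Hypothetical summary statistics for one two-arm study (assumed numbers):
mean_t, sd_t, n_t = 4.2, 1.1, 40   # treatment group
mean_c, sd_c, n_c = 5.0, 1.3, 40   # control group

# Mean difference: usable only when studies report the outcome in the same units.
md = mean_t - mean_c

# Standardized mean difference (Cohen's d): MD divided by the pooled SD,
# usable when studies report the outcome in different units.
sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
smd = md / sd_pooled

print(f"MD = {md:.2f}, SMD = {smd:.2f}")
```

Both values come out below 0 here, meaning the hypothetical new treatment lowered the outcome relative to control.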

When combining data for dichotomous variables, the OR, risk ratio (RR), or risk difference (RD) can be used. The RR and RD can be used for RCTs, quasi-experimental studies, or cohort studies, and the OR can be used for other case-control studies or cross-sectional studies. However, because the OR is difficult to interpret, using the RR and RD, if possible, is recommended. If the outcome variable is a dichotomous variable, it can be presented as the number needed to treat (NNT), which is the minimum number of patients who need to be treated in the intervention group, compared to the control group, for a given event to occur in at least one patient. Based on Table 3 , in an RCT, if x is the probability of the event occurring in the control group and y is the probability of the event occurring in the intervention group, then x = c/(c + d), y = a/(a + b), and the absolute risk reduction (ARR) = x − y. NNT can be obtained as the reciprocal, 1/ARR.
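Using the notation of Table 3, the RR, ARR, and NNT for a hypothetical RCT can be computed as follows (the counts are assumed, not taken from any real trial):

```python
# Hypothetical RCT counts laid out as in Table 3 (all numbers assumed):
a, b = 10, 90   # intervention group: events, non-events
c, d = 25, 75   # control group:      events, non-events

y = a / (a + b)   # event probability in the intervention group
x = c / (c + d)   # event probability in the control group

rr  = y / x       # risk ratio
arr = x - y       # absolute risk reduction (the risk difference)
nnt = 1 / arr     # number needed to treat

print(f"RR = {rr:.2f}, ARR = {arr:.2f}, NNT = {nnt:.1f}")
```

In practice the NNT is rounded up to the next whole patient, so an ARR of 0.15 corresponds to treating 7 patients to prevent one event.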

Calculation of the Number Needed to Treat from the Dichotomous Table

Fixed-effect models and random-effect models

In order to analyze effect size, two types of models can be used: a fixed-effect model or a random-effect model. A fixed-effect model assumes that the effect of treatment is the same across studies, and that variation between results in different studies is due to random error. Thus, a fixed-effect model can be used when the studies are considered to have the same design and methodology, or when the variability in results within a study is small, and the variance is thought to be due to random error. Three common methods are used for weighted estimation in a fixed-effect model: 1) inverse variance-weighted estimation 3) , 2) Mantel-Haenszel estimation 4) , and 3) Peto estimation 5) .
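A minimal sketch of inverse variance-weighted fixed-effect pooling follows; the effect estimates and standard errors are invented for illustration:

```python
# Inverse variance-weighted fixed-effect pooling (a minimal sketch;
# the effect estimates and standard errors below are invented):
effects = [0.50, -0.10, 0.30]   # e.g., log odds ratios from three studies
ses     = [0.10, 0.20, 0.15]    # their standard errors

# Each study is weighted by 1 / SE^2, so precise (usually large) studies dominate.
weights = [1 / se**2 for se in ses]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
se_pooled = (1 / sum(weights)) ** 0.5

print(f"pooled effect = {pooled:.3f} (SE = {se_pooled:.3f})")
```

Note how the first study, with the smallest standard error, pulls the pooled estimate strongly toward its own value.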

A random-effect model assumes heterogeneity between the studies being combined, and these models are used when the studies are assumed to be different, even if a heterogeneity test does not show a significant result. Unlike a fixed-effect model, a random-effect model assumes that the size of the effect of treatment differs among studies. Thus, differences in variation among studies are thought to be due to not only random error but also between-study variability in results. Therefore, weight does not decrease greatly for studies with a small number of patients. Among methods for weighted estimation in a random-effect model, the DerSimonian and Laird method 6) , the simplest method, is mostly used for dichotomous variables, while inverse variance-weighted estimation is used for continuous variables, as with fixed-effect models. These four methods are all used in Review Manager software (The Cochrane Collaboration, UK), and are described in a study by Deeks et al. [ 31 ] ( Table 2 ). However, when the number of studies included in the analysis is less than 10, the Hartung-Knapp-Sidik-Jonkman method 7) can better reduce the risk of type 1 error than does the DerSimonian and Laird method [ 32 ].
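A sketch of the DerSimonian and Laird estimator on invented data: Cochran's Q yields a method-of-moments estimate of the between-study variance tau², which is added to each study's within-study variance before weighting:

```python
# DerSimonian and Laird random-effect pooling (sketch; the three
# effect estimates and standard errors below are invented).
effects = [0.50, -0.10, 0.30]
ses     = [0.10, 0.20, 0.15]

w = [1 / se**2 for se in ses]                     # fixed-effect weights
fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)

# Cochran's Q, then the method-of-moments estimate of tau^2.
q = sum(wi * (e - fixed)**2 for wi, e in zip(w, effects))
df = len(effects) - 1
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)

# Random-effect weights add tau^2 to each within-study variance,
# which flattens the weights across large and small studies.
w_re = [1 / (se**2 + tau2) for se in ses]
pooled_re = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)

print(f"tau^2 = {tau2:.4f}, random-effect pooled = {pooled_re:.3f}")
```

Because tau² inflates every study's variance by the same absolute amount, small studies lose less weight relative to large ones than under the fixed-effect model, which is exactly the behavior described above.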

Fig. 3 shows the results of analyzing outcome data using a fixed-effect model (A) and a random-effect model (B). As shown in Fig. 3 , while the results from large studies are weighted more heavily in the fixed-effect model, studies are given relatively similar weights irrespective of study size in the random-effect model. Although identical data were being analyzed, as shown in Fig. 3 , the significant result in the fixed-effect model was no longer significant in the random-effect model. One representative example of the small study effect in a random-effect model is the meta-analysis by Li et al. [ 33 ]. In a large-scale study, intravenous injection of magnesium was unrelated to acute myocardial infarction, but in the random-effect model, which included numerous small studies, the small study effect resulted in an association being found between intravenous injection of magnesium and myocardial infarction. This small study effect can be controlled for by using a sensitivity analysis, which is performed to examine the contribution of each of the included studies to the final meta-analysis result. In particular, when heterogeneity is suspected in the study methods or results, by changing certain data or analytical methods, this method makes it possible to verify whether the changes affect the robustness of the results, and to examine the causes of such effects [ 34 ].

Heterogeneity

A homogeneity test examines whether the variation in effect sizes across studies is greater than would be expected from sampling error alone; in other words, it tests whether the effect sizes calculated from the several studies can be regarded as the same. Three approaches can be used: 1) the forest plot, 2) Cochran’s Q test (chi-squared), and 3) the Higgins I 2 statistic. In the forest plot, as shown in Fig. 4 , greater overlap between the confidence intervals indicates greater homogeneity. For the Q statistic, when the P value of the chi-squared test, calculated from the forest plot in Fig. 4 , is less than 0.1, the studies are considered statistically heterogeneous and a random-effect model can be used. Finally, I 2 can be used [ 35 ].

I 2 , calculated as I 2 = 100% × (Q − df)/Q, where Q is Cochran’s heterogeneity statistic and df its degrees of freedom, returns a value between 0% and 100%. A value less than 25% is considered to show strong homogeneity, a value of 50% is moderate, and a value greater than 75% indicates strong heterogeneity.
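Given study-level estimates and standard errors (invented here), Q and I² follow directly:

```python
# Cochran's Q and Higgins' I^2 from invented study-level data.
effects = [0.50, -0.10, 0.30]
ses     = [0.10, 0.20, 0.15]

w = [1 / se**2 for se in ses]
pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
q = sum(wi * (e - pooled)**2 for wi, e in zip(w, effects))
df = len(effects) - 1

i2 = max(0.0, 100 * (q - df) / q)   # I^2 = 100% x (Q - df) / Q, floored at 0
print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.0f}%")
```

With these made-up numbers I² comes out near 73%, i.e., between "moderate" and "strong" heterogeneity on the scale above, so a random-effect model would be the natural choice.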

Even when the data cannot be shown to be homogeneous, a fixed-effect model can be used, ignoring the heterogeneity, and all the study results can be presented individually, without combining them. However, in many cases, a random-effect model is applied, as described above, and a subgroup analysis or meta-regression analysis is performed to explain the heterogeneity. In a subgroup analysis, the data are divided into subgroups that are expected to be homogeneous, and these subgroups are analyzed. This needs to be planned in the predetermined protocol before starting the meta-analysis. A meta-regression analysis is similar to a normal regression analysis, except that the heterogeneity between studies is modeled. This process involves performing a regression analysis of the pooled estimate on covariates at the study level, and so it is usually not considered when the number of studies is less than 10. Here, univariate and multivariate regression analyses can both be considered.

Publication bias

Publication bias is the most common type of reporting bias in meta-analyses. This refers to the distortion of meta-analysis outcomes due to the higher likelihood of publication of statistically significant studies rather than non-significant studies. In order to test the presence or absence of publication bias, first, a funnel plot can be used ( Fig. 5 ). Studies are plotted on a scatter plot with effect size on the x-axis and precision or total sample size on the y-axis. If the points form an upside-down funnel shape, with a broad base that narrows towards the top of the plot, this indicates the absence of a publication bias ( Fig. 5A ) [ 29 , 36 ]. On the other hand, if the plot shows an asymmetric shape, with no points on one side of the graph, then publication bias can be suspected ( Fig. 5B ). Second, to test publication bias statistically, Begg and Mazumdar’s rank correlation test 8) [ 37 ] or Egger’s test 9) [ 29 ] can be used. If publication bias is detected, the trim-and-fill method 10) can be used to correct the bias [ 38 ]. Fig. 6 displays results that show publication bias in Egger’s test, which has then been corrected using the trim-and-fill method using Comprehensive Meta-Analysis software (Biostat, USA).
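The core of Egger's test can be sketched as an ordinary regression. All study data below are invented, and the smaller (less precise) studies were deliberately given larger effects so that the intercept comes out far from zero:

```python
# Egger's regression test (sketch): regress each study's standard normal
# deviate (effect / SE) on its precision (1 / SE). An intercept far from
# zero suggests funnel-plot asymmetry, i.e., possible publication bias.
effects = [0.80, 0.55, 0.40, 0.30, 0.25]   # e.g., log risk ratios (invented)
ses     = [0.40, 0.30, 0.20, 0.12, 0.08]   # standard errors, small to large studies

y = [e / s for e, s in zip(effects, ses)]   # standard normal deviates
x = [1 / s for s in ses]                    # precision

# Ordinary least squares by hand (slope and intercept).
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x         # Egger's bias estimate

print(f"Egger intercept = {intercept:.2f}")
```

A complete implementation would also compute the intercept's standard error and a t-test P value; meta-analysis packages such as Comprehensive Meta-Analysis or R's metafor do this automatically.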

Fig. 5. Funnel plot showing the effect size on the x-axis and sample size on the y-axis as a scatter plot. (A) Funnel plot without publication bias. The individual plots are broader at the bottom and narrower at the top. (B) Funnel plot with publication bias. The individual plots are located asymmetrically.

Fig. 6. Funnel plot adjusted using the trim-and-fill method. White circles: comparisons included. Black circles: comparisons imputed by the trim-and-fill method. White diamond: pooled observed log risk ratio. Black diamond: pooled imputed log risk ratio.

Result Presentation

When reporting the results of a systematic review or meta-analysis, the analytical content and methods should be described in detail. First, a flowchart is displayed with the literature search and selection process according to the inclusion/exclusion criteria. Second, a table is shown with the characteristics of the included studies. A table should also be included with information related to the quality of evidence, such as GRADE ( Table 4 ). Third, the results of data analysis are shown in a forest plot and funnel plot. Fourth, if the results use dichotomous data, the NNT values can be reported, as described above.

The GRADE Evidence Quality for Each Outcome

N: number of studies, ROB: risk of bias, PON: postoperative nausea, POV: postoperative vomiting, PONV: postoperative nausea and vomiting, CI: confidence interval, RR: risk ratio, AR: absolute risk.

When Review Manager software (The Cochrane Collaboration, UK) is used for the analysis, two types of P values are given. The first is the P value from the z-test, which tests the null hypothesis that the intervention has no effect. The second P value is from the chi-squared test, which tests the null hypothesis that the studies are homogeneous (i.e., that there is no heterogeneity). The statistical result for the intervention effect, which is generally considered the most important result in meta-analyses, is the z-test P value.

A common mistake when reporting results is, given a z-test P value greater than 0.05, to say there was “no statistical significance” or “no difference.” When evaluating statistical significance in a meta-analysis, a P value lower than 0.05 can be explained as “a significant difference in the effects of the two treatment methods.” However, the P value may appear non-significant whether or not there is a difference between the two treatment methods. In such a situation, it is better to state that “there was no strong evidence for an effect,” and to present the P value and confidence intervals. Another common mistake is to think that a smaller P value indicates a larger or more important effect. In meta-analyses, the P value is affected more by the number of studies and patients included than by the size of the effect itself; therefore, care should be taken when interpreting the results of a meta-analysis.

When performing a systematic literature review or meta-analysis, if the quality of studies is not properly evaluated or if proper methodology is not strictly applied, the results can be biased and the outcomes can be incorrect. However, when systematic reviews and meta-analyses are properly implemented, they can yield powerful results that could usually only be achieved using large-scale RCTs, which are difficult to perform in individual studies. As our understanding of evidence-based medicine increases and its importance is better appreciated, the number of systematic reviews and meta-analyses will keep increasing. However, indiscriminate acceptance of the results of all these meta-analyses can be dangerous, and hence, we recommend that their results be received critically on the basis of a more accurate understanding.

1) http://www.ohri.ca .

2) http://methods.cochrane.org/bias/assessing-risk-bias-included-studies .

3) The inverse variance-weighted estimation method is useful if the number of studies is small with large sample sizes.

4) The Mantel-Haenszel estimation method is useful if the number of studies is large with small sample sizes.

5) The Peto estimation method is useful if the event rate is low or one of the two groups shows zero incidence.

6) The most popular and simplest statistical method used in Review Manager and Comprehensive Meta-analysis software.

7) Alternative random-effect model meta-analysis that has more adequate error rates than does the common DerSimonian and Laird method, especially when the number of studies is small. However, even with the Hartung-Knapp-Sidik-Jonkman method, when there are fewer than five studies with very unequal sizes, extra caution is needed.

8) The Begg and Mazumdar rank correlation test uses the correlation between the ranks of effect sizes and the ranks of their variances [ 37 ].

9) The degree of funnel plot asymmetry as measured by the intercept from the regression of standard normal deviates against precision [ 29 ].

10) If there are more small studies on one side, we expect the suppression of studies on the other side. Trimming yields the adjusted effect size and reduces the variance of the effects by adding the original studies back into the analysis as a mirror image of each study.

Systematic Reviews and Meta Analysis


Systematic review Q & A

What is a systematic review?

A systematic review is guided filtering and synthesis of all available evidence addressing a specific, focused research question, generally about a specific intervention or exposure. The use of standardized, systematic methods and pre-selected eligibility criteria reduces the risk of bias in identifying, selecting, and analyzing relevant studies. A well-designed systematic review includes clear objectives, pre-selected criteria for identifying eligible studies, an explicit methodology, a thorough and reproducible search of the literature, an assessment of the validity or risk of bias of each included study, and a systematic synthesis, analysis and presentation of the findings of the included studies. A systematic review may include a meta-analysis.

For details about carrying out systematic reviews, see the Guides and Standards section of this guide.

Is my research topic appropriate for systematic review methods?

A systematic review is best deployed to test a specific hypothesis about a healthcare or public health intervention or exposure. By focusing on a single intervention or a few specific interventions for a particular condition, the investigator can ensure a manageable results set. Moreover, examining a single or small set of related interventions, exposures, or outcomes, will simplify the assessment of studies and the synthesis of the findings.

Systematic reviews are poor tools for hypothesis generation: for instance, to determine what interventions have been used to increase the awareness and acceptability of a vaccine or to investigate the ways that predictive analytics have been used in health care management. In the first case, we don't know what interventions to search for and so have to screen all the articles about awareness and acceptability. In the second, there is no agreed-upon set of methods that make up predictive analytics, and health care management is far too broad. The search will necessarily be incomplete, vague and very large all at the same time. In most cases, reviews without clearly and exactly specified populations, interventions, exposures, and outcomes will produce results sets that quickly outstrip the resources of a small team and offer no consistent way to assess and synthesize findings from the studies that are identified.

If not a systematic review, then what?

You might consider performing a scoping review . This framework allows iterative searching over a reduced number of data sources and does not require assessing individual studies for risk of bias. The framework includes built-in mechanisms to adjust the analysis as the work progresses and more is learned about the topic. A scoping review won't help you limit the number of records you'll need to screen (broad questions lead to large results sets) but may give you a means of dealing with a large set of results.

This tool can help you decide what kind of review is right for your question.

Can my student complete a systematic review during her summer project?

Probably not. Systematic reviews are a lot of work. Between creating the protocol, building and running a quality search, collecting all the papers, evaluating the studies that meet the inclusion criteria, and extracting and analyzing the summary data, a well-done review can require dozens to hundreds of hours of work spanning several months. Moreover, a systematic review requires subject expertise, statistical support and a librarian to help design and run the search. Be aware that librarians sometimes have queues for their search time. It may take several weeks to complete and run a search. Moreover, all guidelines for carrying out systematic reviews recommend that at least two subject experts screen the studies identified in the search. The first round of screening can consume 1 hour per screener for every 100-200 records. A systematic review is a labor-intensive team effort.

How can I know if my topic has been reviewed already?

Before starting out on a systematic review, check to see if someone has done it already. In PubMed you can use the systematic review subset to limit to a broad group of papers that is enriched for systematic reviews. You can invoke the subset by selecting it from the Article Types filters to the left of your PubMed results, or you can append AND systematic[sb] to your search. For example:

"neoadjuvant chemotherapy" AND systematic[sb]

The systematic review subset is very noisy, however. To quickly focus on systematic reviews (knowing that you may be missing some), simply search for the word systematic in the title:

"neoadjuvant chemotherapy" AND systematic[ti]

Any PRISMA-compliant systematic review will be captured by this method since including the words "systematic review" in the title is a requirement of the PRISMA checklist. Cochrane systematic reviews do not include 'systematic' in the title, however. It's worth checking the Cochrane Database of Systematic Reviews independently.
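The same filter syntax also works outside the web interface. As a sketch, NCBI's E-utilities `esearch` endpoint accepts PubMed queries verbatim; the helper below only builds the request URL (fetching it requires network access, and the hit count changes over time):

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(query):
    """Build an NCBI E-utilities esearch URL for a PubMed query.

    Filters such as systematic[sb] or systematic[ti] are passed through
    unchanged in the term parameter.
    """
    return EUTILS + "?" + urlencode(
        {"db": "pubmed", "term": query, "retmode": "json"}
    )

# The title-word filter from above, ready to fetch with any HTTP client:
url = pubmed_search_url('"neoadjuvant chemotherapy" AND systematic[ti]')
```

The JSON response's `esearchresult.count` field tells you how many matching records exist before you commit to screening them.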

You can also search for protocols that will indicate that another group has set out on a similar project. Many investigators register their protocols in PROSPERO, a registry of review protocols. Other published protocols, as well as Cochrane Review protocols, appear in the Cochrane Methodology Register, part of the Cochrane Library.

  • Last Updated: Feb 26, 2024 3:17 PM
  • URL: https://guides.library.harvard.edu/meta-analysis

Handbook of Clinical Psychology Competencies, pp 483–500

Literature Reviews and Meta Analysis

  • Joseph A. Durlak 3  
  • Reference work entry


This chapter discusses the most common research methodology in psychology: the literature review. Reviews generally have three purposes: (1) to critically evaluate and summarize a body of research; (2) to reach some conclusions about that research; and (3) to offer suggestions for future work. The basic and expert competencies required for completing a high-quality literature review are described through seven major components of reviews, along with relevant questions that should be answered to assess the successful completion of each component. A major focus is on meta-analysis, but the guidelines are pertinent to assessing the quality of various types of reviews, including reviews of theories and clinical applications. Readers are directed to additional helpful resources to aid them in becoming critical consumers or producers of good literature reviews.




Author information

Authors and Affiliations

Loyola University, Chicago, IL, USA

Joseph A. Durlak


Editor information

Editors and Affiliations

School of Professional Psychology, Pacific University, 511 SW 10th Avenue, 4th floor, 97205, Portland, OR, USA

Jay C. Thomas Ph.D., ABPP (Professor and Assistant Dean)

School of Professional Psychology, HCP/Pacific University, 222 SE 8th Ave., Suite 563, 97123-4218, Hillsboro, OR, USA

Michel Hersen Ph.D., ABPP (Professor and Dean)


Copyright information

© 2010 Springer Science+Business Media LLC

About this entry

Cite this entry.

Durlak, J.A. (2010). Literature Reviews and Meta Analysis. In: Thomas, J.C., Hersen, M. (eds) Handbook of Clinical Psychology Competencies. Springer, New York, NY. https://doi.org/10.1007/978-0-387-09757-2_18


DOI : https://doi.org/10.1007/978-0-387-09757-2_18

Publisher Name : Springer, New York, NY

Print ISBN : 978-0-387-09756-5

Online ISBN : 978-0-387-09757-2

eBook Packages : Behavioral Science



Systematic reviews vs meta-analysis: what’s the difference?

Posted on 24th July 2023 by Verónica Tanco Tellechea


You may hear the terms 'systematic review' and 'meta-analysis' being used interchangeably. Although they are related, they are distinctly different. Learn more in this blog for beginners.

What is a systematic review?

According to Cochrane (1), a systematic review attempts to identify, appraise and synthesize all the empirical evidence to answer a specific research question. Thus, a systematic review is where you might find the most relevant, adequate, and current information regarding a specific topic. In the levels of evidence pyramid, systematic reviews are only surpassed by meta-analyses.

To conduct a systematic review, you will need, among other things: 

  • A specific research question, usually in the form of a PICO question.
  • Pre-specified eligibility criteria, to decide which articles will be included or discarded from the review. 
  • To follow a systematic method that will minimize bias.

You can find protocols to guide you from both Cochrane and the Equator Network, among other places, and if you are a beginner to the topic, have a read of an overview of systematic reviews.

What is a meta-analysis?

A meta-analysis is a quantitative, epidemiological study design used to systematically assess the results of previous research (2). Meta-analyses are usually, though not always, based on randomized controlled trials. In essence, a meta-analysis is a mathematical tool that allows researchers to combine outcomes from multiple studies.

When can a meta-analysis be implemented?

A meta-analysis can always be attempted, yet for it to yield the best possible results it should be performed when the studies included in the systematic review are of good quality, have similar designs, and use similar outcome measures.

Why are meta-analyses important?

Outcomes from a meta-analysis may provide more precise information about the estimated effect of what is being studied, because the analysis merges outcomes from multiple studies. In a meta-analysis, data from various trials are combined to generate an average result (1), which is portrayed in a forest plot diagram. Moreover, meta-analyses also include a funnel plot diagram to visually detect publication bias.
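The pooling that produces the summary diamond on a forest plot can be sketched with inverse-variance weighting, a standard fixed-effect model. This is an illustrative sketch only (the three study estimates and standard errors below are hypothetical, and most published meta-analyses use dedicated software and often random-effects models instead):

```python
import math

def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling of study effect estimates.

    Each study is weighted by 1/SE^2, so more precise studies
    contribute more to the pooled result.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    ci95 = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci95

# Hypothetical mean differences and standard errors from three trials:
est, se, ci = fixed_effect_meta([0.4, 0.6, 0.5], [0.20, 0.25, 0.10])
```

Note how the third study, with the smallest standard error, dominates the pooled estimate; this weighting is why a meta-analysis can be more precise than any single included trial.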

Conclusions

A systematic review is an article that synthesizes the available evidence on a certain topic using a specific research question, pre-specified eligibility criteria for including articles, and a systematic method for its production. A meta-analysis, in contrast, is a quantitative, epidemiological study design used to assess the results of the articles included in a systematic review.

Remember: All meta-analyses involve a systematic review, but not all systematic reviews involve a meta-analysis.

If you would like some further reading on this topic, we suggest the following:

The systematic review – a S4BE blog article

Meta-analysis: what, why, and how – a S4BE blog article

The difference between a systematic review and a meta-analysis – a blog article via Covidence

Systematic review vs meta-analysis: what’s the difference? A 5-minute video from Research Masterminds:

  • About Cochrane reviews [Internet]. Cochranelibrary.com. [cited 2023 Apr 30]. Available from: https://www.cochranelibrary.com/about/about-cochrane-reviews
  • Haidich AB. Meta-analysis in medical research. Hippokratia. 2010;14(Suppl 1):29–37.





The PRISMA 2020 statement: an updated guideline for reporting systematic reviews

PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews

  • Matthew J Page , senior research fellow 1 ,
  • Joanne E McKenzie , associate professor 1 ,
  • Patrick M Bossuyt , professor 2 ,
  • Isabelle Boutron , professor 3 ,
  • Tammy C Hoffmann , professor 4 ,
  • Cynthia D Mulrow , professor 5 ,
  • Larissa Shamseer , doctoral student 6 ,
  • Jennifer M Tetzlaff , research product specialist 7 ,
  • Elie A Akl , professor 8 ,
  • Sue E Brennan , senior research fellow 1 ,
  • Roger Chou , professor 9 ,
  • Julie Glanville , associate director 10 ,
  • Jeremy M Grimshaw , professor 11 ,
  • Asbjørn Hróbjartsson , professor 12 ,
  • Manoj M Lalu , associate scientist and assistant professor 13 ,
  • Tianjing Li , associate professor 14 ,
  • Elizabeth W Loder , professor 15 ,
  • Evan Mayo-Wilson , associate professor 16 ,
  • Steve McDonald , senior research fellow 1 ,
  • Luke A McGuinness , research associate 17 ,
  • Lesley A Stewart , professor and director 18 ,
  • James Thomas , professor 19 ,
  • Andrea C Tricco , scientist and associate professor 20 ,
  • Vivian A Welch , associate professor 21 ,
  • Penny Whiting , associate professor 17 ,
  • David Moher , director and professor 22
  • 1 School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
  • 2 Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Amsterdam University Medical Centres, University of Amsterdam, Amsterdam, Netherlands
  • 3 Université de Paris, Centre of Epidemiology and Statistics (CRESS), Inserm, F 75004 Paris, France
  • 4 Institute for Evidence-Based Healthcare, Faculty of Health Sciences and Medicine, Bond University, Gold Coast, Australia
  • 5 University of Texas Health Science Center at San Antonio, San Antonio, Texas, USA; Annals of Internal Medicine
  • 6 Knowledge Translation Program, Li Ka Shing Knowledge Institute, Toronto, Canada; School of Epidemiology and Public Health, Faculty of Medicine, University of Ottawa, Ottawa, Canada
  • 7 Evidence Partners, Ottawa, Canada
  • 8 Clinical Research Institute, American University of Beirut, Beirut, Lebanon; Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
  • 9 Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
  • 10 York Health Economics Consortium (YHEC Ltd), University of York, York, UK
  • 11 Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Canada; School of Epidemiology and Public Health, University of Ottawa, Ottawa, Canada; Department of Medicine, University of Ottawa, Ottawa, Canada
  • 12 Centre for Evidence-Based Medicine Odense (CEBMO) and Cochrane Denmark, Department of Clinical Research, University of Southern Denmark, Odense, Denmark; Open Patient data Exploratory Network (OPEN), Odense University Hospital, Odense, Denmark
  • 13 Department of Anesthesiology and Pain Medicine, The Ottawa Hospital, Ottawa, Canada; Clinical Epidemiology Program, Blueprint Translational Research Group, Ottawa Hospital Research Institute, Ottawa, Canada; Regenerative Medicine Program, Ottawa Hospital Research Institute, Ottawa, Canada
  • 14 Department of Ophthalmology, School of Medicine, University of Colorado Denver, Denver, Colorado, United States; Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
  • 15 Division of Headache, Department of Neurology, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA; Head of Research, The BMJ , London, UK
  • 16 Department of Epidemiology and Biostatistics, Indiana University School of Public Health-Bloomington, Bloomington, Indiana, USA
  • 17 Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
  • 18 Centre for Reviews and Dissemination, University of York, York, UK
  • 19 EPPI-Centre, UCL Social Research Institute, University College London, London, UK
  • 20 Li Ka Shing Knowledge Institute of St. Michael's Hospital, Unity Health Toronto, Toronto, Canada; Epidemiology Division of the Dalla Lana School of Public Health and the Institute of Health Management, Policy, and Evaluation, University of Toronto, Toronto, Canada; Queen's Collaboration for Health Care Quality Joanna Briggs Institute Centre of Excellence, Queen's University, Kingston, Canada
  • 21 Methods Centre, Bruyère Research Institute, Ottawa, Ontario, Canada; School of Epidemiology and Public Health, Faculty of Medicine, University of Ottawa, Ottawa, Canada
  • 22 Centre for Journalology, Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Canada; School of Epidemiology and Public Health, Faculty of Medicine, University of Ottawa, Ottawa, Canada
  • Correspondence to: M J Page matthew.page{at}monash.edu
  • Accepted 4 January 2021

The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement, published in 2009, was designed to help systematic reviewers transparently report why the review was done, what the authors did, and what they found. Over the past decade, advances in systematic review methodology and terminology have necessitated an update to the guideline. The PRISMA 2020 statement replaces the 2009 statement and includes new reporting guidance that reflects advances in methods to identify, select, appraise, and synthesise studies. The structure and presentation of the items have been modified to facilitate implementation. In this article, we present the PRISMA 2020 27-item checklist, an expanded checklist that details reporting recommendations for each item, the PRISMA 2020 abstract checklist, and the revised flow diagrams for original and updated reviews.

Systematic reviews serve many critical roles. They can provide syntheses of the state of knowledge in a field, from which future research priorities can be identified; they can address questions that otherwise could not be answered by individual studies; they can identify problems in primary research that should be rectified in future studies; and they can generate or evaluate theories about how or why phenomena occur. Systematic reviews therefore generate various types of knowledge for different users of reviews (such as patients, healthcare providers, researchers, and policy makers). 1 2 To ensure a systematic review is valuable to users, authors should prepare a transparent, complete, and accurate account of why the review was done, what they did (such as how studies were identified and selected) and what they found (such as characteristics of contributing studies and results of meta-analyses). Up-to-date reporting guidance facilitates authors achieving this. 3

The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement published in 2009 (hereafter referred to as PRISMA 2009) 4 5 6 7 8 9 10 is a reporting guideline designed to address poor reporting of systematic reviews. 11 The PRISMA 2009 statement comprised a checklist of 27 items recommended for reporting in systematic reviews and an “explanation and elaboration” paper 12 13 14 15 16 providing additional reporting guidance for each item, along with exemplars of reporting. The recommendations have been widely endorsed and adopted, as evidenced by its co-publication in multiple journals, citation in over 60 000 reports (Scopus, August 2020), endorsement from almost 200 journals and systematic review organisations, and adoption in various disciplines. Evidence from observational studies suggests that use of the PRISMA 2009 statement is associated with more complete reporting of systematic reviews, 17 18 19 20 although more could be done to improve adherence to the guideline. 21

Many innovations in the conduct of systematic reviews have occurred since publication of the PRISMA 2009 statement. For example, technological advances have enabled the use of natural language processing and machine learning to identify relevant evidence, 22 23 24 methods have been proposed to synthesise and present findings when meta-analysis is not possible or appropriate, 25 26 27 and new methods have been developed to assess the risk of bias in results of included studies. 28 29 Evidence on sources of bias in systematic reviews has accrued, culminating in the development of new tools to appraise the conduct of systematic reviews. 30 31 Terminology used to describe particular review processes has also evolved, as in the shift from assessing “quality” to assessing “certainty” in the body of evidence. 32 In addition, the publishing landscape has transformed, with multiple avenues now available for registering and disseminating systematic review protocols, 33 34 disseminating reports of systematic reviews, and sharing data and materials, such as preprint servers and publicly accessible repositories. To capture these advances in the reporting of systematic reviews necessitated an update to the PRISMA 2009 statement.

Summary points

To ensure a systematic review is valuable to users, authors should prepare a transparent, complete, and accurate account of why the review was done, what they did, and what they found

The PRISMA 2020 statement provides updated reporting guidance for systematic reviews that reflects advances in methods to identify, select, appraise, and synthesise studies

The PRISMA 2020 statement consists of a 27-item checklist, an expanded checklist that details reporting recommendations for each item, the PRISMA 2020 abstract checklist, and revised flow diagrams for original and updated reviews

We anticipate that the PRISMA 2020 statement will benefit authors, editors, and peer reviewers of systematic reviews, and different users of reviews, including guideline developers, policy makers, healthcare providers, patients, and other stakeholders

Development of PRISMA 2020

A complete description of the methods used to develop PRISMA 2020 is available elsewhere. 35 We identified PRISMA 2009 items that were often reported incompletely by examining the results of studies investigating the transparency of reporting of published reviews. 17 21 36 37 We identified possible modifications to the PRISMA 2009 statement by reviewing 60 documents providing reporting guidance for systematic reviews (including reporting guidelines, handbooks, tools, and meta-research studies). 38 These reviews of the literature were used to inform the content of a survey with suggested possible modifications to the 27 items in PRISMA 2009 and possible additional items. Respondents were asked whether they believed we should keep each PRISMA 2009 item as is, modify it, or remove it, and whether we should add each additional item. Systematic review methodologists and journal editors were invited to complete the online survey (110 of 220 invited responded). We discussed proposed content and wording of the PRISMA 2020 statement, as informed by the review and survey results, at a 21-member, two-day, in-person meeting in September 2018 in Edinburgh, Scotland. Throughout 2019 and 2020, we circulated an initial draft and five revisions of the checklist and explanation and elaboration paper to co-authors for feedback. In April 2020, we invited 22 systematic reviewers who had expressed interest in providing feedback on the PRISMA 2020 checklist to share their views (via an online survey) on the layout and terminology used in a preliminary version of the checklist. Feedback was received from 15 individuals and considered by the first author, and any revisions deemed necessary were incorporated before the final version was approved and endorsed by all co-authors.

The PRISMA 2020 statement

Scope of the guideline.

The PRISMA 2020 statement has been designed primarily for systematic reviews of studies that evaluate the effects of health interventions, irrespective of the design of the included studies. However, the checklist items are applicable to reports of systematic reviews evaluating other interventions (such as social or educational interventions), and many items are applicable to systematic reviews with objectives other than evaluating interventions (such as evaluating aetiology, prevalence, or prognosis). PRISMA 2020 is intended for use in systematic reviews that include synthesis (such as pairwise meta-analysis or other statistical synthesis methods) or do not include synthesis (for example, because only one eligible study is identified). The PRISMA 2020 items are relevant for mixed-methods systematic reviews (which include quantitative and qualitative studies), but reporting guidelines addressing the presentation and synthesis of qualitative data should also be consulted. 39 40 PRISMA 2020 can be used for original systematic reviews, updated systematic reviews, or continually updated (“living”) systematic reviews. However, for updated and living systematic reviews, there may be some additional considerations that need to be addressed. Where there is relevant content from other reporting guidelines, we reference these guidelines within the items in the explanation and elaboration paper 41 (such as PRISMA-Search 42 in items 6 and 7, Synthesis without meta-analysis (SWiM) reporting guideline 27 in item 13d). Box 1 includes a glossary of terms used throughout the PRISMA 2020 statement.

Glossary of terms

Systematic review —A review that uses explicit, systematic methods to collate and synthesise findings of studies that address a clearly formulated question 43

Statistical synthesis —The combination of quantitative results of two or more studies. This encompasses meta-analysis of effect estimates (described below) and other methods, such as combining P values, calculating the range and distribution of observed effects, and vote counting based on the direction of effect (see McKenzie and Brennan 25 for a description of each method)

Meta-analysis of effect estimates —A statistical technique used to synthesise results when study effect estimates and their variances are available, yielding a quantitative summary of results 25

Outcome —An event or measurement collected for participants in a study (such as quality of life, mortality)

Result —The combination of a point estimate (such as a mean difference, risk ratio, or proportion) and a measure of its precision (such as a confidence/credible interval) for a particular outcome

Report —A document (paper or electronic) supplying information about a particular study. It could be a journal article, preprint, conference abstract, study register entry, clinical study report, dissertation, unpublished manuscript, government report, or any other document providing relevant information

Record —The title or abstract (or both) of a report indexed in a database or website (such as a title or abstract for an article indexed in Medline). Records that refer to the same report (such as the same journal article) are “duplicates”; however, records that refer to reports that are merely similar (such as a similar abstract submitted to two different conferences) should be considered unique.

Study —An investigation, such as a clinical trial, that includes a defined group of participants and one or more interventions and outcomes. A “study” might have multiple reports. For example, reports could include the protocol, statistical analysis plan, baseline characteristics, results for the primary outcome, results for harms, results for secondary outcomes, and results for additional mediator and moderator analyses

PRISMA 2020 is not intended to guide systematic review conduct, for which comprehensive resources are available. 43 44 45 46 However, familiarity with PRISMA 2020 is useful when planning and conducting systematic reviews to ensure that all recommended information is captured. PRISMA 2020 should not be used to assess the conduct or methodological quality of systematic reviews; other tools exist for this purpose. 30 31 Furthermore, PRISMA 2020 is not intended to inform the reporting of systematic review protocols, for which a separate statement is available (PRISMA for Protocols (PRISMA-P) 2015 statement 47 48 ). Finally, extensions to the PRISMA 2009 statement have been developed to guide reporting of network meta-analyses, 49 meta-analyses of individual participant data, 50 systematic reviews of harms, 51 systematic reviews of diagnostic test accuracy studies, 52 and scoping reviews 53 ; for these types of reviews we recommend authors report their review in accordance with the recommendations in PRISMA 2020 along with the guidance specific to the extension.

How to use PRISMA 2020

The PRISMA 2020 statement (including the checklists, explanation and elaboration, and flow diagram) replaces the PRISMA 2009 statement, which should no longer be used. Box 2 summarises noteworthy changes from the PRISMA 2009 statement. The PRISMA 2020 checklist includes seven sections with 27 items, some of which include sub-items ( table 1 ). A checklist for journal and conference abstracts for systematic reviews is included in PRISMA 2020. This abstract checklist is an update of the 2013 PRISMA for Abstracts statement, 54 reflecting new and modified content in PRISMA 2020 ( table 2 ). A template PRISMA flow diagram is provided, which can be modified depending on whether the systematic review is original or updated ( fig 1 ).

Noteworthy changes to the PRISMA 2009 statement

Inclusion of the abstract reporting checklist within PRISMA 2020 (see item #2 and table 2 ).

Movement of the ‘Protocol and registration’ item from the start of the Methods section of the checklist to a new Other section, with addition of a sub-item recommending authors describe amendments to information provided at registration or in the protocol (see item #24a-24c).

Modification of the ‘Search’ item to recommend authors present full search strategies for all databases, registers and websites searched, not just at least one database (see item #7).

Modification of the ‘Study selection’ item in the Methods section to emphasise the reporting of how many reviewers screened each record and each report retrieved, whether they worked independently, and if applicable, details of automation tools used in the process (see item #8).

Addition of a sub-item to the ‘Data items’ item recommending authors report how outcomes were defined, which results were sought, and methods for selecting a subset of results from included studies (see item #10a).

Splitting of the ‘Synthesis of results’ item in the Methods section into six sub-items recommending authors describe: the processes used to decide which studies were eligible for each synthesis; any methods required to prepare the data for synthesis; any methods used to tabulate or visually display results of individual studies and syntheses; any methods used to synthesise results; any methods used to explore possible causes of heterogeneity among study results (such as subgroup analysis, meta-regression); and any sensitivity analyses used to assess robustness of the synthesised results (see item #13a-13f).

Addition of a sub-item to the ‘Study selection’ item in the Results section recommending authors cite studies that might appear to meet the inclusion criteria, but which were excluded, and explain why they were excluded (see item #16b).

Splitting of the ‘Synthesis of results’ item in the Results section into four sub-items recommending authors: briefly summarise the characteristics and risk of bias among studies contributing to the synthesis; present results of all statistical syntheses conducted; present results of any investigations of possible causes of heterogeneity among study results; and present results of any sensitivity analyses (see item #20a-20d).

Addition of new items recommending authors report methods for and results of an assessment of certainty (or confidence) in the body of evidence for an outcome (see items #15 and #22).

Addition of a new item recommending authors declare any competing interests (see item #26).

Addition of a new item recommending authors indicate whether data, analytic code and other materials used in the review are publicly available and if so, where they can be found (see item #27).

PRISMA 2020 item checklist


PRISMA 2020 for Abstracts checklist*

Fig 1

PRISMA 2020 flow diagram template for systematic reviews. The new design is adapted from flow diagrams proposed by Boers, 55 Mayo-Wilson et al. 56 and Stovold et al. 57 The boxes in grey should only be completed if applicable; otherwise they should be removed from the flow diagram. Note that a “report” could be a journal article, preprint, conference abstract, study register entry, clinical study report, dissertation, unpublished manuscript, government report or any other document providing relevant information.


We recommend authors refer to PRISMA 2020 early in the writing process, because prospective consideration of the items may help to ensure that all the items are addressed. To help keep track of which items have been reported, the PRISMA statement website ( http://www.prisma-statement.org/ ) includes fillable templates of the checklists to download and complete (also available in the data supplement on bmj.com). We have also created a web application that allows users to complete the checklist via a user-friendly interface 58 (available at https://prisma.shinyapps.io/checklist/ and adapted from the Transparency Checklist app 59 ). The completed checklist can be exported to Word or PDF. Editable templates of the flow diagram can also be downloaded from the PRISMA statement website.

We have prepared an updated explanation and elaboration paper, in which we explain why reporting of each item is recommended and present bullet points that detail the reporting recommendations (which we refer to as elements). 41 The bullet-point structure is new to PRISMA 2020 and has been adopted to facilitate implementation of the guidance. 60 61 An expanded checklist, which comprises an abridged version of the elements presented in the explanation and elaboration paper, with references and some examples removed, is available in the data supplement on bmj.com. Consulting the explanation and elaboration paper is recommended if further clarity or information is required.

Journals and publishers might impose word and section limits, and limits on the number of tables and figures allowed in the main report. In such cases, if the relevant information for some items already appears in a publicly accessible review protocol, referring to the protocol may suffice. Alternatively, placing detailed descriptions of the methods used or additional results (such as for less critical outcomes) in supplementary files is recommended. Ideally, supplementary files should be deposited to a general-purpose or institutional open-access repository that provides free and permanent access to the material (such as Open Science Framework, Dryad, figshare). A reference or link to the additional information should be included in the main report. Finally, although PRISMA 2020 provides a template for where information might be located, the suggested location should not be seen as prescriptive; the guiding principle is to ensure the information is reported.

Use of PRISMA 2020 has the potential to benefit many stakeholders. Complete reporting allows readers to assess the appropriateness of the methods, and therefore the trustworthiness of the findings. Presenting and summarising characteristics of studies contributing to a synthesis allows healthcare providers and policy makers to evaluate the applicability of the findings to their setting. Describing the certainty in the body of evidence for an outcome and the implications of findings should help policy makers, managers, and other decision makers formulate appropriate recommendations for practice or policy. Complete reporting of all PRISMA 2020 items also facilitates replication and review updates, as well as inclusion of systematic reviews in overviews (of systematic reviews) and guidelines, so teams can leverage work that is already done and decrease research waste. 36 62 63

We updated the PRISMA 2009 statement by adapting the EQUATOR Network’s guidance for developing health research reporting guidelines. 64 We evaluated the reporting completeness of published systematic reviews, 17 21 36 37 reviewed the items included in other documents providing guidance for systematic reviews, 38 surveyed systematic review methodologists and journal editors for their views on how to revise the original PRISMA statement, 35 discussed the findings at an in-person meeting, and prepared this document through an iterative process. Our recommendations are informed by the reviews and survey conducted before the in-person meeting, theoretical considerations about which items facilitate replication and help users assess the risk of bias and applicability of systematic reviews, and co-authors’ experience with authoring and using systematic reviews.

Various strategies to increase the use of reporting guidelines and improve reporting have been proposed. They include educators introducing reporting guidelines into graduate curricula to promote good reporting habits of early career scientists 65 ; journal editors and regulators endorsing use of reporting guidelines 18 ; peer reviewers evaluating adherence to reporting guidelines 61 66 ; journals requiring authors to indicate where in their manuscript they have adhered to each reporting item 67 ; and authors using online writing tools that prompt complete reporting at the writing stage. 60 Multi-pronged interventions, where more than one of these strategies are combined, may be more effective (such as completion of checklists coupled with editorial checks). 68 However, of 31 interventions proposed to increase adherence to reporting guidelines, the effects of only 11 have been evaluated, mostly in observational studies at high risk of bias due to confounding. 69 It is therefore unclear which strategies should be used. Future research might explore barriers and facilitators to the use of PRISMA 2020 by authors, editors, and peer reviewers; design interventions that address the identified barriers; and evaluate those interventions using randomised trials. To inform possible revisions to the guideline, it would also be valuable to conduct think-aloud studies 70 to understand how systematic reviewers interpret the items, and reliability studies to identify items that are interpreted inconsistently.

We encourage readers to submit evidence that informs any of the recommendations in PRISMA 2020 (via the PRISMA statement website: http://www.prisma-statement.org/ ). To enhance accessibility of PRISMA 2020, several translations of the guideline are under way (see available translations at the PRISMA statement website). We encourage journal editors and publishers to raise awareness of PRISMA 2020 (for example, by referring to it in journal “Instructions to authors”), endorsing its use, advising editors and peer reviewers to evaluate submitted systematic reviews against the PRISMA 2020 checklists, and making changes to journal policies to accommodate the new reporting recommendations. We recommend existing PRISMA extensions 47 49 50 51 52 53 71 72 be updated to reflect PRISMA 2020 and advise developers of new PRISMA extensions to use PRISMA 2020 as the foundation document.

We anticipate that the PRISMA 2020 statement will benefit authors, editors, and peer reviewers of systematic reviews, and different users of reviews, including guideline developers, policy makers, healthcare providers, patients, and other stakeholders. Ultimately, we hope that uptake of the guideline will lead to more transparent, complete, and accurate reporting of systematic reviews, thus facilitating evidence based decision making.

Acknowledgments

We dedicate this paper to the late Douglas G Altman and Alessandro Liberati, whose contributions were fundamental to the development and implementation of the original PRISMA statement.

We thank the following contributors who completed the survey to inform discussions at the development meeting: Xavier Armoiry, Edoardo Aromataris, Ana Patricia Ayala, Ethan M Balk, Virginia Barbour, Elaine Beller, Jesse A Berlin, Lisa Bero, Zhao-Xiang Bian, Jean Joel Bigna, Ferrán Catalá-López, Anna Chaimani, Mike Clarke, Tammy Clifford, Ioana A Cristea, Miranda Cumpston, Sofia Dias, Corinna Dressler, Ivan D Florez, Joel J Gagnier, Chantelle Garritty, Long Ge, Davina Ghersi, Sean Grant, Gordon Guyatt, Neal R Haddaway, Julian PT Higgins, Sally Hopewell, Brian Hutton, Jamie J Kirkham, Jos Kleijnen, Julia Koricheva, Joey SW Kwong, Toby J Lasserson, Julia H Littell, Yoon K Loke, Malcolm R Macleod, Chris G Maher, Ana Marušic, Dimitris Mavridis, Jessie McGowan, Matthew DF McInnes, Philippa Middleton, Karel G Moons, Zachary Munn, Jane Noyes, Barbara Nußbaumer-Streit, Donald L Patrick, Tatiana Pereira-Cenci, Ba’ Pham, Bob Phillips, Dawid Pieper, Michelle Pollock, Daniel S Quintana, Drummond Rennie, Melissa L Rethlefsen, Hannah R Rothstein, Maroeska M Rovers, Rebecca Ryan, Georgia Salanti, Ian J Saldanha, Margaret Sampson, Nancy Santesso, Rafael Sarkis-Onofre, Jelena Savović, Christopher H Schmid, Kenneth F Schulz, Guido Schwarzer, Beverley J Shea, Paul G Shekelle, Farhad Shokraneh, Mark Simmonds, Nicole Skoetz, Sharon E Straus, Anneliese Synnot, Emily E Tanner-Smith, Brett D Thombs, Hilary Thomson, Alexander Tsertsvadze, Peter Tugwell, Tari Turner, Lesley Uttley, Jeffrey C Valentine, Matt Vassar, Areti Angeliki Veroniki, Meera Viswanathan, Cole Wayant, Paul Whaley, and Kehu Yang. We thank the following contributors who provided feedback on a preliminary version of the PRISMA 2020 checklist: Jo Abbott, Fionn Büttner, Patricia Correia-Santos, Victoria Freeman, Emily A Hennessy, Rakibul Islam, Amalia (Emily) Karahalios, Kasper Krommes, Andreas Lundh, Dafne Port Nascimento, Davina Robson, Catherine Schenck-Yglesias, Mary M Scott, Sarah Tanveer and Pavel Zhelnov. 
We thank Abigail H Goben, Melissa L Rethlefsen, Tanja Rombey, Anna Scott, and Farhad Shokraneh for their helpful comments on the preprints of the PRISMA 2020 papers. We thank Edoardo Aromataris, Stephanie Chang, Toby Lasserson and David Schriger for their helpful peer review comments on the PRISMA 2020 papers.

Contributors: JEM and DM are joint senior authors. MJP, JEM, PMB, IB, TCH, CDM, LS, and DM conceived this paper and designed the literature review and survey conducted to inform the guideline content. MJP conducted the literature review, administered the survey and analysed the data for both. MJP prepared all materials for the development meeting. MJP and JEM presented proposals at the development meeting. All authors except for TCH, JMT, EAA, SEB, and LAM attended the development meeting. MJP and JEM took and consolidated notes from the development meeting. MJP and JEM led the drafting and editing of the article. JEM, PMB, IB, TCH, LS, JMT, EAA, SEB, RC, JG, AH, TL, EMW, SM, LAM, LAS, JT, ACT, PW, and DM drafted particular sections of the article. All authors were involved in revising the article critically for important intellectual content. All authors approved the final version of the article. MJP is the guarantor of this work. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Funding: There was no direct funding for this research. MJP is supported by an Australian Research Council Discovery Early Career Researcher Award (DE200101618) and was previously supported by an Australian National Health and Medical Research Council (NHMRC) Early Career Fellowship (1088535) during the conduct of this research. JEM is supported by an Australian NHMRC Career Development Fellowship (1143429). TCH is supported by an Australian NHMRC Senior Research Fellowship (1154607). JMT is supported by Evidence Partners Inc. JMG is supported by a Tier 1 Canada Research Chair in Health Knowledge Transfer and Uptake. MML is supported by The Ottawa Hospital Anaesthesia Alternate Funds Association and a Faculty of Medicine Junior Research Chair. TL is supported by funding from the National Eye Institute (UG1EY020522), National Institutes of Health, United States. LAM is supported by a National Institute for Health Research Doctoral Research Fellowship (DRF-2018-11-ST2-048). ACT is supported by a Tier 2 Canada Research Chair in Knowledge Synthesis. DM is supported in part by a University Research Chair, University of Ottawa. The funders had no role in considering the study design or in the collection, analysis, interpretation of data, writing of the report, or decision to submit the article for publication.

Competing interests: All authors have completed the ICMJE uniform disclosure form at http://www.icmje.org/conflicts-of-interest/ and declare: EL is head of research for the BMJ ; MJP is an editorial board member for PLOS Medicine ; ACT is an associate editor and MJP, TL, EMW, and DM are editorial board members for the Journal of Clinical Epidemiology ; DM and LAS were editors in chief, LS, JMT, and ACT are associate editors, and JG is an editorial board member for Systematic Reviews . None of these authors were involved in the peer review process or decision to publish. TCH has received personal fees from Elsevier outside the submitted work. EMW has received personal fees from the American Journal for Public Health , for which he is the editor for systematic reviews. VW is editor in chief of the Campbell Collaboration, which produces systematic reviews, and co-convenor of the Campbell and Cochrane equity methods group. DM is chair of the EQUATOR Network, IB is adjunct director of the French EQUATOR Centre and TCH is co-director of the Australasian EQUATOR Centre, which advocates for the use of reporting guidelines to improve the quality of reporting in research articles. JMT received salary from Evidence Partners, creator of DistillerSR software for systematic reviews; Evidence Partners was not involved in the design or outcomes of the statement, and the views expressed solely represent those of the author.

Provenance and peer review: Not commissioned; externally peer reviewed.

Patient and public involvement: Patients and the public were not involved in this methodological research. We plan to disseminate the research widely, including to community participants in evidence synthesis organisations.

This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/ .



Cochrane Training

Chapter 10: Analysing data and undertaking meta-analyses

Jonathan J Deeks, Julian PT Higgins, Douglas G Altman; on behalf of the Cochrane Statistical Methods Group

Key Points:

  • Meta-analysis is the statistical combination of results from two or more separate studies.
  • Potential advantages of meta-analyses include an improvement in precision, the ability to answer questions not posed by individual studies, and the opportunity to settle controversies arising from conflicting claims. However, they also have the potential to mislead seriously, particularly if specific study designs, within-study biases, variation across studies, and reporting biases are not carefully considered.
  • It is important to be familiar with the type of data (e.g. dichotomous, continuous) that result from measurement of an outcome in an individual study, and to choose suitable effect measures for comparing intervention groups.
  • Most meta-analysis methods are variations on a weighted average of the effect estimates from the different studies.
  • Studies with no events contribute no information about the risk ratio or odds ratio. For rare events, the Peto method has been observed to be less biased and more powerful than other methods.
  • Variation across studies (heterogeneity) must be considered, although most Cochrane Reviews do not have enough studies to allow for the reliable investigation of its causes. Random-effects meta-analyses allow for heterogeneity by assuming that underlying effects follow a normal distribution, but they must be interpreted carefully. Prediction intervals from random-effects meta-analyses are a useful device for presenting the extent of between-study variation.
  • Many judgements are required in the process of preparing a meta-analysis. Sensitivity analyses should be used to examine whether overall findings are robust to potentially influential decisions.

Cite this chapter as: Deeks JJ, Higgins JPT, Altman DG (editors). Chapter 10: Analysing data and undertaking meta-analyses. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook .

10.1 Do not start here!

It can be tempting to jump prematurely into a statistical analysis when undertaking a systematic review. The production of a diamond at the bottom of a plot is an exciting moment for many authors, but results of meta-analyses can be very misleading if suitable attention has not been given to formulating the review question; specifying eligibility criteria; identifying and selecting studies; collecting appropriate data; considering risk of bias; planning intervention comparisons; and deciding what data would be meaningful to analyse. Review authors should consult the chapters that precede this one before a meta-analysis is undertaken.

10.2 Introduction to meta-analysis

An important step in a systematic review is the thoughtful consideration of whether it is appropriate to combine the numerical results of all, or perhaps some, of the studies. Such a meta-analysis yields an overall statistic (together with its confidence interval) that summarizes the effectiveness of an experimental intervention compared with a comparator intervention. Potential advantages of meta-analyses include the following:

  • To improve precision . Many studies are too small to provide convincing evidence about intervention effects in isolation. Estimation is usually improved when it is based on more information.
  • To answer questions not posed by the individual studies . Primary studies often involve a specific type of participant and explicitly defined interventions. A selection of studies in which these characteristics differ can allow investigation of the consistency of effect across a wider range of populations and interventions. It may also, if relevant, allow reasons for differences in effect estimates to be investigated.
  • To settle controversies arising from apparently conflicting studies or to generate new hypotheses . Statistical synthesis of findings allows the degree of conflict to be formally assessed, and reasons for different results to be explored and quantified.

Of course, the use of statistical synthesis methods does not guarantee that the results of a review are valid, any more than it does for a primary study. Moreover, like any tool, statistical methods can be misused.

This chapter describes the principles and methods used to carry out a meta-analysis for a comparison of two interventions for the main types of data encountered. The use of network meta-analysis to compare more than two interventions is addressed in Chapter 11 . Formulae for most of the methods described are provided in the RevMan Web Knowledge Base under Statistical Algorithms and calculations used in Review Manager (documentation.cochrane.org/revman-kb/statistical-methods-210600101.html), and a longer discussion of many of the issues is available ( Deeks et al 2001 ).

10.2.1 Principles of meta-analysis

The commonly used methods for meta-analysis follow the following basic principles:

  • Meta-analysis is typically a two-stage process. In the first stage, a summary statistic is calculated for each study, to describe the observed intervention effect in the same way for every study. For example, the summary statistic may be a risk ratio if the data are dichotomous, or a difference between means if the data are continuous (see Chapter 6 ).

  • In the second stage, a summary (combined) intervention effect estimate is calculated as a weighted average of the intervention effects estimated in the individual studies. A weighted average is defined as the sum of the effect estimates multiplied by their weights, divided by the sum of the weights.

  • The combination of intervention effect estimates across studies may optionally incorporate an assumption that the studies are not all estimating the same intervention effect, but estimate intervention effects that follow a distribution across studies. This is the basis of a random-effects meta-analysis (see Section 10.10.4 ). Alternatively, if it is assumed that each study is estimating exactly the same quantity, then a fixed-effect meta-analysis is performed.
  • The standard error of the summary intervention effect can be used to derive a confidence interval, which communicates the precision (or uncertainty) of the summary estimate; and to derive a P value, which communicates the strength of the evidence against the null hypothesis of no intervention effect.
  • As well as yielding a summary quantification of the intervention effect, all methods of meta-analysis can incorporate an assessment of whether the variation among the results of the separate studies is compatible with random variation, or whether it is large enough to indicate inconsistency of intervention effects across studies (see Section 10.10 ).
  • The problem of missing data is one of the numerous practical considerations that must be thought through when undertaking a meta-analysis. In particular, review authors should consider the implications of missing outcome data from individual participants (due to losses to follow-up or exclusions from analysis) (see Section 10.12 ).

Meta-analyses are usually illustrated using a forest plot . An example appears in Figure 10.2.a . A forest plot displays effect estimates and confidence intervals for both individual studies and meta-analyses (Lewis and Clarke 2001). Each study is represented by a block at the point estimate of intervention effect with a horizontal line extending either side of the block. The area of the block indicates the weight assigned to that study in the meta-analysis while the horizontal line depicts the confidence interval (usually with a 95% level of confidence). The area of the block and the confidence interval convey similar information, but both make different contributions to the graphic. The confidence interval depicts the range of intervention effects compatible with the study’s result. The size of the block draws the eye towards the studies with larger weight (usually those with narrower confidence intervals), which dominate the calculation of the summary result, presented as a diamond at the bottom.

Figure 10.2.a Example of a forest plot from a review of interventions to promote ownership of smoke alarms (DiGuiseppi and Higgins 2001). Reproduced with permission of John Wiley & Sons

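To make the anatomy of the plot concrete, here is a deliberately crude text-mode rendering, a sketch for illustration only: it draws the block at the point estimate and the horizontal line for the confidence interval, but omits the weight-scaled marker sizes and summary diamond that a real forest plot (e.g. one produced by RevMan) would include.

```python
def text_forest_plot(studies, width=40):
    """Render a crude text-mode forest plot.

    studies: list of (name, estimate, ci_lower, ci_upper) tuples on a
    single effect scale (e.g. log odds ratios).
    """
    lo = min(s[2] for s in studies)
    hi = max(s[3] for s in studies)
    span = hi - lo

    def col(x):  # map an effect value to a character column
        return round((x - lo) / span * (width - 1))

    rows = []
    for name, est, l, u in studies:
        line = [" "] * width
        for c in range(col(l), col(u) + 1):
            line[c] = "-"          # horizontal line = confidence interval
        line[col(est)] = "#"       # block at the point estimate
        rows.append(f"{name:<10}{''.join(line)}  {est:.2f} [{l:.2f}, {u:.2f}]")
    return "\n".join(rows)
```

Calling `text_forest_plot([("Study A", 0.5, 0.2, 0.9), ("Study B", 0.1, -0.3, 0.6)])` prints one row per study, with the interval lines sharing a common axis so the eye can compare estimates across studies.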

10.3 A generic inverse-variance approach to meta-analysis

A very common and simple version of the meta-analysis procedure is commonly referred to as the inverse-variance method . This approach is implemented in its most basic form in RevMan, and is used behind the scenes in many meta-analyses of both dichotomous and continuous data.

The inverse-variance method is so named because the weight given to each study is chosen to be the inverse of the variance of the effect estimate (i.e. 1 over the square of its standard error). Thus, larger studies, which have smaller standard errors, are given more weight than smaller studies, which have larger standard errors. This choice of weights minimizes the imprecision (uncertainty) of the pooled effect estimate.

10.3.1 Fixed-effect method for meta-analysis

A fixed-effect meta-analysis using the inverse-variance method calculates a weighted average as:

$$\text{weighted average} = \frac{\sum_{i} Y_i / SE_i^2}{\sum_{i} 1 / SE_i^2}$$

where Y i is the intervention effect estimated in the i th study, SE i is the standard error of that estimate, and the summation is across all studies. The basic data required for the analysis are therefore an estimate of the intervention effect and its standard error from each study. A fixed-effect meta-analysis is valid under an assumption that all effect estimates are estimating the same underlying intervention effect, which is referred to variously as a ‘fixed-effect’ assumption, a ‘common-effect’ assumption or an ‘equal-effects’ assumption. However, the result of the meta-analysis can be interpreted without making such an assumption (Rice et al 2018).
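As an illustration, the weighted average above can be computed in a few lines of Python (a minimal sketch: the function name and the study estimates are hypothetical, not taken from this chapter):

```python
import math

def fixed_effect_meta(estimates, standard_errors):
    """Inverse-variance fixed-effect meta-analysis.

    Implements the weighted average above: each study's weight is
    1 / SE_i^2, and the standard error of the pooled estimate is
    1 / sqrt(sum of weights).
    """
    weights = [1.0 / se**2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Three hypothetical studies reporting log odds ratios with standard errors.
log_ors = [-0.5, -0.2, -0.4]
ses = [0.2, 0.1, 0.3]
pooled, pooled_se = fixed_effect_meta(log_ors, ses)
```

Note that the pooled standard error is smaller than that of any single study, reflecting the gain in precision from combining them.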

10.3.2 Random-effects methods for meta-analysis

A variation on the inverse-variance method is to incorporate an assumption that the different studies are estimating different, yet related, intervention effects (Higgins et al 2009). This produces a random-effects meta-analysis, and the simplest version is known as the DerSimonian and Laird method (DerSimonian and Laird 1986). Random-effects meta-analysis is discussed in detail in Section 10.10.4 .
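The DerSimonian and Laird moment estimator can be sketched as follows (a simplified illustration with hypothetical data; it omits refinements such as the Hartung-Knapp adjustment discussed elsewhere in the literature):

```python
import math

def dersimonian_laird(estimates, standard_errors):
    """Random-effects meta-analysis using the DerSimonian and Laird
    moment estimator of the between-study variance (tau-squared)."""
    w = [1.0 / se**2 for se in standard_errors]
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (y - fixed)**2 for wi, y in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)          # truncated at zero
    # Random-effects weights add tau-squared to each within-study variance.
    w_star = [1.0 / (se**2 + tau2) for se in standard_errors]
    pooled = sum(wi * y for wi, y in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled_se, tau2

# Hypothetical heterogeneous studies with equal standard errors.
pooled_re, se_re, tau2 = dersimonian_laird([0.1, 0.5, 0.9], [0.1, 0.1, 0.1])
```

With equal weights the point estimate matches the fixed-effect result, but the confidence interval is wider because tau-squared inflates each study's variance.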

10.3.3 Performing inverse-variance meta-analyses

Most meta-analysis programs perform inverse-variance meta-analyses. Usually the user provides summary data from each intervention arm of each study, such as a 2×2 table when the outcome is dichotomous (see Chapter 6, Section 6.4 ), or means, standard deviations and sample sizes for each group when the outcome is continuous (see Chapter 6, Section 6.5 ). This avoids the need for the author to calculate effect estimates, and allows the use of methods targeted specifically at different types of data (see Sections 10.4 and 10.5 ).

When the data are conveniently available as summary statistics from each intervention group, the inverse-variance method can be implemented directly. For example, estimates and their standard errors may be entered directly into RevMan under the ‘Generic inverse variance’ outcome type. For ratio measures of intervention effect, the data must be entered into RevMan as natural logarithms (for example, as a log odds ratio and the standard error of the log odds ratio). However, it is straightforward to instruct the software to display results on the original (e.g. odds ratio) scale. It is possible to supplement or replace this with a column providing the sample sizes in the two groups. Note that the ability to enter estimates and standard errors creates a high degree of flexibility in meta-analysis. It facilitates the analysis of properly analysed crossover trials, cluster-randomized trials and non-randomized trials (see Chapter 23 ), as well as outcome data that are ordinal, time-to-event or rates (see Chapter 6 ).
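For example, the log odds ratio and its standard error required for a generic inverse-variance entry can be derived from a 2×2 table as follows (a sketch with a hypothetical trial; the function name is illustrative):

```python
import math

def log_or_and_se(events_exp, total_exp, events_comp, total_comp):
    """Log odds ratio and its standard error from a 2x2 table,
    in the form needed for a generic inverse-variance outcome."""
    a = events_exp
    b = total_exp - events_exp
    c = events_comp
    d = total_comp - events_comp
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)  # standard large-sample formula
    return log_or, se

# Hypothetical trial: 10/100 events with treatment, 20/100 with control.
log_or, se = log_or_and_se(10, 100, 20, 100)
odds_ratio = math.exp(log_or)  # back-transform for display on the OR scale
```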

10.4 Meta-analysis of dichotomous outcomes

There are four widely used methods of meta-analysis for dichotomous outcomes, three fixed-effect methods (Mantel-Haenszel, Peto and inverse variance) and one random-effects method (DerSimonian and Laird inverse variance). All of these methods are available as analysis options in RevMan. The Peto method can only combine odds ratios, whilst the other three methods can combine odds ratios, risk ratios or risk differences. Formulae for all of the meta-analysis methods are available elsewhere (Deeks et al 2001).

Note that having no events in one group (sometimes referred to as ‘zero cells’) causes problems with computation of estimates and standard errors with some methods: see Section 10.4.4 .

10.4.1 Mantel-Haenszel methods

When data are sparse, either in terms of event risks being low or study size being small, the estimates of the standard errors of the effect estimates that are used in the inverse-variance methods may be poor. Mantel-Haenszel methods are fixed-effect meta-analysis methods using a different weighting scheme that depends on which effect measure (e.g. risk ratio, odds ratio, risk difference) is being used (Mantel and Haenszel 1959, Greenland and Robins 1985). They have been shown to have better statistical properties when there are few events. As this is a common situation in Cochrane Reviews, the Mantel-Haenszel method is generally preferable to the inverse variance method in fixed-effect meta-analyses. In other situations the two methods give similar estimates.
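The Mantel-Haenszel pooled odds ratio, for instance, weights each study's odds ratio without requiring a per-study standard error, which is why it copes better with sparse data. A minimal sketch (the tables are hypothetical):

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio from a list of 2x2 tables,
    each given as (a, b, c, d) = (experimental events, experimental
    non-events, comparator events, comparator non-events)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Two hypothetical sparse trials.
tables = [(4, 96, 8, 92), (2, 48, 5, 45)]
or_mh = mantel_haenszel_or(tables)
```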

10.4.2 Peto odds ratio method

Peto’s method can only be used to combine odds ratios (Yusuf et al 1985). It uses an inverse-variance approach, but uses an approximate method of estimating the log odds ratio, and uses different weights. An alternative way of viewing the Peto method is as a sum of ‘O – E’ statistics. Here, O is the observed number of events and E is an expected number of events in the experimental intervention group of each study under the null hypothesis of no intervention effect.
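The 'O – E' formulation can be sketched in Python as follows (a simplified illustration using hypothetical tables; E and the hypergeometric variance V follow the standard one-step formulae):

```python
import math

def peto_odds_ratio(tables):
    """Peto one-step ('O minus E') pooled odds ratio. Each table is
    (a, b, c, d), where a is the observed number of events in the
    experimental group."""
    o_minus_e = 0.0
    v_total = 0.0
    for a, b, c, d in tables:
        n1, n2 = a + b, c + d          # group sizes
        m1, m2 = a + c, b + d          # event and non-event totals
        n = n1 + n2
        e = n1 * m1 / n                # expected events under the null
        v = n1 * n2 * m1 * m2 / (n**2 * (n - 1))  # hypergeometric variance
        o_minus_e += a - e
        v_total += v
    log_or = o_minus_e / v_total
    return math.exp(log_or), 1.0 / math.sqrt(v_total)

# Two hypothetical sparse trials.
or_peto, se_peto = peto_odds_ratio([(4, 96, 8, 92), (2, 48, 5, 45)])
```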

The approximation used in the computation of the log odds ratio works well when intervention effects are small (odds ratios are close to 1), events are not particularly common and the studies have similar numbers in experimental and comparator groups. In other situations it has been shown to give biased answers. As these criteria are not always fulfilled, Peto’s method is not recommended as a default approach for meta-analysis.

Corrections for zero cell counts are not necessary when using Peto’s method. Perhaps for this reason, this method performs well when events are very rare (Bradburn et al 2007); see Section 10.4.4.1 . Also, Peto’s method can be used to combine studies with dichotomous outcome data with studies using time-to-event analyses where log-rank tests have been used (see Section 10.9 ).

10.4.3 Which effect measure for dichotomous outcomes?

Effect measures for dichotomous data are described in Chapter 6, Section 6.4.1 . The effect of an intervention can be expressed as either a relative or an absolute effect. The risk ratio (relative risk) and odds ratio are relative measures, while the risk difference and number needed to treat for an additional beneficial outcome are absolute measures. A further complication is that there are, in fact, two risk ratios. We can calculate the risk ratio of an event occurring or the risk ratio of no event occurring. These give different summary results in a meta-analysis, sometimes dramatically so.
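A small numerical example makes the point about the two risk ratios concrete (hypothetical figures):

```python
# Hypothetical study: 30/100 events with treatment, 60/100 with control.
risk_exp, risk_comp = 30 / 100, 60 / 100

rr_event = risk_exp / risk_comp                  # risk ratio of the event
rr_non_event = (1 - risk_exp) / (1 - risk_comp)  # risk ratio of no event
```

Here the risk ratio of the event is 0.5, but the risk ratio of the non-event is 1.75, and one is not simply the reciprocal of the other. The choice of which outcome state counts as 'the event' therefore matters.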

The selection of a summary statistic for use in meta-analysis depends on balancing three criteria (Deeks 2002). First, we desire a summary statistic that gives values that are similar for all the studies in the meta-analysis and subdivisions of the population to which the interventions will be applied. The more consistent the summary statistic, the greater is the justification for expressing the intervention effect as a single summary number. Second, the summary statistic must have the mathematical properties required to perform a valid meta-analysis. Third, the summary statistic would ideally be easily understood and applied by those using the review. The summary intervention effect should be presented in a way that helps readers to interpret and apply the results appropriately. Among effect measures for dichotomous data, no single measure is uniformly best, so the choice inevitably involves a compromise.

Consistency Empirical evidence suggests that relative effect measures are, on average, more consistent than absolute measures (Engels et al 2000, Deeks 2002, Rücker et al 2009). For this reason, it is wise to avoid performing meta-analyses of risk differences, unless there is a clear reason to suspect that risk differences will be consistent in a particular clinical situation. On average there is little difference between the odds ratio and risk ratio in terms of consistency (Deeks 2002). When the study aims to reduce the incidence of an adverse event, there is empirical evidence that risk ratios of the adverse event are more consistent than risk ratios of the non-event (Deeks 2002). Selecting an effect measure based on what is the most consistent in a particular situation is not a generally recommended strategy, since it may lead to a selection that spuriously maximizes the precision of a meta-analysis estimate.

Mathematical properties The most important mathematical criterion is the availability of a reliable variance estimate. The number needed to treat for an additional beneficial outcome does not have a simple variance estimator and cannot easily be used directly in meta-analysis, although it can be computed from the meta-analysis result afterwards (see Chapter 15, Section 15.4.2 ). There is no consensus regarding the importance of two other often-cited mathematical properties: the fact that the behaviour of the odds ratio and the risk difference do not rely on which of the two outcome states is coded as the event, and the odds ratio being the only statistic which is unbounded (see Chapter 6, Section 6.4.1 ).

Ease of interpretation The odds ratio is the hardest summary statistic to understand and to apply in practice, and many practising clinicians report difficulties in using it. There are many published examples where authors have misinterpreted odds ratios from meta-analyses as risk ratios. Although odds ratios can be re-expressed for interpretation (as discussed below), there must be some concern that routine presentation of the results of systematic reviews as odds ratios will lead to frequent over-estimation of the benefits and harms of interventions when the results are applied in clinical practice. Absolute measures of effect are thought to be more easily interpreted by clinicians than relative effects (Sinclair and Bracken 1994), and allow trade-offs to be made between likely benefits and likely harms of interventions. However, they are less likely to be generalizable.

It is generally recommended that meta-analyses are undertaken using risk ratios (taking care to make a sensible choice over which category of outcome is classified as the event) or odds ratios. This is because it seems important to avoid using summary statistics for which there is empirical evidence that they are unlikely to give consistent estimates of intervention effects (the risk difference), and it is impossible to use statistics for which meta-analysis cannot be performed (the number needed to treat for an additional beneficial outcome). It may be wise to plan to undertake a sensitivity analysis to investigate whether choice of summary statistic (and selection of the event category) is critical to the conclusions of the meta-analysis (see Section 10.14 ).

It is often sensible to use one statistic for meta-analysis and to re-express the results using a second, more easily interpretable statistic. For example, often meta-analysis may be best performed using relative effect measures (risk ratios or odds ratios) and the results re-expressed using absolute effect measures (risk differences or numbers needed to treat for an additional beneficial outcome – see Chapter 15, Section 15.4 . This is one of the key motivations for ‘Summary of findings’ tables in Cochrane Reviews: see Chapter 14 ). If odds ratios are used for meta-analysis they can also be re-expressed as risk ratios (see Chapter 15, Section 15.4 ). In all cases the same formulae can be used to convert upper and lower confidence limits. However, all of these transformations require specification of a value of baseline risk that indicates the likely risk of the outcome in the ‘control’ population to which the experimental intervention will be applied. Where the chosen value for this assumed comparator group risk is close to the typical observed comparator group risks across the studies, similar estimates of absolute effect will be obtained regardless of whether odds ratios or risk ratios are used for meta-analysis. Where the assumed comparator risk differs from the typical observed comparator group risk, the predictions of absolute benefit will differ according to which summary statistic was used for meta-analysis.

10.4.4 Meta-analysis of rare events

For rare outcomes, meta-analysis may be the only way to obtain reliable evidence of the effects of healthcare interventions. Individual studies are usually under-powered to detect differences in rare outcomes, but a meta-analysis of many studies may have adequate power to investigate whether interventions do have an impact on the incidence of the rare event. However, many methods of meta-analysis are based on large sample approximations, and are unsuitable when events are rare. Thus authors must take care when selecting a method of meta-analysis (Efthimiou 2018).

There is no single risk at which events are classified as ‘rare’. Certainly risks of 1 in 1000 constitute rare events, and many would classify risks of 1 in 100 the same way. However, the performance of methods when risks are as high as 1 in 10 may also be affected by the issues discussed in this section. What is typical is that a high proportion of the studies in the meta-analysis observe no events in one or more study arms.

10.4.4.1 Studies with no events in one or more arms

Computational problems can occur when no events are observed in one or both groups in an individual study. Inverse variance meta-analytical methods involve computing an intervention effect estimate and its standard error for each study. For studies where no events were observed in one or both arms, these computations often involve dividing by a zero count, which yields a computational error. Most meta-analytical software routines (including those in RevMan) automatically check for problematic zero counts, and add a fixed value (typically 0.5) to all cells of a 2×2 table where the problems occur. The Mantel-Haenszel methods require zero-cell corrections only if the same cell is zero in all the included studies, and hence need to use the correction less often. However, in many software applications the same correction rules are applied for Mantel-Haenszel methods as for the inverse-variance methods. Odds ratio and risk ratio methods require zero cell corrections more often than difference methods, except for the Peto odds ratio method, which encounters computation problems only in the extreme situation of no events occurring in all arms of all studies.

Whilst the fixed correction meets the objective of avoiding computational errors, it usually has the undesirable effect of biasing study estimates towards no difference and over-estimating the variances of study estimates (consequently, inappropriately down-weighting their contribution to the meta-analysis). Where the sizes of the study arms are unequal (which occurs more commonly in non-randomized studies than randomized trials), the fixed correction will introduce a directional bias in the treatment effect. Alternative non-fixed zero-cell corrections have been explored by Sweeting and colleagues, including a correction proportional to the reciprocal of the size of the contrasting study arm, which they found preferable to the fixed 0.5 correction when arm sizes were not balanced (Sweeting et al 2004).
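The two corrections can be contrasted with a short sketch. The 'reciprocal' variant below is one plausible reading of the Sweeting proposal (corrections proportional to the reciprocal of the opposite arm's size, normalized to sum to 1), not the authors' exact implementation; the trial data are hypothetical:

```python
import math

def corrected_log_or(a, b, c, d, correction="constant"):
    """Log odds ratio after a zero-cell correction: either the fixed
    0.5 added to every cell, or a correction proportional to the
    reciprocal of the contrasting arm's size (a sketch of the idea)."""
    n1, n2 = a + b, c + d
    if correction == "constant":
        k1 = k2 = 0.5
    else:
        # Proportional to 1/(opposite arm size), normalized to sum to 1:
        k1 = n1 / (n1 + n2)   # added to both cells of arm 1
        k2 = n2 / (n1 + n2)   # added to both cells of arm 2
    a, b = a + k1, b + k1
    c, d = c + k2, d + k2
    return math.log(a * d / (b * c))

# Hypothetical unbalanced trial with no events in the experimental arm.
lo_const = corrected_log_or(0, 200, 2, 50)
lo_recip = corrected_log_or(0, 200, 2, 50, correction="reciprocal")
```

With these unbalanced arms the two corrections give visibly different log odds ratios, illustrating why the choice matters when arm sizes are unequal.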

10.4.4.2 Studies with no events in either arm

The standard practice in meta-analysis of odds ratios and risk ratios is to exclude studies from the meta-analysis where there are no events in both arms. This is because such studies do not provide any indication of either the direction or magnitude of the relative treatment effect. Whilst it may be clear that events are very rare on both the experimental intervention and the comparator intervention, no information is provided as to which group is likely to have the higher risk, or on whether the risks are of the same or different orders of magnitude (when risks are very low, they are compatible with very large or very small ratios). Whilst one might be tempted to infer that the risk would be lowest in the group with the larger sample size (as the upper limit of the confidence interval would be lower), this is not justified as the sample size allocation was determined by the study investigators and is not a measure of the incidence of the event.

Risk difference methods superficially appear to have an advantage over odds ratio methods in that the risk difference is defined (as zero) when no events occur in either arm. Such studies are therefore included in the estimation process. Bradburn and colleagues undertook simulation studies which revealed that all risk difference methods yield confidence intervals that are too wide when events are rare, and have associated poor statistical power, which make them unsuitable for meta-analysis of rare events (Bradburn et al 2007). This is especially relevant when outcomes that focus on treatment safety are being studied, as the ability to identify correctly (or attempt to refute) serious adverse events is a key issue in drug development.

It is likely that outcomes for which no events occur in either arm may not be mentioned in reports of many randomized trials, precluding their inclusion in a meta-analysis. It is unclear, though, when working with published results, whether failure to mention a particular adverse event means there were no such events, or simply that such events were not included as a measured endpoint. Whilst the results of risk difference meta-analyses will be affected by non-reporting of outcomes with no events, odds and risk ratio based methods naturally exclude these data whether or not they are published, and are therefore unaffected.

10.4.4.3 Validity of methods of meta-analysis for rare events

Simulation studies have revealed that many meta-analytical methods can give misleading results for rare events, which is unsurprising given their reliance on asymptotic statistical theory. Their performance has been judged suboptimal either through results being biased, confidence intervals being inappropriately wide, or statistical power being too low to detect substantial differences.

In the following we consider the choice of statistical method for meta-analyses of odds ratios. Appropriate choices appear to depend on the comparator group risk, the likely size of the treatment effect and consideration of balance in the numbers of experimental and comparator participants in the constituent studies. We are not aware of research that has evaluated risk ratio measures directly, but their performance is likely to be very similar to corresponding odds ratio measurements. When events are rare, estimates of odds and risks are near identical, and results of both can be interpreted as ratios of probabilities.

Bradburn and colleagues found that many of the most commonly used meta-analytical methods were biased when events were rare (Bradburn et al 2007). The bias was greatest in inverse variance and DerSimonian and Laird odds ratio and risk difference methods, and the Mantel-Haenszel odds ratio method using a 0.5 zero-cell correction. As already noted, risk difference meta-analytical methods tended to show conservative confidence interval coverage and low statistical power when risks of events were low.

At event rates below 1% the Peto one-step odds ratio method was found to be the least biased and most powerful method, and provided the best confidence interval coverage, provided there was no substantial imbalance between treatment and comparator group sizes within studies, and treatment effects were not exceptionally large. This finding was consistently observed across three different meta-analytical scenarios, and was also observed by Sweeting and colleagues (Sweeting et al 2004).

This finding was noted despite the method producing only an approximation to the odds ratio. For very large effects (e.g. risk ratio=0.2) when the approximation is known to be poor, treatment effects were under-estimated, but the Peto method still had the best performance of all the methods considered for event risks of 1 in 1000, and the bias was never more than 6% of the comparator group risk.

In other circumstances (i.e. event risks above 1%, very large effects at event risks around 1%, and meta-analyses where many studies were substantially imbalanced) the best performing methods were the Mantel-Haenszel odds ratio without zero-cell corrections, logistic regression and an exact method. None of these methods is available in RevMan.

Methods that should be avoided with rare events are the inverse-variance methods (including the DerSimonian and Laird random-effects method) (Efthimiou 2018). These directly incorporate the study’s variance in the estimation of its contribution to the meta-analysis, but these are usually based on a large-sample variance approximation, which was not intended for use with rare events. We would suggest that incorporation of heterogeneity into an estimate of a treatment effect should be a secondary consideration when attempting to produce estimates of effects from sparse data – the primary concern is to discern whether there is any signal of an effect in the data.

10.5 Meta-analysis of continuous outcomes

An important assumption underlying standard methods for meta-analysis of continuous data is that the outcomes have a normal distribution in each intervention arm in each study. This assumption may not always be met, although it is unimportant in very large studies. It is useful to consider the possibility of skewed data (see Section 10.5.3 ).

10.5.1 Which effect measure for continuous outcomes?

The two summary statistics commonly used for meta-analysis of continuous data are the mean difference (MD) and the standardized mean difference (SMD). Other options are available, such as the ratio of means (see Chapter 6, Section 6.5.1 ). Selection of summary statistics for continuous data is principally determined by whether studies all report the outcome using the same scale (when the mean difference can be used) or using different scales (when the standardized mean difference is usually used). The ratio of means can be used in either situation, but is appropriate only when outcome measurements are strictly greater than zero. Further considerations in deciding on an effect measure that will facilitate interpretation of the findings appears in Chapter 15, Section 15.5 .

The different roles played in MD and SMD approaches by the standard deviations (SDs) of outcomes observed in the two groups should be understood.

For the mean difference approach, the SDs are used together with the sample sizes to compute the weight given to each study. Studies with small SDs are given relatively higher weight whilst studies with larger SDs are given relatively smaller weights. This is appropriate if variation in SDs between studies reflects differences in the reliability of outcome measurements, but is probably not appropriate if the differences in SD reflect real differences in the variability of outcomes in the study populations.

For the standardized mean difference approach, the SDs are used to standardize the mean differences to a single scale, as well as in the computation of study weights. Thus, studies with small SDs lead to relatively higher estimates of SMD, whilst studies with larger SDs lead to relatively smaller estimates of SMD. For this to be appropriate, it must be assumed that between-study variation in SDs reflects only differences in measurement scales and not differences in the reliability of outcome measures or variability among study populations, as discussed in Chapter 6, Section 6.5.1.2 .

These assumptions of the methods should be borne in mind when unexpected variation of SDs is observed across studies.
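The dependence of the SMD on the SD can be shown with a small sketch (Cohen's d form, dividing the mean difference by the pooled SD, without the small-sample Hedges correction; the data are hypothetical):

```python
import math

def standardized_mean_difference(m1, sd1, n1, m2, sd2, n2):
    """SMD in its Cohen's d form: the difference in means divided by
    the pooled standard deviation of the two groups."""
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    return (m1 - m2) / sd_pooled

# The same 5-point mean difference measured with different spreads:
smd_narrow = standardized_mean_difference(15, 5, 50, 10, 5, 50)    # SDs of 5
smd_wide = standardized_mean_difference(15, 10, 50, 10, 10, 50)    # SDs of 10
```

An identical mean difference yields an SMD twice as large when the SDs are halved, which is appropriate only if the smaller SDs reflect a different measurement scale rather than a less variable population.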

10.5.2 Meta-analysis of change scores

In some circumstances an analysis based on changes from baseline will be more efficient and powerful than comparison of post-intervention values, as it removes a component of between-person variability from the analysis. However, calculation of a change score requires measurement of the outcome twice and in practice may be less efficient for outcomes that are unstable or difficult to measure precisely, where the measurement error may be larger than true between-person baseline variability. Change-from-baseline outcomes may also be preferred if they have a less skewed distribution than post-intervention measurement outcomes. Although sometimes used as a device to ‘correct’ for unlucky randomization, this practice is not recommended.

The preferred statistical approach to accounting for baseline measurements of the outcome variable is to include the baseline outcome measurements as a covariate in a regression model or analysis of covariance (ANCOVA). These analyses produce an ‘adjusted’ estimate of the intervention effect together with its standard error. These analyses are the least frequently encountered, but as they give the most precise and least biased estimates of intervention effects they should be included in the analysis when they are available. However, they can only be included in a meta-analysis using the generic inverse-variance method, since means and SDs are not available for each intervention group separately.

In practice an author is likely to discover that the studies included in a review include a mixture of change-from-baseline and post-intervention value scores. However, mixing of outcomes is not a problem when it comes to meta-analysis of MDs. There is no statistical reason why studies with change-from-baseline outcomes should not be combined in a meta-analysis with studies with post-intervention measurement outcomes when using the (unstandardized) MD method. In a randomized study, MD based on changes from baseline can usually be assumed to be addressing exactly the same underlying intervention effects as analyses based on post-intervention measurements. That is to say, the difference in mean post-intervention values will on average be the same as the difference in mean change scores. If the use of change scores does increase precision, appropriately, the studies presenting change scores will be given higher weights in the analysis than they would have received if post-intervention values had been used, as they will have smaller SDs.

When combining the data on the MD scale, authors must be careful to use the appropriate means and SDs (either of post-intervention measurements or of changes from baseline) for each study. Since the mean values and SDs for the two types of outcome may differ substantially, it may be advisable to place them in separate subgroups to avoid confusion for the reader, but the results of the subgroups can legitimately be pooled together.

In contrast, post-intervention value and change scores should not in principle be combined using standard meta-analysis approaches when the effect measure is an SMD. This is because the SDs used in the standardization reflect different things. The SD when standardizing post-intervention values reflects between-person variability at a single point in time. The SD when standardizing change scores reflects variation in between-person changes over time, so will depend on both within-person and between-person variability; within-person variability in turn is likely to depend on the length of time between measurements. Nevertheless, an empirical study of 21 meta-analyses in osteoarthritis did not find a difference between combined SMDs based on post-intervention values and combined SMDs based on change scores (da Costa et al 2013). One option is to standardize SMDs using post-intervention SDs rather than change score SDs. This would lead to valid synthesis of the two approaches, but we are not aware that an appropriate standard error for this has been derived.

A common practical problem associated with including change-from-baseline measures is that the SD of changes is not reported. Imputation of SDs is discussed in Chapter 6, Section 6.5.2.8 .

10.5.3 Meta-analysis of skewed data

Analyses based on means are appropriate for data that are at least approximately normally distributed, and for data from very large trials. If the true distribution of outcomes is asymmetrical, then the data are said to be skewed. Review authors should consider the possibility and implications of skewed data when analysing continuous outcomes (see MECIR Box 10.5.a ). Skew can sometimes be diagnosed from the means and SDs of the outcomes. A rough check is available, but it is only valid if a lowest or highest possible value for an outcome is known to exist. Thus, the check may be used for outcomes such as weight, volume and blood concentrations, which have lowest possible values of 0, or for scale outcomes with minimum or maximum scores, but it may not be appropriate for change-from-baseline measures. The check involves calculating the observed mean minus the lowest possible value (or the highest possible value minus the observed mean), and dividing this by the SD. A ratio less than 2 suggests skew (Altman and Bland 1996). If the ratio is less than 1, there is strong evidence of a skewed distribution.
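The rough check described above is easily applied (a sketch; the function name, thresholds as stated in the text, and the example data are illustrative):

```python
def skew_check(mean, sd, lowest_possible=0.0):
    """Altman-Bland rough check for skew: (observed mean - lowest
    possible value) / SD. A ratio below 2 suggests skew; below 1 is
    strong evidence of a skewed distribution."""
    ratio = (mean - lowest_possible) / sd
    if ratio < 1:
        return ratio, "strong evidence of skew"
    if ratio < 2:
        return ratio, "skew suspected"
    return ratio, "no evidence of skew from this check"

# Hypothetical length-of-stay outcome in days (lowest possible value 0):
ratio, verdict = skew_check(mean=4.2, sd=3.5)
```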

Transformation of the original outcome data may reduce skew substantially. Reports of trials may present results on a transformed scale, usually a log scale. Collection of appropriate data summaries from the trialists, or acquisition of individual patient data, is currently the approach of choice. Appropriate data summaries and analysis strategies for the individual patient data will depend on the situation. Consultation with a knowledgeable statistician is advised.

Where data have been analysed on a log scale, results are commonly presented as geometric means and ratios of geometric means. A meta-analysis may be then performed on the scale of the log-transformed data; an example of the calculation of the required means and SD is given in Chapter 6, Section 6.5.2.4 . This approach depends on being able to obtain transformed data for all studies; methods for transforming from one scale to the other are available (Higgins et al 2008b). Log-transformed and untransformed data should not be mixed in a meta-analysis.

MECIR Box 10.5.a Relevant expectations for conduct of intervention reviews

10.6 Combining dichotomous and continuous outcomes

Occasionally authors encounter a situation where data for the same outcome are presented in some studies as dichotomous data and in other studies as continuous data. For example, scores on depression scales can be reported as means, or as the percentage of patients who were depressed at some point after an intervention (i.e. with a score above a specified cut-point). This type of information is often easier to understand, and more helpful, when it is dichotomized. However, deciding on a cut-point may be arbitrary, and information is lost when continuous data are transformed to dichotomous data.

There are several options for handling combinations of dichotomous and continuous data. Generally, it is useful to summarize results from all the relevant, valid studies in a similar way, but this is not always possible. It may be possible to collect missing data from investigators so that this can be done. If not, it may be useful to summarize the data in three ways: by entering the means and SDs as continuous outcomes, by entering the counts as dichotomous outcomes and by entering all of the data in text form as ‘Other data’ outcomes.

There are statistical approaches available that will re-express odds ratios as SMDs (and vice versa), allowing dichotomous and continuous data to be combined (Anzures-Cabrera et al 2011). A simple approach is as follows. Based on an assumption that the underlying continuous measurements in each intervention group follow a logistic distribution (which is a symmetrical distribution similar in shape to the normal distribution, but with more data in the distributional tails), and that the variability of the outcomes is the same in both experimental and comparator participants, the odds ratios can be re-expressed as a SMD according to the following simple formula (Chinn 2000):

SMD = (√3 / π) × ln(OR) ≈ 0.5513 × ln(OR)

The standard error of the log odds ratio can be converted to the standard error of a SMD by multiplying by the same constant (√3/π=0.5513). Alternatively SMDs can be re-expressed as log odds ratios by multiplying by π/√3=1.814. Once SMDs (or log odds ratios) and their standard errors have been computed for all studies in the meta-analysis, they can be combined using the generic inverse-variance method. Standard errors can be computed for all studies by entering the data as dichotomous and continuous outcome type data, as appropriate, and converting the confidence intervals for the resulting log odds ratios and SMDs into standard errors (see Chapter 6, Section 6.3 ).
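As an illustrative sketch (function names are my own), this re-expression can be written directly: multiplying a log odds ratio and its standard error by √3/π ≈ 0.5513 yields an approximate SMD, and dividing reverses the conversion:

```python
import math

SQRT3_OVER_PI = math.sqrt(3) / math.pi  # ≈ 0.5513; its reciprocal is ≈ 1.814

def log_or_to_smd(log_or, se_log_or):
    """Re-express a log odds ratio as an SMD (Chinn 2000 approximation)."""
    return log_or * SQRT3_OVER_PI, se_log_or * SQRT3_OVER_PI

def smd_to_log_or(smd, se_smd):
    """Re-express an SMD as a log odds ratio (the reverse conversion)."""
    return smd / SQRT3_OVER_PI, se_smd / SQRT3_OVER_PI
```

Once all studies are on a common scale, the converted estimates and standard errors can be combined with the generic inverse-variance method as described above.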

10.7 Meta-analysis of ordinal outcomes and measurement scales

Ordinal and measurement scale outcomes are most commonly meta-analysed as dichotomous data (if so, see Section 10.4 ) or continuous data (if so, see Section 10.5 ) depending on the way that the study authors performed the original analyses.

Occasionally it is possible to analyse the data using proportional odds models. This is the case when ordinal scales have a small number of categories, the numbers falling into each category for each intervention group can be obtained, and the same ordinal scale has been used in all studies. This approach may make more efficient use of all available data than dichotomization, but requires access to statistical software and results in a summary statistic for which it is challenging to find a clinical meaning.

The proportional odds model uses the proportional odds ratio as the measure of intervention effect (Agresti 1996) (see Chapter 6, Section 6.6 ), and can be used for conducting a meta-analysis in advanced statistical software packages (Whitehead and Jones 1994). Estimates of log odds ratios and their standard errors from a proportional odds model may be meta-analysed using the generic inverse-variance method (see Section 10.3.3 ). If the same ordinal scale has been used in all studies, but in some reports has been presented as a dichotomous outcome, it may still be possible to include all studies in the meta-analysis. In the context of the three-category model, this might mean that for some studies category 1 constitutes a success, while for others both categories 1 and 2 constitute a success. Methods are available for dealing with this, and for combining data from scales that are related but have different definitions for their categories (Whitehead and Jones 1994).

10.8 Meta-analysis of counts and rates

Results may be expressed as count data when each participant may experience an event, and may experience it more than once (see Chapter 6, Section 6.7 ). For example, ‘number of strokes’, or ‘number of hospital visits’ are counts. These events may not happen at all, but if they do happen there is no theoretical maximum number of occurrences for an individual. Count data may be analysed using methods for dichotomous data if the counts are dichotomized for each individual (see Section 10.4 ), continuous data (see Section 10.5 ) and time-to-event data (see Section 10.9 ), as well as being analysed as rate data.

Rate data occur if counts are measured for each participant along with the time over which they are observed. This is particularly appropriate when the events being counted are rare. For example, a woman may experience two strokes during a follow-up period of two years. Her rate of strokes is one per year of follow-up (or, equivalently 0.083 per month of follow-up). Rates are conventionally summarized at the group level. For example, participants in the comparator group of a clinical trial may experience 85 strokes during a total of 2836 person-years of follow-up. An underlying assumption associated with the use of rates is that the risk of an event is constant across participants and over time. This assumption should be carefully considered for each situation. For example, in contraception studies, rates have been used (known as Pearl indices) to describe the number of pregnancies per 100 women-years of follow-up. This is now considered inappropriate since couples have different risks of conception, and the risk for each woman changes over time. Pregnancies are now analysed more often using life tables or time-to-event methods that investigate the time elapsing before the first pregnancy.

Analysing count data as rates is not always the most appropriate approach and is uncommon in practice. This is because:

  • the assumption of a constant underlying risk may not be suitable; and
  • the statistical methods are not as well developed as they are for other types of data.

The results of a study may be expressed as a rate ratio , that is the ratio of the rate in the experimental intervention group to the rate in the comparator group. The (natural) logarithms of the rate ratios may be combined across studies using the generic inverse-variance method (see Section 10.3.3 ). Alternatively, Poisson regression approaches can be used (Spittal et al 2015).
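A hedged sketch of this calculation, using the comparator figures from the text and a hypothetical experimental group. The standard error formula √(1/e₁ + 1/e₂) is the usual large-sample Poisson approximation, not something stated in this chapter:

```python
import math

def log_rate_ratio(events_exp, time_exp, events_comp, time_comp):
    """Log rate ratio and an approximate standard error for two groups,
    assuming event counts are Poisson (SE = sqrt(1/e1 + 1/e2))."""
    log_rr = math.log((events_exp / time_exp) / (events_comp / time_comp))
    se = math.sqrt(1 / events_exp + 1 / events_comp)
    return log_rr, se

# Comparator figures are from the text (85 strokes over 2836 person-years);
# the experimental-group figures are hypothetical.
log_rr, se = log_rate_ratio(60, 2800, 85, 2836)
```

The log rate ratios and standard errors from each study can then be pooled with the generic inverse-variance method.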

In a randomized trial, rate ratios may often be very similar to risk ratios obtained after dichotomizing the participants, since the average period of follow-up should be similar in all intervention groups. Rate ratios and risk ratios will differ, however, if an intervention affects the likelihood of some participants experiencing multiple events.

It is possible also to focus attention on the rate difference (see Chapter 6, Section 6.7.1 ). The analysis again can be performed using the generic inverse-variance method (Hasselblad and McCrory 1995, Guevara et al 2004).

10.9 Meta-analysis of time-to-event outcomes

Two approaches to meta-analysis of time-to-event outcomes are readily available to Cochrane Review authors. The choice of which to use will depend on the type of data that have been extracted from the primary studies, or obtained from re-analysis of individual participant data.

If ‘O – E’ and ‘V’ statistics have been obtained (see Chapter 6, Section 6.8.2 ), either through re-analysis of individual participant data or from aggregate statistics presented in the study reports, then these statistics may be entered directly into RevMan using the ‘O – E and Variance’ outcome type. There are several ways to calculate these ‘O – E’ and ‘V’ statistics. Peto’s method applied to dichotomous data (Section 10.4.2 ) gives rise to an odds ratio; a log-rank approach gives rise to a hazard ratio; and a variation of the Peto method for analysing time-to-event data gives rise to something in between (Simmonds et al 2011). The appropriate effect measure should be specified. Only fixed-effect meta-analysis methods are available in RevMan for ‘O – E and Variance’ outcomes.
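For orientation, log-rank 'O – E' and 'V' statistics map onto a hazard ratio via log HR = (O – E)/V with standard error 1/√V, the standard approximation; a minimal sketch (the function name is my own):

```python
import math

def log_hr_from_oe_v(o_minus_e, v):
    """Log hazard ratio and standard error from log-rank 'O - E' and 'V':
    log HR = (O - E) / V, SE = 1 / sqrt(V)."""
    return o_minus_e / v, 1 / math.sqrt(v)
```

A negative 'O – E' (fewer observed than expected events in the experimental group) gives a hazard ratio below 1, favouring the experimental intervention for a harmful event.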

Alternatively, if estimates of log hazard ratios and standard errors have been obtained from results of Cox proportional hazards regression models, study results can be combined using generic inverse-variance methods (see Section 10.3.3 ).

If a mixture of log-rank and Cox model estimates are obtained from the studies, all results can be combined using the generic inverse-variance method, as the log-rank estimates can be converted into log hazard ratios and standard errors using the approaches discussed in Chapter 6, Section 6.8 .

10.10 Heterogeneity

10.10.1 What is heterogeneity?

Inevitably, studies brought together in a systematic review will differ. Any kind of variability among studies in a systematic review may be termed heterogeneity. It can be helpful to distinguish between different types of heterogeneity. Variability in the participants, interventions and outcomes studied may be described as clinical diversity (sometimes called clinical heterogeneity), and variability in study design, outcome measurement tools and risk of bias may be described as methodological diversity (sometimes called methodological heterogeneity). Variability in the intervention effects being evaluated in the different studies is known as statistical heterogeneity , and is a consequence of clinical or methodological diversity, or both, among the studies. Statistical heterogeneity manifests itself in the observed intervention effects being more different from each other than one would expect due to random error (chance) alone. We will follow convention and refer to statistical heterogeneity simply as heterogeneity .

Clinical variation will lead to heterogeneity if the intervention effect is affected by the factors that vary across studies; most obviously, the specific interventions or patient characteristics. In other words, the true intervention effect will be different in different studies.

Differences between studies in terms of methodological factors, such as use of blinding and concealment of allocation sequence, or if there are differences between studies in the way the outcomes are defined and measured, may be expected to lead to differences in the observed intervention effects. Significant statistical heterogeneity arising from methodological diversity or differences in outcome assessments suggests that the studies are not all estimating the same quantity, but does not necessarily suggest that the true intervention effect varies. In particular, heterogeneity associated solely with methodological diversity would indicate that the studies suffer from different degrees of bias. Empirical evidence suggests that some aspects of design can affect the result of clinical trials, although this is not always the case. Further discussion appears in Chapter 7 and Chapter 8 .

The scope of a review will largely determine the extent to which studies included in a review are diverse. Sometimes a review will include studies addressing a variety of questions, for example when several different interventions for the same condition are of interest (see also Chapter 11 ) or when the differential effects of an intervention in different populations are of interest. Meta-analysis should only be considered when a group of studies is sufficiently homogeneous in terms of participants, interventions and outcomes to provide a meaningful summary (see MECIR Box 10.10.a. ). It is often appropriate to take a broader perspective in a meta-analysis than in a single clinical trial. A common analogy is that systematic reviews bring together apples and oranges, and that combining these can yield a meaningless result. This is true if apples and oranges are of intrinsic interest on their own, but may not be if they are used to contribute to a wider question about fruit. For example, a meta-analysis may reasonably evaluate the average effect of a class of drugs by combining results from trials where each evaluates the effect of a different drug from the class.

MECIR Box 10.10.a Relevant expectations for conduct of intervention reviews

There may be specific interest in a review in investigating how clinical and methodological aspects of studies relate to their results. Where possible these investigations should be specified a priori (i.e. in the protocol for the systematic review). It is legitimate for a systematic review to focus on examining the relationship between some clinical characteristic(s) of the studies and the size of intervention effect, rather than on obtaining a summary effect estimate across a series of studies (see Section 10.11 ). Meta-regression may best be used for this purpose, although it is not implemented in RevMan (see Section 10.11.4 ).

10.10.2 Identifying and measuring heterogeneity

It is essential to consider the extent to which the results of studies are consistent with each other (see MECIR Box 10.10.b ). If confidence intervals for the results of individual studies (generally depicted graphically using horizontal lines) have poor overlap, this generally indicates the presence of statistical heterogeneity. More formally, a statistical test for heterogeneity is available. This Chi² (χ², or chi-squared) test is included in the forest plots in Cochrane Reviews. It assesses whether observed differences in results are compatible with chance alone. A low P value (or a large Chi² statistic relative to its degrees of freedom) provides evidence of heterogeneity of intervention effects (variation in effect estimates beyond chance).

MECIR Box 10.10.b Relevant expectations for conduct of intervention reviews

Care must be taken in the interpretation of the Chi 2 test, since it has low power in the (common) situation of a meta-analysis when studies have small sample size or are few in number. This means that while a statistically significant result may indicate a problem with heterogeneity, a non-significant result must not be taken as evidence of no heterogeneity. This is also why a P value of 0.10, rather than the conventional level of 0.05, is sometimes used to determine statistical significance. A further problem with the test, which seldom occurs in Cochrane Reviews, is that when there are many studies in a meta-analysis, the test has high power to detect a small amount of heterogeneity that may be clinically unimportant.

Some argue that, since clinical and methodological diversity always occur in a meta-analysis, statistical heterogeneity is inevitable (Higgins et al 2003). Thus, the test for heterogeneity is irrelevant to the choice of analysis; heterogeneity will always exist whether or not we happen to be able to detect it using a statistical test. Methods have been developed for quantifying inconsistency across studies that move the focus away from testing whether heterogeneity is present to assessing its impact on the meta-analysis. A useful statistic for quantifying inconsistency is:

I² = ((Q − df) / Q) × 100%

In this equation, Q is the Chi 2 statistic and df is its degrees of freedom (Higgins and Thompson 2002, Higgins et al 2003). I 2 describes the percentage of the variability in effect estimates that is due to heterogeneity rather than sampling error (chance).
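The computation of Q, its degrees of freedom, and I² from study estimates and standard errors can be sketched in a few lines (the function name is hypothetical; negative values of I² are truncated to zero, as is conventional):

```python
def heterogeneity(effects, ses):
    """Cochran's Q, its degrees of freedom, and the I-squared statistic,
    given study effect estimates (e.g. log odds ratios) and standard errors."""
    weights = [1 / s ** 2 for s in ses]           # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # truncate at zero
    return q, df, i2
```

When the study results are identical, Q and I² are both zero; widely discrepant, precisely estimated results push I² towards 100%.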

Thresholds for the interpretation of the I 2 statistic can be misleading, since the importance of inconsistency depends on several factors. A rough guide to interpretation in the context of meta-analyses of randomized trials is as follows:

  • 0% to 40%: might not be important;
  • 30% to 60%: may represent moderate heterogeneity*;
  • 50% to 90%: may represent substantial heterogeneity*;
  • 75% to 100%: considerable heterogeneity*.

*The importance of the observed value of I 2 depends on (1) magnitude and direction of effects, and (2) strength of evidence for heterogeneity (e.g. P value from the Chi 2 test, or a confidence interval for I 2 : uncertainty in the value of I 2 is substantial when the number of studies is small).

10.10.3 Strategies for addressing heterogeneity

Review authors must take into account any statistical heterogeneity when interpreting results, particularly when there is variation in the direction of effect (see MECIR Box 10.10.c ). A number of options are available if heterogeneity is identified among a group of studies that would otherwise be considered suitable for a meta-analysis.

MECIR Box 10.10.c  Relevant expectations for conduct of intervention reviews

  • Check again that the data are correct. Severe apparent heterogeneity can indicate that data have been incorrectly extracted or entered into meta-analysis software. For example, if standard errors have mistakenly been entered as SDs for continuous outcomes, this could manifest itself in overly narrow confidence intervals with poor overlap and hence substantial heterogeneity. Unit-of-analysis errors may also be causes of heterogeneity (see Chapter 6, Section 6.2 ).  
  • Do not do a meta-analysis. A systematic review need not contain any meta-analyses. If there is considerable variation in results, and particularly if there is inconsistency in the direction of effect, it may be misleading to quote an average value for the intervention effect.  
  • Explore heterogeneity. It is clearly of interest to determine the causes of heterogeneity among results of studies. This process is problematic since there are often many characteristics that vary across studies from which one may choose. Heterogeneity may be explored by conducting subgroup analyses (see Section 10.11.3 ) or meta-regression (see Section 10.11.4 ). Reliable conclusions can only be drawn from analyses that are truly pre-specified before inspecting the studies’ results, and even these conclusions should be interpreted with caution. Explorations of heterogeneity that are devised after heterogeneity is identified can at best lead to the generation of hypotheses. They should be interpreted with even more caution and should generally not be listed among the conclusions of a review. Also, investigations of heterogeneity when there are very few studies are of questionable value.  
  • Ignore heterogeneity. Fixed-effect meta-analyses ignore heterogeneity. The summary effect estimate from a fixed-effect meta-analysis is normally interpreted as being the best estimate of the intervention effect. However, the existence of heterogeneity suggests that there may not be a single intervention effect but a variety of intervention effects. Thus, the summary fixed-effect estimate may be an intervention effect that does not actually exist in any population, and therefore have a confidence interval that is meaningless as well as being too narrow (see Section 10.10.4 ).  
  • Perform a random-effects meta-analysis. A random-effects meta-analysis may be used to incorporate heterogeneity among studies. This is not a substitute for a thorough investigation of heterogeneity. It is intended primarily for heterogeneity that cannot be explained. An extended discussion of this option appears in Section 10.10.4 .  
  • Reconsider the effect measure. Heterogeneity may be an artificial consequence of an inappropriate choice of effect measure. For example, when studies collect continuous outcome data using different scales or different units, extreme heterogeneity may be apparent when using the mean difference but not when the more appropriate standardized mean difference is used. Furthermore, choice of effect measure for dichotomous outcomes (odds ratio, risk ratio, or risk difference) may affect the degree of heterogeneity among results. In particular, when comparator group risks vary, homogeneous odds ratios or risk ratios will necessarily lead to heterogeneous risk differences, and vice versa. However, it remains unclear whether homogeneity of intervention effect in a particular meta-analysis is a suitable criterion for choosing between these measures (see also Section 10.4.3 ).  
  • Exclude studies. Heterogeneity may be due to the presence of one or two outlying studies with results that conflict with the rest of the studies. In general it is unwise to exclude studies from a meta-analysis on the basis of their results as this may introduce bias. However, if an obvious reason for the outlying result is apparent, the study might be removed with more confidence. Since usually at least one characteristic can be found for any study in any meta-analysis which makes it different from the others, this criterion is unreliable because it is all too easy to fulfil. It is advisable to perform analyses both with and without outlying studies as part of a sensitivity analysis (see Section 10.14 ). Whenever possible, potential sources of clinical diversity that might lead to such situations should be specified in the protocol.

10.10.4 Incorporating heterogeneity into random-effects models

The random-effects meta-analysis approach incorporates an assumption that the different studies are estimating different, yet related, intervention effects (DerSimonian and Laird 1986, Borenstein et al 2010). The approach allows us to address heterogeneity that cannot readily be explained by other factors. A random-effects meta-analysis model involves an assumption that the effects being estimated in the different studies follow some distribution. The model represents our lack of knowledge about why real, or apparent, intervention effects differ, by considering the differences as if they were random. The centre of the assumed distribution describes the average of the effects, while its width describes the degree of heterogeneity. The conventional choice of distribution is a normal distribution. It is difficult to establish the validity of any particular distributional assumption, and this is a common criticism of random-effects meta-analyses. The importance of the assumed shape for this distribution has not been widely studied.

To undertake a random-effects meta-analysis, the standard errors of the study-specific estimates (SE i in Section 10.3.1 ) are adjusted to incorporate a measure of the extent of variation, or heterogeneity, among the intervention effects observed in different studies (this variation is often referred to as Tau-squared, τ 2 , or Tau 2 ). The amount of variation, and hence the adjustment, can be estimated from the intervention effects and standard errors of the studies included in the meta-analysis.
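A minimal pure-Python sketch of the moment-based (DerSimonian and Laird) estimate of Tau² and the resulting random-effects pooled estimate, assuming inverse-variance weighting as described above (the function name is my own):

```python
import math

def dersimonian_laird(effects, ses):
    """Moment-based (DerSimonian-Laird) random-effects meta-analysis:
    estimate Tau^2 from Q, then pool with weights 1 / (SE_i^2 + Tau^2)."""
    w = [1 / s ** 2 for s in ses]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                 # truncated at zero
    w_star = [1 / (s ** 2 + tau2) for s in ses]   # adjusted weights
    mean = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se_mean = math.sqrt(1 / sum(w_star))
    return mean, se_mean, tau2
```

With no heterogeneity (Q ≤ df) the estimate of Tau² is zero and the result coincides with the fixed-effect analysis; with heterogeneity, the standard error of the pooled mean is larger and the weights are more nearly equal across studies.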

In a heterogeneous set of studies, a random-effects meta-analysis will award relatively more weight to smaller studies than such studies would receive in a fixed-effect meta-analysis. This is because small studies are more informative for learning about the distribution of effects across studies than for learning about an assumed common intervention effect.

Note that a random-effects model does not ‘take account’ of the heterogeneity, in the sense that it is no longer an issue. It is always preferable to explore possible causes of heterogeneity, although there may be too few studies to do this adequately (see Section 10.11 ).

10.10.4.1 Fixed or random effects?

A fixed-effect meta-analysis provides a result that may be viewed as a ‘typical intervention effect’ from the studies included in the analysis. In order to calculate a confidence interval for a fixed-effect meta-analysis the assumption is usually made that the true effect of intervention (in both magnitude and direction) is the same value in every study (i.e. fixed across studies). This assumption implies that the observed differences among study results are due solely to the play of chance (i.e. that there is no statistical heterogeneity).

A random-effects model provides a result that may be viewed as an ‘average intervention effect’, where this average is explicitly defined according to an assumed distribution of effects across studies. Instead of assuming that the intervention effects are the same, we assume that they follow (usually) a normal distribution. The assumption implies that the observed differences among study results are due to a combination of the play of chance and some genuine variation in the intervention effects.

The random-effects method and the fixed-effect method will give identical results when there is no heterogeneity among the studies.

When heterogeneity is present, a confidence interval around the random-effects summary estimate is wider than a confidence interval around a fixed-effect summary estimate. This will happen whenever the I 2 statistic is greater than zero, even if the heterogeneity is not detected by the Chi 2 test for heterogeneity (see Section 10.10.2 ).

Sometimes the central estimate of the intervention effect is different between fixed-effect and random-effects analyses. In particular, if results of smaller studies are systematically different from results of larger ones, which can happen as a result of publication bias or within-study bias in smaller studies (Egger et al 1997, Poole and Greenland 1999, Kjaergard et al 2001), then a random-effects meta-analysis will exacerbate the effects of the bias (see also Chapter 13, Section 13.3.5.6 ). A fixed-effect analysis will be affected less, although strictly it will also be inappropriate.

The decision between fixed- and random-effects meta-analyses has been the subject of much debate, and we do not provide a universal recommendation. Some considerations in making this choice are as follows:

  • Many have argued that the decision should be based on an expectation of whether the intervention effects are truly identical, preferring the fixed-effect model if this is likely and a random-effects model if this is unlikely (Borenstein et al 2010). Since it is generally considered to be implausible that intervention effects across studies are identical (unless the intervention has no effect at all), this leads many to advocate use of the random-effects model.
  • Others have argued that a fixed-effect analysis can be interpreted in the presence of heterogeneity, and that it makes fewer assumptions than a random-effects meta-analysis. They then refer to it as a ‘fixed-effects’ meta-analysis (Peto et al 1995, Rice et al 2018).
  • Under any interpretation, a fixed-effect meta-analysis ignores heterogeneity. If the method is used, it is therefore important to supplement it with a statistical investigation of the extent of heterogeneity (see Section 10.10.2 ).
  • In the presence of heterogeneity, a random-effects analysis gives relatively more weight to smaller studies and relatively less weight to larger studies. If there is additionally some funnel plot asymmetry (i.e. a relationship between intervention effect magnitude and study size), then this will push the results of the random-effects analysis towards the findings in the smaller studies. In the context of randomized trials, this is generally regarded as an unfortunate consequence of the model.
  • A pragmatic approach is to plan to undertake both a fixed-effect and a random-effects meta-analysis, with an intention to present the random-effects result if there is no indication of funnel plot asymmetry. If there is an indication of funnel plot asymmetry, then both methods are problematic. It may be reasonable to present both analyses or neither, or to perform a sensitivity analysis in which small studies are excluded or addressed directly using meta-regression (see Chapter 13, Section 13.3.5.6 ).
  • The choice between a fixed-effect and a random-effects meta-analysis should never be made on the basis of a statistical test for heterogeneity.

10.10.4.2 Interpretation of random-effects meta-analyses

The summary estimate and confidence interval from a random-effects meta-analysis refer to the centre of the distribution of intervention effects, but do not describe the width of the distribution. Often the summary estimate and its confidence interval are quoted in isolation and portrayed as a sufficient summary of the meta-analysis. This is inappropriate. The confidence interval from a random-effects meta-analysis describes uncertainty in the location of the mean of systematically different effects in the different studies. It does not describe the degree of heterogeneity among studies, as may be commonly believed. For example, when there are many studies in a meta-analysis, we may obtain a very tight confidence interval around the random-effects estimate of the mean effect even when there is a large amount of heterogeneity. A solution to this problem is to consider a prediction interval (see Section 10.10.4.3 ).

Methodological diversity creates heterogeneity through biases variably affecting the results of different studies. The random-effects summary estimate will only correctly estimate the average intervention effect if the biases are symmetrically distributed, leading to a mixture of over-estimates and under-estimates of effect, which is unlikely to be the case. In practice it can be very difficult to distinguish whether heterogeneity results from clinical or methodological diversity, and in most cases it is likely to be due to both, so these distinctions are hard to draw in the interpretation.

When there is little information, either because there are few studies or if the studies are small with few events, a random-effects analysis will provide poor estimates of the amount of heterogeneity (i.e. of the width of the distribution of intervention effects). Fixed-effect methods such as the Mantel-Haenszel method will provide more robust estimates of the average intervention effect, but at the cost of ignoring any heterogeneity.

10.10.4.3 Prediction intervals from a random-effects meta-analysis

An estimate of the between-study variance in a random-effects meta-analysis is typically presented as part of its results. The square root of this number (i.e. Tau) is the estimated standard deviation of underlying effects across studies. Prediction intervals are a way of expressing this value in an interpretable way.

To motivate the idea of a prediction interval, note that for absolute measures of effect (e.g. risk difference, mean difference, standardized mean difference), an approximate 95% range of normally distributed underlying effects can be obtained by creating an interval from 1.96×Tau below the random-effects mean, to 1.96×Tau above it. (For relative measures such as the odds ratio and risk ratio, an equivalent interval needs to be based on the natural logarithm of the summary estimate.) In reality, both the summary estimate and the value of Tau are associated with uncertainty. A prediction interval seeks to present the range of effects in a way that acknowledges this uncertainty (Higgins et al 2009). A simple 95% prediction interval can be calculated as:

M ± t_(k−2) × √(Tau² + SE(M)²)

where M is the summary mean from the random-effects meta-analysis, t k −2 is the 97.5th percentile of a t -distribution with k –2 degrees of freedom, k is the number of studies, Tau 2 is the estimated amount of heterogeneity and SE( M ) is the standard error of the summary mean.
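A small sketch of this calculation (the function name is mine; the t critical value, e.g. about 2.306 for k = 10 studies, i.e. 8 degrees of freedom, is supplied by the caller rather than computed):

```python
import math

def prediction_interval(mean, se_mean, tau2, t_crit):
    """Approximate 95% prediction interval (Higgins et al 2009):
    M +/- t * sqrt(Tau^2 + SE(M)^2), where t_crit is the 97.5th percentile
    of a t-distribution with k - 2 degrees of freedom."""
    half_width = t_crit * math.sqrt(tau2 + se_mean ** 2)
    return mean - half_width, mean + half_width

# Illustrative values: summary mean 0.5, SE 0.1, Tau^2 0.04, k = 10 studies.
low, high = prediction_interval(0.5, 0.1, 0.04, 2.306)
```

Because the interval incorporates Tau as well as SE(M), it is always at least as wide as the confidence interval for the summary mean, and usually much wider when heterogeneity is present.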

The term ‘prediction interval’ relates to the use of this interval to predict the possible underlying effect in a new study that is similar to the studies in the meta-analysis. A more useful interpretation of the interval is as a summary of the spread of underlying effects in the studies included in the random-effects meta-analysis.

Prediction intervals have proved a popular way of expressing the amount of heterogeneity in a meta-analysis (Riley et al 2011). They are, however, strongly based on the assumption of a normal distribution for the effects across studies, and can be very problematic when the number of studies is small, in which case they can appear spuriously wide or spuriously narrow. Nevertheless, we encourage their use when the number of studies is reasonable (e.g. more than ten) and there is no clear funnel plot asymmetry.

10.10.4.4 Implementing random-effects meta-analyses

As introduced in Section 10.3.2 , the random-effects model can be implemented using an inverse-variance approach, incorporating a measure of the extent of heterogeneity into the study weights. RevMan implements a version of random-effects meta-analysis that is described by DerSimonian and Laird, making use of a ‘moment-based’ estimate of the between-study variance (DerSimonian and Laird 1986). The attraction of this method is that the calculations are straightforward, but it has a theoretical disadvantage in that the confidence intervals are slightly too narrow to encompass full uncertainty resulting from having estimated the degree of heterogeneity.

For many years, RevMan has implemented two random-effects methods for dichotomous data: a Mantel-Haenszel method and an inverse-variance method. Both use the moment-based approach to estimating the amount of between-studies variation. The difference between the two is subtle: the former estimates the between-study variation by comparing each study’s result with a Mantel-Haenszel fixed-effect meta-analysis result, whereas the latter estimates it by comparing each study’s result with an inverse-variance fixed-effect meta-analysis result. In practice, the difference is likely to be trivial.

There are alternative methods for performing random-effects meta-analyses that have better technical properties than the DerSimonian and Laird approach with a moment-based estimate (Veroniki et al 2016). Most notable among these is an adjustment to the confidence interval proposed by Hartung and Knapp and by Sidik and Jonkman (Hartung and Knapp 2001, Sidik and Jonkman 2002). This adjustment widens the confidence interval to reflect uncertainty in the estimation of between-study heterogeneity, and review authors should use it where it is available. An alternative option to encompass full uncertainty in the degree of heterogeneity is to take a Bayesian approach (see Section 10.13).
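The Hartung-Knapp(-Sidik-Jonkman) adjustment replaces the usual variance of the pooled estimate with a weighted residual variance, and refers the confidence interval to a t distribution on k−1 degrees of freedom. A minimal sketch, assuming tau² has already been estimated separately (all numbers hypothetical):

```python
import math

def hartung_knapp_se(estimates, variances, tau2):
    """Hartung-Knapp(-Sidik-Jonkman) standard error for the random-effects
    pooled estimate: a weighted residual variance replaces 1/sum(w), and
    the confidence interval then uses a t distribution on k-1 df."""
    w = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    k = len(estimates)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, estimates))
    var_hk = q / ((k - 1) * sum(w))
    return mu, math.sqrt(var_hk)

# Same hypothetical data as a DerSimonian-Laird analysis might use, with
# an assumed tau^2 of 0.0186; the t critical value on 3 df is about 3.182.
mu, se_hk = hartung_knapp_se([-0.5, -0.2, 0.1, -0.4],
                             [0.04, 0.09, 0.06, 0.16], 0.0186)
ci = (mu - 3.182 * se_hk, mu + 3.182 * se_hk)
```

Because the t multiplier is larger than 1.96, the resulting interval is typically wider than the unadjusted random-effects interval, especially when the number of studies is small.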

An empirical comparison of different ways to estimate between-study variation in Cochrane meta-analyses has shown that they can lead to substantial differences in estimates of heterogeneity, but seldom have major implications for estimating summary effects (Langan et al 2015). Several simulation studies have concluded that an approach proposed by Paule and Mandel should be recommended (Langan et al 2017); whereas a comprehensive recent simulation study recommended a restricted maximum likelihood approach, although noted that no single approach is universally preferable (Langan et al 2019). Review authors are encouraged to select one of these options if it is available to them.

10.11 Investigating heterogeneity

10.11.1 Interaction and effect modification

Does the intervention effect vary with different populations or intervention characteristics (such as dose or duration)? Such variation is known as interaction by statisticians and as effect modification by epidemiologists. Methods to search for such interactions include subgroup analyses and meta-regression. All methods have considerable pitfalls.

10.11.2 What are subgroup analyses?

Subgroup analyses involve splitting all the participant data into subgroups, often in order to make comparisons between them. Subgroup analyses may be done for subsets of participants (such as males and females), or for subsets of studies (such as different geographical locations). Subgroup analyses may be done as a means of investigating heterogeneous results, or to answer specific questions about particular patient groups, types of intervention or types of study.

Subgroup analyses of subsets of participants within studies are uncommon in systematic reviews based on published literature because sufficient details to extract data about separate participant types are seldom published in reports. By contrast, such subsets of participants are easily analysed when individual participant data have been collected (see Chapter 26 ). The methods we describe in the remainder of this chapter are for subgroups of studies.

Findings from multiple subgroup analyses may be misleading. Subgroup analyses are observational by nature and are not based on randomized comparisons. False negative and false positive significance tests increase in likelihood rapidly as more subgroup analyses are performed. If their findings are presented as definitive conclusions there is clearly a risk of people being denied an effective intervention or treated with an ineffective (or even harmful) intervention. Subgroup analyses can also generate misleading recommendations about directions for future research that, if followed, would waste scarce resources.

It is useful to distinguish between the notions of ‘qualitative interaction’ and ‘quantitative interaction’ (Yusuf et al 1991). Qualitative interaction exists if the direction of effect is reversed, that is if an intervention is beneficial in one subgroup but is harmful in another. Qualitative interaction is rare. This may be used as an argument that the most appropriate result of a meta-analysis is the overall effect across all subgroups. Quantitative interaction exists when the size of the effect varies but not the direction, that is if an intervention is beneficial to different degrees in different subgroups.

10.11.3 Undertaking subgroup analyses

Meta-analyses can be undertaken in RevMan both within subgroups of studies and across all studies irrespective of their subgroup membership. It is tempting to compare effect estimates in different subgroups by considering the meta-analysis results from each subgroup separately. This should only be done informally by comparing the magnitudes of effect. Noting that either the effect or the test for heterogeneity in one subgroup is statistically significant whilst that in the other subgroup is not statistically significant does not indicate that the subgroup factor explains heterogeneity. Since different subgroups are likely to contain different amounts of information and thus have different abilities to detect effects, it is extremely misleading simply to compare the statistical significance of the results.

10.11.3.1 Is the effect different in different subgroups?

Valid investigations of whether an intervention works differently in different subgroups involve comparing the subgroups with each other. It is a mistake to compare within-subgroup inferences such as P values. If one subgroup analysis is statistically significant and another is not, then the latter may simply reflect a lack of information rather than a smaller (or absent) effect. When there are only two subgroups, non-overlap of the confidence intervals indicates statistical significance, but note that the confidence intervals can overlap to a small degree while the difference is still statistically significant.

A formal statistical approach should be used to examine differences among subgroups (see MECIR Box 10.11.a). A simple significance test to investigate differences between two or more subgroups can be performed (Borenstein and Higgins 2013). This procedure consists of undertaking a standard test for heterogeneity across subgroup results rather than across individual study results. When the meta-analysis uses a fixed-effect inverse-variance weighted average approach, the method is exactly equivalent to the test described by Deeks and colleagues (Deeks et al 2001). An I² statistic is also computed for subgroup differences. This describes the percentage of the variability in effect estimates from the different subgroups that is due to genuine subgroup differences rather than sampling error (chance). Note that these methods for examining subgroup differences should be used only when the data in the subgroups are independent (i.e. they should not be used if the same study participants contribute to more than one of the subgroups in the forest plot).
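The test for subgroup differences can be sketched as an ordinary heterogeneity test applied to the subgroup summary estimates rather than to individual studies (the subgroup estimates below are hypothetical):

```python
import math

def subgroup_difference_test(subgroup_estimates, subgroup_ses):
    """Test for subgroup differences: a standard heterogeneity (Q) test
    across subgroup summary estimates, plus an I^2 statistic describing
    the proportion of between-subgroup variability not due to chance."""
    w = [1.0 / se ** 2 for se in subgroup_ses]
    mu = sum(wi * yi for wi, yi in zip(w, subgroup_estimates)) / sum(w)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, subgroup_estimates))
    df = len(subgroup_estimates) - 1
    i2 = max(0.0, 100.0 * (q - df) / q) if q > 0 else 0.0
    return q, df, i2

# Hypothetical subgroup summaries (log odds ratios and their SEs)
q, df, i2 = subgroup_difference_test([-0.60, -0.10], [0.15, 0.12])
# q is referred to a chi-squared distribution on df degrees of freedom
```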

If fixed-effect models are used for the analysis within each subgroup, then these statistics relate to differences in typical effects across different subgroups. If random-effects models are used for the analysis within each subgroup, then the statistics relate to variation in the mean effects in the different subgroups.

An alternative method for testing for differences between subgroups is to use meta-regression techniques, in which case a random-effects model is generally preferred (see Section 10.11.4 ). Tests for subgroup differences based on random-effects models may be regarded as preferable to those based on fixed-effect models, due to the high risk of false-positive results when a fixed-effect model is used to compare subgroups (Higgins and Thompson 2004).

MECIR Box 10.11.a Relevant expectations for conduct of intervention reviews

10.11.4 Meta-regression

If studies are divided into subgroups (see Section 10.11.2 ), this may be viewed as an investigation of how a categorical study characteristic is associated with the intervention effects in the meta-analysis. For example, studies in which allocation sequence concealment was adequate may yield different results from those in which it was inadequate. Here, allocation sequence concealment, being either adequate or inadequate, is a categorical characteristic at the study level. Meta-regression is an extension to subgroup analyses that allows the effect of continuous, as well as categorical, characteristics to be investigated, and in principle allows the effects of multiple factors to be investigated simultaneously (although this is rarely possible due to inadequate numbers of studies) (Thompson and Higgins 2002). Meta-regression should generally not be considered when there are fewer than ten studies in a meta-analysis.

Meta-regressions are similar in essence to simple regressions, in which an outcome variable is predicted according to the values of one or more explanatory variables. In meta-regression, the outcome variable is the effect estimate (for example, a mean difference, a risk difference, a log odds ratio or a log risk ratio). The explanatory variables are characteristics of studies that might influence the size of intervention effect. These are often called ‘potential effect modifiers’ or covariates. Meta-regressions usually differ from simple regressions in two ways. First, larger studies have more influence on the relationship than smaller studies, since studies are weighted by the precision of their respective effect estimate. Second, it is wise to allow for the residual heterogeneity among intervention effects not modelled by the explanatory variables. This gives rise to the term ‘random-effects meta-regression’, since the extra variability is incorporated in the same way as in a random-effects meta-analysis (Thompson and Sharp 1999).

The regression coefficient obtained from a meta-regression analysis will describe how the outcome variable (the intervention effect) changes with a unit increase in the explanatory variable (the potential effect modifier). The statistical significance of the regression coefficient is a test of whether there is a linear relationship between intervention effect and the explanatory variable. If the intervention effect is a ratio measure, the log-transformed value of the intervention effect should always be used in the regression model (see Chapter 6, Section 6.1.2.1 ), and the exponential of the regression coefficient will give an estimate of the relative change in intervention effect with a unit increase in the explanatory variable.

Meta-regression can also be used to investigate differences for categorical explanatory variables as done in subgroup analyses. If there are J subgroups, membership of particular subgroups is indicated by using J minus 1 dummy variables (which can only take values of zero or one) in the meta-regression model (as in standard linear regression modelling). The regression coefficients will estimate how the intervention effect in each subgroup differs from a nominated reference subgroup. The P value of each regression coefficient will indicate the strength of evidence against the null hypothesis that the characteristic is not associated with the intervention effect.

Meta-regression may be performed using the ‘metareg’ macro available for the Stata statistical package, or using the ‘metafor’ package for R, as well as other packages.
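For orientation, the core calculation those packages perform can be sketched as a weighted least-squares fit with one covariate. This simplified version takes the residual between-study variance tau² as given (defaulting to zero, i.e. a fixed-effect meta-regression), whereas real implementations such as metafor estimate it; all data are hypothetical:

```python
import math

def meta_regression(effects, variances, covariate, tau2=0.0):
    """Weighted least-squares sketch of a meta-regression with one
    explanatory variable. Studies are weighted by the inverse of their
    variance plus the residual between-study variance tau2.
    Returns the intercept, slope and the slope's standard error."""
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, covariate)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(wi * (xi - xbar) * (yi - ybar)
              for wi, xi, yi in zip(w, covariate, effects))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    se_slope = math.sqrt(1.0 / sxx)
    return intercept, slope, se_slope

# Hypothetical log risk ratios regressed on dose (mg), equal variances
b0, b1, se = meta_regression([-0.1, -0.3, -0.5, -0.7],
                             [0.04, 0.04, 0.04, 0.04],
                             [10, 20, 30, 40])
```

The slope b1 estimates the change in the (log-scale) intervention effect per unit increase in dose, as described above.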

10.11.5 Selection of study characteristics for subgroup analyses and meta-regression

Authors need to be cautious about undertaking subgroup analyses, and interpreting any that they do. Some considerations are outlined here for selecting characteristics (also called explanatory variables, potential effect modifiers or covariates) that will be investigated for their possible influence on the size of the intervention effect. These considerations apply similarly to subgroup analyses and to meta-regressions. Further details may be obtained elsewhere (Oxman and Guyatt 1992, Berlin and Antman 1994).

10.11.5.1 Ensure that there are adequate studies to justify subgroup analyses and meta-regressions

It is very unlikely that an investigation of heterogeneity will produce useful findings unless there is a substantial number of studies. A typical rule of thumb, borrowed from advice for undertaking simple regression analyses, is that at least ten observations (i.e. ten studies in a meta-analysis) should be available for each characteristic modelled. However, even this will be too few when the covariates are unevenly distributed across studies.

10.11.5.2 Specify characteristics in advance

Authors should, whenever possible, pre-specify characteristics in the protocol that later will be subject to subgroup analyses or meta-regression. The plan specified in the protocol should then be followed (data permitting), without undue emphasis on any particular findings (see MECIR Box 10.11.b ). Pre-specifying characteristics reduces the likelihood of spurious findings, first by limiting the number of subgroups investigated, and second by preventing knowledge of the studies’ results influencing which subgroups are analysed. True pre-specification is difficult in systematic reviews, because the results of some of the relevant studies are often known when the protocol is drafted. If a characteristic was overlooked in the protocol, but is clearly of major importance and justified by external evidence, then authors should not be reluctant to explore it. However, such post-hoc analyses should be identified as such.

MECIR Box 10.11.b Relevant expectations for conduct of intervention reviews

10.11.5.3 Select a small number of characteristics

The likelihood of a false-positive result among subgroup analyses and meta-regression increases with the number of characteristics investigated. It is difficult to suggest a maximum number of characteristics to look at, especially since the number of available studies is unknown in advance. If more than one or two characteristics are investigated it may be sensible to adjust the level of significance to account for making multiple comparisons.

10.11.5.4 Ensure there is scientific rationale for investigating each characteristic

Selection of characteristics should be motivated by biological and clinical hypotheses, ideally supported by evidence from sources other than the included studies. Subgroup analyses using characteristics that are implausible or clinically irrelevant are not likely to be useful and should be avoided. For example, a relationship between intervention effect and year of publication is seldom in itself clinically informative, and if identified runs the risk of initiating a post-hoc data dredge of factors that may have changed over time.

Prognostic factors are those that predict the outcome of a disease or condition, whereas effect modifiers are factors that influence how well an intervention works in affecting the outcome. Confusion between prognostic factors and effect modifiers is common in planning subgroup analyses, especially at the protocol stage. Prognostic factors are not good candidates for subgroup analyses unless they are also believed to modify the effect of intervention. For example, being a smoker may be a strong predictor of mortality within the next ten years, but there may not be reason for it to influence the effect of a drug therapy on mortality (Deeks 1998). Potential effect modifiers may include participant characteristics (age, setting), the precise interventions (dose of active intervention, choice of comparison intervention), how the study was done (length of follow-up) or methodology (design and quality).

10.11.5.5 Be aware that the effect of a characteristic may not always be identified

Many characteristics that might have important effects on how well an intervention works cannot be investigated using subgroup analysis or meta-regression. These are characteristics of participants that might vary substantially within studies, but that can only be summarized at the level of the study. An example is age. Consider a collection of clinical trials involving adults ranging from 18 to 60 years old. There may be a strong relationship between age and intervention effect that is apparent within each study. However, if the mean ages for the trials are similar, then no relationship will be apparent by looking at trial mean ages and trial-level effect estimates. The problem is one of aggregating individuals’ results and is variously known as aggregation bias, ecological bias or the ecological fallacy (Morgenstern 1982, Greenland 1987, Berlin et al 2002). It is even possible for the direction of the relationship across studies to be the opposite of the direction of the relationship observed within each study.

10.11.5.6 Think about whether the characteristic is closely related to another characteristic (confounded)

The problem of ‘confounding’ complicates interpretation of subgroup analyses and meta-regressions and can lead to incorrect conclusions. Two characteristics are confounded if their influences on the intervention effect cannot be disentangled. For example, if those studies implementing an intensive version of a therapy happened to be the studies that involved patients with more severe disease, then one cannot tell which aspect is the cause of any difference in effect estimates between these studies and others. In meta-regression, co-linearity between potential effect modifiers leads to similar difficulties (Berlin and Antman 1994). Computing correlations between study characteristics will give some information about which study characteristics may be confounded with each other.

10.11.6 Interpretation of subgroup analyses and meta-regressions

Appropriate interpretation of subgroup analyses and meta-regressions requires caution (Oxman and Guyatt 1992).

  • Subgroup comparisons are observational. It must be remembered that subgroup analyses and meta-regressions are entirely observational in their nature. These analyses investigate differences between studies. Even if individuals are randomized to one group or other within a clinical trial, they are not randomized to go in one trial or another. Hence, subgroup analyses suffer the limitations of any observational investigation, including possible bias through confounding by other study-level characteristics. Furthermore, even a genuine difference between subgroups is not necessarily due to the classification of the subgroups. As an example, a subgroup analysis of bone marrow transplantation for treating leukaemia might show a strong association between the age of a sibling donor and the success of the transplant. However, this probably does not mean that the age of donor is important. In fact, the age of the recipient is probably a key factor and the subgroup finding would simply be due to the strong association between the age of the recipient and the age of their sibling.  
  • Was the analysis pre-specified or post hoc? Authors should state whether subgroup analyses were pre-specified or undertaken after the results of the studies had been compiled (post hoc). More reliance may be placed on a subgroup analysis if it was one of a small number of pre-specified analyses. Performing numerous post-hoc subgroup analyses to explain heterogeneity is a form of data dredging. Data dredging is condemned because it is usually possible to find an apparent, but false, explanation for heterogeneity by considering lots of different characteristics.  
  • Is there indirect evidence in support of the findings? Differences between subgroups should be clinically plausible and supported by other external or indirect evidence, if they are to be convincing.  
  • Is the magnitude of the difference practically important? If the magnitude of a difference between subgroups will not result in different recommendations for different subgroups, then it may be better to present only the overall analysis results.  
  • Is there a statistically significant difference between subgroups? To establish whether there is a different effect of an intervention in different situations, the magnitudes of effects in different subgroups should be compared directly with each other. In particular, statistical significance of the results within separate subgroup analyses should not be compared (see Section 10.11.3.1 ).  
  • Are analyses looking at within-study or between-study relationships? For patient and intervention characteristics, differences in subgroups that are observed within studies are more reliable than analyses of subsets of studies. If such within-study relationships are replicated across studies then this adds confidence to the findings.

10.11.7 Investigating the effect of underlying risk

One potentially important source of heterogeneity among a series of studies is when the underlying average risk of the outcome event varies between the studies. The underlying risk of a particular event may be viewed as an aggregate measure of case-mix factors such as age or disease severity. It is generally measured as the observed risk of the event in the comparator group of each study (the comparator group risk, or CGR). The notion is controversial in its relevance to clinical practice since underlying risk represents a summary of both known and unknown risk factors. Problems also arise because comparator group risk will depend on the length of follow-up, which often varies across studies. However, underlying risk has received particular attention in meta-analysis because the information is readily available once dichotomous data have been prepared for use in meta-analyses. Sharp provides a full discussion of the topic (Sharp 2001).

Intuition would suggest that participants are more or less likely to benefit from an effective intervention according to their risk status. However, the relationship between underlying risk and intervention effect is a complicated issue. For example, suppose an intervention is equally beneficial in the sense that for all patients it reduces the risk of an event, say a stroke, to 80% of the underlying risk. Then it is not equally beneficial in terms of absolute differences in risk in the sense that it reduces a 50% stroke rate by 10 percentage points to 40% (number needed to treat=10), but a 20% stroke rate by 4 percentage points to 16% (number needed to treat=25).
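The arithmetic in this worked example is easy to check: a constant risk ratio of 0.8 applied to two different comparator group risks gives different absolute risk reductions, and hence different numbers needed to treat:

```python
def absolute_benefit(cgr, risk_ratio):
    """Absolute risk reduction and number needed to treat for a given
    comparator group risk (CGR) under a constant risk ratio."""
    arr = cgr - cgr * risk_ratio  # absolute risk reduction
    nnt = 1.0 / arr               # number needed to treat
    return arr, nnt

arr_high, nnt_high = absolute_benefit(0.50, 0.8)  # ARR 0.10, NNT 10
arr_low, nnt_low = absolute_benefit(0.20, 0.8)    # ARR 0.04, NNT 25
```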

Use of different summary statistics (risk ratio, odds ratio and risk difference) will demonstrate different relationships with underlying risk. Summary statistics that show close to no relationship with underlying risk are generally preferred for use in meta-analysis (see Section 10.4.3 ).

Investigating any relationship between effect estimates and the comparator group risk is also complicated by a technical phenomenon known as regression to the mean. This arises because the comparator group risk forms an integral part of the effect estimate. A high risk in a comparator group, observed entirely by chance, will on average give rise to a higher than expected effect estimate, and vice versa. This phenomenon results in a false correlation between effect estimates and comparator group risks. There are methods, which require sophisticated software, that correct for regression to the mean (McIntosh 1996, Thompson et al 1997). These should be used for such analyses, and statistical expertise is recommended.

10.11.8 Dose-response analyses

The principles of meta-regression can be applied to the relationships between intervention effect and dose (commonly termed dose-response), treatment intensity or treatment duration (Greenland and Longnecker 1992, Berlin et al 1993). Conclusions about differences in effect due to differences in dose (or similar factors) are on stronger ground if participants are randomized to one dose or another within a study and a consistent relationship is found across similar studies. While authors should consider these effects, particularly as a possible explanation for heterogeneity, they should be cautious about drawing conclusions based on between-study differences. Authors should be particularly cautious about claiming that a dose-response relationship does not exist, given the low power of many meta-regression analyses to detect genuine relationships.

10.12 Missing data

10.12.1 Types of missing data

There are many potential sources of missing data in a systematic review or meta-analysis (see Table 10.12.a ). For example, a whole study may be missing from the review, an outcome may be missing from a study, summary data may be missing for an outcome, and individual participants may be missing from the summary data. Here we discuss a variety of potential sources of missing data, highlighting where more detailed discussions are available elsewhere in the Handbook .

Whole studies may be missing from a review because they are never published, are published in obscure places, are rarely cited, or are inappropriately indexed in databases. Thus, review authors should always be aware of the possibility that they have failed to identify relevant studies. There is a strong possibility that such studies are missing because of their ‘uninteresting’ or ‘unwelcome’ findings (that is, in the presence of publication bias). This problem is discussed at length in Chapter 13 . Details of comprehensive search methods are provided in Chapter 4 .

Some studies might not report any information on outcomes of interest to the review. For example, there may be no information on quality of life, or on serious adverse effects. It is often difficult to determine whether this is because the outcome was not measured or because the outcome was not reported. Furthermore, failure to report that outcomes were measured may be dependent on the unreported results (selective outcome reporting bias; see Chapter 7, Section 7.2.3.3). Similarly, summary data for an outcome, in a form that can be included in a meta-analysis, may be missing. A common example is missing standard deviations (SDs) for continuous outcomes. This is often a problem when change-from-baseline outcomes are sought. We discuss imputation of missing SDs in Chapter 6, Section 6.5.2.8. Other examples of missing summary data are missing sample sizes (particularly those for each intervention group separately), numbers of events, standard errors, follow-up times for calculating rates, and sufficient details of time-to-event outcomes. Inappropriate analyses of studies, for example of cluster-randomized and crossover trials, can lead to missing summary data. It is sometimes possible to approximate the correct analyses of such studies, for example by imputing correlation coefficients or SDs, as discussed in Chapter 23, Section 23.1, for cluster-randomized studies and Chapter 23, Section 23.2, for crossover trials. As a general rule, most methodologists believe that missing summary data (e.g. ‘no usable data’) should not be used as a reason to exclude a study from a systematic review. It is more appropriate to include the study in the review, and to discuss the potential implications of its absence from a meta-analysis.

It is likely that in some, if not all, included studies, there will be individuals missing from the reported results. Review authors are encouraged to consider this problem carefully (see MECIR Box 10.12.a ). We provide further discussion of this problem in Section 10.12.3 ; see also Chapter 8, Section 8.5 .

Missing data can also affect subgroup analyses. If subgroup analyses or meta-regressions are planned (see Section 10.11 ), they require details of the study-level characteristics that distinguish studies from one another. If these are not available for all studies, review authors should consider asking the study authors for more information.

Table 10.12.a Types of missing data in a meta-analysis

MECIR Box 10.12.a Relevant expectations for conduct of intervention reviews

10.12.2 General principles for dealing with missing data

There is a large literature of statistical methods for dealing with missing data. Here we briefly review some key concepts and make some general recommendations for Cochrane Review authors. It is important to think about why data may be missing. Statisticians often use the terms ‘missing at random’ and ‘not missing at random’ to represent different scenarios.

Data are said to be ‘missing at random’ if the fact that they are missing is unrelated to actual values of the missing data. For instance, if some quality-of-life questionnaires were lost in the postal system, this would be unlikely to be related to the quality of life of the trial participants who completed the forms. In some circumstances, statisticians distinguish between data ‘missing at random’ and data ‘missing completely at random’, although in the context of a systematic review the distinction is unlikely to be important. Data that are missing at random may not be important. Analyses based on the available data will often be unbiased, although based on a smaller sample size than the original data set.

Data are said to be ‘not missing at random’ if the fact that they are missing is related to the actual missing data. For instance, in a depression trial, participants who had a relapse of depression might be less likely to attend the final follow-up interview, and more likely to have missing outcome data. Such data are ‘non-ignorable’ in the sense that an analysis of the available data alone will typically be biased. Publication bias and selective reporting bias lead by definition to data that are ‘not missing at random’, and attrition and exclusions of individuals within studies often do as well.

The principal options for dealing with missing data are:

  1. analysing only the available data (i.e. ignoring the missing data);
  2. imputing the missing data with replacement values, and treating these as if they were observed (e.g. last observation carried forward, imputing an assumed outcome such as assuming all were poor outcomes, imputing the mean, imputing based on predicted values from a regression analysis);
  3. imputing the missing data and accounting for the fact that these were imputed with uncertainty (e.g. multiple imputation, simple imputation methods (as point 2) with adjustment to the standard error); and
  4. using statistical models to allow for missing data, making assumptions about their relationships with the available data.

Option 2 is practical in most circumstances and very commonly used in systematic reviews. However, it fails to acknowledge uncertainty in the imputed values and results, typically, in confidence intervals that are too narrow. Options 3 and 4 would require involvement of a knowledgeable statistician.

Five general recommendations for dealing with missing data in Cochrane Reviews are as follows:

  • Whenever possible, contact the original investigators to request missing data.
  • Make explicit the assumptions of any methods used to address missing data: for example, that the data are assumed missing at random, or that missing values were assumed to have a particular value such as a poor outcome.
  • Follow the guidance in Chapter 8 to assess risk of bias due to missing outcome data in randomized trials.
  • Perform sensitivity analyses to assess how sensitive results are to reasonable changes in the assumptions that are made (see Section 10.14 ).
  • Address the potential impact of missing data on the findings of the review in the Discussion section.

10.12.3 Dealing with missing outcome data from individual participants

Review authors may undertake sensitivity analyses to assess the potential impact of missing outcome data, based on assumptions about the relationship between missingness in the outcome and its true value. Several methods are available (Akl et al 2015). For dichotomous outcomes, Higgins and colleagues propose a strategy involving different assumptions about how the risk of the event among the missing participants differs from the risk of the event among the observed participants, taking account of uncertainty introduced by the assumptions (Higgins et al 2008a). Akl and colleagues propose a suite of simple imputation methods, including a similar approach to that of Higgins and colleagues based on relative risks of the event in missing versus observed participants. Similar ideas can be applied to continuous outcome data (Ebrahim et al 2013, Ebrahim et al 2014). Particular care is required to avoid double counting events, since it can be unclear whether reported numbers of events in trial reports apply to the full randomized sample or only to those who did not drop out (Akl et al 2016).
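A minimal sketch of the relative-risk style of imputation described by Akl and colleagues, applied to a single trial arm (the function name and numbers are illustrative, not taken from the cited papers):

```python
def impute_missing_events(events, observed, missing, risk_ratio_missing):
    """Assume the event risk among participants with missing outcomes is
    a multiple (risk_ratio_missing) of the risk among those observed,
    then recompute the overall risk in the full randomized sample."""
    risk_obs = events / observed
    imputed_events = events + risk_ratio_missing * risk_obs * missing
    return imputed_events / (observed + missing)

# Hypothetical arm: 30 events among 100 observed, 20 missing.
base = impute_missing_events(30, 100, 20, 1.0)   # same risk among missing
worse = impute_missing_events(30, 100, 20, 2.0)  # doubled risk among missing
```

Repeating the meta-analysis across a range of assumed risk ratios for missing versus observed participants gives the kind of sensitivity analysis described above.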

Although there is a tradition of implementing ‘worst case’ and ‘best case’ analyses clarifying the extreme boundaries of what is theoretically possible, such analyses may not be informative for the most plausible scenarios (Higgins et al 2008a).
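The 'best case'/'worst case' idea can be sketched in a few lines. All counts below are hypothetical, and the risk-ratio confidence interval uses the standard log-scale approximation; this is an illustration of the concept, not code from the Handbook:

```python
import math

def risk_ratio_ci(e1, n1, e2, n2):
    """Risk ratio and approximate 95% CI from two event counts,
    using the usual standard error of the log risk ratio."""
    rr = (e1 / n1) / (e2 / n2)
    se = math.sqrt(1 / e1 - 1 / n1 + 1 / e2 - 1 / n2)
    return rr, math.exp(math.log(rr) - 1.96 * se), math.exp(math.log(rr) + 1.96 * se)

# Hypothetical trial: events / analysed, plus participants with missing outcomes
e1, n1, m1 = 10, 80, 20   # intervention arm
e2, n2, m2 = 20, 85, 15   # control arm

# Best case for the intervention: none of its missing participants had the
# event, while all of the control arm's missing participants did
best = risk_ratio_ci(e1, n1 + m1, e2 + m2, n2 + m2)
# Worst case: the reverse assumption
worst = risk_ratio_ci(e1 + m1, n1 + m1, e2, n2 + m2)
```

Comparing `best` and `worst` shows the extreme boundaries described above; intermediate assumptions (for example, specifying a relative risk of the event in missing versus observed participants) produce a narrower, more plausible range.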

10.13 Bayesian approaches to meta-analysis

Bayesian statistics is an approach to statistics based on a different philosophy from that which underlies significance tests and confidence intervals. It is essentially about updating of evidence. In a Bayesian analysis, initial uncertainty is expressed through a prior distribution about the quantities of interest. Current data and assumptions concerning how they were generated are summarized in the likelihood . The posterior distribution for the quantities of interest can then be obtained by combining the prior distribution and the likelihood. The likelihood summarizes both the data from studies included in the meta-analysis (for example, 2×2 tables from randomized trials) and the meta-analysis model (for example, assuming a fixed effect or random effects). The result of the analysis is usually presented as a point estimate and 95% credible interval from the posterior distribution for each quantity of interest, which look much like classical estimates and confidence intervals. Potential advantages of Bayesian analyses are summarized in Box 10.13.a . Bayesian analysis may be performed using WinBUGS software (Smith et al 1995, Lunn et al 2000), within R (Röver 2017), or – for some applications – using standard meta-regression software with a simple trick (Rhodes et al 2016).

A difference between Bayesian analysis and classical meta-analysis is that the interpretation is directly in terms of belief: a 95% credible interval for an odds ratio is that region in which we believe the odds ratio to lie with probability 95%. This is how many practitioners actually interpret a classical confidence interval, but strictly in the classical framework the 95% refers to the long-term frequency with which 95% intervals contain the true value. The Bayesian framework also allows a review author to calculate the probability that the odds ratio has a particular range of values, which cannot be done in the classical framework. For example, we can determine the probability that the odds ratio is less than 1 (which might indicate a beneficial effect of an experimental intervention), or that it is no larger than 0.8 (which might indicate a clinically important effect). It should be noted that these probabilities are specific to the choice of the prior distribution. Different meta-analysts may analyse the same data using different prior distributions and obtain different results. It is therefore important to carry out sensitivity analyses to investigate how the results depend on any assumptions made.
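These posterior probabilities can be illustrated with a deliberately simple model: assume (purely for illustration) a normal likelihood for the pooled log odds ratio and a vague normal prior, so the posterior is available in closed form. The numbers are hypothetical:

```python
import math

def posterior_log_or(prior_mean, prior_sd, estimate, se):
    """Conjugate normal update: the posterior precision is the sum of the
    prior and likelihood precisions; the mean is precision-weighted."""
    w0, w1 = 1 / prior_sd ** 2, 1 / se ** 2
    var = 1 / (w0 + w1)
    return var * (w0 * prior_mean + w1 * estimate), math.sqrt(var)

def normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Illustrative pooled result: log OR = log(0.75) with standard error 0.12,
# combined with a vague prior centred on no effect
m, s = posterior_log_or(0.0, 10.0, math.log(0.75), 0.12)
credible = (math.exp(m - 1.96 * s), math.exp(m + 1.96 * s))
p_benefit = normal_cdf((math.log(1.0) - m) / s)    # Pr(OR < 1)
p_important = normal_cdf((math.log(0.8) - m) / s)  # Pr(OR < 0.8)
```

Re-running the update with a different `prior_mean` or `prior_sd` is exactly the kind of prior sensitivity analysis recommended above.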

In the context of a meta-analysis, prior distributions are needed for the particular intervention effect being analysed (such as the odds ratio or the mean difference) and – in the context of a random-effects meta-analysis – on the amount of heterogeneity among intervention effects across studies. Prior distributions may represent subjective belief about the size of the effect, or may be derived from sources of evidence not included in the meta-analysis, such as information from non-randomized studies of the same intervention or from randomized trials of other interventions. The width of the prior distribution reflects the degree of uncertainty about the quantity. When there is little or no information, a ‘non-informative’ prior can be used, in which all values across the possible range are equally likely.

Most Bayesian meta-analyses use non-informative (or very weakly informative) prior distributions to represent beliefs about intervention effects, since many regard it as controversial to combine objective trial data with subjective opinion. However, prior distributions are increasingly used for the extent of among-study variation in a random-effects analysis. This is particularly advantageous when the number of studies in the meta-analysis is small, say fewer than five or ten. Libraries of data-based prior distributions are available that have been derived from re-analyses of many thousands of meta-analyses in the Cochrane Database of Systematic Reviews (Turner et al 2012).

Box 10.13.a Some potential advantages of Bayesian meta-analysis

Statistical expertise is strongly recommended for review authors who wish to carry out Bayesian analyses. There are several good texts (Sutton et al 2000, Sutton and Abrams 2001, Spiegelhalter et al 2004).

10.14 Sensitivity analyses

The process of undertaking a systematic review involves a sequence of decisions. Whilst many of these decisions are clearly objective and non-contentious, some will be somewhat arbitrary or unclear. For instance, if eligibility criteria involve a numerical value, the choice of value is usually arbitrary: for example, defining groups of older people may reasonably have lower limits of 60, 65, 70 or 75 years, or any value in between. Other decisions may be unclear because a study report fails to include the required information. Some decisions are unclear because the included studies themselves never obtained the information required: for example, the outcomes of those who were lost to follow-up. Further decisions are unclear because there is no consensus on the best statistical method to use for a particular problem.

It is highly desirable to show that the findings from a systematic review are not dependent on such arbitrary or unclear decisions by using sensitivity analysis (see MECIR Box 10.14.a ). A sensitivity analysis is a repeat of the primary analysis or meta-analysis in which alternative decisions or ranges of values are substituted for decisions that were arbitrary or unclear. For example, if the eligibility of some studies in the meta-analysis is dubious because they do not contain full details, sensitivity analysis may involve undertaking the meta-analysis twice: the first time including all studies and, second, including only those that are definitely known to be eligible. A sensitivity analysis asks the question, ‘Are the findings robust to the decisions made in the process of obtaining them?’
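The 'analyse twice' example can be sketched with a simple inverse-variance fixed-effect pooling; the effect sizes below are hypothetical:

```python
import math

def fixed_effect(estimates):
    """Inverse-variance fixed-effect meta-analysis of (effect, se) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * e for w, (e, _) in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

# (log risk ratio, standard error); the last two studies are of dubious eligibility
studies = [(-0.30, 0.15), (-0.10, 0.20), (-0.25, 0.10), (0.05, 0.30), (0.40, 0.35)]

all_studies = fixed_effect(studies)        # first analysis: everything included
definite_only = fixed_effect(studies[:3])  # second analysis: definitely eligible only
```

If the two pooled estimates and their confidence intervals support the same conclusion, the finding is robust to the eligibility decision; if not, the discrepancy itself is the result to report.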

MECIR Box 10.14.a Relevant expectations for conduct of intervention reviews

There are many decision nodes within the systematic review process that can generate a need for a sensitivity analysis. Examples include:

Searching for studies:

  • Should abstracts whose results cannot be confirmed in subsequent publications be included in the review?

Eligibility criteria:

  • Characteristics of participants: where a majority but not all people in a study meet an age range, should the study be included?
  • Characteristics of the intervention: what range of doses should be included in the meta-analysis?
  • Characteristics of the comparator: what criteria are required to define usual care to be used as a comparator group?
  • Characteristics of the outcome: what time point or range of time points are eligible for inclusion?
  • Study design: should blinded and unblinded outcome assessment be included, or should study inclusion be restricted by other aspects of methodological criteria?

What data should be analysed?

  • Time-to-event data: what assumptions of the distribution of censored data should be made?
  • Continuous data: where standard deviations are missing, when and how should they be imputed? Should analyses be based on change scores or on post-intervention values?
  • Ordinal scales: what cut-point should be used to dichotomize short ordinal scales into two groups?
  • Cluster-randomized trials: what values of the intraclass correlation coefficient should be used when trial analyses have not been adjusted for clustering?
  • Crossover trials: what values of the within-subject correlation coefficient should be used when this is not available in primary reports?
  • All analyses: what assumptions should be made about missing outcomes? Should adjusted or unadjusted estimates of intervention effects be used?

Analysis methods:

  • Should fixed-effect or random-effects methods be used for the analysis?
  • For dichotomous outcomes, should odds ratios, risk ratios or risk differences be used?
  • For continuous outcomes, where several scales have assessed the same dimension, should results be analysed as a standardized mean difference across all scales or as mean differences individually for each scale?
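To make one of these decision nodes concrete: the cluster-randomized item is usually handled with the standard design-effect formula, 1 + (m − 1) × ICC, and a sensitivity analysis simply repeats the calculation over a plausible range of ICC values. The numbers below are hypothetical:

```python
def design_effect(cluster_size, icc):
    """Variance inflation for a cluster-randomized trial with average
    cluster size m: DE = 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n, cluster_size, icc):
    """Sample size after deflating for clustering."""
    return n / design_effect(cluster_size, icc)

# A trial of 400 participants in clusters of 20, analysed without adjustment:
# repeat the meta-analysis with each deflated sample size in turn
for icc in (0.01, 0.05, 0.10):
    ess = effective_sample_size(400, 20, icc)
```

If the review's conclusions hold across the whole range of assumed ICC values, the unadjusted trials are not driving the result.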

Some sensitivity analyses can be pre-specified in the study protocol, but many issues suitable for sensitivity analysis are only identified during the review process, as the individual peculiarities of the studies under investigation emerge. When sensitivity analyses show that the overall result and conclusions are not affected by the different decisions that could be made during the review process, the results of the review can be regarded with a higher degree of certainty. Where sensitivity analyses identify particular decisions or missing information that greatly influence the findings of the review, greater resources can be deployed to try to resolve uncertainties and obtain extra information, possibly through contacting trial authors and obtaining individual participant data. If this cannot be achieved, the results must be interpreted with an appropriate degree of caution. Such findings may generate proposals for further investigations and future research.

Reporting of sensitivity analyses in a systematic review may best be done by producing a summary table. Rarely is it informative to produce individual forest plots for each sensitivity analysis undertaken.

Sensitivity analyses are sometimes confused with subgroup analysis. Although some sensitivity analyses involve restricting the analysis to a subset of the totality of studies, the two methods differ in two ways. First, sensitivity analyses do not attempt to estimate the effect of the intervention in the group of studies removed from the analysis, whereas in subgroup analyses, estimates are produced for each subgroup. Second, in sensitivity analyses, informal comparisons are made between different ways of estimating the same thing, whereas in subgroup analyses, formal statistical comparisons are made across the subgroups.

10.15 Chapter information

Editors: Jonathan J Deeks, Julian PT Higgins, Douglas G Altman; on behalf of the Cochrane Statistical Methods Group

Contributing authors: Douglas Altman, Deborah Ashby, Jacqueline Birks, Michael Borenstein, Marion Campbell, Jonathan Deeks, Matthias Egger, Julian Higgins, Joseph Lau, Keith O’Rourke, Gerta Rücker, Rob Scholten, Jonathan Sterne, Simon Thompson, Anne Whitehead

Acknowledgements: We are grateful to the following for commenting helpfully on earlier drafts: Bodil Als-Nielsen, Deborah Ashby, Jesse Berlin, Joseph Beyene, Jacqueline Birks, Michael Bracken, Marion Campbell, Chris Cates, Wendong Chen, Mike Clarke, Albert Cobos, Esther Coren, Francois Curtin, Roberto D’Amico, Keith Dear, Heather Dickinson, Diana Elbourne, Simon Gates, Paul Glasziou, Christian Gluud, Peter Herbison, Sally Hollis, David Jones, Steff Lewis, Tianjing Li, Joanne McKenzie, Philippa Middleton, Nathan Pace, Craig Ramsey, Keith O’Rourke, Rob Scholten, Guido Schwarzer, Jack Sinclair, Jonathan Sterne, Simon Thompson, Andy Vail, Clarine van Oel, Paula Williamson and Fred Wolf.

Funding: JJD received support from the National Institute for Health Research (NIHR) Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. JPTH is a member of the NIHR Biomedical Research Centre at University Hospitals Bristol NHS Foundation Trust and the University of Bristol. JPTH received funding from National Institute for Health Research Senior Investigator award NF-SI-0617-10145. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

10.16 References

Agresti A. An Introduction to Categorical Data Analysis . New York (NY): John Wiley & Sons; 1996.

Akl EA, Kahale LA, Agoritsas T, Brignardello-Petersen R, Busse JW, Carrasco-Labra A, Ebrahim S, Johnston BC, Neumann I, Sola I, Sun X, Vandvik P, Zhang Y, Alonso-Coello P, Guyatt G. Handling trial participants with missing outcome data when conducting a meta-analysis: a systematic survey of proposed approaches. Systematic Reviews 2015; 4 : 98.

Akl EA, Kahale LA, Ebrahim S, Alonso-Coello P, Schünemann HJ, Guyatt GH. Three challenges described for identifying participants with missing data in trials reports, and potential solutions suggested to systematic reviewers. Journal of Clinical Epidemiology 2016; 76 : 147-154.

Altman DG, Bland JM. Detecting skewness from summary information. BMJ 1996; 313 : 1200.

Anzures-Cabrera J, Sarpatwari A, Higgins JPT. Expressing findings from meta-analyses of continuous outcomes in terms of risks. Statistics in Medicine 2011; 30 : 2967-2985.

Berlin JA, Longnecker MP, Greenland S. Meta-analysis of epidemiologic dose-response data. Epidemiology 1993; 4 : 218-228.

Berlin JA, Antman EM. Advantages and limitations of metaanalytic regressions of clinical trials data. Online Journal of Current Clinical Trials 1994; Doc No 134 .

Berlin JA, Santanna J, Schmid CH, Szczech LA, Feldman KA, Group A-LAITS. Individual patient- versus group-level data meta-regressions for the investigation of treatment effect modifiers: ecological bias rears its ugly head. Statistics in Medicine 2002; 21 : 371-387.

Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. A basic introduction to fixed-effect and random-effects models for meta-analysis. Research Synthesis Methods 2010; 1 : 97-111.

Borenstein M, Higgins JPT. Meta-analysis and subgroups. Prevention Science 2013; 14 : 134-143.

Bradburn MJ, Deeks JJ, Berlin JA, Russell Localio A. Much ado about nothing: a comparison of the performance of meta-analytical methods with rare events. Statistics in Medicine 2007; 26 : 53-77.

Chinn S. A simple method for converting an odds ratio to effect size for use in meta-analysis. Statistics in Medicine 2000; 19 : 3127-3131.

da Costa BR, Nuesch E, Rutjes AW, Johnston BC, Reichenbach S, Trelle S, Guyatt GH, Jüni P. Combining follow-up and change data is valid in meta-analyses of continuous outcomes: a meta-epidemiological study. Journal of Clinical Epidemiology 2013; 66 : 847-855.

Deeks JJ. Systematic reviews of published evidence: Miracles or minefields? Annals of Oncology 1998; 9 : 703-709.

Deeks JJ, Altman DG, Bradburn MJ. Statistical methods for examining heterogeneity and combining results from several studies in meta-analysis. In: Egger M, Davey Smith G, Altman DG, editors. Systematic Reviews in Health Care: Meta-analysis in Context . 2nd edition. London (UK): BMJ Publication Group; 2001. p. 285-312.

Deeks JJ. Issues in the selection of a summary statistic for meta-analysis of clinical trials with binary outcomes. Statistics in Medicine 2002; 21 : 1575-1600.

DerSimonian R, Laird N. Meta-analysis in clinical trials. Controlled Clinical Trials 1986; 7 : 177-188.

DiGuiseppi C, Higgins JPT. Interventions for promoting smoke alarm ownership and function. Cochrane Database of Systematic Reviews 2001; 2 : CD002246.

Ebrahim S, Akl EA, Mustafa RA, Sun X, Walter SD, Heels-Ansdell D, Alonso-Coello P, Johnston BC, Guyatt GH. Addressing continuous data for participants excluded from trial analysis: a guide for systematic reviewers. Journal of Clinical Epidemiology 2013; 66 : 1014-1021 e1011.

Ebrahim S, Johnston BC, Akl EA, Mustafa RA, Sun X, Walter SD, Heels-Ansdell D, Alonso-Coello P, Guyatt GH. Addressing continuous data measured with different instruments for participants excluded from trial analysis: a guide for systematic reviewers. Journal of Clinical Epidemiology 2014; 67 : 560-570.

Efthimiou O. Practical guide to the meta-analysis of rare events. Evidence-Based Mental Health 2018; 21 : 72-76.

Egger M, Davey Smith G, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997; 315 : 629-634.

Engels EA, Schmid CH, Terrin N, Olkin I, Lau J. Heterogeneity and statistical significance in meta-analysis: an empirical study of 125 meta-analyses. Statistics in Medicine 2000; 19 : 1707-1728.

Greenland S, Robins JM. Estimation of a common effect parameter from sparse follow-up data. Biometrics 1985; 41 : 55-68.

Greenland S. Quantitative methods in the review of epidemiologic literature. Epidemiologic Reviews 1987; 9 : 1-30.

Greenland S, Longnecker MP. Methods for trend estimation from summarized dose-response data, with applications to meta-analysis. American Journal of Epidemiology 1992; 135 : 1301-1309.

Guevara JP, Berlin JA, Wolf FM. Meta-analytic methods for pooling rates when follow-up duration varies: a case study. BMC Medical Research Methodology 2004; 4 : 17.

Hartung J, Knapp G. A refined method for the meta-analysis of controlled clinical trials with binary outcome. Statistics in Medicine 2001; 20 : 3875-3889.

Hasselblad V, McCrory DC. Meta-analytic tools for medical decision making: A practical guide. Medical Decision Making 1995; 15 : 81-96.

Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Statistics in Medicine 2002; 21 : 1539-1558.

Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ 2003; 327 : 557-560.

Higgins JPT, Thompson SG. Controlling the risk of spurious findings from meta-regression. Statistics in Medicine 2004; 23 : 1663-1682.

Higgins JPT, White IR, Wood AM. Imputation methods for missing outcome data in meta-analysis of clinical trials. Clinical Trials 2008a; 5 : 225-239.

Higgins JPT, White IR, Anzures-Cabrera J. Meta-analysis of skewed data: combining results reported on log-transformed or raw scales. Statistics in Medicine 2008b; 27 : 6072-6092.

Higgins JPT, Thompson SG, Spiegelhalter DJ. A re-evaluation of random-effects meta-analysis. Journal of the Royal Statistical Society: Series A (Statistics in Society) 2009; 172 : 137-159.

Kjaergard LL, Villumsen J, Gluud C. Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Annals of Internal Medicine 2001; 135 : 982-989.

Langan D, Higgins JPT, Simmonds M. An empirical comparison of heterogeneity variance estimators in 12 894 meta-analyses. Research Synthesis Methods 2015; 6 : 195-205.

Langan D, Higgins JPT, Simmonds M. Comparative performance of heterogeneity variance estimators in meta-analysis: a review of simulation studies. Research Synthesis Methods 2017; 8 : 181-198.

Langan D, Higgins JPT, Jackson D, Bowden J, Veroniki AA, Kontopantelis E, Viechtbauer W, Simmonds M. A comparison of heterogeneity variance estimators in simulated random-effects meta-analyses. Research Synthesis Methods 2019; 10 : 83-98.

Lewis S, Clarke M. Forest plots: trying to see the wood and the trees. BMJ 2001; 322 : 1479-1480.

Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS - A Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing 2000; 10 : 325-337.

Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 1959; 22 : 719-748.

McIntosh MW. The population risk as an explanatory variable in research synthesis of clinical trials. Statistics in Medicine 1996; 15 : 1713-1728.

Morgenstern H. Uses of ecologic analysis in epidemiologic research. American Journal of Public Health 1982; 72 : 1336-1344.

Oxman AD, Guyatt GH. A consumers guide to subgroup analyses. Annals of Internal Medicine 1992; 116 : 78-84.

Peto R, Collins R, Gray R. Large-scale randomized evidence: large, simple trials and overviews of trials. Journal of Clinical Epidemiology 1995; 48 : 23-40.

Poole C, Greenland S. Random-effects meta-analyses are not always conservative. American Journal of Epidemiology 1999; 150 : 469-475.

Rhodes KM, Turner RM, White IR, Jackson D, Spiegelhalter DJ, Higgins JPT. Implementing informative priors for heterogeneity in meta-analysis using meta-regression and pseudo data. Statistics in Medicine 2016; 35 : 5495-5511.

Rice K, Higgins JPT, Lumley T. A re-evaluation of fixed effect(s) meta-analysis. Journal of the Royal Statistical Society Series A (Statistics in Society) 2018; 181 : 205-227.

Riley RD, Higgins JPT, Deeks JJ. Interpretation of random effects meta-analyses. BMJ 2011; 342 : d549.

Röver C. Bayesian random-effects meta-analysis using the bayesmeta R package 2017. https://arxiv.org/abs/1711.08683 .

Rücker G, Schwarzer G, Carpenter J, Olkin I. Why add anything to nothing? The arcsine difference as a measure of treatment effect in meta-analysis with zero cells. Statistics in Medicine 2009; 28 : 721-738.

Sharp SJ. Analysing the relationship between treatment benefit and underlying risk: precautions and practical recommendations. In: Egger M, Davey Smith G, Altman DG, editors. Systematic Reviews in Health Care: Meta-analysis in Context . 2nd edition. London (UK): BMJ Publication Group; 2001. p. 176-188.

Sidik K, Jonkman JN. A simple confidence interval for meta-analysis. Statistics in Medicine 2002; 21 : 3153-3159.

Simmonds MC, Tierney J, Bowden J, Higgins JPT. Meta-analysis of time-to-event data: a comparison of two-stage methods. Research Synthesis Methods 2011; 2 : 139-149.

Sinclair JC, Bracken MB. Clinically useful measures of effect in binary analyses of randomized trials. Journal of Clinical Epidemiology 1994; 47 : 881-889.

Smith TC, Spiegelhalter DJ, Thomas A. Bayesian approaches to random-effects meta-analysis: a comparative study. Statistics in Medicine 1995; 14 : 2685-2699.

Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-Care Evaluation . Chichester (UK): John Wiley & Sons; 2004.

Spittal MJ, Pirkis J, Gurrin LC. Meta-analysis of incidence rate data in the presence of zero events. BMC Medical Research Methodology 2015; 15 : 42.

Sutton AJ, Abrams KR, Jones DR, Sheldon TA, Song F. Methods for Meta-analysis in Medical Research . Chichester (UK): John Wiley & Sons; 2000.

Sutton AJ, Abrams KR. Bayesian methods in meta-analysis and evidence synthesis. Statistical Methods in Medical Research 2001; 10 : 277-303.

Sweeting MJ, Sutton AJ, Lambert PC. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Statistics in Medicine 2004; 23 : 1351-1375.

Thompson SG, Smith TC, Sharp SJ. Investigating underlying risk as a source of heterogeneity in meta-analysis. Statistics in Medicine 1997; 16 : 2741-2758.

Thompson SG, Sharp SJ. Explaining heterogeneity in meta-analysis: a comparison of methods. Statistics in Medicine 1999; 18 : 2693-2708.

Thompson SG, Higgins JPT. How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine 2002; 21 : 1559-1574.

Turner RM, Davey J, Clarke MJ, Thompson SG, Higgins JPT. Predicting the extent of heterogeneity in meta-analysis, using empirical data from the Cochrane Database of Systematic Reviews. International Journal of Epidemiology 2012; 41 : 818-827.

Veroniki AA, Jackson D, Viechtbauer W, Bender R, Bowden J, Knapp G, Kuss O, Higgins JPT, Langan D, Salanti G. Methods to estimate the between-study variance and its uncertainty in meta-analysis. Research Synthesis Methods 2016; 7 : 55-79.

Whitehead A, Jones NMB. A meta-analysis of clinical trials involving different classifications of response into ordered categories. Statistics in Medicine 1994; 13 : 2503-2515.

Yusuf S, Peto R, Lewis J, Collins R, Sleight P. Beta blockade during and after myocardial infarction: an overview of the randomized trials. Progress in Cardiovascular Diseases 1985; 27 : 335-371.

Yusuf S, Wittes J, Probstfield J, Tyroler HA. Analysis and interpretation of treatment effects in subgroups of patients in randomized clinical trials. JAMA 1991; 266 : 93-98.



Literature Review, Systematic Review and Meta-analysis

Literature reviews can be a good way to narrow down theoretical interests; refine a research question; understand contemporary debates; and orientate a particular research project. It is very common for PhD theses to contain some element of reviewing the literature around a particular topic. It’s typical to have an entire chapter devoted to reporting the result of this task, identifying gaps in the literature and framing the collection of additional data.

Systematic review is a type of literature review that uses systematic methods to collect secondary data, critically appraise research studies, and synthesise findings. Systematic reviews are designed to provide a comprehensive, exhaustive summary of current theories and/or evidence and published research (Siddaway, Wood & Hedges, 2019) and may be qualitative or quantitative. Relevant studies and literature are identified through a research question, summarised and synthesised into a discrete set of findings or a description of the state-of-the-art. This might result in a ‘literature review’ chapter in a doctoral thesis, but can also be the basis of an entire research project.

Meta-analysis is a specialised type of systematic review which is quantitative and rigorous, often comparing data and results across multiple similar studies. This is a common approach in medical research where several papers might report the results of trials of a particular treatment, for instance. The meta-analysis then uses statistical techniques to synthesize these into one summary. This can have a high statistical power, but care must be taken not to introduce bias in the selection and filtering of evidence.

Whichever type of review is employed, the process is similarly linear. The first step is to frame a question which can guide the review. This is used to identify relevant literature, often through searching subject-specific scientific databases. From these results the most relevant will be identified. Filtering is important here as there will be time constraints that prevent the researcher from considering every possible piece of evidence or theoretical viewpoint. Once a concrete evidence base has been identified, the researcher extracts relevant data before reporting the synthesized results in an extended piece of writing.

Literature Review: GO-GN Insights

Sarah Lambert used a systematic review of literature with both qualitative and quantitative phases to investigate the question “How can open education programs be reconceptualised as acts of social justice to improve the access, participation and success of those who are traditionally excluded from higher education knowledge and skills?”

“My PhD research used systematic review, qualitative synthesis, case study and discourse analysis techniques; each was underpinned and made coherent by a consistent critical inquiry methodology and an overarching research question.

“Systematic reviews are becoming increasingly popular as a way to collect evidence of what works across multiple contexts and can be said to address some of the weaknesses of case study designs which provide detail about a particular context – but which is often not replicable in other socio-cultural contexts (such as other countries or states). Publication of systematic reviews that are done according to well-defined methods are quite likely to be published in high-ranking journals – my PhD supervisors were keen on this from the outset and I was encouraged along this path.

“Previously I had explored social realist authors and a social realist approach to systematic reviews (Pawson on realist reviews) but they did not sufficiently embrace social relations, issues of power, inclusion/exclusion. My supervisors had pushed me to explain what kind of realist review I intended to undertake, and I found out there was a branch of critical realism which was briefly of interest. By getting deeply into theory and trying out ways of combining theory I also feel that I have developed a deeper understanding of conceptual working and the different ways theories can be used at all stages of research and even how to come up with novel conceptual frameworks.”

Useful references for Systematic Review & Meta-Analysis: Finfgeld-Connett (2014); Lambert (2020); Siddaway, Wood & Hedges (2019)

Research Methods Handbook Copyright © 2020 by Rob Farrow; Francisco Iniesto; Martin Weller; and Rebecca Pitt is licensed under a Creative Commons Attribution 4.0 International License , except where otherwise noted.


Systematic Reviews and Meta-Analysis: A Guide for Beginners

Affiliation.

  • 1 Department of Pediatrics, Advanced Pediatrics Centre, PGIMER, Chandigarh. Correspondence to: Prof Joseph L Mathew, Department of Pediatrics, Advanced Pediatrics Centre, PGIMER Chandigarh. [email protected].
  • PMID: 34183469
  • PMCID: PMC9065227
  • DOI: 10.1007/s13312-022-2500-y

Systematic reviews involve the application of scientific methods to reduce bias in review of literature. The key components of a systematic review are a well-defined research question, comprehensive literature search to identify all studies that potentially address the question, systematic assembly of the studies that answer the question, critical appraisal of the methodological quality of the included studies, data extraction and analysis (with and without statistics), and considerations towards applicability of the evidence generated in a systematic review. These key features can be remembered as the six 'A's: Ask, Access, Assimilate, Appraise, Analyze and Apply. Meta-analysis is a statistical tool that provides pooled estimates of effect from the data extracted from individual studies in the systematic review. The graphical output of meta-analysis is a forest plot which provides information on individual studies and the pooled effect. Systematic reviews of literature can be undertaken for all types of questions, and all types of study designs. This article highlights the key features of systematic reviews, and is designed to help readers understand and interpret them. It can also help to serve as a beginner's guide for both users and producers of systematic reviews and to appreciate some of the methodological issues.

Publication types

  • Meta-Analysis
  • Meta-Analysis as Topic*
  • Research Design
  • Systematic Reviews as Topic*

Systematic Reviews


What Makes a Systematic Review Different from Other Types of Reviews?


Reproduced from Grant, M. J. and Booth, A. (2009), A typology of reviews: an analysis of 14 review types and associated methodologies. Health Information & Libraries Journal, 26: 91–108. doi:10.1111/j.1471-1842.2009.00848.x

  • Last Updated: Apr 10, 2024 11:08 AM
  • URL: https://guides.library.ucla.edu/systematicreviews


  • Volume 16, Issue 1

What is meta-analysis?


  • Allison Shorten 1 ,
  • Brett Shorten 2
  • 1 School of Nursing , Yale University , New Haven, Connecticut , USA
  • 2 Informed Health Choices Trust, Wollongong, New South Wales, Australia
  • Correspondence to : Dr Allison Shorten Yale University School of Nursing, 100 Church Street South, PO Box 9740, New Haven, CT 06536, USA; allison.shorten{at}yale.edu

https://doi.org/10.1136/eb-2012-101118


When clinicians begin their search for the best available evidence to inform decision-making, they are usually directed to the top of the ‘evidence pyramid’ to find out whether a systematic review and meta-analysis have been conducted. The Cochrane Library 1 is fast filling with systematic reviews and meta-analyses that aim to answer important clinical questions and provide the most reliable evidence to inform practice and research. So what is meta-analysis and how can it contribute to practice?

The Five-step process

There is debate about the best practice for meta-analysis; however, there are five common steps.

Step 1: the research question

A clinical research question is identified and a hypothesis proposed. The likely clinical significance is explained and the study design and analytical plan are justified.

Step 2: systematic review

A systematic review (SR) is specifically designed to address the research question and conducted to identify all studies considered to be both relevant and of sufficiently good quality to warrant inclusion. Often, only studies published in established journals are identified, but identification of 'unpublished' data is important to avoid 'publication bias' or exclusion of studies with negative findings. 4 Some meta-analyses only consider randomised controlled trials (RCTs) in the quest for the highest-quality evidence. Other types of 'experimental' and 'quasi-experimental' studies may be included if they satisfy the defined inclusion/exclusion criteria.

Step 3: data extraction

Once studies are selected for inclusion in the meta-analysis, summary data or outcomes are extracted from each study. In addition, sample sizes and measures of data variability for both intervention and control groups are required. Depending on the study and the research question, outcome measures could include numerical measures or categorical measures. For example, differences in scores on a questionnaire or differences in a measurement level such as blood pressure would be reported as a numerical mean. However, differences in the likelihood of being in one category versus another (eg, vaginal birth versus cesarean birth) are usually reported in terms of risk measures such as OR or relative risk (RR).

Step 4: standardisation and weighting studies

Having assembled all the necessary data, the fourth step is to calculate appropriate summary measures from each study for further analysis. These measures are usually called Effect Sizes and represent the difference in average scores between intervention and control groups. For example, the difference in change in blood pressure between study participants who used drug X compared with participants who used a placebo. Since units of measurement typically vary across included studies, they usually need to be ‘standardised’ in order to produce comparable estimates of this effect. When different outcome measures are used, such as when researchers use different tests, standardisation is imperative. Standardisation is achieved by taking, for each study, the mean score for the intervention group, subtracting the mean for the control group and dividing this result by the appropriate measure of variability in that data set.
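To make the arithmetic concrete, the standardisation step can be sketched in a few lines of Python. This is an illustrative sketch, not code from the article: the trial numbers are hypothetical, and the small-sample correction shown is the common approximation that turns Cohen's d into Hedges' g.

```python
import math

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardised mean difference: intervention mean minus control mean,
    divided by the pooled standard deviation, with a small-sample
    correction factor applied (Cohen's d -> Hedges' g)."""
    # Pooled SD combines the variability of both groups.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                          / (n_t + n_c - 2))
    d = (mean_t - mean_c) / pooled_sd
    # Approximate small-sample correction factor J.
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)
    return d * correction

# Hypothetical trial: change in blood pressure, drug X versus placebo.
g = hedges_g(mean_t=-12.0, mean_c=-4.0, sd_t=9.0, sd_c=10.0, n_t=40, n_c=40)
```

A negative g here means the drug group's blood pressure fell further than the placebo group's, expressed in pooled standard-deviation units so it can be compared across studies that used different measurement scales.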

The results of some studies need to carry more weight than others. First, larger studies (as measured by sample size) are thought to produce more precise effect size estimates than smaller studies. Second, studies with less data variability, for example a smaller SD or narrower CI, are often regarded as 'better quality' in study design. A weighting statistic that incorporates both of these factors, known as inverse variance , is commonly used.
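Inverse-variance weighting itself is one line of arithmetic: each study's weight is the reciprocal of the variance of its effect estimate, so large, precise studies count for more. The effect sizes and variances below are hypothetical, chosen only to show the first (most precise) study dominating the pooled weight.

```python
# Hypothetical standardised effect sizes and the variance of each estimate.
effects = [0.40, 0.55, 0.30]
variances = [0.010, 0.040, 0.090]

# Inverse-variance weights: smaller variance -> larger weight.
weights = [1 / v for v in variances]
total = sum(weights)
relative = [w / total for w in weights]  # each study's share of the pooled estimate
```

Here the first study, with a tenth the variance of the third, ends up contributing roughly three-quarters of the total weight.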

Step 5: final estimates of effect

The final stage is to select and apply an appropriate model to compare Effect Sizes across different studies. The most common models are Fixed Effects and Random Effects models. Fixed Effects models are based on the 'assumption that every study is evaluating a common treatment effect'. 5 This means the assumption is that all studies would estimate the same Effect Size were it not for different levels of sample variability across studies. In contrast, the Random Effects model 'assumes that the true treatment effects in the individual studies may be different from each other' 5 and attempts to allow for this additional source of interstudy variation in Effect Sizes . Whether this latter source of variability is likely to be important is often assessed within the meta-analysis by testing for 'heterogeneity'.
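The contrast between the two models can be sketched numerically. The sketch below uses the widely used DerSimonian-Laird estimator for the between-study variance (tau-squared) in the random-effects model; the effect sizes and variances are hypothetical and deliberately heterogeneous so the two models disagree.

```python
import math

def fixed_effect(effects, variances):
    # Fixed Effects: one common true effect; weight purely by inverse variance.
    w = [1 / v for v in variances]
    est = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    return est, math.sqrt(1 / sum(w))

def random_effects(effects, variances):
    # Random Effects (DerSimonian-Laird): estimate the between-study
    # variance tau^2 from Cochran's Q, add it to every study's variance,
    # then re-weight and pool again.
    w = [1 / v for v in variances]
    fe, _ = fixed_effect(effects, variances)
    q = sum(wi * (e - fe)**2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    w_star = [1 / (v + tau2) for v in variances]
    est = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    return est, math.sqrt(1 / sum(w_star)), tau2

effects = [0.1, 0.5, 0.9]        # hypothetical, heterogeneous effect sizes
variances = [0.02, 0.04, 0.08]
fe_est, fe_se = fixed_effect(effects, variances)
re_est, re_se, tau2 = random_effects(effects, variances)
```

With these inputs the random-effects estimate sits closer to the unweighted mean and carries a wider standard error than the fixed-effect estimate, which is exactly the price of allowing the true effects to differ across studies.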

Forest plot

The final estimates from a meta-analysis are often graphically reported in the form of a ‘Forest Plot’.

In the hypothetical Forest Plot shown in figure 1 , for each study, a horizontal line indicates the standardised Effect Size estimate (the rectangular box in the centre of each line) and 95% CI for the risk ratio used. For each of the studies, drug X reduced the risk of death (the risk ratio is less than 1.0). However, the first study was larger than the other two (the size of the boxes represents the relative weights calculated by the meta-analysis). Perhaps, because of this, the estimates for the two smaller studies were not statistically significant (the lines emanating from their boxes include the value of 1). When all the three studies were combined in the meta-analysis, as represented by the diamond, we get a more precise estimate of the effect of the drug, where the diamond represents both the combined risk ratio estimate and the limits of the 95% CI.
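The arithmetic behind such a plot can be sketched by pooling risk ratios on the log scale. The three studies below are hypothetical, constructed to mimic the figure described: one large study with a narrow CI and two small studies whose CIs cross 1.

```python
import math

# (label, risk ratio, standard error of the log risk ratio) -- hypothetical.
studies = [("Study 1", 0.80, 0.08),   # large study: narrow CI
           ("Study 2", 0.75, 0.20),   # small study: wide CI
           ("Study 3", 0.85, 0.25)]   # small study: wide CI

def rr_ci(rr, se):
    # 95% CI, computed on the log scale and exponentiated back.
    return (math.exp(math.log(rr) - 1.96 * se),
            math.exp(math.log(rr) + 1.96 * se))

cis = {label: rr_ci(rr, se) for label, rr, se in studies}

# Pool the log risk ratios with inverse-variance weights (the "diamond").
w = [1 / se**2 for _, _, se in studies]
pooled_log = sum(wi * math.log(rr)
                 for (_, rr, _), wi in zip(studies, w)) / sum(w)
pooled_se = math.sqrt(1 / sum(w))
pooled_rr = math.exp(pooled_log)
pooled_ci = (math.exp(pooled_log - 1.96 * pooled_se),
             math.exp(pooled_log + 1.96 * pooled_se))
```

The two small studies' intervals include 1 (not statistically significant on their own), while the pooled interval is both narrower and entirely below 1, mirroring the diamond in the hypothetical plot.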


Hypothetical Forest Plot

Relevance to practice and research

Many Evidence Based Nursing commentaries feature recently published systematic reviews and meta-analyses because they not only bring new insight or strength to recommendations about the most effective healthcare practices but also identify where future research should be directed to bridge gaps or limitations in current evidence. The strength of conclusions from a meta-analysis largely depends on the quality of the data available for synthesis, which reflects the quality of both the individual studies and the systematic review. Meta-analysis does not magically resolve the problem of underpowered or poorly designed studies, and clinicians can be frustrated to find that, even when a meta-analysis has been conducted, all the researchers can conclude is that the evidence is weak, there is uncertainty about the effects of treatment, and higher-quality research is needed to better inform practice. This is still an important finding that can inform practice and challenge us to fill the evidence gaps with better quality research in the future.

  • ↵ The Cochrane Library . http://www.thecochranelibrary.com/view/0/index.html (accessed 23 Oct 2012).
  • Davey Smith G
  • Higgins JPT ,

Competing interests: None.


A Guide to Conducting Reviews: Meta-Analysis



Definition : A specialized subset of systematic reviews, meta-analysis is a statistical technique for combining the findings from disparate quantitative studies and using the pooled data to come to new statistical conclusions. Not all systematic reviews include meta-analysis, but all meta-analyses are found in systematic reviews.

Aim : To synthesize evidence across studies to detect effects, estimate their magnitudes, and analyze the factors influencing those effects.

Key characteristics:

  • Uses statistical methods to objectively evaluate, synthesize, and summarize results.
  • Systematic reviews and meta-analyses are undertaken by a research team rather than individual researchers to facilitate expedited review of studies and reduce researcher bias.

Strengths : Combines individual studies to determine overall evidence-based strength. Conclusions produced by meta-analysis are statistically stronger than the analysis of any single study, due to increased numbers of subjects, greater diversity among subjects, or accumulated effects and results. 

Drawbacks/Limitations : Combining data from overly disparate studies can produce misleading or unreliable results. For a meta-analysis to be valid, all included studies must be sufficiently similar.

Source : TARG Bristol. (2017, November 13).  A three minute primer on meta-analysis  [Video]. YouTube. https://www.youtube.com/watch?v=i675gZNe3MY



University Library, University of Illinois at Urbana-Champaign


Literature Reviews in Medicine and Health


Meta-analysis

A quantitative method of combining the results of independent studies (usually drawn from the published literature) and synthesizing summaries and conclusions which may be used to evaluate therapeutic effectiveness, plan new studies, etc., with application chiefly in the areas of research and medicine.

Source: MeSH: https://www.ncbi.nlm.nih.gov/mesh/68015201

A technique that statistically combines the results of quantitative studies to provide a more precise estimate of the overall effect.

SEARCH aims for exhaustive, comprehensive searching. May use a funnel plot to assess completeness. Requires either a very sensitive search to retrieve all studies or separately conceived quantitative and qualitative strategies.

APPRAISAL includes quality assessment that may determine inclusion/exclusion and/or sensitivity analyses.

SYNTHESIS is graphical and tabular with narrative commentary.

ANALYSIS includes numerical analysis of measures of effect assuming absence of heterogeneity.

Source: Grant MJ & Booth A. A typology of reviews: an analysis of 14 review types and associated methodologies. Health Information and Libraries Journal. 2009;26(2):91–108. doi:10.1111/j.1471-1842.2009.00848.x
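The "absence of heterogeneity" assumption mentioned above can be checked numerically with Cochran's Q statistic and the derived I-squared percentage. A minimal sketch, with hypothetical effect sizes and variances:

```python
# Cochran's Q compares each study's effect with the fixed-effect pooled
# estimate; I^2 expresses the share of variation beyond what chance alone
# would produce. All inputs below are hypothetical.
effects = [0.20, 0.35, 0.60]
variances = [0.01, 0.02, 0.05]

w = [1 / v for v in variances]
pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
q = sum(wi * (e - pooled)**2 for wi, e in zip(w, effects))
df = len(effects) - 1
i_squared = max(0.0, (q - df) / q) * 100  # percent of variation beyond chance
```

A large I-squared (commonly read against rough 25/50/75% benchmarks) signals that pooling under a fixed-effect assumption may be inappropriate.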

Meta-Analysis Standards

  • PRISMA for Meta-Analyses checklist of essential reporting items to include when completing a network meta-analysis

Tools for Meta-Analyses

  • Meta Essentials Meta-Essentials is a free tool for meta-analysis. It facilitates the integration and synthesis of effect sizes from different studies. The tool consists of a set of workbooks designed for Microsoft Excel that, based on your input, automatically produces all the required statistics, tables, figures, and more.
  • MetaLight MetaLight is a software application designed to support the teaching and learning of meta-analysis. It is freely available and, as it uses the Silverlight browser plugin, can be run on just about any PC, thus facilitating its use in teaching environments such as PC labs.
  • Metafor The metafor package is a free and open-source add-on for conducting meta-analyses with the statistical software environment R. The package consists of a collection of functions that allow the user to calculate various effect size or outcome measures, fit equal-, fixed-, random-, and mixed-effects models to such data, carry out moderator and meta-regression analyses, and create various types of meta-analytical plots.
  • OpenMeta Completely open-source, cross-platform software for advanced meta-analysis.
  • OpenMEE OpenMEE is open-source software for performing meta-analysis suited to the needs of ecologists and evolutionary biologists. This program is made possible by the National Science Foundation (NSF).

Expert Commentary

The literature review and meta-analysis: 2 journalism tools you should use

Reporters can get up to date on a public policy issue quickly by reading a research literature review or meta-analysis. This article from the Education Writers Association explains how to find and use them.


This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

by Denise-Marie Ordway, The Journalist's Resource June 20, 2019


We’re republishing this article on research literature reviews and meta-analyses with permission from the Education Writers Association , which hired Journalist’s Resource’s managing editor, Denise-Marie Ordway, late last year to write it in her free time. Ordway is a veteran education reporter who joined the EWA’s board of directors in May.  

This piece was first published on the EWA’s website . It has been slightly edited to reflect Journalist’s Resource’s editorial style.

It’s important to note that while the examples used in this piece come from the education beat, the information applies to literature reviews and meta-analyses across academic fields.

———–

When journalists want to learn what’s known about a certain subject, they look for research. Scholars are continually conducting studies on education topics ranging from kindergarten readiness and teacher pay to public university funding and Ivy League admissions.

One of the best ways for a reporter to get up to date quickly, though, is to read a study of studies, which comes in two forms: a literature review and a meta-analysis.

A literature review is what it sounds like — a review of all the academic literature that exists on a specific issue or research question. If your school district or state is considering a new policy or approach, there’s no better way to educate yourself on what’s already been learned. Your news coverage also benefits from literature reviews: Rather than hunting down studies on your own and then worrying whether you found the right ones, you can, instead, share the results of a literature review that already has done that legwork for you.

Literature reviews examine both quantitative research, which is based on numerical data, and qualitative research, based on observations and other information that isn’t in numerical form. When scholars conduct a literature review, they summarize and synthesize multiple research studies and their findings, highlighting gaps in knowledge and the studies that are the strongest or most pertinent.

In addition, literature reviews often point out and explain disagreements between studies — why the results of one study seem to contradict the results of another.

For instance, a literature review might explain that the results of Study A and Study B differ because the two pieces of research focus on different populations or examine slightly different interventions. By relying on literature reviews, journalists also will be able to provide the context audiences need to make sense of the cumulative body of knowledge on a topic.

A meta-analysis also can be helpful to journalists, but for different reasons. To conduct a meta-analysis, scholars focus on quantitative research studies that generally aim to answer a research question — for example, whether there is a link between student suspension rates and academic achievement or whether a certain type of program reduces binge drinking among college students.

After pulling together the quantitative research that exists on the topic, scholars perform a systematic analysis of the numerical data and draw their own conclusions. The findings of a meta-analysis are statistically stronger than those reached in a single study, partly because pooling data from multiple, similar studies creates a larger sample.

The results of a meta-analysis are summarized as a single number or set of numbers that represent an average outcome for all the studies included in the review. A meta-analysis might tell us, for example, how many children, on average, are bullied in middle school, or the average number of points SAT scores rise after students complete a specific type of tutoring program.
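That "single number" is, at its simplest, a weighted average across studies. As an illustration only, with made-up bullying surveys weighted by sample size:

```python
# Hypothetical surveys: (share of middle schoolers reporting bullying, sample size).
surveys = [(0.22, 1200), (0.30, 300), (0.25, 500)]

total_n = sum(n for _, n in surveys)
# Weighted average: bigger surveys pull the pooled figure toward their result.
pooled_share = sum(p * n for p, n in surveys) / total_n
```

The pooled share lands close to the largest survey's figure, which is the whole point of weighting by sample size, and also the reason a single pooled number can hide the spread across studies.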

It’s important to note that a meta-analysis is vulnerable to misinterpretation because its results can be deceptively simple: Just as you can’t learn everything about students from viewing their credit ratings or graduation rates, you can miss out on important nuances when you attempt to synthesize an entire body of research with a single number or set of numbers generated by a meta-analysis.

For journalists, literature reviews and meta-analyses are important tools for investigating public policy issues and fact-checking claims made by elected leaders, campus administrators and others. But to use them, reporters first need to know how to find them. And, as with any source of information, reporters also should be aware of the potential flaws and biases of these research overviews.

Finding research

The best places to find literature reviews and meta-analyses are peer-reviewed academic journals such as the Review of Educational Research, Social Problems, and PNAS (short for Proceedings of the National Academy of Sciences of the United States of America). While publication in a journal does not guarantee quality, the peer-review process is designed for quality control. Typically, papers appearing in top-tier journals have survived detailed critiques by scholars with expertise in the field. Thus, academic journals are an important source of reliable, evidence-based knowledge.

An easy way to find journal articles is by using Google Scholar, a free search engine that indexes published and unpublished research. Another option is to go directly to journal websites. Although  many academic journals keep their research behind paywalls, some provide journalists with free subscriptions or special access codes. Other ways to get around journal paywalls are outlined in a tip sheet that Journalist’s Resource , a project of Harvard’s Shorenstein Center on Media, Politics and Public Policy, created specifically for reporters.

Another thing to keep in mind: Literature reviews and meta-analyses do not exist on every education topic. If you have trouble finding one, reach out to an education professor or research organization such as the American Educational Research Association for guidance.

Sources of bias

Because literature reviews and meta-analyses are based on an examination of multiple studies, the strength of their findings relies heavily on three factors:

  • the quality of each included study,
  • ​the completeness of researchers’ search for scholarship on the topic of interest, and
  • ​researchers’ decisions about which studies to include and leave out.

In fact, many of the choices researchers make during each step of designing and carrying out a meta-analysis can create biases that might influence their results.

Knowing these things can help journalists gauge the quality of a literature review or meta-analysis and ask better questions about them. This comes in handy for reporters wanting to take a critical lens to their coverage of these two forms of research, especially those claiming to have made a groundbreaking discovery.

That said, vetting a review or meta-analysis can be time-consuming. Remember that journalists are not expected to be experts in research methods. When in doubt, contact education researchers for guidance and insights. Also, be sure to interview authors about their studies’ strengths, weaknesses, limitations and real-world implications.

Study quality, appropriateness

If scholars perform a meta-analysis using biased data or data from studies that are too dissimilar, the findings might be misleading — or outright incorrect. One of the biggest potential flaws of meta-analyses is the pooling of data from studies that should not be combined. For example, even if two individual studies focus on school meals, the authors might be looking at different populations, using different definitions and collecting data differently.

Perhaps the authors of the first study consider a school meal to be a hot lunch prepared by a public school cafeteria in Oklahoma, while the research team for the second study defines a school meal as any food an adult or child eats at college preparatory schools throughout Europe. What if the first study relies on data collected from school records over a decade and the second relies on data extracted from a brief online survey of students? Researchers performing a meta-analysis would need to make a judgment call about the appropriateness of merging information from these two studies, conducted in different parts of the world.

Search completeness

Researchers should explain how hard they worked to find all the research that exists on the topic they examined. Small differences in search strategies can lead to substantial differences in search results. If, for instance, search terms are too vague or specific, scholars might miss some compelling studies. Likewise, results may vary according to the databases, websites and search engines used.

Decisions about what to include

Scholars are not supposed to cherry-pick the research they include in literature reviews and meta-analyses. But decisions researchers make about which kinds of scholarship make the cut can influence conclusions.

Should they include unpublished research, such as working papers and papers presented at academic conferences? Does it make sense to exclude studies written in foreign languages? What about doctoral dissertations? Should researchers only include studies that have been published in journals, which tend to favor research with positive findings? Some scholars argue that meta-analyses that rely solely on published research offer misleading findings.

Other factors to consider

As journalists consider how the process of conducting literature reviews and meta-analyses affects results, they also should look for indicators of quality among the individual research studies examined. For example:

  • Sample sizes: Bigger samples tend to provide more accurate results than smaller ones.
  • ​Study duration: Data collected over several years generally offer a more complete picture than data gathered over a few weeks.
  • ​Study age: In some cases, an older study might not be reliable anymore. If a study appears to be too old, ask yourself if there is a reason to expect that conditions have changed substantially since its publication or release.
  • ​Researcher credentials: A scholar’s education, work experience and publication history often reflect their level of expertise.

About The Author


Denise-Marie Ordway

  • Open access
  • Published: 08 April 2024

A systematic review and multivariate meta-analysis of the physical and mental health benefits of touch interventions

Julian Packheiser (ORCID: 0000-0001-9805-6755), Helena Hartmann, Kelly Fredriksen, Valeria Gazzola (ORCID: 0000-0003-0324-0619), Christian Keysers (ORCID: 0000-0002-2845-5467) & Frédéric Michon (ORCID: 0000-0003-1289-2133)

Nature Human Behaviour (2024)


  • Human behaviour
  • Paediatric research
  • Randomized controlled trials

Receiving touch is of critical importance, as many studies have shown that touch promotes mental and physical well-being. We conducted a pre-registered (PROSPERO: CRD42022304281) systematic review and multilevel meta-analysis encompassing 137 studies in the meta-analysis and 75 additional studies in the systematic review ( n  = 12,966 individuals, search via Google Scholar, PubMed and Web of Science until 1 October 2022) to identify critical factors moderating touch intervention efficacy. Included studies always featured a touch versus no touch control intervention with diverse health outcomes as dependent variables. Risk of bias was assessed via small study, randomization, sequencing, performance and attrition bias. Touch interventions were especially effective in regulating cortisol levels (Hedges’ g  = 0.78, 95% confidence interval (CI) 0.24 to 1.31) and increasing weight (0.65, 95% CI 0.37 to 0.94) in newborns as well as in reducing pain (0.69, 95% CI 0.48 to 0.89), feelings of depression (0.59, 95% CI 0.40 to 0.78) and state (0.64, 95% CI 0.44 to 0.84) or trait anxiety (0.59, 95% CI 0.40 to 0.77) for adults. Comparing touch interventions involving objects or robots resulted in similar physical (0.56, 95% CI 0.24 to 0.88 versus 0.51, 95% CI 0.38 to 0.64) but lower mental health benefits (0.34, 95% CI 0.19 to 0.49 versus 0.58, 95% CI 0.43 to 0.73). Adult clinical cohorts profited more strongly in mental health domains compared with healthy individuals (0.63, 95% CI 0.46 to 0.80 versus 0.37, 95% CI 0.20 to 0.55). We found no difference in health benefits in adults when comparing touch applied by a familiar person or a health care professional (0.51, 95% CI 0.29 to 0.73 versus 0.50, 95% CI 0.38 to 0.61), but parental touch was more beneficial in newborns (0.69, 95% CI 0.50 to 0.88 versus 0.39, 95% CI 0.18 to 0.61). Small but significant small study bias and the impossibility to blind experimental conditions need to be considered. 
Leveraging factors that influence touch intervention efficacy will help maximize the benefits of future interventions and focus research in this field.


The sense of touch has immense importance for many aspects of our life. It is the first of all the senses to develop in newborns 1 and the most direct experience of contact with our physical and social environment 2 . Complementing our own touch experience, we also regularly receive touch from others around us, for example, through consensual hugs, kisses or massages 3 .

The recent coronavirus pandemic has raised awareness regarding the need to better understand the effects that touch—and its reduction during social distancing—can have on our mental and physical well-being. The most common touch interventions, for example, massage for adults or kangaroo care for newborns, have been shown to have a wide range of both mental and physical health benefits, from facilitating growth and development to buffering against anxiety and stress, over the lifespan of humans and animals alike 4 . Despite the substantial weight this literature gives to support the benefits of touch, it is also characterized by a large variability in, for example, studied cohorts (adults, children, newborns and animals), type and duration of applied touch (for example, one-time hug versus repeated 60-min massages), measured health outcomes (ranging from physical health outcomes such as sleep and blood pressure to mental health outcomes such as depression or mood) and who actually applies the touch (for example, partner versus stranger).

A meaningful tool to make sense of this vast amount of research is through meta-analysis. While previous meta-analyses on this topic exist, they were limited in scope, focusing only on particular types of touch, cohorts or specific health outcomes (for example, refs. 5 , 6 ). Furthermore, despite best efforts, meaningful variables that moderate the efficacy of touch interventions could not yet be identified. However, understanding these variables is critical to tailor touch interventions and guide future research to navigate this diverse field with the ultimate aim of promoting well-being in the population.

In this Article, we describe a pre-registered, large-scale systematic review and multilevel, multivariate meta-analysis to address this need with quantitative evidence for (1) the effect of touch interventions on physical and mental health and (2) which moderators influence the efficacy of the intervention. In particular, we ask whether and how strongly health outcomes depend on the dynamics of the touching dyad (for example, humans or robots/objects, familiarity and touch directionality), demographics (for example, clinical status, age or sex), delivery means (for example, type of touch intervention or touched body part) and procedure (for example, duration or number of sessions). We did so separately for newborns and for children and adults, as the health outcomes in newborns differed substantially from those in the other age groups. Despite the focus of the analysis being on humans, it is widely known that many animal species benefit from touch interactions and that engaging in touch promotes their well-being as well 7 . Since animal models are essential for the investigation of the mechanisms underlying biological processes and for the development of therapeutic approaches, we accordingly included health benefits of touch interventions in non-human animals as part of our systematic review. However, this search yielded only a small number of studies, suggesting a lack of research in this domain, and as such, was insufficient to be included in the meta-analysis. We evaluate the identified animal studies and their findings in the discussion.

Touch interventions have a medium-sized effect

The pre-registration can be found at ref. 8 . The flowchart for data collection and extraction is depicted in Fig. 1 .

Figure 1: Flowchart for data collection and extraction. Animal outcomes refer to outcomes measured in non-human species that were solely considered as part of a systematic review. Included languages were French, Dutch, German and English, but our search did not identify any articles in French, Dutch or German. MA, meta-analysis.

For adults, a total of n  = 2,841 and n  = 2,556 individuals in the touch and control groups, respectively, across 85 studies and 103 cohorts were included. The effect of touch overall was medium-sized ( t (102) = 9.74, P  < 0.001, Hedges’ g  = 0.52, 95% confidence interval (CI) 0.42 to 0.63; Fig. 2a ). For newborns, we could include 63 cohorts across 52 studies comprising a total of n  = 2,134 and n  = 2,086 newborns in the touch and control groups, respectively, with an overall effect almost identical to the older age group ( t (62) = 7.53, P  < 0.001, Hedges’ g  = 0.56, 95% CI 0.41 to 0.71; Fig. 2b ), suggesting that, despite distinct health outcomes, touch interventions show comparable effects across newborns and adults. Using these overall effect estimates, we conducted a power sensitivity analysis of all the included primary studies to investigate whether such effects could be reliably detected 9 . Sufficient power to detect such effect sizes was rare in individual studies, as investigated by firepower plots 10 (Supplementary Figs. 1 and 2 ). No individual effect size from either meta-analysis was overly influential (Cook’s D  < 0.06). The benefits were similar for mental and physical outcomes (mental versus physical; adults: t (101) = 0.79, P  = 0.432, Hedges’ g difference of −0.05, 95% CI −0.16 to 0.07, Fig. 2c ; newborns: t (61) = 1.08, P  = 0.284, Hedges’ g difference of −0.19, 95% CI −0.53 to 0.16, Fig. 2d ).

Figure 2

a , Orchard plot illustrating the overall benefits across all health outcomes for adults/children across 469 in part dependent effect sizes from 85 studies and 103 cohorts. b , The same as a but for newborns across 174 in part dependent effect sizes from 52 studies and 63 cohorts. c , The same as a but separating the results for physical versus mental health benefits across 469 in part dependent effect sizes from 85 studies and 103 cohorts. d , The same as b but separating the results for physical versus mental health benefits across 172 in part dependent effect sizes from 52 studies and 63 cohorts. Each dot reflects a measured effect, and the number of effects ( k ) included in the analysis is depicted in the bottom left. Mean effects and 95% CIs are presented in the bottom right and are indicated by the central black dot (mean effect) and its error bars (95% CI). The heterogeneity Q statistic is presented in the top left. Overall effects of moderator impact were assessed via an F test, and post hoc comparisons were done using t tests (two-sided test). Note that the P values above the mean effects indicate whether an effect differed significantly from a zero effect. P values were not corrected for multiple comparisons. The dot size reflects the precision of each individual effect (larger indicates higher precision). Small-study bias for the overall effect was significant ( F test, two-sided test) in the adult meta-analysis ( F (1, 101) = 21.24, P  < 0.001; Supplementary Fig. 3 ) as well as in the newborn meta-analysis ( F (1, 61) = 5.25, P  = 0.025; Supplementary Fig. 4 ).
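The small-study bias reported in the caption was assessed within the multilevel model via an F test. A minimal univariate analogue of such a check is an Egger-style weighted regression of effect sizes on their standard errors, where a slope far from zero suggests that smaller (noisier) studies report systematically larger effects. The sketch below uses synthetic numbers and is illustrative only, not the authors’ implementation:

```python
def egger_slope(g, var):
    """Egger-style small-study check: inverse-variance-weighted regression
    of effect sizes on their standard errors (closed form for a line)."""
    se = [v ** 0.5 for v in var]
    w = [1.0 / v for v in var]
    sw = sum(w)
    swx = sum(wi * s for wi, s in zip(w, se))
    swy = sum(wi * gi for wi, gi in zip(w, g))
    swxx = sum(wi * s * s for wi, s in zip(w, se))
    swxy = sum(wi * s * gi for wi, s, gi in zip(w, se, g))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return intercept, slope

# Synthetic example: effects grow with SE, mimicking small-study bias
se = [0.1, 0.2, 0.3, 0.4]
intercept, slope = egger_slope([0.1 + 2 * s for s in se], [s * s for s in se])
```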


On the basis of the overall effect of both meta-analyses as well as their median sample sizes, the minimum number of studies necessary for subgroup analyses to achieve 80% power was k  = 9 effects for adults and k  = 8 effects for newborns (Supplementary Figs. 5 and 6 ). Assessing specific health outcomes with sufficient power in more detail in adults (Fig. 3a ) revealed smaller benefits to sleep and heart rate parameters; moderate benefits to positive and negative affect, diastolic and systolic blood pressure, mobility and reductions of the stress hormone cortisol; and larger benefits to trait and state anxiety, depression, fatigue and pain. Post hoc tests revealed stronger benefits for pain, state anxiety, depression and trait anxiety compared with respiratory, sleep and heart rate parameters (see Fig. 3 for all post hoc comparisons). Reductions in pain and state anxiety were larger than reductions in negative affect ( t (83) = 2.54, P  = 0.013, Hedges’ g difference of 0.31, 95% CI 0.07 to 0.55; t (83) = 2.31, P  = 0.024, Hedges’ g difference of 0.27, 95% CI 0.03 to 0.51). Benefits to pain symptoms were larger than benefits to positive affect ( t (83) = 2.22, P  = 0.030, Hedges’ g difference of 0.29, 95% CI 0.04 to 0.54). Finally, touch resulted in larger benefits to cortisol release compared with heart rate parameters ( t (83) = 2.30, P  = 0.024, Hedges’ g difference of 0.26, 95% CI 0.04–0.48).
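Minimum-k thresholds such as those above can be approximated with a standard normal-approximation power calculation for a pooled standardized mean difference. The sketch below is a simplified fixed/random-effects approximation, not the authors’ sensitivity analysis; the per-group sample size of 30 and zero heterogeneity are invented assumptions for illustration:

```python
import math

def metapower(delta, k, n_per_group, tau2=0.0):
    """Approximate power to detect a pooled effect of size delta from k
    studies, each comparing two groups of n_per_group (two-sided alpha of
    0.05, normal approximation; tau2 is between-study heterogeneity)."""
    # Typical within-study variance of a standardized mean difference
    v = 2.0 / n_per_group + delta**2 / (4.0 * n_per_group)
    se = math.sqrt((v + tau2) / k)      # SE of the pooled estimate
    z = delta / se
    z_crit = 1.959964                   # z for two-sided alpha = 0.05
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return 1.0 - phi(z_crit - z) + phi(-z_crit - z)

# e.g. power for g = 0.5 with k = 9 studies of 30 participants per group
power_k9 = metapower(0.5, k=9, n_per_group=30)
```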

Figure 3

a , b , Health outcomes in adults analysed across 405 in part dependent effect sizes from 79 studies and 97 cohorts ( a ) and in newborns analysed across 105 in part dependent effect sizes from 46 studies and 56 cohorts ( b ). The type of health outcomes measured differed between adults and newborns and were thus analysed separately. Numbers on the right represent the mean effect with its 95% CI in square brackets and the significance level estimating the likelihood that the effect is equal to zero. Overall effects of moderator impact were assessed via an F test, and post hoc comparisons were done using t tests (two-sided test). The F value in the top right represents a test of the hypothesis that all effects within the subpanel are equal. The Q statistic represents the heterogeneity. P values of post hoc tests are depicted whenever significant. P values above the horizontal whiskers indicate whether an effect differed significantly from a zero effect. Vertical lines indicate significant post hoc tests between moderator levels. P values were not corrected for multiple comparisons. Physical outcomes are marked in red. Mental outcomes are marked in blue.

In newborns, only physical health effects offered sufficient data for further analysis. We found no benefits for digestion and heart rate parameters. All other health outcomes (cortisol, liver enzymes, respiration, temperature regulation and weight gain) showed medium to large effects (Fig. 3b ). We found no significant differences among any specific health outcomes.

Non-human touch and skin-to-skin contact

In some situations, a fellow human is not readily available to provide affective touch, raising the question of the efficacy of touch delivered by objects and robots 11 . Overall, touch delivered by humans and touch delivered by objects both had medium-sized health benefits in adults, without a significant difference between them ( t (99) = 1.05, P  = 0.295, Hedges’ g difference of 0.12, 95% CI −0.11 to 0.35; Fig. 4a ). However, differentiating physical versus mental health benefits revealed similar benefits of human and object touch for physical health outcomes, but larger benefits for mental outcomes when humans were touched by humans ( t (97) = 2.32, P  = 0.022, Hedges’ g difference of 0.24, 95% CI 0.04 to 0.44; Fig. 4b ). Notably, touch delivered by an object still showed a significant effect (see Supplementary Fig. 7 for the corresponding orchard plot).

Figure 4

a , Forest plot comparing humans versus objects touching a human on health outcomes overall across 467 in part dependent effect sizes from 85 studies and 101 cohorts. b , The same as a but separately for mental versus physical health outcomes across 467 in part dependent effect sizes from 85 studies and 101 cohorts. c , Results with the removal of all object studies, leaving 406 in part dependent effect sizes from 71 studies and 88 cohorts to identify whether missing skin-to-skin contact is the relevant mediator of higher mental health effects in human–human interactions. Numbers on the right represent the mean effect with its 95% CI in square brackets and the significance level estimating the likelihood that the effect is equal to zero. Overall effects of moderator impact were assessed via an F test, and post hoc comparisons were done using t tests (two-sided test). The F value in the top right represents a test of the hypothesis that all effects within the subpanel are equal. The Q statistic represents the heterogeneity. P values of post hoc tests are depicted whenever significant. P values above the horizontal whiskers indicate whether an effect differed significantly from a zero effect. Vertical lines indicate significant post hoc tests between moderator levels. P values were not corrected for multiple comparisons. Physical outcomes are marked in red. Mental outcomes are marked in blue.

We considered the possibility that this effect was due to missing skin-to-skin contact in human–object interactions. Thus, we investigated human–human interactions with and without skin-to-skin contact (Fig. 4c ). In line with the hypothesis that skin-to-skin contact is highly relevant, we again found stronger mental health benefits in the presence of skin-to-skin contact, although this difference did not reach nominal significance ( t (69) = 1.95, P  = 0.055, Hedges’ g difference of 0.41, 95% CI −0.00 to 0.82), possibly because skin-to-skin contact was rarely absent in human–human interactions, which reduced the power of this analysis. Results for skin-to-skin contact as an overall moderator can be found in Supplementary Fig. 8 .

Influences of type of touch

The large majority of touch interventions comprised massage therapy in adults and kangaroo care in newborns (see Supplementary Table 1 for a complete list of interventions across studies). However, comparing the different types of touch explored across studies did not reveal significant differences in effect sizes based on touch type, whether for overall health benefits (adults: t (101) = 0.11, P  = 0.916, Hedges’ g difference of 0.02, 95% CI −0.32 to 0.29; Fig. 5a ) or when comparing different forms of touch separately for physical (massage therapy versus other forms: t (99) = 0.99, P  = 0.325, Hedges’ g difference of 0.16, 95% CI −0.15 to 0.47) or for mental health benefits (massage therapy versus other forms: t (99) = 0.75, P  = 0.458, Hedges’ g difference of 0.13, 95% CI −0.22 to 0.48) in adults (Fig. 5c ; see Supplementary Fig. 9 for the corresponding orchard plot). A similar picture emerged for physical health effects in newborns (massage therapy versus kangaroo care: t (58) = 0.94, P  = 0.353, Hedges’ g difference of 0.15, 95% CI −0.17 to 0.47; massage therapy versus other forms: t (58) = 0.56, P  = 0.577, Hedges’ g difference of 0.13, 95% CI −0.34 to 0.60; kangaroo care versus other forms: t (58) = 0.07, P  = 0.947, Hedges’ g difference of 0.02, 95% CI −0.46 to 0.50; Fig. 5d ; see also Supplementary Fig. 10 for the corresponding orchard plot). This suggests that touch types may be flexibly adapted to the setting of each touch intervention.

Figure 5

a , Forest plot of health benefits comparing massage therapy versus other forms of touch in adult cohorts across 469 in part dependent effect sizes from 85 studies and 103 cohorts. b , Forest plot of health benefits comparing massage therapy, kangaroo care and other forms of touch for newborns across 174 in part dependent effect sizes from 52 studies and 63 cohorts. c , The same as a but separating mental and physical health benefits across 469 in part dependent effect sizes from 85 studies and 103 cohorts. d , The same as b but separating mental and physical health outcomes where possible across 164 in part dependent effect sizes from 51 studies and 62 cohorts. Note that an insufficient number of studies assessed mental health benefits of massage therapy or other forms of touch to be included. Numbers on the right represent the mean effect with its 95% CI in square brackets and the significance level estimating the likelihood that the effect is equal to zero. Overall effects of moderator impact were assessed via an F test, and post hoc comparisons were done using t tests (two-sided test). The F value in the top right represents a test of the hypothesis that all effects within the subpanel are equal. The Q statistic represents heterogeneity. P values of post hoc tests are depicted whenever significant. P values above the horizontal whiskers indicate whether an effect differed significantly from a zero effect. Vertical lines indicate significant post hoc tests between moderator levels. P values were not corrected for multiple comparisons. Physical outcomes are marked in red. Mental outcomes are marked in blue.

The role of clinical status

Most research on touch interventions has focused on clinical samples, but are benefits restricted to clinical cohorts? We found health benefits to be significant in clinical and healthy populations (Fig. 6 ), whether all outcomes are considered (Fig. 6a,b ) or physical and mental health outcomes are separated (Fig. 6c,d , see Supplementary Figs. 11 and 12 for the corresponding orchard plots). In adults, however, we found higher mental health benefits for clinical populations compared with healthy ones (Fig. 6c ; t (99) = 2.11, P  = 0.037, Hedges’ g difference of 0.25, 95% CI 0.01 to 0.49).

Figure 6

a , Health benefits for clinical cohorts of adults versus healthy cohorts of adults across 469 in part dependent effect sizes from 85 studies and 103 cohorts. b , The same as a but for newborn cohorts across 174 in part dependent effect sizes from 52 studies and 63 cohorts. c , The same as a but separating mental versus physical health benefits across 469 in part dependent effect sizes from 85 studies and 103 cohorts. d , The same as b but separating mental versus physical health benefits across 172 in part dependent effect sizes from 52 studies and 63 cohorts. Numbers on the right represent the mean effect with its 95% CI in square brackets and the significance level estimating the likelihood that the effect is equal to zero. Overall effects of moderator impact were assessed via an F test, and post hoc comparisons were done using t tests (two-sided test).The F value in the top right represents a test of the hypothesis that all effects within the subpanel are equal. The Q statistic represents the heterogeneity. P values of post hoc tests are depicted whenever significant. P values above the horizontal whiskers indicate whether an effect differed significantly from a zero effect. Vertical lines indicate significant post hoc tests between moderator levels. P values were not corrected for multiple comparisons. Physical outcomes are marked in red. Mental outcomes are marked in blue.

A more detailed analysis of specific clinical conditions in adults revealed positive mental and physical health benefits for almost all assessed clinical disorders. Differences between disorders were not found, with the exception of increased effectiveness of touch interventions in neurological disorders (Supplementary Fig. 13 ).

Familiarity in the touching dyad and intervention location

Touch interventions can be performed either by familiar touchers (partners, family members or friends) or by unfamiliar touchers (health care professionals). In adults, we did not find an impact of familiarity of the toucher ( t (99) = 0.12, P  = 0.905, Hedges’ g difference of 0.02, 95% CI −0.27 to 0.24; Fig. 7a ; see Supplementary Fig. 14 for the corresponding orchard plot). Similarly, investigating the impact on mental and physical health benefits specifically, no significant differences could be detected, suggesting that familiarity is irrelevant in adults. In contrast, touch applied to newborns by their parents (almost all studies only included touch by the mother) was significantly more beneficial compared with unfamiliar touch ( t (60) = 2.09, P  = 0.041, Hedges’ g difference of 0.30, 95% CI 0.01 to 0.59) (Fig. 7b ; see Supplementary Fig. 15 for the corresponding orchard plot). Investigating mental and physical health benefits specifically revealed no significant differences. Familiarity with the location in which the touch was applied (familiar being, for example, the participants’ home) did not influence the efficacy of touch interventions (Supplementary Fig. 16 ).

Figure 7

a , Health benefits for being touched by a familiar (for example, partner, family member or friend) versus unfamiliar toucher (health care professional) across 463 in part dependent effect sizes from 83 studies and 101 cohorts. b , The same as a but for newborn cohorts across 171 in part dependent effect sizes from 51 studies and 62 cohorts. c , The same as a but separating mental versus physical health benefits across 463 in part dependent effect sizes from 83 studies and 101 cohorts. d , The same as b but separating mental versus physical health benefits across 169 in part dependent effect sizes from 51 studies and 62 cohorts. Numbers on the right represent the mean effect with its 95% CI in square brackets and the significance level estimating the likelihood that the effect is equal to zero. Overall effects of moderator impact were assessed via an F test, and post hoc comparisons were done using t tests (two-sided test). The F value in the top right represents a test of the hypothesis that all effects within the subpanel are equal. The Q statistic represents the heterogeneity. P values of post hoc tests are depicted whenever significant. P values above the horizontal whiskers indicate whether an effect differed significantly from a zero effect. Vertical lines indicate significant post hoc tests between moderator levels. P values were not corrected for multiple comparisons. Physical outcomes are marked in red. Mental outcomes are marked in blue.

Frequency and duration of touch interventions

How often and for how long should touch be delivered? For adults, the median touch duration across studies was 20 min and the median number of touch interventions was four sessions with an average time interval of 2.3 days between each session. For newborns, the median touch duration across studies was 17.5 min and the median number of touch interventions was seven sessions with an average time interval of 1.3 days between each session.

Delivering more touch sessions increased benefits in adults, whether overall ( t (101) = 4.90, P  < 0.001, Hedges’ g  = 0.02, 95% CI 0.01 to 0.03), physical ( t (81) = 3.07, P  = 0.003, Hedges’ g  = 0.02, 95% CI 0.01–0.03) or mental benefits ( t (72) = 5.43, P  < 0.001, Hedges’ g  = 0.02, 95% CI 0.01–0.03) were measured (Fig. 8a ). A closer look at specific outcomes for which sufficient data were available revealed that positive associations between the number of sessions and outcomes were found for trait anxiety ( t (12) = 7.90, P  < 0.001, Hedges’ g  = 0.03, 95% CI 0.02–0.04), depression ( t (20) = 10.69, P  < 0.001, Hedges’ g  = 0.03, 95% CI 0.03–0.04) and pain ( t (37) = 3.65, P  < 0.001, Hedges’ g  = 0.03, 95% CI 0.02–0.05), indicating a need for repeated sessions to improve these adverse health outcomes. Neither increasing the number of sessions for newborns nor increasing the duration of touch per session in adults or newborns increased health benefits, be they physical or mental (Fig. 8b–d ). For continuous moderators in adults, we also looked at specific health outcomes as sufficient data were generally available for further analysis. Surprisingly, we found significant negative associations between touch duration and reductions of cortisol ( t (24) = 2.71, P  = 0.012, Hedges’ g  = −0.01, 95% CI −0.01 to −0.00) and heart rate parameters ( t (21) = 2.35, P  = 0.029, Hedges’ g  = −0.01, 95% CI −0.02 to −0.00).
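The dose–response associations above are meta-regressions of effect size on a continuous moderator (here, number of sessions). A minimal fixed-effect sketch follows; the paper used a multilevel, multivariate model, and the numbers below are invented for illustration:

```python
def meta_regression(g, var, x):
    """Inverse-variance-weighted meta-regression of effect sizes g on a
    continuous moderator x (e.g. number of sessions); fixed-effect
    approximation. Returns the slope and its standard error."""
    w = [1.0 / v for v in var]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * gi for wi, gi in zip(w, g))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * gi for wi, xi, gi in zip(w, x, g))
    det = sw * swxx - swx ** 2
    slope = (sw * swxy - swx * swy) / det
    se_slope = (sw / det) ** 0.5
    return slope, se_slope

# Invented data: effects rising ~0.02 in g per additional session
sessions = [1, 2, 4, 8, 16]
effects = [0.22, 0.24, 0.28, 0.36, 0.52]
slope, se_slope = meta_regression(effects, [0.04] * 5, sessions)
```

The reported slopes (for example, Hedges’ g of 0.02 per session) are of exactly this kind: a per-unit change in effect size, tested against a slope of zero.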

Figure 8

a , Meta-regression analysis examining the association between the number of sessions applied and the effect size in adults, either on overall health benefits (left, 469 in part dependent effect sizes from 85 studies and 103 cohorts) or for physical (middle, 245 in part dependent effect sizes from 69 studies and 83 cohorts) or mental benefits (right, 224 in part dependent effect sizes from 60 studies and 74 cohorts) separately. b , The same as a for newborns (overall: 150 in part dependent effect sizes from 46 studies and 53 cohorts; physical health: 127 in part dependent effect sizes from 44 studies and 51 cohorts; mental health: 21 in part dependent effect sizes from 11 studies and 12 cohorts). c , d the same as a ( c ) and b ( d ) but for the duration of the individual sessions. For adults, 449 in part dependent effect sizes across 80 studies and 96 cohorts were included in the overall analysis. The analysis of physical health benefits included 240 in part dependent effect sizes across 67 studies and 80 cohorts, and the analysis of mental health benefits included 209 in part dependent effect sizes from 56 studies and 69 cohorts. For newborns, 145 in part dependent effect sizes across 45 studies and 52 cohorts were included in the overall analysis. The analysis of physical health benefits included 122 in part dependent effect sizes across 43 studies and 50 cohorts, and the analysis of mental health benefits included 21 in part dependent effect sizes from 11 studies and 12 cohorts. Each dot represents an effect size. Its size indicates the precision of the study (larger indicates better). Overall effects of moderator impact were assessed via an F test (two-sided test). The P values in each panel represent the result of a regression analysis testing the hypothesis that the slope of the relationship is equal to zero. P values are not corrected for multiple testing. The shaded area around the regression line represents the 95% CI.

Demographic influences of sex and age

We used the ratio between women and men in the single-study samples as a proxy for sex-specific effects. Sex ratios were heavily skewed towards larger numbers of women in each cohort (median 83% women), and we found no significant associations between sex ratio and overall ( t (62) = 0.08, P  = 0.935, Hedges’ g  = 0.00, 95% CI −0.00 to 0.01), mental ( t (43) = 0.55, P  = 0.588, Hedges’ g  = 0.00, 95% CI −0.00 to 0.01) or physical health benefits ( t (51) = 0.15, P  = 0.882, Hedges’ g  = −0.00, 95% CI −0.01 to 0.01). For specific outcomes that could be further analysed, we found a significant positive association of sex ratio with reductions in cortisol secretion ( t (18) = 2.31, P  = 0.033, Hedges’ g  = 0.01, 95% CI 0.00 to 0.01), suggesting stronger benefits in women. In contrast to adults, sex ratios were balanced in samples of newborns (median 53% girls). For newborns, there was no significant association with overall ( t (36) = 0.77, P  = 0.447, Hedges’ g  = −0.01, 95% CI −0.02 to 0.01) or physical health benefits of touch ( t (35) = 0.93, P  = 0.359, Hedges’ g  = −0.01, 95% CI −0.02 to 0.01). Mental health benefits did not provide sufficient data for further analysis.

The median age in the adult meta-analysis was 42.6 years (s.d. 21.16 years, range 4.5–88.4 years). There was no association between age and the overall ( t (73) = 0.35, P  = 0.727, Hedges’ g = 0.00, 95% CI −0.01 to 0.01), mental ( t (53) = 0.94, P  = 0.353, Hedges’ g  = 0.01, 95% CI −0.01 to 0.02) and physical health benefits of touch ( t (60) = 0.16, P  = 0.870, Hedges’ g  = 0.00, 95% CI −0.01 to 0.01). Looking at specific health outcomes, we found significant positive associations between mean age and improved positive affect ( t (10) = 2.54, P  = 0.030, Hedges’ g  = 0.01, 95% CI 0.00 to 0.02) as well as systolic blood pressure ( t (11) = 2.39, P  = 0.036, Hedges’ g  = 0.02, 95% CI 0.00 to 0.04).

A list of touched body parts can be found in Supplementary Table 1 . For the touched body part, we found significantly higher health benefits for head touch compared with arm touch ( t (40) = 2.14, P  = 0.039, Hedges’ g difference of 0.78, 95% CI 0.07 to 1.49) and torso touch ( t (40) = 2.23, P  = 0.031, Hedges’ g difference of 0.84, 95% CI 0.10 to 1.58; Supplementary Fig. 17 ). Touching the arm resulted in lower mental health benefits than physical health benefits ( t (37) = 2.29, P  = 0.028, Hedges’ g difference of −0.35, 95% CI −0.65 to −0.05). Furthermore, we found a significantly increased physical health benefit when the head was touched as opposed to the torso ( t (37) = 2.10, P  = 0.043, Hedges’ g difference of 0.96, 95% CI 0.06 to 1.86). Thus, head touch such as a face or scalp massage could be especially beneficial.

Directionality

In adults, we tested whether a uni- or bidirectional application of touch mattered. The large majority of touch was applied unidirectionally ( k  = 442 of 469 effects). Unidirectional touch had higher health benefits ( t (101) = 2.17, P  = 0.032, Hedges’ g difference of 0.30, 95% CI 0.03 to 0.58) than bidirectional touch. Specifically, mental health benefits were higher in unidirectional touch ( t (99) = 2.33, P  = 0.022, Hedges’ g difference of 0.46, 95% CI 0.06 to 0.66).

Study location

For adults, we found significantly stronger health benefits of touch in South American compared with North American cohorts ( t (95) = 2.03, P  = 0.046, Hedges’ g difference of 0.37, 95% CI 0.01 to 0.73) and European cohorts ( t (95) = 2.22, P  = 0.029, Hedges’ g difference of 0.36, 95% CI 0.04 to 0.68). For newborns, we found weaker effects in North American cohorts compared to Asian ( t (55) = 2.28, P  = 0.026, Hedges’ g difference of −0.37, 95% CI −0.69 to −0.05) and European cohorts ( t (55) = 2.36, P  = 0.022, Hedges’ g difference of −0.40, 95% CI −0.74 to −0.06). Investigating the interaction with mental and physical health benefits did not reveal any effects of study location in both meta-analyses (Supplementary Fig. 18 ).

Systematic review of studies without effect sizes

All studies where effect size data could not be obtained or that did not meet the meta-analysis inclusion criteria can be found on the OSF project 12 in the file ‘Study_lists_final_revised.xlsx’ (sheet ‘Studies_without_effect_sizes’). Specific reasons for exclusion are furthermore documented in Supplementary Table 2 . For human health outcomes assessed across 56 studies and n  = 2,438 individuals, interventions mostly comprised massage therapy ( k  = 86 health outcomes) and kangaroo care ( k  = 33 health outcomes). For datasets where no effect size could be computed, 90.0% of mental health and 84.3% of physical health parameters were positively impacted by touch. The positive impact of touch did not differ between types of touch intervention. These results accord well with the meta-analytic finding of a highly positive overall benefit of touch, irrespective of whether a massage or any other intervention is applied.

We also assessed health outcomes in animals across 19 studies and n  = 911 subjects. Most research was conducted in rodents. Animals that received touch were rats (ten studies, k  = 16 health outcomes), mice (four studies, k  = 7 health outcomes), macaques (two studies, k  = 3 health outcomes), cats (one study, k  = 3 health outcomes), lambs (one study, k  = 2 health outcomes) and coral reef fish (one study, k  = 1 health outcome). Touch interventions mostly comprised stroking ( k  = 13 health outcomes) and tickling ( k  = 10 health outcomes). For animal studies, 71.4% of effects showed benefits to mental health-like parameters and 81.8% showed positive physical health effects. We thus found strong evidence that touch interventions, which were mostly conducted by humans (16 studies with human touch versus 3 studies with object touch), had positive health effects in animal species as well.

The key aim of the present study was twofold: (1) to provide an estimate of the effect size of touch interventions and (2) to disambiguate moderating factors to potentially tailor future interventions more precisely. Overall, touch interventions were beneficial for both physical and mental health, with a medium effect size. Our work illustrates that touch interventions are best suited for reducing pain, depression and anxiety in adults and children as well as for increasing weight gain in newborns. These findings are in line with previous meta-analyses on this topic, supporting their conclusions and their robustness to the addition of more datasets. One limitation of previous meta-analyses is that they focused on specific health outcomes or populations, despite primary studies often reporting effects on multiple health parameters simultaneously (for example, ref. 13 focusing on neck and shoulder pain and ref. 14 focusing on massage therapy in preterms). To our knowledge, only ref. 5 provides a multivariate picture for a large number of dependent variables. However, that study analysed its data in separate random-effects models that accounted for neither multivariate reporting nor the multilevel structure of the data, as such approaches have only become available recently. Thus, in addition to adding a substantial amount of new data, our statistical approach provides a more accurate depiction of effect size estimates. Additionally, our study investigated a variety of moderating effects that did not reach significance (for example, sex ratio, mean age or intervention duration) or were not previously considered (for example, the benefits of robot or object touch) in relation to touch intervention efficacy 5 , probably because of the small number of studies with information on these moderators in the past. Owing to our large-scale approach, we reached high statistical power for many moderator analyses.
Finally, previous meta-analyses on this topic exclusively focused on massage therapy in adults or kangaroo care in newborns 15 , leaving out a large number of interventions that are being carried out in research as well as in everyday life to improve well-being. Incorporating these studies into our study, we found that, in general, both massages and other types of touch, such as gentle touch, stroking or kangaroo care, showed similar health benefits.

While it seems to be less critical which touch intervention is applied, the frequency of interventions seems to matter. More sessions were positively associated with the improvement of trait outcomes such as depression and anxiety but also with pain reductions in adults. In contrast to session number, increasing the duration of individual sessions did not improve health effects. In fact, we found some indications of negative relationships in adults for cortisol and heart rate parameters. This could be due to habituating effects of touch on the sympathetic nervous system and hypothalamic–pituitary–adrenal axis, ultimately resulting in diminished effects with longer exposure, or decreased pleasantness ratings of affective touch with increasing duration 16 . For newborns, we could not support previous notions that the duration of the touch intervention is linked to benefits in weight gain 17 . Thus, an ideal intervention protocol does not seem to have to be excessively long. It should be noted that very few interventions lasted less than 5 min, and it therefore remains unclear whether very short interventions have the same effect.

A critical issue highlighted in the pandemic was the lack of touch due to social restrictions 18 . To accommodate the need for touch in individuals with small social networks (for example, institutionalized or isolated individuals), touch interventions using objects/robots have been explored in the past (for a review, see ref. 11 ). We show here that touch interactions outside of the human–human domain are beneficial for mental and physical health outcomes. Importantly, object/robot touch was not as effective in improving mental health as human-applied touch. A sub-analysis of missing skin-to-skin contact among humans indicated that mental health effects of touch might be mediated by the presence of skin-to-skin contact. Thus, it seems profitable to include skin-to-skin contact in future touch interventions, in line with previous findings in newborns 19 . In robots, recent advancements in synthetic skin 20 should be investigated further in this regard. It should be noted that, although we did not observe significant differences in physical health benefits between human–human and human–object touch, the variability of effect sizes was higher in human–object touch. The conditions enabling object or robot interactions to improve well-being should therefore be explored in more detail in the future.

Touch was beneficial for both healthy and clinical cohorts. These data are critical as most previous meta-analytic research has focused on individuals diagnosed with clinical disorders (for example, ref. 6 ). For mental health outcomes, we found larger effects in clinical cohorts. A possible reason could relate to increased touch wanting 21 in patients. For example, loneliness often co-occurs with chronic illnesses 22 , which are linked to depressed mood and feelings of anxiety 23 . Touch can be used to counteract this negative development 24 , 25 . In adults and children, knowing the toucher did not influence health benefits. In contrast, familiarity affected overall health benefits in newborns, with parental touch being more beneficial than touch applied by medical staff. Previous studies have suggested that early skin-to-skin contact and exposure to maternal odour is critical for a newborn’s ability to adapt to a new environment 26 , supporting the notion that parental care is difficult to substitute in this time period.

With respect to age-related effects, our data further suggest that increasing age was associated with a higher benefit through touch for systolic blood pressure. These findings could potentially be attributed to higher basal blood pressure 27 with increasing age, allowing for a stronger modulation of this parameter. For sex differences, our study provides some evidence that there are differences between women and men with respect to health benefits of touch. Overall, research on sex differences in touch processing is relatively sparse (but see refs. 28 , 29 ). Our results suggest that buffering effects against physiological stress are stronger in women. This is in line with increased buffering effects of hugs in women compared with men 30 . The female-biased primary research in adults, however, begs for more research in men or non-binary individuals. Unfortunately, our study could not dive deeper into this topic as health benefits broken down by sex or gender were almost never provided. Recent research has demonstrated that sensory pleasantness is affected by sex and that this also interacts with the familiarity of the other person in the touching dyad 29 , 31 . In general, contextual factors such as sex and gender or the relationship of the touching dyad, differences in cultural background or internal states such as stress have been demonstrated to be highly influential in the perception of affective touch and are thus relevant to maximizing the pleasantness and ultimately the health benefits of touch interactions 32 , 33 , 34 . As a positive personal relationship within the touching dyad is paramount to induce positive health effects, future research applying robot touch to promote well-being should therefore not only explore synthetic skin options but also focus on improving robots as social agents that form a close relationship with the person receiving the touch 35 .

As part of the systematic review, we also assessed the effects of touch interventions in non-human animals. Mimicking the results of the meta-analysis in humans, beneficial effects of touch in animals were comparably strong for mental health-like and physical health outcomes. This may inform interventions to promote animal welfare in the context of animal experiments 36 , farming 37 and pets 38 . While most studies investigated effects in rodents, which are mostly used as laboratory animals, these results probably transfer to livestock and common pets as well. Indeed, touch was beneficial in lambs, fish and cats 39 , 40 , 41 . The positive impact of human touch in rodents also allows for future mechanistic studies in animal models to investigate how interventions such as tickling or stroking modulate hormonal and neuronal responses to touch in the brain. Furthermore, the commonly proposed oxytocin hypothesis can be causally investigated in these animal models through, for example, optogenetic or chemogenetic techniques 42 . We believe that such translational approaches will further help in optimizing future interventions in humans by uncovering the underlying mechanisms and brain circuits involved in touch.

Our results offer many promising avenues to improve future touch interventions, but they also need to be discussed in light of their limitations. While the majority of findings showed robust health benefits of touch interventions across moderators when compared with a null effect, post hoc tests of, for example, familiarity effects in newborns or of differences in mental health benefits between human and object touch only barely reached significance. Since we computed a large number of statistical tests in the present study, there is a risk that these results are false positives. We hope that researchers in this field will be stimulated by these intriguing results to target these questions with well-powered, controlled experimental designs in primary research. Furthermore, the presence of small-study bias in both meta-analyses indicates that the effect size estimates presented here might be overestimated, as null results often remain unpublished. We want to stress, however, that this bias is probably reduced by the multivariate reporting of primary studies. Most studies that reported on multiple health outcomes showed significant findings for only one or two among many. Thus, the multivariate nature of primary research in this field allowed us to include many non-significant findings in the present study. Another limitation pertains to the fact that we only included articles in languages mostly spoken in Western countries. As a large body of evidence comes from Asian countries, primary research may have been published in languages other than those specified in the inclusion criteria. Thus, despite the large and inclusive nature of our study, some studies may nonetheless have been missed. Another factor that could not be accounted for in our meta-analysis is that an important prerequisite for touch to be beneficial is its perceived pleasantness.
The level of pleasantness associated with being touched is modulated by several parameters 34 , including cultural acceptability 43 , perceived humanness 44 and the need for touch 45 , which could explain the observed differences for certain moderators, such as human–human versus robot–human interaction. Moreover, the fact that secondary categorical moderators could not be investigated with respect to specific health outcomes, owing to the lack of data points, limits the specificity of our conclusions in this regard. It thus remains unclear whether, for example, a decreased mental health benefit in the absence of skin-to-skin contact is linked mostly to decreased anxiolytic effects, to changes in positive/negative affect or to something else. Since these health outcomes are, however, highly correlated 46 , such effects are probably driven by multiple health outcomes. Similarly, it is important to note that our conclusions mainly refer to outcomes measured close to the touch intervention, as we did not include long-term outcomes. Finally, it needs to be noted that blinding towards the experimental condition is essentially impossible in touch interventions. Although we compared the touch intervention with other interventions, such as relaxation therapy, as controls whenever possible, contributions of placebo effects cannot be ruled out.

In conclusion, we show clear evidence that touch interventions are beneficial across a large number of both physical and mental health outcomes, for both healthy and clinical cohorts, and for all ages. These benefits, while influenced in their magnitude by study cohorts and intervention characteristics, were robustly present, promoting the conclusion that touch interventions can be systematically employed across the population to preserve and improve our health.

Open science practices

All data and code are accessible in the corresponding OSF project 12 . The systematic review was registered on PROSPERO (CRD42022304281) before the start of data collection. We deviated from the pre-registered plan as follows:

Deviation 1: During our initial screening for the systematic review, we were confronted with a large number of potential health outcomes to look at. This observation of multivariate outcomes led us to register an amendment during data collection (but before any effect size or moderator screening). In doing so, we aimed to additionally extract meta-analytic effects for a more quantitative assessment of our review question that can account for multivariate data reporting and dependencies of effects within the same study. Furthermore, as we noted a severe lack of studies with respect to health outcomes for animals during the inclusion assessment for the systematic review, we decided that the meta-analysis would only focus on outcomes that could be meaningfully analysed on the meta-analytic level and therefore only included health outcomes of human participants.

Deviation 2: In the pre-registration, we did not explicitly exclude non-randomized trials. Since an explicit use of non-randomization for group allocation significantly increases the risk of bias, we decided to exclude them a posteriori from data analysis.

Deviation 3: In the pre-registration, we outlined a tertiary moderator level, namely benefits of touch application versus touch reception. This level was ignored since no included study specifically investigated the benefits of touch application by itself.

Deviation 4: In the pre-registration, we suggested using the RoBMA function 47 to provide a Bayesian framework that allows for a more accurate assessment of publication bias beyond small-study bias. Unfortunately, neither multilevel nor multivariate data structures are supported by the RoBMA function, to our knowledge. For this reason, we did not further pursue this analysis, as the hierarchical nature of the data would not be accounted for.

Deviation 5: Beyond the pre-registered inclusion and exclusion criteria, we also excluded dissertations owing to their lack of peer review.

Deviation 6: In the pre-registration, we stated that we would investigate the impact of the sex of the person applying the touch. This moderator was not analysed further, as this information was rarely given and the individuals applying the touch were almost exclusively women (7 males, 24 mixed and 85 females in studies on adults/children; 3 males, 17 mixed and 80 females in studies on newborns).

Deviation 7: The time span of the touch intervention as assessed by subtracting the final day of the intervention from the first day was not investigated further owing to its very high correlation with the number of sessions ( r (461) = 0.81 in the adult meta-analysis, r (145) = 0.84 in the newborn meta-analysis).

Inclusion and exclusion criteria

To be included in the systematic review, studies had to: investigate the relationship between at least one health outcome (physical and/or mental) in humans or animals and a touch intervention; include explicit physical touch by another human, animal or object as part of an intervention; and include an experimental and a control condition/group differentiated by touch alone. Of note, as a result of this selection process, no animal-to-animal touch intervention study was included, as none featured a proper no-touch control. Human touch was always explicit touch by a human (that is, no brushes or other tools), either with or without skin-to-skin contact. Regarding the included health outcomes, we aimed to be as broad as possible but excluded parameters such as neurophysiological responses or pleasantness ratings after touch application, as they do not reflect health outcomes. All studies included in the meta-analysis and systematic review 48 – 263 are listed in Supplementary Table 2 . All excluded studies are listed in Supplementary Table 3 , together with the reason for exclusion. We then applied a two-step process: first, we identified all potential health outcomes and extracted qualitative information on them (for example, direction of effect); second, we extracted quantitative information for all possible outcomes (for example, effect sizes). The meta-analysis additionally required a between-subjects design (to clearly distinguish touch from no-touch effects and owing to missing information about the correlation between repeated measurements 264 ). Studies that explicitly did not apply a randomized protocol were excluded before further analysis to reduce the risk of bias. The full lists of excluded and included studies can be found in the OSF project 12 in the file ‘Study_lists_final_revised.xlsx’. In terms of time frame, we conducted an open-start search of studies up to 2022 and identified studies conducted between 1965 and 2022.

Data collection

We used Google Scholar, PubMed and Web of Science for our literature search, with no limitations regarding publication date and using pre-specified search queries (see Supplementary Information for the exact keywords used). All procedures were in accordance with the updated Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines 265 . Articles were assessed in French, Dutch, German or English. The above databases were searched from 2 December 2021 until 1 October 2022. Two independent coders evaluated each paper against the inclusion and exclusion criteria. Inconsistencies between coders were checked and resolved by J.P. and H.H. Studies excluded/included for the review and meta-analysis can be found in the OSF project.

Search queries

We used the following keywords to search the chosen databases. Each agent type (human versus animal versus object versus robot) and each outcome category (physical versus mental) was searched separately, in combination with the touch keywords.

TOUCH: Touch OR Social OR Affective OR Contact OR Tactile interaction OR Hug OR Massage OR Embrace OR Kiss OR Cradling OR Stroking OR Haptic interaction OR tickling

AGENT: Object OR Robot OR human OR animal OR rodent OR primate

MENTAL OUTCOME: Health OR mood OR Depression OR Loneliness OR happiness OR life satisfaction OR Mental Disorder OR well-being OR welfare OR dementia OR psychological OR psychiatric OR anxiety OR Distress

PHYSICAL OUTCOME: Health OR Stress OR Pain OR cardiovascular health OR infection risk OR immune response OR blood pressure OR heart rate
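The pairing of keyword groups described above can be sketched as follows. This is an illustration only: the exact boolean syntax accepted by Google Scholar, PubMed and Web of Science differs, and the function names here are not from the paper.

```python
# Illustrative sketch: combine the keyword groups above into one query
# per agent x outcome pairing. Database-specific syntax is not modelled.
TOUCH = ["Touch", "Social", "Affective", "Contact", "Tactile interaction",
         "Hug", "Massage", "Embrace", "Kiss", "Cradling", "Stroking",
         "Haptic interaction", "tickling"]
AGENT = ["Object", "Robot", "human", "animal", "rodent", "primate"]
MENTAL = ["Health", "mood", "Depression", "Loneliness", "happiness",
          "life satisfaction", "Mental Disorder", "well-being", "welfare",
          "dementia", "psychological", "psychiatric", "anxiety", "Distress"]
# The PHYSICAL OUTCOME group is combined in exactly the same way.

def or_block(terms):
    """Join one keyword group into a parenthesized OR block."""
    return "(" + " OR ".join(terms) + ")"

def build_query(agent_terms, outcome_terms):
    """AND-combine the touch, agent and outcome groups."""
    return " AND ".join(or_block(g) for g in (TOUCH, agent_terms, outcome_terms))

query = build_query(AGENT, MENTAL)
```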

Data extraction and preparation

Data extraction began on 10 October 2022 and was concluded on 25 February 2023. J.P. and H.H. oversaw the data collection process, and checked and resolved all inconsistencies between coders.

Health benefits of touch were always coded by positive summary effects, whereas adverse health effects of touch were represented by negative summary effects. If multiple time points were measured for the same outcome on the same day after a single touch intervention, we extracted the peak effect size (in either the positive or negative direction). If the touch intervention occurred multiple times and health outcomes were assessed for each time point, we extracted data points separately. However, we only extracted immediate effects, as long-term effects not controlled through the experimental conditions could be due to influences other than the initial touch intervention. Measurements assessing long-term effects without explicit touch sessions in the breaks were excluded for the same reason. Common control groups for touch interventions comprised active (for example, relaxation therapy) as well as passive control groups (for example, standard medical care). In the case of multiple control groups, we always contrasted the touch group to the group that most closely matched the touch condition (for example, relaxation therapy was preferred over standard medical care). We extracted information from all moderators listed in the pre-registration (Supplementary Table 4 ). A list of included and excluded health outcomes is presented in Supplementary Table 5 . Authors of studies with possible effects but missing information to calculate those effects were contacted via email and asked to provide the missing data (response rate 35.7%).
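The peak-effect rule for repeated same-day measurements can be expressed compactly. A minimal sketch (Python for illustration; the extraction itself was performed manually by the coders):

```python
def peak_effect(same_day_effects):
    """Return the effect with the largest magnitude, in either the
    positive or negative direction, as extracted when the same outcome
    was measured at multiple time points on the same day after a
    single touch intervention."""
    return max(same_day_effects, key=abs)

peak_effect([0.2, -0.9, 0.5])  # -> -0.9 (strongest effect, sign preserved)
```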

After finalizing the list of included studies for the systematic review, we added columns for moderators and the coding schema for our meta-analysis, per our updated registration. Each study was then assessed for its eligibility for the meta-analysis by two independent coders (J.P., H.H., K.F. or F.M.). To this end, all coders followed an a priori specified procedure: first, the PDF was skimmed for possible effects to extract, and the study was excluded if no PDF was available or if the study was in a language other than those specified in ‘ Data collection ’. Effects were extracted from all eligible studies listing descriptive values or statistical parameters from which effect sizes could be calculated. A website 266 was used to convert the descriptive and statistical values available in the included studies (means and standard deviations/standard errors/confidence intervals, sample sizes, F values, t values, t test P values or frequencies) into Cohen’s d , which was then converted into Hedges’ g . If only P value thresholds were reported (for example, P  < 0.01), we used this most conservative value as the P value to calculate the effect size (for example, P  = 0.01). If only the total sample size was given, but that number was even and the participants had been randomly assigned to the two groups, we assumed equal sample sizes per group. If delta change scores (for example, pre- to post-touch intervention) were reported, we used those over post-touch-only scores. When frequency tables were used to determine effect sizes and a cell frequency was 0, we substituted a value of 0.5 to calculate the effect (the default setting in the ‘metafor’ package 267 ). From these data, Hedges’ g and its variance could be derived. Effect sizes were always computed between the experimental and the control group.
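For the most common case, conversion from group descriptives to Hedges’ g and its sampling variance follows textbook formulas. A minimal Python sketch of these standard formulas (the study itself used an online calculator 266 and R; function names here are illustrative):

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d from group means and SDs, using the pooled SD."""
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sp

def hedges_g(d, n1, n2):
    """Small-sample correction factor J applied to d gives Hedges' g."""
    j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)
    return j * d

def g_variance(g, n1, n2):
    """Approximate sampling variance of Hedges' g."""
    return (n1 + n2) / (n1 * n2) + g**2 / (2 * (n1 + n2))

# Example: touch group mean 10 (SD 2, n 20) versus control mean 8 (SD 2, n 20)
d = cohens_d(10, 2, 20, 8, 2, 20)  # pooled SD is 2, so d = 1.0
g = hedges_g(d, 20, 20)            # slightly shrunk towards zero
```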

Statistical analysis and risk of bias assessment

Owing to the lack of identified studies, health benefits to animals were not included as part of the statistical analysis. One meta-analysis was performed for adults, adolescents and children, as outcomes were highly comparable. We refer to this meta-analysis as the adult meta-analysis, as children/adolescent cohorts were only targeted in a minority of studies. A separate meta-analysis was performed for newborns, as their health outcomes differed substantially from any other age group.

Data were analysed using R (version 4.2.2) with the ‘rma.mv’ function from the ‘metafor’ package 267 in a multistep, multivariate and multilevel fashion.

We calculated an overall effect of touch interventions across all studies, cohorts and health outcomes. To account for the hierarchical structure of the data, we used a multilevel structure with random effects at the study, cohort and effects level. Furthermore, we calculated the variance–covariance matrix of all data points to account for the dependencies of measured effects within each individual cohort and study. The variance–covariance matrix was calculated by default with an assumed correlation of effect sizes within each cohort of ρ  = 0.6. As ρ needed to be assumed, sensitivity analyses for all computed effect estimates were conducted using correlations between effects of 0, 0.2, 0.4 and 0.8. The results of these sensitivity analyses can be found in ref. 12 . No conclusion drawn in the present manuscript was altered by changing the level of ρ . The sensitivity analyses, however, showed that higher assumed correlations lead to more conservative effect size estimates (see Supplementary Figs. 19 and 20 for the adult and newborn meta-analyses, respectively), reducing the type I error risk in general 268 . In addition to these procedures, we used robust variance estimation with cluster-robust inference at the cohort level. This step is recommended to more accurately determine the confidence intervals in complex multivariate models 269 . The data distribution was assumed to be normal, but this was not formally tested.
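The assumed within-cohort covariance structure (a common correlation ρ between effect sizes of the same cohort) can be illustrated as follows. This is a Python sketch of the compound-symmetry assumption only; the actual matrices were built for ‘rma.mv’ in R:

```python
import math

def cohort_vcov(variances, rho=0.6):
    """Variance-covariance matrix for the effects within one cohort,
    assuming a common correlation rho between effect sizes (0.6 was
    the paper's default; sensitivity analyses varied rho)."""
    k = len(variances)
    return [[variances[i] if i == j
             else rho * math.sqrt(variances[i] * variances[j])
             for j in range(k)]
            for i in range(k)]

# Two effects from one cohort with sampling variances 0.04 and 0.09:
V = cohort_vcov([0.04, 0.09], rho=0.6)
# off-diagonal = 0.6 * sqrt(0.04 * 0.09) = 0.036
```

The full matrix for the model is block diagonal, with one such block per cohort; effects from different cohorts are assumed uncorrelated.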

To determine whether individual effects had a strong influence on our results, we calculated Cook’s distance D . Here, a threshold of D  > 0.5 was used to qualify a study as influential 270 . Heterogeneity in the present study was assessed using Cochran’s Q , which determines whether the extracted effect sizes estimate a common population effect size. Although the Q statistic in the ‘rma.mv’ function accounts for the hierarchical nature of the data, we also quantified the heterogeneity estimator σ ² for each random-effects level to provide a comprehensive overview of heterogeneity indicators. These indicators for all models can be found on the OSF project 12 in the table ‘Model estimates’. To assess small-study bias, we visually inspected the funnel plot and used the standard error as a moderator in the overarching meta-analyses.
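Using the standard error as a moderator amounts to an Egger-type regression test: effects are regressed on their standard errors with inverse-variance weights, and a slope clearly different from zero suggests small-study bias. A simplified sketch that ignores the multilevel structure the paper's models account for (Python, illustrative only):

```python
def egger_regression(effects, ses):
    """Egger-type small-study test: weighted least-squares regression of
    effect size on its standard error, with weights 1/se^2. Simplified:
    the paper fitted the standard error as a moderator within the
    multilevel 'rma.mv' model instead."""
    w = [1 / se**2 for se in ses]
    sw = sum(w)
    mx = sum(wi * se for wi, se in zip(w, ses)) / sw
    my = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxx = sum(wi * (se - mx)**2 for wi, se in zip(w, ses))
    sxy = sum(wi * (se - mx) * (y - my) for wi, se, y in zip(w, ses, effects))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope
```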

Before any subgroup analysis, the overall effect size was used as input for power calculations. While such post hoc power calculations might be limited, we believe that a minimum number of effects was necessary for inclusion in subgroup analyses to allow for meaningful conclusions. Medium effect sizes of this magnitude would probably also be the minimum effect sizes of interest for researchers and clinical practitioners. Power calculation for random-effects models further requires a sample size for each individual effect as well as an approximation of the expected heterogeneity between studies. For the sample size input, we used the median sample size across our studies. For heterogeneity, we assumed a value between medium and high levels of heterogeneity ( I ² = 62.5% 271 ), as moderator analyses typically aim at reducing heterogeneity overall. Subgroups were only investigated further if the number of observed effects achieved ~80% power under these assumptions, to allow for a more robust interpretation of the observed effects (see Supplementary Figs. 5 and 6 for the adult and newborn meta-analyses, respectively). In a next step, we investigated all pre-registered moderators for which sufficient power was detected. We first examined our primary moderators (mental versus physical health) and how effect sizes varied systematically as a function of our secondary moderators (for example, human–human or human–object touch, duration, presence of skin-to-skin contact). We always included random slopes to allow our moderators to vary with the random effects at our clustering variable, as recommended for multilevel models to reduce false positives 272 . All statistical tests were two-sided. Significance of moderators was determined using omnibus F tests. Effect size differences between moderator levels and their confidence intervals were assessed via t tests.
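This style of power calculation can be approximated with the standard random-effects formula: within-study variance from the typical per-group sample size, heterogeneity τ² implied by the assumed I², and the summary-effect standard error shrinking with the number of effects k. A hedged Python sketch (the paper's exact inputs and tooling may differ):

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def meta_power(delta, n_per_group, k, i2=0.625, alpha_z=1.959963984540054):
    """Approximate power to detect a summary effect delta (in d units)
    in a random-effects meta-analysis of k effects, two-sided alpha 0.05.
    Assumptions: equal group sizes, tau^2 implied by the assumed I^2."""
    v = 2 / n_per_group + delta**2 / (4 * n_per_group)  # typical variance of d
    tau2 = v * i2 / (1 - i2)                            # from I^2 = tau2/(tau2+v)
    se = math.sqrt((v + tau2) / k)                      # SE of the summary effect
    z = delta / se
    return 1 - norm_cdf(alpha_z - z) + norm_cdf(-alpha_z - z)
```

With this approximation, power rises with the number of available effects, which is what motivates a minimum-k threshold for subgroup analyses.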

Post hoc t tests were performed comparing mental and physical health benefits within each interacting moderator (for example, mental versus physical health benefits in cancer patients) and mental or physical health benefits across levels of the interacting moderator (for example, mental health benefits in cancer versus pain patients). The post hoc tests were not pre-registered. Data were visualized using forest plots and orchard plots 273 for categorical moderators and scatter plots for continuous moderators.

For a broad overview of prior work and their biases, risk of bias was assessed for all studies included in both meta-analyses and the systematic review. We assessed the risk of bias for the following parameters:

Bias from randomization, including whether a randomization procedure was performed, whether it was a between- or within-participant design and whether there were any baseline differences for demographic or dependent variables.

Sequence bias resulting from a lack of counterbalancing in within-subject designs.

Performance bias resulting from the participants or experiments not being blinded to the experimental conditions.

Attrition bias resulting from different dropout rates between experimental groups.

Note that four studies in the adult meta-analysis did not explicitly mention randomization as part of their protocol. However, since these studies showed no baseline differences in any relevant variables (see the ‘Risk of Bias’ table on the OSF project), we assumed that randomization was performed but not reported. Sequence bias was of no concern for studies in the meta-analysis, since cross-over designs were excluded; it was, however, assessed for studies within the scope of the systematic review. Importantly, performance bias was always high in the adult/children meta-analysis, as blinding of the participants and experimenters to the experimental conditions was not possible owing to the nature of the intervention (touch versus no touch). For studies with newborns and animals, we assessed the performance bias as medium, since neither newborns nor animals are likely to be aware of being part of an experiment or a specific group. An overview of the results is presented in Supplementary Fig. 21 , and the precise assessment for each study can be found on the OSF project 12 in the ‘Risk of Bias’ table.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All data are available via Open Science Framework at https://doi.org/10.17605/OSF.IO/C8RVW (ref. 12 ). Source data are provided with this paper.

Code availability

All code is available via Open Science Framework at https://doi.org/10.17605/OSF.IO/C8RVW (ref. 12 ).

Fulkerson, M. The First Sense: a Philosophical Study of Human Touch (MIT Press, 2013).

Farroni, T., Della Longa, L. & Valori, I. The self-regulatory affective touch: a speculative framework for the development of executive functioning. Curr. Opin. Behav. Sci. 43 , 167–173 (2022).

Ocklenburg, S. et al. Hugs and kisses—the role of motor preferences and emotional lateralization for hemispheric asymmetries in human social touch. Neurosci. Biobehav. Rev. 95 , 353–360 (2018).

Ardiel, E. L. & Rankin, C. H. The importance of touch in development. Paediatr. Child Health 15 , 153–156 (2010).

Moyer, C. A., Rounds, J. & Hannum, J. W. A meta-analysis of massage therapy research. Psychol. Bull. 130 , 3–18 (2004).

Lee, S. H., Kim, J. Y., Yeo, S., Kim, S. H. & Lim, S. Meta-analysis of massage therapy on cancer pain. Integr. Cancer Ther. 14 , 297–304 (2015).

LaFollette, M. R., O’Haire, M. E., Cloutier, S. & Gaskill, B. N. A happier rat pack: the impacts of tickling pet store rats on human–animal interactions and rat welfare. Appl. Anim. Behav. Sci. 203 , 92–102 (2018).

Packheiser, J., Michon, F., Eva, C., Fredriksen, K. & Hartmann, H. The physical and mental health benefits of social touch: a comparative systematic review and meta-analysis. PROSPERO https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42022304281 (2023).

Lakens, D. Sample size justification. Collabra. Psychol. 8 , 33267 (2022).

Quintana, D. S. A guide for calculating study-level statistical power for meta-analyses. Adv. Meth. Pract. Psychol. Sci. https://doi.org/10.1177/25152459221147260 (2023).

Eckstein, M., Mamaev, I., Ditzen, B. & Sailer, U. Calming effects of touch in human, animal, and robotic interaction—scientific state-of-the-art and technical advances. Front. Psychiatry 11 , 555058 (2020).

Packheiser, J. et al. The physical and mental health benefits of affective touch: a comparative systematic review and multivariate meta-analysis. Open Science Framework https://doi.org/10.17605/OSF.IO/C8RVW (2023).

Kong, L. J. et al. Massage therapy for neck and shoulder pain: a systematic review and meta-analysis. Evid. Based Complement. Altern. Med. 2013 , 613279 (2013).

Wang, L., He, J. L. & Zhang, X. H. The efficacy of massage on preterm infants: a meta-analysis. Am. J. Perinatol. 30 , 731–738 (2013).

Field, T. Massage therapy research review. Complement. Ther. Clin. Pract. 24 , 19–31 (2016).

Bendas, J., Ree, A., Pabel, L., Sailer, U. & Croy, I. Dynamics of affective habituation to touch differ on the group and individual level. Neuroscience 464 , 44–52 (2021).

Charpak, N., Montealegre‐Pomar, A. & Bohorquez, A. Systematic review and meta‐analysis suggest that the duration of Kangaroo mother care has a direct impact on neonatal growth. Acta Paediatr. 110 , 45–59 (2021).

Packheiser, J. et al. A comparison of hugging frequency and its association with momentary mood before and during COVID-19 using ecological momentary assessment. Health Commun. https://doi.org/10.1080/10410236.2023.2198058 (2023).

Whitelaw, A., Heisterkamp, G., Sleath, K., Acolet, D. & Richards, M. Skin to skin contact for very low birthweight infants and their mothers. Arch. Dis. Child. 63 , 1377–1381 (1988).

Yogeswaran, N. et al. New materials and advances in making electronic skin for interactive robots. Adv. Robot. 29 , 1359–1373 (2015).

Durkin, J., Jackson, D. & Usher, K. Touch in times of COVID‐19: touch hunger hurts. J. Clin. Nurs. https://doi.org/10.1111/jocn.15488 (2021).

Rokach, A., Lechcier-Kimel, R. & Safarov, A. Loneliness of people with physical disabilities. Soc. Behav. Personal. Int. J. 34 , 681–700 (2006).

Palgi, Y. et al. The loneliness pandemic: loneliness and other concomitants of depression, anxiety and their comorbidity during the COVID-19 outbreak. J. Affect. Disord. 275 , 109–111 (2020).

Heatley-Tejada, A., Dunbar, R. I. M. & Montero, M. Physical contact and loneliness: being touched reduces perceptions of loneliness. Adapt. Hum. Behav. Physiol. 6 , 292–306 (2020).

Packheiser, J. et al. The association of embracing with daily mood and general life satisfaction: an ecological momentary assessment study. J. Nonverbal Behav. 46 , 519–536 (2022).

Porter, R. The biological significance of skin-to-skin contact and maternal odours. Acta Paediatr. 93 , 1560–1562 (2007).

Hawkley, L. C., Masi, C. M., Berry, J. D. & Cacioppo, J. T. Loneliness is a unique predictor of age-related differences in systolic blood pressure. Psychol. Aging 21 , 152–164 (2006).

Russo, V., Ottaviani, C. & Spitoni, G. F. Affective touch: a meta-analysis on sex differences. Neurosci. Biobehav. Rev. 108 , 445–452 (2020).

Schirmer, A. et al. Understanding sex differences in affective touch: sensory pleasantness, social comfort, and precursive experiences. Physiol. Behav. 250 , 113797 (2022).

Berretz, G. et al. Romantic partner embraces reduce cortisol release after acute stress induction in women but not in men. PLoS ONE 17 , e0266887 (2022).

Gazzola, V. et al. Primary somatosensory cortex discriminates affective significance in social touch. Proc. Natl Acad. Sci. USA 109 , E1657–E1666 (2012).

Sorokowska, A. et al. Affective interpersonal touch in close relationships: a cross-cultural perspective. Personal. Soc. Psychol. Bull. 47 , 1705–1721 (2021).

Ravaja, N., Harjunen, V., Ahmed, I., Jacucci, G. & Spapé, M. M. Feeling touched: emotional modulation of somatosensory potentials to interpersonal touch. Sci. Rep. 7 , 40504 (2017).

Saarinen, A., Harjunen, V., Jasinskaja-Lahti, I., Jääskeläinen, I. P. & Ravaja, N. Social touch experience in different contexts: a review. Neurosci. Biobehav. Rev. 131 , 360–372 (2021).

Huisman, G. Social touch technology: a survey of haptic technology for social touch. IEEE Trans. Haptics 10 , 391–408 (2017).

Lewejohann, L., Schwabe, K., Häger, C. & Jirkof, P. Impulse for animal welfare outside the experiment. Lab. Anim. https://doi.org/10.17169/REFUBIUM-26765 (2020).

Sørensen, J. T., Sandøe, P. & Halberg, N. Animal welfare as one among several values to be considered at farm level: the idea of an ethical account for livestock farming. Acta Agric. Scand. A 51 , 11–16 (2001).

Verga, M. & Michelazzi, M. Companion animal welfare and possible implications on the human–pet relationship. Ital. J. Anim. Sci. 8 , 231–240 (2009).

Coulon, M. et al. Do lambs perceive regular human stroking as pleasant? Behavior and heart rate variability analyses. PLoS ONE 10 , e0118617 (2015).

Soares, M. C., Oliveira, R. F., Ros, A. F. H., Grutter, A. S. & Bshary, R. Tactile stimulation lowers stress in fish. Nat. Commun. 2 , 534 (2011).

Gourkow, N., Hamon, S. C. & Phillips, C. J. C. Effect of gentle stroking and vocalization on behaviour, mucosal immunity and upper respiratory disease in anxious shelter cats. Prev. Vet. Med. 117 , 266–275 (2014).

Oliveira, V. E. et al. Oxytocin and vasopressin within the ventral and dorsal lateral septum modulate aggression in female rats. Nat. Commun. 12 , 2900 (2021).

Burleson, M. H., Roberts, N. A., Coon, D. W. & Soto, J. A. Perceived cultural acceptability and comfort with affectionate touch: differences between Mexican Americans and European Americans. J. Soc. Personal. Relatsh. 36 , 1000–1022 (2019).

Wijaya, M. et al. The human ‘feel’ of touch contributes to its perceived pleasantness. J. Exp. Psychol. Hum. Percept. Perform. 46 , 155–171 (2020).

Golaya, S. Touch-hunger: an unexplored consequence of the COVID-19 pandemic. Indian J. Psychol. Med. 43 , 362–363 (2021).

Ng, T. W. H., Sorensen, K. L., Zhang, Y. & Yim, F. H. K. Anger, anxiety, depression, and negative affect: convergent or divergent? J. Vocat. Behav. 110 , 186–202 (2019).

Maier, M., Bartoš, F. & Wagenmakers, E.-J. Robust Bayesian meta-analysis: addressing publication bias with model-averaging. Psychol. Methods 28 , 107–122 (2022).

Ahles, T. A. et al. Massage therapy for patients undergoing autologous bone marrow transplantation. J. Pain. Symptom Manag. 18 , 157–163 (1999).

Albert, N. M. et al. A randomized trial of massage therapy after heart surgery. Heart Lung 38 , 480–490 (2009).

Ang, J. Y. et al. A randomized placebo-controlled trial of massage therapy on the immune system of preterm infants. Pediatrics 130 , e1549–e1558 (2012).

Arditi, H., Feldman, R. & Eidelman, A. I. Effects of human contact and vagal regulation on pain reactivity and visual attention in newborns. Dev. Psychobiol. 48 , 561–573 (2006).

Arora, J., Kumar, A. & Ramji, S. Effect of oil massage on growth and neurobehavior in very low birth weight preterm neonates. Indian Pediatr. 42 , 1092–1100 (2005).

Asadollahi, M., Jabraeili, M., Mahallei, M., Asgari Jafarabadi, M. & Ebrahimi, S. Effects of gentle human touch and field massage on urine cortisol level in premature infants: a randomized, controlled clinical trial. J. Caring Sci. 5 , 187–194 (2016).

Basiri-Moghadam, M., Basiri-Moghadam, K., Kianmehr, M. & Jani, S. The effect of massage on neonatal jaundice in stable preterm newborn infants: a randomized controlled trial. J. Pak. Med. Assoc. 65 , 602–606 (2015).

Bauer, B. A. et al. Effect of massage therapy on pain, anxiety, and tension after cardiac surgery: a randomized study. Complement. Ther. Clin. Pract. 16 , 70–75 (2010).

Beijers, R., Cillessen, L. & Zijlmans, M. A. C. An experimental study on mother-infant skin-to-skin contact in full-terms. Infant Behav. Dev. 43 , 58–65 (2016).

Bennett, S. et al. Acute effects of traditional Thai massage on cortisol levels, arterial blood pressure and stress perception in academic stress condition: a single blind randomised controlled trial. J. Bodyw. Mov. Therapies 20 , 286–292 (2016).

Bergman, N., Linley, L. & Fawcus, S. Randomized controlled trial of skin-to-skin contact from birth versus conventional incubator for physiological stabilization in 1200- to 2199-gram newborns. Acta Paediatr. 93 , 779–785 (2004).

Bigelow, A., Power, M., MacLellan‐Peters, J., Alex, M. & McDonald, C. Effect of mother/infant skin‐to‐skin contact on postpartum depressive symptoms and maternal physiological stress. J. Obstet. Gynecol. Neonatal Nurs. 41 , 369–382 (2012).

Billhult, A., Bergbom, I. & Stener-Victorin, E. Massage relieves nausea in women with breast cancer who are undergoing chemotherapy. J. Altern. Complement. Med. 13 , 53–57 (2007).

Billhult, A., Lindholm, C., Gunnarsson, R. & Stener-Victorin, E. The effect of massage on cellular immunity, endocrine and psychological factors in women with breast cancer—a randomized controlled clinical trial. Auton. Neurosci. 140 , 88–95 (2008).

Braun, L. A. et al. Massage therapy for cardiac surgery patients—a randomized trial. J. Thorac. Cardiovasc. Surg. 144 , 1453–1459 (2012).

Cabibihan, J.-J. & Chauhan, S. S. Physiological responses to affective tele-touch during induced emotional stimuli. IEEE Trans. Affect. Comput. 8 , 108–118 (2017).

Campeau, M.-P. et al. Impact of massage therapy on anxiety levels in patients undergoing radiation therapy: randomized controlled trial. J. Soc. Integr. Oncol. 5 , 133–138 (2007).

Can, Ş. & Kaya, H. The effects of yakson or gentle human touch training given to mothers with preterm babies on attachment levels and the responses of the baby: a randomized controlled trial. Health Care Women Int. 43 , 479–498 (2021).

Carfoot, S., Williamson, P. & Dickson, R. A randomised controlled trial in the north of England examining the effects of skin-to-skin care on breast feeding. Midwifery 21 , 71–79 (2005).

Castral, T. C., Warnock, F., Leite, A. M., Haas, V. J. & Scochi, C. G. S. The effects of skin-to-skin contact during acute pain in preterm newborns. Eur. J. Pain. 12 , 464–471 (2008).

Cattaneo, A. et al. Kangaroo mother care for low birthweight infants: a randomized controlled trial in different settings. Acta Paediatr. 87 , 976–985 (1998).

Charpak, N., Ruiz-Peláez, J. G. & Charpak, Y. Rey-Martinez kangaroo mother program: an alternative way of caring for low birth weight infants? One year mortality in a two cohort study. Pediatrics 94 , 804–810 (1994).

Chermont, A. G., Falcão, L. F. M., de Souza Silva, E. H. L., de Cássia Xavier Balda, R. & Guinsburg, R. Skin-to-skin contact and/or oral 25% dextrose for procedural pain relief for term newborn infants. Pediatrics 124 , e1101–e1107 (2009).

Chi Luong, K., Long Nguyen, T., Huynh Thi, D. H., Carrara, H. P. O. & Bergman, N. J. Newly born low birthweight infants stabilise better in skin-to-skin contact than when separated from their mothers: a randomised controlled trial. Acta Paediatr. 105 , 381–390 (2016).

Cho, E.-S. et al. The effects of kangaroo care in the neonatal intensive care unit on the physiological functions of preterm infants, maternal–infant attachment, and maternal stress. J. Pediatr. Nurs. 31 , 430–438 (2016).

Choi, H. et al. The effects of massage therapy on physical growth and gastrointestinal function in premature infants: a pilot study. J. Child Health Care 20 , 394–404 (2016).

Choudhary, M. et al. To study the effect of Kangaroo mother care on pain response in preterm neonates and to determine the behavioral and physiological responses to painful stimuli in preterm neonates: a study from western Rajasthan. J. Matern. Fetal Neonatal Med. 29 , 826–831 (2016).

Christensson, K. et al. Temperature, metabolic adaptation and crying in healthy full-term newborns cared for skin-to-skin or in a cot. Acta Paediatr. 81 , 488–493 (1992).

Cloutier, S. & Newberry, R. C. Use of a conditioning technique to reduce stress associated with repeated intra-peritoneal injections in laboratory rats. Appl. Anim. Behav. Sci. 112 , 158–173 (2008).

Cloutier, S., Wahl, K., Baker, C. & Newberry, R. C. The social buffering effect of playful handling on responses to repeated intraperitoneal injections in laboratory rats. J. Am. Assoc. Lab. Anim. Sci. 53 , 168–173 (2014).

Cloutier, S., Wahl, K. L., Panksepp, J. & Newberry, R. C. Playful handling of laboratory rats is more beneficial when applied before than after routine injections. Appl. Anim. Behav. Sci. 164 , 81–90 (2015).

Cong, X. et al. Effects of skin-to-skin contact on autonomic pain responses in preterm infants. J. Pain. 13 , 636–645 (2012).

Cong, X., Ludington-Hoe, S. M., McCain, G. & Fu, P. Kangaroo care modifies preterm infant heart rate variability in response to heel stick pain: pilot study. Early Hum. Dev. 85 , 561–567 (2009).

Cong, X., Ludington-Hoe, S. M. & Walsh, S. Randomized crossover trial of kangaroo care to reduce biobehavioral pain responses in preterm infants: a pilot study. Biol. Res. Nurs. 13 , 204–216 (2011).

Costa, R. et al. Tactile stimulation of adult rats modulates hormonal responses, depression-like behaviors, and memory impairment induced by chronic mild stress: role of angiotensin II. Behav. Brain Res. 379 , 112250 (2020).

Cutshall, S. M. et al. Effect of massage therapy on pain, anxiety, and tension in cardiac surgical patients: a pilot study. Complement. Ther. Clin. Pract. 16 , 92–95 (2010).

Dalili, H., Sheikhi, S., Shariat, M. & Haghnazarian, E. Effects of baby massage on neonatal jaundice in healthy Iranian infants: a pilot study. Infant Behav. Dev. 42 , 22–26 (2016).

Diego, M. A., Field, T. & Hernandez-Reif, M. Vagal activity, gastric motility, and weight gain in massaged preterm neonates. J. Pediatr. 147 , 50–55 (2005).

Diego, M. A., Field, T. & Hernandez-Reif, M. Temperature increases in preterm infants during massage therapy. Infant Behav. Dev. 31 , 149–152 (2008).

Diego, M. A. et al. Preterm infant massage elicits consistent increases in vagal activity and gastric motility that are associated with greater weight gain. Acta Paediatr. 96 , 1588–1591 (2007).

Diego, M. A. et al. Spinal cord patients benefit from massage therapy. Int. J. Neurosci. 112 , 133–142 (2002).

Diego, M. A. et al. Aggressive adolescents benefit from massage therapy. Adolescence 37 , 597–607 (2002).

Diego, M. A. et al. HIV adolescents show improved immune function following massage therapy. Int. J. Neurosci. 106 , 35–45 (2001).

Dieter, J. N. I., Field, T., Hernandez-Reif, M., Emory, E. K. & Redzepi, M. Stable preterm infants gain more weight and sleep less after five days of massage therapy. J. Pediatr. Psychol. 28 , 403–411 (2003).

Ditzen, B. et al. Effects of different kinds of couple interaction on cortisol and heart rate responses to stress in women. Psychoneuroendocrinology 32 , 565–574 (2007).

Dreisoerner, A. et al. Self-soothing touch and being hugged reduce cortisol responses to stress: a randomized controlled trial on stress, physical touch, and social identity. Compr. Psychoneuroendocrinol. 8 , 100091 (2021).

Eaton, M., Mitchell-Bonair, I. L. & Friedmann, E. The effect of touch on nutritional intake of chronic organic brain syndrome patients. J. Gerontol. 41 , 611–616 (1986).

Edens, J. L., Larkin, K. T. & Abel, J. L. The effect of social support and physical touch on cardiovascular reactions to mental stress. J. Psychosom. Res. 36 , 371–382 (1992).

El-Farrash, R. A. et al. Longer duration of kangaroo care improves neurobehavioral performance and feeding in preterm infants: a randomized controlled trial. Pediatr. Res. 87 , 683–688 (2020).

Erlandsson, K., Dsilna, A., Fagerberg, I. & Christensson, K. Skin-to-skin care with the father after cesarean birth and its effect on newborn crying and prefeeding behavior. Birth 34 , 105–114 (2007).

Escalona, A., Field, T., Singer-Strunck, R., Cullen, C. & Hartshorn, K. Brief report: improvements in the behavior of children with autism following massage therapy. J. Autism Dev. Disord. 31 , 513–516 (2001).

Fattah, M. A. & Hamdy, B. Pulmonary functions of children with asthma improve following massage therapy. J. Altern. Complement. Med. 17 , 1065–1068 (2011).

Feldman, R. & Eidelman, A. I. Skin-to-skin contact (kangaroo care) accelerates autonomic and neurobehavioural maturation in preterm infants. Dev. Med. Child Neurol. 45 , 274–281 (2003).

Feldman, R., Eidelman, A. I., Sirota, L. & Weller, A. Comparison of skin-to-skin (kangaroo) and traditional care: parenting outcomes and preterm infant development. Pediatrics 110 , 16–26 (2002).

Feldman, R., Singer, M. & Zagoory, O. Touch attenuates infants’ physiological reactivity to stress. Dev. Sci. 13 , 271–278 (2010).

Feldman, R., Weller, A., Sirota, L. & Eidelman, A. I. Testing a family intervention hypothesis: the contribution of mother–infant skin-to-skin contact (kangaroo care) to family interaction, proximity, and touch. J. Fam. Psychol. 17 , 94–107 (2003).

Ferber, S. G. et al. Massage therapy by mothers and trained professionals enhances weight gain in preterm infants. Early Hum. Dev. 67 , 37–45 (2002).

Ferber, S. G. & Makhoul, I. R. The effect of skin-to-skin contact (kangaroo care) shortly after birth on the neurobehavioral responses of the term newborn: a randomized, controlled trial. Pediatrics 113 , 858–865 (2004).

Ferreira, A. M. & Bergamasco, N. H. P. Behavioral analysis of preterm neonates included in a tactile and kinesthetic stimulation program during hospitalization. Rev. Bras. Fisioter. 14 , 141–148 (2010).

Fidanza, F., Polimeni, E., Pierangeli, V. & Martini, M. A better touch: C-tactile fibers related activity is associated to pain reduction during temporal summation of second pain. J. Pain. 22 , 567–576 (2021).

Field, T. et al. Leukemia immune changes following massage therapy. J. Bodyw. Mov. Ther. 5 , 271–274 (2001).

Field, T. et al. Benefits of combining massage therapy with group interpersonal psychotherapy in prenatally depressed women. J. Bodyw. Mov. Ther. 13 , 297–303 (2009).

Field, T., Delage, J. & Hernandez-Reif, M. Movement and massage therapy reduce fibromyalgia pain. J. Bodyw. Mov. Ther. 7 , 49–52 (2003).

Field, T. et al. Fibromyalgia pain and substance P decrease and sleep improves after massage therapy. J. Clin. Rheumatol. 8 , 72–76 (2002).

Field, T., Diego, M., Gonzalez, G. & Funk, C. G. Neck arthritis pain is reduced and range of motion is increased by massage therapy. Complement. Ther. Clin. Pract. 20 , 219–223 (2014).

Field, T., Diego, M., Hernandez-Reif, M., Deeds, O. & Figueiredo, B. Pregnancy massage reduces prematurity, low birthweight and postpartum depression. Infant Behav. Dev. 32 , 454–460 (2009).

Field, T. et al. Insulin and insulin-like growth factor-1 increased in preterm neonates following massage therapy. J. Dev. Behav. Pediatr. 29 , 463–466 (2008).

Field, T. et al. Yoga and massage therapy reduce prenatal depression and prematurity. J. Bodyw. Mov. Ther. 16 , 204–209 (2012).

Field, T., Diego, M., Hernandez-Reif, M., Schanberg, S. & Kuhn, C. Massage therapy effects on depressed pregnant women. J. Psychosom. Obstet. Gynecol. 25 , 115–122 (2004).

Field, T., Diego, M., Hernandez-Reif, M. & Shea, J. Hand arthritis pain is reduced by massage therapy. J. Bodyw. Mov. Ther. 11 , 21–24 (2007).

Field, T., Gonzalez, G., Diego, M. & Mindell, J. Mothers massaging their newborns with lotion versus no lotion enhances mothers’ and newborns’ sleep. Infant Behav. Dev. 45 , 31–37 (2016).

Field, T. et al. Children with asthma have improved pulmonary functions after massage therapy. J. Pediatr. 132 , 854–858 (1998).

Field, T., Hernandez-Reif, M., Diego, M. & Fraser, M. Lower back pain and sleep disturbance are reduced following massage therapy. J. Bodyw. Mov. Ther. 11 , 141–145 (2007).

Field, T. et al. Effects of sexual abuse are lessened by massage therapy. J. Bodyw. Mov. Ther. 1 , 65–69 (1997).

Field, T. et al. Pregnant women benefit from massage therapy. J. Psychosom. Obstet. Gynecol. 20 , 31–38 (1999).

Field, T. et al. Juvenile rheumatoid arthritis: benefits from massage therapy. J. Pediatr. Psychol. 22 , 607–617 (1997).

Field, T., Hernandez-Reif, M., Taylor, S., Quintino, O. & Burman, I. Labor pain is reduced by massage therapy. J. Psychosom. Obstet. Gynecol. 18 , 286–291 (1997).

Field, T. et al. Massage therapy reduces anxiety and enhances EEG pattern of alertness and math computations. Int. J. Neurosci. 86 , 197–205 (1996).

Field, T. et al. Brief report: autistic children’s attentiveness and responsivity improve after touch therapy. J. Autism Dev. Disord. 27 , 333–338 (1997).

Field, T. M. et al. Tactile/kinesthetic stimulation effects on preterm neonates. Pediatrics 77 , 654–658 (1986).

Field, T. et al. Massage reduces anxiety in child and adolescent psychiatric patients. J. Am. Acad. Child Adolesc. Psychiatry 31 , 125–131 (1992).

Field, T. et al. Burn injuries benefit from massage therapy. J. Burn Care Res. 19 , 241–244 (1998).

Filho, F. L. et al. Effect of maternal skin-to-skin contact on decolonization of methicillin-oxacillin-resistant Staphylococcus in neonatal intensive care units: a randomized controlled trial. BMC Pregnancy Childbirth https://doi.org/10.1186/s12884-015-0496-1 (2015).

Forward, J. B., Greuter, N. E., Crisall, S. J. & Lester, H. F. Effect of structured touch and guided imagery for pain and anxiety in elective joint replacement patients—a randomized controlled trial: M-TIJRP. Perm. J. 19 , 18–28 (2015).

Fraser, J. & Ross Kerr, J. Psychophysiological effects of back massage on elderly institutionalized patients. J. Adv. Nurs. 18 , 238–245 (1993).

Frey Law, L. A. et al. Massage reduces pain perception and hyperalgesia in experimental muscle pain: a randomized, controlled trial. J. Pain. 9 , 714–721 (2008).

Gao, H. et al. Effect of repeated kangaroo mother care on repeated procedural pain in preterm infants: a randomized controlled trial. Int. J. Nurs. Stud. 52 , 1157–1165 (2015).

Garner, B. et al. Pilot study evaluating the effect of massage therapy on stress, anxiety and aggression in a young adult psychiatric inpatient unit. Aust. N. Z. J. Psychiatry 42 , 414–422 (2008).

Gathwala, G., Singh, B. & Singh, J. Effect of kangaroo mother care on physical growth, breastfeeding and its acceptability. Trop. Dr. 40 , 199–202 (2010).

Geva, N., Uzefovsky, F. & Levy-Tzedek, S. Touching the social robot PARO reduces pain perception and salivary oxytocin levels. Sci. Rep. 10 , 9814 (2020).

Gitau, R. et al. Acute effects of maternal skin-to-skin contact and massage on saliva cortisol in preterm babies. J. Reprod. Infant Psychol. 20 , 83–88 (2002).

Givi, M. Durability of effect of massage therapy on blood pressure. Int. J. Prev. Med. 4 , 511–516 (2013).

Glover, V., Onozawa, K. & Hodgkinson, A. Benefits of infant massage for mothers with postnatal depression. Semin. Neonatol. 7 , 495–500 (2002).

Gonzalez, A. et al. Weight gain in preterm infants following parent-administered vimala massage: a randomized controlled trial. Am. J. Perinatol. 26 , 247–252 (2009).

Gray, L., Watt, L. & Blass, E. M. Skin-to-skin contact is analgesic in healthy newborns. Pediatrics 105 , e14 (2000).

Grewen, K. M., Anderson, B. J., Girdler, S. S. & Light, K. C. Warm partner contact is related to lower cardiovascular reactivity. Behav. Med. 29 , 123–130 (2003).

Groër, M. W., Hill, J., Wilkinson, J. E. & Stuart, A. Effects of separation and separation with supplemental stroking in BALB/c infant mice. Biol. Res. Nurs. 3 , 119–131 (2002).

Gürol, A. P., Polat, S. & Nuran Akçay, M. Itching, pain, and anxiety levels are reduced with massage therapy in burned adolescents. J. Burn Care Res. 31 , 429–432 (2010).

Haley, S. et al. Tactile/kinesthetic stimulation (TKS) increases tibial speed of sound and urinary osteocalcin (U-MidOC and unOC) in premature infants (29–32 weeks PMA). Bone 51 , 661–666 (2012).

Harris, M., Richards, K. C. & Grando, V. T. The effects of slow-stroke back massage on minutes of nighttime sleep in persons with dementia and sleep disturbances in the nursing home: a pilot study. J. Holist. Nurs. 30 , 255–263 (2012).

Hart, S. et al. Anorexia nervosa symptoms are reduced by massage therapy. Eat. Disord. 9 , 289–299 (2001).

Hattan, J., King, L. & Griffiths, P. The impact of foot massage and guided relaxation following cardiac surgery: a randomized controlled trial. Issues Innov. Nurs. Pract. 37 , 199–207 (2002).

Haynes, A. C. et al. A calming hug: design and validation of a tactile aid to ease anxiety. PLoS ONE 17 , e0259838 (2022).

Henricson, M., Ersson, A., Määttä, S., Segesten, K. & Berglund, A.-L. The outcome of tactile touch on stress parameters in intensive care: a randomized controlled trial. Complement. Ther. Clin. Pract. 14 , 244–254 (2008).

Hernandez-Reif, M., Diego, M. & Field, T. Preterm infants show reduced stress behaviors and activity after 5 days of massage therapy. Infant Behav. Dev. 30 , 557–561 (2007).

Hernandez-Reif, M., Dieter, J. N. I., Field, T., Swerdlow, B. & Diego, M. Migraine headaches are reduced by massage therapy. Int. J. Neurosci. 96 , 1–11 (1998).

Hernandez-Reif, M. et al. Natural killer cells and lymphocytes increase in women with breast cancer following massage therapy. Int. J. Neurosci. 115 , 495–510 (2005).

Hernandez-Reif, M. et al. Children with cystic fibrosis benefit from massage therapy. J. Pediatr. Psychol. 24 , 175–181 (1999).

Hernandez-Reif, M., Field, T., Krasnegor, J. & Theakston, H. Lower back pain is reduced and range of motion increased after massage therapy. Int. J. Neurosci. 106 , 131–145 (2001).

Hernandez-Reif, M. et al. High blood pressure and associated symptoms were reduced by massage therapy. J. Bodyw. Mov. Ther. 4 , 31–38 (2000).

Hernandez-Reif, M. et al. Parkinson’s disease symptoms are differentially affected by massage therapy vs. progressive muscle relaxation: a pilot study. J. Bodyw. Mov. Ther. 6 , 177–182 (2002).

Hernandez-Reif, M., Field, T. & Theakston, H. Multiple sclerosis patients benefit from massage therapy. J. Bodyw. Mov. Ther. 2 , 168–174 (1998).

Hernandez-Reif, M. et al. Breast cancer patients have improved immune and neuroendocrine functions following massage therapy. J. Psychosom. Res. 57 , 45–52 (2004).

Hertenstein, M. J. & Campos, J. J. Emotion regulation via maternal touch. Infancy 2 , 549–566 (2001).

Hinchcliffe, J. K., Mendl, M. & Robinson, E. S. J. Rat 50 kHz calls reflect graded tickling-induced positive emotion. Curr. Biol. 30 , R1034–R1035 (2020).

Hodgson, N. A. & Andersen, S. The clinical efficacy of reflexology in nursing home residents with dementia. J. Altern. Complement. Med. 14 , 269–275 (2008).

Hoffmann, L. & Krämer, N. C. The persuasive power of robot touch. Behavioral and evaluative consequences of non-functional touch from a robot. PLoS ONE 16 , e0249554 (2021).

Holst, S., Lund, I., Petersson, M. & Uvnäs-Moberg, K. Massage-like stroking influences plasma levels of gastrointestinal hormones, including insulin, and increases weight gain in male rats. Auton. Neurosci. 120 , 73–79 (2005).

Hori, M. et al. Tickling during adolescence alters fear-related and cognitive behaviors in rats after prolonged isolation. Physiol. Behav. 131 , 62–67 (2014).

Hori, M. et al. Effects of repeated tickling on conditioned fear and hormonal responses in socially isolated rats. Neurosci. Lett. 536 , 85–89 (2013).

Hucklenbruch-Rother, E. et al. Delivery room skin-to-skin contact in preterm infants affects long-term expression of stress response genes. Psychoneuroendocrinology 122 , 104883 (2020).

Im, H. & Kim, E. Effect of yakson and gentle human touch versus usual care on urine stress hormones and behaviors in preterm infants: a quasi-experimental study. Int. J. Nurs. Stud. 46 , 450–458 (2009).

Jain, S., Kumar, P. & McMillan, D. D. Prior leg massage decreases pain responses to heel stick in preterm babies. J. Paediatr. Child Health 42 , 505–508 (2006).

Jane, S.-W. et al. Effects of massage on pain, mood status, relaxation, and sleep in Taiwanese patients with metastatic bone pain: a randomized clinical trial. Pain 152 , 2432–2442 (2011).

Johnston, C. C. et al. Kangaroo mother care diminishes pain from heel lance in very preterm neonates: a crossover trial. BMC Pediatr. 8 , 13 (2008).

Johnston, C. C. et al. Kangaroo care is effective in diminishing pain response in preterm neonates. Arch. Pediatr. Adolesc. Med. 157 , 1084–1088 (2003).

Jung, M. J., Shin, B.-C., Kim, Y.-S., Shin, Y.-I. & Lee, M. S. Is there any difference in the effects of QI therapy (external QIGONG) with and without touching? A pilot study. Int. J. Neurosci. 116 , 1055–1064 (2006).

Kapoor, Y. & Orr, R. Effect of therapeutic massage on pain in patients with dementia. Dementia 16 , 119–125 (2017).

Karagozoglu, S. & Kahve, E. Effects of back massage on chemotherapy-related fatigue and anxiety: supportive care and therapeutic touch in cancer nursing. Appl. Nurs. Res. 26 , 210–217 (2013).

Karbasi, S. A., Golestan, M., Fallah, R., Golshan, M. & Dehghan, Z. Effect of body massage on increase of low birth weight neonates growth parameters: a randomized clinical trial. Iran. J. Reprod. Med. 11 , 583–588 (2013).

Kashaninia, Z., Sajedi, F., Rahgozar, M. & Noghabi, F. A. The effect of kangaroo care on behavioral responses to pain of an intramuscular injection in neonates. J. Pediatr. Nurs. 3 , 275–280 (2008).

Kelling, C., Pitaro, D. & Rantala, J. Good vibes: The impact of haptic patterns on stress levels. In Proc. 20th International Academic Mindtrek Conference 130–136 (Association for Computing Machinery, 2016).

Khilnani, S., Field, T., Hernandez-Reif, M. & Schanberg, S. Massage therapy improves mood and behavior of students with attention-deficit/hyperactivity disorder. Adolescence 38 , 623–638 (2003).

Kianmehr, M. et al. The effect of massage on serum bilirubin levels in term neonates with hyperbilirubinemia undergoing phototherapy. Nautilus 128 , 36–41 (2014).

Kim, I.-H., Kim, T.-Y. & Ko, Y.-W. The effect of a scalp massage on stress hormone, blood pressure, and heart rate of healthy female. J. Phys. Ther. Sci. 28 , 2703–2707 (2016).

Kim, M. A., Kim, S.-J. & Cho, H. Effects of tactile stimulation by fathers on physiological responses and paternal attachment in infants in the NICU: a pilot study. J. Child Health Care 21 , 36–45 (2017).

Kim, M. S., Sook Cho, K., Woo, H.-M. & Kim, J. H. Effects of hand massage on anxiety in cataract surgery using local anesthesia. J. Cataract Refr. Surg. 27 , 884–890 (2001).

Koole, S. L., Tjew A Sin, M. & Schneider, I. K. Embodied terror management: interpersonal touch alleviates existential concerns among individuals with low self-esteem. Psychol. Sci. 25 , 30–37 (2014).

Krohn, M. et al. Depression, mood, stress, and Th1/Th2 immune balance in primary breast cancer patients undergoing classical massage therapy. Support. Care Cancer 19 , 1303–1311 (2011).

Kuhn, C. et al. Tactile-kinesthetic stimulation effects sympathetic and adrenocortical function in preterm infants. J. Pediatr. 119 , 434–440 (1991).

Kumar, J. et al. Effect of oil massage on growth in preterm neonates less than 1800 g: a randomized control trial. Indian J. Pediatr. 80 , 465–469 (2013).

Lee, H.-K. The effects of infant massage on weight, height, and mother–infant interaction. J. Korean Acad. Nurs. 36 , 1331–1339 (2006).

Leivadi, S. et al. Massage therapy and relaxation effects on university dance students. J. Dance Med. Sci. 3 , 108–112 (1999).

Lindgren, L. et al. Touch massage: a pilot study of a complex intervention. Nurs. Crit. Care 18 , 269–277 (2013).

Lindgren, L. et al. Physiological responses to touch massage in healthy volunteers. Auton. Neurosci. Basic Clin. 158 , 105–110 (2010).

Listing, M. et al. Massage therapy reduces physical discomfort and improves mood disturbances in women with breast cancer. Psycho-Oncol. 18 , 1290–1299 (2009).

Ludington-Hoe, S. M., Cranston Anderson, G., Swinth, J. Y., Thompson, C. & Hadeed, A. J. Randomized controlled trial of kangaroo care: cardiorespiratory and thermal effects on healthy preterm infants. Neonatal Netw. 23 , 39–48 (2004).

Lund, I. et al. Corticotropin releasing factor in urine—a possible biochemical marker of fibromyalgia. Neurosci. Lett. 403 , 166–171 (2006).

Ma, Y.-K. et al. Lack of social touch alters anxiety-like and social behaviors in male mice. Stress 25 , 134–144 (2022).

Massaro, A. N., Hammad, T. A., Jazzo, B. & Aly, H. Massage with kinesthetic stimulation improves weight gain in preterm infants. J. Perinatol. 29 , 352–357 (2009).

Mathai, S., Fernandez, A., Mondkar, J. & Kanbur, W. Effects of tactile-kinesthetic stimulation in preterms–a controlled trial. Indian Pediatr. 38 , 1091–1098 (2001).

Matsunaga, M. et al. Profiling of serum proteins influenced by warm partner contact in healthy couples. Neuroenocrinol. Lett. 30 , 227–236 (2009).

Mendes, E. W. & Procianoy, R. S. Massage therapy reduces hospital stay and occurrence of late-onset sepsis in very preterm neonates. J. Perinatol. 28 , 815–820 (2008).

Mirnia, K., Arshadi Bostanabad, M., Asadollahi, M. & Hamid Razzaghi, M. Paternal skin-to-skin care and its effect on cortisol levels of the infants. Iran. J. Pediatrics 27 , e8151 (2017).

Mitchell, A. J., Yates, C., Williams, K. & Hall, R. W. Effects of daily kangaroo care on cardiorespiratory parameters in preterm infants. J. Neonatal-Perinat. Med. 6 , 243–249 (2013).

Mitchinson, A. R. et al. Acute postoperative pain management using massage as an adjuvant therapy: a randomized trial. Arch. Surg. 142 , 1158–1167 (2007).

Modrcin-Talbott, M. A., Harrison, L. L., Groer, M. W. & Younger, M. S. The biobehavioral effects of gentle human touch on preterm infants. Nurs. Sci. Q. 16 , 60–67 (2003).

Mok, E. & Pang Woo, C. The effects of slow-stroke back massage on anxiety and shoulder pain in elderly stroke patients. Complement. Ther. Nurs. Midwifery 10 , 209–216 (2004).

Mokaberian, M., Noripour, S., Sheikh, M. & Mills, P. J. Examining the effectiveness of body massage on physical status of premature neonates and their mothers’ psychological status. Early Child Dev. Care 192 , 2311–2325 (2021).

Mori, H. et al. Effect of massage on blood flow and muscle fatigue following isometric lumbar exercise. Med. Sci. Monit. Int. Med. J. Exp. Clin. Res. 10 , CR173–CR178 (2004).

Moyer-Mileur, L. J., Haley, S., Slater, H., Beachy, J. & Smith, S. L. Massage improves growth quality by decreasing body fat deposition in male preterm infants. J. Pediatr. 162 , 490–495 (2013).

Moyle, W. et al. Foot massage and physiological stress in people with dementia: a randomized controlled trial. J. Altern. Complement. Med. 20 , 305–311 (2014).

Muntsant, A., Shrivastava, K., Recasens, M. & Giménez-Llort, L. Severe perinatal hypoxic-ischemic brain injury induces long-term sensorimotor deficits, anxiety-like behaviors and cognitive impairment in a sex-, age- and task-selective manner in C57BL/6 mice but can be modulated by neonatal handling. Front. Behav. Neurosci. 13 , 7 (2019).

Negahban, H., Rezaie, S. & Goharpey, S. Massage therapy and exercise therapy in patients with multiple sclerosis: a randomized controlled pilot study. Clin. Rehabil. 27 , 1126–1136 (2013).

Nelson, D., Heitman, R. & Jennings, C. Effects of tactile stimulation on premature infant weight gain. J. Obstet. Gynecol. Neonatal Nurs. 15 , 262–267 (1986).

Griffin, J. W. Calculating statistical power for meta-analysis using metapower. Quant. Meth. Psychol. 17 , 24–39 (2021).

Nunes, G. S. et al. Massage therapy decreases pain and perceived fatigue after long-distance Ironman triathlon: a randomised trial. J. Physiother. 62 , 83–87 (2016).

Ohgi, S. et al. Comparison of kangaroo care and standard care: behavioral organization, development, and temperament in healthy, low-birth-weight infants through 1 year. J. Perinatol. 22 , 374–379 (2002).

O'Higgins, M., St. James Roberts, I. & Glover, V. Postnatal depression and mother and infant outcomes after infant massage. J. Affect. Disord. 109 , 189–192 (2008).


Acknowledgements

We thank A. Frick and E. Chris for supporting the initial literature search and coding. We also thank A. Dreisoerner, T. Field, S. Koole, C. Kuhn, M. Henricson, L. Frey Law, J. Fraser, M. Cumella Reddan, and J. Stringer, who kindly responded to our data requests and provided additional information or data with respect to single studies. J.P. was supported by the German National Academy of Sciences Leopoldina (LPDS 2021-05). H.H. was supported by the Marietta-Blau scholarship of the Austrian Agency for Education and Internationalisation (OeAD) and the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, project ID 422744262 – TRR 289). C.K. received funding from OCENW.XL21.XL21.069 and V.G. from the European Research Council (ERC) under European Union’s Horizon 2020 research and innovation programme, grant ‘HelpUS’ (758703) and from the Dutch Research Council (NWO) grant OCENW.XL21.XL21.069. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Open access funding provided by Ruhr-Universität Bochum.

Author information

Julian Packheiser

Present address: Social Neuroscience, Faculty of Medicine, Ruhr University Bochum, Bochum, Germany

These authors contributed equally: Julian Packheiser, Helena Hartmann.

Authors and Affiliations

Social Brain Lab, Netherlands Institute for Neuroscience, Royal Netherlands Academy of Art and Sciences, Amsterdam, the Netherlands

Julian Packheiser, Helena Hartmann, Kelly Fredriksen, Valeria Gazzola, Christian Keysers & Frédéric Michon

Center for Translational and Behavioral Neuroscience, University Hospital Essen, Essen, Germany

Helena Hartmann

Clinical Neurosciences, Department for Neurology, University Hospital Essen, Essen, Germany


Contributions

J.P. contributed to conceptualization, methodology, formal analysis, investigation, data curation, writing the original draft, review and editing, visualization, supervision and project administration. H.H. contributed to conceptualization, methodology, formal analysis, investigation, data curation, writing the original draft, review and editing, visualization, supervision and project administration. K.F. contributed to investigation, data curation, and review and editing. C.K. and V.G. contributed to conceptualization, and review and editing. F.M. contributed to conceptualization, methodology, formal analysis, investigation, writing the original draft, and review and editing.

Corresponding author

Correspondence to Julian Packheiser .

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Ville Harjunen, Rebecca Boehme and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–21 and Tables 1–4.

Reporting Summary

Peer Review File

Supplementary Table 1

List of studies included in and excluded from the meta-analyses/review.

Supplementary Table 2

PRISMA checklist, manuscript.

Supplementary Table 3

PRISMA checklist, abstract.

Source Data Fig. 2

Effect size/error (columns ‘Hedges_g’ and ‘variance’) information for each study/cohort/effect included in the analysis.

Source Data Fig. 3

Effect size/error (columns ‘Hedges_g’ and ‘variance’) together with moderator data (column ‘Outcome’) for each study/cohort/effect included in the analysis.

Source Data Fig. 4

Effect size/error (columns ‘Hedges_g’ and ‘variance’) together with moderator data (columns ‘dyad_type’ and ‘skin_to_skin’) for each study/cohort/effect included in the analysis.

Source Data Fig. 5

Effect size/error (columns ‘Hedges_g’ and ‘variance’) together with moderator data (column ‘touch_type’) for each study/cohort/effect included in the analysis.

Source Data Fig. 6

Effect size/error (columns ‘Hedges_g’ and ‘variance’) together with moderator data (column ‘clin_sample’) for each study/cohort/effect included in the analysis.

Source Data Fig. 7

Effect size/error (columns ‘Hedges_g’ and ‘variance’) together with moderator data (column ‘familiarity’) for each study/cohort/effect included in the analysis.

Source Data Fig. 7

Effect size/error (columns ‘Hedges_g’ and ‘variance’) together with moderator data (columns ‘touch_duration’ and ‘sessions’) for each study/cohort/effect included in the analysis.
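As a minimal illustration of how rows with the ‘Hedges_g’ and ‘variance’ columns combine, the sketch below pools them with a simple inverse-variance (fixed-effect) weighting in Python. This is not the paper's multivariate model, which additionally handles dependent effect sizes with robust variance estimation; the example rows are hypothetical, not values from the study.

```python
def pool_fixed_effect(rows):
    """Inverse-variance (fixed-effect) pooled Hedges' g.

    rows: dicts with keys 'Hedges_g' and 'variance', mirroring the
    column names in the source data files.
    Returns (pooled_g, pooled_variance).
    """
    weights = [1.0 / r["variance"] for r in rows]
    pooled_g = sum(w * r["Hedges_g"] for w, r in zip(weights, rows)) / sum(weights)
    pooled_var = 1.0 / sum(weights)
    return pooled_g, pooled_var

# Hypothetical effects, NOT taken from the paper:
rows = [
    {"Hedges_g": 0.35, "variance": 0.02},
    {"Hedges_g": 0.50, "variance": 0.04},
    {"Hedges_g": 0.10, "variance": 0.05},
]
g, v = pool_fixed_effect(rows)
```

Larger (lower-variance) studies dominate the weighted mean; the paper's actual analyses were run with the metafor package in R.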

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Packheiser, J., Hartmann, H., Fredriksen, K. et al. A systematic review and multivariate meta-analysis of the physical and mental health benefits of touch interventions. Nat Hum Behav (2024). https://doi.org/10.1038/s41562-024-01841-8

Download citation

Received : 16 August 2023

Accepted : 29 January 2024

Published : 08 April 2024

DOI : https://doi.org/10.1038/s41562-024-01841-8



Volume 14, Issue 4

Comparison of the efficacy and tolerability of different repetitive transcranial magnetic stimulation modalities for post-stroke dysphagia: a systematic review and Bayesian network meta-analysis protocol

Qiang Chen 1 (http://orcid.org/0009-0001-2981-9322), Mengfan Kan 1 (http://orcid.org/0009-0005-3248-0875), Xiaoyu Jiang 1 (http://orcid.org/0000-0002-8404-5032), Huifen Liu 1 (http://orcid.org/0000-0003-3918-3775), Deqi Zhang 1, Lin Yuan 1, Qiling Xu 1, Hongyan Bi 2

1 College of Rehabilitation Medicine, Shandong University of Traditional Chinese Medicine, Jinan, Shandong, China
2 Department of Rehabilitation Medicine, Shandong University of Traditional Chinese Medicine Affiliated Hospital, Jinan, Shandong, China

Correspondence to Professor Hongyan Bi; Hy__bi{at}163.com

Introduction Up to 78% of patients who have had a stroke develop post-stroke dysphagia (PSD), a serious complication that can lead to life-threatening aspiration pneumonia, malnutrition, and water and electrolyte disorders. Several meta-analyses have shown that repetitive transcranial magnetic stimulation (rTMS) improves swallowing in patients who have had a stroke; however, the optimal stimulation modality remains unknown. This study will be the first Bayesian network meta-analysis (NMA) to determine which rTMS modalities best improve swallowing function in patients who have had a stroke.

Methods and analysis PubMed, Web of Science, Embase, Google Scholar, the Cochrane Library, the Chinese National Knowledge Infrastructure, the Chongqing VIP Database and WanFang Data will be searched from their inception to 2 September 2023. All randomised controlled trials of rTMS for PSD will be included; only studies published in Chinese or English will be considered. Two researchers will independently screen the literature and extract data, then use the Cochrane Collaboration's Risk of Bias 2.0 tool to assess the methodological quality of the included studies. The primary outcome is improvement in swallowing function; secondary outcomes include side effects (eg, paraesthesia, vertigo, seizures) and quality of life. A pairwise meta-analysis and an NMA within a Bayesian framework will be conducted using Stata and R statistical software. The Grading of Recommendations Assessment, Development, and Evaluation system will be used to assess the quality of evidence for the outcome indicators.

Ethics and dissemination As all data in this study will be taken from the literature, ethical approval is not needed. We will publish our work in peer-reviewed publications and present it at academic conferences.

PROSPERO registration number CRD42023456386.

  • Transcranial Magnetic Stimulation
  • Systematic Review
  • REHABILITATION MEDICINE

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:  http://creativecommons.org/licenses/by-nc/4.0/ .

https://doi.org/10.1136/bmjopen-2023-080289


STRENGTHS AND LIMITATIONS OF THIS STUDY

This study will collect a wide range of evidence to assess the efficacy and tolerability of repetitive transcranial magnetic stimulation for post-stroke dysphagia.

The study's outcome indicators will combine subjective assessment scales with objective physiological measures.

The Grading of Recommendations Assessment, Development, and Evaluation system will be implemented to assess the quality of the evidence.

Language bias may result from searching solely Chinese and English databases for literature.

Introduction

Stroke is the second leading cause of death and the third leading cause of disability worldwide, 1 2 with more than 17 million new cases reported each year. 3 The Global Burden of Disease Study 2019 estimated that about 101 million people worldwide are living with stroke and that stroke causes about 6.55 million deaths. 4 Sequelae of varying degrees can occur after stroke, such as dyskinesia, cognitive impairment, dysphagia, speech disorders, anxiety, depression and fatigue, which place a heavy burden on patients and their families. 5–8 Among them, post-stroke dysphagia (PSD) is a common and serious complication, with an incidence ranging from 37% to 78%. 6 About 20–43% of patients have persistent dysphagia after 3 months, mainly manifested as choking when drinking and an inability to eat, and it can cause a variety of complications, such as aspiration pneumonia, malnutrition, and water and electrolyte disorders. 9 In severe cases, it may lead to asphyxia, thereby increasing the risk of death. 10 In addition, PSD can lead to a series of psychological problems, such as fear of eating, anxiety and depression, which cause serious distress in patients' daily lives. 11 12 These negative psychological states can in turn aggravate PSD, affecting recovery and quality of life and forming a vicious circle. 13 14 Notably, early screening, intervention and management of PSD have not received enough attention. 15 16 Consequently, the timely diagnosis and effective treatment of PSD have become urgent problems in clinical work.

The pathogenesis of PSD is complex and may involve damage to the cortical swallowing centre, descending cortical fibres, the bulbar swallowing centre and the extrapyramidal system. 17–20 At present, clinical treatment options for PSD are limited. Compensatory interventions based on diet and nutrition, combined with restorative interventions based on swallowing rehabilitation training, are widely used. 21 22 However, these therapies suffer from high cost, long treatment cycles and poor compliance, which limit their clinical application. 23–25 In contrast, repetitive transcranial magnetic stimulation (rTMS), a non-invasive treatment technique, can directly modulate the excitability of the swallowing cortex or promote functional reorganisation of the swallowing cortex by generating evoked potentials through pulsed magnetic fields. It is simple to operate, non-invasive, painless and safe, and it does not require active cooperation from patients, offering new opportunities for the treatment of PSD. 24 25

There are various stimulation modalities for treating PSD with rTMS. It is believed that low-frequency rTMS (LF-rTMS) (≤1 Hz) attenuates cortical excitability, while high-frequency rTMS (HF-rTMS) (>1 Hz) enhances it. 26 27 Consequently, previous studies have usually used HF-rTMS (3 Hz, 5 Hz and 10 Hz) to excite the lesioned (affected) hemisphere or LF-rTMS (1 Hz) to inhibit the non-lesioned (healthy) hemisphere in order to improve swallowing function in patients with PSD. 28–30 The selection of these stimulation modalities rests on the interhemispheric competition model: rTMS can enhance or inhibit the excitability of the contralateral cerebral hemisphere, reshaping the balance between the two hemispheres and thus helping restore swallowing function after stroke. 31 32 In addition, some studies have shown that HF-rTMS of the contralesional cortex, or stimulation of both hemispheres, can also improve or even restore swallowing function in patients with PSD. 24 33–37 This may be related to functional reorganisation and compensation by the swallowing motor cortex of the contralateral hemisphere. 38 39 As a result, there is significant debate about whether rTMS should be applied to the affected side, the healthy side or both sides, and whether LF-rTMS or HF-rTMS should be used.

At present, many scholars have conducted evidence-based research on rTMS for PSD. 40–42 Liao et al published a systematic review and meta-analysis in 2017 confirming that rTMS has a positive effect on PSD and that, compared with LF-rTMS, HF-rTMS may be more beneficial to patients. 40 Tan et al conducted a systematic review and meta-analysis in 2022 and found that rTMS has a long-term effect on the recovery of swallowing function after stroke. 41 Hsiao et al published a meta-analysis in 2023 confirming that both HF-rTMS on the affected side and LF-rTMS on the healthy side can improve swallowing function in patients who have had a stroke. 42 However, these are traditional meta-analyses, which allow only direct comparisons between two interventions and cannot compare the efficacy of the different rTMS modalities. Network meta-analysis (NMA) can be used to compare the efficacy of different rTMS treatment regimens.

Consequently, this study will use the Bayesian NMA method to compare the efficacy of different rTMS modalities, rank their effectiveness and synthesise the results to obtain the best rTMS treatment regimens and provide reliable and comprehensive evidence for clinical treatment decisions in patients with PSD.

Methods and analyses

Protocol design and registration

We plan to conduct a systematic review and NMA within a Bayesian framework. This protocol was developed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols statement 43 and has been registered on PROSPERO (CRD42023456386). Any amendments to this protocol will be reported through PROSPERO.

Inclusion criteria

Types of studies

Only randomised controlled trials (RCTs) presented in English or Chinese will be included in the study. Animal trials, meta-analyses, systematic reviews, abstracts, conference presentations, case reports and cohort studies will be excluded.

Types of participants

All participants will meet the following criteria: (1) patients with ischaemic or haemorrhagic stroke (including cerebral hemisphere and brain stem) diagnosed by CT, MRI and other related examinations, not limited to stroke stage; (2) patients with a final diagnosis of swallowing dysfunction on a clinical swallowing-related scale or by objective instrumental examination; and (3) adult patients (≥18 years old) regardless of gender, ethnicity, race and education level.

Types of interventions

The intervention in the experimental group may be rTMS treatment with different stimulation modalities. Based on our preliminary literature search, rTMS treatment regimens may involve a choice of five stimulation modalities: LF-rTMS on the healthy side, 27 30 HF-rTMS on the affected side, 24 30 32 34 36 37 HF-rTMS on the healthy side, 32 33 35–37 bilateral HF-rTMS 24 32 34 36 37 and LF-rTMS on the healthy side combined with HF-rTMS on the affected side. 44

Types of control groups

The control group may be conventional rehabilitation therapy, sham stimulation therapy or another rTMS treatment regimen different from the experimental group.

Types of outcome measures

The primary outcome will be improvement in swallowing function, measured with swallowing assessment scales and objective physiological measures. The subjective assessments will include the Standardized Swallowing Assessment (SSA) and the Penetration Aspiration Scale (PAS). The objective measurements will include the videofluoroscopic swallowing study (VFSS) and surface electromyography (sEMG). The secondary outcomes will include quality-of-life measures, such as the Swallowing Quality-of-Life Questionnaire, and adverse events (including dizziness, headache, paraesthesia and seizures). The tolerability of the rTMS intervention will be evaluated by the occurrence of adverse events.

The PAS indicates the extent to which material enters the airway. Scores range from 1 to 8, with higher scores indicating a higher risk of aspiration and more severe dysphagia. 45 The SSA comprises three parts 46 : (1) a clinical examination of eight items, including level of consciousness, head and trunk control, and lip control (8–23 points); (2) the patient swallows 5 mL of water three times while the examiner observes for water leaking from the mouth, laryngeal movement, repeated swallowing, wheezing during swallowing and laryngeal function after swallowing (5–11 points); (3) if no abnormality is observed in the above examinations, the patient drinks 60 mL of water while the examiner checks whether all the water can be consumed and whether there is coughing or wheezing during or after swallowing, impaired laryngeal function after swallowing or any sign of aspiration (5–12 points). SSA totals range from 18 to 46, with higher scores indicating more severe dysphagia. sEMG can quantitatively evaluate neuromuscular function during swallowing, reflect the difficulty and duration of elevation of the tongue–laryngeal complex and predict the risk of aspiration in patients with dysphagia. 47 VFSS can evaluate the entire swallowing process, dynamically observe the transport of food and detect silent aspiration; it is recognised as the gold standard for diagnosing dysphagia. 48
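The scoring rules above translate into a few lines of code. The helpers below simply validate the stated ranges (1–8 for the PAS; 8–23, 5–11 and 5–12 for the three SSA parts, totalling 18–46) and are illustrative only, not part of the protocol:

```python
def check_pas(score: int) -> int:
    """Penetration Aspiration Scale: integer 1-8, higher = greater aspiration risk."""
    if not 1 <= score <= 8:
        raise ValueError("PAS is scored 1-8")
    return score

def ssa_total(clinical: int, swallow_5ml: int, swallow_60ml: int) -> int:
    """Standardized Swallowing Assessment total from its three parts.

    Part ranges follow the description above (8-23, 5-11 and 5-12 points),
    so the total spans 18-46; higher totals indicate more severe dysphagia.
    """
    parts = [(clinical, 8, 23), (swallow_5ml, 5, 11), (swallow_60ml, 5, 12)]
    for value, lo, hi in parts:
        if not lo <= value <= hi:
            raise ValueError(f"part score {value} outside {lo}-{hi}")
    return clinical + swallow_5ml + swallow_60ml
```

For example, the minimum possible total is ssa_total(8, 5, 5) = 18 and the maximum is ssa_total(23, 11, 12) = 46, matching the 18–46 range stated above.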

Exclusion criteria

We will refer to the following exclusion criteria: (1) non-RCTs, including cohort studies, case reports, meta-analyses, reviews and conference papers; (2) dysphagia not caused by stroke (eg, trauma, Parkinson’s disease); (3) outcome indicators related to swallowing function were not reported; (4) repeated publication; and (5) full text cannot be obtained or data cannot be extracted.

Data sources and search strategy

We will search PubMed, Web of Science, Embase, Google Scholar, the Cochrane Library, China National Knowledge Infrastructure, Chongqing VIP Database and WanFang Data from their inception to 2 September 2023. All RCTs related to rTMS for PSD will be included. The studies will be limited to results published in Chinese or English. The search terms will include “repetitive transcranial magnetic stimulation”, “rTMS”, “post-stroke dysphagia”, “PSD” and other related terms. At the same time, we will conduct a secondary manual search of the references in the included literature and relevant systematic reviews to avoid missing important literature. Taking PubMed as an example, we present the search strategy in detail in the online supplemental material .
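Schematically, such a strategy ORs the synonyms for each concept and ANDs the concepts together. The helper and term lists below are only an illustration of that generic PubMed-style pattern, not the registered strategy (which is given in the supplemental material):

```python
def build_query(term_groups):
    """Combine synonym groups into a boolean search string:
    synonyms are OR-ed within a group, and groups are AND-ed together."""
    blocks = ["(" + " OR ".join(f'"{t}"' for t in group) + ")" for group in term_groups]
    return " AND ".join(blocks)

# Term groups taken from the search terms listed above:
query = build_query([
    ["repetitive transcranial magnetic stimulation", "rTMS"],
    ["post-stroke dysphagia", "PSD"],
])
```

The resulting string can be pasted into a database's advanced-search box and extended with controlled vocabulary (eg, MeSH terms) as each database requires.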

Supplemental material

Study selection

First, two researchers (LY and DZ) will use EndNote V.X9 software to eliminate duplicate literature, then they will conduct a preliminary screen of the literature by reading the title and abstract to exclude the articles that do not meet the inclusion criteria, and finally, they will evaluate the potentially qualified studies by reading the full text to determine the final included literature. In case of any disagreement, the third researcher (QX) will help to resolve the problem. We will present the entire literature screening process in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow chart, 43 and the detailed process is shown in figure 1 .

  • Download figure
  • Open in new tab
  • Download powerpoint

Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow chart of study selection.

Data extraction

After two researchers (LY and DZ) read the final included literature, the following data will be extracted separately: (1) basic information (first author, publication time, country, sample size, intervention measures); (2) patient information (mean age, gender, hemiplegic side of stroke, stroke stage, course of disease); (3) rTMS-specific parameters and treatment protocols (stimulation frequency, stimulation target, stimulation intensity, total number of pulses, coil type, treatment protocol and duration); and (4) outcome measures (data on each outcome and adverse event and follow-up time). In cases of disagreement, the third researcher (QX) will assist in resolution.

Risk-of-bias assessment

Two researchers (LY and DZ) will independently evaluate the literature that meets the inclusion criteria using the Risk of Bias 2.0 tool provided by the Cochrane Collaboration. 49 It covers five domains: (1) bias arising from the randomisation process; (2) bias due to deviations from the intended intervention; (3) bias due to missing outcome data; (4) bias in the measurement of the outcome; and (5) bias due to selective reporting. The risk of bias in each domain will be classified as ‘low risk of bias’, ‘high risk of bias’ or ‘uncertain risk of bias’, and the overall risk of bias of a study will be determined by combining the domain-level judgements. In cases of disagreement, the third researcher (QX) will participate and a consensus will be reached.
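The protocol does not spell out how the five domain-level judgements combine into the overall rating. The sketch below encodes one common reading of the RoB 2.0 guidance (any high-risk domain makes the study high risk; otherwise any uncertain domain makes the overall rating uncertain); treat this aggregation rule as an assumption, not the Cochrane algorithm verbatim:

```python
def overall_risk(domains):
    """Combine per-domain judgements ('low', 'uncertain', 'high') for the
    five domains listed above into an overall study-level rating.

    The rule used here is one common interpretation of the RoB 2.0
    guidance, not quoted from this protocol.
    """
    if "high" in domains:
        return "high risk of bias"
    if "uncertain" in domains:
        return "uncertain risk of bias"
    return "low risk of bias"
```

For example, a study judged low risk in four domains but high risk in the randomisation process would be rated high risk overall.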

Data synthesis and statistical analyses

Pairwise meta-analysis.

Before performing the NMA, we will perform a standard pairwise meta-analysis using Stata V.14.2 (StataCorp, College Station, Texas, USA). The χ² test and the I² statistic will be used to evaluate heterogeneity across studies. If I² ≤50%, indicating low heterogeneity, a fixed-effects model will be used for pooling; if I² >50%, indicating substantial heterogeneity, a random-effects model will be selected. 50 For continuous variables, the mean difference (MD) and its 95% CI will be used when the same measurement instrument was applied, and the standardised mean difference (SMD) and its 95% CI when the instruments differ. For dichotomous data, the relative risk and its 95% CI will be used.
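As an illustration of the pooling rule described above, the following sketch (Python rather than Stata, purely to show the arithmetic) computes Cochran's Q and I² from study-level mean differences, then pools with a fixed-effects model when I² ≤50% and a DerSimonian-Laird random-effects model otherwise. The function name and example data are hypothetical.

```python
import math

def pool_mean_differences(mds, ses):
    """Inverse-variance pooling of mean differences with an I^2-based
    choice between fixed- and random-effects models, as described above."""
    w = [1 / se**2 for se in ses]                      # fixed-effect (inverse-variance) weights
    fixed = sum(wi * m for wi, m in zip(w, mds)) / sum(w)
    # Cochran's Q and the I^2 statistic
    q = sum(wi * (m - fixed)**2 for wi, m in zip(w, mds))
    df = len(mds) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    if i2 <= 50:                                       # low heterogeneity: fixed-effects model
        se_pooled = math.sqrt(1 / sum(w))
        return "fixed", fixed, i2, (fixed - 1.96 * se_pooled, fixed + 1.96 * se_pooled)
    # substantial heterogeneity: DerSimonian-Laird random-effects model
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                      # between-study variance estimate
    wr = [1 / (se**2 + tau2) for se in ses]
    rand = sum(wi * m for wi, m in zip(wr, mds)) / sum(wr)
    se_pooled = math.sqrt(1 / sum(wr))
    return "random", rand, i2, (rand - 1.96 * se_pooled, rand + 1.96 * se_pooled)
```

For example, three identical study effects yield I² = 0 and the fixed-effects estimate, while two widely discrepant effects trigger the random-effects model.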

Network meta-analysis

We will perform a Bayesian NMA using Stata V.14.2 and R (V.4.1.2; r-project.org ). Stata V.14.2 will be used to draw a network plot of the different rTMS stimulation modalities, and the efficacy of the modalities will be ranked according to the surface under the cumulative ranking curve provided by Stata V.14.2. We will use R (V.4.1.2) to fit Bayesian random-effects NMA models estimated with the Markov chain Monte Carlo algorithm. Each model will run four Markov chains with different initial values. Each chain will run for 50 000 iterations: the first 20 000 will be discarded as burn-in to eliminate the influence of the initial values, and the last 30 000 will be used for sampling. 51 52
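The burn-in scheme above can be illustrated with a toy sketch (Python, not the actual R/Stata workflow): four Metropolis chains targeting a standard normal "posterior" each run 50 000 iterations, discard the first 20 000 as burn-in, and keep the last 30 000 for estimation. The target density and all names here are illustrative assumptions, not the protocol's real NMA model.

```python
import random, math

def run_chain(n_iter=50_000, n_burn=20_000, seed=0):
    """One Markov chain of a toy Metropolis sampler targeting a standard
    normal posterior: the first n_burn draws are discarded as burn-in and
    only the remaining draws are kept for estimation."""
    rng = random.Random(seed)
    x, kept = rng.uniform(-10, 10), []          # deliberately poor initial value
    log_p = lambda v: -0.5 * v * v              # log-density of N(0, 1), up to a constant
    for i in range(n_iter):
        prop = x + rng.gauss(0, 1)              # random-walk proposal
        if math.log(rng.random()) < log_p(prop) - log_p(x):
            x = prop                            # accept the proposal
        if i >= n_burn:
            kept.append(x)                      # sampling phase only
    return kept

# four chains with different initial values, as in the protocol
chains = [run_chain(seed=s) for s in range(4)]
posterior_mean = sum(sum(c) for c in chains) / sum(len(c) for c in chains)
```

Pooling the four chains' retained draws gives a posterior mean near the true value of 0, showing why the burn-in draws, which still carry the influence of the starting points, are excluded.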

Assessment of similarity and consistency

Based on the effect measures chosen above, the framework construction principle and the selected statistical methods, R (V.4.1.2) software will be used to construct both the consistency model and the inconsistency model of the Bayesian NMA and to calculate their results. 52 For global inconsistency, we will compare the two models using the Deviance Information Criterion (DIC); a difference in DIC values greater than one will be considered to indicate significant global inconsistency. For local inconsistency, if the network forms a closed-loop structure (ie, it contains pairwise direct comparisons), we will use node splitting to detect inconsistency between direct and indirect comparisons, with p<0.05 indicating local inconsistency; if the network diagram does not form a closed loop, consistency between the two sources of evidence will be judged directly by visual inspection. Convergence will be diagnosed from the trace and density plots and the convergence diagnostics generated by R (V.4.1.2) using the Brooks-Gelman-Rubin method: a potential scale reduction factor close to 1 indicates good convergence, which will indicate that the statistical results are stable and credible.

Sensitivity analysis and subgroup analysis

When heterogeneity is significant, we will carefully re-read the original studies to identify any important clinical, methodological or statistical differences between them. 53 We will further explore sources of heterogeneity by performing sensitivity analyses or subgroup analyses in Stata V.14.2. 54

Meta-regression analysis

If necessary, we will perform a meta-regression analysis of factors, such as patient demographics, that may contribute to heterogeneity between studies. A covariate whose meta-regression coefficient has p<0.05 will be considered one of the sources of heterogeneity. 55
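A weighted least-squares sketch of this meta-regression step, assuming a single study-level covariate (eg, mean age) and inverse-variance weights (Python for illustration; the protocol's actual analysis will be run in Stata/R, and all names here are hypothetical):

```python
import math

def meta_regression(effects, ses, covariate):
    """Weighted least-squares meta-regression of study effect sizes on a
    single study-level covariate, with weights 1/SE^2. Returns the slope,
    its standard error, and a two-sided z-test p-value."""
    w = [1 / se**2 for se in ses]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, covariate)) / sw
    ybar = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, covariate))
    sxy = sum(wi * (x - xbar) * (y - ybar)
              for wi, x, y in zip(w, covariate, effects))
    slope = sxy / sxx
    se_slope = math.sqrt(1 / sxx)
    z = slope / se_slope
    p = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal p-value
    return slope, se_slope, p
```

A p-value below 0.05 for the slope would flag the covariate as a candidate source of heterogeneity, mirroring the decision rule stated above.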

Assessment of publication bias

If the number of included studies exceeds 10, we will assess small-study effects and publication bias using comparison-adjusted funnel plots and Egger's test. 56 57
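Egger's test is, in its textbook form, a linear regression of the standardised effect (effect/SE) on precision (1/SE), with an intercept significantly different from zero suggesting funnel-plot asymmetry. A minimal Python sketch under that formulation (illustrative, not the Stata implementation used in the protocol):

```python
import math

def eggers_test(effects, ses):
    """Egger's regression test for small-study effects: regress the
    standardised effect (effect/SE) on precision (1/SE). A non-zero
    intercept suggests funnel-plot asymmetry."""
    y = [e / s for e, s in zip(effects, ses)]   # standardised effects
    x = [1 / s for s in ses]                    # precisions
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)    # residual variance
    se_int = math.sqrt(s2 * (1 / n + xbar ** 2 / sxx))
    t = intercept / se_int                      # t-statistic for the intercept
    return intercept, t
```

In practice the t-statistic is compared against a t distribution with n−2 degrees of freedom; a large |t| indicates asymmetry consistent with small-study effects.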

Quality of evidence

The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach will be applied to evaluate the quality of evidence for all outcomes in the pairwise and network meta-analyses. Two researchers will independently import the data into GRADEprofiler software (GRADEpro, V.3.6.1; available at www.gradeworkinggroup.org ) to evaluate the quality of the evidence. The GRADE system covers five evaluation items: risk of bias, inconsistency, indirectness, imprecision and publication bias. 58 The level of evidence will be graded as very low, low, moderate or high. 59

Patient and public involvement

There will be no direct patient or public involvement in any aspect of this study.

Ethics and dissemination

As a literature-based systematic review and NMA, the data used in this study will all be extracted from pre-existing literature. Therefore, ethical approval is not required for this study. The findings will be submitted to peer-reviewed journals and disseminated at national/international academic conferences.

In recent years, with the development of brain imaging and non-invasive nerve stimulation techniques, as well as a better understanding of the neurophysiology of swallowing, rTMS has become one of the treatment options for PSD. Many studies have confirmed the effectiveness of rTMS in the treatment of dysphagia and its superiority over other techniques. 42 60 61 The guideline for the diagnosis and treatment of PSD published by the European Stroke Organisation and the European Society for Swallowing Disorders also recommends rTMS for PSD and suggests that it is more beneficial in combination with conventional swallowing therapy. 62 However, there is no unified standard for selecting a stimulation modality when rTMS is used to treat PSD, and no clear evidence to indicate which stimulation modality is most effective. To some extent, this leads to controversy and confusion in clinical application and hinders the recovery of patients with dysphagia after stroke.

This study will conduct a comprehensive and quantitative analysis of published data using NMA to explore the effectiveness of different rTMS modalities and to provide a basis for the comprehensive prevention and treatment of post-stroke dysphagia. Nonetheless, the study has several limitations. First, the severity, stage and lesion location of the included patients who had a stroke are not uniform, so the effect of heterogeneity cannot be fully excluded. Second, the languages of the included articles are limited to Chinese and English, which may leave out valuable literature. Finally, the ranking of results produced by an NMA is only a statistical and methodological reference; because the method itself still has some defects and limitations, the choice of rTMS modality in clinical practice should be made in conjunction with each patient's specific condition.

To the best of our knowledge, the present study will be the first systematic review and Bayesian NMA to compare the efficacy and tolerability of different rTMS modalities for PSD. The results of this study will help physicians and patients choose the optimal rTMS treatment and provide the latest theoretical basis for the rehabilitation application of rTMS in PSD.

Ethics statements

Patient consent for publication.

Not applicable.

Contributors QC conceived the original idea and initiated this protocol. HB was responsible for quality control and review of the articles. LY, DZ and QX participated in literature screening and literature extraction. The manuscript was prepared and written by QC and MK. XJ and LY collaborated on the revision of the paper. HL and MK conducted data analysis. All authors read and agreed to publish the protocol.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
