
educational research systematic review

Welcome to the International Database of Education Systematic Reviews.

IDESR is a database of published systematic reviews in Education and a clearinghouse for protocol registration of ongoing and planned systematic reviews. From this page you can:

  • dive straight in and search the database
  • register a protocol for a planned systematic review
  • read more about IDESR
  • explore the rationale for IDESR
  • meet the team behind the IDESR project.

IDESR is now in its second phase of development and is accepting registrations of review protocols for all areas of education. Please check the inclusion criteria and submission guidance here. The library arm will continue to focus on published reviews in Language Education for the time being.

Search IDESR


The IDESR Team are grateful to the John Fell Fund and the Department of Education at the University of Oxford for their support of this pilot phase of IDESR. We believe it will provide proof of concept to inform future funding bids to enable the expansion of IDESR to include systematic reviews from all sub-fields of Education.

Keep up to date with developments at IDESR, including when we will be accepting protocol registrations from fields outside Language Education, by following the IDESR blog and on Twitter @idesr_org.



Log in with your IDESR password

Contact us

Are you registered yet? You can do this here.

Reset your IDESR password


Register with IDESR


IDESR search

  • To search for all records use * as your search term
  • To search for truncated words add * to the end of the word, e.g. to search for words that start with caption use caption*
  • To search for an exact term use double quotes, e.g. "Captioned Video for L2 listening and vocabulary learning: A meta-analysis."


Record details

Before you register your review: inclusion criteria for IDESR protocols

IDESR accepts registrations of protocols for systematic reviews in all fields of education.

We operationalise systematic reviews as follows:

Systematic reviews are rigorous, transparent and replicable reviews of research literature. High quality systematic reviews aim to identify all relevant published and unpublished research reports on a given topic and provide an unbiased summary of the totality of that evidence.

Systematic reviews select for inclusion reports of studies using a series of transparent and replicable steps. These include applying a predetermined systematic search strategy and assessment of the eligibility of studies for inclusion using clear inclusion/exclusion criteria.

The quality of each study included in the review should be assessed for trustworthiness. The criteria by which trustworthiness is assessed are informed by the types of study eligible for the review. For example, reviews of experiments might use the Cochrane Risk of Bias Tool or Gorard's Sieve. Reviews of qualitative studies might use the Quality in Qualitative Evaluation Tool; observational studies might use the Newcastle-Ottawa Scale.

The assembled body of evidence should be synthesised using narrative and/or statistical synthesis (meta-analysis) and interpreted taking into account the quality of the studies included.

For more information about the characteristics of high quality systematic reviews, reference should be made to PRISMA (the Preferred Reporting Items for Systematic Reviews and Meta-analyses) at www.prisma-statement.org.

Note: Although the terms are often used interchangeably, a meta-analysis is not a systematic review; it is a statistical technique. A systematic review may incorporate a meta-analysis, but to be considered a systematic review, the study must adhere to the methodological characteristics summarised above.

Protocol Registration

Before embarking on a systematic review, it is good scientific practice to prepare a protocol detailing the steps that will be taken to reduce biases in preparing the review, and to make this protocol publicly available. This helps to guard against poor scientific practices, such as outcome switching, selective reporting, and failure to publish. In addition, protocol registration helps to reduce unnecessary duplication of effort (by allowing prospective reviewers to check if other researchers have already embarked on a review addressing the same or a similar topic), and to foster collaboration (by alerting reviewers to other groups interested in the same or similar topics). It also gives reviewers an opportunity to demonstrate their commitment to open science. This is one of the key objectives of the IDESR Registry.

How to register your protocol

Before you start

Check IDESR and other relevant databases for systematic reviews that have already been published or that are ongoing or proposed. Satisfy yourself that your proposed review does not unnecessarily duplicate work that has been or is being done.

You should have a complete or near-complete protocol. If you intend to have your protocol peer-reviewed, this should be done before you begin the registration process. Protocols should be prepared in accordance with PRISMA-P (the PRISMA extension for protocols for systematic reviews), available here.

Confirm that work on the review has not progressed beyond the search phase. We understand that reviewers may have piloted their search strategy and may have screened some of the records returned in the pilot to help refine their inclusion/exclusion criteria before they are ready to submit a protocol registration. Thus, we do not insist that no work has been undertaken in the preparation of the review prior to registration. However, pre-registration is only helpful in reducing the potential for bias if it happens before the main bulk of the work on the review begins. That said, sometimes during the process of conducting a systematic review, new information comes to light and changes to the protocol are deemed necessary. In such cases, an update to the protocol should be submitted to IDESR. This will be date-stamped and published alongside the original protocol. This provides an audit trail for the review, keeping methodological choices open and transparent.

Protocols for completed and/or published systematic reviews should not be registered.

Protocols registered on IDESR should not be registered elsewhere. To maintain the integrity of the protocol registration process, only one version of a protocol registration should exist.

Submissions to IDESR must be in English. Though we are great fans of multilingualism, the IDESR team is not able to process applications in languages other than English. However, search terms may be in any language, and authors can include a link to versions of the protocol in other languages if they wish.

Registering your protocol

To register the protocol for your systematic review, first set up an IDESR account. You can do this here.

Once you have set up your account you will be able to access the protocol registration form. Completion is straightforward and involves entering the relevant information from your protocol into the appropriate fields on the form. A blank version of the form is available here (PDF / Word) to assist with preparing your draft protocol before finalising it for upload.

Once you have completed all the fields and clicked 'submit', you will receive a confirmation email and your application will be sent to one of the IDESR team for review. Once it has been reviewed you will receive an email telling you either a) that the application has been accepted for publication; b) that further information is required before your submission can be accepted; or c) that the application has been rejected (stating the reasons for this).

Note: IDESR is a clearinghouse for registrations of systematic review protocols, not an arbiter of the necessity or merit of any individual review, nor a judge of a review's methodological quality. These are matters for the review team, its funders, and the body providing peer review of the research. The IDESR team is responsible only for confirming that a protocol registration meets the minimum criteria for systematic review protocols, as laid out in PRISMA-P, and for making a permanent, publicly available record of those which do. The decision to accept or reject your application will be made on the basis of these criteria only.

Access to your application will be suspended while it is being processed.

Once accepted, your registration will be converted to a PDF file and added to the IDESR Registry, and it will be made available to view by all users of the site.

Once your registration has been published on the IDESR website, you will be able to update it if necessary. All updates will be date-stamped to provide an audit trail.

On completion of your review you should update the record to say that the review is complete and to provide information about where it has been published. Bibliographic information about your published review will be added to the IDESR Library and will include a link back to your IDESR protocol.

Authors of published protocols retain the copyright to their words and grant IDESR exclusive rights to publish them in the form of an IDESR protocol registration. Authors agree to these being published under a Creative Commons Attribution-NonCommercial-NoDerivs licence. This licence allows others to download your works and share them with others as long as they credit you, but they cannot change them in any way or use them commercially.

Eligibility criteria for inclusion in IDESR library


Your registered records

Tell us about a systematic review that we don't have.

We have searched extensively for systematic reviews in language education, but inevitably we may have missed some. If you know of a systematic review that we don’t have, we want to know about it. First, please check the IDESR database by searching by author name, keyword or title. If the record does not exist in the IDESR database, please use the form below to let us know. Please complete as many fields as possible.

We operationalise systematic reviews as literature reviews that have a methods section. At minimum, the methods section should describe the way that literature was searched for and what the inclusion criteria were. Ideally, a systematic review will adhere to all defining criteria laid out in the PRISMA statement.

Abstract (if available)

About IDESR

The International Database of Education Systematic Reviews (IDESR) is a free-at-the-point-of-access electronic database of published systematic reviews in the field of Education and an online space where planned and ongoing Education systematic reviews can be registered, searched for and viewed.

IDESR is currently in its first phase of development, focusing on cataloguing and registering systematic reviews from one sub-field of Education: language education. This phase of the project has been generously supported by a John Fell Fund award and will serve as proof of concept for future funding bids to support the expansion of IDESR to include systematic reviews from all sub-fields of Education.

Ethical and useful research should build on what is already known, and systematic reviews of existing research are key in this regard. Systematic reviews aim to locate, critically assess, and synthesise the totality of reliable evidence relevant to a particular topic or question. When looking to inform educational policy, decision makers frequently rely on systematic reviews for an up-to-date, quality-assessed overview of available evidence. Teachers, teacher educators and publishers also look to systematic reviews for digestible, evidence-based guidance, seeking to underpin pedagogy and materials development with substantiated insights and approaches, a practice which is ultimately beneficial to their learners and so to society at large. Systematic reviews are thus a crucial element of the knowledge base on which the field of Education is founded. Despite their importance, however, locating systematic reviews in Education can be difficult and time-consuming. A dedicated database of Education systematic reviews is needed to simplify this task.

In addition to a database of completed systematic reviews, Education lacks a platform through which to register ongoing and planned systematic reviews. Prospective registration provides transparency by creating a permanent record of planned systematic reviews, irrespective of whether they are eventually published, and helps to detect and address publication bias (the underreporting of 'unflattering' or 'unexciting' reviews). Prospective registration allows researchers and other users to assess the published versions of systematic reviews against their protocols, helping them to identify instances of poor practice, e.g., outcome switching. Prospective registration also helps those commissioning or planning systematic reviews to identify whether reviews on their chosen topic are already in preparation, thus helping to avoid duplication of effort and facilitating collaboration. While registries for planned systematic reviews exist in other disciplines (e.g., PROSPERO for health-related reviews), until IDESR there has been no such registry in Education.

IDESR is coordinated by Dr Hamish Chalmers of the Department of Education, University of Oxford. Initial set up of IDESR was supported by Dr Jessica Briggs Baffoe-Djan and Jessica Brown (IDESR Research Assistant), and the IDESR advisory group as listed below.

Judy Sebba

The IDESR website was built and is maintained by Gordon Dooley and Metaxis Software Design .

If you have any questions or feedback about IDESR, please contact [email protected]

Below is a list of your records. Records that have been submitted are locked until an administrator has approved or rejected them. Records that have been published may not be edited but may be updated.

For any questions about this site please contact Dr Hamish Chalmers [email protected]

IDESR disclaimer

Disclaimer

To the extent permitted by law, IDESR provides this website and its contents on an "as is" basis and makes no representation or warranty of any kind regarding this website or any information, content, products or services on it. IDESR does not represent or warrant that the information accessible via this website is accurate, complete or current. In no circumstances, to the extent permitted by law, shall IDESR or any of its officers or employees be liable for any loss, additional costs or damage (howsoever arising) suffered as a result of any use of this website or its contents.

Links to third party information

This website includes links to third party websites. These links are used to provide further information and are not intended to signify that IDESR endorses such websites and/or their content. IDESR takes no responsibility for any loss or damage suffered as a result of using the information published on any of the pages of the linked websites.

Published Protocols

This website publishes protocol registrations submitted by users of the IDESR platform. IDESR takes no responsibility for the information contained in those protocols and consequently, IDESR does not and cannot guarantee the accuracy of such information.

Privacy policy

IDESR is committed to protecting your privacy, as an IDESR account holder and/or user of our website. This privacy policy explains how we collect and use personal data we collect from you, or that you provide.

IDESR is housed at the Department of Education, University of Oxford. The key personnel at IDESR are named on the Team page of the IDESR website. The IDESR website was built and is maintained by Metaxis Software Design, and is stored on servers owned by Metaxis.

The IDESR blog, linked to from this website but not part of it, is housed on WordPress.com and is covered by WordPress's privacy policy.

The information we collect

IDESR account holders

When users set up an IDESR account for the purpose of registering protocols of planned and ongoing systematic reviews, we ask for first and last names, email address, institutional affiliation, and geographical location. When an account holder submits a protocol registration form, we ask for the name, institutional affiliation, email address and physical mailing address of the main contact/corresponding author. We also ask for the names, institutional affiliations, and email addresses of any additional authors.

All IDESR users

We use Google Analytics to provide information about site usage. Information such as your IP address and your usage of our website is automatically collected each time you visit.

Our websites use cookies – small text files that are placed on your machine to help the websites provide a better user experience. In general, cookies are used to retain user preferences, store information for things like your protocol registrations, and provide anonymised tracking data to third party applications like Google Analytics. As a rule, cookies will make your browsing experience better. However, you may prefer to disable cookies on this site and on others. The most effective way to do this is to disable cookies in your browser. We suggest consulting the Help section of your browser or taking a look at the About Cookies website which offers guidance for all modern browsers.

How do we use your data?

We use the lawful bases of consent, contract and legitimate interests to process your personal data.

We may use your personal information to:

  • Administer and communicate with you for any reason related to your IDESR account
  • Send you email reminders when updates to any protocols you have registered are due
  • Notify you of any changes to our services
  • Respond to any enquiries made by you

We may use personal information to:

  • Understand how the IDESR website is used by its visitors, via information collected by third party services like Google Analytics
  • Maintain the safety and security of our websites and other online platforms, and to prevent fraud

Sharing your information

We may share your personal information with:

  • Any member of IDESR operational staff. The type of personal data shared will be relevant to the purpose for which the data is used, for example your email address for telling you the outcome of a protocol registration application.
  • Providers of IT services for administration of our websites and management of our internal systems.
  • Analytics and search engine providers such as Google Analytics, to help us to improve our website and your user experience.
  • Regulators, financial organisations, fraud detection and crime prevention agencies. This will be in order to comply with any legal obligations or mandatory reporting requirements.

In addition, accepted protocol registrations will be published on the IDESR website and will include the information provided by the IDESR account holder about the main contact/corresponding author and any additional authors. Published protocol registrations are freely available to view and download by any user of the IDESR website.

In addition, anonymised analytics information (for example geographical locations of users of the site, page views, length of time on the site, etc.) may be used to provide usage reports for IDESR, its funders, and in any applications for funding support in the future. These anonymised data may also be used in scholarly publications about the IDESR project.

Where is your personal data stored?

IDESR is based in the UK and is currently bound by regulations applying to members of the EEA (European Economic Area). We may transfer, store and process the data we collect from you at a destination outside of the EEA. Where it is necessary to do so, we will take all steps reasonably necessary to ensure that appropriate safeguards are in place to treat your data securely and in accordance with this policy. After 31 December 2020 we will review our privacy policy in the light of any changes to the UK's relationship with the EEA.

Information you provide is stored on our in-house servers and with third-party cloud providers. This policy covers processing once your information has been received by IDESR and does not cover any processing which may be carried out by your internet service provider (ISP). Any transmission of data via the internet is not completely secure and is undertaken at your own risk. We recommend that you keep any passwords issued for access to our website and your personal information confidential.

Retention of your information

Unless we inform you otherwise, we will retain your personal information as follows:

  • For as long as is required to provide services you have requested as an IDESR account holder.
  • For as long as is necessary for our own legitimate interests (such as investigating misuse of the IDESR platform)
  • For retention periods in line with legal and regulatory requirements or guidance.
  • Accepted protocol registrations, including the personal data contained therein, are published permanently.

Your rights

You have several rights regarding the collection and use of your personal data. These include, but are not limited to, the right to:

  • Be informed about the collection and processing of your personal data
  • Access your personal information and find out how we process it
  • Object to the processing of your personal information
  • Have inaccurate personal information rectified and incomplete information completed
  • Have personal information erased if we no longer have a lawful basis for retaining or processing it
  • Data portability – obtain and re-use your data in a commonly used, machine-readable format
  • Withdraw consent
  • Object to automated decision making and profiling

These rights do not apply in all circumstances. If you wish to exercise them, we will explain whether they are applicable in that instance.

You have the right to lodge a complaint with the Information Commissioner's Office (ICO). It is usually expected that you would raise your concern with us in the first instance.

Other websites

This privacy policy only covers our websites. Any other websites which are linked to from our websites have their own privacy policies which may differ from ours. We do not accept any responsibility or liability for these policies.

Updates to this Privacy Policy

We may change this privacy policy from time to time in response to changes in the law and/or how we use your personal information. We recommend that you review the contents of this privacy policy regularly. Your continued use of the websites after changes are posted constitutes your acceptance of this agreement as modified.

If you have any questions, comments, requests or complaints about this privacy policy or how we treat your personal data, then please contact the editor of IDESR Dr Hamish Chalmers, in the first instance:

Department of Education, University of Oxford, 15 Norham Gardens, Oxford OX2 6PY

+44 (0)1865 284091

[email protected]

Subject access requests are free. We will respond to your request within one month.

If you believe that we have not met our obligations, you are entitled to contact the Information Commissioner's Office (ICO).



Systematic Reviews in Educational Research

Methodology, Perspectives and Application


Export search results

The export option allows you to export the current search results for the entered query to a file. Different formats are available for download. To export the items, click the button corresponding to the preferred download format.

A logged-in user can export up to 15,000 items. If you're not logged in, you can export no more than 500 items.

To select a subset of the search results, click the "Selective Export" button and make a selection of the items you want to export. The number of items that can be exported at once is subject to the same limits as the full export.

After making a selection, click one of the export format buttons. The number of items that will be exported is indicated in the bubble next to the export format button.

Penn State University Libraries

Systematic reviews in education and psychology: an introductory guide.

  • Systematic Review Essentials (Home)
  • Sample Workflow for Education and Psychology Systematic Reviews
  • Databases for Education and Psychology Systematic Reviews
  • Software Supporting Systematic Reviews
  • Other Review Types

Ellysa Stern Cahoy, Distinguished Librarian, Education Library, and Director, the Pennsylvania Center for the Book


Welcome to Systematic Reviews!

Please note: this guide reuses and builds upon content from a prior library guide on systematic reviews in health care by Penn State librarians Christina Wissinger and Kathleen Phillips. Here, we have adapted their material to assist Education faculty and students.

A systematic review is a comprehensive analysis of all known evidence on a given subject. In the words of Siddaway, Wood, and Hedges (2019), systematic reviews are "methodical, comprehensive, transparent, and replicable." Sometimes conducted for publication in scholarly venues, they are much more rigorous than the literature searches that students usually do when writing course papers. In Education and Psychology, systematic reviews typically include:

  • A clearly defined research question. The research question is often developed after performing preliminary searches on the subject, ensuring that it is viable and that no other systematic reviews exist on the topic.  
  • Evidence of a rigorous search process. Systematic searching demands a carefully planned search strategy, described in the methodology section of the review. This often includes the databases used; the keywords and thesaurus terms searched; and any limits placed on the search.
  • Inclusion and exclusion criteria. Not all evidence found through a search will be relevant to the research question. Clearly defined criteria must be used to decide which studies should be included in analysis.  
  • Critical appraisal of all included studies.

To Learn More About Systematic Reviews in Education and Psychology

  • The Art and Science of Quality Literature Reviews Offers important guidance on conducting systematic reviews.
  • The Concept of a Systematic Review Infographics explaining what systematic reviews are and how they work.
  • High Quality Meta-Analysis in a Systematic Review Explains meta-analysis, an important facet of many systematic reviews.
  • How to Do a Systematic Review Dispels many common misunderstandings about systematic reviews.
  • Systematic Reviews in Educational Research A book length overview of conducting systematic reviews of educational research.
  • What Are Systematic Reviews? A video that explains why systematic reviews are important, how they are done, and how interventions are compared.
  • SAGE Research Methods Core Definitions, book chapters, and other explanations about research methods, including systematic reviews. SAGE Research Methods is a research methods tool created to help researchers, faculty and students with their research projects, linking over 100,000 pages of SAGE's renowned book, journal and reference content with advanced search and discovery tools. Researchers can explore methods concepts to help them design research projects, understand particular methods or identify a new method, conduct their research, and write up their findings. Since SAGE Research Methods focuses on methodology rather than disciplines, it can be used across the social sciences, health sciences, and more. It contains content from more than 640 books, dictionaries, encyclopedias, and handbooks, the entire Little Green Book and Little Blue Book series, two Major Works collating a selection of journal articles, and newly commissioned videos. Our access is to: SRM Core Update 2020-2025; SRM Cases (includes updates through 2025); SRM Cases 2.
  • Credo Information Literacy Tutorials Refreshers on choosing a research topic, primary vs. secondary literature, evaluating sources, and other general skills.

Kogut, A., Foster, M., Ramirez, D., & Xiao, D. (2019). Critical Appraisal of Mathematics Education Systematic Review Search Methods: Implications for Social Sciences Librarians. College & Research Libraries, 80(7), 973–995. https://doi.org/10.5860/crl.80.7.973


  • Last Updated: Sep 8, 2023 12:07 PM
  • URL: https://guides.libraries.psu.edu/edpsyreviews
  • Open access
  • Published: 01 May 2024

The effectiveness of virtual reality training on knowledge, skills and attitudes of health care professionals and students in assessing and treating mental health disorders: a systematic review

  • Cathrine W. Steen,
  • Kerstin Söderström,
  • Bjørn Stensrud,
  • Inger Beate Nylund &
  • Johan Siqveland

BMC Medical Education, volume 24, Article number: 480 (2024)


Background

Virtual reality (VR) training can enhance health professionals’ learning. However, there are ambiguous findings on the effectiveness of VR as an educational tool in mental health. We therefore reviewed the existing literature on the effectiveness of VR training on health professionals’ knowledge, skills, and attitudes in assessing and treating patients with mental health disorders.

Methods

We searched MEDLINE, PsycINFO (via Ovid), the Cochrane Library, ERIC, CINAHL (on EBSCOhost), Web of Science Core Collection, and the Scopus database for studies published from January 1985 to July 2023. We included all studies evaluating the effect of VR training interventions on attitudes, knowledge, and skills pertinent to the assessment and treatment of mental health disorders and published in English or Scandinavian languages. The quality of the evidence in randomized controlled trials was assessed with the Cochrane Risk of Bias Tool 2.0. For non-randomized studies, we assessed the quality of the studies with the ROBINS-I tool.

Results

Of 4170 unique records identified, eight studies were eligible. The four randomized controlled trials were assessed as having some concern or a high risk of overall bias. The four non-randomized studies were assessed as having a moderate to serious overall risk of bias. Of the eight included studies, four used a virtual standardized patient design to simulate training situations, two studies used interactive patient scenario training designs, while two studies used a virtual patient game design. The results suggest that VR training interventions can promote knowledge and skills acquisition.

Conclusions

The findings indicate that VR interventions can effectively train health care personnel to acquire knowledge and skills in the assessment and treatment of mental health disorders. However, study heterogeneity, prevalence of small sample sizes, and many studies with a high or serious risk of bias suggest an uncertain evidence base. Future research on the effectiveness of VR training should include assessment of immersive VR training designs and a focus on more robust studies with larger sample sizes.

Trial registration

This review was pre-registered in the Open Science Framework register with the ID number Z8EDK.


A robustly trained health care workforce is pivotal to forging a resilient health care system [ 1 ], and there is an urgent need to develop innovative methods and emerging technologies for health care workforce education [ 2 ]. Virtual reality technology designs for clinical training have emerged as a promising avenue for increasing the competence of health care professionals, reflecting their potential to provide effective training [ 3 ].

Virtual reality (VR) is a dynamic and diverse field, and can be described as a computer-generated environment that simulates sensory experiences, where user interactions play a role in shaping the course of events within that environment [ 4 ]. When optimally designed, VR gives users the feeling that they are physically within this simulated space, unlocking its potential as a dynamic and immersive learning tool [ 5 ]. The cornerstone of the allure of VR is its capacity for creating artificial settings via sensory deceptions, encapsulated by the term ‘immersion’. Immersion conveys the sensation of being deeply engrossed or enveloped in an alternate world, akin to absorption in a video game. Some VR systems will be more immersive than others, based on the technology used to influence the senses. However, the degree of immersion does not necessarily determine the user’s level of engagement with the application [ 6 ].

A common approach to categorizing VR systems is based on the design of the technology used, allowing them to be classified into: 1) non-immersive desktop systems, where users experience virtual environments through a computer screen, 2) immersive CAVE systems with large projected images and motion trackers to adjust the image to the user, and 3) fully immersive head-mounted display systems that involve users wearing a headset that fully covers their eyes and ears, thus entirely immersing them in the virtual environment [ 7 ]. Advances in VR technology have enabled a wide range of VR experiences. The possibility for health care professionals to repeatedly practice clinical skills with virtual patients in a risk-free environment offers an invaluable learning platform for health care education.

The impact of VR training on health care professionals’ learning has predominantly been researched in terms of the enhancement of technical surgical abilities. This includes refining procedural planning, familiarizing oneself with medical instruments, and practicing psychomotor skills such as dexterity, accuracy, and speed [ 8 , 9 ]. In contrast, the exploration of VR training in fostering non-technical or ‘soft’ skills, such as communication and teamwork, appears to be less prevalent [ 10 ]. A recent systematic review evaluated the outcomes of VR training in non-technical skills across various medical specialties [ 11 ], focusing on vital cognitive abilities (e.g., situation awareness, decision-making) and interprofessional social competencies (e.g., teamwork, conflict resolution, leadership). These skills are pivotal in promoting collaboration among colleagues and ensuring a safe health care environment. However, they are not sufficiently comprehensive for encounters with patients with mental health disorders.

For health care professionals providing care to patients with mental health disorders, acquiring specific skills, knowledge, and empathic attitudes is of utmost importance. Many individuals experiencing mental health challenges may find it difficult to communicate their thoughts and feelings, and it is therefore essential for health care providers to cultivate an environment where patients feel safe and encouraged to share feelings and thoughts. Beyond fostering trust, health care professionals must also possess in-depth knowledge about the nature and treatment of various mental health disorders. Moreover, they must actively practice and internalize the skills necessary to translate their knowledge into clinical practice. While the conventional approach to training mental health clinical skills has been through simulation or role-playing with peers under expert supervision and practicing with real patients, the emergence of VR applications presents a compelling alternative. This technology promises a potentially transformative way to train mental health professionals. Our review identifies specific outcomes in knowledge, skills, and attitudes, covering areas from theoretical understanding to practical application and patient interaction. By focusing on these measurable concepts, which are in line with current healthcare education guidelines [ 12 ], we aim to contribute to the knowledge base and provide a detailed analysis of the complexities in mental health care training. This approach is designed to highlight the VR training’s practical relevance alongside its contribution to academic discourse.

A recent systematic review evaluated the effects of virtual patient (VP) interventions on knowledge, skills, and attitudes in undergraduate psychiatry education [ 13 ]. This review’s scope is limited to assessing VP interventions and does not cover other types of VR training interventions. Furthermore, it adopts a classification of VP different from our review, rendering their findings and conclusions not directly comparable to ours.

To the best of our knowledge, no systematic review has assessed and summarized the effectiveness of VR training interventions for health professionals in the assessment and treatment of mental health disorders. This systematic review addresses that gap by exploring the effectiveness of VR training on the knowledge, skills, and attitudes that health professionals need to master in the assessment and treatment of mental health disorders.

Methods

This systematic review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [ 14 ]. The protocol of the systematic review was registered in the Open Science Framework register with the registration ID Z8EDK.

We included randomized controlled trials, cohort studies, and pretest–posttest studies that met the following criteria: a) a population of health care professionals or health care students, b) assessment of the effectiveness of a VR application for training in assessing and treating mental health disorders, and c) reported changes in knowledge, skills, or attitudes. We excluded studies evaluating VR interventions not designed for training in assessing and treating mental health disorders (e.g., training of surgical skills), studies evaluating VR training from the first-person perspective, studies that used VR interventions for non-educational purposes, and studies where VR interventions trained patients with mental health problems (e.g., social skills training). We also excluded studies not published in English or Scandinavian languages.

Search strategy

The literature search reporting was guided by relevant items in PRISMA-S [ 15 ]. In collaboration with a senior academic librarian (IBN), we developed the search strategy for the systematic review. Inspired by the ‘pearl harvesting’ information retrieval approach [ 16 ], we anticipated a broad spectrum of terms related to our interdisciplinary query. Recognizing that various terminologies could encapsulate our central ideas, we harvested an array of terms for each of the four elements ‘health care professionals and health care students’, ‘VR’, ‘training’, and ‘mental health’. The pearl harvesting framework [ 16 ] consists of four steps, which we followed with some minor adaptations. Step 1: We searched for and sampled a set of relevant research articles, a book chapter, and literature reviews. Step 2: The librarian scrutinized titles, abstracts, and author keywords, as well as subject headings used in databases, and collected relevant terms. Step 3: The librarian refined the lists of terms. Step 4: The review group, in collaboration with a VR consultant from KildeGruppen AS (a Norwegian media company), validated the refined lists of terms to ensure they included all relevant VR search terms. For the VR element, this process resulted in the inclusion of search terms such as ‘3D simulated environment’, ‘second life simulation’, ‘virtual patient’, and ‘virtual world’. The search strategy was peer-reviewed by an academic librarian at Inland Norway University of Applied Sciences.
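The resulting four-element AND/OR structure can be sketched as follows. This is a minimal illustration with abbreviated, partly hypothetical term lists, not the full published strategy (which is available at DataverseNO):

```python
# Illustrative sketch of the four-element search structure described above.
# The term lists are abbreviated examples, not the published strategy.
blocks = {
    "population": ["health personnel", "health care student*", "nurs*"],
    "vr": ["virtual reality", "virtual patient", "3D simulated environment",
           "second life simulation", "virtual world"],
    "training": ["train*", "educat*", "simulat*"],
    "mental health": ["mental health", "mental disorder*", "psychiatr*"],
}

def or_block(terms):
    # Each element is a disjunction of (near-)synonymous terms.
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# The four elements are then combined conjunctively.
query = " AND ".join(or_block(terms) for terms in blocks.values())
print(query)
```

In practice each database requires its own syntax for truncation, phrases, and subject headings, which is why the published strategies are documented per database.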

In June and July 2021, we performed comprehensive searches for publications dating from January 1985 to the present. This period was chosen because VR systems designed for training in health care first emerged in the early 1990s. The searches were carried out in seven databases: MEDLINE and PsycINFO (on Ovid), ERIC and CINAHL (on EBSCOhost), the Cochrane Library, Web of Science Core Collection, and Scopus. Detailed search strategies for each database are publicly available at DataverseNO [ 17 ]. On July 2, 2021, a search in CINAHL yielded 993 hits, but only 982 records could be transferred to EndNote via ‘Folder View’, a feature for organizing and managing selected records before export. The process was repeated twice with the same discrepancy, and the reason the remaining 11 records failed to transfer could not be determined; their loss poses a risk that relevant studies were missed in the initial search. In July 2023, to capture the latest publications, we updated our initial searches, focusing on entries added since January 1, 2021. Due to a lack of access to the Cochrane Library in July 2023, we used EBMR (Evidence Based Medicine Reviews) on the Ovid platform instead, including the databases Cochrane Central Register of Controlled Trials, Cochrane Database of Systematic Reviews, and Cochrane Clinical Answers. All references were exported to EndNote and duplicates were removed. The number of records from each database is shown in the PRISMA diagram [ 14 ], Fig.  1 .

figure 1

PRISMA flow chart of the records and study selection process

Study selection and data collection

Two reviewers (JS, CWS) independently assessed the titles and abstracts of studies retrieved from the literature search based on the eligibility criteria. We employed the Rayyan website for the screening process [ 18 ]. The same reviewers (JS, CWS) assessed the full-text articles selected after the initial screening. Articles meeting the eligibility criteria were incorporated into the review. Any disagreements were resolved through discussion.

Data extracted from the studies by the first author (CWS) and cross-checked by another reviewer (JS) included: authors of the study, publication year, country, study design, participant details (education, setting), interventions (VR system, class label), comparison types, outcomes, and main findings. This data is summarized in Table  1 and Additional file 1 . In the process of reviewing the VR interventions utilized within the included studies, we sought expertise from advisers associated with VRINN, a Norwegian immersive learning cluster, and SIMInnlandet, a center dedicated to simulation in mental health care at Innlandet Hospital Trust. This collaboration ensured a thorough examination and accurate categorization of the VR technologies applied. Furthermore, the classification of the learning designs employed in the VP interventions was conducted under the guidance of an experienced VP scholar at Paracelsus Medical University in Salzburg.

Data analysis

We initially intended to perform a meta-analysis with knowledge, skills, and attitudes as primary outcomes, planning separate analyses for each. However, due to significant heterogeneity observed among the included studies, it was not feasible to carry out a meta-analysis. Consequently, we opted for a narrative synthesis based on these pre-determined outcomes of knowledge, skills, and attitudes. This approach allowed for an analysis of the relationships both within and between the studies. The effect sizes were calculated using a web-based effect size calculator [ 27 ]. We have interpreted effect sizes based on commonly used descriptions for Cohen’s d: small = 0.2, moderate = 0.5, and large = 0.8, and for Cramer’s V: small = 0.10, medium = 0.30, and large = 0.50.
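As a transparency check on these conventions, the standard formulas behind the reported effect sizes can be sketched in a few lines. This is our own illustration with a hypothetical example; the review itself used a web-based calculator [ 27 ]:

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

def cramers_v(chi2, n, rows, cols):
    """Cramer's V for an r x c contingency table, from the chi-square statistic."""
    return math.sqrt(chi2 / (n * (min(rows, cols) - 1)))

def describe_d(d):
    """Map |d| onto the conventional labels used in this review."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "moderate"
    if d >= 0.2:
        return "small"
    return "negligible"

# Hypothetical example: posttest means 10 vs 9, SD 2, n = 20 per group.
d = cohens_d(10, 2, 20, 9, 2, 20)  # d = 0.5, i.e. a moderate effect
```

The cut-offs are conventions rather than sharp boundaries, which is why we report the numeric effect sizes alongside the verbal labels.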

Risk of bias assessment

JS and CWS independently evaluated the risk of bias for all studies using two distinct assessment tools. We used the Cochrane risk of bias tool RoB 2 [ 28 ] to assess the risk of bias in the RCTs. With the RoB 2 tool, bias was rated as low, some concerns, or high for five domains: the randomization process, deviations from the intended interventions, missing outcome data, measurement of the outcome, and selection of the reported result [ 28 ].

We used the Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) tool [ 29 ] to assess the risk of bias in the cohort and single-group studies. With ROBINS-I, the risk of bias in the non-randomized studies was rated as low, moderate, serious, critical, or no information for seven domains: confounding, selection of participants, classification of interventions, deviations from intended interventions, missing data, measurement of outcomes, and selection of the reported result [ 29 ].

Results

We included eight studies in the review (Fig.  1 ). An overview of the included studies is presented in detail in Table  1 .

Four studies were RCTs [ 19 , 20 , 21 , 22 ], two were single group pretest–posttest studies [ 23 , 26 ], one was a controlled before and after study [ 25 ], and one was a cohort study [ 24 ]. The studies included health professionals from diverse educational backgrounds, including some from mental health and medical services, as well as students in medicine, social work, and nursing. All studies, published from 2009 to 2021, utilized non-immersive VR desktop system interventions featuring various forms of VP designs. Based on an updated classification of VP interventions by Kononowicz et al. [ 30 ] developed from a model proposed by Talbot et al. [ 31 ], we have described the characteristics of the interventions in Table  1 . Four of the studies utilized a virtual standardized patient (VSP) intervention [ 20 , 21 , 22 , 23 ], a conversational agent that simulates clinical presentations for training purposes. Two studies employed an interactive patient scenario (IPS) design [ 25 , 26 ], an approach that primarily uses text-based multimedia, enhanced with images and case histories through text or voice narratives, to simulate clinical scenarios. Lastly, two studies used a virtual patient game (VP game) intervention [ 19 , 24 ]. These interventions feature training scenarios using 3D avatars, specifically designed to improve clinical reasoning and team training skills. It should be noted that the interventions classified as VSPs in this review, being a few years old, do not encompass artificial intelligence (AI) as we interpret it today. However, since the interventions include some kind of algorithm that provides answers to questions, we consider them as conversational agents, and therefore as VSPs. As the eight included studies varied significantly in terms of design, interventions, and outcome measures, we could not incorporate them into a meta-analysis.

The overall risk of bias for the four RCTs was high [ 19 , 20 , 22 ] or of some concerns [ 21 ] (Fig.  2 ). All were assessed as low or of some concerns in the randomization domain. Three studies were assessed as having a high risk of bias in one [ 19 , 20 ] or two domains [ 22 ]: one study had a high risk of bias in the domain of selection of the reported result [ 19 ], one in the domain of measurement of the outcome [ 20 ], and one in the domains of deviations from the intended interventions and missing outcome data [ 22 ]. One study was not assessed as having a high risk of bias in any domain [ 21 ].

figure 2

Risk of bias summary: review authors’ assessments of each risk of bias item in the included RCT studies

For the four non-randomized studies, the overall risk of bias was judged to be moderate [ 26 ] or serious [ 23 , 24 , 25 ] (Fig.  3 ). One study had a serious risk of bias in two domains: confounding and measurement of outcomes [ 23 ]. Two studies had a serious risk of bias in one domain, namely confounding [ 24 , 25 ], while one study was judged not to have a serious risk of bias in any domain [ 26 ].

figure 3

Risk of bias summary: review authors’ assessments of each risk of bias item in the included non-randomized studies

Knowledge

Three studies investigated the impact of VR training on mental health knowledge [ 24 , 25 , 26 ]. One study, with 32 resident psychiatrists in a single-group pretest–posttest design, assessed the effect of a VR training intervention on knowledge of posttraumatic stress disorder (PTSD) symptomatology, clinical management, and communication skills [ 26 ]. The intervention consisted of an IPS. The outcome was assessed with an 11-item multiple-choice knowledge test administered before and after the intervention. This study reported a significant improvement on the knowledge test after the VR training intervention.

The second study examined the effect of a VR training intervention on knowledge of dementia [ 25 ], employing a controlled before-and-after design. Seventy-nine medical students in clinical training were divided into two groups following a traditional learning program. The experimental group received an IPS intervention. The outcome was evaluated with a knowledge test administered before and after the intervention; posttest scores were significantly higher in the experimental group than in the control group, with a moderate effect size observed between the groups.

A third study evaluated the effect of a VR training intervention on 299 undergraduate nursing students’ diagnostic recognition of depression and schizophrenia (classified as knowledge) [ 24 ]. In a prospective cohort design, the VR intervention was the only difference in the mental health-related educational content provided to the two cohorts. It consisted of a VP game design developed to simulate training situations with virtual patient case scenarios, including depression and schizophrenia. The outcome was assessed by determining the accuracy of diagnoses made after reviewing case vignettes of depression and schizophrenia. The study found no statistically significant difference in diagnostic accuracy between the simulation and non-simulation cohorts.

Summary: All three studies assessing the effect of a VR intervention on knowledge were non-randomized studies with different study designs and different outcome measures. Two studies used an IPS design, while one used a VP game design. Two of the studies found a significant effect of VR training on knowledge; of these, one had a moderate overall risk of bias [ 26 ], while the other was assessed as having a serious overall risk of bias [ 25 ]. The third study, which did not find any effect of the VR intervention on knowledge, was assessed as having a serious risk of bias [ 24 ].

Skills

Three RCTs assessed the effectiveness of VR training on skills [ 20 , 21 , 22 ]. One of them evaluated the effect of VR training on clinical skills in alcohol screening and intervention [ 20 ]. In this study, 102 health care professionals were randomly allocated to either a group receiving no training or a group receiving a VSP intervention. To evaluate the outcome, three standardized patients rated each participant using a checklist based on clinical criteria. The VSP intervention group demonstrated significantly improved posttest skills in alcohol screening and brief intervention compared to the control group, with moderate and small effect sizes, respectively.

Another RCT, including 67 medical college students, evaluated the effect of VR training on clinical skills by comparing the frequency of questions asked about suicide in a VSP intervention group and a video module group [ 21 ]. The assessment of the outcome was a psychiatric interview with a standardized patient. The primary outcome was the frequency with which the students asked the standardized patient five questions about suicide risk. Minimal to small effect sizes were noted in favor of the VSP intervention, though they did not achieve statistical significance for any outcomes.

One posttest-only RCT evaluated the effect of three training programs on skills in detecting and diagnosing major depressive disorder and posttraumatic stress disorder (PTSD) [ 22 ]. The study included 30 family physicians and featured three interventions: two different VSPs designed to simulate training situations and one text-based program. A diagnostic form filled in by the participants after the intervention was used to assess the outcome. The results revealed a significant effect on diagnostic accuracy for major depressive disorder for both groups receiving VR training, compared to the text-based program, with large effect sizes observed. For PTSD, the intervention using a fixed avatar significantly improved diagnostic accuracy with a large effect size, whereas the intervention with a choice avatar demonstrated a moderate to large effect size compared to the text-based program.

Summary: Three RCTs assessed the effectiveness of VR training on clinical skills [ 20 , 21 , 22 ], all of which used a VSP design. To evaluate the effect of training, two of the studies utilized standardized patients with checklists, while the third measured the effect on skills using a diagnostic form completed by the participants. Two of the studies found a significant effect on skills [ 20 , 22 ]; both were assessed as having a high risk of bias. The third study, which did not find any effect of VR training on skills, was rated as having some concerns for risk of bias [ 21 ].

Knowledge and skills

One RCT with 227 health care professionals assessed knowledge and skills as a combined outcome compared to a waitlist control group, using a self-report survey before and after the VR training [ 19 ]. The training intervention was a VP game designed for practicing knowledge and skills related to mental health and substance abuse disorders. To assess the effect of the training, participants completed a self-report scale measuring perceived knowledge and skills. Changes between presimulation and postsimulation scores were reported only for the treatment group ( n  = 117), where the composite postsimulation score was significantly higher than the presimulation score, with a large effect size observed. The study was judged to have a high risk of bias in the domain of selection of the reported result.

Attitudes

One single-group pretest–posttest study with 100 social work and nursing students assessed the effect of VSP training on attitudes towards individuals with substance abuse disorders [ 23 ]. To assess the effect of the training, participants completed an online pretest and posttest survey including questions from a substance abuse attitudes survey. This study found no significant effect of VR training on attitudes and was assessed as having a serious risk of bias.

Perceived competence

The same single group pretest–posttest study also assessed the effect of a VSP training intervention on perceived competence in screening, brief intervention, and referral to treatment in encounters with patients with substance abuse disorders [ 23 ]. A commonly accepted definition of competence is that it comprises integrated components of knowledge, skills, and attitudes that enable the successful execution of a professional task [ 32 ]. To assess the effect of the training, participants completed an online pretest and posttest survey including questions on perceived competence. The study findings demonstrated a significant increase in perceived competence following the VSP intervention. The risk of bias in this study was judged as serious.

Discussion

This systematic review aimed to investigate the effectiveness of VR training on knowledge, skills, and attitudes that health professionals need to master in the assessment and treatment of mental health disorders. A narrative synthesis of eight included studies identified VR training interventions that varied in design and educational content. Although mixed results emerged, most studies reported improvements in knowledge and skills after VR training.

We found that all interventions utilized some type of VP design, predominantly VSP interventions. Although our review includes a limited number of studies, it is noteworthy that the distribution of interventions contrasts with a literature review on the use of ‘virtual patient’ in health care education from 2015 [ 30 ], which identified IPS as the most frequent intervention. This variation may stem from our review’s focus on the mental health field, suggesting a different intervention need and distribution than that observed in general medical education. A fundamental aspect of mental health education involves training skills needed for interpersonal communication, clinical interviews, and symptom assessment, which makes VSPs particularly appropriate. While VP games may be suitable for clinical reasoning in medical fields, offering the opportunity to perform technical medical procedures in a virtual environment, these designs may present some limitations for skills training in mental health education. Notably, avatars in a VP game do not comprehend natural language and are incapable of engaging in conversations. Therefore, the continued advancement of conversational agents like VSPs is particularly compelling and considered by scholars to hold the greatest potential for clinical skills training in mental health education [ 3 ]. VSPs, equipped with AI dialogue capabilities, are particularly valuable for repetitive practice in key skills such as interviewing and counseling [ 31 ], which are crucial in the assessment and treatment of mental health disorders. VSPs could also be a valuable tool for the implementation of training methods in mental health education, such as deliberate practice, a method that has gained attention in psychotherapy training in recent years [ 33 ] for its effectiveness in refining specific performance areas through consistent repetition [ 34 ]. 
Within this evolving landscape, AI system-based large language models (LLMs) like ChatGPT stand out as a promising innovation. Developed from extensive datasets that include billions of words from a variety of sources, these models possess the ability to generate and understand text in a manner akin to human interaction [ 35 ]. The integration of LLMs into educational contexts shows promise, yet careful consideration and thorough evaluation of their limitations are essential [ 36 ]. One concern regarding LLMs is the possibility of generating inaccurate information, which represents a challenge in healthcare education where precision is crucial [ 37 ]. Furthermore, the use of generative AI raises ethical questions, notably because of potential biases in the training datasets, including content from books and the internet that may not have been verified, thereby risking the perpetuation of these biases [ 38 ]. Developing strategies to mitigate these challenges is imperative, ensuring LLMs are utilized safely in healthcare education.

All interventions in our review were based on non-immersive desktop VR systems, which is somewhat surprising considering the growing body of literature highlighting the impact of immersive VR technology in education, as exemplified by reviews such as that of Radianti et al. [ 39 ]. Furthermore, given the recent accessibility of affordable, high-quality head-mounted displays, this observation is noteworthy. Research has indicated that immersive learning based on head-mounted displays generally yields better learning outcomes than non-immersive approaches [ 40 ], making it an interesting research area in mental health care training and education. Studies using immersive interventions were excluded in the present review because of methodological concerns, paralleling findings described in a systematic review on immersive VR in education [ 41 ], suggesting the potential early stage of research within this field. Moreover, the integration of immersive VR technology into mental health care education may encounter challenges associated with complex ethical and regulatory frameworks, including data privacy concerns exemplified by the Oculus VR headset-Facebook integration, which could restrict the implementation of this technology in healthcare settings. Prioritizing specific training methodologies for enhancing skills may also affect the utilization of immersive VR in mental health education. For example, integrating interactive VSPs into a fully immersive VR environment remains a costly endeavor, potentially limiting the widespread adoption of immersive VR in mental health care. Meanwhile, the use of 360-degree videos in immersive VR environments for training purposes [ 42 ] can be realized with a significantly lower budget. Immersive VR offers promising opportunities for innovative training, but realizing its full potential in mental health care education requires broader research validation and the resolution of existing obstacles.

This review bears some resemblance to the systematic review by Jensen et al. on virtual patients in undergraduate psychiatry education [ 13 ] from 2024, which found that virtual patients improved learning outcomes compared to traditional methods. However, these authors’ expansion of the commonly used definition of virtual patient makes their results difficult to compare with the findings in the present review. A recognized challenge in understanding VR application in health care training arises from the literature on VR training for health care personnel, where ‘virtual patient’ is a term broadly used to describe a diverse range of VR interventions, which vary significantly in technology and educational design [ 3 , 30 ]. For instance, reviews might group different interventions using various VR systems and designs under a single label (virtual patient), or primary studies may use misleading or inadequately defined classifications for the virtual patient interventions evaluated. Clarifying the similarities and differences among these interventions is vital to inform development and enhance communication and understanding in educational contexts [ 43 ].

Strengths and limitations

To the best of our knowledge, this is the first systematic review to evaluate the effectiveness of VR training on knowledge, skills, and attitudes in health care professionals and students in assessing and treating mental health disorders. This review therefore provides valuable insights into the use of VR technology in training and education for mental health care. Another strength of this review is the comprehensive search strategy developed by a senior academic librarian at Inland Norway University of Applied Sciences (HINN) and the authors in collaboration with an adviser from KildeGruppen AS (a Norwegian media company). The search strategy was peer-reviewed by an academic librarian at HINN. Advisers from VRINN (an immersive learning cluster in Norway) and SIMInnlandet (a center for simulation in mental health care at Innlandet Hospital Trust) provided assistance in reviewing the VR systems of the studies, while the classification of the learning designs was conducted under the guidance of a VP scholar. This systematic review relies on an established and recognized classification of VR interventions for training health care personnel and may enhance understanding of the effectiveness of VR interventions designed for the training of mental health care personnel.

This review has some limitations. Because we aimed to measure the effect of VR interventions alone rather than the effect of blended training designs, the selection of included studies was limited, and studies not covered in this review might have offered different insights. Given that blended learning designs, in which technology is combined with other forms of learning, have significant positive effects on learning outcomes [ 44 ], this choice also means we could not evaluate interventions that may be more effective in clinical settings. Further, by limiting the outcomes to knowledge, skills, and attitudes, we might have missed insights into other outcomes that are pivotal to competence acquisition.

Limitations in many of the included studies necessitate cautious interpretation of the review’s findings. Small sample sizes and weak designs in several studies, coupled with the use of non-validated outcome measures in some studies, diminish the robustness of the findings. Furthermore, the risk of bias assessment in this review indicates a predominantly high or serious risk of bias across most of the studies, regardless of their design. In addition, the heterogeneity of the studies in terms of study design, interventions, and outcome measures prevented us from conducting a meta-analysis.

Further research

Future research on the effectiveness of VR training for specific learning outcomes in assessing and treating mental health disorders should encompass more rigorous experimental studies with larger sample sizes. These studies should include verifiable descriptions of the VR interventions and employ validated tools to measure outcomes. Moreover, considering that much professional learning involves interactive and reflective practice, research on VR training would probably be enhanced by more in-depth study designs that evaluate not only the immediate learning outcomes of VR training but also the broader learning processes associated with it. Future research should also concentrate on immersive VR training applications, while additionally exploring the integration of large language models to augment interactive learning in mental health care. Finally, this review underscores the necessity, in health education research involving VR, of communicating research findings using agreed terms and classifications, with the aim of providing a clearer and more comprehensive understanding of the research.

This systematic review investigated the effect of VR training interventions on knowledge, skills, and attitudes in the assessment and treatment of mental health disorders. The results suggest that VR training interventions can promote knowledge and skills acquisition. Further studies are needed to evaluate VR training interventions as a learning tool for mental health care providers. This review emphasizes the necessity to improve future study designs. Additionally, intervention studies of immersive VR applications are lacking in current research and should be a future area of focus.

Availability of data and materials

Detailed search strategies for each database are available in the DataverseNO repository, https://doi.org/10.18710/TI1E0O .

Abbreviations

VR: Virtual Reality

CAVE: Cave Automatic Virtual Environment

RCT: Randomized Controlled Trial

NRS: Non-Randomized Study

VSP: Virtual Standardized Patient

IPS: Interactive Patient Scenario

VP: Virtual Patient

PTSD: Post-Traumatic Stress Disorder

SP: Standardized Patient

AI: Artificial Intelligence

HINN: Inland Norway University of Applied Sciences

PhD: Doctor of Philosophy

Frenk J, Chen L, Bhutta ZA, Cohen J, Crisp N, Evans T, et al. Health professionals for a new century: transforming education to strengthen health systems in an interdependent world. Lancet. 2010;376(9756):1923–58.


World Health Organization. eLearning for undergraduate health professional education: a systematic review informing a radical transformation of health workforce development. Geneva: World Health Organization; 2015.


Talbot T, Rizzo AS. Virtual human standardized patients for clinical training. In: Rizzo AS, Bouchard S, editors. Virtual reality for psychological and neurocognitive interventions. New York: Springer; 2019. p. 387–405.


Merriam-Webster dictionary. Springfield: Merriam-Webster Incorporated; c2024. Virtual reality. Available from: https://www.merriam-webster.com/dictionary/virtual%20reality . [cited 2024 Mar 24].

Winn W. A conceptual basis for educational applications of virtual reality. Technical Publication R-93–9. Seattle: Human Interface Technology Laboratory, University of Washington; 1993.

Bouchard S, Rizzo AS. Applications of virtual reality in clinical psychology and clinical cognitive neuroscience–an introduction. In: Rizzo AS, Bouchard S, editors. Virtual reality for psychological and neurocognitive interventions. New York: Springer; 2019. p. 1–13.

Waller D, Hodgson E. Sensory contributions to spatial knowledge of real and virtual environments. In: Steinicke F, Visell Y, Campos J, Lécuyer A, editors. Human walking in virtual environments: perception, technology, and applications. New York: Springer New York; 2013. p. 3–26. https://doi.org/10.1007/978-1-4419-8432-6_1 .

Choudhury N, Gélinas-Phaneuf N, Delorme S, Del Maestro R. Fundamentals of neurosurgery: virtual reality tasks for training and evaluation of technical skills. World Neurosurg. 2013;80(5):e9–19. https://doi.org/10.1016/j.wneu.2012.08.022 .

Gallagher AG, Ritter EM, Champion H, Higgins G, Fried MP, Moses G, et al. Virtual reality simulation for the operating room: proficiency-based training as a paradigm shift in surgical skills training. Ann Surg. 2005;241(2):364–72. https://doi.org/10.1097/01.sla.0000151982.85062.80 .

Kyaw BM, Saxena N, Posadzki P, Vseteckova J, Nikolaou CK, George PP, et al. Virtual reality for health professions education: systematic review and meta-analysis by the Digital Health Education Collaboration. J Med Internet Res. 2019;21(1):e12959. https://doi.org/10.2196/12959 .

Bracq M-S, Michinov E, Jannin P. Virtual reality simulation in nontechnical skills training for healthcare professionals: a systematic review. Simul Healthc. 2019;14(3):188–94. https://doi.org/10.1097/sih.0000000000000347 .

World Health Organization. Transforming and scaling up health professionals’ education and training: World Health Organization guidelines 2013. Geneva: World Health Organization; 2013. Available from: https://www.who.int/publications/i/item/transforming-and-scaling-up-health-professionals%E2%80%99-education-and-training . Accessed 15 Jan 2024.

Jensen RAA, Musaeus P, Pedersen K. Virtual patients in undergraduate psychiatry education: a systematic review and synthesis. Adv Health Sci Educ. 2024;29(1):329–47. https://doi.org/10.1007/s10459-023-10247-6 .

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Syst Rev. 2021;10(1):89. https://doi.org/10.1186/s13643-021-01626-4 .

Rethlefsen ML, Kirtley S, Waffenschmidt S, Ayala AP, Moher D, Page MJ, et al. PRISMA-S: an extension to the PRISMA statement for reporting literature searches in systematic reviews. Syst Rev. 2021;10(1):39. https://doi.org/10.1186/s13643-020-01542-z .

Sandieson RW, Kirkpatrick LC, Sandieson RM, Zimmerman W. Harnessing the power of education research databases with the pearl-harvesting methodological framework for information retrieval. J Spec Educ. 2010;44(3):161–75. https://doi.org/10.1177/0022466909349144 .

Steen CW, Söderström K, Stensrud B, Nylund IB, Siqveland J. Replication data for: the effectiveness of virtual reality training on knowledge, skills and attitudes of health care professionals and students in assessing and treating mental health disorders: a systematic review. V1. DataverseNO; 2024. https://doi.org/10.18710/TI1E0O .

Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan—a web and mobile app for systematic reviews. Syst Rev. 2016;5(1):210. https://doi.org/10.1186/s13643-016-0384-4 .

Albright G, Bryan C, Adam C, McMillan J, Shockley K. Using virtual patient simulations to prepare primary health care professionals to conduct substance use and mental health screening and brief intervention. J Am Psych Nurses Assoc. 2018;24(3):247–59. https://doi.org/10.1177/1078390317719321 .

Fleming M, Olsen D, Stathes H, Boteler L, Grossberg P, Pfeifer J, et al. Virtual reality skills training for health care professionals in alcohol screening and brief intervention. J Am Board Fam Med. 2009;22(4):387–98. https://doi.org/10.3122/jabfm.2009.04.080208 .

Foster A, Chaudhary N, Murphy J, Lok B, Waller J, Buckley PF. The use of simulation to teach suicide risk assessment to health profession trainees—rationale, methodology, and a proof of concept demonstration with a virtual patient. Acad Psych. 2015;39:620–9. https://doi.org/10.1007/s40596-014-0185-9 .

Satter R. Diagnosing mental health disorders in primary care: evaluation of a new training tool [dissertation]. Tempe (AZ): Arizona State University; 2012.

Hitchcock LI, King DM, Johnson K, Cohen H, McPherson TL. Learning outcomes for adolescent SBIRT simulation training in social work and nursing education. J Soc Work Pract Addict. 2019;19(1/2):47–56. https://doi.org/10.1080/1533256X.2019.1591781 .

Liu W. Virtual simulation in undergraduate nursing education: effects on students’ correct recognition of and causative beliefs about mental disorders. Comput Inform Nurs. 2021;39(11):616–26. https://doi.org/10.1097/CIN.0000000000000745 .

Matsumura Y, Shinno H, Mori T, Nakamura Y. Simulating clinical psychiatry for medical students: a comprehensive clinic simulator with virtual patients and an electronic medical record system. Acad Psych. 2018;42(5):613–21. https://doi.org/10.1007/s40596-017-0860-8 .

Pantziaras I, Fors U, Ekblad S. Training with virtual patients in transcultural psychiatry: Do the learners actually learn? J Med Internet Res. 2015;17(2):e46. https://doi.org/10.2196/jmir.3497 .

Wilson DB. Practical meta-analysis effect size calculator [Online calculator]. n.d. https://campbellcollaboration.org/research-resources/effect-size-calculator.html . Accessed 08 March 2024.

Sterne JA, Savović J, Page MJ, Elbers RG, Blencowe NS, Boutron I, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. Br Med J. 2019;366:l4898.

Sterne JA, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. Br Med J. 2016;355:i4919. https://doi.org/10.1136/bmj.i4919 .

Kononowicz AA, Zary N, Edelbring S, Corral J, Hege I. Virtual patients - what are we talking about? A framework to classify the meanings of the term in healthcare education. BMC Med Educ. 2015;15(1):11. https://doi.org/10.1186/s12909-015-0296-3 .

Talbot TB, Sagae K, John B, Rizzo AA. Sorting out the virtual patient: how to exploit artificial intelligence, game technology and sound educational practices to create engaging role-playing simulations. Int J Gaming Comput-Mediat Simul. 2012;4(3):1–19.

Baartman LKJ, de Bruijn E. Integrating knowledge, skills and attitudes: conceptualising learning processes towards vocational competence. Educ Res Rev. 2011;6(2):125–34. https://doi.org/10.1016/j.edurev.2011.03.001 .

Mahon D. A scoping review of deliberate practice in the acquisition of therapeutic skills and practices. Couns Psychother Res. 2023;23(4):965–81. https://doi.org/10.1002/capr.12601 .

Ericsson KA, Lehmann AC. Expert and exceptional performance: evidence of maximal adaptation to task constraints. Annu Rev Psychol. 1996;47(1):273–305.

Roumeliotis KI, Tselikas ND. ChatGPT and Open-AI models: a preliminary review. Future Internet. 2023;15(6):192. https://doi.org/10.3390/fi15060192 .

Kasneci E, Sessler K, Küchemann S, Bannert M, Dementieva D, Fischer F, et al. ChatGPT for good? On opportunities and challenges of large language models for education. Learn Individ Differ. 2023;103:102274. https://doi.org/10.1016/j.lindif.2023.102274 .

Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW. Large language models in medicine. Nat Med. 2023;29(8):1930–40. https://doi.org/10.1038/s41591-023-02448-8 .

Touvron H, Lavril T, Gautier I, Martinet X, Marie-Anne L, Lacroix T, et al. LLaMA: open and efficient foundation language models. arXiv. 2023;2302.13971. https://doi.org/10.48550/arxiv.2302.13971 .

Radianti J, Majchrzak TA, Fromm J, Wohlgenannt I. A systematic review of immersive virtual reality applications for higher education: design elements, lessons learned, and research agenda. Comput Educ. 2020;147:103778. https://doi.org/10.1016/j.compedu.2019.103778 .

Wu B, Yu X, Gu X. Effectiveness of immersive virtual reality using head-mounted displays on learning performance: a meta-analysis. Br J Educ Technol. 2020;51(6):1991–2005. https://doi.org/10.1111/bjet.13023 .

Di Natale AF, Repetto C, Riva G, Villani D. Immersive virtual reality in K-12 and higher education: a 10-year systematic review of empirical research. Br J Educ Technol. 2020;51(6):2006–33. https://doi.org/10.1111/bjet.13030 .

Haugan S, Kværnø E, Sandaker J, Hustad JL, Thordarson GO. Playful learning with VR-SIMI model: the use of 360-video as a learning tool for nursing students in a psychiatric simulation setting. In: Akselbo I, Aune I, editors. How can we use simulation to improve competencies in nursing? Cham: Springer International Publishing; 2023. p. 103–16. https://doi.org/10.1007/978-3-031-10399-5_9 .

Huwendiek S, De leng BA, Zary N, Fischer MR, Ruiz JG, Ellaway R. Towards a typology of virtual patients. Med Teach. 2009;31(8):743–8. https://doi.org/10.1080/01421590903124708 .

Ødegaard NB, Myrhaug HT, Dahl-Michelsen T, Røe Y. Digital learning designs in physiotherapy education: a systematic review and meta-analysis. BMC Med Educ. 2021;21(1):48. https://doi.org/10.1186/s12909-020-02483-w .


Acknowledgements

The authors thank Mole Meyer, adviser at SIMInnlandet, Innlandet Hospital Trust, and Keith Mellingen, manager at VRINN, for their assistance with the categorization and classification of VR interventions, and Associate Professor Inga Hege at the Paracelsus Medical University in Salzburg for valuable contributions to the final classification of the interventions. The authors would also like to thank Håvard Røste from the media company KildeGruppen AS for assistance with the search strategy; Academic Librarian Elin Opheim at the Inland Norway University of Applied Sciences for valuable peer review of the search strategy; and the Library at the Inland Norway University of Applied Sciences for their support. Additionally, we acknowledge the assistance provided by OpenAI’s ChatGPT for support with translations and language refinement.

Open access funding provided by Inland Norway University of Applied Sciences. The study forms part of a collaborative PhD project funded by the South-Eastern Norway Regional Health Authority through Innlandet Hospital Trust and the Inland Norway University of Applied Sciences.

Author information

Authors and affiliations

Mental Health Department, Innlandet Hospital Trust, P.B 104, Brumunddal, 2381, Norway

Cathrine W. Steen & Kerstin Söderström

Inland Norway University of Applied Sciences, P.B. 400, Elverum, 2418, Norway

Cathrine W. Steen, Kerstin Söderström & Inger Beate Nylund

Norwegian National Advisory Unit On Concurrent Substance Abuse and Mental Health Disorders, Innlandet Hospital Trust, P.B 104, Brumunddal, 2381, Norway

Bjørn Stensrud

Akershus University Hospital, P.B 1000, Lørenskog, 1478, Norway

Johan Siqveland

National Centre for Suicide Research and Prevention, Oslo, 0372, Norway


Contributions

CWS, KS, BS, and JS collaboratively designed the study. CWS and JS collected and analysed the data and were primarily responsible for writing the manuscript text. All authors contributed to the development of the search strategy. IBN conducted the literature searches and authored the chapter on the search strategy in the manuscript. All authors reviewed, gave feedback, and granted their final approval of the manuscript.

Corresponding author

Correspondence to Cathrine W. Steen .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table 2.

Effects of VR training in the included studies: Randomized controlled trials (RCTs) and non-randomized studies (NRSs).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Steen, C.W., Söderström, K., Stensrud, B. et al. The effectiveness of virtual reality training on knowledge, skills and attitudes of health care professionals and students in assessing and treating mental health disorders: a systematic review. BMC Med Educ 24 , 480 (2024). https://doi.org/10.1186/s12909-024-05423-0


Received : 19 January 2024

Accepted : 12 April 2024

Published : 01 May 2024

DOI : https://doi.org/10.1186/s12909-024-05423-0


Keywords

  • Health care professionals
  • Health care students
  • Virtual reality
  • Mental health
  • Clinical skills
  • Systematic review

BMC Medical Education

ISSN: 1472-6920


  • Open access
  • Published: 09 May 2024

Machine learning models for abstract screening task - A systematic literature review application for health economics and outcome research

  • Jingcheng Du 1 ,
  • Ekin Soysal 1 , 3 ,
  • Dong Wang 2 ,
  • Long He 1 ,
  • Bin Lin 1 ,
  • Jingqi Wang 1 ,
  • Frank J. Manion 1 ,
  • Yeran Li 2 ,
  • Elise Wu 2 &
  • Lixia Yao 2  

BMC Medical Research Methodology volume 24, Article number: 108 (2024)


Systematic literature reviews (SLRs) are critical for life-science research. However, the manual selection and retrieval of relevant publications can be a time-consuming process. This study aims to (1) develop two disease-specific annotated corpora, one for human papillomavirus (HPV) associated diseases and the other for pneumococcal-associated pediatric diseases (PAPD), and (2) optimize machine- and deep-learning models to facilitate automation of the SLR abstract screening.

This study constructed two disease-specific SLR screening corpora for HPV and PAPD, which contained citation metadata and corresponding abstracts. Performance was evaluated using precision, recall, accuracy, and F1-score of multiple combinations of machine- and deep-learning algorithms and features such as keywords and MeSH terms.

Results and conclusions

The HPV corpus contained 1697 entries, with 538 relevant and 1159 irrelevant articles. The PAPD corpus included 2865 entries, with 711 relevant and 2154 irrelevant articles. Adding features beyond title and abstract improved the performance (measured as accuracy) of machine learning models by 3% for the HPV corpus and 2% for the PAPD corpus. Transformer-based deep learning models consistently outperformed conventional machine learning algorithms, highlighting the strength of domain-specific pre-trained language models for SLR abstract screening. This study provides a foundation for the development of more intelligent SLR systems.


Introduction

Systematic literature reviews (SLRs) are an essential tool in many areas of health sciences, enabling researchers to understand the current knowledge around a topic and identify future research and development directions. In the field of health economics and outcomes research (HEOR), SLRs play a crucial role in synthesizing evidence around unmet medical needs, comparing treatment options, and preparing the design and execution of future real-world evidence studies. SLRs provide a comprehensive and transparent analysis of available evidence, allowing researchers to make informed decisions and improve patient outcomes.

Conducting an SLR involves synthesizing high-quality evidence from the biomedical literature in a transparent and reproducible manner; an SLR seeks to include all available evidence on a given research question and provides some assessment of the quality of that evidence [ 1 , 2 ]. To conduct an SLR, one or more bibliographic databases are queried on the basis of a given research question and a corresponding set of inclusion and exclusion criteria, resulting in the selection of a relevant set of abstracts. The abstracts are reviewed, further refining the set of articles used to address the research question. Finally, appropriate data are systematically extracted from the articles and summarized [ 1 , 3 ].

The current approach to conducting a SLR is through manual review, with data collection, and summary done by domain experts against pre-specified eligibility criteria. This is time-consuming, labor-intensive, expensive, and non-scalable given the current more-than linear growth of the biomedical literature [ 4 ]. Michelson and Reuter estimate that each SLR costs approximately $141,194.80 and that on average major pharmaceutical companies conduct 23.36 SLRs, and major academic centers 177.32 SLRs per year, though the cost may vary based on the scope of different reviews [ 4 ]. Clearly automated methods are needed, both from a cost/time savings perspective, and for the ability to effectively scan and identify increasing amounts of literature, thereby allowing the domain experts to spend more time analyzing the data and gleaning the insights.

One major task of an SLR project that involves a large amount of manual effort is abstract screening. For this task, selection criteria are developed, and the citation metadata and abstract for articles tentatively meeting these criteria are retrieved from one or more bibliographic databases (e.g., PubMed). The abstracts are then examined in more detail to determine whether they are relevant to the research question(s) and should be included or excluded from further consideration. Consequently, the task of determining whether articles are relevant based on their titles, abstracts, and metadata can be treated as a binary classification task, which can be addressed by natural language processing (NLP). NLP involves recognizing entities and relationships expressed in text and leverages machine-learning (ML) and deep-learning (DL) algorithms together with computational semantics to extract information. The past decade has witnessed significant advances in these areas for biomedical literature mining. A comprehensive review of how NLP techniques are being applied for automatic mining and knowledge extraction from biomedical literature can be found in Zhao et al. [ 5 ].

Materials and methods

The aims of this study were to: (1) identify and develop two disease-specific corpora, one for human papillomavirus (HPV) associated diseases and the other for pneumococcal-associated pediatric diseases suitable for training the ML and DL models underlying the necessary NLP functions; (2) investigate and optimize the performance of the ML and DL models using different sets of features (e.g., keywords, Medical Subject Heading (MeSH) terms [ 6 ]) to facilitate automation of the abstract screening tasks necessary to construct a SLR. Note that these screening corpora can be used as training data to build different NLP models. We intend to freely share these two corpora with the entire scientific community so they can serve as benchmark corpora for future NLP model development in this area.

SLR corpora preparation

Two completed disease-specific SLR studies by Merck & Co., Inc., Rahway, NJ, USA were used as the basis to construct corpora for abstract-level screening. The two SLR studies were both relevant to health economics and outcome research, including one for human papillomavirus (HPV) associated diseases (referred to as the HPV corpus), and one for pneumococcal-associated pediatric diseases (which we refer to as the PAPD corpus). Both of the original SLR studies contained literature from PubMed/MEDLINE and EMBASE. Since we intended for the screening corpora to be released to the community, we only kept citations found from PubMed/MEDLINE in the finalized corpora. Because the original SLR studies did not contain the PubMed ID (PMID) for each article, we matched each article’s citation information (if available) against PubMed and then collected meta-data such as authors, journals, keywords, MeSH terms, publication types, etc., using PubMed Entrez Programming Utilities (E-utilities) Application Programming Interface (API). The detailed description of the two corpora can be seen in Table  1 . Both of the resulting corpora are publicly available at [ https://github.com/Merck/NLP-SLR-corpora ].
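As a minimal illustration of the metadata retrieval step (the helper name is ours; the endpoint and parameters follow the public NCBI E-utilities interface), a batch efetch request for a list of PMIDs can be assembled like this:

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def build_efetch_url(pmids, db="pubmed", retmode="xml"):
    """Assemble an E-utilities efetch URL for a batch of PubMed IDs.

    The XML response carries the metadata used as features in this study:
    authors, journal, keywords, MeSH terms, and publication types.
    """
    params = {"db": db, "id": ",".join(str(p) for p in pmids), "retmode": retmode}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"
```

Fetching the URL (e.g., with `urllib.request` or `requests`) and parsing the returned XML yields one record per PMID; note that NCBI asks clients to throttle requests (roughly three per second without an API key).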

Machine learning algorithms

Although deep learning algorithms have demonstrated superior performance on many NLP tasks, conventional machine learning algorithms have certain advantages, such as low computation costs and faster training and prediction speed.

We evaluated four traditional ML-based document classification algorithms, XGBoost [ 7 ], Support Vector Machines (SVM) [ 8 ], Logistic regression (LR) [ 9 ], and Random Forest [ 10 ] on the binary inclusion/exclusion classification task for abstract screening. Salient characteristics of these models are as follows:

XGBoost: Short for “eXtreme Gradient Boosting”, XGBoost is a boosting-based ensemble of algorithms that turn weak learners into strong learners by focusing on where the individual models went wrong. In Gradient Boosting, individual weak models train upon the difference between the prediction and the actual results [ 7 ]. We set max_depth at 3, n_estimators at 150 and learning rate at 0.7.

Support vector machine (SVM): SVM is one of the most robust prediction methods based on statistical learning frameworks. It aims to find a hyperplane in an N-dimensional space (where N = the number of features) that distinctly classifies the data points [ 8 ]. We set C at 100, gamma at 0.005 and kernel as radial basis function.

Logistic regression (LR): LR is a classic statistical model that in its basic form uses a logistic function to model a binary dependent variable [ 9 ]. We set C at 5 and penalty as l2.

Random forest (RF): RF is a machine learning technique that utilizes ensemble learning to combine many decision trees classifiers through bagging or bootstrap aggregating [ 10 ]. We set n_estimators at 100 and max_depth at 14.

These four algorithms were trained for both the HPV screening task and the PAPD screening task using the corresponding training corpus.
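Using the hyperparameters reported above, the classifiers can be instantiated roughly as follows. This is a sketch with scikit-learn: the XGBoost line is commented out in case the `xgboost` package is absent, settings not mentioned in the text are left at library defaults, and `max_iter` is raised for logistic regression only so the sketch converges on sparse TF-IDF input.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
# from xgboost import XGBClassifier  # max_depth=3, n_estimators=150, learning_rate=0.7

models = {
    "svm": SVC(C=100, gamma=0.005, kernel="rbf"),  # radial basis function kernel
    "lr": LogisticRegression(C=5, penalty="l2", max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=100, max_depth=14),
    # "xgboost": XGBClassifier(max_depth=3, n_estimators=150, learning_rate=0.7),
}

# Each model is then trained on TF-IDF features with binary include/exclude labels:
# models[name].fit(X_train, y_train); predictions = models[name].predict(X_test)
```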

For each of the four algorithms, we examined performance using (1) only the baseline features (the title and abstract of each article), and (2) five additional meta-data features (MeSH terms, authors, keywords, journal, publication types) retrieved for each article using the PubMed E-utilities API. Conventionally, the title and abstract are the first information a human reviewer depends on when making a judgment about inclusion or exclusion of an article. Consequently, we used title and abstract as the baseline features for classifying whether an abstract should be included at the abstract screening stage. We then evaluated performance with the additional features retrievable via the PubMed E-utilities API: MeSH terms, authors, journal, keywords, and publication type. For the baseline evaluation, we concatenated the titles and abstracts and extracted the TF-IDF (term frequency-inverse document frequency) vector for the corpus. TF-IDF evaluates how relevant a word is to a document in a collection of documents. For the additional features, we extracted a TF-IDF vector for each feature separately and then concatenated the extracted vectors with the title-and-abstract vector. XGBoost was selected for the feature evaluation process due to its relatively quick computational running time and robust performance.
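The per-field TF-IDF-and-concatenate scheme described above can be sketched as follows (the function and field names are ours for illustration):

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

def build_features(records, fields=("title_abstract", "mesh", "keywords")):
    """Fit one TF-IDF vectorizer per field and concatenate the matrices column-wise."""
    blocks, vectorizers = [], {}
    for field in fields:
        vec = TfidfVectorizer()
        blocks.append(vec.fit_transform([r.get(field, "") for r in records]))
        vectorizers[field] = vec
    return hstack(blocks).tocsr(), vectorizers
```

For the baseline configuration, `fields` would contain only the concatenated title-and-abstract text; at prediction time the fitted vectorizers' `transform` must be reused rather than refit, so the column layout stays identical between training and test data.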

Deep learning algorithms

Conventional ML methods rely heavily on manually designed features and suffer from data sparsity and poor transportability when applied to new use cases. Deep learning (DL) is a set of machine learning algorithms based on deep neural networks that has advanced the performance of text classification along with many other NLP tasks. Transformer-based deep learning models, such as BERT (Bidirectional Encoder Representations from Transformers), have achieved state-of-the-art performance in many NLP tasks [ 11 ]. A Transformer is an emerging deep learning architecture designed to handle sequential input data such as natural language by adopting attention mechanisms to differentially weigh the significance of each part of the input [ 12 ]. The BERT model and its variants (which use the Transformer as a basic unit) leverage transfer learning: models with hundreds of millions of parameters are first pre-trained on large volumes of unlabeled text, and the resulting model is then fine-tuned for a particular downstream NLP application, such as text classification, named entity recognition, or relation extraction. The following three BERT models were evaluated against both the HPV and the PAPD corpora using two sets of features (title and abstract versus adding all additional features into the text). For all BERT models, we used the Adam optimizer with weight decay, a learning rate of 1e-5, a batch size of 8, and 20 epochs.

BERT base: this is the original BERT model released by Google. The BERT base model was pre-trained on textual data in the general domain, i.e., BooksCorpus (800 M words) and English Wikipedia (2500 M words) [ 11 ].

BioBERT base: as the biomedical language is different from general language, the BERT models trained on general textual data may not work well on biomedical NLP tasks. BioBERT was further pre-trained (based on original BERT models) in the large-scale biomedical corpora, including PubMed abstracts (4.5B words) and PubMed Central Full-text articles (13.5B words) [ 13 ].

PubMedBERT: PubMedBERT was pre-trained from scratch using abstracts from PubMed. This model has achieved state-of-the-art performance on several biomedical NLP tasks on Biomedical Language Understanding and Reasoning Benchmark [ 14 ].
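The fine-tuning setup described above maps onto a Hugging Face `TrainingArguments` configuration roughly as below; the hub checkpoint identifiers are our assumptions and should be verified against the model hub before use.

```python
from transformers import TrainingArguments

# Candidate hub checkpoints for the three BERT variants (identifiers are assumptions).
CHECKPOINTS = {
    "bert-base": "bert-base-uncased",
    "biobert": "dmis-lab/biobert-base-cased-v1.1",
    "pubmedbert": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract",
}

# Hyperparameters reported in the text; Trainer's default optimizer is AdamW
# (Adam with decoupled weight decay).
training_args = TrainingArguments(
    output_dir="slr-screening",
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    num_train_epochs=20,
)
```

Each checkpoint would then be loaded with `AutoModelForSequenceClassification.from_pretrained(..., num_labels=2)` and passed to a `Trainer` together with the tokenized training and validation splits.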

Text pre-processing and libraries used

As part of text pre-processing, we removed special characters and common English words (stop words). The default tokenizer from scikit-learn was used for tokenization. Scikit-learn was also used for TF-IDF feature extraction and for implementing the machine learning algorithms. The Transformers library from Hugging Face was used to implement the deep learning algorithms.

Evaluation datasets were constructed from the HPV and pediatric pneumococcal corpora and were split into training, validation and testing sets at a ratio of 8:1:1 for the two evaluation tasks: (1) ML algorithm performance assessment; and (2) DL algorithm performance assessment. Models were fitted on the training sets, hyperparameters were optimized on the validation sets, and performance was evaluated on the testing sets. The major metrics are calculated as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F1 = 2 × Precision × Recall / (Precision + Recall)

A true positive (TP) is an outcome where the model correctly predicts the positive class (e.g., “included” in our tasks); similarly, a true negative (TN) is an outcome where the model correctly predicts the negative class (e.g., “excluded” in our tasks). A false positive (FP) is an outcome where the model incorrectly predicts the positive class, and a false negative (FN) is an outcome where the model incorrectly predicts the negative class. We repeated all experiments five times and report mean scores with standard deviations.
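The 8:1:1 split and the standard metrics can be sketched as follows. The labels are toy data and `metrics` is an illustrative helper, not code from the paper:

```python
from sklearn.model_selection import train_test_split

# Toy record IDs and alternating include/exclude labels.
records = list(range(100))
labels = [i % 2 for i in records]

# 8:1:1 split: first hold out 20%, then split that into validation and test.
train, rest, y_train, y_rest = train_test_split(
    records, labels, test_size=0.2, random_state=42)   # 80% train
val, test, y_val, y_test = train_test_split(
    rest, y_rest, test_size=0.5, random_state=42)      # 10% val, 10% test

def metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

print(len(train), len(val), len(test))  # → 80 10 10
m = metrics(tp=8, fp=2, tn=8, fn=2)
print(m["f1"])  # → 0.8
```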

Table 2 shows the baseline comparison of different feature combinations for the SLR text classification tasks using XGBoost. Adding features beyond the title and abstract further improved classification accuracy. Specifically, using all available features increased accuracy by ∼3% and F1 score by ∼3% for the HPV classification, and increased accuracy by ∼2% and F1 score by ∼4% for the pediatric pneumococcal classification. The additional features provided a stronger boost in precision, which contributed to the overall performance improvement.
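One simple way to build the “all features” input is plain text concatenation. The field names below are illustrative assumptions about the citation record layout, not the paper's actual schema:

```python
def build_input_text(citation: dict) -> str:
    """Join title/abstract with additional citation fields into one string.

    Illustrative sketch: MeSH terms, keywords and journal name are appended
    as extra text so a text classifier can use them as features.
    """
    parts = [
        citation.get("title", ""),
        citation.get("abstract", ""),
        " ".join(citation.get("mesh_terms", [])),
        " ".join(citation.get("keywords", [])),
        citation.get("journal", ""),
    ]
    return " ".join(p for p in parts if p)

citation = {
    "title": "HPV vaccine effectiveness",
    "abstract": "A cohort study of cervical cancer incidence.",
    "mesh_terms": ["Papillomavirus Vaccines", "Cost-Benefit Analysis"],
    "keywords": ["HPV", "screening"],
    "journal": "Vaccine",
}
print(build_input_text(citation))
```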

The comparison of four machine learning algorithms with all features on the article inclusion/exclusion classification task is shown in Table 3. XGBoost achieved the highest accuracy and F1 scores in both tasks. Table 4 compares XGBoost with the deep learning algorithms on the classification tasks for each disease. Both XGBoost and the deep learning models consistently achieved higher accuracy when using all features as input. Among all models, BioBERT achieved the highest accuracy at 0.88, compared with 0.86 for XGBoost. XGBoost had the highest F1 score at 0.8 and the highest recall at 0.9 for inclusion prediction.

Discussions and conclusions

Abstract screening is a crucial step in conducting a systematic literature review (SLR), as it identifies relevant citations and reduces the effort required for full-text screening and data element extraction. However, screening thousands of abstracts is a time-consuming and burdensome task for scientific reviewers. In this study, we systematically investigated the use of various machine learning and deep learning algorithms, with different sets of features, to automate abstract screening tasks. We evaluated these algorithms on disease-focused SLR corpora, one for human papillomavirus (HPV) associated diseases and another for pneumococcal-associated pediatric diseases (PAPD). The publicly available corpora used in this study can be used by the scientific community for advanced algorithm development and evaluation. Our findings suggest that machine learning and deep learning algorithms can effectively automate abstract screening, saving valuable time and effort in the SLR process.

Although the machine learning and deep learning algorithms trained on the two SLR corpora showed some variation in performance, there were also consistencies. First, adding additional citation features significantly improved the performance of conventional machine learning algorithms, whereas the improvement was weaker for transformer-based deep learning models. This may be because the transformer models were mostly pre-trained on abstracts, which do not include additional citation information such as MeSH terms, keywords, and journal names. Second, when using only the title and abstract as input, transformer models consistently outperformed conventional machine learning algorithms, highlighting the strength of domain-specific pre-trained language models. When all citation features were combined as input, conventional machine learning algorithms performed comparably to the deep learning models. Given their much lower computational cost and faster training and prediction, XGBoost or support vector machines with all citation features could be an excellent choice for building an abstract screening system.
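A minimal sketch of such a low-cost screener: TF-IDF features over the combined citation text feeding a linear SVM (standing in for XGBoost so the example needs only scikit-learn), trained on toy data rather than the annotated corpora:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy combined citation texts (title + abstract + extra features) and labels.
texts = [
    "HPV vaccine cost effectiveness cervical cancer MeSH Papillomavirus",
    "pneumococcal conjugate vaccine pediatric otitis media burden",
    "unrelated editorial on hospital staffing and scheduling",
    "letter to the editor about conference logistics",
]
labels = [1, 1, 0, 0]  # 1 = included, 0 = excluded

# Cheap, fast pipeline: TF-IDF features into a linear SVM.
screener = make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC())
screener.fit(texts, labels)

print(screener.predict(["HPV vaccine effectiveness study"])[0])
```

In practice, the pipeline would be trained on the full corpora and the SVM could be swapped for an `XGBClassifier` at the cost of one extra dependency.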

Some limitations remain. Although we evaluated cutting-edge machine learning and deep learning algorithms on two SLR corpora, we did not perform extensive task-specific customization of the learning algorithms, such as task-specific feature engineering or rule-based post-processing, which could further improve performance. As the focus of this study is to provide generalizable strategies for applying machine learning to abstract screening tasks, we leave such customization to future work. The corpora evaluated here mainly focus on health economics and outcome research; the generalizability of the learning algorithms to other domains would benefit from formal examination.

Extensive studies have shown the superiority of transformer-based deep learning models on many NLP tasks [11, 13, 14, 15, 16]. Based on our experiments, however, adding features that pre-trained language models have not seen during pre-training may not significantly boost their performance. It would be interesting to find better ways of encoding additional features for these pre-trained language models so as to maximize their performance. In addition, transfer learning has proven to be an effective technique for improving performance on a target task by leveraging annotated data from a source task [17, 18, 19]. Thus, for a new SLR abstract screening task, it would be worthwhile to investigate transfer learning by adapting our (publicly available) corpora to the new target task.

When labeled data are available, supervised machine learning algorithms can be very effective and efficient for article screening. However, as the need for explainability and transparency in NLP-assisted SLR workflows grows, supervised machine learning algorithms face challenges in explaining why certain papers fail to fulfill the criteria. Recent advances in large language models (LLMs), such as ChatGPT [20] and Gemini [21], show remarkable performance on NLP tasks and good potential for explainability. Although there are concerns about the bias and hallucinations that LLMs could introduce, it would be worthwhile to evaluate further how LLMs could be applied to SLR tasks, taking free-text article screening criteria as input and providing explanations for article screening decisions.
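A hypothetical sketch of this direction: embedding free-text screening criteria into a prompt that asks an LLM for a decision plus an explanation. The template and field names are illustrative assumptions, and no API call is made here:

```python
# Illustrative prompt template for LLM-based abstract screening.
PROMPT_TEMPLATE = """You are screening abstracts for a systematic review.

Inclusion criteria:
{criteria}

Title: {title}
Abstract: {abstract}

Answer with INCLUDE or EXCLUDE, then explain which criteria were or were not met."""

def build_screening_prompt(criteria: str, title: str, abstract: str) -> str:
    """Fill the template with free-text criteria and one citation."""
    return PROMPT_TEMPLATE.format(criteria=criteria, title=title, abstract=abstract)

prompt = build_screening_prompt(
    criteria="Reports economic or health outcomes of HPV-associated disease.",
    title="HPV vaccination cost-effectiveness",
    abstract="We model the economic impact of HPV vaccination programs.",
)
print(prompt.splitlines()[0])
```

The returned string would then be sent to an LLM API of choice, with the model's explanation logged alongside its include/exclude decision for reviewer audit.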

Data availability

The annotated corpora underlying this article are available at https://github.com/Merck/NLP-SLR-corpora .

Bullers K, Howard AM, Hanson A, et al. It takes longer than you think: librarian time spent on systematic review tasks. J Med Libr Assoc. 2018;106:198–207. https://doi.org/10.5195/jmla.2018.323 .


Carver JC, Hassler E, Hernandes E, et al. Identifying barriers to the systematic literature review process. In: 2013 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. 2013. p. 203–12. https://doi.org/10.1109/ESEM.2013.28 .

Lame G. Systematic literature reviews: an introduction. Proc Des Society: Int Conf Eng Des. 2019;1:1633–42. https://doi.org/10.1017/dsi.2019.169 .


Michelson M, Reuter K. The significant cost of systematic reviews and meta-analyses: a call for greater involvement of machine learning to assess the promise of clinical trials. Contemp Clin Trials Commun. 2019;16:100443. https://doi.org/10.1016/j.conctc.2019.100443 .

Recent advances in biomedical literature mining. Briefings in Bioinformatics. https://academic.oup.com/bib/article/22/3/bbaa057/5838460 (accessed 30 May 2022).

Medical Subject Headings - Home Page. https://www.nlm.nih.gov/mesh/meshhome.html (accessed 30 May 2022).

Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: Association for Computing Machinery; 2016. p. 785–94. https://doi.org/10.1145/2939672.2939785 .

Noble WS. What is a support vector machine? Nat Biotechnol. 2006;24:1565–7. https://doi.org/10.1038/nbt1206-1565 .


Logistic Regression. https://doi.org/10.1007/978-1-4419-1742-3 (accessed 30 May 2022).

Random forest classifier for remote sensing classification. International Journal of Remote Sensing. 26(1). https://www.tandfonline.com/doi/abs/10.1080/01431160412331269698 (accessed 30 May 2022).

Devlin J, Chang M-W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv. 2019. https://doi.org/10.48550/arXiv.1810.04805 .

Vaswani A, Shazeer N, Parmar N et al. Attention is All you Need. In: Advances in Neural Information Processing Systems . Curran Associates, Inc. 2017. https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html (accessed 30 May 2022).

BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. https://academic.oup.com/bioinformatics/article/36/4/1234/5566506 (accessed 3 Jun 2020).

Gu Y, Tinn R, Cheng H, et al. Domain-specific Language Model Pretraining for Biomedical Natural Language Processing. ACM Trans Comput Healthc. 2021;3(2):1–2. https://doi.org/10.1145/3458754 .


Chen Q, Du J, Allot A, et al. LitMC-BERT: transformer-based multi-label classification of biomedical literature with an application on COVID-19 literature curation. arXiv. 2022. https://doi.org/10.48550/arXiv.2204.08649 .

Chen Q, Allot A, Leaman R, et al. Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations. arXiv. 2022. https://doi.org/10.48550/arXiv.2204.09781 .

Kermany DS, Goldbaum M, Cai W, et al. Identifying Medical diagnoses and Treatable diseases by Image-based deep learning. Cell. 2018;172:1122–e11319. https://doi.org/10.1016/j.cell.2018.02.010 .

Howard J, Ruder S. Universal Language Model fine-tuning for text classification. arXiv. 2018. https://doi.org/10.48550/arXiv.1801.06146 .

Do CB, Ng AY. Transfer learning for text classification. In: Advances in Neural Information Processing Systems . MIT Press. 2005. https://proceedings.neurips.cc/paper/2005/hash/bf2fb7d1825a1df3ca308ad0bf48591e-Abstract.html (accessed 30 May 2022).

Achiam J, et al. GPT-4 technical report. arXiv. 2023. https://doi.org/10.48550/arXiv.2303.08774 .

Gemini. https://gemini.google.com/app/a4dcd2e2d7672354 (accessed 01 Feb 2024).


Acknowledgements

We thank Dr. Majid Rastegar-Mojarad for conducting some additional experiments during revision.

This research was supported by Merck Sharp & Dohme LLC, a subsidiary of Merck & Co., Inc., Rahway, NJ, USA.

Author information

Authors and affiliations.

Intelligent Medical Objects, Houston, TX, USA

Jingcheng Du, Ekin Soysal, Long He, Bin Lin, Jingqi Wang & Frank J. Manion

Merck & Co., Inc, Rahway, NJ, USA

Dong Wang, Yeran Li, Elise Wu & Lixia Yao

McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA

Ekin Soysal


Contributions

Study concept and design: JD and LY. Corpus preparation: DW, YL and LY. Experiments: JD and ES. Draft of the manuscript: JD, DW, FJM and LY. Acquisition, analysis, or interpretation of data: JD, ES, DW and LY. Critical revision of the manuscript for important intellectual content: JD, ES, DW, LH, BL, JW, FJM, YL, EW, LY. Study supervision: LY.

Corresponding author

Correspondence to Lixia Yao .

Ethics declarations

Disclaimers.

The content is the sole responsibility of the authors and does not necessarily represent the official views of Merck & Co., Inc., Rahway, NJ, USA or Intelligent Medical Objects.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests.

DW is an employee of Merck Sharp & Dohme LLC, a subsidiary of Merck & Co., Inc., Rahway, NJ, USA. EW, YL, and LY were employees of Merck Sharp & Dohme LLC, a subsidiary of Merck & Co., Inc., Rahway, NJ, USA for this work. JD, LH, JW, and FJM are employees of Intelligent Medical Objects. ES was an employee of Intelligent Medical Objects during his contributions, and is currently an employee of EBSCO Information Services. All the other authors declare no competing interest.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Du, J., Soysal, E., Wang, D. et al. Machine learning models for abstract screening task - a systematic literature review application for health economics and outcome research. BMC Med Res Methodol 24, 108 (2024). https://doi.org/10.1186/s12874-024-02224-3


Received : 19 May 2023

Accepted : 18 April 2024

Published : 09 May 2024

DOI : https://doi.org/10.1186/s12874-024-02224-3


  • Machine learning
  • Deep learning
  • Text classification
  • Article screening
  • Systematic literature review

BMC Medical Research Methodology

ISSN: 1471-2288
