• Tutorial Review
  • Open access
  • Published: 24 January 2018

Teaching the science of learning

  • Yana Weinstein (ORCID: orcid.org/0000-0002-5144-968X),
  • Christopher R. Madan &
  • Megan A. Sumeracki

Cognitive Research: Principles and Implications, volume 3, Article number: 2 (2018)

Abstract

The science of learning has made a considerable contribution to our understanding of effective teaching and learning strategies. However, few instructors outside of the field are privy to this research. In this tutorial review, we focus on six specific cognitive strategies that have received robust support from decades of research: spaced practice, interleaving, retrieval practice, elaboration, concrete examples, and dual coding. We describe the basic research behind each strategy and relevant applied research, present examples of existing and suggested implementation, and make recommendations for further research that would broaden the reach of these strategies.

Significance

Education does not currently adhere to the medical model of evidence-based practice (Roediger, 2013 ). However, over the past few decades, our field has made significant advances in applying cognitive processes to education. From this work, specific recommendations can be made for students to maximize their learning efficiency (Dunlosky, Rawson, Marsh, Nathan, & Willingham, 2013 ; Roediger, Finn, & Weinstein, 2012 ). In particular, a review published 10 years ago identified a limited number of study techniques that have received solid evidence from multiple replications testing their effectiveness in and out of the classroom (Pashler et al., 2007 ). A recent textbook analysis (Pomerance, Greenberg, & Walsh, 2016 ) took the six key learning strategies from this report by Pashler and colleagues, and found that very few teacher-training textbooks cover any of these six principles – and none cover them all, suggesting that these strategies are not systematically making their way into the classroom. This is the case in spite of multiple recent academic (e.g., Dunlosky et al., 2013 ) and general audience (e.g., Dunlosky, 2013 ) publications about these strategies. In this tutorial review, we present the basic science behind each of these six key principles, along with more recent research on their effectiveness in live classrooms, and suggest ideas for pedagogical implementation. The target audience of this review is (a) educators who might be interested in integrating the strategies into their teaching practice, (b) science of learning researchers who are looking for open questions to help determine future research priorities, and (c) researchers in other subfields who are interested in the ways that principles from cognitive psychology have been applied to education.

While the typical teacher may not be exposed to this research during teacher training, a small cohort of teachers intensely interested in cognitive psychology has recently emerged. These teachers are mainly based in the UK, and, anecdotally (e.g., Dennis (2016), personal communication), appear to have taken an interest in the science of learning after reading Make it Stick (Brown, Roediger, & McDaniel, 2014 ; see Clark ( 2016 ) for an enthusiastic review of this book on a teacher’s blog, and “Learning Scientists” ( 2016c ) for a collection). In addition, a grassroots teacher movement has led to the creation of “researchED” – a series of conferences on evidence-based education (researchED, 2013 ). The teachers who form part of this network frequently discuss cognitive psychology techniques and their applications to education on social media (mainly Twitter; e.g., Fordham, 2016 ; Penfound, 2016 ) and on their blogs, such as Evidence Into Practice ( https://evidenceintopractice.wordpress.com/ ), My Learning Journey ( http://reflectionsofmyteaching.blogspot.com/ ), and The Effortful Educator ( https://theeffortfuleducator.com/ ). In general, the teachers who write about these issues pay careful attention to the relevant literature, often citing some of the work described in this review.

These informal writings, while allowing teachers to explore their approach to teaching practice (Luehmann, 2008), give us a unique window into the application of the science of learning to the classroom. By examining these blogs, we can observe not only how basic cognitive research is being applied in the classroom by teachers who are reading it, but also how it is being misapplied, and what questions teachers may be posing that have gone unaddressed in the scientific literature. Throughout this review, we illustrate each strategy with examples of how it can be implemented (see Table 1 and Figs. 1, 2, 3, 4, 5, 6 and 7), as well as with relevant teacher blog posts that reflect on its application, and we draw upon this work to pinpoint fruitful avenues for further basic and applied research.

Fig. 1 Spaced practice schedule for one week. This schedule is designed to represent a typical timetable of a high-school student. The schedule includes four one-hour study sessions, one longer study session on the weekend, and one rest day. Notice that each subject is studied one day after it is covered in school, to create spacing between classes and study sessions. Copyright note: this image was produced by the authors

Fig. 2 a Blocked practice and interleaved practice with fraction problems. In the blocked version, students answer four multiplication problems consecutively. In the interleaved version, students answer a multiplication problem followed by a division problem and then an addition problem, before returning to multiplication. For an experiment with a similar setup, see Patel et al. (2016). Copyright note: this image was produced by the authors. b Illustration of interleaving and spacing. Each color represents a different homework topic. Interleaving involves alternating between topics, rather than blocking. Spacing involves distributing practice over time, rather than massing. Interleaving inherently involves spacing, as other tasks naturally “fill” the spaces between interleaved sessions. Copyright note: this image was produced by the authors, adapted from Rohrer (2012)

Fig. 3 Concept map illustrating the process and resulting benefits of retrieval practice. Retrieval practice involves the process of withdrawing learned information from long-term memory into working memory, which requires effort. This produces direct benefits via the consolidation of learned information, making it easier to remember later and causing improvements in memory, transfer, and inferences. Retrieval practice also produces indirect benefits of feedback to students and teachers, which in turn can lead to more effective study and teaching practices, with a focus on information that was not accurately retrieved. Copyright note: this figure originally appeared in a blog post by the first and third authors ( http://www.learningscientists.org/blog/2016/4/1-1 )

Fig. 4 Illustration of “how” and “why” questions (i.e., elaborative interrogation questions) students might ask while studying the physics of flight. To help figure out how physics explains flight, students might ask themselves the following questions: “How does a plane take off?”; “Why does a plane need an engine?”; “How does the upward force (lift) work?”; “Why do the wings have a curved upper surface and a flat lower surface?”; and “Why is there a downwash behind the wings?”. Copyright note: the image of the plane was downloaded from Pixabay.com and is free to use, modify, and share

Fig. 5 Three examples of physics problems that would be categorized differently by novices and experts. The problems in (a) and (c) look similar on the surface, so novices would group them together into one category. Experts, however, would recognize that the problems in (b) and (c) both relate to the principle of energy conservation, and so would group those two problems into one category instead. Copyright note: the figure was produced by the authors, based on figures in Chi et al. (1981)

Fig. 6 Example of how to enhance learning through use of a visual example. Students might view this visual representation of neural communications with the words provided, or they could draw a similar visual representation themselves. Copyright note: this figure was produced by the authors

Fig. 7 Example of word properties associated with visual, verbal, and motor coding for the word “SPOON”. A word can evoke multiple types of representation (“codes” in dual coding theory). Viewing a word will automatically evoke verbal representations related to its component letters and phonemes. Words representing objects (i.e., concrete nouns) will also evoke visual representations, including information about similar objects, component parts of the object, and information about where the object is typically found. In some cases, additional codes can also be evoked, such as motor-related properties of the represented object, where contextual information related to the object’s functional intention and manipulation action may also be processed automatically when reading the word. Copyright note: this figure was produced by the authors and is based on Aylwin (1990; Fig. 2) and Madan and Singhal (2012a; Fig. 3)

Spaced practice

The benefits of spaced (or distributed) practice to learning are arguably among the strongest contributions that cognitive psychology has made to education (Kang, 2016). The effect is simple: the same amount of repeated studying of the same information spaced out over time will lead to greater retention of that information in the long run, compared with repeated studying of the same information for the same amount of time in one study session. The benefits of distributed practice were first empirically demonstrated in the 19th century. As part of his extensive investigation into his own memory, Ebbinghaus (1885/1913) found that spacing repetitions across 3 days almost halved the number of repetitions necessary to relearn a series of 12 syllables, compared with massing those repetitions in a single day (Chapter 8). He thus concluded that “a suitable distribution of [repetitions] over a space of time is decidedly more advantageous than the massing of them at a single time” (Section 34). For those who want to read more about Ebbinghaus’s contribution to memory research, Roediger (1985) provides an excellent summary.

Since then, hundreds of studies have examined spacing effects both in the laboratory and in the classroom (Kang, 2016 ). Spaced practice appears to be particularly useful at large retention intervals: in the meta-analysis by Cepeda, Pashler, Vul, Wixted, and Rohrer ( 2006 ), all studies with a retention interval longer than a month showed a clear benefit of distributed practice. The “new theory of disuse” (Bjork & Bjork, 1992 ) provides a helpful mechanistic explanation for the benefits of spacing to learning. This theory posits that memories have both retrieval strength and storage strength. Whereas retrieval strength is thought to measure the ease with which a memory can be recalled at a given moment, storage strength (which cannot be measured directly) represents the extent to which a memory is truly embedded in the mind. When studying is taking place, both retrieval strength and storage strength receive a boost. However, the extent to which storage strength is boosted depends upon retrieval strength, and the relationship is negative: the greater the current retrieval strength, the smaller the gains in storage strength. Thus, the information learned through “cramming” will be rapidly forgotten due to high retrieval strength and low storage strength (Bjork & Bjork, 2011 ), whereas spacing out learning increases storage strength by allowing retrieval strength to wane before restudy.
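
Bjork and Bjork's account is verbal rather than quantitative, but a toy simulation can make the spacing prediction concrete. The sketch below is our own illustrative formalization, not the authors' model: every equation and constant is an assumption, chosen only to capture the two key claims that retrieval strength wanes between sessions and that gains in storage strength shrink as current retrieval strength grows.

```python
# Toy simulation of the "new theory of disuse" (Bjork & Bjork, 1992).
# The equations and constants are illustrative assumptions, not the authors'
# model: studying boosts storage strength in inverse proportion to current
# retrieval strength, and retrieval strength decays between sessions
# (more slowly as storage strength accumulates).

def simulate(study_days, horizon=60, decay=0.25):
    """Return storage strength after studying on the given days (a sketch)."""
    retrieval, storage = 0.0, 0.0
    for day in range(horizon):
        if day in study_days:
            storage += 1.0 - retrieval  # easy retrieval -> small storage gain
            retrieval = 1.0             # studying restores retrieval strength
        retrieval *= 1.0 - decay / (1.0 + storage)  # waning between sessions
    return storage

crammed = simulate(study_days={0, 1, 2})    # three massed sessions
spaced = simulate(study_days={0, 10, 20})   # same three sessions, spaced out
print(f"Storage strength, crammed: {crammed:.2f}, spaced: {spaced:.2f}")
```

Under these assumptions, the spaced schedule ends with markedly higher storage strength, because each restudy arrives only after retrieval strength has waned.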

Teachers can introduce spacing to their students in two broad ways. One involves creating opportunities to revisit information throughout the semester, or even in future semesters. This does involve some up-front planning, and can be difficult to achieve, given time constraints and the need to cover a set curriculum. However, spacing can be achieved at no great cost if teachers set aside a few minutes per class to review information from previous lessons. The second method involves putting the onus to space on the students themselves. Of course, this would work best with older students – high school and above. Because spacing requires advance planning, it is crucial that the teacher helps students plan their studying. For example, teachers could suggest that students schedule study sessions on days that alternate with the days on which a particular class meets (e.g., schedule review sessions for Tuesday and Thursday when the class meets Monday and Wednesday; see Fig. 1 for a more complete weekly spaced practice schedule, and the sketch below for the underlying rule). It is important to note that the spacing effect refers to information that is repeated multiple times, rather than the idea of studying different material in one long session versus spaced out in small study sessions over time. However, for teachers and particularly for students planning a study schedule, the subtle difference between the two situations (spacing out restudy opportunities, versus spacing out studying of different information over time) may be lost. Future research should address the effects of spacing out studying of different information over time, whether the same considerations apply in this situation as compared to spacing out restudy opportunities, and how important it is for teachers and students to understand the difference between these two types of spaced practice.
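
To make the alternating-days suggestion concrete, here is a minimal sketch of the day-after-class review rule illustrated in Fig. 1. The timetable is a made-up placeholder; a real schedule would also need to handle weekends, session lengths, and longer review sessions.

```python
# Sketch of the "review one day after class" rule illustrated in Fig. 1.
# The timetable below is a made-up placeholder, not a recommended curriculum.

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# day -> subjects covered in school that day (hypothetical example)
timetable = {
    "Mon": ["Math", "History"],
    "Tue": ["Biology"],
    "Wed": ["Math", "English"],
    "Thu": ["History"],
    "Fri": ["Biology", "English"],
}

def build_review_schedule(timetable):
    """Schedule each subject for review on the day after it is taught."""
    schedule = {day: [] for day in DAYS}
    for day, subjects in timetable.items():
        next_day = DAYS[(DAYS.index(day) + 1) % len(DAYS)]
        schedule[next_day].extend(subjects)
    return schedule

for day, subjects in build_review_schedule(timetable).items():
    print(f"{day}: " + (", ".join(subjects) if subjects else "(no review scheduled)"))
```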

It is important to note that students may feel less confident when they space their learning (Bjork, 1999 ) than when they cram. This is because spaced learning is harder – but it is this “desirable difficulty” that helps learning in the long term (Bjork, 1994 ). Students tend to cram for exams rather than space out their learning. One explanation for this is that cramming does “work”, if the goal is only to pass an exam. In order to change students’ minds about how they schedule their studying, it might be important to emphasize the value of retaining information beyond a final exam in one course.

Ideas for how to apply spaced practice in teaching have appeared in numerous teacher blogs (e.g., Fawcett, 2013 ; Kraft, 2015 ; Picciotto, 2009 ). In England in particular, as of 2013, high-school students need to be able to remember content from up to 3 years back on cumulative exams (General Certificate of Secondary Education (GCSE) and A-level exams; see CIFE, 2012 ). A-levels in particular determine what subject students study in university and which programs they are accepted into, and thus shape the path of their academic career. A common approach for dealing with these exams has been to include a “revision” (i.e., studying or cramming) period of a few weeks leading up to the high-stakes cumulative exams. Now, teachers who follow cognitive psychology are advocating a shift of priorities to spacing learning over time across the 3 years, rather than teaching a topic once and then intensely reviewing it weeks before the exam (Cox, 2016a ; Wood, 2017 ). For example, some teachers have suggested using homework assignments as an opportunity for spaced practice by giving students homework on previous topics (Rose, 2014 ). However, questions remain, such as whether spaced practice can ever be effective enough to completely alleviate the need or utility of a cramming period (Cox, 2016b ), and how one can possibly figure out the optimal lag for spacing (Benney, 2016 ; Firth, 2016 ).

There has been considerable research on the question of optimal lag, and much of it is quite complex; broadly, two sessions that are neither too close together (i.e., cramming) nor too far apart are ideal for retention. In a large-scale study, Cepeda, Vul, Rohrer, Wixted, and Pashler (2008) examined the effects of the gap between study sessions and the interval between study and test across long periods, and found that the optimal gap between study sessions was contingent on the retention interval. Thus, it is not clear how teachers can apply the complex findings on lag to their own classrooms.

A useful avenue of research would be to simplify the research paradigms that are used to study optimal lag, with the goal of creating a flexible, spaced-practice framework that teachers could apply and tailor to their own teaching needs. For example, an Excel macro spreadsheet was recently produced to help teachers plan for lagged lessons (Weinstein-Jones & Weinstein, 2017 ; see Weinstein & Weinstein-Jones ( 2017 ) for a description of the algorithm used in the spreadsheet), and has been used by teachers to plan their lessons (Penfound, 2017 ). However, one teacher who found this tool helpful also wondered whether the more sophisticated plan was any better than his own method of manually selecting poorly understood material from previous classes for later review (Lovell, 2017 ). This direction is being actively explored within personalized online learning environments (Kornell & Finn, 2016 ; Lindsey, Shroyer, Pashler, & Mozer, 2014 ), but teachers in physical classrooms might need less technologically-driven solutions to teach cohorts of students.
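
As one illustration of what such a framework might look like, the sketch below spaces review sessions so that each gap is a fixed fraction of the time remaining until the exam. The 10% ratio and the number of reviews are placeholder assumptions, loosely motivated by the finding that the optimal gap scales with the retention interval (Cepeda et al., 2008); this is not the algorithm used in the Weinstein-Jones and Weinstein (2017) spreadsheet.

```python
from datetime import date, timedelta

# Sketch of a lagged-lesson planner. The 10% gap-to-retention-interval ratio
# and the number of reviews are placeholder assumptions, loosely motivated by
# Cepeda et al. (2008); this is NOT the Weinstein-Jones spreadsheet algorithm.

def plan_reviews(first_lesson, exam_date, gap_ratio=0.10, n_reviews=3):
    """Schedule reviews so each gap is ~gap_ratio of the time left to the exam."""
    reviews, current = [], first_lesson
    for _ in range(n_reviews):
        days_left = (exam_date - current).days
        if days_left <= 1:
            break  # too close to the exam to add another spaced review
        current += timedelta(days=max(1, round(days_left * gap_ratio)))
        reviews.append(current)
    return reviews

# Hypothetical topic taught on 8 January, with a final exam on 1 June
for session in plan_reviews(date(2018, 1, 8), date(2018, 6, 1)):
    print(session.isoformat())
```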

It seems teachers would greatly appreciate a set of guidelines for how to implement spacing in the curriculum in the most effective and efficient manner. While the cognitive field has made great advances in terms of understanding the mechanisms behind spacing, what teachers need more of are concrete evidence-based tools and guidelines for direct implementation in the classroom. These could include more sophisticated and experimentally tested versions of the software described above (Weinstein-Jones & Weinstein, 2017), or adaptable templates of spaced curricula. Moreover, researchers need to evaluate the effectiveness of these tools in a real classroom environment, over a semester or academic year, in order to give pedagogically relevant evidence-based recommendations to teachers.

Interleaving

Another scheduling technique that has been shown to increase learning is interleaving. Interleaving occurs when different ideas or problem types are tackled in a sequence, as opposed to the more common method of attempting multiple versions of the same problem in a given study session (known as blocking). Interleaving as a principle can be applied in many different ways. One such way involves interleaving different types of problems during learning, which is particularly applicable to subjects such as math and physics (see Fig. 2a for an example with fractions, based on a study by Patel, Liu, & Koedinger, 2016). For example, in a study with college students, Rohrer and Taylor (2007) found that shuffling math problems that involved calculating the volume of different shapes resulted in better test performance 1 week later than when students answered multiple problems about the same type of shape in a row. This pattern of results has also been replicated with younger students, for example 7th-grade students learning to solve graph and slope problems (Rohrer, Dedrick, & Stershic, 2015). The proposed explanation for the benefit of interleaving is that switching between different problem types allows students to acquire the ability to choose the right method for solving different types of problems, rather than learning only the method itself without learning when to apply it.

Do the benefits of interleaving extend beyond problem solving? The answer appears to be yes. Interleaving can be helpful in other situations that require discrimination, such as inductive learning. Kornell and Bjork ( 2008 ) examined the effects of interleaving in a task that might be pertinent to a student of the history of art: the ability to match paintings to their respective painters. Students who studied different painters’ paintings interleaved at study were more successful on a later identification test than were participants who studied the paintings blocked by painter. Birnbaum, Kornell, Bjork, and Bjork ( 2013 ) proposed the discriminative-contrast hypothesis to explain that interleaving enhances learning by allowing the comparison between exemplars of different categories. They found support for this hypothesis in a set of experiments with bird categorization: participants benefited from interleaving and also from spacing, but not when the spacing interrupted side-by-side comparisons of birds from different categories.

Another type of interleaving involves the interleaving of study and test opportunities. This type of interleaving has been applied, once again, to problem solving, whereby students alternate between attempting a problem and viewing a worked example (Trafton & Reiser, 1993 ); this pattern appears to be superior to answering a string of problems in a row, at least with respect to the amount of time it takes to achieve mastery of a procedure (Corbett, Reed, Hoffmann, MacLaren, & Wagner, 2010 ). The benefits of interleaving study and test opportunities – rather than blocking study followed by attempting to answer problems or questions – might arise due to a process known as “test-potentiated learning”. That is, a study opportunity that immediately follows a retrieval attempt may be more fruitful than when that same studying was not preceded by retrieval (Arnold & McDermott, 2013 ).

For problem-based subjects, the interleaving technique is straightforward: simply mix questions on homework and quizzes with previous materials (which takes care of spacing as well); for languages, mix vocabulary themes rather than blocking by theme (Thomson & Mehring, 2016 ). But interleaving as an educational strategy ought to be presented to teachers with some caveats. Research has focused on interleaving material that is somewhat related (e.g., solving different mathematical equations, Rohrer et al., 2015 ), whereas students sometimes ask whether they should interleave material from different subjects – a practice that has not received empirical support (Hausman & Kornell, 2014 ). When advising students how to study independently, teachers should thus proceed with caution. Since it is easy for younger students to confuse this type of unhelpful interleaving with the more helpful interleaving of related information, it may be best for teachers of younger grades to create opportunities for interleaving in homework and quiz assignments rather than putting the onus on the students themselves to make use of the technique. Technology can be very helpful here, with apps such as Quizlet, Memrise, Anki, Synap, Quiz Champ, and many others (see also “Learning Scientists”, 2017 ) that not only allow instructor-created quizzes to be taken by students, but also provide built-in interleaving algorithms so that the burden does not fall on the teacher or the student to carefully plan which items are interleaved when.
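
As a sketch of what such an algorithm might do, the snippet below builds a homework set by drawing problems from several related topics (including earlier ones, which adds spacing) while avoiding two consecutive problems of the same type. The topics and items are placeholders, and this is not the algorithm of any particular app.

```python
import random

# Sketch of one way to interleave a homework set: draw from several related
# topics while avoiding two consecutive problems of the same type. The topics
# and problems are placeholders, not any particular app's algorithm.

def interleave(problems_by_topic, n_items, rng=random.Random(42)):
    """Pick n_items problems, avoiding back-to-back repeats of a topic."""
    sequence, last_topic = [], None
    pools = {t: list(ps) for t, ps in problems_by_topic.items() if ps}
    while pools and len(sequence) < n_items:
        # exclude the previous topic unless it is the only one left
        candidates = [t for t in pools if t != last_topic] or list(pools)
        topic = rng.choice(candidates)
        sequence.append(pools[topic].pop(0))
        if not pools[topic]:
            del pools[topic]
        last_topic = topic
    return sequence

homework = interleave({
    "fractions: multiply": ["3/4 x 2/3 = ?", "1/2 x 5/6 = ?"],
    "fractions: divide": ["3/4 / 2/3 = ?", "1/2 / 5/6 = ?"],
    "fractions: add": ["3/4 + 2/3 = ?", "1/2 + 5/6 = ?"],
}, n_items=6)
print("\n".join(homework))
```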

An important point to consider is that in educational practice, the distinction between spacing and interleaving can be difficult to delineate. The gap between the scientific and classroom definitions of interleaving is demonstrated by teachers’ own writings about this technique. When they write about interleaving, teachers often extend the term to connote a curriculum that involves returning to topics multiple times throughout the year (e.g., Kirby, 2014 ; see “Learning Scientists” ( 2016a ) for a collection of similar blog posts by several other teachers). The “interleaving” of topics throughout the curriculum produces an effect that is more akin to what cognitive psychologists call “spacing” (see Fig.  2 b for a visual representation of the difference between interleaving and spacing). However, cognitive psychologists have not examined the effects of structuring the curriculum in this way, and open questions remain: does repeatedly circling back to previous topics throughout the semester interrupt the learning of new information? What are some effective techniques for interleaving old and new information within one class? And how does one determine the balance between old and new information?

Retrieval practice

While tests are most often used in educational settings for assessment, a lesser-known benefit of tests is that they actually improve memory of the tested information. If we think of our memories as libraries of information, then it may seem surprising that retrieval (which happens when we take a test) improves memory; however, we know from a century of research that retrieving knowledge actually strengthens it (see Karpicke, Lehman, & Aue, 2014 ). Testing was shown to strengthen memory as early as 100 years ago (Gates, 1917 ), and there has been a surge of research in the last decade on the mnemonic benefits of testing, or retrieval practice . Most of the research on the effectiveness of retrieval practice has been done with college students (see Roediger & Karpicke, 2006 ; Roediger, Putnam, & Smith, 2011 ), but retrieval-based learning has been shown to be effective at producing learning for a wide range of ages, including preschoolers (Fritz, Morris, Nolan, & Singleton, 2007 ), elementary-aged children (e.g., Karpicke, Blunt, & Smith, 2016 ; Karpicke, Blunt, Smith, & Karpicke, 2014 ; Lipko-Speed, Dunlosky, & Rawson, 2014 ; Marsh, Fazio, & Goswick, 2012 ; Ritchie, Della Sala, & McIntosh, 2013 ), middle-school students (e.g., McDaniel, Thomas, Agarwal, McDermott, & Roediger, 2013 ; McDermott, Agarwal, D’Antonio, Roediger, & McDaniel, 2014 ), and high-school students (e.g., McDermott et al., 2014 ). In addition, the effectiveness of retrieval-based learning has been extended beyond simple testing to other activities in which retrieval practice can be integrated, such as concept mapping (Blunt & Karpicke, 2014 ; Karpicke, Blunt, et al., 2014 ; Ritchie et al., 2013 ).

A debate is currently ongoing as to the effectiveness of retrieval practice for more complex materials (Karpicke & Aue, 2015; Roelle & Berthold, 2017; Van Gog & Sweller, 2015). Practicing retrieval has been shown to improve the application of knowledge to new situations (e.g., Butler, 2010; Dirkx, Kester, & Kirschner, 2014; McDaniel et al., 2013; Smith, Blunt, Whiffen, & Karpicke, 2016; but see Tran, Rohrer, & Pashler, 2015, and Wooldridge, Bugg, McDaniel, & Liu, 2014, for retrieval practice studies that showed limited or no increased transfer compared to restudy). Retrieval practice effects on higher-order learning may be more sensitive than fact learning to encoding factors, such as the way material is presented during study (Eglington & Kang, 2016). In addition, retrieval practice may be more beneficial for higher-order learning if it includes more scaffolding (Fiechter & Benjamin, 2017; but see Smith, Blunt, et al., 2016) and targeted practice with application questions (Son & Rivas, 2016).

How does retrieval practice help memory? Figure  3 illustrates both the direct and indirect benefits of retrieval practice identified by the literature. The act of retrieval itself is thought to strengthen memory (Karpicke, Blunt, et al., 2014 ; Roediger & Karpicke, 2006 ; Smith, Roediger, & Karpicke, 2013 ). For example, Smith et al. ( 2013 ) showed that if students brought information to mind without actually producing it (covert retrieval), they remembered the information just as well as if they overtly produced the retrieved information (overt retrieval). Importantly, both overt and covert retrieval practice improved memory over control groups without retrieval practice, even when feedback was not provided. The fact that bringing information to mind in the absence of feedback or restudy opportunities improves memory leads researchers to conclude that it is the act of retrieval – thinking back to bring information to mind – that improves memory of that information.

The benefit of retrieval practice depends to a certain extent on successful retrieval (see Karpicke, Lehman, et al., 2014 ). For example, in Experiment 4 of Smith et al. ( 2013 ), students successfully retrieved 72% of the information during retrieval practice. Of course, retrieving 72% of the information was compared to a restudy control group, during which students were re-exposed to 100% of the information, creating a bias in favor of the restudy condition. Yet retrieval led to superior memory later compared to the restudy control. However, if retrieval success is extremely low, then it is unlikely to improve memory (e.g., Karpicke, Blunt, et al., 2014 ), particularly in the absence of feedback. On the other hand, if retrieval-based learning situations are constructed in such a way that ensures high levels of success, the act of bringing the information to mind may be undermined, thus making it less beneficial. For example, if a student reads a sentence and then immediately covers the sentence and recites it out loud, they are likely not retrieving the information but rather just keeping the information in their working memory long enough to recite it again (see Smith, Blunt, et al., 2016 for a discussion of this point). Thus, it is important to balance success of retrieval with overall difficulty in retrieving the information (Smith & Karpicke, 2014 ; Weinstein, Nunes, & Karpicke, 2016 ). If initial retrieval success is low, then feedback can help improve the overall benefit of practicing retrieval (Kang, McDermott, & Roediger, 2007 ; Smith & Karpicke, 2014 ). Kornell, Klein, and Rawson ( 2015 ), however, found that it was the retrieval attempt and not the correct production of information that produced the retrieval practice benefit – as long as the correct answer was provided after an unsuccessful attempt, the benefit was the same as for a successful retrieval attempt in this set of studies. From a practical perspective, it would be helpful for teachers to know when retrieval attempts in the absence of success are helpful, and when they are not. There may also be additional reasons beyond retrieval benefits that would push teachers towards retrieval practice activities that produce some success amongst students; for example, teachers may hesitate to give students retrieval practice exercises that are too difficult, as this may negatively affect self-efficacy and confidence.

In addition to the fact that bringing information to mind directly improves memory for that information, engaging in retrieval practice can produce indirect benefits as well (see Roediger et al., 2011 ). For example, research by Weinstein, Gilmore, Szpunar, and McDermott ( 2014 ) demonstrated that when students expected to be tested, the increased test expectancy led to better-quality encoding of new information. Frequent testing can also serve to decrease mind-wandering – that is, thoughts that are unrelated to the material that students are supposed to be studying (Szpunar, Khan, & Schacter, 2013 ).

Practicing retrieval is a powerful way to improve meaningful learning of information, and it is relatively easy to implement in the classroom. For example, requiring students to practice retrieval can be as simple as asking students to put their class materials away and try to write out everything they know about a topic. Retrieval-based learning strategies are also flexible. Instructors can give students practice tests (e.g., short-answer or multiple-choice; see Smith & Karpicke, 2014), provide open-ended prompts for the students to recall information (e.g., Smith, Blunt, et al., 2016), or ask their students to create concept maps from memory (e.g., Blunt & Karpicke, 2014). In one study, Weinstein et al. (2016) looked at the effectiveness of inserting simple short-answer questions into online learning modules to see whether they improved student performance. Weinstein and colleagues also manipulated the placement of the questions. For some students, the questions were interspersed throughout the module, and for other students the questions were all presented at the end of the module. Initial success on the short-answer questions was higher when the questions were interspersed throughout the module. However, on a later test of learning from that module, the original placement of the questions did not matter for performance: both groups of students who answered questions performed better on the delayed test than a control group that had no question opportunities during the module. (As with spaced practice, where the optimal gap between study sessions is contingent on the retention interval, the optimal difficulty and level of success during retrieval practice may also depend on the retention interval.) Thus, the important thing is for instructors to provide opportunities for retrieval practice during learning. Based on previous research, any activity that promotes the successful retrieval of information should improve learning.
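
For instance, a low-stakes short-answer check of the kind that could be interspersed through an online module, as in Weinstein et al. (2016), might look like the minimal sketch below. The questions and the exact-match scoring are placeholders; real items would need more forgiving answer matching and, ideally, spaced repeats.

```python
# Minimal sketch of a low-stakes, short-answer retrieval check that could be
# interspersed through an online module, in the spirit of Weinstein et al.
# (2016). The questions and the exact-match scoring rule are placeholders.

QUESTIONS = [
    ("What is the term for spreading study sessions out over time?",
     "spaced practice"),
    ("What practice involves bringing learned information to mind?",
     "retrieval practice"),
]

def run_quiz(questions):
    correct = 0
    for prompt, answer in questions:
        response = input(prompt + " ").strip().lower()
        if response == answer:  # naive scoring; real items need a rubric
            correct += 1
            print("Correct!")
        else:
            print(f"Feedback: the answer is '{answer}'.")  # feedback helps when success is low
    print(f"Score: {correct}/{len(questions)} (low stakes: not graded)")

if __name__ == "__main__":
    run_quiz(QUESTIONS)
```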

Retrieval practice has received a lot of attention in teacher blogs (see “Learning Scientists” ( 2016b ) for a collection). A common theme seems to be an emphasis on low-stakes (Young, 2016 ) and even no-stakes (Cox, 2015 ) testing, the goal of which is to increase learning rather than assess performance. In fact, one well-known charter school in the UK has an official homework policy grounded in retrieval practice: students are to test themselves on subject knowledge for 30 minutes every day in lieu of standard homework (Michaela Community School, 2014 ). The utility of homework, particularly for younger children, is often a hotly debated topic outside of academia (e.g., Shumaker, 2016 ; but see Jones ( 2016 ) for an opposing viewpoint and Cooper ( 1989 ) for the original research the blog posts were based on). Whereas some research shows clear links between homework and academic achievement (Valle et al., 2016 ), other researchers have questioned the effectiveness of homework (Dettmers, Trautwein, & Lüdtke, 2009 ). Perhaps amending homework to involve retrieval practice might make it more effective; this remains an open empirical question.

One final consideration is that of test anxiety. While retrieval practice can be very powerful at improving memory, some research shows that pressure during retrieval can undermine some of the learning benefit. For example, Hinze and Rapp ( 2014 ) manipulated pressure during quizzing to create high-pressure and low-pressure conditions. On the quizzes themselves, students performed equally well. However, those in the high-pressure condition did not perform as well on a criterion test later compared to the low-pressure group. Thus, test anxiety may reduce the learning benefit of retrieval practice. Eliminating all high-pressure tests is probably not possible, but instructors can provide a number of low-stakes retrieval opportunities for students to help increase learning. The use of low-stakes testing can serve to decrease test anxiety (Khanna, 2015 ), and has recently been shown to negate the detrimental impact of stress on learning (Smith, Floerke, & Thomas, 2016 ). This is a particularly important line of inquiry to pursue for future research, because many teachers who are not familiar with the effectiveness of retrieval practice may be put off by the implied pressure of “testing”, which evokes the much maligned high-stakes standardized tests (e.g., McHugh, 2013 ).

Elaboration

Elaboration involves connecting new information to pre-existing knowledge. Anderson (1983, p. 285) made the following claim about elaboration: “One of the most potent manipulations that can be performed in terms of increasing a subject’s memory for material is to have the subject elaborate on the to-be-remembered material.” Postman (1976, p. 28) defined elaboration most parsimoniously as “additions to nominal input”, and Hirshman (2001, p. 4369) provided an elaboration on this definition (pun intended!), defining elaboration as “A conscious, intentional process that associates to-be-remembered information with other information in memory.” However, in practice, elaboration could mean many different things. The common thread in all the definitions is that elaboration involves adding features to an existing memory.

One possible instantiation of elaboration is thinking about information on a deeper level. The levels (or “depth”) of processing framework, proposed by Craik and Lockhart (1972), predicts that information will be remembered better if it is processed more deeply in terms of meaning, rather than shallowly in terms of form. The levels of processing framework has, however, received a number of criticisms (Craik, 2002). One major problem with this framework is that it is difficult to measure “depth”. And if we are not able to actually measure depth, then the argument can become circular: is it that something was remembered better because it was studied more deeply, or do we conclude that it must have been studied more deeply because it is remembered better? (See Lockhart & Craik, 1990, for further discussion of this issue.)

Another mechanism by which elaboration can confer a benefit to learning is via improvement in organization (Bellezza, Cheesman, & Reddy, 1977 ; Mandler, 1979 ). By this view, elaboration involves making information more integrated and organized with existing knowledge structures. By connecting and integrating the to-be-learned information with other concepts in memory, students can increase the extent to which the ideas are organized in their minds, and this increased organization presumably facilitates the reconstruction of the past at the time of retrieval.

Elaboration is such a broad term and can include so many different techniques that it is hard to claim that elaboration will always help learning. There is, however, a specific technique under the umbrella of elaboration for which there is relatively strong evidence in terms of effectiveness (Dunlosky et al., 2013 ; Pashler et al., 2007 ). This technique is called elaborative interrogation, and involves students questioning the materials that they are studying (Pressley, McDaniel, Turnure, Wood, & Ahmad, 1987 ). More specifically, students using this technique would ask “how” and “why” questions about the concepts they are studying (see Fig.  4 for an example on the physics of flight). Then, crucially, students would try to answer these questions – either from their materials or, eventually, from memory (McDaniel & Donnelly, 1996 ). The process of figuring out the answer to the questions – with some amount of uncertainty (Overoye & Storm, 2015 ) – can help learning. When using this technique, however, it is important that students check their answers with their materials or with the teacher; when the content generated through elaborative interrogation is poor, it can actually hurt learning (Clinton, Alibali, & Nathan, 2016 ).
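
One low-tech way to scaffold this technique is a prompt sheet that pairs each to-be-studied idea with generic “how” and “why” questions, as in the sketch below. The prompt templates and example topics are placeholders of our own, not materials from the elaborative interrogation literature.

```python
# Sketch of an elaborative interrogation prompt sheet. The templates and the
# example topics are placeholders, not materials from the published studies.

PROMPTS = [
    "How does {} work?",
    "Why does {} happen?",
    "How does {} relate to what I already know?",
]

def interrogation_sheet(topics):
    """Print how/why prompts for each topic, to be answered and then checked."""
    for topic in topics:
        print(f"Topic: {topic}")
        for template in PROMPTS:
            print("  - " + template.format(topic))
        print("  (Answer from memory, then check against your materials.)\n")

interrogation_sheet(["lift on an airplane wing", "downwash behind the wings"])
```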

Students can also be encouraged to self-explain concepts to themselves while learning (Chi, De Leeuw, Chiu, & LaVancher, 1994 ). This might involve students simply saying out loud what steps they need to perform to solve an equation. Aleven and Koedinger ( 2002 ) conducted two classroom studies in which students were either prompted by a “cognitive tutor” to provide self-explanations during a problem-solving task or not, and found that the self-explanations led to improved performance. According to the authors, this approach could scale well to real classrooms. If possible and relevant, students could even perform actions alongside their self-explanations (Cohen, 1981 ; see also the enactment effect, Hainselin, Picard, Manolli, Vankerkore-Candas, & Bourdin, 2017 ). Instructors can scaffold students in these types of activities by providing self-explanation prompts throughout to-be-learned material (O’Neil et al., 2014 ). Ultimately, the greatest potential benefit of accurate self-explanation or elaboration is that the student will be able to transfer their knowledge to a new situation (Rittle-Johnson, 2006 ).

The technical term “elaborative interrogation” has not made it into the vernacular of educational bloggers (a search on https://educationechochamberuncut.wordpress.com , which consolidates over 3,000 UK-based teacher blogs, yielded zero results for that term). However, a few teachers have blogged about elaboration more generally (e.g., Hobbiss, 2016 ) and deep questioning specifically (e.g., Class Teaching, 2013 ), just without using the specific terminology. This strategy in particular may benefit from a more open dialog between researchers and teachers to facilitate the use of elaborative interrogation in the classroom and to address possible barriers to implementation. In terms of advancing the scientific understanding of elaborative interrogation in a classroom setting, it would be informative to conduct a larger-scale intervention to see whether having students elaborate during reading actually helps their understanding. It would also be useful to know whether the students really need to generate their own elaborative interrogation (“how” and “why”) questions, versus answering questions provided by others. How long should students persist to find the answers? When is the right time to have students engage in this task, given the levels of expertise required to do it well (Clinton et al., 2016 )? Without knowing the answers to these questions, it may be too early for us to instruct teachers to use this technique in their classes. Finally, elaborative interrogation takes a long time. Is this time efficiently spent? Or, would it be better to have the students try to answer a few questions, pool their information as a class, and then move to practicing retrieval of the information?

Concrete examples

Providing supporting information can improve the learning of key ideas and concepts. Specifically, using concrete examples to supplement content that is more conceptual in nature can make the ideas easier to understand and remember. Concrete examples can provide several advantages to the learning process: (a) they can concisely convey information, (b) they can provide students with more concrete information that is easier to remember, and (c) they can take advantage of the superior memorability of pictures relative to words (see “Dual Coding”).

Concrete words, such as “button”, are both recognized and recalled better than abstract words, such as “bound” (Gorman, 1961). Furthermore, it has been demonstrated that information that is more concrete and imageable enhances the learning of associations, even with abstract content (Caplan & Madan, 2016; Madan, Glaholt, & Caplan, 2010; Paivio, 1971). Following from this, providing concrete examples during instruction should improve retention of related abstract concepts, rather than the concrete examples alone being remembered better. Concrete examples can be useful both during instruction and during practice problems. Having students actively explain how two examples are similar and encouraging them to extract the underlying structure on their own can also help with transfer. In a laboratory study, Berry (1983) demonstrated that students performed well when given concrete practice problems, regardless of the use of verbalization (akin to elaborative interrogation), but that verbalization helped students transfer understanding from concrete to abstract problems. One particularly important area of future research is determining how students can best make the link between concrete examples and abstract ideas.

Since abstract concepts are harder to grasp than concrete information (Paivio, Walsh, & Bons, 1994), it follows that teachers ought to illustrate abstract ideas with concrete examples. However, care must be taken when selecting the examples. LeFevre and Dixon (1986) provided students with both concrete examples and abstract instructions and found that when these were inconsistent, students followed the concrete examples rather than the abstract instructions, potentially constraining the application of the abstract concept being taught. Lew, Fukawa-Connelly, Mejía-Ramos, and Weber (2016) used an interview approach to examine why students may have difficulty understanding a lecture. Responses indicated that some issues were related to understanding the overarching topic rather than the component parts, and to the use of informal colloquialisms that did not clearly follow from the material being taught. Both of these issues could have potentially been addressed through the inclusion of a greater number of relevant concrete examples.

One concern with using concrete examples is that students might only remember the examples – especially if they are particularly memorable, such as fun or gimmicky examples – and will not be able to transfer their understanding from one example to another, or more broadly to the abstract concept. However, there does not seem to be any evidence that fun, relevant examples actually hurt learning by harming memory for important information. Instead, fun examples and jokes tend to be more memorable, but this boost in memory for the joke does not seem to come at a cost to memory for the underlying concept (Baldassari & Kelley, 2012). However, two important caveats need to be highlighted. First, to the extent that the more memorable content is not relevant to the concepts of interest, learning of the target information can be compromised (Harp & Mayer, 1998). Thus, care must be taken to ensure that all examples and gimmicks are, in fact, related to the core concepts that the students need to acquire, and do not contain irrelevant perceptual features (Kaminski & Sloutsky, 2013).

The second issue is that novices often notice and remember the surface details of an example rather than the underlying structure. Experts, on the other hand, can extract the underlying structure from examples that have divergent surface features (Chi, Feltovich, & Glaser, 1981 ; see Fig.  5 for an example from physics). Gick and Holyoak ( 1983 ) tried to get students to apply a rule from one problem to another problem that appeared different on the surface, but was structurally similar. They found that providing multiple examples helped with this transfer process compared to only using one example – especially when the examples provided had different surface details. More work is also needed to determine how many examples are sufficient for generalization to occur (and this, of course, will vary with contextual factors and individual differences). Further research on the continuum between concrete/specific examples and more abstract concepts would also be informative. That is, if an example is not concrete enough, it may be too difficult to understand. On the other hand, if the example is too concrete, that could be detrimental to generalization to the more abstract concept (although a diverse set of very concrete examples may be able to help with this). In fact, in a controversial article, Kaminski, Sloutsky, and Heckler ( 2008 ) claimed that abstract examples were more effective than concrete examples. Later rebuttals of this paper contested whether the abstract versus concrete distinction was clearly defined in the original study (see Reed, 2008 , for a collection of letters on the subject). This ideal point along the concrete-abstract continuum might also interact with development.

Finding teacher blog posts on concrete examples proved to be more difficult than for the other strategies in this review. One optimistic possibility is that teachers frequently use concrete examples in their teaching, and thus do not think of this as a specific contribution from cognitive psychology; the one blog post we were able to find that discussed concrete examples suggests that this might be the case (Boulton, 2016 ). The idea of “linking abstract concepts with concrete examples” is also covered in 25% of teacher-training textbooks used in the US, according to the report by Pomerance et al. ( 2016 ); this is the second most frequently covered of the six strategies, after “posing probing questions” (i.e., elaborative interrogation). A useful direction for future research would be to establish how teachers are using concrete examples in their practice, and whether we can make any suggestions for improvement based on research into the science of learning. For example, if two examples are better than one (Bauernschmidt, 2017 ), are additional examples also needed, or are there diminishing returns from providing more examples? And, how can teachers best ensure that concrete examples are consistent with prior knowledge (Reed, 2008 )?

Dual coding

Both the memory literature and folk psychology support the notion that visual examples are beneficial – as the adage goes, “a picture is worth a thousand words” (traced back to an advertising slogan from the 1920s; Mieder, 1990). Indeed, it is well understood that more information can be conveyed through a simple illustration than through several paragraphs of text (e.g., Barker & Manji, 1989; Mayer & Gallini, 1990). Illustrations can be particularly helpful when the described concept involves several parts or steps and is intended for individuals with low prior knowledge (Eitel & Scheiter, 2015; Mayer & Gallini, 1990). Figure 6 provides a concrete example of this, illustrating how information can flow through neurons and synapses.

In addition to being able to convey information more succinctly, pictures are also more memorable than words (Paivio & Csapo, 1969 , 1973 ). In the memory literature, this is referred to as the picture superiority effect , and dual coding theory was developed in part to explain this effect. Dual coding follows from the notion of text being accompanied by complementary visual information to enhance learning. Paivio ( 1971 , 1986 ) proposed dual coding theory as a mechanistic account for the integration of multiple information “codes” to process information. In this theory, a code corresponds to a modal or otherwise distinct representation of a concept—e.g., “mental images for ‘book’ have visual, tactual, and other perceptual qualities similar to those evoked by the referent objects on which the images are based” (Clark & Paivio, 1991 , p. 152). Aylwin ( 1990 ) provides a clear example of how the word “dog” can evoke verbal, visual, and enactive representations (see Fig.  7 for a similar example for the word “SPOON”, based on Aylwin, 1990 (Fig.  2 ) and Madan & Singhal, 2012a (Fig.  3 )). Codes can also correspond to emotional properties (Clark & Paivio, 1991 ; Paivio, 2013 ). Clark and Paivio ( 1991 ) provide a thorough review of dual coding theory and its relation to education, while Paivio ( 2007 ) provides a comprehensive treatise on dual coding theory. Broadly, dual coding theory suggests that providing multiple representations of the same information enhances learning and memory, and that information that more readily evokes additional representations (through automatic imagery processes) receives a similar benefit.

Paivio and Csapo ( 1973 ) suggest that verbal and imaginal codes have independent and additive effects on memory recall. Using visuals to improve learning and memory has been particularly applied to vocabulary learning (Danan, 1992 ; Sadoski, 2005 ), but has also shown success in other domains such as in health care (Hartland, Biddle, & Fallacaro, 2008 ). To take advantage of dual coding, verbal information should be accompanied by a visual representation when possible. However, while the studies discussed all indicate that the use of multiple representations of information is favorable, it is important to acknowledge that each representation also increases cognitive load and can lead to over-saturation (Mayer & Moreno, 2003 ).

Given that pictures are generally remembered better than words, it is important to ensure that the pictures students are provided with are helpful and relevant to the content they are expected to learn. McNeill, Uttal, Jarvin, and Sternberg (2009) found that providing visual examples decreased conceptual errors. However, McNeill et al. also found that when students were given visually rich examples, they performed more poorly than students who were not given any visual example, suggesting that the visual details can at times become a distraction and hinder performance. Thus, it is important to ensure that images used in teaching are clear and unambiguous in their meaning (Schwartz, 2007).

Further broadening the scope of dual coding theory, Engelkamp and Zimmer (1984) suggest that motor movements, such as “turning the handle”, can provide an additional motor code that can improve memory, linking studies of motor actions (enactment) with dual coding theory (Clark & Paivio, 1991; Engelkamp & Cohen, 1991; Madan & Singhal, 2012c). Indeed, enactment effects appear to primarily occur during learning, rather than during retrieval (Peterson & Mulligan, 2010). Along similar lines, Wammes, Meade, and Fernandes (2016) demonstrated that generating drawings can provide memory benefits beyond what could otherwise be explained by visual imagery, picture superiority, and other memory-enhancing effects. Providing convergent evidence, even when overt motor actions are not critical in themselves, words representing functional objects have been shown to enhance later memory (Madan & Singhal, 2012b; Montefinese, Ambrosini, Fairfield, & Mammarella, 2013). This indicates that motoric processes can improve memory in much the same way as visual imagery does, paralleling the memory differences between concrete and abstract words. Further research suggests that automatic motor simulation for functional objects is likely responsible for this memory benefit (Madan, Chen, & Singhal, 2016).

When teachers combine visuals and words in their educational practice, however, they may not always be taking advantage of dual coding – at least, not in the optimal manner. For example, a recent discussion on Twitter centered around one teacher’s decision to have 7th-grade students replace certain words in their science laboratory report with a picture of that word (e.g., the instructions read “using a syringe …” and a picture of a syringe replaced the word; Turner, 2016a). Other teachers argued that this was not dual coding (Beaven, 2016; Williams, 2016), because there were no longer two different representations of the information. The first teacher maintained that dual coding was preserved, because this laboratory report with pictures was to be used alongside the original, fully verbal report (Turner, 2016b). This particular implementation – having students replace individual words with pictures – has not been examined in the cognitive literature, presumably because no benefit would be expected. In any case, we need to be clearer about implementations for dual coding, and more research is needed to clarify how teachers can make use of the benefits conferred by multiple representations and picture superiority.

Critically, dual coding theory is distinct from the notion of “learning styles,” which describe the idea that individuals benefit from instruction that matches their modality preference. While this idea is pervasive and individuals often subjectively feel that they have a preference, evidence indicates that the learning styles theory is not supported by empirical findings (e.g., Kavale, Hirshoren, & Forness, 1998 ; Pashler, McDaniel, Rohrer, & Bjork, 2008 ; Rohrer & Pashler, 2012 ). That is, there is no evidence that instructing students in their preferred learning style leads to an overall improvement in learning (the “meshing” hypothesis). Moreover, learning styles have come to be described as a myth or urban legend within psychology (Coffield, Moseley, Hall, & Ecclestone, 2004 ; Hattie & Yates, 2014 ; Kirschner & van Merriënboer, 2013 ; Kirschner, 2017 ); skepticism about learning styles is a common stance amongst evidence-informed teachers (e.g., Saunders, 2016 ). Providing evidence against the notion of learning styles, Kraemer, Rosenberg, and Thompson-Schill ( 2009 ) found that individuals who scored as “verbalizers” and “visualizers” did not perform any better on experimental trials matching their preference. Instead, it has recently been shown that learning through one’s preferred learning style is associated with elevated subjective judgements of learning, but not objective performance (Knoll, Otani, Skeel, & Van Horn, 2017 ). In contrast to learning styles, dual coding is based on providing additional, complementary forms of information to enhance learning, rather than tailoring instruction to individuals’ preferences.

Genuine educational environments present many opportunities for combining the strategies outlined above. Spacing can be particularly potent for learning if it is combined with retrieval practice. The additive benefits of retrieval practice and spacing can be gained by engaging in retrieval practice multiple times (also known as distributed practice; see Cepeda et al., 2006 ). Interleaving naturally entails spacing if students interleave old and new material. Concrete examples can be both verbal and visual, making use of dual coding. In addition, the strategies of elaboration, concrete examples, and dual coding all work best when used as part of retrieval practice. For example, in the concept-mapping studies mentioned above (Blunt & Karpicke, 2014 ; Karpicke, Blunt, et al., 2014 ), creating concept maps while looking at course materials (e.g., a textbook) was not as effective for later memory as creating concept maps from memory. When practicing elaborative interrogation, students can start off answering the “how” and “why” questions they pose for themselves using class materials, and work their way up to answering them from memory. And when interleaving different problem types, students should be practicing answering them rather than just looking over worked examples.

But while these ideas for strategy combinations have empirical bases, it has not yet been established whether the benefits of the strategies to learning are additive, super-additive, or, in some cases, incompatible. Thus, future research needs to (a) better formalize the definition of each strategy (particularly critical for elaboration and dual coding), (b) identify best practices for implementation in the classroom, (c) delineate the boundary conditions of each strategy, and (d) strategically investigate interactions between the six strategies we outlined in this manuscript.

Aleven, V. A., & Koedinger, K. R. (2002). An effective metacognitive strategy: learning by doing and explaining with a computer-based cognitive tutor. Cognitive Science, 26 , 147–179.

Anderson, J. R. (1983). A spreading activation theory of memory. Journal of Verbal Learning and Verbal Behavior, 22 , 261–295.

Arnold, K. M., & McDermott, K. B. (2013). Test-potentiated learning: distinguishing between direct and indirect effects of tests. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39 , 940–945.

Aylwin, S. (1990). Imagery and affect: big questions, little answers. In P. J. Thompson, D. E. Marks, & J. T. E. Richardson (Eds.), Imagery: Current developments . New York: International Library of Psychology.

Baldassari, M. J., & Kelley, M. (2012). Make’em laugh? The mnemonic effect of humor in a speech. Psi Chi Journal of Psychological Research, 17 , 2–9.

Barker, P. G., & Manji, K. A. (1989). Pictorial dialogue methods. International Journal of Man-Machine Studies, 31 , 323–347.

Bauernschmidt, A. (2017). GUEST POST: two examples are better than one. [Blog post]. The Learning Scientists Blog . Retrieved from http://www.learningscientists.org/blog/2017/5/30-1 . Accessed 25 Dec 2017.

Beaven, T. (2016). @doctorwhy @FurtherEdagogy @doc_kristy Right, I thought the whole point of dual coding was to use TWO codes: pics + words of the SAME info? [Tweet]. Retrieved from https://twitter.com/TitaBeaven/status/807504041341308929 . Accessed 25 Dec 2017.

Bellezza, F. S., Cheesman, F. L., & Reddy, B. G. (1977). Organization and semantic elaboration in free recall. Journal of Experimental Psychology: Human Learning and Memory, 3 , 539–550.

Benney, D. (2016). (Trying to apply) spacing in a content heavy subject [Blog post]. Retrieved from https://mrbenney.wordpress.com/2016/10/16/trying-to-apply-spacing-in-science/ . Accessed 25 Dec 2017.

Berry, D. C. (1983). Metacognitive experience and transfer of logical reasoning. Quarterly Journal of Experimental Psychology, 35A , 39–49.

Birnbaum, M. S., Kornell, N., Bjork, E. L., & Bjork, R. A. (2013). Why interleaving enhances inductive learning: the roles of discrimination and retrieval. Memory & Cognition, 41 , 392–402.

Bjork, R. A. (1999). Assessing our own competence: heuristics and illusions. In D. Gopher & A. Koriat (Eds.), Attention and performance XVII. Cognitive regulation of performance: Interaction of theory and application (pp. 435–459). Cambridge, MA: MIT Press.

Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185–205). Cambridge, MA: MIT Press.

Bjork, R. A., & Bjork, E. L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. From learning processes to cognitive processes: Essays in honor of William K. Estes, 2 , 35–67.

Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way: creating desirable difficulties to enhance learning. Psychology and the real world: Essays illustrating fundamental contributions to society , 56–64.

Blunt, J. R., & Karpicke, J. D. (2014). Learning with retrieval-based concept mapping. Journal of Educational Psychology, 106 , 849–858.

Boulton, K. (2016). What does cognitive overload look like in the humanities? [Blog post]. Retrieved from https://educationechochamberuncut.wordpress.com/2016/03/05/what-does-cognitive-overload-look-like-in-the-humanities-kris-boulton-2/ . Accessed 25 Dec 2017.

Brown, P. C., Roediger, H. L., & McDaniel, M. A. (2014). Make it stick . Cambridge, MA: Harvard University Press.

Butler, A. C. (2010). Repeated testing produces superior transfer of learning relative to repeated studying. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36 , 1118–1133.

Caplan, J. B., & Madan, C. R. (2016). Word-imageability enhances association-memory by recruiting hippocampal activity. Journal of Cognitive Neuroscience, 28 , 1522–1538.

Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: a review and quantitative synthesis. Psychological Bulletin, 132 , 354–380.

Cepeda, N. J., Vul, E., Rohrer, D., Wixted, J. T., & Pashler, H. (2008). Spacing effects in learning: a temporal ridgeline of optimal retention. Psychological Science, 19 , 1095–1102.

Chi, M. T., De Leeuw, N., Chiu, M. H., & LaVancher, C. (1994). Eliciting self-explanations improves understanding. Cognitive Science, 18 , 439–477.

Chi, M. T., Feltovich, P. J., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5 , 121–152.

CIFE. (2012). No January A level and other changes. Retrieved from http://www.cife.org.uk/cife-general-news/no-january-a-level-and-other-changes/ . Accessed 25 Dec 2017.

Clark, D. (2016). One book on learning that every teacher, lecturer & trainer should read (7 reasons) [Blog post]. Retrieved from http://donaldclarkplanb.blogspot.com/2016/03/one-book-on-learning-that-every-teacher.html . Accessed 25 Dec 2017.

Clark, J. M., & Paivio, A. (1991). Dual coding theory and education. Educational Psychology Review, 3 , 149–210.

Class Teaching. (2013). Deep questioning [Blog post]. Retrieved from https://classteaching.wordpress.com/2013/07/12/deep-questioning/ . Accessed 25 Dec 2017.

Clinton, V., Alibali, M. W., & Nathan, M. J. (2016). Learning about posterior probability: do diagrams and elaborative interrogation help? The Journal of Experimental Education, 84 , 579–599.

Coffield, F., Moseley, D., Hall, E., & Ecclestone, K. (2004). Learning styles and pedagogy in post-16 learning: a systematic and critical review . London: Learning & Skills Research Centre.

Cohen, R. L. (1981). On the generality of some memory laws. Scandinavian Journal of Psychology, 22 , 267–281.

Cooper, H. (1989). Synthesis of research on homework. Educational Leadership, 47 , 85–91.

Corbett, A. T., Reed, S. K., Hoffmann, R., MacLaren, B., & Wagner, A. (2010). Interleaving worked examples and cognitive tutor support for algebraic modeling of problem situations. In Proceedings of the Thirty-Second Annual Meeting of the Cognitive Science Society (pp. 2882–2887).

Cox, D. (2015). No stakes testing – not telling students their results [Blog post]. Retrieved from https://missdcoxblog.wordpress.com/2015/06/06/no-stakes-testing-not-telling-students-their-results/ . Accessed 25 Dec 2017.

Cox, D. (2016a). Ditch revision. Teach it well [Blog post]. Retrieved from https://missdcoxblog.wordpress.com/2016/01/09/ditch-revision-teach-it-well/ . Accessed 25 Dec 2017.

Cox, D. (2016b). ‘They need to remember this in three years time’: spacing & interleaving for the new GCSEs [Blog post]. Retrieved from https://missdcoxblog.wordpress.com/2016/03/25/they-need-to-remember-this-in-three-years-time-spacing-interleaving-for-the-new-gcses/ . Accessed 25 Dec 2017.

Craik, F. I. (2002). Levels of processing: past, present… future? Memory, 10 , 305–318.

Craik, F. I., & Lockhart, R. S. (1972). Levels of processing: a framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11 , 671–684.

Danan, M. (1992). Reversed subtitling and dual coding theory: new directions for foreign language instruction. Language Learning, 42 , 497–527.

Dettmers, S., Trautwein, U., & Lüdtke, O. (2009). The relationship between homework time and achievement is not universal: evidence from multilevel analyses in 40 countries. School Effectiveness and School Improvement, 20 , 375–405.

Dirkx, K. J., Kester, L., & Kirschner, P. A. (2014). The testing effect for learning principles and procedures from texts. The Journal of Educational Research, 107 , 357–364.

Dunlosky, J. (2013). Strengthening the student toolbox: study strategies to boost learning. American Educator, 37 (3), 12–21.

Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students’ learning with effective learning techniques: promising directions from cognitive and educational psychology. Psychological Science in the Public Interest, 14 , 4–58.

Ebbinghaus, H. (1913). Memory (HA Ruger & CE Bussenius, Trans.). New York: Columbia University, Teachers College. (Original work published 1885) . Retrieved from http://psychclassics.yorku.ca/Ebbinghaus/memory8.htm . Accessed 25 Dec 2017.

Eglington, L. G., & Kang, S. H. (2016). Retrieval practice benefits deductive inference. Educational Psychology Review , 1–14.

Eitel, A., & Scheiter, K. (2015). Picture or text first? Explaining sequential effects when learning with pictures and text. Educational Psychology Review, 27 , 153–180.

Engelkamp, J., & Cohen, R. L. (1991). Current issues in memory of action events. Psychological Research, 53 , 175–182.

Engelkamp, J., & Zimmer, H. D. (1984). Motor programme information as a separable memory unit. Psychological Research, 46 , 283–299.

Fawcett, D. (2013). Can I be that little better at……using cognitive science/psychology/neurology to plan learning? [Blog post]. Retrieved from http://reflectionsofmyteaching.blogspot.com/2013/09/can-i-be-that-little-better-atusing.html . Accessed 25 Dec 2017.

Fiechter, J. L., & Benjamin, A. S. (2017). Diminishing-cues retrieval practice: a memory-enhancing technique that works when regular testing doesn’t. Psychonomic Bulletin & Review , 1–9.

Firth, J. (2016). Spacing in teaching practice [Blog post]. Retrieved from http://www.learningscientists.org/blog/2016/4/12-1 . Accessed 25 Dec 2017.

Fordham, M. [mfordhamhistory]. (2016). Is there a meaningful distinction in psychology between ‘thinking’ & ‘critical thinking’? [Tweet]. Retrieved from https://twitter.com/mfordhamhistory/status/809525713623781377 . Accessed 25 Dec 2017.

Fritz, C. O., Morris, P. E., Nolan, D., & Singleton, J. (2007). Expanding retrieval practice: an effective aid to preschool children’s learning. The Quarterly Journal of Experimental Psychology, 60 , 991–1004.

Gates, A. I. (1917). Recitation as a factor in memorizing. Archives of Psychology, 6.

Gick, M. L., & Holyoak, K. J. (1983). Schema induction and analogical transfer. Cognitive Psychology, 15 , 1–38.

Gorman, A. M. (1961). Recognition memory for nouns as a function of abstractedness and frequency. Journal of Experimental Psychology, 61 , 23–39.

Hainselin, M., Picard, L., Manolli, P., Vankerkore-Candas, S., & Bourdin, B. (2017). Hey teacher, don’t leave them kids alone: action is better for memory than reading. Frontiers in Psychology , 8 .

Harp, S. F., & Mayer, R. E. (1998). How seductive details do their damage. Journal of Educational Psychology, 90 , 414–434.

Hartland, W., Biddle, C., & Fallacaro, M. (2008). Audiovisual facilitation of clinical knowledge: A paradigm for dispersed student education based on Paivio’s dual coding theory. AANA Journal, 76 , 194–198.

Hattie, J., & Yates, G. (2014). Visible learning and the science of how we learn . New York: Routledge.

Hausman, H., & Kornell, N. (2014). Mixing topics while studying does not enhance learning. Journal of Applied Research in Memory and Cognition, 3 , 153–160.

Hinze, S. R., & Rapp, D. N. (2014). Retrieval (sometimes) enhances learning: performance pressure reduces the benefits of retrieval practice. Applied Cognitive Psychology, 28 , 597–606.

Hirshman, E. (2001). Elaboration in memory. In N. J. Smelser & P. B. Baltes (Eds.), International encyclopedia of the social & behavioral sciences (pp. 4369–4374). Oxford: Pergamon.

Hobbiss, M. (2016). Make it meaningful! Elaboration [Blog post]. Retrieved from https://hobbolog.wordpress.com/2016/06/09/make-it-meaningful-elaboration/ . Accessed 25 Dec 2017.

Jones, F. (2016). Homework – is it really that useless? [Blog post]. Retrieved from http://www.learningscientists.org/blog/2016/4/5-1 . Accessed 25 Dec 2017.

Kaminski, J. A., & Sloutsky, V. M. (2013). Extraneous perceptual information interferes with children’s acquisition of mathematical knowledge. Journal of Educational Psychology, 105 (2), 351–363.

Kaminski, J. A., Sloutsky, V. M., & Heckler, A. F. (2008). The advantage of abstract examples in learning math. Science, 320 , 454–455.

Kang, S. H. (2016). Spaced repetition promotes efficient and effective learning: policy implications for instruction. Policy Insights from the Behavioral and Brain Sciences, 3 , 12–19.

Kang, S. H. K., McDermott, K. B., & Roediger, H. L. (2007). Test format and corrective feedback modify the effects of testing on long-term retention. European Journal of Cognitive Psychology, 19 , 528–558.

Karpicke, J. D., & Aue, W. R. (2015). The testing effect is alive and well with complex materials. Educational Psychology Review, 27 , 317–326.

Karpicke, J. D., Blunt, J. R., Smith, M. A., & Karpicke, S. S. (2014). Retrieval-based learning: The need for guided retrieval in elementary school children. Journal of Applied Research in Memory and Cognition, 3 , 198–206.

Karpicke, J. D., Lehman, M., & Aue, W. R. (2014). Retrieval-based learning: an episodic context account. In B. H. Ross (Ed.), Psychology of Learning and Motivation (Vol. 61, pp. 237–284). San Diego, CA: Elsevier Academic Press.

Karpicke, J. D., Blunt, J. R., & Smith, M. A. (2016). Retrieval-based learning: positive effects of retrieval practice in elementary school children. Frontiers in Psychology, 7 .

Kavale, K. A., Hirshoren, A., & Forness, S. R. (1998). Meta-analytic validation of the Dunn and Dunn model of learning-style preferences: a critique of what was Dunn. Learning Disabilities Research & Practice, 13 , 75–80.

Khanna, M. M. (2015). Ungraded pop quizzes: test-enhanced learning without all the anxiety. Teaching of Psychology, 42 , 174–178.

Kirby, J. (2014). One scientific insight for curriculum design [Blog post]. Retrieved from https://pragmaticreform.wordpress.com/2014/05/05/scientificcurriculumdesign/ . Accessed 25 Dec 2017.

Kirschner, P. A. (2017). Stop propagating the learning styles myth. Computers & Education, 106 , 166–171.

Kirschner, P. A., & van Merriënboer, J. J. G. (2013). Do learners really know best? Urban legends in education. Educational Psychologist, 48 , 169–183.

Knoll, A. R., Otani, H., Skeel, R. L., & Van Horn, K. R. (2017). Learning style, judgments of learning, and learning of verbal and visual information. British Journal of Psychology, 108 , 544-563.

Kornell, N., & Bjork, R. A. (2008). Learning concepts and categories: is spacing the “enemy of induction”? Psychological Science, 19 , 585–592.

Kornell, N., & Finn, B. (2016). Self-regulated learning: an overview of theory and data. In J. Dunlosky & S. Tauber (Eds.), The Oxford Handbook of Metamemory (pp. 325–340). New York: Oxford University Press.

Kornell, N., Klein, P. J., & Rawson, K. A. (2015). Retrieval attempts enhance learning, but retrieval success (versus failure) does not matter. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41 , 283–294.

Kraemer, D. J. M., Rosenberg, L. M., & Thompson-Schill, S. L. (2009). The neural correlates of visual and verbal cognitive styles. Journal of Neuroscience, 29 , 3792–3798.

Kraft, N. (2015). Spaced practice and repercussions for teaching. Retrieved from http://nathankraft.blogspot.com/2015/08/spaced-practice-and-repercussions-for.html . Accessed 25 Dec 2017.

Learning Scientists. (2016a). Weekly Digest #3: How teachers implement interleaving in their curriculum [Blog post]. Retrieved from http://www.learningscientists.org/blog/2016/3/28/weekly-digest-3 . Accessed 25 Dec 2017.

Learning Scientists. (2016b). Weekly Digest #13: how teachers implement retrieval in their classrooms [Blog post]. Retrieved from http://www.learningscientists.org/blog/2016/6/5/weekly-digest-13 . Accessed 25 Dec 2017.

Learning Scientists. (2016c). Weekly Digest #40: teachers’ implementation of principles from “Make It Stick” [Blog post]. Retrieved from http://www.learningscientists.org/blog/2016/12/18-1 . Accessed 25 Dec 2017.

Learning Scientists. (2017). Weekly Digest #54: is there an app for that? Studying 2.0 [Blog post]. Retrieved from http://www.learningscientists.org/blog/2017/4/9/weekly-digest-54 . Accessed 25 Dec 2017.

LeFevre, J.-A., & Dixon, P. (1986). Do written instructions need examples? Cognition and Instruction, 3 , 1–30.

Lew, K., Fukawa-Connelly, T., Mejía-Ramos, J. P., & Weber, K. (2016). Lectures in advanced mathematics: Why students might not understand what the mathematics professor is trying to convey. Journal for Research in Mathematics Education, 47 , 162–198.

Lindsey, R. V., Shroyer, J. D., Pashler, H., & Mozer, M. C. (2014). Improving students’ long-term knowledge retention through personalized review. Psychological Science, 25 , 639–647.

Lipko-Speed, A., Dunlosky, J., & Rawson, K. A. (2014). Does testing with feedback help grade-school children learn key concepts in science? Journal of Applied Research in Memory and Cognition, 3 , 171–176.

Lockhart, R. S., & Craik, F. I. (1990). Levels of processing: a retrospective commentary on a framework for memory research. Canadian Journal of Psychology, 44 , 87–112.

Lovell, O. (2017). How do we know what to put on the quiz? [Blog Post]. Retrieved from http://www.ollielovell.com/olliesclassroom/know-put-quiz/ . Accessed 25 Dec 2017.

Luehmann, A. L. (2008). Using blogging in support of teacher professional identity development: a case study. The Journal of the Learning Sciences, 17 , 287–337.

Madan, C. R., Glaholt, M. G., & Caplan, J. B. (2010). The influence of item properties on association-memory. Journal of Memory and Language, 63 , 46–63.

Madan, C. R., & Singhal, A. (2012a). Motor imagery and higher-level cognition: four hurdles before research can sprint forward. Cognitive Processing, 13 , 211–229.

Madan, C. R., & Singhal, A. (2012b). Encoding the world around us: motor-related processing influences verbal memory. Consciousness and Cognition, 21 , 1563–1570.

Madan, C. R., & Singhal, A. (2012c). Using actions to enhance memory: effects of enactment, gestures, and exercise on human memory. Frontiers in Psychology, 3 .

Madan, C. R., Chen, Y. Y., & Singhal, A. (2016). ERPs differentially reflect automatic and deliberate processing of the functional manipulability of objects. Frontiers in Human Neuroscience, 10 .

Mandler, G. (1979). Organization and repetition: organizational principles with special reference to rote learning. In L. G. Nilsson (Ed.), Perspectives on Memory Research (pp. 293–327). New York: Academic Press.

Marsh, E. J., Fazio, L. K., & Goswick, A. E. (2012). Memorial consequences of testing school-aged children. Memory, 20 , 899–906.

Mayer, R. E., & Gallini, J. K. (1990). When is an illustration worth ten thousand words? Journal of Educational Psychology, 82 , 715–726.

Mayer, R. E., & Moreno, R. (2003). Nine ways to reduce cognitive load in multimedia learning. Educational Psychologist, 38 , 43–52.

McDaniel, M. A., & Donnelly, C. M. (1996). Learning with analogy and elaborative interrogation. Journal of Educational Psychology, 88 , 508–519.

McDaniel, M. A., Thomas, R. C., Agarwal, P. K., McDermott, K. B., & Roediger, H. L. (2013). Quizzing in middle-school science: successful transfer performance on classroom exams. Applied Cognitive Psychology, 27 , 360–372.

McDermott, K. B., Agarwal, P. K., D’Antonio, L., Roediger, H. L., & McDaniel, M. A. (2014). Both multiple-choice and short-answer quizzes enhance later exam performance in middle and high school classes. Journal of Experimental Psychology: Applied, 20 , 3–21.

McHugh, A. (2013). High-stakes tests: bad for students, teachers, and education in general [Blog post]. Retrieved from https://teacherbiz.wordpress.com/2013/07/01/high-stakes-tests-bad-for-students-teachers-and-education-in-general/ . Accessed 25 Dec 2017.

McNeill, N. M., Uttal, D. H., Jarvin, L., & Sternberg, R. J. (2009). Should you show me the money? Concrete objects both hurt and help performance on mathematics problems. Learning and Instruction, 19 , 171–184.

Mieder, W. (1990). “A picture is worth a thousand words”: from advertising slogan to American proverb. Southern Folklore, 47 , 207–225.

Michaela Community School. (2014). Homework. Retrieved from http://mcsbrent.co.uk/homework-2/ . Accessed 25 Dec 2017.

Montefinese, M., Ambrosini, E., Fairfield, B., & Mammarella, N. (2013). The “subjective” pupil old/new effect: is the truth plain to see? International Journal of Psychophysiology, 89 , 48–56.

O’Neil, H. F., Chung, G. K., Kerr, D., Vendlinski, T. P., Buschang, R. E., & Mayer, R. E. (2014). Adding self-explanation prompts to an educational computer game. Computers In Human Behavior, 30 , 23–28.

Overoye, A. L., & Storm, B. C. (2015). Harnessing the power of uncertainty to enhance learning. Translational Issues in Psychological Science, 1 , 140–148.

Paivio, A. (1971). Imagery and verbal processes . New York: Holt, Rinehart and Winston.

Paivio, A. (1986). Mental representations: a dual coding approach . New York: Oxford University Press.

Paivio, A. (2007). Mind and its evolution: a dual coding theoretical approach . Mahwah: Erlbaum.

Paivio, A. (2013). Dual coding theory, word abstractness, and emotion: a critical review of Kousta et al. (2011). Journal of Experimental Psychology: General, 142 , 282–287.

Paivio, A., & Csapo, K. (1969). Concrete image and verbal memory codes. Journal of Experimental Psychology, 80 , 279–285.

Paivio, A., & Csapo, K. (1973). Picture superiority in free recall: imagery or dual coding? Cognitive Psychology, 5 , 176–206.

Paivio, A., Walsh, M., & Bons, T. (1994). Concreteness effects on memory: when and why? Journal of Experimental Psychology: Learning, Memory, and Cognition, 20 , 1196–1204.

Pashler, H., McDaniel, M., Rohrer, D., & Bjork, R. (2008). Learning styles: concepts and evidence. Psychological Science in the Public Interest, 9 , 105–119.

Pashler, H., Bain, P. M., Bottge, B. A., Graesser, A., Koedinger, K., McDaniel, M., & Metcalfe, J. (2007). Organizing instruction and study to improve student learning. IES practice guide. NCER 2007–2004. National Center for Education Research .

Patel, R., Liu, R., & Koedinger, K. (2016). When to block versus interleave practice? Evidence against teaching fraction addition before fraction multiplication. In Proceedings of the 38th Annual Meeting of the Cognitive Science Society, Philadelphia, PA .

Penfound, B. (2017). Journey to interleaved practice #2 [Blog Post]. Retrieved from https://fullstackcalculus.com/2017/02/03/journey-to-interleaved-practice-2/ . Accessed 25 Dec 2017.

Penfound, B. [BryanPenfound]. (2016). Does blocked practice/learning lessen cognitive load? Does interleaved practice/learning provide productive struggle? [Tweet]. Retrieved from https://twitter.com/BryanPenfound/status/808759362244087808 . Accessed 25 Dec 2017.

Peterson, D. J., & Mulligan, N. W. (2010). Enactment and retrieval. Memory & Cognition, 38 , 233–243.

Picciotto, H. (2009). Lagging homework [Blog post]. Retrieved from http://blog.mathedpage.org/2013/06/lagging-homework.html . Accessed 25 Dec 2017.

Pomerance, L., Greenberg, J., & Walsh, K. (2016). Learning about learning: what every teacher needs to know. Retrieved from http://www.nctq.org/dmsView/Learning_About_Learning_Report . Accessed 25 Dec 2017.

Postman, L. (1976). Methodology of human learning. In W. K. Estes (Ed.), Handbook of learning and cognitive processes (Vol. 3). Hillsdale: Erlbaum.

Pressley, M., McDaniel, M. A., Turnure, J. E., Wood, E., & Ahmad, M. (1987). Generation and precision of elaboration: effects on intentional and incidental learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13 , 291–300.

Reed, S. K. (2008). Concrete examples must jibe with experience. Science, 322 , 1632–1633.

researchED. (2013). How it all began. Retrieved from http://www.researched.org.uk/about/our-story/ . Accessed 25 Dec 2017.

Ritchie, S. J., Della Sala, S., & McIntosh, R. D. (2013). Retrieval practice, with or without mind mapping, boosts fact learning in primary school children. PLoS One, 8 (11), e78976.

Rittle-Johnson, B. (2006). Promoting transfer: effects of self-explanation and direct instruction. Child Development, 77 , 1–15.

Roediger, H. L. (1985). Remembering Ebbinghaus. [Retrospective review of the book On Memory , by H. Ebbinghaus]. Contemporary Psychology, 30 , 519–523.

Roediger, H. L. (2013). Applying cognitive psychology to education: translational educational science. Psychological Science in the Public Interest, 14 , 1–3.

Roediger, H. L., & Karpicke, J. D. (2006). The power of testing memory: basic research and implications for educational practice. Perspectives on Psychological Science, 1 , 181–210.

Roediger, H. L., Putnam, A. L., & Smith, M. A. (2011). Ten benefits of testing and their applications to educational practice. In J. Mester & B. Ross (Eds.), The psychology of learning and motivation: cognition in education (pp. 1–36). Oxford: Elsevier.

Roediger, H. L., Finn, B., & Weinstein, Y. (2012). Applications of cognitive science to education. In Della Sala, S., & Anderson, M. (Eds.), Neuroscience in education: the good, the bad, and the ugly . Oxford, UK: Oxford University Press.

Roelle, J., & Berthold, K. (2017). Effects of incorporating retrieval into learning tasks: the complexity of the tasks matters. Learning and Instruction, 49 , 142–156.

Rohrer, D. (2012). Interleaving helps students distinguish among similar concepts. Educational Psychology Review, 24(3), 355–367.

Rohrer, D., Dedrick, R. F., & Stershic, S. (2015). Interleaved practice improves mathematics learning. Journal of Educational Psychology, 107 , 900–908.

Rohrer, D., & Pashler, H. (2012). Learning styles: Where’s the evidence? Medical Education, 46 , 34–35.

Rohrer, D., & Taylor, K. (2007). The shuffling of mathematics problems improves learning. Instructional Science, 35 , 481–498.

Rose, N. (2014). Improving the effectiveness of homework [Blog post]. Retrieved from https://evidenceintopractice.wordpress.com/2014/03/20/improving-the-effectiveness-of-homework/ . Accessed 25 Dec 2017.

Sadoski, M. (2005). A dual coding view of vocabulary learning. Reading & Writing Quarterly, 21 , 221–238.

Saunders, K. (2016). It really is time we stopped talking about learning styles [Blog post]. Retrieved from http://martingsaunders.com/2016/10/it-really-is-time-we-stopped-talking-about-learning-styles/ . Accessed 25 Dec 2017.

Schwartz, D. (2007). If a picture is worth a thousand words, why are you reading this essay? Social Psychology Quarterly, 70 , 319–321.

Shumaker, H. (2016). Homework is wrecking our kids: the research is clear, let’s ban elementary homework. Salon. Retrieved from http://www.salon.com/2016/03/05/homework_is_wrecking_our_kids_the_research_is_clear_lets_ban_elementary_homework . Accessed 25 Dec 2017.

Smith, A. M., Floerke, V. A., & Thomas, A. K. (2016). Retrieval practice protects memory against acute stress. Science, 354 , 1046–1048.

Smith, M. A., Blunt, J. R., Whiffen, J. W., & Karpicke, J. D. (2016). Does providing prompts during retrieval practice improve learning? Applied Cognitive Psychology, 30 , 784–802.

Smith, M. A., & Karpicke, J. D. (2014). Retrieval practice with short-answer, multiple-choice, and hybrid formats. Memory, 22 , 784–802.

Smith, M. A., Roediger, H. L., & Karpicke, J. D. (2013). Covert retrieval practice benefits retention as much as overt retrieval practice. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39 , 1712–1725.

Son, J. Y., & Rivas, M. J. (2016). Designing clicker questions to stimulate transfer. Scholarship of Teaching and Learning in Psychology, 2 , 193–207.

Szpunar, K. K., Khan, N. Y., & Schacter, D. L. (2013). Interpolated memory tests reduce mind wandering and improve learning of online lectures. Proceedings of the National Academy of Sciences, 110 , 6313–6317.

Thomson, R., & Mehring, J. (2016). Better vocabulary study strategies for long-term learning. Kwansei Gakuin University Humanities Review, 20 , 133–141.

Trafton, J. G., & Reiser, B. J. (1993). Studying examples and solving problems: contributions to skill acquisition . Technical report, Naval HCI Research Lab, Washington, DC, USA.

Tran, R., Rohrer, D., & Pashler, H. (2015). Retrieval practice: the lack of transfer to deductive inferences. Psychonomic Bulletin & Review, 22 , 135–140.

Turner, K. [doc_kristy]. (2016a). My dual coding (in red) and some y8 work @AceThatTest they really enjoyed practising the technique [Tweet]. Retrieved from https://twitter.com/doc_kristy/status/807220355395977216 . Accessed 25 Dec 2017.

Turner, K. [doc_kristy]. (2016b). @FurtherEdagogy @doctorwhy their work is revision work, they already have the words on a different page, to compliment not replace [Tweet]. Retrieved from https://twitter.com/doc_kristy/status/807360265100599301 . Accessed 25 Dec 2017.

Valle, A., Regueiro, B., Núñez, J. C., Rodríguez, S., Piñeiro, I., & Rosário, P. (2016). Academic goals, student homework engagement, and academic achievement in elementary school. Frontiers in Psychology, 7 .

Van Gog, T., & Sweller, J. (2015). Not new, but nearly forgotten: the testing effect decreases or even disappears as the complexity of learning materials increases. Educational Psychology Review, 27 , 247–264.

Wammes, J. D., Meade, M. E., & Fernandes, M. A. (2016). The drawing effect: evidence for reliable and robust memory benefits in free recall. Quarterly Journal of Experimental Psychology, 69 , 1752–1776.

Weinstein, Y., Gilmore, A. W., Szpunar, K. K., & McDermott, K. B. (2014). The role of test expectancy in the build-up of proactive interference in long-term memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40 , 1039–1048.

Weinstein, Y., Nunes, L. D., & Karpicke, J. D. (2016). On the placement of practice questions during study. Journal of Experimental Psychology: Applied, 22 , 72–84.

Weinstein, Y., & Weinstein-Jones, F. (2017). Topic and quiz spacing spreadsheet: a planning tool for teachers [Blog Post]. Retrieved from http://www.learningscientists.org/blog/2017/5/11-1 . Accessed 25 Dec 2017.

Weinstein-Jones, F., & Weinstein, Y. (2017). Topic spacing spreadsheet for teachers [Excel macro]. Zenodo. http://doi.org/10.5281/zenodo.573764 . Accessed 25 Dec 2017.

Williams, D. [FurtherEdagogy]. (2016). @doctorwhy @doc_kristy word accompanying the visual? I’m unclear how removing words benefit? Would a flow chart better suit a scientific exp? [Tweet]. Retrieved from https://twitter.com/FurtherEdagogy/status/807356800509104128 . Accessed 25 Dec 2017.

Wood, B. (2017). And now for something a little bit different….[Blog post]. Retrieved from https://justateacherstandinginfrontofaclass.wordpress.com/2017/04/20/and-now-for-something-a-little-bit-different/ . Accessed 25 Dec 2017.

Wooldridge, C. L., Bugg, J. M., McDaniel, M. A., & Liu, Y. (2014). The testing effect with authentic educational materials: a cautionary note. Journal of Applied Research in Memory and Cognition, 3 , 214–221.

Young, C. (2016). Mini-tests. Retrieved from https://colleenyoung.wordpress.com/revision-activities/mini-tests/ . Accessed 25 Dec 2017.

Acknowledgements

Not applicable.

Funding

YW and MAS were partially supported by a grant from The IDEA Center.

Availability of data and materials

Author information

Authors and Affiliations

Department of Psychology, University of Massachusetts Lowell, Lowell, MA, USA

Yana Weinstein

Department of Psychology, Boston College, Chestnut Hill, MA, USA

Christopher R. Madan

School of Psychology, University of Nottingham, Nottingham, UK

Department of Psychology, Rhode Island College, Providence, RI, USA

Megan A. Sumeracki

Contributions

YW took the lead on writing the “Spaced practice”, “Interleaving”, and “Elaboration” sections. CRM took the lead on writing the “Concrete examples” and “Dual coding” sections. MAS took the lead on writing the “Retrieval practice” section. All authors edited each other’s sections. All authors were involved in the conception and writing of the manuscript. All authors gave approval of the final version.

Corresponding author

Correspondence to Yana Weinstein .

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

YW and MAS run a blog, “The Learning Scientists Blog”, which is cited in the tutorial review. The blog does not make money. Free resources on the strategies described in this tutorial review are provided on the blog. Occasionally, YW and MAS are invited by schools/school districts to present research findings from cognitive psychology applied to education.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Weinstein, Y., Madan, C.R. & Sumeracki, M.A. Teaching the science of learning. Cogn. Research 3 , 2 (2018). https://doi.org/10.1186/s41235-017-0087-y

Received: 20 December 2016

Accepted: 02 December 2017

Published: 24 January 2018

DOI: https://doi.org/10.1186/s41235-017-0087-y

Systematic Review Article

A meta-analysis of ten learning techniques

  • Science of Learning Research Centre, Graduate School of Education, University of Melbourne, Melbourne, VIC, Australia

This article reports a meta-analysis of the 10 learning techniques identified in Dunlosky et al. (2013a), based on 242 studies, 1,619 effects, and 169,179 unique participants, with an overall mean effect size of 0.56. The most effective techniques are Distributed Practice and Practice Testing, and the least effective (but still with relatively high effects) are Underlining and Summarization. A major limitation is that the majority of studies in the meta-analysis were based on surface or factual outcomes, so caution is needed when applying these findings to deeper and more relational outcomes. Other important moderators included the presence or absence of feedback and near or far transfer, and the effects were much greater for lower than for higher ability students. It is recommended that more attention be paid to when, and under what conditions, each technique can be used, and how each can best be taught.

Introduction

While the purpose of schooling may change over time and differ across jurisdictions, the mechanisms by which human learning occurs are arguably more universal. Learning techniques – actions that learners themselves can take to enhance their learning – have attracted considerable research interest in recent years (Edwards et al., 2014). This is unsurprising given the direct, practical applicability of such research, and its relevance to students, educators, and school leaders alike.

A major, thorough, and important review of various learning techniques has created much interest. Dunlosky et al. (2013a) reviewed 10 learning techniques, and a feature of their review is the careful analysis of possible moderators of the effectiveness of these techniques, such as learning conditions (e.g., studying alone or in groups), student characteristics (e.g., age, ability), materials (e.g., simple concepts or problem-based analyses), and criterion tasks (different outcome measures). The present article uses this review as the basis for a meta-analysis of the studies these authors cite, to add another perspective on the magnitude of the effects of the various learning techniques and how they are affected by various moderators.

Dunlosky et al. (2013a) claim to have conducted an exhaustive search of the literature, relied on previous empirical reviews of learning techniques, and applied a robust set of selection criteria before selecting the final 10 techniques. These criteria included that the technique could be implemented by students without assistance, that there was sufficient empirical evidence to support at least a preliminary assessment of efficacy, and that there was robust evidence to identify the generalizability of its benefits across four categories of variables: materials, learning conditions, student characteristics, and criterion tasks. Indeed, the authors’ mastery of the literature is evident throughout the article.

The authors then categorised the 10 techniques into three groups based on whether they considered them as having high, medium, or low support for their effectiveness in enhancing learning. Categorised as having “high” support were Practice Testing (self-testing or taking practice tests on to-be-learned material) and Distributed Practice (implementing a schedule of practice that spreads out study activities over time, in contrast to massed or “crammed” practice). Categorised as having “moderate” support were Elaborative Interrogation (generating an explanation of a fact or concept), Self-Explanation (where the student explains how new information is related to already-known information), and Interleaved Practice (implementing a schedule of practice that mixes different kinds of problems within a single study session). Finally, categorised as having “low” support were Summarization (writing summaries of to-be-learned texts), Highlighting/Underlining (marking potentially important portions of to-be-learned materials while reading), the Keyword Mnemonic (generating keywords and mental imagery to associate verbal materials), Imagery Use (attempting to form mental images of text materials while reading or listening), and Re-Reading (restudying text material again after an initial reading). In an accompanying article, Dunlosky et al. (2013b) claimed that some of these low-support techniques (which students use a lot) have “failed to help students of all sorts” (p. 20), that the benefits can be short lived, that they may not be widely applicable, that the benefits are relatively limited, and that they do not provide “bang for the buck” (p. 21).

Practice Testing is one of the two techniques with the highest utility. It must be distinguished from high-stakes testing: Practice Testing instead involves any activity where the student practices retrieval of to-be-learned information, reproduces that information in some form, and evaluates the correctness of that reproduction against an accepted “correct” answer. Any discrepancy between the produced and “correct” information then forms a type of feedback that the learner uses to modify their understanding. Practice tests can include a range of activities that students can conduct on their own, such as completing questions from textbooks or previous exams, or even self-generated flashcards. According to Dunlosky et al. (2013a), such testing increases the likelihood that target information can be retrieved from long-term memory, and it helps students mentally organize information in ways that support better retention and test performance. This effect is strong regardless of test form (multiple choice or essay), holds even when the format of the practice test does not match the format of the criterion test, and is effective for students of all ages. Practice Testing works well even when it is massed, but is even more effective when it is spaced over time. It does not place high demands on time, is easy to learn to do (though some basic instruction on how to use practice tests most effectively helps), is much better than unguided restudy, and is much more effective when there is feedback on the practice test outputs (which also enhances confidence in performance).

Many studies have shown that practice spread out over time (spaced) is much more effective than practice concentrated in a short time period (massed) – this is what is meant by Distributed Practice. Most students need three to four opportunities to learn something (Nuthall, 2007), but these learning opportunities are more effective if they are distributed over time rather than delivered in one massed session: that is, spaced practice rather than skill and drill, spread out rather than crammed; and longer inter-study intervals are more effective than shorter ones. There have been four meta-analyses of Spaced vs. Massed practice involving about 300 studies, with an average effect of 0.60 (Donovan and Radosevich, 1999; Cepeda et al., 2006; Janiszewski et al., 2003; Lee and Genovese, 1988). Cepeda et al. (2008) showed that for almost all retention intervals, memory performance increases sharply with the length of the spacing interval. But at a certain spacing interval, optimal test performance is reached, and from that interval onwards performance declines, though only to a limited degree. They also note that this does not take into account the absolute level of performance, which decreases as the retention interval increases. Further, Spaced Practice is more effective for deeper than for surface processing, and for all ages. Rowland (2014) completed a meta-analysis of 61 studies investigating the effect of testing vs. restudy on retention. He found a high effect size (d = 0.50) favoring testing over restudy, and the effects were greater for recall than for recognition tasks. The educational message is to review previously covered material in subsequent units of work, to time tests regularly rather than placing them all at the end (which encourages cramming and massed practice), and, given that students tend to rate their learning higher after massed practice, to educate them as to the benefits of spaced practice and show them those benefits.

Elaborative Interrogation, Self-Explanation, and Interleaved Practice received moderate support. Elaborative Interrogation involves asking “why” questions (“Why does it make sense that…?”, “Why is this true?”), and a major purpose is to integrate new information with existing prior knowledge. The effects are higher when elaborations are precise rather than imprecise, when prior knowledge is higher rather than lower, and when elaborations are self-generated rather than provided. A constraint of the method is that it is more applicable to surface than to deep understanding. Self-Explanation involves students explaining some aspect of their processing during learning. It works across task domains and across ages, but may require training and can take some time to implement. Interleaved Practice involves alternating practice of different kinds of items, problems, and even subject domains, rather than blocking study. The claim is that interleaving leads to better discrimination of different kinds of problems and more attention to the actual question or problem posed, and, as above, there is better learning from Spaced than from Massed Practice. The research evidence base is currently small, and it is not clear how to break tasks apart in an optimal manner so as to interleave them.

There is mixed and often low support, claimed Dunlosky et al. (2013a), for Summarization, Highlighting, the Keyword Mnemonic, Imagery Use for text learning, and Re-Reading. Summarization involves students writing summaries of to-be-learned texts with the aim of capturing the main points and excluding unimportant or repetitive material. The generality and accuracy of the summary are important moderators, and it is not clear whether it is better to summarize smaller pieces of a text (more frequent summarization) or to capture more of the text in a larger summary (less frequent summarization). Younger and less able students are not as good at summarization; it works better when the assessments are performance-based or generative rather than closed or multiple-choice tests, and it can require extensive training to use optimally. Highlighting and Underlining are simple to use, do not require training, and demand hardly any additional time beyond the reading of the text. Highlighting is most effective when professionals do it, less so when the students themselves do it, and least effective when students read another student’s highlights. It may be detrimental to the later ability to make inferences; overall, it does little to boost performance. The Keyword Mnemonic involves associating some imagery with the word or concept to be learned. The method requires generating images, which can be difficult for younger and less able students, and there is evidence it may not produce durable retention. Similarly, Imagery Use is of low utility. This method involves students mentally imaging or drawing pictures of the content using simple and clear mental images. It too is constrained by the need for imagery-friendly materials, and by memory capacity. Re-Reading is very common. It is more effective when the Re-Reading is spaced rather than massed, but the effects seem to decrease beyond the second reading; it is better for factual recall than for developing comprehension, and it is not clear that it is effective for students below college age.

A follow-up and more teacher-accessible article by Dunlosky et al. (2013b) asks why students do not learn about the best techniques for learning. Perhaps, the authors suggest, it is because curricula are developed to highlight content rather than how to effectively acquire it; and it may be because many recent textbooks used in teacher education courses fail to adequately cover the most effective techniques or how to teach students to use them. They noted that employing the best techniques will only be effective if students are motivated to use them correctly, but that teaching students to guide their learning of content using effective techniques will allow them to learn successfully throughout their lifetime. Some of the authors’ tips include: give a low-stakes quiz at the beginning of each class and focus on the most important material; give a cumulative exam that encourages students to re-study the most important material in a distributed fashion; encourage students to develop a “study planner” so they can distribute their study throughout a class and rely less on cramming; encourage students to use practice retrieval when studying instead of passively re-reading their books and notes; encourage students to elaborate on what they are reading, such as by asking “why” questions; mix in problems from earlier classes so students can practice identifying problems and their solutions; and tell students that highlighting is fine, but only at the beginning of their learning journey.

The Dunlosky et al. (2013a) review shows a high level of care in the selection of articles, expansive coverage of the literature, attention to generalizability and moderators, and sophistication in its conclusions. There are two aspects of this research that the current paper aims to address. First, Dunlosky et al. (2013a) relied on a traditional literature review method and did not include any estimates of the effect sizes of the various techniques, nor did they indicate the magnitudes implied by their terms high, medium, and low. One of the purposes of this article is to provide these empirical estimates. Second, the authors did not empirically evaluate the moderators of the 10 learning techniques, such as Deep vs. Surface learning, Far vs. Near Transfer, or the age/grade level of the learner. An aim of this paper is to analyze the effects of each of the 10 techniques with respect to these and other potential moderators.

Method

Research syntheses aim to summarise past research by estimating effect sizes from multiple, separate studies that address, in this case, 10 major learning techniques. The data are based on the 399 studies referenced in Dunlosky et al. (2013a). We removed all non-empirical studies, and any studies that did not report sufficient data for the calculation of a Cohen’s d. This resulted in 242 studies being included in the meta-analysis, many of which contained data for multiple effect sizes, resulting in 1,620 cases for which a Cohen’s d was calculated (see Figure 1).

FIGURE 1. Flow diagram of articles used in the meta-analysis.

The publication dates of the articles ranged from 1929 to 2014, with half published since 1996. Most participants were undergraduates (65%), but the samples also included secondary (11%), primary (13%), adult (2%), and early childhood (9%) participants. Most were of average ability (86%), while 7% were categorised as low ability and 7% as high ability. The participants were mainly North American (86%), with the remainder European (11%) and Australian (3%).

All articles were coded by the two authors, and independent colleagues were asked to re-code a sample of 30 (about 10%) to estimate inter-rater reliability. This resulted in a kappa value of 0.89, which gives much confidence in the dependability of the coding.

For each study, three sets of moderators were coded. The first set included attributes of the article: quality of the journal (h-index), year of publication (to assess any changes in effectiveness as more research has been added to the literature), and sample size. The second set included attributes of the students: ability level (low, average, and high), country of the study, and grade level (pre/primary, high school, university, adults). The third set included attributes of the design: whether the outcome was near or far transfer (e.g., was the learner tested on criterion tasks that differed from the training tasks, or did the technique improve the student’s learning in a different subject domain), the depth of the outcome (Surface or content-specific vs. Deep or more generalizable learning), how delayed the testing was from the actual study (under 1 day, or 2+ days), and the learning domain of the content of the study or measure (e.g., cognitive, non-cognitive).
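As a concrete illustration, the record for a single study might be represented as follows (a hypothetical sketch in Python; the field names, labels, and types are our own assumptions, not the authors’ actual coding instrument):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StudyCoding:
    """Hypothetical representation of the three sets of moderators
    described above; the labels are ours, not the authors' instrument."""
    # Attributes of the article
    journal_h_index: Optional[float]
    year: int
    sample_size: int
    # Attributes of the students
    ability: str          # "low" | "average" | "high"
    country: str
    grade_level: str      # "pre/primary" | "high" | "university" | "adult"
    # Attributes of the design
    transfer: str         # "near" | "far"
    depth: str            # "surface" | "deep"
    delay_days: float     # delay between study and test
    learning_domain: str  # e.g., "cognitive", "non-cognitive"
```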

The designs of most studies involved an experimental group compared to a control group (91%), with longitudinal designs (pre-post, time series) making up 6.5% and within-subject designs 2.4%. Most learning outcomes were classified as Surface (93%), with the other 7% Deep. The post-tests were predominantly completed very soon after the intervention – 74% within 1 day or less, 17% from 2 to 7 days, 3.3% from 8 days to a month, 0.4% from 1 to 3 months, and 0.2% from 4 months to 7 years.

We used two major methods for calculating Cohen’s d from the various statistics published in the studies. First, standardized mean differences (N = 1,203 effects) involved subtracting the mean of the control group from the mean of the experimental group, then dividing by an estimate of the pooled standard deviation, as follows:

$$d = \frac{M_E - M_C}{SD_{pooled}}, \qquad SD_{pooled} = \sqrt{\frac{(n_E - 1)SD_E^2 + (n_C - 1)SD_C^2}{n_E + n_C - 2}}$$

The standard errors of the effect size (ES) were calculated as follows:

$$SE_d = \sqrt{\frac{n_E + n_C}{n_E n_C} + \frac{d^2}{2(n_E + n_C)}}$$

We adjusted the effect sizes (ES) according to Hedges and Olkin (1985) to account for bias in small sample sizes, using the correction:

$$g = d\left(1 - \frac{3}{4(n_E + n_C) - 9}\right)$$

Second, F-statistics (for two groups only) were converted using:

$$d = \sqrt{\frac{F(n_E + n_C)}{n_E\, n_C}}$$

The standard error was then calculated using the same formula as above.

In all cases, therefore, a positive effect meant that the learning technique had a positive impact on learning.
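For readers who wish to reproduce these conversions, here is a minimal Python sketch of the formulas above (the function names are ours; the formulas are the standard Hedges and Olkin, 1985, forms implied by the text):

```python
import math

def cohens_d(m_e, m_c, sd_e, sd_c, n_e, n_c):
    """Standardized mean difference: experimental mean minus control
    mean, divided by the pooled standard deviation."""
    sd_pooled = math.sqrt(((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2)
                          / (n_e + n_c - 2))
    return (m_e - m_c) / sd_pooled

def se_of_d(d, n_e, n_c):
    """Standard error of the effect size d."""
    return math.sqrt((n_e + n_c) / (n_e * n_c) + d**2 / (2 * (n_e + n_c)))

def hedges_correction(d, n_e, n_c):
    """Small-sample bias adjustment (Hedges & Olkin, 1985)."""
    return d * (1 - 3 / (4 * (n_e + n_c) - 9))

def d_from_f(f, n_e, n_c):
    """Convert a two-group F statistic to d (for two groups, F = t^2)."""
    return math.sqrt(f * (n_e + n_c) / (n_e * n_c))
```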

The distribution of effect sizes and sample sizes was examined to determine whether any were statistical outliers. Grubbs’ (1950) test was applied (see also Barnett and Lewis, 1994). If outliers were identified, these values were set to the value of their next nearest neighbour. We used inverse-variance weighted procedures to calculate average effect sizes across all comparisons (Borenstein et al., 2005). In addition, 95% confidence intervals were calculated for the average effects. Possible moderators (e.g., grade level, duration of the treatment) of the relationship between learning technique and student outcome were tested using homogeneity analyses (Hedges and Olkin, 1985; Cooper et al., 2019). The analyses were carried out to determine whether (a) the variance in a group of individual effect sizes varies more than predicted by sampling error, and/or (b) multiple groups of average effect sizes vary more than predicted by sampling error.

Rather than opt for a single model of error, we conducted the overall analyses twice, once employing fixed-error assumptions and once employing random-error assumptions (see Hedges and Vevea, 1998, for a discussion of fixed and random effects). This sensitivity analysis allowed us to examine the effects of the different assumptions (fixed or random) on the findings. If, for example, a moderator is found to be significant under a random-effects assumption but not under a fixed-effects assumption, this suggests a limit on the generalizability of the inferences about that moderator. All statistical processes were conducted using the Comprehensive Meta-Analysis software package (Borenstein et al., 2005).

The examination of heterogeneity of the effect-size distributions within each outcome category was conducted using the Q statistic and the I² statistic (Borenstein et al., 2009). To calculate Q and I², we entered the corrected effect sizes for every case, along with the SE (calculated as above), and generated homogeneity data. Due to the substantial variability within the studies, even in the case of a non-significant Q test, when I² was different from zero, moderation analyses were carried out through subgroup analysis (Lipsey and Wilson, 2001). As all hypothesized moderators were operationalized as categorical variables, these analyses were performed primarily through subgroup analyses using a mixed-effects model.
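The following sketch shows the core of these computations – the inverse-variance weighted mean effect and the Q and I² heterogeneity statistics – under fixed-effect weights. It is illustrative Python, not the Comprehensive Meta-Analysis package the authors actually used:

```python
def fixed_effect_summary(effects, ses):
    """Inverse-variance weighted mean effect size, plus the Q and I^2
    heterogeneity statistics computed from fixed-effect weights."""
    weights = [1 / se**2 for se in ses]
    mean = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    q = sum(w * (d - mean)**2 for w, d in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return mean, q, i2

# Toy example: three effect sizes with their standard errors.
print(fixed_effect_summary([0.4, 0.6, 0.8], [0.10, 0.12, 0.15]))
```

An I² near zero would indicate that the spread of effects is consistent with sampling error alone; the I² of roughly 85% reported below indicates substantial true heterogeneity, motivating the subgroup (moderator) analyses.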

Table 1 shows a comprehensive analysis of the collected data. For the 242 studies, we calculated a total of 1,619 effects relating to 169,179 unique participants. The overall mean assuming a fixed model was 0.56 (SD = 0.81, SEM = 0.072, skewness = 1.3, kurtosis = 5.64); the overall mean assuming a random model was 0.56 (SE = 0.016). The overall mean at the study level was 0.77 (SE = 0.049). The fixed-effects model assumes that all studies in the meta-analysis share a common true effect size, whereas the random-effects model assumes that the studies were drawn from populations that differ from each other in ways that could impact the treatment effect. Given that the means estimated under the two models are similar, we proceed with only one (the random model) in subsequent analyses.

TABLE 1. Summary of effects for each learning strategy.

The distribution of all effects is presented in Figure 1, and the studies, their attributes, and study effect sizes are presented in Table 1. It is clear that there is much variance among these effects (Q = 10,688.2, I² = 84.87). I² is a measure of the degree of inconsistency in the studies’ results, and this I² of 85% shows that most of the variability across studies is due to heterogeneity rather than chance. Thus, the search for moderators is critical to understanding which learning techniques work best in which situations.

Table 2 shows an overall summary of effects moderated by the learning domain. The effects correspond with the classification of High, Moderate, and Low by Dunlosky et al. (2013a), but it is noted that “Low” is still relatively above the average of most meta-analyses in education – Hattie (2009, 2012, 2015) reported an average effect size of 0.40 from over 1,200 meta-analyses relating to achievement outcomes. All techniques analyzed in the current study had an ES of over 0.40.

TABLE 2. Effect Sizes moderated by the Learning Domain.

Moderator Analyses

Year of Publication

There was no relation between the magnitude of the effects and the year of the study (r = 0.08, df = 236, p = 0.25), indicating that the effects of the learning techniques have not changed over time (from 1929 to 2015).

Learning Domain

The vast majority of the evidence is based on measurements of academic achievement: 222 of the 242 studies (91.7%) and 1,527 of the 1,619 effects (94.3%). English or Reading was the basis for 85 of the studies (35.1%) and 546 of the effects (33.7%), and Science for 41 of the studies (16.9%) and 336 of the effects (20.8%). There was considerable variation in the effect sizes of these domains, as shown in Table 2.

www.frontiersin.org

TABLE 3. Effect sizes moderated by grade level.

Near vs. Far Transfer

If a study measured an effect on performance on a task similar to the task used in the experiment, it was classified as measuring Near transfer; if the transfer was to a dissimilar context, it was classified as Far transfer. There were so few Far transfer effects that the information is not broken down into the 10 learning techniques. Overall, the effects on Near transfer (d = 0.61, SE = 0.002, N = 1,385) are much greater than the effects on Far transfer (d = 0.39, SE = 0.052, N = 197).

Depth of Learning

The effects were higher for Surface ( d = 0.60, SE = 0.021, N = 1,473) than for Deep processing ( d = 0.26, SE = 0.064, N = 109).

Grade Level

The effects moderated by grade level of the participants are presented in Table 3. Students at all grade levels showed higher effects for summarization, distributed practice, imagery use, and re-reading; primary students showed lower effects for interleaved practice, mnemonics, self-explanation, and practice testing; and both primary and secondary students showed lower effects for underlining.


TABLE 4. Effect sizes moderated by country of first author.

Country of First Author

Each study was coded for the country where the study was conducted. Where that information was not made clear in the article, the first author's country of employment was used. Of the 242 studies, 187 (77.3%) were from the USA, 20 (8.3%) from Canada, 27 (11.1%) from Europe (United Kingdom, Denmark, France, Germany, Italy, Netherlands), 7 (2.9%) from Australia, and 1 (0.4%) from Iran, making a total North American proportion of 207 (85.6%). Other than the drop for Europe in Mnemonics, Interleaved Practice, and Summarization, there is not a great difference by country (Table 4).

Ability Level

Almost all studies referred to participants as being of either "Low," "Normal," or "High" ability. This language has been retained in the collection and analysis of the data; however, in the body of the paper the terms "Low," "Average," and "High" ability are used instead. In all cases, these categories aligned with percentiles of the normal distribution for academic scores. Of the 242 studies, only six investigated High ability students, and only 13 investigated Low ability students. Across all techniques, the mean effect for High ability students was -0.11 (SE = 0.10, N = 28) and for Low ability students was 0.47 (SE = 0.15, N = 58). The High ability students had negative effects for Interleaved Practice and Summarization.

Retention Interval

Studies predominantly measured only very short-term effects, the exceptions being the three learning techniques focused on practice effects (Practice Testing, Distributed Practice, and Interleaved Practice). Most effects (68%) were evaluated within a day (usually immediately). There were no overall differences relating to effects measured at less than a day (d = 0.58, SE = 0.025, N = 1,073), more than 1 day and less than 1 week (d = 0.59, SE = 0.057, N = 204), more than 1 week and less than 1 month (d = 0.56, SE = 0.058, N = 228), or more than 1 month and less than 6 months (d = 0.51, SE = 0.082, N = 64).

Journal Impact Factor

The published impact factor for each journal was sourced from that journal's website. Where a multiple-year (usually 5-year) average impact factor was provided, that was used in preference to a single (most recent) year; PhD theses were left blank. The average impact factor was 2.80 (SD = 3.29), which, relative to journals in educational psychology, indicates that the overall quality of the journals is quite high. Across all 10 learning techniques, there was a moderate positive correlation between effect size and journal impact factor, r(235) = 0.24, p < 0.001. Thus, effect sizes were slightly higher in the more highly cited journals.
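A hedged sketch of this kind of moderator check is shown below: a Pearson correlation between per-study effect sizes and journal impact factors. The arrays are synthetic stand-ins generated to have roughly the reported shape (237 studies, so df = 235), not the actual coded data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Skewed impact factors with mean ~2.8, mimicking the reported distribution.
impact_factor = rng.gamma(shape=2.0, scale=1.4, size=237)
# Hypothetical effect sizes with a weak positive dependence on impact factor.
effect_size = 0.40 + 0.05 * impact_factor + rng.normal(0.0, 0.30, size=237)

r, p = stats.pearsonr(impact_factor, effect_size)
print(f"r({len(effect_size) - 2}) = {r:.2f}, p = {p:.4f}")
```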

Discussion and Conclusion

The purpose of the current study was twofold: first, to provide empirical estimates of the effectiveness of the 10 learning techniques, and second, to empirically evaluate a range of their potential moderators. The major conclusion from the meta-analysis is a confirmation of the major findings in Dunlosky et al. (2013a). They rated the effects as High, Moderate, and Low, and there was much correspondence between their ratings and the actual effect sizes: High in the meta-analysis was > 0.70, Moderate between 0.54 and 0.69, and Low < 0.53. This meta-analysis, however, shows the arbitrariness of these ratings, as some of the low effects were very close to the moderate category. Mnemonics, re-reading, and interleaved practice were all within 0.06 of the moderate category, and these techniques may have similar importance to those Dunlosky et al. (2013a) classified as Moderate. Certainly they should not be dismissed as ineffective. Even the lowest learning techniques (Underlining and Summarization, both d = 0.44) are sufficiently effective to be included in a student's toolbox of techniques.

The rating into High, Moderate, and Low was broadly matched by the findings of the meta-analysis, but Table 2 shows the usual difficulties of such arbitrary (but not capricious) cut scores. Mnemonics (d = 0.50) is close to Self-Explanation (d = 0.54), although there is a clear separation between Moderate (Elaborative Interrogation, d = 0.56) and Practice Testing (d = 0.74). All have sufficiently positive effects to be considered by students choosing learning techniques, and it may be that the high techniques are best chosen at the stage of the learning process related to consolidating learning, and the low techniques at the stage of first encountering new material and ideas. It may also be that techniques are affected by whether the tasks are more relevant to memory or to comprehension; many of the techniques in the authors' list of 10 are more related to the former than the latter.


FIGURE 2. Distribution of effects.

The technique with the lowest overall effect was Summarization. Dunlosky et al. (2013a) noted that it is difficult to draw general conclusions about its efficacy, that it is likely a family of techniques, and that it should not be confused with mere copying. They noted that it is easy to learn and use and that training typically helps improve the effect (though such training may need to be extensive), but suggested other techniques might better serve students. In their other article (Dunlosky et al., 2013b), the authors classified Summarization as among the "less useful techniques" that "have not fared so well when considered with an eye toward effectiveness" (p. 19). They also noted that a critical moderator for the effectiveness of all techniques is the student's motivation to use them correctly. This meta-analysis shows that Summarization, while among the less effective of the 10 under review, still has a sufficiently high impact to be considered worthwhile in the student's arsenal of learning techniques, and with training could be among the easier techniques to use.

One of the sobering aspects of this meta-analysis is the finding that the majority of studies are based on Surface learning of factual, academic content, measure learning almost immediately after the technique has been used, and only measure Near transfer. This limits the generalisability of both the Dunlosky et al. (2013a) review and this meta-analysis, and there may well be different learning techniques that optimise deeper learning, non-academic learning, or more intensive learning that requires longer retention periods and Far transfer. The jury is still out on the effectiveness and identification of the optimal techniques in these latter conditions. It should be noted, however, that this may be not only a criticism of the current research on learning techniques but could equally be a criticism of student experiences in most classrooms. Too many modern classrooms are still dominated by a preponderance of surface learning, teachers asking low-level questions demanding content answers, and assessments privileging surface knowledge (Tyack & Cuban, 1995). Thus the 10 techniques may remain optimal for many current classrooms.

The implication for teachers is not that these learning techniques should be implemented as stand-alone "learning interventions" or fostered through study skills courses. They can be used, however, within a teaching process to maximise the surface and deeper outcomes of a series of lessons. For example, Practice Testing is among the top two techniques, but it would be a mistake to then claim that there should be more testing, especially high-stakes testing! Dunlosky et al. (2013a) concluded that more Practice Testing is better, that it should be spaced not massed, and that it works with all ages, all levels of ability, and across all levels of cognitive complexity. A major moderator is whether the practice tests are accompanied by feedback: "The advantage of Practice Testing with feedback over restudy is extremely robust. Practice Testing with feedback also consistently outperforms Practice Testing alone" (p. 35). If students continue to practice wrong answers, errors, or misconceptions, then these will be successfully learnt and become high-confidence errors; hence the power of feedback. It is not the frequency of testing that matters, but the skill in using practice testing to learn and consolidate knowledge and ideas.

There are still many unanswered questions that need further attention. First, there is a need to develop a more overarching model of learning techniques to situate these 10 and the many other learning techniques. For example, we have developed a model that argues that various learning techniques can be optimised at certain phases of learning, from Surface to Deep to Transfer and from acquiring to consolidating knowledge and understanding, and that involves three inputs and outputs (knowing, dispositions, and motivations), which we call the skill, the will, and the thrill (Hattie and Donoghue, 2016). Memorisation and Practice Testing, for example, can be shown to be effective in consolidating surface knowing but not effective before surface knowing has first been acquired. Problem-based learning is relatively ineffective for promoting surface learning but more effective for deeper understanding, and thus should be optimal after students have been shown to have sufficient surface knowledge to work through problem-based methods.

Second, it was noted above that the preponderance of current studies (and perhaps classrooms) favour Surface and Near learning, and care should be taken not to generalise the results of either the original review or our meta-analysis to situations where Deep and Far learning is desired. Third, it is likely, as the original authors hint, that having a toolbox of optimal learning techniques may be most effective, but we suggest that a higher degree of self-regulation may be needed to know when to use them. Fourth, as the authors noted, it is likely that motivation and emotions are involved in the selection of, persistence with, and effectiveness of the learning techniques, so attention to these matters is imperative for many students. Fifth, given the extensive and robust evidence for the efficacy of these learning techniques, an important avenue of future research may centre on the value of teaching them to both teachers and students. Can these techniques be taught, and if so, how? Need they be taught in the context of specific content? In what ways can the emerging field of educational neuroscience inform these questions?

Sixth, Dunlosky and Rawson (2015) noted that more recent research may influence some of these findings. For example, they noted that while Interleaving was a "Low" technique, there have since been many studies demonstrating its benefits. Carvalho and Goldstone (2015), for instance, found that the way information is ordered impacts learning and that this influence is modulated by the demands of the study task, in particular whether learning is active or passive. Learners in the active study condition tend to look for features that discriminate between categories, and these features are easier to detect when categories frequently alternate (i.e., using Interleaving). Learners in the passive study condition are more likely to look for features that consistently appear within one category's examples, and these features are easier to detect when categories rarely alternate.

A significant limitation of the current study is that no publications beyond 2014 have been meta-analysed. Notwithstanding, the authors are unaware of any more recent study that contradicts any of our findings. Accordingly, the study represents a comprehensive and valid quantitative review of research published between 1929 and 2014, one that complements and underpins Dunlosky et al.'s (2013a) qualitative review.

Concluding Remarks

The major contribution of Dunlosky et al. (2013a), supported by the findings from this study, is to highlight the relative importance of learning techniques and to identify and allow for the optimal moderators. Clearly, more defensible models are needed that take into account the demands of the task, the timing of the intervention, and the role of learning techniques within content domains. Future research that examines the impact of these (and other) moderators, and incorporates findings into theoretical and conceptual models, is much needed.

Author Contributions

JH conceived the study and wrote the article with GD. GD found and coded all articles, worked on the analyses, and contributed to the writing.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Barnett, V., and Lewis, T. (1994). Outliers in statistical data . New York, NY: Wiley .

Borenstein, M., Cooper, H., Hedges, L., and Valentine, J. (2009). Effect sizes for continuous data. Handbook Res. Synth. Meta-Anal. 2, 221–235. doi:10.7758/9781610448864.4


Borenstein, M., Hedges, L., Higgins, J., and Rothstein, H. (2005). Comprehensive meta-analysis version 2 . Englewood, NJ: Biostat .

Carvalho, P. F., and Goldstone, R. L. (2015). The benefits of interleaved and blocked study: different tasks benefit from different schedules of study. Psychon. Bull. Rev. 22 (1), 281–288. doi:10.3758/s13423-014-0676-4


Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., and Rohrer, D. (2006). Distributed practice in verbal recall tasks: a review and quantitative synthesis. Psychol. Bull. 132 (3), 354. doi:10.1037/0033-2909.132.3.354

Cepeda, N. J., Vul, E., Rohrer, D., Wixted, J. T., and Pashler, H. (2008). Spacing effects in learning: a temporal ridgeline of optimal retention. Psychol. Sci. 19 (11), 1095–1102. doi:10.1111/j.1467-9280.2008.02209.x

Cooper, H., Hedges, L. V., and Valentine, J. C. (Editors) (2019). The handbook of research synthesis and meta-analysis . New York, NY: Russell Sage Foundation .

Donovan, J. J., and Radosevich, D. J. (1999). A meta-analytic review of the distribution of practice effect: now you see it, now you don’t. J. Appl. Psychol. 84 (5), 795. doi:10.1037/0021-9010.84.5.795


Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., and Willingham, D. T. (2013a). Improving students’ learning with effective learning techniques: promising directions from cognitive and educational psychology. Psychol. Sci. Public Interest 14 (1), 4–58. doi:10.1177/1529100612453266

Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., and Willingham, D. T. (2013b). What works, what doesn’t. Sci. Am. Mind 24 (4), 46–53. doi:10.1038/scientificamericanmind0913-46

Dunlosky, J., and Rawson, K. A. (2015). Practice tests, spaced practice, and successive relearning: tips for classroom use and for guiding students' learning. Scholarship Teach. Learn. Psychol. 1 (1), 72. doi:10.1037/stl0000024

Edwards, A. J., Weinstein, C. E., Goetz, E. T., and Alexander, P. A. (2014). Learning and study techniques: issues in assessment, instruction, and evaluation . Amsterdam, The Netherlands: Elsevier .

Grubbs, F. E. (1950). Sample criteria for testing outlying observations. Ann. Math. Statist. 21 (1), 27–58. doi:10.1214/aoms/1177729885

Hattie, J. A., and Donoghue, G. M. (2016). Learning techniques: a synthesis and conceptual model. Npj Sci. Learn. 1, 16013. doi:10.1038/npjscilearn.2016.13

Hattie, J. (2015). The applicability of Visible Learning to higher education. Scholarship Teach. Learn. Psychol. 1 (1), 79. doi:10.1037/stl0000021

Hattie, J. (2012). Visible learning for teachers: maximizing impact on learning . England, United Kingdom: Routledge .

Hattie, J. (2009). Visible learning: a synthesis of over 800 meta-analyses relating to achievement . England, United Kingdom: Routledge .

Hedges, L. V., and Olkin, I. (1985). Statistical methods for meta-analysis . Cambridge, MA: Academic Press .

Hedges, L. V., and Vevea, J. L. (1998). Fixed- and random-effects models in meta-analysis. Psychol. Methods 3 (4), 486–504.

Janiszewski, C., Noel, H., and Sawyer, A. G. (2003). A meta-analysis of the spacing effect in verbal learning: implications for research on advertising repetition and consumer memory. J. Consum. Res. 30 (1), 138–149. doi:10.1086/374692

Lee, T. D., and Genovese, E. D. (1988). Distribution of practice in motor skill acquisition: learning and performance effects reconsidered. Res. Q. Exerc. Sport 59 (4), 277–287. doi:10.1080/02701367.1988.10609373

Lipsey, M. W. and Wilson, D. B. (2001). Practical meta-analysis. Newbury Park, CA, United States: SAGE publications, Inc.

Nuthall, G. (2007). The hidden lives of learners . Wellington, New Zealand: NZCER Press.

Rowland, C. A. (2014). The effect of testing versus restudy on retention: a meta-analytic review of the testing effect. Psychol. Bull. 140 (6), 1432. doi:10.1037/a0037559

Tyack, D. B., and Cuban, L. (1995). Tinkering toward utopia . Cambridge, MA: Harvard University Press .

Keywords: meta-analysis, learning strategies, transfer of learning, learning technique, surface and deep learning

Citation: Donoghue GM and Hattie JAC (2021) A Meta-Analysis of Ten Learning Techniques. Front. Educ. 6:581216. doi: 10.3389/feduc.2021.581216

Received: 08 July 2020; Accepted: 08 February 2021; Published: 31 March 2021.


Copyright © 2021 Donoghue and Hattie. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Gregory M. Donoghue, [email protected]

  • Open access
  • Published: 23 October 2020

A systematic literature review of personalized learning terms

  • Atikah Shemshack   ORCID: orcid.org/0000-0003-4964-6171 1 &
  • Jonathan Michael Spector 1  

Smart Learning Environments volume  7 , Article number:  33 ( 2020 ) Cite this article

49k Accesses

89 Citations

4 Altmetric

Metrics details

Learning is a natural human activity that is shaped by personal experiences, cognitive awareness, personal bias, opinions, cultural background, and environment. Learning has been defined as a stable and persistent change in what a person knows and can do. Learning is formed through an individual’s interactions, including the conveyance of knowledge and skills from others and from experiences. So, learning is a personalized experience that allows one to expand their knowledge, perspective, skills, and understanding. Therefore, personalized learning models can help to meet individual needs and goals. Furthermore, to personalize the learning experience, technology integration can play a crucial role. This paper provides a review of the recent research literature on personalized learning, as technology is changing how learning can be effectively personalized. The emphasis is on the terms used to characterize personalized learning, as those can suggest a framework for personalized learning and will eventually be used in meta-analyses of research on personalized learning, which is beyond the scope of this paper.

Introduction

Personalized learning has been a topic of research for a long time. However, around 2008, personalized learning started to draw more attention and take on a transformed meaning, as seen in Fig. 1. We believe the variety of terms that have been used for personalized learning is an obstacle to the progress of personalized learning theories and research. Although there is an abundance of resources and studies on personalized learning, the lack of a readily agreed-upon term for personalized learning may be an obstacle to research progress on the topic. In response to this need, this paper is focused on analyzing the terms that have been used for personalized learning. A distinctly defined personalized learning approach can help educational researchers to build on previous data, instead of trying to start new research from scratch each time. This paper will present a research-based framework for personalized learning and discuss future research directions, issues, and challenges through an in-depth analysis of the definitions and terms used for personalized learning.

figure 1

The number of published papers on “personalized learning”

Personalized learning has existed for hundreds of years in the form of apprenticeship and mentoring. As educational technologies began to mature in the last half of the previous century, personalized learning took the form of intelligent tutoring systems. In this century, big data and learning analytics are poised to transform personalized learning once again. Learning has been characterized as a stable and persistent change in what a person knows and can do (Spector, 2015). Personalized learning is a complex activity that is the product of self-organization (Chatti, 2010; Miliband, 2006) and of customized instruction that considers individual needs and goals. Personalized learning can be an efficient approach that increases motivation, engagement, and understanding (Pontual Falcão, e Peres, Sales de Morais and da Silva Oliveira, 2018), maximizing learner satisfaction, learning efficiency, and learning effectiveness (Gómez, Zervas, Sampson and Fabregat, 2014). However, while such personalized learning is now possible, it remains one of the biggest challenges in modern educational systems. In this paper, a review of progress in personalized learning using current technologies is provided. The emphasis is on the characteristics of personalized learning that need to be taken into consideration to have a well-developed concept of personalized learning.

We started with the definition of personalized learning suggested by Spector (2014, 2018) and others discussed below, which requires that a digital learning environment, to be classified as a personalized learning environment, be adaptive to individual knowledge, experience, and interests, and be effective and efficient in supporting and promoting desired learning outcomes. These characteristics are those typically discussed in the research community, although we found it challenging to find a sufficient number of published cases that reported effect sizes and details of the sample in order to conduct a formal meta-analysis. The lack of such cases suggests that personalized learning in the digital era is still in its infancy. As a result, we conducted a more informal albeit systematic review of published research on personalized learning.

Furthermore, we, along with many educational technologists, believe an efficient personalized learning approach can increase learners’ motivation and engagement in learning activities so that learning improves. While that outcome now seems achievable, it remains a largely unrealized opportunity according to this research review. Truong (2016) stated that providing the same content to students who have different qualifications, personal traits, interests, and needs is no longer considered adequate when learning can now be personalized. Miliband (2006, as cited in Lee, Huh, Lin and Reigeluth, 2018) promoted personalized learning as the solution for tailoring learning to individuals’ needs and prior experience, allowing everyone to reach their maximum potential through customized instruction (Hsieh and Chen, 2016; Lin, Yeh, Hung and Chang, 2013). Customized instruction covers what is taught, how it is taught, and the pace at which it is taught. This allows learning to meet individual needs, interests, and circumstances, which can be quite diverse (Brusilovsky and Peylo, 2003; Liu and Yu, 2011). Furthermore, FitzGerald et al. (2018) pointed out that the personalization of learning is now a recurring trend across government agencies, popular media, conferences, research papers, and technological innovations.

Personalized learning is in demand (Huang, Liang, Su and Chen, 2012) due to new technologies involving big data and learning analytics. It should be tailored and continuously modified to an individual learner’s conditions, abilities, preferences, background knowledge, interests, and goals, and be adaptable to the learner’s evolving skills and knowledge (Sampson, Karagiannidis and Kinshuk, 2002; Sharples, 2000). Today’s personalized learning theories are inspired by educational philosophy from the progressive era in the previous century, especially John Dewey’s (1915, 1998) emphasis on experiential, learner-centered learning, social learning, extension of the curriculum, and preparing for a changing world. McCombs and Whisler (1997, as cited in Lee et al., 2018) claimed that a learner-centered environment develops as it considers learners’ unique characteristics using the best available knowledge of teaching and learning. Furthermore, Lockspeiser and Kaul (2016) claimed that individualized learning is a tool to facilitate learner-centered education. FitzGerald et al. (2018) pointed out that personalization is a crucial topic of current interest in technology-oriented learning design and in discussions among government policymakers, but less so in educational research. This might help explain the disunity of personalized learning approaches.

On the other hand, Niknam and Thulasiraman (2020) argued that the educational community has been interested in having a personalized learning system that adjusts the pedagogy, curriculum, and learning environment for learners to meet their learning needs and preferences. A personalized learning system can adapt itself when providing learning support to different learners to overcome the weakness of one-size-fits-all approaches in technology-enabled learning systems. The goal is a learning system that can dynamically adapt itself based on a learner’s characteristics and needs to provide personalized learning. Human one-on-one tutors can do this, and it is now possible for digital systems to do so as well. Schmid and Petko (2019) pointed out that a look at the international research literature shows that personalized learning is a multilayered construct with numerous definitions and various forms of implementation, which supports our claim that one of the most critical problems with personalized learning is that there is no readily agreed-upon meaning of the phrase ‘personalized learning’. Schmid and Petko (2019) supported this claim by stating that a clearly defined concept of personalized learning is still lacking; instead, it serves as an umbrella term for educational strategies that try to do justice to the individual abilities, knowledge, and learning needs of each student. Spector (2013) claimed that there would be more robust information to support personalized learning as technology develops. Many different terms have been used in place of ‘personalized learning’. The researchers could not locate a systematic literature review of the terms that have been used for personalized learning, and it is important to address this need. Therefore, this review was done to close that gap and respond to the need for a unified personalized learning term. As a result, personalized learning definitions and the terms that have been used interchangeably, such as adaptive learning, individualized instruction, and customized learning, are analyzed in this paper. These terms were chosen because they have been the most used in the education field (Reisman, 2014). In the next several sections, each term will be defined, and its relationship with personalized learning will be discussed. The analysis of these terms guided the systematic review of the research literature that follows.

Adaptive learning

Most educators recognize the advantages of adaptive learning, but evidence-based research remains limited as adaptive learning is still evolving (Liu, McKelroy, Corliss and Carrigan, 2017). Adaptive learning is one of the terms that has been used interchangeably with personalized learning. Adaptive learning systems are built on principles that have been around for a very long time, dating back to the era of apprenticeship training and human tutoring. However, many other labels, such as individualized instruction, self-paced instruction, and personalized instruction, were used interchangeably while trying to produce the most suitable sequence of learning units for each learner (Garcia-Cabot, De-Marcos and Garcia-Lopez, 2015; Reisman, 2014). While early forms of adaptive learning (e.g., apprenticeship training and human tutoring) only dealt with one or a very small number of learners, the current interest is in using adaptive learning for large numbers of learners, which is why there is such interest in big data and learning analytics.

For instance, adaptive learning was used interchangeably with personalized learning by Yang, Hwang and Yang (2013) in their study on the development of adaptive learning that considered students’ preferences (Dwivedi and Bharadwaj, 2013) and characteristics, including learning styles (Çakıroğlu, 2014; Klašnja-Milićević, Vesin, Ivanović and Budimac, 2011) and cognitive styles (Lo, Chan and Yeh, 2012), an approach they concluded to be effective. Wang and Liao (2011) defined adaptive learning as a developed system (Lu, Chang, Kinshuk, Huang and Chen, 2014) that accommodates a variety of individual differences (Scheiter et al., 2019; Wang & Liao, 2011), such as gender, learning motivation, cognitive type, and learning style, to determine the optimal adaptive learning experience (Afini Normadhi et al., 2019) and to remove barriers of time and location. Griff and Matter (2013) discussed that adaptive learning is also referred to as computer-based learning, adaptive educational hypermedia, and intelligent tutoring. Furthermore, Hooshyar, Ahmad, Yousefi, Yusop and Horng (2015) used personalized and adaptive learning to explain the importance of the Intelligent Tutoring System (Aeiad and Meziane, 2019) for implementing one-to-one personalized and adaptive teaching. “Although the terms ‘personalized learning’ and ‘adaptive learning’ are different, they are often used interchangeably in various studies” (Aroyo et al., 2006; Göbel et al., 2010; Gómez et al., 2014; Lin et al., 2013, as cited in Xie, Chu, Hwang and Wang, 2019, p. 2).

Based on this review, adaptive learning systems are defined as computerized learning systems that adapt learning content, presentation styles, or learning paths based on individual students’ profiles, learning status, or human factors (Chen, Liu and Chang, 2006; Tseng et al., 2008; Yang et al., 2013).
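A toy illustration of that definition is sketched below: a system that picks the next learning unit and presentation style from a simple learner profile. All field names, thresholds, and rules here are invented for illustration and are not drawn from any cited system.

```python
from dataclasses import dataclass

@dataclass
class LearnerProfile:
    prior_knowledge: float   # 0.0 (novice) to 1.0 (expert); hypothetical scale
    prefers_visual: bool     # a stand-in for one "human factor"
    recent_score: float      # score on the last practice task, 0.0 to 1.0

def next_unit(profile: LearnerProfile) -> dict:
    # Adapt the learning path: remediate after a weak score, advance otherwise.
    if profile.recent_score < 0.6:
        path = "review previous unit"
    elif profile.prior_knowledge > 0.8:
        path = "skip ahead to advanced unit"
    else:
        path = "proceed to next unit"
    # Adapt the presentation style to a profile attribute.
    style = "diagram-first" if profile.prefers_visual else "text-first"
    return {"path": path, "presentation": style}

print(next_unit(LearnerProfile(prior_knowledge=0.4, prefers_visual=True, recent_score=0.5)))
```

Real systems replace these hand-written rules with models estimated from learner data, but the adapt-on-profile structure is the same.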

Individualized instruction

Individualized instruction is one of the terms often used to refer to addressing the specific needs and goals of individuals during instruction. The U.S. Department of Education (2010) defined personalized learning as involving customizing the learning pace to individual learners (individualization), tailoring instructional methods (differentiation), and personalizing learning content. This notion has evolved from one-on-one human tutoring. It is not agreed whether individualization is a component of personalized learning or another term that can be used in place of personalized learning. The review results show that, instead of being a component, individualized instruction has been used as a replacement term for personalized learning and as a product of personalized learning. Chatti, Jarke and Specht (2010) and Chou, Lai, Chao, Lan and Chen (2015) used both terms without defining or explaining how they relate to each other. Bahçeci and Gürol (2016) created a portal that offers individualized learning content based on the individual’s level of cognitive knowledge, and stated that education should be done by recognizing the individual differences of students, such as their learning styles (Çakıroğlu, 2014; Klašnja-Milićević et al., 2011) and characteristics. The researchers observed that Bahçeci and Gürol (2016) used individualized learning and personalized learning interchangeably without pointing out that they were doing so.

Also, most individualized learning studies have used individualized instruction to refer to IEPs (individualized educational plans) for students with disabilities, to accommodate their needs and goals. Even though individualized instruction is suggested as an approach that individualizes material to improve the learning experience for students with learning disabilities, it can benefit all students (Barrio et al., 2017; Ko, Chiang, Lin and Chen, 2011). Personalized learning considers students’ interests, needs, readiness, and motivation and adapts to their progress by situating the learner at the center of the learning process. Individualized learning allows for individualization of learning based on the learner’s unique needs (Cavanagh, 2014; Lockspeiser & Kaul, 2016). While a learner-centered paradigm of education has influenced personalized learning, the current teacher-student ratios in school systems seem to be an obstacle to making learning experiences personalized for individual students without technology (Lee et al., 2018), with the exception of the requirement for IEPs in many school districts. We follow the definition offered by the U.S. Department of Education and note that individualized learning in school systems requires significant technology support, such as big data and learning analytics.

Customized learning

Lee et al. (2018) suggested a learner-centered system that supports the diverse needs and development of individual learners’ potential; this system develops customized instructional methods and learning content for individual learners with unique characteristics and interests. Lee et al. (2018) suggested that learner-centered learning and personalized learning be blended and considered together, and defined a personalized learning plan (PLP) as a customized instructional plan (Somyürek, 2015) that considers individual differences and needs, characteristics, interests, and academic mastery. The PLP includes the notions of individualization, differentiation, and personalization, allowing learning to be personally relevant, engaging, appropriate to the learners’ capabilities, and respectful of individual differences, making learning useful and motivational.

The review of these three terms reveals a great deal of overlap, with an emphasis on the need to use technology to support such efforts. This study reviews definitions of personalized learning terms used in research papers from 2010 to 2020 by systematically reviewing the literature to compare the similarities and differences in the definitions of each of these terms. The hope is to synthesize the terms used for personalized learning so that researchers can analyze and go through the research in the field and conduct meta-analyses and syntheses of the research literature. Also, analyzing the definitions of the terms ‘personalized learning’, ‘adaptive learning’, ‘individualized instruction’, and ‘customized learning’ can help to develop a unified definition for personalized learning that can lead to a framework. The framework can help with having a common understanding of personalized learning rather than a collection of loosely defined systems. A unified description of personalized learning and analysis of the studies related to personalized learning can help consolidate findings and suggest new areas to explore.

Our idea of personalized learning rests on the foundation that humans learn through experience and by constructing knowledge. Constructivism claims that learners’ acquired knowledge and understanding determine learning ability and that knowledge acquisition is a process of construction according to individuals’ experience (Ormrod, 2011). Personalized learning is influenced by a learner’s prior experiences, backgrounds, interests, needs, goals, and motivation. Moreover, it is accomplished via meaningful interactions in individual learners’ lives. Furthermore, no conscious effort is needed to be actively learning while engaged in everyday life (Kinshuk, 2012), although reflection and metacognition can promote learning.

Adaptive instruction, blended instruction, differentiation, customized instruction, individualized learning, adaptive learning, proactive supports, real-world connections, and applications are hallmarks of good personalized learning. In general, personalized-learning models seek to adapt the pace of learning and the instructional strategies, content, and activities being used to best fit each learner’s strengths, weaknesses, and interests. Personalized learning is about giving students some control over their learning (Benhamdi, Babouri and Chiky, 2017; Jung, Kim, Yoon, Park and Oakley, 2019; Tomberg, Laanpere, Ley and Normak, 2013), differentiating instruction for each learner, and providing real-time individualized feedback to teachers and learners (Nedungadi and Raman, 2012), all seamlessly blended throughout the learning activity. Putting a framework together can help with a practical personalized learning model for all. The model can be developed and evolved as technology develops and we learn more about human learning and machine learning.

Research methodology

For this review, the guidelines published by Okoli for conducting a systematic literature review for information systems research were adapted (Okoli, 2015). Okoli’s work provides a detailed framework for writing a systematic literature review with its roots in information technology. As this systematic literature review is rooted in information technology, it was deemed appropriate to use Okoli’s work as the basis for this body of work.

Okoli presented eight significant steps that need to be followed to conduct a scientifically rigorous systematic literature review. These steps are listed below:

Identify the purpose: The researchers identified the purpose and intended goals of the study to ensure the review is clear to readers.

Draft protocol and train the team: Reviewers agreed on procedures to follow to ensure consistency in how they completed the review.

Apply practical inclusion screen: Reviewers were specific about which studies they considered for review and which ones they eliminated without further examination. The reviewers created four review phases to produce the final set of papers to review.

Search for literature: Reviewers described the literature search details and justified how they ensured the search’s comprehensiveness.

Extract data: After reviewers identified all the studies to be included in the review, they systematically extracted the applicable information from each study by going through the four review phases explained in the search query section.

Appraise quality: The reviewers explicitly listed the criteria used to decide which papers to exclude for insufficient quality in the search query. The researchers reviewed all papers and decided on the final papers after four explicit search phases, finalizing the papers to be reviewed based on the papers’ content and quality.

Synthesize studies: The researchers analyzed the data obtained from the studies using appropriate qualitative techniques.

Write the review: The process of the systematic literature review was explicitly described in adequate detail so that other researchers can independently reproduce the review’s results.

Research question

This literature review promotes research around personalized learning in informational education. To answer the question “What are the similarities and differences of the different terms used for personalized learning approaches?” we need a research base and theoretical framework that provides answers to basic questions. The following sub-questions were considered during the study:

How is personalized learning defined?

How has adaptive learning been used, and how does it relate to personalized learning?

How has individualized instruction been used, and how does it relate to personalized learning?

How is customized learning connected to personalized learning?

What components need to be included in a well-defined personalized learning term?

Also, the researchers are seeking a unified definition of personalized learning that will include all those different components; that is why this literature review was conducted.

Sources of literature

To answer the research question, the researchers selected the following well-known and reputable databases on which to base this literature review: Scopus, Science Direct, EBSCOhost, IEEE Xplore, JSTOR, and Web of Science, to ensure all related journals in the field were included. The most relevant journals for the systematic review were chosen consistently from these databases. Also, the Google Scholar h5-index for the category “Educational technology” was used as the starting point, since this is a specific category for personalized learning studies.

Databases in which to base this literature review are listed in Table  1 .

The top nine journals from the “Educational Technology” category of the Google Scholar h5-index were selected to keep the range of papers manageable while trying to ensure the review was broad enough to include enough studies to satisfactorily answer the research question. Later, most of the journals about educational technology were indexed. SJR (SCImago Journal Rank) was used to validate the impact of the selected journals. Even though the impact factor is not perfectly aligned with Google Scholar’s h5-index order, the selected journals are among the most impactful in the educational technology field. Also, even though the Journal of Learning Analytics was listed on Google Scholar and shown to have a high impact on educational technology, the researchers did not locate any qualifying paper according to the selection procedures, so this journal was eliminated from the review.

This review solely retrieved peer-reviewed articles from online journals because online academic journals are known to be reliable and authoritative. They allow readers to verify the facts from their sources, which increases the reliability of studies filled with data and facts, and they enable readers to perform comprehensive research and access more data without the limitations of space and time. A defined method was set in this research for selecting journals, to keep the process methodologically reliable and scientifically consistent. The researchers reviewed the main databases for educational technology to ensure all related journals in the field were included. This review is focused only on journals to keep the scope of the review manageable and to provide reviewed data as a resource for future studies.

Journals in which to base this literature review are listed in Table  2 .

Supplementary procedures

Relevant papers were initially identified through traditional searches of online databases and journals. These papers were subsequently analyzed to determine their applicability to the study.

Search query

An appropriate search query was formulated to find relevant personalized learning papers. The search query was as follows: (Publication Title: (“journal name”)) AND (“term”), and the journals listed in the table were searched for each of the following terms: “personalized learning”, “adaptive learning”, “individualized instruction”, and “customized learning”.
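A minimal sketch of generating that query template for every journal/term pair is shown below; the journal names are examples of the kind of venues searched, not the study’s exact list from Table 2.

```python
TERMS = ["personalized learning", "adaptive learning",
         "individualized instruction", "customized learning"]
JOURNALS = ["Computers & Education",
            "British Journal of Educational Technology"]

def build_query(journal: str, term: str) -> str:
    # Mirrors the reported template: (Publication Title: ("journal name")) AND ("term")
    return f'(Publication Title: ("{journal}")) AND ("{term}")'

for journal in JOURNALS:
    for term in TERMS:
        print(build_query(journal, term))
```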

Inclusion/exclusion criteria

Four phases were determined for a paper to meet the inclusion criteria for the final set to be reviewed. The first phase was the initial search: searching for each term (‘personalized learning’, ‘adaptive learning’, ‘individualized instruction’, ‘customized learning’), with years filtered to 2010–2020, the period in which personalized learning has been a hot topic for researchers and policymakers. The language was filtered to English only (to avoid waiting on translation), the paper had to address technology integration, and the paper type was restricted to research articles published in one of the peer-reviewed scientific journals listed, to keep the scope manageable. The second search phase was elimination by title, abstract, and keywords: the researchers went through the titles, abstracts, and keywords of each result of the initial search and included the ones that looked related to the term.

In the next search phase, the abstract of each paper in the second result set was read, looking for a definition of the term and/or the terms that have been used for it, and checking that the paper was available in one of the free online databases or the researchers’ university library. The fourth step was to download all those papers to Mendeley (an indexing database) and index them under sub-folders for each journal database. Then each paper was read in its entirety to determine whether it should be included in the literature review, looking for components and definitions of personalized learning, and the ones to be included were starred. Each paper that met the inclusion criteria was read in its entirety a second time to validate the decision to include it in the final data set.
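A minimal sketch of this four-phase screening is given below. Each paper is a simple record, and the predicates are simplified stand-ins for the reviewers’ judgments; all field names and checks are assumptions for illustration, and the final full-text phase remains a human step.

```python
from dataclasses import dataclass

@dataclass
class Paper:
    title: str
    abstract: str
    year: int
    language: str
    available: bool        # accessible via a free database or the library

def phase1(p: Paper, term: str) -> bool:
    # Initial search filters: date range and language (article type assumed upstream).
    return 2010 <= p.year <= 2020 and p.language == "English"

def phase2(p: Paper, term: str) -> bool:
    # Title/abstract/keyword relevance to the search term.
    return term in (p.title + " " + p.abstract).lower()

def phase3(p: Paper, term: str) -> bool:
    # Abstract mentions a definition of the term and the paper is obtainable.
    return p.available and "defin" in p.abstract.lower()

def screen(papers: list[Paper], term: str) -> list[Paper]:
    for phase in (phase1, phase2, phase3):
        papers = [p for p in papers if phase(p, term)]
    # Phase 4 (full-text reading and second-pass validation) is a human judgment step.
    return papers
```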

An initial search on Google Scholar for ‘personalized learning’ shows that the number of published papers on personalized learning has progressively increased year by year, with a notable jump in 2008, as seen in Fig. 1. The date range of 2010 to the present day was chosen because this is when the term ‘personalized learning’ started to gain more research attention due to the increased use of technology in education. The first smartphone was released in June 2007, which might be an element of the increase due to the flexibility and access it provides. Cheung and Hew (2009) claimed that handheld devices are increasingly being used in educational settings. Primarily, papers published after the 2000s are focused on more technology-enhanced personalized learning.

Nine journals were determined as the source of papers to be reviewed for this study. Each journal was searched for “personalized learning,” “adaptive learning,” “individualized instruction,” and “customized learning,” and each result went through the inclusion criteria and final phase; papers were saved in Mendeley under subfolders for each journal. Table 3 shows the search results for each phase by journal.

The title, abstract, and, when necessary, the full paper were reviewed to decide if the paper met the inclusion criteria. This process helped to finalize the papers used for this study, and the result set for each term is shown in Table 3. Some of the papers that did not fit the inclusion criteria are nevertheless referenced in this paper as they provide valuable information about personalized learning. We reviewed 978 papers, and the four phases of inclusion yielded 56 relevant, high-quality papers, which are marked in the references section with an asterisk. As shown in Table 4, of the 56 papers that met the minimum quality criteria and were examined in detail, 33 use personalized learning, 17 adaptive learning, three individualized instruction, and three customized learning as the main term.

Our findings revealed that although many terms are used in education settings, by policymakers, and in corporate settings, in the research field the terms used for personalized learning are more unified, with mostly personalized learning and/or adaptive learning being used. For example, Chatti et al. (2010) and Peng, Ma and Spector (2019) put the two most common terms together and started to use “personalized adaptive learning,” which might be a good lead for future studies. However, future research needs to focus on the components included in the definition of the personalized adaptive learning term. Chatti et al. (2010) and Peng et al. (2019) put this together very well, and Peng et al. (2019) called it a personalized adaptive smart learning environment. Future studies can focus on what components are being used for each personalized learning approach and, at the same time, acknowledge that the term will evolve over time as we learn more about human learning and as technology develops. Table 4 shows the results of the searches for each term by journal.

Existing and emerging trends

Miliband (2006, as cited in Schmid & Petko, 2019) pointed out that the Organisation for Economic Co-operation and Development (OECD, 2006) was among the first to use the term personalized learning, describing it in the report “Schooling for Tomorrow – Personalising Education” as a critical trend. According to this educational policy report, personalized learning is characterized by changes along five dimensions: assessment for learning by giving students individual feedback and setting suitable learning objectives, teaching and learning strategies based on individual needs, curriculum choices (Tomberg et al., 2013), a student-centered approach to school organization, and strong partnerships beyond the school.

According to the United States National Education Technology Plan 2017 , personalized learning is defined as “instruction in which the pace of learning and the instructional approach are optimized for each learner’s needs. Learning objectives, instructional strategies, and instructional content (Shute and Rahimi, 2017 ) may differ depending on learner needs. Besides, learning activities are meaningful and relevant to learners, driven by their interests, and often self-initiated.” (p. 9).

American Psychological Association Presidential Task Force on Psychology in Education (1993, as cited in Lee et al., 2018 ) explained that a personalized learning plan (PLP) refers to a customized instructional plan that considers individual differences or needs such as career goals, characteristics, interests, and academic mastery. This includes the notions of individualization, differentiation, and personalization. Preparing and implementing PLPs allows for adjusting the pace to individual learners, adjusting instructional methods to individual characteristics, and having different learning goals tailored to individual interests. Furthermore, Sungkur, Antoaroo and Beeharry ( 2016 ) suggested an eye-tracking system to determine the user’s interest and behavior. The PLPs allow learning to be personally relevant, engaging, appropriate to the learners’ capabilities, and respectful of individual differences, making learning useful and motivational.

Learning analytics continues to grow to support the process of personalizing content, providing mechanisms to identify student characteristics and associate them with a learning pattern (Ramos de Melo et al., 2014). Also, the ability to reactively organize personalized content may be a favorable factor in promoting study support in virtual learning environments, respecting students’ different individualities, preferences (Erümit and Çetin, 2020), and difficulty factors.

There is a research gap in adaptive learning environments concerning emotions and personality, which play a significant role in parts of adaptive systems such as feedback (Fatahi, 2019). Furthermore, Junokas, Lindgren, Kang and Morphew (2018) created a system based on multimodal educational environments that integrates gesture-recognition systems and found that it is effective in improving the learning experience.

The personalization of learning has been achieved using various methods made available by the rapid development of information communication technology (ICT) (Dawson, Heathcote and Poole, 2010). Furthermore, Ramos de Melo et al. (2014) stated that personalization is customizing the content so that parts of the content can be presented as needed by the student. That is one of the most common themes among personalized learning approaches, and it can be accomplished using adaptive learning systems that present personalized content to individual students (Hwang, Sung, Hung and Huang, 2013).

Higher-order thinking skills and communication have attracted little attention in terms of both learning outcomes and the process of adaptive/personalized learning, due to the difficulty of measurement and the limited types of learning support. Virtual reality techniques might be a solution to this need. Learning approaches should be developed that build on students’ current ability and support efficacy beliefs by allowing autonomy with a proper level of challenge, to promote academic attainment (Foshee, Elliott and Atkinson, 2016; Xie et al., 2019). Future studies can focus on cultivating higher-order thinking skills by supporting these skills through personalized learning environments.

The idea of personalized learning rests on the foundation that humans learn through experience and by constructing knowledge. It is heavily influenced by a learner’s prior experiences and is accomplished via language and social interaction. Personalized learning is not the only way to think about teaching and learning, and learning will and should take many different forms. Proper instruction, blended instruction, differentiation, proactive supports, real-world connections, and applications are hallmarks of good, sound personalized learning. In general, personalized-learning models seek to adapt the pace of learning and the instructional strategies, content, and activities being used to best fit each learner’s strengths, weaknesses, and interests. Personalized learning is about giving students control over their learning, differentiating instruction for each child, and providing real-time feedback. Putting a framework together can help with practical personalized learning for all and can be developed as it faces challenges. The framework can help with having a structured, common-sense view of personalized learning instead of a learning system that is interpreted differently by everyone. In conjunction with a well-designed curriculum, instructional practice plays a crucial role in how children learn.

Most current personalized learning models and ideas are built on technology integration. For example, Chen, Lee and Chen (2005) proposed a personalized system that provides learning paths (Nabizadeh, Gonçalves, Gama, Jorge and Rafsanjani, 2020) that can be adapted to various levels of difficulty of course materials (Zou and Xie, 2018) and various abilities of learners (p. 239). Klašnja-Milićević et al. (2011) stated that personalized learning occurs when e-learning systems make deliberate efforts to design educational experiences (Flores, Ari, Inan and Arslan-Ari, 2012) that fit the needs, goals, talents, motivations, and interests of their learners (p. 885). The term ‘needs’ is not specified, leaving unclear which needs of the learner should be considered for robust personalized learning. Considering the needs of the learner is one of the most common components used in personalized learning; however, only a few studies clarify which needs are meant, such as emotional needs, social needs, learning needs, knowledge needs, etc. Even if we agree on a unified definition with each component commonly agreed upon, we need to ensure that each component is well defined.

In the past decades, many methods and systems have been proposed to accommodate students’ needs through learning environments that take personal factors into account. Learning styles (Çakıroğlu, 2014 ; Klašnja-Milićević et al., 2011 ; Latham, Crockett and McLean, 2014 ) have been among the most widely chosen components in previous studies as a reference for adapting learning. For example, George and Lal ( 2019 ) argued that personalized learning is meant to incorporate a learner’s varied attributes, including learning style, knowledge level on a subject, preferences, and prior knowledge, whereas they described adaptive learning as adapting content according to the learner’s choice and pace. Chen, Huang, Shih and Chang ( 2016 ) brought the gender component into personalized learning. Furthermore, Atkinson ( 2006 ) found a significant difference in learning achievement between male and female students, and among students who used different learning styles (Çakıroğlu, 2014 ; Klašnja-Milićević et al., 2011 ; Latham et al., 2014 ).

Our findings revealed that individualized instruction mostly focuses on special education students or students who are limited in some way compared to their peers. These students have IEPs (individualized educational plans), mandated by the state, that must be followed to ensure that schools accommodate these students’ needs. One goal could be to create IEPs for all learners.

Moreover, it seems that in the education industry the terms are quite varied, whereas in academia adaptive learning and personalized learning are mostly used interchangeably (Rastegarmoghadam and Ziarati, 2017 ); adaptive learning, however, is mostly used in the context of technology-enhanced learning. Adaptivity typically refers to content being adjusted according to prior knowledge (Huang and Shiu, 2012 ), while personalized learning is used for broader adjustments according to the different needs, interests, and goals of individuals.

Another finding is that adaptive learning is the most used term after personalized learning. Individualized learning and customized learning, even though they are used in the corporate sector, are not common in research. As shown in Table  4 , we found 56 papers that met the minimum quality criteria and were examined in detail; 33 of them use personalized learning, 17 adaptive learning, three individualized instruction, and three customized learning.

The lack of a commonly identified personalized learning approach also appears to be an obstacle. This might be due to the nature of technology involvement: the rapid development of technology makes personalized learning an evolving approach. That is acceptable if we can all agree that it should evolve as technology improves and as we learn more about humans and about how human-machine interaction can improve the learning process.

Another obstacle is that researchers and policymakers should show the same level of interest in personalized learning so that demand and research can align. Educators fear that machines will take over the teaching job if they allow technology to be used for teaching. Kinshuk, Huang, Sampson and Chen ( 2013 ) argued that the benefits of technology in education caught the interest of researchers, governments, and funding agencies. Computer systems were funded to help students in the learning process and consequently decrease teachers’ workload. As a result, educational technology research was able to study advanced issues such as intelligent tutoring, simulations, advanced learning management systems, automatic assessment systems, and adaptive systems. Some educators believe that, since technology involves big budgets, the interest of policymakers stems not from a desire to improve the learning experience (Troussas, Krouska and Sgouropoulou, 2020 ) but from the monetary benefit they gain from the increased use of technology in education. In addition, Kinshuk et al. ( 2013 ) pointed out that practitioners in education could not take advantage of all that research at an equally fast pace, and implementation lagged severely behind. Researchers need to keep up with the demand for personalized learning. This alignment will help ensure that the practices policymakers discuss are research-based, efficient approaches that will increase the efficiency of learning and teaching.

The progress of research in personalized learning shows that, with technological improvement, personalized learning is becoming more embedded in technology and is taking advantage of the benefits technology can offer. One such advantage is gathering data on learners’ emotions by using bio-trackers, which might raise privacy concerns.

Limitations

This study encountered several shortcomings during the review and in its attempt to answer all the research questions. The enormous number of published papers might have led to some relevant papers being missed; numerous literature review studies face this problem. Furthermore, constructing the search by identifying the right keywords is crucial to the search process and demands considerable effort. The keywords were determined through a snowballing process to identify the terms relevant to this study. Overlooking articles by omitting relevant information or keyword combinations is likewise possible, given the limited time frame.

Nevertheless, this study also faces possible limitations caused by the selection criteria. For example, it focused only on journal articles and was limited to documents written in English. Therefore, other pertinent articles that were not written in English or not published in journals might not have been included.

Future research

Our findings revealed that there is no unified agreement on which components to consider when planning a personalized adaptive learning environment. Future research can focus on the components included in different personalized adaptive learning systems and on the term’s definition, in order to build a unified approach and definition. Studies examining which components are used in each personalized learning approach also need to acknowledge that the term will evolve over time as we learn more about human psychology and develop more technologies. Chatti et al. ( 2010 ) and Peng et al. ( 2019 ) put this together very well, and Peng et al. ( 2019 ) called it personalized adaptive learning. Future studies can build on this approach to develop a general framework.

A focus on higher-order thinking skills is also not a common theme in the existing literature. This gap can be filled by cultivating higher-order thinking skills through personalized learning environments. Future studies can likewise treat higher-order thinking skills as an outcome of personalized learning models and seek to embed virtual reality techniques while considering ethical and privacy concerns.

Furthermore, an in-depth study is needed to review current personalized adaptive learning platforms/systems and to examine whether different systems work better for different goals and needs.

Conclusions

In conclusion, this study found and analyzed 56 relevant studies based on the research protocol. The findings support the view that adaptive/personalized learning has become a fundamental learning paradigm in the educational technology research community. The findings are first presented as they relate to the research questions (RQs); then the future directions and limitations are discussed. The SLR results show that using personality traits and their identification techniques has an enormously positive influence in adaptive learning environments. This study relates to several significant domains of psychology, education, and computer science. It likewise reveals the integration of personal traits into the adaptive learning environment, which involves many personality traits and identification techniques that can influence learning. It also found an increase of interest in two areas oriented towards the incorporation and exploration of significant data capabilities in education, Educational Data Mining (EDM) and Learning Analytics (LA), and their respective communities (Papamitsiou and Economides, 2014 ), which seem to add another perspective to personalized learning and make it easier to modify learning according to the individual.

It seems that personalized learning models gain more attention from governments and policymakers than from educators and researchers. We need to address this lack of interest and motivate educators and researchers, the experts of the field, to voice their concerns and look for solutions, so as to come up with a robust personalized learning model that satisfies both instructors’ and learners’ expectations. Personalized learning cannot be a solution to learning until it is defined better and developed more thoroughly. Personalized learning looks different for everyone, according to the needs and goals of the individual. Ennouamani, Mahani and Akharraz ( 2020 ) argued that learners differ in terms of their needs, knowledge, personality, behavior (Pliakos et al., 2019 ), preferences, learning style, culture, as well as the parameters of the mobile devices that they use. Furthermore, the increasing involvement of researchers and educators in proposing personalized learning approaches can increase trust in ICT-supported personalized learning models.

In this review study, we have answered some critical research questions, covering the different terms that have been used for personalized learning, the components of personalized learning, and the obstacles to its development. More research on personalized learning is needed. We also need experts in the field, educators, pedagogues, researchers, software engineers, and programmers to form teams that work towards the same goal of producing stable, unified personalized learning systems and models.

Some research issues and potential future development directions are also discussed. According to the discussions and results, adaptive/personalized learning systems seem to evolve as technology develops; however, a unified agreement on the components that need to be included in personalized learning models is still lacking. These components may evolve as we learn more about human-machine interaction and learn to take advantage of technology to improve learning experiences. We suggest that researchers might use the consolidated terms of this review to guide future meta-analyses of the impact of personalized learning on student learning and performance.

To sum up, this study discusses the potential obstacles to personalized learning and practical solutions to these issues. We also discussed the different components used in personalized learning models and how personalized learning evolves as technology develops and as we learn more about human-machine interaction.

Availability of data and materials

Not applicable.

Abbreviations

IEP: Individualized educational plans

PLP: Personalized learning plan

SJR: Scimago Journal Rank

OECD: Organisation for Economic Co-operation and Development

ICT: Information communication technology

SLR: Systematic Literature Review

RQ: Research Question

EDM: Educational Data Mining

LA: Learning Analytics

The papers identified for systematic literature review are marked in the references with an asterisk (*)

*Aeiad, E., & Meziane, F. (2019). An adaptable and personalised elearning system applied to computer. Education and Information Technologies , 78 , 674–681.

*Afini Normadhi, N. B., Shuib, L., Md Nasir, H. N., Bimba, A., Idris, N., & Balakrishnan, V. (2019). Identification of personal traits in adaptive learning environment: Systematic literature review. Computers & Education , 130 , 168–190. https://doi.org/10.1016/j.compedu.2018.11.005 .

Aroyo, L., Dolog, P., Houben, G.-J., Kravcik, M., Naeve, A., Nilsson, M., & Wild, F. (2006). Interoperability in personalized adaptive learning. Educational Technology & Society, 9 (2), 4–18.

Atkinson, S. (2006). Factors influencing successful achievement in contrasting design and technology activities in higher education. International Journal of Technology and Design Education , 16 , 193–213.

Bahçeci, F., & Gürol, M. (2016). The effect of individualized instruction system on the academic achievement scores of students. Education Research International , 2016 , 1–9. https://doi.org/10.1155/2016/7392125 .

Barrio, B. L., Miller, D., Hsiao, Y. J., Dunn, M., Petersen, S., Hollingshead, A., & Banks, S. (2017). Designing culturally responsive and relevant individualized educational programs. Intervention in School and Clinic , 53 (2), 114–119. https://doi.org/10.1177/1053451217693364 .

*Benhamdi, S., Babouri, A., & Chiky, R. (2017). Personalized recommender system for e-Learning environment. Education and Information Technologies , 22 (4), 1455–1477. https://doi.org/10.1007/s10639-016-9504-y .

Brusilovsky, P., & Peylo, C. (2003). Adaptive and intelligent web-based educational systems adaptive and intelligent technologies for web-based educational systems. International Journal of Artificial Intelligence in Education , 13 , 156–169.

*Çakıroğlu, Ü. (2014). Analyzing the effect of learning styles and study habits of distance learners on learning performances: a case of an introductory programming course. International Review of Research in Open and Distance Learning , 15 (4), 161–185.

Cavanagh, S. (2014). What is “personalized learning”? Educators seek clarity. Education Week , 34 (9), S2–S4.

Chatti, M. A. (2010). Personalization in technology enhanced learning: a social software perspective . Aachen: Shaker Verlag.

*Chatti, M. A., Jarke, M., & Specht, M. (2010). The 3P learning model. Educational Technology & Society , 13 (4), 74–85.

Chen, C. M., Lee, H. M., & Chen, Y. H. (2005). Personalized e-learning system using item response theory. Computers & Education , 44 (3), 237–255.

Chen, C. M., Liu, C. Y., & Chang, M. H. (2006). Personalized curriculum sequencing utilizing modified item response theory for web-based instruction. Expert Systems with Applications , 30 (2), 378–396.

Chen, S. Y., Huang, P. R., Shih, Y. C., & Chang, L. P. (2016). Investigation of multiple human factors in personalized learning. Interactive Learning Environments , 24 (1), 119–141.

Cheung, W. S., & Hew, K. F. (2009). A review of research methodologies used in studies on mobile handheld devices in K-12 and higher education settings. Australasian Journal of Educational Technology , 25 (2), 153–183.

*Chou, C. Y., Lai, K. R., Chao, P. Y., Lan, C. H., & Chen, T. H. (2015). Negotiation based adaptive learning sequences: combining adaptivity and adaptability. Computers & Education , 88 , 215–226. https://doi.org/10.1016/j.compedu.2015.05.007 .

Dawson, S., Heathcote, L., & Poole, G. (2010). Harnessing ICT potential: the adoption and analysis of ICT systems for enhancing the student learning experience. International Journal of Educational Management , 24 (2), 116–128.

Dewey, J. (1915). The school and society . Chicago: Chicago Press.

Dewey, J. (1998). Experience and education (60th anniversary edn.) . West Lafayette: Kappa Delta.

*Dwivedi, P., & Bharadwaj, K. K. (2013). Effective trust-aware E-learning recommender system based on learning styles and knowledge levels. Educational Technology & Society , 16 (4), 201–216.

*Ennouamani, S., Mahani, Z., & Akharraz, L. (2020). A context-aware mobile learning system for adapting learning content and format of presentation: design, validation, and evaluation. Education and Information Technologies . https://doi.org/10.1007/s10639-020-10149-9 .

*Erümit, A. K., & Çetin, İ. (2020). Design framework of adaptive intelligent tutoring systems. Education and Information Technologies . https://doi.org/10.1007/s10639-020-10182-8 .

*Fatahi, S. (2019). An experimental study on an adaptive e-learning environment based on learner’s personality and emotion. Education and Information Technologies , 24 (4), 2225–2241. https://doi.org/10.1007/s10639-019-09868-5 .

*FitzGerald, E., Kucirkova, N., Jones, A., Cross, S., Ferguson, R., Herodotou, C., … Scanlon, E. (2018). Dimensions of personalisation in technology-enhanced learning: A framework and implications for design. British Journal of Educational Technology , 49 (1), 165–181. https://doi.org/10.1111/bjet.12534 .

*Flores, R., Ari, F., Inan, F. A., & Arslan-Ari, I. (2012). The impact of adapting content for students with individual differences. Educational Technology & Society , 15 (3), 251–261.

*Foshee, C. M., Elliott, S. N., & Atkinson, R. K. (2016). Technology-enhanced learning in college mathematics remediation. British Journal of Educational Technology , 47 (5), 893–905. https://doi.org/10.1111/bjet.12285 .

*Garcia-Cabot, A., De-Marcos, L., & Garcia-Lopez, E. (2015). An empirical study on m-learning adaptation: Learning performance and learning contexts. Computers & Education , 82 , 450–459. https://doi.org/10.1016/j.compedu.2014.12.007 .

*George, G., & Lal, A. M. (2019). Review of ontology-based recommender systems in e-learning. Computers & Education , 142 , 103642. https://doi.org/10.1016/j.compedu.2019.103642 .

Göbel, S., Wendel, V., Ritter, C., & Steinmetz, R. (2010). Personalized, adaptive digital educational games using narrative game-based learning objects. In International conference on technologies for E-learning and digital entertainment , (pp. 438–445). Berlin. Springer.

Gómez, S., Zervas, P., Sampson, D. G., & Fabregat, R. (2014). Context-aware adaptive and personalized mobile learning delivery supported by UoLmP. Journal of King Saud University - Computer and Information Sciences , 26 (1), 47–61. https://doi.org/10.1016/j.jksuci.2013.10.008 .

*Griff, E. R., & Matter, S. F. (2013). Evaluation of an adaptive online learning system. British Journal of Educational Technology , 44 (1), 170–176. https://doi.org/10.1111/j.1467-8535.2012.01300.x .

*Hooshyar, D., Ahmad, R. B., Yousefi, M., Yusop, F. D., & Horng, S. J. (2015). A flowchart-based intelligent tutoring system for improving problem-solving skills of novice programmers. Journal of Computer Assisted Learning , 31 (4), 345–361. https://doi.org/10.1111/jcal.12099 .

*Hsieh, C. W., & Chen, S. Y. (2016). A cognitive style perspective to handheld devices: customization vs. personalization. International Review of Research in Open and Distance Learning , 17 (1), 1–22. https://doi.org/10.19173/irrodl.v17i1.2168 .

*Huang, S. L., & Shiu, J. H. (2012). A user-centric adaptive learning system for e-learning 2.0. Educational Technology & Society , 15 (3), 214–225.

Huang, Y. M., Liang, T. H., Su, Y. N., & Chen, N. S. (2012). Empowering personalized learning with an interactive e-book learning system for elementary school students. Educational Technology Research and Development , 60 (4), 703–722. https://doi.org/10.1007/s11423-012-9237-6 .

*Hwang, G. J., Sung, H. Y., Hung, C. M., & Huang, I. (2013). A learning style perspective to investigate the necessity of developing adaptive learning systems. Educational Technology & Society , 16 (2), 188–197.

*Jung, E., Kim, D., Yoon, M., Park, S., & Oakley, B. (2019). The influence of instructional design on learner control, sense of achievement, and perceived effectiveness in a supersize MOOC course. Computers & Education , 128 (October 2018), 377–388 https://doi.org/10.1016/j.compedu.2018.10.001 .

*Junokas, M. J., Lindgren, R., Kang, J., & Morphew, J. W. (2018). Enhancing multimodal learning through personalized gesture recognition. Journal of Computer Assisted Learning , 34 (4), 350–357. https://doi.org/10.1111/jcal.12262 .

*Kinshuk (2012). Guest editorial: personalized learning. Educational Technology Research and Development , 60 (4), 561–562. https://doi.org/10.1007/s11423-012-9248-3 .

*Kinshuk, Huang, H.-W., Sampson, D., & Chen, N.-S. (2013). Trends in educational technology through the lens of the highly cited articles published in the Journal of Educational Technology and Society. Educational Technology & Society , 16 (2), 3–20.

*Klašnja-Milićević, A., Vesin, B., Ivanović, M., & Budimac, Z. (2011). E-Learning personalization based on hybrid recommendation strategy and learning style identification. Computers & Education , 56 (3), 885–899. https://doi.org/10.1016/j.compedu.2010.11.001 .

*Ko, C. C., Chiang, C. H., Lin, Y. L., & Chen, M. C. (2011). An individualized e-Reading system developed based on multirepresentations approach. Educational Technology & Society , 14 (4), 88–98.

*Latham, A., Crockett, K., & McLean, D. (2014). An adaptation algorithm for an intelligent natural language tutoring system. Computers & Education , 71 , 97–110. https://doi.org/10.1016/j.compedu.2013.09.014 .

*Lee, D., Huh, Y., Lin, C. Y., & Reigeluth, C. M. (2018). Technology functions for personalized learning in learner-centered schools. Educational Technology Research and Development , 66 (5), 1269–1302. https://doi.org/10.1007/s11423-018-9615-9 .

*Lin, C. F., Yeh, Y. C., Hung, Y. H., & Chang, R. I. (2013). Data mining for providing a personalized learning path in creativity: an application of decision trees. Computers & Education , 68 , 199–210. https://doi.org/10.1016/j.compedu.2013.05.009 .

*Liu, M., McKelroy, E., Corliss, S. B., & Carrigan, J. (2017). Investigating the effect of an adaptive learning intervention on students’ learning. Educational Technology Research and Development , 65 (6), 1605–1625. https://doi.org/10.1007/s11423-017-9542-1 .

*Liu, M.-T., & Yu, P.-T. (2011). Aberrant learning achievement detection based on person-fit statistics in personalized e-learning systems. Journal of Educational Technology & Society , 14 (1), 107–120. https://doi.org/10.2307/jeductechsoci.14.1.107 .

*Lo, J. J., Chan, Y. C., & Yeh, S. W. (2012). Designing an adaptive web-based learning system based on students’ cognitive styles identified online. Computers & Education , 58 (1), 209–222. https://doi.org/10.1016/j.compedu.2011.08.018 .

Lockspeiser, T. M., & Kaul, P. (2016). Using individualized learning plans to facilitate learner-centered teaching. Journal of Pediatric and Adolescent Gynecology , 29 (3), 214–217. https://doi.org/10.1016/j.jpag.2015.10.020 .

*Lu, C., Chang, M., Kinshuk, Huang, E., & Chen, C. W. (2014). Context-aware mobile role-playing game for learning. Lecture Notes in Educational Technology , 17 (9783642382901), 131–146. https://doi.org/10.1007/978-3-642-38291-8_8 .

McCombs, B. L., & Whisler, J. S. (1997). The learner-centered classroom and school: Strategies for increasing student motivation and achievement , (1st ed.). San Francisco: Jossey-Bass.

Miliband, D. (2006). Choice and voice in personalised learning. In OECD (Ed.), Schooling for tomorrow: personalising education , (pp. 21–30). Paris: OECD Publishing.

*Nabizadeh, A. H., Gonçalves, D., Gama, S., Jorge, J., & Rafsanjani, H. N. (2020). Adaptive learning path recommender approach using auxiliary learning objects. Computers & Education , 147 (November 2019), 103777 https://doi.org/10.1016/j.compedu.2019.103777 .

*Nedungadi, P., & Raman, R. (2012). A new approach to personalization: Integrating e-learning and m-learning. Educational Technology Research and Development , 60 (4), 659–678. https://doi.org/10.1007/s11423-012-9250-9 .

*Niknam, M., & Thulasiraman, P. (2020). LPR: a bio-inspired intelligent learning path recommendation system based on meaningful learning theory. Education and Information Technologies . https://doi.org/10.1007/s10639-020-10133-3 .

Okoli, C. (2015). A guide to conducting a standalone systematic literature review. Communications of the Association for Information Systems , 37 (1), 879–910. https://doi.org/10.17705/1cais.03743 .

Organisation for Economic Co-operation and Development (OECD). (2006). Are students ready for a technology-rich world? What PISA studies tell us. Retrieved from http://www.oecd.org

Ormrod, J. E. (2011). Human learning . London: Pearson Higher.

Papamitsiou, Z., & Economides, A. A. (2014). Learning analytics and educational data mining in practice: a systemic literature review of empirical evidence. Educational Technology & Society , 17 (4), 49–64.

Peng, H., Ma, S., & Spector, J. M. (2019). Personalized adaptive learning: an emerging pedagogical approach enabled by a smart learning environment. Lecture Notes in Educational Technology , 171–176. https://doi.org/10.1007/978-981-13-6908-7_24 .

*Pliakos, K., Joo, S. H., Park, J. Y., Cornillie, F., Vens, C., & Van den Noortgate, W. (2019). Integrating machine learning into item response theory for addressing the cold start problem in adaptive learning systems. Computers & Education , 137 (April), 91–103. https://doi.org/10.1016/j.compedu.2019.04.009 .

*Pontual Falcão, T., e Peres, F. M. A., Sales de Morais, D. C., & da Silva Oliveira, G. (2018). Participatory methodologies to promote student engagement in the development of educational digital games. Computers & Education , 116 , 161–175. https://doi.org/10.1016/j.compedu.2017.09.006 .

*Ramos De Melo, F., Flôres, E. L., Diniz De Carvalho, S., Gonçalves De Teixeira, R. A., Batista Loja, L. F., & De Sousa Gomide, R. (2014). Computational organization of didactic contents for personalized virtual learning environments. Computers & Education , 79 , 126–137. https://doi.org/10.1016/j.compedu.2014.07.012 .

*Rastegarmoghadam, M., & Ziarati, K. (2017). Improved modeling of intelligent tutoring systems using ant colony optimization. Education and Information Technologies , 22 (3), 1067–1087. https://doi.org/10.1007/s10639-016-9472-2 .

Reisman, S. (2014). The future of online instruction, Part 2. Computer , 47 (6), 82–84. https://doi.org/10.1109/MC.2014.168 .

Sampson, D., Karagiannidis, C., & Kinshuk (2002). Personalised learning: educational, technological and standardisation perspective. Interactive Educational Multimedia: IEM , 4 (4), 24–39.

*Scheiter, K., Schubert, C., Schüler, A., Schmidt, H., Zimmermann, G., Wassermann, B., … Eder, T. (2019). Adaptive multimedia: using gaze-contingent instructional guidance to provide personalized processing support. Computers & Education , 139 , 31–47. https://doi.org/10.1016/j.compedu.2019.05.005 .

*Schmid, R., & Petko, D. (2019). Does the use of educational technology in personalized learning environments correlate with self-reported digital skills and beliefs of secondary-school students? Computers & Education , 136 (March), 75–86. https://doi.org/10.1016/j.compedu.2019.03.006 .

Sharples, M. (2000). The design of personal mobile technologies for lifelong learning. Computers & Education , 34 (3–4), 177–193. https://doi.org/10.1016/s0360-1315(99)00044-5 .

*Shute, V. J., & Rahimi, S. (2017). Review of computer-based assessment for learning in elementary and secondary education. Journal of Computer Assisted Learning , 33 (1), 1–19. https://doi.org/10.1111/jcal.12172 .

*Somyürek, S. (2015). The new trends in adaptive educational hypermedia systems. International Review of Research in Open and Distance Learning , 16 (1), 221–241. https://doi.org/10.19173/irrodl.v16i1.1946 .

*Spector, J. M. (2013). Emerging educational technologies and research directions. Educational Technology & Society , 16 (2), 21–30.

Spector, J. M. (2014). Conceptualizing the emerging field of smart learning environments. Smart Learning Environments , 1 , 2. https://doi.org/10.1186/s40561-014-0002-7 .

Spector, J. M. (2015). Foundations of educational technology: integrative approaches and interdisciplinary perspectives , (2nd ed.). New York: Routledge.

Spector, J. M. (2018). The potential of smart technologies for learning and instruction. International Journal of Smart Technology & Learning , 1 (1), 21–32. https://doi.org/10.1504/IJSMARTTL.2016.078163 .

Sungkur, R. K., Antoaroo, M. A., & Beeharry, A. (2016). Eye tracking system for enhanced learning experiences. Education and Information Technologies , 21 (6), 1785–1806. https://doi.org/10.1007/s10639-015-9418-0 .

*Tomberg, V., Laanpere, M., Ley, T., & Normak, P. (2013). Sustaining teacher control in a blog-based personal learning environment. International Review of Research in Open and Distance Learning , 14 (3), 109–133 https://doi.org/10.19173/irrodl.v14i3.1397 .

*Troussas, C., Krouska, A., & Sgouropoulou, C. (2020). Collaboration and fuzzy-modeled personalization for mobile game-based learning in higher education. Computers & Education , 144 (September 2019), 103698 https://doi.org/10.1016/j.compedu.2019.103698 .

Truong, H. M. (2016). Integrating learning styles and adaptive e-learning system: current developments, problems, and opportunities. Computers in Human Behavior , 55 , 1185–1193 https://doi.org/10.1016/j.chb.2015.02.014 .

Tseng, S.-S., Su, J.-M., Hwang, G.-J., Hwang, G.-H., Tsai, C.-C., & Tsai, C.-J. (2008). An object- oriented course framework for developing adaptive learning systems. Educational Technology & Society , 11 (2), 171–191.

U.S. Department of Education (2010). Transforming American education: Learning powered by technology . Washington, DC: Office of Educational Technology.

U.S. Department of Education, Office of Educational Technology (2017 ). Reimagining the role of technology in education: 2017 national education technology plan update. Available at: https://tech.ed.gov/files/2017/01/NETP17.pdf .

*Wang, Y. H., & Liao, H. C. (2011). Adaptive learning for ESL based on computation. British Journal of Educational Technology , 42 (1), 66–87. https://doi.org/10.1111/j.1467-8535.2009.00981.x .

*Xie, H., Chu, H. C., Hwang, G. J., & Wang, C. C. (2019). Trends and development in technology-enhanced adaptive/personalized learning: a systematic review of journal publications from 2007 to 2017. Computers & Education , 140 (June), 103599 https://doi.org/10.1016/j.compedu.2019.103599 .

Yang, T. C., Hwang, G. J., & Yang, S. J. H. (2013). Development of an adaptive learning system with multiple perspectives based on students’ learning styles and cognitive styles. Educational Technology & Society , 16 (4), 185–200.

*Zou, D., & Xie, H. (2018). Personalized word-learning based on technique feature analysis and learning analytics. Educational Technology & Society , 21 (2), 233–244.

Acknowledgements

Author information

Authors and Affiliations

University of North Texas, Denton, USA

Atikah Shemshack & Jonathan Michael Spector

Contributions

Each author contributed evenly to this paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Atikah Shemshack .

Ethics declarations

Competing interests

The authors have no conflict of interest to declare.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Shemshack, A., Spector, J.M. A systematic literature review of personalized learning terms. Smart Learn. Environ. 7 , 33 (2020). https://doi.org/10.1186/s40561-020-00140-9

Download citation

Received : 05 August 2020

Accepted : 13 September 2020

Published : 23 October 2020

DOI : https://doi.org/10.1186/s40561-020-00140-9

Keywords

  • Personalized learning
  • Intelligent tutoring systems
  • Learning analytics
  • Personalized adaptive learning
  • Systematic review

Psychology Research and Behavior Management

A Systematic Review of Systematic Reviews on Blended Learning: Trends, Gaps and Future Directions

Muhammad Azeem Ashraf

1 Research Institute of Education Science, Hunan University, Changsha, People’s Republic of China

Meijia Yang

Yufeng Zhang

Mouna Denden

2 Research Laboratory of Technologies of Information and Communication & Electrical Engineering (LaTICE), Tunis Higher School of Engineering (ENSIT), Tunis, Tunisia

Ahmed Tlili

3 Smart Learning Institute, Beijing Normal University, Beijing, People’s Republic of China

4 School of Professional Studies, Columbia University, New York City, NY, USA

Ronghuai Huang

Daniel Burgos

5 Research Institute for Innovation & Technology in Education (UNIR iTED), Universidad Internacional de La Rioja (UNIR), Logroño, 26006, Spain

Blended Learning (BL) is one of the most used methods in education to promote active learning and enhance students’ learning outcomes. Although BL has existed for over a decade, there are still several challenges associated with it. For instance, teachers’ and students’ individual differences, such as their behaviors and attitudes, might affect their adoption of BL. These challenges were further exacerbated by the COVID-19 pandemic, as schools and universities had to combine online and offline courses to comply with health regulations. This study conducts a systematic review of systematic reviews on BL, based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, to identify BL trends, gaps and future directions. The obtained findings highlight that BL was mostly investigated in higher education and targeted students in the first place. Additionally, most BL research comes from developed countries, calling for cross-collaborations to facilitate BL adoption, particularly in developing countries. Furthermore, a lack of ICT skills and infrastructure are the challenges most frequently encountered by teachers, students and institutions. The findings of this study can create a roadmap that facilitates the design and adoption of BL, one of the possible solutions for facing major health challenges, such as the COVID-19 pandemic.

Introduction

Blended Learning (BL) is one of the most frequently used approaches related to the application of Information and Communications Technology (ICT) in education. 1 In its simplest definition, BL aims to combine face-to-face (F2F) and online settings, resulting in better learning engagement and flexible learning experiences, with rich settings that go well beyond the use of a simple online content repository to support face-to-face classes. 2 , 3 Researchers and practitioners have used different terms to refer to the blended learning approach, including “brick and click” instruction, 4 hybrid learning, 4 dual-mode instruction, 5 blended pedagogies, 4 HyFlex learning, 6 targeted learning, 4 multimodal learning and flipped learning. 3

Researchers and practitioners have pointed out that designing BL experiences can be complex, as several features need to be considered, including the quality of learning experiences, learning instruction, learning technologies/tools and applied pedagogies. 7–9 Therefore, they have focused on investigating different BL perspectives since 2000. 10 Despite this 21-year investigation and research, there are still several challenges and unanswered questions related to BL, including the quality of the designed learning materials, 9 , 11 , 12 the applied learning instructions, 9 the culture of resisting this approach, 13 , 14 and teachers being overloaded when applying BL. 15 The COVID-19 pandemic has further highlighted the challenges associated with BL. Specifically, universities and schools worldwide had to take several actions to comply with health regulations, such as reducing classroom sizes. 16 Therefore, they combined online and offline learning to maintain their courses for both on-campus and off-campus experiences. 16 For instance, in response to the effort made by the government of Indonesia to enforce physical distancing during the COVID-19 pandemic in all domains, including education, some elementary schools used BL with the Moodle platform to ensure the continuity of learning. 17 In this context, several teachers raised concerns about implementing BL experiences, such as the lack of infrastructure and competencies to do so, calling for further investigation in this regard. Several international organizations, such as UNESCO and the ILO, have claimed that teacher professional development for online and blended learning is one of the priorities for building resilient education systems for the future. 18

Based on the background above, there is clearly still room for discussion on how to design and implement effective BL. Researchers have suggested that conducting literature reviews can help identify challenges and solutions in a given domain. 19–21 Review papers may serve the development of new theories and shape future research studies, as well as disseminate knowledge to promote scientific discussion and reflection about concepts, methods and practices. However, the BL systematic reviews conducted in the literature are of variable quality, focus and geographical region. This has made the BL literature fragmented, with no study providing a comprehensive summary that could serve as a reference for different stakeholders to adopt BL. In this context, Smith et al mentioned that a logical and appropriate next step is to conduct a systematic review of reviews of the topic under consideration, allowing the findings of separate reviews to be compared and contrasted, thereby providing comprehensive and in-depth findings for different stakeholders. 22 As BL is becoming the new normal, 23 this study takes a step beyond simply conducting a systematic review and conducts a systematic review of systematic reviews on BL. By systematically examining high-quality published literature review articles, this study reveals the reported BL trends and challenges, as well as research gaps and future paths. These findings could help different stakeholders (eg, policy makers, teachers, instructional designers, etc.) to facilitate the design and adoption of BL worldwide. Although systematic reviews of literature reviews have been conducted in different fields, such as engineering, 24 healthcare 25 and tourism, 26 to the best of our knowledge none has been conducted on blended learning. It should be noted that one study was conducted in this context, but it mainly focused on the transparency of the systematic reviews that were conducted 27 and was not about the BL field itself.

Guided by the technology-based learning model (see Figure 1 ), this study aims to answer the following six research questions:

Figure 1. Blended learning model.

RQ1. What are the trends of blended learning research in terms of: publication year, geographic region and publication venue?

RQ2. What are the covered subject areas in blended learning research?

RQ3. Who are the covered participants (stakeholders) in blended learning research?

RQ4. What are the most frequently used research methods (in systematic reviews) in blended learning research?

RQ5. How was blended learning designed in terms of the learning models and technologies used?

RQ6. What are the learning outcomes of blended learning, as well as the associated challenges?

The findings of this study could help to analyze the behaviors and attitudes of different stakeholders from different BL contexts, and hence draw a comprehensive understanding of BL and its impact from different international perspectives. This can promote cross-country collaboration and a more open BL design in which more universities worldwide could be involved. The findings could also facilitate the design (eg, in terms of the learning models and technologies used) and adoption of BL, which is one of the possible solutions to face major health challenges, such as the COVID-19 pandemic.

Methodology

This study presents a systematic review of systematic review papers on BL. In particular, this review follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. 28 PRISMA provides a standard peer-accepted methodology that uses a guideline checklist, which was strictly followed for this study, to contribute to the quality assurance of the revision process and to ensure its replicability. A review protocol was developed, describing the search strategy and article selection criteria, quality assessment, data extraction and data analysis procedures.

Search Strategy and Selection Criteria

To address this topic, an extensive search for research articles was undertaken in the most common and highly valued electronic databases: Web of Science, Scopus and Google Scholar, 29 using the following search strings.

Search string: ((blended learning substring) AND (literature review substring))

Blended learning substring: “Blended learning” OR “blended education” OR “hybrid learning” OR “flipped classroom” OR “flipped learning” OR “inverted classroom” OR “mixed-mode instruction” OR “HyFlex learning”

Literature review substring: “Review” OR “Systematic review” OR “state-of-art” OR “state of the art” OR “state of art” OR “meta-analysis” OR “meta analytic study” OR “mapping stud*” OR “overview”
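As an illustration of how these two substrings combine into the full boolean query, the sketch below assembles the search string programmatically. The exact query syntax accepted by each database (Web of Science, Scopus, Google Scholar) differs, so this generic form is an assumption for illustration only.

```python
# Terms taken from the blended learning and literature review substrings above
BLENDED = ["Blended learning", "blended education", "hybrid learning",
           "flipped classroom", "flipped learning", "inverted classroom",
           "mixed-mode instruction", "HyFlex learning"]
REVIEW = ["Review", "Systematic review", "state-of-art", "state of the art",
          "state of art", "meta-analysis", "meta analytic study",
          "mapping stud*", "overview"]

def or_group(terms):
    # Quote each term and join with OR, wrapped in parentheses
    return "(" + " OR ".join('"{}"'.format(t) for t in terms) + ")"

query = "({} AND {})".format(or_group(BLENDED), or_group(REVIEW))
print(query)
```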

Databases were searched separately by two of the authors. After searching the relevant databases, the two authors independently analyzed the retrieved papers by titles and abstracts, and papers that clearly were not systematic reviews, such as empirical, descriptive and conceptual papers, were excluded. Then, the two authors independently performed an eligibility assessment by carefully screening the full texts of the remaining papers, based on the inclusion and exclusion criteria described in Table 1 . During this phase, disagreements between the authors were resolved by discussion or arbitration from a third author. Specifically, to provide high-quality papers, this study was restricted to papers published in journals.

Table 1. Inclusion and Exclusion Criteria

This search yielded a total of 972 articles. After removing duplicated papers, 816 papers remained. 672 papers were then removed based on the screening of titles and abstracts. The remaining 144 papers were assessed as full texts. 85 of these papers did not pass the inclusion criteria. Thus, a total of 57 eligible research studies remained for inclusion in the systematic review. Figure 2 presents the study selection process as recommended by the PRISMA group. 28

Figure 2. Flowchart of the systematic review process.

Quality Assessment

For methodological quality evaluation, the AMSTAR assessment tool was used. AMSTAR is widely used as a valuable tool to evaluate the quality of systematic reviews conducted in any academic field. 30 It consists of 11 items that evaluate whether the review was guided by a protocol, whether there was duplicate study selection and data extraction, the comprehensiveness of the search, the inclusion of grey literature, the use of quality assessment, the appropriateness of data synthesis and the documentation of conflicts of interest. Specifically, two authors independently assessed the methodological quality of the included reviews using the AMSTAR checklist. Items were evaluated as “Yes” (meaning the item has been properly handled, 1 point), “No” (indicating the possibility that the item did not perform well, 0 points) or “Not applicable” (in the case of performance failure because the item was not applied, 0 points). Disagreements regarding the AMSTAR score were resolved by discussion or by a decision made by a third author.

Appendix 1 presents the results of the quality assessment of the 57 systematic reviews based on the AMSTAR tool. 19 were rated as being low quality (AMSTAR score 0–4), 30 as being moderate quality (score 5–8), and eight as being high quality (score 9–11). Specifically, no study has acknowledged the conflict of interest in both the systematic review and the included studies. Also, few studies provided the list of the included and excluded studies (3 out of 57), and reported the method used to combine the findings of the studies (13 out of 57). About half of the included studies assessed the scientific quality of the included studies (25 out of 57), but all the studies fulfilled at least one quality criterion.
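For clarity, the AMSTAR scoring rule described above can be expressed as a short sketch: each of the 11 checklist items scores 1 point for “yes” and 0 points otherwise, and the total is banded into low, moderate or high quality. The checklist answers in the example are invented for illustration.

```python
def amstar_score(answers):
    """Sum one point per properly handled ('yes') item; 'no' and
    'not applicable' both score zero, as described above."""
    assert len(answers) == 11, "AMSTAR has 11 checklist items"
    return sum(1 for a in answers if a.lower() == "yes")

def quality_band(score):
    if score <= 4:
        return "low"       # AMSTAR score 0-4
    if score <= 8:
        return "moderate"  # score 5-8
    return "high"          # score 9-11

# Hypothetical answers for one review
answers = ["yes", "yes", "no", "yes", "not applicable",
           "yes", "no", "yes", "yes", "no", "no"]
score = amstar_score(answers)
print(score, quality_band(score))  # -> 6 moderate
```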

Data Extraction

This study adapted the technology-based learning model, 31 which has been used in BL contexts, 32 , 33 as shown in Figure 1 . This model is based on six factors: subject area, learning models, participants, outcomes and issues, research methods and adopted technologies. The current study adopted most of the schemes from this model but made slight adjustments according to the features of different models in blended learning. Table 2 presents a detailed description of the coding scheme that was used in this study to answer the aforementioned research questions.

Table 2. The Coding Scheme for Analyzing the Collected Papers
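As a rough illustration of such a coding scheme, the sketch below models one per-paper record using the six factors named above. The field names and example values are assumptions for illustration, not the authors’ actual instrument.

```python
from dataclasses import dataclass, field

@dataclass
class CodedReview:
    """One coded record per included paper, covering the six factors of
    the technology-based learning model described above."""
    subject_area: str            # e.g., "health", "STEM", "uncategorized"
    learning_models: list        # e.g., ["flipped", "flex"]
    participants: str            # e.g., "students", "teachers"
    research_method: str         # e.g., "systematic review", "meta-analysis"
    technologies: list = field(default_factory=list)
    outcomes_and_issues: list = field(default_factory=list)

# Hypothetical coding of a single review
example = CodedReview(
    subject_area="health",
    learning_models=["flipped"],
    participants="students",
    research_method="systematic review",
    technologies=["LMS", "instructional videos"],
    outcomes_and_issues=["improved knowledge acquisition"],
)
print(example.subject_area, example.learning_models)
```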

Results and Discussion

Blended Learning Trends

Figure 3 shows that the first two systematic reviews on BL were conducted in 2012. The first, by Keengwe and Kang, 34 investigated the effectiveness of BL practices in the teacher education field. The second was by Rowe et al, 35 which investigated how to incorporate BL in clinical settings and health education. These findings show an early interest in providing teachers with the necessary competencies and skills to use BL, as well as in enhancing health education, where students need more practical knowledge and skills that could be facilitated through BL (eg, simulation health videos, virtual labs, etc.). The number of literature reviews conducted has since increased, showing an increased interest in BL over the years. Specifically, the highest peak of literature reviews conducted on blended learning was in 2020 (16 studies). This might be due to the COVID-19 pandemic, which forced most institutions worldwide to implement BL (online merged with offline) to accommodate the needs of learners in this disruptive time. 18 This has encouraged many institutions to make their own attempts to practice BL and thus furthered the research interest in examining the best practices of BL.

Figure 3. Distribution of studies by publication year.

Additionally, according to the authors’ affiliation countries (see Figure 4 ), China and the United States have the highest number of publications, with nine and seven studies respectively. This could be explained by the continuous rapid evolution of the technological education industry in both China and the United States, 36 which has made researchers and educators innovate to provide more flexible learning experiences by combining both online and offline environments. 37 This could also be explained by the number of blended learning policies that have been issued in these two countries to facilitate blended learning adoption. 38 , 39

Figure 4. Distribution of studies per country.

Interestingly, while several studies are from Europe (eg, Belgium, the UK, Italy, etc.), there are very few studies from the African and Arab regions. Similarly, in BL contexts, Birgili et al 40 conducted a systematic review about flipped learning between 2012 and 2018; they found very few studies coming from Africa. This indicates a trend whereby countries with more sufficient educational resources and infrastructure have more chances to develop BL environments and experiences. These findings call for more cross-country collaboration to facilitate the implementation of BL in countries that have limited knowledge or infrastructure related to BL. For instance, such collaboration could cover BL policies, ICT training and the development of educational resources and technologies.

Finally, the 57 reviews were published in 44 journals. Figure 5 shows the journals that have at least two publications. Education and Information Technologies has the highest number of publications (six studies), followed by Interactive Learning Environments (four studies) and Nurse Education Today (four studies). These journals are mostly from the educational technology and health fields.

Figure 5. Distribution of studies by publication venue.

Subject Area

Figure 6 shows that most of the literature review studies (n = 21) did not mention the covered subject area and discussed BL in general. For example, Wang et al proposed a complex adaptive systems framework for analyzing the BL literature. 41 This shows that, despite the popularity of BL, which has existed for a decade, educators and researchers still find it a complex concept that needs further investigation regardless of the subject. 2

Figure 6. Distribution of studies by subject area.

Other studies considered BL as being context-dependent, 42 investigating it from different subject areas, namely health (14 studies), STEM (five studies) and language (three studies). This could be explained by these three subjects requiring a lot of practical knowledge, such as communication and pronunciation, programming or physical treatments, where the BL concept could provide teachers with a chance to be more innovative and offer students the possibility of practicing this practical knowledge online by using virtual labs or online virtual programming emulators, for instance. Walker et al 43 and Yeonja et al 44 found that BL is considered to be crucial for health students, and health educators have tried to integrate a wide range of advanced technology and learning tools to enhance their skill acquisition.

From these findings, it can be deduced that more research should be conducted to investigate how BL is applied in other subject areas that are considered crucial for assessing student performance, such as mathematics. This could help researchers and practitioners compare the different BL design and assessment approaches in different subjects and come up with personalized guidelines that could help educators implement BL in a specific subject. In this context, studies have pointed out that teachers are willing to implement BL in their courses but do not know how. 45 Additionally, as shown in Figure 6 , most of the conducted literature reviews covered a limited number of studies (fewer than 50). Therefore, future literature reviews on BL should cover more studies (more than 50) to provide an in-depth and broad view of how BL is being implemented in different contexts by different researchers.

Participants

As Figure 7A shows, the participants most targeted by the review studies were students (n = 42), followed by teachers (n = 13) and then working adults, health professionals and researchers (one study each). This analysis shows that none of the review studies targeted major players in the adoption of BL, such as policy makers. Owston stated that policies on different levels (eg, institutions, faculties, technology use, data collection procedures, learning support, etc.) are crucial to advancing the adoption of BL for future education. 38 Therefore, to advance BL adoption worldwide, more reviews should investigate BL policies and their focus – including copyright, privacy and data protection, and others 46 , 47 – in order to develop a BL policy framework to which everyone could refer.

Figure 7. (A) Distribution by educational level. (B) Distribution by participants.

Figure 7B , on the other hand, shows that most of the review studies (n = 33) focused mainly on higher education, followed by K–12 (six studies) and teacher education (five studies). Interestingly, these findings are in line with two older studies conducted in 2012 (Halverson et al) 48 and 2013 (Drysdale et al), 49 which found that BL is mostly applied in higher education. These findings clearly show that, despite the long period of time since 2012, the research setting of BL application has not changed, which calls for more serious efforts and research on BL design in other contexts, such as K–12. Younger students in particular might lack the self-regulation skills that help older students adopt BL, 50 so more support should be provided accordingly. Additionally, as few studies focused on teacher education, more research should investigate how to harness the power of BL for teacher professional development. There are limited empirical findings on BL for teacher professional development, 34 , 51–53 calling for more investigation in this context.

Research Method

Table 3 shows that most of the reviews conducted were systematic reviews (n = 47). As researchers note, systematic literature reviews are usually built around a clearly defined objective, a research question, a research approach and inclusion and exclusion criteria. 54–56 Through a systematic review, researchers can reach a qualitative conclusion in response to their research question. Only seven reviews conducted a meta-analysis to assess the effect size and variability of BL and to identify potential causal factors that can inform practice and further research. Finally, three studies used both a systematic review and a meta-analysis, which can synthesize the results quantitatively in an even more comprehensive way. For instance, Liu et al 57 first reviewed the literature on the effectiveness of knowledge acquisition in health-subject learners and then conducted a meta-analysis to show that BL had a significantly larger pooled effect size than non-BL for health-subject learners. In this way, researchers are able to address the extent to which BL is truly effective for learning. 57 Considering that only three review papers conducted both a systematic review and a meta-analysis, we must again stress the usefulness of quantitative analysis. There are still many unanswered questions that could be better addressed using quantitative analysis. Therefore, future research should consider conducting more meta-analyses in order to provide a better understanding of the nuanced effects of BL.
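To illustrate the basic operation behind such meta-analyses, the sketch below pools standardized effect sizes using inverse-variance (fixed-effect) weighting, a standard technique; the effect sizes and variances are invented for illustration and this is not the computation from Liu et al.

```python
def pool_fixed_effect(effects, variances):
    """Inverse-variance (fixed-effect) pooling: each study is weighted
    by the reciprocal of its variance; the pooled variance is the
    reciprocal of the summed weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_var = 1.0 / sum(weights)
    return pooled, pooled_var

# Hypothetical per-study standardized mean differences and variances
effects = [0.40, 0.55, 0.30]
variances = [0.02, 0.05, 0.04]
pooled, var = pool_fixed_effect(effects, variances)
print(round(pooled, 3), round(var ** 0.5, 3))  # pooled SMD and its SE
```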

Table 3. Distribution of Studies by Research Method and Subject Area

Design (Learning Models and Technologies)

Figure 8 shows that the majority of review studies (33 out of 57) discussed BL as a generic concept and did not mention any specific model. Additionally, the flipped model was the most frequently implemented model, mentioned by 27 review studies. This model is designed based on three stages: pre-class, in-class and post-class (optional). In the pre-class stage, the students engage with the course content through online resources, so that they spend in-class time doing practical activities and having discussions. Then, in the post-class stage, teachers can assess the students’ perceptions and performance in the flipped course. 32

Figure 8. Frequency of usage of blended learning models.

The second most frequently used models were the station rotation model and the flex model (each mentioned by three studies). In the station rotation model, students rotate between different stations (on a fixed schedule or at the teacher's discretion), at least one of which is an online learning station. 58 For instance, students can rotate between face-to-face (F2F) instruction, online instruction and collaborative activities. The flex model, on the other hand, relies entirely on online materials and student self-study, with F2F teachers available when needed. 59

Two review studies mentioned the self-blend model (also known as the “à la carte” model) and the enriched virtual model. The first model allows students to take fully online courses with online teachers, in addition to other F2F courses. 60 In the second model, students complete most of their coursework online and meet instructors in required F2F sessions, but they are not required to attend F2F classes every day. 60

Finally, only one study each applied the mixed model, the supplemental model and the online practicing model. Specifically, in the mixed model, content delivery and practical activities occur both F2F and online. In the supplemental model, both content delivery and practical activities take place F2F, supplemented by online activities. In contrast, in the online practicing model, students practice activities through a specific online learning environment. The reported BL models were implemented differently across domains. It should also be noted that some studies investigated more than one BL model. For instance, Alammary investigated the flipped, mixed, flex, supplemental and online practicing models. 59

Table 4 presents the distribution of the reviewed studies by BL models and subject areas. Twenty-two studies (seven on multiple courses and 15 uncategorized) focused on the design of BL in general or in multiple courses. This might be explained by the fact that teachers have limited knowledge about BL models, which is why they often face challenges in designing their blended courses and mixing the offline and online settings. 58 This blended learning design problem was further emphasized during the COVID-19 pandemic, when several teachers raised concerns about this matter. 61 Therefore, more BL design training should be provided to help teachers efficiently design their blended courses.

Table 4. Distribution of studies by blended learning models and subject area.

Additionally, the flipped model was frequently used in health (seven studies), followed by STEM (five studies). This may be explained by health and STEM subjects requiring extensive hands-on practice to promote skill acquisition and long-term retention. 62 , 63 In line with this, the flipped model enables teachers to deliver the course content online (in the pre-class stage) and counterbalance the students' workload, so that class time can be reserved for practical exercises instead of traditional lectures. For instance, in the health domain, the flipped model is applied by explaining the basic concepts of the course in the pre-class stage using different learning strategies, such as online learning platforms, instructional videos, animations, PowerPoint presentations and 3D virtual gaming situations. Students can also use social media platforms such as Facebook for online discussions. In-class activities include instructor-led training, discussion of issues, practice and exercises (eg, assignments or quizzes), clinical teaching (eg, nursing diagnosis training) and lab teaching. In this context, several learning technologies were used, such as traditional computers and projectors, medical or teaching equipment, and simulation teaching aids. Finally, in the post-class stage, some teachers used assessment methods such as peer evaluation, post-class evaluation and surveys to evaluate students' perceptions of the applied model. Similarly, in STEM subjects, the in-class time was reserved for more practice, including complex exercises where students interact with each other and with the instructor (collaborative group assignments), active learning exercises rather than lectures, gaming activities, examinations and peer instruction.

Furthermore, as Table 4 shows, the mixed, flex, supplemental and online practicing models were also applied in STEM, specifically in programming courses. 59 This may be explained by the fact that STEM subjects – and programming courses in particular – allow for flexibility; combined with emerging technologies, this enables such courses to be taught in different ways, fully online or F2F. 64 For instance, in the mixed model, students received the course content and practical coding exercises in both F2F and online sessions, with most of the in-class time reserved for practical exercises and discussion. In this context, in addition to classical learning strategies such as online self-paced learning, online collaboration and online instructor-led learning, online programming tools were also used for coding and problem solving in online sessions. In the flex model, both course content and practical coding exercises take place online, but students are required to attend F2F sessions from time to time to check their progress or receive feedback. In the supplemental model, both course content and practical coding exercises take place F2F; however, online supplemental activities are added to the course to increase students' engagement with the course content. In the online practicing model, an online programming environment is used as the backbone of students' learning: it allows students to practice programming and problem solving and provides them with immediate feedback about their solutions. The delivery of the course content is achieved through lectures and/or self-paced online resources, which in some cases are integrated within the online programming environment.

Outcomes and Challenges

Figure 9 presents the different learning outcomes investigated in the 57 review studies, grouped into two categories: psychological and behavioural outcomes. 65 The majority of studies (49) investigated psychological outcomes. Specifically, students' self-regulation toward learning was the most investigated psychological outcome (10 studies), followed by satisfaction (nine studies) and engagement (eight studies). According to Van Laer and Elen, blended learning design includes attributes that support self-regulation, including authenticity, personalization, learner control, scaffolding and interaction. 66 The 10 studies found that students' self-regulation improved. Additionally, BL was found to improve students' satisfaction and engagement in different domains, especially in health (seven studies). For instance, Li et al 67 and Presti 68 found that flipped learning enhanced students' engagement and satisfaction in nursing education. Moreover, motivation, attitude, higher-order thinking, social interaction and self-efficacy were found to improve with BL.

Figure 9. Distribution of learning outcomes based on the number of studies addressing them.

The most investigated behavioural outcome is academic performance (26 studies), followed by skill progression and cooperation. In particular, the 26 studies showed that BL supports learning performance in different subject areas, including health, language and STEM. BL can also enhance students' skills, such as clinical skills in the health domain 35 , 69 and speaking skills in the language domain. 70 Additionally, its design may include several collaborative learning assignments (online or F2F) that encourage cooperation among students. 71 It should be noted that some studies investigated more than one type of learning outcome. For instance, Atmacasoy and Aksu investigated students' engagement, collaboration, participation and academic performance in a blended learning course. 72

Despite the many advantages that BL offers, it also comes with several challenges. Figure 10 presents the challenges most frequently encountered in the 57 review studies. Specifically, a lack of ICT skills is the most mentioned challenge (seven studies), followed by infrastructure issues (six studies), such as internet-related problems and a lack of personal computers; course preparation time (three studies); design model and cost of technologies (two studies each); and course content quality, student engagement and student isolation (one study each). It should be noted that 47 studies did not mention any challenges and nine studies mentioned more than one challenge. For instance, Rasheed et al found that students, teachers and institutions may face different challenges in BL, such as student isolation, a lack of ICT skills among teachers and students, and technological provision challenges (eg, the cost of online learning technologies) for institutions. 73

Figure 10. Distribution of blended learning challenges.

Both teachers and students from different domains might lack ICT skills, which can negatively influence their adoption of BL. For instance, Atmacasoy and Aksu stated that teachers with low ICT skills may not have positive attitudes toward using BL, since it is based on technology use. 72 Teachers might find some technologies difficult to use while creating a BL course, for example when recording videos, uploading videos or using online learning platforms. 73 Additionally, students may face technological complexity challenges, such as accessing online educational resources or uploading their materials to the online learning environment. 73

ICT infrastructure is also a crucial layer for facilitating and implementing blended courses; however, it remains a major problem for several universities, especially in developing countries 74 and rural areas. 75 For instance, a lack of basic technologies, including internet access, computers and projectors, can limit the implementation of blended courses. Therefore, it is very important to improve institutions' ICT infrastructure in order to improve education in general and to enable teachers to teach using BL, which has proven effective in many subject areas (see sections above).

In addition to issues with ICT skills and infrastructure, teachers may lack knowledge about designing BL models and hence face difficulties in selecting the appropriate design for their courses, 58 and they may also spend too much time preparing the blended course. 75 , 76 Moreover, some challenges of online learning, such as engagement and student isolation, may also arise in BL. In this context, teachers may integrate online collaborative assignments to address the problem of isolation 77 and adopt new approaches, such as gamification, in the online learning environment to keep students motivated and engaged while learning online. 78 , 79 Indeed, Ekici found that gamified flipped learning enhanced students' motivation and engagement while learning. 80

Conclusion

This study conducted a systematic review of systematic reviews on BL. It revealed several findings according to each research question: (1) the first two systematic reviews on BL were conducted in 2012, and the number rapidly increased over the years, reflecting massive interest in BL; additionally, more cross-country collaboration should be established to facilitate BL implementation in countries that lack, for instance, infrastructure or the needed BL competencies; (2) although several studies focused on a specific subject area such as health or STEM, most studies did not discuss BL from a specific subject-area perspective; (3) most of the studies targeted students as stakeholders and neglected major key players for BL adoption, such as policy makers; (4) most of the studies conducted a systematic review with qualitative analysis; therefore, future research should follow a more quantitative approach through meta-analysis in order to provide a better understanding of the nuanced effects of BL; (5) the majority of studies discussed BL as a generic construct and did not focus on specific BL learning models; however, among the papers that did focus on learning models, the flipped model was the most frequently implemented, especially in health and STEM; and (6) BL can affect students' psychological and behavioural outcomes. In terms of psychological outcomes, it can enhance students' self-regulation toward learning, satisfaction and engagement in different domains, especially in health. In terms of behavioural outcomes, BL supported students' academic performance in different subject areas. Additionally, a lack of ICT skills and infrastructure are the challenges most frequently encountered by teachers, students and institutions.

The findings of this study can help create a roadmap for future research on BL. This could facilitate BL adoption worldwide and thus contribute to achieving the UN Sustainable Development Goals (SDGs), especially SDG #4 – equitable, high-quality education for all – which works as a backbone for some other SDGs, such as good health (#3), economic growth (#8) and reduced inequality (#10). Despite the importance of the revealed findings, this study has several limitations that should be acknowledged. For instance, it used a limited number of search keywords within specific electronic databases.

Future research might focus on: (1) addressing these limitations; (2) investigating different BL models in specific application domains to test their impact on students' psychological and behavioural outcomes; (3) enhancing students' motivation and engagement in online sessions by integrating new motivational concepts, such as gamification, into online learning platforms; and (4) dealing with BL challenges by providing solutions that enhance the learning experience. For instance, to address the lack of ICT skills, research might provide ICT training for teachers and students to enhance their skills with technology.

Acknowledgments

The study was supported by the National Natural Science Foundation of China (The Research Fund for International Young Scientists; Grant No. 71950410624). However, any opinions, findings, and conclusions or recommendations expressed in this article are those of the authors and do not necessarily reflect the views of the National Natural Science Foundation of China.

The authors report no conflicts of interest in this work.

A review of machine learning-based methods for predicting drug–target interactions

  • Published: 12 April 2024
  • Volume 12 , article number  30 , ( 2024 )

Cite this article

  • Wen Shi 1 , 3   na1 ,
  • Hong Yang   ORCID: orcid.org/0000-0002-4328-335X 1   na1 ,
  • Linhai Xie 2 ,
  • Xiao-Xia Yin 1 &
  • Yanchun Zhang 3 , 4  

The prediction of drug–target interactions (DTI) is a crucial preliminary stage in drug discovery and development, given the substantial risk of failure and the prolonged validation period associated with in vitro and in vivo experiments. In the contemporary landscape, various machine learning-based methods have emerged as indispensable tools for DTI prediction. This paper begins by placing emphasis on the data representation employed by these methods, delineating five representations for drugs and four for proteins. The methods are then categorized into traditional machine learning-based approaches and deep learning-based ones, with a discussion of representative approaches in each category and the introduction of a novel taxonomy for deep neural network models in DTI prediction. Additionally, we present a synthesis of commonly used datasets and evaluation metrics to facilitate practical implementation. In conclusion, we address current challenges and outline potential future directions in this research field.
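As a concrete illustration of the learning setup sketched in this abstract, the toy baseline below represents a drug by a Morgan fingerprint computed with RDKit from its SMILES string, represents a protein by its amino-acid composition, concatenates the two, and trains a random forest to classify interaction. This is a minimal sketch under stated assumptions, not a method from the review itself; the drug-protein pairs and interaction labels are hypothetical placeholders.

```python
# Toy chemogenomic DTI baseline: drug fingerprint + protein composition
# -> binary interaction classifier. Real work would use curated datasets
# such as BindingDB or DrugBank; the pairs and labels here are invented.
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def drug_features(smiles, n_bits=1024):
    """Morgan (ECFP4-style) bit-vector fingerprint of a drug."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=n_bits)
    arr = np.zeros((n_bits,), dtype=np.int8)
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

def protein_features(sequence):
    """Amino-acid composition: fraction of each of the 20 residues."""
    sequence = sequence.upper()
    return np.array([sequence.count(a) / len(sequence) for a in AMINO_ACIDS])

def pair_features(smiles, sequence):
    return np.concatenate([drug_features(smiles), protein_features(sequence)])

# Hypothetical drug-protein pairs with interaction labels (1 = interacts)
pairs = [
    ("CC(=O)Oc1ccccc1C(=O)O",        "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", 1),
    ("CN1C=NC2=C1C(=O)N(C)C(=O)N2C", "MSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDI", 0),
    ("CC(C)Cc1ccc(cc1)C(C)C(=O)O",   "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", 1),
    ("C1=CC=C(C=C1)C(=O)O",          "MSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDI", 0),
]
X = np.stack([pair_features(s, p) for s, p, _ in pairs])
y = np.array([label for _, _, label in pairs])

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(clf.predict_proba(X)[:, 1])  # predicted interaction probabilities
```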



Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62376065), Natural Science Foundation of Guangdong (No. 2022A1515010102) and Joint Research Fund of Guangzhou and University (No. 2024A03J0323).

Author information

Wen Shi and Hong Yang have contributed equally to this work.

Authors and Affiliations

Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006, China

Wen Shi, Hong Yang & Xiao-Xia Yin

State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Beijing, 102206, China

School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004, China

Wen Shi & Yanchun Zhang

Department of New Networks, Peng Cheng Laboratory, Shenzhen, 518000, China

Yanchun Zhang


Corresponding authors

Correspondence to Linhai Xie or Yanchun Zhang .

Ethics declarations

Conflict of interest

The authors have no conflicts of interest relevant to the content of this article.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Shi, W., Yang, H., Xie, L. et al. A review of machine learning-based methods for predicting drug–target interactions. Health Inf Sci Syst 12 , 30 (2024). https://doi.org/10.1007/s13755-024-00287-6


Received : 11 September 2023

Accepted : 04 March 2024

Published : 12 April 2024

DOI : https://doi.org/10.1007/s13755-024-00287-6


  • Drug–target interactions
  • Machine learning
  • Data representation
  • Deep neural network models

  • Open access
  • Published: 26 March 2024

Predicting and improving complex beer flavor through machine learning

  • Michiel Schreurs   ORCID: orcid.org/0000-0002-9449-5619 1 , 2 , 3   na1 ,
  • Supinya Piampongsant 1 , 2 , 3   na1 ,
  • Miguel Roncoroni   ORCID: orcid.org/0000-0001-7461-1427 1 , 2 , 3   na1 ,
  • Lloyd Cool   ORCID: orcid.org/0000-0001-9936-3124 1 , 2 , 3 , 4 ,
  • Beatriz Herrera-Malaver   ORCID: orcid.org/0000-0002-5096-9974 1 , 2 , 3 ,
  • Christophe Vanderaa   ORCID: orcid.org/0000-0001-7443-5427 4 ,
  • Florian A. Theßeling 1 , 2 , 3 ,
  • Łukasz Kreft   ORCID: orcid.org/0000-0001-7620-4657 5 ,
  • Alexander Botzki   ORCID: orcid.org/0000-0001-6691-4233 5 ,
  • Philippe Malcorps 6 ,
  • Luk Daenen 6 ,
  • Tom Wenseleers   ORCID: orcid.org/0000-0002-1434-861X 4 &
  • Kevin J. Verstrepen   ORCID: orcid.org/0000-0002-3077-6219 1 , 2 , 3  

Nature Communications volume  15 , Article number:  2368 ( 2024 ) Cite this article

55k Accesses

862 Altmetric

Metrics details

  • Chemical engineering
  • Gas chromatography
  • Machine learning
  • Metabolomics
  • Taste receptors

The perception and appreciation of food flavor depends on many interacting chemical compounds and external factors, and therefore proves challenging to understand and predict. Here, we combine extensive chemical and sensory analyses of 250 different beers to train machine learning models that allow predicting flavor and consumer appreciation. For each beer, we measure over 200 chemical properties, perform quantitative descriptive sensory analysis with a trained tasting panel and map data from over 180,000 consumer reviews to train 10 different machine learning models. The best-performing algorithm, Gradient Boosting, yields models that significantly outperform predictions based on conventional statistics and accurately predict complex food features and consumer appreciation from chemical profiles. Model dissection allows identifying specific and unexpected compounds as drivers of beer flavor and appreciation. Adding these compounds results in variants of commercial alcoholic and non-alcoholic beers with improved consumer appreciation. Together, our study reveals how big data and machine learning uncover complex links between food chemistry, flavor and consumer perception, and lays the foundation to develop novel, tailored foods with superior flavors.


Introduction

Predicting and understanding food perception and appreciation is one of the major challenges in food science. Accurate modeling of food flavor and appreciation could yield important opportunities for both producers and consumers, including quality control, product fingerprinting, counterfeit detection, spoilage detection, and the development of new products and product combinations (food pairing) 1 , 2 , 3 , 4 , 5 , 6 . Accurate models for flavor and consumer appreciation would contribute greatly to our scientific understanding of how humans perceive and appreciate flavor. Moreover, accurate predictive models would also facilitate and standardize existing food assessment methods and could supplement or replace assessments by trained and consumer tasting panels, which are variable, expensive and time-consuming 7 , 8 , 9 . Lastly, apart from providing objective, quantitative, accurate and contextual information that can help producers, models can also guide consumers in understanding their personal preferences 10 .

Despite the myriad of applications, predicting food flavor and appreciation from its chemical properties remains a largely elusive goal in sensory science, especially for complex food and beverages 11 , 12 . A key obstacle is the immense number of flavor-active chemicals underlying food flavor. Flavor compounds can vary widely in chemical structure and concentration, making them technically challenging and labor-intensive to quantify, even in the face of innovations in metabolomics, such as non-targeted metabolic fingerprinting 13 , 14 . Moreover, sensory analysis is perhaps even more complicated. Flavor perception is highly complex, resulting from hundreds of different molecules interacting at the physiochemical and sensorial level. Sensory perception is often non-linear, characterized by complex and concentration-dependent synergistic and antagonistic effects 15 , 16 , 17 , 18 , 19 , 20 , 21 that are further convoluted by the genetics, environment, culture and psychology of consumers 22 , 23 , 24 . Perceived flavor is therefore difficult to measure, with problems of sensitivity, accuracy, and reproducibility that can only be resolved by gathering sufficiently large datasets 25 . Trained tasting panels are considered the prime source of quality sensory data, but require meticulous training, are low throughput and high cost. Public databases containing consumer reviews of food products could provide a valuable alternative, especially for studying appreciation scores, which do not require formal training 25 . Public databases offer the advantage of amassing large amounts of data, increasing the statistical power to identify potential drivers of appreciation. However, public datasets suffer from biases, including a bias in the volunteers that contribute to the database, as well as confounding factors such as price, cult status and psychological conformity towards previous ratings of the product.

Classical multivariate statistics and machine learning methods have been used to predict flavor of specific compounds by, for example, linking structural properties of a compound to its potential biological activities or linking concentrations of specific compounds to sensory profiles 1 , 26 . Importantly, most previous studies focused on predicting organoleptic properties of single compounds (often based on their chemical structure) 27 , 28 , 29 , 30 , 31 , 32 , 33 , thus ignoring the fact that these compounds are present in a complex matrix in food or beverages and excluding complex interactions between compounds. Moreover, the classical statistics commonly used in sensory science 34 , 35 , 36 , 37 , 38 , 39 require a large sample size and sufficient variance amongst predictors to create accurate models. They are not fit for studying an extensive set of hundreds of interacting flavor compounds, since they are sensitive to outliers, have a high tendency to overfit and are less suited for non-linear and discontinuous relationships 40 .

In this study, we combine extensive chemical analyses and sensory data of a set of different commercial beers with machine learning approaches to develop models that predict taste, smell, mouthfeel and appreciation from compound concentrations. Beer is particularly suited to model the relationship between chemistry, flavor and appreciation. First, beer is a complex product, consisting of thousands of flavor compounds that partake in complex sensory interactions 41 , 42 , 43 . This chemical diversity arises from the raw materials (malt, yeast, hops, water and spices) and biochemical conversions during the brewing process (kilning, mashing, boiling, fermentation, maturation and aging) 44 , 45 . Second, the advent of the internet saw beer consumers embrace online review platforms, such as RateBeer (ZX Ventures, Anheuser-Busch InBev SA/NV) and BeerAdvocate (Next Glass, inc.). In this way, the beer community provides massive data sets of beer flavor and appreciation scores, creating extraordinarily large sensory databases to complement the analyses of our professional sensory panel. Specifically, we characterize over 200 chemical properties of 250 commercial beers, spread across 22 beer styles, and link these to the descriptive sensory profiling data of a 16-person in-house trained tasting panel and data acquired from over 180,000 public consumer reviews. These unique and extensive datasets enable us to train a suite of machine learning models to predict flavor and appreciation from a beer’s chemical profile. Dissection of the best-performing models allows us to pinpoint specific compounds as potential drivers of beer flavor and appreciation. Follow-up experiments confirm the importance of these compounds and ultimately allow us to significantly improve the flavor and appreciation of selected commercial beers. Together, our study represents a significant step towards understanding complex flavors and reinforces the value of machine learning to develop and refine complex foods. In this way, it represents a stepping stone for further computer-aided food engineering applications 46 .
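To fix ideas, here is a minimal sketch of the core modeling step this paragraph describes: a gradient-boosting regressor mapping compound concentrations to an appreciation score, followed by a feature-importance readout as a crude stand-in for model dissection. The data, compound names and effect structure are synthetic; this is not the authors' pipeline.

```python
# Sketch: predict a sensory/appreciation score from chemical compound
# concentrations with gradient boosting, then rank compounds by their
# contribution to the prediction. Synthetic data throughout.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_beers = 250
compounds = ["ethyl_acetate", "iso_alpha_acids", "glycerol",
             "ethanol", "linalool"]
X = pd.DataFrame(rng.lognormal(size=(n_beers, len(compounds))),
                 columns=compounds)

# Synthetic appreciation score with non-linear compound effects plus noise
y = (3 + 0.8 * np.log1p(X["ethanol"]) - 0.3 * X["iso_alpha_acids"]
     + 0.5 * np.sqrt(X["ethyl_acetate"])
     + rng.normal(scale=0.3, size=n_beers))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)
model = GradientBoostingRegressor(n_estimators=300, max_depth=3,
                                  learning_rate=0.05).fit(X_tr, y_tr)
print("held-out R^2:", r2_score(y_te, model.predict(X_te)))

# "Model dissection": which compounds drive the predicted score?
for name, imp in sorted(zip(compounds, model.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:>16s} {imp:.3f}")
```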

To generate a comprehensive dataset on beer flavor, we selected 250 commercial Belgian beers across 22 different beer styles (Supplementary Fig.  S1 ). Beers with ≤ 4.2% alcohol by volume (ABV) were classified as non-alcoholic and low-alcoholic. Blonds and Tripels constitute a significant portion of the dataset (12.4% and 11.2%, respectively) reflecting their presence on the Belgian beer market and the heterogeneity of beers within these styles. By contrast, lager beers are less diverse and dominated by a handful of brands. Rare styles such as Brut or Faro make up only a small fraction of the dataset (2% and 1%, respectively) because fewer of these beers are produced and because they are dominated by distinct characteristics in terms of flavor and chemical composition.

Extensive analysis identifies relationships between chemical compounds in beer

For each beer, we measured 226 different chemical properties, including common brewing parameters such as alcohol content, iso-alpha acids, pH, sugar concentration 47 , and over 200 flavor compounds (Methods, Supplementary Table  S1 ). A large portion (37.2%) are terpenoids arising from hopping, responsible for herbal and fruity flavors 16 , 48 . A second major category are yeast metabolites, such as esters and alcohols, that result in fruity and solvent notes 48 , 49 , 50 . Other measured compounds are primarily derived from malt, or other microbes such as non- Saccharomyces yeasts and bacteria (‘wild flora’). Compounds that arise from spices or staling are labeled under ‘Others’. Five attributes (caloric value, total acids and total ester, hop aroma and sulfur compounds) are calculated from multiple individually measured compounds.

As a first step in identifying relationships between chemical properties, we determined correlations between the concentrations of the compounds (Fig.  1 , upper panel, Supplementary Data  1 and 2 , and Supplementary Fig.  S2 ; for the sake of clarity, only a subset of the measured compounds is shown in Fig.  1 ). Compounds of the same origin typically show a positive correlation, while absence of correlation hints at parameters varying independently. For example, the hop aroma compounds citronellol and alpha-terpineol show moderate correlations with each other (Spearman’s rho=0.39 and 0.57), but not with the bittering hop component iso-alpha acids (Spearman’s rho=0.16 and −0.07). This illustrates how brewers can independently modify hop aroma and bitterness by selecting hop varieties and dosage time. If hops are added early in the boiling phase, chemical conversions increase bitterness while aromas evaporate; conversely, late addition of hops preserves aroma but limits bitterness 51 . Similarly, hop-derived iso-alpha acids show a strong anti-correlation with lactic acid and acetic acid, likely reflecting growth inhibition of lactic acid and acetic acid bacteria, or the consequent use of fewer hops in sour beer styles, such as West Flanders ales and Fruit beers, that rely on these bacteria for their distinct flavors 52 . Finally, yeast-derived esters (ethyl acetate, ethyl decanoate, ethyl hexanoate, ethyl octanoate) and alcohols (ethanol, isoamyl alcohol, isobutanol, and glycerol) correlate with Spearman coefficients above 0.5, suggesting that these secondary metabolites are linked to the yeast genetic background and/or fermentation parameters and may be difficult to influence individually, although the choice of yeast strain may offer some control 53 .

Figure 1. Spearman rank correlations are shown. Descriptors are grouped according to their origin (malt (blue), hops (green), yeast (red), wild flora (yellow), Others (black)) and sensory aspect (aroma, taste, palate, and overall appreciation). Please note that for the chemical compounds, for the sake of clarity, only a subset of the total number of measured compounds is shown, with an emphasis on the key compounds for each source. For more details, see the main text and Methods section. Chemical data can be found in Supplementary Data  1 , correlations between all chemical compounds are depicted in Supplementary Fig.  S2 and correlation values can be found in Supplementary Data  2 . See Supplementary Data  4 for sensory panel assessments and Supplementary Data  5 for correlation values between all sensory descriptors.
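A compound-by-compound Spearman matrix like the one in Fig. 1 takes only a few lines to compute; in the sketch below, the concentrations are invented values for three of the compounds discussed above.

```python
# Pairwise Spearman rank correlations between compound concentrations,
# as in Fig. 1 (upper panel). Values are placeholders, not measured data.
import pandas as pd

df = pd.DataFrame({
    "citronellol":     [12.1, 3.4, 8.9, 0.5, 6.2],
    "alpha_terpineol": [10.3, 2.1, 9.5, 0.9, 5.8],
    "iso_alpha_acids": [20.0, 35.2, 18.4, 40.1, 22.3],
})

rho = df.corr(method="spearman")  # Spearman's rho for every compound pair
print(rho.round(2))
```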

Interestingly, different beer styles show distinct patterns for some flavor compounds (Supplementary Fig.  S3 ). These observations agree with expectations for key beer styles, and serve as a control for our measurements. For instance, Stouts generally show high values for color (darker), while hoppy beers contain elevated levels of iso-alpha acids, compounds associated with bitter hop taste. Acetic and lactic acid are not prevalent in most beers, with notable exceptions such as Kriek, Lambic, Faro, West Flanders ales and Flanders Old Brown, which use acid-producing bacteria ( Lactobacillus and Pediococcus ) or unconventional yeast ( Brettanomyces ) 54 , 55 . Glycerol, ethanol and esters show similar distributions across all beer styles, reflecting their common origin as products of yeast metabolism during fermentation 45 , 53 . Finally, low/no-alcohol beers contain low concentrations of glycerol and esters. This is in line with the production process for most of the low/no-alcohol beers in our dataset, which are produced through limiting fermentation or by stripping away alcohol via evaporation or dialysis, with both methods having the unintended side-effect of reducing the amount of flavor compounds in the final beer 56 , 57 .

Besides expected associations, our data also reveals less trivial associations between beer styles and specific parameters. For example, geraniol and citronellol, two monoterpenoids responsible for citrus, floral and rose flavors and characteristic of Citra hops, are found in relatively high amounts in Christmas, Saison, and Brett/co-fermented beers, where they may originate from terpenoid-rich spices such as coriander seeds instead of hops 58 .

Tasting panel assessments reveal sensorial relationships in beer

To assess the sensory profile of each beer, a trained tasting panel evaluated each of the 250 beers for 50 sensory attributes, including different hop, malt and yeast flavors, off-flavors and spices. Panelists used a tasting sheet (Supplementary Data  3 ) to score the different attributes. Panel consistency was evaluated by repeating 12 samples across different sessions and performing ANOVA. In 95% of cases no significant difference was found across sessions ( p  > 0.05), indicating good panel consistency (Supplementary Table  S2 ).
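
For illustration, such a consistency check can be sketched in a few lines. Note that the actual analysis used the ‘stats’ package in R (see Methods); the sketch below shows an equivalent test in Python with SciPy, assuming a hypothetical long-format table with columns ‘beer’, ‘session’ and ‘score’.

```python
# Minimal sketch of the panel-consistency check: for each beer served in
# several sessions, test whether scores differ significantly across sessions.
# Column names ('beer', 'session', 'score') are hypothetical.
import pandas as pd
from scipy.stats import f_oneway

def session_consistency(scores: pd.DataFrame, alpha: float = 0.05) -> pd.DataFrame:
    rows = []
    for beer, grp in scores.groupby("beer"):
        if grp["session"].nunique() < 2:
            continue  # beer was not repeated across sessions
        samples = [g["score"].values for _, g in grp.groupby("session")]
        stat, p = f_oneway(*samples)
        rows.append({"beer": beer, "F": stat, "p": p, "consistent": p > alpha})
    return pd.DataFrame(rows)
```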

Aroma and taste perception reported by the trained panel are often linked (Fig. 1, bottom left panel and Supplementary Data 4 and 5), with high correlations between hop aroma and taste (Spearman's rho = 0.83). Bitter taste was found to correlate with hop aroma and taste in general (Spearman's rho = 0.80 and 0.69), and particularly with “grassy” noble hops (Spearman's rho = 0.75). Barnyard flavor, most often associated with sour beers, correlates strongly with the stale hops descriptor (Spearman's rho = 0.97), consistent with the aged hops typically used in these beers. Lactic and acetic acid, which often co-occur, are correlated (Spearman's rho = 0.66). Interestingly, sweetness and bitterness are anti-correlated (Spearman's rho = −0.48), confirming the hypothesis that they mask each other 59 , 60 . Beer body is highly correlated with alcohol (Spearman's rho = 0.79), and overall appreciation is found to correlate with multiple aspects that describe beer mouthfeel (alcohol, carbonation; Spearman's rho = 0.32 and 0.39), as well as with hop and ester aroma intensity (Spearman's rho = 0.39 and 0.35).

Similar to the chemical analyses, sensorial analyses confirmed typical features of specific beer styles (Supplementary Fig.  S4 ). For example, sour beers (Faro, Flanders Old Brown, Fruit beer, Kriek, Lambic, West Flanders ale) were rated acidic, with flavors of both acetic and lactic acid. Hoppy beers were found to be bitter and showed hop-associated aromas like citrus and tropical fruit. Malt taste is most detected among scotch, stout/porters, and strong ales, while low/no-alcohol beers, which often have a reputation for being ‘worty’ (reminiscent of unfermented, sweet malt extract) appear in the middle. Unsurprisingly, hop aromas are most strongly detected among hoppy beers. Like its chemical counterpart (Supplementary Fig.  S3 ), acidity shows a right-skewed distribution, with the most acidic beers being Krieks, Lambics, and West Flanders ales.

Tasting panel assessments of specific flavors correlate with chemical composition

We find that the concentrations of several chemical compounds strongly correlate with specific aromas or tastes, as evaluated by the tasting panel (Fig. 2, Supplementary Fig. S5, Supplementary Data 6). In some cases, these correlations confirm expectations and serve as a useful control for data quality. For example, iso-alpha acids, the bittering compounds in hops, strongly correlate with bitterness (Spearman's rho = 0.68), ethanol and glycerol correlate with tasters' perception of alcohol and of body, the mouthfeel sensation of fullness (Spearman's rho = 0.82/0.62 and 0.72/0.57, respectively), and darker color from roasted malts is a good indicator of malt perception (Spearman's rho = 0.54).

Figure 2

Heatmap colors indicate Spearman’s Rho. Axes are organized according to sensory categories (aroma, taste, mouthfeel, overall), chemical categories and chemical sources in beer (malt (blue), hops (green), yeast (red), wild flora (yellow), Others (black)). See Supplementary Data  6 for all correlation values.

Interestingly, for some relationships between chemical compounds and perceived flavor, correlations are weaker than expected. For example, the rose-smelling phenethyl acetate only weakly correlates with floral aroma. This hints at more complex relationships and interactions between compounds and suggests a need for a more complex model than simple correlations. Lastly, we uncovered unexpected correlations. For instance, the esters ethyl decanoate and ethyl octanoate appear to correlate slightly with hop perception and bitterness, possibly due to their fruity flavor. Iron is anti-correlated with hop aromas and bitterness, most likely because it is also anti-correlated with iso-alpha acids. This could be a sign of metal chelation of hop acids 61 , given that our analyses measure unbound hop acids and total iron content, or could result from the higher iron content in dark and Fruit beers, which typically have less hoppy and bitter flavors 62 .

Public consumer reviews complement expert panel data

To complement and expand the sensory data from our trained tasting panel, we collected over 180,000 reviews of our 250 beers from the online consumer review platform RateBeer. This provided numerical scores for beer appearance, aroma, taste, palate, and overall quality, as well as the average overall score.

Public datasets are known to suffer from biases, such as price, cult status and psychological conformity towards previous ratings of a product. For example, prices correlate with appreciation scores in these online consumer reviews (rho = 0.49, Supplementary Fig. S6), but not in our trained tasting panel (rho = 0.19). This suggests that prices affect consumer appreciation, an effect previously reported for wine 63 , while blind tastings remain unaffected. Moreover, we observe that some beer styles, like lagers and non-alcoholic beers, generally receive lower scores, reflecting that online reviewers are mostly beer aficionados with a preference for specialty beers over lager beers. In general, we find a modest correlation between our trained panel's overall appreciation score and the online consumer appreciation scores (Fig. 3, rho = 0.29). Apart from the aforementioned biases in the online datasets, serving temperature, sample freshness and surroundings, which are all tightly controlled during the tasting panel sessions, can vary tremendously across online consumers and can further contribute to differences (in appreciation, among other aspects) between the two categories of tasters. Importantly, in contrast to the overall appreciation scores, for many sensory aspects the results from the professional panel correlated well with results obtained from RateBeer reviews. Correlations were highest for features that are relatively easy to recognize even for untrained tasters, like bitterness, sweetness, alcohol and malt aroma (Fig. 3 and below).

Figure 3

RateBeer text mining results can be found in Supplementary Data  7 . Rho values shown are Spearman correlation values, with asterisks indicating significant correlations ( p  < 0.05, two-sided). All p values were smaller than 0.001, except for Esters aroma (0.0553), Esters taste (0.3275), Esters aroma—banana (0.0019), Coriander (0.0508) and Diacetyl (0.0134).

Besides collecting consumer appreciation from these online reviews, we developed automated text analysis tools to gather additional data from the review texts (Supplementary Data 7). Processing review texts in the RateBeer database yielded results comparable to the scores given by the trained panel for many common sensory aspects, including acidity, bitterness, sweetness, alcohol, malt, and hop tastes (Fig. 3). This is in line with what would be expected, since these attributes require less training for accurate assessment and are less influenced by environmental factors such as temperature, serving glass and odors in the environment. Consumer reviews also correlate well with our trained panel for 4-vinyl guaiacol, a compound associated with a very characteristic aroma. By contrast, correlations for more specific aromas like esters, coriander or diacetyl are much weaker, as these terms are underrepresented in the online reviews, underscoring the importance of using a trained tasting panel and standardized tasting sheets with explicit factors to be scored when evaluating specific aspects of a beer. Taken together, our results suggest that public reviews are trustworthy for some, but not all, flavor features and can complement or substitute taste panel data for these sensory aspects.

Models can predict beer sensory profiles from chemical data

The rich datasets of chemical analyses, tasting panel assessments and public reviews gathered in the first part of this study provided us with a unique opportunity to develop predictive models that link chemical data to sensorial features. Given the complexity of beer flavor, basic statistical tools such as correlations or linear regression may not always be the most suitable for making accurate predictions. Instead, we applied different machine learning models that can capture both simple linear and complex interactive relationships. Specifically, we constructed a set of regression models to predict (a) trained panel scores for beer flavor and quality and (b) public reviews' appreciation scores from beer chemical profiles. We trained and tested 10 different models (Methods): three linear regression-based models (simple linear regression with first-order interactions (LR), lasso regression with first-order interactions (Lasso), and partial least squares regression (PLSR)); five decision tree models (AdaBoost regressor (ABR), extra trees (ET), gradient boosting regressor (GBR), random forest (RF) and XGBoost regressor (XGBR)); one support vector regression (SVR) model; and one artificial neural network (ANN) model.
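
As an illustration, this model set could be instantiated along the following lines. This is a minimal sketch: hyperparameters are defaults or placeholders (the values used in the study were tuned by cross-validated grid search, see Methods), and PolynomialFeatures is one possible way to generate the first-order interaction terms for the linear models.

```python
# Sketch of the ten-model comparison; hyperparameters are illustrative.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import (AdaBoostRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor, RandomForestRegressor)
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from xgboost import XGBRegressor

def interactions():
    # first-order interaction terms for the linear models
    return PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)

models = {
    "LR":    make_pipeline(interactions(), LinearRegression()),
    "Lasso": make_pipeline(interactions(), Lasso(alpha=0.01)),
    "PLSR":  PLSRegression(n_components=10),
    "ABR":   AdaBoostRegressor(),
    "ET":    ExtraTreesRegressor(),
    "GBR":   GradientBoostingRegressor(),
    "RF":    RandomForestRegressor(),
    "XGBR":  XGBRegressor(),
    "SVR":   SVR(),
    "ANN":   MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000),
}
```

For single-output estimators such as GBR, scikit-learn's MultiOutputRegressor wrapper is one way to obtain the multi-output variant that predicts all sensory attributes simultaneously.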

To compare the performance of our machine learning models, the dataset was randomly split into a training and test set, stratified by beer style. After a model was trained on the training set, its performance was evaluated by its ability to predict the test set, quantified as the coefficient of determination (R²) of the multi-output models (see Methods). Additionally, individual-attribute models were ranked per descriptor and the average rank was calculated, as proposed by Korneva et al. 64 . Importantly, both ways of evaluating the models' performance agreed in general. Performance of the different models varied (Table 1). Notably, all models perform better at predicting RateBeer results than results from our trained tasting panel. One reason could be that sensory data is inherently variable, and this variability is averaged out by the large number of public reviews on RateBeer. Additionally, all tree-based models perform better at predicting taste than aroma. Linear models (LR) performed particularly poorly, with negative R² values, due to severe overfitting (training set R² = 1). Overfitting is a common issue in linear models with many parameters and limited samples, especially when interaction terms further amplify the number of parameters. L1 regularization (Lasso) successfully overcomes this overfitting, outcompeting multiple tree-based models on the RateBeer dataset. Similarly, the dimensionality reduction of PLSR avoids overfitting and improves performance to some extent. Still, tree-based models (ABR, ET, GBR, RF and XGBR) show the best performance, outcompeting the linear models (LR, Lasso, PLSR) commonly used in sensory science 65 .
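
The two evaluation schemes can be sketched as follows; `y_test`/`preds` (test-set targets and predictions) and `r2_table` (a descriptors-by-models table of test-set R² values from the individual-attribute models) are assumed inputs.

```python
# Sketch of the two model-evaluation schemes described above.
import pandas as pd
from sklearn.metrics import r2_score

# (1) coefficient of determination of a multi-output model on the test set;
# y_test and preds are n_samples x n_attributes arrays
def multioutput_r2(y_test, preds) -> float:
    return r2_score(y_test, preds, multioutput="uniform_average")

# (2) average rank per model across descriptors (cf. Korneva et al.)
def average_rank(r2_table: pd.DataFrame) -> pd.Series:
    ranks = r2_table.rank(axis=1, ascending=False)  # 1 = best model per descriptor
    return ranks.mean(axis=0).sort_values()
```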

GBR models showed the best overall performance in predicting sensory responses from chemical information, with R² values up to 0.75 depending on the predicted sensory feature (Supplementary Table S4). The GBR models predict consumer appreciation (RateBeer) better than our trained panel's appreciation (R² of 0.67 compared to 0.09) (Supplementary Table S3 and Supplementary Table S4). ANN models showed intermediate performance, likely because neural networks typically perform best with larger datasets 66 . The SVR model also showed intermediate performance, mostly due to weak predictions for specific attributes that lower its overall performance (Supplementary Table S4).

Model dissection identifies specific, unexpected compounds as drivers of consumer appreciation

Next, we leveraged our models to infer important contributors to sensory perception and consumer appreciation. Consumer preference is a crucial sensory aspect, because a product with low consumer appreciation scores often does not succeed commercially 25 . Additionally, the requirement for a large number of representative evaluators makes consumer trials one of the more costly and time-consuming aspects of product development. Hence, a model for predicting chemical drivers of overall appreciation would be a welcome addition to the available toolbox for food development and optimization.

Since GBR models on our RateBeer dataset showed the best overall performance, we focused on these models. Specifically, we used two approaches to identify important contributors. First, rankings of the most important predictors for each sensorial trait in the GBR models were obtained based on impurity-based feature importance (mean decrease in impurity). High-ranked parameters were hypothesized to be either the true causal chemical properties underlying the trait, to correlate with the actual causal properties, or to take part in sensory interactions affecting the trait 67 (Fig.  4A ). In a second approach, we used SHAP 68 to determine which parameters contributed most to the model for making predictions of consumer appreciation (Fig.  4B ). SHAP calculates parameter contributions to model predictions on a per-sample basis, which can be aggregated into an importance score.
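
A minimal sketch of both attribution approaches, assuming a fitted GradientBoostingRegressor `gbr` and a pandas DataFrame `X` of chemical measurements from the training step:

```python
# Two feature-attribution approaches for a fitted GBR model.
import pandas as pd
import shap

# (1) impurity-based importance (mean decrease in impurity, MDI)
mdi = pd.Series(gbr.feature_importances_, index=X.columns)
top_mdi = mdi.sort_values(ascending=False).head(15)

# (2) SHAP: per-sample contributions, aggregated into a global importance score
explainer = shap.TreeExplainer(gbr)
shap_values = explainer.shap_values(X)  # n_samples x n_features
shap_importance = pd.Series(abs(shap_values).mean(axis=0), index=X.columns)
top_shap = shap_importance.sort_values(ascending=False).head(15)

shap.summary_plot(shap_values, X, max_display=15)  # beeswarm plot as in Fig. 4B
```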

Figure 4

A The impurity-based feature importance (mean decrease in impurity, MDI) calculated from the Gradient Boosting Regression (GBR) model predicting RateBeer appreciation scores. The top 15 highest ranked chemical properties are shown. B SHAP summary plot for the top 15 parameters contributing to our GBR model. Each point on the graph represents a sample from our dataset. The color represents the concentration of that parameter, with bluer colors representing lower values and redder colors representing higher values. Greater absolute values on the horizontal axis indicate a higher impact of the parameter on the prediction of the model. C Spearman correlations between the 15 most important chemical properties and consumer overall appreciation. Numbers indicate the Spearman rho correlation coefficient, and the rank of this correlation compared to all other correlations. The top 15 important compounds were determined using SHAP (panel B).

Both approaches identified ethyl acetate as the most predictive parameter for beer appreciation (Fig.  4 ). Ethyl acetate is the most abundant ester in beer with a typical ‘fruity’, ‘solvent’ and ‘alcoholic’ flavor, but is often considered less important than other esters like isoamyl acetate. The second most important parameter identified by SHAP is ethanol, the most abundant beer compound after water. Apart from directly contributing to beer flavor and mouthfeel, ethanol drastically influences the physical properties of beer, dictating how easily volatile compounds escape the beer matrix to contribute to beer aroma 69 . Importantly, it should also be noted that the importance of ethanol for appreciation is likely inflated by the very low appreciation scores of non-alcoholic beers (Supplementary Fig.  S4 ). Despite not often being considered a driver of beer appreciation, protein level also ranks highly in both approaches, possibly due to its effect on mouthfeel and body 70 . Lactic acid, which contributes to the tart taste of sour beers, is the fourth most important parameter identified by SHAP, possibly due to the generally high appreciation of sour beers in our dataset.

Interestingly, some of the most important predictive parameters for our model are not well-established as beer flavors or are even commonly regarded as being negative for beer quality. For example, our models identify methanethiol and ethyl phenyl acetate, an ester commonly linked to beer staling 71 , as key factors contributing to beer appreciation. Although there is no doubt that high concentrations of these compounds are considered unpleasant, whether modest concentrations can have positive effects is not yet known 72 , 73 .

To compare our approach to conventional statistics, we evaluated how well the 15 most important SHAP-derived parameters correlate with consumer appreciation (Fig. 4C). Interestingly, only 6 of the properties derived by SHAP rank amongst the top 15 most correlated parameters. For some chemical compounds, the correlations are so low that they would likely have been dismissed as unimportant. For example, lactic acid, the fourth most important parameter, shows a bimodal distribution for appreciation, with sour beers forming a separate cluster that is missed entirely by the Spearman correlation. Additionally, the correlation plots reveal outliers, emphasizing the need for robust analysis tools. Together, this highlights the need for alternative models, like the Gradient Boosting model, that better grasp the complexity of (beer) flavor.

Finally, to observe the relationships between these chemical properties and their predicted targets, partial dependence plots were constructed for the six most important predictors of consumer appreciation 74 , 75 , 76 (Supplementary Fig. S7). One-way partial dependence plots show how a change in concentration affects the predicted appreciation. These plots reveal an important limitation of our models: appreciation predictions remain constant at ever-increasing concentrations. This implies that once a threshold concentration is reached, further increasing the concentration does not affect appreciation. This implication is clearly false, as it is well-documented that certain compounds become unpleasant at high concentrations, including ethyl acetate (‘nail polish’) 77 and methanethiol (‘sulfury’ and ‘rotten cabbage’) 78 . The inability of our models to grasp that flavor compounds have optimal levels, above which they become negative, is a consequence of working with commercial beer brands, in which (off-)flavors are rarely high enough to negatively impact the product. The two-way partial dependence plots show how changing the concentrations of two compounds influences predicted appreciation, visualizing their interactions (Supplementary Fig. S7). In our case, the top 5 parameters are dominated by additive or synergistic interactions, with high concentrations of both compounds resulting in the highest predicted appreciation.
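
Such plots can be generated directly from a fitted scikit-learn model. The sketch below assumes the fitted `gbr` model and feature table `X` from above; the feature names are illustrative.

```python
# Sketch of one- and two-way partial dependence plots for top predictors.
from sklearn.inspection import PartialDependenceDisplay

top6 = ["ethyl acetate", "ethanol", "lactic acid",
        "ethyl phenyl acetate", "methanethiol", "protein"]  # illustrative names

# one-way PDPs: predicted appreciation as a function of each compound
PartialDependenceDisplay.from_estimator(gbr, X, features=top6)

# two-way PDP: interaction between the two strongest predictors
PartialDependenceDisplay.from_estimator(
    gbr, X, features=[("ethyl acetate", "ethanol")])
```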

To assess the robustness of our best-performing models and model predictions, we performed 100 iterations of the GBR, RF and ET models. In general, all iterations of the models yielded similar performance (Supplementary Fig.  S8 ). Moreover, the main predictors (including the top predictors ethanol and ethyl acetate) remained virtually the same, especially for GBR and RF. For the iterations of the ET model, we did observe more variation in the top predictors, which is likely a consequence of the model’s inherent random architecture in combination with co-correlations between certain predictors. However, even in this case, several of the top predictors (ethanol and ethyl acetate) remain unchanged, although their rank in importance changes (Supplementary Fig.  S8 ).
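
A sketch of this robustness check, assuming a feature table `X` and a single target `y` (e.g., consumer appreciation):

```python
# Refit the model under different random seeds and track how stable
# the top-ranked predictors are across iterations.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

def top_feature_stability(X, y, n_iter=100, k=15) -> pd.Series:
    hits = []
    for seed in range(n_iter):
        model = GradientBoostingRegressor(random_state=seed).fit(X, y)
        imp = pd.Series(model.feature_importances_, index=X.columns)
        hits.extend(imp.sort_values(ascending=False).head(k).index)
    # how often each feature appears among the top k across iterations
    return pd.Series(hits).value_counts()
```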

Next, we investigated whether combining the RateBeer and trained panel data into one consolidated dataset would lead to stronger models, under the hypothesis that such a model would suffer less from bias in either dataset. A GBR model was trained to predict appreciation on the combined dataset. This model underperformed compared to the RateBeer-only model (R² = 0.67), both without (R² = 0.26) and with (R² = 0.42) a dataset identifier. For the latter, the dataset identifier is the most important feature (Supplementary Fig. S9), while most of the feature importance remains unchanged, with ethyl acetate and ethanol ranking highest, as in the original model trained only on RateBeer data. It seems that the large variation in the panel dataset introduces noise, weakening the models' performance and reliability. In addition, it seems reasonable to assume that the two datasets are fundamentally different, with the panel dataset obtained through blind tastings by a trained professional panel under tightly controlled conditions.

Lastly, we evaluated whether beer style identifiers would further enhance the model's performance. A GBR model was trained with parameters that explicitly encoded the styles of the samples. This did not improve model performance (R² = 0.66 with style information vs R² = 0.67 without). The most important chemical features are consistent with the model trained without style information (e.g., ethanol and ethyl acetate), and, with the exception of the most preferred (strong ale) and least preferred (low/no-alcohol) styles, none of the styles were among the most important features (Supplementary Fig. S9, Supplementary Table S5 and S6). This is likely due to a combination of style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original models, and the low number of samples belonging to some styles, making it difficult for the model to learn style-specific patterns. Moreover, beer styles are not rigorously defined, with some styles overlapping in features and some beers being misattributed to a specific style, all of which introduces additional noise in models that use style parameters.

Model validation

To test whether our predictive models give insight into beer appreciation, we set up experiments aimed at improving existing commercial beers. We specifically selected overall appreciation as the trait to be examined because of its complexity and commercial relevance. Beer flavor comprises a complex bouquet rather than single aromas and tastes 53 . Hence, adding a single compound to the extent that a difference is noticeable may lead to an unbalanced, artificial flavor. Therefore, we evaluated the effect of combinations of compounds. Because Blond beers represent the largest style group in our dataset, we selected a beer from this style as the starting material for these experiments (Beer 64 in Supplementary Data 1).

In the first set of experiments, we adjusted the concentrations of compounds that made up the most important predictors of overall appreciation (ethyl acetate, ethanol, lactic acid, ethyl phenyl acetate) together with correlated compounds (ethyl hexanoate, isoamyl acetate, glycerol), bringing them up to 95th-percentile ethanol-normalized concentrations (Methods) within the Blond group (‘Spiked’ concentration in Fig. 5A). Compared to controls, the spiked beers were found to have significantly improved overall appreciation among trained panelists, with panelists noting increased intensity of ester flavors, sweetness, alcohol, and body fullness (Fig. 5B). To disentangle the contribution of ethanol to these results, a second experiment was performed without the addition of ethanol. This resulted in a similar outcome, including increased perception of alcohol and overall appreciation.

Figure 5

Adding the top chemical compounds, identified as best predictors of appreciation by our model, into poorly appreciated beers results in increased appreciation from our trained panel. Results of sensory tests between base beers and those spiked with compounds identified as the best predictors by the model. A Blond and Non/Low-alcohol (0.0% ABV) base beers were brought up to 95th-percentile ethanol-normalized concentrations within each style. B For each sensory attribute, tasters indicated the more intense sample and selected the sample they preferred. The numbers above the bars correspond to the p values that indicate significant changes in perceived flavor (two-sided binomial test: alpha 0.05, n  = 20 or 13).

In a last experiment, we tested whether using the model’s predictions can boost the appreciation of a non-alcoholic beer (beer 223 in Supplementary Data  1 ). Again, the addition of a mixture of predicted compounds (omitting ethanol, in this case) resulted in a significant increase in appreciation, body, ester flavor and sweetness.

Predicting flavor and consumer appreciation from chemical composition is one of the ultimate goals of sensory science. A reliable, systematic and unbiased way to link chemical profiles to flavor and food appreciation would be a significant asset to the food and beverage industry. Such tools would substantially aid in quality control and recipe development, offer an efficient and cost-effective alternative to pilot studies and consumer trials, and would ultimately allow food manufacturers to more efficiently produce superior, tailor-made products that better meet the demands of specific consumer groups.

A limited set of studies have previously tried, with varying degrees of success, to predict beer flavor and beer popularity based on (a limited set of) chemical compounds and flavors 79 , 80 . Current sensitive, high-throughput technologies allow measuring an unprecedented number of chemical compounds and properties in a large set of samples, yielding a dataset that can train models that help close the gap between chemistry and flavor, even for a complex natural product like beer. To our knowledge, no previous research has gathered data at this scale (250 samples, 226 chemical parameters, 50 sensory attributes and 5 consumer scores) to disentangle and validate the chemical aspects driving beer preference using various machine-learning techniques. We find that modern machine learning models outperform conventional statistical tools, such as correlations and linear models, and can successfully predict flavor appreciation from chemical composition. This could be attributed to the natural incorporation of interactions and non-linear or discontinuous effects in machine learning models, which are not easily grasped by the linear model architecture. While linear models and partial least squares regression represent the most widespread statistical approaches in sensory science, in part because they allow interpretation 65 , 81 , 82 , modern machine learning methods allow for building better predictive models while preserving the possibility to dissect and exploit the underlying patterns. Of the 10 different models we trained, tree-based models, such as our best-performing GBR, showed the best overall performance in predicting sensory responses from chemical information, outcompeting artificial neural networks. This agrees with previous reports for models trained on tabular data 83 . Our results are in line with the findings of Colantonio et al., who also identified the gradient boosting architecture as performing best at predicting appreciation and flavor (of tomatoes and blueberries, in their specific study) 26 . Importantly, besides our larger experimental scale, we were able to directly confirm our models' predictions in vivo.

Our study confirms that flavor compound concentration does not always correlate with perception, suggesting complex interactions that are often missed by more conventional statistics and simple models. Specifically, we find that tree-based algorithms may perform best in developing models that link complex food chemistry with aroma. Furthermore, we show that massive datasets of untrained consumer reviews provide a valuable source of data that can complement or even replace trained tasting panels, especially for appreciation and basic flavors, such as sweetness and bitterness. This holds despite biases that are known to occur in such datasets, such as price or conformity bias. Moreover, GBR models predict taste better than aroma. This is likely because taste (e.g., bitterness) often directly relates to the corresponding chemical measurements (e.g., iso-alpha acids), whereas such a link is less clear for aromas, which often result from the interplay between multiple volatile compounds. We also find that our models are best at predicting acidity and alcohol, likely because there is a direct relation between the measured chemical compounds (acids and ethanol) and the corresponding perceived sensorial attributes (acidity and alcohol), and because even untrained consumers are generally able to recognize these flavors and aromas.

The predictions of our final models, trained on review data, hold even for blind tastings with small groups of trained tasters, as demonstrated by our ability to validate specific compounds as drivers of beer flavor and appreciation. Since adding a single compound to the extent of a noticeable difference may result in an unbalanced flavor profile, we specifically tested our identified key drivers as a combination of compounds. While this approach does not allow us to validate if a particular single compound would affect flavor and/or appreciation, our experiments do show that this combination of compounds increases consumer appreciation.

It is important to stress that, while it represents an important step forward, our approach still has several major limitations. A key weakness of the GBR model architecture is that, amongst co-correlating variables, the variable with the largest main effect is consistently preferred for model building. As a result, co-correlating variables often receive artificially low importance scores, both for impurity- and SHAP-based methods, as we observed in the comparison to the more randomized Extra Trees models. This implies that chemicals identified as key drivers of a specific sensory feature by GBR might not be the true causative compounds, but rather co-correlate with the actual causative chemical. For example, the high importance of ethyl acetate could be (partially) attributed to the total ester content, ethanol or ethyl hexanoate (rho = 0.77, 0.72 and 0.68, respectively), while ethyl phenylacetate could hide the importance of prenyl isobutyrate and ethyl benzoate (rho = 0.77 and 0.76). Expanding our GBR model to include beer style as a parameter did not yield additional power or insight. This is likely due to style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original model, as well as the smaller sample size per style, which limits the power to uncover style-specific patterns. This can be partly attributed to the curse of dimensionality, where the high number of parameters results in the models mainly incorporating single-parameter effects, rather than complex interactions such as style-dependent effects 67 . A larger number of samples may overcome some of these limitations and offer more insight into style-specific effects. On the other hand, beer style is not a rigid scientific classification, and beers within one style often differ considerably, which further complicates the analysis of style as a model factor.

Our study is limited to beers from Belgian breweries. Although these beers cover a large portion of the beer styles available globally, some beer styles and consumer patterns may be missing, while other features might be overrepresented. For example, many Belgian ales exhibit yeast-driven flavor profiles, which is reflected in the chemical drivers of appreciation discovered by this study. In future work, expanding the scope to include diverse markets and beer styles could lead to the identification of even more drivers of appreciation and better models for special niche products that were not present in our beer set.

In addition to inherent limitations of GBR models, there are also some limitations associated with studying food aroma. Although our chemical analyses measured most of the known aroma compounds, the total number of flavor compounds in complex foods like beer is still larger than the subset we were able to measure in this study. For example, hop-derived thiols, which influence flavor at very low concentrations, are notoriously difficult to measure in a high-throughput experiment. Moreover, consumer perception remains subjective and prone to biases that are difficult to avoid. It is also important to stress that the models are still immature and that more extensive datasets will be crucial for developing more complete models in the future. Besides more samples and parameters, our dataset does not include any demographic information about the tasters. Including such data could lead to better models that grasp external factors like age and culture. Another limitation is that our set of beers consists of high-quality end-products and lacks beers that are unfit for sale, which limits the current models' ability to accurately predict poorly appreciated products. Finally, while the models could be readily applied in quality control, their use in sensory science and product development is restrained by their inability to discern causal relationships. Given that the models cannot distinguish compounds that genuinely drive consumer perception from those that merely correlate with them, validation experiments remain essential to identify true causative compounds.

Despite the inherent limitations, dissection of our models enabled us to pinpoint specific molecules as potential drivers of beer aroma and consumer appreciation, including compounds that were unexpected and would not have been identified using standard approaches. Important drivers of beer appreciation uncovered by our models include protein levels, ethyl acetate, ethyl phenyl acetate and lactic acid. Currently, many brewers already use lactic acid to acidify their brewing water and ensure optimal pH for enzymatic activity during the mashing process. Our results suggest that adding lactic acid can also improve beer appreciation, although its individual effect remains to be tested. Interestingly, ethanol appears to be unnecessary to improve beer appreciation, both for blond beer and alcohol-free beer. Given the growing consumer interest in alcohol-free beer, with a predicted annual market growth of >7% 84 , it is relevant for brewers to know what compounds can further increase consumer appreciation of these beers. Hence, our model may readily provide avenues to further improve the flavor and consumer appreciation of both alcoholic and non-alcoholic beers, which is generally considered one of the key challenges for future beer production.

Whereas we see a direct implementation of our results for the development of superior alcohol-free beverages and other food products, our study can also serve as a stepping stone for the development of novel alcohol-containing beverages. We want to echo the growing body of scientific evidence for the negative effects of alcohol consumption, both on the individual level by the mutagenic, teratogenic and carcinogenic effects of ethanol 85 , 86 , as well as the burden on society caused by alcohol abuse and addiction. We encourage the use of our results for the production of healthier, tastier products, including novel and improved beverages with lower alcohol contents. Furthermore, we strongly discourage the use of these technologies to improve the appreciation or addictive properties of harmful substances.

The present work demonstrates that despite some important remaining hurdles, combining the latest developments in chemical analyses, sensory analysis and modern machine learning methods offers exciting avenues for food chemistry and engineering. Soon, these tools may provide solutions in quality control and recipe development, as well as new approaches to sensory science and flavor research.

Beer selection

250 commercial Belgian beers were selected to cover the broad diversity of beer styles and the corresponding diversity in chemical composition and aroma (see Supplementary Fig. S1).

Chemical dataset

Sample preparation.

Beers within their expiration date were purchased from commercial retailers. Samples were prepared in biological duplicates at room temperature, unless explicitly stated otherwise. Bottle pressure was measured with a manual pressure device (Steinfurth Mess-Systeme GmbH) and used to calculate CO 2 concentration. The beer was poured through two filter papers (Macherey-Nagel, 500713032 MN 713 ¼) to remove carbon dioxide and prevent spontaneous foaming. Samples were then prepared for measurements by targeted Headspace-Gas Chromatography-Flame Ionization Detector/Flame Photometric Detector (HS-GC-FID/FPD), Headspace-Solid Phase Microextraction-Gas Chromatography-Mass Spectrometry (HS-SPME-GC-MS), colorimetric analysis, enzymatic analysis, Near-Infrared (NIR) analysis, as described in the sections below. The mean values of biological duplicates are reported for each compound.

HS-GC-FID/FPD

HS-GC-FID/FPD (Shimadzu GC 2010 Plus) was used to measure higher alcohols, acetaldehyde, esters, 4-vinyl guaiacol, and sulfur compounds. Each measurement comprised 5 ml of sample pipetted into a 20 ml glass vial containing 1.75 g NaCl (VWR, 27810.295). 100 µl of 2-heptanol (Sigma-Aldrich, H3003) (internal standard) solution in ethanol (Fisher Chemical, E/0650DF/C17) was added for a final concentration of 2.44 mg/L. Samples were flushed with nitrogen for 10 s, sealed with a silicone septum, stored at −80 °C and analyzed in batches of 20.

The GC was equipped with a DB-WAXetr column (length, 30 m; internal diameter, 0.32 mm; layer thickness, 0.50 µm; Agilent Technologies, Santa Clara, CA, USA) to the FID and an HP-5 column (length, 30 m; internal diameter, 0.25 mm; layer thickness, 0.25 µm; Agilent Technologies, Santa Clara, CA, USA) to the FPD. N 2 was used as the carrier gas. Samples were incubated for 20 min at 70 °C in the headspace autosampler (Flow rate, 35 cm/s; Injection volume, 1000 µL; Injection mode, split; Combi PAL autosampler, CTC analytics, Switzerland). The injector, FID and FPD temperatures were kept at 250 °C. The GC oven temperature was first held at 50 °C for 5 min and then allowed to rise to 80 °C at a rate of 5 °C/min, followed by a second ramp of 4 °C/min until 200 °C, held for 3 min, and a final ramp of 4 °C/min until 230 °C, held for 1 min. Results were analyzed with the GCSolution software version 2.4 (Shimadzu, Kyoto, Japan). The GC was calibrated with a 5% EtOH solution (VWR International) containing the volatiles under study (Supplementary Table S7).

HS-SPME-GC-MS

HS-SPME-GC-MS (Shimadzu GCMS-QP-2010 Ultra) was used to measure additional volatile compounds, mainly comprising terpenoids and esters. Samples were analyzed by HS-SPME using a triphase DVB/Carboxen/PDMS 50/30 μm SPME fiber (Supelco Co., Bellefonte, PA, USA) followed by gas chromatography (Thermo Fisher Scientific Trace 1300 series, USA) coupled to a mass spectrometer (Thermo Fisher Scientific ISQ series MS) equipped with a TriPlus RSH autosampler. 5 ml of degassed beer sample was placed in 20 ml vials containing 1.75 g NaCl (VWR, 27810.295). 5 µl internal standard mix was added, containing 2-heptanol (1 g/L) (Sigma-Aldrich, H3003), 4-fluorobenzaldehyde (1 g/L) (Sigma-Aldrich, 128376), 2,3-hexanedione (1 g/L) (Sigma-Aldrich, 144169) and guaiacol (1 g/L) (Sigma-Aldrich, W253200) in ethanol (Fisher Chemical, E/0650DF/C17). Each sample was incubated at 60 °C in the autosampler oven with constant agitation. After 5 min equilibration, the SPME fiber was exposed to the sample headspace for 30 min. The compounds trapped on the fiber were thermally desorbed in the injection port of the chromatograph by heating the fiber for 15 min at 270 °C.

The GC-MS was equipped with a low polarity RXi-5Sil MS column (length, 20 m; internal diameter, 0.18 mm; layer thickness, 0.18 µm; Restek, Bellefonte, PA, USA). Injection was performed in splitless mode at 320 °C, a split flow of 9 ml/min, a purge flow of 5 ml/min and an open valve time of 3 min. To obtain a pulsed injection, a programmed gas flow was used whereby the helium gas flow was set at 2.7 mL/min for 0.1 min, followed by a decrease in flow of 20 ml/min to the normal 0.9 mL/min. The temperature was first held at 30 °C for 3 min and then allowed to rise to 80 °C at a rate of 7 °C/min, followed by a second ramp of 2 °C/min until 125 °C and a final ramp of 8 °C/min to a final temperature of 270 °C.

Mass acquisition range was 33 to 550 amu at a scan rate of 5 scans/s. Electron impact ionization energy was 70 eV. The interface and ion source were kept at 275 °C and 250 °C, respectively. A mix of linear n-alkanes (from C7 to C40, Supelco Co.) was injected into the GC-MS under identical conditions to serve as external retention index markers. Identification and quantification of the compounds were performed using an in-house developed R script as described in Goelen et al. and Reher et al. 87 , 88 (for package information, see Supplementary Table  S8 ). Briefly, chromatograms were analyzed using AMDIS (v2.71) 89 to separate overlapping peaks and obtain pure compound spectra. The NIST MS Search software (v2.0 g) in combination with the NIST2017, FFNSC3 and Adams4 libraries were used to manually identify the empirical spectra, taking into account the expected retention time. After background subtraction and correcting for retention time shifts between samples run on different days based on alkane ladders, compound elution profiles were extracted and integrated using a file with 284 target compounds of interest, which were either recovered in our identified AMDIS list of spectra or were known to occur in beer. Compound elution profiles were estimated for every peak in every chromatogram over a time-restricted window using weighted non-negative least square analysis after which peak areas were integrated 87 , 88 . Batch effect correction was performed by normalizing against the most stable internal standard compound, 4-fluorobenzaldehyde. Out of all 284 target compounds that were analyzed, 167 were visually judged to have reliable elution profiles and were used for final analysis.

Discrete photometric and enzymatic analysis

Discrete photometric and enzymatic analysis (Thermo Scientific TM Gallery TM Plus Beermaster Discrete Analyzer) was used to measure acetic acid, ammonia, beta-glucan, iso-alpha acids, color, sugars, glycerol, iron, pH, protein, and sulfite. 2 ml of sample volume was used for the analyses. Information regarding the reagents and standard solutions used for analyses and calibrations is included in Supplementary Table  S7 and Supplementary Table  S9 .

NIR analyses

NIR analysis (Anton Paar Alcolyzer Beer ME System) was used to measure ethanol. Measurements comprised 50 ml of sample, and a 10% EtOH solution was used for calibration.

Correlation calculations

Pairwise Spearman Rank correlations were calculated between all chemical properties.
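
In Python, this computation is essentially a one-liner; `chem`, a beers-by-compounds DataFrame, is an assumed input.

```python
# Pairwise Spearman rank correlations between all chemical properties.
from scipy.stats import spearmanr

corr = chem.corr(method="spearman")  # compounds x compounds rho matrix
rho, pval = spearmanr(chem.values)   # same matrix plus p-values via SciPy
```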

Sensory dataset

Trained panel.

Our trained tasting panel consisted of volunteers who gave prior verbal informed consent. All compounds used for the validation experiment were of food-grade quality. The tasting sessions were approved by the Social and Societal Ethics Committee of the KU Leuven (G-2022-5677-R2(MAR)). All online reviewers agreed to the Terms and Conditions of the RateBeer website.

Sensory analysis was performed according to the American Society of Brewing Chemists (ASBC) Sensory Analysis Methods 90 . 30 volunteers were screened through a series of triangle tests. The sixteen most sensitive and consistent tasters were retained as taste panel members. The resulting panel was diverse in age [22–42, mean: 29], sex [56% male] and nationality [7 different countries]. The panel developed a consensus vocabulary to describe beer aroma, taste and mouthfeel. Panelists were trained to identify and score 50 different attributes, using a 7-point scale to rate attributes' intensity. The scoring sheet is included as Supplementary Data 3. Sensory assessments took place between 10 a.m. and 12 p.m. The beers were served in black-colored glasses. Per session, between 5 and 12 beers of the same style were tasted at 12 °C to 16 °C. Two reference beers were added to each set and indicated as ‘Reference 1 & 2’, allowing panel members to calibrate their ratings. Not all panelists were present at every tasting. Scores were scaled by standard deviation and mean-centered per taster. Values are represented as z-scores and clustered by Euclidean distance. Pairwise Spearman correlations were calculated between taste and aroma sensory attributes. Panel consistency was evaluated by repeating samples on different sessions and performing ANOVA to identify differences, using the ‘stats’ package (v4.2.2) in R (for package information, see Supplementary Table S8).
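
The per-taster standardization can be sketched as follows, assuming a hypothetical long-format table with columns ‘taster’ and ‘score’:

```python
# Mean-center and scale each taster's scores so panelists who use different
# ranges of the 7-point scale become comparable (z-scores per taster).
import pandas as pd

def zscore_per_taster(df: pd.DataFrame) -> pd.DataFrame:
    g = df.groupby("taster")["score"]
    out = df.copy()
    out["z"] = (df["score"] - g.transform("mean")) / g.transform("std")
    return out
```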

Online reviews from a public database

The ‘scrapy’ package in Python (v3.6) was used to collect 232,288 online reviews (mean = 922, min = 6, max = 5343) from RateBeer, an online beer review database (for package information, see Supplementary Table S8). Each review entry comprised 5 numerical scores (appearance, aroma, taste, palate and overall quality) and an optional review text. The total number of reviews per reviewer was collected separately. Numerical scores were scaled and centered per rater, and mean scores were calculated per beer.

For the review texts, the language was estimated using the packages ‘langdetect’ and ‘langid’ in Python. Reviews that were classified as English by both packages were kept. Reviewers with fewer than 100 entries overall were discarded. 181,025 reviews from >6000 reviewers from >40 countries remained. Text processing was done using the ‘nltk’ package in Python. Texts were corrected for slang and misspellings; proper nouns and rare words that are relevant to the beer context were specified and kept as-is (‘Chimay’, ‘Lambic’, etc.). A dictionary of semantically similar sensorial terms, for example ‘floral’ and ‘flower’, was created, and such synonymous terms were collapsed into a single term. Words were stemmed and lemmatized to avoid identifying words such as ‘acid’ and ‘acidity’ as separate terms. Numbers and punctuation were removed.
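
The two-detector language filter can be sketched as follows; `reviews`, a list of dicts with a ‘text’ field, is an assumed input.

```python
# Keep a review only if both detectors independently classify it as English.
from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException
import langid

def is_english(text: str) -> bool:
    try:
        first = detect(text)
    except LangDetectException:  # raised for empty or undecidable texts
        return False
    second, _ = langid.classify(text)
    return first == "en" and second == "en"

english_reviews = [r for r in reviews if is_english(r["text"])]
```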

Sentences from up to 50 randomly chosen reviews per beer were manually categorized according to the aspect of beer they describe (appearance, aroma, taste, palate, overall quality—not to be confused with the 5 numerical scores described above) or flagged as irrelevant if they contained no useful information. If a beer contained fewer than 50 reviews, all reviews were manually classified. This labeled data set was used to train a model that classified the rest of the sentences for all beers 91 . Sentences describing taste and aroma were extracted, and term frequency–inverse document frequency (TFIDF) was implemented to calculate enrichment scores for sensorial words per beer.
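
The enrichment step can be sketched with scikit-learn's TfidfVectorizer; `docs`, a dict mapping each beer to the concatenated taste/aroma sentences of its reviews, is an assumed input.

```python
# TF-IDF enrichment scores for sensorial words, one "document" per beer.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs.values())
scores = pd.DataFrame(tfidf.toarray(),
                      index=list(docs.keys()),
                      columns=vectorizer.get_feature_names_out())
# scores.loc[beer, "bitter"] is then the enrichment of 'bitter' for that beer
```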

The sex of the tasting subject was not considered when building our sensory database. Instead, results from different panelists were averaged, both for our trained panel (56% male, 44% female) and the RateBeer reviews (70% male, 30% female for RateBeer as a whole).

Beer price collection and processing

Beer prices were collected from the following stores: Colruyt, Delhaize, Total Wine, BeerHawk, The Belgian Beer Shop, The Belgian Shop, and Beer of Belgium. Where applicable, prices were converted to Euros and normalized per liter. Spearman correlations were calculated between these prices and mean overall appreciation scores from RateBeer and the taste panel, respectively.

Pairwise Spearman Rank correlations were calculated between all sensory properties.

Machine learning models

Predictive modeling of sensory profiles from chemical data.

Regression models were constructed to predict (a) trained panel scores for beer flavors and quality and (b) public reviews' appreciation scores from beer chemical profiles. Z-scores were used to represent sensory attributes in both data sets. Chemical properties with log-normal distributions (Shapiro-Wilk test, p < 0.05) were log-transformed. Missing chemical measurements (0.1% of all data) were replaced with mean values per attribute. Observations from 250 beers were randomly separated into a training set (70%, 175 beers) and a test set (30%, 75 beers), stratified per beer style. Chemical measurements (p = 231) were normalized based on the training set average and standard deviation. In total, ten models were trained: three linear regression-based models (linear regression with first-order interaction terms (LR), lasso regression with first-order interaction terms (Lasso) and partial least squares regression (PLSR)); five decision tree models (AdaBoost regressor (ABR), Extra Trees (ET), Gradient Boosting regressor (GBR), Random Forest (RF) and XGBoost regressor (XGBR)); one support vector machine model (SVR); and one artificial neural network model (ANN). The models were implemented using the ‘scikit-learn’ package (v1.2.2) and ‘xgboost’ package (v1.7.3) in Python (v3.9.16). Models were trained, and hyperparameters optimized, using five-fold cross-validated grid search with the coefficient of determination (R²) as the evaluation metric. The ANN (scikit-learn's MLPRegressor) was optimized using Bayesian Tree-Structured Parzen Estimator optimization with the ‘Optuna’ Python package (v3.2.0). Individual models were trained per attribute, and a multi-output model was trained on all attributes simultaneously.
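
A condensed sketch of this protocol is given below. Thresholds and the parameter grid are illustrative, the single target `sensory["overall"]` is a hypothetical column name, and `chem`, `sensory` and `styles` (chemical table, sensory table and style labels) are assumed inputs.

```python
# Preprocessing and cross-validated training, condensed to the essentials.
import numpy as np
from scipy.stats import shapiro
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor

# log-transform skewed chemical properties (Shapiro-Wilk, p < 0.05)
for col in chem.columns:
    if shapiro(chem[col].dropna()).pvalue < 0.05:
        chem[col] = np.log1p(chem[col])

chem = chem.fillna(chem.mean())  # mean-impute missing measurements

y = sensory["overall"]  # one attribute; multi-output models were also trained
X_train, X_test, y_train, y_test = train_test_split(
    chem, y, test_size=0.3, stratify=styles, random_state=0)

mu, sd = X_train.mean(), X_train.std()  # scale using training data only
X_train, X_test = (X_train - mu) / sd, (X_test - mu) / sd

grid = GridSearchCV(GradientBoostingRegressor(),
                    param_grid={"n_estimators": [100, 500],
                                "learning_rate": [0.01, 0.1]},
                    cv=5, scoring="r2").fit(X_train, y_train)
print(grid.best_estimator_.score(X_test, y_test))  # test-set R²
```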

Model dissection

GBR was found to outperform the other methods, resulting in models with the highest average R² values in both the trained panel and public review data sets. Impurity-based rankings of the most important predictors for each predicted sensorial trait were obtained using the ‘scikit-learn’ package. To observe the relationships between these chemical properties and their predicted targets, partial dependence plots (PDP) were constructed for the six most important predictors of consumer appreciation 74 , 75 .

The ‘SHAP’ package in Python (v0.41.0) was implemented to provide an alternative ranking of predictor importance and to visualize the predictors’ effects as a function of their concentration 68 .

Validation of causal chemical properties

To validate the effects of the most important model features on predicted sensory attributes, beers were spiked with the chemical compounds identified by the models and descriptive sensory analyses were carried out according to the American Society of Brewing Chemists (ASBC) protocol 90 .

Compound spiking was done 30 min before tasting. Compounds were spiked into fresh beer bottles, which were immediately resealed and inverted three times. Fresh bottles of beer were opened for the same duration, resealed, and inverted three times to serve as controls. Pairs of spiked samples and controls were served simultaneously, chilled and in dark glasses as outlined in the Trained panel section above. Tasters were instructed to select the glass with the higher flavor intensity for each attribute (directional difference test 92 ) and to select the glass they preferred.

The final concentration after spiking was equal to the within-style average, after normalizing by ethanol concentration. This was done to ensure balanced flavor profiles in the final spiked beer. The same methods were applied to improve a non-alcoholic beer. Compounds were the following: ethyl acetate (Merck KGaA, W241415), ethyl hexanoate (Merck KGaA, W243906), isoamyl acetate (Merck KGaA, W205508), phenethyl acetate (Merck KGaA, W285706), ethanol (96%, Colruyt), glycerol (Merck KGaA, W252506), lactic acid (Merck KGaA, 261106).

Significant differences in preference or perceived intensity were determined by performing the two-sided binomial test on each attribute.
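
This test can be sketched in a few lines with SciPy; the counts are illustrative.

```python
# Two-sided binomial test: out of n paired comparisons, did tasters pick the
# spiked sample more often than expected by chance (p0 = 0.5)?
from scipy.stats import binomtest

n_tasters, n_prefer_spiked = 20, 15  # illustrative counts
result = binomtest(n_prefer_spiked, n=n_tasters, p=0.5,
                   alternative="two-sided")
print(result.pvalue)  # significant at alpha = 0.05 if below 0.05
```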

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

The data that support the findings of this work are available in the Supplementary Data files and have been deposited to Zenodo under accession code 10653704 93 . The RateBeer scores data are under restricted access; they are not publicly available, as they are the property of RateBeer (ZX Ventures, USA). Access can be obtained from the authors upon reasonable request and with permission of RateBeer (ZX Ventures, USA). Source data are provided with this paper.

Code availability

The code for training the machine learning models, analyzing the models, and generating the figures has been deposited to Zenodo under accession code 10653704 93 .

Tieman, D. et al. A chemical genetic roadmap to improved tomato flavor. Science 355 , 391–394 (2017).

Plutowska, B. & Wardencki, W. Application of gas chromatography–olfactometry (GC–O) in analysis and quality assessment of alcoholic beverages – A review. Food Chem. 107 , 449–463 (2008).

Legin, A., Rudnitskaya, A., Seleznev, B. & Vlasov, Y. Electronic tongue for quality assessment of ethanol, vodka and eau-de-vie. Anal. Chim. Acta 534 , 129–135 (2005).

Loutfi, A., Coradeschi, S., Mani, G. K., Shankar, P. & Rayappan, J. B. B. Electronic noses for food quality: A review. J. Food Eng. 144 , 103–111 (2015).

Ahn, Y.-Y., Ahnert, S. E., Bagrow, J. P. & Barabási, A.-L. Flavor network and the principles of food pairing. Sci. Rep. 1 , 196 (2011).

Bartoshuk, L. M. & Klee, H. J. Better fruits and vegetables through sensory analysis. Curr. Biol. 23 , R374–R378 (2013).

Piggott, J. R. Design questions in sensory and consumer science. Food Qual. Prefer. 3293 , 217–220 (1995).

Kermit, M. & Lengard, V. Assessing the performance of a sensory panel-panellist monitoring and tracking. J. Chemom. 19 , 154–161 (2005).

Cook, D. J., Hollowood, T. A., Linforth, R. S. T. & Taylor, A. J. Correlating instrumental measurements of texture and flavour release with human perception. Int. J. Food Sci. Technol. 40 , 631–641 (2005).

Chinchanachokchai, S., Thontirawong, P. & Chinchanachokchai, P. A tale of two recommender systems: The moderating role of consumer expertise on artificial intelligence based product recommendations. J. Retail. Consum. Serv. 61 , 1–12 (2021).

Ross, C. F. Sensory science at the human-machine interface. Trends Food Sci. Technol. 20 , 63–72 (2009).

Chambers, E. IV & Koppel, K. Associations of volatile compounds with sensory aroma and flavor: The complex nature of flavor. Molecules 18 , 4887–4905 (2013).

Pinu, F. R. Metabolomics—The new frontier in food safety and quality research. Food Res. Int. 72 , 80–81 (2015).

Danezis, G. P., Tsagkaris, A. S., Brusic, V. & Georgiou, C. A. Food authentication: state of the art and prospects. Curr. Opin. Food Sci. 10 , 22–31 (2016).

Shepherd, G. M. Smell images and the flavour system in the human brain. Nature 444 , 316–321 (2006).

Meilgaard, M. C. Prediction of flavor differences between beers from their chemical composition. J. Agric. Food Chem. 30 , 1009–1017 (1982).

Xu, L. et al. Widespread receptor-driven modulation in peripheral olfactory coding. Science 368 , eaaz5390 (2020).

Kupferschmidt, K. Following the flavor. Science 340 , 808–809 (2013).

Billesbølle, C. B. et al. Structural basis of odorant recognition by a human odorant receptor. Nature 615 , 742–749 (2023).

Smith, B. Perspective: Complexities of flavour. Nature 486 , S6–S6 (2012).

Pfister, P. et al. Odorant receptor inhibition is fundamental to odor encoding. Curr. Biol. 30 , 2574–2587 (2020).

Moskowitz, H. W., Kumaraiah, V., Sharma, K. N., Jacobs, H. L. & Sharma, S. D. Cross-cultural differences in simple taste preferences. Science 190 , 1217–1218 (1975).

Eriksson, N. et al. A genetic variant near olfactory receptor genes influences cilantro preference. Flavour 1 , 22 (2012).

Ferdenzi, C. et al. Variability of affective responses to odors: Culture, gender, and olfactory knowledge. Chem. Senses 38 , 175–186 (2013).

Lawless, H. T. & Heymann, H. Sensory evaluation of food: Principles and practices. (Springer, New York, NY). https://doi.org/10.1007/978-1-4419-6488-5 (2010).

Colantonio, V. et al. Metabolomic selection for enhanced fruit flavor. Proc. Natl. Acad. Sci. 119 , e2115865119 (2022).

Fritz, F., Preissner, R. & Banerjee, P. VirtualTaste: a web server for the prediction of organoleptic properties of chemical compounds. Nucleic Acids Res 49 , W679–W684 (2021).

Tuwani, R., Wadhwa, S. & Bagler, G. BitterSweet: Building machine learning models for predicting the bitter and sweet taste of small molecules. Sci. Rep. 9 , 1–13 (2019).

Dagan-Wiener, A. et al. Bitter or not? BitterPredict, a tool for predicting taste from chemical structure. Sci. Rep. 7 , 1–13 (2017).

Pallante, L. et al. Toward a general and interpretable umami taste predictor using a multi-objective machine learning approach. Sci. Rep. 12 , 1–11 (2022).

Malavolta, M. et al. A survey on computational taste predictors. Eur. Food Res. Technol. 248 , 2215–2235 (2022).

Lee, B. K. et al. A principal odor map unifies diverse tasks in olfactory perception. Science 381 , 999–1006 (2023).

Mayhew, E. J. et al. Transport features predict if a molecule is odorous. Proc. Natl. Acad. Sci. 119 , e2116576119 (2022).

Niu, Y. et al. Sensory evaluation of the synergism among ester odorants in light aroma-type liquor by odor threshold, aroma intensity and flash GC electronic nose. Food Res. Int. 113 , 102–114 (2018).

Yu, P., Low, M. Y. & Zhou, W. Design of experiments and regression modelling in food flavour and sensory analysis: A review. Trends Food Sci. Technol. 71 , 202–215 (2018).

Oladokun, O. et al. The impact of hop bitter acid and polyphenol profiles on the perceived bitterness of beer. Food Chem. 205 , 212–220 (2016).

Linforth, R., Cabannes, M., Hewson, L., Yang, N. & Taylor, A. Effect of fat content on flavor delivery during consumption: An in vivo model. J. Agric. Food Chem. 58 , 6905–6911 (2010).

Guo, S., Na Jom, K. & Ge, Y. Influence of roasting condition on flavor profile of sunflower seeds: A flavoromics approach. Sci. Rep. 9 , 11295 (2019).

Ren, Q. et al. The changes of microbial community and flavor compound in the fermentation process of Chinese rice wine using Fagopyrum tataricum grain as feedstock. Sci. Rep. 9 , 3365 (2019).

Hastie, T., Friedman, J. & Tibshirani, R. The Elements of Statistical Learning. (Springer, New York, NY). https://doi.org/10.1007/978-0-387-21606-5 (2001).

Dietz, C., Cook, D., Huismann, M., Wilson, C. & Ford, R. The multisensory perception of hop essential oil: a review. J. Inst. Brew. 126 , 320–342 (2020).

CAS   Google Scholar  

Roncoroni, Miguel & Verstrepen, Kevin Joan. Belgian Beer: Tested and Tasted. (Lannoo, 2018).

Meilgaard, M. Flavor chemistry of beer: Part II: Flavor and threshold of 239 aroma volatiles. in (1975).

Bokulich, N. A. & Bamforth, C. W. The microbiology of malting and brewing. Microbiol. Mol. Biol. Rev. MMBR 77 , 157–172 (2013).

Dzialo, M. C., Park, R., Steensels, J., Lievens, B. & Verstrepen, K. J. Physiology, ecology and industrial applications of aroma formation in yeast. FEMS Microbiol. Rev. 41 , S95–S128 (2017).

Article   PubMed   PubMed Central   Google Scholar  

Datta, A. et al. Computer-aided food engineering. Nat. Food 3 , 894–904 (2022).

American Society of Brewing Chemists. Beer Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A.).

Olaniran, A. O., Hiralal, L., Mokoena, M. P. & Pillay, B. Flavour-active volatile compounds in beer: production, regulation and control. J. Inst. Brew. 123 , 13–23 (2017).

Verstrepen, K. J. et al. Flavor-active esters: Adding fruitiness to beer. J. Biosci. Bioeng. 96 , 110–118 (2003).

Meilgaard, M. C. Flavour chemistry of beer. part I: flavour interaction between principal volatiles. Master Brew. Assoc. Am. Tech. Q 12 , 107–117 (1975).

Briggs, D. E., Boulton, C. A., Brookes, P. A. & Stevens, R. Brewing 227–254. (Woodhead Publishing). https://doi.org/10.1533/9781855739062.227 (2004).

Bossaert, S., Crauwels, S., De Rouck, G. & Lievens, B. The power of sour - A review: Old traditions, new opportunities. BrewingScience 72 , 78–88 (2019).

Google Scholar  

Verstrepen, K. J. et al. Flavor active esters: Adding fruitiness to beer. J. Biosci. Bioeng. 96 , 110–118 (2003).

Snauwaert, I. et al. Microbial diversity and metabolite composition of Belgian red-brown acidic ales. Int. J. Food Microbiol. 221 , 1–11 (2016).

Spitaels, F. et al. The microbial diversity of traditional spontaneously fermented lambic beer. PLoS ONE 9 , e95384 (2014).

Blanco, C. A., Andrés-Iglesias, C. & Montero, O. Low-alcohol Beers: Flavor Compounds, Defects, and Improvement Strategies. Crit. Rev. Food Sci. Nutr. 56 , 1379–1388 (2016).

Jackowski, M. & Trusek, A. Non-Alcohol. beer Prod. – Overv. 20 , 32–38 (2018).

Takoi, K. et al. The contribution of geraniol metabolism to the citrus flavour of beer: Synergy of geraniol and β-citronellol under coexistence with excess linalool. J. Inst. Brew. 116 , 251–260 (2010).

Kroeze, J. H. & Bartoshuk, L. M. Bitterness suppression as revealed by split-tongue taste stimulation in humans. Physiol. Behav. 35 , 779–783 (1985).

Mennella, J. A. et al. A spoonful of sugar helps the medicine go down”: Bitter masking bysucrose among children and adults. Chem. Senses 40 , 17–25 (2015).

Wietstock, P., Kunz, T., Perreira, F. & Methner, F.-J. Metal chelation behavior of hop acids in buffered model systems. BrewingScience 69 , 56–63 (2016).

Sancho, D., Blanco, C. A., Caballero, I. & Pascual, A. Free iron in pale, dark and alcohol-free commercial lager beers. J. Sci. Food Agric. 91 , 1142–1147 (2011).

Rodrigues, H. & Parr, W. V. Contribution of cross-cultural studies to understanding wine appreciation: A review. Food Res. Int. 115 , 251–258 (2019).

Korneva, E. & Blockeel, H. Towards better evaluation of multi-target regression models. in ECML PKDD 2020 Workshops (eds. Koprinska, I. et al.) 353–362 (Springer International Publishing, Cham, 2020). https://doi.org/10.1007/978-3-030-65965-3_23 .

Gastón Ares. Mathematical and Statistical Methods in Food Science and Technology. (Wiley, 2013).

Grinsztajn, L., Oyallon, E. & Varoquaux, G. Why do tree-based models still outperform deep learning on tabular data? Preprint at http://arxiv.org/abs/2207.08815 (2022).

Gries, S. T. Statistics for Linguistics with R: A Practical Introduction. in Statistics for Linguistics with R (De Gruyter Mouton, 2021). https://doi.org/10.1515/9783110718256 .

Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2 , 56–67 (2020).

Ickes, C. M. & Cadwallader, K. R. Effects of ethanol on flavor perception in alcoholic beverages. Chemosens. Percept. 10 , 119–134 (2017).

Kato, M. et al. Influence of high molecular weight polypeptides on the mouthfeel of commercial beer. J. Inst. Brew. 127 , 27–40 (2021).

Wauters, R. et al. Novel Saccharomyces cerevisiae variants slow down the accumulation of staling aldehydes and improve beer shelf-life. Food Chem. 398 , 1–11 (2023).

Li, H., Jia, S. & Zhang, W. Rapid determination of low-level sulfur compounds in beer by headspace gas chromatography with a pulsed flame photometric detector. J. Am. Soc. Brew. Chem. 66 , 188–191 (2008).

Dercksen, A., Laurens, J., Torline, P., Axcell, B. C. & Rohwer, E. Quantitative analysis of volatile sulfur compounds in beer using a membrane extraction interface. J. Am. Soc. Brew. Chem. 54 , 228–233 (1996).

Molnar, C. Interpretable Machine Learning: A Guide for Making Black-Box Models Interpretable. (2020).

Zhao, Q. & Hastie, T. Causal interpretations of black-box models. J. Bus. Econ. Stat. Publ. Am. Stat. Assoc. 39 , 272–281 (2019).

Article   MathSciNet   Google Scholar  

Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning. (Springer, 2019).

Labrado, D. et al. Identification by NMR of key compounds present in beer distillates and residual phases after dealcoholization by vacuum distillation. J. Sci. Food Agric. 100 , 3971–3978 (2020).

Lusk, L. T., Kay, S. B., Porubcan, A. & Ryder, D. S. Key olfactory cues for beer oxidation. J. Am. Soc. Brew. Chem. 70 , 257–261 (2012).

Gonzalez Viejo, C., Torrico, D. D., Dunshea, F. R. & Fuentes, S. Development of artificial neural network models to assess beer acceptability based on sensory properties using a robotic pourer: A comparative model approach to achieve an artificial intelligence system. Beverages 5 , 33 (2019).

Gonzalez Viejo, C., Fuentes, S., Torrico, D. D., Godbole, A. & Dunshea, F. R. Chemical characterization of aromas in beer and their effect on consumers liking. Food Chem. 293 , 479–485 (2019).

Gilbert, J. L. et al. Identifying breeding priorities for blueberry flavor using biochemical, sensory, and genotype by environment analyses. PLOS ONE 10 , 1–21 (2015).

Goulet, C. et al. Role of an esterase in flavor volatile variation within the tomato clade. Proc. Natl. Acad. Sci. 109 , 19009–19014 (2012).

Article   ADS   CAS   PubMed   PubMed Central   Google Scholar  

Borisov, V. et al. Deep Neural Networks and Tabular Data: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 1–21 https://doi.org/10.1109/TNNLS.2022.3229161 (2022).

Statista. Statista Consumer Market Outlook: Beer - Worldwide.

Seitz, H. K. & Stickel, F. Molecular mechanisms of alcoholmediated carcinogenesis. Nat. Rev. Cancer 7 , 599–612 (2007).

Voordeckers, K. et al. Ethanol exposure increases mutation rate through error-prone polymerases. Nat. Commun. 11 , 3664 (2020).

Goelen, T. et al. Bacterial phylogeny predicts volatile organic compound composition and olfactory response of an aphid parasitoid. Oikos 129 , 1415–1428 (2020).

Article   ADS   Google Scholar  

Reher, T. et al. Evaluation of hop (Humulus lupulus) as a repellent for the management of Drosophila suzukii. Crop Prot. 124 , 104839 (2019).

Stein, S. E. An integrated method for spectrum extraction and compound identification from gas chromatography/mass spectrometry data. J. Am. Soc. Mass Spectrom. 10 , 770–781 (1999).

American Society of Brewing Chemists. Sensory Analysis Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A., 1992).

McAuley, J., Leskovec, J. & Jurafsky, D. Learning Attitudes and Attributes from Multi-Aspect Reviews. Preprint at https://doi.org/10.48550/arXiv.1210.3926 (2012).

Meilgaard, M. C., Carr, B. T. & Carr, B. T. Sensory Evaluation Techniques. (CRC Press, Boca Raton). https://doi.org/10.1201/b16452 (2014).

Schreurs, M. et al. Data from: Predicting and improving complex beer flavor through machine learning. Zenodo https://doi.org/10.5281/zenodo.10653704 (2024).

Download references

Acknowledgements

We thank all lab members for their discussions and thank all tasting panel members for their contributions. Special thanks go out to Dr. Karin Voordeckers for her tremendous help in proofreading and improving the manuscript. M.S. was supported by a Baillet-Latour fellowship, L.C. acknowledges financial support from KU Leuven (C16/17/006), F.A.T. was supported by a PhD fellowship from FWO (1S08821N). Research in the lab of K.J.V. is supported by KU Leuven, FWO, VIB, VLAIO and the Brewing Science Serves Health Fund. Research in the lab of T.W. is supported by FWO (G.0A51.15) and KU Leuven (C16/17/006).

Author information

These authors contributed equally: Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni.

Authors and Affiliations

VIB—KU Leuven Center for Microbiology, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni, Lloyd Cool, Beatriz Herrera-Malaver, Florian A. Theßeling & Kevin J. Verstrepen

CMPG Laboratory of Genetics and Genomics, KU Leuven, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Leuven Institute for Beer Research (LIBR), Gaston Geenslaan 1, B-3001, Leuven, Belgium

Laboratory of Socioecology and Social Evolution, KU Leuven, Naamsestraat 59, B-3000, Leuven, Belgium

Lloyd Cool, Christophe Vanderaa & Tom Wenseleers

VIB Bioinformatics Core, VIB, Rijvisschestraat 120, B-9052, Ghent, Belgium

Łukasz Kreft & Alexander Botzki

AB InBev SA/NV, Brouwerijplein 1, B-3000, Leuven, Belgium

Philippe Malcorps & Luk Daenen

Contributions

S.P., M.S. and K.J.V. conceived the experiments. S.P., M.S. and K.J.V. designed the experiments. S.P., M.S., M.R., B.H. and F.A.T. performed the experiments. S.P., M.S., L.C., C.V., L.K., A.B., P.M., L.D., T.W. and K.J.V. contributed analysis ideas. S.P., M.S., L.C., C.V., T.W. and K.J.V. analyzed the data. All authors contributed to writing the manuscript.

Corresponding author

Correspondence to Kevin J. Verstrepen.

Ethics declarations

Competing interests

K.J.V. is affiliated with bar.on. The other authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Florian Bauer, Andrew John Macintosh and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

  • Supplementary Information
  • Peer Review File
  • Description of Additional Supplementary Files
  • Supplementary Data 1–7
  • Reporting Summary

Source data

  • Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Schreurs, M., Piampongsant, S., Roncoroni, M. et al. Predicting and improving complex beer flavor through machine learning. Nat Commun 15, 2368 (2024). https://doi.org/10.1038/s41467-024-46346-0

Received: 30 October 2023

Accepted: 21 February 2024

Published: 26 March 2024

DOI: https://doi.org/10.1038/s41467-024-46346-0


Computer Science > Computer Vision and Pattern Recognition

Title: An Evidential-enhanced Tri-Branch Consistency Learning Method for Semi-supervised Medical Image Segmentation

Abstract: Semi-supervised segmentation presents a promising approach for large-scale medical image analysis, effectively reducing annotation burdens while achieving comparable performance. This methodology holds substantial potential for streamlining the segmentation process and enhancing its feasibility within clinical settings for translational investigations. While cross-supervised training, based on distinct co-training sub-networks, has become a prevalent paradigm for this task, critical issues such as prediction disagreement and label-noise suppression require further attention. In this paper, we introduce an Evidential Tri-Branch Consistency learning framework (ETC-Net) for semi-supervised medical image segmentation. ETC-Net employs three branches: an evidential conservative branch, an evidential progressive branch, and an evidential fusion branch. The first two branches exhibit complementary characteristics, allowing them to address prediction diversity and enhance training stability. We also integrate uncertainty estimation from evidential learning into cross-supervised training, mitigating the negative impact of erroneous supervision signals. Additionally, the evidential fusion branch capitalizes on the complementary attributes of the first two branches and leverages an evidence-based Dempster-Shafer fusion strategy, supervised by more reliable and accurate pseudo-labels of unlabeled data. Extensive experiments conducted on the LA, Pancreas-CT, and ACDC datasets demonstrate that ETC-Net surpasses other state-of-the-art methods for semi-supervised segmentation. The code will be made available in the near future at this https URL.
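The fusion branch described above relies on a Dempster-Shafer combination of evidential outputs. As a rough illustration only (a minimal subjective-logic sketch, not the authors' ETC-Net implementation; all function names are invented for this example), the snippet below converts the per-class evidence of two hypothetical branches into belief masses plus an uncertainty mass and fuses them with the reduced Dempster rule:

```python
import numpy as np

def dirichlet_to_mass(evidence):
    """Turn non-negative per-class evidence into subjective-logic masses.

    With K classes, Dirichlet parameters are alpha = evidence + 1,
    Dirichlet strength S = sum(alpha), beliefs b_k = e_k / S, and
    uncertainty u = K / S (so sum(b) + u == 1).
    """
    alpha = evidence + 1.0
    strength = alpha.sum()
    return evidence / strength, len(evidence) / strength

def dempster_fuse(b1, u1, b2, u2):
    """Reduced Dempster-Shafer combination of two (belief, uncertainty) sets."""
    # Conflict: total mass the two sources assign to incompatible classes.
    conflict = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)
    scale = 1.0 / (1.0 - conflict)
    fused_b = scale * (b1 * b2 + b1 * u2 + b2 * u1)
    fused_u = scale * (u1 * u2)
    return fused_b, fused_u

# Fuse one pixel's output from a "conservative" and a "progressive" branch.
b1, u1 = dirichlet_to_mass(np.array([4.0, 1.0]))  # confident in class 0
b2, u2 = dirichlet_to_mass(np.array([1.5, 1.0]))  # closer to uninformed
fused_b, fused_u = dempster_fuse(b1, u1, b2, u2)
print(fused_b, fused_u)  # masses still sum to 1
```

Because the fused uncertainty shrinks only when both sources agree, pseudo-labels drawn from the fused beliefs tend to be more reliable than those of either branch alone, which is the intuition the abstract appeals to.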

Compositional Conservatism: A Transductive Approach in Offline Reinforcement Learning

6 Apr 2024 · Yeda Song, Dongwook Lee, Gunhee Kim

Offline reinforcement learning (RL) is a compelling framework for learning optimal policies from past experiences without additional interaction with the environment. Nevertheless, offline RL inevitably faces the problem of distributional shifts, where the states and actions encountered during policy execution may not be in the training dataset distribution. A common solution involves incorporating conservatism into the policy or the value function to safeguard against uncertainties and unknowns. In this work, we focus on achieving the same objectives of conservatism but from a different perspective. We propose COmpositional COnservatism with Anchor-seeking (COCOA) for offline RL, an approach that pursues conservatism in a compositional manner on top of the transductive reparameterization (Netanyahu et al., 2023), which decomposes the input variable (the state in our case) into an anchor and its difference from the original input. Our COCOA seeks both in-distribution anchors and differences by utilizing the learned reverse dynamics model, encouraging conservatism in the compositional input space for the policy or value function. Such compositional conservatism is independent of and agnostic to the prevalent behavioral conservatism in offline RL. We apply COCOA to four state-of-the-art offline RL algorithms and evaluate them on the D4RL benchmark, where COCOA generally improves the performance of each algorithm. The code is available at https://github.com/runamu/compositional-conservatism.
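As a schematic illustration of the anchor-and-difference decomposition that COCOA builds on (a minimal sketch under simplifying assumptions, not the paper's implementation: here the anchor is just the nearest dataset state, a hypothetical stand-in for the learned anchor-seeking policy and reverse dynamics model), consider:

```python
import numpy as np

def nearest_anchor(state, dataset_states):
    """Pick an in-distribution anchor: the closest state seen in the
    offline dataset (hypothetical stand-in for learned anchor-seeking)."""
    dists = np.linalg.norm(dataset_states - state, axis=1)
    return dataset_states[np.argmin(dists)]

def compositional_input(state, dataset_states):
    """Reparameterize state -> (anchor, delta) with state == anchor + delta,
    so the policy or value network sees two near-in-distribution pieces."""
    anchor = nearest_anchor(state, dataset_states)
    delta = state - anchor
    return np.concatenate([anchor, delta])

# Usage: feed the (anchor, delta) pair, not the raw state, to the network.
dataset_states = np.random.randn(1000, 4)       # states from the offline dataset
state = np.random.randn(4)                      # state met at evaluation time
x = compositional_input(state, dataset_states)  # 8-dimensional network input
```

Keeping both the anchor and the residual difference close to the training distribution is the intuition behind conservatism being "compositional" rather than behavioral.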

