A literature review about usability evaluation methods for e-learning platforms

Affiliation.

  • 1 Department of Production and Systems Engineering, University of Minho, Guimarães, Portugal. [email protected]
  • PMID: 22316857
  • DOI: 10.3233/WOR-2012-0281-1038

The usability analysis of information systems has been the target of several research studies over the past thirty years. These studies reflect a great diversity of points of view, involving researchers from different scientific areas such as Ergonomics, Computer Science, Design, and Education. Within the domain of information ergonomics, the study of tools and methods used for the usability evaluation of E-learning shows a continuous and dynamic evolution of E-learning systems across many different contexts, both academic and corporate. These systems, also known as LMS (Learning Management Systems), can be classified according to their educational goals and their technological features. In these systems, however, usability issues concern the relationship and interactions between user and system in the user's context. This review is a synthesis of a research project on Information Ergonomics and covers three dimensions: the methods, models, and frameworks that have been applied to evaluate LMS. The study also includes the main usability criteria and heuristics used. The results show a marked change in usability paradigms, which makes it possible to discuss the studies carried out by different researchers focused on ergonomic usability principles aimed at E-learning.

Publication types

  • Computer-Assisted Instruction / standards*
  • Evaluation Studies as Topic*
  • User-Computer Interface*


  • Open access
  • Published: 03 July 2017

Users’ design feedback in usability evaluation: a literature review

  • Asbjørn Følstad, ORCID: orcid.org/0000-0003-2763-0996

Human-centric Computing and Information Sciences, volume 7, Article number: 19 (2017)


As part of usability evaluation, users may be invited to offer their reflections on the system being evaluated. Such reflections may concern the system’s suitability for its context of use, usability problem predictions, and design suggestions. We term the data resulting from such reflections users’ design feedback . Gathering users’ design feedback as part of usability evaluation may be seen as controversial, and the current knowledge on users’ design feedback is fragmented. To mitigate this, we have conducted a literature review. The review provides an overview of the benefits and limitations of users’ design feedback in usability evaluations. Following an extensive search process, 31 research papers were identified as relevant and analysed. Users’ design feedback is gathered for a number of distinct purposes: to support budget approaches to usability testing, to expand on interaction data from usability testing, to provide insight into usability problems in users’ everyday context, and to benefit from users’ knowledge and creativity. Evaluation findings based on users’ design feedback can be qualitatively different from, and hence complement, findings based on other types of evaluation data. Furthermore, findings based on users’ design feedback can hold acceptable validity, though the thoroughness of such findings may be questioned. Finally, findings from users’ design feedback may have substantial impact in the downstream development process. Four practical implications are highlighted, and three directions for future research are suggested.

Introduction

Involving users in usability evaluation is valuable when designing information and communication technology (ICT), and a range of usability evaluation methods (UEM) support user involvement. Relevant methods include adaptations of usability testing [ 1 ], usability inspection methods such as pluralistic walkthrough [ 2 ], and inquiry methods such as interviews [ 3 ], and focus groups [ 4 ].

Users involved in usability evaluation may generate two types of data. We term these interaction data and design feedback . Interaction data are recordings of the actual use of an interactive system, such as observational data, system logs, and data from think-aloud protocols. Design feedback are data on users’ reflections concerning an interactive system, such as comments on experiential issues, considerations of the system’s suitability for its context of use, usability problem predictions, and design suggestions.

The value of interaction data in evaluation is unchallenged. Interaction data is held to be a key source of insight in the usability of interactive systems and has been the object of thorough scientific research. Numerous empirical studies concern the identification of usability problems on the basis of observable user behaviour [ 5 ]. Indeed, empirical UEM assessments are typically done by comparing the set of usability problems identified through the assessed UEM with a set of usability problems identified during usability testing (e.g. [ 6 , 7 ]).

The value of users’ design feedback is, however, disputed. Nielsen [ 8 ] stated, as a first rule of usability, “don’t listen to users” and argued that users’ design feedback should be limited to preference data after having used the interactive system in question. Users’ design feedback may be biased due to a desire to report what the evaluator wants to hear, imperfect memory, and rationalization of own behaviour [ 8 , 9 ]. As discussed by Gould and Lewis [ 10 ], it can be challenging to elicit useful design information from users as they may not have considered alternative approaches or may be ignorant of relevant alternatives; users may simply be unaware of what they need. Furthermore, as discussed by Wilson and Sasse [ 11 ], users do not always know what is good for them and may easily be swayed by contextual factors when making assessments.

Nevertheless, numerous UEMs that involve the gathering and analysis of users’ design feedback have been suggested (e.g. [ 12 – 14 ]), and textbooks on usability evaluations typically recommend gathering data on users’ experiences or considerations in qualitative post-task or post-test interviews [ 1 , 15 ]. It is also common among usability practitioners to ask for the opinion of the participants in usability testing pertaining to usability problems or design suggestions [ 16 ].

Our current knowledge of users’ design feedback is fragmented. Despite the number of UEMs suggested to support the gathering of users’ design feedback, no coherent body of knowledge on users’ design feedback as a distinct data source has been established. Existing empirical studies of users’ design feedback typically involve the assessment of one or a small number of UEMs, and only to a limited degree build on each other. Consequently, a comprehensive overview of existing studies on users’ design feedback is needed to better understand the benefits and limitations of this data source in usability evaluation.

To strengthen our understanding of users’ design feedback in usability evaluation we present a review of the research literature on such design feedback. Through the review, we have sought to provide an overview of the benefits and limitations of users’ design feedback. In particular, we have investigated users’ design feedback in terms of the purposes for which it is gathered, its qualitative characteristics, its validity and thoroughness, as well as its downstream utility.

Our study is not an attempt to challenge the benefit of interaction data in usability evaluation. Rather, we assume that users’ design feedback may complement other types of evaluation data, such as interaction data or data from inspections with usability experts, thereby strengthening the value of involving users in usability evaluation.

The scope of the study is delimited to qualitative or open-ended design feedback; such data may provide richer insight into the potential benefits and limitations of users’ design feedback than do quantitative or set-response design feedback. Hence, design feedback in the form of data from set-response data gathering methods, such as standard usability questionnaires, is not considered in this review.

Users’ design feedback

In usability evaluation, users may engage in interaction and reflection. During interaction the user engages in behaviour that involves the user interface of an interactive system or its abstraction, such as a mock-up or prototype. The behaviour may include think-aloud verbalization of the immediate perceptions and thoughts that accompany the user’s interaction. The interaction may be recorded through video, system log data, and observation forms or notes. We term such records interaction data. Interaction data is a key data source in usability testing and typically leads to findings formulated as usability problems, or to quantitative summaries such as success rate, time on task, and number of errors [ 1 ].

During reflection, the user engages in analysis and interpretation of the interactive system or the experiences made during system interaction. Unlike the free-flowing thought processes represented in think-aloud data, user reflection typically is conducted after having used the interactive system or in response to a demonstration or presentation of the interactive system. User reflection can be made on the basis of system representations such as prototypes or mock-ups, but also on the basis of pre-prototype documentation such as concept descriptions, and may be recorded as verbal or written reports. We refer to records of user reflection as design feedback, as their purpose in usability evaluation typically is to support the understanding or improvement of the evaluated design. Users’ design feedback often leads to findings formulated as usability problems (e.g. [ 3 , 17 ]), but also to other types of findings such as insight into users’ experiences of a particular design [ 18 ], input to user requirements [ 19 ], and suggestions for changes to the design [ 20 ].

What we refer to as users’ design feedback goes beyond what has been termed user reports [ 9 ], as its scope includes data on users’ reflections not only from inquiry methods but also from usability inspection and usability testing.

UEMs for users’ design feedback

The gathering and analysis of users’ design feedback is found in all the main UEM groups, that is, usability inspection methods, usability testing methods, and inquiry methods [ 21 ].

Usability inspection, though typically conducted by trained usability experts [ 22 ], is acknowledged to be useful also with other inspector types such as “end users with content or task knowledge” [ 23 ]. Specific inspection methods have been developed to involve users as inspectors. In the pluralistic walkthrough [ 2 ] and the participatory heuristic evaluation [ 13 ] users are involved in inspection groups together with usability experts and developers. In the structured expert evaluation method [ 24 ] and the group-based expert walkthrough [ 25 ] users can be involved as the only inspector type.

Several usability testing methods have been developed where interaction data is complemented with users’ design feedback, such as cooperative evaluation, cooperative usability testing, and asynchronous remote usability testing. In the cooperative evaluation [ 14 ] the user is told to think of himself as a co-evaluator and encouraged to ask questions and to be critical. In the cooperative usability testing [ 26 ] the user is invited to review the task solving process upon its completion and to reflect on incidents and potential usability problems. In asynchronous remote usability testing the user may be required to self-report incidents or problems, as a substitute for having these identified on the basis of interaction data [ 27 ].

Inquiry methods typically are general purpose data collection methods that have been adapted to the purpose of usability evaluation. Prominent inquiry methods in usability evaluation are interviews [ 3 ], workshops [ 28 ], contextual inquiries [ 29 ], and focus groups [ 30 ]. Also, online discussion forums have been applied for evaluation purposes [ 17 ]. Inquiry methods used for usability evaluation are generally less researched than usability inspection and usability testing methods [ 21 ].

Motivations for gathering users’ design feedback

There are two key motivations for gathering design feedback from users: users as a source of knowledge and users as a source of creativity.

Knowledge of a system’s context of use is critical in design and evaluation. Such knowledge, which we in the following call domain knowledge, can be a missing evaluation resource [ 22 ]. Users have often been pointed out as a possible source of domain knowledge during evaluation [ 12 , 13 ]. Users’ domain knowledge may be most relevant for usability evaluations in domains requiring high levels of specialization or training, such as health care or gaming. In particular, users’ domain knowledge may be critical in domains where the usability expert cannot be expected to have overlapping knowledge [ 25 ]. Hence, it may be expected that the user reflections that are captured in users’ design feedback are more beneficial for applications specialized to a particular context of use than for applications with a broader target user group.

A second motivation to gather design feedback from users is to tap into their creative potential. This perspective has, in particular, been argued within participatory design. Here, users, developers, and designers are encouraged to exchange knowledge, ideas, and design suggestions in cooperative design and evaluation activities [ 31 ]. In a survey of usability evaluation state-of-the-practice, Følstad, Law, and Hornbæk [ 16 ] found that it is common among usability practitioners to ask participants in usability testing questions concerning redesign suggestions.

How to review studies of users’ design feedback?

Though a wide range of UEMs that involve users’ design feedback have been suggested, current knowledge on users’ design feedback is fragmented, in part because the literature on relevant UEMs often does not present detailed empirical data on the quality of users’ design feedback (e.g. [ 2 , 13 , 31 ]).

We do not have a sufficient overview of the purposes for which users’ design feedback is gathered. Furthermore, we do not know the degree to which users’ design feedback serves its purpose as usability evaluation data. Does users’ design feedback really complement other evaluation data sources, such as interaction data and usability experts’ findings? To what degree can users’ design feedback be seen as a credible source of usability evaluation findings; that is, what levels of validity and thoroughness can be expected? And to what degree does users’ design feedback have an impact in the downstream development process?

To answer these questions concerning users’ design feedback, we needed to single out that part of the literature which presents empirical data on this topic. We assumed that this literature typically would have the form of UEM assessments, where data on users’ design feedback is compared to some external criterion to investigate its qualitative characteristics, validity and thoroughness, or downstream impact. UEM assessment as a form of scientific enquiry has deep roots in the field of human–computer interaction (HCI); such assessments have flourished since the early nineties, typically pitting UEMs against each other to investigate their relative strengths and limitations (e.g. [ 32 , 33 ]). Following Gray and Salzman’s [ 34 ] criticism of early UEM assessments, studies have mainly targeted validity and thoroughness [ 35 ]. However, aspects such as downstream utility [ 36 , 37 ] and the qualitative characteristics of the output of different UEMs (e.g. [ 38 , 39 ]) have also been investigated in UEM assessments.

In our literature review, we have identified and analysed UEM assessments where the evaluation data included in the assessment are, at least in part, users’ design feedback.

Research question

Due to the exploratory character of the study, the following main research question was defined:

Which are the potential benefits and limitations of users’ design feedback in usability evaluations?

The main research question was then broken down into four sub-questions, following from the questions raised in the section “ How to review studies of users' design feedback? ”.

RQ1: For which purposes are users’ design feedback gathered in usability evaluation?

RQ2: How do the qualitative characteristics of users’ design feedback compare to that of other evaluation data (that is, interaction data and design feedback from usability experts)?

RQ3: Which levels of validity and thoroughness are to be expected for users’ design feedback?

RQ4: Which levels of downstream impact are to be expected for users’ design feedback?

Methods

The literature review was set up following the guidelines of Kitchenham [ 40 ], with some adaptations to fit the nature of the problem area. In this "Methods" section we describe the search, selection, and analysis process.

Search tool and search terms

Before conducting the review, we were aware of only a small number of studies concerning users’ design feedback in usability evaluation; this in spite of our familiarity with the literature on UEMs. Hence, we decided to conduct the literature search through the Google Scholar search engine to allow for a broader scoping of publication channels than what is supported in other broad academic search engines such as Scopus or Web of Knowledge [ 41 ]. Google Scholar has been criticized for including too broad a range of content in its search results [ 42 ]. However, for the purpose of this review, where we aimed to conduct a broad search across multiple scientific communities, a Google Scholar search was judged to be an adequate approach.

To establish good search terms we went through a phase of trial and error. The key terms of the research question, user and “ design feedback ”, were not useful even if combined with “ usability evaluation ”; the former due to its lack of discriminatory ability within the HCI literature, the latter because it is not an established term within the HCI field. Our solution to the challenge of establishing good search terms was to use the names of UEMs that involve users’ design feedback. An initial list of relevant UEMs was established on the basis of our knowledge of the HCI field. Then, whenever we were made aware of other relevant UEMs throughout the review process, these were included as search terms along with the other UEMs. We also included the search term “ user reports ” (combined with “ usability evaluation ”) as this term partly overlaps the term design feedback. The search was conducted in December 2012 and January 2013.

Table  1 lists the UEM names forming the basis of the search. For methods or approaches that are also used outside the field of HCI (cooperative evaluation, focus group, interview, contextual inquiry, the ADA approach, and online forums for evaluation) the UEM name was combined with the term usability or “ usability evaluation ”.
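As an illustration of how such search strings can be assembled, the sketch below combines UEM names with the usability qualifiers in the way described above. It is illustrative only: the UEM names shown are a small sample rather than the full list in Table 1, and the exact query formatting used in the original search is an assumption.

```python
# Illustrative sketch of how the search strings described above can be assembled.
# The UEM names are only a sample of those in Table 1; the split between
# HCI-specific and general-purpose methods follows the description in the text.

HCI_SPECIFIC_UEMS = [
    "pluralistic walkthrough",
    "participatory heuristic evaluation",
    "cooperative usability testing",
]

GENERAL_PURPOSE_METHODS = [
    "cooperative evaluation",
    "focus group",
    "interview",
    "contextual inquiry",
]

def build_queries() -> list:
    """Return Google Scholar query strings: HCI-specific UEM names are used as-is,
    while general-purpose methods are combined with a usability qualifier."""
    queries = [f'"{uem}"' for uem in HCI_SPECIFIC_UEMS]
    for method in GENERAL_PURPOSE_METHODS:
        queries.append(f'"{method}" usability')
        queries.append(f'"{method}" "usability evaluation"')
    # "user reports" partly overlaps "design feedback" and was also searched.
    queries.append('"user reports" "usability evaluation"')
    return queries

for query in build_queries():
    print(query)
```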

To balance the aim for a broad search with the resources available, we set a cut-off at the first 100 hits for each search. For searches that returned fewer hits, we included all. The first 100 hits is, of course, an arbitrary cut-off, and it is possible that more relevant papers would have been found if this limit had been extended. Hence, while the search is indeed broad, it cannot claim complete coverage. We do not, however, see this as a problematic limitation. In practice, the cut-off was found to work satisfactorily, as the last part of the included hits for a given search term combination typically returned little of interest for the purposes of the review. Increasing the number of included hits for each search combination would arguably have given diminishing returns.

Selection and analysis

Each of the search result hits was examined according to publication channel and language. Only scientific journal and conference papers were included, as the quality of these is verified through peer review. Also, for practical reasons, only English language publications were included.

All papers were scrutinized with regard to the following inclusion criterion: Include papers with conclusions on the potential benefits and limitations of users ’ design feedback. Papers excluded were typically conceptual papers presenting evaluation methods without presenting conclusions, studies on design feedback from participants (often students) that were not also within the target user group of the system, and studies that did not include qualitative design feedback but only quantitative data collection (e.g. set-response questionnaires). In total 41 papers were retained following this filtering. Included in this set were three papers co-authored by the author of this review [ 19 , 25 , 43 ].

The retained papers were then scrutinized for possible overlapping studies and errors in classification. Nine papers were excluded as they presented the same data on users’ design feedback as other identified papers, but in less detail. One paper was excluded as it had been erroneously classified as a study of evaluation methods.

In the analysis process, all papers were coded on four aspects directly reflecting the research question: the purpose of the gathered users’ design feedback (RQ1), the qualitative characteristics of the evaluation output (RQ2), assessments of validity and thoroughness (RQ3), and assessments of downstream impact (RQ4). Furthermore, all papers were coded according to UEM type, evaluation output types, comparison criterion (the criteria used, if any, to assess the design feedback), the involved users or participants, and research design.
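As a simple illustration of this coding scheme, the record below lists the coded aspects as fields of a per-paper data structure; the field names paraphrase the aspects named above and are not the labels used in the original analysis.

```python
# Sketch of the coding scheme described above, as one record per analysed paper.
# Field names paraphrase the coded aspects; values would be free-text codes.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PaperCoding:
    # Aspects directly reflecting the research questions
    purpose_of_feedback: str               # RQ1
    qualitative_characteristics: str       # RQ2
    validity_and_thoroughness: str         # RQ3
    downstream_impact: str                 # RQ4
    # Additional descriptive codes
    uem_type: str = ""
    evaluation_output_types: List[str] = field(default_factory=list)
    comparison_criterion: str = ""         # criterion used, if any, to assess the feedback
    participants: str = ""
    research_design: str = ""
```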

The papers included for analysis concerned users’ design feedback gathered through a wide range of methods from all the main UEM groups. The papers presented studies where users’ design feedback was gathered through usability inspections, usability testing, and inquiry methods. Among the usability testing studies, users’ design feedback was gathered both in extended debriefs and through users’ self-reporting of problems or incidents. The inquiry methods were used both for stand-alone usability evaluations and as part of field tests (see Table  2 ). This breadth of studies should provide a good basis for making general claims on the benefits and limitations of users’ design feedback.

Of the analysed studies, 19 provided detailed empirical data supporting their conclusions. The remaining studies presented the findings only summarily. The studies which provided detailed empirical data ranged from problem-counting head-to-head UEM comparisons (e.g. [ 3 , 17 , 27 , 44 ]) to in-depth reports on lessons learnt concerning a particular UEM (e.g. [ 30 , 45 ]). All but two of the studies with detailed presentations of empirical data [ 20 , 30 ] compared evaluation output from users’ design feedback to output from interaction data and/or data from inspections with usability experts.

In the presented studies, users’ design feedback was typically treated as a source of usability problems or incidents, despite the fact that users’ design feedback may also serve as a gateway to other types of evaluation output, such as experiential issues, reflections on the system’s context of use, and design suggestions. The findings from this review therefore mainly concern usability problems or incidents.

The purpose of gathering users’ design feedback (RQ1)

In the reviewed studies, different data collection methods for users’ design feedback were often pitted against each other. For example, Bruun et al. [ 44 ] compared online report forms, an online discussion forum, and a diary as methods to gather users’ self-reports of problems or incidents. Henderson et al. [ 3 ] compared interviews and questionnaires as means of gathering details on usability problems as part of usability testing debriefs. Cowley and Radford-Davenport [ 20 ] compared an online discussion forum and focus groups for the purpose of stand-alone usability evaluations.

These comparative studies surely provide relevant insight into the differences between specific data collection methods for users’ design feedback. However, though comparative, most of these studies mainly addressed one specific purpose for gathering users’ design feedback. Bruun et al. [ 44 ] only considered users’ design feedback in the context of users’ self-reporting of problems in usability tests. Henderson et al. [ 3 ] only considered users’ self-reporting during usability testing debriefs. Cowley and Radford-Davenport [ 20 ] only considered methods for users’ design feedback as stand-alone evaluation methods. We therefore see it as beneficial to contrast the different purposes for gathering users’ design feedback in the context of usability evaluations.

Four specific purposes for gathering users’ design feedback were identified: (a) a budget approach to problem identification in usability testing, (b) to expand on interaction data from usability testing, (c) to identify problems in the users’ everyday context, and (d) to benefit from users’ knowledge or creativity.

The budget approach

In some of the studies, users’ design feedback was used as a budget approach to reach findings that one could also have reached through classical usability testing. This is, in particular, seen in the five studies of usability testing with self-reports where the users’ design feedback consisted mainly of reports of problems or incidents [ 27 , 44 , 46 – 48 ]. Here, the users were to run the usability test and report on the usability problems independently of the test administrator, potentially saving evaluation costs. For example, in their study of usability testing with disabled users, Petrie et al. [ 48 ] compared the self-reported usability problems from users who self-administered the usability test at home to those who participated in a similar usability test in the usability laboratory. Likewise, Andreasen et al. [ 27 ] and Bruun et al. [ 44 ] compared different approaches to remote asynchronous usability testing. In these studies of self-reported usability problems, users’ design feedback hardly generated findings that complemented other data sources. Rather, the users’ design feedback mainly generated a subset of the usability problems already identified through interaction data.

Expanding on interaction data

Other reviewed studies concerned how users’ design feedback may expand on usability test interaction data. This was seen in some of the studies where users’ design feedback was gathered as part of the usability testing procedure or debrief session [ 4 , 14 , 19 , 49 , 59 ]. Here, users’ design feedback generated additional findings rather than merely reproducing the findings of the usability test interaction data. For example, O’Donnel et al. [ 4 ] showed how the participants of a usability test converged on new suggestions for redesign in focus group sessions following the usability test. Similarly, Følstad and Hornbæk [ 19 ] found the participants of a cooperative usability test to identify other types of usability issues when walking through completed tasks of a usability test than the issues already evident through the interaction data. In both these studies, the debrief was set up so as to aid the memory of the users by the use of video recordings from the test session [ 4 ] or by walkthroughs of the test tasks [ 19 ]. Other studies were less successful in generating additional findings through such debrief sessions. For example, Henderson et al. [ 3 ] found that users during debrief interviews, though readily reporting problems, were prone to issues concerning recall, recognition, overload, and prominence. Likewise, Donker and Markopoulos [ 51 ], in their debrief interviews with children, found them susceptible to forgetfulness. Neither of these studies included specific memory aids during the debrief session.

Problem reports from the everyday context

Users’ design feedback may also serve to provide insight that is impractical to gather by other data sources. This is exemplified in the four studies concerning users’ design feedback gathered through inquiry methods as part of field tests [ 17 , 28 , 45 , 52 ]. Here, users reported on usability problems as they appeared in everyday use of the interactive system, rather than usability problems encountered during the limited tasks of a usability test. As such, this form of users’ design feedback provides insight into usability problems presumably holding high face validity, and that may be difficult to identify during usability testing. For example, Christensen and Frøkjær [ 45 ] gathered user reports on problems with a fleet management system through integrated reporting software. Likewise, Horsky et al. [ 52 ] gathered user reports on problems with a medical application through emails from medical personnel. The user reports in these studies hence provided insight into problems as they appeared in the workday of the fleet managers and medical personnel respectively.

Benefitting from users’ knowledge and creativity

Finally, in some of the studies, users’ design feedback was gathered with the aim of benefiting from the particular knowledge or creativity of users. This is, in particular, seen in studies where users were involved as usability inspectors [ 25 , 43 , 53 , 54 ] and in studies where inquiry methods were applied for stand-alone usability evaluations [ 20 , 28 , 30 , 55 , 56 ]. Also, some of the studies where users’ design feedback was gathered through extended debriefing sessions had such a purpose [ 3 , 4 , 19 , 57 ]. For example, in their studies of users as usability inspectors, Barcelos et al. [ 53 ], Edwards et al. [ 54 ], and Følstad [ 25 ] found the user inspectors to be attentive to different aspects of the interactive systems than were the usability expert inspectors. Cowley and Radford-Davenport [ 20 ], as well as Ebenezer [ 58 ], in their studies of focus groups and discussion forums for usability evaluation, found participants to eagerly provide design suggestions, as did Sylaiou et al. [ 64 ] in their study of evaluations based on interviews and questionnaires with open-ended questions. Similarly, O’Donnel et al. [ 4 ] found users in focus groups arranged as follow-ups to classical usability testing sessions to identify and develop design suggestions, in particular in response to tasks that were perceived by the users as difficult.

How do the qualitative characteristics of users’ design feedback compare to that of other evaluation data? (RQ2)

Given that users’ design feedback is gathered with the purpose of expanding on the interaction data from usability testing, or with the aim of benefitting from users’ knowledge and creativity, it is relevant to know whether users’ design feedback actually generates findings that are different from what one could have reached through other data sources. Such knowledge may be found in the studies that addressed the qualitative characteristics of the usability issues identified on the basis of users’ design feedback.

The qualitative characteristics of the identified usability issues were detailed in nine of the reviewed papers [ 17 , 19 , 20 , 25 , 28 , 52 – 54 , 59 ]. These studies indeed suggest that evaluations based on users’ design feedback may generate output that is qualitatively different from that of evaluations based on other types of data. A striking finding across these papers is the degree to which users’ design feedback may facilitate the identification of usability issues specific to the particular domain of the interactive system. In six of the papers addressing the qualitative characteristics of the evaluation output [ 19 , 25 , 28 , 52 – 54 ], the findings based on users’ design feedback concerned domain-specific issues not captured by the alternative UEMs. For example, in a heuristic evaluation of virtual world applications, studied by Barcelos et al. [ 53 ], online gamers that were representative of the typical users of the applications identified relatively more issues related to the concept of playability than did usability experts. Emergency response personnel and mobile salesforce representatives involved in cooperative usability testing, studied by Følstad and Hornbæk [ 19 ], identified more issues concerning needed functionality and organisational requirements when providing design feedback in the interpretation phases of the testing procedure than when providing interaction data in the interaction phases. The users of a public sector work support system, studied by Hertzum [ 28 ], identified more utility problems in a workshop test, where the users were free to provide design feedback, than they did in a classical usability test. Hertzum suggested that the rigidly set tasks, observational setup, and formal setting of the usability test made this evaluation “biased toward usability at the expense of utility”, whereas the workshop allowed more free exploration on the basis of the participants’ work knowledge, which was beneficial for the identification of utility problems and bugs.

In two of the studies, however, the UEMs involving users’ design feedback were not reported to generate more domain-specific issues than did the other UEMs [ 17 , 59 ]. These two studies differed from the others on one important point: the evaluated systems were general purpose work support systems (one spreadsheet system and one system for electronic Post-It notes), not systems for specialized work support. A key motivation for gathering users’ design feedback is that users possess knowledge not held by other parties of the development process. Consequently, as the contexts of use for these two systems most likely were well known to the involved development teams, the value of tapping into users’ domain knowledge may have been lower than for the evaluations of more specialized work support systems.

The studies concerning the qualitative characteristics of users’ design feedback also suggested the importance of not relying solely on such feedback. In all of the seven studies in which findings from UEMs based on users’ design feedback were compared with findings from UEMs based on other data sources (interaction data or usability experts’ findings), the other data sources generated usability issues that were not identified from the users’ design feedback. For example, the usability experts in usability inspections studied by Barcelos et al. [ 53 ] and Følstad [ 25 ] identified a number of usability issues not identified by the users; issues that also had different qualitative characteristics. In the study by Barcelos et al. [ 53 ], the usability expert inspectors identified more issues pertaining to system configuration than did the user inspectors. In the study by Følstad [ 25 ], the usability expert inspectors identified more domain-independent issues. Hence, depending only on users’ design feedback would have limited the findings with respect to issues related to what Barcelos et al. [ 53 ] referred to as “the classical usability concept” (p. 303).

These findings are in line with our assumption that users’ design feedback may complement other types of evaluation data by supporting qualitatively different evaluation output, rather than replace other evaluation data. Users’ design feedback may constitute an important addition to other evaluation data sources by supporting the identification of domain-specific usability issues as well as user-based suggestions for redesign.

Which levels of validity and thoroughness are to be expected for users’ design feedback? (RQ3)

To rely on users’ design feedback as data in usability evaluations, we need to trust the data. To be used for any evaluation purpose, the findings based on users’ design feedback need to hold adequate levels of validity ; that is, the usability problems identified during the evaluation should reflect problems that the user can be expected to encounter when using the interactive system outside the evaluation context. Furthermore, if users’ design feedback is to be used as the only data in usability evaluations, it is necessary to know the levels of thoroughness that can be expected; that is, the degree to which the evaluation serves to identify all relevant usability problems that the user can be expected to encounter.

Following Hartson et al. [ 35 ], validity and thoroughness scores can be calculated on the basis of (a) the set of usability problems predicted with a particular UEM and (b) the set of real usability problems, that is, usability problems actually encountered by users outside the evaluation context. The challenge of such calculations, however, is that we need to establish a reasonably complete set of real usability problems. This challenge has typically been resolved by using the findings from classical usability testing as an approximation to such a set [ 65 ], though this approach introduces the risk of erroneously classifying usability problems as false alarms [ 6 ].
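For concreteness, the sketch below shows a minimal version of these two measures, treating evaluation output as sets of matched problem identifiers; the matching of predicted problems to real problems, which is the hard part in practice, is assumed to have been done already, and the example data are hypothetical.

```python
# Minimal sketch of validity and thoroughness as defined by Hartson et al. [35],
# treating evaluation output as sets of matched problem identifiers. Matching
# predicted problems to "real" problems is assumed to have been done beforehand.

def validity(predicted: set, real: set) -> float:
    """Share of predicted problems that are also real problems."""
    return len(predicted & real) / len(predicted) if predicted else 0.0

def thoroughness(predicted: set, real: set) -> float:
    """Share of real problems that were predicted."""
    return len(predicted & real) / len(real) if real else 0.0

# Hypothetical example; the criterion set is approximated by classical usability testing.
real_problems = {"P1", "P2", "P3", "P4", "P5"}
user_feedback_findings = {"P1", "P2", "P6"}   # "P6" would count as a false alarm

print(validity(user_feedback_findings, real_problems))      # 2/3, approx. 0.67
print(thoroughness(user_feedback_findings, real_problems))  # 2/5 = 0.4
```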

A substantial proportion of the reviewed papers present general views on the validity of the users’ design feedback. However, only five of the papers included in the review provide sufficient detail to calculate validity scores, provided that we assume that classical laboratory testing can serve as an approximation to the complete set of real usability problems. In three of these [ 44 , 46 , 47 ], the users’ design feedback was gathered as self-reports during remote usability testing, in one [ 3 ] users’ design feedback was gathered during a usability testing debrief, and in one [ 43 ] users’ design feedback was gathered through usability inspection. The validity scores ranged between 60% [ 43 ] and 89% [ 47 ], meaning that in all of the studies 60% or more of the usability problems or incidents predicted by the users were also confirmed by classical usability testing.

The reported validity values for users’ design feedback were arguably acceptable. For comparison, in newer empirical studies of heuristic evaluation with usability experts the validity of the evaluation output has typically been found to be well below 50% (e.g. [ 6 , 7 ]). Furthermore, following from the challenge of establishing a complete set of real usability problems, it may be assumed that several of the usability problems not identified in classical usability testing may nevertheless represent real usability problems [ 43 , 47 ].

Thoroughness concerns the proportion of predicted real problems relative to the full set of real problems [ 35 ]. Some of the above studies also provided empirical data that can be used to assess the thoroughness of users’ design feedback. In the Hartson and Castillo [ 47 ] study, 68% of the critical incidents observed during video analysis were also self-reported by the users. The corresponding proportion for the study by Henderson et al. [ 3 ] on problem identification from interviews was 53%. For the study on users as usability inspectors by Følstad et al. [ 43 ], the median thoroughness score for individual inspectors was 25%; however, for inspectors in nominal groups of seven, the thoroughness score rose to 70%. Larger numbers of evaluators or users are beneficial to thoroughness [ 35 ]. This is, in particular, seen in the study of Bruun et al. [ 44 ], where 43 users self-reporting usability problems in remote usability evaluations were able to identify 78% of the problems identified in classical usability testing. For comparison, in newer empirical studies of heuristic evaluation with usability experts thoroughness is typically well above 50% (e.g. [ 6 , 7 ]).
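To illustrate why pooling reports helps, the sketch below reuses the thoroughness function from the previous sketch and scores a nominal group on the union of its members’ individual findings; the problem sets are invented for the example and are not taken from [ 43 ] or [ 44 ].

```python
# Continues the sketch above: a nominal group is scored on the union of its
# members' individual findings. All problem sets here are invented examples.
real_problems = {f"P{i}" for i in range(20)}   # hypothetical criterion set of 20 problems
inspectors = [
    {"P0", "P1", "P2", "P3", "P4"},            # each inspector finds a partial,
    {"P3", "P4", "P5", "P6"},                  # partly overlapping subset
    {"P0", "P7", "P8", "P9", "P10"},
]

print([thoroughness(found, real_problems) for found in inspectors])  # [0.25, 0.2, 0.25]
print(thoroughness(set().union(*inspectors), real_problems))         # pooled group: 0.55
```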

The empirical data on thoroughness seem to support the conclusion that users typically underreport problems in their design feedback, though the extent of such underreporting varies widely between evaluations. In particular, involving larger numbers of users may mitigate this deficit in users’ design feedback as an evaluation data source.

Which levels of downstream impact are to be expected for users’ design feedback? (RQ4)

Seven of the papers presented conclusions concerning the impact of users’ design feedback on the subsequent design process; that is, whether the issues identified during evaluations led to changes in later versions of the system. Rector et al. [ 60 ], Obrist et al. [ 56 ], and Wright and Monk [ 14 ] concluded that the direct access to users’ reports served to strengthen the design team’s understanding of the users’ needs. The remaining four studies concerning downstream impact provided more detailed evidence on this.

In a study by Hertzum [ 28 ], the impact ratio for a workshop test was found to be more than 70%, which was similar to that of a preceding usability test in the same development process. Hertzum argued that a key factor determining the impact of an evaluation is its location in time: evaluations early in the development process are argued to have more impact than late evaluations. Følstad and Hornbæk [ 19 ], in their study of cooperative usability testing, found the usability issues identified on the basis of users’ design feedback during interpretation phases to have equal impact to those identified on the basis of interaction data. Følstad [ 25 ], in his study of users and usability experts as inspectors of applications for three specialized domains, found usability issues identified by users to have, on average, higher impact than those of usability experts. Horsky et al. [ 52 ] studied usability evaluations of a medical work support system by way of users’ design feedback through email and free-text questionnaires during a field trial, and compared the findings from these methods to findings from classical usability testing and inspections conducted by usability experts. Here, 64% of the subsequent changes to the system were motivated by issues reported in users’ self-reports by email. E-mail reports were also the most prominent source of users’ design feedback; 85 of a total of 155 user comments were gathered through such reports. Horsky et al. suggested that the problem types identified from the e-mail reports were an important reason for the high impact of the findings from this method.

Discussion and conclusion

The benefits and limitations of users’ design feedback

The literature review has provided an overview concerning the potential benefits and limitations of users’ design feedback. We found that users’ design feedback can be gathered for four purposes. When users’ design feedback is gathered to expand on interaction data from usability testing, as in usability testing debriefs (e.g. [ 4 ]), or to benefit from the users’ knowledge or creativity, as in usability inspections with user inspectors (e.g. [ 53 ]), it is critical that the evaluation output include findings that complement what could be achieved through other evaluation data sources; if not, the rationale for gathering users’ design feedback in such studies is severely weakened. When users’ design feedback is gathered as a budget approach to classical usability testing, as in asynchronous remote usability testing (e.g. [ 44 ]), or as a way to identify problems in the users’ everyday context, as in inquiry methods as part of field tests (e.g. [ 45 ]), it is critical that the evaluation output holds adequate validity and thoroughness.

The studies included in the review indicate that users’ design feedback may indeed complement other types of evaluation data. This is seen in the different qualitative characteristics of findings made on the basis of users’ design feedback compared to those made from other evaluation data types. This finding is important, as it may motivate usability professionals to make better use of UEMs particularly designed to gather users’ design feedback to complement other evaluation data. Such UEMs may include the pluralistic walkthrough, where users participate as inspectors in groups with usability experts and development team representatives, and the cooperative usability testing, where users’ design feedback is gathered through dedicated interpretation phases added to the classical usability testing procedure. Using UEMs that support users’ design feedback seems to be particularly important when evaluating systems for specialized domains, such as that of medical personnel or public sector employees. Possibly, the added value of users’ design feedback as a complementary data source may be reduced in evaluations of interactive systems for the general public; here, the users’ design feedback may not add much to what is already identified through interaction data or usability experts’ findings.

Furthermore, the reviewed studies indicated that users can self-report incidents or problems validly. For usability testing with self-reporting of problems, validity values for self-reports were consistently 60% or above; most incidents or problems identified through self-reports were also observed in the interaction data. In the studies providing validity findings, the objects of evaluation were general purpose work support systems or general public websites, potentially explaining why the users did not make findings more complementary to those of the classical usability test.

Users were, however, found to be less able with regard to thoroughness. In the reviewed studies, thoroughness scores varied from 25 to 78%. A relatively larger number of users seems to be required to reach adequate thoroughness through users’ design feedback than through interaction data. Evaluations depending solely on users’ design feedback may need to involve more users than would be needed, e.g., for classical usability testing.

Finally, issues identified from users’ design feedback may have substantial impact in the subsequent development process. The relative impact of users’ design feedback compared to that of other data sources may of course differ between studies and development processes, e.g. due to contextual variation. Nevertheless, the reviewed studies indicate users’ design feedback to be at least as impactful as evaluation output from other data sources. This finding is highly relevant for usability professionals, who typically aim to get the highest possible impact on development. One reason why findings from users’ design feedback were found to have relatively high levels of impact may be that such findings, as opposed to, for example, the findings of usability experts in usability inspections, allow the development team to access the scarce resource of users’ domain knowledge. Hence, the persuasive character of users’ design feedback may be understood as a consequence of it being qualitatively distinct from evaluation output from other data sources, rather than merely being a consequence of this feedback coming straight from the users.

Implications for usability evaluation practice

The findings from the review may be used to advise usability evaluation practice. In the following, we summarize what we find to be the most important take-aways for practitioners:

Users’ design feedback may be particularly beneficial when conducting evaluation of interactive systems for specialized contexts of use. Here, users’ design feedback may generate findings that complement those based on other types of evaluation data. However, for this benefit to be realized, the users’ design feedback should be gathered with a clear purpose of benefitting from the knowledge and creativity of users.

When users’ design feedback is gathered through extended debriefs, users are prone to forgetting encountered issues or incidents. Consider supporting the users’ recall by the use of, for example, video recordings from the system interaction or by walking through the completed tasks.

Users’ design feedback may support problem identification in evaluations where the purpose is a budget approach to usability testing or problem reporting from the field. However, due to challenges in thoroughness, it may be necessary to scale up such evaluations to involve more users than would be needed, e.g., for classical usability testing.

Evaluation output based on users’ design feedback seems to be impactful in the downstream development process. Hence, gathering users’ design feedback may be an effective way to boost the impact of usability evaluation.

Limitations and future work

Being a literature review, this study is limited by the research papers available. Though evaluation findings from interaction data and inspections with usability experts have been thoroughly studied in the research literature, the literature on users’ design feedback is limited. Furthermore, as users’ design feedback is not used as a term in the current literature, the identification of relevant studies was challenging, to the point that we cannot be certain that no relevant study has passed unnoticed.

Nonetheless, the identified papers, though concerning a wide variety of UEMs, were found to provide reasonably consistent findings. Furthermore, the findings suggest that users’ design feedback is a promising area for further research on usability evaluation.

The review also serves to highlight possible future research directions: to optimize UEMs for users’ design feedback and to further investigate which types of development processes particularly benefit from users’ design feedback. In particular, the following topics may be highly relevant for future work:

More systematic studies of the qualitative characteristics of UEM output in general, and users’ design feedback in particular. In the review, a number of studies addressing various qualitative characteristics were identified. However, to optimize UEMs for users’ design feedback it may be beneficial to study the qualitative characteristics of evaluation output according to more comprehensive frameworks where feedback is characterized e.g. in terms of being general or domain-specific as well as being problem oriented, providing suggestions, or concerning the broader context of use.

Investigating users’ design feedback across types of application areas. The review findings suggest that the usefulness of users’ design feedback may in part be decided by application area. In particular, application domains characterized by high levels of specialization may benefit more from evaluations including users’ design feedback, as the knowledge represented by the users is not as easily available through other means as it is for more general domains. Future research is needed for a more in-depth study of this implication of the findings.

Systematic studies of users’ design feedback across the development process. It is likely, as seen from the review, that the usefulness of users’ design feedback depends on the stage of the development process in which the evaluation is conducted. Furthermore, different stages of the development process may require different UEMs for gathering users’ design feedback. In the review, we identified four typical motivations for gathering users’ design feedback. These may serve as a starting point for further studies of users’ design feedback across the development process.

While the review provides an overview of our current and fragmented knowledge of users’ design feedback, important areas of research still remain. We conclude that users’ design feedback is a worthy topic of future UEM research, and hope that this review can serve as a starting point for this endeavour.

The review is based on the author’s Ph.D. thesis on users’ design feedback, where it served to position three studies conducted by the author relative to other work done within this field. The review presented in this paper includes these three studies as they satisfy the inclusion criteria for the review. It may also be noted that, to include a broader set of perspectives on the benefits and limitations of users’ design feedback, the inclusion criteria applied in the review presented here are more relaxed than those of the Ph.D. thesis. The thesis was accepted at the University of Oslo in 2014.

References

1. Rubin J, Chisnell D (2008) Handbook of usability testing: how to plan, design, and conduct effective tests, 2nd edn. Wiley, Indianapolis
2. Bias RG (1994) The pluralistic usability walkthrough: coordinated empathies. In: Nielsen J, Mack RL (eds) Usability inspection methods. Wiley, New York, pp 63–76
3. Henderson R, Podd J, Smith MC, Varela-Alvarez H (1995) An examination of four user-based software evaluation methods. Interact Comput 7(4):412–432
4. O’Donnel PJ, Scobie G, Baxter I (1991) The use of focus groups as an evaluation technique in HCI. In: Diaper D, Hammond H (eds) People and computers VI, proceedings of HCI 1991. Cambridge University Press, Cambridge, pp 212–224
5. Lewis JR (2006) Sample sizes for usability tests: mostly math, not magic. Interactions 13(6):29–33
6. Chattratichart J, Brodie J (2004) Applying user testing data to UEM performance metrics. In: Dykstra-Erickson E, Tscheligi M (eds) CHI’04 extended abstracts on human factors in computing systems. ACM, New York, pp 1119–1122
7. Hvannberg ET, Law EL-C, Lárusdóttir MK (2007) Heuristic evaluation: comparing ways of finding and reporting usability problems. Interact Comput 19(2):225–240
8. Nielsen J (2001) First rule of usability? Don’t listen to users. Jakob Nielsen’s Alertbox: August 5, 2001. http://www.nngroup.com/articles/first-rule-of-usability-dont-listen-to-users/
9. Whitefield A, Wilson F, Dowell J (1991) A framework for human factors evaluation. Behav Inf Technol 10(1):65–79
10. Gould JD, Lewis C (1985) Designing for usability: key principles and what designers think. Commun ACM 28(3):300–311
11. Wilson GM, Sasse MA (2000) Do users always know what’s good for them? Utilising physiological responses to assess media quality. In: People and computers XIV—usability or else!. Springer, London, pp 327–339
12. Åborg C, Sandblad B, Gulliksen J, Lif M (2003) Integrating work environment considerations into usability evaluation methods—the ADA approach. Interact Comput 15(3):453–471
13. Muller MJ, Matheson L, Page C, Gallup R (1998) Methods & tools: participatory heuristic evaluation. Interactions 5(5):13–18
14. Wright PC, Monk AF (1991) A cost-effective evaluation method for use by designers. Int J Man Mach Stud 35(6):891–912
15. Dumas JS, Redish JC (1999) A practical guide to usability testing. Intellect Books, Exeter
16. Følstad A, Law E, Hornbæk K (2012) Analysis in practical usability evaluation: a survey study. In: Chi E, Höök K (eds) Proceedings of the SIGCHI conference on human factors in computing systems, CHI ’12. ACM, New York, pp 2127–2136
17. Smilowitz ED, Darnell MJ, Benson AE (1994) Are we overlooking some usability testing methods? A comparison of lab, beta, and forum tests. Behav Inf Technol 13(1–2):183–190
18. Vermeeren AP, Law ELC, Roto V, Obrist M, Hoonhout J, Väänänen-Vainio-Mattila K (2010) User experience evaluation methods: current state and development needs. In: Proceedings of the 6th Nordic conference on human-computer interaction: extending boundaries. ACM, New York, pp 521–530
19. Følstad A, Hornbæk K (2010) Work-domain knowledge in usability evaluation: experiences with cooperative usability testing. J Syst Softw 83(11):2019–2030
20. Cowley JA, Radford-Davenport J (2011) Qualitative data differences between a focus group and online forum hosting a usability design review: a case study. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 55(1):1356–1360
21. Jacobsen NE (1999) Usability evaluation methods: the reliability and usage of cognitive walkthrough and usability test. Doctoral thesis, University of Copenhagen, Denmark
22. Cockton G, Lavery D, Woolrych A (2008) Inspection-based evaluations. In: Sears A, Jacko J (eds) The human-computer interaction handbook: fundamentals, evolving technologies and emerging applications, 2nd edn. Lawrence Erlbaum Associates, New York, pp 1171–1190
23. Mack RL, Nielsen J (1994) Executive summary. In: Nielsen J, Mack RL (eds) Usability inspection methods. Wiley, New York, pp 1–23
24. Baauw E, Bekker MM, Barendregt W (2005) A structured expert evaluation method for the evaluation of children’s computer games. In: Costabile MF, Paternò F (eds) Proceedings of human-computer interaction—INTERACT 2005, lecture notes in computer science 3585. Springer, Berlin, pp 457–469
25. Følstad A (2007) Work-domain experts as evaluators: usability inspection of domain-specific work support systems. Int J Human Comput Interact 22(3):217–245
26. Frøkjær E, Hornbæk K (2005) Cooperative usability testing: complementing usability tests with user-supported interpretation sessions. In: van der Veer G, Gale C (eds) CHI’05 extended abstracts on human factors in computing systems. ACM Press, New York, pp 1383–1386
27. Andreasen MS, Nielsen HV, Schrøder SO, Stage J (2007) What happened to remote usability testing? An empirical study of three methods. In: Rosson MB, Gilmore D (eds) CHI’07: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, New York, pp 1405–1414
28. Hertzum M (1999) User testing in industry: a case study of laboratory, workshop, and field tests. In: Kobsa A, Stephanidis C (eds) Proceedings of the 5th ERCIM workshop on user interfaces for all, Dagstuhl, Germany, November 28–December 1, 1999. http://www.interaction-design.org/references/conferences/proceedings_of_the_5th_ercim_workshop_on_user_interfaces_for_all.html
29. Rosenbaum S, Kantner L (2007) Field usability testing: method, not compromise. In: Proceedings of the IEEE international professional communication conference, IPCC 2007. doi:10.1109/IPCC.2007.4464060
30. Choe P, Kim C, Lehto MR, Lehto X, Allebach J (2006) Evaluating and improving a self-help technical support web site: use of focus group interviews. Int J Human Comput Interact 21(3):333–354
31. Greenbaum J, Kyng M (eds) (1991) Design at work. Lawrence Erlbaum Associates, Hillsdale
32. Desurvire HW, Kondziela JM, Atwood ME (1992) What is gained and lost when using evaluation methods other than empirical testing. In: Monk A, Diaper D, Harrison MD (eds) People and computers VII: proceedings of HCI 92. Cambridge University Press, Cambridge, pp 89–102
33. Karat CM, Campbell R, Fiegel T (1992) Comparison of empirical testing and walkthrough methods in user interface evaluation. In: Bauersfeld P, Bennett J, Lynch G (eds) CHI’92: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, New York, pp 397–404
34. Gray WD, Salzman MC (1998) Damaged merchandise? A review of experiments that compare usability evaluation methods. Human Comput Interact 13(3):203–261
35. Hartson HR, Andre TS, Williges RC (2003) Criteria for evaluating usability evaluation methods. Int J Human Comput Interact 15(1):145–181
36. Law EL-C (2006) Evaluating the downstream utility of user tests and examining the developer effect: a case study. Int J Human Comput Interact 21(2):147–172
37. Uldall-Espersen T, Frøkjær E, Hornbæk K (2008) Tracing impact in a usability improvement process. Interact Comput 20(1):48–63
38. Frøkjær E, Hornbæk K (2008) Metaphors of human thinking for usability inspection and design. ACM Trans Comput Human Interact (TOCHI) 14(4):20:1–20:33
39. Fu L, Salvendy G, Turley L (2002) Effectiveness of user testing and heuristic evaluation as a function of performance classification. Behav Inf Technol 21(2):137–143

Kitchenham B (2004) Procedures for performing systematic reviews (Technical Report TR/SE-0401). Keele, UK: Keele University. http://www.scm.keele.ac.uk/ease/sreview.doc

Harzing AW (2013) A preliminary test of Google Scholar as a source for citation data: a longitudinal study of Nobel prize winners. Scientometrics 94(3):1057–1075

Meho LI, Yang K (2007) Impact of data sources on citation counts and rankings of LIS faculty: web of Science versus Scopus and Google Scholar. J Am Soc Inform Sci Technol 58(13):2105–2125

Følstad A, Anda BC, Sjøberg DIK (2010) The usability inspection performance of work-domain experts: an empirical study. Interact Comput 22:75–87

Bruun A, Gull P, Hofmeister L, Stage J (2009) Let your users do the testing: a comparison of three remote asynchronous usability testing methods. In: Hickley K, Morris MR, Hudson S, Greenberg S (Eds.) CHI’09: Proceedings of the SIGCHI conference on human factors in computing systems, ACM, New York, p 1619–1628

Christensen L, Frøkjær E (2010) Distributed usability evaluation: enabling large-scale usability evaluation with user-controlled Instrumentation. In: Blandford A, Gulliksen J (Eds.) NordiCHI’10: Proceedings of the 6th Nordic conference on human-computer interaction: extending boundaries, ACM, New York, p 118–127

Bruun A, Stage J (2012) The effect of task assignments and instruction types on remote asynchronous usability testing. In: Chi EH, Höök K (Eds.) CHI’12: Proceedings of the SIGCHI conference on human factors in computing systems, ACM, New York, p 2117–2126

Hartson H R, Castillo JC (1998) Remote evaluation for post-deployment usability improvement. In: Catarci T, Costabile MF, Santucci G, Tarafino L, Levialdi S (Eds.) AVI98: Proceedings of the working conference on advanced visual interfaces, ACM Press, New York, p 22–29

Petrie H, Hamilton F, King N, Pavan P (2006) Remote usability evaluations with disabled people. In: Grinter R, Rodden T, Aoki P, Cutrell E, Jeffries R, Olson G (Eds.) CHI’06: Proceedings of the SIGCHI conference on human factors in computing systems, ACM, New York, p 1133–1141

Cunliffe D, Kritou E, Tudhope D (2001) Usability evaluation for museum web sites. Mus Manag Curatorship 19(3):229–252

Sullivan P (1991) Multiple methods and the usability of interface prototypes: the complementarity of laboratory observation and focus groups. In: Proceedings of the Internetional Conference on Systems Documentation—SIGDOC’91, ACM, New York, p 106–112

Donker A, Markopoulos P (2002) A comparison of think-aloud, questionnaires and interviews for testing usability with children. In: Faulkner X, Finlay J, Détienne F (eds) People and computers XVI—memorable yet invisible, proceedings of HCI 202. Springer, London, pp 305–316

Horsky J, McColgan K, Pang JE, Melnikas AJ, Linder JA, Schnipper JL, Middleton B (2010) Complementary methods of system usability evaluation: surveys and observations during software design and development cycles. J Biomed Inform 43(5):782–790

Barcelos TS, Muñoz R, Chalegre V (2012) Gamers as usability evaluators: A study in the domain of virtual worlds. In: Anacleto JC, de Almeida Nedis VP (Eds.) IHC’12: Proceedings of the 11th brazilian symposium on human factors in computing systems, Brazilian Computer Society, Porto Alegre, p 301–304

Edwards PJ, Moloney KP, Jacko JA, Sainfort F (2008) Evaluating usability of a commercial electronic health record: a case study. Int J Hum Comput Stud 66:718–728

Kontio J, Lehtola L, Bragge J (2004) Using the focus group method in software engineering: obtaining practitioner and user experiences. In: Proceedings of the International Symposium on Empirical Software Engineering – ISESE, IEEE, Washington, p 271–280

Obrist M, Moser C, Alliez D, Tscheligi M (2011) In-situ evaluation of users’ first impressions on a unified electronic program guide concept. Entertain Comput 2:191–202

Marsh SL, Dykes J, Attilakou F (2006) Evaluating a geovisualization prototype with two approaches: remoteinstructional vs. face-to-face exploratory. In: Proceedings of information visualization 2006, IEEE, Washington, p 310–315

Ebenezer C (2003) Usability evaluation of an NHS library website. Health Inf Libr J 20(3):134–142

Yeo A (2001) Global-software development lifecycle: an exploratory study. In: Jacko J, Sears A (Eds.) CHI’01: Proceedings of the SIGCHI conference on human factors in computing systems, ACM, New York, p 104–111

Rector AL, Horan B, Fitter M, Kay S, Newton PD, Nowlan WA, Robinson D, Wilson A (1992) User centered development of a general practice medical workstation: The PEN&PAD experience. In: Bauersfeld P, Bennett J, Lunch G (Eds.) CHI ‘92: Proceedings of the SIGCHI conference on human factors in computing systems, ACM, New York, p 447–453

Smith A, Dunckley L (2002) Prototype evaluation and redesign: structuring the design space through contextual techniques. Interact Comput 14(6):821–843

Ross S, Ramage M, Ramage Y (1995) PETRA: participatory evaluation through redesign and analysis. Interact Comput 7(4):335–360

Lamanauskas L, Pribeanu C, Vilkonis R, Balog A, Iordache DD, Klangauskas A (2007) Evaluating the educational value and usability of an augmented reality platform for school environments: some preliminary results. In: Proceedings of the 4th WSEAS/IASME international conference on engineering education p 86–91

Sylaiou S, Economou M, Karoulis A, White M (2008) The evaluation of ARCO: a lesson in curatorial competence and intuition with new technology. ACM Comput Entertain 6(20):23

Hornbæk K (2010) Dogmas in the assessment of usability evaluation methods. Behav Inf Technol 29(1):97–111


  • Open access
  • Published: 28 February 2024

Towards a validated glossary of usability attributes for the evaluation of wearable robotic devices

  • Diana Herrera-Valenzuela 1 , 2 ,
  • Jan T. Meyer 3 ,
  • Antonio J. del-Ama 4 ,
  • Juan C. Moreno 5 , 6 ,
  • Roger Gassert 3 , 7 &
  • Olivier Lambercy 3 , 7  

Journal of NeuroEngineering and Rehabilitation volume  21 , Article number:  30 ( 2024 ) Cite this article


Background

Despite technical advances in the field of wearable robotic devices (WRD), user acceptance of these technologies is still limited. While usability is often a key factor influencing acceptance, the definitions and scopes attached to the term are scattered. To advance usability evaluation, and to integrate usability features as design requirements during technology development, there is a need for benchmarks and shared terminology. These should be easily accessible and implementable by developers.

Methods

An initial set of usability attributes (UA) was extracted from a literature survey on usability evaluation in WRD. The initial set of attributes was enriched and locally validated with seven developers of WRD through an online survey and a focus group. The locally validated glossary was then externally validated through a globally distributed online survey.

Results

The result is the Robotics Usability Glossary (RUG), a comprehensive glossary of 41 UA validated by 70 WRD developers from 17 countries, ensuring its generalizability. Of these UA, 31 had high agreement scores among respondents and 27 were considered highly relevant in the field, but only 11 had been included as design criteria by the respondents.

Conclusions

Multiple UA ought to be considered for a comprehensive usability assessment. Usability remains inadequately incorporated into device development, indicating a need for increased awareness and end-user perspective. The RUG can be readily accessed through an online platform, the Interactive Usability Toolbox (IUT), developed to provide context-specific outcome measures and usability evaluation methods. Overall, this effort is an important step towards improving and promoting usability evaluation practices within WRD. It has the potential to pave the way for establishing usability evaluation benchmarks that further endorse the acceptance of WRD.

Introduction

Over the last decades, the field of wearable robotic devices (WRD) for rehabilitation and assistance has evolved remarkably. However, despite technical advances, user acceptance and adoption of these technologies are still very limited [1]. This is attracting growing interest from researchers in the WRD field, who aim to better understand its causes and the factors limiting the user experience in human–robot interactions [2]. Of particular importance, studies have shown the limited evaluation of user satisfaction with WRD [3], the lack of validated tools to assess devices from the user's perspective [4], and the need to improve their usability [1].

When it comes to usability, there is a scattered landscape of definitions and scopes for the term. The best-known standard related to the usability of human–robot interactions is ISO 9241-11, which defines usability as “the extent to which a system, product or service can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use” [5]. However, only a few WRD studies end up using the exact terminology the standard provides, underlining the difficulty of capturing the complex construct of usability by means of only three dimensions: effectiveness, efficiency, and satisfaction. As a consequence, other models including further dimensions have been proposed to evaluate usability in assistive technologies [6, 7, 8, 9], demonstrating that technology developers more often refer to usability using a broader set of terms, hereinafter called “usability attributes” (UA). The definitions of such UA are often blurry, leaving room for different interpretations depending on educational background, language, and application context. Consequently, as of now, there exist no validated definitions of UA that are easily accessible and, more importantly, agreed upon by WRD developers. Only once the field establishes agreement on specific UA and their respective definitions can we ensure that the WRD community evaluates the same things and produces data that can be compared more easily across devices and studies.

In this regard, open-source benchmarks for the evaluation of WRD have recently been developed in two coordinated European efforts: Eurobench [10] and the European Cooperation in Science and Technology (COST) action for Wearable Robotics [11]. Eurobench aimed to create a framework for applying benchmarking methodology to bipedal robotic systems, including lower limb WRD and robotic humanoids. To run the evaluations proposed in this framework, two facilities with standardized equipment and settings to evaluate lower limb WRD were set up in Europe. Only one of the 75 protocols developed in the Eurobench framework addresses the usability of WRD. This evaluation is conducted through a questionnaire covering the attributes acceptability, perceptibility, and functionality. The questionnaire evaluates usability by asking whether the device is useful to the user and provides a scoring system based on the three dimensions stated by ISO 9241-11 [12]. Additionally, the protocol is limited to lower limb WRD, has limited accessibility for developers around the world due to the specialized setups required to evaluate the technologies, and is only applicable to devices in advanced development stages with Technology Readiness Levels (TRL) ≥ 7. The first objective of the COST action for wearable robotics, in turn, was to create a common understanding of terms and concepts related to wearable robotics across fields of expertise. Nevertheless, its vocabulary is not specific to usability or user experience. As such, the term usability itself was not included, but the UA cognitive load, mental fatigue, robustness, and wearability were separately considered [11]. This further highlights the need for a more comprehensive, usability-focused framework to define and evaluate the usability of WRD at any TRL.

With a similar motivation, the committee F48 on Exoskeletons and Exosuits formed by the American Society for Testing and Materials (ASTM) has been working to develop voluntary consensus standards for WRD since 2017. They have a subcommittee specifically devoted to defining a Standard Terminology for these WRD, which published the standard F3323-21 with the proposed terms and definitions [ 13 ]. Nonetheless, this standard is not related to usability, is not open-access, and was not externally validated, thus having limited accessibility and applicability among WRD developers.

To advance usability evaluation and integrate usability features as design requirements during technology development, we need benchmarks and shared terminology that can be unequivocally understood and that are easily accessible and implementable by WRD researchers and developers. To this end, the Interactive Usability Toolbox (IUT) was developed at ETH Zurich [14]. It takes the form of an online platform aimed at increasing and improving usability evaluation practices during the development of WRD [15]. The Toolbox facilitates the search and selection of context-specific outcome measures and usability research methods, including the option to select specific UA as part of the intended context. To guarantee the comprehensiveness, generalizability and validity of the UA, which are the starting point for recommending specific usability evaluation methods, we aimed to develop an internationally validated glossary of UA as part of the IUT. The objective of this paper is to describe the process of building and externally validating the Robotics Usability Glossary (RUG), a glossary with consensus-based definitions for each commonly used UA. Specifically, we report the results of a two-step validation consisting of a local evaluation with usability experts, followed by an online survey administered to developers of WRD around the world to assess the external validity of the glossary. These agreed UA should then become the basis for finding and creating more widely accepted benchmarks for the usability evaluation of WRD.

Study design

An initial set of UA was extracted from a literature survey on usability evaluation in WRD. The initial set of attributes was enriched and locally validated with seven developers of WRD through an online survey and a focus group, leading to a reasonable consensus. The locally validated glossary was then externally validated through a globally distributed online survey. The current study purposely targeted only technology developers, because they are mostly the ones designing and conducting usability evaluations of WRD; we therefore aimed to reach a consensus among them. Figure 1 summarizes the overall methodology. The details of the process of building the glossary and of the two-step validation are described in the following sections.

Figure 1. Schematics of the methodology followed to build the UA glossary, validate it locally, and launch an online survey to validate it worldwide. The acronyms correspond to the number of developers (n), the number of usability attributes (a), and the number of questions (Q).

Establishing the UA list

The first set of UA was gathered from a literature survey on how usability is assessed in the field of WRD, mostly from other models proposed for usability evaluation [6, 7, 8, 9]. The resulting data were summarized into 46 UA that encompass the overall usability of WRD. Previously available definitions were retrieved from the respective papers when available, from standardized guidelines such as ISO 9241-11, from international health organizations like the World Health Organization (WHO) and the Agency for Healthcare Research and Quality (AHRQ), or from English dictionaries (e.g. Cambridge Dictionary, Oxford English Dictionary). For each attribute, the definition that best fit it with respect to WRD was selected, based on the agreement of the two main study coordinators (DHV, JTM).
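To make the structure of such a glossary concrete, the following is a minimal Python sketch of how glossary entries could be represented; the attribute names, definitions, and sources shown are placeholders for illustration only and do not reproduce the validated wording of the RUG (which is given in Table 2).

```python
from dataclasses import dataclass

@dataclass
class UsabilityAttribute:
    """One entry of a usability-attribute glossary."""
    name: str        # e.g. "comfort"
    definition: str  # definition selected for the WRD context
    source: str      # where the definition was retrieved from

# Hypothetical entries for illustration only; the validated wording is in Table 2
# of the article and on the IUT website.
glossary = [
    UsabilityAttribute(
        name="comfort",
        definition="The degree to which using the device is free from physical discomfort.",
        source="literature survey"),
    UsabilityAttribute(
        name="safety",
        definition="The degree to which the device avoids harm to the user and the environment.",
        source="standardized guideline"),
]

for ua in glossary:
    print(f"{ua.name}: {ua.definition} (source: {ua.source})")
```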

Local validation

UA definitions for which the two study coordinators did not reach a consensus were discussed with a group of seven local WRD developers through an online questionnaire, in which the respondents used a 5-point Likert scale to rate their agreement with the provided definition(s) of each UA, as well as the applicability of each attribute to the development of WRD. Definitions with average agreement scores of at least 4.0 were considered locally validated and not discussed further. The remaining UA were discussed with four of the survey respondents during a focus group aimed at (i) improving the definitions based on the available ones and (ii) deciding whether to merge UA with similar definitions. Although all seven local developers were invited to participate in the focus group, only four could attend due to limited availability. The session was moderated by the study coordinators (DHV, JTM). All the descriptions built during this session were scored once again by six of the respondents from the initial local survey in a second online survey.
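As an illustration of the local-validation rule described above (definitions with a mean agreement of at least 4.0 on the 5-point Likert scale were accepted; the rest went to the focus group), here is a small sketch with invented ratings; the attribute names and scores are hypothetical.

```python
from statistics import mean

# Hypothetical Likert ratings (1-5) from the seven local developers.
local_ratings = {
    "comfort":     [5, 4, 5, 4, 4, 5, 4],
    "robustness":  [3, 4, 3, 5, 3, 4, 3],
    "wearability": [4, 3, 4, 3, 4, 3, 4],
}

ACCEPT_THRESHOLD = 4.0  # mean agreement needed to skip the focus group

accepted = {ua for ua, ratings in local_ratings.items() if mean(ratings) >= ACCEPT_THRESHOLD}
to_focus_group = set(local_ratings) - accepted

print("locally validated:", sorted(accepted))
print("discussed in focus group:", sorted(to_focus_group))
```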

Both surveys were reviewed and tested before distribution to guarantee the understandability of the questions and the face validity of the survey. Comment boxes were always included to gather further insights from the respondents about the definition of each UA. Before the study started, the research aims and methods were discussed and approved among the authors, ensuring that face validity was established.

Global validation

With the locally validated glossary, a second online survey was designed and launched to validate the glossary in the international community of WRD developers. The intended sample size for this study was set at 91 respondents, determined from an estimated total target population size of N = 1000, a 95% confidence interval and a 10% accepted margin of error [16, 17]. The full set of UA was divided into four batches so that respondents rated at least one of the batches. The set was divided to reduce the time required to complete the survey to under 15 min, with the aim of increasing the completion rate. The UA in each batch were strategically distributed to balance those that had lower agreement scores in the local validation. The survey contained initial questions on demographics and the respondent's experience in device development and usability evaluation, followed by the selection of one of the batches to rate (a) the respondent's agreement with the proposed definition of each UA, (b) the relevance of the UA for the development of WRD and (c) the inclusion of the UA as a design criterion in the developments the respondent had been involved in. For all ratings, a 5-point Likert scale was used. If the agreement rating for any UA definition was below 3, a text box was displayed giving the respondent the option to describe how they would improve or change the proposed definition. At the end of the survey, respondents could write further comments in a text box and could also choose to complete other attribute batches. The survey was reviewed and tested by four researchers with three different native languages (all proficient in English) to guarantee the understandability of the questions and the face validity of the survey. The complete survey is available in Additional file 1: Annex 1.
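The reported sample size can be approximated with a standard finite-population formula (Cochran's formula with finite-population correction). The sketch below, using the parameters stated above, yields a value in the same range as the reported 91 respondents; the exact figure depends on the calculator and rounding conventions used.

```python
import math

def finite_population_sample_size(N, z=1.96, margin=0.10, p=0.5):
    """Cochran's formula with finite-population correction."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # infinite-population sample size
    n = n0 / (1 + (n0 - 1) / N)                # finite-population correction
    return math.ceil(n)

# Parameters reported in the paper: N = 1000, 95% CI (z ~ 1.96), 10% margin of error.
print(finite_population_sample_size(1000))  # ~88, in the same range as the reported 91
```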

All surveys were administered using the QuestionPro Survey Software (QuestionPro Inc., Austin, TX, USA). On the landing page of each survey, the study aims were presented, and informed consent was collected from the participants. Once the participants agreed with the stated terms and conditions, the surveys started. Data were collected from August 2022 to February 2023.

Participants for the local validation were recruited through purposive and convenience sampling, to guarantee valuable knowledge of the aspects studied and to allow on-site activities such as the focus group to be carried out in a timely manner; all of them were familiar with the IUT beforehand. An email was sent to the experts explaining the aim of the study, covering both the online survey and the focus group, and inviting them to participate in both or at least in the online survey. Inclusion criteria were experience in the development and usability evaluation of WRD, previous knowledge of the IUT, and a legally valid signature of the informed consent.

For the global validation stage, purposive and snowball sampling techniques were used to obtain survey responses. Recruitment took place through the authors' wider network via email, social media, the IUT website, as well as at international conferences related to the field of WRD. Developers contacted through these channels were encouraged to take part in the survey, emphasizing the importance of reaching a consensus on the definitions of usability attributes within the field. Their participation was underscored as vital for the validation of the glossary, ensuring that a diverse range of respondents contributed to the process. Inclusion criteria were agreement to participate in the survey and share the results (obtained at the beginning of the survey), and experience in the development and usability evaluation of WRD, assessed through four questions on this matter in the questionnaire. Additionally, a highlighted note in the introduction of the survey indicated that only WRD developers should complete it.

Data analysis

All demographic variables and ratings are presented using descriptive statistics, either as mean and standard deviation (mean ± STD) or, in case of high data dispersion, as median and first and third quartiles, Mdn (Q1–Q3). Categorical variables are reported as absolute frequencies. Kolmogorov-Smirnov (KS) tests were performed for each demographic variable and rating to test for normal distribution. To further investigate whether professional experience influences the agreement, relevance or previous implementation of the UA included in the RUG, Spearman rank correlation tests were performed to assess possible correlations between each of the three ratings asked in the surveys and the professional data collected from the subjects: (i) years of experience as a developer, (ii) highest TRL achieved, (iii) number of dedicated usability studies performed, and (iv) number of users they had previously interacted with. Lastly, the kurtosis and Pearson's second skewness coefficient were calculated to study the distribution of the three ratings evaluated.
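A minimal, self-contained sketch of this analysis pipeline is shown below using NumPy and SciPy; the data are randomly generated stand-ins for the survey ratings and professional-experience variables, not the study's dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Invented stand-in data: per-respondent agreement ratings (1-5) and years of experience.
agreement = rng.integers(1, 6, size=70).astype(float)
experience_years = rng.integers(0, 26, size=70).astype(float)

# Descriptive statistics reported as Mdn (Q1-Q3) for dispersed data.
mdn = np.median(agreement)
q1, q3 = np.percentile(agreement, [25, 75])

# Kolmogorov-Smirnov test against a normal distribution fitted to the sample.
ks_stat, ks_p = stats.kstest(agreement, "norm",
                             args=(agreement.mean(), agreement.std(ddof=1)))

# Spearman rank correlation between a rating and a professional-experience variable.
rho, rho_p = stats.spearmanr(agreement, experience_years)

# Kurtosis and Pearson's second skewness coefficient: 3 * (mean - median) / std.
kurt = stats.kurtosis(agreement)
pearson_skew2 = 3 * (agreement.mean() - mdn) / agreement.std(ddof=1)

print(f"Mdn (Q1-Q3): {mdn:.1f} ({q1:.1f}-{q3:.1f}), KS p = {ks_p:.3f}, "
      f"Spearman rho = {rho:.2f}, kurtosis = {kurt:.2f}, skewness = {pearson_skew2:.2f}")
```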

Results

The local validation was performed with 7 WRD experts from ETH Zurich. In the global validation, 70 respondents from 17 countries around the globe participated. The participants' demographics and WRD experiences are summarized in Table 1. Only 20 UA were assessed during the local validation, since those were the ones for which the study coordinators (DHV, JTM) did not reach a consensus. Of these, only the 10 attributes that were not rated with an average agreement score of at least 4.0 were further discussed during the focus group. The participants of the focus group agreed on merging three out of five pairs of UA with similar definitions, preserving only the attribute that best encompassed both definitions. Therefore, by the end of the local validation, the glossary contained 43 UA to be externally validated. The list of the individual UA, their definitions and the ratings obtained in the global validation are available in Table 2. The full individual ratings obtained in both local and global validation stages are additionally included in Additional file 2: Annex 2. A summary of these ratings is shown in Table 3. Box plots showing the distribution of each type of rating among the 43 attributes are shown in Fig. 2. The median (Q1–Q3) response time for this survey was 2.74 (2.05–4.02) min for the introductory part and 6.85 (4.80–11.85) min for the UA batches. The survey reached 713 viewers worldwide, of whom 150 started the survey and 70 fully completed it (completion rate = 46.67%).
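As a quick arithmetic check of the reported response funnel, the completion rate follows directly from the counts given above.

```python
viewers, started, completed = 713, 150, 70
completion_rate = completed / started        # completions relative to those who started
print(f"completion rate = {completion_rate:.2%}")  # 46.67%
```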

Figure 2. Box plots for each of the three ratings assessed in the global validation stage for all the attributes.

Figure 3. Respondents per country in the global validation stage. The acronyms used are United States (US), Spain (ES), Switzerland (CH), Germany (DE), Italy (IT), Korea (KR), Netherlands (NL), France (FR), Belgium (BE), India (IN), New Zealand (NZ), Brazil (BR), Greece (GR), Indonesia (ID), Poland (PL), Canada (CA) and Iceland (IS).

KS tests indicated that neither the demographic data nor the ratings followed a normal distribution, as confirmed by the skewness and kurtosis values. Poor Spearman rank correlations (|ρ| < 0.3) [18] were found between all the ratings and the professional data from the respondents. These values are presented in Table 4.

Discussion

The objective of this work was to establish and validate a glossary of usability attributes aimed at improving usability evaluation practices to support the user-centered design of WRD. The established glossary, the RUG, provides a shared and validated terminology that is easily accessible and implementable by developers. To this end, our glossary facilitates the search and selection of context-specific outcome measures and usability research methods within the online Interactive Usability Toolbox (IUT) of ETH Zurich [14]. The generalizability and validity of the UA definitions included in our glossary were supported by the ratings of 70 developers of WRD from 17 countries around the world, who showed high agreement (≥ 4.0) on 32 of the 43 UA and moderate agreement (4.0 > agreement ≥ 3.5) on another 10 UA. Likewise, developers agreed on the relevance of most of these attributes in the field of WRD, with 27 UA considered highly relevant (≥ 4.0) and another 12 considered moderately relevant (4.0 > relevance ≥ 3.5). Improved definitions for the attributes considered relevant but with moderate or low agreement ratings are also proposed based on the feedback provided by the respondents. All the comments provided by the respondents and the improved definitions are included in Additional file 2: Annex 2.

The high agreement ratings for most of the UA included in our glossary underline that, despite the wide interpretation of UA in the literature [6, 7, 8, 9], our definitions are in general adequate and could serve as a reference for future studies or for people interested in comprehensive usability evaluation of WRD. It is interesting to highlight that most UA with moderate or high-to-moderate agreement ratings are terms usually found within the field of engineering, e.g. autonomy, complexity, robustness, technical requirements and wearability [11]. We hypothesize that most developers possess an engineering background, which may lead them to interpret these terms in alignment with engineering-based definitions. Consequently, when prompted to provide a perspective on these terms from a different field, such as usability, discrepancies may arise. Widening the perspective of research and development teams beyond engineering requirements is fundamental to promoting the development of WRD that are usable and effectively respond to users' needs [2].

A special case is that of ergonomics, the only attribute with low agreement but high relevance. Ergonomics is a very wide umbrella term used differently across fields and can thus be understood in different ways. In fact, this was the attribute that received the most comments. Rather than being considered part of usability, ergonomics has long been studied as a separate field of research interacting with usability [19], and there are longstanding international efforts, such as the Ergonomics Research Society or the International Ergonomics Association [20], that have stated definitions of the term ergonomics that can be adapted to suit specific fields. Consequently, several aspects of ergonomics also relate to usability, including other UA of our glossary such as comfort or wearability, and therefore some WRD developers might consider that the whole field of ergonomics cannot be synthesized as a single, specific UA. Due to its high relevance, we consider it crucial to integrate ergonomics into the IUT, enabling developers to access the available tools for assessing the ergonomics of WRD, even though condensing the entire field into a single UA may be an oversimplification. Based on the feedback provided by the respondents and the definitions stated by the aforementioned organizations, the improved definition for ergonomics in the RUG is “the degree to which the interactions among users and elements of a WRD are optimized to increase human well-being and overall system performance including anatomical, anthropometric, physiological and biomechanical characteristics that relate to the intended use of a WRD”.

Complementary to the high agreement ratings obtained, the high (27 out of 43) and moderate (12 out of 43) relevance ratings of most UA underscore the multifaceted nature of usability. This observation highlights that usability is not a singular, simplistic concept but rather a complex interplay of various dimensions and attributes [16]. Consequently, to conduct a comprehensive assessment of usability, multiple attributes of usability must be taken into consideration, highlighting the necessity for a holistic evaluation approach that transcends the prevalent trend in the field. Currently, the field predominantly relies on three dimensions to describe usability (i.e. effectiveness, satisfaction, and efficiency), and usability evaluation is predominantly related to functional or performance-related outcomes [21, 22], followed by the evaluation of ease of use, safety and comfort [16, 23], which may overlook the richness of usability. As expected, in our survey, many of the most widespread attributes related to the usability of WRD received very high relevance ratings (≥ 4.5): safety, usefulness, comfort, reliability, wearability, effectiveness, functionality, meet user needs, and satisfaction. However, efficiency received a high but not very high rating, indicating that other attributes are more relevant to developers than only the three stated by ISO 9241-11. The glossary provided within this study, which deems most UA relevant, signifies that the UA summarized and validated therein serve as pivotal elements that effectively encapsulate and represent the entirety of usability. A detailed analysis of the individual ratings (see Additional file 2: Annex 2) raises the need to debate whether the four attributes with relevance scores below 3.5 should be included in the glossary. Aesthetics and embodiment have borderline low-to-moderate relevance. Since they have previously been found to be design criteria important for the primary users of WRD under comparable terms such as “appearance” and “avoid machine body disconnection” [2], respectively, we consider that they should be included in the list of UA of the IUT. Both definitions stated for these UA have high agreement; therefore, they do not need improved descriptions but rather more awareness from developers to be included as part of their design criteria, because both have poor scores in this regard. In contrast, the UA technical requirements received a low relevance score and exhibited borderline moderate-to-low agreement among respondents. Comments associated with this attribute suggest that developers do not necessarily perceive it as an integral component of usability but rather believe that technical requirements and usability requirements are complementary in technology developments. Considering this valuable feedback, it is prudent to consider removing this attribute from the glossary. Meanwhile, pleasure stands as the only UA marked with a low relevance score, albeit displaying high agreement in its definition. A detailed examination of the definition provided for this UA shows that it could be closely intertwined with the attribute of satisfaction, which holds very high relevance in the field. Hence, it may be reasonable to also consider omitting pleasure from the set of UA. Both UA are closely related to two psychology-related codes expressed by end-users of lower limb robotic devices for gait rehabilitation, including “positive feeling of being able to stand up and walk again” and “sense of wellness (physical and/or mental)” [2], underlining their relevance for end-users.

Of the remaining 41 attributes, improved definitions were proposed for eight UA considered highly relevant (≥ 4.0) but with moderate (adaptability, complexity, ease of use, helpfulness, meet user needs, robustness, and wearability) or low (ergonomics) agreement ratings. In fact, most of these UA were among the ones that received the most comments: ergonomics (10 comments), adaptability, helpfulness, wearability, and technical requirements with 4 comments each, and robustness and durability with 3 comments each. Three of these attributes (ease of use, meet user needs, and wearability) are also often included as design criteria (ratings ≥ 4.0), underpinning the importance of providing definitions that are agreed upon by developers in the field.

Moreover, a detailed analysis of the boxplots in Fig. 2 and of the summary of the ratings in Table 3 shows that, while most of the attributes of the glossary are considered relevant in the field of WRD and there is high agreement with their proposed definitions, they have not often been included as design criteria in previous developments [16]. This can be confirmed by comparing the respondents' years of experience in the field (mdn = 7) with the number of dedicated usability studies performed (mdn = 2). Therefore, our study underlines that usability is still poorly considered as part of the design criteria during device development, even if developers recognize its relevance. In fact, 10 respondents (17.14%) indicated that they had not performed any dedicated usability study in their career, and two respondents (2.86%) reported they had never had contact with end-users of their devices. We consider that there must be a paradigm shift in WRD development towards implementing user-centered design to properly address users' needs [24, 25, 26], since developments that neither involve users [27] nor consider usability issues are unlikely to succeed in reaching end-users [1, 28, 29].

It is worth noting that the highest correlation among all the studied combinations was found between the ratings of “relevance in the field” and “previously included as design criteria in technology developments” (moderate correlation, ρ = 0.62, p-value ≈ 0.00). This could be explained by developers only including as design criteria the attributes they consider relevant and overlooking the ones they do not consider important. In fact, the eight UA seldom included as design criteria (ratings < 3.00) are not considered highly relevant in the field (relevance < 4.0). These are accessibility, aesthetics, autonomy, desirability, embodiment, error recovery, frustration, and pleasure. All of these UA exhibit high or moderate (only in the case of autonomy) agreement in their respective definitions. Therefore, their infrequent inclusion as design criteria, despite their moderate relevance scores, cannot be attributed to ambiguous definitions. Instead, this pattern illustrates that some UA are potentially less relevant in specific application cases of WRD, or it could arise from a lack of awareness of their significance from the perspective of end-users. It is important to note that all the listed UA originally emerged as design criteria demanded by primary or secondary end-users in a prior study on lower limb WRD [2].

A moderate correlation between the professional experience related to the “number of dedicated usability studies performed” and the “number of users personally interacted” was found (ρ = 0.55, p-value ≈ 0.00). This can be easily understood because the more usability studies performed, the more users are involved in these studies. Similarly, more users must be involved in usability evaluation as technology becomes more mature, which explains the positive correlation between higher TRLs and both the “number of usability studies performed” (ρ = 0.54, p-value ≈ 0.00) and “number of users personally interacted” (ρ = 0.52, p-value ≈ 0.00). In this regard, results show that the peak values for both user involvement and usability studies are in late TRLs (i.e. 6, 8 and 9), corresponding to the stages of prototypes validated and product. Similar results were found in a previous study [ 16 ], highlighting the relevance of user involvement to develop technologies that go beyond the prototype phase and successfully reach end-users [ 30 ].

Previous efforts to define usability in WRD [ 7 , 8 ] contained 17 attributes each and agreed on seven of them. Nonetheless, some of them are related to services that must be provided by the distributors of the WRD or are entirely device-centered. Moreover, in contrast to our work, none of these models validated the attributes and their definition within the local or global community of WRD developers, limiting the diffusion, impact, and generalizability of the proposed glossaries. Therefore, their selection of terms for what is considered usability was arbitrary, and some of the proposed definitions are not specifically related to usability. The RUG comprises all the UA included in previous efforts and provides definitions specifically related to usability, including the four UA included in the COST action dictionary and the factors and subfactors in the EXPERIENCE questionnaire from Eurobench [ 11 , 12 ]. The detailed comparison between these previous works in the field and the attributes of our glossary that encompass their definitions are presented in Additional file 3 : Annex 3.

Therefore, the RUG is the most comprehensive set of UA available in the field of WRD to evaluate usability and has been externally assessed and improved by developers from most of the active countries working in the field of WRD, thus enhancing its generalizability. It can be readily accessed through the IUT website ( www.usabilitytoolbox.ch ), enabling developers to have immediate open access to the definitions of each UA and to identify context-specific outcome measures and usability evaluation methods related to each attribute. Three examples are presented in Table  5 . The results of this study do not aim to point to specific attributes as being more important than others, but rather underline that all attributes should ideally be considered for a holistic usability evaluation. Despite the glossary being built entirely in English, it was mostly agreed upon by both native and non-native English speakers. In fact, all the definitions within our glossary are not aimed exclusively at the field of WRD but were rather built from a usability perspective. This means that they could possibly be useful to be implemented in other fields related to wearables, robotics, and health technologies overall. In case such interest arises, we recommend engaging developers from each specialized field to evaluate the significance of the attributes included in our glossary and the appropriateness of the proposed definitions within their respective domains. This evaluation is advised before directly implementing the current glossary.
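As a rough illustration of how such an attribute-driven lookup can work in principle, the sketch below maps selected UA to candidate evaluation methods; the attribute-method pairs are invented for illustration and do not reproduce the actual content of the IUT.

```python
# Hypothetical mapping from usability attributes to example evaluation methods.
# The pairs below are illustrative only, not the IUT's actual content.
METHODS_BY_ATTRIBUTE = {
    "comfort": ["pressure-distribution measurement", "post-session comfort questionnaire"],
    "ease of use": ["System Usability Scale", "think-aloud usability test"],
    "safety": ["adverse-event logging", "risk-assessment checklist"],
}

def suggest_methods(selected_attributes):
    """Return candidate evaluation methods for the selected usability attributes."""
    return {
        attribute: METHODS_BY_ATTRIBUTE.get(attribute, ["no method mapped yet"])
        for attribute in selected_attributes
    }

print(suggest_methods(["comfort", "safety"]))
```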

Limitations and future work

The estimated target sample size of the global validation stage was not fully met. Nevertheless, in line with the research team's previous online survey experience [16], all measures to reach the largest possible sample were taken. The survey was widely shared through several channels (e.g. social media, conferences, email lists, research centers and companies, the IUT website, and Exoskeleton Report) to reach WRD developers from different countries and from both academia and industry. Additionally, the data collection period was extended until there was no further increase in the responses gathered. To increase the completion rate, the survey was designed by dividing the glossary into the UA batches to guarantee a reasonable response time (below 10 min). Nevertheless, this introduces an additional limitation of the study, since not all respondents rated all UA, representing a possible confound. The authors gave priority to increasing the number of responses collected, since the main objective of the study was to obtain an external validation of the glossary with the participation of a wide sample of respondents.

Collecting the professional background information of the respondents in the global survey would have enabled us to explore potential correlations between each rating and the respondents' profiles. This is important because some respondents may have a technical, development-oriented perspective, while others might have professional backgrounds more closely aligned with being end-users of the technologies (e.g. clinicians or people with neurological injuries), thereby reflecting perspectives from real-life scenarios. The current study purposely targeted only technology developers because they are mostly the ones designing and conducting usability evaluations of WRD; therefore, we aimed to reach a consensus among them. Nevertheless, understanding that there might be differences between end-users and developers regarding the perception and relevance of the usability attributes, it would be interesting to perform another study targeting only end-users. Such a study would be aimed at comparing the understanding and relevance of the UA included in the RUG and at checking whether end-users identify additional usability attributes that ought to be added to the glossary. Such an effort would require a different survey and different distribution channels from the ones used in this work. We strongly suggest including a question to identify the background of the respondents in the survey and assessing possible differences in their responses. As indicated before, this is an important limitation of our study.

Another limitation of our effort is that the proposed methodology was aimed at reaching an external validation of the glossary but could instead be considered a participative assessment and improvement of the proposed definitions. Therefore, it remains a somewhat subjective methodology, because we did not implement our global validation stage as a truly iterative process with multiple rounds of evaluation in which participants could reach a consensus. Ideally, the global validation could have taken the form of an e-Delphi study [31], but such an approach is highly resource- and effort-demanding, which might have further limited the participation of developers. We consider that the participation of developers from several countries and with different native languages was fundamental to making the glossary generalizable, understandable, and representative for developers from all continents. For developers interested in translating the RUG into other languages, we strongly suggest that such a translation is performed carefully by native speakers with knowledge of the field, to make sure the specificity of the terms is preserved. Lastly, it might be worth regularly updating the RUG based on the potential emergence of new disruptive technologies, because WRD is still a developing field. Doing so is important to assess whether new attributes are needed when such devices appear in the field. A new survey can be carried out to this end. If performed, we strongly suggest also considering the application(s) of the WRD with which respondents have experience. This is important because the relevance of certain usability attributes can depend on the application of a given WRD, as already discussed in our paper. Alternatively, any other type of globally coordinated effort between leading organizations in the field of WRD can lead to an updated version of the RUG when considered necessary by the demands of the people working in the field.

Conclusions

Our glossary provides a comprehensive set of UA in the field of WRD to evaluate usability. The generalizability and relevance of these UA were supported by the ratings of 70 developers of WRD from 17 countries around the world. These results signify that the UA summarized and validated in our glossary serve as pivotal elements that effectively encapsulate and represent the entirety of usability. To conduct a comprehensive assessment of usability, multiple attributes of usability must be taken into consideration, in contrast to the prevalent trend in the field. Our study underlines that usability is still poorly considered part of the design criteria during device development, even if developers recognize its relevance. In this regard, there seems to be a lack of awareness of the significance, from the perspective of end-users, of some UA that are considered moderately relevant but seldom included during device development.

Overall, this effort is aimed at improving usability evaluation practices during the development of WRD by providing a shared and validated terminology that is easily accessible and implementable by developers, and that can lead to the definition of benchmarks for usability evaluation to promote the acceptance of WRD.

Availability of data and materials

All new data created in this study are provided in the manuscript and the three Additional files.

Rodríguez-Fernández A, Lobo-Prat J, Font-Llagunes J. Systematic review on wearable lower-limb exoskeletons for gait training in neuromuscular impairments. J NeuroEngineering Rehabil. 2021;18(1):22.


Herrera-Valenzuela D, Díaz-Peña L, Redondo-Galán C, Arroyo M, Cascante-Gutiérrez L, Gil-Agudo A, Moreno J, Del-Ama A. A Qualitative study to elicit user requirements for lower limb wearable exoskeletons for gait rehabilitation in spinal cord injury. JNER. 2023. https://doi.org/10.1186/s12984-023-01264-y .


Lajeunesse V, Vincent C, Routhier F, Careau E, Michaud F. Exoskeletons’ design and usefulness evidence according to a systematic review of lower limb exoskeletons used for functional mobility by people with spinal cord injury. Disabil Rehabil Assist Technol. 2016;11(7):535–47.


Koumpouros Y. A systematic review on existing measures for the subjective assessment of rehabilitation and assistive robot devices. J Healthc Eng. 2016;2016:1048964.

International Organization for Standardization. Ergonomics of human-system interaction. Part 11: Usability: definitions and concepts (ISO 9241-11:2018). Geneva: ISO; 2018.

Bryce T, Dijkers M, Kozlowsk J. Framework for assessment of the usability of lower-extremity robotic exoskeletal orthoses. Am J Phys Med Rehabil. 2015;94(11):1000–14.

Batavia A, Hammer G. Toward the development of consumer-based criteria for the evaluation of assistive devices. J Rehabil Res Dev. 1990;27(4):425–36.


Arthanat S, Bauer S, Lenker J, Nochajski S, Wu Y. Conceptualization and measurement of assistive technology usability. Disabil Rehabil Assist Technol. 2010;2(4):235–48.

Fuhrer M, Jutai J, Scherer M, DeRuyter F. A framework for the conceptual modelling of assistive technology device outcomes. Disabil Rehabil. 2003;25(22):1243–51.

EUROBENCH. Eurobench. https://eurobench2020.eu/. Accessed 27 Feb 2023.

Massardi S, Briem K, Veneman J, Torricelli D, Moreno J. Re-defining wearable robots: a multidisciplinary approach towards a unified terminology. JNER. 2023;10:1068.


EUROBENCH. User-centered assessment of exoskeleton-assisted overground walking. June 2022. https://platform.eurobench2020.eu/protocols/info/43. Accessed 12 Aug 2022.

ASTM International. Subcommittee F48.91 on Terminology. 2021. https://www.astm.org/get-involved/technical-committees/committee-f48/subcommittee-f48/jurisdiction-f4891. Accessed 26 June 2023.

Rehabilitation Engineering Lab, ETH Zürich. The Interactive Usability Toolbox. 2020. www.usabilitytoolbox.ch. Accessed 22 June 2023.

Meyer JT, Tanczak N, Kanzler CM, Pelletier C, Gassert R, Lambercy O. Design and validation of a novel online platform to support the usability evaluation of wearable robotic devices. Wearable Technol. 2023;4: e3.

Meyer J, Gassert R, Lambercy O. An analysis of usability evaluation practices and contexts of use in wearable robotics. J NeuroEngineering Rehabil. 2021;18:170.

Dillman D, Smyth J, Christian L. Internet, phone, mail, and mixed-mode surveys: the tailored design method. Washington: Wiley; 2014.


Rovetta A. Raiders of the lost correlation: a guide on using Pearson and Spearman coefficients to detect hidden correlations in medical sciences. Cureus. 2020;12(11): e11794.


Wegge KP, Zimmermann D. Accessibility, usability, safety, ergonomics: concepts, models, and differences. In: Universal Access in Human Computer Interaction. Coping with Diversity. UAHCI 2007. Lecture Notes in Computer Science, vol 4554. Berlin: Springer; 2007. p. 294–301.

International Ergonomics Association. What is Ergonomics (HFE)? https://iea.cc/about/what-is-ergonomics/. Accessed 20 Sep 2023.

Pinto-Fernandez D, et al. Performance evaluation of lower limb exoskeletons: a systematic review. IEEE Trans Neural Syst Rehabil Eng. 2020;28(7):1573–83.

Contreras-Vidal J, et al. Powered exoskeletons for bipedal locomotion after spinal cord injury. J Neural Eng. 2016;13: 031001.

Gantenbein J, Dittli J, Meyer J, Gassert R, Lambercy O. Intention detection strategies for robotic upper-limb orthoses: A scoping review considering usability, daily life application, and user evaluation. Front Neurorobot. 2022;16: 815693.

Hill D, Holloway C, Morgado Ramirez D, Smitham P, Pappas Y. What are user perspectives of exoskeleton technology? A literature review. Int J Technol Assess Health Care. 2017;33(2):160–7.

Brown-Triolo D, Roach M, Nelson K, Triolo R. Consumer perspectives on mobility: Implications for neuroprosthesis design. J Rehabil Res. 2002;39:659–70.

Cowan R, Fregly B, Boninger M, Chan L, Rodgers M, Reinkensmeyer D. Recent trends in assistive technology for mobility. J Neuroeng Rehabil. 2012;9:20.

Power V, de Eyto A, Hartigan B, Ortiz J, O’Sullivan L. Application of a user-centered design approach to the development of XoSoft – a lower body soft exoskeleton. Biosystems & Biorobotics. 2018;22:44–8.

McMillen A, Söderberg S. Disabled persons’ experience of dependence on assistive devices. Scand J Occup Ther. 2002;9:176–83.

Salesforce. State of the Connected Customer. San Francisco: Salesforce; 2020.

Tolikas M, Antoniou A, Ingber D. The wyss institute: a new model for medical technology innovation and translation across the academic-industrial interface. Bioeng Transl Med. 2017;2(3):247–57.

Barrett D, Heale R. What are Delphi studies? Evid Based Nurs. 2020;23:68–9.

Download references

Acknowledgements

We are grateful to all the WRD developers who participated in this study and the different surveys. Their participation and feedback ensure that our glossary considers the perspectives of researchers and developers from all around the globe. As an international community, this helps us to improve the usability of our devices and thus provide better solutions to users. We would also like to thank all the people from the institutions and consortiums that cooperated in this research: the Rehabilitation Engineering Laboratory and the Sensory-Motor Systems Lab from ETH Zurich, the Neural Rehabilitation Group from the Spanish Research Council, and the European Cooperation in Science and Technology (COST) action for wearable robotics.

Open access funding provided by Swiss Federal Institute of Technology Zurich. This work was supported in part by the Vontobel Foundation and the ETH Zurich Foundation in collaboration with Hocoma AG, as well as by the National Research Foundation Singapore (NRF) under its Campus for Research Excellence and Technological Enterprise (CREATE) programme.

Author information

Authors and affiliations.

International Doctoral School, Rey Juan Carlos University, Madrid, Spain

Diana Herrera-Valenzuela

Biomechanics and Technical Aids Unit, National Hospital for Paraplegics, Toledo, Spain

Rehabilitation Engineering Laboratory, Department of Health Sciences and Technology, ETH Zurich, Zurich, Switzerland

Jan T. Meyer, Roger Gassert & Olivier Lambercy

School of Science and Technology, Department of Applied Mathematics, Materials Science and Engineering and Electronic Technology, Rey Juan Carlos University, Móstoles, Madrid, Spain

Antonio J. del-Ama

Neural Rehabilitation Group, Cajal Institute, CSIC–Spanish National Research Council, Madrid, Spain

Juan C. Moreno

Unit of Neurorehabilitation, Biomechanics and Sensorimotor Function (HNP-SESCAM), Associated Unit of R&D&I to the CSIC, Toledo, Spain

Future Health Technologies, Singapore-ETH Centre, Campus for Research Excellence and Technological Enterprise (CREATE), Singapore, Singapore

Roger Gassert & Olivier Lambercy


Contributions

The concept and research design were developed by DHV, JTM and OL. The main manuscript text was written by DHV. Data collection was performed by DHV and JTM. DHV performed the data analysis. AJA, JCM and JTM provided liaison with other organizations that have undertaken related work on the usability of WRDs. All authors helped to distribute the global survey through their networks and revised, significantly improved, and approved the final version of the manuscript. All authors have read and agreed to the published version of the manuscript. The funding for this study was obtained by RG and OL.

Corresponding authors

Correspondence to Diana Herrera-Valenzuela or Olivier Lambercy .

Ethics declarations

Ethics approval and consent to participate.

The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.

Competing interests

OL is a member of the Editorial Board of Journal of NeuroEngineering and Rehabilitation. OL was not involved in the journal’s peer review process of, or decisions related to, this manuscript. The other authors declare that they have no competing interests. All the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Template of the global survey used to rate the agreement, relevance and recorded use of the usability attributes included in the glossary.

Additional file 2.

Definitions of each usability attribute and their individual ratings obtained in both local and global validation stages.

Additional file 3.

Detailed comparison between the other usability models available in the field and the usability attributes included in the RUG.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Herrera-Valenzuela, D., Meyer, J.T., del-Ama, A.J. et al. Towards a validated glossary of usability attributes for the evaluation of wearable robotic devices. J NeuroEngineering Rehabil 21 , 30 (2024). https://doi.org/10.1186/s12984-024-01312-1


Received : 02 November 2023

Accepted : 24 January 2024

Published : 28 February 2024

DOI : https://doi.org/10.1186/s12984-024-01312-1


Usability Evaluation of Dashboards: A Systematic Literature Review of Tools

Sohrab Almasi

1 Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran

Kambiz Bahaadinbeigy

2 Digital Health Team, Australian College of Rural and Remote Medicine, Brisbane, QLD, Australia

3 Medical Informatics Research Center, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran

Hossein Ahmadi

4 Centre for Health Technology, Faculty of Health, University of Plymouth, Plymouth PL4 8AA, UK

Solmaz Sohrabei

Reza Rabiei

Associated Data

All data generated or analyzed during this study are included in this published article. The data used to support the findings of this study are included within the supplementary information file(s).

Abstract

In recent years, the use of dashboards in healthcare has been considered an effective approach for the visual presentation of information to support clinical and administrative decisions. Effective and efficient use of dashboards in clinical and managerial processes requires a framework for the design and development of tools based on usability principles.

The present study is aimed at investigating the existing questionnaires used for the usability evaluation framework of dashboards and at presenting more specific usability criteria for evaluating dashboards.

This systematic review was conducted using PubMed, Web of Science, and Scopus, without any time restrictions. The final search of articles was performed on September 2, 2022. Data collection was performed using a data extraction form, and the content of selected studies was analyzed based on the dashboard usability criteria.

After reviewing the full text of relevant articles, a total of 29 studies were selected according to the inclusion criteria. Regarding the questionnaires used in the selected studies, researcher-made questionnaires were used in five studies, while 25 studies applied previously used questionnaires. The most widely used questionnaires were the System Usability Scale (SUS), Technology Acceptance Model (TAM), Situation Awareness Rating Technique (SART), Questionnaire for User Interaction Satisfaction (QUIS), Unified Theory of Acceptance and Use of Technology (UTAUT), and Health Information Technology Usability Evaluation Scale (Health-ITUES), respectively. Finally, dashboard evaluation criteria, including usefulness, operability, learnability, ease of use, suitability for tasks, improvement of situational awareness, satisfaction, user interface, content, and system capabilities, were suggested.

General questionnaires that were not specifically designed for dashboard evaluation were mainly used in reviewed studies. The current study suggested specific criteria for measuring the usability of dashboards. When selecting the usability evaluation criteria for dashboards, it is important to pay attention to the evaluation objectives, dashboard features and capabilities, and context of use.

1. Introduction

Nowadays, healthcare organizations encounter various forms of information chaos, such as information overload, erroneous information, scattered information, and incompatibility of information with job requirements [ 1 ]. Meanwhile, effective and efficient use of data in managerial and clinical decision-making can be complicated because of the massive amount of data, data collection from various sources, and lack of data organization, which can lead to increased errors [ 2 ], delayed service delivery [ 3 ], and patient safety risks [ 4 ]. Agile healthcare organizations use relevant data in their daily operational decisions, ranging from supply chain management and staff planning to care delivery planning and community health management [ 5 ].

Healthcare systems are increasingly using business intelligence systems for monitoring performance indicators [ 5 ]. According to Loewen and Roudsari, these systems are used for collecting, analyzing, and presenting organizational data to intended users in their required format in line with meeting organizational objectives [ 6 ]. Dashboards are one of these systems widely used in the healthcare settings. Through data visualization, dashboards provide practical feedback to improve performance, promote the use of evidence-based methods, and enhance workflow and resource management [ 7 , 8 ]. These tools also use visual representations, such as charts and color coding, to facilitate the interpretation of information [ 8 , 9 ].

Generally, dashboards, as data management tools, collect data from various information systems and present them based on key performance indicators in a concise, comprehensive, meaningful, and intelligent manner. Additionally, dashboards provide useful information to managers to enable them to check their performance at a glance, easily identify the existing problems and their leading causes, and take necessary actions for performance improvement [ 10 , 11 ]. Nevertheless, development of dashboards is a complex process, as the information needs of users are completely dependent on the context of use and factors, such as clinical environment, occupational roles, and patient population, which also influence the selection of proper data elements, visualizations, and interactive capabilities [ 12 – 14 ]. Therefore, in the design of dashboards, particular attention must be paid to usability principles and human factors to deliver interactive and data sharing capabilities [ 15 ].

In order to have efficient dashboards for clinical and managerial decisions, these tools should have no or minor usability problems. One of the methods to ensure the proper design of software programs and health information systems, such as dashboards, is to use proper evaluation criteria for system usability. Generally, usability evaluation deals with various software features, including the ease of learning, efficiency, ease of use, memorization, error prevention, and user satisfaction. According to the ISO 9241-11, usability can be defined as “the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use” [ 16 ]. This definition refers to the user's experience of human-machine interactions. Regardless of the product type, it is not only important to achieve specific goals but also the user's satisfaction and experience of the system are significant [ 16 ]. For dashboards, similar to other information systems, usability can be defined as “the extent to which a system is used by users to achieve specific goals with high efficacy, efficiency, and satisfaction” [ 17 ].

One of the most well-known classifications of usability evaluation methods was developed by Nielsen [ 18 ] and Holzinger [ 19 ]. According to this classification, usability evaluation methods can be divided into two categories: usability inspection and usability testing. The first category refers to experts inspecting the user interface design against standards using inspection techniques. Usability inspection is primarily aimed at identifying the usability problems of a design [ 20 ], although it can also be applied to determine the user interface characteristics of systems that have not yet been implemented. The main methods of usability inspection include (1) heuristic evaluation, (2) cognitive walkthrough, and (3) action analysis [ 21 ].

The process of usability testing is different from that of usability inspection. In usability testing, several end users, acting as representatives of the wider user population, perform a series of tasks using a prototype system so that experts can detect usability problems by observing their performance. These methods provide direct access to information on how users employ systems [ 19 ]. Some of the most common usability testing methods include (1) paper and pencil tests, (2) think aloud, (3) codiscovery, (4) field observation, (5) query techniques, (6) questionnaires, and (7) card sorting [ 21 ].

Questionnaires have been employed as usability testing methods to collect the users' demographic data and opinions [ 22 ]. In recent years, various questionnaires have been developed to evaluate usability dimensions [ 22 ]. The most well-known questionnaires for usability testing include the Computer System Usability Questionnaire (CSUQ), Post-Study System Usability Questionnaire (PSSUQ), Questionnaire for User Interaction Satisfaction (QUIS), Software Usability Measurement Inventory (SUMI), System Usability Scale (SUS), Usability Metric for User Experience (UMUX and UMUX-Lite), and Usefulness, Satisfaction, and Ease of Use (USE) [ 21 , 22 ].
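As an illustration of how such instruments yield a usability score, the sketch below implements the standard SUS scoring rule (ten items rated 1-5; odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a 0-100 score). The function name and the example responses are illustrative only.

```python
def sus_score(responses):
    """Compute a System Usability Scale (SUS) score from ten 1-5 responses.

    Standard SUS scoring: odd-numbered items contribute (response - 1),
    even-numbered items contribute (5 - response); the total is multiplied
    by 2.5 to give a score between 0 and 100.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten responses on a 1-5 scale")
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # indices 0, 2, ... are items 1, 3, ...
        for i, r in enumerate(responses)
    )
    return total * 2.5


# Illustrative responses from one hypothetical participant
print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))  # 80.0
```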

Our search indicated that the questionnaires used for the usability evaluation of dashboards are not specially designed for this purpose, and they could fail to appropriately measure the main capabilities and features of these systems.

On the other hand, previous studies have mainly focused on identifying important functional and nonfunctional requirements of healthcare dashboards [ 8 , 9 ], the effect of dashboards on patient outcomes and healthcare provider satisfaction [ 12 , 17 ], and developing frameworks for designing dashboards [ 13 ].

Given the role of dashboards in the decision-making process and the multiplicity of questionnaires, it can be challenging to select a proper questionnaire for the usability evaluation framework of dashboards. Since no study has yet presented a framework or tool for evaluating the usability of dashboards, the present study is aimed at reviewing the existing questionnaires for the usability evaluation of dashboards and at providing appropriate criteria for such assessments.

2. Methods

2.1. Data Sources and Search Strategy

The search and data extraction stages were performed based on the PRISMA checklist [ 23 ]. Articles were extracted by searching the PubMed, Web of Science, and Scopus databases. A combination of MeSH terms and keywords related to dashboards, usability, and questionnaires was used for the search strategy ( Table 1 ). The final search of articles was carried out without any time restrictions. Two researchers (SA and SS) searched and retrieved articles independently, and any disagreement was discussed with the senior author (RR).

Table 1: The keywords used in the search strategy.

2.2. Inclusion and Exclusion Criteria

2.2.1. Inclusion Criteria

The inclusion criteria were as follows: (1) English articles published on the design, implementation, and evaluation of dashboards in healthcare settings, including clinics, hospitals, or any healthcare center providing services for disease prevention, treatment, rehabilitation, and medical education and (2) the use of questionnaires for evaluating dashboards.

2.2.2. Exclusion Criteria

The exclusion criteria were as follows: (1) non-English studies, (2) focusing on only dashboard design or dashboard evaluation, (3) use of evaluation methods other than questionnaires to evaluate usability, and (4) lack of access to the full text of articles.

2.3. Study Selection, Article Evaluation, and Data Extraction

In the study selection phase, two authors (SS and SA) performed screening, selection, and full-text review, and two authors (KB and HA) performed qualitative evaluations of papers; any disagreement was resolved through discussion with the senior author (RR). The quality of each study was assessed using the Joanna Briggs Institute (JBI) critical appraisal tools. The JBI-MAStARI instrument was used for RCTs and quasiexperimental studies (nonrandomized experimental studies) [ 24 ]. The RCT checklist contains 13 questions and the quasiexperimental checklist contains 9 questions, each answered with one of four options (“yes,” “no,” “unclear,” or “not applicable”).

One point was assigned for each “yes” answer; if 70% or more of the questions were answered “yes,” the risk of bias was considered low. The risk of bias was regarded as “moderate” for 50-60% “yes” answers, and as “high” when “yes” responses fell below 50% (Appendix A Table A1 and Appendix A Table A2).
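A minimal sketch of this categorisation rule is shown below. The thresholds follow the percentages stated above; treating the unspecified 60-70% band as moderate is an assumption, and the function name and example counts are illustrative.

```python
def jbi_risk_of_bias(yes_count, total_items):
    """Categorise risk of bias from the share of 'yes' answers on a JBI checklist.

    Thresholds as reported above: >=70% 'yes' answers -> low risk, roughly
    50-60% -> moderate, below 50% -> high. The 60-70% band is treated as
    moderate here, which is an assumption (the text leaves it unspecified).
    """
    pct_yes = 100 * yes_count / total_items
    if pct_yes >= 70:
        return "low"
    if pct_yes >= 50:
        return "moderate"
    return "high"


# The RCT checklist has 13 items; the quasiexperimental checklist has 9.
print(jbi_risk_of_bias(10, 13))  # low (about 77% 'yes')
print(jbi_risk_of_bias(5, 9))    # moderate (about 56% 'yes')
print(jbi_risk_of_bias(4, 13))   # high (about 31% 'yes')
```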

For data extraction, the features of questionnaires, including the number and scoring of questions, criteria, and reliability, were first investigated ( Table 2 ). Next, the year of the study, country of the study, evaluation criteria for dashboards, and questionnaires used for the evaluation of dashboards were extracted for each article and entered into Microsoft Excel for analysis (Appendix B Table A3 ). Moreover, for data extraction, the questionnaires were assessed, and the evaluation criteria for dashboards were extracted ( Table 3 ). The reasons for selecting or removing each criterion for dashboard evaluation in the questionnaires are presented (Appendix C Table A4 ).

Table 2: Characteristics of the questionnaires.

TAM: Technology Acceptance Model; UTAUT: Unified Theory of Acceptance and Use of Technology; SUS: System Usability Scale; SART: Situation Awareness Rating Technique; Health-ITUES: Health Information Technology Usability Evaluation Scale; PSSUQ: Post-Study System Usability Questionnaire; QUIS: Questionnaire for User Interaction Satisfaction; CSUQ: Computer System Usability Questionnaire; EUCS: End-User Computing Satisfaction Model; DATUS: Dashboard Assessment Usability Model; GR: Global Reliability; NR: not reported.

Table 3: Usability evaluation criteria for dashboards.

3. Results

A total of 1214 articles were retrieved after searching the databases. Using EndNote software, 108 duplicate articles were removed, and 1106 articles remained. After reviewing the titles and abstracts of studies, 1002 articles were removed, and 105 articles remained. Finally, by reviewing the full text of studies, 75 articles were removed, and 29 articles were included in the present study. The article selection process is presented in Figure 1.

Figure 1: The study flow diagram based on the PRISMA guidelines.

3.1. Quality Assessment

Based on the quality assessment of articles using the Joanna Briggs Institute (JBI) appraisal tool, among the nonclinical studies, 8 (31%) articles were rated as having a moderate risk of bias, while 18 (69%) were placed in the low-risk group (Appendix A Table A1). Additionally, three clinical trials were evaluated using the JBI tool, all of which were placed in the low-risk group (Appendix A Table A2).

3.2. General Characteristics of Studies

According to our review of the selected studies, most articles (89%) were descriptive, comprising 23 cross-sectional studies, three case report studies, and one longitudinal study, while three (11%) were experimental and clinical trials. As shown in Figure 2, the number of articles focusing on dashboards in healthcare is increasing. Concerning the location of studies, the majority were conducted in the United States (39%), followed by England (14%), Germany (7%), and South Korea (7%).

Figure 2: Number of publications by year.

Five studies used researcher-made questionnaires, while 24 studies used existing questionnaires. In five studies, two questionnaires were used to evaluate dashboard usability. The most widely employed questionnaires were the System Usability Scale (SUS), Technology Acceptance Model (TAM), Situation Awareness Rating Technique (SART), Questionnaire for User Interaction Satisfaction (QUIS), Unified Theory of Acceptance and Use of Technology (UTAUT), and Health Information Technology Usability Evaluation Scale (Health-ITUES), respectively ( Figure 3 ).

Figure 3: Number of questionnaires used in previous studies.

3.3. Usability Evaluation Criteria for Dashboards

According to the review of the questionnaires used in the included studies ( Table 3 ), the following criteria were identified for dashboard evaluation: usefulness, operability, learnability, ease of use, suitability for tasks, improvement of situational awareness, satisfaction, user interface, content, and system capabilities.

3.3.1. Usefulness

Usefulness is usually defined as meeting a customer's needs or providing a competitive advantage through the product's attributes or benefits. Designers generally aim to deliver useful products. In the reviewed studies, the “usefulness” criterion was used in place of “effectiveness and efficiency,” and it appeared in four of the questionnaires used to evaluate the usability of dashboards: the Health-ITUES, PSSUQ, CSUQ, and TAM.

3.3.2. Operability

Operability refers to a user's ability to use and control a dashboard when performing their tasks. In the present study, operability included criteria such as detailed representation of data, access to various filters and reports, the ability to correct errors, and user support; user control is also measured under the “operability” criterion.

3.3.3. Learnability

Learnability is a quality of the software interface that allows users to quickly become familiar with it and make good use of all its features and capabilities.

3.3.4. Ease of Use

It is a fundamental concept explaining how easily users can employ a dashboard. This criterion was used for dashboard evaluation in the EUCS, Health-ITUES, and TAM questionnaires.

3.3.5. Suitability for Tasks

This criterion assesses whether a product or system is appropriate for users' needs. It covers support for users' daily activities and the compatibility and organization of on-screen data with the users' tasks.

3.3.6. Improvement of Situational Awareness

Situation awareness at a fundamental level is about understanding what is going on and what might happen next. The criteria for evaluating situational awareness were divided into instability representation, complexity representation, variability representation, arousal support, concentration support, spare mental capacity support, and division of attention.

3.3.7. Satisfaction

This criterion refers to satisfaction with the features, capabilities, and ease of use of a dashboard.

3.3.8. User Interface

It consists of visual and interactive tools. Visual tools in a dashboard involve color coding for data visualization, histogram plots, pie charts, bar graphs, gauges, data labels, and geographic maps. The interactive techniques also include customizable searching, summary view, drill up and drill down, data ordering and filtering, zoom in and zoom out, and real-time feature.

3.3.9. Content

This criterion involves the quantity and quality of data displayed by a dashboard. The quantity of displayed data was measured using two questionnaires (SART and PSSUQ), while quality was measured using SART. The amount of displayed data and their compatibility with the users' tasks were also evaluated, and data accuracy, timeliness (being up-to-date), comprehensiveness, and relevance were used for measuring data quality.

3.3.10. System Capabilities

This criterion assesses software in terms of its compatibility with work-related requirements. Dashboard capabilities are evaluated to determine how well the dashboard fits work-related processes and how well it satisfies the users' data requirements.

4. Discussion

In the present study, questionnaires used in previous research were reviewed to suggest criteria for dashboard evaluation. Generally, questionnaires are the most commonly used tools for usability evaluation because of the simplicity of data analysis [ 53 , 54 ]. According to the findings, although SUS does not cover the efficiency, memorability, or error criteria and consists of a series of general questions for usability evaluation [ 55 ], it was the most widely used tool for dashboard evaluation. In four studies, SUS was used along with other questionnaires for dashboard evaluation [ 32 – 35 ].

In the study by Hajesmaeel-Gohari et al., the SUS questionnaire was also the most widely used tool for measuring usability [ 56 ]. In the study by Sousa and Dunn Lopez, which aimed to identify the questionnaires used for usability evaluation of electronic health tools, the criteria most often covered by the investigated questionnaires were learnability, efficiency, and satisfaction, while memorability was the least used criterion [ 57 ].

In the present study, “satisfaction” and “learnability” were proposed as two key criteria for evaluating the usability of dashboards, and “efficiency” was proposed as one of the subcriteria of “usefulness.” One criterion, memorability, was not included in the proposed framework, as learnability could cover the required metrics.

To take advantage of usability evaluation tools, it is important to pay attention to the study objectives, the technologies used, and the context of use [ 53 , 58 , 59 ]. ISO/IEC 25010 includes criteria such as suitability for tasks, learnability, operability, user error protection, user interface aesthetics, and accessibility [ 60 ]. ISO 9241-11 also suggests measures such as effectiveness, efficiency, and satisfaction for usability evaluation [ 60 ]. Additionally, Nielsen's criteria, namely efficiency, memorability, errors, learnability, and satisfaction, have been used for evaluating dashboards [ 61 ]. In the current study, usefulness was used in place of the effectiveness and efficiency criteria, and it appeared in four questionnaires: the Health-ITUES, PSSUQ, CSUQ, and TAM.

In general, TAM and UTAUT are the most widely used acceptance models in health informatics because of their simplicity, and they mainly focus on the perceived usefulness and ease of use of technology [ 56 ].

The dashboard “operability” criterion in the current study refers to the user's control over the software, the ability to correct errors, and quick recovery. In previous studies, the “operability” criterion has referred to measures such as error correction, error correction in use, default value availability in use, message understandability, self-explanatory error messages, operational error recoverability in use, and time between human error operations in use [ 62 ]. Moreover, improvement of situational awareness was considered one of the evaluation criteria for dashboards. Dashboards present key data that should be monitored effectively so that users are aware of what is occurring in their work environment. The results of previous studies indicate that dashboards have the potential to accelerate data collection, decrease cognitive load, reduce errors, and improve situational awareness in healthcare settings [ 8 , 16 ].

Additionally, the “user interface” criterion covers the components through which a user interacts with the system, including hardware (e.g., a keyboard, mouse, or microphone) and software elements (e.g., graphic forms, language tools, and interactive tools) [ 22 ]. With respect to the user interface of dashboards, the application of visual and interactive features was suggested in the present study, considering data representation and interactive visualization as critical features [ 63 ]. Visualization systems, such as dashboards, serve two main functions: representation and interaction [ 64 ]. Besides interactive features, it is also essential to consider visual features for an effective and understandable representation of indicators, which can lead to effective interaction with data and instantaneous monitoring of performance indices [ 61 , 65 ]. In Shneiderman's study, interactive features included overview, zoom, filter, details-on-demand, relate, history, and extraction [ 66 ]. In addition, interactive techniques in M. Khan and S. Khan's study included zoom and pan, overview and detail, and filtering [ 67 ].

In the current study, the quantity and quality of data represented by dashboards were considered as the content criteria. In the EUCS questionnaire, being up-to-date is considered as a separate criterion for dashboard evaluation, while being up-to-date, accurate, comprehensive, and relevant were considered as data quality features in previous research [ 68 , 69 ]; consequently, in the present study, these features were considered for data quality. Data quality refers to data integrity, data standardization, data granularity, and data completeness, which are essential for a well-designed dashboard. Data integrity indicates whether a dashboard could provide information on data sources, collection methods, and representativeness [ 68 ].

Furthermore, “system capabilities,” which covers dashboard features and capabilities, was regarded as a separate criterion for evaluating dashboards in the present study. To design a dashboard, functional and nonfunctional requirements should be taken into consideration. The functional requirements of dashboards denote the key functions of a system related to operations carried out or facilitated using that system. Nonfunctional requirements, on the other hand, are a set of specifications that are not directly related to users' tasks but could improve the system's functionality [ 9 , 70 ].

Finally, it can be acknowledged that both quantitative and qualitative methods play a significant role in technology development and progress. While quantitative methods have some advantages, such as cost-effectiveness and higher suitability for studies with a large sample size, qualitative methods (e.g., think aloud) are beneficial for providing details about problems to which quantitative methods do not commonly apply [ 57 ]. Additionally, qualitative data analysis of user's behaviors and routines and a variety of other information are essential to deliver a product that actually fits into a user's needs or desires [ 71 ]. A combination of qualitative and quantitative approaches is suggested to appropriately measure the usability of technologies [ 57 ].

5. Strengths and Limitations

Since no study has yet designed a tool for evaluating the usability of dashboards in healthcare, this systematic review carried out a comprehensive analysis to identify usability evaluation criteria for dashboards. The criteria were extracted by investigating the questionnaires used in the 29 included studies. However, the current study has limitations. First, although these studies provided a foundation for conducting our review and suggesting relevant criteria, further work is required to investigate how well the suggested criteria perform in practice; we have designed such a study to address this limitation. Second, this review focused only on quantitative studies and usability questionnaires, while qualitative approaches could help to provide a more robust basis for dashboard evaluation. Nevertheless, we aimed to provide a foundation for researchers who wish to measure different aspects of dashboards quantitatively, which is a widely used evaluation approach. In addition, we focused on literature published in English and might have missed relevant studies published in other languages.

6. Conclusion

Dashboards, as data management tools, play a crucial role in the decision-making and management of clinical and administrative data; therefore, they should be free of any usability-related problems. In this study, by reviewing the existing questionnaires used for the usability evaluation of dashboards, some criteria were suggested for evaluating dashboards, including usefulness, operability, learnability, ease of use, suitability for tasks, improvement of situational awareness, satisfaction, user interface, content, and system capabilities. When choosing criteria for the usability evaluation of dashboards, the study objectives, dashboard features and capabilities, and context of use should be taken into consideration.

Data Availability

Conflicts of interest.

The authors declare that they have no conflict of interest.

Authors' Contributions

Concept and design were carried out by SA, KB, and RR. Literature search and quality check were carried out by SA, SS, and RR. Data analysis and interpretation were carried out by SA, SS, HA, and RR. Manuscript drafting was carried out by SA and SS. Editing and critical review were carried out by RR, KB, and HA. All authors read and approved the final manuscript.

Supplementary Materials

Table A1: appraisal result of study quality for quasiexperimental studies using the JBI-MAStARI. Table A2: appraisal result of study quality for the RCTs using the JBI-MAStARI. Table A3: examining dashboard evaluation criteria in included articles. Table A4: dimensions to measure usability discarded from the model.


International Conference on Human-Computer Interaction

HCII 2022: HCI International 2022 - Late Breaking Papers. Design, User Experience and Interaction, pp 499–516

Mobile Applications Usability Evaluation: Systematic Review and Reappraisal

  • Jiabei Wu 16 &
  • Vincent G. Duffy 16  
  • Conference paper
  • First Online: 05 October 2022

1363 Accesses

1 Citations

Part of the Lecture Notes in Computer Science book series (LNCS,volume 13516)

To be competitive in the mobile applications market, designing applications that users actually want to use is of great significance, and usability plays an indispensable role in shaping usage intention. Mobile application usability evaluation is therefore vital in the design process. This report is a systematic review of mobile application usability evaluation based on bibliometric analysis. We searched Scopus and Web of Science for documents and conducted trend analysis, keyword co-occurrence analysis, co-authorship analysis, co-citation analysis, leading tables, and word clouds. Research on this topic has grown in popularity in recent years. Mobile application usability is influenced by the user, environment, task/activity, and technology; it is evaluated through different usability attributes from different perspectives; and it in turn affects technology acceptance, adoption, and retention [ 7 ]. Lab experiments and field studies are frequently used evaluation methods [ 5 – 7 ]. Subjective questionnaires, such as the SUS (System Usability Scale) [ 12 ] and the USE (Usefulness, Satisfaction, Ease of use) questionnaire [ 17 ], are frequently applied to capture participants' subjective impressions after using mobile applications; heuristic evaluation is another subjective method [ 15 ]. Objective metrics, such as task completion time, task completion rate, and time spent on first use, are also widely used in mobile application usability evaluation [ 5 , 6 , 35 ]. Future work could draw papers from additional databases and include more studies in the review; differences in contexts, users, and goals should be considered in mobile application usability evaluation; and security is a prominent factor in mobile application development.

  • Mobile applications
  • Bibliometric analysis
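As a concrete illustration of the objective metrics named in the abstract above, the sketch below computes a task completion rate and a mean completion time from a hypothetical usability test log. The data layout and the convention of averaging times over successful trials only are assumptions for illustration, not details taken from the reviewed studies.

```python
from statistics import mean


def task_metrics(trials):
    """Summarise two common objective usability metrics from a test log.

    Each trial is a dict with 'completed' (bool) and 'seconds' (task time).
    Returns the completion rate over all trials and the mean time over
    successful trials only (a common, though not universal, convention).
    """
    completion_rate = sum(t["completed"] for t in trials) / len(trials)
    times = [t["seconds"] for t in trials if t["completed"]]
    mean_time = mean(times) if times else None
    return completion_rate, mean_time


# Hypothetical log of five participants attempting the same task
log = [
    {"completed": True, "seconds": 42.0},
    {"completed": True, "seconds": 55.5},
    {"completed": False, "seconds": 120.0},
    {"completed": True, "seconds": 38.2},
    {"completed": True, "seconds": 61.3},
]
rate, avg_time = task_metrics(log)
print(f"completion rate: {rate:.0%}, mean time on success: {avg_time:.1f} s")
```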


Hoehle, H., Venkatesh, V.: Mobile application usability: conceptualization and instrument development. MIS Q. 39 (2), 435–472 (2015)


Wei, J., Dong, S.Y.: Mobile systems design and evaluation. In: Handbook of Human Factors and Ergonomics, 5th edn., pp. 1037–1057. Wiley, Hoboken (2021)


Venkatesh, V., Bala, H.: Technology acceptance model 3 and a research agenda on interventions. Decis. Sci. 39 (2), 273–315 (2008)

International Organization for Standardization: Ergonomics of human-system interaction - Part 11: Usability: Definitions and concepts. https://www.iso.org/obp/ui/#iso:std:iso:9241:-11:ed-2:v1:en

Harrison, R., Flood, D., Duce, D.: Usability of mobile applications: literature review and rationale for a new usability model. J. Interact. Sci. 1 (1), 1–16 (2013)

Zhang, D., Adipat, B.: Challenges, methodologies, and issues in the usability testing of mobile applications. Int. J. Hum. Comput. Interact. 18 (3), 293–308 (2005)

Coursaris, C.K., Kim, D.J.: A meta-analytical review of empirical mobile usability studies. J. Usability Stud. Arch. 6 , 117–171 (2011)

Ding, Y., Rousseau, R., Wolfram, D. (eds.): Measuring Scholarly Impact. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10377-8


Kurniawan, J., Duffy, V.G.: Systematic review of the importance of human factors in incorporating healthcare automation. In: Duffy, V.G. (ed.) HCII 2021. LNCS, vol. 12778, pp. 96–110. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-77820-0_8


van Eck, N.J., Waltman, L.: VOSviewer Manual, pp. 1–54. Universiteit Leiden, Leiden (2021)

Bangor, A., Kortum, P., Miller, J.: Determining what individual SUS scores mean: adding an adjective rating scale. J. Usability Stud. 4 (3), 114–123 (2009)

Brooke, J.: SUS - a quick and dirty usability scale. In: Usability Evaluation in Industry, vol. 189, no. 3 (1996)

Brooke, J.: SUS: a retrospective. J. Usability Stud. 8 (2), 29–40 (2013)

Seffah, A., Donyaee, M., Kline, R.B., Padda, H.K.: Usability measurement and metrics: a consolidated model. Softw. Qual. J. 14 (2), 159–178 (2006)

Nielsen, J., Molich, R.: Heuristic evaluation of user interfaces. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 249–256 (1990)

Bangor, A., Kortum, P.T., Miller, J.T.: An empirical evaluation of the system usability scale. Intl. J. Hum. Comput. Interact. 24 (6), 574–594 (2008)

Lund, A.M.: Measuring usability with the USE questionnaire. Usability Interface 8 (2), 3–6 (2001)

Stoyanov, S.R., Hides, L., Kavanagh, D.J., Zelenko, O., Tjondronegoro, D., Mani, M.: Mobile app rating scale: a new tool for assessing the quality of health mobile apps. JMIR mHealth uHealth 3 (1), e3422 (2015)

Chen, C.M.: The citespace manual. College of Computing and Informatics, pp. 1–84 (2014)

Lewis, J.R., Sauro, J.: Usability and user experience: design and evaluation. In: Handbook of Human Factors and Ergonomics, 5th edn., pp. 972–1015. Wiley, Hoboken (2021)

Holzinger, A., Searle, G., Nischelwitzer, A.: On some aspects of improving mobile applications for the elderly. In: Stephanidis, C. (ed.) UAHCI 2007. LNCS, vol. 4554, pp. 923–932. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73279-2_103

Poynter, R.: The utilization of mobile technology and approaches in commercial market research. In: Mobile Research Methods, pp. 11–20. Ubiquity Press, London (2015)

Kortum, P., Sorber, M.: Measuring the usability of mobile applications for phones and tablets. Int. J. Hum. Comput. Interact. 31 (8), 518–529 (2015)

Nielsen, J.: Usability Engineering. AP Professional, Boston (1994)


Venkatesh, V.: Determinants of perceived ease of use: integrating control, intrinsic motivation, and emotion into the technology acceptance model. Inf. Syst. Res. 11 (4), 342–365 (2000)

Beul-Leusmann, S., Samsel, C., Wiederhold, M., Krempels, K.H., Jakobs, E.M., Ziefle, M.: Usability evaluation of mobile passenger information systems. In: Marcus, A. (ed.) Design, User Experience, and Usability. Theories, Methods, and Tools for Designing the User Experience, pp. 217–228. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07668-3_22

Arain, A.A., Hussain, Z., Rizvi, W.H., Vighio, M.S.: Evaluating usability of M-learning application in the context of higher education institute. In: International Conference on Learning and Collaboration Technologies, pp. 259–268. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-39483-1_24

Islam, M.N., Khan, S.R., Islam, N.N., Rezwan, A.R.M., Zaman, S.R., Zaman, S.R.: A mobile application for mental health care during Covid-19 pandemic: development and usability evaluation with system usability scale. In: Suhaili, W.S.H., Siau, N.Z., Omar, S., Phon-Amuaisuk, S. (eds.) International Conference on Computational Intelligence in Information System, pp. 33–42. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68133-3_4

Islam, M. N., Karim, Md.M., Inan, T.T., Islam, A.K.A.M.: Investigating usability of mobile health applications in Bangladesh. BMC Med. Inf. Decis. Making 20 (1), 19 (2020)

Joyce, G., Lilley, M., Barker, T., Jefferies A.: Mobile application usability heuristics: decoupling context-of-use. In: Marcus, A., Wang, W. (eds.) Design, User Experience, and Usability: Theory, Methodology, and Management, pp. 410–423. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58634-2_30

Nielsen Norman Group: 10 Usability Heuristics for User Interface Design. https://www.nngroup.com/articles/ten-usability-heuristics/#poster

Joyce, G., Lilley, M.: Towards the development of usability heuristics for native smartphone mobile applications. In: Marcus, A. (ed.) Design, User Experience, and Usability. Theories, Methods, and Tools for Designing the User Experience, pp. 465–474. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07668-3_45

Joyce, G., Lilley, M., Barker, T., Jefferies A.: Mobile application usability: heuristic evaluation and evaluation of heuristics. In: Amaba, B. (ed.) Advances in Human Factors, Software, and Systems Engineering, pp. 77–86. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41935-0_8

Inostroza, R., Rusu, C., Roncagliolo, S., Rusu, V., Collazos, C.A.: Developing SMASH: a set of smartphone’s usability heuristics. Comput. Stand. Interfaces 43 , 40–52 (2016)

Saleh, A., Ismail, R., Fabil, N.: Evaluating usability for mobile application: a MAUEM approach. In: Proceedings of the 2017 International Conference on Software and e-Business, pp. 71–77. ACM (2017)


Author information

Authors and affiliations.

Purdue University, West Lafayette, IN, 47906, USA

Jiabei Wu & Vincent G. Duffy


Corresponding author

Correspondence to Vincent G. Duffy .

Editor information

Editors and affiliations.

The Open University of Japan, Chiba, Japan

Masaaki Kurosu

Tokyo University of Science, Tokyo, Saitama, Japan

Sakae Yamamoto

Tokyo City University, Tokyo, Japan

Hirohiko Mori

Southern University of Science and Technology, Shenzhen, China

Marcelo M. Soares

World Usability Day and Bubble Mountain Consulting, Newton Center, MA, USA

Elizabeth Rosenzweig

Aaron Marcus and Associates, Berkeley, CA, USA

Aaron Marcus

Tsinghua University, Beijing, China

Pei-Luen Patrick Rau

Coventry University, Coventry, UK

Cranfield University, Cranfield, UK

Wen-Chin Li


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper.

Wu, J., Duffy, V.G. (2022). Mobile Applications Usability Evaluation: Systematic Review and Reappraisal. In: Kurosu, M., et al. HCI International 2022 - Late Breaking Papers. Design, User Experience and Interaction. HCII 2022. Lecture Notes in Computer Science, vol 13516. Springer, Cham. https://doi.org/10.1007/978-3-031-17615-9_35


DOI : https://doi.org/10.1007/978-3-031-17615-9_35

Published : 05 October 2022

Publisher Name : Springer, Cham

Print ISBN : 978-3-031-17614-2

Online ISBN : 978-3-031-17615-9

eBook Packages : Computer Science Computer Science (R0)


  • Open access
  • Published: 29 February 2024

What methods are used to examine representation of mental ill-health on social media? A systematic review

  • Lucy Tudehope   ORCID: orcid.org/0000-0002-9544-1006 1 ,
  • Neil Harris   ORCID: orcid.org/0000-0002-1786-3967 1 ,
  • Lieke Vorage   ORCID: orcid.org/0000-0002-5744-189X 1 &
  • Ernesta Sofija   ORCID: orcid.org/0000-0002-4761-9762 1  

BMC Psychology volume  12 , Article number:  105 ( 2024 ) Cite this article

19 Accesses

7 Altmetric

Metrics details

There has been an increasing number of papers which explore the representation of mental health on social media using various social media platforms and methodologies. It is timely to review methodologies employed in this growing body of research in order to understand their strengths and weaknesses. This systematic literature review provides a comprehensive overview and evaluation of the methods used to investigate the representation of mental ill-health on social media, shedding light on the current state of this field. Seven databases were searched with keywords related to social media, mental health, and aspects of representation (e.g., trivialisation or stigma). Of the 36 studies which met inclusion criteria, the most frequently selected social media platforms for data collection were Twitter ( n  = 22, 61.1%), Sina Weibo ( n  = 5, 13.9%) and YouTube ( n  = 4, 11.1%). The vast majority of studies analysed social media data using manual content analysis ( n  = 24, 66.7%), with limited studies employing more contemporary data analysis techniques, such as machine learning ( n  = 5, 13.9%). Few studies analysed visual data ( n  = 7, 19.4%). To enable a more complete understanding of mental ill-health representation on social media, further research is needed focussing on popular and influential image and video-based platforms, moving beyond text-based data like Twitter. Future research in this field should also employ a combination of both manual and computer-assisted approaches for analysis.

Peer Review reports

Introduction

In the last few decades, and particularly in the wake of the COVID-19 pandemic, the threat mental illness poses to public health has been increasingly recognised. The World Health Organization defines mental health as “a state of mental well-being that enables people to cope with the stresses of life, realize their abilities, learn well and work well, and contribute to their community” (World Health Organization, 2022, p. 8). However, this review is focused on mental-ill health, an umbrella term to refer to an absence of this state of well-being either through mental illness/disorder or mental health problems [ 1 , 2 ]. A global burden of disease study to quantify the impact of mental and addictive disorders estimated that 16% of the world’s population were affected by some form of mental or addictive disorder in 2019, and suggest these conditions contribute to 7% of total disease burden as measured by disability adjusted life years (DALYs) [ 3 ]. Although the age-adjusted rates of DALYs and mortality for all disease causes have steadily declined in the last 15 years by 30.4% and 16.3% respectively, these rates have only increased for mental disorders by 4.3% and 12% respectively [ 3 ].

Despite the benefits and effectiveness of modern medicine, therapies and community support programs for those with mental health conditions, engagement with mental health support is often very poor [ 4 ]. Even for individuals who do eventually seek mental health care, the delay between symptom onset and treatment averages more than a decade [ 5 ]. The consequences of such delays in help-seeking can include adverse pathways to care [ 6 ], worse mental health outcomes [ 7 ], drug and alcohol abuse [ 8 ] and suicide [ 9 ]. While there are many potential barriers to the help-seeking process, significant previous research has demonstrated that attitudes towards mental illness, in particular stigma, are key factors preventing individuals from translating a need for help into action [ 9 , 10 , 11 ]. Stigma is a term often used in a broad sense to refer to discriminatory and negative beliefs attributed to a person or group of people [ 12 ]. However, in order to design evidence-based and effective stigma reduction interventions, a nuanced understanding of current societal views and attitudes towards mental ill-health is first necessary.

Historically, many studies investigating public stigma towards mental illness have focussed on traditional media (e.g., print or television news media), but more recently the wealth of information provided by social media has been recognised. Researchers are now harnessing social media as a powerful tool for public health research, for example in the fields of epidemiology and disease surveillance [ 13 , 14 ], chronic disease management and prevention [ 15 ], health communication [ 16 ] and as an effective platform for intervention strategies [ 17 ].

Social media allows individuals to share user-generated or curated content and to interact with others [ 18 ]. It has become a central means to share their experiences and express their thoughts, opinions, and feelings towards issues. Access to such information and opinion has significant potential to influence the attitudes and health behaviours of social media users [ 19 ]. It can perpetuate negative stereotypes and increase stigma, but it can also provide a platform for discussion and sharing of personal experiences potentially helping to reduce stigma and in turn, facilitate help seeking behaviour. It must also be noted that persons living with mental illness are known to have higher rates of social media use in comparison to the general population, and are therefore at high risk of exposure to potentially negative or misrepresenting mental health content [ 20 ]. As such, social media presents a valuable research tool for investigating the attitudes of society toward mental ill-health.

Much of the previous research surrounding mental health and social media focuses on the effects of extensive social media use on psychological health and wellbeing [ 21 ] and on utilizing machine learning to detect and predict the mental health status of users [ 22 ]. However, there has been a recent surge in studies using social media data to reveal attitudes and perceptions towards mental ill-health more broadly and towards specific mental health conditions. Despite the growing interest in this field and its importance to public mental health, no attempts have been made to systematically review these studies. The current state of research is heterogeneous, with various research designs, data collection and data analysis techniques employed to analyse social media data. A methodological review is needed to provide researchers and health professionals with an overview of the current state of the literature, demonstrate the utility of various methods and provide direction for future research.

Therefore, the aim of this systematic literature review is to provide a comprehensive overview and evaluation of the current research methods used to investigate the representation of mental ill-health on social media. The review critically appraises the quality of these studies, summarises their methodological approaches, and identifies priorities and future opportunities for research and study design.

Search strategy and screening procedure

Seven databases were systematically searched on September 27, 2022: Ovid MEDLINE (via Ovid), PsycINFO (via Ovid), CINAHL (via EBSCO), Scopus, and the ProQuest Public Health, Psychology and Computer Science databases. Searches were filtered to peer-reviewed journal articles published in English, and terms were applied to the title and abstract fields for each database where possible. Search terms covered (1) social media (e.g., “social platform”, “online social network*”, “user-generated”), (2) mental health (e.g., “depress*”, “anxiety”, “schizo*”), and (3) either a relevant analysis method (e.g., “(content or discourse or thematic) adj3 analy*”) or terms reflecting representation (e.g., “represent*”, “attitude*”, “stigma*”). The full search strategy employed for each database can be found in Additional File 1.
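As an illustration of how these three concept groups can be combined into a single boolean query (OR within a group, AND across groups), the sketch below uses a subset of the example terms above. Real database syntax (field tags, proximity operators such as adj3, truncation handling) differs across Ovid, EBSCO, Scopus and ProQuest, so this is not the authors' actual search string.

```python
# Term groups adapted from the examples above; OR within a group, AND across groups.
concept_groups = [
    ['"social platform"', '"online social network*"', '"user-generated"'],  # social media
    ["depress*", "anxiety", "schizo*"],                                     # mental health
    ["represent*", "attitude*", "stigma*"],                                 # representation
]


def build_query(groups):
    """Join each concept group with OR and combine the groups with AND."""
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in groups)


print(build_query(concept_groups))
# ("social platform" OR "online social network*" OR "user-generated") AND
# (depress* OR anxiety OR schizo*) AND (represent* OR attitude* OR stigma*)
```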

The abstract and citation information for 9,576 records were downloaded and imported into Covidence systematic review software (Version 2), a web-based software specifically designed to facilitate screening, extraction, and quality appraisal. Once imported, duplicate records were automatically identified and removed by Covidence. Each stage of the screening process was carried out by two authors (LT and LV), independently. The title and abstract of 5,373 articles were screened to determine eligibility. If the two reviewers marked a different decision in Covidence, the articles were discussed and reviewers came to a consensus, and if a decision could not be made a third reviewer was consulted (ES or NH). Articles included at the title/abstract level ( n  = 136) were then screened in full text to determine relevance. Reviewers recorded the reason for exclusion. The reference list for each eligible article was then screened for any relevant publications.
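The dual-reviewer adjudication rule described above can be summarised in a few lines; the sketch below is illustrative only, since the actual screening decisions were recorded and resolved in Covidence rather than in code.

```python
def screening_decision(reviewer_a, reviewer_b, third_reviewer=None):
    """Adjudicate a screening decision between two independent reviewers.

    Decisions are 'include' or 'exclude'. Agreement stands; a disagreement is
    flagged for discussion and, if still unresolved, settled by a third reviewer.
    """
    if reviewer_a == reviewer_b:
        return reviewer_a
    if third_reviewer is not None:
        return third_reviewer
    return "discuss"  # flag the record for a consensus discussion


print(screening_decision("include", "include"))             # include
print(screening_decision("include", "exclude"))             # discuss
print(screening_decision("include", "exclude", "exclude"))  # exclude
```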

This systematic review is registered with the International Prospective Register of Systematic Reviews (PROSPERO, ID: CRD42022361731). The review is reported in accordance with the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines 2020 [ 23 ]. Figure  1 presents a PRISMA flowchart detailing the systematic review procedure.

Figure 1. PRISMA flow diagram of identification, screening, and inclusion procedure

Eligibility criteria

Peer-reviewed journal articles were considered for inclusion if authors conducted an analysis of user-generated social media content regarding mental ill-health and its representation. To be considered for inclusion, social media content must be posted by individual users, as opposed to content posted on behalf of a group or organisation, e.g., news media or a non-government organisation. All social media platforms except those considered discussion forum websites, such as Reddit and Quora, were included. Discussion forums were excluded from the review because they are considered a distinct form of social media in which content is arranged and centred on subject matter, in contrast to traditional social networking sites, which focus on people and their profiles. As a result, their networking dynamics differ from those of traditional social media platforms and bring together individuals with specific shared interests, and they may therefore be less appropriate for analysis of wider public perceptions and representations of mental ill-health. As per the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-V), the scope of the systematic review was narrowed to include social media content regarding any condition classified under 'schizophrenia spectrum and other psychotic disorders', 'bipolar and related disorders', 'depressive disorders', 'anxiety disorders' and 'obsessive-compulsive and related disorders'. Studies must evaluate content regarding mental health more broadly or focus on a specific mental health condition as listed under these DSM-V classifications. It was beyond the scope of this review to include studies which focus on mental health in a positive sense, i.e., wellbeing, happiness, and positive functioning.

In terms of study design, articles were included if they analysed the content of social media posts and/or comment responses, whether text, photo and/or video-based content. Data analysis methods could include, but were not limited to, content, discourse, thematic or linguistic analysis, and studies which utilised machine learning to facilitate the analysis process were also eligible. Conference proceedings, articles without accessible full text, and articles published in a language other than English were excluded.

Data extraction and synthesis

The data extraction template was developed using a sample of 5 studies. It was then piloted using an additional 5 studies and further refined. Extraction was completed through Covidence by one reviewer (LT) and subsequently checked by a second reviewer (LV). Any issues or questions were discussed and agreed upon by the two reviewers, and a third reviewer (ES or NH) was consulted if a decision could not be made. Extracted data included bibliographic information as well as methodological details, including: (1) aim/objective, (2) social media platform and language, (3) mental health condition/s, (4) comparison to physical condition (yes/no), (5) hashtags/keywords used for search, (6) date range and timeframe of collected data, (7) number of posts analysed, (8) type of data analysis, (9) coding framework and development process, and (10) coding protocol. The extracted results are presented as a narrative synthesis due to the heterogeneity of the included studies and because this review focuses on the methods of the included studies.

Article appraisal

Critical appraisal of the included studies was conducted based on the Critical Appraisal Skills Programme (CASP) guidelines for qualitative research [ 79 ]. This tool contains a checklist of 10 items which assist in the assessment of the appropriateness of the qualitative research design, consideration of ethical issues, the rigour of data collection, analysis and presentation of results, and the value of the research. Each item was answered with 'Yes', 'Can't tell' or 'No'. Two reviewers (LT and LV) independently applied the CASP checklist to each of the extracted studies. Any disagreements were discussed and resolved between the two reviewers or, if this was not possible, a third independent reviewer (ES or NH) assisted.

The included studies primarily involved the analysis of text-based data derived from social media. When considering the range of critical appraisal tools which could be employed in this systematic review, the CASP tool was selected by the authors because it included the items most applicable to this type of analysis, as opposed to qualitative studies involving interview or focus group-based data collection. The authors decided to exclude item 4, "Was the recruitment strategy appropriate to the aims of the research?", and item 6, "Has the relationship between researcher and participants been adequately considered?", as there was no recruitment of active participants in the included studies. Researcher bias was instead considered when answering "Was the data analysis sufficiently rigorous?" by identifying whether authors demonstrated consistency in coding and factored in potential biases. The identification and selection of posts for analysis were considered in the question regarding data collection (item 5).

It must be noted that some of the studies selected for inclusion in the review analysed text-based data in a quantitative manner or conducted additional quantitative analysis of social media reach metrics. These studies were still appraised using the CASP tool; however, questions such as "Is a qualitative methodology appropriate?" and "Was the data analysis sufficiently rigorous?" were modified or expanded to include consideration of any quantitative analysis elements. This was deemed more appropriate than employing a mixed-methods appraisal tool, which would include items inappropriate or irrelevant to the included studies.

A total of 36 articles met all inclusion criteria and were synthesised in the results. The search yielded 10 articles (27.8%) which were published in 2022, the year the search was conducted. A further 15 articles were published within the previous three years from 2019 to 2021 (41.7%) and 11 were published in 2018 or earlier (30.6%). Figure  2 illustrates the growth in the cumulative number of peer-reviewed publications each year.

Figure 2. Cumulative number of articles published each year and their primary method of analysis

Social media platforms and unit of analysis

As shown in Table 1, various social media platforms were used for the collection of data. Of the 36 included studies, the majority (n = 22, 61.1%) analysed data collected from Twitter. This was followed by 5 studies analysing Sina Weibo (13.9%), 4 studies analysing YouTube (11.1%), 2 studies analysing Instagram (5.6%), 1 study analysing TikTok (2.8%) and 1 study analysing Pinterest (2.8%). One study collected data from a variety of social media platforms (2.8%).

The unit/s of analysis (the element of the social media post analysed) also varied between studies (Refer to Table 2). A total of 28 studies (77.8%), primarily the Twitter and Sina Weibo studies, analysed text-based data. Three studies analysed images (8.3%), two of which also involved analysis of the associated captions (5.6%). Four studies analysed video-based content (11.1%). In total, 8 studies (22.2%) conducted an analysis of comments associated with social media posts and 15 (41.7%) analysed reach metrics such as post likes and shares. Only 3 studies (8.3%) included an analysis of any content linked in a social media post, such as an external website, and 14 (38.9%) collected and analysed data based on the social media profile type or demographics of content posters.

Mental health condition/s in focus

The studies analysed social media content relating to one or more mental health conditions as per the review inclusion criteria (Refer to Table 1). The most frequent mental health condition was schizophrenia/psychosis, with content analysed in 14 studies (38.9%). This was closely followed by studies focused on mental health/mental illness content more broadly (n = 13, 36.1%), for example by searching for posts using #mentalhealth or 'mental illness', and studies which analysed depression (n = 12, 33.3%). Four included studies focused on bipolar disorder (11.1%), three studies focused on obsessive compulsive disorder (8.3%), only two focused on anxiety (5.6%) and one specifically focused on trichotillomania (2.8%).

Although the majority of studies focused solely on social media content related to one mental health condition, four studies (11.1%) included multiple mental health conditions and compared results across conditions. Budenz et al. [ 25 ] compared content related to mental health/mental illness with content specific to bipolar disorder, while Jansli et al. [ 29 ] compared seven different mental health conditions. Both Li et al. [ 45 ] and Reavley and Pilkington [ 39 ] offered a comparison of schizophrenia/psychosis- and depression-related social media content. Four studies also incorporated a comparison between mental and physical health conditions into their research aims, comparing mental ill-health content to diabetes [ 24 , 31 , 40 , 43 ], cancer [ 40 , 43 ], Alzheimer's disease [ 43 ], HIV/AIDS [ 40 , 43 ], asthma [ 40 ] and epilepsy [ 40 ].

Social media content language and location of researchers

The inclusion criteria specified that studies must be published in English, but studies did not necessarily need to analyse English-based social media content. While 75.0% of studies did analyse English content ( n  = 27), five studies analysed Chinese content (13.9%), two studies analysed Greek content (5.6%), and Turkish, French, and Finnish social media content were each analysed in one study (8.4%) (Refer to Table  1 ).

Over half of the literature in this field is published by researchers affiliated with institutions within the United States ( n  = 19, 52.8%). This is followed by five studies from researchers in the United Kingdom (13.9%), four studies from China (11.1%), four studies from Canada (11.1%), and the remaining articles from researchers in Finland, Greece, Israel, Australia, New Zealand, Spain, Netherlands, and Turkey ( n  = 11, 30.6%).

Study design

Data collection methods

The specific method of data collection varied based on the social media platform analysed. In most studies, authors applied a specific hashtag search relevant to the mental health topic of interest (e.g., #mentalhealth) or entered keywords into the social media platform search bar (e.g., "schizophrenia"). Given the volume of data posted to social media, most studies limited the collection of data to a specified time period, which varied widely between studies, from 1 day to 10 years.

Several studies aimed to analyse mental health-related social media content based on a particular event or public health campaign, which dictated the timeframe of data collection. Makita et al. [ 33 ] collected data and analysed discourse specifically during Mental Health Awareness Week and Saha et al. [ 41 ] collected data only on World Mental Health Awareness Day. A study by Budenz et al. [ 20 ] collected data before and after a mass shooting event in the United States to identify changes in mental illness stigma messaging. Two studies analysed social media responses to the mental ill-health disclosure of professional athletes [ 36 , 54 ], and one study collected data using the hashtag ‘#InHonorofCarrie’ to examine mental health-related content after the death of mental health advocate and actress Carrie Fisher [ 35 ].

While some authors analysed all posts identified in their social media search, others used specific inclusion/exclusion criteria and/or selection methods to limit the number of posts for further analysis. These included random selection of posts from the search results, selecting every 'x'th post, selecting the most viewed/liked/commented posts, and/or selecting the first 'x' posts appearing in the search results or on each page of results.
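
As an illustration of these selection methods, the Python sketch below applies the main strategies reported above (random selection, every x-th post, the first N results, and the most-liked posts) to a hypothetical list of search results; the post data, sample size and field names are placeholders rather than values taken from any included study.

import random

# Hypothetical search results, in the order returned by a hashtag or keyword search.
search_results = [{"id": i, "text": f"post {i}", "likes": random.randint(0, 500)}
                  for i in range(1, 501)]

sample_size = 100
random.seed(42)

random_sample = random.sample(search_results, sample_size)    # random selection of posts
every_fifth = search_results[::5]                             # every 5th post in the results
first_n = search_results[:sample_size]                        # first N posts returned
top_by_likes = sorted(search_results,                         # most-liked posts
                      key=lambda post: post["likes"], reverse=True)[:sample_size]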

Primary data analysis methods

While all included studies involved analysis of data extracted from social media, the method of analysis differed between studies (Refer to Table 2). The majority of studies conducted analysis through manual human-based coding (n = 25, 69.4%), of which 24 (66.7%) utilised some form of content analysis. Eight of these content analysis studies (22.2%) employed an inductive coding approach, in which themes were generated from the 'ground up' based on the data, while nine (25%) employed a deductive approach, in which a coding framework was developed prior to the commencement of coding based on previous research and/or author expertise. A further six studies (16.7%) used a combination of approaches, in which a codebook was initially developed but then inductively refined through a preliminary coding process. Only one study performed an inductive thematic analysis of social media content (2.8%), and one study used a combination of deductive content analysis and inductive thematic analysis to answer its research questions (2.8%).

In total, five studies (13.9%) used human-based coding in combination with computer-assisted coding, whereby an initial sample of human-coded data was used to develop a machine learning model which could subsequently analyse a large volume of data. Aside from content analysis and thematic analysis, three studies conducted software-mediated linguistic analysis (8.3%), two studies involved sentiment analysis and topic modelling (5.6%), and one used language modelling (2.8%). Figure 2 illustrates the cumulative number of articles published each year and the primary analysis employed. The figure demonstrates that an article utilising a computer-assisted approach was first published in 2018, and there has since been a surge in the number of studies adopting these tools for analysis.
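
As a minimal sketch of this computer-assisted workflow (not the pipeline of any included study), the Python example below trains a simple scikit-learn text classifier on a small, hypothetical hand-coded sample and then applies it to uncoded posts; in practice the hand-coded sample would contain hundreds of posts, and a held-out portion would be used to check the model's performance before its labels were trusted.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical hand-coded sample (in practice several hundred posts coded with the study codebook).
hand_coded_texts = [
    "people with schizophrenia are dangerous",    # coded as stigmatising
    "reach out for help, you are not alone",      # coded as non-stigmatising
    "my ex is a total psycho",                    # coded as stigmatising
    "sharing my experience of depression today",  # coded as non-stigmatising
]
hand_coded_labels = ["stigma", "no_stigma", "stigma", "no_stigma"]

# Remaining, uncoded corpus to be labelled automatically.
uncoded_texts = [
    "schizophrenics should be locked away",
    "it is okay to talk about anxiety",
]

vectoriser = TfidfVectorizer(ngram_range=(1, 2))
model = LogisticRegression(max_iter=1000)
model.fit(vectoriser.fit_transform(hand_coded_texts), hand_coded_labels)

for text, label in zip(uncoded_texts, model.predict(vectoriser.transform(uncoded_texts))):
    print(label, "|", text)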

Coding frameworks

The authors who utilised a deductive approach to content analysis either developed their own coding framework or adopted a framework previously developed and reported in the literature. Frameworks varied greatly between studies but often included coding the type of social media profile (e.g., individual, consumer, health professional, organisation), the type of mental health-related content (e.g., personal experience, awareness promotion, advertising, news media, personal opinion/dyadic interaction) and/or the broader topic or context of posts (e.g., politics, everyday social chatter, culture/entertainment, mental health, news, awareness campaigns). Some studies also chose to categorise mental health-related content as either 'medical' (e.g., diagnosis, treatment, prognosis) or 'non-medical' before further classification.

In terms of coding for representation of or attitudes towards mental ill-health, most studies coded for stigma, variously defined. In some studies, this was merely the presence or absence of stigma for each unit of analysis (e.g., was there stigmatising content in the tweet or not), but in others stigma was further broken down into more specific types of stigma. For example, the coding framework developed by Reavley and Pilkington [ 39 ] includes stigmatising attitude subthemes such as 'social distance', 'dangerousness', and 'personal weakness'. In some studies, trivialisation was classed as stigma, while in others a separate coding category was created for any posts deemed to be trivialising, mocking or sarcastic towards mental ill-health. Another common approach in the included studies was to code for the valence or overall sentiment of each unit of analysis, with categories for positive, neutral or negative polarity, or a classification of tone as positive or pejorative. Some authors analysed the use of mental health-related terminology and categorised this based on whether terms were misused or employed metaphorically.
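
A simplified, hypothetical example of such a deductive codebook is sketched below as a Python data structure, so that every unit of analysis is coded against the same fixed categories; the categories are drawn from the kinds of codes described above and do not reproduce the framework of any particular included study.

# Hypothetical, simplified codebook reflecting the kinds of categories described above.
codebook = {
    "profile_type": ["individual", "health professional", "organisation", "news media"],
    "content_type": ["personal experience", "awareness promotion", "advertising",
                     "news", "opinion/dyadic interaction"],
    "stigma": ["absent", "dangerousness", "personal weakness", "social distance",
               "trivialising/mocking"],
    "valence": ["positive", "neutral", "negative"],
}

# One coded unit of analysis (e.g., a single tweet) is then recorded as:
coded_post = {
    "post_id": "example_001",  # hypothetical identifier
    "profile_type": "individual",
    "content_type": "personal experience",
    "stigma": "absent",
    "valence": "positive",
}

# A simple consistency check: every recorded code must come from the codebook.
for field, value in coded_post.items():
    assert field == "post_id" or value in codebook[field], f"invalid code for {field}"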

Quality appraisal

The studies were appraised using the CASP tool for qualitative research, which does not calculate a final score or provide an overall grade of quality. A total of 37 studies met all the review inclusion criteria and were appraised by reviewers. A breakdown of appraisal results for each CASP item is presented in Additional File 2. The criteria for which the highest number of studies received a rating of 'no' related to the rigour of data analysis (n = 6, 16.7%) and the clarity with which findings were stated (n = 6, 16.7%). Based on the results of the appraisal and after discussion between all authors, one study was excluded from the review synthesis due to lack of clarity in reporting methods [ 58 ].

This review summarised the current literature investigating the representation of mental ill-health on social media, in particular focussing on methodological design. While human-based content analysis was the dominant means of qualitative data analysis, a limited number of studies employed computer-based techniques. The results also indicated an uneven distribution in the social media platforms selected for data collection, as well as the unit/s of analysis. These findings suggest some important methodological gaps in the literature.

A growing area of research interest

The results demonstrate that almost 70% of all studies in this field were published within the last four years, from 2019 to 2022, suggesting this is an emerging area of interest in the academic literature. Social media research has been used to identify the attitudes and opinions of the public regarding many topics, but appears to have rapidly gained favour amongst researchers during the COVID-19 pandemic, with studies examining public perceptions of issues such as vaccination [ 59 ], healthcare staff [ 60 ], restrictions [ 61 ] and the pandemic more broadly [ 62 , 63 , 64 ]. Perhaps the surge in publications relating to the representation of mental ill-health on social media is reflective of a wider trend towards this type of research and an acknowledgement amongst researchers of the power of social media data. Social media provides real-time data with which to capture current public perceptions about a topic and the opportunity to monitor changes over time [ 62 ]. However, it must also be acknowledged that the recent growth in publications may also reflect a societal shift towards increased acceptance of online social media as an appropriate forum for mental health-related discourse, triggering subsequent research interest [ 65 , 66 ].

The dominance of Twitter-based research

Our review revealed an uneven distribution of social media platforms studied within the current literature. Over 60% of the included studies collected data from the text-based social media platform Twitter, and a further five studies analysed Sina Weibo data, a Chinese microblogging site highly reminiscent of Twitter. These results align with the findings of other systematic reviews of social media-based research, which demonstrate a skewed focus towards text-based data sources [ 67 , 68 ]. This dominance in the research landscape is likely due to methodological considerations. Twitter is a largely public platform whose data has been readily accessible to researchers, and users can choose not to reveal their identity in profile 'handles'. The text-based nature of the data also makes analysis comparatively straightforward and permits the use of machine learning approaches.

Unfortunately, the emphasis on Twitter limits the scope of this body of research and does not accurately reflect the relative popularity of social media platforms. As of 2023, Facebook has the highest number of global monthly active users (MAUs) at more than 2.9 billion, yet none of the included studies in this review collected data from this platform. This is likely because collecting data from Facebook and other platforms oriented towards private networks and direct messaging without breaching the privacy of users remains an ethical challenge [ 67 ]. Image and video-sharing platforms have seen rapid growth in popularity in the last few years, yet represent only a minority of the studies in this review. Instagram has over 2 billion MAUs, and the video-based platform TikTok has over 1 billion, suggesting a much higher share of the social media market than Twitter at 556 million and Sina Weibo at 584 million MAUs [ 69 ].

Such dominance in the use of Twitter means that certain populations and age groups are underrepresented in the current research. Twitter is known to have an older demographic of users, with 38.5% aged 25–34 years and 20.7% aged 35–49 years [ 70 ]. By comparison, TikTok has become a popular platform for teenagers and young adults, with 67.3% of users aged under 24 years and only 5.97% aged 35–44 years [ 71 ]. Young people are known to experience a higher rate of mental illness than older age groups, but their engagement with mental health care is often poor, causing delays in help-seeking behaviour [ 4 , 9 , 72 ]. Thus, future social media-based research into the representation of mental health conditions on platforms predominantly frequented by younger users has the potential to add significant value to this body of literature.

Analysis of social media content

While the majority of included studies employed content analysis (n = 24, 66.7%), their processes varied considerably. It is worth noting that nine of these 24 studies followed a deductive-dominant approach to coding, while a further seven included a deductive element. A deductive (sometimes termed 'directive') approach to content analysis is most appropriate where existing research findings, conceptual frameworks or theories can be used to guide codebook development [ 73 , 74 ]. Thus, given the extensive previous literature on the representation of mental ill-health (albeit not necessarily on social media), and in particular frameworks for mental illness stigma, it is appropriate to take a deductive approach [ 75 , 76 ]. However, introducing an inductive element, in which the initial codebook is inductively refined through the early stages of coding, can result in coding categories better suited to the specific social media data extracted from the platform of interest and potentially provide more nuanced analysis [ 77 ].

Also notable is the apparent dearth of studies in this field adopting thematic analysis. There are several reasons why this may be the case, the foremost being the volume of data available for analysis on social media. It is widely held in the literature that the choice of content analysis versus thematic analysis is a question of wide application versus deep analysis [ 78 ]. Due to its alignment with quantitative research, content analysis can be more suited to larger data sets, whereas thematic analysis allows for greater immersion in the data and depth of understanding [ 78 ]. While both are of value, in the case of social media data where researchers are aiming to understand public representations of and attitudes towards mental illness, content analysis can provide the wider analysis required by the research questions.

Review of the current literature also suggested that the coding frameworks adopted by the included studies vary greatly, making comparison of their findings challenging. Each study defined the concept of stigma differently through its approach to coding; for example, both Jansli et al. [ 29 ] and Jilka et al. [ 30 ] simply identified whether or not content was stigmatising. Conversely, Budenz et al. [ 25 ] coded for the presence or absence of mental illness stigma and then specifically coded for violence-related mental illness stigma, as the study aimed to identify changes in tweet content before and after a mass shooting event. Meanwhile, Reavley and Pilkington [ 39 ] took the coding process one step further and developed a detailed coding framework which groups different types of stigmatising attitudes, such as 'beliefs that mental illness is due to personal weakness', 'people with mental illness are dangerous' and 'desire for social distance from the person'. In critically analysing the methodological approaches of these studies, it must be acknowledged that stigma is a broad concept containing many nuances. In order to gain a deep understanding of societal perceptions and attitudes towards mental ill-health, coding frameworks should be developed with these nuances in mind and reflect the many aspects of stigmatising attitudes. Content analysis should avoid a 'tick box' approach to the identification of stigma, and instead aim for a richer understanding of mental ill-health perceptions.

Of the studies which employed content analysis, the vast majority used a manual approach in which human researchers hand coded the data. However, more recently machine learning techniques have been applied to the field. For example, Saha et al. [ 41 ] hand coded a sample of 700 tweets and used these to develop a machine learning framework to automatically infer the topic of the remaining 13,517 tweets. Several studies also used specialised packages such as Linguistic Inquiry and Word Count software to extract the psycholinguistic features from social media data and obtain quantitative counts [ 37 , 44 , 45 ]. The clear advantage of these computerised methods is that they allow researchers to analyse much larger volumes of data and reduce the manual labour and time involved in the analysis process. While these studies undoubtedly add value to the body of literature, there still remains a place for the process of manual human coding, especially in the case of more detailed coding frameworks, which can offer more nuanced insights. Although technology is rapidly advancing, manual human coding also remains the only viable means of analysis for researchers intending to interpret image and video-based data.

Quality of studies and frequent issues

Critical appraisal of the included studies was conducted using the CASP tool for qualitative research [ 79 ]. As described in the methods, this was deemed the most appropriate tool for the appraisal, yet the authors still needed to modify and adapt the tool for the purposes of this review. Given the difficulty of finding an appropriate critical appraisal tool for studies which involve analysis of social media-based content, and the apparent growth in researcher interest in this study design, the authors advocate for the development of a more specific appraisal tool.

The authors noted a few frequent issues which lowered the quality of included studies and should be addressed in future research in the field. Firstly, multiple studies did not describe the process of codebook development transparently and, where the approach was deductive, did not indicate the previous literature which informed this process. The coding framework is key to ensuring rigorous data analysis and generating meaningful findings, and its development should therefore be described in sufficient detail. The reviewers also noted inconsistency in study coding protocols for content analysis studies. In this type of analysis, reliability is of paramount importance, and previous methodological literature highlights the need to establish intercoder reliability (ICR) [ 80 , 81 ]. At least two coders are needed to independently analyse the data [ 81 ]; alternatively, two coders can analyse a sample of the data and, if sufficient intercoder reliability is achieved, one coder can complete the remaining analysis [ 82 ]. Yet some studies utilised only a single coder, did not establish or report measures of intercoder reliability, or were unclear in their reporting of the coding protocol. Content analysis is susceptible to human biases during the coding process, and thus it is essential to minimise these risks through a robust protocol.
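
For example, one widely used ICR statistic, Cohen's kappa, can be computed in a few lines of Python; the codes below are hypothetical binary decisions (stigmatising content present or absent) from two independent coders, and the threshold mentioned in the comment is a commonly cited rule of thumb rather than a fixed standard.

from sklearn.metrics import cohen_kappa_score

# Hypothetical codes assigned independently by two reviewers to the same ten posts
# (1 = stigmatising content present, 0 = absent).
coder_1 = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
coder_2 = [1, 0, 1, 1, 1, 0, 1, 0, 0, 0]

kappa = cohen_kappa_score(coder_1, coder_2)
print(f"Cohen's kappa: {kappa:.2f}")  # values above roughly 0.6-0.8 are often treated as acceptable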

Limitations of social media-based research

Although the strengths of social media-based research are numerous, there are several key limitations to this type of research. Many studies utilise 'hashtags' to search for and identify content relevant to their topic of interest. However, not everyone who posts on social media uses hashtags, and these are often employed as a means to generate followers [ 83 ]. There are also technical challenges in the data collection process, whereby researchers must use external programs such as the Twitter Application Programming Interface (API) to search for data, which only permits access to a portion of all tweets.

Another important consideration is that findings cannot necessarily be generalised to the wider community. Although social media is a significant aspect of life for many, some demographics use and post on social media more frequently than others, for example women and younger age groups [ 84 , 85 ]. Not everyone uses and interacts with social media in the same way, so this type of research cannot be used to interpret the opinions and perspectives of the broader population.

Social media-based research is also somewhat constrained by ethical concerns regarding user privacy. Studies are often limited to the use of data extracted from public profiles, which in turn may bias the type of data collected. Mental health is an inherently sensitive topic, and thus analysis of mental health content posted to private social media profiles may yield additional insights.

Limitations of the review

This systematic review is subject to several limitations which must be noted. Firstly, the scope of this review was limited to identification and analysis of the methods used in the included studies and did not extend to synthesis of results. Future review articles may wish to focus on synthesis of results, although their highly heterogeneous nature is likely to prevent meta-analysis. Secondly, the search was filtered to include only articles published in the English language. This may have missed relevant studies published in a language other than English, although the review did include several studies focused on social media content posted in Chinese, Greek, Turkish, French, and Finnish. The database searches were also limited to peer-reviewed publications, as per convention for systematic literature reviews; however, this search approach could potentially miss peer-reviewed conference proceedings and industry reports [ 67 ].

This review is the first to systematically identify, summarise and critically evaluate the available literature focused on the representation of mental ill-health on social media. The review analysed the current methodologies employed by these studies and critically evaluated the strengths and weaknesses of the various approaches adopted by researchers. The results highlight the need to shift away from text-based social media research, such as that based on Twitter, towards the more popular and emerging image and video-based platforms. The utility of both manual and computer-assisted content analysis was discussed, and the reviewers concluded that both make valuable contributions to the body of research. Future research could aim to investigate how social media representation of mental illness translates to 'real-life' attitudes and instances of stigmatising behaviour, as well as the help-seeking behaviours of those experiencing symptoms of mental ill-health. As with many other non-communicable chronic diseases, the prevalence of mental illness continues to grow, presenting an urgent public health challenge. This field of research can help to develop a deeper understanding of societal attitudes towards mental ill-health and reveal the information that those suffering from mental ill-health are exposed to on social media. Through this knowledge, mental and public health professionals can create more targeted and effective campaigns to combat negative representations of mental ill-health, using social media as a medium.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

DALY: Disability-adjusted life year

COVID-19: Coronavirus disease 2019

PROSPERO: International Prospective Register of Systematic Reviews

PRISMA: Preferred Reporting Items for Systematic reviews and Meta-Analyses

DSM-V: Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition

CASP: Critical Appraisal Skills Programme

ASD: Autism spectrum disorder

ED: Eating disorder

HIV/AIDS: Human immunodeficiency virus/Acquired immunodeficiency syndrome

MAUs: Monthly active users

Allen D. The relationship between challenging behaviour and mental ill-health in people with intellectual disabilities: a review of current theories and evidence. J Intellect Disabil. 2008;12(4):267–94.

Everymind. Understanding mental ill-health n.d. [cited 2023 Mar 14]. Available from: https://everymind.org.au/understanding-mental-health/mental-health/what-is-mental-illness .

Rehm KD Jr. Global burden of Disease and the impact of Mental and Addictive disorders. Curr Psychiatry Rep. 2019;21(2):1–7.

Elias CL, Gorey KM. Online social networking among clinically depressed Young people: scoping review of potentially supportive or harmful behaviors. J Technol Hum Serv. 2022;40(1):79–96.

Wang PS, Berglund PA, Olfson M, Kessler RC. Delays in initial treatment contact after first onset of a mental disorder. Health Serv Res. 2004;39(2):393–415.

Morgan C, Mallett R, Hutchinson G, Leff J. Negative pathways to psychiatric care and ethnicity: the bridge between social science and psychiatry. Soc Sci Med. 2004;58(4):739–52.

Dell’Osso B, Glick ID, Baldwin DS, Altamura AC. Can Long-Term outcomes be improved by shortening the duration of untreated illness in Psychiatric disorders? A conceptual Framework. Psychopathology. 2013;46(1):14–21.

Sullivan LE, Fiellin DA, O'Connor PG. The prevalence and impact of alcohol problems in major depression: a systematic review. Am J Med. 2005;118(4):330–41.

Clement S, Schauman O, Graham T, Maggioni F, Evans-Lacko S, Bezborodovs N, et al. What is the impact of mental health-related stigma on help-seeking? A systematic review of quantitative and qualitative studies. Psychol Med. 2015;45(1):11–27.

Xu Z, Huang F, Kösters M, Staiger T, Becker T, Thornicroft G, et al. Effectiveness of interventions to promote help-seeking for mental health problems: systematic review and meta-analysis. Psychol Med. 2018;48(16):2658–67.

Schnyder N, Panczak R, Groth N, Schultze-Lutter F. Association between mental health-related stigma and active help-seeking: systematic review and meta-analysis. B J Psychiatry. 2017;210(4):261–8.

Dudley JR. Confronting stigma within the Services System. Soc Work. 2000;45(5):449.

Salathé M, Bengtsson L, Bodnar TJ, Brewer DD, Brownstein JS, Buckee C et al. Digital Epidemiology. PLoS Comput Biol. 2012;8(7).

Brownstein JS, Freifeld CC, Madoff LC. Digital disease detection-harnessing the web for public health surveillance. N Engl J Med. 2009;360(21):2153–5.

Patel R, Chang T, Greysen SR, Chopra V. Social Media Use in Chronic Disease: a systematic review and Novel Taxonomy. Am J Med. 2015;128(12):1335–50.

Eysenbach G, Schulz P, Auvinen A-M, Crotty B, Moorhead SA, Hazlett DE et al. A New Dimension of Health Care: systematic review of the uses, benefits, and Limitations of Social Media for Health Communication. JMIR. 2013;15(4).

Zhang Y, Cao B, Wang Y, Peng T-Q, Wang X. When Public Health Research Meets Social Media: knowledge mapping from 2000 to 2018. JMIR. 2020;22(8):e17582.

Wongkoblap A, Vadillo MA, Curcin V. Researching Mental Health disorders in the era of Social Media. Syst Rev JMIR. 2017;19(6):e228.

Passerello GL, Hazelwood JE, Lawrie S. Using Twitter to assess attitudes to schizophrenia and psychosis. BJPsych Bull. 2019;43(4):158–66.

Budenz A, Purtle J, Klassen A, Yom-Tov E, Yudell M, Massey P. The case of a mass shooting and violence-related mental illness stigma on Twitter. Stigma Health. 2019;4(4):411–20.

Liu D, Feng XL, Ahmed F, Shahid M, Guo J. Detecting and measuring Depression on Social Media using a machine Learning Approach: systematic review. JMIR Ment Health. 2022;9(3):e27244.

Chancellor S, De Choudhury M. Methods in predictive techniques for mental health status on social media: a critical review. NPJ Digit Med. 2020;3(1).

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ (Clinical Res ed). 2021;372:n71.

Athanasopoulou C, Sakellari E. Schizophrenia’ on Twitter: Content Analysis of Greek Language tweets. Stud Health Technol Inf. 2016;226:271–4.

Budenz A, Klassen A, Purtle J, Yom Tov E, Yudell M, Massey P. Mental illness and bipolar disorder on Twitter: implications for stigma and social support. J Ment Health. 2020;29(2):191–9.

Cavazos-Rehg PA, Krauss MJ, Sowles S, Connolly S, Rosas C, Bharadwaj M, et al. A content analysis of depression-related tweets. Comput Hum Behav. 2016;54:351–7.

Delanys S, Benamara F, Moriceau V, Olivier F, Mothe J. Psychiatry on Twitter: Content Analysis of the Use of Psychiatric terms in French. JMIR Form Res. 2022;6(2):e18539.

Hernandez MY, Hernandez M, Lopez DH, Gamez D, Lopez SR. What do health providers and patients tweet about schizophrenia? Early Interv Psychiatry. 2020;14(5):613–8.

Jansli SM, Hudson G, Negbenose E, Erturk S, Wykes T, Jilka S. Investigating mental health service user views of stigma on Twitter during COVID-19: a mixed-methods study. J Ment Health. 2022;31(4):576–84.

Jilka S, Odoi CM, van Bilsen J, Morris D, Erturk S, Cummins N et al. Identifying schizophrenia stigma on Twitter: a proof of principle model using service user supervised machine learning. Schizophr. 2022;8(1).

Joseph AJ, Tandon N, Yang LH, Duckworth K, Torous J, Seidman LJ, et al. #Schizophrenia: use and misuse on Twitter. Schizophr Res. 2015;165(2–3):111–5.

Kara UY, Şenel Kara B. Schizophrenia on Turkish Twitter: an exploratory study investigating misuse, stigmatization and trivialization. Soc Psychiatry Psychiatr Epidemiol. 2022;57(3):531–9.

Makita M, Mas-Bleda A, Morris S, Thelwall M. Mental Health Discourses on Twitter during Mental Health Awareness Week. Issues Ment Health Nurs. 2021;42(5):437–50.

Nelson A. Ups and Downs: Social Media Advocacy of bipolar disorder on World Mental Health Day. Front Commun. 2019;4.

Park S, Hoffner C. Tweeting about mental health to honor Carrie Fisher: how #InHonorOfCarrie reinforced the social influence of celebrity advocacy. Comput Hum Behav. 2020;110:106353.

Parrott B. Hakim, Gentile. From #endthestigma to #realman: Stigma-Challenging Social Media Responses to NBA players’ Mental Health disclosures. Commun Rep. 2020;33(3):148–60.

Pavlova A, Berkers P. Mental health discourse and social media: which mechanisms of cultural power drive discourse on Twitter. Soc Sci Med. 2020;263:113250.

Pavlova A, Berkers P. Mental Health as defined by Twitter: frames, emotions, Stigma. Health Commun. 2022;37(5):637–47.

Reavley NJ, Pilkington PD. Use of Twitter to monitor attitudes toward depression and schizophrenia: an exploratory study. PeerJ. 2014;2:e647.

Robinson P, Turk D, Jilka S, Cella M. Measuring attitudes towards mental health using social media: investigating stigma and trivialisation. Soc Psychiatry Psychiatr Epidemiol. 2019;54(1):51–8.

Saha K, Torous J, Ernala SK, Rizuto C, Stafford A, De Choudhury M. A computational study of mental health awareness campaigns on social media. Transl Behav Med. 2019;9(6):1197–207.

Stupinski AM, Alshaabi T, Arnold MV, Adams JL, Minot JR, Price M, et al. Quantifying changes in the Language used around Mental Health on Twitter over 10 years: Observational Study. JMIR Ment Health. 2022;9(3):e33685.

Alvarez-Mon MA, Llavero-Valero M, Sánchez-Bayona R, Pereira-Sanchez V, Vallejo-Valdivielso M, Monserrat J, et al. Areas of interest and stigmatic attitudes of the General Public in five relevant Medical conditions: thematic and quantitative analysis using Twitter. JMIR. 2019;21(5):e14110.

Li A, Zhu T, Jiao D. Detecting depression stigma on social media: a linguistic analysis. J Affect Disord Rep. 2018;232:358–62.

Li A, Jiao D, Liu X, Zhu T. A comparison of the psycholinguistic styles of Schizophrenia-Related Stigma and Depression-Related Stigma on Social Media: Content Analysis. JMIR. 2020;22(4):e16470.

Pan J, Liu B, Kreps GL. A content analysis of depression-related discourses on Sina Weibo: attribution, efficacy, and information sources. BMC Public Health. 2018;18(1):772.

Wang W, Liu Y. Discussing mental illness in Chinese social media: the impact of influential sources on stigmatization and support among their followers. Health Commun. 2016;31(3):355–63.

Yu L, Jiang W, Ren Z, Xu S, Zhang L, Hu X. Detecting changes in attitudes toward depression on Chinese social media: a text analysis. J Affect Disord. 2021;280(Pt A):354–63.

Athanasopoulou C, Suni S, Hätönen H, Apostolakis I, Lionis C, Välimäki M. Attitudes towards schizophrenia on YouTube: A content analysis of Finnish and Greek videos. Inf Health Soc Care. 2016;41(3):307–24.

Devendorf A, Bender A, Rottenberg J. Depression presentations, stigma, and mental health literacy: a critical review and YouTube content analysis. Clin Psychol Rev. 2020;78:101843.

Ghate R, Hossain R, Lewis SP, Richter MA, Sinyor M. Characterizing the content, messaging, and tone of trichotillomania on YouTube: A content analysis. J Psychiatr Res. 2022;151:150–6.

McLellan A, Schmidt-Waselenchuk K, Duerksen K, Woodin E. Talking back to mental health stigma: an exploration of YouTube comments on anti-stigma videos. Comput Hum Behav. 2022;131:107214.

Wu J, Hong T. The picture of #Mentalhealth on Instagram: congruent vs. incongruent emotions in Predicting the sentiment of comments. Front Commun. 2022;7.

Pavelko RL, Wang T. Love and basketball: audience response to a professional athlete’s mental health proclamation. Health Educ J. 2021;80(6):635–47.

Shigeta N, Ahmed S, Ahmed SW, Afzal AR, Qasqas M, Kanda H, et al. Content analysis of Canadian newspapers articles and readers’ comments related to schizophrenia. Int J Cult Ment Health. 2017;10(1):75–81.

Basch CH, Donelle L, Fera J, Jaime C. Deconstructing TikTok videos on Mental Health: cross-sectional, descriptive content analysis. JMIR Form Res. 2022;6(5):e38340.

Guidry J, Zhang Y, Jin Y, Parrish C. Portrayals of depression on Pinterest and why public relations practitioners should care. Public Relat Rev. 2016;42.

Vidamaly S, Lee SL. Young adults’ Mental Illness aesthetics on Social Media. Int J Cyber Behav. 2021;11:13–32.

Cascini F, Pantovic A, Al-Ajlouni YA, Failla G, Puleo V, Melnyk A, et al. Social media and attitudes towards a COVID-19 vaccination: a systematic review of the literature. eClinicalMedicine. 2022;48:101454.

Tokac U, Brysiewicz P, Chipps J. Public perceptions on Twitter of nurses during the COVID-19 pandemic. Contemp Nurse. 2022:1–10.

Ölcer S, Yilmaz-Aslan Y, Brzoska P. Lay perspectives on social distancing and other official recommendations and regulations in the time of COVID-19: a qualitative study of social media posts. BMC Public Health. 2020;20(1):963.

Ugarte DA, Cumberland WG, Flores L, Young SD. Public attitudes about COVID-19 in response to President Trump’s Social Media posts. JAMA Netw Open. 2021;4(2):e210101–e.

De Falco CC, Punziano G, Trezza D. A mixed content analysis design in the study of the Italian perception of COVID-19 on Twitter. Athens J Soc Sci. 2021;8(3):191–210.

Shorey S, Ang E, Yamina A, Tam C. Perceptions of public on the COVID-19 outbreak in Singapore: a qualitative content analysis. J Public Health. 2020;42(4):665–71.

Bucci S, Schwannauer M, Berry N. The digital revolution and its impact on mental health care. Psychol Psychother. 2019;92(2):277–97.

Naslund JA, Aschbrenner KA, Marsch LA, Bartels SJ. The future of mental health care: peer-to-peer support and social media. Epidemiol Psychiatr Sci. 2016;25(2):113–22.

Fung IC-H, Duke CH, Finch KC, Snook KR, Tseng P-L, Hernandez AC, et al. Ebola virus disease and social media: a systematic review. AM J Infect Control. 2016;44(12):1660–71.

Hawks JR, Madanat H, Walsh-Buhi ER, Hartman S, Nara A, Strong D et al. Narrative review of social media as a research tool for diet and weight loss. Comput Hum Behav. 2020;111.

Statista. Most popular social networks worldwide as of January 2023, ranked by number of monthly active users 2023 [cited 2023 Feb 23]. Available from: https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/ .

Statista. Distribution of Twitter users worldwide as of April 2021, by age group 2021 [cited 2023 Feb 23]. Available from: https://www.statista.com/statistics/283119/age-distribution-of-global-twitter-users/ .

Oberlo. TikTok age demographics 2023 [cited 2023 Feb 23]. Available from: https://au.oberlo.com/statistics/tiktok-age-demographics .

Australian Bureau of Statistics. National Study of Mental Health and Wellbeing. 2022 [cited 2023 Feb 23]. Available from: https://www.abs.gov.au/statistics/health/mental-health/national-study-mental-health-and-wellbeing/2020-21#cite-window2 .

Liamputtong P. Qualitative research methods. Fifth ed. Melbourne, Australia: Oxford University Press Australia and New Zealand; 2020.

Cho JY, Lee E-H. Reducing confusion about grounded theory and qualitative content analysis: similarities and differences. Qual Rep. 2014;19(32):1–20.

Fox AB, Earnshaw VA, Taverna EC, Vogt D. Conceptualizing and measuring Mental Illness Stigma: the Mental Illness Stigma Framework and critical review of measures. Stigma Health. 2018;3(4):348–76.

Corrigan P. How stigma interferes with mental health care. Am Psychol. 2004;59(7):614–25.

Forman J, Damschroder L. Qualitative content analysis. Empirical methods for bioethics: a primer. Emerald Group Publishing Limited; 2007. pp. 39–62.

Humble N, Mozelius P, editors. Content analysis or thematic analysis: Similarities, differences and applications in qualitative research. European Conference on Research Methodology for Business and Management Studies; 2022 June 2–3; Portugal.

Critical Appraisal Skills Programme. CASP Qualitative Studies Checklist 2022 [cited 2023 Feb 23]. Available from: https://casp-uk.net/images/checklist/documents/CASP-Qualitative-Studies-Checklist/CASP-Qualitative-Checklist-2018_fillable_form.pdf .

Kleinheksel AJP, Rockich-Winston NP, Tawfik HPMD, Wyatt TRP. Demystifying content analysis. Am J Pharm Educ. 2020;84(1).

O’Connor C, Joffe H. Intercoder Reliability in Qualitative Research: debates and practical guidelines. Int J Qual Methods. 2020;19.

Campbell JL, Quincy C, Osserman J, Pedersen OK. Coding in-depth semistructured interviews: problems of unitization and intercoder reliability and agreement. Sociol Methods Res. 2013;42:294–320.

Martín EG, Lavesson N, Doroud M. Hashtags and followers. Social Netw Anal Min. 2016;6(1):12.

Svensson R, Johnson B, Olsson A. Does gender matter? The association between different digital media activities and adolescent well-being. BMC Public Health [Internet]. 2022; 22(1).

Twenge JM, Martin GN. Gender differences in associations between digital media use and psychological well-being: evidence from three large datasets. J Adolesc. 2020;79:91–102.

Acknowledgements

Not applicable.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Author information

Authors and affiliations

School of Medicine and Dentistry, Griffith University, Gold Coast Campus, 1 Parklands Drive, 4222, Southport, Gold Coast, QLD, Australia

Lucy Tudehope, Neil Harris, Lieke Vorage & Ernesta Sofija

Contributions

LT, ES and NH conceptualised the study. LT conducted the systematic literature search, and LV and LT completed the article screening and appraisal process. LT wrote the first draft of the manuscript, which was subsequently edited and reviewed by ES and NH. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Lucy Tudehope.

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Tudehope, L., Harris, N., Vorage, L. et al. What methods are used to examine representation of mental ill-health on social media? A systematic review. BMC Psychol 12 , 105 (2024). https://doi.org/10.1186/s40359-024-01603-1

Received : 24 July 2023

Accepted : 18 February 2024

Published : 29 February 2024

DOI : https://doi.org/10.1186/s40359-024-01603-1

Keywords

  • Mental health
  • Public health
  • Content analysis
  • Research methods

COMMENTS

  1. Usability evaluation methods: a literature review

    Usability is the most widely used concept in the software engineering field and defines the software system's demand and use. Due to such wide importance of this quality factor various usability ...

  2. A Review of Usability Evaluation Methods and their Use for Testing

    Conclusions: In summary, this paper provides a review of the usability evaluation methods employed in the assessment of eHealth HIV eHealth interventions. eHealth is a growing platform for delivery of HIV interventions and there is a need to critically evaluate the usability of these tools before deployment.

  3. A literature review about usability evaluation methods for e-learning

    Within the domain of information ergonomics, the study of tools and methods used for usability evaluation dedicated to E-learning presents evidence that there is a continuous and dynamic evolution of E-learning systems, in many different contexts -academics and corporative. These systems, also known as LMS (Learning Management Systems), can be ...

  4. Usability: An introduction to and literature review of usability

    Various testing methods were used, including questionnaires, think aloud studies and heuristic evaluation. Usability testing comprised a range of single cycle through to several rounds of testing. ... Methods. A literature review was carried out to assess the reported use of usability testing in the radiation oncology education literature.

  5. Usability: An introduction to and literature review of usability

    Evaluation Method Description Benefits (+)/Limitations (-) Example Study; Direct observation - live or recorded evaluation: Heuristic evaluation [19]: Usability experts examine an interface against a set of pre-defined characteristics - "heuristics" - such as simple language, consistency and shortcuts in order to identify usability flaws and severity

  6. PDF USABILITY EVALUATION METHODS: A LITERATURE REVIEW

    Abstract: Usability is an important factor for all software quality models. It is the key factor in the development of successful interactive software applications. Usability is the most widely ...

  7. Potential effectiveness and efficiency issues in usability evaluation

    A systematic literature review of usability evaluation studies, published by (academic) practitioners between 2016 and April 2023, was conducted. 610 primary articles were identified and analysed, utilising five major scientific databases. ... Usability evaluation methods like the traditional heuristic evaluation method, often used for general ...

  8. Usability Methods and Attributes Reported in Usability Studies of

    Therefore, it is suggested that various usability evaluation methods, including subjective and objective usability measures, are used in future usability studies. Our review found that most of the included studies in health care education (71/98, 72%) performed field testing, whereas previous literature suggests that usability experiments in ...

  9. A Systematic Literature Review of Usability Evaluation ...

    A systematic literature review is a methodology that identify, synthesize and interpret all available studies that are relevant to a research question formulated previously, or topic area, or phenomenon of interest [].Although systematic reviews require more effort than traditional reviews, the advantages undertaking this method are greater.

  10. A Review: Healthcare Usability Evaluation Methods

    Several types of usability evaluation methods (UEM) are used to assess software, and more extensive research is needed on the use of UEM in early design and development stages by manufacturers to achieve the goal of user-centered design. This article is a literature review of the most commonly applied UEM and related emerging trends.

  11. [PDF] A literature review about usability evaluation methods for e

    This review is a synthesis of research project about Information Ergonomics and embraces three dimensions, namely the methods, models and frameworks that have been applied to evaluate LMS and shows a notorious change in the paradigms of usability. The usability analysis of information systems has been the target of several research studies over the past thirty years.

  12. Usability Evaluation Methods: A Systematic Review

    Usability Evaluation Methods: A Systematic Review. A. Martins, A. Queirós, +1 author. N. Rocha. Published 2015. Computer Science. This chapter aims to identify, analyze, and classify the methodologies and methods described in the literature for the usability evaluation of systems and services based on information and….

  13. A Review of Usability Evaluation Methods for eHealth Applications

    Based on the results obtained from the 20 selected papers, a majority of the papers used only one method of usability evaluation. From the ten papers that used only one method, a total of 70% which indicates 7 out 10 papers used questionnaire [4, 13, 16, 19, 20, 25, 27] as a method to evaluate the usability of the eHealth application.The remaining paper used think aloud [], survey [] and ...

  14. Usability research in educational technology: a state-of-the-art

    This paper presents a systematic literature review characterizing the methodological properties of usability studies conducted on educational and learning technologies in the past 20 years. PRISMA guidelines were followed to identify, select, and review relevant research and report results. Our rigorous review focused on (1) categories of educational and learning technologies that have been ...

  15. Systematic review of applied usability metrics within usability

    This study reviews the breadth of usability evaluation methods, metrics, and associated measurement techniques that have been reported to assess systems designed for hospital staff to assess inpatient clinical condition. ... The usability of electronic medical record systems implemented in sub‐Saharan Africa: a literature review of the ...

  16. Usability Evaluation Methods of Mobile Applications: A Systematic

    The usability evaluation process of mobile applications is carried out with a systematic literature review of 22 papers. The results show that 73% of the methods used are usability testing, 23% heuristic evaluations, and 4% are user satisfaction usability evaluations.

  17. A literature review about usability evaluation methods for e-learning

    Within the domain of information ergonomics, the study of tools and methods used for usability evaluation dedicated to E-learning presents evidence that there is a continuous and dynamic evolution ...

  18. Users' design feedback in usability evaluation: a literature review

    As part of usability evaluation, users may be invited to offer their reflections on the system being evaluated. Such reflections may concern the system's suitability for its context of use, usability problem predictions, and design suggestions. We term the data resulting from such reflections users' design feedback. Gathering users' design feedback as part of usability evaluation may be ...

  19. IEA 2012

    A literature review about usability evaluation methods for e-learning platforms. Freire, Luciana Lopes; Arezes, Pedro Miguel; and Campos, José Creissac. Department of Production and Systems Engineering, University of Minho, Guimarães, Portugal; Department of Computer Science, University of Minho, Gualtar, Portugal.

  20. A literature review about usability evaluation methods for e-learning

    This review is a synthesis of a research project about Information Ergonomics and embraces three dimensions, namely the methods, models, and frameworks that have been applied to evaluate LMS. The study also includes the main usability criteria and heuristics used. The obtained results show a notorious change in the paradigms of usability, with ...

  21. Agile, Easily Applicable, and Useful eHealth Usability Evaluations

    Background: Electronic health (eHealth) usability evaluations of rapidly developed eHealth systems are difficult to accomplish because traditional usability evaluation methods require substantial time in preparation and implementation. This illustrates the growing need for fast, flexible, and cost-effective methods to evaluate the usability of eHealth systems.

  22. Towards a validated glossary of usability attributes for the evaluation

    Background: Despite technical advances in the field of wearable robotic devices (WRD), there is still limited user acceptance of these technologies. While usability is often a key factor influencing acceptance, there is a scattered landscape of definitions and scopes for the term. To advance usability evaluation, and to integrate usability features as design requirements during technology ...

  23. Usability Evaluation of Dashboards: A Systematic Literature Review of

    The exclusion criteria were as follows: (1) non-English studies, (2) focusing on only dashboard design or dashboard evaluation, (3) use of evaluation methods other than questionnaires to evaluate usability, and (4) lack of access to the full text of articles.

  24. Mobile Applications Usability Evaluation: Systematic Review and

    This report is a systematic review of mobile application usability evaluation using bibliometric analysis. Scopus and Web of Science were used to search the literature. Three tools, VOSviewer, CiteSpace, and MAXQDA, were used to perform the bibliometric analysis.

  25. What methods are used to examine representation of mental ill-health on

    There has been an increasing number of papers exploring the representation of mental health on social media, using various social media platforms and methodologies. It is timely to review the methodologies employed in this growing body of research in order to understand their strengths and weaknesses. This systematic literature review provides a comprehensive overview and evaluation of the ...