
  • Published: 19 April 2023

Voice assistants in private households: a conceptual framework for future research in an interdisciplinary field

  • Bettina Minder   ORCID: orcid.org/0000-0002-5874-4999 1 ,
  • Patricia Wolf 2 , 3 ,
  • Matthias Baldauf 4 &
  • Surabhi Verma 2 , 5  

Humanities and Social Sciences Communications volume 10, Article number: 173 (2023)


  • Business and management
  • Criminology
  • Information systems and information technology
  • Science, technology and society

The present study identifies, organizes, and structures the available scientific knowledge on the recent use and the prospects of Voice Assistants (VA) in private households. The systematic review of the 207 articles from the Computer, Social, and Business and Management research domains combines bibliometric with qualitative content analysis. The study contributes to earlier research by consolidating the as yet dispersed insights from scholarly research, and by conceptualizing linkages between research domains around common themes. We find that, despite advances in the technological development of VA, research largely lacks cross-fertilization between findings from the Social and Business and Management Sciences. This is needed for developing and monetizing meaningful VA use cases and solutions that match the needs of private households. Few articles show that future research is well-advised to make interdisciplinary efforts to create a common understanding from complementary findings—e.g., what necessary social, legal, functional, and technological extensions could integrate social, behavioral, and business aspects with technological development. We identify future VA-based business opportunities and propose integrated future research avenues for aligning the different disciplines’ scholarly efforts.


Introduction

Scholarly research across disciplines agrees that technological advancement is one of the important drivers of economic development because it brings about efficiency gains for all players of an economic system (Grossman and Helpman, 1991 ; Kortum, 1997 ; Dercole et al., 2008 ). Digitization and emerging technologies thus usually draw intense scholarly interest and are studied with the hope that their adoption will enable companies to generate “new capabilities, new products, and new markets” (Bhat, 2005 , p. 457) based on new business models, specifically designed for digitalized life spheres (Chao et al., 2007 ; Sestino et al., 2020 ; Antonopoulou and Begkos, 2020 ).

Voice assistants (VAs) are one of the recently emerged digital technologies that promise companies substantial future revenues from innovative user services. They are “speech-driven interaction systems” (Ammari et al., 2019, p. 3) that offer new interaction modalities (Rzepka et al., 2022).

Partly based on the integration of complementary Artificial Intelligence (AI) technology, VAs allow users’ speech to be processed, interpreted, and responded to in a meaningful way. In private households, we witness a rapid adoption of VAs in the form of smart speakers such as Amazon Echo, Apple HomePod, and Google Home (Pridmore and Mols, 2020), which, particularly in combination with customized Internet of Things (IoT) home systems, provide a higher level of control over the smart home experience than a traditional setting (Papagiannidis and Davlembayeva, 2022). The Amazon Echo has been available in the United States (US) since 2014 and in Europe since September 2016 (Trenholm, 2016; Hern, 2017); by 2018, 15.4% of the US and 5.9% of the German population already owned one (Brandt, 2018). Overall, private household purchases already grew to 116% in the third quarter of 2018 compared to 2017 (Tung, 2018), and, according to a recent research report from the IoT analyst firm Berg Insight (Berg Insight, 2022), the number of smart homes in Europe and North America reached 105 million in 2021. At present, VAs are an emergent technology with its own challenges (Clark et al., 2022), similar to IoT or big data analytics. Like these technologies, VAs have triggered an enormous amount of diverse scholarly research, resulting “in a mass of disorganized knowledge” (Sestino et al., 2020, p. 1). For scholars and managers alike, the sheer quantity of disorganized information makes it hard to predict the characteristics of future technology use cases that fit users’ needs or to use this information in strategy development processes (Brem et al., 2019; Antonopoulou and Begkos, 2020).

While Computer Science scholars are already debating the technological feasibility of specific and complex VA applications, Social Science research points to VA-related market acceptance risks. These result, for example, from biased choices offered by VAs (Rabassa et al., 2022) or from failing to identify and implement the privacy protection measures that younger people require (Shin et al., 2018), requirements motivated by frequent user privacy leaks (Fathalizadeh et al., 2022) and worries about adverse incidents (Shank et al., 2022). Recent studies also emphasize the need to shift the focus to user-centric product value (Nguyen et al., 2022) in pursuit of the solutions that are most beneficial in terms of social acceptance and legal requirements (Clemente et al., 2022). Achieving such solutions is likely to require collaboration between companies or even industries (Struckell et al., 2021).

To the best of our knowledge, there are no systematic review papers focusing on VAs from a single discipline’s perspective that we could draw on. We did find an exploration of recent papers on the use of virtual assistants in healthcare that highlights some critical points, e.g., VAs’ limited ability to maintain continuity over multiple conversations (Clemente et al., 2022), and a review of different interaction modalities in the era of Industry 4.0 that highlights the need for strong voice recognition algorithms and coded voice commands (Kumar and Lee, 2022). In sum, the research that might support strategizing around VA solutions that match the needs of private households is scattered. It needs to be organized and made sense of from an interdisciplinary perspective to shed “light on current challenges and opportunities, with the hope of informing future research and practice” (Sestino et al., 2020).

This paper thus sets out to identify, organize, and structure the available scientific knowledge on the recent use and the prospects of VAs in private households, and to propose integrated future research avenues that align the different disciplines’ scholarly efforts and lead research along consistent, interdisciplinarily informed paths. We use a systematic literature review approach that combines bibliometric and qualitative content analysis to gain an overview of the still dispersed insights from scholarly research in different disciplines and to conceptualize topical links and common themes. Research on emerging technologies acknowledges that their adoption depends on more than technological maturity: social aspects (e.g., social norms) and economic maturity (e.g., whether a product can be produced and sold cost-effectively) also play an important role (Birgonul and Carrasco, 2021; Xi and Hamari, 2021).
Research particularly emphasizes that emerging technologies need not only to be explored creatively and economically but also to be grounded in users’ perspectives (Grossman-Kahn and Rosensweig, 2012) and to serve longer-lasting needs (Patnaik and Becker, 1999). IDEO conceptualized these requirements as the three dimensions of feasibility, viability, and desirability (IDEO, 2009).

Feasibility covers all aspects of VA innovation management that ensure the solution is technically feasible and scalable, including that legal and regulatory requirements are met (Brenner et al., 2021). Viability focuses on economic success. Desirability ensures that the solutions and services are accepted by the target groups and, more generally, desired by society (Brenner et al., 2021). Although IDEO’s focus on innovation development processes relates to a different context, the main reasoning about the relevance of these three dimensions (technical, management, and social) also applies when looking for research literature that helps find strategies around VA solutions that correspond to people’s needs in private homes. To cover these three dimensions, we focus on studies from Computer Science (CS), Social Science (SS), and Business and Management Science (BMS) to advance our knowledge of the still dispersed insights from scholarly research and to highlight shared topics and common themes.

With this conceptual approach, we contribute an in-depth analysis and a systematic overview of interdisciplinary scholarly work that allows cross-fertilization between different disciplines’ findings. Based on our findings, we develop several propositions and a framework for future research that aim to align the various scholarly efforts and lead research along consistent, interdisciplinarily informed paths, which should help realize VAs’ potential in people’s everyday lives. Moreover, we identify potential future VA-based business opportunities.

This paper is structured as follows: the section “Business opportunities related to VA use in private households” summarizes the research on potential business opportunities related to the use of VAs in private households. The section “Methods” presents the research methodology, i.e., our approach of combining a bibliometric literature analysis with qualitative content analysis in a literature review. The section “Thematic clusters in recent VA research” identifies nine thematic clusters in recent VA research, and the section “Analysis and conceptualization of research streams” analyzes them and conceptually integrates them into four interdisciplinary research streams. The section “Discussion: Propositions and a framework for future research, and related business opportunities” identifies future business opportunities and proposes directions for integrated research. Finally, the section “Conclusion” presents contributions that should help both scholars and managers use this research to predict the characteristics of future technology use cases that fit users’ needs, and to use this information in their strategy development processes around VAs.

Business opportunities related to VA use in private households

Sestino et al. (2020, p. 7) argue that when new technologies emerge, “companies will need to assess the positives and negatives of adopting these technologies”. The positives of VA adoption lie mainly in the projected large new consumer markets for products and services in which voice-activated input will replace text-based human–computer interaction (Pridmore and Mols, 2020, p. 1), and in checkout-free stores such as Amazon Go and the use of VAs (Batat, 2021). Marketing studies predict high adoption rates in private households due to potential efficiency gains from managing household systems and devices by voice command anytime and from anywhere (Celebre et al., 2015; Chan and Shum, 2018; Jabbar et al., 2019; Vishwakarma et al., 2019), as well as due to the high potential of health check apps for improving communication with patients (Abdel-Basset et al., 2021) or for realizing self-care solutions (Clemente et al., 2022). A study by Microsoft and Bing (Olson and Kemery, 2019) substantiates this claim for smart homes, revealing that 54% of the 5000 responding US users already use their smart speakers to manage their homes, especially to control lighting and thermostats. In surveys, users state that they envision a future in which they will increasingly use voice commands to control household appliances, from the microwave to the bathtub and from curtains to toilet controls (Kunath et al., 2019). CS scholars discuss how to design complementary IoT technology features and systems to bring about such benefits (Hamill, 2006; Druga et al., 2017; Pradhan et al., 2018; Gnewuch et al., 2018; Tsiourti et al., 2018a, b; Azmandian et al., 2019; Lee et al., 2019; Pyae and Scifleet, 2019; Sanders and Martin-Hammond, 2019).
BMS research additionally debates how companies should capture, organize, and analyze the (big) user data that could become available once VAs are commonly used in private households, and how to identify new business opportunities (Krotov, 2017; Sestino et al., 2020) and future VA applications, such as communication and monitoring services in pandemics (Abdel-Basset et al., 2021).

However, many recent studies also mention the negatives of VA usage, such as worrying trends emerging from the so-called surveillance economy (Zuboff, 2019), or debate future questions, such as what happens when technology fails or what rights fully automated technological beings would have (Harwood and Eaves, 2020). Of the 5000 respondents to the Microsoft and Bing study, 2050 reported concerns related to voice-enabled technology, especially about data security (52%) and passive listening (41%). The “significant new production of situated and sensitive data” (Pridmore and Mols, 2020, p. 1) in private environments and the unclear legal situation regarding the usage of these data seem to be among the factors inhibiting users’ adoption of more complex VA applications. Thus, many imaginable future use cases, such as advanced smart home controls (Lopatovska and Oropeza, 2018; Lopatovska et al., 2019) or personal virtual shopping assistance (Omale, 2020; Sestino et al., 2020), are still a long way off. Although such applications are technologically feasible and partly already available, today’s users employ VAs for simple tasks, such as “searching for a quick fact, playing music, looking for directions, getting the news and weather” (Olson and Kemery, 2019). Companies are therefore warned against expecting fast returns. Some technical issues also remain, and only the integration of further AI-enabled services into VAs, which is not yet mature, is expected to be a game changer that leads to growth in the deployment of voice-based solutions (Gartner, 2019; Columbus, 2020).

At a meta-level, BMS research advises companies to explore and implement new technologies in their products, services, or business processes, because doing so might yield a considerable first-mover competitive advantage (Drucker, 1988; Porter, 1990; Carayannis and Turner, 2006; Hofmann and Orr, 2005; Bhat, 2005). At the same time, Macdonald and Jinliang (1994) have shown that in industrial gestation (i.e., the impact of science on society), the evolution of demand for a technology and of a set of competitors go hand in hand. Consequently, the adoption of an emergent technology by “the ultimate affected customer base” (Bhat, 2005, p. 462) becomes of utmost importance for whether company investments pay off (Pridmore and Mols, 2020). This is particularly the case for VAs, where companies depend greatly on private users adopting the respective hardware, typically the aforementioned smart speakers (Herring and Roy, 2007), or new services, such as the envisioned digital assistants (Sestino et al., 2020, p. 7). In this respect, VAs differ from emergent technologies whose benefits companies can reap by implementing them in their own organizations and reorganizing business or production processes, such as RFID technology (Chao et al., 2007), nanotechnology (Bhat, 2005), or IoT-based business process (re)engineering (Sestino et al., 2020). This dependence on private users might be one reason why, although VAs are among the most prominent emerging technologies discussed in current mass media, there is still very limited BMS research on VA-related challenges and opportunities that could inform companies.

High-tech companies striving to develop VA-related business models need to consider and integrate scholarly knowledge from disciplines as different as CS, SS, and BMS to meet the requirements of “a secure conversational landscape where users feel safe” (Olson and Kemery, 2019, p. 24). However, such interdisciplinary perspectives are still scarce; instead, we see a large body of scattered disciplinary knowledge. This situation makes it difficult to assess opportunities for future VA-related services and to develop sustainable business models that offer a potential competitive advantage. In this paper, we set out to contribute to such an assessment by organizing and making sense of the scholarly knowledge from CS, SS, and BMS. We follow earlier research in assuming that assessing the state of emergent technologies and making sense of available knowledge on new phenomena requires an interdisciplinary perspective (Bhat, 2005; Melkers and Xiao, 2010; Sestino et al., 2020) to pin down and forecast the technology’s future impact and to advise companies in their technology adoption decisions (Leahey et al., 2017; Demidova, 2018; McLean and Osei-Frimpong, 2019). The literature review we present here therefore also aims to substantiate the call for interdisciplinary research into emerging technologies that seeks to offer insights into business opportunities.

Our aims of making sense of a large amount of disorganized scholarly knowledge on VAs, assessing challenges and opportunities for businesses, and identifying avenues for future interdisciplinary research made a systematic literature review the most appropriate research strategy: literature reviews enable systematic, in-depth analyses of the theoretical advancement of an area (Callahan, 2014). Earlier research with similar aims on other emerging technologies found the method “useful for making sense of the noise” (Sestino et al., 2020, p. 1) in a fast-growing body of scholarly literature (Fig. 1).

figure 1

Innovation dimensions by IDEO: feasibility-viability-desirability (after IDEO, 2009 ).

For our research, we decided to combine a conventional literature review that applies qualitative content analysis with a bibliometric analysis. The bibliometric analysis provides an overview of connections between research articles and of the intersections between research areas (Singh et al., 2020). The literature review based on qualitative content analysis offers a more in-depth overview of the current state of the literature (Petticrew and Roberts, 2006). Earlier scholarly work indicates that such a combination is particularly useful for analyzing the current state of technology trends and the significance of forecasts (Chao et al., 2007; Li et al., 2019). Figure 2 depicts the methodological research approach of this study.

figure 2

Overview of the methodological research approach of this study.

In the following, we describe the methodological approach in detail.

Article identification and screening

The literature search used the Scopus database, since the coverage of Scopus and Web of Science is similar (Harzing and Alakangas, 2016). We searched for the keyword “voice assistant” and its synonyms (“Voice assistant” OR “Virtual assistant” OR “intelligent personal assistant” OR “voice-activated personal assistant” OR “conversational agent” OR “SIRI” OR “Alexa” OR “Google Assistant” OR “Bixby” OR “Smart Loudspeaker” OR “Echo” OR “Smart Speaker”) in combination with “home” and its synonyms (“home” OR “house” OR “household”). The search used the field “theme”, which covers the title, abstract, and keywords of each article (compare 3.2). Given the focus of the research, the search was restricted to articles in the CS, SS, and BMS areas that were written in English and published before May 2020.
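The two-group search can be read as a Boolean filter: a record is kept only if it contains at least one VA term and at least one home term. The sketch below illustrates this under stated assumptions and is not the authors’ tooling. It assembles an equivalent query string with Scopus’s `TITLE-ABS-KEY` field code (the text above calls the field “theme”), and it approximates matching with case-insensitive substring search, which is cruder than Scopus’s own matching. The function names are hypothetical.

```python
# Hypothetical sketch of the two-group Boolean search: (any VA term) AND (any home term).
# Substring matching is an assumption; Scopus applies its own matching rules.

VA_TERMS = [
    "voice assistant", "virtual assistant", "intelligent personal assistant",
    "voice-activated personal assistant", "conversational agent", "siri",
    "alexa", "google assistant", "bixby", "smart loudspeaker", "echo",
    "smart speaker",
]
HOME_TERMS = ["home", "house", "household"]


def scopus_query() -> str:
    """Assemble the search string in Scopus advanced-search syntax."""
    va = " OR ".join(f'"{t}"' for t in VA_TERMS)
    home = " OR ".join(f'"{t}"' for t in HOME_TERMS)
    return f"TITLE-ABS-KEY(({va}) AND ({home}))"


def matches(title: str, abstract: str, keywords: list[str]) -> bool:
    """Return True if title, abstract, or keywords contain a term from each group."""
    text = " ".join([title, abstract, *keywords]).lower()
    has_va = any(term in text for term in VA_TERMS)
    has_home = any(term in text for term in HOME_TERMS)
    return has_va and has_home


print(scopus_query())
print(matches("Privacy and Alexa", "Smart speakers in the household.", []))
```

The broad term “echo” shows why manual screening was still needed: a record titled “Acoustic echo cancellation” with an abstract mentioning home theaters passes this filter even though it is not about VAs.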

We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines proposed by Moher et al. (2009) for the bibliometric literature review. The initial search yielded 428 articles in CS, 356 in SS, and 40 in BMS. After scanning the abstracts of all documents in each field, we excluded further articles that were not relevant to our topic. The most frequent reason for exclusion was that an article was not about VAs; e.g., articles found with the keyword “echo” referred to acoustic phenomena. Table 1 displays the descriptive results of the bibliometric literature review.

The final dataset included 267 articles in CS, 52 articles in SS, and 20 articles in BMS.

Tables 2 and 3 present the top countries of origin of the articles from CS and SS. No country-of-origin information was available for the BMS articles. Given the many (regionally differing) legal questions and regulatory issues, it is important to note that, while the US leads the list by a large margin, the discussion is spread over countries on different continents.

Data analysis step 1: Bibliometric literature review

The final dataset consisted of bibliometric information including author names, affiliations, titles, abstracts, publication dates, and citation information. The bibliometric analysis was conducted for each discipline separately using the VOSviewer software. For each discipline, we visualized common knowledge patterns in the VA literature through co-occurrence networks. Because a raw co-occurrence network can contain keywords with similar meanings that distort the analysis, we grouped synonyms into topics using the VOSviewer thesaurus. For example, the keywords “voice assistant”, “virtual assistant”, “intelligent personal assistant”, “voice-activated personal assistant”, “conversational agent”, “SIRI”, “Alexa”, “Google Assistant”, “Bixby”, “smart loudspeaker”, “Echo”, and “smart speaker” were replaced with the main term “voice assistant”. Keywords were also standardized for uniformity and consistency (e.g., singular and plural forms). Further, a few keywords were deleted from the thesaurus to keep the review focused on the research questions of this study.
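The synonym grouping can be illustrated with a small Python sketch. It assumes the thesaurus layout we understand VOSviewer to read (a tab-separated text file with a `label` column and a `replace by` column); the VOSviewer manual is the authority on the exact format. The `EXCLUDED` set and the explicit plural entries are hypothetical placeholders for the keyword removal and singular/plural standardization mentioned above, and `normalize` only previews the effect of the mapping outside VOSviewer.

```python
import csv
import io

# Hypothetical sketch: merge synonymous keywords into one main term and write
# the merges as a two-column, tab-separated thesaurus ("label" -> "replace by").

VA_SYNONYMS = [
    "virtual assistant", "intelligent personal assistant",
    "voice-activated personal assistant", "conversational agent", "siri",
    "alexa", "google assistant", "bixby", "smart loudspeaker", "echo",
    "smart speaker",
]
THESAURUS = {label: "voice assistant" for label in VA_SYNONYMS}
# Illustrative singular/plural standardization.
THESAURUS.update({"voice assistants": "voice assistant",
                  "smart speakers": "voice assistant"})
EXCLUDED = {"article"}  # illustrative off-focus keyword, not from the study


def thesaurus_tsv(mapping: dict[str, str]) -> str:
    """Render the mapping as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["label", "replace by"])
    for label, target in sorted(mapping.items()):
        writer.writerow([label, target])
    return buf.getvalue()


def normalize(keywords: list[str]) -> list[str]:
    """Lower-case, merge synonyms, drop excluded terms, and de-duplicate in order."""
    result: list[str] = []
    for kw in keywords:
        key = kw.lower().strip()
        term = THESAURUS.get(key, key)
        if term not in EXCLUDED and term not in result:
            result.append(term)
    return result


print(normalize(["Alexa", "Smart speakers", "Privacy"]))  # ['voice assistant', 'privacy']
```

Merging before counting matters because “Alexa” and “smart speakers” would otherwise appear as separate nodes and split the links that belong to one concept.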

Scopus assigns articles to subject areas, which we used to generate the bibliometric analysis (e.g., selecting CS to analyze all papers from that area). When cleaning the dataset (e.g., excluding non-relevant papers), we found that some papers could be assigned to more than one area based on the authors’ affiliations. The co-occurrence networks of the keywords (Figs. 3–5) were obtained automatically by scanning the titles, abstracts, and keywords of the articles in the final cleaned datasets. The networks show similarities between frequently co-occurring keywords (themes or topics) in the literature (Van Eck and Waltman, 2010). The co-occurrence number of two keywords is the number of articles that contain both keywords (Van Eck and Waltman, 2014). VOSviewer places these keywords in the network and identifies clusters of similar themes, with each color representing one cluster (Van Eck and Waltman, 2010). The colors therefore reflect topical links and common themes. Boundaries between clusters are fluid: ‘affordance’, for example, is in the light green cluster denoting research on VA systems but is also connected to the red cluster discussing security issues (Fig. 4). It is assigned to the green cluster because of its more frequent links to that topic. By discussing the clusters, we arrived at nine thematic clusters for our research, described below.
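The co-occurrence number defined above is straightforward to compute. The sketch below counts, for every keyword pair, the articles containing both, and then illustrates the intuition behind the ‘affordance’ example by picking the cluster to which a keyword has the most links. The toy articles, cluster names, and function names are invented for illustration. The assignment rule is a deliberate simplification: as we understand it, VOSviewer’s own clustering is a weighted, modularity-based procedure applied to normalized link strengths, not this rule.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(articles: list[set[str]]) -> Counter:
    """Count, for each unordered keyword pair, the articles containing both."""
    counts: Counter = Counter()
    for keywords in articles:
        # Sorting gives each unordered pair one canonical (a, b) key.
        for pair in combinations(sorted(keywords), 2):
            counts[pair] += 1
    return counts


def strongest_cluster(term: str, clusters: dict[str, set[str]], counts: Counter) -> str:
    """Simplified rule: return the cluster whose members co-occur most with term.

    This is NOT VOSviewer's clustering algorithm; it only illustrates the idea.
    """
    def link_weight(members: set[str]) -> int:
        # Missing pairs count as 0 (Counter default).
        return sum(counts[tuple(sorted((term, m)))] for m in members if m != term)
    return max(clusters, key=lambda name: link_weight(clusters[name]))


# Toy data, invented for illustration only.
articles = [
    {"affordance", "voice assistant", "smart home"},
    {"affordance", "smart home"},
    {"affordance", "privacy"},
    {"privacy", "security"},
]
clusters = {
    "green (VA systems)": {"voice assistant", "smart home"},
    "red (security)": {"privacy", "security"},
}
counts = cooccurrence(articles)
print(counts[("affordance", "smart home")])               # 2
print(strongest_cluster("affordance", clusters, counts))  # green (VA systems)
```

Here ‘affordance’ has three links into the green cluster and one into the red cluster, mirroring the situation described in the text where a keyword touches two clusters but belongs to the one it links to more often.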

figure 3

The frequently co-occurring keywords, themes, or topics in research in the CS field on VAs in private households.

figure 4

The frequently co-occurring keywords, topics, or themes in research in the SS field on VAs in private households.

figure 5

The frequently co-occurring keywords, topics, or themes in research in the BMS field on VAs in private households.

The networks and the topics covered differ across the three scientific areas. By studying and grouping the research topics revealed in the co-occurrence analysis within and across scientific areas, we identified nine thematic clusters in VA research. We labeled these clusters “Smart devices” (cluster 1), “Human–computer interaction (HCI) and user experience (UX)” (cluster 2), “Privacy and technology adoption” (cluster 3), “VA marketing strategies” (cluster 4), “Technical challenges in VA applications development” (cluster 5), “Potential future VAs and augmented reality (AR) applications and developments” (cluster 6), “Efficiency increase by VA use” (cluster 7), “VAs providing legal evidence” (cluster 8), and “VAs supporting assisted living” (cluster 9). The clusters emerged from discussing the different research areas displayed in Figs. 3–5 in relation to our research question on strategies around VA solutions in private households. Essentially, finding appropriate clusters involved scanning the research areas, listening, and discussing possible groupings until the four researchers of this paper agreed on a final set of nine clusters. The nine clusters encompass different areas and terms in the figures; e.g., cluster 1 (smart devices) covers the areas ‘virtual assistants’, ‘conversational agents’, ‘intelligent assistants’, ‘home automation’, ‘smart speakers’, and ‘smart technology’. Cluster 2 (HCI and UX) includes areas such as ‘voice user interfaces’, ‘chatbots’, ‘human–computer interaction’, and ‘hands-free speakers’. Some of the clusters we identified contain only a small number of areas, such as cluster 4 (marketing strategies), which essentially covers the research areas ‘marketing’ and ‘advertising’.

Data analysis step 2: Qualitative content analysis

Because it can be difficult to derive qualitative conclusions from quantitative data, we additionally conducted a qualitative content analysis of the 267 articles in the cleaned dataset. The objective of this second step was to rigorously assess the results of the bibliometric review, ensuring that the nine themes identified in step 1 accord with the main tenets presented in the literature. Any qualitative content analysis of literature is, to a certain extent, subject to the authors’ subjective judgments. However, the method is well established and has been used in past studies of a similar kind. To counter the risk of subjectivity, three researchers conducted the analysis, thereby triangulating investigators (Denzin, 1989; Flick, 2009). We adopted Krippendorff’s (2013) content analysis methodology to ensure a robust analysis and to account for the contextual dimensions of each research field.

In the first step, the three researchers independently evaluated the nine clusters identified with VOSviewer by assigning each of the 267 articles to one of the nine thematic clusters. This process showed that the qualitative content analysis largely confirmed the bibliometric analysis, i.e., most articles belonged to the clusters proposed in the bibliometric analysis. However, we excluded 60 articles in this step, because less obvious thematic mismatches only become apparent through more in-depth cleaning of the dataset: 5 were duplicates (4 allocated to CS, 1 to SS), and 55 papers (46 from CS, 2 from SS, and 6 from BMS) were not about VAs in private households. This left an overall sample of 207 articles (see the list in the Appendix).
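The exclusion step can be traced as a simple PRISMA-style flow. This sketch only redoes the arithmetic with the counts reported above (267 articles, 5 duplicates, 55 papers not about VAs in private households); the function name `prisma_flow` and the stage labels are our own, and the paper does not describe using any such script.

```python
def prisma_flow(start: int, exclusions: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Return (stage, remaining records) after applying each exclusion in order."""
    flow = [("screened in content analysis", start)]
    remaining = start
    for reason, n in exclusions:
        if n > remaining:
            raise ValueError(f"cannot exclude {n} of {remaining} records ({reason})")
        remaining -= n
        flow.append((f"after removing {reason}", remaining))
    return flow


flow = prisma_flow(267, [
    ("duplicates", 5),
    ("papers not about VAs in private households", 55),
])
for stage, n in flow:
    print(f"{stage}: {n}")
# screened in content analysis: 267
# after removing duplicates: 262
# after removing papers not about VAs in private households: 207
```

Recording each stage rather than only the final total mirrors how PRISMA flow diagrams report the number of records remaining after every exclusion.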

Moreover, we identified articles that belonged to clusters other than those suggested by the bibliometric analysis and, after discussion within the research team, assigned them to the correct cluster. For example, the bibliometric analysis had originally not classified any articles in cluster 2 (“HCI and UX”) as belonging to the BMS area, whereas we identified such articles during the qualitative content analysis. Table 4 displays the distribution of articles in the final dataset.
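Comparing the three researchers’ independent assignments to find the articles that needed team discussion could look like the sketch below. It is hypothetical: the paper does not describe its tooling, the function name `review_queue`, the article IDs, and the cluster numbers are invented, and no formal reliability coefficient (such as Krippendorff’s alpha) is computed here.

```python
from collections import Counter


def review_queue(assignments: dict[str, list[int]]) -> dict[str, str]:
    """Map each article to its agreed cluster or to a discussion flag.

    assignments: article ID -> cluster numbers (1-9), one per coder.
    """
    status = {}
    for article, clusters in assignments.items():
        top, votes = Counter(clusters).most_common(1)[0]
        if votes == len(clusters):
            status[article] = f"agreed: cluster {top}"
        elif votes > len(clusters) / 2:
            status[article] = f"discuss (majority: cluster {top})"
        else:
            status[article] = "discuss (no majority)"
    return status


# Invented example with three coders per article.
queue = review_queue({"A1": [2, 2, 2], "A2": [3, 3, 1], "A3": [1, 5, 9]})
for article, decision in queue.items():
    print(article, decision)
```

Keeping majority information for non-unanimous cases gives the team a starting point for discussion without deciding the assignment automatically.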

After completing this data cleaning, we wrote short descriptions summarizing the research in each of the nine clusters (see section “Thematic clusters in recent VA research”).

In a final step, we condensed the nine clusters into four meaningful streams representing distinguishable VA research topics that can support the emergence of interdisciplinary perspectives in research on VAs in private households. We applied the following procedure to derive the streams and allocate papers from the clusters to them: First, three researchers independently conceptualized topical research streams. Then, all researchers discussed these streams and agreed on topical headlines reflecting the terminology used in the respective research. Next, they allocated papers to the four research streams presented in the section “Analysis and conceptualization of research streams”, again first working independently and later together. Because our aim was to find meaningful streams that can support the emergence of interdisciplinary research on VAs in private households, a qualitative procedure appeared to be the most appropriate strategy for this step. Qualitative analysis helps organize data into meaningful units (Miles and Huberman, 1994).

Thematic clusters in recent VA research

From our analysis, recent research on VA in private households can be divided into nine thematic clusters. In the following, we briefly present these clusters and elaborate on connections between the contributions from the three research areas we considered.

Cluster 1: Smart device solutions

Cluster 1 comprises publications on smart device solutions in smart home settings and their potential for orchestrating various household devices (Amit et al., 2019). Many CS papers present prototypes of web-based smart home solutions that can be controlled with voice commands, such as household devices enabling location-independent access to IoT-based systems (Thapliyal et al., 2018; Amit et al., 2019; Jabbar et al., 2019). A research topic that appears in both the CS and SS areas relates to users’ choices, decisions, and concerns (Pridmore and Mols, 2020). The concerns studied relate to privacy issues (Burns and Igou, 2019) or to the impact of VA use on children of different age groups (Sangal and Bathla, 2019).

A topic researched in all three scientific domains is the potential of VAs for overcoming the limitations of home automation systems. CS papers typically suggest ways of resolving mainly technical limitations, such as those concerning language options (Pyae and Scifleet, 2019), wireless transmission range (Jabbar et al., 2019), security (Thapliyal et al., 2018; Parkin et al., 2019), learning from training with humans (Demidova, 2018), or sound-based context information (Alrumayh et al., 2019). SS research mostly investigates the limitations of VAs in acting as an interlocutor and social contact for humans (Lopatovska and Oropeza, 2018; Hoy, 2018; Pridmore and Mols, 2020) or identifies requirements for more user-friendly and secure systems (Vishwakarma et al., 2019). Finally, BMS papers focus on efficiency gains from using VAs, for example in saving energy (Vishwakarma et al., 2019).

Cluster 2: Human–computer interaction and user experience

Cluster 2 contains human–computer interaction (HCI) research on the users’ experience of VA technology. Researchers investigate user challenges that result from unmet expectations concerning VA-enabled services (Santos-Pérez et al., 2011 ; Han et al., 2018 ; Komatsu and Sasayama, 2019 ). Papers from the SS area typically discuss language issues (Principi et al., 2013 ; King et al., 2017 ).

A central topic covered in both CS and BMS publications is trust in and user acceptance of VAs (e.g., Hamill, 2006 ; Hashemi et al., 2018 ; Lackes et al., 2019 ). From the BMS perspective, researchers find that trust and perceived (dis)advantages are factors influencing user decisions on buying or utilizing VAs (Lackes et al., 2019 ). Complementing this, CS researchers find that the usefulness of human-VA interactions and access to one’s own household data impact the acceptance of VAs (e.g., Pridmore and Mols, 2020 ). This is the only topic in our data material that these two disciplines discuss without SS entering the debate.

‘Humanized VAs’ is a topic discussed both in CS and SS research. In CS, this includes quasi-human voice-enabled assistants acting as buddies or companions for older adults living alone (Tsiourti et al., 2018a , b ) or technical challenges with implementing human characteristics (Hamill, 2006 ; Lopatovska and Oropeza, 2018 ; Jacques et al., 2019 ). Two papers from both CS and SS contributed to the theory of anthropomorphism in the VA context (Lopatovska and Oropeza, 2018 ; Pradhan et al., 2019 ). SS additionally offers findings about user needs, like the preferred level of autonomy and anthropomorphism for VAs (Hamill, 2006 ).

Cluster 3: Privacy and technology adoption

Cluster 3 consists predominantly of CS research into privacy-related aspects like the security risks of VA technology and corresponding technical solutions to minimize them (e.g., Dörner, 2017 ; Furey and Blue, 2018 ; Pradhan et al., 2019 ; Sudharsan et al., 2019 ). An exception concerns user-perceived privacy risks and concerns, which are studied in all three scientific domains. Related papers discuss these topics with a focus on how user attitudes towards VA technology shape technology adoption, and identify factors motivating VA use (e.g., Demidova, 2018 ; Fruchter and Liccardi, 2018 ; Lau et al., 2018 ; Pridmore and Mols, 2020 ): Perceived privacy risks are found to negatively influence user adoption rates (McLean and Osei-Frimpong, 2019 ). In CS studies, researchers predominantly propose more efficient VA solutions that users would want to bring into their homes (Seymour, 2018 ; Parkin et al., 2019 ; Vishwakarma et al., 2019 ). These should be equipped with standardized frameworks for data collection and processing (Bytes et al., 2019 ), or with technological countermeasures and detection features to establish IoT security and privacy protection (Stadler et al., 2012 ; Sudharsan et al., 2019 ; Javed and Rajabi, 2020 ). Complementing this, SS researchers investigate measures for protecting the privacy of VA users beyond technical approaches, such as legislation ensuring privacy protection (Pfeifle, 2018 ; Dunin-Underwood, 2020 ).

Cluster 4: VA marketing strategies

Cluster 4 comprises research developing strategies for advertising the use of VAs in private households. All articles in this cluster come from BMS. Scholars address various aspects of VA marketing strategies, such as highlighting security improvements or the enhanced user-friendliness and intelligence of the devices (e.g., Burns and Igou, 2019 ; Vishwakarma et al., 2019 ). Others study how to measure user satisfaction with VA technology (e.g., Hashemi et al., 2018 ).

Cluster 5: Technical challenges in VA applications development

Cluster 5 contains predominantly CS research papers investigating and proposing solutions for technical challenges in VA application development. Recent work focuses on extensions and improvements for the technologically relatively mature mass-market VAs (e.g., Liciotti et al., 2014 ; Azmandian et al., 2019 ; Jabbar et al., 2019 ; Mavropoulos et al., 2019 ). Some research investigates ways to overcome the technical challenges of VAs in household environments: For example, King et al. ( 2017 ) work on more robust speech recognition, and Ito ( 2019 ) proposes an audio watermarking technique to avoid the misdetection of utterances from other VAs in the same room. Further research on technological improvements includes work on knowledge graphs (Dong, 2019 ), on cross-lingual dialog scenarios (Liu et al., 2020 ), on fog computing for detailed VA data analysis (Zschörnig et al., 2019 ), and on the automated integration of new services based on formal specifications and error handling via follow-up questions (Stefanidi et al., 2019 ).

We identify a complementarity between CS and SS research within the research topic of “affective computing”. In both research domains, researchers strive to identify ways to create more empathic VAs. For example, Tao et al. ( 2018 ) propose a framework that conceptualizes several dimensions of emotion and VA use. SS research contributes a virtual caregiver prototype that is aware of the patient’s emotional state (Tironi et al., 2019 ). However, the scholarly contributions from the two areas do not refer to each other.

Cluster 6: Potential future VA applications and developments

Cluster 6 investigates the future of VA research, particularly the technological advancements we can expect and suggestions for future research avenues. Most CS papers introduce potential technical applications in many different areas, such as medical treatment and therapy (Shamekhi et al., 2017 ; Pradhan et al., 2018 ; Patel and Bhalodiya, 2019 ) or VA content creation and retrieval (Martin, 2017 ; Kita et al., 2019 ). A sub-group of papers also proposes functional prototypes (e.g., Yaghoubzadeh et al., 2015 ; Freed et al., 2016 ; Tielman et al., 2017 ).

We identify three topics that are discussed in both SS and CS publications. The first focuses on language and VAs and represents an area where CS research relates to SS findings: While SS identifies open language issues in dialogs with VAs (Martin, 2017 ; Ong et al., 2018 ; Huxohl et al., 2019 ), CS researchers investigate how to approach them, not only at the technological level of speech recognition but also in terms of what it means to have a conversation with a machine (Yaghoubzadeh et al., 2015 ; Ong et al., 2018 ; Santhanaraj and Barkathunissa, 2020 ). A second focus is on near-future use scenarios (Hoy, 2018 ; Seymour, 2018 ; Tsiourti et al., 2019 ; Burns and Igou, 2019 ), such as VA library services, VA services for assisted living, or VAs supporting emergency detection and handling. The third common topic concerns future differences between the use of VAs in private households and in other environments like public spaces (Lopatovska and Oropeza, 2018 ; Robinson et al., 2018 ).

Cluster 7: Efficiency increase by VA use

Cluster 7 consists of papers about efficiency increases through VA use, with a focus on smart home automation systems. Papers in BMS discuss how the use of VAs increases the efficiency of home automation systems (Vishwakarma et al., 2019 ). CS papers study and appraise the efficiency of home automation solutions and use cases, more efficient VA automation systems, interface device solutions (Liciotti et al., 2014 ; Jabbar et al., 2019 ; Jacques et al., 2019 ), effective activity assistance (Freed et al., 2016 ; Palumbo et al., 2016 ; Tielman et al., 2017 ), care for elderly people (Donaldson et al., 2005 ; Wallace and Morris, 2018 ; Tsiourti et al., 2019 ), and smart assistive user interfaces and systems of the future (Shamekhi et al., 2017 ; Pradhan et al., 2018 ; Mokhtari et al., 2019 ). SS has not yet contributed to this cluster.

Cluster 8: VAs providing legal evidence

Cluster 8 addresses the rather novel topic of digital forensics in papers from the CS and SS domains. The research studies how VA activities can inform court cases. Researchers investigate which information can be gathered, derived, or inferred from IoT-collected data, and what approaches and tools are available and required to analyze them (Shin et al., 2018 ; Yildirim et al., 2019 ).

Cluster 9: VAs supporting assisted living

Cluster 9 comprises papers on VAs supporting assisted living. CS papers explore and describe technical solutions for the application of VAs in households and everyday task planning (König et al., 2016 ; Tsiourti et al., 2018a ; Sanders and Martin-Hammond, 2019 ), for improving aspects of companionship (Donaldson et al., 2005 ), for stress management in relation to chronic pain (Shamekhi et al., 2017 ), and for the recognition of distress calls (Principi et al., 2013 ; Liciotti et al., 2014 ). CS scholars also study user acceptance and the usability of VA for elderly people (Kowalski et al., 2019 ; Purao and Meng, 2019 ).

CS and SS share a research focus on VAs helping people maintain a self-determined lifestyle (Yaghoubzadeh et al., 2015 ; Mokhtari et al., 2019 ) and on the potential and limitations of VAs for home care and therapy (Lopatovska and Oropeza, 2018 ; Kowalski et al., 2019 ; Turner-Lee, 2019 ), but without relating their findings to each other.

Analysis and conceptualization of research streams

The comparison of the bibliometric and the qualitative content analyses largely confirmed the clusters found in the bibliometric analysis. The comparison did, however, also lead us to allocate some articles to different areas. The content analysis particularly helped subsume the nine clusters under four principal research streams. The overview gained from these four streams points to interdisciplinary research topics that need to be studied by scholars who want to help realize VA potential through applications that users perceive as safe.

What all research domains share to a certain extent is a focus on users’ perceived privacy risks and concerns and a focus on the impact of perceived risks or concerns on the adoption of VA technology. At the same time, our findings confirm our assumption that these complementarities are generally not well used for advancing the field: In CS, researchers predominantly study future application development and technological advancements, but—except for language issues (cluster 6)—they do not relate this much to solving challenges identified in SS and BMS research. In the following, we first present an overview of the four deduced research streams and, in the next section, propositions and the conceptual model for future interdisciplinary research that we developed based on our analysis.

The four major research streams into which we consolidated the nine thematic clusters identified in our literature review are labeled “Conceptual foundation of VA research” (stream 1), “Systemic challenges, enabling technologies and implementation” (stream 2), “Efficiency” (stream 3), and “VA applications and (potential) use cases” (stream 4). The streams were obtained in a qualitative procedure in which three researchers conceptualized streams independently and then discussed potentially meaningful streams together (see section 3.3). Table 5 provides an overview of the four main streams identified in VA literature and presents selected publications for each stream.

The streams systematize the scattered body of VA research in a way that offers clearly distinguishable interdisciplinary research avenues to assist in strategizing around and realizing the potential of VA technology with applications that are perceived as safe and make a real difference in users’ everyday lives. The first stream includes all papers offering theoretical and conceptual knowledge; such papers, for example, conceptualize challenges for VA user perceptions or develop security and privacy protection concepts. Systemic challenges and enabling technologies form the second stream in VA research, which particularly includes systemic security and UX challenges as well as legal issues. Efficiency is the third research stream, in which scholars investigate private individuals’ awareness of how VAs can make their homes more efficient and ask how VAs can be advertised to private households. Finally, VA applications and potential use cases form the fourth research stream, which investigates user expectations and presents prototypes for greater VA use in future home automation systems, medical care, or IoT forensics.

The overview based on the four streams enables us to frame the contributions of the research domains to VA research more clearly than the nine clusters do. We find that all research areas contribute publications to all streams. However, the number of contributions varies: CS acts as the main driver of current developments, with the most publications in all research streams. CS research predominantly addresses systemic challenges, enabling technologies, and technology implementation. We recognize increasing scholarly attention to user-oriented VA applications and to VA systems for novel applications beyond their originally intended usage, such as exploiting the microphone array for sensing a user’s gestures and tracking exercises (Agarwal et al., 2018 ; Tsiourti et al., 2018a / b ) or using VA data for forensics (Dorai et al., 2018 ; Shin et al., 2018 ). This indicates that the fundamental technical challenges in the development of this emergent technology have been solved. SS has so far mainly contributed to the theoretical foundation of VA design principles and use affordances (Yusri et al., 2017 ) and to theory that supports developing concrete applications. It also conceptualizes the potential or desirable impact of VAs in real-life settings, such as increasing comfort and quality of life through low-cost smart home automation systems combining VAs and smartphones (Kodali et al., 2019 ), or VAs contributing to content creation (Martin, 2017 ). The contributions by BMS scholars mainly aim at researching and promoting efficiency increases from using VAs.

Discussion: Propositions and a framework for future research, and related business opportunities

In this paper, we used a systematic literature review approach combining a bibliometric and a qualitative content analysis to structure the dispersed insights from scholarly research on VAs in CS, SS, and BMS, and to conceptualize linkages and common themes between them. We identified four major research streams and specified the contributions of researchers from the different disciplines to them in a conceptual overview. Our research confirms advances in the technological foundations of VAs (Pyae and Joelsson, 2018 ; Lee et al., 2019 ; McLean and Osei-Frimpong, 2019 ), and some concrete VAs like Alexa, Google Assistant, and Siri have already arrived in the mass market. Still, more technologically robust and user-friendly solutions that meet legal requirements for data security will be needed to spark broader user interest (Kuruvilla, 2019 ; Pridmore and Mols, 2020 ).

Propositions for future research

We find that recent research from the three domains addresses, in different ways and with different foci, the challenges that the literature identifies as hindering broader user adoption of VAs. Table 6 summarizes the identified challenges and domain-specific research contributions.

However, to advance VA adoption in private households, more complex VA solutions will need to convince users that the perceived privacy risks have been resolved (Kowalczuk, 2018 ; Lackes et al., 2019 ). To this end, all three research domains will need to contribute: CS needs to define comprehensible frameworks for data collection and processing (Bytes et al., 2019 ) and develop solutions that ensure data safety (Mirzamohammadi et al., 2017 ; Sudharsan et al., 2019 ; Javed and Rajabi, 2020 ). Complementing this, SS should identify the social and legal conditions under which users perceive private households as safe environments for VA use (Pfeifle, 2018 ; Dunin-Underwood, 2020 ). Finally, BMS is urged to identify user advantages that go beyond simple efficiency gains, investigate the benefits of accessing one’s own data, and find metrics for user trust in technology applications (Lackes et al., 2019 ). In particular, SS research provides potentially valuable insights into users’ perceptions and into use case areas such as home medical care or assisted living that CS scholars developing advanced solutions should take into account; conversely, SS research would benefit from considering available technical solutions. Similarly, BMS research exhibits a rather narrow focus on increasing the efficiency of activities by using VA applications and on how to market these solutions to private households. CS scholars complement this focus with technical solutions aimed at increasing the efficiency of automated home systems, but the research efforts of the two domains are not well aligned. VA security-related issues and solutions, the limitations of VA applications for assisted living, and the effects of humanization and anthropomorphism seem to be under-investigated topics in BMS.

Thus, our first proposition reads as follows:

Proposition 1: To advance users' adoption of complex VA applications in private households, the domain-specific disciplinary efforts of CS, SS, and BMS need to be integrated through interdisciplinary research.

Our study has shown that this is particularly important for gaining the necessary insights into how to overcome the VA security issues and technological development constraints that CS works on while, at the same time, dealing with the effects of VA humanization (SS research) and developing VA-related business opportunities (BMS research) in smart home systems, assisted living, medical home therapy, and digital forensics. Therefore, we define the following three sub-propositions:

Proposition 1.1: In order to realize VA potential for medical care solutions that are perceived as safe by users, research insights from SS studies on VA perception and on perceived security issues need to be integrated with CS research aimed at resolving the technical constraints of VA applications and with BMS research about the development of use cases desirable for private households and related business models.

Proposition 1.2: To advance smart home system efficiency and arrive at regulations that make users perceive the usage of more complex applications as safe, research insights from CS studies on systemic integration and security-related technical solutions need to be studied and developed.

Proposition 1.3: In order to increase our knowledge of the social and economic conditions for VA adoption in private households, BMS and SS research need to integrate insights from user research with VA prototypes and from research about near-future scenarios of VA use to model and test valid business cases that are not based on mere assumptions of efficiency gains.

In our four streams, we moreover recognize a common interest in studying VAs beyond isolated voice-enabled ‘butlers’. In essence, VAs are increasingly investigated as gateways to smart home systems that enable interaction with entire ecosystems. Beyond the development of more complex technical applications in CS, this mainly calls for more research into the social (SS) and economic (BMS) conditions enabling the emergence of such ecosystems, ranging from the necessary changes in regulations to insurance and real estate issues and the design of marketing strategies for VA health applications in the home (Olson and Kemery, 2019 ; Bhat, 2005 ; Melkers and Xiao, 2010 ; Sestino et al., 2020 ). This applies not only to the three scientific domains we examined; it also calls for integrating complementary VA-related research from adjacent disciplines, such as law, policy, or real estate. Our second proposition thus reads as follows:

Proposition 2: To advance users’ adoption of complex VA applications in private households, research needs to make interdisciplinary efforts to study and develop ways of overcoming ecosystem-related technology adoption challenges.

Conceptual framework for future research

As outlined above, future research wishing to contribute to increasing user acceptance and awareness and to generate use cases that make sense for private households in everyday life is urged to make interdisciplinary efforts to integrate complementary findings.

The conceptual framework (Fig. 6 ) presents avenues for future research. The figure highlights Propositions 1 and 2, which emphasize the need to advance user adoption through interdisciplinary research that can help overcome challenges arising from complex VA applications (Proposition 1) and ecosystem-related technology adoption challenges (Proposition 2). Furthermore, the figure reflects three sub-propositions that summarize relevant avenues for interdisciplinary work that can help solve VA-related security issues, generate security and privacy protection concepts, and advance frameworks for legal regulation. The first sub-proposition concerns research that helps find solutions for home medical care in which VA limitations and security issues are solved. The second sub-proposition consists of research needed to advance systemic integration and security-related solutions for the efficiency and regulation of smart home systems. The third sub-proposition involves research that can help define the social and economic conditions for VAs and create business opportunities by including insights from user research with VA prototypes and from research on near-future scenarios, in order to model and test valid business cases that are not based on mere assumptions of efficiency gains.

Fig. 6: The framework highlights the focus of Propositions 1 and 2 and reflects Propositions 1.1, 1.2, and 1.3.

Identified business opportunities that will help realize VA potential

Overall, we confirm that VA, unlike other technological innovations (Bhat, 2005 ; Chao et al., 2007 ; Sestino et al., 2020 ), is not a technology that enables companies to profit from implementing it in their own organizations or from making business processes more efficient. Instead, we find that companies need to build business models around VA-related products and services that users perceive as safe and beneficial. Table 7 below provides an overview of potential areas offering such business opportunities, the technology maturity of these areas, and the social and business-related challenges that need to be solved to fully realize VA potential in the everyday life of users.

As shown, the three areas in which we identified business opportunities in the literature, i.e., smart home systems (Freed et al., 2016 ; Thapliyal et al., 2018 ; Jabbar et al., 2019 ), assisted living and medical home therapy (König et al., 2016 ; Tsiourti et al., 2018a / b ; Sanders and Martin-Hammond, 2019 ), and digital forensics (Shin et al., 2018 ; Yildirim et al., 2019 ), differ in technological maturity, social system conditions, and business model maturity. Notably, although cluster 8 ‘digital forensics’ consisted of only two papers in our review, we expect it to become an increasingly salient cluster in the next few years due to the importance of the topic for governmental bodies and society.

Designing appropriate business models will require companies, as a first step, to develop a deep understanding of the potential design of future ecosystems, i.e., of “the evolving set of actors, activities, and artifacts, including complementary and substitute relations, that are important for the innovative performance of an actor or a population of actors.” (Granstrand and Holgersson, 2020 , p. 3). We therefore call for interdisciplinary research that develops and integrates the necessary insights in a thorough manner that is comprehensible for companies.

Methodology

In this paper, we used a relatively new approach to a literature review: We combined an automated bibliometric analysis with qualitative content analysis to gain holistic insights into a multi-faceted research topic and to structure the available body of knowledge across three scientific domains. In doing so, we followed the advice of recent research that found classical, purely content-based literature reviews to be time-consuming, lacking rigor, and prone to researcher bias (Caputo et al., 2018 ; Verma and Gustafsson, 2020 ). Overall, we can confirm that automating literature research through VOSviewer saved time in the actual search across (partly domain-specific) sources and in the collection of scientific literature, and it allowed us to identify meaningful research clusters based on keywords in an enormous body of data relatively quickly (Verma, 2017 ; Van Eck and Waltman, 2014 ). However, we also found that several additional steps were necessary to assure the quality of the review: Despite the careful selection of keywords, the initial literature list contained several irrelevant articles (i.e., articles that did not address VA-related topics yet involved the keywords ‘echo’ and ‘home’).

Thus, the literature lists had to be cleaned manually before VOSviewer could generate meaningful graphs. The subsequent step of identifying research clusters from the graphs demanded broad topical expertise. As described by Krippendorff ( 2013 ), we found this identification of clusters to be a necessarily iterative process, not only to continuously refine meaningful clusters but also to reach a common understanding and interpretation in an interdisciplinary team. Similarly, deriving higher-level categories, i.e., the research streams, required iterative refinement.
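The keyword-based steps described above (screening out false positives, then grouping papers by shared keywords) can be illustrated with a minimal Python sketch. It assumes that each article is represented only by its list of keywords; the term list `VA_TERMS` and the helpers `is_va_related` and `cooccurrence` are hypothetical illustrations. They do not reproduce the screening criteria used in this study, which cleaned the list manually, nor VOSviewer's own mapping and clustering. The sketch only shows the keyword co-occurrence counts on which keyword maps of this kind are built.

```python
from collections import Counter
from itertools import combinations

# Hypothetical VA-related terms used to screen out false positives.
# "echo" is deliberately absent: on its own it is ambiguous (e.g., acoustic echo).
VA_TERMS = {"voice assistant", "smart speaker", "alexa", "siri", "google assistant"}


def is_va_related(keywords):
    """Keep an article only if at least one keyword names a VA concept."""
    return any(k.lower() in VA_TERMS for k in keywords)


def cooccurrence(articles):
    """Count how often each unordered keyword pair appears in the same article."""
    counts = Counter()
    for keywords in articles:
        unique = sorted({k.lower() for k in keywords})
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1
    return counts


# Toy input: the second article matches 'echo' and 'home' but is not about VAs.
articles = [
    ["Alexa", "privacy", "smart home"],
    ["echo", "home", "room acoustics"],
    ["voice assistant", "privacy", "smart home"],
]
relevant = [kw for kw in articles if is_va_related(kw)]
counts = cooccurrence(relevant)
print(counts[("privacy", "smart home")])  # 2
```

In such a sketch, the pair counts would serve as link weights between keywords in a map; an automated filter like `is_va_related` could at most support, not replace, the manual cleaning and iterative interpretation described above.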

Retrospectively, the quantitative bibliometric analysis helped in recognizing both core topics and gaps in VA-related research with comprehensive reach. The complementary content analysis yielded insights into intersections and overlaps in research by the different areas considered and enabled the identification of further promising avenues for interdisciplinary research.

Conclusions

From our study, we conclude that research into VA-based services is not taking advantage of the potential synergies across disciplines. Business opportunities can specifically be found in spaces that require the combination of research domains that are still disconnected. This should be taken into account when looking for information that can help predict the service value of smart accommodation (Papagiannidis and Davlembayeva, 2022 ) or characteristics of future technology use cases that can fit users’ needs (Nguyen et al., 2022 ). This can also support scholars and managers in strategizing about future business opportunities (Brem et al., 2019 ; Antonopoulou and Begkos, 2020 ).

Consequently, our framework and the propositions we developed highlight that more interdisciplinary research is needed, and what type of research is needed, to advance the development and application of VAs in private households and, by implication, to inform companies about future business opportunities.

The study also points to concrete characteristics of future VA use cases and technology: Continuous development in research on VAs, e.g., on novel devices and complementary technologies like artificial intelligence and virtual reality, suggests that future VAs will no longer be limited to audio-only devices but will increasingly feature screens and built-in cameras and offer more advanced use cases. Accordingly, embodied VAs, for example in the form of social robots, require further technological advancement and integration, as well as studies on user perception.

Implications for managers

Our research enabled us to identify and describe the most promising areas for business opportunities while highlighting related technological, social, and business challenges. From this, it became obvious that managers need to take all three dimensions and the related types of challenges into account in order to successfully predict characteristics of future technology use cases that fit users’ needs, and to use this information in their strategy development processes (Brem et al., 2019 ; Antonopoulou and Begkos, 2020 ). This requires the design not just of new services and business models but of complete business ecosystems, and the establishment of partnerships from the private sector. We moreover found that establishing trust in the safe and transparent treatment of privacy and data is key to getting users to buy and use services involving VAs, while purely efficiency-based arguments are not enough to dispel potential users’ current worries, such as the data security of technology used to improve the tracking and monitoring of patients or viruses (Abdel-Basset et al., 2021 ).

Although our study investigated VAs in private households, the growing acceptance of working from home, not least due to the experiences of the COVID-19 pandemic, means that our findings also have implications for organizing home-office environments. For example, while the Alexa “daily check” and the Apple health check app can provide community-based AI technology that supports self-testing and virus-tracking efforts (Abdel-Basset et al., 2021 ), managers will need to ensure that company data is safe, which will require them to consider how their employees use VA hardware at home.

Limitations

As with most research, this study has its limitations. While we see value in the combined approach taken in this research, as it allows insights into strategies for VA solutions that match the needs of private households, limitations lie in the qualitative part of our methodology, which is subject to a certain degree of author subjectivity. Limitations also relate to the fact that we included only articles from the Scopus database in this review. Thus, future research should consider articles published in other databases like EBSCO, Web of Science, or Google Scholar. Also, the study focused on only three scientific domains and on literature published up to May 2020. This review does not discuss the consequences of the ongoing changes triggered by the Covid-19 pandemic for the use of VA solutions in private households; the impact of this disruptive pandemic experience on VA use is not yet well understood. More research will be necessary to obtain a complete account of how Covid-19 has transformed the use of VAs in private homes and, using the same methodology, to understand the linkages and intersections between further research areas.

The combined bibliometric and qualitative content analysis provided an overview of connections and intersections, and an in-depth overview of current research streams. Future research could conduct co-citation and/or bibliographic coupling analyses of authors, institutions, countries, references, etc. to complement our research.

Data availability

Datasets were derived from public resources. Data sources for this article are provided in the Methods section of this article. Data analysis documents are not publicly available as researchers have moved on to other institutions.

Abdel-Basset M, Chang V, Nabeeh NA (2021) An intelligent framework using disruptive technologies for COVID-19 analysis. Technol Forecast Soc Change 163:120431. https://doi.org/10.1016/j.techfore.2020.120431

Agarwal A, Jain M, Kumar P, Patel S (2018) Opportunistic sensing with MIC arrays on smart speakers for distal interaction and exercise tracking. In: IEEE Press (ed), 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 6403–6407

Alrumayh AS, Lehman SM, Tan CC (2019) ABACUS: audio based access control utility for smarthomes. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 4th ACM/IEEE Symposium on Edge Computing. pp. 395–400

Amit S, Koshy AS, Samprita S, Joshi S, Ranjitha N (2019) Internet of Things (IoT) enabled sustainable home automation along with security using solar energy. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 International Conference on Communication and Electronics Systems (ICCES). pp. 1026–1029

Ammari T, Kaye J, Tsai J, Bentley F (2019) Music, search, and IoT: how people (really) use voice assistants. ACM Trans Comput–Hum Interact 26:1–28. https://doi.org/10.1145/3311956

Antonopoulou K, Begkos C (2020) Strategizing for digital innovations: value propositions for transcending market boundaries. Technol Forecast Soc Change 156:120042

Aylett MP, Cowan BR, Clark L (2019) Siri, echo and performance: you have to suffer darling. In: Association for Computing Machinery (ACM) (ed), Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems. pp. 1–10

Azmandian M, Arroyo-Palacios J, Osman S (2019) Guiding the behavior design of virtual assistants. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 19th ACM international conference on intelligent virtual agents. pp. 16–18

Batat W (2021) How augmented reality (AR) is transforming the restaurant sector: Investigating the impact of “Le Petit Chef” on customers’ dining experiences. Technol Forecast Soc Change 172:121013

Bhat JSA (2005) Concerns of new technology based industries—the case of nanotechnology. Technovation 25(5):457–462. https://doi.org/10.1016/j.technovation.2003.09.001

Berg Insight (2022) The number of smart homes in Europe and North America reached 105 million in 2021, Press Releases, 20 April 2022. https://www.berginsight.com/the-number-of-smart-homes-in-europe-and-north-america-reached-105-million-in-2021

Birgonul Z, Carrasco O (2021) The adoption of multidimensional exploration methodology to the design-driven innovation and production practices in AEC industry. J Constr Eng Manag Innov 4(2):92–105. https://doi.org/10.31462/jcemi.2021.02092105

Brandt M (2018) Wenig Echo in Deutschland. Statista

Brasser F, Frassetto T, Riedhammer K, Sadeghi A-R, Schneider T, Weinert C (2018) VoiceGuard: secure and private speech processing. In: International Speech Communication Association (ISCA) (ed), Proceedings of the annual conference of the International Speech Communication Association, INTERSPEECH. pp. 1303–1307

Brause SR, Blank G (2020) Externalized domestication: smart speaker assistants, networks and domestication theory. Inf Commun Soc 23(5):751–763. https://doi.org/10.1080/1369118X.2020.1713845

Brem A, Bilgram V, Marchuk A (2019) How crowdfunding platforms change the nature of user innovation–from problem solving to entrepreneurship. Technol Forecast Soc Change 144:348–360

Brenner W, Giffen BV, Koehler J (2021) Management of artificial intelligence: feasibility, desirability and viability. In: Aier S et al. (eds), Engineering the transformation of the enterprise. pp. 15–36

Burns MB, Igou A (2019) “Alexa, write an audit opinion”: adopting intelligent virtual assistants in accounting workplaces. J Emerg Technol Account 16(1):81–92. https://doi.org/10.2308/jeta-52424

Bytes A, Adepu S, Zhou J (2019) Towards semantic sensitive feature profiling of IoT devices. IEEE Internet Things J 6(5):8056–8064. https://doi.org/10.1109/JIOT.2019.2903739

Calaça J, Nóbrega L, Baras K (2019) Smartly water: Interaction with a smart water network. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), Proceedings of the 2019 5th Experiment International Conference (Exp. at’19). pp. 233–234

Callahan JL (2014) Writing literature reviews: a reprise and update. Hum Resour Dev Rev 13(3):271–275. https://doi.org/10.1177/1534484314536705

Caputo A, Ayoko OB, Amoo N (2018) The moderating role of cultural intelligence in the relationship between cultural orientations and conflict management styles. J Bus Res 89:10–20. https://doi.org/10.1016/j.jbusres.2018.03.042

Carayannis EG, Turner E (2006) Innovation diffusion and technology acceptance: the case of PKI technology. Technovation 26(7):847–855. https://doi.org/10.1016/j.technovation.2005.06.013

Celebre AMD, Dubouzet AZD, Medina IBA, Surposa ANM, Gustilo RC (2015) Home automation using Raspberry Pi through Siri enabled mobile devices. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2015 International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM). pp. 1–6

Chan ZY, Shum P (2018) Smart office: a voice-controlled workplace for everyone. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 2nd international symposium on computer science and intelligent control. pp. 1–5

Chao C-C, Yang J-M, Jen W-Y (2007) Determining technology trends and forecasts of RFID by a historical review and bibliometric analysis from 1991 to 2005. Technovation 27(5):268–279. https://doi.org/10.1016/j.technovation.2006.09.003

Clark M, Newman MW, Dutta P (2022) ARticulate: one-shot interactions with intelligent assistants in unfamiliar smart spaces using augmented reality. Proc ACM Interact Mob Wearable Ubiquitous Technol 6(1):1–24

Clemente C, Greco E, Sciarretta E, Altieri L (2022) Alexa, how do i feel today? Smart speakers for healthcare and wellbeing: an analysis about uses and challenges. Sociol Soc Work Rev 6(1):6–24

Columbus L (2020) What’s new in Gartner’s hype cycle for emerging technologies, 2020. Forbes. https://www.forbes.com/sites/louiscolumbus/2020/08/23/whats-new-in-gartners-hype-cycle-for-emerging-technologies-2020/?sh=6363286fa46a

Demidova E (2018) Can children teach AI? Towards expressive human–AI dialogs. In: Vrandečić D, Bontcheva K, Suárez-Figueroa MC, Presutti V, Celino I, Sabou M, Kaffee L-A, Simperl E (eds), International Semantic Web Conference Proceedings (P&D/Industry/BlueSky). p. 2180

Denzin NK (1989) Interpretive biography, vol. 17. SAGE

Dercole F, Dieckmann U, Obersteiner M, Rinaldi S (2008) Adaptive dynamics and technological change. Technovation 28(6):335–348. https://doi.org/10.1016/j.technovation.2007.11.004

Deshpande NG, Itole DA (2019) Personal assistant based home automation using Raspberry Pi. Int J Recent Technol Eng

Donaldson J, Evnin J, Saxena S (2005) ECHOES: encouraging companionship, home organization, and entertainment in seniors. In: Association for Computing Machinery (ACM) (ed), Proceedings of the CHI’05 extended abstracts on human factors in computing systems. pp. 2084–2088

Dong XL (2019) Building a broad knowledge graph for products. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), Proceedings of the 2019 IEEE 35th International Conference on Data Engineering (ICDE), 2019-April. pp. 25–25

Dorai G, Houshmand S, Baggili I (2018) I know what you did last summer: Your smart home internet of things and your iPhone forensically ratting you out. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 13th international conference on availability, reliability and security. Article 3232814

Dörner R (2017) Smart assistive user interfaces in private living environments. In: Gesellschaft für Informatik e.V. (GI) (ed), Lecture notes in informatics (LNI), proceedings—series of the gesellschaft fur informatik (GI). pp. 923–930

Drucker PF (1988) The coming of the new organization. Reprint Harvard Business Review, 88105. https://ams-forschungsnetzwerk.at/downloadpub/the_coming-of_the_new_organization.pdf. Accessed 10 Jul 2022

Druga S, Williams R, Breazeal C, Resnick M (2017) “Hey Google is it OK if I eat you?” Initial explorations in child-agent interaction. In: Blikstein P, Abrahamson D (eds), Proceedings of the 2017 conference on Interaction Design and Children (IDC ’17). pp. 595–600

Dunin-Underwood A (2020) Alexa, can you keep a secret? Applicability of the third-party doctrine to information collected in the home by virtual assistants. Inf Commun Technol Law 29(1):101–119. https://doi.org/10.1080/13600834.2020.1676956

Elahi H, Wang G, Peng T, Chen J (2019) On transparency and accountability of smart assistants in smart cities. Appl Sci 9(24):5344. https://doi.org/10.3390/app9245344

Fathalizadeh A, Moghtadaiee V, Alishahi M (2022) On the privacy protection of indoor location dataset using anonymization. Comput Secur 117:102665

Flick U (2009) An introduction to qualitative research, 4th edn. SAGE

Freed M, Burns B, Heller A, Sanchez D, Beaumont-Bowman S (2016) A virtual assistant to help dysphagia patients eat safely at home. IJCAI 2016:4244–4245

Fruchter N, Liccardi I (2018) Consumer attitudes towards privacy and security in home assistants. In: Association for Computing Machinery (ACM) (ed), Extended Abstracts of the 2018 CHI conference on human factors in computing systems, 2018-April. pp. 1–6

Furey E, Blue J (2018) She knows too much—voice command devices and privacy. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), Proceedings of 2018 29th Irish Signals and Systems Conference (ISSC). pp. 1–6

Gartner (2019) Gartner predicts 25 percent of digital workers will use virtual employee assistants daily by 2021. Gartner https://www.gartner.com/en/newsroom/press-releases/2019-01-09-gartner-predicts-25-percent-of-digital-workers-will-u

Giorgi R, Bettin N, Ermini S, Montefoschi F, Rizzo A (2019) An iris+voice recognition system for a smart doorbell. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 8th Mediterranean Conference on Embedded Computing (MECO). pp. 1–4

Gnewuch U, Morana S, Heckmann C, Maedche A (2018) Designing conversational agents for energy feedback. In: Chatterjee S, Dutta K, Sundarraj RP (eds), Proceedings of the International conference on design science research in information systems and technology, vol 10844. pp. 18–33

Gong Y, Yatawatte H, Poellabauer C, Schneider S, Latham S (2018) Automatic autism spectrum disorder detection using everyday vocalization captured by smart devices. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 2018 ACM international conference on bioinformatics, computational biology, and health informatics. pp. 465–473

Goud N, Sivakami A (2019) Spectate home appliances by internet of things using MQTT and IFTTT through Google Assistant. Int J Sci Technol Res 8(10):1852–1857

Granstrand O, Holgersson M (2020) Innovation ecosystems: a conceptual review and a new definition. Technovation 90:102098

Grossman GM, Helpman E (1991) Innovation and growth in the global economy. MIT Press

Grossman-Kahn B, Rosensweig R (2012) Skip the silver bullet: driving innovation through small bets and diverse practices. Lead Through Design 18:815

Hamill L (2006) Controlling smart devices in the home. Inf Soc 22(4):241–249. https://doi.org/10.1080/01972240600791382

Han J, Chung AJ, Sinha MK, Harishankar M, Pan S, Noh HY, Zhang P, Tague P (2018) Do you feel what I hear? Enabling autonomous IoT device pairing using different sensor types. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2018 IEEE symposium on Security and Privacy (SP). pp. 836–852

Harzing A-W, Alakangas S (2016) Google Scholar, Scopus and the Web of Science: a longitudinal and cross-disciplinary comparison. Scientometrics 106(2):787–804. https://doi.org/10.1007/s11192-015-1798-9

Hashemi SH, Williams K, El Kholy A, Zitouni I, Crook PA (2018) Measuring user satisfaction on smart speaker intelligent assistants using intent sensitive query embeddings. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 27th ACM international conference on information and knowledge management. pp. 1183–1192

Hern A (2017) Google Home smart speaker brings battle of living rooms to UK. The Guardian. https://www.theguardian.com/technology/2017/mar/28/google-home-smart-speaker-launch-uk

Herring H, Roy R (2007) Technological innovation, energy efficient design and the rebound effect. Technovation 27(4):194–203. https://doi.org/10.1016/j.technovation.2006.11.004

Hofmann C, Orr S (2005) Advanced manufacturing technology adoption—the German experience. Technovation 25(7):711–724. https://doi.org/10.1016/j.technovation.2003.12.002

Hoy MB (2018) Alexa, Siri, Cortana, and more: an introduction to voice assistants. Med Ref Serv Q 37(1):81–88. https://doi.org/10.1080/02763869.2018.1404391

Hu J, Tu X, Zhu G, Li Y, Zhou Z (2013) Coupling suppression in human target detection via impulse through wall radar. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), Proceedings of the 2013 14th International Radar Symposium (IRS), vol 2. pp. 1008–1012

Huxohl T, Pohling M, Carlmeyer B, Wrede B, Hermann T (2019) Interaction guidelines for personal voice assistants in smart homes. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 international conference on Speech Technology and Human–Computer Dialogue (SpeD). pp. 1–10

IDEO.org (2009) Human-centred design toolkit. IDEO.org

Ichikawa J, Mitsukuni K, Hori Y, Ikeno Y, Alexandre L, Kawamoto T, Nishizaki Y, Oka N (2019) Analysis of how personality traits affect children’s conversational play with an utterance-output device. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob). pp. 215–220

Ilievski A, Dojchinovski D, Ackovska N, Kirandziska V (2018) The application of an air pollution measuring system built for home living. In: Kalajdziski S, Ackovska N (eds) ICT innovations 2018. Engineering and life sciences. Springer, pp. 75–89

Ito A (2019) Muting machine speech using audio watermarking. In: Pan J-S, Ito A, Tsai P-W, Jain LC (eds) Recent advances in intelligent information hiding and multimedia signal processing. Springer, pp. 74–81

Jabbar WA, Kian TK, Ramli RM, Zubir SN, Zamrizaman NS, Balfaqih M, Shepelev V, Alharbi S (2019) Design and fabrication of smart home with internet of things enabled automation system. IEEE Access 7:144059–144074. https://doi.org/10.1109/ACCESS.2019.2942846

Jacques R, Følstad A, Gerber E, Grudin J, Luger E, Monroy-Hernández A, Wang D (2019) Conversational agents: acting on the wave of research and development. In: Association for Computing Machinery (ACM) (ed), Extended Abstracts of the 2019 CHI conference on human factors in computing systems. pp. 1–8

Javed Y, Rajabi N (2020) Multi-Layer perceptron artificial neural network based IoT botnet traffic classification. In: Arai K, Bhatia R, Kapoor S (eds) Proceedings of the Future Technologies Conference (FTC) 2019. Springer, pp. 973–984

Jones VK (2018) Voice-activated change: marketing in the age of artificial intelligence and virtual assistants. J Brand Strategy 7(3):233–245

Kandlhofer M, Steinbauer G, Hirschmugl-Gaisch S, Huber P (2016) Artificial intelligence and computer science in education: from kindergarten to university. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2016 IEEE Frontiers in Education Conference (FIE). pp. 1–9

Kerekešová V, Babič F, Gašpar V (2019) Using the virtual assistant Alexa as a communication channel for road traffic situation. In: Choroś K, Kopel M, Kukla E, Siemiński A (eds) Multimedia and network information systems, vol 833. Springer, pp. 35–44

Khattar S, Sachdeva A, Kumar R, Gupta R (2019) Smart home with virtual assistant using Raspberry Pi. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 9th International conference on cloud computing, data science & engineering (Confluence). pp. 576–579

King B, Chen I-F, Vaizman Y, Liu Y, Maas R, Parthasarathi SHK, Hoffmeister B (2017) Robust speech recognition via anchor word representations. In: International Speech Communication Association (ISCA) (ed), Proceedings of the Interspeech 2017. pp. 2471–2475

Kita T, Nagaoka C, Hiraoka N, Dougiamas M (2019) Implementation of voice user interfaces to enhance users’ activities on Moodle. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), Proceedings of 2019 4th international conference on information technology. pp. 104–107

Kodali RK, Rajanarayanan SC, Boppana L, Sharma S, Kumar A (2019) Low cost smart home automation system using smart phone. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 IEEE R10 Humanitarian Technology Conference (R10-HTC)(47129). pp. 120–125

Komatsu S, Sasayama M (2019) Speech error detection depending on linguistic units. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 2019 3rd international conference on natural language processing and information retrieval. pp. 75–79

König A, Francis LE, Malhotra A, Hoey J (2016) Defining Affective Identities in elderly nursing home residents for the design of an emotionally intelligent cognitive assistant. In: Favela J, Matic A, Fitzpatrick G, Weibel N, Hoey J (eds) Proceedings of the 10th EAI International Conference on Pervasive Computing Technologies for Healthcare. ICST, pp. 206–210

Kortum SS (1997) Research, patenting, and technological change. Econometrica 1389–1419. https://doi.org/10.2307/2171741

Kowalczuk P (2018) Consumer acceptance of smart speakers: a mixed methods approach. J Res Interact Mark 12(4):418–431. https://doi.org/10.1108/JRIM-01-2018-0022

Kowalski J, Jaskulska A, Skorupska K, Abramczuk K, Biele C, Kopeć W, Marasek K (2019) Older adults and voice interaction: a pilot study with Google Home. In: Extended abstracts of the 2019 CHI conference on human factors in computing systems. pp. 1–6

Krippendorff K (2013) Content analysis: an introduction to its methodology. SAGE

Krotov V (2017) The Internet of Things and new business opportunities. Bus Horiz 60(6):831–841. https://doi.org/10.1016/j.bushor.2017.07.009

Kumar A (2018) AlexaPi3—an economical smart speaker. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2018 IEEE Punecon. pp. 1–4

Kumar N, Lee SC (2022) Human–machine interface in smart factory: a systematic literature review. Technol Forecast Soc Change 174:121284

Kunath G, Hofstetter R, Jörg D, Demarchi D (2019) Voice first barometer Schweiz 2018. Universität Luzern, pp. 1–25

Kuruvilla R (2019) Between you, me, and Alexa: on the legality of virtual assistant devices in two-party consent states. Wash Law Rev 94(4):2029–2055

Lackes R, Siepermann M, Vetter G (2019) Can I help you?—the acceptance of intelligent personal assistants. In: Pańkowska M, Sandkuhl K (eds) Perspectives in business informatics research. Springer, pp. 204–218

Lau J, Zimmerman B, Schaub F (2018) Alexa, are you listening?: Privacy perceptions, concerns and privacy-seeking behaviors with smart speakers. Proc ACM Hum–Comput Interact 2(CSCW):1–31. https://doi.org/10.1145/3274371

Leahey E, Beckman CM, Stanko TL (2017) Prominent but less productive: The impact of interdisciplinarity on scientists’ research. Adm Sci Q 62(1):105–139. https://doi.org/10.1177/0001839216665364

Lee I, Kinney CE, Lee B, Kalker AA (2009) Solving the acoustic echo cancellation problem in double-talk scenario using non-gaussianity of the near-end signal. In: Association for Computing Machinery (ACM) (ed), International conference on independent component analysis and signal separation. pp. 589–596

Lee S, Kim S, Lee S (2019) “What does your agent look like?” A drawing study to understand users’ perceived persona of conversational agent. In: Association for Computing Machinery (ACM) (ed), Extended abstracts of the 2019 CHI conference on human factors in computing systems. pp. 1–6

Li S, Garces E, Daim T (2019) Technology forecasting by analogy-based on social network analysis: the case of autonomous vehicles. Technol Forecast Soc Change 148:119731. https://doi.org/10.1016/j.techfore.2019.119731

Li W, Chen Y, Hu H, Tang C (2020) Using granule to search privacy preserving voice in home IoT systems. IEEE Access 8:31957–31969. https://doi.org/10.1109/ACCESS.2020.2972975

Liciotti D, Ferroni G, Frontoni E, Squartini S, Principi E, Bonfigli R, Zingaretti P, Piazza F (2014) Advanced integration of multimedia assistive technologies: a prospective outlook. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), Proceedings of the 2014 IEEE/ASME 10th international conference on Mechatronic and Embedded Systems and Applications (MESA). pp. 1–6

Liu Z, Shin J, Xu Y, Winata GI, Xu P, Madotto A, Fung P (2020) Zero-shot cross-lingual dialogue systems with transferable latent variables. ArXiv. https://arxiv.org/pdf/1911.04081.pdf

Lopatovska I, Oropeza H (2018) User interactions with “Alexa” in public academic space. Proceedings of the Association for Information Science and Technology 55(1):309–318. https://doi.org/10.1002/pra2.2018.14505501034

Lopatovska I, Rink K, Knight I, Raines K, Cosenza K, Williams H, Sorsche P, Hirsch D, Li Q, Martinez A (2019) Talk to me: exploring user interactions with the Amazon Alexa. J Librariansh Inf Sci 51(4):984–997. https://doi.org/10.1177/0961000618759414

Lovato SB, Piper AM, Wartella EA (2019) Hey Google, do unicorns exist? Conversational agents as a path to answers to children’s questions. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 18th ACM international conference on interaction design and children. pp. 301–313

Miles MB, Huberman AM (1994) Qualitative data analysis. A source book of new methods, 2nd edn. Sage

Macdonald RJ, Jinliang W (1994) Time, timeliness of innovation, and the emergence of industries. Technovation 14(1):37–53. https://doi.org/10.1016/0166-4972(94)90069-8

Malik KM, Malik H, Baumann R (2019) Towards vulnerability analysis of voice-driven interfaces and countermeasures for replay attacks. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 IEEE conference on Multimedia Information Processing and Retrieval (MIPR). pp. 523–528

Martin EJ (2017) How Echo, Google Home, and other voice assistants can change the game for content creators. EContent. http://www.econtentmag.com/Articles/News/News-Feature/How-Echo-Google-Home-and-Other-Voice-Assistants-Can-Change-the-Game-for-Content--Creators-116564.htm

Masutani O, Nemoto S, Hideshima Y (2019) Toward a better IPA experience for a connected vehicle by means of usage prediction. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops). pp. 681–686

Mavropoulos T, Meditskos G, Symeonidis S, Kamateri E, Rousi M, Tzimikas D, Papageorgiou L, Eleftheriadis C, Adamopoulos G, Vrochidis S, Kompatsiaris I (2019) A context-aware conversational agent in the rehabilitation domain. Futur Internet 11(11):231. https://doi.org/10.3390/fi11110231

McLean G, Osei-Frimpong K (2019) Hey Alexa… examine the variables influencing the use of artificial intelligent in-home voice assistants. Comput Hum Behav 99:28–37. https://doi.org/10.1016/j.chb.2019.05.009

McReynolds E, Hubbard S, Lau T, Saraf A, Cakmak M, Roesner F (2017) Toys that listen: a study of parents, children, and internet-connected toys. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 2017 CHI conference on human factors in computing systems. pp. 5197–5207

Melkers J, Xiao F (2010) Boundary-spanning in emerging technology research: determinants of funding success for academic scientists. J Technol Transf 37(3):251–270. https://doi.org/10.1007/s10961-010-9173-8

Mirzamohammadi S, Chen JA, Sani AA, Mehrotra S, Tsudik G (2017) Ditio: trustworthy auditing of sensor activities in mobile and IoT devices. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 15th ACM conference on embedded network sensor systems

Moher D, Liberati A, Tetzlaff J, Altman DG, Prisma Group (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 6(7):e1000097. https://doi.org/10.1371/journal.pmed.1000097

Mokhtari M, de Marassé A, Kodys M, Aloulou H (2019) Cities for all ages: Singapore use case. In: Stephanidis C, Antona M (eds) HCI international 2019—late breaking posters. Springer, pp. 251–258

Nguyen TH, Waizenegger L, Techatassanasoontorn AA (2022) “Don’t Neglect the User!”–Identifying Types of Human-Chatbot Interactions and their Associated Characteristics. Inf Syst Front 24(3):797–838

Oh S-R, Kim Y-G (2017) Security requirements analysis for the IoT. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2017 International conference on Platform Technology and Service (PlatCon). pp. 1–6

Olson C, Kemery K (2019) 2019 Voice report: from answers to action: Customer adoption of voice technology and digital assistants. Technical report. Microsoft

Omale G (2020) Customer service and support leaders can use this Gartner Hype Cycle to assess the maturity and risks of customer service and support technologies. Gartner. https://www.gartner.com/smarterwithgartner/5-trends-drive-the-gartner-hype-cycle-for-customer-service-and-support-technologies-2020/

Ong DT, De Jesus CR, Gilig LK, Alburo JB, Ong E (2018) A dialogue model for collaborative storytelling with children. In: Yang JC, Chang M, Wong L-H, Rodrigo MM (eds), 26th International conference on computers in education workshop on innovative technologies for enhancing interactions and learning. pp. 205–210

Palumbo F, Gallicchio C, Pucci R, Micheli A (2016) Human activity recognition using multisensor data fusion based on reservoir computing. J Ambient Intell Smart Environ 8(2):87–107. https://doi.org/10.3233/AIS-160372

Papagiannidis S, Davlembayeva D (2022) Bringing Smart Home Technology to Peer-to-Peer Accommodation: Exploring the Drivers of Intention to Stay in Smart Accommodation. Inf Syst Front 24(4):1189–1208

Parkin S, Patel T, Lopez-Neira I, Tanczer L (2019) Usability analysis of shared device ecosystem security: Informing support for survivors of IoT-facilitated tech-abuse. In: Association for Computing Machinery (ACM) (ed), Proceedings of the new security paradigms workshop. pp. 1–15

Patel D, Bhalodiya P (2019) 3D holographic and interactive artificial intelligence system. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT). pp. 657–662

Patnaik D, Becker R (1999) Needfinding: the why and how of uncovering people’s needs. Design Manag J (Former Ser) 10(2):37–43

Petticrew M, Roberts H (2006) Systematic reviews in the social sciences: a practical guide. John Wiley & Sons

Pfeifle A (2018) Alexa, what should we do about privacy: protecting privacy for users of voice-activated devices. Wash Law Rev 93:421

Porter ME (1990) Competitive advantage of nations. Competitive Intell Rev 1(1):14

Portillo CD, Lituchy TR (2018) An examination of online repurchasing behavior in an IoT environment. In: Simmers CA, Anandarajan M (eds) The Internet of People, Things and Services: workplace transformations. Routledge, pp. 225–241

Pradhan A, Findlater L, Lazar A (2019) “Phantom friend” or “just a box with information”: personification and ontological categorization of smart speaker-based voice assistants by older adults. In: Association for Computing Machinery (ACM) (ed), Proceedings of the ACM on Human–Computer Interaction, 3(CSCW)

Pradhan A, Mehta K, Findlater L (2018) “Accessibility came by accident”: use of voice-controlled intelligent personal assistants by people with disabilities. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 2018 CHI conference on human factors in computing systems. pp. 1–13

Pridmore J, Mols A (2020) Personal choices and situated data: privacy negotiations and the acceptance of household intelligent personal assistants. Big Data Soc 7(1):2053951719891748. https://doi.org/10.1177/2053951719891748

Principi E, Squartini S, Piazza F, Fuselli D, Bonifazi M (2013) A distributed system for recognizing home automation commands and distress calls in the Italian language. INTERSPEECH, pp. 2049–2053

Purao S, Meng C (2019) Data capture and analyses from conversational devices in the homes of the elderly. In: Guizzardi G, Gailly F, Suzana R, Pitangueira Maciel (eds) Lecture notes in computer science, vol 11787. Springer, pp. 157–166

Purington A, Taft JG, Sannon S, Bazarova NN, Taylor SH (2017) “Alexa is my new BFF”: social roles, user satisfaction, and personification of the Amazon Echo. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 2017 CHI conference extended abstracts on human factors in computing systems. pp. 2853–2859

Pyae A, Joelsson TN (2018) Investigating the usability and user experiences of voice user interface: a case of Google home smart speaker. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 20th international conference on human-computer interaction with mobile devices and services adjunct. pp. 127–131

Pyae A, Scifleet P (2019) Investigating the role of user’s English language proficiency in using a voice user interface: a case of Google Home smart speaker. In: Association for Computing Machinery (ACM) (ed), Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (CHI EA ’19). pp. 1–6

Rabassa V, Sabri O, Spaletta C (2022) Conversational commerce: do biased choices offered by voice assistants’ technology constrain its appropriation? Technol Forecast Soc Change 174:121292

Robinson S, Pearson J, Ahire S, Ahirwar R, Bhikne B, Maravi N, Jones M (2018) Revisiting “hole in the wall” computing: private smart speakers and public slum settings. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 2018 CHI conference on human factors in computing systems. pp. 1–11

Robledo-Arnuncio E, Wada TS, Juang B-H (2007) On dealing with sampling rate mismatches in blind source separation and acoustic echo cancellation. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2007 IEEE workshop on applications of signal processing to audio and acoustics. pp. 34–37

Rzepka C, Berger B, Hess T (2022) Voice assistant vs. Chatbot–examining the fit between conversational agents’ interaction modalities and information search tasks. Inf Syst Front 24(3):839–856

Saadaoui FZ, Mahmoudi C, Maizate A, Ouzzif M (2019) Conferencing-Ng protocol for Internet of Things. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 Third international conference on Intelligent Computing in Data Sciences (ICDS). pp. 1–5

Samarasinghe N, Mannan M (2019a) Towards a global perspective on web tracking. Comput Secur 87:101569. https://doi.org/10.1016/j.cose.2019.101569

Samarasinghe N, Mannan M (2019b) Another look at TLS ecosystems in networked devices vs. web servers. Comput Secur 80:1–13. https://doi.org/10.1016/j.cose.2018.09.001

Sanders J, Martin-Hammond A (2019) Exploring autonomy in the design of an intelligent health assistant for older adults. In: Association for Computing Machinery (ACM) (ed), Proceedings of the 24th International conference on intelligent user interfaces: companion. pp. 95–96

Sangal S, Bathla R (2019) Implementation of restrictions in smart home devices for safety of children. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 4th International Conference on Information Systems and Computer Networks (ISCON). pp. 139–143

Santhanaraj K, Barkathunissa A (2020) A study on the factors affecting usage of voice assistants and the interface transition from touch to voice. Int J Adv Sci Technol 29(5):3084–3102

Santos-Pérez M, González-Parada E, Cano-García JM (2011) AVATAR: an open source architecture for embodied conversational agents in smart environments. In: Bravo J, Hervás R, Villarreal V (eds) Ambient Assisted living. Springer, pp. 109–115

Sestino A, Prete MI, Piper L, Guido G (2020) Internet of Things and Big Data as enablers for business digitalization strategies. Technovation 98:102173. https://doi.org/10.1016/j.technovation.2020.102173

Seymour W (2018) How loyal is your Alexa? Imagining a respectful smart assistant. In: Association for Computing Machinery (ACM) (ed), Extended abstracts of the 2018 CHI conference on human factors in computing systems. pp. 1–6

Shamekhi A, Bickmore T, Lestoquoy A, Gardiner P (2017) Augmenting group medical visits with conversational agents for stress management behavior change. In: de Vries PW, Oinas-Kukkonen H, Siemons L, Beerlage-de Jong N, van Gemert-Pijnen L (eds) Persuasive technology: development and implementation of personalized technologies to change attitudes and behaviors. Springer, pp. 55–67

Shank DB, Wright D, Nasrin S, White M (2022) Discontinuance and restricted acceptance to reduce worry after unwanted incidents with smart home technology. Int J Hum–Comput Interact 1–14. https://doi.org/10.1080/10447318.2022.2085406

Shin C, Chandok P, Liu R, Nielson SJ, Leschke TR (2018) Potential forensic analysis of IoT data: an overview of the state-of-the-art and future possibilities. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2017 IEEE International Conference on Internet of Things (IThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData). pp. 705–710

Singh V, Verma S, Chaurasia SS (2020) Mapping the themes and intellectual structure of corporate university: co-citation and cluster analyses. Scientometrics 122(3):1275–1302. https://doi.org/10.1007/s11192-019-03328-0

Solorio JA, Garcia-Bravo JM, Newell BA (2018) Voice activated semi-autonomous vehicle using off the shelf home automation hardware. IEEE Internet Things J 5(6):5046–5054. https://doi.org/10.1109/JIOT.2018.2854591

Souden M, Liu Z (2009) Optimal joint linear acoustic echo cancelation and blind source separation in the presence of loudspeaker nonlinearity. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2009 IEEE international conference on multimedia and expo. pp. 117–120

Srikanth S, Saddamhussain SK, Siva Prasad P (2019) Home anti-theft powered by Alexa. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 International conference on Vision Towards Emerging Trends in Communication and Networking (ViTECoN). pp. 1–6

Stefanidi Z, Leonidis A, Antona M (2019) A multi-stage approach to facilitate interaction with intelligent environments via natural language. In: Stephanidis C, Antona M (eds) HCI International 2019—Late Breaking Posters, vol 1088. Springer, pp. 67–77

Struckell E, Ojha D, Patel PC, Dhir A (2021) Ecological determinants of smart home ecosystems: A coopetition framework. Technol Forecast Soc Change 173:121147. https://doi.org/10.1016/j.techfore.2021.121147

Sudharsan B, Corcoran P, Ali MI (2019) Smart speaker design and implementation with biometric authentication and advanced voice interaction capability. In: Curry E, Keane M, Ojo A, Salwala D (eds), Proceedings for the 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science, NUI Galway, vol 2563. pp. 305–316

Tao F, Liu G, Zhao Q (2018) An ensemble framework of voice-based emotion recognition system. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2018 First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia). pp. 1–6

Thapliyal H, Ratajczak N, Wendroth O, Labrado C (2018) Amazon Echo enabled IoT home security system for smart home environment. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2018 IEEE International Symposium on Smart Electronic Systems (ISES) (Formerly INiS). pp. 31–36

Tielman ML, Neerincx MA, Bidarra R, Kybartas B, Brinkman W-P (2017) A therapy system for post-traumatic stress disorder using a virtual agent and virtual storytelling to reconstruct traumatic memories. J Med Syst 41(8):125. https://doi.org/10.1007/s10916-017-0771-y

Tironi A, Mainetti R, Pezzera M, Borghese AN (2019) An empathic virtual caregiver for assistance in exer-game-based rehabilitation therapies. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 IEEE 7th International Conference on Serious Games and Applications for Health (SeGAH). pp. 1–6

Trenholm R (2016) Amazon Echo (and Alexa) arrive in Europe, and Echo comes in white now too. CNET. https://www.cnet.com/news/amazon-echo-and-alexa-arrives-in-europe/

Tsiourti C, Weiss A, Wac K, Vincze M (2019) Multimodal integration of emotional signals from voice, body, and context: effects of (in)congruence on emotion recognition and attitudes towards robots. Int J Soc Robot 11(4):555–573. https://doi.org/10.1007/s12369-019-00524-z

Tsiourti C, Quintas J, Ben-Moussa M, Hanke S, Nijdam NA, Konstantas D (2018a) The CaMeLi framework—a multimodal virtual companion for older adults. In: Kapoor S, Bhatia R, Bi Y (eds) Studies in computational intelligence, vol 751. Springer, pp. 196–217

Tsiourti C, Ben-Moussa M, Quintas J, Loke B, Jochem I, Lopes JA, Konstantas D (2018b) A virtual assistive companion for older adults: design implications for a real-world application. In: Sharma H, Shrivastava V, Bharti KK, Wang L (eds), Lecture notes in networks and systems, vol 15. Springer, pp. 1014–1033

Tung L (2018) Amazon Echo, Google Home: how Europe fell in love with smart speakers. ZDnet. https://www.zdnet.com/article/amazon-echo-google-home-how-europe-fell-in-love-with-smart-speakers

Turner-Lee N (2019) Can emerging technologies buffer the cost of in-home care in rural America? Generations 43(2):88–93. http://web.a.ebscohost.com/ehost/pdfviewer/pdfviewer?vid=2&sid=0aaaf704-d3bd-42ab-ad26-ecd36c0a059b%40sdc-v-sessmgr02

Vaca K, Gajjar A, Yang X (2019) Real-time automatic music transcription (AMT) with Zync FPGA. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). pp. 378–384

Van Eck NJ, Waltman L (2014) Visualizing bibliometric networks. In: Ding Y, Rousseau R, Wolfram D (eds) Measuring scholarly impact: methods and practice. Springer, pp. 285–320

Van Eck NJ, Waltman L (2010) Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 84(2):523–538

Verma S, Gustafsson A (2020) Investigating the emerging COVID-19 research trends in the field of business and management: a bibliometric analysis approach. J Bus Res 118:253–261

Verma S (2017) The adoption of big data services by manufacturing firms: an empirical investigation in India. J Inf Syst Technol Manag 14(1):39–68

Vishwakarma SK, Upadhyaya P, Kumari B, Mishra AK (2019) Smart energy efficient home automation system using IoT. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 4th International Conference on Internet of Things: smart Innovation and Usages (IoT-SIU). pp. 1–4

Vora J, Tanwar S, Tyagi S, Kumar N, Rodrigues JJPC (2017) Home-based exercise system for patients using IoT enabled smart speaker. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2017 IEEE 19th International Conference on E-Health Networking, Applications and Services (Healthcom). pp. 1–6

Wakefield CC (2019) Achieving position 0: optimising your content to rank in Google’s answer box. J Brand Strategy 7(4):326–336

Wallace T, Morris J (2018) Identifying barriers to usability: smart speaker testing by military veterans with mild brain injury and PTSD. In: Langdon P, Lazar J, Heylighen A, Dong H (eds) Breaking down barriers. Springer, pp. 113–122

Xi N, Hamari J (2021) Shopping in virtual reality: a literature review and future agenda. J Bus Res 134:37–58. https://doi.org/10.1016/j.jbusres.2021.04.075

Yaghoubzadeh R, Pitsch K, Kopp S (2015) Adaptive grounding and dialogue management for autonomous conversational assistants for elderly users. In: Brinkman W-P, Broekens J, Heylen D (eds) Intelligent virtual agents, vol 9238. Springer, pp. 28–38

Yildirim İ, Bostancı E, Güzel MS (2019) Forensic analysis with anti-forensic case studies on Amazon Alexa and Google Assistant build-in smart home speakers. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2019 4th International conference on computer science and engineering (UBMK). pp. 271–273

Yusri MM, Kasim S, Hassan R, Abdullah Z, Ruslai H, Jahidin K, Arshad MS (2017) Smart mirror for smart life. In: Institute of Electrical and Electronics Engineers (IEEE) (ed), 2017 6th ICT International Student Project Conference (ICT-ISPC). pp. 1–5

Zschörnig T, Wehlitz R, Franczyk B (2019) A fog-enabled smart home analytics platform. In: Brodsky A, Hammoudi S, Filipe J, Smialek M (eds) Proceedings of the 21st International Conference on Enterprise Information Systems (ICEIS 2019), vol 1. SciTePress, pp. 604–610

Zuboff S (2019) The age of surveillance capitalism: the fight for a human future at the new frontier of power. Profile Books

Harwood S, Eaves S (2020) Conceptualising technology, its development and future: The six genres of technology. Technol Forecast Soc Change 160:120174

Stadler S, Riegler S, Hinterkörner S (2012) Bzzzt: When mobile phones feel at home. Conference on Human Factors in Computing Systems – Proceedings, 1297-1302. https://doi.org/10.1145/2212776.2212443

Acknowledgements

This research was funded by the Swiss National Science Foundation (SNSF) as part of the project “VA-People, Experiences, Practices and Routines” (VA-PEPR) (Grant Nr. CRSII5_189955). We are grateful for the support from the wider project team from Lucerne University of Applied Sciences and Arts, Eastern Switzerland University of Applied Sciences, and Northumbria University. We would also like to thank Bjørn S. Cience for his support while working on this paper.

Author information

Authors and Affiliations

Lucerne School of Information Technology and Computer Sciences, Lucerne University of Applied Sciences and Arts, Lucerne, Switzerland

Bettina Minder

Department of Business & Management, University of Southern Denmark, Odense, Denmark

Patricia Wolf & Surabhi Verma

Department of Management, Lucerne University of Applied Sciences and Arts, Lucerne, Switzerland

Patricia Wolf

Institute for Information and Process Management, Eastern Switzerland University of Applied Sciences, St.Gallen, Switzerland

Matthias Baldauf

Department of Economics and Business Economics, Aarhus University, Aarhus, Denmark

Surabhi Verma

Corresponding author

Correspondence to Patricia Wolf.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

The article does not contain any studies with human participants performed by any of the authors.

Informed consent

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental material file #1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Minder, B., Wolf, P., Baldauf, M. et al. Voice assistants in private households: a conceptual framework for future research in an interdisciplinary field. Humanit Soc Sci Commun 10, 173 (2023). https://doi.org/10.1057/s41599-023-01615-z

Received: 19 May 2022

Accepted: 14 March 2023

Published: 19 April 2023

DOI: https://doi.org/10.1057/s41599-023-01615-z

A Systematic Review of Voice Assistant Usability: An ISO 9241–11 Approach

Faruk Lawal Ibrahim Dutsinma

1 Innovative Cognitive Computing (IC2) Research Center, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand

Debajyoti Pal

Suree Funilkul

2 School of Information Technology, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand

Jonathan H. Chan

Voice assistants (VAs) are an emerging technology that has become an essential tool of the twenty-first century. Their ease of access and use has generated considerable interest in their usability. Usability is an essential aspect of any emerging technology, and each technology calls for a standardized usability measure. Despite the high acceptance rate of VAs, to the best of our knowledge, few studies have examined their usability. In this context, we reviewed studies that used voice assistants for various tasks, employing the ISO 9241-11 framework as our measuring tool. Our study highlights the usability measures currently used for voice assistants, both within the ISO 9241-11 framework and outside it, to provide a comprehensive view. It also highlights the independent variables used and their context of use. We identify a diverse range of independent variables that have been used to measure usability, as well as independent variables that have not yet been used to measure certain aspects of the usability experience. Finally, we summarize what has been done on voice assistant usability measurement, identify the research gaps that remain, and examine whether the ISO 9241-11 framework can be used as a standard measurement tool for voice assistants.

Introduction

Voice assistants (VAs), also called intelligent personal assistants, are computer programs capable of understanding and responding to users using synthetic voices. Voice assistants have been integrated into different technological devices, including smartphones and smart speakers [ 1 ]. Voice is the central mode of communication with these devices, which renders the graphical user interface (GUI) inapplicable or less meaningful [ 2 ]. People use VA technology in many aspects of their lives, for simple tasks such as getting the weather report [ 3 ] or managing emails [ 4 ]. VAs can also perform complex tasks, such as acting as client representatives [ 5 ] or as controllers in autonomous vehicles [ 6 ]. In short, VAs can revolutionize the way people interact with computing systems [ 7 ]. VAs are currently being adopted on a massive scale worldwide: a report in [ 8 ] indicates that 4.2 billion VAs were adopted and used in 2020 alone, with a projected increase to 8.4 billion by 2024. The popularity of VAs has drawn greater research attention to their usability and user experience.

Usability is a critical factor in the adoption of voice assistants [ 9 ]. A study by Zwakman et al. [ 10 ] highlighted the importance of usability in voice assistants [ 9 ]. Another study by Coronado et al. [ 11 ] reiterated the importance of usability in human–computer interaction tools. Numerous studies have examined the usability heuristics used for VAs, each adopting a unique approach. Maguire [ 12 ] used two sets of heuristics, Nielsen and Molich's heuristics and heuristics specific to voice user interfaces (VUIs), to evaluate the ease of use of VAs. The study affirmed that both sets of heuristics were appropriate, but noted that one was less problematic to use than the other [ 12 ]. A further study tested VUI heuristics to measure VA efficacy [ 13 ]. However, a critical obstacle to applying the currently available heuristics to VAs is the absence of a graphical user interface (GUI). Despite numerous studies on heuristics, the level of satisfaction is still low [ 14 ]. Furthermore, heuristics cannot serve as a standardized approach, because they are approximate strategies or empirical rules for decision-making and problem-solving that do not guarantee a correct solution. According to a study by Murad [ 16 ], the absence of standardized usability guidelines for developing VA interfaces is a challenge for the development of effective VAs [ 15 ]. A report by Budi & Leipheimer [ 17 ] also suggests that the usability of VAs requires improvement and standardization [ 16 ]. Creating a standard tool requires a globally recognized and well-known organization, because such an organization eliminates bias and promotes neutrality [ 17 ]. The International Organization for Standardization (ISO) 9241-11 framework is one of the standard usability frameworks widely used for measuring technology acceptance.

According to the ISO 9241-11 framework, usability is defined as “the degree to which a program may be utilized to achieve measurable objectives with effectiveness, efficiency, and satisfaction in a specific context of usage” [ 18 ]. ISO 9241-11 provides a framework for understanding and applying the concept of usability to interactive systems and environments [ 19 ]. The main advantage of using the ISO standard is that industries and developers do not need to build separate design measurement tools. The standard is intended to create compatibility between new and existing technologies and to build trust [ 20 ]. Currently, system developers have no standardized tool created specifically for measuring VA usability; consequently, the measures are decentralized, which causes confusion among developers. The lack of in-depth assessment of the heuristics currently used in VA design affects users' trust in VAs and their adoption [ 15 ]. Other emerging technologies such as virtual reality [ 21 ] and game design [ 22 ] have recognized the importance of an accepted, standardized measurement tool when designing new interfaces, and VA technology could benefit significantly from the same approach. As the above discussion shows, there has been little to no focus on VA standardization.
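To make the three ISO 9241-11 dimensions concrete, the following minimal sketch summarizes hypothetical VA test sessions. The operationalizations are assumptions, not prescriptions of the standard: effectiveness is taken as the task completion rate, efficiency as completed tasks per minute of total time on task, and satisfaction as the mean post-task rating on an assumed 1–5 scale. The names `TaskAttempt` and `usability_summary` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskAttempt:
    """One participant's attempt at one voice-assistant task (hypothetical record)."""
    completed: bool     # did the user reach the task goal?
    seconds: float      # time on task, assumed > 0
    satisfaction: int   # post-task rating on an assumed 1-5 scale


def usability_summary(attempts: list[TaskAttempt]) -> dict[str, float]:
    """Summarize attempts along the three ISO 9241-11 dimensions.

    Assumed operationalizations (not mandated by the standard):
    effectiveness = completion rate, efficiency = completed tasks per
    minute of total time on task, satisfaction = mean rating.
    """
    if not attempts:
        raise ValueError("need at least one task attempt")
    completed = sum(a.completed for a in attempts)
    total_minutes = sum(a.seconds for a in attempts) / 60
    return {
        "effectiveness": completed / len(attempts),
        "efficiency": completed / total_minutes,
        "satisfaction": mean(a.satisfaction for a in attempts),
    }


sessions = [TaskAttempt(True, 30, 4), TaskAttempt(True, 45, 5), TaskAttempt(False, 45, 2)]
print(usability_summary(sessions))
```

In this example, two of three tasks are completed within two minutes in total. This gives an effectiveness of about 0.67 and an efficiency of 1.0 completed task per minute. Other studies may operationalize the same dimensions differently, which is part of the measurement fragmentation discussed above.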

Our study presents a systematic literature review of work on the usability of voice assistants. In addition, we use the ISO 9241-11 framework as a standardized measurement tool to analyze the findings of the studies we collected. We chose the ACM and IEEE databases for article selection because both contain a variety of studies dealing with the usability aspects of VAs. This literature review makes the following contributions to the Human–Computer Interaction (HCI) community:

  • Our work highlights the studies currently carried out on VA usability. This includes the independent and dependent variables currently used.
  • Our study highlights the factors that affect the voice assistants' acceptance and impact the user’s total experience.
  • We identify and explain some attributes unique to only voice assistants, such as machine voice.
  • We also highlight the evaluation techniques used in previous studies to measure usability.
  • Finally, our study compares the existing usability studies with the ISO 9241-11 framework. Because VA usability measurement is decentralized, it is unclear whether the ISO 9241-11 framework is adhered to when usability metrics are developed.

We hope that our work will show how the existing VA usability measures can be integrated with the ISO 9241-11 framework, and that it will help determine whether the ISO 9241-11 framework can serve as a standard measure of usability for voice assistants. Specifically, our study addresses the following four research questions:

  • RQ 1: Can the ISO 9241-11 framework be used to measure the usability of VAs?
  • RQ 2: What independent variables are used when dealing with the usability of VAs?
  • RQ 3: What measures currently serve as the dependent variables when evaluating the usability of VAs?
  • RQ 4: What is the relationship between the independent and dependent variables?

The rest of the paper is structured as follows. The second section presents related work. It covers previous literature reviews on the usability of voice agents, as well as emerging technologies that have used the ISO 9241-11 framework as a usability measuring tool. The third section describes the methodology. This includes the inclusion and exclusion criteria, the review protocol, the query created for the database search, and the selected databases. The fourth section presents the results and analysis: it lists the articles used in this study and answers the research questions. The fifth section discusses the results. It gives a more detailed explanation of the relationships between independent and dependent variables, along with our insights and observations.

Literature Review

Previous Systematic Reviews

There have been a number of systematic literature reviews concerning VAs over the years. Table 1 presents information on a few of the relevant works.

Table 1. Current literature reviews

As highlighted in Table 1, multiple systematic literature reviews have been carried out on VA usability over the years. However, each study has specific limitations that leave room for improvement. For instance, some studies focus on the usability of voice assistants only in specific fields, such as education [ 25 ] and health [ 36 ]. Other studies focus on the usability of voice assistants only for specific age groups, such as older adults [ 28 ]. Likewise, although [ 32 ] carries out an in-depth analysis of VA usability involving every usability measure, it does not use the ISO 9241 framework as a measuring standard. Conversely, the study in [ 33 ] uses the ISO 9241 framework as a measuring standard, but its context of use is chatbots, which focus primarily on text-based rather than voice communication. Overall, the literature reviews on VA usability listed in Table 1 support the view that very few reviews on VAs use the ISO 9241-11 framework as an in-depth tool for measuring usability.

The ISO 9241-11 Usability Framework

ISO 9241-11 is a usability framework for understanding usability in situations where interactive systems are used, including environments, products, and services [ 39 ]. Nigel et al. [ 40 ] conducted a study to revise the ISO 9241-11 standard, which reiterates the importance of the framework to the concept of usability. A number of studies have used the ISO 9241-11 framework to measure the usability of various technologies, which shows how diversely the framework can be applied. For instance, Karima et al. (2016) proposed using the ISO 9241-11 framework to measure the usability of mobile applications running on multiple operating systems. That study identified display resolution and memory capacity as factors that affect the usability of mobile applications [ 41 ]. Another study used the ISO 9241-11 framework to identify usability factors when developing e-government systems [ 42 ]. This study focused on e-government system development in general and concluded that the framework could be used as a usability guideline when developing a government portal. The ISO 9241-11 framework has also been used to evaluate other methods and tools. For instance, Maria et al. [ 44 ] used the framework to evaluate existing tools for measuring the usability of software products and web artifacts, comparing these tools with the ISO 9241-11 measures of efficiency, effectiveness, and satisfaction [ 43 ]. The framework has also been employed as a standardization tool in the geographic field [ 44 ], in game therapy for dementia [ 45 ], and in logistics [ 46 ]. Although the ISO 9241-11 framework has been applied to both established and emerging technologies, it has not previously been used with VAs.

Methodology

We performed the systematic literature review in this study following the guidelines established by Barbara [ 47 ]. These guidelines have been widely used in other systematic reviews because of their rigor and inclusiveness [ 48 ]. In addition, we added a quality assessment process to these guidelines. The quality assessment is a list of questions that we use to independently assess each study and ensure its relevance to our review. Our quality evaluation checklist is derived from existing studies [ 49 , 50 ]. The complete procedure comprises four stages:

  • Inclusion and exclusion criteria
  • Search query
  • Database and article selection
  • Quality assessment

Inclusion and Exclusion Criteria

The inclusion and exclusion criteria used in our study were designed to ensure completeness and avoid bias. The inclusion criteria are:

  • The study focuses on VAs, with voice as the primary modality. Where text or graphical user interfaces are involved, they are not the primary focus.
  • The study is written in English, to avoid errors introduced by translation from another language.
  • The study includes at least one user and one voice assistant, to ensure that the focus is on usability rather than system performance.
  • The study has a comprehensive conclusion.
  • The study was published between 2000 and 2021, the period in which voice assistants started to gain notable popularity.

The exclusion criteria are:

  • Studies with a poor research design, where the purpose of the study is unclear.
  • White papers, posters, and academic theses.
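The criteria above can be sketched as a single screening predicate. This is a minimal illustration that assumes each retrieved study has already been coded by a reviewer. Judgments such as "has a comprehensive conclusion" or "purpose is clear" are recorded as booleans, and a clear purpose serves as the stand-in for "not a poor research design". The `Candidate` fields and the `is_eligible` function are hypothetical, not part of the original protocol.

```python
from __future__ import annotations

from dataclasses import dataclass

# Document types that the review excludes (white papers, posters, theses).
EXCLUDED_TYPES = {"white paper", "poster", "thesis"}


@dataclass
class Candidate:
    """Reviewer's coding of one retrieved study (hypothetical fields)."""
    language: str
    year: int
    voice_is_primary: bool             # voice is the primary modality
    num_users: int                     # users involved in the study
    num_voice_assistants: int          # VAs involved in the study
    has_comprehensive_conclusion: bool
    purpose_is_clear: bool             # proxy for "not a poor research design"
    doc_type: str


def is_eligible(c: Candidate) -> bool:
    """True if the study meets every inclusion criterion and no exclusion criterion."""
    meets_inclusion = (
        c.voice_is_primary
        and c.language.strip().lower() == "english"
        and c.num_users >= 1
        and c.num_voice_assistants >= 1
        and c.has_comprehensive_conclusion
        and 2000 <= c.year <= 2021
    )
    meets_exclusion = (not c.purpose_is_clear) or c.doc_type.strip().lower() in EXCLUDED_TYPES
    return meets_inclusion and not meets_exclusion


paper = Candidate("English", 2019, True, 12, 1, True, True, "journal article")
print(is_eligible(paper))  # True
```

The predicate keeps inclusion and exclusion as separate terms, mirroring the two lists, so that a reviewer can see which side rejected a given study.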

Search Query

We created the search query for our study from keywords arranged to search the relevant databases. We reviewed previous studies to identify the keywords most commonly used in usability research. After extensive discussion among the researchers and consultation with two HCI experts, we chose the following set of keywords: usability, user experience, voice assistants, personal assistants, conversational agents, Google Assistant, Alexa, and Siri. We connected the keywords with the logical operators AND and OR to yield accurate results. The final search string was (“usability” OR “user experience”) AND (“voice assistants” OR “personal assistants” OR “conversational agents” OR “Google Assistant” OR “Alexa” OR “Siri”). The search was limited to the title and abstract of each study.
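The search string follows a common pattern: synonyms are OR-ed within a group, and the groups are AND-ed together. The minimal sketch below, with hypothetical helper names, rebuilds the string from the two keyword groups, using straight quotes. Restricting the search to titles and abstracts uses database-specific field syntax in ACM and IEEE, so it is deliberately not modeled here.

```python
from __future__ import annotations


def or_group(terms: list[str]) -> str:
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(*groups: list[str]) -> str:
    """AND together several OR-groups of synonyms."""
    return " AND ".join(or_group(g) for g in groups)


USABILITY_TERMS = ["usability", "user experience"]
VA_TERMS = ["voice assistants", "personal assistants", "conversational agents",
            "Google Assistant", "Alexa", "Siri"]

query = build_query(USABILITY_TERMS, VA_TERMS)
print(query)
```

Generating the string from keyword lists, rather than typing it by hand, avoids stray spaces inside the quotation marks. It also makes it easy to add a synonym to one group without touching the other.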

Database and Article Selection

Figure 1 illustrates the selection and filtering process; it is adapted from the PRISMA flow diagram [ 51 ]. As stated earlier, two databases are used as the sources for our article selection: the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE). Both databases contain leading studies on VAs and are highly recognized in the HCI community. The search query returned 340 results from the ACM database and 280 results from the IEEE database. The 720 items from both databases were checked for duplicates, and 165 documents (23%) were removed as duplicates. The remaining items were then screened by title and abstract: titles were screened by keyword matching, whereas abstracts were read in full to assess eligibility. In this step, 399 documents (72%) were removed because they did not meet the eligibility criteria. Finally, 121 documents that were not consistent with the research objectives of our study were removed. At the end of the screening process, 29 articles (19%) were included in this literature review.

Fig. 1. Article selection process
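The percentages reported for the selection process appear to be relative to the number of records entering each step, not to the 720 records originally identified. Under that reading, the sketch below reproduces the reported 23%, 72%, and 19%. The intermediate counts are derived here and are not stated in the text: 555 records remain after de-duplication and 156 after title and abstract screening. The function name is hypothetical.

```python
def screening_percentages(identified: int, duplicates: int,
                          ineligible: int, included: int) -> dict[str, int]:
    """Percent removed or kept at each step, relative to the records entering that step."""
    after_dedup = identified - duplicates        # records entering title/abstract screening
    after_screening = after_dedup - ineligible   # records entering the final check
    return {
        "duplicates": round(100 * duplicates / identified),
        "ineligible": round(100 * ineligible / after_dedup),
        "included": round(100 * included / after_screening),
    }


print(screening_percentages(720, 165, 399, 29))
# {'duplicates': 23, 'ineligible': 72, 'included': 19}
```

For comparison, 29 out of 720 would be only about 4%. This supports the per-step reading of the reported 19%.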

Quality Assessment

The checklist items in Table 2 were used to assess the quality of the selected articles. This step ensured that the reported content fit our research. Information gathered from each article, such as the methodology used, the analysis performed, and the context of use, was vital to our study. Each question is scored on a three-point scale. “Yes” scores 1 point and means that the question is fully answered. “Partial” scores 0.5 points and means that it is only vaguely answered. “No” scores 0 points and means that it is not answered at all. All 29 finally included articles passed the quality assessment.

Table 2. Quality assessment checklist
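The three-point scale can be expressed as a small scoring function. This is a minimal sketch: the text does not state the minimum total an article needed in order to pass, so `passes` takes the cut-off as a hypothetical parameter rather than assuming a value.

```python
from __future__ import annotations

# Three-point scale from the quality assessment: Yes = 1, Partial = 0.5, No = 0.
SCORES = {"yes": 1.0, "partial": 0.5, "no": 0.0}


def quality_score(answers: list[str]) -> float:
    """Total score of one article across all checklist questions."""
    total = 0.0
    for answer in answers:
        key = answer.strip().lower()
        if key not in SCORES:
            raise ValueError(f"unexpected answer: {answer!r}")
        total += SCORES[key]
    return total


def passes(answers: list[str], min_score: float) -> bool:
    """Hypothetical pass rule: the cut-off is not given in the review, so it is a parameter."""
    return quality_score(answers) >= min_score


print(quality_score(["Yes", "Partial", "No", "Yes"]))  # 2.5
```

Rejecting unknown answers, instead of silently scoring them as 0, helps catch coding errors when several reviewers fill in the checklist independently.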

Results and Analysis

List of Articles

This section lists and discusses the articles collected in the previous stage. Table 3 lists all the compiled articles. We also identified the usability focus of each study.

Table 3. List of compiled articles

Voice Assistant Usability Timeline

We grouped the collected studies into three categories, each covering a period defined by breakthroughs in voice assistant development (Fig. 2). The first category covers 2000 to 2006, a period that began with the Y2K bug in telecommunications and saw the rise of social media and camera phones. During these years, conversational agents started to attract attention through inventions such as Honda’s Advanced Step in Innovative Mobility (ASIMO) humanoid robot [ 80 ]. The second category ranges from 2007 to 2014. During these years, technological advances exposed more users to voice assistants by embedding them in smartphones and computers. For instance, Apple introduced Siri in 2011 [ 81 ], and Microsoft introduced Cortana in 2014. The last category ranges from 2015 to 2021, when the mass adoption of voice assistants took place and reached an all-time high.
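The grouping is a simple binning of publication years into three closed intervals. The minimal sketch below assigns each year to its period and tallies studies per period. The function names are hypothetical, and any years passed in are illustrative, not the actual publication years of the reviewed articles.

```python
from __future__ import annotations

from collections import Counter

# The three periods used to group the reviewed studies (inclusive bounds).
PERIODS = [
    (2000, 2006, "2000-2006"),
    (2007, 2014, "2007-2014"),
    (2015, 2021, "2015-2021"),
]


def period_of(year: int) -> str:
    """Return the label of the period that contains `year`."""
    for start, end, label in PERIODS:
        if start <= year <= end:
            return label
    raise ValueError(f"{year} is outside the 2000-2021 review window")


def count_by_period(years: list[int]) -> Counter[str]:
    """Tally how many studies fall into each period."""
    return Counter(period_of(y) for y in years)


print(period_of(2011))  # 2007-2014
```

Because the review window is 2000 to 2021, a year outside it raises an error instead of being silently dropped.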

Fig. 2. Year of publication of selected articles

Based on the publication years of our selected articles, Fig. 2 shows that research on VAs has expanded significantly in recent years (2014–2021). This growth can be attributed to the introduction of smart speakers and phones with built-in voice agents [ 82 ]. Another reason for the popularity of VAs is the COVID-19 outbreak, which has given fresh impetus to touchless interaction technologies such as voice [ 83 ].

Embodiment Types of VAs

Smart speakers are the most commonly used VA embodiment in our selected articles. This reflects the current popularity of commercial smart speakers, such as Amazon’s Alexa-enabled devices and Apple’s HomePod. A 2019 study showed that 35% of US households own a smart speaker, a share projected to reach 75% by 2025 [ 84 ]. Humanoids are also popular because usability measures such as anthropomorphism are essential for voice assistant usability [ 85 ]. Furthermore, Fig. 3 shows that only a few studies were conducted on car interface voice assistants. Car interfaces are voice assistants that act as intermediaries between the driver and the car. They let drivers access car information and perform tasks without losing focus on driving. The fourth type, the software interface, refers to voice assistant software embedded in smartphones or computers. Some of the collected studies used commercial software interfaces such as Alexa and Siri. Others developed new voice interfaces using custom code and skills. Both kinds are easily accessible to users thanks to the widespread adoption of smartphones and computers. Either way, both are forms of software agents.

Fig. 3. Embodiment of voice assistants used in the selected studies

Components of the ISO 9241-11 Framework

The ISO 9241-11 framework highlights two components: the context of use and the usability measure [ 18 ]. We consider both components to highlight any correlations between usability metrics and the context of use in the selected articles. The context of use consists of the independent variables, together with the techniques used to analyze them. The usability measure represents the dependent variables, i.e., the effect that the independent variables have on the overall user experience. Accordingly, the analysis in the following sections is organized along these two dimensions.

Context of Use

Independent Variables

We split the context of use into independent variables and the techniques used. The independent variables in our study are the physical and mental attributes used to measure a given user interaction outcome. We grouped them into five main categories, based on similar themes identified in the collected studies (Fig. 4): people (user attributes), voice (voice assistant attributes), task, conversational style, and anthropomorphic cues. The voice and people categories are the oldest independent variables used to measure usability. They remain relevant in recent studies, which indicates strong research interest in relating users to VAs. Anthropomorphic cues and conversational styles, on the other hand, are relatively new to usability measurement. The task variable has been the most used recently, perhaps because users routinely test a VA’s ability to perform specific tasks. This also indicates that VAs are widely used for functional and utilitarian purposes. Anthropomorphic cues were seldom used in the second period (2007–2014) but are used most widely in the last period (2015–2021).

Fig. 4. Categories of independent variables used over the years

Table 4 details the different groups of independent variables collected and gives examples of the independent variables in each category. It shows how previous studies applied the independent variables and in which environments. Table 4 also defines each independent variable category and its subcategories. As Table 4 shows, multiple studies use different independent variables together. For example, voice variables and people variables such as personality, gender, and accent are used simultaneously in various studies. Examining such combinations helps clarify two things: the relationships among the independent variables themselves, and their relationships with the usability measures. The table also highlights the kinds of experiments carried out. Controlled experiments are effective for understanding the immediate cause-and-effect relationships between variables. However, a notable drawback of controlled experiments is their lack of external validity: the results might not hold in real-world settings. For instance, a driving simulation is a controlled environment, whereas in real life the driver has no control over the surroundings. The driver’s usability experience might therefore differ in natural settings, and such a difference could sometimes prove fatal.

Table 4. Independent variables and their categorization

Techniques Used

We identified seven techniques that researchers have used, as shown in Fig. 5. Based on our data, quantitative experiments are both the most used and the oldest technique applied to voice assistants. The quantitative method is sometimes used as a standalone experiment and sometimes combined with other techniques [ 54 ]. Notably, car simulation experiments involving VAs were first used in 2000. Other experiments on human communication with self-driving cars have been carried out since the 1990s, which makes simulation one of the oldest techniques for usability measurement. More accurate techniques, such as interaction design, were introduced later. The interaction design employed by studies such as [ 61 ] provides a real-time experimental scenario. This avoids drawbacks such as the bias that can arise with quantitative methods. Factorial designs are mainly used by studies that compare two or more entities in a case study [ 55 ]. They are used mostly by studies that examine two or more independent variables together.

Fig. 5. Techniques used in the selected studies over time

Usability Measure (Dependent Variable)

This subsection focuses on usability measurement, and its findings are used to answer RQ 1 and RQ 3. The ISO 9241-11 framework groups usability measures into three categories: effectiveness, efficiency, and satisfaction. According to the ISO 9241-11 framework, “effectiveness is the accuracy and completeness with which users achieve specified goals”, whereas “efficiency is the resources expended concerning accuracy and completeness in which users achieve goals” and “satisfaction is the freedom from discomfort and positive attitudes towards the use of the product” [ 18 ].
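ISO 9241-11 defines these measures conceptually and does not prescribe formulas. This Python sketch uses a common operationalization, which is an assumption and is not taken from the reviewed studies:

- Effectiveness is the task completion rate.
- Efficiency is goals achieved per unit of time (time-based efficiency).
- Satisfaction is the mean of post-task ratings on an assumed 1-5 scale.

The `TaskAttempt` record and its fields are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskAttempt:
    completed: bool  # did the user achieve the specified goal?
    seconds: float   # time spent, used as the resource expended
    rating: int      # post-task satisfaction rating (assumed 1-5 scale)


def effectiveness(attempts: list[TaskAttempt]) -> float:
    """Completion rate: share of attempts in which the goal was achieved."""
    return sum(a.completed for a in attempts) / len(attempts)


def time_based_efficiency(attempts: list[TaskAttempt]) -> float:
    """Mean goals achieved per second across attempts."""
    return mean((1.0 if a.completed else 0.0) / a.seconds for a in attempts)


def satisfaction(attempts: list[TaskAttempt]) -> float:
    """Mean post-task rating."""
    return mean(a.rating for a in attempts)


if __name__ == "__main__":
    session = [
        TaskAttempt(completed=True, seconds=10.0, rating=5),
        TaskAttempt(completed=False, seconds=20.0, rating=2),
        TaskAttempt(completed=True, seconds=5.0, rating=4),
        TaskAttempt(completed=True, seconds=20.0, rating=3),
    ]
    print(effectiveness(session))  # 0.75
    print(satisfaction(session))   # 3.5
```

In this operationalization, effectiveness ignores time, while time-based efficiency penalizes slow successes. A study reporting both can therefore distinguish a VA that eventually succeeds from one that succeeds quickly.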

In numerous studies, the usability measures used were clearly outside the scope of the ISO 9241-11 framework. In total, we identified three additional usability categories: attitude, machine voice (anthropomorphism), and cognitive load. Figs. 6 and 7 present the usability measures identified in this study. They also show the percentage of studies that used measures within the ISO 9241-11 framework and the percentage that used measures outside it. Based on our compiled results, user satisfaction and effectiveness are the earliest measures used to assess VA usability. Some studies used performance and productivity as subthemes to measure effectiveness [ 62 ]. Usability has been measured both subjectively and objectively. For instance, some studies measured VA effectiveness subjectively with quantitative instruments such as questionnaires [ 72 ], whereas others used objective measures such as the average number of completed interactions [ 69 ]. Multiple usability measures are sometimes applied in the same study; for instance, some studies measured effectiveness alongside efficiency and satisfaction [ 66 , 70 ]. Learnability, optimization, and ease of use have been used as subthemes to measure efficiency. Interactive design is the most effective experimental approach for obtaining real-time results [ 56 , 79 ]. The ISO 9241-11 framework handles effectiveness, efficiency, and satisfaction well. However, with recent advances in VA capabilities, users expect more from voice assistants. Our compiled results show that more than half of the studies were not carried out in accordance with the standard ISO 9241-11 framework (Fig. 7).

Fig. 6. Usability measures used over the years in the compiled articles

Fig. 7. Percentage of ISO 9241-11 framework usability measures and non-ISO 9241-11 measures

Attitude is the set of emotions, beliefs, and behaviors a user holds towards voice assistants. It results from a person’s experience and can influence user behavior. It is also subject to change rather than constant. Understanding user attitudes towards VAs has become an active research area. Numerous studies have used different methods to measure subthemes of attitude such as trust, closeness, disclosure, smartness, and honesty [ 60 , 78 ]. Likeability is also a subtheme of attitude. It has been used to measure the compatibility, trust, and strength of the relationship between the user and the VA [ 56 , 57 ]. Embodiment type also affects user attitude: one study showed how gaze affects user attitudes toward a VA and found that a VA with gaze creates trust [ 59 ].

We define machine voice (anthropomorphism) as the user’s attribution of human characteristics and human likeness to the voice assistant. We consider machine voice an important usability measure that applies specifically to voice assistants, because their primary modality is voice. Measurement of machine voice has spiked recently, which shows that it is drawing considerable interest. One direct goal of VAs is to sound as human as possible. When users perceive the machine as more human, they trust it more, which results in a better usability experience.

Cognitive load might be mistaken for efficiency, but the two are different. We define cognitive load as the amount of mental capacity a person applies to communicate successfully with the VA. With VAs, actions such as issuing commands require cognitive effort. Cognitive load is measured through characteristics specific to VA use, such as attention time during use [ 76 ] and the user’s mental workload [ 77 ].

Regarding RQ 1 (can the ISO 9241-11 framework be used to measure the usability of VAs?), none of the existing works used the ISO 9241-11 framework alone for usability evaluation. Instead, it was supplemented by the factors presented above, which lie outside the scope of the framework.

Relationship Between the Independent Variables and Usability Measures

Having identified the independent and dependent variables, we show in Tables 5 and 6 how they are interrelated, to give a clearer picture of VA usability. Table 5 focuses on the ISO 9241-11 factors, while Table 6 covers the non-ISO factors.

Table 5. Relationship between independent variables and ISO 9241-11 framework measures

Table 6. Relationship between independent variables and non-ISO 9241-11 framework measures

The independent variables are grouped into categories and represented by table rows, with each category consisting of multiple independent variables. The usability measures are presented in the columns, and each measure is broken down into its subthemes. The tables thus highlight the relationships between the independent variables and the usability measures. An “X” in a cell indicates that at least one study has examined the relationship between that independent variable and that usability measure subtheme. An empty cell indicates that no study has examined that relationship.
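Identifying the gaps discussed in the next subsection amounts to listing the empty cells of such a table. This Python sketch assumes the table is stored as a set of (independent variable, subtheme) pairs that have at least one study. The variable names and marked cells in the example are illustrative and do not reproduce Tables 5 and 6.

```python
from collections.abc import Iterable
from itertools import product


def find_gaps(
    variables: list[str],
    subthemes: list[str],
    studied: Iterable[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Return every (variable, subtheme) cell left empty in the table."""
    studied_cells = set(studied)
    return [cell for cell in product(variables, subthemes) if cell not in studied_cells]


def render_table(
    variables: list[str],
    subthemes: list[str],
    studied: Iterable[tuple[str, str]],
) -> str:
    """Plain-text version of the table, with an "X" in each studied cell."""
    studied_cells = set(studied)
    width = max(len(v) for v in variables)
    lines = [" " * width + " | " + " | ".join(subthemes)]
    for var in variables:
        cells = [("X" if (var, s) in studied_cells else "").center(len(s)) for s in subthemes]
        lines.append(var.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    variables = ["accent", "gender", "query expression"]
    subthemes = ["effectiveness", "efficiency", "satisfaction"]
    studied = {("accent", "effectiveness"), ("gender", "satisfaction")}
    print(render_table(variables, subthemes, studied))
    for gap in find_gaps(variables, subthemes, studied):
        print("gap:", gap)
```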

Independent Variables and Usability Measures

Our study reveals what has previously been done in VA usability research and which gaps remain to be addressed. We analyzed the usability measures and their relationships to the independent variables. VAs have become easily accessible thanks to the development of different embodiment types such as speakers, humanoids, and robots. However, far less attention has been paid to embodiment types and their relationship to effectiveness and anthropomorphism, an area that deserves more study. Some gaps and associations are apparent, while others are vague. For instance, the independent variable “accent” has often been linked to effectiveness. What remains unanswered is whether VA accents have the same effect on users of the same gender and of different genders. Another notable gap is the relationship between gender and efficiency, which very few studies have examined. This will be essential to understand given the recent massive adoption of voice assistants in different contexts. A further obvious gap is the relationship between query expression and any ISO 9241-11 measure. Query expression is how a user phrases a query to the voice assistant. It is known to increase user trust and improve user attitude towards the VA. However, its relationship to usability measures such as efficiency, satisfaction, and effectiveness is still under-researched. How a query is phrased determines the type of response a user gets: even a valid question can receive an incorrect response if it is expressed poorly. From a mental-model perspective, if a user must invest considerable effort and thought in framing a question, the VA’s efficiency and the user’s satisfaction may suffer. However, no current study has tested this.

VA response types increase effectiveness and trust, but their relationship to user acceptance is still unknown. Another interesting intersection is between anthropomorphic cues and attitude, which reflects an emotional rather than a practical response to anthropomorphism. Attitude is an emotional response to a given state, hence its strong connection with anthropomorphism. Attitude toward VAs is a highly researched area [ 86 ]. Studies of the attitude measure focused on the subthemes trust, likeability, and acceptance. This can be attributed to the importance of trust when using emerging technologies such as voice assistants. User trust in voice assistants is essential with the rise of IoT devices, and user mistrust affects the acceptance and effectiveness of VAs [ 87 ]. Multiple studies measured user trust using machine voice categories as independent variables. This could be attributed to the lack of a GUI in VAs: the voice modality alone must be enough to cultivate user trust. Notably, subjective methods were widely employed to measure user attitudes. Although subjective measures often relate to the variables they are intended to capture, they are also affected by cognitive biases.

The ISO 9241-11 framework is an effective tool for measuring effectiveness, efficiency, and satisfaction. However, it does not cover usability aspects such as attitude, machine voice, and cognitive load, which are uniquely associated with voice assistants. The ISO 9241-11 framework could therefore be expanded to include such aspects.

Techniques Employed

The factorial design works well in matched-subjects experiments [ 56 ]. Based on the collected studies, machine learning is rarely used as an analytic tool in usability research. This could be attributed to the technical demands of machine learning and to its still being a relatively new field. However, as third-party machine learning tools become available, more such analyses are likely to be carried out. Wizard of Oz and interaction design techniques started gaining popularity in 2015–2021. These techniques are also more effective with independent variables such as anthropomorphic cues. Indeed, anthropomorphic cues are studied with Wizard of Oz and interaction design techniques more than with any other technique. This could be attributed to the importance of using objective methods to avoid biased human responses. Furthermore, machine voice is a fairly popular usability measure. This could be attributed to VA developers trying to give VAs more humanlike and intelligent attributes. The more users perceive the machine voice as intelligent and humanlike, the more they trust and adopt it. More objective techniques should be developed and applied to the independent variables when measuring machine voice. Subjective techniques such as quantitative questionnaire methods are easy to use and straightforward, but they can produce biased results.

Interactive design experiments are among the most commonly used techniques for measuring usability. However, here the interaction depends on the voice modality. This distinguishes it from traditional interaction design, in which visual cues are an essential component. Interaction design also triggers an emotional response, which makes it effective for measuring user attitude. Arguably, the absence of visual elements in the interactive designs used might defeat the purpose of clear communication. A new interaction design standard specific to the voice modality should be developed.

Limitations and Future Work

One limitation of our study is that we drew articles from only a few databases. In future studies, we intend to add more journal databases such as Scopus and Taylor and Francis. Most of the experimental studies we collected were conducted in controlled environments. Future studies will focus on usability measures and independent variables used in natural settings, so that the results of the two settings can be compared. More studies should also be carried out on objective techniques and on how they can complement subjective techniques. This is vital because, as user expectations of voice assistants rise, it will be essential to understand how the techniques complement each other for each usability measure.

Our study aimed to understand what is currently employed to measure voice assistant usability. To that end, we identified the independent variables, dependent variables, and techniques used. We also examined the use of the ISO 9241-11 framework to measure the usability of voice assistants. Our study classified the independent variables used for measuring the dependent variables into five classes, based on the similarities between their members. The dependent variables were the three usability measures of the ISO 9241-11 framework plus three additional ones. We found that some voice assistant types, such as car interfaces, have not been studied enough, whereas smart speakers currently receive the most attention. Dependent variables such as machine voice (anthropomorphism) and attitude have recently received more attention than older usability measures such as effectiveness. We also found that usability depends on the context of use; for example, the same independent variables can be used with different usability measures. Our study highlights the relationships between the independent and dependent variables used by other studies. In conclusion, our study used ISO 9241-11 to analyze usability and highlights what has been done on VA usability and which gaps remain. Although much usability measurement has been carried out, many aspects have not yet been researched. Furthermore, the current ISO 9241-11 framework is not suitable for measuring recent advances in VAs, because user needs and expectations have changed as the technology has developed. Using the ISO 9241-11 framework alone creates ambiguity in explaining usability measures such as machine voice, attitude, and cognitive load. However, it has the potential to serve as a foundation for future VA usability frameworks.

This study was funded by The Asahi Glass Foundation.

Declarations

The authors declare that they have no conflict of interest.


Contributor Information

Faruk Lawal Ibrahim Dutsinma

Debajyoti Pal

Suree Funilkul, Email: suree@sit.kmutt.ac.th.

Jonathan H. Chan, Email: jonathan@sit.kmutt.ac.th.

Artificial Intelligence and Virtual Assistant—Working Model

  • Conference paper
  • First Online: 29 September 2020


  • Shakti Arora 13 ,
  • Vijay Anant Athavale 13 ,
  • Himanshu Maggu 13 &
  • Abhay Agarwal 13  

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 140)


In the twenty-first century, virtual assistants play a crucial role in people’s day-to-day activities. According to a 2019 Clutch survey report, 27% of people use AI-powered virtual assistants such as Google Assistant, Amazon Alexa, Cortana, and Apple Siri to perform simple tasks. These virtual assistants are designed with natural language processing. In this research paper, we have studied and analyzed the working model and the efficiency of different virtual assistants available on the market. We also designed an intelligent virtual assistant that can be integrated with Google virtual services and works with the Google virtual assistant interface. To calculate the efficiency of the designed virtual assistant, we took as input a comparative analysis of traffic, message communication, and conversation length over approximately three days.


Author information

Authors and affiliations.

Panipat Institute of Engineering & Technology, Samalkha, Panipat, 132102, India

Shakti Arora, Vijay Anant Athavale,  Himanshu Maggu & Abhay Agarwal


Corresponding author

Correspondence to Shakti Arora .

Editor information

Editors and affiliations.

Department of Electronics and Communication Engineering, University Institute of Engineering and Technology (UIET), Kurukshetra University, Kurukshetra, Haryana, India

Nikhil Marriwala

University Institute of Engineering and Technology (UIET), Kurukshetra University, Kurukshetra, Haryana, India

C. C. Tripathi

Department of Electrical and Computer System Engineering, RMIT University, Melbourne, VIC, Australia

Dinesh Kumar

Department of Electronics and Communication Engineering, Jaypee University of Information Technology, Waknaghat, Himachal Pradesh, India

Shruti Jain


Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper.

Arora, S., Athavale, V.A., Himanshu Maggu, Agarwal, A. (2021). Artificial Intelligence and Virtual Assistant—Working Model. In: Marriwala, N., Tripathi, C.C., Kumar, D., Jain, S. (eds) Mobile Radio Communications and 5G Networks. Lecture Notes in Networks and Systems, vol 140. Springer, Singapore. https://doi.org/10.1007/978-981-15-7130-5_12


DOI : https://doi.org/10.1007/978-981-15-7130-5_12

Published : 29 September 2020

Publisher Name : Springer, Singapore

Print ISBN : 978-981-15-7129-9

Online ISBN : 978-981-15-7130-5

eBook Packages : Engineering Engineering (R0)

  • Today, we’re introducing Meta Llama 3, the next generation of our state-of-the-art open source large language model.
  • Llama 3 models will soon be available on AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, NVIDIA NIM, and Snowflake, and with support from hardware platforms offered by AMD, AWS, Dell, Intel, NVIDIA, and Qualcomm.
  • We’re dedicated to developing Llama 3 in a responsible way, and we’re offering various resources to help others use it responsibly as well. This includes introducing new trust and safety tools with Llama Guard 2, Code Shield, and CyberSec Eval 2.
  • In the coming months, we expect to introduce new capabilities, longer context windows, additional model sizes, and enhanced performance, and we’ll share the Llama 3 research paper.
  • Meta AI, built with Llama 3 technology, is now one of the world’s leading AI assistants that can boost your intelligence and lighten your load, helping you learn, get things done, create content, and connect to make the most out of every moment. You can try Meta AI here.

Today, we’re excited to share the first two models of the next generation of Llama, Meta Llama 3, available for broad use. This release features pretrained and instruction-fine-tuned language models with 8B and 70B parameters that can support a broad range of use cases. This next generation of Llama demonstrates state-of-the-art performance on a wide range of industry benchmarks and offers new capabilities, including improved reasoning. We believe these are the best open source models of their class, period. In support of our longstanding open approach, we’re putting Llama 3 in the hands of the community. We want to kickstart the next wave of innovation in AI across the stack—from applications to developer tools to evals to inference optimizations and more. We can’t wait to see what you build and look forward to your feedback.

Our goals for Llama 3

With Llama 3, we set out to build the best open models that are on par with the best proprietary models available today. We wanted to address developer feedback to increase the overall helpfulness of Llama 3 and are doing so while continuing to play a leading role on responsible use and deployment of LLMs. We are embracing the open source ethos of releasing early and often to enable the community to get access to these models while they are still in development. The text-based models we are releasing today are the first in the Llama 3 collection of models. Our goal in the near future is to make Llama 3 multilingual and multimodal, have longer context, and continue to improve overall performance across core LLM capabilities such as reasoning and coding.

State-of-the-art performance

Our new 8B and 70B parameter Llama 3 models are a major leap over Llama 2 and establish a new state-of-the-art for LLM models at those scales. Thanks to improvements in pretraining and post-training, our pretrained and instruction-fine-tuned models are the best models existing today at the 8B and 70B parameter scale. Improvements in our post-training procedures substantially reduced false refusal rates, improved alignment, and increased diversity in model responses. We also saw greatly improved capabilities like reasoning, code generation, and instruction following making Llama 3 more steerable.


*Please see evaluation details for the settings and parameters with which these evaluations are calculated.

In the development of Llama 3, we looked at model performance on standard benchmarks and also sought to optimize for performance in real-world scenarios. To this end, we developed a new high-quality human evaluation set. This evaluation set contains 1,800 prompts that cover 12 key use cases: asking for advice, brainstorming, classification, closed question answering, coding, creative writing, extraction, inhabiting a character/persona, open question answering, reasoning, rewriting, and summarization. To prevent accidental overfitting of our models on this evaluation set, even our own modeling teams do not have access to it. The chart below shows aggregated results of our human evaluations across these categories and prompts against Claude Sonnet, Mistral Medium, and GPT-3.5.
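One common way to summarize pairwise human evaluations is to aggregate them into win, tie, and loss rates per competitor. This Python sketch assumes that each annotation records a single verdict ("win", "tie", or "loss") for the model under test against one competitor on one prompt. The post does not describe its exact aggregation, and the sample judgments and the competitor name `model_a` are invented for illustration.

```python
from collections import Counter, defaultdict

VERDICTS = ("win", "tie", "loss")


def preference_rates(judgments):
    """Aggregate (competitor, verdict) pairs into per-competitor rates.

    Each judgment is one annotator's verdict for the model under test
    versus `competitor` on one prompt.
    """
    by_competitor = defaultdict(Counter)
    for competitor, verdict in judgments:
        if verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {verdict!r}")
        by_competitor[competitor][verdict] += 1
    rates = {}
    for competitor, counts in by_competitor.items():
        total = sum(counts.values())
        rates[competitor] = {v: counts[v] / total for v in VERDICTS}
    return rates


if __name__ == "__main__":
    sample = [
        ("model_a", "win"), ("model_a", "win"),
        ("model_a", "tie"), ("model_a", "loss"),
    ]
    print(preference_rates(sample))
```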


Preference rankings by human annotators based on this evaluation set highlight the strong performance of our 70B instruction-following model compared to competing models of comparable size in real-world scenarios.

Our pretrained model also establishes a new state-of-the-art for LLM models at those scales.


To develop a great language model, we believe it’s important to innovate, scale, and optimize for simplicity. We adopted this design philosophy throughout the Llama 3 project with a focus on four key ingredients: the model architecture, the pretraining data, scaling up pretraining, and instruction fine-tuning.

Model architecture

In line with our design philosophy, we opted for a relatively standard decoder-only transformer architecture in Llama 3. Compared to Llama 2, we made several key improvements. Llama 3 uses a tokenizer with a vocabulary of 128K tokens that encodes language much more efficiently, which leads to substantially improved model performance. To improve the inference efficiency of Llama 3 models, we’ve adopted grouped query attention (GQA) across both the 8B and 70B sizes. We trained the models on sequences of 8,192 tokens, using a mask to ensure self-attention does not cross document boundaries.
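
The last sentence above describes packing several documents into each 8,192-token training sequence while masking attention so it does not cross document boundaries. A minimal sketch of such a mask, assuming each position carries a document id (`doc_ids` is a hypothetical input, not part of any Llama 3 API): position i may attend to position j only if j comes no later than i and both positions belong to the same document.

```python
from typing import List


def document_causal_mask(doc_ids: List[int]) -> List[List[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    Attention is allowed only if j <= i (causal) and both positions belong to
    the same packed document (no cross-document attention).
    """
    n = len(doc_ids)
    return [
        [j <= i and doc_ids[i] == doc_ids[j] for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Three documents packed into one 6-token sequence (hypothetical ids).
    ids = [0, 0, 1, 1, 1, 2]
    for row in document_causal_mask(ids):
        print("".join("1" if allowed else "." for allowed in row))
    # Expected output:
    # 1.....
    # 11....
    # ..1...
    # ..11..
    # ..111.
    # .....1
```

A training stack would hand an equivalent mask, or just the document offsets, to its attention kernel; the nested lists here only make the block-diagonal causal structure visible.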

Training data

To train the best language model, the curation of a large, high-quality training dataset is paramount. In line with our design principles, we invested heavily in pretraining data. Llama 3 is pretrained on over 15T tokens that were all collected from publicly available sources. Our training dataset is seven times larger than that used for Llama 2, and it includes four times more code. To prepare for upcoming multilingual use cases, over 5% of the Llama 3 pretraining dataset consists of high-quality non-English data that covers over 30 languages. However, we do not expect the same level of performance in these languages as in English.

To ensure Llama 3 is trained on data of the highest quality, we developed a series of data-filtering pipelines. These pipelines include using heuristic filters, NSFW filters, semantic deduplication approaches, and text classifiers to predict data quality. We found that previous generations of Llama are surprisingly good at identifying high-quality data, hence we used Llama 2 to generate the training data for the text-quality classifiers that are powering Llama 3.
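
The post names these filter families but not their rules or thresholds. The sketch below illustrates two of them under stated assumptions: a heuristic filter (a minimum word count and a maximum share of non-alphanumeric characters, both hypothetical thresholds) and near-duplicate removal. Meta describes *semantic* deduplication; here word-shingle Jaccard similarity stands in as a simpler, swapped-in technique, and the NSFW filter and the Llama 2-trained quality classifier are omitted.

```python
import re
from typing import List, Set


def passes_heuristics(text: str, min_words: int = 5, max_symbol_ratio: float = 0.3) -> bool:
    """Hypothetical heuristic filter: drop very short or symbol-heavy documents."""
    words = text.split()
    if len(words) < min_words:
        return False
    non_space = [c for c in text if not c.isspace()]
    symbols = sum(1 for c in non_space if not c.isalnum())
    return symbols / len(non_space) <= max_symbol_ratio


def shingles(text: str, k: int = 3) -> Set[tuple]:
    """Lower-cased word k-grams used for near-duplicate detection."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def jaccard(a: Set[tuple], b: Set[tuple]) -> float:
    """Overlap of two shingle sets (1.0 means identical)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def filter_corpus(docs: List[str], dup_threshold: float = 0.8) -> List[str]:
    """Keep documents that pass the heuristics and are not near-duplicates of a kept one."""
    kept: List[str] = []
    kept_shingles: List[Set[tuple]] = []
    for doc in docs:
        if not passes_heuristics(doc):
            continue
        s = shingles(doc)
        if any(jaccard(s, other) >= dup_threshold for other in kept_shingles):
            continue
        kept.append(doc)
        kept_shingles.append(s)
    return kept
```

The pairwise comparison is quadratic in corpus size; at web scale, pipelines typically replace it with hashing or embedding-based nearest-neighbour search.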

We also performed extensive experiments to evaluate the best ways of mixing data from different sources in our final pretraining dataset. These experiments enabled us to select a data mix that ensures that Llama 3 performs well across use cases including trivia questions, STEM, coding, historical knowledge, etc.
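
The post does not publish the final mixture. Purely as an illustration, a data mix can be expressed as per-source sampling weights that the training loader draws from; the source names and weights below are hypothetical, not Llama 3's proportions.

```python
import random
from collections import Counter
from typing import Dict, List


def sample_sources(weights: Dict[str, float], n: int, seed: int = 0) -> List[str]:
    """Draw n source labels in proportion to the (unnormalized) mixture weights."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)


if __name__ == "__main__":
    # Hypothetical mixture; not the actual Llama 3 proportions.
    mix = {"web": 0.6, "code": 0.2, "multilingual": 0.1, "reference": 0.1}
    counts = Counter(sample_sources(mix, n=10_000))
    for source, count in counts.most_common():
        print(f"{source:>12}: {count / 10_000:.3f}")
```

Experiments of the kind the paragraph describes would then compare models trained under different weight settings and keep the mix that performs best across the target use cases.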

Scaling up pretraining

To effectively leverage our pretraining data in Llama 3 models, we put substantial effort into scaling up pretraining. Specifically, we have developed a series of detailed scaling laws for downstream benchmark evaluations. These scaling laws enable us to select an optimal data mix and to make informed decisions on how to best use our training compute. Importantly, scaling laws allow us to predict the performance of our largest models on key tasks (for example, code generation as evaluated on the HumanEval benchmark—see above) before we actually train the models. This helps us ensure strong performance of our final models across a variety of use cases and capabilities.
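
The post does not give the functional form of these scaling laws, and Meta's versions predict downstream benchmark results rather than the raw loss used here. As a simplified, assumed stand-in, the sketch fits a power law L(C) = a * C^(-b) to a few small, synthetic training runs by least squares in log-log space, where the power law becomes a straight line, and then extrapolates to a larger compute budget.

```python
import math
from typing import List, Tuple


def fit_power_law(compute: List[float], loss: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log(loss) = log(a) - b * log(compute); returns (a, b)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict(a: float, b: float, compute: float) -> float:
    """Evaluate the fitted power law at a given compute budget (FLOPs)."""
    return a * compute ** (-b)


if __name__ == "__main__":
    # Synthetic small-scale runs generated from a=50, b=0.05 (illustrative only).
    budgets = [1e18, 1e19, 1e20, 1e21]
    losses = [predict(50.0, 0.05, c) for c in budgets]
    a, b = fit_power_law(budgets, losses)
    print(f"a={a:.2f}, b={b:.3f}, predicted loss at 1e24 FLOPs: {predict(a, b, 1e24):.3f}")
```

Real fits often add an irreducible-loss term and are validated against held-out runs before the extrapolation is trusted.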

We made several new observations on scaling behavior during the development of Llama 3. For example, while the Chinchilla-optimal amount of training compute for an 8B parameter model corresponds to ~200B tokens, we found that model performance continues to improve even after the model is trained on two orders of magnitude more data. Both our 8B and 70B parameter models continued to improve log-linearly after we trained them on up to 15T tokens. Larger models can match the performance of these smaller models with less training compute, but smaller models are generally preferred because they are much more efficient during inference.
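
The figures in this paragraph can be sanity-checked with two widely used rules of thumb that the post does not itself state, so treat both as assumptions: the Chinchilla heuristic (Hoffmann et al., 2022) of roughly 20 training tokens per parameter, and the approximation that training a dense transformer costs about 6 x parameters x tokens FLOPs.

```python
def chinchilla_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal token count (Chinchilla rule of thumb, an assumption)."""
    return params * tokens_per_param


def training_flops(params: float, tokens: float) -> float:
    """Common ~6*N*D estimate of dense-transformer training FLOPs (an approximation)."""
    return 6.0 * params * tokens


if __name__ == "__main__":
    n_params = 8e9          # Llama 3 8B
    actual_tokens = 15e12   # "up to 15T tokens"
    optimal = chinchilla_tokens(n_params)
    print(f"compute-optimal tokens: {optimal:.2e}")                    # 1.60e+11
    print(f"over-training factor:  {actual_tokens / optimal:.0f}x")    # 94x
    print(f"approx. training FLOPs: {training_flops(n_params, actual_tokens):.1e}")  # 7.2e+23
```

The heuristic gives about 160B tokens, close to the ~200B the post cites; 15T tokens is then roughly 75 to 94 times the compute-optimal budget depending on which baseline is used, consistent with the "two orders of magnitude" described above.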

To train our largest Llama 3 models, we combined three types of parallelization: data parallelization, model parallelization, and pipeline parallelization. Our most efficient implementation achieves a compute utilization of over 400 TFLOPS per GPU when trained on 16K GPUs simultaneously. We performed training runs on two custom-built 24K GPU clusters. To maximize GPU uptime, we developed an advanced new training stack that automates error detection, handling, and maintenance. We also greatly improved our hardware reliability and detection mechanisms for silent data corruption, and we developed new scalable storage systems that reduce overheads of checkpointing and rollback. Those improvements resulted in an overall effective training time of more than 95%. Combined, these improvements increased the efficiency of Llama 3 training by roughly three times compared to Llama 2.
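
Combining data, model and pipeline parallelism means every GPU is identified by a coordinate in a three-dimensional grid whose dimensions multiply to the world size. The post does not say how Meta lays out that grid. The sketch assumes a hypothetical ordering in which model-parallel ranks (often implemented as tensor parallelism) vary fastest so they stay close together, then pipeline stages, then data-parallel replicas, with hypothetical degrees of 8 x 16 x 128 = 16,384 GPUs.

```python
from typing import NamedTuple


class ParallelCoord(NamedTuple):
    data: int      # which model replica (data-parallel index)
    pipeline: int  # which pipeline stage (a contiguous block of layers)
    model: int     # which shard of each layer's weights (model-parallel index)


def rank_to_coord(rank: int, model_deg: int, pipe_deg: int, data_deg: int) -> ParallelCoord:
    """Map a global rank to (data, pipeline, model) indices; model index varies fastest."""
    world = model_deg * pipe_deg * data_deg
    if not 0 <= rank < world:
        raise ValueError(f"rank {rank} outside world size {world}")
    model = rank % model_deg
    pipeline = (rank // model_deg) % pipe_deg
    data = rank // (model_deg * pipe_deg)
    return ParallelCoord(data, pipeline, model)


def coord_to_rank(c: ParallelCoord, model_deg: int, pipe_deg: int) -> int:
    """Inverse of rank_to_coord."""
    return (c.data * pipe_deg + c.pipeline) * model_deg + c.model


if __name__ == "__main__":
    M, P, D = 8, 16, 128  # hypothetical degrees; 8 * 16 * 128 = 16,384 GPUs
    for r in (0, 7, 8, 127, 128, 16_383):
        print(r, rank_to_coord(r, M, P, D))
```

Ranks that share a `data` index together hold one full copy of the model and pass activations between pipeline stages; ranks that share the same `pipeline` and `model` indices across different replicas are the ones that all-reduce gradients.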

Instruction fine-tuning

To fully unlock the potential of our pretrained models in chat use cases, we innovated on our approach to instruction-tuning as well. Our approach to post-training is a combination of supervised fine-tuning (SFT), rejection sampling, proximal policy optimization (PPO), and direct preference optimization (DPO). The quality of the prompts that are used in SFT and the preference rankings that are used in PPO and DPO has an outsized influence on the performance of aligned models. Some of our biggest improvements in model quality came from carefully curating this data and performing multiple rounds of quality assurance on annotations provided by human annotators.
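
Of the post-training components named above, DPO has a compact closed-form objective (Rafailov et al., 2023): for a prompt x with a preferred response y_w and a rejected response y_l, the loss is -log sigmoid(beta * [(log pi(y_w|x) - log pi_ref(y_w|x)) - (log pi(y_l|x) - log pi_ref(y_l|x))]). The sketch evaluates it from summed per-sequence log-probabilities; the input numbers and beta = 0.1 are illustrative, and the post does not disclose Meta's DPO settings.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair, given summed log-probs of each response."""
    chosen_margin = policy_chosen - ref_chosen        # how much the policy upweights y_w
    rejected_margin = policy_rejected - ref_rejected  # how much the policy upweights y_l
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


if __name__ == "__main__":
    # Illustrative summed log-probs (not real model outputs).
    print(dpo_loss(policy_chosen=-12.0, policy_rejected=-15.0,
                   ref_chosen=-13.0, ref_rejected=-14.0))
```

Raising the chosen response's log-probability relative to the reference model, or lowering the rejected one's, reduces the loss; beta plays the role of the KL-penalty strength in the objective DPO is derived from.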

Learning from preference rankings via PPO and DPO also greatly improved the performance of Llama 3 on reasoning and coding tasks. We found that if you ask a model a reasoning question that it struggles to answer, the model will sometimes produce the right reasoning trace: The model knows how to produce the right answer, but it does not know how to select it. Training on preference rankings enables the model to learn how to select it.
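
The idea that a model can produce a correct reasoning trace without reliably choosing it is easy to see with best-of-N selection, the basic shape of the rejection-sampling step listed among the post-training components above: sample several candidates, score them, keep the best. In the sketch, `generate` and `score` are hypothetical stand-ins for a sampled model call and a reward model; the post does not describe how Meta implements this step.

```python
import random
from typing import Callable, List, Tuple


def best_of_n(prompt: str,
              generate: Callable[[str, random.Random], str],
              score: Callable[[str, str], float],
              n: int = 8,
              seed: int = 0) -> Tuple[str, List[str]]:
    """Sample n candidates and return (highest-scoring candidate, all candidates)."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    best = max(candidates, key=lambda c: score(prompt, c))
    return best, candidates


if __name__ == "__main__":
    # Toy stand-ins: the "model" answers 2 + 2 correctly only some of the time,
    # and the "reward model" simply checks the final answer.
    def toy_generate(prompt: str, rng: random.Random) -> str:
        return f"2 + 2 = {rng.choice([3, 4, 5])}"

    def toy_score(prompt: str, answer: str) -> float:
        return 1.0 if answer.endswith("= 4") else 0.0

    best, pool = best_of_n("What is 2 + 2?", toy_generate, toy_score, n=8)
    print(pool)
    print("selected:", best)
```

In rejection-sampling fine-tuning as commonly practiced, the selected outputs become new training targets; preference training via PPO or DPO instead pushes the model itself to rank the good trace above the bad one.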

Building with Llama 3

Our vision is to enable developers to customize Llama 3 to support relevant use cases and to make it easier to adopt best practices and improve the open ecosystem. With this release, we’re providing new trust and safety tools including updated components with both Llama Guard 2 and CyberSecEval 2, and the introduction of Code Shield, an inference-time guardrail for filtering insecure code produced by LLMs.

We’ve also co-developed Llama 3 with torchtune, the new PyTorch-native library for easily authoring, fine-tuning, and experimenting with LLMs. torchtune provides memory-efficient and hackable training recipes written entirely in PyTorch. The library is integrated with popular platforms such as Hugging Face, Weights & Biases, and EleutherAI and even supports ExecuTorch for enabling efficient inference to be run on a wide variety of mobile and edge devices. For everything from prompt engineering to using Llama 3 with LangChain, we have a comprehensive getting started guide that takes you from downloading Llama 3 all the way to deployment at scale within your generative AI application.

A system-level approach to responsibility

We have designed Llama 3 models to be maximally helpful while ensuring an industry-leading approach to responsibly deploying them. To achieve this, we have adopted a new, system-level approach to the responsible development and deployment of Llama. We envision Llama models as part of a broader system that puts the developer in the driver’s seat. Llama models will serve as a foundational piece of a system that developers design with their unique end goals in mind.


Instruction fine-tuning also plays a major role in ensuring the safety of our models. Our instruction-fine-tuned models have been red-teamed (tested) for safety through internal and external efforts. Our red teaming approach leverages human experts and automation methods to generate adversarial prompts that try to elicit problematic responses. For instance, we apply comprehensive testing to assess risks of misuse related to Chemical, Biological, Cyber Security, and other risk areas. All of these efforts are iterative and used to inform safety fine-tuning of the models being released. You can read more about our efforts in the model card.

Llama Guard models are meant to be a foundation for prompt and response safety and can easily be fine-tuned to create a new taxonomy depending on application needs. As a starting point, the new Llama Guard 2 uses the recently announced MLCommons taxonomy, in an effort to support the emergence of industry standards in this important area. Additionally, CyberSecEval 2 expands on its predecessor by adding measures of an LLM’s propensity to allow for abuse of its code interpreter, offensive cybersecurity capabilities, and susceptibility to prompt injection attacks (learn more in our technical paper). Finally, we’re introducing Code Shield, which adds support for inference-time filtering of insecure code produced by LLMs. This offers mitigation of risks around insecure code suggestions, code interpreter abuse prevention, and secure command execution.
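
Code Shield's actual rules and interface are not described in this post. As a toy illustration of inference-time filtering of generated code, the sketch scans model output against a few hypothetical regular-expression rules for well-known insecure Python patterns and reports each match before the code reaches the user; it is not Code Shield's API or rule set.

```python
import re
from typing import List, NamedTuple


class Finding(NamedTuple):
    line: int
    rule: str


# Hypothetical rules; a production scanner would use far richer static analysis.
INSECURE_PATTERNS = {
    "eval-call": re.compile(r"\beval\s*\("),
    "shell-true": re.compile(r"subprocess\.\w+\(.*shell\s*=\s*True"),
    "weak-hash-md5": re.compile(r"hashlib\.md5\s*\("),
    "pickle-load": re.compile(r"pickle\.loads?\s*\("),
}


def scan_generated_code(code: str) -> List[Finding]:
    """Return a Finding (line number, rule name) for every insecure pattern found."""
    findings = []
    for lineno, text in enumerate(code.splitlines(), start=1):
        for rule, pattern in INSECURE_PATTERNS.items():
            if pattern.search(text):
                findings.append(Finding(lineno, rule))
    return findings


if __name__ == "__main__":
    suggestion = (
        "import hashlib, subprocess\n"
        "digest = hashlib.md5(data).hexdigest()\n"
        "subprocess.run(user_cmd, shell=True)\n"
    )
    for f in scan_generated_code(suggestion):
        print(f"line {f.line}: {f.rule}")
```

A guardrail built this way could block, annotate, or regenerate a suggestion when findings are non-empty; code interpreter abuse and command execution, which the post also lists, need controls beyond pattern matching.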

With the speed at which the generative AI space is moving, we believe an open approach is an important way to bring the ecosystem together and mitigate these potential harms. As part of that, we’re updating our Responsible Use Guide (RUG) that provides a comprehensive guide to responsible development with LLMs. As we outlined in the RUG, we recommend that all inputs and outputs be checked and filtered in accordance with content guidelines appropriate to the application. Additionally, many cloud service providers offer content moderation APIs and other tools for responsible deployment, and we encourage developers to also consider using these options.
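
The recommendation to check and filter all inputs and outputs amounts to wrapping every model call with a check on the way in and on the way out. A minimal sketch, in which `model`, `is_allowed` and the refusal message are hypothetical placeholders; in practice the checker could be a safety classifier such as Llama Guard 2 or a cloud provider's moderation API, whose interfaces are not shown here.

```python
from typing import Callable

REFUSAL = "Sorry, I can't help with that."  # hypothetical application-specific message


def guarded_call(prompt: str,
                 model: Callable[[str], str],
                 is_allowed: Callable[[str], bool]) -> str:
    """Run model(prompt) only if the input passes; return the output only if it passes too."""
    if not is_allowed(prompt):
        return REFUSAL  # the model is never called on a rejected prompt
    response = model(prompt)
    if not is_allowed(response):
        return REFUSAL
    return response


if __name__ == "__main__":
    # Toy checker: a keyword blocklist standing in for a real content classifier.
    blocklist = {"credit card dump", "malware payload"}

    def toy_checker(text: str) -> bool:
        lowered = text.lower()
        return not any(term in lowered for term in blocklist)

    def echo_model(prompt: str) -> str:
        return f"You asked: {prompt}"

    print(guarded_call("Summarize this article", echo_model, toy_checker))
    print(guarded_call("Write a malware payload", echo_model, toy_checker))
```

Separate checkers for prompts and responses, or per-category policies such as the MLCommons taxonomy mentioned above, fit the same shape.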

Deploying Llama 3 at scale

Llama 3 will soon be available on all major platforms including cloud providers, model API providers, and much more. Llama 3 will be everywhere.

Our benchmarks show the tokenizer offers improved token efficiency, yielding up to 15% fewer tokens compared to Llama 2. Also, Grouped Query Attention (GQA) has now been added to Llama 3 8B. As a result, we observed that despite the model having 1B more parameters compared to Llama 2 7B, the improved tokenizer efficiency and GQA contribute to maintaining the inference efficiency on par with Llama 2 7B.
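
Grouped query attention lets several query heads share one key/value head, which shrinks the key/value cache that dominates inference memory at long context. The sketch shows the head mapping and the resulting cache size. The configuration values (32 layers, 32 query heads, 8 key/value heads, head dimension 128, 2-byte values) are assumptions meant to resemble the released 8B model; check the model's own config file before relying on them.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Index of the shared key/value head used by a given query head under GQA."""
    if n_q_heads % n_kv_heads:
        raise ValueError("query heads must divide evenly into key/value groups")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes to cache keys and values (hence the leading factor 2) for one sequence."""
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    layers, q_heads, kv_heads, dim = 32, 32, 8, 128  # assumed 8B-style config
    print([kv_head_for_query_head(h, q_heads, kv_heads) for h in range(8)])
    # [0, 0, 0, 0, 1, 1, 1, 1]
    mha = kv_cache_bytes(8192, layers, q_heads, dim)  # one K/V head per query head
    gqa = kv_cache_bytes(8192, layers, kv_heads, dim)
    print(f"MHA cache: {mha / 2**30:.1f} GiB, GQA cache: {gqa / 2**30:.1f} GiB")
    # MHA cache: 4.0 GiB, GQA cache: 1.0 GiB
```

With four query heads per key/value head the cache is a quarter of the multi-head size, which is part of how GQA offsets the extra parameters when comparing inference cost against Llama 2 7B.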

For examples of how to leverage all of these capabilities, check out Llama Recipes which contains all of our open source code that can be leveraged for everything from fine-tuning to deployment to model evaluation.

What’s next for Llama 3?

The Llama 3 8B and 70B models mark the beginning of what we plan to release for Llama 3. And there’s a lot more to come.

Our largest models are over 400B parameters and, while these models are still training, our team is excited about how they’re trending. Over the coming months, we’ll release multiple models with new capabilities including multimodality, the ability to converse in multiple languages, a much longer context window, and stronger overall capabilities. We will also publish a detailed research paper once we are done training Llama 3.

To give you a sneak preview of where these models are today as they continue training, we thought we could share some snapshots of how our largest model is trending. Please note that this data is based on an early checkpoint of Llama 3 that is still training and these capabilities are not supported as part of the models released today.


We’re committed to the continued growth and development of an open AI ecosystem for releasing our models responsibly. We have long believed that openness leads to better, safer products, faster innovation, and a healthier overall market. This is good for Meta, and it is good for society. We’re taking a community-first approach with Llama 3, and starting today, these models are available on the leading cloud, hosting, and hardware platforms with many more to come.

Try Meta Llama 3 today

We’ve integrated our latest models into Meta AI, which we believe is the world’s leading AI assistant. It’s now built with Llama 3 technology and it’s available in more countries across our apps.

You can use Meta AI on Facebook, Instagram, WhatsApp, Messenger, and the web to get things done, learn, create, and connect with the things that matter to you. You can read more about the Meta AI experience here.

Visit the Llama 3 website to download the models and reference the Getting Started Guide for the latest list of all available platforms.

You’ll also soon be able to test multimodal Meta AI on our Ray-Ban Meta smart glasses.

As always, we look forward to seeing all the amazing products and experiences you will build with Meta Llama 3.


Meta © 2024


A robot version of a Mary Poppins-like nanny holds an umbrella and an oversized bag in its hands.

Critic’s Notebook

How a Virtual Assistant Taught Me to Appreciate Busywork

A new category of apps promises to relieve parents of drudgery, with an assist from A.I. But a family’s grunt work is more human, and valuable, than it seems.

Credit... Illustration by Cari Vander Yacht


By Amanda Hess

  • April 24, 2024

I recently downloaded a virtual assistant that promised to ease the burdens of modern parenthood. The app is called Yohana, and it offered to handle a pile of tasks on my behalf. It suggested enlisting a professional to wash my windows, scheduling a lesson with a “private sports coach” or planning a “stylish and sustainable” Earth Day party featuring décor, recipes, activities and party favors, none of which interested me. Finally it volunteered to produce a “chef-curated menu” for Passover.

Well, sure. I was already planning on attending a friend’s Seder, and at least this task did not involve Yohana siccing an expert on me or making me host an elaborate event. So, I agreed to the Passover idea. Yohana assigned the “to-do” to a faceless assistant identified only by a first name. The next day, she sent along a confusing list of menu options that included a recipe for ham mini quiches — a provocative choice.

Yohana is one of a growing crew of virtual-assistant apps that combine artificial intelligence and human labor to help parents manage their family lives. For $129 a month, Yohana promises to “offload joy-stealing tasks, improve your family’s well-being, and find more breathing room in your schedule.” Ohai ($26.99 a month), a text-based “A.I. household assistant,” wants to “lighten the mental load of Chief Household Officers,” and Milo ($40/month, with a wait list), an “A.I. co-pilot,” hopes to calm “every form of family chaos.”

These apps are styled like cutesy helpmeets, and their names — Yohana, Ohai, Milo — would be at home on a Brooklyn day care roster. Though pitched to “busy parents,” they implicitly target affluent working mothers who are struggling to manage household tasks on top of work and child care, and who might even be convinced to spend some (though not too much) extra cash to make them go away. But when I gave Yohana a spin, I found that I did not want to do the things she can manage, and that she cannot manage the things I want to do. She made me start to believe that the busywork I might delegate to a machine is actually more human, and valuable, than I realized.

Mothers have long been served fantasies about how robots will relieve the drudgery of housework. In the first episode of the animated sitcom “The Jetsons,” from 1962, Jane Jetson tires of pressing all the buttons that automatically cook and clean for her, so she buys Rosie the robot maid to run her smart house instead. In 1965, General Electric urged housewives to “Let a Mobile Maid Dishwasher give you priceless time for the wife-and-mother jobs that really count.”

And yet automation has failed to eliminate the burdens of those “wife-and-mother jobs.” In a culture that promotes ruthless competition and intensive mothering, a mother’s tasks (the ones that “really count”) are capable of expanding endlessly.

The feminist campaign to demand “wages for housework,” which also captured the maternal imagination in the 1960s and ’70s, represented the flip side of the automation fantasy. As Barbara Ehrenreich documented in her 2000 essay “Maid to Order,” that campaign dissolved as professional women instead opted to pay other women to clean their houses for them, often under lousy conditions. Now a modern wealthy mother can have it all: She can use her phone to command a robot-esque “assistant” to hire a human cleaner on her behalf, without having to actually look anyone in the face.

In their brand copy, these apps speak of lifting loads — “mental loads,” “invisible loads.” They suggest that the central challenge of parenthood is bureaucratic. Families should be “about love, not logistics,” Milo says.

But in a bid to banish bureaucracy, these services add layer upon layer. They suggest we hire more helpers, schedule more activities, plan more events. (An Earth Day party with recyclable décor? No. Private sports coaching? Absolutely not!) When I signed up for Ohai, it texted me every morning, asking if it could add a workout to my schedule.

I don’t need help scheduling more things to do; I need to do less. Often these services suggest that users throw money at that problem (which is not very helpful if one of your problems is that you do not have enough money). The apps transform parents from workers into consumers, translating our to-do lists into shopping lists. Somebody is still performing our “joy-stealing” tasks, and it may be a call center worker or one of the many other invisible laborers who make artificial intelligence systems seem to run automatically.

The boundary between the human and the artificial is slippery; Yohana emphasizes that it employs “actual humans (not A.I. chatbots) that can do the grunt work,” though according to Forbes, those humans are using generative A.I. to assist them with our tasks. When these services style themselves as “worker bees,” “secret helpers” or “fairy godmothers,” they lean on the iconography of fantasy to obscure the grimmer reality of farming out your “grunt work” to an anonymized labor force.

The work that these services hope to eradicate (or at least obscure) is feminized. It’s “women’s work,” and indeed, most of my Yohana helpers had feminine first names. One of the most helpful things a virtual assistant can do is assign family burdens more equitably among its members, a duty commonly demeaned as “nagging.”

Last year, Meghan Verena Joyce, the chief executive of another task delegation service, Duckbill, argued that “with its capabilities for efficiency and customization,” artificial intelligence “could play a crucial role in easing the societal and economic burdens that disproportionately affect women.”

In an illustration on Yohana’s website, a typical user is portrayed as a bespectacled woman who wears a baby in a sling, anchors a square of wrapping paper under a foot, balances a bowl of dog food on a lifted leg, stirs a pot with one hand and types on a computer with the other. She resembles Rosie from the Jetsons, each mechanical limb firing autonomously in order to labor more efficiently. We are familiar with A.I. helpers, like Apple’s Siri, which are modeled after feminine stereotypes, but here it feels as if the opposite is happening: A mother has been recast as a robotic being, her work dismissed as rote and easily outsourced.

In the few weeks that I spent as a virtual-assistant taskmaster, I realized that much of the busywork claimed by the apps is actually quite personal, often rewarding and occasionally transformative.

For instance, when I asked Yohana where I could shop locally for a child’s birthday party, it spat out links to Amazon toys instead. And when I asked if it could find a worker-owned cooperative cleaning service (there are many in New York City), it did not; instead it linked me to the profile of an app, Quicklyn, hosted on another app, Thumbtack. An app can suggest a volunteer opportunity that welcomes children, but it can’t do what my neighbor did, which was add me to the WhatsApp group organizing mutual aid for the nearby migrant shelter. It can direct me to a national database of registered caregivers but not to the teenage babysitter who lives three floors above me.

When I alerted my Yohana assistants to some of these issues — the Passover ham, the Amazon links — they dutifully fixed them, though it’s hard to imagine a worse use of my time than reforming the stranger I’d hired to fix my life through my phone. These services may be able to plug users into corporate-mediated experiences, but no amount of machine learning can simulate neighborhood bonds. “Grunt work” can be central to building community, but only if you do it yourself.

Amanda Hess is a critic at large for the Culture section of The Times, covering the intersection of internet and pop culture.


Voice and Gesture based Virtual Desktop Assistant for Physically Challenged People


COMMENTS

  1. (PDF) VOICE ASSISTANT: A SYSTEMATIC LITERATURE REVIEW

    work is a systematic review of the literature on Voice assistants (VA). Innovative mode of interaction, the Voice Assistant definition is derived from advances in artificial intelligence ...

  2. (PDF) VIRTUAL ASSISTANT USING PYTHON

    VIRTUAL ASSISTANT USING PYTHON. Vivek Vishal Singh, Student, Department of Computer Science & Engineering, Galgotias University, Gautam Buddha Nagar, Greater Noida, Uttar Pradesh, India. Under ...

  3. Voice assistants in private households: a conceptual framework for

    The present study identifies, organizes, and structures the available scientific knowledge on the recent use and the prospects of Voice Assistants (VA) in private households. The systematic review ...

  4. Humanizing voice assistant: The impact of voice assistant personality

    A voice assistant (VA), a type of voice-enabled artificial intelligence, is no longer just a character in science fiction movies. Currently, voice is embedded in a variety of products such as smartphones (mobile applications) and smart speakers in consumers' homes. ... However, little research has taken note of the role voice interaction in a ...

  5. PDF Voice Assistants and Smart Speakers in Everyday Life and in ...

    voice assistants will reach $19 billion by the same year according to Juniper Research (2017). The Alexa platform is the dominant market leader, with more than 70% of all intelligent voice assistant-enabled devices (other than phones), running the Alexa platform (Griswold, 2018). Voice assistants have several interesting capabilities such as:

  6. (PDF) Development of AI-based voice assistants using ...

    1. ABSTRACT. Voice assistants have become an integral part of our daily lives, enabling natural and seamless interactions with technology. Recent advancements in natural language processing (NLP ...

  7. Nova: a voice-controlled virtual assistant for seamless task ...

    This paper presents Nova, an advanced virtual assistant designed to operate through natural language commands, thereby bridging the gap between user intent and system action. Leveraging state-of-the-art techniques in natural language processing, speech recognition, and text-to-speech conversion, Nova empowers users with the ability to execute a ...

  8. AI-Based Virtual Assistant Using Python: A Systematic Review

    Vishal Kumar Dhanraj, Lokesh kriplani, Semal Mahajan, "Research Paper on Desktop Voice Assistant" International Journal of Research in Engineering and Science, Volume 10 Issue 2, February 2022.

  9. A Systematic Review of Voice Assistant Usability: An ISO 9241-11

    Interactive design is the most effective experiment that provides real-time results employed [ 56, 79 ]. The ISO 9241-11 framework works well with effectiveness, efficiency, and satisfaction; however, the users have more expectations from the voice assistant with the recent advancement of VA capabilities.

  10. Voice-Based Intelligent Virtual Assistant

    A smart virtual assistant is a digital life aid designed to make the user's life easier. It is a highly developed programme with a robust voice recognition engine that focuses on processing an audio input, turning it to text, and completing the task. The majority of virtual assistants use speech as their primary mode of contact. It focuses ...

  11. Voice-Based Virtual Assistants for User Interaction Modeling

    In this paper we propose an approach and implementation of a voice-based virtual assistant for model-driven development of user interactions and user interfaces. While our solution is general and independent from the modeling language, to demonstrate the feasibility and advantages of the approach, the paper describes an implementation upon ...

  12. An Intelligent Virtual System using Machine Learning

    This paper is based on a voice based virtual assistant. It is a tool in AI that allows us to fulfil different purposes just by giving voice commands. The voice assistant we have developed is a desktop-based built using python modules and libraries which further stretches its reach to Machine Learning models and Deep Learning. In this process, we developed a system which recognizes the voice ...

  13. Voice Assistant Using Artificial Intelligence

    The paper also introduces the application of virtual assistants that can help open up opportunities for humanity in various fields. Voice control is an important growing feature that will change people's lives. Voice assistants are available for laptops, desktops and mobile phones. Assistant is now available on all electronic devices.

  14. A Systematic Review of Voice Assistant Usability: An ISO 9241-11

    Voice assistants (VA) are an emerging technology that have become an essential tool of the twenty-first century. The VA ease of access and use has resulted in high usability curiosity in voice assistants. Usability is an essential aspect of any emerging technology, with every technology having a standardized usability measure. Despite the high acceptance rate on the use of VA, to the best of ...

  15. Artificial Intelligence-based Voice Assistant

    Voice control is a major growing feature that change the way people can live. The voice assistant is commonly being used in smartphones and laptops. AI-based Voice assistants are the operating systems that can recognize human voice and respond via integrated voices. This voice assistant will gather the audio from the microphone and then convert that into text, later it is sent through GTTS ...

  16. VOICE ASSISTANT USING PYTHON

    The primary goal of Artificial Intelligence is to make human interaction with computers and other electronic devices considerably easier and more practical. Personal assistant that can carry out activities for everyday needs with just a simple phrase is a rapidly increasing field nowadays. Many firms have leveraged dialogue systems technology to create Virtual Personal Assistants (VPAs) based ...

  17. PDF JARVIS

    JARVIS, a virtual embedded voice assistant that includes cutting-edge technology based on gTTS and Python in developing a personalized assistant. JARVIS integrates the functionality of AIML and, together with Google, the industry leader, a text-to-speech ... continuous in research papers since 2000, except the year 2010 (Figure 2.2). So this ...

  18. Next-generation of virtual personal assistants (Microsoft Cortana

    Many companies have used the dialogue systems technology to establish various kinds of Virtual Personal Assistants (VPAs) based on their applications and areas, such as Microsoft's Cortana, Apple's Siri, Amazon Alexa, Google Assistant, and Facebook's M. However, in this proposal, we have used the multi-modal dialogue systems which process two or ...

  19. PDF Personal Desktop Voice Assistant

    to provide technical information about virtual assistant technology, including its advantages and disadvantages in many contexts. The project focuses on virtual assistant types and structural elements of a virtual assistant system. This research paper explores the development and application of Personal Desktop Voice Assistants in various domains.

  20. Artificial Intelligence and Virtual Assistant—Working Model

    In this research paper, we have studied and analyzed the working model and the efficiency of different virtual assistants available in the market. ... To perform an action on virtual assistant via voice communication interface on different objects, some of the approaches work upon computers, tablets while some upon smart devices like ...

  21. (PDF) An Artificial Intelligence Based Virtual Assistant Using

    In this paper, we propose a smart virtual assistant using ... adding voice command capability to our ... In sectors like education and research, AI chatbots can act as virtual assistants, helping ...

  22. Introducing Meta Llama 3: The most capable openly available LLM to date

    Today, we're introducing Meta Llama 3, the next generation of our state-of-the-art open source large language model. Llama 3 models will soon be available on AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, NVIDIA NIM, and Snowflake, and with support from hardware platforms offered by AMD, AWS, Dell, Intel ...

  23. How a Virtual Assistant Taught Me to Appreciate Busywork

    For $129 a month, Yohana promises to "offload joy-stealing tasks, improve your family's well-being, and find more breathing room in your schedule.". Ohai ($26.99 a month), a text-based "A ...

  24. Alexa, Siri, Cortana, and More: An Introduction to Voice Assistants

    Voice assistants are software agents that can interpret human speech and respond via synthesized voices. Apple's Siri, Amazon's Alexa, Microsoft's Cortana, and Google's Assistant are the ...

  25. Voice and Gesture based Virtual Desktop Assistant for Physically

    This research work attempts to propose a voice and gesture based virtual assistant that can be used by disabled as well as non disabled persons to perform common tasks on their computers. The main aim of this research paper is to develop natural human-machine interaction. Input to the virtual assistant is the user's choice. Users can choose to ...

  26. (PDF) VOICE BASED VIRTUAL ASSISTANT

    Abstract: The goal of this project is to create a voice-based smart virtual assistant that makes use of cutting-edge machine learning, speech recognition, and other technologies. Through ...