How many participants do I need for qualitative research?

6 min read David Renwick

For those new to the qualitative research space, there’s one question that’s usually pretty tough to figure out, and that’s the question of how many participants to include in a study. Regardless of whether it’s research as part of the discovery phase for a new product, or perhaps an in-depth canvas of the users of an existing service, researchers can often find it difficult to agree on the numbers. So is there an easy answer? Let’s find out.

Here, we’ll look into the right number of participants for qualitative research studies. If you want to know about participants for quantitative research, read Nielsen Norman Group’s article.

Getting the numbers right

So you need to run a series of user interviews or usability tests and aren’t sure exactly how many people you should reach out to. It can be a tricky situation – especially for those without much experience. Do you test a small selection of 1 or 2 people to make the recruitment process easier? Or, do you go big and test with a series of 10 people over the course of a month? The answer lies somewhere in between.

It’s often a good idea (for qualitative research methods like interviews and usability tests) to start with 5 participants and then scale up by a further 5 based on how complicated the subject matter is. You may also find it helpful to recruit extra participants if you’re new to user research or you’re working in a new area.

What you’re actually looking for here is what’s known as saturation.

Understanding saturation

Whether it’s qualitative research as part of a master’s thesis or as research for a new online dating app, saturation is the best metric you can use to identify when you’ve hit the right number of participants.

In a nutshell, saturation is when you’ve reached the point where adding further participants doesn’t give you any further insights. It’s true that you may still pick up on the occasional interesting detail, but all of your big revelations and learnings have come and gone. A good measure is to sit down after each session with a participant and analyze the number of new insights you’ve noted down.

Interestingly, in a paper titled “How Many Interviews Are Enough?”, authors Greg Guest, Arwen Bunce and Laura Johnson noted that saturation usually occurs with around 12 participants in homogeneous groups (meaning people in the same role at an organization, for example). However, carrying out ethnographic research on a larger domain with a diverse set of participants will almost certainly require a larger sample.

Ensuring you’ve hit the right number of participants

How do you know when you’ve reached saturation point? You have to keep conducting interviews or usability tests until you’re no longer uncovering new insights or concepts.
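If you keep notes for each session, a rough tally can make this easier to spot. The sketch below is only an illustration: the session notes, the function name `new_insights_per_session` and the idea of recording each insight as a short label are all assumptions made for the example, not part of any particular tool.

```python
def new_insights_per_session(sessions):
    """Return how many previously unseen insights each session added."""
    seen = set()
    counts = []
    for insights in sessions:
        fresh = set(insights) - seen   # insights not noted in any earlier session
        counts.append(len(fresh))
        seen |= fresh
    return counts


# Hypothetical notes from five sessions (labels invented for illustration).
sessions = [
    ["confusing menu", "slow search", "likes filters"],
    ["slow search", "hidden checkout button"],
    ["confusing menu", "hidden checkout button", "wants saved carts"],
    ["slow search", "likes filters"],
    ["confusing menu", "wants saved carts"],
]

counts = new_insights_per_session(sessions)
print(counts)  # [3, 1, 1, 0, 0]
```

A run of zeros at the end suggests you may be getting close to saturation, although it’s still worth weighing how significant each insight is rather than relying on counts alone.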

While this may seem to run counter to the idea of just gathering as much data from as many people as possible, there’s a strong case for focusing on a smaller group of participants. In The logic of small samples in interview-based, authors Mira Crouch and Heather McKenzie note that using fewer than 20 participants during a qualitative research study will result in better data. Why? With a smaller group, it’s easier for you (the researcher) to build strong, close relationships with your participants, which in turn leads to more natural conversations and better data.

There’s also a school of thought that you should interview 5 or so people per persona. For example, if you’re working in a company that has well-defined personas, you might want to use those as a basis for your study, and then you would interview 5 people based on each persona. This may be worth considering, and is particularly important when you have a product that has very distinct user groups (e.g. students and staff, teachers and parents, etc.).

How your domain affects sample size

The scope of the topic you’re researching will change the amount of information you’ll need to gather before you’ve hit the saturation point. Your topic is also commonly referred to as the domain.

If you’re working in quite a confined domain, for example, a single screen of a mobile app or a very specific scenario, you’ll likely find interviews with 5 participants to be perfectly fine. Moving into more complicated domains, like the entire checkout process for an online shopping app, will push up your sample size.

As Mitchel Seaman notes: “Exploring a big issue like young peoples’ opinions about healthcare coverage, a broad emotional issue like postmarital sexuality, or a poorly-understood domain for your team like mobile device use in another country can drastically increase the number of interviews you’ll want to conduct.”

In-person or remote

Does the location of your participants change the number you need for qualitative user research? Well, not really – but there are other factors to consider.

  • Budget: If you choose to conduct remote interviews/usability tests, you’ll likely find you’ve got lower costs as you won’t need to travel to your participants or have them travel to you. This also affects…
  • Participant access: Remote qualitative research can be a lifesaver when it comes to participant access. No longer are you confined to the people you have physical access to — instead you can reach out to anyone you’d like.
  • Quality: On the other hand, remote research does have its downsides. For one, you’ll likely find you’re not able to build the same kinds of relationships over the internet or phone as those in person, which in turn can mean you don’t get quite the same level of insight.

Is there value in outsourcing recruitment?

Recruitment is understandably an intensive logistical exercise with many moving parts. If you’ve ever had to recruit people for a study before, you’ll understand the need for long lead times (to ensure you have enough participants for the project) and the countless long email chains as you discuss suitable times.

Outsourcing your participant recruitment is just one way to lighten the logistical load during your research. Instead of having to go out and look for participants, you have them essentially delivered to you in the right number and with the right attributes.

We’ve got one such service at Optimal Workshop, which means it’s the perfect accompaniment if you’re also using our platform of UX tools. Read more about that here.

So that’s really most of what there is to know about participant recruitment in a qualitative research context. As we said at the start, while it can appear quite tricky to figure out exactly how many people you need to recruit, it’s actually not all that difficult in reality.

Overall, the number of participants you need for your qualitative research can depend on your project among other factors. It’s important to keep saturation in mind, as well as the locale of participants. You also need to get the most you can out of what’s available to you. Remember: Some research is better than none!

Published on August 8, 2019

David Renwick

David is Optimal Workshop's Content Strategist and Editor of CRUX. You can usually find him alongside one of the office dogs 🐕 (Bella, Bowie, Frida, Tana or Steezy). Connect with him on LinkedIn.

  • Research article
  • Open access
  • Published: 21 November 2018

Characterising and justifying sample size sufficiency in interview-based studies: systematic analysis of qualitative health research over a 15-year period

  • Konstantina Vasileiou (ORCID: orcid.org/0000-0001-5047-3920)
  • Julie Barnett
  • Susan Thorpe
  • Terry Young

BMC Medical Research Methodology, volume 18, Article number: 148 (2018)

Choosing a suitable sample size in qualitative research is an area of conceptual debate and practical uncertainty. That sample size principles, guidelines and tools have been developed to enable researchers to set, and justify the acceptability of, their sample size is an indication that the issue constitutes an important marker of the quality of qualitative research. Nevertheless, research shows that sample size sufficiency reporting is often poor, if not absent, across a range of disciplinary fields.

A systematic analysis of single-interview-per-participant designs within three health-related journals from the disciplines of psychology, sociology and medicine, over a 15-year period, was conducted to examine whether and how sample sizes were justified and how sample size was characterised and discussed by authors. Data pertinent to sample size were extracted and analysed using qualitative and quantitative analytic techniques.

Our findings demonstrate that provision of sample size justifications in qualitative health research is limited; is not contingent on the number of interviews; and relates to the journal of publication. Defence of sample size was most frequently supported across all three journals with reference to the principle of saturation and to pragmatic considerations. Qualitative sample sizes were predominantly – and often without justification – characterised as insufficient (i.e., ‘small’) and discussed in the context of study limitations. Sample size insufficiency was seen to threaten the validity and generalizability of studies’ results, with the latter being frequently conceived in nomothetic terms.

Conclusions

We recommend, firstly, that qualitative health researchers be more transparent about evaluations of their sample size sufficiency, situating these within broader and more encompassing assessments of data adequacy. Secondly, we invite researchers critically to consider how saturation parameters found in prior methodological studies and sample size community norms might best inform, and apply to, their own project and encourage that data adequacy is best appraised with reference to features that are intrinsic to the study at hand. Finally, those reviewing papers have a vital role in supporting and encouraging transparent study-specific reporting.

Sample adequacy in qualitative inquiry pertains to the appropriateness of the sample composition and size. It is an important consideration in evaluations of the quality and trustworthiness of much qualitative research [ 1 ] and is implicated – particularly for research that is situated within a post-positivist tradition and retains a degree of commitment to realist ontological premises – in appraisals of validity and generalizability [ 2 , 3 , 4 , 5 ].

Samples in qualitative research tend to be small in order to support the depth of case-oriented analysis that is fundamental to this mode of inquiry [ 5 ]. Additionally, qualitative samples are purposive, that is, selected by virtue of their capacity to provide richly-textured information, relevant to the phenomenon under investigation. As a result, purposive sampling [ 6 , 7 ] – as opposed to probability sampling employed in quantitative research – selects ‘information-rich’ cases [ 8 ]. Indeed, recent research demonstrates the greater efficiency of purposive sampling compared to random sampling in qualitative studies [ 9 ], supporting related assertions long put forward by qualitative methodologists.

Sample size in qualitative research has been the subject of enduring discussions [ 4 , 10 , 11 ]. Whilst the quantitative research community has established relatively straightforward statistics-based rules to set sample sizes precisely, the intricacies of qualitative sample size determination and assessment arise from the methodological, theoretical, epistemological, and ideological pluralism that characterises qualitative inquiry (for a discussion focused on the discipline of psychology see [ 12 ]). This militates against clear-cut guidelines, invariably applied. Despite these challenges, various conceptual developments have sought to address this issue, with guidance and principles [ 4 , 10 , 11 , 13 , 14 , 15 , 16 , 17 , 18 , 19 , 20 ], and more recently, an evidence-based approach to sample size determination seeks to ground the discussion empirically [ 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 ].

Focusing on single-interview-per-participant qualitative designs, the present study aims to further contribute to the dialogue of sample size in qualitative research by offering empirical evidence around justification practices associated with sample size. We next review the existing conceptual and empirical literature on sample size determination.

Sample size in qualitative research: Conceptual developments and empirical investigations

Qualitative research experts argue that there is no straightforward answer to the question of ‘how many’ and that sample size is contingent on a number of factors relating to epistemological, methodological and practical issues [ 36 ]. Sandelowski [ 4 ] recommends that qualitative sample sizes are large enough to allow the unfolding of a ‘new and richly textured understanding’ of the phenomenon under study, but small enough so that the ‘deep, case-oriented analysis’ (p. 183) of qualitative data is not precluded. Morse [ 11 ] posits that the more useable data are collected from each person, the fewer participants are needed. She invites researchers to take into account parameters, such as the scope of study, the nature of topic (i.e. complexity, accessibility), the quality of data, and the study design. Indeed, the level of structure of questions in qualitative interviewing has been found to influence the richness of data generated [ 37 ], and so, requires attention; empirical research shows that open questions, which are asked later on in the interview, tend to produce richer data [ 37 ].

Beyond such guidance, specific numerical recommendations have also been proffered, often based on experts’ experience of qualitative research. For example, Green and Thorogood [ 38 ] maintain that the experience of most qualitative researchers conducting an interview-based study with a fairly specific research question is that little new information is generated after interviewing 20 people or so belonging to one analytically relevant participant ‘category’ (pp. 102–104). Ritchie et al. [ 39 ] suggest that studies employing individual interviews conduct no more than 50 interviews so that researchers are able to manage the complexity of the analytic task. Similarly, Britten [ 40 ] notes that large interview studies will often comprise 50 to 60 people. Experts have also offered numerical guidelines tailored to different theoretical and methodological traditions and specific research approaches, e.g. grounded theory, phenomenology [ 11 , 41 ]. More recently, a quantitative tool was proposed [ 42 ] to support a priori sample size determination based on estimates of the prevalence of themes in the population. Nevertheless, this more formulaic approach raised criticisms relating to assumptions about the conceptual [ 43 ] and ontological status of ‘themes’ [ 44 ] and the linearity ascribed to the processes of sampling, data collection and data analysis [ 45 ].
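To make the idea of prevalence-based a priori estimation concrete, the following is a minimal sketch under a simple binomial assumption: each participant is treated as an independent random draw who expresses a theme with a fixed probability. This is an illustrative model only, not necessarily the tool cited in [ 42 ]; the function name `min_participants` and the example prevalence values are hypothetical.

```python
import math


def min_participants(prevalence, confidence=0.95):
    """Smallest n with 1 - (1 - prevalence)**n >= confidence.

    Assumes independent random draws in which each participant expresses
    the theme with probability `prevalence` (an illustrative assumption).
    """
    if not 0 < prevalence < 1 or not 0 < confidence < 1:
        raise ValueError("prevalence and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - prevalence))


print(min_participants(0.10))  # 29
print(min_participants(0.30))  # 9
```

The independence and random-draw assumptions are exactly the kind of simplification that the criticisms above take issue with, not least because qualitative samples are typically purposive rather than random.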

In terms of principles, Lincoln and Guba [ 17 ] proposed that sample size determination be guided by the criterion of informational redundancy, that is, sampling can be terminated when no new information is elicited by sampling more units. Following the logic of informational comprehensiveness, Malterud et al. [ 18 ] introduced the concept of information power as a pragmatic guiding principle, suggesting that the more information power the sample provides, the smaller the sample size needs to be, and vice versa.

Undoubtedly, the most widely used principle for determining sample size and evaluating its sufficiency is that of saturation. The notion of saturation originates in grounded theory [ 15 ] – a qualitative methodological approach explicitly concerned with empirically-derived theory development – and is inextricably linked to theoretical sampling. Theoretical sampling describes an iterative process of data collection, data analysis and theory development whereby data collection is governed by emerging theory rather than predefined characteristics of the population. Grounded theory saturation (often called theoretical saturation) concerns the theoretical categories – as opposed to data – that are being developed and becomes evident when ‘gathering fresh data no longer sparks new theoretical insights, nor reveals new properties of your core theoretical categories’ [ 46 p. 113]. Saturation in grounded theory, therefore, does not equate to the more common focus on data repetition and moves beyond a singular focus on sample size as the justification of sampling adequacy [ 46 , 47 ]. Sample size in grounded theory cannot be determined a priori as it is contingent on the evolving theoretical categories.

Saturation – often under the terms of ‘data’ or ‘thematic’ saturation – has diffused into several qualitative communities beyond its origins in grounded theory. Alongside the expansion of its meaning, being variously equated with ‘no new data’, ‘no new themes’, and ‘no new codes’, saturation has emerged as the ‘gold standard’ in qualitative inquiry [ 2 , 26 ]. Nevertheless, and as Morse [ 48 ] asserts, whilst saturation is the most frequently invoked ‘guarantee of qualitative rigor’, ‘it is the one we know least about’ (p. 587). Certainly researchers caution that saturation is less applicable to, or appropriate for, particular types of qualitative research (e.g. conversation analysis, [ 49 ]; phenomenological research, [ 50 ]) whilst others reject the concept altogether [ 19 , 51 ].

Methodological studies in this area aim to provide guidance about saturation and develop a practical application of processes that ‘operationalise’ and evidence saturation. Guest, Bunce, and Johnson [ 26 ] analysed 60 interviews and found that saturation of themes was reached by the twelfth interview. They noted that their sample was relatively homogeneous, their research aims focused, so studies of more heterogeneous samples and with a broader scope would be likely to need a larger size to achieve saturation. Extending the enquiry to multi-site, cross-cultural research, Hagaman and Wutich [ 28 ] showed that sample sizes of 20 to 40 interviews were required to achieve data saturation of meta-themes that cut across research sites. In a theory-driven content analysis, Francis et al. [ 25 ] reached data saturation at the 17th interview for all their pre-determined theoretical constructs. The authors further proposed two main principles upon which specification of saturation be based: (a) researchers should a priori specify an initial analysis sample (e.g. 10 interviews) which will be used for the first round of analysis and (b) a stopping criterion, that is, a number of interviews (e.g. 3) that needs to be further conducted, the analysis of which will not yield any new themes or ideas. For greater transparency, Francis et al. [ 25 ] recommend that researchers present cumulative frequency graphs supporting their judgment that saturation was achieved. A comparative method for themes saturation (CoMeTS) has also been suggested [ 23 ] whereby the findings of each new interview are compared with those that have already emerged and if it does not yield any new theme, the ‘saturated terrain’ is assumed to have been established. Because the order in which interviews are analysed can influence saturation thresholds depending on the richness of the data, Constantinou et al. [ 23 ] recommend reordering and re-analysing interviews to confirm saturation.

Hennink, Kaiser and Marconi’s [ 29 ] methodological study sheds further light on the problem of specifying and demonstrating saturation. Their analysis of interview data showed that code saturation (i.e. the point at which no additional issues are identified) was achieved at 9 interviews, but meaning saturation (i.e. the point at which no further dimensions, nuances, or insights of issues are identified) required 16–24 interviews. Although breadth can be achieved relatively soon, especially for high-prevalence and concrete codes, depth requires additional data, especially for codes of a more conceptual nature.
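The initial-analysis-sample and stopping-criterion principles attributed to Francis et al. [ 25 ] can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `saturation_point`, the representation of each interview as a set of theme labels, and the example data are hypothetical and are not drawn from the cited studies; the defaults of 10 and 3 simply mirror the example values quoted in the text.

```python
def saturation_point(interviews, initial_sample=10, stopping_criterion=3):
    """Return the 1-based position of the interview at which the stopping
    criterion is first met, or None if it is never met.

    interviews: sequence of sets of theme labels, in order of analysis.
    The criterion is met once `stopping_criterion` consecutive interviews
    beyond the initial analysis sample contribute no new themes.
    """
    seen = set()
    for themes in interviews[:initial_sample]:
        seen |= set(themes)

    run = 0  # consecutive interviews with no new themes
    for position, themes in enumerate(interviews[initial_sample:],
                                      start=initial_sample + 1):
        new_themes = set(themes) - seen
        if new_themes:
            seen |= new_themes
            run = 0
        else:
            run += 1
            if run == stopping_criterion:
                return position
    return None


# Hypothetical theme codes for eight interviews.
example = [{"a", "b"}, {"b", "c"}, {"a"}, {"d"},
           {"a", "c"}, {"b"}, {"d"}, {"a"}]
print(saturation_point(example, initial_sample=3, stopping_criterion=3))  # 7
```

Because the result depends on the order of analysis, re-running such a check on reordered interviews, in the spirit of the recommendation by Constantinou et al. [ 23 ], would be a natural complement. A check of this kind tracks only the appearance of new themes, so it corresponds more closely to code saturation than to meaning saturation.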

Critiquing the concept of saturation, Nelson [ 19 ] proposes five conceptual depth criteria in grounded theory projects to assess the robustness of the developing theory: (a) theoretical concepts should be supported by a wide range of evidence drawn from the data; (b) be demonstrably part of a network of inter-connected concepts; (c) demonstrate subtlety; (d) resonate with existing literature; and (e) can be successfully submitted to tests of external validity.

Other work has sought to examine practices of sample size reporting and sufficiency assessment across a range of disciplinary fields and research domains, from nutrition [ 34 ] and health education [ 32 ], to education and the health sciences [ 22 , 27 ], information systems [ 30 ], organisation and workplace studies [ 33 ], human computer interaction [ 21 ], and accounting studies [ 24 ]. Others investigated PhD qualitative studies [ 31 ] and grounded theory studies [ 35 ]. Incomplete and imprecise sample size reporting is commonly pinpointed by these investigations whilst assessment and justifications of sample size sufficiency are even more sporadic.

Sobal [ 34 ] examined the sample size of qualitative studies published in the Journal of Nutrition Education over a period of 30 years. Studies that employed individual interviews ( n  = 30) had an average sample size of 45 individuals and none of these explicitly reported whether their sample size sought and/or attained saturation. A minority of articles discussed how sample-related limitations (with the latter most often concerning the type of sample, rather than the size) limited generalizability. A further systematic analysis [ 32 ] of health education research over 20 years demonstrated that interview-based studies averaged 104 participants (range 2 to 720 interviewees). However, 40% did not report the number of participants. An examination of 83 qualitative interview studies in leading information systems journals [ 30 ] indicated little defence of sample sizes on the basis of recommendations by qualitative methodologists, prior relevant work, or the criterion of saturation. Rather, sample size seemed to correlate with factors such as the journal of publication or the region of study (US vs Europe vs Asia). These results led the authors to call for more rigor in determining and reporting sample size in qualitative information systems research and to recommend optimal sample size ranges for grounded theory (i.e. 20–30 interviews) and single case (i.e. 15–30 interviews) projects.

Similarly, fewer than 10% of articles in organisation and workplace studies provided a sample size justification relating to existing recommendations by methodologists, prior relevant work, or saturation [ 33 ], whilst only 17% of focus group studies in health-related journals provided an explanation of sample size (i.e. number of focus groups), with saturation being the most frequently invoked argument, followed by published sample size recommendations and practical reasons [ 22 ]. The notion of saturation was also invoked by 11 out of the 51 most highly cited studies that Guetterman [ 27 ] reviewed in the fields of education and health sciences, of which six were grounded theory studies, four phenomenological and one a narrative inquiry. Finally, analysing 641 interview-based articles in accounting, Dai et al. [ 24 ] called for more rigor since a significant minority of studies did not report precise sample size.

Despite increasing attention to rigor in qualitative research (e.g. [ 52 ]) and more extensive methodological and analytical disclosures that seek to validate qualitative work [ 24 ], sample size reporting and sufficiency assessment remain inconsistent and partial, if not absent, across a range of research domains.

Objectives of the present study

The present study sought to enrich existing systematic analyses of the customs and practices of sample size reporting and justification by focusing on qualitative research relating to health. Additionally, this study attempted to expand previous empirical investigations by examining how qualitative sample sizes are characterised and discussed in academic narratives. Qualitative health research is an inter-disciplinary field that, due to its affiliation with medical sciences, often faces views and positions reflective of a quantitative ethos. Thus qualitative health research constitutes an emblematic case that may help to unfold underlying philosophical and methodological differences across the scientific community that are crystallised in considerations of sample size. The present research, therefore, incorporates a comparative element on the basis of three different disciplines engaging with qualitative health research: medicine, psychology, and sociology. We chose to focus our analysis on single-interview-per-participant designs as this not only presents a popular and widespread methodological choice in qualitative health research, but also as the method where consideration of sample size – defined as the number of interviewees – is particularly salient.

Study design

A structured search for articles reporting cross-sectional, interview-based qualitative studies was carried out and eligible reports were systematically reviewed and analysed employing both quantitative and qualitative analytic techniques.

We selected journals which (a) follow a peer review process, (b) are considered high quality and influential in their field as reflected in journal metrics, and (c) are receptive to, and publish, qualitative research (Additional File  1 presents the journals’ editorial positions in relation to qualitative research and sample considerations where available). Three health-related journals were chosen, each representing a different disciplinary field; the British Medical Journal (BMJ) representing medicine, the British Journal of Health Psychology (BJHP) representing psychology, and the Sociology of Health & Illness (SHI) representing sociology.

Search strategy to identify studies

Employing the search function of each individual journal, we used the terms ‘interview*’ AND ‘qualitative’ and limited the results to articles published between 1 January 2003 and 22 September 2017 (i.e. a 15-year review period).

Eligibility criteria

To be eligible for inclusion in the review, the article had to report a cross-sectional study design. Longitudinal studies were thus excluded whilst studies conducted within a broader research programme (e.g. interview studies nested in a trial, as part of a broader ethnography, as part of longitudinal research) were included if they reported only single-time qualitative interviews. The method of data collection had to be individual, synchronous qualitative interviews (i.e. group interviews, structured interviews and e-mail interviews over a period of time were excluded), and the data had to be analysed qualitatively (i.e. studies that quantified their qualitative data were excluded). Mixed method studies and articles reporting more than one qualitative method of data collection (e.g. individual interviews and focus groups) were excluded. Figure 1, a PRISMA flow diagram [ 53 ], shows the number of: articles obtained from the searches and screened; papers assessed for eligibility; and articles included in the review (Additional File 2 provides the full list of articles included in the review and their unique identifying code – e.g. BMJ01, BJHP02, SHI03). One review author (KV) assessed the eligibility of all papers identified from the searches. When in doubt, discussions about retaining or excluding articles were held between KV and JB in regular meetings, and decisions were jointly made.
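The eligibility criteria can be restated as a simple predicate. The sketch below is purely illustrative: screening in the study was a matter of reviewer judgement, and the `Article` fields and the `is_eligible` function are hypothetical names introduced here to summarise the criteria, with the review window taken from the search strategy described above.

```python
from dataclasses import dataclass
from datetime import date

# Review window stated in the search strategy.
REVIEW_START = date(2003, 1, 1)
REVIEW_END = date(2017, 9, 22)


@dataclass
class Article:
    """Hypothetical screening record; field names are assumptions."""
    published: date
    cross_sectional: bool            # reports only single-time interviews
    individual_synchronous: bool     # not group, structured or e-mail interviews
    analysed_qualitatively: bool     # qualitative data were not quantified
    single_qualitative_method: bool  # not mixed-method or multi-method


def is_eligible(article: Article) -> bool:
    """Apply all stated inclusion criteria conjunctively."""
    return (
        REVIEW_START <= article.published <= REVIEW_END
        and article.cross_sectional
        and article.individual_synchronous
        and article.analysed_qualitatively
        and article.single_qualitative_method
    )


candidate = Article(date(2010, 5, 1), True, True, True, True)
print(is_eligible(candidate))  # True
```

Encoding each criterion as a separate boolean keeps the reason for any exclusion traceable, which fits the reporting logic of a flow diagram.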

Figure 1. PRISMA flow diagram

Data extraction and analysis

A data extraction form was developed (see Additional File  3 ) recording three areas of information: (a) information about the article (e.g. authors, title, journal, year of publication etc.); (b) information about the aims of the study, the sample size and any justification for this, the participant characteristics, the sampling technique and any sample-related observations or comments made by the authors; and (c) information about the method or technique(s) of data analysis, the number of researchers involved in the analysis, the potential use of software, and any discussion around epistemological considerations. The Abstract, Methods and Discussion (and/or Conclusion) sections of each article were examined by one author (KV) who extracted all the relevant information. This was directly copied from the articles and, when appropriate, comments, notes and initial thoughts were written down.

To examine the kinds of sample size justifications provided by articles, an inductive content analysis [ 54 ] was initially conducted. On the basis of this analysis, the categories that expressed qualitatively different sample size justifications were developed.

We also extracted or coded quantitative data regarding the following aspects:

  • Journal and year of publication
  • Number of interviews
  • Number of participants
  • Presence of sample size justification(s) (Yes/No)
  • Presence of a particular sample size justification category (Yes/No), and
  • Number of sample size justifications provided

Descriptive and inferential statistical analyses were used to explore these data.
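One way to picture the coded quantitative variables is as a per-article record from which the Yes/No and count variables are derived. This is a hypothetical sketch, not the authors' extraction form: the class name `ExtractionRecord`, its field names and the example values are assumptions, and it further assumes that each justification category is counted once per article.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExtractionRecord:
    """Hypothetical record of the quantitative variables listed above."""
    journal: str                 # e.g. "BMJ", "BJHP" or "SHI"
    year: int
    n_interviews: int
    n_participants: int
    justification_categories: List[str] = field(default_factory=list)

    @property
    def has_justification(self) -> bool:
        # Presence of any sample size justification (Yes/No).
        return len(self.justification_categories) > 0

    def has_category(self, category: str) -> bool:
        # Presence of a particular justification category (Yes/No).
        return category in self.justification_categories

    @property
    def n_justifications(self) -> int:
        # Number of sample size justifications provided.
        return len(self.justification_categories)


# Invented example values for illustration only.
record = ExtractionRecord("BJHP", 2012, 15, 15,
                          ["saturation", "pragmatic considerations"])
print(record.has_justification, record.n_justifications)  # True 2
```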

A thematic analysis [ 55 ] was then performed on all scientific narratives that discussed or commented on the sample size of the study. These narratives were evident both in papers that justified their sample size and those that did not. To identify these narratives, in addition to the methods sections, the discussion sections of the reviewed articles were also examined and relevant data were extracted and analysed.

In total, 214 articles – 21 in the BMJ, 53 in the BJHP and 140 in the SHI – were eligible for inclusion in the review. Table  1 provides basic information about the sample sizes – measured in number of interviews – of the studies reviewed across the three journals. Figure  2 depicts the number of eligible articles published each year per journal.

Figure 2. Number of eligible articles published each year per journal

The publication of qualitative studies in the BMJ was significantly reduced from 2012 onwards and this appears to coincide with the initiation of the BMJ Open to which qualitative studies were possibly directed.

Pairwise comparisons following a significant Kruskal-Wallis test indicated that the studies published in the BJHP had significantly (p < .001) smaller sample sizes than those published either in the BMJ or the SHI. Sample sizes of BMJ and SHI articles did not differ significantly from each other.

Sample size justifications: Results from the quantitative and qualitative content analysis

Ten (47.6%) of the 21 BMJ studies, 26 (49.1%) of the 53 BJHP papers and 24 (17.1%) of the 140 SHI articles provided some sort of sample size justification. As shown in Table  2 , the majority of articles which justified their sample size provided one justification (70% of articles); fourteen studies (25%) provided two distinct justifications; one study (1.7%) gave three justifications and two studies (3.3%) expressed four distinct justifications.

There was no association between the number of interviews (i.e. sample size) conducted and the provision of a justification (r_pb = .054, p = .433). Within journals, Mann-Whitney tests indicated that sample sizes of ‘justifying’ and ‘non-justifying’ articles in the BMJ and SHI did not differ significantly from each other. In the BJHP, ‘justifying’ articles (Mean rank = 31.3) had significantly larger sample sizes than ‘non-justifying’ studies (Mean rank = 22.7; U = 237.000, p < .05).

There was a significant association between the journal a paper was published in and the provision of a justification (χ²(2) = 23.83, p < .001). BJHP studies provided a sample size justification significantly more often than would be expected (z = 2.9); SHI studies significantly less often (z = −2.4). If an article was published in the BJHP, the odds of providing a justification were 4.8 times higher than if published in the SHI. Similarly, if published in the BMJ, the odds of a study justifying its sample size were 4.5 times higher than in the SHI.
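The reported test statistic can be checked against the counts given earlier in this section. The following is a minimal sketch in Python, assuming the 2 × 3 contingency table implied by the reported figures (10 of 21 BMJ, 26 of 53 BJHP and 24 of 140 SHI articles providing a justification). The function and variable names are illustrative and are not taken from the study. It computes Pearson's chi-square statistic and the standardised residuals for the ‘justified’ cells.

```python
from math import sqrt

# Counts reconstructed from the text: (justified, total) per journal.
counts = {"BMJ": (10, 21), "BJHP": (26, 53), "SHI": (24, 140)}


def chi_square_and_residuals(counts):
    """Return Pearson's chi-square and the standardised residuals
    of the 'justified' cell for each journal."""
    n_total = sum(total for _, total in counts.values())
    yes_total = sum(yes for yes, _ in counts.values())
    no_total = n_total - yes_total

    chi2 = 0.0
    residuals = {}
    for journal, (yes, total) in counts.items():
        no = total - yes
        # Expected counts under independence of journal and justification.
        exp_yes = total * yes_total / n_total
        exp_no = total * no_total / n_total
        chi2 += (yes - exp_yes) ** 2 / exp_yes + (no - exp_no) ** 2 / exp_no
        residuals[journal] = (yes - exp_yes) / sqrt(exp_yes)
    return chi2, residuals


chi2, residuals = chi_square_and_residuals(counts)
print(f"chi-square(2) = {chi2:.2f}")
for journal, z in residuals.items():
    print(f"{journal}: z = {z:.1f}")
```

Under these assumptions the statistic comes out at about 23.83, with standardised residuals of roughly 2.9 for the BJHP and −2.4 for the SHI, in agreement with the values reported above.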

The qualitative content analysis of the scientific narratives identified eleven different sample size justifications. These are described below and illustrated with excerpts from relevant articles. By way of a summary, the frequency with which these were deployed across the three journals is indicated in Table  3 .

Saturation

Saturation was the most commonly invoked principle (55.4% of all justifications) deployed by studies across all three journals to justify the sufficiency of their sample size. In the BMJ, two studies claimed that they achieved data saturation (BMJ17; BMJ18) and one article referred descriptively to achieving saturation without explicitly using the term (BMJ13). Interestingly, BMJ13 included data in the analysis beyond the point of saturation in search of ‘unusual/deviant observations’ and with a view to establishing findings consistency.

Thirty three women were approached to take part in the interview study. Twenty seven agreed and 21 (aged 21–64, median 40) were interviewed before data saturation was reached (one tape failure meant that 20 interviews were available for analysis). (BMJ17).

No new topics were identified following analysis of approximately two thirds of the interviews; however, all interviews were coded in order to develop a better understanding of how characteristic the views and reported behaviours were, and also to collect further examples of unusual/deviant observations. (BMJ13).

Two articles reported pre-determining their sample size with a view to achieving data saturation (BMJ08 – see extract in section In line with existing research ; BMJ15 – see extract in section Pragmatic considerations ) without further specifying if this was achieved. One paper claimed theoretical saturation (BMJ06) conceived as being when “no further recurring themes emerging from the analysis” whilst another study argued that although the analytic categories were highly saturated, it was not possible to determine whether theoretical saturation had been achieved (BMJ04). One article (BMJ18) cited a reference to support its position on saturation.

In the BJHP, six articles claimed that they achieved data saturation (BJHP21; BJHP32; BJHP39; BJHP48; BJHP49; BJHP52) and one article stated that, given their sample size and the guidelines for achieving data saturation, it anticipated that saturation would be attained (BJHP50).

Recruitment continued until data saturation was reached, defined as the point at which no new themes emerged. (BJHP48).

It has previously been recommended that qualitative studies require a minimum sample size of at least 12 to reach data saturation (Clarke & Braun, 2013; Fugard & Potts, 2014; Guest, Bunce, & Johnson, 2006) Therefore, a sample of 13 was deemed sufficient for the qualitative analysis and scale of this study. (BJHP50).

Two studies argued that they achieved thematic saturation (BJHP28 – see extract in section Sample size guidelines ; BJHP31) and one (BJHP30) article, explicitly concerned with theory development and deploying theoretical sampling, claimed both theoretical and data saturation.

The final sample size was determined by thematic saturation, the point at which new data appears to no longer contribute to the findings due to repetition of themes and comments by participants (Morse, 1995). At this point, data generation was terminated. (BJHP31).

Five studies argued that they achieved (BJHP05; BJHP33; BJHP40; BJHP13 – see extract in section Pragmatic considerations) or anticipated (BJHP46) saturation without any further specification of the term. BJHP17 referred descriptively to a state of achieved saturation without specifically using the term. Saturation of coding, but not saturation of themes, was claimed to have been reached by one article (BJHP18). Two articles explicitly stated that they did not achieve saturation, instead arguing for the sufficiency of their sample size on the basis of a level of theme completeness (BJHP27) or the replication of themes (BJHP53).

Furthermore, data collection ceased on pragmatic grounds rather than at the point when saturation point was reached. Despite this, although nuances within sub-themes were still emerging towards the end of data analysis, the themes themselves were being replicated indicating a level of completeness. (BJHP27).

Finally, one article criticised and explicitly renounced the notion of data saturation claiming that, on the contrary, the criterion of theoretical sufficiency determined its sample size (BJHP16).

According to the original Grounded Theory texts, data collection should continue until there are no new discoveries ( i.e. , ‘data saturation’; Glaser & Strauss, 1967). However, recent revisions of this process have discussed how it is rare that data collection is an exhaustive process and researchers should rely on how well their data are able to create a sufficient theoretical account or ‘theoretical sufficiency’ (Dey, 1999). For this study, it was decided that theoretical sufficiency would guide recruitment, rather than looking for data saturation. (BJHP16).

Ten out of the 20 BJHP articles that employed the argument of saturation used one or more citations relating to this principle.

In the SHI, one article (SHI01) claimed that it achieved category saturation based on authors’ judgment.

This number was not fixed in advance, but was guided by the sampling strategy and the judgement, based on the analysis of the data, of the point at which ‘category saturation’ was achieved. (SHI01).

Three articles described a state of achieved saturation without using the term or specifying what sort of saturation they had achieved (i.e. data, theoretical, thematic saturation) (SHI04; SHI13; SHI30) whilst another four articles explicitly stated that they achieved saturation (SHI100; SHI125; SHI136; SHI137). Two papers stated that they achieved data saturation (SHI73 – see extract in section Sample size guidelines; SHI113), two claimed theoretical saturation (SHI78; SHI115), two referred to achieving thematic saturation (SHI87; SHI139) and two to saturated themes (SHI29; SHI50).

Recruitment and analysis ceased once theoretical saturation was reached in the categories described below (Lincoln and Guba 1985). (SHI115).

The respondents’ quotes drawn on below were chosen as representative, and illustrate saturated themes. (SHI50).

One article stated that thematic saturation was anticipated with its sample size (SHI94). Briefly referring to the difficulty in pinpointing achievement of theoretical saturation, SHI32 (see extract in section Richness and volume of data) defended the sufficiency of its sample size on the basis of “the high degree of consensus [that] had begun to emerge among those interviewed”, suggesting that information from interviews was being replicated. Finally, SHI112 (see extract in section Further sampling to check findings consistency) argued that it achieved saturation of discursive patterns. Seven of the 19 SHI articles cited references to support their position on saturation (see Additional File 4 for the full list of citations used by articles to support their position on saturation across the three journals).

Overall, it is clear that the concept of saturation encompassed a wide range of variants expressed in terms such as saturation, data saturation, thematic saturation, theoretical saturation, category saturation, saturation of coding, saturation of discursive themes, theme completeness. It is noteworthy, however, that although these various claims were sometimes supported with reference to the literature, they were not evidenced in relation to the study at hand.
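One way such claims could be evidenced is by tabulating, interview by interview, how many previously unseen themes each transcript contributes. The sketch below is illustrative only: the coded interviews are hypothetical, and the stopping rule (a run of consecutive interviews adding no new themes) is an assumption made for the example, not a criterion used by the reviewed studies.

```python
def new_themes_per_interview(coded_interviews):
    """Count, for each interview in order, the themes not seen in any earlier interview."""
    seen = set()
    new_counts = []
    for themes in coded_interviews:
        fresh = set(themes) - seen
        new_counts.append(len(fresh))
        seen |= fresh
    return new_counts


def saturation_point(new_counts, run_length=3):
    """Return how many interviews had been completed before the first run of
    `run_length` consecutive interviews that added no new themes, or None if
    no such run exists. `run_length` is an assumed, adjustable threshold."""
    for start in range(len(new_counts) - run_length + 1):
        if all(count == 0 for count in new_counts[start:start + run_length]):
            return start
    return None


# Hypothetical coding results: one set of theme labels per interview.
interviews = [
    {"cost", "access"},
    {"access", "stigma"},
    {"stigma", "trust"},
    {"cost"},
    {"trust", "access"},
    {"stigma"},
]

new_counts = new_themes_per_interview(interviews)
print("New themes per interview:", new_counts)
print("Interviews completed before the run of no new themes:", saturation_point(new_counts))
```

Reporting a table of this kind alongside a saturation claim would let readers see where new themes stopped appearing, rather than having to take the claim on trust.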

Pragmatic considerations

The determination of sample size on the basis of pragmatic considerations was the second most frequently invoked argument (9.6% of all justifications), appearing in all three journals. In the BMJ, one article (BMJ15) appealed to pragmatic reasons, relating to time constraints and the difficulty of accessing certain study populations, to justify the determination of its sample size.

On the basis of the researchers’ previous experience and the literature, [30, 31] we estimated that recruitment of 15–20 patients at each site would achieve data saturation when data from each site were analysed separately. We set a target of seven to 10 caregivers per site because of time constraints and the anticipated difficulty of accessing caregivers at some home based care services. This gave a target sample of 75–100 patients and 35–50 caregivers overall. (BMJ15).

In the BJHP, four articles mentioned pragmatic considerations relating to time or financial constraints (BJHP27 – see extract in section Saturation ; BJHP53), the participant response rate (BJHP13), and the fixed (and thus limited) size of the participant pool from which interviewees were sampled (BJHP18).

We had aimed to continue interviewing until we had reached saturation, a point whereby further data collection would yield no further themes. In practice, the number of individuals volunteering to participate dictated when recruitment into the study ceased (15 young people, 15 parents). Nonetheless, by the last few interviews, significant repetition of concepts was occurring, suggesting ample sampling. (BJHP13).

Finally, three SHI articles explained their sample size with reference to practical aspects: time constraints and project manageability (SHI56), limited availability of respondents and project resources (SHI131), and time constraints (SHI113).

The size of the sample was largely determined by the availability of respondents and resources to complete the study. Its composition reflected, as far as practicable, our interest in how contextual factors (for example, gender relations and ethnicity) mediated the illness experience. (SHI131).

Qualities of the analysis

This sample size justification (8.4% of all justifications) was mainly employed by BJHP articles and referred to an intensive, idiographic and/or latently focused analysis, i.e. that moved beyond description. More specifically, six articles defended their sample size on the basis of an intensive analysis of transcripts and/or the idiographic focus of the study/analysis. Four of these papers (BJHP02; BJHP19; BJHP24; BJHP47) adopted an Interpretative Phenomenological Analysis (IPA) approach.

The current study employed a sample of 10 in keeping with the aim of exploring each participant’s account (Smith et al. , 1999). (BJHP19).

BJHP47 explicitly renounced the notion of saturation within an IPA approach. The other two BJHP articles conducted thematic analysis (BJHP34; BJHP38). The level of analysis – i.e. latent as opposed to a more superficial descriptive analysis – was also invoked as a justification by BJHP38 alongside the argument of an intensive analysis of individual transcripts.

The resulting sample size was at the lower end of the range of sample sizes employed in thematic analysis (Braun & Clarke, 2013). This was in order to enable significant reflection, dialogue, and time on each transcript and was in line with the more latent level of analysis employed, to identify underlying ideas, rather than a more superficial descriptive analysis (Braun & Clarke, 2006). (BJHP38).

Finally, one BMJ paper (BMJ21) defended its sample size with reference to the complexity of the analytic task.

We stopped recruitment when we reached 30–35 interviews, owing to the depth and duration of interviews, richness of data, and complexity of the analytical task. (BMJ21).

Meet sampling requirements

Meeting sampling requirements (7.2% of all justifications) was another argument employed by two BMJ and four SHI articles to explain their sample size. Achieving maximum variation sampling in terms of specific interviewee characteristics determined and explained the sample size of two BMJ studies (BMJ02; BMJ16 – see extract in section Meet research design requirements ).

Recruitment continued until sampling frame requirements were met for diversity in age, sex, ethnicity, frequency of attendance, and health status. (BMJ02).

Regarding the SHI articles, two papers explained their numbers on the basis of their sampling strategy (SHI01- see extract in section Saturation ; SHI23) whilst sampling requirements that would help attain sample heterogeneity in terms of a particular characteristic of interest was cited by one paper (SHI127).

The combination of matching the recruitment sites for the quantitative research and the additional purposive criteria led to 104 phase 2 interviews (Internet (OLC): 21; Internet (FTF): 20); Gyms (FTF): 23; HIV testing (FTF): 20; HIV treatment (FTF): 20.) (SHI23).

Of the fifty interviews conducted, thirty were translated from Spanish into English. These thirty, from which we draw our findings, were chosen for translation based on heterogeneity in depressive symptomology and educational attainment. (SHI127).

Finally, the pre-determination of sample size on the basis of sampling requirements was stated by one article though this was not used to justify the number of interviews (SHI10).

Sample size guidelines

Five BJHP articles (BJHP28; BJHP38 – see extract in section Qualities of the analysis ; BJHP46; BJHP47; BJHP50 – see extract in section Saturation ) and one SHI paper (SHI73) relied on citing existing sample size guidelines or norms within research traditions to determine and subsequently defend their sample size (7.2% of all justifications).

Sample size guidelines suggested a range between 20 and 30 interviews to be adequate (Creswell, 1998). Interviewer and note taker agreed that thematic saturation, the point at which no new concepts emerge from subsequent interviews (Patton, 2002), was achieved following completion of 20 interviews. (BJHP28).

Interviewing continued until we deemed data saturation to have been reached (the point at which no new themes were emerging). Researchers have proposed 30 as an approximate or working number of interviews at which one could expect to be reaching theoretical saturation when using a semi-structured interview approach (Morse 2000), although this can vary depending on the heterogeneity of respondents interviewed and complexity of the issues explored. (SHI73).

In line with existing research

Sample sizes of published literature in the area of the subject matter under investigation (3.5% of all justifications) were used by 2 BMJ articles as guidance and a precedent for determining and defending their own sample size (BMJ08; BMJ15 – see extract in section Pragmatic considerations ).

We drew participants from a list of prisoners who were scheduled for release each week, sampling them until we reached the target of 35 cases, with a view to achieving data saturation within the scope of the study and sufficient follow-up interviews and in line with recent studies [8–10]. (BMJ08).

Similarly, BJHP38 (see extract in section Qualities of the analysis ) claimed that its sample size was within the range of sample sizes of published studies that use its analytic approach.

Richness and volume of data

BMJ21 (see extract in section Qualities of the analysis ) and SHI32 referred to the richness, detailed nature, and volume of data collected (2.3% of all justifications) to justify the sufficiency of their sample size.

Although there were more potential interviewees from those contacted by postcode selection, it was decided to stop recruitment after the 10th interview and focus on analysis of this sample. The material collected was considerable and, given the focused nature of the study, extremely detailed. Moreover, a high degree of consensus had begun to emerge among those interviewed, and while it is always difficult to judge at what point ‘theoretical saturation’ has been reached, or how many interviews would be required to uncover exception(s), it was felt the number was sufficient to satisfy the aims of this small in-depth investigation (Strauss and Corbin 1990). (SHI32).

Meet research design requirements

Determination of sample size so that it is in line with, and serves the requirements of, the research design (2.3% of all justifications) that the study adopted was another justification used by 2 BMJ papers (BMJ16; BMJ08 – see extract in section In line with existing research ).

We aimed for diverse, maximum variation samples [20] totalling 80 respondents from different social backgrounds and ethnic groups and those bereaved due to different types of suicide and traumatic death. We could have interviewed a smaller sample at different points in time (a qualitative longitudinal study) but chose instead to seek a broad range of experiences by interviewing those bereaved many years ago and others bereaved more recently; those bereaved in different circumstances and with different relations to the deceased; and people who lived in different parts of the UK; with different support systems and coroners’ procedures (see Tables 1 and 2 for more details). (BMJ16).

Researchers’ previous experience

The researchers’ previous experience (possibly referring to experience with qualitative research) was invoked by BMJ15 (see extract in section Pragmatic considerations ) as a justification for the determination of sample size.

Nature of study

One BJHP paper argued that the sample size was appropriate for the exploratory nature of the study (BJHP38).

A sample of eight participants was deemed appropriate because of the exploratory nature of this research and the focus on identifying underlying ideas about the topic. (BJHP38).

Further sampling to check findings consistency

Finally, SHI112 argued that once it had achieved saturation of discursive patterns, further sampling was decided and conducted to check for consistency of the findings.

Within each of the age-stratified groups, interviews were randomly sampled until saturation of discursive patterns was achieved. This resulted in a sample of 67 interviews. Once this sample had been analysed, one further interview from each age-stratified group was randomly chosen to check for consistency of the findings. Using this approach it was possible to more carefully explore children’s discourse about the ‘I’, agency, relationality and power in the thematic areas, revealing the subtle discursive variations described in this article. (SHI112).

Thematic analysis of passages discussing sample size

This analysis resulted in two overarching thematic areas; the first concerned the variation in the characterisation of sample size sufficiency, and the second related to the perceived threats deriving from sample size insufficiency.

Characterisations of sample size sufficiency

The analysis showed that there were three main characterisations of the sample size in the articles that provided relevant comments and discussion: (a) the vast majority of these qualitative studies (n = 42) considered their sample size as ‘small’ and this was seen and discussed as a limitation, with only two articles viewing their small sample size as desirable and appropriate; (b) a minority of articles (n = 4) proclaimed that their achieved sample size was ‘sufficient’; and (c) finally, a small group of studies (n = 5) characterised their sample size as ‘large’. Whilst achieving a ‘large’ sample size was sometimes viewed positively because it led to richer results, there were also occasions when a large sample size was problematic rather than desirable.

‘Small’ but why and for whom?

A number of articles which characterised their sample size as ‘small’ did so against an implicit or explicit quantitative framework of reference. Interestingly, three studies that claimed to have achieved data saturation or ‘theoretical sufficiency’ with their sample size nonetheless noted their ‘small’ sample size as a limitation in their discussion, raising the question of why, or for whom, the sample size was considered small given that the qualitative criterion of saturation had been satisfied.

The current study has a number of limitations. The sample size was small (n = 11) and, however, large enough for no new themes to emerge. (BJHP39).

The study has two principal limitations. The first of these relates to the small number of respondents who took part in the study. (SHI73).

Other articles appeared to accept and acknowledge that their sample was flawed because of its small size (as well as other compositional ‘deficits’ e.g. non-representativeness, biases, self-selection) or anticipated that they might be criticized for their small sample size. It seemed that the imagined audience – perhaps reviewer or reader – was one inclined to hold the tenets of quantitative research, and certainly one to whom it was important to indicate the recognition that small samples were likely to be problematic. That one’s sample might be thought small was often construed as a limitation couched in a discourse of regret or apology.

Very occasionally, the articulation of the small size as a limitation was explicitly aligned against an espoused positivist framework and quantitative research.

This study has some limitations. Firstly, the 100 incidents sample represents a small number of the total number of serious incidents that occurs every year. 26 We sent out a nationwide invitation and do not know why more people did not volunteer for the study. Our lack of epidemiological knowledge about healthcare incidents, however, means that determining an appropriate sample size continues to be difficult. (BMJ20).

Indicative of an apparent oscillation of qualitative researchers between the different requirements and protocols demarcating the quantitative and qualitative worlds, there were a few instances of articles which briefly recognised their ‘small’ sample size as a limitation, but then defended their study on more qualitative grounds, such as their ability and success at capturing the complexity of experience and delving into the idiographic, and at generating particularly rich data.

This research, while limited in size, has sought to capture some of the complexity attached to men’s attitudes and experiences concerning incomes and material circumstances. (SHI35).

Our numbers are small because negotiating access to social networks was slow and labour intensive, but our methods generated exceptionally rich data. (BMJ21).

This study could be criticised for using a small and unrepresentative sample. Given that older adults have been ignored in the research concerning suntanning, fair-skinned older adults are the most likely to experience skin cancer, and women privilege appearance over health when it comes to sunbathing practices, our study offers depth and richness of data in a demographic group much in need of research attention. (SHI57).

‘Good enough’ sample sizes

Only four articles expressed some degree of confidence that their achieved sample size was sufficient. For example, SHI139, in line with the justification of thematic saturation that it offered, expressed trust in its sample size sufficiency despite the poor response rate. Similarly, BJHP04, which did not provide a sample size justification, argued that it targeted a larger sample size in order to eventually recruit a sufficient number of interviewees, due to anticipated low response rate.

Twenty-three people with type I diabetes from the target population of 133 ( i.e. 17.3%) consented to participate but four did not then respond to further contacts (total N = 19). The relatively low response rate was anticipated, due to the busy life-styles of young people in the age range, the geographical constraints, and the time required to participate in a semi-structured interview, so a larger target sample allowed a sufficient number of participants to be recruited. (BJHP04).

Two other articles (BJHP35; SHI32) linked the claimed sufficiency to the scope (i.e. ‘small, in-depth investigation’), aims and nature (i.e. ‘exploratory’) of their studies, thus anchoring their numbers to the particular context of their research. Nevertheless, claims of sample size sufficiency were sometimes undermined when they were juxtaposed with an acknowledgement that a larger sample size would be more scientifically productive.

Although our sample size was sufficient for this exploratory study, a more diverse sample including participants with lower socioeconomic status and more ethnic variation would be informative. A larger sample could also ensure inclusion of a more representative range of apps operating on a wider range of platforms. (BJHP35).

‘Large’ sample sizes - Promise or peril?

Three articles (BMJ13; BJHP05; BJHP48) which all provided the justification of saturation, characterised their sample size as ‘large’ and narrated this oversufficiency in positive terms as it allowed richer data and findings and enhanced the potential for generalisation. The type of generalisation aspired to (BJHP48) was not further specified however.

This study used rich data provided by a relatively large sample of expert informants on an important but under-researched topic. (BMJ13).

Qualitative research provides a unique opportunity to understand a clinical problem from the patient’s perspective. This study had a large diverse sample, recruited through a range of locations and used in-depth interviews which enhance the richness and generalizability of the results. (BJHP48).

And whilst a ‘large’ sample size was endorsed and valued by some qualitative researchers, within the psychological tradition of IPA, a ‘large’ sample size was counter-normative and therefore needed to be justified. Four BJHP studies, all adopting IPA, expressed the appropriateness or desirability of ‘small’ sample sizes (BJHP41; BJHP45) or hastened to explain why they included a larger than typical sample size (BJHP32; BJHP47). For example, BJHP32 below provides a rationale for how an IPA study can accommodate a large sample size and how this was indeed suitable for the purposes of the particular research. To strengthen the explanation for choosing a non-normative sample size, previous IPA research citing a similar sample size approach is used as a precedent.

Small scale IPA studies allow in-depth analysis which would not be possible with larger samples (Smith et al., 2009). (BJHP41).

Although IPA generally involves intense scrutiny of a small number of transcripts, it was decided to recruit a larger diverse sample as this is the first qualitative study of this population in the United Kingdom (as far as we know) and we wanted to gain an overview. Indeed, Smith, Flowers, and Larkin (2009) agree that IPA is suitable for larger groups. However, the emphasis changes from an in-depth individualistic analysis to one in which common themes from shared experiences of a group of people can be elicited and used to understand the network of relationships between themes that emerge from the interviews. This large-scale format of IPA has been used by other researchers in the field of false-positive research. Baillie, Smith, Hewison, and Mason (2000) conducted an IPA study, with 24 participants, of ultrasound screening for chromosomal abnormality; they found that this larger number of participants enabled them to produce a more refined and cohesive account. (BJHP32).

The IPA articles found in the BJHP were the only instances where a ‘small’ sample size was advocated and a ‘large’ sample size problematized and defended. These IPA studies illustrate that the characterisation of sample size sufficiency can be a function of researchers’ theoretical and epistemological commitments rather than the result of an ‘objective’ sample size assessment.

Threats from sample size insufficiency

As shown above, the majority of articles that commented on their sample size simultaneously characterized it as small and problematic. On those occasions when authors did not simply cite their ‘small’ sample size as a study limitation but went on to provide an account of how and why a small sample size was problematic, two important scientific qualities of the research seemed to be threatened: the generalizability and validity of results.

Generalizability

Those who characterised their sample as ‘small’ connected this to the limited potential for generalization of the results. Other features related to the sample – often some kind of compositional particularity – were also linked to limited potential for generalisation. Although the articles did not always explicitly state which form of generalisation they were referring to (see BJHP09), generalisation was mostly conceived in nomothetic terms, that is, it concerned the potential to draw inferences from the sample to the broader study population (‘representational generalisation’ – see BJHP31) and less often to other populations or cultures.

It must be noted that samples are small and whilst in both groups the majority of those women eligible participated, generalizability cannot be assumed. (BJHP09).

The study’s limitations should be acknowledged: Data are presented from interviews with a relatively small group of participants, and thus, the views are not necessarily generalizable to all patients and clinicians. In particular, patients were only recruited from secondary care services where COFP diagnoses are typically confirmed. The sample therefore is unlikely to represent the full spectrum of patients, particularly those who are not referred to, or who have been discharged from dental services. (BJHP31).

Without explicitly using the term generalisation, two SHI articles noted how their ‘small’ sample size imposed limits on ‘the extent that we can extrapolate from these participants’ accounts’ (SHI114) or on the possibility ‘to draw far-reaching conclusions from the results’ (SHI124).

Interestingly, only a minority of articles alluded to, or invoked, a type of generalisation that is aligned with qualitative research, that is, idiographic generalisation (i.e. generalisation that can be made from and about cases [5]). These articles, all published in the discipline of sociology, defended their findings in terms of the possibility of drawing logical and conceptual inferences to other contexts and of generating understanding that has the potential to advance knowledge, despite their ‘small’ size. One article (SHI139) clearly contrasted nomothetic (statistical) generalisation to idiographic generalisation, arguing that the lack of statistical generalizability does not nullify the ability of qualitative research to still be relevant beyond the sample studied.

Further, these data do not need to be statistically generalisable for us to draw inferences that may advance medicalisation analyses (Charmaz 2014). These data may be seen as an opportunity to generate further hypotheses and are a unique application of the medicalisation framework. (SHI139).

Although a small-scale qualitative study related to school counselling, this analysis can be usefully regarded as a case study of the successful utilisation of mental health-related resources by adolescents. As many of the issues explored are of relevance to mental health stigma more generally, it may also provide insights into adult engagement in services. It shows how a sociological analysis, which uses positioning theory to examine how people negotiate, partially accept and simultaneously resist stigmatisation in relation to mental health concerns, can contribute to an elucidation of the social processes and narrative constructions which may maintain as well as bridge the mental health service gap. (SHI103).

Only one article (SHI30) used the term transferability to argue for the potential wider relevance of the results, which was thought to be more the product of the composition of the sample (i.e. its diversity) than of the sample size.

The second major concern that arose from a ‘small’ sample size pertained to the internal validity of findings (i.e. here the term is used to denote the ‘truth’ or credibility of research findings). Authors expressed uncertainty about the degree of confidence in particular aspects or patterns of their results, primarily those that concerned some form of differentiation on the basis of relevant participant characteristics.

The information source preferred seemed to vary according to parents’ education; however, the sample size is too small to draw conclusions about such patterns. (SHI80).

Although our numbers were too small to demonstrate gender differences with any certainty, it does seem that the biomedical and erotic scripts may be more common in the accounts of men and the relational script more common in the accounts of women. (SHI81).

In other instances, articles expressed uncertainty about whether their results accounted for the full spectrum and variation of the phenomenon under investigation. In other words, a ‘small’ sample size (alongside compositional ‘deficits’ such as a not statistically representative sample) was seen to threaten the ‘content validity’ of the results which in turn led to constructions of the study conclusions as tentative.

Data collection ceased on pragmatic grounds rather than when no new information appeared to be obtained (i.e., saturation point). As such, care should be taken not to overstate the findings. Whilst the themes from the initial interviews seemed to be replicated in the later interviews, further interviews may have identified additional themes or provided more nuanced explanations. (BJHP53).

…it should be acknowledged that this study was based on a small sample of self-selected couples in enduring marriages who were not broadly representative of the population. Thus, participants may not be representative of couples that experience postnatal PTSD. It is therefore unlikely that all the key themes have been identified and explored. For example, couples who were excluded from the study because the male partner declined to participate may have been experiencing greater interpersonal difficulties. (BJHP03).

In other instances, articles attempted to preserve a degree of credibility of their results, despite the recognition that the sample size was ‘small’. Clarity and sharpness of emerging themes and alignment with previous relevant work were the arguments employed to warrant the validity of the results.

This study focused on British Chinese carers of patients with affective disorders, using a qualitative methodology to synthesise the sociocultural representations of illness within this community. Despite the small sample size, clear themes emerged from the narratives that were sufficient for this exploratory investigation. (SHI98).

The present study sought to examine how qualitative sample sizes in health-related research are characterised and justified. In line with previous studies [ 22 , 30 , 33 , 34 ] the findings demonstrate that reporting of sample size sufficiency is limited; just over 50% of articles in the BMJ and BJHP and 82% in the SHI did not provide any sample size justification. Providing a sample size justification was not related to the number of interviews conducted, but it was associated with the journal that the article was published in, indicating the influence of disciplinary or publishing norms, also reported in prior research [ 30 ]. This lack of transparency about sample size sufficiency is problematic given that most qualitative researchers would agree that it is an important marker of quality [ 56 , 57 ]. Moreover, and with the rise of qualitative research in social sciences, efforts to synthesise existing evidence and assess its quality are obstructed by poor reporting [ 58 , 59 ].

When authors justified their sample size, our findings indicate that sufficiency was mostly appraised with reference to features that were intrinsic to the study, in agreement with general advice on sample size determination [ 4 , 11 , 36 ]. The principle of saturation was the most commonly invoked argument [ 22 ], accounting for 55% of all justifications. A wide range of variants of saturation was evident, corroborating the proliferation of the meaning of the term [ 49 ] and reflecting different underlying conceptualisations or models of saturation [ 20 ]. Nevertheless, claims of saturation were never substantiated in relation to procedures conducted in the study itself, endorsing similar observations in the literature [ 25 , 30 , 47 ]. Claims of saturation were sometimes supported with citations of other literature, suggesting a removal of the concept away from the characteristics of the study at hand. Pragmatic considerations, such as resource constraints or participant response rate and availability, were the second most frequently used argument, accounting for approximately 10% of justifications, and another 23% of justifications also represented intrinsic-to-the-study characteristics (i.e. qualities of the analysis, meeting sampling or research design requirements, richness and volume of the data obtained, nature of study, further sampling to check findings consistency).

Only 12% of mentions of sample size justification pertained to arguments that were external to the study at hand, in the form of existing sample size guidelines and prior research that sets precedents. Whilst community norms and prior research can establish useful rules of thumb for estimating sample sizes [ 60 ] – and reveal what sizes are more likely to be acceptable within research communities – researchers should avoid adopting these norms uncritically, especially when such guidelines [e.g. 30 , 35 ] might be based on research that does not provide adequate evidence of sample size sufficiency. Similarly, whilst methodological research that seeks to demonstrate the achievement of saturation is invaluable since it explicates the parameters upon which saturation is contingent and indicates when a research project is likely to require a smaller or a larger sample [e.g. 29 ], specific numbers at which saturation was achieved within these projects cannot be routinely extrapolated for other projects. We concur with existing views [ 11 , 36 ] that the consideration of the characteristics of the study at hand, such as the epistemological and theoretical approach, the nature of the phenomenon under investigation, the aims and scope of the study, the quality and richness of data, or the researcher’s experience and skills of conducting qualitative research, should be the primary guide in determining sample size and assessing its sufficiency.
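As a quick arithmetic check on the shares reported in the two preceding paragraphs, the sketch below tallies them. The category labels are shorthand introduced here for illustration, and the pragmatic share is stated in the text only as "approximately 10%", so the totals are indicative rather than exact.

```python
# Shares of sample size justifications as reported in the text (percent).
# Labels are illustrative shorthand; the 10% figure is approximate in the source.
shares = {
    "saturation": 55,
    "pragmatic considerations": 10,
    "other intrinsic-to-study characteristics": 23,
    "external (guidelines, prior research)": 12,
}

# Everything except the external category counts as intrinsic to the study.
intrinsic = sum(v for k, v in shares.items() if not k.startswith("external"))
total = sum(shares.values())

print(f"intrinsic to the study: {intrinsic}%")
print(f"all categories: {total}%")
```

On these figures, roughly 88% of justifications referred to features of the study itself, and the four categories together account for all justifications.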

Moreover, although numbers in qualitative research are not unimportant [ 61 ], sample size should not be considered alone but be embedded in the more encompassing examination of data adequacy [ 56 , 57 ]. Erickson’s [ 62 ] dimensions of ‘evidentiary adequacy’ are useful here. He explains the concept in terms of adequate amounts of evidence, adequate variety in kinds of evidence, adequate interpretive status of evidence, adequate disconfirming evidence, and adequate discrepant case analysis. All dimensions might not be relevant across all qualitative research designs, but this illustrates the thickness of the concept of data adequacy, taking it beyond sample size.

The present research also demonstrated that sample sizes were commonly seen as ‘small’ and insufficient and discussed as a limitation. Often unjustified (and in two cases incongruent with the articles’ own claims of saturation), these characterisations imply that sample size in qualitative health research is often adversely judged (or expected to be judged) against an implicit, yet omnipresent, quasi-quantitative standpoint. Indeed, there were a few instances in our data where authors appeared, possibly in response to reviewers, to resist some sort of quantification of their results. This implicit reference point became more apparent when authors discussed the threats deriving from an insufficient sample size. Whilst the concerns about internal validity might be legitimate to the extent that qualitative research projects, which are broadly related to realism, are set to examine phenomena in sufficient breadth and depth, the concerns around generalizability revealed a conceptualisation that is not compatible with purposive sampling. The limited potential for generalisation, as a result of a small sample size, was often discussed in nomothetic, statistical terms. Only occasionally was analytic or idiographic generalisation invoked to warrant the value of the study’s findings [ 5 , 17 ].

Strengths and limitations of the present study

We note, first, the limited number of health-related journals reviewed, so that only a ‘snapshot’ of qualitative health research has been captured. Examining additional disciplines (e.g. nursing sciences) as well as inter-disciplinary journals would add to the findings of this analysis. Nevertheless, our study is the first to provide some comparative insights on the basis of disciplines that are differently attached to the legacy of positivism, and it analysed literature published over a lengthy period of time (15 years). Guetterman [ 27 ] also examined health-related literature, but this analysis was restricted to the 26 most highly cited articles published over a period of five years, whilst Carlsen and Glenton’s [ 22 ] study concentrated on focus group health research. Moreover, although it was our intention to examine sample size justification in relation to the epistemological and theoretical positions of articles, this proved to be challenging, largely due to the absence of relevant information or the difficulty in clearly discerning articles’ positions [ 63 ] and classifying them under specific approaches (e.g. studies often combined elements from different theoretical and epistemological traditions). We believe that such an analysis would yield useful insights as it links the methodological issue of sample size to the broader philosophical stance of the research. Despite these limitations, the analysis of the characterisation of sample size and of the threats seen to accrue from insufficient sample size enriches our understanding of sample size (in)sufficiency argumentation by linking it to other features of the research. As the peer-review process becomes increasingly public, future research could usefully examine how reporting around sample size sufficiency and data adequacy might be influenced by the interactions between authors and reviewers.

The past decade has seen a growing appetite in qualitative research for an evidence-based approach to sample size determination and to evaluations of the sufficiency of sample size. Despite the conceptual and methodological developments in the area, the findings of the present study confirm previous studies in concluding that appraisals of sample size sufficiency are either absent or poorly substantiated. To ensure and maintain high quality research that will encourage greater appreciation of qualitative work in health-related sciences [ 64 ], we argue that qualitative researchers should be more transparent and thorough in their evaluation of sample size as part of their appraisal of data adequacy. We would encourage the practice of appraising sample size sufficiency with close reference to the study at hand and would thus caution against responding to the growing methodological research in this area with a decontextualised application of sample size numerical guidelines, norms and principles. Although researchers might find sample size community norms serve as useful rules of thumb, we recommend methodological knowledge is used to critically consider how saturation and other parameters that affect sample size sufficiency pertain to the specifics of the particular project. Those reviewing papers have a vital role in encouraging transparent study-specific reporting. The review process should support authors to exercise nuanced judgments in decisions about sample size determination in the context of the range of factors that influence sample size sufficiency and the specifics of a particular study. In light of the growing methodological evidence in the area, transparent presentation of such evidence-based judgement is crucial and in time should surely obviate the seemingly routine practice of citing the ‘small’ size of qualitative samples among the study limitations.

A non-parametric test of difference for independent samples was performed since the variable number of interviews violated assumptions of normality according to the standardized scores of skewness and kurtosis (BMJ: z skewness = 3.23, z kurtosis = 1.52; BJHP: z skewness = 4.73, z kurtosis = 4.85; SHI: z skewness = 12.04, z kurtosis = 21.72) and the Shapiro-Wilk test of normality ( p  < .001).
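The note above reports standardised skewness and kurtosis scores without giving formulas. A minimal sketch follows, assuming each z score is the bias-corrected sample statistic divided by its conventional standard error (the form reported by common statistics packages); the authors' exact procedure is not stated. The function name and the interview counts are hypothetical and are not the study's data.

```python
import math

def z_skew_kurtosis(values):
    """Return (z_skewness, z_kurtosis): each statistic divided by its standard error."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(values) / n
    ss2 = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    ss3 = sum((x - mean) ** 3 for x in values)  # sum of cubed deviations
    ss4 = sum((x - mean) ** 4 for x in values)  # sum of fourth-power deviations
    if ss2 == 0:
        raise ValueError("values have zero variance")
    var = ss2 / (n - 1)  # unbiased sample variance
    sd = math.sqrt(var)

    # Bias-corrected sample skewness and excess kurtosis.
    skew = n / ((n - 1) * (n - 2)) * ss3 / sd ** 3
    kurt = (n * (n + 1)) / ((n - 1) * (n - 2) * (n - 3)) * ss4 / var ** 2 \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

    # Conventional standard errors of the two statistics.
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt

# Hypothetical, right-skewed interview counts for illustration only.
interviews = [10, 12, 12, 15, 15, 18, 20, 20, 25, 30, 45, 120]
z_skew, z_kurt = z_skew_kurtosis(interviews)
print(f"z skewness = {z_skew:.2f}, z kurtosis = {z_kurt:.2f}")
```

A common rule of thumb, which is an assumption here rather than something the note states, treats absolute z scores above about 1.96 as a sign of departure from normality, which would in turn motivate a non-parametric test.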

Abbreviations

BJHP: British Journal of Health Psychology

BMJ: British Medical Journal

IPA: Interpretative Phenomenological Analysis

SHI: Sociology of Health & Illness

1. Spencer L, Ritchie J, Lewis J, Dillon L. Quality in qualitative evaluation: a framework for assessing research evidence. National Centre for Social Research; 2003. https://www.heacademy.ac.uk/system/files/166_policy_hub_a_quality_framework.pdf . Accessed 11 May 2018.

2. Fusch PI, Ness LR. Are we there yet? Data saturation in qualitative research. Qual Rep. 2015;20(9):1408–16.

3. Robinson OC. Sampling in interview-based qualitative research: a theoretical and practical guide. Qual Res Psychol. 2014;11(1):25–41.

4. Sandelowski M. Sample size in qualitative research. Res Nurs Health. 1995;18(2):179–83.

5. Sandelowski M. One is the liveliest number: the case orientation of qualitative research. Res Nurs Health. 1996;19(6):525–9.

6. Luborsky MR, Rubinstein RL. Sampling in qualitative research: rationale, issues, and methods. Res Aging. 1995;17(1):89–113.

7. Marshall MN. Sampling for qualitative research. Fam Pract. 1996;13(6):522–6.

8. Patton MQ. Qualitative evaluation and research methods. 2nd ed. Newbury Park, CA: Sage; 1990.

9. van Rijnsoever FJ. (I Can’t get no) saturation: a simulation and guidelines for sample sizes in qualitative research. PLoS One. 2017;12(7):e0181689.

10. Morse JM. The significance of saturation. Qual Health Res. 1995;5(2):147–9.

11. Morse JM. Determining sample size. Qual Health Res. 2000;10(1):3–5.

12. Gergen KJ, Josselson R, Freeman M. The promises of qualitative inquiry. Am Psychol. 2015;70(1):1–9.

13. Borsci S, Macredie RD, Barnett J, Martin J, Kuljis J, Young T. Reviewing and extending the five-user assumption: a grounded procedure for interaction evaluation. ACM Trans Comput Hum Interact. 2013;20(5):29.

14. Borsci S, Macredie RD, Martin JL, Young T. How many testers are needed to assure the usability of medical devices? Expert Rev Med Devices. 2014;11(5):513–25.

15. Glaser BG, Strauss AL. The discovery of grounded theory: strategies for qualitative research. Chicago, IL: Aldine; 1967.

16. Kerr C, Nixon A, Wild D. Assessing and demonstrating data saturation in qualitative inquiry supporting patient-reported outcomes research. Expert Rev Pharmacoecon Outcomes Res. 2010;10(3):269–81.

17. Lincoln YS, Guba EG. Naturalistic inquiry. London: Sage; 1985.

18. Malterud K, Siersma VD, Guassora AD. Sample size in qualitative interview studies: guided by information power. Qual Health Res. 2015;26:1753–60.

19. Nelson J. Using conceptual depth criteria: addressing the challenge of reaching saturation in qualitative research. Qual Res. 2017;17(5):554–70.

20. Saunders B, Sim J, Kingstone T, Baker S, Waterfield J, Bartlam B, et al. Saturation in qualitative research: exploring its conceptualization and operationalization. Qual Quant. 2017. https://doi.org/10.1007/s11135-017-0574-8 .

21. Caine K. Local standards for sample size at CHI. In: Proceedings of the 2016 CHI conference on human factors in computing systems. ACM; 2016. p. 981–92.

22. Carlsen B, Glenton C. What about N? A methodological study of sample-size reporting in focus group studies. BMC Med Res Methodol. 2011;11(1):26.

23. Constantinou CS, Georgiou M, Perdikogianni M. A comparative method for themes saturation (CoMeTS) in qualitative interviews. Qual Res. 2017;17(5):571–88.

24. Dai NT, Free C, Gendron Y. Interview-based research in accounting 2000–2014: a review. November 2016. https://ssrn.com/abstract=2711022 or https://doi.org/10.2139/ssrn.2711022 . Accessed 17 May 2018.

25. Francis JJ, Johnston M, Robertson C, Glidewell L, Entwistle V, Eccles MP, et al. What is an adequate sample size? Operationalising data saturation for theory-based interview studies. Psychol Health. 2010;25(10):1229–45.

26. Guest G, Bunce A, Johnson L. How many interviews are enough? An experiment with data saturation and variability. Field Methods. 2006;18(1):59–82.

27. Guetterman TC. Descriptions of sampling practices within five approaches to qualitative research in education and the health sciences. Forum Qual Soc Res. 2015;16(2):25. http://nbn-resolving.de/urn:nbn:de:0114-fqs1502256 . Accessed 17 May 2018.

28. Hagaman AK, Wutich A. How many interviews are enough to identify metathemes in multisited and cross-cultural research? Another perspective on Guest, Bunce, and Johnson’s (2006) landmark study. Field Methods. 2017;29(1):23–41.

29. Hennink MM, Kaiser BN, Marconi VC. Code saturation versus meaning saturation: how many interviews are enough? Qual Health Res. 2017;27(4):591–608.

30. Marshall B, Cardon P, Poddar A, Fontenot R. Does sample size matter in qualitative research?: a review of qualitative interviews in IS research. J Comput Inform Syst. 2013;54(1):11–22.

31. Mason M. Sample size and saturation in PhD studies using qualitative interviews. Forum Qual Soc Res. 2010;11(3):8. http://nbn-resolving.de/urn:nbn:de:0114-fqs100387 . Accessed 17 May 2018.

32. Safman RM, Sobal J. Qualitative sample extensiveness in health education research. Health Educ Behav. 2004;31(1):9–21.

33. Saunders MN, Townsend K. Reporting and justifying the number of interview participants in organization and workplace research. Br J Manag. 2016;27(4):836–52.

34. Sobal J. Sample extensiveness in qualitative nutrition education research. J Nutr Educ. 2001;33(4):184–92.

35. Thomson SB. Sample size and grounded theory. JOAAG. 2010;5(1). http://www.joaag.com/uploads/5_1__Research_Note_1_Thomson.pdf . Accessed 17 May 2018.

36. Baker SE, Edwards R. How many qualitative interviews is enough?: expert voices and early career reflections on sampling and cases in qualitative research. National Centre for Research Methods Review Paper. 2012; http://eprints.ncrm.ac.uk/2273/4/how_many_interviews.pdf . Accessed 17 May 2018.

37. Ogden J, Cornwell D. The role of topic, interviewee, and question in predicting rich interview data in the field of health research. Sociol Health Illn. 2010;32(7):1059–71.

38. Green J, Thorogood N. Qualitative methods for health research. London: Sage; 2004.

39. Ritchie J, Lewis J, Elam G. Designing and selecting samples. In: Ritchie J, Lewis J, editors. Qualitative research practice: a guide for social science students and researchers. London: Sage; 2003. p. 77–108.

40. Britten N. Qualitative research: qualitative interviews in medical research. BMJ. 1995;311(6999):251–3.

41. Creswell JW. Qualitative inquiry and research design: choosing among five approaches. 2nd ed. London: Sage; 2007.

42. Fugard AJ, Potts HW. Supporting thinking on sample sizes for thematic analyses: a quantitative tool. Int J Soc Res Methodol. 2015;18(6):669–84.

43. Emmel N. Themes, variables, and the limits to calculating sample size in qualitative research: a response to Fugard and Potts. Int J Soc Res Methodol. 2015;18(6):685–6.

44. Braun V, Clarke V. (Mis) conceptualising themes, thematic analysis, and other problems with Fugard and Potts’ (2015) sample-size tool for thematic analysis. Int J Soc Res Methodol. 2016;19(6):739–43.

45. Hammersley M. Sampling and thematic analysis: a response to Fugard and Potts. Int J Soc Res Methodol. 2015;18(6):687–8.

46. Charmaz K. Constructing grounded theory: a practical guide through qualitative analysis. London: Sage; 2006.

47. Bowen GA. Naturalistic inquiry and the saturation concept: a research note. Qual Res. 2008;8(1):137–52.

48. Morse JM. Data were saturated. Qual Health Res. 2015;25(5):587–8.

49. O’Reilly M, Parker N. ‘Unsatisfactory saturation’: a critical exploration of the notion of saturated sample sizes in qualitative research. Qual Res. 2013;13(2):190–7.

50. Manen M, Higgins I, Riet P. A conversation with Max van Manen on phenomenology in its original sense. Nurs Health Sci. 2016;18(1):4–7.

51. Dey I. Grounding grounded theory. San Francisco, CA: Academic Press; 1999.

52. Hays DG, Wood C, Dahl H, Kirk-Jenkins A. Methodological rigor in Journal of Counseling & Development qualitative research articles: a 15-year review. J Couns Dev. 2016;94(2):172–83.

53. Moher D, Liberati A, Tetzlaff J, Altman DG, Prisma Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009;6(7):e1000097.

54. Hsieh HF, Shannon SE. Three approaches to qualitative content analysis. Qual Health Res. 2005;15(9):1277–88.

55. Boyatzis RE. Transforming qualitative information: thematic analysis and code development. Thousand Oaks, CA: Sage; 1998.

56. Levitt HM, Motulsky SL, Wertz FJ, Morrow SL, Ponterotto JG. Recommendations for designing and reviewing qualitative research in psychology: promoting methodological integrity. Qual Psychol. 2017;4(1):2–22.

57. Morrow SL. Quality and trustworthiness in qualitative research in counseling psychology. J Couns Psychol. 2005;52(2):250–60.

58. Barroso J, Sandelowski M. Sample reporting in qualitative studies of women with HIV infection. Field Methods. 2003;15(4):386–404.

59. Glenton C, Carlsen B, Lewin S, Munthe-Kaas H, Colvin CJ, Tunçalp Ö, et al. Applying GRADE-CERQual to qualitative evidence synthesis findings–paper 5: how to assess adequacy of data. Implement Sci. 2018;13(Suppl 1):14.

60. Onwuegbuzie AJ, Leech NL. A call for qualitative power analyses. Qual Quant. 2007;41(1):105–21.

61. Sandelowski M. Real qualitative researchers do not count: the use of numbers in qualitative research. Res Nurs Health. 2001;24(3):230–40.

62. Erickson F. Qualitative methods in research on teaching. In: Wittrock M, editor. Handbook of research on teaching. 3rd ed. New York: Macmillan; 1986. p. 119–61.

63. Bradbury-Jones C, Taylor J, Herber O. How theory is used and articulated in qualitative research: development of a new typology. Soc Sci Med. 2014;120:135–41.

64. Greenhalgh T, Annandale E, Ashcroft R, Barlow J, Black N, Bleakley A, et al. An open letter to the BMJ editors on qualitative research. BMJ. 2016;352:i563.

Acknowledgments

We would like to thank Dr. Paula Smith and Katharine Lee for their comments on a previous draft of this paper as well as Natalie Ann Mitchell and Meron Teferra for assisting us with data extraction.

This research was initially conceived of and partly conducted with financial support from the Multidisciplinary Assessment of Technology Centre for Healthcare (MATCH) programme (EP/F063822/1 and EP/G012393/1). The research continued and was completed independent of any support. The funding body did not have any role in the study design, the collection, analysis and interpretation of the data, in the writing of the paper, and in the decision to submit the manuscript for publication. The views expressed are those of the authors alone.

Availability of data and materials

Supporting data can be accessed in the original publications. Additional File 2 lists all eligible studies that were included in the present analysis.

Author information

Authors and affiliations.

Department of Psychology, University of Bath, Building 10 West, Claverton Down, Bath, BA2 7AY, UK

Konstantina Vasileiou & Julie Barnett

School of Psychology, Newcastle University, Ridley Building 1, Queen Victoria Road, Newcastle upon Tyne, NE1 7RU, UK

Susan Thorpe

Department of Computer Science, Brunel University London, Wilfred Brown Building 108, Uxbridge, UB8 3PH, UK

Terry Young

Contributions

JB and TY conceived the study; KV, JB, and TY designed the study; KV identified the articles and extracted the data; KV and JB assessed eligibility of articles; KV, JB, ST, and TY contributed to the analysis of the data, discussed the findings and early drafts of the paper; KV developed the final manuscript; KV, JB, ST, and TY read and approved the manuscript.

Corresponding author

Correspondence to Konstantina Vasileiou.

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

Terry Young is an academic who undertakes research and occasional consultancy in the areas of health technology assessment, information systems, and service design. He is unaware of any direct conflict of interest with respect to this paper. All other authors have no competing interests to declare.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional Files

Additional File 1: Editorial positions on qualitative research and sample considerations (where available). (DOCX 12 kb)

Additional File 2: List of eligible articles included in the review (N = 214). (DOCX 38 kb)

Additional File 3: Data Extraction Form. (DOCX 15 kb)

Additional File 4: Citations used by articles to support their position on saturation. (DOCX 14 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article.

Vasileiou, K., Barnett, J., Thorpe, S. et al. Characterising and justifying sample size sufficiency in interview-based studies: systematic analysis of qualitative health research over a 15-year period. BMC Med Res Methodol 18 , 148 (2018). https://doi.org/10.1186/s12874-018-0594-7

Received : 22 May 2018

Accepted : 29 October 2018

Published : 21 November 2018

DOI : https://doi.org/10.1186/s12874-018-0594-7

  • Sample size
  • Sample size justification
  • Sample size characterisation
  • Data adequacy
  • Qualitative health research
  • Qualitative interviews
  • Systematic analysis

BMC Medical Research Methodology

ISSN: 1471-2288

Sample size for qualitative research

Qualitative Market Research

ISSN : 1352-2752

Article publication date: 12 September 2016

Purpose

Qualitative researchers have been criticised for not justifying sample size decisions in their research. This short paper addresses the issue of which sample sizes are appropriate and valid within different approaches to qualitative research.

Design/methodology/approach

The sparse literature on sample sizes in qualitative research is reviewed and discussed. This examination is informed by the personal experience of the author in terms of assessing, as an editor, reviewer comments as they relate to sample size in qualitative research. Also, the discussion is informed by the author’s own experience of undertaking commercial and academic qualitative research over the last 31 years.

Findings

In qualitative research, the determination of sample size is contextual and partially dependent upon the scientific paradigm under which investigation is taking place. For example, qualitative research that is oriented towards positivism will require larger samples than in-depth qualitative research does, so that a representative picture of the whole population under review can be gained. Nonetheless, the paper also concludes that sample sizes involving one single case can be highly informative and meaningful, as demonstrated in examples from management and medical research. Unique examples of research using a single sample or case, but involving new areas or findings that are potentially highly relevant, can be worthy of publication. Theoretical saturation can also be useful as a guide in designing qualitative research, with practical research illustrating that data saturation may occur at samples of 12 among a relatively homogeneous population.

Practical implications

Sample sizes as low as one can be justified. Researchers and reviewers may find the discussion in this paper to be a useful guide to determining and critiquing sample size in qualitative research.

Originality/value

Sample size in qualitative research is always mentioned by reviewers of qualitative papers but discussion tends to be simplistic and relatively uninformed. The current paper draws attention to how sample sizes, at both ends of the size continuum, can be justified by researchers. This will also aid reviewers in their making of comments about the appropriateness of sample sizes in qualitative research.

  • Qualitative research
  • Qualitative methodology
  • Case studies
  • Sample size

Boddy, C.R. (2016), "Sample size for qualitative research", Qualitative Market Research , Vol. 19 No. 4, pp. 426-432. https://doi.org/10.1108/QMR-06-2016-0053

Emerald Group Publishing Limited

Copyright © 2016, Emerald Group Publishing Limited

Introduction

This paper considers the seldom-written-about but much-questioned issue of sample size in qualitative research. It is inspired and informed by the author’s experiences in commercial marketing research, in academic management research and as an editor of qualitative academic papers. As an author of such papers and as an editor, the author has read the views of many reviewers over the past 31 years in research, and these have also inspired this paper. Reviewers clearly need guidance in this area, and researchers could also benefit from this discussion as they struggle to design qualitative research in terms of sample size.

Furthermore, qualitative research has recently come under criticism for its lack of rigour in terms of there being little or no justifications given for the sample sizes that are actually used in research ( Marshall et al. , 2013 ). Marshall, Cardon, Poddar and Fontenot considered 81 qualitative studies and concluded that scant attention was paid to estimating or justifying sample sizes.

The question of what sample size is needed for qualitative research is frequently asked by individual researchers ( Dworkin, 2012 ) but not frequently discussed in the literature ( Onwuegbuzie and Leech, 2005 ). Few studies approach this issue, and as much qualitative research does not involve the making of statistical generalisations, many qualitative researchers report that sample size is not an issue in qualitative research ( Onwuegbuzie and Leech, 2005 ). However, for reviewers it clearly is an issue as described below.

Furthermore, the related issue of what sample size is needed for qualitative research findings to have some validity is also one which many paper reviewers are concerned about enough for them to mention in their reviews. Reviewers typically, nonetheless, do not definitively answer their own questions regarding what size a sample should be. Comments from reviewers are usually to do with the sample size (whatever size it is), being too small, and they commonly state that this should be noted in the limitations sections of an academic research paper. This current paper reviews some of the sparse literature on this subject, investigates a case study from the physical sciences and one from management and comes to some tentative conclusions.

The concept of data saturation, which is the point at which no new information or themes are observed in the data from the completion of additional interviews or cases (Guest et al., 2006), is a useful one in terms of discussing sample size in qualitative research. This approach implies that a single case study or interview is never enough, because data saturation can only be known after at least two cases, and usually more, are examined. This idea of sampling until data saturation is reached can be used as a justification for the use of a particular sample size in any qualitative research which is guided by this idea.
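As a rough illustration of how saturation might be tracked in practice, the sketch below counts the new themes contributed by each successive interview and reports the interview at which a run of sessions with no new themes is first completed. The function names, the theme labels and the `run_length` stopping rule are assumptions made for this example; they are not taken from Guest et al. (2006).

```python
from typing import Iterable, List, Optional, Set


def new_themes_per_interview(coded_interviews: Iterable[Set[str]]) -> List[int]:
    """Count the themes each interview adds beyond those already seen."""
    seen: Set[str] = set()
    counts: List[int] = []
    for themes in coded_interviews:
        fresh = set(themes) - seen  # themes not observed in any earlier interview
        counts.append(len(fresh))
        seen |= fresh
    return counts


def saturation_point(new_counts: List[int], run_length: int = 2) -> Optional[int]:
    """Return the 1-based interview number that completes the first run of
    `run_length` consecutive interviews adding no new themes, or None."""
    run = 0
    for number, added in enumerate(new_counts, start=1):
        run = run + 1 if added == 0 else 0
        if run >= run_length:
            return number
    return None


# Hypothetical coded interviews (theme labels are illustrative only).
interviews = [
    {"price", "trust"},
    {"trust", "speed"},
    {"price"},
    {"speed", "trust"},
]
counts = new_themes_per_interview(interviews)
print(counts)                    # [2, 1, 0, 0]
print(saturation_point(counts))  # 4
```

With a single interview the function returns None, which mirrors the point that saturation cannot be observed from one case alone.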

However, in practical terms, although the idea of saturation is very helpful at the conceptual level, it provides little guidance for estimating actual sample sizes, prior to data collection ( Guest et al. , 2006 ). For example, it is difficult to give cost and timing estimates for research where the sample size has not been pre-determined. This impracticality may be a reason why the data saturation approach does not appear to be used in practice, even in academic research.

For example, in a meta-analysis of 560 academic qualitative studies, the distribution of sample sizes used was found to be non-random, with a statistically significant proportion of studies presenting sample sizes that were multiples of ten (Mason, 2010). This strongly suggests that a premeditated approach to sample size determination was used, and this is not wholly congruent with some of the principles of qualitative research (Mason, 2010). Clearly, there is confusion and a gap between theoretical expectations and practice.
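One way to check whether multiples of ten are over-represented in a set of reported sample sizes is an exact one-sided binomial test. The sketch below is only an illustration: it assumes a null hypothesis in which one sample size in ten would be a multiple of ten by chance, it uses invented sample sizes, and it is not a reconstruction of the procedure Mason (2010) actually used.

```python
from math import comb
from typing import List, Tuple


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact probability of observing k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def multiples_of_ten_test(sample_sizes: List[int], p_null: float = 0.1) -> Tuple[int, int, float]:
    """Return (multiples of ten, number of studies, one-sided p-value).

    p_null is the assumed chance share of multiples of ten (an assumption).
    """
    n = len(sample_sizes)
    k = sum(1 for size in sample_sizes if size % 10 == 0)
    return k, n, binomial_upper_tail(k, n, p_null)


# Hypothetical reported sample sizes, for illustration only.
reported = [10, 20, 30, 12, 15, 20, 40, 8, 30, 25]
k, n, p_value = multiples_of_ten_test(reported)
print(f"{k} of {n} studies used a multiple of ten (p = {p_value:.5f})")
```

A small p-value here would indicate more round-number sample sizes than the assumed chance rate predicts, which is the pattern described above.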

This is corroborated by the investigation of 81 qualitative studies mentioned earlier ( Marshall et al. , 2013 ). This investigation found that those qualitative researchers who used data/theoretical saturation as an indicator that their sample size was sufficiently large did not explain this in sufficient detail, or in a way that was persuasive or entailed the presentation of any evidence to support the claim for data saturation ( Marshall et al. , 2013 ).

The idea underlying data saturation as a guide to sample size is the idea that once saturation is reached, the results must be capable of some degree of generalisation. Generalisation is traditionally seen as a central aim of science, as a process of theory formulation for further applications ( Mayring, 2007 ). However, as Mayring notes, the concept of generalisation has been criticised, for example because of the context specificity of all scientific findings.

Despite the apparent limitations of samples which involve a single case or single research participant as discussed above, it has nevertheless been noted that individual (single sample) case studies can provide reliable indications for the directions in which future research can go. Individual cases can also provide a new, deep and nuanced understanding of previously unexplored phenomena. Furthermore, qualitative researchers have noted that often a researcher can (unknowingly) have all the data they need from their first piece of data collection ( Sandelowski, 1995 ). It is also argued that case studies have been undervalued in terms of their ability to generate theoretical generalisations ( Tsang, 2014 ). This is demonstrated below from the discussion of two examples, one from the physical sciences and one from management research.

First, in medicine, it has been noted that single case studies can yield findings which can be generalised from and implications which are global in importance. The discovery of penicillin is a case in point. Alexander Fleming noticed an accidental case where mould was growing as a contaminant on the jelly in one of his culture plates (like Petri dishes). The mould appeared to have an inhibitory effect on the surrounding growth of bacteria. He called the mould Penicillium notatum (American Chemical Society, 1999). Publishing his findings in 1929 in the British Journal of Experimental Pathology, he wrote that the broth from the mould had marked inhibitory, bactericidal and bacteriolytic properties to many of the more common pathogenic bacteria (Fleming, 1929). His work was taken up by Howard Florey and Ernst Chain at Oxford University, who developed penicillin as a medicine, with the eventual help of US drug companies.

Penicillin was so apparently successful and generally applicable that it did not initially undergo full randomised trials prior to use in humans. Nonetheless, the development of penicillin is noted as being one of the greatest breakthroughs in modern medicine (American Chemical Society, 1999).

In management research, the longitudinal examination of an individual CEO who was highly psychopathic is also a plausible example of such a single-case approach being ground-breaking and informative (Boddy, 2015). This is particularly so because corporate psychopaths appear to have a common modus operandi and to be relatively stable personalities over time (Boddy et al., 2015) (just as penicillin has stable properties). The study of one corporate psychopath CEO, it was compellingly argued, can therefore inform how other psychopathic CEOs will likely behave.

More theoretically, the research philosophy or paradigm adopted and discussions of an appropriate sample size are related ( Onwuegbuzie and Leech, 2005 ; Boddy, 2005b ). Some researchers associate size considerations with an approach to science based on positivism, which is an approach to scientific inquiry which many qualitative researchers reject ( Lincoln and Guba, 2000 ). However, it should be noted that some researchers do use a qualitative element of research to set the parameters for a further, positivist quantification. This usually means that they apply a positivist approach to qualitative research ( Boddy, 2005a , 2005b ) and, under this approach, a criticism of sample size because of smallness may well be justified. This is because the qualitative sample size has to be representative of the population under consideration as a breadth of inquiry is anticipated.

This is the approach recommended (pp. 25-28) by qualitative market researchers who suggest that researchers draw up a grid (such as sex by brand usage) to make sure that each segment of the population is covered by the research ( Gordon and Langmaid, 1990 ). Academic researchers also suggest this grid or matrix type approach to qualitative sample size determination ( Stake, 2000 ).
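The grid approach can be sketched as a small quota matrix in which every combination of segment characteristics is a cell that should contain at least one participant. The dimension names and levels below (sex by brand usage) follow the example in the text, but the specific categories, the `per_cell` minimum and the function names are assumptions for illustration only.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, List, Tuple

Cell = Tuple[str, ...]


def build_quota_grid(dimensions: Dict[str, List[str]], per_cell: int = 1) -> Dict[Cell, int]:
    """Create one quota cell for every combination of dimension levels."""
    levels = [dimensions[name] for name in dimensions]
    return {cell: per_cell for cell in product(*levels)}


def uncovered_cells(grid: Dict[Cell, int], recruited: Iterable[Cell]) -> List[Cell]:
    """List the cells that still have fewer participants than required."""
    have = Counter(recruited)
    return [cell for cell, needed in grid.items() if have[cell] < needed]


# Hypothetical segmentation: sex by brand usage.
grid = build_quota_grid({
    "sex": ["female", "male"],
    "brand_usage": ["user", "non-user"],
})
recruited = [("female", "user"), ("male", "non-user")]
print(len(grid))                         # 4
print(uncovered_cells(grid, recruited))  # [('female', 'non-user'), ('male', 'user')]
```

The number of cells gives a lower bound on the sample size under this approach, and it grows multiplicatively with each added dimension.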

Commentators suggest that qualitative sample sizes of ten may be adequate for sampling among a homogeneous population (Sandelowski, 1995). Others state that qualitative sample sizes of 20-30 are typically (p. 56) used by researchers to establish data saturation using a grounded theory approach to qualitative inquiry (Creswell, 1998). However, no evidence is presented as the basis for this latter sample size claim. Marshall and colleagues refer to a sample size of 20 as being small for a grounded theory-type approach to qualitative research and to 40 being a large sample size for the same type of study. This gives a range of what sample size they would consider appropriate, and later in the same paper, they recommend a range of 20-30 interviews for grounded research and 15-30 interviews for case studies.

Bearing in mind these authors’ North American background, a range recommended by, for example, UK qualitative researchers would certainly be smaller in number at both ends of the spectrum. US qualitative researchers tend to adopt larger sample sizes than other qualitative researchers (Marshall et al., 2013).

In terms of the upper limits to sample size, Sandelowski is one of the few commentators on sample size in qualitative research to note that a sample can be too large. A sample which is very large does not permit the deep, case-oriented analysis that is the raison d’être of qualitative inquiry (Sandelowski, 1995), at least in constructivist or in-depth approaches to scientific research. In terms of how large is too large, few have ventured an opinion. Sandelowski suggests that 50 interviews is a large sample for a qualitative study. Boddy (2005a, 2005b) mentions once being asked, as a commercial marketing researcher, to conduct 1,000 in-depth interviews by a US positivist researcher. Upon learning that, given the resources available, this would take over a year and cost about US$1m, the US researcher re-evaluated what was meant by “in-depth”. Such a sample size would undoubtedly be “too large”, because the sheer volume of data would inhibit meaningful, timely, qualitative analysis. This current author’s view is that any qualitative sample size over 30 (per market/geography) becomes too unwieldy to administer and analyse.

Therefore, in a single market/country or relatively homogeneous population, any qualitative sample size at or over 12 focus groups or more than 30 in-depth interviews could be considered large and would require justification. Corresponding with this viewpoint, in one of the few studies investigating actual theoretical saturation, the authors found data saturation starting to become evident at six in-depth interviews and definitely evident at 12 in-depth interviews among a sample of women in two countries ( Guest et al. , 2006 ). This suggests that multiples of 12 in-depth interviews may be more appropriate than the multiples of 10 that were commonly found in a meta-analysis ( Mason, 2010 ) of qualitative research in practice.

In a review of sample sizes in qualitative studies in the information systems discipline, the authors note that (North) American studies tend to have larger sample sizes than those from other countries ( Marshall et al. , 2013 ). They state that they cannot account for this difference. However, in a discussion of the different, USA versus UK, approaches to qualitative research using focus group discussions (UK)/focus group interviews (USA), the author notes that US researchers tend to implicitly follow a positivist epistemology ( Boddy, 2005a , 2005b ). This logically results in their favouring larger sample sizes. On the contrary, for UK researchers, the concern is more about gathering in-depth information rather than quasi-measurement and so smaller sample sizes are intuitively more appealing.

In making a justification for an adopted sample size, qualitative researchers should make reference to the scope of the study and nature of the topic ( Morse, 2000 ), the contact time to be spent on each individual research participant (respondent) ( Marshall et al. , 2013 ) and the homogeneity of the population under consideration ( Trotter, 2012 ). In practical terms, attempts should be made to make sure that the sample is as representative of the population as possible ( Bock and Sergeant, 2002 ), albeit that it may be a very tightly defined or unusual population.

Conclusions

Qualitative research often concerns developing a depth of understanding rather than a breadth, particularly when undertaken under a non-positivist paradigm, such as that involving depth psychology or a constructivist approach to research. As such, we must conclude that in these cases a single case study involving a single research participant can be of importance and can generate great insight. This logically means that the smallest acceptable sample size in these types of qualitative research is a sample of one. In many cases, therefore, the observation that many reviewers would be tempted to make, that such a sample is too small or cannot be generalised from, is not a valid criticism, particularly if the researcher has justified the sample size. One case can produce an in-depth understanding that furthers knowledge as in the case of a psychopathic CEO. Furthermore, as the example of the discovery of penicillin demonstrates, a single case can also have findings, which do validly apply across many areas.

Exceptions to this guide to sample size may be where the qualitative research is being undertaken under a positivist approach to research, for example with a view to developing a quantitative measurement instrument such as a questionnaire. In this example, it would be useful to have a more representative understanding of likely incidence rates so that questions can be prioritised in terms of inclusion in any questionnaire or other instrument. This would necessitate sampling a greater number of respondents and, in general, at least one representative of each segment of the population under consideration in the wider research should be sampled in the qualitative research.

Thus, the issue of what constitutes an appropriate sample size in qualitative research is only really answerable within the context and scientific paradigm of the research being conducted. In constructivist or in-depth qualitative research, a single example can be highly instructive.

In positivist qualitative research, a representative sample is arguably needed, involving representatives of each of the sub-segments of the total population to be researched. Researchers and reviewers may take these arguments into consideration when respectively deciding what sample sizes to use and in deciding whether to criticise the sample size used in any qualitative research that is being evaluated.

American Chemical Society (1999), The Discovery and Development of Penicillin 1928-1945, The Alexander Fleming Laboratory Museum, London.

Bock, T. and Sergeant, J. (2002), “Small sample market research”, International Journal of Market Research, Vol. 44 No. 2, p. 235.

Boddy, C.R. (2005a), “Groups in focus: the distinctive difference between focus group discussions and focus group interviews”, Australasian Journal of Market and Social Research, Vol. 13 No. 2, pp. 29-38.

Boddy, C.R. (2005b), “A rose by any other name may smell as sweet but ‘group discussion’ is not another name for a ‘focus group’ nor should it be”, Qualitative Market Research: An International Journal, Vol. 8 No. 3, pp. 248-255.

Boddy, C.R. (2015), “Psychopathic leadership: a case study of a corporate psychopath CEO”, Journal of Business Ethics, Vol. 1 No. 1, pp. 1-16.

Boddy, C.R., Miles, D., Sanyal, C. and Hartog, M. (2015), “Extreme managers, extreme workplaces: capitalism, organisations and corporate psychopaths”, Organization, Vol. 2 No. 4, pp. 530-551.

Creswell, J.W. (1998), Qualitative Inquiry and Research Design: Choosing Among Five Traditions, Sage, Thousand Oaks, CA.

Dworkin, S. (2012), “Sample size policy for qualitative studies using in-depth interviews”, Archives of Sexual Behavior, Vol. 41 No. 6, pp. 1319-1320.

Fleming, A. (1929), “On the antibacterial action of cultures of a penicillium, with special reference to their use in the isolation of B. influenzae”, British Journal of Experimental Pathology, Vol. 10 No. 3, p. 226.

Gordon, W. and Langmaid, R. (1990), Qualitative Market Research: A Practitioner’s and Buyer’s Guide, Gower, Aldershot.

Guest, G., Bunce, A. and Johnson, L. (2006), “How many interviews are enough? An experiment with data saturation and variability”, Field Methods, Vol. 18 No. 1, pp. 59-82.

Lincoln, Y.S. and Guba, E.G. (2000), “Paradigm controversies, contradictions and emerging influences”, in Denzin, N.K. and Lincoln, Y.S. (Eds), Handbook of Qualitative Research, Sage, Thousand Oaks, CA, pp. 163-188.

Marshall, B., Cardon, P., Poddar, A. and Fontenot, R. (2013), “Does sample size matter in qualitative research?: a review of qualitative interviews in IS research”, Journal of Computer Information Systems, Vol. 54 No. 1, pp. 11-22.

Mason, M. (2010), “Sample size and saturation in PhD studies using qualitative interviews”, Forum Qualitative Sozialforschung/Forum: Qualitative Social Research, Vol. 11 No. 3.

Mayring, P. (2007), “On generalization in qualitatively oriented research”, Forum Qualitative Sozialforschung/Forum: Qualitative Social Research, Vol. 8 No. 3.

Morse, J.M. (2000), “Determining sample size”, Qualitative Health Research, Vol. 10 No. 1, pp. 3-5.

Onwuegbuzie, A.J. and Leech, N.L. (2005), “The role of sampling in qualitative research”, Academic Exchange Quarterly, Vol. 9 No. 3, p. 280.

Sandelowski, M. (1995), “Sample size in qualitative research”, Research in Nursing & Health, Vol. 18 No. 2, pp. 179-183.

Stake, R.E. (2000), “Case Studies”, in Denzin, N.K. and Lincoln, Y.S. (Eds), Handbook of Qualitative Research, Sage, London, pp. 425-454.

Trotter, R.T. (2012), “Qualitative research sample design and sample size: resolving and unresolved issues and inferential imperatives”, Preventive Medicine, Vol. 55 No. 5, pp. 398-400.

Tsang, E.W.K. (2014), “Generalizing from research findings: the merits of case studies”, International Journal of Management Reviews, Vol. 16 No. 4, pp. 369-383.

About the author.

Clive Roland Boddy is Professor of Leadership and Organisation Behaviour at Middlesex University where he was previously Associate Professor of Marketing. He is also co-chief examiner for the Diploma of the Market Research Society. Prior to academia, Clive ran marketing research companies in Taiwan, Hong Kong, South Korea and the UK in the 1980s and 1990s. His current research concerns workplace ethical outcomes under corporate psychopaths and toxic leaders. He is a Fellow of the Market Research Society, the Australian Institute of Management, the Chartered Institute of Marketing and the Association for Tertiary Education Management.

Criteria for Good Qualitative Research: A Comprehensive Review

  • Regular Article
  • Open access
  • Published: 18 September 2021
  • Volume 31 , pages 679–689, ( 2022 )

  • Drishti Yadav   ORCID: orcid.org/0000-0002-2974-0323 1  

This review aims to synthesize a published set of evaluative criteria for good qualitative research. The aim is to shed light on existing standards for assessing the rigor of qualitative research encompassing a range of epistemological and ontological standpoints. Using a systematic search strategy, published journal articles that deliberate criteria for rigorous research were identified. Then, references of relevant articles were surveyed to find noteworthy, distinct, and well-defined pointers to good qualitative research. This review presents an investigative assessment of the pivotal features in qualitative research that can permit the readers to pass judgment on its quality and to commend it as good research when objectively and adequately utilized. Overall, this review underlines the crux of qualitative research and accentuates the necessity to evaluate such research by the very tenets of its being. It also offers some prospects and recommendations to improve the quality of qualitative research. Based on the findings of this review, it is concluded that quality criteria are the aftereffect of socio-institutional procedures and existing paradigmatic conducts. Owing to the paradigmatic diversity of qualitative research, a single and specific set of quality criteria is neither feasible nor anticipated. Since qualitative research is not a cohesive discipline, researchers need to educate and familiarize themselves with applicable norms and decisive factors to evaluate qualitative research from within its theoretical and methodological framework of origin.

Introduction

“… It is important to regularly dialogue about what makes for good qualitative research” (Tracy, 2010 , p. 837)

To decide what represents good qualitative research is highly debatable. There are numerous methods that are contained within qualitative research and that are established on diverse philosophical perspectives. Bryman et al., ( 2008 , p. 262) suggest that “It is widely assumed that whereas quality criteria for quantitative research are well‐known and widely agreed, this is not the case for qualitative research.” Hence, the question “how to evaluate the quality of qualitative research” has been continuously debated. There are many areas of science and technology wherein these debates on the assessment of qualitative research have taken place. Examples include various areas of psychology: general psychology (Madill et al., 2000 ); counseling psychology (Morrow, 2005 ); and clinical psychology (Barker & Pistrang, 2005 ), and other disciplines of social sciences: social policy (Bryman et al., 2008 ); health research (Sparkes, 2001 ); business and management research (Johnson et al., 2006 ); information systems (Klein & Myers, 1999 ); and environmental studies (Reid & Gough, 2000 ). In the literature, these debates are enthused by the impression that the blanket application of criteria for good qualitative research developed around the positivist paradigm is improper. Such debates are based on the wide range of philosophical backgrounds within which qualitative research is conducted (e.g., Sandberg, 2000 ; Schwandt, 1996 ). The existence of methodological diversity led to the formulation of different sets of criteria applicable to qualitative research.

Among qualitative researchers, the dilemma of governing the measures to assess the quality of research is not a new phenomenon, especially when the virtuous triad of objectivity, reliability, and validity (Spencer et al., 2004 ) are not adequate. Occasionally, the criteria of quantitative research are used to evaluate qualitative research (Cohen & Crabtree, 2008 ; Lather, 2004 ). Indeed, Howe ( 2004 ) claims that the prevailing paradigm in educational research is scientifically based experimental research. Hypotheses and conjectures about the preeminence of quantitative research can weaken the worth and usefulness of qualitative research by neglecting the prominence of harmonizing match for purpose on research paradigm, the epistemological stance of the researcher, and the choice of methodology. Researchers have been reprimanded concerning this in “paradigmatic controversies, contradictions, and emerging confluences” (Lincoln & Guba, 2000 ).

In general, qualitative research tends to come from a very different paradigmatic stance and intrinsically demands distinctive and out-of-the-ordinary criteria for evaluating good research and varieties of research contributions that can be made. This review attempts to present a series of evaluative criteria for qualitative researchers, arguing that their choice of criteria needs to be compatible with the unique nature of the research in question (its methodology, aims, and assumptions). This review aims to assist researchers in identifying some of the indispensable features or markers of high-quality qualitative research. In a nutshell, the purpose of this systematic literature review is to analyze the existing knowledge on high-quality qualitative research and to verify the existence of research studies dealing with the critical assessment of qualitative research based on the concept of diverse paradigmatic stances. Contrary to the existing reviews, this review also suggests some critical directions to follow to improve the quality of qualitative research in different epistemological and ontological perspectives. This review is also intended to provide guidelines for the acceleration of future developments and dialogues among qualitative researchers in the context of assessing the qualitative research.

The rest of this review article is structured as follows. Section Methods describes the method followed for performing this review. Section Criteria for Evaluating Qualitative Studies provides a comprehensive description of the criteria for evaluating qualitative studies. This section is followed by a summary of the strategies to improve the quality of qualitative research in Section Improving Quality: Strategies. Section How to Assess the Quality of the Research Findings? provides details on how to assess the quality of the research findings. After that, some of the quality checklists (as tools to evaluate quality) are discussed in Section Quality Checklists: Tools for Assessing the Quality. Finally, the review ends with the concluding remarks presented in Section Conclusions, Future Directions and Outlook. Some prospects in qualitative research for enhancing its quality and usefulness in the social and techno-scientific research community are also presented in Section Conclusions, Future Directions and Outlook.

Methods

For this review, a comprehensive literature search was performed across several databases using generic search terms such as Qualitative Research, Criteria, etc. The following databases were chosen for the literature search based on the high number of results: IEEE Xplore, ScienceDirect, PubMed, Google Scholar, and Web of Science. The following keywords (and their combinations using Boolean connectives OR/AND) were adopted for the literature search: qualitative research, criteria, quality, assessment, and validity. The synonyms for these keywords were collected and arranged in a logical structure (see Table 1). All publications in journals and conference proceedings from 1950 to 2021 were considered for the search. Other articles extracted from the references of the papers identified in the electronic search were also included. A large number of publications on qualitative research were retrieved during the initial screening. Hence, to restrict the results to works focusing mainly on criteria for good qualitative research, an inclusion criterion was added to the search string.
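A search string of this kind can be assembled by joining the synonyms within each keyword group with OR and joining the groups with AND. The sketch below illustrates that structure only: the grouping and the synonym "qualitative study" are assumptions, since Table 1 is not reproduced here, and real databases differ in their query syntax.

```python
from typing import List


def build_search_string(keyword_groups: List[List[str]]) -> str:
    """Join synonyms with OR inside each group and join groups with AND."""
    clauses = []
    for synonyms in keyword_groups:
        # Quote multi-word terms so they are searched as phrases.
        terms = [f'"{term}"' if " " in term else term for term in synonyms]
        clauses.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(clauses)


# Hypothetical grouping of the keywords named in the text.
groups = [
    ["qualitative research", "qualitative study"],
    ["criteria", "quality", "assessment", "validity"],
]
print(build_search_string(groups))
# ("qualitative research" OR "qualitative study") AND (criteria OR quality OR assessment OR validity)
```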

From the selected databases, the search retrieved a total of 765 publications. Then, the duplicate records were removed. After that, based on the title and abstract, the remaining 426 publications were screened for their relevance by using the following inclusion and exclusion criteria (see Table 2 ). Publications focusing on evaluation criteria for good qualitative research were included, whereas those works which delivered theoretical concepts on qualitative research were excluded. Based on the screening and eligibility, 45 research articles were identified that offered explicit criteria for evaluating the quality of qualitative research and were found to be relevant to this review.
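The counts reported above imply the numbers removed at each stage. The short sketch below derives them, under the assumptions that the 426 publications are what remained after de-duplication and that the 45 included articles were drawn from those 426; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningFlow:
    retrieved: int            # records returned by the database search
    after_deduplication: int  # records left once duplicates were removed
    included: int             # records retained after screening and eligibility

    def __post_init__(self) -> None:
        if not (self.retrieved >= self.after_deduplication >= self.included >= 0):
            raise ValueError("counts must not increase from one stage to the next")

    @property
    def duplicates_removed(self) -> int:
        return self.retrieved - self.after_deduplication

    @property
    def excluded(self) -> int:
        return self.after_deduplication - self.included


flow = ScreeningFlow(retrieved=765, after_deduplication=426, included=45)
print(flow.duplicates_removed)  # 339
print(flow.excluded)            # 381
```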

Figure  1 illustrates the complete review process in the form of PRISMA flow diagram. PRISMA, i.e., “preferred reporting items for systematic reviews and meta-analyses” is employed in systematic reviews to refine the quality of reporting.

Figure 1. PRISMA flow diagram illustrating the search and inclusion process. N represents the number of records

Criteria for Evaluating Qualitative Studies

Fundamental Criteria: General Research Quality

Various researchers have put forward criteria for evaluating qualitative research, which have been summarized in Table 3 . Also, the criteria outlined in Table 4 effectively deliver the various approaches to evaluate and assess the quality of qualitative work. The entries in Table 4 are based on Tracy’s “Eight big‐tent criteria for excellent qualitative research” (Tracy, 2010 ). Tracy argues that high-quality qualitative work should formulate criteria focusing on the worthiness, relevance, timeliness, significance, morality, and practicality of the research topic, and the ethical stance of the research itself. Researchers have also suggested a series of questions as guiding principles to assess the quality of a qualitative study (Mays & Pope, 2020 ). Nassaji ( 2020 ) argues that good qualitative research should be robust, well informed, and thoroughly documented.

Qualitative Research: Interpretive Paradigms

All qualitative researchers follow highly abstract principles which bring together beliefs about ontology, epistemology, and methodology. These beliefs govern how the researcher perceives and acts. The net, which encompasses the researcher’s epistemological, ontological, and methodological premises, is referred to as a paradigm, or an interpretive structure, a “Basic set of beliefs that guides action” (Guba, 1990 ). Four major interpretive paradigms structure the qualitative research: positivist and postpositivist, constructivist interpretive, critical (Marxist, emancipatory), and feminist poststructural. The complexity of these four abstract paradigms increases at the level of concrete, specific interpretive communities. Table 5 presents these paradigms and their assumptions, including their criteria for evaluating research, and the typical form that an interpretive or theoretical statement assumes in each paradigm. Moreover, for evaluating qualitative research, quantitative conceptualizations of reliability and validity are proven to be incompatible (Horsburgh, 2003 ). In addition, a series of questions have been put forward in the literature to assist a reviewer (who is proficient in qualitative methods) for meticulous assessment and endorsement of qualitative research (Morse, 2003 ). Hammersley ( 2007 ) also suggests that guiding principles for qualitative research are advantageous, but methodological pluralism should not be simply acknowledged for all qualitative approaches. Seale ( 1999 ) also points out the significance of methodological cognizance in research studies.

Table 5 reflects that criteria for assessing the quality of qualitative research are the aftermath of socio-institutional practices and existing paradigmatic standpoints. Owing to the paradigmatic diversity of qualitative research, a single set of quality criteria is neither possible nor desirable. Hence, the researchers must be reflexive about the criteria they use in the various roles they play within their research community.

Improving Quality: Strategies

Another critical question is “How can the qualitative researchers ensure that the abovementioned quality criteria can be met?” Lincoln and Guba (1986) delineated several strategies to strengthen each criterion of trustworthiness. Other researchers (Merriam & Tisdell, 2016; Shenton, 2004) also presented such strategies. A brief description of these strategies is shown in Table 6.

It is worth mentioning that generalizability is also an integral part of qualitative research (Hays & McKibben, 2021 ). In general, the guiding principle pertaining to generalizability speaks about inducing and comprehending knowledge to synthesize interpretive components of an underlying context. Table 7 summarizes the main metasynthesis steps required to ascertain generalizability in qualitative research.

Figure 2 reflects the crucial components of a conceptual framework and their contribution to decisions regarding research design, implementation, and applications of results to future thinking, study, and practice (Johnson et al., 2020). The synergy and interrelationship of these components signify their role at different stages of a qualitative research study.

Figure 2. Essential elements of a conceptual framework

In a nutshell, to assess the rationale of a study, its conceptual framework, and its research question(s), quality criteria must take account of the following: a lucid context for the problem statement in the introduction; well-articulated research problems and questions; a precise conceptual framework; a distinct research purpose; and a clear presentation and investigation of the paradigms. Meeting these criteria would improve the quality of qualitative research.

How to Assess the Quality of the Research Findings?

The inclusion of quotes or similar research data enhances confirmability in the write-up of the findings. Expressions such as “80% of all respondents agreed that” or “only one of the interviewees mentioned that” may also be used to quantify qualitative findings (Stenfors et al., 2020). On the other hand, a persuasive argument for why such quantification may not strengthen the research has also been provided (Monrouxe & Rees, 2020). Further, the Discussion and Conclusion sections of an article also serve as robust markers of high-quality qualitative research, as elucidated in Table 8.

Quality Checklists: Tools for Assessing the Quality

Numerous checklists are available to speed up the assessment of the quality of qualitative research. However, if used uncritically and without regard for the research context, these checklists may be counterproductive. I suggest that such lists and guiding principles can assist in pinpointing the markers of high-quality qualitative research. However, considering the enormous variations in authors’ theoretical and philosophical contexts, I would emphasize that heavy reliance on such checklists may say little about whether the findings can be applied in your setting. A combination of such checklists might be appropriate for novice researchers. Some of these checklists are listed below:

The most commonly used framework is Consolidated Criteria for Reporting Qualitative Research (COREQ) (Tong et al., 2007 ). This framework is recommended by some journals to be followed by the authors during article submission.

Standards for Reporting Qualitative Research (SRQR) is another checklist that has been created particularly for medical education (O’Brien et al., 2014 ).

Also, Tracy ( 2010 ) and Critical Appraisal Skills Programme (CASP, 2021 ) offer criteria for qualitative research relevant across methods and approaches.

Further, researchers have also outlined different criteria as hallmarks of high-quality qualitative research. For instance, the “Road Trip Checklist” (Epp & Otnes, 2021 ) provides a quick reference to specific questions to address different elements of high-quality qualitative research.

Conclusions, Future Directions, and Outlook

This work presents a broad review of the criteria for good qualitative research. In addition, this article presents an exploratory analysis of the essential elements in qualitative research that can enable the readers of qualitative work to judge it as good research when objectively and adequately utilized. In this review, some of the essential markers that indicate high-quality qualitative research have been highlighted. I scope them narrowly to achieve rigor in qualitative research and note that they do not completely cover the broader considerations necessary for high-quality research. This review points out that a universal, one-size-fits-all guideline for evaluating the quality of qualitative research does not exist; in other words, qualitative researchers share no common set of guidelines. At the same time, this review reinforces that each qualitative approach should be treated uniquely on account of its own distinctive features for different epistemological and disciplinary positions. Because the worth of qualitative research is sensitive to the specific context and the paradigmatic stance, researchers should themselves analyze which approaches can and must be tailored to suit the distinct characteristics of the phenomenon under investigation. Although this article does not claim to put forward a magic bullet or to provide a one-stop solution for dealing with dilemmas about how, why, or whether to evaluate the “goodness” of qualitative research, it offers a platform to assist researchers in improving their qualitative studies. This work provides an assembly of concerns to reflect on, a series of questions to ask, and multiple sets of criteria to look at when attempting to determine the quality of qualitative research. Overall, this review underlines the crux of qualitative research and accentuates the need to evaluate such research by the very tenets of its being.
Bringing together the vital arguments and delineating the requirements that good qualitative research should satisfy, this review strives to equip researchers as well as reviewers to make well-informed judgments about the worth and significance of the qualitative research under scrutiny. In a nutshell, a comprehensive portrayal of the research process (from the context of research to the research objectives, research questions and design, theoretical foundations, and from approaches of collecting data to analyzing the results, to deriving inferences) frequently enhances the quality of qualitative research.

Prospects: A Road Ahead for Qualitative Research

Qualitative research is a vibrant and evolving discipline wherein different epistemological and disciplinary positions have their own characteristics and importance. Not surprisingly, owing to the emerging and varied features of qualitative research, no consensus has been reached to date. Researchers have raised various concerns and proposed several recommendations for editors and reviewers on conducting reviews of critical qualitative research (Levitt et al., 2021; McGinley et al., 2021). Following are some prospects and a few recommendations put forward towards the maturation of qualitative research and its quality evaluation:

In general, most manuscript and grant reviewers are not qualitative experts. Hence, they are more likely to adopt a broad set of criteria. However, researchers and reviewers need to keep in mind that it is inappropriate to apply the same approaches and practices to all qualitative research. Therefore, future work needs to focus on educating researchers and reviewers about the criteria to evaluate qualitative research from within the suitable theoretical and methodological context.

There is an urgent need to refurbish and augment critical assessment of some well-known and widely accepted tools (including checklists such as COREQ, SRQR) to interrogate their applicability on different aspects (along with their epistemological ramifications).

Efforts should be made towards creating more space for creativity, experimentation, and a dialogue between the diverse traditions of qualitative research. This would potentially help to avoid the enforcement of one's own set of quality criteria on the work carried out by others.

Moreover, journal reviewers need to be aware of various methodological practices and philosophical debates.

It is pivotal to highlight the expressions and considerations of qualitative researchers and bring them into a more open and transparent dialogue about assessing qualitative research in techno-scientific, academic, sociocultural, and political rooms.

Frequent debates on the use of evaluative criteria are required to settle some potentially unresolved issues (including the applicability of a single set of criteria in multi-disciplinary aspects). Such debates would not only benefit qualitative researchers themselves but also help strengthen the well-being and vitality of the entire discipline.

To conclude, I speculate that the criteria, and my perspective, may transfer to other methods, approaches, and contexts. I hope that they spark dialog and debate – about criteria for excellent qualitative research and the underpinnings of the discipline more broadly – and, therefore, help improve the quality of a qualitative study. Further, I anticipate that this review will help researchers reflect on the quality of their own research and substantiate their research design, and help reviewers review qualitative research for journals. On a final note, I highlight the need to formulate a framework (encompassing the prerequisites of a qualitative study) through the cohesive efforts of qualitative researchers of different disciplines with different theoretic-paradigmatic origins. I believe that tailoring such a framework (of guiding principles) paves the way for qualitative researchers to consolidate the status of qualitative research in the wide-ranging open science debate. Dialogue on this issue across different approaches is crucial for the future prospects of socio-techno-educational research.

Amin, M. E. K., Nørgaard, L. S., Cavaco, A. M., Witry, M. J., Hillman, L., Cernasev, A., & Desselle, S. P. (2020). Establishing trustworthiness and authenticity in qualitative pharmacy research. Research in Social and Administrative Pharmacy, 16 (10), 1472–1482.

Barker, C., & Pistrang, N. (2005). Quality criteria under methodological pluralism: Implications for conducting and evaluating research. American Journal of Community Psychology, 35 (3–4), 201–212.

Bryman, A., Becker, S., & Sempik, J. (2008). Quality criteria for quantitative, qualitative and mixed methods research: A view from social policy. International Journal of Social Research Methodology, 11 (4), 261–276.

Caelli, K., Ray, L., & Mill, J. (2003). ‘Clear as mud’: Toward greater clarity in generic qualitative research. International Journal of Qualitative Methods, 2 (2), 1–13.

CASP (2021). CASP checklists. Retrieved May 2021 from https://casp-uk.net/casp-tools-checklists/

Cohen, D. J., & Crabtree, B. F. (2008). Evaluative criteria for qualitative research in health care: Controversies and recommendations. The Annals of Family Medicine, 6 (4), 331–339.

Denzin, N. K., & Lincoln, Y. S. (2005). Introduction: The discipline and practice of qualitative research. In N. K. Denzin & Y. S. Lincoln (Eds.), The sage handbook of qualitative research (pp. 1–32). Sage Publications Ltd.

Elliott, R., Fischer, C. T., & Rennie, D. L. (1999). Evolving guidelines for publication of qualitative research studies in psychology and related fields. British Journal of Clinical Psychology, 38 (3), 215–229.

Epp, A. M., & Otnes, C. C. (2021). High-quality qualitative research: Getting into gear. Journal of Service Research. https://doi.org/10.1177/1094670520961445

Guba, E. G. (1990). The paradigm dialog. In Alternative Paradigms Conference, Mar 1989, Indiana U, School of Education, San Francisco, CA, US. Sage Publications, Inc.

Hammersley, M. (2007). The issue of quality in qualitative research. International Journal of Research and Method in Education, 30 (3), 287–305.

Haven, T. L., Errington, T. M., Gleditsch, K. S., van Grootel, L., Jacobs, A. M., Kern, F. G., & Mokkink, L. B. (2020). Preregistering qualitative research: A Delphi study. International Journal of Qualitative Methods, 19, 1609406920976417.

Hays, D. G., & McKibben, W. B. (2021). Promoting rigorous research: Generalizability and qualitative research. Journal of Counseling and Development, 99 (2), 178–188.

Horsburgh, D. (2003). Evaluation of qualitative research. Journal of Clinical Nursing, 12 (2), 307–312.

Howe, K. R. (2004). A critique of experimentalism. Qualitative Inquiry, 10 (1), 42–46.

Johnson, J. L., Adkins, D., & Chauvin, S. (2020). A review of the quality indicators of rigor in qualitative research. American Journal of Pharmaceutical Education, 84 (1), 7120.

Johnson, P., Buehring, A., Cassell, C., & Symon, G. (2006). Evaluating qualitative management research: Towards a contingent criteriology. International Journal of Management Reviews, 8 (3), 131–156.

Klein, H. K., & Myers, M. D. (1999). A set of principles for conducting and evaluating interpretive field studies in information systems. MIS Quarterly, 23 (1), 67–93.

Lather, P. (2004). This is your father’s paradigm: Government intrusion and the case of qualitative research in education. Qualitative Inquiry, 10 (1), 15–34.

Levitt, H. M., Morrill, Z., Collins, K. M., & Rizo, J. L. (2021). The methodological integrity of critical qualitative research: Principles to support design and research review. Journal of Counseling Psychology, 68 (3), 357.

Lincoln, Y. S., & Guba, E. G. (1986). But is it rigorous? Trustworthiness and authenticity in naturalistic evaluation. New Directions for Program Evaluation, 1986 (30), 73–84.

Lincoln, Y. S., & Guba, E. G. (2000). Paradigmatic controversies, contradictions and emerging confluences. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (2nd ed., pp. 163–188). Sage Publications.

Madill, A., Jordan, A., & Shirley, C. (2000). Objectivity and reliability in qualitative analysis: Realist, contextualist and radical constructionist epistemologies. British Journal of Psychology, 91 (1), 1–20.

Mays, N., & Pope, C. (2020). Quality in qualitative research. Qualitative Research in Health Care. https://doi.org/10.1002/9781119410867.ch15

McGinley, S., Wei, W., Zhang, L., & Zheng, Y. (2021). The state of qualitative research in hospitality: A 5-year review 2014 to 2019. Cornell Hospitality Quarterly, 62 (1), 8–20.

Merriam, S., & Tisdell, E. (2016). Qualitative research: A guide to design and implementation. San Francisco, US.

Meyer, M., & Dykes, J. (2019). Criteria for rigor in visualization design study. IEEE Transactions on Visualization and Computer Graphics, 26 (1), 87–97.

Monrouxe, L. V., & Rees, C. E. (2020). When I say… quantification in qualitative research. Medical Education, 54 (3), 186–187.

Morrow, S. L. (2005). Quality and trustworthiness in qualitative research in counseling psychology. Journal of Counseling Psychology, 52 (2), 250.

Morse, J. M. (2003). A review committee’s guide for evaluating qualitative proposals. Qualitative Health Research, 13 (6), 833–851.

Nassaji, H. (2020). Good qualitative research. Language Teaching Research, 24 (4), 427–431.

O’Brien, B. C., Harris, I. B., Beckman, T. J., Reed, D. A., & Cook, D. A. (2014). Standards for reporting qualitative research: A synthesis of recommendations. Academic Medicine, 89 (9), 1245–1251.

O’Connor, C., & Joffe, H. (2020). Intercoder reliability in qualitative research: Debates and practical guidelines. International Journal of Qualitative Methods, 19, 1609406919899220.

Reid, A., & Gough, S. (2000). Guidelines for reporting and evaluating qualitative research: What are the alternatives? Environmental Education Research, 6 (1), 59–91.

Rocco, T. S. (2010). Criteria for evaluating qualitative studies. Human Resource Development International. https://doi.org/10.1080/13678868.2010.501959

Sandberg, J. (2000). Understanding human competence at work: An interpretative approach. Academy of Management Journal, 43 (1), 9–25.

Schwandt, T. A. (1996). Farewell to criteriology. Qualitative Inquiry, 2 (1), 58–72.

Seale, C. (1999). Quality in qualitative research. Qualitative Inquiry, 5 (4), 465–478.

Shenton, A. K. (2004). Strategies for ensuring trustworthiness in qualitative research projects. Education for Information, 22 (2), 63–75.

Sparkes, A. C. (2001). Myth 94: Qualitative health researchers will agree about validity. Qualitative Health Research, 11 (4), 538–552.

Spencer, L., Ritchie, J., Lewis, J., & Dillon, L. (2004). Quality in qualitative evaluation: A framework for assessing research evidence.

Stenfors, T., Kajamaa, A., & Bennett, D. (2020). How to assess the quality of qualitative research. The Clinical Teacher, 17 (6), 596–599.

Taylor, E. W., Beck, J., & Ainsworth, E. (2001). Publishing qualitative adult education research: A peer review perspective. Studies in the Education of Adults, 33 (2), 163–179.

Tong, A., Sainsbury, P., & Craig, J. (2007). Consolidated criteria for reporting qualitative research (COREQ): A 32-item checklist for interviews and focus groups. International Journal for Quality in Health Care, 19 (6), 349–357.

Tracy, S. J. (2010). Qualitative quality: Eight “big-tent” criteria for excellent qualitative research. Qualitative Inquiry, 16 (10), 837–851.

Open access funding provided by TU Wien (TUW).

Author information

Authors and affiliations.

Faculty of Informatics, Technische Universität Wien, 1040, Vienna, Austria

Drishti Yadav

Corresponding author

Correspondence to Drishti Yadav.

Ethics declarations

Conflict of interest.

The author declares no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Yadav, D. Criteria for Good Qualitative Research: A Comprehensive Review. Asia-Pacific Edu Res 31, 679–689 (2022). https://doi.org/10.1007/s40299-021-00619-0

Accepted: 28 August 2021

Published: 18 September 2021

Issue Date: December 2022

DOI: https://doi.org/10.1007/s40299-021-00619-0

  • Qualitative research
  • Evaluative criteria

Chapter 5. Sampling

Introduction.

Most Americans will experience unemployment at some point in their lives. Sarah Damaske ( 2021 ) was interested in learning about how men and women experience unemployment differently. To answer this question, she interviewed unemployed people. After conducting a “pilot study” with twenty interviewees, she realized she was also interested in finding out how working-class and middle-class persons experienced unemployment differently. She found one hundred persons through local unemployment offices. She purposefully selected a roughly equal number of men and women and working-class and middle-class persons for the study. This would allow her to make the kinds of comparisons she was interested in. She further refined her selection of persons to interview:

I decided that I needed to be able to focus my attention on gender and class; therefore, I interviewed only people born between 1962 and 1987 (ages 28–52, the prime working and child-rearing years), those who worked full-time before their job loss, those who experienced an involuntary job loss during the past year, and those who did not lose a job for cause (e.g., were not fired because of their behavior at work). ( 244 )

The people she ultimately interviewed compose her sample. They represent (“sample”) the larger population of the involuntarily unemployed. This “theoretically informed stratified sampling design” allowed Damaske “to achieve relatively equal distribution of participation across gender and class,” but it came with some limitations. For one, the unemployment centers were located in primarily White areas of the country, so there were very few persons of color interviewed. Qualitative researchers must make these kinds of decisions all the time: who to include and who not to include. There is never an absolutely correct decision, as the choice is linked to the particular research question posed by the particular researcher, although some sampling choices are more compelling than others. In this case, Damaske made the choice to foreground both gender and class rather than compare all middle-class men and women or women of color from different class positions or just talk to White men. She leaves the door open for other researchers to sample differently. Because science is a collective enterprise, it is quite likely that someone will be inspired to conduct a study similar to Damaske’s but with an entirely different sample.
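To make the selection logic concrete, here is a minimal Python sketch of a purposeful, stratified selection like the one described above. It is only an illustration under stated assumptions: the `Candidate` fields, the function names, and the idea of filling each gender-by-class cell in order of arrival are hypothetical and are not taken from Damaske's study. Only the eligibility rules (birth year 1962 to 1987, full-time work, involuntary job loss, not fired for cause) come from the quoted passage.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    # All field names are hypothetical, chosen for this illustration only.
    name: str
    gender: str            # e.g. "woman" or "man"
    social_class: str      # e.g. "working" or "middle"
    birth_year: int
    worked_full_time: bool
    involuntary_loss: bool
    fired_for_cause: bool


def eligible(c: Candidate) -> bool:
    """Apply the inclusion criteria quoted in the passage above."""
    return (
        1962 <= c.birth_year <= 1987
        and c.worked_full_time
        and c.involuntary_loss
        and not c.fired_for_cause
    )


def stratified_selection(candidates, per_cell):
    """Keep up to `per_cell` eligible people in each gender-by-class cell.

    Candidates are taken in the order given (an assumption of this sketch);
    a real study would choose within cells on substantive grounds.
    """
    cells = defaultdict(list)
    for c in candidates:
        if eligible(c):
            key = (c.gender, c.social_class)
            if len(cells[key]) < per_cell:
                cells[key].append(c)
    return dict(cells)
```

Capping every cell at the same size is what produces the "relatively equal distribution" across gender and class. Note that nothing here is random: the sample is built deliberately around the comparisons the researcher wants to make.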

This chapter is all about sampling. After you have developed a research question and have a general idea of how you will collect data (observations or interviews), how do you go about actually finding people and sites to study? Although there is no “correct number” of people to interview, the sample should follow the research question and research design. You might remember studying sampling in a quantitative research course. Sampling is important here too, but it works a bit differently. Unlike quantitative research, qualitative research involves nonprobability sampling. This chapter explains why this is so and what qualities instead make a good sample for qualitative research.

Quick Terms Refresher

  • The population is the entire group that you want to draw conclusions about.
  • The sample is the specific group of individuals that you will collect data from.
  • Sampling frame is the actual list of individuals that the sample will be drawn from. Ideally, it should include the entire target population (and nobody who is not part of that population).
  • Sample size is how many individuals (or units) are included in your sample.
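The relationship between these terms can be sketched in a few lines of Python. This is a toy illustration with made-up names, not a procedure from the chapter: `frame_coverage` is a hypothetical helper that compares a sampling frame against the target population and reports who is missing from the frame and who is on it but should not be.

```python
def frame_coverage(population, frame):
    """Compare a sampling frame with the target population.

    Returns the people the frame misses and the people on the frame
    who are not part of the population of interest.
    """
    population, frame = set(population), set(frame)
    return {
        "missing_from_frame": population - frame,
        "not_in_population": frame - population,
    }


# Toy data (hypothetical): four people in the population, three on the list.
population = {"Ana", "Ben", "Cy", "Dee"}
sampling_frame = ["Ana", "Ben", "Eve"]

report = frame_coverage(population, sampling_frame)

# The sample is drawn from the frame, so it can never include Cy or Dee.
sample = [person for person in sampling_frame if person in population]
sample_size = len(sample)
```

An ideal frame would leave both sets in the report empty. In practice the frame is whatever list you can actually get, which is one reason to describe it explicitly when you write up your methods.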

The “Who” of Your Research Study

After you have turned your general research interest into an actual research question and identified an approach you want to take to answer that question, you will need to specify the people you will be interviewing or observing. In most qualitative research, the objects of your study will indeed be people. In some cases, however, your objects might be content left by people (e.g., diaries, yearbooks, photographs) or documents (official or unofficial) or even institutions (e.g., schools, medical centers) and locations (e.g., nation-states, cities). Chances are, whatever “people, places, or things” are the objects of your study, you will not really be able to talk to, observe, or follow every single individual/object of the entire population of interest. You will need to create a sample of the population. Sampling in qualitative research has different purposes and goals than sampling in quantitative research. Sampling in both allows you to say something of interest about a population without having to include the entire population in your sample.

We begin this chapter with the case of a population of interest composed of actual people. After we have a better understanding of populations and samples that involve real people, we’ll discuss sampling in other types of qualitative research, such as archival research, content analysis, and case studies. We’ll then move to a larger discussion about the difference between sampling in qualitative research generally versus quantitative research, then we’ll move on to the idea of “theoretical” generalizability, and finally, we’ll conclude with some practical tips on the correct “number” to include in one’s sample.

Sampling People

To help think through samples, let’s imagine we want to know more about “vaccine hesitancy.” We’ve all lived through 2020 and 2021, and we know that a sizable number of people in the United States (and elsewhere) were slow to accept vaccines, even when these were freely available. By some accounts, about one-third of Americans initially refused vaccination. Why is this so? Well, as I write this in the summer of 2021, we know that some people actively refused the vaccination, thinking it was harmful or part of a government plot. Others were simply lazy or dismissed the necessity. And still others were worried about harmful side effects. The general population of interest here (all adult Americans who were not vaccinated by August 2021) may be as many as eighty million people. We clearly cannot talk to all of them. So we will have to narrow the number to something manageable. How can we do this?

First, we have to think about our actual research question and the form of research we are conducting. I am going to begin with a quantitative research question. Quantitative research questions tend to be simpler to visualize, at least when we are first starting out doing social science research. So let us say we want to know what percentage of each kind of resistance is out there and how race or class or gender affects vaccine hesitancy. Again, we don’t have the ability to talk to everyone. But harnessing what we know about normal probability distributions (see quantitative methods for more on this), we can find this out through a sample that represents the general population. We can’t really address these particular questions if we only talk to White women who go to college with us. And if you are really trying to generalize the specific findings of your sample to the larger population, you will have to employ probability sampling , a sampling technique where a researcher sets a selection of a few criteria and chooses members of a population randomly. Why randomly? If truly random, all the members have an equal opportunity to be a part of the sample, and thus we avoid the problem of having only our friends and neighbors (who may be very different from other people in the population) in the study. Mathematically, there is going to be a certain number that will be large enough to allow us to generalize our particular findings from our sample population to the population at large. It might surprise you how small that number can be. Election polls of no more than one thousand people are routinely used to predict actual election outcomes of millions of people. Below that number, however, you will not be able to make generalizations. Talking to five people at random is simply not enough people to predict a presidential election.
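Why roughly one thousand respondents can be enough while five cannot is easier to see with numbers. The Python sketch below uses the textbook margin-of-error formula for a proportion, z * sqrt(p(1 - p) / n), which assumes simple random sampling from a very large population. The function names are hypothetical, the 95 percent confidence level (z = 1.96) and p = 0.5 are assumptions of this illustration, and real election polls typically use more complex designs and weighting.

```python
import math
import random


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    p = 0.5 is the most conservative choice; z = 1.96 corresponds to
    a 95 percent confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)


def simple_random_sample(frame, n, seed=None):
    """Draw n members of the sampling frame so that each is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(frame), n)


for n in (5, 100, 1000):
    print(f"n = {n:>4}: about +/- {margin_of_error(n) * 100:.1f} percentage points")
```

With these assumptions, a sample of one thousand gives a margin of about 3 percentage points, while a sample of five gives a margin of about 44 points, which is too wide to say anything useful about an election.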

In order to answer quantitative research questions of causality, one must employ probability sampling. Quantitative researchers try to generalize their findings to a larger population. Samples are designed with that in mind. Qualitative researchers ask very different questions, though. Qualitative research questions are not about “how many” of a certain group do X (in this case, what percentage of the unvaccinated hesitate for concern about safety rather than reject vaccination on political grounds). Qualitative research employs nonprobability sampling . By definition, not everyone has an equal opportunity to be included in the sample. The researcher might select White women they go to college with to provide insight into racial and gender dynamics at play. Whatever is found by doing so will not be generalizable to everyone who has not been vaccinated, or even all White women who have not been vaccinated, or even all White women who have not been vaccinated who are in this particular college. That is not the point of qualitative research at all. This is a really important distinction, so I will repeat in bold: Qualitative researchers are not trying to statistically generalize specific findings to a larger population . They have not failed when their sample cannot be generalized, as that is not the point at all.

In the previous paragraph, I said it would be perfectly acceptable for a qualitative researcher to interview five White women with whom she goes to college about their vaccine hesitancy “to provide insight into racial and gender dynamics at play.” The key word here is “insight.” Rather than use a sample as a stand-in for the general population, as quantitative researchers do, the qualitative researcher uses the sample to gain insight into a process or phenomenon. The qualitative researcher is not going to be content with simply asking each of the women to state her reason for not being vaccinated and then draw conclusions that, because one in five of these women were concerned about their health, one in five of all people were also concerned about their health. That would be, frankly, a very poor study indeed. Rather, the qualitative researcher might sit down with each of the women and conduct a lengthy interview about what the vaccine means to her, why she is hesitant, how she manages her hesitancy (how she explains it to her friends), what she thinks about others who are unvaccinated, what she thinks of those who have been vaccinated, and what she knows or thinks she knows about COVID-19. The researcher might include specific interview questions about the college context, about their status as White women, about the political beliefs they hold about racism in the US, and about how their own political affiliations may or may not provide narrative scripts about “protective whiteness.” There are many interesting things to ask and learn about and many things to discover. Where a quantitative researcher begins with clear parameters to set their population and guide their sample selection process, the qualitative researcher is discovering new parameters, making it impossible to engage in probability sampling.

Looking at it this way, sampling for qualitative researchers needs to be more strategic and more theoretically informed. What persons can be interviewed or observed that would provide maximum insight into what is still unknown? In other words, qualitative researchers think through what cases they could learn the most from, and those are the cases selected to study: “What would be ‘bias’ in statistical sampling, and therefore a weakness, becomes intended focus in qualitative sampling, and therefore a strength. The logic and power of purposeful sampling lie in selecting information-rich cases for study in depth. Information-rich cases are those from which one can learn a great deal about issues of central importance to the purpose of the inquiry, thus the term purposeful sampling” (Patton 2002:230; emphases in the original).

Before selecting your sample, though, it is important to clearly identify the general population of interest. You need to know this before you can determine the sample. In our example case, it is “adult Americans who have not yet been vaccinated.” Depending on the specific qualitative research question, however, it might be “adult Americans who have been vaccinated for political reasons” or even “college students who have not been vaccinated.” What insights are you seeking? Do you want to know how politics is affecting vaccination? Or do you want to understand how people manage being an outlier in a particular setting (unvaccinated where vaccinations are heavily encouraged if not required)? More clearly stated, your population should align with your research question . Think back to the opening story about Damaske’s work studying the unemployed. She drew her sample narrowly to address the particular questions she was interested in pursuing. Knowing your questions or, at a minimum, why you are interested in the topic will allow you to draw the best sample possible to achieve insight.

Once you have your population in mind, how do you go about getting people to agree to be in your sample? In qualitative research, it is permissible to find people by convenience. Just ask for people who fit your sample criteria and see who shows up. Or reach out to friends and colleagues and see if they know anyone that fits. Don’t let the name convenience sampling mislead you; this is not exactly “easy,” and it is certainly a valid form of sampling in qualitative research. The more unknowns you have about what you will find, the more convenience sampling makes sense. If you don’t know how race or class or political affiliation might matter, and your population is unvaccinated college students, you can construct a sample of college students by placing an advertisement in the student paper or posting a flyer on a notice board. Whoever answers is your sample. That is what is meant by a convenience sample. A common variation of convenience sampling is snowball sampling . This is particularly useful if your target population is hard to find. Let’s say you posted a flyer about your study and only two college students responded. You could then ask those two students for referrals. They tell their friends, and those friends tell other friends, and, like a snowball, your sample gets bigger and bigger.
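The way a snowball sample grows can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names, the referral network, and the function `snowball_sample` are all hypothetical, and real recruitment depends on who actually agrees to take part.

```python
from collections import deque

def snowball_sample(seeds, referrals, target_size):
    """Grow a sample wave by wave by following participants' referrals."""
    sample = []
    seen = set()
    queue = deque(seeds)  # the people who answered the flyer
    while queue and len(sample) < target_size:
        person = queue.popleft()
        if person in seen:
            continue  # already recruited through someone else
        seen.add(person)
        sample.append(person)
        # Each recruited participant is asked to refer others who fit the criteria.
        queue.extend(referrals.get(person, []))
    return sample

# Hypothetical referral network: only two students responded to the flyer.
referrals = {
    "Ana": ["Ben", "Cal"],
    "Dee": ["Cal", "Eli"],
    "Ben": ["Fay"],
}
sample = snowball_sample(["Ana", "Dee"], referrals, target_size=5)
print(sample)
```

Note that everyone in the resulting sample is connected to the first two respondents, which is worth remembering when you describe who your sample can and cannot speak for.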

Researcher Note

Gaining Access: When Your Friend Is Your Research Subject

My early experience with qualitative research was rather unique. At that time, I needed to do a project that required me to interview first-generation college students, and my friends, with whom I had been sharing a dorm for two years, just perfectly fell into the sample category. Thus, I just asked them and easily “gained my access” to the research subject; I know them, we are friends, and I am part of them. I am an insider. I also thought, “Well, since I am part of the group, I can easily understand their language and norms, I can capture their honesty, read their nonverbal cues well, will get more information, as they will be more opened to me because they trust me.” All in all, easy access with rich information. But, gosh, I did not realize that my status as an insider came with a price! When structuring the interview questions, I began to realize that rather than focusing on the unique experiences of my friends, I mostly based the questions on my own experiences, assuming we have similar if not the same experiences. I began to struggle with my objectivity and even questioned my role; am I doing this as part of the group or as a researcher? I came to know later that my status as an insider or my “positionality” may impact my research. It not only shapes the process of data collection but might heavily influence my interpretation of the data. I came to realize that although my inside status came with a lot of benefits (especially for access), it could also bring some drawbacks.

—Dede Setiono, PhD student focusing on international development and environmental policy, Oregon State University

The more you know about what you might find, the more strategic you can be. If you wanted to compare how politically conservative and politically liberal college students explained their vaccine hesitancy, for example, you might construct a sample purposively, finding an equal number of both types of students so that you can make those comparisons in your analysis. This is what Damaske ( 2021 ) did. You could still use convenience or snowball sampling as a way of recruitment. Post a flyer at the conservative student club and then ask for referrals from the one student that agrees to be interviewed. As with convenience sampling, there are variations of purposive sampling as well as other names used (e.g., judgment, quota, stratified, criterion, theoretical). Try not to get bogged down in the nomenclature; instead, focus on identifying the general population that matches your research question and then using a sampling method that is most likely to provide insight, given the types of questions you have.

There are all kinds of ways of being strategic with sampling in qualitative research. Here are a few of my favorite techniques for maximizing insight:

  • Consider using “extreme” or “deviant” cases. Maybe your college houses a prominent anti-vaxxer who has written about and demonstrated against the college’s policy on vaccines. You could learn a lot from that single case (depending on your research question, of course).
  • Consider “intensity”: people and cases and circumstances where your questions are more likely to feature prominently (but not extremely or deviantly). For example, you could compare those who volunteer at local Republican and Democratic election headquarters during an election season in a study on why party matters. Those who volunteer are more likely to have something to say than those who are more apathetic.
  • Maximize variation, as with the case of “politically liberal” versus “politically conservative,” or include an array of social locations (young vs. old; Northwest vs. Southeast region). This kind of heterogeneity sampling can capture and describe the central themes that cut across the variations: any common patterns that emerge, even in this wildly mismatched sample, are probably important to note!
  • Rather than maximize the variation, you could select a small homogenous sample to describe some particular subgroup in depth. Focus groups are often the best form of data collection for homogeneity sampling.
  • Think about which cases are “critical” or politically important—ones that “if it happens here, it would happen anywhere” or a case that is politically sensitive, as with the single “blue” (Democratic) county in a “red” (Republican) state. In both, you are choosing a site that would yield the most information and have the greatest impact on the development of knowledge.
  • On the other hand, sometimes you want to select the “typical” (the typical college student, for example). You are not trying to generalize from the typical but to illustrate aspects that may be typical of this case or group. When selecting for typicality, be clear with yourself about why the typical matches your research questions (and who might be excluded or marginalized in doing so).
  • Finally, it is often a good idea to look for disconfirming cases : if you are at the stage where you have a hypothesis (of sorts), you might select those who do not fit your hypothesis—you will surely learn something important there. They may be “exceptions that prove the rule” or exceptions that force you to alter your findings in order to make sense of these additional cases.

In addition to all these sampling variations, there is the theoretical approach taken by grounded theorists in which the researcher samples comparative people (or events) on the basis of their potential to represent important theoretical constructs. The sample, one can say, is by definition representative of the phenomenon of interest. It accompanies the constant comparative method of analysis. In the words of the founders of Grounded Theory, “Theoretical sampling is sampling on the basis of the emerging concepts, with the aim being to explore the dimensional range or varied conditions along which the properties of the concepts vary” (Strauss and Corbin 1998:73).

When Your Population is Not Composed of People

I think it is easiest for most people to think of populations and samples in terms of people, but sometimes our units of analysis are not actually people. They could be places or institutions. Even so, you might still want to talk to people or observe the actions of people to understand those places or institutions. Or not! In the case of content analyses (see chapter 17), you won’t even have people involved at all but rather documents or films or photographs or news clippings. Everything we have covered about sampling applies to other units of analysis too. Let’s work through some examples.

Case Studies

When constructing a case study, it is helpful to think of your cases as sample populations in the same way that we considered people above. If, for example, you are comparing campus climates for diversity, your overall population may be “four-year college campuses in the US,” and from there you might decide to study three college campuses as your sample. Which three? Will you use purposeful sampling (perhaps [1] selecting three colleges in Oregon that are different sizes or [2] selecting three colleges across the US located in different political cultures or [3] varying the three colleges by racial makeup of the student body)? Or will you select three colleges at random, out of convenience? There are justifiable reasons for all approaches.

As with people, there are different ways of maximizing insight in your sample selection. Think about the following rationales: typical, diverse, extreme, deviant, influential, crucial, or even embodying a particular “pathway” ( Gerring 2008 ). When choosing a case or particular research site, Rubin ( 2021 ) suggests you bear in mind, first, what you are leaving out by selecting this particular case/site; second, what you might be overemphasizing by studying this case/site and not another; and, finally, whether you truly need to worry about either of those things—“that is, what are the sources of bias and how bad are they for what you are trying to do?” ( 89 ).

Once you have selected your cases, you may still want to include interviews with specific people or observations at particular sites within those cases. Then you go through possible sampling approaches all over again to determine which people will be contacted.

Content: Documents, Narrative Accounts, And So On

Although not often discussed as sampling, your selection of documents and other units to use in various content/historical analyses is subject to similar considerations. When you are asking quantitative-type questions (percentages and proportionalities of a general population), you will want to follow probabilistic sampling. For example, I created a random sample of accounts posted on the website studentloanjustice.org to delineate the types of problems people were having with student debt ( Hurst 2007 ). Even though my data was qualitative (narratives of student debt), I was actually asking a quantitative-type research question, so it was important that my sample was representative of the larger population (debtors who posted on the website). On the other hand, when you are asking qualitative-type questions, the selection process should be very different. In that case, use nonprobabilistic techniques, either convenience (where you are really new to this data and do not have the ability to set comparative criteria or even know what a deviant case would be) or some variant of purposive sampling. Let’s say you were interested in the visual representation of women in media published in the 1950s. You could select a national magazine like Time for a “typical” representation (and for its convenience, as all issues are freely available on the web and easy to search). Or you could compare one magazine known for its feminist content versus one antifeminist. The point is, sample selection is important even when you are not interviewing or observing people.
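When a quantitative-type question calls for a random sample of documents, the draw itself is simple to make reproducible. The sketch below is a hypothetical illustration, not the procedure used in the study cited above: the list of account identifiers, the sample size of 50, and the seed are all assumptions.

```python
import random

def draw_random_sample(accounts, n, seed=None):
    """Draw n accounts without replacement; each account is equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed lets someone else reproduce the draw
    return rng.sample(accounts, n)

# Hypothetical sampling frame: identifiers for 500 posted accounts.
accounts = [f"account_{i:03d}" for i in range(1, 501)]
sample = draw_random_sample(accounts, n=50, seed=2007)
print(len(sample), "accounts selected for coding")
```

Recording the seed and the frame (the full list you drew from) in your methods notes makes it possible to explain exactly how the documents were chosen.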

Goals of Qualitative Sampling versus Goals of Quantitative Sampling

We have already discussed some of the differences in the goals of quantitative and qualitative sampling above, but it is worth further discussion. The quantitative researcher seeks a sample that is representative of the population of interest so that they may properly generalize the results (e.g., if 80 percent of first-gen students in the sample were concerned with costs of college, then we can say there is a strong likelihood that 80 percent of first-gen students nationally are concerned with costs of college). The qualitative researcher does not seek to generalize in this way . They may want a representative sample because they are interested in typical responses or behaviors of the population of interest, but they may very well not want a representative sample at all. They might want an “extreme” or deviant case to highlight what could go wrong with a particular situation, or maybe they want to examine just one case as a way of understanding what elements might be of interest in further research. When thinking of your sample, you will have to know why you are selecting the units, and this relates back to your research question or sets of questions. It has nothing to do with having a representative sample to generalize results. You may be tempted—or it may be suggested to you by a quantitatively minded member of your committee—to create as large and representative a sample as you possibly can to earn credibility from quantitative researchers. Ignore this temptation or suggestion. The only thing you should be considering is what sample will best bring insight into the questions guiding your research. This has implications for the number of people (or units) in your study as well, which is the topic of the next section.

What is the Correct “Number” to Sample?

Because we are not trying to create a generalizable representative sample, the guidelines for the “number” of people to interview or news stories to code are also a bit more nebulous. There are some brilliant insightful studies out there with an n of 1 (meaning one person or one account used as the entire set of data). This is particularly so in the case of autoethnography, a variation of ethnographic research that uses the researcher’s own subject position and experiences as the basis of data collection and analysis. But it is true for all forms of qualitative research. There are no hard-and-fast rules here. The number to include is what is relevant and insightful to your particular study.

That said, humans do not thrive well under such ambiguity, and there are a few helpful suggestions that can be made. First, many qualitative researchers talk about “saturation” as the end point for data collection. You stop adding participants when you are no longer getting any new information (or so very little that the cost of adding another interview subject or spending another day in the field exceeds any likely benefits to the research). The term saturation was first used here by Glaser and Strauss ( 1967 ), the founders of Grounded Theory. Here is their explanation: “The criterion for judging when to stop sampling the different groups pertinent to a category is the category’s theoretical saturation . Saturation means that no additional data are being found whereby the sociologist can develop properties of the category. As he [or she] sees similar instances over and over again, the researcher becomes empirically confident that a category is saturated. [They go] out of [their] way to look for groups that stretch diversity of data as far as possible, just to make certain that saturation is based on the widest possible range of data on the category” ( 61 ).
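One rough way to keep track of saturation is to log the codes noted after each interview and watch for a run of interviews that add nothing new. The sketch below assumes you have such a log. The `patience` threshold (how many consecutive "nothing new" interviews count as saturation) and the example codes are assumptions made for illustration, not part of Glaser and Strauss's definition, and a simple count cannot replace your judgment about whether a category is truly saturated.

```python
def interviews_until_saturation(codes_per_interview, patience=2):
    """Return the 1-based number of the interview at which `patience`
    consecutive interviews have yielded no new codes, or None if never."""
    seen = set()
    quiet_run = 0  # consecutive interviews that added nothing new
    for index, codes in enumerate(codes_per_interview, start=1):
        new_codes = set(codes) - seen
        seen |= new_codes
        quiet_run = 0 if new_codes else quiet_run + 1
        if quiet_run >= patience:
            return index
    return None

# Hypothetical codes noted after each of six interviews.
coded = [
    {"cost", "family"},
    {"cost", "belonging"},
    {"family", "work"},
    {"cost", "work"},
    {"belonging"},
    {"identity"},
]
print(interviews_until_saturation(coded, patience=2))
```

With a stricter threshold of three quiet interviews, the same log never reaches saturation, because the sixth interview introduces a new code. That sensitivity is a reminder that the stopping point is a judgment call.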

It makes sense that the term was developed by grounded theorists, since this approach is rather more open-ended than other approaches used by qualitative researchers. With so much left open, having a guideline of “stop collecting data when you don’t find anything new” is reasonable. However, saturation can’t help much when first setting out your sample. How do you know how many people to contact to interview? What number will you put down in your institutional review board (IRB) protocol (see chapter 8)? You may guess how many people or units it will take to reach saturation, but there really is no way to know in advance. The best you can do is think about your population and your questions and look at what others have done with similar populations and questions.

Here are some suggestions to use as a starting point: For phenomenological studies, try to interview at least ten people for each major category or group of people. If you are comparing male-identified, female-identified, and gender-neutral college students in a study on gender regimes in social clubs, that means you might want to design a sample of thirty students, ten from each group. This is the minimum suggested number. Damaske’s (2021) sample of one hundred allows room for up to twenty-five participants in each of four “buckets” (e.g., working-class*female, working-class*male, middle-class*female, middle-class*male). If there is more than one comparative group (e.g., you are comparing students attending three different colleges, and you are comparing White and Black students in each), you can sometimes reduce the number for each group in your sample to five, which in this case gives six buckets and thirty students in total. But that is really the bare minimum. A lot of people will not trust you with only “five” cases in a bucket. Lareau (2021:24) advises a minimum of seven or nine for each bucket (or “cell,” in her words). The point is to think about what your analyses might look like and how comfortable you will be with a certain number of persons fitting each category.
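The bucket arithmetic above is easy to check before you write a number into a protocol. The following is a minimal sketch: the function name `plan_sample` and the category labels are illustrative assumptions, and the per-bucket numbers are the ones discussed in this section.

```python
from itertools import product

def plan_sample(dimensions, per_bucket):
    """List every bucket (one combination of categories) and the total sample size."""
    buckets = list(product(*dimensions.values()))
    return buckets, len(buckets) * per_bucket

# Three colleges crossed with two racial groups, five students per bucket.
dimensions = {
    "college": ["College A", "College B", "College C"],
    "race": ["White", "Black"],
}
buckets, total = plan_sample(dimensions, per_bucket=5)
print(len(buckets), "buckets,", total, "students in total")

# A single comparison across three gender categories, ten students per group.
_, gender_total = plan_sample(
    {"gender": ["male-identified", "female-identified", "gender-neutral"]},
    per_bucket=10,
)
print(gender_total, "students in total")
```

Because the total is the product of the categories in every dimension times the per-bucket number, adding one more comparison can multiply the interviews you need, which is a good reason to settle on your comparisons early.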

Because qualitative research takes so much time and effort, it is rare for a beginning researcher to include more than thirty to fifty people or units in the study. You may not be able to conduct all the comparisons you might want simply because you cannot manage a larger sample. In that case, the limits of who you can reach or what you can include may influence you to rethink an original overcomplicated research design. Rather than include students from every racial group on a campus, for example, you might want to sample strategically, choosing the groups whose contrast is likely to be most insightful, possibly excluding majority-race (White) students entirely, and simply using previous literature to fill in gaps in our understanding. For example, one of my former students was interested in discovering how race and class worked at a predominantly White institution (PWI). Due to time constraints, she simplified her study from an original sample frame of middle-class and working-class domestic Black and international African students (four buckets) to a sample frame of domestic Black and international African students (two buckets), allowing the complexities of class to come through in individual accounts rather than through the sample frame. She wisely decided not to include White students in the sample, as her focus was on how minoritized students navigated the PWI. She was able to successfully complete her project and develop insights from the data with fewer than twenty interviewees. [1]

But what if you had unlimited time and resources? Would it always be better to interview more people or include more accounts, documents, and units of analysis? No! Your sample size should reflect your research question and the goals you have set yourself. Larger numbers can sometimes work against your goals. If, for example, you want to help bring out individual stories of success against the odds, adding more people to the analysis can end up drowning out those individual stories. Sometimes, the perfect size really is one (or three, or five). It really depends on what you are trying to discover and achieve in your study. Furthermore, studies of one hundred or more (people, documents, accounts, etc.) can sometimes be mistaken for quantitative research. Inevitably, the large sample size will push the researcher into simplifying the data numerically. And readers will begin to expect generalizability from such a large sample.

To summarize, “There are no rules for sample size in qualitative inquiry. Sample size depends on what you want to know, the purpose of the inquiry, what’s at stake, what will be useful, what will have credibility, and what can be done with available time and resources” ( Patton 2002:244 ).

How did you find/construct a sample?

Since qualitative researchers work with comparatively small sample sizes, getting your sample right is rather important. Yet it is also difficult to accomplish. For instance, a key question you need to ask yourself is whether you want a homogeneous or heterogeneous sample. In other words, do you want to include people in your study who are by and large the same, or do you want to have diversity in your sample?

For many years, I have studied the experiences of students who were the first in their families to attend university. There is a rather large number of sampling decisions I need to consider before starting the study. (1) Should I only talk to first-in-family students, or should I have a comparison group of students who are not first-in-family? (2) Do I need to strive for a gender distribution that matches undergraduate enrollment patterns? (3) Should I include participants that reflect diversity in gender identity and sexuality? (4) How about racial diversity? First-in-family status is strongly related to some ethnic or racial identity. (5) And how about areas of study?

As you can see, if I wanted to accommodate all these differences and get enough study participants in each category, I would quickly end up with a sample size of hundreds, which is not feasible in most qualitative research. In the end, for me, the most important decision was to maximize the voices of first-in-family students, which meant that I only included them in my sample. As for the other categories, I figured it was going to be hard enough to find first-in-family students, so I started recruiting with an open mind and an understanding that I may have to accept a lack of gender, sexuality, or racial diversity and then not be able to say anything about these issues. But I would definitely be able to speak about the experiences of being first-in-family.

—Wolfgang Lehmann, author of “Habitus Transformation and Hidden Injuries”

Examples of “Sample” Sections in Journal Articles

Think about some of the studies you have read in college, especially those with rich stories and accounts about people’s lives. Do you know how the people were selected to be the focus of those stories? If the account was published by an academic press (e.g., University of California Press or Princeton University Press) or in an academic journal, chances are that the author included a description of their sample selection. You can usually find these in a methodological appendix (book) or a section on “research methods” (article).

Here are two examples from recent books and one example from a recent article:

Example 1 . In It’s Not like I’m Poor: How Working Families Make Ends Meet in a Post-welfare World , the research team employed a mixed methods approach to understand how parents use the earned income tax credit, a refundable tax credit designed to provide relief for low- to moderate-income working people ( Halpern-Meekin et al. 2015 ). At the end of their book, their first appendix is “Introduction to Boston and the Research Project.” After describing the context of the study, they include the following description of their sample selection:

In June 2007, we drew 120 names at random from the roughly 332 surveys we gathered between February and April. Within each racial and ethnic group, we aimed for one-third married couples with children and two-thirds unmarried parents. We sent each of these families a letter informing them of the opportunity to participate in the in-depth portion of our study and then began calling the home and cell phone numbers they provided us on the surveys and knocking on the doors of the addresses they provided.…In the end, we interviewed 115 of the 120 families originally selected for the in-depth interview sample (the remaining five families declined to participate). ( 22 )

Was their sample selection based on convenience or purpose? Why do you think it was important for them to tell you that five families declined to be interviewed? There is actually a trick here, as the names were pulled randomly from a survey whose sample design was probabilistic. Why is this important to know? What can we say about the representativeness or the uniqueness of whatever findings are reported here?

Example 2 . In When Diversity Drops , Park ( 2013 ) examines the impact of decreasing campus diversity on the lives of college students. She does this through a case study of one student club, the InterVarsity Christian Fellowship (IVCF), at one university (“California University,” a pseudonym). Here is her description:

I supplemented participant observation with individual in-depth interviews with sixty IVCF associates, including thirty-four current students, eight former and current staff members, eleven alumni, and seven regional or national staff members. The racial/ethnic breakdown was twenty-five Asian Americans (41.6 percent), one Armenian (1.6 percent), twelve people who were black (20.0 percent), eight Latino/as (13.3 percent), three South Asian Americans (5.0 percent), and eleven people who were white (18.3 percent). Twenty-nine were men, and thirty-one were women. Looking back, I note that the higher number of Asian Americans reflected both the group’s racial/ethnic composition and my relative ease about approaching them for interviews. ( 156 )

How can you tell this is a convenience sample? What else do you note about the sample selection from this description?

Example 3. The last example is taken from an article published in the journal Research in Higher Education . Published articles tend to be more formal than books, at least when it comes to the presentation of qualitative research. In this article, Lawson ( 2021 ) is seeking to understand why female-identified college students drop out of majors that are dominated by male-identified students (e.g., engineering, computer science, music theory). Here is the entire relevant section of the article:

Method Participants Data were collected as part of a larger study designed to better understand the daily experiences of women in MDMs [male-dominated majors].…Participants included 120 students from a midsize, Midwestern University. This sample included 40 women and 40 men from MDMs—defined as any major where at least 2/3 of students are men at both the university and nationally—and 40 women from GNMs—defined as any major where 40–60% of students are women at both the university and nationally.… Procedure A multi-faceted approach was used to recruit participants; participants were sent targeted emails (obtained based on participants’ reported gender and major listings), campus-wide emails sent through the University’s Communication Center, flyers, and in-class presentations. Recruitment materials stated that the research focused on the daily experiences of college students, including classroom experiences, stressors, positive experiences, departmental contexts, and career aspirations. Interested participants were directed to email the study coordinator to verify eligibility (at least 18 years old, man/woman in MDM or woman in GNM, access to a smartphone). Sixteen interested individuals were not eligible for the study due to the gender/major combination. (482ff.)

What method of sample selection was used by Lawson? Why is it important to define “MDM” at the outset? How does this definition relate to sampling? Why were interested participants directed to the study coordinator to verify eligibility?

Final Words

I have found that students often find it difficult to be specific enough when defining and choosing their sample. It might help to think about your sample design and sample recruitment like a cookbook. You want all the details there so that someone else can pick up your study and conduct it as you intended. That person could be yourself, but this analogy might work better if you have someone else in mind. When I am writing down recipes, I often think of my sister and try to convey the details she would need to duplicate the dish. We share a grandmother whose recipes are full of handwritten notes in the margins, in spidery ink, that tell us what bowl to use when or where things could go wrong. Describe your sample clearly, convey the steps required accurately, and then add any other details that will help keep you on track and remind you why you have chosen to limit possible interviewees to those of a certain age or class or location. Imagine actually going out and getting your sample (making your dish). Do you have all the necessary details to get started?

Table 5.1. Sampling Type and Strategies

Further Readings

Fusch, Patricia I., and Lawrence R. Ness. 2015. “Are We There Yet? Data Saturation in Qualitative Research.” Qualitative Report 20(9):1408–1416.

Saunders, Benjamin, Julius Sim, Tom Kingstone, Shula Baker, Jackie Waterfield, Bernadette Bartlam, Heather Burroughs, and Clare Jinks. 2018. “Saturation in Qualitative Research: Exploring Its Conceptualization and Operationalization.” Quality & Quantity 52(4):1893–1907.

[1] Rubin (2021) suggests a minimum of twenty interviews (but safer with thirty) for an interview-based study and a minimum of three to six months in the field for ethnographic studies. For a content-based study, she suggests between five hundred and one thousand documents, although some will be “very small” (243–244).

The process of selecting people or other units of analysis to represent a larger population. In quantitative research, this representation is taken quite literally, as statistically representative.  In qualitative research, in contrast, sample selection is often made based on potential to generate insight about a particular topic or phenomenon.

The actual list of individuals that the sample will be drawn from. Ideally, it should include the entire target population (and nobody who is not part of that population).  Sampling frames can differ from the larger population when specific exclusions are inherent, as in the case of pulling names randomly from voter registration rolls where not everyone is a registered voter.  This difference in frame and population can undercut the generalizability of quantitative results.

The specific group of individuals that you will collect data from.  Contrast population.

The large group of interest to the researcher.  Although it will likely be impossible to design a study that incorporates or reaches all members of the population of interest, this should be clearly defined at the outset of a study so that a reasonable sample of the population can be taken.  For example, if one is studying working-class college students, the sample may include twenty such students attending a particular college, while the population is “working-class college students.”  In quantitative research, clearly defining the general population of interest is a necessary step in generalizing results from a sample.  In qualitative research, defining the population is conceptually important for clarity.

A sampling strategy in which the sample is chosen to represent (numerically) the larger population from which it is drawn by random selection.  Each person in the population has an equal chance of making it into the sample.  This is often done through a lottery or other chance mechanisms (e.g., a random selection of every twelfth name on an alphabetical list of voters).  Also known as random sampling .

The selection of research participants or other data sources based on availability or accessibility, in contrast to purposive sampling .

A sample generated non-randomly by asking participants to help recruit more participants, the idea being that a person who fits your sampling criteria probably knows other people who fit them too.

Broad codes that are assigned to the main issues emerging in the data; identifying themes is often part of initial coding . 

A form of case selection focusing on examples that do not fit the emerging patterns. This allows the researcher to evaluate rival explanations or to define the limitations of their research findings. When disconfirming cases are found (rather than sought out), researchers should expand their analysis or rethink their theories to include/explain them.

A methodological tradition of inquiry and approach to analyzing qualitative data in which theories emerge from a rigorous and systematic process of induction.  This approach was pioneered by the sociologists Glaser and Strauss (1967).  The elements of theory generated from comparative analysis of data are, first, conceptual categories and their properties and, second, hypotheses or generalized relations among the categories and their properties – “The constant comparing of many groups draws the [researcher’s] attention to their many similarities and differences.  Considering these leads [the researcher] to generate abstract categories and their properties, which, since they emerge from the data, will clearly be important to a theory explaining the kind of behavior under observation.” (36).

The result of probability sampling, in which a sample is chosen to represent (numerically) the larger population from which it is drawn by random selection.  Each person in the population has an equal chance of making it into the random sample.  This is often done through a lottery or other chance mechanisms (e.g., the random selection of every twelfth name on an alphabetical list of voters).  This is typically not required in qualitative research but rather essential for the generalizability of quantitative research.

A form of case selection or purposeful sampling in which cases that are unusual or special in some way are chosen to highlight processes or to illuminate gaps in our knowledge of a phenomenon.   See also extreme case .

The point at which you can conclude data collection because every person you are interviewing, the interaction you are observing, or content you are analyzing merely confirms what you have already noted.  Achieving saturation is often used as the justification for the final sample size.

The accuracy with which results or findings can be transferred to situations or people other than those originally studied.  Qualitative studies generally are unable to use (and are uninterested in) statistical generalizability where the sample population is said to be able to predict or stand in for a larger population of interest.  Instead, qualitative researchers often discuss “theoretical generalizability,” in which the findings of a particular study can shed light on processes and mechanisms that may be at play in other settings.  See also statistical generalization and theoretical generalization .

A term used by IRBs to denote all materials aimed at recruiting participants into a research study (including printed advertisements, scripts, audio or video tapes, or websites).  Copies of this material are required in research protocols submitted to IRB.

Introduction to Qualitative Research Methods Copyright © 2023 by Allison Hurst is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License , except where otherwise noted.

InterQ Research

Determining Sample Size for Qualitative Research: What is the Magical Number?

  • August 21, 2018

Article Summary:   Sample sizes in qualitative research are much lower than those needed in quantitative research because of careful audience segmentation, study design, and the principle of data saturation.

Everyone wants to know the magical number for sample size when doing qualitative research .

Of course, when you’re doing quantitative research numbers matter more – especially when you’re aiming for statistically meaningful results. But when doing qualitative research, several factors must be considered before blindly accepting an arbitrary number. Here are some important points that should be kept in mind when thinking about sample size:

Point #1: Quality over Quantity

Whether your company is targeting moms from the Midwest with household incomes of $70k+, or teens who use Facebook more than 8 hours a week, or B2B decision makers in HR – you are most likely looking for insights on your services from a very specific audience type. That being said, the first objective when conducting qualitative market research should be to ensure the right people are recruited for the study.

The right respondents for the study are those who meet each and every criteria line-item identified from quantitative research studies, and also the criteria that the clients have identified through their own research. Thus, a high quality panel includes much more than just members who are pulled from a general population that falls within broad parameters. Only those participants who match the audience specifications and background relevance expressed by the client should be recruited.

Point #2: Choose an appropriate study design

The type of qualitative study is another important factor to consider when choosing sample size. There are various methods that can be used to gather insightful data, but not all methods may be applicable to your study and its aim. In-depth interviews , focus groups , and ethnographic research are the most common methods used in qualitative market research, each method being unique in the information it can provide and the setting it can be used in. Moreover, the types of questions being studied have an equally important role to play in deciding sample size. Thus before choosing a sample size, make sure all the parameters, aims, and controls are clearly outlined.

Point #3: Keep in mind the Principle of Saturation

A sample size should be large enough to sufficiently describe the phenomenon of interest, and address the research question at hand. But at the same time, a large sample size risks having repetitive data. The goal of qualitative research should thus be the attainment of saturation . Saturation occurs when adding more participants to the study does not result in obtaining additional perspectives or information. One can say there is a point of diminishing return with larger samples, as it leads to more data but doesn’t necessarily lead to more information.

The objective of qualitative research is to reduce discovery failure, whereas quantitative research aims to reduce estimation error. Because qualitative research works to obtain diverse opinions from a sample on a client’s product, service, or project, data collected past the point of saturation adds nothing new. One respondent’s opinion is enough to generate a code, which becomes part of the analysis framework.

The goal of a qualitative study should be to have a large enough sample size to uncover a variety of opinions, but to limit the sample size at the point of saturation.
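One way to make the saturation check concrete is to track how many previously unseen codes each interview contributes. The sketch below is only an illustration in Python, not a procedure taken from the research cited here: the function names, the sample codes, and the rule of stopping after three consecutive sessions with no new codes are all assumptions made for the example.

```python
from typing import Iterable, List, Optional, Set


def new_codes_per_session(sessions: Iterable[Set[str]]) -> List[int]:
    """Count how many previously unseen codes each session contributes."""
    seen: Set[str] = set()
    counts: List[int] = []
    for codes in sessions:
        fresh = codes - seen          # codes not noted in any earlier session
        counts.append(len(fresh))
        seen |= codes
    return counts


def saturation_point(counts: List[int], quiet_run: int = 3) -> Optional[int]:
    """Return the 1-based session number at which `quiet_run` consecutive
    sessions have added no new codes, or None if that never happens.
    The default of 3 is an arbitrary, illustrative threshold."""
    run = 0
    for session_number, n_new in enumerate(counts, start=1):
        run = run + 1 if n_new == 0 else 0
        if run == quiet_run:
            return session_number
    return None


# Hypothetical codes noted after each interview (illustrative data only).
sessions = [
    {"price", "trust"},
    {"trust", "speed"},
    {"price", "support"},
    {"speed"},
    {"trust", "price"},
    {"support"},
]
counts = new_codes_per_session(sessions)
print(counts)                    # [2, 1, 1, 0, 0, 0]
print(saturation_point(counts))  # 6
```

A count that stays at zero for several sessions in a row suggests that further recruiting is yielding more data but not more information. The threshold is a judgment call and should be set with the study design in mind.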

So… is there a magical number to aim for in qualitative research?

After bearing in mind all the factors mentioned above, there is a number you can aim for. Based on research conducted on this very issue, 30 seems to be a good number for the most comprehensive assessment. Some studies have noted that a sample size as small as 10 can be extremely fruitful and still yield applicable results. This would of course hold only if a rigorous recruiting process is in place.

Our general recommendation for in-depth interviews is to have a sample size of 20-30, if we’re building similar segments within the population. In some cases, a minimum of 10 is acceptable – assuming the population integrity in recruiting.

If you would like to learn more about sample sizes and learn how to set up studies and perfect your moderating and interviewing skills, check out our InterQ Learning Labs training programs.


Author Bio : Sakshi Shetty was an intern with InterQ Research, working as a coder for qualitative data. She now works as a product manager.


Qualitative Studies


Qualitative Research Studies: Introduction

Introduction

Research design determines how research materials will be collected. One or more research methods (for example, experiment, survey, or interview) are chosen depending on the research objectives. In some research contexts, a survey may be suitable. In other instances, interviews, case studies, or observation might be more appropriate. Research design provides insight into how to conduct research using a particular methodology. Every researcher has a list of research questions to address, and the research design is the plan for addressing them.

So research design can be defined as a framework of research methods and techniques that a researcher applies to combine the different elements and components of research in a systematic manner.

Qualitative methods try to gather detailed, rich data that allow for an in-depth understanding of research phenomena. They seek the “why” rather than the “how.”

Qualitative Data Collection

Data obtained using qualitative data collection methods can be used to find new ideas, opportunities, and problems, test their value and accuracy, formulate predictions, explore a certain field in more detail, and explain the numbers obtained using quantitative data collection techniques.

Since qualitative data collection methods usually do not involve numbers and mathematical calculations, qualitative data is often seen as more subjective, but at the same time, it allows a greater depth of understanding.

Aspers, P., Corte, U. What is Qualitative in Qualitative Research .  Qual Sociol   42 , 139–160 (2019). 

Types of Qualitative Studies

Qualitative study methods are semi-structured or unstructured, usually involve small sample sizes and lack strong scientific controls.

Qualitative Study Methods

Qualitative study methods employ many of the same methods as quantitative data collection, except that instead of structured or closed, they are semi- or unstructured and open-ended.  Some of the most common qualitative  study techniques include open-ended surveys and questionnaires, interviews, focus groups, observation, case studies, and so on.

There are generally five types of qualitative data collection:

  • Ethnography research: Involves semi-structured or unstructured interviews with open-ended questions; participant and non-participant observation; collected materials including documents, books, papers, audio, images, videos, etc.
  • Phenomenological research: In-depth interviewing, which involves conducting intensive individual interviews with a small number of respondents to explore their perspectives on a particular idea, program, or situation. The participant interviews may be structured, semi-structured, or unstructured; data may also include reflective journals, written or oral self-reports, and participants’ aesthetic expressions.
  • Grounded theory research: Data collection methods often include in-depth interviews using open-ended questions. Questions can be adjusted as theory emerges. Participant observation and focus groups may also be used as well as collecting and studying …  including documents, books, papers, audio, images, artifacts; videos etc. used by participants in their daily lives.
  • Narrative: Participant or non-participant interview, aesthetic expressions; one’s own and other’s observation; storytelling; letter writing; autobiographic writing; collected materials …..; personal information such as values. Narrative analysis focuses on different elements to make diverse but equally substantial and meaningful interpretations and conclusions. It is a genre of analytical frames used by researchers to interpret information with the context of research shared by all in daily life. 
  • Case study : Focus groups; semi-structured or unstructured interviews with open-ended questions; participant and non-participant observation; collected materials

Nayar, S., & Stanley, D. M. (Eds.). (2015).  Qualitative research methodologies for occupational science and therapy . London: Routledge.

Frank, G., & Polkinghorne, D. (2010). Qualitative Research in Occupational Therapy: From the First to the Second Generation . OTJR (Thorofare, N.J.), 30(2), 51-57.

How To Search for Qualitative Studies

Databases categorize their records using subject terms or controlled vocabularies. These Subject Headings vary for each database.

Medline/PubMed : MeSH Subject Headings

  • Qualitative Research: Any type of research that employs nonnumeric information to explore individual or group characteristics, producing findings not arrived at by statistical procedures or other quantitative means. Includes Document Analysis & Hermeneutics.
  • Interviews as Topic:  Works about conversations with an individual or individuals held in order to obtain information about their background and other personal biographical data, their attitudes and opinions, etc. It includes works about school admission or job interviews.
  • Focus Groups : A method of data collection and a QUALITATIVE RESEARCH tool in which a small group of individuals are brought together and allowed to interact in a discussion of their opinions about topics, issues, or questions.
  • Grounded Theory : The generation of theories from analysis of empirical data.
  • Nursing Methodology Research :  Research carried out by nurses concerning techniques and methods to implement projects and to document information, including methods of interviewing patients, collecting data, and forming inferences. The concept includes exploration of methodological issues such as human subjectivity and human experience.
  • Anecdotes As Topic : Works about brief accounts or narratives of an incident or event.
  • Narration : The act, process, or an instance of narrating, i.e., telling a story. In the context of MEDICINE or ETHICS, narration includes relating the particular and the personal in the life story of an individual.
  • Personal Narratives As Topic:  Works about accounts of individual experience in relation to a particular field or of participation in related activities.
  • Observational Studies As Topic : Works about clinical studies in which participants may receive diagnostic, therapeutic, or other types of interventions, but the investigator does not assign participants to specific interventions (as in an interventional study).

CINAHL (Cumulative Index to Nursing & Allied Health) : CINAHL Subject Headings 

  • Action Research: Research in which problem definition, data collection, factor formulation, planned change, data analysis, and problem redefinition continue in an ongoing cycle.
  • Ethnographic Research: Research which seeks to uncover the symbols and categories that members of a given culture use to interpret their world.
  • Ethnological Research: Comparison and contrasting of cultures and societies as a whole.
  • Ethnonursing Research: The study and analysis of a designated culture's viewpoints, beliefs, and practices about nursing care behavior.
  • Grounded Theory: A qualitative method developed by Glaser and Strauss to unite theory construction and data analysis.
  • Naturalist Inquiry: The use of the natural setting in research to enable understanding the whole rather than only part of the reality being studied.
  • Phenomenological Research: Research designed to discover and understand the meaning of human life experiences.
  • Focus Groups : Small groups of individuals brought together to discuss their opinions regarding specific issues, topics, and questions.
  • Interviews:  Face-to-face or telephone meetings with subjects for the purpose of gathering information.
  • Narratives : Descriptions or interpretations of events, usually in an informal manner. Often used as a data collection method for research. Do not confuse with STORYTELLING, a form of literature or telling a real or imagined story to an audience or listener.
  • Descriptive Research : Research studies that have as their main objective the accurate portrayal of the characteristics of persons, situations, or groups, and the frequency with which certain phenomena occur.
  • Observational Methods:  Methods of data collection in which the investigator witnesses and records behaviors of interest.
  • Projective Techniques : A variety of methods for measuring by providing respondents with unstructured stimuli to which to respond.

In CINAHL, on the Advanced Search page, there are Search Options. Scroll down to the Clinical Queries drop-down box and choose to limit the search to Qualitative-High Sensitivity, Qualitative-High Specificity, or Qualitative-Best Balance. High Sensitivity is the broadest search: it aims to include all relevant material but may also include less relevant material. High Specificity is the most targeted search: it aims to include only the most relevant results but may miss some relevant material. Best Balance retrieves the best balance between sensitivity and specificity.
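The trade-off between the two filter types can be illustrated with the standard definitions of sensitivity and specificity used when search filters are evaluated. The sketch below is an assumption-laden illustration in Python: the function name and the record numbers are hypothetical, and it does not describe how the database computes its Clinical Queries limits.

```python
def filter_performance(retrieved: set, relevant: set, all_records: set) -> dict:
    """Sensitivity and specificity of a search filter against a reference set.

    Assumes `relevant` is non-empty and that `all_records` contains at least
    one non-relevant record, so neither denominator is zero.
    """
    true_pos = len(retrieved & relevant)                 # relevant and found
    false_neg = len(relevant - retrieved)                # relevant but missed
    false_pos = len(retrieved - relevant)                # found but not relevant
    true_neg = len(all_records - retrieved - relevant)   # correctly left out
    return {
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
    }


# Hypothetical database of 20 records, 5 of which are qualitative studies.
all_records = set(range(1, 21))
relevant = {1, 2, 3, 4, 5}

broad = filter_performance({1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, relevant, all_records)
narrow = filter_performance({1, 2, 3}, relevant, all_records)

print(broad)   # finds every relevant record, plus five irrelevant ones
print(narrow)  # retrieves only relevant records, but misses two of them
```

In this made-up example the broad filter has a sensitivity of 1.0 but lets through irrelevant records, while the narrow filter has a specificity of 1.0 but a sensitivity of only 0.6.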

PsycINFO: Subject Headings

  • Grounded Theory
  • Narrative Analysis
  • Thematic Analysis : A qualitative research strategy for identifying, analyzing, and reporting identifiable patterns or clusters within data.
  • Focus Group
  • Focus Group Interview
  • Semi-Structured Interview
  • Interpretive Phenomenological Analysis : A systematic qualitative approach in which a researcher explores how individuals make sense of particular experiences, events, and states, primarily through the analysis of data from structured and semi-structured interviews.
  • Qualitative Measures : Measures or tests employing qualitative methods and/or data, such as narratives, interviews, and focus groups.

As with CINAHL, you can limit to Methodology.  Click on Additional Limits, scroll down to "Methodology" and choose "Qualitative Study", "Focus Groups" or "Interview".

NOTE: Be aware of inconsistent indexing. The above subject headings are not always indexed (i.e., added to articles) for qualitative research, nor is the publication type/methodology. So, to find qualitative articles successfully, you also need to add keywords to your search strategy; if you are getting too few results, leave off the Clinical Queries or Methodology filters.

Free text keywords

Use selective free text keywords to search in Titles, Abstracts or Keywords of records held in the databases to identify Qualitative Research.  Examples:

When searching, do a combination of subject terms and keywords depending on the type of qualitative study you are looking for:

Qualitative Research [MeSH] OR (qualitative AND (research OR study OR method))

(Grounded Theory[MeSH] OR "grounded theory")

then combine it with your topic of interest

post-traumatic stress disorder OR PTSD

brain injury OR TBI OR "traumatic brain injury"
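Search strings like these can also be assembled programmatically, which helps keep parentheses and quotation marks balanced when a strategy grows. The following is a minimal sketch in Python. The helper names (`phrase`, `mesh`, `any_of`, `all_of`) are hypothetical, and the output follows PubMed-style syntax; other databases may need different field tags.

```python
def phrase(term: str) -> str:
    """Quote multi-word keywords so they are searched as an exact phrase."""
    return f'"{term}"' if " " in term else term


def mesh(heading: str) -> str:
    """Tag a subject heading as a MeSH term (PubMed-style field tag)."""
    return f'"{heading}"[MeSH]'


def any_of(*parts: str) -> str:
    """Combine alternatives with OR and wrap them in parentheses."""
    return "(" + " OR ".join(parts) + ")"


def all_of(*parts: str) -> str:
    """Require every concept group with AND."""
    return " AND ".join(parts)


# Method concept: subject heading plus free-text keyword.
method = any_of(mesh("Grounded Theory"), phrase("grounded theory"))
# Topic concept: synonyms for the condition of interest.
topic = any_of(phrase("post-traumatic stress disorder"), "PTSD")

query = all_of(method, topic)
print(query)
# ("Grounded Theory"[MeSH] OR "grounded theory") AND ("post-traumatic stress disorder" OR PTSD)
```

Keeping each concept in its own OR group and joining the groups with AND mirrors the approach described above: one group for the type of qualitative study, one for the topic of interest.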

How to Critically Analyze Qualitative Studies

 A critical analysis of a qualitative study considers the “fit” of the research question with the qualitative method used in the study. There are many checklists available for the assessment of qualitative research studies.  Here are a few:

  • The Joanna Briggs Institute: The Joanna Briggs Institute Critical Appraisal tools for use in JBI Systematic Reviews: Checklist for Qualitative Research
  • CASP:  CASP Checklist: 10 questions to help you make sense of a Qualitative research
  • McMaster University:  Guidelines for Critical Review Form:  Qualitative Studies (Version 2.0) © Letts, L., Wilkins, S., Law, M., Stewart, D., Bosch, J., & Westmorland, M., 2007  

NOTE:  When using these checklists, be sure to use them critically and with careful consideration of the research context.  In other words, use the checklists as the beginning point in assessing the article and then re-assess the article based on whether the findings can be applied in your setting/population/disease/condition.

Additional Resources

Moorley, C., & Cathala, X. (2019). How to appraise qualitative research .  Evidence-Based Nursing ,  22 (1), 10-13.    ( open access)

Stenfors, T., Kajamaa, A. and Bennett, D. (2020), How to … assess the quality of qualitative research . Clin Teach, 17: 596-599.

Greenhalgh, T., & Taylor, R. (1997). How to read a paper: Papers that go beyond numbers (qualitative research). BMJ, 315(7110), 740-743.

Jeanfreau, S. G., & Jack, L., Jr (2010). Appraising qualitative research in health education: guidelines for public health educators.   Health promotion practice ,  11 (5), 612–617. 

Research Series - Critical appraisal of qualitative research when reading papers Jul 22, 2022 Virtual Tutor; Research Series (Elsevier Health Education) YouTube Video 10:04 min [In this episode, Professor Dall'Ora looks at qualitative research in more detail, in particular how to critically appraise qualitative studies.]

Hannes K. Chapter 4: Critical appraisal of qualitative research. In: Noyes J, Booth A, Hannes K, Harden A, Harris J, Lewin S, Lockwood C (editors), Supplementary Guidance for Inclusion of Qualitative Research in Cochrane Systematic Reviews of Interventions. Version 1 (updated August 2011). Cochrane Collaboration Qualitative Methods Group, 2011.

David Tod, Andrew Booth & Brett Smith (2022)  Critical appraisal ,  International Review of Sport and Exercise Psychology, 15:1, 52-72  (open access)

Validity & Reliability in Qualitative Studies

Validity & Reliability

Validity in qualitative research means the “appropriateness” of the tools, processes, and data: are they measuring what they are intended to measure in order to answer the research question? Assessing validity means checking whether the research question is "valid" for the desired outcome, whether the choice of methodology was appropriate for answering the research question, whether the study design was valid for the methodology, whether appropriate sampling and data analysis were used, and, finally, whether the results and conclusions were valid for the sample and within the context of the research question.

In contrast, reliability concerns the degree of consistency in the results when the study is repeated using the same methodology.

The Basics of Validity and Reliability in Research by Joe O'Brian & Anders Orn, Research Collective.com

Brewer, M., & Crano, W. (2014). Research Design and Issues of Validity. In H. Reis & C. Judd (Eds.),  Handbook of Research Methods in Social and Personality Psychology  (pp. 11-26). Cambridge: Cambridge University Press. 

Golafshani, N. (2003). Understanding Reliability and Validity in Qualitative Research.   The Qualitative Report ,  8 (4), 597-606. 

Cypress, Brigitte S. EdD, RN, CCRN. Rigor or Reliability and Validity in Qualitative Research: Perspectives, Strategies, Reconceptualization, and Recommendations . Dimensions of Critical Care Nursing 36(4):p 253-263, 7/8 2017. 

Leung L. (2015). Validity, reliability, and generalizability in qualitative research .  Journal of family medicine and primary care ,  4 (3), 324–327. 

Understanding Reliability and Validity . Writing@CSU

Rosumeck, S., Wagner, M., Wallraf, S., & Euler, U. (2020). A validation study revealed differences in design and performance of search filters for qualitative research in PsycINFO and CINAHL.   Journal of clinical epidemiology ,  128 , 101–108. 

Wagner, M., Rosumeck, S., Küffmeier, C., Döring, K., & Euler, U. (2020). A validation study revealed differences in design and performance of MEDLINE search filters for qualitative research .  Journal of clinical epidemiology ,  120 , 17–24.

Franzel, B., Schwiegershausen, M., Heusser, P.  et al.   How to locate and appraise qualitative research in complementary and alternative medicine.   BMC Complement Altern Med   13 , 125 (2013). 

Finfgeld-Connett, D. and Johnson, E.D. (2013), Literature search strategies for conducting knowledge-building and theory-generating qualitative systematic reviews. Journal of Advanced Nursing, 69: 194-204. 

Rogers, M, Bethel, A, Abbott, R.  Locating qualitative studies in dementia on MEDLINE, EMBASE, CINAHL, and PsycINFO: A comparison of search strategies.   Res Syn Meth . 2018; 9: 579– 586. 

Booth, A. Searching for qualitative research for inclusion in systematic reviews: a structured methodological review .  Syst Rev   5 , 74 (2016). 

Noyes, J., Hannes, K., Booth, A., Harris, J., Harden, A., Popay, J., ... & Pantoja, T. (2015). Qualitative research and Cochrane reviews .

Citing Sources

Citations are brief notations in the body of a research paper that point to a source in the bibliography or references cited section.

If your paper quotes, paraphrases, summarizes the work of someone else, you need to use citations.

Citation style guides such as APA, Chicago and MLA provide detailed instructions on how citations and bibliographies should be formatted.

Health Sciences Research Toolkit

Resources, tips, and guidelines to help you through the research process.

Finding Information

Library Research Checklist Helpful hints for starting a library research project.

Search Strategy Checklist and Tips Helpful tips on how to develop a literature search strategy.

Boolean Operators: A Cheat Sheet Boolean logic (named after mathematician George Boole) is a system of logic designed to yield optimal search results. The Boolean operators, AND, OR, and NOT, help you construct a logical search. Boolean operators act on sets -- groups of records containing a particular word or concept.

Literature Searching Overview and tips on how to conduct a literature search.

Health Statistics and Data Sources Health related statistics and data sources are increasingly available on the Internet. They can be found already neatly packaged, or as raw data sets. The most reliable data comes from governmental sources or health-care professional organizations.

Evaluating Information

Primary, Secondary and Tertiary Sources in the Health Sciences Understand what are considered primary, secondary and tertiary sources.

Scholarly vs Popular Journals/Magazines How to determine what are scholarly journals vs trade or popular magazines.

Identifying Peer-Reviewed Journals A “peer-reviewed” or “refereed” journal is one in which the articles it contains have been examined by people with credentials in the article’s field of study before it is published.

Evaluating Web  Resources When searching for information on the Internet, it is important to be aware of the quality of the information being presented to you. Keep in mind that anyone can host a web site. To be sure that the information you are looking at is credible and of value.

Conducting Research Through An Anti-Racism Lens This guide is for students, staff, and faculty who are incorporating an anti-racist lens at all stages of the research life cycle.

Understanding Research Study Designs Covers case studies, randomized control trials, systematic reviews and meta-analysis.

Qualitative Studies Overview of what is a qualitative study and how to recognize, find and critically appraise.

Writing and Publishing

Citing Sources Citations are brief notations in the body of a research paper that point to a source in the bibliography or references cited section.

Structure of a Research Paper Reports of research studies usually follow the IMRAD format. IMRAD (Introduction, Methods, Results, [and] Discussion) is a mnemonic for the major components of a scientific paper. These elements are included in the overall structure of a research paper.

Top Reasons for Non-Acceptance of Scientific Articles Avoid these mistakes when preparing an article for publication.

Annotated Bibliographies Guide on how to create an annotated bibliography.

Writing guides, Style Manuals and the Publication Process in the Biological and Health Sciences Style manuals, citation guides as well as information on public access policies, copyright and plagiarism.

  • Open access
  • Published: 27 May 2020

How to use and assess qualitative research methods

  • Loraine Busetto   ORCID: orcid.org/0000-0002-9228-7875 1 ,
  • Wolfgang Wick 1 , 2 &
  • Christoph Gumbinger 1  

Neurological Research and Practice volume 2, Article number: 14 (2020)


This paper aims to provide an overview of the use and assessment of qualitative research methods in the health sciences. Qualitative research can be defined as the study of the nature of phenomena and is especially appropriate for answering questions of why something is (not) observed, assessing complex multi-component interventions, and focussing on intervention improvement. The most common methods of data collection are document study, (non-) participant observations, semi-structured interviews and focus groups. For data analysis, field-notes and audio-recordings are transcribed into protocols and transcripts, and coded using qualitative data management software. Criteria such as checklists, reflexivity, sampling strategies, piloting, co-coding, member-checking and stakeholder involvement can be used to enhance and assess the quality of the research conducted. Using qualitative in addition to quantitative designs will equip us with better tools to address a greater range of research problems, and to fill in blind spots in current neurological research and practice.

The aim of this paper is to provide an overview of qualitative research methods, including hands-on information on how they can be used, reported and assessed. This article is intended for beginning qualitative researchers in the health sciences as well as experienced quantitative researchers who wish to broaden their understanding of qualitative research.

What is qualitative research?

Qualitative research is defined as “the study of the nature of phenomena”, including “their quality, different manifestations, the context in which they appear or the perspectives from which they can be perceived”, but excluding “their range, frequency and place in an objectively determined chain of cause and effect” [ 1 ]. This formal definition can be complemented with a more pragmatic rule of thumb: qualitative research generally includes data in the form of words rather than numbers [ 2 ].

Why conduct qualitative research?

Because some research questions cannot be answered using (only) quantitative methods. For example, one Australian study addressed the issue of why patients from Aboriginal communities often present late or not at all to specialist services offered by tertiary care hospitals. Using qualitative interviews with patients and staff, it found one of the most significant access barriers to be transportation problems, including some towns and communities simply not having a bus service to the hospital [ 3 ]. A quantitative study could have measured the number of patients over time or even looked at possible explanatory factors – but only those previously known or suspected to be of relevance. To discover reasons for observed patterns, especially the invisible or surprising ones, qualitative designs are needed.

While qualitative research is common in other fields, it is still relatively underrepresented in health services research. The latter field is more traditionally rooted in the evidence-based-medicine paradigm, as seen in "research that involves testing the effectiveness of various strategies to achieve changes in clinical practice, preferably applying randomised controlled trial study designs (...)" [ 4 ]. This focus on quantitative research and specifically randomised controlled trials (RCT) is visible in the idea of a hierarchy of research evidence which assumes that some research designs are objectively better than others, and that choosing a "lesser" design is only acceptable when the better ones are not practically or ethically feasible [ 5 , 6 ]. Others, however, argue that an objective hierarchy does not exist, and that, instead, the research design and methods should be chosen to fit the specific research question at hand – "questions before methods" [ 2 , 7 , 8 , 9 ]. This means that even when an RCT is possible, some research problems require a different design that is better suited to addressing them. Arguing in JAMA, Berwick uses the example of rapid response teams in hospitals, which he describes as "a complex, multicomponent intervention – essentially a process of social change" susceptible to a range of different context factors including leadership or organisation history. According to him, "[in] such complex terrain, the RCT is an impoverished way to learn. Critics who use it as a truth standard in this context are incorrect" [ 8 ]. Instead of limiting oneself to RCTs, Berwick recommends embracing a wider range of methods, including qualitative ones, which for "these specific applications, (...) are not compromises in learning how to improve; they are superior" [ 8 ].

Research problems that can be approached particularly well using qualitative methods include assessing complex multi-component interventions or systems (of change), addressing questions beyond “what works”, towards “what works for whom when, how and why”, and focussing on intervention improvement rather than accreditation [ 7 , 9 , 10 , 11 , 12 ]. Using qualitative methods can also help shed light on the “softer” side of medical treatment. For example, while quantitative trials can measure the costs and benefits of neuro-oncological treatment in terms of survival rates or adverse effects, qualitative research can help provide a better understanding of patient or caregiver stress, visibility of illness or out-of-pocket expenses.

How to conduct qualitative research?

Given that qualitative research is characterised by flexibility, openness and responsivity to context, the steps of data collection and analysis are not as separate and consecutive as they tend to be in quantitative research [ 13 , 14 ]. As Fossey et al. put it: “sampling, data collection, analysis and interpretation are related to each other in a cyclical (iterative) manner, rather than following one after another in a stepwise approach” [ 15 ]. The researcher can make educated decisions with regard to the choice of methods, how they are implemented, and to which and how many units they are applied [ 13 ]. As shown in Fig. 1, this can involve several back-and-forth steps between data collection and analysis where new insights and experiences can lead to adaptation and expansion of the original plan. Some insights may also necessitate a revision of the research question and/or the research design as a whole. The process ends when saturation is achieved, i.e. when no relevant new information can be found (see also below: sampling and saturation). For reasons of transparency, it is essential for all decisions as well as the underlying reasoning to be well-documented.

Fig. 1. Iterative research process

While it is not always explicitly addressed, qualitative methods reflect a different underlying research paradigm than quantitative research (e.g. constructivism or interpretivism as opposed to positivism). The choice of methods can be based on the respective underlying substantive theory or theoretical framework used by the researcher [ 2 ].

Data collection

The methods of qualitative data collection most commonly used in health research are document study, observations, semi-structured interviews and focus groups [ 1 , 14 , 16 , 17 ].

Document study

Document study (also called document analysis) refers to the review by the researcher of written materials [ 14 ]. These can include personal and non-personal documents such as archives, annual reports, guidelines, policy documents, diaries or letters.

Observations

Observations are particularly useful to gain insights into a certain setting and actual behaviour – as opposed to reported behaviour or opinions [ 13 ]. Qualitative observations can be either participant or non-participant in nature. In participant observations, the observer is part of the observed setting, for example a nurse working in an intensive care unit [ 18 ]. In non-participant observations, the observer is “on the outside looking in”, i.e. present in but not part of the situation, trying not to influence the setting by their presence. Observations can be planned (e.g. for 3 h during the day or night shift) or ad hoc (e.g. as soon as a stroke patient arrives at the emergency room). During the observation, the observer takes notes on everything or certain pre-determined parts of what is happening around them, for example focusing on physician-patient interactions or communication between different professional groups. Written notes can be taken during or after the observations, depending on feasibility (which is usually lower during participant observations) and acceptability (e.g. when the observer is perceived to be judging the observed). Afterwards, these field notes are transcribed into observation protocols. If more than one observer was involved, field notes are taken independently, but notes can be consolidated into one protocol after discussions. Advantages of conducting observations include minimising the distance between the researcher and the researched, the potential discovery of topics that the researcher did not realise were relevant and gaining deeper insights into the real-world dimensions of the research problem at hand [ 18 ].

Semi-structured interviews

Hijmans & Kuyper describe qualitative interviews as “an exchange with an informal character, a conversation with a goal” [ 19 ]. Interviews are used to gain insights into a person’s subjective experiences, opinions and motivations – as opposed to facts or behaviours [ 13 ]. Interviews can be distinguished by the degree to which they are structured (i.e. a questionnaire), open (e.g. free conversation or autobiographical interviews) or semi-structured [ 2 , 13 ]. Semi-structured interviews are characterised by open-ended questions and the use of an interview guide (or topic guide/list) in which the broad areas of interest, sometimes including sub-questions, are defined [ 19 ]. The pre-defined topics in the interview guide can be derived from the literature, previous research or a preliminary method of data collection, e.g. document study or observations. The topic list is usually adapted and improved at the start of the data collection process as the interviewer learns more about the field [ 20 ]. Across interviews the focus on the different (blocks of) questions may differ and some questions may be skipped altogether (e.g. if the interviewee is not able or willing to answer the questions or for concerns about the total length of the interview) [ 20 ]. Qualitative interviews are usually not conducted in written format as this impedes the interactive component of the method [ 20 ]. In comparison to written surveys, qualitative interviews have the advantage of being interactive and allowing for unexpected topics to emerge and to be taken up by the researcher. This can also help overcome a provider- or researcher-centred bias often found in written surveys, which, by nature, can only measure what is already known or expected to be of relevance to the researcher. Interviews can be audio- or video-taped, but sometimes it is only feasible or acceptable for the interviewer to take written notes [ 14 , 16 , 20 ].

Focus groups

Focus groups are group interviews to explore participants’ expertise and experiences, including explorations of how and why people behave in certain ways [ 1 ]. Focus groups usually consist of 6–8 people and are led by an experienced moderator following a topic guide or “script” [ 21 ]. They can involve an observer who takes note of the non-verbal aspects of the situation, possibly using an observation guide [ 21 ]. Depending on researchers’ and participants’ preferences, the discussions can be audio- or video-taped and transcribed afterwards [ 21 ]. Focus groups are useful for bringing together homogeneous (to a lesser extent heterogeneous) groups of participants with relevant expertise and experience on a given topic on which they can share detailed information [ 21 ]. Focus groups are a relatively easy, fast and inexpensive method to gain access to information on interactions in a given group, i.e. “the sharing and comparing” among participants [ 21 ]. Disadvantages include less control over the process and a lesser extent to which each individual may participate. Moreover, focus group moderators need experience, as do those tasked with the analysis of the resulting data. Focus groups can be less appropriate for discussing sensitive topics that participants might be reluctant to disclose in a group setting [ 13 ]. Moreover, attention must be paid to the emergence of “groupthink” as well as possible power dynamics within the group, e.g. when patients are awed or intimidated by health professionals.

Choosing the “right” method

As explained above, the school of thought underlying qualitative research assumes no objective hierarchy of evidence and methods. This means that each choice of single or combined methods has to be based on the research question that needs to be answered and a critical assessment with regard to whether or to what extent the chosen method can accomplish this – i.e. the “fit” between question and method [ 14 ]. It is necessary for these decisions to be documented when they are being made, and to be critically discussed when reporting methods and results.

Let us assume that our research aim is to examine the (clinical) processes around acute endovascular treatment (EVT), from the patient’s arrival at the emergency room to recanalization, with the aim to identify possible causes for delay and/or other causes for sub-optimal treatment outcome. As a first step, we could conduct a document study of the relevant standard operating procedures (SOPs) for this phase of care – are they up-to-date and in line with current guidelines? Do they contain any mistakes, irregularities or uncertainties that could cause delays or other problems? Regardless of the answers to these questions, the results have to be interpreted based on what they are: a written outline of what care processes in this hospital should look like. If we want to know what they actually look like in practice, we can conduct observations of the processes described in the SOPs. These results can (and should) be analysed in themselves, but also in comparison to the results of the document analysis, especially as regards relevant discrepancies. Do the SOPs outline specific tests for which no equipment can be observed or tasks to be performed by specialized nurses who are not present during the observation? It might also be possible that the written SOP is outdated, but the actual care provided is in line with current best practice. In order to find out why these discrepancies exist, it can be useful to conduct interviews. Are the physicians simply not aware of the SOPs (because their existence is limited to the hospital’s intranet) or do they actively disagree with them or does the infrastructure make it impossible to provide the care as described? Another rationale for adding interviews is that some situations (or all of their possible variations for different patient groups or the day, night or weekend shift) cannot practically or ethically be observed. 
In this case, it is possible to ask those involved to report on their actions – being aware that this is not the same as the actual observation. A senior physician’s or hospital manager’s description of certain situations might differ from a nurse’s or junior physician’s one, maybe because they intentionally misrepresent facts or maybe because different aspects of the process are visible or important to them. In some cases, it can also be relevant to consider to whom the interviewee is disclosing this information – someone they trust, someone they are otherwise not connected to, or someone they suspect or are aware of being in a potentially “dangerous” power relationship to them. Lastly, a focus group could be conducted with representatives of the relevant professional groups to explore how and why exactly they provide care around EVT. The discussion might reveal discrepancies (between SOPs and actual care or between different physicians) and motivations to the researchers as well as to the focus group members that they might not have been aware of themselves. For the focus group to deliver relevant information, attention has to be paid to its composition and conduct, for example, to make sure that all participants feel safe to disclose sensitive or potentially problematic information or that the discussion is not dominated by (senior) physicians only. The resulting combination of data collection methods is shown in Fig.  2 .

Fig. 2. Possible combination of data collection methods

Attributions for icons: “Book” by Serhii Smirnov, “Interview” by Adrien Coquet, FR, “Magnifying Glass” by anggun, ID, “Business communication” by Vectors Market; all from the Noun Project

The combination of multiple data sources as described for this example can be referred to as “triangulation”, in which multiple measurements are carried out from different angles to achieve a more comprehensive understanding of the phenomenon under study [ 22 , 23 ].

Data analysis

To analyse the data collected through observations, interviews and focus groups, these need to be transcribed into protocols and transcripts (see Fig. 3). Interviews and focus groups can be transcribed verbatim, with or without annotations for behaviour (e.g. laughing, crying, pausing) and with or without phonetic transcription of dialects and filler words, depending on what is expected or known to be relevant for the analysis. In the next step, the protocols and transcripts are coded, that is, marked (or tagged, labelled) with one or more short descriptors of the content of a sentence or paragraph [ 2 , 15 , 23 ]. Jansen describes coding as “connecting the raw data with ‘theoretical’ terms” [ 20 ]. In a more practical sense, coding makes raw data sortable. This makes it possible to extract and examine all segments describing, say, a tele-neurology consultation from multiple data sources (e.g. SOPs, emergency room observations, staff and patient interviews). In a process of synthesis and abstraction, the codes are then grouped, summarised and/or categorised [ 15 , 20 ]. The end product of the coding or analysis process is a descriptive theory of the behavioural pattern under investigation [ 20 ]. The coding process is performed using qualitative data management software, the most common ones being NVivo, MAXQDA and ATLAS.ti. It should be noted that these are data management tools which support the analysis performed by the researcher(s) [ 14 ].

Fig. 3. From data collection to data analysis

Attributions for icons: see Fig. 2 , also “Speech to text” by Trevor Dsouza, “Field Notes” by Mike O’Brien, US, “Voice Record” by ProSymbols, US, “Inspection” by Made, AU, and “Cloud” by Graphic Tigers; all from the Noun Project
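The statement that coding "makes raw data sortable" can be illustrated with a minimal sketch. It assumes a deliberately simplified data model: the `Segment` class, the code labels and the example texts are all hypothetical and are not taken from any study or from the software packages named above. It shows only how tagged segments from different sources can be retrieved by code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A coded piece of text (hypothetical structure for illustration)."""
    source: str        # where the text comes from, e.g. "SOP" or "observation"
    text: str          # the sentence or paragraph that was coded
    codes: frozenset   # short descriptors assigned by the researcher


def build_code_index(segments):
    """Map each code to the list of segments tagged with it."""
    index = defaultdict(list)
    for segment in segments:
        for code in segment.codes:
            index[code].append(segment)
    return index


# Invented example data, loosely following the EVT example in the text.
segments = [
    Segment("SOP", "Tele-neurology consult is requested before imaging.",
            frozenset({"tele-neurology", "imaging"})),
    Segment("observation", "Video link failed twice; consult done by phone.",
            frozenset({"tele-neurology", "delay"})),
    Segment("staff interview", "We often wait for the transport team.",
            frozenset({"delay", "transport"})),
]

index = build_code_index(segments)

# All sources that contain a segment coded "tele-neurology".
tele_sources = sorted(segment.source for segment in index["tele-neurology"])
print(tele_sources)
```

The index only supports retrieval. Deciding which codes to assign, and how to group them afterwards, remains the analytical work of the researcher(s).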

How to report qualitative research?

Protocols of qualitative research can be published separately and in advance of the study results. However, the aim is not the same as in RCT protocols, i.e. to pre-define and set in stone the research questions and primary or secondary endpoints. Rather, it is a way to describe the research methods in detail, which might not be possible in the results paper given journals’ word limits. Qualitative research papers are usually longer than their quantitative counterparts to allow for deep understanding and so-called “thick description”. In the methods section, the focus is on transparency of the methods used, including why, how and by whom they were implemented in the specific study setting, so as to enable a discussion of whether and how this may have influenced data collection, analysis and interpretation. The results section usually starts with a paragraph outlining the main findings, followed by more detailed descriptions of, for example, the commonalities, discrepancies or exceptions per category [ 20 ]. Here it is important to support main findings by relevant quotations, which may add information, context, emphasis or real-life examples [ 20 , 23 ]. It is subject to debate in the field whether it is relevant to state the exact number or percentage of respondents supporting a certain statement (e.g. “Five interviewees expressed negative feelings towards XYZ”) [ 21 ].

How to combine qualitative with quantitative research?

Qualitative methods can be combined with other methods in multi- or mixed methods designs, which “[employ] two or more different methods […] within the same study or research program rather than confining the research to one single method” [ 24 ]. Reasons for combining methods can be diverse, including triangulation for corroboration of findings, complementarity for illustration and clarification of results, expansion to extend the breadth and range of the study, explanation of (unexpected) results generated with one method with the help of another, or offsetting the weakness of one method with the strength of another [ 1 , 17 , 24 , 25 , 26 ]. The resulting designs can be classified according to when, why and how the different quantitative and/or qualitative data strands are combined. The three most common types of mixed methods designs are the convergent parallel design, the explanatory sequential design and the exploratory sequential design. The designs with examples are shown in Fig. 4.

Fig. 4. Three common mixed methods designs

In the convergent parallel design, a qualitative study is conducted in parallel to and independently of a quantitative study, and the results of both studies are compared and combined at the stage of interpretation of results. Using the above example of EVT provision, this could entail setting up a quantitative EVT registry to measure process times and patient outcomes in parallel to conducting the qualitative research outlined above, and then comparing results. Amongst other things, this would make it possible to assess whether interview respondents’ subjective impressions of patients receiving good care match modified Rankin Scores at follow-up, or whether observed delays in care provision are exceptions or the rule when compared to door-to-needle times as documented in the registry. In the explanatory sequential design, a quantitative study is carried out first, followed by a qualitative study to help explain the results from the quantitative study. This would be an appropriate design if the registry alone had revealed relevant delays in door-to-needle times and the qualitative study would be used to understand where and why these occurred, and how they could be improved. In the exploratory sequential design, the qualitative study is carried out first and its results help inform and build the quantitative study in the next step [ 26 ]. If the qualitative study around EVT provision had shown a high level of dissatisfaction among the staff members involved, a quantitative questionnaire investigating staff satisfaction could be set up in the next step, informed by the qualitative study on which topics dissatisfaction had been expressed. Amongst other things, the questionnaire design would make it possible to widen the reach of the research to more respondents from different (types of) hospitals, regions, countries or settings, and to conduct sub-group analyses for different professional groups.

How to assess qualitative research?

A variety of assessment criteria and lists have been developed for qualitative research, ranging in their focus and comprehensiveness [ 14 , 17 , 27 ]. However, none of these has been elevated to the “gold standard” in the field. In the following, we therefore focus on a set of commonly used assessment criteria that, from a practical standpoint, a researcher can look for when assessing a qualitative research report or paper.

Checklists

Assessors should check the authors’ use of and adherence to the relevant reporting checklists (e.g. Standards for Reporting Qualitative Research (SRQR)) to make sure all items that are relevant for this type of research are addressed [ 23 , 28 ]. Discussions of quantitative measures in addition to or instead of these qualitative measures can be a sign of lower quality of the research (paper). Providing and adhering to a checklist for qualitative research contributes to an important quality criterion for qualitative research, namely transparency [ 15 , 17 , 23 ].

Reflexivity

While methodological transparency and complete reporting are relevant for all types of research, some additional criteria must be taken into account for qualitative research. This includes what is called reflexivity, i.e. sensitivity to the relationship between the researcher and the researched, including how contact was established and maintained, or the background and experience of the researcher(s) involved in data collection and analysis. Depending on the research question and population to be researched this can be limited to professional experience, but it may also include gender, age or ethnicity [ 17 , 27 ]. These details are relevant because in qualitative research, as opposed to quantitative research, the researcher as a person cannot be isolated from the research process [ 23 ]. It may influence the conversation when an interviewed patient speaks to an interviewer who is a physician, or when an interviewee is asked to discuss a gynaecological procedure with a male interviewer, and therefore the reader must be made aware of these details [ 19 ].

Sampling and saturation

The aim of qualitative sampling is for all variants of the objects of observation that are deemed relevant for the study to be present in the sample “to see the issue and its meanings from as many angles as possible” [ 1 , 16 , 19 , 20 , 27 ], and to ensure “information-richness” [ 15 ]. An iterative sampling approach is advised, in which data collection (e.g. five interviews) is followed by data analysis, followed by more data collection to find variants that are lacking in the current sample. This process continues until no new (relevant) information can be found and further sampling becomes redundant – which is called saturation [ 1 , 15 ]. In other words: qualitative data collection finds its end point not a priori, but when the research team determines that saturation has been reached [ 29 , 30 ].
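As a bookkeeping aid for the iterative approach described above, the following sketch tracks which codes appear for the first time in each round of data collection. It rests on assumptions that are not part of the source: the function names, the example codes and, in particular, the stopping rule (a fixed number of consecutive rounds without new codes) are hypothetical. Whether saturation has actually been reached remains a judgement of the research team.

```python
def new_codes_per_round(rounds):
    """For each round, return the codes not seen in any earlier round."""
    seen = set()
    fresh_by_round = []
    for codes in rounds:
        fresh = set(codes) - seen
        fresh_by_round.append(fresh)
        seen |= fresh
    return fresh_by_round


def saturation_round(rounds, quiet_rounds=1):
    """Return the 1-based round that completes `quiet_rounds` consecutive
    rounds without new codes, or None if that never happens.

    The threshold is an illustrative assumption, not an established rule.
    """
    quiet = 0
    for number, fresh in enumerate(new_codes_per_round(rounds), start=1):
        quiet = quiet + 1 if not fresh else 0
        if quiet >= quiet_rounds:
            return number
    return None


# Invented example: codes found in four successive rounds of interviews.
rounds = [
    {"transport", "waiting time", "staffing"},
    {"transport", "communication"},
    {"staffing", "communication"},
    {"waiting time", "transport"},
]

for number, fresh in enumerate(new_codes_per_round(rounds), start=1):
    print(f"Round {number}: {len(fresh)} new code(s)")

print(saturation_round(rounds, quiet_rounds=2))
```

A count of new codes says nothing about their relevance, so a tally like this can support, but not replace, the team's discussion of whether further sampling has become redundant.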

This is also the reason why most qualitative studies use deliberate instead of random sampling strategies. This is generally referred to as “purposive sampling”, in which researchers pre-define which types of participants or cases they need to include so as to cover all variations that are expected to be of relevance, based on the literature, previous experience or theory (i.e. theoretical sampling) [ 14 , 20 ]. Other types of purposive sampling include (but are not limited to) maximum variation sampling, critical case sampling or extreme or deviant case sampling [ 2 ]. In the above EVT example, a purposive sample could include all relevant professional groups and/or all relevant stakeholders (patients, relatives) and/or all relevant times of observation (day, night and weekend shift).
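One way to keep track of whether a purposive sample covers the pre-defined variations is to list the combinations that have not yet been sampled. The sketch below assumes, purely for illustration, that the relevant variations can be expressed as a small grid of professional group by shift. The dimension names, values and function name are hypothetical.

```python
from itertools import product


def uncovered_cells(dimensions, sampled):
    """Return the combinations of dimension values missing from the sample."""
    wanted = set(product(*dimensions.values()))
    return sorted(wanted - set(sampled))


# Hypothetical variations deemed relevant for the EVT example.
dimensions = {
    "group": ["nurse", "physician"],
    "shift": ["day", "night", "weekend"],
}

# Invented record of observations or interviews carried out so far.
sampled = [("nurse", "day"), ("physician", "day"), ("physician", "night")]

missing = uncovered_cells(dimensions, sampled)
print(missing)
```

In practice the set of relevant variations is itself revised as analysis proceeds, so a grid like this would need to be updated between rounds of data collection.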

Assessors of qualitative research should check whether the considerations underlying the sampling strategy were sound and whether or how researchers tried to adapt and improve their strategies in stepwise or cyclical approaches between data collection and analysis to achieve saturation [ 14 ].

Piloting

Good qualitative research is iterative in nature, i.e. it goes back and forth between data collection and analysis, revising and improving the approach where necessary. One example of this is the pilot interview, in which different aspects of the interview (especially the interview guide, but also, for example, the site of the interview or whether the interview can be audio-recorded) are tested with a small number of respondents, evaluated and revised [ 19 ]. In doing so, the interviewer learns which wording or types of questions work best, or which is the best length of an interview with patients who have trouble concentrating for an extended time. Of course, the same reasoning applies to observations or focus groups, which can also be piloted.

Co-coding

Ideally, coding should be performed by at least two researchers, especially at the beginning of the coding process when a common approach must be defined, including the establishment of a useful coding list (or tree), and when a common meaning of individual codes must be established [ 23 ]. An initial sub-set or all transcripts can be coded independently by the coders and then compared and consolidated after regular discussions in the research team. This is to make sure that codes are applied consistently to the research data.

Member checking

Member checking, also called respondent validation, refers to the practice of checking back with study respondents to see if the research is in line with their views [ 14 , 27 ]. This can happen after data collection or analysis or when first results are available [ 23 ]. For example, interviewees can be provided with (summaries of) their transcripts and asked whether they believe this to be a complete representation of their views or whether they would like to clarify or elaborate on their responses [ 17 ]. Respondents’ feedback on these issues then becomes part of the data collection and analysis [ 27 ].

Stakeholder involvement

In those niches where qualitative approaches have been able to evolve and grow, a new trend has seen the inclusion of patients and their representatives not only as study participants (i.e. “members”, see above) but as consultants to and active participants in the broader research process [ 31 , 32 , 33 ]. The underlying assumption is that patients and other stakeholders hold unique perspectives and experiences that add value beyond their own single story, making the research more relevant and beneficial to researchers, study participants and (future) patients alike [ 34 , 35 ]. Using the example of patients on or nearing dialysis, a recent scoping review found that 80% of clinical research did not address the top 10 research priorities identified by patients and caregivers [ 32 , 36 ]. In this sense, the involvement of the relevant stakeholders, especially patients and relatives, is increasingly being seen as a quality indicator in and of itself.

How not to assess qualitative research

The above overview does not include certain items that are routine in assessments of quantitative research. What follows is a non-exhaustive, non-representative, experience-based list of the quantitative criteria often applied to the assessment of qualitative research, as well as an explanation of the limited usefulness of these endeavours.

Protocol adherence

Given the openness and flexibility of qualitative research, it should not be assessed by how well it adheres to pre-determined and fixed strategies – in other words: its rigidity. Instead, the assessor should look for signs of adaptation and refinement based on lessons learned from earlier steps in the research process.

Sample size

For the reasons explained above, qualitative research does not require specific sample sizes, nor does it require that the sample size be determined a priori [ 1 , 14 , 27 , 37 , 38 , 39 ]. Sample size can only be a useful quality indicator when related to the research purpose, the chosen methodology and the composition of the sample, i.e. who was included and why.

Randomisation

While some authors argue that randomisation can be used in qualitative research, this is not commonly the case, as neither its feasibility nor its necessity or usefulness has been convincingly established for qualitative research [ 13 , 27 ]. Relevant disadvantages include the negative impact of an overly large sample size as well as the possibility (or probability) of selecting “quiet, uncooperative or inarticulate individuals” [ 17 ]. Qualitative studies do not use control groups, either.

Interrater reliability, variability and other “objectivity checks”

The concept of “interrater reliability” is sometimes used in qualitative research to assess the extent to which the coding approach overlaps between the two co-coders. However, it is not clear what this measure tells us about the quality of the analysis [ 23 ]. This means that these scores can be included in qualitative research reports, preferably with some additional information on what the score means for the analysis, but it is not a requirement. Relatedly, it is not relevant for the quality or “objectivity” of qualitative research to separate those who recruited the study participants and collected and analysed the data. Experience even shows that it might be better to have the same person or team perform all of these tasks [ 20 ]. First, when researchers introduce themselves during recruitment this can enhance trust when the interview takes place days or weeks later with the same researcher. Second, when the audio-recording is transcribed for analysis, the researcher conducting the interviews will usually remember the interviewee and the specific interview situation during data analysis. This might be helpful in providing additional context information for interpretation of data, e.g. on whether something might have been meant as a joke [ 18 ].
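For readers who encounter such scores in a report, the sketch below shows how one common agreement statistic, Cohen's kappa, is computed for two coders who each assigned exactly one code per segment. The choice of Cohen's kappa is an assumption made here for illustration, since the text does not name a specific statistic, and the code labels are invented.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one code per segment."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must rate the same, non-empty set of segments.")
    n = len(coder_a)
    # Share of segments on which both coders chose the same code.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's own code frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Invented example: codes assigned to six segments by two co-coders.
coder_a = ["delay", "delay", "transport", "staffing", "delay", "transport"]
coder_b = ["delay", "transport", "transport", "staffing", "delay", "delay"]

kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3))
```

As argued above, a single number of this kind says little about the quality of the analysis. The segments on which the coders disagree are usually more informative, because they show where the meaning of a code still needs to be discussed.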

Not being quantitative research

Being qualitative research instead of quantitative research should not be used as an assessment criterion if it is applied irrespective of the research problem at hand. Similarly, qualitative research should not be required to be combined with quantitative research per se – unless mixed methods research is judged as inherently better than single-method research. In this case, the same criterion should be applied for quantitative studies without a qualitative component.

The main take-away points of this paper are summarised in Table 1. We aimed to show that, if conducted well, qualitative research can answer specific research questions that cannot be adequately answered using (only) quantitative designs. Seeing qualitative and quantitative methods as equal will help us become more aware and critical of the “fit” between the research problem and our chosen methods: I can conduct an RCT to determine the reasons for transportation delays of acute stroke patients – but should I? It also provides us with a greater range of tools to tackle a greater range of research problems more appropriately and successfully, filling in the blind spots on one half of the methodological spectrum to better address the whole complexity of neurological research and practice.

Availability of data and materials

Not applicable.

Abbreviations

EVT: Endovascular treatment

RCT: Randomised Controlled Trial

SOP: Standard Operating Procedure

SRQR: Standards for Reporting Qualitative Research

Philipsen, H., & Vernooij-Dassen, M. (2007). Kwalitatief onderzoek: nuttig, onmisbaar en uitdagend. In L. PLBJ & H. TCo (Eds.), Kwalitatief onderzoek: Praktische methoden voor de medische praktijk . [Qualitative research: useful, indispensable and challenging. In: Qualitative research: Practical methods for medical practice (pp. 5–12). Houten: Bohn Stafleu van Loghum.

Chapter   Google Scholar  

Punch, K. F. (2013). Introduction to social research: Quantitative and qualitative approaches . London: Sage.

Kelly, J., Dwyer, J., Willis, E., & Pekarsky, B. (2014). Travelling to the city for hospital care: Access factors in country aboriginal patient journeys. Australian Journal of Rural Health, 22 (3), 109–113.

Article   Google Scholar  

Nilsen, P., Ståhl, C., Roback, K., & Cairney, P. (2013). Never the twain shall meet? - a comparison of implementation science and policy implementation research. Implementation Science, 8 (1), 1–12.

Howick J, Chalmers I, Glasziou, P., Greenhalgh, T., Heneghan, C., Liberati, A., Moschetti, I., Phillips, B., & Thornton, H. (2011). The 2011 Oxford CEBM evidence levels of evidence (introductory document) . Oxford Center for Evidence Based Medicine. https://www.cebm.net/2011/06/2011-oxford-cebm-levels-evidence-introductory-document/ .

Eakin, J. M. (2016). Educating critical qualitative health researchers in the land of the randomized controlled trial. Qualitative Inquiry, 22 (2), 107–118.

May, A., & Mathijssen, J. (2015). Alternatieven voor RCT bij de evaluatie van effectiviteit van interventies!? Eindrapportage. In Alternatives for RCTs in the evaluation of effectiveness of interventions!? Final report .

Google Scholar  

Berwick, D. M. (2008). The science of improvement. Journal of the American Medical Association, 299 (10), 1182–1184.

Article   CAS   Google Scholar  

Christ, T. W. (2014). Scientific-based research and randomized controlled trials, the “gold” standard? Alternative paradigms and mixed methodologies. Qualitative Inquiry, 20 (1), 72–80.

Lamont, T., Barber, N., Jd, P., Fulop, N., Garfield-Birkbeck, S., Lilford, R., Mear, L., Raine, R., & Fitzpatrick, R. (2016). New approaches to evaluating complex health and care systems. BMJ, 352:i154.

Drabble, S. J., & O’Cathain, A. (2015). Moving from Randomized Controlled Trials to Mixed Methods Intervention Evaluation. In S. Hesse-Biber & R. B. Johnson (Eds.), The Oxford Handbook of Multimethod and Mixed Methods Research Inquiry (pp. 406–425). London: Oxford University Press.

Chambers, D. A., Glasgow, R. E., & Stange, K. C. (2013). The dynamic sustainability framework: Addressing the paradox of sustainment amid ongoing change. Implementation Science : IS, 8 , 117.

Hak, T. (2007). Waarnemingsmethoden in kwalitatief onderzoek. In L. PLBJ & H. TCo (Eds.), Kwalitatief onderzoek: Praktische methoden voor de medische praktijk . [Observation methods in qualitative research] (pp. 13–25). Houten: Bohn Stafleu van Loghum.

Russell, C. K., & Gregory, D. M. (2003). Evaluation of qualitative research studies. Evidence Based Nursing, 6 (2), 36–40.

Fossey, E., Harvey, C., McDermott, F., & Davidson, L. (2002). Understanding and evaluating qualitative research. Australian and New Zealand Journal of Psychiatry, 36 , 717–732.

Yanow, D. (2000). Conducting interpretive policy analysis (Vol. 47). Thousand Oaks: Sage University Papers Series on Qualitative Research Methods.

Shenton, A. K. (2004). Strategies for ensuring trustworthiness in qualitative research projects. Education for Information, 22 , 63–75.

van der Geest, S. (2006). Participeren in ziekte en zorg: meer over kwalitatief onderzoek. Huisarts en Wetenschap, 49 (4), 283–287.

Hijmans, E., & Kuyper, M. (2007). Het halfopen interview als onderzoeksmethode. In L. PLBJ & H. TCo (Eds.), Kwalitatief onderzoek: Praktische methoden voor de medische praktijk . [The half-open interview as research method] (pp. 43–51). Houten: Bohn Stafleu van Loghum.

Jansen, H. (2007). Systematiek en toepassing van de kwalitatieve survey. In L. PLBJ & H. TCo (Eds.), Kwalitatief onderzoek: Praktische methoden voor de medische praktijk . [Systematics and implementation of the qualitative survey] (pp. 27–41). Houten: Bohn Stafleu van Loghum.

Pv, R., & Peremans, L. (2007). Exploreren met focusgroepgesprekken: de ‘stem’ van de groep onder de loep. In L. PLBJ & H. TCo (Eds.), Kwalitatief onderzoek: Praktische methoden voor de medische praktijk . [Exploring with focus group conversations: the “voice” of the group under the magnifying glass] (pp. 53–64). Houten: Bohn Stafleu van Loghum.

Carter, N., Bryant-Lukosius, D., DiCenso, A., Blythe, J., & Neville, A. J. (2014). The use of triangulation in qualitative research. Oncology Nursing Forum, 41 (5), 545–547.

Boeije, H. (2012). Analyseren in kwalitatief onderzoek: Denken en doen [Analysis in qualitative research: Thinking and doing]. Den Haag: Boom Lemma uitgevers.

Hunter, A., & Brewer, J. (2015). Designing Multimethod Research. In S. Hesse-Biber & R. B. Johnson (Eds.), The Oxford Handbook of Multimethod and Mixed Methods Research Inquiry (pp. 185–205). London: Oxford University Press.

Archibald, M. M., Radil, A. I., Zhang, X., & Hanson, W. E. (2015). Current mixed methods practices in qualitative research: A content analysis of leading journals. International Journal of Qualitative Methods, 14 (2), 5–33.

Creswell, J. W., & Plano Clark, V. L. (2011). Choosing a Mixed Methods Design. In Designing and Conducting Mixed Methods Research . Thousand Oaks: SAGE Publications.

Mays, N., & Pope, C. (2000). Assessing quality in qualitative research. BMJ, 320 (7226), 50–52.

O'Brien, B. C., Harris, I. B., Beckman, T. J., Reed, D. A., & Cook, D. A. (2014). Standards for reporting qualitative research: A synthesis of recommendations. Academic Medicine : Journal of the Association of American Medical Colleges, 89 (9), 1245–1251.

Saunders, B., Sim, J., Kingstone, T., Baker, S., Waterfield, J., Bartlam, B., Burroughs, H., & Jinks, C. (2018). Saturation in qualitative research: Exploring its conceptualization and operationalization. Quality and Quantity, 52 (4), 1893–1907.

Moser, A., & Korstjens, I. (2018). Series: Practical guidance to qualitative research. Part 3: Sampling, data collection and analysis. European Journal of General Practice, 24 (1), 9–18.

Marlett, N., Shklarov, S., Marshall, D., Santana, M. J., & Wasylak, T. (2015). Building new roles and relationships in research: A model of patient engagement research. Quality of Life Research : an international journal of quality of life aspects of treatment, care and rehabilitation, 24 (5), 1057–1067.

Demian, M. N., Lam, N. N., Mac-Way, F., Sapir-Pichhadze, R., & Fernandez, N. (2017). Opportunities for engaging patients in kidney research. Canadian Journal of Kidney Health and Disease, 4 , 2054358117703070–2054358117703070.

Noyes, J., McLaughlin, L., Morgan, K., Roberts, A., Stephens, M., Bourne, J., Houlston, M., Houlston, J., Thomas, S., Rhys, R. G., et al. (2019). Designing a co-productive study to overcome known methodological challenges in organ donation research with bereaved family members. Health Expectations . 22(4):824–35.

Piil, K., Jarden, M., & Pii, K. H. (2019). Research agenda for life-threatening cancer. European Journal Cancer Care (Engl), 28 (1), e12935.

Hofmann, D., Ibrahim, F., Rose, D., Scott, D. L., Cope, A., Wykes, T., & Lempp, H. (2015). Expectations of new treatment in rheumatoid arthritis: Developing a patient-generated questionnaire. Health Expectations : an international journal of public participation in health care and health policy, 18 (5), 995–1008.

Jun, M., Manns, B., Laupacis, A., Manns, L., Rehal, B., Crowe, S., & Hemmelgarn, B. R. (2015). Assessing the extent to which current clinical research is consistent with patient priorities: A scoping review using a case study in patients on or nearing dialysis. Canadian Journal of Kidney Health and Disease, 2 , 35.

Elsie Baker, S., & Edwards, R. (2012). How many qualitative interviews is enough? In National Centre for Research Methods Review Paper . National Centre for Research Methods. http://eprints.ncrm.ac.uk/2273/4/how_many_interviews.pdf .

Sandelowski, M. (1995). Sample size in qualitative research. Research in Nursing & Health, 18 (2), 179–183.

Sim, J., Saunders, B., Waterfield, J., & Kingstone, T. (2018). Can sample size in qualitative research be determined a priori? International Journal of Social Research Methodology, 21 (5), 619–634.

Acknowledgements

No external funding.

Author information

Authors and affiliations.

Department of Neurology, Heidelberg University Hospital, Im Neuenheimer Feld 400, 69120, Heidelberg, Germany

Loraine Busetto, Wolfgang Wick & Christoph Gumbinger

Clinical Cooperation Unit Neuro-Oncology, German Cancer Research Center, Heidelberg, Germany

Wolfgang Wick

Contributions

LB drafted the manuscript; WW and CG revised the manuscript; all authors approved the final versions.

Corresponding author

Correspondence to Loraine Busetto .

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Busetto, L., Wick, W. & Gumbinger, C. How to use and assess qualitative research methods. Neurol. Res. Pract. 2 , 14 (2020). https://doi.org/10.1186/s42466-020-00059-z

Received : 30 January 2020

Accepted : 22 April 2020

Published : 27 May 2020

DOI : https://doi.org/10.1186/s42466-020-00059-z

  • Qualitative research
  • Mixed methods
  • Quality assessment

Neurological Research and Practice

ISSN: 2524-3489

How to use and assess qualitative research methods

Loraine Busetto

1 Department of Neurology, Heidelberg University Hospital, Im Neuenheimer Feld 400, 69120 Heidelberg, Germany

Wolfgang Wick

2 Clinical Cooperation Unit Neuro-Oncology, German Cancer Research Center, Heidelberg, Germany

Christoph Gumbinger

Associated data.

Not applicable.

This paper aims to provide an overview of the use and assessment of qualitative research methods in the health sciences. Qualitative research can be defined as the study of the nature of phenomena and is especially appropriate for answering questions of why something is (not) observed, assessing complex multi-component interventions, and focussing on intervention improvement. The most common methods of data collection are document study, (non-) participant observations, semi-structured interviews and focus groups. For data analysis, field-notes and audio-recordings are transcribed into protocols and transcripts, and coded using qualitative data management software. Criteria such as checklists, reflexivity, sampling strategies, piloting, co-coding, member-checking and stakeholder involvement can be used to enhance and assess the quality of the research conducted. Using qualitative in addition to quantitative designs will equip us with better tools to address a greater range of research problems, and to fill in blind spots in current neurological research and practice.

The aim of this paper is to provide an overview of qualitative research methods, including hands-on information on how they can be used, reported and assessed. This article is intended for beginning qualitative researchers in the health sciences as well as experienced quantitative researchers who wish to broaden their understanding of qualitative research.

What is qualitative research?

Qualitative research is defined as “the study of the nature of phenomena”, including “their quality, different manifestations, the context in which they appear or the perspectives from which they can be perceived”, but excluding “their range, frequency and place in an objectively determined chain of cause and effect” [ 1 ]. This formal definition can be complemented with a more pragmatic rule of thumb: qualitative research generally includes data in the form of words rather than numbers [ 2 ].

Why conduct qualitative research?

Because some research questions cannot be answered using (only) quantitative methods. For example, one Australian study addressed the issue of why patients from Aboriginal communities often present late or not at all to specialist services offered by tertiary care hospitals. Using qualitative interviews with patients and staff, it found one of the most significant access barriers to be transportation problems, including some towns and communities simply not having a bus service to the hospital [ 3 ]. A quantitative study could have measured the number of patients over time or even looked at possible explanatory factors – but only those previously known or suspected to be of relevance. To discover reasons for observed patterns, especially the invisible or surprising ones, qualitative designs are needed.

While qualitative research is common in other fields, it is still relatively underrepresented in health services research. The latter field is more traditionally rooted in the evidence-based-medicine paradigm, as seen in "research that involves testing the effectiveness of various strategies to achieve changes in clinical practice, preferably applying randomised controlled trial study designs (...)" [ 4 ]. This focus on quantitative research and specifically randomised controlled trials (RCT) is visible in the idea of a hierarchy of research evidence which assumes that some research designs are objectively better than others, and that choosing a "lesser" design is only acceptable when the better ones are not practically or ethically feasible [ 5 , 6 ]. Others, however, argue that an objective hierarchy does not exist, and that, instead, the research design and methods should be chosen to fit the specific research question at hand – "questions before methods" [ 2 , 7 – 9 ]. This means that even when an RCT is possible, some research problems require a different design that is better suited to addressing them. Arguing in JAMA, Berwick uses the example of rapid response teams in hospitals, which he describes as "a complex, multicomponent intervention – essentially a process of social change" susceptible to a range of different context factors including leadership or organisation history. According to him, "[in] such complex terrain, the RCT is an impoverished way to learn. Critics who use it as a truth standard in this context are incorrect" [ 8 ]. Instead of limiting oneself to RCTs, Berwick recommends embracing a wider range of methods, including qualitative ones, which for "these specific applications, (...) are not compromises in learning how to improve; they are superior" [ 8 ].

Research problems that can be approached particularly well using qualitative methods include assessing complex multi-component interventions or systems (of change), addressing questions beyond “what works”, towards “what works for whom when, how and why”, and focussing on intervention improvement rather than accreditation [ 7 , 9 – 12 ]. Using qualitative methods can also help shed light on the “softer” side of medical treatment. For example, while quantitative trials can measure the costs and benefits of neuro-oncological treatment in terms of survival rates or adverse effects, qualitative research can help provide a better understanding of patient or caregiver stress, visibility of illness or out-of-pocket expenses.

How to conduct qualitative research?

Given that qualitative research is characterised by flexibility, openness and responsivity to context, the steps of data collection and analysis are not as separate and consecutive as they tend to be in quantitative research [ 13 , 14 ]. As Fossey et al. put it: “sampling, data collection, analysis and interpretation are related to each other in a cyclical (iterative) manner, rather than following one after another in a stepwise approach” [ 15 ]. The researcher can make educated decisions with regard to the choice of methods, how they are implemented, and to which and how many units they are applied [ 13 ]. As shown in Fig. 1, this can involve several back-and-forth steps between data collection and analysis where new insights and experiences can lead to adaptation and expansion of the original plan. Some insights may also necessitate a revision of the research question and/or the research design as a whole. The process ends when saturation is achieved, i.e. when no relevant new information can be found (see also below: sampling and saturation). For reasons of transparency, it is essential for all decisions as well as the underlying reasoning to be well-documented.
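The saturation criterion can be made more tangible by counting, after each round of data collection, how many codes appear for the first time. The following is a minimal sketch rather than a prescribed procedure: the function names, the stopping rule (two consecutive sessions without new codes) and the example codes are all assumptions made for illustration, and in practice the judgement on saturation remains with the researcher.

```python
from typing import Iterable, List, Set


def new_codes_per_session(sessions: Iterable[Set[str]]) -> List[int]:
    """For each session, in order, count the codes not seen in earlier sessions."""
    seen: Set[str] = set()
    counts: List[int] = []
    for codes in sessions:
        fresh = codes - seen          # codes that appear for the first time
        counts.append(len(fresh))
        seen |= fresh
    return counts


def saturation_reached(counts: List[int], window: int = 2) -> bool:
    """Hypothetical rule: the last `window` sessions contributed no new codes."""
    return len(counts) >= window and all(c == 0 for c in counts[-window:])


# Invented codes assigned to four interviews (illustration only).
interviews = [
    {"transport", "waiting time"},
    {"transport", "staff shortage"},
    {"waiting time"},
    {"staff shortage", "transport"},
]
counts = new_codes_per_session(interviews)
print(counts)                      # new codes per interview
print(saturation_reached(counts))  # whether the assumed rule is met
```

A rule of this kind only flags that no new codes are emerging; whether the information still being collected is relevant has to be assessed against the research question.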

Fig. 1 Iterative research process

While it is not always explicitly addressed, qualitative methods reflect a different underlying research paradigm than quantitative research (e.g. constructivism or interpretivism as opposed to positivism). The choice of methods can be based on the respective underlying substantive theory or theoretical framework used by the researcher [ 2 ].

Data collection

The methods of qualitative data collection most commonly used in health research are document study, observations, semi-structured interviews and focus groups [ 1 , 14 , 16 , 17 ].

Document study

Document study (also called document analysis) refers to the review by the researcher of written materials [ 14 ]. These can include personal and non-personal documents such as archives, annual reports, guidelines, policy documents, diaries or letters.

Observations

Observations are particularly useful to gain insights into a certain setting and actual behaviour – as opposed to reported behaviour or opinions [ 13 ]. Qualitative observations can be either participant or non-participant in nature. In participant observations, the observer is part of the observed setting, for example a nurse working in an intensive care unit [ 18 ]. In non-participant observations, the observer is “on the outside looking in”, i.e. present in but not part of the situation, trying not to influence the setting by their presence. Observations can be planned (e.g. for 3 h during the day or night shift) or ad hoc (e.g. as soon as a stroke patient arrives at the emergency room). During the observation, the observer takes notes on everything or certain pre-determined parts of what is happening around them, for example focusing on physician-patient interactions or communication between different professional groups. Written notes can be taken during or after the observations, depending on feasibility (which is usually lower during participant observations) and acceptability (e.g. when the observer is perceived to be judging the observed). Afterwards, these field notes are transcribed into observation protocols. If more than one observer was involved, field notes are taken independently, but notes can be consolidated into one protocol after discussions. Advantages of conducting observations include minimising the distance between the researcher and the researched, the potential discovery of topics that the researcher did not realise were relevant and gaining deeper insights into the real-world dimensions of the research problem at hand [ 18 ].

Semi-structured interviews

Hijmans & Kuyper describe qualitative interviews as “an exchange with an informal character, a conversation with a goal” [ 19 ]. Interviews are used to gain insights into a person’s subjective experiences, opinions and motivations – as opposed to facts or behaviours [ 13 ]. Interviews can be distinguished by the degree to which they are structured (i.e. a questionnaire), open (e.g. free conversation or autobiographical interviews) or semi-structured [ 2 , 13 ]. Semi-structured interviews are characterised by open-ended questions and the use of an interview guide (or topic guide/list) in which the broad areas of interest, sometimes including sub-questions, are defined [ 19 ]. The pre-defined topics in the interview guide can be derived from the literature, previous research or a preliminary method of data collection, e.g. document study or observations. The topic list is usually adapted and improved at the start of the data collection process as the interviewer learns more about the field [ 20 ]. Across interviews the focus on the different (blocks of) questions may differ and some questions may be skipped altogether (e.g. if the interviewee is not able or willing to answer the questions or for concerns about the total length of the interview) [ 20 ]. Qualitative interviews are usually not conducted in written format as it impedes the interactive component of the method [ 20 ]. In comparison to written surveys, qualitative interviews have the advantage of being interactive and allowing for unexpected topics to emerge and to be taken up by the researcher. This can also help overcome a provider- or researcher-centred bias often found in written surveys, which, by nature, can only measure what is already known or expected to be of relevance to the researcher. Interviews can be audio- or video-taped; but sometimes it is only feasible or acceptable for the interviewer to take written notes [ 14 , 16 , 20 ].

Focus groups

Focus groups are group interviews to explore participants’ expertise and experiences, including explorations of how and why people behave in certain ways [ 1 ]. Focus groups usually consist of 6–8 people and are led by an experienced moderator following a topic guide or “script” [ 21 ]. They can involve an observer who takes note of the non-verbal aspects of the situation, possibly using an observation guide [ 21 ]. Depending on researchers’ and participants’ preferences, the discussions can be audio- or video-taped and transcribed afterwards [ 21 ]. Focus groups are useful for bringing together homogeneous (to a lesser extent heterogeneous) groups of participants with relevant expertise and experience on a given topic on which they can share detailed information [ 21 ]. Focus groups are a relatively easy, fast and inexpensive method to gain access to information on interactions in a given group, i.e. “the sharing and comparing” among participants [ 21 ]. Disadvantages include less control over the process and a lesser extent to which each individual may participate. Moreover, focus group moderators need experience, as do those tasked with the analysis of the resulting data. Focus groups can be less appropriate for discussing sensitive topics that participants might be reluctant to disclose in a group setting [ 13 ]. Moreover, attention must be paid to the emergence of “groupthink” as well as possible power dynamics within the group, e.g. when patients are awed or intimidated by health professionals.

Choosing the “right” method

As explained above, the school of thought underlying qualitative research assumes no objective hierarchy of evidence and methods. This means that each choice of single or combined methods has to be based on the research question that needs to be answered and a critical assessment with regard to whether or to what extent the chosen method can accomplish this – i.e. the “fit” between question and method [ 14 ]. It is necessary for these decisions to be documented when they are being made, and to be critically discussed when reporting methods and results.

Let us assume that our research aim is to examine the (clinical) processes around acute endovascular treatment (EVT), from the patient’s arrival at the emergency room to recanalization, with the aim to identify possible causes for delay and/or other causes for sub-optimal treatment outcome. As a first step, we could conduct a document study of the relevant standard operating procedures (SOPs) for this phase of care – are they up-to-date and in line with current guidelines? Do they contain any mistakes, irregularities or uncertainties that could cause delays or other problems? Regardless of the answers to these questions, the results have to be interpreted based on what they are: a written outline of what care processes in this hospital should look like. If we want to know what they actually look like in practice, we can conduct observations of the processes described in the SOPs. These results can (and should) be analysed in themselves, but also in comparison to the results of the document analysis, especially as regards relevant discrepancies. Do the SOPs outline specific tests for which no equipment can be observed or tasks to be performed by specialized nurses who are not present during the observation? It might also be possible that the written SOP is outdated, but the actual care provided is in line with current best practice. In order to find out why these discrepancies exist, it can be useful to conduct interviews. Are the physicians simply not aware of the SOPs (because their existence is limited to the hospital’s intranet) or do they actively disagree with them or does the infrastructure make it impossible to provide the care as described? Another rationale for adding interviews is that some situations (or all of their possible variations for different patient groups or the day, night or weekend shift) cannot practically or ethically be observed. 
In this case, it is possible to ask those involved to report on their actions – being aware that this is not the same as the actual observation. A senior physician’s or hospital manager’s description of certain situations might differ from a nurse’s or junior physician’s one, maybe because they intentionally misrepresent facts or maybe because different aspects of the process are visible or important to them. In some cases, it can also be relevant to consider to whom the interviewee is disclosing this information – someone they trust, someone they are otherwise not connected to, or someone they suspect or are aware of being in a potentially “dangerous” power relationship to them. Lastly, a focus group could be conducted with representatives of the relevant professional groups to explore how and why exactly they provide care around EVT. The discussion might reveal discrepancies (between SOPs and actual care or between different physicians) and motivations to the researchers as well as to the focus group members that they might not have been aware of themselves. For the focus group to deliver relevant information, attention has to be paid to its composition and conduct, for example, to make sure that all participants feel safe to disclose sensitive or potentially problematic information or that the discussion is not dominated by (senior) physicians only. The resulting combination of data collection methods is shown in Fig.  2 .

Fig. 2 Possible combination of data collection methods

Attributions for icons: “Book” by Serhii Smirnov, “Interview” by Adrien Coquet, FR, “Magnifying Glass” by anggun, ID, “Business communication” by Vectors Market; all from the Noun Project

The combination of multiple data sources as described for this example can be referred to as “triangulation”, in which multiple measurements are carried out from different angles to achieve a more comprehensive understanding of the phenomenon under study [ 22 , 23 ].

Data analysis

To analyse the data collected through observations, interviews and focus groups, these need to be transcribed into protocols and transcripts (see Fig. 3). Interviews and focus groups can be transcribed verbatim, with or without annotations for behaviour (e.g. laughing, crying, pausing) and with or without phonetic transcription of dialects and filler words, depending on what is expected or known to be relevant for the analysis. In the next step, the protocols and transcripts are coded, that is, marked (or tagged, labelled) with one or more short descriptors of the content of a sentence or paragraph [ 2 , 15 , 23 ]. Jansen describes coding as “connecting the raw data with ‘theoretical’ terms” [ 20 ]. In a more practical sense, coding makes raw data sortable. This makes it possible to extract and examine all segments describing, say, a tele-neurology consultation from multiple data sources (e.g. SOPs, emergency room observations, staff and patient interviews). In a process of synthesis and abstraction, the codes are then grouped, summarised and/or categorised [ 15 , 20 ]. The end product of the coding or analysis process is a descriptive theory of the behavioural pattern under investigation [ 20 ]. The coding process is performed using qualitative data management software, the most common ones being NVivo, MAXQDA and ATLAS.ti. It should be noted that these are data management tools which support the analysis performed by the researcher(s) [ 14 ].
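The statement that coding makes raw data sortable can be illustrated with a small sketch. It does not reproduce how any particular software package works; the `Segment` structure, the `index_by_code` helper and the example segments are hypothetical and serve only to show how tagged segments from different sources can be retrieved by code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Segment:
    source: str              # e.g. "SOP", "observation", "interview"
    text: str                # the sentence or paragraph that was coded
    codes: Tuple[str, ...]   # one or more short descriptors


def index_by_code(segments: List[Segment]) -> Dict[str, List[Segment]]:
    """Group segments under every code they were tagged with."""
    index: Dict[str, List[Segment]] = defaultdict(list)
    for seg in segments:
        for code in seg.codes:
            index[code].append(seg)
    return dict(index)


# Invented example segments (illustration only).
segments = [
    Segment("SOP", "Tele-neurology consultation is requested by the ER physician.",
            ("tele-neurology", "responsibilities")),
    Segment("observation", "Video link failed twice before the consultation started.",
            ("tele-neurology", "delay")),
    Segment("interview", "We often wait a long time for the imaging results.",
            ("delay",)),
]

index = index_by_code(segments)
# All sources that contain material coded as "tele-neurology".
sources = sorted({seg.source for seg in index["tele-neurology"]})
print(sources)
```

Retrieval of this kind only sorts the material; grouping, summarising and interpreting the retrieved segments remain analytic tasks for the researcher.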

Fig. 3 From data collection to data analysis

Attributions for icons: see Fig. 2, also “Speech to text” by Trevor Dsouza, “Field Notes” by Mike O’Brien, US, “Voice Record” by ProSymbols, US, “Inspection” by Made, AU, and “Cloud” by Graphic Tigers; all from the Noun Project

How to report qualitative research?

Protocols of qualitative research can be published separately and in advance of the study results. However, the aim is not the same as in RCT protocols, i.e. to pre-define and set in stone the research questions and primary or secondary endpoints. Rather, it is a way to describe the research methods in detail, which might not be possible in the results paper given journals’ word limits. Qualitative research papers are usually longer than their quantitative counterparts to allow for deep understanding and so-called “thick description”. In the methods section, the focus is on transparency of the methods used, including why, how and by whom they were implemented in the specific study setting, so as to enable a discussion of whether and how this may have influenced data collection, analysis and interpretation. The results section usually starts with a paragraph outlining the main findings, followed by more detailed descriptions of, for example, the commonalities, discrepancies or exceptions per category [ 20 ]. Here it is important to support main findings by relevant quotations, which may add information, context, emphasis or real-life examples [ 20 , 23 ]. It is subject to debate in the field whether it is relevant to state the exact number or percentage of respondents supporting a certain statement (e.g. “Five interviewees expressed negative feelings towards XYZ”) [ 21 ].

How to combine qualitative with quantitative research?

Qualitative methods can be combined with other methods in multi- or mixed methods designs, which “[employ] two or more different methods […] within the same study or research program rather than confining the research to one single method” [ 24 ]. Reasons for combining methods can be diverse, including triangulation for corroboration of findings, complementarity for illustration and clarification of results, expansion to extend the breadth and range of the study, explanation of (unexpected) results generated with one method with the help of another, or offsetting the weakness of one method with the strength of another [ 1 , 17 , 24 – 26 ]. The resulting designs can be classified according to when, why and how the different quantitative and/or qualitative data strands are combined. The three most common types of mixed methods designs are the convergent parallel design, the explanatory sequential design and the exploratory sequential design. The designs with examples are shown in Fig. 4.

Fig. 4 Three common mixed methods designs

In the convergent parallel design, a qualitative study is conducted in parallel to and independently of a quantitative study, and the results of both studies are compared and combined at the stage of interpretation of results. Using the above example of EVT provision, this could entail setting up a quantitative EVT registry to measure process times and patient outcomes in parallel to conducting the qualitative research outlined above, and then comparing results. Amongst other things, this would make it possible to assess whether interview respondents’ subjective impressions of patients receiving good care match modified Rankin Scores at follow-up, or whether observed delays in care provision are exceptions or the rule when compared to door-to-needle times as documented in the registry. In the explanatory sequential design, a quantitative study is carried out first, followed by a qualitative study to help explain the results from the quantitative study. This would be an appropriate design if the registry alone had revealed relevant delays in door-to-needle times and the qualitative study would be used to understand where and why these occurred, and how they could be improved. In the exploratory sequential design, the qualitative study is carried out first and its results help to inform and build the quantitative study in the next step [ 26 ]. If the qualitative study around EVT provision had shown a high level of dissatisfaction among the staff members involved, a quantitative questionnaire investigating staff satisfaction could be set up in the next step, informed by the qualitative study on which topics dissatisfaction had been expressed. Amongst other things, the questionnaire design would make it possible to widen the reach of the research to more respondents from different (types of) hospitals, regions, countries or settings, and to conduct sub-group analyses for different professional groups.
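For the convergent parallel example, the question of whether an observed delay is an exception or the rule can be sketched as a simple comparison with the registry distribution. This is an illustration under stated assumptions: the registry values and the observed time are invented, and the helper `share_at_least` is hypothetical rather than part of any registry software.

```python
from typing import List


def share_at_least(registry_minutes: List[float], observed: float) -> float:
    """Fraction of registry cases whose process time was at least the observed one."""
    if not registry_minutes:
        raise ValueError("registry is empty")
    return sum(t >= observed for t in registry_minutes) / len(registry_minutes)


# Invented door-to-needle times in minutes (illustration only).
registry = [32, 41, 38, 55, 47, 36, 90, 44, 39, 50]
observed_delay = 55  # time noted during one qualitative observation

share = share_at_least(registry, observed_delay)
print(f"{share:.0%} of registry cases took at least {observed_delay} min")
```

A small share would suggest that the observed case was unusually slow, whereas a large share would suggest that such delays are common; either reading still needs to be interpreted together with the qualitative findings.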

How to assess qualitative research?

A variety of assessment criteria and lists have been developed for qualitative research, ranging in their focus and comprehensiveness [ 14 , 17 , 27 ]. However, none of these has been elevated to the “gold standard” in the field. In the following, we therefore focus on a set of commonly used assessment criteria that, from a practical standpoint, a researcher can look for when assessing a qualitative research report or paper.

Assessors should check the authors’ use of and adherence to the relevant reporting checklists (e.g. Standards for Reporting Qualitative Research (SRQR)) to make sure all items that are relevant for this type of research are addressed [ 23 , 28 ]. Discussions of quantitative measures in addition to or instead of these qualitative measures can be a sign of lower quality of the research (paper). Providing and adhering to a checklist for qualitative research contributes to an important quality criterion for qualitative research, namely transparency [ 15 , 17 , 23 ].

Reflexivity

While methodological transparency and complete reporting is relevant for all types of research, some additional criteria must be taken into account for qualitative research. This includes what is called reflexivity, i.e. sensitivity to the relationship between the researcher and the researched, including how contact was established and maintained, or the background and experience of the researcher(s) involved in data collection and analysis. Depending on the research question and population to be researched this can be limited to professional experience, but it may also include gender, age or ethnicity [ 17 , 27 ]. These details are relevant because in qualitative research, as opposed to quantitative research, the researcher as a person cannot be isolated from the research process [ 23 ]. It may influence the conversation when an interviewed patient speaks to an interviewer who is a physician, or when an interviewee is asked to discuss a gynaecological procedure with a male interviewer, and therefore the reader must be made aware of these details [ 19 ].

Sampling and saturation

The aim of qualitative sampling is for all variants of the objects of observation that are deemed relevant for the study to be present in the sample “to see the issue and its meanings from as many angles as possible” [ 1 , 16 , 19 , 20 , 27 ], and to ensure “information-richness” [ 15 ]. An iterative sampling approach is advised, in which data collection (e.g. five interviews) is followed by data analysis, followed by more data collection to find variants that are lacking in the current sample. This process continues until no new (relevant) information can be found and further sampling becomes redundant – which is called saturation [ 1 , 15 ]. In other words: qualitative data collection finds its end point not a priori, but when the research team determines that saturation has been reached [ 29 , 30 ].
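The iterative loop described above can be tracked with simple bookkeeping. The following is a minimal sketch, not a stopping rule from the paper: it assumes each interview has already been coded into a set of code labels, and the function names, the example codes and the `quiet_batches` threshold are all hypothetical. The decision that saturation has been reached still rests with the research team.

```python
from typing import Iterable, Optional


def new_codes_per_batch(batches: Iterable[list[set[str]]]) -> list[int]:
    """For each batch of coded interviews, count codes not seen in any earlier batch."""
    seen: set[str] = set()
    counts: list[int] = []
    for batch in batches:
        batch_codes = set().union(*batch)  # all codes used anywhere in this batch
        counts.append(len(batch_codes - seen))
        seen |= batch_codes
    return counts


def saturation_batch(counts: list[int], quiet_batches: int = 1) -> Optional[int]:
    """Return the 1-based batch number at which `quiet_batches` consecutive
    batches added no new codes, or None if that never happened."""
    run = 0
    for number, count in enumerate(counts, start=1):
        run = run + 1 if count == 0 else 0
        if run >= quiet_batches:
            return number
    return None


# Hypothetical codes from three small batches of interviews.
batches = [
    [{"delay", "handover"}, {"delay", "staffing"}],
    [{"staffing", "training"}, {"handover"}],
    [{"delay"}, {"training", "staffing"}],
]
counts = new_codes_per_batch(batches)
print(counts, saturation_batch(counts))
```

A batch that adds no new codes is only a prompt for discussion; whether the missing variants are truly absent or simply not yet sampled is a judgement the numbers cannot make.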

This is also the reason why most qualitative studies use deliberate instead of random sampling strategies. This is generally referred to as “purposive sampling”, in which researchers pre-define which types of participants or cases they need to include so as to cover all variations that are expected to be of relevance, based on the literature, previous experience or theory (i.e. theoretical sampling) [ 14 , 20 ]. Other types of purposive sampling include (but are not limited to) maximum variation sampling, critical case sampling or extreme or deviant case sampling [ 2 ]. In the above EVT example, a purposive sample could include all relevant professional groups and/or all relevant stakeholders (patients, relatives) and/or all relevant times of observation (day, night and weekend shift).
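One way to keep track of a purposive sample is a coverage grid. The sketch below is illustrative only: it assumes the team wants every combination of two pre-defined dimensions represented, which is stricter than the "and/or" in the EVT example, and the dimension names and values are hypothetical.

```python
from itertools import product


def sampling_gaps(
    dimensions: dict[str, list[str]],
    sample: list[dict[str, str]],
) -> list[tuple[str, ...]]:
    """Return every combination of dimension values not yet represented in the sample."""
    names = list(dimensions)
    covered = {tuple(person[name] for name in names) for person in sample}
    all_combinations = product(*(dimensions[name] for name in names))
    return [combo for combo in all_combinations if combo not in covered]


# Hypothetical sampling frame: professional group crossed with time of observation.
dimensions = {
    "group": ["physician", "nurse"],
    "shift": ["day", "night", "weekend"],
}
sample = [
    {"group": "physician", "shift": "day"},
    {"group": "nurse", "shift": "night"},
    {"group": "nurse", "shift": "day"},
]
gaps = sampling_gaps(dimensions, sample)
print(gaps)
```

The remaining gaps indicate where the next round of recruitment could look for variants that are lacking in the current sample.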

Assessors of qualitative research should check whether the considerations underlying the sampling strategy were sound and whether or how researchers tried to adapt and improve their strategies in stepwise or cyclical approaches between data collection and analysis to achieve saturation [ 14 ].

Good qualitative research is iterative in nature, i.e. it goes back and forth between data collection and analysis, revising and improving the approach where necessary. One example of this is the pilot interview, in which different aspects of the interview (especially the interview guide, but also, for example, the site of the interview or whether the interview can be audio-recorded) are tested with a small number of respondents, evaluated and revised [ 19 ]. In doing so, the interviewer learns which wording or types of questions work best, or which is the best length of an interview with patients who have trouble concentrating for an extended time. Of course, the same reasoning applies to observations or focus groups, which can also be piloted.

Ideally, coding should be performed by at least two researchers, especially at the beginning of the coding process when a common approach must be defined, including the establishment of a useful coding list (or tree), and when a common meaning of individual codes must be established [ 23 ]. An initial sub-set or all transcripts can be coded independently by the coders and then compared and consolidated after regular discussions in the research team. This is to make sure that codes are applied consistently to the research data.
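The compare-and-consolidate step can be supported by listing where two coders differ. This is a minimal sketch under assumed data structures: each coder's work is represented as a mapping from a transcript segment ID to the set of codes applied, and the segment IDs and code names are hypothetical.

```python
def coding_disagreements(
    coder_a: dict[str, set[str]],
    coder_b: dict[str, set[str]],
) -> dict[str, dict[str, set[str]]]:
    """List segments where the two coders applied different codes,
    showing which codes only one of them used."""
    differences: dict[str, dict[str, set[str]]] = {}
    for segment in sorted(coder_a.keys() | coder_b.keys()):
        codes_a = coder_a.get(segment, set())
        codes_b = coder_b.get(segment, set())
        if codes_a != codes_b:
            differences[segment] = {
                "only_a": codes_a - codes_b,
                "only_b": codes_b - codes_a,
            }
    return differences


# Hypothetical independent coding of three transcript segments.
coder_a = {"s1": {"delay"}, "s2": {"staffing", "trust"}, "s3": {"handover"}}
coder_b = {"s1": {"delay"}, "s2": {"staffing"}, "s3": {"handover", "delay"}}
to_discuss = coding_disagreements(coder_a, coder_b)
print(to_discuss)
```

The output is meant as an agenda for the team discussion, not as a score: each listed segment is a place where the meaning of a code may need to be clarified.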

Member checking

Member checking, also called respondent validation, refers to the practice of checking back with study respondents to see if the research is in line with their views [ 14 , 27 ]. This can happen after data collection or analysis or when first results are available [ 23 ]. For example, interviewees can be provided with (summaries of) their transcripts and asked whether they believe this to be a complete representation of their views or whether they would like to clarify or elaborate on their responses [ 17 ]. Respondents’ feedback on these issues then becomes part of the data collection and analysis [ 27 ].

Stakeholder involvement

In those niches where qualitative approaches have been able to evolve and grow, a new trend has seen the inclusion of patients and their representatives not only as study participants (i.e. “members”, see above) but as consultants to and active participants in the broader research process [ 31 – 33 ]. The underlying assumption is that patients and other stakeholders hold unique perspectives and experiences that add value beyond their own single story, making the research more relevant and beneficial to researchers, study participants and (future) patients alike [ 34 , 35 ]. Using the example of patients on or nearing dialysis, a recent scoping review found that 80% of clinical research did not address the top 10 research priorities identified by patients and caregivers [ 32 , 36 ]. In this sense, the involvement of the relevant stakeholders, especially patients and relatives, is increasingly being seen as a quality indicator in and of itself.

How not to assess qualitative research

The above overview does not include certain items that are routine in assessments of quantitative research. What follows is a non-exhaustive, non-representative, experience-based list of the quantitative criteria often applied to the assessment of qualitative research, as well as an explanation of the limited usefulness of these endeavours.

Protocol adherence

Given the openness and flexibility of qualitative research, it should not be assessed by how well it adheres to pre-determined and fixed strategies – in other words: its rigidity. Instead, the assessor should look for signs of adaptation and refinement based on lessons learned from earlier steps in the research process.

Sample size

For the reasons explained above, qualitative research does not require specific sample sizes, nor does it require that the sample size be determined a priori [ 1 , 14 , 27 , 37 – 39 ]. Sample size can only be a useful quality indicator when related to the research purpose, the chosen methodology and the composition of the sample, i.e. who was included and why.

Randomisation

While some authors argue that randomisation can be used in qualitative research, this is not commonly the case, as neither its feasibility nor its necessity or usefulness has been convincingly established for qualitative research [ 13 , 27 ]. Relevant disadvantages include the negative impact of too large a sample size as well as the possibility (or probability) of selecting “quiet, uncooperative or inarticulate individuals” [ 17 ]. Qualitative studies do not use control groups, either.

Interrater reliability, variability and other “objectivity checks”

The concept of “interrater reliability” is sometimes used in qualitative research to assess the extent to which the coding approach overlaps between the two co-coders. However, it is not clear what this measure tells us about the quality of the analysis [ 23 ]. This means that these scores can be included in qualitative research reports, preferably with some additional information on what the score means for the analysis, but this is not a requirement. Relatedly, it is not relevant for the quality or “objectivity” of qualitative research to separate those who recruited the study participants and collected and analysed the data. Experiences even show that it might be better to have the same person or team perform all of these tasks [ 20 ]. First, when researchers introduce themselves during recruitment this can enhance trust when the interview takes place days or weeks later with the same researcher. Second, when the audio-recording is transcribed for analysis, the researcher conducting the interviews will usually remember the interviewee and the specific interview situation during data analysis. This might be helpful in providing additional context information for interpretation of data, e.g. on whether something might have been meant as a joke [ 18 ].
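For readers who have not met an interrater reliability score before, the sketch below computes one widely used statistic, Cohen's kappa. The paper does not name a specific statistic, so this choice is an assumption, as is the simplification that each segment receives exactly one code from each coder. Kappa is observed agreement corrected for the agreement expected by chance: (p_o - p_e) / (1 - p_e).

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two coders who each assigned one code per segment."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of segments")
    n = len(labels_a)
    # Observed agreement: share of segments given the same code by both coders.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: based on how often each coder used each code overall.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_chance = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both coders used one and the same code throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes assigned to four segments by two coders.
coder_a = ["delay", "delay", "staffing", "staffing"]
coder_b = ["delay", "staffing", "staffing", "staffing"]
kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 2))
```

In this example the coders agree on three of four segments (0.75), chance agreement is 0.5, and kappa is 0.5. As the text notes, such a number says little by itself about the quality of the analysis, so it is best reported with an explanation of what it means for the coding.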

Not being quantitative research

Being qualitative research instead of quantitative research should not be used as an assessment criterion if it is used irrespective of the research problem at hand. Similarly, qualitative research should not be required to be combined with quantitative research per se – unless mixed methods research is judged as inherently better than single-method research. In this case, the same criterion should be applied for quantitative studies without a qualitative component.

The main take-away points of this paper are summarised in Table 1. We aimed to show that, if conducted well, qualitative research can answer specific research questions that cannot be adequately answered using (only) quantitative designs. Seeing qualitative and quantitative methods as equal will help us become more aware and critical of the “fit” between the research problem and our chosen methods: I can conduct an RCT to determine the reasons for transportation delays of acute stroke patients – but should I? It also provides us with a greater range of tools to tackle a greater range of research problems more appropriately and successfully, filling in the blind spots on one half of the methodological spectrum to better address the whole complexity of neurological research and practice.

Take-away-points

Authors’ contributions

LB drafted the manuscript; WW and CG revised the manuscript; all authors approved the final versions.

Funding

No external funding.

Competing interests

The authors declare no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

COMMENTS

  1. PDF Determining the Sample in Qualitative Research

    called the 'participants' or 'informants' rather than respondents (Nakkeeran, 2016; Padgett, ... number of participants in qualitative inquiry that is lacking in literature. Therefore, this ... in qualitative research', 'qualitative sample size', 'number of participants', and 'qualitative

  2. Big enough? Sampling in qualitative inquiry

    Any senior researcher, or seasoned mentor, has a practiced response to the 'how many' question. Mine tends to start with a reminder about the different philosophical assumptions undergirding qualitative and quantitative research projects (Staller, 2013). As Abrams (2010) points out, this difference leads to "major differences in sampling goals and strategies." (p.537).

  3. Sample sizes for saturation in qualitative research: A systematic

    These research objectives are typical of much qualitative health research. The sample size of the datasets used varied from 14 to 132 interviews and 1 to 40 focus groups. All datasets except one (Francis et al., 2010) had a sample that was much larger than the sample ultimately needed for saturation, making them effective for assessing saturation.

  4. How many participants do I need for qualitative research?

    The answer lies somewhere in between. It's often a good idea (for qualitative research methods like interviews and usability tests) to start with 5 participants and then scale up by a further 5 based on how complicated the subject matter is. You may also find it helpful to add additional participants if you're new to user research or you ...

  5. Sample size: how many participants do I need in my research?

    It is the ability of the test to detect a difference in the sample, when it exists in the target population. Calculated as 1-Beta. The greater the power, the larger the required sample size will be. A value between 80%-90% is usually used. Relationship between non-exposed/exposed groups in the sample.

  6. What's in a Number? Understanding the Right Sample Size for Qualitative

    Between 15-30. Based on research conducted on this issue, if you are building similar segments within the population, InterQ's recommendation for in-depth interviews is to have a sample size of 15-30. In some cases, a minimum of 10 is sufficient, assuming there has been integrity in the recruiting process. With the goal to maintain a rigorous ...

  7. Series: Practical guidance to qualitative research. Part 3: Sampling

    This article is the third paper in a series of four articles aiming to provide practical guidance to qualitative research. In an introductory paper, we have described the objective, nature and outline of the Series . Part 2 of the series focused on context, research questions and design of qualitative research . In this paper, Part 3, we ...

  8. Sample Size Policy for Qualitative Studies Using In-Depth Interviews

    There are several debates concerning what sample size is the right size for such endeavors. Most scholars argue that the concept of saturation is the most important factor to think about when mulling over sample size decisions in qualitative research (Mason, 2010). Saturation is defined by many as the point at which the data collection process no longer offers any new or relevant data.

  9. Determining the Sample Size in Qualitative Research

    finds a variation of the sample size from 1 to 95 (averages being 31 in the first case and 28 in the second). The research region, one of the cultural factors, plays a significant role in ...

  10. Characterising and justifying sample size sufficiency in interview

    Sample adequacy in qualitative inquiry pertains to the appropriateness of the sample composition and size. It is an important consideration in evaluations of the quality and trustworthiness of much qualitative research and is implicated - particularly for research that is situated within a post-positivist tradition and retains a degree of commitment to realist ontological premises - in ...

  11. Sample size for qualitative research

    Marshall and colleagues refer to a sample size of 20 as being small for a grounded theory-type approach to qualitative research and to 40 being a large sample size for the same type of study. This gives a range of what sample size they would consider appropriate, and later in the same paper, they recommend a range of 20-30 interviews for ...

  12. Qualitative Study

    Qualitative research is a type of research that explores and provides deeper insights into real-world problems.[1] Instead of collecting numerical data points or intervening or introducing treatments, as in quantitative research, qualitative research helps generate hypotheses as well as further investigate and understand quantitative data. Qualitative research gathers participants ...

  13. Criteria for Good Qualitative Research: A Comprehensive Review

    Fundamental Criteria: General Research Quality. Various researchers have put forward criteria for evaluating qualitative research, which have been summarized in Table 3. Also, the criteria outlined in Table 4 effectively deliver the various approaches to evaluate and assess the quality of qualitative work. The entries in Table 4 are based on Tracy's "Eight big‐tent criteria for excellent ...

  14. (PDF) Sample Size for Interview in Qualitative Research in Social

    In qualitative research, the sample size relates to the number required to fully notify all crucial aspects of the phenomena being explored; the size of the sample plays a critical role in ...

  15. Chapter 5. Sampling

    Sampling in qualitative research has different purposes and goals than sampling in quantitative research. Sampling in both allows you to say something of interest about a population without having to include the entire population in your sample. We begin this chapter with the case of a population of interest composed of actual people.

  16. Determining Sample Size For Qualitative Research: What Is The Magical

    Based on research conducted on this very issue, 30 seems to be a good number for the most comprehensive assessment. Some studies have noted having a sample size as little as 10 can be extremely fruitful, and still yield applicable results. This would of course be only after a rigorous recruiting process is put in place.

  17. Planning Qualitative Research: Design and Decision Making for New

    Qualitative research draws from interpretivist and constructivist paradigms, seeking to deeply understand a research subject rather than predict outcomes, as in the positivist paradigm (Denzin & Lincoln, 2011). Interpretivism seeks to build knowledge from understanding individuals' unique viewpoints and the meaning attached to those viewpoints (Creswell & Poth, 2018).

  18. Home

    Phenomenological research: In-depth interviewing which involves conducting intensive individual interviews with a small number of respondents to explore their perspectives on a particular idea, program, or situation. ... Qualitative Research [MeSH] OR (qualitative AND (research OR study OR method)) (Grounded Theory[MeSH] OR "grounded theory")

  19. Qualitative Research Part II: Participants, Analysis, and Quality

    This is the second of a two-part series on qualitative research. Part 1 in the December 2011 issue of Journal of Graduate Medical Education provided an introduction to the topic and compared characteristics of quantitative and qualitative research, identified common data collection approaches, and briefly described data analysis and quality assessment techniques.

  20. Sample Size for Interview in Qualitative Research in Social Sciences: A

    about the required number of respondents depends on. According to Baker and Edwards (2012), Julia Brannen has suggested that the most important issue in deciding 'how many qualitative ... qualitative research can be justified by 'theoretical saturation'. According to Dawson (2002), if

  21. How to use and assess qualitative research methods

    This paper aims to provide an overview of the use and assessment of qualitative research methods in the health sciences. Qualitative research can be defined as the study of the nature of phenomena and is especially appropriate for answering questions of why something is (not) observed, assessing complex multi-component interventions, and focussing on intervention improvement. The most common ...

  22. How many participants do I need in my qualitative research?

    It is all about reaching the point of saturation or the point where you are already getting repetitive responses (You may want to check Egon and Guba, 1985). Over time some researchers say that ...

  23. How to use and assess qualitative research methods

    Qualitative research is defined as "the study of the nature of phenomena", including "their quality, different manifestations, ... the site of the interview or whether the interview can be audio-recorded) are tested with a small number of respondents, evaluated and revised . In doing so, the interviewer learns which wording or types of ...

  24. Beyond Sleep: Investigating User Needs in Today's Bedrooms

    The total number of respondents who both responded to "sleep alone" and "sleep with pets" was seven. The responses received from the 'other' category revealed that three people (1%) defined their bedrooms as their cars. ... V. Successful Qualitative Research: A Practical Guide for Beginners; Sage Publications: London, UK, 2013 ...
