Hypothesis Generation During Foodborne-Illness Outbreak Investigations

Alice E White, Kirk E Smith, Hillary Booth, Carlota Medus, Robert V Tauxe, Laura Gieraltowski, Elaine Scallan Walter, Hypothesis Generation During Foodborne-Illness Outbreak Investigations, American Journal of Epidemiology, Volume 190, Issue 10, October 2021, Pages 2188–2197, https://doi.org/10.1093/aje/kwab118

Hypothesis generation is a critical, but challenging, step in a foodborne outbreak investigation. The pathogens that contaminate food have many diverse reservoirs, resulting in seemingly limitless potential vehicles. Identifying a vehicle is particularly challenging for clusters detected through national pathogen-specific surveillance, because cases can be geographically dispersed and lack an obvious epidemiologic link. Moreover, state and local health departments could have limited resources to dedicate to cluster and outbreak investigations. These challenges underscore the importance of hypothesis generation during an outbreak investigation. In this review, we present a framework for hypothesis generation focusing on 3 primary sources of information, typically used in combination: 1) known sources of the pathogen causing illness; 2) person, place, and time characteristics of cases associated with the outbreak (descriptive data); and 3) case exposure assessment. Hypothesis generation can narrow the list of potential food vehicles and focus subsequent epidemiologic, laboratory, environmental, and traceback efforts, ensuring that time and resources are used more efficiently and increasing the likelihood of rapidly and conclusively implicating the contaminated food vehicle.

Abbreviations: HGQ, hypothesis-generating questionnaire; PFGE, pulsed-field gel electrophoresis; STEC, Shiga toxin-producing Escherichia coli; WGS, whole-genome sequencing.

Foodborne diseases are a continuing public health problem in the United States, where they cause an estimated 48 million illnesses, 128,000 hospitalizations, and 3,000 deaths annually ( 1 ). Public health and regulatory agencies rely on data from foodborne disease surveillance and outbreak investigations to prioritize food safety regulations, policies, and practices aimed at reducing the burden of disease ( 2 ). In particular, foodborne illness outbreaks provide critical information on the foods causing illness, common food-pathogen pairs, and high-risk production technologies and practices. However, only half of the foodborne outbreaks reported each year identify a pathogen, and less than half implicate a food vehicle, decreasing the utility of these data ( 3 ).

Figure 1. A model framework for hypothesis generation during a foodborne-illness outbreak investigation.

Foodborne disease outbreaks require rapid public health response to quickly identify potential sources and prevent future exposures; however, implicating a food vehicle in an outbreak can be challenging. The pathogens that contaminate food have many diverse reservoirs and can be transmitted in other ways (e.g., from one person to another or through contact with animals or contaminated water), resulting in seemingly limitless potential vehicles ( 2 ). Identifying a food vehicle is particularly challenging for clusters detected through national pathogen-specific surveillance: Cases can be geographically dispersed and lack an obvious epidemiologic link ( 4 ). Moreover, state and local health departments might have limited resources to dedicate to cluster and outbreak investigations ( 5 ). These challenges underscore the importance of hypothesis generation during an outbreak investigation. Hypothesis generation can narrow the list of potential food vehicles and focus subsequent epidemiologic, laboratory, environmental, and traceback efforts, ensuring that time and resources are used more efficiently and increasing the likelihood of timely identification of the vehicle. Timely investigations can prevent additional illnesses and increase the likelihood of identifying factors contributing to the outbreak.

The Integrated Food Safety Centers of Excellence were established in 2012 under the Food Safety Modernization Act to serve as resources for federal, state, and local public health professionals who detect and respond to foodborne illness outbreaks. The Integrated Food Safety Centers of Excellence aim to improve the quality of foodborne-illness outbreak investigations by providing public health professionals with training, tools, and model practices. In this paper, we provide a framework for generating hypotheses early during investigation of an outbreak or cluster detected through pathogen-specific surveillance; highlight tools to support rapid and effective hypothesis generation; and illustrate the practice of hypothesis generation using example outbreak case studies.

A hypothesis is “a supposition, arrived at from observation or reflection, that leads to refutable predictions; (or) any conjecture cast in a form that will allow it to be tested and refuted” (6). In a foodborne outbreak, the hypothesis states which food vehicle(s) could be the source of the outbreak and warrant further investigation. In practice, hypothesis generation is dynamic and iterative. It begins in the earliest stages of an investigation as investigators review available information and look for a pattern or “signal” that might emerge. As more information becomes available, hypotheses are frequently evaluated and refined.

The framework presented here focuses on 3 primary sources of information for generating hypotheses, typically used in combination: 1) known sources of the pathogen causing illness; 2) person, place, and time characteristics of cases associated with the outbreak (descriptive data); and 3) case exposure assessment (Figure 1). We discuss the approach for collecting, summarizing, and interpreting each of these sources of information and provide example outbreak case studies (Table 1). We focus primarily on food exposures. However, at the onset of an investigation the transmission route is often unknown, and many pathogens commonly transmitted through food can also be transmitted through other routes (e.g., animal contact, person-to-person, waterborne). Thus, hypothesis generation should consider all potential transmission routes early in the investigation. Moreover, hypothesis generation should involve a multidisciplinary outbreak investigation team, including experienced colleagues who can provide information about past outbreaks and known sources of the pathogen causing illness.

Table 1. Foodborne-Illness Outbreak Case Studies Highlighting Hypothesis-Generation Methods, United States, 2006–2018

Abbreviations: STEC: Shiga toxin-producing Escherichia coli , HG: hypothesis generation, HGQ: hypothesis-generating questionnaires, PFGE: pulsed-field gel electrophoresis.

Known pathogen sources

When generating a hypothesis, investigators should consider historical information about the causative pathogen, including known reservoirs; foods (and animals) implicated in past outbreaks; findings from case-control studies of sporadic illnesses (i.e., diagnosed cases investigated during routine surveillance not linked to other cases); and molecular subtyping information of the pathogen, including information about nonhuman isolates (i.e., food, animal, or environmental sources).

The reservoir of the infectious agent can indicate potential sources and contributing factors. Pathogens with a human reservoir (e.g., norovirus, hepatitis A virus, and Shigella ) are commonly associated with infected food handlers or ready-to-eat foods that have been contaminated with human feces. In contrast, pathogens with animal reservoirs (e.g., Shiga toxin-producing Escherichia coli (STEC), nontyphoidal Salmonella , and Campylobacter ) are often associated with food sources of animal origin or foods that have been contaminated by animal feces during production (e.g., fresh produce). Pathogens with environmental reservoirs (e.g., Vibrio spp., Listeria monocytogenes , Clostridium botulinum ) are commonly associated with foods that can become contaminated by soil or water. Tools that help identify known pathogen sources include the National Outbreak Reporting System Dashboard ( 7 ), the Food and Drug Administration Bad Bug Book ( 8 ), and An Atlas of Salmonella in the United States ( 9 ).

Food-pathogen pairs identified in past outbreaks and case-control studies of sporadic illnesses provide information on common food vehicles associated with a pathogen. Using data on reported outbreaks from 1998–2016, the Interagency Food Safety Analytics Collaboration estimated the proportion of illnesses attributable to 17 major food categories ( 10 ). The foods most commonly associated with Salmonella illnesses were seeded vegetables (e.g., tomatoes and cucumbers), chicken, pork, and fruit, whereas most STEC illnesses were attributed to leafy greens or beef, and most Listeria illnesses to dairy products or fruits. Similarly, case-control studies of sporadic illnesses have found associations between pathogens and specific foods; for example, Campylobacter and poultry ( 11 ) and Listeria monocytogenes and melons and hummus ( 12 ).

For pathogens with multiple reservoirs, information that distinguishes isolates of the same species by phenotypic or genotypic characteristics can provide increased specificity. For example, there are over 2,600 serotypes of Salmonella ; however, some serotypes have been associated with specific food vehicles, such as Salmonella enterica serotype Enteritidis (SE) and eggs and chicken; serotypes Uganda and Infantis and pork; and serotypes Litchfield, Poona, Oranienburg, and Javiana and fruit ( 13 ). Antimicrobial resistance has also proven useful in differentiating major sources of Salmonella serotypes found in both animal- and plant-derived food commodities. For example, antimicrobial-resistant Salmonella outbreaks were more likely to be associated with meat and poultry (e.g., beef, chicken, and turkey), whereas foods commonly associated with susceptible Salmonella outbreaks were eggs, tomatoes, and melons ( 14 ).

Molecular subtyping with pulsed-field gel electrophoresis (PFGE) has been an essential subtyping tool for outbreak detection, and PFGE patterns have been associated with specific foods. For example, SE isolates with PFGE PulseNet pattern JEGX01.0004 have commonly been associated with eggs (and more recently, chicken), pattern JEGX01.0005 with chicken, and pattern JEGX01.0002 with travel or exposure to the US Pacific Northwest region and Mexico. Similarly, the same PFGE pattern of STEC O157:H7 has been associated with recurrent romaine lettuce outbreaks (15, 16). In July 2019, whole-genome sequencing (WGS) replaced PFGE as the standard molecular subtyping method for the national PulseNet network, providing greater discrimination and more reliable indication of genetically related groupings than PFGE. This change in molecular method might limit historical comparisons temporarily, particularly to isolates from before the transition, as PFGE patterns and WGS results are not readily comparable. However, WGS allele codes have been applied to sequenced historical isolates in PulseNet, and although this represents a small proportion of all isolates in PulseNet, the representativeness of the WGS database will increase with time. As historical isolates and regulatory isolates from the Food and Drug Administration and US Department of Agriculture Food Safety and Inspection Service are sequenced, information about recent findings in foods and animals will fill the national database maintained at the National Center for Biotechnology Information (17) and be readily comparable to sequenced human clinical isolates.

Subtyping of nonhuman isolates collected by regulatory agencies from foods and food chain environments through routine testing or special studies can lead to the identification of outbreaks of human illness by searching the PulseNet database for the same molecular subtypes in human infections, sometimes referred to as “backward” outbreaks. For example, in 2007 public health authorities were investigating a multistate outbreak of Salmonella serotype Wandsworth in which patients reported consuming a puffed vegetable-coated snack food. Food testing yielded the outbreak strain of Salmonella serotype Wandsworth, but it also yielded Salmonella serotype Typhimurium; a search in the PulseNet database identified matching isolates from human cases of Salmonella serotype Typhimurium infection, and these cases confirmed consumption of the same snack food upon re-interview ( 18 ). Importantly, identifying a close genetic match between strains from a product and an illness does not alone establish causation; epidemiologic investigation and traceback are needed to connect the product and patient.

Descriptive data

Descriptive epidemiology of cases, including person, place, or time characteristics, remains a powerful tool for hypothesis generation. Person characteristics can suggest foods that are more likely to be eaten by certain groups, whereas place and time characteristics can provide clues about the geographic distribution and shelf life of the food.

Person characteristics suggestive of certain foods include, but are not limited to, sex, age, race, and ethnicity. For example, the median percentage of female cases in vegetable-associated STEC outbreaks was 64%, compared with 50% in beef STEC outbreaks (19). Likewise, there are differences in food consumption patterns by age, with the lowest median percentage of children and adolescents in vegetable-associated STEC outbreaks and the highest in STEC dairy outbreaks (19). Similar trends are evident in the Centers for Disease Control and Prevention FoodNet Population Survey, a population-based survey to estimate the prevalence of risk factors for foodborne illness, which found that women reported consuming more fruits and vegetables than men, and men reported consuming more meat and poultry (20).

Time characteristics, displayed by the shape and pattern of an epidemic curve, can indicate the shelf life of a product or the harvest duration of a contaminated field. For example, cases spread over a longer time period might suggest a shelf-stable or frozen food item, ongoing harborage of the contaminating pathogen in a food processing plant, or other sustained mechanism of contamination. Conversely, cases with illness onset dates spread over a limited duration of time might suggest a perishable item, such as fresh produce. However, some fresh produce items have longer shelf lives than others and can cause more protracted outbreaks. Additionally, there are “special case” produce types. For example, outbreaks associated with sprouted seeds or beans, which have a short shelf life, are typically driven by a single contaminated seed lot, and un-sprouted seeds and beans can have a shelf life of months to years. Thus, single batches might be sprouted from the same contaminated lot of seeds at different times and in different places leading to a more sustained outbreak, or resulting in temporally and geographically distinct outbreaks ( 21 ). If an outbreak is detected early and exposure is ongoing, the temporal distribution of cases might be less clear early in an investigation. Thus, epidemic curves can provide supporting evidence that adds to the plausibility of a suspected food vehicle; however, depending on the outbreak, epidemic curves might provide more relevant information as the outbreak progresses.
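As a minimal sketch of the tabulation behind an epidemic curve, the following Python snippet counts cases by week of illness onset and fills weeks without cases with zero, so that gaps remain visible. The function name `weekly_counts`, the Monday week start, and the onset dates are illustrative assumptions, not part of any surveillance system described here.

```python
from collections import Counter
from datetime import date, timedelta


def weekly_counts(onsets):
    """Count cases by week of illness onset (weeks start on Monday).

    Weeks between the first and last onset that have no cases are
    reported as zero so the shape of the curve is preserved.
    """
    if not onsets:
        return {}
    # Map each onset date to the Monday that starts its week.
    week_starts = [d - timedelta(days=d.weekday()) for d in onsets]
    counts = Counter(week_starts)
    curve = {}
    week = min(week_starts)
    last = max(week_starts)
    while week <= last:
        curve[week] = counts.get(week, 0)
        week += timedelta(days=7)
    return curve


# Hypothetical onset dates, for illustration only.
example_onsets = [date(2021, 6, 1), date(2021, 6, 3), date(2021, 6, 16)]
for week, n in weekly_counts(example_onsets).items():
    print(week.isoformat(), "#" * n)
```

A tight cluster of weeks would be read as more consistent with a perishable item, and a long, low curve as more consistent with a shelf-stable or frozen item or a sustained source of contamination, subject to the caveats above about early and ongoing outbreaks.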

Geographical mapping of cases can also help assess the plausibility of a suspected vehicle by comparing the distribution of cases with the distribution pattern of that food item, in consultation with regulatory and industry partners. For example, widespread outbreaks are caused by widely distributed commercial products, and some foods are more likely to be distributed nationally (e.g., bagged leafy greens, packaged cereal, national meat brands), whereas others are more likely to be distributed regionally (e.g., popular brands of ice cream) or locally (e.g., raw milk) (22). Likewise, if some outbreak-associated illnesses are clearly related to travel to a specific country, and others are in nontravelers, it suggests the latter might be associated with a product imported from that country. For example, a 2018 outbreak of Salmonella serotype Typhimurium infections in Canada occurred among persons traveling to Thailand, and among others who shopped at particular stores in Western Canada; the outbreak was ultimately traced to contaminated frozen profiteroles imported from Thailand (23). Similarly, in a 2011 multistate outbreak in the United States, a subset of cases traveled to Mexico and ate papaya there, and nontravel-associated cases ate papaya imported from Mexico (24).

Outbreak size and distribution can suggest certain food-pathogen pairs. For example, seafood toxins like ciguatoxin are typically produced or concentrated in an individual fish and therefore cause illness in a limited number of people in a single jurisdiction, whereas Salmonella and other bacterial pathogens can contaminate large amounts of a widely distributed product ( 22 ). The distribution of cases can be misleading or incomplete early in an outbreak, so investigators must use caution when using these parameters to rule out hypotheses and revisit as additional cases are identified. Moreover, an apparently local outbreak can be an early indicator of a larger problem. For example, in 2018, a large multistate outbreak of E. coli O157:H7 infections linked to romaine lettuce was initially detected in New Jersey in association with a single restaurant chain; within 8 days of detecting the cluster it had expanded to include many more cases with a variety of different exposure locations as far away as Nome, Alaska ( 15 ).

Case exposure assessment

Rapidly collecting detailed food histories from cases in an outbreak is the most critical step in identifying commonalities between these cases. Before a cluster is detected, local or state public health agencies typically attempt to interview each individual, reportable enteric-pathogen case using a standard pathogen-specific questionnaire. If a cluster is detected, a review of these routine interviews can provide information on obvious high-risk exposures. In most jurisdictions, detailed hypothesis-generating questionnaires (HGQs) historically have been used only if commonalities are not identified from the initial routine interviews or if the hypotheses identified from routine interviews collapse under further investigation. However, a growing number of state health jurisdictions are conducting hypothesis-generating interviews with all cases of laboratory-confirmed Salmonella and STEC infection, opting to gather this information during the initial interview. This method is considered a best practice to maximize exposure recall ( 25 ), shaving days or weeks off the delay between case exposure and hypothesis-generating interview.

There are 3 major types of HGQs used in the United States ( 26 ):

Oregon “shotgun” questionnaire: This questionnaire uses a “shotgun,” or “trawling” approach of asking mostly close-ended questions for a long list of individual food items. The section order is designed to prompt recall of specific food exposures through review of places where food was purchased or eaten out, and specific repetitive questions for high-risk exposures such as raw foods or sprouts.

Minnesota “long form” hypothesis-generating questionnaire: This questionnaire combines close-ended questions about fewer food items with open-ended questions that seek details on dining/purchase location and brand-variety details for all foods.

National Hypothesis Generating Questionnaire: This questionnaire is a hybridized approach developed by Centers for Disease Control and Prevention that contains elements of both the Oregon and Minnesota models. Close-ended questions are asked about an intermediate number of food items, and brand/variety details are obtained only for commonly eaten types of foods. During national cluster investigations, the National Hypothesis Generating Questionnaire is deployed across state and local health departments to improve standardization across jurisdictions.

In addition to these questionnaires, there are many modified state-specific versions and national pathogen-specific HGQs (e.g., the Listeria Initiative questionnaire and a Cyclospora questionnaire). The use of HGQs can be enhanced by adopting a dynamic or iterative cluster investigation approach. In this approach, if a suspected food item or branded product emerges during interviews, that food item can be added to questionnaires administered to subsequent cases, and individuals who have already been interviewed can be re-interviewed to systematically collect information about that exposure (27). Decisions about which exposures should be pursued through re-interviews can be informed by descriptive data, as well as incubation periods, which can help define the most likely exposure period (28).
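To illustrate how an incubation period can define the most likely exposure period, the sketch below subtracts an assumed incubation range from a case's illness onset date. The function name `exposure_window` and the 1 to 7 day range are placeholders chosen for illustration; in practice the range would come from references for the pathogen under investigation.

```python
from datetime import date, timedelta


def exposure_window(onset, min_incubation_days, max_incubation_days):
    """Return (earliest, latest) likely exposure dates for one case.

    The earliest date uses the longest incubation period and the
    latest date uses the shortest.
    """
    if min_incubation_days < 0 or min_incubation_days > max_incubation_days:
        raise ValueError("incubation range must satisfy 0 <= min <= max")
    earliest = onset - timedelta(days=max_incubation_days)
    latest = onset - timedelta(days=min_incubation_days)
    return earliest, latest


# Hypothetical case: onset on 15 June 2021, assumed incubation of 1 to 7 days.
start, end = exposure_window(date(2021, 6, 15), 1, 7)
print(f"Ask about foods eaten from {start} through {end}")
```

Interviewers can use such a window to anchor food-history questions for each case, and overlapping windows across cases can help prioritize which exposures to pursue in re-interviews.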

The number of interviewers participating in hypothesis-generating interviews can depend on resources and the specifics of the outbreak. A single interviewer approach can be advantageous in that a single interviewer might more clearly remember what previously interviewed persons mentioned and pursue clues as they arise during a live interview. However, this approach could slow investigations, particularly in sizable multistate clusters. An alternative is the “lead investigator model,” in which a single person directs the interviewing team with a limited number of interviewers, reviews completed interviews, and decides which exposures to pursue. This approach can be faster and more efficient than the single interviewer approach. When interviews are done by multiple agencies, it is important that the completed interviews be forwarded to the lead investigator promptly and that the group meet regularly and review results of interviews as the investigation proceeds.

If interviews with HGQs do not yield an actionable hypothesis, investigators should consider alternative approaches, such as questionnaire modification or open-ended interviews. Deciding when to attempt an alternative approach depends on cluster size, velocity of incident cases, and investigation effort expended and time elapsed without identification of a solid hypothesis. Questionnaire modification could include adding questions, such as open-ended questions or supplemental questions about exposures that came up on previous interviews, or pruning questions. For example, after 8–10 interviews, items that no case reported “yes” or “maybe” to eating may be removed. Removal of questions should be done cautiously because certain foods (e.g., stealth ingredients such as cilantro and sprouts) might be reported by a low proportion of cases who ate them. Another approach is open-ended interviews of recent cases, which could be considered after 20–25 initial cases in a large multistate investigation have been interviewed without yielding solid hypotheses. Conducted by a single interviewer, if possible, open-ended interviews should cover everything that a case ate or drank in the exposure period of interest, as well as other exposures including animals, grocery stores, restaurants, travel, parties or events, and details about how they prepare their food at home, including recipes. After the first person is interviewed, objective questions about specific exposures can be added to the open-ended interviews of subsequent cases, creating a hybrid open-ended/iterative model. This requires cooperative patients and a persistent investigative approach but has yielded correct hypotheses with as few as 2 interviews ( 29 ).
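The pruning step described above can be sketched as a simple tabulation. This example assumes each interview is stored as a mapping from food item to a "yes", "maybe", or "no" answer, and it introduces a hypothetical `protected` set so that stealth ingredients are never proposed for removal. The threshold of 8 interviews mirrors the lower bound mentioned in the text; all names and data are illustrative.

```python
def split_items(interviews, protected=frozenset(), min_interviews=8):
    """Split questionnaire items into (keep, removal_candidates).

    An item becomes a removal candidate only if enough interviews have
    been completed, no case answered "yes" or "maybe", and the item is
    not in the protected (stealth ingredient) set.
    """
    items = sorted({item for answers in interviews for item in answers})
    if len(interviews) < min_interviews:
        return items, []  # too few interviews to justify pruning
    keep, candidates = [], []
    for item in items:
        reported = any(
            answers.get(item) in ("yes", "maybe") for answers in interviews
        )
        if reported or item in protected:
            keep.append(item)
        else:
            candidates.append(item)
    return keep, candidates


# Hypothetical data: 8 identical interviews, for illustration only.
example = [{"chicken": "yes", "kale": "no", "cilantro": "no"}] * 8
keep, candidates = split_items(example, protected={"cilantro"})
print("keep:", keep)
print("consider removing:", candidates)
```

The output is a list of candidates for the investigator to review, not an automatic deletion, which is in keeping with the caution that removal of questions should be done carefully.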

Additional methods to ascertain exposures, such as obtaining consumer food purchase data, can be appropriate, particularly for outbreaks where obtaining a food history is challenging ( 30 ). For example, during a multistate Salmonella serotype Montevideo outbreak, initial hypothesis-generating interviews did not identify a clear signal beyond shopping at the same warehouse store. Investigators used shopper membership card purchase information to generate hypotheses, which ultimately helped identify red and black peppercorns coating a ready-to-eat salami as the vehicle ( 31 ). In addition, information from services for grocery home delivery, restaurant take-out delivery, and meal kits might help to clarify specific exposures. Other potential methods include focus-group interviews and household inspections, although these are used more rarely and in specific scenarios, with mixed results ( 32 ).

Binomial probability comparisons can further refine hypotheses by comparing the proportion of cases in an outbreak reporting a food exposure with the expected background proportion of the population reporting the food exposure (33, 34). Binomial probability calculations in foodborne-disease outbreak investigations emerged in Oregon in 2003 as a complement to the “shotgun” questionnaire pioneered there. They rely on independent data on food exposure frequency from sporadic cases, past outbreak cases, or well persons sampled from the population. Such data sources include healthy people surveyed as part of the FoodNet Population Survey, standardized data collected in previous outbreaks, and sporadic cases, as is done with the Listeria Initiative and Project Hg (33, 35, 36).
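A binomial comparison of this kind can be sketched in a few lines of Python. The example computes the upper-tail probability of seeing at least the observed number of exposed cases if each case had eaten the food at the background rate. The function name `binomial_tail` and the numbers (8 of 10 cases, 20% background) are hypothetical, and this one-sided calculation is only one reasonable form of the comparison, not necessarily the exact procedure used by any particular health department.

```python
from math import comb


def binomial_tail(n_cases, n_exposed, background):
    """P(X >= n_exposed) for X ~ Binomial(n_cases, background).

    background is the expected proportion of the comparison population
    reporting the exposure (for example, from a population survey).
    """
    if not 0.0 <= background <= 1.0:
        raise ValueError("background must be between 0 and 1")
    if not 0 <= n_exposed <= n_cases:
        raise ValueError("n_exposed must be between 0 and n_cases")
    return sum(
        comb(n_cases, k) * background**k * (1.0 - background) ** (n_cases - k)
        for k in range(n_exposed, n_cases + 1)
    )


# Hypothetical: 8 of 10 cases report a food that 20% of the
# comparison population reports eating.
p = binomial_tail(10, 8, 0.20)
print(f"P(at least 8 of 10 exposed by chance) = {p:.6f}")
```

A small probability flags the food as reported more often than expected and worth pursuing. It does not establish causation, and foods with a high background rate of consumption will rarely stand out in this comparison.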

Hypothesis generation is a critical, but challenging, step in a foodborne outbreak investigation. A well-informed hypothesis can increase the likelihood of rapidly and conclusively implicating the contaminated food vehicle; conversely, the chances of implicating a food item are small if that item is not considered as part of the outbreak investigation. Inadequate hypothesis generation can delay investigation progress and limit investigators’ ability to rapidly identify the outbreak source, potentially leading to prolonged exposure and more illnesses. The 3 primary sources of information presented as part of this framework—known sources of the pathogen causing illness, descriptive data, and case exposure assessment—provide vital information for hypothesis generation, particularly when used in combination and revisited throughout the outbreak investigation.

Despite these sources of information, there are certain types of outbreaks for which hypothesis generation is inherently more challenging. These include outbreaks for which the vehicle has a high background rate of consumption (e.g., chicken) or outbreaks associated with a “stealth” food (e.g., garnishes, spices, chili peppers, or sprouts) that many cases could have consumed, but few remember eating. These challenges can sometimes be overcome by obtaining details on food exposures such as brand/variety and point of purchase. Obtaining this information is also critical to rapidly initiating a traceback investigation. An outbreak might also be caused by multiple contaminated food products when, for example, multiple foods have a single common ingredient or when poor sanitation or contaminated equipment leads to cross-contamination. Furthermore, the key exposure might not be a food at all, but rather an environmental or animal exposure, emphasizing that food should not be the default hypothesis.

There might be specific clues or “toe-holds” that help identify a hypothesis and accelerate an investigation. For example, cases with restricted diets, food diaries, or highly unusual or specific exposures can narrow the list of potential foods. This could include cases who traveled briefly to the outbreak location, and thus had a limited number of exposures. Smaller, localized clusters within a larger outbreak associated with restaurants, events, stores, or institutions, or “subclusters,” are often crucial to hypothesis generation, providing a finite list of foods. For example, in a multistate outbreak of Salmonella serotype Typhimurium infections associated with consumption of tomatoes, comparison of 4 restaurant-associated subclusters was instrumental in rapidly identifying a small set of potential vehicles ( 4 ). Subcluster investigations are precisely focused and as such can lead to much more rapid and efficient hypothesis generation and testing than attempts to assess all exposures among all cases in a large outbreak. Because of the immense value of subclusters, every effort should be made to quickly identify them through initial interviews and the iterative interviewing approach ( 25 ).
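Comparing subclusters amounts to asking which foods recur across otherwise unrelated venues. As a rough sketch, the snippet below counts the number of subclusters in which each food appears and ranks foods by that count. The function name `rank_shared_items`, the venue labels, and the food lists are invented for illustration and do not come from the outbreak cited above.

```python
from collections import Counter


def rank_shared_items(subclusters):
    """Rank foods by the number of subclusters in which they appear.

    subclusters maps a venue label to the foods served to, or reported
    by, its cases. Ties are broken alphabetically.
    """
    counts = Counter(
        item for foods in subclusters.values() for item in set(foods)
    )
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical subclusters, for illustration only.
example = {
    "restaurant A": ["tomato", "lettuce", "chicken"],
    "restaurant B": ["tomato", "beef", "lettuce"],
    "restaurant C": ["tomato", "onion"],
}
for food, n in rank_shared_items(example):
    print(f"{food}: {n} of {len(example)} subclusters")
```

Foods present in every subcluster form the short list for follow-up. Ranking by count rather than taking a strict intersection keeps an item in view even if one venue's menu or recall data are incomplete.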

The majority of outbreaks are associated with common foods previously associated with that pathogen. In an investigation, it is important to both rule in and rule out common vehicles, while keeping an open mind about potential novel vehicles. If investigators suspect a novel vehicle, they should still rule out the most common vehicles when designing epidemiologic studies. For example, if an STEC outbreak investigation implicates cucumbers, regulatory partners will want to confirm that investigators have eliminated common STEC vehicles such as ground beef, leafy greens, and sprouts. That said, food vehicles change over time, reflecting changing food preferences and trends in food safety measures, and new vehicles continue to emerge (e.g., in recent years: SoyNut butter, raw flour, caramel apples, kratom, and chia seed powder). HGQs are biased toward previously implicated foods and a finite list of foods. If cases continue without a clear hypothesis emerging, it might be necessary to try open-ended hypothesis-generating interviews.

Hypothesis generation during foodborne outbreak investigation will evolve as laboratory techniques advance. Molecular sequencing techniques based on WGS might give investigators more conviction in devoting resources to following leads because there is more confidence that the cases have a common source for their illnesses ( 17 , 37 ). Concurrent or recent nonhuman isolates (e.g., food isolates) that match human case isolates by sequencing will be considered even more likely to be related to the human cases and become a priori hypotheses during investigations.

Foodborne-outbreak investigation methods are constantly evolving. Food production, processing, and distribution are changing to meet consumer demands. Outbreak investigations are more complex, given that laboratory methods for subtyping, strategies for epidemiologic investigation, and environmental assessments are also changing. Rapid investigation is essential, because with mass production and distribution, food safety errors can cause large and widespread outbreaks. Outbreak investigations balance the need for expediency to implement control measures with the need for accuracy. If hastily developed hypotheses are incorrect or insufficiently refined, analytical studies are unlikely to succeed and can waste time and resources. Alternatively, a refined hypothesis can lead directly to effective public health interventions, sometimes bypassing the need for an analytical study, if accompanied by other compelling evidence, such as laboratory evidence or traceback information.

Effectively and swiftly sharing data across jurisdictions increases an investigation team’s ability to quickly develop hypotheses and implicate food vehicles. Successful investigations depend on including the correct hypothesis, the result of a systematic approach to hypothesis generation. The exact path to identifying a hypothesis is rarely the same between outbreaks. Therefore, investigators should be familiar with different hypothesis-generating strategies and be flexible in deciding which strategies to employ.

Author affiliations: Department of Epidemiology, Colorado School of Public Health, Aurora, Colorado, United States (Alice E. White, Elaine Scallan Walter); Minnesota Department of Health, St. Paul, Minnesota, United States (Kirk E. Smith, Carlota Medus); Washington State Department of Health, Tumwater, Washington, United States (Hillary Booth); and Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging Zoonotic and Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States (Robert V. Tauxe, Laura Gieraltowski).

This work was funded in part by the Colorado and Minnesota Integrated Food Safety Centers of Excellence, which are supported by the Epidemiology and Laboratory Capacity for Infectious Disease Cooperative Agreement through the Centers for Disease Control and Prevention.

Conflict of interest: none declared.

1. Scallan E, Hoekstra RM, Angulo FJ, et al. Foodborne illness acquired in the United States—major pathogens. Emerg Infect Dis. 2011;17(1):7–15.

2. Tauxe RV. Surveillance and investigation of foodborne diseases; roles for public health in meeting objectives for food safety. Food Control. 2002;13(6-7):363–369.

3. Dewey-Mattia D, Manikonda K, Hall AJ, et al. Surveillance for foodborne disease outbreaks—United States, 2009–2015. MMWR Morb Mortal Wkly Rep. 2018;67(10):1–11.

4. Behravesh CB, Blaney D, Medus C, et al. Multistate outbreak of Salmonella serotype typhimurium infections associated with consumption of restaurant tomatoes, USA, 2006: hypothesis generation through case exposures in multiple restaurant clusters. Epidemiol Infect. 2012;140(11):2053–2061.

5. Boulton ML, Rosenberg LD. Food safety epidemiology capacity in state health departments—United States, 2010. MMWR Morb Mortal Wkly Rep. 2011;60(50):1701–1704.

6. Porta MA. A Dictionary of Epidemiology. 5th ed. New York, NY: Oxford University Press; 2008(4):82.

7. Centers for Disease Control and Prevention. National Outbreak Reporting System Dashboard. https://wwwn.cdc.gov/norsdashboard/. Updated December 7, 2018. Accessed April 9, 2021.

8. Lampel KA, Al-Khaldi S, Cahill SM, eds. Bad Bug Book, Foodborne Pathogenic Microorganisms and Natural Toxins. 2nd ed. Washington, DC: Food and Drug Administration; 2012.

9. Centers for Disease Control and Prevention. An Atlas of Salmonella in the United States, 1968–2011: Laboratory-Based Enteric Disease Surveillance. Atlanta, GA: US Department of Health and Human Services, CDC; 2013. https://www.cdc.gov/salmonella/pdf/salmonella-atlas-508c.pdf. Accessed April 9, 2021.

10. Interagency Food Safety Analytics Collaboration. Foodborne Illness Source Attribution Estimates for 2017 for Salmonella, Escherichia coli O157, Listeria monocytogenes, and Campylobacter Using Multi-Year Outbreak Surveillance Data, United States. Atlanta, GA and Washington DC: US Department of Health and Human Services; 2019. https://www.cdc.gov/foodsafety/ifsac/pdf/P19-2017-report-TriAgency-508-archived.pdf. Accessed April 9, 2021.

11. Friedman CR, Hoekstra RM, Samuel M, et al. Risk factors for sporadic Campylobacter infection in the United States: a case‐control study in FoodNet sites. Clin Infect Dis. 2004;38(suppl 3):S285–S296.

12. Varma J, Samuel M, Marcus R, et al. Listeria monocytogenes infection from foods prepared in a commercial establishment: a case-control study of potential sources of sporadic illness in the United States. Clin Infect Dis. 2007;44(4):521–528.

13. Jackson BR, Griffin PM, Cole D, et al. Outbreak-associated Salmonella enterica serotypes and food commodities, United States, 1998–2008. Emerg Infect Dis. 2013;19(8):1239–1244.

14. Brown AC, Grass JE, Richardson LC, et al. Antimicrobial resistance in Salmonella that caused foodborne disease outbreaks: United States, 2003–2012. Epidemiol Infect. 2017;145(4):766–774.

15. Centers for Disease Control and Prevention. Multistate outbreak of E. coli O157:H7 infections linked to romaine lettuce. https://www.cdc.gov/ecoli/2018/o157h7-04-18/index.html. Published June 28, 2018. Accessed August 6, 2020.

16. Centers for Disease Control and Prevention. Outbreak of E. coli infections linked to romaine lettuce. https://www.cdc.gov/ecoli/2019/o157h7-11-19/index.html. Published January 15, 2020. Accessed August 6, 2020.

17. Besser JM, Carleton HA, Trees E, et al. Interpretation of whole-genome sequencing for enteric disease surveillance and outbreak investigation. Foodborne Pathog Dis. 2019;16(7):504–512.

18. Sotir MJ, Ewald G, Kimura AC, et al. Outbreak of Salmonella Wandsworth and Typhimurium infections in infants and toddlers traced to a commercial vegetable-coated snack food. Pediatr Infect Dis J. 2009;28(12):1041–1046.

19. White A, Cronquist A, Bedrick E, et al. Food source prediction of Shiga toxin-producing Escherichia coli outbreaks using demographic and outbreak characteristics, United States, 1998–2014. Foodborne Pathog Dis. 2016;13(10):527–534.

20. Shiferaw B, Verrill L, Booth H, et al. Sex-based differences in food consumption: Foodborne Diseases Active Surveillance Network (FoodNet) Population Survey, 2006–2007. Clin Infect Dis. 2012;54(suppl 5):S453–S457.

21. Ferguson DD, Scheftel J, Cronquist A, et al. Temporally distinct Escherichia coli O157 outbreaks associated with alfalfa sprouts linked to a common seed source—Colorado and Minnesota, 2003. Epidemiol Infect. 2005;133(3):439–447.

22. Tauxe RV. Emerging foodborne diseases: an evolving public health challenge. Emerg Infect Dis. 1997;3(4):425–434.

23. Public Health Agency of Canada. Public Health Notice—outbreak of Salmonella infections linked to Celebrate brand frozen classic/classical and egg nog flavoured profiteroles (cream puffs) and mini chocolate eclairs. https://www.canada.ca/en/public-health/services/public-health-notices/2019/outbreak-salmonella.html. Published June 27, 2019. Accessed August 6, 2020.

24. Mba-Jonas A, Culpepper W, Hill T, et al. A multistate outbreak of human Salmonella Agona infections associated with consumption of fresh, whole papayas imported from Mexico—United States, 2011. Clin Infect Dis. 2018;66(11):1756–1761.

25. Hedberg C. Guidelines for Foodborne Disease Outbreak Response. 3rd ed. Atlanta, GA: Council to Improve Foodborne Outbreak Response (CIFOR); 2020.

26. Centers for Disease Control and Prevention. Foodborne disease outbreak investigation and surveillance tools. https://www.cdc.gov/foodsafety/outbreaks/surveillance-reporting/investigation-toolkit.html. Reviewed June 10, 2021. Accessed July 2, 2021.

27. Meyer SD, Kirk SE, Hedberg CH. Chapter 7.2—Surveillance for foodborne diseases, part 2: investigation of foodborne disease outbreaks. In: M'ikanatha NM, Lynfield R, Van Beneden CA, et al., eds. Infectious Disease Surveillance. 5th ed. West Sussex, UK: Wiley-Blackwell; 2013:120–128.

28. Chai S, Gu W, O'Connor KA, et al. Incubation periods of enteric illnesses in foodborne outbreaks, United States, 1998–2013. Epidemiol Infect. 2019;147:e285.

29. Angelo KM, Conrad AR, Saupe A, et al. Multistate outbreak of Listeria monocytogenes infections linked to whole apples used in commercially produced, prepackaged caramel apples: United States, 2014–2015. Epidemiol Infect. 2017;145(5):848–856.

30. Møller FT, Mølbak K, Ethelberg S. Analysis of consumer food purchase data used for outbreak investigations, a review. Euro Surveill. 2018;23(24):1700503.

31. Gieraltowski L, Julian E, Pringle J, et al. Nationwide outbreak of Salmonella Montevideo infections associated with contaminated imported black and red pepper: warehouse membership cards provide critical clues to identify the source. Epidemiol Infect. 2013;141(6):1244–1252.

32. Ickert C, Cheng J, Reimer D, et al. Methods for generating hypotheses in human enteric illness outbreak investigations: a scoping review of the evidence. Epidemiol Infect. 2019;147:e280.

33. Jervis RH, Booth H, Cronquist AB, et al. Moving away from population-based case-control studies during outbreak investigations. J Food Prot. 2019;82(8):1412–1416.

34. Keene W. The use of binomial probabilities in outbreak investigations (abstract). Presented at the Annual OutbreakNet Conference, Long Beach, California; September 22, 2011.

35. McCollum JT, Cronquist AB, Silk BJ, et al. Multistate outbreak of listeriosis associated with cantaloupe. N Engl J Med. 2013;369(10):944–953.

36. Centers for Disease Control and Prevention. National Listeria Surveillance: Listeria initiative. https://www.cdc.gov/nationalsurveillance/listeria-surveillance.html. Published September 13, 2018. Accessed August 6, 2020.

37. Jackson BR, Tarr C, Strain E, et al. Implementation of nationwide real-time whole-genome sequencing to enhance listeriosis outbreak detection and investigation. Clin Infect Dis. 2016;63(3):380–386.

38. Sharapov UM, Wendel AM, Davis JP, et al. Multistate outbreak of Escherichia coli O157:H7 infections associated with consumption of fresh spinach: United States, 2006. J Food Prot. 2016;79(12):2024–2030.

39. Neil KP, Biggerstaff G, MacDonald JK, et al. A novel vehicle for transmission of Escherichia coli O157:H7 to humans: multistate outbreak of E. coli O157:H7 infections associated with consumption of ready-to-bake commercial prepackaged cookie dough—United States, 2009. Clin Infect Dis. 2012;54(4):511–518.

40. Miller BD, Rigdon CE, Ball J, et al. Use of traceback methods to confirm the source of a multistate Escherichia coli O157:H7 outbreak due to in-shell hazelnuts. J Food Prot. 2012;75(2):320–327.

41. Medus C, Meyer S, Smith K, et al. Multistate outbreak of Salmonella infections associated with peanut butter and peanut butter-containing products—United States, 2008–2009. MMWR Morb Mortal Wkly Rep. 2009;58(4):85–90.

42. Gambino-Shirley KJ, Tesfai A, Schwensohn CA, et al. Multistate outbreak of Salmonella Virchow infections linked to a powdered meal replacement product—United States, 2015–2016. Clin Infect Dis. 2018;67(6):890–896.

43. Centers for Disease Control and Prevention. Multistate outbreak of Salmonella infections linked to kratom. https://www.cdc.gov/salmonella/kratom-02-18/index.html. 2018. Published February 20, 2018. Accessed September 14, 2020.

44. Centers for Disease Control and Prevention. Multistate outbreak of Salmonella infections linked to kratom. https://www.cdc.gov/nationalsurveillance/listeria-surveillance.html. Last reviewed September 13, 2018. Accessed July 2, 2021.

  • disease outbreaks
  • pathogenic organism
  • foodborne disease


  • Online ISSN 1476-6256
  • Print ISSN 0002-9262
  • Copyright © 2024 Johns Hopkins Bloomberg School of Public Health

7.1.4 - Developing and Evaluating Hypotheses

Developing Hypotheses

After interviewing affected individuals, gathering data to characterize the outbreak by time, place, and person, and consulting with other health officials, a disease detective will have more focused hypotheses about the source of the disease, its mode of transmission, and the exposures which cause the disease. Hypotheses should be stated in a manner that can be tested.

Hypotheses are developed in a variety of ways. First, consider the known epidemiology for the disease: What is the agent's usual reservoir? How is it usually transmitted? What are the known risk factors? Consider all the 'usual suspects.'

Open-ended conversations with those who fell ill or even visiting homes to look for clues in refrigerators and shelves can be helpful. If the epidemic curve points to a short period of exposure, ask what events occurred around that time. If people living in a particular area have the highest attack rates, or if some groups with a particular age, sex, or other personal characteristics are at greatest risk, ask "why?". Such questions about the data should lead to hypotheses that can be tested.

Evaluating Hypotheses

There are two approaches to evaluating hypotheses: comparison of the hypotheses with the established facts, and analytic epidemiology, which allows hypotheses to be tested.

A comparison with established facts is useful when the evidence is so strong that the hypothesis does not need to be tested. A 1991 investigation of an outbreak of vitamin D intoxication in Massachusetts is a good example. All of the people affected drank milk delivered to their homes by a local dairy. Investigators hypothesized that the dairy was the source, and the milk was the vehicle of excess vitamin D. When they visited the dairy, they quickly recognized that far more than the recommended dose of vitamin D was inadvertently being added to the milk. No further analysis was necessary.

Analytic epidemiology is used when the cause is less clear. Hypotheses are tested using a comparison group to quantify relationships between various exposures and disease. Case-control studies, and occasionally cohort studies, are useful for this purpose.

Case-control studies

As you recall from last week's lesson, in a case-control study case-patients and controls are asked about their exposures. An odds ratio is calculated to quantify the relationship between exposure and disease.
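As a minimal sketch of that calculation, the odds ratio can be computed from the four cells of a 2x2 table. The function name and the counts below are hypothetical and are used only for illustration; they do not come from any real outbreak.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds ratio from a 2x2 case-control table: (a * d) / (b * c)."""
    if cases_unexposed == 0 or controls_exposed == 0:
        # The ratio is undefined when either denominator cell is empty.
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)


# Hypothetical counts: 30 of 40 case-patients and 20 of 80 controls ate the suspect item.
example_or = odds_ratio(30, 10, 20, 60)
print(f"Odds ratio: {example_or:.1f}")
```

An odds ratio well above 1 means that the odds of exposure were higher among case-patients than among controls; whether that difference could be due to chance is a separate question.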

In general, the more case patients (and controls) you have, the easier it is to find an association. Often, however, an outbreak is small. For example, 4 or 5 cases may constitute an outbreak. An adequate number of potential controls is more easily located. In an outbreak of 50 or more cases, 1 control per case-patient will usually suffice. In smaller outbreaks, you might use 2, 3, or 4 controls per case-patient. More than 4 controls per case-patient are rarely worth the effort because the power of the study does not increase much when you have more than 4 controls per case-patient (we will talk more on power and sample size in epidemiologic studies later in this course!).
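One way to see why additional controls give diminishing returns is a commonly cited rule of thumb, which is an assumption brought in here and not part of the lesson text: with r controls per case-patient, the statistical efficiency relative to having an unlimited number of controls is roughly r / (r + 1). The sketch below simply tabulates that approximation.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency of r controls per case relative to unlimited controls.

    Uses the rule-of-thumb approximation r / (r + 1), which holds best when the
    true association is weak; it is an illustration, not a power calculation.
    """
    if controls_per_case <= 0:
        raise ValueError("controls_per_case must be positive")
    return controls_per_case / (controls_per_case + 1)


for r in range(1, 7):
    print(f"{r} control(s) per case: {relative_efficiency(r):.2f}")
```

Under this approximation, moving from 1 to 2 controls per case-patient adds far more than moving from 4 to 5, which is consistent with the guidance above that more than 4 controls are rarely worth the effort.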

Testing statistical significance

The final step in testing a hypothesis is to determine how likely it is that the study results could have occurred by chance alone. Is the exposure the study results suggest as the source of the outbreak related to the disease after all? The significance of the odds ratio can be assessed with a chi-square test. We will also discuss statistical tests that control for many possible factors later in the course.
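The following is a minimal sketch of a Pearson chi-square test for a 2x2 table, written with the Python standard library only. It assumes no continuity correction and uses hypothetical counts; with very small outbreaks, where expected cell counts are low, an exact test would usually be preferred over this approximation.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value for the table [[a, b], [c, d]].

    a, b: exposed and unexposed case-patients; c, d: exposed and unexposed controls.
    No continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be greater than zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 30 of 40 case-patients and 20 of 80 controls were exposed.
chi2, p = chi_square_2x2(30, 10, 20, 60)
print(f"chi-square = {chi2:.2f}, p < 0.001: {p < 0.001}")
```

A small p-value indicates that an association this strong would be unlikely to arise by chance alone if exposure and illness were unrelated.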

Cohort studies

If the outbreak occurs in a small, well-defined population, a cohort study may be possible. For example, if an outbreak of gastroenteritis occurs among people who attended a particular social function, such as a banquet, and a complete list of guests is available, it is possible to ask each attendee the same set of questions about potential exposures and whether he or she became ill with gastroenteritis.

After collecting this information from each guest, an attack rate can be calculated for people who ate a particular item (were exposed) and an attack rate for those who did not eat that item (were not exposed). For the exposed group, the attack rate is found by dividing the number of people who ate the item and became ill by the total number of people who ate that item. For those who were not exposed, the attack rate is found by dividing the number of people who did not eat the item but still became ill by the total number of people who did not eat that item.

To identify the source of the outbreak from this information, you would look for an item with:

  • a high attack rate among those exposed and
  • a low attack rate among those not exposed (so the difference or ratio between attack rates for the two exposure groups is high); in addition
  • most of the people who became ill should have consumed the item, so that the exposure could explain most, if not all, of the cases.
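The attack-rate comparison described above can be sketched as follows. The function names and the banquet counts are hypothetical; the calculation mirrors the definitions in the text (ill among exposed divided by all exposed, and likewise for the unexposed).

```python
import math


def attack_rate(ill, total):
    """Proportion of a group that became ill."""
    if total <= 0:
        raise ValueError("total must be positive")
    return ill / total


def summarize_item(ill_exposed, total_exposed, ill_unexposed, total_unexposed):
    """Attack rates, their difference and ratio, and the share of cases explained."""
    ar_exposed = attack_rate(ill_exposed, total_exposed)
    ar_unexposed = attack_rate(ill_unexposed, total_unexposed)
    all_ill = ill_exposed + ill_unexposed
    return {
        "attack_rate_exposed": ar_exposed,
        "attack_rate_unexposed": ar_unexposed,
        "risk_difference": ar_exposed - ar_unexposed,
        # The ratio is unbounded when no unexposed person became ill.
        "risk_ratio": ar_exposed / ar_unexposed if ar_unexposed > 0 else math.inf,
        # Share of all ill people who ate the item.
        "share_of_cases_exposed": ill_exposed / all_ill if all_ill > 0 else 0.0,
    }


# Hypothetical banquet item: 40 of 50 who ate it became ill; 5 of 50 who did not also became ill.
summary = summarize_item(40, 50, 5, 50)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

An item that meets all three criteria in the list above would show a high exposed attack rate, a low unexposed attack rate, and a share of cases close to 1.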

We will learn more about cohort studies in Week 9 of this course.

The Epidemiologic Toolbox: Identifying, Honing, and Using the Right Tools for the Job

  • PMID: 32207771
  • PMCID: PMC7368131
  • DOI: 10.1093/aje/kwaa030

There has been much debate about the relative emphasis of the field of epidemiology on causal inference. We believe this debate does short shrift to the breadth of the field. Epidemiologists answer myriad questions that are not causal and hypothesize about and investigate causal relationships without estimating causal effects. Descriptive studies face significant and often overlooked inferential and interpretational challenges; we briefly articulate some of them and argue that a more detailed treatment of biases that affect single-sample estimation problems would benefit all types of epidemiologic studies. Lumping all questions about causality creates ambiguity about the utility of different conceptual models and causal frameworks; 2 distinct types of causal questions include 1) hypothesis generation and theorization about causal structures and 2) hypothesis-driven causal effect estimation. The potential outcomes framework and causal graph theory help efficiently and reliably guide epidemiologic studies designed to estimate a causal effect to best leverage prior data, avoid cognitive fallacies, minimize biases, and understand heterogeneity in treatment effects. Appropriate matching of theoretical frameworks to research questions can increase the rigor of epidemiologic research and increase the utility of such research to improve public health.

Keywords: bias; causality; descriptive studies; epidemiologic methods; inference.

© The Author(s) 2020. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: [email protected].

Publication types

  • Research Support, N.I.H., Extramural
  • Epidemiologic Methods*
  • Models, Theoretical
  • Public Health

Grants and funding

  • K01 AA028193/AA/NIAAA NIH HHS/United States
  • K01 AI125087/AI/NIAID NIH HHS/United States
  • R01 ES029531/ES/NIEHS NIH HHS/United States

3. Generate Hypotheses

Developing a hypothesis regarding the cause of the outbreak is often challenging and is a crucial step in the outbreak investigation.

Many pathogens that cause waterborne diseases can also be transmitted by contaminated food or by contact with an infected person or animal. When looking for the source of the illness, investigators first need to decide on the likely mode(s) of transmission. The identified pathogen, where ill persons live, or the age of the patients may suggest a particular mode of transmission and could help identify a specific source. Hypothesis generation should be considered an iterative process in which possible explanations are continually refined or refuted.

When exposure to water is suspected as the source of contamination, public health officials interview ill persons to determine water exposures in the days or weeks prior to onset of illness. These interviews are called “hypothesis-generating interviews.” Interviews can either use a standardized questionnaire (e.g., a “shotgun” questionnaire), or they can be open-ended. Standardized interviews include a set of questions used by public health officials to interview ill people during outbreak investigations. Open-ended interviews are not standardized and do not provide concrete exposures for analysis. Interviews focus on activities and experiences that occurred during the pathogen’s incubation period (the time it takes to get sick after exposure to the contaminated water). A table of common waterborne pathogens and their incubation periods is listed in the Appendices.
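As a rough illustration of how the incubation period bounds the interview questions, the sketch below derives the window of dates during which exposure most plausibly occurred for one ill person. The function name, the onset date, and the 2-to-10-day incubation range are hypothetical placeholders; real ranges should be taken from the pathogen table referenced above.

```python
from datetime import date, timedelta


def exposure_window(onset, min_incubation_days, max_incubation_days):
    """Return (earliest, latest) likely exposure dates for a given illness onset date."""
    if min_incubation_days < 0 or min_incubation_days > max_incubation_days:
        raise ValueError("need 0 <= min_incubation_days <= max_incubation_days")
    earliest = onset - timedelta(days=max_incubation_days)
    latest = onset - timedelta(days=min_incubation_days)
    return earliest, latest


# Hypothetical case: onset on 20 July 2024, assumed incubation range of 2 to 10 days.
earliest, latest = exposure_window(date(2024, 7, 20), 2, 10)
print(f"Ask about water exposures from {earliest.isoformat()} to {latest.isoformat()}")
```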

Based on all the information gathered, the investigators make a hypothesis about the likely source of the outbreak. If they are not able to develop a hypothesis, investigators can return to intensive, open-ended interviews or utilize a different set of standardized questions to develop clues to the outbreak source. Clues to the outbreak source might come from ill persons with few exposure opportunities or from interviewing cohorts (e.g., family groups or sports teams) within the larger outbreak population.



Descriptive Epidemiology


Hypothesis Formulation – Characteristics of Person, Place, and Time

Descriptive epidemiology searches for patterns by examining characteristics of person, place, and time. These characteristics are carefully considered when a disease outbreak occurs, because they provide important clues regarding the source of the outbreak.

Hypotheses about the determinants of disease arise from considering the characteristics of person, place, and time and looking for differences, similarities, and correlations. Consider the following examples:

  • Differences: if the frequency of disease differs in two circumstances, it may be caused by a factor that differs between the two circumstances. For example, there was a substantial difference in the incidence of stomach cancer in Japan and the US. There are also substantial differences in genetics and diet. Perhaps these factors are related to stomach cancer.
  • Similarities: if a high frequency of disease is found in several different circumstances and one can identify a common factor, then the common factor may be responsible. For example, AIDS in IV drug users, recipients of transfusions, and hemophiliacs suggests the possibility that HIV can be transmitted via blood or blood products.
  • Correlations: if the frequency of disease varies in relation to some factor, then that factor may be a cause of the disease. For example, the frequency of coronary heart disease varies with cigarette consumption.

Descriptive epidemiology provides a way of organizing and analyzing data on health and disease in order to understand variations in disease frequency geographically and over time and how disease varies among people based on a host of personal characteristics (person, place, and time). Epidemiology had its origins in the desire to understand the determinants of acute infectious diseases, but its methods and applicability have expanded to include chronic diseases as well.

Descriptive Epidemiology for Infectious Disease Outbreaks

Outbreaks generally come to the attention of state or local health departments in one of two ways:

  • Astute individuals (citizens, physicians, nurses, laboratory workers) will sometimes notice cases of disease occurring close together with respect to time and/or location or they will notice several individuals with unusual features of disease and report them to health authorities.
  • Public health surveillance systems collect data on 'reportable diseases'. Requirements for reporting infectious diseases in Massachusetts are described in 105 CMR 300.000 (Reportable Diseases, Surveillance, and Isolation and Quarantine Requirements).

Clues About the Source of an Outbreak of Infectious Disease

When an outbreak occurs, one of the first things that should be considered is what is known about that particular disease. How can the disease be transmitted? In what settings is it commonly found? What is the incubation period? There are many good summaries available online. For example, Massachusetts DPH provides a PDF fact sheet for hepatitis A, which provides a very succinct summary. With this background information in mind, the initial task is to begin to characterize the cases in terms of personal characteristics, location, and time (when did they become ill, and where might they have been exposed given the incubation period for that disease?). In a sense, we are looking for the common element that explains why all of these people became ill. What do they have in common?

"Person"

Information about the cases is typically recorded in a "line listing," a grid on which information for each case is summarized with a separate column for each variable. Demographic information (e.g., age, sex, and address) is always relevant, because these are often the characteristics most strongly related to exposure and to the risk of disease. In the beginning of an investigation, a small number of cases will be interviewed to look for some common link. These are referred to as "hypothesis-generating interviews." Depending on the means by which the disease is generally transmitted, the investigator might also want to know about other personal characteristics, such as travel, occupation, leisure activities, and use of medications, tobacco, or drugs. What did these victims have in common? Where did they do their grocery shopping? What restaurants had they gone to in the past month or so? Had they traveled? Had they been exposed to other people who had been ill? Other characteristics will be more specific to the disease under investigation and the setting of the outbreak. For example, if you were investigating an outbreak of hepatitis B, you should consider the usual high-risk exposures for that infection, such as intravenous drug use, sexual contacts, and health care employment. Of course, with an outbreak of foodborne illness (such as hepatitis A), it would be important to ask many questions about possible food exposures. Where do you generally eat your meals? Do you ever eat at restaurants or obtain foods from sources outside the home? Hypothesis-generating interviews may quickly reveal some commonalities that provide clues about the possible sources.

"Place"

Assessment of an outbreak by place provides information on the geographic extent of a problem and may also show clusters or patterns that provide clues to the identity and origins of the problem. A simple and useful technique for looking at geographic patterns is to plot, on a "spot map" of the area, where the affected people live, work, or may have been exposed. A spot map of cases may show clusters or patterns that reflect water supplies, wind currents, or proximity to a restaurant or grocery store.

In 1854 there was an epidemic of cholera in the Broad Street area of London. John Snow determined the residence or place of business of the victims and plotted them on a street map (the stacked black disks on the map below). He noted that the cases were clustered around the Broad Street community pump. It was also noteworthy that there were large numbers of workers in a local workhouse and a brewery, but none of these workers were affected - the workhouse and brewery each had their own well.

Map of the Broad Street section of London where a cholera outbreak occurred in 1854. Locations of cholera victims are shown with stacks of disks that are clustered around the Broad Street water pump.

On a spot map within a hospital, nursing home, or other such facility, clustering usually indicates either a focal source or person-to-person spread, while the scattering of cases throughout a facility is more consistent with a common source such as a dining hall. In studying an outbreak of surgical wound infections in a hospital, we might plot cases by operating room, recovery room, and ward room to look for clustering.


"Time"

When investigating the source of an outbreak of infectious disease, investigators record the date of onset of disease for each of the victims and then plot the onset of new cases over time to create what is referred to as an epidemic curve. The epidemic curve for an outbreak of hepatitis A is shown in the illustration below. Beginning in late April, the number of new cases rises to a peak of twelve new cases reported on May 12, and then the number of new cases gradually drops back to zero by May 21. Knowing that the incubation period for hepatitis A averages about 28-30 days, the investigators concluded that this was a point source epidemic because the cluster of new cases all occurred within the span of a single incubation period (see explanation on the next page). This, in conjunction with other information, provided important clues that helped shape their hypotheses about the source of the outbreak.
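The tallying behind an epidemic curve, and the single-incubation-period check described above, can be sketched in a few lines. The onset dates below are hypothetical and only loosely patterned on the description (a rise from late April, a peak of twelve cases on May 12, the last case on May 21); the year, the other daily counts, and the function names are assumptions made for illustration.

```python
from collections import Counter
from datetime import date


def epidemic_curve(onset_dates):
    """Count new cases per onset date, returned in chronological order."""
    return dict(sorted(Counter(onset_dates).items()))


def fits_single_incubation_period(onset_dates, incubation_days):
    """True if all onsets fall within a span no longer than one incubation period."""
    if not onset_dates:
        raise ValueError("onset_dates must not be empty")
    span_days = (max(onset_dates) - min(onset_dates)).days
    return span_days <= incubation_days


# Hypothetical line-list onset dates.
onsets = (
    [date(2024, 4, 28)] * 1
    + [date(2024, 5, 5)] * 4
    + [date(2024, 5, 12)] * 12
    + [date(2024, 5, 16)] * 5
    + [date(2024, 5, 21)] * 1
)

# Print a simple text histogram: one '#' per case.
for day, count in epidemic_curve(onsets).items():
    print(day.isoformat(), "#" * count)

print("Consistent with a point source:", fits_single_incubation_period(onsets, 28))
```

This check is only one clue; the shape of the curve and other descriptive information would still need to be weighed before concluding that the outbreak had a point source.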


Content ©2017. All Rights Reserved. Date last modified: May 5, 2017. Wayne W. LaMorte, MD, PhD, MPH


Why and How Epidemiologists Should Use Mixed Methods

Lauren C. Houghton

From the a Department of Epidemiology, Columbia University Mailman School of Public Health, New York, NY

b Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY.

Alejandra Paniagua-Avila

The field of epidemiology’s current focus on causal inference follows a quantitative approach and limits research questions to those that are strictly quantifiable. How can epidemiologists study biosociocultural public health problems that they cannot easily quantify? The mixed-methods approach offers a possible solution by incorporating qualitative sociocultural factors as well as the perspective and context from the population under study into quantitative studies. Following a pluralist perspective of causal inference, this article provides a guide for epidemiologists interested in applying mixed methods to their observational studies of causal identification and explanation. We begin by reviewing the current paradigms guiding quantitative, qualitative, and mixed methodologies. We then describe applications of convergent and sequential mixed-methods designs to epidemiologic concepts including confounding, mediation, effect modification, measurement, and selection bias. We provide concrete examples of how epidemiologists can use mixed methods to answer research questions of complex bio-socio-cultural health outcomes. We also include a case study of using mixed methods in an observational study design. We describe how mixed methods can enhance how epidemiologists define underlying causal structures. Our alignment of mixed-methods study designs with epidemiologic concepts addresses a major gap in current epidemiology education: how do epidemiologists systematically determine what goes into causal structures?

Health outcomes are the product of complex social and biologic factors that interact at the molecular, individual, organizational, and broader ecologic levels over time. 1 – 3 Historically, the interdisciplinary nature of epidemiology positioned epidemiologists to study health across these levels. Over time, epidemiology has become focused on causal inference, a process that consists of contrasting health outcomes among two or more groups of participants under different exposures. 4 Ideally, epidemiologists would approach causal inference using interdisciplinary methodologies; 5 however, causal inference in epidemiology follows a quantitative approach 4 and is increasingly methods driven. 6 Epidemiologists seldom overtly use qualitative approaches drawn from anthropology and other social sciences. 7 For this reason, multiple authors have argued that modern epidemiology limits research questions to those that are strictly quantifiable. 7 , 8 As Krieger and Davey-Smith state, “Causes do not cease being causes if they are challenging to study or to address.” 9 Although the call echoes for epidemiologists to study biomedical and social causes of disease, 10 it is unclear how to integrate them within one study, how to capture social constructs that are difficult to quantify such as contextual factors, and how to incorporate the population’s perspective. Mixed-methods research offers solutions.

Mixed-methods research integrates quantitative and qualitative data within a single study and is similar to how epidemiologists conceive triangulation, a concept suggested as essential in improving causal inference in epidemiology from a pluralist perspective. 9 , 11 , 12 Mixed methods can bring the population’s insight into hypothesis generation and incorporate context into causal structures. Epidemiology training programs may offer limited instruction in mixed-methods research, and epidemiologists might be unsure about how to apply these methods. This article provides a guide for epidemiologists to design mixed-methods studies, with a focus on epidemiologic concepts including confounding, selection bias, attrition, measurement error, mediation, and effect modification. We see the applications being most relevant to observational studies designed for causal identification and causal explanation. 11 First, we summarize the current paradigms guiding quantitative, qualitative, and mixed methodologies. Then, we describe specific applications of mixed methods to epidemiologic research. We include examples of existing and hypothetical studies to illustrate the alignment of epidemiologic concepts with mixed-methods study designs. A case study illustrates how to implement mixed methods in an observational study. The third part describes how to use mixed methods to define underlying causal structures. We conclude with current limitations of applying mixed methods to epidemiology.

QUANTITATIVE, QUALITATIVE AND MIXED-METHODS PARADIGMS AND DESIGN

Comparing quantitative and qualitative methods.

Qualitative and quantitative research differ in their foundational scientific paradigms. 13 , 14 The quantitative paradigm, rooted in positivism and empiricism, reduces phenomena to empirical indicators that represent the truth. 15 In contrast, the qualitative research paradigm is based on constructivism, in which reality is socially constructed and constantly changing. 14 , 15 In terms of approach, quantitative methods are primarily deductive—they move top-down from theory, to the formulation of hypothesis, and then to confirmation or rejection by individual observations. 14 , 16 Qualitative methods, in contrast, are primarily inductive—they move bottom-up from particular observations, to patterns, to the formulation of hypotheses, and then to theories. 16 Furthermore, epidemiologists usually interpret quantitative data from an etic or external perspective. 16 On the other hand, qualitative research adheres to the approach traditionally followed by anthropologists, characterized by an emic perspective that puts participants and their views at the center of research. 7 , 17 For example, in nutritional epidemiology an etic view may use methods to obtain nutrient level data, whereas an emic view may use methods to understand cultural practices around meals.

Quantitative and qualitative data collection and analysis also differ. First, quantitative methods gather numerical data, typically from close-ended and structured questionnaires, publicly available data resources, clinical records, or biologic measurements, whereas sources of qualitative information include text and images coming from documents, transcriptions, or field notes derived from in-depth interviews, focus groups, and participant observations. Second, quantitative data collection usually occurs separately and before data analysis. In contrast, qualitative data collection tends to be more iterative. 18 Qualitative researchers may refine their interview guide and analyze data as they collect it to help assess saturation, that is, the point at which no new themes emerge from additional participants. Third, quantitative data collection tends to be generated from a probabilistic sample with the goal of being generalizable, while qualitative data collection follows a purposeful sampling strategy to gain in-depth information. Whether purposeful or probabilistic, both sampling strategies capture elements of similarity and differences 19 and, in reality, observational studies often collect quantitative data less from probabilistic and more from convenience samples. Some researchers argue that opposing paradigms justify keeping quantitative and qualitative approaches separate 20 ; we and others argue, however, that they are complementary, as each method can access different aspects of a research problem that cannot be accessed with one method alone. 19 For example, mixed-methods research can help to assess generalizability by including a large and representative sample for quantitative analysis and collecting qualitative data to gauge if the local context reflects the larger one.
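The idea of saturation mentioned above can be made concrete with a small Python sketch. The stopping rule used here (a fixed number of consecutive interviews that add no new theme) is an assumption for illustration, not a rule stated in the article, and the function name, the `window` parameter, and the theme labels are all hypothetical.

```python
def saturation_point(themes_per_interview, window=3):
    """Return the 1-based index of the last interview that added a new theme,
    once `window` consecutive later interviews add nothing new.
    Return None if that stopping rule has not been met yet."""
    seen = set()
    last_new = 0
    for i, themes in enumerate(themes_per_interview, start=1):
        new_themes = set(themes) - seen
        if new_themes:
            seen |= new_themes
            last_new = i
        elif i - last_new >= window:
            return last_new
    return None


# Hypothetical coded themes from seven successive interviews.
interviews = [
    {"cost", "stigma"},
    {"stigma", "family"},
    {"cost"},
    {"transport"},
    {"family"},
    {"cost", "stigma"},
    {"transport"},
]

print(saturation_point(interviews))       # stopping rule met after interview 7
print(saturation_point(interviews[:6]))   # not yet met with only six interviews
```

In practice the judgment is made iteratively by the research team while coding; a counter like this only summarizes what the coding already shows.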

Mixed-methods Paradigm and Design

The pragmatic mixed-methods paradigm 13 prioritizes the research question over the methods used to answer it. 14 Also following pragmatic ontology, other epidemiologists argue that causal reasoning based on qualitative evidence is justified. Specifically, Bannister-Tyrrell et al. argue that Russo and Williamson 21 and Reiss’s 22 theories of causal inference “align with the empirical focus of epidemiology and allow for different types of evidence to evaluate causal claims, including evidence originating from qualitative research.” 23 Bannister-Tyrrell et al. see qualitative data specifically helping with mediation (mechanism of causal relations) and effect modification (the effects of context on outcomes). 23 We agree that mixed methods can aid in improving causal explanation, but we also see it improving causal identification. Causal identification includes identifying an association between an exposure and outcome and eliminating alternative explanations for that association through taking into account confounding and reducing sources of bias. 24 Both causal identification and explanation can be improved through implementing mixed-method design into epidemiologic studies.

Mixed-methods designs, based on Creswell’s 2018 update, 14 include the convergent, explanatory sequential, and exploratory sequential designs, differentiated by the order in which the methods are used, the stage at which the data are integrated, and the emphasis placed on each method’s data relative to the other. The convergent design places equal priority on both methods, by simultaneously collecting parallel qualitative and quantitative data and later comparing or combining them during analysis and interpretation. 14 , 25 The embedded design falls under convergent because it also collects qualitative and quantitative data simultaneously, but places more emphasis on one method, and uses the other method on a subset of the overall study. The two sequential designs aim to use one method to inform or explain the other. 14 , 25 The explanatory sequential design collects and analyzes quantitative data during a first phase and uses qualitative methods in a second phase to explain the quantitative results. 14 , 25 In contrast, the exploratory sequential design starts with a qualitative phase to explore a topic and informs a second quantitative phase. 14 , 25

Some epidemiologists already use mixed methods in epidemiology, particularly when developing surveys, 26 – 28 and other epidemiologists may incorporate aspects of mixed methods in their studies, but not formally or explicitly. For instance, epidemiologists may speak with members of the population when designing studies or include interpretations derived from observations during field work in the Discussion sections of manuscripts, yet they may not describe these qualitative details in the Method or Results sections, respectively. Some may argue that this is just what a good epidemiologist does to generate ideas or interpret data. Our rebuttal is: why must the qualitative aspects of what epidemiologists do be buried in their toolbox? We now turn to how epidemiologists can systematically apply mixed methods to the epidemiologic research process.

Applications of Mixed Methods to the Epidemiologic Research Process

The 2-by-2 table is at the core of epidemiology and mixed methods can help epidemiologists think through what belongs in that table (exposure and outcome) and what matters outside of the table when it comes to confounding, selection bias and attrition, measurement, mediation, and effect modification. We consider the first three of these concepts as causal identification (identifying potential causes and eliminating alternative explanations) and the latter two as causal explanation (explaining how and under what circumstances causes operate). 24 Figure 1 summarizes which mixed-methods study designs are best suited to strengthen each aspect of observational studies. The Table provides further details including mixed-methods examples for each epidemiologic concept. The best mixed-methods study design depends upon which aspect of the research question the epidemiologist chooses to enhance.
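For readers less familiar with the 2-by-2 table referred to above, the following Python sketch shows the standard risk ratio and odds ratio computed from its four cells. The cell counts are invented for illustration and the function name is hypothetical; nothing here comes from the studies discussed in this article.

```python
def two_by_two_measures(a, b, c, d):
    """Measures of association from a 2-by-2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "risk_ratio": risk_exposed / risk_unexposed,
        "odds_ratio": (a * d) / (b * c),
    }


# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed have the outcome.
measures = two_by_two_measures(a=30, b=70, c=10, d=90)
print(round(measures["risk_ratio"], 2), round(measures["odds_ratio"], 2))
```

The table itself says nothing about which exposure and outcome belong in it or what lies outside it, which is the gap the mixed-methods designs below are meant to address.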

Figure 1. Mixed-method study designs aligned with epidemiologic concepts. Mixed methods can help epidemiologists incorporate the emic view into many aspects of research while building causal models. The 2 × 2 table is the foundation of epidemiology and quantitative at the core. Qualitative methods can be incorporated either before, during, or after estimating the association between exposure and outcomes. The order in which the quantitative and qualitative methods are used depends on what aspect of the research question needs to be strengthened.

Association

When identifying potential causes of disease, qualitative methods allow epidemiologists to make observations of the population or other key stakeholders to generate new, grounded 29 hypotheses. An exploratory sequential design, including interviews and observations in the qualitative phase, might help epidemiologists identify potential causes in the following ways: first, participants may describe a phenomenon not found in previous literature; and second, qualitative data focusing on the cultural context may identify upstream factors, such as family- or society-level determinants of disease, to be conceptualized as new potential causes. In the Table, we provide a hypothetical example of how interviewing school employees and observing children in schools helps an epidemiologist to identify pollution from a new food factory, and a food allergen in the snack the factory makes, as potential causes of high rates of asthma in a specific school district.

Confounding

Going to the population under study and gleaning on-the-ground perspectives can help epidemiologists understand how to make the exposed and unexposed in their sample less confounded. Specifically, using an exploratory sequential design, participant observation and in-depth interviews can help epidemiologists identify respective community and individual level factors that may confound the main association. Qualitative data may also reveal social processes that connect variables to each other to help identify confounding. In the Table, we provide a hypothetical example to understand if social support, sex education, and body positivity can help determine if maternal history of menstrual pain is a confounder in the association between a diagnosis of fibroids and substance abuse.

Selection Bias into a Study

Qualitative methods can identify what factors to compare between the study population and study sample to assess selection bias that occurs during recruitment of participants into a study. Either exploratory sequential or convergent designs may be used. An example of the latter comes from Gallaway et al. who conducted a mixed-method case-control study of risk and protective factors of suicide in soldiers. 30 They used medical records, surveys, interviews, and focus groups and found similar demographic and military characteristics between soldiers who died by suicide versus accidental death. We see an opportunity to expand upon their convergent design to further assess selection bias by interviewing soldiers about reasons for substance use, a major risk factor for suicide. The study team could compare the distribution of baseline factors across subgroups defined by the reasons for substance use, and then compare the subgroup distributions with that of the overall sample. If the distributions are similar, this would suggest little to no selection bias based on substance use. If one subgroup was more similar to the overall distribution than the others, this would point to selection bias and could help determine to which subgroup of soldiers the results may be most generalizable.
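The subgroup-versus-overall comparison proposed above could be tabulated along the following lines. This is a hypothetical Python sketch, not the analysis used by Gallaway et al.: the records, the field names (`reason`, `rank`), and the use of a simple maximum absolute difference in proportions are all assumptions made for illustration, and the article does not specify a threshold for calling two distributions "similar".

```python
from collections import Counter


def proportions(values):
    """Share of each category level among the given values."""
    counts = Counter(values)
    total = len(values)
    return {level: n / total for level, n in counts.items()}


def max_gap_from_overall(records, subgroup_key, factor_key):
    """For each subgroup, the largest absolute difference between its
    distribution of a baseline factor and the overall sample's distribution."""
    overall = proportions([r[factor_key] for r in records])
    gaps = {}
    for subgroup in sorted({r[subgroup_key] for r in records}):
        sub = proportions(
            [r[factor_key] for r in records if r[subgroup_key] == subgroup]
        )
        gaps[subgroup] = max(
            abs(sub.get(level, 0.0) - share) for level, share in overall.items()
        )
    return gaps


# Hypothetical sample: interview-derived reason for substance use and one baseline factor.
records = (
    [{"reason": "coping", "rank": "junior"}] * 8
    + [{"reason": "coping", "rank": "senior"}] * 2
    + [{"reason": "social", "rank": "junior"}] * 4
    + [{"reason": "social", "rank": "senior"}] * 6
)

gaps = max_gap_from_overall(records, "reason", "rank")
print(gaps)
```

Small gaps for every subgroup would be read as little evidence of selection on that factor; a subgroup that tracks the overall sample much more closely than the others would suggest the sample mostly reflects that subgroup.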

Attrition/Loss to Follow-up

Qualitative methods can help assess attrition, another potential source of selection bias, both during a study and after its completion. Epidemiologists can collaborate with ethnographers in convergent or embedded designs to understand the dynamics of a quantitative research study including the study rationale, design, recruitment, retention and role of study personnel, participants, and advocates. As we explain in hypothetical convergent and explanatory sequential examples in Table 1, an ethnographer finds a conflict between the university and a local politician over gentrification that influences retention in a study. This information can help an epidemiologist determine whether loss to follow-up is a source of bias.

Combining Core Epidemiologic Concepts and Mixed Methods

Measurement

Minimizing measurement error of key variables is essential in epidemiology. 31 Epidemiologists might employ qualitative methods first to decide how or what variables to measure quantitatively or they may use both quantitative and qualitative methods concurrently to measure variables. 16 Qualitative methods can assist epidemiologists to identify language and colloquialisms to measure a variable or ways to phrase questions about potentially sensitive topics. When a previously validated instrument needs to be translated and validated in a different context, qualitative methods ground the necessary changes in an emic perspective. 16 For example, when one of us (LCH) was working with an interdisciplinary team to reconstruct the Native American diet consumed in the 1940s in New Mexico, some collaborators did not see it necessary to ask about dairy intake because of the documented high prevalence of lactose intolerance in Native Americans. However, when we interviewed Native American elders about dairy intake as children, they recalled their family members eating dairy and one participant remembered his family making cheese. Sequential exploratory designs are most suited for when epidemiologists want to use qualitative methods to design survey instruments. For example, Barg et al. 27 explored the meaning of depression in older adults and came to learn that loneliness was a major part of that definition before creating a survey that incorporated loneliness into the depression measure (Table). If epidemiologists want to measure the same variable using both quantitative and qualitative methods, a convergent design is appropriate (Table).

Mediation and Effect Modification

Mixed methods can assist epidemiologists to understand how an association works (mediation) from the perspective of the population under study, or to capture context (group-level, cultural, or social factors) for identifying effect modifiers. Causation cannot be completely isolated from context, as one factor might be a causal factor of an outcome in one environment but not in another one, depending on the distribution of causal partners in each setting. 32 While quantitative methods can describe the quantitative distribution of causal partners, qualitative methods can inform how and why the relationship between exposure and outcome differs between contexts. Either sequential or convergent designs may be useful in explaining causal mechanisms (Table). An example of using mixed methods to identify an effect modifier comes from Erin Kobetz’s work which found that twalet deba, a culturally mediated feminine hygiene practice among many Haitian women, may explain high rates of cervical cancer in Little Haiti in Miami. 33 , 34 Although this research was not a single mixed-methods study, the authors used qualitative results to inform the quantitative test of whether intravaginal agents increased susceptibility to cervical cancer.

We have discussed applications of mixed methods to enhance hypothetical and current epidemiologic studies by aligning mixed-methods study designs with epidemiologic concepts. Although it is common to use the term “mixed methods” when referring to studies using at least one quantitative and one qualitative method, the purpose of mixed methods is to integrate multiple methods during interpretation. There are many examples of mixed-methods studies that use qualitative data to develop an epidemiologic survey 26 and collect qualitative data to understand perspectives of disease outcomes. 35 There are fewer examples of epidemiologic studies that also integrate results during the analysis phase. 36 , 37 We now describe a case study that exemplifies mixed-methods integration in observational epidemiology.

Case Study of an Epidemiologic, Observational Study using Mixed-methods

To better understand what early life factors explain rising breast cancer incidence rates among migrants who move from low- to high-incidence countries, we conducted a mixed-methods migrant study on puberty. 38 – 40 Earlier age at puberty is associated with increased breast cancer risk, 41 so we compared pubertal timing within the context of migration. 40 To align with literature on puberty and breast cancer, we measured puberty following biomedical definitions and used established epidemiologic methods (validated questionnaires and hormonal biomarkers). 42 , 43 At the same time, given the inclusion of different cultural groups in our sample (White British girls and British–Bangladeshi migrants in London, UK, and Bangladeshi girls in Sylhet, Bangladesh) we used qualitative methods to understand the context in which girls were growing up. From the literature we knew that body mass index (BMI) was a potential mediator, but we were interested in identifying other mediators from an emic perspective. Therefore, our research question necessitated mixed methods.

We followed a convergent design to assess biocultural constructs related to both migration (exposure, X) and puberty timing (outcome, Y). Figure 2 illustrates the causal diagram and uses color to indicate the quantitative and qualitative methods to measure each variable. Quantitative data collection involved measuring puberty (Y) through a hormonal biomarker and the Pubertal Development Scale. 42 A structured questionnaire assessed aspects of migration (X), such as preference for clothes and food. 40 To calculate BMI (mediator, M) we took anthropometric measurements. The qualitative data collection occurred during afterschool clubs, and included participant observation and focus groups to gather girls’ perspectives of social expressions of puberty (Y), such as choice of clothes and wearing the hijab, and food preferences, specifically eating rice and curry, which was both a marker of migration (X) and related to BMI (M). 40 We collected qualitative and quantitative data in parallel and placed equal emphasis on the qualitative and the quantitative components.

Figure 2. Hypothesized DAG with associated qualitative (italics) and quantitative (Roman) methods in mixed-methods Convergent Study, Adolescence among Bangladeshi and British Youth.

Quantitative analysis included survival models to compare age at puberty among White British, migrant and Bangladeshi girls, as well as mediation by BMI. 38 Analysis of qualitative data included open coding and grounded theory to analyze field notes and focus group discussions related to hijab and food. We used joint display, 40 an approach used to present qualitative and quantitative results simultaneously. 44 In a table, the first column displayed the quantitative results for each study variable through bar charts, the second and third columns presented corresponding quotes from Bangladeshi and migrant participants, respectively. The joint display highlighted where the biologic and cultural definitions of each variable converged or diverged. For example, girls reported eating rice and curry for dinner in 24-hour food recalls, but in the same day said to their friends, “I don’t eat rice,” which was a way to express rejection of Bangladeshi culture.

The quantitative results confirmed that migrant girls experienced puberty earlier than nonmigrant girls. 38 BMI partially explained the association between migrant group (X) and puberty timing (Y). Qualitative data suggested first-generation migrant girls, the group with earliest pubertal age, experienced discrimination and stress. 40 Our use of mixed methods allowed for the integration of data in a way we had not initially planned. Early on during field work, we noticed that some girls did not wear hijab every day. We were perplexed as we thought this was a rather fixed cultural practice. However, girls explained, “I’m only practicing, I’m not yet dedicated to the scarf.” We revised our survey to ask girls if they wore the scarf occasionally or every day and used this dichotomous variable as an additional pubertal outcome in survival models. We compared the median age at pubertal onset between our biologic and cultural definitions and found that “practicing” aligned with the hormonal rise in androgens around age 5 (adrenarche) and “being dedicated” aligned with the age at menarche in migrant Bangladeshi girls. 40 This integrated analysis illustrated the relationship between social and biologic markers of puberty, which was a contribution beyond previous studies that investigated social and biologic factors of puberty separately.

We have illustrated the alignment of mixed-methods design with epidemiologic concepts through examples and a case study. Now, we will turn to an application that cuts across the epidemiologic concepts, which entails using mixed methods to define causal structures.

MIXED METHODS TO DEFINE CAUSAL STRUCTURES

Defining the underlying causal structure of a phenomenon in epidemiology entails identifying causes of health outcomes and describing how and for whom the associations between causes (exposures, X) and health outcomes (Y) work. 31 Causal diagrams including but not limited to directed acyclic graphs (DAGs) are one way of illustrating the underlying causal structure. However, epidemiologists predominantly build DAGs using their etic perspective, external to the population under study. Combining the etic with the emic—insider perspective of the context within which the phenomenon occurs—provides a new approach to building DAGs.

A challenge when constructing a useful and meaningful DAG is understanding when two nodes are related, the direction of the arrow, and whether a covariate might be a confounder, mediator, collider, or irrelevant variable. Often there is a lack of theory and sufficient empirical data to be certain of these structures. Furthermore, a DAG cannot provide insight into what variables may be missing or whether a variable is conceptualized appropriately. 9 Qualitative data can provide additional empirical data defining the underlying structure of causal relationships. During qualitative data analysis, mapping options in qualitative coding software, such as NVivo, 45 help to identify important nodes and the meaningful connections between them, in a similar way as building a DAG. In NVivo, nodes are qualitative parent and child codes that researchers generate either deductively (the researcher searches for text relating to a preconceived code) or inductively (the code emerges from textual data). Qualitative methods offer a DAG the meaning of variables and connections between them from an emic perspective. Figure 3 shows a sequential exploratory study that collects qualitative data from women with early-onset breast cancer to build a causal diagram to test with quantitative methods. Qualitative analysis identifies parent codes (Air pollution, Stress, Marital Status) as possible causes of cancer. In telling their story of getting early-onset breast cancer, women said “I found a lump while on honeymoon” or “I thought it was related to breastfeeding” and such qualitative data yield two child codes, Parity and Breastfeeding, under Marital Status (Figure 3A). These five codes become variables in a DAG (Figure 3B) and qualitative data, as well as evidence from previous studies, inform the connections between them. The epidemiologist can test the idea that breastfeeding is positively associated with early-onset cancer, an idea that they may not have had before interviewing women since breastfeeding is negatively associated with postmenopausal breast cancer.
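As a rough illustration of how the five codes above might be carried from a coding tree into a DAG, the Python sketch below stores each variable with its assumed direct causes and checks that the result is acyclic. The arrows are hypothetical: the text names the variables and the parent-child grouping of the codes, but the specific edges, the outcome node label, and the helper names here are assumptions for illustration only. The acyclicity check uses the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter

# node -> set of its direct causes. These arrows are assumed for illustration;
# the article does not list the edges of its Figure 3B.
parents = {
    "Early-onset breast cancer": {"Air pollution", "Stress", "Parity", "Breastfeeding"},
    "Parity": {"Marital Status"},
    "Breastfeeding": {"Parity"},
    "Stress": {"Marital Status"},
    "Air pollution": set(),
    "Marital Status": set(),
}


def is_acyclic(parent_map):
    """A causal diagram is a valid DAG only if it has no directed cycle."""
    try:
        tuple(TopologicalSorter(parent_map).static_order())
        return True
    except CycleError:
        return False


def ancestors(parent_map, node):
    """All variables with a directed path into `node`."""
    found = set()
    stack = list(parent_map.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parent_map.get(current, ()))
    return found


print(is_acyclic(parents))
print(sorted(ancestors(parents, "Breastfeeding")))
```

Writing the diagram down this way makes each assumed arrow explicit, so that qualitative findings or prior studies can be used to argue for adding, removing, or reversing a specific edge.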

Figure 3. Using qualitative analysis in NVivo (A) to inform DAGs (B) within a sequential exploratory mixed-method study design.

We recognize that triangulation in epidemiology often implies comparing results across more than one study. Returning to the Adolescence among Bangladeshi and British Youth case study, Figure 4 illustrates how using mixed methods within a single study can define underlying causal structures for future studies. Qualitative information on discrimination and stress, such as “I’m not a Freshi” and “I’m proud of my religion but not my culture,” helped inform questions as to why puberty was particularly early in first-generation migrants. BMI and stress are established risk factors for early puberty, but seldom analyzed as causal partners, thus mixed methods led us to a new DAG that includes a hormonal mechanism for the interaction between stress and BMI (Figure 4).

Figure 4. Updated DAG informed by qualitative (italics) and quantitative (Roman) results from a mixed-methods convergent study, Adolescence among Bangladeshi and British Youth.

LIMITATIONS OF INTEGRATING MIXED-METHODS AND EPIDEMIOLOGY

We recognize limitations of applying mixed methods in epidemiology at the present time. With no formal training in mixed methods, current epidemiology teams may lack expertise and will need new collaborations with qualitative researchers. Yet lack of training does not preclude epidemiologists from designing mixed-methods studies. We envision epidemiologists who can design their own mixed-methods epidemiologic studies and then collaborate with experienced qualitative researchers to conduct the research. Mixed-methods studies may require more time and resources than studies only using quantitative methods and securing funding for epidemiology studies using mixed methods may be difficult. However, the Office of Behavioral and Social Sciences Research at the National Institutes of Health commissioned the “Best Practices for Mixed Methods Research in the Health Sciences” to assist investigators, reviewers and NIH leadership. 46 Last, despite carefully planned designs, there may be situations where data cannot be easily integrated or provide opposing conclusions. We have had this experience but found that divergent results lead to new hypotheses.

Krieger stated that an “intellectual and empirical challenge is to integrate biomedical, lifestyle and social risk factors to afford a richer understanding of the causal processes at play and hence better inform efforts to improve population health and reduce health inequities.” 47 We argue that mixed methods allows for the integration of bio-socio-cultural factors in epidemiologic studies. We align mixed-methods study designs with epidemiologic concepts so that epidemiologists can enhance observational studies. We describe how mixed methods can define the underlying causal structure of a phenomenon. Our how-to guide overcomes a major critique of efforts to improve causal inference that epidemiology textbooks currently do not include. 47 Mixed methods is a systematic approach to determining what goes into our causal structures. We have described how to systematically incorporate the perspective and context of the population under study, previously hidden in the causal inference toolbox, and how to integrate the social and biological factors of health and diseases within single epidemiologic studies.

ACKNOWLEDGMENTS

We would like to thank Dr. Sharon Schwartz for her helpful feedback on earlier drafts of this article and Hanfei Qi for developing related web content that helped reorganize the current article.

L.H. conceived the idea for the article; developed the ideas for and prepared the sections “Applications of mixed methods to epidemiologic research process,” “Example of an epidemiologic, observational study using mixed methods,” “Conclusion”; and critically revised all sections of the article. A.P.-A. contributed substantially to the conceptualization of the article; prepared the sections “Introduction,” “Quantitative, qualitative, mixed methods,” contributed to the “Applications of mixed methods to epidemiologic research process” and “Example of epidemiologic, observational study using mixed methods” sections; conducted the literature review; and critically reviewed the final version.

L.C.H. was supported to conduct this work through National Cancer Institute K07CA218166.

The authors report no conflicts of interest.

There are no data included in this article.
