PsyBlog

Social Psychology Experiments: 10 Of The Most Famous Studies

Ten of the most influential social psychology experiments explain why we sometimes do dumb or irrational things. 

“I have been primarily interested in how and why ordinary people do unusual things, things that seem alien to their natures. Why do good people sometimes act evil? Why do smart people sometimes do dumb or irrational things?” –Philip Zimbardo

Like famous social psychologist Professor Philip Zimbardo (author of The Lucifer Effect: Understanding How Good People Turn Evil), I’m also obsessed with why we do dumb or irrational things.

The answer quite often is because of other people — something social psychologists have comprehensively shown.

Each of the 10 brilliant social psychology experiments below tells a unique, insightful story relevant to all our lives, every day.

Click the link in each social psychology experiment to get the full description and explanation of each phenomenon.

1. Social Psychology Experiments: The Halo Effect

The halo effect is a finding from a famous social psychology experiment.

It is the idea that global evaluations about a person (e.g. she is likeable) bleed over into judgements about their specific traits (e.g. she is intelligent).

It is sometimes called the “what is beautiful is good” principle, or the “physical attractiveness stereotype”.

It is called the halo effect because a halo was often used in religious art to show that a person is good.

2. Cognitive Dissonance

Cognitive dissonance is the mental discomfort people feel when trying to hold two conflicting beliefs in their mind.

People resolve this discomfort by changing their thoughts to align with one of the conflicting beliefs and rejecting the other.

The classic study behind it provides a central insight into the stories we tell ourselves about why we think and behave the way we do.

3. Robbers Cave Experiment: How Group Conflicts Develop

The Robbers Cave experiment was a famous social psychology experiment on how prejudice and conflict emerged between two groups of boys.

It shows how groups naturally develop their own cultures, status structures and boundaries — and then come into conflict with each other.

For example, each country has its own culture, government and legal system, and each draws boundaries to differentiate itself from neighbouring countries.

One of the reasons the experiment became so famous is that it appeared to show how groups could be reconciled and how peace could flourish.

The key was the focus on superordinate goals, those stretching beyond the boundaries of the group itself.

4. Social Psychology Experiments: The Stanford Prison Experiment

The Stanford prison experiment was run to find out how people would react to being made a prisoner or prison guard.

The psychologist Philip Zimbardo, who led the Stanford prison experiment, thought that ordinary, healthy people would come to behave cruelly, like prison guards, if they were put in that situation, even if doing so was against their personalities.

It has since become a classic social psychology experiment, studied by generations of students, and has recently come under a lot of criticism.

5. The Milgram Social Psychology Experiment

The Milgram experiment, led by the well-known psychologist Stanley Milgram in the 1960s, aimed to test people’s obedience to authority.

The results of Milgram’s social psychology experiment, sometimes known as the Milgram obedience study, continue to be both thought-provoking and controversial.

The Milgram experiment discovered people are much more obedient than you might imagine.

Fully 63 percent of the participants continued administering what appeared to be electric shocks to another person while that person screamed in agony, begged them to stop and eventually fell silent — just because they were told to.

6. The False Consensus Effect

The false consensus effect is a famous social psychological finding that people tend to assume that others agree with them.

It could apply to opinions, values, beliefs or behaviours, but people assume others think and act in the same way as they do.

It is hard for many people to believe the false consensus effect exists because they quite naturally believe they are good ‘intuitive psychologists’, thinking it is relatively easy to predict other people’s attitudes and behaviours.

In reality, people show a number of predictable biases, such as the false consensus effect, when estimating other people’s behaviour and its causes.

7. Social Psychology Experiments: Social Identity Theory

Social identity theory helps to explain why people’s behaviour in groups is fascinating and sometimes disturbing.

People gain part of their self from the groups they belong to and that is at the heart of social identity theory.

The famous theory explains why, as soon as humans are bunched together in groups, we start to do odd things: copy other members of our group, favour members of our own group over others, look for a leader to worship and fight other groups.

8. Negotiation: 2 Psychological Strategies That Matter Most

Negotiation is one of those activities we often engage in without quite realising it.

Negotiation doesn’t just happen in the boardroom, when we ask our boss for a raise or down at the market; it happens every time we want to reach an agreement with someone.

In a classic, award-winning series of social psychology experiments, Morton Deutsch and Robert Krauss investigated two central factors in negotiation: how we communicate with each other and how we use threats.

9. Bystander Effect And The Diffusion Of Responsibility

The bystander effect in social psychology is the surprising finding that the mere presence of other people inhibits our own helping behaviours in an emergency.

The bystander effect social psychology experiments are mentioned in every psychology textbook and often dubbed ‘seminal’.

This famous social psychology experiment on the bystander effect was inspired by the highly publicised murder of Kitty Genovese in 1964.

It found that in some circumstances, the presence of others inhibits people’s helping behaviours — partly because of a phenomenon called diffusion of responsibility.

10. Asch Conformity Experiment: The Power Of Social Pressure

The Asch conformity experiments — some of the most famous ever done — were a series of social psychology experiments carried out by noted psychologist Solomon Asch.

The Asch conformity experiment reveals how strongly a person’s opinions are affected by people around them.

In fact, the Asch conformity experiment shows that many of us will deny our own senses just to conform with others.

Author: Jeremy Dean

Psychologist, Jeremy Dean, PhD is the founder and author of PsyBlog. He holds a doctorate in psychology from University College London and two other advanced degrees in psychology. He has been writing about scientific research on PsyBlog since 2004. He is also the author of the book "Making Habits, Breaking Habits" (Da Capo, 2013) and several ebooks. View all posts by Jeremy Dean

Purdue Online Writing Lab (Purdue OWL)

Writing the Experimental Report: Overview, Introductions, and Literature Reviews

Experimental reports (also known as "lab reports") are reports of empirical research conducted by their authors. You should think of an experimental report as a "story" of your research in which you lead your readers through your experiment. As you are telling this story, you are crafting an argument about both the validity and reliability of your research, what your results mean, and how they fit into other previous work.

These next two sections provide an overview of the experimental report in APA format. Always check with your instructor, advisor, or journal editor for specific formatting guidelines.

General-specific-general format

Experimental reports follow a general to specific to general pattern. Your report will start off broadly in your introduction and discussion of the literature; the report narrows as it leads up to your specific hypotheses, methods, and results. Your discussion transitions from talking about your specific results to more general ramifications, future work, and trends relating to your research.

Experimental reports in APA format have a title page. Title page formatting is as follows:

  • A running head and page number in the upper right corner (right aligned)
  • A definition of the running head IN ALL CAPS below the running head (left aligned)
  • Vertically and horizontally centered paper title, followed by author and affiliation

Please see our sample APA title page.

Crafting your story

Before you begin to write, carefully consider your purpose in writing: what is it that you discovered, would like to share, or would like to argue? You can see report writing as crafting a story about your research and your findings. Consider the following.

  • What is the story you would like to tell?
  • What literature best speaks to that story?
  • How do your results tell the story?
  • How can you discuss the story in broad terms?

During each section of your paper, you should be focusing on your story. Consider how each sentence, each paragraph, and each section contributes to your overall purpose in writing. Here is a description of one student's process.

Briel is writing an experimental report on her results from her experimental psychology lab class. She was interested in looking at the role gender plays in persuading individuals to take financial risks. After her data analysis, she finds that men are more easily persuaded by women to take financial risks and that men are generally willing to take more financial risks.

When Briel begins to write, she focuses her introduction on financial risk taking and gender, focusing on male behaviors. She then presents relevant literature on financial risk taking and gender that help illuminate her own study, but also help demonstrate the need for her own work. Her introduction ends with a study overview that directly leads from the literature review. Because she has already broadly introduced her study through her introduction and literature review, her readers can anticipate where she is going when she gets to her study overview. Her methods and results continue that story. Finally, her discussion concludes that story, discussing her findings, implications of her work, and the need for more research in the area of gender and financial risk taking.

Abstract

The abstract gives a concise summary of the contents of the report.

  • Abstracts should be brief (about 100 words)
  • Abstracts should be self-contained and provide a complete picture of what the study is about
  • Abstracts should be organized just like your experimental report—introduction, literature review, methods, results and discussion
  • Abstracts should be written last during your drafting stage

Introduction

The introduction in an experimental article should follow a general to specific pattern, where you first introduce the problem generally and then provide a short overview of your own study. The introduction includes three parts: opening statements, literature review, and study overview.

Opening statements: Define the problem broadly in plain English and then lead into the literature review (this is the "general" part of the introduction). Your opening statements should already be setting the stage for the story you are going to tell.

Literature review: Discusses literature (previous studies) relevant to your current study in a concise manner. Keep your story in mind as you organize your lit review and as you choose what literature to include. The following are tips when writing your literature review.

  • You should discuss studies that are directly related to your problem at hand and that logically lead to your own hypotheses.
  • You do not need to provide a complete historical overview nor provide literature that is peripheral to your own study.
  • Studies should be presented based on themes or concepts relevant to your research, not in a chronological format.
  • You should also consider what gap in the literature your own research fills. What hasn't been examined? What does your work do that others have not?

Study overview: The literature review should lead directly into the last section of the introduction—your study overview. Your short overview should provide your hypotheses and briefly describe your method. The study overview functions as a transition to your methods section.

You should always give good, descriptive names to your hypotheses that you use consistently throughout your study. When you number hypotheses, readers must go back to your introduction to find them, which makes your piece more difficult to read. Using descriptive names reminds readers what your hypotheses were and allows for better overall flow.

In our example above, Briel had three different hypotheses based on previous literature. Her first hypothesis, the "masculine risk-taking hypothesis," was that men would be more willing to take financial risks overall. She clearly named her hypothesis in the study overview, and then referred back to it in her results and discussion sections.

Thais and Sanford (2000) recommend the following organization for introductions.

  • Provide an introduction to your topic
  • Provide a very concise overview of the literature
  • State your hypotheses and how they connect to the literature
  • Provide an overview of the methods for investigation used in your research

Bem (2006) provides the following rules of thumb for writing introductions.

  • Write in plain English
  • Take the time and space to introduce readers to your problem step-by-step; do not plunge them into the middle of the problem without an introduction
  • Use examples to illustrate difficult or unfamiliar theories or concepts. The more complicated the concept or theory, the more important it is to have clear examples
  • Open with a discussion about people and their behavior, not about psychologists and their research

Social Experiments Research Paper

An experiment is a deliberately planned process to collect data that enable causal inferences and legitimize treatment comparisons. Random assignments of subjects to treatments (including control groups) allow statistically significant differences among treatment outcomes to be attributed solely to the treatment differences. Without strong additional assumptions, observational studies and quasi-experiments cannot justify drawing causal inferences, because only experiments, by definition, assign the treatments randomly.
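
To make the logic concrete, here is a minimal, hypothetical sketch (not drawn from any of the experiments discussed in this paper) of random assignment and a simple treatment-control comparison; the subject count, the outcome scale, and the assumed 50-hour effect are all invented for illustration.

```python
# Hypothetical illustration of random assignment and a treatment-control
# comparison; numbers are invented, not taken from any real experiment.
import random
import statistics

random.seed(0)

subjects = list(range(200))            # hypothetical subject IDs
random.shuffle(subjects)               # random assignment, independent of subject characteristics
treatment, control = subjects[:100], subjects[100:]

def simulated_hours(treated):
    # Hypothetical outcome: annual hours worked, with an assumed true effect of -50 hours.
    return 1800 + random.gauss(0, 100) - (50 if treated else 0)

treated_outcomes = [simulated_hours(True) for _ in treatment]
control_outcomes = [simulated_hours(False) for _ in control]

# Because assignment was random, this difference in means estimates the causal effect.
effect = statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)
print(f"Estimated treatment effect: {effect:.1f} hours")
```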

Public welfare relies on experiments for many purposes, including to approve medicines, medical procedures, new foods, and product reliability. Large-scale social experiments differ from most of these experiments, even from many that involve human subjects, in that the treatments in social experiments are social programs, the subjects are humans, and the responses measure outcomes for these people (or families) as they make decisions under different possible programs.

Large-scale public policy experiments in the United States generally are considered to have started with the New Jersey Negative Income Tax Experiment of 1968. Orr (1999) puts the number of social experiments (not all large scale) launched per decade in the US at six in the 1960s, at 49 and 84 in the next two decades, and then at 56 for 1991–95.

These experiments were fielded to gather scientific information so policy makers could know as accurately as possible the costs and benefits of possible social programs. Most social experiments have been initiated and funded by agencies (national, state, or at other political levels) concerned with the welfare of large populations. While these large experiments were costly and time consuming, they found sponsors because their costs were dwarfed by the costs of actual programs.

1. Some Examples

Much has been written about social experiments. Greenberg and Shroder (1997) summarize the duration, cost, treatments, measured outcomes, sample sizes, design, funding sources, developers, evaluators, and results of over 100 of them. Orr (1999), Fienberg et al. (1985), and others in the references here, review some of the main studies and list further sources. This research paper describes several of the better-known public policy social experiments, each of which monitored human subjects for several years as they made personal or family economic decisions while eligible for benefits provided by an experimental policy to which they were randomly assigned.

1.1 The New Jersey Negative Income Tax Experiment (NJ–NIT)

The NJ–NIT experiment (Fienberg et al. 1985, Greenberg and Shroder 1997, Orr 1999, Hausman and Wise 1985) was inspired by the thesis of Ross (1966) while she was a PhD student in economics at MIT. Ross argued that only a social experiment could resolve disputes over how work patterns of the poor might change if the government were to adopt a Negative Income Tax (NIT) program to supplement their incomes. The Office of Economic Opportunity in the Department of Health, Education, and Welfare (HEW) funded the initial Income Maintenance Experiment (IME) to begin in 1968. Mathematica Policy Research, commissioned to design and field it, randomized a total of 725 experimental households into nine treatment groups, and another 632 households into a control group, with welfare and near welfare households in four New Jersey cities being enrolled for three years each. The nine experimental treatments in this response surface design involved two main variables, one providing a range of income guarantees, and the other providing a range of tax rate reductions to families as incentives to earn additional income. The NJ–NIT objectives were to estimate, mainly via regression analyses, how work hours, rates of high school graduation, and other outcomes depended on these factors.
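
The following sketch illustrates, with entirely made-up treatment levels and coefficients (the actual NJ–NIT design points and data are not reproduced here), the kind of regression analysis described above: hours worked regressed on the two design factors, the income guarantee and the tax rate.

```python
# Hypothetical response-surface regression: hours worked on guarantee and tax rate.
# Treatment levels and coefficients are invented for illustration only.
import numpy as np

rng = np.random.default_rng(1)
n = 500
guarantee = rng.choice([0.5, 0.75, 1.0], size=n)   # assumed guarantee levels (fraction of the poverty line)
tax_rate = rng.choice([0.3, 0.5, 0.7], size=n)     # assumed NIT tax rates

# Assumed data-generating process: higher guarantees and tax rates reduce hours worked.
hours = 1600 - 120 * guarantee - 200 * tax_rate + rng.normal(0, 80, size=n)

# Ordinary least squares on the two design factors.
X = np.column_stack([np.ones(n), guarantee, tax_rate])
coef, *_ = np.linalg.lstsq(X, hours, rcond=None)
print(dict(zip(["intercept", "guarantee", "tax_rate"], coef.round(1))))
```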

Difficulties encountered in the NJ–IME included significant nonstationary participant behavior, at the outset caused by participants needing time to become familiar with their program incentives, and toward the end caused by some gaming behaviors. The NIT design deliberately correlated the random treatment assignments with family incomes in order to reduce experimental costs. That strategy complicated the analyses and risked compromising the experiment’s randomization. Other design problems included differences between treatment and control participants in their initial income reporting, and the use of total income as an eligibility criterion at enrollment, with a resulting underrepresentation of working wives.

1.2 The Health Insurance Experiment (HIE)

The need for a health insurance experiment was recognized during the Nixon administration when for the first time the government was considering possible forms of a national health insurance program. Congressional leaders held widely divergent views about possible costs of a national program, and especially about the elasticity of demand for health care. If demand were completely inelastic, previously insured individuals would continue to purchase medical services at the same rate. That would make government sponsorship more affordable. While no experiment could provide all needed information, policymakers funded the HIE because it would narrow the range of disagreements, and help avoid disastrous errors in a national program.

The HIE was fielded in 1974 as the income maintenance experiments wound down. The Office of Economic Opportunity, and later HEW, asked the Rand Corporation to design and conduct the HIE, ultimately detailed in ‘Free for All’ (Newhouse 1993) by its principal investigator.

The HIE was designed to assess the demand for healthcare and many other questions, including whether health benefits might derive from a national health insurance program. Nearly 3,000 families, each for three to five years, were randomized to one of 14 insurance plans, including an HMO group. Insurance plans had two main dimensions, with varying coinsurance rates that families faced, and with varying deductible limits (after reaching the limits families were exempted from any further expenses for the years remaining). Anticipating that the world might differ considerably after the results of an eight year experiment became available, the HIE treatments included no particular proposal, but instead were chosen to provide a range of rates and benefits that could be used reliably to extrapolate HIE results to future legislative proposals.

Demand in the HIE was ultimately found to be elastic. Perhaps primary among HIE findings have been the elasticity estimates that legislators still use when comparing the costs of new health insurance proposals (Newhouse 1993, Orr 1999).

1.3 JOBSTART, A Demonstration

The rate of new social experiments continues to increase as the technology for doing them matures, and as new research groups and firms develop the skills to design, conduct, and analyze them. The newer experiments tend to be smaller and more focused than the first ones, many having been designed to demonstrate the effectiveness of an existing program, and to identify areas for improvement. These ‘demonstration’ experiments are simpler to field than those, like the IME and the HIE, that are designed to provide information on a wide range of unspecified programs.

Perhaps politicians are partial to demonstrations because of their lower costs and because they focus on concrete legislation. Demonstration experiments cost less because some of the participant funding comes through an existing program’s benefits, and because the focus is exclusively on one treatment and a control group. A complication is that control group subjects, if matched to treatment subjects, will be eligible for the same program benefits and so must be allowed to access them. One way to counter this is to make the treatment an encouragement for eligible subjects to take advantage of an underused program.

JOBSTART was a demonstration experiment to evaluate the Job Training Partnership Act (JTPA) of 1982 (Greenberg and Shroder 1997, Cave et al. 1993). Sponsored jointly by the US Department of Labor and by several private foundations, JOBSTART was designed and evaluated by the Manpower Demonstration Research Corporation (MDRC) between 1985 and 1992, and administered by the Local Service Delivery Areas in 13 geographically disparate US sites. Subjects were randomized into treatment and control groups, with about 1,150 subjects in each. All subjects were eligible for JTPA (a national program), being economically disadvantaged high school dropouts aged between 17 and 21 years, and all were poor readers. Treatment interventions included two summers of vocational and educational training, support services, job placement assistance, and incentive payments. Subjects were monitored for four years.

JOBSTART found that the treatment group had a dramatically higher rate of high school completion. Women who had no children before the demonstration, but who gave birth afterward, were less likely to receive AFDC payments. The opposite result was found for women who started with children. No major long-term differences were found for employment, earnings, or welfare measures.

1.4 The Housing Assistance Supply Experiment (HASE)

Economic theory predicts an elastic supply response to increased demand, so that benefits received by individuals do not necessarily rise in proportion to the dollars provided. Even so, almost all large-scale social experiments have measured demand only, mainly because a supply response is extremely difficult to produce in an experiment. HASE is a notable exception (Lowry 1983). While Housing Allowance Demand Experiments had been fielded to evaluate new housing policies (Bradbury and Downs 1981), evaluating the supply response to alternative housing programs requires a long-term saturation experiment in which all eligible members of an entire community participate in an experimental program. A supply experiment takes longer because the supply side needs adequate time to respond to new demand, generally much longer than individuals need to respond to new opportunities.

The Department of Housing and Urban Development sponsored HASE to learn how the supply side would respond if individuals and families received increased government assistance for making home improvements. HASE ran from 1972 to 1982 in Green Bay, Wisconsin and in South Bend, Indiana, selected as two small, stable communities in which a meaningful supply response might be stimulated. Long-term benefits were guaranteed to all eligible families for 15 years, well beyond the length of the experiment, to give the construction community sufficient incentive to relocate workers and businesses into these regions.

Supply experiments are extremely ambitious, but the questions they address are crucial.

2. Alternatives To Social Experiments

Less expensive alternatives must be considered before undertaking a social experiment. These include expert (and nonexpert) opinions, theoretical models, surveys, observational studies, quasi-experiments, and, if they exist, appropriate ‘natural experiments.’ Because these alternatives usually are cheaper, faster, and easier than an experiment, they will be preferred whenever they can provide valid predictions. Even if they cannot produce valid predictions, their consideration is vital to designing an efficient social experiment.

These alternatives all have drawbacks. Expert opinions and theoretical models are no better than the experts or models. While surveys can sample randomly from a target population, they cannot assign treatments randomly, and surveyed subjects rarely would know how they would respond to a hypothetical future program. Nonexperimental data, quasi-experiments, and observational studies (the latter two being synonyms for nonrandomized studies) might be used to extrapolate from data on existing programs to predict outcomes of untried programs. Such predictions, however, are especially unreliable for major program changes, and self-selection in such data may bias results. Natural experiments might occur if a comparable country or region adopts a policy similar to one under consideration, but they usually do not exist, and if one does occur, results still must be extrapolated to another country or population.

As none of these alternatives has randomly assigned treatments, selection bias will be present and causal inferences will be suspect. Drivers who use seat belts have lower traffic fatality rates, but those rates are confounded with the same drivers being generally safer drivers. Similarly, individuals who choose health insurance plans with less generous coinsurance rates generally consume fewer health services. Is that only because some of them, knowing they are healthier, choose less generous insurance plans? If so, their behavior in a future health insurance program that includes everyone will not be predictable from their past utilizations. A health insurance experiment offers an alternative to using observational data for predicting the costs of proposed national health insurance programs.
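
A small simulation can make the self-selection problem concrete. In this hypothetical sketch (all numbers invented), healthier people tend to choose the less generous plan, so a naive comparison of spending across chosen plans is badly biased, while the same comparison under random assignment recovers the assumed plan effect.

```python
# Hypothetical illustration of selection bias versus random assignment.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
health = rng.normal(0, 1, n)                       # latent health; higher = healthier

# Observational world: less healthy people tend to choose the generous plan.
chose_generous = (health + rng.normal(0, 1, n)) < 0
spending = 1000 - 300 * health + 200 * chose_generous      # assumed true plan effect = +200
naive = spending[chose_generous].mean() - spending[~chose_generous].mean()

# Experimental world: plans assigned at random, independent of health.
assigned_generous = rng.random(n) < 0.5
spending_rand = 1000 - 300 * health + 200 * assigned_generous
randomized = spending_rand[assigned_generous].mean() - spending_rand[~assigned_generous].mean()

print(f"Naive (self-selected) estimate: {naive:.0f}")
print(f"Randomized estimate: {randomized:.0f}")
```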

3. Risks Of Social Experiments

Being costly and time-consuming, social experiments have been used sparingly. They might not provide useful information for many years, risking their irrelevance. Human subjects can refuse to enroll, and enrollees can drop out before completion. If that happens with regularity, randomization fails, and so the experiment fails.

If large experiments become too visible, their subjects may feel that the situation is artificial and act differently.

Drastic political, social, legislative, or economic changes can occur during an experiment and invalidate it. However, a control group can be protective in such situations, to the extent that the shock affects all treatments and controls equally. Orr (1999), who as a government economist helped spawn and plan several early public policy experiments, reviews such concerns about social experiments, and he discusses how experiments can protect themselves. Besides the issues mentioned here, he discusses whether and how experiments have affected policy, their credibility, time considerations, communication issues, their generalizability, relevance of their results, the policy environment, and policy links.

4. Design Issues

Those who design social experiments make numerous crucial choices. They must choose treatments that span the policies that are likely to be considered without picking an overly large range that provides too little information on the policies that ultimately matter. How many sites are needed? Too many sites are difficult to manage, and for a fixed budget, increased management costs must be paid for by decreased sample sizes. Too few sites restrict generalizability. Should sites be chosen randomly? Probably not if the number of sites must be kept small, because then randomization provides little basis for inference.

More subjects are better, but too many subjects make an experiment unfeasibly expensive. Without strong reasons to do otherwise, it helps to keep the percentage of subjects allocated to each treatment the same in every site. Balanced samples, which means matching the key characteristics of the sample across sites and treatments, have various optimal properties in experimental design, and they enjoy considerable face validity (Morris and Hill 2000).

Enrollment must last long enough for individuals to learn how to take advantage of experimental programs (and so to reach steady state), and perhaps long enough to allow analysts to discount the times at the beginning and/or end of the experiment when subjects may act differently than they would in a continuing government program. However, overly long enrollments waste time, because early results are better for policy and because, assuming positive correlations within individuals, one individual provides less information per unit time than two independent individuals. These transitory issues may make it necessary to field a longer experiment, or at least to guarantee treatment benefits beyond the actual measurement period.

Decisions about what data to collect, about the design of questionnaires, and about how much and how frequently to interview subjects may be as crucial as other statistical design decisions. Interviewing subjects too often risks over-stimulating them (‘Hawthorne effects’), so that subjects become overly aware of participating in an experiment and behave artificially. Respondent burden can cause dropouts, careless responses, and selective nonresponse. Interviewing subjects too infrequently sacrifices potentially important data. Experimenters can gain a better understanding of respondent burden by subjecting themselves to the same interviewing processes that experimental subjects face.

While large-scale social experiments encounter difficulties beyond those of smaller experiments, they will have greater financial resources to deal with them. Some potential difficulties that concerned HIE designers included: creating Hawthorne effects by frequent interviewing; activating participants by obtaining a medical screening examination at enrollment; stimulating health expenditures by having to pay participation incentives to families; and not being sure of the best time horizon (Newhouse et al. 1979). The HIE budget allowed for four (balanced) ‘subexperiments’ within the main experiment to measure these effects. In one subexperiment, half of the HIE subjects were chosen at random for weekly interviews, and half for bi-weekly. Similarly, 60 percent of the HIE subjects were randomly selected for initial screening exams, and the rest not; and 70 percent were randomly assigned for three-year enrollments, and the other 30 percent for five years. This allowed testing for these effects and, if found, adjusting for them.
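
The sketch below shows, in schematic form only, how crossed sub-experiments of the kind just described could be assigned. It uses independent random draws with the stated 50/50, 60/40, and 70/30 splits, whereas the actual HIE used a balanced allocation procedure (the Finite Selection Model of Morris 1979), and the family count is hypothetical.

```python
# Schematic assignment of crossed sub-experiment factors (hypothetical families).
# Independent random draws stand in here for the HIE's balanced allocation.
import random

random.seed(4)
families = [f"family_{i}" for i in range(2000)]    # hypothetical enrollment

design = {}
for fam in families:
    design[fam] = {
        "interview": "weekly" if random.random() < 0.5 else "biweekly",   # 50/50 split
        "screening_exam": random.random() < 0.6,                          # 60/40 split
        "enrollment_years": 3 if random.random() < 0.7 else 5,            # 70/30 split
    }

# Check the realized proportions against the planned splits.
n = len(families)
print(round(sum(d["interview"] == "weekly" for d in design.values()) / n, 3))
print(round(sum(d["screening_exam"] for d in design.values()) / n, 3))
print(round(sum(d["enrollment_years"] == 3 for d in design.values()) / n, 3))
```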

5. Further Reading

Orr (1999) provides an overview and first-hand examples of the design and implementation of social experiments. Greenberg and Shroder (1997) summarize 143 completed social experiments in the United States, and 75 others that were ongoing at the time of their publication. Each summary describes the target population, policies tested, experimental design, sites, findings, sources of further information, and public access to the data. Campbell’s fundamental work in social experimentation is summarized in Campbell and Russo (1999). Boruch (1997) provides reference material on randomization in field experiments and Neuberg (1989) discusses anomalies of social control experimentation.

Bibliography:

  • Aigner D J, Morris C (eds.) 1979 Experimental design in econometrics. Journal of Econometrics 11(1)
  • Boruch R F 1975 On common contentions about randomized field experiments. In: Boruch R F, Riecken H W (eds.) Experimental Tests of Public Policy. Westview, Boulder, CO, pp. 108–45
  • Boruch R F 1997 Randomized Experiments for Planning and Evaluation: A Practical Guide. Sage, Thousand Oaks, CA
  • Bradbury K, Downs A (eds.) 1981 Do Housing Allowances Work? Brookings Institution, Washington, DC
  • Campbell D T, Russo M J 1999 Social Experimentation. Sage, Thousand Oaks, CA
  • Cave G, Doolittle F, Bos H, Toussaint C 1993 JOBSTART: Final Report on a Program for School Dropouts. Manpower Demonstration Research Corporation, New York
  • Ferber R, Hirsch W Z 1982 Social Experimentation and Economic Policy. Cambridge University Press, Cambridge, UK
  • Fienberg S E, Singer B, Tanur J M 1985 Large-scale social experimentation in the United States. In: A Celebration of Statistics: The ISI Centenary Volume. Springer-Verlag, New York, pp. 287–326
  • Greenberg D, Shroder M 1997 The Digest of Social Experiments, 2nd edn. Urban Institute Press, Washington, DC
  • Hausman J, Wise D (eds.) 1985 Social Experimentation. University of Chicago Press, Chicago
  • Lowry I S (ed.) 1983 Experimenting with Housing Allowances: The Final Report of the Housing Assistance Supply Experiment. Oelgeschlager, Gunn & Hain, Cambridge, MA
  • Morris C N 1979 A finite selection model for experimental design of the health insurance study. Journal of Econometrics 11: 43–61
  • Morris C N, Hill J L 2000 The Health Insurance Experiment: Design Using the Finite Selection Model. Public Policy and Statistics: Case Studies from RAND. Springer, New York
  • Neuberg L G 1989 Conceptual Anomalies in Economics and Statistics: Lessons from the Social Experiment. Cambridge University Press, Cambridge, UK
  • Newhouse J P 1993 Free for All? Lessons from the RAND Health Insurance Experiments. Harvard University Press, Cambridge, MA
  • Newhouse J P, Marquis K H, Morris C N, Phelps C E, Rogers W H 1979 Measurement issues in the second generation of social experiments: The health insurance study. Journal of Econometrics 11: 117–29
  • Orr L L 1999 Social Experiments: Evaluating Public Programs with Experimental Methods. Sage, Thousand Oaks, CA
  • Ross H 1966 A Proposal for a Demonstration of New Techniques in Income Maintenance (mimeo). Data Center Archives, Institute for Research on Poverty, University of Wisconsin, Madison, WI

Social Psychology Research Topics

Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."

Choosing topics for social psychology research papers or projects for class can be challenging. It is a broad and fascinating field, which can make it difficult to figure out what you want to investigate in your research.

Social psychology explores how individual thoughts, feelings, and behaviors are affected by social influences. It explores how each person's behavior is affected by their social environment.

This article explores a few different social psychology topics and research questions you might want to study in greater depth. It covers how to start your search for a topic as well as specific ideas you might choose to explore.

How to Find a Social Psychology Research Topic

As you begin your search, think about the questions that you have. What topics interest you? Following your own interests and curiosities can often inspire great research questions.

Choose a Sub-Topic

Social psychologists are interested in all aspects of social behavior. Some of the main areas of interest within the field include social cognition, social influence, and social relationships, investigating subtopics such as conformity, groupthink, attitude formation, obedience, prejudice, and so on.

  • Social cognition : How do we process and use information about social experiences? What kinds of biases influence how we engage with other people?
  • Social influence: What are the key social factors that influence our attitudes and behavior? What are group dynamics and how do we understand patterns of behavior in groups?
  • Social relationships : What are the different types of social relationships? How do they develop and change over time?

To help ensure that you select a topic that is specific enough, it can be helpful to start by confining your search to one of these main areas.

Browse Through Past Research

After narrowing down your choices, consider what questions you might have. Are there questions that haven't been fully answered by previous studies? At this point, it can be helpful to spend some time browsing through journal articles or books to see some examples of past findings and identify gaps in the literature.

You can also find inspiration and learn more about a topic by searching for keywords related to your topic in psychological databases such as PsycINFO or browsing through some professional psychology journals.

Narrow Down Your Specific Topic

Once you have a general topic, you'll need to narrow down your research. The goal is to choose a research question that is specific, measurable, and testable. Let's say you want to study conformity. An example of a good research question might be, “Are people more likely to conform when they are in a small group or a large group?” In this case, the specific topic of your paper would be how group size influences social conformity.
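
If you went on to collect data for a question like this, a simple analysis might compare conformity rates across the two group-size conditions. The sketch below uses made-up counts purely to show the shape of such a test; it is not from any published study.

```python
# Chi-square test comparing conformity rates in small vs. large groups.
# The counts below are invented for illustration only.
from scipy.stats import chi2_contingency

# Rows: small group, large group; columns: conformed, did not conform.
table = [[12, 38],
         [27, 23]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
```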

Review the Literature on Your Chosen Topic

After choosing a specific social psychology topic to research, the next step is to do a literature review. A literature review involves reading through the existing research findings related to a specific topic.

You are likely to encounter a great deal of information on your topic, which can seem overwhelming at times. You may find it helpful to start by reading review articles or meta-analysis studies. These are summaries of previous research on your topic or studies that incorporate a large pool of past research on the topic.

Talk to Your Instructor

Even if you are really excited to dive right in and start working on your project, there are some important preliminary steps you need to take.

Before you decide to tackle a project for your social psychology class, you should always clear your idea with your instructor. This initial step can save you a lot of time and hassle later on.

Your instructor can offer clear feedback on things you should and should not do while conducting your research and might be able to offer some helpful tips. Also, if you plan to implement your own social experiment, your school might require you to present to and gain permission from an institutional review board.

Thinking about the questions you have about social psychology can be a great way to discover topics for your own research. Once you have a general idea, explore the literature and refine your research question to make sure it is specific enough.

Examples of Social Psychology Research Topics

The following are some specific examples of different subjects you might want to investigate further as part of a social psychology research paper, experiment, or project:

Implicit Attitudes

How do implicit attitudes influence how people respond to others? This can involve exploring how people's attitudes towards different groups of people (e.g., men, women, ethnic minorities) influence their interactions with those groups. For example, one study found that 75% of people perceive men to be more intelligent than women.

In your own project, you might explore how implicit attitudes impact perceptions of qualities such as kindness, intelligence, leadership skills, or attractiveness.

Prosocial Behavior

You might also choose to focus on prosocial behavior in your research. This can involve investigating the reasons why people help others. Some questions you could explore further include:

  • What motivates people to help others?
  • When are people most likely to help others?
  • How does helping others cause people to feel?
  • What are the benefits of helping other people?

Persuasion

How do people change their attitudes in response to persuasion? What are the different techniques that can be used to persuade someone? What factors make some people more susceptible to persuasion than others?

One way to investigate this could be through collecting a wide variety of print advertisements and analyzing how persuasion is used. What types of cognitive and affective techniques are utilized? Do certain types of advertisements tend to use specific kinds of persuasive techniques?

Aggression and Violence

Another area of social psychology that you might research is aggression and violence. This can involve exploring the factors that lead to aggression and violence and the consequences of these behaviors. Some questions you might explore further include:

  • When is violence most likely to occur?
  • What factors influence violent behavior?
  • Do traumatic experiences in childhood lead to more aggressive behavior in adulthood?
  • Does viewing violent media content contribute to increased aggressive behavior in real life?

Prejudice and Discrimination

Prejudice and discrimination are areas that present a range of research opportunities. This can involve studying the different forms that prejudice takes (e.g., sexism, racism, ageism), as well as the psychological effects of prejudice and discrimination. You might also want to investigate topics related to how prejudices form or strategies that can be used to reduce such discrimination.

Nonverbal Behavior

How do people respond when nonverbal communication does not match up with verbal behavior (for example, saying you feel great when your facial expressions and tone of voice indicate otherwise)? Which signal do people respond to most strongly?

How good are people at detecting lies? Have participants tell a group of people about themselves, but make sure some of the things are true while others are not. Ask members of the group which statements they thought were true and which they thought were false.

Social Norms

How do people react when social norms are violated? This might involve acting in a way that is outside the norm in a particular situation or enlisting friends to act out the behaviors while you observe.

Some examples that you might try include wearing unusual clothing, applauding inappropriately at the end of a class lecture, cutting in line in front of other people, or some other mildly inappropriate behavior. Keep track of your own thoughts as you perform the experiment and observe how people around you respond.

Online Social Behavior

Does online social networking make people more or less likely to interact with people in face-to-face or other offline settings? To investigate this further, you could create a questionnaire to assess how often people participate in social networking versus how much time they spend interacting with their friends in real-world settings.
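
One simple way to summarize such questionnaire data would be a correlation between time spent on social networking and time spent socializing offline. The sketch below uses made-up responses solely to illustrate the calculation.

```python
# Correlation between online networking time and offline socializing time.
# The hours below are hypothetical questionnaire responses.
from scipy.stats import pearsonr

online_hours = [2, 5, 8, 1, 10, 4, 6, 3, 7, 9]    # weekly hours on social networks
offline_hours = [6, 4, 3, 8, 2, 5, 4, 7, 3, 2]    # weekly hours of face-to-face socializing

r, p = pearsonr(online_hours, offline_hours)
print(f"r = {r:.2f}, p = {p:.3f}")
```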

Social Perception

How does our appearance impact how people respond to us? Ask some friends to help you by having two people dress up in dramatically different ways, one in a professional manner and one in a less conventional manner. Have each person engage in a particular action, then observe how they are treated and how other people's responses differ.

Social psychologists have found that attractiveness can produce what is known as a halo effect. Essentially, we tend to assume that people who are physically attractive are also friendly, intelligent, pleasant, and likable.

To investigate this topic, you could set up an experiment where you have participants look at photographs of people of varying degrees of physical attractiveness, and then ask them to rate each person based on a variety of traits, including social competence, kindness, intellect, and overall likability.
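
After collecting ratings like these, you could compare the average ratings given to the more and less attractive photographs. Here is a minimal sketch with invented 1-7 likability ratings, just to show one way the comparison could be run.

```python
# Independent-samples t-test comparing likability ratings by attractiveness.
# Ratings are invented for illustration only (1-7 scale).
from scipy.stats import ttest_ind

high_attractiveness = [6, 5, 6, 7, 5, 6, 6, 5, 7, 6]
low_attractiveness = [4, 5, 3, 4, 4, 5, 3, 4, 4, 5]

t_stat, p_value = ttest_ind(high_attractiveness, low_attractiveness)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```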

Think about how this might affect a variety of social situations, including how employees are selected or how jurors in a criminal case might respond.

Social psychology is a broad field, so there are many different subtopics you might choose to explore in your research. Implicit attitudes, prosocial behavior, aggression, prejudice, and social perception are just a few areas you might want to consider.

A Word From Verywell

Social psychology topics can provide a great deal of inspiration for further research, whether you are writing a research paper or conducting your own experiment. In addition to some of the social psychology topics above, you can also draw inspiration from your own curiosity about social behavior or examine social issues that you see taking place in the world around you. 

American Psychological Association. Frequently asked questions about institutional review boards.

Storage D, Charlesworth TES, Banaji M, Cimpian A. Adults and children implicitly associate brilliance with men more than women. J Exp Soc Psychol. 2020;90:104020. doi:10.1016/j.jesp.2020.104020

Talamas SN, Mavor KI, Perrett DI. Blinded by beauty: Attractiveness bias and accurate perceptions of academic performance. PLoS ONE. 2016;11(2):e0148284. doi:10.1371/journal.pone.0148284

Organizing Your Social Sciences Research Paper

6. The Methodology

The methods section describes actions taken to investigate a research problem and the rationale for the application of specific procedures or techniques used to identify, select, process, and analyze information applied to understanding the problem, thereby allowing the reader to critically evaluate a study’s overall validity and reliability. The methodology section of a research paper answers two main questions: How was the data collected or generated? And, how was it analyzed? The writing should be direct and precise and always written in the past tense.

Kallet, Richard H. "How to Write the Methods Section of a Research Paper." Respiratory Care 49 (October 2004): 1229-1232.

Importance of a Good Methodology Section

You must explain how you obtained and analyzed your results for the following reasons:

  • Readers need to know how the data was obtained because the method you chose affects the results and, by extension, how you interpreted their significance in the discussion section of your paper.
  • Methodology is crucial for any branch of scholarship because an unreliable method produces unreliable results and, as a consequence, undermines the value of your analysis of the findings.
  • In most cases, there are a variety of different methods you can choose to investigate a research problem. The methodology section of your paper should clearly articulate the reasons why you have chosen a particular procedure or technique.
  • The reader wants to know that the data was collected or generated in a way that is consistent with accepted practice in the field of study. For example, if you are using a multiple choice questionnaire, readers need to know that it offered your respondents a reasonable range of answers to choose from.
  • The method must be appropriate to fulfilling the overall aims of the study. For example, you need to ensure that you have a large enough sample size to be able to generalize and make recommendations based upon the findings.
  • The methodology should discuss the problems that were anticipated and the steps you took to prevent them from occurring. For any problems that do arise, you must describe the ways in which they were minimized or why these problems do not impact in any meaningful way your interpretation of the findings.
  • In the social and behavioral sciences, it is important to always provide sufficient information to allow other researchers to adopt or replicate your methodology. This information is particularly important when a new method has been developed or an innovative use of an existing method is utilized.

Bem, Daryl J. Writing the Empirical Journal Article. Psychology Writing Center. University of Washington; Denscombe, Martyn. The Good Research Guide: For Small-Scale Social Research Projects . 5th edition. Buckingham, UK: Open University Press, 2014; Lunenburg, Frederick C. Writing a Successful Thesis or Dissertation: Tips and Strategies for Students in the Social and Behavioral Sciences . Thousand Oaks, CA: Corwin Press, 2008.

Structure and Writing Style

I.  Groups of Research Methods

There are two main groups of research methods in the social sciences:

  • The empirical-analytical group approaches the study of the social sciences in a manner similar to how researchers study the natural sciences. This type of research focuses on objective knowledge, research questions that can be answered yes or no, and operational definitions of variables to be measured. The empirical-analytical group employs deductive reasoning that uses existing theory as a foundation for formulating hypotheses that need to be tested. This approach is focused on explanation.
  • The interpretative group of methods is focused on understanding phenomena in a comprehensive, holistic way. Interpretive methods focus on analytically disclosing the meaning-making practices of human subjects [the why, how, or by what means people do what they do], while showing how those practices are arranged so that they can be used to generate observable outcomes. Interpretive methods allow you to recognize your connection to the phenomena under investigation. However, the interpretative group requires careful examination of variables because it focuses more on subjective knowledge.

II.  Content

The introduction to your methodology section should begin by restating the research problem and the underlying assumptions of your study. This is followed by situating the methods you used to gather, analyze, and process information within the overall “tradition” of your field of study and within the particular research design you have chosen to study the problem. If the method you choose lies outside of the tradition of your field [i.e., your review of the literature demonstrates that the method is not commonly used], provide a justification for how your choice of methods specifically addresses the research problem in ways that have not been utilized in prior studies.

The remainder of your methodology section should describe the following:

  • Decisions made in selecting the data you have analyzed or, in the case of qualitative research, the subjects and research setting you have examined,
  • Tools and methods used to identify and collect information, and how you identified relevant variables,
  • The ways in which you processed the data and the procedures you used to analyze that data, and
  • The specific research tools or strategies that you utilized to study the underlying hypothesis and research questions.

In addition, an effectively written methodology section should:

  • Introduce the overall methodological approach for investigating your research problem. Is your study qualitative, quantitative, or a combination of both (mixed method)? Are you going to take a special approach, such as action research, or a more neutral stance?
  • Indicate how the approach fits the overall research design. Your methods for gathering data should have a clear connection to your research problem. In other words, make sure that your methods will actually address the problem. One of the most common deficiencies found in research papers is that the proposed methodology is not suitable to achieving the stated objective of your paper.
  • Describe the specific methods of data collection you are going to use, such as surveys, interviews, questionnaires, observation, or archival research. If you are analyzing existing data, such as a data set or archival documents, describe how it was originally created or gathered and by whom. Also be sure to explain how older data is still relevant to investigating the current research problem.
  • Explain how you intend to analyze your results. Will you use statistical analysis? Will you use specific theoretical perspectives to help you analyze a text or explain observed behaviors? Describe how you plan to obtain an accurate assessment of relationships, patterns, trends, distributions, and possible contradictions found in the data.
  • Provide background and a rationale for methodologies that are unfamiliar to your readers. Very often in the social sciences, research problems and the methods for investigating them require more explanation and rationale than the widely accepted rules governing the natural and physical sciences. Be clear and concise in your explanation.
  • Provide a justification for subject selection and sampling procedure. For instance, if you propose to conduct interviews, how do you intend to select the sample population? If you are analyzing texts, which texts have you chosen, and why? If you are using statistics, why is this set of data being used? If other data sources exist, explain why the data you chose is most appropriate to addressing the research problem.
  • Provide a justification for case study selection. A common method of analyzing research problems in the social sciences is to analyze specific cases. These can be a person, place, event, phenomenon, or other type of subject of analysis that is examined either as a single topic of in-depth investigation or as one of multiple topics studied for the purpose of comparing or contrasting findings. In either approach, you should explain why a case or cases were chosen and how they specifically relate to the research problem.
  • Describe potential limitations. Are there any practical limitations that could affect your data collection? How will you attempt to control for potential confounding variables and errors? If your methodology may lead to problems you can anticipate, state this openly and show why pursuing this methodology outweighs the risk of these problems cropping up.

NOTE: Once you have written all of the elements of the methods section, subsequent revisions should focus on how to present those elements as clearly and as logically as possible. The description of how you prepared to study the research problem, how you gathered the data, and the protocol for analyzing the data should be organized chronologically. For clarity, when a large amount of detail must be presented, information should be presented in sub-sections according to topic. If necessary, consider using appendices for raw data.

ANOTHER NOTE: If you are conducting a qualitative analysis of a research problem, the methodology section generally requires a more elaborate description of the methods used, as well as an explanation of the processes applied to gathering and analyzing the data, than is generally required for studies using quantitative methods. Because you are the primary instrument for generating the data [e.g., through interviews or observations], the process for collecting that data has a significantly greater impact on producing the findings. Therefore, qualitative research requires a more detailed description of the methods used.

YET ANOTHER NOTE: If your study involves interviews, observations, or other qualitative techniques involving human subjects, you may be required to obtain approval from the university's Office for the Protection of Research Subjects before beginning your research. This is not a common procedure for most undergraduate-level student research assignments. However, if your professor states that you need approval, you must include a statement in your methods section that you received official approval from that office, that participants gave adequate informed consent, and that risks to participants and to the university were clearly assessed and minimized. This statement informs the reader that your study was conducted in an ethical and responsible manner. In some cases, the approval notice is included as an appendix to your paper.

III.  Problems to Avoid

Irrelevant Detail. The methodology section of your paper should be thorough but concise. Do not provide any background information that does not directly help the reader understand why a particular method was chosen, how the data was gathered or obtained, and how the data was analyzed in relation to the research problem [note: analyzed, not interpreted! Save how you interpreted the findings for the discussion section]. With this in mind, the page length of your methods section will generally be less than any other section of your paper except the conclusion.

Unnecessary Explanation of Basic Procedures. Remember that you are not writing a how-to guide about a particular method. You should make the assumption that readers possess a basic understanding of how to investigate the research problem on their own and, therefore, you do not have to go into great detail about specific methodological procedures. The focus should be on how you applied a method, not on the mechanics of doing a method. An exception to this rule is if you select an unconventional methodological approach; if this is the case, be sure to explain why this approach was chosen and how it enhances the overall process of discovery.

Problem Blindness. It is almost a given that you will encounter problems when collecting or generating your data, or that gaps will exist in existing data or archival materials. Do not ignore these problems or pretend they did not occur. Often, documenting how you overcame obstacles can form an interesting part of the methodology. It demonstrates to the reader that you can provide a cogent rationale for the decisions you made to minimize the impact of any problems that arose.

Literature Review. Just as the literature review section of your paper provides an overview of sources you have examined while researching a particular topic, the methodology section should cite any sources that informed your choice and application of a particular method [i.e., the choice of a survey should include any citations to the works you used to help construct the survey].

It’s More than Sources of Information! A description of a research study's method should not be confused with a description of the sources of information. Such a list of sources is useful in and of itself, especially if it is accompanied by an explanation about the selection and use of the sources. The description of the project's methodology complements a list of sources in that it sets forth the organization and interpretation of information emanating from those sources.

Writing Tip

Statistical Designs and Tests? Do Not Fear Them!

Don't avoid using a quantitative approach to analyzing your research problem just because you fear the idea of applying statistical designs and tests. A qualitative approach, such as conducting interviews or content analysis of archival texts, can yield exciting new insights about a research problem, but it should not be undertaken simply because you have a disdain for running a simple regression. A well designed quantitative research study can often be accomplished in very clear and direct ways, whereas a similar study of a qualitative nature usually requires considerable time to analyze large volumes of data and carries the burden of creating new paths for analysis where none previously existed for your research problem.

Another Writing Tip

Knowing the Relationship Between Theories and Methods

There can be multiple meanings associated with the term "theories" and the term "methods" in social sciences research. A helpful way to delineate between them is to understand "theories" as representing different ways of characterizing the social world when you research it and "methods" as representing different ways of generating and analyzing data about that social world. Framed in this way, all empirical social sciences research involves theories and methods, whether they are stated explicitly or not. However, while theories and methods are often related, it is important that, as a researcher, you deliberately separate them in order to avoid your theories playing a disproportionate role in shaping what outcomes your chosen methods produce.

Introspectively engage in an ongoing dialectic between the application of theories and methods so that you can use the outcomes from your methods to interrogate and develop new theories, or new ways of conceptually framing the research problem. This is how scholarship grows and branches out into new intellectual territory.

Yet Another Writing Tip

Methods and the Methodology

Do not confuse the terms "methods" and "methodology." As Schneider notes, a method refers to the technical steps taken to do research. Descriptions of methods usually include defining and stating why you have chosen specific techniques to investigate a research problem, followed by an outline of the procedures you used to systematically select, gather, and process the data [remember to always save the interpretation of data for the discussion section of your paper].

The methodology refers to a discussion of the underlying reasoning why particular methods were used . This discussion includes describing the theoretical concepts that inform the choice of methods to be applied, placing the choice of methods within the more general nature of academic work, and reviewing its relevance to examining the research problem. The methodology section also includes a thorough review of the methods other scholars have used to study the topic.

Bryman, Alan. "Of Methods and Methodology." Qualitative Research in Organizations and Management: An International Journal 3 (2008): 159-168; Schneider, Florian. “What's in a Methodology: The Difference between Method, Methodology, and Theory…and How to Get the Balance Right?” PoliticsEastAsia.com. Chinese Department, University of Leiden, Netherlands.

Setting up social experiments: the good, the bad, and the ugly

Burt S. Barnow

Zeitschrift für ArbeitsmarktForschung, volume 43, pages 91–105 (2010). Invited paper, published 20 October 2010.

It is widely agreed that randomized controlled trials – social experiments – are the gold standard for evaluating social programs. There are, however, many important issues that cannot be tested using social experiments, and often things go wrong when conducting social experiments. This paper explores these issues and offers suggestions on ways to deal with commonly encountered problems. Social experiments are preferred because random assignment assures that any differences between the treatment and control groups are due to the intervention and not some other factor; also, the results of social experiments are more easily explained and accepted by policy officials. Experimental evaluations often lack external validity and cannot control for entry effects, scale and general equilibrium effects, and aspects of the intervention that were not randomly assigned. Experiments can also lead to biased impact estimates if the control group changes its behavior or if changing the number selected changes the impact. Other problems with conducting social experiments include increased time and cost, and legal and ethical issues related to excluding people from the treatment. Things that sometimes go wrong in social experiments include programs cheating on random assignment, and participants and/or staff not understanding the intervention rules. The random assignment evaluation of the Job Training Partnership Act in the United States is used as a case study to illustrate the issues.

1 Introduction

Since the 1960s, social experiments have been increasingly used in the United States to determine the effects of pilots and demonstrations as well as ongoing programs in areas as diverse as education, health insurance, housing, job training, welfare cash assistance, and time of day pricing of electricity. Although social experiments have not been widely used in Europe, there is growing interest in expanding their use in evaluating social programs. Social experiments remain popular in the United States, but there has been a spirited debate in recent years regarding whether recent methodological developments, particularly propensity score matching and regression discontinuity designs, overcome many of the key objections to nonexperimental methods. This paper provides an assessment of some of the issues that arise in conducting social experiments and explains some of the things that can go wrong in conducting and interpreting the results of social experiments.

The paper first defines what is generally meant by the term social experiments and briefly reviews their use in the United States. This is followed by a discussion of the advantages of social experiments over nonexperimental methods. The next section discusses the limitations of social experiments – what we cannot learn from social experiments. Next is a section discussing some of the things that can go wrong in social experiments and limits of what we learn from them. To illustrate the problems that can arise, the penultimate section provides a case study of lessons from the National JTPA Study, a social experiment that was used to assess a large training program for disadvantaged youth and adults in the United States. The last section provides conclusions.

2 Definitions and context

As Orr ( 1999 , p. 14) notes, “The defining element of a social experiment is random assignment of some pool of individuals to two or more groups that are subject to different policy regimes.” Greenberg and Shroder ( 2004 , p. 4) note that because social experiments are intended to provide unbiased estimates of the impacts of the policy of interest, they must have four specific features:

Random assignment : Creation of at least two groups of human subjects who differ from one another by chance alone.

Policy intervention : A set of actions ensuring that different incentives, opportunities, or constraints confront the members of each of the randomly assigned groups in their daily lives.

Follow-up data collection : Measurement of market and fiscal outcomes for members of each group.

Evaluation : Application of statistical inference and informed professional judgment about the degree to which the policy interventions have caused differences in outcomes between the groups.

These four features are not particularly restrictive, and social experiments can have a large number of variations. Although we often think of random assignment taking place at the individual level, the random assignment can take place at a more aggregated level, such as the classroom, the school, the school district, political or geographic jurisdictions, or any other unit where random assignment can be feasibly carried out. Footnote 1 Second, there is no necessity for a treatment to be compared against a null treatment. In an educational or medical context, for example, it might be harmful to the control group if they receive no intervention; in such instances, the experiment can measure differential impacts where the treatment and control groups both receive treatments, but they do not receive the same treatment. Footnote 2

Third, there does not have to be a single treatment. In many instances it is sensible to develop a number of alternative treatments to which participants are assigned. In health insurance experiments, for example, there are often a number of variations we would like to test for the key aspects of the treatment. Thus, we might want to randomly assign participants to various combinations of deductible amounts and co-payment rates to see which combination leads to the best results in terms of costs and health outcomes. Likewise, in U.S. welfare experiments, the experiments frequently vary the “guarantee,” the payment received if the person does no market work, and the “implicit tax rate,” the rate at which benefits are reduced if there are earnings. Footnote 3
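
To make the idea of assigning participants to combinations of key treatment parameters concrete, the following is a minimal, hypothetical sketch in Python; the guarantee levels, implicit tax rates, and sample are invented for illustration and are not taken from any experiment discussed in this paper.

    import itertools
    import random

    random.seed(42)  # fixed seed so the illustrative assignment is reproducible

    guarantees = [3000, 4000, 5000]          # hypothetical annual guarantee levels
    implicit_tax_rates = [0.30, 0.50, 0.70]  # hypothetical benefit-reduction rates

    # Treatment cells are all combinations of the two parameters, plus a control cell.
    cells = list(itertools.product(guarantees, implicit_tax_rates)) + ["control"]

    participants = [f"person_{i:03d}" for i in range(20)]  # hypothetical participants
    assignment = {person: random.choice(cells) for person in participants}

    for person, cell in assignment.items():
        print(person, cell)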

Fourth, social experiments can be implemented in conjunction with an ongoing program or to test a new intervention; in some instances a social experiment will test a new intervention in the context of an ongoing program. Welfare programs in the United States have been subject to several types of social experiments. In the 1960s and 1970s, a series of “negative income tax” experiments were conducted where a randomly selected group of people were diverted from regular welfare programs to entirely new welfare programs with quite different rules and benefits. During the 1980s and 1990s, many states received waivers where they were permitted to try new variations on their welfare programs so long as the new interventions were evaluated using random assignment. U.S. vocational training programs have included freestanding demonstrations with experimental designs as well as experimental evaluations of ongoing programs. Inserting an experimental design in an ongoing program is sometimes difficult, particularly if the program is an entitlement or if the authorizing legislation prohibits denying services to those who apply.

Another important distinction among experiments is that the participants can volunteer for the intervention or they can be assigned to the program. For purely voluntary programs, such as many job training programs in the United States, there is no meaningful concept of mandatory participants. For welfare programs, however, a new intervention can be voluntary in nature or it could be mandatory; the numerous welfare to work demonstration programs tested in the United States have fallen into both categories. While both mandatory and voluntary programs can be evaluated using an experimental design, the findings must be interpreted carefully. The impacts estimated for a voluntary program cannot necessarily be expected to apply to a program where all welfare recipients must participate, and the impacts for a mandatory program may not apply if the same intervention were implemented as a voluntary program.

Although this paper does not focus on the ethics of random assignment, it is important to consider whether it is ethical to deny people the opportunity to participate in a social program. Both Greenberg and Shroder (2004) and Orr (1999) discuss the ethics of random assignment, but not in great depth. More recently, the topic was explored in an exchange between Blustein (2005a, b), Barnow (2005), Rolston (2005), and Schochet (2005). Many observers would agree that random assignment is ethical (or at least not unethical) when there is excess demand for a program and the effectiveness of the program is unknown. Blustein (2005a) uses the experimental evaluation of the Job Corps to raise issues such as recruiting additional applicants so that there will be sufficient applicants to deny services to some, the fact that applicants who do not consent to the random assignment procedure are denied access to the program, and whether those randomized out of participation should receive monetary compensation. She believes that a good case can be made that the Job Corps evaluation, which included random assignment, may have been unethical, although her critics generally take issue with her points and claim that the knowledge gained is sufficient to offset any losses to the participants. As Blustein makes clear, her primary motivation in the paper is not to dispute the ethics of the Job Corps evaluation but rather to urge that ethical considerations be taken into account more fully when random assignment is being considered.

An important distinction between social experiments and the randomized controlled trials frequently used in the fields of medicine and public health is that social experiments rarely make use of double-blind or even single-blind approaches. In the field of medicine, it is well known that there can often be a “placebo effect,” where subjects benefit simply from believing that they have received a treatment. Although social experiments can also be subject to similar problems, it is often difficult or impossible to keep the subjects and researchers unaware of their treatment status. A related phenomenon, known as the “Hawthorne effect,” refers to the possibility that subjects respond differently to stimuli because they are being observed. Footnote 4 The important point is that the inability to conduct double-blind experiments, and even the knowledge that one is a subject in an experiment, can potentially lead to biased estimates of intervention impacts.

It is important to distinguish between true social experiments and “natural experiments.” The term natural experiment is sometimes used to refer to situations where random selection is not used to determine assignment to treatment status but the mechanism used, it is argued, results in treatment and comparison groups that are virtually identical. Angrist and Krueger ( 2001 ) extol the use of natural experiments in evaluations when random assignment is not feasible as a way to eliminate omitted variable bias; however, the examples they cite make use of instrumental variables rather than assuming that simple analysis of variance or ordinary least squares regression analysis can be used to obtain impact estimates:

Instruments that are used to overcome omitted variable bias are sometimes said to derive from “natural experiments.” Recent years have seen a resurgence in the use of instrumental variables in this way – that is, to exploit situations where the forces of nature or government policy have conspired to produce an environment somewhat akin to a randomized experiment. This type of application has generated some of the most provocative empirical findings in economics, along with some controversy over substance and methods.

Perhaps one of the best known examples of use of a natural experiment is the analysis by Angrist and Krueger ( 1991 ) to evaluate the effects of compulsory school attendance laws in the United States on education and earnings. In that study, the authors argue that the number of years of compulsory education (within limits) is essentially random, as it is determined by the month of birth. As Angrist and Krueger clearly imply, a natural experiment is not a classical experiment with randomized control trials, and there is no guarantee that simple analyses or more complex approaches such as instrumental variables will yield unbiased treatment estimates.

3 Why conduct social experiments?

There are a number of reasons why social experiments are preferable to nonexperimental evaluations. In the simplest terms, the objective in an evaluation of a social program is to compare the outcome for participants with the intervention to the outcome those same participants would have had without the intervention. Because it is impossible to observe the same person in two states of the world at the same time, we must rely on some alternative approach to estimate what would have happened to participants had they not been in the program. The simplest and most effective way to assure comparability of the treatment and control groups is to randomly assign the potential participants to either receive the treatment or be denied the treatment; with a sufficiently large sample size, the treatment and control groups are likely to be virtually identical on all characteristics that might affect the outcome. Nonexperimental evaluation approaches generally seek to provide unbiased and consistent impact estimates either by using mechanisms to develop comparison groups that are as similar as possible to the treatment group (e.g., propensity score matching) or by using econometric approaches to control for observed and unobserved omitted variables (e.g., fixed effects models, instrumental variables, ordinary least squares regression analysis, and regression discontinuity designs). Unfortunately, all the nonexperimental approaches require strong assumptions to assure that unbiased estimates are obtained, and these assumptions are not always testable.

Burtless (1995) describes four reasons why experimental designs are preferable to nonexperimental designs. First, random assignment assures the direction of causality. If earnings rise for the treatment group in a training program more than they do for the control group, there is no logical source of the increase other than the program. If a comparison group of individuals who chose not to enroll is used, the causality is not clear – those who enroll may be more interested in working, and it may be that motivation, rather than the treatment, that leads to the earnings gain. Burtless's second argument is related to the first – random assignment assures that there is no selection bias in the evaluation, where selection bias is defined as a likelihood that individuals with particular unobserved characteristics may be more or less likely to participate in the program. Footnote 5 The most common example of potential selection bias is that years of educational attainment are likely to be determined in part by ability, but ability is usually either not available to the evaluator or available only with measurement error.

The third argument raised by Burtless in favor of social experiments is that social experiments permit tests of interventions that do not naturally occur. Although social experiments do permit evaluations of such interventions, pilot projects and demonstrations can also be implemented without a randomly selected control group. Finally, Burtless notes that evaluations using random assignment provide findings that are more persuasive to policy makers than evaluations using nonexperimental methods. One of the best features of using random assignment is that program impacts can be observed by simply subtracting the post-program control group values from the values for the treatment group – there is no need to have faith that a fancy instrumental variables approach or a propensity score matching scheme has adequately controlled for all unobserved variables. Footnote 6 For researchers, experiments also assure that the estimates are unbiased and more precise than alternative approaches.
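
As a minimal sketch of the simple comparison described above (subtracting the control group mean from the treatment group mean), the following Python snippet computes the experimental impact estimate and a conventional standard error; the earnings figures are invented for illustration.

    import math
    import statistics

    treatment_earnings = [12500, 9800, 14200, 11000, 13300, 10400]  # hypothetical outcomes
    control_earnings = [11100, 9500, 12800, 10200, 11900, 9900]     # hypothetical outcomes

    # With random assignment, the impact estimate is the simple difference in means.
    impact = statistics.mean(treatment_earnings) - statistics.mean(control_earnings)

    # Conventional (unequal-variance) standard error of the difference in means.
    se = math.sqrt(
        statistics.variance(treatment_earnings) / len(treatment_earnings)
        + statistics.variance(control_earnings) / len(control_earnings)
    )

    print(f"estimated impact: {impact:.0f}, standard error: {se:.0f}")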

4 Can nonexperimental methods replicate experimental findings?

The jury is still out on this issue, and in recent years there has been a great deal of research and spirited debate about how well nonexperimental methods do at replicating experimental findings, given the data that are available. There is no question that there have been important developments in nonexperimental methods in recent years, but the question remains as to how well the methods do in replicating experimental findings and how the replication depends on the particular methods used and data available. Major contributions in recent years include the work of Heckman et  al. ( 1997 ) on propensity score matching and Hahn et  al. ( 2001 ) on regression discontinuity designs. Footnote 7 In this section several recent studies that have found a good match between nonexperimental methods and experimental findings are first reviewed, followed by a review of studies that were unable to replicate experimental findings. The section concludes with suggestions from the literature on conditions where nonexperimental approaches are most likely to replicate experimental findings.

Propensity score matching has been widely used in recent years when random assignment is not feasible. Heckman et  al. ( 1997 ) tested a variety of propensity score matching approaches to see what approaches best mirror the experimental findings from the evaluation of the Job Training Partnership Act (JTPA) in the United States. The authors conclude that: “We determine that a regression-adjusted semiparametric conditional difference in differences matching estimator often performs the best among a class of estimators we examine, especially when omitted time-invariant characteristics are a source of bias.” The authors caution, however: “As is true of any empirical study, our findings may not generalize beyond our data.” They go on to state: “Thus, it is likely that the insights gained from our study of the JTPA programme on the effectiveness of different estimators also apply in evaluating other training programmes targeted toward disadvantaged workers.”

Another effort to see how well propensity score matching replicates experimental findings is in Dehejia and Wahba ( 2002 ). These authors are also optimistic about the capability of propensity score matching to replicate experimental impact estimates: “This paper has presented a propensity score-matching method that is able to yield accurate estimates of the treatment effect in nonexperimental settings in which the treated group differs substantially from the pool of potential comparison units.” Dehejia and Wahba ( 2002 ) use propensity score matching in trying to replicate the findings from the National Supported Work demonstration. Although the authors find that propensity score matching works well in the instance they examined, they caution that the approach critically depends on selection being based on observable variables and note that the approach may not work well when important explanatory variables are missing.

Cook et  al. ( 2008 ) provide a third example of finding that nonexperimental approaches do a satisfactory job of replicating experimental findings under some circumstances. The authors looked at the studies by the type of nonexperimental approach that was used. The three studies that used a regression discontinuity design were all found to replicate the findings from the experiment. Footnote 8 They note that although regression discontinuity designs are much less efficient than experiments, as shown by Goldberger ( 1972 ), the studies they reviewed had large samples so impacts remained statistically significant. The authors find that propensity score matching works well in replicating experimental findings when key covariates are included in the propensity score modeling and where the comparison pool members come from the same geographic area as the treatment group, and they also find that propensity score matching works well when clear rules for selection into the treatment group are used and the variables that are used in selection are available for the analysis. Finally, in studies where propensity score matching was used but the covariates available did not correspond well to the selection rules and/or there was a poor geographic match, the nonexperimental results did not consistently match the experimental findings.

In another recent study, Shadish et al. (2008) conducted an intriguing experiment in which individuals were first randomly assigned to one of two groups: those in the first group were then randomly assigned to one of two treatment options (mathematics or vocabulary training), while those in the second group self-selected one of the same two options. The authors found that propensity score matching greatly reduced the bias of impact estimates when the full set of available covariates was used, including pretests, but did poorly when only predictors of convenience (sex, age, marital status, and ethnicity) were used. Thus, their findings correspond with the findings of Cook et al. (2008).

Smith and Todd ( 2005a ) reanalyzed the National Supported Work data used by Dehejia and Wahba ( 2002 ). They find that the estimated impacts are highly sensitive to the particular subset of the data analyzed and the variables used in the analysis. Of the various analytical strategies employed, Smith and Todd ( 2005a ) find that difference in difference matching estimators perform the best. Like many other researchers, Smith and Todd ( 2005a ) find that variations in the matching procedure (e.g., number of individuals matched, use of calipers, local linear regressions) generally do not have a large effect on the estimated impacts. Although they conclude that propensity score matching can be a useful approach for nonexperimental evaluations, they believe that it is not a panacea and that there is no single best approach to propensity score matching that should be used. Footnote 9

Wilde and Hollister ( 2007 ) used data from an experimental evaluation of a class size reduction effort in Tennessee (Project STAR) to assess how well propensity score matching replicates the experimental impact estimates. They accomplished this by treating each school as a separate experiment and pooling the control groups from other schools in the study and then using propensity score matching to identify the best match for the treatment group in each school. The authors state that: “Our conclusion is that propensity score estimators do not perform very well, when judged by standards of how close they are to the ‘true’ impacts estimated from experimental estimators based on a random assignment design.” Footnote 10

Bloom et  al. ( 2002 ) make use of an experiment designed to assess the effects of mandatory welfare to work programs in six states to compare a series of comparison groups and estimation strategies to see if popular nonexperimental methods do a reasonable job of approximating the impact estimates obtained from the experimental design. Nonexperimental estimation strategies tested include several propensity score matching strategies, ordinary least squares regression analysis, fixed effect models, and random growth models. The authors conclude that none of the approaches tried do a good job of reproducing the experimental findings and that more sophisticated approaches are sometimes worse than simple approaches such as ordinary least squares.

Overall, the weight of the evidence appears to indicate that nonexperimental approaches generally do not do a good job of replicating experimental estimates and that the most common problem is the lack of suitable data to control for key differences between the treatment group and comparison group. The most promising nonexperimental approach appears to be the regression discontinuity design, but this approach requires a much larger sample size to obtain the same amount of precision as an experiment. Footnote 11 The studies identify a number of factors that generally improve the performance of propensity score matching:

It is important to only include observations in the region of common support, where the probabilities of participating are nonzero for both treatment group members and comparison group members.

Data for the treatment and comparison groups should be drawn from the same data source, or the same questions should be asked of both groups.

Comparison group members should be drawn from the same geographic area as the treatment group.

It is important to understand and statistically control for the variables used to select people into the treatment group and to control for variables correlated with the outcomes of interest.

Difference in difference estimators appear to produce less bias than cross section matching in several of the studies, but it is not clear that this is always the case.
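
To make the matching conditions listed above concrete, here is a minimal, hypothetical propensity score matching sketch in Python, using a logistic regression for the propensity score, a common-support restriction, and nearest-neighbor matching with replacement. The file name, column names, and covariates are assumptions for illustration only; none of the studies reviewed here used exactly this estimator.

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    df = pd.read_csv("evaluation_data.csv")               # hypothetical data set
    covariates = ["age", "education", "prior_earnings"]   # hypothetical covariates

    # 1. Estimate propensity scores with a logistic regression.
    model = LogisticRegression(max_iter=1000)
    model.fit(df[covariates], df["treated"])
    df["pscore"] = model.predict_proba(df[covariates])[:, 1]

    # 2. Restrict both groups to the region of common support.
    treated = df[df["treated"] == 1]
    comparison = df[df["treated"] == 0]
    low = max(treated["pscore"].min(), comparison["pscore"].min())
    high = min(treated["pscore"].max(), comparison["pscore"].max())
    treated = treated[treated["pscore"].between(low, high)]
    comparison = comparison[comparison["pscore"].between(low, high)]

    # 3. Match each treated unit to the comparison unit with the closest
    #    propensity score (nearest neighbor, with replacement).
    matched_outcomes = []
    for _, row in treated.iterrows():
        idx = (comparison["pscore"] - row["pscore"]).abs().idxmin()
        matched_outcomes.append(comparison.loc[idx, "earnings"])

    # 4. Matching estimate of the treatment effect on the treated.
    att = treated["earnings"].mean() - np.mean(matched_outcomes)
    print(f"estimated impact (ATT): {att:.0f}")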

5 What we cannot learn from social experiments

Although experiments provide the best means of obtaining unbiased estimates of program impacts, there are some important limitations that must be kept in mind in designing experiments and interpreting the findings. This section describes some of the limitations that are typically inherent to experiments as well as problems that sometimes arise in experiments.

Although a well designed experiment can eliminate internal validity problems, there are often issues regarding external validity, the applicability of the findings in other situations. External validity for the eligible population is threatened if either the participating sites or the individuals volunteer for the program rather than being randomly selected. If the sites included in the experiment volunteered rather than were randomly selected, the impact findings may not be applicable to other sites. It is possible that the sites that volunteer are more effective sites, as less capable sites may want to avoid having their poor performance known to the world. In some of the welfare to work experiments conducted in the United States, random assignment was conducted among welfare recipients who volunteered to participate in the new program. The fact that the experiment was limited to welfare recipients who volunteered would not harm the internal validity of the evaluation, but the results might not apply to individuals who did not volunteer. If consideration is being given to making the intervention mandatory, then learning the effects of the program for volunteers does not identify the parameter of interest unless the program has the same impact on all participants. Although there is no way to assure external validity, exploratory analyses examining whether impacts are consistent across sites and subgroups can suggest (but not prove) whether there is a problem.

Experiments typically randomly assign people to the treatment or control group after they have applied for or enrolled in the program. Thus, experiments typically do not pick up any effects the intervention might have that encourage or discourage participation. For example, if a very generous training option is added to a welfare program, more people might sign up for the program. These types of effects, referred to as entry effects, can be an important aspect of a program's effects. Because experiments are likely not to measure these effects, nonexperimental methods must be used to estimate the entry effects. Footnote 12

Another issue that is difficult to deal with in the context of experiments is the finite time horizon that typically accompanies an experiment. If the experiment is offered on a temporary basis and potential participants are aware of the finite period of the experiment, their behavior may be quite different from what would occur if the program were permanent. Consider a health insurance experiment, for example. If members of the treatment group have more generous coverage during the experiment than they will have after the experiment, they are more likely to increase their spending on health care for services that might otherwise be postponed. The experiment will provide estimates of the impact of a temporary policy, but what is needed for policy purposes is the impact of a permanent program. This issue can be dealt with in several ways. One approach would be to run the experiment for a long time so that the treatment group's response would be similar to what would occur for a permanent program; this would usually not be feasible due to cost issues. Another approach would be to enroll members of the treatment group for a  varying number of years and then try to estimate how the response varies with time in the experiment. Finally, one could enroll the participants in a “permanent” program and then buy them out after the data for the evaluation has been gathered.

Another area where experiments may provide only limited information is on general equilibrium effects. For example, a labor market intervention can have effects not captured in a typical evaluation. Examples include potential displacement of other workers by those who receive training, wage increases for the control group due to movement of those trained into a different labor market, and negative wage effects for occupations if the number of people trained is large. Another example is the “herd immunity” observed in immunization programs; the benefits of an immunization program eventually extend to those who are not immunized, because their probability of contracting the disease diminishes as the number of people in the community who are immunized increases. Not only do small scale experiments fail to measure these effects, even the evaluation of a large scale program might miss them. Footnote 13

With human subjects, it is not always a simple matter to assure that individuals in the treatment group obtain the treatment and those in the control group do not receive the treatment. In addition, being in the control group in the experiment may provide benefits that would not have been received had there been no experiment. These three cases are described below.

One factor that differentiates social experiments from agricultural experiments is that often some of those assigned to the treatment group do not receive the treatment. So-called no-shows are frequently found in program evaluations, including experiments. It is essential that no-shows be included in the treatment group to preserve the equality of the treatment and control groups. Unfortunately, the experimental impact estimates produced when there are no-shows provide the impact of an offer of the treatment, not the impact of the treatment itself. A policy maker who is trying to decide whether to continue a training program is not interested in the impact of an offer for training – the program only incurs costs for those who enroll, so the policy maker wants to know the impact for those who participate.

Bloom (1984) has shown that if one is willing to assume that the treatment has no impact on no-shows, the experimental impact estimator can be adjusted to provide an estimate of the impact on the treated. The overall impact of the program, \(I\), is a weighted average of the impact on those who receive the treatment, \(I_{\text{P}}\), and the impact on those who do not receive the treatment, \(I_{\text{NP}}\):

\[ I = p\,I_{\text{P}} + (1 - p)\,I_{\text{NP}}, \]

where \(p\) is the fraction of the treatment group that receives the treatment. If the impact on those who do not receive the treatment is zero, then \(I_{\text{NP}} = 0\) and \(I_{\text{P}} = I/p\); in other words, the impact of the program on those who receive the treatment is estimated by dividing the impact on the overall treatment group (including no-shows) by the proportion who actually receive the treatment.
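
A tiny numeric illustration of this adjustment in Python, with invented figures:

    # Hypothetical figures only, illustrating the no-show adjustment described above.
    impact_per_assignee = 650.0  # estimated impact for the full treatment group, including no-shows
    share_treated = 0.65         # fraction of the treatment group that actually received the treatment

    # Assuming no-shows experience zero impact (I_NP = 0), the impact on the treated is I / p.
    impact_on_treated = impact_per_assignee / share_treated
    print(impact_on_treated)  # 1000.0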

Individuals assigned to the control group who somehow receive the treatment are referred to as “crossovers.” Orr (1999) observes that some analysts assign the crossovers to the treatment group or leave them out of the analysis, but either of these strategies is likely to destroy the similarity of the treatment and control groups. He further observes that if we are willing to assume that the program is equally effective for the crossovers and the “crossover-like” individuals in the treatment group, then the impact on the crossover-like individuals is zero and the overall impact of the program can be expressed as a weighted average of the impact on the crossover-like individuals and the impact on other individuals:

\[ I = c\,I_{\text{c}} + (1 - c)\,I_{\text{o}}, \]

where \(I_{\text{c}}\) is the impact on crossover-like participants, \(I_{\text{o}}\) is the impact on others, and \(c\) is the proportion of the control group that crossed over; assuming that \(I_{\text{c}} = 0\), we can then compute the impact on those who do not cross over as \(I_{\text{o}} = I/(1 - c)\). If the crossovers receive a similar but not identical treatment, then the impact on the crossover-like individuals may well not be zero, and Orr (1999) indicates that the best that can be done is to vary the value of \(I_{\text{c}}\) and obtain a range of estimates. Footnote 14
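
A similarly minimal numeric illustration of the crossover adjustment, again with invented figures:

    # Hypothetical figures only, illustrating the crossover adjustment described above.
    overall_impact = 650.0   # experimental impact estimate
    crossover_rate = 0.20    # proportion of the control group that received the treatment

    # Assuming the impact on crossover-like individuals is zero (I_c = 0),
    # the impact on those who do not cross over is I / (1 - c).
    impact_on_others = overall_impact / (1 - crossover_rate)
    print(impact_on_others)  # 812.5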

Heckman and Smith ( 1995 ) raise a related issue. In some experiments, the control group may receive valuable services in the process of being randomized out that they would not receive if there were no experiment. This may occur because when people are being recruited for the experiment, they receive some services with the goal of increasing their interest. Alternatively, to reduce ethical concerns, those randomized out may receive information about alternative treatments, which they then receive. In either case, the presence of the experiment has altered the services received by the control group and this creates what Heckman and Smith ( 1995 ) refer to as “substitution bias.”

Heckman and Smith ( 1995 ) also discuss the concept of “randomization bias” that can arise because the experiment changes the scale of the intervention. This problem can arise when the program has heterogeneous impacts and as the scale of the program is increased, those with smaller expected impacts are more likely to enroll. Suppose, for example, that at its usual scale a training program has an earnings impact of $1,000 per year. When the experiment is introduced, the number of people accepted into the program increases, so the impact is likely to decline. It is possible, at least in theory, to assess this problem and correct for it by asking programs to indicate which individuals would have been accepted at the original scale and at the experiment scale. Another possible way to avoid this problem is to reduce the operating scale of the program during the experiment so that the size of the treatment and control groups combined is equal to the normal operating size of the program. More practically, randomization bias can be minimized if the proportion randomized out is very small, say 10% or less; this was the strategy employed in the experimental evaluation of the Job Corps in the United States where Schochet ( 2001 ) indicates that only about 7% of those admitted to the program were assigned to the control group. Footnote 15

6 What can go wrong in social experiments?

In addition to the issues described above that frequently arise in social experiments, there are a number of problems that can also arise. Several common problems are described in this section, and the following section provides a case study of one experiment.

For demonstration projects and for new programs, the intervention may change after the program is initiated. In some cases it may take several months for the program to be working at full capacity; those who enroll when the program first opens may not receive the same services as later participants receive. The program might also change because program officials learn that some program components do not work as well in practice as they do in theory, economic conditions change, or the participants differ from what was anticipated. Some types of interventions, such as comprehensive community initiatives, are expected to change over their implementation as new information is gathered. Footnote 16 Although program modifications often improve the intervention, they can complicate the evaluation in several ways. Instead of determining the impact of one known intervention, the impact evaluation may provide estimates that represent an average of two or more different strategies. At worst, policy makers might believe that the impact findings apply to a different intervention than what was evaluated.

Several strategies can be used to avoid or minimize these types of problems. First, it is important to monitor the implementation of the intervention. Even ongoing programs should be subject to implementation studies so that policy makers know what is being evaluated and if it has changed over time. Second, for a new intervention, it is often wise to postpone the impact evaluation until the intervention has achieved a steady state. Finally, if major changes in the intervention occur over the period analyzed, the evaluation can be conducted for two or more separate periods, although this strategy reduces the precision of the impact estimates.

Experiments can vary in their complexity, and this can lead to problems in implementation and the interpretation of findings. In some instances, experiments are complex because we wish to determine an entire “response surface” rather than evaluate a single intervention. Examples in the United States include the RAND health insurance experiment and the negative income tax (welfare reform) experiments (Greenberg and Shroder 2004), where various groups in the experiment were subject to variations in key parameters. For example, in the negative income tax experiments, participants were subject to variation in the maximum benefit and the rate at which benefits were reduced if they earned additional income. If the participants did not understand the concepts involved, particularly the implicit tax rate on earnings, then it would be inappropriate to develop a response surface based on variation in behavior by participants subject to different rules.

Problems in understanding the rules of the intervention can also arise in simpler experiments. For example, the State of Maryland wished to promote good parenting among its welfare recipients and instituted an experiment called the Primary Prevention Initiative (PPI). The treatment group in this experiment was required to assure that the children in the household maintained satisfactory school attendance (80% attendance), and preschool children were required to receive immunizations and physical examinations (Wilson et  al. 1999 ). Parents who failed to meet these criteria were subject to a fine of $25.00 per month. The experiment included an implementation study, and as part of the implementation study, clients were surveyed on their knowledge of the PPI. Wilson et  al. ( 1999 ) report that “only a small minority of clients (under 20%) could correctly identify even the general areas in which PPI had behavioral requirements.” The lack of knowledge was almost as high among those sanctioned as for clients not sanctioned. Not surprisingly, the impact evaluation indicated that the PPI had no effect on the number of children that were immunized, that received a physical exam, or that had satisfactory school attendance. If there had been no data on program knowledge, readers of the impact evaluation might logically have inferred that the incentives were not strong enough rather than that participants did not understand the intervention.

The potential for participants in experiments to not fully understand the rules of the intervention is not trivial. If we obtain zero impacts because participants do not understand the rules, and it would have been possible to educate them, then it is important to identify the reason why no impact was estimated. Thus, whenever there is a reasonable possibility of participants misunderstanding the rules, it is advisable to consider including a survey of intervention knowledge as part of the evaluation.

Finally, in instances where state or local programs are asked to volunteer to participate in the program, there may be a high refusal rate, thus jeopardizing external validity. Sites with low impacts may be reluctant to participate as may sites that are having trouble recruiting adequate participants. Sites may also be reluctant to participate if they believe random assignment is unethical, as was discussed above, or adds a delay in processing applicants.

7 Lessons from the National JTPA Study

This section describes some of the problems that occurred in implementing the National JTPA Study in the United States. The Job Training Partnership Act (JTPA) was the primary workforce program for disadvantaged youth and adults in the United States from 1982 through 1998 when the Workforce Investment Act (WIA) was enacted. The U.S. Department of Labor decided to evaluate JTPA with a classical experiment after a series of impact evaluations of JTPA's predecessor produced such a wide range of estimated impacts that it was impossible to know the impact of the program. Footnote 17 The National JTPA Study used a classical experimental design to estimate the impact of the JTPA program on disadvantaged adults and out-of-school disadvantaged youth. The study began in 1986 and made use of JTPA applicants in 16 sites across the country. The impact evaluation found that the program increased earnings of adult men and women by over $1,300 in 1998 dollars during the second year after training. The study found that the out-of-school youth programs were ineffective, and these findings are not discussed below.

I  focus on the interim report of the National JTPA Study for several reasons. Footnote 18 First, the study was generally well done, and it was cited by Hollister ( 2008 ) as one of the best social experiments that was conducted. The problems that I  review below are not technical flaws in the study design or implementation, but program features that precluded analyzing the hypotheses of most interest and, in my view, approaches to presenting the findings that may have led policy makers to misinterpret the findings. I  focus on the interim report rather than the final report because many of the presentation issues that I  discuss were not repeated in the final report. Footnote 19

7.1 Nonrandom site selection

The study design originally called for 16 to 20 local sites to be selected at random. Sites were offered modest payments to compensate for extra costs incurred and to pay for inconvenience experienced. The experiment took place when the economy was relatively strong, and many local programs (called service delivery areas or SDAs) were having difficulty spending all their funding. Because participating sites were required to recruit 50% more potential participants to construct a control group one-half the size of the treatment group, many sites were reluctant to participate in the experiment. In the end, the project enrolled all 16 sites identified that were willing and able to participate. All evaluations, including experiments, run the risk of failing to have external validity, but the fact that most local sites refused to participate raised suspicion that the sites selected did not constitute a representative sample of sites. The National JTPA Study report does note that no large cities are included in the participating sample of 16 SDAs (by design), but the report's overall conclusion is more optimistic: “The most basic conclusion … is that the study sites and the 17,026 members of the 18-month study sample resemble SDAs and their participants nationally and also include much of their diversity” (Bloom et  al. 1993 , p. 73).

Although the external validity of the National JTPA Study has been subject to a great deal of debate among analysts, there is no way to resolve the issue. Obviously it is best to avoid sites refusing to participate, but that may be easier said than done. Potential strategies to improve participation include larger incentive payments, exemption from performance standards sanctions for the period of participation, Footnote 20 making participation in evaluations mandatory in authorizing legislation, and decreasing the proportion of the applicants assigned to the control group.

7.2 Random assignment by service strategy recommended

Experimental methods can only be used to evaluate hypotheses for which random assignment determined the specific treatment received. In JTPA, the evaluators determined that, prior to the experiment, adults in the 16 sites were assigned to one of three broad categories: (1) occupational classroom training, (2) job search assistance (JSA) or on-the-job training (OJT), and (3) other services. OJT is generally the most expensive service strategy, because the program pays up to one-half of the participant's wages for up to six months, while JSA is the least expensive because it is generally of short duration and is often provided in a group setting. The two were nonetheless grouped together: individuals deemed appropriate for OJT were observed to be virtually job ready, as were those recommended for JSA, and because OJT slots are difficult to obtain, candidates for OJT were often given JSA while waiting for a slot to become available. The “other” category included candidates recommended for services such as basic skills (education), work experience, and other miscellaneous services, but not occupational classroom training or OJT.

The strategy used in the National JTPA Study was to perform random assignment after a prospective participant was given a preliminary assessment and a service strategy was recommended for the person; individuals whom the program elected not to serve were excluded from the experiment. Two-thirds of those recommended for services were assigned to the treatment group, and one-third were excluded from the JTPA program for a period of 18 months. During this embargo period, control group members were permitted to enroll in any workforce activities other than JTPA that they wished.

There are several concerns with the random assignment procedures used in the National JTPA Study. None of these concerns threatens the internal validity of the estimated impacts, but they show how difficult it is to test the most interesting hypotheses when grafting a random assignment experimental design onto an existing program.

Presenting findings primarily per assignee rather than per participant invites misinterpretation. This issue relates more to presentation than analysis. A reader of the full report can find detailed information about what the findings mean, but the executive summary stresses impact estimates per assignee, so casual readers may not learn the impact per person who actually enrolls in the program. Footnote 21 There are often large differences between the impact per assignee and the impact per enrollee because for some analyses the percentage of assignees who actually enrolled in the program is much less than 100%. For adult women, for example, less than half (48.6%) of those assigned to classroom training actually received classroom training; for men, the figure was even lower (40.1%). Assignees who did not receive the recommended treatment strategy sometimes received other strategies, and the report notes that impacts per enrollee “were about 60 percent to 70 percent larger than impacts per assignee, depending on the target group” (Bloom et  al. 1993 , p. xxxv). Policy makers generally think about the returns to the people who enroll in the program, as little, if any, money is spent on no-shows. Thus, policy makers want to know the impact per enrollee, and they might assume that the reported estimates are impacts per enrollee rather than impacts per assignee. Footnote 22, Footnote 23
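
The 60 to 70 percent gap quoted above follows from the standard no-show adjustment (Bloom 1984, listed in the references): if no-shows are assumed to experience no impact, the impact per enrollee equals the impact per assignee divided by the enrollment rate. The Python sketch below illustrates the arithmetic; the function name and figures are made up for illustration, not taken from the study's tables.

```python
# Minimal sketch of the no-show adjustment (Bloom 1984): the impact per
# enrollee is the impact per assignee divided by the enrollment rate,
# under the assumption that no-shows experience no impact. The numbers
# below are illustrative, not values from the National JTPA Study.

def impact_per_enrollee(impact_per_assignee: float, enrollment_rate: float) -> float:
    """Scale an intent-to-treat (per-assignee) estimate to a per-enrollee estimate."""
    if not 0 < enrollment_rate <= 1:
        raise ValueError("enrollment rate must be in (0, 1]")
    return impact_per_assignee / enrollment_rate

itt = 800.0        # hypothetical earnings impact per assignee, in dollars
enrollment = 0.60  # hypothetical share of assignees who actually enrolled

print(impact_per_enrollee(itt, enrollment))  # 1333.33..., about 67% larger than the ITT
```

With enrollment rates near 60 percent, per-enrollee impacts come out roughly two-thirds larger than per-assignee impacts, which is consistent with the range the report cites.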

Failure to differentiate between the in-program period and the post-program period can be misleading, particularly for short-term findings. The impact findings are generally presented on a quarterly basis, measured in calendar quarters after random assignment, or for the entire six-quarter follow-up period. For strategies that typically last more than one quarter, the reader can easily misinterpret the impact findings when in-program and post-program impacts are not presented separately. Footnote 24, Footnote 25

The strategy does not allow head-to-head testing of alternative strategies. Because random assignment is performed after a treatment strategy is recommended, the only experimental estimates that can be obtained are for a particular treatment versus control status. Thus, if, say, OJT has a higher experimental impact than classroom training, the experiment tells us nothing about what the impact of OJT would be for those assigned to classroom training. The only way to experimentally test this would be to randomly assign participants to treatment strategies. In the case of the JTPA, this would mean sometimes assigning people to a strategy that the SDA staff believed was inappropriate.

The strategy does not provide the impact of receiving a particular type of treatment – it only provides the impact of being assigned to a particular treatment stream. If all JTPA participants had received the activities to which they were initially assigned, this point would not matter, but this was not the case. Among the adult women and men who received services, slightly over one-half of those assigned to occupational classroom training received this service (58 and 56%, respectively). Footnote 26 Of those who did not receive occupational classroom training, about one-half did not enroll at all, and the remainder received other services. The figures are similar for the OJT-JSA group, except that over 40% never enrolled. The “other services” group received a variety of services, with no single type of service dominating. There is, of course, no way to analyze actual services received using experimental methods, but the fact that a relatively large proportion of individuals received services other than those recommended makes interpretation of the findings difficult.

The OJT-JSA strategy assignee group includes those receiving the most expensive services and those receiving the least expensive services, so the impact estimates are not particularly useful. The proportions receiving JSA and OJT are roughly equal, but by estimating the impact for these two service strategies combined, policy and program officials cannot determine whether one of the two strategies or both are providing the benefits. It is impossible to disentangle the effects of these two very different strategies using experimental methods. In a future experiment this problem could be avoided by establishing narrower service strategies, e.g., making OJT and JSA separate strategies.

Control group members were barred from receiving JTPA services, but many received comparable services from other sources, making the results difficult to interpret. The National JTPA Study states that impact estimates of the JTPA program are relative to whatever non-JTPA services the control group received. Because both the treatment group and the control group were motivated to receive workforce services, it is perhaps not surprising that for many of the analyses the control group received substantial services. For example, among the men recommended to receive occupational classroom training, 40.1% of the treatment group received such training, but so did 24.2% of the control group. For women, 48.6% of the treatment group received occupational classroom training, as did 28.7% of the control group. Thus, to some extent, the estimated impacts capture not the effect of training versus no training, but the effect of JTPA services relative to whatever similar services the control group obtained elsewhere.
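
Analysts sometimes summarize this substitution problem with a Wald-style rescaling that divides the per-assignee impact by the treatment-control difference in service receipt. This is a nonexperimental adjustment resting on further assumptions (for example, that JTPA training and the substitute training obtained by controls have similar effects), and it is not something the interim report itself presents; the sketch below simply illustrates the calculation using the receipt rates for men quoted above and a hypothetical impact figure.

```python
# Hedged sketch: when control group members obtain similar services elsewhere,
# the experimental contrast reflects only the *difference* in service receipt.
# A Wald/IV-style rescaling divides the per-assignee impact by that difference
# in receipt rates; it requires assumptions beyond the experiment itself.
# Receipt rates are those quoted in the text; the impact figure is hypothetical.

itt_impact = 500.0         # hypothetical per-assignee earnings impact, in dollars
receipt_treatment = 0.401  # share of treatment-group men receiving classroom training
receipt_control = 0.242    # share of control-group men receiving such training

effect_of_training_received = itt_impact / (receipt_treatment - receipt_control)
print(round(effect_of_training_received))  # ~3145: implied effect of training receipt itself
```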

The point is not that the National JTPA Study was seriously flawed; on the contrary, Hollister ( 2008 ) is correct to identify this study as one of the better social experiments conducted in recent years. Rather, the two key lessons to be drawn from the study are as follows:

It is important to present impact estimates so that they answer the questions of primary interest to policy makers. This means clearly separating in-program and post-program impact findings and giving impacts per enrollee more prominence than impacts per assignee. Footnote 27

Some of the most important evaluation questions may be answered only through nonexperimental methods. Although experimental estimates are preferred when they are feasible, nonexperimental methods should be used when they are not. The U.S. Department of Labor has sometimes shied away from having researchers use nonexperimental methods in conjunction with experiments. When experimental methods cannot answer all the questions of interest, nonexperimental methods should be tried, with care taken to describe all assumptions and to conduct sensitivity analyses.

8 Conclusions

This paper has addressed the strengths and weaknesses of social experiments. There is no doubt that experiments offer some advantages over nonexperimental evaluation approaches. Major advantages include the fact that experiments avoid the need to make strong assumptions about potential explanatory variables that are unavailable for analysis and the fact that experimental findings are much easier to explain to skeptical policy makers. Although there is a growing literature testing how well nonexperimental methods replicate experimental impact estimates, there is no consensus on the extent to which positive findings can be generalized.

But experiments are not without problems. The key point of this paper is that any impact evaluation, experimental or nonexperimental, can have serious limitations. First, there are some questions that experiments generally cannot answer. For example, experiments frequently have “no-shows,” members of the treatment group who do not participate in the intervention after random assignment, and “crossovers,” members of the control group who nonetheless receive the treatment or something other than what was intended for the control group. Experiments are also often poorly suited to capturing entry effects and general equilibrium effects.

In addition, in implementing experimental designs, things can go wrong. Examples include problems with participants understanding the intervention and difficulties in testing the hypotheses of most interest. These points were illustrated by showing how the National JTPA Study, which included random assignment to treatment status and is considered by many as an example of a well conducted experiment, failed to answer many of the questions of interest to policy makers.

Thus, social experiments have many advantages, and one should always give careful thought to using random assignment to evaluate interventions of interest. It should be recognized, however, that simply conducting an experiment is not sufficient to assure that important policy questions are answered correctly. In short, an experiment is not a substitute for thinking.

Executive summary

It is widely agreed that randomized controlled trials – social experiments – are the gold standard for evaluating social programs. There are, however, important issues that cannot be tested using experiments, and often things go wrong when conducting experiments. This paper explores these issues and offers suggestions on dealing with commonly encountered problems. There are several reasons why experiments are preferable to nonexperimental evaluations. Because it is impossible to observe the same person in two states of the world at the same time, we must rely on some alternative approach to estimate what would have happened to participants had they not been in the program.

Nonexperimental evaluation approaches seek to provide unbiased and consistent impact estimates, either by developing comparison groups that are as similar as possible to the treatment group (propensity score matching) or by using approaches to control for observed and unobserved variables (e.g., fixed effects models, instrumental variables, ordinary least squares regression analysis, and regression discontinuity designs). Unfortunately, all the nonexperimental approaches require strong assumptions to assure that unbiased estimates are obtained, and these assumptions are not always testable. Overall, the evidence indicates that nonexperimental approaches generally do not do a good job of replicating experimental estimates and that the most common problem is the lack of suitable data to control for key differences between the treatment group and comparison group. The most promising nonexperimental approach appears to be the regression discontinuity design, but this approach requires a much larger sample size to obtain the same amount of precision as an experiment.

Although a well designed experiment can eliminate internal validity problems, there are often issues regarding external validity. External validity for the eligible population is threatened if either the participating sites or the individuals volunteer for the program rather than being randomly selected. Experiments typically randomly assign people to the treatment or control group after they have applied for or enrolled in the program. Thus, experiments typically do not pick up any effects the intervention might have in encouraging or discouraging participation. Another issue is the finite time horizon that typically accompanies an experiment; if the experiment is offered on a temporary basis and potential participants are aware of the finite period, their behavior may differ from what it would be if the program were permanent. Experiments frequently have no-shows and crossovers, and these phenomena can only be addressed by resorting to nonexperimental methods. Finally, experiments generally cannot capture scale or general equilibrium effects.

Several things can go wrong in implementing an experiment. First, the intervention itself may change while the experiment is under way, either because the original design was not working or because circumstances have changed; this is a common occurrence. The intervention should be carefully monitored to detect such changes, and the evaluation modified if they occur. Another potential problem is that participants may not understand the intervention; to guard against this, their knowledge should be tested and instruction provided where needed.

Many of the problems described here occurred in the random assignment evaluation of the Job Training Partnership Act in the United States. Although the intent was to include a random sample of local programs, most local programs refused to participate, raising questions of external validity. Random assignment in the study occurred after an appropriate service strategy was selected for each applicant. This assured that each strategy could be compared with exclusion from the program, but the alternative strategies could not be compared with each other. Crossover and no-show rates were high in the study, and it is likely that many policy officials did not interpret the impact findings correctly. For example, 40% of the men recommended for classroom training received that treatment, as did 24% of the men in the control group. Thus, the difference in outcomes between the treatment and control groups is very different from the impact of receiving training versus not receiving training. Another feature that makes interpretation difficult is that one service strategy included both those who received the most expensive strategy, on-the-job training, and those who received the least expensive strategy, job search assistance; this makes it impossible to differentiate the impacts of these disparate strategies. Finally, the interim report made it difficult for the reader to separate post-program impacts from in-program impacts, and much more attention was paid to the impact for the entire treatment group than to the nonexperimentally estimated impact on the treated. It is likely that policy makers failed to understand the subtle but important differences here.

There is no doubt that experiments offer many advantages over nonexperimental evaluations. However, many problems can and do arise, and an experiment is not a substitute for thinking.

There are a number of factors that help determine the units used for random assignment. Assignment at the individual level generates the most observations, and hence the most precision, but in many settings it is not practical to conduct random assignment at the individual level. For example, in an educational setting, it is generally not feasible to assign students in the same classroom to different treatments. The most important problem resulting from random assignment at a more aggregated level is that there are fewer observations, leading to a greater probability that the treatment and control groups are not well matched and the potential for imprecise estimates of the treatment effect.
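
A rough way to quantify the precision cost of aggregated assignment is the standard clustering design effect, 1 + (m − 1)ρ, where m is the number of individuals per randomized unit and ρ is the intracluster correlation of the outcome. The Python sketch below is illustrative only; the sample size, cluster size, and correlation are assumptions rather than values from any study discussed here.

```python
# Illustrative sketch: with m individuals per randomized cluster and
# intracluster correlation rho, the variance of the impact estimate is
# inflated by the design effect 1 + (m - 1) * rho, so the effective number
# of independent observations shrinks accordingly. All inputs are assumed.

def effective_sample_size(n_individuals: int, cluster_size: int, rho: float) -> float:
    design_effect = 1 + (cluster_size - 1) * rho
    return n_individuals / design_effect

print(round(effective_sample_size(n_individuals=2000, cluster_size=25, rho=0.10)))
# -> 588: 2000 students randomized in classrooms of 25 with rho = 0.10 carry
#    roughly the precision of 588 individually randomized students.
```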

It is important to distinguish between a known null treatment and a broader “whatever they would normally get” control treatment. As discussed below, the latter situation often makes it difficult to know what comparison is being made and how the estimated impacts should be interpreted.

Orr ( 1999 ) notes that by including a variety of treatment doses, we can learn more than the effect of a single dose level on participants; instead, we can estimate a behavioral response function that provides information on how the impact varies with the dosage. Heckman ( 2008 ) provides a broader look at the concept of economic causality.

There are many views on how seriously Hawthorne effects distort impact estimates, both in the original illumination studies at the Hawthorne works in the 1930s and in other contexts.

See Barnow et  al. ( 1980 ) for a discussion of selection bias and a summary of approaches to deal with the problem.

As discussed more in the sections below, many circumstances can arise that make experimental findings difficult to interpret.

Propensity score matching is a two-step procedure where in the first stage the probability of participating in the program is estimated, and, in the simplest approach, in the second stage the comparison group is selected by matching each member of the treatment group with the nonparticipating person with the closest propensity score; there are numerous variations involving techniques such as multiple matches, weighting, and calipers. Regression discontinuity designs involve selection mechanisms where treatment/control status is determined by a screening variable.
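
The two-step procedure described in this footnote can be sketched in a few lines of Python. The example below is a minimal nearest-neighbor variant on simulated data; the covariates, outcome, and effect size are invented for illustration, and the refinements mentioned above (multiple matches, weighting, calipers) are omitted.

```python
# Minimal sketch of two-step propensity score matching: (1) estimate each
# unit's probability of participation from observed covariates, (2) match
# each participant to the nonparticipant with the closest propensity score.
# Data are simulated; this is not drawn from any of the studies cited.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))                              # observed covariates
treated = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))    # participation depends on X
y = X @ np.array([1.0, 0.5, 0.0]) + 2.0 * treated + rng.normal(size=n)  # true effect = 2

# Step 1: estimate propensity scores.
pscore = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: match each treated unit to the comparison unit with the closest score.
comparison_idx = np.where(treated == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(pscore[comparison_idx].reshape(-1, 1))
_, matches = nn.kneighbors(pscore[treated == 1].reshape(-1, 1))
matched_controls = comparison_idx[matches.ravel()]

# Estimated impact on the treated: mean outcome gap between treated units
# and their matched comparisons (should be near the true effect of 2).
att = y[treated == 1].mean() - y[matched_controls].mean()
print(round(att, 2))
```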

It is important to keep in mind that regression discontinuity designs provide estimates of impact near the discontinuity, but experiments provide estimates over a broader range of the population.

See also the reply by Dehejia ( 2005 ) and the rejoinder by Smith and Todd ( 2005b ).

The paper by Wilde and Hollister ( 2007 ) is one of the papers reviewed by Cook et  al. ( 2008 ), who claim that because Wilde and Hollister control for too few covariates and draw their comparison group from areas other than those where the treatment group resides, the paper does not offer a good test of propensity score matching.

Schochet ( 2009 ) shows that a regression discontinuity design typically requires a sample three to four times as large as an experimental design to achieve the same level of statistical precision.
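
A stylized calculation, not Schochet's own derivation, shows where a factor in this range can come from. In a linear model the variance of the impact estimate is inflated by roughly 1/(1 − ρ²), where ρ is the correlation between treatment status and the assignment score; with a normally distributed score and a cutoff at its mean, this yields the factor of about 2.75 derived by Goldberger ( 1972 ), and more flexible specifications push the factor higher.

```python
# Stylized illustration, under a linear-model assumption: treatment in an RD
# design is a deterministic function of the score, so the variance of the
# impact estimate is inflated by about 1 / (1 - rho^2), where rho is the
# correlation between the treatment indicator and the score. For a standard
# normal score with the cutoff at its mean, rho = phi(0) / 0.5, about 0.80.
import math

phi_at_cutoff = 1 / math.sqrt(2 * math.pi)     # standard normal density at 0
rho = phi_at_cutoff / math.sqrt(0.25 * 1.0)    # corr(treatment indicator, score)
design_effect = 1 / (1 - rho ** 2)

print(round(rho, 3), round(design_effect, 2))  # 0.798 2.75
```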

See Moffitt ( 1992 ) for a review of the topic and Card and Robins ( 2005 ) for a recent evaluation of entry effects.

See Lise et  al. ( 2005 ) for further discussion of these issues.

See Heckman et  al. ( 2000 ) for discussion of this issue and estimates for JTPA. The authors find that JTPA provides only a small increase in the opportunity to receive training and that both JTPA and its substitutes increase earnings for participants; thus, focusing only on the experimental estimates of training impacts can lead to a large underestimate of the impact of training on earnings.

The Job Corps evaluation was able to deny services to a small proportion of applicants by including all eligible Job Corps applicants in the study, with only a relatively small proportion of the treatment group interviewed. The reason that this type of design has not been more actively used is that if there is a substantial fixed cost per site included in the experiment, including all sites generates large costs and for a fixed budget results in a smaller overall sample.

Comprehensive community initiatives are generally complex interventions that include interventions in a number of areas including employment, education, health, and community organization. See Connell and Kubisch ( 1998 ) for a discussion of comprehensive community initiatives and why they are difficult to evaluate.

See Barnow ( 1987 ) for a summary of the diverse findings from the evaluations of the Comprehensive Employment and Training Act (CETA) that were obtained when a number of analysts used diverse nonexperimental methods to evaluate the program.

I  was involved in the National JTPA study as a subcontractor on the component that investigated the possibility of using nonexperimental approaches to determine the impact of the program rather than experimental approaches.

The final report was published as Orr et  al. ( 1996 ).

Although exempting participating sites from performance standards sanctions may increase participation, it also reduces external validity because the participating sites no longer face the same performance incentives.

Some tables in the executive summary (e.g., Exhibit S.2 and Exhibit S.6) only provide the impact per assignee, and significance levels are only provided for estimates of impact per assignee.

A U.S. Department of Labor senior official complained to me that one contractor refused to provide her with impacts per enrollee because they were based on nonexperimental methods and could not, therefore, be believed. She opined that the evaluation had little value for policy decisions if the evaluation could not provide the most important information she needed.

Although I argue that estimates on the eligible population, sometimes referred to as “intent to treat” (ITT) estimates, are prone to misinterpretation, estimating participation rates and the determinants of participation can be valuable: policy officials can learn the extent to which eligible individuals are participating and which groups appear to be underserved. See Heckman and Smith ( 2004 ).

It is, of course, important to capture the impacts for the in-program period so that a cost-benefit analysis can be conducted.

For example, Stanley et  al. ( 1998 ) summarize the impact findings from the National JTPA Study by presenting the earnings impacts in the second year after random assignment, which is virtually all a post-program period.

See Exhibit 3.18 of Bloom et  al. ( 1993 ).

This is not a simple matter when program length varies significantly, as it did in the JTPA program. If participants are followed long enough, however, the later part of the follow-up period should fall almost entirely after program exit.

Angrist, J.D., Krueger, A.B.: Does compulsory attendance affect schooling and earnings? Q.  J. Econ. 106 (4), 979–1014 (1991)

Angrist, J.D., Krueger, A.B.: Instrumental variables and the search for identification: from supply and demand to natural experiments. J.  Econ. Perspect. 15 (4), 69–85 (2001)

Barnow, B.S.: The impacts of CETA programs on earnings: a review of the literature. J.  Hum. Resour. 22 (2), 157–193 (1987)

Barnow, B.S.: The ethics of federal social program evaluation: a response to Jan Blustein. J.  Policy Anal. Manag. 24 (4), 846–848 (2005)

Barnow, B.S., Cain, G.G., Goldberger, A.S.: Issues in the analysis of selection bias. In: Stromsdorfer, E.W., Farkas, G. (eds.) Evaluation Studies Review Annual, vol.  5. Sage Publications, Beverly Hills (1980)

Bloom, H.S.: Accounting for no-shows in experimental evaluation designs. Evaluation Rev. 8 (2), 225–246 (1984)

Bloom, H.S., Orr, L.L., Cave, G., Bell, S.H., Doolittle, F.: The National JTPA Study: Title II-A Impacts on Earnings and Employment at 18 Months. Abt Associates, Bethesda, MD (1993)

Bloom, H.S., Michalopoulos, C., Hill, C.J., Lei, Y.: Can Nonexperimental Comparison Group Methods Match the Findings from a Random Assignment Evaluation of Mandatory Welfare to Work Programs? MDRC, New York (2002)

Blustein, J.: Toward a more public discussion of the ethics of federal social program evaluation. J.  Policy Anal. Manag. 24 (4), 824–846 (2005a)

Blustein, J.: Response. J.  Policy Anal. Manag. 24 (4), 851–852 (2005b)

Burtless, G.: The case for randomized field trials in economic and policy research. J.  Econ. Perspect. 9 (2), 63–84 (1995)

Card, D., Robins, P.K.: How important are “entry effects” in financial incentive programs for welfare recipients? J.  Econometrics 125 (1), 113–139 (2005)

Connell, J.P., Kubisch, A.C.: Applying a theory of change approach to the evaluation of comprehensive community initiatives: progress, prospects, and problems. In: Fulbright-Anderson, K., Kubisch, A.C., Connell, J.P. (eds.) New Approaches to Evaluating Community Initiatives, vol.  2, Theory, Measurement, and Analysis. The Aspen Institute, Washington, DC (1998)

Cook, T.D., Shadish, W.R., Wong, V.C.: Three conditions under which experiments and observational studies produce comparable causal estimates: new findings from within-study comparisons. J.  Policy Anal. Manag. 27 (4), 724–750 (2008)

Dehejia, R.H.: Practical propensity score matching: a reply to Smith and Todd. J.  Econometrics 125 (1), 355–364 (2005)

Dehejia, R.H., Wahba, S.: Propensity score matching methods for nonexperimental causal studies. Rev. Econ. Statistics 84 (1), 151–161 (2002)

Goldberger, A.S.: Selection Bias in Evaluating Treatment Effects: Some Formal Illustrations. Institute for Research on Poverty, Discussion Paper 123–72, University of Wisconsin, Madison, WI (1972)

Greenberg, D.H., Shroder, M.: The Digest of Social Experiments, 3rd  edn. The Urban Institute Press, Washington DC (2004)

Hahn, J., Todd, P.E., Van der Klaauw, W.: Identification and estimation of treatment effects with a regression discontinuity design. Econometrica 69 (1), 201–209 (2001)

Heckman, J.J.: Economic causality. Int. Stat. Rev. 76 (1), 1–27 (2008)

Heckman, J.J., Smith, J.A.: Assessing the case for social experiments. J.  Econ. Perspect. 9 (2), 85–110 (1995)

Heckman, J.J., Smith, J.A.: The determinants of participation in a social program: evidence from a prototypical job training program. J.  Labor Econ. 22 (2), 243–298 (2004)

Heckman, J.J., Ichimura, H., Todd, P.E.: Matching as an econometric evaluation estimator: evidence from evaluating a job training programme. Rev. Econ. Stud. 64 (4), 605–654 (1997)

Heckman, J.J., Hohmann, N., Smith, J., Khoo, M.: Substitution and dropout bias in social experiments: a study of an influential social experiment. Q.  J. Econ. 115 (2), 651–694 (2000)

Hollister, R.G. jr.: The role of random assignment in social policy research: opening statement. J.  Policy Anal. Manag. 27 (2), 402–409 (2008)

Lise, J., Seitz, S., Smith, J.: Equilibrium Policy Experiments and the Evaluation of Social Programs. Unpublished manuscript (2005)

Moffitt, R.: Evaluation methods for program entry effects. In: Manski, C., Garfinkel, I. (eds.) Evaluating Welfare and Training Programs. Harvard University Press, Cambridge, MA (1992)

Orr, L.L.: Social Experiments: Evaluating Public Programs with Experimental Methods. Sage Publications, Thousand Oaks, CA (1999)

Orr, L.L., Bloom, H.S., Bell, S.H., Doolittle, F., Lin, W.: Does Training for the Disadvantaged Work? Evidence from the National JTPA Study. The Urban Institute Press, Washington, DC (1996)

Rolston, H.: To learn or not to learn. J.  Policy Anal. Manag. 24 (4), 848–849 (2005)

Schochet, P.Z.: National Job Corps Study: Methodological Appendixes on the Impact Analysis. Mathematica Policy Research, Princeton, NJ (2001)

Schochet, P.Z.: Comments on Dr. Blustein's paper, toward a more public discussion of the ethics of federal social program evaluation. J.  Policy Anal. Manag. 24 (4), 849–850 (2005)

Schochet, P.Z.: Statistical power for regression discontinuity designs in education evaluations. J.  Educ. Behav. Stat. 34 (2), 238–266 (2009)

Shadish, W.R., Clark, M.H., Steiner, P.M.: Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random and nonrandom assignments. J.  Am. Stat. Assoc. 103 (484), 1334–1343 (2008)

Smith, J.A., Todd, P.E.: Does matching overcome LaLonde's critique of nonexperimental estimators? J.  Econometrics 125 (1), 305–353 (2005a)

Smith, J.A., Todd, P.E.: Rejoinder. J.  Econometrics 125 (1), 365–375 (2005b)

Stanley, M., Katz, L., Krueger, A.: Developing Skills: What We Know about the Impacts of American Employment and Training Programs on Employment, Earnings, and Educational Outcomes. Cambridge, MA, unpublished manuscript (1998)

Wilde, E.T., Hollister, R.: How close is close enough? Evaluating propensity score matching using data from a class size reduction experiment. J.  Policy Anal. Manag. 26 (3), 455–477 (2007)

Wilson, L.A., Stoker, R.P., McGrath, D.: Welfare bureaus as moral tutors: what do clients learn from paternalistic welfare reforms? Soc. Sci. Quart. 80 (3), 473–486 (1999)

Acknowledgements

I  am grateful to Laura Langbein, David Salkever, Peter Schochet, Gesine Stephan, and participants in workshops at George Washington University and the University of Maryland at Baltimore County for comments. I  am particularly indebted to Jeffrey Smith for his thoughtful detailed comments and suggestions. Responsibility for remaining errors is mine.

Author information

Authors and affiliations.

Trachtenberg School of Public Policy and Public Administration, George Washington University, 805 21st St, NW, Washington, DC, 20052, USA

Burt S. Barnow

Corresponding author

Correspondence to Burt S. Barnow .

About this article

Barnow, B.S. Setting up social experiments: the good, the bad, and the ugly. ZAF 43, 91–105 (2010). https://doi.org/10.1007/s12651-010-0042-6

Published : 20 October 2010

Issue Date : November 2010

DOI : https://doi.org/10.1007/s12651-010-0042-6

Reflections on the Ethics of Social Experimentation

Social scientists are increasingly engaging in experimental research projects of importance for public policy in developing areas. While this research holds the possibility of producing major social benefits, it may also involve manipulating populations, often without consent, sometimes with potentially adverse effects, and often in settings with obvious power differentials between researcher and subject. Such research is currently conducted with few clear ethical guidelines. In this paper I discuss research ethics as currently understood in this field, highlighting the limitations of standard procedures and the need for the construction of appropriate ethics, focusing on the problems of determining responsibility for interventions and assessing appropriate forms of consent.

1 Introduction

Social science researchers are increasingly using field experimental methods to try to answer all kinds of questions about political processes and public policies. Unlike traditional “observational” methods, in which you observe the world as it comes to you, the idea right at the heart of the experimental approach is that you learn about the world by seeing how it reacts to interventions. In international development research these interventions can sometimes take the form of researchers from wealthy institutions manipulating citizens from poorer populations to answer questions of little interest to those populations.

These studies raise a host of ethical concerns that social scientists are not well equipped to deal with. US based social science researchers rely on principles such as respect for persons, justice, and beneficence that have been adopted by health researchers and institutionalized through formal review processes but that do not always do the work asked of them by social scientists.

Consider one example in which many of these points of tension come to a head. Say a researcher is contacted by a set of community organizations that want to figure out whether placing street lights in slums will reduce violent crime. In this research the subjects are the criminals: seeking the informed consent of the criminals would likely compromise the research, and it would likely not be forthcoming anyhow (a violation of respect for persons); the criminals will likely bear the costs of the research without benefitting (a violation of justice); and there will be disagreement regarding the benefits of the research – if it is effective, the criminals in particular will not value it (producing a difficulty for assessing beneficence). There is no pretense of neutrality in this research, since assessing the effectiveness of the lamps is taking sides, but despite the absence of neutrality no implicit contract between researchers and subjects is broken. The special issues here are not just around the subjects, however. There are also risks to non-subjects, if, for example, criminals retaliate against the organizations putting the lamps in place. The organizations may be well aware of these risks but be willing to bear them because they erroneously put faith in ill-founded expectations of what researchers from wealthy universities will deliver, researchers who are themselves motivated in part by the desire to publish.

The example raises many issues. It is chosen because, despite the many issues raised, the principles currently employed provide almost no guidance for dealing with them. It is not, however, a particularly unusual case, and many of its features are shared by other projects, including work in spheres such as the reduction of violence against women, efforts to introduce democratic institutions in rural communities, job training programs for ex-combatants, efforts to alter the electoral behaviour of constituents, and efforts to stamp out corruption by politicians [for a discussion of many relevant cases see Baele (2013)]. Unlike classic health and education interventions, these projects routinely involve interventions that have winners and losers, create risks for some, and are done without the consent of all parties affected by them.

The absence of clear principles to handle these issues leaves individuals and the professions in a difficult situation, at least if they care about the ethical implications of their research designs above and beyond whether they receive formal research approval.

So how should researchers proceed in these cases? At present there are no satisfactory answers. To make progress I discuss three sets of problems raised by research designs like this, which I call the problem of audience , the problem of agency , and the problem of consent .

The audience question is about determining what the professional ethical issues are. My focus throughout will be on professional ethics rather than more metaphysical questions of what is right or wrong in some objective sense. Thus in Section 2, I highlight a conceptualization of the problem not as a problem of normative ethics – whether any of these designs are right or wrong in any fundamental sense – but as a question of audience. A key purpose of professional ethics is to clarify, for the groups that matter to a profession's work, what they can expect from its members. For medical ethics the key audience is patients, and particularly subjects: those patients with whom medical professionals engage. The current guidelines used by social scientists are inherited from medical ethics, which places a primary focus on human subjects. While subjects perhaps represent the primary audience for medical interventions, this may not be the case for social science interventions, for which the key audience can be the general public or policy makers. This section highlights the need for the construction of an ethics that addresses the preoccupations of social scientists engaging in this type of research. It also highlights the thornier nature of this problem for interventions in which there are winners and losers, as in the motivating example above.

The agency problem is the problem of determining who is responsible for manipulations. I discuss this in Section 3, describing an argument – which I call the “spheres of ethics” argument – that researchers sometimes employ as grounds for collaborating in partnerships in which subjects are exposed to risks to an extent not normally admissible in the course of research projects. The key idea is that if an intervention is ethical for implementing agencies with respect to the ethical standards of their sphere – which may differ from the ethical standards of researchers – then responsibility may be divided between researchers and implementers, with research ethics standards applied to research components and partner standards applied to manipulations. Put crudely this approach can be considered a way of passing the buck, but in fact the arguments for employing it are much more subtle than that. In a way, the buck-passing interpretation fundamentally misses the point of professional ethics. Even still, this argument is subject to abuse and so this section outlines protections related to agency autonomy and legitimacy which in turn depend on the conceptualization of professional ethics described in Section 2.

The third problem is the critical problem of consent . The bulk of this essay focuses on consent and the role it plays in research ethics. Current norms for informed consent are again inherited from medical ethics and reflect answers in the medical community to the first two questions. Yet alternative conceptualizations of consent are possible, and may be more appropriate for social scientists, given the different answers to questions of audience and agency in social science research. I outline a range of these in Section 4.

I close with reflections on implications for practice and for the development of ethical standards that can address the issues raised by experimental research in social science.

2 Problem 1: Audience

What are we worrying about when we worry about whether implementing experiments like that described above is ethical? It often seems as though we are worrying about whether in some fundamental sense these research activities are right or wrong. But framing the question in that way renders it largely unanswerable. The more practical approach of professional ethics is to determine whether one or another action is more or less consistent with the expectations of a relevant “audience” regarding the behaviour of the members of the profession. [1]

While the response that ethical action is action in line with the expectations of a relevant audience is not technically question begging, it does require the existence of some recognized set of norms for a profession. In practice, social scientists largely work within the ethical framework provided by the human subjects protection system. [2] The system was devised primarily with a view to regulating medical research, but it now covers all research involving human subjects, at least for US based researchers or researchers receiving federal funding.

The principles embedded in the Belmont report [3] and that permeate the work of Institutional Review Boards in the United States self-consciously seek to prescribe a set of common expectations for a community of researchers and their patients and clients. Indeed, sidestepping the question of ethical foundations seems to have been a strategy of the US Commission that produced these reports. [4] The pragmatic approach adopted by the commission is a strength. As argued by Jonsen (1983), medical ethics, as captured by the documents produced by the Commission, is “a Concord in Medical Ethics,” a concord “reached by a responsible group drawn from the profession and from the public.”

But this pragmatic approach also limits the pretensions to universality of research ethics in an obvious way. The principles of the Belmont report were developed to address particular problems confronting the medical profession, and they carry authority because they were developed through a deliberative process that sought to reach consensus in the profession around conventions of behaviour. The result is both elegant, in sidestepping the unanswerable questions, and messy. The final principles are a mixture of deontological and consequentialist principles, with no overarching principle for determining what tradeoffs should be made when interventions that benefit one group harm another. The practical solution is to outsource these determinations to the judgments of individuals placed on university institutional review boards. While this works for some purposes, there is no reason ex ante to expect that the principles developed provide appropriate guidelines for social science. [5]

The poor fit stems in part from the fact that medical research differs from social science research in several ways:

(1) researchers are interested in the behaviour of institutions or groups, whether governmental, private sector, or nongovernmental, and do not require information about individuals (for example, if you want to find out whether a government licensing agency processes applications faster from high-caste applicants than from low-caste applicants);

(2) those most likely to be harmed by an intervention are not the subjects (for example, when researchers are interested in the behaviour of bureaucrats whose decisions affect citizens, or in the behaviour of pivotal voters, which in turn can affect the outcome of elections);

(3) subjects are not potential beneficiaries of the research and may even oppose it (for example, studies of interventions seeking to reduce corruption in which the corrupt bureaucrats are the subjects);

(4) consent processes can compromise the research (for example, studies that seek to measure gender- or race-based discrimination by landlords or employers);

(5) there is disagreement over whether the outcomes are valuable (compare finding a cure for a disease to finding out that patronage politics is an effective electoral strategy); indeed, some social scientific interventions are centered on the distributive implications of interventions: when different outcomes benefit some and hurt others, the desideratum of benefitting all who are implicated by an intervention is unobtainable;

(6) there is no expectation of care between the research subjects and the researcher.

These features can sometimes make the standard procedures used by Institutional Review Boards for approving social science research irrelevant or unworkable.

The first two differences mean that formal reviews, as currently set up, can ignore the full range of benefits and harms of research or do not cover the research at all. Formal reviews focus on human subjects: living individuals about whom investigators obtain data through intervention or interaction or obtain identifiable private information.

The third and fourth differences, which again concern subjects rather than broader populations, can quickly put the principles of justice and respect for persons – two of the core principles elaborated in the Belmont report, upon which standard review processes are based – at odds with research that may seem justifiable on other grounds.

The fifth difference can make the third Belmont principle, beneficence, unworkable, at least in the absence of some formula for comparing the benefits to some against the costs for others (see Baele 2013 on the difficulties of applying beneficence arguments).

The sixth difference means that the stakes are different. If a health researcher fails to provide care for an individual in a control group, this may violate the duty of care and break the public's trust in the profession. The same may not be true for social scientists, however.

Thus, standard considerations inherited from the human subjects protection system can be blind to the salient considerations for social science researchers and their primary audiences. The focus on private data and the protection of subjects may sometimes seem excessive; but the blindness to the risks for non-subjects may be more costly. Specific risks, beyond welfare costs, are that researchers gain a reputation for providing unsound advice to government officials on sensitive issues, encourage the withholding of benefits from the public, interfere with judicial processes, or put vulnerable (non-subject) populations at risk, in order to further research agendas.

Refocussing on the question of audience however can give some guidance here. A preoccupation of medical ethics is the maintenance of relations of trust between medical professionals and patients. In this sense, patients are a key audience for medical ethics. [6] Patients can expect care from medical professionals no matter who they are. But the nature of social science questions puts researchers in different relations with subjects, most obviously when interventions are interventions aimed against subjects. It seems improbable that social scientists can maintain relations of trust with corrupt politicians, human rights abusers, and perpetrators of violence when the interventions they are examining are designed precisely to confront these groups.

What audiences are most critical for social scientists? Subjects are of course a key audience for social scientists also, not least because much data collection depends on the trust, generosity, and goodwill of subjects. But two wider audiences are also critical, and the fashioning of social science research ethics for field experimentation should focus closely on them. The first are research partners and the second are research consumers.

2.1 Partner Matters

As in the example above, much field experimentation can involve partnerships with local governmental or nongovernmental groups. Partnering in experimental research can be very costly for partners, however. And if they do not have a full understanding of the research design, partners can be convinced to do things that are not in their interests, a risk that grows when the interests of partners and researchers diverge. One point of divergence is with respect to statistical power. For a partner, an underpowered study can mean costly investments that result in ambiguous findings. Underpowered studies are in general a problem for researchers too, with the difference that they can still be useful if their findings can be incorporated into meta-analyses. Researchers may also be more willing to accept underpowered studies if they are less risk averse than partners and if they discount the costs of the interventions. Thus, to account for global beneficence, researchers need to establish some form of informed consent with partners. At a minimum this requires establishing that partners really understand the limitations and the costs of an experiment.

One useful practice is to sign a formal Memorandum of Understanding between the researcher and the partner organization at the beginning of a project laying out the roles and responsibilities of both parties. However, even when they exist, these rarely include many of the most important elements that researchers are required to provide to subjects during the informed consent process, such as the potential risks or alternatives to experimentation. These documents could even include discussions of the power of a study to ensure that partners are aware of the probability that their experiment will result in unfavourable findings, even if their program has a positive impact. Having clearer standards for what information should be required before a partner consents to an experiment could facilitate continued positive relationships between researchers and partners.

In addition, concern must be given to how researchers explain technical information to partners. The informed consent process with research subjects defines additional precautions that must be taken to obtain consent from people with limited autonomy. Similarly, there is a burden on researchers to explain the risks and benefits of technical choices to partners in layman’s terms. Alderman et al. (2013) highlight the false expectations that subjects can have when they engage with researchers coming from privileged institutions and the responsibilities that this can produce. A similar logic can be in operation for partner organizations. Sharing (and explaining) statistical power calculations is one way of ensuring understanding. Another is to generate “mock” tables of results in advance so that partners can see exactly what is being tested and how those tests will be interpreted. [7]
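
A concrete way to share a power calculation with a partner is to show both the sample needed to detect a plausible effect and the power the planned sample actually delivers. The Python sketch below uses the statsmodels power routines with purely illustrative inputs (a minimum detectable effect of 0.2 standard deviations, 80% power, 5% two-sided significance); none of the numbers refer to any particular study.

```python
# Illustrative power calculation of the kind worth walking a partner through.
# Effect sizes are in standard-deviation units (Cohen's d); all inputs are
# assumptions chosen for illustration.
import math
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample needed per arm to detect d = 0.2 with 80% power at alpha = 0.05.
n_per_arm = analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.8,
                                 ratio=1.0, alternative='two-sided')
print(math.ceil(n_per_arm))  # about 394 subjects per arm

# Power actually available with only 100 subjects per arm against d = 0.2.
power = analysis.solve_power(effect_size=0.2, nobs1=100, alpha=0.05,
                             ratio=1.0, alternative='two-sided')
print(round(power, 2))       # about 0.29: a likely-ambiguous, underpowered study
```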

A second concern relates to the researchers' independence from partners. The concern is simple: in the social sciences, as in the medical sciences, partnering induces pressures on researchers to produce results that make the partner happy. These concerns relate to the credibility of results, a problem I return to below. The problems are especially obvious when researchers receive remuneration, but they apply more generally and may put the quality of the research at risk. The lack of independence also cuts the other way: if staff in partner organizations depend on researchers for access to expertise or funding, this may generate conflicts of interest for them in agreeing to implement one kind of research or another.

One way that independence can be increased is through separation of funding: when researchers are not remunerated for conducting experimental evaluations, they may be freer to report negative results. Another is to clarify from the outset that researchers have the right to the data and the right to publish the results no matter what the findings are. However, even when these measures are taken, there may be psychological or ideological reasons that researchers might still not be fully independent from partners.

2.2 Users: Quality of Research Findings

Given that field experiments can impose costs on some groups, including subjects, assessing the beneficence of a study is especially tricky. Part of the consideration of beneficence, however, involves an assessment of the quality of the work and the lessons that can be drawn from it. If an argument in favor of a research design is that the lessons from the research produce positive effects, for example by providing answers to normatively important questions, then an assessment of beneficence requires an expectation that the design is capable of generating credible results (Baele 2013). [8] In practice, though researchers sometimes defend research that involves potential risks on the basis of the gains from knowledge, there is rarely any systematic accounting of such gains, and rarely a treatment of how to assess them when there are value disagreements. Moreover researchers, given their interests in the research, are likely the wrong people to make this determination. Nevertheless, any claim based on the value of the findings needs to assume that the findings are credible.

The credibility of research depends on many features. I would like to draw attention to one: the loss in credibility that can arise from weak analytic transparency. Post hoc analysis is still the norm in much of political science and economics. Until recently it has been almost impossible to find a registered design for any experiment in the political economy of development (in the first draft of this paper I pointed to one study; there are now close to 200 pre-registered designs housed on the EGAP registry (109), RIDIE (37), and the AEA registry (49)). When experiments are not pre-registered there may be concerns that results are selected based on their statistical significance or the substantive claims they make, with serious implications for bias (Gerber and Malhotra 2008; Casey et al. 2012).
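
The nature of this bias can be illustrated with a small, purely hypothetical simulation: when many outcomes are tested post hoc and only "significant" results are reported, a large share of null studies will nonetheless appear to produce findings.

# Illustrative simulation: every true effect is zero, but many outcomes are
# tested and only "significant" ones would be reported.
import numpy as np
from scipy import stats
rng = np.random.default_rng(0)
n_per_arm, n_outcomes, n_studies = 200, 10, 1000
studies_with_a_hit = 0
for _ in range(n_studies):
    treated = rng.normal(size=(n_per_arm, n_outcomes))   # no true effect anywhere
    control = rng.normal(size=(n_per_arm, n_outcomes))
    pvals = stats.ttest_ind(treated, control, axis=0).pvalue
    if (pvals < 0.05).any():   # at least one outcome looks "significant"
        studies_with_a_hit += 1
print(f"Share of null studies with a reportable result: {studies_with_a_hit / n_studies:.2f}")
# Roughly 1 - 0.95**10, i.e. about 40 percent, despite there being nothing to find.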

As research of this form increases in prominence, there will be a need to develop principles to address these questions of audience. For this, social scientists might follow the lead of the National Commission that established the principles for health research and seek not to root assessments of what is or is not ethical research in conflicting moral intuitions or in normative theories that may or may not be broadly shared. Instead, in response to the issues raised by field experiments, social scientists could initiate a public process to decide what should constitute expected practice in this field in light of the interests of the audiences specific to their research – notably partners, governments, and the general public. [9]

3 Problem 2: Agency

In the example above of an experiment on street lighting, the intervention was initiated and implemented by a local organization and not by the researchers. Is this fact of ethical relevance for researchers taking part in the experiment?

Currently many social science experiments are implemented in this way by political actors of various kinds, such as a government, an NGO, or a development agency. In these cases, and unlike many medical trials, the research often exists only because of the intervention rather than the other way round. [10] This approach can be contrasted with a "framed field experiment," in which the intervention is established by researchers for the purpose of addressing a research question and done in a way in which participants know that they are part of a research experiment. [11] In practice, of course, the distinction between these two types of experiment is often not clear, [12] but it still raises an important point of principle: can things be arranged such that the ethical responsibility for experiments is shared with partners?

Assume heroically that there is agreement among researchers about appropriate standards of research. Say now, still more heroically, that there are other standards of behaviour for other actors in other spheres that are also generally accepted. For NGOs, for example, we might think of the INGO Accountability Charter; for governments we might think of international treaty obligations. One might think of these ethical principles in different spheres as stemming from a single theory of ethics, or simply as the possibly incompatible principles adopted by different communities. In either case, these different standards may specify different behaviours for different actors. Thus, for example, by the ethical principles of research, a researcher interviewing a genocidaire in Rwanda should seek fully informed consent prior to questioning and stop questioning when asked by the subject or if they sense discomfort on the part of the subject. A government interrogator might not, yet still act ethically according to the principles adopted by governments, for instance by eschewing other behaviour such as torture. In this example, the ethical constraints on the researcher seem more demanding. There may be more intractable incompatibilities if constraints are not "nested." For example, a researcher may think it unethical to hand over information about a subject suspected of criminal activities, while a government official may think it unethical not to.

The question, then, is whose ethical principles to follow when there are collaborations. One possibility is to adhere to the most stringent principle among the partners. Thus researchers working in partnerships with governments may expect governments to follow principles of research ethics when engaging with subjects. In some situations, discussed below, this may be a fruitful approach. But as a general principle it suffers from two flaws. The first is that in making these requirements the researcher alters the behaviour of partners in ways that may limit their effectiveness. The second is that, as noted above, the constraints may be non-nested: the ethical position for a government may be to prosecute a criminal, while the researcher wants to minimize harm to subjects. In practice this might rule out appending research components to interventions that would have happened without the researcher and that are ethical from the perspective of implementers; it could, for example, prevent the use of experimental approaches to study a large range of government strategies without any gain, and possibly some loss, to affected populations.

An alternative approach is to divide responsibilities: to make implementers responsible for implementation and researchers responsible for the research. This is what I call above the “spheres of ethics” argument. The principle of allocating responsibility of implementation to partners may then be justified on the grounds that in the absence of researchers, partners would be implementing (or, more weakly, that they could implement) such interventions anyhow, and are capable of bearing ethical responsibility for the interventions outside of the research context.

Quite distinct rationales for this approach are that partner organizations may be better placed to make decisions in the relevant areas and may be more effectively held to account if things go wrong. In addition partners may be seen by others as having legitimacy to take actions which might (correctly) be seen as meddling by outsiders (see Baele (2013) on the “Foreign Intervention problem”).

As a practical matter, researchers can do this in an underhand way by advising on interventions qua consultants and then returning to analyse the data qua researchers, or by setting up an NGO to implement an intervention qua activists and then returning for the data qua researchers. But this approach risks creating a backdoor for simply avoiding researcher responsibilities altogether.

Instead, by appealing to spheres of ethics, researchers collaborating with autonomous partners can do something like this in a transparent way by formally dividing responsibility. Although researchers play a role in the design of interventions it may still be possible to draw a line between responsibility for design and responsibility for implementation. Here, responsibility is understood not in the causal sense of who contributed to the intervention, but formally as who shoulders moral and legal responsibility for the intervention.

An argument against the spheres of ethics approach is that it is simply passing the buck and not engaging with the ethical issues at all. But this response misses the point of professional ethics; professional ethics is not about what outcomes should obtain in the world but about who should do what. Allocating responsibility to partners is no more buck-passing than calling on police to intervene in a threatening situation rather than relying on self-help.

The sphere of ethics approach is consistent with ideas in medical research for assessing non-validated practice. On this issue the Belmont report notes: “Research and practice may be carried on together when research is designed to evaluate the safety and efficacy of a therapy. This need not cause any confusion regarding whether or not the activity requires review; the general rule is that if there is any element of research in an activity, that activity should undergo review for the protection of human subjects.” In terms of the standards to be applied in such a review, however, Levine (1988) notes: “the ethical norms and procedures that apply to non-validated practice are complex. Use of a modality that has been classified as non-validated practice is justified according to the norms of practice. However, the research designed to develop information about the safety and efficacy of the practice is conducted according to the norms of research.”

Levine's interpretation of the division of labour appears consistent with the spheres of ethics approach. But the approach raises at least two critical difficulties. The first is a problem of implementer autonomy. In practice implementers may not be so autonomous from the researchers, in which case the spheres of ethics argument may simply serve as a cover for avoiding researcher responsibilities. The second is deeper: the argument is incomplete insofar as it depends on an unanswered normative question: it requires that the researcher have grounds to deem that actions that are ethical from the partner's perspective are indeed ethical – perhaps in terms of their content or on the grounds of the process used by partners to construct them. This is the partner legitimacy concern. A researcher adopting a spheres of ethics argument may reasonably be challenged for endorsing or benefitting from the weak ethical standards of partners. Indeed, without an answer to this question, any collection of people could engage in any action that they claim to be ethical with respect to their "sphere"; a version of this argument could, for example, serve as grounds for doctors participating in medical experimentation in partnership with the Nazi government.

In line with the principle of socially constructed professional ethics, described in Section 2, a solution might be the formal recognition by the professions of classes of legitimate partners for various spheres – such as all governments, or all governments satisfying some particular criteria. The incompleteness of the spheres of ethics argument then adds urgency to the need for an answer to the problem of audience.

4 Problem 3: Consent

Medical ethics places considerable focus on the principle of informed consent, and indeed consent can in principle allay the twin concerns of audience and agency discussed in Sections 2 and 3: if the relevant audience provides consent, then the expectations of the audience are arguably met and there is also a clearer allocation of responsibility for action. Both of these arguments confront difficulties, however. Moreover, different conceptualizations of audience and agency have different implications for consent.

The US National Commission motivated the principle of consent as follows:

Respect for persons requires that subjects, to the degree that they are capable, be given the opportunity to choose what shall or shall not happen to them… there is widespread agreement that the consent process can be analyzed as containing three elements: information, comprehension and voluntariness.

In promoting the concept of consent, the commission also sought to produce definitional clarity around it. Whereas the term can mean many things in different settings, as described by Levine (1988), "the Commission […] abandoned the use of the word "consent," except in situations in which an individual can provide "legally effective consent" on his or her own behalf." [13]

In practice, however, consent in many social experiments is very imperfect. Imperfect consent is routinely sought for measurement purposes, for example when survey data are collected. It is sometimes sought, at least implicitly, for interventions, although individual subjects may often not be consulted on whether, for example, they are to be exposed to particular ads or whether a school is to be built in their town. But even if consent for exposure to a treatment is sought, individual-level consent may not be sought for participation in the experiment per se; for example, subjects are often not informed that they were randomly assigned to receive (or not receive) a treatment for research purposes. [14]

To assess how great a problem this is, it is useful to consider the rationales for informed consent that inspired medical professionals and other rationales that may be relevant for social scientists.

4.1 The Argument from Respect for Persons

The argument provided for informed consent in the Belmont report and related documents is the principle of “respect for persons.” Manipulating subjects without their consent diminishes their autonomy and instantiates a lack of respect. Consent, conversely, can serve two functions.

The first is diagnostic: consent can provide a test of whether people are in fact being used "merely as means." [15] Critically, this diagnostic function of consent can in principle be achieved without actual consent, though actual consent eliminates the need for guesswork.

The second is effective: consent may enhance autonomy (or, conversely, forgoing consent reduces autonomy). Thus the Belmont report emphasizes the importance of maximizing the autonomy of subjects: "Respect for persons requires that subjects, to the degree that they are capable, be given the opportunity to choose what shall or shall not happen to them." There are multiple aspects of autonomy that may be affected by engagement with an experiment, with somewhat different implications for what is required of consent. I distinguish here between three: participation autonomy, behaviour autonomy, and product autonomy. [16]

The first, participation autonomy, relates to the decision of whether or not to be involved with the research. The absence of choice reduces subject autonomy at least with respect to the decision to take part. Behaviour autonomy may be compromised by a lack of consent because of information deficits (see the example below), resulting in subjects making decisions that they would not otherwise make given the options available to them. Behaviour autonomy can also be compromised if individuals' choice sets are constrained because of the manipulation. Third, as a subject's actions yield a research product, a lack of consent means that the subject loses control over how their labour is to be used – a loss of product autonomy. [17] To illustrate: say an intervention broadcasts information about a politician's performance on the radio in order to assess how the information alters voting behaviour by the politician's constituents. If this is done without consent, the listeners have no option but to take part in the study (participation autonomy); their subsequent actions are affected by the treatment and might have been different had they known the information was provided for research purposes (behaviour autonomy); and they have no say in the publication of knowledge that is derived from their actions (product autonomy).

A problem with this formulation is that consent, or even notional consent, is not clearly either a necessary or a sufficient condition for respect for persons. That is, unless respect for persons is defined in terms of consent (rather than, for example, a concern with the welfare or capabilities of others), the diagnostic function of consent described above faces difficulties. There is a logical disconnect between consent and respect, since determining respect requires information about the disposition of the researcher while consent provides information on the disposition of the subject. Consent might not be a necessary condition for establishing respect for persons, since it is possible that the subject would never consent to an action that is nevertheless taken by a researcher with a view to enhancing their welfare or their capabilities. And of course, subjects may consent to actions not in their interests and withhold consent from actions that are, or they may unknowingly take actions that limit their autonomy. The specific markers sometimes invoked to indicate that respect for persons is violated, such as the use of deceit or force, also suffer difficulties, since one can construct instances in which a deceived person would recognize that deceit was necessary to achieve the good in question. [18] In addition, consent might not be sufficient, since it is possible that a subject consents to an action that is not being done because it is in their interest but that nevertheless has their welfare as a byproduct.

Consider again the three types of autonomy that are threatened by an incomplete consent process. A loss in participation autonomy does not necessarily imply that individuals are treated simply as means. Holding a surprise birthday party for a friend deliberately compromises participation autonomy in order to provide a benefit for the friend – one that they might consent to if only the consent did not destroy the surprise. [19] In some situations, where providing consent may put individuals at risk, not seeking consent may even increase participation autonomy by providing a de facto choice over whether to participate, even when the risks make formal consent impossible. Even in the absence of consent, it is possible that participation in an experiment enhances behaviour autonomy, either by expanding information or by expanding choice sets. Product autonomy can be restored by ex post consent, for example by allowing a subject to determine whether they want data collected from them to be used in an analysis. Thus consent, as currently required, does not seem to be necessary or sufficient for the work asked of it.

4.2 Other Rationales for Consent

Legal protection from charges of abuse: A nonethical reason for seeking consent is to protect researchers from civil or criminal charges of abuse. For medical trials, the need for protection is obvious, since actions as simple as providing an injection involve physical injury, which would under normal circumstances have criminal implications. [20] Consent clarifies that the action is non-criminal in nature (although this depends on the action – consent to be killed does not generally protect the killer). The rationale for documenting consent is primarily legal. As noted by Levine, HEW regulations "require that if there are risks associated with research then 'legally effective informed consent will be obtained'… The purpose of documenting consent on a consent form is […] to protect the investigator and the institution against legal liability" (Levine 1979).

Information aggregation/subject filtering: Consent may also provide researchers with information regarding the relative costs or benefits of an intervention. If a researcher discovers that an individual is unwilling to take part in a study, this provides information on the perceived benefits of the study. In such cases there are double grounds not to proceed: not just because proceeding compromises autonomy but also because it violates beneficence. As discussed below, however, this goal of information aggregation may be met at a population level by seeking consent from a subset of potential subjects.

Maintaining the reputation of the academy: A third rationale is that consent preserves the reputation of the academy. It clarifies to the public the nature of relations between researchers and populations: that this relation is based on respect, and that populations can expect that their trust in researchers will not be abused and that they will not be put at risk without consent. Though clearly of pragmatic benefit to the academy, this argument is ethical insofar as it reflects a standard of behaviour that is expected of a particular group. Note that this argument, more than any of the others, provides a rationale for ethical standards specific to researcher-subject relations that are higher than those expected of general interactions.

In the context of naturally occurring field experiments, there are also arguments for why consent might not be sought.

One is that because the intervention is naturally occurring, an attempt to gain consent would be intrusive for subjects and especially damaging for the research. Consider for example an experiment that focuses on the effects of billboard ads. In this experiment it is precisely because seeing government ads is a routine event that preceding the viewing of the ad (if that were even possible) with an announcement that the ad is being posted to study such-and-such an effect would have particularly adverse consequences. Preceding the ad with a disclaimer may moreover falsely suggest to subjects that some unusual participation or measurement is taking place, even if a purpose of the disclaimer is to deny it.

A second, more difficult reason is that the withholding of consent may not be within the rights of the subjects. Consider for example a case where a police force seeks to understand the effects of patrols on reducing crime. The force could argue that the consent of possible criminals (the subjects in this case) is not required, and indeed is undesirable, for the force to decide where to place police. This argument is the most challenging since it highlights the fact that consent is not even notionally required by all actors for all interventions, even if it is generally required of researchers with respect to subjects. In this example the police can argue that the subject has no rights over whether or how the intervention is administered (participation autonomy). One might counter that even if that is correct, the subject may still have rights regarding whether his responses to the interventions can be used for research purposes (product autonomy). However, one might in turn counter that even these concerns might be discounted if the actions are public information.

In Section 2, I noted that maintaining the trust of subjects is of paramount concern to medical researchers. This provides a basis for insisting on informed consent by subjects. As argued in Section 2, for social scientists the confidence of the general public, and of policy makers in particular, is also critical. Moreover, the welfare of non-subjects may be of critical importance. These considerations have two implications: first, depending on the treatment of the problem of audience, the form of consent needed may differ from the current standard; second, depending on the population affected, the focus on subjects as the locus of consent may not be appropriate: the informed consent of practitioner partners and affected third parties may be just as, or perhaps more, critical.

4.3 Varieties of Consent

Given the multiple desiderata associated with consent, we may expect that variations of the informed consent process might succeed in meeting some or other of them.

For example, if what is valued is participation autonomy , then this seems to require actual ex ante consent. The loss in autonomy consists of the absence of choice to be subjected to a treatment. The demands of product autonomy , unlike participation or behaviour autonomy, can be met with ex post consent. The demands of the diagnostic test can in principle be met by notional consent, and so on.

With this in mind, Table 1 considers how eight approaches to the consent process fare on different desiderata. [21]

Table 1: Consent Strategies. Source: Author.

Ex ante informed consent: Ex ante informed consent fares well on autonomy principles as well as on the legal protection of researchers (if documented) and the reputation of the discipline. As argued above, however, it is not a necessary or sufficient condition for respect for persons; in addition, it may impose costs on subjects, weaken the quality of some kinds of research, and be costly to achieve.

Implied consent: An alternative is implied consent, which arises when there are grounds to think that consent is given even though it is not formally given or elicited. Implied consent might include cases in which voluntary participation is itself considered evidence of consent to be in a study. Implied consent can reduce costs to subjects and researchers but may leave researchers in a legally weaker position and may put their reputation more in question.

Proxy (delegated) consent: Both ex ante consent and implied consent suppose that subjects are informed of the purpose of the experiment ex ante. In some settings, this can threaten the validity of the research. An approach that maintains a form of participation autonomy while keeping subjects blind to treatment is to ask subjects to delegate someone who will be given full information and who will determine on their behalf whether to give consent. [22] Insofar as the subject sees the delegate as their agent in the matter, proxy consent inherits the benefits of ex ante informed consent, but with reduced risks to the research. A weaker alternative – the "authoritative" approach – is to seek consent from a proxy who is not specifically delegated for the purpose by a subject. In some settings, for example, the consent of community leaders is sought for interventions that take place at a community level; this procedure invokes the principles of proxy consent but assumes that individuals who are delegated for one purpose inherit the authority to be delegates for the consent process. Baele (2013), for example, recommends this form of consent.

Superset (blanket) consent: Another way to protect research integrity while preserving subject autonomy is to seek what might be called "superset consent." Say a researcher identifies a set X of possible experiments, including the experiment of interest. The researcher then asks the subject to identify the subset C ⊆ X of interventions in which the subject is willing to take part. [23] Given this procedure, if the set C includes the experiment of interest, the researcher can conclude that consent has been given for it even though it has not been specified as being the one of interest. In practice, abstract descriptions may suffice to generate consent for large classes of experiments (for example, a subject may consent to any experiment that seeks to answer some question in some class for which there is no more than minimal harm); greater coarsening of this form implies less specific information (see Easton on waived consent). [24]

Package consent: An alternative to superset consent is a process in which subjects are asked whether they are willing to take part in an experiment that will involve some intervention in set X, including the intervention of interest. If the subject agrees, then consent for the intervention is assumed. This differs from superset consent insofar as a subject might be willing to accept the package but not accept an individual component if offered that component alone. For example, if X contained experiment A, in which I could expect to win $1,000, and experiment B, in which I expect to lose $10, I might consent to set X, but only in the hope that I will be assigned to experiment A. To enhance informedness, the subject may be provided with the probabilities associated with the implementation of each possible experiment. Critically, this approach may be inconsistent with a desire to have continuous consent – in the sense of consent not just at the study outset but in the course of the study as well. In a sense, under this design a deal is struck between researcher and subject and the subject is expected to follow through on their side of the deal; this limitation runs counter to common practice but is not inconsistent with respect for persons.
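
As a toy illustration of how stating assignment probabilities can enhance informedness, the sketch below uses the hypothetical experiments A and B from the example above; the probabilities and payoffs are assumptions made for the illustration only.

# Toy illustration of package consent: the subject is told the assignment
# probabilities over the possible experiments and can evaluate the package as
# a whole. Probabilities and payoffs are hypothetical.
package = {
    "A": {"probability": 0.5, "payoff": 1000},  # the text's $1,000 example
    "B": {"probability": 0.5, "payoff": -10},   # the text's $10 loss example
}
expected_value = sum(arm["probability"] * arm["payoff"] for arm in package.values())
print(f"Expected value of consenting to the package: ${expected_value:.2f}")
# A subject might accept the package (here, an expected gain of $495) even
# though they would refuse experiment B if it were offered on its own.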

Deferred (retrospective, ex post) consent: When consent is not sought before the fact, it is common to provide a debriefing after the fact. In some cases this might be important to avoid harm. In the Milgram experiments, debriefing could help remove guilt once subjects found out that they did not in fact torture the confederates. But beyond debriefing, it is possible to seek consent after the fact (Fost and Robertson 1980). For some purposes this is too late: it does not restore participation or behaviour autonomy, [25] but it does provide product autonomy and it does satisfy the diagnostic test. In some situations, however, retrospective consent might impose costs on subjects and generate a sense of lost autonomy.

Inferred (surrogate) consent: [26] Consent is inferred (sometimes, "presumed") if there are empirical grounds to expect that consent would be given were it elicited. As described above, the diagnostic test for respect for persons is not that consent has been obtained but that it would not be refused if sought. This question is partly answerable, and a number of different approaches might be used. For example, one might describe an experiment to a random subset of subjects and ask them if they would be happy to take part in it, or if they would be happy to take part even if their consent were not sought. One could also combine this with ex post consent by implementing the experiment with a subset of actors and then asking them if they are happy that they took part, even though they were not told the purpose; or, alternatively, whether, knowing what they know now, they would have been willing to give their consent ex ante. Inferences may then be made about the willingness of the larger population to provide consent. This might be called the statistical approach. [27] Again, a weaker, authoritative alternative may be invoked by seeking consent from a third person who does not have the legitimacy to speak on behalf of the subject but who is believed to have insight into the subject's disposition.
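
The statistical approach can be sketched as follows, with all numbers assumed for illustration: consent dispositions are elicited from a random subset, and the consent rate in the wider population is then inferred with a confidence interval.

# Minimal sketch of the "statistical approach" to inferred consent: describe
# the experiment to a random subset of the subject pool, record whether each
# person says they would consent, and infer the consent rate in the wider
# population. The counts below are assumed for illustration.
from statsmodels.stats.proportion import proportion_confint
n_sampled = 120        # subjects randomly drawn and asked
n_would_consent = 103  # of those, the number saying they would consent
rate = n_would_consent / n_sampled
low, high = proportion_confint(n_would_consent, n_sampled, alpha=0.05, method="wilson")
print(f"Estimated consent rate: {rate:.2f} (95% CI {low:.2f} to {high:.2f})")
# A review process might require the lower bound of this interval to exceed
# some agreed threshold before treating consent as inferred.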

The final approach, marked in column 8 of Table 1, is the spheres of ethics approach described in Section 3.

Thus, although researchers currently use a very narrow operationalization of the principle of consent, the broader menu of possibilities is quite large. Researchers could already test and develop these approaches in settings in which consent is not routinely sought. Though most of them fall short of fully informed consent, many meet the principle of respect for persons more effectively than consent as sometimes practiced. Looking forward, collective answers to the questions of audience and agency can help determine which type of consent is optimal when.

5 Conclusion

I have described the primary problem of assessing the ethical implications of social experiments as a problem of audience. Medical ethics has been developed in large part to regulate relations between medical researchers and patients. Social scientists have adopted the framework created for medical researchers but their audiences are different: at least in the area of experimental research on public policy, relations with policy makers, practitioner organizations, and the general public can be just as important as the relationship with research subjects. Moreover, the interests of these different groups often diverge, making the problem of constructing ethics more obviously political.

These considerations suggest two conclusions.

First, rather than seeking some fundamental answer to ethical dilemmas or seeking to address the practical problems facing social scientists using the tools generated for another discipline, there is a need for a social process of construction of ethical principles that address the preoccupations of social scientists in this field, especially in settings in which there are power imbalances between lead researchers and research partners and in which there are value disagreements regarding what constitutes beneficent outcomes. Such a process will be inherently political. Just as social scientific interventions are more likely to have distributive implications – generating costs for some and benefits for others – so ethical principles of engagement, if there is to be engagement at all, may require the principled taking of sides, that is, the choice of an audience. The importance of constructing an appropriate ethics for this field is of some urgency since there is no reason to expect that all researchers working in this domain will independently converge on consistent standards for experimental research in grey areas.

Second, depending on answers to the problem of audience, it may turn out that answers to the questions of agency (Section 3) and consent (Section 4) will be different for social scientists than for medical researchers. I have sketched some possible answers to the questions of agency and consent that diverge somewhat from standard practice. Currently, when researchers engage in studies that generate risks, they defend the research on the basis of its social value. But they often do so as interested researchers and without the equipment to weigh benefits in the presence of value disagreements. Greater efforts to share the responsibility of research, whether through more carefully crafted relations of agency with developing-country actors or a more diligent focus on consent, may reduce these pressures on value assessments and may also reduce risks to both populations and the professions.

Acknowledgments

Warm thanks to the WIDER research group on Experimental and Non-Experimental Methods in the Study of Government Performance. Earlier version presented at UCSD conference on ethics and experiments in comparative politics. My thanks to Jasper Cooper and Lauren Young for very generous comments on this manuscript. This paper draws on previous work titled “Ethical Challenges of Embedded Experimentation.”

Abram, M. B. and S. M. Wolf (1984) "Public Involvement in Medical Ethics. A Model for Government Action," The New England Journal of Medicine, 310(10):627–632. 10.1056/NEJM198403083101005

Alderman, Harold, Jishnu Das, and Vijayendra Rao (2013) Conducting Ethical Economic Research: Complications from the Field. World Bank Policy Research Working Paper No. 6446. 10.1596/1813-9450-6446

Baele, S. J. (2013) "The Ethics of New Development Economics: Is the Experimental Approach to Development Economics Morally Wrong?," Journal of Philosophical Economics, 7(1):2–42.

Bertrand, M., S. Djankov, R. Hanna, and S. Mullainathan (2007) "Obtaining a Driver's License in India: An Experimental Approach to Studying Corruption," The Quarterly Journal of Economics, 122(4):1639–1676. 10.1162/qjec.2007.122.4.1639

Binmore, K. G. (1998) Game Theory and the Social Contract: Just Playing. Vol. 2. Cambridge: MIT Press.

Casey, K., R. Glennerster, and E. Miguel (2012) "Reshaping Institutions: Evidence on Aid Impacts Using a Preanalysis Plan," The Quarterly Journal of Economics, 127(4):1755–1812. 10.1093/qje/qje027

Cassileth, B. R., R. V. Zupkis, K. Sutton-Smith, and V. March (1980) "Informed Consent – Why Are Its Goals Imperfectly Realized?," The New England Journal of Medicine, 302(16):896–900. 10.1056/NEJM198004173021605

DeScioli, P. and R. Kurzban (2013) "A Solution to the Mysteries of Morality," Psychological Bulletin, 139(2):477. 10.1037/a0029065

Fost, N. and J. A. Robertson (1980) "Deferring Consent with Incompetent Patients in an Intensive Care Unit," IRB, 2(7):5. 10.2307/3564363

Gerber, A. and N. Malhotra (2008) "Do Statistical Reporting Standards Affect What Is Published? Publication Bias in Two Leading Political Science Journals," Quarterly Journal of Political Science, 3(3):313–326. 10.1561/100.00008024

Gray, J. D. (2001) "The Problem of Consent in Emergency Medicine Research," Canadian Journal of Emergency Medicine, 3(3):213–218. 10.1017/S1481803500005583

Harms, D. (1978) "The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research," DHEW Publication No. (OS) 78-0012.

Jonsen, A. R. (1983) "A Concord in Medical Ethics," Annals of Internal Medicine, 99(2):261–264. 10.7326/0003-4819-99-2-261

Kant, I. (1956) Critique of Practical Reason, translated by Lewis White Beck. Indianapolis, IN: Bobbs-Merrill.

Levine, R. J. (1979) "Clarifying the Concepts of Research Ethics," Hastings Center Report, 9(3):21–26. 10.2307/3560793

Levine, R. J. (1988) Ethics and Regulation of Clinical Research. Yale University Press.

Levine, F. J. and P. R. Skedsvold (2008) "Where the Rubber Meets the Road: Aligning IRBs and Research Practice," PS: Political Science and Politics, 41(3):501–505.

Lipscomb, A. and A. E. Bergh, eds. (1903) The Writings of Thomas Jefferson. 20 vols. Washington, DC: Thomas Jefferson Memorial Association of the United States, 1903–04.

Love, R. R. and N. C. Fost (1997) "Ethical and Regulatory Challenges in a Randomized Control Trial of Adjuvant Treatment for Breast Cancer in Vietnam," Journal of Investigative Medicine, 45:423–431.

Pallikkathayil, J. (2010) "Deriving Morality from Politics: Rethinking the Formula of Humanity," Ethics, 121(1):116–147. 10.1086/656041

Tolleson-Rinehart, S. (2008) "A Collision of Noble Goals: Protecting Human Subjects, Improving Health Care, and a Research Agenda for Political Science," PS: Political Science and Politics, 41(3):507–511.

Veatch, R. (2007) "Implied, Presumed and Waived Consent: The Relative Moral Wrongs of Under- and Over-informing," The American Journal of Bioethics, 7(12):39–41. 10.1080/15265160701710253

Vollmann, J. and R. Winau (1996) "Informed Consent in Human Experimentation Before the Nuremberg Code," British Medical Journal, 313(7070):1445. 10.1136/bmj.313.7070.1445

Wantchekon, L. (2003) "Clientelism and Voting Behavior: Evidence from a Field Experiment in Benin," World Politics, 55:399–422. 10.1353/wp.2003.0018

©2015, Macartan Humphreys, published by De Gruyter

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.


Social Experiment Research Paper


This sample Social Experiment Research Paper is published for educational and informational purposes only.

A social experiment is the random assignment of human subjects to two groups to examine the effects of social policies. One group, called the “treatment group,” is offered or required to participate in a new program, while a second group, the “control group,” receives the existing program. The two groups are monitored over time to measure differences in their behavior. For example, a social experiment can compare a program that gives unemployed individuals a financial reward for finding a job with one that does not. Or, a social experiment might compare students in schools that receive a new curriculum with students in schools that do not. Because the randomization procedure guarantees that the two groups are otherwise similar, the measured differences in their behavior can be causally attributed to the new program. The behavioral differences are sometimes called the “impacts” of the program. Commonly measured behavioral outcomes in social experiments include earnings, employment, receipt of transfer payments, health, educational attainment, and child development. Sample sizes in social experiments have ranged from under 100 to well over 10,000.
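
To illustrate the basic mechanics, the following minimal sketch (with entirely simulated data) randomly assigns subjects to treatment or control and estimates the program impact as the difference in mean outcomes; the outcome, sample size, and effect size are arbitrary choices for the example.

# Minimal sketch of a two-group social experiment with simulated data: random
# assignment to treatment or control, then an impact estimate computed as the
# difference in mean outcomes.
import numpy as np
from scipy import stats
rng = np.random.default_rng(42)
n = 1000
treated = rng.random(n) < 0.5                  # coin-flip random assignment
true_effect = 150.0                            # assumed true impact on earnings
earnings = rng.normal(2000, 500, n) + true_effect * treated
impact = earnings[treated].mean() - earnings[~treated].mean()
t_stat, p_value = stats.ttest_ind(earnings[treated], earnings[~treated])
print(f"Estimated impact: {impact:.1f} (p = {p_value:.3f})")
# Because assignment was random, the difference in means is an unbiased
# estimate of the program's impact on this outcome.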

Some social experiments have more than one treatment group. In such cases, each treatment group is assigned to a different program. The various treatment groups may be compared to each other to determine the differential impacts of two of the tested programs, or they may be compared to the control group to determine the impact of the program relative to the status quo. The human subjects may be chosen randomly from the general population or, more commonly, may be chosen randomly from a target population, such as the disadvantaged.

Social experiments have been used extensively since the late 1960s. According to Greenberg and Shroder (2005) almost 300 social experiments have been conducted since then. Social experiments are very much like medical laboratory experiments in which the treatment group is given a new drug or procedure, while the control group is given a placebo or the standard treatment. Laboratory experiments have also been used extensively in the field of economics, since the 1970s (Smith 1994), but they differ from social experiments in that they are used mainly to test various aspects of economic theory, such as the existence of equilibrium or the efficiency of market transactions, rather than the effects of a social program. Also, economics laboratory experiments usually do not have a control group; instead, cash-motivated members of a treatment group are given the opportunity to engage in market transactions in a controlled environmental setting to determine whether they behave in a manner consistent with the predictions of economic theory. Some laboratory experiments in economics have been used to test public policy alternatives.

History Of Social Experiments

Much of the foundation of the modern approach to social experimentation can be traced back to the work of the famous statistician Ronald Fisher in the 1920s. Fisher refined the notion of random assignment and pointed out that no two groups could ever be identical. He noted that allocation of subjects to treatment and control groups by pure chance (by the flip of a coin or from a table of random numbers, for example) ensures that differences in the average behavior of the two groups can be safely attributed to the treatment. As a result, the direction of causality can be determined using basic statistical calculations. Fisher also recognized that randomization provides a means of determining the statistical properties of differences in outcomes between the groups.
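
The logic Fisher formalized can be illustrated with a toy randomization test: because assignment was decided by chance, the distribution of the treatment-control difference under the hypothesis of no effect can be reconstructed by re-randomizing the assignment labels. The outcomes below are made up for the illustration.

# Toy randomization test in the spirit of Fisher: re-shuffle the (random)
# assignment labels many times to build the null distribution of the
# treatment-control difference. Outcomes are made up for illustration.
import numpy as np
rng = np.random.default_rng(7)
outcomes = np.array([3.1, 2.4, 4.0, 3.6, 2.9, 3.3, 4.2, 2.8])
assigned = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=bool)  # actual assignment
observed = outcomes[assigned].mean() - outcomes[~assigned].mean()
diffs = []
for _ in range(10000):
    shuffled = rng.permutation(assigned)       # one hypothetical re-randomization
    diffs.append(outcomes[shuffled].mean() - outcomes[~shuffled].mean())
p_value = np.mean(np.abs(diffs) >= abs(observed))
print(f"Observed difference: {observed:.2f}, randomization p-value: {p_value:.3f}")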

The first major social experiment was the New Jersey Income Maintenance Experiment, which was initiated in the United States in 1968. A few social experiments preceded it (such as the Perry Preschool Project in 1962), but they were much smaller in scope and much less sophisticated. The New Jersey Experiment tested the idea of a negative income tax (NIT), proposed by the economists Milton Friedman and James Tobin in the 1960s. It was the first of five NIT experiments conducted in North America (four in the United States and one in Canada), all of which had very sophisticated designs and many treatment groups. Problems evaluating certain aspects of these complex experiments led to much simpler experimental designs in ensuing years.

From the 1970s to the present, social experiments have been conducted in numerous social policy areas, including child health and nutrition, crime and juvenile delinquency, early child development, education, electricity pricing, health services, housing assistance, job training, and welfare-to-work programs. Notable experiments include the Rand Health Insurance Experiment, which tested different health insurance copayment plans; the Moving to Opportunity Experiments, which tested programs enabling poor families to move out of public housing; four unemployment insurance experiments that tested the effects of various financial incentives to induce unemployed individuals to return to work; and a number of welfare-to-work experiments that tested ways of helping welfare recipients find jobs.

Limitations Of Social Experiments

Although widely acknowledged as the ideal way to determine the causal effects of proposed social policies, social experiments have several important limitations. First, and perhaps most importantly, social experiments require that a control group be denied the policy change given to the treatment group. Because control groups in social experiments are typically disadvantaged, denial of program services may be viewed as constituting an ethical breach, which limits social experiments to settings where resources prevent all eligible individuals from being served. Treatments that make participants worse off are likewise viewed as unethical and politically infeasible.


Second, although well-designed experiments have a high degree of internal validity (inferences are valid for the tested sample), they may lack external validity (their results may not generalize to other settings). One common criticism of experiments is that, because of their limited size, they do not generate the macroeconomic or "community" effects that a fully operational program would generate. For example, a fully operational job training program may affect the wages and employment of nonparticipants and may affect social norms and attitudes, whereas a limited-size experiment would not. Additionally, there is no way of knowing for sure whether a successful experiment in one location would be successful in another location, especially because social experiments are typically conducted in places that are chosen not randomly, but for their capability and willingness to participate in an experiment.

Third, social experiments take time to design and evaluate, usually several years. Policymakers may not want to wait the required time to find out if a particular program works.

Finally, in practice it has often proven difficult to implement random assignment. For one reason or another, individuals may be unwilling to participate in a research study, and in cases where collaboration between researchers and government agencies is required, some agencies may be unwilling to take part. As a result, the treatment and control groups that are tested may turn out to be unrepresentative of the target population.

Because of the various limitations of social experiments, other means of evaluating the effects of social policies have been developed. These are generally termed "nonexperimental" or "quasi-experimental" methods. Nonexperimental methods monitor the behavior of persons subjected to a new policy (the treatment group) and select a "comparison group" to serve the role of a control group. But because randomization is not used to select the two groups, it is never known for sure whether the comparison group is identical to the treatment group in ways other than receipt of the treatment. Many researchers match treatment group members to persons in the nonparticipating population to make the groups as similar as possible. The matches are usually done using demographic and economic characteristics such as age, education, race, place of residence, and employment and earnings history. One popular technique is propensity score matching, which summarizes the observed economic and demographic characteristics of each individual into a single index – the estimated probability of participating in the program – and matches treatment group members to nonparticipants with similar index values to create a comparison group.
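
The following is a minimal sketch of propensity score matching on simulated data: a logistic regression estimates each person's probability of participating from observed characteristics, and each participant is then matched to the nonparticipant with the closest estimated score. The variable names, the data-generating process, and the choice of nearest-neighbor matching with replacement are all illustrative assumptions.

# Minimal sketch of propensity score matching on simulated data: estimate each
# person's probability of participating from observed characteristics, then
# match every participant to the nonparticipant with the closest score.
import numpy as np
from sklearn.linear_model import LogisticRegression
rng = np.random.default_rng(1)
n = 2000
age = rng.normal(35, 10, n)
educ = rng.normal(12, 3, n)
X = np.column_stack([age, educ])
# Participation depends only on observed characteristics (selection on observables).
p_participate = 1 / (1 + np.exp(-(0.03 * (age - 35) + 0.1 * (educ - 12))))
participate = rng.random(n) < p_participate
earnings = 1000 + 50 * educ + 10 * age + 200 * participate + rng.normal(0, 300, n)
scores = LogisticRegression().fit(X, participate).predict_proba(X)[:, 1]
treated_idx = np.where(participate)[0]
control_idx = np.where(~participate)[0]
# Nearest-neighbor match on the propensity score, with replacement.
gaps = np.abs(scores[control_idx][None, :] - scores[treated_idx][:, None])
matches = control_idx[np.argmin(gaps, axis=1)]
impact = earnings[treated_idx].mean() - earnings[matches].mean()
print(f"Matched estimate of the program impact: {impact:.1f}")

Matching without replacement, caliper matching, or weighting on the score are common alternatives; whichever is used, comparability rests entirely on the observed characteristics entering the model.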

A particularly attractive nonexperimental method is the “natural experiment.” Natural experiments often are used to test the effects of social policies already in place. The natural experiment takes advantage of the way a new policy has been implemented so that the comparison group is almost a true control group. For example, military conscription (being draft eligible) during the Vietnam War was done by a national lottery that selected individuals for military service solely according to their date of birth. Thus, theoretically the group selected for military service should be identical to those not chosen, because the only difference is date of birth. Researchers wanting to test the effects of military conscription on individuals’ future behavior could compare outcomes (for example, educational attainment or earnings) of those conscripted with those not conscripted and safely attribute the “impacts” to conscription (Angrist 1990). Because not all conscripted individuals actually serve in the military and because some non-conscripted individuals volunteer for military service, it is also possible to estimate the impact of actual military service on future behavior by adjusting the impacts of conscription for differences in the proportion serving in the military in the treatment and comparison groups. However, the validity of this procedure rests crucially on the comparability of the military service veterans in the two samples.
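
The adjustment described here amounts to what is often called a Wald or instrumental-variables estimator: the effect of draft eligibility on the outcome is divided by the difference in military service rates between the eligible and non-eligible groups. The numbers in the sketch below are hypothetical and are not taken from Angrist (1990).

# Sketch of the adjustment described in the text (a Wald / instrumental-
# variables estimator). All numbers are hypothetical, not Angrist's estimates.
mean_earnings_eligible = 14_800      # average earnings, draft-eligible group
mean_earnings_ineligible = 15_200    # average earnings, non-eligible group
service_rate_eligible = 0.35         # share of eligible men who actually served
service_rate_ineligible = 0.19       # share of non-eligible men who served anyway
effect_of_eligibility = mean_earnings_eligible - mean_earnings_ineligible
difference_in_service = service_rate_eligible - service_rate_ineligible
effect_of_service = effect_of_eligibility / difference_in_service
print(f"Effect of draft eligibility on earnings: {effect_of_eligibility:.0f}")
print(f"Implied effect of actual military service: {effect_of_service:.0f}")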

The Future Of Social Experiments

Social experiments have changed in character since the late 1960s. Many early social experiments, such as the NIT experiments, the unemployment insurance experiments, and the Rand Health Insurance Experiment, tested a "response surface" in which subjects were given quantifiable treatments of varying tax or subsidy rates. In contrast, most of the more recent social experiments are "black box," meaning that a package of treatments is given to the treatment group and it is not possible to separately identify the causal effects of each component of the package.

Black-box experiments have been criticized because they tend to have much less generalizability than response-surface experiments. Hence, many researchers have called for a return to nonexperimental evaluation as the preferred method of analyzing the effects of social policies. However, those favoring experimental methods have countered that social experimentation should remain the bedrock of social policy evaluation because the advantages are still great relative to nonexperimental methods (Burtless 1995). In an attempt to “get inside the black box,” those sympathetic with the social experiment as an evaluation tool have proposed ways of combining experimental and nonexperimental evaluation methods to identify causal effects of social policies (Bloom 2005). Nonexperimental methods are necessary because of a selection bias that arises when members of the treatment group who receive certain components of the treatment are not a random subset of the entire treatment group. In the future, social policy evaluation may make greater use of both evaluation methodologies—using experiments when feasible and combining them with nonexperimental methods when experiments cannot answer all the relevant policy questions.

Bibliography:

  • Angrist, Joshua D. 1990. Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records. American Economic Review 80 (3): 313–336.
  • Bloom, Howard S., ed. 2005. Learning More from Social Experiments. New York: Russell Sage Foundation.
  • Burtless, Gary. 1995. The Case for Randomized Field Trials in Economic and Policy Research. Journal of Economic Perspectives 9 (2): 63–84.
  • Greenberg, David, and Mark Shroder. 2005. The Digest of Social Experiments. 3rd ed. Washington, DC: Urban Institute Press.
  • Greenberg, David, Donna Linksz, and Marvin Mandell. 2003. Social Experimentation and Public Policymaking. Washington, DC: Urban Institute Press.
  • Smith, Vernon. 1994. Economics in the Laboratory. Journal of Economic Perspectives 8 (1): 113–131.


Design, Implementation, and Assessment of a Software Tool Kit to Facilitate Experimental Research in Social Psychology

This paper introduces a comprehensive software toolkit designed to facilitate the design, implementation, and assessment of experimental research in social psychology. The toolkit includes a Python tool integrated with Unreal Engine to run virtual reality simulations. It allows students and psychologists to conduct experiments for learning purposes by helping them create lifelike simulations in different environments. Following a series of experiments, we determined that the tool performs effectively. With the potential for further updates and enhancements, we believe the tool holds promise for use in social psychology experiments.



Writing Survey Questions

Perhaps the most important part of the survey process is the creation of questions that accurately measure the opinions, experiences and behaviors of the public. Accurate random sampling will be wasted if the information gathered is built on a shaky foundation of ambiguous or biased questions. Creating good measures involves both writing good questions and organizing them to form the questionnaire.

Questionnaire design is a multistage process that requires attention to many details at once. Designing the questionnaire is complicated because surveys can ask about topics in varying degrees of detail, questions can be asked in different ways, and questions asked earlier in a survey may influence how people respond to later questions. Researchers are also often interested in measuring change over time and therefore must be attentive to how opinions or behaviors have been measured in prior surveys.

Surveyors may conduct pilot tests or focus groups in the early stages of questionnaire development in order to better understand how people think about an issue or comprehend a question. Pretesting a survey is an essential step in the questionnaire design process to evaluate how people respond to the overall questionnaire and specific questions, especially when questions are being introduced for the first time.

For many years, surveyors approached questionnaire design as an art, but substantial research over the past forty years has demonstrated that there is a lot of science involved in crafting a good survey questionnaire. Here, we discuss the pitfalls and best practices of designing questionnaires.

Question development

There are several steps involved in developing a survey questionnaire. The first is identifying what topics will be covered in the survey. For Pew Research Center surveys, this involves thinking about what is happening in our nation and the world and what will be relevant to the public, policymakers and the media. We also track opinion on a variety of issues over time so we often ensure that we update these trends on a regular basis to better understand whether people’s opinions are changing.

At Pew Research Center, questionnaire development is a collaborative and iterative process where staff meet to discuss drafts of the questionnaire several times over the course of its development. We frequently test new survey questions ahead of time through qualitative research methods such as focus groups, cognitive interviews, pretesting (often using an online, opt-in sample), or a combination of these approaches. Researchers use insights from this testing to refine questions before they are asked in a production survey, such as on the American Trends Panel (ATP).

Measuring change over time

Many surveyors want to track changes over time in people’s attitudes, opinions and behaviors. To measure change, questions are asked at two or more points in time. A cross-sectional design surveys different people in the same population at multiple points in time. A panel, such as the ATP, surveys the same people over time. However, it is common for the set of people in survey panels to change over time as new panelists are added and some prior panelists drop out. Many of the questions in Pew Research Center surveys have been asked in prior polls. Asking the same questions at different points in time allows us to report on changes in the overall views of the general public (or a subset of the public, such as registered voters, men or Black Americans), or what we call “trending the data”.
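As a rough illustration of “trending the data,” the toy Python sketch below tallies the share of respondents choosing a given answer in each survey wave. The responses are made up, and real Pew estimates also apply survey weights, which this sketch ignores.

```python
# A minimal sketch (not Pew's actual pipeline) of "trending the data":
# the share of respondents giving a target answer in each survey wave.
from collections import defaultdict

# Hypothetical responses: (wave, answer) pairs from repeated askings of one question.
responses = [
    ("2022", "approve"), ("2022", "disapprove"), ("2022", "approve"),
    ("2023", "approve"), ("2023", "disapprove"), ("2023", "disapprove"),
]

def trend(responses, target="approve"):
    counts, totals = defaultdict(int), defaultdict(int)
    for wave, answer in responses:
        totals[wave] += 1
        if answer == target:
            counts[wave] += 1
    # Percentage choosing the target answer in each wave, in wave order.
    return {wave: round(100 * counts[wave] / totals[wave]) for wave in sorted(totals)}

print(trend(responses))  # e.g. {'2022': 67, '2023': 33}
```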

When measuring change over time, it is important to use the same question wording and to be sensitive to where the question is asked in the questionnaire to maintain a similar context as when the question was asked previously (see  question wording  and  question order  for further information). All of our survey reports include a topline questionnaire that provides the exact question wording and sequencing, along with results from the current survey and previous surveys in which we asked the question.

The Center’s transition from conducting U.S. surveys by live telephone interviewing to an online panel (around 2014 to 2020) complicated some opinion trends, but not others. Opinion trends that ask about sensitive topics (e.g., personal finances or attending religious services) or that elicited volunteered answers (e.g., “neither” or “don’t know”) over the phone tended to show larger differences than other trends when shifting from phone polls to the online ATP. The Center adopted several strategies for coping with changes to data trends that may be related to this change in methodology. If there is evidence suggesting that a change in a trend stems from switching from phone to online measurement, Center reports flag that possibility for readers to try to head off confusion or erroneous conclusions.

Open- and closed-ended questions

One of the most significant decisions that can affect how people answer questions is whether the question is posed as an open-ended question, where respondents provide a response in their own words, or a closed-ended question, where they are asked to choose from a list of answer choices.

For example, in a poll conducted after the 2008 presidential election, people responded very differently to two versions of the question: “What one issue mattered most to you in deciding how you voted for president?” One was closed-ended and the other open-ended. In the closed-ended version, respondents were provided five options and could volunteer an option not on the list.

When explicitly offered the economy as a response, more than half of respondents (58%) chose this answer; only 35% of those who responded to the open-ended version volunteered the economy. Moreover, among those asked the closed-ended version, fewer than one-in-ten (8%) provided a response other than the five they were read. By contrast, fully 43% of those asked the open-ended version provided a response not listed in the closed-ended version of the question. All of the other issues were chosen at least slightly more often when explicitly offered in the closed-ended version than in the open-ended version. (Also see  “High Marks for the Campaign, a High Bar for Obama”  for more information.)


Researchers will sometimes conduct a pilot study using open-ended questions to discover which answers are most common. They will then develop closed-ended questions based on that pilot study that include the most common responses as answer choices. In this way, the questions may better reflect what the public is thinking and how they view a particular issue, or bring to light issues that the researchers may not have been aware of.
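A minimal sketch of that pilot-study step, assuming the open-ended answers have already been transcribed: tally the responses and keep the most frequent ones as candidate answer choices, plus an “other” option. The answers and the cutoff of four options are invented for illustration, and real coding of open-ended text is considerably more involved.

```python
# Toy version of turning open-ended pilot answers into closed-ended response options.
from collections import Counter

pilot_answers = [
    "the economy", "health care", "the economy", "education",
    "the economy", "immigration", "health care", "the economy",
]

def top_response_options(answers, k=4):
    """Return the k most frequent (lightly normalized) answers plus an 'other' option."""
    normalized = [a.strip().lower() for a in answers]
    most_common = [answer for answer, _ in Counter(normalized).most_common(k)]
    return most_common + ["other (please specify)"]

print(top_response_options(pilot_answers))
# ['the economy', 'health care', 'education', 'immigration', 'other (please specify)']
```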

When asking closed-ended questions, the choice of options provided, how each option is described, the number of response options offered, and the order in which options are read can all influence how people respond. One example of the impact of how categories are defined can be found in a Pew Research Center poll conducted in January 2002. When half of the sample was asked whether it was “more important for President Bush to focus on domestic policy or foreign policy,” 52% chose domestic policy while only 34% said foreign policy. When the category “foreign policy” was narrowed to a specific aspect – “the war on terrorism” – far more people chose it; only 33% chose domestic policy while 52% chose the war on terrorism.

In most circumstances, the number of answer choices should be kept to a relatively small number – just four or perhaps five at most – especially in telephone surveys. Psychological research indicates that people have a hard time keeping more than this number of choices in mind at one time. When the question is asking about an objective fact and/or demographics, such as the religious affiliation of the respondent, more categories can be used; in fact, more categories are encouraged in such cases to ensure inclusivity. For example, Pew Research Center’s standard religion questions include more than 12 different categories, beginning with the most common affiliations (Protestant and Catholic). Most respondents have no trouble with this question because they can expect to see their religious group within that list in a self-administered survey.

In addition to the number and choice of response options offered, the order of answer categories can influence how people respond to closed-ended questions. Research suggests that in telephone surveys respondents more frequently choose items heard later in a list (a “recency effect”), and in self-administered surveys, they tend to choose items at the top of the list (a “primacy” effect).

Because of concerns about the effects of category order on responses to closed-ended questions, many sets of response options in Pew Research Center’s surveys are programmed to be randomized, so that the options are not presented in the same order to each respondent. Answers to questions are sometimes affected by the questions or items that precede them. By presenting items in a different order to each respondent, we ensure that each item appears in each position (first, last or anywhere in between) the same number of times across the sample. This does not eliminate the potential impact of earlier items on later ones, but it does ensure that this bias is spread randomly across all of the questions or items in the list. For instance, in the example discussed above about what issue mattered most in people’s vote, the order of the five issues in the closed-ended version of the question was randomized so that no one issue appeared early or late in the list for all respondents.

Questions with ordinal response categories – those with an underlying order (e.g., excellent, good, only fair, poor OR very favorable, mostly favorable, mostly unfavorable, very unfavorable) – are generally not randomized because the order of the categories conveys important information to help respondents answer the question. Generally, these types of scales should be presented in order so respondents can easily place their responses along the continuum, but the order can be reversed for some respondents. For example, in one of Pew Research Center’s questions about abortion, half of the sample is asked whether abortion should be “legal in all cases, legal in most cases, illegal in most cases, illegal in all cases,” while the other half of the sample is asked the same question with the response categories read in reverse order, starting with “illegal in all cases.” Again, reversing the order does not eliminate the recency effect but distributes it randomly across the population.
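The two practices just described, shuffling unordered response options independently for each respondent and reversing an ordinal scale for a random half of the sample, can be sketched in a few lines of Python. The question items below are illustrative, not actual Pew wording, and in practice this logic lives inside the survey software.

```python
# Per-respondent randomization of response options, plus half-sample reversal
# of an ordinal scale. Items are illustrative examples only.
import random

ISSUE_OPTIONS = ["the economy", "the war in Iraq", "health care", "terrorism", "energy policy"]
ORDINAL_SCALE = ["legal in all cases", "legal in most cases",
                 "illegal in most cases", "illegal in all cases"]

def options_for_respondent(respondent_id: int):
    rng = random.Random(respondent_id)   # reproducible per respondent
    issue = ISSUE_OPTIONS[:]
    rng.shuffle(issue)                   # random order for the unordered list
    scale = ORDINAL_SCALE[:]
    if rng.random() < 0.5:               # reverse the ordinal scale for roughly half the sample
        scale.reverse()
    return issue, scale

issue_order, scale_order = options_for_respondent(respondent_id=42)
print(issue_order)
print(scale_order)
```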

Question wording

The choice of words and phrases in a question is critical in expressing the meaning and intent of the question to the respondent and ensuring that all respondents interpret the question the same way. Even small wording differences can substantially affect the answers people provide.


An example of a wording difference that had a significant impact on responses comes from a January 2003 Pew Research Center survey. When people were asked whether they would “favor or oppose taking military action in Iraq to end Saddam Hussein’s rule,” 68% said they favored military action while 25% said they opposed military action. However, when asked whether they would “favor or oppose taking military action in Iraq to end Saddam Hussein’s rule  even if it meant that U.S. forces might suffer thousands of casualties, ” responses were dramatically different; only 43% said they favored military action, while 48% said they opposed it. The introduction of U.S. casualties altered the context of the question and influenced whether people favored or opposed military action in Iraq.

There has been a substantial amount of research to gauge the impact of different ways of asking questions and how to minimize differences in the way respondents interpret what is being asked. The issues related to question wording are more numerous than can be treated adequately in this short space, but below are a few of the important things to consider:

First, it is important to ask questions that are clear and specific and that each respondent will be able to answer. If a question is open-ended, it should be evident to respondents that they can answer in their own words and what type of response they should provide (an issue or problem, a month, number of days, etc.). Closed-ended questions should include all reasonable responses (i.e., the list of options is exhaustive) and the response categories should not overlap (i.e., response options should be mutually exclusive). Further, it is important to discern when it is best to use forced-choice closed-ended questions (often denoted with a radio button in online surveys) versus “select-all-that-apply” lists (or check-all boxes). A 2019 Center study found that forced-choice questions tend to yield more accurate responses, especially for sensitive questions. Based on that research, the Center generally avoids using select-all-that-apply questions.

It is also important to ask only one question at a time. Questions that ask respondents to evaluate more than one concept (known as double-barreled questions) – such as “How much confidence do you have in President Obama to handle domestic and foreign policy?” – are difficult for respondents to answer and often lead to responses that are difficult to interpret. In this example, it would be more effective to ask two separate questions, one about domestic policy and another about foreign policy.

In general, questions that use simple and concrete language are more easily understood by respondents. It is especially important to consider the education level of the survey population when thinking about how easy it will be for respondents to interpret and answer a question. Double negatives (e.g., do you favor or oppose  not  allowing gays and lesbians to legally marry) or unfamiliar abbreviations or jargon (e.g., ANWR instead of Arctic National Wildlife Refuge) can result in respondent confusion and should be avoided.

Similarly, it is important to consider whether certain words may be viewed as biased or potentially offensive to some respondents, as well as the emotional reaction that some words may provoke. For example, in a 2005 Pew Research Center survey, 51% of respondents said they favored “making it legal for doctors to give terminally ill patients the means to end their lives,” but only 44% said they favored “making it legal for doctors to assist terminally ill patients in committing suicide.” Although both versions of the question are asking about the same thing, the reaction of respondents was different. In another example, respondents have reacted differently to questions using the word “welfare” as opposed to the more generic “assistance to the poor.” Several experiments have shown that there is much greater public support for expanding “assistance to the poor” than for expanding “welfare.”

We often write two versions of a question and ask half of the survey sample one version of the question and the other half the second version. Thus, we say we have two  forms  of the questionnaire. Respondents are assigned randomly to receive either form, so we can assume that the two groups of respondents are essentially identical. On questions where two versions are used, significant differences in the answers between the two forms tell us that the difference is a result of the way we worded the two versions.
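A bare-bones sketch of that split-form logic: assign each respondent at random to form A or form B, then compare the share giving a particular answer under each wording. The assignments and answers below are invented, and a real analysis would also test whether the gap between forms is statistically significant.

```python
# Random assignment to two questionnaire forms and a simple comparison of results.
import random

def assign_form(respondent_id: int) -> str:
    """Randomly (but reproducibly) assign a respondent to questionnaire form A or form B."""
    return random.Random(respondent_id).choice(["A", "B"])

# Example: assign a small batch of respondents to forms.
assignments = {rid: assign_form(rid) for rid in range(8)}
print(assignments)

# Hypothetical yes/no answers (1 = yes) collected under each wording.
answers = {"A": [1, 1, 0, 1, 1, 0, 1, 0], "B": [1, 0, 0, 1, 0, 0, 1, 0]}
for form, values in answers.items():
    print(f"Form {form}: {100 * sum(values) / len(values):.0f}% answered yes")
# Because respondents were assigned to forms at random, a large, consistent gap
# between the two forms points to an effect of the wording itself.
```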


One of the most common formats used in survey questions is the “agree-disagree” format. In this type of question, respondents are asked whether they agree or disagree with a particular statement. Research has shown that, compared with the better educated and better informed, less educated and less informed respondents have a greater tendency to agree with such statements. This is sometimes called an “acquiescence bias” (since some kinds of respondents are more likely to acquiesce to the assertion than are others). This behavior is even more pronounced when an interviewer is present than when the survey is self-administered. A better practice is to offer respondents a choice between alternative statements. A Pew Research Center experiment with one of its routinely asked values questions illustrates the difference that question format can make. Not only does the forced-choice format yield a very different result overall from the agree-disagree format, but the pattern of answers between respondents with more or less formal education also tends to be very different.

One other challenge in developing questionnaires is what is called “social desirability bias.” People have a natural tendency to want to be accepted and liked, and this may lead people to provide inaccurate answers to questions that deal with sensitive subjects. Research has shown that respondents understate alcohol and drug use, tax evasion and racial bias. They also may overstate church attendance, charitable contributions and the likelihood that they will vote in an election. Researchers attempt to account for this potential bias in crafting questions about these topics. For instance, when Pew Research Center surveys ask about past voting behavior, it is important to note that circumstances may have prevented the respondent from voting: “In the 2012 presidential election between Barack Obama and Mitt Romney, did things come up that kept you from voting, or did you happen to vote?” The choice of response options can also make it easier for people to be honest. For example, a question about church attendance might include three of six response options that indicate infrequent attendance. Research has also shown that social desirability bias can be greater when an interviewer is present (e.g., telephone and face-to-face surveys) than when respondents complete the survey themselves (e.g., paper and web surveys).

Lastly, because slight modifications in question wording can affect responses, identical question wording should be used when the intention is to compare results to those from earlier surveys. Similarly, because question wording and responses can vary based on the mode used to survey respondents, researchers should carefully evaluate the likely effects on trend measurements if a different survey mode will be used to assess change in opinion over time.

Question order

Once the survey questions are developed, particular attention should be paid to how they are ordered in the questionnaire. Surveyors must be attentive to how questions early in a questionnaire may have unintended effects on how respondents answer subsequent questions. Researchers have demonstrated that the order in which questions are asked can influence how people respond; earlier questions can unintentionally provide context for the questions that follow (these effects are called “order effects”).

One kind of order effect can be seen in responses to open-ended questions. Pew Research Center surveys generally ask open-ended questions about national problems, opinions about leaders and similar topics near the beginning of the questionnaire. If closed-ended questions that relate to the topic are placed before the open-ended question, respondents are much more likely to mention concepts or considerations raised in those earlier questions when responding to the open-ended question.

For closed-ended opinion questions, there are two main types of order effects: contrast effects (where the order results in greater differences in responses) and assimilation effects (where responses are more similar as a result of their order).


An example of a contrast effect can be seen in a Pew Research Center poll conducted in October 2003, a dozen years before same-sex marriage was legalized in the U.S. That poll found that people were more likely to favor allowing gays and lesbians to enter into legal agreements that give them the same rights as married couples when this question was asked after one about whether they favored or opposed allowing gays and lesbians to marry (45% favored legal agreements when asked after the marriage question, but 37% favored legal agreements without the immediate preceding context of a question about same-sex marriage). Responses to the question about same-sex marriage, meanwhile, were not significantly affected by its placement before or after the legal agreements question.


Another experiment embedded in a December 2008 Pew Research Center poll also resulted in a contrast effect. When people were asked “All in all, are you satisfied or dissatisfied with the way things are going in this country today?” immediately after having been asked “Do you approve or disapprove of the way George W. Bush is handling his job as president?”, 88% said they were dissatisfied, compared with only 78% without the context of the prior question.

Responses to presidential approval remained relatively unchanged whether national satisfaction was asked before or after it. A similar finding occurred in December 2004 when both satisfaction and presidential approval were much higher (57% were dissatisfied when Bush approval was asked first vs. 51% when general satisfaction was asked first).
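One standard way to ask whether a gap like the 88% versus 78% split above is larger than chance is a two-proportion z-test. The sample sizes in the sketch below are invented because the poll write-up does not state them, and Pew’s own significance testing also accounts for survey weights and design effects, which this toy version ignores.

```python
# Two-proportion z-test for a difference between two independently sampled groups.
from math import sqrt

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical sample sizes: roughly 750 respondents per question-order condition.
z = two_proportion_z(0.88, 750, 0.78, 750)
print(f"z = {z:.2f}")  # |z| > 1.96 would be significant at the 5% level
```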

Several studies also have shown that asking a more specific question before a more general question (e.g., asking about happiness with one’s marriage before asking about one’s overall happiness) can result in a contrast effect. Although some exceptions have been found, people tend to avoid redundancy by excluding the more specific question from the general rating.

Assimilation effects occur when responses to two questions are more consistent or closer together because of their placement in the questionnaire. We found an example of an assimilation effect in a Pew Research Center poll conducted in November 2008 when we asked whether Republican leaders should work with Obama or stand up to him on important issues and whether Democratic leaders should work with Republican leaders or stand up to them on important issues. People were more likely to say that Republican leaders should work with Obama when the question was preceded by the one asking what Democratic leaders should do in working with Republican leaders (81% vs. 66%). However, when people were first asked about Republican leaders working with Obama, fewer said that Democratic leaders should work with Republican leaders (71% vs. 82%).

The order in which questions are asked is of particular importance when tracking trends over time. As a result, care should be taken to ensure that the context is similar each time a question is asked. Modifying the context of the question could call into question any observed changes over time (see measuring change over time for more information).

A questionnaire, like a conversation, should be grouped by topic and unfold in a logical order. It is often helpful to begin the survey with simple questions that respondents will find interesting and engaging. Throughout the survey, an effort should be made to keep the survey interesting and not overburden respondents with several difficult questions right after one another. Demographic questions such as income, education or age should not be asked near the beginning of a survey unless they are needed to determine eligibility for the survey or for routing respondents through particular sections of the questionnaire. Even then, it is best to precede such items with more interesting and engaging questions. One virtue of survey panels like the ATP is that demographic questions usually only need to be asked once a year, not in each survey.


The Creature of Mary Shelley’s Frankenstein

This essay about the lack of a personal name for the creature in Mary Shelley’s *Frankenstein* explores the thematic and narrative significance of this choice. It argues that the absence of a name underscores the creature’s role as a scientific experiment rather than a recognized person, highlighting Victor Frankenstein’s moral failure and the creature’s societal rejection. The essay discusses how the creature’s various labels such as “monster,” “demon,” and “fiend” dehumanize him and emphasize his outsider status. Additionally, it examines the creature’s self-awareness and existential plight, noting that his lack of a name paradoxically becomes a central part of his identity. This namelessness invites readers to reflect on broader themes of identity, alienation, and the societal norms that define individual existence. Through this analysis, the essay demonstrates how the creature’s unnamed status enriches the novel’s exploration of these complex themes.


In Mary Shelley’s seminal work *Frankenstein*, the creature created by Victor Frankenstein is a pivotal character whose identity and existential plight are central to the novel’s narrative and thematic depth. Curiously, despite being one of the most iconic figures in literature, the creature is never given a personal name by his creator or by the author, a fact that adds layers of meaning to his characterization and to the story as a whole.

The absence of a name for the creature is significant in several ways. First and foremost, it reflects his ambiguous status as both a human-like being and a scientific experiment. By not naming his creation, Victor Frankenstein emphasizes the creature’s role as a scientific subject and product rather than recognizing him as a person. This depersonalization is a crucial element in understanding Victor’s moral failure and the tragic trajectory of the creature’s life. The lack of a name underscores the creature’s isolation and societal rejection, intensifying his struggle for identity and acceptance.

Furthermore, the creature’s lack of a name contributes to his symbolic function within the novel. He embodies broader themes of alienation, otherness, and the search for self-definition. Throughout the narrative, he is referred to in various impersonal terms such as “the creature,” “the monster,” “the demon,” “the wretch,” and “the fiend.” Each of these labels carries connotations that shape our perception of him, highlighting his exclusion and the horror he inspires in others. The names he is called by others serve to dehumanize him and justify the cruelty and prejudice he faces.

The creature’s awareness of his own namelessness is poignantly expressed in the novel when he compares himself to Adam, the first man, who was also without a mate, yet under completely different circumstances. Unlike Adam, the creature laments, he is “wretched, helpless, and alone.” This reference not only highlights his solitude but also his acute consciousness of his uniqueness and abandonment. It is through this self-awareness that the creature’s lack of a name paradoxically becomes a source of his identity. His namelessness reflects his unique existential predicament as a being created artificially and not born naturally, a being outside the normal social and moral order.

Additionally, the creature’s quest for identity and acceptance without a name invites a deeper reflection on the nature of identity itself. It challenges the reader to question what constitutes one’s identity—is it given by others through a name and social recognition, or can it be self-defined through one’s actions and experiences? The creature’s eloquent pleas for understanding and his philosophical musings about his own nature and fate are central to this inquiry, offering a profound critique of the societal norms that define and often constrain individual identity.

In conclusion, the creature’s lack of a name in Mary Shelley’s *Frankenstein* is not merely a trivial detail but a fundamental aspect of his characterization and a powerful narrative device. It underscores his role as Victor Frankenstein’s experiment and highlights the ethical implications of his creation. The creature’s namelessness also enriches the novel’s exploration of identity, alienation, and the existential struggles associated with being an outsider. Through his voice, Shelley invites readers to empathize with the creature, challenging the labels and perceptions that define and often limit us.


