Program Assessment

Program evaluation looks at the parameters, needs, components, and outcomes of program design with an eye towards improving student learning. It involves a complex approach, taking into consideration needs assessment, curriculum mapping, and various models of program review.

Section 1. A Framework for Program Evaluation: A Gateway to Tools

This section is adapted from the article "Recommended Framework for Program Evaluation in Public Health Practice," by Bobby Milstein, Scott Wetterhall, and the CDC Evaluation Working Group.

Around the world, there exist many programs and interventions developed to improve conditions in local communities. Communities come together to reduce the level of violence that exists, to work for safe, affordable housing for everyone, or to help more students do well in school, to give just a few examples.

But how do we know whether these programs are working? If they are not effective - and even if they are - how can we improve them for local communities? And finally, how can an organization make intelligent choices about which promising programs are likely to work best in its community?

In recent years, there has been a growing trend toward better use of evaluation to understand and improve practice. The systematic use of evaluation has solved many problems and helped countless community-based organizations do what they do better.

Despite an increased understanding of the need for - and the use of - evaluation, however, a basic agreed-upon framework for program evaluation has been lacking. In 1997, scientists at the United States Centers for Disease Control and Prevention (CDC) recognized the need to develop such a framework. As a result, the CDC assembled an Evaluation Working Group made up of experts in the fields of public health and evaluation. Members were asked to develop a framework that summarizes and organizes the basic elements of program evaluation. This Community Tool Box section describes the framework resulting from the Working Group's efforts.

Before we begin, however, we'd like to offer some definitions of terms that we will use throughout this section.

By evaluation, we mean the systematic investigation of the merit, worth, or significance of an object or effort. Evaluation practice has changed dramatically during the past three decades - new methods and approaches have been developed and it is now used for increasingly diverse projects and audiences.

Throughout this section, the term program is used to describe the object or effort that is being evaluated. It may apply to any action with the goal of improving outcomes for whole communities, for more specific sectors (e.g., schools, work places), or for sub-groups (e.g., youth, people experiencing violence or HIV/AIDS). This definition is meant to be very broad.

Examples of different types of programs include:

  • Direct service interventions (e.g., a program that offers free breakfast to improve nutrition for grade school children)
  • Community mobilization efforts (e.g., organizing a boycott of California grapes to improve the economic well-being of farm workers)
  • Research initiatives (e.g., an effort to find out whether inequities in health outcomes based on race can be reduced)
  • Surveillance systems (e.g., whether early detection of school readiness improves educational outcomes)
  • Advocacy work (e.g., a campaign to influence the state legislature to pass legislation regarding tobacco control)
  • Social marketing campaigns (e.g., a campaign in the Third World encouraging mothers to breast-feed their babies to reduce infant mortality)
  • Infrastructure building projects (e.g., a program to build the capacity of state agencies to support community development initiatives)
  • Training programs (e.g., a job training program to reduce unemployment in urban neighborhoods)
  • Administrative systems (e.g., an incentive program to improve efficiency of health services)

Program evaluation - the type of evaluation discussed in this section - is an essential organizational practice for all types of community health and development work. It is a way to evaluate the specific projects and activities community groups may take part in, rather than to evaluate an entire organization or comprehensive community initiative.

Stakeholders refer to those who care about the program or effort. These may include those presumed to benefit (e.g., children and their parents or guardians), those with particular influence (e.g., elected or appointed officials), and those who might support the effort (i.e., potential allies) or oppose it (i.e., potential opponents). Key questions in thinking about stakeholders are: Who cares? What do they care about?

This section presents a framework that promotes a common understanding of program evaluation. The overall goal is to make it easier for everyone involved in community health and development work to evaluate their efforts.

Why evaluate community health and development programs?

The type of evaluation we talk about in this section can be closely tied to everyday program operations. Our emphasis is on practical, ongoing evaluation that involves program staff, community members, and other stakeholders, not just evaluation experts. This type of evaluation offers many advantages for community health and development professionals.

For example, it complements program management by:

  • Helping to clarify program plans
  • Improving communication among partners
  • Gathering the feedback needed to improve and be accountable for program effectiveness

It's important to remember, too, that evaluation is not a new activity for those of us working to improve our communities. In fact, we assess the merit of our work all the time when we ask questions, consult partners, make assessments based on feedback, and then use those judgments to improve our work. When the stakes are low, this type of informal evaluation might be enough. However, when the stakes are raised - when a good deal of time or money is involved, or when many people may be affected - then it may make sense for your organization to use evaluation procedures that are more formal, visible, and justifiable.

How do you evaluate a specific program?

Before your organization starts a program evaluation, your group should be very clear about the answers to the following questions:

  • What will be evaluated?
  • What criteria will be used to judge program performance?
  • What standards of performance on the criteria must be reached for the program to be considered successful?
  • What evidence will indicate performance on the criteria relative to the standards?
  • What conclusions about program performance are justified based on the available evidence?

To clarify the meaning of each, let's look at some of the answers for Drive Smart, a hypothetical program begun to stop drunk driving.

What will be evaluated?

  • Drive Smart, a program focused on reducing drunk driving through public education and intervention

What criteria will be used to judge program performance?

  • The number of community residents who are familiar with the program and its goals
  • The number of people who use "Safe Rides" volunteer taxis to get home
  • The percentage of people who report drinking and driving
  • The reported number of single car night time crashes (a common way to try to determine whether the number of people who drive drunk is changing)

What standards of performance on the criteria must be reached for the program to be considered successful?

  • 80% of community residents will know about the program and its goals after the first year of the program
  • The number of people who use the "Safe Rides" taxis will increase by 20% in the first year
  • The percentage of people who report drinking and driving will decrease by 20% in the first year
  • The reported number of single car night time crashes will decrease by 10% in the program's first two years

What evidence will indicate performance on the criteria relative to the standards?

  • A random telephone survey will demonstrate community residents' knowledge of the program and changes in reported behavior
  • Logs from "Safe Rides" will tell how many people use their services
  • Information on single car night time crashes will be gathered from police records

What conclusions about program performance are justified based on the available evidence?

  • Are the changes we have seen in the level of drunk driving due to our efforts, or to something else?
  • If there is no change (or not enough change) in behavior or outcomes, should Drive Smart change what it is doing, or have we just not waited long enough to see results?
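One way to keep these questions and answers in view throughout an evaluation is to record them in a simple, structured form. The Python sketch below is purely illustrative: the field names are our own, and the figures come from the hypothetical Drive Smart example above.

```python
# A minimal, illustrative sketch: the hypothetical "Drive Smart" evaluation
# plan captured as a plain data structure. Field names are our own invention;
# the figures come from the example above.

drive_smart_plan = {
    "what_is_evaluated": "Drive Smart, a public education and intervention "
                         "program to reduce drunk driving",
    "criteria_and_standards": [
        # (criterion, standard of performance, evidence source)
        ("Residents familiar with the program and its goals",
         "80% awareness after year one",
         "random telephone survey"),
        ("Use of 'Safe Rides' volunteer taxis",
         "20% increase in year one",
         "Safe Rides ride logs"),
        ("Self-reported drinking and driving",
         "20% decrease in year one",
         "random telephone survey"),
        ("Single-car night-time crashes",
         "10% decrease over two years",
         "police records"),
    ],
    "conclusion_questions": [
        "Are observed changes due to our efforts or to something else?",
        "Should Drive Smart change what it is doing, or has too little "
        "time passed to see results?",
    ],
}

# Example use: print the plan as a checklist before data collection begins.
for criterion, standard, evidence in drive_smart_plan["criteria_and_standards"]:
    print(f"- {criterion}\n    standard: {standard}\n    evidence: {evidence}")
```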

The following framework provides an organized approach to answer these questions.

A framework for program evaluation

Program evaluation offers a way to understand and improve community health and development practice using methods that are useful, feasible, proper, and accurate. The framework described below is a practical non-prescriptive tool that summarizes in a logical order the important elements of program evaluation.

The framework contains two related dimensions:

  • Steps in evaluation practice, and
  • Standards for "good" evaluation.

The six connected steps of the framework are actions that should be a part of any evaluation. Although in practice the steps may be encountered out of order, it will usually make sense to follow them in the recommended sequence. That's because earlier steps provide the foundation for subsequent progress. Thus, decisions about how to carry out a given step should not be finalized until prior steps have been thoroughly addressed.

However, these steps are meant to be adaptable, not rigid. Sensitivity to each program's unique context (for example, the program's history and organizational climate) is essential for sound evaluation. They are intended to serve as starting points around which community organizations can tailor an evaluation to best meet their needs.

  • Engage stakeholders
  • Describe the program
  • Focus the evaluation design
  • Gather credible evidence
  • Justify conclusions
  • Ensure use and share lessons learned

Understanding and adhering to these basic steps will improve most evaluation efforts.

The second part of the framework is a basic set of standards to assess the quality of evaluation activities. There are 30 specific standards, organized into the following four groups:

  • Utility
  • Feasibility
  • Propriety
  • Accuracy

These standards help answer the question, "Will this evaluation be a 'good' evaluation?" They are recommended as the initial criteria by which to judge the quality of the program evaluation efforts.

Engage Stakeholders

Stakeholders are people or organizations that have something to gain or lose from what will be learned from an evaluation, and also in what will be done with that knowledge. Evaluation cannot be done in isolation. Almost everything done in community health and development work involves partnerships - alliances among different organizations, board members, those affected by the problem, and others. Therefore, any serious effort to evaluate a program must consider the different values held by the partners. Stakeholders must be part of the evaluation to ensure that their unique perspectives are understood. When stakeholders are not appropriately involved, evaluation findings are likely to be ignored, criticized, or resisted.

However, if they are part of the process, people are likely to feel a good deal of ownership for the evaluation process and results. They will probably want to develop it, defend it, and make sure that the evaluation really works.

That's why this evaluation cycle begins by engaging stakeholders. Once involved, these people will help to carry out each of the steps that follows.

Three principal groups of stakeholders are important to involve:

  • People or organizations involved in program operations may include community members, sponsors, collaborators, coalition partners, funding officials, administrators, managers, and staff.
  • People or organizations served or affected by the program may include clients, family members, neighborhood organizations, academic institutions, elected and appointed officials, advocacy groups, and community residents. Individuals who are openly skeptical of or antagonistic toward the program may also be important to involve. Opening an evaluation to opposing perspectives and enlisting the help of potential program opponents can strengthen the evaluation's credibility.

Likewise, individuals or groups who could be adversely or inadvertently affected by changes arising from the evaluation have a right to be engaged. For example, it is important to include those who would be affected if program services were expanded, altered, limited, or ended as a result of the evaluation.

  • Primary intended users of the evaluation are the specific individuals who are in a position to decide and/or do something with the results. They shouldn't be confused with the primary intended users of the program, although some of those users should be involved in this group. In fact, primary intended users should be a subset of all of the stakeholders who have been identified. A successful evaluation will designate primary intended users, such as program staff and funders, early in its development and maintain frequent interaction with them to be sure that the evaluation specifically addresses their values and needs.

The amount and type of stakeholder involvement will be different for each program evaluation. For instance, stakeholders can be directly involved in designing and conducting the evaluation. They can be kept informed about progress of the evaluation through periodic meetings, reports, and other means of communication.

It may be helpful, when working with a group such as this, to develop an explicit process to share power and resolve conflicts. This may help avoid overemphasizing the values held by any one stakeholder.

Describe the Program

A program description is a summary of the intervention being evaluated. It should explain what the program is trying to accomplish and how it tries to bring about those changes. The description will also illustrate the program's core components and elements, its ability to make changes, its stage of development, and how the program fits into the larger organizational and community environment.

How a program is described sets the frame of reference for all future decisions about its evaluation. For example, if a program is described as, "attempting to strengthen enforcement of existing laws that discourage underage drinking," the evaluation might be very different than if it is described as, "a program to reduce drunk driving by teens." Also, the description allows members of the group to compare the program to other similar efforts, and it makes it easier to figure out what parts of the program brought about what effects.

Moreover, different stakeholders may have different ideas about what the program is supposed to achieve and why. For example, a program to reduce teen pregnancy may have some members who believe this means only increasing access to contraceptives, and other members who believe it means only focusing on abstinence.

Evaluations done without agreement on the program definition aren't likely to be very useful. In many cases, the process of working with stakeholders to develop a clear and logical program description will bring benefits long before data are available to measure program effectiveness.

There are several specific aspects that should be included when describing a program.

Statement of need

A statement of need describes the problem, goal, or opportunity that the program addresses; it also begins to imply what the program will do in response. Important features to note regarding a program's need are: the nature of the problem or goal, who is affected, how big it is, and whether (and how) it is changing.

Expectations

Expectations are the program's intended results. They describe what the program has to accomplish to be considered successful. For most programs, the accomplishments exist on a continuum (first, we want to accomplish X... then, we want to do Y...). Therefore, they should be organized by time, ranging from specific (and immediate) to broad (and longer-term) consequences. For example, a program's vision, mission, goals, and objectives all represent varying levels of specificity about a program's expectations.

Activities

Activities are everything the program does to bring about changes. Describing program components and elements permits specific strategies and actions to be listed in logical sequence. This also shows how different program activities, such as education and enforcement, relate to one another. Describing program activities also provides an opportunity to distinguish activities that are the direct responsibility of the program from those that are conducted by related programs or partner organizations. Things outside of the program that may affect its success, such as harsher laws punishing businesses that sell alcohol to minors, can also be noted.

Resources

Resources include the time, talent, equipment, information, money, and other assets available to conduct program activities. Reviewing the resources a program has tells a lot about the amount and intensity of its services. It may also point out situations where there is a mismatch between what the group wants to do and the resources available to carry out these activities. Understanding program costs is a necessity to assess the cost-benefit ratio as part of the evaluation.

Stage of development

A program's stage of development reflects its maturity. All community health and development programs mature and change over time. People who conduct evaluations, as well as those who use their findings, need to consider the dynamic nature of programs. For example, a new program that just received its first grant may differ in many respects from one that has been running for over a decade.

At least three phases of development are commonly recognized: planning, implementation, and effects or outcomes. In the planning stage, program activities are untested and the goal of evaluation is to refine plans as much as possible. In the implementation phase, program activities are being field tested and modified; the goal of evaluation is to see what happens in the "real world" and to improve operations. In the effects stage, enough time has passed for the program's effects to emerge; the goal of evaluation is to identify and understand the program's results, including those that were unintentional.

Context

A description of the program's context considers the important features of the environment in which the program operates. This includes understanding the area's history, geography, politics, and social and economic conditions, and also what other organizations have done. A realistic and responsive evaluation is sensitive to a broad range of potential influences on the program. An understanding of the context lets users interpret findings accurately and assess their generalizability. For example, a program to improve housing in an inner-city neighborhood might have been a tremendous success, but would likely not work in a small town on the other side of the country without significant adaptation.

Logic model

A logic model synthesizes the main program elements into a picture of how the program is supposed to work. It makes explicit the sequence of events that are presumed to bring about change. Often this logic is displayed in a flow-chart, map, or table to portray the sequence of steps leading to program results.

Creating a logic model allows stakeholders to improve and focus program direction. It reveals assumptions about conditions for program effectiveness and provides a frame of reference for one or more evaluations of the program. A detailed logic model can also be a basis for estimating the program's effect on endpoints that are not directly measured. For example, it may be possible to estimate the rate of reduction in disease from a known number of persons experiencing the intervention if there is prior knowledge about its effectiveness.
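A logic model does not have to be elaborate. The sketch below lays out a presumed sequence for the hypothetical Drive Smart program using the common inputs-activities-outputs-outcomes convention; both the stage names and the entries are illustrative assumptions, not part of the framework itself.

```python
# A minimal logic-model sketch for the hypothetical Drive Smart program.
# The four stages are a common convention; the entries are illustrative only.

logic_model = [
    ("inputs",     ["volunteer drivers", "grant funding", "police crash data"]),
    ("activities", ["public education campaign", "'Safe Rides' taxi service"]),
    ("outputs",    ["residents reached by the campaign", "rides provided"]),
    ("outcomes",   ["less self-reported drinking and driving",
                    "fewer single-car night-time crashes"]),
]

# Display the presumed chain of events from resources to results.
print("  ->  ".join(stage for stage, _ in logic_model))
for stage, items in logic_model:
    print(f"{stage}:")
    for item in items:
        print(f"  - {item}")
```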

The breadth and depth of a program description will vary for each program evaluation. And so, many different activities may be part of developing that description. For instance, multiple sources of information could be pulled together to construct a well-rounded description. The accuracy of an existing program description could be confirmed through discussion with stakeholders. Descriptions of what's going on could be checked against direct observation of activities in the field. A narrow program description could be fleshed out by addressing contextual factors (such as staff turnover, inadequate resources, political pressures, or strong community participation) that may affect program performance.

Focus the Evaluation Design

By focusing the evaluation design, we mean doing advance planning about where the evaluation is headed, and what steps it will take to get there. It isn't possible or useful for an evaluation to try to answer all questions for all stakeholders; there must be a focus. A well-focused plan is a safeguard against using time and resources inefficiently.

Depending on what you want to learn, some types of evaluation will be better suited than others. However, once data collection begins, it may be difficult or impossible to change what you are doing, even if it becomes obvious that other methods would work better. A thorough plan anticipates intended uses and creates an evaluation strategy with the greatest chance to be useful, feasible, proper, and accurate.

Among the issues to consider when focusing an evaluation are:

Purpose refers to the general intent of the evaluation. A clear purpose serves as the basis for the design, methods, and use of the evaluation. Taking time to articulate an overall purpose will stop your organization from making uninformed decisions about how the evaluation should be conducted and used.

There are at least four general purposes for which a community group might conduct an evaluation:

  • To gain insight. This happens, for example, when deciding whether to use a new approach (e.g., would a neighborhood watch program work for our community?). Knowledge from such an evaluation will provide information about its practicality. For a developing program, information from evaluations of similar programs can provide the insight needed to clarify how its activities should be designed.
  • To improve how things get done. This is appropriate in the implementation stage, when an established program tries to describe what it has done. This information can be used to describe program processes, to improve how the program operates, and to fine-tune the overall strategy. Evaluations done for this purpose include efforts to improve the quality, effectiveness, or efficiency of program activities.
  • To determine what the effects of the program are. Evaluations done for this purpose examine the relationship between program activities and observed consequences. For example, are more students finishing high school as a result of the program? Programs most appropriate for this type of evaluation are mature programs that are able to state clearly what happened and who it happened to. Such evaluations should provide evidence about the program's contribution to reaching longer-term goals such as a decrease in child abuse or crime in the area. This type of evaluation helps establish the accountability, and thus the credibility, of a program to funders and to the community.
  • To affect those who take part in the program. The process of evaluation can itself be a catalyst for change. For example, an evaluation can:
      • Empower program participants (for example, being part of an evaluation can increase community members' sense of control over the program);
      • Supplement the program (for example, using a follow-up questionnaire can reinforce the main messages of the program);
      • Promote staff development (for example, by teaching staff how to collect, analyze, and interpret evidence); or
      • Contribute to organizational growth (for example, the evaluation may clarify how the program relates to the organization's mission).

Users are the specific individuals who will receive evaluation findings. They will directly experience the consequences of inevitable trade-offs in the evaluation process. For example, a trade-off might be having a relatively modest evaluation to fit the budget with the outcome that the evaluation results will be less certain than they would be for a full-scale evaluation. Because they will be affected by these tradeoffs, intended users have a right to participate in choosing a focus for the evaluation. An evaluation designed without adequate user involvement in selecting the focus can become a misguided and irrelevant exercise. By contrast, when users are encouraged to clarify intended uses, priority questions, and preferred methods, the evaluation is more likely to focus on things that will inform (and influence) future actions.

Uses describe what will be done with what is learned from the evaluation. There is a wide range of potential uses for program evaluation. Generally speaking, the uses fall in the same four categories as the purposes listed above: to gain insight, improve how things get done, determine what the effects of the program are, and affect participants. The following list gives examples of uses in each category.

Some specific examples of evaluation uses

To gain insight:

  • Assess needs and wants of community members
  • Identify barriers to use of the program
  • Learn how to best describe and measure program activities

To improve how things get done:

  • Refine plans for introducing a new practice
  • Determine the extent to which plans were implemented
  • Improve educational materials
  • Enhance cultural competence
  • Verify that participants' rights are protected
  • Set priorities for staff training
  • Make mid-course adjustments
  • Clarify communication
  • Determine if client satisfaction can be improved
  • Compare costs to benefits
  • Find out which participants benefit most from the program
  • Mobilize community support for the program

To determine what the effects of the program are:

  • Assess skills development by program participants
  • Compare changes in behavior over time
  • Decide where to allocate new resources
  • Document the level of success in accomplishing objectives
  • Demonstrate that accountability requirements are fulfilled
  • Use information from multiple evaluations to predict the likely effects of similar programs

To affect participants:

  • Reinforce messages of the program
  • Stimulate dialogue and raise awareness about community issues
  • Broaden consensus among partners about program goals
  • Teach evaluation skills to staff and other stakeholders
  • Gather success stories
  • Support organizational change and improvement

The evaluation needs to answer specific questions. Drafting questions encourages stakeholders to reveal what they believe the evaluation should answer. That is, which questions are most important to stakeholders? The process of developing evaluation questions further refines the focus of the evaluation.

The methods available for an evaluation are drawn from behavioral science and social research and development. Three types of methods are commonly recognized: experimental, quasi-experimental, and observational or case study designs. Experimental designs use random assignment to compare the effect of an intervention between otherwise equivalent groups (for example, comparing a randomly assigned group of students who took part in an after-school reading program with those who didn't). Quasi-experimental methods make comparisons between groups that aren't equivalent (e.g., program participants vs. those on a waiting list) or use comparisons within a group over time, such as an interrupted time series in which the intervention is introduced sequentially across different individuals, groups, or contexts. Observational or case study methods use comparisons within a group to describe and explain what happens (e.g., comparative case studies with multiple communities).

No design is necessarily better than another. Evaluation methods should be selected because they provide the appropriate information to answer stakeholders' questions, not because they are familiar, easy, or popular. The choice of methods has implications for what will count as evidence, how that evidence will be gathered, and what kind of claims can be made. Because each method option has its own biases and limitations, evaluations that mix methods are generally more robust.
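To make the distinction concrete, here is a minimal Python sketch of the simplest experimental comparison: an outcome measured in a randomly assigned program group and a control group, with a permutation test to gauge whether a difference that large could plausibly arise by chance. The scores are invented for illustration, and the permutation test is just one of many possible analysis choices; nothing in the framework prescribes it.

```python
import random
from statistics import mean

# Illustrative only: reading scores for students randomly assigned to an
# after-school program (treatment) or not (control). Invented numbers.
treatment = [74, 81, 69, 88, 77, 83, 72, 79]
control   = [70, 75, 68, 72, 74, 71, 69, 73]

observed_diff = mean(treatment) - mean(control)

# Permutation test: if assignment were unrelated to the outcome, relabeling
# the scores at random should often produce differences as large as the one
# observed. A small p-value suggests the program made a difference.
random.seed(0)
pooled = treatment + control
n_treat = len(treatment)
count = 0
n_permutations = 10_000
for _ in range(n_permutations):
    random.shuffle(pooled)
    diff = mean(pooled[:n_treat]) - mean(pooled[n_treat:])
    if diff >= observed_diff:
        count += 1

p_value = count / n_permutations
print(f"observed difference in means: {observed_diff:.1f}")
print(f"one-sided permutation p-value: {p_value:.3f}")
```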

Over the course of an evaluation, methods may need to be revised or modified. Circumstances that make a particular approach useful can change. For example, the intended use of the evaluation could shift from discovering how to improve the program to helping decide about whether the program should continue or not. Thus, methods may need to be adapted or redesigned to keep the evaluation on track.

Agreements summarize the evaluation procedures and clarify everyone's roles and responsibilities. An agreement describes how the evaluation activities will be implemented. Elements of an agreement include statements about the intended purpose, users, uses, and methods, as well as a summary of the deliverables, those responsible, a timeline, and budget.

The formality of the agreement depends upon the relationships that exist between those involved. For example, it may take the form of a legal contract, a detailed protocol, or a simple memorandum of understanding. Regardless of its formality, creating an explicit agreement provides an opportunity to verify the mutual understanding needed for a successful evaluation. It also provides a basis for modifying procedures if that turns out to be necessary.

As you can see, focusing the evaluation design may involve many activities. For instance, both supporters and skeptics of the program could be consulted to ensure that the proposed evaluation questions are politically viable. A menu of potential evaluation uses appropriate for the program's stage of development could be circulated among stakeholders to determine which is most compelling. Interviews could be held with specific intended users to better understand their information needs and timeline for action. Resource requirements could be reduced when users are willing to employ more timely but less precise evaluation methods.

Gather Credible Evidence

Credible evidence is the raw material of a good evaluation. The information learned should be seen by stakeholders as believable, trustworthy, and relevant to answering their questions. This requires thinking broadly about what counts as "evidence." Such decisions are always situational; they depend on the question being posed and the motives for asking it. For some questions, a stakeholder's standard for credibility could demand the results of a randomized experiment. For another question, a set of well-done, systematic observations, such as interactions between an outreach worker and community residents, will have high credibility. The difference depends on what kind of information the stakeholders want and the situation in which it is gathered.

Context matters! In some situations, it may be necessary to consult evaluation specialists. This may be especially true if concern for data quality is especially high. In other circumstances, local people may offer the deepest insights. Regardless of their expertise, however, those involved in an evaluation should strive to collect information that will convey a credible, well-rounded picture of the program and its efforts.

Having credible evidence strengthens the evaluation results as well as the recommendations that follow from them. Although all types of data have limitations, it is possible to improve an evaluation's overall credibility. One way to do this is by using multiple procedures for gathering, analyzing, and interpreting data. Encouraging participation by stakeholders can also enhance perceived credibility. When stakeholders help define questions and gather data, they will be more likely to accept the evaluation's conclusions and to act on its recommendations.

The following features of evidence gathering typically affect how credible it is seen as being:

Indicators translate general concepts about the program and its expected effects into specific, measurable parts.

Examples of indicators include:

  • The program's capacity to deliver services
  • The participation rate
  • The level of client satisfaction
  • The amount of intervention exposure (how many people were exposed to the program, and for how long they were exposed)
  • Changes in participant behavior
  • Changes in community conditions or norms
  • Changes in the environment (e.g., new programs, policies, or practices)
  • Longer-term changes in population health status (e.g., estimated teen pregnancy rate in the county)

Indicators should address the criteria that will be used to judge the program. That is, they reflect the aspects of the program that are most meaningful to monitor. Several indicators are usually needed to track the implementation and effects of a complex program or intervention.

One way to develop multiple indicators is to create a "balanced scorecard," which contains indicators that are carefully selected to complement one another. According to this strategy, program processes and effects are viewed from multiple perspectives using small groups of related indicators. For instance, a balanced scorecard for a single program might include indicators of how the program is being delivered; what participants think of the program; what effects are observed; what goals were attained; and what changes are occurring in the environment around the program.
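At bottom, a balanced scorecard is a small set of indicators organized by perspective, each tied to a data source. The sketch below groups a few illustrative indicators for the hypothetical Drive Smart program; the perspective names and indicators are assumptions made for the sake of example, not a prescribed list.

```python
# Illustrative balanced scorecard for the hypothetical Drive Smart program:
# a few indicators per perspective, each paired with its data source.

scorecard = {
    "program delivery":        [("education sessions held per month", "staff activity log")],
    "participant reaction":    [("satisfaction with 'Safe Rides'", "rider exit survey")],
    "observed effects":        [("self-reported drinking and driving", "telephone survey"),
                                ("single-car night-time crashes", "police records")],
    "surrounding environment": [("new local ordinances on alcohol sales", "city council minutes")],
}

for perspective, indicators in scorecard.items():
    print(perspective)
    for indicator, source in indicators:
        print(f"  - {indicator} (source: {source})")
```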

Another approach to using multiple indicators is based on a program logic model, such as we discussed earlier in the section. A logic model can be used as a template to define a full spectrum of indicators along the pathway that leads from program activities to expected effects. For each step in the model, qualitative and/or quantitative indicators could be developed.

Indicators can be broad-based and don't need to focus only on a program's long-term goals. They can also address intermediary factors that influence program effectiveness, including such intangible factors as service quality, community capacity, or inter-organizational relations. Indicators for these and similar concepts can be created by systematically identifying and then tracking markers of what is said or done when the concept is expressed.

In the course of an evaluation, indicators may need to be modified or new ones adopted. Also, measuring program performance by tracking indicators is only one part of evaluation, and shouldn't be confused as a basis for decision making in itself. There are definite perils to using performance indicators as a substitute for completing the evaluation process and reaching fully justified conclusions. For example, an indicator, such as a rising rate of unemployment, may be falsely assumed to reflect a failing program when it may actually be due to changing environmental conditions that are beyond the program's control.

Sources of evidence in an evaluation may be people, documents, or observations. More than one source may be used to gather evidence for each indicator. In fact, selecting multiple sources provides an opportunity to include different perspectives about the program and enhances the evaluation's credibility. For instance, an inside perspective may be reflected by internal documents and comments from staff or program managers; whereas clients and those who do not support the program may provide different, but equally relevant perspectives. Mixing these and other perspectives provides a more comprehensive view of the program or intervention.

The criteria used to select sources should be clearly stated so that users and other stakeholders can interpret the evidence accurately and assess if it may be biased. In addition, some sources provide information in narrative form (for example, a person's experience when taking part in the program) and others are numerical (for example, how many people were involved in the program). The integration of qualitative and quantitative information can yield evidence that is more complete and more useful, thus meeting the needs and expectations of a wider range of stakeholders.

Quality refers to the appropriateness and integrity of information gathered in an evaluation. High quality data are reliable and informative; they are easier to collect when the indicators have been well defined. Other factors that affect quality may include instrument design, data collection procedures, training of those involved in data collection, source selection, coding, data management, and routine error checking. Obtaining quality data will entail tradeoffs (e.g., breadth vs. depth); stakeholders should decide together what is most important to them. Because all data have limitations, the intent of a practical evaluation is to strive for a level of quality that meets the stakeholders' threshold for credibility.

Quantity refers to the amount of evidence gathered in an evaluation. It is necessary to estimate in advance the amount of information that will be required and to establish criteria to decide when to stop collecting data - to know when enough is enough. Quantity affects the level of confidence or precision users can have - how sure we are that what we've learned is true. It also partly determines whether the evaluation will be able to detect effects. All evidence collected should have a clear, anticipated use.
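One common rule of thumb for "how much is enough" in survey data is the margin of error for an estimated proportion. The sketch below applies the standard large-sample formula n = z^2 * p(1-p) / e^2; the 80% awareness figure echoes the Drive Smart standard above, while the 95% confidence level and the plus-or-minus 5 point margin are illustrative choices, not requirements of the framework.

```python
from math import ceil

def sample_size_for_proportion(p: float, margin: float, z: float = 1.96) -> int:
    """Large-sample size needed to estimate a proportion p within +/- margin
    at the confidence level implied by z (1.96 is roughly 95%)."""
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Drive Smart standard: roughly 80% of residents aware of the program.
# How many telephone survey responses to estimate awareness within +/- 5 points?
n = sample_size_for_proportion(p=0.80, margin=0.05)
print(f"responses needed: {n}")   # about 246

# A tighter margin costs considerably more data collection.
print(f"within +/- 2 points: {sample_size_for_proportion(0.80, 0.02)}")  # about 1537
```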

By logistics, we mean the methods, timing, and physical infrastructure for gathering and handling evidence. People and organizations also have cultural preferences that dictate acceptable ways of asking questions and collecting information, including who would be perceived as an appropriate person to ask the questions. For example, some participants may be unwilling to discuss their behavior with a stranger, whereas others are more at ease with someone they don't know. Therefore, the techniques for gathering evidence in an evaluation must be in keeping with the cultural norms of the community. Data collection procedures should also ensure that confidentiality is protected.

Justify Conclusions

The process of justifying conclusions recognizes that evidence in an evaluation does not necessarily speak for itself. Evidence must be carefully considered from a number of different stakeholders' perspectives to reach conclusions that are well-substantiated and justified. Conclusions become justified when they are linked to the evidence gathered and judged against agreed-upon values set by the stakeholders. Stakeholders must agree that conclusions are justified in order to use the evaluation results with confidence.

The principal elements involved in justifying conclusions based on evidence are:

Standards

Standards reflect the values held by stakeholders about the program. They provide the basis to make program judgments. The use of explicit standards for judgment is fundamental to sound evaluation. In practice, when stakeholders articulate and negotiate their values, these become the standards to judge whether a given program's performance will, for instance, be considered "successful," "adequate," or "unsuccessful."

Analysis and synthesis

Analysis and synthesis are methods to discover and summarize an evaluation's findings. They are designed to detect patterns in evidence, either by isolating important findings (analysis) or by combining different sources of information to reach a larger understanding (synthesis). Mixed method evaluations require the separate analysis of each evidence element, as well as a synthesis of all sources to examine patterns that emerge. Deciphering facts from a given body of evidence involves deciding how to organize, classify, compare, and display information. These decisions are guided by the questions being asked, the types of data available, and especially by input from stakeholders and primary intended users.

Interpretation

Interpretation is the effort to figure out what the findings mean. Uncovering facts about a program's performance isn't enough to draw conclusions. The facts must be interpreted to understand their practical significance. For example, the finding "15% of the people in our area witnessed a violent act last year" may be interpreted differently depending on the situation. If 50% of community members reported witnessing a violent act when they were surveyed five years ago, the group can suggest that, while still a problem, things are getting better in the community. However, if five years ago only 7% of those surveyed said the same thing, community organizations may see this as a sign that they might want to change what they are doing. In short, interpretations draw on the information and perspectives that stakeholders bring to the evaluation. They can be strengthened through active participation or interaction with the data and preliminary explanations of what happened.
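The arithmetic behind such comparisons is simple, but spelling it out can help a group see why the same 15% figure supports opposite conclusions depending on the baseline. A small illustrative sketch, using the hypothetical numbers from the paragraph above:

```python
# Hypothetical figures from the example above: percent of residents who
# witnessed a violent act in the past year, now versus five years ago.
current = 15.0

for baseline in (50.0, 7.0):
    change = current - baseline
    direction = "improvement" if change < 0 else "worsening"
    print(f"baseline {baseline:.0f}% -> now {current:.0f}%: "
          f"{change:+.0f} percentage points ({direction})")
```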

Judgments

Judgments are statements about the merit, worth, or significance of the program. They are formed by comparing the findings and their interpretations against one or more selected standards. Because multiple standards can be applied to a given program, stakeholders may reach different or even conflicting judgments. For instance, a program that increases its outreach by 10% from the previous year may be judged positively by program managers, based on standards of improved performance over time. Community members, however, may feel that despite improvements, a minimum threshold of access to services has still not been reached. Their judgment, based on standards of social equity, would therefore be negative. Conflicting claims about a program's quality, value, or importance often indicate that stakeholders are using different standards or values in making judgments. This type of disagreement can be a catalyst to clarify values and to negotiate the appropriate basis (or bases) on which the program should be judged.

Recommendations

Recommendations are actions to consider as a result of the evaluation. Forming recommendations requires information beyond just what is necessary to form judgments. For example, knowing that a program is able to increase the services available to battered women doesn't necessarily translate into a recommendation to continue the effort, particularly when there are competing priorities or other effective alternatives. Thus, recommendations about what to do with a given intervention go beyond judgments about a specific program's effectiveness.

If recommendations aren't supported by enough evidence, or if they aren't in keeping with stakeholders' values, they can really undermine an evaluation's credibility. By contrast, an evaluation can be strengthened by recommendations that anticipate and react to what users will want to know.

Three things might increase the chances that recommendations will be relevant and well-received:

  • Sharing draft recommendations
  • Soliciting reactions from multiple stakeholders
  • Presenting options instead of directive advice

Justifying conclusions in an evaluation is a process that involves different possible steps. For instance, conclusions could be strengthened by searching for alternative explanations to the one you have chosen, and then showing why they are unsupported by the evidence. When there are different but equally well supported conclusions, each could be presented with a summary of its strengths and weaknesses. Techniques to analyze, synthesize, and interpret findings might be agreed upon before data collection begins.

Ensure Use and Share Lessons Learned

It is naive to assume that lessons learned in an evaluation will necessarily be used in decision making and subsequent action. Deliberate effort on the part of evaluators is needed to ensure that the evaluation findings will be used appropriately. Preparing for their use involves strategic thinking and continued vigilance in looking for opportunities to communicate and influence. Both of these should begin in the earliest stages of the process and continue throughout the evaluation.

The key elements for ensuring that what is learned from an evaluation is actually used are:

Design refers to how the evaluation's questions, methods, and overall processes are constructed. As discussed in the third step of this framework (focusing the evaluation design), the evaluation should be organized from the start to achieve specific agreed-upon uses. Having a clear purpose that is focused on the use of what is learned helps those who will carry out the evaluation to know who will do what with the findings. Furthermore, the process of creating a clear design will highlight ways that stakeholders, through their many contributions, can improve the evaluation and facilitate the use of the results.

Preparation

Preparation refers to the steps taken to get ready for the future uses of the evaluation findings. The ability to translate new knowledge into appropriate action is a skill that can be strengthened through practice. In fact, building this skill can itself be a useful benefit of the evaluation. It is possible to prepare stakeholders for future use of the results by discussing how potential findings might affect decision making.

For example, primary intended users and other stakeholders could be given a set of hypothetical results and asked what decisions or actions they would make on the basis of this new knowledge. If they indicate that the evidence presented is incomplete or irrelevant and that no action would be taken, then this is an early warning sign that the planned evaluation should be modified. Preparing for use also gives stakeholders more time to explore both positive and negative implications of potential results and to identify different options for program improvement.

Feedback is the communication that occurs among everyone involved in the evaluation. Giving and receiving feedback creates an atmosphere of trust among stakeholders; it keeps an evaluation on track by keeping everyone informed about how the evaluation is proceeding. Primary intended users and other stakeholders have a right to comment on evaluation decisions. From a standpoint of ensuring use, stakeholder feedback is a necessary part of every step in the evaluation. Obtaining valuable feedback can be encouraged by holding discussions during each step of the evaluation and routinely sharing interim findings, provisional interpretations, and draft reports.

Follow-up refers to the support that many users need during the evaluation and after they receive evaluation findings. Because of the amount of effort required, reaching justified conclusions in an evaluation can seem like an end in itself. It is not. Active follow-up may be necessary to remind users of the intended uses of what has been learned. Follow-up may also be required to stop lessons learned from becoming lost or ignored in the process of making complex or political decisions. To guard against such oversight, it may be helpful to have someone involved in the evaluation serve as an advocate for the evaluation's findings during the decision-making phase.

Facilitating the use of evaluation findings also carries with it the responsibility to prevent misuse. Evaluation results are always bounded by the context in which the evaluation was conducted. Some stakeholders, however, may be tempted to take results out of context or to use them for purposes other than those they were developed for. For instance, over-generalizing the results from a single case study to make decisions that affect all sites in a national program is a misuse of a case study evaluation.

Similarly, program opponents may misuse results by overemphasizing negative findings without giving proper credit for what has worked. Active follow-up can help to prevent these and other forms of misuse by ensuring that evidence is only applied to the questions that were the central focus of the evaluation.

Dissemination

Dissemination is the process of communicating the procedures or the lessons learned from an evaluation to relevant audiences in a timely, unbiased, and consistent fashion. Like other elements of the evaluation, the reporting strategy should be discussed in advance with intended users and other stakeholders. Planning effective communications also requires considering the timing, style, tone, message source, vehicle, and format of information products. Regardless of how communications are constructed, the goal for dissemination is to achieve full disclosure and impartial reporting.

Along with the uses for evaluation findings, there are also uses that flow from the very process of evaluating. These "process uses" should be encouraged. The people who take part in an evaluation can experience profound changes in beliefs and behavior. For instance, an evaluation challenges staff members to think differently about what they are doing and to question the assumptions that connect program activities with intended effects.

Evaluation also prompts staff to clarify their understanding of the goals of the program. This greater clarity, in turn, helps staff members to better function as a team focused on a common end. In short, immersion in the logic, reasoning, and values of evaluation can have very positive effects, such as basing decisions on systematic judgments instead of on unfounded assumptions.

Additional process uses for evaluation include:

  • Defining indicators clarifies what really matters to stakeholders
  • Evaluation helps make outcomes matter by changing the reinforcements connected with achieving positive results. For example, a funder might offer "bonus grants" or "outcome dividends" to a program that has shown a significant amount of community change and improvement.

Standards for "good" evaluation

There are standards to assess whether all of the parts of an evaluation are well-designed and working to their greatest potential. The Joint Committee on Standards for Educational Evaluation developed "The Program Evaluation Standards" for this purpose. These standards, designed to assess evaluations of educational programs, are also relevant for programs and interventions related to community health and development.

The program evaluation standards make it practical to conduct sound and fair evaluations. They offer well-supported principles to follow when faced with having to make tradeoffs or compromises. Attending to the standards can guard against an imbalanced evaluation, such as one that is accurate and feasible, but isn't very useful or sensitive to the context. Another example of an imbalanced evaluation is one that would be genuinely useful, but is impossible to carry out.

The following standards can be applied while developing an evaluation design and throughout the course of its implementation. Remember, the standards are written as guiding principles, not as rigid rules to be followed in all situations.

The 30 specific standards are grouped into four categories: utility, feasibility, propriety, and accuracy.

Utility Standards

The utility standards ensure that the evaluation will serve the information needs of its intended users.

The seven utility standards are:

  • Stakeholder Identification: People who are involved in (or will be affected by) the evaluation should be identified, so that their needs can be addressed.
  • Evaluator Credibility: The people conducting the evaluation should be both trustworthy and competent, so that the evaluation will be generally accepted as credible or believable.
  • Information Scope and Selection: Information collected should address pertinent questions about the program, and it should be responsive to the needs and interests of clients and other specified stakeholders.
  • Values Identification: The perspectives, procedures, and rationale used to interpret the findings should be carefully described, so that the bases for judgments about merit and value are clear.
  • Report Clarity: Evaluation reports should clearly describe the program being evaluated, including its context, and the purposes, procedures, and findings of the evaluation. This will help ensure that essential information is provided and easily understood.
  • Report Timeliness and Dissemination: Significant midcourse findings and evaluation reports should be shared with intended users so that they can be used in a timely fashion.
  • Evaluation Impact: Evaluations should be planned, conducted, and reported in ways that encourage follow-through by stakeholders, so that the evaluation will be used.

Feasibility Standards

The feasibility standards are to ensure that the evaluation makes sense - that the steps that are planned are both viable and pragmatic.

The feasibility standards are:

  • Practical Procedures: The evaluation procedures should be practical, to keep disruption of everyday activities to a minimum while needed information is obtained.
  • Political Viability: The evaluation should be planned and conducted with anticipation of the different positions or interests of various groups. This should help in obtaining their cooperation so that possible attempts by these groups to curtail evaluation operations or to misuse the results can be avoided or counteracted.
  • Cost Effectiveness: The evaluation should be efficient and produce enough valuable information that the resources used can be justified.

Propriety Standards

The propriety standards ensure that the evaluation is an ethical one, conducted with regard for the rights and interests of those involved. The eight propriety standards follow.

  • Service Orientation : Evaluations should be designed to help organizations effectively serve the needs of all of the targeted participants.
  • Formal Agreements : The responsibilities in an evaluation (what is to be done, how, by whom, when) should be agreed to in writing, so that those involved are obligated to follow all conditions of the agreement, or to formally renegotiate it.
  • Rights of Human Subjects : Evaluation should be designed and conducted to respect and protect the rights and welfare of human subjects, that is, all participants in the study.
  • Human Interactions : Evaluators should respect basic human dignity and worth when working with other people in an evaluation, so that participants don't feel threatened or harmed.
  • Complete and Fair Assessment : The evaluation should be complete and fair in its examination, recording both strengths and weaknesses of the program being evaluated. This allows strengths to be built upon and problem areas addressed.
  • Disclosure of Findings : The people working on the evaluation should ensure that all of the evaluation findings, along with the limitations of the evaluation, are accessible to everyone affected by the evaluation, and any others with expressed legal rights to receive the results.
  • Conflict of Interest: Conflict of interest should be dealt with openly and honestly, so that it does not compromise the evaluation processes and results.
  • Fiscal Responsibility : The evaluator's use of resources should reflect sound accountability procedures and otherwise be prudent and ethically responsible, so that expenditures are accounted for and appropriate.

Accuracy Standards

The accuracy standards ensure that the evaluation findings are considered correct.

There are 12 accuracy standards:

  • Program Documentation: The program should be described and documented clearly and accurately, so that what is being evaluated is clearly identified.
  • Context Analysis: The context in which the program exists should be thoroughly examined so that likely influences on the program can be identified.
  • Described Purposes and Procedures: The purposes and procedures of the evaluation should be monitored and described in enough detail that they can be identified and assessed.
  • Defensible Information Sources: The sources of information used in a program evaluation should be described in enough detail that the adequacy of the information can be assessed.
  • Valid Information: The information gathering procedures should be chosen or developed and then implemented in such a way that they will assure that the interpretation arrived at is valid.
  • Reliable Information : The information gathering procedures should be chosen or developed and then implemented so that they will assure that the information obtained is sufficiently reliable.
  • Systematic Information: The information from an evaluation should be systematically reviewed and any errors found should be corrected.
  • Analysis of Quantitative Information: Quantitative information - data from observations or surveys - in an evaluation should be appropriately and systematically analyzed so that evaluation questions are effectively answered (a minimal analysis sketch appears after this list).
  • Analysis of Qualitative Information: Qualitative information - descriptive information from interviews and other sources - in an evaluation should be appropriately and systematically analyzed so that evaluation questions are effectively answered.
  • Justified Conclusions: The conclusions reached in an evaluation should be explicitly justified, so that stakeholders can understand their worth.
  • Impartial Reporting: Reporting procedures should guard against the distortion caused by personal feelings and biases of people involved in the evaluation, so that evaluation reports fairly reflect the evaluation findings.
  • Metaevaluation: The evaluation itself should be evaluated against these and other pertinent standards, so that it is appropriately guided and, on completion, stakeholders can closely examine its strengths and weaknesses.
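To make the quantitative-analysis standard more concrete, the sketch below shows one common way of systematically analyzing pre/post program data. It is a minimal illustration only: the file name and column names are hypothetical, and a paired t-test with a standardized mean change is just one of many defensible analyses.

```python
# Minimal sketch: systematic analysis of quantitative pre/post data.
# The CSV file and the column names are hypothetical placeholders.
import pandas as pd
from scipy import stats

df = pd.read_csv("participant_scores.csv")
change = df["post_score"] - df["pre_score"]

t_stat, p_value = stats.ttest_rel(df["post_score"], df["pre_score"])
effect_size = change.mean() / change.std(ddof=1)   # standardized mean change

print(f"Mean change: {change.mean():.2f}")
print(f"Paired t = {t_stat:.2f}, p = {p_value:.4f}")
print(f"Standardized effect size: {effect_size:.2f}")
```

Qualitative information, by contrast, is usually analyzed by coding interview or open-ended responses into themes and checking how consistently those themes recur across sources.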

Applying the framework: Conducting optimal evaluations

Agreement on the worth of evaluation continues to grow; in fact, evaluation is often required by funders and other constituents. So, community health and development professionals can no longer question whether or not to evaluate their programs. Instead, the appropriate questions are:

  • What is the best way to evaluate?
  • What are we learning from the evaluation?
  • How will we use what we learn to become more effective?

The framework for program evaluation helps answer these questions by guiding users to select evaluation strategies that are useful, feasible, proper, and accurate.

To use this framework requires quite a bit of skill in program evaluation. In most cases there are multiple stakeholders to consider, the political context may be divisive, steps don't always follow a logical order, and limited resources may make it difficult to take a preferred course of action. An evaluator's challenge is to devise an optimal strategy, given the conditions she is working under. An optimal strategy is one that accomplishes each step in the framework in a way that takes into account the program context and is able to meet or exceed the relevant standards.

This framework also makes it possible to respond to common concerns about program evaluation. For instance, many evaluations are not undertaken because they are seen as being too expensive. The cost of an evaluation, however, is relative; it depends upon the question being asked and the level of certainty desired for the answer. A simple, low-cost evaluation can deliver information valuable for understanding and improvement.

Rather than discounting evaluations as a time-consuming sideline, the framework encourages evaluations that are timed strategically to provide necessary feedback. This makes it possible to link evaluation closely with everyday practice.

Another concern centers on the perceived technical demands of designing and conducting an evaluation. However, the practical approach endorsed by this framework focuses on questions that can improve the program.

Finally, the prospect of evaluation troubles many staff members because they perceive evaluation methods as punishing ("They just want to show what we're doing wrong."), exclusionary ("Why aren't we part of it? We're the ones who know what's going on."), and adversarial ("It's us against them."). The framework instead encourages an evaluation approach that is designed to be helpful and engages all interested stakeholders in a process that welcomes their participation.

Evaluation is a powerful strategy for distinguishing programs and interventions that make a difference from those that don't. It is a driving force for developing and adapting sound strategies, improving existing programs, and demonstrating the results of investments in time and other resources. It also helps determine if what is being done is worth the cost.

This recommended framework for program evaluation is both a synthesis of existing best practices and a set of standards for further improvement. It supports a practical approach to evaluation based on steps and standards that can be applied in almost any setting. Because the framework is purposefully general, it provides a stable guide to design and conduct a wide range of evaluation efforts in a variety of specific program areas. The framework can be used as a template to create useful evaluation plans that contribute to understanding and improvement. For more on the requirements of good evaluation, and for straightforward steps that make evaluating an intervention more feasible, see The Magenta Book - Guidance for Evaluation.

Online Resources

Are You Ready to Evaluate your Coalition? poses 15 questions to help your group decide whether your coalition is ready to evaluate itself and its work.

The  American Evaluation Association Guiding Principles for Evaluators  helps guide evaluators in their professional practice.

CDC Evaluation Resources  provides a list of resources for evaluation, as well as links to professional associations and journals.

Chapter 11: Community Interventions in the "Introduction to Community Psychology" explains professionally-led versus grassroots interventions, what it means for a community intervention to be effective, why a community needs to be ready for an intervention, and the steps to implementing community interventions.

The Comprehensive Cancer Control Branch Program Evaluation Toolkit is designed to help grantees plan and implement evaluations of their NCCCP-funded programs. It provides general guidance on evaluation principles and techniques, as well as practical templates and tools.

Developing an Effective Evaluation Plan  is a workbook provided by the CDC. In addition to information on designing an evaluation plan, this book also provides worksheets as a step-by-step guide.

EvaluACTION , from the CDC, is designed for people interested in learning about program evaluation and how to apply it to their work. Evaluation is a process, one dependent on what you’re currently doing and on the direction in which you’d like to go. In addition to providing helpful information, the site also features an interactive Evaluation Plan & Logic Model Builder, so you can create customized tools for your organization to use.

Evaluating Your Community-Based Program  is a handbook designed by the American Academy of Pediatrics covering a variety of topics related to evaluation.

GAO Designing Evaluations  is a handbook provided by the U.S. Government Accountability Office with copious information regarding program evaluations.

The CDC's  Introduction to Program Evaluation for Public Health Programs: A Self-Study Guide  is a "how-to" guide for planning and implementing evaluation activities. The manual, based on CDC’s Framework for Program Evaluation in Public Health, is intended to assist with planning, designing, implementing and using comprehensive evaluations in a practical way.

McCormick Foundation Evaluation Guide  is a guide to planning an organization’s evaluation, with several chapters dedicated to gathering information and using it to improve the organization.

A Participatory Model for Evaluating Social Programs from the James Irvine Foundation.

Practical Evaluation for Public Managers  is a guide to evaluation written by the U.S. Department of Health and Human Services.

Penn State Program Evaluation  offers information on collecting different forms of data and how to measure different community markers.

Program Evaluation  information page from Implementation Matters.

The Program Manager's Guide to Evaluation  is a handbook provided by the Administration for Children and Families with detailed answers to nine big questions regarding program evaluation.

Program Planning and Evaluation  is a website created by the University of Arizona. It provides links to information on several topics including methods, funding, types of evaluation, and reporting impacts.

User-Friendly Handbook for Program Evaluation  is a guide to evaluations provided by the National Science Foundation.  This guide includes practical information on quantitative and qualitative methodologies in evaluations.

W.K. Kellogg Foundation Evaluation Handbook  provides a framework for thinking about evaluation as a relevant and useful program tool. It was originally written for program directors with direct responsibility for the ongoing evaluation of the W.K. Kellogg Foundation.

Print Resources

This Community Tool Box section is an edited version of:

CDC Evaluation Working Group. (1999). (Draft). Recommended framework for program evaluation in public health practice . Atlanta, GA: Author.

The article cites the following references:

Adler, M., & Ziglio, E. (1996). Gazing into the oracle: The Delphi method and its application to social policy and community health and development. London: Jessica Kingsley Publishers.

Barrett, F.   Program Evaluation: A Step-by-Step Guide.  Sunnycrest Press, 2013. This practical manual includes helpful tips to develop evaluations, tables illustrating evaluation approaches, evaluation planning and reporting templates, and resources if you want more information.

Basch, C., Sliepcevich, E., Gold, R., Duncan, D., & Kolbe, L. (1985). Avoiding type III errors in health education program evaluation: a case study. Health Education Quarterly. 12(4):315-31.

Bickman L, & Rog, D. (1998). Handbook of applied social research methods. Thousand Oaks, CA: Sage Publications.

Boruch, R.  (1998).  Randomized controlled experiments for evaluation and planning. In Handbook of applied social research methods, edited by Bickman L., & Rog. D. Thousand Oaks, CA: Sage Publications: 161-92.

Centers for Disease Control and Prevention DoHAP. Evaluating CDC HIV prevention programs: guidance and data system . Atlanta, GA: Centers for Disease Control and Prevention, Division of HIV/AIDS Prevention, 1999.

Centers for Disease Control and Prevention. Guidelines for evaluating surveillance systems. Morbidity and Mortality Weekly Report 1988;37(S-5):1-18.

Centers for Disease Control and Prevention. Handbook for evaluating HIV education . Atlanta, GA: Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Division of Adolescent and School Health, 1995.

Cook, T., & Campbell, D. (1979). Quasi-experimentation . Chicago, IL: Rand McNally.

Cook, T.,& Reichardt, C. (1979).  Qualitative and quantitative methods in evaluation research . Beverly Hills, CA: Sage Publications.

Cousins, J.,& Whitmore, E. (1998).   Framing participatory evaluation. In Understanding and practicing participatory evaluation , vol. 80, edited by E Whitmore. San Francisco, CA: Jossey-Bass: 5-24.

Chen, H. (1990).  Theory driven evaluations . Newbury Park, CA: Sage Publications.

de Vries, H., Weijts, W., Dijkstra, M., & Kok, G. (1992).  The utilization of qualitative and quantitative data for health education program planning, implementation, and evaluation: a spiral approach . Health Education Quarterly.1992; 19(1):101-15.

Dyal, W. (1995).  Ten organizational practices of community health and development: a historical perspective . American Journal of Preventive Medicine;11(6):6-8.

Eddy, D. (1998). Performance measurement: problems and solutions. Health Affairs;17(4):7-25.

Harvard Family Research Project. (1998). Performance measurement. In The Evaluation Exchange, vol. 4, pp. 1-15.

Eoyang, G., & Berkas, T. (1996). Evaluation in a complex adaptive system.

Taylor-Powell, E., Steele, S., & Douglah, M. (1999). Planning a program evaluation. Madison, Wisconsin: University of Wisconsin Cooperative Extension.

Fawcett, S.B., Paine-Andrews, A., Francisco, V.T., Schultz, J.A., Richter, K.P., Berkley-Patton, J., Fisher, J., Lewis, R.K., Lopez, C.M., Russos, S., Williams, E.L., Harris, K.J., & Evensen, P. (2001). Evaluating community initiatives for health and development. In I. Rootman, D. McQueen, et al. (Eds.), Evaluating health promotion approaches. (pp. 241-277). Copenhagen, Denmark: World Health Organization - Europe.

Fawcett, S., Sterling, T., Paine-Andrews, A., Harris, K., Francisco, V., et al. (1996). Evaluating community efforts to prevent cardiovascular diseases. Atlanta, GA: Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion.

Fetterman, D., Kaftarian, S., & Wandersman, A. (1996). Empowerment evaluation: knowledge and tools for self-assessment and accountability. Thousand Oaks, CA: Sage Publications.

Frechtling, J.,& Sharp, L. (1997).  User-friendly handbook for mixed method evaluations . Washington, DC: National Science Foundation.

Goodman, R., Speers, M., McLeroy, K., Fawcett, S., Kegler M., et al. (1998).  Identifying and defining the dimensions of community capacity to provide a basis for measurement . Health Education and Behavior;25(3):258-78.

Greene, J.  (1994). Qualitative program evaluation: practice and promise . In Handbook of Qualitative Research, edited by NK Denzin and YS Lincoln. Thousand Oaks, CA: Sage Publications.

Haddix, A., Teutsch. S., Shaffer. P., & Dunet. D. (1996). Prevention effectiveness: a guide to decision analysis and economic evaluation . New York, NY: Oxford University Press.

Hennessy, M. (1998). Evaluation. In Statistics in community health and development, edited by Stroup, D., & Teutsch, S. New York, NY: Oxford University Press: 193-219.

Henry, G. (1998). Graphing data. In Handbook of applied social research methods , edited by Bickman. L., & Rog.  D.. Thousand Oaks, CA: Sage Publications: 527-56.

Henry, G. (1998).  Practical sampling. In Handbook of applied social research methods , edited by  Bickman. L., & Rog. D.. Thousand Oaks, CA: Sage Publications: 101-26.

Institute of Medicine. Improving health in the community: a role for performance monitoring . Washington, DC: National Academy Press, 1997.

Joint Committee on Educational Evaluation, James R. Sanders (Chair). The program evaluation standards: how to assess evaluations of educational programs . Thousand Oaks, CA: Sage Publications, 1994.

Kaplan, R., & Norton, D. (1992). The balanced scorecard: measures that drive performance. Harvard Business Review; Jan-Feb: 71-9.

Kar, S. (1989). Health promotion indicators and actions . New York, NY: Springer Publications.

Knauft, E. (1993).   What independent sector learned from an evaluation of its own hard-to -measure programs . In A vision of evaluation, edited by ST Gray. Washington, DC: Independent Sector.

Koplan, J. (1999)  CDC sets millennium priorities . US Medicine 4-7.

Lipsey, M. (1998). Design sensitivity: statistical power for applied experimental research. In Handbook of applied social research methods, edited by Bickman, L., & Rog, D. Thousand Oaks, CA: Sage Publications: 39-68.

Lipsey, M. (1993). Theory as method: small theories of treatments . New Directions for Program Evaluation;(57):5-38.

Lipsey, M. (1997).  What can you build with thousands of bricks? Musings on the cumulation of knowledge in program evaluation . New Directions for Evaluation; (76): 7-23.

Love, A.  (1991).  Internal evaluation: building organizations from within . Newbury Park, CA: Sage Publications.

Miles, M., & Huberman, A. (1994).  Qualitative data analysis: a sourcebook of methods . Thousand Oaks, CA: Sage Publications, Inc.

National Quality Program. (1999).  National Quality Program , vol. 1999. National Institute of Standards and Technology.

National Quality Program. (1999). Baldridge index outperforms S&P 500 for fifth year, vol. 1999.

National Quality Program. Health care criteria for performance excellence , vol. 1999. National Quality Program, 1998.

Newcomer, K. (1994). Using statistics appropriately. In Handbook of Practical Program Evaluation, edited by Wholey, J., Hatry, H., & Newcomer, K. San Francisco, CA: Jossey-Bass: 389-416.

Patton, M. (1990).  Qualitative evaluation and research methods . Newbury Park, CA: Sage Publications.

Patton, M (1997).  Toward distinguishing empowerment evaluation and placing it in a larger context . Evaluation Practice;18(2):147-63.

Patton, M. (1997).  Utilization-focused evaluation . Thousand Oaks, CA: Sage Publications.

Perrin, B. Effective use and misuse of performance measurement . American Journal of Evaluation 1998;19(3):367-79.

Perrin, E, Koshel J. (1997).  Assessment of performance measures for community health and development, substance abuse, and mental health . Washington, DC: National Academy Press.

Phillips, J. (1997).  Handbook of training evaluation and measurement methods . Houston, TX: Gulf Publishing Company.

Porteous, N., Sheldrick, B., & Stewart, P. (1997). Program evaluation tool kit: a blueprint for community health and development management. Ottawa, Canada: Community health and development Research, Education, and Development Program, Ottawa-Carleton Health Department.

Posavac, E., & Carey R. (1980).  Program evaluation: methods and case studies . Prentice-Hall, Englewood Cliffs, NJ.

Preskill, H. & Torres R. (1998).  Evaluative inquiry for learning in organizations . Thousand Oaks, CA: Sage Publications.

Public Health Functions Project. (1996). The public health workforce: an agenda for the 21st century . Washington, DC: U.S. Department of Health and Human Services, Community health and development Service.

Public Health Training Network. (1998).  Practical evaluation of public health programs . CDC, Atlanta, GA.

Reichardt, C., & Mark M. (1998).  Quasi-experimentation . In Handbook of applied social research methods, edited by L Bickman and DJ Rog. Thousand Oaks, CA: Sage Publications, 193-228.

Rossi, P., & Freeman H.  (1993).  Evaluation: a systematic approach . Newbury Park, CA: Sage Publications.

Rush, B., & Ogborne, A. (1995). Program logic models: expanding their role and structure for program planning and evaluation. Canadian Journal of Program Evaluation; 6:95-106.

Sanders, J. (1993).  Uses of evaluation as a means toward organizational effectiveness. In A vision of evaluation , edited by ST Gray. Washington, DC: Independent Sector.

Schorr, L. (1997).   Common purpose: strengthening families and neighborhoods to rebuild America . New York, NY: Anchor Books, Doubleday.

Scriven, M. (1998) . A minimalist theory of evaluation: the least theory that practice requires . American Journal of Evaluation.

Shadish, W., Cook, T., Leviton, L. (1991).  Foundations of program evaluation . Newbury Park, CA: Sage Publications.

Shadish, W. (1998).   Evaluation theory is who we are. American Journal of Evaluation:19(1):1-19.

Shulha, L., & Cousins, J. (1997).  Evaluation use: theory, research, and practice since 1986 . Evaluation Practice.18(3):195-208

Sieber, J. (1998).   Planning ethically responsible research . In Handbook of applied social research methods, edited by L Bickman and DJ Rog. Thousand Oaks, CA: Sage Publications: 127-56.

Steckler, A., McLeroy, K., Goodman, R., Bird, S., McCormick, L. (1992).  Toward integrating qualitative and quantitative methods: an introduction . Health Education Quarterly;191-8.

Taylor-Powell, E., Rossing, B., Geran, J. (1998). Evaluating collaboratives: reaching the potential. Madison, Wisconsin: University of Wisconsin Cooperative Extension.

Teutsch, S. (1992). A framework for assessing the effectiveness of disease and injury prevention. Morbidity and Mortality Weekly Report: Recommendations and Reports Series;41(RR-3, March 27, 1992):1-13.

Torres, R., Preskill, H., Piontek, M., (1996).   Evaluation strategies for communicating and reporting: enhancing learning in organizations . Thousand Oaks, CA: Sage Publications.

Trochim, W. (1999). Research methods knowledge base.

United Way of America. Measuring program outcomes: a practical approach . Alexandria, VA: United Way of America, 1996.

U.S. General Accounting Office. Case study evaluations . GAO/PEMD-91-10.1.9. Washington, DC: U.S. General Accounting Office, 1990.

U.S. General Accounting Office. Designing evaluations . GAO/PEMD-10.1.4. Washington, DC: U.S. General Accounting Office, 1991.

U.S. General Accounting Office. Managing for results: measuring program results that are under limited federal control . GAO/GGD-99-16. Washington, DC: 1998.

U.S. General Accounting Office. Prospective evaluation methods: the prospective evaluation synthesis. GAO/PEMD-10.1.10. Washington, DC: U.S. General Accounting Office, 1990.

U.S. General Accounting Office. The evaluation synthesis . Washington, DC: U.S. General Accounting Office, 1992.

U.S. General Accounting Office. Using statistical sampling . Washington, DC: U.S. General Accounting Office, 1992.

Wandersman, A., Morrissey, E., Davino, K., Seybolt, D., Crusto, C., et al. Comprehensive quality programming and accountability: eight essential strategies for implementing successful prevention programs . Journal of Primary Prevention 1998;19(1):3-30.

Weiss, C. (1995). Nothing as practical as a good theory: exploring theory-based evaluation for comprehensive community initiatives for families and children. In New Approaches to Evaluating Community Initiatives, edited by Connell, J., Kubisch, A., Schorr, L., & Weiss, C. New York, NY: Aspen Institute.

Weiss, C. (1998).  Have we learned anything new about the use of evaluation? American Journal of Evaluation;19(1):21-33.

Weiss, C. (1997).  How can theory-based evaluation make greater headway? Evaluation Review 1997;21(4):501-24.

W.K. Kellogg Foundation. (1998). The W.K. Kellogg Foundation Evaluation Handbook. Battle Creek, MI: W.K. Kellogg Foundation.

Wong-Reiger, D.,& David, L. (1995).  Using program logic models to plan and evaluate education and prevention programs. In Evaluation Methods Sourcebook II, edited by Love. A.J. Ottawa, Ontario: Canadian Evaluation Society.

Wholey, J., Hatry, H., & Newcomer, K. (2010). Handbook of Practical Program Evaluation. Jossey-Bass. This book serves as a comprehensive guide to the evaluation process and its practical applications for sponsors, program managers, and evaluators.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The Program Evaluation Standards: A Guide for Evaluators and Evaluation Users (3rd ed.). Sage Publications.

Yin, R. (1988).  Case study research: design and methods . Newbury Park, CA: Sage Publications.

Certificate in Education Program Evaluation

Develop expertise in program evaluation theory, methods, and skills to effectively measure and report on the success of programming and add value to your organization.

Curriculum & Schedule

How to register, tuition & funding.

The Certificate in Education Program Evaluation prepares you with an advanced understanding of program evaluation theory, methods, and applications for the 21st century. Through case studies and hands-on exercises, you’ll develop the well-rounded skills and expertise needed to support and influence programs across not only the education sector, but also non-profit organizations, government, and associations.

In the classroom, you’ll learn from academics and advanced practitioners as you work toward designing and presenting your own program evaluation. Upon completing the program, you’ll be able to effectively measure, evaluate, and report on the success of programming within your organization.

  • Ideal for: Professionals in education, non-profits, and the public or private sector
  • Duration: 3 months to 2 years
  • Tuition: $2,797
  • Format: Online & On-Campus
  • Schedule: Saturdays
  • Semester of Entry: Fall, spring, summer

Upon successful completion of the certificate, you'll be able to:

  • Compare evaluation theories and techniques
  • Identify design structure of an evaluation tool
  • Apply appropriate research methodology to program evaluations
  • Design a program or policy evaluation outline
  • Leverage evaluation findings to influence future change

Testimonials from current students and alumni.


Amy did a really exceptional job keeping the conversations within scope of relevance. The activities also helped me develop the mindset to critically think about programs I develop and how I would eventually want to evaluate them.

You must successfully complete the three required courses for a total of 4.90 Continuing Education Units (CEUs), which is equivalent to 49.0 contact hours (the arithmetic is shown after the course list). All three courses must be completed within a two-year period.

  • Program Planning, Analysis and Evaluation (Required; 1.4 CEUs)
  • Research Methods (Required; 2.1 CEUs)
  • Program Evaluation Design (Required; 1.4 CEUs)
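As a quick check of the arithmetic: the three required courses carry 1.4 + 2.1 + 1.4 = 4.9 CEUs, and at the rate implied above of 10 contact hours per CEU this corresponds to the 49.0 contact hours stated earlier.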

What Is Live Online Learning? Live online instruction is enhanced by incorporating various instructional practices and technology tools. Features such as Zoom video conferencing, breakout rooms, and chat allow for real-time interaction and collaboration among learners. Tools like Google Docs, Slides, Sheets, and Canvas Groups facilitate teamwork and information sharing within the learning community. Polling, surveys, and threaded discussion boards promote active engagement and the expression of opinions. It is important to foster social respect, privacy, and incorporate Jesuit values to create a supportive and inclusive online environment. By utilizing these practices and tools effectively, live online instruction can be engaging, interactive, and conducive to meaningful learning experiences.

What Is On-Campus Learning? On-Campus programs combine traditional classroom learning with interactive experiential methodology. Classes typically meet for two or three consecutive days once a month at our downtown Washington, D.C. campus.

Course Schedule

Brandon Daniels


Brandon Daniels is currently the Performance Management Officer, in the Department of General Services, with the Government of the District of Columbia. Dr. Daniels currently serves as the Performance Management ...

Kristen Hodge-Clark


Dr. Kristen N. Hodge-Clark serves as senior assistant dean for program planning within the School of Continuing Studies. In this capacity, she oversees several strategic functions related to the development ...

Mona Levine


Dr. Mona Levine has over thirty years of experience in leadership, administration and instruction in both research universities and community colleges. At Georgetown, she serves as Subject Matter Expert and ...

Please review the refund policies in our Student Handbook before completing your registration.

Degree Requirement

You must hold a bachelor's degree or the equivalent in order to enroll in our certificate programs.

Registration

This certificate is an open-enrollment program. No application is required. Click the "Register Now" button, select your courses, and then click "Add to Cart". Course registration is complete when your payment is processed. You will receive a confirmation email when your payment is received. Please retain the payment confirmation message for your records.

You can combine on-campus and online courses (if available) to complete your certificate. Depending on the certificate program, we may suggest taking courses in a specific order, but this is not a requirement.

Most students register for all courses at the same time and complete their certificate within a few months. However, you may choose to register for courses one by one over time. Once you begin a certificate, you have up to two years from the time you start your first course to complete all required courses.

International Students

International students who enter the U.S. on a valid visa are eligible to enroll in certificate courses. However, Georgetown University cannot sponsor student visas for noncredit professional certificate programs.

A TOEFL examination is not required for non-native English speakers but students are expected to read, write, and comprehend English at the graduate level to fully participate in and gain from the program.

Students from most countries may register for our online certificate programs; however, due to international laws, residents of certain countries are prohibited from registering.

Tuition varies by course. Total program tuition for all 3 courses is $2,797. Most course materials are included.

Noncredit professional certificates do not qualify for federal financial aid, scholarships, grants, or needs-based aid. However, several finance and funding options do exist, as listed below.

Some employers offer funding for employee education or professional development. If an employer guarantees payment for employee education and training, Georgetown will accept an Intent to Pay form . If you are using employer sponsorship or training authorizations, you must submit an Intent to Pay form with your registration.

If your employer will pay for your tuition, select “Third-Party Billing” as your method of payment when you register for courses online. Please submit an Intent to Pay form indicating that your employer or another third party should be billed for tuition. Invoices will not be generated without this form on file.

Federal agencies may also fund training for their employees. For example, agencies may:

  • Pay training and education expenses from appropriated funds or other available funds for training needed to support program functions
  • Reimburse employees for all or part of the costs of training or education
  • Share training and education costs with employees
  • Pay travel expenses for employees assigned to training
  • Adjust an employee's normal work schedule for educational purposes not related to official duties

Georgetown accepts Standard Form-182 (SF-182) for training authorizations from the federal government.

*Federal employees should ask the appropriate budget officer about training budgets available.

Eligible Georgetown employees may use their Tuition Assistance Program (TAP) benefits to fund 90% of the certificate program tuition—employees will be invoiced for the remaining 10% of tuition and must pay any other charges associated with their certificate program. Employees using TAP benefits may work directly with the HR Benefits Office to ensure payment prior to the start of any course. This payment option is only valid if registration occurs at least 10–14 business days prior to the start date of the first course. Any fees incurred due to course withdrawal are the student’s responsibility and are not funded by Georgetown University TAP. For questions regarding TAP benefits, please contact the HR/Benefits Office at [email protected] or (202) 687-2500.
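For example, at the $2,797 total program tuition listed on this page, TAP would cover 90% (about $2,517) and the employee would be invoiced for the remaining 10% (about $280), plus any other charges associated with the certificate program.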

SCS is registered with GoArmyEd.com to accept SF-182 training authorization forms. GoArmyEd.com is the virtual gateway for all eligible active duty, National Guard, and Army Reserve soldiers to request Tuition Assistance (TA) online. GoArmyEd.com is also the virtual gateway for Army Civilians to apply for their Civilian education, training, and leadership development events.

An interest-free payment plan is available for professional certificate programs that run longer than one month and have total tuition of $4,000 or more. The payment plan is structured in the following manner (a worked example appears after the note below):

  • Payment #1: A down payment of 25% of the total tuition balance must be paid online (within 72 hours after you register and select Payment Plan) via the Noncredit Student Portal . Please submit your down payment as soon as possible.
  • Payments #2, #3, and #4: Your remaining balance will be due in three (3) equal monthly installments beginning 30 calendar days after your down payment is processed. Your monthly payments must be paid via credit card in the Noncredit Student Portal . You will be able to access each invoice and payment due date in your student account.

PLEASE NOTE: Automatic Payment Service is not available. You must make each subsequent payment via the Noncredit Student Portal .
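As a hypothetical illustration of the installment structure: for a program with exactly $4,000 in total tuition, the down payment would be 25%, or $1,000, and the remaining $3,000 would be billed in three equal monthly installments of $1,000 each.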

A number of tuition benefits are available through the Department of Veterans Affairs and under various parts of the GI Bill®. Please visit the Resources for Military Students page for additional information and instructions.

Some students choose to finance certificate programs with private education loans. Students are responsible for contacting lenders directly to find out if a noncredit professional certificate program is eligible for a loan. While Georgetown University will not recommend specific lenders, it will certify loans for eligible programs from approved lenders.

For eligible noncredit professional certificate programs, Georgetown University will certify loan amounts up to the full cost of tuition for the program. Tuition does not cover books, travel, or living expenses. Please see individual program pages for tuition rates.

Georgetown University has a unique campus code for Sallie Mae. Our Sallie Mae branch code is 001445-99.

You must be approved for a loan before registering for courses. Follow these steps to pursue a loan option:

  • Check the list of lenders that have offered private education loans in the past to Georgetown University students.
  • Contact the lender and confirm your program is eligible for a private education loan.
  • Obtain the necessary paperwork and apply for the loan.
  • Georgetown will certify loan amounts based on the information below. Please note that our branch code is 001445-99.
  • Payment sent to Georgetown: Select “Third-Party Payment” at the time of registration if the lender is sending funds directly to Georgetown.
  • Enter the information about the lender and then contact Noncredit Student Accounts at [email protected] .

Note: It is your responsibility to contact Georgetown University Noncredit Student Accounts at [email protected] to ensure that your loan is processed.

While you may choose to complete your certificate program in one semester, many programs (but not all) allow up to two years to complete all requirements. As a result, you may choose to register for required and elective courses over several semesters to spread out the cost of tuition over time. We generally offer every course in a program each semester, so you'll have many opportunities to enroll in required and elective courses within the two-year time frame.

Tuition Discounts

Only one tuition discount may be applied at the time of registration. Tuition discounts cannot be combined. Tuition discounts are not applied retroactively. Please contact [email protected] with any questions.

Georgetown University alumni and SCS certificate completers are eligible to receive a 30% tuition discount for many certificates offered within SCS’s Professional Development & Certificates (PDC) portfolio. When registering for an eligible certificate through the SCS website, you will see the "30% Georgetown Alumni Discount" as an option. The Enrollment Team will then verify your eligibility status as a Georgetown University alumnus or certificate completer.

Georgetown SCS offers a 20% discount for eligible certificates to organizations that register 5 or more employees for the same certificate cohort at the same time. Eligible organizations include government agencies, nonprofit agencies, and for-profit businesses. Please contact [email protected] for steps and procedures to ensure your group has access to the discount.

Employees of Boeing receive a 10% tuition discount on select programs and courses.

Employees of companies that belong to the EdAssist education network may receive a 10% tuition discount on select programs and courses. Contact EdAssist directly to find out if you qualify.

Eligible federal employees across the country receive a 10% scholarship applied to the current tuition rate for all SCS degree programs and professional certificate programs each academic semester. Please contact [email protected] for steps to be added to this discount group.


Still Have Questions?

Certificate Admissions and Enrollment Email: [email protected] Phone: (202) 687-7000

Student Accounts Email: [email protected] Phone: (202) 687-7696

Certifying Military Benefits Email: [email protected] Phone: (202) 784-7321


Evaluating Educational Programs

  • Open Access
  • First Online: 18 October 2017

Samuel Ball

Part of the book series: Methodology of Educational Measurement and Assessment (MEMA)


This chapter was written by Samuel Ball and originally published in 1979 by Educational Testing Service and later posthumously in 2011 as a research report in the ETS R&D Scientific and Policy Contributions Series. Ball was one of ETS’s most active program evaluators for 10 years and directed several pacesetting studies including a large-scale evaluation of Sesame Street . The chapter documents the vigorous program of evaluation research conducted at ETS in the 1960s and 1970s, which helped lay the foundation for what was then a fledgling field. This work developed new viewpoints, techniques, and skills for systematically assessing educational programs and led to the creation of principles for program evaluation that still appear relevant today.


1 An Emerging Profession

Evaluating educational programs is an emerging profession, and Educational Testing Service (ETS) has played an active role in its development. The term program evaluation only came into wide use in the mid-1960s, when efforts at systematically assessing programs multiplied. The purpose of this kind of evaluation is to provide information to decision makers who have responsibility for existing or proposed educational programs. For instance, program evaluation may be used to help make decisions concerning whether to develop a program ( needs assessment ), how best to develop a program ( formative evaluation ), and whether to modify—or even continue—an existing program ( summative evaluation ).

Needs assessment is the process by which one identifies needs and decides upon priorities among them. Formative evaluation refers to the process involved when the evaluator helps the program developer—by pretesting program materials, for example. Summative evaluation is the evaluation of the program after it is in operation. Arguments are rife among program evaluators about what kinds of information should be provided in each of these forms of evaluation.

In general, the ETS posture has been to try to obtain the best—that is, the most relevant, valid, and reliable—information that can be obtained within the constraints of cost and time and the needs of the various audiences for the evaluation. Sometimes, this means a tight experimental design with a national sample; at other times, the best information might be obtained through an intensive case study of a single institution. ETS has carried out both traditional and innovative evaluations of both traditional and innovative programs, and staff members also have cooperated with other institutions in planning or executing some aspects of evaluation studies. Along the way, the work by ETS has helped to develop new viewpoints, techniques, and skills.

2 The Range of ETS Program Evaluation Activities

Program evaluation calls for a wide range of skills, and evaluators come from a variety of disciplines: educational psychology, developmental psychology, psychometrics , sociology, statistics, anthropology, educational administration, and a host of subject matter areas. As program evaluation began to emerge as a professional concern, ETS changed, both structurally and functionally, to accommodate it. The structural changes were not exclusively tuned to the needs of conducting program evaluations. Rather, program evaluation, like the teaching of English in a well-run high school, became to some degree the concern of virtually all the professional staff. Thus, new research groups were added, and they augmented the organization’s capability to conduct program evaluations.

The functional response was many-faceted. Two of the earliest evaluation studies conducted by ETS indicate the breadth of the range of interest. In 1965, collaborating with the Pennsylvania State Department of Education, Henry Dyer of ETS set out to establish a set of educational goals against which the performance of the state’s educational system could later be evaluated (Dyer 1965a, b). A unique aspect of this endeavor was Dyer’s insistence that the goal-setting process be opened up to strong participation by the state’s citizens and not left solely to a professional or political elite. (In fact, ETS program evaluation has been marked by a strong emphasis, when at all appropriate, on obtaining community participation.)

The other early evaluation study in which ETS was involved was the now famous Coleman report ( Equality of Educational Opportunity ), issued in 1966 (Coleman et al. 1966 ). ETS staff, under the direction of Albert E. Beaton, had major responsibility for analysis of the massive data generated (see Beaton and Barone , Chap. 8 , this volume). Until then, studies of the effectiveness of the nation’s schools, especially with respect to programs’ educational impact on minorities, had been small-scale. So the collection and analysis of data concerning tens of thousands of students and hundreds of schools and their communities were new experiences for ETS and for the profession of program evaluation.

In the intervening years , the Coleman report (Coleman et al. 1966 ) and the Pennsylvania Goals Study (Dyer 1965a , b ) have become classics of their kind, and from these two auspicious early efforts, ETS has become a center of major program evaluation. Areas of focus include computer-aided instruction, aesthetics and creativity in education, educational television , educational programs for prison inmates, reading programs, camping programs, career education, bilingual education, higher education, preschool programs, special education, and drug programs. (For brief descriptions of ETS work in these areas, as well as for studies that developed relevant measures, see the appendix .) ETS also has evaluated programs relating to year-round schooling, English as a second language , desegregation, performance contracting, women’s education, busing, Title I of the Elementary and Secondary Education Act (ESEA) , accountability , and basic information systems.

One piece of work that must be mentioned is the Encyclopedia of Educational Evaluation , edited by Anderson et al. ( 1975 ). The encyclopedia contains articles by them and 36 other members of the ETS staff. Subtitled Concepts and Techniques for Evaluating Education and Training Programs , it contains 141 articles in all.

3 ETS Contributions to Program Evaluation

Given the innovativeness of many of the programs evaluated, the newness of the profession of program evaluation, and the level of expertise of the ETS staff who have directed these studies, it is not surprising that the evaluations themselves have been marked by innovations for the profession of program evaluation. At the same time, ETS has adopted several principles relative to each aspect of program evaluation. It will be useful to examine these innovations and principles in terms of the phases that a program evaluation usually attends to—goal setting, measurement selection, implementation in the field setting, analysis, and interpretation and presentation of evidence.

3.1 Making Goals Explicit

It would be a pleasure to report that virtually every educational program has a well-thought-through set of goals, but it is not so. It is, therefore, necessary at times for program evaluators to help verbalize and clarify the goals of a program to ensure that they are, at least, explicit. Further, the evaluator may even be given goal development as a primary task, as in the Pennsylvania Goals Study (Dyer 1965a , b ). This need was seen again in a similar program, when Robert Feldmesser ( 1973 ) helped the New Jersey State Board of Education establish goals that underwrite conceptually that state’s “thorough and efficient” education program.

Work by ETS staff indicates there are four important principles with respect to program goal development and explication. The first of these principles is as follows: What program developers say their program goals are may bear only a passing resemblance to what the program in fact seems to be doing.

This principle—the occasional surrealistic quality of program goals—has been noted on a number of occasions: For example, assessment instruments developed for a program evaluation on the basis of the stated goals sometimes do not seem at all sensitive to the actual curriculum. As a result, ETS program evaluators seek, whenever possible, to cooperate with program developers to help fashion the goals statement. The evaluators also will attempt to describe the program in operation and relate that description to the stated goals, as in the case of the 1971 evaluation of the second year of Sesame Street for Children’s Television Workshop (Bogatz and Ball 1971 ). This comparison is an important part of the process and represents sometimes crucial information for decision makers concerned with developing or modifying a program.

The second principle is as follows: When program evaluators work cooperatively with developers in making program goals explicit, both the program and the evaluation seem to benefit.

The original Sesame Street evaluation (Ball and Bogatz, 1970 ) exemplified the usefulness of this cooperation. At the earliest planning sessions for the program, before it had a name and before it was fully funded, the developers, aided by ETS, hammered out the program goals. Thus, ETS was able to learn at the outset what the program developers had in mind, ensuring sufficient time to provide adequately developed measurement instruments. If the evaluation team had had to wait until the program itself was developed, there would not have been sufficient time to develop the instruments; more important, the evaluators might not have had sufficient understanding of the intended goals—thereby making sensible evaluation unlikely.

The third principle is as follows: There is often a great deal of empirical research to be conducted before program goals can be specified.

Sometimes, even before goals can be established or a program developed, it is necessary, through empirical research, to indicate that there is a need for the program. An illustration is provided by the research of Ruth Ekstrom and Marlaine Lockheed ( 1976 ) into the competencies gained by women through volunteer work and homemaking. The ETS researchers argued that it is desirable for women to resume their education if they wish to after years of absence. But what competencies have they picked up in the interim that might be worthy of academic credit? By identifying, surveying, and interviewing women who wished to return to formal education, Ekstrom and Lockheed established that many women had indeed learned valuable skills and knowledge. Colleges were alerted and some have begun to give credit where credit is due.

Similarly, when the federal government decided to make a concerted attack on the reading problem as it affects the total population, one area of concern was adult reading. But there was little knowledge about it. Was there an adult literacy problem? Could adults read with sufficient understanding such items as newspaper employment advertisements, shopping and movie advertisements, and bus schedules? And in investigating adult literacy , what characterized the reading tasks that should be taken into account? Murphy, in a 1973 study (Murphy 1973a ), considered these factors: the importance of a task (the need to be able to read the material if only once a year as with income tax forms and instructions), the intensity of the task (a person who wants to work in the shipping department will have to read the shipping schedule each day), or the extensivity of the task (70% of the adult population read a newspaper but it can usually be ignored without gross problems arising). Murphy and other ETS researchers conducted surveys of reading habits and abilities , and this assessment of needs provided the government with information needed to decide on goals and develop appropriate programs.

Still a different kind of needs assessment was conducted by ETS researchers with respect to a school for learning disabled students in 1976 (Ball and Goldman 1976 ) . The school catered to children aged 5–18 and had four separate programs and sites. ETS first served as a catalyst, helping the school’s staff develop a listing of problems. Then ETS acted as an amicus curiae , drawing attention to those problems, making explicit and public what might have been unsaid for want of an appropriate forum. Solving these problems was the purpose of stating new institutional goals—goals that might never have been formally recognized if ETS had not worked with the school to make its needs explicit.

The fourth principle is as follows: The program evaluator should be conscious of and interested in the unintended outcomes of programs as well as the intended outcomes specified in the program’s goal statement.

In program evaluation, the importance of looking for side effects, especially negative ones, has to be considered against the need to put a major effort into assessing progress toward intended outcomes. Often, in this phase of evaluation, the varying interests of evaluators, developers, and funders intersect—and professional, financial, and political considerations are all at odds. At such times, program evaluation becomes as much an art form as an exercise in social science.

A number of articles were written about this problem by Samuel J. Messick , ETS vice president for research (e.g., Messick 1970 , 1975 ). His viewpoint—the importance of the medical model—has been illustrated in various ETS evaluation studies. His major thesis was that the medical model of program evaluation explicitly recognizes that “…prescriptions for treatment and the evaluation of their effectiveness should take into account not only reported symptoms but other characteristics of the organism and its ecology as well” (Messick 1975 , p. 245). As Messick went on to point out, this characterization was a call for a systems analysis approach to program evaluation—dealing empirically with the interrelatedness of all the factors and monitoring all outcomes, not just the intended ones.

When, for example, ETS evaluated the first 2 years of Sesame Street (Ball and Bogatz 1970 ), there was obviously pressure to ascertain whether the intended goals of that show were being attained. It was nonetheless possible to look for some of the more likely unintended outcomes: whether the show had negative effects on heavy viewers going off to kindergarten, and whether the show was achieving impacts in attitudinal areas.

In summative evaluations , to study unintended outcomes is bound to cost more money than to ignore them. It is often difficult to secure increased funding for this purpose. For educational programs with potential national applications, however, ETS strongly supports this more comprehensive approach.

3.2 Measuring Program Impact

The letters ETS have become almost synonymous in some circles with standardized testing of student achievement . In its program evaluations, ETS naturally uses such tests as appropriate, but frequently the standardized tests are not appropriate measures. In some evaluations, ETS uses both standardized and domain-referenced tests. An example may be seen in The Electric Company evaluations (Ball et al. 1974 ). This televised series, which was intended to teach reading skills to first through fourth graders, was evaluated in some 600 classrooms. One question that was asked during the process concerned the interaction of the student’s level of reading attainment and the effectiveness of viewing the series. Do good readers learn more from the series than poor readers? So standardized, norm-referenced reading tests were administered, and the students in each grade were divided into deciles on this basis, thereby yielding ten levels of reading attainment.

Data on the outcomes using the domain-referenced tests were subsequently analyzed for each decile ranking. Thus, ETS was able to specify for what level of reading attainment, in each grade, the series was working best. This kind of conclusion would not have been possible if a specially designed domain-referenced reading test with no external referent had been the only one used, nor if a standardized test, not sensitive to the program’s impact, had been the only one used.
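For readers who want to see the mechanics, the following sketch (in Python, using simulated data; the variable names and sample sizes are invented, not drawn from the ETS study) illustrates the kind of decile-based summary described above: students are ranked into within-grade deciles on the norm-referenced test, and mean gains on the domain-referenced test are then reported for each decile.

```python
# A minimal sketch (not ETS's actual analysis) of decile-based reporting:
# rank students into deciles on a standardized, norm-referenced pretest,
# then summarize domain-referenced gains per decile within each grade.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1200  # hypothetical sample

df = pd.DataFrame({
    "grade": rng.integers(1, 5, size=n),             # grades 1-4
    "norm_pretest": rng.normal(50, 10, size=n),      # standardized reading score
    "domain_gain": rng.normal(5, 3, size=n),         # pre-post gain on domain test
})

# Decile 1 = lowest 10% of readers within a grade, decile 10 = highest 10%.
df["decile"] = (
    df.groupby("grade")["norm_pretest"]
      .transform(lambda s: pd.qcut(s, 10, labels=False) + 1)
)

# Mean domain-referenced gain for each grade-by-decile cell.
summary = df.groupby(["grade", "decile"])["domain_gain"].mean().unstack()
print(summary.round(2))
```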

Without denying the usefulness of previously designed and developed measures, ETS evaluators have frequently preferred to develop or adapt instruments that would be specifically sensitive to the tasks at hand. Sometimes this measurement effort is carried out in anticipation of the needs of program evaluators for a particular instrument, and sometimes because a current program evaluation requires immediate instrumentation.

An example of the former is a study of doctoral programs by Mary Jo Clark et al. ( 1976 ). Existing instruments had been based on surveys in which practitioners in a given discipline were asked to rate the quality of doctoral programs in that discipline. Instead of this reputational survey approach, the ETS team developed an array of criteria (e.g., faculty quality, student body quality, resources, academic offerings, alumni performance), all open to objective assessment. This assessment tool can be used to assess changes in the quality of the doctoral programs offered by major universities.

Similarly, the development by ETS of the Kit of Factor-Referenced Cognitive Tests (Ekstrom et al. 1976 ) also provided a tool—one that could be used when evaluating the cognitive abilities of teachers or students if these structures were of interest in a particular evaluation. A clearly useful application was in the California study of teaching performance by Frederick McDonald and Patricia Elias ( 1976 ). Teachers with certain kinds of cognitive structures were seen to have differential impacts on student achievement . In the Donald A. Trismen study of an aesthetics program (Trismen 1968 ), the factor kit was used to see whether cognitive structures interacted with aesthetic judgments.

3.2.1 Developing Special Instruments

Examples of the development of specific instrumentation for ETS program evaluations are numerous. Virtually every program evaluation involves, at the very least, some adapting of existing instruments. For example, a questionnaire or interview may be adapted from ones developed for earlier studies. Typically, however, new instruments, including goal-specific tests, are prepared. Some ingenious examples, based on the 1966 work of E. J. Webb, D. T. Campbell, R. D. Schwartz, and L. Sechrest, were suggested by Anderson (1968) for evaluating museum programs, and the title of her article gives a flavor of the unobtrusive measures illustrated—“Noseprints on the Glass.”

Another example of ingenuity is Trismen’s use of 35 mm slides as stimuli in the assessment battery of the Education through Vision program (Trismen 1968 ). Each slide presented an art masterpiece, and the response options were four abstract designs varying in color. The instruction to the student was to pick the design that best illustrated the masterpiece’s coloring.

3.2.2 Using Multiple Measures

When ETS evaluators have to assess a variable and the usual measures have rather high levels of error inherent in them, they usually resort to triangulation. That is, they use multiple measures of the same construct , knowing that each measure suffers from a specific weakness. Thus, in 1975, Donald E. Powers evaluated for the Philadelphia school system the impact of dual-audio television—a television show telecast at the same time as a designated FM radio station provided an appropriate educational commentary. One problem in measurement was assessing the amount of contact the student had with the dual-audio television treatment (Powers 1975a ) . Powers used home telephone interviews, student questionnaires, and very simple knowledge tests of the characters in the shows to assess whether students had in fact been exposed to the treatment. Each of these three measures has problems associated with it, but the combination provided a useful assessment index.
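A minimal sketch of this kind of triangulation, assuming simulated data and invented variable names rather than the actual Powers instruments, might combine the three exposure measures into a single standardized index:

```python
# Hypothetical triangulation sketch: three imperfect exposure measures
# (phone interview, student questionnaire, character-knowledge quiz) are
# standardized and averaged into one exposure index. Variable names are
# illustrative, not drawn from the Powers (1975a) study itself.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300

df = pd.DataFrame({
    "phone_interview": rng.integers(0, 6, size=n),   # 0-5 reported viewing days/week
    "questionnaire":   rng.integers(0, 11, size=n),  # 0-10 self-reported exposure scale
    "character_quiz":  rng.integers(0, 9, size=n),   # 0-8 items correct about the shows
})

# z-score each measure so no single instrument dominates the composite;
# averaging lets measure-specific errors tend to cancel out.
z = (df - df.mean()) / df.std(ddof=0)
df["exposure_index"] = z.mean(axis=1)
print(df.head())
```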

In some circumstances, ETS evaluators are able to develop measurement techniques that are an integral part of the treatment itself. This unobtrusiveness has clear benefits and is most readily attainable with computer-aided instructional (CAI) programs. Thus, for example, Donald L. Alderman , in the evaluation of TICCIT (a CAI program developed by the Mitre Corporation), obtained for each student such indices as the number of lessons passed, the time spent on line, the number of errors made, and the kinds of errors (Alderman 1978 ). And he did this simply by programming the computer to save this information over given periods of time.
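The kind of unobtrusive indexing Alderman describes can be illustrated with a small, hypothetical log-aggregation sketch; the field names below are invented and do not reflect TICCIT's actual records:

```python
# Hypothetical sketch of per-student indices computed from a CAI event log.
import pandas as pd

log = pd.DataFrame({
    "student": ["s1", "s1", "s2", "s2", "s2"],
    "lesson_passed": [1, 0, 1, 1, 0],
    "minutes_online": [22, 35, 18, 27, 40],
    "errors": [3, 7, 1, 0, 9],
})

# Aggregate the log into the kinds of indices mentioned above.
indices = log.groupby("student").agg(
    lessons_passed=("lesson_passed", "sum"),
    time_online=("minutes_online", "sum"),
    total_errors=("errors", "sum"),
)
print(indices)
```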

3.3 Working in Field Settings

Measurement problems cannot be addressed satisfactorily if the setting in which the measures are to be administered is ignored. One of the clear lessons learned in ETS program evaluation studies is that measurement in field settings (home, school, community) poses different problems from measurement conducted in a laboratory.

Program evaluation, either formative or summative, demands that its empirical elements usually be conducted in natural field settings rather than in more contrived settings, such as a laboratory. Nonetheless, the problems of working in field settings are rarely systematically discussed or researched. In an article in the Encyclopedia of Educational Evaluation , Bogatz ( 1975 ) detailed these major aspects:

Obtaining permission to collect data at a site

Selecting a field staff

Training the staff

Maintaining family/community support

Of course, all the aspects discussed by Bogatz interact with the measurement and design of the program evaluation. A great source of information concerning field operations is the ETS Head Start Longitudinal Study of Disadvantaged Children, directed by Virginia Shipman ( 1970 ). Although not primarily a program evaluation, it certainly has generated implications for early childhood programs. It was longitudinal, comprehensive in scope, and large in size, encompassing four sites and, initially, some 2000 preschoolers. It was clear from the outset that close community ties were essential if only for expediency—although, of course, more important ethical principles were involved. This close relationship with the communities in which the study was conducted involved using local residents as supervisors and testers, establishing local advisory committees, and thus ensuring free, two-way communication between the research team and the community.

The Sesame Street evaluation also adopted this approach (Ball and Bogatz 1970 ). In part because of time pressures and in part to ensure valid test results, the ETS evaluators especially developed the tests so that community members with minimal educational attainments could be trained quickly to administer them with proper skill.

3.3.1 Establishing Community Rapport

In evaluations of street academies by Ronald L. Flaugher ( 1971 ), and of education programs in prisons by Flaugher and Samuel Barnett ( 1972 ), it was argued that one of the most important elements in successful field relationships is the time an evaluator spends getting to know the interests and concerns of various groups, and lowering barriers of suspicion that frequently separate the educated evaluator and the less-educated program participants. This point may not seem particularly sophisticated or complex, but many program evaluations have floundered because of an evaluator’s lack of regard for disadvantaged communities (Anderson 1970 ). Therefore, a firm principle underlying ETS program evaluation is to be concerned with the communities that provide the contexts for the programs being evaluated. Establishing two-way lines of communication with these communities and using community resources whenever possible help ensure a valid evaluation.

Even with the best possible community support, field settings cause problems for measurement. Raymond G. Wasdyke and Jerilee Grandy ( 1976 ) showed this idea to be true in an evaluation in which the field setting was literally that—a field setting. In studying the impact of a camping program on New York City grade school pupils, they recognized the need, common to most evaluations, to describe the treatment—in this case the camping experience. Therefore, ETS sent an observer to the campsite with the treatment groups. This person, who was herself skilled in camping, managed not to be an obtrusive participant by maintaining a relatively low profile.

Of course, the problems of the observer can be just as difficult in formal institutions as on the campground. In their 1974 evaluation of Open University materials, Hartnett and colleagues found, as have program evaluators in almost every situation, that there was some defensiveness in each of the institutions in which they worked (Hartnett et al. 1974 ). Both personal and professional contacts were used to allay suspicions. There also was emphasis on an evaluation design that took into account each institution’s values. That is, part of the evaluation was specific to the institution, but some common elements across institutions were retained. This strategy underscored the evaluators’ realization that each institution was different, but allowed ETS to study certain variables across all three participating institutions.

Breaking down the barriers in a field setting is one of the important elements of a successful evaluation, yet each situation demands somewhat different evaluator responses.

3.3.2 Involving Program Staff

Another way of ensuring that evaluation field staff are accepted by program staff is to make the program staff active participants in the evaluation process. While this integration is obviously a technique to be strongly recommended in formative evaluations, it can also be used in summative evaluations. In his evaluation of PLATO in junior colleges, Murphy (1977) could not afford to become the victim of a program developer’s fear of an insensitive evaluator. He overcame this potential problem by enlisting the active participation of the junior college and program development staffs. One of Murphy’s concerns was that there was no common course across colleges. Introduction to Psychology, for example, might be taught virtually everywhere, but its content can vary remarkably, depending on such factors as who teaches the course, where it is taught, and what text is used. Murphy understood this variability, and his evaluation of PLATO reflected his concern. It also necessitated considerable input and cooperation from program developers and college teachers working in concert—with Murphy acting as the conductor.

3.4 Analyzing the Data

Once the principles and strategies used by program evaluators in their field operations have succeeded and data are obtained, there remains the important phase of data analysis. In practice, of course, the program evaluator thinks through the question of data analysis before entering the data collection phase. Plans for analysis help determine what measures to develop, what data to collect, and even, to some extent, how the field operation is to be conducted. Nonetheless, analysis plans drawn up early in the program evaluation cannot remain quite as immutable as the Mosaic Law. To illustrate the need for flexibility, it is useful to turn once again to the heuristic ETS evaluation of Sesame Street .

As initially planned, the design of the Sesame Street evaluation was a true experiment (Ball and Bogatz 1970 ) . The analyses called for were multivariate analyses of covariance, using pretest scores as the covariate. At each site, a pool of eligible preschoolers was obtained by community census, and experimental and control groups were formed by random assignment from these pools. The evaluators were somewhat concerned that those designated to be the experimental (viewing) group might not view the show—it was a new show on public television, a loose network of TV stations not noted for high viewership. Some members of the Sesame Street national research advisory committee counseled ETS to consider paying the experimental group to view. The suggestion was resisted, however, because any efforts above mild and occasional verbal encouragement to view the show would compromise the results. If the experimental group members were paid, and if they then viewed extensively and outperformed the control group at posttest, would the improved performance be due to the viewing, the payment, or some interaction of payment and viewing? Of course, this nice argument proved to be not much more than an exercise in modern scholasticism. In fact, the problem lay not in the treatment group but in the uninformed and unencouraged-to-view control group. The members of that group, as indeed preschoolers with access to public television throughout the nation, were viewing the show with considerable frequency—and not much less than the experimental group. Thus, the planned analysis involving differences in posttest attainments between the two groups was dealt a mortal blow.
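The originally planned analysis can be illustrated, in simplified single-outcome form and with simulated data, as an analysis of covariance comparing viewing and control groups with pretest scores as the covariate; this sketch shows the intended logic rather than the ETS analysis itself:

```python
# A minimal ANCOVA sketch of the planned design: posttest scores of viewing
# (E) and control (C) groups compared with pretest as covariate. Data and
# effect sizes are simulated for illustration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 400
pretest = rng.normal(50, 10, size=n)
group = rng.choice(["E", "C"], size=n)          # random assignment
effect = np.where(group == "E", 4.0, 0.0)       # hypothetical viewing effect
posttest = 10 + 0.8 * pretest + effect + rng.normal(0, 5, size=n)

df = pd.DataFrame({"pretest": pretest, "group": group, "posttest": posttest})

# ANCOVA expressed as a linear model: posttest ~ pretest + group.
model = smf.ols("posttest ~ pretest + C(group, Treatment('C'))", data=df).fit()
print(model.summary())
```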

Fortunately, other analyses were available, of which the ETS-refined age cohorts design provided a rational basis. This design is presented in the relevant report (Ball and Bogatz 1970 ). The need here is not to describe the design and analysis but to emphasize a point made practically by the poet Robert Burns some time ago and repeated here more prosaically: The best laid plans of evaluators can “gang aft agley,” too.

3.4.1 Clearing New Paths

Sometimes program evaluators find that the design and analysis they have in mind represent an untrodden path. This is perhaps partly because many of the designs in the social sciences are built upon laboratory conditions and simply are not particularly relevant to what happens in educational institutions.

When ETS designed the summative evaluation of The Electric Company , it was able to set up a true experiment in the schools. Pairs of comparable classrooms within a school and within a grade were designated as the pool with which to work. One of each pair of classes was randomly assigned to view the series. Pretest scores were used as covariates on posttest scores, and in 1973 the first-year evaluation analysis was successfully carried out (Ball and Bogatz 1973 ). The evaluation was continued through a second year, however, and as is usual in schools, the classes did not remain intact.

From an initial 200 classes, the children had scattered through many more classrooms. Virtually none of the classes with subject children contained only experimental or only control children from the previous year. Donald B. Rubin , an ETS statistician, consulted with a variety of authorities and found that the design and analysis problem for the second year of the evaluation had not been addressed in previous work. To summarize the solution decided on, the new pool of classes was reassigned randomly to E (experimental) or C (control) conditions so that over the 2 years the design was portrayable as Fig. 11.1 .

Fig. 11.1 The design for the new pool of classes. For Year II, EE represents children who were in E classrooms in Year I and again in Year II; that is, the first letter refers to status in Year I and the second to status in Year II

Further, the pretest scores of Year II were usable as new covariates when analyzing the results of the Year II posttest scores (Ball et al. 1974 ).
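A simplified, hypothetical sketch of how such a two-year layout might be analyzed (it is not Rubin's actual solution) treats each child's two-year viewing history (EE, EC, CE, CC) as a factor and uses the Year II pretest as the covariate:

```python
# Hypothetical sketch of the two-year design: each child carries a Year I
# status (E or C) and a Year II status, giving EE, EC, CE, and CC groups;
# Year II pretest scores serve as the covariate for Year II posttests.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 800
year1 = rng.choice(["E", "C"], size=n)
year2 = rng.choice(["E", "C"], size=n)            # re-randomized pool of classes
pretest2 = rng.normal(40, 8, size=n)
posttest2 = (5 + 0.9 * pretest2
             + 3.0 * (year2 == "E") + 1.5 * (year1 == "E")   # illustrative effects
             + rng.normal(0, 4, size=n))

df = pd.DataFrame({"year1": year1, "year2": year2,
                   "pretest2": pretest2, "posttest2": posttest2})
df["status"] = df["year1"] + df["year2"]          # "EE", "EC", "CE", "CC"

# Covariance-adjusted comparison of the four two-year viewing histories.
model = smf.ols("posttest2 ~ pretest2 + C(status, Treatment('CC'))", data=df).fit()
print(model.params.round(2))
```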

3.4.2 Tailoring to the Task

Unfortunately for those who prefer routine procedures, it has been shown across a wide range of ETS program evaluations that each design and analysis must be tailored to the occasion. Thus, Gary Marco (1972), as part of the statewide educational assessment in Michigan, evaluated ESEA Title I program performance. He assessed the amount of exposure students had to various clusters of Title I programs, and he included control schools in the analysis. He found that a regression-analysis model involving a correction for measurement error was an innovative approach that best fit his complex configuration of data.
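One standard correction for measurement error in a predictor, offered here only as an illustration of the general idea and not as Marco's model, is to disattenuate the observed regression slope by an estimate of the predictor's reliability:

```python
# Illustration of classical disattenuation: the observed slope on an
# error-laden predictor is attenuated by the predictor's reliability, so
# dividing by an assumed reliability estimate recovers the true-score slope.
import numpy as np

rng = np.random.default_rng(4)
n = 2000
true_exposure = rng.normal(0, 1, size=n)
reliability = 0.80                                  # assumed reliability of the exposure measure
observed = true_exposure + rng.normal(0, np.sqrt((1 - reliability) / reliability), size=n)
outcome = 2.0 * true_exposure + rng.normal(0, 1, size=n)

naive_slope = np.cov(observed, outcome)[0, 1] / np.var(observed, ddof=1)
corrected_slope = naive_slope / reliability         # classical correction
print(round(naive_slope, 2), round(corrected_slope, 2))
```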

Garlie Forehand , Marjorie Ragosta , and Donald A. Rock , in a national, correlational study of desegregation, obtained data on school characteristics and on student outcomes (Forehand et al. 1976 ) . The purposes of the study included defining indicators of effective desegregation and discriminating between more and less effective school desegregation programs. The emphasis throughout the effort was on variables that were manipulable. That is, the idea was that evaluators would be able to suggest practical advice on what schools can do to achieve a productive desegregation program. Initial investigations allowed specification among the myriad variables of a hypothesized set of causal relationships, and the use of path analysis made possible estimation of the strength of hypothesized causal relationships. On the basis of the initial correlation matrices, the path analyses, and the observations made during the study, an important product—a nontechnical handbook for use in schools—was developed.
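In the spirit of that approach, the following sketch shows how path coefficients for a hypothesized causal ordering can be estimated as standardized regression weights from a short series of ordinary least squares regressions; the variables are simulated and their names are illustrative, not taken from the desegregation study:

```python
# Minimal path-analysis sketch for a hypothesized ordering:
# school practices -> classroom climate -> student outcome.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 500
practices = rng.normal(0, 1, size=n)
climate = 0.6 * practices + rng.normal(0, 0.8, size=n)
outcome = 0.4 * climate + 0.2 * practices + rng.normal(0, 0.9, size=n)

df = pd.DataFrame({"practices": practices, "climate": climate, "outcome": outcome})
z = (df - df.mean()) / df.std(ddof=0)   # standardize so coefficients are path weights

path1 = smf.ols("climate ~ practices", data=z).fit()            # practices -> climate
path2 = smf.ols("outcome ~ climate + practices", data=z).fit()  # direct and mediated paths
print(path1.params.round(2))
print(path2.params.round(2))
```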

Another large-scale ETS evaluation effort was directed by Trismen et al. ( 1976 ). They studied compensatory reading programs, initially surveying more than 700 schools across the country. Over a 4-year period ending in 1976, this evaluation interspersed data analysis with new data collection efforts. One purpose was to find schools that provided exceptionally positive or negative program results. These schools were visited blind and observed by ETS staff. Whereas the Forehand evaluation analysis (Forehand et al. 1976 ) was geared to obtaining practical applications, the equally extensive evaluation analysis of Trismen’s study was aimed at generating hypotheses to be tested in a series of smaller experiments.
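One common way to flag schools with exceptionally positive or negative results, sketched below with simulated data and not to be read as the procedure Trismen's team used, is to regress school mean posttest performance on mean pretest performance and examine the residuals:

```python
# Sketch of residual-based screening for unusually effective (or ineffective)
# schools: large positive or negative residuals mark candidates for site visits.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n_schools = 200
pre = rng.normal(45, 6, size=n_schools)
post = 5 + 0.9 * pre + rng.normal(0, 3, size=n_schools)

schools = pd.DataFrame({"school": range(n_schools), "pre_mean": pre, "post_mean": post})
fit = smf.ols("post_mean ~ pre_mean", data=schools).fit()
schools["residual"] = fit.resid

# Schools in the extreme tails are candidates for follow-up observation.
flagged = schools[schools["residual"].abs() > 2 * schools["residual"].std()]
print(flagged[["school", "residual"]].round(2))
```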

As a further illustration of the complex interrelationship among evaluation purposes, design, analyses, and products, there is the 1977 evaluation of the use of PLATO in the elementary school by Spencer Swinton and Marianne Amarel ( 1978 ). They used a form of regression analysis—as did Forehand et al. ( 1976 ) and Trismen et al. ( 1976 ). But here the regression analyses were used differently in order to identify program effects unconfounded by teacher differences. In this regression analysis, teachers became fixed effects, and contrasts were fitted for each within-teacher pair (experimental versus control classroom teachers).
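A hedged sketch of one way such a within-pair, fixed-effects analysis can be set up is shown below; the data are simulated and the setup is illustrative rather than a reconstruction of the Swinton and Amarel analysis:

```python
# Sketch of a fixed-effects regression: matched pairs of classrooms enter as
# dummy variables, so the experimental-versus-control contrast is estimated
# within pairs and is not confounded with between-teacher differences.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
pairs = np.repeat(np.arange(30), 2)                    # 30 matched pairs of classrooms
treatment = np.tile([1, 0], 30)                        # one E and one C class per pair
pair_effect = np.repeat(rng.normal(0, 5, size=30), 2)  # teacher/pair differences
score = 50 + pair_effect + 3.0 * treatment + rng.normal(0, 2, size=60)

df = pd.DataFrame({"pair": pairs, "treatment": treatment, "score": score})

# C(pair) absorbs pair-level (teacher) differences; the treatment coefficient
# is the within-pair experimental-versus-control contrast.
model = smf.ols("score ~ C(pair) + treatment", data=df).fit()
print(round(model.params["treatment"], 2))
```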

This design, in turn, provides a contrast to McDonald’s ( 1977 ) evaluation of West New York programs to teach English as a second language to adults. In this instance, the regression analysis was directed toward showing which teaching method related most to gains in adult students’ performance.

There is a school of thought within the evaluation profession that design and analysis in program evaluation can be made routine. At this point, the experience of ETS indicates that this would be unwise.

3.5 Interpreting the Results

Possibly the most important principle in program evaluation is that interpretations of the evaluation’s meaning—the conclusions to be drawn—are often open to various nuances. Another problem is that the evidence on which the interpretations are based may be inconsistent. The initial premise of this chapter was that the role of program evaluation is to provide evidence for decision-makers. Thus, one could argue that differences in interpretation, and inconsistencies in the evidence, are simply problems for the decision-maker and not for the evaluator.

But consider, for example, an evaluation by Powers of a year-round program in a school district in Virginia (Powers 1974 , 1975b ). (The long vacation was staggered around the year so that schools remained open in the summer.) The evidence presented by Powers indicated that the year-round school program provided a better utilization of physical plant and that student performance was not negatively affected. The school board considered this evidence as well as other conflicting evidence provided by Powers that the parents’ attitudes were decidedly negative. The board made up its mind, and (not surprisingly) scotched the program. Clearly, however, the decision was not up to Powers. His role was to collect the evidence and present it systematically.

3.5.1 Keeping the Process Open

In general, the ETS response to conflicting evidence or varieties of nuances in interpretation is to keep the evaluation process and its reporting as open as possible. In this way, the values of the evaluator, though necessarily present, are less likely to be a predominating influence on subsequent action.

Program evaluators do, at times, have the opportunity to influence decision-makers by showing them that there are kinds of evidence not typically considered. The Coleman Study, for example, showed at least some decision-makers that there is more to evaluating school programs than counting (or calculating) the numbers of books in libraries, the amount of classroom space per student, the student-teacher ratio, and the availability of audiovisual equipment (Coleman et al. 1966). Rather, the schools’ output in terms of student performance was shown to provide generally superior evidence of school program effectiveness.

Through their work, evaluators are also able to educate decision makers to consider the important principle that educational treatments may have positive effects for some students and negative effects for others—that an interaction of treatment with student should be looked for. As pointed out in the discussion of unintended outcomes, a systems-analysis approach to program evaluation—dealing empirically with the interrelatedness of all the factors that may affect performance—is to be preferred. And this approach, as Messick emphasized, “properly takes into account those student-process-environment interactions that produce differential results” (Messick 1975 , p. 246).

3.5.2 Selecting Appropriate Evidence

Finally, a consideration of the kinds of evidence and interpretations to be provided decision makers leads inexorably to the realization that different kinds of evidence are needed, depending on the decision-maker’s problems and the availability of resources. The most scientific evidence involving objective data on student performance can be brilliantly interpreted by an evaluator, but it might also be an abomination to a decision maker who really needs to know whether teachers’ attitudes are favorable.

ETS evaluations have provided a great variety of evidence. For a formative evaluation in Brevard County, Florida, Trismen (1970) provided evidence that students could make intelligent choices about courses. In the ungraded schools, students had considerable freedom of choice, but they and their counselors needed considerably more information than in traditional schools about the ingredients for success in each of the available courses. As another example, Gary Echternacht, George Temp, and Theodore Storlie helped state and local education authorities develop Title I reporting models that included evidence on impact, cost, and compliance with federal regulations (Echternacht et al. 1976). Forehand and McDonald (1972) had been working with New York City to develop an accountability model providing constructive kinds of evidence for the city’s school system. On the other hand, as part of an evaluation team, Amarel provided, for a small experimental school in Chicago, judgmental data as well as reports and documents based on the school’s own records and files (Amarel and The Evaluation Collective 1979). Finally, Michael Rosenfeld provided Montgomery Township, New Jersey, with student, teacher, and parent perceptions in his evaluation of the open classroom approach then being tried out (Rosenfeld 1973).

In short, just as tests are not valid or invalid (it is the ways tests are used that deserve such descriptions), so too, evidence is not good or bad until it is seen in relation to the purpose for which it is to be used, and in relation to its utility to decision-makers.

4 Postscript

For the most part, ETS’s involvement in program evaluation has been at the practical level. Without an accompanying concern for the theoretical and professional issues, however, practical involvement would be irresponsible. ETS staff members have therefore seen the need to integrate and systematize knowledge about program evaluation. Thus, Anderson obtained a contract with the Office of Naval Research to draw together the accumulated knowledge of professionals from inside and outside ETS on the topic of program evaluation. A number of products followed. These products included a survey of practices in program evaluation (Ball and Anderson 1975a ), and a codification of program evaluation principles and issues (Ball and Anderson 1975b ). Perhaps the most generally useful of the products is the aforementioned Encyclopedia of Educational Evaluation (Anderson et al. 1975 ).

From an uncoordinated, nonprescient beginning in the mid-1960s, ETS has acquired a great deal of experience in program evaluation. In one sense it remains uncoordinated because there is no specific “party line,” no dogma designed to ensure ritualized responses. It remains quite possible for different program evaluators at ETS to recommend differently designed evaluations for the same burgeoning or existing programs.

There is no sure knowledge where the profession of program evaluation is going. Perhaps, with zero-based budgeting, program evaluation will experience amazing growth over the next decade, growth that will dwarf its current status (which already dwarfs its status of a decade ago). Or perhaps there will be a revulsion against the use of social scientific techniques within the political, value-dominated arena of program development and justification. At ETS, the consensus is that continued growth is the more likely event. And with the staff’s variegated backgrounds and accumulating expertise, ETS hopes to continue making significant contributions to this emerging profession.

Alderman, D. L. (1978). Evaluation of the TICCIT computer-assisted instructional system in the community college . Princeton: Educational Testing Service.


Amarel, M., & The Evaluation Collective. (1979). Reform, response, renegotiation: Transitions in a school-change project. Unpublished manuscript.

Anastasio, E. J. (1972). Evaluation of the PLATO and TICCIT computer-based instructional systems—A preliminary plan (Program Report No. PR-72-19). Princeton: Educational Testing Service.

Anderson, S. B. (1968). Noseprints on the glass—Or how do we evaluate museum programs? In E. Larrabee (Ed.), Museums and education (pp. 115–126). Washington, DC: Smithsonian Institution Press.

Anderson, S. B. (1970). From textbooks to reality: Social researchers face the facts of life in the world of the disadvantaged. In J. Hellmuth (Ed.), Disadvantaged child: Vol. 3. Compensatory education: A national debate . New York: Brunner/Mazel.

Anderson, S. B., Ball, S., & Murphy, R. T. (Eds.). (1975). Encyclopedia of educational evaluation: Concepts and techniques for evaluating education and training programs . San Francisco: Jossey-Bass Publishers.

Ball, S. (1973, July). Evaluation of drug information programs—Report of the panel on the impact of information on drug use and misuse, phase 2 . Washington, DC: National Research Council, National Academy of Sciences.

Ball, S., & Anderson, S. B. (1975a). Practices in program evaluation: A survey and some case studies . Princeton: Educational Testing Service.

Ball, S., & Anderson, S. B. (1975b). Professional issues in the evaluation of education/training programs . Princeton: Educational Testing Service.

Ball, S., & Bogatz, G. A. (1970). The first year of Sesame Street: An evaluation (Program Report No. PR-70-15). Princeton: Educational Testing Service.

Ball, S., & Bogatz, G. A. (1973). Reading with television: An evaluation of the Electric Company (Program Report No. PR-73-02). Princeton: Educational Testing Service.

Ball, S., & Goldman, K. S. (1976). The Adams School: An interim report . Princeton: Educational Testing Service.

Ball, S., & Kazarow, K. M. (1974). Evaluation of To Reach a Child . Princeton: Educational Testing Service.

Ball, S., Bogatz, G. A., Kazarow, K. M., & Rubin, D. B. (1974). Reading with television: A follow-up evaluation of The Electric Company (Program Report No. PR-74-15). Princeton: Educational Testing Service.

Ball, S., Bridgeman, B., & Beaton, A. E. (1976). A design for the evaluation of the parent-child development center replication project . Princeton: Educational Testing Service.

Bogatz, G. A. (1975). Field operations. In S. B. Anderson, S. Ball, & R. T. Murphy (Eds.), Encyclopedia of educational evaluation (pp. 169–175). San Francisco: Jossey-Bass Publishers.

Bogatz, G. A., & Ball, S. (1971). The second year of Sesame Street: A continuing evaluation (Program Report No. PR-71-21). Princeton: Educational Testing Service.

Boldt, R. F. (with Gitomer, N.). (1975). Editing and scaling of instrument packets for the clinical evaluation of narcotic antagonists (Program Report No. PR-75-12). Princeton: Educational Testing Service.

Bussis, A. M., Chittenden, E. A., & Amarel, M. (1976). Beyond surface curriculum: An interview study of teachers’ understandings . Boulder: Westview Press.

Campbell, P. B. (1976). Psychoeducational diagnostic services for learning disabled youths [Proposal submitted to Creighton Institute for Business Law and Social Research]. Princeton: Educational Testing Service.

Clark, M. J., Hartnett, R. T., & Baird, L. L. (1976). Assessing dimensions of quality in doctoral education (Program Report No. PR-76-27). Princeton: Educational Testing Service.

Coleman, J. S., Campbell, E. Q., Hobson, C. J., McPartland, J., Mood, A. M., Weinfeld, F. D., & York, R. L. (1966). Equality of educational opportunity . Washington, DC: U.S. Government Printing Office.

Corder, R. A. (1975). Final evaluation report of part C of the California career education program . Berkeley: Educational Testing Service.

Corder, R. A. (1976a). Calexico intercultural design. El Cid Title VII yearly final evaluation reports for grades 7–12 of program of bilingual education, 1970–1976 . Berkeley: Educational Testing Service.

Corder, R. A. (1976b). External evaluator’s final report on the experience-based career education program . Berkeley: Educational Testing Service.

Corder, R. A., & Johnson, S. (1972). Final evaluation report, 1971–1972, MANO A MANO . Berkeley: Educational Testing Service.

Dyer, H. S. (1965a). A plan for evaluating the quality of educational programs in Pennsylvania (Vol. 1, pp 1–4, 10–12). Harrisburg: State Board of Education.

Dyer, H. S. (1965b). A plan for evaluating the quality of educational programs in Pennsylvania (Vol. 2, pp. 158–161). Harrisburg: State Board of Education.

Echternacht, G., Temp, G., & Storlie, T. (1976). The operation of an ESEA Title I evaluation technical assistance center—Region 2 [Proposal submitted to DHEW/O]. Princeton: Educational Testing Service.

Ekstrom, R. B., & Lockheed, M. (1976). Giving women college credit where credit is due. Findings, 3 (3), 1–5.

Ekstrom, R. B., French, J., & Harman, H. (with Dermen, D.). (1976). Kit of factor-referenced cognitive tests . Princeton: Educational Testing Service.

Elias, P., & Wheeler, P. (1972). Interim evaluation report: BUENO . Berkeley: Educational Testing Service.

Feldmesser, R. A. (1973). Educational goal indicators for New Jersey (Program Report No. PR-73-01). Princeton: Educational Testing Service.

Flaugher, R. L. (1971). Progress report on the activities of ETS for the postal academy program. Unpublished manuscript, Educational Testing Service, Princeton.

Flaugher, R., & Barnett, S. (1972). An evaluation of the prison educational network . Unpublished manuscript, Educational Testing Service, Princeton.

Flaugher, R., & Knapp, J. (1972). Report on evaluation activities of the Bread and Butterflies project . Princeton: Educational Testing Service.

Forehand, G. A., & McDonald, F. J. (1972). A design for an accountability system for the New York City school system . Princeton: Educational Testing Service.

Forehand, G. A., Ragosta, M., & Rock, D. A. (1976). Final report: Conditions and processes of effective school desegregation (Program Report No. PR-76-23). Princeton: Educational Testing Service.

Frederiksen, N., & Ward, W. C. (1975). Development of measures for the study of creativity (Research Bulletin No. RB-75-18). Princeton: Educational Testing Service. http://dx.doi.org/10.1002/j.2333-8504.1975.tb01058.x

Freeberg, N. E. (1970). Assessment of disadvantaged adolescents: A different approach to research and evaluation measures. Journal of Educational Psychology, 61 , 229–240. https://doi.org/10.1037/h0029243


Hardy, R. A. (1975). CIRCO: The development of a Spanish language test battery for preschool children. Paper presented at the Florida Educational Research Association, Tampa, FL.

Hardy, R. (1977). Evaluation strategy for developmental projects in career education . Tallahassee: Florida Department of Education, Division of Vocational, Technical, and Adult Education.

Harsh, J. R. (1975). A bilingual/bicultural project. Azusa unified school district evaluation summary . Los Angeles: Educational Testing Service.

Hartnett, R. T., Clark, M. J., Feldmesser, R. A., Gieber, M. L., & Soss, N. M. (1974). The British Open University in the United States . Princeton: Educational Testing Service.

Harvey, P. R. (1974). National College of Education bilingual teacher education project . Evanston: Educational Testing Service.

Holland, P. W., Jamison, D. T., & Ragosta, M. (1976). Project report no. 1—Phase 1 final report research design . Princeton: Educational Testing Service.

Hood, D. E. (1972). Final audit report: Skyline career development center . Austin: Educational Testing Service.

Hood, D. E. (1974). Final audit report of the ESEA IV supplementary reading programs of the Dallas Independent School District. Bilingual education program . Austin: Educational Testing Service.

Hsia, J. (1976). Proposed formative evaluation of a WNET/ 13 pilot television program: The Speech Class [Proposal submitted to educational broadcasting corporation]. Princeton: Educational Testing Service.

Marco, G. L. (1972). Impact of Michigan 1970–71 grade 3 title I reading programs (Program Report No. PR-72-05). Princeton: Educational Testing Service.

McDonald, F. J. (1977). The effects of classroom interaction patterns and student characteristics on the acquisition of proficiency in English as a second language (Program Report No. PR-77-05). Princeton: Educational Testing Service.

McDonald, F. J., & Elias, P. (1976). Beginning teacher evaluation study, Phase 2. The effects of teaching performance on pupil learning (Vol. 1, Program Report No. PR-76-06A). Princeton: Educational Testing Service.

Messick, S. (1970). The criterion problem in the evaluation of instruction: Assessing possible, not just intended outcomes. In M. Wittrock & D. Wiley (Eds.), The evaluation of instruction: Issues and problems (pp. 183–220). New York: Holt, Rinehart and Winston.

Messick, S. (1975). Medical model of evaluation. In S. B. Anderson, S. Ball, & R. T. Murphy (Eds.), Encyclopedia of educational evaluation (pp. 245–247). San Francisco: Jossey-Bass Publishers.

Murphy, R. T. (1973a). Adult functional reading study (Program Report No. PR-73-48). Princeton: Educational Testing Service.

Murphy, R. T. (1973b). Investigation of a creativity dimension (Research Bulletin No. RB-73-12). Princeton: Educational Testing Service. http://dx.doi.org/10.1002/j.2333-8504.1973.tb01027.x

Murphy, R. T. (1977). Evaluation of the PLATO 4 computer-based education system: Community college component . Princeton: Educational Testing Service.

Powers, D. E. (1973). An evaluation of the new approach method (Program Report No. PR-73-47). Princeton: Educational Testing Service.

Powers, D. E. (1974). The Virginia Beach extended school year program and its effects on student achievement and attitudes—First year report (Program Report No. PR-74-25). Princeton: Educational Testing Service.

Powers, D. E. (1975a). Dual audio television: An evaluation of a six-month public broadcast (Program Report No. PR-75-21). Princeton: Educational Testing Service.

Powers, D. E. (1975b). The second year of year-round education in Virginia Beach: A follow-up evaluation (Program Report No. PR-75-27). Princeton: Educational Testing Service.

Rosenfeld, M. (1973). An evaluation of the Orchard Road School open space program (Program Report No. PR-73-14). Princeton: Educational Testing Service.

Shipman, V. C. (1970). Disadvantaged children and their first school experiences (Vol. 1, Program Report No. PR-70-20). Princeton: Educational Testing Service.

Shipman, V. C. (1974). Evaluation of an industry-sponsored child care center . An internal ETS report prepared for Bell Telephone Laboratories. Murray Hill, NJ. Unpublished manuscript, Educational Testing Service, Princeton, NJ.

Sigel, I. E. (1976). Developing representational competence in preschool children: A preschool educational program. In Basic needs, special needs: Implications for kindergarten programs. Selected papers from the New England Kindergarten Conference, Boston . Cambridge, MA: The Lesley College Graduate School of Education.

Swinton, S., & Amarel, M. (1978). The PLATO elementary demonstration: Educational outcome evaluation (Program Report No. PR-78-11). Princeton: Educational Testing Service.

Thomas, I. J. (1970). A bilingual and bicultural model early childhood education program. Fountain Valley School District title VII bilingual project . Berkeley: Educational Testing Service.

Thomas, I. J. (1973). Mathematics aid for disadvantaged students . Los Angeles: Educational Testing Service.

Trismen, D. A. (1968). Evaluation of the Education through Vision curriculum—Phase 1 . Princeton: Educational Testing Service.

Trismen, D. A. (with T. A. Barrows). (1970). Brevard County project: Final report to the Brevard County (Florida) school system (Program Report No. PR-70-06). Princeton: Educational Testing Service.

Trismen, D. A., Waller, M. I., & Wilder, G. (1976). A descriptive and analytic study of compensatory reading programs (Vols. 1 & 2, Program Report No. PR-76-03). Princeton: Educational Testing Service.

Vale, C. A. (1975). National needs assessment of educational media and materials for the handicapped [Proposal submitted to Office of Education]. Princeton: Educational Testing Service.

Ward, W. C., & Frederiksen, N. (1977). A study of the predictive validity of the tests of scientific thinking (Research Bulletin No. RB-77-06). Princeton: Educational Testing Service. http://dx.doi.org/10.1002/j.2333-8504.1977.tb01131.x

Wasdyke, R. G. (1976, August). An evaluation of the Maryland Career Information System [Oral report].

Wasdyke, R. G. (1977). Year 3—Third party annual evaluation report: Career education instructional system project. Newark School District. Newark, Delaware . Princeton: Educational Testing Service.

Wasdyke, R. G., & Grandy, J. (1976). Field evaluation of Manhattan Community School District #2 environmental education program . Princeton: Educational Testing Service.

Webb, E. J., Campbell, D. T., Schwartz, R. D., & Sechrest, L. (1966). Unobtrusive measures: Nonreactive research in the social sciences . Chicago: Rand McNally.

Woodford, P. E. (1975). Pilot project for oral proficiency interview tests of bilingual teachers and tentative determination of language proficiency criteria [Proposal submitted to Illinois State Department of Education]. Princeton: Educational Testing Service.


Author information

Authors and Affiliations

Educational Testing Service, Princeton, NJ, USA

Samuel Ball


Corresponding author

Correspondence to Samuel Ball .

Editor information

Editors and Affiliations

Educational Testing Service (ETS), Princeton, New Jersey, USA

Randy E. Bennett

National Board of Medical Examiners (NBME), Philadelphia, Pennsylvania, USA

Matthias von Davier

Appendix: Descriptions of ETS Evaluation and Some Related Studies in Some Key Categories

1.1 Aesthetics and Creativity in Education

For Bartlett Hayes III’s program of Education through Vision at Andover Academy, Donald A. Trismen developed a battery of evaluation instruments that assessed, inter alia, a variety of aesthetic judgments (Trismen 1968 ). Other ETS staff members working in this area have included Norman Frederiksen and William C. Ward , who have developed a variety of assessment techniques for tapping creativity and scientific creativity (Frederiksen and Ward 1975 ; Ward and Frederiksen 1977 ); Richard T. Murphy , who also has developed creativity-assessing techniques (Murphy 1973b , 1977 ); and Scarvia B. Anderson, who described a variety of ways to assess the effectiveness of aesthetic displays (Anderson 1968 ).

1.2 Bilingual Education

ETS staff have conducted and assisted in evaluations of numerous and varied programs of bilingual education. For example, Berkeley office staff (Reginald A. Corder , Patricia Elias, Patricia Wheeler ) have evaluated programs in Calexico (Corder 1976a ), Hacienda-La Puente (Elias and Wheeler 1972 ), and El Monte (Corder and Johnson 1972 ). For the Los Angeles office, J. Richard Harsh ( 1975 ) evaluated a bilingual program in Azusa, and Ivor Thomas ( 1970 ) evaluated one in Fountain Valley. Donald E. Hood ( 1974 ) of the Austin office evaluated the Dallas Bilingual Multicultural Program. These evaluations were variously formative and summative and covered bilingual programs that, in combination, served students from preschool (Fountain Valley) through 12th grade (Calexico).

1.3 Camping Programs

Those in charge of a school camping program in New York City felt that it was having unusual and positive effects on the students, especially in terms of motivation . ETS was asked to—and did—evaluate this program, using an innovative design and measurement procedures developed by Raymond G. Wasdyke and Jerilee Grandy ( 1976 ).

1.4 Career Education

In a decade of heavy federal emphasis on career education, ETS was involved in the evaluation of numerous programs in that field. For instance, Raymond G. Wasdyke ( 1977 ) helped the Newark, Delaware, school system determine whether its career education goals and programs were properly meshed. In Dallas, Donald Hood ( 1972 ) of the ETS regional staff assisted in developing goal specifications and reviewing evaluation test items for the Skyline Project, a performance contract calling for the training of high school students in 12 career clusters. Norman E. Freeberg ( 1970 ) developed a test battery to be used in evaluating the Neighborhood Youth Corps. Ivor Thomas ( 1973 ) of the Los Angeles office provided formative evaluation services for the Azusa Unified School District’s 10th grade career training and performance program for disadvantaged students. Roy Hardy ( 1977 ) of the Atlanta office directed the third-party evaluation of Florida’s Comprehensive Program of Vocational Education for Career Development, and Wasdyke ( 1976 ) evaluated the Maryland Career Information System. Reginald A. Corder, Jr. ( 1975 ) of the Berkeley office assisted in the evaluation of the California Career Education program and subsequently directed the evaluation of the Experience-Based Career Education Models of a number of regional education laboratories (Corder 1976b ).

1.5 Computer-Aided Instruction

Three major computer-aided instruction programs developed for use in schools and colleges have been evaluated by ETS. The most ambitious is PLATO from the University of Illinois. Initially, the ETS evaluation was directed by Ernest Anastasio ( 1972 ), but later the effort was divided between Richard T. Murphy , who focused on college-level programs in PLATO, and Spencer Swinton and Marianne Amarel ( 1978 ), who focused on elementary and secondary school programs. ETS also directed the evaluation of TICCIT , an instructional program for junior colleges that used small-computer technology; the study was conducted by Donald L. Alderman ( 1978 ). Marjorie Ragosta directed the evaluation of the first major in-school longitudinal demonstration of computer-aided instruction for low-income students (Holland et al. 1976 ).

1.6 Drug Programs

Robert F. Boldt (1975) served as a consultant on the National Academy of Sciences’ study assessing the effectiveness of drug antagonists (less harmful drugs that will “fight” the impact of illegal drugs). Samuel Ball (1973) served on a National Academy of Sciences panel that designed, for the National Institutes of Health, a means of evaluating media drug information programs and spot advertisements.

1.7 Educational Television

ETS was responsible for the national summative evaluation of the ETV series Sesame Street for preschoolers (Ball and Bogatz 1970) and The Electric Company for students in Grades 1 through 4 (Ball and Bogatz 1973); the principal evaluators were Samuel Ball, Gerry Ann Bogatz, and Donald B. Rubin. Additionally, Ronald Flaugher and Joan Knapp (1972) evaluated Bread and Butterflies, a series intended to help students clarify career choices; Jayjia Hsia (1976) evaluated a series on the teaching of English for high school students and a series on parenting for adults.

1.8 Higher Education

Much ETS research in higher education focuses on evaluating students or teachers, rather than programs, mirroring the fact that systematic program evaluation is not common at this level. ETS has made, however, at least two major forays into program evaluation in higher education. In their Open University study, Rodney T. Hartnett and associates joined with three American universities (Houston, Maryland, and Rutgers) to see whether the British Open University’s methods and materials were appropriate for American institutions (Hartnett et al. 1974). Mary Jo Clark, Leonard L. Baird, and Hartnett conducted a study of means of assessing quality in doctoral programs (Clark et al. 1976). They established an array of criteria for use in obtaining more precise descriptions and evaluations of doctoral programs than the prevailing technique—reputational surveys—provides. P. R. Harvey (1974) also evaluated the National College of Education Bilingual Teacher Education project, while Protase Woodford (1975) proposed a pilot project for oral proficiency interview tests of bilingual teachers and tentative determination of language proficiency criteria.

1.9 Preschool Programs

A number of preschool programs have been evaluated by ETS staff, including the ETV series Sesame Street (Ball and Bogatz 1970 ; Bogatz and Ball 1971 ). Irving Sigel ( 1976 ) conducted formative studies of developmental curriculum. Virginia Shipman ( 1974 ) helped the Bell Telephone Companies evaluate their day care centers, Samuel Ball , Brent Bridgeman , and Albert Beaton provided the U.S. Office of Child Development with a sophisticated design for the evaluation of Parent-Child Development Centers (Ball et al. 1976 ), and Ball and Kathryn Kazarow evaluated the To Reach a Child program (Ball and Kazarow 1974 ). Roy Hardy ( 1975 ) examined the development of CIRCO, a Spanish language test battery for preschool children.

1.10 Prison Programs

In New Jersey, ETS has been involved in the evaluation of educational programs for prisoners. Developed and administered by Mercer County Community College, the programs have been subject to ongoing study by Ronald L. Flaugher and Samuel Barnett ( 1972 ).

1.11 Reading Programs

ETS evaluators have been involved in a variety of ways in a variety of programs and proposed programs in reading. For example, in an extensive, national evaluation, Donald A. Trismen et al. ( 1976 ) studied the effectiveness of reading instruction in compensatory programs. At the same time, Donald E. Powers ( 1973 ) conducted a small study of the impact of a local reading program in Trenton, New Jersey. Ann M. Bussis , Edward A. Chittenden , and Marianne Amarel reported the results of their study of primary school teachers’ perceptions of their own teaching behavior (Bussis et al. 1976 ). Earlier, Richard T. Murphy surveyed the reading competencies and needs of the adult population (Murphy 1973a ).

1.12 Special Education

Samuel Ball and Karla Goldman ( 1976 ) conducted an evaluation of the largest private school for the learning disabled in New York City, and Carol Vale ( 1975 ) of the ETS office in Berkeley directed a national needs assessment concerning educational technology and special education. Paul Campbell ( 1976 ) directed a major study of an intervention program for learning disabled juvenile delinquents.

Rights and permissions

This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License ( http://creativecommons.org/licenses/by-nc/2.5/ ), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.


Copyright information

© 2017 Educational Testing Service

About this chapter

Ball, S. (2017). Evaluating Educational Programs. In: Bennett, R., von Davier, M. (eds) Advancing Human Assessment. Methodology of Educational Measurement and Assessment. Springer, Cham. https://doi.org/10.1007/978-3-319-58689-2_11



Joint Committee on Standards for Educational Evaluation

Program Evaluation Standards


The third edition of the Program Evaluation Standards was published in 2010. The six-year development process relied on formal and informal needs assessments; reviews of existing scholarship; and the involvement of more than 400 stakeholders in national and international reviews, field trials, and national hearings.

The third edition can be purchased from a variety of sellers including Sage Publishing and Amazon.com .

The standards statements have been adapted into a checklist, available from the WMU Evaluation Checklists Project .

Standards Statements

It should be noted that the standard statements alone do not convey the full measure of any standard. The full publication includes detailed guidelines and principles for applying the standards in the context of real-world evaluation situations.

In order to gain familiarity with the conceptual and practical foundations of these standards and their applications to extended cases, the JCSEE strongly encourages all evaluators and evaluation users to read the complete book.

Utility Standards

The utility standards are intended to increase the extent to which program stakeholders find evaluation processes and products valuable in meeting their needs.

  • U1 Evaluator Credibility Evaluations should be conducted by qualified people who establish and maintain credibility in the evaluation context.
  • U2 Attention to Stakeholders Evaluations should devote attention to the full range of individuals and groups invested in the program and affected by its evaluation.
  • U3 Negotiated Purposes Evaluation purposes should be identified and continually negotiated based on the needs of stakeholders.
  • U4 Explicit Values Evaluations should clarify and specify the individual and cultural values underpinning purposes, processes, and judgments.
  • U5 Relevant Information Evaluation information should serve the identified and emergent needs of stakeholders.
  • U6 Meaningful Processes and Products Evaluations should construct activities, descriptions, and judgments in ways that encourage participants to rediscover, reinterpret, or revise their understandings and behaviors.
  • U7 Timely and Appropriate Communicating and Reporting Evaluations should attend to the continuing information needs of their multiple audiences.
  • U8 Concern for Consequences and Influence Evaluations should promote responsible and adaptive use while guarding against unintended negative consequences and misuse.

Feasibility Standards

The feasibility standards are intended to increase evaluation effectiveness and efficiency.

  • F1 Project Management Evaluations should use effective project management strategies.
  • F2 Practical Procedures Evaluation procedures should be practical and responsive to the way the program operates.
  • F3 Contextual Viability Evaluations should recognize, monitor, and balance the cultural and political interests and needs of individuals and groups.
  • F4 Resource Use Evaluations should use resources effectively and efficiently.

Propriety Standards

The propriety standards support what is proper, fair, legal, right and just in evaluations.

  • P1 Responsive and Inclusive Orientation Evaluations should be responsive to stakeholders and their communities.
  • P2 Formal Agreements Evaluation agreements should be negotiated to make obligations explicit and take into account the needs, expectations, and cultural contexts of clients and other stakeholders.
  • P3 Human Rights and Respect Evaluations should be designed and conducted to protect human and legal rights and maintain the dignity of participants and other stakeholders.
  • P4 Clarity and Fairness Evaluations should be understandable and fair in addressing stakeholder needs and purposes.
  • P5 Transparency and Disclosure Evaluations should provide complete descriptions of findings, limitations, and conclusions to all stakeholders, unless doing so would violate legal and propriety obligations.
  • P6 Conflicts of Interests Evaluations should openly and honestly identify and address real or perceived conflicts of interests that may compromise the evaluation.
  • P7 Fiscal Responsibility Evaluations should account for all expended resources and comply with sound fiscal procedures and processes.

Accuracy Standards

The accuracy standards are intended to increase the dependability and truthfulness of evaluation representations, propositions, and findings, especially those that support interpretations and judgments about quality.

  • A1 Justified Conclusions and Decisions Evaluation conclusions and decisions should be explicitly justified in the cultures and contexts where they have consequences.
  • A2 Valid Information Evaluation information should serve the intended purposes and support valid interpretations.
  • A3 Reliable Information Evaluation procedures should yield sufficiently dependable and consistent information for the intended uses.
  • A4 Explicit Program and Context Descriptions Evaluations should document programs and their contexts with appropriate detail and scope for the evaluation purposes.
  • A5 Information Management Evaluations should employ systematic information collection, review, verification, and storage methods.
  • A6 Sound Designs and Analyses Evaluations should employ technically adequate designs and analyses that are appropriate for the evaluation purposes.
  • A7 Explicit Evaluation Reasoning Evaluation reasoning leading from information and analyses to findings, interpretations, conclusions, and judgments should be clearly and completely documented.
  • A8 Communication and Reporting Evaluation communications should have adequate scope and guard against misconceptions, biases, distortions, and errors.

Evaluation Accountability Standards

The evaluation accountability standards encourage adequate documentation of evaluations and a metaevaluative perspective focused on improvement and accountability for evaluation processes and products.

  • E1 Evaluation Documentation Evaluations should fully document their negotiated purposes and implemented designs, procedures, data, and outcomes.
  • E2 Internal Metaevaluation Evaluators should use these and other applicable standards to examine the accountability of the evaluation design, procedures employed, information collected, and outcomes.
  • E3 External Metaevaluation Program evaluation sponsors, clients, evaluators, and other stakeholders should encourage the conduct of external metaevaluations using these and other applicable standards.

An errata page was released to correct early versions of the book; it is available as a PDF.

Copyright & Citation

The standard names and statements, as reproduced above, are under copyright to the JCSEE. Permission is freely given for stakeholders to use them for educational and scholarly purposes with attribution to the JCSEE. Authors wishing to reproduce the standard names and standard statements with attribution to the JCSEE may do so after notifying the JCSEE of the specific publication or reproduction.

The full work should be cited as follows:

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2010). The Program Evaluation Standards: A guide for evaluators and evaluation users (3rd ed.). Thousand Oaks, CA: Corwin Press.

Program evaluation: An educator's portal into academic scholarship

Shera Hosseini

1 Faculty of Health Sciences, McMaster Institute for Research on Aging, McMaster Education Research, Innovation, and Theory, McMaster University, Hamilton, Ontario, Canada

Yusuf Yilmaz

2 McMaster Education Research, Innovation and Theory (MERIT) Program & Office of Continuing Professional Development, Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada

3 Department of Medical Education, Faculty of Medicine, Ege University, Izmir, Turkey

Kaushal Shah

4 Department of Emergency Medicine, Weill Cornell Medical School, New York, New York, USA

Michael Gottlieb

5 Department of Emergency Medicine, Rush University Medical Center, Chicago, Illinois, USA

Christine R. Stehman

6 Department of Emergency Medicine, University of Illinois College of Medicine ‐ Peoria/OSF Healthcare, Peoria, Illinois, USA

Andrew K. Hall

7 Department of Emergency Medicine, University of Ottawa, Ottawa, Ontario, Canada

8 Royal College of Physicians and Surgeons of Canada, Ottawa, Ontario, Canada

Teresa M. Chan

9 Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada

10 Division of Emergency Medicine, Department of Medicine, McMaster University, Hamilton, Ontario, Canada

11 McMaster Program for Education Research, Innovation, and Theory (MERIT), McMaster University, Hamilton, Ontario, Canada

12 Department of Health Research Methodology, Impact, and Evidence, McMaster University, Hamilton, Ontario, Canada

Program evaluation is an “essential responsibility,” but it is often not seen as a scholarly pursuit. While Boyer expanded what qualifies as educational scholarship, educators still need to engage in processes that are rigorous and of a requisite academic standard for their work to be labeled as scholarly. Many medical educators may feel that scholarly program evaluation is a daunting task amid the competing demands of curricular change, remediation, and clinical care. This paper explores how educators can take their questions about the outcomes and efficacy of their programs and efficiently engage in education scholarship. The authors outline how educators can examine whether training programs are having the desired impact and outcomes, and then how they might leverage this process into education scholarship.

INTRODUCTION

Program evaluation has been referred to as an “essential responsibility” for those tasked with the oversight of medical training programs, 1 but it is striking how little of this program evaluation work is labeled as scholarly and how rarely it translates into academic scholarship. While what qualifies as educational scholarship has been expanded well beyond traditional peer‐reviewed publications to include the scholarship of teaching, discovery, integration, and application, 2 there is still a need to engage in processes that are rigorous and of a requisite academic standard to be labeled as scholarly. 3 However, being asked to both create educational deliverables and innovate within this context is often already above and beyond the duties of overworked and under‐supported medical educators. Many medical educators may feel that scholarly program evaluation is a step too far—with so many competing interests, it can be difficult to find the “bandwidth” to accomplish these scholarly tasks. 4 Yet don’t we all wonder about the outcomes and efficacy of our programs? Were they received as intended? Are they having the desired impact and outcomes? And if so, wouldn’t it be nice to generate multiple wins from the same project? 5

It is not just a lack of time that can prevent medical educators from engaging in scholarly evaluation efforts. Some educators may also feel inadequately trained in program evaluation and unclear about which approaches and strategies to employ. Yet if an evaluation is completed well, there is often an opportunity to translate this work into scholarly outputs.

This paper aims to accomplish three goals: (1) to introduce educators to the concept of program evaluation, (2) to help them understand frameworks that will guide them in correctly and rigorously performing program evaluations, and (3) to discuss ways in which program evaluation can translate to scholarly output.

WHAT IS PROGRAM EVALUATION?

In medical education, a “program” can refer to a broad spectrum of activities and experiences, ranging from a new workplace‐based assessment program 6 , 7 to a boot camp series 8 to a longitudinal faculty development course. 9 , 10 It is an ever‐evolving field with new technologies, shifting paradigms, and often unclear scholarly formats. The delivery of medical education requires the implementation of programs. Whether it is a well‐established program (e.g., intern orientation or airway management training) or a novel approach to assessment (e.g., simulation‐based critical care competency or entrustable professional activity), these programs need to be evaluated to determine whether they are worthwhile with respect to effectiveness or value. A formal definition of program evaluation has been put forth by Mohanna and Cottrell as “a systematic approach to the collection, analysis, and interpretation of information about any aspect of the conceptualization, design, implementation, and utility of educational programmes”. 11 Simply stated, program evaluation is the process of identifying the value of an educational offering, but at times it can also be a way of determining issues or problems in need of systematic improvement.

Methods similar to those employed by experimentalists or epidemiologists may be used for measurement and analysis when conducting program evaluation, but this process is distinct from conventional research studies. Experimental research typically focuses on generating new knowledge that is transferable or generalizable to other contexts, whereas program evaluation seeks to understand the efficacy of a specific, discrete project (e.g., a curricular change in a program or a new course design). Quantitative experiments may involve hypothesis testing with a control group and an experimental group, while qualitative studies may seek to understand or describe an experienced phenomenon. Despite being distinct from research, program evaluation is a rigorous process that might use a variety of quantitative and/or qualitative data to determine the value of the outcomes of a program, though technically a research protocol is not required.

WHY AND WHEN TO USE PROGRAM EVALUATION

While the specific purposes of program evaluation are extensive, at its core, program evaluation is about values, judgements, decision making, and change. 1 , 12 , 13 Program evaluation is another way, beyond the program itself, to create a value proposition for your community. 14 Educators use program evaluation to determine the value and worth of the program they designed and then explain that worth to others. There are multiple program evaluation frameworks, and which framework you select is determined by the stakeholders and the focus of the evaluation. 13 , 15

The ultimate why of your program evaluation is how you define the program’s success in the eyes of the stakeholders and the focus of the evaluation. 16 This marker of success should fall into at least one broad category of program evaluation—accountability, knowledge, or development—though these categories are often intertwined. 1 , 12 , 17 More specific purposes for evaluation within these three categories are found in Table  1 .

Purposes for program evaluation

Although it can resemble research (e.g., experimental or qualitative medical education research), program evaluation is differentiated from research by the fundamental underlying impetus for the study—research seeks to understand the world better through its conduct (to create generalizable or transferable “truths” about how things work), whereas program evaluation seeks to understand whether and how a specific program works.

If done correctly, program evaluation is a systematic method of answering questions about the program you have designed, providing insights for others to replicate or avoid in their own programs. 18 Once the work has been done, “dissemination to the community at large constitutes a critical element of scholarship.” 13 This dissemination could take the form of publishing the program evaluation as an original research report, as an innovation report, or in an online curricular repository (e.g., MedEdPORTAL, JETem) to help advance knowledge for others (Table  2 ).

Comparing and contrasting various types of program evaluation scholarship

Overall, once the rationale is determined, program evaluation can be divided into two groups that help direct the when: formative (i.e., used to improve the performance of the program and for program monitoring; happens at various times) and summative (i.e., used for overall judgements about the program and its developers, usually at the end of the program). 19 , 20 No matter the why, all programs should have program evaluations built into them. In fact, Woodward argues that program evaluation should be done within every part of the educational intervention process; a needs assessment, for example, is the program evaluation that determines the need for the program. 19 Ideally, the program evaluation should be developed alongside the program itself, ensuring that one does a credible evaluation answering all required questions. 18 Early development of the program evaluation prevents later problems and allows data to be collected, as suggested by Durning et al., 16 during three phases: (1) before the program (establishes a baseline and helps show how much of the outcomes are due to the program itself), (2) during the program (process measurements; allows developers to notice and fix problems early), and (3) after the program (outcome measurements). The why and when of program evaluation feed directly into the approach you take in doing the program evaluation (i.e., how you actually do this).
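To make the before/during/after framing concrete, here is a minimal sketch (in Python, not drawn from the paper) of an evaluation plan that tags each planned measurement with one of the three phases described above; the program and indicator names are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class PlannedMeasurement:
    """One planned measurement: what is collected, when, and why."""
    phase: str      # "before", "during", or "after"
    indicator: str  # what is measured
    purpose: str    # why it is collected

@dataclass
class EvaluationPlan:
    program: str
    entries: list = field(default_factory=list)

    def add(self, phase, indicator, purpose):
        self.entries.append(PlannedMeasurement(phase, indicator, purpose))

    def by_phase(self, phase):
        return [e for e in self.entries if e.phase == phase]

# Hypothetical example: an intern orientation boot camp
plan = EvaluationPlan(program="Intern orientation boot camp")
plan.add("before", "baseline procedural skills checklist", "establish a baseline")
plan.add("during", "session attendance and facilitator notes", "notice and fix problems early")
plan.add("after",  "end-of-rotation competency ratings", "measure outcomes")

for entry in plan.by_phase("before"):
    print(entry.indicator, "->", entry.purpose)
```

Laying the plan out this way before launch makes it easy to check that every phase has at least one planned measurement and that each measurement has a stated purpose.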

HOW TO USE PROGRAM EVALUATION METHODOLOGIES

As stated above, development of the program evaluation should happen alongside development of the program itself, meaning prior to launching the program (or the most recent class of participants). This involves identifying the specific goals of the evaluation by considering the potential stakeholders and end‐users of the resultant evaluation. With this information, educators can better align the breadth and focus of the evaluation with their specific needs (Box  1 ).

Components of a program evaluation

  • Develop an evaluation question based on specific goals of various stakeholders
  • Identify your theory of change
  • Perform a literature search
  • Identify your (validated) collection instrument
  • Consider your outcomes with a broad lens

Once you have identified the target audience, next determine the underlying theory of change. The three most common theories for this are reductionism, system theory, and complexity theory. Reductionism relies upon an assumption that there is a specific order with a direct cause and effect for each action. 21 This approach, reflected in models such as the Logic model, 22 , 23 suggests that there is a clear linearity and predictable impact from each intervention. 1 System theory builds upon this, with roots in the general system theory applied to biology. 24 In this model, it is proposed that the whole of a system is greater than the sum of its individual parts. 24 Therefore, education programs expand beyond merely isolated parts, instead comprising the integration of the specific program components with each other and with the broader educational environment. Complexity theory expands further to adapt to the ever‐changing, more complex state of programs in real life. 1 , 25 There are multiple complex factors that can influence education programs, including the participants, the influence of stakeholders and regulators, professional practice patterns, the surrounding environment, and expanding knowledge within the specific field as well as within the education concepts being taught. 1 Understanding the underlying theories can help inform the conceptual frameworks selected for evaluation; we will dive into this more in the next section.
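For the reductionist case, the linear chain assumed by a Logic model can be written down explicitly. The sketch below is illustrative only: the class structure and the faculty development example are assumptions for the sake of illustration, not a published instrument.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LogicModel:
    """A simple linear logic model: inputs -> activities -> outputs -> outcomes."""
    program: str
    inputs: List[str] = field(default_factory=list)
    activities: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    outcomes: List[str] = field(default_factory=list)

    def describe(self) -> str:
        return " -> ".join([
            f"INPUTS: {', '.join(self.inputs)}",
            f"ACTIVITIES: {', '.join(self.activities)}",
            f"OUTPUTS: {', '.join(self.outputs)}",
            f"OUTCOMES: {', '.join(self.outcomes)}",
        ])

# Hypothetical faculty development course
model = LogicModel(
    program="Faculty development course",
    inputs=["faculty time", "curriculum materials", "funding"],
    activities=["monthly workshops", "peer observation"],
    outputs=["12 workshops delivered", "40 faculty trained"],
    outcomes=["improved teaching ratings", "new assessment practices adopted"],
)
print(model.describe())
```

System and complexity perspectives would add feedback loops and contextual influences that a flat chain like this cannot capture, which is exactly why the choice of theory matters before the framework is selected.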

CONCEPTUAL FRAMEWORKS

There are many frameworks that can guide your program evaluation process. A full description of each of these is beyond the scope of this paper; however, our authorship team has detailed six program evaluation frameworks that have been featured in medical education (and specifically AEM Education and Training) including: CIPP, Kirkpatrick Model, Logic Model, Realist Evaluation, RE‐AIM, and SQUIRE‐EDU. Table  3 provides a description of some of the more commonly used frameworks and sources of further information on each of them. 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 40 , 41 , 42 , 43 , 44 , 45 , 46 , 47

Conceptual frameworks for program evaluation

When creating the program evaluation, you may utilize frameworks to guide the data collection. The selection process for your conceptual framework will require consideration of the end‐users and which data will be most valuable to them. You should perform a thorough literature search to identify similarities and differences with prior programs. Questions should seek to assess the benefits and consequences of the new intervention or innovation. During the literature search, seek out existing tools used by similar programs to inform your evaluation tool design. Identify how each aligns with your current program evaluation needs and modify the tool where necessary. It is also important to collect validity evidence for your specific tool. 26 Even if a tool is “validated” in another setting, new validity evidence should be sought for the current application within the context of the new program. 26 Since evaluation is often centered on a particular program, the evaluation plan may contain outcomes that are idiosyncratic rather than generalizable; however, best practices of questionnaire design should still be followed as much as possible (e.g., basing the tool on instruments used in prior evaluations, pilot testing a survey tool prior to launch to ensure readability and clarity).
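As one small, easy-to-gather piece of validity evidence, internal-consistency reliability can be estimated from pilot responses. The sketch below computes Cronbach's alpha for an invented four-item, six-respondent pilot; it illustrates only one piece of the broader validity argument described above.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items, each a list of respondent scores."""
    k = len(item_scores)
    respondents = list(zip(*item_scores))          # one tuple of item scores per respondent
    total_scores = [sum(r) for r in respondents]   # each respondent's total score
    item_var_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / variance(total_scores))

# Invented pilot data: 4 questionnaire items, 6 respondents (1-5 Likert scale)
pilot = [
    [4, 5, 3, 4, 4, 5],
    [4, 4, 3, 5, 4, 4],
    [3, 5, 2, 4, 4, 5],
    [4, 4, 3, 4, 5, 5],
]
print(f"Cronbach's alpha = {cronbach_alpha(pilot):.2f}")
```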

Finally, consider the outcomes with a broader lens. While outcomes are often framed in learner‐oriented terms (e.g., the Kirkpatrick model), it is also important to consider the costs (e.g., time, expenses, faculty) and broader societal implications, as described further below. Those reading the findings will want to weigh the costs and benefits of the program.

MARKERS OF HIGH‐QUALITY PROGRAM EVALUATION

Program evaluation and research studies share many features, and depending on the objectives of a study, the two can look very similar. While research studies aim to produce new knowledge, program evaluation studies focus on the quality and value of a program. 27 When unsure, ethics board guidelines are helpful for determining whether the study you are about to conduct is a program evaluation. In the United States, many program evaluations will require institutional review board review but are usually granted exemption status, since program evaluations fall well within normal educational practices. Ethics boards in Canada deem program evaluations exempt from ethical review as per the Tri-Council Policy Statement 2 (2018), Article 2.5. 28 Therefore, a program evaluation study should initially be checked with the ethics board and receive an ethical exemption to make sure that the study purpose, objectives, data collection, and analysis align with it.

There are three common approaches to program evaluation studies: decision‐oriented, outcomes‐oriented, and expertise‐oriented. 29 The program evaluation frameworks and models described in the previous section can feed into these overall approaches. These frameworks are of vital value to the overall program evaluation process. 1 Without a framework, a program evaluation may lose its focus, and the flow of the study may become redundant and less helpful. As each framework focuses on different parts of a study, it is important for researchers to take into account the study’s objectives and focus. The face validity of a framework should be agreed upon by the investigators, meaning that the outcomes of the study can be achieved through the selected framework. 13 A study could focus on many objectives, such as trainees’ learning, satisfaction, and the intervention’s success in reaching various audiences. 1

Innovation reports are an integral part of program evaluation studies, as they evaluate novel approaches to teaching and learning. Hall and colleagues reviewed the literature on the quality markers of innovation reports and identified 34 items, grouped into seven themes ranging from analysis of the problem to dissemination of results, to ensure that innovation reports adequately provide insight and reproducibility. 30 Ensuring rigor and reproducibility is therefore important for any type of program evaluation study. Box 2 provides pearls to help researchers who plan to tackle program evaluation studies. Box 3 contains an annotated bibliography that summarizes key resources for further reading.

Pearls for those interested in conducting program evaluation work

Based on prior literature on innovation reports and program evaluations, we have identified some common problems encountered when authors claim to have conducted these types of studies:

Pearl 1: Plan the program evaluation from the onset . Ideally, program evaluation should be established prior to the program launch (or at least prior to the most recent cohort). Performing program evaluation once the program is ongoing will limit the available information and increase the risk of recall bias.

Pearl 2: Consider all of the inputs and outputs. The evaluators will need to think beyond just the learner outcomes and consider the broader outcomes, impacts, and the resources and requirements to run the program.

Pearl 3: Attempt to identify unintended outcomes. Intended outcomes are often tracked but a systematic inquiry into identifying unintended outcomes is often overlooked.

Pearl 4: Involve a statistician or a data scientist early. Some program evaluation approaches require complex statistical analysis, and further exploration may be needed to understand the complex data collected through program implementation. A statistician or data scientist can suggest different approaches to analyzing the data and understanding the relationship between program focus and outcomes.

Pearl 5: Chart the overall program evaluation process. Program evaluation can be complex from planning through to the final evaluation. Each step of the program evaluation should be represented in a figure in the study. This charting gives readers a clear idea of the program evaluation steps and how the framework was implemented at each step.

Key resources for further reading

The following are key papers on program evaluation methodology, recommended for those interested in learning more.

1. Frye AW, Hemmer PA. Program evaluation models and related theories: AMEE guide no. 67. Med Teach. 2012;34(5):e288–e299.

This is a review of several common program evaluation models and the benefits and limitations of each. The paper also provides examples of how to apply these in practice.

2. Cook DA. Twelve tips for evaluating educational programs. Med Teach. 2010;32:296–301.

A concise article that breaks down program evaluation into twelve “tips” to guide development and implementation. Not meant to be used alone, but a solid introduction to the process, with a blank table included for readers to start brainstorming their own program evaluations.

3. Goldie J. AMEE Education Guide no. 29: Evaluating Educational Programs. Med Teach. 2006;28(3):210–224.

An introductory how‐to guide for program evaluation of educational programs in general, including the history and the process. A solid starting point for someone unfamiliar with the process, and a useful foundation for AMEE guide no. 67 (item 1 above), which walks the reader through theories to use as frameworks for program evaluations.

4. Durning SJ, Hemmer P, Pangaro LN. The structure of program evaluation: an approach for evaluating a course, clerkship, or components of a residency or fellowship training program. Teach Learn Med. 19(3):308–318. doi:10.1080/10401330701366796

While the other articles included here address program evaluation in general, this article focuses on applying program evaluation to graduate medical education. While it presents just one particular framework out of many that are available, it provides insight into how to apply program evaluation to programs that don’t necessarily fit the usual educational program mold. For medical educators beginning their program evaluation journey, this example shows how other frameworks might be used for their own programs.

Program evaluation can be seen as a gateway toward other forms of scholarship for those who are most at home developing programs and curricula. However, it should also be acknowledged as its own form of scholarship, unique and separate from curriculum development or research.

CONFLICTS OF INTEREST

Dr. Shera Hosseini has received funding for her postdoctoral fellowship from the McMaster Institute for Research on Aging (MIRA). Dr. Yilmaz is the recipient of a 2019 TUBITAK Postdoctoral Fellowship grant. Dr. Shah reports no conflicts and no grants. Dr. Gottlieb holds grants for unrelated work from the Centers for Disease Control and Prevention, the Council of Residency Directors in Emergency Medicine, the Society for Academic Emergency Medicine, and eCampus Ontario. Dr. Stehman reports no conflicts. Dr. Hall holds grants for unrelated work from the Royal College of Physicians and Surgeons of Canada, the Queen’s University Center for Teaching and Learning, and the Physician Services Incorporated Foundation. Dr. Chan holds grants for unrelated work from McMaster University, the PSI Foundation, the Society for Academic Emergency Medicine, eCampus Ontario, the University of Saskatchewan, and the Royal College of Physicians and Surgeons of Canada.

Hosseini S, Yilmaz Y, Shah K, et al. Program evaluation: An educator's portal into academic scholarship . AEM Educ Train . 2022; 6(Suppl. 1) :S43–S51. doi: 10.1002/aet2.10745 [ CrossRef ] [ Google Scholar ]

Supervising Editor: Dr. Susan Promes


Introduction to Program Evaluation for Public Health Programs: A Self-Study Guide

Introduction

  • What Is Program Evaluation?
  • Evaluation Supplements Other Types of Reflection and Data Collection
  • Distinguishing Principles of Research and Evaluation
  • Why Evaluate Public Health Programs?
  • CDC’s Framework for Program Evaluation in Public Health
  • How to Establish an Evaluation Team and Select a Lead Evaluator
  • Organization of This Manual

Most program managers assess the value and impact of their work all the time when they ask questions, consult partners, make assessments, and obtain feedback. They then use the information collected to improve the program. Indeed, such informal assessments fit nicely into a broad definition of evaluation as the “examination of the worth, merit, or significance of an object.” [4] And throughout this manual, the term “program” will be defined as “any set of organized activities supported by a set of resources to achieve a specific and intended result.” This definition is intentionally broad so that almost any organized public health action can be seen as a candidate for program evaluation:

  • Direct service interventions (e.g., a program that offers free breakfasts to improve nutrition for grade school children)
  • Community mobilization efforts (e.g., an effort to organize a boycott of California grapes to improve the economic well-being of farm workers)
  • Research initiatives (e.g., an effort to find out whether disparities in health outcomes based on race can be reduced)
  • Advocacy work (e.g., a campaign to influence the state legislature to pass legislation regarding tobacco control)
  • Training programs (e.g., a job training program to reduce unemployment in urban neighborhoods)

What distinguishes program evaluation from ongoing informal assessment is that program evaluation is conducted according to a set of guidelines. With that in mind, this manual defines program evaluation as “the systematic collection of information about the activities, characteristics, and outcomes of programs to make judgments about the program, improve program effectiveness, and/or inform decisions about future program development.” [5] Program evaluation does not occur in a vacuum; rather, it is influenced by real-world constraints. Evaluation should be practical and feasible and conducted within the confines of resources, time, and political context. Moreover, it should serve a useful purpose, be conducted in an ethical manner, and produce accurate findings. Evaluation findings should be used both to make decisions about program implementation and to improve program effectiveness.

Many different questions can be part of a program evaluation, depending on how long the program has been in existence, who is asking the question, and why the information is needed.

In general, evaluation questions fall into these groups:

  • Implementation: Were your program’s activities put into place as originally intended?
  • Effectiveness: Is your program achieving the goals and objectives it was intended to accomplish?
  • Efficiency: Are your program’s activities being produced with appropriate use of resources such as budget and staff time?
  • Cost-Effectiveness: Does the value or benefit of achieving your program’s goals and objectives exceed the cost of producing them?
  • Attribution: Can progress on goals and objectives be shown to be related to your program, as opposed to other things that are going on at the same time?

All of these are appropriate evaluation questions and might be asked with the intention of documenting program progress, demonstrating accountability to funders and policymakers, or identifying ways to make the program better.
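For the efficiency and cost-effectiveness questions in particular, even a rough calculation can anchor the discussion. The sketch below uses invented figures for a hypothetical job training program; the ratios are illustrative and are not a substitute for a full economic evaluation.

```python
def cost_per_outcome(total_cost, outcomes_achieved):
    """Simple cost-effectiveness ratio: dollars spent per unit of outcome."""
    return total_cost / outcomes_achieved

# Invented figures for a hypothetical job training program
budget = 150_000            # annual program cost in dollars
participants_trained = 300  # output: people completing training
participants_employed = 90  # intended outcome: employed within 6 months

print(f"Cost per participant trained: ${cost_per_outcome(budget, participants_trained):,.0f}")
print(f"Cost per participant employed: ${cost_per_outcome(budget, participants_employed):,.0f}")
```

Comparing the two ratios makes the difference between an output (people trained) and an intended outcome (people employed) concrete, which is often where evaluation discussions with funders begin.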

Planning asks, “What are we doing and what should we do to achieve our goals?” By providing information on progress toward organizational goals and identifying which parts of the program are working well and/or poorly, program evaluation sets up the discussion of what can be changed to help the program better meet its intended goals and objectives.

Increasingly, public health programs are accountable to funders, legislators, and the general public. Many programs do this by creating, monitoring, and reporting results for a small set of markers and milestones of program progress. Such “performance measures” are a type of evaluation—answering the question “How are we doing?” More importantly, when performance measures show significant or sudden changes in program performance, program evaluation efforts can be directed to the troubled areas to determine “Why are we doing poorly or well?”
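As a small illustration of how performance measures can point evaluation efforts at troubled areas, the sketch below flags any measure whose latest value moved more than a chosen threshold; the measures, values, and 20% threshold are all invented for illustration.

```python
def flag_for_evaluation(measures, threshold=0.20):
    """Flag performance measures whose latest value changed by more than
    `threshold` relative to the prior period, suggesting a closer look."""
    flagged = []
    for name, (previous, current) in measures.items():
        change = (current - previous) / previous
        if abs(change) >= threshold:
            flagged.append((name, round(change, 2)))
    return flagged

# Invented quarterly performance measures: (previous quarter, current quarter)
measures = {
    "children screened": (480, 455),
    "homes assessed for lead": (120, 70),
    "cases receiving follow-up": (60, 58),
}
print(flag_for_evaluation(measures))  # -> [('homes assessed for lead', -0.42)]
```

The flagged measure answers “How are we doing?”; a focused evaluation of that program area is what answers “Why are we doing poorly or well?”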

Linking program performance to program budget is the final step in accountability. Called “activity-based budgeting” or “performance budgeting,” it requires an understanding of program components and the links between activities and intended outcomes. The early steps in the program evaluation approach (such as logic modeling) clarify these relationships, making the link between budget and performance easier and more apparent.

While the terms surveillance and evaluation are often used interchangeably, each makes a distinctive contribution to a program, and it is important to clarify their different purposes. Surveillance is the continuous monitoring or routine data collection on various factors (e.g., behaviors, attitudes, deaths) over a regular interval of time. Surveillance systems have existing resources and infrastructure. Data gathered by surveillance systems are invaluable for performance measurement and program evaluation, especially of longer term and population-based outcomes. In addition, these data serve an important function in program planning and “formative” evaluation by identifying key burden and risk factors—the descriptive and analytic epidemiology of the public health problem. There are limits, however, to how useful surveillance data can be for evaluators. For example, some surveillance systems such as the Behavioral Risk Factor Surveillance System (BRFSS), Youth Tobacco Survey (YTS), and Youth Risk Behavior Survey (YRBS) can measure changes in large populations, but have insufficient sample sizes to detect changes in outcomes for more targeted programs or interventions. Also, these surveillance systems may have limited flexibility to add questions for a particular program evaluation.

In the best of all worlds, surveillance and evaluation are companion processes that can be conducted simultaneously. Evaluation may supplement surveillance data by providing tailored information to answer specific questions about a program. Data from specific questions for an evaluation are more flexible than surveillance and may allow program areas to be assessed in greater depth. For example, a state may supplement surveillance information with detailed surveys to evaluate how well a program was implemented and the impact of a program on participants’ knowledge, attitudes, and behavior. Evaluators can also use qualitative methods (e.g., focus groups, semi-structured or open-ended interviews) to gain insight into the strengths and weaknesses of a particular program activity.

Both research and program evaluation make important contributions to the body of knowledge, but fundamental differences in their purposes mean that good program evaluation need not always follow an academic research model. Even though some of these differences have tended to break down as research moves toward increasingly participatory models [6] and some evaluations aspire to make statements about attribution, “pure” research and evaluation serve somewhat different purposes (see the “Distinguishing Principles of Research and Evaluation” comparison below), nicely summarized in the adage “Research seeks to prove; evaluation seeks to improve.” Academic research focuses primarily on testing hypotheses; a key purpose of program evaluation is to improve practice. Research is generally thought of as requiring a controlled environment or control groups. In field settings directed at prevention and control of a public health problem, this is seldom realistic. Of the ten concepts contrasted below, the last three are especially worth noting. Unlike pure academic research models, program evaluation acknowledges and incorporates differences in values and perspectives from the start, may address many questions besides attribution, and tends to produce results for varied audiences.

Distinguishing Principles of Research and Evaluation

Planning

Research: scientific method
  • State hypothesis.
  • Collect data.
  • Analyze data.
  • Draw conclusions.

Program evaluation: framework for program evaluation
  • Engage stakeholders.
  • Describe the program.
  • Focus the evaluation design.
  • Gather credible evidence.
  • Justify conclusions.
  • Ensure use and share lessons learned.

Decision making

Research: investigator-controlled
  • Authoritative.

Program evaluation: stakeholder-controlled
  • Collaborative.

Standards

Research: validity
  • Internal (accuracy, precision).
  • External (generalizability).
  • Repeatability.

Program evaluation: program evaluation standards
  • Utility.
  • Feasibility.
  • Propriety.
  • Accuracy.

Questions

Research
  • Descriptions.
  • Associations.

Program evaluation
  • Merit (i.e., quality).
  • Worth (i.e., value).
  • Significance (i.e., importance).

Design

Research: isolate changes and control circumstances
  • Narrow experimental influences.
  • Ensure stability over time.
  • Minimize context dependence.
  • Treat contextual factors as confounding (e.g., randomization, adjustment, statistical control).
  • Understand that comparison groups are a necessity.

Program evaluation: incorporate changes and account for circumstances
  • Expand to see all domains of influence.
  • Encourage flexibility and improvement.
  • Maximize context sensitivity.
  • Treat contextual factors as essential information (e.g., system diagrams, logic models, hierarchical or ecological modeling).
  • Understand that comparison groups are optional (and sometimes harmful).

Data collection

Research
  • Limited number (accuracy preferred).
  • Sampling strategies are critical.
  • Concern for protecting human subjects.

Program evaluation
  • Multiple (triangulation preferred).
  • Concern for protecting human subjects, organizations, and communities.

Indicators/measures

Research
  • Quantitative.
  • Qualitative.

Program evaluation
  • Mixed methods (qualitative, quantitative, and integrated).

Analysis and synthesis

Research
  • One-time (at the end).
  • Focus on specific variables.
  • Attempt to remain value-free.

Program evaluation
  • Ongoing (formative and summative).
  • Integrate all data.
  • Examine agreement on values.
  • State precisely whose values are used.

Conclusions

Research: attribution
  • Establish time sequence.
  • Demonstrate plausible mechanisms.
  • Control for confounding.
  • Replicate findings.

Program evaluation: attribution and contribution
  • Account for alternative explanations.
  • Show similar effects in similar contexts.

Dissemination and feedback

Research: disseminate to interested audiences
  • Content and format varies to maximize comprehension.

Program evaluation: feedback to stakeholders
  • Focus on intended uses by intended users.
  • Build capacity.
  • Emphasis on full disclosure.
  • Requirement for balanced assessment.
Reasons to evaluate public health programs include:

  • To monitor progress toward the program’s goals
  • To determine whether program components are producing the desired progress on outcomes
  • To permit comparisons among groups, particularly among populations with disproportionately high risk factors and adverse health outcomes
  • To justify the need for further funding and support
  • To find opportunities for continuous quality improvement
  • To ensure that effective programs are maintained and resources are not wasted on ineffective programs

Program staff may be pushed to do evaluation by external mandates from funders, authorizers, or others, or they may be pulled to do evaluation by an internal need to determine how the program is performing and what can be improved. While push or pull can motivate a program to conduct good evaluations, program evaluation efforts are more likely to be sustained when staff see the results as useful information that can help them do their jobs better.

Data gathered during evaluation enable managers and staff to create the best possible programs, to learn from mistakes, to make modifications as needed, to monitor progress toward program goals, and to judge the success of the program in achieving its short-term, intermediate, and long-term outcomes. Most public health programs aim to change behavior in one or more target groups and to create an environment that reinforces sustained adoption of these changes, with the intention that changes in environments and behaviors will prevent and control diseases and injuries. Through evaluation, you can track these changes and, with careful evaluation designs, assess the effectiveness and impact of a particular program, intervention, or strategy in producing these changes.
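One common “careful evaluation design” is a pre/post comparison against a similar group that did not receive the program. The sketch below computes a simple difference-in-differences estimate from invented percentages; it illustrates the general idea rather than a design prescribed by this guide.

```python
def difference_in_differences(program_pre, program_post, comparison_pre, comparison_post):
    """Change in the program group minus change in the comparison group."""
    return (program_post - program_pre) - (comparison_post - comparison_pre)

# Invented outcome: percentage of adults reporting a target health behavior
effect = difference_in_differences(
    program_pre=32.0, program_post=45.0,       # community with the program
    comparison_pre=31.0, comparison_post=36.0  # similar community without the program
)
print(f"Estimated program effect: {effect:.1f} percentage points")  # 8.0
```

Subtracting the comparison group's change helps separate what the program produced from what was already changing in the environment, which is the heart of the attribution question raised earlier.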

Recognizing the importance of evaluation in public health practice and the need for appropriate methods, the World Health Organization (WHO) established the Working Group on Health Promotion Evaluation. The Working Group prepared a set of conclusions and related recommendations to guide policymakers and practitioners. [7] Recommendations immediately relevant to the evaluation of comprehensive public health programs include:

  • Encourage the adoption of participatory evaluation approaches that provide meaningful opportunities for involvement by all of those with a direct interest in initiatives (programs, policies, and other organized activities).
  • Require that a portion of total financial resources for a health promotion initiative be allocated to evaluation—they recommend 10%.
  • Ensure that a mixture of process and outcome information is used to evaluate all health promotion initiatives.
  • Support the use of multiple methods to evaluate health promotion initiatives.
  • Support further research into the development of appropriate approaches to evaluating health promotion initiatives.
  • Support the establishment of a training and education infrastructure to develop expertise in the evaluation of health promotion initiatives.
  • Create and support opportunities for sharing information on evaluation methods used in health promotion through conferences, workshops, networks, and other means.

[Figure 1.1: Steps and standards of the CDC Evaluation Framework. The six steps are (1) engage stakeholders, (2) describe the program, (3) focus the evaluation design, (4) gather credible evidence, (5) justify conclusions, and (6) ensure use and share lessons learned.]

Program evaluation is one of ten essential public health services [8] and a critical organizational practice in public health. [9] Until recently, however, there has been little agreement among public health officials on the principles and procedures for conducting such studies. In 1999, CDC published Framework for Program Evaluation in Public Health and some related recommendations. [10] The Framework, as depicted in Figure 1.1, defined six steps and four sets of standards for conducting good evaluations of public health programs.

The underlying logic of the Evaluation Framework is that good evaluation does not merely gather accurate evidence and draw valid conclusions, but produces results that are used to make a difference. To maximize the chances evaluation results will be used, you need to create a “market” before you create the “product”—the evaluation. You determine the market by focusing evaluations on questions that are most salient, relevant, and important. You ensure the best evaluation focus by understanding where the questions fit into the full landscape of your program description, and especially by ensuring that you have identified and engaged stakeholders who care about these questions and want to take action on the results.

The steps in the CDC Framework are informed by a set of standards for evaluation. [11] These standards do not constitute a way to do evaluation; rather, they serve to guide your choice from among the many options available at each step in the Framework. The 30 standards cluster into four groups:

Utility: Who needs the evaluation results? Will the evaluation provide relevant information in a timely manner for them?

Feasibility: Are the planned evaluation activities realistic given the time, resources, and expertise at hand?

Propriety: Does the evaluation protect the rights of individuals and protect the welfare of those involved? Does it engage those most directly affected by the program and changes in the program, such as participants or the surrounding community?

Accuracy: Will the evaluation produce findings that are valid and reliable, given the needs of those who will use the results?

Sometimes the standards broaden your exploration of choices. Often, they help reduce the options at each step to a manageable number. For example, in the step “Engaging Stakeholders,” the standards can help you think broadly about who constitutes a stakeholder for your program, but simultaneously can reduce the potential list to a manageable number by posing the following questions: ( Utility ) Who will use these results? ( Feasibility ) How much time and effort can be devoted to stakeholder engagement? ( Propriety ) To be ethical, which stakeholders need to be consulted, those served by the program or the community in which it operates? ( Accuracy ) How broadly do you need to engage stakeholders to paint an accurate picture of this program?

Similarly, there are unlimited ways to gather credible evidence (Step 4). Asking these same kinds of questions as you approach evidence gathering will help identify the ones that will be most useful, feasible, proper, and accurate for this evaluation at this time. Thus, the CDC Framework approach supports the fundamental insight that there is no such thing as the right program evaluation. Rather, over the life of a program, any number of evaluations may be appropriate, depending on the situation.
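As a small illustration of using the four standards to narrow options, the sketch below screens a candidate list (here, possible stakeholders for Step 1) against yes/no judgments on each standard; the candidates and the judgments are invented for illustration.

```python
# Screening candidates against the four standards (utility, feasibility,
# propriety, accuracy). All judgments below are hypothetical.
STANDARDS = ("utility", "feasibility", "propriety", "accuracy")

candidates = {
    "program participants":          {"utility": True,  "feasibility": True,  "propriety": True, "accuracy": True},
    "state legislators":             {"utility": True,  "feasibility": False, "propriety": True, "accuracy": True},
    "national expert panel":         {"utility": False, "feasibility": False, "propriety": True, "accuracy": True},
    "local health department staff": {"utility": True,  "feasibility": True,  "propriety": True, "accuracy": True},
}

def passes_screen(judgments, required=STANDARDS):
    """A candidate stays on the list only if it satisfies every required standard."""
    return all(judgments[standard] for standard in required)

shortlist = [name for name, judgments in candidates.items() if passes_screen(judgments)]
print(shortlist)  # -> ['program participants', 'local health department staff']
```

In practice the judgments would come from the evaluation team's discussion rather than a lookup table; the point is that the standards give the discussion a consistent set of screening questions.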

Characteristics of a good evaluator:

  • Experience in the type of evaluation needed
  • Comfortable with quantitative data sources and analysis
  • Able to work with a wide variety of stakeholders, including representatives of target populations
  • Can develop innovative approaches to evaluation while considering the realities affecting a program (e.g., a small budget)
  • Incorporates evaluation into all program activities
  • Understands both the potential benefits and risks of evaluation
  • Educates program personnel in designing and conducting the evaluation
  • Will give staff the full findings (i.e., will not gloss over or fail to report certain findings)

Good evaluation requires a combination of skills that are rarely found in one person. The preferred approach is to choose an evaluation team that includes internal program staff, external stakeholders, and possibly consultants or contractors with evaluation expertise.

An initial step in the formation of a team is to decide who will be responsible for planning and implementing evaluation activities. One program staff person should be selected as the lead evaluator to coordinate program efforts. This person should be responsible for evaluation activities, including planning and budgeting for evaluation, developing program objectives, addressing data collection needs, reporting findings, and working with consultants. The lead evaluator is ultimately responsible for engaging stakeholders, consultants, and other collaborators who bring the skills and interests needed to plan and conduct the evaluation.

Although this staff person should have the skills necessary to competently coordinate evaluation activities, he or she can choose to look elsewhere for technical expertise to design and implement specific tasks. However, developing in-house evaluation expertise and capacity is a beneficial goal for most public health organizations. Of the characteristics of a good evaluator listed in the text box above, the evaluator’s ability to work with a diverse group of stakeholders warrants highlighting. The lead evaluator should be willing and able to draw out and reconcile differences in values and standards among stakeholders and to work with knowledgeable stakeholder representatives in designing and conducting the evaluation.

Seek additional evaluation expertise in programs within the health department, through external partners (e.g., universities, organizations, companies), from peer programs in other states and localities, and through technical assistance offered by CDC. [12]

You can also use outside consultants as volunteers, advisory panel members, or contractors. External consultants can provide high levels of evaluation expertise from an objective point of view. Important factors to consider when selecting consultants are their level of professional training, experience, and ability to meet your needs. Overall, it is important to find a consultant whose approach to evaluation, background, and training best fit your program’s evaluation needs and goals. Be sure to check all references carefully before you enter into a contract with any consultant.

To generate discussion around evaluation planning and implementation, several states have formed evaluation advisory panels. Advisory panels typically generate input from local, regional, or national experts otherwise difficult to access. Such an advisory panel will lend credibility to your efforts and prove useful in cultivating widespread support for evaluation activities.

Evaluation team members should clearly define their respective roles. Informal consensus may be enough for some teams; others prefer a written agreement that describes who will conduct the evaluation and assigns specific roles and responsibilities to individual team members. Either way, the team must clarify and reach consensus on the:

  • Purpose of the evaluation
  • Potential users of the evaluation findings and plans for dissemination
  • Evaluation approach
  • Resources available
  • Protection for human subjects.

The agreement should also include a timeline and a budget for the evaluation.
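A lightweight way to capture that consensus is to record the agreement as a simple structure the team can review and update. The sketch below is illustrative only; the field names mirror the list above, and every value is a placeholder rather than a recommendation.

```python
# Hypothetical evaluation agreement for a provider education program; all values
# are placeholders to show the kind of detail worth writing down.
evaluation_agreement = {
    "purpose": "Determine whether provider education increases immunization rates",
    "intended_users": ["program manager", "state immunization director", "funder"],
    "dissemination": ["annual report", "stakeholder briefing"],
    "approach": "mixed methods (provider survey plus registry data review)",
    "resources": {"lead_evaluator": "program staff member", "consultant_days": 10},
    "human_subjects_protection": "submit to the ethics board before data collection",
    "timeline": {"planning": "Q1", "data_collection": "Q2-Q3", "reporting": "Q4"},
    "budget_usd": 25_000,
    "roles": {"lead evaluator": "coordinate and report", "epidemiologist": "analysis"},
}

for field_name, value in evaluation_agreement.items():
    print(f"{field_name}: {value}")
```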

This manual is organized by the six steps of the CDC Framework. Each chapter will introduce the key questions to be answered in that step, approaches to answering those questions, and how the four evaluation standards might influence your approach. The main points are illustrated with one or more public health examples that are composites inspired by actual work being done by CDC and states and localities. [13] Some examples that will be referred to throughout this manual:

The program aims to provide affordable home ownership to low-income families by identifying and linking funders/sponsors, construction volunteers, and eligible families. Together, they build a house over a multi-week period. At the end of the construction period, the home is sold to the family using a no-interest loan.

Lead poisoning is the most widespread environmental hazard facing young children, especially in older inner-city areas. Even at low levels, elevated blood lead levels (EBLL) have been associated with reduced intelligence, medical problems, and developmental problems. The main sources of lead poisoning in children are paint and dust in older homes with lead-based paint. Public health programs address the problem through a combination of primary and secondary prevention efforts. A typical secondary prevention program at the local level does outreach and screening of high-risk children, identifying those with EBLL, assessing their environments for sources of lead, and case managing both their medical treatment and environmental corrections. However, these programs must rely on others to accomplish the actual medical treatment and the reduction of lead in the home environment.

A common initiative of state immunization programs is comprehensive provider education programs to train and motivate private providers to provide more immunizations. A typical program includes a newsletter distributed three times per year to update private providers on new developments and changes in policy, and provide a brief education on various immunization topics; immunization trainings held around the state conducted by teams of state program staff and physician educators on general immunization topics and the immunization registry; a Provider Tool Kit on how to increase immunization rates in their practice; training of nursing staff in local health departments who then conduct immunization presentations in individual private provider clinics; and presentations on immunization topics by physician peer educators at physician grand rounds and state conferences.

Each chapter also provides checklists and worksheets to help you apply the teaching points.

[4] Scriven M. Minimalist theory of evaluation: The least theory that practice requires. American Journal of Evaluation 1998;19:57-70.

[5] Patton MQ. Utilization-focused evaluation: The new century text. 3rd ed. Thousand Oaks, CA: Sage, 1997.

[6] Green LW, George MA, Daniel M, Frankish CJ, Herbert CP, Bowie WR, et al. Study of participatory research in health promotion: Review and recommendations for the development of participatory research in health promotion in Canada. Ottawa, Canada: Royal Society of Canada, 1995.

[7] WHO European Working Group on Health Promotion Evaluation. Health promotion evaluation: Recommendations to policy-makers: Report of the WHO European working group on health promotion evaluation. Copenhagen, Denmark: World Health Organization, Regional Office for Europe, 1998.

[8] Public Health Functions Steering Committee. Public health in America. Fall 1994. Available at <http://www.health.gov/phfunctions/public.htm>. Accessed January 1, 2000.

[9] Dyal WW. Ten organizational practices of public health: A historical perspective. American Journal of Preventive Medicine 1995;11(6)Suppl 2:6-8.

[10] Centers for Disease Control and Prevention. op cit.

[11] Joint Committee on Standards for Educational Evaluation. The program evaluation standards: How to assess evaluations of educational programs. 2nd ed. Thousand Oaks, CA: Sage Publications, 1994.

[12] CDC’s Prevention Research Centers (PRC) program is an additional resource. The PRC program is a national network of 24 academic research centers committed to prevention research and the ability to translate that research into programs and policies. The centers work with state health departments and members of their communities to develop and evaluate state and local interventions that address the leading causes of death and disability in the nation. Additional information on the PRCs is available at www.cdc.gov/prc/index.htm.

[13] These cases are composites of multiple CDC and state and local efforts that have been simplified and modified to better illustrate teaching points. While inspired by real CDC and community programs, they are not intended to reflect the current


American Evaluation Association


Evaluation Education & Programs

The American Evaluation Association offers a robust library of educational tools and resources to help professional evaluators excel in their careers and advance their practice in all types of evaluation. Explore our offerings to find the education format that matches your learning style and needs. AEA’s live and on-demand professional development offerings can be found on the Digital Knowledge Hub.

Digital Knowledge Hub

In-depth eStudy courses offer a deep dive into top-of-mind evaluation themes and topics. Open to members and nonmembers alike, eStudies provide a diverse learning experience where collaboration is encouraged. Submit your interest in presenting here.


Evaluation Annual Conference

The Evaluation Annual Conference is our largest gathering of the year, bringing together 3,000+ evaluators, evaluation scholars, students, and evaluation users from around the world to assemble, share, and learn from the successes of the international discipline and practice of evaluation. This conference is held annually in the fall.

Coffee Breaks

Coffee Breaks are 20-minute presentations that provide insights into niche topics impacting evaluation practice and introduce new tools to evaluators. Coffee Breaks are offered exclusively to AEA members.


Summer Evaluation Institute

The annual Summer Evaluation Institute is our smaller in-person gathering held in Atlanta, Georgia each June. This event features up to 30 workshops on a variety of evaluation themes.


e-Learning Course

Online, self-paced, in-depth courses that can be taken anytime and anywhere, giving you the flexibility to complete them on your own schedule.

Potent Presentations

Potent Presentations helps evaluators improve their presentation skills, both at the annual conference and in individual evaluation practice. Potent Presenters think about three key components of a compelling presentation: Message, Design, and Delivery. Free to the evaluation community!

Town Halls

The AEA Board of Directors hosts regular Town Halls to engage with members and discuss a variety of strategic and visionary topics with the membership. Town Halls are offered exclusively to AEA members.

Graduate Education Diversity Internship

The Graduate Education Diversity Internship Program provides paid internship and training opportunities during the academic year. The GEDI program works to engage and support students from groups traditionally under-represented in the field of evaluation.

Minority Serving Institution (MSI) Fellowship

The Minority Serving Institution Fellowship seeks to increase the participation of evaluators and academics from underrepresented groups in the profession of evaluation and in the American Evaluation Association. The MSI Faculty Initiative identifies this group of potential and practicing evaluators by drawing from faculty at MSIs.

Social Justice Series

AEA is committed to improving evaluation theory, practice, and methods with education and resources that build professional skills in cultural competence, communication, facilitation, and conflict resolution. We offer a wide range of programming that helps practitioners learn about, identify, and address social justice and equity issues in evaluation.

Internship and Fellowship Opportunities

AEA provides paid internship and training opportunities for underrepresented groups throughout the academic year. These programs help aspiring evaluation professionals build a career path in evaluation through the guidance of peers and subject experts.

  • Graduate Education Diversity Internship
  • Minority Serving Institution Fellowship

Upcoming Events

October 9-14, 2023 | Indianapolis, IN

Evaluation 2023: The Power of Story




Program Evaluation in Education: When? How? to What Ends?


  • Health and Medicine — Medical Training and Workforce
  • Health and Medicine — Other Diseases

Suggested Citation

National Research Council. 1981. Program Evaluation in Education: When? How? to What Ends? Washington, DC: The National Academies Press. https://doi.org/10.17226/19657.
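
For readers who manage references with BibTeX, the suggested citation could be captured with an entry roughly like the one below. This is a minimal sketch: the entry key nrc1981program and the @book entry type are assumptions, while the bibliographic fields are taken directly from the citation above.

  % Suggested citation for the NRC report; entry key and entry type are illustrative.
  @book{nrc1981program,
    author    = {{National Research Council}},
    title     = {Program Evaluation in Education: When? How? to What Ends?},
    year      = {1981},
    publisher = {The National Academies Press},
    address   = {Washington, DC},
    doi       = {10.17226/19657},
    url       = {https://doi.org/10.17226/19657}
  }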

Publication Info

What is Chapter Skim?

The Chapter Skim search tool presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter. You may select key terms to highlight them within pages of each chapter.

Copyright Information

The National Academies Press (NAP) has partnered with Copyright Clearance Center's Marketplace service to offer you a variety of options for reusing NAP content. Through Marketplace, you may request permission to reprint NAP content in another publication, course pack, secure website, or other media. Marketplace allows you to instantly obtain permission, pay related fees, and print a license directly from the NAP website. The complete terms and conditions of your reuse license can be found in the license agreement that will be made available to you during the online order process. To request permission through Marketplace you are required to create an account by filling out a simple online form. The following list describes license reuses offered by the NAP through Marketplace:

  • Republish text, tables, figures, or images in print
  • Post on a secure Intranet/Extranet website
  • Use in a PowerPoint Presentation
  • Distribute via CD-ROM

Click here to obtain permission for the above reuses. If you have questions or comments concerning the Marketplace service, please contact:

Marketplace Support
International: +1.978.646.2600
US Toll Free: +1.855.239.3415
E-mail: [email protected]
marketplace.copyright.com

To request permission to distribute a PDF, please contact our Customer Service Department at [email protected] .

What is a prepublication?


An uncorrected copy, or prepublication, is an uncorrected proof of the book. We publish prepublications to facilitate timely access to the committee's findings.

What happens when I pre-order?

The final version of this book has not been published yet. You can pre-order a copy of the book and we will send it to you when it becomes available. We will not charge you for the book until it ships. Pricing for a pre-ordered book is estimated and subject to change. All backorders will be released at the final established price. As a courtesy, if the price increases by more than $3.00 we will notify you. If the price decreases, we will simply charge the lower price. Applicable discounts will be extended.

Downloading and Using eBooks from NAP

What is an eBook?

An eBook is one of two file formats that are intended to be used with e-reader devices and apps such as Amazon Kindle or Apple iBooks.

Why is an eBook better than a PDF?

A PDF is a digital representation of the print book, so while it can be loaded into most e-reader programs, it doesn't allow for resizable text or advanced, interactive functionality. The eBook is optimized for e-reader devices and apps, which means that it offers a much better digital reading experience than a PDF, including resizable text and interactive features (when available).

Where do I get eBook files?

eBook files are now available for a large number of reports on the NAP.edu website. If an eBook is available, you'll see the option to purchase it on the book page.

View more FAQs about eBooks


IMAGES

  1. The Program Assessment Cycle

  2. What is Evaluation in Education? Definition of Evaluation in Education

  3. Program Evaluation

  4. Developing and Evaluating Educational Programs

  5. 10 Amazing Course Evaluation Survey Templates

  6. FREE 25+ Sample Course Evaluation Forms in PDF

VIDEO

  1. Giancola Ch 5- Define, Part I: Understanding the Program

  2. Giancola Ch 2- History of Evaluation

  3. Giancola Ch 10- Implement, Part II: Analyzing the Data

  4. Giancola Ch 1- Evaluation Matters

  5. Program Evaluation Example

  6. Giancola Ch 12- Inform and Refine: Using Evaluation Results

COMMENTS

  1. PDF Program Evaluation Toolkit: Quick Start Guide

    Program Evaluation Toolkit: Quick Start Guide. Joshua Stewart, Jeanete Joyce, Mckenzie Haines, David Yanoski, Douglas Gagnon, Kyle Luke, Christopher Rhoads, and Carrie Germeroth. October 2021. Program evaluation is important for assessing the implementation and outcomes of local, state, and federal programs.

  2. PDF What is program evaluation?

    How does program evaluation answer questions about whether a program works, or how to improve it? Basically, program evaluations systematically collect and analyze data about program activities and outcomes. The purpose of this guide is to briefly describe the methods used in the systematic collection and use of data.

  3. Program Evaluation

    Our Program Evaluation Experts. The Johns Hopkins School of Education is home to a diverse group of scholars, researchers, analysts, and administrators with expertise in educational program evaluation. Our experts evaluate programs for all grade levels and content areas, as well as big topics in education like technology and education policy.

  4. Program Assessment

    Program Assessment. Program evaluation is the process of systematically collecting, analyzing, and using data to review the effectiveness and efficiency of programs. In educational contexts, program evaluations are used to: identify methods of improving the quality of higher education; provide feedback to students, faculty, and administrators ...

  5. Program Assessment

    In program evaluation, measurement methods are best categorized into direct and indirect measures. Both measures can provide a more holistic view of the impacts of a program. There are also four common types of data that are analyzed in educational research and evaluation: observations, artifacts, historical or institutional records, and self ...

  6. Program Evaluation: Getting Started and Standards

    What Is Known. In the mid-20th century, program evaluation evolved into its own field. Today, the purpose of program evaluation typically falls into one of two orientations in using data to (1) determine the overall value or worth of an education program (summative judgments of a program) or (2) plan program improvement (formative improvements to a program, project, or activity).

  7. Section 1. A Framework for Program Evaluation: A Gateway to Tools

    Learn how program evaluation makes it easier for everyone involved in community health and development work to evaluate their ... & Kok, G. (1992). The utilization of qualitative and quantitative data for health education program planning, implementation, and evaluation: a spiral approach. Health Education Quarterly. 1992; 19(1):101-15. ...

  8. National Center for Education Evaluation and Regional Assistance (NCEE

    The National Center for Education Evaluation and Regional Assistance (NCEE) conducts unbiased large-scale evaluations of education programs and practices supported by federal funds, such as Reading First and Title I of the Elementary and Secondary Education Act. ... The program included reading, speaking, and writing activities for students and ...

  9. Certificate in Education Program Evaluation

    The Certificate in Education Program Evaluation prepares you with an advanced understanding of program evaluation theory, methods, and applications for the 21st century. Through case studies and hands-on exercises, you'll develop the well-rounded skills and expertise needed to support and ...

  10. Evaluating Educational Programs

    1 An Emerging Profession. Evaluating educational programs is an emerging profession, and Educational Testing Service (ETS) has played an active role in its development. The term program evaluation only came into wide use in the mid-1960s, when efforts at systematically assessing programs multiplied.

  11. Program evaluation models and related theories: AMEE Guide No. 67

    Program evaluation defined. At the most fundamental level, evaluation involves making a value judgment about information that one has available (Cook 2010; Durning & Hemmer 2010). Thus educational program evaluation uses information to make a decision about the value or worth of an educational program (Cook 2010). More formally defined, the process of educational ...

  12. Program Evaluation and Planning

    A Practical Guide to Program Evaluation Planning: Theory and Case Examples provides a step-by-step process to guide evaluators in planning a comprehensive, yet feasible, program evaluation--from start to design--within any context. No book currently on the market delineates the required steps for preparing to conduct an evaluation.

  13. Program Evaluation Standards

    The third edition of the Program Evaluation Standards was published in 2010. The six-year development process relied on formal and informal needs assessments; reviews of existing scholarship; and the involvement of more than 400 stakeholders in national and international reviews, field trials, and national hearings. The third edition can be purchased from a variety of […]

  14. PDF Program Evaluation: An Introduction

    Program Evaluation Defined. "Evaluation: Systematic investigation of the value, importance, or significance of something or someone along defined dimensions" (Yarbrough, Shulha, Hopson, & Caruthers, 2011, p. 287). "Evaluation is the systematic process of delineating, obtaining, reporting, and applying descriptive and judgmental ...

  15. Program evaluation: An educator's portal into academic scholarship

    WHAT IS PROGRAM EVALUATION? In medical education, a "program" can refer to a large spectrum of activities and experiences—they can range from a new workplace‐based assessment program 6, 7 to a boot camp series 8 to a longitudinal faculty development course. 9, 10 It is an ever‐evolving field with new technologies, shifting paradigms, and often unclear scholarly formats.

  16. Program Evaluation Guide

    Program evaluation is one of ten essential public health services [8] and a critical organizational practice in public health. [9] Until recently, however, there has been little agreement among public health officials on the principles and procedures for conducting such studies. ... and provide a brief education on various immunization topics ...

  17. Evaluation Education & Programs

    Evaluation Education & Programs. The American Evaluation Association offers a robust library of educational tools and resources to help professional evaluators excel in their careers and advance their practice in all types of evaluation. Explore our offerings to find the education format that matches your learning style and needs.

  18. Program Evaluation in Education: When? How? to What Ends?

    Contents: Defining Evaluation, 35-60; Quality of Evaluation, 61-96; Using Evaluation Results, 97-132; Organizing and Managing Evaluation Activities, 133-178; Glossary, 179-182; References, 183-194; Appendixes; Appendix A: Federal Evaluation Activities in Education: An Overview, 195-216; Appendix B: Performers of Federally Funded Evaluation ...

  19. (PDF) EVALUATION MODEL OF EDUCATION PROGRAMS

    Abstract. 1. INTRODUCTION. Educational program evaluation is a series of activities carried out deliberately to determine the level of success of educational programs. Evaluating educational programs is an ...