
International Critical Thinking Essay Test


Appropriate for Higher Education and High School

Preview or License the Test Here

Access Validation Research Here

While teaching students about your subject or discipline requires teaching them to reason about and within it, students come to us with varying (and usually very limited) knowledge of how to reason about or within any subject area. The purpose of the International Critical Thinking Essay Test is to assess students' understanding of the fundamentals of critical thinking that should be used to reason through any subject, and to do so more accurately and insightfully than any computer-graded test can. The goal of the test is two-fold. The first goal is to provide a reasonable way to pre- and post-test students to determine the extent to which they have learned to think critically within a discipline or subject. The second goal is to provide a testing instrument that stimulates faculty to teach their discipline so as to foster critical thinking in their students.

Once faculty become committed to pre- and post-testing their students using the exam, it is natural and desirable for them to emphasize analysis and assessment of thinking in their routine instruction within the subjects they teach. The exam, therefore, is designed to have a significant effect on instruction. In other words, the test is designed to have high consequential validity: the consequence of using the test is significant, in that faculty tend to restructure their courses to put more emphasis on critical thinking within the disciplines (to help students prepare for the test). It also has the consequence that faculty think through important critical thinking principles and standards (which they otherwise take for granted). See our white paper: Consequential Validity: Using Assessment to Drive Instruction. The International Critical Thinking Essay Test differs from traditional critical thinking tests in that traditional tests tend to have low consequential validity; that is, the nature of the test items is such that faculty, not seeing the relevance of the test to the content they teach, ignore it.

The International Critical Thinking Essay Test is the perfect test to teach to. For one, the structure and standards for thought explicit in the test are relevant to thinking in all departments and divisions. The English Department can test its students using a literary prompt. The History Department can choose an excerpt from historical writing; Sociology, from sociological writing; and so on. In one case, a section from a textbook may be chosen; in another, an editorial; in a third, a professional essay. In short, the writing prompt can be chosen from any discipline or writing sample. What is more, since making the test reliable requires that faculty be intimately involved in choosing the writing prompt and in grading the tests, faculty are primed to follow up on the results. Results are seen to be relevant to assessing instruction within the departments involved.

The International Critical Thinking Essay Test is divided into two parts: 1) analysis of a writing prompt, and 2) assessment of the writing prompt. The analysis is worth 80 points; the assessment is worth 20. In the Analysis segment of the test, the student must accurately identify the elements of reasoning within a written piece (each response is worth 10 points). In the Assessment segment of the test, the student must construct a critical analysis and evaluation of the reasoning in the original piece. Part one can be used without using part two; a schematic summary of this point structure appears below, after the two grading questions.

Each student exam must be graded individually by a person competent to assess the critical thinking of the test taker and trained in the grading called for in this examination. In evaluating student exams, the grader is attempting to answer two questions:

  • Did the student clearly understand the key components in the thinking of the author, as exhibited in the writing sample? (Identifying Purpose, Question at Issue, Information, Conclusions, Assumptions, Concepts, Implications, Point of View.)
  • Was the student able to evaluate the reasoning in the original text, as appropriate, and present his/her assessment effectively? (Pointing out strengths and possible limitations and/or weaknesses of the reasoning in the writing sample.)
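The point structure described above is simple arithmetic: eight elements of reasoning scored at up to 10 points each (the 80-point Analysis segment) plus an Assessment segment worth up to 20 points. The following is a minimal sketch of that arithmetic; the element names, function, and validation rules are illustrative assumptions, not part of the official exam or grading materials.

```python
# Illustrative sketch of the exam's point structure (not official grading software).
# Assumes each of the eight elements of reasoning is scored 0-10 (Analysis, max 80)
# and the evaluation of the reasoning is scored 0-20 (Assessment, max 20).

ELEMENTS = [
    "purpose", "question_at_issue", "information", "conclusions",
    "assumptions", "concepts", "implications", "point_of_view",
]

def total_score(element_scores: dict[str, int], assessment_score: int) -> int:
    """Combine the Analysis (80-point) and Assessment (20-point) segments."""
    if set(element_scores) != set(ELEMENTS):
        raise ValueError("Expected a 0-10 score for each of the eight elements.")
    if any(not 0 <= s <= 10 for s in element_scores.values()):
        raise ValueError("Each element score must be between 0 and 10.")
    if not 0 <= assessment_score <= 20:
        raise ValueError("The assessment score must be between 0 and 20.")
    return sum(element_scores.values()) + assessment_score

# Example: a strong analysis paired with a weaker evaluation.
example = {name: 9 for name in ELEMENTS}
print(total_score(example, assessment_score=14))  # 72 + 14 = 86 out of 100
```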

The International Critical Thinking Essay Test Is Available to Educational Institutions Under Three Different Options

  • With a Training Session for Test Graders. This test is designed for use by faculty who are fostering understanding of the analysis and assessment of thought. Accurately grading the test and achieving inter-rater reliability requires considerable skill at identifying the elements of an author's reasoning (focused on a given text), as well as at assessing these elements using intellectual standards. The best way we have found to facilitate this process is by offering critical thinking workshops for faculty and others who will be grading the test. The cost for this will vary according to the area of the country. Contact us if you are interested in this option. Under this plan, in which we train the test graders, the test is offered at $10 each time a student takes it.
  • Direct License. You may elect to be licensed directly to use the exam. The cost for this varies depending on the number of students. In this case, you must take responsibility for training the graders and for the appropriate use of the exam. We do not recommend this option; see our reasoning under the first option above.
  • Pilot Site. It is possible to become a pilot site for the exam. If interested, please submit a plan describing how you will field-test the exam, specifying your purpose and how you will structure your pilot project. To be accepted as a pilot site, you must provide evidence that test graders will be adequately trained and that the conditions under which the exam will be piloted will be carefully controlled. Once you have used the test, you will provide a written report explaining the results of your project. If your plan is accepted, the exam will be provided to your institution at no charge during the course of the pilot program.

The Nature of the Exam

General directions to the student.

After you carefully read the writing sample (taking whatever notes you want), you will have two tasks, each of them important. First, you will complete a template (see Form A) demonstrating your ability to recognize the key components in the thinking of an author: for example, the author's purpose, or the nature of the question, problem, or issue that is at the heart of the original editorial, article, or essay. You should not write your answers on Form A. Use your own paper, or the blank pages provided, in order to have room to elaborate.

Second, you will summarize your assessment of the strengths and weaknesses of the reasoning of the original editorial, article, or essay (with special attention to the components you commented on). In doing this, you should present your analysis and assessment in the form of a persuasive explanation of your thinking about the original, imagining your audience as educated, reasonable persons. You are therefore appealing to the reason of the audience, not their emotions. You should refer to intellectual standards whenever you can (Clarity, Accuracy, Precision, Relevance, Depth, Breadth, Logicalness, Significance). For example, you might feel that the question or problem in the text was never made sufficiently clear or that the information in support of a key conclusion was irrelevant to the question. You would then state how the issue or question should have been expressed. If you judge that the information in the original editorial, article, or essay was in part irrelevant, you would state what sort of information was relevant and comment on how that information could best be obtained. You should refer to the Criteria for Evaluating Reasoning (see Form B) in assessing the author's thinking as displayed in the editorial, article, or essay.

You are provided with the main criteria that the grader will be using in assessing your answers. The grader will be asking himself/herself two questions while reading your answer:

  • Did the student clearly understand the key components in the thinking of the original editorial, article, or essay?
  • Was the student able to effectively evaluate the reasoning in the original editorial, article, or essay? Did the student present a reasonable case for his/her interpretation of the writing sample?

In an excellent evaluation, the evaluator takes into account the nature and purpose of the original writing sample. For example, it would be inappropriate to apply the same criteria to an editorial (which is severely limited in space) that one would apply to a research monograph or to the report of a scientific experiment submitted to a scientific journal. In some writing, technical information is essential; in other writing, it is enough to cite common experience in supporting one's conclusions. In every case, we expect the student to sympathetically enter into the viewpoint of the author and to engage in a fair-minded assessment based on an insightful understanding of the author's reasoning. The extra weight (80 points) given to an accurate analysis as a necessary first step to evaluation (20 points) reflects our emphasis on the fact that fair-minded critical thinkers always make sure that they understand something BEFORE they criticize it. Good criticism always makes a contribution to the object of its criticism. It brings both strengths and weaknesses out into the open so that we may build on the first and correct the second.

The Validity and Reliability of the International Critical Thinking Essay Test

The validity of the Analysis portion of this test has been established through peer-reviewed research published in Inquiry: Critical Thinking Across the Disciplines. Read more here.

The main purpose of the test is internal use, and the goal is to help faculty put more emphasis on thinking critically within the disciplines taught. Because faculty use various prompts on different testing occasions and choose those prompts from different disciplines, it is difficult to compare student performances (using different prompts) by point scores alone. The goal is for the grading faculty to report back to the teaching faculty with appropriate commentary that enables faculty to form reasonable conclusions about the degree to which students are developing critical thinking skills.

The exam has high "face validity," for it directly tests the students' ability 1) to accurately identify the most fundamental intellectual structures in thinking and 2) to do so in a piece of writing which the faculty themselves choose. It is clear and uncontroversial that critical thinking requires the thinker to analyze and evaluate reasoning. The test requires the student to do just that and, once again, to do so with respect to prompts which are representative of the content that is covered by instruction. One gains insight into the validity of the exam to the extent that one recognizes the significance of the abilities directly tested in the exam: the student's ability to accurately identify the purpose of a piece of writing, the questions it raises, the information it embodies, the inferences and conclusions arrived at, the key concepts, the underlying assumptions, the implications of the reasoning, and the point of view of the reasoner. One gains further insight into the validity of the exam to the extent that one recognizes the significance of the intellectual standards which the student must use to assess the reasoning in the prompt: the relative clarity, accuracy, precision, relevance, depth, breadth, logicalness, significance, and fairness of the reasoning. Beyond that, one gains insight into the usefulness of the test by grasping its potential to help faculty develop comparable descriptions of their programs and their course grading standards that highlight the critical thinking embodied in the content.

Of course, success depends directly on the competence of the graders and the manner in which they have established consistency in their grading. Here are the instructions faculty are given for this purpose:

How to Understand the Examination

First, review some of the basic principles and purposes behind critical thinking so that you go into the grading of the examination with the clearest sense of what you are going to assess. You should review the Elements of Thought and the Universal Intellectual Standards. Then you should carefully review the editorial, article, or essay the students are going to analyze and comment on. Each faculty evaluator should read and take the test himself/herself. The faculty evaluators should reach consensus on the range of plausible interpretations of that piece. Once a consensus is achieved, one or two student case analyses should be individually assessed by all faculty and the scoring compared. Faculty should use Form A and Form B as the criteria for scoring. All faculty should be within a 10-point range.

How to Score Exams

1) First, carefully read and analyze the editorial yourself, making sure that you are clear as to its structure: the writer's purpose, the central question posed, the information presented and reasons given in support of the author's position, the main conclusions and concepts, the fundamental assumptions and implications, and, of course, the point of view within the framework of which all of the reasoning proceeds.
2) Do a critical evaluation of the strengths and weaknesses (or limitations) of the original writing prompt. Make sure the faculty graders agree on these strengths and weaknesses.
3) Read a few of the essays to be scored.
4) Follow the grading procedure detailed in the test.
5) The margin of error for graders should be plus or minus ten points. Practice grading with two other graders until the scoring of the three of you falls consistently within this range.
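The calibration target stated in step 5 (all graders within plus or minus ten points of one another) can be checked mechanically once each grader has produced a total score. The sketch below is a minimal illustration under the assumption that each grader reports a single 0-100 total for the same exam; the function name and the simple spread check are ours, not part of the published grading procedure.

```python
# Minimal sketch of the +/- 10-point agreement check described in step 5.
# Assumes each grader assigns a single total score (0-100) to the same exam.

def within_agreement(scores: list[int], max_spread: int = 10) -> bool:
    """Return True if all graders' scores fall within the allowed spread."""
    return max(scores) - min(scores) <= max_spread

# Example: three graders scoring the same practice exam.
practice_scores = [72, 78, 81]
print(within_agreement(practice_scores))  # True: a spread of 9 points

# Graders would repeat the practice-and-compare cycle until checks like this
# pass consistently across several exams.
```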

Practical Guide for Critical Thinking Instruction

Critical thinking is, among other things, "thinking that analyzes itself, evaluates itself, and improves itself as a result." In science classes, students should learn to think scientifically; in math classes, to think mathematically; in history classes, to think historically; and so on. Critical thinking is essential to this internalization. We internalize the logic of scientific thinking when we can analyze, evaluate, and improve instances of it. We internalize the logic of mathematical thinking when we can analyze, evaluate, and improve instances of it. We internalize the logic of historical thinking when we can analyze, evaluate, and improve instances of it.

To teach a subject in a critical manner requires that students take ownership of the basic intellectual structures of the discipline (the elements of thought focused upon in Part I of the exam). It also requires that students internalize intellectual standards which they can use in assessing thinking for its strengths and weaknesses (the standards of thought which are focused upon in Part II of the International Critical Thinking Essay Test).

The Elements of Thought (The Essence of Part I of the Exam)

To understand content as a mode of thinking, we need to recognize that all content has a logic which is defined by the same eight dimensions that define the thinking which produced it, and continues to produce it:

  • All content/thinking has been generated by organizing goals and purposes (that enable professionals to share in the pursuit of common ends and projects).
  • All content/thinking is defined by the problems it defines and solves.
  • All content/thinking presupposes the gathering and use of information in professional performance and problem solving.
  • All content/thinking requires the making of inferences from relevant data or information to interpretive conclusions (thereby rendering the data useful to practitioners for guiding judgments).
  • All content/thinking is structured by concepts (theoretical constructs) that organize, shape, and "direct" it.
  • All content/thinking rests on assumptions or presuppositions from which it logically proceeds (providing "boundaries" for the field).
  • All content/thinking generates implications and consequences that enable professionals to make predictions and to test theories, lines of reasoning, and hypotheses.
  • All content/thinking defines a frame of reference or point of view (which provides practitioners with a logical map of use in considering the professional "moves" they will make).

The Exam Highlights the Interrelationship Between Content and Thinking

Each of the above statements, as you may have noted, reads equally well with either "content" or "thinking" as the subject. This is no accident of language. There is a perfect logical symmetry captured in each case. The symmetry is a reflection of the fact that all of what we call "content" is nothing more nor less than an organized product of a specific mode of disciplined thinking, developed by a community of thinkers.

When we master the logic of the thinking, we master the logic of the content. When we master the logic of the content, we master the logic of the thinking. For example, when we learn to think like a historian, we, at one and the same time, master the logic of the discipline called "History." When we master the logic of "History," we master, ipso facto, the logic of historical thought. Period. There is nothing else that remains. Once we begin to grasp content as a mode of thinking, we can begin to isolate the connection between what it is that good thinkers must do to think well within that content and what it is that students must do to perform competently in the academic field defined by it. For example, it is possible to construct a generic description of academic goals that can be contextualized for virtually any field of study. Consider the following generic description. As you read through it, mentally place your discipline in the blank spaces. It is followed by a couple of sample contextualizations to exemplify what we mean.

The International Critical Thinking Essay Exam Suggests Model Descriptions of Goals For Academic Programs

  Students successfully completing a major in ... will acquire a range of ... thinking skills and abilities which they use in the acquisition of knowledge. Their work at the end of the program will be clear, precise, and well-reasoned. They will demonstrate in their thinking, command of the key ... terms and distinctions, the ability to identify and solve fundamental ... problems. Their work will demonstrate a mind in charge of its own ... ideas, assumptions, inferences, and intellectual processes. They will demonstrate the ability to analyze ... questions and issues clearly and precisely, formulate ... information accurately, distinguish the relevant from irrelevant, recognize key questionable ... assumptions, use key ... concepts effectively, use ... language in keeping with established professional usage, identify relevant competing ... points of view, and reason carefully from clearly stated ... premises, as well as show sensitivity to important ... implications and consequences. They will demonstrate excellent ... reasoning and problem-solving.

Sample Contextualizations

History Department

Students successfully completing a major in History will demonstrate a range of historical thinking skills and abilities which they use in the acquisition of knowledge. Their work at the end of the program will be clear, precise, and well-reasoned. They will demonstrate in their thinking, command of the key historical terms and distinctions, the ability to identify and solve fundamental historical problems. Their work will demonstrate a mind in charge of its own historical ideas, assumptions, inferences, and intellectual processes. They will demonstrate the ability to analyze historical questions and issues clearly and precisely, formulate historical information accurately, distinguish the relevant from irrelevant, recognize key questionable historical assumptions, use key historical concepts effectively, use historical language in keeping with established professional usage, identify relevant competing historical points of view, and reason carefully from clearly stated historical premises, as well as show sensitivity to important historical implications and consequences. They will demonstrate excellent historical reasoning and problem-solving.

Anthropology Department

Students successfully completing a major in Anthropology will demonstrate a range of anthropological thinking skills and abilities which they use in the acquisition of anthropological knowledge. Their work at the end of the program will be clear, precise, and well-reasoned. They will demonstrate in their thinking, command of the key anthropological terms and distinctions, the ability to identify and solve fundamental anthropological problems. Their work will demonstrate a mind in charge of its own anthropological ideas, assumptions, inferences, and intellectual processes. They will demonstrate the ability to analyze anthropological questions and issues clearly and precisely, formulate anthropological information accurately, distinguish the relevant from irrelevant, recognize key questionable anthropological assumptions, use key anthropological concepts effectively, use anthropological language in keeping with established professional usage, identify relevant competing anthropological points of view, and reason carefully from clearly stated anthropological premises, as well as show sensitivity to important anthropological implications and consequences. They will demonstrate excellent anthropological reasoning and problem-solving.

Biology Department

Students successfully completing a major in Biology will demonstrate a range of biological thinking skills and abilities which they use in the acquisition of biological knowledge. Their work at the end of the program will be clear, precise, and well-reasoned. They will demonstrate in their thinking, command of the key biological terms and distinctions, the ability to identify and solve fundamental biological problems. Their work will demonstrate a mind in charge of its own biological ideas, assumptions, inferences, and intellectual processes. They will demonstrate the ability to analyze biological questions and issues clearly and precisely, formulate biological information accurately, distinguish the relevant from irrelevant, recognize key questionable biological assumptions, use key biological concepts effectively, use biological language in keeping with established professional usage, identify relevant competing biological points of view, and reason carefully from clearly stated biological premises, as well as show sensitivity to important biological implications and consequences. They will demonstrate excellent biological reasoning and problem-solving.

Philosophy Department

Students successfully completing a major in Philosophy will demonstrate a range of philosophical thinking skills and abilities. Their work at the end of the program will be clear, precise, and well-reasoned. They will demonstrate in their thinking, command of the key philosophical terms and distinctions, the ability to identify and solve fundamental philosophical problems. Their work will demonstrate a mind in charge of its own philosophical ideas, assumptions, inferences, and intellectual processes. They will demonstrate the ability to analyze philosophical questions and issues clearly and precisely, formulate philosophical information accurately, distinguish the relevant from irrelevant, recognize key questionable philosophical assumptions, use key philosophical concepts effectively, use philosophical language in keeping with established professional usage, identify relevant competing philosophical points of view, and reason carefully from clearly stated philosophical premises, as well as show sensitivity to important philosophical implications and consequences. They will demonstrate excellent philosophical reasoning and problem-solving.

Marketing Department

Students successfully completing a major in Marketing will demonstrate a range of marketing thinking skills and abilities which they use in the acquisition of knowledge. Their work at the end of the program will be clear, precise, and well-reasoned. They will demonstrate in their thinking, command of the key marketing terms and distinctions, the ability to identify and solve fundamental marketing problems. Their work will demonstrate a mind in charge of its own marketing ideas, assumptions, inferences, and intellectual processes. They will demonstrate the ability to analyze marketing questions and issues clearly and precisely, formulate marketing information accurately, distinguish the relevant from irrelevant, recognize key questionable marketing assumptions, use key marketing concepts effectively, use marketing language in keeping with established professional usage, identify relevant competing marketing points of view, and reason carefully from clearly stated marketing premises, as well as show sensitivity to important marketing implications and consequences. They will demonstrate excellent marketing reasoning and problem-solving.

Mathematics Department

Students successfully completing a major in Mathematics will demonstrate a range of mathematical thinking skills and abilities. Their work at the end of the program will be clear, precise, and well-reasoned. They will demonstrate in their thinking, command of the key mathematical terms and distinctions, the ability to identify and solve fundamental mathematical problems. Their work will demonstrate a mind in charge of its own mathematical ideas, assumptions, inferences, and intellectual processes. They will demonstrate the ability to analyze mathematical questions and issues clearly and precisely, formulate mathematical information accurately, distinguish the relevant from irrelevant, recognize key questionable mathematical assumptions, use key mathematical concepts effectively, use mathematical language in keeping with established professional usage, identify relevant competing mathematical points of view, and reason carefully from clearly stated mathematical premises, as well as show sensitivity to important mathematical implications and consequences. They will demonstrate excellent mathematical reasoning and problem-solving.

Nursing Department

Students successfully completing a major in Nursing will demonstrate a range of nursing thinking skills and abilities which they use in the acquisition of knowledge in nursing. Their work at the end of the program will be clear, precise, and well-reasoned. They will demonstrate in their thinking, command of the key nursing terms and distinctions, the ability to identify and solve fundamental nursing problems. Their work will demonstrate a mind in charge of its own nursing ideas, assumptions, inferences, and intellectual processes. They will demonstrate the ability to analyze nursing questions and issues clearly and precisely, formulate nursing information accurately, distinguish the relevant from irrelevant, recognize key questionable nursing assumptions, use key nursing concepts effectively, use nursing language in keeping with established professional usage, identify relevant competing nursing points of view, and reason carefully from clearly stated nursing premises, as well as show sensitivity to important nursing implications and consequences. They will demonstrate excellent nursing reasoning and problem-solving.

Management Department

Students successfully completing a major in Management will demonstrate a range of management thinking skills and abilities. Their work at the end of the program will be clear, precise, and well-reasoned. They will demonstrate in their thinking, command of the key management terms and distinctions, the ability to identify and solve fundamental management problems. Their work will demonstrate a mind in charge of its own ideas, assumptions, inferences, and intellectual processes in management. They will demonstrate the ability to analyze management questions and issues clearly and precisely, formulate information accurately, distinguish the relevant from irrelevant, recognize key questionable management assumptions, use key management concepts effectively, use management language in keeping with established professional usage, identify relevant competing management points of view, and reason carefully from clearly stated premises, as well as show sensitivity to important management implications and consequences. They will demonstrate excellent management reasoning and problem-solving.

Music Department

Students successfully completing a major in Music will demonstrate a range of musical thinking skills and abilities. Their work at the end of the program will be clear, precise, well-reasoned, and well-performed. They will demonstrate in their musical thinking and performance, command of the key musical terms and distinctions, the ability to identify and solve fundamental musical problems. Their work will demonstrate a mind in charge of its own musical ideas, assumptions, inferences, and intellectual processes, as well as musical performance. They will demonstrate the ability to analyze musical questions and issues clearly and precisely, formulate musical information accurately, distinguish the relevant from irrelevant, recognize key questionable musical assumptions, use key musical concepts effectively, use musical language in keeping with established professional usage, identify relevant competing musical points of view, and reason carefully from clearly stated musical premises, as well as show sensitivity to important musical implications and consequences. They will demonstrate excellent musical reasoning, problem-solving, and performance.

Human Ecology Department

Students successfully completing a major in Human Ecology will demonstrate a range of ecological thinking skills and abilities which they use in the acquisition of ecological knowledge. Their work at the end of the program will be clear, precise, and well-reasoned. They will demonstrate in their thinking, command of the key ecological terms and distinctions, the ability to identify and solve fundamental ecological problems. Their work will demonstrate a mind in charge of its own ecological ideas, assumptions, inferences, and intellectual processes. They will demonstrate the ability to analyze ecological questions and issues clearly and precisely, formulate ecological information accurately, distinguish the relevant from irrelevant, recognize key questionable ecological assumptions, use key ecological concepts effectively, use ecological language in keeping with established professional usage, identify relevant competing ecological points of view, and reason carefully from clearly stated ecological premises, as well as show sensitivity to important ecological implications and consequences. They will demonstrate excellent ecological reasoning and problem-solving.

Physical Education

Students successfully completing a major in Physical Education will demonstrate a range of physically-based thinking skills and abilities. Their work at the end of the program will be clear, precise, well-reasoned, and well-performed. They will demonstrate in their thinking and performance, command of the key physical and sport terms and distinctions, the ability to identify and solve fundamental problems inherent in physical education and performance. Their work will demonstrate a mind in charge of its own ideas, assumptions, inferences, and intellectual processes as they are integral to physical performance. They will demonstrate the ability to analyze questions and issues clearly and precisely, formulate information accurately, distinguish the relevant from irrelevant, recognize key questionable assumptions, use key concepts effectively, use physical education language in keeping with established professional usage, identify relevant competing points of view in physical education, and reason carefully from clearly stated premises, as well as show sensitivity to important implications and consequences. They will demonstrate excellent reasoning, problem-solving, and performance in a variety of domains of physical education.

Adopting The Exam Can Lead to Generic Academic Performance Standards

The International Critical Thinking Essay Examination highlights basic structures in thought and basic intellectual standards. Those structures and standards can be combined to create generic academic performance standards. One possible effect of the adoption of the exam is greater alignment between critical thinking and the criteria for grades in courses. The text below outlines potential standards for the grades of A, B, C, D, and F. These specifications of performance levels are suggestive of common-denominator academic values (tested by the exam). They must, of course, be contextualized at two levels: at the department level (to capture domain-specific variations) and at the course level (to capture course-specific differences).

The Grade of A

The grade of A implies excellence in thinking and performance within the domain of a subject and course, along with the development of a range of knowledge acquired through the exercise of thinking skills and abilities. A-level work is, on the whole, not only clear, precise, and well-reasoned, but insightful as well. Basic terms and distinctions are learned at a level which implies insight into basic concepts and principles. The A-level student has internalized the basic intellectual standards appropriate to the assessment of his/her own work in a subject and demonstrates insight into self-evaluation. The A-level student often raises important questions and issues, analyzes key questions and problems clearly and precisely, recognizes key questionable assumptions, clarifies key concepts effectively, uses language in keeping with educated usage, frequently identifies relevant competing points of view, and demonstrates a commitment to reason carefully from clearly stated premises in the subject, as well as marked sensitivity to important implications and consequences. A-level work displays excellent reasoning and problem-solving within a field and is consistently at a high level of intellectual excellence.

The Grade of B

The grade of B implies sound thinking and performance within the domain of a subject and course, along with the development of a range of knowledge acquired through the exercise of thinking skills and abilities. B-level work is, on the whole, clear, precise, and well-reasoned, but does not have depth of insight. Basic terms and distinctions are learned at a level which implies comprehension of basic concepts and principles. The B-level student has internalized some of the basic intellectual standards appropriate to the assessment of his/her own work in a subject and demonstrates competence in self-evaluation. The B-level student often raises questions and issues, analyzes questions and problems clearly and precisely, recognizes some questionable assumptions, clarifies key concepts competently, typically uses language in keeping with educated usage, sometimes identifies relevant competing points of view, and demonstrates the beginnings of a commitment to reason carefully from clearly stated premises in a subject, as well as some sensitivity to important implications and consequences. B-level work displays sound reasoning and problem-solving within a field and is consistently at a competent level of intellectual performance.

The Grade of C

The grade of C implies mixed thinking and performance within the domain of a subject and course, along with some development of a range of knowledge acquired through the exercise of thinking skills and abilities. C-level work is inconsistently clear, precise, and well-reasoned; moreover, it does not display depth of insight or even consistent competence. Basic terms and distinctions are learned at a level which implies the beginnings of, but inconsistent comprehension of, basic concepts and principles. The C-level student has internalized a few of the basic intellectual standards appropriate to the assessment of his/her own work in a subject but demonstrates inconsistency in self-evaluation. The C-level student sometimes raises questions and issues, sometimes analyzes questions and problems clearly and precisely, recognizes some questionable assumptions, clarifies some concepts competently, inconsistently uses language in keeping with educated usage, sometimes identifies relevant competing points of view, but does not demonstrate a clear commitment to reason carefully from clearly stated premises in a subject, nor consistent sensitivity to important implications and consequences. C-level work displays inconsistent reasoning and problem-solving within a field and is, at best, at a competent level of intellectual performance.

The Grade of D

The grade of D implies poor thinking and performance within the domain of a subject and course. On the whole, the student tries to get through the course by means of rote recall, attempting to acquire knowledge by memorization rather than through comprehension and understanding. The student is not developing the critical thinking skills and understandings requisite to understanding course content. D-level work represents thinking that is typically unclear, imprecise, and poorly reasoned. The student is achieving competence only on the lowest order of performance. Basic terms and distinctions are often incorrectly used and reflect a superficial or mistaken comprehension of basic concepts and principles. The D-level student has not internalized the basic intellectual standards appropriate to the assessment of his/her own work in a subject and does poorly in self-evaluation. The D-level student rarely raises questions and issues, superficially analyzes questions and problems, does not recognize his/her assumptions, only partially clarifies concepts, rarely uses language in keeping with educated usage, rarely identifies relevant competing points of view, and shows no understanding of the importance of a commitment to reason carefully from clearly stated premises in a subject. The D-level student is insensitive to important implications and consequences. D-level work displays poor reasoning and problem-solving within a field and is, at best, at a low level of intellectual performance.

The Grade of F

The F-level student tries to get through the course by means of rote recall, attempting to acquire knowledge by memorization rather than through comprehension and understanding. The student is not developing the critical thinking skills and understandings requisite to understanding course content. F-level work represents thinking that is regularly unclear, imprecise, and poorly reasoned. The student is not achieving competence in his/her academic work. Basic terms and distinctions are regularly incorrectly used and reflect a mistaken comprehension of basic concepts and principles. The F-level student has not internalized the basic intellectual standards appropriate to the assessment of his/her own work in a subject and regularly misevaluates his/her own work.
The F-level student does not raise questions or issues, does not analyze questions and problems, does not recognize his/her assumptions, does not clarify concepts, does not use language in keeping with educated usage, confuses his/her point of view with the TRUTH, and shows no understanding of the importance of a commitment to reason carefully from clearly stated premises in a subject. The F-level student is oblivious of important implications and consequences. F-level work displays incompetent reasoning and problem-solving within a field and consistently poor intellectual performance.

The International Critical Thinking Essay Test and Education

Education is a high word. It is not socialization. It is not training. It is not indoctrination. It is the internalization of the life of reason within a domain of purposes and problems. It is the cultivation of a variety of modes of thought. It is the development of the power of knowledge. We are educated only when we are able to think within multiple fields and have the ability to learn to think in others. It would be odd to say that a person was well educated but not able to figure out the purposes, the questions, the information, the key concepts, the point of view, and so forth of their own thinking and that of others. In a like manner, it would be odd to say of persons that they reasoned well, except for their tendency to be unclear, inaccurate, imprecise, irrelevant, superficial, narrow-minded, illogical, trivial, and unfair. The International Critical Thinking Essay Exam focuses on what is the substantive core of education.


The Ennis-Weir Critical Thinking Essay Test

The Ennis-Weir Critical Thinking Essay Test is a general test of critical thinking ability in the context of argumentation. In this test, a complex argument is presented to the test taker, who is asked to formulate another complex argument in response to the first.

This can be used both as a test of writing mastery and as a teaching device for critical thinking. To access the resource, visit https://www.academia.edu/1847582/The_Ennis_Weir_Critical_Thinking_Essay_Test_An_Instrument_for_Teaching_and_Testing.


The test uses open-ended questions and is intended for high school and college students.

Ennis, R. H., & Weir, E. (1989). The Ennis-Weir critical thinking essay test: Test, manual, criteria, scoring sheet: An instrument for teaching and testing. Cheltenham, Vic: Hawker Brownlow.

Hollis, H., Rachitskiy, R., van der Leer, L., & Elder, L. (in press). Validity and reliability testing of the International Critical Thinking Essay Test form A (ICTET-A). Psychological Reports.

For more guidance on measuring student learning and best practices in adapting measurement tools to your contexts, check out the Portal page on Monitoring and Evaluation. You can also contact Alvin Vista (Knowledge Lead, Student Outcomes) and Robbie Dean (Director of Research) for specific questions.


Original Research Article

Performance Assessment of Critical Thinking: Conceptualization, Design, and Implementation


  • 1 Lynch School of Education and Human Development, Boston College, Chestnut Hill, MA, United States
  • 2 Graduate School of Education, Stanford University, Stanford, CA, United States
  • 3 Department of Business and Economics Education, Johannes Gutenberg University, Mainz, Germany

Enhancing students’ critical thinking (CT) skills is an essential goal of higher education. This article presents a systematic approach to conceptualizing and measuring CT. CT generally comprises the following mental processes: identifying, evaluating, and analyzing a problem; interpreting information; synthesizing evidence; and reporting a conclusion. We further posit that CT also involves dealing with dilemmas involving ambiguity or conflicts among principles and contradictory information. We argue that performance assessment provides the most realistic—and most credible—approach to measuring CT. From this conceptualization and construct definition, we describe one possible framework for building performance assessments of CT with attention to extended performance tasks within the assessment system. The framework is a product of an ongoing, collaborative effort, the International Performance Assessment of Learning (iPAL). The framework comprises four main aspects: (1) The storyline describes a carefully curated version of a complex, real-world situation. (2) The challenge frames the task to be accomplished. (3) A portfolio of documents in a range of formats is drawn from multiple sources chosen to have specific characteristics. (4) The scoring rubric comprises a set of scales each linked to a facet of the construct. We discuss a number of use cases, as well as the challenges that arise with the use and valid interpretation of performance assessments. The final section presents elements of the iPAL research program that involve various refinements and extensions of the assessment framework, a number of empirical studies, along with linkages to current work in online reading and information processing.

Introduction

In their mission statements, most colleges declare that a principal goal is to develop students’ higher-order cognitive skills such as critical thinking (CT) and reasoning (e.g., Shavelson, 2010 ; Hyytinen et al., 2019 ). The importance of CT is echoed by business leaders ( Association of American Colleges and Universities [AACU], 2018 ), as well as by college faculty (for curricular analyses in Germany, see e.g., Zlatkin-Troitschanskaia et al., 2018 ). Indeed, in the 2019 administration of the Faculty Survey of Student Engagement (FSSE), 93% of faculty reported that they “very much” or “quite a bit” structure their courses to support student development with respect to thinking critically and analytically. In a listing of 21st century skills, CT was the most highly ranked among FSSE respondents ( Indiana University, 2019 ). Nevertheless, there is considerable evidence that many college students do not develop these skills to a satisfactory standard ( Arum and Roksa, 2011 ; Shavelson et al., 2019 ; Zlatkin-Troitschanskaia et al., 2019 ). This state of affairs represents a serious challenge to higher education – and to society at large.

In view of the importance of CT, as well as evidence of substantial variation in its development during college, its proper measurement is essential to tracking progress in skill development and to providing useful feedback to both teachers and learners. Feedback can help focus students’ attention on key skill areas in need of improvement, and provide insight to teachers on choices of pedagogical strategies and time allocation. Moreover, comparative studies at the program and institutional level can inform higher education leaders and policy makers.

The conceptualization and definition of CT presented here is closely related to models of information processing and online reasoning, the skills that are the focus of this special issue. These two skills are especially germane to the learning environments that college students experience today when much of their academic work is done online. Ideally, students should be capable of more than naïve Internet search, followed by copy-and-paste (e.g., McGrew et al., 2017 ); rather, for example, they should be able to critically evaluate both sources of evidence and the quality of the evidence itself in light of a given purpose ( Leu et al., 2020 ).

In this paper, we present a systematic approach to conceptualizing CT. From that conceptualization and construct definition, we present one possible framework for building performance assessments of CT with particular attention to extended performance tasks within the test environment. The penultimate section discusses some of the challenges that arise with the use and valid interpretation of performance assessment scores. We conclude the paper with a section on future perspectives in an emerging field of research – the iPAL program.

Conceptual Foundations, Definition and Measurement of Critical Thinking

In this section, we briefly review the concept of CT and its definition. In accordance with the principles of evidence-centered design (ECD; Mislevy et al., 2003 ), the conceptualization drives the measurement of the construct; that is, implementation of ECD directly links aspects of the assessment framework to specific facets of the construct. We then argue that performance assessments designed in accordance with such an assessment framework provide the most realistic—and most credible—approach to measuring CT. The section concludes with a sketch of an approach to CT measurement grounded in performance assessment .

Concept and Definition of Critical Thinking

Taxonomies of 21st century skills ( Pellegrino and Hilton, 2012 ) abound, and it is neither surprising that CT appears in most taxonomies of learning, nor that there are many different approaches to defining and operationalizing the construct of CT. There is, however, general agreement that CT is a multifaceted construct ( Liu et al., 2014 ). Liu et al. (2014) identified five key facets of CT: (i) evaluating evidence and the use of evidence; (ii) analyzing arguments; (iii) understanding implications and consequences; (iv) developing sound arguments; and (v) understanding causation and explanation.

There is empirical support for these facets from college faculty. A 2016–2017 survey conducted by the Higher Education Research Institute (HERI) at the University of California, Los Angeles found that a substantial majority of faculty respondents “frequently” encouraged students to: (i) evaluate the quality or reliability of the information they receive; (ii) recognize biases that affect their thinking; (iii) analyze multiple sources of information before coming to a conclusion; and (iv) support their opinions with a logical argument ( Stolzenberg et al., 2019 ).

There is general agreement that CT involves the following mental processes: identifying, evaluating, and analyzing a problem; interpreting information; synthesizing evidence; and reporting a conclusion (e.g., Erwin and Sebrell, 2003 ; Kosslyn and Nelson, 2017 ; Shavelson et al., 2018 ). We further suggest that CT includes dealing with dilemmas of ambiguity or conflict among principles and contradictory information ( Oser and Biedermann, 2020 ).

Importantly, Oser and Biedermann (2020) posit that CT can be manifested at three levels. The first level, Critical Analysis, is the most complex of the three levels. Critical Analysis requires both knowledge in a specific discipline (conceptual) and procedural analytical (deduction, inclusion, etc.) knowledge. The second level is Critical Reflection, which involves more generic skills “… necessary for every responsible member of a society” (p. 90). It is “a basic attitude that must be taken into consideration if (new) information is questioned to be true or false, reliable or not reliable, moral or immoral etc.” (p. 90). To engage in Critical Reflection, one needs not only to apply analytic reasoning, but also to adopt a reflective stance toward the political, social, and other consequences of choosing a course of action. It also involves analyzing the potential motives of various actors involved in the dilemma of interest. The third level, Critical Alertness, involves questioning one’s own or others’ thinking from a skeptical point of view.

Wheeler and Haertel (1993) categorized higher-order skills, such as CT, into two types: (i) when solving problems and making decisions in professional and everyday life, for instance, related to civic affairs and the environment; and (ii) in situations where various mental processes (e.g., comparing, evaluating, and justifying) are developed through formal instruction, usually in a discipline. Hence, in both settings, individuals must confront situations that typically involve a problematic event, contradictory information, and possibly conflicting principles. Indeed, there is an ongoing debate concerning whether CT should be evaluated using generic or discipline-based assessments ( Nagel et al., 2020 ). Whether CT skills are conceptualized as generic or discipline-specific has implications for how they are assessed and how they are incorporated into the classroom.

In the iPAL project, CT is characterized as a multifaceted construct that comprises conceptualizing, analyzing, drawing inferences or synthesizing information, evaluating claims, and applying the results of these reasoning processes to various purposes (e.g., solve a problem, decide on a course of action, find an answer to a given question or reach a conclusion) ( Shavelson et al., 2019 ). In the course of carrying out a CT task, an individual typically engages in activities such as specifying or clarifying a problem; deciding what information is relevant to the problem; evaluating the trustworthiness of information; avoiding judgmental errors based on “fast thinking”; avoiding biases and stereotypes; recognizing different perspectives and how they can reframe a situation; considering the consequences of alternative courses of actions; and communicating clearly and concisely decisions and actions. The order in which activities are carried out can vary among individuals and the processes can be non-linear and reciprocal.

In this article, we focus on generic CT skills. The importance of these skills derives not only from their utility in academic and professional settings, but also the many situations involving challenging moral and ethical issues – often framed in terms of conflicting principles and/or interests – to which individuals have to apply these skills ( Kegan, 1994 ; Tessier-Lavigne, 2020 ). Conflicts and dilemmas are ubiquitous in the contexts in which adults find themselves: work, family, civil society. Moreover, to remain viable in the global economic environment – one characterized by increased competition and advances in second generation artificial intelligence (AI) – today’s college students will need to continually develop and leverage their CT skills. Ideally, colleges offer a supportive environment in which students can develop and practice effective approaches to reasoning about and acting in learning, professional and everyday situations.

Measurement of Critical Thinking

Critical thinking is a multifaceted construct that poses many challenges to those who would develop relevant and valid assessments. For those interested in current approaches to the measurement of CT that are not the focus of this paper, consult Zlatkin-Troitschanskaia et al. (2018) .

In this paper, we have singled out performance assessment as it offers important advantages to measuring CT. Extant tests of CT typically employ response formats such as forced-choice or short-answer, and scenario-based tasks (for an overview, see Liu et al., 2014 ). They all suffer from moderate to severe construct underrepresentation; that is, they fail to capture important facets of the CT construct such as perspective taking and communication. High fidelity performance tasks are viewed as more authentic in that they provide a problem context and require responses that are more similar to what individuals confront in the real world than what is offered by traditional multiple-choice items ( Messick, 1994 ; Braun, 2019 ). This greater verisimilitude promises higher levels of construct representation and lower levels of construct-irrelevant variance. Such performance tasks have the capacity to measure facets of CT that are imperfectly assessed, if at all, using traditional assessments ( Lane and Stone, 2006 ; Braun, 2019 ; Shavelson et al., 2019 ). However, these assertions must be empirically validated, and the measures should be subjected to psychometric analyses. Evidence of the reliability, validity, and interpretative challenges of performance assessment (PA) are extensively detailed in Davey et al. (2015) .

We adopt the following definition of performance assessment:

A performance assessment (sometimes called a work sample when assessing job performance) … is an activity or set of activities that requires test takers, either individually or in groups, to generate products or performances in response to a complex, most often real-world task. These products and performances provide observable evidence bearing on test takers’ knowledge, skills, and abilities—their competencies—in completing the assessment ( Davey et al., 2015 , p. 10).

A performance assessment typically includes an extended performance task and short constructed-response and selected-response (i.e., multiple-choice) tasks (for examples, see Zlatkin-Troitschanskaia and Shavelson, 2019 ). In this paper, we refer to both individual performance- and constructed-response tasks as performance tasks (PT) (For an example, see Table 1 in section “iPAL Assessment Framework”).


Table 1. The iPAL assessment framework.

An Approach to Performance Assessment of Critical Thinking: The iPAL Program

The approach to CT presented here is the result of ongoing work undertaken by the International Performance Assessment of Learning collaborative (iPAL 1 ). iPAL is an international consortium of volunteers, primarily from academia, who have come together to address the dearth in higher education of research and practice in measuring CT with performance tasks ( Shavelson et al., 2018 ). In this section, we present iPAL’s assessment framework as the basis of measuring CT, with examples along the way.

iPAL Background

The iPAL assessment framework builds on the Council for Aid to Education’s Collegiate Learning Assessment (CLA). The CLA was designed to measure cross-disciplinary, generic competencies, such as CT, analytic reasoning, problem solving, and written communication ( Klein et al., 2007 ; Shavelson, 2010 ). Ideally, each PA contained an extended PT (e.g., examining a range of evidential materials related to the crash of an aircraft) and two short PTs: one in which students critique an argument and one in which they provide a solution in response to a real-world societal issue.

Motivated by considerations of adequate reliability, the CLA was modified in 2012 to create the CLA+. The CLA+ includes two subtests: a PT and a 25-item Selected Response Question (SRQ) section. The PT presents a document or problem statement and an assignment based on that document which elicits an open-ended response. The CLA+ added the SRQ section (which is not linked substantively to the PT scenario) to increase the number of student responses and thereby obtain more reliable estimates of performance at the student level than could be achieved with a single PT ( Zahner, 2013 ; Davey et al., 2015 ).

iPAL Assessment Framework

Methodological foundations.

The iPAL framework evolved from the Collegiate Learning Assessment developed by Klein et al. (2007) . It was also informed by the results from the AHELO pilot study ( Organisation for Economic Co-operation and Development [OECD], 2012 , 2013 ), as well as the KoKoHs research program in Germany (for an overview see, Zlatkin-Troitschanskaia et al., 2017 , 2020 ). The ongoing refinement of the iPAL framework has been guided in part by the principles of Evidence Centered Design (ECD) ( Mislevy et al., 2003 ; Mislevy and Haertel, 2006 ; Haertel and Fujii, 2017 ).

In educational measurement, an assessment framework plays a critical intermediary role between the theoretical formulation of the construct and the development of the assessment instrument containing tasks (or items) intended to elicit evidence with respect to that construct ( Mislevy et al., 2003 ). Builders of the assessment framework draw on the construct theory and operationalize it in a way that provides explicit guidance to the developers of PTs. Thus, the framework should reflect the relevant facets of the construct, where relevance is determined by substantive theory or an appropriate alternative such as behavioral samples from real-world situations of interest (criterion-sampling; McClelland, 1973 ), as well as the intended use(s) (for an example, see Shavelson et al., 2019 ). By following the requirements and guidelines embodied in the framework, instrument developers strengthen the claim of construct validity for the instrument ( Messick, 1994 ).

An assessment framework can be specified at different levels of granularity: an assessment battery (“omnibus” assessment, for an example see below), a single performance task, or a specific component of an assessment ( Shavelson, 2010 ; Davey et al., 2015 ). In the iPAL program, a performance assessment comprises one or more extended performance tasks and additional selected-response and short constructed-response items. The focus of the framework specified below is on a single PT intended to elicit evidence with respect to some facets of CT, such as the evaluation of the trustworthiness of the documents provided and the capacity to address conflicts of principles.

From the ECD perspective, an assessment is an instrument for generating information to support an evidentiary argument and, therefore, the intended inferences (claims) must guide each stage of the design process. The construct of interest is operationalized through the Student Model , which represents the target knowledge, skills, and abilities, as well as the relationships among them. The student model should also make explicit the assumptions regarding student competencies in foundational skills or content knowledge. The Task Model specifies the features of the problems or items posed to the respondent, with the goal of eliciting the evidence desired. The assessment framework also describes the collection of task models comprising the instrument, with considerations of construct validity, various psychometric characteristics (e.g., reliability) and practical constraints (e.g., testing time and cost). The student model provides grounds for evidence of validity, especially cognitive validity; namely, that the students are thinking critically in responding to the task(s).

In the present context, the target construct (CT) is the competence of individuals to think critically, which entails solving complex, real-world problems, and clearly communicating their conclusions or recommendations for action based on trustworthy, relevant and unbiased information. The situations, drawn from actual events, are challenging and may arise in many possible settings. In contrast to more reductionist approaches to assessment development, the iPAL approach and framework rests on the assumption that properly addressing these situational demands requires the application of a constellation of CT skills appropriate to the particular task presented (e.g., Shavelson, 2010 , 2013 ). For a PT, the assessment framework must also specify the rubric by which the responses will be evaluated. The rubric must be properly linked to the target construct so that the resulting score profile constitutes evidence that is both relevant and interpretable in terms of the student model (for an example, see Zlatkin-Troitschanskaia et al., 2019 ).

iPAL Task Framework

The iPAL ‘omnibus’ framework comprises four main aspects: a storyline, a challenge, a document library, and a scoring rubric. Table 1 displays these aspects, brief descriptions of each, and corresponding examples drawn from an iPAL performance assessment (version adapted from the original in Hyytinen and Toom, 2019 ). Storylines are drawn from various domains; for example, the worlds of business, public policy, civics, medicine, and family. They often involve moral and/or ethical considerations. Deriving an appropriate storyline from a real-world situation requires careful consideration of which features are to be kept in toto, which are to be adapted for purposes of the assessment, and which are to be discarded. Framing the challenge demands care in wording so that there is minimal ambiguity in what is required of the respondent. The difficulty of the challenge depends, in large part, on the nature and extent of the information provided in the document library, the amount of scaffolding included, and the scope of the required response. The amount of information and the scope of the challenge should be commensurate with the amount of time available. As is evident from the table, the characteristics of the documents in the library are intended to elicit responses related to facets of CT. For example, with regard to bias, the information provided is intended to play to judgmental errors due to fast thinking and/or motivated reasoning. Ideally, the situation should accommodate multiple solutions of varying degrees of merit.

The dimensions of the scoring rubric are derived from the Task Model and Student Model ( Mislevy et al., 2003 ) and signal which features are to be extracted from the response and indicate how they are to be evaluated. There should be a direct link between the evaluation of the evidence and the claims that are made with respect to the key features of the task model and student model . More specifically, the task model specifies the various manipulations embodied in the PA and so informs scoring, while the student model specifies the capacities students employ in more or less effectively responding to the tasks. The score scales for each of the five facets of CT (see section “Concept and Definition of Critical Thinking”) can be specified using appropriate behavioral anchors (for examples, see Zlatkin-Troitschanskaia and Shavelson, 2019 ). Of particular importance is the evaluation of the response with respect to the last dimension of the scoring rubric; namely, the overall coherence and persuasiveness of the argument, building on the explicit or implicit characteristics related to the first five dimensions. The scoring process must be monitored carefully to ensure that (trained) raters are judging each response based on the same types of features and evaluation criteria ( Braun, 2019 ) as indicated by interrater agreement coefficients.

The scoring rubric of the iPAL omnibus framework can be modified for specific tasks ( Lane and Stone, 2006 ). This generic rubric helps ensure consistency across rubrics for different storylines. For example, Zlatkin-Troitschanskaia et al. (2019 , p. 473) used the following scoring scheme:

Based on our construct definition of CT and its four dimensions: (D1-Info) recognizing and evaluating information, (D2-Decision) recognizing and evaluating arguments and making decisions, (D3-Conseq) recognizing and evaluating the consequences of decisions, and (D4-Writing), we developed a corresponding analytic dimensional scoring … The students’ performance is evaluated along the four dimensions, which in turn are subdivided into a total of 23 indicators as (sub)categories of CT … For each dimension, we sought detailed evidence in students’ responses for the indicators and scored them on a six-point Likert-type scale. In order to reduce judgment distortions, an elaborate procedure of ‘behaviorally anchored rating scales’ (Smith and Kendall, 1963) was applied by assigning concrete behavioral expectations to certain scale points (Bernardin et al., 1976). To this end, we defined the scale levels by short descriptions of typical behavior and anchored them with concrete examples. … We trained four raters in 1 day using a specially developed training course to evaluate students’ performance along the 23 indicators clustered into four dimensions (for a description of the rater training, see Klotzer, 2018).
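To make the structure of such an analytic dimensional scoring scheme concrete, the sketch below (ours, not the cited authors’) aggregates hypothetical indicator ratings into dimension and total scores. The indicator names, the allocation of the 23 indicators across the four dimensions, and the equal weighting are assumptions for illustration only; they are not the operational scoring key used in the study quoted above.

```python
# Minimal sketch of analytic dimensional scoring (illustrative only).
# Assumption: 23 indicators rated on a 1-6 scale are grouped into the four
# dimensions D1-Info, D2-Decision, D3-Conseq, and D4-Writing; the split and
# the equal weighting below are hypothetical, not the published scoring key.
from statistics import mean

DIMENSIONS = {
    "D1-Info": [f"info_{i}" for i in range(1, 7)],          # 6 indicators (assumed)
    "D2-Decision": [f"decision_{i}" for i in range(1, 8)],  # 7 indicators (assumed)
    "D3-Conseq": [f"conseq_{i}" for i in range(1, 6)],      # 5 indicators (assumed)
    "D4-Writing": [f"writing_{i}" for i in range(1, 6)],    # 5 indicators (assumed)
}

def score_response(ratings):
    """Average the 1-6 ratings within each dimension, then across dimensions."""
    scores = {dim: mean(ratings[ind] for ind in inds)
              for dim, inds in DIMENSIONS.items()}
    scores["Total"] = mean(scores[dim] for dim in DIMENSIONS)  # unweighted (assumed)
    return scores

# Example: one rater's ratings for a single student response.
example_ratings = {ind: 4 for inds in DIMENSIONS.values() for ind in inds}
example_ratings["info_1"] = 6  # one notably strong indicator
print(score_response(example_ratings))
```

Because each dimension score is a simple average of its indicators, a student’s profile across the dimensions can be compared directly across responses, which is what makes the low-, middle-, and high-performer profiles discussed below possible.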

Shavelson et al. (2019) examined the interrater agreement of the scoring scheme developed by Zlatkin-Troitschanskaia et al. (2019) and “found that with 23 items and 2 raters the generalizability (“reliability”) coefficient for total scores to be 0.74 (with 4 raters, 0.84)” ( Shavelson et al., 2019 , p. 15). In the study by Zlatkin-Troitschanskaia et al. (2019 , p. 478), three student score profiles (low-, middle-, and high-performers) were identified. Proper interpretation of such profiles requires care. For example, there may be multiple possible explanations for low scores, such as poor CT skills, a lack of disposition to engage with the challenge, or both attributes jointly. These alternative explanations for student performance can potentially pose a threat to the evidentiary argument. In this case, auxiliary information may be available to aid in resolving the ambiguity. For example, student responses to selected- and short-constructed-response items in the PA can provide relevant information about the levels of the different skills possessed by the student. When sufficient data are available, the scores can be modeled statistically and/or qualitatively in such a way as to bring them to bear on the technical quality or interpretability of the claims of the assessment: reliability, validity, and utility evidence ( Davey et al., 2015 ; Zlatkin-Troitschanskaia et al., 2019 ). These kinds of concerns are less critical when PTs are used in classroom settings. The instructor can draw on other sources of evidence, including direct discussion with the student.
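As a rough plausibility check on those figures (not the authors’ own computation, which used generalizability theory), one can treat the two-rater coefficient as a composite and extrapolate with the Spearman–Brown prophecy formula; the result is close to the reported four-rater value:

```latex
\rho_k = \frac{k\,\rho_1}{1 + (k-1)\,\rho_1}, \qquad
\rho_2 = 0.74 \;\Rightarrow\; \rho_1 = \frac{0.74}{2 - 0.74} \approx 0.59, \qquad
\rho_4 \approx \frac{4(0.59)}{1 + 3(0.59)} \approx 0.85
```

The small discrepancy between this approximation (0.85) and the reported value (0.84) is expected, since a full generalizability analysis partitions error variance more finely than this simple extrapolation.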

Use of iPAL Performance Assessments in Educational Practice: Evidence From Preliminary Validation Studies

The assessment framework described here supports the development of a PT in a general setting. Many modifications are possible and, indeed, desirable. If the PT is to be more deeply embedded in a certain discipline (e.g., economics, law, or medicine), for example, then the framework must specify characteristics of the narrative and the complementary documents as to the breadth and depth of disciplinary knowledge that is represented.

To date, preliminary field trials employing the omnibus framework (i.e., a full set of documents) have indicated that 60 min is generally an inadequate amount of time for students to engage with the full set of complementary documents and to craft a complete response to the challenge (for an example, see Shavelson et al., 2019 ). Accordingly, it would be helpful to develop modified frameworks for PTs that require substantially less time. For an example, see a short performance assessment of civic online reasoning, requiring response times from 10 to 50 min ( Wineburg et al., 2016 ). Such assessment frameworks could be derived from the omnibus framework by focusing on a reduced number of facets of CT, and by specifying the characteristics of the complementary documents to be included – or, perhaps, choices among sets of documents. In principle, one could build a ‘family’ of PTs, each using the same (or nearly the same) storyline and a subset of the full collection of complementary documents.

Paul and Elder (2007) argue that the goal of CT assessments should be to provide faculty with important information about how well their instruction supports the development of students’ CT. In that spirit, the full family of PT’s could represent all facets of the construct while affording instructors and students more specific insights on strengths and weaknesses with respect to particular facets of CT. Moreover, the framework should be expanded to include the design of a set of short answer and/or multiple choice items to accompany the PT. Ideally, these additional items would be based on the same narrative as the PT to collect more nuanced information on students’ precursor skills such as reading comprehension, while enhancing the overall reliability of the assessment. Areas where students are under-prepared could be addressed before, or even in parallel with the development of the focal CT skills. The parallel approach follows the co-requisite model of developmental education. In other settings (e.g., for summative assessment), these complementary items would be administered after the PT to augment the evidence in relation to the various claims. The full PT taking 90 min or more could serve as a capstone assessment.

As we transition from simply delivering paper-based assessments by computer to taking full advantage of the affordances of a digital platform, we should learn from the hard-won lessons of the past so that we can make swifter progress with fewer missteps. In that regard, we must take validity as the touchstone – assessment design, development and deployment must all be tightly linked to the operational definition of the CT construct. Considerations of reliability and practicality come into play with various use cases that highlight different purposes for the assessment (for future perspectives, see next section).

The iPAL assessment framework represents a feasible compromise between commercial, standardized assessments of CT (e.g., Liu et al., 2014 ), on the one hand, and, on the other, freedom for individual faculty to develop assessment tasks according to idiosyncratic models. It imposes a degree of standardization on both task development and scoring, while still allowing some flexibility for faculty to tailor the assessment to meet their unique needs. In so doing, it addresses a key weakness of the AAC&U’s VALUE initiative 2 (retrieved 5/7/2020) that has achieved wide acceptance among United States colleges.

The VALUE initiative has produced generic scoring rubrics for 15 domains including CT, problem-solving and written communication. A rubric for a particular skill domain (e.g., critical thinking) has five to six dimensions with four ordered performance levels for each dimension (1 = lowest, 4 = highest). The performance levels are accompanied by language that is intended to clearly differentiate among levels. 3 Faculty are asked to submit student work products from a senior level course that is intended to yield evidence with respect to student learning outcomes in a particular domain and that, they believe, can elicit performances at the highest level. The collection of work products is then graded by faculty from other institutions who have been trained to apply the rubrics.

A principal difficulty is that there is neither a common framework to guide the design of the challenge, nor any control on task complexity and difficulty. Consequently, there is substantial heterogeneity in the quality and evidential value of the submitted responses. This also causes difficulties with task scoring and inter-rater reliability. Shavelson et al. (2009) discuss some of the problems arising with non-standardized collections of student work.

In this context, one advantage of the iPAL framework is that it can provide valuable guidance and an explicit structure for faculty in developing performance tasks for both instruction and formative assessment. When faculty design assessments, their focus is typically on content coverage rather than other potentially important characteristics, such as the degree of construct representation and the adequacy of their scoring procedures ( Braun, 2019 ).

Concluding Reflections

Challenges to interpretation and implementation.

Performance tasks such as those generated by iPAL are attractive instruments for assessing CT skills (e.g., Shavelson, 2010 ; Shavelson et al., 2019 ). The attraction mainly rests on the assumption that elaborated PTs are more authentic (direct) and more completely capture facets of the target construct (i.e., possess greater construct representation) than the widely used selected-response tests. However, as Messick (1994) noted, authenticity is a “promissory note” that must be redeemed with empirical research. In practice, there are trade-offs among authenticity, construct validity, and psychometric quality such as reliability ( Davey et al., 2015 ).

One reason for Messick’s (1994) caution is that authenticity does not guarantee construct validity. The latter must be established by drawing on multiple sources of evidence ( American Educational Research Association et al., 2014 ). Following the ECD principles in designing and developing the PT, as well as the associated scoring rubrics, constitutes an important type of evidence. Further, as Leighton (2019) argues, response process data (“cognitive validity”) are needed to validate claims regarding the cognitive complexity of PTs. Relevant data can be obtained through cognitive laboratory studies involving methods such as think-aloud protocols or eye-tracking. Although time-consuming and expensive, such studies can yield not only evidence of validity, but also valuable information to guide refinements of the PT.

Going forward, iPAL PTs must be subjected to validation studies as recommended in the Standards for Educational and Psychological Testing ( American Educational Research Association et al., 2014 ). With a particular focus on the criterion “relationships to other variables,” a framework should include assumptions about the theoretically expected relationships among the indicators assessed by the PT, as well as the indicators’ relationships to external variables such as intelligence or prior (task-relevant) knowledge.

Complementing the necessity of evaluating construct validity, there is the need to consider potential sources of construct-irrelevant variance (CIV). One pertains to student motivation, which is typically greater when the stakes are higher. If students are not motivated, then their performance is likely to be impacted by factors unrelated to their (construct-relevant) ability ( Lane and Stone, 2006 ; Braun et al., 2011 ; Shavelson, 2013 ). Differential motivation across groups can also bias comparisons. Student motivation might be enhanced if the PT is administered in the context of a course with the promise of generating useful feedback on students’ skill profiles.

Construct-irrelevant variance can also occur when students are not equally prepared for the format of the PT or fully appreciate the response requirements. This source of CIV could be alleviated by providing students with practice PT’s. Finally, the use of novel forms of documentation, such as those from the Internet, can potentially introduce CIV due to differential familiarity with forms of representation or contents. Interestingly, this suggests that there may be a conflict between enhancing construct representation and reducing CIV.

Another potential source of CIV is related to response evaluation. Even with training, human raters can vary in accuracy and usage of the full score range. In addition, raters may attend to features of responses that are unrelated to the target construct, such as the length of the students’ responses or the frequency of grammatical errors ( Lane and Stone, 2006 ). Some of these sources of variance could be addressed in an online environment, where word processing software could alert students to potential grammatical and spelling errors before they submit their final work product.

Performance tasks generally take longer to administer and are more costly than traditional assessments, making it more difficult to reliably measure student performance ( Messick, 1994 ; Davey et al., 2015 ). Indeed, it is well known that more than one performance task is needed to obtain high reliability ( Shavelson, 2013 ). This is due to both student-task interactions and variability in scoring. Sources of student-task interactions include differential familiarity with the topic ( Hyytinen and Toom, 2019 ) and differential motivation to engage with the task. The level of reliability required, however, depends on the context of use. For formative assessment as part of an instructional program, the required reliability can be lower than for summative purposes. In the former case, other types of evidence are generally available to support interpretation and guide pedagogical decisions. Further studies are needed to obtain estimates of reliability in typical instructional settings.
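The trade-off described here can be made explicit with the standard generalizability-theory expression for a persons × tasks × raters design (a textbook formula, not a reanalysis of iPAL data):

```latex
E\rho^{2} \;=\; \frac{\sigma^{2}_{p}}
{\sigma^{2}_{p} \;+\; \dfrac{\sigma^{2}_{pt}}{n_{t}} \;+\; \dfrac{\sigma^{2}_{pr}}{n_{r}} \;+\; \dfrac{\sigma^{2}_{ptr,e}}{n_{t}\,n_{r}}}
```

where the numerator is the variance among persons (universe-score variance) and the denominator adds the person × task, person × rater, and residual interaction variances, each divided by the corresponding numbers of tasks and raters. Because the person × task component is typically large for performance tasks, adding raters alone yields limited gains; increasing the number of tasks is what drives reliability up, at the cost of the longer testing time discussed above.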

With sufficient data, more sophisticated psychometric analyses become possible. One challenge is that the assumption of unidimensionality required for many psychometric models might be untenable for performance tasks ( Davey et al., 2015 ). Davey et al. (2015) provide the example of a mathematics assessment that requires students to demonstrate not only their mathematics skills but also their written communication skills. Although the iPAL framework does not explicitly address students’ reading comprehension and organization skills, students will likely need to call on these abilities to accomplish the task. Moreover, as the operational definition of CT makes evident, the student must not only deploy several skills in responding to the challenge of the PT, but also carry out component tasks in sequence. The former requirement strongly indicates the need for a multi-dimensional IRT model, while the latter suggests that the usual assumption of local item independence may well be problematic ( Lane and Stone, 2006 ). At the same time, the analytic scoring rubric should facilitate the use of latent class analysis to partition data from large groups into meaningful categories ( Zlatkin-Troitschanskaia et al., 2019 ).
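To illustrate the kind of multidimensional model implied here, the sketch below evaluates a compensatory two-dimensional two-parameter logistic item response function, with one dimension standing in for CT and one for written communication. The dimensions, discriminations, and intercept are invented for illustration; they are not estimates from iPAL data.

```python
# Illustrative compensatory two-dimensional 2PL item response function.
# All parameter values below are invented for illustration; they are not
# estimates from any iPAL dataset.
import numpy as np

def prob_correct(theta, a, d):
    """P(X = 1 | theta) = logistic(a . theta + d) for a compensatory MIRT item."""
    return 1.0 / (1.0 + np.exp(-(np.dot(a, theta) + d)))

# A hypothetical item loading on both critical thinking (dim 1) and writing (dim 2).
a = np.array([1.2, 0.6])   # discrimination parameters (assumed)
d = -0.5                   # intercept / easiness (assumed)

for theta in ([-1.0, 0.0], [0.0, 0.0], [1.0, 1.0]):
    p = prob_correct(np.array(theta), a, d)
    print(f"theta = {theta}: P(correct) = {p:.3f}")
```

In a compensatory model such as this, strength on one dimension can partially offset weakness on the other; whether that assumption, or a model allowing local dependence among sequenced component tasks, better fits iPAL responses is exactly the kind of question the psychometric analyses mentioned above would address.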

Future Perspectives

Although the iPAL consortium has made substantial progress in the assessment of CT, much remains to be done. Further refinement of existing PT’s and their adaptation to different languages and cultures must continue. To this point, there are a number of examples: The refugee crisis PT (cited in Table 1 ) was translated and adapted from Finnish to US English and then to Colombian Spanish. A PT concerning kidney transplants was translated and adapted from German to US English. Finally, two PT’s based on ‘legacy admissions’ to US colleges were translated and adapted to Colombian Spanish.

With respect to data collection, there is a need for sufficient data to support psychometric analysis of student responses, especially the relationships among the different components of the scoring rubric, as this would inform both task development and response evaluation ( Zlatkin-Troitschanskaia et al., 2019 ). In addition, more intensive study of response processes through cognitive laboratories and the like are needed to strengthen the evidential argument for construct validity ( Leighton, 2019 ). We are currently conducting empirical studies, collecting data on both iPAL PT’s and other measures of CT. These studies will provide evidence of convergent and discriminant validity.

At the same time, efforts should be directed at further development to support different ways CT PT’s might be used—i.e., use cases—especially those that call for formative use of PT’s. Incorporating formative assessment into courses can plausibly be expected to improve students’ competency acquisition ( Zlatkin-Troitschanskaia et al., 2017 ). With suitable choices of storylines, appropriate combinations of (modified) PT’s, supplemented by short-answer and multiple-choice items, could be interwoven into ordinary classroom activities. The supplementary items may be completely separate from the PT’s (as is the case with the CLA+), loosely coupled with the PT’s (as in drawing on the same storyline), or tightly linked to the PT’s (as in requiring elaboration of certain components of the response to the PT).

As an alternative to such integration, stand-alone modules could be embedded in courses to yield evidence of students’ generic CT skills. Core curriculum courses or general education courses offer ideal settings for embedding performance assessments. If these assessments were administered to a representative sample of students in each cohort over their years in college, the results would yield important information on the development of CT skills at a population level. For another example, these PA’s could be used to assess the competence profiles of students entering Bachelor’s or graduate-level programs as a basis for more targeted instructional support.

Thus, in considering different use cases for the assessment of CT, it is evident that several modifications of the iPAL omnibus assessment framework are needed. As noted earlier, assessments built according to this framework are demanding with respect to the extensive preliminary work required by a task and the time required to properly complete it. Thus, it would be helpful to have modified versions of the framework, focusing on one or two facets of the CT construct and calling for a smaller number of supplementary documents. The challenge to the student should be suitably reduced.

Some members of the iPAL collaborative have developed PT’s that are embedded in disciplines such as engineering, law and education ( Crump et al., 2019 ; for teacher education examples, see Jeschke et al., 2019 ). These are proving to be of great interest to various stakeholders and further development is likely. Consequently, it is essential that an appropriate assessment framework be established and implemented. It is both a conceptual and an empirical question as to whether a single framework can guide development in different domains.

Performance Assessment in Online Learning Environment

Over the last 15 years, increasing amounts of time in both college and work have been spent using computers and other electronic devices. This has led to the formulation of models of the new literacies that attempt to capture some key characteristics of these activities. A prominent example is the model proposed by Leu et al. (2020) . The model frames online reading as a process of problem-based inquiry that calls on five practices during online research and comprehension:

1. Reading to identify important questions,

2. Reading to locate information,

3. Reading to critically evaluate information,

4. Reading to synthesize online information, and

5. Reading and writing to communicate online information.

The parallels with the iPAL definition of CT are evident and suggest there may be benefits to closer links between these two lines of research. For example, a report by Leu et al. (2014) describes empirical studies comparing assessments of online reading using either open-ended or multiple-choice response formats.

The iPAL consortium has begun to take advantage of the affordances of the online environment (for examples, see Schmidt et al. and Nagel et al. in this special issue). Most obviously, Supplementary Materials can now include archival photographs, audio recordings, or videos. Additional tasks might include the online search for relevant documents, though this would add considerably to the time demands. This online search could occur within a simulated Internet environment, as is the case for the IEA’s ePIRLS assessment ( Mullis et al., 2017 ).

The prospect of having access to a wealth of materials that can add to task authenticity is exciting. Yet it can also add ambiguity and information overload. Increased authenticity, then, should be weighed against validity concerns and the time required to absorb the content in these materials. Modifications of the design framework and extensive empirical testing will be required to decide on appropriate trade-offs. A related possibility is to employ some of these materials in short-answer (or even selected-response) items that supplement the main PT. Response formats could include highlighting text or using a drag-and-drop menu to construct a response. Students’ responses could be automatically scored, thereby containing costs. With automated scoring, feedback to students and faculty, including suggestions for next steps in strengthening CT skills, could also be provided without adding to faculty workload. Taking advantage of the online environment to incorporate new types of supplementary documents – and, perhaps, new response formats – should therefore be a high priority. Finally, further investigation of the overlap between this formulation of CT and the characterization of online reading promulgated by Leu et al. (2020) is a promising direction to pursue.

Data Availability Statement

All datasets generated for this study are included in the article/supplementary material.

Author Contributions

HB wrote the article. RS, OZ-T, and KB were involved in the preparation and revision of the article and co-wrote the manuscript. All authors contributed to the article and approved the submitted version.

Funding

This study was funded in part by the Spencer Foundation (Grant No. 201700123).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We would like to thank all the researchers who have participated in the iPAL program.

Footnotes

1. https://www.ipal-rd.com/
2. https://www.aacu.org/value
3. When test results are reported by means of substantively defined categories, the scoring is termed “criterion-referenced”. This is in contrast to results reported as percentiles; such scoring is termed “norm-referenced”.

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (2014). Standards for Educational and Psychological Testing. Washington, D.C: American Educational Research Association.


Arum, R., and Roksa, J. (2011). Academically Adrift: Limited Learning on College Campuses. Chicago, IL: University of Chicago Press.

Association of American Colleges and Universities (n.d.). VALUE: What is value? Available online at: https://www.aacu.org/value (accessed May 7, 2020).

Association of American Colleges and Universities [AACU] (2018). Fulfilling the American Dream: Liberal Education and the Future of Work. Available online at: https://www.aacu.org/research/2018-future-of-work (accessed May 1, 2020).

Braun, H. (2019). Performance assessment and standardization in higher education: a problematic conjunction? Br. J. Educ. Psychol. 89, 429–440. doi: 10.1111/bjep.12274


Braun, H. I., Kirsch, I., and Yamoto, K. (2011). An experimental study of the effects of monetary incentives on performance on the 12th grade NAEP reading assessment. Teach. Coll. Rec. 113, 2309–2344.

Crump, N., Sepulveda, C., Fajardo, A., and Aguilera, A. (2019). Systematization of performance tests in critical thinking: an interdisciplinary construction experience. Rev. Estud. Educ. 2, 17–47.

Davey, T., Ferrara, S., Shavelson, R., Holland, P., Webb, N., and Wise, L. (2015). Psychometric Considerations for the Next Generation of Performance Assessment. Washington, DC: Center for K-12 Assessment & Performance Management, Educational Testing Service.

Erwin, T. D., and Sebrell, K. W. (2003). Assessment of critical thinking: ETS’s tasks in critical thinking. J. Gen. Educ. 52, 50–70. doi: 10.1353/jge.2003.0019


Haertel, G. D., and Fujii, R. (2017). “Evidence-centered design and postsecondary assessment,” in Handbook on Measurement, Assessment, and Evaluation in Higher Education , 2nd Edn, eds C. Secolsky and D. B. Denison (Abingdon: Routledge), 313–339. doi: 10.4324/9781315709307-26

Hyytinen, H., and Toom, A. (2019). Developing a performance assessment task in the Finnish higher education context: conceptual and empirical insights. Br. J. Educ. Psychol. 89, 551–563. doi: 10.1111/bjep.12283

Hyytinen, H., Toom, A., and Shavelson, R. J. (2019). “Enhancing scientific thinking through the development of critical thinking in higher education,” in Redefining Scientific Thinking for Higher Education: Higher-Order Thinking, Evidence-Based Reasoning and Research Skills , eds M. Murtonen and K. Balloo (London: Palgrave MacMillan).

Indiana University (2019). FSSE 2019 Frequencies: FSSE 2019 Aggregate. Available online at: http://fsse.indiana.edu/pdf/FSSE_IR_2019/summary_tables/FSSE19_Frequencies_(FSSE_2019).pdf (accessed May 1, 2020).

Jeschke, C., Kuhn, C., Lindmeier, A., Zlatkin-Troitschanskaia, O., Saas, H., and Heinze, A. (2019). Performance assessment to investigate the domain specificity of instructional skills among pre-service and in-service teachers of mathematics and economics. Br. J. Educ. Psychol. 89, 538–550. doi: 10.1111/bjep.12277

Kegan, R. (1994). In Over Our Heads: The Mental Demands of Modern Life. Cambridge, MA: Harvard University Press.

Klein, S., Benjamin, R., Shavelson, R., and Bolus, R. (2007). The collegiate learning assessment: facts and fantasies. Eval. Rev. 31, 415–439. doi: 10.1177/0193841x07303318

Kosslyn, S. M., and Nelson, B. (2017). Building the Intentional University: Minerva and the Future of Higher Education. Cambridge, MA: The MIT Press.

Lane, S., and Stone, C. A. (2006). “Performance assessment,” in Educational Measurement , 4th Edn, ed. R. L. Brennan (Lanham, MD: Rowman & Littlefield Publishers), 387–432.

Leighton, J. P. (2019). The risk–return trade-off: performance assessments and cognitive validation of inferences. Br. J. Educ. Psychol. 89, 441–455. doi: 10.1111/bjep.12271

Leu, D. J., Kiili, C., Forzani, E., Zawilinski, L., McVerry, J. G., and O’Byrne, W. I. (2020). “The new literacies of online research and comprehension,” in The Concise Encyclopedia of Applied Linguistics , ed. C. A. Chapelle (Oxford: Wiley-Blackwell), 844–852.

Leu, D. J., Kulikowich, J. M., Kennedy, C., and Maykel, C. (2014). “The ORCA Project: designing technology-based assessments for online research,” in Paper Presented at the American Educational Research Annual Meeting , Philadelphia, PA.

Liu, O. L., Frankel, L., and Roohr, K. C. (2014). Assessing critical thinking in higher education: current state and directions for next-generation assessments. ETS Res. Rep. Ser. 1, 1–23. doi: 10.1002/ets2.12009

McClelland, D. C. (1973). Testing for competence rather than for “intelligence.”. Am. Psychol. 28, 1–14. doi: 10.1037/h0034092

McGrew, S., Ortega, T., Breakstone, J., and Wineburg, S. (2017). The challenge that’s bigger than fake news: civic reasoning in a social media environment. Am. Educ. 4, 4-9, 39.

Mejía, A., Mariño, J. P., and Molina, A. (2019). Incorporating perspective analysis into critical thinking performance assessments. Br. J. Educ. Psychol. 89, 456–467. doi: 10.1111/bjep.12297

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educ. Res. 23, 13–23. doi: 10.3102/0013189x023002013

Mislevy, R. J., Almond, R. G., and Lukas, J. F. (2003). A brief introduction to evidence-centered design. ETS Res. Rep. Ser. 2003, i–29. doi: 10.1002/j.2333-8504.2003.tb01908.x

Mislevy, R. J., and Haertel, G. D. (2006). Implications of evidence-centered design for educational testing. Educ. Meas. Issues Pract. 25, 6–20. doi: 10.1111/j.1745-3992.2006.00075.x

Mullis, I. V. S., Martin, M. O., Foy, P., and Hooper, M. (2017). ePIRLS 2016 International Results in Online Informational Reading. Available online at: http://timssandpirls.bc.edu/pirls2016/international-results/ (accessed May 1, 2020).

Nagel, M.-T., Zlatkin-Troitschanskaia, O., Schmidt, S., and Beck, K. (2020). “Performance assessment of generic and domain-specific skills in higher education economics,” in Student Learning in German Higher Education , eds O. Zlatkin-Troitschanskaia, H. A. Pant, M. Toepper, and C. Lautenbach (Berlin: Springer), 281–299. doi: 10.1007/978-3-658-27886-1_14

Organisation for Economic Co-operation and Development [OECD] (2012). AHELO: Feasibility Study Report, Vol. 1: Design and Implementation. Paris: OECD.

Organisation for Economic Co-operation and Development [OECD] (2013). AHELO: Feasibility Study Report, Vol. 2: Data Analysis and National Experiences. Paris: OECD.

Oser, F. K., and Biedermann, H. (2020). “A three-level model for critical thinking: critical alertness, critical reflection, and critical analysis,” in Frontiers and Advances in Positive Learning in the Age of Information (PLATO) , ed. O. Zlatkin-Troitschanskaia (Cham: Springer), 89–106. doi: 10.1007/978-3-030-26578-6_7

Paul, R., and Elder, L. (2007). Consequential validity: using assessment to drive instruction. Found. Crit. Think. 29, 31–40.

Pellegrino, J. W., and Hilton, M. L. (eds) (2012). Education for Life and Work: Developing Transferable Knowledge and Skills in the 21st Century. Washington, DC: National Academies Press.

Shavelson, R. (2010). Measuring College Learning Responsibly: Accountability in a New Era. Redwood City, CA: Stanford University Press.

Shavelson, R. J. (2013). On an approach to testing and modeling competence. Educ. Psychol. 48, 73–86. doi: 10.1080/00461520.2013.779483

Shavelson, R. J., Zlatkin-Troitschanskaia, O., Beck, K., Schmidt, S., and Marino, J. P. (2019). Assessment of university students’ critical thinking: next generation performance assessment. Int. J. Test. 19, 337–362. doi: 10.1080/15305058.2018.1543309

Shavelson, R. J., Zlatkin-Troitschanskaia, O., and Marino, J. P. (2018). “International performance assessment of learning in higher education (iPAL): research and development,” in Assessment of Learning Outcomes in Higher Education: Cross-National Comparisons and Perspectives , eds O. Zlatkin-Troitschanskaia, M. Toepper, H. A. Pant, C. Lautenbach, and C. Kuhn (Berlin: Springer), 193–214. doi: 10.1007/978-3-319-74338-7_10

Shavelson, R. J., Klein, S., and Benjamin, R. (2009). The limitations of portfolios. Inside Higher Educ. Available online at: https://www.insidehighered.com/views/2009/10/16/limitations-portfolios

Stolzenberg, E. B., Eagan, M. K., Zimmerman, H. B., Berdan Lozano, J., Cesar-Davis, N. M., Aragon, M. C., et al. (2019). Undergraduate Teaching Faculty: The HERI Faculty Survey 2016–2017. Los Angeles, CA: UCLA.

Tessier-Lavigne, M. (2020). Putting Ethics at the Heart of Innovation. Stanford, CA: Stanford Magazine.

Wheeler, P., and Haertel, G. D. (1993). Resource Handbook on Performance Assessment and Measurement: A Tool for Students, Practitioners, and Policymakers. Palm Coast, FL: Owl Press.

Wineburg, S., McGrew, S., Breakstone, J., and Ortega, T. (2016). Evaluating Information: The Cornerstone of Civic Online Reasoning. Executive Summary. Stanford, CA: Stanford History Education Group.

Zahner, D. (2013). Reliability and Validity–CLA+. Council for Aid to Education. Available online at: https://pdfs.semanticscholar.org/91ae/8edfac44bce3bed37d8c9091da01d6db3776.pdf

Zlatkin-Troitschanskaia, O., and Shavelson, R. J. (2019). Performance assessment of student learning in higher education [Special issue]. Br. J. Educ. Psychol. 89, i–iv, 413–563.

Zlatkin-Troitschanskaia, O., Pant, H. A., Lautenbach, C., Molerov, D., Toepper, M., and Brückner, S. (2017). Modeling and Measuring Competencies in Higher Education: Approaches to Challenges in Higher Education Policy and Practice. Berlin: Springer VS.

Zlatkin-Troitschanskaia, O., Pant, H. A., Toepper, M., and Lautenbach, C. (eds) (2020). Student Learning in German Higher Education: Innovative Measurement Approaches and Research Results. Wiesbaden: Springer.

Zlatkin-Troitschanskaia, O., Shavelson, R. J., and Pant, H. A. (2018). “Assessment of learning outcomes in higher education: international comparisons and perspectives,” in Handbook on Measurement, Assessment, and Evaluation in Higher Education , 2nd Edn, eds C. Secolsky and D. B. Denison (Abingdon: Routledge), 686–697.

Zlatkin-Troitschanskaia, O., Shavelson, R. J., Schmidt, S., and Beck, K. (2019). On the complementarity of holistic and analytic approaches to performance assessment scoring. Br. J. Educ. Psychol. 89, 468–484. doi: 10.1111/bjep.12286

Keywords : critical thinking, performance assessment, assessment framework, scoring rubric, evidence-centered design, 21st century skills, higher education

Citation: Braun HI, Shavelson RJ, Zlatkin-Troitschanskaia O and Borowiec K (2020) Performance Assessment of Critical Thinking: Conceptualization, Design, and Implementation. Front. Educ. 5:156. doi: 10.3389/feduc.2020.00156

Received: 30 May 2020; Accepted: 04 August 2020; Published: 08 September 2020.


Copyright © 2020 Braun, Shavelson, Zlatkin-Troitschanskaia and Borowiec. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Henry I. Braun, [email protected]

International Study Reveals Measuring and Developing Critical-Thinking Skills as an Essential Best Practice in Higher Education

Opportunities exist for higher education institutions worldwide to increase critical-thinking skills among graduates through explicit instruction, practice, and measurement of the skills employers are most seeking in today’s innovation economy.

NEW YORK, October 18, 2023 | Source: GlobeNewswire

The  Council for Aid to Education, Inc.  (CAE), a leader in designing innovative performance tasks for measurement and instruction of higher-order skills, recently co-authored an article on a six-year international study in the  European Journal of Education Study . Key findings shared in  “Assessing and Developing Critical-Thinking Skills in Higher Education”  include that it is feasible to reliably and validly measure higher-order skills in a cross-cultural context and that assessment of these skills is necessary for colleges and universities to ensure that their programs are graduating students with the skills needed for career success after graduation.

Between 2015 and 2020, 120,000 students from higher education institutions in six different countries — Chile, Finland, Italy, Mexico, the UK, and the US — were administered CAE’s Collegiate Learning Assessment (CLA+), a performance-based assessment that measures proficiency in critical thinking, problem solving, and written communication. Analysis of the data shows that students entering a higher education program on average performed at the Developing mastery level of the test, while exiting students on average performed at the Proficient mastery level. The amount of growth is relatively small (d = 0.10), but significant. However, half of exiting students perform at the two lowest levels of proficiency, indicating that a higher education degree does not necessarily mean students have gained the higher-order skills needed for innovation-oriented workplaces.
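For readers less familiar with the metric, the reported d is presumably the standardized mean difference (Cohen’s d); under that assumption:

```latex
d \;=\; \frac{\bar{X}_{\text{exiting}} - \bar{X}_{\text{entering}}}{s_{\text{pooled}}}
```

so d = 0.10 indicates that exiting students scored, on average, about one tenth of a pooled standard deviation higher than entering students.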

“In response to employer concerns about graduate employability, assessing and developing students’ higher-order skills is an essential component of best practices in higher education,” said Doris Zahner, Ph.D., CAE’s chief academic officer. “The ability to measure these skills in a cross-cultural context addresses a current gap between the skills that higher education graduates possess and the skills that are required by hiring managers for success in the workplace.”

This study reinforces the findings of  OECD’s 2013 Assessment of Higher Education Learning Outcomes (AHELO) Feasibility Study and is based upon a recently published 2022 OECD report, Does Higher Education Teach Students to Think Critically? Since this original study, CAE has further improved CLA+ through lessons learned from its implementation, analytical research on the data gathered, and international collaboration.

The research discussed in “Assessing and Developing Critical-Thinking Skills in Higher Education” reinforces the need for policymakers, researchers, and higher education leaders to have valid and reliable internationally comparative assessments of the skills that are needed for today’s knowledge economy. “The results outlined in this report show the power of assessing critical-thinking skills and how such assessments can feed into the higher education policy agenda at the national and international level,” said article co-author Dirk Van Damme, former head of the Centre for Educational Research and Innovation at OECD and current senior research fellow at the Centre for Curriculum Redesign.

CAE, in collaboration with the Finland Ministry of Education and Culture, will continue to study the impact of higher education on the development of critical-thinking skills. Starting in 2023 and continuing through 2025, a cohort of students from 18 Finnish higher education institutions will use CLA+ to measure their growth in critical thinking, adding a longitudinal component to this ongoing research.

To learn more about this study, CAE’s other research, and CAE’s performance-based assessments and critical thinking instruction, visit  cae.org .

About CAE

As a nonprofit whose mission is to help improve the academic and career outcomes of secondary and higher education students, CAE is the leader in designing innovative performance tasks for measurement and instruction of higher-order skills and within subject areas.

Over the past 20 years, CAE has helped over 825,000 students globally understand and improve their proficiency in critical thinking, problem solving and effective written communication. Additionally, CAE’s subject area assessments have helped millions of K12 students across the US. Supported by best practices in assessment development, administration and psychometrics, CAE’s performance-based assessments include the Collegiate Learning Assessment (CLA+) and College and Career Readiness Assessment (CCRA+). To learn more, please visit cae.org and connect with us on LinkedIn and YouTube.



The International Critical Thinking Reading and Writing Test

Second Edition. Richard Paul and Linda Elder.


UCL Centre for Languages & International Education (CLIE)

Critical Thinking task


If you are selected for the interview, you’ll be sent a Critical Thinking Task to prepare beforehand. Here is some general information on the task and guidelines on how to prepare for it.

Why is there a Critical Thinking task?

We recognise that UPCH students come from a variety of academic, educational and cultural backgrounds.

This is why the UPCH Critical Thinking task does not assess your knowledge of critical thinking. Instead, it’s used to identify candidates who can think critically.

We are looking for students who can intellectually engage with ideas, develop and support their arguments, and are able to be self-reflective in their learning. This approach to studying is vital to success on the UPCH.

What is the UPCH Critical Thinking task?

The UPCH Critical Thinking task consists of a short text, or text and images, to read and study before the interview and on which you’ll be asked questions during the interview.

You may also be asked questions that relate more generally to the text’s topic(s) or to your study skills.

The text or text and images will relate to an arts and humanities, philosophical, social, political, economic or environmental topic.

How can I prepare for the Critical Thinking task in the interview?

Once you've received the actual Critical Thinking text, you should make sure you have understood the content.

You should be able to:

  • Understand the vocabulary used in the text
  • Paraphrase or summarise the text and its main arguments
  • Have a clear idea of the theme and the main points of the text and/or images
  • Identify any assumptions, biases, generalisations, flaws, or logical errors in the text and/or images

But to think critically is to read actively! When you read the text you are sent, try to answer questions like:

  • What’s the purpose of the text/image?
  • What are the main arguments/ideas and are they convincing?
  • What support is given for these arguments/ ideas?
  • What conclusion(s) are drawn, and do you agree with them? 
  • What could be a good counter-argument?
  • What examples would you use to support it?
  • Is it well-written? How would you rewrite this text?
  • What wider topics or issues does this text relate to in your view?
  • Is this text or topic surprising to you? Do you find it challenging and in what regard?

Use the practice tasks below to see the type of text you may be sent to study and try to answer some of the preparation questions we’ve given. Try to answer them orally and if you can, record yourself.

  • Practice task 1  (example text and preparation questions)
  • Practice task 2  (example text and preparation questions)
  • Practice task 3  (example text and images and preparation questions)

Other tips to prepare for the interview

The focus of the Critical Thinking task is on thought rather than grammatical accuracy and vocabulary. However, correct expression and good vocabulary will help you get your ideas across better during the interview.

Here are some other tips designed to help you prepare for your interview:

  • Make sure you feel rested, are in a comfortable environment, won't be disturbed and that you're unlikely to experience technical problems.
  • Read and listen to English as much as you can (e.g., newspapers, books, radio, podcasts) and practise writing (e.g., keep a diary/journal, write essays).
  • Take part in debates and discussions with friends, family and teachers to practise in advance.
  • Train yourself in finding on-the-spot arguments to support your opinions and practise voicing them confidently to yourself and then to others.
  • Use role-play and put yourself in your opponent’s mind frame: what would they say?
  • Think about a topic from another person's point of view (like an economist or a philosopher). What would they say?
  • Record yourself speaking and try to see where you could articulate your answers better.
  • Learn more about Critical Thinking skills in general and do some exercises using this book:  Critical Thinking Skills: Effective Analysis, Argument and Reflection, by Stella Cottrell (Palgrave Macmillan) .


Critical Thinking, Intelligence, and Unsubstantiated Beliefs: An Integrative Review

Associated data.

This research did not involve collection of original data, and hence there are no new data to make available.

A review of the research shows that critical thinking is a more inclusive construct than intelligence, going beyond what general cognitive ability can account for. For instance, critical thinking can more completely account for many everyday outcomes, such as how thinkers reject false conspiracy theories, paranormal and pseudoscientific claims, psychological misconceptions, and other unsubstantiated claims. Deficiencies in the components of critical thinking (in specific reasoning skills, dispositions, and relevant knowledge) contribute to unsubstantiated belief endorsement in ways that go beyond what standardized intelligence tests test. Specifically, people who endorse unsubstantiated claims less tend to show better critical thinking skills, possess more relevant knowledge, and are more disposed to think critically. They tend to be more scientifically skeptical and possess a more rational–analytic cognitive style, while those who accept unsubstantiated claims more tend to be more cynical and adopt a more intuitive–experiential cognitive style. These findings suggest that for a fuller understanding of unsubstantiated beliefs, researchers and instructors should also assess specific reasoning skills, relevant knowledge, and dispositions which go beyond what intelligence tests test.

1. Introduction

Why do some people believe implausible claims, such as the QAnon conspiracy theory, that a cabal of liberals is kidnapping and trafficking many thousands of children each year, despite the lack of any credible supporting evidence? Are believers less intelligent than non-believers? Do they lack knowledge of such matters? Are they more gullible or less skeptical than non-believers? Or, more generally, are they failing to think critically?

Understanding the factors contributing to acceptance of unsubstantiated claims is important, not only to the development of theories of intelligence and critical thinking but also because many unsubstantiated beliefs are false, and some are even dangerous. Endorsing them can have a negative impact on an individual and society at large. For example, false beliefs about the COVID-19 pandemic, such as believing that 5G cell towers induced the spread of the COVID-19 virus, led some British citizens to set fire to 5G towers ( Jolley and Paterson 2020 ). Other believers in COVID-19 conspiracy theories endangered their own and their children’s lives when they refused to socially distance and be vaccinated with highly effective vaccines, despite the admonitions of scientific experts ( Bierwiaczonek et al. 2020 ). Further endangering the population at large, those who believe the false conspiracy theory that human-caused global warming is a hoax likely fail to respond adaptively to this serious global threat ( van der Linden 2015 ). Parents who uncritically accept pseudoscientific claims, such as the false belief that facilitated communication is an effective treatment for childhood autism, may forego more effective treatments ( Lilienfeld 2007 ). Moreover, people in various parts of the world still persecute other people who they believe are witches possessing supernatural powers. Likewise, many people still believe in demonic possession, which has been associated with mental disorders ( Nie and Olson 2016 ). Compounding the problems created by these various unsubstantiated beliefs, numerous studies now show that when someone accepts one of these types of unfounded claims, they tend to accept others as well; see Bensley et al. ( 2022 ) for a review.

Studying the factors that contribute to unfounded beliefs is important not only because of their real-world consequences but also because this can facilitate a better understanding of unfounded beliefs and how they are related to critical thinking and intelligence. This article focuses on important ways in which critical thinking (CT) and intelligence differ, especially in terms of how a comprehensive model of CT differs from the view of intelligence as general cognitive ability. I argue that this model of CT more fully accounts for how people can accurately decide whether a claim is unsubstantiated than can views of intelligence emphasizing general cognitive ability. In addition to general cognitive ability, thinking critically about unsubstantiated claims involves deployment of specific reasoning skills, dispositions related to CT, and specific knowledge, which go beyond the contribution of general cognitive ability.

Accordingly, this article begins with an examination of the constructs of critical thinking and intelligence. Then, it discusses theories proposing that understanding thinking in the real world requires going beyond general cognitive ability. Specifically, the focus is on factors related to critical thinking, such as specific reasoning skills, dispositions, metacognition, and relevant knowledge. I review research showing that this alternative multidimensional view of CT can better account for individual differences in the tendency to endorse multiple types of unsubstantiated claims than can general cognitive ability alone.

2. Defining Critical Thinking and Intelligence

Critical thinking is an almost universally valued educational objective in the US and in many other countries, where educators seek to improve it. In contrast, intelligence, although much valued, has often been viewed as a more stable characteristic that is less amenable to improvement through specific short-term interventions, such as traditional instruction or, more recently, practice on computer-implemented training programs. According to Wechsler’s influential definition, intelligence is a person’s “aggregate or global capacity to act purposefully, to think rationally, and to deal effectively with his environment” ( Wechsler 1944, p. 3 ).

Consistent with this definition, intelligence has long been associated with general cognitive or intellectual ability and the potential to learn and reason well. Intelligence (IQ) tests measure general cognitive abilities, such as knowledge of words, memory skills, analogical reasoning, speed of processing, and the ability to solve verbal and spatial problems. General intelligence or “g” is a composite of these abilities statistically derived from various cognitive subtests on IQ tests which are positively intercorrelated. There is considerable overlap between g and the concept of fluid intelligence (Gf) in the prominent Cattell–Horn–Carroll model ( McGrew 2009 ), which refers to “the ability to solve novel problems, the solution of which does not depend on previously acquired skills and knowledge,” and crystallized intelligence (Gc), which refers to experience, existing skills, and general knowledge ( Conway and Kovacs 2018, pp. 50–51 ). Although g or general intelligence is based on a higher order factor, inclusive of fluid and crystallized intelligence, it is technically not the same as general cognitive ability, a commonly used, related term. However, in this article, I use “general cognitive ability” and “cognitive ability” because they are the imprecise terms frequently used in the research reviewed.

Although IQ scores have been found to predict performance in basic real-world domains, such as academic performance and job success ( Gottfredson 2004 ), an enduring question for intelligence researchers has been whether g and intelligence tests predict the ability to adapt well in other real-world situations, which concerns the second part of Wechsler’s definition. So, in addition to the search for the underlying structure of intelligence, researchers have been perennially concerned with how general abilities associated with intelligence can be applied to help a person adapt to real-world situations. The issue is largely a question of how cognitive ability and intelligence can help people solve real-world problems and cope adaptively and succeed in dealing with various environmental demands ( Sternberg 2019 ).

Based on broad conceptual definitions of intelligence and critical thinking, both intelligence and CT should aid adaptive functioning in the real world, presumably because they both involve rational approaches. Their common association with rationality gives each term a positive connotation. However, complicating the definition of each of these is the fact that rationality also continues to have a variety of meanings. In this article, in agreement with Stanovich et al. ( 2018 ), rationality is defined in the normative sense, used in cognitive science, as the distance between a person’s response and some normative standard of optimal behavior. As such, degree of rationality falls on a continuous scale, not a categorical one.

Despite disagreements surrounding the conceptual definitions of intelligence, critical thinking, and rationality, a commonality in these terms is that they are value-laden and normative. In the case of intelligence, people are judged based on norms from standardized intelligence tests, especially in academic settings. Although scores on CT tests are seldom, and could hardly be, used to judge individuals in this way, the normative and value-laden basis of CT is apparent in people’s informal judgments. They often judge others who have made poor decisions to be irrational or to have failed to think critically.

This value-laden aspect of CT is also apparent in formal definitions of CT. Halpern and Dunn ( 2021 ) defined critical thinking as “the use of those cognitive skills or strategies that increase the probability of a desirable outcome. It is used to describe thinking that is purposeful, reasoned, and goal-directed.” The positive conception of CT as helping a person adapt well to one’s environment is clearly implied in “desirable outcome”.

Robert Ennis ( 1987 ) has offered a simpler, yet useful definition of critical thinking that also has normative implications. According to Ennis, “critical thinking is reasonable, reflective thinking focused on deciding what to believe or do” ( Ennis 1987, p. 102 ). This definition implies that CT helps people know what to believe (a goal of epistemic rationality) and how to act (a goal of instrumental rationality). This is conveyed by associating “critical thinking” with the positive terms, “reasonable” and “reflective”. Dictionaries commonly define “reasonable” as “rational”, “logical”, “intelligent”, and “good”, all terms with positive connotations.

For critical thinkers, being reasonable involves using logical rules, standards of evidence, and other criteria that must be met for a product of thinking to be considered good. Critical thinkers use these to evaluate how strongly reasons or evidence supports one claim versus another, drawing conclusions which are supported by the highest quality evidence ( Bensley 2018 ). If no high-quality evidence is available for consideration, it would be unreasonable to draw a strong conclusion. Unfortunately, people’s beliefs are too often based on acceptance of unsubstantiated claims. This is a failure of CT, but is it also a failure of intelligence?

3. Does Critical Thinking “Go Beyond” What Is Meant by Intelligence?

Despite the conceptual overlap in intelligence and CT at a general level, one way that CT can be distinguished from the common view of intelligence as general cognitive ability is in terms of what each can account for. Although intelligence tests, especially measures of general cognitive ability, have reliably predicted academic and job performance, they may not be sufficient to predict other everyday outcomes for which CT measures have made successful predictions and have added to the variance accounted for in performance. For instance, replicating a study by Butler ( 2012 ), Butler et al. ( 2017 ) obtained a negative correlation ( r = −0.33) between scores on the Halpern Critical Thinking Assessment (HCTA) and a measure of 134 negative, real-world outcomes, not expected to befall critical thinkers, such as engaging in unprotected sex or posting a message on social media that the person later regretted. They found that higher HCTA scores not only predicted better life decisions, but also predicted better performance beyond a measure of general cognitive ability. These results suggest that CT can account for real-world outcomes and go beyond general cognitive ability to account for additional variance.
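
Operationally, "adding to the variance accounted for" is usually established with a hierarchical (incremental) regression: enter the cognitive ability measure in a first step, add the CT measure in a second step, and compare the two R² values. The sketch below is a minimal illustration of that logic using simulated data and hypothetical variable names (ability, ct_score, outcomes); it is not the analysis Butler et al. actually reported.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical, simulated data (not the published data): a cognitive-ability
# proxy, a CT score that partly overlaps with ability, and negative life
# outcomes influenced by both.
ability = rng.normal(0.0, 1.0, n)
ct_score = 0.5 * ability + rng.normal(0.0, 1.0, n)
outcomes = -0.3 * ability - 0.4 * ct_score + rng.normal(0.0, 1.0, n)

def r_squared(y, predictors):
    """R^2 from an ordinary least-squares fit of y on the predictors (plus intercept)."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

r2_step1 = r_squared(outcomes, [ability])            # Step 1: cognitive ability only
r2_step2 = r_squared(outcomes, [ability, ct_score])  # Step 2: add the CT measure
print(f"R^2, ability only      : {r2_step1:.3f}")
print(f"R^2, ability + CT      : {r2_step2:.3f}")
print(f"Delta R^2 (incremental): {r2_step2 - r2_step1:.3f}")
```

A positive Delta R² in step 2 is what is meant throughout this review by a CT measure "accounting for additional variance beyond cognitive ability".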

Some theorists maintain that standardized intelligence tests do not capture the variety of abilities that people need to adapt well in the real world. For example, Gardner ( 1999 ) has proposed that additional forms of intelligence are needed, such as spatial, musical, and interpersonal intelligences in addition to linguistic and logical–mathematical intelligences, more typically associated with general cognitive ability and academic success. In other theorizing, Sternberg ( 1988 ) has proposed three additional types of intelligence: analytical, practical, and creative intelligence, to more fully capture the variety of intelligent abilities on which people differ. Critical thinking is considered part of analytical skills, which involve evaluating the quality and applicability of ideas, products, and options ( Sternberg 2022 ). Regarding adaptive intelligence, Sternberg ( 2019 ) has emphasized how adaptive aspects of intelligence are needed to solve real-world problems both at the individual and species levels. According to Sternberg, core components of intelligence have evolved in humans, but intelligence takes different forms in different cultures, with each culture valuing its own skills for adaptation. Thus, the construct of intelligence must go beyond core cognitive ability to encompass the specific abilities needed for adaptive behavior in specific cultures and settings.

Two other theories propose that other components be added to intelligent and rational thinking. Ackerman ( 2022 ) has emphasized the importance of acquiring domain-specific knowledge for engaging in intelligent functioning in the wide variety of tasks found in everyday life. Ackerman has argued that declarative, procedural, and tacit knowledge, as well as non-ability variables, are needed to better predict job performance and performance of other everyday activities. Taking another approach, Halpern and Dunn ( 2021 ) have proposed that critical thinking is essentially the adaptive application of intelligence for solving real-world problems. Elsewhere, Butler and Halpern ( 2019 ) have argued that dispositions such as open-mindedness are another aspect of CT and that domain-specific knowledge and specific CT skills are needed to solve real-world problems.

Examples are readily available for how CT goes beyond what IQ tests test to include specific rules for reasoning and relevant knowledge needed to execute real-world tasks. Take the example of scientific reasoning, which can be viewed as a specialized form of CT. Drawing a well-reasoned inductive conclusion about a theory or analyzing the quality of a research study both require that a thinker possess relevant specialized knowledge related to the question and specific reasoning skills for reasoning about scientific methodology. In contrast, IQ tests are deliberately designed to be nonspecialized in assessing Gc, broadly sampling vocabulary and general knowledge in order to be fair and unbiased ( Stanovich 2009 ). Specialized knowledge and reasoning skills are also needed in non-academic domains. Jurors must possess specialized knowledge to understand expert, forensic testimony and specific reasoning skills to interpret the law and make well-reasoned judgments about a defendant’s guilt or innocence.

Besides lacking specific reasoning skills and domain-relevant knowledge, people may fail to think critically because they are not disposed to use their reasoning skills to examine such claims and want to preserve their favored beliefs. Critical thinking dispositions are attitudes or traits that make it more likely that a person will think critically. Theorists have proposed numerous CT dispositions (e.g., Bensley 2018 ; Butler and Halpern 2019 ; Dwyer 2017 ; Ennis 1987 ). Some commonly identified CT dispositions especially relevant to this discussion are open-mindedness, skepticism, intellectual engagement, and the tendency to take a reflective, rational–analytic approach. Critical thinking dispositions are clearly value-laden and prescriptive. A good thinker should be open-minded, skeptical, reflective, intellectually engaged, and value a rational–analytic approach to inquiry. Conversely, corresponding negative dispositions, such as “close-mindedness” and “gullibility”, could obstruct CT.

Without the appropriate disposition, individuals will not use their reasoning skills to think critically about questions. For example, the brilliant mystery writer, Sir Arthur Conan Doyle, who was trained as a physician and created the hyper-reasonable detective Sherlock Holmes, was not disposed to think critically about some unsubstantiated claims. Conan Doyle was no doubt highly intelligent in cognitive ability terms, but he was not sufficiently skeptical (disposed to think critically) about spiritualism. He believed that he was talking to his dearly departed son through a medium, despite the warnings of his magician friend, Harry Houdini, who told him that mediums used trickery in their seances. Perhaps influenced by his Irish father’s belief in the “wee folk”, Conan Doyle also believed that fairies inhabited the English countryside, based on children’s photos, despite the advice of experts who said the photos could be faked. Nevertheless, he was skeptical of a new theory of tuberculosis proposed by Koch when he reported on it, despite his wife suffering from the disease. So, in professional capacities, Conan Doyle used his CT skills, but in certain other domains in which he was motivated to accept unsubstantiated claims, he failed to think critically, insufficiently disposed to skeptically challenge certain implausible claims.

This example makes two important points. Conan Doyle’s superior intelligence was not enough for him to reject implausible claims about the world. In general, motivated reasoning can lead people, even those considered highly intelligent, to accept claims with no good evidentiary support. The second important point is that we would not be able to adequately explain cases like this one, considering only the person’s intelligence or even their reasoning skills, without also considering the person’s disposition. General cognitive ability alone is not sufficient, and CT dispositions should also be considered.

Supporting this conclusion, Stanovich and West ( 1997 ) examined the influence of dispositions beyond the contribution of cognitive ability on a CT task. They gave college students an argument evaluation test in which participants first rated their agreement with several claims about real social and political issues made by a fictitious person. Then, they gave them evidence against each claim and finally asked them to rate the quality of a counterargument made by the same fictitious person. Participants’ ratings of the counterarguments were compared to the median ratings of expert judges on the quality of the rebuttals. Stanovich and West also administered a new measure of rational disposition called the Actively Open-minded Thinking (AOT) scale and the SAT as a proxy for cognitive ability. The AOT was a composite of items from several other scales that would be expected to measure CT disposition. They found that both SAT and AOT scores were significant predictors of higher argument analysis scores. Even after partialing out cognitive ability, actively open-minded thinking was significant. These results suggest that general cognitive ability alone was not sufficient to account for thinking critically about real-world issues and that CT disposition was needed to go beyond it.
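
What "partialing out cognitive ability" means, in its standard first-order form, is removing from the disposition–performance correlation the variance that a third variable shares with both. As an illustrative reading of this analysis (the study's exact procedure may have differed), let x be argument evaluation scores, y the AOT scores, and z the SAT:

$$ r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}} $$

A partial correlation that remains reliably different from zero indicates that the disposition measure relates to argument evaluation even after its overlap with cognitive ability has been removed.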

Further examining the roles of CT dispositions and cognitive ability on reasoning, Stanovich and West ( 2008 ) studied myside bias, a bias in reasoning closely related to one-sided thinking and confirmation bias. A critical thinker would be expected to not show myside bias and instead fairly evaluate evidence on all sides of a question. Stanovich and West ( 2007 ) found that college students often showed myside bias when asked their opinions about real-world policy issues, such as those concerning the health risks of smoking and drinking alcohol. For example, compared to non-smokers, smokers judged the health risks of smoking to be lower. When they divided participants into higher versus lower cognitive ability groups based on SAT scores, the two groups showed little difference on myside bias. Moreover, on the hazards of drinking issue, participants who drank less had higher scores on the CT disposition measure.

Other research supports the need for both reasoning ability and CT disposition in predicting outcomes in the real world. Ren et al. ( 2020 ) found that CT disposition, as measured by a Chinese critical thinking disposition inventory, and a CT skill measure together contributed a significant amount of the variance in predicting academic performance beyond the contribution of cognitive ability alone, as measured by a test of fluid intelligence. Further supporting the claim that CT requires both cognitive ability and CT disposition, Ku and Ho ( 2010 ) found that a CT disposition measure significantly predicted scores on a CT test beyond the significant contribution of verbal intelligence in high school and college students from Hong Kong.

The contribution of dispositions to thinking is related to another way that CT goes beyond the application of general cognitive ability, i.e., by way of the motivation for reasoning. If all reasoning is motivated ( Kunda 1990 ), then CT is motivated, too, as is implicit within the Halpern and Dunn ( 2021 ) and Ennis ( 1987 ) definitions. Critical thinking is motivated in the sense of being purposeful and directed towards the goal of arriving at an accurate conclusion. For instance, corresponding to the pursuit of accurate reasoning, the CT disposition of “truth-seeking” guides a person towards the CT goal of arriving at an accurate conclusion.

Also, according to Kunda ( 1990 ), a second type of motivated reasoning can lead to faulty conclusions, often by directing a person towards the goal of maintaining favored beliefs and preconceptions, as in illusory correlation, belief perseverance, and confirmation bias. Corresponding to this second type, negative dispositions, such as close-mindedness and self-serving motives, can incline thinkers towards faulty conclusions. This is especially relevant in the present discussion because poorer reasoning, thinking errors, and the inappropriate use of heuristics are related to the endorsement of unsubstantiated claims, all of which are CT failures. The term “thinking errors” is a generic term referring to logical fallacies, informal reasoning fallacies, argumentation errors, and inappropriate uses of cognitive heuristics ( Bensley 2018 ). Heuristics are cognitive shortcuts, commonly used to simplify judgment tasks and reduce mental effort. Yet, when used inappropriately, heuristics often result in biased judgments.

Stanovich ( 2009 ) has argued that IQ tests do not test people’s use of heuristics, but reliance on heuristics has been found to be negatively correlated with CT performance ( West et al. 2008 ). In the same study, West et al. found that college students’ cognitive ability, as measured by performance on the SAT, was not correlated with thinking biases associated with use of heuristics. Although Stanovich and West ( 2008 ) found that susceptibility to biases, such as the conjunction fallacy, framing effect, base-rate neglect, affect bias, and myside bias were all uncorrelated with cognitive ability (using the SAT as a proxy), other types of thinking errors were correlated with the SAT.

Likewise, two types of knowledge are related to the two forms of motivated reasoning. For instance, inaccurate knowledge, such as misconceptions, can derail reasoning from moving towards a correct conclusion, as in when a person reasons from false premises. In contrast, reasoning from accurate knowledge is more likely to produce an accurate conclusion. Taking into account inaccurate knowledge and thinking errors is important to understanding the endorsement of unsubstantiated claims because these are also related to negative dispositions, such as close-mindedness and cynicism, none of which are measured by intelligence tests.

Critical thinking questions are often situated in real-world examples or in simulations of them that are designed to detect thinking errors and bias. As described in Halpern and Butler ( 2018 ), an item like one on the “Halpern Critical Thinking Assessment” (HCTA) provides respondents with a mock newspaper story about research showing that first-graders who attended preschool were better able to learn how to read. Then the question asks if preschool should be made mandatory. A correct response to this item requires recognizing that correlation does not imply causation, that is, avoiding a common reasoning error people make in thinking about research implications in everyday life (the sketch following this paragraph makes the point concrete). Another CT skills test, “Analyzing Psychological Statements” (APS), assesses the ability to recognize thinking errors and apply argumentation skills and psychology to evaluate psychology-related examples and simulations of real-life situations ( Bensley 2021 ). For instance, besides identifying thinking errors in brief samples of thinking, questions ask respondents to distinguish arguments from non-arguments, find assumptions in arguments, evaluate kinds of evidence, and draw a conclusion from a brief psychological argument. An important implication of the studies just reviewed is that efforts to understand CT can be further informed by assessing thinking errors and biases, which, as the next discussion shows, are related to individual differences in thinking dispositions and cognitive style.
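
The following minimal simulation (illustrative only; the variable names and effect sizes are invented) shows how a lurking confounder can produce a sizable preschool–reading correlation even when preschool has no causal effect at all:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# A latent confounder (e.g., family resources) raises both the chance of
# attending preschool and later reading scores.  Preschool itself has NO
# causal effect on reading in this simulation.
family_resources = rng.normal(0.0, 1.0, n)
preschool = (family_resources + rng.normal(0.0, 1.0, n) > 0).astype(float)
reading = 0.8 * family_resources + rng.normal(0.0, 1.0, n)  # no preschool term

r = np.corrcoef(preschool, reading)[0, 1]
print(f"Observed preschool-reading correlation: r = {r:.2f}")
# The correlation comes out clearly positive, so inferring "make preschool
# mandatory" from correlational data alone commits exactly the error the
# HCTA item is designed to detect.
```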

4. Dual-Process Theory Measures and Unsubstantiated Beliefs

Dual-process theory (DPT) and measures associated with it have been widely used in the study of the endorsement of unsubstantiated beliefs, especially as they relate to cognitive style. According to a cognitive style version of DPT, people have two modes of processing: a fast, intuitive–experiential (I-E) style and a slower, reflective, rational–analytic (R-A) style. The intuitive cognitive style is associated with reliance on hunches, feelings, personal experience, and cognitive heuristics which simplify processing, while the R-A cognitive style is associated with more reflective, elaborate, and effortful processing ( Bensley et al. 2022 ; Epstein 2008 ). As such, the rational–analytic cognitive style is consistent with CT dispositions, such as those promoting the effortful analysis of evidence, objective truth, and logical consistency. In fact, CT is sometimes referred to as “critical-analytic” thinking ( Byrnes and Dunbar 2014 ) and has been associated with analytical intelligence ( Sternberg 1988 ) and with rational thinking, as discussed before.

People use both modes of processing, but they show individual differences in which mode they tend to rely upon, although the intuitive–experiential mode is the default ( Bensley et al. 2022 ; Morgan 2016 ; Pacini and Epstein 1999 ), and they accept unsubstantiated claims differentially based on their predominant cognitive style ( Bensley et al. 2022 ; Epstein 2008 ). Specifically, individuals who rely more on an I-E cognitive style tend to endorse unsubstantiated claims more strongly, while individuals who rely more on a R-A cognitive style tend to endorse those claims less. Note, however, that other theorists view the two processes and cognitive styles somewhat differently (e.g., Kahneman 2011 ; Stanovich et al. 2018 ).

Researchers have often assessed the contribution of these two cognitive styles to endorsement of unsubstantiated claims, using variants of three measures: the Cognitive Reflection Test (CRT) of Frederick ( 2005 ), the Rational–Experiential Inventory of Epstein and his colleagues ( Pacini and Epstein 1999 ), and the related Need for Cognition scale of Cacioppo and Petty ( 1982 ). The CRT is a performance-based test which asks participants to solve problems that appear to require simple mathematical calculations, but which actually require more reflection. People typically do poorly on the CRT, which is thought to indicate reliance on an intuitive cognitive style, while better performance is thought to indicate reliance on the slower, more deliberate, and reflective cognitive style. The positive correlation of the CRT with numeracy scores suggests it also has a cognitive skill component ( Patel et al. 2019 ). The Rational–Experiential Inventory (REI) of Pacini and Epstein ( 1999 ) contains one scale designed to measure an intuitive–experiential cognitive style and a second scale intended to measure a rational–analytic (R-A) style. The R-A scale was adapted from the Need for Cognition (NFC) scale of Cacioppo and Petty ( 1982 ), another scale associated with rational–analytic thinking and expected to be negatively correlated with unsubstantiated beliefs. The NFC was found to be related to open-mindedness and intellectual engagement, two CT dispositions ( Cacioppo et al. 1996 ).
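
The flavor of the CRT items mentioned above is captured by the test’s best-known problem, from Frederick ( 2005 ): a bat and a ball cost $1.10 in total, and the bat costs $1.00 more than the ball; how much does the ball cost? The intuitive answer of 10 cents is wrong, as a moment of reflection shows. Letting b be the price of the ball:

$$ b + (b + 1.00) = 1.10 \;\Rightarrow\; 2b = 0.10 \;\Rightarrow\; b = 0.05 $$

The ball costs 5 cents and the bat $1.05, so a correct answer requires overriding the initial intuitive response.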

The cognitive styles associated with DPT also relate to CT dispositions. Thinking critically requires that individuals be disposed to use their reasoning skills to reject unsubstantiated claims ( Bensley 2018 ) and that they be inclined to take a rational–analytic approach rather than relying on their intuitions and feelings. For instance, Bensley et al. ( 2014 ) found that students who endorsed more psychological misconceptions adopted a more intuitive cognitive style, were less disposed to take a rational–scientific approach to psychology, and scored lower on a psychological critical thinking skills test. Further supporting this connection, West et al. ( 2008 ) found that participants who tended to use cognitive heuristics more, thought to be related to intuitive processing and bias, scored lower on a critical thinking measure. As the Bensley et al. ( 2014 ) results suggest, in addition to assessing reasoning skills and dispositions, comprehensive CT assessment research should assess knowledge and unsubstantiated beliefs because these are related to failures of critical thinking.

5. Assessing Critical Thinking and Unsubstantiated Beliefs

Assessing endorsement of unsubstantiated claims provides another way to assess CT outcomes related to everyday thinking, which goes beyond what intelligence tests test ( Bensley and Lilienfeld 2020 ). From the perspective of the multi-dimensional model of CT, endorsement of unsubstantiated claims could result from deficiencies in a person’s CT reasoning skills, a lack of relevant knowledge, or the engagement of inappropriate dispositions. Suppose an individual endorses an unsubstantiated claim, such as the conspiracy theory that human-caused global warming is a hoax. The person may lack the specific reasoning skills needed to critically evaluate the conspiracy; Lantian et al. ( 2020 ) found that scores on a CT skills test were negatively correlated with conspiracy theory beliefs. The person may also lack relevant scientific knowledge, such as knowing that each year humans pump about 40 billion metric tons of carbon dioxide into the atmosphere and that carbon dioxide is a greenhouse gas which traps heat in the atmosphere. Or the person may not be sufficiently scientifically skeptical, or may be too cynical or mistrustful of scientists or governmental officials.

Although endorsing unsubstantiated beliefs is clearly a failure of CT, problems arise in deciding which ones are unsubstantiated, especially when considering conspiracy theories. Typically, the claims which critical thinkers should reject as unsubstantiated are those which are not supported by objective evidence. But of the many conspiracies proposed, few are vigorously examined. Moreover, some conspiracy theories which authorities might initially deny turn out to be real, such as the MK-Ultra theory that the CIA was secretly conducting mind-control research on American citizens.

A way out of this quagmire is to define unsubstantiated beliefs on a continuum which depends on the quality of evidence. This has led to the definition of unsubstantiated claims as assertions which have not been supported by high-quality evidence ( Bensley 2023 ). Those which are supported have the kind of evidentiary support that critical thinkers are expected to value in drawing reasonable conclusions. Instead of insisting that a claim must be demonstrably false to be rejected, we adopt a more tentative acceptance or rejection of claims, based on how much good evidence supports them. Many claims are unsubstantiated because they have not yet been carefully examined and so totally lack support or they may be supported only by low quality evidence such as personal experience, anecdotes, or non-scientific authority. Other claims are more clearly unsubstantiated because they contradict the findings of high-quality research. A critical thinker should be highly skeptical of these.

Psychological misconceptions are one type of claim that can be more clearly unsubstantiated: they are commonsense psychological claims (folk theories) about the mind, brain, and behavior that are contradicted by the bulk of high-quality scientific research. Bensley et al. ( 2014 ) developed the Test of Psychological Knowledge and Misconceptions (TOPKAM), a 40-item, forced-choice measure with each item posing a statement of a psychological misconception and the other response option stating the evidence-based alternative. They found that higher scores on the APS, the argument analysis test applying psychological concepts to analyze real-world examples, were associated with more correct answers on the TOPKAM. Other studies have found positive correlations between CT skills tests and other measures of psychological misconceptions ( McCutcheon et al. 1992 ; Kowalski and Taylor 2004 ). Bensley et al. ( 2014 ) also found that higher correct TOPKAM scores were positively correlated with scores on the Inventory of Thinking Dispositions in Psychology (ITDP) of Bensley ( 2021 ), a measure of the disposition to take a rational and scientific approach to psychology, but were negatively correlated with an intuitive cognitive style.
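
As a concrete illustration of how forced-choice measures of this kind yield the scores entered into such correlations, the sketch below simulates scoring a 40-item test and correlating the totals with a separate skills measure. All data, names, and effect sizes are hypothetical; this is not the TOPKAM or its items.

```python
import numpy as np

rng = np.random.default_rng(2)
n_students, n_items = 150, 40

# Hypothetical simulation: each student answers 40 forced-choice items,
# choosing the evidence-based option (1) or the misconception (0) with a
# probability reflecting their underlying knowledge.
knowledge = rng.uniform(0.3, 0.9, n_students)                 # latent accuracy per student
responses = rng.random((n_students, n_items)) < knowledge[:, None]
test_score = responses.sum(axis=1)                            # higher = fewer misconceptions endorsed

# A second, imperfectly related measure (e.g., an argument-analysis skills test).
skills_score = 25 * knowledge + rng.normal(0.0, 3.0, n_students)

r = np.corrcoef(test_score, skills_score)[0, 1]
print(f"Correlation between misconception-test and skills scores: r = {r:.2f}")
```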

Bensley et al. ( 2021 ) conducted a multidimensional study, assessing beginner psychology students starting a CT course on their endorsement of psychological misconceptions, recognition of thinking errors, CT dispositions, and metacognition, before and after CT instruction. Two classes received explicit instruction involving considerable practice in argument analysis and scientific reasoning skills, with one class receiving CT instruction focused more on recognizing psychological misconceptions and a second class focused more on recognizing various thinking errors. Bensley et al. assessed both classes before and after instruction on the TOPKAM and on the Test of Thinking Errors (TOTE), a test of the ability to recognize, in real-world examples, 17 different types of thinking errors, such as confirmation bias, inappropriate use of the availability and representativeness heuristics, reasoning from ignorance/possibility, gambler’s fallacy, and hasty generalization ( Bensley et al. 2021 ). Correct TOPKAM and TOTE scores were positively correlated, and after CT instruction both were positively correlated with the APS, the CT test of argument analysis skills.

Bensley et al. found that after explicit instruction of CT skills, students improved significantly on both the TOPKAM and the TOTE, but those focusing on recognizing misconceptions improved the most. Also, the students who improved the most on the TOTE scored higher on the REI rational–analytic scale and on the ITDP. The students receiving explicit CT skill instruction in recognizing misconceptions also significantly improved the accuracy of their metacognitive monitoring in estimating their TOPKAM scores after instruction.

Given that before instruction the classes did not differ in GPA or on the SAT, a proxy for general cognitive ability, CT instruction provided a good account of the improvement in recognition of thinking errors and misconceptions without recourse to intelligence. However, SAT scores were positively correlated with both TOTE scores and APS scores, suggesting that cognitive ability contributed to CT skill performance. These results replicated the earlier findings of Bensley and Spero ( 2014 ) showing that explicit CT instruction improved performance on both CT skills tests and metacognitive monitoring accuracy while controlling for SAT, which was positively correlated with CT skills test performance.

Taken together, these findings suggest that cognitive ability contributes to performance on CT tasks but that CT instruction goes beyond it to further improve performance. As the results of Bensley et al. ( 2021 ) show, and as discussed next, thinking errors and bias from heuristics are CT failures that should also be assessed because they are related to endorsement of unsubstantiated beliefs and cognitive style.

6. Dual-Processing Theory and Research on Unsubstantiated Beliefs

Consistent with DPT, numerous other studies have obtained significant positive correlations between intuitive cognitive style and paranormal belief, often using the REI intuitive–experiential scale and the Revised Paranormal Belief Scale (RPBS) of Tobacyk ( 2004 ) (e.g., Genovese 2005 ; Irwin and Young 2002 ; Lindeman and Aarnio 2006 ; Pennycook et al. 2015 ; Rogers et al. 2018 ; Saher and Lindeman 2005 ). Studies have also found positive correlations between superstitious belief and intuitive cognitive style (e.g., Lindeman and Aarnio 2006 ; Maqsood et al. 2018 ). REI intuitive–experiential thinking style was also positively correlated with belief in complementary and alternative medicine ( Lindeman 2011 ), conspiracy theory belief ( Alper et al. 2020 ), and with endorsement of psychological misconceptions ( Bensley et al. 2014 ; Bensley et al. 2022 ).

Additional evidence for DPT has been found when REI R-A and NFC scores were negatively correlated with scores on measures of unsubstantiated beliefs, but studies correlating them with measures of paranormal belief and conspiracy theory belief have shown mixed results. Supporting a relationship, REI rational–analytic and NFC scores significantly and negatively predicted paranormal belief ( Lobato et al. 2014 ; Pennycook et al. 2012 ). Other studies have also obtained a negative correlation between NFC and paranormal belief ( Lindeman and Aarnio 2006 ; Rogers et al. 2018 ; Stahl and van Prooijen 2018 ), but both Genovese ( 2005 ) and Pennycook et al. ( 2015 ) found that NFC was not significantly correlated with paranormal belief. Swami et al. ( 2014 ) found that although REI R-A scores were negatively correlated with conspiracy theory belief, NFC scores were not.

Researchers often refer to people who are doubtful of paranormal and other unfounded claims as “skeptics” and so have tested whether measures related to skepticism are associated with less endorsement of unsubstantiated claims. They typically view skepticism as a stance towards unsubstantiated claims taken by rational people who reject them, (e.g., Lindeman and Aarnio 2006 ; Stahl and van Prooijen 2018 ), rather than as a disposition inclining a person to think critically about unsubstantiated beliefs ( Bensley 2018 ).

Fasce and Pico ( 2019 ) conducted one of the few studies using a measure related to skeptical disposition, the Critical Thinking Disposition Scale (CTDS) of Sosu ( 2013 ), in relation to endorsement of unsubstantiated claims. They found that scores on the CTDS were negatively correlated with scores on the RPBS but not significantly correlated with either a measure of pseudoscience or of conspiracy theory belief. However, the CRT was negatively correlated with both the RPBS and the pseudoscience measure. Because Fasce and Pico ( 2019 ) did not examine correlations with the Reflective Skepticism subscale of the CTDS, its contribution apart from the full-scale CTDS could not be determined.

To more directly test skepticism as a disposition, we recently assessed college students on how well three new measures predicted endorsement of psychological misconceptions, paranormal claims, and conspiracy theories ( Bensley et al. 2022 ). The dispositional measures included a measure of general skeptical attitude; a second measure, the Scientific Skepticism Scale (SSS), which focused more on waiting to accept claims until high-quality scientific evidence supported them; and a third measure, the Cynicism Scale (CS), which focused on doubting the sincerity of the motives of scientists and people in general. We found that although the general skepticism scale did not predict any of the unsubstantiated belief measures, SSS scores were a significant negative predictor of both paranormal belief and conspiracy theory belief. REI R-A scores were a less consistent negative predictor, while REI I-E scores were more consistent positive predictors, and surprisingly CS scores were the most consistent positive predictors of the unsubstantiated beliefs.

Researchers commonly assume that people who accept implausible, unsubstantiated claims are gullible or not sufficiently skeptical. For instance, van Prooijen ( 2019 ) has argued that conspiracy theory believers are more gullible (less skeptical) than non-believers and tend to accept unsubstantiated claims more than less gullible people. van Prooijen ( 2019 ) reviewed several studies supporting the claim that people who are more gullible tend to endorse conspiracy theories more. However, he did not report any studies in which a gullible disposition was directly measured.

Recently, we directly tested the gullibility hypothesis in relation to scientific skepticism ( Bensley et al. 2023 ) using the Gullibility Scale of Teunisse et al. ( 2019 ), on which people skeptical of the paranormal had been shown to have lower scores. We found that Gullibility Scale and Cynicism Scale scores were positively correlated, and both were significant positive predictors of unsubstantiated beliefs in general, consistent with an intuitive–experiential cognitive style. In contrast, we found that scores on the Cognitive Reflection Test, the Scientific Skepticism Scale, and the REI rational–analytic scale were all positively intercorrelated and were significant negative predictors of unsubstantiated beliefs in general, consistent with a rational–analytic/reflective cognitive style. Scientific skepticism scores negatively predicted general endorsement of unsubstantiated claims beyond the REI R-A scale, but neither the CTDS nor the CTDS Reflective Skepticism subscale was a significant predictor. These results replicated findings from the Bensley et al. ( 2022 ) study and supported an elaborated dual-process model of unsubstantiated belief. The SSS was not only a substantial negative predictor but was also negatively correlated with the Gullibility Scale, as expected.

These results suggest that both CT-related dispositions and CT skills are related to endorsement of unsubstantiated beliefs. However, a measure of general cognitive ability or intelligence must be examined along with measures of CT and unsubstantiated beliefs to determine if CT goes beyond intelligence to predict unsubstantiated beliefs. In one of the few studies that also included a measure of cognitive ability, Stahl and van Prooijen ( 2018 ) found that dispositional characteristics helped account for acceptance of conspiracies and paranormal belief beyond cognitive ability. Using the Importance of Rationality Scale (IRS), a rational–analytic scale designed to measure skepticism towards unsubstantiated beliefs, they found that the IRS was negatively correlated with paranormal belief and belief in conspiracy theories. In separate hierarchical regressions, cognitive ability was the strongest negative predictor of both paranormal belief and conspiracy belief, while IRS scores in combination with cognitive ability negatively predicted endorsement of paranormal belief but did not significantly predict conspiracy theory belief. These results provided partial support for the claim that a measure of rational–analytic cognitive style related to skeptical disposition added to the variance accounted for beyond cognitive ability in negatively predicting unsubstantiated belief.

In another study that included a measure of cognitive ability, Cavojova et al. ( 2019 ) examined how CT-related dispositions and the Scientific Reasoning Scale (SRS) were related to a measure of paranormal, pseudoscientific, and conspiracy theory beliefs. The SRS of Drummond and Fischhoff ( 2017 ) likely measures CT skill in that it measures the ability to evaluate scientific research and evidence. As expected, the unsubstantiated belief measure was negatively correlated with the SRS and a cognitive ability measure, similar to Raven’s Progressive Matrices. Unsubstantiated beliefs were positively correlated with dogmatism (the opposite of open-mindedness) but not with REI rational–analytic cognitive style. The SRS was a significant negative predictor of both unsubstantiated belief and susceptibility to bias beyond the contribution of cognitive ability, but neither dogmatism nor analytic thinking were significant predictors. Nevertheless, this study provides some support that a measure related to CT reasoning skill accounts for variance in unsubstantiated belief beyond cognitive ability.

The failure of this study to show a correlation between rational–analytic cognitive style and unsubstantiated beliefs, when some other studies have found significant correlations with it and related measures, has implications for the multidimensional assessment of unsubstantiated beliefs. One implication is that the REI rational–analytic scale may not be a strong predictor of unsubstantiated beliefs. In fact, we have recently found that the Scientific Skepticism Scale was a stronger negative predictor ( Bensley et al. 2022 ; Bensley et al. 2023 ), which also suggests that other measures related to rational–analytic thinking styles should be examined. This could help triangulate the contribution of self-report cognitive style measures to endorsement of unsubstantiated claims, recognizing that the use of self-report measures has a checkered history in psychological research. A second implication is that once again, measures of critical thinking skill and cognitive ability were negative predictors of unsubstantiated belief and so they, too, should be included in future assessments of unsubstantiated beliefs.

7. Discussion

This review provided different lines of evidence supporting the claim that CT goes beyond cognitive ability in accounting for certain real-world outcomes. Participants who thought more critically reported fewer of the negative everyday outcomes not expected to befall critical thinkers. People who endorsed unsubstantiated claims less showed better CT skills, more accurate domain-specific knowledge, less susceptibility to thinking errors and bias, and were more disposed to think critically. More specifically, they tended to be more scientifically skeptical and to adopt a more rational–analytic cognitive style. In contrast, those who endorsed them more tended to be more cynical and to adopt an intuitive–experiential cognitive style. These characteristics go beyond what standardized intelligence tests test. In some studies, the CT measures accounted for additional variance beyond the variance contributed by general cognitive ability.

That is not to say that measures of general cognitive ability are not useful. As noted by Gottfredson ( 2004 ), “g” is a highly successful predictor of academic and job performance. More is known about g and Gf than about many other psychological constructs. On average, g is closely related to Gf, which is highly correlated with working memory ( r = 0.70) and can be as high as r = 0.77 ( r² ≈ 0.60) based on a correlated two-factor model ( Gignac 2014 ). Because modern working memory theory is, itself, a powerful theory ( Chai et al. 2018 ), this lends construct validity to the fluid intelligence construct. Although cognitive scientists have clearly made progress in understanding the executive processes underlying intelligence, they have not yet identified the specific cognitive components of intelligence ( Sternberg 2022 ). Moreover, theorists have acknowledged that intelligence must also include components beyond g, including domain-specific knowledge ( Ackerman 2022 ; Conway and Kovacs 2018 ), which are not yet clearly understood.
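
The “closer to 60%” figure cited above is simply the squared correlation, which gives the proportion of shared variance:

$$ r = 0.77 \;\Rightarrow\; r^{2} = 0.77^{2} \approx 0.59 $$

That is, under the correlated two-factor model roughly 60% of the variance in fluid intelligence overlaps with working memory capacity.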

This review also pointed to limitations in the research that should be addressed. So far, few studies of unsubstantiated beliefs have included measures of intelligence, and those that have often relied on proxies for intelligence test scores, such as SAT scores. Future studies, besides using more and better measures of intelligence, could benefit from inclusion of more specifically focused measures, such as measures of Gf and Gc. Also, more research should be carried out to develop additional high-quality measures of CT, including ones that assess specific reasoning skills and knowledge relevant to thinking about a subject, which could help resolve perennial questions about the domain-general versus domain-specific nature of intelligence and CT. Overall, the results of this review encourage taking a multidimensional approach to investigating the complex constructs of intelligence, CT, and unsubstantiated belief. Supporting these recommendations were results of studies in which the improvement accrued from explicit CT skill instruction could be more fully understood when CT skills, relevant knowledge, CT dispositions, metacognitive monitoring accuracy, and a proxy for intelligence were all assessed.

8. Conclusions

Critical thinking, broadly conceived, offers ways to understand real-world outcomes of thinking beyond what general cognitive ability can provide and intelligence tests test. A multi-dimensional view of CT which includes specific reasoning and metacognitive skills, CT dispositions, and relevant knowledge can add to our understanding of why some people endorse unsubstantiated claims more than others do, going beyond what intelligence tests test. Although general cognitive ability and domain-general knowledge often contribute to performance on CT tasks, thinking critically about real-world questions also involves applying rules, criteria, and knowledge which are specific to the question under consideration, as well as the appropriate dispositions and cognitive styles for deploying these.

Despite the advantages of taking this multidimensional approach to CT in helping us to more fully understand everyday thinking and irrationality, it presents challenges for researchers and instructors. It implies the need to assess and instruct multidimensionally, including not only measures of reasoning skills but also addressing thinking errors and biases, dispositions, the knowledge relevant to a task, and the accuracy of metacognitive judgments. As noted by Dwyer ( 2023 ), adopting a more complex conceptualization of CT beyond just skills is needed, but it presents challenges for those seeking to improve students’ CT. Nevertheless, the research reviewed suggests that taking this multidimensional approach to CT can enhance our understanding of the endorsement of unsubstantiated claims beyond what standardized intelligence tests contribute. More research is needed to resolve remaining controversies and to develop evidence-based applications of the findings.

Funding Statement

This research received no external funding.

Institutional Review Board Statement

This research involved no new testing of participants and hence did not require Institutional Review Board approval.

Informed Consent Statement

This research involved no new testing of participants and hence did not require an Informed Consent Statement.

Data Availability Statement

This research did not involve collection of original data, and hence there are no new data to make available.

Conflicts of Interest

The author declares no conflict of interest.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

  • Ackerman Phillip L. Intelligence … Moving beyond the lowest common denominator. American Psychologist. 2022; 78 :283–97. doi: 10.1037/amp0001057. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Alper Sinan, Bayrak Fatih, Yilmaz Onurcan. Psychological correlates of COVID-19 conspiracy beliefs and preventive measures: Evidence from Turkey. Current Psychology. 2020; 40 :5708–17. doi: 10.1007/s12144-020-00903-0. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Bensley D. Alan. Critical Thinking in Psychology and Everyday Life: A Guide to Effective Thinking. Worth Publishers; New York: 2018. [ Google Scholar ]
  • Bensley D. Alan. The Critical Thinking in Psychology Assessment Battery (CTPAB) and Test Guide. 2021. Unpublished manuscript. Frostburg State University, Frostburg, MD, USA.
  • Bensley D. Alan. “I can’t believe you believe that”: Identifying unsubstantiated claims. Skeptical Inquirer. 2023; 47 :53–56. [ Google Scholar ]
  • Bensley D. Alan, Spero Rachel A. Improving critical thinking skills and metacognitive monitoring through direct infusion. Thinking Skills and Creativity. 2014; 12 :55–68. doi: 10.1016/j.tsc.2014.02.001. [ CrossRef ] [ Google Scholar ]
  • Bensley D. Alan, Lilienfeld Scott O. Assessment of Unsubstantiated Beliefs. Scholarship of Teaching and Learning in Psychology. 2020; 6 :198–211. doi: 10.1037/stl0000218. [ CrossRef ] [ Google Scholar ]
  • Bensley D. Alan, Masciocchi Christopher M., Rowan Krystal A. A comprehensive assessment of explicit critical thinking instruction on recognition of thinking errors and psychological misconceptions. Scholarship of Teaching and Learning in Psychology. 2021; 7 :107. doi: 10.1037/stl0000188. [ CrossRef ] [ Google Scholar ]
  • Bensley D. Alan, Watkins Cody, Lilienfeld Scott O., Masciocchi Christopher, Murtagh Michael, Rowan Krystal. Skepticism, cynicism, and cognitive style predictors of the generality of unsubstantiated belief. Applied Cognitive Psychology. 2022; 36 :83–99. doi: 10.1002/acp.3900. [ CrossRef ] [ Google Scholar ]
  • Bensley D. Alan, Rodrigo Maria, Bravo Maria, Jocoy Kathleen. Dual-Process Theory and Cognitive Style Predictors of the General Endorsement of Unsubstantiated Claims. 2023. Unpublished manuscript. Frostburg State University, Frostburg, MD, USA.
  • Bensley D. Alan, Lilienfeld Scott O., Powell Lauren. A new measure of psychological. misconceptions: Relations with academic background, critical thinking, and acceptance of paranormal and pseudoscientific claims. Learning and Individual Differences. 2014; 36 :9–18. doi: 10.1016/j.lindif.2014.07.009. [ CrossRef ] [ Google Scholar ]
  • Bierwiaczonek Kinga, Kunst Jonas R., Pich Olivia. Belief in COVID-19 conspiracy theories reduces social distancing over time. Applied Psychology Health and Well-Being. 2020; 12 :1270–85. doi: 10.1111/aphw.12223. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Butler Heather A. Halpern critical thinking assessment predicts real-world outcomes of critical thinking. Applied Cognitive Psychology. 2012; 26 :721–29. doi: 10.1002/acp.2851. [ CrossRef ] [ Google Scholar ]
  • Butler Heather A., Halpern Diane F. Is critical thinking a better model of intelligence? In: Sternberg Robert J., editor. The Nature of Intelligence. Cambridge University Press; Cambridge: 2019. pp. 183–96. [ Google Scholar ]
  • Butler Heather A., Pentoney Christopher, Bong Maebelle P. Predicting real-world outcomes: Critical thinking ability is a better predictor of life decisions than intelligence. Thinking Skills and Creativity. 2017; 25 :38–46. doi: 10.1016/j.tsc.2017.06.005. [ CrossRef ] [ Google Scholar ]
  • Byrnes James P., Dunbar Kevin N. The nature and development of critical-analytic thinking. Educational Psychology Review. 2014; 26 :477–93. doi: 10.1007/s10648-014-9284-0. [ CrossRef ] [ Google Scholar ]
  • Cacioppo John T., Petty Richard E. The need for cognition. Journal of Personality and Social Psychology. 1982; 42 :116–31. doi: 10.1037/0022-3514.42.1.116. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Cacioppo John T., Petty Richard E., Feinstein Jeffrey A., Jarvis W. Blair G. Dispositional differences in cognitive motivation: The life and times of individuals varying in need for cognition. Psychological Bulletin. 1996; 119 :197. doi: 10.1037/0033-2909.119.2.197. [ CrossRef ] [ Google Scholar ]
  • Cavojova Vladimira, Srol Jakub, Jurkovic Marek. Why we should think like scientists? Scientific reasoning and susceptibility to epistemically suspect beliefs and cognitive biases. Applied Cognitive Psychology. 2019; 34 :85–95. doi: 10.1002/acp.3595. [ CrossRef ] [ Google Scholar ]
  • Chai Wen Jia, Hamid Abd, Ismafairus Aini, Abdullah Jafri Malin. Working memory from the psychological and neuroscience perspective. Frontiers in Psychology. 2018; 9 :401. doi: 10.3389/fpsyg.2018.00401. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Conway Andrew R., Kovacs Kristof. The nature of the general factor of intelligence. In: Sternberg Robert J., editor. The Nature of Human Intelligence. Cambridge University Press; Cambridge: 2018. pp. 49–63. [ Google Scholar ]
  • Drummond Caitlin, Fischhoff Baruch. Development and validation of the Scientific Reasoning Scale. Journal of Behavioral Decision Making. 2017; 30 :26–38. doi: 10.1002/bdm.1906. [ CrossRef ] [ Google Scholar ]
  • Dwyer Christopher P. Critical Thinking: Conceptual Perspectives and Practical Guidelines. Cambridge University Press; Cambridge: 2017. [ Google Scholar ]
  • Dwyer Christopher P. An evaluative review of barriers to critical thinking in educational and real-world settings. Journal of Intelligence. 2023; 11: 105. doi: 10.3390/jintelligence11060105.
  • Ennis Robert H. A taxonomy of critical thinking dispositions and abilities. In: Baron Joan, Sternberg Robert, editors. Teaching Thinking Skills: Theory and Practice. W. H. Freeman; New York: 1987.
  • Epstein Seymour. Intuition from the perspective of cognitive-experiential self-theory. In: Plessner Henning, Betsch Tilmann, editors. Intuition in Judgment and Decision Making. Erlbaum; Washington, DC: 2008. pp. 23–37.
  • Fasce Angelo, Pico Alfonso. Science as a vaccine: The relation between scientific literacy and unwarranted beliefs. Science & Education. 2019; 28: 109–25. doi: 10.1007/s11191-018-00022-0.
  • Frederick Shane. Cognitive reflection and decision making. Journal of Economic Perspectives. 2005; 19: 25–42. doi: 10.1257/089533005775196732.
  • Gardner Howard. Intelligence Reframed: Multiple Intelligences for the 21st Century. Basic Books; New York: 1999.
  • Genovese Jeremy E. C. Paranormal beliefs, schizotypy, and thinking styles among teachers and future teachers. Personality and Individual Differences. 2005; 39: 93–102. doi: 10.1016/j.paid.2004.12.008.
  • Gignac Gilles E. Fluid intelligence shares closer to 60% of its variance with working memory capacity and is a better indicator of general intelligence. Intelligence. 2014; 47: 122–33. doi: 10.1016/j.intell.2014.09.004.
  • Gottfredson Linda S. Life, death, and intelligence. Journal of Cognitive Education and Psychology. 2004; 4: 23–46. doi: 10.1891/194589504787382839.
  • Halpern Diane F., Dunn Dana. Critical thinking: A model of intelligence for solving real-world problems. Journal of Intelligence. 2021; 9: 22. doi: 10.3390/jintelligence9020022.
  • Halpern Diane F., Butler Heather A. Is critical thinking a better model of intelligence? In: Sternberg Robert J., editor. The Nature of Human Intelligence. Cambridge University Press; Cambridge: 2018. pp. 183–96.
  • Irwin Harvey J., Young J. M. Intuitive versus reflective processes in the formation of paranormal beliefs. European Journal of Parapsychology. 2002; 17: 45–55.
  • Jolley Daniel, Paterson Jenny L. Pylons ablaze: Examining the role of 5G COVID-19 conspiracy beliefs and support for violence. British Journal of Social Psychology. 2020; 59: 628–40. doi: 10.1111/bjso.12394.
  • Kahneman Daniel. Thinking, Fast and Slow. Farrar, Straus and Giroux; New York: 2011.
  • Kowalski Patricia, Taylor Annette J. Ability and critical thinking as predictors of change in students’ psychological misconceptions. Journal of Instructional Psychology. 2004; 31: 297–303.
  • Ku Kelly Y. L., Ho Irene T. Dispositional factors predicting Chinese students’ critical thinking performance. Personality and Individual Differences. 2010; 48: 54–58. doi: 10.1016/j.paid.2009.08.015.
  • Kunda Ziva. The case for motivated reasoning. Psychological Bulletin. 1990; 108: 480–98. doi: 10.1037/0033-2909.108.3.480.
  • Lantian Anthony, Bagneux Virginie, Delouvee Sylvain, Gauvrit Nicolas. Maybe a free thinker but not a critical one: High conspiracy belief is associated with low critical thinking ability. Applied Cognitive Psychology. 2020; 35: 674–84. doi: 10.1002/acp.3790.
  • Lilienfeld Scott O. Psychological treatments that cause harm. Perspectives on Psychological Science. 2007; 2: 53–70. doi: 10.1111/j.1745-6916.2007.00029.x.
  • Lindeman Marjaana. Biases in intuitive reasoning and belief in complementary and alternative medicine. Psychology and Health. 2011; 26: 371–82. doi: 10.1080/08870440903440707.
  • Lindeman Marjaana, Aarnio Kia. Paranormal beliefs: Their dimensionality and correlates. European Journal of Personality. 2006; 20: 585–602.
  • Lobato Emilio J., Mendoza Jorge, Sims Valerie, Chin Matthew. Explaining the relationship between conspiracy theories, paranormal beliefs, and pseudoscience acceptance among a university population. Applied Cognitive Psychology. 2014; 28: 617–25. doi: 10.1002/acp.3042.
  • Maqsood Alisha, Jamil Farhat, Khalid Ruhi. Thinking styles and belief in superstitions: Moderating role of gender in young adults. Pakistan Journal of Psychological Research. 2018; 33: 335–48.
  • McCutcheon Lynn E., Apperson Jennifer M., Hanson Esther, Wynn Vincent. Relationships among critical thinking skills, academic achievement, and misconceptions about psychology. Psychological Reports. 1992; 71: 635–39. doi: 10.2466/pr0.1992.71.2.635.
  • McGrew Kevin S. CHC theory and the human cognitive abilities project: Standing on the shoulders of the giants of psychometric intelligence research. Intelligence. 2009; 37: 1–10. doi: 10.1016/j.intell.2008.08.004.
  • Morgan Jonathan. Religion and dual-process cognition: A continuum of styles or distinct types? Religion, Brain & Behavior. 2016; 6: 112–29. doi: 10.1080/2153599X.2014.966315.
  • Nie Fanhao, Olson Daniel V. A. Demonic influence: The negative mental health effects of belief in demons. Journal for the Scientific Study of Religion. 2016; 55: 498–515. doi: 10.1111/jssr.12287.
  • Pacini Rosemary, Epstein Seymour. The relation of rational and experiential information processing styles to personality, basic beliefs, and the ratio-bias phenomenon. Journal of Personality and Social Psychology. 1999; 76: 972–87. doi: 10.1037/0022-3514.76.6.972.
  • Patel Niraj, Baker S. Glenn, Scherer Laura D. Evaluating the cognitive reflection test as a measure of intuition/reflection, numeracy, and insight problem solving, and the implications for understanding real-world judgments and beliefs. Journal of Experimental Psychology: General. 2019; 148: 2129–53. doi: 10.1037/xge0000592.
  • Pennycook Gordon, Cheyne James Allen, Barr Nathaniel, Koehler Derek J., Fugelsang Jonathan A. On the reception and detection of pseudo-profound bullshit. Judgment and Decision Making. 2015; 10: 549–63. doi: 10.1017/S1930297500006999.
  • Pennycook Gordon, Cheyne James Allen, Seli Paul, Koehler Derek J., Fugelsang Jonathan A. Analytic cognitive style predicts religious and paranormal belief. Cognition. 2012; 123: 335–46. doi: 10.1016/j.cognition.2012.03.003.
  • Ren Xuezhu, Tong Yan, Peng Peng, Wang Tengfei. Critical thinking predicts academic performance beyond cognitive ability: Evidence from adults and children. Intelligence. 2020; 82: 101487. doi: 10.1016/j.intell.2020.101487.
  • Rogers Paul, Fisk John E., Lowrie Emma. Paranormal belief, thinking style preference and susceptibility to confirmatory conjunction errors. Consciousness and Cognition. 2018; 65: 182–95. doi: 10.1016/j.concog.2018.07.013.
  • Saher Marieke, Lindeman Marjaana. Alternative medicine: A psychological perspective. Personality and Individual Differences. 2005; 39: 1169–78. doi: 10.1016/j.paid.2005.04.008.
  • Sosu Edward M. The development and psychometric validation of a Critical Thinking Disposition Scale. Thinking Skills and Creativity. 2013; 9: 107–19. doi: 10.1016/j.tsc.2012.09.002.
  • Stahl Tomas, van Prooijen Jan-Willem. Epistemic rationality: Skepticism toward unfounded beliefs requires sufficient cognitive ability and motivation to be rational. Personality and Individual Differences. 2018; 122: 155–63. doi: 10.1016/j.paid.2017.10.026.
  • Stanovich Keith E. What Intelligence Tests Miss: The Psychology of Rational Thought. Yale University Press; New Haven: 2009.
  • Stanovich Keith E., West Richard F. Reasoning independently of prior belief and individual differences in actively open-minded thinking. Journal of Educational Psychology. 1997; 89: 342–57. doi: 10.1037/0022-0663.89.2.342.
  • Stanovich Keith E., West Richard F. Natural myside bias is independent of cognitive ability. Thinking & Reasoning. 2007; 13: 225–47.
  • Stanovich Keith E., West Richard F. On the failure of cognitive ability to predict myside and one-sided thinking biases. Thinking & Reasoning. 2008; 14: 129–67. doi: 10.1080/13546780701679764.
  • Stanovich Keith E., West Richard F., Toplak Maggie E. The Rationality Quotient: Toward a Test of Rational Thinking. The MIT Press; Cambridge, MA: 2018.
  • Sternberg Robert J. The Triarchic Mind: A New Theory of Intelligence. Penguin Press; London: 1988.
  • Sternberg Robert J. A theory of adaptive intelligence and its relation to general intelligence. Journal of Intelligence. 2019; 7: 23. doi: 10.3390/jintelligence7040023.
  • Sternberg Robert J. The search for the elusive basic processes underlying human intelligence: Historical and contemporary perspectives. Journal of Intelligence. 2022; 10: 28. doi: 10.3390/jintelligence10020028.
  • Swami Viren, Voracek Martin, Stieger Stefan, Tran Ulrich S., Furnham Adrian. Analytic thinking reduces belief in conspiracy theories. Cognition. 2014; 133: 572–85. doi: 10.1016/j.cognition.2014.08.006.
  • Teunisse Alessandra K., Case Trevor I., Fitness Julie, Sweller Naomi. I should have known better: Development of a self-report measure of gullibility. Personality and Social Psychology Bulletin. 2019; 46: 408–23. doi: 10.1177/0146167219858641.
  • Tobacyk Jerome J. A revised paranormal belief scale. The International Journal of Transpersonal Studies. 2004; 23: 94–98. doi: 10.24972/ijts.2004.23.1.94.
  • van der Linden Sander. The conspiracy-effect: Exposure to conspiracy theories (about global warming) decreases pro-social behavior and science acceptance. Personality and Individual Differences. 2015; 87: 173–75. doi: 10.1016/j.paid.2015.07.045.
  • van Prooijen Jan-Willem. Belief in conspiracy theories: Gullibility or rational skepticism? In: Forgas Joseph P., Baumeister Roy F., editors. The Social Psychology of Gullibility: Fake News, Conspiracy Theories, and Irrational Beliefs. Routledge; London: 2019. pp. 319–32.
  • Wechsler David. The Measurement of Intelligence. 3rd ed. Williams & Wilkins; Baltimore: 1944.
  • West Richard F., Toplak Maggie E., Stanovich Keith E. Heuristics and biases as measures of critical thinking: Associations with cognitive ability and thinking dispositions. Journal of Educational Psychology. 2008; 100: 930–41. doi: 10.1037/a0012842.

The International Critical Thinking Reading and Writing Test (Thinker's Guide Library), 2nd Edition, Kindle Edition

by Linda Elder

  • ISBN-13: 978-0944583326
  • Edition: 2nd
  • Publisher: The Foundation for Critical Thinking
  • Publication date: June 1, 2019
  • Series: Thinker's Guide Library
  • Language: English
  • Print length: 109 pages

About the author

Linda Elder

Dr. Linda Elder is an educational psychologist and a prominent authority on critical thinking. She is President of the Foundation for Critical Thinking and Executive Director of the Center for Critical Thinking. Dr. Elder has taught psychology and critical thinking at the college level and has given presentations to more than 20,000 educators at all levels. She has co-authored four books, including Critical Thinking: Tools for Taking Charge of Your Learning and Your Life; Critical Thinking: Tools for Taking Charge of Your Professional and Personal Life; and Twenty-Five Days to Better Thinking and Better Living. She has co-authored eighteen thinker's guides on critical thinking and co-authors a quarterly column on critical thinking in the Journal of Developmental Education.

Dr. Elder has also developed an original stage theory of critical thinking development. Concerned with understanding and illuminating the relationship between thinking and affect, and the barriers to critical thinking, Dr. Elder has placed these issues at the center of her thinking and her work. For more information on Dr. Elder and the work of the Foundation for Critical Thinking, visit www.criticalthinking.org.


Critical thinking definition

Critical thinking, as described by Oxford Languages, is the objective analysis and evaluation of an issue in order to form a judgement.

More fully, the critical thinking process is the active and skillful assessment, synthesis, and evaluation of information obtained from, or generated by, observation, knowledge, reflection, or conversation, used as a guide to belief and action; this is why it is so often emphasized in education and academic work.

Some even view it as the backbone of modern thought.

However, critical thinking is a skill, and like any skill it must be trained and practiced before it can be used to its full potential.

People turn to various approaches to improve their critical thinking, such as:

  • Developing technical and problem-solving skills
  • Engaging in more active listening
  • Actively questioning their assumptions and beliefs
  • Seeking out more diversity of thought
  • Cultivating their intellectual curiosity

Is critical thinking useful in writing?

Critical thinking can help you plan your paper and make it more concise, though the connection is not obvious at first. Here are some of the questions you should ask yourself when bringing critical thinking into your writing:

  • What information should be included?
  • Which information resources should the author look to?
  • What degree of technical knowledge should the report assume its audience has?
  • What is the most effective way to show information?
  • How should the report be organized?
  • How should it be designed?
  • What tone and level of language difficulty should the document have?

Using critical thinking is not only a matter of outlining your paper; it also raises the question of how critical thinking can be applied to the problems within your writing's topic.

Let's say you have a PowerPoint presentation on how critical thinking can reduce poverty in the United States. You will first have to define critical thinking for your viewers, and then use plenty of critical thinking questions (and their synonyms) to familiarize the audience with your methods and start the thinking process behind them.

Are there any services that can help me use more critical thinking?

We understand that it's difficult to learn how to use critical thinking more effectively in just one article, but our service is here to help.

We are a team that specializes in writing essays and other assignments for college students and anyone else who needs a helping hand. We cover a great range of topics, offer high-quality work, always deliver on time, and aim to leave our customers completely satisfied with what they ordered.

The ordering process is fully online, and it goes as follows:

  • Select the topic and the deadline of your essay.
  • Provide us with any details, requirements, statements that should be emphasized, or particular parts of the essay-writing process you struggle with.
  • Leave the email address to which your completed order should be sent.
  • Select your preferred payment type, sit back, and relax!

With extensive experience on the market, professionally degreed essay writers, 24/7 online customer support, and incredibly low prices, you won't find a service offering a better deal than ours.

Critical Legal Thinking

International law and failure in the context of Gaza

by Marina Velickovic | 2 Apr 2024

A few days ago a discussion developed on Twitter (as these things do) about whether Gaza (and specifically the failure to prevent or halt the ongoing genocide) signals a failure of international law. Many of the responses seemed to be saying a similar thing: namely, that what we are witnessing is a failure of political will, not a failure of law. I engaged briefly with the tweet (saying that whether international law has failed depends on how we perceive the success or failure of law – as liberation, justice or discursive reproduction), but the discussion prompted a deeper reflection on what success of international law would look like in the context of Palestine. Below, I explore a number of potential answers.

A ceasefire

The UN Security Council adopted Resolution 2728 (2024) on March 25, demanding a temporary ceasefire in Gaza for the month of Ramadan. Immediately a discursive battle ensued, with the US claiming that the resolution was vibes-only and non-binding, and a number of states in turn insisting that it was, in fact, very much international law, and therefore binding. Regardless of its legal status, more than 100 Palestinians were killed in the subsequent 48 hours. There is also something strange about a temporary ceasefire for the month of Ramadan. A two-week halt, before the mass murder continues. A pause to let people pray. Except of course, mosques have been destroyed; there is no clean water for abdest or to break one's fast; and there is no food. No bombs is always better than bombs, but no bombs does not translate into a dignified Ramadan, and so the provision feels performative.

In a sense this kind of ambiguity is likely to follow any ceasefire: a stop in fighting is unquestionably a good thing, in that it saves lives and ensures the release of the hostages. But a ceasefire now is also a ceasefire five months too late; a ceasefire 30,000 dead Palestinians too late. A ceasefire now is a ceasefire that did not happen after the Al-Ahli hospital was bombed, or after all the hospitals were bombed, after babies decomposed in the Al-Nasr hospital ICU, or after doctor after doctor spoke of children having their limbs amputated without pain relief. It would be a ceasefire that did not happen after Hind Rajab was trapped in the car with her murdered relatives, calling for help, her whereabouts unknown for days; or after it became clear that Israel had targeted the ambulance that was going to save her. It would be a ceasefire that did not happen despite hundreds of videos of parents grieving their dead children; of children covered in dust and blood and paralyzed by shock, sitting alone as the sole survivors of whole family trees. A ceasefire now answers the question that many of us asked at the start: how many Palestinian deaths until Israel's right to self-defence is no longer limitless?

My point here is not just that a ceasefire now is too little too late; it is that the damage caused, the trauma suffered, and the destruction of life as it was known in Gaza are so profound that a ceasefire, while necessary, seems a little like putting a single band-aid on 100,000 bullet wounds. And it is difficult to imagine what an adequate redress for the harm caused would be.

In her report, which she presented to the Human Rights Council on March 26, the Special Rapporteur on the situation of human rights in the occupied Palestinian territory, Francesca Albanese, recommends that Israel and states complicit in the genocide ‘acknowledge the colossal harm done, commit to non-repetition, with measures for prevention, full reparations, including the full cost of the reconstruction of Gaza’. She further recommends, in the short term, the deployment of an international protective presence ‘to constrain the violence routinely used against Palestinians in the occupied Palestinian territory’, and in the longer term the development of a plan to end the ‘unlawful and unsustainable status quo constituting the root cause of the latest escalation.’ The final point is ambitious, although it is not entirely clear from the Report which aspect of the status quo, precisely, the General Assembly is meant to address: the apartheid (a suggestion is made to reconstitute the UN Special Committee against Apartheid); the occupation; or the settler-colonialism (which Albanese sets out as relevant context for the genocide at the very beginning of the Report). What is unsustainable may not be unlawful; and what is unlawful may not be a root cause. Addressing unlawfulness, then, might very well not result in any meaningful form of justice. And so even if the legal demands set out in the Report could be met, that might not translate into a success story about justice and liberation. Success of law is, then, quite distinct from a victory for justice.

Finally, if a (permanent) ceasefire resolution were to be passed, celebrating this as a success of international law risks obscuring the role of mass political mobilization and organizing, which significantly increased the stakes of continued inaction for political leaders in the Global North. Never forget that Keir Starmer, a human rights barrister and the leader of the Labour Party, went on LBC in October and claimed that Israel has the right to withhold power and water from Gaza. When Labour finally called for a ceasefire four months later, it was not because the law had changed; it was because their position had become politically untenable, and Starmer was facing an ever-growing rebellion by his frontbenchers.

A finding of genocide by the International Court of Justice

In December South Africa brought a case against Israel, under the Genocide Convention, before the International Court of Justice (ICJ). In January, in what must have been one of the Court's most live-streamed sessions, it found it sufficiently plausible that Israel was committing a genocide in Gaza to order a set of provisional measures aimed at halting the potential breach of international law until the case can be decided on the merits.

What I found interesting at the time (in both my own reaction to the provisional measures, and that of colleagues) was the sense that the order of provisional measures was a good thing. The reaction was strange, if for no other reason than the fact that it was already at that point clear that Israel was comfortable breaching international law (one might argue that this has been clear for some decades now). My excitement seemed to be more about the fact that international law, a thing I had invested quite a lot of time and energy into studying, was finally doing something, than it was about the possibility of this something having a tangible impact on the situation in Gaza. I had no expectation that international law would help, but it was nice to see it try. Grietje Baars has written brilliantly about a sense of alienation from one’s critical self, during this brief moment of critical lawyers’ deep investment in the liberal international legal order. 

And while the provisional measures have not materially changed the situation in Gaza, that does not necessarily mean that they were a failure of international law. What the ICJ provided was a benchmark against which to judge Israel's subsequent behavior, and a legal alarm bell to ring. When UNRWA was defunded, UN experts warned that states' failure to provide humanitarian aid to Gaza could amount to aiding and abetting genocide; in his account of the famine in Gaza, Michael Fakhri, the Special Rapporteur on the Right to Food, also made reference to the ICJ case; and in her report to the Human Rights Council Francesca Albanese argued that Israel ‘appears to have failed to comply with the binding measures ordered by the ICJ on 26 January 2024.’ The measures have, then, been useful as a rhetorical device. I would also argue that, more than this, they have been useful for the discursive reproduction of international law, despite its abject failure to alter material reality.

The ICJ Order itself heavily references UN agencies and observers, and their reports of human rights abuses from October through to January. Textually, the Order then reproduces many of the facts that had been established by various UN bodies. This act of reproduction gives value to the observations: they go from being a form of inaction (watching genocide happen) to being a form of action (documenting genocide happening). Similarly, the ICJ Order, which did not result in a better humanitarian situation on the ground, has subsequently been referenced in UN reports and media coverage as a way of saying: what is happening is a breach of the provisional measures, it is unlawful, and therefore wrong. The issue with this, of course, is that what is unlawful becomes our definition of what is wrong; and conversely, it becomes increasingly difficult to articulate as wrong that which is not unlawful. Legality displaces both ethics and politics, and becomes a (very conservative) benchmark against which to judge the conduct of hostilities. And the becoming of the benchmark is interpreted as an action of law, as its doing of something. This ability to maintain discursive relevance, despite abject failure (as evidenced by the mass murder of 30,000 Palestinians), is essential for the reproduction of the field, and it ensures that widespread injustice at no point threatens the reproduction (and therefore the existence) of the legal system. The magic of international law is that it turns every fiasco into a prelude to its own success story. Yes, this awful thing happened; but look how we punished it, look how we learned from it.

So, if five years from now, the ICJ does find that Israel has committed a genocide in Gaza, I would be reluctant to call this a success. Especially if Gaza is still an open-air prison; if Palestine is still occupied. Or, in calling it a success, we (international lawyers) should be clear that it is a victory of law for law; not for justice, and certainly not for Palestine.

A brief conclusion   

What emerges from the two legal interventions that I briefly explored is that the success of international law should not be conflated with either justice or liberation. This seems like a basic point to make (and in many ways it is), but I do think it is worth reiterating, because as a discipline we have an investment in the thing that we study and practice; an investment (at the very least) in its continued relevance. But our attempts to make law seem active, to make it seem a crusader for justice, are not neutral; they are deeply ideological. They enable the consistent reproduction of the discipline, of the good, the bad, and the ugly; of its many injustices. And they leave those who bear the brunt of law's failures alone in their disillusionment and anger.

Bio: Marina Veličković is a Leverhulme Early Career Fellow at the University of Warwick, UK where she is researching the role of international law in (re)producing structures of violence. She holds a PhD from the University of Cambridge and an LLM in Public International Law from the London School of Economics.

This article is written by someone who has already decided what side they are on and what principles should apply. In the context of this article it may be helpful to clarify what constitutes law. Law is a set of rules that can be enforced by a policeman. No police, no law. Period. What we have is a set of international conventions. Conventions may be put to one side or ignored if it is convenient to do so. Sovereign states, which internally have rules and police to enforce them, frequently break or ignore international conventions. These conventions are frequently used by some states as a political tool. Such usage should be recognised as politics, not law.

In the case of Israel, an external enemy invaded the country, slaughtered hundreds of its citizens and kidnapped a few hundred more. This enemy was an organisation called Hamas; this organisation may be described as a terrorist organisation or as a rogue state. Any sovereign nation faced with such an organisation will attempt to control or eliminate it.

If Turkey invaded Greece, killed thousands of Thracians, kidnapped hundreds of others, and promised to continue this activity indefinitely, the global response would be an outpouring of support for Greece. Hamas can, at any time, apologise for the atrocities committed in its name on 7th October 2023; return all the remaining hostages; identify and bring to trial the perpetrators of the massacre; and undertake to cease its attacks on Israel. Until they do this, the Israelis have no choice but to continue their endeavour to destroy Hamas.

There are horrifying consequences for the civilian population of Gaza but, if Hamas wishes to conduct a war, then such is war.

Using such notions as “International Law” is placing a sticking plaster over what is nakedly a political problem, one which will eventually have a political or military solution.

The use of disingenuous language such as “genocide”, “mass murder”, and even the term “International Law” itself is not insightful or helpful.

It seems that our friend Charles has already made up his mind about the Israeli operation in Gaza.

He has no time for the Palestinians and the illegal actions of the Israeli colonists who have slaughtered Palestinians for more than 75 years, and kept them in an “open air prison” aka Gaza.

One thing we can agree on is his statement of the obvious: international law is basically used, or not, politically in many instances. Where we disagree is that we reject his simplistic account of what law is. His positivist notion of law – a policeman with a club – is too ridiculous to be taken seriously.

I too am concerned about the discursive reproduction of international law, which still has not altered reality, and I find myself with the same sense of alienation from my own critical self that Marina Veličković describes – so I welcome this bold and insightful analysis from a fellow academic.

Thank you Gill. Occam’s razor should apply. “Simplistic”? Please explain the utility of a concept of law that has no means of enforcement.

As to the historic background of the Israeli/Arab conflict, I believe that there is no “right” or “wrong”; it is an insoluble nightmare! I have no “brief” for either side, but it is a tragedy for both.

The kind of verbiage which this article exemplifies simply confuses issues, provides a justification for all sorts of political stances, and creates opportunities for grandstanding and virtue signalling.

Obviously there will be an outcome, but it will be based on either a political or a military solution. Dressing these up in fairy tales of International Law may help some of the commentariat deal with the outcome, but it will not affect the realities.
