Visual Perception Theory in Psychology

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD, is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


What is Visual Perception?

To receive information from the environment, we are equipped with sense organs, e.g., the eye, ear, and nose. Each sense organ is part of a sensory system that receives sensory inputs and transmits sensory information to the brain.

A particular problem for psychologists is explaining how the physical energy received by sense organs forms the basis of perceptual experience. Sensory inputs are somehow converted into perceptions of desks and computers, flowers and buildings, cars and planes, into sights, sounds, smells, tastes, and touch experiences.

A major theoretical issue on which psychologists are divided is the extent to which perception relies directly on the information present in the environment. Some argue that perceptual processes are not direct but depend on the perceiver’s expectations and previous knowledge as well as the information available in the stimulus itself.


This controversy is discussed with respect to Gibson (1966), who has proposed a direct theory of perception which is a “bottom-up” theory, and Gregory (1970), who has proposed a constructivist (indirect) theory of perception which is a “top-down” theory.

Psychologists distinguish between two types of processes in perception: bottom-up processing and top-down processing.

Bottom-up processing is also known as data-driven processing because perception begins with the stimulus itself. Processing is carried out in one direction from the retina to the visual cortex, with each successive stage in the visual pathway carrying out an ever more complex analysis of the input.

Top-down processing refers to the use of contextual information in pattern recognition. For example, understanding difficult handwriting is easier when reading complete sentences than reading single and isolated words. This is because the meaning of the surrounding words provides a context to aid understanding.
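The handwriting example can be made concrete with a toy sketch (my own illustration, not from the article; the classic demonstration uses the same ambiguous letter shape read as "H" in "THE" and as "A" in "CAT"). Context from the surrounding letters, combined with stored word knowledge, resolves the ambiguous character:

```python
# Toy top-down disambiguation (illustrative only): an ambiguous handwritten
# character that could be "H" or "A" is resolved by word-level context,
# mirroring the classic "THE CAT" demonstration.

LEXICON = {"THE", "CAT", "HAT", "CAR"}  # stand-in for stored word knowledge

def read_word(letters):
    """letters: list of characters; '?' marks the ambiguous character."""
    candidates = []
    for guess in "HA":
        word = "".join(guess if ch == "?" else ch for ch in letters)
        if word in LEXICON:
            candidates.append(word)
    return candidates

print(read_word(["T", "?", "E"]))   # ['THE'] -> the shape is read as H
print(read_word(["C", "?", "T"]))   # ['CAT'] -> the same shape is read as A
```

The identical "bottom-up" input yields different percepts depending on context, which is exactly what top-down processing describes.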

Gregory (1970) and Top-Down Processing Theory


Psychologist Richard Gregory (1970) argued that perception is a constructive process that relies on top-down processing.

Stimulus information from our environment is frequently ambiguous, so to interpret it we need higher-level cognitive information, either from past experience or stored knowledge, to make inferences about what we perceive. Helmholtz called this the 'likelihood principle'.

For Gregory, perception is a hypothesis which is based on prior knowledge. In this way, we are actively constructing our perception of reality based on our environment and stored information.

  • A lot of information reaches the eye, but much is lost by the time it reaches the brain (Gregory estimates about 90% is lost).
  • Therefore, the brain has to guess what a person sees based on past experiences. We actively construct our perception of reality.
  • Richard Gregory proposed that perception involves a lot of hypothesis testing to make sense of the information presented to the sense organs (a toy sketch of this idea follows this list).
  • Our perceptions of the world are hypotheses based on past experiences and stored information.
  • Sensory receptors receive information from the environment, which is then combined with previously stored information about the world which we have built up as a result of experience.
  • The formation of incorrect hypotheses will lead to errors of perception (e.g., visual illusions like the Necker cube).
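Gregory never expressed his theory as an algorithm, but the hypothesis-testing idea in the list above can be illustrated with a toy Bayesian sketch (all names and numbers below are invented for the demonstration): competing perceptual hypotheses are weighed by prior knowledge and by how well they explain the sensory evidence, and the most probable one becomes the percept.

```python
# Toy illustration of perception as hypothesis testing (not Gregory's own
# model). Competing hypotheses are scored by prior knowledge multiplied by
# how well each explains the ambiguous evidence; the winner is perceived.

def posterior(priors, likelihoods):
    """Bayes' rule: P(h|e) is proportional to P(e|h) * P(h), normalized."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Two hypotheses about an ambiguous, Necker-cube-like stimulus.
priors = {"cube viewed from above": 0.5, "cube viewed from below": 0.5}

# Ambiguous evidence: both interpretations explain the image almost equally.
likelihoods = {"cube viewed from above": 0.52, "cube viewed from below": 0.48}

post = posterior(priors, likelihoods)
print(post)                                  # near 50/50: the percept can "flip"
print("current hypothesis:", max(post, key=post.get))
```

When the evidence is this ambiguous, the posteriors stay close together, which is one way to read Gregory's claim (below) that the Necker cube flips between two equally plausible hypotheses.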

Supporting Evidence

A classic demonstration is the hollow-face illusion: even a mask known to be hollow is still seen as a normal, convex face. There seems to be an overwhelming need to reconstruct the face, similar to Helmholtz’s description of “unconscious inference”: an assumption based on past experience.

Perceptions can be ambiguous


The Necker cube is a good example of this. When you stare at the cube, its apparent orientation can suddenly change, or “flip.”

It becomes unstable, and a single physical pattern can produce two perceptions.

Gregory argued that this object appears to flip between orientations because the brain develops two equally plausible hypotheses and is unable to decide between them.

Although the perception changes, there is no change in the sensory input, so the change of appearance cannot be due to bottom-up processing. It must be set downwards by the prevailing perceptual hypothesis of what is near and what is far.

Perception allows behavior to be generally appropriate to non-sensed object characteristics.

Critical Evaluation of Gregory’s Theory

1. The Nature of Perceptual Hypotheses

If perceptions make use of hypothesis testing, the question can be asked, “What kind of hypotheses are they?” Scientists modify a hypothesis according to the support they find for it, so are we, as perceivers, also able to modify our hypotheses? In some cases, it would seem the answer is yes. For example, look at the figure below:


This probably looks like a random arrangement of black shapes. In fact, there is a hidden face in there; can you see it? The face is looking straight ahead and is in the top half of the picture in the center. Now can you see it? The figure is strongly lit from the side and has long hair and a beard.

Once the face is discovered, very rapid perceptual learning takes place and the ambiguous picture now obviously contains a face each time we look at it. We have learned to perceive the stimulus in a different way.

Although in some cases, as in the ambiguous face picture, there is a direct relationship between modifying hypotheses and perception, in other cases, this is not so evident. For example, illusions persist even when we have full knowledge of them (e.g., the inverted face; Gregory, 1974).

One would expect that the knowledge we have learned (from, say, touching the face and confirming that it is not “normal”) would modify our hypotheses in an adaptive manner. The current hypothesis testing theories cannot explain this lack of a relationship between learning and perception.

2. Perceptual Development

A perplexing question for the constructivists who propose perception is essentially top-down in nature is “how can the neonate ever perceive?” If we all have to construct our own worlds based on past experiences, why are our perceptions so similar, even across cultures? Relying on individual constructs for making sense of the world makes perception a very individual and chancy process.

The constructivist approach stresses the role of knowledge in perception and therefore is against the nativist approach to perceptual development.

However, a substantial body of evidence has accrued favoring the nativist approach. For example, newborn infants show shape constancy (Slater & Morison, 1985); they prefer their mother’s voice to other voices (DeCasper & Fifer, 1980); and it has been established that they prefer normal features to scrambled features as early as 5 minutes after birth.

3. Sensory Evidence

Perhaps the major criticism of the constructivists is that they have underestimated the richness of sensory evidence available to perceivers in the real world (as opposed to the laboratory, where much of the constructivists’ evidence has come from).

Constructivists like Gregory frequently use the example of size constancy to support their explanations. That is, we correctly perceive the size of an object even though the retinal image of an object shrinks as the object recedes. They propose that sensory evidence from other sources must be available for us to be able to do this.

However, in the real world, retinal images are rarely seen in isolation (as is possible in the laboratory). There is a rich array of sensory information, including other objects, background, the distant horizon, and movement. This rich source of sensory information is important to the second approach to explaining perception that we will examine, namely the direct approach to perception as proposed by Gibson.

Gibson argues strongly against the idea that perception involves top-down processing and criticizes Gregory’s discussion of visual illusions on the grounds that they are artificial examples and not images found in our normal visual environments.

This is crucial because Gregory accepts that misperceptions are the exception rather than the norm. Illusions may be interesting phenomena, but they might not be that informative about the debate.

Gibson (1966) and Bottom-Up Processing

Gibson’s bottom-up theory suggests that perception involves innate mechanisms forged by evolution and that no learning is required. This suggests that perception is necessary for survival – without perception, we would live in a very dangerous environment.

Our ancestors would have needed perception to escape from harmful predators, suggesting perception is evolutionary.

James Gibson (1966) argues that perception is direct and not subject to hypothesis testing, as Gregory proposed. There is enough information in our environment to make sense of the world in a direct way.

His theory is sometimes known as the ‘Ecological Theory’ because of the claim that perception can be explained solely in terms of the environment.

For Gibson, sensation is perception: what you see is what you get. There is no need for processing (interpretation), as the information we receive about size, shape, distance, etc., is sufficiently detailed for us to interact directly with the environment.

Gibson (1972) argued that perception is a bottom-up process, which means that sensory information is analyzed in one direction: from simple analysis of raw sensory data to the ever-increasing complexity of analysis through the visual system.


Features of Gibson’s Theory

The Optic Array

Perception involves ‘picking up’ the rich information provided by the optic array in a direct way with little/no processing involved.

Because of movement and different intensities of light shining in different directions, the optic array is an ever-changing source of sensory information. Therefore, if you move, the structure of the optic array changes.

According to Gibson, we have the mechanisms to interpret this unstable sensory input, meaning we experience a stable and meaningful view of the world.

Changes in the flow of the optic array contain important information about what type of movement is taking place. The flow will either radiate from or converge towards a particular point.

If the flow appears to radiate from a point, you are moving towards that point; if it converges towards the point, you are moving away from it.
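The geometry behind this claim is simple to demonstrate. The sketch below is an illustration of the standard pinhole-projection account (not Gibson's own formalism): when an observer translates straight ahead, every static point's image streams radially away from a single focus of expansion at the image centre.

```python
# Minimal optic-flow sketch: a pinhole observer moves forward along the
# z-axis, and the projected images of static points flow radially away
# from the focus of expansion at (0, 0). Illustrative values throughout.
import random

f = 1.0        # focal length (arbitrary units)
speed = 0.5    # forward displacement per time step

def project(x, y, z):
    """Pinhole projection of a 3D point onto the image plane."""
    return (f * x / z, f * y / z)

random.seed(1)
points = [(random.uniform(-2, 2), random.uniform(-2, 2),
           random.uniform(4, 10)) for _ in range(5)]

for (x, y, z) in points:
    u0, v0 = project(x, y, z)            # image position before the step
    u1, v1 = project(x, y, z - speed)    # after the observer moves forward
    du, dv = u1 - u0, v1 - v0
    # (du, dv) is parallel to (u0, v0): flow points away from the centre.
    print(f"point at ({u0:+.2f}, {v0:+.2f}) flows by ({du:+.3f}, {dv:+.3f})")
```

Reversing the sign of `speed` turns the expansion into contraction, which corresponds to moving away from the point.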

Invariant Features

The optic array contains invariant information that remains constant as the observer moves. Invariants are aspects of the environment that don’t change; they supply us with crucial information.

Two good examples of invariants are texture and linear perspective.


Another invariant is the horizon-ratio relation. The ratio above and below the horizon is constant for objects of the same size standing on the same ground.
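This invariance is easy to verify with elementary geometry (a standard derivation, added here for clarity rather than taken from the article): on level ground, the horizon crosses every object at exactly the observer's eye height, so the split it produces does not depend on the object's distance.

```latex
% Observer eye height h, object height s (with s > h), distance d.
% The horizon line intersects the object at height h above the ground,
% so in small-angle terms the extents above and below the horizon are
% (s - h)/d and h/d, and their ratio
\[
  \frac{(s-h)/d}{\,h/d\,} \;=\; \frac{s-h}{h}
\]
% is independent of d: same-sized objects standing on the same ground
% plane show the same horizon ratio wherever they stand.
```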

OPTIC ARRAY: The patterns of light that reach the eye from the environment.

RELATIVE BRIGHTNESS: Objects with brighter, clearer images are perceived as closer.

TEXTURE GRADIENT: The grain of texture gets smaller as the object recedes, giving the impression of surfaces receding into the distance.

RELATIVE SIZE: When an object moves further away from the eye, the image gets smaller. Objects with smaller images are seen as more distant (see the projection formula below).

SUPERIMPOSITION: If the image of one object blocks the image of another, the first object is seen as closer.

HEIGHT IN THE VISUAL FIELD: Objects further away are generally higher in the visual field.
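The relative-size cue in the list above follows directly from perspective projection (standard geometry, included here for clarity): the visual angle an object subtends shrinks roughly in proportion to its distance.

```latex
% An object of size s at distance d subtends the visual angle
\[
  \theta \;=\; 2\arctan\!\left(\frac{s}{2d}\right) \;\approx\; \frac{s}{d}
  \qquad (d \gg s),
\]
% so doubling the distance roughly halves the image size, which is why,
% all else being equal, the smaller of two retinal images is read as the
% more distant object.
```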

Evaluation of Gibson’s (1966) Direct Theory of Perception

Gibson’s theory is a highly ecologically valid theory as it puts perception back into the real world.

His theory has many practical applications, e.g., training pilots, runway markings, and road markings.

It’s an excellent explanation for perception when viewing conditions are clear. Gibson’s theory also highlights the richness of information in an optic array and provides an account of perception in animals, babies, and humans.

His theory is reductionist as it seeks to explain perception solely in terms of the environment. There is strong evidence to show that the brain and long-term memory can influence perception. In this case, it could be said that Gregory’s theory is far more plausible.

Gibson’s theory also only supports one side of the nature-nurture debate, that being the nature side. Again, Gregory’s theory is far more plausible as it suggests that what we see with our eyes is not enough, and we use knowledge already stored in our brains, supporting both sides of the debate.

Visual Illusions

Gibson’s emphasis on DIRECT perception provides an explanation for the (generally) fast and accurate perception of the environment. However, his theory cannot explain why perceptions are sometimes inaccurate, e.g., in illusions.

He claimed the illusions used in experimental work constituted extremely artificial perceptual situations unlikely to be encountered in the real world; however, this dismissal cannot realistically be applied to all illusions.

For example, Gibson’s theory cannot account for perceptual errors like the general tendency for people to overestimate vertical extents relative to horizontal ones.

Neither can Gibson’s theory explain naturally occurring illusions. For example, if you stare for some time at a waterfall and then transfer your gaze to a stationary object, the object appears to move in the opposite direction (the well-known waterfall illusion, a motion aftereffect).

Bottom-up or Top-down Processing?

Neither direct nor constructivist theories of perception seem capable of explaining all perceptions all of the time.

Gibson’s theory appears to be based on perceivers operating under ideal viewing conditions, where stimulus information is plentiful and is available for a suitable length of time. Constructivist theories, like Gregory’s, have typically involved viewing under less-than-ideal conditions.

Research by Tulving et al. manipulated both the clarity of the stimulus input (through exposure duration) and the amount of perceptual context in a word identification task. As the clarity of the stimulus and the amount of context increased, so did the likelihood of correct identification.

However, as the exposure duration increased, so the impact of context was reduced, suggesting that if stimulus information is high, then the need to use other sources of information is reduced.

One theory that explains how top-down and bottom-up processes may be seen as interacting with each other to produce the best interpretation of the stimulus was proposed by Neisser (1976) – known as the “Perceptual Cycle.”

DeCasper, A. J., & Fifer, W. P. (1980). Of human bonding: Newborns prefer their mothers’ voices. Science, 208(4448), 1174-1176.

Gibson, J. J. (1966). The Senses Considered as Perceptual Systems. Boston: Houghton Mifflin.

Gibson, J. J. (1972). A theory of direct visual perception. In J. Royce & W. Rozenboom (Eds.), The Psychology of Knowing. New York: Gordon & Breach.

Gregory, R. (1970). The Intelligent Eye. London: Weidenfeld and Nicolson.

Gregory, R. (1974). Concepts and Mechanisms of Perception. London: Duckworth.

Necker, L. (1832). LXI. Observations on some remarkable optical phenomena seen in Switzerland; and on an optical phenomenon which occurs on viewing a figure of a crystal or geometrical solid. The London and Edinburgh Philosophical Magazine and Journal of Science, 1(5), 329-337.

Slater, A., Morison, V., Somers, M., Mattock, A., Brown, E., & Taylor, D. (1990). Newborn and older infants’ perception of partly occluded objects. Infant Behavior and Development, 13(1), 33-49.

Further Information

Trichromatic Theory of Color Vision

Held and Hein (1963) Movement-Produced Stimulation in the Development of Visually Guided Behavior

What do visual illusions teach us?


Visual Representation

What is Visual Representation?

Visual Representation refers to the principles by which markings on a surface are made and interpreted. Designers use representations like typography and illustrations to communicate information, emotions and concepts. Color, imagery, typography and layout are crucial in this communication.

Alan Blackwell, cognition scientist and professor, gives a brief introduction to visual representation:


We can see visual representation throughout human history, from cave drawings to data visualization:

Art uses visual representation to express emotions and abstract ideas.

Financial forecasting graphs condense data and research into a more straightforward format.

Icons on user interfaces (UI) represent different actions users can take.

The color of a notification indicates its nature and meaning.


Van Gogh's "The Starry Night" uses visuals to evoke deep emotions, representing an abstract, dreamy night sky. It exemplifies how art can communicate complex feelings and ideas.

© Public domain

Importance of Visual Representation in Design

Designers use visual representation for internal and external use throughout the design process. For example:

Storyboards are illustrations that outline users’ actions and where they perform them.

Sitemaps are diagrams that show the hierarchy and navigation structure of a website.

Wireframes are sketches that bring together elements of a user interface's structure.

Usability reports use graphs and charts to communicate data gathered from usability testing.

User interfaces visually represent information contained in applications and computerized devices.


This usability report is straightforward to understand. Yet, the data behind the visualizations could come from thousands of answered surveys.

© Interaction Design Foundation, CC BY-SA 4.0

Visual representation simplifies complex ideas and data and makes them easy to understand. Without these visual aids, designers would struggle to communicate their ideas, findings and products. For example, it is easier to create a mockup of an e-commerce website interface than to describe it with words.


Visual representation simplifies the communication of designs. Without mockups, it would be difficult for developers to reproduce designs using words alone.

Types of Visual Representation

Below are some of the most common forms of visual representation designers use.

Text and Typography

Text represents language and ideas through written characters and symbols. Readers visually perceive and interpret these characters. Typography turns text into a visual form, influencing its perception and interpretation.

We have developed the conventions of typography over centuries, for example, in documents, newspapers and magazines. These conventions include:

Text arranged on a grid brings clarity and structure. Gridded text makes complex information easier to navigate and understand. Tables, columns and other formats help organize content logically and enhance readability.

Contrasting text sizes create a visual hierarchy and draw attention to critical areas. For example, headings use larger text while body copy uses smaller text. This contrast helps readers distinguish between primary and secondary information.

Adequate spacing and paragraphing improve the readability and appearance of the text. These conventions prevent the content from appearing cluttered. Spacing and paragraphing make it easier for the eye to follow and for the brain to process the information.

Balanced image-to-text ratios create engaging layouts. Images break the monotony of text, provide visual relief and illustrate or emphasize points made in the text. A well-planned ratio ensures neither text nor images overwhelm each other. Effective ratios make designs more effective and appealing.

Designers use these conventions because people are familiar with them and better understand text presented in this manner.


This table of funerals from the plague in London in 1665 uses typographic conventions still used today. For example, the author arranged the information in a table and used contrasting text styling to highlight information in the header.

Illustrations and Drawings

Designers use illustrations and drawings independently or alongside text. An example of illustration used to communicate information is the assembly instructions created by furniture retailer IKEA. If IKEA used text instead of illustrations in their instructions, people would find it harder to assemble the furniture.


IKEA assembly instructions use illustrations to inform customers how to build their furniture. The only text used is numeric to denote step and part numbers. IKEA communicates this information visually to: 1. Enable simple communication, 2. Ensure their instructions are easy to follow, regardless of the customer’s language.

© IKEA, Fair use

Illustrations and drawings can often convey the core message of a visual representation more effectively than a photograph. They focus on the core message, while a photograph might distract a viewer with additional details (such as who this person is, where they are from, etc.)

For example, in IKEA’s case, photographing a person building a piece of furniture might be complicated. Further, photographs may not reproduce clearly in black-and-white print, and printing them in color would raise costs. To be useful, the pictures would also need to be larger and would occupy more space in a printed manual, further adding to the costs.

But imagine a girl winking—this is something we can easily photograph. 

Ivan Sutherland, creator of Sketchpad, widely considered the first graphical user interface, used the program to draw a winking girl. While not realistic, Sutherland's representation effectively portrays a winking girl. The drawing's abstract, generic elements contrast with the distinct winking eye. The graphical conventions of lines and shapes represent the eyes and mouth. The simplicity of the drawing does not draw attention away from the winking.


A photo might distract from the focused message compared to Sutherland's representation. In the photo, the other aspects of the image (i.e., the particular person) distract the viewer from this message.

© Ivan Sutherland, CC BY-SA 3.0 and Amina Filkins, Pexels License

Information and Data Visualization

Designers and other stakeholders use data and information visualization across many industries.

Data visualization uses charts and graphs to show raw data in a graphic form. Information visualization goes further, including more context and complex data sets. Information visualization often uses interactive elements to share a deeper understanding.

For example, most computerized devices have a battery level indicator. This is a type of data visualization. Information visualization takes this further by allowing you to click on the battery indicator for further insights. These insights may include the apps that use the most battery and the last time you charged your device.
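As a rough sketch of the distinction (the data are invented and matplotlib is assumed), a single graphical readout of one value is data visualization, while plotting the same measure over time with context begins to support the insights described above:

```python
# Data visualization vs. information visualization (invented sample data;
# requires matplotlib).
import matplotlib.pyplot as plt

hours = list(range(10))
battery = [100, 93, 85, 80, 72, 60, 55, 41, 30, 22]   # made-up levels (%)

# Data visualization: a single raw value shown on its own.
print(f"Battery: {battery[-1]}%")

# Information visualization: the same measure in context over time, which
# supports questions such as "when does the battery drain fastest?"
plt.plot(hours, battery, marker="o")
plt.xlabel("Hours since last charge")
plt.ylabel("Battery level (%)")
plt.title("Battery level over time")
plt.show()
```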


macOS displays a battery icon in the menu bar that visualizes your device’s battery level. This is an example of data visualization. Meanwhile, macOS’s settings tell you battery level over time, screen-on-usage and when you last charged your device. These insights are actionable; users may notice their battery drains at a specific time. This is an example of information visualization.

© Low Battery by Jemis Mali, CC BY-NC-ND 4.0, and Apple, Fair use

Information visualization is not exclusive to numeric data. It encompasses representations like diagrams and maps. For example, Google Maps collates various types of data and information into one interface:

Data Representation: Google Maps transforms complex geographical data into an easily understandable and navigable visual map.

Interactivity: Users can interactively customize views that show traffic, satellite imagery and more in real-time.

Layered Information: Google Maps layers multiple data types (e.g., traffic, weather) over geographical maps for comprehensive visualization.

User-Centered Design: The interface is intuitive and user-friendly, with symbols and colors for straightforward data interpretation.


The volume of data contained in one screenshot of Google Maps is massive. However, this information is presented clearly to the user. Google Maps highlights different terrains with colors and local places and businesses with icons and colors. The panel on the left lists the selected location’s profile, which includes an image, rating and contact information.

© Google, Fair use

Symbolic Correspondence

Symbolic correspondence uses universally recognized symbols and signs to convey specific meanings. This method employs widely recognized visual cues for immediate understanding. Symbolic correspondence removes the need for textual explanation.

For instance, a magnifying glass icon in UI design signifies the search function. Similarly, in environmental design, symbols for restrooms, parking and amenities guide visitors effectively.


The Interaction Design Foundation (IxDF) website uses the universal magnifying glass symbol to signify the search function. Similarly, the play icon draws attention to a link to watch a video.

How Designers Create Visual Representations

Visual Language

Designers use elements like color, shape and texture to create a communicative visual experience. Designers use these eight principles:

Size – Larger elements tend to capture users' attention readily.

Color – Users are typically drawn to bright colors over muted shades.

Contrast – Colors with stark contrasts catch the eye more effectively.

Alignment – Unaligned elements are more noticeable than aligned ones.

Repetition – Similar styles repeated imply a relationship in content.

Proximity – Elements placed near each other appear to be connected.

Whitespace – Elements surrounded by ample space attract the eye.

Texture and Style – Users often notice richer textures before flat designs.


The 8 visual design principles.

In web design, visual hierarchy uses color and repetition to direct the user's attention. Color choice is crucial as it creates contrast between different elements. Repetition helps to organize the design—it uses recurring elements to establish consistency and familiarity.

In this video, Alan Dix, Professor and Expert in Human-Computer Interaction, explains how visual alignment affects how we read and absorb information:

Correspondence Techniques

Designers use correspondence techniques to align visual elements with their conceptual meanings. These techniques include color coding, spatial arrangement and specific imagery. In information visualization, different colors can represent various data sets. This correspondence aids users in quickly identifying trends and relationships.


Color coding enables the stakeholder to see the relationship and trend between the two pie charts easily.
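A minimal sketch of the technique the caption describes (invented survey numbers; matplotlib assumed): a fixed category-to-color mapping is reused across both charts so the viewer can track each response category between them.

```python
# Consistent color coding across two charts (illustrative, invented data;
# requires matplotlib).
import matplotlib.pyplot as plt

colors = {"Satisfied": "tab:green", "Neutral": "tab:gray",
          "Unsatisfied": "tab:red"}

snapshots = [("1 day after release",
              {"Satisfied": 55, "Neutral": 30, "Unsatisfied": 15}),
             ("1 month after release",
              {"Satisfied": 70, "Neutral": 20, "Unsatisfied": 10})]

fig, axes = plt.subplots(1, 2)
for ax, (title, data) in zip(axes, snapshots):
    ax.pie(list(data.values()), labels=list(data),
           colors=[colors[k] for k in data])   # same mapping in both charts
    ax.set_title(title)
plt.show()
```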

In user interface design, correspondence techniques link elements with meaning. An example is color-coding notifications to state their nature. For instance, red for warnings and green for confirmation. These techniques are informative and intuitive and enhance the user experience.


The IxDF website uses blue for call-to-actions (CTAs) and red for warnings. These colors inform the user of the nature of the action of buttons and other interactive elements.

Perception and Interpretation

If visual language is how designers create representations, then visual perception and interpretation are how users receive those representations. Consider a painting—the viewer’s eyes take in colors, shapes and lines, and the brain perceives these visual elements as a painting.

In this video, Alan Dix explains how the interplay of sensation, perception and culture is crucial to understanding visual experiences in design:


Visual perception principles are essential for creating compelling, engaging visual representations. For example, Gestalt principles explain how we perceive visual information. These rules describe how we group similar items, spot patterns and simplify complex images. Designers apply Gestalt principles to arrange content on websites and other interfaces. This application creates visually appealing and easily understood designs.

In this video, design expert and teacher Mia Cinelli discusses the significance of Gestalt principles in visual design. She introduces fundamental principles, like figure/ground relationships, similarity and proximity.

Interpretation

Everyone's experiences, culture and physical abilities dictate how they interpret visual representations. For this reason, designers carefully consider how users interpret their visual representations. They employ user research and testing to ensure their designs are attractive and functional.


Leonardo da Vinci's "Mona Lisa" is one of the most famous paintings in the world. The piece is renowned for its subject's enigmatic expression. Some interpret her smile as content and serene, while others see it as sad or mischievous. Not everyone interprets this visual representation in the same way.

Color is an excellent example of how one person, compared to another, may interpret a visual element. Take the color red:

In Chinese culture, red symbolizes luck, while in some parts of Africa, it can mean death or illness.

A personal experience may mean a user has a negative or positive connotation with red.

People with protanopia and deuteranopia color blindness cannot distinguish between red and green.

In this video, Joann and Arielle Eckstut, leading color consultants and authors, explain how many factors influence how we perceive and interpret color:

Learn More about Visual Representation

Read Alan Blackwell’s chapter on visual representation from The Encyclopedia of Human-Computer Interaction.

Learn about the F-Shaped Pattern For Reading Web Content from Jakob Nielsen.

Read Smashing Magazine’s article, Visual Design Language: The Building Blocks Of Design.

Take the IxDF’s course, Perception and Memory in HCI and UX.

Questions related to Visual Representation

Some highly cited research on visual representation and related topics includes:

Roland, P. E., & Gulyás, B. (1994). Visual imagery and visual representation. Trends in Neurosciences, 17(7), 281-287. Roland and Gulyás' study explores how the brain creates visual imagery. They ask whether imagining things like objects and scenes uses the same parts of the brain as seeing them does. Their research shows the brain uses certain areas specifically for imagination, and these areas differ from the areas used for seeing. This research is essential for understanding how our brain handles vision and imagery.

Lurie, N. H., & Mason, C. H. (2007). Visual Representation: Implications for Decision Making. Journal of Marketing, 71(1), 160-177.

This article looks at how visualization tools help in understanding complicated marketing data. It discusses how these tools affect decision-making in marketing. The article gives a detailed method to assess the impact of visuals on the study and combination of vast quantities of marketing data. It explores the benefits and possible biases visuals can bring to marketing choices. These factors make the article an essential resource for researchers and marketing experts. The article suggests using visual tools and detailed analysis together for the best results.

Lohse, G. L., Biolsi, K., Walker, N., & Rueter, H. H. (1994, December). A classification of visual representations. Communications of the ACM, 37(12), 36+.

This publication looks at how visuals help communicate and make information easier to understand. It divides these visuals into six types: graphs, tables, maps, diagrams, networks and icons. The article also looks at different ways these visuals share information effectively.


Some recommended books on visual representation and related topics include:

Chaplin, E. (1994). Sociology and Visual Representation (1st ed.). Routledge.

Chaplin's book describes how visual art analysis has changed from ancient times to today. It shows how photography, post-modernism and feminism have changed how we see art. The book combines words and images in its analysis and looks into real-life social sciences studies.

Mitchell, W. J. T. (1994). Picture Theory. The University of Chicago Press.

Mitchell's book explores the important role and meaning of pictures in the late twentieth century. It discusses the change from focusing on language to focusing on images in cultural studies. The book deeply examines the interaction between images and text in different cultural forms like literature, art and media. This detailed study of how we see and read visual representations has become an essential reference for scholars and professionals.

Koffka, K. (1935). Principles of Gestalt Psychology. Harcourt, Brace & World.

"Principles of Gestalt Psychology" by Koffka, released in 1935, is a critical book in its field. It's known as a foundational work in Gestalt psychology, laying out the basic ideas of the theory and how they apply to how we see and think. Koffka's thorough study of Gestalt psychology's principles has profoundly influenced how we understand human perception. This book has been a significant reference in later research and writings.

A visual representation, like an infographic or chart, uses visual elements to show information or data. These types of visuals make complicated information easier to understand and more user-friendly.

Designers harness visual representations in design and communication. Infographics and charts, for instance, distill data for easier audience comprehension and retention.

For an introduction to designing basic information visualizations, take our course, Information Visualization.

Text is a crucial design and communication element, transforming language visually. Designers use font style, size, color and layout to convey emotions and messages effectively.

Designers utilize text for both literal communication and aesthetic enhancement. Their typography choices significantly impact design aesthetics, user experience and readability.

Designers should always consider text's visual impact in their designs. This consideration includes font choice, placement, color and interaction with other design elements.

In this video, design expert and teacher Mia Cinelli teaches how Gestalt principles apply to typography:

Designers use visual elements in projects to convey information, ideas, and messages. Designers use images, colors, shapes and typography for impactful designs.

In UI/UX design, visual representation is vital. Icons, buttons and colors provide contrast for intuitive, user-friendly website and app interfaces.

Graphic design leverages visual representation to create attention-grabbing marketing materials. Careful color, imagery and layout choices create an emotional connection.

Product design relies on visual representation for prototyping and idea presentation. Designers and stakeholders use visual representations to envision functional, aesthetically pleasing products.

A widely repeated claim holds that our brains process visuals 60,000 times faster than text. Although the figure is difficult to trace to a primary source, it underscores the crucial role of visual representation in design.

Our course, Visual Design: The Ultimate Guide, teaches you how to use visual design elements and principles in your work effectively.

Visual representation, crucial in UX, facilitates interaction, comprehension and emotion. It combines elements like images and typography for better interfaces.

Effective visuals guide users, highlight features and improve navigation. Icons and color schemes communicate functions and set interaction tones.

UX design research shows visual elements significantly impact emotions. One often-quoted (though loosely sourced) figure is that 90% of information transmitted to the brain is visual.

To create functional, accessible visuals, designers use color contrast and consistent iconography. These elements improve readability and inclusivity.

An excellent example of visual representation in UX is Apple's iOS interface. iOS combines a clean, minimalist design with intuitive navigation. As a result, the operating system is both visually appealing and user-friendly.

Michal Malewicz, Creative Director and CEO at Hype4, explains why visual skills are important in design:

Learn more about UI design from Michal in our Master Class, Beyond Interfaces: The UI Design Skills You Need to Know.

The fundamental principles of effective visual representation are:

Clarity: Designers convey messages clearly, avoiding clutter.

Simplicity: Embrace simple designs for ease and recall.

Emphasis: Designers highlight key elements distinctively.

Balance: Balance ensures design stability and structure.

Alignment: Designers enhance coherence through alignment.

Contrast: Use contrast for dynamic, distinct designs.

Repetition: Repeating elements unify and guide designs.

Designers practice these principles in their projects. They also analyze successful designs and seek feedback to improve their skills.

Read our topic description of Gestalt principles to learn more about creating effective visual designs. The Gestalt principles explain how humans group elements, recognize patterns and simplify object perception.

Color theory is vital in design, helping designers craft visually appealing and compelling works. Designers understand color interactions, psychological impacts and symbolism. These elements help designers enhance communication and guide attention.

Designers use complementary, analogous and triadic colors for contrast, harmony and balance. Understanding color temperature also plays a crucial role in design perception.

Color symbolism is crucial, as different colors can represent specific emotions and messages. For instance, blue can symbolize trust and calmness, while red can indicate energy and urgency.

Cultural variations significantly influence color perception and symbolism. Designers consider these differences to ensure their designs resonate with diverse audiences.

For actionable insights, designers should:

Experiment with color schemes for effective messaging.

Assess colors' psychological impact on the audience.

Use color contrast to highlight critical elements.

Ensure color choices are accessible to all (see the contrast check after this list).
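One concrete way to act on the last two points is to check contrast numerically. This sketch implements the relative-luminance and contrast-ratio formulas from the WCAG 2.x specification (the formulas are standard; the sample colors are arbitrary):

```python
# WCAG 2.x contrast-ratio check. Level AA requires at least 4.5:1 for
# normal body text (3:1 for large text). Sample colors are arbitrary.

def linearize(c8):
    """Linearize one 8-bit sRGB channel per the WCAG definition."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(rgb):
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast(rgb1, rgb2):
    lighter, darker = sorted((luminance(rgb1), luminance(rgb2)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

ratio = contrast((255, 255, 255), (0, 102, 204))   # white text on blue
print(f"{ratio:.2f}:1 ->", "passes AA" if ratio >= 4.5 else "fails AA")
```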

In this video, Joann and Arielle Eckstut, leading color consultants and authors, give their six tips for choosing color:

Learn more about color from Joann and Arielle in our Master Class, How To Use Color Theory To Enhance Your Designs.

Typography and font choice are crucial in design, impacting readability and mood. Designers utilize them for effective communication and expression.

Designers' perception of information varies with font type. Serif fonts can imply formality, while sans-serifs can give a more modern look.

Typography choices by designers influence readability and user experience. Well-spaced, distinct fonts enhance readability, whereas decorative fonts may hinder it.

Designers use typography to evoke emotions and set a design's tone. Choices in font size, style and color affect the emotional impact and message clarity.

Designers use typography to direct attention, create hierarchy and establish rhythm. These benefits help with brand recognition and consistency across mediums.

Read our article to learn how web fonts are critical to the online user experience.

Designers create a balance between simplicity and complexity in their work. They focus on the main messages and highlight important parts. Designers use the principles of visual hierarchy, like size, color and spacing. They also use empty space to make their designs clear and understandable.

The Gestalt law of Prägnanz suggests people naturally simplify complex images. This principle aids in making even intricate information accessible and engaging.

Through iteration and feedback, designers refine visuals. They remove extraneous elements and highlight vital information. Testing with the target audience ensures the design resonates and is comprehensible.

Michal Malewicz explains how to master hierarchy in UI design using the Gestalt rule of proximity:


How does all of this fit with interaction design and user experience? The simple answer is that most of our understanding of human experience comes from our own experiences and just being ourselves. That might extend to people like us, but it gives us no real grasp of the whole range of human experience and abilities. By considering more closely how humans perceive and interact with our world, we can gain real insights into what designs will work for a broader audience: those younger or older than us, more or less capable, more or less skilled and so on.

"You can design for all the people some of the time, and some of the people all the time, but you cannot design for all the people all the time." – William Hudson (with apologies to Abraham Lincoln)

While “design for all of the people all of the time” is an impossible goal, understanding how the human machine operates is essential to getting ever closer. And of course, building solutions for people with a wide range of abilities, including those with accessibility issues, involves knowing how and why some human faculties fail. As our course tutor, Professor Alan Dix, points out, this is not only a moral duty but, in most countries, also a legal obligation.



  • Review article
  • Open access
  • Published: 11 July 2018

Decision making with visualizations: a cognitive framework across disciplines

Lace M. Padilla, Sarah H. Creem-Regehr, Mary Hegarty & Jeanine K. Stefanucci

Cognitive Research: Principles and Implications, volume 3, Article number: 29 (2018)


A Correction to this article was published on 02 September 2018


Visualizations—visual representations of information, depicted in graphics—are studied by researchers in numerous ways, ranging from the study of the basic principles of creating visualizations, to the cognitive processes underlying their use, as well as how visualizations communicate complex information (such as in medical risk or spatial patterns). However, findings from different domains are rarely shared across domains though there may be domain-general principles underlying visualizations and their use. The limited cross-domain communication may be due to a lack of a unifying cognitive framework. This review aims to address this gap by proposing an integrative model that is grounded in models of visualization comprehension and a dual-process account of decision making. We review empirical studies of decision making with static two-dimensional visualizations motivated by a wide range of research goals and find significant direct and indirect support for a dual-process account of decision making with visualizations. Consistent with a dual-process model, the first type of visualization decision mechanism produces fast, easy, and computationally light decisions with visualizations. The second facilitates slower, more contemplative, and effortful decisions with visualizations. We illustrate the utility of a dual-process account of decision making with visualizations using four cross-domain findings that may constitute universal visualization principles. Further, we offer guidance for future research, including novel areas of exploration and practical recommendations for visualization designers based on cognitive theory and empirical findings.

Significance

People use visualizations to make large-scale decisions, such as whether to evacuate a town before a hurricane strike, and more personal decisions, such as which medical treatment to undergo. Given their widespread use and social impact, researchers in many domains, including cognitive psychology, information visualization, and medical decision making, study how we make decisions with visualizations. Even though researchers continue to develop a wealth of knowledge on decision making with visualizations, there are obstacles for scientists interested in integrating findings from other domains—including the lack of a cognitive model that accurately describes decision making with visualizations. Research that does not capitalize on all relevant findings progresses slower, lacks generalizability, and may miss novel solutions and insights. Considering the importance and impact of decisions made with visualizations, it is critical that researchers have the resources to utilize cross-domain findings on this topic. This review provides a cognitive model of decision making with visualizations that can be used to synthesize multiple approaches to visualization research. Further, it offers practical recommendations for visualization designers based on the reviewed studies while deepening our understanding of the cognitive processes involved when making decisions with visualizations.

Introduction

Every day we make numerous decisions with the aid of visualizations, including selecting a driving route, deciding whether to undergo a medical treatment, and comparing figures in a research paper. Visualizations are external visual representations that are systematically related to the information that they represent (Bertin, 1983; Stenning & Oberlander, 1995). The information represented might be about objects, events, or more abstract information (Hegarty, 2011). The scope of the previously mentioned examples illustrates the diversity of disciplines that have a vested interest in the influence of visualizations on decision making. While the term decision has a range of meanings in everyday language, here decision making is defined as a choice between two or more competing courses of action (Balleine, 2007).

We argue that for visualizations to be most effective, researchers need to integrate decision-making frameworks into visualization cognition research. Reviews of decision making with visual-spatial uncertainty also agree there has been a general lack of emphasis on mental processes within the visualization decision-making literature (Kinkeldey, MacEachren, Riveiro, & Schiewe, 2017; Kinkeldey, MacEachren, & Schiewe, 2014). The framework that has dominated applied decision-making research for the last 30 years is a dual-process account of decision making. Dual-process theories propose that we have two types of decision processes: one for automatic, easy decisions (Type 1); and another for more contemplative decisions (Type 2) (Kahneman & Frederick, 2002; Stanovich, 1999). Even though many research areas involving higher-level cognition have made significant efforts to incorporate dual-process theories (Evans, 2008), visualization research has yet to directly test the application of current decision-making frameworks or develop an effective cognitive model for decision making with visualizations. The goal of this work is to integrate a dual-process account of decision making with established cognitive frameworks of visualization comprehension.

In this paper, we present an overview of current decision-making theories and existing visualization cognition frameworks, followed by a proposal for an integrated model of decision making with visualizations, and a selective review of visualization decision-making studies to determine if there is cross-domain support for a dual-process account of decision making with visualizations. As a preview, we will illustrate Type 1 and 2 processing in decision making with visualizations using four cross-domain findings that we observed in the literature review. Our focus here is on demonstrating how dual-processing can be a useful framework for examining visualization decision-making research. We selected the cross-domain findings as relevant demonstrations of Type 1 and 2 processing that were shared across the studies reviewed, but they do not represent all possible examples of dual-processing in visualization decision-making research. The review documents each of the cross-domain findings, in turn, using examples from studies in multiple domains. These cross-domain findings differ in their reliance on Type 1 and Type 2 processing. We conclude with recommendations for future work and implications for visualization designers.

Decision-making frameworks

Decision-making researchers have pursued two dominant research paths to study how humans make decisions under risk. The first assumes that humans make rational decisions, which are based on weighted and ordered probability functions and can be mathematically modeled (e.g. Kunz, 2004; Von Neumann, 1953). The second proposes that people often make intuitive decisions using heuristics (Gigerenzer, Todd, & ABC Research Group, 2000; Kahneman & Tversky, 1982). While there is fervent disagreement on the efficacy of heuristics and whether human behavior is rational (Vranas, 2000), there is more consensus that we can make both intuitive and strategic decisions (Epstein, Pacini, Denes-Raj, & Heier, 1996; Evans, 2008; Evans & Stanovich, 2013; cf. Keren & Schul, 2009). The capacity to make intuitive and strategic decisions is described by a dual-process account of decision making, which suggests that humans make fast, easy, and computationally light decisions (known as Type 1 processing) by default, but can also make slow, contemplative, and effortful decisions by employing Type 2 processing (Kahneman, 2011). Various versions of dual-processing theory exist, with the key distinctions being in the attributes associated with each type of process (for a more detailed review of dual-process theories, see Evans & Stanovich, 2013). For example, older dual-systems accounts of decision making suggest that each process is associated with specific cognitive or neurological systems. In contrast, dual-process (sometimes termed dual-type) theories propose that the processes are distinct but do not necessarily occur in separate cognitive or neurological systems (hence the use of process over system) (Evans & Stanovich, 2013).

Many applied domains have adapted a dual-processing model to explain task- and domain-specific decisions, with varying degrees of success (Evans, 2008). For example, when a physician is deciding whether a patient should be assigned to a coronary care unit or a regular nursing bed, the doctor can use a heuristic or utilize heart disease predictive instruments to make the decision (Marewski & Gigerenzer, 2012). In the case of the heuristic, the doctor would employ a few simple rules (diagrammed in Fig. 1) to guide her decision, such as whether the patient's chief complaint is chest pain. Another approach is to apply deliberate mental effort to make a more time-consuming and effortful decision, which could include using heart disease predictive instruments (Marewski & Gigerenzer, 2012). In a review of how applied domains in higher-level cognition have implemented a dual-processing model for domain-specific decisions, Evans (2008) argues that prior work offers conflicting accounts of Type 1 and 2 processing. Some studies suggest that the two types work in parallel, while others reveal conflicts between them (Sloman, 2002). In the physician example proposed by Marewski and Gigerenzer (2012), the two types are not mutually exclusive, as doctors can utilize Type 2 processing to make a more thoughtful decision that is also influenced by some rules of thumb (Type 1). In sum, Evans (2008) argues that, given the inconsistency in classifying Type 1 and 2 processing, the distinction between only two types is likely an oversimplification. Evans (2008) suggests that the literature only consistently supports a distinction between processes that require a capacity-limited working memory resource and those that do not. Evans and Stanovich (2013) updated their definition based on new behavioral and neuroscience evidence, stating, "the defining characteristic of Type 1 processes is their autonomy. They do not require 'controlled attention,' which is another way of saying that they make minimal demands on working memory resources" (p. 236). There is also debate on how to define the term working memory (Cowan, 2017). In line with prior work on decision making with visualizations (Patterson et al., 2014), we adopt the definition that working memory consists of multiple components that maintain a limited amount of information (their capacity) for a finite period (Cowan, 2017). Contemporary theories of working memory also stress the ability to engage attention in a controlled manner to suppress automatic responses and maintain the most task-relevant information within a limited capacity (Engle, Kane, & Tuholski, 1999; Kane, Bleckley, Conway, & Engle, 2001; Shipstead, Harrison, & Engle, 2015).

figure 1

Coronary care unit decision tree, which illustrates a sequence of rules that a doctor could use to guide treatment decisions. Redrawn from "Heuristic decision making in medicine" by J. Marewski and G. Gigerenzer, 2012, Dialogues in Clinical Neuroscience, 14(1), 77. ST-segment change refers to whether a certain anomaly appears in the patient's electrocardiogram. NTG nitroglycerin, MI myocardial infarction, T T-waves with peaking or inversion
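To make the structure of such a heuristic concrete, the sketch below implements a fast-and-frugal decision tree of the kind diagrammed in Fig. 1. The cue names, their ordering, and the return values are simplified paraphrases of the published figure, chosen for illustration only; this is not a clinical tool.

    # A minimal sketch of a fast-and-frugal decision tree, loosely paraphrasing
    # the coronary care unit heuristic in Fig. 1. Cue names and ordering are
    # illustrative simplifications, not medical guidance.

    def coronary_care_heuristic(st_segment_change, chief_complaint_chest_pain,
                                any_other_risk_factor):
        """Each cue is checked in order; the first decisive cue ends the search.

        This 'one-reason' stopping rule is what makes the heuristic a Type 1-style
        process: no weighting or integration of all cues is required.
        """
        if st_segment_change:
            return "coronary care unit"
        if not chief_complaint_chest_pain:
            return "regular nursing bed"
        if any_other_risk_factor:  # e.g. NTG use, prior MI, or T-wave changes
            return "coronary care unit"
        return "regular nursing bed"

    print(coronary_care_heuristic(False, True, True))  # -> coronary care unit

Note that the tree never combines cues; each branch terminates the search, which is what keeps the decision fast and computationally light.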

Identifying processes that require significant working memory provides a definition of Type 2 processing with observable neural correlates. Therefore, in line with Evans and Stanovich (2013), in the remainder of this manuscript we will use significant working memory capacity demands and a significant need for cognitive control, as defined above, as the criteria for Type 2 processing. In the context of visualization decision making, processes that require significant working memory are those that depend on the deliberate application of working memory to function. Type 1 processing occurs outside of users' conscious awareness and may utilize small amounts of working memory, but it does not rely on conscious processing in working memory to drive the process. It should be noted that Type 1 and 2 processing are not mutually exclusive, and many real-world decisions likely incorporate both types of processing. This review will attempt to identify tasks in visualization decision making that require significant working memory capacity (Type 2 processing) and those that rely more heavily on Type 1 processing, as a first step toward combining decision theory with visualization cognition.

Visualization cognition

Visualization cognition is a subset of visuospatial reasoning, which involves deriving meaning from external representations of visual information that maintain consistent spatial relations (Tversky, 2005). Broadly, two distinct approaches delineate visualization cognition models (Shah, Freedman, & Vekiri, 2005). The first comprises perceptually focused frameworks, which attempt to specify the processes involved in perceiving visual information in displays and make predictions about the speed and efficiency of acquiring information from a visualization (e.g. Hollands & Spence, 1992; Lohse, 1993; Meyer, 2000; Simkin & Hastie, 1987). The second considers the influence of prior knowledge as well as perception. For example, Cognitive Fit Theory (Vessey, 1991) suggests that the user compares a learned graphic convention (mental schema) to the visual depiction. Visualizations that do not match the mental schema require cognitive transformations to bring the visualization and mental representation into alignment. For example, Fig. 2 illustrates a fictional relationship between the population growth of Species X and a predator species. At first glance, it may appear that the population of Species X dropped when the predator species was introduced. However, after careful observation, you may notice that the higher population values are located lower on the Y-axis, which does not match our mental schema for graphs. With some effort, you can mentally reorder the values on the Y-axis to match your mental schema, and you may then notice that the introduction of the predator species actually correlates with growth in the population of Species X. When the viewer is forced to mentally transform the visualization to match their mental schema, processing steps are increased, which may increase errors, time to complete the task, and demands on working memory (Vessey, 1991).

figure 2

Fictional relationship between the population growth of Species X and a predator species, where the Y-axis ordering does not match standard graphic conventions. Notice that the Y-axis is reverse-ordered. This figure was inspired by a controversial graphic produced by Christine Chan of Reuters, which showed the relationship between Florida's "Stand Your Ground" law and firearm murders with the Y-axis reverse-ordered (Lallanilla, 2014)
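A reversed axis of the kind shown in Fig. 2 is easy to reproduce. The sketch below, with invented population numbers, plots the same fictional data with a conventional and an inverted Y-axis using matplotlib's invert_yaxis(); the second panel makes growth look like decline, exactly the mismatch with the graph schema described above.

    # Sketch of how a reversed Y-axis changes the apparent trend (cf. Fig. 2).
    # The population values are invented for illustration.
    import matplotlib.pyplot as plt

    years = list(range(2000, 2011))
    population = [40, 42, 45, 50, 58, 68, 80, 95, 110, 130, 150]  # Species X grows

    fig, (ax_standard, ax_reversed) = plt.subplots(1, 2, figsize=(9, 3.5))
    for ax in (ax_standard, ax_reversed):
        ax.plot(years, population, marker="o")
        ax.axvline(2005, linestyle="--", label="predator introduced")
        ax.set_xlabel("Year")
        ax.set_ylabel("Population of Species X")
        ax.legend()

    ax_standard.set_title("Conventional Y-axis: growth is obvious")
    ax_reversed.invert_yaxis()  # violates the usual graph schema
    ax_reversed.set_title("Reversed Y-axis: growth looks like decline")
    plt.tight_layout()
    plt.show()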

Pinker (1990) proposed a cognitive model (see Fig. 3), which provides an integrative structure that denotes the distinction between top-down and bottom-up encoding mechanisms in understanding data graphs. Researchers have generalized this model to propose theories of comprehension, learning, and memory with visual information (Hegarty, 2011; Kriz & Hegarty, 2007; Shah & Freedman, 2011). The Pinker (1990) model suggests that, from the visual array, defined as the unprocessed neuronal firing in response to visualizations, bottom-up encoding mechanisms are utilized to construct a visual description, which is the mental encoding of the visual stimulus. Following encoding, viewers mentally search long-term memory for knowledge relevant to interpreting the visualization. This knowledge is proposed to be in the form of a graph schema.

figure 3

Adapted figure from the Pinker ( 1990 ) model of visualization comprehension, which illustrates each process

Then viewers use a match process, in which the graph schema that is most similar to the visual array is retrieved. When a matching graph schema is found, the schema becomes instantiated. The visualization conventions associated with the graph schema can then help the viewer interpret the visualization (the message assembly process). For example, Fig. 3 illustrates comprehension of a bar chart using the Pinker (1990) model. In this example, the matched graph schema for a bar graph specifies that the dependent variable is on the Y-axis and the independent variable is on the X-axis; the instantiated graph schema incorporates the visual description and this additional information. The conceptual message is the resulting mental representation of the visualization, which includes all supplemental information from long-term memory and any mental transformations the viewer may perform on the visualization. Viewers may need to transform their mental representation of the visualization based on their task or conceptual question. In this example, the viewer's task is to find the average of A and B. To do this, the viewer must interpolate information in the bar chart and update the conceptual message with this additional information. The conceptual question can guide the construction of the mental representation through interrogation, which is the process of seeking out information that is necessary to answer the conceptual question. Top-down encoding mechanisms can influence each of these processes.
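As a way of summarizing the flow of this account, the toy sketch below walks a bar-chart description through schema matching, instantiation, and interrogation. The data structures and the similarity rule are our own illustrative inventions; Pinker's model is a verbal theory, not an algorithm.

    # Toy sketch of the Pinker (1990) comprehension pipeline. The dictionaries
    # and the matching rule are invented for illustration only.

    visual_description = {"marks": "bars", "x": ["A", "B"], "y": [2.4, 1.9]}

    graph_schemas = {
        "bar graph":  {"marks": "bars",  "x_role": "independent variable",
                       "y_role": "dependent variable"},
        "line graph": {"marks": "lines", "x_role": "independent variable",
                       "y_role": "dependent variable"},
    }

    # Match process: retrieve the schema most similar to the visual description.
    matched_name = max(graph_schemas,
                       key=lambda n: graph_schemas[n]["marks"] == visual_description["marks"])

    # Instantiation: the schema's conventions are merged with the description.
    instantiated = {**graph_schemas[matched_name], **visual_description}

    # Interrogation: the conceptual question drives a transformation of the
    # conceptual message (here, computing an average that is not shown directly).
    conceptual_question = "average of A and B"
    conceptual_message = sum(instantiated["y"]) / len(instantiated["y"])
    print(matched_name, conceptual_message)  # -> bar graph 2.15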

The influence of top-down processes is also emphasized in a previous attempt by Patterson et al. (2014) to extend visualization cognition theories to decision making. The Patterson et al. (2014) model illustrates how top-down cognitive processing influences encoding, pattern recognition, and working memory, but not decision making or the response. Patterson et al. (2014) use the multicomponent definition of working memory proposed by Baddeley and Hitch (1974) and summarized by Cowan (2017) as a "multicomponent system that holds information temporarily and mediates its use in ongoing mental activities" (p. 1160). In this conception of working memory, a central executive controls the functions of working memory. The central executive can, among other functions, control attention and hold information in a visuo-spatial temporary store, where information can be maintained temporarily for decision making without being stored in long-term memory (Baddeley & Hitch, 1974).

While incorporating working memory into a visualization decision-making model is valuable, the Patterson et al. (2014) model leaves some open questions about the relationships between components and processes. For example, their model lacks a pathway for working memory to influence decisions based on top-down processing, which is inconsistent with well-established research in decision science (e.g. Gigerenzer & Todd, 1999; Kahneman & Tversky, 1982). Additionally, the normal processing pathway depicted in the Patterson model is an oversimplification of the interaction between top-down and bottom-up processing that is documented in a large body of literature (e.g. Engel, Fries, & Singer, 2001; Mechelli, Price, Friston, & Ishai, 2004).

A proposed integrated model of decision making with visualizations

Our proposed model (Fig. 4) introduces a dual-process account of decision making (Evans & Stanovich, 2013; Gigerenzer & Gaissmaier, 2011; Kahneman, 2011) into the Pinker (1990) model of visualization comprehension. A primary addition of our model is the inclusion of working memory, which is utilized to answer the conceptual question and can have a subsequent impact on each stage of the decision-making process except bottom-up attention. The final stage of our model includes a decision-making process that derives from the conceptual message and informs behavior. In line with a dual-process account (Evans & Stanovich, 2013; Gigerenzer & Gaissmaier, 2011; Kahneman, 2011), the decision step can either be completed with Type 1 processing, which uses only minimal working memory (Evans & Stanovich, 2013), or recruit significant working memory, constituting Type 2 processing. Also following Evans and Stanovich (2013), we argue that people can make a decision with a visualization while using minimal amounts of working memory; we classify this as Type 1 thinking. Lohse (1997) found that when participants made judgments about budget allocation using profit charts, individuals with less working memory capacity performed as well as those with more working memory capacity when they made decisions about only three regions (an easier task). However, when participants made judgments about nine regions (a harder task), individuals with more working memory capacity outperformed those with less. These results reveal that individual differences in working memory capacity influence performance only on complex decision-making tasks (Lohse, 1997). Figure 5 (top) illustrates one way that a viewer could make a Type 1 decision about whether the average value of bars A and B is closer to 2 or 2.2: the viewer makes a fast and computationally light decision in which she decides that the midpoint between the two bars is closer to the salient tick mark of 2 on the Y-axis and answers 2 (which is incorrect). In contrast, Fig. 5 (bottom) shows a second possible method of solving the same problem by utilizing significant working memory (Type 2 processing). In this example, the viewer has recently learned a strategy for similar problems, uses working memory to guide a top-down attentional search of the visual array, and identifies the values of A and B. Next, she instantiates a different graph schema than in the prior example by utilizing working memory and completes an effortful mental computation of (2.4 + 1.9)/2. Ultimately, the application of working memory leads to a different and more effortful decision than in Fig. 5 (top). This example illustrates how significant amounts of working memory can be used at early stages of the decision-making process, producing downstream effects and more considered responses. In the following sections, we provide a selective review of work on decision making with visualizations that demonstrates direct and indirect evidence for our proposed model.

figure 4

Model of visualization decision making, which emphasizes the influence of working memory. Long-term memory can influence all components and processes in the model either via pre-attentive processes or by conscious application of knowledge

figure 5

Examples of a fast Type 1 (top) and slow Type 2 (bottom) decision outlined in our proposed model of decision making with visualizations. In these examples, the viewer's task is to decide whether the average value of bars A and B is closer to 2 or 2.2. The thick dotted line denotes significant use of working memory and the thin dotted line negligible use of working memory
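To make the contrast in Fig. 5 explicit, the sketch below codes the two strategies side by side. The bar values (2.4 and 1.9) and the answer options come from the example above; the "nearest salient tick" rule is our gloss of the Type 1 shortcut, not a claim about the exact perceptual mechanism.

    # The two strategies from Fig. 5 for judging whether the average of bars
    # A (2.4) and B (1.9) is closer to 2 or 2.2. The 'nearest salient tick'
    # rule is our gloss of the Type 1 shortcut described in the text.

    bar_a, bar_b = 2.4, 1.9
    answer_options = [2.0, 2.2]

    def type1_decision(salient_ticks=(1.0, 2.0, 3.0)):
        # Fast and computationally light: eyeball the midpoint of the bars and
        # snap it to the nearest salient axis tick (here, 2 -- incorrect).
        eyeballed_midpoint = (bar_a + bar_b) / 2      # perceived, not computed
        return min(salient_ticks, key=lambda t: abs(t - eyeballed_midpoint))

    def type2_decision():
        # Slow and effortful: explicitly compute (2.4 + 1.9) / 2 = 2.15 in
        # working memory, then compare it with both answer options.
        mean = (bar_a + bar_b) / 2
        return min(answer_options, key=lambda o: abs(o - mean))

    print(type1_decision())  # -> 2.0 (the salient but incorrect answer)
    print(type2_decision())  # -> 2.2 (2.15 is closer to 2.2)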

Empirical studies of visualization decision making

Review method

To determine whether there is cross-domain empirical support for a dual-process account of decision making with visualizations, we selectively reviewed studies of complex decision making with computer-generated, two-dimensional (2D), static visualizations. To illustrate the application of a dual-process account of decision making to visualization research, this review highlights representative studies from diverse application areas. Interdisciplinary groups conducted many of these studies and, as such, it is not accurate to classify each study within a single discipline. However, to help the reader evaluate the cross-domain nature of these findings, Table 1 lists the application area for the specific tasks used in each study.

In reviewing this work, we observed four key cross-domain findings that support a dual-process account of decision making (see Table 2). The first two support the inclusion of Type 1 processing, which is illustrated by the direct path by which bottom-up attention guides decision making with minimal application of working memory (see Fig. 5 top). The first finding is that visualizations direct viewers' bottom-up attention, which can both help and hinder decision making (see "Bottom-up attention"). The second finding is that visual-spatial biases comprise a unique category of bias that is a direct result of the visual encoding technique (see "Visual-spatial biases"). The third finding supports the inclusion of Type 2 processing in our proposed model and suggests that visualizations vary in cognitive fit between the visual description, graph schema, and conceptual question. If the fit is poor (i.e. there is a mismatch between the visualization and a decision-making component), working memory is used to perform corrective mental transformations (see "Cognitive fit"). The final cross-domain finding proposes that knowledge-driven processes may interact with the effects of the visual encoding technique (see "Knowledge-driven processing") and could be a function of either Type 1 or 2 processes. Each of these findings will be detailed at length in the relevant sections. The four cross-domain findings do not represent an exhaustive list of findings that pertain to visualization cognition; rather, they were selected as illustrative examples of Type 1 and 2 processing that include significant contributions from multiple domains. Further, some of the studies could fit into multiple sections and were included in a particular section as illustrative examples.

Bottom-up attention

The first cross-domain finding that characterizes Type 1 processing in visualization decision making is that visualizations direct participants' bottom-up attention to specific visual features, which can be either beneficial or detrimental to decision making. Bottom-up attention consists of involuntary shifts in focus to salient features of a visualization and does not utilize working memory (Connor, Egeth, & Yantis, 2004); it is therefore a Type 1 process. The research reviewed in this section illustrates that bottom-up attention has a profound influence on decision making with visualizations. A summary of the visual features that studies have used to attract bottom-up attention can be found in Table 3.

Numerous studies show that salient information in a visualization draws viewers' attention (Fabrikant, Hespanha, & Hegarty, 2010; Hegarty, Canham, & Fabrikant, 2010; Hegarty, Friedman, Boone, & Barrett, 2016; Padilla, Ruginski, & Creem-Regehr, 2017; Schirillo & Stone, 2005; Stone et al., 2003; Stone, Yates, & Parker, 1997). The most common methods for demonstrating that visualizations focus viewers' attention are showing that viewers miss non-salient but task-relevant information (Schirillo & Stone, 2005; Stone et al., 1997; Stone et al., 2003), that viewers are biased by salient information (Hegarty et al., 2016; Padilla, Ruginski et al., 2017), or that viewers spend more time looking at salient information in a visualization (Fabrikant et al., 2010; Hegarty et al., 2010). For example, Stone et al. (1997) demonstrated that when viewers are asked how much they would pay for an improved product using the visualizations in Fig. 6, they focus on the number of icons while missing the base rate of 5,000,000. If a viewer simply totals the icons, the standard product appears to be twice as dangerous as the improved product, but because the base rate is large, the actual difference between the two products is negligibly small (0.0000003; Stone et al., 1997). In one experiment, participants were willing to pay $125 more for improved tires when viewing the visualizations in Fig. 6 than when viewing a purely textual representation of the same information. The authors demonstrated the same effect for improved toothpaste, with participants paying $0.95 more when viewing a visual depiction compared to text. The authors term this heuristic of focusing on salient information and ignoring other data the foreground effect (Stone et al., 1997) (see also Schirillo & Stone, 2005; Stone et al., 2003).

figure 6

Icon arrays used to illustrate the risk of standard or improved tires. Participants were tasked with deciding how much they would pay for the improved tires. Note that the base rate of 5,000,000 drivers was represented only in text. Redrawn from "Effects of numerical and graphical displays on professed risk-taking behavior" by E. R. Stone, J. F. Yates, & A. M. Parker, 1997, Journal of Experimental Psychology: Applied, 3(4), 243
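The arithmetic behind the foreground effect is simple to reproduce. In the sketch below the fatality counts are hypothetical, chosen only so that they reproduce the 0.0000003 risk difference cited above; the counts in the original Stone et al. (1997) stimuli may differ.

    # Sketch of the foreground effect with hypothetical fatality counts chosen
    # to reproduce the risk difference of 0.0000003 cited in the text; the
    # actual counts in Stone et al. (1997) may differ.

    base_rate = 5_000_000                 # drivers (shown only as text in Fig. 6)
    standard_fatalities = 3.0             # hypothetical icon count
    improved_fatalities = 1.5             # hypothetical icon count

    # Foreground-only comparison (what the icons make salient):
    print(standard_fatalities / improved_fatalities)   # -> 2.0 ("twice as risky")

    # Full-information comparison (foreground divided by the base rate):
    risk_difference = (standard_fatalities - improved_fatalities) / base_rate
    print(risk_difference)                             # -> 3e-07, i.e. 0.0000003

Judging by the foreground alone doubles the apparent risk; dividing by the base rate shows the absolute difference is negligible.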

A more direct test of whether visualizations guide bottom-up attention is to examine whether salient information biases viewers' judgments. One method involves identifying salient features using a behaviorally validated saliency model, which predicts the locations that will attract viewers' bottom-up attention (Harel, 2015; Itti, Koch, & Niebur, 1998; Rosenholtz & Jin, 2005). In one study, researchers compared participants' judgments with different hurricane forecast visualizations and then, using the Itti et al. (1998) saliency algorithm, found that differences in what was salient in the two visualizations correlated with participants' performance (Padilla, Ruginski et al., 2017). Specifically, they suggested that the salient border of the Cone of Uncertainty (see Fig. 7, left), which is used by the National Hurricane Center to display hurricane track forecasts, leads some people to incorrectly believe that the hurricane is growing in physical size, a misunderstanding of the probability distribution of hurricane paths that the cone is intended to represent (Padilla, Ruginski et al., 2017; see also Ruginski et al., 2016). Further, they found that when the same data were represented as individual hurricane paths, such that there was no salient boundary (see Fig. 7, right), viewers intuited the probability of hurricane paths more effectively than with the Cone of Uncertainty. However, an individual hurricane path biased viewers' judgments if it intersected a point of interest. For example, in Fig. 7 (right), participants accurately judged that locations closer to the densely populated lines (highest likelihood of storm path) would receive more damage. This correct judgment changed when a location farther from the center of the storm was intersected by a path but the closer location was not (see locations a and b in Fig. 7 right). With both visualizations, the researchers found that viewers were negatively biased by the salient features for some tasks (Padilla, Ruginski et al., 2017; Ruginski et al., 2016).

figure 7

An example of the Cone of Uncertainty (left) and the same data represented as hurricane paths (right). Participants were tasked with evaluating the level of damage that offshore oil rigs at specific locations would incur, based on the hurricane forecast visualization. Redrawn from "Effects of ensemble and summary displays on interpretations of geospatial uncertainty data" by L. M. Padilla, I. Ruginski, and S. H. Creem-Regehr, 2017, Cognitive Research: Principles and Implications, 2(1), 40
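For readers who want intuition for what a saliency model computes, the sketch below implements a drastically simplified center-surround contrast map in the spirit of the Itti, Koch, and Niebur (1998) approach. The full model adds color and orientation channels, multi-scale pyramids, and normalization steps; this toy version is illustration only, not the validated implementation used in the cited studies.

    # A drastically simplified center-surround saliency sketch in the spirit of
    # Itti, Koch, & Niebur (1998). Illustration only; the published model is
    # far richer (multiple channels, pyramids, normalization).
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def toy_saliency(intensity_image, center_sigma=1.0, surround_sigma=8.0):
        """Return a map that is large where fine detail differs from its surround."""
        center = gaussian_filter(intensity_image, center_sigma)
        surround = gaussian_filter(intensity_image, surround_sigma)
        contrast = np.abs(center - surround)        # center-surround difference
        return contrast / (contrast.max() + 1e-9)   # normalize to [0, 1]

    # A blank 'map' with one bright bounded region, loosely like a cone boundary:
    image = np.zeros((100, 100))
    image[40:60, 40:60] = 1.0
    saliency = toy_saliency(image)
    print(saliency.shape, round(float(saliency.max()), 2))

In such a map, sharp boundaries (like the cone's border) produce the strongest responses, which is consistent with the finding that the cone's edge, rather than the underlying probability distribution, captures attention.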

That is not to say that saliency only negatively impacts decisions. When incorporated into visualization design, saliency can guide bottom-up attention to task-relevant information, thereby improving performance (e.g. Fabrikant et al., 2010; Fagerlin, Wang, & Ubel, 2005; Hegarty et al., 2010; Schirillo & Stone, 2005; Stone et al., 2003; Waters, Weinstein, Colditz, & Emmons, 2007). One compelling example, using both eye-tracking measures and a saliency algorithm, demonstrated that salient features of weather maps directed viewers' attention to the different variables visualized on the maps (Hegarty et al., 2010) (see also Fabrikant et al., 2010). Interestingly, when the researchers manipulated the relative salience of temperature versus pressure (see Fig. 8), the salient features captured viewers' overt attention (as measured by eye fixations) but did not influence performance until participants were trained on how to interpret the features effectively. Once viewers were trained, their judgments were facilitated when the relevant features were more salient (Hegarty et al., 2010). This is an instructive example of how saliency may direct viewers' bottom-up attention but may not influence their performance until viewers have the relevant top-down knowledge to capitalize on the affordances of the visualization.

figure 8

Eye-tracking data from Hegarty et al. ( 2010 ). Participants viewed an arrow located in Utah (obscured by eye-tracking data in the figure) and made judgments about whether the arrow correctly identified the wind direction. The black isobars were the task-relevant information. Notice that after instructions, viewers with the pressure-salient visualizations focused on the isobars surrounding Utah, rather than on the legend or in other regions. The panels correspond to the conditions in the original study

In sum, the reviewed studies suggest that bottom-up attention has a profound influence on decision making with visualizations. This is noteworthy because bottom-up attention is a Type 1 process. At a minimum, the work suggests that Type 1 processing influences the first stages of decision making with visualizations. Further, the studies cited in this section provide support for the inclusion of bottom-up attention in our proposed model.

Visual-spatial biases

A second cross-domain finding that relates to Type 1 processing is that visualizations can give rise to visual-spatial biases that can be either beneficial or detrimental to decision making. We propose the new concept of visual-spatial biases, defining the term as a bias that elicits heuristics as a direct result of the visual encoding technique. Visual-spatial biases likely originate as a Type 1 process, as we suspect they are connected to bottom-up attention and, if detrimental to decision making, have to be actively suppressed by top-down knowledge and cognitive control mechanisms (see Table 4 for a summary of the biases documented in this section). Visual-spatial biases can also improve decision-making performance. As Card, Mackinlay, and Shneiderman (1999) point out, we can use vision to think, meaning that visualizations can capitalize on visual perception so that a visualization is interpreted without effort when the visual biases it elicits are consistent with the correct interpretation.

Tversky (2011) presents a taxonomy of visual-spatial communications that are intrinsically related to thought, which are likely the bases for visual-spatial biases (see also Fabrikant & Skupin, 2005). One of the most commonly documented visual-spatial biases that we observed across domains is a containment conceptualization of boundary representations in visualizations. Tversky (2011) makes the analogy, "Framing a picture is a way of saying that what is inside the picture has a different status from what is outside the picture" (p. 522). Similarly, Fabrikant and Skupin (2005) describe how, "They [boundaries] help partition an information space into zones of relative semantic homogeneity" (p. 673). However, in visualization design it is common to take continuous data and visually represent them with boundaries (i.e. summary statistics, error bars, isocontours, or regions of interest; Padilla et al., 2015; Padilla, Quinan, Meyer, & Creem-Regehr, 2017). Binning continuous data is a reasonable approach, particularly when intended to make the data simpler for viewers to understand (Padilla, Quinan, et al., 2017). However, it may have the unintended consequence of creating artificial boundaries that bias users, leading them to respond as if data within a boundary are more similar than data across boundaries. For example, McKenzie, Hegarty, Barrett, and Goodchild (2016) showed that participants were more likely to use a containment heuristic to make decisions about Google Maps' blue-dot visualization when the positional uncertainty data were visualized as a bounded circle (Fig. 9, right) than as a Gaussian fade (Fig. 9, left) (see also Newman & Scholl, 2012; Ruginski et al., 2016). Recent work by Grounds, Joslyn, and Otsuka (2017) found that viewers demonstrate a "deterministic construal error", the belief that visualizations of temperature uncertainty represent a deterministic forecast. However, the deterministic construal error was not observed with textual representations of the same data (see also Joslyn & LeClerc, 2013).

figure 9

Example stimuli from McKenzie et al. ( 2016 ) showing circular semi-transparent overlays used by Google Maps to indicate the uncertainty of the users’ location. Participants compared two versions of these visualizations and determined which represented the most accurate positional location. Redrawn from “Assessing the effectiveness of different visualizations for judgments of positional uncertainty” by G. McKenzie, M. Hegarty, T. Barrett, and M. Goodchild. 2016, International Journal of Geographical Information Science , 30 (2), 221–239

Additionally, some visual-spatial biases follow the same principles as better-known decision-making biases revealed by researchers in behavioral economics and decision science. In fact, some decision-making biases, such as anchoring, the tendency to use the first data point to make relative judgments, seem to have visual correlates (Belia, Fidler, Williams, & Cumming, 2005). For example, Belia et al. (2005) asked experts with experience in statistics to align two means ("Group 1" and "Group 2") with error bars so that they represented data ranges that were just significantly different (see Fig. 10 for example stimuli). They found that when the starting position of Group 2 was around 800 ms, participants placed Group 2 higher than when its starting position was around 300 ms. This work demonstrates that participants used the starting mean of Group 2 as an anchor or reference point, even though the starting position was arbitrary. Other work finds that visualizations can be used to reduce some decision-making biases, including anecdotal evidence bias (Fagerlin et al., 2005), side-effect aversion (Waters et al., 2007; Waters, Weinstein, Colditz, & Emmons, 2006), and risk aversion (Schirillo & Stone, 2005).

figure 10

Example display and instructions from Belia et al. ( 2005 ). Redrawn from “Researchers misunderstand confidence intervals and standard error bars” by S. Belia, F. Fidler, J. Williams, and G. Cumming. 2005, Psychological Methods, 10 (4), 390. Copyright 2005 by “American Psychological Association”

Additionally, the mere presence of a visualization may inherently bias viewers. For example, viewers rate scientific articles with high-quality neuroimaging figures as displaying better scientific reasoning than the same article with a bar chart or no figure (McCabe & Castel, 2008). People tend to unconsciously believe that high-quality scientific images reflect high-quality science, as illustrated by work from Keehner, Mayberry, and Fischer (2011) showing that viewers rate articles with three-dimensional brain images as more scientific than those with 2D images, schematic drawings, or diagrams (see Fig. 11). Unintuitively, however, high-quality complex images can be detrimental to performance compared to simpler visualizations (Hegarty, Smallman, & Stull, 2012; St. John, Cowen, Smallman, & Oonk, 2001; Wilkening & Fabrikant, 2011). Hegarty et al. (2012) demonstrated that novice users prefer realistically depicted maps (see Fig. 12), even though these maps increased the time taken to complete the task and focused participants' attention on irrelevant information (Ancker, Senathirajah, Kukafka, & Starren, 2006; Brügger, Fabrikant, & Çöltekin, 2017; St. John et al., 2001; Wainer, Hambleton, & Meara, 1999; Wilkening & Fabrikant, 2011). Interestingly, professional meteorologists demonstrated the same biases as novice viewers (Hegarty et al., 2012) (see also Nadav-Greenberg, Joslyn, & Taing, 2008).

figure 11

Image showing participants’ ratings of three-dimensionality and scientific credibility for a given neuroimaging visualization, originally published in grayscale (Keehner et al., 2011 )

figure 12

Example stimuli from Hegarty et al. ( 2012 ) showing maps with varying levels of realism. Both novice viewers and meteorologists were tasked with selecting a visualization to use and performing a geospatial task. The panels correspond to the conditions in the original study

We argue that visual-spatial biases reflect a Type 1 process, occurring automatically with minimal working memory. Work by Sanchez and Wiley (2006) provides direct evidence for this assertion, using eye-tracking data to demonstrate that individuals with less working memory capacity attend to task-irrelevant images in a scientific article more than those with greater working memory capacity. The authors argue that we are naturally drawn to images (particularly high-quality depictions) and that significant working memory capacity is required to shift focus away from images that are task-irrelevant. The ease with which visualizations captivate our focus and direct our bottom-up attention to specific features likely increases the impact of these biases, which may be why some visual-spatial biases are notoriously difficult to override with working memory (see Belia et al., 2005; Boone, Gunalp, & Hegarty, in press; Joslyn & LeClerc, 2013; Newman & Scholl, 2012). We speculate that some visual-spatial biases are intertwined with bottom-up attention, occurring early in the decision-making process and influencing downstream processes (see our model in Fig. 4 for reference), making them particularly persistent.

Cognitive fit

We also observed a cross-domain finding involving Type 2 processing, which suggests that if there is a mismatch between the visualization and a decision-making component, working memory is used to perform corrective mental transformations. Cognitive fit is a term used to describe the correspondence between the visualization and the conceptual question or task (see our model for reference; for an overview of cognitive fit, see Vessey, Zhang, & Galletta, 2006). Researchers examining cognitive fit generally attempt to identify and reduce mismatches between the visualization and one of the decision-making components (see Table 5 for a breakdown of the decision-making components evaluated by the reviewed studies). When there is a mismatch produced by the default Type 1 processing, it is argued that significant working memory (Type 2 processing) is required to resolve the discrepancy via mental transformations (Vessey et al., 2006). As working memory is capacity-limited, the magnitude of the mental transformation, or the amount of working memory required, is one predictor of reaction times and errors.

Direct evidence for this claim comes from work demonstrating that cognitive fit differentially influenced the performance of individuals with more and less working memory capacity (Zhu & Watts, 2010). The task was to identify which two nodes in a social media network diagram should be removed to disconnect the maximal number of nodes. As predicted by cognitive fit theory, when the visualization did not facilitate the task (Fig. 13, left), participants with less working memory capacity were slower than those with more working memory capacity. However, when the visualization aligned with the task (Fig. 13, right), there was no difference in performance. This work suggests that when there is misalignment between the visualization and a decision-making process, people with more working memory capacity have the resources to resolve the conflict, while those with fewer resources show performance degradation (see Footnote 2). Other work found only a modest relationship between working memory capacity and correct interpretations of high- and low-temperature forecast visualizations (Grounds et al., 2017), which suggests that, for some visualizations, viewers utilize little working memory.

figure 13

Examples of social media network diagrams from Zhu and Watts ( 2010 ). The authors argue that the figure on the right is more aligned with the task of identifying the most interconnected nodes than the figure on the left
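The computational analogue of the Zhu and Watts (2010) task can be made concrete with a brute-force baseline: try every pair of nodes and count how many others each removal disconnects from the largest remaining component. The toy graph below is invented for illustration and is not the stimulus used in the original study.

    # Brute-force analogue of the Zhu and Watts (2010) task: which two nodes,
    # when removed, disconnect the most other nodes? The toy graph is invented.
    import itertools
    import networkx as nx

    g = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 4), (2, 5), (5, 6), (1, 7)])

    def nodes_disconnected(graph, pair):
        """Count nodes outside the largest component after removing 'pair'."""
        pruned = graph.copy()
        pruned.remove_nodes_from(pair)
        if pruned.number_of_nodes() == 0:
            return 0
        largest = max(nx.connected_components(pruned), key=len)
        return pruned.number_of_nodes() - len(largest)

    best_pair = max(itertools.combinations(g.nodes, 2),
                    key=lambda pair: nodes_disconnected(g, pair))
    print(best_pair, nodes_disconnected(g, best_pair))  # -> (1, 2) 4

The exhaustive search makes plain why the human version of the task is hard when the layout does not reveal connectivity: a viewer must mentally approximate this pairwise bookkeeping, which is exactly the kind of load that taxes working memory.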

As illustrated in our model, working memory can be recruited to aid all stages of the decision-making process except bottom-up attention. Work examining cognitive fit theory provides indirect evidence that working memory is required to resolve conflicts between the schema-matching process and a decision-making component. For example, one way that a mismatch between a viewer's mental schema and a visualization can arise is when the viewer uses a schema that is not optimal for the task. Tversky, Corter, Yu, Mason, and Nickerson (2012) primed participants to use different schemas by describing the connections in Fig. 14 in terms of either transfer speed or security levels. Participants then decided on the most efficient or secure route for information to travel between computer nodes, with a visualization that encoded the data using the thickness of connections, containment, or physical distance (see Fig. 14). Tversky et al. (2012) found that when the links were described in terms of information transfer speed, the thickness and distance visualizations were the most effective, suggesting that the speed mental schema was most closely matched to the thickness and distance visualizations, whereas the speed schema required mental transformations to align with the containment visualization. Similarly, the thickness and containment visualizations outperformed the distance visualization when the nodes were described as belonging to specific systems with different security levels. This work and others (Feeney, Hola, Liversedge, Findlay, & Metcalf, 2000; Gattis & Holyoak, 1996; Joslyn & LeClerc, 2013; Smelcer & Carmel, 1997) provide indirect evidence that gratuitous realignment between the mental schema and the visualization can be error-prone, and that visualization designers should work to reduce the number of transformations required in the decision-making process.

figure 14

Example of stimuli from Tversky et al. ( 2012 ) showing three types of encoding techniques for connections between nodes (thickness, containment, and distance). Participants were asked to select routes between nodes with different descriptions of the visualizations. Redrawn from “Representing category and continuum: Visualizing thought” by B. Tversky, J. Corter, L. Yu, D. Mason, and J. Nickerson. In Diagrams 2012 (p. 27), P. Cox, P. Rodgers, and B. Plimmer (Eds.), 2012, Berlin Heidelberg: Springer-Verlag

Researchers from multiple domains have also documented cases of misalignment between the task, or conceptual question, and the visualization. For example, Vessey and Galletta (1991) found that participants completed a financial task faster when the visualization they chose (graph or table; see Fig. 15) matched the task (spatial or textual). For the spatial task, participants decided which month had the greatest difference between deposits and withdrawals. The textual or symbolic tasks involved reporting specific deposit and withdrawal amounts for various months. The authors argued that when there is a mismatch between the task and the visualization, the additional transformation accounts for the increased time taken to complete the task (Vessey & Galletta, 1991) (see also Dennis & Carte, 1998; Huang et al., 2006); this transformation likely takes place in the inference process of our proposed model.

figure 15

Examples of stimuli from Vessey and Galletta (1991) depicting deposit and withdrawal amounts over the course of a year with a graph (a) and table (b). Participants completed either a spatial or a textual task with the chart or table. Redrawn from "Cognitive fit: An empirical study of information acquisition" by I. Vessey and D. Galletta, 1991, Information Systems Research, 2(1), 72–73. Copyright 1991 by "INFORMS"

The aforementioned studies provide direct (Zhu & Watts, 2010) and indirect (Dennis & Carte, 1998; Feeney et al., 2000; Gattis & Holyoak, 1996; Huang et al., 2006; Joslyn & LeClerc, 2013; Smelcer & Carmel, 1997; Tversky et al., 2012; Vessey & Galletta, 1991) evidence that Type 2 processing recruits working memory to resolve misalignments between decision-making processes and the visualization that arise from default Type 1 processing. These examples of Type 2 processing using working memory to perform effortful mental computations are consistent with the assertion of Evans and Stanovich (2013) that Type 2 processes enact goal-directed complex processing. However, it is not clear from the reviewed work exactly how the visualization and decision-making components are matched. Newman and Scholl (2012) propose that we match the schema and visualization based on similarities between salient visual features, although this proposal has not been tested. Further, work that assesses cognitive fit in terms of the visualization and task only examines alignment at the level of broad categories (i.e. spatial or semantic). Beyond these broad classifications, it is not clear how to predict whether a task and visualization are aligned. In sum, there is not yet a sufficient cross-disciplinary theory of how mental schemas and tasks are matched to visualizations. It is apparent from the reviewed work, however, that Type 2 processes (requiring working memory) can be recruited during the schema-matching and inference processes.

Either Type 1 and/or Type 2

Knowledge-driven processing

In a review of map-reading cognition, Lobben (2004) states, "…research should focus not only on the needs of the map reader but also on their map-reading skills and abilities" (p. 271). In line with this statement, the final cross-domain finding is that the effects of knowledge can interact with the affordances or biases inherent in the visualization method. Knowledge may be held temporarily in working memory (Type 2), held in long-term memory but effortfully applied (Type 2), or held in long-term memory but automatically applied (Type 1). As a result, knowledge-driven processing can involve either Type 1 or Type 2 processes.

Both short- and long-term knowledge can influence visualization affordances and biases. However, it is difficult to distinguish whether Type 2 processing is using significant working memory capacity to temporarily hold knowledge or whether participants have stored the relevant knowledge in long-term memory, making processing more automatic. Complicating the issue, knowledge stored in long-term memory can influence decision making with visualizations via both Type 1 and 2 processing. For example, if you try to remember the Pythagorean theorem, which you may have learned in middle or high school, you may recall that a² + b² = c², where c represents the length of the hypotenuse and a and b represent the lengths of the other two sides of a right triangle. Unless you use geometry regularly, you likely had to search strenuously through long-term memory for the equation, which is a Type 2 process requiring significant working memory capacity. In contrast, if you are asked to recall your childhood phone number, the number might come to mind automatically, with minimal working memory required (Type 1 processing).

In this section, we highlight cases where knowledge either influenced decision making with visualizations or was present but did not influence decisions (see Table 6 for the type of knowledge examined in each study). These studies are organized by how much time the viewers had to incorporate the knowledge (i.e. short-term instructions versus long-term individual differences in abilities and expertise), which may be indicative of where the knowledge is stored. However, many factors other than time influence the transfer of knowledge held in working memory into long-term memory. Therefore, each of the studies cited in this section could involve Type 1 processing, Type 2 processing, or both.

One example of participants using short-term knowledge to override a familiarity bias comes from work by Bailey, Carswell, Grant, and Basham (2007) (see also Shen, Carswell, Santhanam, & Bailey, 2012). In a complex geospatial task in which participants made judgments about terrorism threats, participants were more likely to select familiar map-like visualizations than ones that would be optimal for the task (see Fig. 16) (Bailey et al., 2007). Using the same task and visualizations, Shen et al. (2012) showed that users were more likely to choose an efficacious visualization when given training on the importance of cognitive fit and effective visualization techniques. In this case, viewers were able to use knowledge-driven processing to improve their performance. However, Joslyn and LeClerc (2013) found that when participants viewed temperature uncertainty visualized as error bars around a mean temperature prediction, they incorrectly believed that the error bars represented high and low temperatures. Surprisingly, participants maintained this belief despite a key that detailed the correct way to interpret each temperature forecast (see also Boone et al., in press). The authors speculated that the error bars might have matched viewers' mental schema for high- and low-temperature forecasts (stored in long-term memory), and that viewers incorrectly applied the high-/low-temperature schema rather than incorporating new information from the key. Additionally, the authors propose that because the error bars were visually represented as discrete values, viewers may have had difficulty reimagining the error bars as points on a distribution, which they term a deterministic construal error (Joslyn & LeClerc, 2013). Deterministic construal visual-spatial biases may also be one source of the misunderstanding of the Cone of Uncertainty (Padilla, Ruginski et al., 2017; Ruginski et al., 2016). A notable difference between these studies and the work of Shen et al. (2012) is that Shen et al. (2012) used instructions to correct a familiarity bias, a cognitive bias originally documented in the decision-making literature that is not based on the visual elements in the display. In contrast, the biases in Joslyn and LeClerc (2013) were visual-spatial biases. This provides further evidence that visual-spatial biases may be a unique category of biases that warrant dedicated exploration, as they are harder to influence with knowledge-driven processing.

figure 16

Examples of the different view orientations examined by Bailey et al. (2007). Participants selected one of these visualizations and then used their selection to make judgments, including identifying safe passageways, determining appropriate locations for firefighters, and identifying suspicious locations based on the height of buildings. The panels correspond to the conditions in the original study

Regarding longer-term knowledge, there is substantial evidence that individual differences in knowledge affect decision making with visualizations. For example, numerous studies document the benefit of visualizations for individuals with less health literacy, graph literacy, and numeracy (Galesic & Garcia-Retamero, 2011; Galesic, Garcia-Retamero, & Gigerenzer, 2009; Keller, Siegrist, & Visschers, 2009; Okan, Galesic, & Garcia-Retamero, 2015; Okan, Garcia-Retamero, Cokely, & Maldonado, 2012; Okan, Garcia-Retamero, Galesic, & Cokely, 2012; Reyna, Nelson, Han, & Dieckmann, 2009; Rodríguez et al., 2013). Visual depictions of health data are particularly useful because health data often take the form of probabilities, which are unintuitive. Visualizations inherently illustrate probabilities (e.g. 10%) as natural frequencies (e.g. 10 out of 100), which are more intuitive (Hoffrage & Gigerenzer, 1998). Further, by depicting natural frequencies visually (see the example in Fig. 17), viewers can make perceptual comparisons rather than mathematical calculations. This dual benefit is likely the reason visualizations produce facilitation for individuals with less health literacy, graph literacy, and numeracy.

figure 17

Example of stimuli used by Galesic et al. (2009) in a study demonstrating that natural frequency visualizations can help individuals overcome low numeracy. Participants completed three medical scenario tasks using visualizations similar to those depicted here, in which they were asked about the effects of aspirin on the risk of stroke or heart attack and about a hypothetical new drug. Redrawn from "Using icon arrays to communicate medical risks: overcoming low numeracy" by M. Galesic, R. Garcia-Retamero, and G. Gigerenzer, 2009, Health Psychology, 28(2), 210
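The probability-to-natural-frequency reframing that underlies icon arrays like Fig. 17 is simple to express in code. In the sketch below the risk values are invented for illustration, and the crude text rendering stands in for the graphical icon arrays used in the actual studies.

    # Sketch of the probability -> natural frequency reframing behind icon
    # arrays like Fig. 17. The risk values are invented for illustration.

    def as_natural_frequency(probability, denominator=100):
        return round(probability * denominator), denominator

    def text_icon_array(affected, total, per_row=10):
        """Crude text rendering: 'X' = affected person, 'o' = unaffected."""
        icons = "X" * affected + "o" * (total - affected)
        return "\n".join(icons[i:i + per_row] for i in range(0, total, per_row))

    risk_without_drug = 0.10   # hypothetical
    risk_with_drug = 0.06      # hypothetical

    for label, risk in [("without drug", risk_without_drug),
                        ("with drug", risk_with_drug)]:
        n, d = as_natural_frequency(risk)
        print(f"{label}: {n} out of {d}")
        print(text_icon_array(n, d))
        print()

Comparing the two arrays visually requires no arithmetic at all, which is the Type 1 shortcut that makes these displays helpful for viewers with less numeracy.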

These studies are good examples of how designers can create visualizations that capitalize on Type 1 processing to help viewers accurately make decisions with complex data even when they lack relevant knowledge. Based on the reviewed work, we speculate that well-designed visualizations that utilize Type 1 processing to intuitively illustrate task-relevant relationships in the data may be particularly beneficial for individuals with less numeracy and graph literacy, even for simple tasks. However, poorly designed visualizations that require superfluous mental transformations may be detrimental to the same individuals. Further, individual differences in expertise, such as graph literacy, which have received more attention in healthcare communication (Galesic & Garcia-Retamero, 2011 ; Nayak et al., 2016 ; Okan et al., 2015 ; Okan, Garcia-Retamero, Cokely, & Maldonado, 2012 ; Okan, Garcia-Retamero, Galesic, & Cokely, 2012 ; Rodríguez et al., 2013 ), may play a large role in how viewers complete even simple tasks in other domains such as map-reading (Kinkeldey et al., 2017 ).

Less consistent are findings on how more experienced users incorporate knowledge acquired over longer periods of time to make decisions with visualizations. Some research finds that students' decision-making and spatial abilities improved during a semester-long course on Geographic Information Science (GIS) (Lee & Bednarz, 2009). Other work finds that experts perform the same as novices (Riveiro, 2016), that experts can exhibit visual-spatial biases (St. John et al., 2001), and that experts perform more poorly than expected in their domain of visual expertise (Belia et al., 2005). This inconsistency may be due in part to the difficulty of identifying when and whether more experienced viewers are automatically applying their knowledge or employing working memory. For example, it is unclear whether the students in the GIS course documented by Lee and Bednarz (2009) developed automatic responses (Type 1) or learned the information and used working memory capacity to apply their training (Type 2).

Cheong et al. ( 2016 ) offer one way to gauge how performance may change when one is forced to use Type 1 processing, but then allowed to use Type 2 processing. In a wildfire task using multiple depictions of uncertainty (see Fig.  18 ), Cheong et al. ( 2016 ) found that the type of uncertainty visualization mattered when participants had to make fast Type 1 decisions (5 s) about evacuating from a wildfire. But when given sufficient time to make Type 2 decisions (30 s), participants were not influenced by the visualization technique (see also Wilkening & Fabrikant, 2011 ).

figure 18

Example of multiple uncertainty visualization techniques for wildfire risk by Cheong et al. ( 2016 ). Participants were presented with a house location (indicated by an X), and asked if they would stay or leave based on one of the wildfire hazard communication techniques shown here. The panels correspond to the conditions in the original study

Interesting future work could limit experts' time to complete a task (forcing Type 1 processing) and then determine whether their judgments change when given more time (allowing for Type 2 processing). To test this possibility further, a dual-task paradigm could be used in which experts' working memory capacity is depleted by a difficult secondary task that also requires working memory. Common secondary tasks in a dual-task paradigm are span tasks, which require participants to remember or follow patterns of information while completing the primary task and then report the remembered or relevant information from the pattern (for a full description of the theoretical bases of the dual-task paradigm, see Pashler, 1994). To our knowledge, only one study has used a dual-task paradigm to evaluate the cognitive load of a visualization decision-making task (Bandlow et al., 2011). However, a growing body of research in other domains, such as wayfinding and spatial cognition, demonstrates the utility of dual-task paradigms for understanding the types of working memory that users employ for a task (Caffò, Picucci, Di Masi, & Bosco, 2011; Meilinger, Knauff, & Bülthoff, 2008; Ratliff & Newcombe, 2005; Trueswell & Papafragou, 2010).

Secondary tasks may be spatial or verbal; examples include remembering the orientations of an arrow (which taxes visual-spatial memory; Shah & Miyake, 1996) or counting backward by threes (which taxes verbal processing and short-term memory; Castro, Strayer, Matzke, & Heathcote, 2018). One should expect more interference if the primary and secondary tasks recruit the same processes (i.e. a visual-spatial primary task paired with a visual-spatial memory span task). An example of such an experimental design is illustrated in Fig. 19. In the dual-task trial illustrated in Fig. 19, if participants' responses are as fast and accurate as in the baseline trial, then participants are likely not using significant amounts of working memory capacity for that task. If the task does require significant working memory capacity, then the inclusion of the secondary task should increase the time taken to complete the primary task and potentially produce errors in both the secondary and primary tasks. In visualization decision-making research, this is an open area of exploration for researchers and designers interested in understanding how working memory capacity and a dual-process account of decision making apply to their visualizations and application domains.

figure 19

A diagram of a dual-tasking experiment is shown using the same task as in Fig. 5 . Responses resulting from Type 1 and 2 processing are illustrated. The dual-task trial illustrates how to place additional load on working memory capacity by having the participant perform a demanding secondary task. The impact of the secondary task is illustrated for both time and accuracy. Long-term memory can influence all components and processes in the model either via pre-attentive processes or by conscious application of knowledge
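The predicted interference pattern can be illustrated with a toy simulation. All effect sizes below are invented, and the random draws merely stand in for trial-to-trial noise; a real study would use calibrated tasks, precise timing, and experiment software rather than this sketch.

    # Toy simulation of the predicted dual-task interference pattern. All
    # effect sizes are invented; a real study would use calibrated tasks.
    import random

    random.seed(1)

    def simulated_rt(uses_working_memory, dual_task, n=1000):
        """Mean reaction time (ms): interference appears only when the primary
        task relies on working memory AND a secondary span task is added."""
        base = 900 if uses_working_memory else 600
        interference = 400 if (uses_working_memory and dual_task) else 0
        return sum(random.gauss(base + interference, 50) for _ in range(n)) / n

    for label, wm in [("Type 1 (minimal WM)", False), ("Type 2 (heavy WM)", True)]:
        single = simulated_rt(wm, dual_task=False)
        dual = simulated_rt(wm, dual_task=True)
        print(f"{label}: single={single:.0f} ms, dual={dual:.0f} ms, "
              f"cost={dual - single:.0f} ms")

The diagnostic signature is the interaction: a large dual-task cost only for the condition that relies on working memory, mirroring the logic of Fig. 19.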

In sum, this section documents cases where knowledge-driven processing does and does not influence decision making with visualizations. Notably, we describe numerous studies in which well-designed visualizations (capitalizing on Type 1 processing) focus viewers' attention on task-relevant relationships in the data, improving decision accuracy for individuals with less developed health literacy, graph literacy, and numeracy. However, the current work does not test how knowledge-driven processing maps onto the dual-process model of decision making. Knowledge may be held temporarily in working memory (Type 2), held in long-term memory but effortfully applied (Type 2), or held in long-term memory but automatically applied (Type 1). More work is needed to understand whether a dual-process account of decision making accurately describes the influence of knowledge-driven processing on decision making with visualizations. Finally, we detailed an example of a dual-task paradigm as one way to evaluate whether viewers are employing Type 1 processing.

Review summary

Throughout this review, we have provided direct and indirect evidence that a dual-process account of decision making effectively describes prior findings from the numerous domains interested in visualization decision making. The reviewed work provides support for specific processes in our proposed model, including the influences of working memory, bottom-up attention, schema matching, inference processes, and decision making. Further, we identified key commonalities in the reviewed work relating to Type 1 and Type 2 processing, which we incorporated into our proposed visualization decision-making model. The first is that, via Type 1 processing, visualizations direct participants' bottom-up attention to specific information, which can be either beneficial or detrimental for decision making (Fabrikant et al., 2010; Fagerlin et al., 2005; Hegarty et al., 2010; Hegarty et al., 2016; Padilla, Ruginski et al., 2017; Ruginski et al., 2016; Schirillo & Stone, 2005; Stone et al., 1997; Stone et al., 2003; Waters et al., 2007). Consistent with assertions from cognitive science and scientific visualization (Munzner, 2014), we propose that visualization designers should identify the critical information needed for a task and use a visual encoding technique that directs participants' attention to this information. We encourage visualization designers who are interested in determining which elements of their visualizations will likely attract viewers' bottom-up attention to consult the Itti et al. (1998) saliency model, which has been validated with eye-tracking measures (for an implementation of this model along with Matlab code, see Padilla, Ruginski et al., 2017). If deliberate effort is not made to capitalize on Type 1 processing by focusing the viewer's attention on task-relevant information, then the viewer will likely focus on distractors via Type 1 processing, resulting in poor decision outcomes.

A second cross-domain finding is the introduction of a new concept, visual-spatial biases, which can also be both beneficial and detrimental to decision making. We define this term as a heuristic-eliciting bias that is a direct result of the visual encoding technique, and we provide numerous examples of such biases across domains. The novel utility of identifying visual-spatial biases is that they potentially arise early in the decision-making process, during bottom-up attention, thus influencing the entire downstream process, whereas standard heuristics do not exclusively occur at the first stage of decision making. This possibly accounts for the fact that visual-spatial biases have proven difficult to overcome (Belia et al., 2005; Grounds et al., 2017; Joslyn & LeClerc, 2013; Liu et al., 2016; McKenzie et al., 2016; Newman & Scholl, 2012; Padilla, Ruginski et al., 2017; Ruginski et al., 2016). Work by Tversky (2011) presents a taxonomy of visual-spatial communications that are intrinsically related to thought, which are likely the bases for visual-spatial biases.

We have also revealed cross-domain findings involving Type 2 processing, which suggest that if there is a mismatch between the visualization and a decision-making component, working memory is used to perform corrective mental transformations. In scenarios where the visualization is aligned with both the mental schema and the task, performance is fast and accurate (Joslyn & LeClerc, 2013). The types of mismatches observed in the reviewed literature are likely both domain-specific and domain-general. For example, situations where viewers employ the correct graph schema for the visualization, but the graph schema does not align with the task, are likely domain-specific (Dennis & Carte, 1998; Frownfelter-Lohrke, 1998; Gattis & Holyoak, 1996; Huang et al., 2006; Joslyn & LeClerc, 2013; Smelcer & Carmel, 1997; Tversky et al., 2012). However, other work demonstrates cases where viewers employ a graph schema that does not match the visualization, which is likely domain-general (e.g. Feeney et al., 2000; Gattis & Holyoak, 1996; Tversky et al., 2012). In these cases, viewers could accidentally use the wrong graph schema because it appears to match the visualization, or they might not have learned a relevant schema. The likelihood of viewers making attribution errors because they do not know the corresponding schema increases when the visualization is less common, such as with uncertainty visualizations. When there is a mismatch, additional working memory is required, resulting in longer task completion times and, in some cases, errors (e.g. Joslyn & LeClerc, 2013; McKenzie et al., 2016; Padilla, Ruginski et al., 2017). Based on these findings, we recommend that visualization designers aim to create visualizations that most closely align with a viewer’s mental schema and task. However, additional empirical research is required to understand the nature of the alignment processes, including the exact method by which we mentally select a schema and the classifications of tasks that match visualizations.

The final cross-domain finding is that knowledge-driven processes can interact with, or override, the effects of visualization methods. We find that both short-term knowledge (Dennis & Carte, 1998; Feeney et al., 2000; Gattis & Holyoak, 1996; Joslyn & LeClerc, 2013; Smelcer & Carmel, 1997; Tversky et al., 2012) and long-term knowledge acquisition (Shen et al., 2012) can influence decision making with visualizations. However, there are also examples of knowledge having little influence on decisions, even when prior knowledge could be used to improve performance (Galesic et al., 2009; Galesic & Garcia-Retamero, 2011; Keller et al., 2009; Lee & Bednarz, 2009; Okan et al., 2015; Okan, Garcia-Retamero, Cokely, & Maldonado, 2012; Okan, Garcia-Retamero, Galesic, & Cokely, 2012; Reyna et al., 2009; Rodríguez et al., 2013). We point out that prior knowledge seems to have more of an effect on non-visual-spatial biases, such as a familiarity bias (Belia et al., 2005; Joslyn & LeClerc, 2013; Riveiro, 2016; St. John et al., 2001), which suggests that visual-spatial biases may be closely related to bottom-up attention. Further, it is unclear from the reviewed work when the application of knowledge switches from relying on working memory capacity to being automatic. We argue that Type 1 and Type 2 processing have unique advantages and disadvantages for visualization decision making. Therefore, it is valuable to understand which process users are applying for specific tasks in order to make visualizations that elicit optimal performance. In the case of experts and long-term knowledge, we propose that one interesting way to test whether users are utilizing significant working memory capacity is to employ a dual-task paradigm (illustrated in Fig. 19). A dual-task paradigm can be used to evaluate the amount of working memory required and to compare the relative working memory demands of competing visualization techniques.
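
To make the dual-task logic concrete, below is a minimal, hypothetical analysis sketch. The trial structure, field names, and numbers are illustrative assumptions rather than data or code from any reviewed study; a real experiment would involve many trials per participant and appropriate statistical tests.

```python
# Hypothetical dual-task analysis sketch (illustrative values only).
# Logic: a large performance cost under secondary-task load suggests the
# primary visualization task draws on working memory (Type 2); little or
# no cost is consistent with largely automatic Type 1 processing.
from statistics import mean

trials = [
    {"condition": "single", "rt": 1.10, "correct": True},
    {"condition": "single", "rt": 1.25, "correct": True},
    {"condition": "dual",   "rt": 1.90, "correct": True},
    {"condition": "dual",   "rt": 2.40, "correct": False},
    # ...many more trials per participant in a real experiment
]

def summarize(condition):
    """Mean response time and accuracy for one condition."""
    subset = [t for t in trials if t["condition"] == condition]
    return mean(t["rt"] for t in subset), mean(t["correct"] for t in subset)

single_rt, single_acc = summarize("single")
dual_rt, dual_acc = summarize("dual")

# Dual-task cost: the performance decrement attributable to the working
# memory load imposed by the secondary task.
print(f"RT cost:       {dual_rt - single_rt:+.2f} s")
print(f"Accuracy cost: {dual_acc - single_acc:+.2%}")
```

Computing these costs for two competing visualization techniques gives a relative estimate of the working memory each demands.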

We have also proposed a variety of practical recommendations for visualization designers based on the empirical findings and our cognitive framework. Below is a summary list of our recommendations, along with the relevant sections for reference:

Identify the critical information needed for a task and use a visual encoding technique that directs participants’ attention to this information (see the “Bottom-up attention” section);

To determine which elements in a visualization will likely attract viewers’ bottom-up attention, try employing a saliency algorithm (see Padilla, Quinan, et al., 2017, and the “Bottom-up attention” section);

Aim to create visualizations that most closely align with a viewer’s mental schema and task demands (see the “Visual-Spatial Biases” section);

Work to reduce the number of transformations required in the decision-making process (see the “Cognitive fit” section);

To understand whether a viewer is using Type 1 or Type 2 processing, employ a dual-task paradigm (see Fig. 19);

Consider evaluating the impact of individual differences such as graph literacy and numeracy on visualization decision making.

Conclusions

We use visual information to inform many important decisions. To develop visualizations that account for real-life decision making, we must understand how and why we come to conclusions with visual information. We propose a dual-process cognitive framework, expanding on visualization comprehension theory and supported by empirical studies, that describes the process of decision making with visualizations. We offer practical recommendations for visualization designers that take into account human decision-making processes. Finally, we propose a new avenue of research focused on the influence of visual-spatial biases on decision making.

Change history

02 September 2018

The original article (Padilla et al., 2018) contained a formatting error in Table 2; this has now been corrected with the appropriate boxes marked clearly.

Dual-process theory will be described in greater detail in the next section.

It should be noted that in some cases the activation of Type 2 processing should improve decision accuracy. More research is needed that examines cases where Type 2 could improve decision performance with visualizations.

Ancker, J. S., Senathirajah, Y., Kukafka, R., & Starren, J. B. (2006). Design features of graphs in health risk communication: A systematic review. Journal of the American Medical Informatics Association , 13 (6), 608–618.

Baddeley, A. D., & Hitch, G. (1974). Working memory. Psychology of Learning and Motivation , 8 , 47–89.

Bailey, K., Carswell, C. M., Grant, R., & Basham, L. (2007). Geospatial perspective-taking: how well do decision makers choose their views? In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 51, No. 18, pp. 1246–1248). Los Angeles: SAGE Publications.

Balleine, B. W. (2007). The neural basis of choice and decision making. Journal of Neuroscience , 27 (31), 8159–8160.

Bandlow, A., Matzen, L. E., Cole, K. S., Dornburg, C. C., Geiseler, C. J., Greenfield, J. A., … Stevens-Adams, S. M. (2011). Evaluating Information Visualizations with Working Memory Metrics. In HCI International 2011–Posters’ Extended Abstracts , (pp. 265–269).

Belia, S., Fidler, F., Williams, J., & Cumming, G. (2005). Researchers misunderstand confidence intervals and standard error bars. Psychological Methods , 10 (4), 389.

Bertin, J. (1983). Semiology of graphics: Diagrams, networks, maps. Madison: University of Wisconsin Press.

Boone, A., Gunalp, P., & Hegarty, M. (in press). Explicit versus Actionable Knowledge: The Influence of Explaining Graphical Conventions on Interpretation of Hurricane Forecast Visualizations. Journal of Experimental Psychology: Applied .

Brügger, A., Fabrikant, S. I., & Çöltekin, A. (2017). An empirical evaluation of three elevation change symbolization methods along routes in bicycle maps. Cartography and Geographic Information Science , 44 (5), 436–451.

Caffò, A. O., Picucci, L., Di Masi, M. N., & Bosco, A. (2011). Working memory components and virtual reorientation: A dual-task study. In Working memory: capacity, developments and improvement techniques , (pp. 249–266). Hauppage: Nova Science Publishers.

Card, S. K., Mackinlay, J. D., & Shneiderman, B. (1999). Readings in information visualization: using vision to think .  San Francisco: Morgan Kaufmann Publishers Inc.

Castro, S. C., Strayer, D. L., Matzke, D., & Heathcote, A. (2018). Cognitive Workload Measurement and Modeling Under Divided Attention. Journal of Experimental Psychology: General .

Cheong, L., Bleisch, S., Kealy, A., Tolhurst, K., Wilkening, T., & Duckham, M. (2016). Evaluating the impact of visualization of wildfire hazard upon decision-making under uncertainty. International Journal of Geographical Information Science , 30 (7), 1377–1404.

Connor, C. E., Egeth, H. E., & Yantis, S. (2004). Visual attention: Bottom-up versus top-down. Current Biology , 14 (19), R850–R852.

Cowan, N. (2017). The many faces of working memory and short-term storage. Psychonomic Bulletin & Review , 24 (4), 1158–1170.

Dennis, A. R., & Carte, T. A. (1998). Using geographical information systems for decision making: Extending cognitive fit theory to map-based presentations. Information Systems Research , 9 (2), 194–203.

Engel, A. K., Fries, P., & Singer, W. (2001). Dynamic predictions: Oscillations and synchrony in top–down processing. Nature Reviews Neuroscience , 2 (10), 704–716.

Engle, R. W., Kane, M. J., & Tuholski, S. W. (1999). Individual differences in working memory capacity and what they tell us about controlled attention, general fluid intelligence, and functions of the prefrontal cortex. In A. Miyake & P. Shah (Eds.), Models of working memory: Mechanisms of active maintenance and executive control (pp. 102–134). New York: Cambridge University Press.

Epstein, S., Pacini, R., Denes-Raj, V., & Heier, H. (1996). Individual differences in intuitive–experiential and analytical–rational thinking styles. Journal of Personality and Social Psychology , 71 (2), 390.

Evans, J. S. B. (2008). Dual-processing accounts of reasoning, judgment, and social cognition. Annual Review of Psychology , 59 , 255–278.

Evans, J. S. B., & Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science , 8 (3), 223–241.

Fabrikant, S. I., Hespanha, S. R., & Hegarty, M. (2010). Cognitively inspired and perceptually salient graphic displays for efficient spatial inference making. Annals of the Association of American Geographers , 100 (1), 13–29.

Fabrikant, S. I., & Skupin, A. (2005). Cognitively plausible information visualization. In Exploring geovisualization , (pp. 667–690). Oxford: Elsevier.

Fagerlin, A., Wang, C., & Ubel, P. A. (2005). Reducing the influence of anecdotal reasoning on people’s health care decisions: Is a picture worth a thousand statistics? Medical Decision Making , 25 (4), 398–405.

Feeney, A., Hola, A. K. W., Liversedge, S. P., Findlay, J. M., & Metcalf, R. (2000). How people extract information from graphs: Evidence from a sentence-graph verification paradigm. ​In  International Conference on Theory and Application of Diagrams  (pp. 149-161). Berlin, Heidelberg: Springer.

Frownfelter-Lohrke, C. (1998). The effects of differing information presentations of general purpose financial statements on users’ decisions. Journal of Information Systems , 12 (2), 99–107.

Galesic, M., & Garcia-Retamero, R. (2011). Graph literacy: A cross-cultural comparison. Medical Decision Making , 31 (3), 444–457.

Galesic, M., Garcia-Retamero, R., & Gigerenzer, G. (2009). Using icon arrays to communicate medical risks: Overcoming low numeracy. Health Psychology , 28 (2), 210.

Garcia-Retamero, R., & Galesic, M. (2009). Trust in healthcare. In Kattan (Ed.), Encyclopedia of medical decision making , (pp. 1153–1155). Thousand Oaks: SAGE Publications.

Gattis, M., & Holyoak, K. J. (1996). Mapping conceptual to spatial relations in visual reasoning. Journal of Experimental Psychology: Learning, Memory, and Cognition , 22 (1), 231.

Gigerenzer, G., & Gaissmaier, W. (2011). Heuristic decision making. Annual Review of Psychology , 62 , 451–482.

Gigerenzer, G., Todd, P. M., & ABC Research Group (2000). Simple heuristics that make us smart. Oxford: Oxford University Press.

Grounds, M. A., Joslyn, S., & Otsuka, K. (2017). Probabilistic interval forecasts: An individual differences approach to understanding forecast communication. Advances in Meteorology , 2017,  1-18.

Harel, J. (2012, July 24). A Saliency Implementation in MATLAB. Retrieved from http://www.vision.caltech.edu/~harel/share/gbvs.php

Hegarty, M. (2011). The cognitive science of visual-spatial displays: Implications for design. Topics in Cognitive Science , 3 (3), 446–474.

Hegarty, M., Canham, M. S., & Fabrikant, S. I. (2010). Thinking about the weather: How display salience and knowledge affect performance in a graphic inference task. Journal of Experimental Psychology: Learning, Memory, and Cognition , 36 (1), 37.

Hegarty, M., Friedman, A., Boone, A. P., & Barrett, T. J. (2016). Where are you? The effect of uncertainty and its visual representation on location judgments in GPS-like displays. Journal of Experimental Psychology: Applied , 22 (4), 381.

Hegarty, M., Smallman, H. S., & Stull, A. T. (2012). Choosing and using geospatial displays: Effects of design on performance and metacognition. Journal of Experimental Psychology: Applied , 18 (1), 1.

Hoffrage, U., & Gigerenzer, G. (1998). Using natural frequencies to improve diagnostic inferences. Academic Medicine , 73 (5), 538–540.

Hollands, J. G., & Spence, I. (1992). Judgments of change and proportion in graphical perception. Human Factors: The Journal of the Human Factors and Ergonomics Society , 34 (3), 313–334.

Huang, Z., Chen, H., Guo, F., Xu, J. J., Wu, S., & Chen, W.-H. (2006). Expertise visualization: An implementation and study based on cognitive fit theory. Decision Support Systems , 42 (3), 1539–1557.

Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence , 20 (11), 1254–1259.

Joslyn, S., & LeClerc, J. (2013). Decisions with uncertainty: The glass half full. Current Directions in Psychological Science , 22 (4), 308–315.

Kahneman, D. (2011). Thinking, fast and slow . (Vol. 1). New York: Farrar, Straus and Giroux.

Kahneman, D., & Frederick, S. (2002). Representativeness revisited: Attribute substitution in intuitive judgment. In Heuristics and biases: The psychology of intuitive judgment , (p. 49).

Kahneman, D., & Tversky, A. (1982). Judgment under Uncertainty: Heuristics and Biases , (1st ed., ). Cambridge; NY: Cambridge University Press.

Kane, M. J., Bleckley, M. K., Conway, A. R. A., & Engle, R. W. (2001). A controlled-attention view of working-memory capacity. Journal of Experimental Psychology: General , 130 (2), 169.

Keehner, M., Mayberry, L., & Fischer, M. H. (2011). Different clues from different views: The role of image format in public perceptions of neuroimaging results. Psychonomic Bulletin & Review , 18 (2), 422–428.

Keller, C., Siegrist, M., & Visschers, V. (2009). Effect of risk ladder format on risk perception in high-and low-numerate individuals. Risk Analysis , 29 (9), 1255–1264.

Keren, G., & Schul, Y. (2009). Two is not always better than one: A critical evaluation of two-system theories. Perspectives on Psychological Science , 4 (6), 533–550.

Kinkeldey, C., MacEachren, A. M., Riveiro, M., & Schiewe, J. (2017). Evaluating the effect of visually represented geodata uncertainty on decision-making: Systematic review, lessons learned, and recommendations. Cartography and Geographic Information Science , 44 (1), 1–21. https://doi.org/10.1080/15230406.2015.1089792 .

Kinkeldey, C., MacEachren, A. M., & Schiewe, J. (2014). How to assess visual communication of uncertainty? A systematic review of geospatial uncertainty visualisation user studies. The Cartographic Journal , 51 (4), 372–386.

Kriz, S., & Hegarty, M. (2007). Top-down and bottom-up influences on learning from animations. International Journal of Human-Computer Studies , 65 (11), 911–930.

Kunz, V. (2004). Rational choice . Frankfurt: Campus Verlag.

Lallanilla, M. (2014, April 24). Misleading Gun-Death Chart Draws Fire. LiveScience. Retrieved from https://www.livescience.com/45083-misleading-gun-death-chart.html

Lee, J., & Bednarz, R. (2009). Effect of GIS learning on spatial thinking. Journal of Geography in Higher Education , 33 (2), 183–198.

Liu, L., Boone, A., Ruginski, I., Padilla, L., Hegarty, M., Creem-Regehr, S. H., … House, D. H. (2016). Uncertainty visualization by representative sampling from prediction ensembles. IEEE Transactions on Visualization and Computer Graphics, 23(9), 2165–2178.

Lobben, A. K. (2004). Tasks, strategies, and cognitive processes associated with navigational map reading: A review perspective. The Professional Geographer , 56 (2), 270–281.

Lohse, G. L. (1993). A cognitive model for understanding graphical perception. Human Computer Interaction , 8 (4), 353–388.

Lohse, G. L. (1997). The role of working memory on graphical information processing. Behaviour & Information Technology , 16 (6), 297–308.

Marewski, J. N., & Gigerenzer, G. (2012). Heuristic decision making in medicine. Dialogues in Clinical Neuroscience , 14 (1), 77–89.

McCabe, D. P., & Castel, A. D. (2008). Seeing is believing: The effect of brain images on judgments of scientific reasoning. Cognition , 107 (1), 343–352.

McKenzie, G., Hegarty, M., Barrett, T., & Goodchild, M. (2016). Assessing the effectiveness of different visualizations for judgments of positional uncertainty. International Journal of Geographical Information Science , 30 (2), 221–239.

Mechelli, A., Price, C. J., Friston, K. J., & Ishai, A. (2004). Where bottom-up meets top-down: Neuronal interactions during perception and imagery. Cerebral Cortex , 14 (11), 1256–1265.

Meilinger, T., Knauff, M., & Bülthoff, H. H. (2008). Working memory in wayfinding—A dual task experiment in a virtual city. Cognitive Science , 32 (4), 755–770.

Meyer, J. (2000). Performance with tables and graphs: Effects of training and a visual search model. Ergonomics , 43 (11), 1840–1865.

Munzner, T. (2014). Visualization analysis and design . Boca Raton, FL: CRC Press.

Nadav-Greenberg, L., Joslyn, S. L., & Taing, M. U. (2008). The effect of uncertainty visualizations on decision making in weather forecasting. Journal of Cognitive Engineering and Decision Making , 2 (1), 24–47.

Nayak, J. G., Hartzler, A. L., Macleod, L. C., Izard, J. P., Dalkin, B. M., & Gore, J. L. (2016). Relevance of graph literacy in the development of patient-centered communication tools. Patient Education and Counseling , 99 (3), 448–454.

Newman, G. E., & Scholl, B. J. (2012). Bar graphs depicting averages are perceptually misinterpreted: The within-the-bar bias. Psychonomic Bulletin & Review , 19 (4), 601–607. https://doi.org/10.3758/s13423-012-0247-5 .

Okan, Y., Galesic, M., & Garcia-Retamero, R. (2015). How people with low and high graph literacy process health graphs: Evidence from eye-tracking. Journal of Behavioral Decision Making .

Okan, Y., Garcia-Retamero, R., Cokely, E. T., & Maldonado, A. (2012). Individual differences in graph literacy: Overcoming denominator neglect in risk comprehension. Journal of Behavioral Decision Making , 25 (4), 390–401.

Okan, Y., Garcia-Retamero, R., Galesic, M., & Cokely, E. T. (2012). When higher bars are not larger quantities: On individual differences in the use of spatial information in graph comprehension. Spatial Cognition and Computation , 12 (2–3), 195–218.

Padilla, L., Hansen, G., Ruginski, I. T., Kramer, H. S., Thompson, W. B., & Creem-Regehr, S. H. (2015). The influence of different graphical displays on nonexpert decision making under uncertainty. Journal of Experimental Psychology: Applied , 21 (1), 37.

Padilla, L., Quinan, P. S., Meyer, M., & Creem-Regehr, S. H. (2017). Evaluating the impact of binning 2d scalar fields. IEEE Transactions on Visualization and Computer Graphics , 23 (1), 431–440.

Padilla, L., Ruginski, I. T., & Creem-Regehr, S. H. (2017). Effects of ensemble and summary displays on interpretations of geospatial uncertainty data. Cognitive Research: Principles and Implications , 2 (1), 40.

Pashler, H. (1994). Dual-task interference in simple tasks: Data and theory. Psychological Bulletin , 116 (2), 220.

Patterson, R. E., Blaha, L. M., Grinstein, G. G., Liggett, K. K., Kaveney, D. E., Sheldon, K. C., … Moore, J. A. (2014). A human cognition framework for information visualization. Computers & Graphics , 42 , 42–58.

Pinker, S. (1990). A theory of graph comprehension. In Artificial intelligence and the future of testing , (pp. 73–126).

Ratliff, K. R., & Newcombe, N. S. (2005). Human spatial reorientation using dual task paradigms . Paper presented at the Proceedings of the Annual Cognitive Science Society.

Reyna, V. F., Nelson, W. L., Han, P. K., & Dieckmann, N. F. (2009). How numeracy influences risk comprehension and medical decision making. Psychological Bulletin , 135 (6), 943.

Riveiro, M. (2016). Visually supported reasoning under uncertain conditions: Effects of domain expertise on air traffic risk assessment. Spatial Cognition and Computation , 16 (2), 133–153.

Rodríguez, V., Andrade, A. D., García-Retamero, R., Anam, R., Rodríguez, R., Lisigurski, M., … Ruiz, J. G. (2013). Health literacy, numeracy, and graphical literacy among veterans in primary care and their effect on shared decision making and trust in physicians. Journal of Health Communication , 18 (sup1), 273–289.

Rosenholtz, R., & Jin, Z. (2005). A computational form of the statistical saliency model for visual search. Journal of Vision , 5 (8), 777–777.

Ruginski, I. T., Boone, A. P., Padilla, L., Liu, L., Heydari, N., Kramer, H. S., … Creem-Regehr, S. H. (2016). Non-expert interpretations of hurricane forecast uncertainty visualizations. Spatial Cognition and Computation , 16 (2), 154–172.

Sanchez, C. A., & Wiley, J. (2006). An examination of the seductive details effect in terms of working memory capacity. Memory & Cognition , 34 (2), 344–355.

Schirillo, J. A., & Stone, E. R. (2005). The greater ability of graphical versus numerical displays to increase risk avoidance involves a common mechanism. Risk Analysis , 25 (3), 555–566.

Shah, P., & Freedman, E. G. (2011). Bar and line graph comprehension: An interaction of top-down and bottom-up processes. Topics in Cognitive Science , 3 (3), 560–578.

Shah, P., Freedman, E. G., & Vekiri, I. (2005). The comprehension of quantitative information in graphical displays. In P. Shah & A. Miyake (Eds.), The Cambridge Handbook of Visuospatial Thinking (pp. 426–476). New York: Cambridge University Press.

Shah, P., & Miyake, A. (1996). The separability of working memory resources for spatial thinking and language processing: An individual differences approach. Journal of Experimental Psychology: General , 125 (1), 4.

Shen, M., Carswell, M., Santhanam, R., & Bailey, K. (2012). Emergency management information systems: Could decision makers be supported in choosing display formats? Decision Support Systems , 52 (2), 318–330.

Shipstead, Z., Harrison, T. L., & Engle, R. W. (2015). Working memory capacity and the scope and control of attention. Attention, Perception, & Psychophysics , 77 (6), 1863–1880.

Simkin, D., & Hastie, R. (1987). An information-processing analysis of graph perception. Journal of the American Statistical Association , 82 (398), 454–465.

Sloman, S. A. (2002). Two systems of reasoning. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.), Heuristics and biases: The psychology of intuitive judgment (pp. 379–396). New York: Cambridge University Press.

Smelcer, J. B., & Carmel, E. (1997). The effectiveness of different representations for managerial problem solving: Comparing tables and maps. Decision Sciences , 28 (2), 391.

St. John, M., Cowen, M. B., Smallman, H. S., & Oonk, H. M. (2001). The use of 2D and 3D displays for shape-understanding versus relative-position tasks. Human Factors , 43 (1), 79–98.

Stanovich, K. E. (1999). Who is rational? Studies of individual differences in reasoning . New York City: Psychology Press.

Stenning, K., & Oberlander, J. (1995). A cognitive theory of graphical and linguistic reasoning: Logic and implementation. Cognitive Science , 19 (1), 97–140.

Stone, E. R., Sieck, W. R., Bull, B. E., Yates, J. F., Parks, S. C., & Rush, C. J. (2003). Foreground: Background salience: Explaining the effects of graphical displays on risk avoidance. Organizational Behavior and Human Decision Processes , 90 (1), 19–36.

Stone, E. R., Yates, J. F., & Parker, A. M. (1997). Effects of numerical and graphical displays on professed risk-taking behavior. Journal of Experimental Psychology: Applied , 3 (4), 243.

Trueswell, J. C., & Papafragou, A. (2010). Perceiving and remembering events cross-linguistically: Evidence from dual-task paradigms. Journal of Memory and Language , 63 (1), 64–82.

Tversky, B. (2005). Visuospatial reasoning. In K. Holyoak and R. G. Morrison (eds.), The Cambridge Handbook of Thinking and Reasoning , (pp. 209-240). Cambridge: Cambridge University Press.

Tversky, B. (2011). Visualizing thought. Topics in Cognitive Science , 3 (3), 499–535.

Tversky, B., Corter, J. E., Yu, L., Mason, D. L., & Nickerson, J. V. (2012). Representing Category and Continuum: Visualizing Thought . Paper presented at the International Conference on Theory and Application of Diagrams, Berlin, Heidelberg.

Vessey, I., & Galletta, D. (1991). Cognitive fit: An empirical study of information acquisition. Information Systems Research , 2 (1), 63–84.

Vessey, I., Zhang, P., & Galletta, D. (2006). The theory of cognitive fit. In Human-computer interaction and management information systems: Foundations , (pp. 141–183).

Von Neumann, J., & Morgenstern, O. (1944). Theory of games and economic behavior. Princeton, NJ: Princeton University Press.

Vranas, P. B. M. (2000). Gigerenzer's normative critique of Kahneman and Tversky. Cognition , 76 (3), 179–193.

Wainer, H., Hambleton, R. K., & Meara, K. (1999). Alternative displays for communicating NAEP results: A redesign and validity study. Journal of Educational Measurement , 36 (4), 301–335.

Waters, E. A., Weinstein, N. D., Colditz, G. A., & Emmons, K. (2006). Formats for improving risk communication in medical tradeoff decisions. Journal of Health Communication , 11 (2), 167–182.

Waters, E. A., Weinstein, N. D., Colditz, G. A., & Emmons, K. M. (2007). Reducing aversion to side effects in preventive medical treatment decisions. Journal of Experimental Psychology: Applied , 13 (1), 11.

Wilkening, J., & Fabrikant, S. I. (2011). How do decision time and realism affect map-based decision making? Paper presented at the International Conference on Spatial Information Theory.

Zhu, B., & Watts, S. A. (2010). Visualization of network concepts: The impact of working memory capacity differences. Information Systems Research , 21 (2), 327–344.

This research is based upon work supported by the National Science Foundation under Grants 1212806, 1810498, and 1212577.

Availability of data and materials

No data were collected for this review.

Author information

Authors and Affiliations

Northwestern University, Evanston, USA

Lace M. Padilla

Department of Psychology, University of Utah, 380 S. 1530 E., Room 502, Salt Lake City, UT, 84112, USA

Lace M. Padilla, Sarah H. Creem-Regehr & Jeanine K. Stefanucci

Department of Psychology, University of California–Santa Barbara, Santa Barbara, USA

Mary Hegarty

Contributions

LMP is the primary author of this study; she was central to the development, writing, and conclusions of this work. SHC, MH, and JS contributed to the theoretical development and manuscript preparation. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Lace M. Padilla .

Ethics declarations

Authors’ information

LMP is a Ph.D. student at the University of Utah in the Cognitive Neural Science department. LMP is a member of the Visual Perception and Spatial Cognition Research Group directed by Sarah Creem-Regehr, Ph.D., Jeanine Stefanucci, Ph.D., and William Thompson, Ph.D. Her work focuses on graphical cognition, decision making with visualizations, and visual perception. She works on large interdisciplinary projects with visualization scientists and anthropologists.

SHC is a Professor in the Psychology Department of the University of Utah. She received her MA and Ph.D. in Psychology from the University of Virginia. Her research serves joint goals of developing theories of perception-action processing mechanisms and applying these theories to relevant real-world problems in order to facilitate observers’ understanding of their spatial environments. In particular, her interests are in space perception, spatial cognition, embodied cognition, and virtual environments. She co-authored the book Visual Perception from a Computer Graphics Perspective ; previously, she was Associate Editor of Psychonomic Bulletin & Review and Experimental Psychology: Human Perception and Performance .

MH is a Professor in the Department of Psychological & Brain Sciences at the University of California, Santa Barbara. She received her Ph.D. in Psychology from Carnegie Mellon University. Her research is concerned with spatial cognition, broadly defined, and includes research on small-scale spatial abilities (e.g. mental rotation and perspective taking), large-scale spatial abilities involved in navigation, comprehension of graphics, and the role of spatial cognition in STEM learning. She served as chair of the governing board of the Cognitive Science Society and is associate editor of Topics in Cognitive Science and past Associate Editor of Journal of Experimental Psychology: Applied .

JS is an Associate Professor in the Psychology Department at the University of Utah. She received her M.A. and Ph.D. in Psychology from the University of Virginia. Her research focuses on better understanding if a person’s bodily states, whether emotional, physiological, or physical, affects their spatial perception and cognition. She conducts this research in natural settings (outdoor or indoor) and in virtual environments. This work is inherently interdisciplinary given it spans research on emotion, health, spatial perception and cognition, and virtual environments. She is on the editorial boards for the Journal of Experimental Psychology: General and Virtual Environments: Frontiers in Robotics and AI . She also co-authored the book Visual Perception from a Computer Graphics Perspective .

Ethics approval and consent to participate

The research reported in this paper was conducted in adherence to the Declaration of Helsinki and received IRB approval from the University of Utah, #IRB_00057678. No human subject data were collected for this work; therefore, no consent to participate was acquired.

Consent for publication

Consent to publish was not required for this review.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional information

The original version of this article has been revised. Table 2 was corrected to be presented appropriately.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Reprints and permissions

About this article

Cite this article

Padilla, L.M., Creem-Regehr, S.H., Hegarty, M. et al. Decision making with visualizations: a cognitive framework across disciplines. Cogn. Research 3 , 29 (2018). https://doi.org/10.1186/s41235-018-0120-9

Received : 20 September 2017

Accepted : 05 June 2018

Published : 11 July 2018

DOI : https://doi.org/10.1186/s41235-018-0120-9

Keywords

  • Decision making with visualizations review
  • Cognitive model
  • Geospatial visualizations
  • Healthcare visualizations
  • Weather forecast visualizations
  • Uncertainty visualizations
  • Graphical decision making
  • Dual-process

Mental Imagery

If you close your eyes and visualize an apple, what you experience is mental imagery – visual imagery. But mental imagery is far more pervasive in our mental life than just visualizing. It happens in all sense modalities and it plays a crucial role not just in perception, but also in memory, emotions, language, desires and action-execution. It even plays a substantial role in our engagement with artworks, which makes it a key concept not only in philosophy of mind, but also in aesthetics.

Entry Contents:

  • 1. What is Mental Imagery?
  • 1.1 Mental imagery in the empirical sciences
  • 1.2 Mental imagery vs. images
  • 1.3 Mental imagery vs. imagination
  • 1.4 The content of mental imagery
  • 1.5 The format of mental imagery
  • 2. Mental imagery in perception
  • 2.1 Amodal completion
  • 2.2 Multimodal mental imagery
  • 2.3 Unusual forms of mental imagery in perception
  • 2.4 Pain mental imagery
  • 3.1 Mental imagery and memory
  • 3.2 Mental imagery and emotion
  • 3.3 Mental imagery and language
  • 3.4 Mental imagery and knowledge
  • 4.1 Mental imagery vs. motor imagery
  • 4.2 Pragmatic mental imagery
  • 4.3 Mental imagery and desire
  • 4.4 Mental imagery and biased behavior
  • 5.1 Mental imagery in the visual arts
  • 5.2 Mental imagery in music
  • 5.3 Mental imagery in literature
  • 5.4 Mental imagery in conceptual art
  • Other Internet Resources
  • Related Entries

1. What is Mental Imagery?

Close your eyes and visualize an apple. Many readers will have a quasi-perceptual experience that may be a bit similar to actually seeing an apple. For those who do, this experience is an example of mental imagery – in fact, it is the kind of example philosophers use to introduce the concept.

It is not clear whether introducing the term ‘mental imagery’ by example is particularly helpful, for at least two reasons. First, there are well-demonstrated interpersonal variations in mental imagery (see Section 1.2), so much so that some people report no experience whatsoever when closing their eyes and visualizing an apple. Second, it is unclear how an example like visualizing an apple could be generalized in a way that would give us a coherent concept. It does not seem that mental imagery is an ordinary language term – it was introduced at the end of the 19th century (see Section 1.1 below) as a technical term in psychology, and no language other than English has a term that means mental imagery (as distinct from ‘imagination’ or ‘mental picture’). But if ‘mental imagery’ is indeed a technical term, then it is supposed to be used in a way that maximizes theoretical usefulness. In this case, theoretical usefulness means that we should use ‘mental imagery’ in a way that would help us to explain how the mind works.

This encyclopedia entry will not attempt to give an ordinary language analysis of the term ‘mental imagery’, partly because it is far from clear that ‘mental imagery’ is part of ordinary language. Instead, the focus will be on the theoretically useful concept of mental imagery as it is used for explaining various mental phenomena in psychology, neuroscience and philosophy.

The concept of mental imagery was first consistently used in the then very new discipline of empirical psychology at the end of the 19th century. At that time, psychologists like Francis Galton, Wilhelm Wundt or Edward Titchener (Galton 1880, Wundt 1912, Titchener 1909) thought of mental imagery as a mental phenomenon characterized by its phenomenology – a quasi-perceptual episode with a certain specific phenomenal feel. This stance led to serious suspicion, and often the outright rejection, of this concept in the following decades when behaviorism dominated the psychological discourse (Külpe 1895, Ryle 1949, Dennett 1969). It was not until the 1970s that mental imagery was again considered to be a respectable concept to study in the empirical sciences of the mind.

Just as perception can be characterized in a variety of ways, the same goes for mental imagery. One way of characterizing perception is in terms of its phenomenology: perception would be a mental process that is characterized by a certain specific phenomenology. The problem is that phenomenology is not publicly observable and, as a result, it is not a good starting point for scientific study. The same considerations apply for mental imagery. But we can also characterize perception functionally or neuroanatomically, and these ways of thinking about perception would be publicly observable and, as a result, would be a good starting point for the scientific study of perception. And this is exactly how perceptual psychology and the neuroscience of perception proceed. Again, the same considerations apply for mental imagery.

As a result, in recent decades psychologists and neuroscientists, rather than relying on introspection and phenomenology, characterized mental imagery in functional and neuroscientific terms. Here is a typical characterization from a review article that summarizes the state of the art concerning mental imagery in psychology, psychiatry and neuroscience, published in the flagship journal Trends in Cognitive Sciences : “We use the term ‘mental imagery’ to refer to representations […] of sensory information without a direct external stimulus” (Pearson et al. 2015, p. 590, see also Dijkstra et al. 2019). In short, according to the psychological definition, mental imagery is perceptual representation not triggered directly by sensory input (or representation-involving perceptual processing not triggered directly by sensory input – these two phrases will be used interchangeably in what follows).

The concept of directness may need some further clarification (and the same goes for the concept of “appropriate immediate sensory input” (Kosslyn et al., 1995, p. 1335, see also Shepard and Metzler 1971) that has also been used to specify what mental imagery lacks). The perceptual processing (in the early cortices) is triggered directly by sensory input if it is triggered without the mediation of some other (perceptual or extra-perceptual) processes. If the perceptual processing is triggered by something non-perceptual (as in the case of closing our eyes and visualizing), it is not triggered either directly or indirectly (see Section 1.3). If the perceptual processing in the visual sense modality is triggered by sensory input in the auditory sense modality (as in the case of involuntary visual imagery of your face when I hear your voice with my eyes closed), the visual processing is triggered indirectly – with the mediation of auditory processing (see Section 2.2). A direct trigger here would be visual input, but there is no visual input in this case. And if the visual processing at the center of the visual field is triggered by input in the periphery of the visual field (say, because the center of the visual field is occluded by an empty white piece of paper), then the visual processing at the center of the visual field is, again, triggered indirectly, that is, in a way mediated by the visual processes in the periphery (see Section 2.1). A direct trigger would be sensory input at the center of the visual field, but there is no such sensory input in this case. According to the psychological definition of mental imagery, all three of these different examples of perceptual processing count as mental imagery as the perceptual processing is not triggered directly by the sensory input.

Contemporary philosophical thinking about mental imagery comes close to this way of defining mental imagery (see Nanay forthcoming for a summary). Gregory Currie, for example says that “episodes of mental imagery are occasions on which the visual system is driven off-line, disconnected from its normal sensory inputs” (Currie 1995, p. 26, see also Kind 2001, Richardson 1969, Currie and Ravenscroft 2002, but see also Section 1.2 below and see Fazekas et al. 2021, for more on the concept of offline perception).

This way of thinking about mental imagery allows us to examine mental imagery empirically and in a publicly observable manner. To put it very simply, if someone’s eyes are closed, so she receives no visual input and her early sensory cortices are nonetheless representing an equilateral triangle at the middle of the visual field (something that can be established fairly easily given the retinotopy of vision by means of fMRI), this is an instance of mental imagery.
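
As a toy illustration of this logic (our own sketch, not an fMRI analysis pipeline), one can simulate a retinotopic activation map and check whether it matches a shape template. In a real study the 'maps' would be decoded from noisy voxel data, but the inference pattern is the same.

```python
# Toy simulation: does a (noisy) retinotopic activation pattern match a
# triangle template? Illustrative only; not real fMRI data or methods.
import numpy as np

def triangle_template(size=64):
    """Filled triangle in the middle of a simulated 'visual field'."""
    template = np.zeros((size, size))
    for row in range(size // 4, 3 * size // 4):
        half = (row - size // 4) // 2        # width grows away from the apex
        template[row, size // 2 - half : size // 2 + half + 1] = 1.0
    return template

def match(pattern, template):
    """Spatial correlation between an activation pattern and a template."""
    return np.corrcoef(pattern.ravel(), template.ravel())[0, 1]

rng = np.random.default_rng(0)
template = triangle_template()
imagery = template + rng.normal(0, 0.5, template.shape)  # 'visualizing' a triangle
rest = rng.normal(0, 0.5, template.shape)                # no imagery, just noise

print(f"imagery vs. template: r = {match(imagery, template):.2f}")
print(f"rest vs. template:    r = {match(rest, template):.2f}")
```

The imagery pattern correlates substantially with the template while the rest pattern does not, which is the shape of the evidence the fMRI reasoning above relies on.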

This psychological conception of mental imagery is neutral about some seemingly important features of mental imagery. If mental imagery is perceptual representation not triggered directly by sensory input, then mental imagery may or may not be voluntarily triggered (see more on this distinction in Section 1.3 below). Further, it may or may not be conscious – even if you experience nothing, as long as there is a triangle in your primary visual cortex, but no triangle on your retina, you have mental imagery (see more on unconscious mental imagery in Section 1.2 below).

The psychological definition characterizes mental imagery negatively: it is perceptual representation not triggered directly by sensory input. This leaves open the question of what it is triggered by. Often it is triggered by higher-level cognitive processes – this is the case when you count to three and visualize an apple. But it can also be triggered laterally by different sense modalities (see Section 2.2 below). And it can also be triggered by sensory input, but in an indirect manner (see Section 2.1 below).

Mental imagery is often used interchangeably with the term ‘mental image’. This is misleading in more than one way. First of all, mental imagery is not necessarily visual. Just as perception can be visual, auditory, olfactory, tactile, gustatory, etc., the same goes for mental imagery (see, e.g., Young 2020). Auditory mental imagery, for example, plays a crucial role in listening to music (see Section 5.2 below). But it is not an ‘image’ in any meaningful sense of the term.

Second, and even more importantly, not everyone conjures up vivid and distinct images when they have mental imagery (see Kind 2017 for a summary of the vividness of imagery). There are people who, when they close their eyes and visualize an apple, see no ‘images’ in their mind’s eye. They are referred to as aphantasics, a label that just means that they report no conscious mental imagery (Zeman et al. 2010). Aphantasia can have many causes, some having to do with voluntary control, some with the phenomenology of early cortical representations. But at least some aphantasics seem to have mental imagery that they are not aware of: they have unconscious mental imagery (Koenig-Robert and Pearson 2021, Nanay 2021c, see also Phillips 2014, Church 2008, Emmanouil and Ro 2014, Brogaard and Gatzia 2017 on unconscious mental imagery).

The very idea of unconscious mental imagery may raise some philosophical eyebrows and some philosophers indeed build consciousness into their definition of mental imagery (Richardson 1969, pp. 2–3, Kung 2010, p. 622). But if mental imagery is perceptual representation that is not directly triggered, then the bar for unconscious mental imagery should not be higher than the bar for perception per se, that is, for perceptual representation that is directly triggered. And we have plenty of evidence that perception is often unconscious: subjects with blindsight are not conscious of what they are staring at, but what they see systematically influences their behavior. And the same goes for healthy subjects when they look at very briefly presented or masked stimuli (see, e.g., Kentridge et al. 1999, Kouider & Dehaene 2007 as two representative examples of the vast literature on unconscious perception). If perception can be unconscious, then so can mental imagery.

Aphantasia comes on a spectrum. Researchers put together the so-called ‘vividness of visual imagery questionnaire’ (VVIQ), which indicates how vivid one’s (visual) mental imagery is. Aphantasics score very low on this scale. People with very vivid mental imagery (often referred to as hyperphantasics) score very high. But most people are somewhere in between. This interpersonal variability in the vividness of mental imagery should make us even more wary of using introspective criteria for characterizing mental imagery, as this would give different results for different people on different parts of the aphantasia–hyperphantasia spectrum.
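
To illustrate how such a questionnaire yields a spectrum, here is a minimal scoring sketch. The exact items and clinical cutoffs vary across studies; the 16-item, 1–5 format is the common VVIQ layout, but the thresholds below are illustrative assumptions, not diagnostic criteria.

```python
# Scoring sketch for a VVIQ-style questionnaire, assuming the common
# format: 16 items rated 1 (no image at all) to 5 (perfectly clear and
# vivid), summed to a total between 16 and 80. Cutoffs are illustrative.
def classify_vividness(ratings):
    assert len(ratings) == 16 and all(1 <= r <= 5 for r in ratings)
    total = sum(ratings)
    if total <= 32:      # illustrative low-vividness (aphantasia-range) cutoff
        label = "low vividness (aphantasia range)"
    elif total >= 75:    # illustrative high-vividness (hyperphantasia-range) cutoff
        label = "very high vividness (hyperphantasia range)"
    else:
        label = "typical vividness"
    return total, label

print(classify_vividness([1] * 16))  # (16, 'low vividness (aphantasia range)')
print(classify_vividness([4] * 16))  # (64, 'typical vividness')
```

Most respondents land between the extremes, which is the spectrum referred to above.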

Mental imagery is not imagination (Langland-Hassan 2015, 2020, Arcangeli 2020). Imagination is (typically) a voluntary act. Mental imagery is not. Mental imagery can be, and very often is, involuntary. When we have flashbacks to an unpleasant scene, this is mental imagery, but not imagination in any sense of the term (see also Gregory 2010, 2014, Wiltsher 2016 on the differences between imagination and mental imagery). It is involuntary mental imagery. The same goes for earworms: annoying tunes that go through our head in spite of the fact that we really don’t want them to. Again, this is not auditory imagination, but it is auditory mental imagery.

In spite of these differences, given that the term ‘mental imagery’ was not systematically used until the end of the 19th century, throughout the history of philosophy people used the term ‘imagination’ to refer to what we would now describe as mental imagery. Thomas Hobbes, for example, talked about “retaining an image of the seen thing”, which comes very close to at least a subcategory of the current use of mental imagery in psychology and neuroscience, but he referred to this mental phenomenon as imagination (Hobbes 1651, Chapter 2). More generally, both the British empiricists and the German idealists used the term ‘imagination’ at least sometimes in the sense that would be captured by the concept of mental imagery nowadays (see Yolton 1996 for a summary). If we want to understand the evolution of philosophical thinking about mental imagery, we would need to go through all the historical texts about imagination and separate out references to voluntary acts (imagination proper) from references to mental imagery. This is not something that can be done in this encyclopedia entry.

The relation between mental imagery and imagination is important for another reason: we have seen that we can have mental imagery without imagination (see the flashback and the earworm examples). But how about the other way round? Can we have imagination without mental imagery? In other words, does imagination necessarily involve the exercise of mental imagery (Kind 2001, Van Leeuwen 2016, Langland-Hassan 2020)? This debate has been further complicated by the standard distinction between sensory and propositional imagination (roughly, imagining seeing x versus imagining that x is F), and the role imagery plays in these two forms of imagination – roughly, the difference between them is that the former, but not the latter is necessarily accompanied by (or constituted by) mental imagery. Without taking sides or venturing into the literature on the distinction between sensory and propositional imagination, it needs to be pointed out that many of the arguments on either side appeal to introspection (Byrne 2007, Chalmers 2002). If we allow for unconscious mental imagery, then these arguments would not lead to any kind of conclusive resolution. The only way in which we can assess whether imagination necessarily involves mental imagery is by empirical means.

Mental imagery is a form of representation. But what does it represent and how does it do so? If mental imagery is a perceptual or at least quasi-perceptual representation, it seems that it represents the way perceptual states represent. Perceptual states attribute properties to the perceived scene. Mental imagery attributes properties to the imagined scene (or imagined properties to the actual scene). Just what such ‘imagined’ attributed properties could be and how to think of the ‘imagined scene’ are highly controversial questions (see, e.g., Kulvicki 2014, Langland-Hassan 2015). What seems to be less controversial is that both forms of property attributions are underwritten by early cortical processes, and both are sensitive to the allocation of attention (Shea 2018, Dijkstra et al. 2019).

This similarity between perception and mental imagery in terms of content plays an important role in thinking about the phenomenology of these states. One old question concerning the relation between (conscious) perception and (conscious) mental imagery is about the phenomenal similarity between the two (Hume 1739, 1.1.1). Mental imagery can feel similar to perception, so much so that under experimental conditions, it is easy to confuse the two (Perky 1910, see also Hopkins 2012 for a contrasting view). Assuming that the phenomenal character of a state depends in some ways on its content (an assumption that doesn’t need to be as strong as that of intentionalism), we can explain this with reference to the similarity of the content of perception and the content of mental imagery.

Not just the similarities, but the differences between mental imagery and perception also need to be explained. And the difference between perceptual content and the content of mental imagery also plays an important role in the debate about a phenomenologically salient and historically influential difference between the vividness of perception and the vividness of mental imagery. The historically influential view, championed most memorably by the British empiricists, is that mental imagery is paler and less vivid than perception. Even if we set aside hyperphantasics, who report very similar vividness for mental imagery and perception, this distinction does not seem to hold across the board. The properties that constitute the content of mental imagery can be very determinate indeed – and most of the properties that constitute perceptual content are not particularly determinate (see Dennett 1996 for a classic argument). Nonetheless, determinacy plays a role in yet another major difference between perceptual content and the content of mental imagery.

When you look at a landscape and shift your attention from the tree on the left to the mountain range on the right, this implies a change in the determinacy of the perceptually attributed properties: the properties attributed to the tree will be less determinate than before and the properties attributed to the mountain range will be more determinate than before (Yeshurun and Carrasco 1998). Let’s focus on the change in determinacy in the latter case: the extra determinacy of the perceptually attributed properties comes from the sensory input: perceptual attention increases determinacy by means of extracting more information from the sensory input. In the case of mental imagery, in contrast, there is no sensory input to exploit, so when you close your eyes and imagine the same landscape with the tree on the left and the mountain range on the right and you shift your attention from the former to the latter, then the increased determinacy of the properties attributed to the mountain range can’t come from the sensory input. It must come from a top-down source – your beliefs or expectations or memories about mountain ranges (Nanay 2015).

The format of a representation is different from its content. Two representations can have the same content but different formats. The usual starting point of talking about representational format is the difference between the way pictures and sentences represent. Pictures represent imagistically or iconically and sentences represent non-imagistically or propositionally. They may represent the same thing: say, a red apple on a green table. But they represent this red apple on a green table differently (for example, to just mention one often-emphasized difference, very few parts of the sentence “there is a red apple on a green table” represent part of what the sentence itself represents, whereas many parts of the picture of the red apple on a green table represent part of what the whole picture represents) – the format of the representation is different.

So the question is: does mental imagery represent the way pictures do or the way sentences do? This was the central question of the so-called ‘Imagery Debate’ of the 1980s (in the imagistic corner: Kosslyn 1980, in the propositional corner: Pylyshyn 1981, see Tye 1991 for a good summary). It was this debate that made philosophers take the concept of mental imagery seriously again, after a long period of behaviorist-inspired skepticism about anything imagery-related.

The Imagery Debate is historically significant for yet another reason: it helped us to appreciate how interpersonal variations in mental imagery can have a major impact on one’s philosophical/theoretical positions. An important and fairly large study conducted at a time when the Imagery Debate was on its way out shows this very clearly. It mapped how philosophers’ and psychologists’ intuitions about the format of mental imagery vary as a result of the vividness of their mental imagery. The results showed that the vividness of imagery has significant impact on theoretical commitments in this debate (Reisberg 2003). Researchers with less vivid mental imagery were more likely to take the propositional side and those with more vivid mental imagery tended to come down on the iconic side.

As the dependence on the vividness of one’s mental imagery shows, it is far from clear that the Imagery Debate is a substantive debate, and many psychologists and neuroscientists (including some of the original participants of this debate) explicitly declared this debate dead (see esp. Pearson and Kosslyn 2015). There are many ways of characterizing the distinction between imagistic and propositional formats, some more controversial than others. Appeal to holism or the ‘picture principle’ have been more on the controversial side (Kulvicki 2014). Describing iconic format as “representation of magnitudes, by magnitudes” (Peacocke 2019, p. 52) is on the less controversial side. And at least according to these criteria it seems clear that mental imagery has iconic format.

Perceptual representations represent magnitudes by means of magnitudes. In the case of vision, for example, they represent magnitudes like illumination, contour and color, and they do so by means of magnitudes in the early sensory cortices: the early visual cortices are retinotopic (see Grill-Spector and Malach 2004 for a summary and Talavage et al. 2004 for equivalent claims about the non-visual sense modalities). If you are looking at a triangle, there is a roughly triangle-shaped pattern of activation of direction-sensitive neurons in your primary visual cortex. This is iconic format par excellence. And if you visualize a triangle, there is also a roughly triangle-shaped pattern of activation of direction-sensitive neurons in your primary visual cortex (Kosslyn et al. 2006). Again, iconic format par excellence, at least by the ‘representation of magnitudes, by magnitudes’ criterion.
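
To make the format contrast concrete, here is a purely illustrative toy sketch in Python (not drawn from any of the works cited; the data structures and values are invented for the example). It encodes the same content – a triangle in the upper-left of the ‘visual field’ – once propositionally and once iconically, and shows that spatial parts of the iconic representation represent spatial parts of the scene in a way no part of the propositional one does:

    # Purely illustrative: the same content in two formats.
    # Propositional format: a symbolic description. Few of its parts
    # represent parts of the depicted scene.
    propositional = ("triangle", {"region": "upper-left"})

    # Iconic format: a 2D grid of activations, loosely analogous to a
    # retinotopic map. Spatial parts of the representation stand for
    # spatial parts of the scene.
    iconic = [
        [0, 0, 1, 0, 0, 0, 0, 0],
        [0, 1, 0, 1, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
    ]

    # Cropping the iconic grid yields a representation of a sub-region
    # of the scene (here: the triangle's apex); slicing the tuple above
    # yields nothing comparable.
    top_rows = [row[:5] for row in iconic[:2]]
    print(propositional)
    for row in top_rows:
        print(row)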

2. Mental imagery in perception

The role of mental imagery in perception has been an important theme in the history of philosophy. We have seen the debate about the phenomenal similarities and differences between mental imagery and perception in Section 1.4. But there is an even more important question about the relation between mental imagery and perception, namely whether, and in what sense, perception depends on mental imagery. This has been a dominant theme in the history of philosophy, and Immanuel Kant was probably the most explicit proponent of a fairly strong constitutive dependence claim. Kant famously claimed that imagination is “a necessary ingredient of perception itself” (Critique of Pure Reason, A120, fn. a), and this claim has become quite influential not just in philosophy (Strawson 1974, p. 54, Sellars 1978) but in the history of ideas in general. Eugène Delacroix, for example, wrote in his diary on September 1, 1859 that “Even when we look at nature, our imagination constructs the picture” (see also Briscoe 2018 and Van Leeuwen 2011 for examples of the perception/mental imagery hybrid).

As we have seen in Section 1.3, in Kant’s time, imagination and mental imagery were not systematically kept apart, and a charitable interpretation of Kant’s claim would be that what is a necessary ingredient of perception itself is not voluntary imagination (as we don’t voluntarily imagine each time we perceive), but rather mental imagery (see Strawson 1974 and Gregory 2017 for discussion of just how charitable such an interpretation would be). So the charitable interpretation of the Kantian Thesis is that mental imagery is a necessary ingredient of perception itself.

This is a constitutive claim: perception doesn’t merely depend on mental imagery causally; it depends on it constitutively. Like all constitutive claims, this is a fairly strong one. A much more modest, also historically influential, pre-Kantian view, dominant among the British empiricists, for example, is that perception does not depend on mental imagery at all, or, if it does, it depends on it merely causally. While there has never been an explicit debate between these two positions, recent empirical research helps us assess the respective merits of these two ways of thinking about the relation between perception and mental imagery.

Amodal completion in the visual sense modality is the representation of occluded parts of perceived objects. When we see a cat behind a picket fence, we complete those parts of the cat amodally that are hidden behind the planks. But amodal completion is not just a visual phenomenon: in the auditory sense modality, we amodally complete, for example, beeped-out parts of a soundtrack, and in the tactile sense modality, we amodally complete the entire shape of the wine glass we hold even though we only touch it with the tips of our fingers (see also Young and Nanay forthcoming on olfactory amodal completion). Amodal completion is the representation of those parts of perceived objects that we get no sensory stimulation from (Michotte et al. 1964, Nanay 2018b).

Amodal completion is perceptual representation: a vast number of neuroscientific studies show that it happens very early in the sensory cortices – in the visual case, in the primary visual cortex (Lee and Nguyen 2001, Ban et al. 2013, Pan et al. 2012, see also Briscoe 2011). And it is not directly triggered by sensory input: the part of the retinal input that would correspond to the amodally completed contour is empty – there is no contour on the retina there. In the case of the cat behind the picket fence, the shape of the occluded tail is represented in the primary visual cortex, but there is no corresponding shape on the retina that could have directly triggered this shape representation: the only thing on the relevant part of the retina is the monochrome white of the picket fence. Amodal completion is, in this sense, perceptual representation that is not directly triggered by sensory input (a view widely shared among empirical researchers; see van Lier and Ekroll 2020 for a summary).

Is amodal completion a form of mental imagery then? Not everyone thinks so. One could argue that amodal completion is not a perceptual phenomenon at all but a cognitive one: we see the unoccluded parts and then form beliefs about the occluded ones (see Briscoe 2011 for one version of this claim). There are two sorts of reasons to worry about this proposal. First, there are phenomenological worries: it just doesn’t feel as if we merely had beliefs about the occluded parts of perceived objects (see, e.g., Noë 2004). Second, there are empirical problems. In particular, this way of thinking about amodal completion does not (and, arguably, could not) explain why the occluded contours show up in early cortical regions of perceptual processing, and do so very quickly after stimulus presentation (Sekuler and Palmer 1992, Rauschenberger and Yantis 2001).

Amodal completion is also pervasive: the vast majority of our perceptual states involve it. Take the visual sense modality: when you look around, you see objects further away from you partly occluded by objects closer to you, so your perceptual system amodally completes these occluded bits of the further objects. But amodal completion is also involved in the representation of unoccluded objects: you get no direct sensory input from the back sides of these objects, yet you represent them perceptually – which means you amodally complete the back side of all three-dimensional objects (Bakin et al. 2000, Ekroll et al. 2016). In short, amodal completion is partly constitutive of perception per se. And if amodal completion is indeed a form of mental imagery, then we have reason to think that mental imagery is partly constitutive of perception, just as the charitable interpretation of Kant suggests.

We have seen that the negative definition of mental imagery as perceptual representation that is not directly triggered by sensory input allows for lateral triggering of this perceptual representation. This amounts to perceptual representation in one sense modality, say vision, triggered by sensory input in another sense modality, say, audition. As this is not a direct trigger (your eyes could be closed, so nothing triggers your visual representation directly), this is a form of mental imagery. And it is what is known as multimodal mental imagery (Spence and Deroy 2013, Lacey and Lawson 2013, Nanay 2018a).

One everyday example of multimodal mental imagery is watching the TV muted: your auditory representation is not triggered directly by auditory input (as there is none), but by visual input (the image on the screen). If the person speaking on TV is someone famous whom you have often heard before, you may even have the phenomenal experience of ‘hearing’ this person’s voice ‘in your mind’s ear’. But even if you don’t have this phenomenal experience, your early cortical auditory processes work differently depending on which famous person you see on your muted TV (Pekkola et al. 2005, Hertrich et al. 2011).

Most of the things around us are multisensory objects and events: we can get information about them through more than one sense modality. But we rarely get information about them through all the possible sense modalities at once. So the norm is that we have multimodal mental imagery of most objects and events around us (even when these representations are unconscious, we have plenty of evidence that they are instances of unconscious mental imagery rather than no representation at all; see, e.g., Vetter et al. 2014). This is another important example of why and how perception may depend constitutively on mental imagery – in this case, multimodal mental imagery.

Visually impaired people often report having visual mental imagery. And we know that with the exception of cortical blindness, the visual cortices of blind people remain more or less intact. Hence, (non-cortically) blind people can and do have visual mental imagery that is triggered by sensory input in another sense modality, for example, audition or tactile perception (Arditi et al. 1988). In short, blind people can and do have multimodal mental imagery.

The multimodal mental imagery of the visually impaired plays an important role in the various ways they navigate their environment. Cane use and Braille reading rely on the subject’s visual mental imagery triggered by tactile input, as does echolocation, an increasingly widespread means by which blind people can learn to gather information about the spatial layout of their environment (by making clicking sounds and using the echo of these sounds as the source of spatial information). It is now known that echolocation relies on processing in the early visual cortices: it is visual mental imagery that is triggered auditorily (Thaler et al. 2011). Finally, sensory substitution devices also create visual mental imagery. These devices consist of a video camera mounted on the head of the blind subject that provides a continuous stream of tactile or auditory input (transformed from the visual input the camera registers – for example, gentle needle pokes on the subject’s skin in a pattern that corresponds, in real time, to the image the camera records). This tactile input then leads to processing in the early visual cortices of these blind subjects (which then gives rise to an experience the subjects characterize as visual). In short, what is referred to as sensory substitution assisted perception is in fact another example of multimodal mental imagery (Renier et al. 2005, see Nanay 2017a for a summary).
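
As a rough illustration of the kind of visual-to-tactile transfer such devices perform, here is a minimal Python sketch; the grid size, threshold and on/off poke scheme are invented for the example and do not correspond to any actual device’s specification:

    # Minimal sketch (illustrative only) of the visual-to-tactile
    # transfer a sensory substitution device performs. Grid size and
    # threshold are made up; real devices differ.
    def to_tactile(image, out_rows=4, out_cols=4, threshold=128):
        """Downsample a grayscale image (list of rows, values 0-255)
        to a coarse grid of on/off 'pokes': a cell pokes if the mean
        brightness of its image patch exceeds the threshold."""
        ph = len(image) // out_rows     # patch height
        pw = len(image[0]) // out_cols  # patch width
        pokes = []
        for r in range(out_rows):
            row = []
            for c in range(out_cols):
                patch = [image[r * ph + i][c * pw + j]
                         for i in range(ph) for j in range(pw)]
                row.append(1 if sum(patch) / len(patch) > threshold else 0)
            pokes.append(row)
        return pokes

    # A frame with a bright blob in the top-left corner:
    frame = [[255 if r < 4 and c < 4 else 0 for c in range(8)]
             for r in range(8)]
    for row in to_tactile(frame):
        print(row)  # first two rows: [1, 1, 0, 0] -- the pokes mirror the blob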

Another ‘unusual’ form of mental imagery in general, and of multimodal mental imagery in particular, is found in synesthesia. Synesthetes report strong visual experiences of a specific color in response to auditory or tactile (or a wide variety of other non-color) experiences. It has been widely debated just what kind of experience synesthetic experience is. Is it a form of perceptual experience (Matthen 2017, Cohen 2017)? Or is it some kind of higher-level, cognitive/linguistic experience (Simner 2007)? The problem is that synesthesia doesn’t really seem to fit squarely in any of these categories.

The connection between synesthesia and mental imagery has long been acknowledged: synesthetes across the board have more vivid mental imagery than non-synesthetes (Barnett and Newell 2008, Price 2009, Amsel et al. 2017). And this difference is modality-specific: lexical-gustatory synesthetes have more vivid gustatory mental imagery, but not necessarily more vivid mental imagery in, say, the auditory sense modality (Spiller et al. 2015). Further, synesthesia is very rare among aphantasia subjects (who report no or hardly any mental imagery) and relatively frequent among hyperphantasia subjects (who report very vivid mental imagery) (Zeman et al. 2020). While there is significant variability between the experiences synesthetes report (see Dixon et al. 2004), and some, but not all, of these experiences are reported to be very similar to mental imagery, all instances of synesthesia will count as mental imagery understood as perceptual representation formed in response to early cortical processing that is not triggered directly by sensory input (Nanay 2021a). This gives a unified account of synesthesia and can also explain non-standard (but rigorously demonstrated) cases of synesthetic experiences triggered not by sensory input but by imagining sensory input (Spiller and Jansari 2008).

Perhaps the most useful application of multimodal mental imagery is in pain management. One of the most efficient ways of alleviating (chronic) pain is the use of mental imagery in other sense modalities (Fardo et al. 2015, MacIver et al. 2008, Volz et al. 2015). This presents something of a conundrum: why would mental imagery in, say, the visual sense modality help with pain?

Pain perception, in textbook cases, starts with the stimulation of pain receptors, known as nociceptors, and this input is then processed in the primary and secondary somatosensory cortices. But sometimes pain processing in the primary and secondary somatosensory cortices is not directly triggered by nociceptors. This would be the equivalent of mental imagery in the context of pain perception – something we could call pain imagery.

The question is then about the relation between pain perception and pain imagery: between processing in the primary and secondary somatosensory cortices that is directly triggered by nociceptors and processing that is not. And the claim that would be structurally similar to the Kantian claim we considered in Section 2.1 and Section 2.2 is that just as visual mental imagery is a crucial ingredient of vision and multimodal mental imagery is a crucial ingredient of perception, pain mental imagery (representation formed as a result of perceptual processing in the primary and secondary somatosensory cortices that is not triggered directly by nociception) is a crucial ingredient of pain perception (in fact, it may even be partly constitutive of it, see Nanay 2017b).

In some instances of pain perception, mental imagery plays an even more central role: phantom limb pain (pain some subjects feel in amputated limbs), for example, consists of cortical pain processing (in S1/S2) that is not triggered by nociceptors (Ramachandran et al. 1995), for the simple reason that the relevant nociceptors are missing (they were removed with the rest of the limb). Further, the thermal grill illusion (where applying warmth to the index and ring fingers and cold to the middle finger triggers a strong pain sensation in the middle finger) is also an instance of sensory pain processing that is not triggered by nociception (Defrin et al. 2002). In both cases, as the nociception is missing, there is no pain perception, only pain imagery.

There may be reasons to generalize the importance of pain mental imagery in pain perception. One important feature of pain perception is that it is extremely dependent on expectations: when you expect a painful sensation, even a non-painful stimulus can lead to a pain sensation (Sawamoto et al. 2000; Keltner et al. 2006; Ploghaus et al. 1999). If we consider at least some forms of expectation to be (future-oriented) temporal mental imagery (see Section 5.2 for more on expectations and mental imagery), then these results are easily explained.

3. Mental imagery in cognition

Mental imagery is a perceptual phenomenon, but it has important uses in post-perceptual processing and in cognition more generally. Mental imagery is involved in a wide variety of cognitive phenomena and it is deeply intertwined with emotions, memory and even language (see also the rich literature on the role of imagery in inner speech, e.g., Langland-Hassan and Vicente 2018).

The concept of mental imagery has played an important role in the philosophy of memory for at least two reasons. First, imagery training improves memory (in fact, findings along these lines sparked the revival of research into mental imagery in the 1960s, see Yates 1966, Luria 1960). Second, and more importantly, a fundamental distinction in the philosophy of memory is drawn between episodic and semantic memory (Tulving 1972). To put it very simply, episodic memory is remembering an experience and semantic memory is remembering a fact. And one way of cashing out this difference is in terms of mental imagery: mental imagery is a necessary feature of episodic memory, but not of semantic memory.

The connection between episodic memory and mental imagery is supported by a wide variety of empirical findings (see Laeng et al. 2014 for a summary). The loss of the capacity to form mental imagery results in the loss (or the narrowing of the scope) of episodic memory (Byrne et al. 2007; see also the overview in Berryhill et al. 2007). An even more important set of findings is that the relevant sensory cortical areas are reactivated when we recall an experience (Wheeler et al. 2000, see also Gelbard-Sagiv et al. 2008).

These results show that episodic memory involves the exercise of mental imagery, but it is an open debate whether there is more to episodic memory than mental imagery. Some have argued that episodic memory has some extra ingredients besides mental imagery, for example, some sort of causal chain to the past observed event (e.g., Bernecker 2010). In contrast, some other philosophers of memory claim that episodic memory is really nothing but mental imagery (Michaelian 2016, De Brigard 2014, Hopkins 2018). The claim is that there is no real difference between future-directed mental time travel (that is, imagining the future) and past-directed mental time travel (episodic memory). Whether we go with the stronger or the weaker claim about the importance of mental imagery in memory, understanding memory seems to presuppose understanding mental imagery.

Try to imagine, as vividly as you can, being attacked by a rabid dog, foaming at the mouth, snapping at your feet right there under your desk. The resulting mental imagery is an important form of mental imagery and also an important form of emotional state. More generally, imagery dramatically affects emotions – it seems for instance difficult to make sense of what goes on in the mind of a fearful or angry person without appealing to imagery. On the other hand, the impact of emotions on imagery is equally significant – the imagery that occupies our minds is very often under the control of our dominant emotion, which sometimes alters its fabric and our capacity to control it. In other words, there is a two-way interaction between emotions and mental imagery (see Holmes and Matthews 2010 for a summary).

Recent findings support this picture of the close connection between mental imagery and emotions. For example, imagining an emotionally charged event or person at an emotionally neutral place confers emotional charge to the place (see Benoit et al. 2019). It has been known for a while that seeing a negatively valenced event (say, a fight between two friends of yours) at a neutral place (say, the corridor in front of your office) makes this formerly neutral place inherit the negative valence of the event. So, in the future, when you see the corridor of your office, it triggers slight (or not so slight) negative emotions. The crucial finding is that the same process also takes place even if you merely imagine a negatively valenced event at a neutral place. In short, negatively valenced mental imagery confers valence on various components of the imagined scene, which then remain emotionally valenced.

The degree to which imagery and affective states are intertwined is further emphasized by the mood congruency effect (Blaney 1986, Matt et al. 1992, Gaddy and Ingram 2014). The most famous example of mood congruency effect is mood congruent memory (Loeffler et al. 2013) – we are more likely to recall scary memories when we are scared, for example. But mood congruency also works in the case of mental imagery: your general mood makes it more likely that you form mental imagery that is congruent with your mood. And it makes it less likely that you form mental imagery that is not congruent with your mood. We also encode emotionally salient stimuli in a more detailed manner, which makes it possible to form more vivid mental imagery (Yonelinas & Ritchey 2015, Hamann 2001, LaBar & Cabeza 2006, Phelps 2004).

Throughout the history of philosophy, imagistic mental representations have routinely been contrasted with abstract, linguistic representations (see Yolton 1996 for a summary). The assumption is that there is a sharp contrast between two kinds of mental representations: imagistic ones, like mental imagery, and abstract, linguistic ones. On this picture, when we talk about the importance of mental imagery in human cognition, the reach of mental imagery is limited, since there is an extra layer of mental representations – abstract, linguistic ones – that have nothing to do with mental imagery. This, in fact, may be one of the reasons why the obsessive emphasis on language in the middle of the 20th century sidelined the philosophical study of mental imagery. Either way, the overall picture is that there is imagistic cognition and there is linguistic cognition, and the two have nothing to do with each other. There have been important debates about where to draw the line between these two domains of the mind: almost all cognition is imagistic (a broadly Humean picture) vs. almost all cognition is linguistic (a broadly Wittgensteinian picture).

Empirical findings work against a common presupposition in this debate. We now know that language processing is not completely detachable from imagistic cognition. One important set of findings comes from the ‘dual coding theory’ (Paivio 1971, 1986), according to which linguistic representations are themselves partly constituted (or at least necessarily accompanied) by mental imagery, and this explains why concrete words (which are accompanied by more determinate mental imagery) are easier to recall than abstract words (which are accompanied by less determinate and in some cases very indeterminate mental imagery).

While Paivio’s dual coding theory posited the importance of mental imagery in linguistic processing to explain the behavioral differences between the recall of concrete and abstract words, we also know a lot about the ways in which linguistic labels change (and speed up) perceptual processes, as well as a fair amount about the time scale of this influence. The crucial finding, from both EEG and eye-tracking studies, is that linguistic labels influence shape recognition in less than 100 milliseconds (Boutonnet and Lupyan 2015, de Groot et al. 2016, Noorman et al. 2018). This means that linguistic and imagistic representations interact at an extremely early stage of perceptual processing – by any account, in early cortical processing (see Thorpe et al. 1996 and Lamme and Roelfsema 2000 for the temporal unfolding of visual processing in unimodal cases). All this indicates that imagistic and linguistic cognition are far from independent of one another – they are deeply intertwined even at the earliest levels of perceptual processing.

Perception sometimes justifies our beliefs. If you see that it is raining outside, this may justify your belief that it is raining outside. And much of what we know is based on perception. But how about mental imagery? Can mental imagery justify our beliefs? There are two related, but independent questions here. The first one is about whether mental imagery could ever be a source of knowledge or even new information. And the second one is about reliability: if perception is colored by mental imagery, should this give us a more complex picture of perceptual justification?

The first of these questions is about whether mental imagery itself (that is, not in conjunction with perception) can lead to knowledge or even to new information. Jean-Paul Sartre, for example, famously claimed that “nothing can be learned from an image that is not already known” (Sartre 1948, 12). Since on his view “it is impossible to find in the image anything more than what was put into it,” we can conclude that “the image teaches nothing” (Sartre 1948, 146–7). Sartre did not always draw a clear distinction between imagination and mental imagery, so it is not clear whether it is imagination or mental imagery that teaches nothing. Contemporary philosophers tend to raise this issue about imagination (Langland-Hassan 2016, 2020, see also Kind and Kung 2016), but the question from our point of view is whether it is true of mental imagery. And here some examples suggest that even if imagination teaches nothing, mental imagery can and does. When you want to wrap a chocolate box in wrapping paper, you look at the box and form (often involuntarily, not by counting to three and voluntarily imagining) visual imagery of the wrapping paper needed, and you may find your estimate of the size of the paper unexpected or surprising: maybe it’s larger than you had assumed, or smaller. Your estimate of the size of the paper needed can be very different before and after forming the mental imagery of the paper covering the chocolate box (and this can, of course, still differ from the size of the paper actually needed; see Gauker forthcoming for more examples of this kind). If we can generalize from this example (see Levin 2006 for discussion), then mental imagery can lead to new information even if imagination cannot.

The second question is about the reliability of perceptual justification. If, as Section 2 argued, perception per se is a hybrid of sensory stimulation-driven perception and mental imagery, what does this mean for the concept of perceptual justification? Even if sensory stimulation-driven perception can justify beliefs, if mental imagery does not, then the hybrid of the two – perception per se – may not be as epistemically innocent as it has been thought to be (Macpherson 2012). Mental imagery is defined precisely by the lack of a direct causal link to the sensory input. On any broadly externalist account of justification, this raises worries about the epistemic work mental imagery can do (as the reliability of mental imagery is supposed to depend on the directness of the causal link between the mental imagery and what it is about). This does not mean that mental imagery does no epistemic work: the lack of a direct causal link is compatible with the mental imagery nonetheless carrying information about the external world reliably – and, arguably, this is exactly what happens in the case of amodal completion (Helton and Nanay 2019). But if we take the importance of mental imagery in perception seriously, we need to examine the reliability of these non-direct causal links.

4. Mental imagery in action

Mental imagery plays an important role in action. It is involved not only in action execution but also in desires. Further, it can explain many of the biases in our behavior.

We need to keep the concept of mental imagery apart from that of motor imagery. Motor imagery plays a crucial role in action planning and action execution, but motor imagery is not mental imagery. How exactly this distinction is to be drawn, however, is subject to debate.

Motor imagery has traditionally been understood as the feeling of imagining doing something. It is sometimes taken to be necessarily conscious, not just by philosophers (Currie & Ravenscroft 1997) but sometimes even by psychologists (Jeannerod 1997; see also Brozzo 2017: esp. 243–244 for an overview). And as imagining tends to be a voluntary act, motor imagery is also often taken to be voluntary. The paradigmatic example here is closing your eyes and imagining reaching for an apple.

There are debates, however, about what this traditional, phenomenological way of zeroing in on motor imagery as the feeling of imagining doing something entails. As all involved in this debate acknowledge, not all imaginative episodes of doing something would count as motor imagery: you need to imagine doing something from a first-person, not a third-person, perspective. Marc Jeannerod, one of the most important psychologists working on both motor imagery and mental imagery, made a distinction (following the practice in sport psychology) between internal (first-person) and external (third-person) imagery, and only the former would count as motor imagery (the latter would be sensory imagery of oneself doing something; see Jeannerod 1994, p. 189).

Given that motor imagery, just like mental imagery, can be conscious or unconscious (see, for example, Osuagwu & Vuckovic 2014) and it can also be voluntary or involuntary, there has been a tendency to move away from phenomenological characterization. A more inclusive way of understanding motor imagery is supported by the methodological advice by Jeannerod, who writes: “Motor imagery would be related to motor physiology in the same way visual imagery is related to visual physiology” (Jeannerod 1994, p. 189). And here a better understanding of mental imagery can help us with defining motor imagery.

Mental imagery is the representation that results from perceptual processing that is not triggered directly by sensory input: we get mental imagery when the first stage of perceptual processing happens without direct sensory input. Motor imagery is to the output what mental imagery is to the input: we get motor imagery when the last stage of action processing happens without directly triggering motor output. In other words, motor imagery is the representation that results from processing in the motor system (in the motor and premotor cortices) that does not trigger motor output directly.

Another open question about the relation between motor imagery and mental imagery is whether the former necessarily involves the latter. When we think of conscious examples of motor imagery, it seems that imagining touching the camera of my laptop involves some form of sensory mental imagery (maybe visual imagery of my finger touching the camera, or, more minimally, proprioceptive mental imagery of my finger being at a different location from where it is now). And empirical studies also show that motor imagery necessarily entails representing the sensory consequences of the imagined action (Kilteni et al. 2018).

Not only motor imagery but mental imagery, too, plays an important role in action execution (see Van Leeuwen 2011). Most of our actions are perceptually guided: our perceptual states trigger and guide them. When you pick up a coffee cup to drink from it, this is a perceptually guided action: your perceptual state represents the spatial location of the cup, which then guides your reaching movement (and does so in real time: if the perceptual state changes, your reaching movement changes immediately, without you noticing any of these changes; see, e.g., Paulignan et al. 1991).

If, after looking at the coffee cup, you close your eyes and pick up the cup with your eyes closed, your action is not perceptually guided as you do not perceive the coffee cup anymore. It is, in this case, guided by your visual mental imagery. You looked at the cup, you close your eyes, form mental imagery of the cup’s whereabouts (as well as its other properties that are necessary for performing this action, like its weight and size) and your reaching action is guided by this ‘pragmatic mental imagery’ (Nanay 2013).

In the first case, the pragmatic mental imagery was formed on the basis of your perceptual state: you looked at the cup and then closed your eyes, and it is this visual information that the mental imagery is based on. But pragmatic mental imagery is more than just an echo of sensory input. Suppose that you are in your bedroom and it is pitch dark. You want to switch on the light, but you can’t see the switch. You are nonetheless in a position to switch it on, given your memory of the room’s layout and the location of the light switch in it. In this case, your pragmatic mental imagery is formed on the basis of your memory. But pragmatic mental imagery can be triggered by completely non-perceptual means as well: for example, if I blindfold you and then explain to you in great detail where exactly the coffee cup is in front of you – how far to the left, how far ahead, and so on. Your pragmatic mental imagery can still guide your action, but it does so without any (visual) input. In our everyday life, many of our actions – especially routine actions, like flossing – are in fact guided by pragmatic mental imagery.

Desires are among the prime examples of propositional attitudes. So one would be tempted to think that desires have nothing to do with mental imagery: they are propositional and not imagistic representations. Nonetheless, one of the leading psychological theories of desire, the elaborated intrusion theory, takes mental imagery to be constitutive of desires (Kavanagh et al. 2005, May et al. 2014).

According to the elaborated intrusion theory, forming a desire is a two-step process. First, a mental state representing the desirable state of affairs intrudes into our mind. This often happens unconsciously, and it is often not clear what triggers this intruding mental state. Second, this representation is elaborated with the help of mental imagery. Without this second, elaborating step, which necessarily involves mental imagery, we would not have a desire.

But one does not need to endorse the elaborated intrusion theory of desire to see the close link between desires and mental imagery. Strong occurrent desire is invariably accompanied by vivid mental imagery (Kavanagh et al. 2009). Further, stronger desires (for example, to smoke) are accompanied by more vivid mental imagery (of smoking-related scenes) (Tiffany and Drobes 1990, see also Tiffany and Hakenewerth 1991). Similarly, a desire to consume alcohol can be induced by imagining entering one’s favorite bar, then ordering, holding and tasting a cold, refreshing glass of one’s favorite beer. In fact, this guided imaginative episode triggers a stronger desire than actually seeing a glass of beer (Litt and Cooney 1999). More generally, the vividness of mental imagery correlates with the strength of one’s desire for the imagined thing across a range of desirable substances and activities (May et al. 2008, Harvey et al. 2005, Statham et al. 2011).

Further, mental imagery of neutral scenes, for example a rose garden, reduces the desire for a cigarette in people who are trying to give up smoking (May et al. 2010). Olfactory mental imagery of unrelated odors has the same effect (Versland and Rosenberg 2007). The desire to eat chocolate can also be reduced by mental imagery of neutral scenes (Kemps and Tiggemann 2007, Harvey et al. 2005) and by engaging involuntary mental imagery (by, for example, modelling clay out of sight; Kemps et al. 2004). Some of these results show that mental imagery influences desires; others show that mental imagery is a downstream consequence of desires. In short, if we manipulate mental imagery, the desire changes, and if we manipulate desires, the mental imagery changes.

While the elaborated intrusion theory of desire is explicit about the role of mental imagery in desires, other influential empirically plausible accounts of desire (like the reward-based learning account (Schroeder 2004) or the attentional account (Scanlon 1998)) are also consistent with the importance of mental imagery in desires.

Some of our behavior is biased: it goes against our reported beliefs. And often we are not fully aware of these biases. Some of these biases are about racial or gender groups. A big question not just in philosophy and psychology, but in the daily running of our society is where these biases come from and what we can do about them. There is some evidence that at least some of these biases have a lot to do with mental imagery.

First of all, a number of empirical studies show that the vividness of mental imagery biases our behavior. If you are deciding between two positive scenarios, the one that brings up the more vivid mental imagery tends to win out. And if you are deciding between two negative scenarios, the one that brings up the less vivid mental imagery tends to win out (Austin and Vancouver 1996, Trope and Liberman 2003; see also the rich literature on construal level theory and on the effects of the vividness of mental imagery on future discounting, e.g., Parthasarathi et al. 2017, Mok et al. 2020). Here is an example: if a smoker is deciding between smoking a cigarette and not smoking one, the smoking option brings up very vivid, detailed (and emotionally charged) mental imagery, while the non-smoking option brings up no mental imagery at all, or imagery that is not at all detailed or vivid (of just sitting there, not smoking). This is why smoking tends to win out, and also why it is often difficult to stop the activities we procrastinate with, like playing video games or checking our social media feed: continuing what we have been doing is represented much more vividly than stopping.

Mental imagery can also explain some famous examples of racial bias (Nanay 2021b, see also Sullivan-Bissett 2019, who describes implicit racial bias as unconscious imagination, not imagery). Subjects are more likely to misperceive a phone as a gun if a black person is holding it than if a white person is (Payne 2001). The perceptual state that represents a black person holding a phone gives rise to mental imagery that represents a black person holding a gun. This mental imagery does not have to be conscious – and when white people rate black people as more dangerous, it is possible that the mental imagery grounding these judgments is not conscious. The same is true of the biased behavior of standing further away from some people than from others in an elevator. The importance of mental imagery in implicit bias is also supported by the fact that one of the most efficient ways of counteracting implicit bias is based on modifying the subject’s mental imagery, and the efficiency of these procedures correlates with the vividness of the subjects’ mental imagery (see Lai et al. 2013, Blair et al. 2001, Blair 2002; see also Peck et al. 2013 for further relevant findings).

5. Mental imagery in art

The importance of mental imagery can be traced beyond the confines of philosophy of mind. More specifically, mental imagery plays an important role in our engagement with and appreciation of artworks, which makes mental imagery a crucial concept in aesthetics (see also Lopes 2003). While mental imagery may also play a crucial role in artistic creation, as many artists and composers like to emphasize, the focus here will be on the importance of mental imagery in engaging with artworks.

A somewhat obvious way in which mental imagery plays a role in our engagement with visual art follows from the simple fact that most pictorial art does not encompass the entire visual field. Those parts of the depicted scene that fall outside the frame can be, and very often are, represented by means of mental imagery. One famous example is Edgar Degas, who liked to place the protagonists of his paintings so that only parts of them are inside the frame; the rest we need to complete by means of mental imagery. In some extreme cases (e.g., Dancers Climbing the Stairs, 1886–1890, Musée d’Orsay), we only see someone’s arm or the top of their head, and we need to complete those parts of their body that are outside the frame by means of mental imagery. Another example is Buster Keaton, who also uses the viewer’s mental imagery of off-screen space in his films, but normally for comic effect. One example is the first shot of his short film Cops (1922), where we see the protagonist in close-up, behind bars and looking depressed. The second shot reveals that he is behind an iron gate, talking to a girl who does not love him back (see Burch 1973, pp. 17–31 for more examples of this kind).

But mental imagery is also often used within the picture frame. In the 1950 American film Harvey, the character played by Jimmy Stewart is an alcoholic who hallucinates a six-foot-three-and-a-half-inch-tall rabbit (or pooka). We don’t see anyone, but the Jimmy Stewart character clearly does. And, crucially, all the scenes with the imaginary rabbit are framed as if there really were a rabbit in them. So when we see the Jimmy Stewart character in an armchair having a conversation with Harvey, the shot is framed as if there really were a six-foot-tall creature next to him. This framing is aesthetically relevant, and its choice clearly relies on the viewer’s mental imagery. In this example, we have a fairly good idea of what we’re supposed to form mental imagery of – the Jimmy Stewart character gives a fairly accurate description of Harvey’s alleged appearance. But there are examples where we are in a much less fortunate epistemic situation. One classic example is Buñuel’s Belle de Jour, where a Chinese businessman shows a little box to the Catherine Deneuve character, who is clearly fascinated by what is inside. She sees it, he sees it, but we, the viewers, don’t. There is a humming sound coming from the box, but we never see what is inside. We have a very indeterminate (crossmodally triggered) visual mental imagery of what could possibly be in the box – whatever is in it is left intentionally indeterminate. The French film director Robert Bresson often uses mental imagery this way, so much so that he takes this use of mental imagery to be the mark of a ‘good’ director (or, as he would put it, of a cinematographer, not merely a director): “Don’t show all sides of the object. A margin of indefiniteness” (Bresson 1975/1977, p. 52).

Multimodal mental imagery became a hallmark of 1960s European modernist art films. In some of his films, Jean-Luc Godard used sound primarily as a prompt for triggering visual mental imagery (see the sensitive analysis of the use of sound in Masculin/Féminin (1966) from this point of view in Levinson 2014). And both Bresson and Michelangelo Antonioni used sound this way for much of their careers; they were also very explicit about this use of sound in their theoretical writings and interviews. As Bresson said, “The eye solicited alone makes the ear impatient, the ear solicited alone makes the eye impatient. Use these impatiences” (Bresson 1975/1977, p. 28), and “A locomotive’s whistle imprints on us a whole railroad station” (Bresson 1975/1977, p. 39). And here is Antonioni giving a textbook definition of multimodal mental imagery: “When we hear something, we form images in our head automatically in order to visualize what we hear” (Antonioni 1982, p. 6). Both Bresson and Antonioni use multimodal mental imagery that is indeterminate and very much emotionally laden. As a counterbalance to this high-brow overkill, it should be emphasized that multimodal mental imagery can also be used in a very different manner and still be aesthetically relevant. As Ridley Scott repeatedly emphasizes in interviews about his Alien films, the Alien is shown relatively rarely because having mental imagery of it is much scarier than seeing it. This general credo has long been used in suspense cinema (from Hitchcock films to Jaws). Finally, the recurring joke on Friends about the ugly naked guy who lives across the street (but whom we never see) clearly exploits multimodal mental imagery.

Mental imagery also plays a crucial role in our appreciation of music, primarily through musical expectations, which are a form of auditory mental imagery. Expectations play a crucial role in our engagement with music (and art in general). When we are listening to a song, even when we hear it for the first time, we have some expectations of how it will continue. And when it is a tune we are familiar with, this expectation can be quite strong (and easy to study experimentally). When we hear the Ta-Ta-Ta at the beginning of the first movement of Beethoven’s Fifth Symphony in C minor, Op. 67 (1808), we strongly anticipate the closing Taaam of the Ta-Ta-Ta-Taaam. Many of our expectations are fairly indeterminate: when we are listening to a musical piece we have never heard before, we still have some expectations of how a tune will continue, but we don’t know exactly what will happen. We can rule out that the violin glissando will continue with the sounds of a beeping alarm clock (unless it’s a really avant-garde piece…), but we can’t predict with great certainty how exactly it will continue. Our expectations are malleable and dynamic: they change as we listen to the piece.

Expectations are mental states about how the musical piece will unfold; they are future-directed mental states. But this leaves open just what kind of mental states they are – how they are structured, how they represent the upcoming event, and so on (see Judge and Nanay 2021 for an overview of the options and the history of this question). At least some forms of expectation in fact count as mental imagery. And musical expectations (of the kind involved in examples like the Ta-Ta-Ta-Taaam) count as auditory temporal mental imagery: they are auditory representations that result from perceptual processes that are not directly triggered by the auditory input. The listener forms mental imagery of the fourth note (‘Taaam’) on the basis of the experience of the first three (‘Ta-Ta-Ta’) (there is a lot of empirical evidence that this is in fact what happens – see Yokosawa et al. 2013, Kraemer et al. 2005, Zatorre and Halpern 2005, Herholz et al. 2012, Leaver et al. 2009). This mental imagery may or may not be conscious. But if the actual ‘Taaam’ diverges from the way our mental imagery represents it (if it is delayed, or altered in pitch or timbre, for example), we notice this divergence and experience it as salient, as a result of the mismatch between the experience and the mental imagery that preceded it.
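
Here is a deliberately simplistic Python sketch of this expectation-and-mismatch structure (illustrative only; the motif encoding and matching rule are invented for the example, and real models of musical expectation are probabilistic and far richer):

    # Toy sketch of expectation and mismatch salience (illustrative
    # only). Having heard a motif before, the listener 'predicts' the
    # next pitch; a deviation from the prediction is flagged.
    LEARNED_MOTIF = ["G", "G", "G", "Eb"]  # the Fifth Symphony opening

    def expect_next(heard, motif=LEARNED_MOTIF):
        """Predict the next pitch by matching the notes heard so far
        against the stored motif; None if the motif doesn't match."""
        n = len(heard)
        if n < len(motif) and motif[:n] == heard:
            return motif[n]
        return None

    heard = ["G", "G", "G"]
    predicted = expect_next(heard)  # 'Eb' -- the anticipated 'Taaam'
    actual = "E"                    # a performance that alters the pitch
    print("expected:", predicted, "| heard:", actual,
          "| salient mismatch:", predicted is not None and predicted != actual)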

The Ta-Ta-Ta-Taaam example is a bit simplified, so here is a real-life and very evocative case study: an installation by the British artist Katie Paterson. The installation is an empty room with a grand piano in it, which plays automatically. It plays a truncated version of Beethoven’s Moonlight Sonata. The title of the installation is Earth-Moon-Earth (Moonlight Sonata Reflected From the Surface of the Moon) (2007). Earth-Moon-Earth is a form of transmission between two locations on Earth, where Morse code is beamed up to the Moon and reflected back to Earth. While this is an efficient way of communicating between two far-away (Earth-based) locations, some information is inevitably lost (mainly because some of the signal does not get reflected back but is absorbed in the Moon’s craters). In the installation, the piano plays the notes that did get through the Earth-Moon-Earth transmission – most of them – but some notes are skipped. Listening to the music the piano plays, if you know the piece, your auditory mental imagery is constantly active, filling in the gaps where the notes are skipped.

Reading a novel tends to lead to mental imagery in a variety of sense modalities. This triggering of mental imagery is typically involuntary: you do not need to count to three and voluntarily conjure up the mental imagery of the protagonist’s face, instead, you have involuntary mental imagery episodes somewhat reminiscent of flashbacks. While this kind of mental imagery is often visual (when you have imagery of the protagonist’s face or the layout of the room where they are), it can also be auditory (of the protagonist’s tone of voice, for example), olfactory or even gustatory (see Starr 2013 for a wide-ranging analysis with an emphasis on multimodal mental imagery and Stokes 2019 for the role such mental imagery plays in reading fictional works). Further, the more vivid the reader’s mental imagery is, the more likely it is that information from the novel is imported into the reader’s beliefs about the real world (Green and Brock 2000).

At the end of the first book of In Search of Lost Time, Marcel Proust gives a brief but very sophisticated account of how words trigger mental imagery, which is also indicative of the way Proust himself manipulates the reader’s mental imagery (Proust 1913/1928). He makes a distinction between names and words and argues that names trigger more specific or more determinate mental imagery than words. In both cases, the name or word leads to mental imagery, but then, in turn, the mental imagery influences or colors the name or word when we encounter it next. So throughout the unfolding of the novel, names/words and the mental imagery they occasion evolve in parallel, influencing each other.

Other writers also actively manipulate the reader’s mental imagery. George Orwell points out the importance of mental imagery in understanding metaphors when he says in Politics and the English Language that “The sole aim of metaphor is to call up a visual image”. We might add that this imagery is often not visual: it can be auditory, olfactory, etc. And here is a final example, from the third part of Roberto Bolaño’s novel 2666 (‘The Part About Fate’). This part of the book introduces a New York-based journalist, Oscar Fate. After about 80 pages describing Fate’s life in New York City, it is revealed that he is in fact African-American. This comes after very explicit nudges to form mental imagery of him as Caucasian, confronting readers with their implicit racial bias (see also Section 4.4 above).

While discussions of mental imagery crop up in most fields of aesthetics and art history (including by some of the most influential art historians, like George Kubler, see Kubler 1987), the role of mental imagery is probably the most salient if we turn to conceptual art. Many conceptual artworks actively try to engage our mental imagery in an unexpected manner. Here are two illustrative (and famous) examples, but the point can be generalized.

Marcel Duchamp’s L.H.O.O.Q. Rasée (1965) is a picture that is perceptually indistinguishable from a faithful reproduction of Leonardo’s Mona Lisa. But Duchamp had earlier made another picture (L.H.O.O.Q.) in which he drew a moustache and beard on a picture of the Mona Lisa. Duchamp’s L.H.O.O.Q. Rasée (‘rasée’ means ‘shaven’) is a reference to this earlier picture, and we presumably see it differently from the way we see Leonardo’s original: the missing moustache and beard are part of our experience, whereas they are not when we look at Leonardo’s original. And it is difficult to see how we could describe our experience of L.H.O.O.Q. Rasée without some reference to the mental imagery of the missing beard and moustache. What is interesting in this example is that the mental imagery of the beard and moustache is influenced in a top-down manner not just by our prior knowledge (about how the world is) but also by our prior art historical knowledge.

The second example is Robert Rauschenberg’s Erased de Kooning Drawing (1953), which is just what it says it is: all we see is an empty sheet of paper (with barely visible traces of the erased drawing on it). Again, it is difficult to look at this artwork without trying to discern what drawing might have been there before Rauschenberg erased it. And this involves trying to conjure up mental imagery of the original drawing. These were two classic examples, but there are more: the works of Ai Weiwei, for example, rely heavily on our mental imagery.

Not all works of conceptual art evoke mental imagery this way. One exception is Robert Barry’s All the things I know, which is nothing but the following sentence written on the gallery wall in simple block letters: “All the things I know but of which I am not at the moment thinking – 1:36 PM; June 15, 1969”. It would be difficult to argue that this work has much interest in enticing the viewer’s mental imagery. But it is not easy to find an example of a conceptual artwork where mental imagery plays no role. So in the vast majority of conceptual artworks, mental imagery is a necessary feature of appreciating the artwork.

  • Amsel, B. D., M. Kutas and S. Coulson, 2017, “Projectors, Associators, Visual Imagery and the Time Course of Visual Processing in Grapheme-color Synesthesia,” Cognitive Neuroscience, 8(4): 206–223. doi:10.1080/17588928.2017.1353492
  • Antonioni, M., 1982, “La méthode de Michelangelo Antonioni (interview with Serge Daney),” Cahiers du Cinéma, no. 342: 4–7, 61–65.
  • Arcangeli, M., 2020, “The Two Faces of Mental Imagery,” Philosophy and Phenomenological Research, 101: 304–322.
  • Arditi, A., J. D. Holtzman and S. M. Kosslyn, 1988, “Mental Imagery and Sensory Experience in Congenital Blindness,” Neuropsychologia, 26: 1–12.
  • Austin, J. T. and J. B. Vancouver, 1996, “Goal constructs in psychology: Structure, process, and content,” Psychological Bulletin, 120(3): 338–375.
  • Bakin, J., K. Nakayama and C. Gilbert, 2000, “Visual responses in monkey areas V1 and V2 to three-dimensional surface configurations,” Journal of Neuroscience, 20: 8188–8198.
  • Ban, H., H. Yamamoto, T. Hanakawa, S. Urayama, T. Aso, H. Fukuyama and Y. Ejima, 2013, “Topographic representation of an occluded object and the effects of spatiotemporal context in human early visual areas,” Journal of Neuroscience, 33: 16992–17007.
  • Barnett, K. J. and F. N. Newell, 2008, “Synaesthesia is associated with enhanced, self-rated visual imagery,” Consciousness and Cognition, 17(3): 1032–1039.
  • Benoit, R. G., P. C. Paulus and D. L. Schacter, 2019, “Forming attitudes via neural activity supporting affective episodic simulations,” Nature Communications, 10: 2215.
  • Bernecker, S., 2010, Memory: A Philosophical Study, Oxford: Oxford University Press.
  • Berryhill, M. E., L. Phuong, L. Picasso, R. Cabeza and I. R. Olson, 2007, “Parietal Lobe and Episodic Memory: Bilateral Damage Causes Impaired Free Recall of Autobiographical Memory,” Journal of Neuroscience, 27: 14415–14423.
  • Blair, I. V., 2002, “The malleability of automatic stereotypes and prejudice,” Personality and Social Psychology Review, 6: 242–261.
  • Blair, I. V., J. E. Ma and A. P. Lenton, 2001, “Imagining stereotypes away: the moderation of implicit stereotypes through mental imagery,” Journal of Personality and Social Psychology, 81: 828–841.
  • Blaney, P. H., 1986, “Affect and Memory,” Psychological Bulletin, 99: 229–246.
  • Boutonnet, B. and G. Lupyan, 2015, “Words Jump-Start Vision: A Label Advantage in Object Recognition,” Journal of Neuroscience, 35: 9329–9335.
  • Bresson, R., 1975 [1977], Notes on the Cinematographer, Paris: Gallimard, 1975; New York: Urizen, 1977.
  • Briscoe, R., 2011, “Mental Imagery and the Varieties of Amodal Perception,” Pacific Philosophical Quarterly, 92: 153–173.
  • –––, 2018, “Superimposed Mental Imagery: On the Uses of Make-Perceive,” in F. Macpherson and F. Dorsch (eds.), Perceptual Memory and Perceptual Imagination, Oxford: Oxford University Press.
  • Brogaard, B. and D. E. Gatzia, 2017, “Unconscious imagination and the mental imagery debate,” Frontiers in Psychology, 8: 799. doi:10.3389/fpsyg.2017.00799
  • Brozzo, C., 2017, “Motor intentions: How intentions and motor representations come together,” Mind & Language, 32: 231–256.
  • Burch, N., 1973, Theory of Film Practice, New York: Praeger.
  • Byrne, A., 2007, “Possibility and imagination,” Philosophical Perspectives, 21: 125–144.
  • Byrne, P., S. Becker and N. Burgess, 2007, “Remembering the past and imagining the future: a neural model of spatial memory and imagery,” Psychological Review, 114: 340–375.
  • Chalmers, D. J., 2002, “Does conceivability entail possibility?”, in T. S. Gendler and J. Hawthorne (eds.), Conceivability and Possibility, Oxford: Oxford University Press, pp. 145–200.
  • Church, J., 2008, “The hidden image: A defense of unconscious imagining and its importance,” American Imago, 65: 379–404.
  • Cohen, J., 2017, “Synaesthetic perception as continuous with ordinary perception, or, we are all synesthetes now,” in O. Deroy (ed.), Sensory Blending, Oxford: Oxford University Press.
  • Currie, G., 1995, “Visual Imagery as the Simulation of Vision,” Mind and Language, 10: 25–44.
  • Currie, G. and I. Ravenscroft, 1997, “Mental simulation and motor imagery,” Philosophy of Science, 64(1): 161–180.
  • –––, 2002, Recreative Minds: Imagination in Philosophy and Psychology, Oxford: Oxford University Press.
  • De Brigard, F., 2014, “Is memory for remembering? Recollection as a form of episodic hypothetical thinking,” Synthese, 191(2): 155–185.
  • de Groot, F., F. Huettig and C. N. L. Olivers, 2016, “When meaning matters: The temporal dynamics of semantic influences on visual attention,” Journal of Experimental Psychology: Human Perception and Performance, 42: 180–196.
  • Defrin, R., A. Ohry, N. Blumen and G. Urca, 2002, “Sensory Determinants of Thermal Pain,” Brain, 125: 501–510.
  • Dennett, D. C., 1969, “The Nature of Images and the Introspective Trap,” reprinted in Content and Consciousness, London: Routledge & Kegan Paul, pp. 132–146.
  • –––, 1996, “Seeing is believing—or is it?”, in K. Akins (ed.), Perception, Oxford: Oxford University Press, pp. 111–131.
  • Dijkstra, N., S. E. Bosch and M. A. J. van Gerven, 2019, “Shared neural mechanisms of visual perception and imagery,” Trends in Cognitive Sciences, 23: 423–434.
  • Dixon, M. J., D. Smilek and P. M. Merikle, 2004, “Not All Synaesthetes are Created Equal: Projector versus Associator Synaesthetes,” Cognitive, Affective & Behavioral Neuroscience, 4(3): 335–343.
  • Ekroll, V., B. Sayim, R. Van der Hallen and J. Wagemans, 2016, “Illusory Visual Completion of an Object’s Invisible Backside Can Make Your Finger Feel Shorter,” Current Biology, 26: 1029–1033.
  • Emmanouil, T. and T. Ro, 2014, “Amodal Completion of Unconsciously Presented Objects,” Psychonomic Bulletin & Review, 21(5): 1188–1194.
  • Fardo, F., M. Allen, E-M. E. Jegindo, A. Angrilli and A. Roepstorff, 2015, “Neurocognitive Evidence for Mental Imagery-Driven Hypoalgesic and Hyperalgesic Pain Regulation,” NeuroImage, 120: 350–361.
  • Fazekas, P., B. Nanay and J. Pearson, 2021, “Offline Perception,” special issue of Philosophical Transactions of the Royal Society B, 376(1817): 20190686.
  • Gaddy, M. A. and R. E. Ingram, 2014, “A Meta-analytic Review of Mood-congruent Implicit Memory in Depressed Mood,” Clinical Psychology Review, 34: 402–416.
  • Galton, F., 1880, “Statistics of Mental Imagery,” Mind, 5: 301–318.
  • Gauker, C., forthcoming, “On the difference between realistic and fantastic imaginings,” Erkenntnis.
  • Gelbard-Sagiv, H., R. Mukamel, M. Harel, R. Malach and I. Fried, 2008, “Internally Generated Reactivation of Single Neurons in Human Hippocampus During Free Recall,” Science, 322(5898): 96–101.
  • Green, M. C. and T. C. Brock, 2000, “The role of transportation in the persuasiveness of public narratives,” Journal of Personality and Social Psychology, 79(5): 701–721.
  • Gregory, D., 2010, “Imagery, the Imagination and Experience,” Philosophical Quarterly, 60: 735–753.
  • –––, 2014, Showing, Sensing, and Seeming: Distinctively Sensory Representations and their Contents, Oxford: Oxford University Press.
  • –––, 2017, “Visual expectations and visual imagination,” Philosophical Perspectives, 31: 187–206.
  • Grill-Spector, K. and R. Malach, 2004, “The human visual cortex,” Annual Review of Neuroscience, 27: 649–677.
  • Hamann, S., 2001, “Cognitive and Neural Mechanisms of Emotional Memory,” Trends in Cognitive Sciences, 5: 394–400.
  • Harvey, K., E. Kemps and M. Tiggemann, 2005, “The nature of imagery processes underlying food cravings,” British Journal of Health Psychology, 10: 49–56.
  • Helton, G. and B. Nanay, 2019, “Amodal completion and knowledge,” Analysis , 79: 415–423.
  • Hertrich, I., S. Dietrich, & H. Ackermann, 2011, “Cross-modal interactions during perception of audiovisual speech and nonspeech signals: an fMRI study,” Journal of Cognitive Neuroscience , 23: 221–237.
  • Hobbes, T., 1651, Leviathan , London.
  • Holmes, E. A. and A. Matthews, 2010, “Mental imagery in emotion and emotional disorders,” Clinical Psychology Review , 30: 349–362.
  • Hopkins, R., 2018, “Imagining the past,” in F. Macpherson (ed.), Perceptual Imagination and Perceptual Memory , Oxford: Oxford University Press, pp. 46–71.
  • –––, 2012, “What Perky did not show,” Analysis , 72: 431–439.
  • Hume, D., 1739, A Treatise of Human Nature , London.
  • Jeannerod, M., 1994, “The representing brain: Neural correlates of motor intention and imagery,” Behavioral and Brain Sciences , 17: 187–245.
  • –––, 1997, The Cognitive Neuroscience of Action , Oxford: Blackwell.
  • Judge, J. and B. Nanay, 2021, “Expectations,” in N. Nielsen, J. Levinson and T. McAuley (eds.), Oxford Handbook of Music and Philosophy , New York: Oxford University Press, pp. 997–1018.
  • Kavanagh, D. J., J. Andrade, J. May, 2005, “Imaginary relish and exquisite torture: The Elaborated Intrusion theory of desire,” Psychological Review , 112 (2): 446–467.
  • Kavanagh, D. J., J. May, J. Andrade, 2009, “Tests of the Elaborated Intrusion Theory of craving and desire: Features of alcohol craving during treatment for an alcohol disorder,” British Journal of Clinical Psychology , 48: 241–254.
  • Keltner, J. R., A. Furst, C. Fan, R. Redfern, B. Inglis, and H. K. Fields, 2006, “Isolating the Modulatory Effect of Expectation on Pain Transmission: An fMRI Study,” Journal of Neuroscience , 26: 4437–43.
  • Kemps, E. and M. Tiggemann, 2007, “Modality-specific imagery reduces cravings for food: An application of the elaborated intrusion theory of desire to food craving,” Journal of Experimental Psychology-Applied , 13(2): 95-104.
  • Kemps, E., M. Tiggemann, J. Orr, J. Grear, 2014, “Attentional retraining can reduce chocolate consumption,” Journal of Experimental Psychology-Applied , 20(1): 94–102.
  • Kentridge, R. W., C. A. Heywood, and L. Weiskrantz, 1999, “Attention without awareness in blindsight,” Proceedings of the Royal Society of London B , 266: 1805–1811
  • Kilteni, K. B. J. Andersson, C. Houborg, H. H. Ehrson, 2018, “Motor imagery involves predicting the sensory consequences of the imagined movement”, Nature Communications , 9: 1617. doi: 10.1038/s41467-018-03989-0
  • Kind, A., 2017, “Imaginative vividness,” Journal of the American Philosophical Association , 3: 32–50.
  • –––, 2013, “The Heterogeneity of the Imagination,” Erkenntnis , 78(1): 141–59.
  • Kind, A. and P. Kung (eds.), 2016, Knowledge Through Imagination , New York: Oxford University Press.
  • Koenig-Robert, R. & J. Pearson, 2021, “Why do imagery and perception look and feel so different?”, Philosophical Transactions of the Royal Society B, 376 (1817): 20190703.
  • Kosslyn, S. M., 1980, Image and Mind , Cambridge, MA: Harvard University Press.
  • Kosslyn, S. M., M. Behrmann, & M. Jeannerod, 1995a, “The cognitive neuroscience of mental imagery,” Neuropsychologia , 33: 1335–1344.
  • Kosslyn, S. M., W. L. Thompson, and G. Ganis, 2006, The Case for Mental Imagery , Oxford: Oxford University Press.
  • Kouider, S., & S. Dehaene, 2007, “Levels of processing during non-conscious perception: A critical review of visual masking,” Philosophical Transactions of the Royal Society B , 362: 857–875.
  • Kraemer, D. J. M., C. N. Macrae, A. E. Green, & W. M. Kelley, 2005, “Musical imagery: Sound of silence activates auditory cortex,” Nature , 434: 158.
  • Kubler, G., 1987, “Eidetic imagery and Paleolithic art,” Yale University Art Gallery Bulletin , 40: 78–85.
  • Kulpe, O., 1895, Outlines of Psychology , London: Sonnenschein.
  • Kulvicki, J., 2014, Images , London: Routledge.
  • Kung, P., 2010, “Imagining as a Guide to Possibility,” Philosophy and Phenomenological Research , 81(3): 620–663.
  • LaBar, K. S. & R. Cabeza, 2006, “Cognitive Neuroscience of Emotional Memory,” Nature Reviews Neuroscience , 7: 54–64.
  • Lacey, S. and R. Lawson (eds.), 2013, Multisensory Imagery , New York: Springer.
  • Laeng, B., I. M. Bloem, S. D’Ascenzo and L. Tommasi, 2014, “Scrutinizing visual images: The role of gaze in mental imagery and memory,” Cognition , 131: 263–283.
  • Lai, C. K., M. Marini, S. A. Lehr, C. Cerruti, J-E. L. Shin, J. A. Joy-Gaba, A. K. Ho, B. A. Teachman, S. P. Wojcik, S. P. Koleva, R. S. Frazier, L. Heiphetz, E. E. Chen, R. N. Turner, J. Haidt, S. Kesebir, C. B. Hawkins, H. S. Schaefer, S. Rubichi, G. Sartori, C. M. Dial, N. Sriram, M. R. Banaji, and B. A. Nosek, 2014, “Reducing Implicit Racial Preferences: I. A Comparative Investigation of 17 Interventions,” Journal of Experimental Psychology: General , 143: 1765–85.
  • Lamme, V. A. and P. R. Roelfsema, 2000, “The distinct modes of vision offered by feedforward and recurrent processing,” Trends in Neuroscience , 23: 571–579.
  • Langland-Hassan, P., 2015, “Imaginative Attitudes,” Philosophy and Phenomenological Research , 40: 664­–686.
  • –––, 2016, “On choosing what to imagine,” in A. Kind & P. Kung (eds.), Knowledge Through Imagination , New York: Oxford University Press, pp. 85–109
  • –––, 2020, Explaining Imagination , New York: Oxford University Press.
  • Langland-Hassan, P. and A. Vicente, 2018, Inner Speech: New Voices , New York: Oxford University Press.
  • Leaver, A. M., J. Van Lare, B. Zielinski, A. R. Halpern, & J. P. Rauschecker, 2009, “Brain activation during anticipation of sound sequences,” The Journal of Neuroscience , 29(8): 2477–2485.
  • Lee, T. S. and M. Nguyen, 2001, “Dynamics of subjective contour formation in the early visual cortex,” Proceedings of the National Academy of Sciences , 98: 1907–1911.
  • Levin, J., 2006, “Can mental images provide evidence for what is possible?”, Anthropology and Philosophy , 7: 108–119.
  • Litt, M. D. , & N. L. Cooney, 1999, “Inducing craving for alcohol in the laboratory,” Alcohol Research and Health , 23(3): 174–178.
  • Loeffler, S. N., M. Myrtek & M. Peper, 2013, “Mood-congruent Memory in Daily Life: Evidence from Interactive Ambulatory Monitoring,” Biological Psychology , 93: 308–15.
  • Lopes, D. M., 2003, “Out of Sight, Out of Mind,” in M. Kieran and D. M. Lopes (eds.), Imagination, Philosophy, and the Arts , London: Routledge, pp. 208–224.
  • Luria, A. R., 1960, “Memory and the Structure of Mental Processes,” Problems of Psychology , 4: 81–94.
  • MacIver, K., D. M. Lloyd, S. Kelly, N. Roberts, and T. Nurmikko, 2008, “Phantom Limb Pain, Cortical Reorganization and the Therapeutic Effect of Mental Imagery,” Brain , 131: 2181–91.
  • Macpherson, F., 2012, “Cognitive penetration of colour experience,” Philosophy and Phenomenological Research , 84: 24–62
  • Matt, G. E., C. Vazquez & W. K. Campbell, 1992, “Mood-congruent Recall of Affectively Toned Stimuli: A Meta-analytic Review,” Clinical Psychology Review , 12: 227–255.
  • Matthen, M., 2017, “When is synaesthesia perception?” in O. Deroy (ed.), Sensory Blending , Oxford: Oxford University Press, pp. 166–178..
  • May, J., J. Andrade, H. Batey, L-M. Berry, D. J. Kavanagh, 2010, “Less food for thought: Impact of attentional instructions on intrusive thoughts about snack foods,” Appetite , 55: 279–287.
  • May, J., J. Andrade, D. J. Kavanagh, L. Penfound, 2008, “Imagery and strength of craving for eating, drinking and playing sport,” Cognition and Emotion , 22: 633–50.
  • May, J., D. J. Kavanagh, & J. Andrade, 2014, “The Elaborated Intrusion Theory of Desire: A 10-year retrospective and implications for addiction treatments,” Addictive Behaviors , 44: 29–34.
  • McKoon, G., & R. Ratcliff, 1986, “Inferences about predictable events,” Journal of Experimental Psychology: Learning, Memory, and Cognition , 12: 82–91.
  • Michaelian, K., 2016, Mental Time Travel: Episodic Memory and Our Knowledge of the Personal Past , Cambridge, MA: MIT Press.
  • Michotte, A., G. Thinés, G. Crabbé, 1964, “Les complements amodaux des structures perceptives” [Amodal completion of perceptial structures], in G. Thinés, A. Costall, G. Butterworth (eds.), Michotte’s experimental phenomenology of perception , Hillsdale, NJ: Erlbaum, pp. 140–169.
  • Mok, J. N., D. Kwan, L. Green, J. Myerson, C. F. Craver, & R. S. Rosenbaum, 2020, “Is it Time? Episodic Imagining and the Discounting of Delayed and Probabilistic Rewards in Young and Older Adults,” Cognition , 199: 104222.
  • Nanay, B., 2013, Between Perception and Action , Oxford: Oxford University Press.
  • –––, 2015, “Perceptual content and the content of mental imagery,” Philosophical Studies , 172: 1723–1736.
  • –––, 2017a, “Sensory substitution and multimodal mental imagery,” Perception , 46: 1014–1026.
  • –––, 2017b, “Pain and mental imagery,” The Monist , 100: 485–500.
  • –––, 2018a, “Multimodal mental imagery,” Cortex , 105: 125–134.
  • –––, 2018b, “The importance of amodal completion in everyday perception,” i-Perception , 9 (4): 1–16. doi: 10.1177/204166951878887
  • –––, 2021c, “Unconscious mental imagery,” Philosophical Transaction of the Royal Society B , 376 (1817): 20190689
  • –––, 2021a, “Synesthesia as (multimodal) mental imagery,” Multisensory Research , 34: 281–296.
  • –––, 2021b, “Implicit bias as mental imagery,” Journal of the American Philosophical Association, 7 : 329–347.
  • –––, forthcoming, Mental Imagery , Oxford: Oxford University Press.
  • Noe, A., 2004, Action in Perception , Cambridge, MA: MIT Press.
  • Noorman, S., D. A. Neville & I. Simanova, 2018, “Words affect visual perception by activating object shape representations,” Scientific Reports , 8: 14156
  • Osuagwu, B. A. and A. Vuckovic, 2014, “Similarities between explicit and implicit motor imagery in mental rotation of hands: An EEG study,” Neuropsychologia , 65: 197–210.
  • Paivio, A., 1986, Mental representations: a dual coding approach , Oxford: Oxford University Press.
  • –––, 1971, Imagery and Verbal Processes , New York: Holt, Rinehart and Winston.
  • Pan, Y., M. Chen, J. Yin, X. An, X. Zhang, Y. Lu, H. Gong, W. Li, and W. Wang, 2012, “Equivalent representation of real and illusory contours in macaque V4,” The Journal of Neuroscience , 32: 6760–6770.
  • Parthasarathi, T., M. H. McConnell, J. Leury, and J. W. Kable, 2017, “The Vivid Present: Visualization Abilities Are Associated with Steep Discounting of Future Rewards,” Frontiers in Psychology , 8: 289. doi: 10.3389/fpsyg.2017.00289
  • Paulignan, Y., C. L. MacKenzie, R. G. Marteniuk, and M. Jeannerod, 1991, “Selective perturbation of visual input during prehension movements: 1. The effect of changing object position,” Experimental Brain Research , 83: 502–12.
  • Payne, K. B., 2001, “Prejudice and perception: The role of automatic and controlled processes in misperceiving a weapon,” Journal of Personality and Social Psychology , 81(2): 181–192.
  • Peacocke, C., 2019, The Primacy of Metaphysics , Oxford: Oxford University Press.
  • Pearson, J. and S. M. Kosslyn, 2015, “The heterogeneity of mental representation: Ending the mental imagery debate,” Proceedings of the National Academy of Sciences PNAS ( PNAS ), 112: 10089–10092.
  • Pearson, J., T. Naselaris, E. A. Holmes, and S. M. Kosslyn, 2015, “Mental Imagery: Functional Mechanisms and Clinical Applications,” Trends in Cognitive Sciences , 19: 590–602.
  • Peck, T. C., S. Seinfeld, S. M. Aglioti and M. Slater, 2013, “Putting yourself in the skin of a black avatar reduces implicit racial bias,” Consciousness and Cognition , 22: 779–787.
  • Pekkola, J., V. Ojanen, T. Autti, I. P. Jaaskelainen, R. Mottonen, A. Tarkainen, & M. Sams, 2005, “Primary auditory cortex activation by visual speech: an fMRI study at 3 T,” NeuroReport , 16: 125–128.
  • Perky, C. W., 1910, “An Experimental Study of Imagination.” American Journal of Psychology , 21: 422–52.
  • Phelps, E. A., 2004, “Human Emotion and Memory: Interactions of the Amygdala and Hippocampal Complex,” Current Opinions in Neurobiology , 14: 198–202.
  • Phillips, I., 2014, “Lack of imagination: individual differences in mental imagery and the significance of consciousness,” in J. Kallestrup & M. Sprevak (eds.), New Waves in Philosophy of Mind , London: Palgrave Macmillan.
  • Ploghaus, A., I. Tracey, J.S. Cati, S. Clare, R.S. Menon, P.M. Matthews, and J.N.P. Rawlins, 1999, “Dissociating Pain from Its Anticipation in the Human Brain,” Science , 284: 1979–81.
  • Price, M. C., 2009, “Spatial forms and mental imagery,” Cortex , 45: 1229–1245. doi: 10.1016/j.cortex.2009.06.013
  • Proust, M., 1913 [1928], Swann’s Way , C. K. Scott Moncrieff (trans.), New York: Modern Library, 1928.
  • Pylyshyn, Z., 1981, “The imagery debate: Analogue media versus tacit knowledge,” Psychological Review , 88: 16–45.
  • Ramachandran, V. S., D. Rogers-Ramachandran, and S. Cobb, 1995, “Touching the Phantom Limb,” Nature , 377: 489–90.
  • Rauschenberger, R. and S. Yantis, 2001, “Masking unveils pre-amodal completion representation in visual search,” Nature , 410: 369–372.
  • Reisberg , D., D. G. Pearson and S. M. Kosslyn, 2003, “Intuitions and introspections about imagery: The role of imagery experience in shaping an investigator’s theoretical views,” Journal of Applied Psychology , 17: 147–160.
  • Renier, L., O. Collignon, C. Poirier, D. Tranduy, A. Vanlierde, A. Bol, C. Veraart, A. De Volder, 2005, “Cross-modal activation visual cortex during depth perception using auditory substitution of vision,” Neuroimage , 26: 573–580.
  • Richardson, A., 1969, Mental Imagery , New York: Springer Publishing Company, Inc.
  • Ryle, G., 1949, The Concept of Mind , London: Huchinson.
  • Sartre, J.-P., 1940, L’Imaginaire , Paris: Gallimard
  • Sawamoto, N., M. Honda, T. Okada, T. Hanakawa, M. Kanda, H. Fukuyama, J. Konishi, and H. Shibasaki, 2000, “Expectation of Pain Enhances Responses to Nonpainful Somatosensory Stimulation in the Anterior Cingulate Cortex and Parietal Operculum/Posterior Insula: An Event-Related fMRI Study,” Journal of Neuroscience , 20: 7438–45.
  • Scanlon, T., 1998, What We Owe To Each Other , Cambridge, MA: Harvard University Press.
  • Schroeder, T., 2004, Three Faces of Desire , Oxford: Oxford University Press.
  • Sekuler, A. B. & S. E. Palmer, 1992, “Perception of partly occluded objects: a microgenetic analysis,” Journal of Experimental Psychology General , 121: 95–111.
  • Sellars, W., 1978, “The role of imagination in Kant’s theory of experience,” in H. W. Johnstone (ed.), Categories: A Colloquium , University Park: Pennsylvania State University Press.
  • Shea, N., 2018, Representation in Cognitive Science , Oxford: Oxford University Press.
  • Shepard, R. N. and J. Metzler, 1971, “Mental rotation of three- dimensional objects,” Science , 171: 701–703.
  • Simner, J., 2007, “Beyond perception: Synesthesia as a psycholinguistic phenomenon,” Trends in Cognitive Sciences , 11: 23–29.
  • Spence, C. & O. Deroy, 2013, “Crossmodal imagery,” in S. Lacey and R. Lawson (eds.), Multisensory Imagery , New York: Springer, pp. 157–183.
  • Spiller, M. J. and A. S. Jansari, 2008, “Mental imagery and synaesthesia: Is synaesthesia from internally-generated stimuli possible?”, Cognition , 109: 143–151.
  • Spiller, M. J., C. N. Jonas, J. Simner and A. Jansari, 2015, “Beyond visual imagery: How modality-specific is enhanced mental imagery in synesthesia?”, Consciousness and Cognition , 31: 73–85.
  • Starr, G., 2013, Feeling Beauty: The Neuroscience of Aesthetic Experience , Cambridge, MA, MIT Press.
  • Statham, D. J., J. P. Connor, D. J. Kavanagh, G. F. X. Feeney, R. M. Young, J. May, & J. Andrade, 2011, “Measuring alcohol craving: Development of the Alcohol Craving Questionnaire,” Addiction , 6: 1230–1238.
  • Stokes, D., 2019, “Mental imagery and fiction,” Canadian Journal of Philosophy , 49: 731–754.
  • Strawson, P. F., 1974, “Imagination and Perception,” in P.F. Strawson, Freedom and Resentment , London: Methuen, pp. 45–65.
  • Talavage, T. M., M. I. Sereno, J. R. Melcher, P. J. Ledden, B. R. Rosen, A. M. Dale, 2004, “Tonotopic organization in human auditory cortex revealed by progressions of frequency sensitivity,” Journal of Neurophysiology , 91(3): 1282–96.
  • Thaler, L., S. Arnott, & M. Goodale, 2011, “Neural correlates of natural human echolocation in early and late blind echolocation experts,” PLoS ONE , 6(5): e20162.
  • Thorpe, S., D. Fize and C. Marlot, 1996, “Speed of processing in the human visual system,” Nature , 381: 520–522.
  • Tiffany, S. T., & D. M. Hakenewerth, 1991, “The production of smoking urges through an imagery manipulation: Psychophysiological and verbal manifestations,” Addictive Behaviors , 16: 389–400.
  • Tiffany, S. T., & D. J. Drobes, 1990, “Imagery and smoking urges: the manipulation of affective content,” Addictive Behaviors , 15: 531–539.
  • Titchener, E. B., 1909, Lectures on the Experimental Psychology of the Thought-Processes , New York: Macmillan.
  • Trope, Y., & N. Liberman, 2003, “Temporal construal,” Psychological Review , 110: 403–421.
  • Tulving, E., 1972, “Episodic and semantic memory,” in E. Tulving & W. Donaldson (eds.), Organization of memory , Cambridge, MA: Academic Press, pp. 382–403.
  • Tye, M., 1991, The Imagery Debate , Cambridge, MA: MIT Press.
  • Van Leeuwen, N., 2011, “Imagination is where the action is,” Journal of Philosophy , 108: 55–77.
  • Van Leeuwen, N., 2016, “The Imaginative Agent,” in A. Kind & P. Kung (eds.), Knowledge Through Imagination , New York: Oxford University Press, pp. 85–109.
  • van Lier R. and V. Ekroll, 2020, “A conceptual playground between perception and cognition: Introduction to the special issue on amodal completion,” i-Perception , 11(4): 1-4.
  • Versland, A. and H. Rosenberg, 2007, “Effect of brief imagery interventions on craving in college student smokers,” Addiction Research Theory , 15(2): 177–187.
  • Vetter P., F. W. Smith, L. Muckli, 2014, “Decoding sound and imagery content in early visual cortex,” Current Biology , 24: 1256–1262.
  • Volz, M. S., V. Suarez-Contreras, A.L. Santos Portilla, and F. Fregni, 2015, “Mental Imagery-Induced Attention Modulates Pain Perception and Cortical Excitability,” BMC Neuroscience , 16: 15.
  • Wheeler, M., S. Peterson, and R. Buckner, 2000, “Memory’s Echo: Vivid Remembering Reactivates Sensory-Specific Cortex,” Proceedings of the National Academy of Sciences PNAS ( PNAS ), 97(20): 11125–11129.
  • Wiltsher, N., 2016, “Against the additive view of imagination,” Australasian Journal of Philosophy , 94: 266–282.
  • Wundt, W., 1912, An Introduction to Psychology , 2nd edition, New York: Macmillan.
  • Yates, F., 1966, The Art of Memory , London: Routledge.
  • Yeshurun, Y. and M. Carrasco, 1998, “Attention improves or impairs visual performance by enhancing spatial resolution,” Nature , 396: 72–75.
  • Yokosawa, K., S. Pamilo, L. Hirvenkari, R. Hari, & E. Pihko, 2013, “Activation of auditory cortex by anticipating and hearing emotional sounds: an MEG study,” PLoS ONE , 8(11): e80284.
  • Yolton, J. W., 1996, Perception and Reality: A History from Descartes to Kant , Ithaca, NY: Cornell University Press.
  • Yonelinas, A. P. and M. Ritchey, 2015, “The Slow Forgetting of Emotional Episodic Memories,” Trends in Cognitive Sciences , 19: 259–267.
  • Young, B., 2020, “Olfactory imagery,” Philosophical Studies , 177: 3303–3327.
  • Young, B. and B. Nanay, forthcoming, “Olfactory amodal completion,” Pacific Philosophical Quarterly .
  • Zatorre, R. J. and A. R. Halpern, 2005, “Mental concert: Musical imagery and auditory cortex,” Neuron , 47: 9–12.
  • Zeman, A. Z. J., S. Della Sala, L. A. Torrens, V-E. Gountouna, D. J. McGonigle, R. H. Logie, 2010, “Loss of imagery phenomenology with intact visuo-spatial task performance: A case of ‘blind imagination,” Neuropsychologia , 48: 145–155.
  • Zeman, A., F. Milton, S. Della Sala, M. Dewar, T. Frayling, J. Gaddum, A. Hattersley, B. Heuerman-Williamson, K. Jones, M. MacKisack, C. Winlove, 2020, “Phantasia – The Psychological Significance of Lifelong Visual Imagery Vividness Extremes,” Cortex , 130: 426–440.
How to cite this entry . Preview the PDF version of this entry at the Friends of the SEP Society . Look up topics and thinkers related to this entry at the Internet Philosophy Ontology Project (InPhO). Enhanced bibliography for this entry at PhilPapers , with links to its database.
  • Thomas, Nigel, “Mental Imagery”, Stanford Encyclopedia of Philosophy (Fall 2021 Edition), Edward N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/fall2021/entries/mental-imagery/ >. [This was the previous entry on this topic in the Stanford Encyclopedia of Philosophy — see the version history .]
  • Bibliography on Mental Imagery , PhilPapers.org
  • Vividness of Visual Imagery Questionnaire (VVIQ), Aphantasia Network.

desire | emotion | imagination | memory | mental representation | music, philosophy of | perception: the contents of | -->spatial perception -->

Copyright © 2021 by Bence Nanay <bence.nanay@ua.ac.be>

Review Article | Published: 05 August 2019

The human imagination: the cognitive neuroscience of visual mental imagery

Joel Pearson (ORCID: 0000-0003-3704-5037)

Nature Reviews Neuroscience, volume 20, pages 624–634 (2019). https://doi.org/10.1038/s41583-019-0202-9

Subjects: Object vision, Sensory systems, Working memory

Mental imagery can be advantageous, unnecessary and even clinically disruptive. With methodological constraints now overcome, research has shown that visual imagery involves a network of brain areas from the frontal cortex to sensory areas, overlapping with the default mode network, and can function much like a weak version of afferent perception. Imagery vividness and strength range from completely absent (aphantasia) to photo-like (hyperphantasia). Both the anatomy and function of the primary visual cortex are related to visual imagery. The use of imagery as a tool has been linked to many compound cognitive processes and imagery plays both symptomatic and mechanistic roles in neurological and mental disorders and treatments.



Acknowledgements

The author thanks R. Keogh, R. Koenig-Robert and A. Dawes for helpful feedback and discussion on this paper. This paper, and some of the work discussed in it, was supported by Australian National Health and Medical Research Council grants APP1024800, APP1046198 and APP1085404, a Career Development Fellowship APP1049596 and an Australian Research Council discovery project grant DP140101560.

Author information

Joel Pearson, School of Psychology, The University of New South Wales, Sydney, Australia.

Correspondence to Joel Pearson.

Ethics declarations

Competing interests

The author declares no competing interests.

Additional information

Peer review information: Nature Reviews Neuroscience thanks D. Kavanagh, J. Hohwy and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.


Glossary

  • The reverse direction of neural information flow, for example, from the top-down, as opposed to the bottom-up.
  • Magnetic resonance imaging and functional magnetic resonance imaging decoding methods that are constrained by or based on individual voxel responses to perception, which are then used to decode imagery (illustrated in the sketch after this list).
  • Transformations in a spatial domain.
  • The conscious sense or feeling of something, different from detection.
  • A mental disorder characterized by social anxiety, thought disorder, paranoid ideation, derealization and transient psychosis.
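The decoding approach named in the second glossary entry (a decoder trained on voxel responses to perception and then applied to imagery) can be conveyed with a minimal sketch on synthetic data. This is only an illustration, not the pipeline of any study discussed here: the voxel count, noise level, the 0.4 imagery signal scaling and the choice of logistic regression are all arbitrary assumptions.

```python
# Toy sketch of perception-constrained decoding: fit a classifier on
# simulated voxel responses to perceived stimuli, then test it on weaker,
# noisier responses simulated for imagery of the same stimuli.
# All numbers (voxel count, noise, 0.4 imagery scaling) are arbitrary.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials, n_voxels = 80, 200

# Hypothetical voxel patterns for two stimulus classes (e.g., faces vs houses)
pattern_a = rng.normal(size=n_voxels)
pattern_b = rng.normal(size=n_voxels)

def simulate(pattern, scale, noise):
    """Trial-by-voxel responses: a scaled class pattern plus Gaussian noise."""
    return scale * pattern + rng.normal(0.0, noise, size=(n_trials, n_voxels))

# Perception runs: strong signal
X_percept = np.vstack([simulate(pattern_a, 1.0, 1.0),
                       simulate(pattern_b, 1.0, 1.0)])
# Imagery runs: the same patterns, weaker signal
X_imagery = np.vstack([simulate(pattern_a, 0.4, 1.0),
                       simulate(pattern_b, 0.4, 1.0)])
y = np.repeat([0, 1], n_trials)

# Train on perception, then read out imagery content with the same weights
clf = LogisticRegression(max_iter=1000).fit(X_percept, y)
print(f"cross-decoding accuracy on imagery trials: {clf.score(X_imagery, y):.2f}")
```

Because the simulated imagery patterns are weaker versions of the perception patterns, a classifier fitted on perception still transfers to imagery trials, which is the sense in which such decoding is constrained by perception.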


This article is cited by

Predicting the subjective intensity of imagined experiences from electrophysiological measures of oscillatory brain activity.

  • Derek H. Arnold
  • Blake W. Saurels
  • Dietrich S. Schwarzkopf

Scientific Reports (2024)

Visual hallucinations induced by Ganzflicker and Ganzfeld differ in frequency, complexity, and content

  • Oris Shenyan
  • Matteo Lisi
  • Tessa M. Dekker

Neural signatures of imaginary motivational states: desire for music, movement and social play

  • Giada Della Vedova
  • Alice Mado Proverbio

Brain Topography (2024)

Subregions of the fusiform gyrus are differentially involved in the attentional mechanism supporting visual mental imagery in depression

  • Jun-He Zhou
  • Bin-Kun Huang

Brain Imaging and Behavior (2024)

Memory retrieval effects as a function of differences in phenomenal experience

  • Austin H. Schmidt
  • C. Brock Kirwan

Quick links

  • Explore articles by subject
  • Guide to authors
  • Editorial policies

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

what is visual representation in psychology


Ways of Seeing: The scope and limits of visual cognition


Introduction: What is human visual cognition?

  • Published: October 2003

Humans can see a great variety of things. They can see tables, trees, flowers, stars, planets, mountains, rivers, substances, tigers, people, vapors, rain, snow, gases, flames, clouds, smoke, shadows, flashes, holes, pictures, signs, movies, events, actions (including people seeing any of the preceding). They can see properties of things such as the color, texture, orientation, shape, contour, location, motion of objects. They can see facts, such as the fact that a given object exemplifies a set of visual attributes and/or stands in some visual relation to other objects. Sight, visual experience or visual perception, is both a particular kind of human experience and a fundamental source of human knowledge of the world. Furthermore, it interacts in multiple ways with human thought, human memory and the rest of human cognition.

Many of the things humans can see they can also think about. Many of the things they can think about, however, they cannot see. For example, they can think about, but they cannot see at all, prime numbers. Nor can they see atoms, molecules and cells without the aid of powerful instruments. Arguably, while atoms, molecules and cells are not visible to the naked eye, unlike numbers, they are not invisible altogether: with powerful microscopes, they become visible. Unlike numerals, however, numbers—whether prime or not—are simply not to be seen at all. Similarly, humans can entertain the thought, but they cannot see, that many of the things they can think about they cannot see.


Visual Cognition

  • Reference work entry
  • First Online: 01 January 2016

David Vernon

Synonyms: Visual Inference

Related concepts: Cognitive Vision

Visual cognition is the branch of psychology that is concerned with combining visual data with prior knowledge to construct high-level representations and make unconscious decisions about scene content [1].

Although the terms visual cognition and cognitive vision are strikingly similar, they are not equivalent. Cognitive vision refers to goal-oriented computer vision systems that exhibit adaptive and anticipatory behavior. In contrast, visual cognition is concerned with how the human visual system makes inferences about the large-scale composition of a visual scene using partial information [1–3].
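
To make the entry's idea of "combining visual data with prior knowledge" concrete, here is a minimal sketch of perceptual inference as Bayesian cue combination. This is a common textbook formalization, not the entry's own model; the function name, the Gaussian assumptions, and the numbers are all illustrative.

```python
# A minimal sketch of perceptual inference as Bayesian cue combination.
# Assumes a Gaussian prior (stored knowledge) and Gaussian sensory noise;
# names and numbers are illustrative, not taken from the entry above.

def combine(prior_mean, prior_var, sensory_mean, sensory_var):
    """Posterior over a scene property (e.g., object size) given a noisy cue.

    With Gaussian prior and likelihood, the posterior is Gaussian and its
    mean is a precision-weighted average: reliable cues dominate, vague
    cues defer to prior knowledge.
    """
    w_prior = 1.0 / prior_var
    w_sense = 1.0 / sensory_var
    post_var = 1.0 / (w_prior + w_sense)
    post_mean = post_var * (w_prior * prior_mean + w_sense * sensory_mean)
    return post_mean, post_var

# Prior belief: objects of this kind are about 10 units; a noisy glimpse says 14.
print(combine(prior_mean=10.0, prior_var=4.0, sensory_mean=14.0, sensory_var=1.0))
# -> approximately (13.2, 0.8): the estimate is pulled most of the way toward
#    the more reliable sensory evidence, but prior knowledge still contributes.
```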

Visual cognition, often associated with high-level vision and top-down visual processing, constructs visual entities by collecting perceived parts into coherent wholes, determining which parts belong together. Since the sensory data on which the processes of visual cognition operate are typically...


References

1. Cavanagh P (2011) Visual cognition. Vis Res 51(13):1538–1551
2. Coltheart V (ed) (2010) Tutorials in visual cognition. Macquarie monographs in cognitive science. Psychology Press, London
3. Pinker S (1984) Visual cognition: an introduction. Cognition 18:1–63
4. Blakemore S, Decety J (2001) From the perception of action to the understanding of intention. Nat Rev Neurosci 2(1):561–567
5. Carrasco M (2011) Visual attention: the past 25 years. Vis Res 51(13):1484–1525
6. Rensink RA (2002) Change detection. Annu Rev Psychol 53:245–277
7. Simons DJ (2000) Current approaches to change blindness. Vis Cogn 7(1–3):1–15
8. Torralba A, Oliva A, Castelhano MS, Henderson JM (2006) Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. Psychol Rev 113(4):766–786
9. Deco G, Rolls E (2005) Attention, short term memory, and action selection: a unifying theory. Prog Neurobiol 76:236–256
10. Spelke ES (1990) Principles of object perception. Cogn Sci 14:29–56
11. Oliva A, Torralba A (2006) Building the gist of a scene: the role of global image features in recognition. Prog Brain Res 155:23–36
12. Newell A (1990) Unified theories of cognition. Harvard University Press, Cambridge
13. Newell A, Simon HA (1976) Computer science as empirical inquiry: symbols and search. Commun Assoc Comput Mach 19:113–126 (Tenth Turing Award lecture, ACM, 1975)
14. Pylyshyn ZW (1999) Is vision continuous with cognition? The case for cognitive impenetrability of visual perception. Behav Brain Sci 22(3):341–365
15. Cavanagh P (1999) The cognitive impenetrability of cognition. Behav Brain Sci 22(3):370–371
16. Pylyshyn ZW (1999) Vision and cognition: how do they connect? Behav Brain Sci 22(3):401–414

Author: David Vernon, Informatics Research Centre, University of Skövde, Skövde, Sweden

Editor: Katsushi Ikeuchi, Institute of Industrial Science, The University of Tokyo, Tokyo, Japan


Cite this entry: Vernon, D. (2014). Visual Cognition. In: Ikeuchi, K. (ed) Computer Vision. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-31439-6_785



Exploring Visual Imagery in Psychology: Definition and Uses


Visual imagery plays a crucial role in our daily lives, influencing everything from memory and learning to creativity and problem-solving. In psychology, the study of visual imagery has a rich history and continues to be a fascinating area of research.

From behavioral studies to neuroimaging techniques, researchers have delved deep into how the brain processes visual information. In this article, we will explore the definition of visual imagery, its uses in psychology, and how it can be applied in everyday life for stress management, goal setting, and education.

Join us as we uncover the fascinating world of visual imagery.

  • Visual imagery is the mental representation of visual information, and plays a significant role in psychology research.
  • Studied through behavioral, neuroimaging, and cognitive experiments, visual imagery has various uses such as improving memory and creativity, treating mental health disorders, and enhancing performance.
  • Visual imagery works by activating sensory and motor areas, involving the hippocampus, and connecting to emotions and memory; it can be applied in everyday life for stress management, goal setting, and learning.

What is Visual Imagery?

Visual Imagery refers to the mental representation or cognitive process that allows individuals to experience and manipulate images in their minds, encompassing static, dynamic, and interactive forms of imagery.

Visual imagery plays a crucial role in various fields, including literature, where vivid descriptions allow readers to paint mental pictures of scenes, characters, and emotions. In art, artists use visual imagery to convey their thoughts, ideas, and emotions through paintings, sculptures, and other forms of visual art. Sports professionals often use mental imagery techniques to visualize successful movements and outcomes before actually performing them, enhancing their performance and confidence.

In psychology, Visual Imagery is studied for its impact on cognitive processes, memory, and problem-solving. Figures like Francis Galton, Wilhelm Wundt, and Edward Titchener have contributed significantly to understanding the nature and functions of mental imagery. In empirical sciences, visual imagery is used to investigate brain activity and mental representations, often compared to sensory information and direct external stimuli to explore the differences between mental imagery and actual images.

History of Visual Imagery in Psychology

The formal study of Visual Imagery in psychology dates back to early investigators such as Francis Galton, though writers had long exploited its power: F. Scott Fitzgerald's vivid descriptions in 'The Great Gatsby' showcase how mental imagery shapes perceptions, cognition, and actions.

In the realm of psychology, the study of visual imagery has grown to encompass various theories and methodologies. One significant advancement was the work of Gestalt psychologists in the early 20th century, emphasizing how the mind organizes visual stimuli into coherent patterns. This laid the foundation for understanding the holistic nature of visual perception.

Behaviorism, initiated by John B. Watson and later developed by B.F. Skinner, focused on observable behaviors and downplayed the significance of mental processes like imagery. The cognitive revolution of the 1950s and 1960s, spearheaded by researchers such as George Miller and Ulric Neisser, rekindled interest in mental representations and imagery.

Exploring the neural basis of visual imagery, studies have highlighted the involvement of sensory cortices in processing mental images. The concept of retinotopy, where visual stimuli are mapped on the visual cortex according to their location in the visual field, has provided valuable insights into how the brain represents visual information.
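
As a rough illustration of retinotopy, the sketch below uses the complex-logarithm (log-polar) mapping that is often used as a first approximation of how the retina projects onto primary visual cortex; the parameter value and the test eccentricities are made-up assumptions for illustration, not values drawn from the studies discussed here.

```python
# A toy log-polar model of retinotopic mapping. The constant `a` and the
# sample eccentricities are illustrative assumptions, not measured values.
import math

def retina_to_cortex(x, y, a=0.5):
    """Map a retinal position (x, y), in degrees of visual field, to an
    idealized cortical coordinate.

    Eccentricity is compressed logarithmically, which captures cortical
    magnification: central vision occupies a disproportionately large
    stretch of cortex."""
    ecc = math.hypot(x, y)   # eccentricity (degrees from fixation)
    ang = math.atan2(y, x)   # polar angle (radians)
    u = math.log(ecc + a)    # cortical distance along the eccentricity axis
    v = ang                  # cortical position along the polar-angle axis
    return u, v

for ecc in (0.5, 2.0, 8.0, 32.0):
    u, _ = retina_to_cortex(ecc, 0.0)
    print(f"{ecc:5.1f} deg -> cortical coordinate {u:5.2f}")
# At larger eccentricities, each fourfold jump in eccentricity adds a
# near-constant cortical step (approaching log 4).
```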

How is Visual Imagery Studied in Psychology?

Visual Imagery is studied in psychology through diverse methodologies, including behavioral studies, neuroimaging techniques, and cognitive experiments, aiming to understand its impact on perception, cognition, and actions.

One notable approach often utilized to explore Visual Imagery in psychology is guided imagery therapy, where individuals are guided by a therapist to evoke mental images to enhance relaxation, reduce stress, or address psychological concerns.

References such as the APA Dictionary of Psychology serve as valuable resources to define and study Visual Imagery concepts in a standardized manner.

Analysis of Visual Imagery extends beyond psychology, intersecting with fields like art, neuroscience, and philosophy, which offer unique perspectives on how imagery influences mental processes and creative expression.

Researchers have conducted studies linking Visual Imagery to memory consolidation, problem-solving abilities, and emotional regulation, indicating its profound impact on various cognitive functions.

Behavioral Studies

Behavioral studies on Visual Imagery often involve observing athletes, such as a basketball player, to analyze how mental imagery impacts their perception, cognition, and actions during performance.

Visual imagery plays a crucial role in enhancing athletic performance by allowing athletes to mentally rehearse movements, strategize plays, and build confidence. Research indicates that when athletes vividly picture themselves executing skills with precision, their neural networks fire similarly to when they physically perform those actions.

In a study published in the Journal of Sport and Exercise Psychology, participants who engaged in mental imagery exercises for free throws showed significant improvement in their accuracy on the court. This highlights the effectiveness of incorporating mental imagery techniques in sports psychology to prime athletes for peak performance.

Neuroimaging Techniques

Neuroimaging techniques provide valuable insights into how Visual Imagery is processed in the brain, revealing neural activations in sensory areas and the visual system during the cognitive processes of mental imagery.

Functional magnetic resonance imaging (fMRI) and Positron Emission Tomography (PET) are commonly used neuroimaging methods to study Visual Imagery. These tools allow researchers to observe brain activity in real-time as participants engage in tasks requiring mental imagery.

One of the fascinating aspects of these studies is the identification of specific neural pathways involved in processing mental images. For instance, studies have shown that the occipital lobe, responsible for visual processing, is highly active during Visual Imagery tasks.

Cognitive Experiments

Cognitive experiments delve into the nuances of Visual Imagery as a mental phenomenon, exploring how sensory information is transformed into perceptual representations without direct external stimuli, including auditory mental imagery processes.

This exploration sheds light on the intricate mechanisms that underlie the human mind’s ability to generate vivid mental images. For instance, studies have shown that when individuals engage in visual imagery tasks, specific brain regions associated with visual processing, such as the occipital lobe, become more active.

These experiments reveal the interconnected nature of auditory mental imagery in cognitive processes. For example, experiments involving the ‘auditory imagery continuum’ demonstrate how individuals can mentally represent sounds ranging from pure memory-based recall to vivid imagined experiences that mimic real auditory sensations.

What Are the Uses of Visual Imagery in Psychology?

Visual Imagery serves various purposes in psychology, such as enhancing memory and learning, improving performance in sports and other activities, treating mental health disorders, and fostering creativity and problem-solving skills.

In terms of memory enhancement, Visual Imagery can help individuals create vivid mental images that are easier to recall later. These images can serve as powerful memory cues, aiding in the retention of information over time. In the realm of learning, incorporating visual elements into educational materials can make complex concepts more understandable and memorable.

In sports psychology, athletes often use Visual Imagery techniques to mentally rehearse their performances, visualize success, and build confidence. By visualizing themselves executing flawless techniques or achieving goals, athletes can improve their skills and enhance their overall performance on the field or court.

Moreover, Visual Imagery plays a crucial role in addressing mental health challenges through guided imagery therapy. This therapeutic approach encourages individuals to create calming mental images to reduce anxiety, manage stress, and improve overall well-being. By engaging in positive visualizations, individuals can reframe negative thought patterns and promote emotional healing.

In terms of creativity and problem-solving, Visual Imagery sparks innovative thinking by allowing individuals to mentally manipulate images, visualize different scenarios, and explore various solutions to complex problems. By tapping into the power of Visual Imagery, individuals can unlock fresh perspectives, stimulate creativity, and generate novel ideas that can lead to groundbreaking innovations.

Enhancing Memory and Learning

Visual Imagery plays a crucial role in enhancing memory and learning processes by leveraging the vividness of imagery and tapping into unconscious mental imagery mechanisms, stimulating the imagination for improved retention and recall.

Visual imagery involves the mental representation of objects, scenes, or concepts in a way that mirrors real-life perception, creating a more engaging learning experience. When a person visualizes information, it activates various regions in the brain responsible for memory and cognition, leading to stronger neural connections that aid in long-term memory formation.

Unconscious mental imagery, on the other hand, occurs automatically without conscious effort, influencing how memories are stored and recalled. This phenomenon underscores the power of our mind to process information subconsciously, emphasizing the importance of incorporating visual cues in educational settings to optimize learning outcomes.

By harnessing the potential of imagination and mental imagery, educators can design interactive lessons that encourage students to visualize complex concepts, making abstract ideas more concrete and easier to remember. For instance, using colorful diagrams, mind maps, or visual metaphors can guide learners in creating mental images that enhance comprehension and retention.

Improving Performance in Sports and Other Activities

Visual Imagery techniques are widely used to enhance performance in sports and other activities, empowering athletes, such as a basketball player, to visualize successful outcomes, harness the power of imagination, and distinguish mental imagery from actual images for optimal results.

Athletes tap into the power of Visual Imagery to align their thoughts with their physical actions, creating a mental blueprint of success before stepping onto the field or court. By engaging in mental rehearsal and visualization exercises, they prime their minds for peak performance, sharpen focus, and build confidence. Through this process, they fine-tune their skills, anticipate challenges, and cultivate a winning mindset. This mental conditioning, combined with physical training, contributes significantly to their overall athletic success.

Treating Mental Health Disorders

Visual Imagery, particularly through guided imagery therapy, is employed in treating various mental health disorders by reshaping maladaptive mental imagery patterns, altering perceptions, influencing cognitive processes, and guiding adaptive actions for therapeutic benefits.

Through guided imagery therapy , individuals can immerse themselves in positive mental scenarios, envisioning a place of safety, comfort, and healing. This technique can help reduce symptoms of anxiety by calming the mind and promoting relaxation.

In cases of depression, Visual Imagery interventions can facilitate the restructuring of negative thought patterns, fostering a more positive outlook and motivation for change.

In the treatment of PTSD, Visual Imagery can be used to reprocess traumatic memories, desensitize triggering stimuli, and cultivate a sense of control over past experiences. By engaging the senses through mental images, patients can gradually confront and reframe distressing events, leading to symptom reduction and emotional healing.

Enhancing Creativity and Problem-Solving Skills

Visual Imagery plays a pivotal role in enhancing creativity and problem-solving skills by stimulating artistic expression, fostering aesthetic appreciation, engaging the imagination, and facilitating cognitive processes that underpin innovative solutions.

Through the power of Visual Imagery, individuals are able to transcend traditional boundaries and delve into realms of boundless creativity. In the art world, renowned painters like Vincent Van Gogh and Salvador Dali utilized vivid mental imagery to create groundbreaking masterpieces that continue to inspire generations. Similarly, in design and architecture, visionaries such as Zaha Hadid and Frank Gehry harnessed the potential of Visual Imagery to redefine spatial aesthetics and challenge conventional norms.

How Does Visual Imagery Work in the Brain?

Visual Imagery functions in the brain by activating sensory and motor areas, involving the hippocampus in memory formation, and establishing connections to emotions that influence the storage and retrieval of mental images.

When someone engages in Visual Imagery tasks, the brain recruits many of the same neural networks involved in perception and movement. This cascade of activity involves not just the primary visual cortex but also areas responsible for interpreting and processing sensory input. As the individual conjures mental images, the hippocampus, known for its crucial role in memory consolidation, plays a pivotal part in encoding these visual representations.

Activation of Sensory and Motor Areas

The activation of sensory and motor areas during Visual Imagery tasks involves the recruitment of specific brain regions, such as the sensory cortices, that facilitate the integration of mental images with cognitive processes and preparatory actions.

When individuals engage in Visual Imagery tasks, such as imagining themselves performing a specific movement or picturing an object, neural networks spanning the occipital, parietal, and temporal lobes are activated. These regions play a crucial role in processing visual stimuli and transforming them into mental representations.

Motor areas, including the premotor and primary motor cortices, are recruited to simulate the execution of planned actions based on the mental images created.

This integration of sensory and motor activation in the brain allows individuals to mentally rehearse movements, improve motor learning, and enhance decision-making processes by linking visual information with action planning.

Studies utilizing functional magnetic resonance imaging (fMRI) have demonstrated the intricate interplay between neural circuits involved in Visual Imagery tasks, shedding light on how the brain orchestrates this complex cognitive function.

Involvement of the Hippocampus

The hippocampus plays a pivotal role in Visual Imagery by encoding memories associated with mental images, integrating emotional components that enhance retention, and supporting cognitive processes linked to recalling visual information.

Research indicates that the hippocampus is crucial in memory consolidation, especially during sleep when it enhances the transfer of new information into long-term memory. Studies have shown that emotional associations with mental images can significantly impact memory formation and retrieval, with the hippocampus playing a key role in processing the emotional salience of visual memories.

The hippocampus is involved in mental imagery tasks, such as spatial navigation and scene construction, where it helps create vivid mental representations that aid in memory retrieval and decision-making processes.

The intricate functions of the hippocampus in Visual Imagery highlight its significance in shaping our memory experiences and emotional responses to visual stimuli.

Connection to Emotions and Memory

Visual Imagery is intricately connected to emotions and memory processes, influencing cognitive actions, aesthetic experiences, and philosophical inquiries within empirical sciences that seek to unravel the mysteries of mental phenomena.

When individuals visualize something, whether it’s a beautiful sunset, a childhood memory, or an abstract concept, their brain activates regions associated with both visual processing and emotional response. This interplay between the sensory input of images and the emotional interpretations they evoke plays a crucial role in shaping our memories. Studies have shown that emotional content embedded in visual stimuli enhances memory retention and recall, indicating a complex relationship between what we see and how we feel.

How Can Visual Imagery Be Used in Everyday Life?

Visual Imagery offers practical applications in everyday life by providing visualization techniques for stress management, aiding in goal setting and achievement, and enhancing educational experiences through immersive learning methods.

By incorporating Visual Imagery in stress management, individuals can create calming mental images to reduce anxiety levels and promote relaxation. This can involve picturing serene landscapes, favorite memories, or peaceful scenarios to promote a sense of tranquility.

In terms of achieving goals, Visual Imagery plays a crucial role in helping individuals envision their desired outcomes, which can enhance motivation and focus. By visualizing success and progress, individuals can reinforce positive behaviors and build confidence in their capabilities.

Visualization Techniques for Stress Management

Visualization techniques utilizing Visual Imagery are effective tools for stress management, as they modulate perceptions, influence cognitive responses, and guide adaptive actions to alleviate emotional distress and promote relaxation.

Visual Imagery plays a crucial role in stress management by engaging the mind in positive mental pictures, redirecting focus from stress triggers to calming scenes. By visualizing serene locations like a peaceful beach or lush forest, individuals can create a mental escape from anxiety-inducing situations. This shift in focus can trigger relaxation responses in the body, reducing physiological reactions to stress. Visualizing oneself successfully overcoming challenges can boost confidence and resilience in facing stressors.

Using Visual Imagery in Goal Setting and Achievement

Visual Imagery aids in goal setting and achievement by engaging the imagination to create vivid mental images of desired outcomes, aligning actions in everyday life with artistic, sports, and psychological practices that enhance performance and motivation.

By leveraging the capabilities of Visual Imagery, individuals can mentally simulate their path towards success, paving the way for overcoming challenges and staying committed to their objectives. This technique is not just limited to specific fields like art or athletics; it extends to various facets of daily routines, enabling individuals to envision their progress and maintain a sense of direction.

Just like how athletes mentally rehearse their performances or artists envision their creations before executing them, ordinary individuals can leverage this mental tool to boost their productivity, confidence, and resilience in the face of setbacks.

Incorporating Visual Imagery in Education and Learning

Visual Imagery enriches educational experiences by integrating mental imagery techniques that enhance cognitive processes, improve memory retention, stimulate imagination, and reinforce learning through interactive and engaging practices.

These techniques not only assist students in grasping complex concepts but also foster a deeper understanding of topics across diverse educational fields. By incorporating visual aids such as diagrams, charts, and mind maps, educators can cater to different learning styles and enhance knowledge retention.

Visual Imagery plays a pivotal role in creating a multi-sensory learning environment that promotes creativity and critical thinking skills. Utilizing storytelling or virtual reality experiences can transport students into immersive educational scenarios, making learning more interactive and memorable.

Frequently Asked Questions

What is visual imagery in psychology?

Visual imagery in psychology refers to the mental process of creating and manipulating images in our minds. It involves using our imagination to visualize objects, people, or events, even when they are not physically present.

How is visual imagery used in psychology?

Visual imagery is used in psychology to understand how our minds create mental images and how these images can influence our thoughts, emotions, and behaviors. It is also used in therapy, such as in guided imagery, to help individuals cope with anxiety, depression, and other psychological issues.

What are the benefits of exploring visual imagery in psychology?

Exploring visual imagery in psychology can provide valuable insights into how our minds work, helping us understand our thoughts and behaviors better. It can also be used as a therapeutic tool to promote relaxation, improve memory, and reduce stress and anxiety.

How does visual imagery impact our perception?

Visual imagery plays a crucial role in our perception of the world as it allows us to create and manipulate mental images that are not present in our immediate environment. These images can influence how we perceive and interpret our surroundings, leading to changes in our behavior and emotions.

Can visual imagery be improved or enhanced?

Yes, visual imagery skills can be improved and enhanced through practice and training. By actively engaging in exercises that involve visualizing objects, scenes, or events, individuals can improve their ability to create vivid and detailed mental images.

How is visual imagery different from other forms of imagination?

Visual imagery specifically refers to the process of creating and manipulating images in our minds, while other forms of imagination, such as auditory or kinesthetic, involve using other senses. Visual imagery is also often more vivid and detailed compared to other forms of imagination.


Dr. Henry Foster is a neuropsychologist with a focus on cognitive disorders and brain rehabilitation. His clinical work involves assessing and treating individuals with brain injuries and neurodegenerative diseases. Through his writing, Dr. Foster shares insights into the brain’s ability to heal and adapt, offering hope and practical advice for patients and families navigating the challenges of cognitive impairments.



Visual Imagery

Visual Imagery is the mental representation or recreation of something that is not physically present. It involves the mind’s ‘eye’ forming images, enabling us to ‘see’ a concept, idea, or physical object even when it is not before our eyes. This cognitive process can significantly impact our thought processes, memory recall, and even physiological responses.

Types of Visual Imagery

Visual Imagery is not restricted to a single form. It may take various shapes and formats, each with its unique benefits and uses.

Static Imagery

Static Imagery involves the mental visualization of still images . This can include anything from remembering a picture you saw in a book, envisioning the face of a loved one, or even recalling a scene from a movie.

Dynamic Imagery

Dynamic Imagery, on the other hand, involves the creation of moving images in the mind. This could include imagining a horse galloping across a field, visualizing the flow of a river, or picturing a car driving down a street.

Interactive Imagery

Interactive Imagery is a step beyond static and dynamic imagery, wherein you visualize yourself interacting with the visualized scene or object. Athletes often use this type of imagery, imagining themselves performing their sport to prepare mentally for the actual event.

Examples of Visual Imagery

Visual imagery is often used in literature to create vivid mental pictures that immerse readers in the story. Consider the following excerpt from “The Great Gatsby” by F. Scott Fitzgerald:

“In his blue gardens men and girls came and went like moths among the whisperings and the champagne and the stars.”

In this sentence, Fitzgerald uses visual imagery to describe Gatsby’s lavish parties. The readers can picture the men and girls moving around the blue gardens, sipping champagne under the starlit sky, much like moths fluttering around.

Everyday Life

Everyday life presents numerous instances of visual imagery. For example, when planning a trip, you might visualize the sights you’ll see, the hotel room you’ll stay in, or the food you’ll eat. Similarly, if you’re cooking a new recipe, you might imagine each step, visualizing how to chop the vegetables, stir the ingredients in the pan, or present the dish on a plate.

Visual artists use imagery as a crucial component of their work. For instance, painters visualize the final product before even touching the brush to the canvas. This mental image guides their hand movements and choice of colors as they bring their vision to life.

In sports, athletes often employ visual imagery to improve their performance. A basketball player might visualize making a successful free throw, picturing the trajectory of the ball, its arc towards the hoop, and the satisfying swish of the net. This mental practice can boost their confidence and improve their actual performance in the game.

In psychology, guided imagery therapy involves therapists directing patients to imagine a particular scene or scenario. For example, a person dealing with stress might be guided to picture a peaceful beach, feeling the warmth of the sun on their skin, hearing the gentle waves lapping at the shore, and smelling the salty sea air. Such visual imagery can promote relaxation and stress relief.

Visual Imagery is an innate capability that not only enables us to revisit the past and anticipate the future but also empowers us to conceptualize, learn, and even heal. Understanding its various forms and applications can provide us with an essential tool for enhancing various aspects of our lives.

  • Open access
  • Published: 19 July 2015

The role of visual representations in scientific practices: from conceptual understanding and knowledge generation to ‘seeing’ how science works

  • Maria Evagorou,
  • Sibel Erduran &
  • Terhi Mäntylä

International Journal of STEM Education, volume 2, Article number: 11 (2015)


The use of visual representations (i.e., photographs, diagrams, models) has long been part of science, and makes it possible for scientists to interact with and represent complex phenomena not observable in other ways. Despite a wealth of research in science education on visual representations, the emphasis of such research has mainly been on conceptual understanding when using visual representations and less on visual representations as epistemic objects. In this paper, we argue that by positioning visual representations as epistemic objects of scientific practices, science education can bring a renewed focus on how visualization contributes to knowledge formation in science from the learners’ perspective.

This is a theoretical paper, and in order to argue about the role of visualization, we first present a case study, that of the discovery of the structure of DNA, which highlights the epistemic components of visual information in science. The second case study focuses on Faraday’s use of the lines of magnetic force. Faraday is known for his exploratory, creative, and yet systematic way of experimenting, and the visual reasoning leading to theoretical development was an inherent part of that experimentation. Third, we trace a contemporary account from science focusing on experimental practices and how the reproducibility of experimental procedures can be reinforced through video data.

Conclusions

Our conclusions suggest that in teaching science, the emphasis in visualization should shift from cognitive understanding (using the products of science to understand the content) to engaging in the processes of visualization. Furthermore, we suggest that it is essential to design curriculum materials and learning environments that create a social and epistemic context and invite students to engage in the practice of visualization as evidence, reasoning, experimental procedure, or a means of communication, and to reflect on these practices. Implications for teacher education include the need for teacher professional development programs to problematize the use of visual representations as epistemic objects that are part of scientific practices.

During the last decades, research and reform documents in science education across the world have been calling for an emphasis not only on the content but also on the processes of science (Bybee 2014 ; Eurydice 2012 ; Duschl and Bybee 2014 ; Osborne 2014 ; Schwartz et al. 2012 ), in order to make science accessible to the students and enable them to understand the epistemic foundation of science. Scientific practices, part of the process of science, are the cognitive and discursive activities that are targeted in science education to develop epistemic understanding and appreciation of the nature of science (Duschl et al. 2008 ) and have been the emphasis of recent reform documents in science education across the world (Achieve 2013 ; Eurydice 2012 ). With the term scientific practices, we refer to the processes that take place during scientific discoveries and include among others: asking questions, developing and using models, engaging in arguments, and constructing and communicating explanations (National Research Council 2012 ). The emphasis on scientific practices aims to move the teaching of science from knowledge to the understanding of the processes and the epistemic aspects of science. Additionally, by placing an emphasis on engaging students in scientific practices, we aim to help students acquire scientific knowledge in meaningful contexts that resemble the reality of scientific discoveries.

Despite a wealth of research in science education on visual representations, the emphasis of such research has mainly been on the conceptual understanding when using visual representations and less on visual representations as epistemic objects. In this paper, we argue that by positioning visual representations as epistemic objects, science education can bring a renewed focus on how visualization contributes to knowledge formation in science from the learners’ perspective. Specifically, the use of visual representations (i.e., photographs, diagrams, tables, charts) has been part of science and over the years has evolved with the new technologies (i.e., from drawings to advanced digital images and three dimensional models). Visualization makes it possible for scientists to interact with complex phenomena (Richards 2003 ), and they might convey important evidence not observable in other ways. Visual representations as a tool to support cognitive understanding in science have been studied extensively (i.e., Gilbert 2010 ; Wu and Shah 2004 ). Studies in science education have explored the use of images in science textbooks (i.e., Dimopoulos et al. 2003 ; Bungum 2008 ), students’ representations or models when doing science (i.e., Gilbert et al. 2008 ; Dori et al. 2003 ; Lehrer and Schauble 2012 ; Schwarz et al. 2009 ), and students’ images of science and scientists (i.e., Chambers 1983 ). Therefore, studies in the field of science education have been using the term visualization as “the formation of an internal representation from an external representation” (Gilbert et al. 2008 , p. 4) or as a tool for conceptual understanding for students.

In this paper, we do not refer to visualization as mental image, model, or presentation only (Gilbert et al. 2008; Philips et al. 2010) but instead focus on visual representations, or visualization, as epistemic objects. Specifically, we refer to visualization as a process for knowledge production and growth in science. In this respect, modeling is an aspect of visualization, but our focus is not on the use of the model as a tool for cognitive understanding (Gilbert 2010; Wu and Shah 2004) but on the process of modeling as a scientific practice, which includes the construction and use of models, the use of other representations, communication within groups with the use of the visual representation, and an appreciation of the difficulties that scientists face in this process. Therefore, the purpose of this paper is to present, through the history of science, how visualization can be considered not only as a cognitive tool in science education but also as an epistemic object that can potentially support students in understanding aspects of the nature of science.

Scientific practices and science education

According to the Next Generation Science Standards (Achieve 2013), scientific practices refer to: asking questions and defining problems; developing and using models; planning and carrying out investigations; analyzing and interpreting data; using mathematical and computational thinking; constructing explanations and designing solutions; engaging in argument from evidence; and obtaining, evaluating, and communicating information. A significant aspect of scientific practices is that science learning is more than just learning facts, concepts, theories, and laws. A fuller appreciation of science necessitates understanding science relative to its epistemological grounding and the processes that are involved in the production of knowledge (Hogan and Maglienti 2001; Wickman 2004).

The Next Generation Science Standards are, among other changes, shifting away from science inquiry and towards the inclusion of scientific practices (Duschl and Bybee 2014; Osborne 2014). Comparing the abilities to do scientific inquiry (National Research Council 2000) with the set of scientific practices makes it evident that the latter is about engaging in the processes of doing science and thereby experiencing science in a more authentic way. Engaging in scientific practices, according to Osborne (2014), “presents a more authentic picture of the endeavor that is science” (p. 183) and also helps students develop a deeper understanding of the epistemic aspects of science. Furthermore, as Bybee (2014) argues, by engaging students in scientific practices, we involve them in an understanding of the nature of science and of the nature of scientific knowledge.

The notion of science as a practice, and the term scientific practices, emerged from the philosopher of science Kuhn (Osborne 2014) and refers to the processes in which scientists engage during knowledge production and communication. Subsequent work by historians, philosophers, and sociologists of science (Latour 2011; Longino 2002; Nersessian 2008) revealed the scientific practices in which scientists engage, including, among others, theory development and specific ways of talking, modeling, and communicating the outcomes of science.

Visualization as an epistemic object

Schematic, pictorial symbols in the design of scientific instruments, and the analysis of the perceptual and functional information stored in such images, have been areas of investigation in the philosophy of scientific experimentation (Gooding et al. 1993). The nature of visual perception, the relationship between thought and vision, and the role of reproducibility as a norm for experimental research form a central aspect of this domain of research in philosophy of science. For instance, Rothbart (1997) has argued that visualizations are commonplace in the theoretical sciences even if not every scientific theory is defined by visualized models.

Visual representations (i.e., photographs, diagrams, tables, charts, models) have been used in science over the years to enable scientists to interact with complex phenomena (Richards 2003) and may convey important evidence not observable in other ways (Barber et al. 2006). Some authors (e.g., Ruivenkamp and Rip 2010) have argued that visualization is a core activity of some scientific communities of practice (e.g., nanotechnology), while others (e.g., Lynch and Edgerton 1988) have differentiated the roles of particular visualization techniques (e.g., digital image processing in astronomy). Visualization in science includes the complex process through which scientists develop or produce imagery, schemes, and graphical representations; therefore, what is of importance in this process is not only the result but also the methodology employed by the scientists, namely, how this result was produced. Visual representations in science may refer to objects that are believed to have some kind of material or physical existence but equally might refer to purely mental, conceptual, and abstract constructs (Pauwels 2006). More specifically, visual representations can be found for: (a) phenomena that are not observable with the eye (i.e., microscopic or macroscopic); (b) phenomena that do not exist as visual representations but can be translated as such (i.e., sound); and (c) experimental settings in which they provide visual data representations (i.e., graphs presenting the velocity of moving objects). Additionally, since science is not only about replicating reality but also about making it more understandable to people (either the public or other scientists), visual representations are not only about reproducing nature but also about: (a) helping to solve a problem, (b) filling gaps in our knowledge, and (c) facilitating knowledge building or transfer (Lynch 2006).

Using or developing visual representations in scientific practice can range from a straightforward to a complicated situation. More specifically, scientists can observe a phenomenon (i.e., mitosis) and represent it visually using a picture or diagram, which is quite straightforward. But they can also use a variety of complicated techniques (i.e., crystallography in the case of DNA studies) that are either available or need to be developed or refined in order to acquire the visual information that can be used in the process of theory development (i.e., Latour and Woolgar 1979). Furthermore, some visual representations need decoding, and scientists need to learn how to read these images (i.e., radiologists); therefore, using visual representations in the process of science requires learning a new language specific to the medium and methods used (i.e., understanding an X-ray picture is different from understanding an MRI scan) and then communicating that language to other scientists and the public.

Visual representations serve many intents and purposes in scientific practices: for example, to make a diagnosis; to compare, describe, and preserve for future study; to verify and explore new territory; to generate new data (Pauwels 2006); or to present new methodologies. According to Latour and Woolgar (1979) and Knorr Cetina (1999), visual representations can be used as primary data (i.e., an image from a microscope), to help in concept development (i.e., the models of DNA used by Watson and Crick), to uncover relationships, and to make the abstract more concrete (i.e., graphs of sound waves). Therefore, visual representations and visual practices, in all forms, are an important aspect of scientific practices in developing, clarifying, and transmitting scientific knowledge (Pauwels 2006).

Methods and results: merging visualization and scientific practices in science

In this paper, we present three case studies that embody the working practices of scientists, in an effort to present visualization as a scientific practice and to argue that visualization is a complex process that may include, among other things, modeling and the use of representations, but is not limited to these. The first case study explores the role of visualization in the construction of knowledge about the structure of DNA, using visuals as evidence. The second case study focuses on Faraday’s use of the lines of magnetic force and the visual reasoning leading to the theoretical development that was an inherent part of the experimentation. The third case study focuses on the current practices of scientists in the context of a peer-reviewed journal, the Journal of Visualized Experiments, in which methodology is communicated through videotaped procedures. The three case studies represent the research interests of the three authors of this paper and were chosen to present how visualization as a practice can be involved in all stages of doing science, from hypothesizing and evaluating evidence (case study 1) to experimenting and reasoning (case study 2) to communicating the findings and methodology to the research community (case study 3), and in this way they represent the three functions of visualization as presented by Lynch (2006). Furthermore, the last case study showcases how the development of visualization technologies has contributed to the communication of findings and methodologies in science, presenting in that way an aspect of current scientific practices. In all three cases, our approach is guided by the observation that visual information is an integral part of scientific practices, and often a central one.

Case study 1: using visual representations as evidence in the discovery of DNA

The focus of the first case study is the discovery of the structure of DNA. DNA was first isolated in 1869 by Friedrich Miescher, and by the late 1940s it was known to contain phosphate, sugar, and four nitrogen-containing chemical bases. However, no one had figured out the structure of DNA until Watson and Crick presented their model in 1953. Beyond the social aspects of the discovery of DNA, another important aspect was the role of visual evidence that led to knowledge development in the area. More specifically, by studying the personal accounts of Watson (1968) and Crick (1988) about the discovery of the structure of DNA, the following main ideas regarding the role of visual representations in the production of knowledge can be identified: (a) the use of visual representations was an important part of knowledge growth and was often dependent upon the discovery of new technologies (i.e., better microscopes or better crystallography techniques that would provide better visual representations as evidence of the helical structure of DNA); and (b) three-dimensional models were used as a way to represent the visual images (X-ray images) and connect them to the evidence provided by other sources to see whether the theory could be supported. Therefore, the model of DNA was built on the combination of visual evidence and experimental data.

An example showcasing the importance of visual representations in the process of knowledge production in this case is provided by Watson, in his book The Double Helix (1968):

…since the middle of the summer Rosy [Rosalind Franklin] had had evidence for a new three-dimensional form of DNA. It occurred when the DNA molecules were surrounded by a large amount of water. When I asked what the pattern was like, Maurice went into the adjacent room to pick up a print of the new form they called the “B” structure. The instant I saw the picture, my mouth fell open and my pulse began to race. The pattern was unbelievably simpler than those previously obtained (A form). Moreover, the black cross of reflections which dominated the picture could arise only from a helical structure. With the A form the argument for the helix was never straightforward, and considerable ambiguity existed as to exactly which type of helical symmetry was present. With the B form however, mere inspection of its X-ray picture gave several of the vital helical parameters. (p. 167-169)

As suggested by Watson’s personal account of the discovery of the DNA, the photo taken by Rosalind Franklin (Fig.  1 ) convinced him that the DNA molecule must consist of two chains arranged in a paired helix, which resembles a spiral staircase or ladder, and on March 7, 1953, Watson and Crick finished and presented their model of the structure of DNA (Watson and Berry 2004 ; Watson 1968 ) which was based on the visual information provided by the X-ray image and their knowledge of chemistry.

Fig. 1: X-ray crystallography of DNA

In analyzing the visualization practice in this case study, we observe the following instances that highlight how the visual information played a role:

Asking questions and defining problems: The real world in the model of science can at some points only be observed through visual representations; in the case of DNA, the structure was observable only through the crystallography images produced by Rosalind Franklin in the laboratory. There was no other way to observe the structure of DNA, and therefore no other way to access this part of the real world.

Analyzing and interpreting data: The images that resulted from crystallography as well as their interpretations served as the data for the scientists studying the structure of DNA.

Experimenting: The data in the form of visual information were used to predict the possible structure of the DNA.

Modeling: Based on the prediction, an actual three-dimensional model was prepared by Watson and Crick. The first model did not fit the real world (it was refuted by Rosalind Franklin and her research group at King’s College), and Watson and Crick had to go through the same process again to find better visual evidence (better crystallography images) and create an improved visual model.

Example excerpts from Watson’s personal account provide further evidence of how visualization practices were applied in the context of the discovery of DNA (Table 1).

In summary, by examining the history of the discovery of DNA, we showcased how visual data are used as scientific evidence in science, identifying an aspect of the nature of science that is still underexplored in the history of science and that has been ignored in the teaching of science. Visual representations are used in many ways: as images, as models, as evidence to support or rebut a model, and as interpretations of reality.

Case study 2: applying visual reasoning in knowledge production, the example of the lines of magnetic force

The focus of this case study is Faraday’s use of the lines of magnetic force. Faraday is known for his exploratory, creative, and yet systematic way of experimenting, and the visual reasoning leading to theoretical development was an inherent part of this experimentation (Gooding 2006). Faraday’s articles and notebooks do not include mathematical formulations; instead, they include images and illustrations, from experimental devices and setups to recapitulations of his theoretical ideas (Nersessian 2008). According to Gooding (2006), “Faraday’s visual method was designed not to copy apparent features of the world, but to analyse and replicate them” (2006, p. 46).

The lines of force played a central role in Faraday’s research on electricity and magnetism and in the development of his “field theory” (Faraday 1852a; Nersessian 1984). Before Faraday, experiments with iron filings around magnets were known, and the term “magnetic curves” was used both for the iron filing patterns and for the geometrical constructs derived from the mathematical theory of magnetism (Gooding et al. 1993). However, Faraday used the lines of force to explain his experimental observations and to construct the theory of forces in magnetism and electricity. Examples of Faraday’s different illustrations of lines of magnetic force are given in Fig. 2. Faraday gave the following experiment-based definition of the lines of magnetic force:

Fig. 2: a Iron filing pattern around a bar magnet drawn by Faraday (Faraday 1852b, Plate IX, p. 158, Fig. 1); b Faraday’s drawing of lines of magnetic force around a cylinder magnet, where the experimental procedure (a knife blade showing the direction of the lines) is combined into the drawing (Faraday 1855, vol. 1, plate 1)

A line of magnetic force may be defined as that line which is described by a very small magnetic needle, when it is so moved in either direction correspondent to its length, that the needle is constantly a tangent to the line of motion; or it is that line along which, if a transverse wire be moved in either direction, there is no tendency to the formation of any current in the wire, whilst if moved in any other direction there is such a tendency; or it is that line which coincides with the direction of the magnecrystallic axis of a crystal of bismuth, which is carried in either direction along it. The direction of these lines about and amongst magnets and electric currents, is easily represented and understood, in a general manner, by the ordinary use of iron filings. (Faraday 1852a , p. 25 (3071))
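Although Faraday himself worked without equations, his first criterion can be restated in modern vector notation, offered here only as a clarifying gloss and not as Faraday's own formalism: a line of force is a curve that is everywhere tangent to the magnetic field.

```latex
% Modern restatement (not Faraday's notation): a field line \mathbf{r}(s)
% is a curve whose tangent is everywhere parallel to the field \mathbf{B}:
\frac{d\mathbf{r}}{ds} \parallel \mathbf{B}(\mathbf{r}),
\qquad \text{equivalently} \qquad
\frac{dx}{B_x} = \frac{dy}{B_y} = \frac{dz}{B_z}.
```

This makes precise why the small needle in Faraday’s definition traces a line of force: the needle aligns with the field at every point, so its successive positions follow a curve tangent to the field. The later textbook convention of connecting the number of drawn lines to the quantity of flux, discussed further below, builds on this same geometric reading.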

The definition describes the connection between the experiments and the visual representation of the results. Initially, the lines of force were just geometric representations, but later, Faraday treated them as physical objects (Nersessian 1984 ; Pocovi and Finlay 2002 ):

I have sometimes used the term lines of force so vaguely, as to leave the reader doubtful whether I intended it as a merely representative idea of the forces, or as the description of the path along which the power was continuously exerted. … wherever the expression line of force is taken simply to represent the disposition of forces, it shall have the fullness of that meaning; but that wherever it may seem to represent the idea of the physical mode of transmission of the force, it expresses in that respect the opinion to which I incline at present. The opinion may be erroneous, and yet all that relates or refers to the disposition of the force will remain the same. (Faraday, 1852a , p. 55-56 (3075))

He also felt that the lines of force had greater explanatory power than the dominant theory of action-at-a-distance:

Now it appears to me that these lines may be employed with great advantage to represent nature, condition, direction and comparative amount of the magnetic forces; and that in many cases they have, to the physical reasoner at least, a superiority over that method which represents the forces as concentrated in centres of action… (Faraday, 1852a, p. 26 (3074))

To give some insight into Faraday’s visual reasoning as an epistemic practice, the following examples from Faraday’s studies of the lines of magnetic force (Faraday 1852a, 1852b) are presented:

(a) Asking questions and defining problems: The iron filing patterns formed the empirical basis for the visual model, a 2D visualization of the lines of magnetic force as presented in Fig. 2. According to Faraday, these iron filing patterns were suitable for illustrating the direction and form of the magnetic lines of force (emphasis added):

It must be well understood that these forms give no indication by their appearance of the relative strength of the magnetic force at different places, inasmuch as the appearance of the lines depends greatly upon the quantity of filings and the amount of tapping; but the direction and forms of these lines are well given, and these indicate, in a considerable degree, the direction in which the forces increase and diminish . (Faraday 1852b , p.158 (3237))

Despite being static and two-dimensional on paper, the lines of magnetic force were dynamical (Nersessian 1992, 2008) and three-dimensional for Faraday (see Fig. 2b). For instance, Faraday described the lines of force as “expanding”, “bending,” and “being cut” (Nersessian 1992). In Fig. 2b, Faraday summarized his experiment (bar magnet and knife blade) and its results (lines of force) in one picture.

(b) Analyzing and interpreting data: The model was so powerful for Faraday that he ended up thinking of the lines as physical objects (e.g., Nersessian 1984), i.e., making interpretations of the way the forces act. He carried out many experiments attempting to show the physical existence of the lines of force, but he did not succeed (Nersessian 1984). The following quote illuminates Faraday’s use of the lines of force in different situations:

The study of these lines has, at different times, been greatly influential in leading me to various results, which I think prove their utility as well as fertility. Thus, the law of magneto-electric induction; the earth’s inductive action; the relation of magnetism and light; diamagnetic action and its law, and magnetocrystallic action, are the cases of this kind… (Faraday 1852a , p. 55 (3174))

(c) Experimenting: Faraday relied heavily on exploratory experiments; in the case of the lines of magnetic force he used, e.g., iron filings, magnetic needles, or current-carrying wires (see the quote above). The magnetic field is not directly observable, and the representation of the lines of force was a visual model capturing the direction, form, and magnitude of the field.

(d) Modeling: There is no denying that the lines of magnetic force are visual by nature. Faraday’s views of the lines of force developed gradually over the years, and he applied and refined them in different contexts such as electromagnetic, electrostatic, and magnetic induction (Nersessian 1984). An example of Faraday’s explanation of the effect of the position of wire b’ on the experiment is given in Fig. 3, in which a few magnetic lines of force are drawn; in the quote below, Faraday explains the effect using these magnetic lines of force (emphasis added):

Fig. 3: Picture of an experiment with different arrangements of wires (a, b’, b”), magnet, and galvanometer. Note the lines of force drawn around the magnet. (Faraday 1852a, p. 34)

It will be evident by inspection of Fig. 3 , that, however the wires are carried away, the general result will, according to the assumed principles of action, be the same; for if a be the axial wire, and b’, b”, b”’ the equatorial wire, represented in three different positions, whatever magnetic lines of force pass across the latter wire in one position, will also pass it in the other, or in any other position which can be given to it. The distance of the wire at the place of intersection with the lines of force, has been shown, by the experiments (3093.), to be unimportant. (Faraday 1852a , p. 34 (3099))

In summary, by examining the history of Faraday’s use of the lines of force, we showed how visual imagery and reasoning played an important part in Faraday’s construction and representation of his “field theory”. As Gooding has stated, “many of Faraday’s sketches are far more than depictions of observation, they are tools for reasoning with and about phenomena” (2006, p. 59).

Case study 3: visualizing scientific methods, the case of a journal

The focus of the third case study is the Journal of Visualized Experiments (JoVE), a peer-reviewed publication indexed in PubMed and devoted to the publication of biological, medical, chemical, and physical research in a video format. The journal describes its history as follows:

JoVE was established as a new tool in life science publication and communication, with participation of scientists from leading research institutions. JoVE takes advantage of video technology to capture and transmit the multiple facets and intricacies of life science research. Visualization greatly facilitates the understanding and efficient reproduction of both basic and complex experimental techniques, thereby addressing two of the biggest challenges faced by today's life science research community: i) low transparency and poor reproducibility of biological experiments and ii) time and labor-intensive nature of learning new experimental techniques. ( http://www.jove.com/ )

By examining the journal content, we generate a set of categories that can be considered as indicators of relevance and significance in terms of epistemic practices of science that have relevance for science education. For example, the quote above illustrates how scientists view some norms of scientific practice including the norms of “transparency” and “reproducibility” of experimental methods and results, and how the visual format of the journal facilitates the implementation of these norms. “Reproducibility” can be considered as an epistemic criterion that sits at the heart of what counts as an experimental procedure in science:

Investigating what should be reproducible and by whom leads to different types of experimental reproducibility, which can be observed to play different roles in experimental practice. A successful application of the strategy of reproducing an experiment is an achievement that may depend on certain idiosyncratic aspects of a local situation. Yet a purely local experiment that cannot be carried out by other experimenters and in other experimental contexts will, in the end, be unproductive in science. (Sarkar and Pfeifer 2006, p. 270)

We now turn to an article on “Elevated Plus Maze for Mice” that is available for free on the journal website ( http://www.jove.com/video/1088/elevated-plus-maze-for-mice ). The purpose of this experiment was to investigate anxiety levels in mice through behavioral analysis. The journal article consists of a 9-min video accompanied by text. The video illustrates the handling of the mice in a soundproof, dimly lit location, worksheets with characteristics of the mice, the computer software, apparatus, and resources, the setting up of the computer software, and the video recording of mouse behavior on the computer. The authors describe the apparatus used in the experiment and state how procedural differences between research groups lead to difficulties in the interpretation of results:

The apparatus consists of open arms and closed arms, crossed in the middle perpendicularly to each other, and a center area. Mice are given access to all of the arms and are allowed to move freely between them. The number of entries into the open arms and the time spent in the open arms are used as indices of open space-induced anxiety in mice. Unfortunately, the procedural differences that exist between laboratories make it difficult to duplicate and compare results among laboratories.

The authors’ emphasis on the particularity of procedural context echoes in the observations of some philosophers of science:

It is not just the knowledge of experimental objects and phenomena but also their actual existence and occurrence that prove to be dependent on specific, productive interventions by the experimenters. (Sarkar and Pfeifer 2006, pp. 270-271)

The inclusion of a video of the experimental procedure specifies what the apparatus looks like (Fig. 4) and how the behavior of the mice is captured through video recording that feeds into a computer (Fig. 5). Subsequently, computer software captures different variables such as the distance traveled, the number of entries, and the time spent on each arm of the apparatus. Here, there is visual information at different levels of representation, ranging from reconfigurations of the raw video data to representations that analyze the data around the variables in question (Fig. 6). This practice of layered visual representation is not particular to the biological sciences. For instance, it is commonplace in nanotechnological practices:

Fig. 4: Visual illustration of the apparatus

Fig. 5: Video processing of the experimental set-up

Fig. 6: Computer software for video input and variable recording

In the visualization processes, instruments are needed that can register the nanoscale and provide raw data, which needs to be transformed into images. Some Imaging Techniques have software incorporated already where this transformation automatically takes place, providing raw images. Raw data must be translated through the use of Graphic Software and software is also used for the further manipulation of images to highlight what is of interest to capture the (inferred) phenomena -- and to capture the reader. There are two levels of choice: Scientists have to choose which imaging technique and embedded software to use for the job at hand, and they will then have to follow the structure of the software. Within such software, there are explicit choices for the scientists, e.g. about colour coding, and ways of sharpening images. (Ruivenkamp and Rip 2010 , pp.14–15)
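To make the two levels of choice described in the quote concrete, here is a minimal, purely illustrative Python sketch. The raw array, the colormap, and the sharpening parameters are all assumptions for the sake of the example; real instrument software embeds these choices in vendor pipelines that are not reproduced here.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter

# Stand-in "raw instrument data": a smooth random 2D field (illustrative only).
rng = np.random.default_rng(seed=0)
raw = gaussian_filter(rng.normal(size=(128, 128)), sigma=4)

# Choice 1: colour coding. Normalising and picking a colormap turns
# scalar measurements into a picture.
normed = (raw - raw.min()) / (raw.max() - raw.min())

# Choice 2: sharpening. A simple unsharp mask exaggerates edges to
# highlight what is of interest in the (inferred) phenomena.
blurred = gaussian_filter(normed, sigma=2)
sharpened = np.clip(normed + 1.5 * (normed - blurred), 0.0, 1.0)

plt.imshow(sharpened, cmap="viridis")  # the colormap itself is a choice
plt.colorbar(label="normalised signal")
plt.show()
```

Every step (the normalisation, the colormap, and the strength of the unsharp mask) changes what a viewer will “see” in the image, which is exactly the epistemic point the quote makes about explicit choices within imaging software.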

In the text that accompanies the video, the authors highlight the role of visualization in their experiment:

Visualization of the protocol will promote better understanding of the details of the entire experimental procedure, allowing for standardization of the protocols used in different laboratories and comparisons of the behavioral phenotypes of various strains of mutant mice assessed using this test.

The software that takes the video data and transforms it into various representations allows the researchers to collect data on mouse behavior more reliably. For instance, the distance traveled across the arms of the apparatus, or the time spent on each arm, would otherwise have been difficult to observe and record precisely. A further aspect to note is how the visualization of the experiment facilitates the control of bias: the authors illustrate how olfactory bias between experimental procedures carried out on mice in sequence is avoided by cleaning the equipment.
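As an illustration of the kind of variable extraction described above, the sketch below computes open-arm time, open-arm entries, and distance traveled from hypothetical tracked (x, y) coordinates. The maze geometry, frame rate, and function names are assumptions made for the example; they are not details of the JoVE protocol or its software.

```python
import numpy as np

FPS = 30  # assumed video frame rate (frames per second)

def in_open_arm(x, y, arm_half_width=2.5, arm_length=30.0):
    """True if (x, y) lies on an open arm of a plus maze centred at the
    origin, with open arms along the x-axis (illustrative dimensions, cm)."""
    return abs(y) <= arm_half_width and arm_half_width < abs(x) <= arm_length

def open_arm_indices(track):
    """Compute anxiety-related indices from an (N, 2) array of positions,
    one row per video frame."""
    track = np.asarray(track, dtype=float)
    flags = np.array([in_open_arm(x, y) for x, y in track])
    time_open = flags.sum() / FPS                        # seconds on open arms
    entries = np.count_nonzero(~flags[:-1] & flags[1:])  # closed/center -> open
    distance = np.linalg.norm(np.diff(track, axis=0), axis=1).sum()
    return time_open, entries, distance
```

Automating these counts is part of what makes the measures reproducible across laboratories: two groups running the same analysis on the same video should extract the same numbers.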

Our discussion highlights the role of visualization in science, particularly with respect to presenting visualization as part of scientific practices. We have used case studies from the history of science, highlighting scientists’ accounts of how visualization played a role in the discovery of DNA and of the magnetic field, together with a contemporary illustration of a science journal’s practice of incorporating visualization as a way to communicate new findings and methodologies. Our aim in drawing on these case studies was to align science education with scientific practices, particularly in terms of how visual representations, static or dynamic, can engage students in the processes of science and not only be used as tools for cognitive development in science. Our approach was guided by the notion of “knowledge-as-practice” advanced by Knorr Cetina (1999), who studied scientists and characterized their knowledge as practice, a characterization which shifts the focus away from ideas inside scientists’ minds to practices that are cultural and deeply contextualized within fields of science. She suggests that people working together can be examined as epistemic cultures whose collective knowledge exists as practice.

It is important to stress, however, that visual representations are not used in isolation but are supported by other types of evidence, or by other theories (i.e., in order to understand the helical form and structure of DNA, knowledge of chemistry was also needed). More importantly, this finding also has implications for teaching science as argument (e.g., Erduran and Jimenez-Aleixandre 2008), since the verbal evidence used in the science classroom to maintain an argument could be supported by visual evidence (a model, representation, image, graph, etc.). For example, in a group of students discussing the outcomes of an introduced species in an ecosystem, pictures of the species and the ecosystem over time, and videos showing the changes in the ecosystem and the special characteristics of the different species, could serve as visual evidence to help the students support their arguments (Evagorou et al. 2012). Therefore, an important implication for the teaching of science is the use of visual representations as evidence in the science curriculum as part of knowledge production. Even though studies in science education have focused on the use of models and modeling as a way to support students in learning science (Dori et al. 2003; Lehrer and Schauble 2012; Mendonça and Justi 2013; Papaevripidou et al. 2007) or on the use of images (i.e., Korfiatis et al. 2003), with the term using visuals as evidence we refer to the collection of all forms of visuals and the processes involved.

Another aspect identified through the case studies is that of visual reasoning (an integral part of Faraday’s investigations). Both verbalization and visualization were part of the process of generating new knowledge (Gooding 2006). Even today, most textbooks use the lines of force (or just field lines) as a geometrical representation of the field, with the number of field lines connected to the quantity of flux. Often, textbooks use the same kind of visual imagery as that used by scientists. However, when using images, only certain aspects or features of the phenomena or data are captured or highlighted, often in tacit ways. Especially in textbooks, the process of producing the image is not presented; only the product, the image, is left. This can easily lead to the idea that images (i.e., photos, graphs, visual models) are just representations of knowledge and, in the worst case, to misinterpreted representations of knowledge, as the results of Pocovi and Finlay (2002) on electric field lines show. To avoid this, teachers should be able to explain how the images are produced (what features of the phenomena or data the image captures, on what grounds those features were chosen, and what features are omitted); in this way, the role of visualization in knowledge production can be made “visible” to students by engaging them in the process of visualization.

The implications of these norms for science teaching and learning are numerous. Classroom contexts can model the generation, sharing, and evaluation of evidence and of experimental procedures carried out by students, thereby promoting not only some contemporary cultural norms of scientific practice but also enabling the learning of the criteria, standards, and heuristics that scientists use in making decisions about scientific methods. As we have demonstrated with the three case studies, visual representations are part of the process of knowledge growth and communication in science, as shown by two examples from the history of science and an example from current scientific practices. Additionally, visual information, especially with the use of technology, is part of students’ everyday lives. Therefore, we suggest making use of students’ knowledge and technological skills (i.e., how to produce their own videos showing their experimental method, or how to identify or provide appropriate visual evidence for a given topic) in order to teach them the aspects of the nature of science that are often neglected both in the history of science and in the design of curricula. Specifically, what we suggest in this paper is that students should actively engage in visualization processes in order to appreciate the diverse nature of doing science and engage in authentic scientific practices.

However, as a word of caution, we need to distinguish the products and processes involved in visualization practices in science:

If one considers scientific representations and the ways in which they can foster or thwart our understanding, it is clear that a mere object approach, which would devote all attention to the representation as a free-standing product of scientific labor, is inadequate. What is needed is a process approach: each visual representation should be linked with its context of production (Pauwels 2006 , p.21).

The aforementioned suggests that the emphasis in visualization should shift from cognitive understanding—using the products of science to understand the content—to engaging in the processes of visualization. Therefore, an implication for the teaching of science includes designing curriculum materials and learning environments that create a social and epistemic context and invite students to engage in the practice of visualization as evidence, reasoning, experimental procedure, or a means of communication (as presented in the three case studies) and reflect on these practices (Ryu et al. 2015 ).

Finally, a question that arises from including visualization in science education, as well as from including scientific practices in science education, is whether teachers themselves are prepared to include them as part of their teaching (Bybee 2014). Teacher preparation programs and teacher education have been critiqued, studied, and rethought since the time they emerged (Cochran-Smith 2004). Despite the years of history in teacher training and teacher education, the debate about initial teacher training and its content still persists in our community and in policy circles (Cochran-Smith 2004; Conway et al. 2009). In recent decades, the debate has shifted from a behavioral view of learning and teaching to a focus on learning, attending not only to teachers’ knowledge, skills, and beliefs but also to how, and whether, pupils learn (Cochran-Smith 2004). The Science Education in Europe report recommended that “Good quality teachers, with up-to-date knowledge and skills, are the foundation of any system of formal science education” (Osborne and Dillon 2008, p. 9).

However, questions such as what should be the emphasis on pre-service and in-service science teacher training, especially with the new emphasis on scientific practices, still remain unanswered. As Bybee ( 2014 ) argues, starting from the new emphasis on scientific practices in the NGSS, we should consider teacher preparation programs “that would provide undergraduates opportunities to learn the science content and practices in contexts that would be aligned with their future work as teachers” (p.218). Therefore, engaging pre- and in-service teachers in visualization as a scientific practice should be one of the purposes of teacher preparation programs.

Achieve. (2013). The next generation science standards (pp. 1–3). Retrieved from http://www.nextgenscience.org/ .


Barber, J, Pearson, D, & Cervetti, G. (2006). Seeds of science/roots of reading . California: The Regents of the University of California.

Bungum, B. (2008). Images of physics: an explorative study of the changing character of visual images in Norwegian physics textbooks. NorDiNa, 4 (2), 132–141.

Bybee, RW. (2014). NGSS and the next generation of science teachers. Journal of Science Teacher Education, 25 (2), 211–221. doi: 10.1007/s10972-014-9381-4 .


Chambers, D. (1983). Stereotypic images of the scientist: the draw-a-scientist test. Science Education, 67 (2), 255–265.

Cochran-Smith, M. (2004). The problem of teacher education. Journal of Teacher Education, 55 (4), 295–299. doi: 10.1177/0022487104268057 .

Conway, PF, Murphy, R, & Rath, A. (2009). Learning to teach and its implications for the continuum of teacher education: a nine-country cross-national study .

Crick, F. (1988). What a mad pursuit . USA: Basic Books.

Dimopoulos, K, Koulaidis, V, & Sklaveniti, S. (2003). Towards an analysis of visual images in school science textbooks and press articles about science and technology. Research in Science Education, 33 , 189–216.

Dori, YJ, Tal, RT, & Tsaushu, M. (2003). Teaching biotechnology through case studies—can we improve higher order thinking skills of nonscience majors? Science Education, 87 (6), 767–793. doi: 10.1002/sce.10081 .

Duschl, RA, & Bybee, RW. (2014). Planning and carrying out investigations: an entry to learning and to teacher professional development around NGSS science and engineering practices. International Journal of STEM Education, 1 (1), 12. doi: 10.1186/s40594-014-0012-6 .

Duschl, R., Schweingruber, H. A., & Shouse, A. (2008). Taking science to school . Washington DC: National Academies Press.

Erduran, S, & Jimenez-Aleixandre, MP (Eds.). (2008). Argumentation in science education: perspectives from classroom-based research . Dordrecht: Springer.

Eurydice. (2012). Developing key competencies at school in Europe: challenges and opportunities for policy – 2011/12 (pp. 1–72).

Evagorou, M, Jimenez-Aleixandre, MP, & Osborne, J. (2012). “Should we kill the grey squirrels?” A study exploring students’ justifications and decision-making. International Journal of Science Education, 34 (3), 401–428. doi: 10.1080/09500693.2011.619211 .

Faraday, M. (1852a). Experimental researches in electricity. – Twenty-eighth series. Philosophical Transactions of the Royal Society of London, 142 , 25–56.

Faraday, M. (1852b). Experimental researches in electricity. – Twenty-ninth series. Philosophical Transactions of the Royal Society of London, 142 , 137–159.

Gilbert, JK. (2010). The role of visual representations in the learning and teaching of science: an introduction (pp. 1–19).

Gilbert, J., Reiner, M. & Nakhleh, M. (2008). Visualization: theory and practice in science education . Dordrecht, The Netherlands: Springer.

Gooding, D. (2006). From phenomenology to field theory: Faraday’s visual reasoning. Perspectives on Science, 14 (1), 40–65.

Gooding, D, Pinch, T, & Schaffer, S (Eds.). (1993). The uses of experiment: studies in the natural sciences . Cambridge: Cambridge University Press.

Hogan, K, & Maglienti, M. (2001). Comparing the epistemological underpinnings of students’ and scientists’ reasoning about conclusions. Journal of Research in Science Teaching, 38 (6), 663–687.

Knorr Cetina, K. (1999). Epistemic cultures: how the sciences make knowledge . Cambridge: Harvard University Press.

Korfiatis, KJ, Stamou, AG, & Paraskevopoulos, S. (2003). Images of nature in Greek primary school textbooks. Science Education, 88 (1), 72–89. doi: 10.1002/sce.10133 .

Latour, B. (2011). Visualisation and cognition: drawing things together (pp. 1–32).

Latour, B, & Woolgar, S. (1979). Laboratory life: the construction of scientific facts . Princeton: Princeton University Press.

Lehrer, R, & Schauble, L. (2012). Seeding evolutionary thinking by engaging children in modeling its foundations. Science Education, 96 (4), 701–724. doi: 10.1002/sce.20475 .

Longino, H. E. (2002). The fate of knowledge . Princeton: Princeton University Press.

Lynch, M. (2006). The production of scientific images: vision and re-vision in the history, philosophy, and sociology of science. In L Pauwels (Ed.), Visual cultures of science: rethinking representational practices in knowledge building and science communication (pp. 26–40). Lebanon, NH: Dartmouth College Press.

Lynch, M, & Edgerton, SY, Jr. (1988). Aesthetic and digital image processing: representational craft in contemporary astronomy. In G Fyfe & J Law (Eds.), Picturing power: visual depictions and social relations (pp. 184–220). London: Routledge.

Mendonça, PCC, & Justi, R. (2013). An instrument for analyzing arguments produced in modeling-based chemistry lessons. Journal of Research in Science Teaching, 51 (2), 192–218. doi: 10.1002/tea.21133 .

National Research Council (2000). Inquiry and the national science education standards . Washington DC: National Academies Press.

National Research Council (2012). A framework for K-12 science education . Washington DC: National Academies Press.

Nersessian, NJ. (1984). Faraday to Einstein: constructing meaning in scientific theories . Dordrecht: Martinus Nijhoff Publishers.


Nersessian, NJ. (1992). How do scientists think? Capturing the dynamics of conceptual change in science. In RN Giere (Ed.), Cognitive Models of Science (pp. 3–45). Minneapolis: University of Minnesota Press.

Nersessian, NJ. (2008). Creating scientific concepts . Cambridge: The MIT Press.

Osborne, J. (2014). Teaching scientific practices: meeting the challenge of change. Journal of Science Teacher Education, 25 (2), 177–196. doi: 10.1007/s10972-014-9384-1 .

Osborne, J. & Dillon, J. (2008). Science education in Europe: critical reflections . London: Nuffield Foundation.

Papaevripidou, M, Constantinou, CP, & Zacharia, ZC. (2007). Modeling complex marine ecosystems: an investigation of two teaching approaches with fifth graders. Journal of Computer Assisted Learning, 23 (2), 145–157. doi: 10.1111/j.1365-2729.2006.00217.x .

Pauwels, L. (2006). A theoretical framework for assessing visual representational practices in knowledge building and science communications. In L Pauwels (Ed.), Visual cultures of science: rethinking representational practices in knowledge building and science communication (pp. 1–25). Lebanon, NH: Dartmouth College Press.

Philips, L., Norris, S. & McNab, J. (2010). Visualization in mathematics, reading and science education . Dordrecht, The Netherlands: Springer.

Pocovi, MC, & Finlay, F. (2002). Lines of force: Faraday’s and students’ views. Science & Education, 11 , 459–474.

Richards, A. (2003). Argument and authority in the visual representations of science. Technical Communication Quarterly, 12 (2), 183–206. doi: 10.1207/s15427625tcq1202_3 .

Rothbart, D. (1997). Explaining the growth of scientific knowledge: metaphors, models and meaning . Lewiston, NY: Mellen Press.

Ruivenkamp, M, & Rip, A. (2010). Visualizing the invisible nanoscale study: visualization practices in nanotechnology community of practice. Science Studies, 23 (1), 3–36.

Ryu, S, Han, Y, & Paik, S-H. (2015). Understanding co-development of conceptual and epistemic understanding through modeling practices with mobile internet. Journal of Science Education and Technology, 24 (2-3), 330–355. doi: 10.1007/s10956-014-9545-1 .

Sarkar, S, & Pfeifer, J. (2006). The philosophy of science, chapter on experimentation (Vol. 1, A-M). New York: Taylor & Francis.

Schwartz, RS, Lederman, NG, & Abd-el-Khalick, F. (2012). A series of misrepresentations: a response to Allchin’s whole approach to assessing nature of science understandings. Science Education, 96 (4), 685–692. doi: 10.1002/sce.21013 .

Schwarz, CV, Reiser, BJ, Davis, EA, Kenyon, L, Achér, A, Fortus, D, et al. (2009). Developing a learning progression for scientific modeling: making scientific modeling accessible and meaningful for learners. Journal of Research in Science Teaching, 46 (6), 632–654. doi: 10.1002/tea.20311 .

Watson, J. (1968). The Double Helix: a personal account of the discovery of the structure of DNA . New York: Scribner.

Watson, J, & Berry, A. (2004). DNA: the secret of life . New York: Alfred A. Knopf.

Wickman, PO. (2004). The practical epistemologies of the classroom: a study of laboratory work. Science Education, 88 , 325–344.

Wu, HK, & Shah, P. (2004). Exploring visuospatial thinking in chemistry learning. Science Education, 88 (3), 465–492. doi: 10.1002/sce.10126 .


Acknowledgements

The authors would like to acknowledge all reviewers for their valuable comments that have helped us improve the manuscript.

Author information

Authors and affiliations

University of Nicosia, 46, Makedonitissa Avenue, Egkomi, 1700, Nicosia, Cyprus

Maria Evagorou

University of Limerick, Limerick, Ireland

Sibel Erduran

University of Tampere, Tampere, Finland

Terhi Mäntylä


Corresponding author

Correspondence to Maria Evagorou .

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

ME carried out the introductory literature review, the analysis of the first case study, and drafted the manuscript. SE carried out the analysis of the third case study and contributed towards the “Conclusions” section of the manuscript. TM carried out the second case study. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( https://creativecommons.org/licenses/by/4.0 ), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Evagorou, M., Erduran, S. & Mäntylä, T. The role of visual representations in scientific practices: from conceptual understanding and knowledge generation to ‘seeing’ how science works. IJ STEM Ed 2 , 11 (2015). https://doi.org/10.1186/s40594-015-0024-x

Received : 29 September 2014

Accepted : 16 May 2015

Published : 19 July 2015

Keywords:
  • Visual representations
  • Epistemic practices
  • Science learning


The Visual System

Learning Objectives

By the end of this section, you will be able to:

  • Describe the basic anatomy of the visual system
  • Discuss how rods and cones contribute to different aspects of vision
  • Describe how monocular and binocular cues are used in the perception of depth

The visual system constructs a mental representation of the world around us (Figure 5.11). This contributes to our ability to successfully navigate through physical space and interact with important individuals and objects in our environments. This section will provide an overview of the basic anatomy and function of the visual system. In addition, we will explore our ability to perceive color and depth.

[Figure 5.11: Photographs of people's eyes.]

The eye is the major sensory organ involved in  vision  (Figure 5.12). Light waves are transmitted across the cornea and enter the eye through the pupil. The  cornea  is the transparent covering over the eye. It serves as a barrier between the inner eye and the outside world, and it is involved in focusing light waves that enter the eye. The  pupil  is the small opening in the eye through which light passes, and the size of the pupil can change as a function of light levels as well as emotional arousal. When light levels are low, the pupil will become dilated, or expanded, to allow more light to enter the eye. When light levels are high, the pupil will constrict, or become smaller, to reduce the amount of light that enters the eye. The pupil’s size is controlled by muscles that are connected to the  iris , which is the colored portion of the eye.

[Figure 5.12: The parts of the eye. The cornea, pupil, iris, and lens are situated toward the front of the eye; the optic nerve, fovea, and retina are at the back.]

After passing through the pupil, light crosses the  lens , a curved, transparent structure that serves to provide additional focus. The lens is attached to muscles that can change its shape to aid in focusing light that is reflected from near or far objects. In a normal-sighted individual, the lens will focus images perfectly on a small indentation in the back of the eye known as the  fovea , which is part of the  retina , the light-sensitive lining of the eye. The fovea contains densely packed specialized photoreceptor cells (Figure 5.13). These  photoreceptor  cells, known as cones , are light-detecting cells. The  cones  are specialized types of photoreceptors that work best in bright light conditions. Cones are very sensitive to acute detail and provide tremendous spatial resolution. They also are directly involved in our ability to perceive color.

While cones are concentrated in the fovea, where images tend to be focused, rods, another type of photoreceptor, are located throughout the remainder of the retina.  Rods  are specialized photoreceptors that work well in low light conditions, and while they lack the spatial resolution and color function of the cones, they are involved in our vision in dimly lit environments as well as in our perception of movement on the periphery of our visual field.

[Figure 5.13: Light entering the eye passes the optic nerve and the ganglion cell layer before reaching the rods and cones.]

We have all experienced the different sensitivities of rods and cones when making the transition from a brightly lit environment to a dimly lit environment. Imagine going to see a blockbuster movie on a clear summer day. As you walk from the brightly lit lobby into the dark theater, you notice that you immediately have difficulty seeing much of anything. After a few minutes, you begin to adjust to the darkness and can see the interior of the theater. In the bright environment, your vision was dominated primarily by cone activity. As you move to the dark environment, rod activity dominates, but there is a delay in transitioning between the phases. If your rods do not transform light into nerve impulses as easily and efficiently as they should, you will have difficulty seeing in dim light, a condition known as night blindness.

Rods and cones are connected (via several interneurons) to retinal ganglion cells. Axons from the retinal ganglion cells converge and exit through the back of the eye to form the  optic nerve . The optic nerve carries visual information from the retina to the brain. There is a point in the visual field called the  blind spot : Even when light from a small object is focused on the blind spot, we do not see it. We are not consciously aware of our blind spots for two reasons: First, each eye gets a slightly different view of the visual field; therefore, the blind spots do not overlap. Second, our visual system fills in the blind spot so that although we cannot respond to visual information that occurs in that portion of the visual field, we are also not aware that information is missing.

The optic nerve from each eye merges just below the brain at a point called the  optic chiasm . As Figure 5.14 shows, the optic chiasm is an X-shaped structure that sits just below the cerebral cortex at the front of the brain. At the point of the optic chiasm , information from the right visual field (which comes from both eyes) is sent to the left side of the brain, and information from the left visual field is sent to the right side of the brain.

[Figure 5.14: The eyes, optic nerves, optic chiasm, and occipital lobe, shown in relation to their positions in the brain and head.]

Once inside the brain, visual information is sent via a number of structures to the occipital lobe at the back of the brain for processing. Visual information might be processed in parallel pathways, which can generally be described as the “what pathway” (ventral) and the “where/how pathway” (dorsal); see Figure 5.15. The “what pathway” is involved in object recognition and identification, while the “where/how pathway” is involved with location in space and how one might interact with a particular visual stimulus (Milner & Goodale, 2008; Ungerleider & Haxby, 1994). For example, when you see a ball rolling down the street, the “what pathway” identifies what the object is, and the “where/how pathway” identifies its location or movement in space.

WHAT DO YOU THINK? The Ethics of Research Using Animals

David Hubel and Torsten Wiesel were awarded the Nobel Prize in Medicine in 1981 for their research on the visual system. They collaborated for more than twenty years and made significant discoveries about the neurology of visual perception (Hubel & Wiesel, 1959, 1962, 1963, 1970; Wiesel & Hubel, 1963). They studied animals, mostly cats and monkeys. Although they used several techniques, they did considerable single unit recordings, during which tiny electrodes were inserted in the animal’s brain to determine when a single cell was activated. Among their many discoveries, they found that specific brain cells respond to lines with specific orientations (a property known as orientation selectivity), and they mapped the way those cells are arranged in areas of the visual cortex known as columns and hypercolumns.

In some of their research, they sutured one eye of newborn kittens closed and followed the development of the kittens’ vision. They discovered there was a critical period of development for vision. If kittens were deprived of input from one eye, other areas of their visual cortex filled in the area that was normally used by the eye that was sewn closed. In other words, neural connections that exist at birth can be lost if they are deprived of sensory input.

What do you think about sewing a kitten’s eye closed for research? To many animal advocates, this would seem brutal, abusive, and unethical. What if you could do research that would help ensure babies and children born with certain conditions could develop normal vision instead of becoming blind? Would you want that research done? Would you conduct that research, even if it meant causing some harm to cats? Would you think the same way if you were the parent of such a child? What if you worked at the animal shelter?

Like virtually every other industrialized nation, Canada permits medical experimentation on animals, with few limitations (assuming sufficient scientific justification). The goal of any laws that exist is not to ban such tests but rather to limit unnecessary animal suffering by establishing standards for the humane treatment and housing of animals in laboratories.

As explained by Stephen Latham, the director of the Interdisciplinary Center for Bioethics at Yale (2012), possible legal and regulatory approaches to animal testing vary on a continuum from strong government regulation and monitoring of all experimentation at one end, to a self-regulated approach that depends on the ethics of the researchers at the other end. The United Kingdom has the most significant regulatory scheme, whereas Japan uses the self-regulation approach. The U.S. and Canadian approach is somewhere in the middle, the result of a gradual blending of the two approaches.

There is no question that medical research is a valuable and important practice. The question is whether the use of animals is a necessary or even best practice for producing the most reliable results. Alternatives include the use of patient-drug databases, virtual drug trials, computer models and simulations, and noninvasive imaging techniques such as magnetic resonance imaging and computed tomography scans (“Animals in Science/Alternatives,” n.d.). Other techniques, such as microdosing, use humans not as test animals but as a means to improve the accuracy and reliability of test results. In vitro methods based on human cell and tissue cultures, stem cells, and genetic testing methods are also increasingly available.

In Canada, the CCAC (Canadian Council on Animal Care) oversees the care and use of animals in research. In order to receive federal funding, an institution must comply with, and be approved by, the CCAC to conduct research using animals. You can find out more about the CCAC on their website .

Color and Depth Perception

We do not see the world in black and white; neither do we see it as two-dimensional (2-D) or flat (just height and width, no depth). Let’s look at how color vision works and how we perceive three dimensions (height, width, and depth).

Color Vision

Normal-sighted individuals have three different types of cones that mediate  color vision . Each of these cone types is maximally sensitive to a slightly different wavelength of light. According to the  trichromatic theory of color vision , shown in Figure 5.16, all colors in the spectrum can be produced by combining red, green, and blue. The three types of cones are each receptive to one of the colors.

[Figure 5.16: Sensitivity of the three cone types plotted against wavelength (400–700 nm). The short-wavelength (“blue”) cones peak near 455 nm, the medium-wavelength (“green”) cones near 535 nm, and the long-wavelength (“red”) cones near 580 nm. A bar below the graph shows the colors of the visible spectrum.]

CONNECT THE CONCEPTS

Colorblindness: a personal story.

Several years ago, I dressed to go to a public function and walked into the kitchen where my 7-year-old daughter sat. She looked up at me, and in her most stern voice, said, “You can’t wear that.” I asked, “Why not?” and she informed me the colors of my clothes did not match. She had complained frequently that I was bad at matching my shirts, pants, and ties, but this time, she sounded especially alarmed. As a single father with no one else to ask at home, I drove us to the nearest convenience store and asked the store clerk if my clothes matched. She said my pants were a bright green color, my shirt was a reddish-orange, and my tie was brown. She looked at me quizzically and said, “No way do your clothes match.” Over the next few days, I started asking my coworkers and friends if my clothes matched. After several days of being told that my coworkers just thought I had “a really unique style,” I made an appointment with an eye doctor and was tested (Figure 5.17). It was then that I found out that I was colorblind. I cannot differentiate between most greens, browns, and reds. Fortunately, other than unknowingly being badly dressed, my colorblindness rarely harms my day-to-day life.

[Figure 5.17: Three color-vision test plates made up of small circles of varying shades and sizes; in each, a number (12, 74, and 42) is visible only by its color against the background.]

Some forms of color deficiency are rare. Seeing in grayscale (only shades of black and white) is extremely rare, and people who do so only have rods, which means they have very low visual acuity and cannot see very well. The most common X-linked inherited abnormality is red-green color blindness (Birch, 2012). Approximately 8% of males of European Caucasian descent, 5% of Asian males, 4% of African males, and less than 2% of indigenous American males, Australian males, and Polynesian males have red-green color deficiency (Birch, 2012). Comparatively, only about 0.4% of females of European Caucasian descent have red-green color deficiency (Birch, 2012).

The trichromatic theory of color vision is not the only theory—another major theory of color vision is known as the  opponent-process theory . According to this theory, color is coded in opponent pairs: black-white, yellow-blue, and green-red. The basic idea is that some cells of the visual system are excited by one of the opponent colors and inhibited by the other. So, a cell that was excited by wavelengths associated with green would be inhibited by wavelengths associated with red, and vice versa. One of the implications of opponent processing is that we do not experience greenish-reds or yellowish-blues as colors. Another implication is that this leads to the experience of negative afterimages. An  afterimage  describes the continuation of a visual sensation after removal of the stimulus. For example, when you stare briefly at the sun and then look away from it, you may still perceive a spot of light although the stimulus (the sun) has been removed. When color is involved in the stimulus, the color pairings identified in the opponent-process theory lead to a negative afterimage. You can test this concept using the flag in Figure 5.18.

[Figure 5.18: A green flag with thick, black-bordered yellow lines meeting slightly left of center, and a small white dot within the yellow space at the exact center of the flag.]

But these two theories—the trichromatic theory of color vision and the opponent-process theory—are not mutually exclusive. Research has shown that they just apply to different levels of the nervous system. For visual processing on the retina, trichromatic theory applies: the cones are responsive to three different wavelengths that represent red, blue, and green. But once the signal moves past the retina on its way to the brain, the cells respond in a way consistent with opponent-process theory (Land, 1959; Kaiser, 1997).

Depth Perception

Our ability to perceive spatial relationships in three-dimensional (3-D) space is known as  depth perception . With depth perception, we can describe things as being in front, behind, above, below, or to the side of other things.

Our world is three-dimensional, so it makes sense that our mental representation of the world has three-dimensional properties. We use a variety of cues in a visual scene to establish our sense of depth. Some of these are  binocular   cues , which means that they rely on the use of both eyes. One example of a binocular depth cue is  binocular disparity , the slightly different view of the world that each of our eyes receives. To experience this slightly different view, do this simple exercise: extend your arm fully and extend one of your fingers and focus on that finger. Now, close your left eye without moving your head, then open your left eye and close your right eye without moving your head. You will notice that your finger seems to shift as you alternate between the two eyes because of the slightly different view each eye has of your finger.

A 3-D movie works on the same principle: the special glasses you wear allow the two slightly different images projected onto the screen to be seen separately by your left and your right eye. As your brain processes these images, you have the illusion that the leaping animal or running person is coming right toward you.

Although we rely on binocular cues to experience depth in our 3-D world, we can also perceive depth in 2-D arrays. Think about all the paintings and photographs you have seen. Generally, you pick up on depth in these images even though the visual stimulus is 2-D. When we do this, we are relying on a number of  monocular cues , or cues that require only one eye. If you think you can’t see depth with one eye, note that you don’t bump into things when using only one eye while walking—and, in fact, we have more monocular cues than binocular cues.

An example of a monocular cue would be what is known as linear perspective.  Linear perspective  refers to the fact that we perceive depth when we see two parallel lines that seem to converge in an image (Figure 5.19). Some other monocular depth cues are interposition, the partial overlap of objects, and the relative size and closeness of images to the horizon. Can you think of some additional pictorial depth cues that artists use to make you see depth in a 2D painting or photograph?

"A photograph shows an empty road that continues toward the horizon.

DIG DEEPER: Stereoblindness

Bruce Bridgeman was born with an extreme case of lazy eye that resulted in him being stereoblind, or unable to respond to binocular cues of depth. He relied heavily on monocular depth cues, but he never had a true appreciation of the 3-D nature of the world around him. This all changed one night in 2012 while Bruce was seeing a movie with his wife.

The movie the couple was going to see was shot in 3-D, and even though he thought it was a waste of money, Bruce paid for the 3-D glasses when he purchased his ticket. As soon as the film began, Bruce put on the glasses and experienced something completely new. For the first time in his life he appreciated the true depth of the world around him. Remarkably, his ability to perceive depth persisted outside of the movie theater.

There are cells in the nervous system that respond to binocular depth cues. Normally, these cells require activation during early development in order to persist, so experts familiar with Bruce’s case (and others like his) assume that at some point in his development, Bruce must have experienced at least a fleeting moment of binocular vision. It was enough to ensure the survival of the cells in the visual system tuned to binocular cues. The mystery now is why it took Bruce nearly 70 years to have these cells activated (Peck, 2012).

Key Terms

cornea: the transparent covering over the eye

pupil: the small opening in the eye through which light passes; its size can change as a function of light levels as well as emotional arousal

iris: the colored portion of the eye

lens: a curved, transparent structure that provides additional focus; it is attached to muscles that can change its shape to aid in focusing light reflected from near or far objects

fovea: the part of the retina where images are focused; contains cones

cones: light-detecting photoreceptors that work best in bright light conditions; they are very sensitive to acute detail, provide tremendous spatial resolution, and are directly involved in our ability to perceive color

rods: specialized photoreceptors that work well in low light conditions; while they lack the spatial resolution and color function of the cones, they support vision in dimly lit environments and the perception of movement on the periphery of the visual field

optic nerve: carries visual information from the retina to the brain

blind spot: the part of the visual field where the optic nerve leaves the eye, meaning we do not receive visual information for that area

optic chiasm: the X-shaped structure where information from the right visual field (which comes from both eyes) is sent to the left side of the brain, and information from the left visual field is sent to the right side of the brain

occipital lobe: the part of the cerebral cortex associated with visual processing; contains the primary visual cortex

what pathway: involved in object recognition and identification

where/how pathway: involved with location in space and how one might interact with a particular visual stimulus

trichromatic theory of color vision: suggests that all colors in the spectrum can be produced by combining red, green, and blue; the three types of cones are each receptive to one of the colors

opponent-process theory: suggests that color is coded in opponent pairs: black-white, yellow-blue, and green-red

afterimage: the continuation of a visual sensation after removal of the stimulus

binocular cues: depth cues that rely on the use of both eyes

binocular disparity: the slightly different view of the world that each of our eyes receives

monocular cues: depth cues that require only one eye, such as in 2-D paintings or photographs

linear perspective: the fact that we perceive depth when we see two parallel lines that seem to converge in an image

Introduction to Psychology Copyright © 2021 by Southern Alberta Institution of Technology (SAIT) is licensed under a Creative Commons Attribution 4.0 International License , except where otherwise noted.

ORIGINAL RESEARCH article

Applying Generative Artificial Intelligence to Cognitive Models of Decision Making

Tyler Malloy

  • Dynamic Decision Making Laboratory, Department of Social and Decision Sciences, Dietrich College, Carnegie Mellon University, Pittsburgh, PA, United States

Introduction: Generative Artificial Intelligence has made significant impacts in many fields, including computational cognitive modeling of decision making, although these applications have not yet been theoretically related to each other. This work introduces a categorization of applications of Generative Artificial Intelligence to cognitive models of decision making.

Methods: This categorization is used to compare the existing literature and to inform the design of an ablation study evaluating our proposed model in three experimental paradigms. The experiments used for model comparison involve modeling human learning and decision making based on both visual information and natural language, in tasks that vary in realism and complexity. This comparison takes as its basis Instance-Based Learning Theory, a theory of experiential decision making from which many models have emerged and been applied to a variety of domains.

Results: The best performing model from our ablation used a generative model both to create memory representations and to predict participant actions. The results of this comparison demonstrate the importance of generative models in both forming memories and predicting actions in decision-modeling research.

Discussion: In this work, we present a model that integrates generative and cognitive models, using a variety of stimuli, applications, and training methods. These results can provide guidelines for cognitive modelers and decision making researchers interested in integrating Generative AI into their methods.

1 Introduction

Cognitive models of decision making aim to represent and replicate the cognitive mechanisms driving decisions in various contexts. The motivation for the design and structure of cognitive models is based on various methods; some models focus on the connection to biological processes of the brain, while others aim to emulate more human-like behavior without a biological connection. However, these motivations are not exhaustive or mutually exclusive. In fact, many approaches seek to reconcile these objectives and integrate the various methods. This paper proposes a framework to apply Generative Artificial Intelligence (GAI) research methods to cognitive modeling approaches and evaluates the efficacy of an integrated model to achieve the varied goals of decision modeling research.

Generative Models (GMs) are a category of AI approaches that generate data, often corresponding to the input data type, covering textual, visual, auditory, motor, or multi-modal data ( Cao et al., 2023 ). GMs have shown remarkable advances across various domains in the effective generation and representation of complex data that is unattainable with conventional methods ( Bandi et al., 2023 ). The large space of GAI research can be daunting for cognitive modelers interested in applying these techniques to their models. This complexity and variety is one motivation for this work, in which we additionally seek to provide insights on methods for applying GAI to cognitive models of decision making.

Although GMs have shown impressive success in various data modalities relevant to decision science research, there are significant concerns about their utilization ( Bommasani et al., 2021 ), due in part to the potential biases present in language processing and generating models such as Large Language Models (LLMs) ( Bender et al., 2021 ). Various lines of research have suggested close connections between GMs and biological processes in some contexts, such as Variational Autoencoders (VAEs) ( Higgins et al., 2021 ) and Generative Adversarial Networks (GANs) ( Gershman, 2019 ). However, there is a general lack of understanding of how GMs integrate with decision making in a biologically plausible manner. Given this lack of clarity, careful consideration must be given when integrating GMs with cognitive models that aim to reflect biological realities.

Previously, the integration of GMs with cognitive models of decision making has largely been done on a case-by-case basis aimed at satisfying the needs of particular learning tasks ( Bates and Jacobs, 2020 ; Malloy et al., 2022a ; Xu et al., 2022 ); for a complete list of these approaches, see the Supplementary material . Consequently, there is an absence of a comprehensive framework for potential methods to integrate GMs and cognitive models of decision making. Understanding the impact of different integration methods is important, especially given the risks associated with improper application of AI technologies, particularly new ones, within decision-making systems ( Navigli et al., 2023 ) and the broader social sciences ( Bommasani et al., 2021 ). Thus, elucidating these integration strategies has significant implications for ensuring the responsible and effective deployment of AI in decision-making contexts.

To address the challenges posed by GMs, one approach is to construct an integration of GMs and cognitive models in a way that allows for effective testing of component parts. This research introduces a novel application of GAI research and cognitive modeling of decision making, as well as a categorization of the different features of past integrations. This categorization not only aims at informing the design of future integrations, but also provides a means of comparison between different integration approaches. Based on this framework, we offer an ablation study to compare the integration of GMs into cognitive models. This method enables a thorough analysis of the individual components of these integrations, shedding light on how different integration methods affect behavior.

2 Related work

2.1 Cognitive architectures and Instance-Based Learning Theory

Several Cognitive Architectures (CAs) have been developed and applied to explain and predict reasoning, decision making, and learning in a variety of tasks, including SOAR ( Laird et al., 1987 ), CLARION ( Sun, 2006 ), and ACT-R ( Anderson et al., 1997 ). Among these, ACT-R has been the basis for many other frameworks and theories that have emerged from the mechanisms it proposes. In particular, Instance-Based Learning Theory (IBLT) is based on an ACT-R mechanism that represents the process of symbolic cognition and emergent reasoning to make predictions from memory and determine human learning and decision making ( Gonzalez et al., 2003 ).

Instance-Based Learning Theory (IBLT) is a cognitive approach that mirrors human decision-making processes by relying on the accumulation and retrieval of examples from memory instead of relying on abstract rules ( Gonzalez et al., 2003 ). IBL models serve as tangible applications of IBLT tailored to specific tasks, encapsulating decision contexts, actions, and rewards pertinent to particular problem domains. These models learn iteratively from previous experiences, store instances of past decisions, and refine the results through feedback from the environment. Subsequently, IBL models leverage this repository of learned instances to navigate novel decision challenges. The adaptive nature of IBL models makes them particularly effective in contexts characterized by variability and uncertainty, as they can adapt flexibly to new situations by drawing parallels with past encounters. In particular, IBL models excel at capturing intricate patterns and relationships inherent in human behavior, a feat often challenging for explicit rule-based representations. Thus, IBLT stands as an intuitive framework to clarify how humans assimilate knowledge from experience and apply it to novel decision-making scenarios ( Gonzalez, 2023 ).

In this research we selected IBLT due to its theoretical connection to the ACT-R cognitive architecture and its wide and general applicability to a multitude of tasks. IBL models have demonstrated fidelity to human decision-making processes and efficacy in various domains, including repeated binary choice tasks ( Gonzalez and Dutt, 2011 ; Lejarraga et al., 2012 ), sequential decision-making ( Bugbee and Gonzalez, 2022 ), theory of mind applications ( Nguyen and Gonzalez, 2022 ), and practical applications such as identifying phishing emails ( Cranford et al., 2019 ), cyber defense ( Cranford et al., 2020 ), and cyber attack decision-making ( Aggarwal et al., 2022 ).

IBL models make decisions by storing and retrieving instances i in memory M. Instances are stored for each decision made by selecting options k. Instances are composed of features j in the set F and utility outcomes u_i. Options are observed in an order represented by the time step t, and the time steps at which an instance occurred are given by T(i).

Each instance i that occurred at time t has an activation value, which represents the availability of that instance in memory ( Anderson and Lebiere, 2014 ). The activation is a function of the frequency of occurrence of an instance, its memory decay, the similarity between instances in memory and the current instance, and noise. The general similarity of an instance is represented by summing the value S_ij over all attributes, where S_ij is the similarity of attribute j of instance i to the current state. This gives the activation equation as:
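\[ A_i(t) = \ln\Big( \sum_{t' \in T(i)} (t - t')^{-d} \Big) + \mu \sum_{j \in F} \omega_j \,(S_{ij} - 1) + \sigma \xi_i \tag{1} \]

where ξ_i is a random noise draw (a logistic variate in the PyIBL implementation); this is the standard IBLT form of the activation (Gonzalez and Dutt, 2011; Morrison and Gonzalez, 2024).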

The parameters that are set either by modelers or set to default values are the decay parameter d; the mismatch penalty μ; the attribute weight of each feature j, ω_j; and the noise parameter σ. The default values for these parameters are (d = 0.5, μ = 1, ω_j = 1, σ = 0.25), which are based on previous studies of dynamic decision making in humans ( Gonzalez and Dutt, 2011 ; Lejarraga et al., 2012 ; Gonzalez, 2013 ; Nguyen et al., 2023 ).

The probability of retrieval represents the probability that a single instance in memory will be retrieved when estimating the value associated with an option. To calculate this probability of retrieval, IBL models apply a weighted soft-max function to the memory instance activation values A_i(t) (Equation 1), giving the equation:
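\[ P_i(t) = \frac{e^{A_i(t)/\tau}}{\sum_{i' \in M_k} e^{A_{i'}(t)/\tau}} \tag{2} \]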

The parameter that is either set by modelers or set to its default value is the temperature parameter τ, which controls the uniformity of the probability distribution defined by this soft-max equation. The default value for this parameter is τ = σ√2.

The blended value of an option k at time step t is calculated from the utility outcomes u_i weighted by the probability of retrieval of each instance P_i (Equation 2), summing over all instances in memory M_k, to give the equation:
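\[ V_k(t) = \sum_{i \in M_k} P_i(t)\, u_i \tag{3} \]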

IBL models use Equation (3) to predict the value of options in decision-making tasks. These blended values ultimately determine the behavior of the IBL model, which selects from the currently available options the choice with the highest estimated utility. The specific notation for these IBL model equations is described in the Python programming package PyIBL ( Morrison and Gonzalez, 2024 ).
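To connect Equations (1)–(3) to code, the following is a minimal NumPy sketch (an illustration written for this purpose, not the PyIBL implementation itself; the names `occurrences` and `similarities`, and the logistic form of the noise draw, are our choices):

    import numpy as np

    def activation(t, occurrences, similarities, d=0.5, mu=1.0, omega=1.0,
                   sigma=0.25, rng=None):
        # Equation 1: activation of one instance at time t.
        # occurrences  -- past time steps T(i) at which the instance occurred
        # similarities -- attribute similarities S_ij to the current state
        rng = rng or np.random.default_rng()
        base = np.log(np.sum((t - np.asarray(occurrences, dtype=float)) ** -d))
        partial_match = mu * np.sum(omega * (np.asarray(similarities) - 1.0))
        u = rng.uniform()
        noise = sigma * np.log((1.0 - u) / u)   # logistic activation noise
        return base + partial_match + noise

    def retrieval_probabilities(activations, sigma=0.25):
        # Equation 2: soft-max over activations, default temperature tau = sigma * sqrt(2)
        tau = sigma * np.sqrt(2.0)
        a = np.asarray(activations)
        e = np.exp((a - a.max()) / tau)         # subtract max for numerical stability
        return e / e.sum()

    def blended_value(utilities, activations, sigma=0.25):
        # Equation 3: retrieval-probability-weighted sum of stored utilities
        p = retrieval_probabilities(activations, sigma)
        return float(np.sum(p * np.asarray(utilities)))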

2.2 Generative Artificial Intelligence

Recent methods in Generative Artificial Intelligence (GAI) have shown impressive success in a variety of domains in the production of natural language ( Brown et al., 2020 ), audio ( Kim et al., 2018 ), motor commands ( Ren and Ben-Tzvi, 2020 ), as well as combinations of these through multi-modal approaches ( Achiam et al., 2023 ). This is done through the training of Generative Models (GMs) which take as input some stimuli, often of the same type as the output, and learn to generate text, audio, and motor commands based on the input and training method. In this work, we focus on the processing of visual and natural language information through the formation of representations achieved by GMs that are useful for cognitive modeling.

Visual GMs form representations of visual information and either are originally structured to, or can be altered to, additionally generate utility predictions that are useful for decision-making tasks ( Higgins et al., 2017 ). These utility predictions generated by visual GMs have previously been applied to the prediction of human learning and decision making in contextual bandit tasks ( Malloy et al., 2022a ), as well as human transfer of learning ( Malloy et al., 2023 ). Our approach is agnostic to the specific GM being used, which means that it can be applied to compare the performance of different visual GMs.

2.2.1 Representing data with GMs

The first of two desiderata for integrating GMs into cognitive models of decision making is relating models to biological processes in humans and animals. Here, this is understood within the context of representing data with GMs in a manner similar to how data is represented in biological systems. Recent research on GM-formed data representations has demonstrated close similarities to biological systems ( Higgins et al., 2021 ), motivating their integration into cognitive models where similarity to biological cognitive systems is of interest.

An example of such a GM used in this work is the β-Variational Autoencoder (β-VAE) ( Higgins et al., 2016 , 2017 ), which learns representations that have been related to biological brain functioning by comparing the activity of individual neurons in the inferotemporal face patch of macaque monkeys to learned model representations when trained on images of human faces ( Higgins et al., 2021 ). The format of these representations is defined by a multivariate Gaussian distribution that is sampled to form a latent representation, which is fed through the decoder neural network layers to produce a lossy reconstruction of the original stimuli. The training of these models includes a variable information bottleneck controlled by the β parameter. This information-bottleneck motivation has been associated with cognitive limitations that impact decision making in humans, resulting in suboptimal behavior ( Bhui et al., 2021 ; Lai and Gershman, 2021 ).
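As a concrete illustration of this information bottleneck, the standard β-VAE objective can be written in a few lines of PyTorch (a generic sketch; the choice of reconstruction loss and the encoder/decoder architectures are placeholders, not the specific networks used here):

    import torch
    import torch.nn.functional as F

    def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
        # Reconstruction term: how faithfully the decoder reproduces the input.
        recon = F.mse_loss(x_recon, x, reduction="sum")
        # KL( N(mu, sigma^2) || N(0, I) ) in closed form; this is the information
        # bottleneck term, weighted by beta (beta = 1 recovers a plain VAE).
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + beta * kl

Larger values of β tighten the bottleneck, trading reconstruction accuracy for more compressed latent representations.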

These representations have been related to the processing of visual information from humans in learning tasks ( Malloy and Sims, 2022 ), as they excel in retaining key details associated with stimulus generation factors (such as the shape of a ball or the age of a person's face) ( Malloy et al., 2022b ). Although we employ β-VAEs in this work, there are many alternative visual GMs that are capable of forming representations useful for decision making. This includes visual generation models including Generative Adversarial Networks (GANs) and Visual Transformer (ViT) based models. In our previous work, we performed a comparative analysis of various integrations with an IBL model ( Malloy et al., 2023 ) and demonstrated that each can be effectively integrated with IBL to produce reasonable human-like behavior, but that information-constrained methods like the β-VAE are most accurate.

2.2.2 Decision making with GMs

The second of two desiderata to integrate GMs into cognitive models of decision making is generating behavior that is similar to biological systems. This possibility is most salient in cases where GMs are capable of producing complex data, such as text, speech, or motor commands, which alternative models are not equipped to produce. However, in many cases making decisions in specific contexts with pre-trained GMs can be difficult due to the large size and training time of models such as BERT ( Kenton and Toutanova, 2019 ), GPT ( Radford et al., 2018 ), and PaLM ( Chowdhery et al., 2023 ), as these models are not trained to explicitly make decisions.

Many recent approaches in machine learning research have applied GMs and their component structures (such as transformers, Chen et al., 2021 , or variational autoencoders, Higgins et al., 2017 ) directly to decision making. In Kirsch et al. (2023) , the authors apply transformer models to learn generalizable behavior that can be applied in a variety of reinforcement learning (RL) domains, such as robotics ( Brohan et al., 2023 ), grid-based environments ( Li et al., 2022 ), and video games ( Reid et al., 2022 ).

Other approaches apply feedback to RL models through the use of LLMs ( McDonald et al., 2023 ; Wu et al., 2023 ), to provide a similar model learning experience as methods such as RL with human feedback ( Griffith et al., 2013 ), without the need to collect human judgements. Offline RL has also been investigated through the integration of LLMs to reduce the need for potentially computationally expensive online learning ( Shi et al., 2023 ). Beyond RL-based methods, some approaches draw some inspiration from cognitive architectures by using a similarity metric to a history of outputs to inform new choices such as the Generative Agents approach ( Park et al., 2023 ).

2.3 Integrations of generative models and cognitive models in decision making

Previous research has explored numerous instances of integrating GMs and cognitive models, but these efforts have often been confined to single domains such as language, visual processing, or motor control. Additionally, the integration of GMs and cognitive models has typically been done for a single task or set of closely related tasks, mainly used to address a specific limitation within a cognitive model. These related applications span a diverse range of domains, including prediction of human transfer of learning ( Malloy et al., 2023 ), phishing email detection ( Xu et al., 2022 ), motor control ( Taniguchi et al., 2022 ), auditory learning ( Beguš, 2020 ), and multi-modal learning ( Ivanovic et al., 2018 ).

Integrating GMs and cognitive models can be done in various ways: by replacing an existing functionality, enhancing a sub-module, or introducing a novel ability to the model. For example, LLMs have been proposed as potential knowledge repositories within cognitive models. These repositories can be accessed when relevant knowledge is required ( Kirk et al., 2023 ), similar to a human-generated repository of general knowledge such as ConceptNet ( Speer et al., 2017 ). In particular, ConceptNet has previously been integrated into a cognitive modeling framework for tasks such as answering questions ( Huet et al., 2021 ).

Another recent approach used LLMs to produce highly human-like interactions between agents in a multi-player game involving natural language communication ( Park et al., 2023 ). Although this model did not directly implement cognitive architectures, it drew inspiration from several architectures that were previously applied to multiplayer games, like Quakebot-SOAR ( Laird, 2001 ) and ICARUS ( Choi et al., 2007 ). This was done by incorporating a database of encodings of previously observed textual stimuli and then comparing them based on similarity ( Park et al., 2023 ). Human-like language generation has also been investigated by applying GM techniques ( Friston et al., 2020 ).

Outside the context of language models, some work has provided evidence for connections between human visual information processing and Generative Adversarial Networks (GANs) ( Goetschalckx et al., 2021 ). Another method applied VAEs to model working memory formation in a geological education task that required identifying types of faults ( Hedayati et al., 2022 ). In social science research, GMs have been applied to a range of tasks replicating and reproducing well-studied phenomena in human social behavior ( Aher et al., 2023 ; Ziems et al., 2023 ). In Hedayati et al. (2022) , the authors employ a VAE to form representations used by a Binding Pool (BP) model ( Swan and Wyble, 2014 ) to predict the categorization of visual stimuli.

2.3.1 Categories of integrating generative models and cognitive models in decision making

Table 1 shows a selection of the most relevant previous approaches to the integration of GM and cognitive models of decision making and learning. A longer version of this analysis of previous methods is included in the Supplementary material , including some of the applications of GMs in decision science or machine learning that did not directly utilize cognitive modeling or did not predict human behavior.

Table 1 . Comparison of previous applications of integrating GMs into cognitive models based on our proposed categorization.

Previous approaches are categorized based on the following features: (1) Generative Actions: whether the GM is used to generate the actions executed by the agent; (2) Generative Memories: whether the memory representations used by the cognitive model are generated by a GM; (3) Stimuli Type: the types of stimuli the GM is capable of processing; (4) Cognitive Model Type: the type of cognitive model that is used as a base for integration; (5) GM Type: the type of GM that is integrated into the cognitive model; and (6) GM Training: whether the GM is pre-trained on a large existing corpus, as is done in foundation models, or trained in a tailored manner to solve a specific modeling task.

These features for evaluating existing models are motivated in part by The Common Model of Cognition ( Laird et al., 2017 ), which describes the commonalities that cognitive architectures such as SOAR and ACT-R have in terms of their connections of different cognitive faculties. The common model of cognition reviews the history of cognitive model comparisons, based on their method of producing actions, memories, types of perception items, and how these faculties were connected.

Mitsopoulos et al. (2023b) propose an integration of GMs into their “psychologically valid agent” framework, which is rooted in ACT-R and IBLT. This framework has been instrumental in modeling and predicting COVID masking strategies, as demonstrated in their study on this topic ( Mitsopoulos et al., 2023a ). Another architecture, CogNGen ( Ororbia and Kelly, 2023 ), incorporates MINERVA 2 ( Hintzman, 1984 ) as a short-term memory module while performing other cognitive faculties using both predictive coding ( Rao and Ballard, 1999 ) and neural generative coding ( Ororbia and Kifer, 2022 ). The efficacy of this architecture has been demonstrated in various grid-world tasks ( Chevalier-Boisvert et al., 2018 ), demonstrating improved success in challenging escape-room style environments.

Connecting cognitive models with GMs to produce memory representations of decision-making tasks has been explored in Malloy et al. (2023) , which compared Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Visual Transformers (ViTs) in their ability to integrate with an IBL model. This work was inspired by previous applications of GMs in modeling biological decision making, such as Higgins et al. (2021) . Another approach that incorporated LLMs with instance-based learning was presented in Xu et al. (2022) , which used LLM representations of phishing emails to predict human decision making in an email categorization task.

3 Proposed model

3.1 Generation Informed by Generative Environment Representations (GINGER)

In this work, we propose a method that integrates GMs into both the action and memory generation of a cognitive agent based on IBLT. This integration of GMs and IBL models can process either textual or visual information, achieved by leveraging Variational Autoencoders or Large Language Models. The result is a method of Generation INformed by Generative Environment Representations (GINGER).

In Figure 1 , we outline a general schematic of our proposed GINGER model. The first step of this process is for the model input to be processed by the GM. In the experiments used for this work, this input includes textual and visual information, but the approach could be applied to other modalities. From this input, the GM produces some model output as well as representations of the model input that are used as the memory of the GINGER model. These representations are used by the cognitive model, either as a part or as the whole of the state representation. From these two action prediction methods, the GINGER model produces two action outputs, which are resolved based on the specifics of the environment, such as by averaging for utility prediction.

Figure 1 . Comparison of our proposed GINGER model ( bottom right ) and an ablation of three alternative IBL-based models. In the top-left is the basic IBL model, which predicts decision making in visual or text-based tasks in terms of hand-crafted attributes to produce actions. In the bottom-left is the Generation INformed by IBL model, which makes predictions of actions using a neural network that takes as input generative model representations and is trained according to an IBL model (dashed line). The top-right shows the Generative Environment Representation model, which makes predictions using an IBL model that uses features defined by the generative model. Finally, the full GINGER model combines these two approaches, predicting actions by evenly weighting GINIBL and GERIBL action predictions.
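Schematically, one GINGER decision step in Figure 1 can be summarized as follows (hypothetical Python; `gm`, `utility_head`, and `ibl` are stand-ins for the generative model, its utility-prediction network, and the IBL model, none of which are pinned down here):

    def ginger_predict(stimulus, gm, utility_head, ibl):
        # Generative environment representation: the GM encodes the stimulus
        # into the representation z that serves as the IBL model's memory format.
        z = gm.encode(stimulus)
        # GERIBL path: IBL blended value computed over GM-defined memories (Equation 3).
        v_ger = ibl.blended_value(z)
        # GINIBL path: utility predicted directly from the GM representation (Section 3.1.1).
        v_gin = utility_head(z)
        # Full GINGER: evenly weighted combination of the two action predictions.
        return 0.5 * (v_ger + v_gin)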

There are two optional connections between GMs and cognitive models that are not investigated in this work and are instead left for future research. The first is the connection from the model output to the action being performed. While the generation of utility predictions is always informed by cognitive model predictions (since the utility outputs are trained on those predictions), it is also possible to include the GM output (text, motor commands, etc.) as the whole or a part of the action performed. Second, the cognitive model can optionally be connected into the GM input, such as by using cognitive model predictions as part of the GM input (e.g., as the prompt of an LLM) to inform how representations and outputs should be formed.

3.1.1 Generative actions

The first part of the GINGER model name, “generation informed by”, refers to the sharing of utility predictions made by the cognitive model when training the utility prediction of the generative model. Action generation is accomplished by directly generating utility predictions that are used in decision-making tasks to determine the action with the highest utility based on a specific stimulus. This can be achieved in two different ways, depending on whether the GM is a pre-trained foundation model or an ad-hoc trained model for a specific task.

In the case of ad-hoc trained models, the models themselves are adjusted to generate utility predictions and are trained using the cognitive model. For instance, a β-Variational Autoencoder (β-VAE), which typically produces reconstructions of original stimuli, can be adjusted to additionally predict utility, as was done in previous methods ( Higgins et al., 2017 ; Bates and Jacobs, 2020 ; Lai and Gershman, 2021 ; Malloy et al., 2022a ). Then, instead of training the model to predict actions based on reward observations from the environment, it is trained to match the predictions of the cognitive model. β-VAEs are trained to produce reconstructions that are as accurate as possible given the size of the latent representation and its informational complexity, measured by KL-divergence and penalized through the β parameter. This means that adjusting the β parameter to individual cognitive abilities can result in more human-like predictions of actions based on model representations ( Bates and Jacobs, 2019 , 2020 ; Malloy et al., 2022b ).

In the case of pre-trained or foundation models, the models cannot easily be adjusted after training prior to integration with cognitive models. For that reason, when integrating pre-trained LLMs or other foundation models, our GINGER approach uses the representations learned by these models as input to a separate utility prediction neural network. The structure and precise training of these models are left to the discretion of cognitive modelers according to the demands of the learning task under investigation. In our work, we use a simple two-layer fully connected network with 64 units to predict the utility associated with these representations. See the Supplementary material for more details on this training approach.
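A network matching this description might look like the following PyTorch sketch (our own minimal reading of the "two-layer fully connected network with 64 units"; the exact architecture and training details are given in the Supplementary material):

    import torch.nn as nn

    class UtilityHead(nn.Module):
        # Maps a frozen foundation-model representation to a scalar utility,
        # trained (e.g., with MSE) to match the cognitive model's predictions.
        def __init__(self, rep_dim):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(rep_dim, 64),  # hidden layer with 64 units
                nn.ReLU(),
                nn.Linear(64, 1),        # scalar utility prediction
            )

        def forward(self, z):
            return self.net(z)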

3.1.2 Generative memories

The second part of the GINGER name, “generative environment representations”, refers to the creation of stimuli representations based on the requirements of the learning task, capturing the stimuli type of interest. These GM-formed representations can serve either as the entire state representation or as an additional feature. When applying these representations to IBL, we determine the similarity S_ij in the calculation of the activation function (see Equation 1) through a similarity metric Sim_GM defined by the training of the GM, as follows:
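\[ S_{ij} = \mathrm{Sim}_{GM}\big( p(z \mid k),\; p(z_i \mid k_i) \big) \tag{4} \]

where p(z|k) and p(z_i|k_i) are the GM representations of the current option and of the option stored in instance i, as defined below.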

Formally, GMs process some input x, which can be visual, textual, auditory, or multi-modal, and produce some output y based on that input. During this generation, these models form representations z of the input that can vary in structure, such as the multivariate Gaussian distributions used by β-VAEs or the word vector embeddings used by LLMs. In our model, we consider the option (or the part of the option relevant for modeling) k to be the input to the GM. This allows for the formation of representations z based on these options. The similarity of options can instead be calculated based on the representations of current options p(z|k) and the representations of options stored in the IBL model memory p(z_i|k_i). The similarity of these representations is defined by the training method of the GM, used as a metric of similarity (Sim_GM).

In some GMs, such as conversational LLMs, the output y is trained to match subsequent textual tokens in a conversation or other language domain. In other types of GMs, like Variational Autoencoders, the model is trained such that the output y is as close to the input x as possible given the information constraint imposed by the model. These two types of models are used in our comparison of different methods of integrating GMs, but alternative GM structures and training methods can also be integrated with our proposed modeling approach.

Generating internal representations is, in a sense, a requirement for GMs, as they must form some representation z based on the input x in order to process it. As with the model output y, the structure of these internal representations z varies between different GMs. In the case of LLMs, these internal representations are structured as word vector embeddings. This allows for measures of similarity (Sim_GM in Equation 4) based on cosine similarity, which is conceptually similar to a high-dimensional distance metric. In the case of β-VAEs, these representations take the form of high-dimensional Gaussian distributions, which are sampled and fed through the subsequent layers of the model to form the reconstructed version of the original stimulus. With these types of representations, it is possible to measure similarity in terms of the KL-divergence between representations.
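Both similarity measures have simple closed forms, sketched below (how either raw quantity is rescaled into the S_ij range used by Equation 1 is a modeling choice left open here):

    import numpy as np

    def cosine_similarity(a, b):
        # Sim_GM for word-vector embeddings (the LLM case); 1 means identical direction.
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    def diagonal_gaussian_kl(mu0, var0, mu1, var1):
        # KL( N(mu0, diag(var0)) || N(mu1, diag(var1)) ), the beta-VAE case;
        # smaller divergence indicates more similar representations.
        return 0.5 * np.sum(np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)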

In both cases, these GMs provide a meaningful representation of the model input, as well as a method of comparing these representations to other inputs. This is highly relevant for integration with an IBL method, since the similarity of instances must be calculated to determine a memory activation, which is easily achieved using the existing similarity metric required by the training of the GM itself. The sections above on generative action production and generative memory production detail how the representations formed by GMs are used in the IBL cognitive model, as well as how the IBL model and GM are integrated in an interdependent manner that affords improvements to both models.

4 Model ablation

This work proposes a comparison of different methods of integrating GMs into cognitive models or architectures, through an ablation study comparing the categorizations described in Table 1 . To do this, we use the Instance-Based Learning (IBL) model of dynamic decision making ( Gonzalez et al., 2003 ). Rather than comparing our proposed model against a highly similar model based on a different cognitive model, a different GM, or a different method of integrating the two, we are interested in providing insight to cognitive modelers who wish to apply GMs to their own approaches, and as such we adopt an ablation analysis of GINGER.

This ablation is based on the two key features of GINGER: the ‘Generative Environment Representations’, which relate to the generation of cognitive model memory representations, and the ‘Generation Informed’ by cognitive models, which allows the actions selected by GMs to take information from cognitive models. Ablating away the generative environment representations results in a model that only uses generation informed by cognitive models (GIN). Ablating away the generation informed by cognitive models results in a model that only uses generative environment representations (GER). Finally, ablating both away results in the baseline Instance-Based Learning (IBL) model, which makes predictions using hand-crafted features of tasks.

These four models (GIN, GER, GINGER, and IBL) form the basis of our ablation comparisons in three experimental contexts involving different types of stimuli and complexities. The following sections detail these experiments as well as comparisons of the performance of the proposed model and its ablated versions. Participant data from these experiments, all trained models, modeling result data, and code to replicate figures are collected into a single OSF repository. 1

4.1 Contextual bandit task

The experiment was originally conducted at the Niv Neuroscience Lab at Princeton University ( Niv et al., 2015 ). Participants were presented with three options, each distinguished by a unique combination of shape, color, and texture. Shapes included circular, square, and triangular forms; colors included yellow, red, and green; and textures were dotted, wavy, and hatched (see Figure 2A ). In every trial of the task, each of the 9 possible features appeared exactly once across the options, ensuring that there was always an option of each color, shape, and texture. The features within the options were randomized to prevent repetitions in each position (left, middle, right). Participants had 1.5 seconds to make their selection, followed by a brief display (0.5 seconds) of the chosen option and feedback showing the point reward (0 or 1). A blank screen was then displayed for 4–7 seconds before the next stimulus.

Figure 2 . In (B–D) , Blue is the IBL model, Orange is the Generative Environment Representation IBL model, Green is the Generation Informed by IBL model, and Red is the full Generation INformed by Generative Environment Representation IBL model. (A) Example of the stimuli shown to participants when making a decision on which of the features is associated with a higher probability of receiving a reward. (B) Schematic of the input of a single stimulus option into the generative and cognitive model making up the full GINGER model. The colored lines indicate the remaining connections of the ablated versions of the models. (C) Learning rate comparison of human participants and four ablated model versions in terms of probability of correctly guessing the option containing the feature of interest. (D) Average model difference to participant performance calculated by mean residual sum of squares for each participant. Error bars represent 95% confidence intervals.

During a single episode of the task, one of the nine features is selected as the feature of interest, and selecting the option containing that feature increases the likelihood of receiving a reward. Episodes lasted approximately 20–25 trials before transitioning to a new feature of interest. The reward in this task is probabilistic: selecting the option with the feature of interest results in a 75% chance of receiving a reward of 1 and a 25% chance of receiving a reward of 0, while selecting one of the two options without the feature of interest results in a 25% chance of receiving a reward of 1 and a 75% chance of receiving a reward of 0. Given the three possible options, the base probability of selecting the option with the feature of interest was 1/3.
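This reward structure is simple enough to state directly in code. The sketch below simulates one trial's feedback; representing options as feature sets is an assumption about data layout, not the authors' code.

```python
import random

def trial_reward(chosen_option, feature_of_interest):
    """Probabilistic feedback for one trial: reward 1 with probability 0.75
    if the chosen option contains the feature of interest, and with
    probability 0.25 otherwise."""
    p = 0.75 if feature_of_interest in chosen_option else 0.25
    return 1 if random.random() < p else 0

# Example: options are feature sets such as {"red", "square", "wavy"}.
print(trial_reward({"red", "square", "wavy"}, "red"))
```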

4.1.1 Cognitive modeling

The contextual bandit task serves as a benchmark to compare the three approaches to integrating GMs into cognitive decision-making models. This simple task is useful for verifying that all integrations of GMs in cognitive modeling accurately capture human learning in basic learning scenarios. Figure 2B presents a visual representation of the GINGER model, which takes as input the visual stimulus associated with one of the three options. First, this stimulus is fed into the GM. In this task, a modified version of a β-Variational Autoencoder is used, which additionally predicts the utility associated with stimuli based on the internal representations generated by the GM.

For the baseline IBL model, choice features consisted of shape, color, and texture. For each type of feature, the similarity metric was defined as 1 for identical features and 0 otherwise. The GER model used the β-VAE representation as an additional feature with its own similarity metric, the KL-divergence between the β-VAE representation distributions. The GIN model used the baseline IBL model to predict the utilities of stimulus options and trained the utility-prediction network on these values. The full GINGER model combined the approaches of the GIN and GER models in this task. All four ablation models used the same predefined parameters for noise, temperature, and decay, as mentioned previously.
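Putting these feature definitions together, the per-feature similarity functions might be sketched as follows. The Rep container and the exp(−KL) mapping are illustrative assumptions, since the paper specifies only that the KL-divergence serves as the similarity metric for the representation feature.

```python
import math
from collections import namedtuple

# Illustrative container for a diagonal-Gaussian beta-VAE representation.
Rep = namedtuple("Rep", ["mu", "var"])  # per-dimension means and variances

def binary_sim(a, b):
    """Similarity for the hand-crafted shape, color, and texture features:
    1 for identical feature values, 0 otherwise."""
    return 1.0 if a == b else 0.0

def rep_sim(p, q):
    """Similarity for the GER representation feature, based on the KL
    divergence between the two Gaussian representations."""
    kl = 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(p.mu, p.var, q.mu, q.var)
    )
    return math.exp(-kl)  # assumption: monotone mapping of divergence to similarity
```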

4.1.2 Methods

The experimental methodology is reproduced from the original paper; for additional details, see Niv et al. (2015) . This study involved 34 participants (20 female, 14 male, 0 non-binary) recruited from Princeton University, all aged 18 or older. Data from 3 participants were incomplete and thus not analyzed, and another 6 participants were removed due to poor performance. Participants had a mean age of 20.9 years and were compensated at a rate of $20 per hour. The experiment was approved by the Princeton University Institutional Review Board and was not preregistered. Participant data are accessible on the Niv Lab website. 2

To evaluate the performance of the four model ablations of our proposed GINGER model, we compare the probability of a correct guess on each trial within an episode. Figure 2C shows the comparison between participant and model performance in terms of the probability of selecting the option containing the feature of interest across trials 1–25. This graphical representation facilitates a visual comparison of how quickly each learner identifies which feature is associated with a higher probability of reward, as well as of average performance at the end of each episode.

In addition to the trial-by-trial comparison of model and participant performance depicted in Figure 2C , we aim to compare the overall similarity between them. This is done by measuring the difference between model performance and individual participant performance using the mean residual sum of squares $RSS/n$, where $n$ is the number of participants and $RSS = \sum_{i=1}^{n} (y_i - p(x_i))^2$. This difference is calculated for each participant and trial within an episode and across all episodes in the experiment. These values are directly related to the Bayesian Information Criterion (BIC), calculated in terms of the residual sum of squares as $BIC = n \ln(RSS/n) + k \ln(n)$, since all four models have 0 fit parameters (all are default values). The resulting values are averaged across all participants and presented in Figure 2D . Error bars in Figure 2D denote the 95% confidence intervals of the model difference from participant performance across each participant and trial of the task.
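In code, the model-to-participant difference and its relation to BIC reduce to a few lines. This is a minimal numpy sketch of the formulas above, assuming arrays of per-trial choice probabilities.

```python
import numpy as np

def mean_rss(human_p, model_p):
    """Mean residual sum of squares between human and model probabilities
    of selecting the option with the feature of interest."""
    human_p = np.asarray(human_p, dtype=float)
    model_p = np.asarray(model_p, dtype=float)
    return np.sum((human_p - model_p) ** 2) / len(human_p)

def bic_from_rss(rss, n, k=0):
    """BIC in terms of RSS: n*ln(RSS/n) + k*ln(n). Here k = 0 because all
    four models use default parameter values, so BIC is a monotone
    function of the mean RSS."""
    return n * np.log(rss / n) + k * np.log(n)
```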

4.1.3 Results

The initial comparison of model learning to participant behavior focuses on the probability of correct guesses as the trial number within an episode increases, as shown in Figure 2C . Comparing the speed of learning to participants reveals that models that include generative action selection (GIN and GINGER) demonstrate the fastest learning. In contrast, the two ablated versions (IBL and GER) that do not make direct predictions of utility based on GM representations exhibit slower learning. This shows that in learning tasks that require fast updating of predicted utilities, directly predicting these values from GM representations and selecting actions accordingly results in more human-like learning progress.

The second set of results, illustrated in Figure 2D , compares the average difference between model and participant performance. Among the four models, GINGER has the lowest deviation from participant performance, with a difference similar to that of the GIN ablation model, which relies on predictions of utility derived from GM representations. The IBL and GER models, which make predictions based on hand-crafted stimulus features (IBL) and GM representations (GER), show the largest differences from participant performance. The distinguishing feature of the GINGER model is that its predictions of utility are partially influenced by the GM-formed stimulus representation, alongside the IBL model's use of the hand-crafted features. By directly predicting utility from representations, both the GINGER and GIN models are able to quickly update their utility predictions.

In summary, the modeling results demonstrate that each approach to incorporating GMs in predicting human learning is viable, as none of the models performs worse than the IBL model, which does not use a GM. However, models whose actions are selected by the GM exhibit more human-like learning trends ( Figure 2C ) and a closer similarity to human learning ( Figure 2D ). While leveraging GM representations aims to improve generalization, the simplicity of this task imposes minimal demands on generalization, meaning that the speed of learning matters more for producing human-like learning. The next experimental paradigms introduce an explicit generalization requirement for participants, enabling a comparison of the ablated models in a task where generalization performance is more important.

4.2 Transfer of learning task

This decision-making task involves learning the values associated with abstract visual stimuli and transferring that knowledge to more visually complex stimuli. Previous research comparing the IBL and GER models demonstrated improved performance in transfer of learning tasks when generative representations were introduced into the IBL model ( Malloy et al., 2023 ). The higher performance of the GER model, and its closer resemblance to human performance compared to the standard IBL model, raises the question of how our proposed GIN and GINGER models compare in replicating human-like behavior in this transfer of learning task.

In this task, generalization performance is more relevant than learning speed when evaluating participants and cognitive models, because the task grows more complex over time. Initially, participants engaged in a contextual bandit task focused only on the shape feature ( Figure 3A Left). After 15 trials, the task complexity increased with the introduction of the color feature ( Figure 3A Middle). Transitioning to the color learning task required participants to transfer knowledge from the shape learning task to determine the optimal option, demanding generalization from past experience to make decisions in a related but not fully equivalent context. After these 15 trials of the color learning task, participants were introduced to the texture learning task ( Figure 3A Right), which is similar in structure to the first learning experiment ( Niv et al., 2015 ).

Figure 3 . In (B–F) , Blue is the IBL model, Orange is the Generative Environment Representation IBL model, Green is the Generation Informed by IBL model, and Red is the full Generation INformed by Generative Environment Representation IBL model. (A) Example stimuli of one block of 15 trials for the shape, color, and texture learning tasks, for a total of 45 trials. (B) Performance of the models against human participants on each of the three learning tasks shown in (A). (C) Comparison of GIN IBL and human learning across the three learning tasks. (D) Comparison of GINGER and human learning across the three tasks. (E) Comparison of GER IBL and human learning across the three learning tasks. (F) Comparison of accuracy between model predicted learning and human performance calculated by mean residual sum of squares. Error bars represent 95% confidence intervals.

4.2.1 Cognitive modeling

The design of the IBL baseline model's features was identical to the first experiment, including the use of the shape, color, and texture features, baseline parameter values, and binary similarity metrics. One difference from the previous task is that the GIN and GINGER utility-prediction modules are trained on only one portion of the data set at a time: first shape, then shape-color, then shape-color-texture. This means that predicting the utility associated with a representation requires a high degree of generalization to adequately transfer from one task to the next.
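The block-wise training regime can be sketched as follows. The LinearUtility class is a deliberately minimal stand-in for the utility-prediction network described in the paper, included only to make the sequential-training loop concrete.

```python
import numpy as np

class LinearUtility:
    """Minimal online utility predictor; an illustrative stand-in for the
    utility-prediction network used by the GIN and GINGER models."""
    def __init__(self, dim, lr=0.01):
        self.w = np.zeros(dim)
        self.lr = lr

    def predict(self, rep):
        return float(self.w @ rep)

    def update(self, rep, utility):
        # One stochastic-gradient step on the squared prediction error.
        self.w += self.lr * (utility - self.predict(rep)) * rep

def train_sequentially(model, blocks):
    """Train on one task block at a time (shape, then shape-color, then
    shape-color-texture) with no revisiting of earlier blocks, mirroring
    the order in which participants experience the tasks."""
    for block in blocks:
        for rep, utility in block:  # GM representation, observed reward
            model.update(rep, utility)
```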

4.2.2 Methods

160 participants (86 female, 69 male, 2 non-binary) were recruited online through the Amazon Mechanical Turk (AMT) platform. All participants were over the age of 18 and citizens of the United States of America. Participants had a mean age of 40.5 years with a standard deviation of 11.3 years. Participants were required to have completed at least 100 Human Intelligence Tasks (HITs) on AMT with at least a 95% approval rate on completed HITs. Six of the 160 recruited participants failed to submit data or to complete the task within a 1-hour limit and were excluded from analysis. All results and analyses use the remaining 154 participants.

Participants received a base payment of $4 with the potential to receive a bonus of up to $3 depending on their performance in the task. The mean time to complete the task was 16.9 minutes, with a standard deviation of 5.8 minutes. This experiment was approved by the Carnegie Mellon University Institutional Review Board. The experiment protocol was preregistered on OSF; the preregistration, participant data, analysis, model code, and a complete experiment protocol are available on OSF. 3 For a more complete description of the experiment methods, see Malloy et al. (2023) .

Participants' performance in this task can be measured by their ability to transfer knowledge from one learning task to the subsequent learning tasks. Three commonly used metrics for performance in transferring learned knowledge to subsequent tasks are jumpstart, asymptotic, and episodic performance ( Taylor and Stone, 2009 ). Jumpstart performance is defined as the initial performance of an agent on a target task; in the contextual bandit experiment used in this work, it is calculated as the average of the first three observed utilities in the trials after the task switches. Asymptotic performance is defined as the final learned performance of an agent in a target task; in the transfer of learning experiment, it is calculated as the average of participants' final three reward observations. Episodic performance is defined as the average performance over an episode, a measure analogous to the commonly used total reward metric; it is calculated as the average of the observed utilities. These measures are used to compare model differences from participant behavior and are averaged to produce the results shown in Figure 3F .
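The three metrics can be computed as shown below. The three-observation windows for jumpstart and asymptotic performance follow the description above, but the exact window sizes are an assumption for illustration.

```python
import numpy as np

def transfer_metrics(rewards, n_first=3, n_last=3):
    """Jumpstart, asymptotic, and episodic performance for one task block
    (Taylor and Stone, 2009), computed from the sequence of observed
    rewards after a task switch."""
    rewards = np.asarray(rewards, dtype=float)
    return {
        "jumpstart": rewards[:n_first].mean(),   # initial performance
        "asymptotic": rewards[-n_last:].mean(),  # final learned performance
        "episodic": rewards.mean(),              # average over the block
    }
```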

4.2.3 Results

To assess transfer of learning for the three measures, we averaged the similarities between human and model performance in jumpstart, episodic, and asymptotic performance across the three learning tasks. This aggregation yields a single metric, providing a holistic evaluation of the fit between model and human transfer of learning performance. The similarity is based on mean residual sum of squares ($RSS/n$) calculations for each of the three transfer of learning measures. This integrated measure of congruence is shown in Figure 3F to facilitate comparison across the four models. Importantly, these accuracy metrics are computed for each participant individually, capturing the fit across the full sample. Additionally, the same connection between mean residual sum of squares and BIC holds as in the first experiment, since again there are no fit parameters.

As in the contextual bandit task in Experiment 1, we first compare the four models by their speed of learning and their similarity to human performance, shown in the four plots ( Figures 3B – E ). This is done for each of the three learning tasks, which increase in complexity as the experiment progresses. The comparison shows that the GER and GINGER models have learning trends more similar to humans in the color and texture tasks than the IBL and GIN models do. This is likely because the representations of visual information used by the GER and GINGER models as features of the IBL model allow for improved generalization, a key contributor to transfer of learning ability.

Comparatively, the IBL and GIN models show more human-like learning on the simple shape learning task, before transfer of learning ability becomes relevant. This mirrors the human-like learning achieved by these two models in the first experiment, but because the majority of this task relies more on generalization than on speed of learning, the GER and GINGER models are better fits to human learning averaged across the entire experiment.

The next comparison of model performance is shown in Figure 3F , which captures an aggregate average of the three transfer of learning metrics previously discussed. Overall, the IBL model is far more distant from human performance than the three other models. The GER and GIN models are about equally distant from human performance, as the GER model performed relatively better on the two transfer tasks while the GIN model performed better on the first task. The GINGER model, which combines the GIN model's more human-like behavior on the first task with the GER model's on the two transfer tasks, produces the most human-like learning on average.

4.3 Phishing identification task

Phishing messages are emails that attempt to obtain credentials, transmit malware, gain access to internal systems, or cause financial harm ( Hong, 2012 ). An important means of preventing phishing emails from harming individuals and companies is training programs that help people identify phishing emails more successfully ( Singh et al., 2020 ). Cognitive models have been applied to predict and improve email phishing training ( Singh et al., 2019 ; Cranford et al., 2021 ). The phishing email identification task is used to compare the ablations of our proposed model, assessing how relevant each of its attributes is under conditions that include complex natural language stimuli.

We use a data set of human judgments on the phishing identification task ( Figure 4A ) that was originally collected by Singh et al. (2023) and is publicly available. The task involved the presentation of phishing or safe emails. Participants indicated their guess as to whether each email was safe or dangerous, gave a confidence rating, and recommended an action to take upon receiving the email, such as checking the link, responding to the email, or opening an attachment ( Singh et al., 2023 ). These details are described more fully in the section on experimental methods.

Figure 4 . (A) Example email shown to participants and multiple choice and confidence selection from Singh et al. (2023) . (B) Example of expert human coding of email features. In (C–G) , blue is the IBL model, orange is the GER model, green is the GIN model, and red is the GINGER model. Darker shading represents phishing emails, and lighter shading represents non-phishing ham emails. The four graphs on the bottom left show the performance of the models compared to human participants in correctly predicting ham and phishing emails. The error bars represent 95% confidence intervals. (C) IBL model performance compared to human participants. (D) Performance of the GER model compared to human participants. (E) GIN model performance compared to human participants. (F) GINGER model performance compared to human participants. (G) Average difference in human participant performance on both ham and phishing email identification across each of the four models. This is calculated by the mean residual sum of squares. Error bars represent 95% confidence intervals.

4.3.1 Cognitive modeling

The baseline IBL model for this task used binary hand-crafted features coded by human experts ( Figure 4B ), including mismatched sender, requesting credentials, urgent language, making an offer, suspicious subject, and a link mismatch. The other main difference between the cognitive modeling of this experiment and the previous two is that an LLM is used to form the representations, which serve both as a feature of the task and as input to a directly trained utility predictor.

These representations are embeddings of textual inputs formed by the OpenAI GPT-based model “text-embedding-ada-002”. At the time of writing, this was the only text embedding model available through the OpenAI Application Programming Interface. This model generates representations of text inputs as vectors of 1536 floating point numbers. The IBL similarity metric for these representations is calculated with the cosine similarity function of the sklearn Python package, a commonly used metric for comparing sentence embeddings from large language models ( Li et al., 2020 ).
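A minimal sketch of this similarity computation is shown below, using placeholder vectors in place of real API responses; fetching actual “text-embedding-ada-002” embeddings requires an OpenAI API call that is omitted here.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder 1536-dimensional embeddings standing in for two
# "text-embedding-ada-002" vectors returned by the OpenAI API.
email_a = np.random.rand(1, 1536)
email_b = np.random.rand(1, 1536)

# IBL similarity for the embedding feature of the two emails.
sim = cosine_similarity(email_a, email_b)[0, 0]
print(f"cosine similarity: {sim:.3f}")
```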

Due to the high baseline performance of humans in this task, a result of their experience reading emails and encountering phishing warnings, we use a random sample of 10% of emails to pre-train all models under comparison. This allows for a more realistic comparison of how these models reflect human decision making in this type of task.

4.3.2 Methods

The experimental methods for this analysis are detailed in full in Singh et al. (2023) . 228 participants were recruited online through the Amazon Mechanical Turk (AMT) platform. Participants were required to have completed at least 100 Human Intelligence Tasks (HITs) on AMT with at least a 90% approval rate. All participants were over the age of 18, with a mean age of 36.8 years and a standard deviation of 11.5 years. Four of the 228 participants failed attention checks and were excluded from the analysis. Participants were paid a base rate of $6 with the potential to receive a bonus of up to $3 depending on their performance. The mean time to complete the experiment was 35 minutes.

Experiment data were obtained on request from the original authors. These data included participant judgments in the task as well as the 239 emails, which the researchers had classified based on features relevant to determining whether an email was phishing (referred to as spam) or non-phishing (referred to as ham). These features included whether the sender of the email matched the claimed sender; whether the email requested credentials; whether the subject line was suspicious; whether an offer was made in the email body; whether the tone of the email used urgent language; and whether a link in the email matched the text of the link. Textual data and email features are available on OSF 4 and participant data are contained in our previously mentioned combined repository (see text footnote 1 ).

Participants' performance in this task can be measured by their ability to correctly identify phishing emails as phishing and ham emails as ham. Splitting classification accuracy by the type of email shown allows for a comparison across the different proportions of phishing and ham emails presented during the experimental conditions. Ideally, an accurate model of human learning in this task would be similar to human data for each of these types of categorization.

Accurately reflecting participants' differing experience with identifying phishing emails can be difficult for cognitive models. In IBL models, this could be done by using a set of models with varied initial experiences of phishing and ham emails, which would produce differences in categorization accuracy for the two types of email. However, to highlight the differences in the ablation analysis, we do not fit the experience of individual models to human performance; instead, we use the same base-level experience across all models under comparison.

4.3.3 Results

In this experiment, each of the four ablation models predicted the same emails shown to participants, in the same order. The ablation models used baseline values for all parameters of the IBL model. Therefore, the total number of model runs equaled the number of participants for each model ablation. Models were trained using a reward function of 1 point for a correct categorization and 0 points for an incorrect categorization. For the GIN and GINGER models, utility prediction based on representations used the representation input of size 1536, followed by two layers of size 128, and finally an output of size 1. Further details are included in the Supplementary material .
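This architecture maps directly onto a small feed-forward network. The sketch below assumes PyTorch and ReLU activations, neither of which is specified in the text; see the Supplementary material for the authors' exact configuration.

```python
import torch.nn as nn

# Utility-prediction head: a 1536-d embedding in, a scalar utility out,
# with two hidden layers of size 128. Activation choice is an assumption.
utility_head = nn.Sequential(
    nn.Linear(1536, 128),
    nn.ReLU(),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, 1),
)
```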

The performance of the GIN model is unique in that it predicts similarly high performance in the early and later trial periods for both types of emails ( Figure 4E ). Direct utility prediction based on representations can approach high accuracy from only a few examples. This holds for both phishing and ham emails, whereas humans display lower accuracy overall and a large difference between accuracy on the two types of email. It would be possible to reduce this pre-training for the GIN model alone; however, the GIN model would then be using less experience than the other models.

In general, fitting the training time of generative actions to human performance can be difficult for large representation sizes, as it requires multiple computationally expensive training periods. This is demonstrated by the GIN model's reduced similarity to human learning results. This is a key difference between the phishing email identification task, where the representation size is 1536, and the earlier tasks, which used β-VAE representations of size 9. However, these representation sizes are not considered a variable or fit parameter in any of the models; thus, the same connection between mean residual sum of squares and BIC can be made as in the first experiment, since again there are no fit parameters.

The GINGER model most closely matches human performance ( Figures 4C – G ), as a result of making predictions using both the GM and the email representations fed into the IBL model. This demonstrates the benefit of combining generative actions and generative memory formation for tasks with complex natural language stimuli. This is especially true for tasks like this one, where participants are likely drawing on previous experience, in contrast to the two earlier abstract tasks. Optimizing the GIN model alone to fit human participant performance is computationally expensive, and the IBL and GER models are unable to learn the task quickly enough.

5 Discussion

This research proposes a model that demonstrates the benefits and potential applications of integrating GMs with cognitive modeling. These techniques open new avenues for investigating human learning that were previously inaccessible to cognitive modelers. GAI has had a significant impact across many fields of study, motivating its application in cognitive modeling, especially for decision-making processes. However, before integrating GMs into cognitive models to represent and predict human decision making, it is important to investigate the relative impact that different methods of integration have on different tasks.

The GINGER model proposed in this work demonstrates the integration of GMs with cognitive models of decision making, such as IBL. Our approach demonstrates the accurate prediction of human learning and decision making across three distinct experimental paradigms, directly compared to real human decisions. These experiments encompass a diverse range of stimuli, spanning visual cues and natural language that varied in complexity, from learning abstract rewards to detecting phishing attempts in emails. The application of our GINGER model across these domains resulted in an improvement over traditional cognitive modeling techniques, clearly demonstrating the potential benefits of incorporating GMs into cognitive modeling frameworks.

In addition to our GINGER model, we developed a categorization approach that can be used to compare and relate different approaches to integrating GMs into cognitive modeling of decision making. Prior to the current research, there were many applications of GMs in cognitive modeling, but these were typically developed case by case for a specific learning domain. Here, we compare the integration of GMs in cognitive modeling along six dimensions: action generation, memory generation, stimuli, cognitive model type, generative model type, and training method.

This categorization motivated an ablation study comparing our proposed model with alternative versions that either contained or omitted generative actions and generative memory. Additionally, the three experimental paradigms were chosen to test the remaining categories of our analysis, investigating varied stimulus types, GM types, and training methods. The result is a comparison of model performance that spans many dimensions of our proposed categorization. The first experimental comparison demonstrated faster and more human-like learning from models whose decision predictions are produced directly by GMs (GIN and GINGER). However, this faster learning was observed in a relatively simple task, raising the question of the potential benefits of GM memory formation (GER and GINGER) in more complex environments.

The second experimental comparison extended the analysis of the first experiment by introducing a generalization task that required transfer of learning. This is a useful comparison for our proposed model, as one of the often-cited benefits of applying GMs to cognitive models is improved generalization, raising the question of which method of integrating GMs would be most relevant for improving performance and similarity to human participants in this task. The high generalizability of models that utilized GM memory representations confirmed this expectation, demonstrating the ability of cognitive models that integrate GM representations to reflect human-like generalization.

In the third and final experiment, we investigated how our proposed modeling method handles complex natural language in a phishing identification task. Comparing the performance of models with that of human participants revealed a large difference between categorization accuracy for phishing and ham emails, which was difficult for the models to replicate. Previously, only cognitive models that used GM representations of textual information, such as phishing emails, had been used to predict human-like learning; these results demonstrate that a combination of directly predicting values and GM representations is best for this type of task.

Overall, these model comparison results provide insight into the design of integrations of generative modeling methods with cognitive models. Each of our experiments investigated a different area of modeling human learning and decision making and supported important conclusions about how best to integrate GMs. Although the applications of our model comparison are broad, they do not represent every possible application of GMs to cognitive modeling. As demonstrated by our categorization, there remain stimulus types, generative models, and cognitive models that could be compared. One potential future area of research is the application of multi-modal models and a comparison of their learning with humans engaging in a multi-modal decision task.

While GMs have demonstrated a high degree of usefulness in cognitive modeling, their impact on society at large has been called into question, as noted previously. One potential issue is that a model similar to the one we used to predict how participants respond to phishing emails could be used to improve the quality of phishing email campaigns. This is exacerbated by the potential to use GMs themselves to generate phishing emails. A potential future area of research is investigating how best to mitigate these potential misuses of GMs. One approach is tailoring phishing email education to the individual through a model similar to the one we propose, allowing students to experience phishing emails generated by GMs and learn from them.

Data availability statement

Publicly available datasets were analyzed in this study. This data can be found here: https://osf.io/m6qc4/ .

Ethics statement

The studies involving humans were approved by Carnegie Mellon University Institutional Review Board. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

TM: Formal analysis, Software, Visualization, Writing – original draft, Writing – review & editing. CG: Conceptualization, Funding acquisition, Methodology, Resources, Supervision, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This research was sponsored by the Army Research Office and accomplished under Australia-US MURI Grant Number W911NF-20-S-000. Compute resources and GPT model credits were provided by the Microsoft Accelerate Foundation Models Research grant “Personalized Education with Foundation Models via Cognitive Modeling”.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2024.1387948/full#supplementary-material

1. ^ https://osf.io/m6qc4/

2. ^ https://nivlab.princeton.edu/data

3. ^ https://osf.io/mt4ws/

4. ^ https://osf.io/sp7d6/

Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., et al. (2023). Gpt-4 technical report. arXiv [Preprint] . Available online at: https://arxiv.org/abs/2303.08774

Aggarwal, P., Thakoor, O., Jabbari, S., Cranford, E. A., Lebiere, C., Tambe, M., et al. (2022). Designing effective masking strategies for cyberdefense through human experimentation and cognitive models. Comp. Secur . 117:102671. doi: 10.1016/j.cose.2022.102671

Aher, G. V., Arriaga, R. I., and Kalai, A. T. (2023). “Using large language models to simulate multiple humans and replicate human subject studies,” in International Conference on Machine Learning (New York: PMLR), 337–371.

Anderson, J. R., and Lebiere, C. J. (2014). The Atomic Components of Thought . London: Psychology Press.

Anderson, J. R., Matessa, M., and Lebiere, C. (1997). Act-r: a theory of higher level cognition and its relation to visual attention. Human-Comp. Interact . 12, 439–462. doi: 10.1207/s15327051hci1204_5

Bandi, A., Adapa, P. V. S. R., and Kuchi, Y. E. V. P. K. (2023). The power of generative AI: a review of requirements, models, input-output formats, evaluation metrics, and challenges. Future Internet 15:260. doi: 10.3390/fi15080260

Bates, C., and Jacobs, R. (2019). Efficient data compression leads to categorical bias in perception and perceptual memory. CogSci . 43, 1369–1375.

Bates, C. J., and Jacobs, R. A. (2020). Efficient data compression in perception and perceptual memory. Psychol. Rev . 127:891. doi: 10.1037/rev0000197

Beguš, G. (2020). Generative adversarial phonology: modeling unsupervised phonetic and phonological learning with neural networks. Front. Artif. Intellig . 3:44. doi: 10.3389/frai.2020.00044

Bender, E. M., Gebru, T., McMillan-Major, A., and Shmitchell, S. (2021). “On the dangers of stochastic parrots: Can language models be too big?,” in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (New York, NY: Association for Computing Machinery), 610–623.

Bhui, R., Lai, L., and Gershman, S. J. (2021). Resource-rational decision making. Curr. Opin. Behav. Sci . 41, 15–21. doi: 10.1016/j.cobeha.2021.02.015

Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., et al. (2021). On the opportunities and risks of foundation models. arXiv [Preprint] . Available online at: https://arxiv.org/abs/2108.07258

Brohan, A., Chebotar, Y., Finn, C., Hausman, K., Herzog, A., Ho, D., et al. (2023). “Do as I can, not as I say: Grounding language in robotic affordances,” in Conference on Robot Learning (New York: PMLR).

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., et al. (2020). Language models are few-shot learners. Adv. Neural Inf. Process. Syst . 33, 1877–1901. Available online at: https://arxiv.org/abs/2005.14165

Bugbee, E. H., and Gonzalez, C. (2022). “Making predictions without data: How an instance-based learning model predicts sequential decisions in the balloon analog risk task,” in Proceedings of the Annual Meeting of the Cognitive Science Society (Seattle, WA: Cognitive Science Society), 1–6.

Cao, Y., Li, S., Liu, Y., Yan, Z., Dai, Y., Yu, P. S., et al. (2023). A comprehensive survey of ai-generated content (aigc): A history of generative ai from gan to chatgpt. arXiv [Preprint] . Available online at: https://arxiv.org/abs/2303.04226

Chen, L., Lu, K., Rajeswaran, A., Lee, K., Grover, A., Laskin, M., et al. (2021). Decision transformer: Reinforcement learning via sequence modeling. Adv. Neural Inf. Process. Syst . 34, 15084–15097. Available online at: https://arxiv.org/abs/2106.01345

Chevalier-Boisvert, M., Willems, L., and Pal, S. (2018). Minimalistic Gridworld Environment for OpenAI Gym . Available online at: https://github.com/maximecb/gym-minigrid (accessed August 10, 2023).

Choi, D., Konik, T., Nejati, N., Park, C., and Langley, P. (2007). A believable agent for first-person shooter games. Proc. AAAI Conf. Artif. Intellig. Interact. Digit. Entertainm . 3, 71–73. doi: 10.1609/aiide.v3i1.18787

Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., et al. (2023). Palm: scaling language modeling with pathways. J. Mach. Learn. Res . 24, 1–113. Available online at: https://arxiv.org/abs/2204.02311

Cranford, E. A., Gonzalez, C., Aggarwal, P., Cooney, S., Tambe, M., and Lebiere, C. (2020). Toward personalized deceptive signaling for cyber defense using cognitive models. Top. Cogn. Sci . 12, 992–1011. doi: 10.1111/tops.12513

Cranford, E. A., Lebiere, C., Rajivan, P., Aggarwal, P., and Gonzalez, C. (2019). “Modeling cognitive dynamics in end-user response to phishing emails,” in Proceedings of the 17th ICCM (State College, PA: Applied Cognitive Science Lab).

Cranford, E. A., Singh, K., Aggarwal, P., Lebiere, C., and Gonzalez, C. (2021). “Modeling phishing susceptibility as decisions from experience,” in Proceedings of the 19th Annual Meeting of the ICCM (State College, PA: Applied Cognitive Science Lab), 44–49.

Friston, K. J., Parr, T., Yufik, Y., Sajid, N., Price, C. J., and Holmes, E. (2020). Generative models, linguistic communication and active inference. Neurosci. Biobehav. Rev . 118, 42–64. doi: 10.1016/j.neubiorev.2020.07.005

Gershman, S. J. (2019). The generative adversarial brain. Front. Artif. Intellig . 2:18. doi: 10.3389/frai.2019.00018

Goetschalckx, L., Andonian, A., and Wagemans, J. (2021). Generative adversarial networks unlock new methods for cognitive science. Trends Cogn. Sci . 25, 788–801. doi: 10.1016/j.tics.2021.06.006

Gonzalez, C. (2013). The boundaries of instance-based learning theory for explaining decisions from experience. Prog. Brain Res . 202, 73–98. doi: 10.1016/B978-0-444-62604-2.00005-8

Gonzalez, C. (2023). Building human-like artificial agents: A general cognitive algorithm for emulating human decision-making in dynamic environments. Persp. Psychol. Sci . 2023, 17456916231196766. doi: 10.1177/17456916231196766

Gonzalez, C., and Dutt, V. (2011). Instance-based learning: integrating sampling and repeated decisions from experience. Psychol. Rev . 118:523. doi: 10.1037/a0024558

Gonzalez, C., Lerch, J. F., and Lebiere, C. (2003). Instance-based learning in dynamic decision making. Cogn. Sci . 27, 591–635. doi: 10.1207/s15516709cog2704_2

Griffith, S., Subramanian, K., Scholz, J., Isbell, C. L., and Thomaz, A. L. (2013). Policy shaping: Integrating human feedback with reinforcement learning. Adv. Neural Inf. Process . Syst . 26, 1–9. doi: 10.5555/2999792.2999905

Hedayati, S., O'Donnell, R. E., and Wyble, B. (2022). A model of working memory for latent representations. Nat. Human Behav . 6, 709–719. doi: 10.1038/s41562-021-01264-9

Higgins, I., Chang, L., Langston, V., Hassabis, D., Summerfield, C., Tsao, D., et al. (2021). Unsupervised deep learning identifies semantic disentanglement in single inferotemporal face patch neurons. Nat. Commun . 12:6456. doi: 10.1038/s41467-021-26751-5

Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., et al. (2016). “Beta-vae: Learning basic visual concepts with a constrained variational framework,” in International Conference on Learning Representations , 1–6. Available online at: https://openreview.net/forum?id=Sy2fzU9gl

Higgins, I., Pal, A., Rusu, A., Matthey, L., Burgess, C., Pritzel, A., et al. (2017). “Darla: improving zero-shot transfer in reinforcement learning,” in International Conference on Machine Learning (New York: PMLR), 1480–1490.

Hintzman, D. L. (1984). Minerva 2: a simulation model of human memory. Behav. Res. Methods, Instrum. Comp . 16, 96–101. doi: 10.3758/BF03202365

Hong, J. (2012). The state of phishing attacks. Commun. ACM 55, 74–81. doi: 10.1145/2063176.2063197

Huet, A., Pinquié, R., Véron, P., Mallet, A., and Segonds, F. (2021). Cacda: A knowledge graph for a context-aware cognitive design assistant. Comp. Indust . 125:103377. doi: 10.1016/j.compind.2020.103377

Ivanovic, B., Schmerling, E., Leung, K., and Pavone, M. (2018). “Generative modeling of multimodal multi-human behavior,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (Madrid: IEEE), 3088–3095.

Kenton, J. D. M.-W. C., and Toutanova, L. K. (2019). “Bert: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of NAACL-HLT , 4171–4186. Available online at: https://arxiv.org/abs/1810.04805

Kim, S., Lee, S., Song, J., Kim, J., and Yoon, S. (2018). Flowavenet: a generative flow for raw audio. arXiv [Preprint] . Available online at: https://arxiv.org/abs/1811.02155

Kirk, J. R., Wray, R. E., and Laird, J. E. (2023). “Exploiting language models as a source of knowledge for cognitive agents,” in arXiv (Washington, DC: AAAI Press).

Kirsch, L., Harrison, J., Freeman, C. D., Sohl-Dickstein, J., and Schmidhuber, J. (2023). “Towards general-purpose in-context learning agents,” in NeurIPS 2023 Workshop on Distribution Shifts: New Frontiers with Foundation Models , 1–6. Available online at: https://openreview.net/forum?id=zDTqQVGgzH

Lai, L., and Gershman, S. J. (2021). Policy compression: an information bottleneck in action selection. Psychol. Learn. Motivat . 74, 195–232. doi: 10.1016/bs.plm.2021.02.004

Laird, J. E. (2001). “It knows what you're going to do: adding anticipation to a quakebot,” in Proceedings of the Fifth International Conference on Autonomous Agents (New York, NY: Association for Computing Machinery), 385–392.

Laird, J. E., Lebiere, C., and Rosenbloom, P. S. (2017). A standard model of the mind: Toward a common computational framework across artificial intelligence, cognitive science, neuroscience, and robotics. Ai Magazine 38, 13–26. doi: 10.1609/aimag.v38i4.2744

Laird, J. E., Newell, A., and Rosenbloom, P. S. (1987). Soar: An architecture for general intelligence. Artif. Intell . 33(1):1–64. doi: 10.1016/0004-3702(87)90050-6

Lejarraga, T., Dutt, V., and Gonzalez, C. (2012). Instance-based learning: a general model of repeated binary choice. J. Behav. Decis. Mak . 25, 143–153. doi: 10.1002/bdm.722

Li, B., Zhou, H., He, J., Wang, M., Yang, Y., and Li, L. (2020). On the sentence embeddings from pre-trained language models. arXiv [Preprint] . Available online at: https://arxiv.org/abs/2011.05864

Li, S., Puig, X., Paxton, C., Du, Y., Wang, C., Fan, L., et al. (2022). Pre-trained language models for interactive decision-making. Adv. Neural Inf. Process. Syst . 35, 31199–31212. Available online at: https://arxiv.org/abs/2202.01771

Malloy, T., Du, Y., Fang, F., and Gonzalez, C. (2023). “Generative environment-representation instance-based learning: a cognitive model,” in Proceedings of the 2023 AAAI Fall Symposium on Integrating Cognitive Architectures and Generative Models (Washington, DC: AAAI Press), 1–6.

Malloy, T., Klinger, T., and Sims, C. R. (2022a). “Modeling human reinforcement learning with disentangled visual representations,” in Reinforcement Learning and Decision Making (RLDM ) (Washington, DC: Association for Research in Vision and Ophthalmology).

Malloy, T., and Sims, C. R. (2022). A beta-variational auto-encoder model of human visual representation formation in utility-based learning. J. Vis . 22:3747. doi: 10.1167/jov.22.14.3747

Malloy, T., Sims, C. R., Klinger, T., Riemer, M. D., Liu, M., and Tesauro, G. (2022b). “Learning in factored domains with information-constrained visual representations,” in NeurIPS 2022 Workshop on Information-Theoretic Principles in Cognitive Systems , 1–6. Available online at: https://arxiv.org/abs/2303.17508

McDonald, C., Malloy, T., Nguyen, T. N., and Gonzalez, C. (2023). “Exploring the path from instructions to rewards with large language models in instance-based learning,” in Proceedings of the 2023 AAAI Fall Symposium on Integrating Cognitive Architectures and Generative Models (Washington DC: AAAI Press), 1–6.

Mitsopoulos, K., Baker, L., Lebiere, C., Pirolli, P., Orr, M., and Vardavas, R. (2023a). Masking behaviors in epidemiological networks with cognitively-plausible reinforcement learning. arXiv [Preprint] . Available online at: https://arxiv.org/abs/2312.03301

Mitsopoulos, K., Bose, R., Mather, B., Bhatia, A., Gluck, K., Dorr, B., et al. (2023b). “Psychologically-valid generative agents: A novel approach to agent-based modeling in social sciences,” in Proceedings of the 2023 AAAI Fall Symposium on Integrating Cognitive Architectures and Generative Models (Washington DC: AAAI Press).

Morrison, D., and Gonzalez, C. (2024). PyIBL 5.1.1 Manual . Available online at: http://pyibl.ddmlab.com/ (accessed 18 March, 2024).

Navigli, R., Conia, S., and Ross, B. (2023). Biases in large language models: Origins, inventory and discussion. ACM J. Data Informat. Qual . 15, 1–21. doi: 10.1145/3597307

Nguyen, T. N., and Gonzalez, C. (2022). Theory of mind from observation in cognitive models and humans. Top. Cogn. Sci . 14, 665–686. doi: 10.1111/tops.12553

Nguyen, T. N., Phan, D. N., and Gonzalez, C. (2023). Speedyibl: a comprehensive, precise, and fast implementation of instance-based learning theory. Behav. Res. Methods 55, 1734–1757. doi: 10.3758/s13428-022-01848-x

Niv, Y., Daniel, R., Geana, A., Gershman, S. J., Leong, Y. C., Radulescu, A., et al. (2015). Reinforcement learning in multidimensional environments relies on attention mechanisms. J. Neurosci . 35, 8145–8157. doi: 10.1523/JNEUROSCI.2978-14.2015

Ororbia, A., and Kelly, M. A. (2023). “A neuro-mimetic realization of the common model of cognition via hebbian learning and free energy minimization,” in Proceedings of the 2023 AAAI Fall Symposium on Integrating Cognitive Architectures and Generative Models (Washington: AAAI Press), 1–6.

Ororbia, A., and Kifer, D. (2022). The neural coding framework for learning generative models. Nat. Commun . 13(1):2064. doi: 10.1038/s41467-022-29632-7

Park, J. S., O'Brien, J., Cai, C. J., Morris, M. R., Liang, P., and Bernstein, M. S. (2023). “Generative agents: Interactive simulacra of human behavior,” in Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (New York, NY: Association for Computing Machinery), 1–22.

Radford, A., Narasimhan, K., Salimans, T., and Sutskever, I. (2018). Improving Language Understanding by Generative Pre-Training (Preprint) .

Rao, R. P., and Ballard, D. H. (1999). Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci . 2, 79–87. doi: 10.1038/4580

Reid, M., Yamada, Y., and Gu, S. S. (2022). Can wikipedia help offline reinforcement learning? arXiv [Preprint] .

Ren, H., and Ben-Tzvi, P. (2020). Learning inverse kinematics and dynamics of a robotic manipulator using generative adversarial networks. Rob. Auton. Syst . 124:103386. doi: 10.1016/j.robot.2019.103386

Shi, R., Liu, Y., Ze, Y., Du, S. S., and Xu, H. (2023). “Unleashing the power of pre-trained language models for offline reinforcement learning,” in NeurIPS 2023 Foundation Models for Decision Making Workshop , 1–6. Available online at: https://arxiv.org/abs/2310.20587

Singh, K., Aggarwal, P., Rajivan, P., and Gonzalez, C. (2019). “Training to detect phishing emails: Effects of the frequency of experienced phishing emails,” in Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Los Angeles: SAGE Publications Sage CA).

Singh, K., Aggarwal, P., Rajivan, P., and Gonzalez, C. (2020). “What makes phishing emails hard for humans to detect?,” in Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Los Angeles, CA: SAGE Publications Sage CA), 431–435.

Singh, K., Aggarwal, P., Rajivan, P., and Gonzalez, C. (2023). Cognitive elements of learning and discriminability in anti-phishing training. Comp. Secur . 127:103105. doi: 10.1016/j.cose.2023.103105

Speer, R., Chin, J., and Havasi, C. (2017). “Conceptnet 5.5: an open multilingual graph of general knowledge,” in Proceedings of the AAAI Conference on Artificial Intelligence (Washington, DC: AAAI Press), 1–6.

Sun, R. (2006). “The clarion cognitive architecture: extending cognitive modeling to social simulation,” in Cognition and Multi-Agent Interaction (Cambridge: Cambridge University Press), 79–99.

Swan, G., and Wyble, B. (2014). The binding pool: a model of shared neural resources for distinct items in visual working memory. Attent. Percep. Psychophys . 76, 2136–2157. doi: 10.3758/s13414-014-0633-3

Taniguchi, T., Yamakawa, H., Nagai, T., Doya, K., Sakagami, M., Suzuki, M., et al. (2022). A whole brain probabilistic generative model: Toward realizing cognitive architectures for developmental robots. Neural Networks 150:293–312. doi: 10.1016/j.neunet.2022.02.026

Taylor, M. E., and Stone, P. (2009). Transfer learning for reinforcement learning domains: a survey. J. Mach. Learn. Res . 10:7. doi: 10.5555/1577069.1755839

Wu, T., Zhu, B., Zhang, R., Wen, Z., Ramchandran, K., and Jiao, J. (2023). “Pairwise proximal policy optimization: Harnessing relative feedback for llm alignment,” in NeurIPS 2023 Foundation Models for Decision Making Workshop , 1–6. Available online at: https://arxiv.org/abs/2310.00212

Xu, T., Singh, K., and Rajivan, P. (2022). “Modeling phishing decision using instance based learning and natural language processing,” in HICSS (Manoa, HI: University of Hawaii at Manoa), 1–10.

Ziems, C., Held, W., Shaikh, O., Chen, J., Zhang, Z., and Yang, D. (2023). Can large language models transform computational social science? arXiv [Preprint] . Available online at: https://arxiv.org/abs/2305.03514

Keywords: cognitive modeling, decision making, generative AI, instance based learning, natural language, visual learning

Citation: Malloy T and Gonzalez C (2024) Applying Generative Artificial Intelligence to cognitive models of decision making. Front. Psychol. 15:1387948. doi: 10.3389/fpsyg.2024.1387948

Received: 19 February 2024; Accepted: 12 April 2024; Published: 03 May 2024.

Copyright © 2024 Malloy and Gonzalez. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Tyler Malloy, tylerjmalloy@cmu.edu

Damien Riggs Ph.D.

Transgender

The importance of representation in psychology, the inclusion of lesbian, gay, bisexual, transgender, intersex and queer people..

Posted November 3, 2019

This blog post was written by Damien W. Riggs, Elizabeth Peel and Sonja Ellis.

For over a century, the psy disciplines have sought to grapple with the topics of sex, gender and sexuality diversity. Starting with the work of Freud , and continuing through to the addition of sex, gender, and sexuality diversity into various versions of the Diagnostic and Statistical Manual for Mental Disorders, the psy disciplines have, for better or worse, had a prominent voice in how lesbian, gay, bisexual, transgender , intersex and queer (LGBTIQ) people are understood.

 Used with permission from Cambridge University Press

In psychology specifically, early research largely adopted a pathologizing approach, seeking to demonstrate familial ‘causes’ of homosexuality or gender diversity. Whilst there are notable exceptions to this, such as in the work of Evelyn Hooker and June Hopkins, psychology in the mid 20th Century was a breeding ground for theories that were either less than positive or entirely pathologizing of lesbians and gay men in particular. In this same period of time, psychologists played an increasing role in gatekeeping transgender people’s access to services.

In terms of representation, then, psychology’s early forays into the lives of LGBTIQ people were largely negative and served to enshrine within the public imaginary stereotypes about LGBTIQ people that continue to this day. These include assumptions of promiscuity amongst gay men and bisexual people, the view that assigned sex determines gender, the belief that gender and sexuality diversity can be ‘corrected’ through therapy , and the assumption that homosexuality constitutes a mental disorder.

From the 1980s onwards, however, a new stream of psychology developed, one that took as its central aim the affirmation of LGBTIQ people’s lives. Often (though not always) led by LGBTIQ researchers and clinicians themselves, this affirming strand of psychology challenged the stereotypes outlined above, and advocated for change both within the discipline and within society more broadly. In this same time period, psychological associations formed their own groups, networks and formal structures that aimed to recognize the study of LGBTIQ people’s lives as a distinct field of psychology.

In many ways, the second edition of our textbook Lesbian, Gay, Bisexual, Trans, Intersex and Queer Psychology: An Introduction signals a significant moment in the trajectory of this field of psychology. It serves to highlight how much has been gained since the initial developments of affirming psychological approaches to LGBTIQ people’s lives. It also highlights how much further we have to go, especially with regard to inclusive and affirming representations of the lives of people born with intersex variations, people who have non-binary genders, and queer people.

Central to our book is a focus on representation: how LGBTIQ people’s lives are represented within psychology, how psychology can play an important advocacy role in terms of producing positive and affirming representations of LGBTIQ people, and how psychology itself as a discipline understands its relationship to the field of LGBTIQ psychology. Representation, as we argue, is not simply a matter of more. Rather, it is a matter of better, more plentiful, and more critical representations of LGBTIQ people: representations that challenge the idea that there is one singular LGBTIQ narrative, representations that recognize diversity across the lives of LGBTIQ people, and representations that hold the discipline of psychology to account for its historical (and in some cases ongoing) less than positive representations.

Importantly, and as we suggested in the first edition of our book, a heterosexual and/or cisgender psychologist or researcher can be an ‘LGBTIQ psychologist’. Whilst, as noted above, the field of LGBTIQ psychology has largely been led by LGBTIQ people, this does not limit the field to any one group, and heterosexual and/or cisgender people have certainly made vital contributions to the psychological study of LGBTIQ people’s lives. Indeed, we would suggest that for representation within psychology to be truly inclusive of LGBTIQ people, it requires the voices of all.

In conclusion, we have come a long way within the psy disciplines, and psychology in particular, in terms of the representation of LGBTIQ people. We have, to differing extents, come to understand the harms that have been done, and have sought to be accountable for them. We also know that in some contexts harms continue, and it is the role of the discipline to continue to speak out when injustices occur, particularly those that occur in the name of psychology. As an evidence-based profession, we have a strong foundation from which to counter negative representations, and instead to produce and advocate for representations that take as their central premise the importance of a just world, in which LGBTIQ people have equitable access to wellbeing.

Damien Riggs Ph.D.

Damien Riggs, Ph.D., is a professor of psychology at Flinders University and an Australian Research Council Future Fellow.
