What makes a picture memorable?
‘Mental sports’ have become a new trend in self-improvement, with video games designed to improve mental fitness. At the World Memory Championships, athletes compete to recall massive amounts of information; contestants must memorize and recall sequences of abstract images and the names of people whose faces are shown in photographs. While these tasks might seem challenging, our research suggests that images that possess certain properties are memorable. Our findings can explain why we have all had some images stuck in our minds, but ignored or quickly forgotten others.
Although image memorability seems subjective and hard to quantify, our recent work [1–5] shows that it is not an inexplicable phenomenon. We found that visual memorability is largely intrinsic to the image and reproducible across a diverse population. This means that despite varied experiences, individuals tend to remember and forget the same images. Using experimental data detailing the types of images people remember or quickly forget, we developed an algorithm that automatically predicts whether an image will be memorable.
To determine the intrinsic features that make an image memorable, we first asked 665 individuals to participate in a computer memory game. During each level of the game, for up to 30 levels, participants viewed a stream of images and pressed the space bar whenever they saw one of those images repeated in a subsequent sequence. In total, the image database contained 2222 repeated images and 8220 unrepeated images that included faces, interior-design photos, nature scenes, streetscapes, and others. We found that photographs with people or central objects were memorable, whereas landscapes, which one might expect to be memorable, were among the most forgettable (see Figure 1) [1].
Next, we assigned a ‘memorability score’ to each image, defined as the percentage of correct detections by participants in the study. On average, 78 participants scored each image. We then investigated the features of the more memorable images, including color, object statistics (e.g., number of objects or amount of space occupied by objects in the image), object semantics (e.g., what type of object appeared in an image, such as an animal or car), and scene semantics (e.g., the place the image represented, such as a kitchen or landscape). Using computer vision techniques [1], we developed an image-ranking algorithm to automatically predict the memorability of images. To do this, we trained a support vector regressor to map from features to memorability scores, using only features algorithmically extracted from the images. The algorithm learned from the memorability scores calculated from the memory game. We used half of the images in the database to train the algorithm and tested its performance on the remaining half. The algorithm correctly identified images with people as most memorable, indoor scenes and large objects as slightly less memorable, and outdoor landscapes as the least memorable [1]. We also found that atypicality and aesthetic beauty did not explain much of the variation we observed in scene memorability [2]. For instance, landscapes, despite being beautiful, were often forgotten, whereas generic photos of social scenes (for example, a dinner party, subway car, or office) were almost always remembered.
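The scoring and the half-and-half split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names, the (image, detected) response format, and the random seed are all hypothetical, and the feature extraction and support vector regressor are left out.

```python
import random

def memorability_scores(responses):
    """Compute a memorability score per image.

    `responses` is an assumed format: an iterable of (image_id, detected)
    pairs, one per participant who was shown the repeat of that image.
    The score is the percentage of those repeats that were detected.
    """
    hits, views = {}, {}
    for image_id, detected in responses:
        views[image_id] = views.get(image_id, 0) + 1
        hits[image_id] = hits.get(image_id, 0) + (1 if detected else 0)
    return {img: 100.0 * hits[img] / views[img] for img in views}

def split_half(image_ids, seed=0):
    """Randomly split image ids into a training half and a test half."""
    ids = sorted(image_ids)            # fixed order so the seed is reproducible
    random.Random(seed).shuffle(ids)   # hypothetical seed, for illustration only
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]

# Toy data: three participants saw the repeat of "a", two saw the repeat of "b".
responses = [("a", True), ("a", True), ("a", False), ("b", False), ("b", True)]
scores = memorability_scores(responses)
train_ids, test_ids = split_half(["a", "b", "c", "d", "e"])
```

A regressor would then be fitted on the features and scores of the images in `train_ids` and evaluated on those in `test_ids`.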
We have furthered this work by developing a framework for predicting image memorability that accounts for how the memorability of image regions and of different types of features fades over time [3]. Research on human visual memory has shown that observers typically remember visual details attached to objects that have a specific semantic label or distinctive interpretation [6]. For example, observers remember different types of cars by tagging each car with a different brand name, but often confuse different types of apples that differ in color [7]. This suggests that different features, objects, and regions in an image might have different effects on memorability. To examine this idea, we developed a method for creating automated memorability maps that display which local information in an image is memorable and which is forgettable [3].
Predicting image memorability lends itself to a wide variety of applications. We live in an age of data deluge, and memorability prediction could provide a method for summarizing and condensing the onslaught of visual data we encounter. For example, a photo album could be summarized using a few memorable photographs that convey the overall story. In education, textbook diagrams could be created to stick in students' minds, teachers could select memorable examples to illustrate concepts, and memorable cartoons could be used as mnemonic aids to make learning easier. Memorability could also find applications in user-interface design. For example, memorable icons could clarify a messy desktop, and mnemonic labels could be attached to pill containers or entryways in retirement homes. In addition, understanding memorability might lead to intelligent systems that preferentially store information based on its memorability, making sure to prioritize important information that humans will likely forget.
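As a rough illustration of the photo-album idea, the sketch below keeps the few photographs with the highest predicted memorability. It assumes the scores have already been produced by some predictor; the function name, file names, and score values are hypothetical and chosen only for the example.

```python
import heapq

def summarize_album(predicted_scores, k=3):
    """Return the k photo names with the highest predicted memorability.

    `predicted_scores` maps a photo name to a predicted score
    (an assumed input; higher means more memorable).
    """
    return heapq.nlargest(k, predicted_scores, key=predicted_scores.get)

# Hypothetical album with made-up scores.
album = {
    "beach.jpg": 41.0,
    "party.jpg": 88.5,
    "sunset.jpg": 35.2,
    "portrait.jpg": 92.0,
}
summary = summarize_album(album, k=2)
```

The same ranking, reversed, would pick out the items people are most likely to forget, which is what a system that prioritizes forgettable information would need.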
Memorability research could be especially applicable within the domain of face memorability [4]. Because we are social creatures, faces are a key part of our lives, and we often struggle to make a strong first impression and to be memorable. Indeed, in future work, we will be looking into algorithms that enable us to modify a portrait in subtle ways to enhance or reduce its memorability, while maintaining other facial traits like identity, attractiveness, and facial expression. Perhaps within the next few years smartphone applications will be developed that can select the most memorable photograph for a profile picture or that can help you apply makeup to boost your memorability. Additionally, therapeutic technologies could be developed that train people to focus on key memorability-determining facial features, to help those with social processing and memory-related disorders such as autism, prosopagnosia, or Alzheimer's disease.
A common factor across disciplines, memorability represents a fairly general quantification of the utility of visual information. Memorability varies from image to image, yet remains largely constant across multiple people viewing the same picture [1, 2, 4, 5]. We have found that we can predict whether people will remember an image based on its components [1, 3]. In forthcoming work, we are extending the paradigm beyond photographs; we are investigating the intrinsic memorability of data visualizations, artwork, and even English words. With this base understanding of memorability in place, our work might encourage machine vision and artificial intelligence researchers to consider not only what the world is about, but what humans consider meaningful: what they remember.
The authors thank the National Science Foundation (Grant No. 1016862), Google, and Xerox for partly supporting this research.
Aude Oliva is a principal research scientist in the Computer Science and Artificial Intelligence Laboratory at MIT. Her research lies at the interface between human perception, cognition, neuroscience, and computer vision. She is the recipient of a National Science Foundation CAREER Award.
Phillip Isola is a graduate student in the Department of Brain and Cognitive Sciences at MIT. He works on human and computer vision, and is the recipient of a National Science Foundation Graduate Research Fellowship.
Aditya Khosla is a graduate student in the Electrical Engineering and Computer Science Department at MIT and is interested in computer vision and machine learning. He is the recipient of the Facebook Fellowship 2013–2014.
Wilma A. Bainbridge is a graduate student in the Department of Brain and Cognitive Sciences at MIT working on the human neuroscience of perception and memory. She is the recipient of a National Defense Science and Engineering Graduate Fellowship.