80 Million Tiny Images
Antonio Torralba, Rob Fergus, William T. Freeman

Visual dictionary
Click on top of the map to visualize the images in that region of the visual dictionary.

We present a visualization of all the nouns in the English language arranged by semantic meaning. Each of the tiles in the mosaic is an arithmetic average of images relating to one of 53,464 nouns. The images for each word were obtained using Google's Image Search and other engines. A total of 7,527,697 images were used, each tile being the average of about 140 images. The average reveals the dominant visual characteristics of each word. For some words, the average turns out to be a recognizable image; for others, the average is a colored blob.

The list of nouns was obtained from WordNet, a database compiled by lexicographers which records the semantic relationships between words. Using this database, we extract a tree-structured semantic hierarchy which we use to arrange tiles within the poster. We tessellate the poster using the hierarchy so that the proximity of two tiles is given by their semantic distance. Thus the poster explores the relationship between visual and semantic similarity. For a large part of our language the two are closely correlated, as shown by the extent of visual clustering within the poster. The large-scale groupings correspond to broad categories such as plants or people. Within the plant cluster, for example, tighter semantic groupings are visible, such as flowers or trees. In turn, each of these clusters contains further groupings, all the way down to individual, highly specific nouns. The averaging within each tile removes the variation between images of a given word, enhancing the similarity between neighbors.

By clicking on top of the map, you will see the word corresponding to that location, the average image, and the first 16 images returned by the online image search tools.
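The per-tile averaging described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to build the poster: the function name `average_tile` and the representation of an image as a flat list of 8-bit channel values are assumptions made for the example.

```python
def average_tile(images):
    """Return the element-wise arithmetic mean of equally sized images.

    Each image is assumed to be a flat list of 8-bit channel values
    (for a 32x32 RGB thumbnail, 32 * 32 * 3 = 3072 integers).
    """
    if not images:
        raise ValueError("need at least one image to average")
    length = len(images[0])
    if any(len(img) != length for img in images):
        raise ValueError("all images must have the same size")
    n = len(images)
    # zip(*images) groups the values found at the same position in every image.
    return [round(sum(values) / n) for values in zip(*images)]


if __name__ == "__main__":
    # Two tiny single-pixel RGB "images": one dark, one bright.
    dark = [0, 0, 0]
    bright = [200, 100, 50]
    print(average_tile([dark, bright]))
```

When the images for a word share a dominant layout or color, those features survive the mean; when they do not, the result tends toward a uniform blob. For real image data an array library would be the more practical choice.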

Currently, computers have difficulty recognizing objects in images. While practical solutions exist for a few simple classes such as human faces or cars, the more general problem of recognizing all the different classes of objects in the world (e.g. guitars, bottles, telephones) remains unsolved. Computer vision researchers are currently investigating methods that can recognize and localize thousands of different object categories in complex scenes. A key component of these algorithms is the data used to train the computer's model of each object. Current approaches use collections of images gathered by hand. Our research explores how the billions of images available on the Internet can be used to train models for object recognition. With overwhelming amounts of data, many problems can be tackled with simple algorithms. We gathered 79 million images from the web. We are using this massive dataset to train a computer to recognize objects within an image and to understand the scenes depicted in photographs.

Help us to train computers to see

You can help us to get better training data for computer vision algorithms by labeling some of the images.

To label an image, click on the thumbnail for a word to open the LabelMe tool, which lets you select objects in images and describe what they are. We will use that information to train computers to recognize objects. You can annotate an object by clicking around its boundary. This will draw the object boundary as shown in the example below:

We will store the data that you provide and we will make it available to the research community.


Download
The content made available here is for non-commercial purposes only. The images provided here are tiny thumbnails (32x32 pixels).

The visual dictionary poster (3.7 Mbytes). This poster is a high-resolution version of the visual dictionary. In exchange, you can help us build a large collection of annotated images by visiting LabelMe and annotating several objects. The LabelMe database is used for research in computer vision as we try to teach computers to recognize everyday objects.

1.5 million tiny images (tar file, 3.5 Gbytes). If you use this dataset, we ask you to cite our Tech report. This file contains around 1.5 million images arranged by word. For each word there are about 30 images. These are the first images returned by the online search tools and are therefore the most reliable (around 40% of them are correct). We removed duplicate images within each word but not across words. We also removed images in which more than 20% of the pixels were white; those correspond mostly to graphs and scanned documents.
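The two cleaning steps mentioned above (dropping images with more than 20% white pixels and removing duplicates within a word) could look roughly like the following sketch. The page does not say how "white" or "duplicate" were defined, so this example assumes that a white pixel is exactly (255, 255, 255) and that duplicates are byte-identical images; the names `white_fraction` and `clean_word_images` are hypothetical.

```python
import hashlib

WHITE = (255, 255, 255)  # assumption: only pure white counts as "white"


def white_fraction(pixels):
    """Fraction of pixels in a list of (r, g, b) tuples that are pure white."""
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if p == WHITE) / len(pixels)


def clean_word_images(images, max_white=0.20):
    """Filter the images collected for ONE word.

    Drops images whose white fraction exceeds max_white, then drops
    exact duplicates (assumption: byte-identical pixel data).
    Duplicates across different words are not considered, since each
    word's image list is cleaned independently.
    """
    seen = set()
    kept = []
    for pixels in images:
        if white_fraction(pixels) > max_white:
            continue  # likely a graph or scanned document
        digest = hashlib.sha1(bytes(c for p in pixels for c in p)).hexdigest()
        if digest in seen:
            continue  # already kept an identical image for this word
        seen.add(digest)
        kept.append(pixels)
    return kept


if __name__ == "__main__":
    blank_page = [WHITE] * 4
    dark = [(0, 0, 0)] * 4
    kept = clean_word_images([blank_page, dark, dark])
    print(len(kept))
```

Near-duplicates (resized or recompressed copies of the same picture) would pass an exact-hash check like this one, so a real pipeline would likely need a looser similarity test.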


Publications

Small codes and large databases for recognition
A. Torralba, R. Fergus, Y. Weiss.
IEEE Computer Vision and Pattern Recognition, June 2008.

80 million tiny images: a large dataset for non-parametric object and scene recognition
A. Torralba, R. Fergus, W. T. Freeman
Submitted to PAMI, October 2007.
To cite this work, please reference our Tech report.

Object Recognition by Scene Alignment
B. C. Russell, A. Torralba, C. Liu, R. Fergus, W. T. Freeman.
Advances in Neural Information Processing Systems, 2007.

LabelMe: a database and web-based tool for image annotation
B. C. Russell, A. Torralba, K. Murphy, W. T. Freeman
International Journal of Computer Vision, 2007.