Image Colorization Datasets

The datasets used to evaluate colorization are mostly the ones commonly used in the literature for other tasks, such as detection, classification, and segmentation. The images are first converted to grayscale, and colorization models are then applied to the grayscale versions so that their performance can be analyzed.
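The grayscale conversion step can be sketched as follows. This is a minimal illustration, not the procedure of any particular paper: the BT.601 luma weights and the helper names `to_grayscale` and `make_eval_pair` are assumptions, since the text does not say how the conversion is done.

```python
# Assumed conversion: ITU-R BT.601 luma weights (the source does not specify one).
R_WEIGHT, G_WEIGHT, B_WEIGHT = 0.299, 0.587, 0.114


def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples with values in [0, 1] to rows of gray values."""
    return [
        [R_WEIGHT * r + G_WEIGHT * g + B_WEIGHT * b for (r, g, b) in row]
        for row in rgb_image
    ]


def make_eval_pair(rgb_image):
    """Return (grayscale input, color ground truth) for one dataset image."""
    return to_grayscale(rgb_image), rgb_image


# A tiny 1 x 3 synthetic image standing in for a dataset sample.
sample = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0)]]
gray_input, color_target = make_eval_pair(sample)
```

A colorization model would receive `gray_input`, and its output would be compared against `color_target`.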

COCO-Stuff

COCO-Stuff dataset: The Common Objects in COntext-stuff (COCO-Stuff) dataset is constructed by adding stuff annotations to the original COCO dataset, which annotated things but neglected stuff. COCO-Stuff contains 164k images spanning 172 categories: 80 thing classes, 91 stuff classes, and 1 unlabeled class.

H. Caesar, J. Uijlings, and V. Ferrari, “Coco-stuff: Thing and stuff classes in context,” in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1209–1218.

PASCAL VOC

PASCAL VOC dataset: The PASCAL Visual Object Classes (PASCAL VOC) dataset has more than 11,000 images divided into 20 object categories.

M. Everingham, S. A. Eslami, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes challenge: A retrospective,” International journal of computer vision, vol. 111, no. 1, pp. 98–136, 2015.

CIFAR

CIFAR datasets: CIFAR-10 and CIFAR-100 are two reliably labelled subsets of the 80 Million Tiny Images dataset. CIFAR-10 comprises 60k images distributed equally over 10 mutually exclusive categories, with 6k images in each category. CIFAR-100 has the same number of images distributed over 100 categories, with 600 images in each category. Every image in both subsets is 32×32 pixels. CIFAR-100 uses two-level labelling: at the higher level there are 20 superclasses, each of which is further divided into five subclasses. In both datasets, 50k images form the training set and 10k images form the test set.

A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” 2009.
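The CIFAR counts above can be checked with a few lines of arithmetic. The dictionary layout and the helper name `total_images` are illustrative only; the 50k/10k split is the standard one for both datasets.

```python
# Per-dataset figures as stated in the description above.
CIFAR10 = {"classes": 10, "images_per_class": 6000}
CIFAR100 = {
    "classes": 100,
    "images_per_class": 600,
    "superclasses": 20,
    "subclasses_per_superclass": 5,
}

# Standard split shared by CIFAR-10 and CIFAR-100.
TRAIN_IMAGES = 50_000
TEST_IMAGES = 10_000


def total_images(spec):
    """Total image count implied by a class count and a per-class image count."""
    return spec["classes"] * spec["images_per_class"]


cifar10_total = total_images(CIFAR10)
cifar100_total = total_images(CIFAR100)
cifar100_fine_classes = (
    CIFAR100["superclasses"] * CIFAR100["subclasses_per_superclass"]
)
```

Both totals equal the sum of the training and test sets, and the 20 superclasses with five subclasses each account for all 100 CIFAR-100 categories.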

ImageNet

ImageNet ILSVRC2012: This dataset contains 1.2 million high-resolution training images spanning 1k categories, plus 50k images that form the hold-out validation set. Images are rescaled to 128 × 128 pixels.

J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.

Palette-and-Text dataset

Palette-and-Text dataset: This dataset is constructed by modifying data collected from colorhex.com, where users upload their own color palettes with label names of their choice. The authors first collected 47,665 palette-text pairs and removed non-alphanumeric and non-English words from the collection. After removing text-palette pairs that lack a semantic relationship, the final curated dataset contains 10,183 textual phrases with their corresponding five-color palettes.

H. Bahng, S. Yoo, W. Cho, D. Keetae Park, Z. Wu, X. Ma, and J. Choo, “Coloring with words: Guiding image colorization through text-based palette generation,” in European Conference on Computer Vision (ECCV), 2018, pp. 431–447.
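The word-level cleaning described for the Palette-and-Text dataset might look roughly like the sketch below. It is not the authors' code: the function names are hypothetical, an ASCII-alphanumeric check is used as a crude stand-in for removing "non-alphanumeric and non-English words", and the later semantic-relationship filtering is not attempted.

```python
import re

# Assumption: a word is kept only if it consists solely of ASCII letters and digits.
WORD_PATTERN = re.compile(r"[A-Za-z0-9]+")


def clean_label(label):
    """Lowercase a palette label and drop words that fail the alphanumeric check."""
    words = [w for w in label.lower().split() if WORD_PATTERN.fullmatch(w)]
    return " ".join(words)


def filter_pairs(pairs):
    """Keep (text, palette) pairs with non-empty cleaned text and exactly five colors."""
    cleaned = []
    for label, palette in pairs:
        text = clean_label(label)
        if text and len(palette) == 5:
            cleaned.append((text, palette))
    return cleaned


# Made-up examples; none of these come from the real dataset.
five_colors = ["#a0522d", "#d2691e", "#ff8c00", "#ffd700", "#8b0000"]
raw_pairs = [
    ("Autumn Leaves!!", five_colors),            # "leaves!!" is dropped
    ("★★★", five_colors),                        # no usable words, pair dropped
    ("sea breeze", ["#00ced1", "#e0ffff"]),      # not five colors, pair dropped
]
curated = filter_pairs(raw_pairs)
```

Here only the first pair survives, with its label reduced to "autumn".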

New: NCD Dataset

https://github.com/saeed-anwar/ColorSurvey

Evaluating colorization on datasets that were collected for other tasks is an unrealistic setting, and the authors aim to remove it by collecting images that are true to their colors. For example, a carrot will be orange in most images, and bananas will be either greenish or yellowish. They have collected 723 images from the internet, distributed over 20 categories. Each image shows an object on a white background. They name the dataset the Natural-Color Dataset (NCD). The following figures show representative test images for each category of the NCD.
