Person Re-ID Datasets

To evaluate a PReID method reliably, several factors must be taken into account. The task is complicated by occlusions (apparent, for example, in the i-LIDS dataset) and by illumination variation (common to most datasets). Segmenting foreground from background to isolate the person’s body is a further challenge, although some datasets (e.g. VIPeR, ETHZ, and CAVIAR) already provide the segmented region of interest, i.e. the person’s body. Several datasets have been prepared for evaluating re-identification, but researchers mostly rely on a few well-known benchmarks such as VIPeR, CUHK01, and CUHK03.

VIPeR is considered the most challenging of these because of the difficulty of its individual images. VIPeR, CAVIAR, and PRID are used to evaluate performance when only two fixed camera views are available.
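Recognition rate on two-view benchmarks of this kind is commonly reported as a cumulative matching characteristic (CMC) curve: the fraction of probe images whose true match appears among the top k gallery candidates. The sketch below is only an illustration under assumed conditions (single-shot setting, exactly one gallery image per identity, a precomputed distance matrix). The function name `cmc_curve` and the toy distances are hypothetical and are not taken from any of the papers cited in this post.

```python
def cmc_curve(dist, probe_ids, gallery_ids, max_rank=5):
    """Return CMC values for ranks 1..max_rank.

    dist[i][j] is the distance between probe i and gallery image j
    (smaller means more similar). Assumes every probe identity appears
    exactly once in the gallery (single-shot setting).
    """
    hits = [0] * max_rank
    for i, pid in enumerate(probe_ids):
        # Sort gallery indices by increasing distance to probe i.
        order = sorted(range(len(gallery_ids)), key=lambda j: dist[i][j])
        # Zero-based position of the correct identity in the ranking.
        rank = next(r for r, j in enumerate(order) if gallery_ids[j] == pid)
        # A match at position `rank` counts for every cutoff above it.
        for k in range(rank, max_rank):
            hits[k] += 1
    return [h / len(probe_ids) for h in hits]


# Toy example: 3 probes (camera A) against 3 gallery images (camera B).
probe_ids = [1, 2, 3]
gallery_ids = [1, 2, 3]
dist = [
    [0.2, 0.9, 0.7],  # probe 1: correct match ranked first
    [0.4, 0.5, 0.3],  # probe 2: correct match ranked third
    [0.8, 0.6, 0.1],  # probe 3: correct match ranked first
]
cmc = cmc_curve(dist, probe_ids, gallery_ids, max_rank=3)
print(cmc)  # rank-1, rank-2, rank-3 matching rates
```

In this toy case two of the three probes are matched at rank 1, and all three are matched by rank 3. Real evaluations usually average such curves over several random train/test splits.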

VIPeR

It consists of two images each of 632 individuals, taken from two camera views with pose and illumination changes. It is one of the most challenging and widely used datasets for the PReID task. The images are cropped and scaled to 128 × 48 pixels.

D. Gray, H. Tao, Viewpoint invariant pedestrian recognition with an ensemble of localized features, in: European conference on computer vision, Springer, 2008, pp. 262–275.

i-LIDS

It contains 476 images of 119 pedestrians taken in an airport hall by non-overlapping cameras, with pose and lighting variations and strong occlusions. Each pedestrian has at least 2 images and 4 on average.

H. O. S. D. Branch, Imagery library for intelligent detection systems (i-LIDS), in: Crime and Security, 2006. The Institution of Engineering and Technology Conference on, IET, 2006, pp. 445–448.

ETHZ

It contains three video sequences of a crowded street captured by two moving cameras; the images exhibit considerable illumination changes, scale variations, and occlusions. The images vary in size and can be resized to a common width as required. Each sequence provides multiple images per individual: sequences 1, 2, and 3 contain 83, 35, and 28 pedestrians, respectively.

A. Ess, B. Leibe, L. Van Gool, Depth and appearance for mobile scene analysis, in: Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, IEEE, 2007, pp. 1–8.

CAVIAR

It contains 72 persons across two views: 50 persons appear in both views, while 22 appear in only one. Each person has 5 images per view, with appearance variations caused by changes in resolution, lighting, occlusion, and pose.

D. S. Cheng, M. Cristani, M. Stoppa, L. Bazzani, V. Murino, Custom pictorial structures for reidentification, in: BMVC, 2011, p. 6.

CUHK

This family of datasets is provided by the Chinese University of Hong Kong. It was gathered specifically for the person re-identification task and includes three partitions, each with its own setup.

CUHK01 includes 1,942 images of 971 pedestrians. Each pedestrian has only two images, captured in two disjoint camera views. Camera B mainly includes frontal and back views, while camera A has more variation in viewpoint and pose.

CUHK02 contains 1,816 individuals captured by five pairs of camera views (P1-P5, ten camera views in total). The pairs include 971, 306, 107, 193, and 239 individuals, respectively. Each individual has two images in each camera view. CUHK02 is used to evaluate performance when the camera views at test time differ from those seen in training.

Finally, CUHK03 includes 13,164 images of 1,360 pedestrians captured with six surveillance cameras. Each identity is observed by two disjoint camera views and has an average of 4.8 images in each view. All manually cropped pedestrian images exhibit illumination changes, misalignment, occlusions, and missing body parts.

W. Li, R. Zhao, X. Wang, Human reidentification with transferred metric learning, in: Asian Conference on Computer Vision, Springer, 2012, pp. 31–44.

W. Li, X. Wang, Locally aligned feature transforms across views, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 3594–3601.

W. Li, R. Zhao, T. Xiao, X. Wang, Deepreid: Deep filter pairing neural network for person re-identification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 152–159.
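The CUHK figures quoted above can be cross-checked with simple arithmetic. The short sketch below only re-uses numbers stated in this post; the variable names are illustrative.

```python
# Individuals per camera pair in CUHK02, as quoted in the text.
cuhk02_pairs = {"P1": 971, "P2": 306, "P3": 107, "P4": 193, "P5": 239}
cuhk02_total = sum(cuhk02_pairs.values())
print(cuhk02_total)  # compare with the stated 1,816 individuals

# CUHK03: total images, identities, and two camera views per identity.
cuhk03_images = 13164
cuhk03_identities = 1360
views_per_identity = 2
per_view = cuhk03_images / (cuhk03_identities * views_per_identity)
print(round(per_view, 1))  # compare with the stated average of 4.8
```

Both checks agree with the text: the five pair sizes add up to the CUHK02 total, and the CUHK03 image count is consistent with roughly 4.8 images per identity per view.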

PRID

This dataset is designed specifically for PReID, with a focus on the single-shot scenario. It contains two image sets of 385 and 749 persons, captured by camera A and camera B, respectively. The two sets have 200 persons in common.

M. Hirzer, C. Beleznai, P. M. Roth, H. Bischof, Person re-identification by descriptive and discriminative classification, in: Image Analysis, Springer, 2011, pp. 91–102.
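The PRID counts quoted above imply how many persons can actually be matched across the two cameras. The sketch below is plain set arithmetic on those three numbers; the variable names are illustrative, and the reading that single-camera persons have no cross-view match is an inference from the counts rather than a statement from the cited paper.

```python
persons_cam_a = 385   # persons seen by camera A
persons_cam_b = 749   # persons seen by camera B
shared = 200          # persons seen by both cameras

only_a = persons_cam_a - shared                      # seen by camera A only
only_b = persons_cam_b - shared                      # seen by camera B only
distinct = persons_cam_a + persons_cam_b - shared    # distinct persons overall

print(only_a, only_b, distinct)
```

Under this reading, only the 200 shared persons have a true match in the other view; the remaining persons can still appear in a gallery, where they make retrieval harder.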

WARD

The dataset has 4,786 images of 70 persons acquired in a real surveillance scenario by three non-overlapping cameras, with large variations in illumination, resolution, and pose.

N. Martinel, C. Micheloni, Re-identify people in wide area camera network, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on, IEEE, 2012, pp. 31–36.

RAiD

Re-identification Across indoor-outdoor Dataset (RAiD) has 6,920 bounding boxes of 43 identities captured by 4 cameras. The first two cameras are indoor and the remaining two are outdoor, so the images exhibit very large illumination variations between the indoor and outdoor settings.

A. Das, A. Chakraborty, A. K. Roy-Chowdhury, Consistent re-identification in a camera network, in: European Conference on Computer Vision, Springer, 2014, pp. 330–345.

Market-1501

This is a large PReID dataset that contains 32,668 fully annotated boxes of 1,501 pedestrians. Each person is captured by at most six cameras, and each person box is cropped by the Deformable Part Model (DPM) detector.

L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, Q. Tian, Scalable person re-identification: A benchmark, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1116–1124.

MARS

It is another large, sequence-based PReID dataset. It contains 1,261 identities, each captured by at least two cameras, and consists of 20,478 tracklets and 1,191,003 bounding boxes.

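L. Zheng, Z. Bie, Y. Sun, J. Wang, C. Su, S. Wang, Q. Tian, MARS: A video benchmark for large-scale person re-identification, in: European Conference on Computer Vision, Springer, 2016.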

DukeMTMC

This dataset contains 36,411 manually cropped images of 1,812 persons captured by 8 outdoor cameras. It also provides additional information such as full frames, frame-level ground truth, and calibration information.

E. Ristani, F. Solera, R. Zou, R. Cucchiara, C. Tomasi, Performance measures and a data set for multitarget, multi-camera tracking, in: European Conference on Computer Vision workshop on Benchmarking Multi-Target Tracking, 2016.

MSMT

This is the most recent dataset covered here and the largest in number of identities. It consists of 126,441 images of 4,101 individuals acquired from 12 outdoor and 3 indoor cameras, with strong illumination changes and pose and scale variations.

L. Wei, S. Zhang, W. Gao, Q. Tian, Person transfer GAN to bridge domain gap for person re-identification, in: Computer Vision and Pattern Recognition, IEEE International Conference on, 2018.

This post is mainly based on https://arxiv.org/abs/1807.05284.