Top Cited Papers: CVPR 2010

Curated Papers:

What is an object?

We present a generic objectness measure, quantifying how likely it is for an image window to contain an object of any class. We explicitly train it to distinguish objects with a well-defined boundary in space, such as cows and telephones, from amorphous background elements, such as grass and road. The measure combines in a Bayesian framework several image cues measuring characteristics of objects, such as appearing different from their surroundings and having a closed boundary. This includes an innovative cue measuring the closed boundary characteristic. In experiments on the challenging PASCAL VOC 07 dataset, we show this new cue to outperform a state-of-the-art saliency measure, and the combined measure to perform better than any cue alone. Finally, we show how to sample windows from an image according to their objectness distribution and give an algorithm to employ them as location priors for modern class-specific object detectors. In experiments on PASCAL VOC 07 we show this greatly reduces the number of windows evaluated by class-specific object detectors.
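As a rough illustration of combining cues in a Bayesian framework and then sampling windows by their objectness, the sketch below assumes the cues are conditionally independent given the object/background label. That is a naive Bayes simplification the abstract does not spell out. The function names, window labels, and likelihood values are all hypothetical and chosen only for the example.

```python
import random


def objectness_posterior(cue_likelihoods, prior_obj=0.5):
    """Posterior p(object | cues) under an assumed naive Bayes model.

    cue_likelihoods: list of (p(cue | object), p(cue | background)) pairs,
    one pair per cue. The values are illustrative, not learned.
    """
    p_obj, p_bg = prior_obj, 1.0 - prior_obj
    for l_obj, l_bg in cue_likelihoods:
        p_obj *= l_obj
        p_bg *= l_bg
    return p_obj / (p_obj + p_bg)


def sample_windows(windows, scores, n, seed=0):
    """Draw n windows with probability proportional to their scores."""
    rng = random.Random(seed)
    return rng.choices(windows, weights=scores, k=n)


# Two toy windows, each described by two cues (made-up likelihoods).
windows = ["window_a", "window_b"]
cues = {
    "window_a": [(0.8, 0.2), (0.6, 0.4)],  # both cues favour "object"
    "window_b": [(0.2, 0.8), (0.4, 0.6)],  # both cues favour "background"
}
scores = [objectness_posterior(cues[w]) for w in windows]
samples = sample_windows(windows, scores, n=10)
```

Windows with a higher posterior are drawn more often, so a downstream class-specific detector would evaluate far fewer low-objectness windows than an exhaustive sliding-window search.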

New features and insights for pedestrian detection

Despite impressive progress in people detection, the performance on challenging datasets like Caltech Pedestrians or TUD-Brussels is still unsatisfactory. In this work we show that motion features derived from optic flow yield substantial improvements on image sequences, if implemented correctly – even in the case of low-quality video and consequently degraded flow fields. Furthermore, we introduce a new feature, self-similarity on color channels, which consistently improves detection performance both for static images and for video sequences, across different datasets. In combination with HOG, these two features outperform the state-of-the-art by up to 20%. Finally, we report two insights concerning detector evaluations, which apply to classifier-based object detection in general. First, we show that a commonly under-estimated detail of training, the number of bootstrapping rounds, has a drastic influence on the relative (and absolute) performance of different feature/classifier combinations. Second, we discuss important intricacies of detector evaluation and show that current benchmarking protocols lack crucial details, which can distort evaluations.

Learning mid-level features for recognition

Many successful models for scene or object recognition transform low-level descriptors (such as Gabor filter responses, or SIFT descriptors) into richer representations of intermediate complexity. This process can often be broken down into two steps: (1) a coding step, which performs a pointwise transformation of the descriptors into a representation better adapted to the task, and (2) a pooling step, which summarizes the coded features over larger neighborhoods. Several combinations of coding and pooling schemes have been proposed in the literature. The goal of this paper is threefold. We seek to establish the relative importance of each step of mid-level feature extraction through a comprehensive cross evaluation of several types of coding modules (hard and soft vector quantization, sparse coding) and pooling schemes (by taking the average, or the maximum), which obtains state-of-the-art performance or better on several recognition benchmarks. We show how to improve the best performing coding scheme by learning a supervised discriminative dictionary for sparse coding. We provide theoretical and empirical insight into the remarkable performance of max pooling. By teasing apart components shared by modern mid-level feature extractors, our approach aims to facilitate the design of better recognition architectures.
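The two-step coding and pooling pipeline described above can be sketched on toy data as follows. This is a minimal illustration rather than the paper's implementation: the codebook, descriptors, function names, and the `beta` softness parameter are hypothetical, and sparse coding is omitted for brevity.

```python
import math


def hard_vq(descriptor, codebook):
    """Hard vector quantization: one-hot code for the nearest codeword."""
    dists = [math.dist(descriptor, c) for c in codebook]
    nearest = dists.index(min(dists))
    return [1.0 if k == nearest else 0.0 for k in range(len(codebook))]


def soft_vq(descriptor, codebook, beta=1.0):
    """Soft vector quantization: softmax over negative squared distances."""
    logits = [-beta * math.dist(descriptor, c) ** 2 for c in codebook]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(v - m) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def average_pool(codes):
    """Pooling by the mean of each code dimension over the neighborhood."""
    n = len(codes)
    return [sum(col) / n for col in zip(*codes)]


def max_pool(codes):
    """Pooling by the maximum of each code dimension over the neighborhood."""
    return [max(col) for col in zip(*codes)]


# Toy 2-D descriptors and a 3-word codebook (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (0.2, 0.1), (0.0, 0.8)]

hard_codes = [hard_vq(d, codebook) for d in descriptors]
soft_codes = [soft_vq(d, codebook) for d in descriptors]
avg_feature = average_pool(hard_codes)  # normalized histogram of codewords
max_feature = max_pool(hard_codes)      # 1.0 wherever a codeword occurs
```

With hard quantization, average pooling yields a normalized bag-of-words histogram, while max pooling only records whether each codeword appears at least once in the neighborhood.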

Secrets of optical flow estimation and their principles

The accuracy of optical flow estimation algorithms has been improving steadily as evidenced by results on the Middlebury optical flow benchmark. The typical formulation, however, has changed little since the work of Horn and Schunck. We attempt to uncover what has made recent advances possible through a thorough analysis of how the objective function, the optimization method, and modern implementation practices influence accuracy. We discover that “classical” flow formulations perform surprisingly well when combined with modern optimization and implementation techniques. Moreover, we find that while median filtering of intermediate flow fields during optimization is a key to recent performance gains, it leads to higher energy solutions. To understand the principles behind this phenomenon, we derive a new objective that formalizes the median filtering heuristic. This objective includes a nonlocal term that robustly integrates flow estimates over large spatial neighborhoods. By modifying this new term to include information about flow and image boundaries we develop a method that ranks at the top of the Middlebury benchmark.
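To make the median filtering heuristic concrete, the sketch below applies a small median filter to each component of a toy flow field. It shows only the filtering step that would run on intermediate flow estimates. It does not reproduce the paper's optimization, its derived objective, or the nonlocal term, and the function names and window radius are assumptions made for illustration.

```python
from statistics import median


def median_filter(field, radius=1):
    """Median-filter a 2-D list of floats; the window is clipped at borders."""
    h, w = len(field), len(field[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                field[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = median(window)
    return out


def filter_flow(u, v, radius=1):
    """Filter the horizontal (u) and vertical (v) flow components separately."""
    return median_filter(u, radius), median_filter(v, radius)


# Toy flow field: uniform horizontal motion with one outlier in the centre.
u = [[1.0, 1.0, 1.0],
     [1.0, 9.0, 1.0],
     [1.0, 1.0, 1.0]]
v = [[0.0] * 3 for _ in range(3)]

u_filtered, v_filtered = filter_flow(u, v)
```

The single outlier in `u` is replaced by the value of its neighbors, which is the kind of robustness to spurious estimates that motivates filtering intermediate flow fields.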

Aggregating local descriptors into a compact image representation

We address the problem of image search on a very large scale, where three constraints have to be considered jointly: the accuracy of the search, its efficiency, and the memory usage of the representation. We first propose a simple yet efficient way of aggregating local image descriptors into a vector of limited dimension, which can be viewed as a simplification of the Fisher kernel representation. We then show how to jointly optimize the dimension reduction and the indexing algorithm, so that it best preserves the quality of vector comparison. The evaluation shows that our approach significantly outperforms the state of the art: the search accuracy is comparable to the bag-of-features approach for an image representation that fits in 20 bytes. Searching a 10 million image dataset takes about 50 ms.
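One way to picture aggregating local descriptors into a single fixed-length vector is to accumulate, for each visual word, the residuals of the descriptors assigned to it, then normalize the concatenated result. The sketch below follows that idea on toy data. The centroids, descriptors, and function name are hypothetical, and the dimension reduction and indexing stages mentioned in the abstract are not shown.

```python
import math


def aggregate_residuals(descriptors, centroids):
    """Sum residuals to the nearest centroid, concatenate, and L2-normalize."""
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Assign the descriptor to its nearest centroid.
        nearest = min(range(k), key=lambda i: math.dist(x, centroids[i]))
        for j in range(d):
            acc[nearest][j] += x[j] - centroids[nearest][j]
    flat = [val for row in acc for val in row]  # k * d dimensional vector
    norm = math.sqrt(sum(val * val for val in flat))
    return [val / norm for val in flat] if norm > 0 else flat


# Toy 2-D descriptors and two centroids (illustrative values only).
centroids = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.0), (2.0, 0.0), (10.0, 14.0)]

image_vector = aggregate_residuals(descriptors, centroids)
```

The output length depends only on the number of centroids and the descriptor dimension, not on how many descriptors the image has. For scale, at the 20 bytes per image quoted in the abstract, 10 million images would occupy about 200 MB.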

Full List:

Locality-constrained Linear Coding for image classification Jinjun Wang;Jianchao Yang;Kai Yu;Fengjun Lv;Thomas Huang;Yihong Gong Publication Year: 2010,Page(s):3360 – 3367 Cited by: Papers (1886) | Patents (19)
Visual object tracking using adaptive correlation filters David S. Bolme;J. Ross Beveridge;Bruce A. Draper;Yui Man Lui Publication Year: 2010,Page(s):2544 – 2550 Cited by: Papers (1126) | Patents (1)
Aggregating local descriptors into a compact image representation Hervé Jégou;Matthijs Douze;Cordelia Schmid;Patrick Pérez Publication Year: 2010,Page(s):3304 – 3311 Cited by: Papers (1102) | Patents (29)
SUN database: Large-scale scene recognition from abbey to zoo Jianxiong Xiao;James Hays;Krista A. Ehinger;Aude Oliva;Antonio Torralba Publication Year: 2010,Page(s):3485 – 3492 Cited by: Papers (1012) | Patents (17)
Visual tracking decomposition Junseok Kwon;Kyoung Mu Lee Publication Year: 2010,Page(s):1269 – 1276 Cited by: Papers (815) | Patents (1)
Person re-identification by symmetry-driven accumulation of local features M. Farenzena;L. Bazzani;A. Perina;V. Murino;M. Cristani Publication Year: 2010,Page(s):2360 – 2367 Cited by: Papers (800) | Patents (10)
Detecting text in natural scenes with stroke width transform Boris Epshtein;Eyal Ofek;Yonatan Wexler Publication Year: 2010,Page(s):2963 – 2970 Cited by: Papers (762) | Patents (43)
Secrets of optical flow estimation and their principles Deqing Sun;Stefan Roth;Michael J. Black Publication Year: 2010,Page(s):2432 – 2439 Cited by: Papers (683) | Patents (14)
Discriminative K-SVD for dictionary learning in face recognition Qiang Zhang;Baoxin Li Publication Year: 2010,Page(s):2691 – 2698 Cited by: Papers (649) | Patents (1)
P-N learning: Bootstrapping binary classifiers by structural constraints Zdenek Kalal;Jiri Matas;Krystian Mikolajczyk Publication Year: 2010,Page(s):49 – 56 Cited by: Papers (636) | Patents (6)
Anomaly detection in crowded scenes Vijay Mahadevan;Weixin Li;Viral Bhalodia;Nuno Vasconcelos Publication Year: 2010,Page(s):1975 – 1981 Cited by: Papers (551) | Patents (6)
Learning mid-level features for recognition Y-Lan Boureau;Francis Bach;Yann LeCun;Jean Ponce Publication Year: 2010,Page(s):2559 – 2566 Cited by: Papers (530) | Patents (12)
Context-aware saliency detection Stas Goferman;Lihi Zelnik-Manor;Ayellet Tal Publication Year: 2010,Page(s):2376 – 2383 Cited by: Papers (509) | Patents (11)
What is an object? Bogdan Alexe;Thomas Deselaers;Vittorio Ferrari Publication Year: 2010,Page(s):73 – 80 Cited by: Papers (480) | Patents (6)
Deconvolutional networks Matthew D. Zeiler;Dilip Krishnan;Graham W. Taylor;Rob Fergus Publication Year: 2010,Page(s):2528 – 2535 Cited by: Papers (476)
Cascade object detection with deformable part models Pedro F. Felzenszwalb;Ross B. Girshick;David McAllester Publication Year: 2010,Page(s):2241 – 2248 Cited by: Papers (453) | Patents (12)
Classification and clustering via dictionary learning with structured incoherence and shared features Ignacio Ramirez;Pablo Sprechmann;Guillermo Sapiro Publication Year: 2010,Page(s):3501 – 3508 Cited by: Papers (425) | Patents (10)
Efficient hierarchical graph-based video segmentation Matthias Grundmann;Vivek Kwatra;Mei Han;Irfan Essa Publication Year: 2010,Page(s):2141 – 2148 Cited by: Papers (424) | Patents (18)
Large-scale image retrieval with compressed Fisher vectors Florent Perronnin;Yan Liu;Jorge Sánchez;Hervé Poirier Publication Year: 2010,Page(s):3384 – 3391 Cited by: Papers (409) | Patents (19)
Model globally, match locally: Efficient and robust 3D object recognition Bertram Drost;Markus Ulrich;Nassir Navab;Slobodan Ilic Publication Year: 2010,Page(s):998 – 1005 Cited by: Papers (349) | Patents (21)
Towards Internet-scale multi-view stereo Yasutaka Furukawa;Brian Curless;Steven M. Seitz;Richard Szeliski Publication Year: 2010,Page(s):1434 – 1441 Cited by: Papers (334) | Patents (43)
Learning a hierarchy of discriminative space-time neighborhood features for human action recognition Adriana Kovashka;Kristen Grauman Publication Year: 2010,Page(s):2046 – 2053 Cited by: Papers (324) | Patents (3)
Semi-supervised hashing for scalable image retrieval Jun Wang;Sanjiv Kumar;Shih-Fu Chang Publication Year: 2010,Page(s):3424 – 3431 Cited by: Papers (324) | Patents (2)
Modeling mutual context of object and human pose in human-object interaction activities Bangpeng Yao;Li Fei-Fei Publication Year: 2010,Page(s):17 – 24 Cited by: Papers (308) | Patents (2)
New features and insights for pedestrian detection Stefan Walk;Nikodem Majer;Konrad Schindler;Bernt Schiele Publication Year: 2010,Page(s):1030 – 1037 Cited by: Papers (297) | Patents (7)
Cascaded pose regression Piotr Dollár;Peter Welinder;Pietro Perona Publication Year: 2010,Page(s):1078 – 1085 Cited by: Papers (293) | Patents (15)
Face recognition based on image sets Hakan Cevikalp;Bill Triggs Publication Year: 2010,Page(s):2567 – 2573 Cited by: Papers (293)
PROST: Parallel robust online simple tracking Jakob Santner;Christian Leistner;Amir Saffari;Thomas Pock;Horst Bischof Publication Year: 2010,Page(s):723 – 730 Cited by: Papers (265) | Patents (5)
iCoseg: Interactive co-segmentation with intelligent scribble guidance Dhruv Batra;Adarsh Kowdle;Devi Parikh;Jiebo Luo;Tsuhan Chen Publication Year: 2010,Page(s):3169 – 3176 Cited by: Papers (263) | Patents (10)
Scale-invariant heat kernel signatures for non-rigid shape recognition Michael M. Bronstein;Iasonas Kokkinos Publication Year: 2010,Page(s):1704 – 1711 Cited by: Papers (261) | Patents (1)
Local features are not lonely – Laplacian sparse coding for image classification Shenghua Gao;Ivor Wai-Hung Tsang;Liang-Tien Chia;Peilin Zhao Publication Year: 2010,Page(s):3555 – 3561 Cited by: Papers (260) | Patents (3)
Face recognition with learning-based descriptor Zhimin Cao;Qi Yin;Xiaoou Tang;Jian Sun Publication Year: 2010,Page(s):2707 – 2714 Cited by: Papers (258) | Patents (14)
Discriminative clustering for image co-segmentation Armand Joulin;Francis Bach;Jean Ponce Publication Year: 2010,Page(s):1943 – 1950 Cited by: Papers (257) | Patents (3)
Monocular 3D pose estimation and tracking by detection Mykhaylo Andriluka;Stefan Roth;Bernt Schiele Publication Year: 2010,Page(s):623 – 630 Cited by: Papers (257) | Patents (1)
Constrained parametric min-cuts for automatic object segmentation Joao Carreira;Cristian Sminchisescu Publication Year: 2010,Page(s):3241 – 3248 Cited by: Papers (246) | Patents (12)
Chaotic invariants of Lagrangian particle trajectories for anomaly detection in crowded scenes Shandong Wu;Brian E. Moore;Mubarak Shah Publication Year: 2010,Page(s):2054 – 2060 Cited by: Papers (232) | Patents (1)
Modeling pixel process with scale invariant local patterns for background subtraction in complex scenes Shengcai Liao;Guoying Zhao;Vili Kellokumpu;Matti Pietikäinen;Stan Z. Li Publication Year: 2010,Page(s):1301 – 1306 Cited by: Papers (229) | Patents (2)
Robust video denoising using low rank matrix completion Hui Ji;Chaoqiang Liu;Zuowei Shen;Yuhong Xu Publication Year: 2010,Page(s):1791 – 1798 Cited by: Papers (229) | Patents (3)
Single image depth estimation from predicted semantic labels Beyang Liu;Stephen Gould;Daphne Koller Publication Year: 2010,Page(s):1253 – 1260 Cited by: Papers (227) | Patents (8)
Supervised translation-invariant sparse coding Jianchao Yang;Kai Yu;Thomas Huang Publication Year: 2010,Page(s):3517 – 3524 Cited by: Papers (209) | Patents (3)
Visual classification with multi-task joint sparse representation Xiao-Tong Yuan;Shuicheng Yan Publication Year: 2010,Page(s):3493 – 3500 Cited by: Papers (204)
Multimodal semi-supervised learning for image classification Matthieu Guillaumin;Jakob Verbeek;Cordelia Schmid Publication Year: 2010,Page(s):902 – 909 Cited by: Papers (202) | Patents (23)
Real time motion capture using a single time-of-flight camera Varun Ganapathi;Christian Plagemann;Daphne Koller;Sebastian Thrun Publication Year: 2010,Page(s):755 – 762 Cited by: Papers (199) | Patents (5)
Data fusion through cross-modality metric learning using similarity-sensitive hashing Michael M. Bronstein;Alexander M. Bronstein;Fabrice Michel;Nikos Paragios Publication Year: 2010,Page(s):3594 – 3601 Cited by: Papers (197) | Patents (2)
Live dense reconstruction with a single moving camera Richard A. Newcombe;Andrew J. Davison Publication Year: 2010,Page(s):1498 – 1505 Cited by: Papers (194) | Patents (40)
Facial point detection using boosted regression and graph models Michel Valstar;Brais Martinez;Xavier Binefa;Maja Pantic Publication Year: 2010,Page(s):2729 – 2736 Cited by: Papers (188) | Patents (3)
Geodesic star convexity for interactive image segmentation Varun Gulshan;Carsten Rother;Antonio Criminisi;Andrew Blake;Andrew Zisserman Publication Year: 2010,Page(s):3129 – 3136 Cited by: Papers (188) | Patents (4)
Multi-target tracking by on-line learned discriminative appearance models Cheng-Hao Kuo;Chang Huang;Ramakant Nevatia Publication Year: 2010,Page(s):685 – 692 Cited by: Papers (177) | Patents (1)
RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images Yigang Peng;Arvind Ganesh;John Wright;Wenli Xu;Yi Ma Publication Year: 2010,Page(s):763 – 770 Cited by: Papers (173) | Patents (28)
A Hough transform-based voting framework for action recognition Angela Yao;Juergen Gall;Luc Van Gool Publication Year: 2010,Page(s):2061 – 2068 Cited by: Papers (162) | Patents (4)

https://ieeexplore.ieee.org/xpl/conhome/5521876/proceeding?sortType=paper-citations&rowsPerPage=50&pageNumber=1