Top Cited Papers: CVPR 2009

Curated Papers:

Learning to detect unseen object classes by between-class attribute transfer

We study the problem of object classification when training and test classes are disjoint, i.e. no training examples of the target classes are available. This setup has hardly been studied in computer vision research, but it is the rule rather than the exception, because the world contains tens of thousands of different object classes and for only a very few of them image collections have been formed and annotated with suitable class labels. In this paper, we tackle the problem by introducing attribute-based classification. It performs object detection based on a human-specified high-level description of the target objects instead of training images. The description consists of arbitrary semantic attributes, like shape, color or even geographic information. Because such properties transcend the specific learning task at hand, they can be pre-learned, e.g. from image datasets unrelated to the current task. Afterwards, new classes can be detected based on their attribute representation, without the need for a new training phase. In order to evaluate our method and to facilitate research in this area, we have assembled a new large-scale dataset, “Animals with Attributes”, of over 30,000 animal images that match the 50 classes in Osherson’s classic table of how strongly humans associate 85 semantic attributes with animal classes. Our experiments show that by using an attribute layer it is indeed possible to build a learning object detection system that does not require any training images of the target classes.
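
The attribute-layer idea in the abstract above can be illustrated with a small sketch. It is not the paper's exact formulation: the class names, the three attributes, the probability values, and the independence assumption are all hypothetical, chosen only to show how an unseen class might be picked from its attribute description without any training images of that class.

```python
from math import prod

# Hypothetical class-attribute table: 1 means the class has the attribute.
# Attribute order: (striped, aquatic, has_hooves)
CLASS_ATTRIBUTES = {
    "zebra": (1, 0, 1),
    "dolphin": (0, 1, 0),
    "tiger": (1, 0, 0),
}


def class_scores(attribute_probs, class_attributes=CLASS_ATTRIBUTES):
    """Score each class by how well predicted attributes match its description."""
    scores = {}
    for name, signature in class_attributes.items():
        # Probability that every attribute agrees with the class description,
        # treating attributes as independent (a simplifying assumption).
        scores[name] = prod(
            p if has else 1.0 - p
            for p, has in zip(attribute_probs, signature)
        )
    return scores


def predict(attribute_probs, class_attributes=CLASS_ATTRIBUTES):
    """Return the class whose attribute description best fits the predictions."""
    scores = class_scores(attribute_probs, class_attributes)
    return max(scores, key=scores.get)


# Made-up outputs of pre-learned attribute classifiers for one test image.
example_probs = (0.9, 0.1, 0.8)
print(predict(example_probs))
```

Only the attribute classifiers would need training data; adding a new class amounts to adding one row to the table.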

Recognizing realistic actions from videos “in the wild”

In this paper, we present a systematic framework for recognizing realistic actions from videos “in the wild”. Such unconstrained videos are abundant in personal collections as well as on the Web. Recognizing action from such videos has not been addressed extensively, primarily due to the tremendous variations that result from camera motion, background clutter, changes in object appearance, and scale, etc. The main challenge is how to extract reliable and informative features from the unconstrained videos. We extract both motion and static features from the videos. Since the raw features of both types are dense yet noisy, we propose strategies to prune these features. We use motion statistics to acquire stable motion features and clean static features. Furthermore, PageRank is used to mine the most informative static features. In order to further construct compact yet discriminative visual vocabularies, a divisive information-theoretic algorithm is employed to group semantically related features. Finally, AdaBoost is chosen to integrate all the heterogeneous yet complementary features for recognition. We have tested the framework on the KTH dataset and our own dataset consisting of 11 categories of actions collected from YouTube and personal videos, and have obtained impressive results for action recognition and action localization.
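
The abstract above mentions PageRank as the tool for mining informative static features but does not say how the feature graph is built. The sketch below is therefore only a generic PageRank power iteration over a hypothetical feature-similarity matrix; the matrix values, the damping factor, and the iteration count are assumptions, not details from the paper.

```python
def pagerank(weights, damping=0.85, iterations=100):
    """Power-iteration PageRank on a weighted graph given as a square matrix.

    weights[j][i] is the (hypothetical) similarity between features j and i;
    rank flows from j to i in proportion to that weight.
    """
    n = len(weights)
    out_totals = [sum(row) for row in weights]
    ranks = [1.0 / n] * n
    for _ in range(iterations):
        new_ranks = []
        for i in range(n):
            incoming = sum(
                ranks[j] * weights[j][i] / out_totals[j]
                for j in range(n)
                if out_totals[j] > 0
            )
            new_ranks.append((1.0 - damping) / n + damping * incoming)
        ranks = new_ranks
    return ranks


# Features 0-2 are mutually similar; feature 3 is only weakly linked to them.
similarity = [
    [0.0, 1.0, 1.0, 0.1],
    [1.0, 0.0, 1.0, 0.1],
    [1.0, 1.0, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
ranks = pagerank(similarity)
print(ranks)
```

In this toy graph the weakly connected feature ends up with the lowest rank, which is the kind of signal that could be used to prune it.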

Understanding and evaluating blind deconvolution algorithms

Blind deconvolution is the recovery of a sharp version of a blurred image when the blur kernel is unknown. Recent algorithms have afforded dramatic progress, yet many aspects of the problem remain challenging and hard to understand. The goal of this paper is to analyze and evaluate recent blind deconvolution algorithms both theoretically and experimentally. We explain the previously reported failure of the naive MAP approach by demonstrating that it mostly favors no-blur explanations. On the other hand, we show that since the kernel size is often smaller than the image size, a MAP estimation of the kernel alone can be well constrained and accurately recover the true blur. The plethora of recent deconvolution techniques makes an experimental evaluation on ground-truth data important. We have collected blur data with ground truth and compared recent algorithms under equal settings. Additionally, our data demonstrates that the shift-invariant blur assumption made by most algorithms is often violated.
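
One way to get an intuition for the "no-blur explanation" remark in the abstract above is a toy 1D experiment. The sketch assumes a sparse gradient prior of the form sum |x[i+1] - x[i]|^alpha with alpha = 0.5, a narrow-peak signal, and a 3-tap box blur; all of these are illustrative choices, and the paper's analysis is far more general than this example.

```python
def gradient_prior_cost(signal, alpha=0.5):
    """Sum of |x[i+1] - x[i]|**alpha; a lower cost means 'more likely' under the prior."""
    return sum(abs(b - a) ** alpha for a, b in zip(signal, signal[1:]))


def box_blur(signal, width=3):
    """Blur with a normalized box kernel, treating samples outside the signal as zero."""
    half = width // 2
    n = len(signal)
    return [
        sum(signal[j] for j in range(i - half, i + half + 1) if 0 <= j < n) / width
        for i in range(n)
    ]


sharp = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # a narrow peak
blurred = box_blur(sharp)

sharp_cost = gradient_prior_cost(sharp)
blurred_cost = gradient_prior_cost(blurred)
print(sharp_cost, blurred_cost)
```

Here the blurred signal has the lower prior cost, so a joint MAP estimate that can explain the observation as "blurred image, identity kernel" has no reason to prefer the sharp signal.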

From contours to regions: An empirical evaluation

We propose a generic grouping algorithm that constructs a hierarchy of regions from the output of any contour detector. Our method consists of two steps, an oriented watershed transform (OWT) to form initial regions from contours, followed by construction of an ultra-metric contour map (UCM) defining a hierarchical segmentation. We provide extensive experimental evaluation to demonstrate that, when coupled to a high-performance contour detector, the OWT-UCM algorithm produces state-of-the-art image segmentations. These hierarchical segmentations can optionally be further refined by user-specified annotations.

Geometric reasoning for single image structure recovery

We study the problem of generating plausible interpretations of a scene from a collection of line segments automatically extracted from a single indoor image. We show that we can recognize the three dimensional structure of the interior of a building, even in the presence of occluding objects. Several physically valid structure hypotheses are proposed by geometric reasoning and verified to find the best fitting model to line segments, which is then converted to a full 3D model. Our experiments demonstrate that our structure recovery from line segments is comparable with methods using full image appearance. Our approach shows how a set of rules describing geometric constraints between groups of segments can be used to prune scene interpretation hypotheses and to generate the most plausible interpretation.

Nonparametric scene parsing: Label transfer via dense scene alignment

In this paper we propose a novel nonparametric approach for object recognition and scene parsing using dense scene alignment. Given an input image, we retrieve its best matches from a large database with annotated images using our modified, coarse-to-fine SIFT flow algorithm that aligns the structures within two images. Based on the dense scene correspondence obtained from the SIFT flow, our system warps the existing annotations, and integrates multiple cues in a Markov random field framework to segment and recognize the query image. Promising experimental results have been achieved by our nonparametric scene parsing system on a challenging database. Compared to existing object recognition approaches that require training for each object category, our system is easy to implement, has few parameters, and embeds contextual information naturally in the retrieval/alignment procedure.
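
The label-transfer step in the abstract above can be sketched in a few lines. This is a deliberate simplification: the flow field is taken as given (no SIFT flow is computed), and a per-pixel majority vote stands in for the Markov random field that the paper uses to combine cues. The tiny label maps and flow values are hypothetical.

```python
from collections import Counter


def warp_labels(annotation, flow):
    """Pull labels from an annotated match into the query frame along a dense flow field.

    flow[y][x] is an integer (dx, dy) displacement from query pixel (x, y)
    to its corresponding pixel in the annotated image.
    """
    height, width = len(flow), len(flow[0])
    warped = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = flow[y][x]
            # Clamp so displaced coordinates stay inside the annotated image.
            sx = min(max(x + dx, 0), len(annotation[0]) - 1)
            sy = min(max(y + dy, 0), len(annotation) - 1)
            row.append(annotation[sy][sx])
        warped.append(row)
    return warped


def vote(label_maps):
    """Per-pixel majority vote over several warped annotations (stand-in for the MRF)."""
    height, width = len(label_maps[0]), len(label_maps[0][0])
    return [
        [Counter(m[y][x] for m in label_maps).most_common(1)[0][0] for x in range(width)]
        for y in range(height)
    ]


annotation = [["sky", "sky"], ["road", "car"]]
identity_flow = [[(0, 0), (0, 0)], [(0, 0), (0, 0)]]
shifted_flow = [[(0, 0), (0, 0)], [(1, 0), (0, 0)]]

candidates = [
    warp_labels(annotation, identity_flow),
    warp_labels(annotation, identity_flow),
    warp_labels(annotation, shifted_flow),
]
print(vote(candidates))
```

Because no per-category model is trained, recognizing a new category would only require annotated examples of it in the database.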

Describing objects by their attributes

We propose to shift the goal of recognition from naming to describing. Doing so allows us not only to name familiar objects, but also: to report unusual aspects of a familiar object (“spotty dog”, not just “dog”); to say something about unfamiliar objects (“hairy and four-legged”, not just “unknown”); and to learn how to recognize new objects with few or no visual examples. Rather than focusing on identity assignment, we make inferring attributes the core problem of recognition. These attributes can be semantic (“spotty”) or discriminative (“dogs have it but sheep do not”). Learning attributes presents a major new challenge: generalization across object categories, not just across instances within a category. In this paper, we also introduce a novel feature selection method for learning attributes that generalize well across categories. We support our claims by thorough evaluation that provides insights into the limitations of the standard recognition paradigm of naming and demonstrates the new abilities provided by our attribute-based framework.

Picking the best DAISY

Local image descriptors that are highly discriminative, computationally efficient, and with low storage footprint have long been a dream goal of computer vision research. In this paper, we focus on learning such descriptors, which make use of the DAISY configuration and are simple to compute both sparsely and densely. We develop a new training set of match/non-match image patches which improves on previous work. We test a wide variety of gradient and steerable filter based configurations and optimize over all parameters to obtain low matching errors for the descriptors. We further explore robust normalization, dimension reduction and dynamic range reduction to increase the discriminative power and yet reduce the storage requirement of the learned descriptors. All these enable us to obtain highly efficient local descriptors: e.g., 13.2% error at 13 bytes storage per descriptor, compared with 26.1% error at 128 bytes for SIFT.

Full List:

ImageNet: A large-scale hierarchical image database Jia Deng;Wei Dong;Richard Socher;Li-Jia Li;Kai Li;Li Fei-Fei Publication Year: 2009,Page(s):248 – 255 Cited by: Papers (8439) | Patents (97)
Frequency-tuned salient region detection Radhakrishna Achanta;Sheila Hemami;Francisco Estrada;Sabine Susstrunk Publication Year: 2009,Page(s):1597 – 1604 Cited by: Papers (1828) | Patents (20)
Describing objects by their attributes Ali Farhadi;Ian Endres;Derek Hoiem;David Forsyth Publication Year: 2009,Page(s):1778 – 1785 Cited by: Papers (800) | Patents (6)
Learning to detect unseen object classes by between-class attribute transfer Christoph H. Lampert;Hannes Nickisch;Stefan Harmeling Publication Year: 2009,Page(s):951 – 958 Cited by: Papers (722) | Patents (8)
Abnormal crowd behavior detection using social force model Ramin Mehran;Alexis Oyama;Mubarak Shah Publication Year: 2009,Page(s):935 – 942 Cited by: Papers (711) | Patents (1)
Visual tracking with online Multiple Instance Learning Boris Babenko;Ming-Hsuan Yang;Serge Belongie Publication Year: 2009,Page(s):983 – 990 Cited by: Papers (694) | Patents (4)
Sparse subspace clustering Ehsan Elhamifar;Rene Vidal Publication Year: 2009,Page(s):2790 – 2797 Cited by: Papers (628)
Actions in context Marcin Marszalek;Ivan Laptev;Cordelia Schmid Publication Year: 2009,Page(s):2929 – 2936 Cited by: Papers (600) | Patents (1)
Recognizing indoor scenes Ariadna Quattoni;Antonio Torralba Publication Year: 2009,Page(s):413 – 420 Cited by: Papers (577) | Patents (2)
Linear spatial pyramid matching using sparse coding for image classification Jianchao Yang;Kai Yu;Yihong Gong;Thomas Huang Publication Year: 2009,Page(s):1794 – 1801 Cited by: Papers (555) | Patents (5)
Pedestrian detection: A benchmark Piotr Dollar;Christian Wojek;Bernt Schiele;Pietro Perona Publication Year: 2009,Page(s):304 – 311 Cited by: Papers (549) | Patents (13)
Understanding and evaluating blind deconvolution algorithms Anat Levin;Yair Weiss;Fredo Durand;William T. Freeman Publication Year: 2009,Page(s):1964 – 1971 Cited by: Papers (511) | Patents (10)
Recognizing realistic actions from videos “in the wild” Jingen Liu;Jiebo Luo;Mubarak Shah Publication Year: 2009,Page(s):1996 – 2003 Cited by: Papers (450) | Patents (3)
Pictorial structures revisited: People detection and articulated pose estimation Mykhaylo Andriluka;Stefan Roth;Bernt Schiele Publication Year: 2009,Page(s):1014 – 1021 Cited by: Papers (395) | Patents (3)
Anomaly detection in extremely crowded scenes using spatio-temporal motion pattern models Louis Kratz;Ko Nishino Publication Year: 2009,Page(s):1446 – 1453 Cited by: Papers (306) | Patents (4)
Histograms of oriented optical flow and Binet-Cauchy kernels on nonlinear dynamical systems for the recognition of human actions Rizwan Chaudhry;Avinash Ravichandran;Gregory Hager;Rene Vidal Publication Year: 2009,Page(s):1932 – 1939 Cited by: Papers (289) | Patents (4)
Class-specific Hough forests for object detection Juergen Gall;Victor Lempitsky Publication Year: 2009,Page(s):1022 – 1029 Cited by: Papers (275) | Patents (16)
Observe locally, infer globally: A space-time MRF for detecting abnormal activities with incremental updates Jaechul Kim;Kristen Grauman Publication Year: 2009,Page(s):2921 – 2928 Cited by: Papers (256) | Patents (3)
From contours to regions: An empirical evaluation Pablo Arbelaez;Michael Maire;Charless Fowlkes;Jitendra Malik Publication Year: 2009,Page(s):2294 – 2301 Cited by: Papers (248) | Patents (1)
Human age estimation using bio-inspired features Guodong Guo;Guowang Mu;Yun Fu;Thomas S. Huang Publication Year: 2009,Page(s):112 – 119 Cited by: Papers (241) | Patents (1)
From structure-from-motion point clouds to fast location recognition Arnold Irschara;Christopher Zach;Jan-Michael Frahm;Horst Bischof Publication Year: 2009,Page(s):2599 – 2606 Cited by: Papers (238) | Patents (22)
On the burstiness of visual elements Herve Jegou;Matthijs Douze;Cordelia Schmid Publication Year: 2009,Page(s):1169 – 1176 Cited by: Papers (237) | Patents (7)
An empirical study of context in object detection Santosh K. Divvala;Derek Hoiem;James H. Hays;Alexei A. Efros;Martial Hebert Publication Year: 2009,Page(s):1271 – 1278 Cited by: Papers (231) | Patents (6)
Surface feature detection and description with applications to mesh matching Andrei Zaharescu;Edmond Boyer;Kiran Varanasi;Radu Horaud Publication Year: 2009,Page(s):373 – 380 Cited by: Papers (229) | Patents (1)
Learning to associate: HybridBoosted multi-target tracker for crowded scene Yuan Li;Chang Huang;Ram Nevatia Publication Year: 2009,Page(s):2953 – 2960 Cited by: Papers (226) | Patents (4)
Geometric reasoning for single image structure recovery David C. Lee;Martial Hebert;Takeo Kanade Publication Year: 2009,Page(s):2136 – 2143 Cited by: Papers (219) | Patents (9)
Multi-cue onboard pedestrian detection Christian Wojek;Stefan Walk;Bernt Schiele Publication Year: 2009,Page(s):794 – 801 Cited by: Papers (206) | Patents (3)
Recognising action as clouds of space-time interest points Matteo Bregonzio;Shaogang Gong;Tao Xiang Publication Year: 2009,Page(s):1948 – 1955 Cited by: Papers (192) | Patents (1)
Manhattan-world stereo Yasutaka Furukawa;Brian Curless;Steven M. Seitz;Richard Szeliski Publication Year: 2009,Page(s):1422 – 1429 Cited by: Papers (190) | Patents (8)
Motion capture using joint skeleton tracking and surface estimation Juergen Gall;Carsten Stoll;Edilson de Aguiar;Christian Theobalt;Bodo Rosenhahn;Hans-Peter Seidel Publication Year: 2009,Page(s):1746 – 1753 Cited by: Papers (173) | Patents (4)
Real-time O(1) bilateral filtering Qingxiong Yang;Kar-Han Tan;Narendra Ahuja Publication Year: 2009,Page(s):557 – 564 Cited by: Papers (173) | Patents (5)
Towards total scene understanding: Classification, annotation and segmentation in an automatic framework Li-Jia Li;Richard Socher;Li Fei-Fei Publication Year: 2009,Page(s):2036 – 2043 Cited by: Papers (171) | Patents (30)
Single image haze removal using dark channel prior Kaiming He;Jian Sun;Xiaoou Tang Publication Year: 2009,Page(s):1956 – 1963 Cited by: Papers (169) | Patents (12)
Object detection using a max-margin Hough transform Subhransu Maji;Jitendra Malik Publication Year: 2009,Page(s):1038 – 1045 Cited by: Papers (165) | Patents (9)
Understanding images of groups of people Andrew C. Gallagher;Tsuhan Chen Publication Year: 2009,Page(s):256 – 263 Cited by: Papers (164) | Patents (1)
Picking the best DAISY Simon Winder;Gang Hua;Matthew Brown Publication Year: 2009,Page(s):178 – 185 Cited by: Papers (163) | Patents (17)
A perceptually motivated online benchmark for image matting Christoph Rhemann;Carsten Rother;Jue Wang;Margrit Gelautz;Pushmeet Kohli;Pamela Rott Publication Year: 2009,Page(s):1826 – 1833 Cited by: Papers (156) | Patents (36)
CHoG: Compressed histogram of gradients, a low bit-rate feature descriptor Vijay Chandrasekhar;Gabriel Takacs;David Chen;Sam Tsai;Radek Grzeszczuk;Bernd Girod Publication Year: 2009,Page(s):2504 – 2511 Cited by: Papers (156) | Patents (12)
Large displacement optical flow Thomas Brox;Christoph Bregler;Jitendra Malik Publication Year: 2009,Page(s):41 – 48 Cited by: Papers (152) | Patents (2)
Efficient representation of local geometry for large scale object retrieval Michal Perd’och;Ondrej Chum;Jiri Matas Publication Year: 2009,Page(s):9 – 16 Cited by: Papers (149) | Patents (2)
Multi-class active learning for image classification Ajay J. Joshi;Fatih Porikli;Nikolaos Papanikolopoulos Publication Year: 2009,Page(s):2372 – 2379 Cited by: Papers (146) | Patents (4)
Tour the world: Building a web-scale landmark recognition engine Yan-Tao Zheng;Ming Zhao;Yang Song;Hartwig Adam;Ulrich Buddemeier;Alessandro Bissacco;Fernando Brucher;Tat-Seng Chua;Hartmut Neven Publication Year: 2009,Page(s):1085 – 1092 Cited by: Papers (144) | Patents (21)
Pose estimation for category specific multiview object localization Mustafa Ozuysal;Vincent Lepetit;Pascal Fua Publication Year: 2009,Page(s):778 – 785 Cited by: Papers (143) | Patents (1)
Nonparametric scene parsing: Label transfer via dense scene alignment Ce Liu;Jenny Yuen;Antonio Torralba Publication Year: 2009,Page(s):1972 – 1979 Cited by: Papers (142) | Patents (10)
Learning invariant features through topographic filter maps Koray Kavukcuoglu;Marc’Aurelio Ranzato;Rob Fergus;Yann LeCun Publication Year: 2009,Page(s):1605 – 1612 Cited by: Papers (135) | Patents (2)
A convex relaxation approach for computing minimal partitions Thomas Pock;Antonin Chambolle;Daniel Cremers;Horst Bischof Publication Year: 2009,Page(s):810 – 817 Cited by: Papers (134)
Marked point processes for crowd counting Weina Ge;Robert T. Collins Publication Year: 2009,Page(s):2913 – 2920 Cited by: Papers (131) | Patents (1)
Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos Abhinav Gupta;Praveen Srinivasan;Jianbo Shi;Larry S. Davis Publication Year: 2009,Page(s):2012 – 2019 Cited by: Papers (127) | Patents (8)
Simultaneous image classification and annotation Chong Wang;David Blei;Li Fei-Fei Publication Year: 2009,Page(s):1903 – 1910 Cited by: Papers (126) | Patents (2)
Contextual classification with functional Max-Margin Markov Networks Daniel Munoz;J. Andrew Bagnell;Nicolas Vandapel;Martial Hebert Publication Year: 2009,Page(s):975 – 982 Cited by: Papers (124) | Patents (3)

https://ieeexplore.ieee.org/xpl/conhome/5191365/proceeding?sortType=paper-citations&rowsPerPage=50&pageNumber=1