Curated Papers:
A discriminatively trained, multiscale, deformable part model
This paper describes a discriminatively trained, multiscale, deformable part model for object detection. Our system achieves a two-fold improvement in average precision over the best performance in the 2006 PASCAL person detection challenge. It also outperforms the best results in the 2007 challenge in ten out of twenty categories. The system relies heavily on deformable parts. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL challenge. Our system also relies heavily on new methods for discriminative training. We combine a margin-sensitive approach for data mining hard negative examples with a formalism we call latent SVM. A latent SVM, like a hidden CRF, leads to a non-convex training problem. However, a latent SVM is semi-convex and the training problem becomes convex once latent information is specified for the positive examples. We believe that our training methods will eventually make possible the effective use of more latent information such as hierarchical (grammar) models and models involving latent three dimensional pose.
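The semi-convexity claim can be made concrete with a small sketch. It assumes a generic feature map `phi(x, z)` and a function `latents(x)` that enumerates the latent values of an example. Both are hypothetical placeholders for, e.g., part placements in the detector. The score is f(x) = max over z of beta · phi(x, z). For a negative example, the hinge term max(0, 1 + f(x)) is convex, because f is a maximum of linear functions. Fixing z for each positive makes that positive's term convex as well. Training can therefore alternate between relabeling the positives and solving a convex problem. The convex step below uses plain batch subgradient descent, chosen only for brevity; it is not necessarily the paper's solver. Hard-negative mining is omitted.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Sequence, Tuple

Phi = Callable[[object, Hashable], List[float]]       # hypothetical feature map
Latents = Callable[[object], Iterable[Hashable]]      # hypothetical latent enumerator


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def latent_score(beta: List[float], x, latents: Latents, phi: Phi) -> Tuple[float, Hashable]:
    """f_beta(x) = max_z beta . phi(x, z); also returns the maximising z."""
    return max(((dot(beta, phi(x, z)), z) for z in latents(x)), key=lambda pair: pair[0])


def objective(beta, data, latents: Latents, phi: Phi, C: float = 1.0,
              pos_latents: Optional[Dict[int, Hashable]] = None) -> float:
    """1/2 ||beta||^2 + C * sum_i max(0, 1 - y_i * f_beta(x_i)).

    If pos_latents (example index -> z) is given, positives are scored with
    that fixed z, which makes every term convex in beta.
    """
    total = 0.5 * dot(beta, beta)
    for i, (x, y) in enumerate(data):
        if y > 0 and pos_latents is not None:
            score = dot(beta, phi(x, pos_latents[i]))
        else:
            score = latent_score(beta, x, latents, phi)[0]
        total += C * max(0.0, 1.0 - y * score)
    return total


def train_latent_svm(data, latents: Latents, phi: Phi, dim: int, C: float = 1.0,
                     rounds: int = 5, steps: int = 200, lr: float = 0.01) -> List[float]:
    """Coordinate descent: relabel positives, then minimise the convex objective."""
    beta = [0.0] * dim
    for _ in range(rounds):
        # Step 1: fix each positive's latent value to its current best z.
        pos_latents = {i: latent_score(beta, x, latents, phi)[1]
                       for i, (x, y) in enumerate(data) if y > 0}
        # Step 2: subgradient descent on the objective with positives fixed.
        for _ in range(steps):
            grad = list(beta)  # gradient of the 1/2 ||beta||^2 term
            for i, (x, y) in enumerate(data):
                # Negatives keep the max over z (still convex in beta).
                z = pos_latents[i] if y > 0 else latent_score(beta, x, latents, phi)[1]
                feat = phi(x, z)
                if y * dot(beta, feat) < 1.0:  # hinge term is active
                    for k in range(dim):
                        grad[k] -= C * y * feat[k]
            beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta


if __name__ == "__main__":
    # Toy data: each example is a list of candidate window responses; z picks one.
    data = [([0.1, 2.0], 1), ([1.8, 0.0], 1), ([0.0, 0.2], -1), ([0.3, -0.1], -1)]
    latents = lambda x: range(len(x))
    phi = lambda x, z: [x[z], 1.0]  # window response plus a bias feature
    beta = train_latent_svm(data, latents, phi, dim=2)
    print(beta, objective(beta, data, latents, phi))
```

In the toy call, each example is a small set of candidate windows, and z selects one of them. This stands in for the much richer latent structure of the detector.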
In defense of Nearest-Neighbor based image classification
State-of-the-art image classification methods require an intensive learning/training stage (using SVM, Boosting, etc.). In contrast, non-parametric nearest-neighbor (NN) based image classifiers require no training time and have other favorable properties. However, the large performance gap between these two families of approaches rendered NN-based image classifiers useless. We claim that the effectiveness of non-parametric NN-based image classification has been considerably undervalued. We argue that two practices commonly used in image classification methods have led to the inferior performance of NN-based image classifiers: (i) quantization of local image descriptors (used to generate “bags-of-words”, codebooks); (ii) computation of ‘image-to-image’ distance instead of ‘image-to-class’ distance. We propose a trivial NN-based classifier, NBNN (Naive-Bayes Nearest-Neighbor), which employs NN-distances in the space of the local image descriptors (and not in the space of images). NBNN computes direct ‘image-to-class’ distances without descriptor quantization. We further show that under the Naive-Bayes assumption, the theoretically optimal image classifier can be accurately approximated by NBNN. Although NBNN is extremely simple, efficient, and requires no learning/training phase, its performance ranks among the leading learning-based image classifiers. Empirical comparisons are shown on several challenging databases (Caltech-101, Caltech-256 and Graz-01).
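The NBNN decision rule is short enough to sketch directly. For each class C, it sums the squared distance from each of the query's local descriptors d_i to that descriptor's nearest neighbour NN_C(d_i). The nearest neighbour is taken from all descriptors pooled from C's training images. The rule then picks the class with the smallest total. The sketch assumes descriptors are plain float vectors and that extraction (e.g. SIFT) has already happened. It uses brute-force search where a practical system would use approximate nearest-neighbour search.

```python
from typing import Dict, List, Sequence

Descriptor = Sequence[float]


def sq_dist(a: Descriptor, b: Descriptor) -> float:
    """Squared Euclidean distance between two local descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def image_to_class_distance(query: List[Descriptor], pool: List[Descriptor]) -> float:
    """Sum over query descriptors of the squared distance to their nearest
    neighbour in the class pool (brute force, no quantization)."""
    return sum(min(sq_dist(d, c) for c in pool) for d in query)


def nbnn_classify(query: List[Descriptor], pools: Dict[str, List[Descriptor]]) -> str:
    """Return argmin over classes of the image-to-class distance; no training step."""
    return min(pools, key=lambda label: image_to_class_distance(query, pools[label]))


if __name__ == "__main__":
    # Hypothetical 2-D "descriptors", pooled per class from all training images.
    pools = {"cat": [[0.0, 0.0], [1.0, 0.0]], "car": [[10.0, 10.0], [11.0, 10.0]]}
    print(nbnn_classify([[0.1, 0.0], [0.9, 0.1]], pools))
```

Pooling descriptors across all of a class's training images is what makes the distance image-to-class rather than image-to-image. A query descriptor may match a patch from any training image of that class.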
A fast local descriptor for dense matching
We introduce a novel local image descriptor designed for dense wide-baseline matching purposes. We feed our descriptors to a graph-cuts based dense depth map estimation algorithm, and this yields better wide-baseline performance than the commonly used correlation windows, whose size is hard to tune. As a result, unlike competing techniques that require many high-resolution images to produce good reconstructions, our approach can produce them from pairs of low-quality images such as the ones captured by video streams. Our descriptor is inspired by earlier ones such as SIFT and GLOH but can be computed much faster for our purposes. Unlike SURF, which can also be computed efficiently at every pixel, it does not introduce artifacts that degrade the matching performance. Our approach was tested with ground-truth laser-scanned depth maps as well as on a wide variety of image pairs of different resolutions, and we show that good reconstructions are achieved even with only two low-quality images.
Unsupervised discovery of visual object class hierarchies
Objects in the world can be arranged into a hierarchy based on their semantic meaning (e.g. organism – animal – feline – cat). What about defining a hierarchy based on the visual appearance of objects? This paper investigates ways to automatically discover a hierarchical structure for the visual world from a collection of unlabeled images. Previous approaches for unsupervised object and scene discovery focused on partitioning the visual data into a set of non-overlapping classes of equal granularity. In this work, we propose to group visual objects using a multi-layer hierarchy tree that is based on common visual elements. This is achieved by adapting to the visual domain the generative hierarchical latent Dirichlet allocation (hLDA) model previously used for unsupervised discovery of topic hierarchies in text. Images are modeled using quantized local image regions as analogues to words in text. Employing the multiple segmentation framework of Russell et al. [22], we show that meaningful object hierarchies, together with object segmentations, can be automatically learned from unlabeled and unsegmented image collections without supervision. We demonstrate improved object classification and localization performance using hLDA over the previous non-hierarchical method on the MSRC dataset [33].
Object categorization using co-occurrence, location and appearance
In this work we introduce a novel approach to object categorization that incorporates two types of context, co-occurrence and relative location, with local appearance-based features. Our approach, named CoLA (for co-occurrence, location and appearance), uses a conditional random field (CRF) to maximize object label agreement according to both semantic and spatial relevance. We model relative location between objects using simple pairwise features. By vector quantizing this feature space, we learn a small set of prototypical spatial relationships directly from the data. We evaluate our results on two challenging datasets: PASCAL 2007 and MSRC. The results show that combining co-occurrence and spatial context improves accuracy in as many as half of the categories compared to using co-occurrence alone.
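The vector-quantization step can be sketched as k-means over pairwise relative-location features. The abstract does not specify those features, so the sketch assumes a hypothetical one: the offset between two bounding-box centres, normalised by image size. The cluster centres then act as prototypical spatial relationships, and each object pair maps to its nearest prototype. Farthest-point initialisation is used only to keep the example deterministic. The CRF itself is out of scope here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1); y grows downward


def relative_location(a: Box, b: Box, img_w: float, img_h: float) -> Point:
    """Hypothetical pairwise feature: offset of b's centre from a's centre,
    normalised by image width and height (positive dy means b is lower)."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return ((bx - ax) / img_w, (by - ay) / img_h)


def dist2(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(p, q))


def mean(points: List[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def quantize(feature: Point, prototypes: List[Point]) -> int:
    """Index of the nearest prototype spatial relationship."""
    return min(range(len(prototypes)), key=lambda j: dist2(feature, prototypes[j]))


def kmeans(points: List[Point], k: int, iters: int = 50) -> List[Point]:
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist2(p, c) for c in centres)))
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[quantize(p, centres)].append(p)
        # Empty clusters keep their previous centre.
        new = [mean(cl) if cl else centres[j] for j, cl in enumerate(clusters)]
        if new == centres:
            break
        centres = new
    return centres


if __name__ == "__main__":
    # Hypothetical offsets: one group "above" (dy < 0), one group "below" (dy > 0).
    feats = [(0.0, -0.5), (0.05, -0.45), (-0.05, -0.55),
             (0.0, 0.5), (0.05, 0.45), (-0.05, 0.55)]
    prototypes = kmeans(feats, k=2)
    print(prototypes, quantize((0.0, -0.4), prototypes))
```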
Full List:
- Learning realistic human actions from movies. Ivan Laptev; Marcin Marszalek; Cordelia Schmid; Benjamin Rozenfeld. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (1943) | Patents (17)
- A discriminatively trained, multiscale, deformable part model. Pedro Felzenszwalb; David McAllester; Deva Ramanan. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (1282) | Patents (21)
- Visibility in bad weather from a single image. Robby T. Tan. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (954) | Patents (13)
- Lost in quantization: Improving particular object retrieval in large scale image databases. James Philbin; Ondrej Chum; Michael Isard; Josef Sivic; Andrew Zisserman. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (779) | Patents (14)
- In defense of Nearest-Neighbor based image classification. Oren Boiman; Eli Shechtman; Michal Irani. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (663) | Patents (14)
- Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition. Mikel D. Rodriguez; Javed Ahmed; Mubarak Shah. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (612) | Patents (2)
- Classification using intersection kernel support vector machines is efficient. Subhransu Maji; Alexander C. Berg; Jitendra Malik. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (571) | Patents (5)
- Semantic texton forests for image categorization and segmentation. Jamie Shotton; Matthew Johnson; Roberto Cipolla. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (550) | Patents (34)
- People-tracking-by-detection and people-detection-by-tracking. Mykhaylo Andriluka; Stefan Roth; Bernt Schiele. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (485) | Patents (5)
- Privacy preserving crowd monitoring: Counting people without people models or tracking. Antoni B. Chan; Zhang-Sheng John Liang; Nuno Vasconcelos. Publication Year: 2008, Page(s): 1–7. Cited by: Papers (467) | Patents (7)
- Global data association for multi-object tracking using network flows. Li Zhang; Yuan Li; Ramakant Nevatia. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (431) | Patents (3)
- IM2GPS: estimating geographic information from a single image. James Hays; Alexei A. Efros. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (427) | Patents (26)
- Discriminative learned dictionaries for local image analysis. Julien Mairal; Francis Bach; Jean Ponce; Guillermo Sapiro; Andrew Zisserman. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (421) | Patents (7)
- Beyond sliding windows: Object localization by efficient subwindow search. Christoph H. Lampert; Matthew B. Blaschko; Thomas Hofmann. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (413) | Patents (14)
- Small codes and large image databases for recognition. Antonio Torralba; Rob Fergus; Yair Weiss. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (408) | Patents (15)
- On benchmarking camera calibration and multi-view stereo for high resolution imagery. C. Strecha; W. von Hansen; L. Van Gool; P. Fua; U. Thoennessen. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (407) | Patents (2)
- Progressive search space reduction for human pose estimation. Vittorio Ferrari; Manuel Marin-Jimenez; Andrew Zisserman. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (377) | Patents (4)
- Summarizing visual data using bidirectional similarity. Denis Simakov; Yaron Caspi; Eli Shechtman; Michal Irani. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (324) | Patents (29)
- PSF estimation using sharp edge prediction. Neel Joshi; Richard Szeliski; David J. Kriegman. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (322) | Patents (32)
- A mobile vision system for robust multi-person tracking. Andreas Ess; Bastian Leibe; Konrad Schindler; Luc Van Gool. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (317) | Patents (1)
- Optimised KD-trees for fast image descriptor matching. Chanop Silpa-Anan; Richard Hartley. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (292) | Patents (9)
- Action snippets: How many frames does human action recognition require? Konrad Schindler; Luc van Gool. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (288) | Patents (3)
- A fast local descriptor for dense matching. Engin Tola; Vincent Lepetit; Pascal Fua. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (285) | Patents (25)
- Action recognition by learning mid-level motion features. Alireza Fathi; Greg Mori. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (271) | Patents (1)
- Object categorization using co-occurrence, location and appearance. Carolina Galleguillos; Andrew Rabinovich; Serge Belongie. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (263) | Patents (17)
- Image super-resolution as sparse representation of raw image patches. Jianchao Yang; John Wright; Thomas Huang; Yi Ma. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (250) | Patents (12)
- Using contours to detect and localize junctions in natural images. Michael Maire; Pablo Arbelaez; Charless Fowlkes; Jitendra Malik. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (244)
- Superpixel lattices. Alastair P. Moore; Simon J. D. Prince; Jonathan Warrell; Umar Mohammed; Graham Jones. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (215) | Patents (1)
- Constant time O(1) bilateral filtering. Fatih Porikli. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (213) | Patents (7)
- Graph cut based image segmentation with connectivity priors. Sara Vicente; Vladimir Kolmogorov; Carsten Rother. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (210) | Patents (6)
- Probabilistic graph and hypergraph matching. Ron Zass; Amnon Shashua. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (209) | Patents (5)
- Learning object motion patterns for anomaly detection and improved object detection. Arslan Basharat; Alexei Gritai; Mubarak Shah. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (188) | Patents (1)
- Pose primitive based human action recognition in videos or still images. Christian Thurau; Vaclav Hlavac. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (169) | Patents (2)
- Unifying discriminative visual codebook generation with classifier training for object category recognition. Liu Yang; Rong Jin; Rahul Sukthankar; Frederic Jurie. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (167) | Patents (6)
- Bayesian color constancy revisited. Peter Vincent Gehler; Carsten Rother; Andrew Blake; Tom Minka; Toby Sharp. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (161) | Patents (16)
- Fast image search for learned metrics. Prateek Jain; Brian Kulis; Kristen Grauman. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (152) | Patents (20)
- Clothing cosegmentation for recognizing people. Andrew C. Gallagher; Tsuhan Chen. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (148) | Patents (24)
- Joint learning and dictionary construction for pattern recognition. Duc-Son Pham; Svetha Venkatesh. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (142) | Patents (1)
- Skeletal graphs for efficient structure from motion. Noah Snavely; Steven M. Seitz; Richard Szeliski. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (142) | Patents (2)
- Robust higher order potentials for enforcing label consistency. Pushmeet Kohli; Ľubor Ladický; Philip H. S. Torr. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (141) | Patents (3)
- Spatio-temporal Saliency detection using phase spectrum of quaternion fourier transform. Chenlei Guo; Qi Ma; Liming Zhang. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (128) | Patents (4)
- Motion segmentation via robust subspace separation in the presence of outlying, incomplete, or corrupted trajectories. Shankar R. Rao; Roberto Tron; Rene Vidal; Yi Ma. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (124)
- Viewpoint-independent object class detection using 3D Feature Maps. Joerg Liebelt; Cordelia Schmid; Klaus Schertler. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (122) | Patents (6)
- Learning and using taxonomies for fast visual categorization. Gregory Griffin; Pietro Perona. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (120) | Patents (2)
- Simultaneous super-resolution and feature extraction for recognition of low-resolution faces. Pablo H. Hennings-Yeomans; Simon Baker; B.V.K. Vijaya Kumar. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (120)
- Classification and evaluation of cost aggregation methods for stereo correspondence. Federico Tombari; Stefano Mattoccia; Luigi Di Stefano; Elisa Addimanda. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (119) | Patents (6)
- Unsupervised discovery of visual object class hierarchies. Josef Sivic; Bryan C. Russell; Andrew Zisserman; William T. Freeman; Alexei A. Efros. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (119)
- Action recognition using exemplar-based embedding. Daniel Weinland; Edmond Boyer. Publication Year: 2008, Page(s): 1–7. Cited by: Papers (112)
- Human-assisted motion annotation. Ce Liu; William T. Freeman; Edward H. Adelson; Yair Weiss. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (112)
- Variable baseline/resolution stereo. David Gallup; Jan-Michael Frahm; Philippos Mordohai; Marc Pollefeys. Publication Year: 2008, Page(s): 1–8. Cited by: Papers (112) | Patents (2)