Top Cited Papers: CVPR 2012

Curated Papers:

Multi-class cosegmentation

Bottom-up, fully unsupervised segmentation remains a daunting challenge for computer vision. In the cosegmentation context, on the other hand, the availability of multiple images assumed to contain instances of the same object classes provides a weak form of supervision that can be exploited by discriminative approaches. Unfortunately, most existing algorithms are limited to a very small number of images and/or object classes (typically two of each). This paper proposes a novel energy-minimization approach to cosegmentation that can handle multiple classes and a significantly larger number of images. The proposed cost function combines spectral- and discriminative-clustering terms, and it admits a probabilistic interpretation. It is optimized using an efficient EM method, initialized using a convex quadratic approximation of the energy. Comparative experiments show that the proposed approach matches or improves on the state of the art on several standard datasets.

Image denoising: Can plain neural networks compete with BM3D?

Image denoising can be described as the problem of mapping from a noisy image to a noise-free image. The best currently available denoising methods approximate this mapping with cleverly engineered algorithms. In this work, we attempt to learn this mapping directly with a plain multilayer perceptron (MLP) applied to image patches. While this has been done before, we show that by training on large image databases we are able to compete with the current state-of-the-art image denoising methods. Furthermore, our approach is easily adapted to less extensively studied types of noise (by merely exchanging the training data), for which we achieve excellent results as well.
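The abstract describes learning a patch-to-patch mapping with a plain MLP. A minimal sketch of how such a mapping could be applied to a whole image follows, assuming stride-1 overlapping patches of equal input and output size, tanh hidden units, and per-pixel averaging of overlapping predictions. These choices, the helper names (`mlp_forward`, `denoise_by_patches`), and the untrained weights are illustrative assumptions rather than the paper's configuration. Training, i.e. fitting the weights on pairs of noisy and clean patches, is not shown; changing the noise type would only change how those training pairs are generated.

```python
import math
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]
Layer = Tuple[List[List[float]], List[float]]  # (W of shape out x in, bias of length out)


def mlp_forward(x: Sequence[float], layers: List[Layer]) -> List[float]:
    """Plain MLP: affine layers, tanh on every hidden layer, linear output layer."""
    h = list(x)
    for k, (W, b) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + bias for row, bias in zip(W, b)]
        # Keep the last layer linear so outputs can take any pixel value.
        h = z if k == len(layers) - 1 else [math.tanh(v) for v in z]
    return h


def denoise_by_patches(img: Image, p: int,
                       patch_fn: Callable[[List[float]], List[float]]) -> Image:
    """Map every overlapping p x p patch (stride 1) with patch_fn and average
    the overlapping predictions at each pixel.
    Assumes p <= image height and width, so every pixel is covered."""
    H, W = len(img), len(img[0])
    acc = [[0.0] * W for _ in range(H)]
    cnt = [[0] * W for _ in range(H)]
    for r in range(H - p + 1):
        for c in range(W - p + 1):
            # Flatten the noisy patch row by row.
            patch = [img[r + i][c + j] for i in range(p) for j in range(p)]
            out = patch_fn(patch)  # predicted clean patch, same flattened layout
            for i in range(p):
                for j in range(p):
                    acc[r + i][c + j] += out[i * p + j]
                    cnt[r + i][c + j] += 1
    return [[acc[i][j] / cnt[i][j] for j in range(W)] for i in range(H)]


# Usage: an identity "network" leaves the image unchanged; a (hypothetical,
# untrained) 4-3-4 MLP shows how a learned model would be plugged in.
img = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
same = denoise_by_patches(img, 2, lambda q: q)
net = [([[0.1] * 4] * 3, [0.0] * 3), ([[0.2] * 3] * 4, [0.0] * 4)]
smoothed = denoise_by_patches(img, 2, lambda q: mlp_forward(q, net))
```

Averaging the overlapping outputs is one simple way to turn per-patch predictions into a single image; it is chosen here for brevity, not taken from the paper.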

Three things everyone should know to improve object retrieval

The objective of this work is object retrieval in large scale image datasets, where the object is specified by an image query and retrieval should be immediate at run time in the manner of Video Google [28]. We make the following three contributions: (i) a new method to compare SIFT descriptors (RootSIFT) which yields superior performance without increasing processing or storage requirements; (ii) a novel method for query expansion where a richer model for the query is learnt discriminatively in a form suited to immediate retrieval through efficient use of the inverted index; (iii) an improvement of the image augmentation method proposed by Turcot and Lowe [29], where only the augmenting features which are spatially consistent with the augmented image are kept. We evaluate these three methods on several standard benchmark datasets (Oxford Buildings 5k and 105k, and Paris 6k) and demonstrate substantial improvements in retrieval performance whilst maintaining immediate retrieval speeds. Combining these complementary methods achieves new state-of-the-art performance on these datasets.
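RootSIFT, as commonly described, replaces each SIFT vector by the element-wise square root of its L1-normalised version, so that comparing RootSIFT vectors with Euclidean distance is equivalent to comparing the original SIFT vectors with the Hellinger kernel. The abstract does not spell these steps out, so the sketch below is a minimal illustration under that reading; the helper names are hypothetical, and it covers only contribution (i), not the discriminative query expansion or the spatially verified augmentation. Because the transform is applied once per descriptor and keeps its dimensionality, it is consistent with the abstract's claim of no extra storage.

```python
import math
from typing import List, Sequence


def root_sift(desc: Sequence[float]) -> List[float]:
    """RootSIFT: L1-normalise a non-negative SIFT descriptor, then take the
    element-wise square root. The result has unit L2 norm."""
    if any(v < 0 for v in desc):
        raise ValueError("SIFT descriptors are expected to be non-negative")
    l1 = sum(desc)
    if l1 == 0:
        return [0.0] * len(desc)
    return [math.sqrt(v / l1) for v in desc]


def hellinger_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Hellinger kernel between the L1-normalised versions of x and y."""
    sx, sy = sum(x), sum(y)
    return sum(math.sqrt((a / sx) * (b / sy)) for a, b in zip(x, y))


def sq_euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Toy 4-bin "descriptors" (real SIFT has 128 bins).
x = [1.0, 2.0, 3.0, 4.0]
y = [4.0, 3.0, 2.0, 1.0]
rx, ry = root_sift(x), root_sift(y)
# Squared Euclidean distance on RootSIFT equals 2 - 2 * Hellinger kernel on SIFT,
# so existing Euclidean matching code can be reused unchanged.
print(sq_euclidean(rx, ry), 2 - 2 * hellinger_kernel(x, y))
```

The identity follows from the unit L2 norm: the squared distance between two RootSIFT vectors is 1 + 1 minus twice their dot product, and that dot product is exactly the Hellinger kernel.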

FREAK: Fast Retina Keypoint

A large number of vision applications rely on matching keypoints across images. The last decade featured an arms race towards faster and more robust keypoints and association algorithms: Scale Invariant Feature Transform (SIFT) [17], Speeded-Up Robust Features (SURF) [4], and more recently Binary Robust Invariant Scalable Keypoints (BRISK) [16], to name a few. These days, the deployment of vision algorithms on smartphones and embedded devices with limited memory and computational power has upped the ante even further: the goal is to make descriptors faster to compute and more compact while remaining robust to scale, rotation, and noise. To address these requirements, we propose a novel keypoint descriptor inspired by the human visual system, and more precisely the retina, coined Fast Retina Keypoint (FREAK). A cascade of binary strings is computed by efficiently comparing image intensities over a retinal sampling pattern. Our experiments show that FREAKs are in general faster to compute, have a lower memory load, and are more robust than SIFT, SURF, or BRISK. They are thus competitive alternatives to existing keypoints, in particular for embedded applications.
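The abstract's central mechanism, bits obtained by comparing intensities at pairs of points of a retina-inspired sampling pattern, can be illustrated with the toy sketch below. Everything concrete in it is an assumption: the ring geometry (`retina_like_pattern`), nearest-pixel sampling without smoothing, the use of all point pairs, and the helper names are hypothetical, and the sketch does not attempt to reproduce FREAK's actual pattern, pair selection, or cascade. Matching by Hamming distance (XOR plus bit count) is the usual way to compare binary descriptors and is what makes them cheap on embedded hardware.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def retina_like_pattern(n_rings: int = 4, per_ring: int = 6,
                        r0: float = 2.0, growth: float = 1.6) -> List[Point]:
    """Hypothetical retina-like layout: one central point plus concentric rings
    whose radius grows geometrically, so sampling is densest near the centre."""
    pts: List[Point] = [(0.0, 0.0)]
    for k in range(n_rings):
        radius = r0 * growth ** k
        stagger = (math.pi / per_ring) * (k % 2)  # rotate every other ring
        for i in range(per_ring):
            angle = 2 * math.pi * i / per_ring + stagger
            pts.append((radius * math.cos(angle), radius * math.sin(angle)))
    return pts


def sample(img: List[List[float]], x: float, y: float) -> float:
    """Nearest-pixel lookup clamped to the image border (no smoothing)."""
    h, w = len(img), len(img[0])
    xi = min(max(int(round(x)), 0), w - 1)
    yi = min(max(int(round(y)), 0), h - 1)
    return img[yi][xi]


def binary_descriptor(img, cx: float, cy: float, pattern: Sequence[Point],
                      pairs: Sequence[Tuple[int, int]]) -> int:
    """Bit k is 1 when the intensity at pattern[pairs[k][0]] exceeds that at pattern[pairs[k][1]]."""
    vals = [sample(img, cx + dx, cy + dy) for dx, dy in pattern]
    bits = 0
    for k, (i, j) in enumerate(pairs):
        if vals[i] > vals[j]:
            bits |= 1 << k
    return bits


def hamming(a: int, b: int) -> int:
    """Matching cost for binary descriptors: number of differing bits."""
    return bin(a ^ b).count("1")


pattern = retina_like_pattern()
pairs = list(combinations(range(len(pattern)), 2))  # all pairs, for simplicity
img = [[float(x) for x in range(32)] for _ in range(32)]  # horizontal intensity ramp
inv = [[31.0 - v for v in row] for row in img]            # intensity-inverted copy
d = binary_descriptor(img, 16, 16, pattern, pairs)
d_inv = binary_descriptor(inv, 16, 16, pattern, pairs)
print(hamming(d, d), hamming(d, d_inv))
```

Inverting the image flips every comparison that is not a tie, so `d` and `d_inv` share no set bits; this is a quick sanity check on the bit convention.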

Full List:

Are we ready for autonomous driving? The KITTI vision benchmark suite. Andreas Geiger; Philip Lenz; Raquel Urtasun. Publication Year: 2012, Page(s): 3354 – 3361. Cited by: Papers (2792) | Patents (17)
Multi-column deep neural networks for image classification. Dan Cireşan; Ueli Meier; Jürgen Schmidhuber. Publication Year: 2012, Page(s): 3642 – 3649. Cited by: Papers (1108) | Patents (10)
Large scale metric learning from equivalence constraints. Martin Köstinger; Martin Hirzer; Paul Wohlhart; Peter M. Roth; Horst Bischof. Publication Year: 2012, Page(s): 2288 – 2295. Cited by: Papers (815) | Patents (1)
FREAK: Fast Retina Keypoint. Alexandre Alahi; Raphael Ortiz; Pierre Vandergheynst. Publication Year: 2012, Page(s): 510 – 517. Cited by: Papers (812) | Patents (22)
Saliency filters: Contrast based filtering for salient region detection. Federico Perazzi; Philipp Krähenbühl; Yael Pritch; Alexander Hornung. Publication Year: 2012, Page(s): 733 – 740. Cited by: Papers (782) | Patents (4)
Mining actionlet ensemble for action recognition with depth cameras. Jiang Wang; Zicheng Liu; Ying Wu; Junsong Yuan. Publication Year: 2012, Page(s): 1290 – 1297. Cited by: Papers (611) | Patents (3)
Robust object tracking via sparsity-based collaborative model. Wei Zhong; Huchuan Lu; Ming-Hsuan Yang. Publication Year: 2012, Page(s): 1838 – 1845. Cited by: Papers (573) | Patents (2)
Three things everyone should know to improve object retrieval. Relja Arandjelović; Andrew Zisserman. Publication Year: 2012, Page(s): 2911 – 2918. Cited by: Papers (554) | Patents (6)
Image denoising: Can plain neural networks compete with BM3D? Harold C. Burger; Christian J. Schuler; Stefan Harmeling. Publication Year: 2012, Page(s): 2392 – 2399. Cited by: Papers (436) | Patents (20)
Real-time scene text localization and recognition. Lukáš Neumann; Jiří Matas. Publication Year: 2012, Page(s): 3538 – 3545. Cited by: Papers (410) | Patents (12)
Action bank: A high-level representation of activity in video. Sreemanananth Sadanand; Jason J. Corso. Publication Year: 2012, Page(s): 1234 – 1241. Cited by: Papers (387) | Patents (7)
PCCA: A new approach for distance learning from sparse pairwise constraints. Alexis Mignon; Frédéric Jurie. Publication Year: 2012, Page(s): 2666 – 2672. Cited by: Papers (371)
Generalized Multiview Analysis: A discriminative latent space. Abhishek Sharma; Abhishek Kumar; Hal Daumé III; David W. Jacobs. Publication Year: 2012, Page(s): 2160 – 2167. Cited by: Papers (330)
Pedestrian detection at 100 frames per second. Rodrigo Benenson; Markus Mathias; Radu Timofte; Luc Van Gool. Publication Year: 2012, Page(s): 2903 – 2910. Cited by: Papers (324) | Patents (7)
Distribution fields for tracking. Laura Sevilla-Lara; Erik Learned-Miller. Publication Year: 2012, Page(s): 1910 – 1917. Cited by: Papers (316) | Patents (2)
Detecting activities of daily living in first-person camera views. Hamed Pirsiavash; Deva Ramanan. Publication Year: 2012, Page(s): 2847 – 2854. Cited by: Papers (315) | Patents (2)
AVA: A large-scale database for aesthetic visual analysis. Naila Murray; Luca Marchesotti; Florent Perronnin. Publication Year: 2012, Page(s): 2408 – 2415. Cited by: Papers (270) | Patents (4)
Visual tracking via adaptive structural local sparse appearance model. Xu Jia; Huchuan Lu; Ming-Hsuan Yang. Publication Year: 2012, Page(s): 1822 – 1829. Cited by: Papers (244) | Patents (2)
SUN attribute database: Discovering, annotating, and recognizing scene attributes. Genevieve Patterson; James Hays. Publication Year: 2012, Page(s): 2751 – 2758. Cited by: Papers (239) | Patents (2)
Learning object class detectors from weakly annotated video. Alessandro Prest; Christian Leistner; Javier Civera; Cordelia Schmid; Vittorio Ferrari. Publication Year: 2012, Page(s): 3282 – 3289. Cited by: Papers (227) | Patents (1)
Parsing clothing in fashion photographs. Kota Yamaguchi; M. Hadi Kiapour; Luis E. Ortiz; Tamara L. Berg. Publication Year: 2012, Page(s): 3570 – 3577. Cited by: Papers (225) | Patents (5)
Enhancing underwater images and videos by fusion. Cosmin Ancuti; Codruta Orniana Ancuti; Tom Haber; Philippe Bekaert. Publication Year: 2012, Page(s): 81 – 88. Cited by: Papers (221)
Real-time facial feature detection using conditional regression forests. Matthias Dantone; Juergen Gall; Gabriele Fanelli; Luc Van Gool. Publication Year: 2012, Page(s): 2578 – 2585. Cited by: Papers (219) | Patents (7)
Learning hierarchical representations for face verification with convolutional deep belief networks. Gary B. Huang; Honglak Lee; Erik Learned-Miller. Publication Year: 2012, Page(s): 2518 – 2525. Cited by: Papers (209) | Patents (5)
Exploiting local and global patch rarities for saliency detection. Ali Borji; Laurent Itti. Publication Year: 2012, Page(s): 478 – 485. Cited by: Papers (205) | Patents (1)
A database for fine grained activity detection of cooking activities. Marcus Rohrbach; Sikandar Amin; Mykhaylo Andriluka; Bernt Schiele. Publication Year: 2012, Page(s): 1194 – 1201. Cited by: Papers (199) | Patents (2)
Discrete-continuous optimization for multi-target tracking. Anton Andriyenko; Konrad Schindler; Stefan Roth. Publication Year: 2012, Page(s): 1926 – 1933. Cited by: Papers (199)
Locally Orderless Tracking. Shaul Oron; Aharon Bar-Hillel; Dan Levi; Shai Avidan. Publication Year: 2012, Page(s): 1940 – 1947. Cited by: Papers (198) | Patents (1)
Supervised hashing with kernels. Wei Liu; Jun Wang; Rongrong Ji; Yu-Gang Jiang; Shih-Fu Chang. Publication Year: 2012, Page(s): 2074 – 2081. Cited by: Papers (195) | Patents (4)
Learning latent temporal structure for complex event detection. Kevin Tang; Li Fei-Fei; Daphne Koller. Publication Year: 2012, Page(s): 1250 – 1257. Cited by: Papers (189) | Patents (3)
Cats and dogs. Omkar M. Parkhi; Andrea Vedaldi; Andrew Zisserman; C. V. Jawahar. Publication Year: 2012, Page(s): 3498 – 3505. Cited by: Papers (188)
Globally consistent depth labeling of 4D light fields. Sven Wanner; Bastian Goldluecke. Publication Year: 2012, Page(s): 41 – 48. Cited by: Papers (181) | Patents (26)
Face detection, pose estimation, and landmark localization in the wild. Xiangxin Zhu; Deva Ramanan. Publication Year: 2012, Page(s): 2879 – 2886. Cited by: Papers (173) | Patents (19)
Geodesic flow kernel for unsupervised domain adaptation. Boqing Gong; Yuan Shi; Fei Sha; Kristen Grauman. Publication Year: 2012, Page(s): 2066 – 2073. Cited by: Papers (172) | Patents (5)
Multi-class cosegmentation. Armand Joulin; Francis Bach; Jean Ponce. Publication Year: 2012, Page(s): 542 – 549. Cited by: Papers (168) | Patents (2)
See all by looking at a few: Sparse modeling for finding representative objects. Ehsan Elhamifar; Guillermo Sapiro; René Vidal. Publication Year: 2012, Page(s): 1600 – 1607. Cited by: Papers (167)
Semantic segmentation using regions and parts. Pablo Arbeláez; Bharath Hariharan; Chunhui Gu; Saurabh Gupta; Lubomir Bourdev; Jitendra Malik. Publication Year: 2012, Page(s): 3378 – 3385. Cited by: Papers (155) | Patents (3)
Social interactions: A first-person perspective. Alireza Fathi; Jessica K. Hodgins; James M. Rehg. Publication Year: 2012, Page(s): 1226 – 1233. Cited by: Papers (153)
Top-down and bottom-up cues for scene text recognition. Anand Mishra; Karteek Alahari; C. V. Jawahar. Publication Year: 2012, Page(s): 2687 – 2694. Cited by: Papers (153) | Patents (4)
The Vitruvian manifold: Inferring dense correspondences for one-shot human pose estimation. Jonathan Taylor; Jamie Shotton; Toby Sharp; Andrew Fitzgibbon. Publication Year: 2012, Page(s): 103 – 110. Cited by: Papers (150) | Patents (10)
WhittleSearch: Image search with relative attribute feedback. Adriana Kovashka; Devi Parikh; Kristen Grauman. Publication Year: 2012, Page(s): 2973 – 2980. Cited by: Papers (150) | Patents (6)
Boosting bottom-up and top-down visual features for saliency estimation. Ali Borji. Publication Year: 2012, Page(s): 438 – 445. Cited by: Papers (148) | Patents (2)
Discovering discriminative action parts from mid-level video representations. Michalis Raptis; Iasonas Kokkinos; Stefano Soatto. Publication Year: 2012, Page(s): 1242 – 1249. Cited by: Papers (145) | Patents (2)
Discriminative spatial saliency for image classification. Gaurav Sharma; Frédéric Jurie; Cordelia Schmid. Publication Year: 2012, Page(s): 3506 – 3513. Cited by: Papers (134) | Patents (1)
Tracking the articulated motion of two strongly interacting hands. I. Oikonomidis; N. Kyriazis; A. A. Argyros. Publication Year: 2012, Page(s): 1862 – 1869. Cited by: Papers (129) | Patents (9)
Video anomaly detection based on local statistical aggregates. Venkatesh Saligrama; Zhu Chen. Publication Year: 2012, Page(s): 2112 – 2119. Cited by: Papers (129) | Patents (1)
The inverted multi-index. Artem Babenko; Victor Lempitsky. Publication Year: 2012, Page(s): 3069 – 3076. Cited by: Papers (115)
Teaching 3D geometry to deformable part models. Bojan Pepik; Michael Stark; Peter Gehler; Bernt Schiele. Publication Year: 2012, Page(s): 3362 – 3369. Cited by: Papers (114) | Patents (1)
3D Constrained Local Model for rigid and non-rigid facial tracking. Tadas Baltrušaitis; Peter Robinson; Louis-Philippe Morency. Publication Year: 2012, Page(s): 2610 – 2617. Cited by: Papers (113) | Patents (6)
Fast search in Hamming space with multi-index hashing. Mohammad Norouzi; Ali Punjani; David J. Fleet. Publication Year: 2012, Page(s): 3108 – 3115. Cited by: Papers (106) | Patents (4)

https://ieeexplore.ieee.org/xpl/conhome/6235193/proceeding?sortType=paper-citations&rowsPerPage=50&pageNumber=1