Fast cost-volume filtering for visual correspondence and beyond

Many computer vision tasks can be formulated as labeling problems. The desired solution is often a spatially smooth labeling where label transitions are aligned with color edges of the input image. We show that such solutions can be efficiently achieved by smoothing the label costs with a very fast edge-preserving filter. In this paper we propose a generic and simple framework comprising three steps: (i) constructing a cost volume, (ii) fast cost volume filtering, and (iii) winner-take-all label selection. Our main contribution is to show that with such a simple framework state-of-the-art results can be achieved for several computer vision applications. In particular, we achieve (i) disparity maps in real time, whose quality exceeds that of all other fast (local) approaches on the Middlebury stereo benchmark, and (ii) optical flow fields with very fine structures as well as large displacements. To demonstrate robustness, the few parameters of our framework are set to nearly identical values for both applications. Also, competitive results for interactive image segmentation are presented. With this work, we hope to inspire other researchers to apply this framework to other application areas.
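The three steps named in the abstract can be illustrated with a minimal sketch for stereo matching on tiny grayscale images. This is not the authors' implementation: a plain box (mean) filter stands in for the edge-preserving filter, so the result will blur across color edges, and the matching cost is a simple absolute intensity difference. The function names, the `radius` and `border_cost` parameters, and the list-of-lists image representation are all assumptions made for illustration.

```python
def box_filter(img, r):
    """Mean filter with window radius r; the window is clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - r), min(h, y + r + 1)):
                for xx in range(max(0, x - r), min(w, x + r + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def cost_volume_stereo(left, right, max_disp, radius=1, border_cost=255.0):
    """Return a disparity map for `left` (hypothetical helper, box-filter stand-in)."""
    h, w = len(left), len(left[0])
    labels = range(max_disp + 1)

    volume = []
    for d in labels:
        # (i) Cost volume: one slice per disparity label d. Pixel x in the
        # left image is compared with pixel x - d in the right image;
        # comparisons that fall outside the image get a fixed penalty.
        cost_slice = [
            [abs(left[y][x] - right[y][x - d]) if x - d >= 0 else border_cost
             for x in range(w)]
            for y in range(h)
        ]
        # (ii) Cost-volume filtering: smooth each slice independently.
        # Assumption: a box filter replaces the edge-preserving filter.
        volume.append(box_filter(cost_slice, radius))

    # (iii) Winner-take-all: per pixel, pick the label with the lowest
    # filtered cost (ties go to the smallest disparity).
    return [
        [min(labels, key=lambda d: volume[d][y][x]) for x in range(w)]
        for y in range(h)
    ]
```

Calling `cost_volume_stereo(left, right, max_disp=3)` on two equally sized images returns one disparity label per pixel. Swapping `box_filter` for a filter whose weights depend on the input image is where the edge-aligned label transitions described above would come from.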

Localizing parts of faces using a consensus of exemplars

We present a novel approach to localizing parts in images of human faces. The approach combines the output of local detectors with a non-parametric set of global models for the part locations based on over one thousand hand-labeled exemplar images. By assuming that the global models generate the part locations as hidden variables, we derive a Bayesian objective function. This function is optimized using a consensus of models for these hidden variables. The resulting localizer handles a much wider range of expression, pose, lighting and occlusion than prior localizers. We report excellent performance on a new dataset gathered from the internet and show that our localizer achieves state-of-the-art performance on the less challenging BioID dataset.

Unbiased look at dataset bias

Datasets are an integral part of contemporary object recognition research. They have been the chief reason for the considerable progress in the field, not just as a source of large amounts of training data, but also as a means of measuring and comparing the performance of competing algorithms. At the same time, datasets have often been blamed for narrowing the focus of object recognition research, reducing it to a single benchmark performance number. Indeed, some datasets that started out as data capture efforts aimed at representing the visual world have become closed worlds unto themselves (e.g. the Corel world, the Caltech-101 world, the PASCAL VOC world). With the focus on beating the latest benchmark numbers on the latest dataset, have we perhaps lost sight of the original purpose? The goal of this paper is to take stock of the current state of recognition datasets. We present a comparison study using a set of popular datasets, evaluated based on a number of criteria including relative data bias, cross-dataset generalization, effects of the closed-world assumption, and sample value. The experimental results, some rather surprising, suggest directions that can improve dataset collection as well as algorithm evaluation protocols. But more broadly, the hope is to stimulate discussion in the community regarding this very important, but largely neglected, issue.

  1. Real-time human pose recognition in parts from single depth images Jamie Shotton;Andrew Fitzgibbon;Mat Cook;Toby Sharp;Mark Finocchio;Richard Moore;Alex Kipman;Andrew Blake Publication Year: 2011,Page(s):1297 – 1304 Cited by: Papers (1666) | Patents (106)
  2. Action recognition by dense trajectories Heng Wang;Alexander Kläser;Cordelia Schmid;Cheng-Lin Liu Publication Year: 2011,Page(s):3169 – 3176 Cited by: Papers (1169) | Patents (13)
  3. Global contrast based salient region detection Ming-Ming Cheng;Guo-Xin Zhang;Niloy J. Mitra;Xiaolei Huang;Shi-Min Hu Publication Year: 2011,Page(s):409 – 416 Cited by: Papers (1031) | Patents (21)
  4. Iterative quantization: A procrustean approach to learning binary codes Yunchao Gong;Svetlana Lazebnik Publication Year: 2011,Page(s):817 – 824 Cited by: Papers (581) | Patents (18)
  5. Articulated pose estimation with flexible mixtures-of-parts Yi Yang;Deva Ramanan Publication Year: 2011,Page(s):1385 – 1392 Cited by: Papers (571) | Patents (11)
  6. Face recognition in unconstrained videos with matched background similarity Lior Wolf;Tal Hassner;Itay Maoz Publication Year: 2011,Page(s):529 – 534 Cited by: Papers (565) | Patents (5)
  7. Unbiased look at dataset bias Antonio Torralba;Alexei A. Efros Publication Year: 2011,Page(s):1521 – 1528 Cited by: Papers (554) | Patents (4)
  8. Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis Quoc V. Le;Will Y. Zou;Serena Y. Yeung;Andrew Y. Ng Publication Year: 2011,Page(s):3361 – 3368 Cited by: Papers (499) | Patents (2)
  9. Blind deconvolution using a normalized sparsity measure Dilip Krishnan;Terence Tay;Rob Fergus Publication Year: 2011,Page(s):233 – 240 Cited by: Papers (468) | Patents (7)
  10. Globally-optimal greedy algorithms for tracking a variable number of objects Hamed Pirsiavash;Deva Ramanan;Charless C. Fowlkes Publication Year: 2011,Page(s):1201 – 1208 Cited by: Papers (435) | Patents (8)
  11. Learning a discriminative dictionary for sparse coding via label consistent K-SVD Zhuolin Jiang;Zhe Lin;Larry S. Davis Publication Year: 2011,Page(s):1697 – 1704 Cited by: Papers (411) | Patents (2)
  12. Entropy rate superpixel segmentation Ming-Yu Liu;Oncel Tuzel;Srikumar Ramalingam;Rama Chellappa Publication Year: 2011,Page(s):2097 – 2104 Cited by: Papers (383) | Patents (4)
  13. Person re-identification by probabilistic relative distance comparison Wei-Shi Zheng;Shaogang Gong;Tao Xiang Publication Year: 2011,Page(s):649 – 656 Cited by: Papers (368) | Patents (6)
  14. Sparse reconstruction cost for abnormal event detection Yang Cong;Junsong Yuan;Ji Liu Publication Year: 2011,Page(s):3449 – 3456 Cited by: Papers (352) | Patents (3)
  15. Robust tracking using local sparse appearance model and K-selection Baiyang Liu;Junzhou Huang;Lin Yang;Casimir Kulikowsk Publication Year: 2011,Page(s):1313 – 1320 Cited by: Papers (342) | Patents (1)
  16. Localizing parts of faces using a consensus of exemplars Peter N. Belhumeur;David W. Jacobs;David J. Kriegman;Neeraj Kumar Publication Year: 2011,Page(s):545 – 552 Cited by: Papers (338) | Patents (2)
  17. Evaluation of background subtraction techniques for video surveillance Sebastian Brutzer;Benjamin Höferlin;Gunther Heidemann Publication Year: 2011,Page(s):1937 – 1944 Cited by: Papers (337) | Patents (1)
  18. Context tracker: Exploring supporters and distracters in unconstrained environments Thang Ba Dinh;Nam Vo;Gérard Medioni Publication Year: 2011,Page(s):1177 – 1184 Cited by: Papers (335) | Patents (3)
  19. Robust sparse coding for face recognition Meng Yang;Lei Zhang;Jian Yang;David Zhang Publication Year: 2011,Page(s):625 – 632 Cited by: Papers (335)
  20. Multicore bundle adjustment Changchang Wu;Sameer Agarwal;Brian Curless;Steven M. Seitz Publication Year: 2011,Page(s):3057 – 3064 Cited by: Papers (329) | Patents (17)
  21. Fast cost-volume filtering for visual correspondence and beyond Christoph Rhemann;Asmaa Hosni;Michael Bleyer;Carsten Rother;Margrit Gelautz Publication Year: 2011,Page(s):3017 – 3024 Cited by: Papers (325) | Patents (8)
  22. Efficient marginal likelihood optimization in blind deconvolution Anat Levin;Yair Weiss;Fredo Durand;William T. Freeman Publication Year: 2011,Page(s):2657 – 2664 Cited by: Papers (307) | Patents (9)
  23. Stable multi-target tracking in real-time surveillance video Ben Benfold;Ian Reid Publication Year: 2011,Page(s):3457 – 3464 Cited by: Papers (306) | Patents (7)
  24. What you saw is not what you get: Domain adaptation using asymmetric kernel transforms Brian Kulis;Kate Saenko;Trevor Darrell Publication Year: 2011,Page(s):1785 – 1792 Cited by: Papers (306) | Patents (7)
  25. Recognizing human actions by attributes Jingen Liu;Benjamin Kuipers;Silvio Savarese Publication Year: 2011,Page(s):3337 – 3344 Cited by: Papers (282) | Patents (5)
  26. Real time head pose estimation with random regression forests Gabriele Fanelli;Juergen Gall;Luc Van Gool Publication Year: 2011,Page(s):617 – 624 Cited by: Papers (251) | Patents (21)
  27. A large-scale benchmark dataset for event recognition in surveillance video Sangmin Oh;Anthony Hoogs;Amitha Perera;Naresh Cuntoor;Chia-Chih Chen;Jong Taek Lee;Saurajit Mukherjee;J. K. Aggarwal;Hyungtae Lee;Larry Davis;Eran Swears;Xioyang Wang;Qiang Ji;Kishore Reddy;Mubarak Shah;Carl Vondrick;Hamed Pirsiavash;Deva Ramanan;Jenny Yuen;Antonio Torralba;Bi Song;Anesco Fong;Amit Roy-Chowdhury;Mita Desai Publication Year: 2011,Page(s):3153 – 3160 Cited by: Papers (243) | Patents (3)
  28. Sparsity-based image denoising via dictionary learning and structural clustering Weisheng Dong;Xin Li;Lei Zhang;Guangming Shi Publication Year: 2011,Page(s):457 – 464 Cited by: Papers (241) | Patents (4)
  29. Baby talk: Understanding and generating simple image descriptions Girish Kulkarni;Visruth Premraj;Sagnik Dhar;Siming Li;Yejin Choi;Alexander C Berg;Tamara L Berg Publication Year: 2011,Page(s):1601 – 1608 Cited by: Papers (228) | Patents (12)
  30. Large-scale image classification: Fast feature extraction and SVM training Yuanqing Lin;Fengjun Lv;Shenghuo Zhu;Ming Yang;Timothee Cour;Kai Yu;Liangliang Cao;Thomas Huang Publication Year: 2011,Page(s):1689 – 1696 Cited by: Papers (216) | Patents (7)
  31. City-scale landmark identification on mobile devices David M. Chen;Georges Baatz;Kevin Köser;Sam S. Tsai;Ramakrishna Vedantham;Timo Pylvänäinen;Kimmo Roimela;Xin Chen;Jeff Bach;Marc Pollefeys;Bernd Girod;Radek Grzeszczuk Publication Year: 2011,Page(s):737 – 744 Cited by: Papers (215) | Patents (17)
  32. High level describable attributes for predicting aesthetics and interestingness Sagnik Dhar;Vicente Ordonez;Tamara L Berg Publication Year: 2011,Page(s):1657 – 1664 Cited by: Papers (214) | Patents (17)
  33. Online detection of unusual events in videos via dynamic sparse coding Bin Zhao;Li Fei-Fei;Eric P. Xing Publication Year: 2011,Page(s):3313 – 3320 Cited by: Papers (213) | Patents (4)
  34. Image retrieval with geometry-preserving visual phrases Yimeng Zhang;Zhaoyin Jia;Tsuhan Chen Publication Year: 2011,Page(s):809 – 816 Cited by: Papers (208) | Patents (9)
  35. Bypassing synthesis: PLS for face recognition with pose, low-resolution and sketch Abhishek Sharma;David W Jacobs Publication Year: 2011,Page(s):593 – 600 Cited by: Papers (206) | Patents (1)
  36. Saliency estimation using a non-parametric low-level vision model Naila Murray;Maria Vanrell;Xavier Otazu;C. Alejandro Parraga Publication Year: 2011,Page(s):433 – 440 Cited by: Papers (197) | Patents (1)
  37. Who are you with and where are you going? Kota Yamaguchi;Alexander C. Berg;Luis E. Ortiz;Tamara L. Berg Publication Year: 2011,Page(s):1345 – 1352 Cited by: Papers (197) | Patents (4)
  38. Recognition using visual phrases Mohammad Amin Sadeghi;Ali Farhadi Publication Year: 2011,Page(s):1745 – 1752 Cited by: Papers (196) | Patents (1)
  39. A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation Laurent Kneip;Davide Scaramuzza;Roland Siegwart Publication Year: 2011,Page(s):2969 – 2976 Cited by: Papers (194) | Patents (6)
  40. Real-time visual tracking using compressive sensing Hanxi Li;Chunhua Shen;Qinfeng Shi Publication Year: 2011,Page(s):1305 – 1312 Cited by: Papers (191) | Patents (2)
  41. Is face recognition really a Compressive Sensing problem? Qinfeng Shi;Anders Eriksson;Anton van den Hengel;Chunhua Shen Publication Year: 2011,Page(s):553 – 560 Cited by: Papers (187)
  42. Multi-target tracking by continuous energy minimization Anton Andriyenko;Konrad Schindler Publication Year: 2011,Page(s):1265 – 1272 Cited by: Papers (186) | Patents (3)
  43. Image ranking and retrieval based on multi-attribute queries Behjat Siddiquie;Rogerio S. Feris;Larry S. Davis Publication Year: 2011,Page(s):801 – 808 Cited by: Papers (182) | Patents (6)
  44. Combining randomization and discrimination for fine-grained image categorization Bangpeng Yao;Aditya Khosla;Li Fei-Fei Publication Year: 2011,Page(s):1577 – 1584 Cited by: Papers (177) | Patents (5)
  45. Action recognition from a distributed representation of pose and appearance Subhransu Maji;Lubomir Bourdev;Jitendra Malik Publication Year: 2011,Page(s):3177 – 3184 Cited by: Papers (173) | Patents (1)
  46. Object cosegmentation Sara Vicente;Carsten Rother;Vladimir Kolmogorov Publication Year: 2011,Page(s):2217 – 2224 Cited by: Papers (173) | Patents (2)
  47. Visual saliency detection by spatially weighted dissimilarity Lijuan Duan;Chunpeng Wu;Jun Miao;Laiyun Qing;Yu Fu Publication Year: 2011,Page(s):473 – 480 Cited by: Papers (169) | Patents (1)
  48. Total recall II: Query expansion revisited Ondřej Chum;Andrej Mikulík;Michal Perdoch;Jiří Matas Publication Year: 2011,Page(s):889 – 896 Cited by: Papers (168) | Patents (6)
  49. Ordinal hyperplanes ranker with cost sensitivities for age estimation Kuang-Yu Chang;Chu-Song Chen;Yi-Ping Hung Publication Year: 2011,Page(s):585 – 592 Cited by: Papers (167)
  50. Learning to recognize objects in egocentric activities Alireza Fathi;Xiaofeng Ren;James M. Rehg Publication Year: 2011,Page(s):3281 – 3288 Cited by: Papers (166) | Patents (2)

