Top Cited Papers: CVPR 2016

Best Paper Award

2016

“Deep Residual Learning for Image Recognition”

K. He, X. Zhang, S. Ren, J. Sun

Honorable Mention

2016

“Sublabel-Accurate Relaxation of Nonconvex Energies”

T. Mollenhoff, E. Laude, M. Moeller, J. Lellmann, D. Cremers

Best Student Paper Award

2016

“Structural-RNN: Deep Learning on Spatio-Temporal Graphs”

A. Jain, A. R. Zamir, S. Savarese, A. Saxena

Longuet-Higgins Prize (Test-of-Time)

2016

“Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories”

S. Lazebnik, C. Schmid, J. Ponce

2016

“Scalable Recognition with a Vocabulary Tree”

D. Nister and H. Stewenius

Curated Papers:

Deep Residual Learning for Image Recognition

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers – 8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
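The "residual functions with reference to the layer inputs" idea in the abstract above can be illustrated with a toy block that computes F(x) + x. This is only a minimal sketch, not the authors' implementation: it uses small dense matrices instead of convolutions, omits batch normalization and biases, and the names `residual_block`, `linear`, and `relu` are chosen here for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(weights: Matrix, v: Vector) -> Vector:
    """Multiply a weight matrix by a vector (no bias, for brevity)."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Return relu(F(x) + x) with F(x) = w2 * relu(w1 * x)."""
    residual = linear(w2, relu(linear(w1, x)))  # the learned residual F(x)
    # The identity shortcut adds the input back before the final activation.
    return relu([r + xi for r, xi in zip(residual, x)])


# With all-zero weights F(x) is zero, so a non-negative input passes through unchanged.
zero = [[0.0, 0.0], [0.0, 0.0]]
identity_out = residual_block([1.5, 2.0], zero, zero)

# With non-zero weights the block outputs the input plus a learned correction.
eye = [[1.0, 0.0], [0.0, 1.0]]
half = [[0.5, 0.0], [0.0, 0.5]]
corrected_out = residual_block([1.5, 2.0], eye, half)
```

The zero-weight case hints at why such blocks may be easier to optimize: doing nothing (an identity mapping) corresponds to driving the residual toward zero rather than fitting an identity through stacked nonlinear layers.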

You Only Look Once: Unified, Real-Time Object Detection

We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
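To make "detection as regression" in the abstract above more concrete, here is a rough sketch of how a single fixed-size output tensor can encode boxes and class probabilities. The grid size 7, 2 boxes per cell, and 20 classes are the settings the YOLO paper reports for PASCAL VOC; they do not appear in the abstract. The function names and the 448-pixel image size in the example are assumptions made for this illustration, and the decoding shown is a simplified reading of the scheme, not the authors' code.

```python
from typing import Tuple


def yolo_output_shape(grid: int = 7, boxes: int = 2, classes: int = 20) -> Tuple[int, int, int]:
    """Each grid cell holds `boxes` * (x, y, w, h, confidence) plus one score per class."""
    return (grid, grid, boxes * 5 + classes)


def decode_box(row: int, col: int, x: float, y: float, w: float, h: float,
               grid: int, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Convert one prediction to pixel units: (center_x, center_y, width, height).

    x and y are offsets inside the grid cell; w and h are fractions of the whole image.
    """
    center_x = (col + x) / grid * img_w
    center_y = (row + y) / grid * img_h
    return (center_x, center_y, w * img_w, h * img_h)


def ms_per_frame(fps: float) -> float:
    """Per-image time budget implied by a frame rate."""
    return 1000.0 / fps


shape = yolo_output_shape()
center_box = decode_box(row=3, col=3, x=0.5, y=0.5, w=0.25, h=0.5,
                        grid=7, img_w=448, img_h=448)
base_budget = ms_per_frame(45)    # base YOLO, roughly 22 ms per image
fast_budget = ms_per_frame(155)   # Fast YOLO, roughly 6.5 ms per image
```

Because the whole prediction is one tensor produced by one forward pass, the frame rates quoted in the abstract translate directly into a per-image time budget.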

Rethinking the Inception Architecture for Computer Vision

Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014, very deep convolutional networks have started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we explore ways to scale up networks that aim at utilizing the added computation as efficiently as possible through suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set and demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single-frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and fewer than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error and 17.3% top-1 error on the validation set and 3.6% top-5 error on the official test set.
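The "factorized convolutions" mentioned in the abstract above save computation by replacing one large filter with a stack of smaller ones. The sketch below simply counts weights to show the effect. The two factorizations (a 5×5 filter into two 3×3 filters, and a 3×3 filter into a 3×1 followed by a 1×3) are the ones discussed in the paper; the channel count of 64 and the helper name `conv_weights` are assumptions for illustration, and biases are ignored.

```python
def conv_weights(kernel_h: int, kernel_w: int, c_in: int, c_out: int) -> int:
    """Number of weights in a convolution layer, ignoring biases."""
    return kernel_h * kernel_w * c_in * c_out


channels = 64  # hypothetical channel count, kept equal for input and output

# One 5x5 convolution versus two stacked 3x3 convolutions (same receptive field).
single_5x5 = conv_weights(5, 5, channels, channels)
two_3x3 = 2 * conv_weights(3, 3, channels, channels)
ratio_5x5 = two_3x3 / single_5x5          # 18 / 25, i.e. about 28% fewer weights

# One 3x3 convolution versus a 3x1 followed by a 1x3 (asymmetric factorization).
single_3x3 = conv_weights(3, 3, channels, channels)
asymmetric = conv_weights(3, 1, channels, channels) + conv_weights(1, 3, channels, channels)
ratio_3x3 = asymmetric / single_3x3       # 6 / 9, i.e. about 33% fewer weights
```

The ratios do not depend on the channel count as long as it is the same on both sides of the comparison, so the chosen value only affects the absolute numbers.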

Full List:

“Deep Residual Learning for Image Recognition” Kaiming He; Xiangyu Zhang; Shaoqing Ren; Jian Sun. Publication Year: 2016, Page(s): 770–778. Cited by: Papers (23748) | Patents (42)
“You Only Look Once: Unified, Real-Time Object Detection” Joseph Redmon; Santosh Divvala; Ross Girshick; Ali Farhadi. Publication Year: 2016, Page(s): 779–788. Cited by: Papers (5067) | Patents (28)
“Rethinking the Inception Architecture for Computer Vision” Christian Szegedy; Vincent Vanhoucke; Sergey Ioffe; Jon Shlens; Zbigniew Wojna. Publication Year: 2016, Page(s): 2818–2826. Cited by: Papers (3955) | Patents (13)
“The Cityscapes Dataset for Semantic Urban Scene Understanding” Marius Cordts; Mohamed Omran; Sebastian Ramos; Timo Rehfeld; Markus Enzweiler; Rodrigo Benenson; Uwe Franke; Stefan Roth; Bernt Schiele. Publication Year: 2016, Page(s): 3213–3223. Cited by: Papers (1799) | Patents (21)
“Accurate Image Super-Resolution Using Very Deep Convolutional Networks” Jiwon Kim; Jung Kwon Lee; Kyoung Mu Lee. Publication Year: 2016, Page(s): 1646–1654. Cited by: Papers (1418) | Patents (5)
“Learning Deep Features for Discriminative Localization” Bolei Zhou; Aditya Khosla; Agata Lapedriza; Aude Oliva; Antonio Torralba. Publication Year: 2016, Page(s): 2921–2929. Cited by: Papers (1277) | Patents (7)
“Image Style Transfer Using Convolutional Neural Networks” Leon A. Gatys; Alexander S. Ecker; Matthias Bethge. Publication Year: 2016, Page(s): 2414–2423. Cited by: Papers (1050) | Patents (16)
“Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network” Wenzhe Shi; Jose Caballero; Ferenc Huszár; Johannes Totz; Andrew P. Aitken; Rob Bishop; Daniel Rueckert; Zehan Wang. Publication Year: 2016, Page(s): 1874–1883. Cited by: Papers (1024) | Patents (5)
“Context Encoders: Feature Learning by Inpainting” Deepak Pathak; Philipp Krähenbühl; Jeff Donahue; Trevor Darrell; Alexei A. Efros. Publication Year: 2016, Page(s): 2536–2544. Cited by: Papers (989) | Patents (3)
“Convolutional Two-Stream Network Fusion for Video Action Recognition” Christoph Feichtenhofer; Axel Pinz; Andrew Zisserman. Publication Year: 2016, Page(s): 1933–1941. Cited by: Papers (802) | Patents (10)
“Convolutional Pose Machines” Shih-En Wei; Varun Ramakrishna; Takeo Kanade; Yaser Sheikh. Publication Year: 2016, Page(s): 4724–4732. Cited by: Papers (792) | Patents (1)
“Learning Multi-domain Convolutional Neural Networks for Visual Tracking” Hyeonseob Nam; Bohyung Han. Publication Year: 2016, Page(s): 4293–4302. Cited by: Papers (755)
“Deeply-Recursive Convolutional Network for Image Super-Resolution” Jiwon Kim; Jung Kwon Lee; Kyoung Mu Lee. Publication Year: 2016, Page(s): 1637–1645. Cited by: Papers (719)
“Staple: Complementary Learners for Real-Time Tracking” Luca Bertinetto; Jack Valmadre; Stuart Golodetz; Ondrej Miksik; Philip H. S. Torr. Publication Year: 2016, Page(s): 1401–1409. Cited by: Papers (617)
“A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation” Nikolaus Mayer; Eddy Ilg; Philip Häusser; Philipp Fischer; Daniel Cremers; Alexey Dosovitskiy; Thomas Brox. Publication Year: 2016, Page(s): 4040–4048. Cited by: Papers (523)
“Training Region-Based Object Detectors with Online Hard Example Mining” Abhinav Shrivastava; Abhinav Gupta; Ross Girshick. Publication Year: 2016, Page(s): 761–769. Cited by: Papers (523) | Patents (1)
“Structure-from-Motion Revisited” Johannes L. Schönberger; Jan-Michael Frahm. Publication Year: 2016, Page(s): 4104–4113. Cited by: Papers (517) | Patents (2)
“Social LSTM: Human Trajectory Prediction in Crowded Spaces” Alexandre Alahi; Kratarth Goel; Vignesh Ramanathan; Alexandre Robicquet; Li Fei-Fei; Silvio Savarese. Publication Year: 2016, Page(s): 961–971. Cited by: Papers (485)
“DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks” Seyed-Mohsen Moosavi-Dezfooli; Alhussein Fawzi; Pascal Frossard. Publication Year: 2016, Page(s): 2574–2582. Cited by: Papers (473) | Patents (1)
“Stacked Attention Networks for Image Question Answering” Zichao Yang; Xiaodong He; Jianfeng Gao; Li Deng; Alex Smola. Publication Year: 2016, Page(s): 21–29. Cited by: Papers (450) | Patents (2)
“The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes” German Ros; Laura Sellart; Joanna Materzynska; David Vazquez; Antonio M. Lopez. Publication Year: 2016, Page(s): 3234–3243. Cited by: Papers (441) | Patents (2)
“Learning Deep Feature Representations with Domain Guided Dropout for Person Re-identification” Tong Xiao; Hongsheng Li; Wanli Ouyang; Xiaogang Wang. Publication Year: 2016, Page(s): 1249–1258. Cited by: Papers (428) | Patents (3)
“Image Captioning with Semantic Attention” Quanzeng You; Hailin Jin; Zhaowen Wang; Chen Fang; Jiebo Luo. Publication Year: 2016, Page(s): 4651–4659. Cited by: Papers (420)
“NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis” Amir Shahroudy; Jun Liu; Tian-Tsong Ng; Gang Wang. Publication Year: 2016, Page(s): 1010–1019. Cited by: Papers (420)
“Person Re-identification by Multi-Channel Parts-Based CNN with Improved Triplet Loss Function” De Cheng; Yihong Gong; Sanping Zhou; Jinjun Wang; Nanning Zheng. Publication Year: 2016, Page(s): 1335–1344. Cited by: Papers (420) | Patents (1)
“Volumetric and Multi-view CNNs for Object Classification on 3D Data” Charles R. Qi; Hao Su; Matthias Nießner; Angela Dai; Mengyuan Yan; Leonidas J. Guibas. Publication Year: 2016, Page(s): 5648–5656. Cited by: Papers (418) | Patents (6)
“Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks” Sean Bell; C. Lawrence Zitnick; Kavita Bala; Ross Girshick. Publication Year: 2016, Page(s): 2874–2883. Cited by: Papers (416) | Patents (6)
“Single-Image Crowd Counting via Multi-Column Convolutional Neural Network” Yingying Zhang; Desen Zhou; Siqin Chen; Shenghua Gao; Yi Ma. Publication Year: 2016, Page(s): 589–597. Cited by: Papers (383) | Patents (3)
“A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation” F. Perazzi; J. Pont-Tuset; B. McWilliams; L. Van Gool; M. Gross; A. Sorkine-Hornung. Publication Year: 2016, Page(s): 724–732. Cited by: Papers (363)
“Instance-Aware Semantic Segmentation via Multi-task Network Cascades” Jifeng Dai; Kaiming He; Jian Sun. Publication Year: 2016, Page(s): 3150–3158. Cited by: Papers (361) | Patents (12)
“Attention to Scale: Scale-Aware Semantic Image Segmentation” Liang-Chieh Chen; Yi Yang; Jiang Wang; Wei Xu; Alan L. Yuille. Publication Year: 2016, Page(s): 3640–3649. Cited by: Papers (360) | Patents (1)
“Siamese Instance Search for Tracking” Ran Tao; Efstratios Gavves; Arnold W. M. Smeulders. Publication Year: 2016, Page(s): 1420–1429. Cited by: Papers (358) | Patents (2)
“NetVLAD: CNN Architecture for Weakly Supervised Place Recognition” Relja Arandjelovic; Petr Gronat; Akihiko Torii; Tomas Pajdla; Josef Sivic. Publication Year: 2016, Page(s): 5297–5307. Cited by: Papers (352) | Patents (1)
“Deep Metric Learning via Lifted Structured Feature Embedding” Hyun Oh Song; Yu Xiang; Stefanie Jegelka; Silvio Savarese. Publication Year: 2016, Page(s): 4004–4012. Cited by: Papers (343) | Patents (3)
“DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations” Ziwei Liu; Ping Luo; Shi Qiu; Xiaogang Wang; Xiaoou Tang. Publication Year: 2016, Page(s): 1096–1104. Cited by: Papers (342)
“Hedged Deep Tracking” Yuankai Qi; Shengping Zhang; Lei Qin; Hongxun Yao; Qingming Huang; Jongwoo Lim; Ming-Hsuan Yang. Publication Year: 2016, Page(s): 4303–4311. Cited by: Papers (340) | Patents (1)
“Synthetic Data for Text Localisation in Natural Images” Ankush Gupta; Andrea Vedaldi; Andrew Zisserman. Publication Year: 2016, Page(s): 2315–2324. Cited by: Papers (328)
“WIDER FACE: A Face Detection Benchmark” Shuo Yang; Ping Luo; Chen Change Loy; Xiaoou Tang. Publication Year: 2016, Page(s): 5525–5533. Cited by: Papers (327)
“DenseCap: Fully Convolutional Localization Networks for Dense Captioning” Justin Johnson; Andrej Karpathy; Li Fei-Fei. Publication Year: 2016, Page(s): 4565–4574. Cited by: Papers (325) | Patents (6)
“Learning a Discriminative Null Space for Person Re-identification” Li Zhang; Tao Xiang; Shaogang Gong. Publication Year: 2016, Page(s): 1239–1248. Cited by: Papers (323) | Patents (4)
“Non-local Image Dehazing” Dana Berman; Tali Treibitz; Shai Avidan. Publication Year: 2016, Page(s): 1674–1682. Cited by: Papers (321)
“Deep Contrast Learning for Salient Object Detection” Guanbin Li; Yizhou Yu. Publication Year: 2016, Page(s): 478–487. Cited by: Papers (318)
“Face Alignment Across Large Poses: A 3D Solution” Xiangyu Zhu; Zhen Lei; Xiaoming Liu; Hailin Shi; Stan Z. Li. Publication Year: 2016, Page(s): 146–155. Cited by: Papers (300)
“DHSNet: Deep Hierarchical Saliency Network for Salient Object Detection” Nian Liu; Junwei Han. Publication Year: 2016, Page(s): 678–686. Cited by: Papers (299)
“Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation” Guosheng Lin; Chunhua Shen; Anton van den Hengel; Ian Reid. Publication Year: 2016, Page(s): 3194–3203. Cited by: Papers (290)
“HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection” Tao Kong; Anbang Yao; Yurong Chen; Fuchun Sun. Publication Year: 2016, Page(s): 845–853. Cited by: Papers (290) | Patents (1)
“Efficient Deep Learning for Stereo Matching” Wenjie Luo; Alexander G. Schwing; Raquel Urtasun. Publication Year: 2016, Page(s): 5695–5703. Cited by: Papers (273) | Patents (1)
“Deep Supervised Hashing for Fast Image Retrieval” Haomiao Liu; Ruiping Wang; Shiguang Shan; Xilin Chen. Publication Year: 2016, Page(s): 2064–2072. Cited by: Papers (267)
“CNN-RNN: A Unified Framework for Multi-label Image Classification” Jiang Wang; Yi Yang; Junhua Mao; Zhiheng Huang; Chang Huang; Wei Xu. Publication Year: 2016, Page(s): 2285–2294. Cited by: Papers (258) | Patents (2)
“Hierarchical Gaussian Descriptor for Person Re-identification” Tetsu Matsukawa; Takahiro Okabe; Einoshin Suzuki; Yoichi Sato. Publication Year: 2016, Page(s): 1363–1372. Cited by: Papers (250)

https://ieeexplore.ieee.org/xpl/conhome/7776647/proceeding?rowsPerPage=50&sortType=paper-citations
