Top Cited Papers: CVPR 2018

Best Paper Award

“Taskonomy: Disentangling Task Transfer Learning” by A. R. Zamir, A. Sax, W. Shen, L. J. Guibas, J. Malik, S. Savarese

Honorable Mention

“Deep Learning of Graph Matching” by A. Zanfir and C. Sminchisescu
“SPLATNet: Sparse Lattice Networks for Point Cloud Processing” by H. Su, V. Jampani, D. Sun, S. Maji, E. Kalogerakis, M.-H. Yang, and J. Kautz
“CodeSLAM — Learning a Compact, Optimisable Representation for Dense Visual SLAM” by M. Bloesch, J. Czarnowski, R. Clark, S. Leutenegger, and A. J. Davison
“Efficient Optimization for Rank-Based Loss Functions” by P. Mohapatra, M. Rolinek, C. V. Jawahar, V. Kolmogorov, and M. Pawan Kumar

Best Student Paper Award

“Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies” by H. Joo, T. Simon, Y. Sheikh

Longuet-Higgins Prize (Test of Time)

“A Discriminatively Trained, Multiscale, Deformable Part Model” by P. Felzenszwalb, D. McAllester, and D. Ramanan

Curated Papers:

MobileNetV2: Inverted Residuals and Linear Bottlenecks

In this paper we describe a new mobile architecture, MobileNetV2, that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. MobileNetV2 is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet [1] classification, COCO object detection [2], and VOC image segmentation [3]. We evaluate the trade-offs between accuracy and number of operations measured by multiply-adds (MAdds), as well as actual latency and the number of parameters.
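The block structure described above (1×1 expansion, depthwise 3×3 filtering, linear 1×1 projection, shortcut between the thin bottleneck layers) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function names, weight shapes, and the expansion factor `t = 6` are illustrative assumptions, and stride-1 / equal-channel shapes are assumed so the shortcut applies.

```python
import numpy as np

def relu6(x):
    # ReLU6 non-linearity used in the MobileNet family
    return np.clip(x, 0.0, 6.0)

def depthwise_conv3x3(x, k):
    """Per-channel 3x3 convolution, zero padding, stride 1.
    x: (C, H, W) feature map, k: (C, 3, 3) one filter per channel."""
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += k[:, dy, dx][:, None, None] * xp[:, dy:dy + h, dx:dx + w]
    return out

def inverted_residual(x, w_expand, k_dw, w_project):
    """Inverted residual block (sketch): 1x1 expand -> ReLU6 ->
    3x3 depthwise -> ReLU6 -> 1x1 LINEAR projection (no non-linearity
    in the narrow layer) -> shortcut between the thin bottlenecks."""
    h = relu6(np.einsum('ec,chw->ehw', w_expand, x))   # expand channels
    h = relu6(depthwise_conv3x3(h, k_dw))              # spatial filtering
    y = np.einsum('ce,ehw->chw', w_project, h)         # linear bottleneck
    return x + y                                       # residual shortcut

# usage: C input channels expanded by factor t inside the block
rng = np.random.default_rng(0)
C, H, W, t = 4, 6, 6, 6
x = rng.standard_normal((C, H, W))
w_expand = rng.standard_normal((t * C, C)) * 0.1
k_dw = rng.standard_normal((t * C, 3, 3)) * 0.1
w_project = rng.standard_normal((C, t * C)) * 0.1
y = inverted_residual(x, w_expand, k_dw, w_project)
```

Note how the projection back to the thin representation is deliberately linear, matching the paper's observation that non-linearities in the narrow layers destroy information.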

Squeeze-and-Excitation Networks

Convolutional neural networks are built upon the convolution operation, which extracts informative features by fusing spatial and channel-wise information together within local receptive fields. In order to boost the representational power of a network, several recent approaches have shown the benefit of enhancing spatial encoding. In this work, we focus on the channel relationship and propose a novel architectural unit, which we term the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We demonstrate that by stacking these blocks together, we can construct SENet architectures that generalise extremely well across challenging datasets. Crucially, we find that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost. SENets formed the foundation of our ILSVRC 2017 classification submission which won first place and significantly reduced the top-5 error to 2.251%, achieving a ~25% relative improvement over the winning entry of 2016. Code and models are available at https://github.com/hujie-frank/SENet.
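The squeeze-excite-scale pipeline the abstract describes can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' code: the function name, the reduction ratio `r = 4`, and the bias-free fully connected weights are assumptions for the sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) feature map (sketch).

    Squeeze: global average pooling collapses spatial dims -> (C,)
    Excite:  FC -> ReLU -> FC -> sigmoid produces per-channel gates in (0, 1)
    Scale:   each channel of x is recalibrated by its gate.
    """
    s = x.mean(axis=(1, 2))          # squeeze: channel descriptor (C,)
    z = np.maximum(w1 @ s, 0.0)      # reduction FC + ReLU: (C // r,)
    g = sigmoid(w2 @ z)              # expansion FC + sigmoid gates: (C,)
    return x * g[:, None, None]      # channel-wise recalibration

# usage with an assumed reduction ratio r = 4
rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 5, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y = se_block(x, w1, w2)
```

Because the gates lie strictly in (0, 1), the block can only attenuate channels relative to one another, which is exactly the adaptive recalibration of channel-wise responses the abstract refers to.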

Learning Transferable Architectures for Scalable Image Recognition

Developing neural network image classification models often requires significant architecture engineering. In this paper, we study a method to learn the model architectures directly on the dataset of interest. As this approach is expensive when the dataset is large, we propose to search for an architectural building block on a small dataset and then transfer the block to a larger dataset. The key contribution of this work is the design of a new search space (which we call the “NASNet search space”) which enables transferability. In our experiments, we search for the best convolutional layer (or “cell”) on the CIFAR-10 dataset and then apply this cell to the ImageNet dataset by stacking together more copies of this cell, each with their own parameters to design a convolutional architecture, which we name a “NASNet architecture”. We also introduce a new regularization technique called ScheduledDropPath that significantly improves generalization in the NASNet models. On CIFAR-10 itself, a NASNet found by our method achieves 2.4% error rate, which is state-of-the-art. Although the cell is not searched for directly on ImageNet, a NASNet constructed from the best cell achieves, among the published works, state-of-the-art accuracy of 82.7% top-1 and 96.2% top-5 on ImageNet. Our model is 1.2% better in top-1 accuracy than the best human-invented architectures while having 9 billion fewer FLOPS – a reduction of 28% in computational demand from the previous state-of-the-art model. When evaluated at different levels of computational cost, accuracies of NASNets exceed those of the state-of-the-art human-designed models. For instance, a small version of NASNet also achieves 74% top-1 accuracy, which is 3.1% better than equivalently-sized, state-of-the-art models for mobile platforms. Finally, the image features learned from image classification are generically useful and can be transferred to other computer vision problems. 
On the task of object detection, the learned features b…
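The ScheduledDropPath regularizer mentioned above can be sketched as follows: each candidate path feeding a cell is dropped with a probability that is increased linearly over training, and surviving paths are rescaled so the expected sum is unchanged. This is a hedged reconstruction from the abstract's description, not the paper's code; the function name, the linear schedule, and `max_drop_prob` are assumptions.

```python
import numpy as np

def scheduled_drop_path(paths, step, total_steps, max_drop_prob=0.3, rng=None):
    """ScheduledDropPath-style regularization (sketch).

    paths: list of same-shaped arrays, the outputs of candidate paths
           that would normally be summed inside a cell.
    The drop probability ramps linearly from 0 to max_drop_prob over
    training; kept paths are divided by keep_prob so the expected
    contribution of each path is preserved (inverted scaling).
    """
    rng = rng or np.random.default_rng()
    drop_prob = max_drop_prob * (step / total_steps)  # linear schedule
    keep_prob = 1.0 - drop_prob
    out = np.zeros_like(paths[0])
    for p in paths:
        if rng.random() < keep_prob:                  # keep this path?
            out += p / keep_prob
    return out

# usage: at step 0 nothing is dropped, so the paths are simply summed
paths = [np.ones((2, 2)), 2 * np.ones((2, 2))]
out0 = scheduled_drop_path(paths, step=0, total_steps=100)
```

Ramping the drop rate rather than fixing it lets the network train stably early on while still regularizing heavily by the end of training, which is the intuition behind scheduling.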

Full List:

  1. Squeeze-and-Excitation Networks Jie Hu;Li Shen;Gang Sun Publication Year: 2018,Page(s):7132 – 7141 Cited by: Papers (1036)
  2. MobileNetV2: Inverted Residuals and Linear Bottlenecks Mark Sandler;Andrew Howard;Menglong Zhu;Andrey Zhmoginov;Liang-Chieh Chen Publication Year: 2018,Page(s):4510 – 4520 Cited by: Papers (829)
  3. Learning Transferable Architectures for Scalable Image Recognition Barret Zoph;Vijay Vasudevan;Jonathon Shlens;Quoc V. Le Publication Year: 2018,Page(s):8697 – 8710 Cited by: Papers (616) | Patents (1)
  4. Non-local Neural Networks Xiaolong Wang;Ross Girshick;Abhinav Gupta;Kaiming He Publication Year: 2018,Page(s):7794 – 7803 Cited by: Papers (536)
  5. PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume Deqing Sun;Xiaodong Yang;Ming-Yu Liu;Jan Kautz Publication Year: 2018,Page(s):8934 – 8943 Cited by: Papers (348)
  6. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices Xiangyu Zhang;Xinyu Zhou;Mengxiao Lin;Jian Sun Publication Year: 2018,Page(s):6848 – 6856 Cited by: Papers (332) | Patents (2)
  7. Residual Dense Network for Image Super-Resolution Yulun Zhang;Yapeng Tian;Yu Kong;Bineng Zhong;Yun Fu Publication Year: 2018,Page(s):2472 – 2481 Cited by: Papers (325)
  8. Harmonious Attention Network for Person Re-identification Wei Li;Xiatian Zhu;Shaogang Gong Publication Year: 2018,Page(s):2285 – 2294 Cited by: Papers (309)
  9. StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation Yunjey Choi;Minje Choi;Munyoung Kim;Jung-Woo Ha;Sunghun Kim;Jaegul Choo Publication Year: 2018,Page(s):8789 – 8797 Cited by: Papers (271)
  10. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs Ting-Chun Wang;Ming-Yu Liu;Jun-Yan Zhu;Andrew Tao;Jan Kautz;Bryan Catanzaro Publication Year: 2018,Page(s):8798 – 8807 Cited by: Papers (270)
  11. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering Peter Anderson;Xiaodong He;Chris Buehler;Damien Teney;Mark Johnson;Stephen Gould;Lei Zhang Publication Year: 2018,Page(s):6077 – 6086 Cited by: Papers (261) | Patents (1)
  12. Path Aggregation Network for Instance Segmentation Shu Liu;Lu Qi;Haifang Qin;Jianping Shi;Jiaya Jia Publication Year: 2018,Page(s):8759 – 8768 Cited by: Papers (253)
  13. Cascade R-CNN: Delving Into High Quality Object Detection Zhaowei Cai;Nuno Vasconcelos Publication Year: 2018,Page(s):6154 – 6162 Cited by: Papers (245)
  14. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric Richard Zhang;Phillip Isola;Alexei A. Efros;Eli Shechtman;Oliver Wang Publication Year: 2018,Page(s):586 – 595 Cited by: Papers (231)
  15. High Performance Visual Tracking with Siamese Region Proposal Network Bo Li;Junjie Yan;Wei Wu;Zheng Zhu;Xiaolin Hu Publication Year: 2018,Page(s):8971 – 8980 Cited by: Papers (225)
  16. Generative Image Inpainting with Contextual Attention Jiahui Yu;Zhe Lin;Jimei Yang;Xiaohui Shen;Xin Lu;Thomas S. Huang Publication Year: 2018,Page(s):5505 – 5514 Cited by: Papers (222)
  17. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection Yin Zhou;Oncel Tuzel Publication Year: 2018,Page(s):4490 – 4499 Cited by: Papers (214)
  18. Frustum PointNets for 3D Object Detection from RGB-D Data Charles R. Qi;Wei Liu;Chenxia Wu;Hao Su;Leonidas J. Guibas Publication Year: 2018,Page(s):918 – 927 Cited by: Papers (200) | Patents (3)
  19. CosFace: Large Margin Cosine Loss for Deep Face Recognition Hao Wang;Yitong Wang;Zheng Zhou;Xing Ji;Dihong Gong;Jingchao Zhou;Zhifeng Li;Wei Liu Publication Year: 2018,Page(s):5265 – 5274 Cited by: Papers (180)
  20. Pyramid Stereo Matching Network Jia-Ren Chang;Yong-Sheng Chen Publication Year: 2018,Page(s):5410 – 5418 Cited by: Papers (172)
  21. Deep Ordinal Regression Network for Monocular Depth Estimation Huan Fu;Mingming Gong;Chaohui Wang;Kayhan Batmanghelich;Dacheng Tao Publication Year: 2018,Page(s):2002 – 2011 Cited by: Papers (169)
  22. A Closer Look at Spatiotemporal Convolutions for Action Recognition Du Tran;Heng Wang;Lorenzo Torresani;Jamie Ray;Yann LeCun;Manohar Paluri Publication Year: 2018,Page(s):6450 – 6459 Cited by: Papers (167)
  23. Learning to Compare: Relation Network for Few-Shot Learning Flood Sung;Yongxin Yang;Li Zhang;Tao Xiang;Philip H.S. Torr;Timothy M. Hospedales Publication Year: 2018,Page(s):1199 – 1208 Cited by: Papers (163)
  24. GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose Zhichao Yin;Jianping Shi Publication Year: 2018,Page(s):1983 – 1992 Cited by: Papers (158) | Patents (1)
  25. Single-Shot Refinement Neural Network for Object Detection Shifeng Zhang;Longyin Wen;Xiao Bian;Zhen Lei;Stan Z. Li Publication Year: 2018,Page(s):4203 – 4212 Cited by: Papers (155)
  26. Real-World Anomaly Detection in Surveillance Videos Waqas Sultani;Chen Chen;Mubarak Shah Publication Year: 2018,Page(s):6479 – 6488 Cited by: Papers (152)
  27. Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? Kensho Hara;Hirokatsu Kataoka;Yutaka Satoh Publication Year: 2018,Page(s):6546 – 6555 Cited by: Papers (146)
  28. Deep Back-Projection Networks for Super-Resolution Muhammad Haris;Greg Shakhnarovich;Norimichi Ukita Publication Year: 2018,Page(s):1664 – 1673 Cited by: Papers (143)
  29. End-to-End Recovery of Human Shape and Pose Angjoo Kanazawa;Michael J. Black;David W. Jacobs;Jitendra Malik Publication Year: 2018,Page(s):7122 – 7131 Cited by: Papers (143)
  30. Context Encoding for Semantic Segmentation Hang Zhang;Kristin Dana;Jianping Shi;Zhongyue Zhang;Xiaogang Wang;Ambrish Tyagi;Amit Agrawal Publication Year: 2018,Page(s):7151 – 7160 Cited by: Papers (142)
  31. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference Benoit Jacob;Skirmantas Kligys;Bo Chen;Menglong Zhu;Matthew Tang;Andrew Howard;Hartwig Adam;Dmitry Kalenichenko Publication Year: 2018,Page(s):2704 – 2713 Cited by: Papers (141) | Patents (1)
  32. Learning to See in the Dark Chen Chen;Qifeng Chen;Jia Xu;Vladlen Koltun Publication Year: 2018,Page(s):3291 – 3300 Cited by: Papers (140)
  33. Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints Reza Mahjourian;Martin Wicke;Anelia Angelova Publication Year: 2018,Page(s):5667 – 5675 Cited by: Papers (126)
  34. Progressive Attention Guided Recurrent Network for Salient Object Detection Xiaoning Zhang;Tiantian Wang;Jinqing Qi;Huchuan Lu;Gang Wang Publication Year: 2018,Page(s):714 – 722 Cited by: Papers (125)
  35. Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks Agrim Gupta;Justin Johnson;Li Fei-Fei;Silvio Savarese;Alexandre Alahi Publication Year: 2018,Page(s):2255 – 2264 Cited by: Papers (124)
  36. 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks Benjamin Graham;Martin Engelcke;Laurens van der Maaten Publication Year: 2018,Page(s):9224 – 9232 Cited by: Papers (121)
  37. Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking Qiang Wang;Zhu Teng;Junliang Xing;Jin Gao;Weiming Hu;Stephen Maybank Publication Year: 2018,Page(s):4854 – 4863 Cited by: Papers (121)
  38. Person Transfer GAN to Bridge Domain Gap for Person Re-identification Longhui Wei;Shiliang Zhang;Wen Gao;Qi Tian Publication Year: 2018,Page(s):79 – 88 Cited by: Papers (120)
  39. Learning Depth from Monocular Videos Using Direct Methods Chaoyang Wang;José Miguel Buenaposada;Rui Zhu;Simon Lucey Publication Year: 2018,Page(s):2022 – 2030 Cited by: Papers (119)
  40. DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks Orest Kupyn;Volodymyr Budzan;Mykola Mykhailych;Dmytro Mishkin;Jiri Matas Publication Year: 2018,Page(s):8183 – 8192 Cited by: Papers (117) | Patents (1)
  41. Learning to Adapt Structured Output Space for Semantic Segmentation Yi-Hsuan Tsai;Wei-Chih Hung;Samuel Schulter;Kihyuk Sohn;Ming-Hsuan Yang;Manmohan Chandraker Publication Year: 2018,Page(s):7472 – 7481 Cited by: Papers (116)
  42. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks Tao Xu;Pengchuan Zhang;Qiuyuan Huang;Han Zhang;Zhe Gan;Xiaolei Huang;Xiaodong He Publication Year: 2018,Page(s):1316 – 1324 Cited by: Papers (115)
  43. Relation Networks for Object Detection Han Hu;Jiayuan Gu;Zheng Zhang;Jifeng Dai;Yichen Wei Publication Year: 2018,Page(s):3588 – 3597 Cited by: Papers (111)
  44. Taskonomy: Disentangling Task Transfer Learning Amir R. Zamir;Alexander Sax;William Shen;Leonidas Guibas;Jitendra Malik;Silvio Savarese Publication Year: 2018,Page(s):3712 – 3722 Cited by: Papers (107) | Patents (1)
  45. Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction Huangying Zhan;Ravi Garg;Chamara Saroj Weerasekera;Kejie Li;Harsh Agarwal;Ian M. Reid Publication Year: 2018,Page(s):340 – 349 Cited by: Papers (107)
  46. Scale-Recurrent Network for Deep Image Deblurring Xin Tao;Hongyun Gao;Xiaoyong Shen;Jue Wang;Jiaya Jia Publication Year: 2018,Page(s):8174 – 8182 Cited by: Papers (104) | Patents (1)
  47. Densely Connected Pyramid Dehazing Network He Zhang;Vishal M. Patel Publication Year: 2018,Page(s):3194 – 3203 Cited by: Papers (103)
  48. DensePose: Dense Human Pose Estimation in the Wild Riza Alp Güler;Natalia Neverova;Iasonas Kokkinos Publication Year: 2018,Page(s):7297 – 7306 Cited by: Papers (102)
  49. Learning a Discriminative Feature Network for Semantic Segmentation Changqian Yu;Jingbo Wang;Chao Peng;Changxin Gao;Gang Yu;Nong Sang Publication Year: 2018,Page(s):1857 – 1866 Cited by: Papers (102)
  50. Maximum Classifier Discrepancy for Unsupervised Domain Adaptation Kuniaki Saito;Kohei Watanabe;Yoshitaka Ushiku;Tatsuya Harada Publication Year: 2018,Page(s):3723 – 3732 Cited by: Papers (101)

Source: https://ieeexplore.ieee.org/xpl/conhome/8576498/proceeding?sortType=paper-citations&rowsPerPage=50&pageNumber=1
