CVPR 2021 is the premier annual computer vision event comprising the main conference and several co-located workshops and short courses. With its high quality and low cost, it provides an exceptional value for students, academics and industry researchers.
The 2021 papers have not been fully released yet; this list will be updated as they become available. In the meantime, let's review papers from 2020 and 2019.
This list is continuously updated on GitHub.
Object Detection
Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection Paper address: https://arxiv.org/abs/1912.02424
Code: https://github.com/sfzhang15/ATSS
Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector Paper address: https://arxiv.org/abs/1908.01998
Image segmentation
Semi-Supervised Semantic Image Segmentation with Self-correcting Networks Paper address: https://arxiv.org/abs/1811.07073
Deep Snake for Real-Time Instance Segmentation Paper address: https://arxiv.org/abs/2001.01629
CenterMask: Real-Time Anchor-Free Instance Segmentation Paper address: https://arxiv.org/abs/1911.06667 Code: https://github.com/youngwanLEE/CenterMask
SketchGCN: Semantic Sketch Segmentation with Graph Convolutional Networks Paper address: https://arxiv.org/abs/2003.00678
PolarMask: Single Shot Instance Segmentation with Polar Representation Paper address: https://arxiv.org/abs/1909.13226 Code: https://github.com/xieenze/PolarMask
xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation Paper address: https://arxiv.org/abs/1911.12676
BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation Paper address: https://arxiv.org/abs/2001.00309
Face recognition
Towards Universal Representation Learning for Deep Face Recognition Paper address: https://arxiv.org/abs/2002.11841
Suppressing Uncertainties for Large-Scale Facial Expression Recognition Paper address: https://arxiv.org/abs/2002.10392 Code: https://github.com/kaiwang960112/Self-Cure-Network
Face X-ray for More General Face Forgery Detection Paper address: https://arxiv.org/pdf/1912.13458.pdf
Object Tracking
ROAM: Recurrently Optimizing Tracking Model Paper address: https://arxiv.org/abs/1907.12006
3D point cloud & reconstruction
PF-Net: Point Fractal Network for 3D Point Cloud Completion Paper address: https://arxiv.org/abs/2003.00410
PointAugment: an Auto-Augmentation Framework for Point Cloud Classification Paper address: https://arxiv.org/abs/2002.10876 Code: https://github.com/liruihui/PointAugment/
Learning multiview 3D point cloud registration Paper address: https://arxiv.org/abs/2001.05119
C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds Paper address: https://arxiv.org/abs/1912.07009
RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds Paper address: https://arxiv.org/abs/1911.11236
Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image Paper address: https://arxiv.org/abs/2002.12212
Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion Paper address: https://arxiv.org/abs/2003.01456
In Perfect Shape: Certifiably Optimal 3D Shape Reconstruction from 2D Landmarks Paper address: https://arxiv.org/pdf/1911.11924.pdf
Pose estimation
VIBE: Video Inference for Human Body Pose and Shape Estimation Paper address: https://arxiv.org/abs/1912.05656
Code: https://github.com/mkocabas/VIBE
Distribution-Aware Coordinate Representation for Human Pose Estimation Paper address: https://arxiv.org/abs/1910.06278
Code: https://github.com/ilovepose/DarkPose
4D Association Graph for Realtime Multi-person Motion Capture Using Multiple Video Cameras Paper address: https://arxiv.org/abs/2002.12625
Optimal least-squares solution to the hand-eye calibration problem Paper address: https://arxiv.org/abs/2002.10838
D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry Paper address: https://arxiv.org/abs/2003.01060
Multi-Modal Domain Adaptation for Fine-Grained Action Recognition Paper address: https://arxiv.org/abs/2001.09691
The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation Paper address: https://arxiv.org/abs/1911.07524
PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation Paper address: https://arxiv.org/abs/1911.04231
GAN
Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models Paper address: https://arxiv.org/abs/1911.12287 Code: https://github.com/giannisdaras/ylg
MSG-GAN: Multi-Scale Gradient GAN for Stable Image Synthesis Paper address: https://arxiv.org/abs/1903.06048
Robust Design of Deep Neural Networks against Adversarial Attacks based on Lyapunov Theory Paper address: https://arxiv.org/abs/1911.04636
Few-shot & zero-shot
Improved Few-Shot Visual Classification Paper address: https://arxiv.org/pdf/1912.03432.pdf
Meta-Transfer Learning for Zero-Shot Super-Resolution Paper address: https://arxiv.org/abs/2002.12213
Weak supervision & unsupervised
Rethinking the Route Towards Weakly Supervised Object Localization Paper address: https://arxiv.org/abs/2002.11359
NestedVAE: Isolating Common Factors via Weak Supervision Paper address: https://arxiv.org/abs/2002.11576
Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation Paper address: https://arxiv.org/abs/1911.07450
Disentangling Physical Dynamics from Unknown Factors for Unsupervised Video Prediction Paper address: https://arxiv.org/abs/2003.01460
Neural Networks
Visual Commonsense R-CNN Paper address: https://arxiv.org/abs/2002.12204
GhostNet: More Features from Cheap Operations Paper address: https://arxiv.org/abs/1911.11907 Code: https://github.com/iamhankai/ghostnet
Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions Paper address: https://arxiv.org/abs/2003.01826
Model acceleration
GPU-Accelerated Mobile Multi-view Style Transfer Paper address: https://arxiv.org/abs/2003.00706
Visual common sense
What it Thinks is Important is Important: Robustness Transfers through Input Gradients Paper address: https://arxiv.org/abs/1912.05699
Attentive Context Normalization for Robust Permutation-Equivariant Learning Paper address: https://arxiv.org/abs/1907.02545
Bundle Adjustment on a Graph Processor Paper address: https://arxiv.org/abs/2003.03134 Code: https://github.com/joeaortiz/gbp
Transferring Dense Pose to Proximal Animal Classes Paper address: https://arxiv.org/abs/2003.00080
Representations, Metrics and Statistics For Shape Analysis of Elastic Graphs Paper address: https://arxiv.org/abs/2003.00287
Learning in the Frequency Domain Paper address: https://arxiv.org/abs/2002.12416
Filter Grafting for Deep Neural Networks Paper address: https://arxiv.org/pdf/2001.05868.pdf
ClusterFit: Improving Generalization of Visual Representations Paper address: https://arxiv.org/abs/1912.03330
Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction Paper address: https://arxiv.org/abs/2002.11927
Auto-Encoding Twin-Bottleneck Hashing Paper address: https://arxiv.org/abs/2002.11930
Learning Representations by Predicting Bags of Visual Words Paper address: https://arxiv.org/abs/2002.12247
Holistically-Attracted Wireframe Parsing Paper address: https://arxiv.org/abs/2003.01663
A General and Adaptive Robust Loss Function Paper address: https://arxiv.org/abs/1701.03077
A Characteristic Function Approach to Deep Implicit Generative Modeling Paper address: https://arxiv.org/abs/1909.07425
AdderNet: Do We Really Need Multiplications in Deep Learning? Paper address: https://arxiv.org/pdf/1912.13200
12-in-1: Multi-Task Vision and Language Representation Learning Paper address: https://arxiv.org/abs/1912.02315
Making Better Mistakes: Leveraging Class Hierarchies with Deep Networks Paper address: https://arxiv.org/abs/1912.09393
CARS: Continuous Evolution for Efficient Neural Architecture Search Paper address: https://arxiv.org/pdf/1909.04977.pdf Code: https://github.com/huawei-noah/CARS
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training Paper address: https://arxiv.org/abs/2002.10638 Code: https://github.com/weituo12321/PREVALENT
GhostNet: More Features from Cheap Operations (outperforms the MobileNetV3 architecture) Paper link: https://arxiv.org/pdf/1911.11907 Model (strong performance on ARM CPUs): https://github.com/iamhankai/ghostnet
GhostNet outperforms other state-of-the-art lightweight CNNs such as MobileNetV3 and FBNet.
AdderNet: Do We Really Need Multiplications in Deep Learning? (additive neural network) Achieves very good performance on large-scale neural networks and datasets. Link to the paper: https://arxiv.org/pdf/1912.13200
Frequency Domain Compact 3D Convolutional Neural Networks (3D CNN compression)
A Semi-Supervised Assessor of Neural Architectures (NAS)
Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection (NAS for detection) Searches the backbone, neck, and head together as a trinity.
CARS: Continuous Evolution for Efficient Neural Architecture Search (continuously evolved NAS) Efficient, combines the advantages of differentiable and evolutionary search, and can output a Pareto front. Paper link: https://arxiv.org/pdf/1909.04977 Open source code: https://github.com/huawei-noah/CARS
On Positive-Unlabeled Classification in GAN (PU+GAN)
Learning multiview 3D point cloud registration (3D point cloud) Link to the paper: https://arxiv.org/abs/2001.05119
Multi-Modal Domain Adaptation for Fine-Grained Action Recognition (fine-grained action recognition) Link to the paper: https://arxiv.org/abs/2001.09691
Action Modifiers: Learning from Adverbs in Instructional Video Link to the paper: https://arxiv.org/abs/1912.06617
PolarMask: Single Shot Instance Segmentation with Polar Representation (instance segmentation modeling) Paper link: https://arxiv.org/abs/1909.13226 Paper interpretation: https://zhuanlan.zhihu.com/p/84890413 Open source code: https://github.com/xieenze/PolarMask
Rethinking Performance Estimation in Neural Architecture Search (NAS) Because performance estimation is the truly time-consuming part of block-wise neural architecture search, this paper looks for the optimal estimation parameters for block-wise NAS, which makes the search faster and the estimates more relevant.
Distribution Aware Coordinate Representation for Human Pose Estimation (human body pose estimation) Link to the paper: https://arxiv.org/abs/1910.06278 GitHub: https://github.com/ilovepose/DarkPose Author team homepage: https://ilovepose.github.io/coco/
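As a rough illustration of the AdderNet entry in the list above, the sketch below contrasts an ordinary multiply-accumulate filter response with an addition-only response, taken here to be the negative L1 distance between an input patch and a filter. The function names `conv_response` and `adder_response` are hypothetical and do not come from the paper's code; this is a toy comparison on flattened patches, not the authors' implementation.

```python
def conv_response(patch, kernel):
    """Ordinary filter response: sum of element-wise products."""
    if len(patch) != len(kernel):
        raise ValueError("patch and kernel must have the same length")
    return sum(x * w for x, w in zip(patch, kernel))


def adder_response(patch, kernel):
    """Addition-only response (assumed form): negative L1 distance.

    Uses only subtraction, absolute value, and addition, so no
    multiplications are needed. The response is largest (zero) when
    the patch equals the filter.
    """
    if len(patch) != len(kernel):
        raise ValueError("patch and kernel must have the same length")
    return -sum(abs(x - w) for x, w in zip(patch, kernel))


if __name__ == "__main__":
    patch = [1.0, 2.0, 3.0]   # flattened input patch (toy values)
    kernel = [1.0, 0.0, 5.0]  # flattened filter weights (toy values)
    print(conv_response(patch, kernel))   # 16.0
    print(adder_response(patch, kernel))  # -4.0
```

Both functions score how well a patch matches a filter; the second does so without any multiplication, which is the question the paper's title raises.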
OCR
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network Paper address: https://arxiv.org/abs/2002.10200 Code: https://github.com/Yuliang-Liu/bezier_curve_text_spotting, https://github.com/aim-uofa/adet
Image classification
Self-training with Noisy Student improves ImageNet classification Paper address: https://arxiv.org/abs/1911.04252
Image Matching across Wide Baselines: From Paper to Practice Paper address: https://arxiv.org/abs/2003.01587
Towards Robust Image Classification Using Sequential Attention Models Paper address: https://arxiv.org/abs/1912.02184
Video analysis
Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications Paper address: https://arxiv.org/abs/2003.01455
Code: https://github.com/bbrattoli/ZeroShotVideoClassification
Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs Paper address: https://arxiv.org/abs/2003.00387
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning Paper address: https://arxiv.org/abs/2003.00392
Object Relational Graph with Teacher-Recommended Learning for Video Captioning Paper address: https://arxiv.org/abs/2002.11566
Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution Paper address: https://arxiv.org/abs/2002.11616
Blurry Video Frame Interpolation Paper address: https://arxiv.org/abs/2002.12259
Hierarchical Conditional Relation Networks for Video Question Answering Paper address: https://arxiv.org/abs/2002.10698
Action Modifiers: Learning from Adverbs in Instructional Video Paper address: https://arxiv.org/abs/1912.06617
Image Processing
Learning to Shade Hand-drawn Sketches Paper address: https://arxiv.org/abs/2002.11812
Single Image Reflection Removal through Cascaded Refinement Paper address: https://arxiv.org/abs/1911.06634
Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data Paper address: https://arxiv.org/abs/2002.11297
Deep Image Harmonization via Domain Verification Paper address: https://arxiv.org/abs/1911.13239 Code: https://github.com/bcmi/Image_Harmonization_Datasets
RoutedFusion: Learning Real-time Depth Map Fusion Paper address: https://arxiv.org/pdf/2001.04388.pdf
Update
- Visual Commonsense R-CNN
https://arxiv.org/abs/2002.12204
- Out-of-distribution image detection
https://arxiv.org/abs/2002.11297
- Blurry Video Frame Interpolation
https://arxiv.org/abs/2002.12259
- Meta-transfer learning for zero-shot super-resolution
https://arxiv.org/abs/2002.12213
- 3D indoor scene understanding
https://arxiv.org/abs/2002.12212
- Unbiased scene graph generation from biased training
https://arxiv.org/abs/2002.11949
- Auto-encoding twin-bottleneck hashing
https://arxiv.org/abs/2002.11930
- Social-STGCNN: a social spatio-temporal graph convolutional neural network for human trajectory prediction
https://arxiv.org/abs/2002.11927
- Towards universal representation learning for deep face recognition
https://arxiv.org/abs/2002.11841
- ClusterFit: improving generalization of visual representations
https://arxiv.org/abs/1912.03330
- Reduce context bias
https://arxiv.org/abs/2002.11812
- Unsupervised reinforcement learning of transferable meta-skills for embodied navigation
https://arxiv.org/abs/1911.07450
- Fast and accurate one-stage space-time video super-resolution
https://arxiv.org/abs/2002.11616
- Object relational graph with teacher-recommended learning for video captioning
https://arxiv.org/abs/2002.11566
- Rethinking the route towards weakly supervised object localization
https://arxiv.org/abs/2002.11359
- Towards learning a generic agent for vision-and-language navigation via pre-training
https://arxiv.org/pdf/2002.10638.pdf
- GhostNet lightweight neural network
https://arxiv.org/pdf/1911.11907.pdf
- AdderNet: In deep learning, do we really need multiplication?
https://arxiv.org/pdf/1912.13200.pdf
- CARS: continuous evolution for efficient neural architecture search
https://arxiv.org/abs/1909.04977
- Single image reflection removal through cascaded refinement
https://arxiv.org/abs/1911.06634
- Filter grafting for deep neural networks
https://arxiv.org/pdf/2001.05868.pdf
- PolarMask: unify instance segmentation to FCN
https://arxiv.org/pdf/1909.13226.pdf
- Semi-supervised semantic image segmentation
https://arxiv.org/pdf/1811.07073.pdf
- Defending against universal attacks through selective feature regeneration
https://arxiv.org/pdf/1906.03444.pdf
- Real-time image retrieval based on fine-grained sketches
https://arxiv.org/abs/2002.10310
- Ask the VQA model with sub-questions
https://arxiv.org/abs/1906.03444
- Learning a neural 3D texture space from 2D exemplars
https://geometry.cs.ucl.ac.uk/projects/2020/neuraltexture/
- NestedVAE: Isolate common factors through weak supervision
https://arxiv.org/abs/2002.11576
- Towards multi-future trajectory prediction
https://arxiv.org/pdf/1912.06445.pdf
- Towards robust image classification using sequential attention models
https://arxiv.org/pdf/1912.02184
Source: the official website of CVPR 2021