Program

Date Program Europe Time(UTC+1) Beijing Time(UTC+8) Paper ID Paper Title
2021.3.7 Conference Opening 01:00-02:00 08:00-09:00    
Keynote 1 by Jiebo Luo 02:00-03:00 09:00-10:00    
Best Paper Session(4 papers) 03:00-05:00 10:00-12:00 12 Distilling Knowledge in Causal Inference for Unbiased Visual Question Answering
58 Similar Scene Retrieval in Soccer Videos with Weak Annotations by Multimodal Use of Bidirectional LSTM
73 Interactive Re-ranking for Cross-modal Retrieval Based on Object-wise Question Answering
92 Real-Time Arbitrary Video Style Transfer
Tutorial 1: Bias Issues and Solutions in Recommender System 07:00-09:00 14:00-16:00 Bias Issues and Solutions in Recommender System
Demo 1  09:00-10:00 16:00-17:00 142 Synthesized 3D Model Suggestions with Smartphone Based MR to Modify the PreBuilt Environment: Interior Design
Special Session Poster 1-Multimedia application(8 papers) 10:00-10:40 17:00-17:40 18 An Automated Method with Anchor-Free Detection and U-Shaped Segmentation for Nuclei Instance Segmentation
22 Improving face recognition in Surveillance video with judicious selection and fusion of representative frames
25 A Multimedia Solution to Motivate Childhood Cancer Patients to Keep Up with Cancer Treatment
85 Story Segmentation For News Broadcast Based On Primary Caption
86 Intermediate Coordinate based Pose Non-perspective Estimation from Line Correspondences
132 Structure-Preserving Extremely Low Light Image Enhancement with Fractional Order Differential Mask Guidance
136 Change Detection from Synthetic Aperture Radar Images Based on Deformable Residual Convolutional Neural Networks
146 Towards Annotation-Free Evaluation of Cross-Lingual Image Captioning
Poster Session 1(8 papers) 11:00-11:40 18:00-18:40 4 A Treatment Engine by Multimodal EMR Data
11 Storyboard Relational Model for Group Activity Recognition
26 Global and Local Feature Alignment for Video Object Detection
28 Semantic Feature Augmentation for Fine-grained Visual Categorization with Few-Sample Training
33 Destylization of text with decorative elements
35 Hierarchical Clustering via Mutual Learning for Unsupervised Person Re-identification
48 Robust Visual Tracking via Scale-Aware Localization and Peak Response Strength
49 Hungry Networks: 3D Mesh Reconstruction of a Dish and a Plate from a Single Dish Image for Estimating Food Volume
Demo 1-Mirrored  17:00-18:00 00:00-01:00+1 day 142 Synthesized 3D Model Suggestions with Smartphone Based MR to Modify the PreBuilt Environment: Interior Design
Special Session Poster 1-Mirrored-Multimedia application(8 papers) 18:00-18:40 01:00-01:40+1 day 18 An Automated Method with Anchor-Free Detection and U-Shaped Segmentation for Nuclei Instance Segmentation
22 Improving face recognition in Surveillance video with judicious selection and fusion of representative frames
25 A Multimedia Solution to Motivate Childhood Cancer Patients to Keep Up with Cancer Treatment
85 Story Segmentation For News Broadcast Based On Primary Caption
86 Intermediate Coordinate based Pose Non-perspective Estimation from Line Correspondences
132 Structure-Preserving Extremely Low Light Image Enhancement with Fractional Order Differential Mask Guidance
136 Change Detection from Synthetic Aperture Radar Images Based on Deformable Residual Convolutional Neural Networks
146 Towards Annotation-Free Evaluation of Cross-Lingual Image Captioning
Poster Session 1-Mirrored (8 papers) 19:00-19:40 02:00-02:40+1 day 4 A Treatment Engine by Multimodal EMR Data
11 Storyboard Relational Model for Group Activity Recognition
26 Global and Local Feature Alignment for Video Object Detection
28 Semantic Feature Augmentation for Fine-grained Visual Categorization with Few-Sample Training
33 Destylization of text with decorative elements
35 Hierarchical Clustering via Mutual Learning for Unsupervised Person Re-identification
48 Robust Visual Tracking via Scale-Aware Localization and Peak Response Strength
49 Hungry Networks: 3D Mesh Reconstruction of a Dish and a Plate from a Single Dish Image for Estimating Food Volume
Date Program Europe Time(UTC+1) Beijing Time(UTC+8) Paper ID Paper Title
2021.3.8 Keynote 2 by Kristen Grauman 01:00-02:00 08:00-09:00    
Tutorial 2: 10 Years of Video Browser Showdown 02:00-04:00 09:00-11:00 10 Years of Video Browser Showdown
Demo 2 04:00-05:00 11:00-12:00 144 SeekSuspect: Retrieving Suspects from Criminal Datasets using Visual Memory
Oral Session 1(4 papers) 07:00-08:20 14:00-15:20 16 Incremental Multi-view Object Detection from a Moving Camera
24 Low-quality Watermarked Face Inpainting with Discriminative Residual Learning
31 Unsupervised learning of co-occurrences for face images retrieval
32 EvoGAN: An Evolutionary GAN for Face Aging and Rejuvenation
Oral Session 2(4 papers) 08:20-09:40 15:20-16:40 36 Self-Supervised Adversarial Learning for Cross-Modal Retrieval
37 Multi-Level Expression Guided Attention Network for Referring Expression Comprehension
45 Learning Intra-inter Semantic Aggregation for Video Object Detection
55 A Multi-Scale Language Embedding Network for Proposal-Free Referring Expression Comprehension
Special Session Poster 2-Multimedia system(8 papers) 10:00-10:40 17:00-17:40 23 Two-stage Structure Aware Image Inpainting Based on Generative Adversarial Network
39 Adaptive Feature Aggregation Network for Nuclei Segmentation
44 Classification of Multimedia SNS Posts about Tourist Sites Based on Their Focus toward Predicting Eco-Friendly Users
80 Table Detection and Cell Segmentation in Online Handwritten Documents with Graph Attention Networks
97 Determining Image Age with Rank-Consistent Ordinal Classification and Object-centered Ensemble
99 Cross-Modal Learning for Saliency Prediction in Mobile Environment
125 Integrating Aspect-aware Interactive Attention and Emotional Position-aware for Multi-aspect Sentiment Analysis
130 Pulse Localization Networks with Infrared Camera
Poster Session 2(8 papers) 11:00-11:40 18:00-18:40 52 A Novel System Architecture and an Automatic Monitoring Method for Remote Production
60 Patch Assembly for Real-time Instance Segmentation
61 Full-Resolution Encoder–Decoder Networks with Multi-Scale Feature Fusion for Human Pose Estimation
64 Graph-based Variational Auto-Encoder for Generalized Zero-Shot Learning
66 Fixed-size Video Summarization over Streaming Data via Non-monotone Submodular Maximization
69 Multi-focus noisy image fusion based on gradient regularized convolutional sparse representation
71 Fixation Guided Network for Salient Object Detection
83 RICAPS: Residual Inception and Cascaded Capsule Network for Broadcast Sports Video Classification
Demo 2-Mirrored 17:00-18:00 00:00-01:00+1 day 144 SeekSuspect: Retrieving Suspects from Criminal Datasets using Visual Memory
Oral Session 1-Mirrored (4 papers) 18:00-19:20 01:00-02:20+1 day 16 Incremental Multi-view Object Detection from a Moving Camera
24 Low-quality Watermarked Face Inpainting with Discriminative Residual Learning
31 Unsupervised learning of co-occurrences for face images retrieval
32 EvoGAN: An Evolutionary GAN for Face Aging and Rejuvenation
Oral Session 2-Mirrored (4 papers) 19:40-21:00 02:40-04:00+1 day 36 Self-Supervised Adversarial Learning for Cross-Modal Retrieval
37 Multi-Level Expression Guided Attention Network for Referring Expression Comprehension
45 Learning Intra-inter Semantic Aggregation for Video Object Detection
55 A Multi-Scale Language Embedding Network for Proposal-Free Referring Expression Comprehension
Special Session Poster 2-Mirrored-Multimedia system (8 papers) 21:00-21:40 04:00-04:40+1 day 23 Two-stage Structure Aware Image Inpainting Based on Generative Adversarial Network
39 Adaptive Feature Aggregation Network for Nuclei Segmentation
44 Classification of Multimedia SNS Posts about Tourist Sites Based on Their Focus toward Predicting Eco-Friendly Users
80 Table Detection and Cell Segmentation in Online Handwritten Documents with Graph Attention Networks
97 Determining Image Age with Rank-Consistent Ordinal Classification and Object-centered Ensemble
99 Cross-Modal Learning for Saliency Prediction in Mobile Environment
125 Integrating Aspect-aware Interactive Attention and Emotional Position-aware for Multi-aspect Sentiment Analysis
130 Pulse Localization Networks with Infrared Camera
Poster Session 2-Mirrored (8 papers) 22:00-22:40 05:00-05:40+1 day 52 A Novel System Architecture and an Automatic Monitoring Method for Remote Production
60 Patch Assembly for Real-time Instance Segmentation
61 Full-Resolution Encoder–Decoder Networks with Multi-Scale Feature Fusion for Human Pose Estimation
64 Graph-based Variational Auto-Encoder for Generalized Zero-Shot Learning
66 Fixed-size Video Summarization over Streaming Data via Non-monotone Submodular Maximization
69 Multi-focus noisy image fusion based on gradient regularized convolutional sparse representation
71 Fixation Guided Network for Salient Object Detection
83 RICAPS: Residual Inception and Cascaded Capsule Network for Broadcast Sports Video Classification
Date Program Europe Time(UTC+1) Beijing Time(UTC+8) Paper ID Paper Title
2021.3.9 Keynote 3 by Bernt Schiele 01:00-02:00 08:00-09:00    
Demo 3 02:00-03:00 09:00-10:00 145 A Large-Scale Image Retrieval System for Everyday Scenes
Oral Session 3(4 papers) 03:00-04:20 10:00-11:20 68 Overlap Classification Mechanism for Skeletal Bone Age Assessment
72 Motion-Transformer: Self-supervised Pre-training for Skeleton-based Action Recognition
75 A Background-induced Generative Network with Multi-level Discriminator for Text-to-Image Generation
76 WFN-PSC: Weighted-Fusion Network with Poly-Scale Convolution for image dehazing
Oral Session 4(5 papers) 07:00-08:40 14:00-15:40 88 An Autoregressive Generation Model for Producing Instant Basketball Defensive Trajectory
104 Objective Object Segmentation Visual Quality Evaluation based on Pixel-Level and Region-Level Characteristics
115 Fixations Based Personal Target Objects Segmentation
120 Relationship Graph Learning Network For Visual Relationship Detection
127 Graph-Based Motion Prediction for Abnormal Action Detection
Special Session Poster 3-Multimedia analysis and understanding(8 papers) 09:00-09:40 16:00-16:40 51 Scene Graph Generation via Multi-Relation Classification and Cross-modal Attention Coordinator
54 Graph Convolution Network with Node Feature Optimization Using Cross Attention for Few-shot Learning
65 A Multi-scale Human Action Recognition Method Based on Laplacian Pyramid Depth Motion Images
77 Video Scene Detection Based on Link Prediction Using Graph Convolution Network
78 Cross-Cultural Design of Facial Expressions for Humanoids-Is There Cultural Difference Between Japan and Denmark?
119 Improving auto-encoder novelty detection using channel attention and entropy minimization
122 Local Structure Alignment Guided Domain Adaptation with Few Source Samples
138 Efficient Inter-image Relation Graph Neural Network Hashing for Scalable Image Retrieval
Poster Session 3(8 papers) 10:00-10:40 17:00-17:40 84 Transfer Non-stationary Texture with Complex Appearance
93 C3VQG: Category Consistent Cyclic Visual Question Generation
106 Text-based Visual Question Answering with Knowledge Base
109 Attention-Constraint Facial Expression Recognition
110 Defense for adversarial videos by Self-adaptive JPEG Compression and Optical Texture
111 Fusing CAMs-Weighted Features and Temporal Information for Robust Loop Closure Detection
123 Multiplicative Angular Margin Loss for Text-Based Person Search
129 Attended Feature Matching for Weakly-supervised Video Relocalization
Demo 3-Mirrored  17:00-18:00 00:00-01:00+1 day 145 A Large-Scale Image Retrieval System for Everyday Scenes
Oral Session 3-Mirrored (4 papers) 18:00-19:20 01:00-02:20+1 day 68 Overlap Classification Mechanism for Skeletal Bone Age Assessment
72 Motion-Transformer: Self-supervised Pre-training for Skeleton-based Action Recognition
75 A Background-induced Generative Network with Multi-level Discriminator for Text-to-Image Generation
76 WFN-PSC: Weighted-Fusion Network with Poly-Scale Convolution for image dehazing
Oral Session 4-Mirrored (5 papers) 19:20-21:00 02:20-04:00+1 day 88 An Autoregressive Generation Model for Producing Instant Basketball Defensive Trajectory
104 Objective Object Segmentation Visual Quality Evaluation based on Pixel-Level and Region-Level Characteristics
115 Fixations Based Personal Target Objects Segmentation
120 Relationship Graph Learning Network For Visual Relationship Detection
127 Graph-Based Motion Prediction for Abnormal Action Detection
Special Session Poster 3-Mirrored-Multimedia analysis and understanding (8 papers) 21:00-21:40 04:00-04:40+1 day 51 Scene Graph Generation via Multi-Relation Classification and Cross-modal Attention Coordinator
54 Graph Convolution Network with Node Feature Optimization Using Cross Attention for Few-shot Learning
65 A Multi-scale Human Action Recognition Method Based on Laplacian Pyramid Depth Motion Images
77 Video Scene Detection Based on Link Prediction Using Graph Convolution Network
78 Cross-Cultural Design of Facial Expressions for Humanoids-Is There Cultural Difference Between Japan and Denmark?
119 Improving auto-encoder novelty detection using channel attention and entropy minimization
122 Local Structure Alignment Guided Domain Adaptation with Few Source Samples
138 Efficient Inter-image Relation Graph Neural Network Hashing for Scalable Image Retrieval
Poster Session 3-Mirrored (8 papers) 22:00-22:40 05:00-05:40+1 day 84 Transfer Non-stationary Texture with Complex Appearance
93 C3VQG: Category Consistent Cyclic Visual Question Generation
106 Text-based Visual Question Answering with Knowledge Base
109 Attention-Constraint Facial Expression Recognition
110 Defense for adversarial videos by Self-adaptive JPEG Compression and Optical Texture
111 Fusing CAMs-Weighted Features and Temporal Information for Robust Loop Closure Detection
123 Multiplicative Angular Margin Loss for Text-Based Person Search
129 Attended Feature Matching for Weakly-supervised Video Relocalization