Date |
Program |
Europe Time(UTC+1) |
Beijing Time(UTC+8) |
Paper ID |
Paper Title |
2021.3.7 |
Conference Opening |
01:00-02:00 |
08:00-09:00 |
|
|
Keynote 1 by Jiebo Luo |
02:00-03:00 |
09:00-10:00 |
|
|
Best Paper Session (4 papers) |
03:00-05:00 |
10:00-12:00 |
12 |
Distilling Knowledge in Causal Inference for Unbiased Visual Question
Answering |
58 |
Similar Scene Retrieval in Soccer Videos with Weak Annotations by
Multimodal Use of Bidirectional LSTM |
73 |
Interactive Re-ranking for Cross-modal Retrieval Based on Object-wise
Question Answering |
92 |
Real-Time Arbitrary Video Style Transfer |
Tutorial 1: Bias Issues and Solutions in Recommender System |
07:00-09:00 |
14:00-16:00 |
|
Bias Issues and Solutions in Recommender System |
Demo 1 |
09:00-10:00 |
16:00-17:00 |
142 |
Synthesized 3D Model Suggestions with Smartphone Based MR to Modify the
PreBuilt Environment: Interior Design |
Special Session Poster 1-Multimedia application (8 papers) |
10:00-10:40 |
17:00-17:40 |
18 |
An Automated Method with Anchor-Free Detection
and U-Shaped Segmentation for Nuclei Instance Segmentation |
22 |
Improving face recognition in Surveillance video with judicious
selection and fusion of representative frames |
25 |
A Multimedia Solution to Motivate Childhood Cancer Patients to Keep Up
with Cancer Treatment |
85 |
Story Segmentation For News Broadcast Based On Primary Caption |
86 |
Intermediate Coordinate based Pose Non-perspective Estimation from Line
Correspondences |
132 |
Structure-Preserving Extremely Low Light Image Enhancement with
Fractional Order Differential Mask Guidance |
136 |
Change Detection from Synthetic Aperture Radar Images Based on
Deformable Residual Convolutional Neural Networks |
146 |
Towards Annotation-Free Evaluation of Cross-Lingual Image Captioning |
Poster Session 1 (8 papers) |
11:00-11:40 |
18:00-18:40 |
4 |
A Treatment Engine by Multimodal EMR Data |
11 |
Storyboard Relational Model for Group Activity Recognition |
26 |
Global and Local Feature Alignment for Video Object Detection |
28 |
Semantic Feature Augmentation for Fine-grained Visual Categorization
with Few-Sample Training |
33 |
Destylization of text with decorative elements |
35 |
Hierarchical Clustering via Mutual Learning for Unsupervised Person
Re-identification |
48 |
Robust Visual Tracking via Scale-Aware Localization and Peak Response
Strength |
49 |
Hungry Networks: 3D Mesh Reconstruction of a Dish and a Plate from a
Single Dish Image for Estimating Food Volume |
Demo 1-Mirrored |
17:00-18:00 |
00:00-01:00+1 day |
142 |
Synthesized 3D Model Suggestions with Smartphone Based MR to Modify the
PreBuilt Environment: Interior Design |
Special Session Poster 1-Mirrored-Multimedia application (8 papers) |
18:00-18:40 |
01:00-01:40+1 day |
18 |
An Automated Method with Anchor-Free Detection
and U-Shaped Segmentation for Nuclei Instance Segmentation |
22 |
Improving face recognition in Surveillance video with judicious
selection and fusion of representative frames |
25 |
A Multimedia Solution to Motivate Childhood Cancer Patients to Keep Up
with Cancer Treatment |
85 |
Story Segmentation For News Broadcast Based On Primary Caption |
86 |
Intermediate Coordinate based Pose Non-perspective Estimation from Line
Correspondences |
132 |
Structure-Preserving Extremely Low Light Image Enhancement with
Fractional Order Differential Mask Guidance |
136 |
Change Detection from Synthetic Aperture Radar Images Based on
Deformable Residual Convolutional Neural Networks |
146 |
Towards Annotation-Free Evaluation of Cross-Lingual Image Captioning |
Poster Session 1-Mirrored (8 papers) |
19:00-19:40 |
02:00-02:40+1 day |
4 |
A Treatment Engine by Multimodal EMR Data |
11 |
Storyboard Relational Model for Group Activity Recognition |
26 |
Global and Local Feature Alignment for Video Object Detection |
28 |
Semantic Feature Augmentation for Fine-grained Visual Categorization
with Few-Sample Training |
33 |
Destylization of text with decorative elements |
35 |
Hierarchical Clustering via Mutual Learning for Unsupervised Person
Re-identification |
48 |
Robust Visual Tracking via Scale-Aware Localization and Peak Response
Strength |
49 |
Hungry Networks: 3D Mesh Reconstruction of a Dish and a Plate from a
Single Dish Image for Estimating Food Volume |
Date |
Program |
Europe Time(UTC+1) |
Beijing Time(UTC+8) |
Paper ID |
Paper Title |
2021.3.8 |
Keynote 2 by Kristen Grauman |
01:00-02:00 |
08:00-09:00 |
|
|
Tutorial 2: 10 Years of Video Browser Showdown |
02:00-04:00 |
09:00-11:00 |
|
10 Years of Video Browser Showdown |
Demo 2 |
04:00-05:00 |
11:00-12:00 |
144 |
SeekSuspect: Retrieving Suspects from Criminal Datasets using Visual
Memory |
Oral Session 1 (4 papers) |
07:00-08:20 |
14:00-15:20 |
16 |
Incremental Multi-view Object Detection from a Moving Camera |
24 |
Low-quality Watermarked Face Inpainting with Discriminative Residual
Learning |
31 |
Unsupervised learning of co-occurrences for face images retrieval |
32 |
EvoGAN: An Evolutionary GAN for Face Aging and Rejuvenation |
Oral Session 2 (4 papers) |
08:20-09:40 |
15:20-16:40 |
36 |
Self-Supervised Adversarial Learning for Cross-Modal Retrieval |
37 |
Multi-Level Expression Guided Attention Network for Referring
Expression Comprehension |
45 |
Learning Intra-inter Semantic Aggregation for Video Object Detection |
55 |
A Multi-Scale Language Embedding Network for Proposal-Free Referring
Expression Comprehension |
Special Session Poster 2-Multimedia system (8 papers) |
10:00-10:40 |
17:00-17:40 |
23 |
Two-stage Structure Aware Image Inpainting
Based on Generative Adversarial Network |
39 |
Adaptive Feature Aggregation Network for Nuclei Segmentation |
44 |
Classification of Multimedia SNS Posts about Tourist Sites Based on
Their Focus toward Predicting Eco-Friendly Users |
80 |
Table Detection and Cell Segmentation in Online Handwritten Documents
with Graph Attention Networks |
97 |
Determining Image Age with Rank-Consistent Ordinal Classification and
Object-centered Ensemble |
99 |
Cross-Modal Learning for Saliency Prediction in Mobile Environment |
125 |
Integrating Aspect-aware Interactive Attention and Emotional
Position-aware for Multi-aspect Sentiment Analysis |
130 |
Pulse Localization Networks with Infrared Camera |
Poster Session 2 (8 papers) |
11:00-11:40 |
18:00-18:40 |
52 |
A Novel System Architecture and an Automatic Monitoring Method for
Remote Production |
60 |
Patch Assembly for Real-time Instance Segmentation |
61 |
Full-Resolution Encoder–Decoder Networks with Multi-Scale Feature
Fusion for Human Pose Estimation |
64 |
Graph-based Variational Auto-Encoder for Generalized Zero-Shot Learning |
66 |
Fixed-size Video Summarization over Streaming Data via Non-monotone
Submodular Maximization |
69 |
Multi-focus noisy image fusion based on gradient regularized
convolutional sparse representation |
71 |
Fixation Guided Network for Salient Object Detection |
83 |
RICAPS: Residual Inception and Cascaded Capsule Network for Broadcast
Sports Video Classification |
Demo 2-Mirrored |
17:00-18:00 |
00:00-01:00+1 day |
144 |
SeekSuspect: Retrieving Suspects from Criminal Datasets using Visual
Memory |
Oral Session 1-Mirrored (4 papers) |
18:00-19:20 |
01:00-02:20+1 day |
16 |
Incremental Multi-view Object Detection from a Moving Camera |
24 |
Low-quality Watermarked Face Inpainting with Discriminative Residual
Learning |
31 |
Unsupervised learning of co-occurrences for face images retrieval |
32 |
EvoGAN: An Evolutionary GAN for Face Aging and Rejuvenation |
Oral Session 2-Mirrored (4 papers) |
19:40-21:00 |
02:40-04:00+1 day |
36 |
Self-Supervised Adversarial Learning for Cross-Modal Retrieval |
37 |
Multi-Level Expression Guided Attention Network for Referring
Expression Comprehension |
45 |
Learning Intra-inter Semantic Aggregation for Video Object Detection |
55 |
A Multi-Scale Language Embedding Network for Proposal-Free Referring
Expression Comprehension |
Special Session Poster 2-Mirrored-Multimedia system (8 papers) |
21:00-21:40 |
04:00-04:40+1 day |
23 |
Two-stage Structure Aware Image Inpainting
Based on Generative Adversarial Network |
39 |
Adaptive Feature Aggregation Network for Nuclei Segmentation |
44 |
Classification of Multimedia SNS Posts about Tourist Sites Based on
Their Focus toward Predicting Eco-Friendly Users |
80 |
Table Detection and Cell Segmentation in Online Handwritten Documents
with Graph Attention Networks |
97 |
Determining Image Age with Rank-Consistent Ordinal Classification and
Object-centered Ensemble |
99 |
Cross-Modal Learning for Saliency Prediction in Mobile Environment |
125 |
Integrating Aspect-aware Interactive Attention and Emotional
Position-aware for Multi-aspect Sentiment Analysis |
130 |
Pulse Localization Networks with Infrared Camera |
Poster Session 2-Mirrored (8 papers) |
22:00-22:40 |
05:00-05:40+1 day |
52 |
A Novel System Architecture and an Automatic Monitoring Method for
Remote Production |
60 |
Patch Assembly for Real-time Instance Segmentation |
61 |
Full-Resolution Encoder–Decoder Networks with Multi-Scale Feature
Fusion for Human Pose Estimation |
64 |
Graph-based Variational Auto-Encoder for Generalized Zero-Shot Learning |
66 |
Fixed-size Video Summarization over Streaming Data via Non-monotone
Submodular Maximization |
69 |
Multi-focus noisy image fusion based on gradient regularized
convolutional sparse representation |
71 |
Fixation Guided Network for Salient Object Detection |
83 |
RICAPS: Residual Inception and Cascaded Capsule Network for Broadcast
Sports Video Classification |
Date |
Program |
Europe Time(UTC+1) |
Beijing Time(UTC+8) |
Paper ID |
Paper Title |
2021.3.9 |
Keynote 3 by Bernt Schiele |
01:00-02:00 |
08:00-09:00 |
|
|
Demo 3 |
02:00-03:00 |
09:00-10:00 |
145 |
A Large-Scale Image Retrieval System for Everyday Scenes |
Oral Session 3 (4 papers) |
03:00-04:20 |
10:00-11:20 |
68 |
Overlap Classification Mechanism for Skeletal Bone Age Assessment |
72 |
Motion-Transformer: Self-supervised Pre-training for Skeleton-based
Action Recognition |
75 |
A Background-induced Generative Network with Multi-level Discriminator
for Text-to-Image Generation |
76 |
WFN-PSC: Weighted-Fusion Network with Poly-Scale Convolution for image
dehazing |
Oral Session 4 (5 papers) |
07:00-08:40 |
14:00-15:40 |
88 |
An Autoregressive Generation Model for Producing Instant Basketball
Defensive Trajectory |
104 |
Objective Object Segmentation Visual Quality Evaluation based on
Pixel-Level and Region-Level Characteristics |
115 |
Fixations Based Personal Target Objects Segmentation |
120 |
Relationship Graph Learning Network For Visual Relationship Detection |
127 |
Graph-Based Motion Prediction for Abnormal Action Detection |
Special Session Poster 3-Multimedia analysis and understanding (8 papers) |
09:00-09:40 |
16:00-16:40 |
51 |
Scene Graph Generation via Multi-Relation Classification and
Cross-modal Attention Coordinator |
54 |
Graph Convolution Network with Node Feature Optimization Using Cross
Attention for Few-shot Learning |
65 |
A Multi-scale Human Action Recognition Method Based on Laplacian
Pyramid Depth Motion Images |
77 |
Video Scene Detection Based on Link Prediction Using Graph Convolution
Network |
78 |
Cross-Cultural Design of Facial Expressions for Humanoids-Is There
Cultural Difference Between Japan and Denmark? |
119 |
Improving auto-encoder novelty detection using channel attention and
entropy minimization |
122 |
Local Structure Alignment Guided Domain Adaptation with Few Source
Samples |
138 |
Efficient Inter-image Relation Graph Neural Network Hashing for
Scalable Image Retrieval |
Poster Session 3 (8 papers) |
10:00-10:40 |
17:00-17:40 |
84 |
Transfer Non-stationary Texture with Complex Appearance |
93 |
C3VQG: Category Consistent Cyclic Visual Question Generation |
106 |
Text-based Visual Question Answering with Knowledge Base |
109 |
Attention-Constraint Facial Expression Recognition |
110 |
Defense for adversarial videos by Self-adaptive JPEG Compression and
Optical Texture |
111 |
Fusing CAMs-Weighted Features and Temporal Information for Robust Loop
Closure Detection |
123 |
Multiplicative Angular Margin Loss for Text-Based Person Search |
129 |
Attended Feature Matching for Weakly-supervised Video Relocalization |
Demo 3-Mirrored |
17:00-18:00 |
00:00-01:00+1 day |
145 |
A Large-Scale Image Retrieval System for Everyday Scenes |
Oral Session 3-Mirrored (4 papers) |
18:00-19:20 |
01:00-02:20+1 day |
68 |
Overlap Classification Mechanism for Skeletal Bone Age Assessment |
72 |
Motion-Transformer: Self-supervised Pre-training for Skeleton-based
Action Recognition |
75 |
A Background-induced Generative Network with Multi-level Discriminator
for Text-to-Image Generation |
76 |
WFN-PSC: Weighted-Fusion Network with Poly-Scale Convolution for image
dehazing |
Oral Session 4-Mirrored (5 papers) |
19:20-21:00 |
02:20-04:00+1 day |
88 |
An Autoregressive Generation Model for Producing Instant Basketball
Defensive Trajectory |
104 |
Objective Object Segmentation Visual Quality Evaluation based on
Pixel-Level and Region-Level Characteristics |
115 |
Fixations Based Personal Target Objects Segmentation |
120 |
Relationship Graph Learning Network For Visual Relationship Detection |
127 |
Graph-Based Motion Prediction for Abnormal Action Detection |
Special Session Poster 3-Mirrored-Multimedia analysis and understanding (8 papers) |
21:00-21:40 |
04:00-04:40+1 day |
51 |
Scene Graph Generation via Multi-Relation Classification and
Cross-modal Attention Coordinator |
54 |
Graph Convolution Network with Node Feature Optimization Using Cross
Attention for Few-shot Learning |
65 |
A Multi-scale Human Action Recognition Method Based on Laplacian
Pyramid Depth Motion Images |
77 |
Video Scene Detection Based on Link Prediction Using Graph Convolution
Network |
78 |
Cross-Cultural Design of Facial Expressions for Humanoids-Is There
Cultural Difference Between Japan and Denmark? |
119 |
Improving auto-encoder novelty detection using channel attention and
entropy minimization |
122 |
Local Structure Alignment Guided Domain Adaptation with Few Source
Samples |
138 |
Efficient Inter-image Relation Graph Neural Network Hashing for
Scalable Image Retrieval |
Poster Session 3-Mirrored (8 papers) |
22:00-22:40 |
05:00-05:40+1 day |
84 |
Transfer Non-stationary Texture with Complex Appearance |
93 |
C3VQG: Category Consistent Cyclic Visual Question Generation |
106 |
Text-based Visual Question Answering with Knowledge Base |
109 |
Attention-Constraint Facial Expression Recognition |
110 |
Defense for adversarial videos by Self-adaptive JPEG Compression and
Optical Texture |
111 |
Fusing CAMs-Weighted Features and Temporal Information for Robust Loop
Closure Detection |
123 |
Multiplicative Angular Margin Loss for Text-Based Person Search |
129 |
Attended Feature Matching for Weakly-supervised Video Relocalization |