Tutorial 1: Geometric deep learning and its applications for Multimedia
Speaker:
- Hannes Fassold, JOANNEUM RESEARCH
- Email: hannes.fassold@joanneum.at
- Hannes Fassold received an MSc degree in Applied Mathematics from Graz University of Technology in 2004. Since then he has worked at JOANNEUM RESEARCH, where he is currently a senior researcher in the Intelligent Vision Applications Group of the DIGITAL institute. His main research interest is how to employ machine vision and AI methods to solve real-world problems such as image and video enhancement, defect inspection, and object detection and tracking. He presents regularly at renowned computer vision, multimedia and AI conferences (ACM Multimedia, ICIP, ICME, AIVR, MVA, MMSP, ASPAI, ISVC, EUSIPCO, GTC, etc.). Furthermore, he reviews papers for several of these conferences and has also been a program committee member. He coordinates the machine learning workflow as well as the dedicated ML hardware and software infrastructure for the DIGITAL institute.
Detailed Description and Outline:
- Geometric deep learning, that is, learning in non-Euclidean domains, is an emerging research area of machine learning. In this tutorial, we give an introduction to geometric deep learning, with a focus on manifold learning, and show how it is employed in important multimedia application fields such as similarity search, image classification, synthesis and enhancement, video analysis, 3D data processing and nonlinear dimension reduction. We will also present open-source software frameworks for geometric deep learning. Finally, as a spotlight, we will present the manifold mixing model soup algorithm, a novel algorithm that mixes the latent space manifolds of several fine-tuned models together, which gives the fused model significantly better out-of-distribution performance.
- The tutorial will cover a number of topics from geometric deep learning. A tentative list of topics follows:
Introduction to geometric deep learning (with a focus on manifolds)
- Motivation
- Manifolds, graphs, 3D data
- Key operations on manifolds (geodesic distance, exponential/logarithm map, Fréchet mean, convolution, …)
- Common Riemannian manifolds used in computer vision
- Lie groups & Lie algebra
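To make the key operations listed above concrete, here is a minimal sketch on the unit sphere, one of the simplest Riemannian manifolds. It is an illustration only and not taken from the tutorial material: the function names are hypothetical, NumPy is assumed to be available, and the Fréchet mean is approximated by a plain fixed-point iteration that is only expected to behave well for points that lie reasonably close together.

```python
import numpy as np


def geodesic_distance(p, q):
    """Great-circle distance between two unit vectors p and q."""
    return float(np.arccos(np.clip(np.dot(p, q), -1.0, 1.0)))


def exp_map(p, v):
    """Exponential map: walk from p along the tangent vector v (v is orthogonal to p)."""
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return p
    return np.cos(norm) * p + np.sin(norm) * (v / norm)


def log_map(p, q):
    """Logarithm map: tangent vector at p that points to q, with length d(p, q).

    Returns the zero vector when q coincides with p. The map is not
    well defined for antipodal points, which this sketch does not handle.
    """
    w = q - np.dot(p, q) * p  # project q onto the tangent space at p
    norm = np.linalg.norm(w)
    if norm < 1e-12:
        return np.zeros_like(p)
    return geodesic_distance(p, q) * (w / norm)


def frechet_mean(points, iterations=50, tol=1e-10):
    """Approximate the Fréchet mean by repeatedly averaging in the tangent space."""
    mean = points[0]
    for _ in range(iterations):
        step = np.mean([log_map(mean, x) for x in points], axis=0)
        if np.linalg.norm(step) < tol:
            break
        mean = exp_map(mean, step)
    return mean


p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 1.0, 0.0])
print(geodesic_distance(p, q))   # a quarter of a great circle
print(frechet_mean([p, q]))      # midpoint of the arc between p and q
```

The exponential and logarithm maps are inverse to each other here: `exp_map(p, log_map(p, q))` recovers `q` up to floating-point error.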
Geometric deep learning algorithms in various multimedia application fields (for each field, a few approaches are briefly described)
- Similarity search
- Image classification
- Image synthesis & enhancement
- Video analysis
- 3D data processing
- Nonlinear dimension reduction
Open-source software packages for geometric deep learning
Spotlight: Manifold mixing model soups for better out-of-distribution performance
Target Audience and Prerequisite Knowledge:
- The tutorial is intended for PhD students, postdoctoral researchers, and other researchers and practitioners who work with images and videos in any area, including detection, classification, segmentation and retrieval. The reason for proposing this tutorial at ACM Multimedia Asia is to promote the use of geometric deep learning, and to tap into its potential, for all kinds of multimedia applications. A basic understanding of mathematics, image processing and machine learning is a prerequisite.
Tutorial 2: Streaming Media: Algorithms, Protocols and Systems
Speaker:
- Dr. Ali C. Begen, Ozyegin University (Also Comcast NBCUniversal)
https://ali.begen.net/
- Email: ali.begen@ozyegin.edu.tr
- Ali C. Begen is currently a computer science professor at Ozyegin University and a technical consultant in Comcast's Advanced Technology and Standards Group. Previously, he was a research and development engineer at Cisco. Begen received his PhD in electrical and computer engineering from Georgia Tech in 2006. To date, he has received several academic and industry awards (including an Emmy® Award for Technology and Engineering) and has been granted 30+ US patents. In 2020 and 2021, he was listed among the world's most influential scientists in the subfield of networking and telecommunications. More details are at https://ali.begen.net/.
Detailed Description and Outline:
- Streaming is a complex technology whose dynamics need to be studied thoroughly. Experience from deployments over the last 10+ years suggests that streaming clients typically operate in an unfettered greedy mode and are not necessarily designed to behave well in environments where other clients exist or where network conditions can change dramatically.
- In this tutorial, we will examine the progress made in the streaming space over the last several years, focusing primarily on standards, interop guidelines, workflows, performance indicators, extensions for low latency, collaboration among server, network and client, and research directions. We will also present several open-source tools so that attendees can explore these topics further in their own practical environments.
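As a rough illustration of what "greedy" client behaviour can look like, the sketch below shows a simple throughput-based bitrate selection rule. It is a hypothetical example and not the logic of any particular player or of the tutorial material: the function name, the bitrate ladder and the `safety` factor are all assumptions. The client picks the highest rung its own throughput estimate allows, with no knowledge of other clients sharing the same link.

```python
def pick_bitrate(ladder_kbps, throughput_kbps, safety=1.0):
    """Return the highest bitrate that fits within safety * throughput.

    Falls back to the lowest rung when nothing fits. The decision uses
    only this client's own throughput estimate (hypothetical rule).
    """
    budget = throughput_kbps * safety
    candidates = [b for b in sorted(ladder_kbps) if b <= budget]
    return candidates[-1] if candidates else min(ladder_kbps)


# Hypothetical encoding ladder in kbps.
ladder = [500, 1500, 3000, 6000]

print(pick_bitrate(ladder, 3500))              # fully greedy choice
print(pick_bitrate(ladder, 3500, safety=0.8))  # more conservative choice
print(pick_bitrate(ladder, 300))               # below the lowest rung
```

A `safety` value below 1.0 leaves headroom for throughput fluctuations, which is one simple way to make such a rule less aggressive.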
Basics:
- Brief review of the history of streaming
- Key problems and innovations
- Current standards and interoperability guidelines
- Deployment workflows
- Performance metrics
Standards and ad-insertion methods:
- DASH
- HLS
- CMAF
- Video encoding for adaptive streaming
- Lessons from real-life large-scale deployments
Low-latency live streaming:
- Contributors to latency
- Low-latency extensions for DASH and HLS, and other proprietary and emerging solutions
- Encoder egress, origin and CDN considerations
- Design considerations and implementations
- WebRTC for point-to-multipoint distribution as an alternative to HTTP streaming
- Lessons from the low-latency FIFA World Cup 2022 implementation
Emerging areas and research directions:
- Testing and simulation of streaming systems
- CMCD: Towards actionable observability in real time
- CMSD: Server and networked assistance in real time
- IETF Media over QUIC – towards next-generation streaming
- The material for this tutorial will be drawn from a variety of sources, including the instructor’s courses and talks at MPEG, IETF, DASH Industry Forum and SCTE, as well as Comcast’s operational experience in encoding. The slides will be provided to the participants. Time permitting, example code will be run.
Target Audience and Prerequisite Knowledge:
- This course includes both introductory and advanced material. Attendees are expected to understand basic video coding and IP networking principles. Students, researchers, developers, and content and service providers are all welcome.