Overview

It is our great pleasure to welcome you to the 1st International Workshop on Trustworthy AI for Multimedia Computing, held at the 2021 ACM Multimedia Conference. Artificial Intelligence technologies have been widely adopted in computer systems, including many multimedia applications. Meanwhile, the trustworthiness problems that arise during the development and deployment of AI applications have been receiving increasing attention from both academia and industry. These topics include model robustness and safety, fairness, data privacy, explainability, accountability, and transparency. The goal of this workshop is to bring together researchers in multimedia computing and trustworthy AI for an open discussion on how to develop trustworthy multimedia technologies. This year marks our first edition, and we received submissions from China, the USA, Europe, and other countries and regions. We also encourage attendees to join our invited talks; these valuable and insightful presentations will guide us to a better understanding of trustworthy AI and thus to better research on intelligent multimedia computing.

Invited Talks

Talk 1: Ethics and Trust for Robots and AIs
Time: 09:05 - 09:45

Benjamin Kuipers

Prof. Benjamin Kuipers is a Professor of Computer Science and Engineering at the University of Michigan. He was previously at the University of Texas at Austin, where he held an endowed professorship and served as Computer Science department chair. He received his B.A. from Swarthmore College and his Ph.D. from MIT, and he is a Fellow of AAAI, IEEE, and AAAS. His research in artificial intelligence and robotics has focused on the representation, learning, and use of foundational domains of commonsense knowledge, including knowledge of space, dynamical change, objects, and actions. He is currently investigating ethics as a foundational domain of knowledge for robots and other AIs that may act as members of human society.

Abstract: To apply trustworthy AI to multimedia computing, we must understand the roles and relations among ethics, trust, and cooperation in ensuring the well-being of our society. As robots and other intelligent systems (AIs) increasingly participate along with humans in human society, they must be worthy of trust by human users and bystanders. Since the world is effectively infinitely complex, efficiently making action decisions, for humans or for robots, requires simplified models. However, poorly chosen models can propose actions that lead to poor outcomes. The Prisoner’s Dilemma shows how utility maximization with an overly simple utility measure can lead to a poor outcome, while an improved utility measure gives a good result. In the Moral Machine experiment, an autonomous vehicle faces a “deadly dilemma” and tries to choose the lesser of two evils, while an improved decision model could avoid the dilemma in the first place. Trust plays an essential role in the well-being of society, but when trust is lost, it is lost quickly and recovers slowly, if at all. In certain cases, poor decision models can encourage the exploitation of trust, creating an existential risk for our society at a time when we already face serious threats.
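
As a concrete illustration of the point about utility measures (a toy sketch of ours, not material from the talk), the snippet below evaluates the standard Prisoner's Dilemma payoff matrix under two utility functions: a purely selfish one, under which defection dominates and both players end up with the poor mutual-defection outcome, and an other-regarding one that also values the partner's payoff, under which cooperation becomes the best response.

```python
# Illustrative sketch (not from the talk): the Prisoner's Dilemma under two
# utility measures. Entries are (row player, column player) payoffs with the
# standard textbook values.
PAYOFF = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("C", "D"): (0, 5),  # sucker's payoff vs. temptation
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # mutual defection
}

def best_response(opponent_action, utility):
    """Pick the row player's action maximizing the given utility measure."""
    return max("CD", key=lambda a: utility(*PAYOFF[(a, opponent_action)]))

selfish = lambda mine, theirs: mine            # overly simple utility measure
social  = lambda mine, theirs: mine + theirs   # improved, other-regarding measure

# Under the selfish utility, "D" is the best response to everything,
# so both players defect and each receives only 1.
assert all(best_response(o, selfish) == "D" for o in "CD")
# Under the social utility, "C" is the best response to everything,
# so both players cooperate and each receives 3.
assert all(best_response(o, social) == "C" for o in "CD")
```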

Talk 2: Trustworthy Machine Learning via Logic Inference
Time: 09:45 - 10:25

Bo Li

Prof. Bo Li is an Assistant Professor in the Department of Computer Science at the University of Illinois at Urbana–Champaign. She is the recipient of the Symantec Research Labs Fellowship, Rising Stars, the MIT Technology Review TR-35 award, the Intel Rising Star award, the NSF CAREER Award, research awards from tech companies such as Amazon, Facebook, Google, and IBM, and best paper awards at several machine learning and security conferences. Her research focuses on both theoretical and practical aspects of trustworthy machine learning, security, machine learning, privacy, and game theory. She has designed several scalable frameworks for robust machine learning and privacy-preserving data publishing systems. Her work has been featured by major publications and media outlets such as Nature, Wired, Fortune, and the New York Times.

Abstract: Advances in machine learning have led to the rapid and widespread deployment of learning-based inference and decision making in safety-critical applications, such as autonomous driving and security diagnostics. Current machine learning systems, however, assume that training and test data follow the same, or similar, distributions, and do not consider active adversaries manipulating either distribution. Recent work has demonstrated that motivated adversaries can circumvent anomaly detection or other machine learning models at test time through evasion attacks, or can inject well-crafted malicious instances into training data to induce errors at inference time through poisoning attacks. In this talk, I will describe my recent research on security and privacy problems in machine learning systems. In particular, I will introduce several adversarial attacks in different domains and discuss potential defensive approaches and principles, including game-theoretic and knowledge-enabled robust learning paradigms, towards developing practical robust learning systems with robustness guarantees.
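
For readers unfamiliar with evasion attacks, the sketch below shows the fast gradient sign method (FGSM), one standard test-time evasion attack that perturbs an input along the sign of the loss gradient. This is our own illustration of the general attack class mentioned above, not code from the talk; the model and input names are placeholders.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, eps=0.03):
    """Fast Gradient Sign Method: a basic test-time evasion attack.

    Perturbs input x by eps along the sign of the loss gradient, so the
    model becomes more likely to misclassify the perturbed input.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    x_adv = x + eps * x.grad.sign()    # one gradient-sign step
    return x_adv.clamp(0, 1).detach()  # keep pixels in the valid range

# Usage (placeholders): model is any torch.nn.Module classifier,
# x a batch of images in [0, 1], y the true class labels.
# x_adv = fgsm_attack(model, x, y)
```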

Talk 3: Large Scale Vertical Federated Learning
Time: 10:25 - 10:55

Liefeng Bo

Dr. Liefeng Bo is the Head of the Silicon Valley R&D Center at JD Technology, where he leads a team developing advanced AI technologies. He was previously a Principal Scientist at Amazon, building a grab-and-go shopping experience using computer vision, deep learning, and sensor fusion technologies. He received his Ph.D. from Xidian University in 2007 and was a postdoctoral researcher at TTIC and the University of Washington. His research interests are in machine learning, deep learning, computer vision, robotics, and natural language processing. He won the National Excellent Doctoral Dissertation Award of China in 2010 and the Best Vision Paper Award at ICRA 2011.

Abstract: In this talk, we will share our experiences in developing FedLearn, a large-scale vertical federated learning framework that focuses on scalability over vertically partitioned data sets through both algorithmic and engineering optimizations. We start by adopting a distributed framework for Federated Random Forest, which constructs classification or regression trees in parallel over multiple subsets of randomly selected samples and features on clusters of machines. Federated Random Forest relies on time-consuming homomorphic encryption to protect transmitted data, and we further boost its efficiency through homomorphic encryption acceleration, high-performance computing, message compression, and asynchronous transmission. Second, we develop Federated Kernel Learning, which introduces efficient kernel-based gradient approximations to protect transmitted data rather than relying on homomorphic encryption. We demonstrate that kernel-based approximations work well with parallel computing and scale well with data size. FedLearn provides a series of reliable optimizations to vertical federated learning algorithms and makes them highly scalable and efficient for industrial applications. We expect FedLearn to provide researchers and developers with practical solutions for large-scale vertical federated learning scenarios.
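
To give a flavor of why homomorphic encryption dominates the cost in such systems, here is a minimal sketch (ours, not FedLearn code) of additively homomorphic aggregation across vertically partitioned parties, using the open-source python-paillier (phe) library. The party scores and key size are illustrative.

```python
# Minimal sketch of additively homomorphic aggregation, as used in vertical
# federated learning: each party holds different feature columns for the SAME
# samples and contributes a partial score; the coordinator sums the scores
# without ever seeing any party's plaintext.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Partial linear scores computed locally by three parties on their own
# feature columns for one sample (illustrative values).
partial_scores = [0.7, -1.2, 0.4]

# Each party encrypts its partial score before transmission. Encryption with
# a 2048-bit key is the expensive step that acceleration techniques target.
encrypted = [public_key.encrypt(s) for s in partial_scores]

# The coordinator adds ciphertexts directly: Enc(a) + Enc(b) = Enc(a + b).
encrypted_sum = sum(encrypted[1:], encrypted[0])

# Only the private-key holder can recover the aggregated score.
assert abs(private_key.decrypt(encrypted_sum) - sum(partial_scores)) < 1e-9
```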

Talk 4: De-biasing algorithms for images seen and unseen
Time: 10:55 - 11:35

Cees Snoek

Prof. Cees Snoek is a full professor in computer science at the University of Amsterdam, where he heads the Video & Image Sense Lab. He is also a director of three public-private AI research labs: the QUVA Lab with Qualcomm, the Atlas Lab with TomTom, and the AIM Lab with the Inception Institute of Artificial Intelligence. At university spin-off Kepler Vision Technologies, he acts as Chief Scientific Officer. Professor Snoek is also the director of the master's program in Artificial Intelligence and a co-founder of the Innovation Center for Artificial Intelligence. His research interests focus on making sense of video and images. He has published over 200 refereed book chapters, journal articles, and conference papers, and frequently serves as an area chair of the major conferences in computer vision and multimedia. He is currently an associate editor for Computer Vision and Image Understanding and the IEEE Transactions on Pattern Analysis and Machine Intelligence.

Abstract: It is well known that datasets in multimedia computing have a strong built-in bias, as they can represent only a narrow view of the real world. Even though addressing biases from the start of dataset creation is highly recommended, models learned from such data can still be affected by spurious correlations and produce unfair decisions. In this talk, I present two algorithms that mitigate this bias for image classification. For seen images, we identify a bias direction in the feature space that corresponds to the main direction of maximum variance of class-specific prototypes. In light of this, we propose to learn to map inputs to domain-specific embeddings, where each value of a protected attribute has its own domain. For unseen images, in a generalized zero-shot learning setting, we propose a bias-aware learner that maps inputs to a semantic embedding space. During training, the model learns to regress to real-valued class prototypes in the embedding space with temperature scaling, while a margin-based bidirectional entropy term regularizes seen and unseen probabilities. Experiments demonstrate the benefits of the proposed de-biased classifiers in multi-label and zero-label settings, as well as their ability to improve the fairness of predictions.
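
As a rough illustration of the bias-direction idea for seen images (our reading of the abstract, not the authors' implementation), the sketch below estimates the direction of maximum variance of class-specific prototypes via SVD and projects it out of the feature vectors. All array names and sizes are illustrative.

```python
import numpy as np

def remove_bias_direction(features, prototypes):
    """Project out the main direction of variance of class prototypes.

    features:   (n_samples, d) feature vectors
    prototypes: (n_classes, d) class-specific prototype vectors
    """
    centered = prototypes - prototypes.mean(axis=0, keepdims=True)
    # Top right-singular vector = direction of maximum prototype variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    bias_dir = vt[0]  # unit vector of shape (d,)
    # Remove each feature's component along the bias direction.
    return features - np.outer(features @ bias_dir, bias_dir)

# Illustrative usage with random data standing in for real embeddings.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 16))
protos = rng.normal(size=(5, 16))
debiased = remove_bias_direction(feats, protos)

# Check: de-biased features have zero component along the bias direction.
centered = protos - protos.mean(axis=0, keepdims=True)
bias_dir = np.linalg.svd(centered, full_matrices=False)[2][0]
assert np.allclose(debiased @ bias_dir, 0)
```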

Talk 5: Trustworthy Visual Understanding: Generic Representation Learning and Explainable Interpretation
Time: 11:35 - 12:15

Ting Yao

Dr. Ting Yao is currently a Principal Researcher in the Vision and Multimedia Lab at JD AI Research, Beijing, China. His research interests include video understanding, vision and language, and deep learning. Prior to joining JD.com, he was a Researcher with Microsoft Research Asia, Beijing, China. Ting is the lead architect of several top-performing multimedia analytics systems in international benchmark competitions, such as the ActivityNet Large Scale Activity Recognition Challenge 2016-2019, the Visual Domain Adaptation Challenge 2017-2019, and the COCO Image Captioning Challenge. He is also the lead organizer of the MSR Video to Language Challenge at ACM Multimedia 2016 & 2017, and built MSR-VTT, a large-scale video-to-text dataset that is widely used worldwide. His work has led to many awards, including the ACM SIGMM Outstanding Ph.D. Thesis Award 2015, the ACM SIGMM Rising Star Award 2019, and the IEEE TCMC Rising Star Award 2019. He is an Associate Editor of IEEE Transactions on Multimedia.

Abstract: One important factor credited for the remarkable developments in computer vision today is the emergence of deep neural networks. Despite their encouraging performance, these approaches are increasingly opaque to end users and brittle in real-world deployment of intelligent systems. As such, trustworthy AI has recently gained significant traction, aiming to develop AI systems that can be certified to be trustworthy and robust. In this talk, we will describe our recent works on three important aspects of AI trustworthiness: 1) Generalization – learning generic visual representations by self-supervised learning; 2) Generalization – pre-training a universal encoder-decoder structure for cross-modal tasks; 3) Explainability – designing explainable representations/models for interpreting images/videos. We will conclude with some potential future directions.

Talk 6: Towards Efficient and Explainable End-to-End Reinforcement Driving Policy Learning
Time: 13:30 - 14:00

Fisher Yu

Fisher Yu is an Assistant Professor at ETH Zürich in Switzerland. He obtained his Ph.D. from Princeton University and was a postdoctoral researcher at UC Berkeley. He now leads the Visual Intelligence and Systems (VIS) group, part of the Computer Vision Lab (CVL) at ETH Zürich. His goal is to build perceptual systems capable of performing complex tasks in complex environments, and his research lies at the junction of machine learning, computer vision, and robotics. He currently works on closing the loop between vision and action. His works on image representation learning and large-scale datasets, especially dilated convolutions and the BDD100K dataset, have become essential parts of computer vision research.

Abstract: End-to-end driving policy learning has always been an attractive topic among AI researchers. Recent advances in this area shed light on how we may achieve it in real systems. In this talk, I will briefly review the current promising methods. I will then introduce our recent work on learning an effective expert to teach an imitation policy learning agent, which can achieve outstanding driving performance in a simulation environment. Further, I will discuss making existing policy learning methods more explainable and data-efficient from a reinforcement learning perspective. Our work draws inspiration from traditional model predictive control methods. I hope these ideas can inspire more researchers to study end-to-end learning systems.

Paper Presentation

Time          Paper
14:00-14:20   An Empirical Study of Uncertainty Gap for Disentangling Factors
14:20-14:40   Patch Replacement: A Transformation-based Method to Improve Robustness against Adversarial Attacks
14:40-15:00   Dataset Diversity: Measuring and Mitigating Geographical Bias in Image Search and Retrieval
15:00-15:20   Hierarchical Semantic Enhanced Directional Graph Network for Visual Commonsense Reasoning

Call for Contributions

We believe the workshop will offer a timely collection of research updates that will benefit people working in fields ranging from multimedia and computer vision to machine learning. To this end, we solicit original research and survey papers addressing (but not limited to) the topics listed below:

Important dates

All deadlines are at midnight (23:59) Anywhere on Earth.

Instructions

We use the same formatting template as ACM Multimedia 2021. Submissions can vary in length from 4 to 8 pages, plus additional pages for references. There is no distinction between long and short papers: all papers will undergo the same review process and review period. All contributions must be submitted through CMT.

Use the following link: https://cmt3.research.microsoft.com/ACMMM2021/ and select the track “1st International Workshop on Trustworthy AI for Multimedia Computing”. Accepted workshop papers will be published in the ACM Digital Library.

Schedule

Time Event
9:00 Opening remarks
9:05 Invited talk 1
9:45 Invited talk 2
10:25 Invited talk 3
10:55 Invited talk 4
11:35 Invited talk 5
12:15 Lunch break
13:30 Invited talk 6
14:00 Oral 1
14:20 Oral 2
14:40 Oral 3
15:00 Oral 4
15:20 Poster Session and tea break

Organizers

Teddy Furon
INRIA
Jingen Liu
JD.com
Yogesh Rawat
University of Central Florida (UCF)
Wei Zhang
JD.com
Qi Zhao
University of Minnesota (UMN)

Advisory Board

Prof. Mohan Kankanhalli National University of Singapore, Singapore
Prof. Jiebo Luo University of Rochester, USA
Dr. Tao Mei JD.com AI Research, China
Prof. Mubarak Shah University of Central Florida, USA
Dr. Bowen Zhou JD.com AI Research, China

Program Committee

Ping Liu Senior Research Scientist, A*STAR, Singapore
Naveed Akhtar Assistant Professor, University of Western Australia, Australia
Zheng Shou Research Scientist, Facebook AI, USA
Jian Liu JD.com Silicon Valley Research Labs, USA
Ziyan Wu Principal Scientist, United Imaging Intelligence, USA
Fisher Yu Assistant Professor, ETH Zürich, Switzerland
Huazhu Fu Senior Scientist, IIAI, United Arab Emirates
Yingwei Pan Research Scientist, JD AI Research, China
Yalong Bai Research Scientist, JD AI Research, China
Rahul Ambati PhD Student, University of Central Florida
Aayush Rana PhD Student, University of Central Florida