Robots, Humans and Action
This research project expands the capabilities of robots by building representations and models that will allow a robot to understand human activity and the intent of human actions, with the ultimate goal of developing methods that facilitate robot-human interaction and cooperation.
Team Members
Hongdong Li
Australian National University
Chief Investigator Professor Hongdong Li has been with the College of Engineering and Computer Science, ANU since 2004. He was seconded to National ICT Australia (NICTA) as a Senior Research Scientist during 2008-2010, working on the ‘Australia Bionic Eyes’ Project. In 2010 he assumed a tenured position with ANU, teaching and conducting research in 3D computer vision and robotics. He joined the ACRV in 2014 as one of the founding members. During 2017-2018 he was a Visiting Professor with the Robotics Institute at Carnegie Mellon University (CMU), Pittsburgh. During 2019-2020 he served as the Associate School Director for ANU Research School of Engineering.
Jointly with his students and co-workers, he has won a number of prestigious awards in computer vision, including the David Marr Prize (Honourable Mention) in 2017, the IEEE CVPR Best Paper Award in 2012, the IEEE ICIP Best Student Paper Prize in 2014, and the IEEE ICPR Best Student Paper in 2010. Both the Marr Prize and the CVPR Best Paper Award are highly regarded in the international computer vision community. He has supervised, co-supervised or graduated more than 20 PhD students in the area of computer vision. His research projects have been funded by the Australian Research Council, CSIRO, and global technology firms including Microsoft Research, General Motors, Toshiba and Baidu AI.
Richard Hartley
Australian National University
Richard is renowned as one of the founders of the field of multi-view geometry in computer vision; his text has received over 28,000 citations. He contributes to the Centre’s Camera Hardware and Learning for Vision projects. Richard has been at ANU since January 2001. He was also the Program Leader for the Autonomous Systems and Sensor Technology Program of NICTA. Richard worked at the General Electric Research and Development Center from 1985 to 2001, where he became involved with Image Understanding and Scene Reconstruction, working with GE’s Simulation and Control Systems Division. This division built large-scale flight simulators. Richard’s projects in this area involved the construction of terrain models and texture mosaics from aerial and satellite imagery. From 1995 he was GE project leader for a shared-vision project with Lockheed-Martin involving the design and implementation of algorithms for an AFIS (fingerprint analysis) system being developed under a Lockheed-Martin contract with the FBI. This involved work in feature extraction, interactive fingerprint editing and fingerprint database matching. In 2000, he co-authored (with Andrew Zisserman) a book for Cambridge University Press, summarizing the previous decade’s research in multi-view geometry. (Over 60,000 citations and an h-index of 78.)
Dylan Campbell
Australian National University
Dylan joined the Centre as a Research Fellow at the ANU in August 2018. Previously, he was a PhD student at ANU and Data61/CSIRO, where he worked on geometric vision problems, and a research assistant in the Cyber-Physical Systems group of Data61/CSIRO, where he worked on Resource Constrained Vision. Dylan received a BE in Mechatronic Engineering from the University of New South Wales. He has broad research interests within computer vision and robotics, including geometric vision and human-centred vision. In particular, he has investigated geometric sensor data alignment problems, such as camera localisation, simultaneous localisation and mapping, and structure from motion.
He is currently looking at the problems of recognising, modelling, and predicting human actions, poses and human-object interactions, with a view to facilitating robot-human interaction as part of a Centre project.
Fatemeh Saleh
Australian National University
Fatemeh joined the Centre as a Research Fellow at ANU in January 2019. Prior to that, she was a PhD student at ANU and Data61/CSIRO, working on weakly-supervised semantic segmentation of images and videos. Within the Centre, she is now working on the problem of video understanding and latent-variable generative models, with a focus on multiple object tracking, human motion prediction, and video activity analysis.
Yizhak (Itzik) Ben-Shabat
Australian National University
Itzik joined the Centre as a Research Fellow at the ANU node in July 2019. Previously, he was a PhD student at the Technion, Israel Institute of Technology, where he worked on “Classification, segmentation, and geometric analysis of 3D point clouds using deep learning” under the supervision of Professor Anath Fischer and Michael Lindenbaum. Itzik completed his BSc cum laude in 2008 and his MSc summa cum laude in 2015 (Mechanical Engineering, Technion). His research interests lie at the intersection of robotic perception, 3D computer vision, and geometric analysis, usually using 3D point cloud data.
During his time at the Centre, he played a key role in the IKEA assembly dataset team, joined the RVSS organizing committee and presented DeepFit, a novel surface fitting method, at ECCV 2020 as an oral presentation. His mission is to make beautifully practical and accessible 3D data algorithms to change the world.
Cristian Rodriguez Opazo
Australian National University
Cristian joined the Centre as a PhD researcher under the supervision of Chief Investigator Hongdong Li and Research Fellow Basura Fernando. His research interests are machine learning and pattern recognition, with a focus on the tasks of object detection, scene understanding and occlusion handling. Cristian completed a Bachelor's degree and a Computer Engineering degree at Metropolitan Technological University (UTEM) in Chile, before moving to Australia to complete a Master of Computing (Advanced) with a specialisation in artificial intelligence at ANU, supported by a Chilean ‘Becas Chile’ scholarship. Before Cristian joined ANU, he also worked as a research assistant and developer in the Web Intelligence Centre in Chile.
Frederic ‘Zhen’ Zhang
Australian National University
Fred is a PhD student at ANU under the supervision of Professor Stephen Gould. In 2018, he received a bachelor's degree in engineering from ANU and a bachelor of science from Beijing Institute of Technology.
Fred has been working on the task of Human-Object Interaction (HOI) Detection, and is generally interested in vision-based problems and their deep learning solutions.
Sadegh Aliakbarian
Australian National University
Sadegh is an Associated PhD researcher at our ANU node. He is working on generative modeling of natural human motion, which has applications in human motion prediction, motion synthesis, and better motion capture. He is also working on generative models in general, focusing on variational autoencoders, autoregressive models, and normalizing flows. During his PhD, Sadegh has done several internships, working on motion analysis, adversarial machine learning, and generative modeling.
Project Aim
To understand human actions and intent, robots need to make inferences from visual and motion cues, just as humans do. This research project expands the capabilities of robots by building representations and models that allow a robot to understand human activity and the interaction of humans with objects in their environment. The research is important because it will ultimately enable robots to co-operate with humans to complete complex tasks in unstructured environments; for example, assembling a piece of furniture in the home.
Key Results
In 2020, the project team continued to make important scientific contributions to solving the problems of human activity recognition and forecasting. Two papers by Centre researchers (Fatemeh Saleh, Xin Yu and Hongdong Li) and colleagues received best paper nominations at the 2020 Conference on Computer Vision and Pattern Recognition (CVPR), for work on saliency detection and sign language recognition, respectively. This work also has applications in understanding human gestures for human-robot cooperation. Another paper by Centre researchers Cristian Rodriguez Opazo, Xin Yu, Hongdong Li and collaborator Dongxu Li received an honourable mention at the 2020 Winter Conference on Applications of Computer Vision (WACV).
Project Leader Stephen Gould, Professor Richard Hartley and Postdoctoral Fellow Dylan Campbell presented a workshop at CVPR 2020 on Deep Declarative Networks, which received a Centre award for Best Profile Raising Event in Robotics and Computer Vision Communities. This was followed by a tutorial on the same topic, organised by Postdoctoral Fellow Itzik Ben-Shabat, which was held at the European Conference on Computer Vision (ECCV) and included presentations by colleagues from Stanford and Facebook.
The team also published the Ikea Assembly dataset at WACV. The dataset is a multi-modal and multi-view video collection of furniture assembly tasks to enable rich analysis and understanding of human activities. It contains 371 samples of furniture assemblies and their ground-truth annotations. Each sample includes three RGB views, one depth stream, atomic actions, human poses, object segments, object tracking, and extrinsic camera calibration. The videos, annotations and associated code for data processing have been publicly released to the research community and were included in the Best of ACRV Repository, which is available on the Centre’s Legacy Website.
The dataset enabled a demonstration of human-robot cooperation in assembling a small Ikea table, in collaboration with the Manipulation Project. The demo, led by Postdoctoral Fellow Itzik Ben-Shabat and PhD Student Zheyu Zhuang, was showcased at RoboVis 2020. Discussions with Ikea Sweden are underway on research collaborations that can further extend this work.
The end of the year saw several of our research team take up new positions as the Centre comes to a close. Postdoctoral Fellow Dylan Campbell has taken up a position with the Visual Geometry Group at Oxford University, Itzik Ben-Shabat has commenced a prestigious three-year Marie-Curie Fellowship, and PhD student Cristian Rodriguez successfully submitted his PhD thesis, titled “Video Analysis for Understanding Human Actions and Interactions”. Postdoctoral Fellow Fatemeh Saleh and PhD Student Sadegh Aliakbarian were also awarded a DSTG grant to research player analytics and forecasting.