DICTA 2021 - Keynote Speakers

Keynote Speech 1:
Speaker: Jeremy Howard, fast.ai
Title: What we've learned about creating accurate image models quickly and easily
We've spent the last few years trying to reduce the barriers to creating image models. That meant figuring out how to train accurate models that need less data, less compute, less development time, and less complexity. In this talk I'll describe what we've learned during this time, and how we've made those learnings easily accessible in an open source software library (fastai).

Speaker Biography:

Jeremy Howard is a data scientist, researcher, developer, educator, and entrepreneur. Jeremy is a founding researcher at fast.ai, a research institute dedicated to making deep learning more accessible. He is also a Distinguished Research Scientist at the University of San Francisco, the chair of WAMRI, and is Chief Scientist at platform.ai.

Previously, Jeremy was the founding CEO of Enlitic, the first company to apply deep learning to medicine, which was selected as one of the world’s top 50 smartest companies by MIT Tech Review two years running. He was the President and Chief Scientist of the data science platform Kaggle, where he was the top-ranked participant in international machine learning competitions two years running. He was the founding CEO of two successful Australian startups (FastMail, and Optimal Decisions Group, purchased by LexisNexis). Before that, he spent eight years in management consulting at McKinsey & Co and A.T. Kearney. Jeremy has invested in, mentored, and advised many startups, and contributed to many open source projects.

He has many media appearances, including writing for the Guardian, USA Today, and the Washington Post, appearing on ABC (Good Morning America), MSNBC (Joy Reid), CNN, Fox News, BBC, and was a regular guest on Australia’s highest-rated breakfast news program. His talk on TED.com, "The wonderful and terrifying implications of computers that can learn", has over 2.5 million views. He is a co-founder of the global Masks4All movement.

Keynote Speech 2:
Speaker: Stefan Hrabar, PhD, Emesent
Title: Using autonomous drones to map and explore underground mines

Capturing data in underground mines is critical for optimising production and improving the safety of operations. With continued depletion of near-surface ore bodies, mines are getting deeper, which increases hazards such as seismicity. Monitoring this requires more data collection, which paradoxically increases worker exposure to hazards. The use of autonomous systems breaks this impasse by delivering inspection and data capture methods that do not compromise the safety of personnel. In this talk I’ll describe how Emesent’s Hovermap solution allows drones to navigate challenging GPS-denied underground environments to capture 3D data, as well as the valuable insights that can be derived from this data.

Speaker Biography:

Dr. Stefan Hrabar has been at the forefront of drone autonomy R&D for nearly 20 years. Following his PhD in Computer Science / Robotics on this topic, he spent 13 years at CSIRO where he continued his work on vision and lidar-based perception and navigation for drones. He led the development and commercialisation of Hovermap in CSIRO, and co-founded Emesent in 2018 to bring this ground-breaking technology to market.

Keynote Speech 3:
Speaker: Professor Mohammed Bennamoun, The University of Western Australia
Title: Computer and Robot Vision

Robotics has made significant progress in structured and constrained environments, e.g., manufacturing. However, it is still in its infancy when it comes to applications in unstructured and unconstrained situations, e.g., social environments. In some aspects, such as speed, strength, and accuracy, robots have capabilities superior to humans', but that is not the case for person/object recognition, language, manual dexterity, and social interaction and understanding.

Developing a computer vision system with human visual recognition capabilities has been a major challenge. It has been hindered mainly by: (i) the unavailability of 3D sensors (with the capabilities of the human eye) able to simultaneously capture appearance (colour and texture) and the surface shapes of objects while in motion, and (ii) the unavailability of algorithms to process this information in real time. Recently, a number of affordable 3D sensors have appeared on the market, which is resulting in the development of practical 3D systems. Examples include 3D object and 3D face recognition for biometric applications, as well as home robotic platforms to assist the elderly with mild cognitive impairment.

The objective of the talk is to describe a few 3D computer vision projects and tools used towards the development of a platform for assistive robotics in messy living environments. Various systems, their applications, and their motivations will be described, including 3D object recognition, 3D face/ear biometrics, grasping of unknown objects, and systems to estimate the 3D pose of a person.

Speaker Biography:

Mohammed Bennamoun is a Winthrop Professor in the Department of Computer Science and Software Engineering at the University of Western Australia (UWA) and a researcher in computer vision, machine/deep learning, robotics, and signal/speech processing. He has published 4 books (available on Amazon), 1 edited book, 1 encyclopedia article, 14 book chapters, 150+ journal papers, 260+ conference publications, and 16 invited and keynote publications. His h-index is 60+ and his number of citations is 16,000+ (Google Scholar). He has been awarded 70+ competitive research grants from the Australian Research Council and numerous other government, UWA, and industry sources. He has successfully supervised 26+ PhD students to completion. He won the Best Supervisor of the Year Award at Queensland University of Technology (1998), received awards for research supervision at UWA (2008 and 2016), and received a Vice-Chancellor's Award for mentorship (2016). He has delivered tutorials at major conferences, including IEEE CVPR 2016, Interspeech 2014, IEEE ICASSP, and ECCV, and was invited to give a tutorial at an International Summer School on Deep Learning (DeepLearn 2017).

Keynote Speech 4:
Speaker: Tao Mei, PhD, IEEE/IAPR Fellow, JD.COM
Title: Towards Deep Visual Understanding: from Perception to Cognition

With the rise of deep learning over the past decade, there has been steady momentum of innovation and breakthroughs that push the limits and improve the state of the art of visual understanding through visual perception (e.g., object detection and image recognition). Most existing perception techniques rely heavily on large amounts of labeled data and are very vulnerable to adversarial perturbations, which limits the practical applications of deep visual understanding in the wild. Researchers are now delving into the cognition paradigm, which mimics the inference and reasoning abilities of humans. This talk will briefly review recent progress on deep visual understanding, ranging from perception to cognition, as well as its applications in retail, logistics, and manufacturing scenarios.

Speaker Biography:

Tao Mei is a Vice President at JD.COM and the Deputy Managing Director of JD AI Research. Prior to joining JD.COM in 2018, he was a Senior Research Manager with Microsoft Research Asia. He has authored or co-authored over 200 publications (with 12 best paper awards) in journals and conferences. He is or has been an editorial board member of leading multimedia journals and a general/program chair of premier multimedia conferences. He was elected a Fellow of the IEEE (2019), a Fellow of the IAPR (2016), and a Distinguished Scientist of the ACM (2016) for his contributions to large-scale multimedia analysis and applications.