Dr. Deepan Muthirayan is currently an Assistant Professor at Plaksha University.
He obtained his PhD from the University of California, Berkeley (2016) and a Dual Degree (BTech/MTech) from the Indian Institute of Technology Madras (2010). His doctoral thesis focused on market mechanisms for integrating demand flexibility into energy markets. He was a post-doctoral associate in the Department of Electrical and Computer Engineering at Cornell University, where his work focused on optimization, parametric learning, and matching markets. He was then a post-doctoral fellow at the University of California, Irvine, where his work focused on topics at the intersection of machine learning, decision and control, and online learning and optimization.
Dr. Deepan's current research interests lie in reinforcement learning, machine learning, robotics, online learning, and multi-agent systems.
Reinforcement learning and robotics:
Robotics can play a major role in automating processes such as manufacturing, warehouse operations, surgery, and agriculture. Automation can make these processes time-efficient, high-performing, and agile, improving operational outcomes. For example, manually monitoring soil and crops across a large area of agricultural land is time-consuming and error-prone; robots can significantly reduce the time and improve the accuracy of monitoring soil and crop conditions. Similarly, quadrupeds can perform very complex search-and-rescue operations in highly unstructured environments. Thus, robotics and automation, augmented by modern AI tools, have the potential to perform complex operations with high agility, performance, and efficiency.
A technique that has played a major role in the recent advancement of robotics is Reinforcement Learning (RL). But RL still faces many practical challenges: on the algorithmic side, the design of reward functions; on the training side, sample efficiency. While simulation is currently used to compensate for the lack of real-world samples, transferring policies trained in simulation onto a real robot can lead to catastrophic failures because of the sim-to-real gap. This is especially true for highly dynamic, real-world-like environments, for instance, high-speed locomotion in the wild.
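As an illustration of one common way to narrow the sim-to-real gap, the sketch below shows domain randomization: physics parameters that are uncertain on the real robot are resampled at every training episode, so the learned policy cannot overfit to a single simulated world. This is a minimal, hypothetical sketch; the names make_sim, policy, and train_step are placeholders rather than any specific library's API.

```python
import random

# Hypothetical sketch of domain randomization. All names (make_sim,
# policy, train_step) are illustrative placeholders, not a real API.

def sample_physics():
    """Randomize parameters the real robot's dynamics are uncertain about."""
    return {
        "ground_friction": random.uniform(0.4, 1.2),
        "link_mass_scale": random.uniform(0.8, 1.2),
        "motor_latency_s": random.uniform(0.0, 0.04),
    }

def train_with_domain_randomization(make_sim, policy, train_step, episodes=10_000):
    for _ in range(episodes):
        sim = make_sim(**sample_physics())   # fresh randomized world each episode
        obs = sim.reset()
        done = False
        while not done:
            action = policy.act(obs)
            next_obs, reward, done, _ = sim.step(action)
            train_step(policy, obs, action, reward, next_obs, done)
            obs = next_obs
```

The key design choice is which parameters to randomize and over what ranges, which itself requires domain knowledge about how the simulator differs from the real system.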
His research is currently focused on simplifying the training process and on training robust and safe policies for the real world. Simplifying training involves many aspects, such as reducing the number of samples required to train an RL policy and streamlining the typically cumbersome reward-tuning process. The goal is a principled, streamlined approach for training intelligent robots for the real world. Such schemes will go a long way toward enabling real-world robotic applications such as agriculture, medical robotics, and complex operations in hazardous environments like search and rescue. The research will simultaneously advance reinforcement learning itself by contributing architectures and algorithms for training agile, robust, and safe policies.
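To make the reward-tuning burden concrete, here is a minimal, hypothetical shaped reward for a robotic reaching task: a sparse success bonus is combined with dense distance and effort terms, and the relative weights are exactly the kind of quantity that typically has to be tuned by hand.

```python
import numpy as np

def shaped_reward(ee_pos, goal_pos, action, reached_tol=0.02):
    """Toy shaped reward for reaching a goal with an end-effector.
    All weights below are illustrative and would normally need manual tuning."""
    dist = np.linalg.norm(ee_pos - goal_pos)
    sparse = 10.0 if dist < reached_tol else 0.0   # task-success bonus
    dense = -1.0 * dist                            # dense progress-to-goal term
    effort = -0.01 * np.sum(action ** 2)           # penalize large torques
    return sparse + dense + effort
```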
Online learning:
Online learning is a paradigm of machine learning in which the agent learns sequentially, one data sample at a time. This is relevant, for instance, when the environment is drifting and the model must be continuously updated, or learned "on the fly" with no prior data. The theoretical study of online learning is rich and has led to insightful algorithms, such as bandit algorithms, which have applications in recommender systems and beyond.
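As a concrete example of learning one sample at a time, below is a minimal sketch of the classic UCB1 bandit algorithm applied to a toy recommendation problem; the click-through rates are made up for illustration.

```python
import math
import random

class UCB1:
    """Classic UCB1 bandit: pick the arm with the best optimistic estimate.
    One sample arrives per round, and the estimates update on the fly."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # times each arm was played
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.t = 0                    # total rounds so far

    def select_arm(self):
        self.t += 1
        for arm, c in enumerate(self.counts):
            if c == 0:                # play each arm once first
                return arm
        bonus = lambda a: math.sqrt(2 * math.log(self.t) / self.counts[a])
        return max(range(len(self.counts)), key=lambda a: self.values[a] + bonus(a))

    def update(self, arm, reward):
        self.counts[arm] += 1
        # incremental update of the running mean
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Toy usage: three "recommendations" with unknown click-through rates.
bandit = UCB1(n_arms=3)
true_rates = [0.2, 0.5, 0.35]
for _ in range(1000):
    arm = bandit.select_arm()
    reward = 1.0 if random.random() < true_rates[arm] else 0.0
    bandit.update(arm, reward)
```

The optimism bonus shrinks for arms that have been tried often, so the algorithm automatically balances exploring uncertain options against exploiting the best-looking one.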
Recently, such algorithms have also been extended to the control of uncertain dynamical systems, a much harder setting given the non-static nature of a dynamical system. While many aspects of algorithm design for such settings are now understood, many open questions remain.
One question is how each agent in a multi-agent scenario should learn and update its actions when coordination must be achieved and each agent is privy only to its local information about the environment. This is a scenario where both information and control are decentralized: each agent not only has to learn to coordinate on the fly but also has to coordinate its exploration for quick convergence. His research currently explores the algorithmic aspects of decentralized online learning algorithms for such scenarios. This research can have broad implications for the design of AI algorithms for multi-agent systems, such as multi-robot systems.
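To give a flavor of this setting, below is a minimal sketch of decentralized online gradient descent: each agent takes a gradient step using only its private loss, then averages its iterate with its neighbors over a communication graph (a consensus step). The ring topology and quadratic losses are toy assumptions for illustration, not a specific algorithm from his work.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim, rounds, step = 4, 2, 500, 0.05

# Ring topology: each agent communicates only with its two neighbors
# (plus itself), using doubly stochastic mixing weights.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    for j in (i - 1, i, (i + 1) % n_agents):
        W[i, j % n_agents] = 1 / 3

targets = rng.normal(size=(n_agents, dim))   # each agent's private target
x = np.zeros((n_agents, dim))                # per-agent decision variables

for t in range(rounds):
    grads = 2 * (x - targets)        # gradient of each local quadratic loss
    x = W @ (x - step * grads)       # local gradient step, then neighbor averaging

# All agents end up near the average of the private targets, i.e. they
# coordinate on the global optimum using only local communication.
print(x.round(2), targets.mean(axis=0).round(2))
```

The consensus step is what substitutes for a central coordinator: no agent ever sees another agent's loss, yet information diffuses through the graph round by round.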