Dr. Deepan Muthirayan

Assistant Professor, Plaksha University


PhD, University of California at Berkeley (2016)

Dual Degree (BTech/MTech), IIT Madras (2010)

Deepan is currently an Assistant Professor at Plaksha University.

He obtained his PhD from the University of California at Berkeley (2016) and his Dual Degree (BTech/MTech) from the Indian Institute of Technology Madras (2010). His doctoral thesis focused on market mechanisms for integrating demand flexibility into energy markets. He was a post-doctoral associate in the Department of Electrical and Computer Engineering at Cornell University, where his work focused on optimization, parametric learning and matching markets. He was then a post-doctoral fellow at the University of California, Irvine, where his work focused on topics at the intersection of machine learning, decision and control, online learning and optimization.

His current research spans the areas of reinforcement learning, machine learning, robotics and multi-agent systems.

Reinforcement learning and Robotics:

Robotics can play a major role in automating processes such as manufacturing, warehouse operations, surgery and agriculture. Automation can make a process time-efficient, high-performing and agile, improving operational outcomes. For example, having humans monitor soil and crops across a large area of agricultural land can be very time-consuming and error-prone. Robots can significantly reduce the time required and improve the accuracy of monitoring soil and crop conditions. Similarly, quadrupeds can perform very complex search and rescue operations in highly unstructured environments. Thus, robotics and automation augmented by modern AI tools have the potential to perform complex operations with high agility, performance and efficiency.

A technique that has played a major role in the recent advancement of robotics is reinforcement learning. However, many practical challenges remain. There are challenges on the algorithmic side and in the design of reward functions. There are also challenges on the training side, such as sample efficiency. While simulations are currently used to compensate for the lack of real-world samples, transferring policies trained in simulation onto a real robot can lead to catastrophic failures because of the sim-real gap.

His research is currently focused on:

  1. Improving the techniques for reward design,
  2. Developing new architectures and algorithms for mitigating the effect of the sim-real gap, and
  3. Improving the robustness of the designed control solutions.

Online learning:

Online learning is a machine learning paradigm in which the agent learns sequentially, one data sample at a time. This is relevant, for instance, when the environment is drifting and the model has to be continuously updated, or learnt “on-the-fly” with no prior data. The theoretical study of online learning is rich and has led to insightful algorithms such as bandit algorithms, which have applications in recommender systems, among other areas.

Recently, such algorithms have also been extended to the control of uncertain dynamical systems, which is a much harder setting given the non-static nature of a dynamical system. While many aspects of algorithm design for such settings are understood, many open questions remain.

One question is how each agent in a multi-agent scenario should learn and update its actions when coordination must be achieved while each agent is privy only to its local information about the environment. This is a scenario where both information and control are decentralized. Each agent not only has to learn on the fly to coordinate but must also coordinate exploration for quick convergence. Currently, his research explores the algorithmic aspects of decentralized online learning for such scenarios. This research can have broad implications for the design of AI algorithms for multi-agent systems such as multi-robot systems.