The Conference on Robot Learning (CoRL) is an annual international conference focusing on the intersection of robotics and machine learning. CoRL 2023 takes place in Atlanta, Nov. 6 – 9, and is hosted by Georgia Tech. Among the nearly 900 experts in the main program—hailing from 25 countries—are Georgia Tech’s researchers from Aerospace Engineering, Computer Science, Electrical and Computer Engineering, Interactive Computing, and Mechanical Engineering.

Explore Georgia Tech’s approach to creating a new generation of thinking machines.

Georgia Tech @ CoRL 2023

Georgia Tech’s contributions at CoRL represent the expansive and complex field of robot learning. Implementing “embodied” artificial intelligence (aka robots) has many implications. Explore Georgia Tech’s work at CoRL and where our experts are innovating in robot learning.

HOW TO READ:

  • The chord diagram shows the keyword topics that appear across GT@CoRL papers. (Standardized topics are based on author-supplied keywords.)
  • The chords connect topics that appear together on the same paper.
  • The width of a chord’s base represents the number of papers on which a topic pair appears together.
  • The most common pairing is “Imitation Learning/Learning from Demonstration,” which appears on four papers.
  • Manipulation—the ability of a robot to physically interact with objects in the world and manipulate them to complete a task—is the top topic, appearing in 43% of papers from Georgia Tech.

Dhruv Batra

Assoc. Professor, Interactive Computing

Yongxin Chen

Assoc. Professor, Aerospace Engineering

Sonia Chernova

Assoc. Professor, Interactive Computing

Animesh Garg

Asst. Professor, Interactive Computing

Matthew Gombolay

Asst. Professor, Interactive Computing

Sehoon Ha

Asst. Professor, Interactive Computing

Zsolt Kira

Asst. Professor, Interactive Computing

Harish Ravichandar

Asst. Professor, Interactive Computing

Greg Turk

Professor, Interactive Computing

Bruce Walker

Professor, Interactive Computing/Psychology

Danfei Xu

Asst. Professor, Interactive Computing

Ye Zhao

Asst. Professor, Mechanical Engineering

Graduate Students

Ezra Ameperosa • Matthew Bronars • Letian Chen • Shuo Cheng • Shivang Chopra • Kevin Fu • Yunhai Han • Byeolyi Han • Pierce Howell • Sravan Jayanthi • Mukul Khanna • J. Taery Kim • Yash Kothari • Sachit Kuhar • Arjun Majumdar • Utkarsh Aashu Mishra • Maithili Patel • Erik Scarlatescu • Reza Joseph Torbati • Zixuan Wu • Mandy Xie • Shangjie Xue • Sean Charles Ye • Sriram Yenamandra • Zulfiqar Haider Zaidi

Alumni & Others

Taylor Keith Del Matto • Daniel Martin • Aswin Gururaj Prakash • Arun Ramachandran


Large Language Models Help Humans to Communicate with Robots

By Nathan Deen

Roboticists have a new way to communicate instructions to robots — large language models (LLMs).

Georgia Tech robotics researchers Animesh Garg and Danfei Xu think LLMs like ChatGPT make the best human-to-robot translators when it comes to training robots.

Garg said LLMs have a strong understanding of human language and can communicate instantly with robots by translating instructions into code. This can shorten the training process by weeks and, in some cases, months.

“It’s something that has taken us by surprise,” Garg said. “These models can be applied to many tasks with little or no user input and without retraining. They don’t need specific data for the task that we need them to do. Traditional training methods take a substantial amount of time and data for training.”
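The translation pattern Garg describes, natural-language instructions mapped to calls against a robot's API, can be sketched in a few lines. Everything below (`toy_translate`, `ROBOT_API`) is a hypothetical stand-in; a real system would query an LLM and drive actual robot primitives.

```python
# Illustrative sketch of the instruction-to-code pattern described above.
# A real system would call an LLM; `toy_translate` is a hypothetical
# stand-in so the control flow is runnable without a model or a robot.

ROBOT_API = {
    "pick": lambda obj: f"picked {obj}",
    "place": lambda obj, loc: f"placed {obj} on {loc}",
}

def toy_translate(instruction):
    """Stand-in for an LLM that maps language to robot API calls."""
    if "coffee" in instruction:
        return [("pick", ("mug",)), ("place", ("mug", "coffee machine"))]
    return []

def execute(instruction):
    """Run each generated API call in order and collect the results."""
    log = []
    for name, args in toy_translate(instruction):
        log.append(ROBOT_API[name](*args))
    return log
```

The point of the pattern is that the language model, not a hand-written parser, produces the sequence of API calls, so new tasks need new prompts rather than new training data.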

From left, Danfei Xu and Animesh Garg are using large language models to train robots to perform assistive tasks like making coffee. Xu and Garg will present multiple papers Nov. 6 to Nov. 9 at the 2023 Conference on Robot Learning in Atlanta.

LLMs are going to take the job of high-level reasoning. They will be part of the decision-making about what needs to be done in a setting with all the knowledge a robot has about the world.

Danfei Xu, Asst. Professor, School of Interactive Computing

I don’t think anyone has looked at an outside attack in quite the same way we have. If we allow robots access to previous experiences with attackers, they could learn a response strategy. They can adjust their communication scheme or alter their behavior.

Matthew Gombolay, Asst. Professor, School of Interactive Computing

Friendly Hacking Helps Robots Boost Defense Strategies

By Nathan Deen

Multi-agent robotic communication systems used by first responders, traffic and air-traffic control, and search and rescue teams can get a boost in security thanks to a Georgia Tech research team.

Matthew Gombolay, an assistant professor in the School of Interactive Computing, said all those systems are vulnerable to hacking. And the best way to demonstrate it is by hacking them himself.

Friendly hacking allows researchers like Gombolay to develop new training models with enhanced hacking defense strategies. He believes the best way to build more secure multi-agent communication systems is to predict how an adversary might hack them.

Countdown to CoRL 2023

Organizers for the 2023 Conference on Robot Learning (CoRL), Nov. 6 – 9, predict a record-setting attendance, with more than 800 people expected to join the conference at the Starling Hotel in Midtown Atlanta. With Georgia Tech hosting, the international conference brings together top young researchers whose work explores robotics and machine learning. Google created and hosted the first CoRL in 2017, and is a major sponsor of the event.

Five Georgia Tech faculty members are serving on this year’s organizing committee. Hosting CoRL represents part of Georgia Tech’s commitment to the growth of robotics, AI, and machine learning.

CoRL lies at a strategic intersection of topics for Georgia Tech research. It’s the premier conference for machine learning and robotics and has seen rapid growth since its first year in 2017.

Sonia Chernova, General Co-chair, CoRL 2023
Georgia Tech hosts the 7th Conference on Robot Learning (CoRL), Nov. 6 – 9 in Atlanta. Coffee-serving bots may become a theme…

ORAL

Nov. 7, 8:30 a.m. to 9:30 a.m.

  • On the Utility of Koopman Operator Theory in Learning Dexterous Manipulation Skills (Yunhai Han, Mandy Xie, Ye Zhao, Harish Ravichandar)

Nov. 7, 1:45 p.m. to 2:45 p.m.

  • Hijacking Robot Teams Through Adversarial Communication (Zixuan Wu, Sean Charles Ye, Byeolyi Han, Matthew Gombolay)

Nov. 8, 8:30 a.m. to 9:30 a.m.

  • Language-Guided Traffic Simulation via Scene-Level Diffusion (Ziyuan Zhong, Davis Rempe, Yuxiao Chen, Boris Ivanovic, Yulong Cao, Danfei Xu, Marco Pavone, Baishakhi Ray)

Nov. 9, 8:30 a.m. to 9:30 a.m.

  • MimicPlay: Long-Horizon Imitation Learning by Watching Human Play (Chen Wang, Linxi Fan, Jiankai Sun, Ruohan Zhang, Li Fei-Fei, Danfei Xu, Yuke Zhu, Anima Anandkumar)

POSTERS

Nov. 7, 2:45 p.m. to 3:30 p.m.

  • On the Utility of Koopman Operator Theory in Learning Dexterous Manipulation Skills (Yunhai Han, Mandy Xie, Ye Zhao, Harish Ravichandar)
  • Generative Skill Chaining: Long-Horizon Skill Planning with Diffusion Models (Utkarsh Aashu Mishra, Shangjie Xue, Yongxin Chen, Danfei Xu)

Nov. 7, 4:15 p.m. to 5:00 p.m.

  • DROID: Learning from Offline Heterogeneous Demonstrations via Reward-Policy Distillation (Sravan Jayanthi, Letian Chen, Nadya Balabanska, Van Duong, Erik Scarlatescu, Ezra Ameperosa, Zulfiqar Haider Zaidi, Daniel Martin, Taylor Del Matto, Masahiro Ono, Matthew Gombolay)
  • Learning to Discern: Imitating Heterogeneous Human Demonstrations with Preference and Representation Learning (Sachit Kuhar, Shuo Cheng, Shivang Chopra, Matthew Bronars, Danfei Xu)
  • Hijacking Robot Teams Through Adversarial Communication (Zixuan Wu, Sean Charles Ye, Byeolyi Han, Matthew Gombolay)

Nov. 8, 12:00 p.m. to 12:45 p.m.

  • HomeRobot: Open-Vocabulary Mobile Manipulation (Sriram Yenamandra, Arun Ramachandran, Karmesh Yadav, Austin S Wang, Mukul Khanna, Theophile Gervet, Tsung-Yen Yang, Vidhi Jain, Alexander Clegg, John M Turner, Zsolt Kira, Manolis Savva, Angel X Chang, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi, Yonatan Bisk, Chris Paxton)
  • Transforming a Quadruped into a Guide Robot for the Visually Impaired: Formalizing Wayfinding, Interaction Modeling, and Safety Mechanism (J. Taery Kim, Wenhao Yu, Yash Kothari, Bruce Walker, Jie Tan, Greg Turk, Sehoon Ha)
  • Language-Guided Traffic Simulation via Scene-Level Diffusion (Ziyuan Zhong, Davis Rempe, Yuxiao Chen, Boris Ivanovic, Yulong Cao, Danfei Xu, Marco Pavone, Baishakhi Ray)

Nov. 8, 5:15 p.m. to 6:00 p.m.

  • Human-in-the-Loop Task and Motion Planning for Imitation Learning (Ajay Mandlekar, Caelan Reed Garrett, Danfei Xu, Dieter Fox)
  • Predicting Routine Object Usage for Proactive Robot Assistance (Maithili Patel, Aswin Prakash, Sonia Chernova)
  • FindThis: Language-Driven Object Disambiguation in Indoor Environments (Arjun Majumdar, Fei Xia, brian ichter, Dhruv Batra, Leonidas Guibas)

Nov. 9, 2:45 p.m. to 3:30 p.m.

  • Geometry Matching for Multi-Embodiment Grasping (Maria Attarian, Muhammad Adil Asif, Animesh Garg, Igor Gilitschenski, Jonathan Tompson)
  • Neural Field Dynamics Model for Granular Object Piles Manipulation (Shangjie Xue, Shuo Cheng, Pujith Kachana, Danfei Xu)
  • Composable Part-Based Manipulation (Weiyu Liu, Jiayuan Mao, Joy Hsu, Tucker Hermans, Animesh Garg, Jiajun Wu)
  • MimicPlay: Long-Horizon Imitation Learning by Watching Human Play (Chen Wang, Linxi Fan, Jiankai Sun, Ruohan Zhang, Li Fei-Fei, Danfei Xu, Yuke Zhu, Anima Anandkumar)

Nov. 9, 4:15 p.m. to 5:00 p.m.

  • Generalization of Heterogeneous Multi-Robot Policies via Awareness and Communication of Capabilities (Pierce Howell, Max Rudolph, Reza Joseph Torbati, Kevin Fu, Harish Ravichandar)

Composable Part-Based Manipulation

Weiyu Liu, Jiayuan Mao, Joy Hsu, Tucker Hermans, Animesh Garg, Jiajun Wu

We compose diffusion models based on different part-part correspondences to improve learning and generalization of robotic manipulation skills.

In this paper, we propose CPM, a novel approach that leverages object-part decomposition and part-part correspondences to improve learning and generalization of robotic manipulation skills. By considering the functional affordances between object parts, we conceptualize functional actions, such as pouring and constrained placing, as combinations of different constraints. CPM comprises a collection of composable diffusion models, where each model captures a different inter-object correspondence. These diffusion models can generate parameters for manipulation skills based on the specific object parts. Leveraging part-based correspondences coupled with the task decomposition into distinct constraints enables strong generalization to novel objects and object categories. We validate our approach in both simulated and real-world scenarios, demonstrating its effectiveness in achieving robust and generalized manipulation capabilities. KW: Manipulation, Diffusion

DROID: Learning from Offline Heterogeneous Demonstrations via Reward-Policy Distillation

Sravan Jayanthi, Letian Chen, Nadya Balabanska, Van Duong, Erik Scarlatescu, Ezra Ameperosa, Zulfiqar Haider Zaidi, Daniel Martin, Taylor Del Matto, Masahiro Ono, Matthew Gombolay

Offline Learning from Demonstrations (OLfD) is valuable in domains where trial-and-error learning is infeasible or specifying a cost function is difficult, such as robotic surgery, autonomous driving, and path-finding for NASA’s Mars rovers. However, two key problems remain challenging in OLfD: 1) heterogeneity: demonstration data can be generated with diverse preferences and strategies, and 2) generalizability: the learned policy and reward must perform well beyond a limited training regime in unseen test settings. To overcome these challenges, we propose Dual Reward and policy Offline Inverse Distillation (DROID), where the key idea is to leverage diversity to improve generalization performance by decomposing common-task and individual-specific strategies and distilling knowledge in both the reward and policy spaces. We ground DROID in a novel and uniquely challenging Mars rover path-planning problem for NASA’s Mars Curiosity Rover. We also curate a novel dataset along 163 Sols (Martian days) and conduct a novel, empirical investigation to characterize heterogeneity in the dataset. We find DROID outperforms prior SOTA OLfD techniques, leading to a 26% improvement in modeling expert behaviors and 92% closer to the task objective of reaching the final destination. We also benchmark DROID on the OpenAI Gym Cartpole environment and find DROID achieves 55% (significantly) better performance modeling heterogeneous demonstrations. KW: Learning from Demonstration, Model Distillation, Imitation Learning

FindThis: Language-Driven Object Disambiguation in Indoor Environments

Arjun Majumdar, Fei Xia, brian ichter, Dhruv Batra, Leonidas Guibas

We present a new task, dataset, and method focused on language-driven object disambiguation in indoor 3D environments.

Natural language is naturally ambiguous. In this work, we consider interactions between a user and a mobile service robot tasked with locating a desired object, specified by a language utterance. We present a task FindThis, which addresses the problem of how to disambiguate and locate the particular object instance desired through a dialog with the user. To approach this problem we propose an algorithm, GoFind, which exploits visual attributes of the object that may be intrinsic (e.g., color, shape), or extrinsic (e.g., location, relationships to other entities), expressed in an open vocabulary. GoFind leverages the visual common sense learned by large language models to enable fine-grained object localization and attribute differentiation in a zero-shot manner. We also provide a new visio-linguistic dataset, 3D Objects in Context (3DOC), for evaluating agents on this task consisting of Google Scanned Objects placed in Habitat-Matterport 3D scenes. Finally, we validate our approach on a real robot operating in an unstructured physical office environment using complex fine-grained language instructions. KW: Natural Language, Navigation, Vision and Depth

Generalization of Heterogeneous Multi-Robot Policies via Awareness and Communication of Capabilities

Pierce Howell, Max Rudolph, Reza Joseph Torbati, Kevin Fu, Harish Ravichandar

We investigate how awareness and communication of robots’ capabilities can enable generalization of heterogeneous multi-robot coordination policies trained using multi-agent reinforcement learning.

Recent advances in multi-agent reinforcement learning (MARL) provide promising tools for enabling heterogeneous multi-robot teaming without the need for manually-designed controllers. However, existing approaches often overlook the challenge of generalizing learned policies to teams of new compositions, sizes, and robots. While such generalization might not be important in teams of virtual agents that can retrain policies on-demand, it is pivotal in multi-robot systems that are deployed in the real world and must adapt to inevitable changes without retraining. As such, multi-robot policies must remain robust to team changes — an ability we call adaptive teaming. In this work, we investigate if awareness and communication of robot capabilities can improve such generalization. We conduct detailed experiments involving a heterogeneous sensor network task implemented on an established multi-robot test bed. We demonstrate that shared decentralized policies, which enable robots to be both aware of and communicate their capabilities, achieve adaptive teaming by implicitly capturing the fundamental relationship between collective capabilities and effective coordination. Videos of trained policies can be viewed at https://sites.google.com/view/cap-comm. KW: Multi-Agent, Generalization and Robustness

Generative Skill Chaining: Long-Horizon Skill Planning with Diffusion Models

Utkarsh Aashu Mishra, Shangjie Xue, Yongxin Chen, Danfei Xu

We introduce Generative Skill Chaining, a probabilistic framework that learns skill-centric diffusion models and composes their learned distributions to generate long-horizon plans for unseen task skeletons during inference.

Long-horizon tasks, usually characterized by complex subtask dependencies, present a significant challenge in manipulation planning. Skill chaining is a practical approach to solving unseen tasks by combining learned skill priors. However, such methods are myopic if sequenced greedily and face scalability issues with search-based planning strategies. To address these challenges, we introduce Generative Skill Chaining (GSC), a probabilistic framework that learns skill-centric diffusion models and composes their learned distributions to generate long-horizon plans during inference. GSC samples from all skill models in parallel to efficiently solve unseen tasks while enforcing geometric constraints. We evaluate the method on various long-horizon tasks and demonstrate its capability in reasoning about action dependencies, constraint handling, and generalization, along with its ability to replan in the face of perturbations. We show results in simulation and on a real robot to validate the efficiency and scalability of GSC, highlighting its potential for advancing long-horizon task planning. More details are available at: https://sites.google.com/view/generative-skill-chaining. KW: Planning, Manipulation, Diffusion, Task and Motion Planning
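The core compositional idea, sampling from a product of learned skill distributions so that the plan satisfies every skill at once, has a simple closed form in the Gaussian case. This toy sketch illustrates only that idea and is not the GSC algorithm:

```python
# Toy illustration (not GSC itself) of composing two learned skill
# distributions: the product of Gaussians favors plan parameters that
# both skills consider feasible, and it has a closed-form solution.

def compose_gaussians(mu1, var1, mu2, var2):
    """Precision-weighted product of two 1-D Gaussians."""
    prec = 1.0 / var1 + 1.0 / var2   # precisions add under the product
    var = 1.0 / prec
    mu = var * (mu1 / var1 + mu2 / var2)
    return mu, var
```

With equal variances the composed mean lands halfway between the two skills' preferences, and the composed variance shrinks, reflecting increased agreement.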

Geometry Matching for Multi-Embodiment Grasping

Maria Attarian, Muhammad Adil Asif, Animesh Garg, Igor Gilitschenski, Jonathan Tompson

While significant progress has been made on the problem of generating grasps, many existing learning-based approaches still concentrate on a single embodiment, provide limited generalization to higher-DoF end-effectors, and cannot capture a diverse set of grasp modes. In this paper, we tackle the problem of multi-embodiment grasping through the viewpoint of learning rich geometric representations for both objects and end-effectors using Graph Neural Networks (GNN). Our novel method – GeoMatch – applies supervised learning on grasping data from multiple embodiments, learning end-to-end contact point likelihood maps as well as conditional autoregressive prediction of grasps keypoint-by-keypoint. We compare our method against 3 baselines that provide multi-embodiment support. Our approach performs better across 3 end-effectors, while also providing competitive diversity of grasps. Examples can be found at geomatch.github.io. KW: Dexterous Manipulation, Manipulation, Graph Neural Networks

Hijacking Robot Teams Through Adversarial Communication

Zixuan Wu, Sean Charles Ye, Byeolyi Han, Matthew Gombolay

We contribute a novel black-box adversarial method that learns to hijack robot communication in a multi-agent setting without access to their ground-truth rewards or policies.

Communication is often necessary for robot teams to collaborate and complete a decentralized task. Multi-agent reinforcement learning (MARL) systems allow agents to learn how to collaborate and communicate to complete a task. These domains are ubiquitous and include safety-critical domains such as wildfire fighting, traffic control, or search and rescue missions. However, critical vulnerabilities may arise in communication systems as jamming the signals can interrupt the robot team. This work presents a framework for applying black-box adversarial attacks to learned MARL policies by manipulating only the communication signals between agents. Our system only requires observations of MARL policies after training is complete, as this is more realistic than attacking the training process. To this end, we imitate a learned policy of the targeted agents without direct interaction with the environment or ground truth rewards. Instead, we infer the rewards by only observing the behavior of the targeted agents. Our framework reduces reward by 201% compared to an equivalent baseline method and also shows favorable results when deployed in real swarm robots. Our novel attack methodology within MARL systems contributes to the field by enhancing our understanding of the reliability of multi-agent systems. KW: Multi-Agent, Reinforcement Learning

HomeRobot: Open-Vocabulary Mobile Manipulation

Sriram Yenamandra, Arun Ramachandran, Karmesh Yadav, Austin S Wang, Mukul Khanna, Theophile Gervet, Tsung-Yen Yang, Vidhi Jain, Alexander Clegg, John M Turner, Zsolt Kira, Manolis Savva, Angel X Chang, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi, Yonatan Bisk, Chris Paxton

We propose “open vocabulary mobile manipulation” as a key problem for robotics, and provide both a simulation and a reproducible real-world benchmark.

HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a wide range of objects in order to complete everyday tasks. Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment, and placing it in a commanded location. This is a foundational challenge for robots to be useful assistants in human environments, because it involves tackling sub-problems from across robotics: perception, language understanding, navigation, and manipulation are all essential to OVMM. In addition, integration of the solutions to these sub-problems poses its own substantial challenges. To drive research in this area, we introduce the HomeRobot OVMM benchmark, where an agent navigates household environments to grasp novel objects and place them on target receptacles. HomeRobot has two components: a simulation component, which uses a large and diverse curated object set in new, high-quality multi-room home environments; and a real-world component, providing a software stack for the low-cost Hello Robot Stretch to encourage replication of real-world experiments across labs. We implement both reinforcement learning and heuristic (model-based) baselines and show evidence of sim-to-real transfer. Our baselines achieve a 20% success rate in the real world; our experiments identify ways future research can improve performance. See videos on our website: https://home-robot-ovmm.github.io/. KW: Mobile Manipulation, Manipulation, Sim2Real

Human-in-the-Loop Task and Motion Planning for Imitation Learning

Ajay Mandlekar, Caelan Reed Garrett, Danfei Xu, Dieter Fox

We present Human-in-the-Loop Task and Motion Planning (HITL-TAMP), a novel system that selectively gives and takes control to and from a human teleoperator, enabling more efficient imitation learning.

Imitation learning from human demonstrations can teach robots complex manipulation skills, but is time-consuming and labor intensive. In contrast, Task and Motion Planning (TAMP) systems are automated and excel at solving long-horizon tasks, but they are difficult to apply to contact-rich tasks. In this paper, we present Human-in-the-Loop Task and Motion Planning (HITL-TAMP), a novel system that leverages the benefits of both approaches. The system employs a TAMP-gated control mechanism, which selectively gives and takes control to and from a human teleoperator. This enables the human teleoperator to manage a fleet of robots, maximizing data collection efficiency. The collected human data is then combined with an imitation learning framework to train a TAMP-gated policy, leading to superior performance compared to training on full task demonstrations. We compared HITL-TAMP to a conventional teleoperation system — users gathered more than 3x the number of demos given the same time budget. Furthermore, proficient agents (75%+ success) could be trained from just 10 minutes of non-expert teleoperation data. Finally, we collected 2.1K demos with HITL-TAMP across 12 contact-rich, long-horizon tasks and show that the system often produces near-perfect agents. Videos and additional results at https://sites.google.com/view/corl-2023-hitl-tamp. KW: Imitation Learning, Learning from Demonstration, Task and Motion Planning, Planning, Human-Robot Interaction
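The TAMP-gated control mechanism can be caricatured as a router that assigns each task segment to the planner or the human. The stubs below (`tamp_can_solve`, the segment dictionaries) are hypothetical, a sketch of the control flow rather than the authors' system:

```python
# Hedged sketch of the TAMP-gated control idea described in the abstract:
# a planner handles the segments it can solve and hands control to a
# human teleoperator for the contact-rich ones. Solvers are stubs here.

def tamp_can_solve(segment):
    """Stub: assume the planner solves everything except contact-rich steps."""
    return not segment.get("contact_rich", False)

def run_episode(segments):
    """Route each task segment to the planner or the human teleoperator."""
    controllers = []
    for seg in segments:
        controllers.append("tamp" if tamp_can_solve(seg) else "human")
    return controllers
```

Because the human only supplies the contact-rich segments, each minute of teleoperation produces demonstrations for exactly the parts a policy must learn, which is the source of the data-efficiency gain the abstract reports.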

Language-Guided Traffic Simulation via Scene-Level Diffusion

Ziyuan Zhong, Davis Rempe, Yuxiao Chen, Boris Ivanovic, Yulong Cao, Danfei Xu, Marco Pavone, Baishakhi Ray

A scene-level conditional diffusion model with an LLM-based language interface for realistic and controllable traffic simulation.

Realistic and controllable traffic simulation is a core capability that is necessary to accelerate autonomous vehicle (AV) development. However, current approaches for controlling learning-based traffic models require significant domain expertise and are difficult for practitioners to use. To remedy this, we present CTG++, a scene-level conditional diffusion model that can be guided by language instructions. Developing this requires tackling two challenges: the need for a realistic and controllable traffic model backbone, and an effective method to interface with a traffic model using language. To address these challenges, we first propose a scene-level diffusion model equipped with a spatio-temporal transformer backbone, which generates realistic and controllable traffic. We then harness a large language model (LLM) to convert a user’s query into a loss function, guiding the diffusion model towards query-compliant generation. Through comprehensive evaluation, we demonstrate the effectiveness of our proposed method in generating realistic, query-compliant traffic simulations. KW: Simulation, Multi-Agent, Diffusion, Large Language Models, Natural Language
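The second ingredient, an LLM-derived loss steering the diffusion sampler, boils down to gradient guidance. A minimal toy version, with a hand-written quadratic loss standing in for the LLM-generated one and an identity denoiser, might look like:

```python
import numpy as np

# Toy illustration (not CTG++ itself) of loss-guided generation: after
# each denoising step, nudge the sample down the gradient of a
# user-specified loss. Here the "traffic scene" is one 2-D position and
# the language-derived loss asks it to stay near a lane at y = 2.0.

def loss_grad(x, goal_y=2.0):
    """Gradient of the quadratic loss (y - goal_y)^2 w.r.t. the sample."""
    g = np.zeros_like(x)
    g[1] = 2.0 * (x[1] - goal_y)
    return g

def guided_denoise(x, steps=50, guide_weight=0.1):
    """Placeholder denoiser (identity) plus a gradient-guidance step."""
    for _ in range(steps):
        x = x - guide_weight * loss_grad(x)   # steer toward the query
    return x
```

In the real system the loss is emitted by the LLM from the user's natural-language query, and the guidance term is added to a trained diffusion model's denoising update rather than replacing it.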

Learning to Discern: Imitating Heterogeneous Human Demonstrations with Preference and Representation Learning

Sachit Kuhar, Shuo Cheng, Shivang Chopra, Matthew Bronars, Danfei Xu

L2D, a new imitation learning (IL) framework, enhances policy performance by learning from varied demonstrations, using latent trajectory representations to discern and prioritize high-quality training data in both simulated and physical robot tasks.

Practical Imitation Learning (IL) systems rely on large human demonstration datasets for successful policy learning. However, challenges lie in maintaining the quality of collected data and addressing the suboptimal nature of some demonstrations, which can compromise the overall dataset quality and hence the learning outcome. Furthermore, the intrinsic heterogeneity in human behavior can produce equally successful but disparate demonstrations, further exacerbating the challenge of discerning demonstration quality. To address these challenges, this paper introduces Learning to Discern (L2D), an imitation learning framework for learning from demonstrations with diverse quality and style. Given a small batch of demonstrations with sparse quality labels, we learn a latent representation for temporally-embedded trajectory segments. Preference learning in this latent space trains a quality evaluator that generalizes to new demonstrators exhibiting different styles. Empirically, we show that L2D can effectively assess and learn from varying demonstrations, thereby leading to improved policy performance across a range of tasks in both simulation and on a physical robot. KW: Imitation Learning, Learning from Demonstration, Manipulation

MimicPlay: Long-Horizon Imitation Learning by Watching Human Play

Chen Wang, Linxi Fan, Jiankai Sun, Ruohan Zhang, Li Fei-Fei, Danfei Xu, Yuke Zhu, Anima Anandkumar

We present MimicPlay, a novel imitation learning algorithm that leverages cost-effective human play data to learn long-horizon manipulation tasks in a sample-efficient manner.

Imitation learning from human demonstrations is a promising paradigm for teaching robots manipulation skills in the real world. However, learning complex long-horizon tasks often requires an unattainable amount of demonstrations. To reduce the high data requirement, we resort to human play data – video sequences of people freely interacting with the environment using their hands. Even with different morphologies, we hypothesize that human play data contain rich and salient information about physical interactions that can readily facilitate robot policy learning. Motivated by this, we introduce a hierarchical learning framework named MimicPlay that learns latent plans from human play data to guide low-level visuomotor control trained on a small number of teleoperated demonstrations. With systematic evaluations of 14 long-horizon manipulation tasks in the real world, we show that MimicPlay dramatically outperforms state-of-the-art imitation learning methods in task success rate, generalization ability, and robustness to disturbances. More details and video results can be found at https://mimicplaysub.github.io. KW: Imitation Learning, Learning from Demonstration, Human-Robot Interaction, Long-Horizon, Manipulation

Neural Field Dynamics Model for Granular Object Piles Manipulation

Shangjie Xue, Shuo Cheng, Pujith Kachana, Danfei Xu

Our approach combines trajectory optimization and differentiable rendering for granular object manipulation. It introduces a unified density-field-based representation for object states and actions, utilizing a fully convolutional network (FCN) to predict physical dynamics.

We present a learning-based dynamics model for granular material manipulation. Drawing inspiration from computer graphics’ Eulerian approach, our method adopts a fully convolutional neural network that operates on a density field-based representation of object piles, allowing it to exploit the spatial locality of inter-object interactions through the convolution operations. This approach greatly improves the learning and computation efficiency compared to existing latent or particle-based methods and sidesteps the need for state estimation, making it directly applicable to real-world settings. Furthermore, our differentiable action rendering module makes the model fully differentiable and can be directly integrated with a gradient-based algorithm for curvilinear trajectory optimization. We evaluate our model with a wide array of piles manipulation tasks both in simulation and real-world experiments and demonstrate that it significantly exceeds existing methods in both accuracy and computation efficiency. More details can be found at https://sites.google.com/view/nfd-corl23/. KW: Deformable Objects, Manipulation, Planning

On the Utility of Koopman Operator Theory in Learning Dexterous Manipulation Skills

Yunhai Han, Mandy Xie, Ye Zhao, Harish Ravichandar

This paper investigates the utility of Koopman operator theory on dexterous manipulation tasks and reveals a number of unique benefits.

Despite impressive dexterous manipulation capabilities enabled by learning-based approaches, we are yet to witness widespread adoption beyond well-resourced laboratories. This is likely due to practical limitations, such as significant computational burden, inscrutable learned behaviors, sensitivity to initialization, and the considerable technical expertise required for implementation. In this work, we investigate the utility of Koopman operator theory in alleviating these limitations. Koopman operators are simple yet powerful control-theoretic structures to represent complex nonlinear dynamics as linear systems in higher dimensions. Motivated by the fact that complex nonlinear dynamics underlie dexterous manipulation, we develop a Koopman operator-based imitation learning framework to learn the desired motions of both the robotic hand and the object simultaneously. We show that Koopman operators are surprisingly effective for dexterous manipulation and offer a number of unique benefits. Notably, policies can be learned analytically, drastically reducing computation burden and eliminating sensitivity to initialization and the need for painstaking hyperparameter optimization. Our experiments reveal that a Koopman operator-based approach can perform comparably to state-of-the-art imitation learning algorithms in terms of success rate and sample efficiency, while being an order of magnitude faster. Policy videos can be viewed at https://sites.google.com/view/kodex-corl. KW: Dexterous Manipulation, Manipulation
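The analytical flavor of the approach is easy to demonstrate: with a fixed lifting dictionary, the Koopman operator is fit by a single least-squares solve, with no iterative training. The sketch below uses an illustrative polynomial dictionary and is not the authors' KODex framework:

```python
import numpy as np

# Minimal numerical sketch: fit a Koopman operator K by least squares so
# that lifted states advance linearly, phi(x_{t+1}) ~= K phi(x_t). The
# lifting function is a small polynomial dictionary chosen for clarity.

def lift(x):
    """Lift a scalar state into a polynomial dictionary [1, x, x^2]."""
    x = np.atleast_1d(x)
    return np.concatenate([[1.0], x, x**2])

def fit_koopman(trajectory):
    """Solve K = Y X^+ analytically -- no iterative training needed."""
    X = np.column_stack([lift(x) for x in trajectory[:-1]])
    Y = np.column_stack([lift(x) for x in trajectory[1:]])
    return Y @ np.linalg.pinv(X)

def rollout(K, x0, steps):
    """Predict future states by iterating the linear lifted dynamics."""
    z = lift(x0)
    preds = []
    for _ in range(steps):
        z = K @ z
        preds.append(z[1])   # the linear term of the dictionary is the state
    return preds
```

For the simple decaying system x_{t+1} = 0.9 x_t, the dictionary is closed under the dynamics, so the one-shot fit recovers the evolution exactly; the appeal the paper describes is that the same closed-form recipe scales to the far richer dictionaries needed for hand-object dynamics.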

Predicting Routine Object Usage for Proactive Robot Assistance

Maithili Patel, Aswin Prakash, Sonia Chernova

We propose SLaTe-PRO, a model that learns to anticipate a user’s needs from past observations and uses them to provide proactive assistance, along with an interactive clarification mechanism that further refines its predictions.

Proactivity in robot assistance refers to the robot’s ability to anticipate user needs and perform assistive actions without explicit requests. This requires understanding user routines, predicting consistent activities, and actively seeking information to predict inconsistent behaviors. We propose SLaTe-PRO (Sequential Latent Temporal model for Predicting Routine Object usage), which improves upon the prior state of the art by combining object and user action information and conditioning object usage predictions on past history. Additionally, we find some human behavior to be inherently stochastic and lacking in contextual cues that the robot can use for proactive assistance. To address such cases, we introduce an interactive query mechanism that asks the user about their intended activities and object use to improve prediction. We evaluate our approach on longitudinal data from three households, spanning 24 activity classes. SLaTe-PRO raises the F1 score to 0.57 without queries and 0.60 with user queries, over a score of 0.43 from prior work. We additionally present a case study with a fully autonomous household robot. KW: Human-Robot Interaction
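The query mechanism described above only asks the user when a prediction is too uncertain to act on. A minimal sketch of that decision rule, with a naive frequency model standing in for the learned temporal model (all names here are illustrative, not from the paper):

```python
import math
from collections import Counter

def predict_object_usage(history):
    # stand-in predictor: empirical frequency of each object in past routines
    counts = Counter(history)
    total = sum(counts.values())
    return {obj: c / total for obj, c in counts.items()}

def entropy(dist):
    # Shannon entropy of the predictive distribution (in nats)
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def predict_or_query(dist, threshold, ask_user):
    # act on the most likely object when confident; otherwise ask the user
    if entropy(dist) > threshold:
        return ask_user()
    return max(dist, key=dist.get)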

Transforming a Quadruped into a Guide Robot for the Visually Impaired: Formalizing Wayfinding, Interaction Modeling, and Safety Mechanism

J. Taery Kim, Wenhao Yu, Yash Kothari, Bruce Walker, Jie Tan, Greg Turk, Sehoon Ha

This paper discusses principles and practical solutions for developing a robot guide dog, which must learn to guide a human safely, much as a real guide dog does.

This paper explores the principles for transforming a quadrupedal robot into a guide robot for individuals with visual impairments. A guide robot has great potential to resolve the limited availability of guide animals that are accessible to only two to three percent of the potential blind or visually impaired (BVI) users. To build a successful guide robot, our paper explores three key topics: (1) formalizing the navigation mechanism of a guide dog and a human, (2) developing a data-driven model of their interaction, and (3) improving user safety. First, we formalize the wayfinding task of the human-guide robot team using Markov Decision Processes based on the literature and interviews. Then we collect real human-robot interaction data from three visually impaired and six sighted people and develop an interaction model called the “Delayed Harness” to effectively simulate the navigation behaviors of the team. Additionally, we introduce an action shielding mechanism to enhance user safety by predicting and filtering out dangerous actions. We evaluate the developed interaction model and the safety mechanism in simulation, which greatly reduce the prediction errors and the number of collisions, respectively. We also demonstrate the integrated system on an AlienGo robot with a rigid harness, guiding users over 100+ m trajectories. KW: Human-Robot Interaction, Navigation
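Action shielding, as described above, checks each candidate action against a predicted outcome and filters out the dangerous ones before execution. A minimal sketch of that filter, with a hypothetical one-dimensional predictor and safety test (the paper's actual predictor and safety criterion are more involved):

```python
def shield(action_candidates, predict_next, is_safe, fallback):
    # return the first candidate whose predicted outcome passes the
    # safety check; if none do, fall back to a known-safe action
    for action in action_candidates:
        if is_safe(predict_next(action)):
            return action
    return fallback
```

In this form the shield sits between the policy and the robot: the policy proposes actions in order of preference, and the shield guarantees only safe ones reach the hardware.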

Explore an international view of CoRL 2023 research by country and topic. Interact with the map and click on the bars to go to the research papers.


See You in Atlanta!

Development: College of Computing, Machine Learning Center
Project and Web Lead; Data Graphics: Josh Preston
News: Nathan Deen
Feature Photos: Terence Rushin
Data Source: CoRL 2023
Additional Data Collection: Joni Isbell, Nathan Deen
Special Thanks: Sonia Chernova, Ethan Gordon, Kourosh Darvish, Jie Tan