
Georgia Tech at ECCV 2022

Computer vision research involves creating the next generation of machines that understand videos and images as easily as humans do. Georgia Tech’s latest contributions to the field will be presented Oct. 23-27 at the European Conference on Computer Vision.

Explore Georgia Tech’s latest work, the experts behind the tech, and where computer vision is headed next.

Georgia Tech Research in
COMPUTER VISION

Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. Using digital images from cameras and videos and deep learning models, machines can accurately identify and classify objects — and then implement various actions based on the machine’s understanding of images.

Georgia Tech is advancing computer vision research as image data drives innovations in areas such as self-driving vehicles, digital medical imagery, agricultural bots, facial recognition, and more.

Georgia Tech research at the European Conference on Computer Vision, Oct. 23-27, 2022, involves multiple approaches and contributions to the field. Learn more about the experts whose discoveries are leading to the next generation of computer vision systems.

Welcome to your front-row seat for Georgia Tech work in the main ECCV program, taking place in Tel Aviv, Israel.

FEATURED RESEARCH

Students Developing Home Robot That Can Tidy a House on its Own

By Nathan Deen

Struggling with keeping your home clean and organized? You may soon have an extra set of hands to help around the house.

Imagine a home robot that can keep a house tidy without being given any commands from its owner. Well, the next step in home robotics is here — at least virtually.

A group of doctoral and master’s students from the School of Interactive Computing, in collaboration with researchers from the University of Toronto, believe they have created a benchmark for a home robot that can keep an entire house tidy.

In their paper, Housekeep: Tidying Virtual Households Using Commonsense Reasoning, Georgia Tech doctoral candidates Harsh Agrawal and Andrew Szot, master’s students Arun Ramachandran and Sriram Yenamandra, and Yash Kant, a former research visitor at Georgia Tech who is now a doctoral candidate at Toronto, set out to prove that an embodied artificial intelligence (AI) could conduct simple housekeeping tasks without explicit instructions.


Researchers (L to R from top): Arun Ramachandran, Yash Kant, Igor Gilitschenski, Dhruv Batra, Andrew Szot, Sriram Yenamandra, and Harsh Agrawal

‘Housekeep’ in Action

Housekeep is a benchmark for evaluating commonsense reasoning in the home for embodied AI. In Housekeep, an embodied agent must tidy a house by rearranging misplaced objects without explicit instructions specifying which objects need to be rearranged. Instead, the agent must learn from, and is evaluated against, human preferences for where objects belong in a tidy house. The researchers collected a dataset of where humans typically place objects in tidy and untidy houses, comprising 1,799 objects, 268 object categories, 585 placements, and 105 rooms.
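To make the evaluation idea concrete, here is a minimal sketch of how scoring against human placement preferences might work. The object names, receptacles, and the `episode_success` metric below are illustrative assumptions for this article, not the Housekeep paper's actual evaluation code.

```python
# Illustrative sketch (not the Housekeep codebase): score a tidying
# episode by how many objects the agent left in a receptacle that
# human annotators consider correct for a tidy house.

# Where humans say each object belongs (in Housekeep, such preferences
# are collected from human annotators; these entries are made up).
human_preferences = {
    "mug":        {"kitchen_cabinet", "dish_rack"},
    "toothbrush": {"bathroom_shelf"},
    "book":       {"bookshelf", "desk"},
}

def episode_success(final_placements, preferences):
    """Fraction of objects left in a human-preferred receptacle."""
    correct = sum(
        1 for obj, receptacle in final_placements.items()
        if receptacle in preferences.get(obj, set())
    )
    return correct / len(final_placements)

# The agent's final object placements after one episode:
final = {"mug": "dish_rack", "toothbrush": "kitchen_cabinet", "book": "desk"}
score = episode_success(final, human_preferences)  # 2 of 3 objects correct
```

The key point the sketch captures is that no task instruction names the misplaced objects; correctness is defined entirely by the human-preference mapping.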

FEATURED RESEARCH

CLOSING THE GAP: Teaching Computers to Reliably Identify Objects in Images using Large-Scale Unannotated Data

By Joshua Preston

Zsolt Kira is one of Georgia Tech’s leading roboticists and machine learning researchers making advancements in computer vision.

Training computers to detect and reliably identify objects in images is a challenge even with advanced computing power. Humans need only see common objects a few times to learn what they are and the lesson sticks. Computer programs require massive annotated datasets, where humans draw boxes around all objects and identify them. One way to get around this is to train computers to use unlabeled data, by allowing the computer to make predictions of what is in the image (called “pseudo-labels”) and training on those predictions. If the computer gets the object classification wrong—such as labeling a walrus as a dog—the faulty data may be used in the future and the system risks becoming unreliable.
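The pseudo-labeling idea described above can be sketched in a few lines. This is an illustrative sketch, not the team's implementation; the function name and the 0.9 confidence threshold are assumptions chosen for the example. A common safeguard is to keep a model's prediction as a training label only when the model is confident, so that mistakes (like the walrus labeled as a dog) are less likely to pollute future training.

```python
# Illustrative sketch (not the researchers' code): confidence-threshold
# pseudo-labeling, the core idea behind self-training on unlabeled data.

def select_pseudo_labels(predictions, threshold=0.9):
    """Keep (image_id, label) pairs whose confidence clears the threshold.

    predictions: (image_id, predicted_label, confidence) triples from a
                 model trained on the small labeled set.
    """
    return [(img, label) for img, label, conf in predictions
            if conf >= threshold]

# Example: three model guesses on unlabeled images.
preds = [
    ("img_001", "dog",    0.97),  # confident -> kept as a pseudo-label
    ("img_002", "walrus", 0.55),  # uncertain -> discarded, never trained on
    ("img_003", "cat",    0.93),  # confident -> kept
]
pseudo_labeled = select_pseudo_labels(preds, threshold=0.9)
```

Thresholding does not eliminate faulty pseudo-labels (a model can be confidently wrong), which is why mitigating mislabeling at scale remains an open research problem.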

Zsolt Kira, assistant professor in the School of Interactive Computing, and his team, including Ph.D. student Yen-Cheng Liu and collaborators at Meta, are working to train computer programs to more accurately classify objects in images and mitigate the risks of mislabeling data in large-scale real-world unlabeled datasets.

New research from the group is some of the first to explore semi-supervised object detection (SSOD), in which a model trained on labeled data generates pseudo-labels for raw images that haven’t been labeled, for open datasets found on the internet.


ECCV 2022 Papers with Georgia Tech Authors


ORAL

Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Oran Gafni (Meta AI Research); Adam Polyak (Facebook); Oron Ashual (Facebook AI Research); Shelly Sheynin (Meta); Devi Parikh (Georgia Tech & Facebook AI Research); Yaniv Taigman (Facebook)

Open-Set Semi-Supervised Object Detection
Yen-Cheng Liu (Georgia Institute of Technology); Chih-Yao Ma (Facebook); Xiaoliang Dai (Facebook); Junjiao Tian (Georgia Institute of Technology); Peter Vajda (Facebook); Zijian He (Facebook); Zsolt Kira (Georgia Institute of Technology)

PressureVision: Estimating Hand Pressure from a Single RGB Image
Patrick L Grady (Georgia Institute of Technology); Chengcheng Tang (Facebook Reality Labs); Samarth Brahmbhatt (Intel); Christopher D Twigg (Meta); Chengde Wan (Facebook Reality Lab); James Hays (Georgia Institute of Technology, USA); Charlie Kemp (Georgia Institute of Technology)

POSTER

A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch
Patsorn Sangkloy (Georgia Institute of Technology); Wittawat Jitkrittum (Google Research); Diyi Yang (Georgia Institute of Technology); James Hays (Georgia Institute of Technology, USA)

BLT: Bidirectional Layout Transformer for Controllable Layout Generation
Xiang Kong (Carnegie Mellon University); Lu Jiang (Google Research); Huiwen Chang (Google); Han Zhang (Google); Yuan Hao (Google); Haifeng Gong (Google Inc.); Irfan Essa (Google & Georgia Tech)

CoGS: Controllable Generation and Search from Sketch and Style
Cusuh Ham (Georgia Institute of Technology); Gemma Canet Tarrés (CVSSP, University of Surrey); Tu Bui (University of Surrey); James Hays (Georgia Institute of Technology, USA); Zhe Lin (Adobe Research); John Collomosse (Adobe Research)

Decomposing The Tangent of Occluding Boundaries According to Curvatures and Torsions
Huizong Yang (Georgia Institute of Technology); Anthony Yezzi (Georgia Institute of Technology)

Egocentric Activity Recognition and Localization on a 3D Map
Miao Liu (Georgia Institute of Technology); Lingni Ma (Facebook Reality Labs); Kiran Somasundaram (Facebook Reality Labs); Yin Li (University of Wisconsin-Madison); Kristen Grauman (Facebook AI Research & UT Austin); James Rehg (Georgia Institute of Technology); Chao Li (Facebook Reality Labs)

Generative Adversarial Network for Future Hand Segmentation from Egocentric Video
Wenqi Jia (Georgia Institute of Technology); Miao Liu (Georgia Institute of Technology); James Rehg (Georgia Institute of Technology)

Housekeep: Tidying Virtual Households using Commonsense Reasoning
Yash Mukund Kant (University of Toronto); Arun Ramachandran (Georgia Institute of Technology); Sriram Yenamandra (Georgia Institute of Technology); Igor Gilitschenski (University of Toronto); Dhruv Batra (Georgia Tech & Facebook AI Research); Andrew Szot (Georgia Institute of Technology); Harsh Agrawal (Georgia Institute of Technology)

Improved Masked Image Generation with Token-Critic
Jose Lezama (Google Research); Huiwen Chang (Google); Lu Jiang (Google Research); Irfan Essa (Google & Georgia Tech)

Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer
Songwei Ge (University of Maryland); Thomas F Hayes (Meta); Harry Yang (Facebook); Xi Yin (Facebook); Guan Pang (Facebook); David Jacobs (University of Maryland, USA); Jia-Bin Huang (Facebook ); Devi Parikh (Georgia Tech & Facebook AI Research)

MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration
Thomas F Hayes (Meta); Songyang Zhang (University of Rochester); Xi Yin (Facebook); Guan Pang (Facebook); Sasha Sheng (Meta Platforms); Harry Yang (Facebook); Songwei Ge (University of Maryland, College Park); Qiyuan Hu (Facebook AI Research); Devi Parikh (Georgia Tech & Facebook AI Research)

Planes vs. Chairs: Category-guided 3D shape learning without any 3D cues
Zixuan Huang (Georgia Institute of Technology); Stefan Stojanov (Georgia Institute of Technology); Anh Thai (Georgia Institute of Technology); Varun Jampani (Google); James Rehg (Georgia Institute of Technology)

PT4AL: Using Self-Supervised Pretext Tasks for Active Learning
John Seon Keun Yi (Georgia Institute of Technology); Minseok Seo (si-analytics); Jongchan Park (Lunit); Dong-Geol Choi (Hanbat National University)

SALVe: Semantic Alignment Verification for Floorplan Reconstruction from Sparse Panoramas
John W Lambert (Georgia Institute of Technology); Yuguang Li (Zillow Group); Ivaylo Boyadzhiev (Zillow Group); Lambert Wixson (Zillow Group); Manjunath Narayana (Zillow group); Will A Hutchcroft (Zillow Group); James Hays (Georgia Institute of Technology, USA); Frank Dellaert (Georgia Tech); Sing Bing Kang (Zillow Group)

ShAPO: Implicit Representations for Multi-Object Shape, Appearance, and Pose Optimization
Muhammad Zubair Irshad (Georgia Institute of Technology); Sergey Zakharov (Toyota Research Institute); Rareș A Ambruș (Toyota Research Institute); Thomas Kollar (Toyota Research Institute); Zsolt Kira (Georgia Institute of Technology); Adrien Gaidon (Toyota Research Institute)

Towards Regression-Free Neural Networks for Diverse Compute Platforms
Rahul Duggal (Georgia Tech); Hao Zhou (Amazon); Shuo Yang (Amazon); Jun Fang (Amazon); Yuanjun Xiong (Amazon); Wei Xia (Amazon)

VQGAN-CLIP: Open Domain Image Generation and Manipulation Using Natural Language
Katherine B Crowson (EleutherAI); Stella R Biderman (Booz Allen Hamilton); daniel kornis (Eleuther.ai); Dashiell Stander (Eleuther AI); Eric Hallahan (EleutherAI); Louis J Castricato (Georgia Tech); Edward Raff (Booz Allen Hamilton)

WORKSHOP


SkeleVision: Towards Adversarial Resiliency of Person Tracking with Multi-Task Learning
Nilaksh Das, ShengYun Peng, Duen Horng Chau

Hydra Attention: Efficient Attention with Many Heads
Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Judy Hoffman

Machine Learning Center at Georgia Tech

The Machine Learning Center was founded in 2016 as an interdisciplinary research center (IRC) at the Georgia Institute of Technology. Since then, we have grown to include over 190 affiliated faculty members and 145 Ph.D. students, all publishing at world-renowned conferences. The center aims to research and develop innovative and sustainable technologies using machine learning and artificial intelligence (AI) that serve our community in socially and ethically responsible ways. Our mission is to establish a research community that leverages the Georgia Tech interdisciplinary context, trains the next generation of machine learning and AI pioneers, and is home to current leaders in machine learning and AI.

