Vortex: Open-Source RISC-V GPGPU

Vortex Homepage

Vortex is an open-source hardware and software project to support GPGPU based on RISC-V ISA extensions. Currently, Vortex supports OpenCL and runs on FPGAs. The Vortex platform is highly customizable and scalable, with a complete open-source compiler, driver, and runtime software stack to enable research in GPU architectures.

Students: Blaise Tine, Roubing Han, Liam Cooper, Shinnung Jeon
Sponsors: Oak Ridge National Lab, Silicon arts

Hardware Support for Memory Safety

Memory safety violations, caused by illegal use of pointers in unsafe programming languages such as C and C++, have been a major threat to modern computer systems. While numerous software solutions have been proposed, their high performance overhead limits their usage to testing and debugging. In this project, we study the challenges of existing memory safety solutions and develop a low-overhead yet secure runtime solution built on novel micro-architectural support.

Students: Yonghae Kim, Jaewon Lee, Anurag Kar, Seonjin Na

GPU Memory Safety

Graphics processing units (GPUs) have become essential general-purpose computing platforms for accelerating a wide range of workloads. However, recent memory corruption attacks, such as the Mind Control Attack, have exposed security vulnerabilities in GPUs.

In the previous project, GPUShield, we developed a hardware-software cooperative, region-based bounds-checking mechanism that improves GPU memory safety for global, local, and heap memory buffers. However, advances in GPU frameworks call for additional memory safety features.

In this project, we explore a comprehensive GPU memory security mechanism that supports temporal memory safety and fine-grained intra-object memory safety.

Students and Sponsors: Jaewon Lee, Yonghae Kim, Seonjin Na, Smaran Mishra, Ammar Ratnani, Hyesoon Kim

Enabling Modern Deep Learning Applications Through Compression and Simulation

Modern deep learning applications are experiencing a data explosion that far surpasses existing memory availability. In addition, the size and complexity of these systems make them demanding to study.

In this project, we study how to reduce the memory footprint of modern deep learning applications and the related challenges that exist when carrying out such reductions, and how to reduce the time to study such systems through simulation.

Our current research involves studying how to better simulate modern, time-consuming machine learning applications.

Students: Andrei Bersatti

Autonomous Drone Architecture and Ecosystem

Drone Build Guide 

Over the last decade, significant progress has been made in developing autonomous drones, with countless applications such as aerial mapping, natural disaster recovery, search and rescue, ecology, and entertainment. As a result, many control, planning, and perception methods have been adapted for drones. Nevertheless, drones must operate under quite different conditions than any other compute-based agent. First, weight and power are restrictive parameters in drones. Second, drones must arbitrate between their limited compute, energy, and electromechanical resources based not only on the current tasks and local conditions (e.g., wind) but also on the flight plan. Despite huge technological advances, these problems have been approached in isolation, and the end-to-end system design-space tradeoffs are largely unknown. In this project, we quantify and analyze different drone parameters and their subsystems. Further, we aim to use our data to create a better open-source drone simulation and modelling platform that can benefit the wider research community.

Graduate Student & Mentor: Sam Jijina
Undergraduate Students: Varun Pateel (Spring 2023 PURA winner) and Sri Ranganathan Palaniappan
Sponsor: NSF

Accelerate Database Workloads using GPUs

EVA Multimedia Database System

Database systems often hold large volumes of data for engineers to process. The data can be both structured (e.g., Excel files) and unstructured (e.g., videos). GPUs can accelerate the processing of such data with their powerful compute resources and high memory bandwidth. In this project, we aim to:

  • Improve state-of-the-art database execution algorithms on GPUs (e.g., hash join) for structured data.
  • Build a database system with a SQL interface that lets users quickly query and understand videos and other unstructured data. Additionally, profile performance bottlenecks in the system and improve query execution performance. We investigate different GPU optimization techniques as well as system design choices suited to GPUs.
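
For readers unfamiliar with the hash-join example mentioned above, the following is a minimal CPU-side sketch in Python of the classic build-and-probe hash join. It is not the project's GPU implementation; the function name `hash_join` and the sample `users` and `orders` tables are hypothetical and serve only to illustrate the two phases that GPU versions typically parallelize.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Inner equi-join of two lists of dicts (hypothetical helper)."""
    # Build phase: index the (ideally smaller) table by its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: look up each probe row's key and emit merged matches.
    result = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            result.append({**match, **row})
    return result


# Hypothetical sample data, for illustration only.
users = [{"uid": 1, "name": "a"}, {"uid": 2, "name": "b"}]
orders = [
    {"uid": 1, "item": "x"},
    {"uid": 1, "item": "y"},
    {"uid": 3, "item": "z"},
]
joined = hash_join(users, orders, "uid", "uid")
```

In this sketch, only the two orders with `uid` 1 find a match, so `joined` contains two rows. Each probe lookup is independent of the others, which is one reason the probe phase is a natural candidate for data-parallel execution.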

Students: Jiashen Cao
Advisors: Hyesoon Kim and Joy Arulraj
Sponsors: Cisco, ETRI, Adobe, and NSF

Source Video
Query Result

CuPBoP: CUDA for Parallelized and Broad-range Processors

CuPBoP framework

CuPBoP is an open-source framework that supports executing CUDA on non-NVIDIA devices. Currently, we are working on supporting CUDA on CPUs, Vortex GPUs, AMD GPUs, and Intel GPUs. By supporting CUDA on non-NVIDIA devices, we can support Single-Kernel-Multiple-Device execution on heterogeneous systems.

Students: Roubing Han, Jun Chen, Bhanu Garg, Mark Ahn, Xuele Zhou, John Lu, Haotian Sheng, Blaise Tine

Accelerate Image Super-Resolution

State-of-the-art image super-resolution approaches typically require inference on deep neural networks, which is very compute-intensive. Traditional lightweight image processing algorithms (e.g., bicubic interpolation) can still provide a higher-resolution image with very low computation overhead. In this project, we are exploring how to use different algorithms adaptively to achieve a good-quality high-resolution image.

Students: Jiashen Cao, Abhilash Dharmavarapu
Sponsors: Adobe

Scheduling for In-Storage Computing

In an era of data explosion, more applications are becoming data-heavy, which challenges the data transfer capability between compute and storage. One way to mitigate this data transfer bottleneck is to place acceleration units right next to storage, so that data does not have to go through the shared PCIe bus.

In this project, we explore generalized near-storage and in-storage accelerators, including FPGAs and GPUs. We leverage Samsung’s SmartSSD device and study the scheduling strategy between the CPU host and the FPGA. We are also exploring a future architecture where a GPU accelerator is embedded in the SSD controller. Our research addresses scheduling challenges to maximize SSD data bandwidth from the GPU standpoint.

Students: Xueyang Liu, Jiashen Cao, Seonjin Na, Ayush Saran, Thaneesh Babu Krishnasamy, Kinshuk Phalke, Euijun Chung, Ayush Gundawar, Sachitt Arora
Sponsors: Samsung

Sparse and Irregular Algorithm Acceleration For Programmable Architectures

Sparse matrix algorithms are very important in many domains, including machine learning, graph processing, and physical simulations. Accelerating sparse matrix algorithms is difficult for a number of reasons, not the least of which is that the optimal algorithm and matrix representation format depend on the input matrices themselves. The wide diversity in algorithms and representations requires very flexible architectures and suggests that programmability is an important design requirement for a successful accelerator. Current programmable architectures, such as GPUs, often perform quite poorly on sparse workloads. This project investigates hardware architectures and extensions, as well as software techniques, to accelerate these algorithms.
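
To make the point about representation formats concrete, here is a minimal Python sketch of one common format, compressed sparse row (CSR), together with a sparse matrix-vector product. CSR is only one of many formats and is chosen here as an illustrative assumption; the helper names `dense_to_csr` and `csr_spmv` and the sample matrix are hypothetical and are not taken from the project.

```python
def dense_to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)   # nonzero value
                col_idx.append(j)  # its column index
        row_ptr.append(len(values))  # end offset of this row's nonzeros
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        # Row lengths vary, and x is read through col_idx: both the loop
        # bounds and the memory accesses depend on the input data.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Hypothetical 3x3 example matrix, for illustration only.
A = [[4, 0, 0],
     [0, 0, 2],
     [1, 0, 3]]
values, col_idx, row_ptr = dense_to_csr(A)
y = csr_spmv(values, col_idx, row_ptr, [1, 2, 3])
```

The inner loop's trip count and its indirect reads of `x` both depend on the sparsity pattern, which gives a sense of why the best format and algorithm can vary from one input matrix to another.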

Students: Eric Lorimer

Processing-in-Memory Acceleration for GNNs

Work in Progress

Students: Micheal Goldstein

For more past research projects, please see here