Publications – HPArch

2025

Inside VOLT: Designing an Open-Source GPU Compiler [paper]
Shinnung Jeong, Chihyo Ahn, Huanzhi Pu, Jisheng Zhao, Hyesoon Kim, Blaise Tine
arXiv preprint arXiv:2511.13751 (2025)

Deploying Vortex FPGA development environment with Apptainer
Udit Subramanya, Rahul Raj, Jeff Young, and Hyesoon Kim
Open-Source Computer Architecture Research Workshop (OSCAR 2025)

Multiway Merge Partitioning for Sparse-Sparse Matrix Multiplication on GPUs
Eric Lorimer, Ruobing Han, Sung Ha Kang, Hyesoon Kim
The International Conference on Parallel Architectures and Compilation Techniques (PACT 2025)

Swift and Trustworthy Large-Scale GPU Simulation with Fine-Grained Error Modeling and Hierarchical Clustering [paper]
Euijun Chung, Seonjin Na, Sung Ha Kang, Hyesoon Kim
IEEE/ACM International Symposium on Microarchitecture (MICRO 2025)

Contention-Aware GPU Thread Block Scheduler for Efficient GPU-SSD [paper]
Xueyang Liu, Seonjin Na, Euijun Chung, Jiashen Cao, Jing Yang, Hyesoon Kim
IEEE Computer Architecture Letters (CAL 2025)

RV-CURE: A RISC-V Capability Architecture for Full Memory Safety [paper]
Yonghae Kim, Anurag Kar, Jaewon Lee, Jaekyu Lee, Hyesoon Kim
IEEE Transactions on Computers (TC 2025)

Buffer Management for Out-of-GPU LLM Execution [paper]
Jiashen Cao, Joy Arulraj, Hyesoon Kim
DEEM of ACM Special Interest Group on Management of Data (SIGMOD DEEM Workshop 2025)

Aero: Adaptive Query Processing of ML Queries [paper]
Gaurav T Kakkar*, Jiashen Cao*, Aubhro Sengupta, Joy Arulraj, Hyesoon Kim
International Conference on Very Large Data Bases (VLDB 2025)

FlexInfer: Flexible LLM Inference with CPU Computation [paper]
Seonjin Na, Geonhwa Jeong, Byunghoon Ahn, Aaron Jezghani, Jeffrey Young, Christopher J. Hughes, Tushar Krishna, Hyesoon Kim
Annual Conference on Machine Learning and Systems (MLSys 2025)

Hardware vs. Software Implementation of Warp-Level Features in Vortex RISC-V GPU [paper]
Huanzhi Pu, Rishabh Ravi, Shinnung Jeong, Udit Subramanya, Euijun Chung, Jisheng Zhao, Chihyo Ahn, Hyesoon Kim
OSSMPIC of Design, Automation and Test in Europe Conference (DATE OSSMPIC Workshop 2025)

SoftCUDA: Running CUDA on Softcore GPU [paper]
Chihyo Ahn, Ruobing Han, Udit Subramanya, Jisheng Zhao, Hyesoon Kim
IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM 2025)

Let-Me-In: (Still) Employing In-pointer Bounds metadata for Fine-grained GPU Memory Safety [paper]
Jaewon Lee, Euijun Chung, Saurabh Singh, Seonjin Na, Yonghae Kim, Jaekyu Lee, Hyesoon Kim
IEEE International Symposium on High-Performance Computer Architecture (HPCA 2025)

SparseWeaver: Converting Sparse Operations as Dense Operations on GPUs for Graph Workloads [paper]
Shinnung Jeong, Liam Paul Cooper, Ju Min Lee, Heelim Choi, Nicholas Parnenzini, Chihyo Ahn, Yongwoo Lee, Hanjun Kim, Hyesoon Kim
IEEE International Symposium on High-Performance Computer Architecture (HPCA 2025)

2024

Unleashing CPU Potential for Executing GPU Programs through Compiler/Runtime Optimizations [paper]
Ruobing Han, Jisheng Zhao, Hyesoon Kim
IEEE/ACM International Symposium on Microarchitecture (MICRO 2024)

Understanding Performance Implications of LLM Inference on CPU [paper]
Seonjin Na, Geonhwa Jeong, Byunghoon Ahn, Jeffrey Young, Tushar Krishna, and Hyesoon Kim
IEEE International Symposium on Workload Characterization (IISWC 2024)

Allegro: GPU Simulation Acceleration for Machine Learning Workloads
Euijun Chung, Seonjin Na, Hyesoon Kim
MLArchSys Workshop of IEEE/ACM International Symposium on Computer Architecture (ISCA MLArchSys Workshop 2024)

Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs [paper]
Yuan Feng, Seonjin Na, Hyesoon Kim, Hyeran Jeon
IEEE/ACM International Symposium on Computer Architecture (ISCA 2024)

Quantifying CO2 Emission Reduction through Spatial Partitioning in Deep Learning Recommendation System Workloads [paper]
Andrei Bersatti, Euna Kim, Hyesoon Kim
IEEE Micro Journal 2024

Comparative Analysis of Executing GPU Applications on FPGA: HLS vs. Soft GPU Approaches [paper]
Chihyo Ahn, Shinnung Jeong, Liam Cooper, Nicholas Parnenzini, Hyesoon Kim
CGRA4HPC Workshop of International Parallel and Distributed Processing Symposium (IPDPS CGRA4HPC Workshop 2024)

Exponentially Expanding the Phase-Ordering Search Space via Dormant Information [paper]
Ruobing Han, Hyesoon Kim
ACM SIGPLAN International Conference on Compiler Construction (CC 2024)

GPU Database Systems Characterization and Optimization [paper]
Jiashen Cao, Rathijit Sen, Matteo Interlandi, Joy Arulraj, Hyesoon Kim
International Conference on Very Large Data Bases (VLDB 2024)
[Best paper candidate]

Enabling Fine-Grained Incremental Builds By Making Compiler Stateful
Ruobing Han, Jisheng Zhao, Hyesoon Kim
IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2024)

2023

EHT-SR: An Entropy-Based Hybrid Approach for Faster Super-Resolution
Abhilash Dharmavarapu, Stefano Petrangeli, Jiashen Cao, Hyesoon Kim
IEEE International Symposium on Multimedia (ISM 2023)

CuPBoP-AMD: Extending CUDA to AMD Platforms [paper]
Jun Chen, Xule Zhou, Hyesoon Kim
Workshops of The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC 2023 Workshop)

Accelerator integration in a tile-based SoC: lessons learned with a hardware floating point compression engine [paper]
Xueyang Liu, Patricia Gonzalez-Guerrero, Ivy B. Peng, Ronald Minnich, Maya Gokhale
Workshops of The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC 2023 Workshop)

Hardware-assisted Code-Pointer Tagging for Forward-Edge Control-Flow Integrity [paper]
Yonghae Kim, Anurag Kar, Jaewon Lee, Jaekyu Lee, Hyesoon Kim
IEEE Computer Architecture Letters (CAL 2023)

LCP: A Low-Communication Parallelization Method for Fast Neural Network Inference for IoT [paper]
Ramyad Hadidi, Jiashen Cao, Michael S. Ryoo, and Hyesoon Kim
Computer Science, Computer Engineering, & Applied Computing (CSCE 2023)

Spica: Exploring FPGA Optimizations to Enable an Efficient SpMV Implementation for Computations at Edge
Dheeraj Ramchandani, Bahar Asgari, Hyesoon Kim
IEEE International Conference on Edge Computing (EDGE 2023)

Reducing Inference Latency with Concurrent Architectures for Image Recognition [paper][slides]
Ramyad Hadidi, Jiashen Cao, Michael S. Ryoo, and Hyesoon Kim
IEEE International Conference on Edge Computing (EDGE 2023)

Creating Robust Deep Neural Networks With Coded Distributed Computing for IoT [paper][slides]
Ramyad Hadidi, Jiashen Cao, Bahar Asgari, and Hyesoon Kim
IEEE International Conference on Edge Computing (EDGE 2023)

Context-Aware Task Handling in Resource-Constrained Robots with Virtualization [paper][slides]
Ramyad Hadidi, Nima Shoghi, Bahar Asgari, and Hyesoon Kim
IEEE International Conference on Edge Computing (EDGE 2023)

Mitigating Timing-Based NoC Side-Channel Attacks With LLC Remapping [paper]
Anurag Kar, Xueyang Liu, Yonghae Kim, Gururaj Saileshwar, Hyesoon Kim, Tushar Krishna
IEEE Computer Architecture Letters (CAL 2023)

Traversing Large Compressed Graphs on GPUs [paper][github]
Prasun Gera, Hyesoon Kim
IEEE International Parallel & Distributed Processing Symposium (IPDPS 2023)

Skybox: Open-source Graphic Rendering on Programmable RISC-V GPUs [paper]
Blaise Tine, Varun Saxena, Santosh Raghav Srivatsan, Joshua R. Simpson, Fadi Alzammar, Liam Cooper, Hyesoon Kim
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2023)

CuPBoP: A framework to make CUDA portable [paper]
Ruobing Han, Jun Chen, Bhanu Garg, Jeffrey Young, Jaewoong Sim, Hyesoon Kim
Principles and Practice of Parallel Programming (PPoPP poster 2023)

VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs [paper]
Geonhwa Jeong, Sana Damani, Abhimanyu Bambhaniya, Eric Qin, Christopher J. Hughes, Sreenivas Subramoney, Hyesoon Kim and Tushar Krishna
IEEE International Symposium on High-Performance Computer Architecture (HPCA 2023)

2022

COX: Exposing CUDA Warp-Level Functions to CPUs [paper]
Ruobing Han, Jaewon Lee, Jaewoong Sim, Hyesoon Kim
ACM Transactions on Architecture and Code Optimization (TACO 2022).

Accelerating Graphic Rendering on Programmable RISC-V GPUs [paper]
Blaise Tine, Varun Saxena, Santosh Srivatsan, Joshua R. Simpson, Fadi Alzammar, Liam Paul Cooper, Sam Jijina, Swetha Rajagoplan, Tejaswini Anand Kumar, Jeff Young, Hyesoon Kim
Hot Chips (2022).

MAIA: Matrix Inversion Acceleration Near Memory [paper]
Bahar Asgari, Dheeraj Ramchandani, Amaan Marfatia, and Hyesoon Kim
International Conference on Field-Programmable Logic and Applications (FPL 2022).

Securing GPU via Region-based Bounds Checking [paper]
Jaewon Lee, Yonghae Kim, Jiashen Cao, Euna Kim, Jaekyu Lee, Hyesoon Kim
IEEE/ACM International Symposium on Computer Architecture (ISCA 2022).
[Best paper nominee]

The Tip of Iceberg in Open-Source Hardware GPU [slides]
Blaise Tine, Ruobing Han and Hyesoon Kim.
Open-Source Computer Architecture Research (OSCAR 2022).

Implementing Hardware Extensions for Multicore RISC-V GPUs [paper]
Blaise Tine and Hyesoon Kim.
Workshop on Computer Architecture Research with RISC-V (CARRV 2022).

AOS-RISC-V: Towards Always-On Heap Memory Safety [paper]
Yonghae Kim, Anurag Kar, Siddant Singh, Ammar A. Ratnani, Jaekyu Lee, Hyesoon Kim
Workshop on Computer Architecture Research with RISC-V (CARRV 2022).

DynaaDCP: Dynamic Navigation of Autonomous Agents for Distributed Capture Processing [paper]
Sam Jijina, Ramyad Hadidi, Jun Chen, Zhen Jiang, Ashutosh Dhekne, Hyesoon Kim
International Workshop on Domain Specific System Architecture (DOSSA-4).

FiGO: Fine-Grained Query Optimization in Video Analytics [paper]
Jiashen Cao, Karan Sarkar, Ramyad Hadidi, Joy Arulraj, Hyesoon Kim
ACM Special Interest Group on Management of Data (SIGMOD 2022).

2021

COX: CUDA on X86 by Exposing Warp-Level Functions to CPUs [paper]
Ruobing Han, Jaewon Lee, Jaewoong Sim, Hyesoon Kim
arXiv preprint arXiv:2112.10034 (2021).

Vortex: Extending the RISC-V ISA for GPGPU and 3D-Graphics Research [paper][github]
Blaise Tine, Krishna Praveen Yalamarthy, Fares Elsabbagh, Hyesoon Kim
IEEE/ACM International Symposium on Microarchitecture (MICRO) (2021).

Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads [paper]
Bahar Asgari, Ramyad Hadidi, Joshua Dierberger, Charlotte Steinichen, Amaan Marfatia, Hyesoon Kim
IEEE International Symposium on Workload Characterization (IISWC) (2021).

RASA: Efficient Register-Aware Systolic Array Matrix Engine for CPU [paper]
Geonhwa Jeong, Eric Qin, Ananda Samajdar, Christopher Hughes, Sreenivas Subramoney, Hyesoon Kim and Tushar Krishna
Design Automation Conference (DAC) (2021).

Single-Source Hardware-Software Codesign
Blaise Tine, Hyesoon Kim, and Sudhakar Yalamanchili
Workshop on Languages, Tools, and Techniques for Accelerator Design (LATTE) (2021).

SmaQ: Smart Quantization for DNN Training by Exploiting Value Clustering [paper]
Nima Shoghi, Andrei Bersatti, Moinuddin Qureshi, and Hyesoon Kim
IEEE Computer Architecture Letters (CAL) (2021)

A Scalable Multicore RISC-V GPGPU Accelerator for High-End FPGAs
Blaise Tine, Fares Elsabbagh, Apurve Chawda, Will Gulian, Yaotian Feng, Da Eun Shim, Priyadarshini Roshan, Ethan Lyons, Lingjun Zhu, Sung Kyu Lim, Seyong Lee, Jeff Vetter, Hyesoon Kim
Design Automation Conference DESIGNER, IP AND EMBEDDED TRACK (DAC-DIET) (2021).

Bringing OpenCL to Commodity RISC-V CPUs
Blaise Tine, Seyong Lee, Jeff Vetter, Hyesoon Kim
Fifth Workshop on Computer Architecture Research with RISC-V (2021).

Supporting CUDA for an extended RISC-V GPU architecture
Ruobing Han, Blaise Tine, Jaewon Lee, Jaewoong Sim, Hyesoon Kim
Fifth Workshop on Computer Architecture Research with RISC-V (2021).

Cryptography Acceleration in a RISC-V GPGPU
Austin Adams, Pulkit Gupta, Blaise Tine, Hyesoon Kim
Fifth Workshop on Computer Architecture Research with RISC-V (2021).

Hardware Support to Improve Fuzzing Performance and Precision [paper]
Ren Ding*, Yonghae Kim*, Fan Sang, Wen Xu, Gururaj Saileshwar and Taesoo Kim (*co-first authors)
ACM Conference on Computer and Communications Security (CCS), Seoul, South Korea (2021).

FAFNIR: Accelerating Sparse Gathering by Using Efficient Near-Memory Intelligent Reduction [paper]
Bahar Asgari, Ramyad Hadidi, Jiashen Cao, Da Eun Shim, Sung-Kyu Lim, Hyesoon Kim
International Symposium on High-Performance Computer Architecture (HPCA), Seoul, South Korea (2021)

Quantifying the Design-Space Tradeoffs in Autonomous Drones [paper]
Ramyad Hadidi, Bahar Asgari, Sam Jijina, Adriana Amyette, Nima Shoghi, Hyesoon Kim
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Detroit, MI (2021)

THIA: Accelerating Video Analytics using Early Inference and Fine-Grained Query Planning [paper]
Jiashen Cao, Ramyad Hadidi, Joy Arulraj, Hyesoon Kim
arXiv preprint arXiv:2102.08481 (2021)

Efficiently Solving Partial Differential Equations in a Partially Reconfigurable Specialized Hardware [paper]
Bahar Asgari, Ramyad Hadidi, Tushar Krishna, Hyesoon Kim, Sudhakar Yalamanchili
IEEE Transactions on Computers (2021)

2020

Things to Consider to Enable Dynamic Graphs in Processing-in-Memory
Euna Kim and Hyesoon Kim
International Symposium on Memory Systems (MEMSYS), Washington, DC (2020)

Parallel Hash Table Design for NDP Systems
Pranith Kumar and Hyesoon Kim
International Symposium on Memory Systems (MEMSYS), Washington, DC (2020)

Neural Network Weight Compression with NNW-BDI
Andrei Bersatti, Nima Shoghi, and Hyesoon Kim
International Symposium on Memory Systems (MEMSYS), Washington, DC (2020)

Reducing Inference Latency with Concurrent Architectures for Image Recognition
Ramyad Hadidi, Jiashen Cao, Michael S. Ryoo, Hyesoon Kim
arXiv preprint arXiv:2011.07092 (2020)

LCP: A Low-Communication Parallelization Method for Fast Neural Network Inference in Image Recognition
Ramyad Hadidi, Bahar Asgari, Jiashen Cao, Younmin Bae, Da Eun Shim, Hyojong Kim, Sung-Kyu Lim, Michael S. Ryoo, Hyesoon Kim
arXiv preprint arXiv:2003.06464 (2020)

Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads
Bahar Asgari, Ramyad Hadidi, Joshua Dierberger, Charlotte Steinichen, Hyesoon Kim
arXiv preprint arXiv:2011.10932 (2020)

Secure Location-Aware Authentication and Communication for Intelligent Transportation Systems
Nima Shoghi Ghalehshahi, Ramyad Hadidi, Jaewon Lee, Jun Chen, Arthur Siqueira, Rahul Rajan, Shaan Dhawan, Pooya Shoghi Ghalehshahi, Hyesoon Kim
arXiv preprint arXiv:2011.07092 (2020)

RISC-V FPGA Platform toward ROS-based Robotics Application [Slides]
Jaewon Lee, Hanning Chen, Hyesoon Kim
30th International Conference on Field-Programmable Logic and Applications

MEISSA: Multiplying Matrices Efficiently in a Scalable Systolic Architecture
Bahar Asgari, Ramyad Hadidi, Hyesoon Kim
IEEE International Conference on Computer Design (ICCD), Hartford, Connecticut (2020)

Hardware-based Always-On Heap Memory Safety [Slides]
Yonghae Kim, Jaekyu Lee, Hyesoon Kim
IEEE/ACM International Symposium on Microarchitecture (MICRO), Athens, Greece (2020)

Traversing Large Graphs on GPUs with Unified Memory [Talk Video]
Prasun Gera, Hyojong Kim, Piyush Sao, Hyesoon Kim, David Bader
Proceedings of the VLDB Endowment, Vol. 13, No. 7. VLDB 2020 Tokyo, Japan

Proposing a Fast and Scalable Systolic Array to Implement Matrix Multiplications on FPGA [Slides]
Bahar Asgari, Ramyad Hadidi, Hyesoon Kim
Symposium on Field-Programmable Custom Computing Machines (FCCM), Fayetteville, AR (2020)

Understanding the Software and Hardware Stacks of a General-Purpose Cognitive Drone [Poster]
Sam Jijina, Adriana Amyette, Nima Shoghi, Ramyad Hadidi, Hyesoon Kim
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Boston, MA (2020)

PISCES: Power-Aware Implementation of SLAM by Customizing Efficient Sparse Algebra
Bahar Asgari, Ramyad Hadidi, Nima Shoghi, Hyesoon Kim
Design Automation Conference (DAC), San Francisco, CA (2020)

Towards a General Purpose Cognitive Drone [Slides]
Sam Jijina, Adriana Amyette, Ramyad Hadidi, Hyesoon Kim
The Fourth Workshop on Cognitive Architectures (CogArch 2020), co-located with HPCA 2020, San Diego, CA (2020)

Batch-Aware Unified Memory Management in GPUs for Irregular Workloads [Talk Video]
Hyojong Kim, Jaewoong Sim, Prasun Gera, Ramyad Hadidi, Hyesoon Kim
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Lausanne, Switzerland (2020)

ALRESCHA: A Lightweight Reconfigurable Sparse-Computation Accelerator
Bahar Asgari, Ramyad Hadidi, Tushar Krishna, Hyesoon Kim, Sudhakar Yalamanchili
International Symposium on High-Performance Computer Architecture (HPCA), San Diego, CA (2020)

Tango: An Optimizing Compiler for Just-in-time RTL Simulation
Blaise Tine, Hyesoon Kim, Sudhakar Yalamanchili
Design, Automation, and Test in Europe (DATE), Grenoble, France (2020)

ASCELLA: Accelerating Sparse Computation by Enabling Stream Accesses to Memory [Talk Video]
Bahar Asgari, Ramyad Hadidi, Hyesoon Kim
Design, Automation, and Test in Europe (DATE), Grenoble, France (2020)

Productive Hardware Designs using Hybrid HLS-RTL Development
Blaise Tine, Seyong Lee, Jeff Vetter, Hyesoon Kim
International Symposium on Field-Programmable Gate Arrays (FPGA) poster, Seaside, CA (2020)

Cash: A Single-Source Hardware-Software Codesign Framework for Rapid Prototyping
Blaise Tine, Fares Elsabbagh, Jeff Vetter, Hyesoon Kim
International Symposium on Field-Programmable Gate Arrays (FPGA) poster, Seaside, CA (2020)

2019

Impact of Instruction Set Architecture on Machine Learning Workloads
Jeung Moon Lee, Hyesoon Kim, Hyojong Kim and Pranith Kumar
ACM PACT Student Research Competition (SRC), Seattle, Washington, USA (2019)

Characterizing the Deployment of Deep Neural Networks on Commercial Edge Devices [Slides] [EdgeBench]
[Best Paper Nominee]
Ramyad Hadidi, Jiashen Cao, Yilun Xie, Bahar Asgari, Tushar Krishna, Hyesoon Kim
IEEE International Symposium on Workload Characterization (IISWC), Orlando, FL (2019)

ERIDANUS: Efficiently Running Inference of DNNs Using Systolic Arrays
Bahar Asgari, Ramyad Hadidi, Hyesoon Kim, Sudhakar Yalamanchili
IEEE Micro, Special Issue on Machine Learning Acceleration (2019)

SLAM Performance on Embedded Robots
Nima Shoghi, Ramyad Hadidi, Hyesoon Kim
Student Research Competition at Embedded System Week (SRC ESWEEK), New York, NY (2019)

Enabling Speech to Text on Embedded Systems
Mohan Dodda, Taejoon Park, Sayuj Shajith, Ramyad Hadidi, Hyesoon Kim
Student Research Competition at Embedded System Week (SRC ESWEEK), New York, NY (2019)

Video Analytics From Edge To Server [Slides]
Jiashen Cao, Ramyad Hadidi, Joy Arulraj and Hyesoon Kim
International Conference on Hardware/Software Codesign and System Synthesis CODES+ISSS (ESWEEK), New York, NY (2019)

Capella: Customizing Perception for Edge Devices by Efficiently Allocating FPGAs to DNNs [Demo Site]
Younmin Bae, Ramyad Hadidi, Bahar Asgari, Jiashen Cao, Hyesoon Kim
International Conference on Field-Programmable Logic and Applications (FPL), Demo, Barcelona, Spain (2019)

Characterizing the Execution of Deep Neural Networks on Collaborative Robots and Edge Devices [Slides]
Matthew Merck, Bingyao Wang, Lixing Liu, Chunjun Jia, Arthur Siqueira, Qiusen Huang, Abhijeet Saraha, Dongsuk Lim, Jiashen Cao, Ramyad Hadidi, Hyesoon Kim
ACM Practice and Experience in Advanced Research Computing (PEARC), Chicago, IL (2019)

Vortex RISC-V GPGPU system: Extending the ISA, Synthesizing the Microarchitecture, and Modeling the Software Stack
Fares Elsabbagh, Bahar Asgari, Hyesoon Kim and Sudhakar Yalamanchili
Third Workshop on Computer Architecture Research with RISC-V (CARRV), Co-located with ISCA’19, Phoenix, AZ (2019)

Understanding the Power Consumption of Executing Deep Neural Networks on a Distributed Robot System [Slides]
Ramyad Hadidi, Jiashen Cao, Matthew Merck, Arthur Siqueira, Qiusen Huang, Abhijeet Saraha, Chunjun Jia, Bingyao Wang, Dongsuk Lim, Lixing Liu and Hyesoon Kim
Algorithms and Architectures for Learning in-the-Loop Systems in Autonomous Flight Workshop,
Co-located with IEEE International Conference on Robotics and Automation (ICRA), Montreal, QC (2019)

A Case Study: Exploiting Neural Machine Translation to Translate CUDA to OpenCL
Yonghae Kim, Hyesoon Kim
2nd International Workshop on AI-assisted Design for Architecture,
Co-located with International Symposium on Computer Architecture (ISCA), Phoenix, AZ, June 22 (2019)

A Case Study: Exploiting Neural Machine Translation to Translate CUDA to OpenCL
Yonghae Kim, Hyesoon Kim
arXiv preprint arXiv:1905.07653 (2019)

Translating CUDA to OpenCL for Hardware Generation using Neural Machine Translation
Yonghae Kim, Hyesoon Kim
The ACM CGO Student Research Competition (SRC), Washington, D.C., USA (2019)

FlashGPU: Placing New Flash Next to GPU Cores
Jie Zhang, Miryeong Kwon, Myoungsoo Jung, Hyojong Kim, Hyesoon Kim
56th Design Automation Conference (DAC), June 2019

An Edge-Centric Scalable Intelligent Framework To Collaboratively Execute DNN [Demo] [Paper]
Jiashen Cao, Fei Wu, Ramyad Hadidi, Lixing Liu, Tushar Krishna, Michael S. Ryoo, Hyesoon Kim
Demo for SysML Conference, Palo Alto, CA (2019)

LODESTAR: Creating Locally-Dense CNNs for Efficient Inference on Systolic Arrays
Bahar Asgari, Ramyad Hadidi, Hyesoon Kim, and Sudhakar Yalamanchili
ACM/IEEE Design Automation Conference (DAC) – Late Breaking Results, Las Vegas, NV (2019)

Robustly Executing DNNs in IoT Systems Using Coded Distributed Computing [Slides]
Ramyad Hadidi, Jiashen Cao, Michael S. Ryoo, Hyesoon Kim
ACM/IEEE Design Automation Conference (DAC) – Late Breaking Results, Las Vegas, NV (2019)

Empirical Investigation of Stale Value Tolerance on Parallel RNN Learning
Joo Hwan Lee, Hyesoon Kim
The International Symposium on Performance Analysis of Systems and Software 2019 (ISPASS 2019), April 2019

Thermal-Aware Processing-in-memory Instruction Offloading
Lifeng Nai, Ramyad Hadidi, He Xiao, Hyojong Kim, Jaewoong Sim, Hyesoon Kim
Journal of Parallel and Distributed Computing (JPDC), Elsevier (2019)

Collaborative Execution of Deep Neural Networks on Internet of Things Devices
Ramyad Hadidi, Jiashen Cao, Michael S. Ryoo, Hyesoon Kim
arXiv preprint arXiv:1901.02537 (2019)

2018

Distributed Perception by Collaborative Robots [Slides]
Ramyad Hadidi, Jiashen Cao, Matthew Woodward, Michael S. Ryoo, Hyesoon Kim
Invited for IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’18), Madrid, Spain (2018)
and IEEE Robotics and Automation Letters (RA-L)

Real-Time Image Recognition Using Collaborative IoT Devices [Slides]
Ramyad Hadidi, Jiashen Cao, Matthew Woodward, Michael S. Ryoo, Hyesoon Kim
1st Reproducible Tournament on Pareto-efficient Image Classification (ACM ReQuEST workshop), co-located with ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Williamsburg, VA, USA (2018)

CODA: Enabling Co-location of Computation and Data for Near-Data Processing
Hyojong Kim, Ramyad Hadidi, Lifeng Nai, Hyesoon Kim, Nuwan Jayasena, Yasuko Eckert, Onur Kayiran, Gabriel H. Loh
ACM Transactions on Architecture and Code Optimization (TACO), Volume 15, Issue 3, October 2018

CoolPIM: Thermal-Aware Source Throttling for Efficient PIM Instruction Offloading [Slides]
Lifeng Nai, Ramyad Hadidi, He Xiao, Hyojong Kim, Jaewoong Sim, Hyesoon Kim
IEEE International Parallel & Distributed Processing Symposium (IPDPS), Vancouver, British Columbia, Canada, May 2018

Musical Chair: Efficient Real-Time Recognition Using Collaborative IoT Devices
Ramyad Hadidi, Jiashen Cao, Matthew Woodward, Michael S. Ryoo, Hyesoon Kim
arXiv preprint arXiv:1802.02138 (2018)

Performance Characterisation and Simulation of Intel’s Integrated GPU Architecture [Slides]
Prasun Gera, Hyojong Kim, Hyesoon Kim, Sunpyo Hong, Vinod George, Chi-Keung (CK) Luk
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Belfast, Northern Ireland, United Kingdom, Apr. 2018

Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube [Slides]
Ramyad Hadidi, Bahar Asgari, Jeffrey Young, Burhan Ahmad Mudassar, Kartikay Garg, Tushar Krishna, Hyesoon Kim
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Belfast, Northern Ireland, United Kingdom, Apr. 2018

2017

StaleLearn: Learning Acceleration with Asynchronous Synchronization between Model Replicas on PIM
Joo Hwan Lee and Hyesoon Kim
IEEE Transactions on Computers, 2017

CAIRO: A Compiler-Assisted Technique for Enabling Instruction-Level Offloading of Processing-In-Memory
Ramyad Hadidi, Lifeng Nai, Hyojong Kim, and Hyesoon Kim
ACM Transactions on Architecture and Code Optimization (TACO), Volume 14, Issue 4, December 2017

Demystifying the Characteristics of 3D-Stacked Memories: A Case Study for Hybrid Memory Cube [Slides]
Ramyad Hadidi, Bahar Asgari, Burhan Ahmad Mudassar, Saibal Mukhopadhyay, Sudhakar Yalamanchili, and Hyesoon Kim
IEEE International Symposium on Workload Characterization (IISWC), Seattle, WA, Oct. 2017

Lightweight SIMT Core Designs for Intelligent 3D Stacked DRAM
Chad D. Kersey, Sudhakar Yalamanchili, and Hyesoon Kim
The International Symposium on Memory Systems (MEMSYS’17), Oct. 2017

Inferring Fine-grained Control Flow Inside SGX Enclaves with Branch Shadowing
Sangho Lee, Ming-Wei Shih, Prasun Gera, Taesoo Kim, Hyesoon Kim, Marcus Peinado
USENIX Security Symposium, Aug. 2017

SimProf: A Sampling Framework for Data Analytic Workloads
Jen-Cheng Huang, Lifeng Nai, Pranith Kumar, Hyojong Kim, and Hyesoon Kim
International Parallel and Distributed Processing Symposium (IPDPS), Orlando, FL, May 2017

GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks [Slides] [Lightning]
Lifeng Nai, Ramyad Hadidi, Jaewoong Sim, Hyojong Kim, Pranith Kumar, and Hyesoon Kim
International Symposium on High Performance Computer Architecture (HPCA), Austin, TX, Feb. 2017

2016

Exploring Big Graph Computing – An Empirical Study from Architectural Perspective
Lifeng Nai, Yinglong Xia, Ilie G. Tanase, and Hyesoon Kim
Journal of Parallel and Distributed Computing, 2016

Analyzing Consistency Issues In HMC Atomics
Pranith Kumar, Lifeng Nai, and Hyesoon Kim
The International Symposium on Memory Systems (MEMSYS), Washington, DC, Oct. 2016

2015

GraphBIG: Understanding Graph Computing in the Context of Industrial Solutions
Lifeng Nai, Yinglong Xia, Ilie G. Tanase, Hyesoon Kim, and Ching-Yung Lin
The International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Nov. 2015

BSSync: Processing Near Memory for Machine Learning Workloads with Bounded Staleness Consistency Models
[Best Paper Award]
Joo Hwan Lee, Jaewoong Sim and Hyesoon Kim
International Conference on Parallel Architectures and Compilation Techniques (PACT) (2015)

Instruction Offloading with HMC 2.0 Standard – A Case Study for Graph Traversals
Lifeng Nai and Hyesoon Kim
The International Symposium on Memory Systems (MEMSYS), Oct. 2015

SIMT-based Logic Layers for Stacked DRAM Architectures: A Prototype
Chad D. Kersey, Sudhakar Yalamanchili, and Hyesoon Kim
The International Symposium on Memory Systems (MEMSYS), Oct. 2015

Understanding Energy Aspect of Processing Near Memory for HPC Workloads
Hyojong Kim, Hyesoon Kim, Sudhakar Yalamanchili, Arun F. Rodrigues
The International Symposium on Memory Systems (MEMSYS), Oct. 2015

Cymric: A Framework for Prototyping Near-Memory Architectures
C. Kersey, H. Kim, S. Yalamanchili
WARP 2015, 6th Workshop on Architectural Research Prototyping, Co-Located with the 42nd International Symposium on Computer Architecture, 2015 [talks]

SP-CNN: A Scalable and Programmable CNN-based Accelerator
Dilan Manatunga, Hyesoon Kim, Saibal Mukhopadhyay
IEEE Micro, 2015

SP-CNN: A Scalable and Programmable CNN-based Accelerator
Dilan Manatunga, Hyesoon Kim, Saibal Mukhopadhyay
GOMACTech, Mar. 2015

Block-Precise Processors: Low-Power Processors with Reduced Operand Store Accesses and Result Broadcasts
Nagesh B. Lakshminarayana and Hyesoon Kim
IEEE Transactions on Computers, 2015

GREEN Cache: Exploiting the Disciplined Memory Model of OpenCL on GPUs
Jaekyu Lee, Dong Hyuk Woo, Hyesoon Kim, and Mani Azimi
IEEE Transactions on Computers, 2015

Accelerating Application Start-up with Nonvolatile Memory in Android Systems
Hyojong Kim, Hongyeol Lim, Dilan Manatunga, Hyesoon Kim, Gi-Ho Park
IEEE Micro, Jan/Feb, 2015

2014

Transparent Hardware Management of Stacked DRAM as Part of Memory
Jaewoong Sim, Alaa R. Alameldeen, Zeshan Chishti, Chris Wilkerson, Hyesoon Kim
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Cambridge, UK, Dec. 2014 [talks]

GPUMech: GPU Performance Modeling Technique based on Interval Analysis
Jen-Cheng Huang, Joo Hwan Lee, Hyesoon Kim, Hsien-Hsin S. Lee
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Cambridge, UK, Dec. 2014 [talks]

Design space exploration of memory model for heterogeneous computing
Jieun Lim and Hyesoon Kim
2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), Oct. 2014 [talks]

OpenCL Performance Evaluation on Modern Multi Core CPUs
Joo Hwan Lee, Kaushik Patel, Nimit Nigania, Hyojong Kim, Hyesoon Kim
Scientific Programming, 2014

Power Modeling for GPU Architectures Using McPAT
Jieun Lim, Nagesh B. Lakshminarayana, Hyesoon Kim, William Song, Sudhakar Yalamanchili, and Wonyong Sung
ACM Trans. Des. Autom. Electron. Syst. 19, 3, Article 26 (June 2014)

Harmonica: An FPGA-Based Data Parallel Soft Core
Chad Kersey, Sudhakar Yalamanchili, Hyojong Kim, Nimit Nigania, and Hyesoon Kim
The 22nd International Symposium on Field-Programmable Custom Computing Machines (FCCM), May, 2014 (Poster)

A Configurable and Strong RAS Solution for Die-Stacked DRAM Caches
Jaewoong Sim, Gabriel H. Loh, Vilas Sridharan, Mike O’Connor
IEEE Micro, Special Issues: Micro’s Top Picks from 2013 Computer Architecture Conferences (TOP PICKS), May/June 2014

Hardware Support for Safe Execution of Native Client Applications
Dilan Manatunga, Joo Hwan Lee, and Hyesoon Kim
Computer Architecture Letters (CAL), vol.PP, no.99, pp.1,1 2014

Spare Register Aware Prefetching for Graph Algorithms on GPUs
Nagesh B Lakshminarayana and Hyesoon Kim
The 20th International Symposium on High Performance Computer Architecture (HPCA), Orlando, Feb 2014 [talks]

TBPoint: Reducing Simulation Time for Large Scale GPGPU Kernels
Jen-Cheng Huang, Lifeng Nai, Hyesoon Kim, Hsien-Hsin Lee
The 28th International Parallel & Distributed Processing Symposium (IPDPS), Phoenix, AZ, May 2014

2013

Design Space Exploration of On-chip Ring Interconnection for a CPU-GPU Heterogeneous Architecture
Jaekyu Lee, Si Li, Hyesoon Kim, and Sudhakar Yalamanchili
In Journal of Parallel and Distributed Computing (JPDC), Vol. 73, Issue 12, pp. 1525-1538, December 2013

Adaptive Virtual Channel Partitioning for Network-on-Chip in Heterogeneous Architectures
Jaekyu Lee, Si Li, Hyesoon Kim, and Sudhakar Yalamanchili
In ACM Transactions on Design Automation of Electronic Systems (TODAES), Vol. 18, No. 4, pp.48:1-48:28, October 2013

SESH framework: A Space Exploration Framework for GPU Application and Hardware Codesign
Joo Hwan Lee, Jiayuan Meng, Hyesoon Kim
4th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), held as part of SC13, Denver, Colorado, USA, November 2013

Resilient Die-stacked DRAM Caches
Jaewoong Sim, Gabriel H. Loh, Vilas Sridharan, Mike O’Connor
40th International Symposium on Computer Architecture (ISCA), Tel-Aviv, Israel, June 2013 [talk]

CHiP: A Profiler to Measure the Effect of Cache Contention on Scalability
Bevin Brett, Pranith Kumar, Minjang Kim, Hyesoon Kim
Workshop on Multithreaded Architectures and Applications in conjunction with IPDPS-27, Boston, USA, May 2013

OpenCL Performance Evaluation on Modern Multi Core CPUs
Joo Hwan Lee, Kaushik Patel, Nimit Nigania, Hyojong Kim, Hyesoon Kim
Multicore and GPU Programming Models, Languages and Compilers Workshop (PLC 2013), in conjunction with IPDPS-27, Boston, USA, May 2013

When Prefetching Works, When It Doesn’t, and Why
Jaekyu Lee, Hyesoon Kim, and Richard Vuduc
An invited paper (originally published in TACO), 8th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), Berlin, Germany, January 2013

2012

A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch
Jaewoong Sim, Gabriel Loh, Hyesoon Kim, Mike O’Connor, Mithuna Thottethodi
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Vancouver, BC, Canada, Dec. 2012 [talk]

SD3: An Efficient Dynamic Data-Dependence Profiling Mechanism
Minjang Kim, Nagesh B. Lakshminarayana, Hyesoon Kim, Chi-Keung Luk
IEEE Transactions on Computers (TC), July 2012.

FLEXclusion: Balancing Cache Capacity and On-chip Bandwidth with Flexible Exclusion
Jaewoong Sim, Jaekyu Lee, Moinuddin K. Qureshi, and Hyesoon Kim
Proceedings of the 39th IEEE International Symposium on Computer Architecture (ISCA), Portland, OR, June 2012 [talk]

Predicting Potential Speedup of Serial Code via Lightweight Profiling and Emulations with Memory Performance Model
Minjang Kim, Pranith Kumar, Hyesoon Kim, and Bevin Brett
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Shanghai, China, May 2012

When Prefetching Works, When It Doesn’t, and Why
Jaekyu Lee, Hyesoon Kim, and Richard Vuduc
ACM Transactions on Architecture and Code Optimization (TACO), Vol. 9, No. 1, pp.2:1-2:29, March 2012

A Performance Analysis Framework for Identifying Potential Benefits in GPGPU Applications
Jaewoong Sim, Aniruddha Dasgupta, Hyesoon Kim, and Richard Vuduc
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), New Orleans, LA, February 2012. [talk]

TAP: A TLP-Aware Cache Management Policy for a CPU-GPU Heterogeneous Architecture
Jaekyu Lee and Hyesoon Kim
Proceedings of the 18th International Symposium on High Performance Computer Architecture (HPCA), New Orleans, LA, February 2012. [talk]

2011

DRAM Scheduling Policy for a GPGPU Architecture Based on a Potential Function
Nagesh B. Lakshminarayana, Jaekyu Lee, Hyesoon Kim, and Jinwoo Shin
IEEE Computer Architecture Letters (CAL), Nov. 2011

2010

Many-Thread Aware Prefetching Mechanisms for GPGPU Applications
Jaekyu Lee, Nagesh B Lakshminarayana, Hyesoon Kim, Richard Vuduc
MICRO-43, Atlanta, GA, 2010. [talk]

SD3: A Scalable Approach to Data-Dependence Profiling
Minjang Kim, Hyesoon Kim, Chi-Keung Luk
MICRO-43, Atlanta, GA, 2010.

An Integrated GPU Power and Performance Model
Sunpyo Hong and Hyesoon Kim
ISCA-37, June 2010. [talk]

Prospector: A Dynamic Data-Dependence Profiler To Help Parallel Programming
Minjang Kim, Hyesoon Kim, Chi-Keung Luk
HotPar-2, June, 2010. [poster]

Effect of Instruction Fetch and Memory Scheduling on GPU Performance
Nagesh B. Lakshminarayana and Hyesoon Kim
Workshop on Language, Compiler, and Architecture Support for GPGPU, in conjunction with HPCA/PPoPP 2010, 2010. [talk]

2009

Qilin: Exploiting Parallelism on Heterogeneous Multiprocessors with Adaptive Mapping
Chi-Keung Luk, Sunpyo Hong, Hyesoon Kim
MICRO 2009, December, 2009.

Age Based Scheduling Policy for Asymmetric Multiprocessors
Nagesh B. Lakshminarayana, Jaekyu Lee, Hyesoon Kim
Super Computing, November, 2009. [talk]

An Analytical Model for a GPU Architecture with Memory-level and Thread-level Parallelism Awareness
Sunpyo Hong and Hyesoon Kim
Proceedings of the 36th International Symposium on Computer Architecture (ISCA-36), Austin, TX, June 2009. [talk]

Technical Reports

Joo Hwan Lee, Nimit Nigania, Hyesoon Kim, and Bevin Brett, “HPerf: A Lightweight Profiler for Task Distribution on CPU+GPU Platforms”, GT-CS-15-04, Georgia Institute of Technology, 2015.

Jaekyu Lee, Si Li, Hyesoon Kim, and Sudhakar Yalamanchili, “Design Space Exploration of On-chip Ring Interconnection for a CPU-GPU Architecture”, GIT-CERCS-12-05, Georgia Institute of Technology, 2012.

Chayong Lee, Euna Kim, and Hyesoon Kim, “The AM-Bench: An Android Multimedia Benchmark Suite”, GIT-CERCS-12-04, Georgia Institute of Technology, 2012.

Vishal Gupta, Hyesoon Kim, and Karsten Schwan, “Evaluating Scalability of Multi-threaded Applications on a Many-core Platform”, GIT-CERCS-12-03, Georgia Institute of Technology, 2012.

Minjang Kim, Chi-Keung Luk, Hyesoon Kim, “Prospector: Discovering Parallelism via Dynamic Data-Dependence Profiling”, TR-2009-003, Georgia Institute of Technology, 2009.

Sunpyo Hong, Hyesoon Kim, “An Analytical Model for a GPU Architecture with Memory-level and Thread-level Parallelism Awareness”, TR-2009-003, Georgia Institute of Technology, 2009.

Sunpyo Hong, Hyesoon Kim, “Parallelization of Mutual-Information Based Registration in the ITK Toolkit Using CUDA and TBB”, TR-2009-002, Georgia Institute of Technology, 2009.

Chi-Keung Luk, Sunpyo Hong, Hyesoon Kim, “Qilin: Exploiting Parallelism on Heterogeneous Multiprocessors with Adaptive Mapping”, TR-2009-001, Georgia Institute of Technology, January, 2009.