Projects

Computer Architecture

Tomasulo Algorithm Pipelined Processor (processor simulator design)

Instructor: Prof. Tom Conte, School of ECE
ECE 6100: Advanced Computer Architecture, Spring 2016

Designed an out-of-order processor simulator built around a 5-stage Tomasulo pipeline with Fetch, Dispatch, Scheduling, Execute and State Update stages.
Expanded the simulator to implement two exception-handling schemes: ROB with bypass and Checkpoint Repair.

GPU Kernels Implementation (CUDA C, HARP emulator)

Instructor: Prof. Sudhakar Yalamanchili, School of ECE
ECE 8823: GPU Architectures, Spring 2018

Implemented a CUDA kernel for 2D convolution to blur an RGB image, using both constant and shared memory.
Wrote CUDA kernels for the computation graph of a convolutional neural network that performs image recognition on a gray-scale image.
Implemented the PDOM stack algorithm for branch divergence in GPUs on the HARP emulator, covering divergent warps, nested divergent warps, and divergent loops, using SPLIT/JOIN in assembly language.
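
The 2D convolution blur can be illustrated with a minimal CPU-side sketch in Python. This is not the CUDA kernel from the project: the function names, the planar channel layout, the box kernel, and the zero-padding at the image border are all assumptions made for illustration. In a CUDA version, the kernel weights would typically sit in constant memory and each thread block would stage its input tile in shared memory.

```python
def box_kernel(size):
    """Return a size x size averaging kernel whose weights sum to 1."""
    weight = 1.0 / (size * size)
    return [[weight] * size for _ in range(size)]


def convolve_channel(channel, kernel):
    """Apply a square kernel to one channel (a list of rows).

    Pixels outside the image count as zero (an assumption). The kernel is
    not flipped, which gives the same result for symmetric blur kernels.
    """
    height, width = len(channel), len(channel[0])
    radius = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for ky, kernel_row in enumerate(kernel):
                for kx, weight in enumerate(kernel_row):
                    sy, sx = y + ky - radius, x + kx - radius
                    if 0 <= sy < height and 0 <= sx < width:
                        acc += channel[sy][sx] * weight
            out[y][x] = acc
    return out


def blur_rgb(red, green, blue, kernel):
    """Blur the three colour channels independently."""
    return tuple(convolve_channel(c, kernel) for c in (red, green, blue))
```

Each output pixel depends only on a small input neighbourhood, which is why the computation maps naturally onto one GPU thread per pixel.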

Approximation on On-Chip Networks for Mitigation of Memory and Bandwidth Walls (gem5 simulator, Garnet NoC)

Instructor: Tushar Krishna, School of ECE
ECE 8823: Interconnection Networks, Spring 2016

Developed a novel approach to the problem of limited off-chip bandwidth and long access latency in CMPs by implementing approximation on on-chip networks.
The approach is based on the Rollback-Free Value Prediction approximation technique, which targets safe-to-approximate loads that miss in the LLC.
Here, the approximator handles safe-to-approximate loads both at the network interface and in a router, using a drop rate parameter that decides the fate of each traversing flit: drop or continue.
Achieved up to 10% power efficiency and 50% latency improvement for an 8×8 mesh topology at a cost of 80% accuracy with a negligible area overhead.
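
The drop-or-continue decision can be sketched as follows. This is a hypothetical Python model, not the Garnet implementation: the `approximator` function, the dictionary-based flit representation, and the use of an independent random draw per flit are assumptions. How a dropped load is later satisfied with a predicted value is not modeled.

```python
import random


def approximator(flits, drop_rate, rng):
    """Split traversing flits into forwarded and dropped lists.

    Only flits marked safe-to-approximate may be dropped, each with
    probability drop_rate (assumed to be an independent draw per flit).
    """
    forwarded, dropped = [], []
    for flit in flits:
        if flit["approximable"] and rng.random() < drop_rate:
            dropped.append(flit)
        else:
            forwarded.append(flit)
    return forwarded, dropped


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    traffic = [{"id": i, "approximable": i % 2 == 0} for i in range(10)]
    kept, lost = approximator(traffic, 0.5, rng)
    print(len(kept), "forwarded,", len(lost), "dropped")
```

Raising the drop rate trades output accuracy for lower network load, which is the knob behind the latency and power numbers reported above.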

Relaxing PIM Synchronization Constraints (memory consistency, PIM, GAS model)

Instructor: David Devecsery, College of Computing
ECE 8803: Memory Models, Fall 2018

Graph processing is one of the most widely used workloads in high-performance computing, but it is severely limited by the memory wall. Traditional architectural improvements in parallelism do little for graph applications because their data accesses are very irregular. Architects have therefore proposed Processing in Memory (PIM), which uses the Hybrid Memory Cube (HMC) to move computation into or near memory.
This project reduces synchronization overheads for the popular Gather-Apply-Scatter graph processing paradigm by relaxing the memory consistency requirements with a novel technique based on “colored” barriers, which are determined by pre-processing a graph snapshot. Weaker memory consistency semantics let us extract performance while still providing an illusion of sequential consistency, provided certain semantic constraints are met.
Our approach shows a speedup of ∼6% for highly connected graphs and ∼30% for sparsely connected graphs over the baseline Gather-Apply-Scatter implementation, with minimal modifications to existing hardware.

Qubit Allocation Policies with Variation Awareness (quantum computing, IBM Qiskit)

Instructor: Prof. Moin Qureshi, School of ECE
ECE 8853: Introduction to Quantum Computing, Spring 2019

There is a technological gap between quantum software and hardware for NISQ computers, which makes qubit mapping a challenge: two logically coupled qubits need to be mapped optimally onto physical qubits to perform operations, and this has to be done with the minimum number of SWAP operations. To address this, a SWAP-based BidiREctional heuristic search algorithm named SABRE is used, which finds the initial qubit mapping that yields the fewest SWAPs.
However, SABRE does not take into account the imperfect links and qubit error rates that exist on real quantum computers (IBMQ16), so SABRE was modified to support variation-aware qubit allocation and movement. To implement this, a variation-aware scheduler was designed that, in addition to the SWAPs given by baseline SABRE, schedules the single-qubit instructions that use the same qubits as the CNOTs, along with any independent instructions that are not affected by the CNOTs and any resulting SWAPs. To maintain the logical qubit mapping, a remap table structure was designed that tracks which physical qubit each logical qubit is mapped to. The cost function that determines the best possible SWAP combination was modified to take into account the error rates of the links.
Across 7 quantum workloads, the Probability of Successful Trial (PST) increased by an average of 14%, while the number of SWAPs decreased by an average of 8%.
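
A remap table and an error-aware SWAP choice might look like the sketch below. It is illustrative only: the `RemapTable` class, the helper names, and the rule of picking the candidate link with the highest SWAP success probability are assumptions, and the actual modified SABRE cost function is not reproduced here. The one fact relied on is that a SWAP decomposes into three CNOTs on the same link.

```python
class RemapTable:
    """Tracks which physical qubit each logical qubit currently occupies."""

    def __init__(self, num_qubits):
        # Assumed starting point: logical qubit i sits on physical qubit i.
        self.logical_to_physical = list(range(num_qubits))
        self.physical_to_logical = list(range(num_qubits))

    def physical(self, logical):
        return self.logical_to_physical[logical]

    def apply_swap(self, phys_a, phys_b):
        """Record a SWAP between two physical qubits."""
        log_a = self.physical_to_logical[phys_a]
        log_b = self.physical_to_logical[phys_b]
        self.physical_to_logical[phys_a] = log_b
        self.physical_to_logical[phys_b] = log_a
        self.logical_to_physical[log_a] = phys_b
        self.logical_to_physical[log_b] = phys_a


def swap_success_probability(cnot_error):
    """A SWAP is three CNOTs, so all three must succeed on the link."""
    return (1.0 - cnot_error) ** 3


def most_reliable_swap(candidate_links, link_error):
    """Pick the candidate link whose SWAP is most likely to succeed."""
    return max(candidate_links,
               key=lambda link: swap_success_probability(link_error[link]))
```

A real cost function would weigh reliability against how much each SWAP shortens the distance between the qubits of pending CNOTs; this sketch shows only the reliability term.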

Hardware Security and Cybersecurity

Analysis of EM Emanations from Cache Side-Channel Attacks on IoT Devices (side-channels, cryptography)

Instructor: Hyesoon Kim, College of Computing
CS 7290: Advanced Microarchitecture, Fall 2017

Implemented FLUSH+RELOAD, a popular cache side-channel attack, on an Intel processor-based IoT device. As a proof of concept, the attack was demonstrated on the GnuPG library’s RSA implementation and the MiBench benchmark suite’s bitcounts workload.
Upon analyzing the device’s electromagnetic emanations, prominent “loops” were observed corresponding to the attacker’s probing activity in the frequency-time spectrum, indicating that cache side-channel attacks can be monitored and detected completely remotely on resource-constrained IoT devices with zero overhead.

Information Security (cryptography, malware analysis, overflow attacks)

Instructor: James Cannady, Georgia Tech Research Institute
CS 6235: Introduction to Information Security, Spring 2019

Implemented a stack buffer overflow attack that invokes a new shell by manipulating an application that sorts a list of integers.
Performed malware analysis for 4 types of malware by examining their registry contents in Windows OS.
Implemented the RSA encryption and decryption algorithms. Explored security flaws caused by a faulty random number generator used for key generation, and the effect of a small exponent in the broadcast RSA attack.
Implemented XSRF, DOM-based XSS, and SQL injection attacks on web pages.

RSA Encryption (GMP library, cryptography)

Instructor: Prof. George P. Riley, School of ECE
ECE 6122: Advanced Programming Techniques, Fall 2016

Using the GNU multi-precision arithmetic library, implemented the RSA public-key cryptography algorithm.
Implemented RSA key generation, message encryption, and message decryption.
Using the Pollard rho factorization algorithm, implemented an attack that breaks RSA for a 64-bit key.
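
The key generation, encryption, decryption, and Pollard rho attack can be sketched in a few lines of Python, whose built-in integers stand in for GMP's multi-precision types. This is a toy sketch, not the project code: the function names, the fixed public exponent 65537, and the two Mersenne primes used as key material are assumptions. The resulting modulus is about 50 bits rather than 64. The modular inverse via `pow(e, -1, phi)` needs Python 3.8 or later.

```python
from math import gcd


def make_keys(p, q, e=65537):
    """Build (n, e, d) from two primes; e must be coprime to (p-1)(q-1)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e
    return n, e, d


def encrypt(message, e, n):
    return pow(message, e, n)


def decrypt(cipher, d, n):
    return pow(cipher, d, n)


def pollard_rho(n, c=1):
    """Return a non-trivial factor of a composite n (Floyd cycle variant)."""
    if n % 2 == 0:
        return 2
    x = y = 2
    d = 1
    while d == 1:
        x = (x * x + c) % n      # tortoise: one step
        y = (y * y + c) % n      # hare: two steps
        y = (y * y + c) % n
        d = gcd(abs(x - y), n)
    # d == n means this polynomial failed; retry with another constant.
    return d if d != n else pollard_rho(n, c + 1)


def break_rsa(n, e):
    """Recover the private exponent by factoring the modulus."""
    p = pollard_rho(n)
    q = n // p
    return pow(e, -1, (p - 1) * (q - 1))


if __name__ == "__main__":
    n, e, d = make_keys(2**19 - 1, 2**31 - 1)  # two Mersenne primes
    cipher = encrypt(42, e, n)
    print(decrypt(cipher, break_rsa(n, e), n))
```

Pollard rho needs on the order of the square root of the smaller prime factor in iterations, which is why moduli of this size fall almost instantly while realistic key sizes do not.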

Software Development

Mandelbrot Set Display (OpenGL, pthreads)

Instructor: Prof. George P. Riley, School of ECE
ECE 6122: Advanced Programming Techniques, Fall 2016

Using the OpenGL graphics API, created an application to compute and display the Mandelbrot set.
Implemented a custom class for conducting mathematical operations on complex numbers to be used for Mandelbrot coordinates generation.
Added mouse input to select a square region of the display and zoom into the Mandelbrot set, after which the zoomed region is displayed on screen. Pthreads were used to compute the new set region quickly and efficiently.
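
The core computation is the escape-time test applied to every pixel of the selected region. The sketch below is a Python illustration that uses the built-in `complex` type in place of the custom complex-number class. The function names, the iteration limit, and the pixel-to-plane mapping are assumptions, and the OpenGL drawing and threading are left out.

```python
def escape_iterations(c, max_iter=200):
    """Count iterations of z = z*z + c before |z| exceeds 2.

    Returns max_iter when the point never escapes, which is taken to
    mean that c belongs to the Mandelbrot set.
    """
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def pixel_to_complex(px, py, size, x_min, y_min, span):
    """Map a pixel in a size x size display to a point in a square region.

    (x_min, y_min) is the lower-left corner of the region and span its
    side length; zooming in just means calling this with a smaller span.
    """
    step = span / (size - 1)
    return complex(x_min + px * step, y_min + py * step)


def render_rows(row_range, size, x_min, y_min, span, max_iter=200):
    """Compute iteration counts for a band of rows.

    Rows are independent of each other, so each worker thread can be
    handed its own band without any synchronization.
    """
    return [[escape_iterations(pixel_to_complex(px, py, size, x_min, y_min, span),
                               max_iter)
             for px in range(size)]
            for py in row_range]
```

Splitting the image into row bands is one simple partitioning; whether the original used bands, tiles, or interleaved rows is not stated.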

Templated Vector Class (C++ template, vector)

Instructor: Prof. George P. Riley, School of ECE
ECE 6122: Advanced Programming Techniques, Fall 2016

Implemented a custom vector library using templates in C++.
Implemented commonly used vector functions for element manipulation while keeping the application free of memory leaks.

Hardware Design

FPGA Systems Design using Verilog (Verilog, Quartus)

Instructor: Timothy Brothers, Georgia Tech Research Institute
ECE 8813: Advanced Digital Design with Verilog, Spring 2017

Designed and implemented various FPGA-based system controllers in Quartus, including GPS interfacing, a VGA-serial converter, and irrigation systems.
Implemented image transpose and Sobel-operator-based edge detection on a grayscale image using an FPGA.
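
As a software reference for the Sobel stage, the sketch below computes the gradient magnitude of a grayscale image in Python. It is not the Verilog design: the function name, the zeroed border pixels, and the |Gx| + |Gy| magnitude approximation (a common choice in hardware because it avoids a square root) are assumptions.

```python
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def sobel_edges(image):
    """Return the |Gx| + |Gy| edge map of a grayscale image (list of rows).

    Border pixels have no full 3x3 neighbourhood and are left at zero.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            edges[y][x] = abs(gx) + abs(gy)
    return edges
```

A model like this is handy as a golden reference when checking FPGA output pixel by pixel in simulation.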

Polish Expression based Floorplanning: Simulated Annealing (physical design algorithms)

Instructor: Prof. Sung-Kyu Lim, School of ECE
ECE 6133: Physical Design Automation, Fall 2016

Created a simulator that performs optimized floorplanning on any given list of hard blocks, using a simulated annealing approach that minimizes both area and half-perimeter wire length (HPWL).
To create more room for optimization, the * and + operators in the initial Polish expression were chosen randomly. Data structures were created to model the Polish expression and the slicing tree, taking into account all internal nodes, including left, right and parent nodes, which made it faster to develop advanced algorithms. The graphics were created using OpenGL. The cooling rate, initial temperature, cost function and stopping conditions were set based on the saturation observed after multiple runs of simulated annealing.
Apart from the conventional M1, M2 and M3 moves of simulated annealing, an M4 move was introduced to rotate the operand modules at lower temperatures, easing the path to a local minimum. Applying the Stockmeyer algorithm after simulated annealing produced almost no change, as the M4 move had already brought the solution to, or very close to, the optimum.
Overall, testing on multiple circuits of different sizes showed an average 65% HPWL reduction, 90% chip area reduction and 80% chip utilization.
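
The pieces described above can be sketched in Python: evaluating the bounding box of a postfix Polish expression, the M1 move, and the annealing acceptance test. The sketch assumes the usual convention that `*` places two sub-floorplans side by side and `+` stacks them. The function names and the fixed block orientations are assumptions, and wire length, the M2 to M4 moves, and the cooling schedule are left out.

```python
import math
import random

OPERATORS = ("*", "+")


def bounding_box(expression, blocks):
    """Evaluate a postfix Polish expression to a (width, height) pair.

    blocks maps each operand name to its fixed (width, height).
    """
    stack = []
    for token in expression:
        if token in OPERATORS:
            right_w, right_h = stack.pop()
            left_w, left_h = stack.pop()
            if token == "*":   # vertical cut: side by side
                stack.append((left_w + right_w, max(left_h, right_h)))
            else:              # horizontal cut: stacked
                stack.append((max(left_w, right_w), left_h + right_h))
        else:
            stack.append(blocks[token])
    return stack.pop()


def swap_adjacent_operands(expression, k):
    """M1 move: swap the k-th and (k+1)-th operands of the expression."""
    positions = [i for i, t in enumerate(expression) if t not in OPERATORS]
    i, j = positions[k], positions[k + 1]
    moved = list(expression)
    moved[i], moved[j] = moved[j], moved[i]
    return moved


def accept_move(delta_cost, temperature, rng):
    """Always accept improvements; accept worse moves with
    probability exp(-delta_cost / temperature)."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


if __name__ == "__main__":
    blocks = {"a": (2, 3), "b": (4, 1), "c": (1, 1)}
    expr = ["a", "b", "*", "c", "+"]
    w, h = bounding_box(expr, blocks)
    print("area before:", w * h)
    w, h = bounding_box(swap_adjacent_operands(expr, 1), blocks)
    print("area after M1:", w * h)
```

Swapping two operands never breaks the validity of a Polish expression, which is what makes M1 cheap to apply. Moves that relocate operators need an extra validity check.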

SRAM Memory System and Arithmetic Unit (Cadence, VLSI)

Instructor: Prof. Saibal Mukhopadhyay, School of ECE
ECE 6130: Advanced VLSI Systems, Fall 2015

Designed an adder system in 50 nm technology, interfaced with a 16×32 SRAM array, to perform sequential reads and accumulation.
Achieved full functionality in post-layout simulation and compared its performance against pre-layout simulation.
The post-layout design with its extracted parasitics achieved a total power consumption of 470 µW with an SRAM array area efficiency of 63.37% when operated at a nominal supply voltage of 800 mV and a frequency of 1 GHz.
Compared with other teams' designs, ours was capable of operating at frequencies up to 2.93 GHz even at nominal voltage.

Embedded Systems

An Innovative Approach to Location based Services and Traffic Management System (ARM Cortex M4, RF communication)

Instructor: Mr. Hardik Shah, Dept of Electronics, VES Institute of Technology
B.E. Final Year Project, 2014 – 2015

Built a positioning and navigation system that relies on offline maps and on communication between RF transmitters placed along the road and an RF receiver in the vehicle, without using the Internet or GPS.
Worked on TI’s ARM Cortex M4 Tiva controller and CC2530 SoC for RF communication.
Developed a touch-screen GUI and incorporated features such as emergency services and vehicle tracking.
Implemented the system throughout the college campus.

Kindle for the Blind (TI MSP430)

Instructor: Mr. Hardik Shah, Dept of Electronics, VES Institute of Technology
Texas Instruments Innovation Challenge, India Analog Design Contest 2014

Selected for funding by Texas Instruments India from among 1754 proposals from 321 colleges across India. Reached the semi-finals of the competition.
Built a device based on TI’s MSP430F5659 microcontroller that takes the eBook name as voice input from a visually impaired user, loads the eBook from a flash drive containing eBooks stored in .txt format, and displays it on a prototype LED matrix display.