Research

Fixing DDIO’s leaky DMA with a lightweight hardware-software co-design

Modern datacenter servers move data across the network at exceedingly high speeds, turning on-server resources like memory bandwidth and cache capacity into potential bottlenecks. Data Direct I/O (DDIO), which lets the NIC place incoming data directly in the last-level cache, suffers from leaky DMA: large fractions of network traffic spill out of the caches into memory, consuming precious memory bandwidth and capping the effective network bandwidth.
Sweeper is a hardware extension with an accompanying API that lets applications mark network buffers after consumption, allowing hardware to skip unnecessary memory writebacks. Sweeper unlocks up to 2.6× higher sustainable network bandwidth, showing how small hardware-software co-designs can unlock major performance gains for the next generation of high-speed datacenter networks.
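The idea can be illustrated with a toy model. The sketch below is not Sweeper's actual hardware or API: the LRU cache, the buffer ring, and the mark_consumed hint are all hypothetical simplifications. It only counts how many dirty network buffers would be written back to memory with and without the application marking them as consumed.

```python
from collections import OrderedDict


class ToyCache:
    """Toy LRU cache holding network buffers (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # buffer id -> dirty flag
        self.writebacks = 0         # evictions that cost memory bandwidth

    def nic_write(self, buf):
        """The NIC deposits an incoming packet into the cache as dirty data."""
        if buf in self.lines:
            self.lines.pop(buf)  # buffer overwritten in place, no writeback
        elif len(self.lines) >= self.capacity:
            _, dirty = self.lines.popitem(last=False)  # evict the oldest line
            if dirty:
                self.writebacks += 1
        self.lines[buf] = True

    def mark_consumed(self, buf):
        """Hypothetical hint: the contents are dead, so eviction may drop them."""
        if buf in self.lines:
            self.lines[buf] = False


def count_writebacks(num_packets, ring_size, capacity, mark):
    """Receive packets into a ring of buffers; consume each one immediately."""
    cache = ToyCache(capacity)
    for i in range(num_packets):
        buf = i % ring_size
        cache.nic_write(buf)
        if mark:
            cache.mark_consumed(buf)
    return cache.writebacks
```

With a ring of 16 buffers and room for only 8 in the cache, the unmarked run pays a writeback on every eviction, while the marked run pays none. Real systems sit between these extremes, since applications do not consume every packet instantly.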

Related Publications:

Patching up Network Data Leaks with Sweeper
55th IEEE/ACM International Symposium on Microarchitecture
Marina Vemmou, Albert Cho, Alexandros Daglis

Turning persist latency into parallelism for crash-consistent Persistent Memory

Programming crash-consistent applications on Persistent Memory (PM) remains challenging. Every persist operation must appear in order, which forces the CPU to stall on PM’s long write latencies and dramatically reduces performance.
COSPlay leverages coroutines and lightweight context switching to overlap persistence across concurrent tasks, while still preserving the familiar x86 synchronous persistency model. With small CPU extensions, COSPlay achieves up to 1.7× higher throughput on baseline PM systems, and 2.2–7.3× gains when persist latencies grow due to backend features like encryption and deduplication.
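A minimal sketch of the overlap idea follows, under simplifying assumptions: a single core, fixed compute and persist costs in abstract time units, and free context switches. It is a toy scheduler written with Python generators, not COSPlay's implementation, and the names task and run are hypothetical. Each coroutine yields at a persist point; within a task, the next operation starts only after the previous persist completes, so per-task ordering is preserved while other tasks use the core.

```python
import heapq


def task(ops):
    """A coroutine that yields at every persist point."""
    for _ in range(ops):
        yield "persist"


def run(num_tasks, ops, compute, persist, overlap):
    """Return the simulated completion time of all tasks."""
    if not overlap:
        # Baseline: the core stalls for the full persist latency every time.
        clock = 0
        for _ in range(num_tasks):
            for _ in task(ops):
                clock += compute + persist
        return clock

    # Overlapped: switch to another ready coroutine while a persist is in flight.
    ready = [(0, tid, task(ops)) for tid in range(num_tasks)]
    heapq.heapify(ready)
    cpu_free = finish = 0
    while ready:
        ready_at, tid, coro = heapq.heappop(ready)
        try:
            next(coro)  # resume the task up to its next persist point
        except StopIteration:
            continue    # task has no operations left
        cpu_free = max(cpu_free, ready_at) + compute  # core busy computing
        done = cpu_free + persist                     # persist completes later
        finish = max(finish, done)
        heapq.heappush(ready, (done, tid, coro))
    return finish
```

For example, with 4 tasks of 10 operations each, 1 unit of compute and 3 units of persist latency, the stalling baseline takes 160 units while the overlapped schedule takes 43, because the core always has another task to run while persists are outstanding.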

Related Publications:

COSPlay: Leveraging Task-Level Parallelism for High-Throughput Synchronous Persistence
54th IEEE/ACM International Symposium on Microarchitecture
Marina Vemmou, Alexandros Daglis

High-Throughput Persistence with Coroutines
The Third Young Architect Workshop @ ASPLOS 2021
Marina Vemmou