Dictionary data structures, such as B-trees [1970], binary search trees (e.g., red-black trees [1978]), and Packed Memory Arrays [1981], maintain a representation of a set of keys or key-value pairs. Such data structures are a fundamental topic in computer science and have been widely used and studied in the context of efficient storage systems, databases, and graph processing, among many other applications.
There have been many interesting recent developments at the intersection of theory and practice on parallel and concurrent data structures including new settings, models, and applications. The goal of this workshop is to cover some of these topics, including batch-parallel data structures, concurrent data structures, and external-memory data structures.
Organized by: Helen Xu and Laxman Dhulipala
Agenda
The primary goal of the workshop is to bring together researchers at the forefront of research in parallel and concurrent data structures and make the community (both theory and applications) aware of recent developments. This will help facilitate discussion of open research questions and interactions to tackle those questions.
Date/Time
Monday, June 17, 2024
Schedule
All times in CEST
1:15 – 1:30p – Welcome & opening remarks – Helen Xu & Laxman Dhulipala
Part 1 – Parallel and batched data structures
1:30 – 2p – Mining Perfect Hash Functions – Peter Sanders
2 – 2:30p – Batch-Dynamic Algorithms in the Parallel and Concurrent Models – Quanquan Liu
2:30 – 3p – Recent Improvements to Packed Memory Arrays – Brian Wheatman
3 – 3:30p – How to Design Parallel Data Structures? – Yan Gu
3:30 – 4p – Coffee break
Part 2 – Concurrent data structures
4 – 4:30p – Concurrent Data Structures in Persistent Memory – Naama Ben-David
4:30 – 5p – External-Memory Dictionaries – Rob Johnson
Abstracts
Mining Perfect Hash Functions – Peter Sanders
Minimal perfect hash functions bijectively map a set S to
1..|S|. This has various applications (e.g., in bioinformatics,
databases, and as a building block for other compressed data structures).
Recently, there has been significant progress in practical methods for
constructing such hash functions with space closer and closer to the
information-theoretic minimum of 1.44 bits per key. The talk presents
some of these methods that explore the tradeoff between exponential time
brute force construction and polynomial time methods that need
additional space. Using parallelism on multiple scales (words, SIMD
lanes, cores, GPUs) plays an important role.
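To make the brute-force end of that tradeoff concrete, here is a toy sketch (not from the talk, and far less space- and time-efficient than real constructions): we search for a seed under which a seeded hash maps n keys bijectively onto 0..n-1. The helper names are illustrative.

```python
import zlib

def seeded_hash(key: str, seed: int, n: int) -> int:
    """Hash `key` into 0..n-1 using `seed` (toy seeded hash via CRC32)."""
    return zlib.crc32(f"{seed}:{key}".encode()) % n

def build_mphf(keys: list[str]) -> int:
    """Brute-force search for a seed that is perfect (bijective) on `keys`.

    A random seed succeeds with probability n!/n^n, so the expected number
    of trials grows roughly like e^n -- the exponential-time extreme of the
    construction tradeoff the talk describes.
    """
    n = len(keys)
    seed = 0
    while True:
        if len({seeded_hash(k, seed, n) for k in keys}) == n:
            return seed  # every key landed in a distinct slot: minimal & perfect
        seed += 1

keys = ["apple", "banana", "cherry", "date", "elderberry"]
seed = build_mphf(keys)
assert sorted(seeded_hash(k, seed, len(keys)) for k in keys) == [0, 1, 2, 3, 4]
```

Storing only the winning seed is what makes the representation small; the practical methods in the talk trade some of this brute-force search for polynomial-time construction with additional space.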
Batch-Dynamic Algorithms in the Parallel and Concurrent Models – Quanquan C. Liu
Due to the rapidly changing nature of real-world graphs, it is crucial to design dynamic algorithms that efficiently maintain graph statistics upon updates, since the cost of recomputation from scratch can be too large on graphs with millions or billions of connections. Furthermore, since often many changes happen in a very short span of time, we can improve performance by using parallelism to process batches of updates instead of a single update at a time. This talk presents new graph algorithms in this parallel batch-dynamic setting for a variety of problems including k-core decomposition, subgraph counting, and densest subgraphs. I will also discuss a new concurrent version of the batch-dynamic model where reads are allowed to occur concurrently with synchronous updates.
Recent Improvements to Packed Memory Arrays – Brian Wheatman
A Packed Memory Array (PMA) is a dynamic data structure that stores N elements in O(N) space. It supports reasonably efficient updates (inserts and deletes) and extremely efficient scans. However, traditional PMAs are theoretically (and often empirically) dominated by B-Trees in many situations. This talk will present recent advancements that enable PMAs to surpass B-Trees in various scenarios, positioning them as a superior choice for ordered sets.
First, we will discuss optimizations to searching in a PMA, showing how PMAs can asymptotically match and empirically outperform B-Trees. Then we will review improvements to updating a PMA with improvements in serial insert as well as a new parallel batch insert algorithm for PMAs. Lastly, we will explore techniques that reduce the space usage for PMAs, which both decrease the memory usage and increase the throughput of many operations due to the decreased memory bandwidth.
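As a rough intuition for the structure (my own simplification, not the data structure from the talk): a PMA keeps sorted elements in an array with gaps, so an insert usually only shifts a few cells, and a scan is a cache-friendly left-to-right pass. Real PMAs bound density per window and redistribute locally; this toy version simply rebuilds with even gaps when the array gets half full.

```python
class ToyPMA:
    """Sorted elements in an array with gaps (None); density kept <= 1/2."""

    def __init__(self):
        self.cells = [None] * 4
        self.count = 0

    def _rebuild(self):
        # Double-ish capacity and spread elements evenly, leaving gaps.
        elems = [x for x in self.cells if x is not None]
        self.cells = [None] * max(4, 4 * len(elems))
        step = len(self.cells) // max(1, len(elems))
        for i, x in enumerate(elems):
            self.cells[i * step] = x

    def insert(self, x):
        if self.count * 2 >= len(self.cells):  # too dense: redistribute
            self._rebuild()
        # Position of the first element greater than x (linear for clarity;
        # real PMAs binary-search using an implicit tree of window summaries).
        j = len(self.cells)
        for i, c in enumerate(self.cells):
            if c is not None and c > x:
                j = i
                break
        # Shift elements toward the nearest gap; one must exist (density <= 1/2).
        g = j
        while g < len(self.cells) and self.cells[g] is not None:
            g += 1
        if g == len(self.cells):  # no gap to the right: use one on the left
            g = j - 1
            while self.cells[g] is not None:
                g -= 1
            for i in range(g, j - 1):
                self.cells[i] = self.cells[i + 1]
            self.cells[j - 1] = x
        else:
            for i in range(g, j, -1):
                self.cells[i] = self.cells[i - 1]
            self.cells[j] = x
        self.count += 1

    def scan(self):
        # The extremely efficient operation: one contiguous pass.
        return [x for x in self.cells if x is not None]

pma = ToyPMA()
for v in [5, 1, 3, 2, 4]:
    pma.insert(v)
assert pma.scan() == [1, 2, 3, 4, 5]
```

The contiguous layout is what makes scans fast; the advancements in the talk attack the other two costs (search and update) where B-Trees traditionally won.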
How to Design Parallel Data Structures? – Yan Gu
Parallel data structures differ from their sequential and concurrent counterparts in various ways. This talk will summarize (from the speaker’s perspective) the key differences and design challenges in parallel data structures. The talk will use priority queues as a running example, but will also overview the speaker’s recent work on search trees, hash bags, vEB-trees, cover trees, and high-dimensional data structures such as kd-trees and quad/octrees.
Concurrent Data Structures in Persistent Memory – Naama Ben-David
Non-volatile random access memory (NVRAM) offers byte-addressable persistence at speeds comparable to DRAM. However, with caches remaining volatile, automatic cache evictions can reorder updates to memory, potentially leaving persistent memory in an inconsistent state upon a system crash. Flush and fence instructions can be used to force ordering among updates, but are expensive. In this talk, I will discuss general techniques for designing correct and efficient persistent concurrent data structures.
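The reordering hazard can be illustrated with a toy model (not real NVRAM APIs; the names `ToyPM` and `flush_fence` are mine): stores land in a volatile cache and reach persistent memory only on eviction, which may happen in any order, while a flush+fence forces a location to be durable before later stores can persist.

```python
class ToyPM:
    """Toy persistent memory: a volatile cache in front of durable storage."""

    def __init__(self):
        self.persistent = {}   # what survives a crash
        self.cache = {}        # volatile: lost on crash unless evicted/flushed

    def store(self, addr, val):
        self.cache[addr] = val

    def flush_fence(self, addr):
        # Models flush + fence: `addr` is durable before any later store
        # issued by the program can reach persistent memory.
        if addr in self.cache:
            self.persistent[addr] = self.cache.pop(addr)

    def crash(self, evicted):
        # Hardware may have evicted any subset of dirty lines, in any order.
        for addr in evicted:
            if addr in self.cache:
                self.persistent[addr] = self.cache[addr]
        self.cache.clear()

# Without ordering: the "valid" flag may persist while the data does not.
bad = ToyPM()
bad.store("data", 42)
bad.store("valid", True)
bad.crash(evicted=["valid"])           # only the flag got evicted
assert bad.persistent.get("valid") and "data" not in bad.persistent  # inconsistent!

# With flush + fence: once the flag is durable, the data must be too.
pm = ToyPM()
pm.store("data", 42)
pm.flush_fence("data")                 # data is durable before...
pm.store("valid", True)                # ...the flag is even issued
pm.crash(evicted=["valid"])
assert pm.persistent == {"data": 42, "valid": True}
```

Because each flush and fence is expensive on real hardware, the techniques in the talk aim to get this kind of crash consistency with as few of them as possible.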
External-Memory Dictionaries – Rob Johnson
Dictionaries, which allow users to store and retrieve key-value pairs, are fundamental and widely used data structures. They are so important that they are built into many programming languages. External-memory dictionaries are used to store, on disk, data sets that are too large to fit in RAM. In 2011, Iacono and Patrascu gave a lower bound on the trade-off between insertion and query costs in external-memory dictionaries and presented a data structure meeting their lower bound. Later, Conway, Farach-Colton and Shilane gave simpler constructions. However, none of these optimal dictionaries supported successor queries. This talk will describe the mapped B^e-tree, a data structure that meets the Iacono-Patrascu lower bound while supporting efficient successor queries. Furthermore, whereas previous designs were fundamentally based on hashing, the mapped B^e-tree is essentially an atomic-key comparison-based dictionary with a little bit of hashing stuck on the side, suggesting an alternative approach to designing external-memory dictionaries that meet the Iacono-Patrascu bound. The talk will also describe SplinterDB, a high-performance and highly concurrent mapped B^e-tree implementation.
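The insertion/query tradeoff in write-optimized dictionaries can be sketched with a toy two-level structure (my simplification of the buffering idea behind B^e-trees, not the mapped B^e-tree from the talk; `FLUSH_AT` is an illustrative parameter): inserts accumulate in a buffer and are flushed to "disk" in batches, so each disk write amortizes over many updates, while queries must consult the buffer as well as the on-disk data.

```python
import bisect

FLUSH_AT = 4  # flush the buffer to "disk" once it holds this many updates

class BufferedDict:
    def __init__(self):
        self.buffer = {}      # in-memory pending upserts
        self.keys = []        # sorted "on-disk" keys
        self.vals = []
        self.disk_writes = 0  # how many batched flushes hit the disk

    def insert(self, k, v):
        self.buffer[k] = v
        if len(self.buffer) >= FLUSH_AT:
            self._flush()

    def _flush(self):
        # One batched write amortizes I/O cost over FLUSH_AT updates.
        self.disk_writes += 1
        for k, v in sorted(self.buffer.items()):
            i = bisect.bisect_left(self.keys, k)
            if i < len(self.keys) and self.keys[i] == k:
                self.vals[i] = v
            else:
                self.keys.insert(i, k)
                self.vals.insert(i, v)
        self.buffer.clear()

    def get(self, k):
        if k in self.buffer:  # queries must check pending updates first
            return self.buffer[k]
        i = bisect.bisect_left(self.keys, k)
        return self.vals[i] if i < len(self.keys) and self.keys[i] == k else None

    def successor(self, k):
        # Successor queries need sorted structure (plus the buffer) --
        # exactly what purely hash-based optimal designs lack.
        cands = [x for x in self.keys if x > k] + [x for x in self.buffer if x > k]
        return min(cands) if cands else None

d = BufferedDict()
for i in range(10):
    d.insert(i, i * i)
assert d.get(7) == 49
assert d.successor(7) == 8
assert d.disk_writes == 2  # 10 inserts cost only 2 batched flushes
```

Real B^e-trees cascade such buffers down a tree of fanout B^e; the comparison-based layout is what lets the mapped B^e-tree answer successor queries, unlike the earlier hashing-based optimal constructions.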
Code of Conduct
The workshop follows the ACM Policy on Harassment and Discrimination.