DSSQueue
This project is a practical implementation of the DSSQueue data structure by Li and Golab, which was presented in the following paper:
Detectable Sequential Specifications for Recoverable Shared Objects. Nan Li and Wojciech Golab. DISC 2021. [https://drops.dagstuhl.de/opus/volltexte/2021/14831/pdf/LIPIcs-DISC-2021-29.pdf]
Persistent Multi-Word Compare-and-Swap (PMwCAS) for NVRAM
The codebase is based on PMWCAS [https://github.com/microsoft/pmwcas], particularly Nan Li's fork [https://github.com/lnnn1982/pmwcas].
PMwCAS is a library that allows atomically changing multiple 8-byte words on non-volatile memory in a lock-free manner. It allows developers to easily build lock-free data structures for non-volatile memory and requires no custom recovery logic from the application. More details are described in this paper:
Easy Lock-Free Indexing in Non-Volatile Memory. Tianzheng Wang, Justin Levandoski, and Paul Larson. ICDE 2018.
Environment
The DSSQueue implementation was developed for Linux, and tested on Ubuntu 20.04.
Build
To build on Linux, run the following commands and replace N with the desired level of parallelism:
$ mkdir build
$ cd build
$ cmake ..
$ make -jN
Test Scripts
perf.sh -- It runs the program and creates a CSV file with ADSS and UDSS implementations throughput for the different number of threads. The test suite will do both enqueue and dequeue at each iteration for each thread.
prod-cons.sh -- It runs the program and creates a CSV file with DSS, ADSS, and UDSS implementations throughput for the different number of producers and consumers. The total number of threads is equal to 20.
test.sh -- It runs the program to test the correctness of resolution method.
overhead.sh -- It runs the program and creates a CSV file with the overhead of ADSS implementation.
CLI Arguments
The following parameters are supported:
queue_size -- number of nodes created in each queue before enqueue and dequeue
threads -- number of client threads in the benchmark
mem_map_path -- path to memory-mapped file
num_seconds -- experiment duration in seconds
queue_type -- Whether queue is "DSS", "ADSS", or "UDSS"
test_type -- the test type that the benchmark uses, "performance", "resolution", "producer-consumer", or "overhead"
num_producers -- Number of producers in the producer-consumer example
num_consumers -- Number of consumers in the producer-consumer example
Design choice
- Memory Map File
Memory map file, developed by PMDK, is used is this project to store the status of queue, including number of nodes allocated in the pool, which node is free and which is occupied, etc. This is used to recover the queue status even after crash. Further, the file is flushed to memory each time a node status is changed.
- Node Status
To better determine which nodes are in use and for further reuse, each node is assigned a status:
- ready: ready and free to be enqueued
- used: node in queue
- removed: node is removed from queue but not reclaimed Further, in each enqueue operation, if there are no "ready" nodes, reclaimation step is performed to collect "removed" nodes and change these status into "ready".
- Test Class
This program uses templates to avoid code duplication. Therefore, if you want to implement a new test class, you could follow the steps below:
- Define a new template class in the
benchmark
directory, which inherits fromQueueTestBase
. - Explicitly indicate your template classes; in the
QueueBenchmark.cpp
and your class's source file. - Add the name of your test suite to the
QueueBenchmark.h
, and in theQueueTestMain.cpp
invoke thestart_benchmark
function with your class's name whenFLAGS_test_type
is the same as your test suite's name.
Benchmark Tests
This code is a simplified version of enqueue and dequeue operation. When test case is initiated, the code works as follows.
- Initialize allocator
- Set up the environment by checking the existance of memory map file.
- If memory map file exist, we load the pool and node status from the memory map file. The following steps are performed when loading the file
- compute head and tail pointer
- compute the address for node pool
- compute address for each queue
- update local variable that stores these addresses
- If the file does not exist, we initialized an empty file specified by the parameter
mem_map_path
. And the following initialization steps are performed:- initialize node pool
- create nodes inside the node pool
- initialize queue head and queue tail for each thread
- initialize a queue for each thread
- update local variable that stores these addresses
- After the set up procedure, initialize the threads and clock (for performance recording).
- Before executing operations, make sure the threads are synchronized by using
barrier
- Protect each thread using epoch manager
- Enqueue several nodes before testing the enqueue and dequeue operation (the number of nodes enqueue in this step is determined by
queue_size
parameter - Unprotect each thread
- Now, we are ready to do the trials. More specifically, the we ensure each thread is working the same amount of time. Same as above, protect each thread before execution
- This step is determined based on the queue test class:
- PerformanceTest and OverheadTest: For each trial, one enqueue and one dequeue operation are executed.
- ProducerConsumerTest: For each trial, one enqueue is executed at the producer and one dequeue operation is executed at the consumer.
- After enqueue and dequeue, unprotect each thread
- Similar as above, use
barrier
to ensure the synchronization of thread - After execution, performance related data is printed and program exited successfully