Skip to content
Snippets Groups Projects
D

DSSQueue

Project ID: 57095

DSSQueue

This project is a practical implementation of the DSSQueue data structure by Li and Golab, which was presented in the following paper:

Detectable Sequential Specifications for Recoverable Shared Objects. Nan Li and Wojciech Golab. DISC 2021. [https://drops.dagstuhl.de/opus/volltexte/2021/14831/pdf/LIPIcs-DISC-2021-29.pdf]

Persistent Multi-Word Compare-and-Swap (PMwCAS) for NVRAM

The codebase is based on PMWCAS [https://github.com/microsoft/pmwcas], particularly Nan Li's fork [https://github.com/lnnn1982/pmwcas].

PMwCAS is a library that allows atomically changing multiple 8-byte words on non-volatile memory in a lock-free manner. It allows developers to easily build lock-free data structures for non-volatile memory and requires no custom recovery logic from the application. More details are described in this paper:

Easy Lock-Free Indexing in Non-Volatile Memory. Tianzheng Wang, Justin Levandoski, and Paul Larson. ICDE 2018.

Environment

The DSSQueue implementation was developed for Linux, and tested on Ubuntu 20.04.

Build

To build on Linux, run the following commands and replace N with the desired level of parallelism:

$ mkdir build
$ cd build
$ cmake ..
$ make -jN

Test Scripts

perf.sh -- It runs the program and creates a CSV file with ADSS and UDSS implementations throughput for the different number of threads. The test suite will do both enqueue and dequeue at each iteration for each thread.

prod-cons.sh -- It runs the program and creates a CSV file with DSS, ADSS, and UDSS implementations throughput for the different number of producers and consumers. The total number of threads is equal to 20.

test.sh -- It runs the program to test the correctness of resolution method.

overhead.sh -- It runs the program and creates a CSV file with the overhead of ADSS implementation.

CLI Arguments

The following parameters are supported:

queue_size -- number of nodes created in each queue before enqueue and dequeue

threads -- number of client threads in the benchmark

mem_map_path -- path to memory-mapped file

num_seconds -- experiment duration in seconds

queue_type -- Whether queue is "DSS", "ADSS", or "UDSS"

test_type -- the test type that the benchmark uses, "performance", "resolution", "producer-consumer", or "overhead"

num_producers -- Number of producers in the producer-consumer example

num_consumers -- Number of consumers in the producer-consumer example

Design choice

  1. Memory Map File

Memory map file, developed by PMDK, is used is this project to store the status of queue, including number of nodes allocated in the pool, which node is free and which is occupied, etc. This is used to recover the queue status even after crash. Further, the file is flushed to memory each time a node status is changed.

  1. Node Status

To better determine which nodes are in use and for further reuse, each node is assigned a status:

  • ready: ready and free to be enqueued
  • used: node in queue
  • removed: node is removed from queue but not reclaimed Further, in each enqueue operation, if there are no "ready" nodes, reclaimation step is performed to collect "removed" nodes and change these status into "ready".
  1. Test Class

This program uses templates to avoid code duplication. Therefore, if you want to implement a new test class, you could follow the steps below:

  • Define a new template class in the benchmark directory, which inherits from QueueTestBase.
  • Explicitly indicate your template classes; in the QueueBenchmark.cpp and your class's source file.
  • Add the name of your test suite to the QueueBenchmark.h, and in the QueueTestMain.cpp invoke the start_benchmark function with your class's name when FLAGS_test_type is the same as your test suite's name.

Benchmark Tests

This code is a simplified version of enqueue and dequeue operation. When test case is initiated, the code works as follows.

  1. Initialize allocator
  2. Set up the environment by checking the existance of memory map file.
  3. If memory map file exist, we load the pool and node status from the memory map file. The following steps are performed when loading the file
    • compute head and tail pointer
    • compute the address for node pool
    • compute address for each queue
    • update local variable that stores these addresses
  4. If the file does not exist, we initialized an empty file specified by the parameter mem_map_path. And the following initialization steps are performed:
    • initialize node pool
    • create nodes inside the node pool
    • initialize queue head and queue tail for each thread
    • initialize a queue for each thread
    • update local variable that stores these addresses
  5. After the set up procedure, initialize the threads and clock (for performance recording).
  6. Before executing operations, make sure the threads are synchronized by using barrier
  7. Protect each thread using epoch manager
  8. Enqueue several nodes before testing the enqueue and dequeue operation (the number of nodes enqueue in this step is determined by queue_size parameter
  9. Unprotect each thread
  10. Now, we are ready to do the trials. More specifically, the we ensure each thread is working the same amount of time. Same as above, protect each thread before execution
  11. This step is determined based on the queue test class:
    • PerformanceTest and OverheadTest: For each trial, one enqueue and one dequeue operation are executed.
    • ProducerConsumerTest: For each trial, one enqueue is executed at the producer and one dequeue operation is executed at the consumer.
  12. After enqueue and dequeue, unprotect each thread
  13. Similar as above, use barrier to ensure the synchronization of thread
  14. After execution, performance related data is printed and program exited successfully