    Align compaction output file boundaries to the next level ones (#10655) · f3cc6663
    Jay Zhuang authored
    Summary:
    Try to align the compaction output file boundaries with the next level's file
    boundaries (the grandparent level) to reduce level compaction write amplification.
    
    In level compaction, there is "wasted" data at the beginning and end of the
    output level files. Aligning the file boundaries can avoid such "wasted" compaction.
    With this PR, compaction tries to align the non-bottommost level file boundaries
    with the next level's file boundaries. It may cut a file at such a boundary when
    the file is large enough (at least 50% of target_file_size) and not too large
    (capped at 2x target_file_size).
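
    A simplified sketch of both ideas follows, under stated assumptions: integer keys, bytes spread uniformly over each grandparent file's key range, and hypothetical names (`GrandparentFile`, `WastedBytes`, `SplitOutputs`) that are not RocksDB APIs. `WastedBytes` estimates the part of the overlapping grandparent files that lies outside an output file's key range (one reading of the "wasted" data), and `SplitOutputs` applies only the two size thresholds described above; the real cutting logic in RocksDB considers more than this.

    ```cpp
    #include <algorithm>
    #include <cassert>
    #include <cstdint>
    #include <vector>

    // Hypothetical model of a next-level (grandparent) file: a half-open key
    // range [smallest, largest) whose `bytes` are ASSUMED to be spread
    // uniformly over its keys. Not a RocksDB type.
    struct GrandparentFile {
      int64_t smallest;
      int64_t largest;
      uint64_t bytes;
    };

    // Estimates the "wasted" bytes for an output file covering [lo, hi): the
    // parts of overlapping grandparent files that fall outside [lo, hi), which
    // get rewritten when this file is later compacted into the next level.
    uint64_t WastedBytes(int64_t lo, int64_t hi,
                         const std::vector<GrandparentFile>& grandparents) {
      assert(lo < hi);
      uint64_t wasted = 0;
      for (const GrandparentFile& f : grandparents) {
        assert(f.smallest < f.largest);
        const int64_t ov_lo = std::max(lo, f.smallest);
        const int64_t ov_hi = std::min(hi, f.largest);
        if (ov_lo >= ov_hi) continue;  // No overlap with this grandparent file.
        const auto span = static_cast<uint64_t>(f.largest - f.smallest);
        const auto inside = static_cast<uint64_t>(ov_hi - ov_lo);
        // Uniform-density assumption: the non-overlapped fraction is wasted.
        wasted += f.bytes * (span - inside) / span;
      }
      return wasted;
    }

    // Simplified cutting rule: cut before entry i at a grandparent boundary
    // once the current file holds at least target / 2 bytes, and always cut
    // before an entry that would push the file past 2 * target.
    // boundary_before[i] is true when a grandparent boundary lies just before
    // entry i. Returns the sizes of the resulting output files.
    std::vector<uint64_t> SplitOutputs(const std::vector<uint64_t>& entry_sizes,
                                       const std::vector<bool>& boundary_before,
                                       uint64_t target) {
      assert(entry_sizes.size() == boundary_before.size());
      std::vector<uint64_t> files;
      uint64_t current = 0;
      for (size_t i = 0; i < entry_sizes.size(); ++i) {
        const bool aligned_cut = boundary_before[i] && current >= target / 2;
        const bool hard_cut = current + entry_sizes[i] > 2 * target;
        if (current > 0 && (aligned_cut || hard_cut)) {
          files.push_back(current);
          current = 0;
        }
        current += entry_sizes[i];
      }
      if (current > 0) files.push_back(current);
      return files;
    }
    ```

    For example, an output file covering [50, 250) over three 100-byte grandparent files [0, 100), [100, 200), and [200, 300) wastes 100 bytes, while one aligned to [100, 300) wastes none. With `target = 10` and twenty 2-byte entries whose grandparent boundaries fall before entries 4 and 14, `SplitOutputs` yields files of 8, 20, and 12 bytes: the middle file grows to the 2x cap because no boundary appears sooner.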
    
    db_bench shows about a 12.56% reduction in compaction writes:
    ```
    TEST_TMPDIR=/data/dbbench2 ./db_bench --benchmarks=fillrandom,readrandom -max_background_jobs=12 -num=400000000 -target_file_size_base=33554432
    
    # baseline:
    Flush(GB): cumulative 25.882, interval 7.216
    Cumulative compaction: 285.90 GB write, 162.36 MB/s write, 269.68 GB read, 153.15 MB/s read, 2926.7 seconds
    
    # with this change:
    Flush(GB): cumulative 25.882, interval 7.753
    Cumulative compaction: 249.97 GB write, 141.96 MB/s write, 233.74 GB read, 132.74 MB/s read, 2534.9 seconds
    ```
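
    The 12.56% figure appears to match the relative drop in compaction write throughput (MB/s). A small helper (`ReductionPercent` is a hypothetical name, not part of db_bench) makes the arithmetic explicit; the comments list the values it gives for the numbers above.

    ```cpp
    #include <cassert>

    // Relative reduction, in percent, from a baseline value to a new value.
    double ReductionPercent(double baseline, double with_change) {
      assert(baseline > 0.0);
      return (baseline - with_change) / baseline * 100.0;
    }

    // Figures copied from the db_bench output above:
    //   compaction write throughput: 162.36 -> 141.96 MB/s  => about 12.56%
    //   cumulative compaction write: 285.90 -> 249.97 GB    => about 12.57%
    //   compaction time:             2926.7 -> 2534.9 s     => about 13.39%
    ```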
    
    The compaction simulator shows a similar result (14% with 100 GB of random data).
    As a side effect, with this PR, the SST file size can exceed target_file_size,
    but it is capped at 2x target_file_size. Some files will also be smaller than
    before. Here are the file size statistics (sizes in bytes) when loading 100 GB
    with a target file size of 32 MB:
    ```
              baseline      this_PR
    count  1.656000e+03  1.705000e+03
    mean   3.116062e+07  3.028076e+07
    std    7.145242e+06  8.046139e+06
    ```
    
    The feature is enabled by default. To revert to the old behavior, disable it
    with `AdvancedColumnFamilyOptions.level_compaction_dynamic_file_size = false`.
    
    Also includes https://github.com/facebook/rocksdb/issues/1963: cut the output
    file before a skippable grandparent file. This is for use cases such as a user
    adding two or more non-overlapping data ranges at the same time; it can reduce
    the overlap between the datasets in the lower levels.
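
    The skippable-grandparent cut can be sketched as follows, assuming integer keys, grandparent files modeled as closed key ranges, and hypothetical helper names (`SkipsGrandparentFile`, `SplitBeforeSkippedFiles`); this illustrates the idea rather than RocksDB's implementation. When the gap between two consecutive output keys fully contains a grandparent file, the sketch starts a new output file so that neither file overlaps the skipped one.

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <utility>
    #include <vector>

    // Hypothetical model: a grandparent file as a closed key range
    // [smallest, largest]; output keys are distinct integers in ascending order.
    using KeyRange = std::pair<int64_t, int64_t>;

    // True if some grandparent file lies entirely between prev_key and next_key,
    // i.e. one output file would span it without containing any of its keys.
    bool SkipsGrandparentFile(int64_t prev_key, int64_t next_key,
                              const std::vector<KeyRange>& grandparents) {
      for (const KeyRange& f : grandparents) {
        if (f.first > prev_key && f.second < next_key) return true;
      }
      return false;
    }

    // Splits sorted output keys into files, cutting before any key whose gap
    // from the previous key would skip over a whole grandparent file.
    std::vector<std::vector<int64_t>> SplitBeforeSkippedFiles(
        const std::vector<int64_t>& sorted_keys,
        const std::vector<KeyRange>& grandparents) {
      std::vector<std::vector<int64_t>> files;
      for (size_t i = 0; i < sorted_keys.size(); ++i) {
        assert(i == 0 || sorted_keys[i - 1] < sorted_keys[i]);
        if (i == 0 ||
            SkipsGrandparentFile(sorted_keys[i - 1], sorted_keys[i], grandparents)) {
          files.emplace_back();  // Start a new output file.
        }
        files.back().push_back(sorted_keys[i]);
      }
      return files;
    }
    ```

    For example, with grandparent files [0, 9], [10, 19], [20, 29], and [30, 39], the keys {1, 5, 8, 31, 35} (two non-overlapping datasets) are split into {1, 5, 8} and {31, 35}, while {1, 5, 12} stays in a single file.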
    
    Pull Request resolved: https://github.com/facebook/rocksdb/pull/10655
    
    Reviewed By: cbi42
    
    Differential Revision: D39552321
    
    Pulled By: jay-zhuang
    
    fbshipit-source-id: 640d15f159ab0cd973f2426cfc3af266fc8bdde2