Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Also quite recent are Ribbon filters: https://arxiv.org/abs/2103.02515

But these, and xor/binary-fuse-filters are only direct replacements for Bloom filters in some problems. There are still problems for which Bloom filters are a better solution.

Bloom filters have the advantage that the filter produced by adding two elements, is the same as the bitwise-or of a pair of filters made by adding each element separately.

    bloom([x, y]) == bloom([x]) | bloom([y])
Also, the filter produced by bitwise-and, `bloom([x]) & bloom([y])`, cannot have any bits set in it which are not also set by `bloom([x, y])`. We can assert that

    bloom([x]) & bloom([x, y]) == bloom([x])
    bloom([y]) & bloom([x, y]) == bloom([y])
    (bloom([x]) & bloom([y])) | bloom([x, y]) == bloom([x, y])
There are practical applications that this can provide which are not satisfied by the "optimized" variants. The bitwise-or does set union (or least upper bound), and the bitwise-and does a set intersection (or greatest lower bound), though the filter produced by `bloom([x, y])` has a higher false positive rate than the filter produced by `bloom([x]) & bloom([y])`.

           bloom([x]) | bloom([y])              == bloom([x, y])
          🡥                       🡤
    bloom([x])                   bloom([y])
          🡤                       🡥
           bloom([x]) & bloom([y])


Also based on Ribbon, the BuRR filters: https://arxiv.org/pdf/2109.01892 which seem to get very close to the theoretical optimum, 0.1%-0.5% overhead comparatively




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: