NoSQL: Bloom Filters, Compressed Bloom Filters, and Golomb Compressed Sets

| | bookmark | email

Bloom Filters, Compressed Bloom Filters, and Golomb Compressed Sets

A GCS [Golomb Compressed Set] mirrors the structure of the compressed Bloom filter: we're hashing the elements into a space of size n/p. But, while a compressed Bloom filter treats this as a bitmap, a GCS treats it as a list of values. Since the values are the result of hashing, we can assume that they are uniformly distributed, sort them and build a list of differences. The differences will be geometrically distributed with a parameter of p. Golomb coding is the optimal encoding for geometrically distributed values: you divide by 1/p, encode that in unary then encode the remainder in binary.

via NoSQL databases