An Optimization Problem Related to Bloom Filters With Bit Patterns
P. Damaschke and A. Schliep
In SOFSEM 2018: Theory and Practice of Computer Science, Springer, 10706, 525–538, Jan 2018.
Bloom filters are hash-based data structures for membership queries without false negatives widely used across many application domains. They also have become a central data structure in bioinformatics. In genomics applications and DNA sequencing the number of items and number of queries are frequently measured in the hundreds of billions. Consequently, issues of cache behavior and hash function overhead become a pressing issue. Blocked Bloom filters with bit patterns offer a variant that can better cope with cache misses and reduce the amount of hashing. In this work we state an optimization problem concerning the minimum false positive rate for given numbers of memory bits, stored elements, and patterns. The aim is to initiate the study of pattern designs best suited for the use in Bloom filters. We provide partial results about the structure of optimal solutions and a link to two-stage group testing.
A reprint is available as PDF.
The publication includes results from the following projects or software tools: HTSMethods.
Further publications by Alexander Schliep.