Theory and Applications of b-Bit Minwise Hashing
Efficient (approximate) computation of set similarity in very large datasets is a common task with many applications in information retrieval and data management. A widely used approach to this task is minwise hashing.
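To make the idea concrete, the following is a minimal sketch of plain minwise hashing used to estimate set resemblance (Jaccard similarity). It is an illustration under stated assumptions, not the paper's implementation: the function names (`make_hashes`, `signature`, `estimate_resemblance`) are hypothetical, set elements are assumed to be non-negative integers below the modulus, and random linear hash functions stand in for random permutations, which they only approximate.

```python
import random

# Assumption: a Mersenne prime as modulus; items must be integers in [0, PRIME).
PRIME = (1 << 61) - 1


def make_hashes(k, seed=0):
    """Draw k random linear hash functions x -> (a*x + b) mod PRIME.

    These are a stand-in for random permutations and are only
    approximately min-wise independent.
    """
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def signature(items, hashes):
    """Return the minwise signature: the minimum hash value under each function."""
    return [min((a * x + b) % PRIME for x in items) for a, b in hashes]


def estimate_resemblance(sig1, sig2):
    """Estimate resemblance as the fraction of signature positions that agree."""
    matches = sum(1 for u, v in zip(sig1, sig2) if u == v)
    return matches / len(sig1)


hashes = make_hashes(k=200, seed=42)
s1 = set(range(0, 1000))
s2 = set(range(500, 1500))
estimate = estimate_resemblance(signature(s1, hashes), signature(s2, hashes))
```

Each signature position matches with probability roughly equal to the resemblance of the two sets, so averaging over more hash functions reduces the variance of the estimate at the cost of a longer signature.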