So, I’ve been reading about streaming algorithms. Seems like the journey to streaming algorithms(aka Algorithms for Big Data) starts with the Flajolet-Martin algorithm.
The Problem
We are given a sequence <u_0, u_1, u_2, u_3, ... , u_n>
of n
elements. Each u_i
comes from the fixed set U
of some finite size. We want to see how many elements are unique.
A simple Python code like below can solve the problem, if we have required memory.