Practical Analysis of Probabilistic Counters
There exists a class of data structures called probabilistic counters which allow the storage of an approximate count using very little memory. A detailed theoretical analysis of these structures can be found here. These data structures are parameterized by their `range', an increasing sequence s_i. Theoretically, the optimal range is an exponentially growing sequence (giving what we call a Morris counter), but the implemenation used by Redis has the sequence increase only quadratically. More information about the Redis implemenation as well as a quick theoretical analysis of why that implementation is suspect, can be found here (I am RDShah in that thread). I implemented each of the counters for various parameter choices to compare their errors in practice. Below is a summary of the results.
Each plot is a different probabilistic counter. On the x-axis are different values the probabilistic counter was used to count to, and the on y-axis is the ratio of the value returned by the probabilistic counter to the true value of the count. The highlighted area of the plot denotes where 80% of the values of the aformented ratio fall. The counters are deterministic for the first few increments, so those data points are omitted from the plots. Also omitted are the points where the counter would require more than 8 bits. Click on the names of the different counters on the legend on the left to toggle their display.
The probabilistic counters are divided into two categories: the first are the so-called `redis' counters, which are what's currently implemented by Redis. These counters have range roughly s_i ≈ a/2 * i ** 2. The second are the standard Morris counters, with range roughly s_i ≈ (1 + a) ** i. Note this means that some plots end much earler than others. Note that the errors for the Redis counters decrease for larger counts, and are huge for small counts. On the other hand, the error for Morris counters holds roughly constant. Also note that the Morris counters generally can support much higher counts with only 8 bits
Recommendation
I recommend switching to a Morris counter with default parameter a somewhere in the range [0.05, 0.06]. The user should be able to set that parameter to any value in the range 0.03 to 0.1.
Implementation details
Coming soon.
Rikhav Shah
Contact: rikhav(dot)shah(at)berkeley(dot)edu