Streaming Algorithms and Sketching: Solving Problems With Sublinear Memory

The assumption that you must store your data to understand it is the first thing you need to unlearn.

For decades, algorithmic thinking has been anchored to a comfortable premise: if you want to compute something meaningful about a dataset, you keep the dataset around. You load it into memory, index it, query it, transform it. This works beautifully when your data fits. It breaks catastrophically when it doesn't. The moment your problem exceeds available RAM—whether that's because you're processing a continuous stream of network packets, analyzing sensor data from millions of IoT devices, or tracking real-time statistics across distributed systems—the classical approach collapses. You cannot afford to store everything. You cannot afford to wait for a second pass. Yet you still need answers.

Streaming algorithms and sketching techniques dissolve this constraint by inverting the problem. Instead of asking "how do I store this data," they ask "what is the minimum information I must retain to answer my question?" The answer, remarkably often, is far less than the data itself.

Consider the frequency estimation problem: you receive a stream of elements and need to identify which items appear most often. A naive approach requires a hash table storing every distinct element and its count—linear memory in the number of unique items. The Count-Min Sketch solves this with logarithmic memory by using a small array of hash functions and counters. You hash each incoming element multiple times, increment the corresponding counters, and later query by taking the minimum value across all hash functions. You lose precision—overestimation is possible—but you gain something more valuable: the ability to process unbounded streams with bounded resources. The trade-off is explicit and tunable.

This is the core insight that separates sketching from traditional data structures. You are not trying to reconstruct the original data. You are not even trying to answer queries with perfect accuracy. You are solving a different problem: answering queries approximately while using dramatically less space. The approximation is not a failure mode—it is the mechanism that makes the algorithm work.

The practical implications are profound. In network monitoring, you cannot store every packet header. But you can sketch the distribution of source IPs, destination ports, and protocol types using a few kilobytes of memory. In database systems, you can estimate cardinality and selectivity without materializing intermediate results. In machine learning, you can compute statistics over data that never fits in memory simultaneously—a necessity when training on datasets larger than any single machine's capacity.

What makes this approach intellectually demanding is that it requires abandoning the certainty of exact computation. Most computer scientists are trained to think in terms of correctness: an algorithm either produces the right answer or it doesn't. Streaming algorithms operate in a different regime. They produce answers with high probability, within bounded error, with specified confidence. This demands different proof techniques, different intuitions about what "solving a problem" means. You must reason about concentration inequalities, hash function properties, and the geometry of high-dimensional spaces.

The algorithms themselves are often elegant precisely because they are constrained. The Morris algorithm estimates the cardinality of a stream using O(log log n) bits—a result that seems to violate information-theoretic intuition until you work through the mathematics. HyperLogLog counts distinct elements in a stream with logarithmic memory and constant-factor error. These are not incremental improvements over naive approaches. They are qualitative shifts in what becomes computationally feasible.

Yet streaming algorithms remain underutilized in practice. Many engineers default to approximate solutions built on top of exact algorithms—sampling data, batching computations, hoping the problem stays small. These approaches work until they don't, usually at the worst possible moment. The alternative is to design systems around the constraints from the beginning, choosing algorithms that were built for streaming from the ground up.

The question is not whether your data will exceed memory. It is whether you will anticipate this when designing your system, or discover it in production.