Circuit Complexity and Neural Network Expressiveness Are Not the Same Problem
The persistent conflation of circuit complexity with neural network expressiveness represents a fundamental category error in how we think about learning systems. One measures the inherent difficulty of computing a function; the other measures whether a parameterized model can approximate it. They are orthogonal concerns, and treating them as interchangeable has led researchers down sterile theoretical paths while obscuring what actually matters about deep learning.
Circuit complexity asks: what is the minimum size of a Boolean circuit needed to compute a specific function? It is a question about the function itself—its intrinsic structure, independent of any learning mechanism. A function might require exponentially large circuits, which tells us something absolute about its computational nature. Neural network expressiveness, by contrast, asks whether a network with a given architecture and parameter budget can represent a target function well enough to be useful. These are not translations of the same question into different formalisms. They are asking about different objects entirely.
The confusion runs deep. Researchers often invoke circuit lower bounds to argue that neural networks "cannot" learn certain functions, as though expressiveness were determined by worst-case circuit complexity. But a function can have exponential circuit complexity while remaining learnable by a neural network of practical size, because learning does not require exact representation. A network needs to approximate the function on a distribution, not compute it exactly on all inputs. The function might have a simple structure on the data manifold even if its worst-case complexity is prohibitive. Circuit complexity tells you nothing about this.
Consider what actually happens during neural network training. The network is not discovering the minimal circuit for a function. It is finding a parameterization—often highly redundant, often exploiting the geometry of the input distribution—that produces correct outputs on training data and generalizes to test data. This is fundamentally a statistical and geometric problem, not a circuit-theoretic one. A function with exponential circuit complexity might be trivially learnable if the data concentrates in a low-dimensional region where the function behaves simply.
The real issue is that circuit complexity is a worst-case measure. It asks: what is the hardest input for this function? Neural networks, by contrast, are optimized for average-case performance on a specific distribution. These are incommensurable frameworks. Invoking circuit lower bounds to constrain neural network expressiveness is like citing the worst-case runtime of quicksort to argue that sorting cannot be done efficiently in practice. The worst case is irrelevant when you have structure.
This matters because it shapes what questions we ask. If we believe circuit complexity determines learnability, we waste effort proving lower bounds that have no bearing on whether networks can learn from data. We miss the actual phenomena: how network architecture interacts with data geometry, how implicit regularization shapes the solutions found, how the optimization landscape enables learning despite non-convexity. These are the questions that explain why deep learning works.
The clarification also reframes what expressiveness results should establish. A meaningful expressiveness theorem for neural networks should specify: given a function class, a data distribution, and an approximation tolerance, what network size suffices to learn it? This is inherently distributional and approximate. It is not asking whether the network can compute the function exactly in the worst case. It is asking whether learning is feasible.
Some recent work has begun moving in this direction, characterizing learnability through the lens of neural tangent kernels, feature learning, and implicit bias. These approaches do not rely on circuit complexity. They ask directly: what does the optimization landscape look like, and what solutions does gradient descent find? This is the right level of analysis.
The distinction matters for how we interpret negative results too. If a circuit lower bound applies to a function, it does not imply that neural networks cannot learn it. It implies only that the function has inherent worst-case complexity. Whether networks can learn it depends on the distribution, the architecture, and the learning algorithm—none of which are constrained by circuit complexity alone.
Expressiveness and complexity are not synonyms. Recognizing this distinction clarifies what we can actually conclude about neural network capabilities and redirects theoretical effort toward questions that bear on practice.