Differential Topology and Neural Architecture Search: The Geometry We Ignore

The search for optimal neural architectures treats the space of possible networks as the domain of a discrete combinatorial problem, when that space is fundamentally a continuous geometric object whose structure mathematics has already mapped.

This distinction matters because it determines what we can actually find. When researchers frame architecture search as discrete optimization (evaluating individual candidates, comparing metrics, selecting winners), they are working blind to the underlying topology. They search a landscape without knowing its shape. Differential topology, the study of smooth manifolds and the smooth maps between them, offers something different: a language for describing how architectures relate to one another through continuous deformation, which invariants persist across those deformations, and which directions in the space of possibilities lead to genuinely new behavior rather than back to variants of what we already have.

Consider what happens when we perturb a neural network. Add a layer. Remove a skip connection. Widen a bottleneck. These operations feel discrete, but they sit inside a continuous space of weight configurations and architectural parameters. For a fixed architecture with continuous activations, the network's function (its input-output behavior) changes continuously under small perturbations of the weights, and smoothly when the activations are smooth. This regularity is not incidental; it is what makes an underlying manifold structure plausible. The map from weights to functions is also many-to-one: permuting hidden units, or scaling a ReLU unit's incoming weights up and its outgoing weight down by the same factor, changes the parameters without changing the function. Networks that share a given functional behavior therefore occupy a lower-dimensional subset embedded in the higher-dimensional space of all possible configurations. This is precisely the kind of structure differential topology was designed to characterize.
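
A minimal sketch of the two claims above, in pure Python and under stated assumptions: a tiny one-hidden-layer network with random Gaussian weights, where the sizes, seed, input, and the helpers `mlp`, `perturb`, `rescale_unit`, and `gaussian_params` are all illustrative rather than taken from any search method. The first part moves the weights of a tanh network along a fixed direction and shows the ratio of output change to step size settling down, as it should for a smooth map. The second part rescales one ReLU unit by several factors c > 0 and shows the output does not move, so a whole one-parameter family of weight vectors computes the same function. It covers only continuous weight changes; discrete edits such as adding a layer are outside what it demonstrates.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def mlp(x, w1, b1, w2, b2, act=math.tanh):
    """One-hidden-layer network with a scalar output."""
    hidden = [act(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2


def perturb(w1, b1, w2, b2, direction, eps):
    """Move every parameter by eps times the matching entry of a fixed direction."""
    d1, db1, d2, db2 = direction
    new_w1 = [[w + eps * d for w, d in zip(row, drow)] for row, drow in zip(w1, d1)]
    new_b1 = [b + eps * d for b, d in zip(b1, db1)]
    new_w2 = [w + eps * d for w, d in zip(w2, d2)]
    return new_w1, new_b1, new_w2, b2 + eps * db2


def rescale_unit(w1, b1, w2, b2, unit, c):
    """Scale one hidden unit's incoming weights and bias by c > 0 and its outgoing weight by 1/c."""
    new_w1 = [row[:] for row in w1]
    new_b1, new_w2 = b1[:], w2[:]
    new_w1[unit] = [c * w for w in w1[unit]]
    new_b1[unit] = c * b1[unit]
    new_w2[unit] = w2[unit] / c
    return new_w1, new_b1, new_w2, b2


def gaussian_params(rng, n_in, n_hidden):
    """Random (w1, b1, w2, b2) with standard normal entries (illustrative initialization)."""
    return (
        [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        [rng.gauss(0, 1) for _ in range(n_hidden)],
        [rng.gauss(0, 1) for _ in range(n_hidden)],
        rng.gauss(0, 1),
    )


rng = random.Random(0)
params = gaussian_params(rng, n_in=3, n_hidden=4)
direction = gaussian_params(rng, n_in=3, n_hidden=4)
x = [0.5, -1.0, 2.0]

# 1) Smoothness in the weights (tanh network): change/eps settles toward the
#    directional derivative as the step shrinks.
base = mlp(x, *params)
ratios = []
for eps in (1e-2, 1e-3, 1e-4):
    change = mlp(x, *perturb(*params, direction, eps)) - base
    ratios.append(change / eps)
    print(f"eps={eps:.0e}  change/eps={change / eps:.6f}")

# 2) A continuous symmetry (ReLU network): every c > 0 gives a different
#    parameter vector that computes the same output.
relu_base = mlp(x, *params, act=relu)
for c in (0.5, 2.0, 10.0):
    rescaled = rescale_unit(*params, unit=0, c=c)
    print(f"c={c:<4}  output gap={abs(mlp(x, *rescaled, act=relu) - relu_base):.2e}")
```

The printed `change/eps` values should agree to more digits as eps shrinks, and each `output gap` should sit at the level of floating-point rounding.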

The problem with current architecture search methods is that they treat this manifold as if it were a featureless hypercube. Grid search, random search, evolutionary algorithms, and reinforcement-learning-based search are, at bottom, sampling strategies: even the adaptive ones learn only from the scalar scores of the candidates they happen to evaluate. They do not exploit the geometric properties of the space they are searching. They do not recognize that some directions in architecture space are more productive than others, that certain transformations preserve essential properties while others destroy them, or that the space itself has a structure that constrains which architectures can be reached from which starting points.

Differential topology provides concrete tools here. Morse theory shows that for a smooth function whose critical points are all nondegenerate (a Morse function), the number of critical points of each index constrains the topology of the manifold on which it is defined. Applied to architecture search, this suggests that, to the extent the performance landscape can be modeled as a smooth function, it is not random noise but has genuine structure (local minima, saddle points, and ridges) that reflects the underlying geometry of network design. Understanding this structure would let us navigate it rather than stumble through it. Transversality theory describes when two submanifolds meet in the generic, well-behaved way: their tangent spaces together span the ambient space, and their intersection is itself a submanifold whose codimension is the sum of theirs. In architectural terms, this becomes a way to decide when constraints are compatible and when they conflict. Homology and cohomology supply invariants that are unchanged by continuous deformation, pointing to which architectural properties are truly fundamental and which are accidents of our current parameterization.
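
To make the Morse-theoretic vocabulary concrete: at a nondegenerate critical point, the Morse index is the number of negative eigenvalues of the Hessian, so a local minimum has index 0, a local maximum of a function of n variables has index n, and saddles fall in between. A critical point where the Hessian has a zero eigenvalue is degenerate, and Morse theory does not apply there directly. The sketch below, in pure Python, estimates the Hessian of a toy two-variable function by central finite differences and classifies its critical points. The function `f`, the step `h`, and the tolerance are illustrative assumptions; a real performance landscape over architectures would not come with a closed-form formula.

```python
import math


def hessian_2d(f, x, y, h=1e-4):
    """Central-difference Hessian entries (fxx, fxy, fyy) of f: R^2 -> R at (x, y)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h) - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy


def gradient_2d(f, x, y, h=1e-4):
    """Central-difference gradient of f at (x, y)."""
    return ((f(x + h, y) - f(x - h, y)) / (2 * h), (f(x, y + h) - f(x, y - h)) / (2 * h))


def symmetric_eigenvalues(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mean, radius = (a + c) / 2, math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius


def morse_index(f, x, y, tol=1e-6):
    """Return the Morse index at a critical point, or None if it is degenerate."""
    gx, gy = gradient_2d(f, x, y)
    if math.hypot(gx, gy) > tol:
        raise ValueError(f"({x}, {y}) is not a critical point")
    eigenvalues = symmetric_eigenvalues(*hessian_2d(f, x, y))
    if any(abs(lam) < tol for lam in eigenvalues):
        return None  # a zero eigenvalue: not a Morse critical point
    return sum(lam < 0 for lam in eigenvalues)  # count of negative eigenvalues


def f(x, y):
    # Toy landscape with one minimum, two saddles, and one maximum.
    return x**3 - 3 * x + y**3 - 3 * y


for point in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
    print(point, morse_index(f, *point))  # indices 0, 1, 1, 2 in order

# x**3 + y**2 has a critical point at the origin, but it is degenerate.
print(morse_index(lambda x, y: x**3 + y**2, 0.0, 0.0))  # None
```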

The deeper insight is about what we are actually optimizing. Current methods assume that architecture quality is a scalar function we can evaluate at any point. But architecture quality is not a number floating in space; it is a property that emerges from the interaction between network structure, training dynamics, and task structure. These interactions have geometric content. A network's capacity to learn certain functions is constrained by its connectivity structure in ways that differential topology can formalize. The set of networks capable of learning a particular function class can be modeled as a submanifold. The set of architectures that remain stable under weight perturbation can be modeled as another. The intersection of these constraints defines the region of architecture space that is actually useful.
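
Transversality turns the question of whether such an intersection is well behaved into something checkable. If two constraint sets are written as regular level sets {g1 = 0} and {g2 = 0} in R^n, they meet transversally at a common point exactly when their gradients there are linearly independent, and in that case their intersection is, near that point, a submanifold of codimension 2. A minimal pure-Python sketch follows. The helpers `gradient` and `transversal_at`, the finite-difference step, and the tolerance are illustrative assumptions, and a sphere and two planes in R^3 stand in for architectural constraints, which the text does not specify.

```python
def gradient(g, p, h=1e-6):
    """Central-difference gradient of g: R^n -> R at point p."""
    grad = []
    for i in range(len(p)):
        forward, backward = list(p), list(p)
        forward[i] += h
        backward[i] -= h
        grad.append((g(forward) - g(backward)) / (2 * h))
    return grad


def transversal_at(g1, g2, p, tol=1e-6):
    """True if the level sets {g1 = 0} and {g2 = 0} meet transversally at p."""
    if abs(g1(p)) > tol or abs(g2(p)) > tol:
        raise ValueError("p must lie on both level sets")
    u, v = gradient(g1, p), gradient(g2, p)
    dot = sum(a * b for a, b in zip(u, v))
    # Gram determinant |u|^2 |v|^2 - (u.v)^2 vanishes exactly when u and v are parallel.
    gram = sum(a * a for a in u) * sum(b * b for b in v) - dot**2
    return gram > tol


def sphere(p):
    # Unit sphere in R^3 as the level set {sphere = 0}.
    return p[0] ** 2 + p[1] ** 2 + p[2] ** 2 - 1


def equator_plane(p):
    return p[2]


def tangent_plane(p):
    # Touches the sphere only at its north pole.
    return p[2] - 1


print(transversal_at(sphere, equator_plane, [1.0, 0.0, 0.0]))  # True: they cross along a circle
print(transversal_at(sphere, tangent_plane, [0.0, 0.0, 1.0]))  # False: gradients are parallel
```

The tangent-plane case is the non-generic one: shifting or tilting that plane slightly produces either a transversal crossing or no intersection at all, which is the sense in which transversal intersections are generic.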

What changes when you see this clearly is the entire framing of the search problem. Instead of asking "which architecture is best?" you ask "what is the structure of the space of viable architectures?" Instead of evaluating candidates, you characterize the manifold itself. Instead of hoping your search algorithm stumbles into good regions, you use topological properties to navigate deliberately.

This is not abstract mathematics disconnected from practice. It is the difference between searching blindly and searching with a map. The map already exists in differential topology. We have simply not yet learned to read it.