Categorical Semantics of Neural Computation
The deepest mistake in formalizing neural systems is treating computation as a sequence of states rather than as a structure of transformations.
We have spent decades building neural networks as if they were state machines—tensors flowing through layers, activations updating, gradients descending. This framing works well enough for engineering. It gets models trained. But it obscures something fundamental: what a neural computation is. The moment you ask whether two different architectures compute the same function, or whether a learned representation is truly compositional, the state-based view collapses. You need to talk about the relationships between computations, not just the computations themselves. You need category theory.
Categorical semantics reframes neural computation as a web of structure-preserving maps between mathematical objects. Instead of asking "what values flow through this network," you ask "what morphisms compose here, and what do they preserve?" A layer is not merely a function from input space to output space. It is a morphism in a category whose objects carry structure, and one family of such layers is related to another by a functor, a structure-respecting map between categories. The composition of layers is not merely sequential application. It is categorical composition: associative, equipped with identities, and defined only when the target of one morphism matches the source of the next. These are the algebraic properties that make reasoning possible.
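A minimal sketch of what typed composition buys you, assuming layers are modeled as arrows between real vector spaces of fixed dimension. The names Arrow, identity, linear, and relu are hypothetical helpers written for this illustration, not part of any library.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass(frozen=True)
class Arrow:
    """A typed map from R^src to R^dst (illustrative helper, not a library type)."""
    src: int
    dst: int
    fn: Callable[[Vector], Vector]

    def __call__(self, x: Vector) -> Vector:
        assert len(x) == self.src, "input dimension does not match the source object"
        y = self.fn(x)
        assert len(y) == self.dst, "output dimension does not match the target object"
        return y

    def then(self, other: "Arrow") -> "Arrow":
        """Compose self followed by other; defined only when the types line up."""
        if self.dst != other.src:
            raise TypeError(
                f"cannot compose R^{self.src}->R^{self.dst} with R^{other.src}->R^{other.dst}"
            )
        return Arrow(self.src, other.dst, lambda x: other(self(x)))


def identity(n: int) -> Arrow:
    """Identity arrow on R^n."""
    return Arrow(n, n, lambda x: list(x))


def linear(weights: List[List[float]]) -> Arrow:
    """Matrix-vector product; weights has shape (dst, src)."""
    dst, src = len(weights), len(weights[0])
    return Arrow(src, dst, lambda x: [sum(w * xi for w, xi in zip(row, x)) for row in weights])


def relu(n: int) -> Arrow:
    """Elementwise ReLU on R^n."""
    return Arrow(n, n, lambda x: [max(0.0, xi) for xi in x])


f = linear([[1.0, -2.0], [0.5, 0.0], [0.0, 3.0]])  # R^2 -> R^3
g = relu(3)                                         # R^3 -> R^3
h = linear([[1.0, 1.0, 1.0]])                       # R^3 -> R^1

x = [1.0, 2.0]
left = f.then(g).then(h)    # (f ; g) ; h
right = f.then(g.then(h))   # f ; (g ; h)

# Associativity: both bracketings give the same arrow on this input.
assert left(x) == right(x) == [6.5]
# Identity law: composing with the identity changes nothing.
assert identity(2).then(f)(x) == f(x)
```

Composing h with f in the wrong order raises a TypeError rather than failing silently, which is the practical content of saying that composition is only defined when source and target agree.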
This matters because neural networks are not arbitrary functions. They have internal structure. A convolutional layer respects spatial locality. An attention mechanism respects relational structure. A residual connection respects additive decomposition. These are not implementation details. They are semantic properties—properties about what the computation preserves and what it ignores. The categorical view makes these properties explicit and manipulable.
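The claim that a convolutional layer "respects" spatial structure can be checked directly as a commuting condition: shifting the input and then convolving should equal convolving and then shifting. The sketch below assumes a one-dimensional signal with circular (wrap-around) boundaries, a simplification chosen so the property holds exactly; the function names are illustrative.

```python
from typing import List


def circular_conv(signal: List[float], kernel: List[float]) -> List[float]:
    """1-D circular cross-correlation: out[i] = sum_k kernel[k] * signal[(i + k) % n]."""
    n = len(signal)
    return [
        sum(kernel[k] * signal[(i + k) % n] for k in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal: List[float], s: int) -> List[float]:
    """Cyclically shift the signal right by s positions."""
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]


def position_scaled(signal: List[float]) -> List[float]:
    """A map that depends on absolute position; included as a counterexample."""
    return [i * v for i, v in enumerate(signal)]


signal = [1.0, 4.0, 2.0, 0.0, 3.0]
kernel = [1.0, -1.0, 2.0]

# Convolution commutes with translation: the square closes.
conv_then_shift = shift(circular_conv(signal, kernel), 2)
shift_then_conv = circular_conv(shift(signal, 2), kernel)
assert conv_then_shift == shift_then_conv

# A position-dependent map does not commute with translation.
assert shift(position_scaled(signal), 2) != position_scaled(shift(signal, 2))
```

The second assertion is the useful half: the property is a real constraint that some maps fail, not something every layer satisfies trivially.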
Consider what happens when you try to prove that a neural network learns a compositional representation. In the standard framework, you are stuck. You can measure how well a linear probe separates classes in the learned space. You can compute mutual information. But neither measurement states formally what compositionality means. Does it mean the representation factors as a product? Does it mean there is a structure-preserving map from some ground-truth symbolic structure into the representation? The question is hard even to pose precisely without categorical language.
Categorical semantics provides the answer: a representation is compositional if there exists a functor from the category of symbolic structures to the category of learned representations, that is, a map that sends each way of combining symbols to a corresponding way of combining representations while preserving composition and identities. This is not metaphorical. It is a precise mathematical statement. It can be tested. It can be falsified. It can be used to design architectures that respect compositional structure by construction.
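To show what "it can be tested, it can be falsified" might look like in the smallest possible case, the sketch below makes two loud assumptions: symbolic structures are token sequences combined by concatenation, and representations are integer vectors combined by addition. Each side is then a one-object category, and a functor between them is a map that sends the empty sequence to the zero vector and concatenation to addition. The vocabulary and both encoders are hypothetical toys, not learned models.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

VOCAB = ("red", "blue", "square", "circle")  # hypothetical toy vocabulary
Vec = Tuple[int, ...]
ZERO: Vec = (0,) * len(VOCAB)


def add(u: Vec, v: Vec) -> Vec:
    """Composition on the representation side: vector addition."""
    return tuple(a + b for a, b in zip(u, v))


def bag_encoder(expr: Sequence[str]) -> Vec:
    """Counts each vocabulary token; additive by construction."""
    return tuple(sum(1 for t in expr if t == w) for w in VOCAB)


def squared_encoder(expr: Sequence[str]) -> Vec:
    """Squares the counts; deliberately not additive."""
    return tuple(c * c for c in bag_encoder(expr))


def is_functorial(encode: Callable[[Sequence[str]], Vec], max_len: int = 2) -> bool:
    """Check identity and composition preservation on all expressions up to max_len."""
    exprs = [e for n in range(max_len + 1) for e in product(VOCAB, repeat=n)]
    if encode(()) != ZERO:  # identity must map to identity
        return False
    # encode(a ++ b) must equal encode(a) + encode(b) for every pair
    return all(encode(a + b) == add(encode(a), encode(b)) for a in exprs for b in exprs)


assert is_functorial(bag_encoder)
assert not is_functorial(squared_encoder)
```

The bag encoder passes even though it forgets word order, which illustrates that preserving structure is a weaker requirement than preserving all information. A check on a real learned encoder would need a tolerance and a sampled rather than exhaustive set of expressions.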
The same applies to generalization. Why does a network trained on one distribution generalize to another? The standard answer, "it learned features," is vacuous. The categorical answer is sharper: the network learned a representation that is natural with respect to the relevant transformations. Naturality is a categorical concept. It means that certain diagrams commute: transforming an input and then computing its representation gives the same result as computing the representation and then applying the corresponding transformation to it. When those diagrams commute, structure is preserved across domains. On this view, that is generalization.
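For reference, the commuting condition behind naturality can be written out in one line. The definition itself is standard; reading F as the action of a transformation on inputs, G as its action on representations, and the components of the natural transformation as the encoder is an interpretation assumed here for illustration.

```latex
% F, G : \mathcal{C} \to \mathcal{D} are functors;
% \eta : F \Rightarrow G is a natural transformation with components \eta_X : F(X) \to G(X).
\[
  \eta_Y \circ F(f) \;=\; G(f) \circ \eta_X
  \qquad \text{for every morphism } f : X \to Y \text{ in } \mathcal{C}.
\]
```

Under that reading, the left side transforms the input and then encodes it, and the right side encodes first and then transforms the representation.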
There is also a practical consequence. Once you have a categorical semantics for a class of neural computations, you can use categorical tools to reason about them. You can construct limits and colimits, which are formal ways of combining computations while preserving structure. You can define adjoint functors, pairs of functors in which each is the best possible approximation to an inverse of the other. You can apply homological algebra to study the "holes" in a learned representation. These are not abstract exercises. They are concrete ways to understand and improve neural systems.
The resistance to this framework is understandable. Category theory has a reputation for abstraction divorced from reality. But the opposite is true here. Categorical semantics is more concrete than the state-based view because it forces you to be explicit about what structure matters. It prevents you from hiding assumptions in notation. It makes the difference between a coincidence and a principle visible.
The future of neural network theory will not be found in deeper empirical scaling or more sophisticated optimization. It will be found in formalizing what neural computation preserves. Category theory is the language for that formalization. The question is not whether it will be adopted, but how long we will continue reasoning about neural systems without it.