Persistent Homology Reveals What Neural Networks Actually Learn, Not What We Assume They Learn

Most interpretability work assumes neural networks learn features—discrete, identifiable objects that activate in response to specific inputs. This assumption is wrong, and it has cost the field years of misdirected effort. What networks actually learn are topological structures: the shape of relationships between representations. Persistent homology, a tool from computational topology, makes these shapes visible. It is the most direct window we have into what a network's hidden layers actually contain.

The standard narrative goes like this: a vision model learns edges, then textures, then objects. A language model learns tokens, then syntax, then semantics. These are useful stories for intuition. They are not what the mathematics shows. When you examine the actual geometry of activation space—the way points cluster, how they connect, what holes exist in the distribution—you find something far more interesting and far less interpretable through conventional feature analysis.

Consider what happens when you feed a trained network a batch of inputs. At each layer, each input produces a point in some high-dimensional activation space, and layer by layer the network's computation traces a trajectory through these spaces. Most interpretability methods treat an activation space as a collection of independent dimensions and ask which dimensions matter. Persistent homology asks a different question: what is the shape of this space? Where do points cluster? What topological features persist across scales?

This matters because topology captures something features cannot: the intrinsic structure of the representation, independent of coordinate system. A feature read off individual neurons is coordinate-dependent: rotate your activation space and the neuron-level features change. Topology does not. A hole in the data remains a hole no matter how you rotate it. A connected component remains connected. When persistent homology is computed from pairwise distances, as it usually is, these properties are invariant to rotations, reflections, and translations, so they do not depend on the arbitrary choice of basis in which the representation happens to be embedded.
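Because a Vietoris-Rips filtration is built only from pairwise distances, the invariance claim can be checked directly: if a change of coordinates preserves every pairwise distance, the persistence diagram cannot change. The sketch below assumes a toy 2-D Gaussian point cloud as a stand-in for activations (not real data), rotates it, and checks that a coordinate-level reading (each point's value along axis 0) changes while every pairwise distance is preserved up to floating-point error.

```python
import math
import random

def rotate(points, theta):
    """Rotate 2-D points by theta radians about the origin (an isometry)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def pairwise(points):
    """All pairwise Euclidean distances: the only input a Vietoris-Rips filtration uses."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

random.seed(0)
# Toy stand-in for a batch of 2-D activations (an assumption, not real data).
cloud = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20)]
turned = rotate(cloud, 0.7)

# A coordinate-level reading (each point's value along axis 0) changes...
axis_changed = any(
    not math.isclose(p[0], q[0], abs_tol=1e-6) for p, q in zip(cloud, turned)
)
# ...but the distance matrix, and hence the persistence diagram, does not.
dists_preserved = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise(cloud), pairwise(turned))
)
print(axis_changed, dists_preserved)
```

The same argument holds in any dimension for any orthogonal transformation. It does not cover maps that change distances, such as rescaling individual neurons.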

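When you apply persistent homology to neural network activations, you compute a filtration: you gradually increase a radius r, connect every pair of points closer than r (the Vietoris-Rips construction is the usual choice), and track which topological features are born and which die as r grows. A feature born at a small radius that survives to a much larger one is robust. A feature that dies shortly after it is born is usually treated as noise. The lifetime of each feature, its death radius minus its birth radius, gives you a principled way to separate signal from artifact, something raw activation analysis cannot do.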
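The filtration can be sketched for the simplest case, 0-dimensional homology (connected components), using only the standard library. Every point is born at radius 0. When the growing radius first links two components, one of them dies at that radius, which amounts to running Kruskal's minimum-spanning-tree algorithm over the sorted edge lengths. The function names and the two-cluster point cloud are illustrative assumptions. Loops and higher-dimensional holes need a dedicated library such as GUDHI or Ripser.

```python
import math

def h0_persistence(points):
    """0-dimensional persistence pairs (birth, death) of a Vietoris-Rips
    filtration. Every point is born at radius 0; when an edge of length r
    joins two components, one of them dies at r (Kruskal + union-find)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # The filtration adds edges in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    pairs = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components: one of them dies at r
            parent[ri] = rj
            pairs.append((0.0, r))
    if n:
        pairs.append((0.0, math.inf))  # the last surviving component never dies
    return pairs

def persistent(pairs, min_persistence):
    """Keep features whose lifetime (death - birth) exceeds a threshold."""
    return [(b, d) for b, d in pairs if d - b > min_persistence]

# Two tight clusters far apart (toy data, not real activations).
cloud = [(0, 0), (0.1, 0), (0, 0.1), (5, 0), (5.1, 0), (5, 0.1)]
pairs = h0_persistence(cloud)
robust = persistent(pairs, min_persistence=1.0)
print(robust)
```

On this input, four merges happen at a radius of about 0.1 and one at about 4.9, so only the gap between the clusters and the component that never dies survive a threshold of 1.0.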

The practical insight is this: networks that solve the same task often have radically different feature spaces but similar persistent homology. Two networks trained on MNIST with different architectures will have different individual neurons, different weight distributions, different activation patterns. But their persistent homology—the topological skeleton of their learned representation—is often remarkably similar. This suggests that topology, not features, is what generalizes.
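Saying that two networks have "similar" persistent homology requires a distance between persistence diagrams. A standard choice is the bottleneck distance: the smallest value such that every point of one diagram can be matched either to a point of the other diagram or to the diagonal, with every matched pair within that value in the L-infinity norm. The brute-force sketch below is exact but exponential, so it suits only tiny diagrams. It assumes finite points (classes that never die should be compared separately), and the two example diagrams are made-up values, not measurements from any network.

```python
import itertools
import math

def bottleneck(diag_a, diag_b):
    """Exact bottleneck distance between two small, finite persistence
    diagrams, by brute force over all matchings. Each point is matched to a
    point of the other diagram (L-infinity cost) or to the diagonal
    (cost (death - birth) / 2). Exponential time: toy sizes only."""
    n, m = len(diag_a), len(diag_b)

    def to_diagonal(p):
        return (p[1] - p[0]) / 2

    # Rows: points of A, then m diagonal slots.
    # Columns: points of B, then n diagonal slots.
    def cost(r, c):
        if r < n and c < m:
            a, b = diag_a[r], diag_b[c]
            return max(abs(a[0] - b[0]), abs(a[1] - b[1]))
        if r < n:  # a point of A sent to the diagonal
            return to_diagonal(diag_a[r])
        if c < m:  # a point of B sent to the diagonal
            return to_diagonal(diag_b[c])
        return 0.0  # diagonal matched to diagonal

    best = math.inf
    for perm in itertools.permutations(range(n + m)):
        worst = max((cost(r, c) for r, c in enumerate(perm)), default=0.0)
        best = min(best, worst)
    return best

# Made-up H0 diagrams standing in for two networks' representations.
diag_net_a = [(0.0, 0.2), (0.0, 0.3), (0.0, 4.0)]
diag_net_b = [(0.0, 0.25), (0.0, 3.8)]
print(bottleneck(diag_net_a, diag_net_b))
```

The best matching pairs the two long-lived features (cost 0.2) and two short-lived ones (cost 0.05) and sends the leftover point (0.0, 0.2) to the diagonal (cost 0.1), so the distance is about 0.2. Practical implementations use polynomial-time matching rather than enumerating permutations.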

This has immediate implications for understanding what breaks in neural networks. Adversarial examples do not necessarily corrupt features; they corrupt topology. They create holes in the representation space where none should exist, or collapse structures that should be distinct. A network robust to adversarial perturbation maintains its topological structure under small input changes. One that fails does not.
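One way to make the robustness claim testable is to compare the persistence diagram of a batch of activations with the diagram of the same batch after a small input perturbation. The sketch below does this for H0 only, with a hypothetical `toy_layer` in place of a real network and uniform noise in place of an adversarial attack. It reports the largest shift between sorted death radii, which is an upper bound on the H0 bottleneck distance. A useful reference point is stability: if every point moves by at most d, every pairwise distance changes by at most 2d, and so does every H0 death radius. Here each input coordinate moves by at most `eps`, so the input-space shift cannot exceed 2 * eps * sqrt(2).

```python
import math
import random

def h0_deaths(points):
    """Sorted finite H0 death radii of a Vietoris-Rips filtration
    (Kruskal's algorithm with union-find)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(p, q), i, j)
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i < j
    )
    deaths = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(r)
    return deaths

def topo_shift(clean, perturbed):
    """Largest change between sorted H0 death radii: an upper bound on the
    H0 bottleneck distance between the two diagrams."""
    return max(abs(a - b) for a, b in zip(h0_deaths(clean), h0_deaths(perturbed)))

def toy_layer(x):
    """Hypothetical stand-in for a hidden layer: a fixed nonlinear map R^2 -> R^2."""
    return (math.tanh(3 * x[0] - x[1]), math.tanh(x[0] + 2 * x[1]))

random.seed(1)
eps = 0.01
inputs = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(40)]
noisy = [(x + random.uniform(-eps, eps), y + random.uniform(-eps, eps)) for x, y in inputs]

acts_clean = [toy_layer(x) for x in inputs]
acts_noisy = [toy_layer(x) for x in noisy]
input_shift = topo_shift(inputs, noisy)
act_shift = topo_shift(acts_clean, acts_noisy)
print("input-space shift:", input_shift)
print("activation-space shift:", act_shift)
```

With a real model you would replace `toy_layer` with the layer's forward pass and `noisy` with adversarial inputs. An activation-space shift much larger than the input-space shift is a hint that the layer amplifies the perturbation in a way that reshapes the representation.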

The harder implication is for mechanistic interpretability. If topology is primary and features are secondary, then explaining a network's behavior requires describing its representational geometry, not cataloging which neurons fire. This is harder. It requires thinking in terms of manifolds, fiber bundles, and homology groups rather than activation patterns. But it is also more honest. It acknowledges that neural computation is fundamentally geometric, not symbolic.

Persistent homology is not a complete solution. It tells you about structure but not about meaning. You can know the topology of a representation without understanding what it represents. But it is a necessary step. Before you can interpret what a network learned, you must first see what it actually learned—not the features you expected to find, but the topological reality of its hidden space.

The field has spent years building better feature visualization tools. We need to spend the next years building better topology visualization tools. That is where the real structure lives.

Persistent Homology: Reading Structure from Neural Representations