PSPACE and Decision Problems in AI Verification

The verification problem for neural networks is fundamentally harder than most practitioners realize, and this gap between perceived and actual complexity is where real progress stalls.

When we ask whether a trained model will behave correctly under adversarial perturbation, or whether a reinforcement learning agent will satisfy a safety constraint across all possible states, we are asking decision questions that live in PSPACE—the class of problems solvable by a Turing machine using polynomial space, regardless of time. This is not a theoretical curiosity. It is a structural fact about what we are trying to compute, and it reshapes what verification can actually deliver.

The standard framing treats verification as a search problem: find a counterexample, or prove none exists. This intuition works well for small, bounded domains. But as the state space grows—and in continuous control or high-dimensional input spaces it grows exponentially—the problem shifts. We are no longer searching a space we can enumerate. We are asking whether a property holds across an infinite or astronomically large set of configurations. That question is PSPACE-complete for many natural formulations, particularly when the property itself is expressed in a language like linear temporal logic or when the system involves nested quantification over states and actions.

What makes this distinction matter is what it implies about algorithms. A problem in PSPACE can be solved with polynomial memory but potentially exponential time. A problem in NP—the class most AI researchers intuitively reach for—requires only polynomial time to verify a solution, but may require exponential time to find one. PSPACE is strictly larger. There exist problems in PSPACE that are not believed to be in NP. For verification, this means that some safety properties cannot be checked by simply guessing a certificate and verifying it quickly. The structure of the problem itself demands a different kind of reasoning.

Consider a concrete case: verifying that a neural network controller maintains stability across a continuous state space under bounded disturbances. The property is naturally expressed with alternating quantifiers—for all states, there exists a control action, for all disturbances, the system remains safe. This alternation is precisely what pushes the problem into PSPACE. No polynomial-time algorithm is known for such problems, and there is strong evidence none exists.

The practical consequence is that practitioners often implicitly retreat to tractable subproblems. We verify over finite abstractions. We use incomplete methods that find bugs but cannot prove absence of bugs. We bound the depth of reasoning or the size of the state space we consider. These are not failures of engineering—they are rational responses to computational hardness. But they are also not solutions. They are deferrals.

What changes when you see the PSPACE structure clearly is the framing of what verification can be. It is not a problem waiting for a faster algorithm or a better solver. It is a problem whose hardness is intrinsic to its logical form. This does not mean verification is impossible. It means verification must be selective, approximate, or restricted in scope. It means the question "Is this system safe?" must be replaced with more specific questions: "Is this system safe under these assumptions? In this region of the state space? With respect to this class of properties?"

The second-order insight is that this hardness is not unique to neural networks. It is endemic to the verification of any system with sufficient expressiveness and state complexity. Classical control systems, hybrid automata, and even some discrete reactive systems exhibit the same structure. The problem is not that AI is uniquely hard to verify. It is that verification of complex systems is fundamentally hard, and pretending otherwise through incomplete methods or bounded abstractions only delays the reckoning.

The field needs to stop asking whether we can verify everything and start asking what we can verify, under what constraints, and what that means for deployment. That shift in question is where real progress begins.