Fun with Automated Proof Generation

A proof is not some abstruse paragraph of part-prose, part-formula that is the bane of many a young non-math major. (Well, maybe it is that last thing.) Really, a proof is just a directed hypergraph, that is, a graph whose edges can connect more than two vertices. Every vertex is a proposition, and every edge is a rule of inference that connects a set of antecedent vertices to a set of consequent vertices. A vertex is “proved” if it is in the target set of at least one sound edge, and an edge is “sound” if every one of its source vertices is proved. Add the hypothesis edge, which has no sources and whose targets are the hypotheses and axioms, and add the conclusion edge, whose sources are all the desired conclusions. If a hypergraph can be constructed between the two such that the conclusion edge is sound, then this is a proof! Almost. We surely cannot permit cyclic dependencies. But what does a cycle look like in a hypergraph? The whole concept seems clunky and difficult to manage.

Happily, we can pass to an ordinary directed bipartite graph instead. In partition A are all the vertices of the hypergraph. In partition B is a vertex for each edge of the hypergraph. Draw a directed edge from a \in A to b \in B whenever a is in the source set of b, and draw a directed edge from b \in B to a \in A whenever a is in the target set of b. Now, an a is proved if it has any in-edge from a sound b, and b is sound if all its in-edges come from proved a's. Some people call this an and-or graph, with “and” referring to the “all edges” requirement and “or” referring to the “any edge” requirement. Now that we are using an ordinary graph, “cycle” has its ordinary definition.
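For concreteness, here is a minimal Python sketch of that and-or graph; the class and method names (`AndOrGraph`, `add_rule`, and so on) are my own invention for illustration, not taken from any particular library. Partition B is represented explicitly as rules; partition A is just the set of proposition labels the rules mention.

```python
from collections import defaultdict

class AndOrGraph:
    """A bipartite and-or proof graph: propositions (partition A) and rules (partition B)."""

    def __init__(self):
        self.rule_sources = {}                 # rule id -> set of antecedent propositions
        self.rule_targets = {}                 # rule id -> set of consequent propositions
        self.rules_concluding = defaultdict(set)  # proposition -> rules with it as a target
        self.proved = set()                    # propositions currently proved

    def add_rule(self, rule, sources, targets):
        self.rule_sources[rule] = set(sources)
        self.rule_targets[rule] = set(targets)
        for t in targets:
            self.rules_concluding[t].add(rule)
        self._propagate()

    def sound(self, rule):
        # "and" requirement: every antecedent of the rule must be proved
        return self.rule_sources[rule] <= self.proved

    def _propagate(self):
        # "or" requirement: a proposition is proved if any rule concluding it is sound
        changed = True
        while changed:
            changed = False
            for rule, targets in self.rule_targets.items():
                if self.sound(rule):
                    new = targets - self.proved
                    if new:
                        self.proved |= new
                        changed = True
```

The hypothesis edge is simply a rule with no sources: after `g.add_rule("hyp", [], ["P", "P->Q"])` and `g.add_rule("mp", ["P", "P->Q"], ["Q"])`, the proposition `"Q"` ends up in `g.proved`.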

Fantastic! A proof is a directed, acyclic, bipartite graph connecting hypotheses to conclusions with the and-or requirement. Naturally, we should make computers write proofs for us. Here is my approach, with inspiration taken from sampling-based motion planning techniques:

  1. The proof graph initially has two connected components: one containing the hypotheses and axioms, the other containing the conclusions. Proposition vertices belong to two (non-disjoint) sets: those which we want (maybe need) to prove, and those which we think (maybe know) we can prove.
  2. Take a proposition vertex at random from the “think” set, preferably with no out-edges yet, and randomly choose a rule to apply (like modus ponens or any known theorem) which depends on that vertex as an antecedent.
  3. Place the rule vertex, along with its other antecedent and consequent propositions, checking if they already exist in the graph, and update the status of the and-or requirement as appropriate. Abandon the rule if it would create a cycle.
  4. Now switch to the “want” set; pick a proposition, preferably with no in-edges this time; pick a rule that could help prove it.
  5. Step 3 again.
  6. Repeat 2–5 until the conclusion is proved.
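The steps above can be sketched as a single loop. This is a deliberately loose sketch of my own: rules are encoded as `(antecedents, consequents)` pairs, the cycle check from step 3 is omitted, and `prove` is a name I am making up here.

```python
import random

def prove(hypotheses, conclusions, rules, max_steps=1000):
    """rules: iterable of (antecedents, consequents) pairs of proposition sets."""
    proved = set(hypotheses)
    think = set(hypotheses)          # step 1: propositions we think we can prove
    want = set(conclusions)          # step 1: propositions we want to prove
    applied = []                     # rule applications forming the proof graph
    for _ in range(max_steps):
        # step 2: forward -- sample a proposition we think we can prove
        p = random.choice(sorted(think))
        for ants, cons in rules:
            # step 3: apply any rule using p whose antecedents are all proved
            if p in ants and set(ants) <= proved:
                applied.append((ants, cons))
                proved |= set(cons)
                think |= set(cons)
        # steps 4-5: backward -- sample a wanted proposition and collect
        # the unproved antecedents of rules that could conclude it
        q = random.choice(sorted(want))
        for ants, cons in rules:
            if q in cons:
                want |= set(ants) - proved
        # step 6: stop once every conclusion is proved
        if set(conclusions) <= proved:
            return applied
    return None
```

A real implementation would also merge duplicate propositions into existing vertices and reject any rule application that closes a cycle, as step 3 requires.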

I was able to get this approach working on some extremely small examples. Here is my code repository. Unfortunately, it gets out of hand very quickly. Take modus ponens, for example. If you have P and P \to Q, then you can apply the rule and get Q. But suppose you want Q and try to apply modus ponens backwards (steps 4–5). What is P? You have to come up with it. Do you randomly generate a P from the language and add it to the “want” set? What are the chances that it is even relevant? Do you check everything in the “think” set to see whether it contains both P and P \to Q for some P? Is there an efficient way to store propositions so that, knowing Q, you can search for such a pair?
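One plausible answer to that last question is to index implications by their consequent, so backward modus ponens on a goal Q becomes a dictionary lookup rather than a scan of the whole “think” set. The toy string encoding of implications as `"P->Q"` and the `ThinkIndex` name are assumptions for illustration only:

```python
from collections import defaultdict

class ThinkIndex:
    """Index the "think" set so backward modus ponens is a cheap lookup."""

    def __init__(self):
        self.props = set()
        self.by_consequent = defaultdict(set)  # Q -> {P : "P->Q" is in the set}

    def add(self, prop):
        self.props.add(prop)
        # recognize implications written in the toy form "P->Q"
        if "->" in prop:
            p, q = prop.split("->", 1)
            self.by_consequent[q].add(p)

    def backward_mp(self, goal):
        # all P such that both P and P->goal are in the think set
        return {p for p in self.by_consequent[goal] if p in self.props}
```

With `"P"`, `"P->Q"`, and `"R->Q"` added, `backward_mp("Q")` returns only `{"P"}`, since `"R"` itself is not in the set. A real proof language would need unification rather than string matching, but the indexing idea carries over.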

It gets worse. This process can cause the proof graph to grow without bound in every direction but the desired proof. Mathematicians sometimes have an intuition for when propositions in the “want” or “think” sets are probably unattainable or simply irrelevant to the proof at hand. That intuition prunes the search, limiting the options so the solution is found faster. How do you encode it into a computer? I like to dream we will get there one day.