P vs. NP

These are "decision problems" YES is indicated by the Turing Machine halting.  NO is indicated by the Turing Machine infinite looping.

A decision problem is in P if all YES answers are determined on or before p(n) steps, where p is a polynomial and n is the length of the input.

A decision problem is in NP if all YES answers are determined on or before p(n) steps, where p is a polynomial and n is the length of the input, on a nondeterministic Turing Machine.

On a NTM, there may be multiple transitions from any state, character combination.  If there is a path that leads to a halt state, then the NTM machine is said to accept.  (Even if there are a billion other paths that lead to an infinite loop).  We say that the "time" taken by the NTM is the SMALLEST number of steps that leads to a halt state on that input.

Classic problem in NP is Hamiltonian Cycle.  Given a graph G = (V,E).  (A list of vertices and a list of edges).  Is there a cycle of edges through the graph that visits every vertex exactly once before returning to its starting point?  (A-->B-->C-->D-->A).

Suppose I have a graph:  (A,B) (A,C) (A,D).  There is no Hamiltonian Cycle.

This graph can be pictured as vertex a in the middle of a "triangle" of vertices B, C, and D.  B, C, D are all connected to A but not directly connected to each other.

The only real way we know how to do this is to try every possible combination:  every possible order of edges and see if it forms a Hamiltonian Cycle.

A-->B-->C-->D-->A
A-->B-->D-->C-->A
A-->C-->B-->D-->A
A-->C-->D-->B-->A
A-->D-->B-->C-->A
A-->D-->C-->B-->A

Try all 6.  If any of them actually work, hooray, it's a Hamiltonian Cycle.  If none of them work, it's not.

On a regular Turing Machine, this is what you have to do, try all (n-1)! permutations of vertices and see if there are edges connecting them.  In the above example, none of them will work.

But on a NTM, since each <state,character> combination can have multiple transitions, you can have one execution order trying A-->B-->C-->D-->A  another execution order trying A-->B-->D-->C-->A, etc.  If an HC is found, it halts immediately, and so there is a path that leads to halting, so that we say the NTM accepts this graph, no matter how many other execution orders don't halt.

However, all each individual execution path is doing is trying its own combination and simply verifying whether it works or not, which is pretty fast.

The way some people look at NP problems is that they call them "polynomial time verification problems".  Can you verify whether a possible answer is an answer in polynomial time....because that's usually how these NTM programs work.  With different threads trying different possibilities, and halting if their possibility works.

Another example is PARTITION.

Given a set of positive integers (usually described as a "mapping" because this set is allowed to have repeats, something sets generally are not allowed to have), can you split the set into two sets, the sum of whose elements equal each other.

(1,3,4,6)  gives an answer of "YES" because 1+6 = 3+4.
(2,2,2,2,2) gives an answer of "NO" because each half would have to add up to five, and you can't do that if all you have are 2's.

This set is in NP because it's really to test whether one specific split works:  do the first and second sets sum to the same answer?

A whole lot of cool problems are in NP...but who cares?  a NTM is not a real thing.  We can't possibly one or anything approximating one.  What we really want is for these problems to be in P, the class of problems that can be solved in polynomial time on a regular TM.

P is a subset of NP, obviously.  If a problem can be solved in polynomial time, it can certainly be solved non-deterministically in polynomial time.

Is NP a subset of P?  If it is, then P = NP.  However, this is an unsolved problem in theoretical Computer Science.

ax^2 + bx + c = 0:  x = (-b (+/-) sqrt (b^2-4ac))/(2a)

If you invent a symbol for the solution to x^5 + x - a = 0
(We have the square root symbol for x^2-a=0), then you can have a quintic formula.

This is where NP-completeness comes in.  Don't confuse NP-complete with NP-hard.  They are similar concepts but not identical.  NP-complete only applies to decision problems (YES/NO), whereas NP-hard applies to any problem.

A decision problem is NP-Complete if
a) the problem is in NP.
b) EVERY problem in NP "reduces" to that problem.

A reduces to B if I can find a polynomial time algorithm that runs on a REGULAR TM that always halts and converts valid input for problem A into valid input for problem B so that the generated input gives a YES answer for B when and only when the original input gives a YES answer for A.

Here's how NP-Completeness works.  I have an problem X that I know to be NP-Complete.  I'm super smart and I actually prove that X is in P, meaning that I can solve X in polynomial time on a REGULAR TM.

I can now prove that ANY problem in NP is now in P.

Let Y be ANY problem in NP.  Because X is NP-Complete, Y must reduce to X ... because EVERYTHING in NP reduces to X.  This means that I can translate valid input from Y to X in polynomial time.

So, here's I would solve Y on a regular Turing Machine.

Convert the input to valid input for X (polynomial time)
Run the X program on that input (polynomial time, because I just solved this one).
The X program will halt (if it does) in polynomial time because the reduction guarantees that the answer will be the same as what Y would have given on the original input.

So, I have just solved Y in polynomial time.

The solution of ANY NP-Complete problem in polynomial time means that any problem at all in NP can be solved in polynomial time, so P=NP.

NP-hard is something a little different.

A problem is NP-hard if its solution in polynomial time results in all problems in NP also being solvable in polynomial time.

Suppose I can give you the longest cycle in a graph without repeated vertices.  Program P takes (V,E) and gives you the longest cycle without repeated edges.  This isn't a decision problem (YES/NO).  It gives you actual information.

However, I can use this to solve HAMILTONIAN CYCLE.

My Hamiltonian Cycle program call P on (V,E).  It will receive an answer.  If that answer contains every single vertex in V, then I guess I just have a Hamiltonian Cycle.  This solves HC, which proves NP=P.


In 1971, Steven Cook proved that SATISFIABILITY was NP-Complete.  This is called Cook's Theorem.  Steven Cook reduced EVERY SINGLE problem in NP to SATISFIABILITY.

This SATISFIABILITY (also called SAT):

(a v b v c v d) ^ (e v f v g v h v i) ^ ...

You have a collection of disjunctive clauses, meaning that you have a bunch of booleans variables all ored together (and within this clause, you might have "p" or you might have "not p").  Then all of these disjunctive clauses are anded together.

Is there a possible way to assign true and false to these variables in the expression so that the final answer is true.

For the final answer to be true, all of the conjuncts must be true.  Which means there must be at least one true thing in EACH of the disjunctive clauses.

But what Cook did,and in order to prove, that EVERY problem problem NP reduced to it, what he did is he took the NTM for any arbitary problem in NP, and reduced it to a SAT problem.

Once you have one NP-complete problem then you no longer have to prove that EVERY problem in NP reduces to the new NP-complete problem you want to prove.

If B is a known NP-Complete.  And Problem A is in NP, if you can reduce B to A, then A is NP-Complete.  Because every problem in NP reduces to B and then B reduces to A, so every problem in NP reduces to A through B.

SAT was then reduced to 3SAT.  3SAT is just like SAT except all the disjunctive clauses have exactly three variables.

We have thousands of these NP-Complete problems covering essentially every area of theoretical computer science.  The solution of even one of them, means that P=NP.  The fact that no one has solved any of them suggests that P might not equal NP.

Hamiltonian Cycle takes a graph G= (V,E) and is YES if there's a cycle that visits all nodes in the graph without repeats, returning to its staring point.  This is known to be NP-Complete.

Traveling Salesman is given a list of cities, a list of distances between the cities, and then a total distance.  It returns YES if the the Traveling Salesman can visit all cities exactly once, returning to its starting point and can do so without exceeding the total distance.

The idea is we reduce the one we know is NP-Complete to the one we want to be NP-Complete.

So, we are going to transform input from HC to TSP.

The vertices in our graph will be called cities in TSP.
If there is an edge between nodes in our graph, we will say that the distance between those cities in TSP is 1.
If there is no edge between nodes in our graph, we will say that the distance between those cities in TSP is v+1, where v is the number of vertices.

The total distance in Traveling Salesman will be v.

Now, if HC returns a YES, that means that that there is a cycle, and the edges that form that cycle.  Over in TSP, those same cities are 1 apart from each other because the route corresponds to edges in the HC graph.  So I can visit all v cities in a total distance of v.  So TSP must return a YES.

If TSP returns a YES, that means I can visit all cities within a total distance of v.  That means that I never traveled any of those roads with a distance of v+1.  So I must have only traveled roads that correspond to edges in HC.  So there must a Hamiltonian Cycle and HC returns YES.