Indianapolis	0	0
New York	500	100
Atlanta	300.7	-500.3

Indianapolis	New York	1000.3
Indianapolis	Atlanta		1500.7

New York
Atlanta


^^^
File looks something like this.

Print out:

Path from New York to Atlanta:

New York -> Indianapolis 1000.3
Indianapolis -> Atlanta 1500.7

Total:  2501


The A* algorithm, if programmed correctly, will indeed give you the best possible path (not just an approximately best) provided the heuristic is never an overestimate.  Since your heuristic is going the straight Euclidean distance between the two points (sqrt (x^2 + y^2)), it can't be an overestimate UNLESS I use "magic teleportation distances" in my distances table, and I promise not to do that.

A* algorithm:

Start with the start city.
Add all of its neighbors to the "frontier".  But "frontier" actually means "priority queue".  The priority you give each of the neighbors is equal the known distance from the starting point to that neighbor (not just the distance from the neighbor to you, but the distance from the starting point through the intervening nodes to you) added to the "heuristic" which is the euclidean distance from the neighbor to the destination.  So, distance from start added to estimated distance to destination.  If the neighbor is already in the priority queue with a lower priority than what you are trying to add, then don't add the neighbor.  If the neighbor is already in the priority queue with a higher priority, then replace with the newly computed lower priority.

Extract the lowest priority node from the priority queue.  If it's the destination, done.  Otherwise, do the same thing, compute its neighbors, add them to the priority queue.

Is there an easy method for testing whether an element is already in the priority queue?  Perhaps a more detailed listing of the PriorityQueue interface would reveal the answer.  Otherwise you'll have to come up with a way to do this.

One possibility:  don't worry about it.  If you add something to priority queue with a lower priority, it will show up earlier on the retrieval.  If you ever extract something from the priority queue that has already been extracted (you'll have to keep track), throw it out and extract another one.


Running the A* algorithm is probably not the most heinous part of the problem.  What is the most heinous part of the program?  Input processing.  You will also have to keep track of the path.  Easiest way to do this is to have each city know the city that comes immediately before it in the path.

Now, if it were I, and I had all the time in the world, this Priority Queue problem could easily be handled with two red-black trees.  One is keyed on Priority+City, and the other keyed on City, using Priority as data field.  If I want to add a city to the frontier, just look it up in the second tree to see if it's there and what its priority is, if it's lower than mine, fine.  Otherwise I delete that value from both trees, easy because they're red-black trees.  And then add the city and priority to both trees.  When I want to extract the lowest priority city, that's just the first element of the first tree, and that's easy.  Then just delete from both trees.

However, I am not asking you to use the best possible data structures or the most efficient data structures.  All I'm asking is that you use something good enough that you can run the A* algorithm on it.


MORE COMPLEX SEARCHES.  I think of this as "True AI".  I mean, that these don't always give the correct answer.

One such algorithm is called "hill climbing".  The visual imagery is that you are on a plane with hills and valleys.  At any given step, you take the path that leads you up in the steepest way.  The goal is to get to the highest possible point in the plane.  I stop the "hill climbing" algorithm and announce success when I reach a point that has no step that takes me higher.

What is the subtle problem with this algorithm?  It tends to find a local maximum not necessarily the absolute maximum.  If that's OK for your needs, great!  If not, you can run the algorithm a few times with random starting points and take the best answer from these starting points.  You will might not find the real maximum, but you might have a better guess.

Another algorithm that improves upon hill climbing is called "simulated annealing."

The idea is is that it is a lot like hill climbing, but sometimes you do go down, and the trick is this:  Your next step isn't the best step or the steepest step, it's a random step, and sometimes that step takes you down the hill.

Simulated annealing works like this:

current = initialstate
for t = 1 to infinity
	T->schedule (t)  schedule is a slowly decreasing function that will eventually make it to zero.
	if T=0 return current
	next = a randomly selected neighbor of current
	diff = value (next) - value (current)
	if diff > 0 current = next  //random next move is better than my move go with it
	else use a random number generator to pick a number between 0 and 1
	     if that number < e^(diff/T)  then current = next

(in other words, we change to a worse state than we are already at with a probability of e^(diff/T).  Notice that is slowly decreasing.  Since diff is negative, as T gets smaller, diff/T approaches negative infinity, and so e^(diff/T) approaches zero.

As the algorithm proceeds, it is less likely to choose a path that goes downhill.

If the algorithm runs long enough, the odds are really good (almost 100%) that the true maximum will actually be found.