Programming Fundamentals: Dynamic Programming for TSP

What is the meaning of a shortest path? Ahsan ponders...

A shortest path that starts from A and ends at A, through B and C and D and E and F and G.

But sequence between A and A doesn't have to be B to C to D to E to F to G.

Does it really have to end in G, before coming back to A?

No.

Maybe, it ends in B or C or D or E or F, before coming back to A.

It depends on which would lead to the shortest distance.

Let's think again: what are we looking for?

Aside from A, we have a set of cities S = {B,C,D,E,F,G}

We are looking for the overall cost of a path that starts at A and visits each village in the set {B,C,D,E,F,G} before coming back to A.

Let cost(S) be the cost of traveling from A to the cities in S, before coming to A.

If B is the last city to be visited before coming back to A, then

cost(S) = cost(S - B) + dist(B, A)

Here, cost(S-B) is the cost of traveling from A to the cities in S-B, before coming to B.

If C is the last city to be visited before coming back to A, then

cost(S) = cost(S - C) + dist(C, A)

And so on..

The question is which city should be the last before coming back to A.

I can write it all as follows, the sequence of cities to be visited, Ahsan thinks...

The blanks are to be decided..

I need to think the best path that ends in G before coming to A.

The best path that ends in F before coming to A.

ends in E before coming to A.

and so on.

Surely we need to pick the one that will result in with the minimum cost, he strains his head.

In other words,

cost(S) = minimum among the following:

cost(S - B) + dist(B, A)
cost(S - C) + dist(C, A)
cost(S - D) + dist(D, A)
cost(S - E) + dist(E, A)
cost(S - F) + dist(F, A)
cost(S - G) + dist(G, A)

What's in there, the above?

dist(B, A) is, of course, the distance from B back to A.

dist(C, A) from C to A.

And so on.

Ahsan focuses on one of the cost values above..

What's cost(S-G)?

S-G means the set S with G as the last node.

cost(S-G) would then mean a minimum-cost path that starts from A and covers every node in S-{G} once before coming to G.

So what is this value... cost(S-G)?

The path may end in B before going on to G.

The path may end in C before going on to G.

Or it may end in D before coming to G.

Or in E.

Or in F...

Or in G...

In other words, Ahsan articulates:

cost(S-G) = minimum among the following:

cost(S - G - B) + dist(B, G)
cost(S - G - C) + dist(C, G)
cost(S - G - D) + dist(D, G)
cost(S - G - E) + dist(E, G)
cost(S - G - F) + dist(F, G)

or maybe i should write it in this way.. he squibbles

cost(S - {G, B}) + dist(B, G)
cost(S - {G, C}) + dist(C, G)
cost(S - {G, D}) + dist(D, G)
cost(S - {G, E}) + dist(E, G)
cost(S - {G, F}) + dist(F, G)

what am i doing here...

Ahsan glances at the clock.. it's past midnight.

I'm calculating the shortest path for the case whereby we start from A, and the last city before returning back to A is G.

So I have to consider the case where the path before G ends in B.

or ends in C.

or D..

or E..

or F.

and if you consider the path that ends in B...then we have as follows

All for the sake of finding cost(S - {G, B}), you need the minimum of:

cost(S - {G, B, C}) + dist(C, B)
cost(S - {G, B, D}) + dist(D, B)
cost(S - {G, B, E}) + dist(E, B)
cost(S - {G, B, F}) + dist(F, B)

And if you consider the first option, you will need to calculate cost(S - {G, B, C}).

Which is the minimum of

cost(S - {G, B, C, D}) + dist(D, C)
cost(S - {G, B, C, E}) + dist(E, C)
cost(S - {G, B, C, F}) + dist(F, C)

You will just keep continuing, right?

Recursively..

To compute cost(S - {G, B, C, D}), you would need the minimum of

cost(S - {G, B, C, D, E}) + dist(E, D)
cost(S - {G, B, C, D, F}) + dist(F, D)

We are going recursively...

At each recursive iteration, we look for the subpath with the smallest distance...

Ahsan beams with delight.

It all seems set then.

TSP can be solved recursively.

Is it better than a plain brute force method though?

No.. same number of evaluations eventually.

Same number of possibilities...

He moans..

So no advantage..

No, just cleaner implementation probably.

But wait..

Are there repetitions in the recursive calculations?

Clearer if we put it in the form of a tree diagram, he thinks..

At a certain level, there will start to be duplicated computations.

You can see it above.

cost(S-{B,D,E,F}) is the cost for a set in that goes from A, traversing elements in a set that excludes B, D and E, and then coming to F followed by E.

cost(S-{D,B,E,F}) is the cost for a set in that goes from A, traversing elements in a set that excludes B, D and E, then coming to F followed by E.

To fix this repetitions, to eliminate it, we can do as follows: just do it all bottom up.

The fancy name for this: dynamic programming.

It's just recursion turned upside down..

A Sketch of the Python Code

What code I have largely a simplification of the one in https://gist.github.com/mlalevic/6222750.

Suppose we have cities with the following coordinates:

[[3.0, 3.0], [4.0, 5.0], [6.0, 9.0], [0.0, 7.0], [9.0, 9.0], [3.0, 10.0]]

The adjacency matrix, let's name the variable as all_distances, can be derived to be as follows (just use calculator or worksheet or punch in a simple python code :-):

[[ 0. 2.236 6.708 5. 8.485 7. ]

[ 2.236 0. 4.472 4.472 6.403 5.099 ]

[ 6.708 4.472 0. 6.324 3. 3.162 ]

[ 5. 4.472 6.324 0. 9.220 4.243 ]

[ 8.485 6.403 3. 9.220 0. 6.083 ]

[ 7. 5.099 3.162 4.243 6.083 0. ]]

So how will the dynamic programming work..

First build all subsets of size 2, with each starting from city 0 - whatever your first city

In our case here, it is [3.0, 3.0].

Build the list of subsets as a dictionary (https://www.tutorialspoint.com/python/python_dictionary.htm), the key being a tuple of the form: (subset, last_node)

and the value being a tuple of the form: (distance, path)

So the dictionary will be of the form:

{ (subset, last_node): (distance, path), (subset, last_node): (distance, path), ... }

In more details..

You already have an adjacency matrix, all_distances.

Initialize a blank dictionary A

A = {}

Iterate through the distances from 0 as follows, using enumerate (http://book.pythontips.com/en/latest/enumerate.html)

for idx,dist in enumerate(all_distances[0][1:]):

The pairs of idx and dist produced by the loop will be the following:

0 , 2.23606798

1 , 6.70820393

2 , 5.0

3 , 8.48528137

4 , 7.0

Each time in the loop, create a set (Python frozenset: https://www.programiz.com/python-programming/methods/built-in/frozenset),

subset = frozenset([0, idx+1])

Each set will just be of the form {0, index}.

In the code, index is idx + 1, since we want the index to start at 1, not 0 (that's the source).

The last city in the set is just idx+1

lastCity = idx+1

The path involves just 2 nodes, 0 and idx+1.

thePath = [0,idx+1]

Now build a dictionary entry for A:

A[(subset, lastCity)] = (dist, thePath)

The overall initial code as follows:

A = {}

for idx,dist in enumerate(all_distances[0][1:]):

subset = frozenset([0, idx+1])

lastCity = idx+1

thePath = [0,idx+1]

A[(subset, lastCity)] = (dist, thePath)

Your A will be as follows:

{

(frozenset({0, 1}), 1): (2.236, [0, 1]),

(frozenset({0, 2}), 2): (6.708, [0, 2]),

(frozenset({0, 3}), 3): (5.0, [0, 3]),

(frozenset({0, 4}), 4): (8.485, [0, 4]),

(frozenset({0, 5}), 5): (7.0, [0, 5])

}

Now we are going to incrementally build larger paths..

Loop for subset size m (not including the starting point 0) ranging from 2 to the # cities:

for m in range(2, numcities):

For each subset size, initialize a blank dictionary B:

B = {}

B is going to be our new dictionary, containing paths longer by one at each iteration.

Now create combinations of size m.

We make use of the itertools package (https://docs.python.org/2/library/itertools.html).

for C in itertools.combinations(range(1, numcities), m):

Let S be the combination C with the starting city (0) appended to the front.

S = frozenset(C) | {0}

At m = 2, S will be the following:

frozenset({0, 1, 2})

frozenset({0, 1, 3})

frozenset({0, 1, 4})

frozenset({0, 1, 5})

frozenset({0, 2, 3})

frozenset({0, 2, 4})

frozenset({0, 2, 5})

frozenset({0, 3, 4})

frozenset({0, 3, 5})

frozenset({0, 4, 5})

For each of the combinations above, we are going to find the best sequence..

We shall try sequences ending in every possible city

To do this, iterate over the cities in S, minus the starting city

for j in S - {0}:

At each iteration, we are to build a list containing all path ending in the city

temp_list = []

For every other city in S, not including 0 and j (the city currently being considered)

for k in S:

if k != 0 and k!=j: # for each k which is not 0 and not j

Consider what was the best path (and its distance) that did not include j (the current city), and which ended in k.

How to do this?!

Make use of A !

prevSubsetVal = A[(S-{j}, k)]

kLastDistPrev = prevSubsetVal[0]

kLastPathPrev = prevSubsetVal[1]

Append j to that path.

And add up the cost.

That becomes ur tuple, temp.

temp = (kLastDistPrev + all_distances[k][j], kLastPathPrev + [j])

Build up a list containing every path ending in j

temp_list.append(temp)

Now you can find the min among all combination ending in j

Find that and put the min in the new cost dictionary, B

B[(S, j)] = min(temp_list)

When you are done with computing subsets of size m, update A

A = B

Here's the overall code up to this stage:

cnt = len(all_distances)

for m in range(2, cnt):

print('****')

print ('the subset size, m, is ', m)

B = {}

for C in itertools.combinations(range(1, cnt), m):

S = frozenset(C) | {0}

print('S is ', S)

# consider subsets ending in j

for j in S - {0}:

print('considering subset S of size ', m, ' ending in ',j)

temp_list = []

for k in S:

if k != 0 and k!=j:

prevSubsetVal = A[(S-{j}, k)]

kLastDistPrev = prevSubsetVal[0]

kLastPathPrev = prevSubsetVal[1]

temp = (kLastDistPrev + all_distances[k][j], kLastPathPrev + [j])

temp_list.append(temp)

print('Look for min in ', temp_list)

B[(S, j)] = min(temp_list)

print('Adding subsets ending at j = ', j, ' gives you B =', B)

A = B

print('A, now containing subset of sizes ', m, ' is ', A)

And when you are done up to the final subset size....

We can list out the final possible paths ...
final_list = []

To build this list, iterate over the keys in A

Remember, in A, a key is of the form: (subset, last_node)

and the value: (distance, path)

for d in iter(A):

Add up distance for the last leg (from last visited node to the starting node):

entry = (A[d][0] + all_distances[0][d[1]], A[d][1])

Append (distance, path) to final_list
final_list.append(entry)

Take the minimum among the possible paths:
res = min(final_list)

Job done

Programming Fundamentals

Sunday, March 25, 2018

Dynamic Programming for TSP

No comments:

Post a Comment