Analysis of travel salesman problem

Created:

Introduction

Travel Salesman Problem (TSP) is a very well know computer science problem. The roblem has the peculiarity that is very easy to understand, but it's really hard for computer to solve it. TSP is about finding shortest route that visits every city exactly once and retuns to the starting city.

TSP is usually modeled using graphs. A graph is an ordered pair G = (V, E), where V is a set of vertices and E is a set of edges. Graphs can be drawn using points for vertices and lines for edges.

In the case of TSP every city is a vertex and edges has weight which represents the length between two cities. For this problem all vertex are connected with each other, this is call a complete graph. Having this property guarantees that the cicle exists (with Hamiltonian Cycle you have to find if the cicle exists).

Complexity of the problem

Algorithm time complexity is a way to estimate the running time. There exists several complexity classes.

We need to know at lest two classes:

  • P: The complexity class of decision problems that can be solved on a deterministic Turing machine in polynomial time.
  • NP: The complexity class of decision problems that can be solved on a non-deterministic Turing machine in polynomial time.

TSP is a NP problem and that's why is so difficult for computers. Actually TSP is NP-Hard.

We don't want to dig more in this tutorial, it's important to know that this is a problem you should avoid solving with computers with exact algorithms.

As an example how long it could take: "In March 2005, the travelling salesman problem of visiting all 33,810 points in a circuit board was solved using Concorde TSP Solver: a tour of length 66,048,945 units was found and it was proven that no shorter tour exists. The computation took approximately 15.7 CPU-years" (source).

Travel salesman problem exact solution

If we numerate every vertex with incremental integers, we can build a circuit using the sequence 1-2-3-...-n. In order to search for the best circuit we can generate every possible permutation to find the minimum. The algorithm has a complexity of O(n!) and it will generate invalid circuits, since it will try every permutation bindly.

metric-TSP

If the graph that represents the problem has certain properties, the problem could have some nice features like knowing how good the current best solution is. In particular if the graph has the property "All edge cost are symmetric and fulfill the triangle inequality" then we have what is acalled metric-TSP.

Metric-TSP is a NP-hard problem, but we can use approximation algorithms to get within a certain factor of the optimal answer.

Heuristics

Since the problem is hard for computers it is usually sovled by heuristics. Heuristics are algorithms which try to do the best possible to find the solution, but there is no guarantee that the solution found is the best.

For example for metric-TSP we bould try to code a Heuristic which tries to find a solution with certain quality.

When learning TSP one of the first heuristic shown is greedy, which tries to choose the shortest path while trying to navigate the graph.

Another exact algorithm better then O(n!)

We are going to explain an algorithm that has P(n*2^n) complexity, which is better then O(n!). Remember that any exponential algorithm is very slow.

We have our graph G with the set of vertices V={1,2,3,4,..,n} and the edges E. Suppose we start at j=1 (and we need to finish at this same vertex). For all vertex i (i!=1), we find the minimum cost path with 1 as starting point and i as ending point (we don't repeat anyvertex).

We are going to call the cost of this path cost(i).

Let's call dist(i, 1) the function which returns the distance from i to 1. Now given the cost(i), the cost of the cycle would be cost(i) + dist(i, 1).

To finish we need to return the minimum of all cost(i)+disco(i, 1).

Definition: SetCost(s, i) be the cost of the minimum cost path visiting each vertex in set s exactly once, starting at 1 and ending at i.

If size of S is 2, then S must be {1, i},
 SetCost(S, i) = dist(1, i)
Else if size of S is greater than 2.
 SetCost(S, i) = min { SetCost(S-{i}, j) + dis(j, i)} where j belongs to S, j != i and j != 1.