visit
In the previous post, we saw how to solve the all-pairs shortest path problem using the . We also explored several ways to improve the algorithm’s performance by using parallelism and vectorization.
We start by representing a graph (G
) of size N
as matrix (W
) of size N x N
where every cell W[i,j]
contains the weight of an edge from vertex i
to vertex j
(see Picture 1). In cases when there is no edge between vertexes, the cell is set to a special NO_EDGE
value (shown as dark cells in Picture 1).
From now on, we say – W[i,j]
contains a distance between vertexes i
and j
.
Next, we take a vertex k
and iterate through all pairs of vertexes W[i,j]
checking if distance i ⭢ k ⭢ j
is smaller than the currently known distance between i
and j
.
The smallest value is stored in W[i,j]
and step #3 is repeated for the next k
until all vertexes of G
have been used as k
.
algorithm FloydWarshall(W) do
for k = 0 to N - 1 do
for i = 0 to N - 1 do
for j = 0 to N - 1 do
W[i,j] = min(W[i,j], W[i,k] + W[k,j])
end for
end for
end for
end algorithm
When done, every cell W[i,j]
of matrix W
either contains a distance of the shortest path between vertexes i
and j
or a NO_EDGE
value, if there is no path between them.
The essence of the Floyd-Warshall algorithm is a compartment of known distance W[i,j]
with a distance of potentially new path from i
to j
through intermediate vertex k
.
Initially, we know about all graph edges, which gives us the following paths: 0 ⭢ 1 :2
, 0 ⭢ 4 :10
, 1 ⭢ 3 :1
, 2 ⭢ 4 :1
, 3 ⭢ 2 :1
and 3 ⭢ 4 :3
.
I use the “from” ⭢ “to” :”distance” format from the previous post to write down paths.
- Author’s note
We also know there are no edges leading to vertex 0
, so executing an algorithm for k = 0
makes no sense. It is also obvious, there is a single edge (0 ⭢ 1
) leading to from vertex 0
to vertex 1
, which also makes the execution of all i != 0
(the i
is “from” here) quite meaningless and because vertex 1
has edges with 2
and 4
, it also makes no sense to execute algorithms for any j
which isn’t 2
or 4
(the j
is “to” here).
That is why our first step will be to execute an algorithm for k = 1
, i = 0
and j = 2,4
.
Step | Path | Comment |
---|---|---|
1 | 0 ⭢ 1 ⭢ 2 | Path found. Distance = 3 (was nothing) |
2 | 0 ⭢ 1 ⭢ 4 | Path found. Distance = 8 (was 10). |
We have found two paths: a new path (0 ⭢ 1 ⭢ 2
) and a shortcut (0 ⭢ 1 ⭢ 4
). Both go through vertex 1
. If we don’t store this information (the fact we got to 2
and 4
through 1
) somewhere right now, it will be lost forever (and that is quite the opposite of what we want).
So what should we do? We will update matrix W
with new values (see Picture 3a) and also store the value of k
(which is k = 1
) in cells R[0,2]
and R[0,4]
of a new matrix R
of same size as matrix W
but initialized with NO_EDGE
values (see Picture 3b).
Right now, don’t focus on what exactly matrix R
is. Let’s just keep going and execute the algorithm for the next k = 2
.
Here, we will do the same analysis (to understand what combinations make sense to execute) as we did for k = 1,
but this time, we will use matrix W
instead of graph G
. Look at the matrix W
, especially in column #2 and row #2 (Picture 4).
Here you can see, there are two paths to vertex 2
from vertex 0
and from vertex 1
(column #2), and two paths from vertex 2
to vertex 3
and to vertex 4
(row #2). Knowing that, it makes sense to execute the algorithm only for combinations of k = 2
, i = 0,1
and j = 3,4
.
Step | Path | Comment |
---|---|---|
1 | 0 ⭢ 2 ⭢ 3 | Path found. Distance = 4 (was nothing) |
2 | 0 ⭢ 2 ⭢ 4 | Path found. Distance = 6 (was 8) |
3 | 1 ⭢ 2 ⭢ 3 | Path found. Distance = 2 (was nothing) |
4 | 1 ⭢ 2 ⭢ 4 | Path found. Distance = 4 (was 6) |
As we have already done previously, we are updating cells W[0,3]
, W[0,4]
, W[1,3]
, W[1,4]
with new distances and cells R[0,3]
, R[0,4]
, R[1,3]
and R[1,4]
with k = 2
(see Picture 5).
There is only k = 3
left to process (because there are no edges leading from vertex 4
to any other vertex in the graph).
Again, let’s take a look at the matrix W
(Picture 6).
According to the matrix W
, there are three paths to vertex 3
from vertexes 0
, 1
and 2
(column #3), and there is one path from vertex 3
to vertex 4
(row #3). Therefore, we have the following paths to process:
Step | Path | Comment |
---|---|---|
1 | 0 ⭢ 3 ⭢ 4 | Path found. Distance = 5 (was 6) |
2 | 1 ⭢ 3 ⭢ 4 | Path found. Distance = 3 (was 4) |
3 | 2 ⭢ 3 ⭢ 4 | Path found. Distance = 2 (was 3) |
This was the last iteration of the algorithm. All that is left is to update cells W[0,4]
, W[1,4]
, W[2,4]
with new distances and set cells R[0,4]
, R[1,4]
, R[2,4]
to 3
.
As we know from the previous post, matrix W
now contains all pairs of shortest paths in graph G
. But what is stored inside matrix R
?
Every time we found a new shortest path, we were updating the corresponding cell of matrix R
with the current value of k
. While at first, this action might look mysterious it has a very simple meaning – we were storing the vertex, from which we reached the destination vertex: i ⭢ k ⭢ j
(here we are reaching j
directly from k
).
This moment is crucial. Because knowing a vertex we came from allows us to rebuild the route by moving backward (like a lobster) from the vertex j
(the “to”) to vertex i
(the “from”).
Here is a textual description of the algorithm to rebuild the route from i
to j
:
X
.z
from cell R[i,j]
.z
is NO_EDGE
, then the route from i
to j
is found and we should proceed to step #7.z
to a dynamic array X
.R[i,z]
into z
.i
to X.j
to X
.X
contains the route from i
to j
.
- Author’s note
algorithm RebuildRoute(i, j, R)
x = array()
z = R[i,j]
while (z ne NO_EDGE) do
x.prepend(z)
z = R[i,z]
end while
x.prepend(i)
x.append(j)
return x
end algorithm
Let’s try it on our graph G
and rebuild a route from vertex 0
to vertex 4
(see Picture 8).
We start by reading the value from R[0,4]
, which results in 3
. According to the algorithm, this means the route goes to vertex 4
from vertex 3
(shown in BLUE).
Because the value of R[0,4]
isn’t NO_EDGE
, we proceed and read the value of R[0,3]
which results in 2
(shown in GREEN).
Again, because the value of R[0,3]
isn’t NO_EDGE
, we proceed and read the value of R[0,2]
which results in 1
(shown in RED).
At last, we read a value of R[0,1]
, which results in the NO_EDGE
value, meaning, there are no more vertexes except 0
and 4
which contribute to the route. Therefore, the resulting route is: 0 ⭢ 1 ⭢ 2 ⭢ 3 ⭢ 4
which if you look at the graph indeed corresponds to the shortest path from vertex 0
to vertex 4
.
How can we be sure that all data we read from matrix R
belongs to the same path?
- Thoughtful reader
It is a very good question. We are sure because we update matrix R
with a new value when we update the shortest path value in matrix W
. So in the end, every row of matrix R
contains data related to shortest paths. No more, no less.
In the previous post, besides implementing an original version of the Floyd-Warshall algorithm, we have also integrated various optimizations: support for sparse graphs, parallelism, and vectorization, and in the end, we also combined them all.
Extend function signature to include R
matrix as a separate parameter – int[] routes
.
Save k to routes
every time the shortest path is changed (lines: 2 and 14).
public void BaselineWithRoutes(
int[] matrix, int[] routes, int sz)
{
for (var k = 0; k < sz; ++k)
{
for (var i = 0; i < sz; ++i)
{
for (var j = 0; j < sz; ++j)
{
var distance = matrix[i * sz + k] + matrix[k * sz + j];
if (matrix[i * sz + j] > distance)
{
matrix[i * sz + j] = distance;
routes[i * sz + j] = k;
}
}
}
}
}
Extend function signature to include R
matrix as a separate parameter – int[] routes
.
On each iteration, initialize a new vector of k
values (line: 6).
Save k
values vector to routes
every time the shortest path is changed (lines: 31-32).
public void SpartialVectorOptimisationsWithRoutes(
int[] matrix, int[] routes, int sz)
{
for (var k = 0; k < sz; ++k)
{
var k_vec = new Vector<int>(k);
for (var i = 0; i < sz; ++i)
{
if (matrix[i * sz + k] == Constants.NO_EDGE)
{
continue;
}
var ik_vec = new Vector<int>(matrix[i * sz + k]);
var j = 0;
for (; j < sz - Vector<int>.Count; j += Vector<int>.Count)
{
var ij_vec = new Vector<int>(matrix, i * sz + j);
var ikj_vec = new Vector<int>(matrix, k * sz + j) + ik_vec;
var lt_vec = Vector.LessThan(ij_vec, ikj_vec);
if (lt_vec == new Vector<int>(-1))
{
continue;
}
var r_vec = Vector.ConditionalSelect(lt_vec, ij_vec, ikj_vec);
r_vec.CopyTo(matrix, i * sz + j);
var ro_vec = new Vector<int>(routes, i * sz + j);
var rk_vec = Vector.ConditionalSelect(lt_vec, ro_vec, k_vec);
rk_vec.CopyTo(routes, i * sz + j);
}
for (; j < sz; ++j)
{
var distance = matrix[i * sz + k] + matrix[k * sz + j];
if (matrix[i * sz + j] > distance)
{
matrix[i * sz + j] = distance;
routes[i * sz + j] = k;
}
}
}
}
}
A small reminder – Vector.ConditionalSelect
operation returns a new vector where values are equal to the smaller of two corresponding values of input vectors, i.e., if the value of vector lt_vec
is equal to -1
, then the operation selects a value from ij_vec
, otherwise, it selects a value from k_vec
.
- Author’s note
All experiments were executed on the predefined set of graphs used in the previous post: 300, 600, 1200, 2400, and 4800 vertexes.
The source code and experimental graphs are available in the repository on GitHub. The experimental graphs can be found in the Data/Sparse-Graphs.zip
directory. All benchmarks in this post are implemented in the file.
Below are the benchmark results where Baseline
and BaselineWithRoutes
methods correspond to the original version of the algorithm and SpartialVectorOptimisations
and SpartialVectorOptimisationsWithRoutes
methods correspond to a vectorized (with support for sparse graphs) version of the algorithm.
Method | Size | Mean (ms) | Error (ms) | StdDev (ms) |
---|---|---|---|---|
Baseline | 300 | 40.233 | 0.0572 | 0.0477 |
BaselineWithRoutes | 300 | 40.349 | 0.1284 | 0.1201 |
SpartialVectorOptimisations | 300 | 4.472 | 0.0168 | 0.0140 |
SpartialVectorOptimisationsWithRoutes | 300 | 4.517 | 0.0218 | 0.0193 |
Baseline | 600 | 324.637 | 5.6161 | 4.6897 |
BaselineWithRoutes | 600 | 321.173 | 1.4339 | 1.2711 |
SpartialVectorOptimisations | 600 | 32.133 | 0.2151 | 0.1679 |
SpartialVectorOptimisationsWithRoutes | 600 | 34.646 | 0.1286 | 0.1073 |
Baseline | 1200 | 2,656.024 | 6.9640 | 5.8153 |
BaselineWithRoutes | 1200 | 2,657.883 | 8.8814 | 7.4164 |
SpartialVectorOptimisations | 1200 | 361.435 | 2.5877 | 2.2940 |
SpartialVectorOptimisationsWithRoutes | 1200 | 381.625 | 3.6975 | 3.2777 |
Baseline | 2400 | 21,059.519 | 38.2291 | 33.8891 |
BaselineWithRoutes | 2400 | 20,954.852 | 56.4719 | 50.0609 |
SpartialVectorOptimisations | 2400 | 3,029.488 | 12.1528 | 11.3677 |
SpartialVectorOptimisationsWithRoutes | 2400 | 3,079.006 | 8.6125 | 7.1918 |
Baseline |
4800 |
164,869.803 |
547.6675 |
427.5828 |
BaselineWithRoutes |
4800 |
184,305.980 |
210.9535 |
164.6986 |
SpartialVectorOptimisations | 4800 | 21,882.379 | 200.2820 | 177.5448 |
SpartialVectorOptimisationsWithRoutes | 4800 | 21,004.612 | 79.8752 | 70.8073 |
This looks very confusing (and if it is – I strongly advise you to listen to this with and to better understand what tricky things can affect measurements). My best take here is that “confusing” behavior is caused by great CPU cache performance (because graphs aren’t large enough to saturate caches). Partially, this theory is based on the “bold” sample where we can see significant degradation on the largest graph. However, I didn’t verify it.
In general, the benchmark shows that if we aren’t talking about extremely high-performance scenarios and huge graphs, it is okay to integrate route memorization into both algorithms by default (but keep in mind, it will double memory consumption because we need to allocate two matrices W
and R
instead of one).
Implementation of route rebuild algorithms in C# is straightforward and almost completely reflects the previously demonstrated pseudo-code (we use LinkedList<T>
to represent dynamic array):
public static IEnumerable<int> RebuildWithLinkedList(
int[] routes, int sz, int i, int j)
{
var x = new LinkedList<int>();
var z = routes[i * sz + j];
while (z != Constants.NO_EDGE)
{
x.AddFirst(z);
z = routes[i * sz + z];
}
x.AddFirst(i);
x.AddLast(j);
return x;
}
The above algorithm isn’t perfect and definitely can be improved. The simplest improvement we can make is to preallocate an array of sz
size and fill it in reverse order:
public static IEnumerable<int> RebuildWithArray(
int[] routes, int sz, int i, int j)
{
var x = new int[sz];
var y = sz - 1;
// Fill array from the end
x[y--] = j;
var z = routes[i * sz + j];
while (z != Constants.NO_EDGE)
{
x[y--] = z;
z = routes[i * sz + z];
}
x[y] = i;
// Create an array segment from 'y' to the end of the array
//
// This is required because 'sz' (and therefore length of the array x)
// equals to the number of vertices in the graph
//
// So, in cases when route doesn't span through
// all vertices - we need return a segment of the array
return new ArraySegment<int>(x, y, sz - y);
}
While this “optimization” reduces the number of allocations to one, it doesn’t necessarily make the algorithm “faster” or “allocates less memory” than the previous one. The point here is if we need to have a route ordered from i
to j
, we always will have to “reverse” the data we retrieve from the matrix R
, and there is no way to escape it.
public static IEnumerable<int> RebuildWithReverseYield(
int[] routes, int sz, int i, int j)
{
yield return j;
var z = routes[i * sz + j];
while (z != Constants.NO_EDGE)
{
yield return z;
z = routes[i * sz + z];
}
yield return i;
}
There can be many more variants of how this code can be “changed” or “improved.” The important moment here is to remember – any “change” has downsides in terms of memory or CPU cycles.
You can find implementations of all route algorithms on GitHub in the file.
In this post, we had a deep dive into the theory behind the Floyd-Warshall algorithm and have extended it with the possibility of memorizing shortest paths “routes.”Then we have updated C# implementations (original and vectorized) from the previous post. In the end, we have implemented a few versions of the algorithm to rebuild the “routes”.