Optimal transport 1: Minimum cost bipartite matching

Marc

2024/10/26

The optimal transport problem is a fascinating area in applied mathematics and optimization. One way to tackle this problem is through minimum cost bipartite matching. In this blog, we will show how this problem can be represented by a bipartite graph where the edges connecting the nodes are weighted by the transportation cost and then we will show step by step how to use minimum cost bipartite matching to solve the problem.

Monge formulation

Let \( (\Omega_{X}, F_{X}, \mu) \) and \( (\Omega_{Y}, F_{Y}, \nu) \) be two probability spaces with \(\mu\) and \(\nu\) discrete measures. The Monge formulation of the optimal transport problem is

$$\min \left\{ \sum_{X} c_{T}(x) \mu(x) : T_{\#}(\mu) = \nu \right\}$$

where \( T_{\#} \) is the pushforward measure of \(\mu\) by \(T\).

If \( X: \Omega_{X} \rightarrow \mathbb{R} \) and \( Y: \Omega_{Y} \rightarrow \mathbb{R} \) are two discrete random variables with \( \Omega_{X} = \Omega_{Y} = [1, \ldots, N] \), this formulation is equivalent to

$$ \begin{array}{c}\text{min}\\ \sigma \in \Pi_{N}\end{array} \left\{ \sum_{i=1}^{N} C_{i},\sigma(i) \right\}$$

where \( \Pi_{N} \) is the set of all the permutation matrices of \( {1, \ldots, N} \text{x} {1, \ldots, N}\), \(\sigma \in \Pi_{N} \) a permutation matrix and \(\sigma(i)\) the ith row of \(\sigma\).

Importing libraries

import numpy as np
import networkx as nx
import math
import matplotlib.pyplot as plt
import matplotlib as mpl
plt.style.use("science")

Bakeries and cafeterias

Suppose we have N bakeries and N cafeterias. A bakery can deliver to only one cafeteria. The cost of the ith bakery supplying the jth cafeteria is given by the following matrix

N = 6
C = np.array([[12., 10., 31., 27., 10., 30.],
              [22.,  7., 25., 15., 11., 14.],
              [19.,  7., 19., 10., 15., 15.],
              [10.,  6., 21., 19., 14., 24.],
              [15., 23., 14., 24., 31., 34.],
              [35., 26., 16.,  9., 34., 15.]])
C.shape
(6, 6)

We want to minimize the total cost of supplying all the N cafeterias. This is a problem of finding an optimal transport map i.e. an application T that maps a bakery to a cafeteria and minimizes the total cost.

The number of possibilities is

$$ \begin{align} \left( \begin{array}{cc}6\1\end{array} \right) * \left( \begin{array}{cc}5\1\end{array} \right) * … * \left( \begin{array}{cc}2\1\end{array} \right) * 1 = 6! \end{align} $$

math.factorial(6)
720

One of these 720 possibilities is for the ith bakery to supply the ith cafeteria. The permutation matrix is \( \sigma = I_{N} \) the identity matrix. The cost is the sum of the diagonal

np.diag(C).sum()
103.0

Instead of testing all the N! possible solutions which is impractical, we formulate the problem as follow

$$ \begin{array}{c}\text{min}\ \sigma \in \Pi_{N}\end{array} { \sum_{i=1}^{N} C_{i},\sigma(i) } $$

where \(C\) is the cost and \(\sigma\) is a permutation matrix of \( {1, \ldots, N} \text{x} {1, \ldots, N} \)

Optimal transport as minimum cost bipartite matching

Let’s construct a graph where both the bakeries and the cafeterias are the nodes

g = nx.Graph()
nodes_x = ['x' + str(_x) for _x in list(range(N))]
nodes_y = ['y' + str(_y) for _y in list(range(N))]
print("Nodes x :", *nodes_x)
print("Nodes y :", *nodes_y)
Nodes x : x0 x1 x2 x3 x4 x5
Nodes y : y0 y1 y2 y3 y4 y5

Adding the nodes to the graph

for node in nodes_x + nodes_y:
    g.add_node(node)

Adding the edges

for i in range(N):
    for j in range(N):
        g.add_edge(nodes_x[i], nodes_y[j], weight=C[i, j])

Drawing the bipartite graph

def draw_graph(g, filename = "", layout = "bipartite"):
    fig = plt.figure(figsize=(5, 2))
    if layout == "bipartite" :
        pos = nx.bipartite_layout(g, nodes_x, align = 'horizontal')
    else:
        pos = nx.spring_layout(g, nodes_x)
    nx.draw_networkx_nodes(g, pos, nodelist=nodes_x, node_color = 'silver', node_size = 400)
    nx.draw_networkx_nodes(g, pos, nodelist=nodes_y, node_color = 'skyblue', node_size = 400)
    nx.draw_networkx_labels(g, pos, font_color = 'black')
    nx.draw_networkx_edges(g, pos)
    edge_labels = nx.get_edge_attributes(g, 'weight')
    nx.draw_networkx_edge_labels(g, pos, edge_labels)
    plt.axis('off')
    plt.tight_layout()
    if filename != "":
        plt.savefig(filename)
    plt.show()
draw_graph(g)

png

This is a bipartite graph, its vertices are divided into two disjoint sets \(X = { x_{0}, x_{1}, …, x_{5} }\) and \(Y = { y_{0}, y_{1}, …, y_{5}}\)

The optimal transport problem formulated above is the problem finding a matching in a bipartite graph for which the sum of the weight of the edges is minimum.

Solving the problem

Many polynomial time (in the number of vertices \(N\) ) algorithms for solving this problem exists. The Hopcroft-karp algotihm is implemented in scipy and can be called from networkx to find the minimum bipartite matching of a graph.

res = nx.bipartite.minimum_weight_full_matching(g)
res
{'x1': 'y5',
 'x2': 'y1',
 'x5': 'y3',
 'x4': 'y2',
 'x3': 'y0',
 'x0': 'y4',
 'y5': 'x1',
 'y1': 'x2',
 'y3': 'x5',
 'y2': 'x4',
 'y0': 'x3',
 'y4': 'x0'}

The minimum cost is the sum of the weights

min_cost = 0.
for src, dest in res.items():
    min_cost += g.get_edge_data(src, dest)['weight']
min_cost /= 2

The minimum cost is

min_cost
64.0

Visual checking

fig, ax = plt.subplots(figsize = (4, 4))
ax.matshow(C, cmap = mpl.colors.ListedColormap(["white"]))
ax.set_title("Optimal transport plan")
for i in range(N):
    for j in range(N):
        nodex = nodes_x[i]
        nodey = nodes_y[j]
        if (res[nodex] == nodey):
            color = 'red'
        else:
            color = 'black'
        ax.text(j, i, int(C[i, j]), horizontalalignment = 'center',
                verticalalignment = 'center', weight = 'bold', color = color)

png

If we sum the values in white we have indeed the minimum cost

10 + 7 + 14 + 9 + 10 + 14
64

The solution is for the bakery 3 to supply the cafeteria 0, the bakery 2 to supply the cafeteria 1, the bakery 4 to supply the cafeteria 2 and so on…

The optimal transport matrix is

z = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        x = nodes_x[i]
        y = nodes_y[j]
        z[i,j] = (res[x] == y)
z
array([[0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1.],
       [0., 1., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0.]])

z is the permutation matrix that minimizes the transportation cost

$$z = \begin{array}{c}\text{argmin}\ \sigma \in \Pi_{N}\end{array} \sum_{i=1}^{N} C_{i},\sigma(i)$$

The corresponding bipartite graph is

sg = nx.Graph()
nodes_x = ['x' + str(_x) for _x in list(range(N))]
nodes_y = ['y' + str(_y) for _y in list(range(N))]
for node in nodes_x + nodes_y:
    sg.add_node(node)
for i in range(N):
    for j in range(N):
        nodex = nodes_x[i]
        nodey = nodes_y[j]
        if (res[nodex] == nodey):
            sg.add_edge(nodex, nodey, weight=C[i, j])
#draw_graph(sg, layout = "planar")
fig = plt.figure(figsize=(5, 2))
# Layout
pos = nx.spring_layout(sg, k = 4, seed = 10)
# Nodes
nx.draw_networkx_nodes(sg, pos, nodelist=nodes_x, node_color = 'silver', node_size = 400)
nx.draw_networkx_nodes(sg, pos, nodelist=nodes_y, node_color = 'skyblue', node_size = 400)
# Nodes labels
nx.draw_networkx_labels(sg, pos, font_color = 'black')
# Edges
nx.draw_networkx_edges(sg, pos, style = "dashed")
# Edge weights labels
edge_labels = nx.get_edge_attributes(sg, 'weight')
nx.draw_networkx_edge_labels(sg, pos, edge_labels, font_color = 'black')
plt.axis('off')
plt.tight_layout()
plt.show()

png

References