Interpretation of label propagation algorithm

The original author writes very well, so I am reposting the article here in case I cannot find it again later. Thanks to the original author!

Social media networks have spread all over the world and are growing every day. Suppose that, for a social network, you know the interests of some people and want to predict the interests of the others so that you can run targeted marketing campaigns. For this, we can use a graph-based semi-supervised machine learning technique called label propagation. In this article, I will explain the label propagation process with some examples and sample code.

What is label propagation?

The label propagation algorithm (LPA) is an iterative algorithm that assigns labels to unlabeled points by propagating labels through the dataset. The algorithm was first proposed by Xiaojin Zhu and Zoubin Ghahramani in 2002. LPA is a form of transductive learning, because we want to predict the labels of unlabeled data points that are already given to us.

Suppose we have a network as shown below, with two label classes, "interested in cricket" and "not interested in cricket". The question is: can we predict whether the remaining people are interested in cricket?

For LPA to work in this case, we need to assume that two nodes connected by an edge are similar. In other words, if two people are connected, they are likely to share the same interests. We can make this assumption because people tend to associate with people who have similar interests.

Random walk in the graph
Consider the example diagram given in Figure 1, where we have 2 label classes (red and green) and 4 colored nodes (2 per class). We want to predict the label of node 4.

We can walk randomly in the graph, starting from node 4, until we encounter a labeled node. When we reach a labeled node, we stop walking; for this reason, the labeled nodes are called absorbing states. Let's consider all possible paths from node 4. Of these, the following paths end at green nodes.

  1. 4 → 9 → 15 → 16
  2. 4 → 9 → 13 → 14
  3. 4 → 9 → 13 → 15 → 16
  4. 4 → 9 → 15 → 13 → 14

The following paths end at red nodes.

  1. 4 → 7 → 8
  2. 4 → 7 → 6 → 5 → 1
  3. 4 → 5 → 1
  4. 4 → 5 → 6 → 7 → 8
  5. 4 → 2 → 1
Based on all possible random walks starting from node 4, we can see that most walks end at red nodes (5 of the 9 paths listed above). We can therefore color node 4 red. This is the basic intuition behind LPA.
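
The same intuition can be checked numerically. The following is a minimal sketch that is not part of the original article. Because the full edge list of Figure 1 is not given in the text, it assumes the smaller 7-node graph from the sample code later in this article (0-indexed, with nodes 0 and 1 labeled). The helper name absorption_frequencies is hypothetical.

```python
import random
from collections import Counter

# Assumption: the 7-node example graph from the sample code (0-indexed).
EDGES = [(0, 3), (2, 3), (3, 4), (4, 5), (5, 6), (5, 1)]
LABELED = {0: "red", 1: "green"}  # absorbing (labeled) nodes

# Build an undirected adjacency list.
NEIGHBORS = {}
for u, v in EDGES:
    NEIGHBORS.setdefault(u, []).append(v)
    NEIGHBORS.setdefault(v, []).append(u)


def absorption_frequencies(start, n_walks=20000, seed=0):
    """Estimate how often a random walk from `start` ends at each label."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_walks):
        node = start
        # Move to a uniformly random neighbor until a labeled node is hit.
        while node not in LABELED:
            node = rng.choice(NEIGHBORS[node])
        counts[LABELED[node]] += 1
    return {label: counts[label] / n_walks for label in LABELED.values()}


freqs = absorption_frequencies(start=3)
print(freqs)  # the label with the larger frequency is the predicted one
```

Unlike the path listing above, the simulated walk may revisit nodes, which is also how the transition-matrix formulation treats it. For the node at index 3 of this assumed graph, the red frequency should come out near 0.75.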

Mathematical formulation
Let Xₗ be the set of labeled nodes and Yₗ the one-hot labels of the labeled data. Assume that there are C class labels {1, ..., C}. Xᵤ is the set of unlabeled nodes. We do not know Yᵤ, so Yᵤ is initialized to all zeros.

We use the following formula to represent the random walk.

In matrix form, the equation is as follows:

Figure 3. Matrix form of the random walk
If we can calculate the transition probability matrix T, we can calculate the label probabilities of all unlabeled nodes.
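
The figures with the original equations are not reproduced here. One common way to write the random walk is sketched below; it follows the notation of this section but may differ from the original figures. The symbols T_ij (the probability of stepping from node i to node j) and the stacked label matrix Y^(0) are assumptions introduced for this sketch.

```latex
% Probability that a walk started at node i is absorbed in class c
P(y_i = c) = \sum_{j} T_{ij} \, P(y_j = c)

% Matrix form, iterated until convergence, with the labeled rows first
Y^{(t+1)} = T \, Y^{(t)}, \qquad
Y^{(0)} = \begin{bmatrix} Y_l \\ Y_u \end{bmatrix}, \qquad
\hat{Y} = \lim_{t \to \infty} T^{t} \, Y^{(0)}
```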

How do we calculate the transition probability matrix?

Figure 4. Example graph 2
Consider an example graph with absorbing states, as shown in Figure 4. For each node, we need to calculate the probability of jumping to each of the other nodes. When we reach an absorbing state, the walk ends because we are trapped there (represented as a self-loop in the figure). This is an undirected graph, so we can move in either direction along an edge.

Assuming that the probability of transferring from a node to its neighbors is equal, we can write T as:


Figure 5. Transition matrix of example graph 2
The probability of going from node 1 to node 1 is 1 because node 1 is an absorbing state. From node 1 we cannot reach any other node, so the probability of reaching any other node from node 1 is 0. The same applies to node 2.

From node 4, you can go to nodes 1, 3 and 5. The probabilities of moving from node 4 to nodes 1, 3 and 5 are equal, so each is 1/3 ≈ 0.33. Similarly, from node 5 we can move to nodes 4 and 6, each with probability 0.5.

Note that we can use the degree matrix (D) and adjacency matrix (A) of the graph to calculate T, using the following formula.

T = D⁻¹A
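
As a rough illustration of this formula, the sketch below builds T for example graph 2 from an edge list. It assumes the 0-indexed edge list used in the sample code later in this article and plain NumPy arrays. Turning the labeled nodes into absorbing states by overwriting their rows is one possible way to do it, not necessarily the author's.

```python
import numpy as np

# Assumption: example graph 2 with 0-indexed nodes; nodes 0 and 1 are labeled.
edges = [(0, 3), (2, 3), (3, 4), (4, 5), (5, 6), (5, 1)]
labeled = [0, 1]
n = 7

# Symmetric adjacency matrix of the undirected graph.
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

# Make labeled nodes absorbing: drop outgoing edges, keep only a self-loop.
for i in labeled:
    A[i, :] = 0.0
    A[i, i] = 1.0

# Degree matrix (row sums of A) and row-normalized transition matrix.
D = np.diag(A.sum(axis=1))
T = np.linalg.inv(D) @ A

print(np.round(T, 2))
```

Each row of T sums to 1, and the row for node 4 (index 3) holds 1/3 in the columns of nodes 1, 3 and 5, matching the description above.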

Now note that we can decompose matrix T, as shown in Figure 6.

Figure 6. T can be decomposed into four blocks
Tₗₗ - probability from labeled node to labeled node
Tₗᵤ - probability from labeled node to unlabeled node
Tᵤₗ - probability from unlabeled node to labeled node
Tᵤᵤ - probability from unlabeled node to unlabeled node
Note: Tₗₗ is an identity matrix and Tₗᵤ is a zero matrix, because the labeled nodes are absorbing states and cannot be left.
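
The four blocks can be read off with array slicing. The sketch below assumes the transition matrix of example graph 2 with the two labeled nodes ordered first. The variable names T_ll, T_lu, T_ul and T_uu are chosen here for illustration.

```python
import numpy as np

# Assumption: T for example graph 2, with labeled nodes 1 and 2 as rows 0 and 1.
T = np.array([
    [1,   0,   0,   0,   0,   0,   0],
    [0,   1,   0,   0,   0,   0,   0],
    [0,   0,   0,   1,   0,   0,   0],
    [1/3, 0,   1/3, 0,   1/3, 0,   0],
    [0,   0,   0,   1/2, 0,   1/2, 0],
    [0,   1/3, 0,   0,   1/3, 0,   1/3],
    [0,   0,   0,   0,   0,   1,   0],
])
n_l = 2  # number of labeled nodes

T_ll = T[:n_l, :n_l]  # labeled   -> labeled
T_lu = T[:n_l, n_l:]  # labeled   -> unlabeled
T_ul = T[n_l:, :n_l]  # unlabeled -> labeled
T_uu = T[n_l:, n_l:]  # unlabeled -> unlabeled

print(np.array_equal(T_ll, np.eye(n_l)))  # checks that T_ll is the identity
print(not T_lu.any())                     # checks that T_lu is all zeros
```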

What happens if we multiply matrix T by itself t times and let t tend to ∞? You can enter this matrix in MATLAB and compute T¹⁰⁰. You will get the following result.


Figure 7. T multiplied by itself 100 times
As you raise T to higher powers, the probabilities stop changing (saturate) and you obtain stable transition probabilities. You can see that only the first two columns contain non-zero values; the rest are zero.
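
The same experiment can be run without MATLAB. This is a small sketch using NumPy's np.linalg.matrix_power, again assuming the transition matrix of example graph 2 with the labeled nodes in the first two rows.

```python
import numpy as np

# Assumption: T for example graph 2 (labeled nodes first).
T = np.array([
    [1,   0,   0,   0,   0,   0,   0],
    [0,   1,   0,   0,   0,   0,   0],
    [0,   0,   0,   1,   0,   0,   0],
    [1/3, 0,   1/3, 0,   1/3, 0,   0],
    [0,   0,   0,   1/2, 0,   1/2, 0],
    [0,   1/3, 0,   0,   1/3, 0,   1/3],
    [0,   0,   0,   0,   0,   1,   0],
])

# Multiply T by itself 100 times.
T100 = np.linalg.matrix_power(T, 100)
print(np.round(T100, 4))
```

After 100 multiplications, the columns belonging to unlabeled nodes are numerically zero, and the first two columns hold the probabilities of ending at each labeled node.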

We can describe it mathematically.


Figure 7. Formula for T raised to the power ∞
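
Since the figure itself is missing here, the limit can be sketched from the block structure of T. This is the standard result for absorbing Markov chains, written in LaTeX with the block names used above. It assumes that every unlabeled node can reach at least one labeled node, so that the powers of T_uu vanish.

```latex
T = \begin{bmatrix} I & 0 \\ T_{ul} & T_{uu} \end{bmatrix}
\quad\Longrightarrow\quad
T^{t} = \begin{bmatrix} I & 0 \\ \left( \sum_{k=0}^{t-1} T_{uu}^{k} \right) T_{ul} & T_{uu}^{t} \end{bmatrix}

\lim_{t \to \infty} T^{t} = \begin{bmatrix} I & 0 \\ (I - T_{uu})^{-1} \, T_{ul} & 0 \end{bmatrix}
```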
Get the final answer
Finally, the label matrix looks like this. We obtain the label vectors of both the labeled nodes and the unlabeled nodes.

Figure 8. One-hot label formula for labeled and unlabeled nodes
Now let's consider example graph 2 in Figure 4, where we want to predict the labels of the unlabeled nodes. Using the MATLAB results, we get the following labels.


Figure 9. Labels obtained for the unlabeled nodes
For each unlabeled node, we assign the class label with the highest probability. However, node 5 has equal probabilities of being red and green. Our final labeled graph is therefore the one shown in Figure 10.


Figure 10. Final labeling result for example graph 2
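
The label probabilities of the unlabeled nodes can also be obtained in closed form instead of raising T to a high power. The sketch below solves the linear system (I - T_uu) Y_u = T_ul Y_l for example graph 2. The class order (red first, green second) and all variable names are assumptions made for illustration.

```python
import numpy as np

# Assumption: T for example graph 2 (labeled nodes first).
T = np.array([
    [1,   0,   0,   0,   0,   0,   0],
    [0,   1,   0,   0,   0,   0,   0],
    [0,   0,   0,   1,   0,   0,   0],
    [1/3, 0,   1/3, 0,   1/3, 0,   0],
    [0,   0,   0,   1/2, 0,   1/2, 0],
    [0,   1/3, 0,   0,   1/3, 0,   1/3],
    [0,   0,   0,   0,   0,   1,   0],
])
Y_l = np.array([[1, 0], [0, 1]])  # one-hot labels: node 1 red, node 2 green
n_l = 2

T_ul = T[n_l:, :n_l]
T_uu = T[n_l:, n_l:]

# Solve (I - T_uu) Y_u = T_ul Y_l instead of forming the inverse explicitly.
Y_u = np.linalg.solve(np.eye(T_uu.shape[0]) - T_uu, T_ul @ Y_l)

class_names = np.array(["red", "green"])
predicted = class_names[Y_u.argmax(axis=1)]
# Flag rows where the two class probabilities are numerically equal.
tied = np.isclose(Y_u[:, 0], Y_u[:, 1])

for node, probs, name, tie in zip(range(n_l + 1, 8), Y_u, predicted, tied):
    print(node, np.round(probs, 2), "tie" if tie else name)
```

Checking for ties explicitly matters because argmax on its own silently returns the first class when the probabilities are equal, as happens for node 5.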

Sample code

Create the graph:

from igraph import Graph, plot

node_count = 7

# Create graph
g = Graph()

# Add vertices
g.add_vertices(node_count)

for i in range(len(g.vs)):
    g.vs[i]["id"]= i
    g.vs[i]["label"]= str(i+1)

edges = [(0,3), (2,3), (3,4), (4,5), (5,6), (5,1)]

g.add_edges(edges)

g.simplify(multiple=True, loops=False, combine_edges=None)

out_fig_name = "graph_plot.png"

visual_style = {}

# Define colors for nodes
node_colours = ["red", "green", "grey", "grey", "grey", "grey", "grey"]
g.vs["color"] = node_colours

# Set bbox and margin
visual_style["bbox"] = (500,500)
visual_style["margin"] = 17

# Use a fixed vertex size
visual_style["vertex_size"] = 25

# Set vertex label size
visual_style["vertex_label_size"] = 8

# Don't curve the edges
visual_style["edge_curved"] = False

# Set the layout
layout_1 = g.layout_fruchterman_reingold()
visual_style["layout"] = layout_1

# Plot the graph
plot(g, out_fig_name, **visual_style)

Compute the degree matrix and its inverse:

import numpy as np
from numpy.linalg import inv

D = np.matrix(np.array([[1,0,0,0,0,0,0], [0,1,0,0,0,0,0], [0,0,1,0,0,0,0], [0,0,0,3,0,0,0], [0,0,0,0,2,0,0], [0,0,0,0,0,3,0], [0,0,0,0,0,0,1]]))
Dinv = inv(D)

Define the adjacency matrix:

A = np.matrix(np.array([[1,0,0,0,0,0,0], [0,1,0,0,0,0,0], [0,0,0,1,0,0,0], [1,0,1,0,1,0,0], [0,0,0,1,0,1,0], [0,1,0,0,1,0,1], [0,0,0,0,0,1,0]]))

Multiply the inverse of D by A to get the transition matrix:

S = Dinv*A

Define the label propagation function:

import sys

def LabelPropagation(T, Y, diff, max_iter, labelled):
    
    # Initialize
    Y_init = Y
    Y1 = Y
    
    # Initialize convergence parameters
    n=0
    current_diff = sys.maxsize
    
    # Iterate until the change drops below diff or the maximum number of iterations is reached
    while current_diff > diff and n < max_iter:
        
        current_diff = 0.0
        # Set Y(t)
        Y0 = Y1
        
        # Calculate Y(t+1)
        Y1 = T*Y0
        
        # Clamp labelled data: restore the original label rows
        for i in labelled:
            Y1[i, :] = Y_init[i, :]
        
        # Get difference between values of Y(t+1) and Y(t)
        for i in range(Y1.shape[0]):
            for j in range(Y1.shape[1]):
                current_diff += abs(Y1.A[i][j] - Y0.A[i][j])
        
        n += 1
        
    return Y1

Run the label propagation algorithm:

# In a Jupyter notebook, add %%time as the first line of this cell to time the run
Y = np.matrix(np.array([[1,0], [0,1], [0,0], [0,0], [0,0], [0,0], [0,0]]))
L = LabelPropagation(S, Y, 0.0001, 100, [0,1])

Predicted class index of each node:

L.argmax(1)

Final thoughts

LPA uses the labels of the labeled nodes as ground truth and attempts to predict the labels of the unlabeled nodes. However, if the initial labels are wrong, the errors can affect the propagation process and the wrong labels may be propagated. To address this problem, label spreading was introduced, which learns not only the labels of the unlabeled nodes but also those of the labeled nodes. It can therefore also be used for label correction. You can learn more about label spreading from the article Learning with Local and Global Consistency by Dengyong Zhou et al.
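
To make the difference concrete, here is a minimal sketch of the label spreading iteration, F(t+1) = αSF(t) + (1 - α)Y with S = D^(-1/2) W D^(-1/2), as I understand it from Zhou et al. It is not code from this article. It reuses the 7-node example graph without self-loops, and alpha = 0.9 and the iteration count are assumed values.

```python
import numpy as np

# Assumption: the same 7-node graph, without self-loops (0-indexed nodes).
edges = [(0, 3), (2, 3), (3, 4), (4, 5), (5, 6), (5, 1)]
n = 7
W = np.zeros((n, n))
for u, v in edges:
    W[u, v] = W[v, u] = 1.0

# Symmetrically normalized affinity matrix S = D^(-1/2) W D^(-1/2).
d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

# Initial labels: node 0 is class 0, node 1 is class 1, the rest unknown.
Y = np.zeros((n, 2))
Y[0, 0] = 1.0
Y[1, 1] = 1.0

alpha = 0.9  # assumed value; closer to 1 trusts the graph more than Y
F = Y.copy()
for _ in range(1000):
    # Labeled rows are not clamped, so their scores can change as well.
    F = alpha * (S @ F) + (1 - alpha) * Y

print(F.argmax(axis=1))
```

The key design difference from the LabelPropagation function above is the missing clamping step: the initial labels only enter through the (1 - alpha) * Y term, so a labeled node surrounded by neighbors of the other class can end up with a different label.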

Tags: Algorithm Deep Learning

Posted on Tue, 28 Sep 2021 16:31:11 -0400 by dpiland