Adjacency Matrix
The depth of a graph neural network is the number of layers in its computation graph (how many hops of neighbors are aggregated), not the depth of the neural networks inside each layer. If a GNN is too deep, the representations of all nodes converge to the same values (over-smoothing), so it cannot be too deep; 2 or 3 layers is typical.
How to find the neighbors of node v, expressed in matrix form.
Once the graph is given, this matrix is fully determined, and it will appear in many GNN papers later.
In the previous computation graph, node A can only aggregate information from other nodes, yet it also needs its own information to represent itself.
The first two formulas below are equivalent; the second simply writes the neighborhood nodes and the node itself as separate terms while still sharing one weight matrix. The third version instead uses two separate weight matrices for the neighbors and for the node itself.
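Since the original figures are not reproduced here, this is a reconstruction of the standard GCN-style updates these notes describe ($h_v^{(k)}$ is node $v$'s embedding at layer $k$, $N(v)$ its neighbor set):

$$h_v^{(k+1)} = \sigma\Big( W_k \sum_{u \in N(v)\cup\{v\}} \frac{h_u^{(k)}}{|N(v)\cup\{v\}|} \Big)$$

which is equivalent to writing the neighbors and the node itself separately, with the same $W_k$:

$$h_v^{(k+1)} = \sigma\Big( W_k \sum_{u \in N(v)} \frac{h_u^{(k)}}{|N(v)\cup\{v\}|} + W_k\,\frac{h_v^{(k)}}{|N(v)\cup\{v\}|} \Big)$$

while the two-weight version uses a separate $B_k$ for the node itself:

$$h_v^{(k+1)} = \sigma\Big( W_k \sum_{u \in N(v)} \frac{h_u^{(k)}}{|N(v)|} + B_k\, h_v^{(k)} \Big)$$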
Supervised Learning
Definition of loss function for regression tasks
Definition of loss function for classification tasks
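The definitions themselves are not reproduced in these notes; as a sketch, the usual choices are mean squared error for regression and cross-entropy for classification, summed over the labeled training nodes $V_{\text{train}}$:

$$\mathcal{L}_{\text{reg}} = \frac{1}{|V_{\text{train}}|}\sum_{v \in V_{\text{train}}} (y_v - \hat{y}_v)^2
\qquad
\mathcal{L}_{\text{cls}} = -\sum_{v \in V_{\text{train}}} \sum_{c=1}^{C} y_{v,c}\,\log \hat{y}_{v,c}$$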
Unsupervised Learning (using the graph's own structure as supervision)
I. Summary of Core Knowledge of Graph Neural Networks (GNN)#
1. What does a graph neural network do?#
One-sentence definition:
A graph neural network is a neural network that learns representations of nodes, edges, or entire graphs through repeated "neighbor information passing and aggregation" on graph structures.
It addresses:
How to perform learnable representation learning on irregularly structured (non-grid) data, i.e., graphs.
2. Basic Framework of Graph Neural Networks#
Overall Process#
Graph structure + Node features
↓
Multi-layer message passing
↓
Node / Edge / Graph representations
↓
Readout layer (prediction head)
↓
Loss function + Backpropagation
3. The Essence of Message Passing#
Each layer of message passing consists of three steps:#
- Message Construction: each neighbor's features are transformed through a learnable transformation (linear layer / MLP)
- Message Aggregation: all neighbor messages are combined into one (Sum / Mean / Max / Attention)
- Node Update: the aggregated result is used to update the current node's representation (neural network)
Where:

- Neural networks mainly appear in message construction and node update
- Aggregation itself is usually non-parametric
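A minimal sketch of one such layer in PyTorch (a dense adjacency matrix and mean aggregation are assumed for simplicity; the class name is illustrative):

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One layer: message construction -> mean aggregation -> node update."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.msg = nn.Linear(in_dim, out_dim)            # message construction (learnable)
        self.upd = nn.Linear(in_dim + out_dim, out_dim)  # node update (learnable)

    def forward(self, h, adj):
        # h: (N, in_dim) node features; adj: (N, N) dense adjacency matrix
        m = self.msg(h)                                  # transform every node's features
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        agg = (adj @ m) / deg                            # mean over neighbors (non-parametric)
        return torch.relu(self.upd(torch.cat([h, agg], dim=-1)))

# toy usage: 4 nodes, 8-dim features, a small undirected graph
adj = torch.tensor([[0, 1, 1, 0], [1, 0, 0, 1],
                    [1, 0, 0, 1], [0, 1, 1, 0]], dtype=torch.float)
h = torch.randn(4, 8)
print(MessagePassingLayer(8, 16)(h, adj).shape)  # torch.Size([4, 16])
```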
4. Why use neural networks for message passing?#
If neural networks are not used:

- It is just doing fixed neighbor statistics
- Equivalent to diffusion/smoothing on the graph
- Information will:
  - either explode
  - or vanish
- Ultimately all nodes become the same (over-smoothing)

The significance of using neural networks:
the model can learn "which neighbors are important, how to use the information, and how much to change".
5. Differences among Mainstream GNN Models (GCN / GraphSAGE / GAT / R-GCN)#
Conclusion: they are all message-passing models; the difference lies in how messages are passed, how they are aggregated, and how neighbors are distinguished.
| Model | Distinguishing Idea |
|---|---|
| GCN | Treats all neighbors equally (degree-normalized mean) |
| GraphSAGE | Samples neighbors when there are too many |
| GAT | Learns a different importance for each neighbor |
| R-GCN | Uses separate weights for different relation types |
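For reference, a sketch of the update rules behind this table, following the forms in the original papers ($h_v'$ is the updated embedding, $d_v$ the degree, $S(v)$ a sampled neighbor set, $\alpha_{vu}$ a learned attention weight, $N_r(v)$ the neighbors under relation $r$):

$$\text{GCN:}\quad h_v' = \sigma\Big( W \sum_{u \in N(v)\cup\{v\}} \frac{h_u}{\sqrt{d_u d_v}} \Big)$$

$$\text{GraphSAGE:}\quad h_v' = \sigma\Big( W \cdot \big[\, h_v \,\|\, \mathrm{AGG}(\{h_u : u \in S(v)\}) \,\big] \Big)$$

$$\text{GAT:}\quad h_v' = \sigma\Big( \sum_{u \in N(v)\cup\{v\}} \alpha_{vu} \, W h_u \Big)$$

$$\text{R-GCN:}\quad h_v' = \sigma\Big( \sum_{r} \sum_{u \in N_r(v)} \frac{1}{|N_r(v)|} W_r h_u + W_0 h_v \Big)$$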
6. Normalized Adjacency Matrix in GCN#
Key Formula#
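The standard GCN propagation rule (Kipf & Welling) that this section refers to:

$$H^{(l+1)} = \sigma\big( \tilde{D}^{-1/2}\, \tilde{A}\, \tilde{D}^{-1/2}\, H^{(l)} W^{(l)} \big),
\qquad \tilde{A} = A + I,\quad \tilde{D}_{ii} = \textstyle\sum_j \tilde{A}_{ij}$$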
Understanding in Layman's Terms#
It adds a "speed limiter" to information propagation on the graph, preventing nodes with many neighbors from drowning out everyone else.
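A small NumPy sketch of this normalization (illustrative only):

```python
import numpy as np

# adjacency matrix of a small undirected graph
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)

A_tilde = A + np.eye(4)                      # add self-loops: A~ = A + I
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt    # D~^{-1/2} A~ D~^{-1/2}

# each entry is A~_ij / sqrt(d_i * d_j): messages to and from
# high-degree nodes are scaled down -- the "speed limiter"
print(np.round(A_hat, 2))
```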
7. What does "maximum eigenvalue = 1" mean?#
Intuitive Explanation:

- Eigenvalue ≈ "amplification factor" of information propagation
- Maximum eigenvalue = 1 means:
  - information will not keep increasing (exploding)
  - nor keep decreasing (vanishing)
- This is the source of numerical stability in GCN.
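A quick numerical check of this claim (a sketch on a small random graph):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = (rng.random((n, n)) < 0.1).astype(float)
A = np.triu(A, 1)
A = A + A.T                                   # random undirected graph
A_tilde = A + np.eye(n)                       # self-loops guarantee degree >= 1
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # symmetric normalization

print(np.abs(np.linalg.eigvalsh(A_hat)).max())  # ~1.0: the largest eigenvalue

x = rng.standard_normal(n)
for _ in range(50):
    x = A_hat @ x                             # 50 rounds of propagation
print(np.linalg.norm(x))                      # stays bounded: no explosion, no collapse to 0
```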
8. Intuitive Understanding of Eigenvalues#
One-sentence version:
Eigenvalues describe:
The amplification or reduction ratio of an operation on certain "information directions".
In GNN:

- Repeatedly multiplying by the adjacency matrix = repeatedly propagating information
- The maximum eigenvalue controls whether propagation is stable
9. Residual Connection#
Definition#
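The standard form (a sketch; $F$ denotes one message-passing layer):

$$h_v^{(l+1)} = h_v^{(l)} + F\big(h_v^{(l)},\ \{h_u^{(l)} : u \in N(v)\}\big)$$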
Essential Understanding#
The network does not directly learn the "result"; it learns the change on top of the original representation (the residual).
Role in GNN#
- Prevents over-smoothing
- Retains the node's own information
- Supports deeper networks
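In code, the residual is one line in the forward pass (a sketch; the class name is illustrative, and the wrapped layer is assumed to preserve the feature dimension):

```python
import torch.nn as nn

class ResidualGNNBlock(nn.Module):
    """Wraps a GNN layer so it learns a change (residual) on top of the input."""
    def __init__(self, layer):
        super().__init__()
        self.layer = layer  # any layer with forward(h, adj) keeping h's dimension

    def forward(self, h, adj):
        return h + self.layer(h, adj)  # output = input + learned delta
```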
10. Why is it called "residual connection"?#
"Residual" comes from the concept in mathematics:
Residual = True value − Current estimated value
In the network:
- The network learns output − input, rather than the output itself
11. Readout Layer (Readout / Head)#
Different Readout Methods for Different Tasks#
- Node task: directly use the node representation
- Edge task: use a combination of the two endpoint representations
- Graph task: aggregate all nodes once more (Sum / Mean / Attention)
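A sketch of the three readout styles (node embeddings `h` of shape (N, d) assumed; the combinations shown are just common choices):

```python
import torch

h = torch.randn(10, 32)                      # embeddings for 10 nodes

# node task: the node representation itself feeds the prediction head
node_out = h[3]

# edge task: combine the two endpoint representations
edge_out = torch.cat([h[3], h[7]], dim=-1)   # or h[3] * h[7], or a dot product

# graph task: pool all node representations into one vector
graph_out = h.mean(dim=0)                    # Sum / Mean / Attention pooling
```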
12. What is a regression head / prediction head?#
One-sentence translation:
The "head" is the small part of the model responsible for producing the answer.
- Backbone: learns representations
- Head: transforms representations into task outputs
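A minimal sketch of the split (names and dimensions are illustrative):

```python
import torch.nn as nn

embed_dim, num_classes = 32, 5

# backbone (the GNN layers, omitted here) produces embed_dim-sized embeddings;
# the head is just a small module that turns them into the task's answer
classification_head = nn.Linear(embed_dim, num_classes)  # logits per class
regression_head = nn.Linear(embed_dim, 1)                # a single real value
```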
13. Where are the parameters and loss functions?#
Learnable parameters are mainly in:#
- Message construction (linear layer / MLP)
- Attention weights (if any)
- Node updates
- Prediction head
Loss function:#
- Defined on the final output
- Updates all parameters through backpropagation
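Putting it together, a minimal end-to-end sketch for node classification (toy data; one round of message passing; all names illustrative):

```python
import torch
import torch.nn as nn

# toy setup: 4 nodes, 8-dim features, 3 classes, dense adjacency
adj = torch.tensor([[0, 1, 1, 0], [1, 0, 0, 1],
                    [1, 0, 0, 1], [0, 1, 1, 0]], dtype=torch.float)
x = torch.randn(4, 8)
y = torch.tensor([0, 1, 1, 2])

W_msg = nn.Linear(8, 16)    # message construction (learnable)
head = nn.Linear(16, 3)     # prediction head (learnable)
opt = torch.optim.Adam(list(W_msg.parameters()) + list(head.parameters()), lr=0.01)

for step in range(100):
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
    h = torch.relu(adj @ W_msg(x) / deg)           # one round of message passing
    logits = head(h)                               # readout / prediction head
    loss = nn.functional.cross_entropy(logits, y)  # defined on the final output
    opt.zero_grad()
    loss.backward()                                # gradients flow to ALL parameters
    opt.step()
```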
II. Key Questions Raised During the Learning Process#
The following questions came up along the way; they trace the learning path and are worth writing directly into a blog.
1. What is a graph neural network?#
2. What is message passing? Why is it designed this way?#
3. What exactly does message aggregation do?#
4. Why is a neural network used in message passing?#
5. What are the main differences between GCN, GraphSAGE, GAT, and R-GCN?#
6. How is the neural network specifically used in message passing? Can you give examples?#
7. What would happen if we didn't use neural networks and only did message passing?#
8. What is a readout layer? How is it used in different tasks?#
9. Which part of GNN parameters are mainly updated?#
10. Where is the loss function defined in the model?#
11. Why use $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ in GCN?#
12. Why does "maximum eigenvalue equals 1" imply numerical stability?#
13. What exactly are eigenvalues? Can they be understood without linear algebra?#
14. What is a residual connection? Why does GNN need it?#
15. Why is it called "residual connection"?#
16. What do the terms regression head / prediction head mean?#
III. Conclusion#
The core of graph neural networks is not in complex formulas, but in a simple idea:
Nodes influence each other through structure, but will not be overwhelmed by the structure.
Message passing propagates information, neural networks learn the rules,
and normalization and residuals keep the system stable.