Adjacency Matrix
The depth of a graph neural network is the number of layers in its computation graph (how many hops of neighbors are aggregated), not the depth of the neural networks inside each layer. If a GNN is too deep, the representations of all nodes converge to the same values (over-smoothing), so it cannot be too deep; 2 or 3 layers is typical.
How to find the neighbors of node v, expressed in matrix form.
Once the graph is given, this matrix is fully determined, and it will appear in many GNN papers later.
In the previous computation graph, node A can only aggregate information from other nodes, yet it also needs its own information to represent itself.
The first two formulas below are equivalent; the second simply writes the neighborhood nodes and the node itself as separate terms while still sharing one weight matrix. The third version instead uses two separate weight matrices for the neighbors and for the node itself.
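Since the original figures are not reproduced here, this is a reconstruction of the standard GCN-style updates these notes describe ($h_v^{(k)}$ is node $v$'s embedding at layer $k$, $N(v)$ its neighbor set):

$$h_v^{(k+1)} = \sigma\Big( W_k \sum_{u \in N(v)\cup\{v\}} \frac{h_u^{(k)}}{|N(v)\cup\{v\}|} \Big)$$

which is equivalent to writing the neighbors and the node itself separately, with the same $W_k$:

$$h_v^{(k+1)} = \sigma\Big( W_k \sum_{u \in N(v)} \frac{h_u^{(k)}}{|N(v)\cup\{v\}|} + W_k\,\frac{h_v^{(k)}}{|N(v)\cup\{v\}|} \Big)$$

while the two-weight version uses a separate $B_k$ for the node itself:

$$h_v^{(k+1)} = \sigma\Big( W_k \sum_{u \in N(v)} \frac{h_u^{(k)}}{|N(v)|} + B_k\, h_v^{(k)} \Big)$$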
Supervised Learning
Definition of loss function for regression tasks
Definition of loss function for classification tasks
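The definitions themselves are not reproduced in these notes; as a sketch, the usual choices are mean squared error for regression and cross-entropy for classification, summed over the labeled training nodes $V_{\text{train}}$:

$$\mathcal{L}_{\text{reg}} = \frac{1}{|V_{\text{train}}|}\sum_{v \in V_{\text{train}}} (y_v - \hat{y}_v)^2
\qquad
\mathcal{L}_{\text{cls}} = -\sum_{v \in V_{\text{train}}} \sum_{c=1}^{C} y_{v,c}\,\log \hat{y}_{v,c}$$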
Unsupervised Learning (using the graph's own structure as supervision)
I. Summary of Core Knowledge of Graph Neural Networks (GNN)#
1. What does a graph neural network do?#
One-sentence definition:
A graph neural network is a neural network that learns representations of nodes, edges, or entire graphs through repeated "neighbor information passing and aggregation" on graph structures.
It addresses:
How to perform learnable representation learning on irregularly structured (non-grid) data, i.e., graphs.
2. Basic Framework of Graph Neural Networks#
Overall Process#
Graph structure + Node features
↓
Multi-layer message passing
↓
Node / Edge / Graph representations
↓
Readout layer (prediction head)
↓
Loss function + Backpropagation
3. The Essence of Message Passing#
Each layer of message passing consists of three steps:#
- Message Construction: each neighbor's features are transformed through a learnable transformation (linear layer / MLP)
- Message Aggregation: all neighbor messages are combined into one (Sum / Mean / Max / Attention)
- Node Update: the aggregated result is used to update the current node's representation (neural network)
Where:

- Neural networks mainly appear in message construction and node update
- Aggregation itself is usually non-parametric
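A minimal sketch of one such layer in PyTorch (a dense adjacency matrix and mean aggregation are assumed for simplicity; the class name is illustrative):

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One layer: message construction -> mean aggregation -> node update."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.msg = nn.Linear(in_dim, out_dim)            # message construction (learnable)
        self.upd = nn.Linear(in_dim + out_dim, out_dim)  # node update (learnable)

    def forward(self, h, adj):
        # h: (N, in_dim) node features; adj: (N, N) dense adjacency matrix
        m = self.msg(h)                                  # transform every node's features
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        agg = (adj @ m) / deg                            # mean over neighbors (non-parametric)
        return torch.relu(self.upd(torch.cat([h, agg], dim=-1)))

# toy usage: 4 nodes, 8-dim features, a small undirected graph
adj = torch.tensor([[0, 1, 1, 0], [1, 0, 0, 1],
                    [1, 0, 0, 1], [0, 1, 1, 0]], dtype=torch.float)
h = torch.randn(4, 8)
print(MessagePassingLayer(8, 16)(h, adj).shape)  # torch.Size([4, 16])
```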
4. Why use neural networks for message passing?#
If neural networks are not used:

- It is just doing fixed neighbor statistics
- Equivalent to diffusion/smoothing on the graph
- Information will:
  - either explode
  - or vanish
- Ultimately all nodes become the same (over-smoothing)

The significance of using neural networks:
the model can learn "which neighbors are important, how to use the information, and how much to change".
5. Differences among Mainstream GNN Models (GCN / GraphSAGE / GAT / R-GCN)#
Conclusion: they are all message-passing models; the difference lies in how messages are passed, how they are aggregated, and how neighbors are distinguished.
| Model | Distinguishing Idea |
|---|---|
| GCN | Treats all neighbors equally (degree-normalized mean) |
| GraphSAGE | Samples neighbors when there are too many |
| GAT | Learns a different importance for each neighbor |
| R-GCN | Uses separate weights for different relation types |
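For reference, a sketch of the update rules behind this table, following the forms in the original papers ($h_v'$ is the updated embedding, $d_v$ the degree, $S(v)$ a sampled neighbor set, $\alpha_{vu}$ a learned attention weight, $N_r(v)$ the neighbors under relation $r$):

$$\text{GCN:}\quad h_v' = \sigma\Big( W \sum_{u \in N(v)\cup\{v\}} \frac{h_u}{\sqrt{d_u d_v}} \Big)$$

$$\text{GraphSAGE:}\quad h_v' = \sigma\Big( W \cdot \big[\, h_v \,\|\, \mathrm{AGG}(\{h_u : u \in S(v)\}) \,\big] \Big)$$

$$\text{GAT:}\quad h_v' = \sigma\Big( \sum_{u \in N(v)\cup\{v\}} \alpha_{vu} \, W h_u \Big)$$

$$\text{R-GCN:}\quad h_v' = \sigma\Big( \sum_{r} \sum_{u \in N_r(v)} \frac{1}{|N_r(v)|} W_r h_u + W_0 h_v \Big)$$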
6. Normalized Adjacency Matrix in GCN#
Key Formula#
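The standard GCN propagation rule (Kipf & Welling) that this section refers to:

$$H^{(l+1)} = \sigma\big( \tilde{D}^{-1/2}\, \tilde{A}\, \tilde{D}^{-1/2}\, H^{(l)} W^{(l)} \big),
\qquad \tilde{A} = A + I,\quad \tilde{D}_{ii} = \textstyle\sum_j \tilde{A}_{ij}$$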
Understanding in Layman's Terms#
It adds a "speed limiter" to information propagation on the graph, preventing nodes with many neighbors from drowning out everyone else.
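A small NumPy sketch of this normalization (illustrative only):

```python
import numpy as np

# adjacency matrix of a small undirected graph
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)

A_tilde = A + np.eye(4)                      # add self-loops: A~ = A + I
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt    # D~^{-1/2} A~ D~^{-1/2}

# each entry is A~_ij / sqrt(d_i * d_j): messages to and from
# high-degree nodes are scaled down -- the "speed limiter"
print(np.round(A_hat, 2))
```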
7. What does "maximum eigenvalue = 1" mean?#
Intuitive Explanation:

- Eigenvalue ≈ "amplification factor" of information propagation
- Maximum eigenvalue = 1 means:
  - information will not keep increasing (exploding)
  - nor keep decreasing (vanishing)
- This is the source of numerical stability in GCN.
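A quick numerical check of this claim (a sketch on a small random graph):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = (rng.random((n, n)) < 0.1).astype(float)
A = np.triu(A, 1)
A = A + A.T                                   # random undirected graph
A_tilde = A + np.eye(n)                       # self-loops guarantee degree >= 1
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # symmetric normalization

print(np.abs(np.linalg.eigvalsh(A_hat)).max())  # ~1.0: the largest eigenvalue

x = rng.standard_normal(n)
for _ in range(50):
    x = A_hat @ x                             # 50 rounds of propagation
print(np.linalg.norm(x))                      # stays bounded: no explosion, no collapse to 0
```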
8. Intuitive Understanding of Eigenvalues#
One-sentence version:
Eigenvalues describe:
The amplification or reduction ratio of an operation on certain "information directions".
In GNN:

- Repeatedly multiplying by the adjacency matrix = repeatedly propagating information
- The maximum eigenvalue controls whether propagation is stable
9. Residual Connection#
Definition#
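The standard form (a sketch; $F$ denotes one message-passing layer):

$$h_v^{(l+1)} = h_v^{(l)} + F\big(h_v^{(l)},\ \{h_u^{(l)} : u \in N(v)\}\big)$$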
Essential Understanding#
The network does not directly learn the "result"; it learns the change on top of the original representation (the residual).
Role in GNN#
- Prevents over-smoothing
- Retains the node's own information
- Supports deeper networks
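In code, the residual is one line in the forward pass (a sketch; the class name is illustrative, and the wrapped layer is assumed to preserve the feature dimension):

```python
import torch.nn as nn

class ResidualGNNBlock(nn.Module):
    """Wraps a GNN layer so it learns a change (residual) on top of the input."""
    def __init__(self, layer):
        super().__init__()
        self.layer = layer  # any layer with forward(h, adj) keeping h's dimension

    def forward(self, h, adj):
        return h + self.layer(h, adj)  # output = input + learned delta
```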
10. Why is it called "residual connection"?#
"Residual" comes from the concept in mathematics:
Residual = True value − Current estimated value
In the network:
- The network learns output − input, rather than the output itself
11. Readout Layer (Readout / Head)#
Different Readout Methods for Different Tasks#
- Node task: directly use the node representation
- Edge task: use a combination of the two endpoint representations
- Graph task: aggregate all nodes once more (Sum / Mean / Attention)
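A sketch of the three readout styles (node embeddings `h` of shape (N, d) assumed; the combinations shown are just common choices):

```python
import torch

h = torch.randn(10, 32)                      # embeddings for 10 nodes

# node task: the node representation itself feeds the prediction head
node_out = h[3]

# edge task: combine the two endpoint representations
edge_out = torch.cat([h[3], h[7]], dim=-1)   # or h[3] * h[7], or a dot product

# graph task: pool all node representations into one vector
graph_out = h.mean(dim=0)                    # Sum / Mean / Attention pooling
```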
12. What is a regression head / prediction head?#
One-sentence translation:
The "head" is the small part of the model responsible for producing the answer.
- Backbone: learns representations
- Head: transforms representations into task outputs
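A minimal sketch of the split (names and dimensions are illustrative):

```python
import torch.nn as nn

embed_dim, num_classes = 32, 5

# backbone (the GNN layers, omitted here) produces embed_dim-sized embeddings;
# the head is just a small module that turns them into the task's answer
classification_head = nn.Linear(embed_dim, num_classes)  # logits per class
regression_head = nn.Linear(embed_dim, 1)                # a single real value
```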
13. Where are the parameters and loss functions?#
Learnable parameters are mainly in:#
- Message construction (linear layer / MLP)
- Attention weights (if any)
- Node updates
- Prediction head
Loss function:#
- Defined on the final output
- Updates all parameters through backpropagation
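Putting it together, a minimal end-to-end sketch for node classification (toy data; one round of message passing; all names illustrative):

```python
import torch
import torch.nn as nn

# toy setup: 4 nodes, 8-dim features, 3 classes, dense adjacency
adj = torch.tensor([[0, 1, 1, 0], [1, 0, 0, 1],
                    [1, 0, 0, 1], [0, 1, 1, 0]], dtype=torch.float)
x = torch.randn(4, 8)
y = torch.tensor([0, 1, 1, 2])

W_msg = nn.Linear(8, 16)    # message construction (learnable)
head = nn.Linear(16, 3)     # prediction head (learnable)
opt = torch.optim.Adam(list(W_msg.parameters()) + list(head.parameters()), lr=0.01)

for step in range(100):
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
    h = torch.relu(adj @ W_msg(x) / deg)           # one round of message passing
    logits = head(h)                               # readout / prediction head
    loss = nn.functional.cross_entropy(logits, y)  # defined on the final output
    opt.zero_grad()
    loss.backward()                                # gradients flow to ALL parameters
    opt.step()
```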
II. Key Questions Raised During the Learning Process#
The following questions came up along the way; they trace the learning path and are worth writing directly into a blog.
1. What is a graph neural network?#
2. What is message passing? Why is it designed this way?#
3. What exactly does message aggregation do?#
4. Why is a neural network used in message passing?#
5. What are the main differences between GCN, GraphSAGE, GAT, and R-GCN?#
6. How is the neural network specifically used in message passing? Can you give examples?#
7. What would happen if we didn't use neural networks and only did message passing?#
8. What is a readout layer? How is it used in different tasks?#
9. Which part of GNN parameters are mainly updated?#
10. Where is the loss function defined in the model?#
11. Why use $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ in GCN?#
12. Why does "maximum eigenvalue equals 1" imply numerical stability?#
13. What exactly are eigenvalues? Can they be understood without linear algebra?#
14. What is a residual connection? Why does GNN need it?#
15. Why is it called "residual connection"?#
16. What do the terms regression head / prediction head mean?#
III. Conclusion#
The core of graph neural networks is not in complex formulas, but in a simple idea:
Nodes influence each other through structure, but will not be overwhelmed by the structure.
Message passing propagates information, neural networks learn the rules,
and normalization and residuals keep the system stable.