The learning process over the past few days has gradually clarified my understanding of linear algebra. In this post, I will summarize some key new content.
When dealing with matrices and singular values, the following understanding should be established:
✅ A matrix is a spatial operator.
✅ Singular value decomposition helps you break down the essence of a matrix: rotation → stretching → rotation.
✅ The size ordering of singular values tells you: in which directions the matrix truly has strength, and which directions are ineffective.
1. Orthogonal Matrix#
The core definition of an Orthogonal Matrix
An $$n \times n$$ real matrix $$Q$$ is called an orthogonal matrix if it satisfies
$$Q^{\mathsf T}Q = QQ^{\mathsf T} = I_n.$$
Here, $$Q^{\mathsf T}$$ is the transpose of $$Q$$, and $$I_n$$ is the $$n$$-dimensional identity matrix.
The inverse is the transpose: $$Q^{-1} = Q^{\mathsf T}$$.
This simplifies calculations and ensures numerical stability.
An orthogonal matrix is a real matrix that "preserves inner products"—it rotates or flips the coordinate system but never stretches or distorts it.
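A quick numerical check in PyTorch (a minimal sketch; the 2 × 2 rotation matrix below is my own example, not from the note):

```python
import math
import torch

theta = 0.3
# A 2x2 rotation matrix is a classic orthogonal matrix
Q = torch.tensor([[math.cos(theta), -math.sin(theta)],
                  [math.sin(theta),  math.cos(theta)]])

# Q^T Q = I  (the inverse is the transpose)
print(torch.allclose(Q.T @ Q, torch.eye(2), atol=1e-6))  # True

# Orthogonal matrices preserve lengths and inner products
v = torch.tensor([3.0, 4.0])
print(v.norm().item(), (Q @ v).norm().item())  # both 5.0
```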
2. Angle Brackets#
Here, the angle brackets
$$\langle u, v\rangle$$
are the symbol for the "inner product." In the most common case—the real vector space $$\mathbb{R}^n$$—it is equivalent to the dot product we are familiar with:
$$\langle u, v\rangle = u \cdot v = \sum_{i=1}^{n} u_i v_i.$$
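For example, in PyTorch (a tiny sketch with made-up vectors), the inner product of two real vectors is just the dot product:

```python
import torch

u = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([4.0, 5.0, 6.0])

# <u, v> = sum_i u_i * v_i
print(torch.dot(u, v).item())   # 32.0
print((u * v).sum().item())     # same thing, written element-wise
```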
3. Matrix Position Exchange#
- Eliminate the factor on the left
  - Suppose the equation is $$PX = B$$, with $$P$$ attached to the left of $$X$$.
  - Multiply both sides by its inverse from the left:
    $$P^{-1}PX = P^{-1}B \;\Longrightarrow\; X = P^{-1}B.$$

Note:
You must multiply from the left consistently on both sides;
do not multiply on the right instead (that would disrupt the order of multiplication).
- Eliminate the factor on the right
  - Suppose the equation is $$XP = B$$, with $$P$$ attached to the right of $$X$$.
  - Multiply both sides by its inverse from the right:
    $$XPP^{-1} = BP^{-1} \;\Longrightarrow\; X = BP^{-1}.$$
  - If $$P$$ is an orthogonal matrix, $$P^{-1} = P^{\mathsf T}$$, then $$X = BP^{\mathsf T}$$.
Why can't the order be reversed?
- Matrix multiplication is not commutative; once you multiply on the wrong side, the factor "inserts" itself in a different position, and in general $$P^{-1}B \neq BP^{-1}$$.
- The same operation must be performed on the same side of both sides of the equation for the equality to hold.
- This is essentially the same as the order of function composition or coordinate transformation: the order in which transformations are performed must be written in the corresponding positions of the product, and cannot be arbitrarily swapped.
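A small sketch of this in PyTorch, assuming the equation $$PX = B$$ used above (the matrices here are random examples of my own):

```python
import torch

torch.manual_seed(0)
P = torch.randn(3, 3)          # assumed invertible (a random matrix almost surely is)
X_true = torch.randn(3, 3)
B = P @ X_true                 # the equation  P X = B

# Eliminate P on the left: multiply both sides by P^{-1} FROM THE LEFT
X = torch.linalg.inv(P) @ B
print(torch.allclose(X, X_true, atol=1e-5))      # True

# Multiplying on the wrong side does NOT recover X
wrong = B @ torch.linalg.inv(P)
print(torch.allclose(wrong, X_true, atol=1e-5))  # False (in general)
```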
4. Similar Diagonalizable Matrix#
A similar diagonalizable matrix (commonly referred to as a "diagonalizable matrix") means:
There exists an invertible matrix $$P$$ such that
$$P^{-1}AP = \Lambda,$$
where $$\Lambda$$ is a diagonal matrix.
At this point, we say that $$A$$ can be diagonalized through a similarity transformation, or simply that $$A$$ is diagonalizable.
The "mechanical process" of diagonalization
- Find the eigenvalues: solve $$\det(A - \lambda I) = 0$$.
- Find the eigenvectors: for each $$\lambda_i$$, solve $$(A - \lambda_i I)v = 0$$.
- Assemble $$P$$: arrange the linearly independent eigenvectors as columns in matrix $$P$$.
- Obtain $$\Lambda$$: fill in the corresponding eigenvalues along the diagonal: $$\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$$.

Thus, $$A = P\Lambda P^{-1}$$.
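A hedged illustration of this mechanical process (the 2 × 2 matrix below is my own example, chosen so the eigenvalues come out real):

```python
import torch

# An example diagonalizable matrix (my own choice; eigenvalues 1 and 3)
A = torch.tensor([[2.0, 1.0],
                  [1.0, 2.0]])

# Steps 1-2: eigenvalues and eigenvectors
eigvals, eigvecs = torch.linalg.eig(A)   # complex dtype in general
P = eigvecs.real                         # Step 3: eigenvectors as columns of P
Lam = torch.diag(eigvals.real)           # Step 4: eigenvalues on the diagonal

# Verify  A = P Λ P^{-1}
print(torch.allclose(P @ Lam @ torch.linalg.inv(P), A, atol=1e-5))  # True
```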
5. Singular Value Decomposition#
| Symbol | Meaning |
|---|---|
| $$A$$ | Given real symmetric matrix ($$A^{\mathsf T} = A$$) |
| $$Q$$ | Orthogonal matrix: $$Q^{\mathsf T}Q = I$$, column vectors are orthogonal and of unit length |
| $$\Lambda$$ | Diagonal matrix: $$\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$$ |
The expression $$A = Q\Lambda Q^{\mathsf T}$$ is called orthogonal similarity diagonalization; geometrically, it means "rotate (or mirror) the coordinate system → A only retains independent stretching."
- Why can "real symmetric matrices always be orthogonally diagonalized"?
Spectral Theorem:
For any real symmetric matrix $$A$$, there exists an orthogonal matrix $$Q$$ such that $$Q^{\mathsf T}AQ$$ is diagonal, and the diagonal elements are the eigenvalues of $$A$$.
- Real eigenvalues: Symmetry ensures that all eigenvalues are real numbers.
- Orthogonal eigenvectors: If $$\lambda_i \neq \lambda_j$$, the corresponding eigenvectors must be orthogonal.
- Repeated roots can also take orthogonal bases: The same eigenvalue may correspond to multiple vectors; in this case, perform Gram–Schmidt in the subspace they span.
- Step-by-step textual analysis
| Step | Explanation |
|---|---|
| 1. Find all eigenvalues and eigenvectors of $$A$$ | Solve $$\det(A - \lambda I) = 0$$ to obtain all $$\lambda_i$$; for each $$\lambda_i$$, solve $$(A - \lambda_i I)v = 0$$ to find the eigenvectors. |
| 2. Arrange the eigenvalues in a certain order along the diagonal to obtain the diagonal matrix $$\Lambda$$ | For example, arrange them in ascending order as $$\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$$. The order does not matter, as long as it is consistent with the order of the column vectors later. |
| 3. Eigenvectors corresponding to different eigenvalues are orthogonal; for repeated eigenvalues, use Gram–Schmidt to orthogonalize and normalize | If $$\lambda_i \neq \lambda_j$$, the corresponding vectors are naturally orthogonal, no action needed. If $$\lambda$$ is repeated (geometric multiplicity > 1), first take a set of linearly independent eigenvectors, then perform Gram–Schmidt within that subspace to make them orthogonal and normalize them (adjust lengths to 1). |
| 4. Arrange the processed eigenvectors as columns, following the order of the eigenvalues on the diagonal, to obtain the orthogonal matrix $$Q$$ | At this point $$Q^{\mathsf T}Q = I$$ and $$Q^{\mathsf T}AQ = \Lambda$$. |
- A specific small example

① Find the eigenvalues

② Find the eigenvectors

③ Normalize

④ Assemble $$Q$$ and $$\Lambda$$, then verify $$Q^{\mathsf T}AQ = \Lambda$$
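A minimal sketch of the same procedure, using a small symmetric matrix of my own choosing:

```python
import torch

# A small symmetric example matrix (eigenvalues 3 and 5)
A = torch.tensor([[4.0, 1.0],
                  [1.0, 4.0]])

# For symmetric matrices, eigh returns real eigenvalues and an orthogonal Q
eigvals, Q = torch.linalg.eigh(A)
Lam = torch.diag(eigvals)

print(torch.allclose(Q.T @ Q, torch.eye(2), atol=1e-6))  # Q is orthogonal
print(torch.allclose(Q @ Lam @ Q.T, A, atol=1e-5))       # A = Q Λ Q^T
```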
6. Determinant (det · )#
The determinant is an operation that maps an $$n \times n$$ square matrix $$A$$ to a scalar $$\det A$$.
This scalar encapsulates the most essential geometric and algebraic information of the matrix: volume scaling factor, invertibility, product of eigenvalues, etc.
Formula
| Order | Formula |
|---|---|
| 2 × 2 | $$a_{11}a_{22} - a_{12}a_{21}$$ |
| 3 × 3 | "Sarrus' rule" or expand along the first row |

Core properties (any definition must satisfy)
| Property | Explanation |
|---|---|
| Multiplicativity | $$\det(AB) = \det A \cdot \det B$$ |
| Invertibility Criterion | $$A$$ is invertible $$\iff \det A \neq 0$$ |
| Linearity in Rows/Columns | Each row (or column) enters the determinant linearly |
| Alternating | Swapping two rows (or columns) changes the sign of the determinant |
| Diagonal Product | For upper/lower triangular matrices: $$\det A = a_{11}a_{22}\cdots a_{nn}$$ |
| Product of Eigenvalues | $$\det A = \lambda_1\lambda_2\cdots\lambda_n$$ (including multiplicities) |
3×3 Hand Calculation Example
Let $$A$$ be a 3 × 3 matrix and expand along the first row.
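A sketch of the same expansion in code, using an example 3 × 3 matrix of my own (not the one from the note):

```python
import torch

# Example 3x3 matrix
A = torch.tensor([[1.0, 2.0, 3.0],
                  [0.0, 4.0, 5.0],
                  [1.0, 0.0, 6.0]])

# Expansion along the first row: a11*M11 - a12*M12 + a13*M13
det_by_hand = (1.0 * (4 * 6 - 5 * 0)
             - 2.0 * (0 * 6 - 5 * 1)
             + 3.0 * (0 * 0 - 4 * 1))
print(det_by_hand)                 # 22.0
print(torch.linalg.det(A).item())  # 22.0 (up to floating point)
```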
In summary
"Taking the determinant" means: collapsing an square matrix into a single number through a set of alternating, linear rules, and this number encodes key information about the matrix's volume scaling, direction, invertibility, and product of eigenvalues.
7. Rank of a Matrix#
What exactly is the "rank" of a matrix?
Equivalent Perspective | Intuitive Explanation |
---|---|
Linear Independence | The number of linearly independent vectors that can be selected from the rows (or columns) is the rank. |
Dimensionality of Space | The dimension of the subspace spanned by the column vectors (column space) = the dimension of the subspace spanned by the row vectors (row space) = rank. |
Full Rank Minor | The order of the largest non-zero determinant in the matrix = rank. |
Singular Values | In the SVD $$A = U\Sigma V^{\mathsf T}$$, the number of non-zero singular values = rank. |
Linear Independence
Below are three comparative cases using a 3 × 3 small matrix to make the statement "rank = how many linearly independent column (or row) vectors can be selected" clear.
| Case | Linear Relationship Among the Columns $$v_1, v_2, v_3$$ | Rank |
|---|---|---|
| 1 | All three columns lie on the same line—only 1 independent vector | 1 |
| 2 | $$v_1, v_2$$ are not collinear ⇒ they span a 2-dimensional plane; $$v_3$$ lies in this plane | 2 |
| 3 | Any two columns cannot linearly express the third ⇒ all three columns span the entire $$\mathbb R^3$$ | 3 |
How to determine "independence"?
- Manual calculation: combine the columns into a matrix and perform elimination → the number of non-zero rows is the rank.
- Concept: if there exist constants $$c_1, c_2, c_3$$, not all 0, such that $$c_1v_1 + c_2v_2 + c_3v_3 = 0$$, the vectors are dependent; otherwise, they are independent.
  - Case 1: $$2v_1 - v_2 = 0$$ → dependent
  - Case 2: only $$v_3 = v_1 + v_2$$ is dependent, while $$v_1, v_2$$ are independent
  - Case 3: any non-trivial combination ≠ 0 → all three vectors are independent

In summary: rank = how much independent information (dimension) this matrix can truly "hold."
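A quick check with `torch.linalg.matrix_rank`, using example matrices of my own chosen to match the three cases above:

```python
import torch

# Case 1: every column is a multiple of the first (2*v1 = v2, 3*v1 = v3)
A1 = torch.tensor([[1.0, 2.0, 3.0],
                   [2.0, 4.0, 6.0],
                   [3.0, 6.0, 9.0]])
# Case 2: v3 = v1 + v2
A2 = torch.tensor([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0],
                   [0.0, 0.0, 0.0]])
# Case 3: three independent columns
A3 = torch.eye(3)

for A in (A1, A2, A3):
    print(torch.linalg.matrix_rank(A).item())  # 1, 2, 3
```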
8. Low-Rank Approximation#
Why does truncated SVD (low-rank approximation) only require storing $$k(m+n) + k$$ numbers?

When the original matrix $$A \in \mathbb R^{m \times n}$$ is truncated to rank $$k$$, it can be written as
$$A \approx U_k \Sigma_k V_k^{\mathsf T}.$$
| Block | Shape | Number of Scalars to Save | Explanation |
|---|---|---|---|
| $$U_k$$ | $$m \times k$$ | $$mk$$ | Left singular vectors: only the first $$k$$ columns are taken |
| $$V_k$$ | $$n \times k$$ | $$nk$$ | Right singular vectors: similarly, only the first $$k$$ columns |
| $$\Sigma_k$$ | $$k \times k$$ diagonal | $$k$$ | Only the $$k$$ singular values on the diagonal are retained |
Adding the three blocks together gives
$$mk + nk + k = k(m+n) + k.$$

- $$U_k$$ and $$V_k$$: each has $$k$$ columns, with each column storing a vector of length equal to the number of rows ($$m$$ or $$n$$), i.e. $$mk + nk$$ numbers.
- $$\Sigma_k$$: it is a diagonal matrix, so only the $$k$$ diagonal elements are required—not $$k^2$$.

Therefore, using the rank-$$k$$ SVD approximation instead of storing the original matrix, the parameter count drops from $$mn$$ to $$k(m+n) + k$$.
If $$k \ll \min(m, n)$$, the saved space becomes quite considerable.
Lowering rank = reducing information dimension, low-rank storage = reducing parameter count/memory simultaneously
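A minimal sketch of the parameter count, with dimensions chosen arbitrarily by me:

```python
import torch

torch.manual_seed(0)
m, n, k = 100, 80, 10
A = torch.randn(m, n)

U, S, Vh = torch.linalg.svd(A, full_matrices=False)
U_k, S_k, Vh_k = U[:, :k], S[:k], Vh[:k, :]   # keep only the top-k pieces
A_k = U_k @ torch.diag(S_k) @ Vh_k            # rank-k approximation of A

original = m * n
compressed = U_k.numel() + S_k.numel() + Vh_k.numel()
print(original, compressed)   # 8000 vs 1810 = k*(m+n) + k
```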
9. Norms#
The "double vertical bars" $|,\cdot,|$ in linear algebra represent norms.
-
For a vector , the most commonly used is the Euclidean norm:
In the figure, represents the sum of the squares of each component of the vector . -
For a matrix , if also written as , it commonly refers to the Frobenius norm: . However, the figures here involve vectors.
In contrast, the single vertical bar typically represents absolute value (scalar) or determinant . Thus, double vertical bars denote the "length" of vectors/matrices, while single vertical bars denote the magnitude of scalars or determinants—different objects and meanings.
Common Euclidean Distance for Vectors -- 2-Norm (L2 Norm)#
```python
import torch

b = torch.tensor([3.0, 4.0])
print(b.norm())  # Outputs 5.0
```

`.norm()` is a method of PyTorch tensors (`torch.Tensor`).
Common Frobenius Norm for Matrices#
Matrices also have a "length"—the commonly used one is the Frobenius norm:
| Name | Notation | Formula (for $$A \in \mathbb R^{m \times n}$$) | Analogy with Vectors |
|---|---|---|---|
| Frobenius Norm | $$\Vert A\Vert_F$$ | $$\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} A_{ij}^{2}}$$ | Like the vector 2-norm $$\Vert v\Vert = \sqrt{\sum_i v_i^2}$$ |
1. Why can it also be written as a "matrix dot product"?

The commonly used inner product in matrix space is
$$\langle A, B\rangle = \operatorname{tr}(A^{\mathsf T}B),$$
where $$\operatorname{tr}(\cdot)$$ is the trace operation (sum of diagonal elements).
Taking the inner product of $$A$$ with itself gives:
$$\langle A, A\rangle = \operatorname{tr}(A^{\mathsf T}A) = \sum_{i,j} A_{ij}^2.$$
Thus:
$$\|A\|_F = \sqrt{\operatorname{tr}(A^{\mathsf T}A)}.$$
This is the matrix version of $$\|v\| = \sqrt{v \cdot v}$$—just replacing the vector dot product with the "trace dot product."
The Frobenius norm is indeed equal to the square root of the sum of the squares of all singular values, that is:
$$\|A\|_F = \sqrt{\sum_i \sigma_i^2}.$$
Here:

- $$\|A\|_F$$ is the Frobenius norm of matrix $$A$$
- $$\sigma_i$$ are the singular values of $$A$$
Expanded explanation:
The Frobenius norm is defined as:
$$\|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2} = \sqrt{\operatorname{tr}(A^{\mathsf T}A)}.$$
But singular value decomposition (SVD) tells us:
$$A = U\Sigma V^{\mathsf T},$$
where $$\Sigma$$ is a diagonal matrix, with the singular values $$\sigma_i$$ on the main diagonal.
Since the Frobenius norm is orthogonally invariant (orthogonal/unitary transformations do not change the norm), we can directly compute:
$$\|A\|_F^2 = \|U\Sigma V^{\mathsf T}\|_F^2 = \|\Sigma\|_F^2 = \sum_i \sigma_i^2.$$
Thus, ultimately:
$$\|A\|_F = \sqrt{\sum_i \sigma_i^2}.$$
Beware of misconceptions
Note:
✅ Not the square root of a single singular value, nor the maximum singular value
✅ It is the square root of the sum of the squares of all singular values
The spectral norm looks at "the direction that stretches the most," while the Frobenius norm sums all energies.
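A quick numerical sanity check (random matrix of my own):

```python
import torch

torch.manual_seed(0)
A = torch.randn(4, 3)

fro = A.norm()                            # Frobenius norm (default for matrices)
sigmas = torch.linalg.svdvals(A)          # all singular values

print(fro.item())
print(sigmas.pow(2).sum().sqrt().item())  # same value: sqrt(sum of sigma_i^2)
print(sigmas.max().item())                # spectral norm: NOT the same in general
```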
Spectral Norm of a Matrix#
✅ Definition of Spectral Norm
The spectral norm of a matrix $$A$$ is defined as:
$$\|A\|_2 = \max_{\|x\| = 1} \|Ax\|.$$
In simple terms, it is the maximum value to which matrix stretches a unit vector.
Singular values inherently represent the stretching transformations of the matrix.
It equals the maximum singular value of $$A$$:
$$\|A\|_2 = \sigma_{\max}(A).$$
From another perspective: the spectral norm ≈ the maximum length to which a unit vector is stretched after being input into the matrix.
✅ Its relationship with the Frobenius norm
- Frobenius Norm → looks at overall energy (sum of squares of matrix elements)
- Spectral Norm → looks at the maximum amount of stretching in a single direction
In other words:
- Frobenius is like the total "volume" of the matrix
- Spectral norm is like the "most extreme" stretching rate in a single direction
✅ Example: Why is it important?
Imagine a linear layer $$y = Wx$$ in a neural network:

- If $$\|W\|_2$$ is very large, even small perturbations in the input will be amplified, making the network prone to overfitting and sensitive to noise.
- If $$\|W\|_2$$ is moderate, the output changes remain stable under input perturbations, leading to better generalization.

Thus, modern methods (like spectral normalization) directly constrain the spectral norm of $$W$$ within a certain range during training.
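A minimal sketch of the spectral norm and the amplification bound it gives (the weight matrix is a random example of mine, not an actual trained layer):

```python
import torch

torch.manual_seed(0)
W = torch.randn(5, 5)

# Spectral norm = largest singular value
spec = torch.linalg.matrix_norm(W, ord=2)
print(torch.allclose(spec, torch.linalg.svdvals(W).max()))  # True

# It bounds how much any input perturbation can be amplified: ||W dx|| <= ||W||_2 * ||dx||
dx = torch.randn(5)
print((W @ dx).norm().item(), (spec * dx.norm()).item())    # first <= second
```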
⚠ Directly stating the drawbacks
The spectral norm is powerful, but:
- It only focuses on a single maximum direction, ignoring stretching in the other directions;
- It is more expensive to compute than the Frobenius norm (it requires a singular value decomposition rather than a simple element-wise sum of squares).
Summary Comparison
| | Euclidean Norm (2-norm, ‖v‖) | Frobenius Norm (‖A‖_F) |
|---|---|---|
| Object | Vector | Matrix |
| Definition | $$\sqrt{\sum_i v_i^2}$$ | $$\sqrt{\sum_{i,j} A_{ij}^2}$$ |
| Equivalent Expression | $$\sqrt{v^{\mathsf T}v}$$ | $$\sqrt{\operatorname{tr}(A^{\mathsf T}A)}$$ |
| Geometric Meaning | Length of the vector in n-dimensional Euclidean space | Length when viewing the matrix elements as one "long vector" |
| Unit/Scale | Has the same metric as the coordinate axes | Same for matrices; does not depend on the arrangement of rows and columns |
| Common Uses | Error measurement, regularization, distance | Weight decay, matrix approximation error, kernel methods |
| Relationship with Spectral Norm | A vector has only one singular value, equal to its 2-norm | $$\Vert A\Vert_2 \le \Vert A\Vert_F$$; equal if rank = 1 |
- Same idea, different dimensions
- The Euclidean norm is the square root of a vector's dot product with itself: $$\Vert v\Vert = \sqrt{v^{\mathsf T}v}$$.
- The Frobenius norm treats all matrix elements as one long vector and does the same; in matrix language, it can be expressed as $$\Vert A\Vert_F = \sqrt{\operatorname{tr}(A^{\mathsf T}A)}$$. This is "transpose → multiply → take trace."
- When to use which?
Scenario | Recommended Norm | Reason |
---|---|---|
Prediction error, gradient descent | Euclidean (vector residual) | Residuals are naturally column vectors |
Regularization of network weights (Dense / Conv) | Frobenius | Does not care about parameter shape, only overall magnitude |
Comparing matrix approximation quality (SVD, PCA) | Frobenius | Easily corresponds to the sum of squares of singular values |
Stability/Lipschitz bounds | Spectral Norm ($$\Vert A\Vert_2$$) | Concerned with stretching rates rather than total energy |
- Intuitive differences
- Euclidean: measures the length in a single direction;
- Frobenius: measures the total sum of each element's energy, so for matrices no particular column or row is special; all elements are treated equally.
One-sentence memory:
Euclidean Norm: The "ruler" for vectors.
Frobenius Norm: Measures the overall size of a matrix as if "flattened" with the same ruler.
10. Transpose of Matrix Multiplication#
In matrix algebra, there is a fixed "reversal order" rule for the transpose of the product of two (or more) matrices:
$$(AB)^{\mathsf T} = B^{\mathsf T}A^{\mathsf T}.$$
This means: transpose each matrix first, then reverse the order of multiplication.
This property holds for any dimension-matching real (or complex) matrices and can be extended recursively:
$$(A_1 A_2 \cdots A_k)^{\mathsf T} = A_k^{\mathsf T} \cdots A_2^{\mathsf T} A_1^{\mathsf T}.$$
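A one-line check in PyTorch (random example matrices):

```python
import torch

torch.manual_seed(0)
A = torch.randn(2, 3)
B = torch.randn(3, 4)

# (AB)^T = B^T A^T  -- transpose each factor, then reverse the order
print(torch.allclose((A @ B).T, B.T @ A.T))  # True

# Keeping the original order, A.T @ B.T, would not even have matching dimensions here
```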
xLog Editing Markdown Document Notes
- Ensure all mathematical expressions are enclosed in `$$ … $$`
- If there are single `$n \times n$` style expressions, change them to `n × n` or `$$n\times n$$`
Reference video: