KAN explained

This post aims to demystify one of the most crucial components of Kolmogorov-Arnold Networks (KAN) - the B-spline activation function. While traditional neural networks rely on fixed activation functions like ReLU or sigmoid, KAN leverages the flexibility of B-splines to create adaptive, learnable activations. This approach is not just a minor architectural variation but a fundamental rethinking of how neural networks transform information. By understanding B-splines and their role in KAN, we can better grasp why these networks achieve impressive expressivity with theoretical guarantees. The interactive visualization below enables hands-on exploration of how B-splines work and how adjusting their parameters affects the resulting transformations.

KAN B-Spline Activation Function Visualization

KAN uses B-spline functions as the activation functions on its edges, creating flexible, learnable mappings between inputs and outputs. This post walks through B-spline basis functions and their role in KAN’s edge mappings, with an interactive visualization.

Mathematical Foundation

The activation function in KAN can be represented as:

\[\phi(x) = x + \sum_{k} s_k B_k(x)\]

Where:

  • \(x\) is the input; the leading \(x\) term acts as an identity (residual) mapping
  • \(s_k\) are learnable coefficients
  • \(B_k(x)\) are B-spline basis functions of degree \(p\)
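To make this concrete, here is a minimal numerical sketch of \(\phi(x)\) using scipy.interpolate.BSpline, which evaluates exactly the weighted sum \(\sum_k s_k B_k(x)\) for a given knot vector and degree. The knot vector, degree, and coefficient values below are illustrative choices for the demo, not values prescribed by KAN.

```python
import numpy as np
from scipy.interpolate import BSpline

# Illustrative setup: cubic splines (p = 3) on a clamped knot vector over [-1, 1].
p = 3
interior = np.linspace(-1.0, 1.0, 6)
knots = np.concatenate(([-1.0] * p, interior, [1.0] * p))   # 12 knots -> 8 basis functions
s = np.array([0.5, -0.3, 0.8, -0.1, 0.2, 0.6, -0.4, 0.1])   # coefficients s_k (learnable in KAN, fixed here)

spline = BSpline(knots, s, p)   # spline(x) == sum_k s_k * B_k(x)

def phi(x):
    """KAN-style edge activation: identity term plus the spline term."""
    return x + spline(x)

print(phi(np.linspace(-1.0, 1.0, 5)))
```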

B-spline basis functions are defined recursively using the Cox-de Boor recursion formula:

For \(p = 0\) (zeroth degree):

\[B_{i,0}(u) = \begin{cases} 1 & \text{if } u_i \leq u < u_{i+1} \\ 0 & \text{otherwise} \end{cases}\]

For \(p > 0\) (higher degrees):

\[B_{i,p}(u) = \frac{u - u_i}{u_{i+p} - u_i} \cdot B_{i,p-1}(u) + \frac{u_{i+p+1} - u}{u_{i+p+1} - u_{i+1}} \cdot B_{i+1,p-1}(u)\]

The knot vector \(u = [u_0, u_1, ..., u_m]\) partitions the domain and determines the shape and support of each basis function.
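The recursion translates almost line for line into code. Below is a minimal NumPy sketch that evaluates \(B_{i,p}(u)\) directly from the two formulas above, using the usual convention that a term with a zero-length knot span contributes zero; the uniform knot vector at the end is just an illustrative example.

```python
import numpy as np

def bspline_basis(i, p, u, knots):
    """Evaluate the basis function B_{i,p}(u) via the Cox-de Boor recursion."""
    if p == 0:
        # Zeroth degree: indicator of the half-open knot interval [u_i, u_{i+1}).
        return np.where((knots[i] <= u) & (u < knots[i + 1]), 1.0, 0.0)

    # Higher degrees: blend the two lower-degree basis functions.
    # Convention: a term whose denominator is zero (repeated knots) contributes zero.
    left_den = knots[i + p] - knots[i]
    right_den = knots[i + p + 1] - knots[i + 1]
    left = 0.0 if left_den == 0.0 else (u - knots[i]) / left_den * bspline_basis(i, p - 1, u, knots)
    right = 0.0 if right_den == 0.0 else (knots[i + p + 1] - u) / right_den * bspline_basis(i + 1, p - 1, u, knots)
    return left + right

# Illustrative example: quadratic (p = 2) basis functions on a uniform knot vector over [0, 1].
knots = np.linspace(0.0, 1.0, 8)                 # u_0, ..., u_7
u = np.linspace(0.0, 1.0, 201)
p = 2
num_basis = len(knots) - p - 1                   # 5 basis functions B_0, ..., B_4
basis_values = [bspline_basis(i, p, u, knots) for i in range(num_basis)]
```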

Interactive Visualization

[Interactive widget: sliders for the control coefficients \(s_k\); a chart of the KAN edge mapping function \(\phi(x) = x + \sum_{k} s_k B_k(x)\); a chart of the B-spline basis functions \(B_k(x)\); and the knot vector.]

Instructions:

1. This visualization demonstrates the KAN activation function \(\phi(x) = x + \sum_{k} s_k B_k(x)\).

2. The top chart shows the total mapping function \(\phi(x)\) (red line) and each weighted basis function \(s_k B_k(x)\) (colored lines).

3. The bottom chart shows each B-spline basis function \(B_k(x)\).

4. Drag the sliders on the left to adjust the coefficients \(s_k\) and observe how they affect the total mapping function.

5. Click on any coefficient to highlight the corresponding basis function.
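For readers without the interactive widget at hand, a rough static version of the two charts can be reproduced with matplotlib. This is only a sketch: the quadratic degree, uniform knot vector, and coefficient values are arbitrary demo choices, and the helper uses scipy's BSpline.basis_element rather than the widget's own code.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import BSpline

# Illustrative quadratic (p = 2) basis on a uniform knot vector over [0, 1].
p = 2
knots = np.linspace(0.0, 1.0, 8)
num_basis = len(knots) - p - 1                   # 5 basis functions
s = np.array([0.6, -0.4, 0.9, -0.2, 0.3])        # demo coefficients s_k
x = np.linspace(0.0, 1.0, 400)

# Each basis function B_k is built from the p + 2 knots that support it.
basis = [BSpline.basis_element(knots[k:k + p + 2], extrapolate=False) for k in range(num_basis)]
B = [np.nan_to_num(b(x)) for b in basis]         # zero outside each function's support

fig, (top, bottom) = plt.subplots(2, 1, figsize=(7, 6), sharex=True)

# Top chart: total mapping phi(x) = x + sum_k s_k B_k(x), plus each weighted term.
phi = x + sum(s[k] * B[k] for k in range(num_basis))
top.plot(x, phi, color="red", linewidth=2, label=r"$\phi(x)$")
for k in range(num_basis):
    top.plot(x, s[k] * B[k], linestyle="--", label=rf"$s_{k} B_{k}(x)$")
top.set_title("KAN edge mapping")
top.legend(fontsize=8, ncol=2)

# Bottom chart: the unweighted basis functions B_k(x).
for k in range(num_basis):
    bottom.plot(x, B[k], label=rf"$B_{k}(x)$")
bottom.set_title("B-spline basis functions")
bottom.set_xlabel("x")
bottom.legend(fontsize=8, ncol=2)

plt.tight_layout()
plt.show()
```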

Key Insights

  1. Flexibility through B-splines: KAN achieves its flexibility by using B-spline basis functions, which provide a smooth and controllable way to transform inputs.

  2. Learnable Parameters: The coefficients (\(s_k\)) are the learnable parameters in KAN, determining how each basis function contributes to the overall transformation.

  3. Locality Property: B-spline basis functions have local support, meaning each one affects only a limited portion of the input domain. This property helps KAN learn complex functions efficiently.

  4. Degree Control: Higher-degree B-splines provide smoother functions but with wider support, while lower-degree splines offer more localized control; the short numerical check after this list makes this trade-off concrete.

  5. Theoretical Foundation: The universal approximation capabilities of KANs are based on the Kolmogorov-Arnold representation theorem, which states that any multivariate continuous function can be represented as a composition of continuous functions of one variable and addition:

\[f(x_1, x_2, ..., x_n) = \sum_{q=0}^{2n} \Phi_q\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right)\]

  6. Relation to Traditional Neural Networks: Unlike traditional neural networks that use fixed activation functions (like ReLU or sigmoid), KAN uses these adaptive B-spline activations to create more expressive transformations with theoretical guarantees.
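As a quick numerical check of the locality and degree-control points above (items 3 and 4), the snippet below reuses the bspline_basis helper, the knot vector knots, and the grid u from the Cox-de Boor sketch in the Mathematical Foundation section: each \(B_{i,p}\) is nonzero only on the \(p+1\) knot intervals \([u_i, u_{i+p+1})\), so raising the degree widens every basis function's support.

```python
# Quick check of local support, reusing bspline_basis, knots, and u from the
# Cox-de Boor sketch above: B_{i,p} is nonzero only on [u_i, u_{i+p+1}).
for p in (1, 2, 3):
    for i in range(len(knots) - p - 1):
        values = bspline_basis(i, p, u, knots)
        nonzero = u[values > 1e-12]
        print(f"p={p}, B_{i}: nonzero samples in [{nonzero.min():.2f}, {nonzero.max():.2f}],"
              f" support [{knots[i]:.2f}, {knots[i + p + 1]:.2f})")
```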

This interactive visualization helps clarify how B-splines contribute to the expressiveness of KAN models and how adjusting their parameters affects the resulting mapping function.

Written on April 20, 2025