KAN explained

This post aims to demystify one of the most crucial components of Kolmogorov-Arnold Networks (KAN) - the B-spline activation function. While traditional neural networks rely on fixed activation functions like ReLU or sigmoid, KAN leverages the flexibility of B-splines to create adaptive, learnable activations. This approach is not just a minor architectural variation but a fundamental rethinking of how neural networks transform information. By understanding B-splines and their role in KAN, we can better grasp why these networks achieve impressive expressivity with theoretical guarantees. The interactive visualization below enables hands-on exploration of how B-splines work and how adjusting their parameters affects the resulting transformations.

KAN B-Spline Activation Function Visualization

KAN uses B-spline functions as the activation functions on its edges, creating flexible, learnable mappings between inputs and outputs. This post walks through B-spline basis functions and their role in KAN’s edge mappings, with an interactive visualization.

Mathematical Foundation

The activation function in KAN can be represented as:

\[\phi(x) = x + \sum_{k} s_k B_k(x)\]

Where:

  • \(x\) is the input; the leading \(x\) term acts as an identity (residual) mapping
  • \(s_k\) are learnable coefficients
  • \(B_k(x)\) are B-spline basis functions of degree \(p\)
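To make this concrete, here is a minimal numerical sketch of \(\phi(x)\) using scipy.interpolate.BSpline, which evaluates exactly the weighted sum \(\sum_k s_k B_k(x)\) for a given knot vector and degree. The knot vector, degree, and coefficient values below are illustrative choices for the demo, not values prescribed by KAN.

```python
import numpy as np
from scipy.interpolate import BSpline

# Illustrative setup: cubic splines (p = 3) on a clamped knot vector over [-1, 1].
p = 3
interior = np.linspace(-1.0, 1.0, 6)
knots = np.concatenate(([-1.0] * p, interior, [1.0] * p))   # 12 knots -> 8 basis functions
s = np.array([0.5, -0.3, 0.8, -0.1, 0.2, 0.6, -0.4, 0.1])   # coefficients s_k (learnable in KAN, fixed here)

spline = BSpline(knots, s, p)   # spline(x) == sum_k s_k * B_k(x)

def phi(x):
    """KAN-style edge activation: identity term plus the spline term."""
    return x + spline(x)

print(phi(np.linspace(-1.0, 1.0, 5)))
```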

B-spline basis functions are defined recursively using the Cox-de Boor recursion formula:

For \(p = 0\) (zeroth degree):

\[B_{i,0}(u) = \begin{cases} 1 & \text{if } u_i \leq u < u_{i+1} \\ 0 & \text{otherwise} \end{cases}\]

For \(p > 0\) (higher degrees):

\[B_{i,p}(u) = \frac{u - u_i}{u_{i+p} - u_i} \cdot B_{i,p-1}(u) + \frac{u_{i+p+1} - u}{u_{i+p+1} - u_{i+1}} \cdot B_{i+1,p-1}(u)\]

The knot vector \(u = [u_0, u_1, ..., u_m]\) partitions the domain and determines the shape and support of each basis function.
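The recursion translates almost line for line into code. Below is a minimal NumPy sketch that evaluates \(B_{i,p}(u)\) directly from the two formulas above, using the usual convention that a term with a zero-length knot span contributes zero; the uniform knot vector at the end is just an illustrative example.

```python
import numpy as np

def bspline_basis(i, p, u, knots):
    """Evaluate the basis function B_{i,p}(u) via the Cox-de Boor recursion."""
    if p == 0:
        # Zeroth degree: indicator of the half-open knot interval [u_i, u_{i+1}).
        return np.where((knots[i] <= u) & (u < knots[i + 1]), 1.0, 0.0)

    # Higher degrees: blend the two lower-degree basis functions.
    # Convention: a term whose denominator is zero (repeated knots) contributes zero.
    left_den = knots[i + p] - knots[i]
    right_den = knots[i + p + 1] - knots[i + 1]
    left = 0.0 if left_den == 0.0 else (u - knots[i]) / left_den * bspline_basis(i, p - 1, u, knots)
    right = 0.0 if right_den == 0.0 else (knots[i + p + 1] - u) / right_den * bspline_basis(i + 1, p - 1, u, knots)
    return left + right

# Illustrative example: quadratic (p = 2) basis functions on a uniform knot vector over [0, 1].
knots = np.linspace(0.0, 1.0, 8)                 # u_0, ..., u_7
u = np.linspace(0.0, 1.0, 201)
p = 2
num_basis = len(knots) - p - 1                   # 5 basis functions B_0, ..., B_4
basis_values = [bspline_basis(i, p, u, knots) for i in range(num_basis)]
```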

Interactive Visualization

[Interactive widget: sliders for the control coefficients \(s_k\); a chart of the KAN edge mapping function \(\phi(x) = x + \sum_{k} s_k B_k(x)\); a chart of the B-spline basis functions \(B_k(x)\); and the knot vector.]

Instructions:

1. This visualization demonstrates the KAN activation function \(\phi(x) = x + \sum_{k} s_k B_k(x)\).

2. The top chart shows the total mapping function \(\phi(x)\) (red line) and each weighted basis function \(s_k B_k(x)\) (colored lines).

3. The bottom chart shows each B-spline basis function \(B_k(x)\).

4. Drag the sliders on the left to adjust the coefficients \(s_k\) and observe how they affect the total mapping function.

5. Click on any coefficient to highlight the corresponding basis function.
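For readers without the interactive widget at hand, a rough static version of the two charts can be reproduced with matplotlib. This is only a sketch: the quadratic degree, uniform knot vector, and coefficient values are arbitrary demo choices, and the helper uses scipy's BSpline.basis_element rather than the widget's own code.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import BSpline

# Illustrative quadratic (p = 2) basis on a uniform knot vector over [0, 1].
p = 2
knots = np.linspace(0.0, 1.0, 8)
num_basis = len(knots) - p - 1                   # 5 basis functions
s = np.array([0.6, -0.4, 0.9, -0.2, 0.3])        # demo coefficients s_k
x = np.linspace(0.0, 1.0, 400)

# Each basis function B_k is built from the p + 2 knots that support it.
basis = [BSpline.basis_element(knots[k:k + p + 2], extrapolate=False) for k in range(num_basis)]
B = [np.nan_to_num(b(x)) for b in basis]         # zero outside each function's support

fig, (top, bottom) = plt.subplots(2, 1, figsize=(7, 6), sharex=True)

# Top chart: total mapping phi(x) = x + sum_k s_k B_k(x), plus each weighted term.
phi = x + sum(s[k] * B[k] for k in range(num_basis))
top.plot(x, phi, color="red", linewidth=2, label=r"$\phi(x)$")
for k in range(num_basis):
    top.plot(x, s[k] * B[k], linestyle="--", label=rf"$s_{k} B_{k}(x)$")
top.set_title("KAN edge mapping")
top.legend(fontsize=8, ncol=2)

# Bottom chart: the unweighted basis functions B_k(x).
for k in range(num_basis):
    bottom.plot(x, B[k], label=rf"$B_{k}(x)$")
bottom.set_title("B-spline basis functions")
bottom.set_xlabel("x")
bottom.legend(fontsize=8, ncol=2)

plt.tight_layout()
plt.show()
```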

Key Insights

  1. Flexibility through B-splines: KAN achieves its flexibility by using B-spline basis functions, which provide a smooth and controllable way to transform inputs.

  2. Learnable Parameters: The coefficients (\(s_k\)) are the learnable parameters in KAN, determining how each basis function contributes to the overall transformation.

  3. Locality Property: B-spline basis functions have local support, meaning each one affects only a limited portion of the input domain. This property helps KAN learn complex functions efficiently.

  4. Degree Control: Higher-degree B-splines provide smoother functions but with wider support, while lower-degree splines offer more localized control; the short numerical check after this list makes this trade-off concrete.

  5. Theoretical Foundation: The universal approximation capabilities of KANs are based on the Kolmogorov-Arnold representation theorem, which states that any multivariate continuous function can be represented as a composition of continuous functions of one variable and addition:

\[f(x_1, x_2, ..., x_n) = \sum_{q=0}^{2n} \Phi_q\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right)\]

  6. Relation to Traditional Neural Networks: Unlike traditional neural networks that use fixed activation functions (like ReLU or sigmoid), KAN uses these adaptive B-spline activations to create more expressive transformations with theoretical guarantees.
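As a quick numerical check of the locality and degree-control points above (items 3 and 4), the snippet below reuses the bspline_basis helper, the knot vector knots, and the grid u from the Cox-de Boor sketch in the Mathematical Foundation section: each \(B_{i,p}\) is nonzero only on the \(p+1\) knot intervals \([u_i, u_{i+p+1})\), so raising the degree widens every basis function's support.

```python
# Quick check of local support, reusing bspline_basis, knots, and u from the
# Cox-de Boor sketch above: B_{i,p} is nonzero only on [u_i, u_{i+p+1}).
for p in (1, 2, 3):
    for i in range(len(knots) - p - 1):
        values = bspline_basis(i, p, u, knots)
        nonzero = u[values > 1e-12]
        print(f"p={p}, B_{i}: nonzero samples in [{nonzero.min():.2f}, {nonzero.max():.2f}],"
              f" support [{knots[i]:.2f}, {knots[i + p + 1]:.2f})")
```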

This interactive visualization helps clarify how B-splines contribute to the expressiveness of KAN models and how adjusting their parameters affects the resulting mapping function.

Written on April 20, 2025