Novel Architecture Makes Neural Networks More Understandable

135

“Neural networks are currently the most powerful tools in artificial intelligence,” said Sebastian Wetzel, a researcher at the Perimeter Institute for Theoretical Physics. “When we scale them up to larger data sets, nothing can compete.”

And yet, all this time, neural networks have had a disadvantage. The basic building block of many of today’s successful networks is known as a multilayer perceptron, or MLP. But despite a string of successes, humans just can’t understand how networks built on these MLPs arrive at their conclusions, or whether there may be some underlying principle that explains those results. The amazing feats that neural networks perform, like those of a magician, are kept secret, hidden behind what’s commonly called a black box.

AI researchers have long wondered if it’s possible for a different kind of network to deliver similarly reliable results in a more transparent way.

An April 2024 study introduced an alternative neural network design, called a Kolmogorov-Arnold network (KAN), that is more transparent yet can also do almost everything a regular neural network can for a certain class of problems. It’s based on a mathematical idea from the mid-20th century that has been rediscovered and reconfigured for deployment in the deep learning era.

Although this innovation is just a few months old, the new design has already attracted widespread interest within research and coding communities. “KANs are more interpretable and may be particularly useful for scientific applications where they can extract scientific rules from data,” said Alan Yuille, a computer scientist at Johns Hopkins University. “[They’re] an exciting, novel alternative to the ubiquitous MLPs.” And researchers are already learning to make the most of their newfound powers.

Fitting the Impossible

A typical neural network works like this: Layers of artificial neurons (or nodes) connect to each other using artificial synapses (or edges). Information passes through each layer, where it is processed and transmitted to the next layer, until it eventually becomes an output. The edges are weighted, so that those with greater weights have more influence than others. During a period known as training, these weights are continually tweaked to get the network’s output closer and closer to the right answer.

A common objective for neural networks is to find a mathematical function, or curve, that best connects certain data points. The closer the network can get to that function, the better its predictions and the more accurate its results. If your neural network models some physical process, the output function will ideally represent an equation describing the physics — the equivalent of a physical law.

For MLPs, there’s a mathematical theorem that tells you how close a network can get to the best possible function. One consequence of this theorem is that an MLP cannot perfectly represent that function.

But KANs, in the right circumstances, can.

KANs go about function fitting — connecting the dots of the network’s output — in a fundamentally different way than MLPs. Instead of relying on edges with numerical weights, KANs use functions. These edge functions are nonlinear, meaning they can represent more complicated curves. They’re also learnable, so they can be tweaked with far greater sensitivity than the simple numerical weights of MLPs.

Yet for the past 35 years, KANs were thought to be fundamentally impractical. A 1989 paper co-authored by Tomaso Poggio, a physicist turned computational neuroscientist at the Massachusetts Institute of Technology, explicitly stated that the mathematical idea at the heart of a KAN is “irrelevant in the context of networks for learning.”

One of Poggio’s concerns goes back to the mathematical concept at the heart of a KAN. In 1957, the mathematicians Andrey Kolmogorov and Vladimir Arnold showed — in separate though complementary papers — that if you have a single mathematical function that uses many variables, you can transform it into a combination of many functions that each have a single variable.

There’s an important catch, however. The single-variable functions the theorem spits out might not be “smooth,” meaning they can have sharp edges like the vertex of a V. That’s a problem for any network that tries to use the theorem to re-create the multivariable function. The simpler, single-variable pieces need to be smooth so that they can learn to bend the right way during training, in order to match the target values.

So KANs looked like a dim prospect — until a cold day this past January, when Ziming Liu, a physics graduate student at MIT, decided to revisit the subject. He and his adviser, the MIT physicist Max Tegmark, had been working on making neural networks more understandable for scientific applications — hoping to offer a peek inside the black box — but things weren’t panning out. In an act of desperation, Liu decided to look into the Kolmogorov-Arnold theorem. “Why not just try it and see how it works, even if people hadn’t given it much attention it in the past?” he asked.

Previous articleHow ‘Embeddings’ Encode What Words Mean — Sort Of
Next articleComputer Scientists Prove That Heat Destroys Quantum Entanglement