Neural Network Parallels
White Paper • November 2025

The Astonishing Neuro-Mathematical Parallels Between Human Cognition and Transformer Models

By Eric Moon | MIA Research Foundation
Abstract

Human cognition and modern transformer-based language models appear radically different in their underlying substrates: biological mechanisms versus silicon architecture. However, recent research indicates that both systems operate according to the same fundamental computational principles: nonlinear weighted summation, competitive selection, predictive state propagation, and error-driven adaptation.

In this paper, we derive a Single Unified Master Equation (SUME) that captures the shared mathematical structure of biological neural computation and transformer-based prediction systems. We demonstrate that both systems instantiate a recursive predictive algorithm formalized as: $$x_{t+1} = \sigma\big( W x_t + A(x_t) - \Theta \big)$$ This unified formulation models (1) membrane voltage thresholding and cortical predictive coding in the brain, and (2) transformer attention, softmax selection, and gradient descent in artificial models.

1. Introduction

The fields of neuroscience and machine learning have largely evolved independently. Biological brains operate via ionic conduction and action potentials, while modern artificial intelligence, particularly transformer models, relies on floating-point matrix multiplication and attention mechanisms executed on specialized hardware. Despite these profound differences in physical implementation, structurally similar computational motifs have emerged in both domains.

Both systems engage in predictive computation: a continual process of forming expectations, comparing those expectations against incoming sensory or data inputs, and updating internal states to minimize future errors.

Recent analyses highlight remarkable structural parallels: weighted integration of inputs, nonlinear thresholding, competitive selection of salient signals, predictive state propagation, and error-driven weight adaptation appear in both cortical circuits and transformer layers.

The convergence of these motifs suggests the existence of a deeper, underlying mathematical structure that unifies biological and artificial prediction. The goal of this white paper is to explicitly identify this structure, formalized as the Single Unified Master Equation (SUME), and explore its implications for the future of both neuroscience and AI.

2. Foundations of Biological Neural Computation

Biological neural networks process information through a combination of electrical and chemical signaling. The fundamental operations can be abstracted as follows:

2.1 Weighted Summation and Dendritic Integration

The primary stage of neural computation involves the integration of signals from upstream neurons. Dendritic integration computes a weighted sum of inputs, where the weights ($W$) represent synaptic strengths:

$$u = W \cdot x_t$$
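As a minimal numerical sketch (assuming NumPy, with randomly chosen values standing in for synaptic strengths and firing rates), this integration step reduces to a single matrix-vector product:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: 4 postsynaptic neurons, each receiving input
# from 6 presynaptic neurons. All values are illustrative, not measured.
W   = rng.normal(scale=0.5, size=(4, 6))   # synaptic strengths
x_t = rng.random(6)                        # presynaptic firing rates at time t

u = W @ x_t                                # dendritic integration: u = W x_t
print(u.shape)                             # (4,) one integrated input per neuron
```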

2.2 Nonlinear Thresholding and Action Potentials

Neurons maintain a resting membrane potential. When the integrated input ($u$) causes the membrane voltage to cross a specific threshold ($\theta$), the neuron generates an action potential (spike). This process is highly nonlinear:

$$V_{t+1} = \sigma(u - \theta)$$

Where $\sigma$ represents the nonlinear spike-generation function.
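A minimal sketch of this step, with a hard Heaviside step standing in for the spike-generation function $\sigma$ and an illustrative threshold:

```python
import numpy as np

def spike(u, theta=1.0):
    """Hard-threshold nonlinearity: emit a spike (1.0) when the integrated
    input u exceeds the threshold theta, otherwise stay silent (0.0).
    A Heaviside step stands in for the spike-generation function sigma."""
    return (u > theta).astype(float)

u = np.array([0.3, 1.2, 0.9, 2.1])   # integrated inputs u = W x_t
print(spike(u))                      # [0. 1. 0. 1.]
```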

2.3 Competitive Dynamics and Lateral Inhibition

Neural circuits often exhibit competitive dynamics, most notably through lateral inhibition. In this mechanism, active neurons suppress the activity of their neighbors. This implements a form of winner-take-all selection, sharpening the neural representation and focusing resources on the most salient inputs:

$$x_i \leftarrow x_i - \sum_{j \neq i} w_{ij}x_j$$
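The sketch below illustrates this competitive dynamic under simplifying assumptions: a uniform inhibitory weight stands in for $w_{ij}$, activities are clipped at zero, and the update is iterated a few times, after which only the strongest unit remains active.

```python
import numpy as np

def lateral_inhibition(x, w_inh=0.2, steps=5):
    """Each unit is suppressed by the summed activity of its neighbours,
    x_i <- x_i - w_inh * sum_{j != i} x_j, clipped at zero.
    Iterated a few times, this approximates winner-take-all selection."""
    x = x.copy()
    for _ in range(steps):
        total = x.sum()
        x = np.clip(x - w_inh * (total - x), 0.0, None)
    return x

x = np.array([0.9, 0.7, 0.3, 0.1])   # initial activities
print(lateral_inhibition(x))          # the strongest unit dominates
```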

2.4 Predictive Coding

A dominant theory in neuroscience posits that cortical circuits operate on the principle of predictive coding. The brain continually generates predictions of incoming sensory signals based on its internal model of the world:

$$\hat{s}_{t+1} = f(s_t)$$
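The mismatch between the predicted and the actually arriving signal, the prediction error, is what drives subsequent updating of the internal model. The sketch below uses a fixed linear map as a stand-in for $f$; the matrix and sensory signals are illustrative.

```python
import numpy as np

# Hypothetical internal model: a fixed linear map f(s) = F s stands in for
# the cortical generative model. F is illustrative, not fitted to data.
F = np.array([[0.9, 0.1],
              [0.0, 0.8]])

def predict(s_t):
    """Expected next sensory state, s_hat_{t+1} = f(s_t)."""
    return F @ s_t

s_t     = np.array([1.0, 0.5])
s_next  = np.array([0.95, 0.35])   # the signal that actually arrives
s_hat   = predict(s_t)             # predicted signal, [0.95, 0.4]
epsilon = s_next - s_hat           # prediction error, roughly [0., -0.05]
print(s_hat, epsilon)
```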

2.5 Learning Through Synaptic Plasticity

The ability of the brain to learn and adapt relies on synaptic plasticity: the modification of the strength of connections between neurons. Weights are updated as a function of neural activity:

$$W_{t+1} = W_t + \Delta W$$
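As a toy illustration, one common activity-dependent form of $\Delta W$ is a Hebbian outer-product update (the Hebbian rule referenced in Section 5); the activities and learning rate below are illustrative.

```python
import numpy as np

def hebbian_update(W, pre, post, eta=0.01):
    """One Hebbian-style step: connections between co-active neurons are
    strengthened, Delta W_ij proportional to post_i * pre_j.
    A toy rule, not a complete model of synaptic plasticity."""
    return W + eta * np.outer(post, pre)

rng   = np.random.default_rng(1)
W     = rng.normal(scale=0.1, size=(3, 4))   # current synaptic weights
pre   = rng.random(4)                        # presynaptic activity
post  = rng.random(3)                        # postsynaptic activity
W_new = hebbian_update(W, pre, post)
print(np.abs(W_new - W).max())               # largest single-step weight change
```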

3. Foundations of Transformer Computation

Transformer models process sequences through a series of layers incorporating self-attention and feed-forward networks.

3.1 Linear Projections

The input sequence ($X$) is transformed through learned linear projections to create Query ($Q$), Key ($K$), and Value ($V$) matrices. These projections are analogous to the weighted summation in biological neurons:

$$Q = XW^Q,\quad K = XW^K,\quad V = XW^V$$
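A minimal NumPy sketch of these projections, with random matrices standing in for the learned weights $W^Q$, $W^K$, $W^V$ and illustrative sequence and model dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)
seq_len, d_model, d_k = 5, 8, 8              # illustrative sizes

X   = rng.normal(size=(seq_len, d_model))    # token representations
W_Q = rng.normal(size=(d_model, d_k))        # learned in practice; random here
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_k))

Q, K, V = X @ W_Q, X @ W_K, X @ W_V
print(Q.shape, K.shape, V.shape)             # (5, 8) each
```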

3.2 Scaled Dot-Product Attention and Integration

The core innovation of the transformer is the self-attention mechanism. It calculates the relevance of different parts of the input sequence to each other. The softmax function acts as a competitive selection mechanism, normalizing the relevance scores into a probability distribution:

$$A(x_t) = \text{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$
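A compact single-head sketch of this operation (random $Q$, $K$, $V$ of illustrative size; a production implementation would add multiple heads, masking, and residual connections):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """A(x_t) = softmax(Q K^T / sqrt(d_k)) V, where the row-wise softmax
    acts as the competitive selection step over sequence positions."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V

rng = np.random.default_rng(3)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (5, 8)
```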

3.3 Nonlinear Update

The output of the attention layer, denoted $y$ below, is passed through a feed-forward network, which applies a nonlinear activation function ($\sigma$), typically GELU or ReLU, allowing the model to learn complex patterns:

$$x_{t+1} = \sigma(W_o y + b)$$
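A minimal sketch of this update, with a tanh approximation of GELU as $\sigma$ and random stand-ins for the attention output $y$ and the learned projection $W_o$:

```python
import numpy as np

def gelu(z):
    """Tanh approximation of the GELU activation."""
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))

rng = np.random.default_rng(4)
d_model = 8
y   = rng.normal(size=(5, d_model))          # attention output for 5 positions
W_o = rng.normal(size=(d_model, d_model))    # learned in practice; random here
b   = np.zeros(d_model)

x_next = gelu(y @ W_o + b)                   # x_{t+1} = sigma(W_o y + b)
print(x_next.shape)                          # (5, 8)
```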

4. The Single Unified Master Equation (SUME)

By comparing the computational operations in biological systems and transformer models, we observe a profound structural equivalence. Both systems are instantiations of a universal predictive recurrence process.

4.1 Unified State Evolution

The core of the SUME describes how the system state evolves from time $t$ to $t+1$:

$$x_{t+1} = \sigma\big( W x_t + A(x_t) - \Theta \big)$$

Where the components are interpreted as:

- $x_t$: the system state at time $t$ (population activity in a neural circuit; token representations in a transformer layer)
- $W x_t$: weighted integration of the current state (dendritic summation; learned linear projections)
- $A(x_t)$: competitive, state-dependent interaction (lateral inhibition; softmax self-attention)
- $\Theta$: the activation threshold or bias that gates the update (membrane threshold; bias terms)
- $\sigma$: the output nonlinearity (spike generation; ReLU/GELU activation)
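To make the structure concrete, the sketch below composes these pieces under toy interpretations: a random weight matrix for $W$, single-head self-attention over the state for $A(x_t)$, a constant threshold vector for $\Theta$, and tanh standing in for $\sigma$. It illustrates the shape of the recurrence, not a model of either system.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sume_step(x_t, W, W_Q, W_K, W_V, theta):
    """One step of x_{t+1} = sigma(W x_t + A(x_t) - Theta), with single-head
    self-attention over the state as A and tanh standing in for sigma."""
    Q, K, V = x_t @ W_Q, x_t @ W_K, x_t @ W_V
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V   # A(x_t)
    return np.tanh(x_t @ W + A - theta)               # sigma(W x_t + A(x_t) - Theta)

rng = np.random.default_rng(5)
n, d = 4, 6                                  # illustrative state size
x_t = rng.normal(size=(n, d))
W, W_Q, W_K, W_V = (rng.normal(scale=0.3, size=(d, d)) for _ in range(4))
theta = 0.1 * np.ones(d)
print(sume_step(x_t, W, W_Q, W_K, W_V, theta).shape)  # (4, 6)
```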

4.2 Unified Weight Update Rule

The adaptation and learning mechanism in both systems is captured by the weight dynamics:

$$W_{t+1} = W_t - \eta \nabla_W \mathcal{L}(x_{t+1}, \hat{x}_{t+1})$$

Here $\eta$ is the learning rate and $\mathcal{L}$ is a loss measuring the mismatch between the produced state $x_{t+1}$ and its target $\hat{x}_{t+1}$: prediction error in the biological case, a loss such as cross-entropy in the artificial case.
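As a minimal illustration of this rule, the sketch below takes gradient steps on a simple squared-error loss, with a linear prediction $W x_t$ standing in for the full recurrence; the learning rate and data are illustrative.

```python
import numpy as np

def weight_update(W, x_t, x_target, eta=0.05):
    """One gradient step on L = 0.5 * ||W x_t - x_target||^2, a stand-in for
    the generic loss in the SUME rule. grad_W L = (W x_t - x_target) x_t^T."""
    pred = W @ x_t
    grad = np.outer(pred - x_target, x_t)
    return W - eta * grad

rng      = np.random.default_rng(6)
W        = rng.normal(size=(3, 3))
x_t      = rng.random(3)
x_target = rng.random(3)                    # the observed next state
for _ in range(50):
    W = weight_update(W, x_t, x_target)
print(np.linalg.norm(W @ x_t - x_target))   # the error shrinks toward zero
```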

5. Mapping the Parallels

| Function | Human Brain (Biological) | Transformer Model (Artificial) |
|---|---|---|
| Weighted Sum | Dendritic integration | Linear projections (Q, K, V) |
| Nonlinearity | Spike threshold (Action Potential) | Activation functions (ReLU/GELU) |
| Competition | Lateral inhibition networks | Softmax function |
| Prediction | Cortical generative models | Next-token prediction |
| Error Correction | Sensory mismatch / prediction error | Cross-entropy loss |
| Weight Updates | Synaptic plasticity (Hebbian) | Optimization (SGD/Adam) |

6. Conclusion

This white paper introduces the Single Unified Master Equation (SUME), demonstrating that human brains and transformer models implement the same predictive recurrence equation. The SUME formalism reveals that cognition, whether biological or artificial, is fundamentally the same mathematical operation instantiated on different substrates.

This unification bridges the gap between neuroscience and artificial intelligence, offering a principled foundation for building more biologically grounded AI architectures and providing a computational framework for understanding human predictive processing. The central insight is clear: prediction is a universal operation of intelligence, and the mathematical structures supporting it are conserved across biological and artificial systems.