What Makes Versor Unique: Code Examples¶
Ten innovations that distinguish Versor from standard deep learning frameworks, each illustrated with actual source code.
1. Signature-Aware Exponential Map¶
What it does: A single vectorized formula exponentiates bivectors across all three metric regimes — elliptic (cos/sin for Euclidean rotations), hyperbolic (cosh/sinh for Lorentz boosts), and parabolic (1+B for null dimensions) — with no branching per element.
How standard DL differs: Separate rotation matrix construction, no mixed-signature support.
From core/algebra.py — CliffordAlgebra._exp_bivector_closed:
# Signed squared norm: alpha = Sum_k b_k^2 . (e_k)^2
# alpha < 0 -> elliptic (Euclidean-like), alpha > 0 -> hyperbolic
alpha = (bv_coeffs * bv_coeffs * bv_sq).sum(dim=-1, keepdim=True)
abs_alpha = alpha.abs().clamp(min=1e-12)
theta = torch.sqrt(abs_alpha)
# Elliptic branch: cos(theta) and sin(theta)/theta
cos_theta = torch.cos(theta)
sinc_theta = torch.where(
theta > 1e-7,
torch.sin(theta) / theta,
1.0 - abs_alpha / 6.0,
)
# Hyperbolic branch: cosh(theta) and sinh(theta)/theta
cosh_theta = torch.cosh(theta)
sinhc_theta = torch.where(
theta > 1e-7,
torch.sinh(theta) / theta,
1.0 + abs_alpha / 6.0,
)
# Select branch based on sign of alpha
is_elliptic = alpha < -1e-12
is_hyperbolic = alpha > 1e-12
scalar_part = torch.where(
is_elliptic, cos_theta,
torch.where(is_hyperbolic, cosh_theta, torch.ones_like(theta))
)
coeff_part = torch.where(
is_elliptic, sinc_theta,
torch.where(is_hyperbolic, sinhc_theta, torch.ones_like(theta))
)
result = coeff_part * B
result[..., 0] = scalar_part.squeeze(-1)
Zero geometric products. Exact for simple bivectors in any Cl(p,q,r).
2. Rotor Sandwich Product¶
What it does: Learns bivector parameters B, computes R = exp(-B/2), and applies the isometry x' = RxR~ — a pure geometric rotation that preserves lengths, angles, and the origin.
How standard DL differs: nn.Linear applies an unconstrained weight matrix W that can stretch, shear, and deform.
From layers/primitives/rotor.py — RotorLayer:
def _compute_rotors(self, device, dtype):
"""Compute R and R~ from bivector weights."""
B = torch.zeros(self.channels, self.algebra.dim, device=device, dtype=dtype)
indices = self.bivector_indices.unsqueeze(0).expand(self.channels, -1)
B.scatter_(1, indices, self.bivector_weights)
R = self.algebra.exp(-0.5 * B)
R_rev = self.algebra.reverse(R)
return R, R_rev
def forward(self, x: torch.Tensor) -> torch.Tensor:
"""Apply the sandwich product x' = RxR~."""
R, R_rev = self._compute_rotors(x.device, x.dtype)
R_expanded = R.unsqueeze(0)
R_rev_expanded = R_rev.unsqueeze(0)
Rx = self.algebra.geometric_product(R_expanded, x)
res = self.algebra.geometric_product(Rx, R_rev_expanded)
return res
Every parameter is a bivector component — a specific plane of rotation with direct geometric meaning.
3. Direction-Preserving Activation (GeometricGELU)¶
What it does: Scales the magnitude of a multivector via GELU while preserving its direction (the unit multivector). The activation cannot rotate the data — only the RotorLayer does that.
How standard DL differs: ReLU/GELU applied coefficient-wise destroys geometric direction information.
From functional/activation.py — GeometricGELU.forward:
def forward(self, x: torch.Tensor) -> torch.Tensor:
norm = x.norm(dim=-1, keepdim=True)
eps = 1e-6
scale = F.gelu(norm + self.bias.view(1, -1, 1)) / (norm + eps)
return x * scale
Five lines. The direction x / ||x|| is untouched; only the scalar magnitude changes.
4. Riemannian Adam¶
What it does: Runs Adam momentum in the Lie algebra (bivector space) with bivector norm clipping. Combined with exp(-B/2) in the forward pass, this gives a Riemannian update on the Spin(n) manifold.
How standard DL differs: Standard Adam updates unconstrained Euclidean parameters with no manifold awareness.
From optimizers/riemannian.py — RiemannianAdam.step:
# Adam update in Lie algebra (bivector space)
# Combined with exp(-B/2) in forward pass, this gives Riemannian update
denom = (exp_avg_sq.sqrt() / bias_correction2_sqrt).add_(eps)
p.addcdiv_(exp_avg, denom, value=-step_size)
# Clip bivector norm for numerical stability in exp()
if self.max_bivector_norm is not None:
p_norm = p.norm(dim=-1, keepdim=True)
scale = torch.clamp(p_norm / self.max_bivector_norm, min=1.0)
p.div_(scale)
The key insight: because Versor parameterizes rotors via bivectors (the Lie algebra), Euclidean gradient updates in bivector space ARE geometrically meaningful. The exp(-B/2) in the forward pass completes the manifold retraction.
5. Automatic Metric Search¶
What it does: Lifts data into a conformal algebra Cl(X+1, 1), trains multiple GBN probes with biased initialization, then reads the learned bivector energy distribution to infer the optimal (p, q, r) signature.
How standard DL differs: Architecture hyperparameters (hidden sizes, attention heads) are manually chosen or grid-searched with no geometric interpretation.
From core/search.py — MetricSearch._analyze_bivector_energy:
# For each basis bivector e_ab, look up bv_sq_scalar:
# -1 -> elliptic, +1 -> hyperbolic, 0 -> null
sq_val = bv_sq[bv_idx_pos].item()
if sq_val < -0.5:
sig_type = 'elliptic'
elif sq_val > 0.5:
sig_type = 'hyperbolic'
else:
sig_type = 'null'
The probe's learned rotation planes directly reveal which metric regime the data lives in.
6. Hermitian Metrics for Mixed Signatures¶
What it does: Constructs a positive-definite inner product for any Cl(p,q,r) via Clifford conjugation + metric signs. This ensures gradient-based optimization works even in Minkowski or degenerate algebras where the standard norm can be negative.
How standard DL differs: Euclidean L2 norm only — breaks when applied to Minkowski-signature data.
From core/metric.py — _hermitian_signs and hermitian_inner_product:
def _hermitian_signs(algebra: CliffordAlgebra) -> torch.Tensor:
signs = torch.ones(algebra.dim, device=algebra.device)
pq = algebra.p + algebra.q
for i in range(algebra.dim):
k = bin(i).count('1') # grade
conj_sign = ((-1) ** k) * ((-1) ** (k * (k - 1) // 2))
metric_product = 1
has_null = False
for bit in range(algebra.n):
if i & (1 << bit):
if bit >= pq:
has_null = True
break
metric_product *= (1 if bit < algebra.p else -1)
if has_null:
signs[i] = 0
else:
metric_sign = ((-1) ** (k * (k - 1) // 2)) * metric_product
signs[i] = conj_sign * metric_sign
return signs
def hermitian_inner_product(algebra, A, B):
signs = _hermitian_signs(algebra).to(device=A.device, dtype=A.dtype)
return (signs * A * B).sum(dim=-1, keepdim=True)
Precomputed once per algebra. For Euclidean Cl(p,0), all signs are +1 and this reduces to the standard dot product.
7. Bivector Decomposition via Power Iteration¶
What it does: Decomposes a non-simple bivector (one that cannot be written as a single wedge product) into simple components via GA power iteration, then exponentiates each in closed form and composes via geometric product.
How standard DL differs: Matrix exponentials typically use Taylor series or Pade approximation — no geometric decomposition.
From core/decomposition.py — ga_power_iteration:
def ga_power_iteration(algebra, b, v_init=None, threshold=1e-6, max_iterations=100):
"""Find the dominant simple bivector component."""
if v_init is None:
v_raw = torch.randn(*batch_shape, algebra.n, device=device, dtype=dtype)
v = algebra.embed_vector(v_raw)
v = v / v.norm(dim=-1, keepdim=True).clamp(min=1e-6)
for _ in range(max_iterations):
v_prev = v
v = algebra.right_contraction(b, v) # Key: GA right contraction
v = v / v.norm(dim=-1, keepdim=True).clamp(min=1e-6)
if (v - v_prev).norm(dim=-1).max() < threshold:
break
u = algebra.right_contraction(b, v)
u = u / u.norm(dim=-1, keepdim=True).clamp(min=1e-6)
sigma = b.norm(dim=-1, keepdim=True)
b_s = sigma * algebra.wedge(u, v) # Simple projection
return b_s, v
Reference: Pence, T., Yamada, D., & Singh, V. (2025). "Composing Linear Layers from Irreducibles." arXiv:2507.11688v1
8. Rotor Gadget (Parameter-Efficient Linear)¶
What it does: Replaces nn.Linear with left/right rotor pairs and block-diagonal channel routing. Uses O(K * n(n-1)/2) parameters instead of O(in_channels * out_channels), achieving ~63% parameter reduction.
How standard DL differs: nn.Linear uses a dense weight matrix with no geometric structure.
From layers/primitives/rotor_gadget.py — RotorGadget.forward:
def _compute_rotors(self):
B_left = self._bivector_to_multivector(self.bivector_left) # [pairs, dim]
B_right = self._bivector_to_multivector(self.bivector_right) # [pairs, dim]
R_left = self.algebra.exp(-0.5 * B_left) # Left rotor
R_right = self.algebra.exp(-0.5 * B_right) # Right rotor
R_right_rev = self.algebra.reverse(R_right) # Reverse for sandwich
return R_left, R_right_rev
def forward(self, x):
R_left, R_right_rev = self._compute_rotors()
# Two batched GPs instead of 2*K sequential GPs
temp = self.algebra.geometric_product(R_left_expanded, x)
out = self.algebra.geometric_product(temp, R_right_expanded)
return self._aggregate_to_output_channels(out)
The transformation psi(x) = r . x . s~ where r, s are independent rotors — more expressive than a single sandwich product.
9. Rotor-to-Formula Translation¶
What it does: Reads trained bivector weights as symbolic rotation angles and planes, then maps each to its closed-form trigonometric/hyperbolic action — producing a human-readable formula from a trained neural network.
How standard DL differs: Black-box neural networks require post-hoc interpretation tools (SHAP, LIME). Here the formula is a direct readout.
From models/sr/translator.py — RotorTranslator._plane_to_action:
def _plane_to_action(self, plane: SimplePlane) -> sympy.Expr:
"""Closed-form sandwich product action for a single plane."""
xi = self.symbols[plane.var_i]
xj = self.symbols[plane.var_j]
theta = plane.angle
if plane.sig_type == "elliptic":
return xi * sympy.cos(2 * theta) - xj * sympy.sin(2 * theta)
elif plane.sig_type == "hyperbolic":
return xi * sympy.cosh(2 * theta) + xj * sympy.sinh(2 * theta)
else: # parabolic
return xi + 2 * theta * xj
A trained rotation angle of 0.785 rad in the e12 plane becomes cos(1.57)*x1 - sin(1.57)*x2. The algebra guarantees this mapping is exact.
10. Iterative Geometric Unbending¶
What it does: A 4-phase pipeline for symbolic regression that uses GA blade rejection instead of numerical subtraction to iteratively extract formula terms.
How standard DL differs: Genetic programming (PySR) or transformer-based equation search. No geometric structure in the search.
From models/sr/unbender.py — pipeline summary:
Phase 0: Data Preparation
- SVD alignment, variable grouping, implicit probe
Phase 1: Per-Group Iterative Extraction
- Single-rotor-per-stage with GA orthogonal elimination
- blade rejection (NOT numerical subtraction)
Phase 2: Mother Algebra Cross-Term Discovery
- GPCA in Cl(P,Q,R) for cross-group interactions
Phase 3: SymPy Refinement
- lstsq reweight, implicit solve, simplify
Key advantage: GA blade rejection (algebra.blade_reject) orthogonally removes discovered components from the residual, preserving geometric structure that numerical subtraction would corrupt. Each stage's rotor directly encodes a formula term via the translator (Innovation #9).