# Geometric Algebra Model Design Guide
A practitioner's guide to building models with Versor. Assumes familiarity with PyTorch; no prior Clifford algebra knowledge required.
## 1. Does Your Task Have Geometric Structure?
Versor adds inductive bias: it constrains certain layers to isometries (length- and angle-preserving maps). This is useful when you know the symmetry group of your data; it is overhead when you don't.
Use Versor when your task has one or more of these properties:
- Spatial coordinates as inputs — positions, orientations, distances, angles, velocities (molecules, point clouds, robot kinematics, weather fields).
- Equivariance requirement — a rotation of the input should produce a predictable, structured transformation of the output, not just a scalar similarity score.
- Known manifold structure — EEG phase-amplitude coupling, Lorentz-invariant physics, hyperbolic embeddings, projective geometry.
Standard PyTorch is fine when:
- Inputs are tokens, pixels, or tabular features with no geometric interpretation.
- Channels are arbitrary feature slots with no spatial relationship.
- You are prototyping and want to minimize debugging surface area.
## 2. Choosing a Metric Signature
The signature \(Cl(p, q, r)\) determines what the algebra "knows" about your geometry. Use the table below as a starting point.
| Signature | Geometry | Typical tasks |
|---|---|---|
| \(Cl(3, 0)\) | Euclidean 3D | Molecules (QM9, MD17), point clouds |
| \(Cl(3, 0, 1)\) | Projective GA / SE(3) | Molecular dynamics with translations, robotics |
| \(Cl(3, 1)\) or \(Cl(1, 3)\) | Minkowski / spacetime | EEG phase-amplitude, relativistic physics |
| \(Cl(4, 1)\) | Conformal GA | Logic, CAD, translations-as-rotations |
| \(Cl(n, 0)\) | High-dimensional Euclidean | Semantic embeddings, symbolic regression |
When none of the above fits, let the data decide:
```python
from core.algebra import CliffordAlgebra
from core.analysis import MetricSearch

best_p, best_q, best_r = MetricSearch(device='cpu').search(your_data_tensor)
algebra = CliffordAlgebra(best_p, best_q, best_r, device='cpu')
```
Apple Silicon / CPU note: always pass device='cpu' explicitly to CliffordAlgebra. It does not default to MPS.
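Before committing to a large signature, note that the multivector dimension grows exponentially in the total number of generators. A minimal sketch (the `blade_count` helper is illustrative, not part of Versor):

```python
def blade_count(p, q, r=0):
    """Number of basis blades in Cl(p, q, r): each of the p + q + r
    generators is either present or absent in a basis blade, giving
    2^(p+q+r) components per multivector."""
    return 2 ** (p + q + r)

print(blade_count(3, 0))     # Cl(3,0): 8 components per multivector
print(blade_count(3, 0, 1))  # Cl(3,0,1): 16 components
print(blade_count(4, 1))     # Cl(4,1): 32 components
```

Every channel carries this many coefficients, so the jump from \(Cl(3, 0)\) to \(Cl(4, 1)\) quadruples the per-channel memory and compute.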
## 3. Layer Decision Map
Every GBN model uses a mix of geometric and non-geometric layers. This table shows which layer to reach for and when to skip it.
| Purpose | Layer | Key property | When to skip |
|---|---|---|---|
| Geometric rotation (even versor) | RotorLayer(grade=2) | Isometry via exp(-B/2); Spin group | No manifold structure in the input |
| Grade-k versor transform | RotorLayer(grade=k) | Learns grade-k element V; applies hat(V) x V⁻¹ | Grade-2 (default) covers most cases |
| Reflection (odd versor, unit-constrained) | ReflectionLayer | Learns unit vectors; normalizes before applying x' = -nxn⁻¹; Pin group | Task has no reflection symmetry |
| Multi-scale rotation | MultiRotorLayer(grade=2) | K-rotor superposition | Simple tasks; use RotorLayer first |
| Multi-scale grade-k versor | MultiRotorLayer(grade=k) | K-versor superposition for arbitrary grade | Grade-2 covers most cases |
| Channel mixing | CliffordLinear (traditional backend) | Standard scalar weight matrix | Never — always needed alongside rotors |
| Constrained channel mixing | CliffordLinear(backend='rotor') | ~63% fewer params, bivector-constrained | Need full cross-channel expressivity |
| Normalization | CliffordLayerNorm | Preserves direction, normalizes magnitude | Very shallow models (1–2 layers) |
| Non-linearity | GeometricGELU | Magnitude gating, preserves direction | When coefficient-wise activation is intentional |
| Grade filtering | BladeSelector | Soft attention over basis blades | No a priori grade structure in the task |
| Task readout | nn.Linear on flattened multivector | Unconstrained projection to output space | Never — always use standard linear for readout |
The key principle: RotorLayer rotates; CliffordLinear mixes channels; nn.Linear projects to outputs. These are three different jobs. Do not conflate them.
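The three jobs can be seen at the shape level with plain torch. The mixing and rotation maps below are simplified stand-ins for CliffordLinear and RotorLayer, not their actual implementations:

```python
import torch

torch.manual_seed(0)
B, C, D = 4, 16, 8                       # batch, channels, blades (2^3 for Cl(3,0))
x = torch.randn(B, C, D)

# Channel mixing (CliffordLinear-style stand-in): one scalar weight matrix
# over channels, applied identically to every blade coefficient.
W = torch.randn(C, C)
mixed = torch.einsum('oc,bcd->bod', W, x)    # [B, C, D]

# Rotation (RotorLayer-style stand-in): an orthogonal map on the blade
# axis, built by exponentiating a skew-symmetric generator. It preserves
# each multivector's norm; it cannot rescale anything.
A = torch.randn(D, D)
R = torch.linalg.matrix_exp(A - A.T)         # exactly orthogonal
rotated = x @ R.T                            # [B, C, D], norms unchanged

# Readout (nn.Linear): flatten channels and blades, project freely.
head = torch.nn.Linear(C * D, 1)
out = head(x.flatten(1))                     # [B, 1]
```

Each map has a different contract: `mixed` recombines channels, `rotated` moves multivectors without changing their lengths, and `out` discards all geometric structure for the task target.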
## 4. The Standard GBN Stack
The canonical Geometric Blade Network block, annotated:
```python
import torch.nn as nn

from core.algebra import CliffordAlgebra
from layers.primitives.linear import CliffordLinear
from layers.primitives.rotor import RotorLayer
from layers.primitives.normalization import CliffordLayerNorm
from functional.activation import GeometricGELU

algebra = CliffordAlgebra(p=3, q=0, device='cpu')  # Euclidean 3D

class GBNBlock(nn.Module):
    def __init__(self, algebra, channels):
        super().__init__()
        # Channel mixing: scalar weight matrix, O(channels^2) params.
        # This is NOT a geometric operation — channels are feature slots.
        self.linear_in = CliffordLinear(algebra, channels, channels)
        # Normalization: preserves the direction of each multivector.
        self.norm = CliffordLayerNorm(algebra, channels)
        # Non-linearity: gates the magnitude via GELU, direction unchanged.
        self.act = GeometricGELU(algebra, channels)
        # Geometric rotation: isometry, n(n-1)/2 bivector params per channel.
        # This IS the geometric operation — constrains to the rotation group.
        self.rotor = RotorLayer(algebra, channels)

    def forward(self, x):
        # x: [Batch, channels, 2^n]
        x = self.linear_in(x)  # channel mixing (standard linear algebra)
        x = self.norm(x)       # normalize magnitudes
        x = self.act(x)        # non-linearity
        x = self.rotor(x)      # geometric rotation
        return x
```
Stack multiple GBNBlock instances for depth. The final readout is always a standard nn.Linear:
```python
# Readout: flatten multivector dimension, project to task output
# [Batch, channels, 2^n] → [Batch, channels * 2^n] → [Batch, out_dim]
self.head = nn.Linear(channels * algebra.dim, out_dim)

# In forward:
out = x.flatten(1)  # flatten channels + blade dimensions
out = self.head(out)
```
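The stack-plus-head wiring can be sketched self-contained with stand-in blocks (nn.Identity in place of GBNBlock, which needs the Versor algebra object; the `GBNStack` name and `depth` parameter are illustrative):

```python
import torch
import torch.nn as nn

channels, dim, out_dim, depth = 16, 8, 1, 3   # dim = 2^3 for Cl(3,0)

class GBNStack(nn.Module):
    def __init__(self):
        super().__init__()
        # Stand-ins for GBNBlock(algebra, channels); every block keeps
        # the [Batch, channels, 2^n] contract, so they compose freely.
        self.blocks = nn.ModuleList([nn.Identity() for _ in range(depth)])
        self.head = nn.Linear(channels * dim, out_dim)

    def forward(self, x):                      # x: [B, channels, 2^n]
        for blk in self.blocks:
            x = blk(x)
        return self.head(x.flatten(1))         # [B, out_dim]

model = GBNStack()
y = model(torch.randn(4, channels, dim))       # [4, 1]
```

Only the last line leaves multivector space; everything before the head keeps the blade dimension intact.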
## 5. Hybrid Design: Versor + Standard PyTorch
Versor models are intentionally hybrid. Standard nn.Linear and CliffordLinear are expected — not a compromise.
Use nn.Linear for:
- Embedding raw scalar features into the multivector channel space
- Attention weight computation (no geometric meaning)
- Final task-specific readout heads
- Any projection between non-geometric feature spaces
Use CliffordLinear (traditional backend) for:
- Mixing channels within the multivector representation
- Situations where you want the weight matrix to see all blade components together
Use RotorLayer / MultiRotorLayer for:
- Any step where the input carries spatial/geometric meaning that must be preserved
- Message passing over graphs when edge features are spatial
- Equivariant transformations in the network backbone
Minimal hybrid model:
```python
import torch
import torch.nn as nn

from core.algebra import CliffordAlgebra
from layers.primitives.linear import CliffordLinear
from layers.primitives.rotor import RotorLayer
from layers.primitives.normalization import CliffordLayerNorm
from functional.activation import GeometricGELU

class HybridModel(nn.Module):
    """
    Hybrid model: standard nn.Linear for embedding and readout,
    Versor geometric layers for the transformation backbone.
    """
    def __init__(self, in_dim, hidden_channels, out_dim, algebra):
        super().__init__()
        self.algebra = algebra
        dim = algebra.dim  # 2^n blade dimensions

        # Standard: project raw scalar features into channel space.
        # Added in forward() as the grade-0 (scalar) blade component.
        self.embed = nn.Linear(in_dim, hidden_channels)

        # Geometric backbone: channel mix → normalize → activate → rotate
        self.linear = CliffordLinear(algebra, hidden_channels, hidden_channels)
        self.norm = CliffordLayerNorm(algebra, hidden_channels)
        self.act = GeometricGELU(algebra, hidden_channels)
        self.rotor = RotorLayer(algebra, hidden_channels)

        # Standard: flatten and project to task output
        self.head = nn.Linear(hidden_channels * dim, out_dim)

    def forward(self, x_scalar, x_mv):
        # x_scalar: [B, in_dim] — raw scalar features (e.g., atom types)
        # x_mv: [B, hidden_channels, dim] — multivector geometric features

        # Embed scalars to channel dimension; add as the grade-0 (scalar)
        # blade component of each channel
        scalar_feat = self.embed(x_scalar)     # [B, hidden_channels]
        scalar_mv = torch.zeros_like(x_mv)
        scalar_mv[..., 0] = scalar_feat        # blade index 0 = grade 0
        x_mv = x_mv + scalar_mv

        # Geometric transformation
        x_mv = self.linear(x_mv)
        x_mv = self.norm(x_mv)
        x_mv = self.act(x_mv)
        x_mv = self.rotor(x_mv)

        # Readout
        return self.head(x_mv.flatten(1))
```
For a production example of this hybrid pattern, see models/md17.py (MD17InteractionBlock), which uses nn.Linear for edge projections alongside RotorLayer for spatial message passing.
## 6. When NOT to Use Rotors
Rotors are an inductive bias, not a universal improvement. There are clear cases where they hurt:
Channels with no geometric relationship. If your 128 channels are learned feature slots with no spatial interpretation, a rotor over them does not correspond to any meaningful rotation. Use CliffordLinear(backend='traditional') instead.
Tasks that need arbitrary cross-channel amplification. A rotor is an isometry — it cannot learn to scale one channel relative to another. If your task requires the network to suppress or amplify specific feature dimensions, use an unconstrained linear layer.
Unknown or mismatched metric signature. A rotor in \(Cl(3, 0)\) on data that lives in \(Cl(3, 1)\) will produce geometrically incorrect transformations. If you are unsure of the signature and MetricSearch is too expensive, default to standard layers until you have a hypothesis.
Very shallow networks. A 1–2 layer model may not benefit from the Cayley table overhead. The geometric inductive bias pays off over depth; for shallow models, a standard nn.Linear is usually faster and simpler.
Rule of thumb: start with a standard PyTorch baseline. Add RotorLayer where you can articulate why a rotation group is the right constraint for that step.
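The amplification limitation above can be checked numerically with plain torch. Any map built by exponentiating a skew-symmetric generator (a stand-in for how a rotor acts, not Versor's rotor code) has all singular values equal to 1, so no direction can be suppressed or amplified; an unconstrained weight matrix has no such restriction:

```python
import torch

torch.manual_seed(0)
D = 8

# Rotor-like map: exponential of a skew-symmetric generator.
A = torch.randn(D, D)
R = torch.linalg.matrix_exp(A - A.T)
print(torch.linalg.svdvals(R))   # all ~1.0: pure rotation, no gain

# Unconstrained linear map: singular values are free to spread,
# so the layer can amplify some feature directions and kill others.
W = torch.randn(D, D)
print(torch.linalg.svdvals(W))
```

If your task needs the spread-out spectrum of `W`, a rotor at that position is working against the objective.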
## 7. Setting Up Training
For models where all backbone weights are bivectors (i.e., backend='rotor' throughout), use RiemannianAdam:
```python
from optimizers.riemannian import RiemannianAdam

optimizer = RiemannianAdam(model.parameters(), lr=1e-3, algebra=algebra)
```
RiemannianAdam runs Adam updates in bivector space (the Lie algebra of the rotation group). The exp(-B/2) map provides the manifold retraction automatically — the update stays on the Spin manifold without needing a projection step. Bivector norm clipping (default max_norm=10.0) prevents instability in deep networks.
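The clip-then-exponentiate idea can be sketched in isolation. This is a conceptual illustration only, not RiemannianAdam's actual implementation: the `retract` helper is hypothetical, and the so(3) matrix stands in for the bivector algebra of \(Cl(3, 0)\):

```python
import torch

def retract(b, max_norm=10.0):
    """Clip a bivector's norm, then exponentiate. Because the exponential
    of ANY skew-symmetric generator is orthogonal, an Adam step taken in
    bivector space always lands back on the rotation manifold: no
    separate projection step is needed."""
    n = b.norm()
    if n > max_norm:
        b = b * (max_norm / n)   # bivector norm clipping
    bx, by, bz = b.tolist()      # three bivector coefficients in Cl(3,0)
    G = torch.tensor([[0.0, -bz,  by],
                      [ bz, 0.0, -bx],
                      [-by,  bx, 0.0]])
    return torch.linalg.matrix_exp(G)

R = retract(torch.tensor([0.3, -0.5, 1.2]))
# R.T @ R equals the identity up to float error, even for clipped inputs
```

This is why no projection step appears in the update rule: the exponential map itself is the retraction.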
Standard torch.optim.Adam also works and is the right choice when your model mixes bivector and non-bivector parameters (e.g., nn.Linear readout heads). RiemannianAdam is most beneficial when the entire parameter space lives on the Bivector Manifold.
## 8. Complete Minimal Example
End-to-end: choose an algebra, build a small GBN (two blocks plus a linear head), and train it. Runs in under 60 seconds on CPU with synthetic data.
```python
import torch
import torch.nn as nn

from core.algebra import CliffordAlgebra
from layers.primitives.linear import CliffordLinear
from layers.primitives.rotor import RotorLayer
from layers.primitives.normalization import CliffordLayerNorm
from functional.activation import GeometricGELU
from optimizers.riemannian import RiemannianAdam

# --- 1. Algebra ---
algebra = CliffordAlgebra(p=3, q=0, device='cpu')  # Cl(3,0): Euclidean 3D
dim = algebra.dim  # 8 = 2^3 blade components

# --- 2. Model ---
hidden = 16
out_dim = 1

class SimpleGBN(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = CliffordLinear(algebra, hidden, hidden)
        self.n1 = CliffordLayerNorm(algebra, hidden)
        self.a1 = GeometricGELU(algebra, hidden)
        self.r1 = RotorLayer(algebra, hidden)
        self.l2 = CliffordLinear(algebra, hidden, hidden)
        self.n2 = CliffordLayerNorm(algebra, hidden)
        self.a2 = GeometricGELU(algebra, hidden)
        self.r2 = RotorLayer(algebra, hidden)
        self.head = nn.Linear(hidden * dim, out_dim)

    def forward(self, x):
        x = self.r1(self.a1(self.n1(self.l1(x))))
        x = self.r2(self.a2(self.n2(self.l2(x))))
        return self.head(x.flatten(1))

model = SimpleGBN()

# --- 3. Synthetic data ---
B = 32  # batch size
x = torch.randn(B, hidden, dim)  # [Batch, Channels, 2^n]
y = torch.randn(B, out_dim)

# --- 4. Training loop ---
optimizer = RiemannianAdam(model.parameters(), lr=1e-3, algebra=algebra)
loss_fn = nn.MSELoss()

for step in range(200):
    optimizer.zero_grad()
    pred = model(x)
    loss = loss_fn(pred, y)
    loss.backward()
    optimizer.step()
    if step % 50 == 0:
        print(f"step {step:3d} loss {loss.item():.4f}")

print("Done.")
```
Expected output: loss decreasing from ~1.0 toward ~0.0 over 200 steps on synthetic random targets.
## 9. Where to Go Next
| If you want... | Read... |
|---|---|
| All layers with annotated code examples | docs/innovations.md |
| Step-by-step tutorial with each layer | docs/tutorial.md |
| Formal mathematical definitions | docs/mathematical.md |
| Task-specific configurations (MD17, SR, LQA, EEG) | docs/tutorial.md |
| Design philosophy and motivation | docs/philosophy.md |
| Common errors and troubleshooting | docs/faq.md |