Layers¶
Base¶
CliffordModule
¶
Bases: Module
Base module for Clifford algebra layers.
Manages the algebra configuration.
Source code in layers/primitives/base.py
algebra
property
¶
Return the algebra instance, reconstructing if necessary.
__init__(algebra)
¶
Sets up the module.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `algebra` | `CliffordAlgebra` | The algebra instance. | *required* |
Source code in layers/primitives/base.py
Primitives¶
RotorLayer
¶
Bases: CliffordModule
Learnable rotor layer for sandwich-product transformation.
Learns R = exp(-B/2) and applies the isometry x' = RxR~. Preserves origin, lengths, and angles.
Attributes:
| Name | Type | Description |
|---|---|---|
| `channels` | `int` | Number of rotors. |
| `bivector_weights` | `Parameter` | Learnable B coefficients. |
| `use_decomposition` | `bool` | If True, use power iteration decomposition. |
| `decomp_k` | `int` | Number of simple components for decomposition. |
Source code in layers/primitives/rotor.py
__init__(algebra, channels, use_decomposition=False, decomp_k=None)
¶
Initialize the rotor layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `algebra` | `CliffordAlgebra` | The algebra instance. | *required* |
| `channels` | `int` | Number of features. | *required* |
| `use_decomposition` | `bool` | If True, use bivector decomposition. Reference: Pence et al. (2025), arXiv:2507.11688v1. | `False` |
| `decomp_k` | `int` | Number of simple components for decomposition. | `None` |
Source code in layers/primitives/rotor.py
reset_parameters()
¶
forward(x)
¶
Apply the sandwich product x' = RxR~.
Caches rotors during eval mode for faster inference.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input [Batch, Channels, Dim]. | *required* |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Rotated input. |
Source code in layers/primitives/rotor.py
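To see why the sandwich product is an isometry, here is a minimal hand-rolled sketch in Cl(2,0) (basis `[1, e1, e2, e12]`). This is illustrative pure Python, not the library's tensorized implementation:

```python
import math

def gp(a, b):
    """Geometric product of two multivectors [s, e1, e2, e12] in Cl(2,0)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return [
        a0*b0 + a1*b1 + a2*b2 - a3*b3,   # scalar
        a0*b1 + a1*b0 - a2*b3 + a3*b2,   # e1
        a0*b2 + a2*b0 + a1*b3 - a3*b1,   # e2
        a0*b3 + a3*b0 + a1*b2 - a2*b1,   # e12
    ]

def reverse(a):
    """Reversion flips the sign of the bivector part."""
    return [a[0], a[1], a[2], -a[3]]

theta = math.pi / 2
# R = exp(-B/2) with B = theta * e12; since e12^2 = -1 this is a half-angle rotor.
R = [math.cos(theta / 2), 0.0, 0.0, -math.sin(theta / 2)]

x = [0.0, 1.0, 0.0, 0.0]            # the vector e1
x_rot = gp(gp(R, x), reverse(R))    # sandwich product x' = R x R~

print(x_rot)  # e1 rotated by pi/2 -> e2; length is preserved
```

The half-angle in the exponent is why the sandwich produces a full rotation by `theta`.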
train(mode=True)
¶
Override to invalidate rotor cache when switching to train mode.
prune_bivectors(threshold=0.0001)
¶
Zero out bivector weights below the threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `threshold` | `float` | Cutoff magnitude. | `0.0001` |

Returns:

| Name | Type | Description |
|---|---|---|
| `int` | `int` | Number of pruned parameters. |
Source code in layers/primitives/rotor.py
MultiRotorLayer
¶
Bases: CliffordModule
Multi-rotor layer with weighted superposition: x' = sum_k w_k R_k x R~_k.
Replaces rigid single-rotor rotations with a flexible superposition.
Attributes:
| Name | Type | Description |
|---|---|---|
| `channels` | `int` | Input features. |
| `num_rotors` | `int` | Number of overlapping rotors. |
| `use_decomposition` | `bool` | If True, use power iteration decomposition. |
| `decomp_k` | `int \| None` | Number of simple components for decomposition. |
| `rotor_bivectors` | `Parameter` | Bivector coefficients [num_rotors, num_bv]. |
| `weights` | `Parameter` | Mixing weights [channels, num_rotors]. |
Source code in layers/primitives/multi_rotor.py
__init__(algebra, channels, num_rotors=8, use_decomposition=False, decomp_k=None)
¶
Initialize Multi-Rotor Layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `algebra` | `CliffordAlgebra` | The algebra instance. | *required* |
| `channels` | `int` | Input features. | *required* |
| `num_rotors` | `int` | Parallel heads. | `8` |
| `use_decomposition` | `bool` | If True, use bivector decomposition. Reference: Pence et al. (2025), arXiv:2507.11688v1. | `False` |
| `decomp_k` | `int` | Number of simple components for decomposition. | `None` |
Source code in layers/primitives/multi_rotor.py
reset_parameters()
¶
forward(x, return_invariants=False)
¶
Apply weighted multi-rotor transformation.
Caches rotors during eval mode for faster inference.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input [Batch, Channels, Dim]. | *required* |
| `return_invariants` | `bool` | If True, returns grade norms. | `False` |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Transformed output [Batch, Channels, Dim]. |
Source code in layers/primitives/multi_rotor.py
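The superposition x' = sum_k w_k R_k x R~_k can be illustrated with the vector action of rotors, which in a single plane reduces to 2D rotations. The angles and weights below are hand-picked for the sketch; the real layer learns bivector coefficients and per-channel mixing weights:

```python
import math

def rotate(x, theta):
    """Vector action of one rotor restricted to a single rotation plane."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])

x = (1.0, 0.0)
angles = [0.0, math.pi / 2]   # two rotors: identity and a quarter turn
weights = [0.5, 0.5]          # mixing weights for one channel

out = (0.0, 0.0)
for w, th in zip(weights, angles):
    r = rotate(x, th)
    out = (out[0] + w * r[0], out[1] + w * r[1])

# Unlike a single rotor, the weighted blend is not an isometry: the output
# norm can shrink, which is what makes the superposition more flexible.
print(out)               # (0.5, 0.5)
print(math.hypot(*out))  # ~0.707 < 1
```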
train(mode=True)
¶
Override to invalidate rotor cache when switching to train mode.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `mode` | `bool` | Whether to set to training mode. | `True` |
Source code in layers/primitives/multi_rotor.py
sparsity_loss()
¶
Computes the L1 sparsity loss for rotor bivectors and weights.
Returns:
| Type | Description |
|---|---|
| `Tensor` | Scalar sparsity loss. |
Source code in layers/primitives/multi_rotor.py
CliffordLinear
¶
Bases: CliffordModule
Fully connected layer with optional rotor-based backend.
Can use either:
- Traditional scalar weight matrix (default, backward compatible)
- Rotor-based transformation (parameter-efficient, via RotorGadget)

The traditional backend uses O(in_channels x out_channels) parameters, while the rotor backend uses O(num_rotor_pairs x n(n-1)/2) parameters, where n is the number of basis vectors.
Attributes:
| Name | Type | Description |
|---|---|---|
| `in_channels` | `int` | Input features. |
| `out_channels` | `int` | Output features. |
| `backend` | `str` | 'traditional' or 'rotor'. |
| `weight` | `Parameter \| None` | Weights [Out, In] (traditional backend only). |
| `bias` | `Parameter \| None` | Bias multivector [Out, Dim] (traditional backend only). |
| `gadget` | `Module \| None` | Rotor transformation (rotor backend only). |
Source code in layers/primitives/linear.py
__init__(algebra, in_channels, out_channels, backend='traditional', num_rotor_pairs=4, use_decomposition=False, decomp_k=10, aggregation='mean', shuffle='none')
¶
Initialize Clifford Linear.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `algebra` | `CliffordAlgebra` | The algebra instance. | *required* |
| `in_channels` | `int` | Input size. | *required* |
| `out_channels` | `int` | Output size. | *required* |
| `backend` | `str` | 'traditional' for the standard linear layer, 'rotor' for the rotor-based transformation. | `'traditional'` |
| `num_rotor_pairs` | `int` | Number of rotor pairs (rotor backend only). | `4` |
| `use_decomposition` | `bool` | Use bivector decomposition (rotor backend only). | `False` |
| `decomp_k` | `int` | Decomposition iterations (rotor backend only). | `10` |
| `aggregation` | `str` | Aggregation method (rotor backend only). | `'mean'` |
| `shuffle` | `str` | Input channel shuffle strategy (rotor backend only): 'none' = no shuffle (default); 'fixed' = fixed random permutation; 'random' = random permutation each forward pass. | `'none'` |
Source code in layers/primitives/linear.py
reset_parameters()
¶
Initialize weights with Xavier uniform and zero bias.
forward(x)
¶
Apply channel-mixing linear transformation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input [Batch, In, Dim]. | *required* |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Output [Batch, Out, Dim]. |
Source code in layers/primitives/linear.py
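A minimal numpy sketch of the traditional backend, assuming the documented shapes: a scalar weight matrix mixes channels while each blade component of the multivector is transformed independently. This is an illustration of the shape contract, not the library's source:

```python
import numpy as np

rng = np.random.default_rng(0)
B, IN, OUT, D = 2, 3, 5, 8        # D = 2**3 blades for a 3D algebra (illustrative)

x = rng.normal(size=(B, IN, D))   # input [Batch, In, Dim]
W = rng.normal(size=(OUT, IN))    # weight [Out, In]
bias = np.zeros((OUT, D))         # bias multivector [Out, Dim]

# Channel mixing applied blade-wise: out[b, o, d] = sum_i W[o, i] * x[b, i, d]
y = np.einsum('oi,bid->bod', W, x) + bias
print(y.shape)  # (2, 5, 8)
```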
extra_repr()
¶
String representation for debugging.
Returns:
| Name | Type | Description |
|---|---|---|
| `str` | `str` | Layer parameters description. |
Source code in layers/primitives/linear.py
RotorGadget
¶
Bases: CliffordModule
Rotor-based linear transformation (Generalized Rotor Gadget).
Replaces standard linear layers with parameter-efficient rotor-sandwich transformations. Instead of using O(in_channels x out_channels) parameters, this uses O(num_rotor_pairs x n(n-1)/2) parameters where n is the number of basis vectors in the Clifford algebra.
Architecture:
- Partition input channels into blocks.
- For each rotor pair (i, j), apply the rotor sandwich: r_ij . x_i . s_ij.H.
- Pool/aggregate results to output channels.

The transformation is psi(x) = r.x.s.H, where r and s are rotors (bivector exponentials).
Attributes:
| Name | Type | Description |
|---|---|---|
| `algebra` | `CliffordAlgebra` | CliffordAlgebra instance. |
| `in_channels` | | Number of input channels. |
| `out_channels` | | Number of output channels. |
| `num_rotor_pairs` | | Number of rotor pairs to use. |
| `use_decomposition` | | Whether to use bivector decomposition. |
| `aggregation` | | Aggregation method ('mean', 'sum', or 'learned'). |
Source code in layers/primitives/rotor_gadget.py
__init__(algebra, in_channels, out_channels, num_rotor_pairs=4, use_decomposition=False, decomp_k=10, aggregation='mean', shuffle='none', bias=False)
¶
Initialize rotor gadget layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `algebra` | `CliffordAlgebra` | CliffordAlgebra instance. | *required* |
| `in_channels` | `int` | Number of input channels. | *required* |
| `out_channels` | `int` | Number of output channels. | *required* |
| `num_rotor_pairs` | `int` | Number of rotor pairs (higher = more expressive). | `4` |
| `use_decomposition` | `bool` | Use bivector decomposition for efficiency. | `False` |
| `decomp_k` | `int` | Number of iterations for decomposition (if enabled). | `10` |
| `aggregation` | `Literal['mean', 'sum', 'learned']` | How to pool rotor outputs ('mean', 'sum', 'learned'). | `'mean'` |
| `shuffle` | `Literal['none', 'fixed', 'random']` | Input channel shuffle strategy: 'none' = no shuffle, sequential block assignment (default); 'fixed' = random permutation at initialization (fixed during training); 'random' = random permutation each forward pass (regularization). | `'none'` |
| `bias` | `bool` | Whether to include a bias term (applied after transformation). | `False` |
Source code in layers/primitives/rotor_gadget.py
forward(x)
¶
Apply rotor-based transformation.
Uses batched geometric products - all rotor pairs are applied in parallel via a single pair of GP calls.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor of shape [Batch, In_Channels, Dim]. | *required* |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Output tensor of shape [Batch, Out_Channels, Dim]. |
Source code in layers/primitives/rotor_gadget.py
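The parameter-count argument from the class description can be made concrete with a quick back-of-the-envelope calculation. The channel and rotor-pair counts below are illustrative, and the factor of 2 assumes one bivector each for r and s per pair:

```python
n = 8                                 # basis vectors in the algebra
in_channels, out_channels = 256, 256
num_rotor_pairs = 4

# Traditional linear layer: one scalar weight per (in, out) channel pair.
traditional = in_channels * out_channels

# Rotor backend: n*(n-1)/2 bivector coefficients per rotor, two rotors per pair.
bivectors_per_rotor = n * (n - 1) // 2
rotor_based = num_rotor_pairs * 2 * bivectors_per_rotor

print(traditional)   # 65536
print(rotor_based)   # 224
```

Even for modest channel counts, the rotor parameterization is orders of magnitude smaller, since its cost depends on the algebra dimension rather than on the channel product.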
train(mode=True)
¶
Override to invalidate rotor cache when switching to train mode.
extra_repr()
¶
String representation for debugging.
Source code in layers/primitives/rotor_gadget.py
CliffordLayerNorm
¶
Bases: CliffordModule
Geometric LayerNorm that preserves direction and recovers scale.
Normalizes the multivector to unit norm (preserving geometric direction), then injects the original log-magnitude into the scalar (grade-0) part via a learnable gate.
Attributes:
| Name | Type | Description |
|---|---|---|
| `weight` | `Parameter` | Per-channel direction scale [C]. |
| `bias` | `Parameter` | Per-channel scalar bias [C]. |
| `norm_scale` | `Parameter` | Per-channel gate for log-magnitude injection into grade-0. Initialized to zero so the layer starts identical to the old (scale-discarding) behaviour. |
Source code in layers/primitives/normalization.py
__init__(algebra, channels, eps=1e-06, recover=True)
¶
Sets up normalization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `algebra` | `CliffordAlgebra` | The algebra instance. | *required* |
| `channels` | `int` | Features. | *required* |
| `eps` | `float` | Stability term. | `1e-06` |
| `recover` | `bool` | Whether to inject the original scale into the scalar part. | `True` |
Source code in layers/primitives/normalization.py
forward(x)
¶
Normalizes energy, preserves direction, optionally recovers scale in grade-0.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input [Batch, Channels, Dim]. | *required* |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Normalized input. |
Source code in layers/primitives/normalization.py
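The "normalize then recover scale" idea can be sketched in a few lines: divide each multivector by its norm (direction preserved), then add `gate * log(norm)` to the grade-0 slot. With the gate at its zero init this reduces to plain normalization. A simplification of the documented behaviour, not the layer's source:

```python
import numpy as np

eps = 1e-6
x = np.array([[3.0, 4.0, 0.0, 0.0]])               # one channel, Dim=4 multivector
norm = np.linalg.norm(x, axis=-1, keepdims=True)   # ||x|| = 5

x_hat = x / (norm + eps)                           # unit-norm direction

gate = 1.0                                         # learnable norm_scale (init is 0)
x_hat[..., 0] += gate * np.log(norm[..., 0])       # inject log-magnitude into grade-0

print(np.round(x_hat, 4))  # scalar slot now carries ~log(5); direction unchanged
```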
BladeSelector
¶
Bases: CliffordModule
Blade Selector. Filters insignificant components.
Learns to weigh geometric grades, suppressing less relevant ones.
Attributes:
| Name | Type | Description |
|---|---|---|
| `weights` | `Parameter` | Soft gates [Channels, Dim]. |
Source code in layers/primitives/projection.py
__init__(algebra, channels)
¶
Sets up the selector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `algebra` | `CliffordAlgebra` | The algebra instance. | *required* |
| `channels` | `int` | Input features. | *required* |
Source code in layers/primitives/projection.py
reset_parameters()
¶
forward(x)
¶
Gates the grades.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input [Batch, Channels, Dim]. | *required* |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Filtered input. |
Source code in layers/primitives/projection.py
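Soft gating of components can be sketched as a sigmoid of the learned weight scaling each blade slot, so irrelevant blades are smoothly pushed toward zero. The weights here are hand-picked for illustration; the layer learns them:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x = [1.0, 1.0, 1.0, 1.0]          # one channel, Dim=4
w = [10.0, 10.0, -10.0, -10.0]    # gates: keep the first two blades, suppress the rest

y = [sigmoid(wi) * xi for wi, xi in zip(w, x)]
print([round(v, 3) for v in y])   # first two pass through, last two vanish
```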
Blocks¶
GeometricProductAttention
¶
Bases: CliffordModule
Multi-head attention using geometric product scoring.
Standard attention: score(Q, K) = &lt;Q, K&gt; / sqrt(d) (scalar only).
GA attention (per head-channel):
product = Q_c * reverse(K_c) (geometric product)
score = (&lt;product&gt;_0 + lambda * ||&lt;product&gt;_2||) / sqrt(d)
The grade-0 (scalar) part measures alignment (like the dot product). The grade-2 (bivector) part measures relative orientation - a novel signal.
Memory: naive [B, H, L, L, H_c, D] is too large. We chunk over L_q in blocks of BLOCK_SIZE to bound peak VRAM.
Attributes:
| Name | Type | Description |
|---|---|---|
| `num_heads` | `int` | Number of attention heads. |
| `head_channels` | `int` | Channels per head. |
| `causal` | `bool` | If True, apply autoregressive causal mask. |
| `bivector_weight` | `float` | lambda_ - weight of the bivector score component. |
Source code in layers/blocks/attention.py
__init__(algebra, channels, num_heads, causal=True, bivector_weight=0.5, dropout=0.0)
¶
Sets up geometric product attention.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `algebra` | `CliffordAlgebra` | Clifford algebra instance. | *required* |
| `channels` | `int` | Total number of multivector channels. | *required* |
| `num_heads` | `int` | Number of attention heads. | *required* |
| `causal` | `bool` | Apply causal mask for autoregressive generation. | `True` |
| `bivector_weight` | `float` | lambda_ weight on the bivector score component. | `0.5` |
| `dropout` | `float` | Dropout rate on attention weights. | `0.0` |
Source code in layers/blocks/attention.py
forward(x, key_padding_mask=None)
¶
Computes geometric product attention.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input multivectors [B, L, C, D]. | *required* |
| `key_padding_mask` | `Tensor` | Optional [B, L] bool mask where True = padded (ignored). | `None` |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Output multivectors [B, L, C, D]. |
Source code in layers/blocks/attention.py
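The two score components can be illustrated in Cl(2,0): for unit vectors, the geometric product Q * reverse(K) has grade-0 = cos(angle) (alignment) and grade-2 magnitude = sin(angle) (relative orientation). The lambda-weighted combination below is a sketch of the scoring idea, not the layer's exact code:

```python
import math

def gp(a, b):
    """Geometric product over the basis [1, e1, e2, e12] of Cl(2,0)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return [
        a0*b0 + a1*b1 + a2*b2 - a3*b3,
        a0*b1 + a1*b0 - a2*b3 + a3*b2,
        a0*b2 + a2*b0 + a1*b3 - a3*b1,
        a0*b3 + a3*b0 + a1*b2 - a2*b1,
    ]

def reverse(a):
    return [a[0], a[1], a[2], -a[3]]

q = [0.0, 1.0, 0.0, 0.0]                        # e1
k = [0.0, math.cos(0.3), math.sin(0.3), 0.0]    # e1 rotated by 0.3 rad

p = gp(q, reverse(k))                           # grade-0 at p[0], grade-2 at p[3]
lam, d = 0.5, 2
score = (p[0] + lam * abs(p[3])) / math.sqrt(d)

print(round(p[0], 4))  # cos(0.3): alignment
print(round(p[3], 4))  # sin(0.3): relative orientation
```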
MultiRotorFFN
¶
Bases: CliffordModule
Embedded Geometric Toolbox - Feed-Forward Network via rotor superposition.
Standard transformers use: Linear -> GELU -> Linear. This replaces that with:
CliffordLinear(expand) -> CliffordLayerNorm
-> MultiRotorLayer(K rotors) -> GeometricGELU
-> CliffordLinear(contract) -> BladeSelector
The expand step lifts x into a ffn_mult x channels toolbox subspace.
MultiRotorLayer applies K parallel rotors, each exploring a different
rotation plane - this IS the nonlinearity, not just a scalar gate.
The contract step projects back to the original channel count.
Designed as a standalone module so it can be reused in other tasks (md17, pdbbind, etc.) beyond the language model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `algebra` | `CliffordAlgebra` | The algebra instance. | *required* |
| `channels` | `int` | Input/output channel count. | *required* |
| `ffn_mult` | `int` | Expansion factor (ffn_channels = channels * ffn_mult). | `4` |
| `num_rotors` | `int` | Number of parallel rotors K in the toolbox. | `8` |
| `use_decomposition` | `bool` | Use power-iteration bivector decomposition. | `False` |
| `decomp_k` | `int` | Number of simple components for decomposition. | `None` |
| `use_rotor_backend` | `bool` | Use RotorGadget backend for CliffordLinear. | `False` |
Input/Output shape: [B, C, D] where D = algebra.dim.
Source code in layers/blocks/multi_rotor_ffn.py
forward(x)
¶
Applies the geometric toolbox FFN.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input. | *required* |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Output. |
Source code in layers/blocks/multi_rotor_ffn.py
GeometricTransformerBlock
¶
Bases: CliffordModule
Modular Geometric Transformer block.
Architecture:

1. Pre-norm
2. Geometric Attention (Standard or Entropy-Gated)
3. Residual connection
4. Pre-norm
5. Multi-Rotor FFN
6. Residual connection
Source code in layers/blocks/transformer.py
__init__(algebra, channels, num_heads=4, num_rotors=8, dropout=0.1, use_entropy_gating=False, eta=1.5, H_base=0.5)
¶
Initializes the Geometric Transformer Block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `algebra` | `CliffordAlgebra` | Clifford algebra instance. | *required* |
| `channels` | `int` | Total multivector channels. | *required* |
| `num_heads` | `int` | Number of attention heads. | `4` |
| `num_rotors` | `int` | Number of rotors in the FFN. | `8` |
| `dropout` | `float` | Dropout rate. | `0.1` |
| `use_entropy_gating` | `bool` | If True, uses EntropyGatedAttention. | `False` |
| `eta` | `float` | Gating multiplier for entropy attention. | `1.5` |
| `H_base` | `float` | Base entropy threshold. | `0.5` |
Source code in layers/blocks/transformer.py
forward(x, key_padding_mask=None, return_state=False)
¶
Forward pass through the transformer block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input multivectors [B, L, C, D]. | *required* |
| `key_padding_mask` | `Tensor` | Optional [B, L] bool mask where True = padded. | `None` |
| `return_state` | `bool` | If True, returns intermediate entropy/gating states. | `False` |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Processed multivectors [B, L, C, D] (and optionally intermediate states). |
Source code in layers/blocks/transformer.py
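The six-step architecture above is the standard pre-norm residual wiring. A skeleton with identity stand-ins for the real sublayers (only the wiring is meaningful here; the actual sublayers are the documented geometric modules):

```python
def norm(x):       # stand-in for CliffordLayerNorm
    return x

def attention(x):  # stand-in for geometric / entropy-gated attention
    return [v * 0.1 for v in x]

def ffn(x):        # stand-in for MultiRotorFFN
    return [v * 0.1 for v in x]

def block(x):
    # Steps 1-3: pre-norm, attention, residual connection.
    x = [a + b for a, b in zip(x, attention(norm(x)))]
    # Steps 4-6: pre-norm, FFN, residual connection.
    x = [a + b for a, b in zip(x, ffn(norm(x)))]
    return x

print(block([1.0, 2.0]))  # each residual adds a 10% update on top of the input
```

Pre-norm (normalizing before each sublayer rather than after) keeps the residual path clean, which is why deep stacks of such blocks train stably.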
Adapters¶
MultivectorEmbedding
¶
Bases: CliffordModule
Token embedding as multivectors.
Each token maps to a [channels, dim] multivector. Initializes content in grade-1 (vector) subspace only - semantic content starts as directed quantities before rotors act on them.
Attributes:
| Name | Type | Description |
|---|---|---|
| `vocab_size` | `int` | Number of tokens. |
| `channels` | `int` | Number of multivector channels. |
| `embedding` | `Embedding` | Underlying embedding table. |
Source code in layers/adapters/embedding.py
__init__(algebra, vocab_size, channels)
¶
Sets up the multivector embedding.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `algebra` | `CliffordAlgebra` | Clifford algebra instance. | *required* |
| `vocab_size` | `int` | Vocabulary size. | *required* |
| `channels` | `int` | Number of multivector channels per token. | *required* |
Source code in layers/adapters/embedding.py
forward(token_ids)
¶
Maps token ids to multivector embeddings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `token_ids` | `Tensor` | Token indices [B, L]. | *required* |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Multivector embeddings [B, L, channels, dim]. |
Source code in layers/adapters/embedding.py
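The grade-1-only initialization can be sketched in numpy. In a 3D algebra the 8 blade slots split by grade as [scalar | e1 e2 e3 | e12 e13 e23 | e123]; only the vector slots start nonzero. The index layout here is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, channels, dim = 10, 4, 8

# Embedding table [vocab, channels, dim]: zero everywhere except grade-1.
table = np.zeros((vocab_size, channels, dim))
table[..., 1:4] = rng.normal(size=(vocab_size, channels, 3))  # vector subspace only

token_ids = np.array([[3, 7]])   # [B, L]
emb = table[token_ids]           # lookup -> [B, L, channels, dim]

print(emb.shape)                                   # (1, 2, 4, 8)
print(np.abs(table[..., [0, 4, 5, 6, 7]]).max())   # 0.0 - non-vector grades empty
```

Starting in the vector subspace means every token begins as a directed quantity, leaving the higher grades to be populated by rotor action during training.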
MotherEmbedding
¶
Bases: CliffordModule
Embeds local feature groups into a canonical Mother Algebra with Procrustes Alignment.
Uses fixed rotors (R_fixed) to rotate individual channel vectors into a shared reference frame, effectively aligning disparate geometric manifolds.
Source code in layers/adapters/mother.py
__init__(algebra, input_dim, channels, U=0.0, V=None)
¶
Initializes the Mother Embedding.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `algebra` | `CliffordAlgebra` | Clifford algebra instance. | *required* |
| `input_dim` | `int` | Dimension of the input features. | *required* |
| `channels` | `int` | Number of multivector channels. | *required* |
| `U` | `float` | Geometric uncertainty index for manifold suppression. | `0.0` |
| `V` | `Tensor` | Fixed rotor proxy for Procrustes alignment (input_dim x input_dim). | `None` |
Source code in layers/adapters/mother.py
forward(x)
¶
Projects input into the aligned mother manifold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input features [B, input_dim]. | *required* |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Aligned multivectors [B, channels, dim]. |
Source code in layers/adapters/mother.py
EntropyGatedAttention
¶
Bases: CliffordModule
Dynamic geometric attention governed by bivector information entropy.
Segments with high bivector entropy (disordered phase states) are "stiffened" or suppressed, allowing only coherent, synchronized states to propagate.
Source code in layers/adapters/mother.py
__init__(algebra, channels, num_heads, eta=1.0, H_base=0.5)
¶
Initializes Entropy-Gated Attention.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `algebra` | `CliffordAlgebra` | Clifford algebra instance. | *required* |
| `channels` | `int` | Total multivector channels. | *required* |
| `num_heads` | `int` | Number of attention heads. | *required* |
| `eta` | `float` | Gating multiplier. | `1.0` |
| `H_base` | `float` | Base entropy threshold. | `0.5` |
Source code in layers/adapters/mother.py
forward(x, key_padding_mask=None, return_gating=False)
¶
Applies entropy-gated geometric attention.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input multivectors [B, L, C, D]. | *required* |
| `key_padding_mask` | `Tensor` | Optional [B, L] bool mask where True = padded. | `None` |
| `return_gating` | `bool` | If True, returns entropy and gating values. | `False` |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Attended multivectors [B, L, C, D]. |
Source code in layers/adapters/mother.py
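The gating signal can be sketched as the Shannon entropy of a normalized bivector-energy distribution fed through a sigmoid, so high-entropy (disordered) segments are suppressed. The form `gate = sigmoid(eta * (H_base - H))` is an assumed illustration consistent with the `eta`/`H_base` parameters, not necessarily the library's exact rule:

```python
import math

def entropy(p):
    """Shannon entropy of a probability distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def gate(energies, eta=1.0, H_base=0.5):
    # Assumed gating form: suppress segments whose bivector entropy exceeds H_base.
    total = sum(energies)
    p = [e / total for e in energies]
    return 1.0 / (1.0 + math.exp(-eta * (H_base - entropy(p))))

coherent = [0.97, 0.01, 0.01, 0.01]    # energy focused in one rotation plane
disordered = [0.25, 0.25, 0.25, 0.25]  # energy spread evenly (max entropy)

print(round(gate(coherent), 3))
print(round(gate(disordered), 3))
# The coherent segment receives the larger gate, so it propagates more freely.
```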
Optional dependency
CliffordGraphConv requires torch-geometric. Install with `uv sync --extra md17`.
CliffordGraphConv
¶
Bases: CliffordModule
Geometric Graph Conv. Performs message passing using multivector features.
Aggregates features based on graph topology. H' = Aggregate(H) * W + Bias.
Attributes:
| Name | Type | Description |
|---|---|---|
| `linear` | `CliffordLinear` | The transformation. |
Source code in layers/adapters/gnn.py
__init__(algebra, in_channels, out_channels)
¶
Sets up the GNN layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `algebra` | `CliffordAlgebra` | The algebra instance. | *required* |
| `in_channels` | `int` | Input features. | *required* |
| `out_channels` | `int` | Output features. | *required* |
Source code in layers/adapters/gnn.py
forward(x, adj)
¶
Aggregates and transforms node features using geometric operations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Node features. | *required* |
| `adj` | `Tensor` | Adjacency matrix. | *required* |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Updated features. |
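The update H' = Aggregate(H) * W + Bias can be sketched in plain numpy: neighborhood aggregation via the adjacency matrix, then a channel-mixing linear map applied blade-wise. This is a stand-in for the documented CliffordLinear step, not the library's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
N, IN, OUT, D = 4, 3, 2, 8            # nodes, in/out channels, blades

x = rng.normal(size=(N, IN, D))       # node features as multivectors
adj = np.array([[0, 1, 0, 0],         # a 4-node path graph
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

# Aggregate: each node sums its neighbors' multivector features.
agg = np.einsum('nm,mid->nid', adj, x)

# Transform: channel mixing applied independently to each blade slot.
W = rng.normal(size=(OUT, IN))
out = np.einsum('oi,nid->nod', W, agg)

print(out.shape)  # (4, 2, 8)
```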