I measure how language is represented in human neural population codes — and ask whether the brain’s
compressed geometry can guide more efficient, compositional machine learning.
The Language Manifold: how the hippocampus encodes syntax and semantics
Flagship
Manuscript in preparation
In recordings from human hippocampus, syntactic and semantic information is not spread across separate, distributed populations. Instead, both are written into semi-orthogonal subspaces along a single, strikingly low-dimensional population axis. The brain’s solution is more compressed than the internal geometry of any state-of-the-art language model I compared it against. This is the core of my research: characterizing that geometry and asking what it teaches us about efficient computation.
syntax · semantics
A rotating three-dimensional point cloud: two semi-orthogonal subspaces, one for syntax and one
for semantics, sharing a single dominant low-dimensional population axis.
Figure: semi-orthogonal syntactic and semantic subspaces along a shared low-dimensional hippocampal population axis.
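One common way to put a number on "semi-orthogonal" is to compute the principal angles between the two subspaces. The sketch below is illustrative only: the function name `principal_angles_deg` and the toy three-neuron bases are assumptions, and the manuscript may use a different metric. An angle near 0° marks a shared direction; angles near 90° mark directions that do not overlap.

```python
import numpy as np

def principal_angles_deg(A, B):
    """Principal angles (in degrees) between the column spaces of A and B."""
    # Orthonormalize each basis, then read the cosines of the principal
    # angles off the singular values of the cross-product matrix.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))

# Toy bases in a 3-neuron space (made-up numbers, not recorded data).
# Both subspaces contain the same dominant axis; their second directions
# are orthogonal to each other.
shared_axis = np.array([1.0, 0.0, 0.0])
syntax_basis = np.column_stack([shared_axis, [0.0, 1.0, 0.0]])
semantic_basis = np.column_stack([shared_axis, [0.0, 0.0, 1.0]])

angles = principal_angles_deg(syntax_basis, semantic_basis)
print(angles)  # one angle near 0 (shared axis), one near 90 (orthogonal part)
```

In real data the bases would come from a dimensionality-reduction or decoding step applied to the population activity, and the angles would fall somewhere between these two extremes.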
I built a question-answer embedding framework (QA-Emb) that interrogates the hidden states of large language models and aligns them against hippocampal population geometry. Across ten models, syntax is read out from shallower layers than semantics — a depth gap that survives length-controlled, grain-matched comparison in 9 of 10 models.
Figure: layer-wise LLM-to-brain alignment — the syntax-before-semantics depth gap.
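To make the depth gap concrete, here is a minimal sketch that assumes each feature's readout depth is defined as the layer where its probe score peaks, normalized to [0, 1] so that models with different layer counts are comparable. That definition, the function names, and the scores are all assumptions for illustration; they are not the QA-Emb procedure or its results.

```python
import numpy as np

def relative_peak_depth(layer_scores):
    """Depth (0 = first layer, 1 = last layer) at which the score peaks."""
    scores = np.asarray(layer_scores, dtype=float)
    if scores.size < 2:
        raise ValueError("need scores for at least two layers")
    return int(np.argmax(scores)) / (scores.size - 1)

def depth_gap(syntax_scores, semantic_scores):
    """Positive when semantics peaks deeper in the model than syntax."""
    return relative_peak_depth(semantic_scores) - relative_peak_depth(syntax_scores)

# Made-up per-layer alignment scores for a hypothetical 5-layer model.
syntax_scores = [0.2, 0.6, 0.5, 0.4, 0.3]     # peaks at layer index 1
semantic_scores = [0.1, 0.3, 0.5, 0.7, 0.6]   # peaks at layer index 3

gap = depth_gap(syntax_scores, semantic_scores)
print(gap)  # 0.75 - 0.25 = 0.5
```

A positive gap corresponds to the syntax-before-semantics ordering described above; repeating the computation per model gives the kind of count reported as "9 of 10".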
Poisson encoding across domains: music and grammar
Active
The same population-geometry toolkit generalizes beyond English narrative. In a piano-listening task I characterized 704 hippocampal neurons and found that the Krumhansl–Kessler tonal hierarchy, rather than the circle of fifths, best predicts their geometry. The same analysis dissociates absolute from relative pitch. In bilingual listeners, SVM decoders read grammatical gender and conjugation from Spanish-evoked activity.
Figure: hippocampal tuning to tonal function across 704 neurons (Krumhansl–Kessler model).
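A model comparison of this kind can be sketched as a representational similarity analysis: build one dissimilarity matrix per candidate model and ask which correlates better with the neural one. In the sketch below the Krumhansl–Kessler major-key profile values are the published ratings. Everything else is an assumption for illustration: the function names, the noise-free synthetic population, and the choice of Pearson correlation. The synthetic responses are constructed from the Krumhansl–Kessler profile, so the outcome demonstrates the pipeline, not the finding.

```python
import numpy as np

# Krumhansl-Kessler major-key profile, indexed by semitones above the tonic.
KK_MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                     2.52, 5.19, 2.39, 3.66, 2.29, 2.88])

def kk_rdm(profile=KK_MAJOR):
    """Dissimilarity = difference in tonal stability between pitch classes."""
    return np.abs(profile[:, None] - profile[None, :])

def circle_of_fifths_rdm():
    """Dissimilarity = number of steps apart on the circle of fifths."""
    position = (7 * np.arange(12)) % 12   # pitch class -> position on circle
    d = np.abs(position[:, None] - position[None, :])
    return np.minimum(d, 12 - d)

def rdm_correlation(a, b):
    """Pearson correlation over the upper triangle of two 12x12 RDMs."""
    iu = np.triu_indices(12, k=1)
    return np.corrcoef(a[iu], b[iu])[0, 1]

# Synthetic, noise-free population (NOT recorded data): 50 hypothetical
# neurons whose responses scale with the KK profile at different gains.
rng = np.random.default_rng(0)
gains = rng.uniform(0.5, 2.0, size=50)
responses = np.outer(KK_MAJOR, gains)     # shape: 12 pitch classes x 50 neurons
neural_rdm = np.linalg.norm(responses[:, None, :] - responses[None, :, :], axis=-1)

r_kk = rdm_correlation(neural_rdm, kk_rdm())
r_cof = rdm_correlation(neural_rdm, circle_of_fifths_rdm())
print(r_kk, r_cof)  # r_kk is higher here by construction
```

With recorded spike counts, `responses` would be replaced by trial-averaged firing rates, and a rank-based correlation or a noise ceiling would usually be added.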
Toward a universal language manifold (forward-looking)
Designed & documented — not yet implemented
A research direction I have designed and specified: a sequence-to-sequence transformer that learns a language-agnostic semantic manifold directly from neural data, by forcing an encoder to translate hippocampal spike patterns across English, Spanish, and Hebrew renderings of the same narrative. It is the architectural payoff of the geometry work — treating the brain’s code as a prior for machines.
Figure: conceptual schematic of the trilingual brain-to-manifold encoder bottleneck.
Seq2seq transformer (POYO-style unit tokenization) · Perceiver cross-attention bottleneck · InfoNCE contrastive cross-lingual loss · Language-adversarial gradient reversal · Poisson reconstruction loss
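Two of the loss terms listed above can be written down compactly. The following is a minimal NumPy sketch of their standard forms, not the specified architecture: the function names, the temperature value, and the toy inputs are assumptions, and a real implementation would use an autodiff framework (which gradient reversal in particular requires, so it is omitted here).

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss: row i of z_a should match row i of z_b.

    z_a, z_b: (batch, dim) embeddings of the same narrative segments in
    two languages; every other row in the batch serves as a negative.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def poisson_nll(counts, log_rate):
    """Mean Poisson negative log-likelihood, dropping the log(counts!) constant."""
    return np.mean(np.exp(log_rate) - counts * log_rate)

# Toy check with made-up embeddings: aligned pairs score far better than
# pairs whose rows have been shifted out of correspondence.
segments = np.eye(4)
loss_aligned = info_nce(segments, segments)
loss_shuffled = info_nce(segments, np.roll(segments, 1, axis=0))

# Toy check with made-up spike counts: the loss is lowest when the
# predicted rate equals the observed count.
counts = np.array([1.0, 2.0, 4.0])
nll_true = poisson_nll(counts, np.log(counts))
nll_off = poisson_nll(counts, np.log(counts) + 0.5)
print(loss_aligned, loss_shuffled, nll_true, nll_off)
```

In a combined objective these terms would be weighted and summed. The contrastive term pulls same-segment embeddings together across languages, while the Poisson term keeps the bottleneck faithful to the recorded spike counts.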