Deeper explorations of embedding structure and its implications for retrieval
Beyond cosine
Cosine similarity collapses thousands of dimensions into a single number. This is like reducing a full-color image to a single brightness value: you get some information, but you lose the structure that made the original rich.
Two nodes can have the same cosine similarity score against a query but be similar for completely different reasons. The scalar hides which dimensions contributed — and those differences carry structural meaning.
Consider two concept pairs that both score 0.40 against the same query. When we look at which dimensional regions drive the similarity, the picture diverges:
Pair A shares activation in the same dimensional region — they are close because they use similar vocabulary. This is vocabulary kinship.
Pair B shares activation across different regions — they are close because of distributed structural alignment that spans multiple concept facets. This is structural kinship.
Cosine similarity treats both identically. A smarter traversal algorithm could use the activation pattern to distinguish these cases — weighting structural kinship higher during graph expansion.
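One way to see what the scalar hides is to split the cosine into additive per-region contributions. The sketch below is illustrative only: it assumes that contiguous index blocks are a meaningful notion of "region" (unverified for any real model), and `region_contributions`, `concentration`, and `n_regions` are hypothetical names, not part of any library.

```python
import numpy as np

def region_contributions(a, b, n_regions=8):
    """Split cos(a, b) into additive contributions from contiguous dimension blocks."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Per-dimension terms of the cosine; summed over all dimensions they equal cos(a, b).
    terms = (a * b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Assumption: contiguous index blocks stand in for "dimensional regions".
    return np.array([chunk.sum() for chunk in np.array_split(terms, n_regions)])

def concentration(contribs):
    """Share of the total absolute contribution carried by the single largest region."""
    mags = np.abs(contribs)
    return float(mags.max() / mags.sum())
```

Under these assumptions, two pairs with the same cosine can be compared by `concentration`: a value near 1 means one region drives the score (closer to the vocabulary-kinship picture), while a value near `1 / n_regions` means the similarity is spread across regions (closer to the structural-kinship picture).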
When two nodes are connected by an edge, the vector difference between them has a direction in embedding space. That direction encodes what kind of relationship the edge represents — or at least, that's the hypothesis. Let's walk it out layer by layer, marking what's solid and what's uncertain.
Entity A has an embedding in ℝ³⁰⁷². Entity B has an embedding in ℝ³⁰⁷². The edge between them can be represented as the difference vector (B − A). That vector has a direction (which dimensions change and by how much) and a magnitude (how far apart they are in embedding space). This is just arithmetic on vectors; no interpretation is needed yet.
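As a minimal sketch of that arithmetic, the function below returns the unit direction and the magnitude of the difference vector. The name `edge_vector` is hypothetical and chosen only for illustration.

```python
import numpy as np

def edge_vector(a, b):
    """Return (unit direction, magnitude) of the difference vector B - A."""
    diff = np.asarray(b, dtype=float) - np.asarray(a, dtype=float)
    magnitude = float(np.linalg.norm(diff))
    if magnitude == 0.0:
        # Identical embeddings: the difference has no direction to report.
        raise ValueError("identical embeddings: the edge has no direction")
    return diff / magnitude, magnitude
```

If both embeddings are unit-normalized, the magnitude adds nothing beyond cosine similarity, since ‖B − A‖² = 2 − 2·cos(A, B). Any information not already in the scalar has to come from the direction.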
If A's embedding has high-magnitude values concentrated in certain dimensions (say, 200–400) and B's are concentrated in different dimensions (say, 1000–1200), we say the difference vector "rotates" — it points out of A's active subspace and into a different one.
Contrast with "elaboration": A lives in dimensions 200–400, B also lives in dimensions 200–400 but with different values. The difference vector stays within the same subspace. Both A and B are "about the same kind of thing," and the edge between them is more-of-the-same rather than a boundary crossing.
The condition: this distinction only works if embeddings have localized activation — meaning individual entities primarily activate a subset of the 3072 dimensions rather than spreading evenly across all of them. If every entity activates all dimensions roughly equally (a dense, distributed representation), then every difference vector "rotates" and the distinction collapses into noise.
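The rotation/elaboration contrast can be turned into a number, at least as a first approximation. The sketch below assumes that "A's active subspace" means the smallest set of raw dimensions holding 90% of A's squared norm; that definition, the threshold, and the names `active_dims` and `rotation_score` are illustrative assumptions, not an established metric.

```python
import numpy as np

def active_dims(v, energy=0.9):
    """Smallest set of dimensions holding `energy` of v's squared norm."""
    sq = np.asarray(v, dtype=float) ** 2
    order = np.argsort(sq)[::-1]                # largest squared values first
    cum = np.cumsum(sq[order]) / sq.sum()
    k = int(np.searchsorted(cum, energy) + 1)   # first prefix reaching the threshold
    return order[:k]

def rotation_score(a, b, energy=0.9):
    """Fraction of the squared norm of (b - a) lying outside a's active dimensions."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = b - a
    outside = np.ones(diff.shape[0], dtype=bool)
    outside[active_dims(a, energy)] = False     # True marks dimensions outside a's active set
    return float((diff[outside] ** 2).sum() / (diff ** 2).sum())

a = np.array([3.0, 4.0, 0.0, 0.0])
print(rotation_score(a, np.array([4.0, 3.0, 0.0, 0.0])))  # elaboration: prints 0.0
print(rotation_score(a, np.array([0.0, 0.0, 3.0, 4.0])))  # full subspace switch: prints 0.5
```

Note the scale: because (B − A) always contains −A, a complete switch between equal-norm vectors scores about 0.5 rather than 1. If activation is dense, every pair will land near the same value, which is the collapse into noise described above.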
Do text-embedding-3-large embeddings exhibit localized activation? We don't yet know. This is empirically testable: measure the sparsity pattern (how many dimensions carry most of the squared magnitude for a given entity). If activation is concentrated, subspace analysis can work directly on the raw dimensions. If it is uniformly distributed, we need a decomposition step first (sparse autoencoders, PCA rotation, or the Matryoshka structure already built into the model).
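A minimal sketch of that sparsity measurement follows, using three per-vector statistics: the Gini coefficient of absolute values, the L1/L2 ratio, and the number of dimensions needed to reach 90% of the squared norm. The function names and the 90% threshold are assumptions made for illustration.

```python
import numpy as np

def gini(v):
    """Gini coefficient of |v|: 0 for perfectly uniform, near 1 for one-hot."""
    x = np.sort(np.abs(np.asarray(v, dtype=float)))
    n = x.size
    ranks = np.arange(1, n + 1)
    return float(2.0 * (ranks * x).sum() / (n * x.sum()) - (n + 1) / n)

def l1_l2_ratio(v):
    """L1/L2 norm ratio: 1 for one-hot, sqrt(n) for perfectly uniform magnitudes."""
    v = np.asarray(v, dtype=float)
    return float(np.abs(v).sum() / np.linalg.norm(v))

def dims_for_energy(v, energy=0.9):
    """How many of the largest dimensions are needed to hold `energy` of the squared norm."""
    sq = np.sort(np.asarray(v, dtype=float) ** 2)[::-1]
    cum = np.cumsum(sq) / sq.sum()
    return int(np.searchsorted(cum, energy) + 1)
```

These numbers only mean something against a baseline. A random dense vector (for example, Gaussian noise of the same length) already has a Gini well above zero, so the useful comparison is entity embeddings versus such a baseline, not entity embeddings versus zero.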
The interpretive leap: if different dimension ranges encode different conceptual domains (temporal reasoning, spatial relations, social dynamics, etc.), then a vector that rotates between subspaces represents a domain shift — the relationship crosses a conceptual boundary.
Evidence that this might hold: research on dimension-to-concept mapping (Mellina et al. 2025) shows individual dimensions in embedding models can be mapped to ontology concepts using KL divergence. Monosemanticity research (Anthropic, 2023) finds that although individual neurons are polysemantic, linear directions in activation space recovered by dictionary learning often correspond to interpretable features. And RotatE (Sun et al. 2019) shows that KG relations can be modeled as element-wise rotations in complex space, with each complex dimension carrying a rotation angle specific to the relation.
Evidence it might not hold for our case: those results come from models trained for specific tasks, not general-purpose text embeddings. OpenAI's model may distribute concept information more uniformly. We also don't control the training — we can't inspect what each dimension "means" without empirical probing.
If claims 2 and 3 hold, then edges where the embedding rotates into a new subspace connect things that share mechanism across domains — structural bridges. Edges that stay in the same subspace are elaborations — more detail on the same topic.
This matters for traversal: a query activates certain dimensions. Following edges that stay in those dimensions gives you "more like this." Following edges that rotate out of those dimensions gives you "related through mechanism, not through vocabulary" — the cross-domain connections that are invisible to cosine search.
The example: "basin key" → "antigenic sin in basin keys" should stay in the same dimensional neighborhood (elaboration — same concept, more specific). "Dormant fidelity" → "wake problem" should rotate — they share a relationship through mechanism (records that exist but don't activate), not through vocabulary.
To move these claims from hypothesis to engineering:
| Test | What it tells us | Tools needed |
|---|---|---|
| Sparsity check | Do our entities have localized activation, or is it uniform? | Compute L1/L2 ratio or Gini coefficient per entity embedding |
| Rotation measurement | For known elaboration pairs vs known bridge pairs, does rotation angle differ? | Share of (B−A) lying outside A's active dimensions, per pair type; compare distributions |
| Dimension clustering | Do dimensions group into interpretable factors? | PCA / sparse autoencoder on our 1100 entity embeddings |
| Bridge prediction | Can rotation angle predict which edges are structural bridges? | Label edges (elaboration vs bridge), train simple classifier on rotation features |
If the sparsity check fails (embeddings are dense/uniform), we know to decompose first before attempting subspace analysis. If it succeeds, the remaining tests become tractable.
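For the "compare distributions" step in the rotation measurement, a permutation test is one low-assumption option. The sketch below assumes we already have one rotation score per labeled pair (however that score is defined); `permutation_test` is a hypothetical helper, and the choice of a difference-of-means statistic is ours, not something the tests above prescribe.

```python
import numpy as np

def permutation_test(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference of means, mean(y) - mean(x)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = y.mean() - x.mean()
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)          # shuffle the group labels
        delta = perm[x.size:].mean() - perm[:x.size].mean()
        if abs(delta) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return float(observed), (hits + 1) / (n_perm + 1)
```

Here `x` would hold scores for known elaboration pairs and `y` scores for known bridge pairs. A small p-value says the two groups differ on average; it does not by itself say rotation is a good per-edge classifier, which is what the bridge prediction test is for.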
Standard retrieval compares the query against every node using full cosine across all dimensions. But once phase 1 returns seeds, those seeds define a dimensional subspace — the set of dimensions where the seeds cluster together.
Phase 2 could score neighbors only in that subspace — the dimensions where the query's seeds agree. This filters out noise dimensions (the ones that contribute nothing to the query's specific structural context) and makes the expansion sensitive to why the seeds are relevant, not just that they are.
A neighbor that scores high in the seed subspace shares the specific structural features that matter for this query. A neighbor that scores high in full cosine but low in the seed subspace is coincidentally close — similar for reasons unrelated to the question being asked.
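A rough sketch of phase 2 follows. It assumes "dimensions where the seeds agree" can be approximated by a high ratio of mean to spread across the seed embeddings, and that keeping the top `k` such dimensions is a reasonable cut. Both the agreement measure and the names `seed_subspace` and `subspace_cosine` are hypothetical choices made for illustration.

```python
import numpy as np

def seed_subspace(seeds, k=256, eps=1e-8):
    """Indices of the k dimensions where the seed embeddings agree most."""
    seeds = np.asarray(seeds, dtype=float)      # shape (n_seeds, dim)
    # Assumption: agreement = large shared mean relative to spread across seeds.
    agreement = np.abs(seeds.mean(axis=0)) / (seeds.std(axis=0) + eps)
    return np.argsort(agreement)[-k:]

def subspace_cosine(query, candidate, dims):
    """Cosine similarity restricted to the given dimensions."""
    q = np.asarray(query, dtype=float)[dims]
    c = np.asarray(candidate, dtype=float)[dims]
    denom = np.linalg.norm(q) * np.linalg.norm(c)
    return float(q @ c / denom) if denom > 0 else 0.0
```

Comparing `subspace_cosine` with full cosine for each neighbor gives the distinction described above: high in both means relevant for the same reasons as the seeds, while high in full cosine but low in the seed subspace suggests coincidental closeness. With a single seed the spread is zero everywhere, so this sketch falls back to picking the seed's largest-magnitude dimensions.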
Do dimensional regions in text-embedding-3-large encode structural relationships rather than surface vocabulary? This is an empirical question, testable by comparing dimensional activation patterns between known structural-kinship pairs and known vocabulary-kinship pairs. If the signatures are distinct and consistent, we can build traversal algorithms that weight structural bridges over vocabulary echoes.
Most systems use embeddings for exactly one thing: "find the nearest neighbor." But a vector space supports at least six distinct operations:
| # | Use | Mechanism | What it finds |
|---|---|---|---|
| 1 | Retrieval | Query vs all nodes → top N | Best entry point into memory |
| 2 | Discovery | Pairwise similarity above threshold | Undeclared relationships |
| 3 | Drift detection | Compare old vs new embeddings of the same term | Meaning shift over time |
| 4 | Cluster formation | Community detection on similarity graph | Concept families emerging before they're named |
| 5 | Gap detection | Find isolated nodes far from all clusters | Orphans that should connect to something |
| 6 | Surprise detection | Curated edge + low cosine | Structural relationships invisible to surface similarity |
Use #6 is especially important. A curated edge with low cosine similarity is not a failure; it's a signal. It means these concepts are related through mechanism, not through vocabulary. These are exactly the connections that should be preserved and studied, because they encode reasoning that similarity search alone would not find.
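Surprise detection is simple to sketch. The version below assumes embeddings are stored in a dict keyed by node id and curated edges are `(source, target)` pairs; the storage layout, the function name `surprising_edges`, and the 0.3 threshold are all illustrative assumptions that would need tuning on real data.

```python
import numpy as np

def surprising_edges(embeddings, edges, threshold=0.3):
    """Return curated edges whose endpoint cosine is below `threshold`, lowest first."""
    found = []
    for src, dst in edges:
        a = np.asarray(embeddings[src], dtype=float)
        b = np.asarray(embeddings[dst], dtype=float)
        cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        if cos < threshold:
            found.append((src, dst, cos))
    # Lowest cosine first: the edges least explainable by surface similarity.
    return sorted(found, key=lambda edge: edge[2])
```

The output is a review queue rather than a verdict: each listed edge is one a human curated despite low surface similarity, which makes it a candidate structural bridge worth examining.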