Beyond cosine: what the dimensions encode

Deeper explorations of embedding structure and its implications for retrieval

Beyond cosine

What else the dimensions encode

Cosine similarity collapses thousands of dimensions into a single number. This is like reducing a full-color image to a single brightness value — you get some information, but you lose the structure of what made the original rich.

3072 dimensions → 1 scalar = massive information loss

Two nodes can have the same cosine similarity score against a query but be similar for completely different reasons. The scalar hides which dimensions contributed — and those differences carry structural meaning.

Same score, different structure

Consider two concept pairs, each a node matched against the same query, that both score 0.40. When we look at which dimensional regions drive each score, the pictures diverge:

Figure: dimensional activation heatmaps (same cosine, different internal structure)

Pair A shares activation in the same dimensional region — they are close because they use similar vocabulary. This is vocabulary kinship.

Pair B shares activation across different regions — they are close because of distributed structural alignment that spans multiple concept facets. This is structural kinship.

Cosine similarity treats both identically. A smarter traversal algorithm could use the activation pattern to distinguish these cases — weighting structural kinship higher during graph expansion.
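A minimal sketch of how that activation pattern could be read off, assuming the embeddings are NumPy vectors. Because cos(q, n) = Σᵢ qᵢnᵢ / (‖q‖‖n‖) is a sum of per-dimension terms, those terms can be grouped by region to see where a score comes from. Treating contiguous, equal-width blocks of dimension indices as "regions" is an assumption made for illustration (whether such regions mean anything in a given model is an open empirical question), and `region_contributions` and `concentration` are hypothetical helper names.

```python
import numpy as np


def region_contributions(q: np.ndarray, n: np.ndarray, n_regions: int = 12) -> np.ndarray:
    """Split cos(q, n) into one summed contribution per contiguous block of dimensions.

    The block sums add up (up to float rounding) to the full cosine similarity.
    """
    # Per-dimension cosine terms: q_i * n_i / (|q| * |n|)
    terms = (q * n) / (np.linalg.norm(q) * np.linalg.norm(n))
    # Hypothetical "regions": equal-width blocks of dimension indices
    return np.array([block.sum() for block in np.array_split(terms, n_regions)])


def concentration(contribs: np.ndarray) -> float:
    """Share of the positive contribution that comes from the strongest region.

    Close to 1.0: a single region drives the match (the Pair A pattern).
    Close to 1 / n_regions: the match is spread across regions (the Pair B pattern).
    """
    positive = np.clip(contribs, 0.0, None)
    total = positive.sum()
    return float(positive.max() / total) if total > 0 else 0.0
```

Two nodes with the same cosine produce profiles with the same sum, but their `concentration` values can differ sharply; a traversal step could weight expansion by that number.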

The direction of difference

When two nodes are connected by an edge, the vector difference between them has a direction in embedding space. That direction encodes what kind of relationship the edge represents, or at least that's the hypothesis. Let's walk through it claim by claim, marking what's solid and what's uncertain.

Figure: edges as vectors. Rotation into a new subspace (domain shift) vs staying in the same subspace (elaboration)

Claim 1: An edge is a vector difference [solid — linear algebra]

Entity A has an embedding in ℝ³⁰⁷². Entity B has an embedding in ℝ³⁰⁷². The edge between them can be represented as the difference vector (B − A). That vector has a direction (which dimensions change and by how much) and a magnitude (how far apart they are in embedding space). This is just arithmetic on vectors — no interpretation needed yet.
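Written out, with a and b the embeddings of A and B and θ_AB the angle between them (the angle whose cosine is the similarity score):

```latex
\mathbf{d}_{AB} = \mathbf{b} - \mathbf{a}, \qquad
\hat{\mathbf{d}}_{AB} = \frac{\mathbf{d}_{AB}}{\lVert \mathbf{d}_{AB} \rVert}

\lVert \mathbf{d}_{AB} \rVert^{2}
  = \lVert \mathbf{a} \rVert^{2} + \lVert \mathbf{b} \rVert^{2}
  - 2\,\lVert \mathbf{a} \rVert\,\lVert \mathbf{b} \rVert \cos\theta_{AB}

\lVert \mathbf{a} \rVert = \lVert \mathbf{b} \rVert = 1
  \;\Longrightarrow\;
  \lVert \mathbf{d}_{AB} \rVert^{2} = 2 - 2\cos\theta_{AB}
```

The last line matters for what follows. If the embeddings are unit-normalized (OpenAI documents its embeddings as normalized to length 1; worth confirming for the model version in use), the edge's magnitude carries no information beyond the cosine score, so anything new has to come from the direction d̂.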

Claim 2: "Rotation" means the active dimensions change [precise, but conditional]

If A's embedding has high-magnitude values concentrated in certain dimensions (say, 200–400) and B's are concentrated in different dimensions (say, 1000–1200), we say the difference vector "rotates" — it points out of A's active subspace and into a different one.

Contrast with "elaboration": A lives in dimensions 200–400, B also lives in dimensions 200–400 but with different values. The difference vector stays within the same subspace. Both A and B are "about the same kind of thing," and the edge between them is more-of-the-same rather than a boundary crossing.

The condition: this distinction only works if embeddings have localized activation — meaning individual entities primarily activate a subset of the 3072 dimensions rather than spreading evenly across all of them. If every entity activates all dimensions roughly equally (a dense, distributed representation), then every difference vector "rotates" and the distinction collapses into noise.
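A minimal sketch of how "the active dimensions change" could be scored, assuming localized activation holds. Here an entity's active subspace is taken, as an assumption, to be the fewest dimensions holding a chosen share (80% by default) of its squared magnitude, and "rotation" is scored as 1 minus the Jaccard overlap of the two active sets. This compares the active sets directly rather than decomposing (B − A); a share-of-‖B − A‖² variant is possible, but it never reaches 1, because (B − A) always contains −A. `active_dims` and `rotation_score` are hypothetical names.

```python
import numpy as np


def active_dims(v: np.ndarray, mass: float = 0.8) -> np.ndarray:
    """Indices of the fewest dimensions that together hold `mass` of v's squared magnitude."""
    energy = v ** 2
    order = np.argsort(energy)[::-1]                  # highest-energy dimensions first
    cumulative = np.cumsum(energy[order]) / energy.sum()
    k = int(np.searchsorted(cumulative, mass)) + 1    # smallest k whose share reaches `mass`
    return order[:k]


def rotation_score(a: np.ndarray, b: np.ndarray, mass: float = 0.8) -> float:
    """1 - Jaccard overlap of the active dimension sets of a and b.

    0.0: a and b are active in the same dimensions (elaboration).
    1.0: their active dimensions are disjoint (rotation into a new subspace).
    """
    sa = set(active_dims(a, mass).tolist())
    sb = set(active_dims(b, mass).tolist())
    return 1.0 - len(sa & sb) / len(sa | sb)
```

One caveat on the design: if activation is flat inside a block, an 80% cut picks an arbitrary subset of that block, so two "same subspace" entities can still score above 0. A higher `mass` or an absolute magnitude threshold may behave better on real embeddings.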

Open question: Do text-embedding-3-large embeddings exhibit localized activation? We don't yet know. This is empirically testable: measure the sparsity pattern, i.e., how many dimensions carry most of a given entity's squared magnitude. If activation is concentrated, subspace analysis works. If it is spread uniformly, we need a decomposition step first (sparse autoencoders, a PCA rotation, or the Matryoshka structure already built into the model).
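A sketch of the sparsity check, assuming each entity embedding is a NumPy vector. It uses two standard measures (Hoyer's normalized L1/L2 ratio and the Gini coefficient of the absolute values) plus a direct count of how many dimensions hold a given share of the squared magnitude. The function names and the 0.9 default are illustrative choices, not a fixed protocol.

```python
import numpy as np


def hoyer_sparsity(v: np.ndarray) -> float:
    """Hoyer's L1/L2-based sparsity: 0.0 = all dims equal in magnitude, 1.0 = one active dim."""
    n = v.size
    ratio = np.abs(v).sum() / np.linalg.norm(v)   # L1/L2 ratio, between 1 and sqrt(n)
    return float((np.sqrt(n) - ratio) / (np.sqrt(n) - 1))


def gini(v: np.ndarray) -> float:
    """Gini coefficient of |v|: 0.0 = perfectly even, approaching 1.0 = concentrated."""
    c = np.sort(np.abs(v))                        # ascending order
    n = c.size
    i = np.arange(1, n + 1)
    return float(((2 * i - n - 1) * c).sum() / (n * c.sum()))


def dims_for_mass(v: np.ndarray, mass: float = 0.9) -> int:
    """How many dimensions are needed to hold `mass` of v's squared magnitude."""
    energy = np.sort(v ** 2)[::-1]                # largest first
    cumulative = np.cumsum(energy) / energy.sum()
    return int(np.searchsorted(cumulative, mass)) + 1
```

Run over every entity embedding, these give distributions rather than single verdicts. Comparing them against a dense baseline (for example, random Gaussian vectors of the same length) is one way to judge whether "concentrated" is meaningfully above chance.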

Claim 3: Subspace = domain [interpretive hypothesis — testable]

The interpretive leap: if different dimension ranges encode different conceptual domains (temporal reasoning, spatial relations, social dynamics, etc.), then a vector that rotates between subspaces represents a domain shift — the relationship crosses a conceptual boundary.

Evidence that this might hold: research on dimension-to-concept mapping (Mellina et al. 2025) shows that individual dimensions in embedding models can be mapped to ontology concepts using KL divergence. Monosemanticity research (Anthropic, 2023) finds that individual neurons are polysemantic, but that linear directions in activation space, recovered with sparse autoencoders, correspond to far more interpretable features. And RotatE (Sun et al. 2019) shows that knowledge-graph relations can be modeled as element-wise rotations in complex space, with each complex coordinate (a pair of real dimensions) rotated by an angle specific to the relation.
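For comparison, the core of RotatE as described by Sun et al. (2019): head h, relation r and tail t are complex vectors, every relation coordinate has unit modulus (so multiplying by it is a pure rotation), and a triple is plausible when the rotated head lands near the tail. Here ∘ is the element-wise product.

```latex
\mathbf{t} \approx \mathbf{h} \circ \mathbf{r},
\qquad \mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{C}^{k},
\qquad r_j = e^{\mathrm{i}\,\theta_{r,j}} \;\; (\lvert r_j \rvert = 1)

d_r(\mathbf{h}, \mathbf{t}) = \lVert \mathbf{h} \circ \mathbf{r} - \mathbf{t} \rVert
```

The rotation there is learned per relation inside a space trained for link prediction, whereas the "rotation" in Claim 2 is a shift of active dimensions inside a fixed, general-purpose space. The analogy is suggestive rather than a direct transfer.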

Evidence it might not hold for our case: those results come from models trained for specific tasks, not general-purpose text embeddings. OpenAI's model may distribute concept information more uniformly. We also don't control the training — we can't inspect what each dimension "means" without empirical probing.

Claim 4: Rotation edges are structural bridges [follows from claim 3 if it holds]

If claims 2 and 3 hold, then edges where the embedding rotates into a new subspace connect things that share mechanism across domains — structural bridges. Edges that stay in the same subspace are elaborations — more detail on the same topic.

This matters for traversal: a query activates certain dimensions. Following edges that stay in those dimensions gives you "more like this." Following edges that rotate out of those dimensions gives you "related through mechanism, not through vocabulary" — the cross-domain connections that are invisible to cosine search.
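A sketch of that traversal rule under stated assumptions: the query's "active dimensions" are approximated by its top-k dimensions by magnitude (k = 256 is arbitrary), and each outgoing edge is labelled by how much of its step (neighbor − current node) stays inside those dimensions. The function names and the 0.5 cut-off are hypothetical.

```python
import numpy as np


def query_dims(query: np.ndarray, k: int = 256) -> np.ndarray:
    """Boolean mask over the query's k highest-magnitude dimensions."""
    mask = np.zeros(query.shape[0], dtype=bool)
    mask[np.argsort(np.abs(query))[-k:]] = True
    return mask


def in_query_share(node: np.ndarray, neighbor: np.ndarray, mask: np.ndarray) -> float:
    """Fraction of the edge step's squared length lying inside the query's dimensions."""
    step = neighbor - node
    total = float(step @ step)
    # A zero-length step does not leave the query's dimensions, so count it as staying.
    return float((step[mask] ** 2).sum() / total) if total > 0 else 1.0


def split_neighbors(query: np.ndarray, node: np.ndarray, neighbors: dict,
                    k: int = 256, cutoff: float = 0.5):
    """Partition neighbors into 'more like this' and 'cross-domain' candidates."""
    mask = query_dims(query, k)
    same, cross = [], []
    for name, vec in neighbors.items():
        (same if in_query_share(node, vec, mask) >= cutoff else cross).append(name)
    return same, cross
```

Which bucket to expand first is a policy choice: expanding `cross` deliberately is how a traversal would go looking for mechanism-level connections.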

The example: "basin key" → "antigenic sin in basin keys" should stay in the same dimensional neighborhood (elaboration — same concept, more specific). "Dormant fidelity" → "wake problem" should rotate — they share a relationship through mechanism (records that exist but don't activate), not through vocabulary.

The motivating case: "life resisting entropy" → "rheology."
The Deborah number (from rheology) measures whether a material appears solid or fluid depending on the timescale of observation. Applied to selfhood: whether an agent appears to have a persistent identity depends on the timescale at which you observe it. This connection has zero vocabulary overlap with "life resisting entropy", so cosine search is unlikely to surface it. But if our hypothesis holds, the vector difference between these two entities should show a large rotation (the high-activation dimensions shift), flagging it as a structural bridge rather than a vocabulary neighbor.
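The standard definition, for readers outside rheology: the Deborah number is the ratio of a material's relaxation time to the timescale of observation.

```latex
\mathrm{De} = \frac{\tau_{\text{relaxation}}}{t_{\text{observation}}},
\qquad \mathrm{De} \gg 1 \;\Rightarrow\; \text{solid-like},
\qquad \mathrm{De} \ll 1 \;\Rightarrow\; \text{fluid-like}
```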

Status: We found this connection through graph structure (neighborhood overlap via shared structural hubs). The dimensional rotation hypothesis would give us a second, independent way to detect such bridges — directly from the embeddings, without needing the graph.
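The source doesn't specify which overlap measure was used. As an illustration only, here is a Jaccard overlap of neighbor sets over a plain adjacency map: two unconnected nodes that share many hubs score high, which is how a bridge can surface with no embeddings at all. The adjacency format, threshold, and function names are assumptions.

```python
def neighborhood_overlap(adj: dict[str, set[str]], a: str, b: str) -> float:
    """Jaccard overlap of the neighbor sets of a and b (each excluding the other)."""
    na = adj.get(a, set()) - {b}
    nb = adj.get(b, set()) - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def bridge_candidates(adj: dict[str, set[str]], threshold: float = 0.3):
    """Unconnected node pairs whose neighborhoods overlap by at least `threshold`."""
    nodes = sorted(adj)
    out = []
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if b in adj[a]:
                continue  # already directly connected; not a hidden bridge
            score = neighborhood_overlap(adj, a, b)
            if score >= threshold:
                out.append((a, b, score))
    return sorted(out, key=lambda t: -t[2])  # strongest overlap first
```

This is quadratic in the number of nodes, which is fine at the scale of a thousand or so entities but would need a candidate-generation step on a larger graph.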

What we'd need to verify

To move these claims from hypothesis to engineering:

| Test | What it tells us | Tools needed |
|---|---|---|
| Sparsity check | Do our entities have localized activation, or is it uniform? | Compute L1/L2 ratio or Gini coefficient per entity embedding |
| Rotation measurement | For known elaboration pairs vs known bridge pairs, does rotation angle differ? | Cosine of (B−A) direction across pair types; compare distributions |
| Dimension clustering | Do dimensions group into interpretable factors? | PCA / sparse autoencoder on our 1100 entity embeddings |
| Bridge prediction | Can rotation angle predict which edges are structural bridges? | Label edges (elaboration vs bridge), train a simple classifier on rotation features |

If the sparsity check fails (embeddings are dense/uniform), we know to decompose first before attempting subspace analysis. If it succeeds, the remaining tests become tractable.
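One candidate decomposition step is PCA over the entity embeddings. A minimal NumPy sketch, assuming the embeddings are stacked into an (n_entities, 3072) array: it reports how many principal components explain a given share of the variance and returns coordinates in the rotated basis, where the sparsity check could be re-run. The function name and the 0.9 default are illustrative; a sparse autoencoder would replace this step if a linear rotation turns out not to be enough.

```python
import numpy as np


def pca_rotate(X: np.ndarray, var_share: float = 0.9):
    """PCA via SVD of the mean-centered data.

    Returns (coords, components, k): each row's coordinates in the principal basis,
    the principal axes as rows, and how many components explain `var_share`.
    """
    Xc = X - X.mean(axis=0)                          # center each dimension
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S ** 2 / (S ** 2).sum()              # variance share per component
    k = int(np.searchsorted(np.cumsum(explained), var_share)) + 1
    coords = Xc @ Vt.T                               # equivalent to U * S
    return coords, Vt, k
```

With about 1100 entities and 3072 dimensions there are at most 1099 non-trivial components, so `k` is bounded by the sample size, not the embedding width.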

Subspace projection for smarter retrieval

Standard retrieval compares the query against every node using full cosine across all dimensions. But once phase 1 returns seeds, those seeds define a dimensional subspace — the set of dimensions where the seeds cluster together.

Figure: projecting into the seed-defined subspace filters noise dimensions

Phase 2 could score neighbors only in that subspace — the dimensions where the query's seeds agree. This filters out noise dimensions (the ones that contribute nothing to the query's specific structural context) and makes the expansion sensitive to why the seeds are relevant, not just that they are.

A neighbor that scores high in the seed subspace shares the specific structural features that matter for this query. A neighbor that scores high in full cosine but low in the seed subspace is coincidentally close — similar for reasons unrelated to the question being asked.
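A sketch of phase 2 scoring in a seed-defined subspace. "Where the seeds agree" is operationalized here, as an assumption, as the dimensions where the seeds' mean value is large relative to their spread across seeds; the top-k of those form the subspace. The value of k, the agreement ratio, and the function names are illustrative, not a tested recipe.

```python
import numpy as np


def seed_subspace(seeds: np.ndarray, k: int = 256) -> np.ndarray:
    """Indices of the k dimensions where the seed embeddings agree most strongly.

    Agreement = |mean across seeds| / (std across seeds + eps): large when every
    seed pushes the dimension the same way by a similar amount.
    """
    agreement = np.abs(seeds.mean(axis=0)) / (seeds.std(axis=0) + 1e-8)
    return np.argsort(agreement)[-k:]


def subspace_cosine(u: np.ndarray, v: np.ndarray, dims: np.ndarray) -> float:
    """Cosine similarity computed only over the selected dimensions."""
    a, b = u[dims], v[dims]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def rank_neighbors(query: np.ndarray, seeds: np.ndarray, neighbors: dict, k: int = 256):
    """Rank neighbors by cosine with the query inside the seed subspace, best first."""
    dims = seed_subspace(seeds, k)
    scores = {name: subspace_cosine(query, vec, dims) for name, vec in neighbors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Computing both the full cosine and the subspace score for each neighbor makes the distinction above explicit: a high full score paired with a low subspace score marks a coincidental neighbor.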

The research direction: We don't yet know which dimensions in models like text-embedding-3-large encode structural relationships vs surface vocabulary. This is an empirical question — testable by comparing dimensional activation patterns between known structural-kinship pairs and known vocabulary-kinship pairs. If the signatures are distinct and consistent, we can build traversal algorithms that weight structural bridges over vocabulary echoes.

Uses

Six things embeddings can do beyond similarity search

Most systems use embeddings for exactly one thing: "find the nearest neighbor." But a vector space supports at least six distinct operations:

| # | Use | Mechanism | What it finds |
|---|---|---|---|
| 1 | Retrieval | Query vs all nodes → top N | Best entry point into memory |
| 2 | Discovery | Pairwise similarity above threshold | Undeclared relationships |
| 3 | Drift detection | Compare old vs new embeddings of the same term | Meaning shift over time |
| 4 | Cluster formation | Community detection on similarity graph | Concept families emerging before they're named |
| 5 | Gap detection | Find isolated nodes far from all clusters | Orphans that should connect to something |
| 6 | Surprise detection | Curated edge + low cosine | Structural relationships invisible to surface similarity |
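Several rows of the table reduce to a few lines of linear algebra once the embeddings are stacked into a matrix. A sketch of discovery (#2), drift detection (#3) and gap detection (#5), with these assumptions: declared edges are stored as index pairs (i, j) with i < j, drift compares two embeddings from the same model (vectors from different models are not directly comparable), and "far from all clusters" is simplified to "far from every other node". Thresholds and names are placeholders that would need tuning.

```python
import numpy as np


def normalize(X: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products are cosines."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def discover(X: np.ndarray, existing: set, threshold: float = 0.8):
    """#2: node pairs at or above `threshold` cosine with no declared edge yet."""
    Xn = normalize(X)
    S = Xn @ Xn.T
    i, j = np.triu_indices(len(X), k=1)          # each unordered pair once
    hits = S[i, j] >= threshold
    return [(int(a), int(b)) for a, b in zip(i[hits], j[hits])
            if (int(a), int(b)) not in existing]


def drift(old: np.ndarray, new: np.ndarray) -> float:
    """#3: 1 - cosine between two embeddings of the same term (0 = no shift)."""
    return 1.0 - float(old @ new / (np.linalg.norm(old) * np.linalg.norm(new)))


def gaps(X: np.ndarray, threshold: float = 0.3):
    """#5: nodes whose best match among all other nodes is below `threshold`."""
    Xn = normalize(X)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)                 # ignore self-similarity
    return [int(k) for k in np.where(S.max(axis=1) < threshold)[0]]
```

The full similarity matrix is fine for around a thousand nodes; at larger scale an approximate nearest-neighbor index would replace the dense `S`.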

Use #6 is especially important. A curated edge with low cosine similarity is not a failure; it's a signal. It means these concepts are related through mechanism, not through vocabulary. These are exactly the connections that should be preserved and studied, because they encode reasoning that similarity search alone would not surface.
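Surprise detection inverts the discovery filter: start from curated edges and keep the ones whose cosine is low. A sketch, assuming embeddings are held in a dict keyed by node name and curated edges are name pairs; the 0.3 threshold and the function names are placeholders.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Plain cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def surprising_edges(emb: dict, curated_edges, threshold: float = 0.3):
    """Curated edges whose endpoints have cosine below `threshold`, lowest first.

    These are candidates to preserve and study, not errors to prune.
    """
    scored = [(a, b, cosine(emb[a], emb[b])) for a, b in curated_edges]
    return sorted((e for e in scored if e[2] < threshold), key=lambda e: e[2])
```

Sorting lowest-first puts the least vocabulary-like connections at the top of the review list.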

Cosine finds proximity. Curated edges preserve reasoning.
The embedding space tells you what's nearby. The graph tells you what's connected and why. The source artifacts tell you what actually happened. Three layers, three questions, one system.