How knowledge graphs measure whether two concepts point in the same direction
Section 1
Two arrows
Each concept in the knowledge graph is stored as a vector: an arrow from the origin
in high-dimensional space. Cosine similarity is the cosine of the angle between
two such arrows, a score from −1 to 1.
• 1 means the vectors point in the same direction (maximum similarity).
• 0 means the vectors are perpendicular (no relationship).
• −1 means the vectors point in opposite directions (anti-similar).
The score depends only on direction, not on length. Two vectors of different magnitudes
pointing the same way still score 1.
The question is: where does this score come from? It emerges from an operation called
the dot product.
[Interactive figure: drag the circles to move the two vectors; the readout shows cosine similarity and the angle θ.]
Section 2
The dot product, derived
A 2D vector has two components: its x and y coordinates. The dot product multiplies
corresponding components and sums the results:
A · B = a₁b₁ + a₂b₂
This produces a scalar. The connection to angle comes from the law of cosines.
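The triangle formed by vectors A, B, and their difference A−B has sides
|A|, |B|, and |A−B|. Applying the law of cosines:
|A−B|² = |A|² + |B|² − 2 |A| |B| cos(θ)
Expanding the same squared length component by component:
|A−B|² = (a₁ − b₁)² + (a₂ − b₂)² = |A|² + |B|² − 2(a₁b₁ + a₂b₂)
Equating both expressions and canceling |A|² + |B|²: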
A · B = |A| |B| cos(θ)
The dot product is not arbitrary: the "multiply and add" recipe is exactly what
remains when |A−B|² is expanded with the Pythagorean theorem and compared with the
law of cosines. That is why a sum of component products encodes the angle between vectors.
A = [120, 90]
B = [100, 100]
120 × 100 + 90 × 100 = 21000
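The worked numbers above can be checked in a few lines. This is a minimal sketch in plain Python; the helper names `dot` and `norm` are illustrative and not taken from any library. It computes A · B for the example vectors and confirms that the component-wise value of |A−B|² matches |A|² + |B|² − 2(A · B).

```python
import math

def dot(a, b):
    """Multiply corresponding components and sum the results."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Length of a vector: square root of its dot product with itself."""
    return math.sqrt(dot(a, a))

A = [120, 90]
B = [100, 100]
diff = [x - y for x, y in zip(A, B)]    # A - B = [20, -10]

ab = dot(A, B)                          # 120*100 + 90*100
lhs = dot(diff, diff)                   # |A-B|^2 from components
rhs = dot(A, A) + dot(B, B) - 2 * ab    # |A|^2 + |B|^2 - 2(A.B)

cos_theta = ab / (norm(A) * norm(B))
theta_deg = math.degrees(math.acos(cos_theta))

print(ab, lhs, rhs)                     # 21000 500 500
print(round(cos_theta, 4), round(theta_deg, 2))   # 0.9899 8.13
```

The two example vectors are only about 8 degrees apart, which is why the cosine is close to 1.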
[Interactive figure: drag the vectors; the triangle shows A, B, and A−B, with readouts for |A|, |B|, and |A−B|.]
Section 3
Projection
The dot product has a direct geometric interpretation. Drop a perpendicular from the
tip of A onto the line of B. The foot of that perpendicular is the
projection of A onto B — the "shadow" A casts along B's direction.
The shadow's length is |A|cos(θ): the component of A that aligns with B. Multiply
this by |B| and you recover the dot product. Divide the dot product by |B| and you
get the shadow length.
• Same direction: long positive shadow. A agrees with B.
• Perpendicular: zero shadow. A has no component along B.
• Opposite: negative shadow (falls behind the origin). A opposes B.
The dot product measures directional agreement. The projection makes this literal:
the shadow is the shared direction, rendered visible.
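The three shadow cases can be reproduced numerically. The sketch below assumes plain Python lists as vectors; `shadow_length` is a hypothetical helper name for the signed projection length (A · B) / |B|.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shadow_length(a, b):
    """Signed length of a's projection onto the line of b: (a . b) / |b|."""
    return dot(a, b) / math.sqrt(dot(b, b))

B = [2.0, 0.0]
print(shadow_length([3.0, 4.0], B))    # 3.0: A has a positive component along B
print(shadow_length([0.0, 5.0], B))    # 0.0: perpendicular, no shadow
print(shadow_length([-3.0, 1.0], B))   # -3.0: the shadow falls behind the origin
```

Multiplying any of these shadow lengths by |B| (here 2.0) gives back the dot product of the pair.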
[Interactive figure: drag the vectors; the blue line is A's shadow on B, with readouts for the dot product, shadow length, |A|, |B|, and θ.]
Section 4
Normalization
The dot product conflates angle and magnitude. Longer vectors yield larger
dot products at the same angle. To isolate the angle, divide by both lengths:
cos(θ) = (A · B) / (|A| × |B|)
This is cosine similarity. The denominator cancels magnitudes;
only direction remains.
Unlock the lengths and experiment. Stretching or shrinking a vector changes the
dot product but leaves cosine similarity unchanged. This is why cosine works for
comparing texts of different lengths — a sentence and a paragraph about the same
topic can score near 1.0.
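The stretching experiment can also be run in code. This is a minimal sketch in plain Python, not a reference implementation; raising an error for a zero-length vector is an assumption made here, since the formula divides by zero in that case.

```python
import math

def cosine_similarity(a, b):
    """(a . b) / (|a| |b|); undefined when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

A = [3.0, 4.0]
B = [4.0, 3.0]
A_long = [10 * x for x in A]    # same direction, ten times the length

dot_short = sum(x * y for x, y in zip(A, B))
dot_long = sum(x * y for x, y in zip(A_long, B))

print(dot_short, dot_long)                  # 24.0 240.0: the dot product grows
print(cosine_similarity(A, B))              # 0.96
print(cosine_similarity(A_long, B))         # same score, up to rounding
```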
Why cosine, not Euclidean distance? Euclidean distance mixes direction with
length, and in high-dimensional spaces pairwise distances tend to concentrate around a
common value, which makes raw distances harder to interpret. Cosine similarity discards
length and compares direction only.
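One way to see how the two measures relate: raw Euclidean distance changes when a vector is stretched, but once both vectors are scaled to unit length the squared distance is a fixed function of the cosine, |A−B|² = 2 − 2cos(θ). This follows from the law of cosines with |A| = |B| = 1. The sketch below checks both points; the helper names are illustrative.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

A = [3.0, 4.0]
B = [4.0, 3.0]
B_long = [2 * x for x in B]     # same direction as B, twice the length

# Raw distance depends on length; cosine does not.
print(euclidean(A, B), euclidean(A, B_long))
print(cosine_similarity(A, B), cosine_similarity(A, B_long))

# After normalization, squared distance equals 2 - 2*cos(theta).
sq_dist = euclidean(normalize(A), normalize(B_long)) ** 2
print(abs(sq_dist - (2 - 2 * cosine_similarity(A, B_long))) < 1e-9)   # True
```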
[Interactive figure: unlock the lengths and drag along an arrow to stretch it; the readout shows the dot product, |A| × |B|, cosine similarity, and θ.]
The dimensional leap
From 2D to 3D to nD
The derivation above used two components. Adding a third changes nothing structural —
the law of cosines still holds, and expanding |A−B|² produces one additional term:
2D: A · B = a₁b₁ + a₂b₂
3D: A · B = a₁b₁ + a₂b₂ + a₃b₃
nD: A · B = a₁b₁ + a₂b₂ + ⋯ + aₙbₙ
Each additional dimension adds one multiplication and one addition, so the cost of the formula grows linearly with the number of dimensions.
The geometric meaning is unchanged: the dot product still encodes the angle between
two vectors, regardless of how many dimensions the space has.
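In code, the same function covers every case, because the sum simply runs over however many components there are. A minimal sketch, assuming vectors are plain Python lists of equal length:

```python
def dot(a, b):
    """Dot product for two vectors with the same number of components."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return sum(x * y for x, y in zip(a, b))

print(dot([1, 2], [3, 4]))            # 2D: 1*3 + 2*4 = 11
print(dot([1, 2, 3], [4, 5, 6]))      # 3D: 4 + 10 + 18 = 32
print(dot([1] * 768, [2] * 768))      # 768D: 768 terms of 1*2 = 1536
```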
What does change is the richness of the space. In 2D, there is exactly one
line perpendicular to any given vector. In 3D, there is an entire plane of
perpendicular directions. In 768 dimensions, the perpendicular directions span a
767-dimensional subspace, so vectors can be "unrelated" in far more ways.
This is part of why high-dimensional embeddings work: there is enough room to encode
many largely independent similarity relationships at once with little interference.
Two concepts can be close along one set of dimensions and far along another without
conflict, because there are hundreds of axes available.
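A rough way to get a feel for this extra room is to draw random vectors and see how close to perpendicular they land. The sketch below uses random Gaussian vectors as a stand-in, which is an assumption for illustration only: real embeddings are not random, and `mean_abs_cosine` is a hypothetical helper. The printed averages depend on the seed, so no expected values are shown.

```python
import math
import random

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_abs_cosine(dim, pairs, rng):
    """Average |cosine| over pairs of random Gaussian vectors with `dim` components."""
    total = 0.0
    for _ in range(pairs):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        b = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += abs(cosine_similarity(a, b))
    return total / pairs

rng = random.Random(0)    # fixed seed so the run is repeatable
for dim in (2, 3, 768):
    print(dim, round(mean_abs_cosine(dim, 200, rng), 3))
```

In low dimensions random pairs are often strongly aligned or opposed; in 768 dimensions the average |cosine| should come out much closer to 0, meaning random directions are typically close to perpendicular.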
Section 5
Three dimensions
The same operation in 3D. Rotate the view to confirm that cosine similarity is a
property of the vectors, not of the viewpoint. The score is invariant under rotation.
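The invariance can be checked without the viewer. Orbiting the camera is equivalent to applying the same rotation to both vectors, so the sketch below rotates A and B about the z-axis and compares scores. The choice of axis and of a 40-degree angle is arbitrary, and `rotate_z` is an illustrative helper.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rotate_z(v, angle_deg):
    """Rotate a 3D vector about the z-axis by angle_deg degrees."""
    t = math.radians(angle_deg)
    x, y, z = v
    return [x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t),
            z]

A = [1.0, 2.0, 3.0]
B = [-2.0, 0.5, 1.0]

before = cosine_similarity(A, B)
after = cosine_similarity(rotate_z(A, 40), rotate_z(B, 40))
print(abs(before - after) < 1e-9)    # True: same rotation on both, same score
```

Rotating only one of the two vectors would change the angle between them, and with it the score.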
[Interactive figure: drag the background to orbit and drag the circles to move the vectors; the readout shows cosine similarity and θ.]
The jump from 3 to 768 (or 3,072) dimensions is purely algebraic — the geometry
does not change. Each additional dimension is another axis the embedding model uses to
encode meaning. Cosine similarity measures whether two texts point in the same composite
direction across all axes simultaneously.
Section 6
Application: knowledge graph retrieval
In a knowledge graph, the primary structure is curated edges — explicit relationships
declared by a human or agent. Cosine similarity provides a secondary layer: connections
that nobody declared but that exist in embedding space.
In practice:
• similar — nearest neighbors in embedding space. "What else points this direction?"
• surprise — nodes connected by curated edges but distant in embedding space. Structurally linked, semantically divergent.
• threshold = 0.70: a design decision, not a constant. Set it much lower (below about 0.50) and almost everything connects; set it much higher (above about 0.90) and only near-duplicates match.
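The three ideas above can be sketched on toy data. Everything in this example is hypothetical: the node names, the 3-component "embeddings", the curated edges, and the function names `similar` and `surprise`. Reusing the same threshold to decide what counts as "distant" for a surprise is also an assumption, not something the text prescribes.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical toy graph: tiny made-up embeddings and hand-declared edges.
embeddings = {
    "dog":   [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "leash": [0.1, 0.9, 0.2],
    "tax":   [0.0, 0.1, 0.9],
}
curated_edges = [("dog", "puppy"), ("dog", "leash")]

THRESHOLD = 0.70    # a design decision, tuned per graph and embedding model

def similar(node, threshold=THRESHOLD):
    """Other nodes whose cosine similarity to `node` meets the threshold, best first."""
    scores = [(other, cosine_similarity(embeddings[node], vec))
              for other, vec in embeddings.items() if other != node]
    hits = [pair for pair in scores if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

def surprise(threshold=THRESHOLD):
    """Curated edges whose endpoints score below the threshold in embedding space."""
    return [(a, b) for a, b in curated_edges
            if cosine_similarity(embeddings[a], embeddings[b]) < threshold]

print([name for name, _ in similar("dog")])    # ['puppy']
print(surprise())                              # [('dog', 'leash')]
```

Here "dog" and "leash" are linked by a curated edge yet point in quite different directions, so the pair is reported as a surprise. Lowering the threshold far enough would make that surprise disappear and pull unrelated nodes into `similar`.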