Cosine Similarity

How knowledge graphs measure whether two concepts point in the same direction

Section 1

Two arrows

Every concept in an embedding-backed knowledge graph is stored as a vector: an arrow from the origin in high-dimensional space. Cosine similarity is the cosine of the angle between two such arrows, a score from −1 to 1.

1 — the vectors point in the same direction (maximum similarity).
0 — the vectors are perpendicular (no relationship).
−1 — the vectors point in opposite directions (anti-similar).

The score depends only on direction, not on length. Two vectors of different magnitudes pointing the same way still score 1.
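The three reference scores can be checked directly from the angle. This is a minimal sketch, assuming 2D vectors given as (x, y) tuples; the helper name `cosine_from_angles` is illustrative and not part of any library.

```python
import math

def cosine_from_angles(a, b):
    """Cosine of the angle between 2D vectors a and b, computed from their headings."""
    heading_a = math.atan2(a[1], a[0])  # direction of a, in radians
    heading_b = math.atan2(b[1], b[0])  # direction of b, in radians
    return math.cos(heading_a - heading_b)

same = cosine_from_angles((3, 4), (30, 40))          # same direction, ten times longer
perpendicular = cosine_from_angles((1, 0), (0, 5))   # 90 degrees apart
opposite = cosine_from_angles((2, 2), (-1, -1))      # 180 degrees apart

print(round(same, 6), round(perpendicular, 6), round(opposite, 6))
```

Only the headings enter the calculation, so lengthening either vector leaves the score where it was.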

The question is: where does this score come from? It emerges from an operation called the dot product.

[Interactive figure: drag the circles to move two vectors; the readout shows their cosine similarity and the angle θ.]

Section 2

The dot product, derived

A 2D vector has two components: its x and y coordinates. The dot product multiplies corresponding components and sums the results:

A · B = a₁b₁ + a₂b₂

This produces a scalar. The connection to angle comes from the law of cosines. The triangle formed by vectors A, B, and their difference A−B has sides |A|, |B|, and |A−B|, with θ the angle between A and B. Applying the law of cosines:

|A − B|² = |A|² + |B|² − 2|A||B|cos(θ)

Expanding |A−B|² from components:

|A − B|² = (a₁−b₁)² + (a₂−b₂)²
         = (a₁² + a₂²) + (b₁² + b₂²) − 2(a₁b₁ + a₂b₂)
         = |A|² + |B|² − 2(a₁b₁ + a₂b₂)

Equating both expressions, canceling |A|² + |B|², and dividing by −2:

A · B = |A| |B| cos(θ)

The dot product is not arbitrary: the "multiply and add" recipe is exactly what remains when the law of cosines is expanded in coordinates, with each length supplied by the Pythagorean theorem. That is why a sum of component products encodes the angle between vectors.
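The identity can be checked numerically. The sketch below takes A = [120, 90] and B = [100, 100] as example vectors, verifies the component expansion of |A − B|², and compares the component dot product with |A||B|cos(θ), where θ is measured separately from the two headings. Variable names are illustrative.

```python
import math

A = (120, 90)
B = (100, 100)

# Component recipe: multiply matching coordinates and sum.
dot = A[0] * B[0] + A[1] * B[1]

# Squared side lengths of the triangle with sides A, B, and A - B.
len_a_sq = A[0] ** 2 + A[1] ** 2
len_b_sq = B[0] ** 2 + B[1] ** 2
diff_sq = (A[0] - B[0]) ** 2 + (A[1] - B[1]) ** 2

# Expansion from components: |A - B|^2 = |A|^2 + |B|^2 - 2 (A . B)
assert diff_sq == len_a_sq + len_b_sq - 2 * dot

# Angle measured without using the dot product, from the two headings.
theta = math.atan2(B[1], B[0]) - math.atan2(A[1], A[0])
geometric = math.sqrt(len_a_sq) * math.sqrt(len_b_sq) * math.cos(theta)

print(dot, round(geometric, 6))
```

The two printed values agree up to floating-point rounding, which is the content of A · B = |A||B|cos(θ).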

A = [120, 90]    B = [100, 100]
A · B = 120 × 100 + 90 × 100 = 21000

[Interactive figure: drag the vectors; the triangle shows A, B, and A−B, with live readouts of |A|, |B|, and |A−B|.]

Section 3

Projection

The dot product has a direct geometric interpretation. Drop a perpendicular from the tip of A onto the line of B. The foot of that perpendicular is the projection of A onto B — the "shadow" A casts along B's direction.

The shadow's length is |A|cos(θ): the component of A that aligns with B. Multiply this by |B| and you recover the dot product. Divide the dot product by |B| and you get the shadow length.

Same direction — long positive shadow. A agrees with B.
Perpendicular — zero shadow. A has no component along B.
Opposite — negative shadow (falls behind the origin). A opposes B.

The dot product measures directional agreement. The projection makes this literal: the shadow is the shared direction, rendered visible.
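The shadow can be computed in a few lines. This is a sketch under the assumption that vectors are plain tuples and B is non-zero; `shadow_length` and `shadow_vector` are illustrative names.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shadow_length(a, b):
    """Signed length of a's projection onto the line of b: (a . b) / |b|."""
    return dot(a, b) / math.sqrt(dot(b, b))  # b must be non-zero

def shadow_vector(a, b):
    """The projection itself: b rescaled so that it ends at the foot of the perpendicular."""
    scale = dot(a, b) / dot(b, b)
    return tuple(scale * x for x in b)

B = (10, 0)
print(shadow_length((3, 4), B))    # A leans along B: positive shadow
print(shadow_length((0, 4), B))    # A perpendicular to B: zero shadow
print(shadow_length((-3, 4), B))   # A leans against B: negative shadow
print(shadow_vector((3, 4), B))
```

Multiplying `shadow_length(a, b)` by |B| gives back the dot product, matching the relationship described above.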

[Interactive figure: drag the vectors; the blue line is A's shadow on B, with live readouts of the dot product, the shadow length, |A|, |B|, and θ.]

Section 4

Normalization

The dot product conflates angle and magnitude. Longer vectors yield larger dot products at the same angle. To isolate the angle, divide by both lengths:

cos(θ) = (A · B) / (|A| × |B|)

This is cosine similarity. The denominator cancels magnitudes; only direction remains.

Unlock the lengths and experiment. Stretching or shrinking a vector changes the dot product but leaves cosine similarity unchanged. This is why cosine works for comparing texts of different lengths — a sentence and a paragraph about the same topic can score near 1.0.
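The formula translates directly into code. The following is a minimal sketch rather than a reference implementation; it assumes vectors are sequences of numbers and chooses to raise an error for a zero-length vector, where the ratio would be 0 / 0 and the angle is undefined.

```python
import math

def cosine_similarity(a, b):
    """(a . b) / (|a| * |b|); undefined when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

A = (120, 90)
B = (100, 100)
stretched_A = tuple(5 * x for x in A)  # same direction, five times longer

base = cosine_similarity(A, B)
stretched = cosine_similarity(stretched_A, B)
print(round(base, 4), round(stretched, 4))
```

Stretching A multiplies both the dot product and |A| by five, so the two printed scores match.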

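Why cosine, not Euclidean distance? Euclidean distance mixes direction with magnitude, so two vectors pointing the same way can still be far apart when their lengths differ. In high-dimensional spaces, distances between points also tend to concentrate around a common value, which makes raw distance harder to interpret. Cosine similarity ignores magnitude and compares direction only. For unit-length vectors the two measures carry the same information, since |A − B|² = 2 − 2cos(θ).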
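A small comparison shows how the two measures react to length. This is a sketch with made-up example vectors; it also checks the relation |a − b|² = 2 − 2cos(θ), which holds once both vectors are scaled to unit length.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm(a) * norm(b))

def unit(v):
    """Scale v to length 1 without changing its direction."""
    length = norm(v)
    return tuple(x / length for x in v)

shorter = (1.0, 2.0, 2.0)
longer = tuple(10 * x for x in shorter)  # same direction, ten times longer

print(euclidean_distance(shorter, longer))  # large: reflects the length gap
print(cosine_similarity(shorter, longer))   # direction only

# For unit vectors, squared distance and cosine are tied together.
a, b = unit((3.0, 1.0, 2.0)), unit((1.0, 4.0, 0.5))
lhs = euclidean_distance(a, b) ** 2
rhs = 2 - 2 * cosine_similarity(a, b)
print(round(lhs, 9), round(rhs, 9))
```

In practice this means that normalizing embeddings first makes a ranking by Euclidean distance and a ranking by cosine similarity agree.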

[Interactive figure: unlock the lengths and drag along an arrow to stretch it; the readout shows the dot product, |A|×|B|, and their ratio, the cosine similarity.]

The dimensional leap

From 2D to 3D to nD

The derivation above used two components. Adding a third changes nothing structural — the law of cosines still holds, and expanding |A−B|² produces one additional term:

2D:  A · B = a₁b₁ + a₂b₂

3D:  A · B = a₁b₁ + a₂b₂ + a₃b₃

nD:  A · B = a₁b₁ + a₂b₂ + ⋯ + aₙbₙ

Each additional dimension adds one multiplication to the sum. The formula scales linearly. The geometric meaning is unchanged: the dot product still encodes the angle between two vectors, regardless of how many dimensions the space has.
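In code, the same function covers every case, because the sum simply runs over however many components there are. A minimal sketch, assuming vectors are lists of numbers of equal length:

```python
def dot(a, b):
    """Sum of products of matching components, for vectors of any dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return sum(x * y for x, y in zip(a, b))

print(dot([1, 2], [3, 4]))          # 1*3 + 2*4 = 11
print(dot([1, 2, 3], [4, 5, 6]))    # 1*4 + 2*5 + 3*6 = 32
print(dot([1] * 768, [2] * 768))    # 768 terms, each 1*2 = 1536
```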

What does change is the richness of the space. In 2D, only one line is perpendicular to any given vector. In 3D, there is an entire plane of perpendicular directions. In 768 dimensions, the perpendicular directions fill a 767-dimensional subspace, so vectors can be "unrelated" in vastly more ways.

This is why high-dimensional embeddings work: there is enough room to encode many independent similarity relationships at once with little interference. Two concepts can be close along one set of dimensions and far along another without conflict, because hundreds of axes are available.
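One way to get a feel for this extra room is to draw random vectors and see how close to perpendicular they land. The sketch below assumes independent Gaussian components, which is an illustration only: real embeddings are not random and are usually more correlated than this.

```python
import math
import random

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_abs_cosine(dim, pairs, rng):
    """Average |cosine| over random pairs of Gaussian vectors in `dim` dimensions."""
    total = 0.0
    for _ in range(pairs):
        a = [rng.gauss(0, 1) for _ in range(dim)]
        b = [rng.gauss(0, 1) for _ in range(dim)]
        total += abs(cosine_similarity(a, b))
    return total / pairs

rng = random.Random(0)  # fixed seed so the run is repeatable
for dim in (2, 3, 768):
    print(dim, round(mean_abs_cosine(dim, 200, rng), 3))
```

The average drops sharply as the dimension grows: in 768 dimensions, two random directions are almost always close to perpendicular.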

Section 5

Three dimensions

The same operation in 3D. Rotate the view to confirm that cosine similarity is a property of the vectors, not of the viewpoint. The score is invariant under rotation.
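Rotation invariance can also be checked numerically. Orbiting the view is equivalent to applying one and the same rotation to both vectors, so the sketch below rotates A and B together and compares the score before and after. The example vectors, the angles, and the helper names are arbitrary choices for illustration.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rotate_z(v, angle):
    """Rotate a 3D vector about the z-axis by `angle` radians."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def rotate_x(v, angle):
    """Rotate a 3D vector about the x-axis by `angle` radians."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (x, c * y - s * z, s * y + c * z)

def turn(v):
    """One fixed change of viewpoint: a z-rotation followed by an x-rotation."""
    return rotate_x(rotate_z(v, 0.7), -1.2)

A = (1.0, 2.0, 3.0)
B = (-2.0, 0.5, 1.0)

before = cosine_similarity(A, B)
after = cosine_similarity(turn(A), turn(B))
print(round(before, 9), round(after, 9))
```

The coordinates of both vectors change under `turn`, but their lengths and the angle between them do not, so the score is the same up to rounding.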

[Interactive figure: drag the background to orbit the view and drag the circles to move the 3D vectors; the readout shows cosine similarity and θ.]

The jump from 3 to 768 (or 3,072) dimensions is purely algebraic — the geometry does not change. Each additional dimension is another axis the embedding model uses to encode meaning. Cosine similarity measures whether two texts point in the same composite direction across all axes simultaneously.

Section 6

Application: knowledge graph retrieval

In a knowledge graph, the primary structure is curated edges — explicit relationships declared by a human or agent. Cosine similarity provides a secondary layer: connections that nobody declared but that exist in embedding space.

In practice:
similar — nearest neighbors in embedding space. "What else points this direction?"
surprise — nodes connected by curated edges but distant in embedding space. Structurally linked, semantically divergent.
threshold = 0.70 — a design decision, not a constant. Below 0.50, everything connects. Above 0.90, only near-duplicates match.
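
The three items above can be sketched on toy data. Everything here is hypothetical: the node names, the 3D embeddings, the curated edges, and the function signatures are invented for illustration, and the text does not specify how similar or surprise are actually computed. The sketch assumes similar returns neighbors at or above the threshold, best first, and surprise returns curated edges whose endpoints score below it.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy data: node name -> embedding (3D only for readability).
embeddings = {
    "dog":    (0.9, 0.1, 0.0),
    "wolf":   (0.8, 0.2, 0.1),
    "leash":  (0.1, 0.9, 0.2),
    "sonnet": (0.0, 0.1, 0.9),
}
# Hypothetical curated edges, declared by hand.
curated_edges = {("dog", "leash"), ("dog", "wolf")}

THRESHOLD = 0.70  # a design decision, not a constant

def similar(node, threshold=THRESHOLD):
    """Other nodes scoring at least `threshold` against `node`, best first."""
    scored = [
        (other, cosine_similarity(embeddings[node], vec))
        for other, vec in embeddings.items()
        if other != node
    ]
    hits = [(other, score) for other, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

def surprise(threshold=THRESHOLD):
    """Curated edges whose endpoints score below `threshold` in embedding space."""
    return sorted(
        (a, b) for a, b in curated_edges
        if cosine_similarity(embeddings[a], embeddings[b]) < threshold
    )

print(similar("dog"))
print(surprise())
```

On this toy data, "wolf" is the only neighbor of "dog" above the threshold, and the curated edge between "dog" and "leash" is flagged as structurally linked but semantically distant.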