Word Embeddings (Word2Vec, GloVe)

What embeddings are

Embeddings map each word to a dense, low-dimensional vector of real numbers:

  • similar words have similar vectors

Example intuition:

  • king and queen vectors are close
  • dog and puppy vectors are close

Why embeddings are better than BoW (sometimes)

BoW/TF-IDF vectors are sparse and treat every word as an independent dimension, so they can’t tell that two different words are related.

Embeddings capture:

  • semantic relationships
  • analogies (to some extent)

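The contrast is easy to see numerically. Below is a minimal sketch (the dense embedding values are made-up toy numbers, not learned ones): in a one-hot/BoW space, "dog" and "puppy" are orthogonal, while dense embeddings can encode their relatedness.

```python
import numpy as np

# One-hot (BoW-style) vectors: each word gets its own dimension,
# so different words are always orthogonal.
dog_bow   = np.array([1, 0, 0])
puppy_bow = np.array([0, 1, 0])

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(dog_bow, puppy_bow))    # 0.0 — no similarity signal at all

# Toy dense embeddings (illustrative values only).
dog_emb   = np.array([0.8, 0.1, 0.6])
puppy_emb = np.array([0.7, 0.2, 0.6])
print(cos(dog_emb, puppy_emb))    # high — embeddings can encode relatedness
```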
```mermaid
flowchart LR
  W[Word] --> E["Embedding vector (dense)"]
  E --> M[Model]
```

Word2Vec

Learns embeddings by training a shallow neural network to predict:

  • a word from its context (CBOW)
  • context words from a word (Skip-gram)
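To make the skip-gram objective concrete, here is a minimal NumPy sketch. It uses a full softmax for clarity (real Word2Vec uses negative sampling or hierarchical softmax for speed), and the toy corpus, window size, and hyperparameters are made up.

```python
import numpy as np

corpus = "the dog chased the cat the puppy chased the cat".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D, window, lr = len(vocab), 8, 2, 0.05

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))   # input (word) embeddings
W_out = rng.normal(scale=0.1, size=(D, V))  # output (context) weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for epoch in range(200):
    for pos, word in enumerate(corpus):
        center = idx[word]
        # Skip-gram: for each center word, predict each context word
        # within `window` positions.
        for off in range(-window, window + 1):
            ctx_pos = pos + off
            if off == 0 or ctx_pos < 0 or ctx_pos >= len(corpus):
                continue
            ctx = idx[corpus[ctx_pos]]
            h = W_in[center]               # "hidden layer" = center embedding
            p = softmax(h @ W_out)         # predicted context distribution
            err = p.copy()
            err[ctx] -= 1                  # gradient of cross-entropy loss
            W_out -= lr * np.outer(h, err)
            W_in[center] -= lr * (W_out @ err)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words that appear in similar contexts ("the dog chased", "the puppy
# chased") should drift toward similar embeddings.
print(cos(W_in[idx["dog"]], W_in[idx["puppy"]]))
```

CBOW is the mirror image: average the context embeddings and predict the center word instead.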

GloVe

Learns embeddings from global word co-occurrence statistics: it fits vectors so that their dot products approximate the logarithm of how often the two words co-occur across the whole corpus.
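The input GloVe trains on is just a big table of co-occurrence counts. A toy sketch of building that table (corpus and window size are made up):

```python
from collections import Counter

corpus = "the dog chased the cat the puppy chased the cat".split()
window = 2

# Count how often each word pair co-occurs within the window,
# over the entire corpus — these are the global statistics X_ij.
cooc = Counter()
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            cooc[(w, corpus[j])] += 1

# GloVe then fits vectors and biases so that
#   w_i . w_j + b_i + b_j ≈ log X_ij
print(cooc[("dog", "chased")])   # 1
```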

Practical note

In modern NLP, embeddings are often learned as part of a deep model (Transformers). But Word2Vec/GloVe are great for understanding the concept.

Mini-checkpoint

What does it mean if two words have a high cosine similarity between embeddings?

  • they appear in similar contexts and are semantically related.
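Cosine similarity itself is just the angle between two vectors, ignoring their lengths. A quick sketch with made-up toy embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|); ranges from -1 to 1.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative values only — not real learned embeddings.
dog   = np.array([0.8, 0.1, 0.6])
puppy = np.array([0.7, 0.2, 0.6])
car   = np.array([-0.5, 0.9, 0.1])

print(cosine_similarity(dog, puppy))  # close to 1 → similar meaning
print(cosine_similarity(dog, car))    # low/negative → less related
```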
