Word Embeddings

While TF-IDF is highly effective for basic document classification, it fails to capture semantic meaning. To a TF-IDF algorithm, the word "King" and "Queen" mathematically have absolutely zero relationship.

1. Dense Vectors (Word2Vec & GloVe)

Instead of assigning a single number to a word based on how often it appears, Word Embeddings represent a word as an $N$-dimensional spatial vector (usually 300 dimensions).

The Geometry of Meaning: If you train a Neural Network to predict missing words in millions of Wikipedia articles, the network eventually learns that words appearing in similar contexts (e.g., "I drive a ___" $\rightarrow$ Car, Truck, Vehicle) must have similar mathematical representations.
The Result: The words "Dog" and "Puppy" will physically be placed next to each other in 300-dimensional mathematical space.

2. Mathematical King - Man + Woman = Queen

Because meaning is mapped to spatial coordinates, you can perform actual arithmetic on language. If you take the vector coordinates for [King], subtract the vector for [Man], and add the vector for [Woman], the resulting coordinates will mathematically place you almost exactly on top of the word [Queen].

These dense vectors are universally used as the primary Input Layer for Deep Learning NLP models (LSTMs).

How to execute the examples:

Go to the Examples/ folder and run the script: python NLP_Word2Vec_GloVe.py