Deep Learning 101: Lesson 27: Understanding Word Embeddings

Muneeb S. Ahmad
4 min read · Sep 3, 2024

This article is part of the “Deep Learning 101” series. Explore the full series for more insights and in-depth learning here.

Word embedding is a technique in NLP where words or phrases from the vocabulary are mapped to vectors of real numbers, effectively translating text into a form that can be understood by machine learning algorithms. Unlike traditional word embeddings, which provide a single representation per word, models like BERT provide “contextual embeddings”. This means that the same word can have different representations based on its context within a sentence, leading to more nuanced language understanding. In Transformer models, word embeddings serve as the initial input representations. These embeddings are then processed through multiple layers of the Transformer to capture complex linguistic relationships.
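To make this concrete, here is a minimal sketch of an embedding lookup, assuming PyTorch; the toy vocabulary, token IDs, and the dimension `d_model` below are made up purely for illustration. An embedding layer is essentially a learnable lookup table that maps token IDs to dense vectors, which is exactly what a Transformer consumes as its initial input representation.

```python
import torch
import torch.nn as nn

# Toy vocabulary: token -> integer ID (illustrative only)
vocab = {"the": 0, "queen": 1, "rules": 2}
d_model = 8  # embedding dimension; real models use hundreds

# A learnable lookup table of shape (vocab_size, d_model)
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=d_model)

# Map a short sentence to token IDs, then to embedding vectors
token_ids = torch.tensor([[vocab["the"], vocab["queen"], vocab["rules"]]])
vectors = embedding(token_ids)   # shape: (1, 3, d_model)
print(vectors.shape)             # torch.Size([1, 3, 8])
```

In a full Transformer, these vectors (plus positional information) are what the first attention layer operates on; the table itself is trained along with the rest of the model.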

Let’s consider a very simple example of word embedding in vector form, as shown in the table below, with three columns labeled “Gender”, “Age”, and “Royalty”.

Figure 1: Word Embedding in 3 Dimensions

The columns represent different dimensions, or attributes, of the words listed in the rows of the table: “grandfather,” “man,” “woman,” and so on. Each word in a row has a corresponding value in each of the three columns, giving a numerical representation of that word in terms of its gender, age, and royalty attributes. These numeric values are the components of the embedding vector. This 3-dimensional example is meant only to illustrate the concept; in real NLP systems, embedding vectors typically have hundreds of dimensions.

Let’s analyze each of the dimensions in this word embedding example. For the Gender dimension, positive values indicate femininity, while negative values indicate masculinity. For example, “woman” has a value of 0.77, indicating strong femininity, while “man” has a value of -0.72, indicating strong masculinity. A gender-neutral word such as “monarch” has a value close to zero, here 0.07.

For the Age dimension, positive values represent older age and negative values represent youth. “Grandfather” has a value of 0.64, indicating older age, while “Infant” has a value of -0.71, indicating very young age.

For the Royalty dimension, positive values indicate royalty, while negative values indicate non-royalty. “Monarch” has a high value of 0.88, indicating strong royal status, while “child” has a value of -0.81, indicating a lack of royal association.

Essentially, these vectors provide a numerical representation of the characteristics of each word. For example, “queen” has positive values on both gender and royalty, indicating that it’s associated with femininity and royalty. On the other hand, “boy” has negative values on all dimensions, suggesting masculinity, youth, and non-royalty.
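The sketch below expresses this toy example in Python with NumPy. The vectors reuse the values quoted above where the article states them; any component not stated (and the exact vectors for “queen” and “boy”) is an illustrative guess, not data from the article.

```python
import numpy as np

# Toy 3-dimensional embeddings: [Gender, Age, Royalty].
# Values quoted in the article are used where given; the rest are illustrative.
embeddings = {
    "grandfather": np.array([-0.70,  0.64, -0.40]),
    "man":         np.array([-0.72,  0.10, -0.50]),
    "woman":       np.array([ 0.77,  0.10, -0.50]),
    "monarch":     np.array([ 0.07,  0.40,  0.88]),
    "queen":       np.array([ 0.75,  0.30,  0.85]),
    "infant":      np.array([ 0.00, -0.71, -0.60]),
    "boy":         np.array([-0.60, -0.60, -0.70]),
    "child":       np.array([ 0.00, -0.65, -0.81]),
}

dims = ["Gender", "Age", "Royalty"]

# Read off the attribute encoded in each component of a word's vector
for word in ("queen", "boy"):
    traits = {dim: round(float(v), 2) for dim, v in zip(dims, embeddings[word])}
    print(word, traits)
# queen -> positive Gender and Royalty (feminine, royal)
# boy   -> negative on all three dimensions (masculine, young, non-royal)
```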

Next, we can visualize these embedding vectors in 3D by plotting the word embedding vectors as a 3D scatterplot, as shown below.

Figure 2: 3D Plot of Word Embeddings

In this plot, each embedding vector is represented by a blue dot, with the gender, age, and royalty components on the x, y, and z axes, respectively, and the origin shown as a red dot. By visually inspecting the words in this 3D space, we can see whether, and in which aspects, two words are close to each other.
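A rough sketch of such a plot with Matplotlib is shown below; it reuses the same illustrative toy vectors as before, so the exact positions are for demonstration only.

```python
import matplotlib.pyplot as plt
import numpy as np

# Same illustrative toy vectors as above: [Gender, Age, Royalty]
embeddings = {
    "man":     np.array([-0.72,  0.10, -0.50]),
    "woman":   np.array([ 0.77,  0.10, -0.50]),
    "monarch": np.array([ 0.07,  0.40,  0.88]),
    "queen":   np.array([ 0.75,  0.30,  0.85]),
    "boy":     np.array([-0.60, -0.60, -0.70]),
    "child":   np.array([ 0.00, -0.65, -0.81]),
}

fig = plt.figure()
ax = fig.add_subplot(projection="3d")

# One blue dot per word, labeled with the word itself
for word, (g, a, r) in embeddings.items():
    ax.scatter(g, a, r, color="blue")
    ax.text(g, a, r, word)
ax.scatter(0, 0, 0, color="red")  # the origin, for reference

ax.set_xlabel("Gender")
ax.set_ylabel("Age")
ax.set_zlabel("Royalty")
plt.show()
```

Words with similar attributes, such as “queen” and “monarch”, end up near each other in this space, while words like “boy” and “monarch” sit far apart.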

Summary

Word embeddings translate words into numerical vectors, capturing their characteristics and relationships in a way that can be processed by machine learning algorithms. By representing words in multi-dimensional space, embeddings enable models to understand and differentiate between words based on their contextual meanings and attributes. This capability is essential for various natural language processing tasks, such as sentiment analysis, machine translation, and text classification. Traditional word embeddings, like Word2Vec, provide a single vector representation for each word, while advanced models like BERT generate contextual embeddings that vary according to the word’s context within a sentence. This context-sensitive representation enhances the model’s ability to grasp nuanced language patterns and relationships. Visualizing these embeddings in 3D space helps illustrate how words with similar meanings or attributes cluster together, providing insights into the underlying structure of language. Overall, word embeddings serve as a foundational element in modern NLP, driving advancements in how machines interpret and generate human language.

4 Ways to Learn

1. Read the article: Word Embeddings

2. Play with the visual tool: Word Embeddings

3. Watch the video: Word Embeddings

4. Practice with the code: Word Embeddings


Muneeb S. Ahmad

Muneeb Ahmad is a Senior Microservices Architect and Recognized Educator at IBM. He is pursuing his passion for ABC (AI, Blockchain, and Cloud).