Mastering NLP with GloVe Embeddings: Word Similarity, Sentiment Analysis, and More

Muneeb S. Ahmad
Oct 22, 2024


Introduction

In this article, we will explore several fundamental Natural Language Processing (NLP) tasks using pre-trained GloVe embeddings. From calculating word similarity to performing text classification and named entity recognition (NER), we’ll see how GloVe embeddings can be leveraged for a wide range of NLP tasks. This article walks through a practical example using a Colab notebook, demonstrating key concepts of embedding-based NLP models.

By the end, you’ll understand how to use GloVe embeddings for tasks like:

  • Word similarity comparisons
  • Sentiment classification
  • Named entity recognition (NER)
  • Part-of-speech (POS) tagging

Understanding GloVe Embeddings

Before we dive into the tasks, let’s briefly understand what GloVe embeddings are.

GloVe (Global Vectors for Word Representation) is a popular word embedding technique that learns to represent words as vectors based on their co-occurrence statistics from a large corpus. These embeddings capture both semantic and syntactic relationships between words. For example, words like “king” and “queen” will have similar vector representations because they share semantic context.

We’ll use pre-trained GloVe embeddings throughout this notebook to build models for various NLP tasks.

Installing Dependencies

To begin, we need to install torchtext, a library that provides easy access to pre-trained GloVe embeddings. We’ll use it throughout the notebook.

!pip install torchtext==0.16.0

Loading Pre-Trained GloVe Embeddings

The first step is to load the pre-trained GloVe embeddings with 100 dimensions. GloVe captures relationships between words, which will be useful for solving tasks like word similarity, classification, and tagging.

from torchtext.vocab import GloVe
glove = GloVe(name='6B', dim=100)
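
Once loaded, individual word vectors can be looked up by indexing the glove object (the first call downloads the pre-trained vectors, so it may take a few minutes). As a quick sanity check:

# Each word maps to a 100-dimensional vector
vector = glove['king']
print(vector.shape)  # torch.Size([100])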

Embedding-Based NLP Tasks

In this section, we will see how GloVe embeddings can be applied across a range of NLP tasks, from comparing the semantic similarity of words and sentences to more complex tasks like sentiment classification and part-of-speech tagging. By representing words as dense vectors, GloVe embeddings capture the semantic relationships these tasks depend on.

1. Word Similarity Using GloVe Embeddings

Let’s start with a simple example of calculating the similarity between words using cosine similarity. We’ll compare words like “king” and “queen” and check how closely related they are based on their GloVe embeddings.

import torch

word1 = "king"
word2 = "queen"
cosine_similarity = torch.nn.functional.cosine_similarity(glove[word1], glove[word2], dim=0)
print(f"Cosine similarity between '{word1}' and '{word2}': {cosine_similarity.item():.4f}")

This similarity score tells us how semantically similar the two words are. In our case, “king” and “queen” are quite similar, as reflected by a high cosine similarity score.

2. Sentence Similarity Using GloVe Embeddings

We can extend this concept to entire sentences. By averaging the GloVe embeddings of words in a sentence, we can represent the sentence as a vector and compare it to another sentence.
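
As a minimal sketch of that averaging step (assuming the glove object loaded earlier; words missing from the GloVe vocabulary come back as zero vectors by default), we can define a small helper:

import torch
import torch.nn.functional as F

def sentence_embedding(sentence):
    # Average the GloVe vectors of the words in the sentence.
    # Out-of-vocabulary words map to zero vectors.
    return torch.stack([glove[w] for w in sentence.lower().split()]).mean(dim=0)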

sentence1 = "The cat is on the mat"
sentence2 = "The dog is on the mat"
embedding_sentence1 = sentence_embedding(sentence1)
embedding_sentence2 = sentence_embedding(sentence2)
cosine_similarity = F.cosine_similarity(embedding_sentence1, embedding_sentence2, dim=0)
print(f"Cosine similarity between the sentences: {cosine_similarity.item():.4f}")

Here, we compare two similar sentences that differ by just one word (“cat” and “dog”). The high cosine similarity reflects their closeness.

3. Sentiment Classification Using GloVe Embeddings

Now, let’s move on to a more complex task: sentiment classification. In this task, we will classify the sentiment of a sentence (positive, negative, or neutral) using GloVe embeddings and a simple feed-forward neural network.

  • Dataset: Short sentences with labels for positive, negative, and neutral sentiment.
  • Model: A simple neural network that takes sentence embeddings as input and predicts sentiment.

# Example sentences
texts = ["This product is amazing!", "I'm very disappointed with this service.", "The weather today is average."]
# Labels: 0 -> Positive, 1 -> Negative, 2 -> Neutral
labels = [0, 1, 2]

The model is trained on these sentences and can classify unseen sentences based on their GloVe embeddings.
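
A rough illustration of how such a model might look follows; the class name SentimentClassifier, the hidden size, and the training loop are assumptions for this sketch rather than the notebook's exact code. It reuses the sentence_embedding helper defined earlier:

import torch
import torch.nn as nn

# A small feed-forward classifier over averaged GloVe embeddings
class SentimentClassifier(nn.Module):
    def __init__(self, embed_dim=100, hidden_dim=64, num_classes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        return self.net(x)

# Embed the example sentences with the sentence_embedding helper
X = torch.stack([sentence_embedding(t) for t in texts])
y = torch.tensor(labels)

model = SentimentClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()

# Predict the sentiment of an unseen sentence
with torch.no_grad():
    pred = model(sentence_embedding("What a fantastic experience")).argmax()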

4. Named Entity Recognition (NER) Using GloVe Embeddings

Next, we’ll tackle Named Entity Recognition (NER), a task that involves identifying entities in a sentence (such as people, locations, or organizations).

sentence = ["Barack", "Obama", "was", "born", "in", "Hawaii"]
labels = [1, 1, 0, 0, 0, 2]  # 0 -> Non-entity, 1 -> Person, 2 -> Location

We use GloVe embeddings for each word and build a neural network to classify whether a word is a person, location, or non-entity.
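
A minimal per-token sketch is shown below; the TokenTagger class and its hyperparameters are illustrative assumptions, and the notebook's actual architecture may differ:

import torch
import torch.nn as nn

# Classify each token independently from its 100-dimensional GloVe vector
class TokenTagger(nn.Module):
    def __init__(self, embed_dim=100, hidden_dim=64, num_tags=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_tags),
        )

    def forward(self, x):  # x: (num_tokens, embed_dim)
        return self.net(x)

# One embedding per token (GloVe 6B is lowercase, so normalize case first)
X = torch.stack([glove[w.lower()] for w in sentence])
y = torch.tensor(labels)

tagger = TokenTagger()
optimizer = torch.optim.Adam(tagger.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(200):
    optimizer.zero_grad()
    loss = criterion(tagger(X), y)
    loss.backward()
    optimizer.step()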

5. Part-of-Speech (POS) Tagging Using GloVe Embeddings

In the final task, we will implement a POS tagger using GloVe embeddings. POS tagging is the process of labeling each word in a sentence with its part of speech (e.g., noun, verb, adjective).

train_sentences = [
    ["The", "dog", "chased", "the", "cat"],
    ["A", "man", "runs", "quickly"],
]
train_pos_tags = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "NOUN", "VERB", "ADV"],
]

The model learns to predict POS tags based on the GloVe embeddings of each word.
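
As before, this can be sketched by mapping tags to integer indices and reusing the illustrative TokenTagger from the NER section (again an assumption, not necessarily the notebook's exact setup):

# Map each tag to an integer index
tag_vocab = {"DET": 0, "NOUN": 1, "VERB": 2, "ADV": 3}

# Flatten the training sentences into per-token embeddings and tag indices
X = torch.cat([torch.stack([glove[w.lower()] for w in s]) for s in train_sentences])
y = torch.tensor([tag_vocab[t] for tags in train_pos_tags for t in tags])

pos_tagger = TokenTagger(num_tags=len(tag_vocab))
optimizer = torch.optim.Adam(pos_tagger.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(200):
    optimizer.zero_grad()
    loss = criterion(pos_tagger(X), y)
    loss.backward()
    optimizer.step()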

Try it Yourself: Explore the Interactive Google Colab Notebook

Now that you’ve seen how GloVe embeddings can be used for various NLP tasks, why not try it yourself? We’ve created an interactive Google Colab notebook where you can run all the examples discussed in this article. You can modify the code, experiment with your own data, and extend the models to suit your needs.

👉 Access the Interactive Google Colab Notebook Here

Colab allows you to run Python code in the cloud for free, so no setup is required. Simply click the link, and you can start running the code directly in your browser.

Explore Interactive NLP Tools at 101ai.net

Before we conclude, I encourage you to explore various interactive tools available on 101ai.net that allow you to visualize and experiment with NLP concepts like word embeddings, spam detection, and question answering.

These tools provide a hands-on experience to deepen your understanding of how NLP models work in practice. Below are the links and brief descriptions of the available tools:

1. Word Embedding Visualization

Explore how words like “king”, “queen”, “man”, and “woman” are represented in vector space. The tool provides an interactive way to see the relationships between word vectors based on GloVe embeddings.

👉 Try the Word Embedding Tool

2. Spam Detection

This tool allows you to classify a comment or sentence as spam or not-spam using a pre-trained model. Input your own sentences and see how the model detects spam in real-time.

👉 Try the Spam Detection Tool

3. Question Answering System

Test a pre-trained question-answering model that uses a context passage to answer questions. You can load example contexts or input your own text to see how the model retrieves answers from the passage.

👉 Try the Question Answering Tool

These tools offer an excellent way to interact with NLP models visually and gain practical insights into how they work. Feel free to explore these resources, experiment with different inputs, and enhance your learning through hands-on interaction.

Conclusion

In this article, we explored how to use pre-trained GloVe embeddings for various NLP tasks, including word similarity, sentiment classification, named entity recognition, and POS tagging. GloVe provides a powerful way to represent words as dense vectors, allowing us to capture semantic and syntactic relationships between words.

By leveraging these embeddings, we can build effective models for a variety of NLP tasks with relatively simple neural network architectures.


Muneeb S. Ahmad

Muneeb Ahmad is a Senior Microservices Architect and Recognized Educator at IBM. He is pursuing his passion for ABC (AI, Blockchain, and Cloud).