Beginner - Sentiment Analysis with NLTK

Description

This project is a beginner-friendly introduction to Natural Language Processing (NLP) and sentiment analysis. It uses the NLTK (Natural Language Toolkit) library in Python to determine the sentiment (positive, negative, or neutral) of text sentences.

Instead of building a model from scratch, this project uses VADER (Valence Aware Dictionary and sEntiment Reasoner), a lexicon- and rule-based sentiment analysis tool included with NLTK. VADER is tuned for sentiment expressed in social media and other short, informal texts.

Functionality

  1. NLTK Setup: The script first checks if the required NLTK vader_lexicon is downloaded and, if not, it downloads it automatically.
  2. Sentiment Analysis: It initializes the SentimentIntensityAnalyzer from NLTK's VADER module.
  3. Process Text: The analyzer is run on a predefined list of sample sentences.
  4. Interpret Scores: For each sentence, VADER returns four scores:
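    • neg: The proportion of the text rated negative.
    • neu: The proportion of the text rated neutral.
    • pos: The proportion of the text rated positive. Together, neg, neu, and pos sum to approximately 1.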
    • compound: A normalized, weighted composite score that ranges from -1 (most negative) to +1 (most positive).
  5. Classify Sentiment: The script classifies each sentence as "Positive," "Negative," or "Neutral" based on the compound score.
  6. Display Results: The results, including the original text and all sentiment scores, are neatly displayed in a pandas DataFrame.
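
The classification step can be sketched as a small threshold function. The source does not state which cutoffs the script uses; the ±0.05 threshold below is the convention suggested in VADER's own documentation and is an assumption here, as are the helper names `classify` and `label_scores`.

```python
def classify(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a sentiment label.

    The 0.05 default is an assumed cutoff, not one confirmed by the project.
    """
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


def label_scores(scores: dict) -> dict:
    """Return a copy of a VADER-style score dict with a 'sentiment' key added."""
    return {**scores, "sentiment": classify(scores["compound"])}
```

Scores close to zero fall into the "Neutral" band, so widening or narrowing `threshold` changes how many sentences are labeled neutral.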

Architecture

  • NLTK (Natural Language Toolkit): A powerful Python library for working with human language data. It provides the tools for various NLP tasks.
  • VADER: A lexicon- and rule-based sentiment analysis tool that ships with NLTK. Because it scores text from a curated word list and a set of heuristics, it needs no training data and works out of the box.
  • pandas: Used to present the final sentiment analysis results in a clean, readable, tabular format.
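
One way these pieces could fit together is sketched below: a scorer produces one dictionary of scores per sentence, and pandas arranges them into a table. The function name `build_results_table` and the column order are assumptions for illustration; the project's actual script may be organized differently.

```python
import pandas as pd


def build_results_table(sentences, scorer):
    """Score each sentence and collect the results in a DataFrame.

    `scorer` is any object exposing polarity_scores(text) -> dict,
    such as NLTK's SentimentIntensityAnalyzer.
    """
    rows = []
    for text in sentences:
        scores = scorer.polarity_scores(text)
        rows.append({"text": text, **scores})
    # Fix the column order so the original text comes first.
    return pd.DataFrame(rows, columns=["text", "neg", "neu", "pos", "compound"])
```

Passing an instance of `SentimentIntensityAnalyzer` as `scorer` would yield one row per sentence, which can then be shown with `print(df)`.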

How to Run

Prerequisites

Make sure you have Python installed, along with the nltk and pandas libraries. You can install them using pip:

pip install nltk pandas

Execution

To run the project, navigate to the project directory and execute the following command:

python beginner_sentiment_analysis_nltk.py

On the first run, the script may take a moment to download the VADER lexicon from NLTK. After that, it will print a table containing the sentiment analysis results for the sample sentences.
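
The first-run download described above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the helper names `ensure_vader_lexicon` and `score_sentences` are hypothetical, while `nltk.data.find`, `nltk.download`, and `SentimentIntensityAnalyzer.polarity_scores` are existing NLTK calls.

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer


def ensure_vader_lexicon() -> None:
    """Download the VADER lexicon only if NLTK cannot already find it."""
    try:
        nltk.data.find("sentiment/vader_lexicon.zip")
    except LookupError:
        nltk.download("vader_lexicon")


def score_sentences(sentences):
    """Return one VADER score dict (neg, neu, pos, compound) per sentence."""
    ensure_vader_lexicon()
    analyzer = SentimentIntensityAnalyzer()
    return [analyzer.polarity_scores(text) for text in sentences]
```

Because the lexicon is cached in NLTK's data directory, only the first call needs network access; later runs find the file locally and skip the download.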

Concepts Covered

  • Natural Language Processing (NLP): The field of AI focused on enabling computers to understand and process human language.
  • Sentiment Analysis: The task of identifying and categorizing opinions expressed in a piece of text.
  • Lexicon-Based (or Rule-Based) Analysis: An approach to NLP that relies on a pre-defined dictionary of words and rules, as opposed to a machine learning model.
  • NLTK: One of the foundational libraries for NLP in Python.
  • Text Preprocessing: VADER works directly on raw text and handles tokenization, punctuation, and capitalization cues internally, but this project still introduces the idea of preparing raw text for analysis.
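
To make the lexicon-based idea concrete, here is a deliberately tiny sketch of a word-list scorer with a single negation rule. It is not VADER: the lexicon, its valence values, and the names `TOY_LEXICON`, `NEGATIONS`, and `toy_score` are all made up for illustration, and VADER's real lexicon and rules are far richer.

```python
# Hypothetical toy lexicon: word -> valence. Not VADER's real lexicon or values.
TOY_LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
NEGATIONS = {"not", "never", "no"}


def toy_score(text: str) -> float:
    """Sum word valences, flipping the sign of a word that directly follows a negation."""
    total = 0.0
    negate = False
    for raw in text.lower().split():
        word = raw.strip(".,!?;:")
        if word in NEGATIONS:
            negate = True
            continue
        valence = TOY_LEXICON.get(word, 0.0)
        total += -valence if negate else valence
        negate = False  # the negation only affects the next word
    return total


print(toy_score("great movie"))        # 2.0
print(toy_score("This is not good."))  # -1.0
```

No training step is involved: the behavior is fully determined by the dictionary and the rule, which is what distinguishes this approach from a trained machine learning model.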