advanced_reinforcement_learning_q_learning

Advanced - Reinforcement Learning (Q-Learning for CartPole)

Description

This project provides an introduction to Reinforcement Learning (RL) by implementing the Q-learning algorithm from scratch. The goal is to train an agent to solve the classic CartPole-v1 environment from the OpenAI Gym library. In this environment, the agent must learn to balance a pole on top of a movable cart by applying forces to the left or right.

Q-learning is a model-free RL algorithm that learns a policy by estimating which action is best in each state. It does this by learning a "Q-function," which estimates the expected cumulative discounted reward of taking a given action in a given state and acting optimally afterwards.

Functionality

Environment Setup: The script initializes the CartPole-v1 environment using OpenAI Gym.
State Discretization: The CartPole environment has a continuous state space. Since a basic Q-table requires discrete states, the script converts the four continuous state values (cart position, cart velocity, pole angle, pole angular velocity) into a finite set of bins.
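
As a rough illustration of the discretization step, here is a minimal sketch built on `np.digitize`. The bin counts, the clipping ranges, and the `discretize` helper are assumptions made for this example, not values taken from the project's script. The position and angle limits roughly match CartPole's termination thresholds; the velocity limits are arbitrary, since velocities are unbounded in the environment.

```python
import numpy as np

# Hypothetical settings: bin counts and clipping ranges are illustrative only.
N_BINS = (6, 6, 12, 12)  # cart position, cart velocity, pole angle, pole angular velocity
LOWER = np.array([-2.4, -3.0, -0.21, -3.5])
UPPER = np.array([2.4, 3.0, 0.21, 3.5])

# Interior bin edges per dimension: n bins need n - 1 edges.
EDGES = [
    np.linspace(lo, hi, n + 1)[1:-1]
    for lo, hi, n in zip(LOWER, UPPER, N_BINS)
]


def discretize(observation):
    """Map a continuous 4-value observation to a tuple of bin indices."""
    # Clip first so that unbounded velocities fall into the outermost bins.
    clipped = np.clip(observation, LOWER, UPPER)
    return tuple(int(np.digitize(x, edges)) for x, edges in zip(clipped, EDGES))
```

The returned tuple can be used directly as an index into a multi-dimensional numpy array, which is why a tuple is used here rather than a list.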
Q-Table Initialization: A Q-table is created with dimensions corresponding to the discretized state space and the number of possible actions (left or right). This table is initialized with zeros.
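
A zero-initialized table of this kind might look like the sketch below. The bin counts are hypothetical; the only fixed quantity is CartPole's two actions (0 pushes the cart left, 1 pushes it right).

```python
import numpy as np

N_BINS = (6, 6, 12, 12)  # hypothetical number of bins per state dimension
N_ACTIONS = 2            # CartPole: 0 = push left, 1 = push right

# One Q-value per (discretized state, action) pair, all starting at zero.
q_table = np.zeros(N_BINS + (N_ACTIONS,))

# Indexing with a discretized state tuple yields the Q-values of both actions.
state = (3, 3, 6, 6)
action_values = q_table[state]
```

With these example bin counts the table holds 6 * 6 * 12 * 12 * 2 = 10,368 entries, small enough to keep in memory and update in place.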
Q-Learning Algorithm: The agent is trained over a series of episodes. In each step of an episode:
- It uses an epsilon-greedy policy to decide whether to explore (take a random action) or exploit (take the best-known action based on the Q-table).
- It takes an action and observes the reward and the next state from the environment.
- It updates the Q-value for the state-action pair using the Q-learning update rule, which is derived from the Bellman equation and combines the reward received with the discounted estimate of the best future value.
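
The three steps above can be sketched as two small functions. This is a minimal sketch, not the script's actual code: the function names, the default `alpha` and `gamma` values, and the choice to drop the future-value term on terminal transitions are all assumptions made for illustration.

```python
import numpy as np


def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(q_table.shape[-1]))  # random action
    return int(np.argmax(q_table[state]))            # best-known action


def q_update(q_table, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place."""
    # Assumption: no future value is bootstrapped from a terminal state.
    best_next = 0.0 if done else float(np.max(q_table[next_state]))
    td_target = reward + gamma * best_next
    index = state + (action,)
    q_table[index] += alpha * (td_target - q_table[index])
```

Both functions expect `state` and `next_state` to be tuples of bin indices, and `rng` to be a `numpy.random.Generator` such as the one returned by `np.random.default_rng()`.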
Training Visualization: After training, the script uses matplotlib to plot the total reward received in each episode, along with a rolling average of those rewards, so you can see whether the agent learned to balance the pole.

Architecture

OpenAI Gym: Provides the CartPole-v1 environment, which includes the simulation, state observations, and reward system.
numpy: Used for all numerical operations, most importantly for creating and updating the Q-table.
matplotlib: Used to plot the training progress, showing the rewards over time.
Q-Table: A multi-dimensional numpy array that serves as the agent's "brain." It stores the learned action-value function, mapping each state-action pair to its expected cumulative reward.

How to Run

Prerequisites

Make sure you have Python installed, along with the required libraries. You can install them using pip:

pip install gym numpy matplotlib

Execution

To run the project, navigate to the project directory and execute the following command:

python advanced_reinforcement_learning_q_learning.py

The script will print the agent's progress every 100 episodes. Once training is complete, it will display two plots showing the rewards per episode and a rolling average of the rewards.
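
For reference, a rolling average of per-episode rewards can be computed with plain numpy, as in the sketch below. The `rolling_average` helper and the 100-episode default window are assumptions for illustration; the script may smooth its rewards differently.

```python
import numpy as np


def rolling_average(rewards, window=100):
    """Return the mean of each consecutive `window`-sized slice of rewards."""
    rewards = np.asarray(rewards, dtype=float)
    if len(rewards) < window:
        return np.array([])  # not enough episodes for a single window
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the window fully overlaps the data.
    return np.convolve(rewards, kernel, mode="valid")
```

The result has `len(rewards) - window + 1` entries and can be passed to matplotlib's `plt.plot` next to the raw per-episode rewards. A noisy raw curve with a steadily rising average is typical of an agent that is still exploring.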

Concepts Covered

Reinforcement Learning (RL): The fundamental paradigm of learning through interaction with an environment.
Agent, Environment, State, Action, Reward: The core components of an RL problem.
Q-Learning: A classic, value-based, off-policy RL algorithm.
Q-Table: The data structure used to store the learned state-action values.
Exploration vs. Exploitation: The trade-off between trying new actions and taking known good actions, managed here by an epsilon-greedy strategy.
Discount Factor (gamma): How strongly future rewards are weighted relative to immediate ones.
Learning Rate (alpha): The extent to which new information overrides old information.
State Space Discretization: A technique to adapt continuous state spaces for use with tabular RL methods.
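
To show how the learning rate and discount factor interact, here is the standard tabular Q-learning update written out, followed by one worked step. The numeric values are made up for illustration and are not the script's hyperparameters.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]

% Illustrative step with \alpha = 0.1, \gamma = 0.99, r = 1,
% Q(s, a) = 0 and \max_{a'} Q(s', a') = 2:
Q(s, a) \leftarrow 0 + 0.1 \left[ 1 + 0.99 \cdot 2 - 0 \right] = 0.298
```

With alpha = 1 the old estimate would be replaced outright by the target (2.98 in this example), and with gamma = 0 the target would reduce to the immediate reward alone.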

Files and Subdirectories

📄 advanced_reinforcement_learning_q_learning.py