advanced_reinforcement_learning_q_learning

Advanced - Reinforcement Learning (Q-Learning for CartPole)

Description

This project provides an introduction to Reinforcement Learning (RL) by implementing the Q-learning algorithm from scratch. The goal is to train an agent to solve the classic CartPole-v1 environment from the OpenAI Gym library. In this environment, the agent must learn to balance a pole on top of a movable cart by applying forces to the left or right.

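Q-learning is a model-free RL algorithm: it needs no model of the environment's dynamics and learns purely from observed transitions. It learns a "Q-function," which estimates the expected cumulative discounted reward of taking a given action in a given state and acting optimally afterwards. The policy then follows directly: in each state, pick the action with the highest Q-value.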
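For reference, the textbook form of the tabular Q-learning update is written out here. The symbols alpha (learning rate) and gamma (discount factor) are the ones listed under Concepts Covered; the time-index notation is the common convention and is not taken from the script itself.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

The bracketed term is the temporal-difference error: the gap between the current estimate and the one-step target built from the observed reward and the best estimated value of the next state. When the next state is terminal, the max term is usually taken to be zero.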

Functionality

  1. Environment Setup: The script initializes the CartPole-v1 environment using OpenAI Gym.
  2. State Discretization: The CartPole environment has a continuous state space. Since a basic Q-table requires discrete states, the script converts the four continuous state values (cart position, cart velocity, pole angle, pole angular velocity) into a discrete set of bins.
  3. Q-Table Initialization: A Q-table is created with dimensions corresponding to the discretized state space and the number of possible actions (left or right). This table is initialized with zeros.
  4. Q-Learning Algorithm: The agent is trained over a series of episodes. In each step of an episode:
    • It uses an epsilon-greedy policy to decide whether to explore (take a random action) or exploit (take the best-known action based on the Q-table).
    • It takes an action and observes the reward and the next state from the environment.
    • It updates the Q-value for the state-action pair using the Q-learning update rule, which is derived from the Bellman equation and combines the reward received with the discounted estimate of the best value available from the next state.
  5. Training Visualization: After training, the script uses matplotlib to plot the total reward received in each episode, allowing you to visualize whether the agent successfully learned to balance the pole.
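Steps 2 to 4 can be sketched in a few functions. This is a minimal illustration, not the project's actual code: the bin counts, the clipping bounds, and the function names (discretize, choose_action, q_update) are all assumptions made for the example, and the environment is left out so the snippet runs with numpy alone.

```python
import numpy as np

# Assumed values for illustration; the script's real bins and bounds may differ.
N_BINS = (6, 6, 12, 12)                      # bins per state variable
LOWER = np.array([-2.4, -3.0, -0.21, -3.5])  # assumed clipping bounds
UPPER = np.array([2.4, 3.0, 0.21, 3.5])
N_ACTIONS = 2                                # push left, push right


def discretize(observation):
    """Map a continuous 4-value observation to a tuple of bin indices."""
    obs = np.clip(np.asarray(observation, dtype=float), LOWER, UPPER)
    ratios = (obs - LOWER) / (UPPER - LOWER)             # each in [0, 1]
    indices = (ratios * np.array(N_BINS)).astype(int)    # scale to bin count
    indices = np.minimum(indices, np.array(N_BINS) - 1)  # upper edge -> last bin
    return tuple(int(i) for i in indices)


def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_table[state]))


def q_update(q_table, state, action, reward, next_state, done, alpha, gamma):
    """Apply one tabular Q-learning update in place."""
    best_next = 0.0 if done else float(np.max(q_table[next_state]))
    td_target = reward + gamma * best_next
    index = state + (action,)
    q_table[index] += alpha * (td_target - q_table[index])


# One hand-made transition, standing in for a real environment step.
q_table = np.zeros(N_BINS + (N_ACTIONS,))
rng = np.random.default_rng(0)
s = discretize([0.0, 0.0, 0.0, 0.0])
s2 = discretize([0.01, 0.2, -0.01, -0.3])
a = choose_action(q_table, s, epsilon=0.0, rng=rng)
q_update(q_table, s, a, reward=1.0, next_state=s2, done=False,
         alpha=0.1, gamma=0.99)
```

Clipping before binning matters because the two velocity components are unbounded in CartPole; without it, a fast-moving cart or pole would index outside the table.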

Architecture

  • OpenAI Gym: Provides the CartPole-v1 environment, which includes the simulation, state observations, and reward system.
  • numpy: Used for all numerical operations, most importantly for creating and updating the Q-table.
  • matplotlib: Used to plot the training progress, showing the rewards over time.
  • Q-Table: A multi-dimensional numpy array that serves as the agent's "brain." It stores the learned action-value function, mapping each discretized state-action pair to its estimated cumulative discounted reward.
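To make the Q-table's layout concrete, here is a small sketch of its shape and of how a greedy policy can be read out of it. The bin counts are assumed for illustration and are not taken from the script.

```python
import numpy as np

n_bins = (6, 6, 12, 12)  # assumed bins for position, velocity, angle, angular velocity
n_actions = 2            # left, right

# One axis per state variable, plus a final axis for the action.
q_table = np.zeros(n_bins + (n_actions,))

n_states = int(np.prod(n_bins))          # number of discrete states
greedy_policy = q_table.argmax(axis=-1)  # best-known action for every state
```

With these assumed bins the table holds 5,184 states and 10,368 entries, small enough to store and update directly. The table grows multiplicatively with the bin count of each variable, which is why the tabular approach only suits low-dimensional problems.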

How to Run

Prerequisites

Make sure you have Python installed, along with the required libraries. You can install them using pip:

pip install gym numpy matplotlib

Execution

To run the project, navigate to the project directory and execute the following command:

python advanced_reinforcement_learning_q_learning.py

The script will print the agent's progress every 100 episodes. Once training is complete, it will display two plots showing the rewards per episode and a rolling average of the rewards.
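Per-episode rewards are noisy, so a rolling average makes the learning trend easier to see. One way such an average might be computed is sketched here; the function name rolling_average and the window size are assumptions, and the script may smooth its rewards differently.

```python
import numpy as np


def rolling_average(rewards, window):
    """Mean of each run of `window` consecutive episode rewards."""
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the full window fits.
    return np.convolve(rewards, kernel, mode="valid")


rewards = [10.0, 20.0, 30.0, 40.0, 50.0]  # made-up rewards for illustration
smoothed = rolling_average(rewards, window=3)
```

The result has len(rewards) - window + 1 points, so when plotting it next to the raw rewards, the x-axis of the smoothed curve should start at episode window - 1.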

Concepts Covered

  • Reinforcement Learning (RL): The fundamental paradigm of learning through interaction with an environment.
  • Agent, Environment, State, Action, Reward: The core components of an RL problem.
  • Q-Learning: A classic, value-based, off-policy RL algorithm.
  • Q-Table: The data structure used to store the learned state-action values.
  • Exploration vs. Exploitation: The trade-off between trying new actions and taking known good actions, managed here by an epsilon-greedy strategy.
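  • Discount Factor (gamma): How heavily future rewards are weighted relative to immediate ones. Values close to 1 favor long-term reward; values close to 0 make the agent short-sighted.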
  • Learning Rate (alpha): The extent to which new information overrides old information.
  • State Space Discretization: A technique to adapt continuous state spaces for use with tabular RL methods.
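Two of these concepts are easy to check by hand. The sketch below computes a discounted return for a short reward sequence and shows one common way to shrink epsilon over training. The decay schedule and its parameters are assumptions for illustration; the list above only states that an epsilon-greedy strategy is used.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward k steps ahead is weighted by gamma**k."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def decayed_epsilon(episode, start=1.0, end=0.05, decay=0.995):
    """Hypothetical exponential decay of epsilon, floored at `end`."""
    return max(end, start * decay ** episode)


# Three rewards of 1 with gamma = 0.5: 1 + 0.5 + 0.25
example_return = discounted_return([1.0, 1.0, 1.0], gamma=0.5)

# Exploration is high at the start and settles at the floor late in training.
early_epsilon = decayed_epsilon(0)
late_epsilon = decayed_epsilon(10_000)
```

Keeping a nonzero floor on epsilon means the agent never stops exploring entirely, which helps it recover from Q-value estimates that were wrong early on.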

Files and Subdirectories