advanced_reinforcement_learning_q_learning
Advanced - Reinforcement Learning (Q-Learning for CartPole)
Description
This project provides an introduction to Reinforcement Learning (RL) by implementing the Q-learning algorithm from scratch. The goal is to train an agent to solve the classic CartPole-v1 environment from the OpenAI Gym library. In this environment, the agent must learn to balance a pole on top of a movable cart by applying forces to the left or right.
Q-learning is a model-free RL algorithm that learns a policy by discovering the optimal action to take in each state. It does this by learning a "Q-function," which estimates the expected cumulative (discounted) reward of taking a given action in a given state and acting optimally afterwards.
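For reference, the Q-function described above can be written in standard notation. This is the usual textbook definition, not something taken from the script; the symbols s (state), a (action), r_t (reward at step t), gamma (discount factor), and pi (policy) follow the common convention.

```latex
Q^{*}(s, a) = \max_{\pi} \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \;\middle|\; s_0 = s,\ a_0 = a,\ \pi \right],
\qquad
\pi^{*}(s) = \arg\max_{a} Q^{*}(s, a)
```

Once Q* is known (or approximated by a table), the policy follows by picking the highest-valued action in each state.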
Functionality
- Environment Setup: The script initializes the CartPole-v1 environment using OpenAI Gym.
- State Discretization: The CartPole environment has a continuous state space. Since a basic Q-table requires discrete states, the script converts the continuous state values (cart position, cart velocity, pole angle, pole angular velocity) into a discrete set of bins.
- Q-Table Initialization: A Q-table is created with dimensions corresponding to the discretized state space and the number of possible actions (left or right). This table is initialized with zeros.
- Q-Learning Algorithm: The agent is trained over a series of episodes. In each step of an episode:
  - It uses an epsilon-greedy policy to decide whether to explore (take a random action) or exploit (take the best-known action according to the Q-table).
  - It takes the action and observes the reward and the next state from the environment.
  - It updates the Q-value for the state-action pair using the Q-learning update rule, which is derived from the Bellman equation and combines the reward received with the estimated future rewards.
- Training Visualization: After training, the script uses matplotlib to plot the total reward received in each episode, allowing you to visualize whether the agent successfully learned to balance the pole.
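The per-step logic in the list above can be sketched as follows. This is a minimal illustration, not the script's actual code: the bin counts, hyperparameter values, velocity clipping ranges, and the function names `discretize`, `choose_action`, and `q_update` are all assumptions. A hand-made transition stands in for the environment so the block runs with numpy alone.

```python
import numpy as np

# Assumed hyperparameters and bin layout; the script's real values may differ.
N_BINS = (6, 6, 12, 12)          # bins per state dimension (hypothetical)
N_ACTIONS = 2                    # CartPole: push left (0) or push right (1)
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1

# Clipping ranges per dimension. Position and angle follow CartPole's
# observation limits; the two velocity ranges are assumptions, because the
# environment leaves velocities unbounded.
LOW = np.array([-4.8, -3.0, -0.418, -3.5])
HIGH = np.array([4.8, 3.0, 0.418, 3.5])


def discretize(obs):
    """Map a continuous 4-value observation to a tuple of bin indices."""
    clipped = np.clip(obs, LOW, HIGH)
    ratios = (clipped - LOW) / (HIGH - LOW)            # scale to [0, 1]
    idx = (ratios * np.array(N_BINS)).astype(int)      # scale to [0, n_bins]
    idx = np.minimum(idx, np.array(N_BINS) - 1)        # upper edge goes in last bin
    return tuple(int(i) for i in idx)


def choose_action(q_table, state, rng, epsilon=EPSILON):
    """Epsilon-greedy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_table[state]))


def q_update(q_table, state, action, reward, next_state, done):
    """Apply one Q-learning update to q_table in place."""
    # Do not bootstrap from next_state once the episode has ended.
    best_next = 0.0 if done else np.max(q_table[next_state])
    td_target = reward + GAMMA * best_next
    key = state + (action,)
    q_table[key] += ALPHA * (td_target - q_table[key])


rng = np.random.default_rng(0)
q_table = np.zeros(N_BINS + (N_ACTIONS,))

# One made-up transition standing in for a call to the environment.
state = discretize(np.array([0.0, 0.1, 0.01, -0.2]))
next_state = discretize(np.array([0.002, 0.3, 0.006, -1.0]))
action = choose_action(q_table, state, rng)
q_update(q_table, state, action, reward=1.0, next_state=next_state, done=False)
print(state, next_state, action, q_table[state])
```

In a full training loop, the made-up transition would be replaced by the observation, reward, and termination flag returned by the environment at each step.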
Architecture
- OpenAI Gym: Provides the CartPole-v1 environment, which includes the simulation, state observations, and reward system.
- numpy: Used for all numerical operations, most importantly for creating and updating the Q-table.
- matplotlib: Used to plot the training progress, showing the rewards over time.
- Q-Table: A multi-dimensional numpy array that serves as the agent's "brain." It stores the learned action-value function, mapping state-action pairs to expected rewards.
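To make the Q-table's layout concrete, here is a small sketch of such an array. The bin counts are hypothetical and chosen only for illustration; the script may use different ones.

```python
import numpy as np

# Hypothetical bin counts for (cart position, cart velocity,
# pole angle, pole angular velocity).
n_bins = (6, 6, 12, 12)
n_actions = 2  # left, right

# One axis per discretized state dimension, plus a final axis for the actions.
q_table = np.zeros(n_bins + (n_actions,))
print(q_table.shape)  # (6, 6, 12, 12, 2)
print(q_table.size)   # 6 * 6 * 12 * 12 * 2 = 10368 stored Q-values

# Indexing with a discretized state gives a view of that state's action values.
state = (3, 3, 6, 5)
q_table[state][1] = 0.5   # pretend "right" has been learned to be better here
print(q_table[state])

# The greedy policy for every state at once: argmax over the action axis.
greedy_policy = np.argmax(q_table, axis=-1)
print(greedy_policy.shape)   # (6, 6, 12, 12)
print(greedy_policy[state])  # 1
```

The table grows multiplicatively with the number of bins per dimension, which is why tabular methods need a fairly coarse discretization.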
How to Run
Prerequisites
Make sure you have Python installed, along with the required libraries. You can install them using pip:
pip install gym numpy matplotlib
Execution
To run the project, navigate to the project directory and execute the following command:
python advanced_reinforcement_learning_q_learning.py
The script will print the agent's progress every 100 episodes. Once training is complete, it will display two plots showing the rewards per episode and a rolling average of the rewards.
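A rolling average like the one mentioned above can be computed in a few lines. This is one common approach using `np.convolve`, not necessarily how the script does it; the helper name `rolling_mean`, the window size, and the reward values are made up for the example.

```python
import numpy as np


def rolling_mean(values, window):
    """Mean over each full window of `window` consecutive values."""
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the window fits entirely.
    return np.convolve(values, kernel, mode="valid")


# Made-up per-episode rewards, standing in for the list the script collects.
episode_rewards = [10, 20, 30, 40, 50, 60]
smoothed = rolling_mean(episode_rewards, window=3)
print(smoothed)  # approximately [20. 30. 40. 50.]
```

The smoothed series is `window - 1` points shorter than the input, so when plotting it against episode number the x-values should start at index `window - 1`.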
Concepts Covered
- Reinforcement Learning (RL): The fundamental paradigm of learning through interaction with an environment.
- Agent, Environment, State, Action, Reward: The core components of an RL problem.
- Q-Learning: A classic, value-based, off-policy RL algorithm.
- Q-Table: The data structure used to store the learned state-action values.
- Exploration vs. Exploitation: The trade-off between trying new actions and taking known good actions, managed here by an epsilon-greedy strategy.
- Discount Factor (gamma): How strongly future rewards are weighted relative to immediate ones, typically a value between 0 and 1.
- Learning Rate (alpha): The extent to which new information overrides old information.
- State Space Discretization: A technique to adapt continuous state spaces for use with tabular RL methods.
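To illustrate the effect of the discount factor from the list above, the sketch below computes the discounted return of an episode for a few gamma values. The function name `discounted_return` and the chosen gammas are illustrative assumptions; the only environment fact used is that CartPole gives a reward of 1 for each step.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma raised to its time step."""
    total = 0.0
    # Work backwards so each step needs one multiply and one add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# A reward of 1 per step, so 100 ones model a 100-step episode.
rewards = [1.0] * 100
for gamma in (0.5, 0.9, 0.99):
    print(gamma, round(discounted_return(rewards, gamma), 3))
```

With a small gamma the return is dominated by the first few steps, while a gamma close to 1 makes the agent value surviving for many more steps, which suits a balancing task.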