
CS-7642 Reinforcement Learning

Ben Yu

Dec 25, 2022 • 3 min read

Instructor(s): Charles Isbell / Michael Littman
Course Page: Link

CS-7642 is a core course for the OMSCS Machine Learning specialization. It serves as an introduction to reinforcement learning and a continuation of CS-7641 Machine Learning.

At the time of writing, the course consists of 3 major written assignments, 6 homework assignments and a final exam. The course follows Sutton and Barto's RL book (Reinforcement Learning: An Introduction) very closely, as do most undergraduate and graduate courses nowadays. The assignments were the main highlight of the course. They are mostly open-ended and force you to demonstrate your understanding of the material. For each one you write a technical paper (6 pages max), usually either solving a particular reinforcement learning problem or replicating a key result in RL research.

Assignment 1 - Temporal Difference Learning

You are tasked with replicating key results from Sutton's seminal 1988 paper on temporal difference learning methods. We basically need to show that TD(λ) is more efficient than conventional supervised learning (the Widrow-Hoff rule in the paper). This intuitively makes sense, as we're now updating our agent continuously from successive predictions rather than waiting for the final outcome label. We also run several experiments looking at the trade-offs between different values of λ. As with most things in ML, there's a trade-off in setting λ: you want to balance how far you look into the future against how fast you propagate learnings to your agent.
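To make the role of λ concrete, here is a minimal sketch of tabular TD(λ) with accumulating eligibility traces on a small random walk, loosely modelled on the bounded random walk in Sutton's paper. It is not the paper's exact experimental protocol and not my assignment code: the function names, the step size `alpha`, the episode count and the online update schedule are all assumptions chosen for illustration.

```python
import random


def td_lambda_random_walk(lam, alpha=0.05, episodes=2000, seed=0):
    """Tabular TD(lambda) on a 5-state random walk (illustrative sketch).

    Non-terminal states are 0..4. Stepping off the left end yields outcome 0,
    stepping off the right end yields outcome 1. There is no discounting.
    """
    rng = random.Random(seed)
    n = 5
    values = [0.5] * n                      # initial predictions
    for _ in range(episodes):
        traces = [0.0] * n                  # eligibility traces, reset per episode
        state = n // 2                      # always start in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:              # left terminal, outcome 0
                target, done = 0.0, True
            elif next_state >= n:           # right terminal, outcome 1
                target, done = 1.0, True
            else:                           # bootstrap from the next prediction
                target, done = values[next_state], False
            delta = target - values[state]  # TD error
            traces[state] += 1.0            # mark the current state as eligible
            for s in range(n):
                values[s] += alpha * delta * traces[s]
                traces[s] *= lam            # decay: lam=0 is TD(0), lam=1 is Monte Carlo-like
            if done:
                break
            state = next_state
    return values


def rms_error(values):
    """RMS error against the known true values 1/6, 2/6, ..., 5/6."""
    true_values = [(i + 1) / 6 for i in range(len(values))]
    return (sum((v - t) ** 2 for v, t in zip(values, true_values)) / len(values)) ** 0.5
```

Calling `rms_error(td_lambda_random_walk(lam))` for a range of `lam` values between 0 and 1 gives a rough feel for the trade-off: a small λ leans on bootstrapped predictions, while a λ near 1 waits on the final outcome.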

Assignment 2 - Lunar Lander

We get to apply our learnings to a harder, more modern toy problem. You are tasked with solving OpenAI's Lunar Lander environment. Your agent needs to land a 2D lander without crashing. Your lander has left/right and upward thrusters, and you're rewarded if you land safely and softly within the target area.

A successful run of my lunar lander agent

To solve this problem we leverage Deep Q-Networks and implement a DQN agent with experience replay. This technique was introduced and popularized by DeepMind researchers Mnih et al. back in 2015. I essentially replicated the algorithm from their paper verbatim in PyTorch (we are restricted from using any existing libraries like rl-baselines).
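The two pieces that do most of the work in a DQN agent are the replay buffer and the bootstrapped target. Below is a framework-free sketch of both, written in plain Python rather than PyTorch so it runs anywhere. It is an illustration, not my submitted code: `ReplayBuffer`, `Transition` and `dqn_targets` are hypothetical names, and `target_q` stands in for a frozen target network that returns one Q-value per action.

```python
import random
from collections import deque, namedtuple

# One step of interaction with the environment.
Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=None):
        self.memory = deque(maxlen=capacity)  # oldest transitions drop off first
        self.rng = random.Random(seed)

    def push(self, *args):
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def dqn_targets(batch, target_q, gamma=0.99):
    """Compute y = r + gamma * max_a' Q_target(s', a'), with no bootstrap at terminals."""
    targets = []
    for t in batch:
        bootstrap = 0.0 if t.done else gamma * max(target_q(t.next_state))
        targets.append(t.reward + bootstrap)
    return targets
```

In a full agent, the online network would be regressed toward these targets on each sampled batch, and the target network would be synced with the online one every so often.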

Assignment 3 - Football

The problems get harder! We are now tasked with solving a multi-agent reinforcement learning problem. In this assignment we're given a modified version of Google's Football environment and asked to train agents that can play 3v3 football. If you thought training one agent was already difficult, you now have the added problem of training several agents that have to co-ordinate and interact with the environment together. The goal is to demonstrate an improvement in agent behaviour compared to 3 provided baseline algorithms.

My paper ended up investigating how centralized critic methods improve learning performance and potentially help agents better co-ordinate with each other.
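For readers unfamiliar with the idea, here is a rough sketch of what "centralized critic" means: each agent still acts on its own observation, but a single critic is trained on the joint observation of all agents, and its TD error can be shared as a learning signal. This is a toy linear version under my own assumptions, not the method from my paper; every name and parameter below is hypothetical.

```python
def joint_observation(observations):
    """Concatenate per-agent observations into one feature vector for the critic."""
    return [x for obs in observations for x in obs]


def linear_value(weights, features):
    """Linear critic: V(s) = w . phi(s)."""
    return sum(w * f for w, f in zip(weights, features))


def centralized_td_update(weights, observations, reward, next_observations,
                          done, alpha=0.01, gamma=0.99):
    """One TD(0) update of a critic that sees every agent's observation.

    Returns the new weights and the TD error, which each decentralized
    actor could use as a shared advantage estimate.
    """
    features = joint_observation(observations)
    next_value = 0.0 if done else linear_value(weights, joint_observation(next_observations))
    td_error = reward + gamma * next_value - linear_value(weights, features)
    new_weights = [w + alpha * td_error * f for w, f in zip(weights, features)]
    return new_weights, td_error
```

The point of the design is that the critic is only needed during training, so the agents can still act independently at execution time.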

Conclusion

CS-7642 has been one of the more challenging and rewarding courses I've taken in OMSCS. Definitely complement your learning with RL courses from other universities. Most notably, I watched David Silver's RL lectures. Berkeley's Deep RL course was also extremely helpful for understanding current state-of-the-art algorithms that weren't covered heavily in the lecture material, like Deep Q-Learning and PPO. Reinforcement learning is a very fascinating field that's advancing very quickly. Most interestingly, it played a pivotal part in ChatGPT's recent success, which relied on RL with Human Feedback for its training.

I'll be continuing my learning journey into the Spring as I take HuggingFace's Deep RL course. See you then!