Reinforcement Learning

Python notebook: https://github.com/daviskregers/data-science-recap/blob/main/22-reinforcement-learning.ipynb

You have some sort of agent that explores some space
As it goes, it learns the value of different state changes in different conditions
Those values inform subsequent behaviour of the agent
Examples: Pac-Man, Cat & mouse game
Yields fast on-line performance once the space has been explored

Q-learning

A specific implementation of reinforcement learning
You have:
- A set of environmental variables (states) s
- A set of possible actions in those states a
- A value of each state/action Q
Start off with Q values of 0
Explore the space
As bad things happen after a given state/action, reduce its Q
As rewards happen after a given state/action, increase its Q
What are some state/actions here?
- Pac-man has a wall to the west
- Pac-man dies if he moves one step south
- Pac-man just continues to live if going north or east
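
As a rough illustration of the state/action idea above, the Q values can be kept in a dictionary keyed by (state, action). The state encoding, the action names and the reward numbers below are hypothetical, chosen only to mirror the Pac-Man situation described here:

```python
from collections import defaultdict

ACTIONS = ("north", "south", "east", "west")

# Hypothetical state encoding: what Pac-Man sees in each direction.
state = (
    ("north", "empty"),
    ("south", "ghost"),
    ("east", "empty"),
    ("west", "wall"),
)

# Every Q value starts at 0 until that state/action pair has been tried.
q_table = defaultdict(float)

# Hand-written outcomes, only to illustrate raising and lowering Q.
q_table[(state, "south")] -= 100.0  # Pac-Man dies: large penalty (assumed value)
q_table[(state, "north")] += 1.0    # Pac-Man survives: small reward (assumed value)
q_table[(state, "east")] += 1.0

# Moving into a wall is not possible, so west is excluded from the choices.
surroundings = dict(state)
legal_actions = [a for a in ACTIONS if surroundings[a] != "wall"]

# Pick the legal action with the highest stored Q value.
best_action = max(legal_actions, key=lambda a: q_table[(state, a)])
```

Here `best_action` ends up as one of the two surviving moves, since both carry the same Q value and south has been penalised.
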
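You can "look ahead" more than one step by using a discount factor when computing Q (here s is the previous state, s' is the current state, \(\alpha\) is the learning rate and \(\gamma\) is the discount factor):
\(Q(s,a) \mathrel{+}= \alpha \cdot (reward(s,a) + \gamma \cdot \max_{a'} Q(s',a') - Q(s,a))\)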
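
A minimal sketch of this update in Python, written in the standard Q-learning form with a separate learning rate `alpha` and discount factor `gamma`. The function name `q_update`, the state labels and all numeric values are illustrative assumptions, not taken from the notebook:

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update to the dictionary q and return the new value."""
    # Best value reachable from the state we just arrived in (0 if never seen).
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


actions = ("north", "south", "east", "west")
q = {}

# A penalty lowers the Q value of the state/action that led to it.
after_penalty = q_update(q, "s0", "south", -100.0, "dead", actions)

# Look-ahead: moving north earns no reward itself, but leads to a state
# that already has a valuable action, so part of that value flows back.
q[("s1", "east")] = 10.0
after_lookahead = q_update(q, "s0", "north", 0.0, "s1", actions)
```

With these numbers the penalised pair drops below zero, while the north move picks up a positive value purely from the discounted value of the next state.
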

You can make an intelligent Pac-Man in a few steps:
- Have it semi-randomly explore different choices of movement (actions) given different conditions (states)
- Keep track of the reward or penalty associated with each choice for a given state/action (Q)
- Use those stored Q values to inform its future choices
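
The three steps above can be sketched on a toy problem. This is not the Pac-Man game itself: it assumes a hypothetical five-cell corridor with a trap at one end and a reward at the other, and uses epsilon-greedy selection as one common way to "semi-randomly" explore. All constants are assumed values.

```python
import random

N_STATES = 5        # cells 0..4; cell 0 is a trap, cell 4 holds the reward
ACTIONS = (-1, +1)  # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate


def step(state, action):
    """Move one cell and return (next_state, reward, episode_finished)."""
    next_state = state + action
    if next_state == 0:
        return next_state, -10.0, True
    if next_state == N_STATES - 1:
        return next_state, 10.0, True
    return next_state, 0.0, False


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = rng.randint(1, N_STATES - 2), False
        while not done:
            # Step 1: explore semi-randomly (random move with probability EPSILON).
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Step 2: record the reward or penalty in Q for this state/action.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q


# Step 3: use the stored Q values to choose the action in each state.
q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(1, N_STATES - 1)}
```

After training, `policy` maps each inner cell to the step the agent currently rates highest, which should point away from the trap and toward the reward.
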

Python Markov Decision Process toolbox:
- https://pymdptoolbox.readthedocs.io/en/latest/
Cat & mouse example:
- https://github.com/studywolf/blog/tree/master/RL/Cat%20vs%20Mouse%20exploration