deep reinforcement learning

by diego llanes

what is reinforcement learning?

there are many different types of ML that we are familiar with

  • supervised learning
  • unsupervised learning
  • reinforcement learning (RL)

what is reinforcement learning?

what separates RL from these other branches of ML?

typically in ML, you have:

  • inputs: $x$
  • targets: $y$

except in the case of unsupervised learning! (only $x$ — no targets)

what is reinforcement learning?

so what are the inputs and outputs in RL???

short answer: it's a little less cut and dried.

expected outcomes

let's talk expected outcomes:
  • when to use drl (why)
  • basic familiarity with drl keywords
  • a gained appreciation for the field
  • where to start (libraries)

keywords

in RL you don't necessarily have $x$ and $y$; rather, you have the following:
  • state-action tuples
  • an environment
  • an agent
  • a reward function

keywords['observation-space']

an observation space $(\mathcal{Y})$ is the set of features about the state that your controller has access to

an important thing to note: much of the time, you don't have access to the full state $(\mathcal{X})$

these spaces can be continuous, discrete, or even pictorial (e.g. raw images)!

keywords['action-space']

your action space $(\mathcal{A})$ is the set of all possible actions
continuous: an infinite number of possible actions
discrete: a finite number of possible actions
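to make the two kinds of spaces concrete, here's a minimal stdlib-only sketch. the `Discrete` and `Box` names echo what a library like gymnasium provides, but these are toy stand-ins, not the real API:

```python
import random

# toy stand-in for a discrete action space: a finite set of actions
class Discrete:
    def __init__(self, n):
        self.n = n  # actions are the integers 0 .. n-1

    def sample(self):
        return random.randrange(self.n)

# toy stand-in for a continuous ("box") space: an interval per dimension
class Box:
    def __init__(self, low, high, dims):
        self.low, self.high, self.dims = low, high, dims

    def sample(self):
        return [random.uniform(self.low, self.high) for _ in range(self.dims)]

action_space = Discrete(3)     # e.g. {left, right, do-nothing}
obs_space = Box(-1.0, 1.0, 4)  # e.g. 4 continuous sensor readings

a = action_space.sample()  # always one of finitely many actions
y = obs_space.sample()     # a point drawn from an infinite set
print(a in range(3))  # True
print(len(y))         # 4
```

the same split shows up in real environments: cartpole has a discrete action space but a continuous observation space.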

keywords['state-action']

state-action tuples are exactly what they sound like

they are tuples for a timestep $t$ that contain a state (here, the observation $y_t$) and the action for that corresponding timestep

\[(y_t, a_t) \in \mathcal{Y} \times \mathcal{A}\]
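a trajectory is then just a sequence of these tuples indexed by timestep; a toy sketch (the dynamics here are made up purely for illustration):

```python
import random

# a toy trajectory: one (observation, action) tuple per timestep t
trajectory = []
y = 0.0  # initial observation
for t in range(5):
    a = random.choice([0, 1])          # sample an action from a discrete space
    trajectory.append((y, a))          # store the (y_t, a_t) pair
    y = y + (1.0 if a == 1 else -1.0)  # toy dynamics: produce the next observation

print(len(trajectory))  # 5 state-action tuples, one per timestep
```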

keywords['environment']

an environment defines the state-action transition function

\[(s_t, a_t) \rightarrow s_{t+1}\]
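a minimal sketch of an environment as a transition function, using a hypothetical 1-D gridworld where the state is a position and the actions are $\{-1, +1\}$:

```python
# a minimal deterministic environment: step() is the transition function
# (s_t, a_t) -> s_{t+1}
class GridWorld:
    def __init__(self, size=5):
        self.size = size
        self.s = 0  # current state: a position on the grid

    def step(self, a):
        # apply the action and clip to the grid boundaries
        self.s = max(0, min(self.size - 1, self.s + a))
        return self.s

env = GridWorld()
print(env.step(+1))  # 1
print(env.step(+1))  # 2
print(env.step(-1))  # 1
```

real environments (e.g. in gymnasium) also return a reward and a done flag from each step, but the core idea is the same state-action transition.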

keywords['agent']

an agent defines our control policy $(\pi(y_t) \rightarrow a_t)$

in deep reinforcement learning $\pi$ is a neural network, but in more traditional approaches, $\pi$ is a table
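in the traditional tabular case, $\pi$ really is just a lookup table; a minimal sketch (the states and actions are made up for illustration):

```python
# a tabular policy pi: a lookup table from (discretized) observations to actions
pi = {
    0: +1,  # in state 0, move right
    1: +1,  # in state 1, move right
    2: -1,  # in state 2, move left
}

def policy(y):
    return pi[y]  # pi(y_t) -> a_t

print(policy(0))  # 1
print(policy(2))  # -1
```

deep RL replaces this table with a neural network, which is what lets $\pi$ handle continuous or image observations that could never be enumerated in a table.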

why drl???

"why use drl instead of something like supervised learning?" it's an important question!

drl becomes useful when you are operating in a
non-differentiable environment

why drl???

it is often the case that your reward function is also "non-convex" (and sparse)
in the case of acrobot, our reward function looks something like the following:
\[ r(x) = \begin{cases} 1, & \text{if } x > \text{threshold} \\ 0, & \text{otherwise} \end{cases} \]
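the piecewise reward above in code (the threshold value is arbitrary here):

```python
# the sparse, non-convex reward from above: 1 past the threshold, else 0.
# note there is no gradient signal anywhere below the threshold, which is
# exactly why gradient-based supervised learning struggles with it.
def reward(x, threshold=1.0):
    return 1.0 if x > threshold else 0.0

print(reward(0.5))  # 0.0
print(reward(1.5))  # 1.0
```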

why drl???

it is often the case that we are trying to solve what are called
sequential decision problems

these problems naturally lend themselves to being effectively solved by these techniques!
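in miniature, a sequential decision problem is a loop where each action changes the state that every later decision sees, so choices compound over time; a toy sketch (the dynamics and reward are made up for illustration):

```python
# a sequential decision problem in miniature: each action changes the state
# that every subsequent decision sees, so early choices affect later rewards
actions = [+1, +1, +1, -1, +1]  # a fixed plan of actions (stand-in for a policy)
s, total_reward = 0, 0.0
for a in actions:
    s = max(0, s + a)                       # environment transition
    total_reward += 1.0 if s >= 3 else 0.0  # sparse reward near the goal
print(s, total_reward)  # 3 2.0
```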

Why DRL??? (ABCD)

Which of the following might be an area wherein DRL could be useful?

  1. Sorting lists
  2. Fine-tuning language models
  3. Basic arithmetic operations
  4. Image compression

where do I even start???

well, there are a few libraries that give us lots of support (e.g. skrl and Stable-Baselines3 for algorithms, and Gymnasium for environments):

where do I even start???

let's check out an example using skrl & Farama's Gymnasium!

just to be clear, this stuff is taken straight from the skrl docs