Exercise 06

In this exercise we look at n-step methods, which generalize the Monte Carlo and TD learning algorithms. The environment under examination is the inverted pendulum, a popular toy system from control theory.

Tasks:

  1. Discretization of continuous state spaces, making such systems accessible to tabular RL algorithms
  2. On-policy epsilon-greedy control using n-step Sarsa
  3. Off-policy control using the n-step tree-backup algorithm
  4. Hyperparameter optimization for the Q(σ) algorithm
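For task 1, a common approach is uniform binning of each state dimension. The sketch below is an assumption, not the exercise's reference solution: it bins the pendulum state (angle, angular velocity) into a `32 × 32` grid and maps it to a single table index; the bin counts and velocity range are illustrative choices.

```python
import numpy as np

# Illustrative uniform discretization of the pendulum state
# (theta, theta_dot) so that tabular methods can index a Q-table.
# Bin counts and the velocity range [-8, 8] are assumptions.
N_THETA, N_THETA_DOT = 32, 32
theta_bins = np.linspace(-np.pi, np.pi, N_THETA + 1)
theta_dot_bins = np.linspace(-8.0, 8.0, N_THETA_DOT + 1)

def discretize(theta, theta_dot):
    """Map a continuous state to a single flat table index."""
    i = np.clip(np.digitize(theta, theta_bins) - 1, 0, N_THETA - 1)
    j = np.clip(np.digitize(theta_dot, theta_dot_bins) - 1, 0, N_THETA_DOT - 1)
    return int(i * N_THETA_DOT + j)
```

The resulting index addresses one row of a `(N_THETA * N_THETA_DOT, n_actions)` Q-table; finer grids reduce discretization error at the cost of slower learning.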
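For task 2, the core of n-step Sarsa is the n-step return, which combines the next n rewards with a bootstrapped Q-value. A minimal sketch of the target computation and the tabular update, with illustrative numbers:

```python
# n-step Sarsa target (cf. Sutton & Barto, ch. 7):
#   G = R_{t+1} + gamma*R_{t+2} + ... + gamma^(n-1)*R_{t+n}
#       + gamma^n * Q(S_{t+n}, A_{t+n})
# computed backwards from the bootstrap value.

def n_step_sarsa_target(rewards, bootstrap_q, gamma):
    """Compute the n-step return from n rewards and a bootstrap Q-value."""
    g = bootstrap_q
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def sarsa_update(Q, s, a, target, alpha):
    """Tabular update toward the n-step target."""
    Q[s, a] += alpha * (target - Q[s, a])

# Example with n = 3, gamma = 0.9, rewards [1, 0, 1], bootstrap Q = 2.0:
g = n_step_sarsa_target([1.0, 0.0, 1.0], 2.0, 0.9)
```

In the full algorithm this target is computed once the agent is n steps past the updated state-action pair, with the sum truncated at episode termination.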
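For task 3, the tree-backup algorithm replaces importance sampling with an expectation over the target policy at each backup step. A hedged sketch of one backwards step of the recursion (function and argument names are illustrative):

```python
import numpy as np

# One backwards step of the tree-backup return (cf. Sutton & Barto, ch. 7):
#   G = R_{t+1} + gamma * ( sum_{a != A_{t+1}} pi(a|S_{t+1}) * Q(S_{t+1}, a)
#                           + pi(A_{t+1}|S_{t+1}) * G_next )
# i.e. leaf actions contribute their Q-values weighted by pi, while the
# branch actually taken continues with the deeper return G_next.

def tree_backup_step(reward, q_next, pi_next, a_next, g_next, gamma):
    """Fold one transition into the tree-backup return, going backwards."""
    # Expected value over all actions NOT taken at S_{t+1}.
    expected_leaves = np.dot(pi_next, q_next) - pi_next[a_next] * q_next[a_next]
    return reward + gamma * (expected_leaves + pi_next[a_next] * g_next)
```

Because the taken action is weighted by the target policy rather than reweighted by a behavior/target ratio, the update is off-policy without importance-sampling variance. The Q(σ) algorithm of task 4 interpolates per step between this full-expectation backup (σ = 0) and the Sarsa-style sampled backup (σ = 1), with σ as a tunable hyperparameter.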