EE 7280 Homework #5
The following Mountain Car Task is taken from the textbook (Sutton and Barto, 2018). Consider the task of driving an
underpowered car up a steep mountain road. The difficulty is that gravity is stronger than the car’s engine, and even at
full throttle the car cannot accelerate up the steep slope. The only solution is to first move away from the goal and up
the opposite slope on the left. Then, by applying full throttle the car can build up enough inertia to carry it up the
steep slope even though it is slowing down the whole way. This is a simple example of a control task where
things have to get worse in a sense (farther from the goal) before they can get better. Many control methodologies
have great difficulties with tasks of this kind unless explicitly aided by a human designer.
The reward in this problem is -1 on all time steps until the car moves past its goal position at the top of the mountain,
which ends the episode (with a reward of zero). There are three possible actions: full throttle forward (+1), full throttle
reverse (-1), and zero throttle (0). The car moves according to a simplified physics. Its position, $x_t$, and velocity,
$\dot{x}_t$, are updated by:

$$x_{t+1} = \mathrm{bound}\left[x_t + \dot{x}_{t+1}\right]$$
$$\dot{x}_{t+1} = \mathrm{bound}\left[\dot{x}_t + 0.001 A_t - 0.0025 \cos(3 x_t)\right]$$

where the bound operation enforces $-1.2 \le x_{t+1} \le 0.5$ and $-0.07 \le \dot{x}_{t+1} \le 0.07$. In addition, when
$x_{t+1}$ reached the left bound, $\dot{x}_{t+1}$ was reset to zero. When it reached the right bound, the goal was reached
and the episode was terminated. Each episode started from a random position $x_t \in [-0.6, -0.4]$ and zero velocity.
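
The update rules above can be sketched as a single transition in Matlab. This is only a minimal illustration: the names bound, x, v, A, xNext, vNext, done, and reward are assumptions made here and are not taken from the provided starter code. Giving a reward of zero on the terminating step is one reading of the reward description above.

```matlab
% One Mountain Car transition (illustrative names, not from the starter code).
bound = @(val, lo, hi) min(max(val, lo), hi);   % clip val to [lo, hi]

x = -0.5;   % current position
v = 0;      % current velocity
A = 1;      % action: +1 forward, -1 reverse, 0 zero throttle

% Update the velocity first, then the position using the new velocity.
vNext = bound(v + 0.001*A - 0.0025*cos(3*x), -0.07, 0.07);
xNext = bound(x + vNext, -1.2, 0.5);

% Left bound reached: the velocity is reset to zero.
if xNext <= -1.2
    vNext = 0;
end

% Right bound reached: the goal is reached and the episode terminates.
done = (xNext >= 0.5);

% Assumed reading of the reward: -1 per step, 0 on the terminating step.
if done
    reward = 0;
else
    reward = -1;
end
```

Note that the position update uses the new velocity, so the velocity must be computed before the position.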
In this assignment, we will implement the value iteration algorithm to solve this control problem. Read the provided
Matlab starter code carefully and complete the missing part in the function “ValueIteration.m”. Your submission to the
dropbox should include the following: (1) the completed “ValueIteration.m” file, (2) the generated “animation.gif” file,
and (3) a short report briefly describing your results and observations, with two Matlab plots included. One plot
shows the convergence errors during training iterations, and the other shows your testing results.
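
As a reminder of the general shape of value iteration and of what a convergence error per iteration can look like, here is a minimal sketch on a toy 5-state chain. It is not the Mountain Car problem and does not reflect the structure of the starter code. The variable names, the undiscounted setting (gamma = 1), the stopping threshold, and the max-norm error measure are all assumptions made for illustration.

```matlab
% Generic value iteration on a toy 5-state chain (NOT the Mountain Car grid).
nS = 5;            % states 1..5; state 5 is the terminal goal
gamma = 1.0;       % undiscounted episodic task (assumption)
tol = 1e-6;        % stopping threshold on the largest value change
maxIter = 100;

% Deterministic transitions: column 1 = move left, column 2 = move right.
nextS = [max((1:nS)' - 1, 1), min((1:nS)' + 1, nS)];

V = zeros(nS, 1);  % V(5) stays 0 because the goal is terminal
err = zeros(maxIter, 1);
for it = 1:maxIter
    Vold = V;
    for s = 1:nS-1
        % Bellman optimality backup with a reward of -1 per step.
        V(s) = max(-1 + gamma * Vold(nextS(s, :)));
    end
    err(it) = max(abs(V - Vold));   % convergence error for this sweep
    if err(it) < tol
        break;
    end
end
err = err(1:it);   % keep only the sweeps that were actually run
```

Plotting err against the sweep index (for example with plot(err)) gives a convergence curve of the kind requested for the report.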