## Domain Description - GridWorld
The domain consists of a 10x10 grid of cells. The agent being controlled is represented as a red square. The goal is a yellow oval; reaching it yields a reward of 1 and ends and resets the episode.
Blue squares are **pits** which yield a penalty of -10 and end the episode.
Black squares are **walls**, which cannot be passed through. If the agent tries to walk into a wall, it remains in its current position and receives a penalty of -0.3. Moving into any other cell yields a reward of -0.1, since the objective is to reach the goal state as quickly as possible.
There are **three tasks** defined in `run_main.py`, which can be commented in or out to try each. They include combinations of pillars, rooms, pits, and obstacles. The aim is to learn a policy that maximizes expected reward and reaches the goal as quickly as possible.
# <img src="task1.png" width="300"/><img src="task2.png" width="300"/><img src="task3.png" width="300"/>
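For concreteness, the reward and termination rules above can be summarized in a short sketch. This is illustrative only: the cell labels and the `step_reward` helper are hypothetical names, not identifiers from the provided `maze_env` code.

```python
# Hypothetical sketch of the reward/termination rules described above;
# the cell labels and function name are illustrative, not from maze_env.
GOAL, PIT, WALL, EMPTY = "goal", "pit", "wall", "empty"

def step_reward(next_cell):
    """Return (reward, episode_done) for attempting to move into next_cell."""
    if next_cell == GOAL:
        return 1.0, True    # reaching the goal ends and resets the episode
    if next_cell == PIT:
        return -10.0, True  # falling into a pit ends the episode
    if next_cell == WALL:
        return -0.3, False  # the agent stays in place and is penalized
    return -0.1, False      # every other step costs -0.1
```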
## Assignment Requirements
This assignment will have a written component and a programming component.
Clone the mazeworld environment locally and run the code, examining the implementation of the sample algorithm.
Your task is to implement four other algorithms on this domain.
- **(15%)** Implement Value Iteration
- **(15%)** Implement Policy Iteration
- **(15%)** Implement SARSA
- **(15%)** Implement QLearning
- **(40%)** Report: Write a short report on the problem and the results of your four algorithms. The report should be submitted on LEARN as a PDF:
    - Describe each algorithm you used; define the states, actions, and dynamics. Give the mathematical formulation of your algorithm and show the Bellman updates you use (standard forms are shown below for reference).
    - Provide some quantitative analysis of the results; a default plot comparing all algorithms is given, and you may add more plots.
    - Provide some qualitative analysis of your observations: where each algorithm works well, what you noticed along the way, and an explanation of the performance differences between the algorithms.
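For reference when writing the report, the standard tabular updates for these four algorithms are shown below, with discount factor $\gamma$, step size $\alpha$, and transition model $P$ (when the dynamics are deterministic, the sum over $s'$ collapses to the single successor state):

```math
\begin{aligned}
\text{Value Iteration:}\quad & V(s) \leftarrow \max_a \sum_{s'} P(s' \mid s, a)\,\big[r(s,a,s') + \gamma V(s')\big] \\
\text{Policy Evaluation:}\quad & V^{\pi}(s) \leftarrow \sum_{s'} P(s' \mid s, \pi(s))\,\big[r(s,\pi(s),s') + \gamma V^{\pi}(s')\big] \\
\text{Policy Improvement:}\quad & \pi(s) \leftarrow \arg\max_a \sum_{s'} P(s' \mid s, a)\,\big[r(s,a,s') + \gamma V^{\pi}(s')\big] \\
\text{SARSA:}\quad & Q(s,a) \leftarrow Q(s,a) + \alpha\,\big[r + \gamma Q(s',a') - Q(s,a)\big] \\
\text{Q-Learning:}\quad & Q(s,a) \leftarrow Q(s,a) + \alpha\,\big[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\big]
\end{aligned}
```

Policy Iteration alternates the evaluation and improvement steps until the policy is stable; SARSA backs up the action $a'$ actually taken in $s'$ (on-policy), while Q-Learning backs up the greedy action (off-policy).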
You will also submit your code to LEARN; grading will combine automated and manual evaluation.
Your algorithms should follow the pattern of the `RL_brain.py` and `RL_brainsample_PI.py` files.
We will look at your definition and implementation, which should match the description in the document.
We will also automatically run your code on the given domain on the three tasks defined in `run_main.py` as well as other maps you have not seen in order to evaluate it.
Part of your grade will come from the overall performance of your algorithm on each domain.
So make sure your code runs with the given unmodified `maze_env` code if we import your class names.
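As a rough illustration of that pattern, a minimal tabular agent skeleton is given below. The class name, constructor arguments, and the `choose_action`/`learn` method names are assumptions made for illustration; match them to the actual interface in `RL_brain.py` before submitting.

```python
import numpy as np

# Minimal sketch of an agent in the spirit of RL_brain.py; all names and
# signatures here are assumptions, not the actual interface of the course code.
class SketchAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions          # list of discrete action indices
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.Q = {}                     # state -> array of action values

    def choose_action(self, state):
        """Epsilon-greedy selection over the current Q estimates."""
        q = self.Q.setdefault(state, np.zeros(len(self.actions)))
        if np.random.rand() < self.epsilon:
            return int(np.random.choice(self.actions))
        return int(np.argmax(q))

    def learn(self, s, a, r, s_, done=False):
        """One tabular Q-learning backup for the transition (s, a, r, s_)."""
        self.Q.setdefault(s, np.zeros(len(self.actions)))
        if done:                        # terminal transition: no bootstrap
            target = r
        else:
            q_next = self.Q.setdefault(s_, np.zeros(len(self.actions)))
            target = r + self.gamma * np.max(q_next)
        self.Q[s][a] += self.alpha * (target - self.Q[s][a])
```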
### Code Suggestions