Commit 9e0f0bd6 authored by Mark Crowley

Update README.md to include deadlines and algorithms to work on

parent 1c5c0e18
Assignment code for course ECE 493 T25 at the University of Waterloo in Spring 2020.
(*Code designed and created by Sriram Ganapathi Subramanian and Mark Crowley, 2020*)
**Due Date:** July 5, 2020 by 11:50pm submitted as PDF report and code to the LEARN dropbox.
**Collaboration:** You can discuss solutions and help each other work out the code, but each person *must do their own work*. All code and writing will be cross-checked against each other and against internet databases to detect cheating.
Updates to the code which will be useful for everyone, or fixes for bugs in the provided code, will be announced.
The domain consists of a 10x10 grid of cells. The agent being controlled is represented as a red square. The goal is a yellow oval; reaching it yields a reward of 1 and ends and resets the episode.
Blue squares are **pits**, which yield a penalty of -10 and end the episode.
Black squares are **walls**, which cannot be passed through. If the agent tries to walk into a wall it remains in its current position and receives a penalty of -0.3.
There are **three tasks** defined in `run_main.py` which can be commented out to try each. They include a combination of pillars, rooms, pits and obstacles. The aim is to learn a policy that maximizes expected reward and reaches the goal as quickly as possible.
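For quick reference, the reward structure above can be summarized in a few lines of Python. This is a minimal sketch under stated assumptions: the constant and function names are illustrative rather than identifiers from the provided code, and the reward of 0 for an ordinary move is an assumption the README does not state.

```python
# Minimal sketch of the reward structure described above; these constant
# and function names are illustrative, not the ones in the provided code.
GOAL_REWARD = 1.0    # reaching the yellow oval ends and resets the episode
PIT_PENALTY = -10.0  # stepping into a blue pit ends the episode
WALL_PENALTY = -0.3  # walking into a black wall; the agent stays in place

def reward(cell_type: str) -> float:
    """Reward for the cell the agent attempts to enter.

    Assumes a reward of 0 for an ordinary move, which the README
    does not state explicitly.
    """
    return {"goal": GOAL_REWARD, "pit": PIT_PENALTY, "wall": WALL_PENALTY}.get(cell_type, 0.0)
```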
# <img src="task1.png" width="300"/><img src="task2.png" width="300"/><img src="task3.png" width="300"/>
This assignment will have a written component and a programming component.
Clone the mazeworld environment locally, run the code, and look at the implementation of the sample algorithm.
Your task is to implement the following four algorithms on this domain:
- **(15%)** Implement Value Iteration
- **(15%)** Implement Policy Iteration
- **(15%)** Implement SARSA
- **(15%)** Implement Q-Learning (a minimal sketch of these tabular update rules appears after this list)
- **(40%)** Report: Write a short report on the problem and the results of your four algorithms. The report should be submitted on LEARN as a PDF:
  - Describe each algorithm you used: define the states, actions, and dynamics, give the mathematical formulation of your algorithm, and show the Bellman updates you use.
  - Some quantitative analysis of the results; a default plot for comparing all algorithms is provided, and you can add more plots than this.
  - Some qualitative analysis of your observations: where each algorithm works well, what you noticed along the way, and an explanation of the performance differences between the algorithms.
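For orientation, here is a hedged sketch of the tabular update rules behind three of the four algorithms (SARSA, Q-Learning, and the Bellman backup used by Value Iteration). It is not the required implementation: the hyperparameter values and the array shapes assumed for the dynamics `P` and rewards `R` are illustrative, and a full solution still needs exploration, episode loops, and convergence checks.

```python
import numpy as np

alpha, gamma = 0.1, 0.99  # learning rate and discount factor (illustrative values)

def sarsa_update(Q, s, a, r, s_next, a_next):
    # SARSA (on-policy TD): Q(s,a) += alpha * [r + gamma * Q(s',a') - Q(s,a)],
    # where a' is the action actually chosen by the behaviour policy.
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

def q_learning_update(Q, s, a, r, s_next):
    # Q-Learning (off-policy TD): Q(s,a) += alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def value_iteration_sweep(V, P, R):
    # One synchronous Bellman optimality backup over all states:
    #   V(s) <- max_a sum_{s'} P[s,a,s'] * (R[s,a,s'] + gamma * V(s'))
    # P and R are assumed here to be (S, A, S)-shaped arrays.
    return np.einsum("saj,saj->sa", P, R + gamma * V[None, None, :]).max(axis=1)
```

Policy Iteration alternates a similar evaluation sweep (with the max replaced by the current policy's action) with greedy policy improvement. In an episode loop, `sarsa_update` needs the next action from the ε-greedy policy before it can update, while Q-Learning does not.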
### Evaluation