Commit 8446c223 authored by Mark Crowley

added template files for expsarsa and doubql

parent 20ceb8f1
# TODO Before Release
- [ ] (Nham) update map domain start states to be a new setup from last year
- [ ] (Mark) reconsider tasks to be completed and weighting (drop policy iteration? add TDLambda?)
- [ ] (all) test out the current description, ensure all the claims are true, text is correct
- [ ] (nham) prepare testing harness code
# Maze World - Assignment 2
Assignment code for course ECE 493 T42 at the University of Waterloo in Spring 2021.
(*Code was initially designed and created by Sriram Ganapathi Subramanian and Mark Crowley, 2020*)
**Due Date:** July 9, 2021 by 11:59pm submitted as PDF report and code to the LEARN dropbox.
**Due Date:** July 9, 2021 by 11:59pm, with the PDF report submitted to Crowdmark (link will be emailed) and the code submitted to the LEARN dropbox for your group.
**Collaboration:** You can discuss solutions and help to work out the code. But each person *must do their own work*. All code and writing will be cross-checked against each other and against internet databases for cheating.
**Collaboration:** You can discuss solutions and help to work out the code. But the work of the assignment must be done either *alone* or in a group of just *two people*. All code and writing will be cross-checked against each other and against internet databases for cheating. If you are doing the assignment alone, you still need to join a group on LEARN in order to get a dropbox. If you are working with a partner then you need to sign up for a group on LEARN and Crowdmark to link your submissions.
Bug fixes and updates to the provided code that are useful for everyone will be posted on GitLab and announced.
......@@ -26,20 +20,30 @@ There are **three tasks** defined in `run_main.py` which can be commented out to
This assignment will have a written component and a programming component.
Clone the mazeworld environment locally and run the code looking at the implementation of the sample algorithm.
Your task is to implement four other algorithms on this domain, using the corresponding skeleton.
- **(15%)** Implement Value Iteration (`RL_brainsample_VI.py`)
- **(15%)** Implement Policy Iteration (`RL_brainsample_PI.py`)
- **(15%)** Implement SARSA (`RL_brainsample_sarsa.py`)
- **(15%)** Implement QLearning (`RL_brainsample_qlearning.py`)
- **(40%)** Report: Write a short report on the problem and the results of your four algorithms. The report should be submitted on LEARN as a PDF:
- Describe each algorithm you used and define the states, actions and dynamics. Define the mathematical formulation of your algorithm and show the Bellman updates you use.
- Some quantitative analysis of the results. A default plot for comparing all algorithms is given; you can do more plots than this.
- Some qualitative analysis of your observations: where one algorithm works well in each case, what you noticed along the way, and an explanation of the differences in performance related to the algorithms.
You will implement each of the following four algorithms using the corresponding skeleton code, and provide a brief report on the problem and the results.
1. SARSA
- code: **(15%)** Implement SARSA (`RL_brainsample_sarsa.py`)
- report: **(10%)** Report on definition, design and results.
2. QLearning
- code: **(15%)** Implement QLearning (`RL_brainsample_qlearning.py`)
- report: **(10%)** Report on definition, design and results.
3. Expected SARSA
- code: **(15%)** Implement Expected SARSA (`RL_brainsample_expsarsa.py`)
- report: **(10%)** Report on definition, design and results.
4. Double QLearning
- code: **(15%)** Implement Double QLearning (`RL_brainsample_doubqlearning.py`)
- report: **(10%)** Report on definition, design and results.
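The four algorithms differ mainly in the target they bootstrap from. The following is a minimal, illustrative sketch of those tabular targets in plain Python. It does not use the interface of the skeleton files: the function names, the Q-table layout, the action list and the values of `GAMMA`, `ALPHA` and `EPSILON` are all assumptions made for this example.

```python
from collections import defaultdict

# All constants below are assumptions for illustration, not course settings.
ACTIONS = [0, 1, 2, 3]  # hypothetical action set, e.g. up, down, right, left
GAMMA = 0.9             # assumed discount factor
ALPHA = 0.1             # assumed learning rate
EPSILON = 0.1           # assumed exploration rate


def make_q():
    """Tabular Q stored as state -> list of action values, defaulting to 0."""
    return defaultdict(lambda: [0.0] * len(ACTIONS))


def epsilon_greedy_probs(q_values, epsilon=EPSILON):
    """Action probabilities of an epsilon-greedy policy (ties go to the first max)."""
    n = len(q_values)
    probs = [epsilon / n] * n
    probs[q_values.index(max(q_values))] += 1.0 - epsilon
    return probs


def sarsa_target(q, r, s_next, a_next, done):
    # On-policy: bootstrap from the action actually chosen in s_next.
    return r if done else r + GAMMA * q[s_next][a_next]


def q_learning_target(q, r, s_next, done):
    # Off-policy: bootstrap from the greedy action value in s_next.
    return r if done else r + GAMMA * max(q[s_next])


def expected_sarsa_target(q, r, s_next, done):
    # Bootstrap from the expected value under the epsilon-greedy policy in s_next.
    if done:
        return r
    probs = epsilon_greedy_probs(q[s_next])
    return r + GAMMA * sum(p * v for p, v in zip(probs, q[s_next]))


def double_q_target(q_select, q_eval, r, s_next, done):
    # One table selects the greedy action, the other table evaluates it.
    if done:
        return r
    best = q_select[s_next].index(max(q_select[s_next]))
    return r + GAMMA * q_eval[s_next][best]


def td_update(q, s, a, target, alpha=ALPHA):
    """Move Q(s, a) a fraction alpha towards the target."""
    q[s][a] += alpha * (target - q[s][a])
```

In Double Q-learning, the table being updated is usually passed as `q_select` and the other table as `q_eval`, with the roles swapped at random on each step.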
### Report Details
The report should be submitted to Crowdmark as a PDF:
- Describe each algorithm you used and define the states, actions and dynamics. Define the mathematical formulation of your algorithm and show the Bellman updates you use.
- Some quantitative analysis of the results. A default plot for comparing all algorithms is given; you can do more plots than this.
- Some qualitative analysis of your observations: where one algorithm works well in each case, what you noticed along the way, and an explanation of the differences in performance related to the algorithms.
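For the quantitative analysis, per-episode rewards are often noisy, so extra plots can be easier to read after smoothing. The helper below is a hypothetical sketch and not part of the provided code; it assumes you have collected one total reward per episode in a plain list.

```python
def moving_average(values, window=10):
    """Trailing mean over at most the `window` most recent values."""
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed
```

The output has the same length as the input, so it can be plotted against the same episode index as the raw curve.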
### Evaluation
You will also submit your code to LEARN and grading will be carried out using a combination of automated and manual grading.
Your algorithms should follow the pattern of the `RL_brain.py` and `RL_brainsample_hacky_PI.py` files.
Your algorithms should follow the pattern of `RL_brain.py`, which is duplicated in each of the particular algorithm files. The file `RL_brainsample_hacky_PI.py` gives a bad implementation of a Policy-Iteration-style algorithm to give you an idea of how things work.
We will look at your definition and implementation, which should match the description in the document.
We will also automatically run your code on the given domain on the three tasks defined in `run_main.py` as well as other maps you have not seen in order to evaluate it.
Part of your grade will come from the overall performance of your algorithm on each domain.
......