## Domain Description - GridWorld
The domain consists of a 10x10 grid of cells. The agent being controlled is represented as a red square. The goal is a yellow oval; reaching it yields a reward of 1 and ends and resets the episode.
Blue squares are **pits** which yield a penalty of -10 and end the episode.
Black squares are **walls**, which cannot be passed through. If the agent tries to walk into a wall, it remains in its current position and receives a penalty of -0.3. Moving into any other cell yields a reward of -0.1, since the objective is to reach the goal state as quickly as possible.
There are **three tasks** defined in `run_main.py`, which can be commented in or out to try each. They include combinations of pillars, rooms, pits, and obstacles. The aim is to learn a policy that maximizes expected reward and reaches the goal as quickly as possible.
# <img src="task1.png" width="300"/><img src="task2.png" width="300"/><img src="task3.png" width="300"/>
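For concreteness, the reward and termination rules above can be summarized in a short sketch. This is illustrative only: the cell labels and the `step_reward` helper are hypothetical names, not identifiers from the provided `maze_env` code.

```python
# Hypothetical sketch of the reward/termination rules described above;
# the cell labels and function name are illustrative, not from maze_env.
GOAL, PIT, WALL, EMPTY = "goal", "pit", "wall", "empty"

def step_reward(next_cell):
    """Return (reward, episode_done) for attempting to move into next_cell."""
    if next_cell == GOAL:
        return 1.0, True    # reaching the goal ends and resets the episode
    if next_cell == PIT:
        return -10.0, True  # falling into a pit ends the episode
    if next_cell == WALL:
        return -0.3, False  # the agent stays in place and is penalized
    return -0.1, False      # every other step costs -0.1
```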
## Assignment Requirements
This assignment will have a written component and a programming component.
Clone the mazeworld environment locally and run the code, examining the implementation of the sample algorithm.
Your task is to implement four other algorithms on this domain.
- **(15%)** Implement Value Iteration
- **(15%)** Implement Policy Iteration
- **(15%)** Implement SARSA
- **(15%)** Implement QLearning
- **(40%)** Report: Write a short report on the problem and the results of your four algorithms. The report should be submitted on LEARN as a PDF:
    - Describe each algorithm you used; define the states, actions, and dynamics. Give the mathematical formulation of your algorithm and show the Bellman updates you use (standard forms are shown below for reference).
    - Provide some quantitative analysis of the results; a default plot comparing all algorithms is given, and you may add more plots.
    - Provide some qualitative analysis of your observations: where each algorithm works well, what you noticed along the way, and an explanation of the performance differences between the algorithms.
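For reference when writing the report, the standard tabular updates for these four algorithms are shown below, with discount factor $\gamma$, step size $\alpha$, and transition model $P$ (when the dynamics are deterministic, the sum over $s'$ collapses to the single successor state):

```math
\begin{aligned}
\text{Value Iteration:}\quad & V(s) \leftarrow \max_a \sum_{s'} P(s' \mid s, a)\,\big[r(s,a,s') + \gamma V(s')\big] \\
\text{Policy Evaluation:}\quad & V^{\pi}(s) \leftarrow \sum_{s'} P(s' \mid s, \pi(s))\,\big[r(s,\pi(s),s') + \gamma V^{\pi}(s')\big] \\
\text{Policy Improvement:}\quad & \pi(s) \leftarrow \arg\max_a \sum_{s'} P(s' \mid s, a)\,\big[r(s,a,s') + \gamma V^{\pi}(s')\big] \\
\text{SARSA:}\quad & Q(s,a) \leftarrow Q(s,a) + \alpha\,\big[r + \gamma Q(s',a') - Q(s,a)\big] \\
\text{Q-Learning:}\quad & Q(s,a) \leftarrow Q(s,a) + \alpha\,\big[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\big]
\end{aligned}
```

Policy Iteration alternates the evaluation and improvement steps until the policy is stable; SARSA backs up the action $a'$ actually taken in $s'$ (on-policy), while Q-Learning backs up the greedy action (off-policy).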
You will also submit your code to LEARN; grading will combine automated and manual evaluation.
Your algorithms should follow the pattern of the `RL_brain.py` and `RL_brainsample_PI.py` files.
We will look at your definition and implementation, which should match the description in the document.
We will also automatically run your code on the given domain on the three tasks defined in `run_main.py` as well as other maps you have not seen in order to evaluate it.
Part of your grade will come from the overall performance of your algorithm on each domain.
So make sure your code runs with the given unmodified `maze_env` code if we import your class names.
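As a rough illustration of that pattern, a minimal tabular agent skeleton is given below. The class name, constructor arguments, and the `choose_action`/`learn` method names are assumptions made for illustration; match them to the actual interface in `RL_brain.py` before submitting.

```python
import numpy as np

# Minimal sketch of an agent in the spirit of RL_brain.py; all names and
# signatures here are assumptions, not the actual interface of the course code.
class SketchAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions          # list of discrete action indices
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.Q = {}                     # state -> array of action values

    def choose_action(self, state):
        """Epsilon-greedy selection over the current Q estimates."""
        q = self.Q.setdefault(state, np.zeros(len(self.actions)))
        if np.random.rand() < self.epsilon:
            return int(np.random.choice(self.actions))
        return int(np.argmax(q))

    def learn(self, s, a, r, s_, done=False):
        """One tabular Q-learning backup for the transition (s, a, r, s_)."""
        self.Q.setdefault(s, np.zeros(len(self.actions)))
        if done:                        # terminal transition: no bootstrap
            target = r
        else:
            q_next = self.Q.setdefault(s_, np.zeros(len(self.actions)))
            target = r + self.gamma * np.max(q_next)
        self.Q[s][a] += self.alpha * (target - self.Q[s][a])
```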
### Code Suggestions