smartstart.environments package
Submodules
smartstart.environments.environment module
Environment module
Contains base classes for constructing environments and visualizers
class Environment(name=None)
Bases: object
Base class for constructing environments
reset(start_state=None)
Reset the agent to the start state
Parameters: start_state (np.ndarray, optional) – start state; if not supplied, the standard start state is used
step(action)
Take a step in the environment
Parameters: action (int, float or np.ndarray) – action
Raises: NotImplementedError – use a subclass of Environment, like GridWorld.
visualizer
Visualizer – visualizer of the agent, value function, density and/or console. Sets the name of the visualizer to be equal to the environment's name.
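A minimal sketch of how a subclass might fill in reset and step. It uses a stand-in base class instead of importing smartstart, and the Corridor class, its 1-D state and its reward are hypothetical, chosen only to illustrate the pattern described above.

```python
import numpy as np


class Environment:
    """Stand-in for the documented base class (assumption, not the real code)."""

    def __init__(self, name=None):
        self.name = name
        self.state = None

    def reset(self, start_state=None):
        raise NotImplementedError

    def step(self, action):
        # The documented base class raises NotImplementedError here.
        raise NotImplementedError("use a subclass of Environment")


class Corridor(Environment):
    """Hypothetical 1-D corridor: action 1 moves right, anything else moves left."""

    def __init__(self, length=5):
        super().__init__(name="Corridor")
        self.length = length

    def reset(self, start_state=None):
        # Fall back to a standard start state when none is supplied.
        self.state = np.array([0]) if start_state is None else np.asarray(start_state)
        return self.state.copy()

    def step(self, action):
        delta = 1 if action == 1 else -1
        position = int(np.clip(self.state[0] + delta, 0, self.length - 1))
        self.state = np.array([position])
        done = position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state.copy(), reward, done, {}
```

Calling step on the stand-in base class raises NotImplementedError, while Corridor returns a (state, reward, done, info) tuple.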
class Visualizer
Bases: object
CONSOLE = 0
DENSITY = 3
LIVE_AGENT = 1
VALUE_FUNCTION = 2
add_visualizer(*args)
Adds a window to the visualizer
Available windows are listed in the class attributes of the implementing class.
Parameters: args (list of int) – visualizers to add. Possible visualizers are defined in the class attributes
Raises: NotImplementedError
render(value_map=None, density_map=None, message=None, close=False)
Render the current state of the training algorithm
The render method should include all functionality needed to close the visualizer. When the close parameter is True, the visualizer should be closed and the return value should tell the learning algorithm to stop rendering.
Parameters:
- value_map (np.ndarray, optional) – value function map
- density_map (np.ndarray, optional) – density map
- message (str, optional) – message to print to the console window
- close (bool) – True if the visualizer has to be closed
Returns: True if the agent has to keep rendering
Return type: bool
Raises: NotImplementedError
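A minimal sketch of a concrete visualizer, assuming the return convention stated for the base class (True means keep rendering). The Visualizer class below is a stand-in that only mirrors the documented attributes and signatures, and ConsoleOnlyVisualizer is a hypothetical subclass that collects console messages instead of drawing anything.

```python
class Visualizer:
    """Stand-in mirroring the documented class attributes (assumption)."""

    CONSOLE = 0
    LIVE_AGENT = 1
    VALUE_FUNCTION = 2
    DENSITY = 3

    def add_visualizer(self, *args):
        raise NotImplementedError

    def render(self, value_map=None, density_map=None, message=None, close=False):
        raise NotImplementedError


class ConsoleOnlyVisualizer(Visualizer):
    """Hypothetical visualizer that only stores console messages."""

    def __init__(self):
        self.active_visualizers = set()
        self.lines = []

    def add_visualizer(self, *args):
        # Each argument is one of the integer class attributes.
        self.active_visualizers.update(args)

    def render(self, value_map=None, density_map=None, message=None, close=False):
        if close:
            return False  # tell the caller to stop rendering
        if Visualizer.CONSOLE in self.active_visualizers and message is not None:
            self.lines.append(message)
        return True  # keep rendering
```

A caller would register windows with add_visualizer(Visualizer.CONSOLE) and then stop its render loop as soon as render returns False.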
smartstart.environments.generate_gridworld module
Random GridWorld Generator module
Defines methods necessary for generating a random gridworld. The generated gridworld is a numpy array with dtype int where each entry is a state.
Attributes in the gridworld are defined with the following types:
- State
- Wall
- Start state
- Goal state
States will have a grid size of 3x3. States are separated by a 1x3 or 3x1 wall that can be turned into a state.
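The cell structure described above can be sketched with numpy. This is an illustration only, not the module's generator: the integer codes (0 for state, 1 for wall), the outer border of walls and the helper names blank_cell_grid and open_wall_right are all assumptions.

```python
import numpy as np

STATE, WALL = 0, 1  # assumed integer codes
CELL = 3            # each cell is 3x3 states


def blank_cell_grid(rows, cols):
    """Grid of rows x cols cells, each 3x3 states, separated by 1-wide walls."""
    shape = (rows * (CELL + 1) + 1, cols * (CELL + 1) + 1)
    grid = np.full(shape, WALL, dtype=int)
    for r in range(rows):
        for c in range(cols):
            top, left = r * (CELL + 1) + 1, c * (CELL + 1) + 1
            grid[top:top + CELL, left:left + CELL] = STATE
    return grid


def open_wall_right(grid, r, c):
    """Turn the 3x1 wall between cell (r, c) and cell (r, c + 1) into states."""
    top = r * (CELL + 1) + 1
    wall_col = (c + 1) * (CELL + 1)
    grid[top:top + CELL, wall_col] = STATE


grid = blank_cell_grid(1, 2)   # two cells side by side
open_wall_right(grid, 0, 0)    # connect them
```

A random generator could start from such a blank grid and open walls between neighbouring cells until the maze is connected.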
smartstart.environments.gridworld module
GridWorld module
class GridWorld(name, layout, T_prob=0.0, wall_reset=False, scale=5)
Bases: smartstart.environments.environment.Environment
GridWorld environment
Creates a GridWorld environment with the layout specified by the user. A layout is a two-dimensional numpy array, where each entry is a state with a certain type. The possible types are:
- State
- Wall
- Start state
- Goal state
Example layout = [[3, 0, 0],[1, 1, 0],[1, 1, 0],[2, 0, 0]]
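As a sketch of how the example layout can be inspected, the snippet below locates the start and goal entries. The integer codes are an assumption: they are taken from the order of the type list (0 state, 1 wall, 2 start state, 3 goal state), which matches the example layout having exactly one 2 and one 3.

```python
import numpy as np

STATE, WALL, START, GOAL = 0, 1, 2, 3  # assumed codes, in list order

layout = np.array([[3, 0, 0],
                   [1, 1, 0],
                   [1, 1, 0],
                   [2, 0, 0]])

# np.argwhere returns the (row, col) index of every matching entry.
start = tuple(int(i) for i in np.argwhere(layout == START)[0])
goal = tuple(int(i) for i in np.argwhere(layout == GOAL)[0])
n_walls = int((layout == WALL).sum())
```

Under these assumptions the agent starts in the bottom-left corner and has to walk around the wall block to reach the top-left corner.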
The agent can take discrete steps in four directions: up, right, down and left. When no transition probability is defined (T_prob=0), the environment is deterministic (i.e. action up brings the agent to the state above the current state). When a transition probability is defined, there is a T_prob chance that the agent will take a random action from the four actions.
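The effect of the transition probability can be sketched as follows. The action encoding (0 up, 1 right, 2 down, 3 left) and the helper name effective_action are assumptions for illustration; the actual implementation may draw its random numbers differently.

```python
import random

ACTIONS = (0, 1, 2, 3)  # assumed encoding: up, right, down, left


def effective_action(action, t_prob, rng=random):
    """With probability t_prob, replace the chosen action by a random one."""
    if t_prob > 0.0 and rng.random() < t_prob:
        return rng.choice(ACTIONS)
    return action
```

Note that in this sketch the random draw may coincide with the chosen action, so the chance of actually deviating is somewhat lower than t_prob.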
When the agent tries to step into a wall, it counts as a step but the agent stays in the same state. When wall_reset is set to True, the environment also returns done to the learning algorithm when a wall is hit.
A reward of 1. is given for reaching the goal state and the environment will return done=True to the learning algorithm.
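The wall and reward rules can be sketched as a small deterministic step function. The integer codes, the action encoding and the treatment of the grid edge as a wall are assumptions; the function is a simplified stand-in, not the GridWorld.step implementation.

```python
STATE, WALL, START, GOAL = 0, 1, 2, 3  # assumed codes
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}  # assumed: up, right, down, left

LAYOUT = [[3, 0, 0],
          [1, 1, 0],
          [1, 1, 0],
          [2, 0, 0]]


def step(state, action, wall_reset=False):
    """Return (new_state, reward, done) for one deterministic move."""
    row, col = state
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    inside = 0 <= new_row < len(LAYOUT) and 0 <= new_col < len(LAYOUT[0])
    if not inside or LAYOUT[new_row][new_col] == WALL:
        # Hitting a wall counts as a step, but the agent stays put.
        # With wall_reset=True the episode is reported as done.
        return state, 0.0, wall_reset
    reached_goal = LAYOUT[new_row][new_col] == GOAL
    return (new_row, new_col), (1.0 if reached_goal else 0.0), reached_goal
```

For example, moving up from the start state (3, 0) hits a wall, so the state is unchanged and done equals the value of wall_reset.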
Five different preset gridworlds are defined: EASY, MEDIUM, HARD, EXTREME and IMPOSSIBRUUHHH. The first three have a simple layout, EXTREME is a more complicated maze, and IMPOSSIBRUUHHH generates a random maze with a specified size.
Parameters:
- name (str) – name for the environment, class name will be prefixed to this name
- layout (double list of int or np.ndarray) – layout of the gridworld
- T_prob (float) – transition probability
- wall_reset (bool) – when True, the environment will return done when a wall is hit
- scale (int) – scale factor, gridworld will become layout * scale in size
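One way to picture the scale parameter is to repeat every layout entry scale times along both axes. The sketch below does this with np.kron; whether GridWorld scales its layout this way, and how it then treats the enlarged start and goal regions, is an assumption not confirmed by this page.

```python
import numpy as np

layout = np.array([[3, 0],
                   [2, 1]])
scale = 2

# Kronecker product with a block of ones repeats each entry scale x scale times.
grid = np.kron(layout, np.ones((scale, scale), dtype=int))
```

With scale = 2, the 2x2 layout becomes a 4x4 grid in which each original entry covers a 2x2 block.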
EASY = 0
EXTREME = 3
HARD = 2
IMPOSSIBRUUHHH = 4
MEDIUM = 1
close_render()
Close the visualizer
Returns: True if the visualizer was closed properly
Return type: bool
classmethod generate(type=0, size=None)
Generates a gridworld according to the presets
Presets are given in smartstart.environments.presets. Presets contain the name, layout and scale of the gridworld.
Parameters:
- type (int) – gridworld type as defined in the class attributes (Default value = EASY)
- size (tuple, optional) – used for generating a random gridworld when the type is IMPOSSIBRUUHHH.
Returns: new gridworld environment
Return type: GridWorld
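A sketch of the dispatch pattern that generate describes: a class attribute selects a preset function, which returns the name, layout and scale used to construct the environment. The GridWorld class here is a stand-in, the preset functions _easy and _medium and their layouts are made up, and the ValueError for unknown types is an assumption.

```python
class GridWorld:
    """Stand-in for the documented class; preset contents are made up."""

    EASY, MEDIUM = 0, 1

    def __init__(self, name, layout, scale=5):
        self.name, self.layout, self.scale = name, layout, scale

    @classmethod
    def generate(cls, type=0):
        # The parameter is called `type` to mirror the documented signature.
        presets = {cls.EASY: _easy, cls.MEDIUM: _medium}
        try:
            name, layout, scale = presets[type]()
        except KeyError:
            raise ValueError(f"unknown gridworld type: {type}") from None
        return cls(name, layout, scale=scale)


def _easy():
    """Hypothetical preset: returns (name, layout, scale)."""
    return "Easy", [[2, 0, 3]], 5


def _medium():
    """Hypothetical preset: returns (name, layout, scale)."""
    return "Medium", [[2, 0, 0], [1, 1, 0], [3, 0, 0]], 5
```

Usage would then look like env = GridWorld.generate(GridWorld.MEDIUM).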
get_T_R()
Return transition model and reward function
Creates the transition model and reward function for the gridworld and transition probability. Can be used by dynamic programming algorithms, like ValueIteration.
Returns:
- collections.defaultdict – transition model
- collections.defaultdict – reward function
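To illustrate how a defaultdict-based model can feed dynamic programming, here is a small sketch. The nesting T[state][action] -> {next_state: probability} and R[state][action] -> reward is an assumed shape, not necessarily what get_T_R returns, and the three-state corridor and the value_iteration helper are hypothetical.

```python
from collections import defaultdict

# Assumed shapes: T[state][action] -> {next_state: probability},
#                 R[state][action] -> expected reward.
T = defaultdict(lambda: defaultdict(dict))
R = defaultdict(lambda: defaultdict(float))

# Hypothetical 3-state corridor; state 2 is the goal.
# Action 0 moves left, action 1 moves right.
for s in (0, 1):
    T[s][0] = {max(s - 1, 0): 1.0}
    T[s][1] = {s + 1: 1.0}
    R[s][1] = 1.0 if s + 1 == 2 else 0.0


def value_iteration(T, R, gamma=0.9, sweeps=50):
    """Repeated Bellman backups over all non-terminal states."""
    V = defaultdict(float)  # terminal states keep the default value 0.0
    for _ in range(sweeps):
        for s in list(T):
            V[s] = max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a].items())
                for a in T[s]
            )
    return V


V = value_iteration(T, R)
```

In this toy model the state next to the goal gets value 1.0 and the state before it gets the discounted value 0.9.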
get_all_states()
Return all the states of the gridworld
Returns: states
Return type: set
get_grid()
Returns copy of the gridworld
Returns: Copy of the gridworld
Return type: np.ndarray
possible_actions(state)
Returns the available actions for state
Note
State is not used in this environment, but might be necessary for more complicated environments where states don't have the same set of actions.
Parameters: state (np.ndarray) – state
Returns: list containing available actions for state
Return type: list of int
render(**kwargs)
Render the environment
If no visualizer is defined, prints it to the console.
Parameters: **kwargs – See GridWorldVisualizer for details
Returns: True if the agent has to stop rendering
Return type: bool
reset(start_state=None)
Parameters: start_state (np.ndarray, optional) – (Default value = None)
Returns: state of the agent (start state)
Return type: np.ndarray
step(action)
Take a step in the environment
Executes action and retrieves the new state. Determines if any wall is hit and if so goes back to previous state.
Calculates the reward using the previous state, action and new state. Determines if the new state is terminal.
Parameters: action (int) – action
Returns:
- np.ndarray – new state
- float – reward
- bool – True if new state is terminal
- dict – Empty info dict (to make it equal to the step method in the OpenAI Gym)
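Because step returns a (state, reward, done, info) tuple, a Gym-style episode loop works unchanged. The sketch below assumes only the reset and step signatures documented here; run_episode and TwoStepEnv are hypothetical helpers, the latter standing in for a real GridWorld.

```python
def run_episode(env, policy, max_steps=100):
    """Roll out one episode and return (total_reward, steps_taken)."""
    state = env.reset()
    total_reward = 0.0
    for t in range(1, max_steps + 1):
        state, reward, done, _info = env.step(policy(state))
        total_reward += reward
        if done:
            return total_reward, t
    return total_reward, max_steps


class TwoStepEnv:
    """Hypothetical stand-in: the second step always reaches the goal."""

    def reset(self, start_state=None):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        done = self.t >= 2
        return self.t, (1.0 if done else 0.0), done, {}


result = run_episode(TwoStepEnv(), policy=lambda state: 0)
```

The same run_episode function could be handed any environment that follows the documented reset and step contract.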
smartstart.environments.gridworldvisualizer module
GridWorld Visualizer module
class GridWorldVisualizer(env, name='GridWorld', size=None, fps=60)
Bases: smartstart.environments.environment.Visualizer
Render the gridworld and optional extra visualizations. Available visualizers:
- console
- live agent
- value function
- density
Parameters:
- env (GridWorld) – GridWorld environment
- name (str) – visualizer name
- size (tuple) – size of each window in the visualizer (Default = (450, 450))
- fps (int) – frames per second for pygame
name
str – visualizer name
size
tuple – size of each window in the visualizer (Default = (450, 450))
fps
int – frames per second for pygame
screen
pygame.Surface – pygame surface for rendering
clock
pygame.time.Clock – pygame clock
grid
np.ndarray – GridWorld grid
colors
dict – colors for rendering the GridWorld
messages
collections.deque – deque with messages to show in the console window (maxlen=29)
active_visualizers
set – contains the visualizers to be used
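The messages attribute relies on the bounded behaviour of collections.deque: once maxlen entries are stored, appending a new message silently drops the oldest one. A short sketch, with made-up message strings:

```python
from collections import deque

messages = deque(maxlen=29)
for i in range(40):
    messages.append(f"episode {i}")

oldest, newest = messages[0], messages[-1]
```

After 40 appends only the last 29 messages remain, so the console window always shows the most recent lines.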
add_visualizer(*args)
Add visualizer
Parameters: *args – visualizers to add, see class attributes for available visualizers
render(value_map=None, density_map=None, message=None, close=False)
Render the current state of the GridWorld
Parameters:
- value_map (np.ndarray) – (Default value = None)
- density_map (np.ndarray) – (Default value = None)
- message (str) – (Default value = None)
- close (bool) – (Default value = False)
Returns: True if the agent has to stop rendering
Return type: bool
smartstart.environments.presets module
Preset GridWorlds module
Presets used by GridWorld to generate the gridworld.
easy()
Easy GridWorld
Returns:
- str – name
- double list of int – layout
- int – scale
extreme()
Extreme GridWorld
Returns:
- str – name
- double list of int – layout
- int – scale
hard()
Hard GridWorld
Returns:
- str – name
- double list of int – layout
- int – scale
medium()
Medium GridWorld
Returns:
- str – name
- double list of int – layout
- int – scale