(v2.2.5) - Multi-Objective Reward compatibility in Sinergym (#303)
* Updated Sinergym version from 2.2.4 to 2.2.5

* Updated reward terms dictionary

* Improved environment layer info dict construction

* Added MultiObjectiveReward in wrappers.py

* Enhanced Sinergym logger: using the info dict for reward information in order to be more general

* Fixed LoggerWrapper to match the logger update

* Added timestep and time_elapsed (both 0) to simulator info in reset

* Added fixture with multiobjective wrapper

* Added tests for multiobjective wrapper

* Fixed simulator test for the new info dict dimension

* Fixed pytype errors

* Update documentation modules for API reference documentation

* Documentation: Updated wrapper section

* Update wrapper notebook example

* Documentation: Updated reward section and added issue reference in MultiObjective wrapper section
AlejandroCN7 authored Mar 10, 2023
1 parent 50ce3f7 commit f33f130
Showing 18 changed files with 398 additions and 245 deletions.
@@ -1,4 +1,4 @@
sinergym.utils.wrappers.LoggerWrapper
=====================================

.. currentmodule:: sinergym.utils.wrappers
@@ -0,0 +1,42 @@
sinergym.utils.wrappers.MultiObjectiveReward
============================================

.. currentmodule:: sinergym.utils.wrappers

.. autoclass:: MultiObjectiveReward
:members:
:undoc-members:


.. automethod:: __init__


.. rubric:: Methods

.. autosummary::

~MultiObjectiveReward.__init__
~MultiObjectiveReward.class_name
~MultiObjectiveReward.close
~MultiObjectiveReward.render
~MultiObjectiveReward.reset
~MultiObjectiveReward.step





.. rubric:: Attributes

.. autosummary::

~MultiObjectiveReward.action_space
~MultiObjectiveReward.metadata
~MultiObjectiveReward.np_random
~MultiObjectiveReward.observation_space
~MultiObjectiveReward.render_mode
~MultiObjectiveReward.reward_range
~MultiObjectiveReward.spec
~MultiObjectiveReward.unwrapped


@@ -1,4 +1,4 @@
sinergym.utils.wrappers.MultiObsWrapper
=======================================

.. currentmodule:: sinergym.utils.wrappers
@@ -1,4 +1,4 @@
sinergym.utils.wrappers.NormalizeObservation
============================================

.. currentmodule:: sinergym.utils.wrappers
@@ -1,4 +1,4 @@
sinergym.utils.wrappers.OfficeGridStorageSmoothingActionConstraintsWrapper
==========================================================================

.. currentmodule:: sinergym.utils.wrappers
1 change: 1 addition & 0 deletions docs/source/pages/modules/sinergym.utils.wrappers.rst
@@ -20,6 +20,7 @@
:template: custom-class-template.rst

LoggerWrapper
MultiObjectiveReward
MultiObsWrapper
NormalizeObservation
OfficeGridStorageSmoothingActionConstraintsWrapper
3 changes: 3 additions & 0 deletions docs/source/pages/rewards.rst
@@ -76,6 +76,9 @@ But you can change this configuration using ``gym.make()`` as follows:
'range_comfort_summer': (23.0, 26.0),
'energy_weight': 0.1})
.. note:: By default, the reward class will return the reward value and the terms used in its calculation.
   These terms will be added automatically to the environment's info dict.

.. warning:: When specifying a different reward with `gym.make` than the
default environment ID, it is very important to set the `reward_kwargs`
that are required and therefore do not have a default value.
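The note about reward terms can be illustrated with a minimal sketch. This is not *Sinergym*'s actual implementation: the ``LinearReward`` class, the term names and the penalty values below are hypothetical, chosen only to show the pattern of a reward callable returning both the scalar reward and the terms used to compute it, which the environment then merges into the step's info dict.

```python
# Minimal sketch (NOT Sinergym's real reward class): a reward callable
# returns the scalar reward plus a dict with the terms it was built from.

class LinearReward:
    def __init__(self, energy_weight=0.5):
        self.energy_weight = energy_weight

    def __call__(self, energy_penalty, comfort_penalty):
        # Weighted sum of the two (negative) penalty terms.
        reward = (self.energy_weight * energy_penalty
                  + (1.0 - self.energy_weight) * comfort_penalty)
        terms = {
            'reward_energy': self.energy_weight * energy_penalty,
            'reward_comfort': (1.0 - self.energy_weight) * comfort_penalty,
        }
        return reward, terms

reward_fn = LinearReward(energy_weight=0.1)
reward, terms = reward_fn(energy_penalty=-2.0, comfort_penalty=-1.0)
info = {'timestep': 1}
info.update(terms)  # the environment adds the terms to info automatically
```

With ``energy_weight=0.1`` the comfort term dominates the scalar reward, which is exactly why exposing the individual terms in the info dict is useful for analysis.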
52 changes: 14 additions & 38 deletions docs/source/pages/wrappers.rst
@@ -4,49 +4,25 @@ Wrappers

*Sinergym* provides several **wrappers** to add functionality to the environment that it does not
have by default. Currently, we have developed a **normalization wrapper**,
a **multi-observation wrapper**, a **multi-objective wrapper** and a **logger wrapper**. The code can be found in
`sinergym/sinergym/utils/wrappers.py <https://github.com/ugr-sail/sinergym/blob/main/sinergym/utils/wrappers.py>`__.
You can implement your own wrappers by inheriting from *gym.Wrapper* or one of its variants.

A usage example of these wrappers could be the following:

.. code:: python

    import gymnasium as gym
    import numpy as np

    import sinergym
    from sinergym.utils.wrappers import LoggerWrapper, NormalizeObservation

    env = gym.make('Eplus-5Zone-hot-continuous-v1')
    env = NormalizeObservation(env)
    env = LoggerWrapper(env)
    ...
    for i in range(1):
        obs, info = env.reset()
        rewards = []
        terminated = False
        current_month = 0
        while not terminated:
            a = env.action_space.sample()
            obs, reward, terminated, truncated, info = env.step(a)
            rewards.append(reward)
            if info['month'] != current_month:  # display results every month
                current_month = info['month']
                print('Reward: ', sum(rewards), info)
        print(
            'Episode ',
            i,
            'Mean reward: ',
            np.mean(rewards),
            'Cumulative reward: ',
            sum(rewards))
    env.close()
- **NormalizeObservation**: It is used to transform the observations received from the simulator into values between 0 and 1.

- **LoggerWrapper**: Wrapper for logging all interactions between agent and environment. The logger class can be selected
  in the constructor if another type of logging is required. For more information about the *Sinergym* logger, visit :ref:`Logger`.

- **MultiObjectiveReward**: The environment step will return a reward vector (with the elements selected in the wrapper
  constructor, one per objective) instead of a traditional scalar value. See `#301 <https://github.com/ugr-sail/sinergym/issues/301>`__.

- **MultiObsWrapper**: Stacks the observations received in a history queue (its size is customizable).

.. note:: For examples about how to use these wrappers, visit :ref:`Wrappers example`.
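The multi-objective behavior described above can be sketched in a few lines. This sketch is independent of *gymnasium* and *Sinergym* (the ``DummyEnv`` class and the reward-term names are hypothetical): the wrapper intercepts ``step()`` and replaces the scalar reward with a vector built from the selected reward terms found in the info dict.

```python
# Minimal sketch of the MultiObjectiveReward idea, NOT the real wrapper:
# the scalar reward is replaced by one value per selected objective.

class DummyEnv:
    def step(self, action):
        obs, reward = [0.0], -1.1
        info = {'reward_energy': -0.2, 'reward_comfort': -0.9}
        return obs, reward, False, False, info

class MultiObjectiveRewardSketch:
    def __init__(self, env, reward_terms):
        self.env = env
        self.reward_terms = reward_terms

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        # Vector reward: one element per selected objective term.
        vector_reward = [info[term] for term in self.reward_terms]
        return obs, vector_reward, terminated, truncated, info

env = MultiObjectiveRewardSketch(
    DummyEnv(), reward_terms=['reward_energy', 'reward_comfort'])
obs, reward, terminated, truncated, info = env.step(action=None)
# reward is now a list with one entry per objective, not a scalar
```

A multi-objective agent can then trade off the vector components itself instead of relying on a fixed scalarization baked into the environment.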

.. warning:: The order of the wrappers is really important if you are going to use several at the same time.
   The correct order is **Normalization - Logger - Multi-Objective - MultiObs**, and subsets of it (for example, *Normalization* - *MultiObs* is valid).

.. warning:: If you add observation variables beyond the default ones to the environment, you have
   to update the **value range dictionary** in `sinergym/sinergym/utils/constants.py <https://github.com/ugr-sail/sinergym/blob/main/sinergym/utils/constants.py>`__
   so that normalization can be applied correctly. Otherwise, you will encounter bug `#249 <https://github.com/ugr-sail/sinergym/issues/249>`__.
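The warning about the value range dictionary boils down to a min-max scaling driven by per-variable ``(min, max)`` pairs. A minimal sketch, assuming hypothetical variable names and ranges (the real dictionary lives in ``sinergym/utils/constants.py``):

```python
# Sketch of the min-max normalization that the value range dictionary
# enables: each variable is scaled to [0, 1] using its (min, max) range.
# Variable names and ranges below are hypothetical examples.

RANGES = {
    'outdoor_temperature': (-20.0, 45.0),
    'zone_air_temperature': (10.0, 40.0),
}

def normalize(name, value):
    low, high = RANGES[name]
    # Clip first so out-of-range readings still map into [0, 1].
    value = max(low, min(high, value))
    return (value - low) / (high - low)

print(normalize('zone_air_temperature', 25.0))  # 0.5
```

A variable missing from the dictionary would raise a ``KeyError`` here, which mirrors why unregistered observation variables break normalization in practice.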
