
Commit

Minor docs improvements
thomaswolgast committed Jan 28, 2025
1 parent 44fb851 commit 67615b5
Showing 6 changed files with 61 additions and 52 deletions.
8 changes: 5 additions & 3 deletions docs/source/advanced_features.rst
@@ -14,15 +14,17 @@ Multi-Stage OPF
The multi-stage OPF problem is an OPF that is performed over multiple time
steps, including constraint satisfaction over multiple time steps, for example,
storage state-of-charge or ramping constraints.
The multi-stage OPF can be implemented by overwriting the :meth:`step` method, as
shown in the
The multi-stage OPF can be implemented by overwriting the :meth:`step` method,
which can be done by inheriting from the :class:`MultiStageOpfEnv` class,
as shown in the
`multi-stage OPF example <https://github.com/Digitalized-Energy-Systems/opfgym/blob/development/opfgym/examples/multi_stage.py>`_.
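
For illustration, below is a rough sketch of the step-override idea. Apart from the :meth:`step` signature, everything here is an assumption: the import path of :class:`OpfEnv`, the attribute names, and the termination logic are invented for this sketch, and the linked example together with the :class:`MultiStageOpfEnv` class remains the authoritative reference.

.. code-block:: python

    from opfgym import OpfEnv  # assumed import path

    class SketchMultiStageEnv(OpfEnv):
        """Hypothetical sketch: keep the episode running over a full horizon."""

        def step(self, action):
            obs, reward, terminated, truncated, info = super().step(action)
            # Instead of terminating after a single OPF, move on to the next
            # time step so that inter-temporal constraints (storage
            # state-of-charge, ramping) span the whole horizon.
            # `current_step` and `horizon` are invented names for this sketch.
            self.current_step += 1
            terminated = self.current_step >= self.horizon
            return obs, reward, terminated, truncated, info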

Security-Constrained OPF
------------------------
The security-constrained OPF problem is an OPF where all constraints are also
considered for the N-1 case with line outages. It can be implemented by adding
a loop to the :meth:`calculate_violations` method, as shown in the
a loop to the :meth:`calculate_violations` method, which can be easily done by
inheriting from the :class:`SecurityConstrainedOpfEnv` class, as shown in the
`security-constrained OPF example <https://github.com/Digitalized-Energy-Systems/opfgym/blob/development/opfgym/examples/security_constrained.py>`_.
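
As a library-agnostic illustration of the N-1 idea, the sketch below uses plain pandapower rather than the OPF-Gym API: it re-runs a power flow once per single line outage and collects the resulting line overloads. The function name and the 100 % loading limit are chosen for this example only.

.. code-block:: python

    import pandapower as pp
    import pandapower.networks as pn

    def n_minus_1_line_overloads(net, max_loading=100.0):
        """Return {outaged line index: overloaded line indices} for all N-1 cases."""
        overloads = {}
        for outage in net.line.index:
            net.line.at[outage, 'in_service'] = False
            try:
                pp.runpp(net)
                overloaded = net.res_line.index[
                    net.res_line.loading_percent > max_loading].tolist()
            except Exception:
                # Power flow did not converge for this outage; treat as a violation.
                overloaded = ['no convergence']
            finally:
                net.line.at[outage, 'in_service'] = True
            if overloaded:
                overloads[outage] = overloaded
        return overloads

    print(n_minus_1_line_overloads(pn.case14()))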

Mixed Continuous and Discrete Actions
6 changes: 6 additions & 0 deletions docs/source/conf.py
@@ -35,3 +35,9 @@

html_theme = 'furo'
html_static_path = ['_static']

html_theme_options = {
"source_repository": "https://github.com/Digitalized-Energy-Systems/opfgym",
"source_branch": "development",
"source_directory": "docs/source/",
}
82 changes: 41 additions & 41 deletions docs/source/environment_design.rst
@@ -18,65 +18,65 @@ Most environment design options are described in detail in

TODO: Work in progress, more information will follow.

Reward function
---------------
.. Reward function
.. ---------------
The reward function represents the goal of the agent. In the case of the RL-OPF,
the goal is to minimize the objective function while satisfying all constraints,
which can be represented by penalties.
.. The reward function represents the goal of the agent. In the case of the RL-OPF,
.. the goal is to minimize the objective function while satisfying all constraints,
.. which can be represented by penalties.
Three different standard reward functions to combine the objective function and
the constraint violations are available:
.. Three different standard reward functions to combine the objective function and
.. the constraint violations are available:
Summation reward
^^^^^^^^^^^^^^^^
In the summation reward, we simply add the penalties :math:`p_i(x)`
for constraint violations in the current state :math:`x`
to the negative objective function value :math:`f(x)`:
.. Summation reward
.. ^^^^^^^^^^^^^^^^
.. In the summation reward, we simply add the penalties :math:`p_i(x)`
.. for constraint violations in the current state :math:`x`
.. to the negative objective function value :math:`f(x)`:
:math:`r = -f(x) - \sum_{i} p_i(x)`
.. :math:`r = -f(x) - \sum_{i} p_i(x)`
Replacement reward
^^^^^^^^^^^^^^^^^^
In the replacement reward, we only provide either the objective function value
as a learning feedback or the penalty:
.. Replacement reward
.. ^^^^^^^^^^^^^^^^^^
.. In the replacement reward, we only provide either the objective function value
.. as a learning feedback or the penalty:
If valid: :math:`r = -f(x) + C`
.. If valid: :math:`r = -f(x) + C`
Else: :math:`r = -\sum_{i} p_i(x)`
.. Else: :math:`r = -\sum_{i} p_i(x)`
Additionally, we need a constant :math:`C` to ensure that the valid reward is
always better than the invalid one.
.. Additionally, we need a constant :math:`C` to ensure that the valid reward is
.. always better than the invalid one.
Parameterized reward
^^^^^^^^^^^^^^^^^^^^
This reward combines the previous two and allows for all possible combinations:
.. Parameterized reward
.. ^^^^^^^^^^^^^^^^^^^^
.. This reward combines the previous two and allows for all possible combinations:
If valid: :math:`r = -f(x) + C_{valid}`
.. If valid: :math:`r = -f(x) + C_{valid}`
Else: :math:`r = w * -f(x) - \sum_{i} p_i(x) - C_{invalid}`
.. Else: :math:`r = w * -f(x) - \sum_{i} p_i(x) - C_{invalid}`
Note that if the objective weight :math:`w` is set to zero, it is equivalent to
the replacement reward. If it is set to one and both constants
:math:`C` are set to zero, it is equivalent to the summation reward.
.. Note that if the objective weight :math:`w` is set to zero, it is equivalent to
.. the replacement reward. If it is set to one and both constants
.. :math:`C` are set to zero, it is equivalent to the summation reward.
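
For illustration, the three reward variants above can be written as one parameterized function. This is a sketch of the formulas, not OPF-Gym's internal implementation, and the function and parameter names are chosen for this example only.

.. code-block:: python

    def reward(objective, penalties, objective_weight=0.5, c_valid=1.0, c_invalid=1.0):
        """Parameterized reward with f(x) = objective and p_i(x) = penalties."""
        valid = not any(penalties)  # assumption: zero penalties mean no violation
        if valid:
            return -objective + c_valid
        return objective_weight * -objective - sum(penalties) - c_invalid

    # Special cases mentioned above:
    # summation reward:   objective_weight=1.0, c_valid=0.0, c_invalid=0.0
    # replacement reward: objective_weight=0.0, c_invalid=0.0, c_valid=C
    print(reward(objective=2.0, penalties=[0.5, 0.0], objective_weight=1.0,
                 c_valid=0.0, c_invalid=0.0))  # summation: -2.0 - 0.5 = -2.5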
Observation space
-----------------
.. Observation space
.. -----------------
TODO: Work in progress, more information will follow.
.. TODO: Work in progress, more information will follow.
Action space
------------
.. Action space
.. ------------
TODO: Work in progress, more information will follow.
.. TODO: Work in progress, more information will follow.
Episode definition
------------------
.. Episode definition
.. ------------------
TODO: Work in progress, more information will follow.
.. TODO: Work in progress, more information will follow.
Training and test data
----------------------
.. Training and test data
.. ----------------------
TODO: Work in progress, more information will follow.
.. TODO: Work in progress, more information will follow.
7 changes: 4 additions & 3 deletions docs/source/getting_started.rst
@@ -76,7 +76,7 @@ design options:
env = QMarket(**kwargs)
# Interact with the environment in the usual way (see above)
# env.reset()
obs, info = env.reset()
# ...
For more information on environment design and why it is important, see
@@ -152,15 +152,16 @@ More details can be found in :ref:`Create Custom Environments`.
return net, profiles
# Note that by inheriting from `OpfEnv`, all env design options are available
# Note that by inheriting from `OpfEnv`, all standard env design options are available
kwargs = {
# Add current line load to the observation space
'add_res_obs': ['line_loading'],
# ...
}
# Load the custom environment
env = CustomEnv(**kwargs)
# Interact with the environment in the usual way (see above)
# env.reset()
obs, info = env.reset()
# ...
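
For completeness, "the usual way" of interacting presumably follows the standard Gymnasium loop, matching the ``obs, info = env.reset()`` call above; the random action below is only a placeholder for a trained RL policy.

.. code-block:: python

    # Continues the snippet above, i.e. env = CustomEnv(**kwargs) already exists.
    obs, info = env.reset()
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()  # placeholder for a trained policy
        obs, reward, terminated, truncated, info = env.step(action)
    print('Final reward:', reward)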
3 changes: 2 additions & 1 deletion docs/source/index.rst
@@ -3,6 +3,7 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
OPF-Gym
=======

@@ -24,7 +25,7 @@ benchmarks power grids and time-series data by default.
All pandapower OPF variants can be represented as an RL environment by
*OPF-Gym*. Additionally, advanced OPF problems like multi-stage OPF,
security-constrained OPF, mixed continuous and discrete actions, stochastic OPF,
etc. are easily possible with *OPF-Gym*.
etc. are possible as well.

Contact thomas.wolgast@uol.de for questions, feedback, and collaboration.

7 changes: 3 additions & 4 deletions docs/source/supervised_learning.rst
@@ -1,5 +1,5 @@
Support for Supervised Learning
==========
===============================

While the focus of *OPF-Gym* is on reinforcement learning and its environments,
it also enables comparability with other machine learning approaches like
@@ -30,6 +30,5 @@ with the pandapower conventional OPF to generate ground-truth labels. That is
the case for all provided :ref:`Benchmarks`. However, it might not be the case
for custom environments, especially when implementing advanced OPF concepts
like multi-stage OPF or stochastic OPF. These are not solvable with the
pandapower OPF. In that case, you have to overwrite the
:py:meth:`env.run_optimal_power_flow` method of your custom environment and
provide your own OPF solver.
pandapower OPF. In that case, you also have to provide your own OPF solver to
your custom environment, as described in :ref:`Create Custom Environments`.
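
A hypothetical sketch of that hook, building on the ``CustomEnv`` from the getting-started page: the method name ``run_optimal_power_flow`` is taken from the previous wording of this section, and its signature, the ``self.net`` attribute, and the return convention are assumptions. The external solver is stubbed here by pandapower's conventional OPF.

.. code-block:: python

    import pandapower as pp

    class LabeledCustomEnv(CustomEnv):  # CustomEnv as defined in Create Custom Environments
        def run_optimal_power_flow(self, **kwargs):
            # Call your own solver here (e.g. a multi-stage or stochastic
            # program) and write the resulting setpoints back into self.net so
            # they can serve as ground-truth labels. As a placeholder, fall
            # back to the conventional pandapower OPF:
            pp.runopp(self.net, **kwargs)
            return True  # assumed convention: report whether a solution was found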
